Auburn Electronic Theses and Dissertations

Item-Based Recommendation Algorithm Using Hadoop

Date

2015-12-16

Author

Somani, Chetan Prakash

Type of Degree

Master's Thesis

Department

Computer Science

Abstract

Recommendation systems address the problem of making personalized recommendations and have achieved widespread success in e-commerce, social networking and advertising. In e-commerce, the day-to-day growth in the number of customers and products poses key challenges for recommendation systems. These systems must produce high-quality recommendations and must also make many recommendations per second for millions of customers and products. Hence, new recommendation system technologies are required to produce high-quality personalized recommendations at this scale. A sequential implementation of a recommendation algorithm suffers from serious performance problems on large datasets. We address these problems by implementing a parallel algorithm that derives recommendations using the Hadoop map-reduce framework together with an item-based similarity collaborative filtering technique. Map-reduce is a programming framework for processing and generating large datasets. In the map-reduce framework, input and output are represented as key-value pairs. Users specify a map function that processes a key-value pair to generate a set of intermediate key-value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key to produce the final output key-value pairs. Both the similarity among item pairs and the recommendations derived from it are computed with the map-reduce programming approach. Item pairs are ranked by their similarity, which is calculated using different similarity measures. For any given item, the items with the highest degree of similarity to it receive the highest priority for recommendation.
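To illustrate how a similarity ranking could be computed, the snippet below is a minimal sketch of two common similarity measures, cosine similarity and Pearson correlation, evaluated over the users who rated both items, plus a function that ranks all other items by their similarity to a given item. The toy ratings and the names `RATINGS`, `co_rated`, `cosine`, `pearson` and `rank_similar` are hypothetical; the abstract does not state which similarity measures the thesis actually uses.

```python
from math import sqrt

# Hypothetical toy data: item -> {user: rating}
RATINGS = {
    "A": {"u1": 5.0, "u2": 3.0, "u3": 4.0},
    "B": {"u1": 4.0, "u2": 3.0, "u3": 5.0},
    "C": {"u1": 1.0, "u2": 5.0},
    "D": {"u2": 2.0, "u3": 1.0},
}


def co_rated(x, y):
    """Return two aligned rating lists over the users who rated both items."""
    common = sorted(set(x) & set(y))
    return [x[u] for u in common], [y[u] for u in common]


def cosine(x, y):
    """Cosine similarity of the co-rated rating vectors (0.0 if undefined)."""
    a, b = co_rated(x, y)
    dot = sum(p * q for p, q in zip(a, b))
    norm = sqrt(sum(p * p for p in a)) * sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation of the co-rated ratings (0.0 if undefined)."""
    a, b = co_rated(x, y)
    if len(a) < 2:
        return 0.0
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    spread = (sqrt(sum((p - mean_a) ** 2 for p in a))
              * sqrt(sum((q - mean_b) ** 2 for q in b)))
    return cov / spread if spread else 0.0


def rank_similar(target, ratings, measure=cosine, k=None):
    """Rank every other item by its similarity to target, highest first."""
    scores = [(other, measure(ratings[target], ratings[other]))
              for other in ratings if other != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores if k is None else scores[:k]


if __name__ == "__main__":
    print(rank_similar("A", RATINGS))
    print(rank_similar("A", RATINGS, measure=pearson))
```

With cosine similarity, `rank_similar("A", RATINGS)` ranks B first, then D, then C. Note that Pearson correlation over only two co-raters is always +1 or -1 (or undefined), so rankings built from very small overlaps deserve caution.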
Map jobs gather information from the input dataset and compute relationships among items on multiple nodes in parallel to generate item pairs. Reduce jobs then combine the item pairs from all nodes and generate the recommendation list by computing the similarity among items with similarity measurement techniques. This study focuses on item-based collaborative filtering, a well-known recommendation technique, implemented with the Hadoop map-reduce framework. There are two main collaborative filtering methods: user-based and item-based. User-based collaborative filtering computes relationships among users: it determines how similar two users are and makes recommendations based on that similarity. Item-based collaborative filtering computes relationships among items: it determines how similar two items are and makes recommendations based on that similarity. Experiments show that the item-based collaborative filtering algorithm implemented on Hadoop with the map-reduce framework performs better as the number of nodes in the cluster increases, compared with its performance on a single-node cluster.
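The division of work between map and reduce jobs can be simulated on a single machine as a minimal sketch. Here `run_job` is a hypothetical in-memory stand-in for one Hadoop job (map, group by key, reduce), not Hadoop's API. The `user,item,rating` input format, the use of cosine similarity, the `MIN_CO_RATERS` threshold and the similarity-weighted scoring in `recommend` are all illustrative assumptions, not the thesis's exact method. On a real cluster, each mapper and reducer call would run as a task distributed across nodes.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical input: one "user,item,rating" record per line.
LINES = [
    "u1,A,5", "u1,B,4", "u1,C,1",
    "u2,A,3", "u2,B,3", "u2,C,5", "u2,D,2",
    "u3,A,4", "u3,B,5", "u3,D,1",
    "u4,A,5", "u4,C,2",
]
MIN_CO_RATERS = 2  # hypothetical: skip pairs rated together by fewer users


def run_job(records, mapper, reducer):
    """In-memory stand-in for one map-reduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):  # keys reach the reducer in sorted order
        results.extend(reducer(out_key, groups[out_key]))
    return results


def map_by_user(_, line):
    """Job 1 map: parse a rating and key it by user."""
    user, item, rating = line.split(",")
    yield user, (item, float(rating))


def reduce_user_vector(user, item_ratings):
    """Job 1 reduce: collect each user's ratings, sorted by item."""
    yield user, sorted(item_ratings)


def map_item_pairs(user, item_ratings):
    """Job 2 map: emit every pair of items this user rated, with both ratings."""
    for (i, r_i), (j, r_j) in combinations(item_ratings, 2):
        yield (i, j), (r_i, r_j)


def reduce_cosine(pair, rating_pairs):
    """Job 2 reduce: cosine similarity over all co-ratings of one item pair."""
    if len(rating_pairs) < MIN_CO_RATERS:
        return
    dot = sum(a * b for a, b in rating_pairs)
    norm = (sqrt(sum(a * a for a, _ in rating_pairs))
            * sqrt(sum(b * b for _, b in rating_pairs)))
    if norm:
        yield pair, dot / norm


def recommend(user, user_vectors, similarities, top_n=3):
    """Score unrated items by a similarity-weighted average of the user's ratings."""
    sim = {}
    for (i, j), s in similarities:
        sim[(i, j)] = sim[(j, i)] = s
    rated = dict(user_vectors[user])
    candidates = {j for (_, j) in sim} - set(rated)
    scores = []
    for j in candidates:
        num = sum(sim[(i, j)] * r for i, r in rated.items() if (i, j) in sim)
        den = sum(abs(sim[(i, j)]) for i in rated if (i, j) in sim)
        if den:
            scores.append((j, num / den))
    scores.sort(key=lambda item_score: item_score[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    user_vectors = run_job([(None, line) for line in LINES],
                           map_by_user, reduce_user_vector)
    similarities = run_job(user_vectors, map_item_pairs, reduce_cosine)
    print(recommend("u4", dict(user_vectors), similarities))
```

Job 1 groups ratings by user, and Job 2 emits and reduces item pairs. For this toy data, user u4 is recommended D ahead of B. The co-rater threshold discards the pair (C, D): only one user rated both items, so the pair would otherwise receive a misleading cosine similarity of 1.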