User-Based recommendation algorithm on Hadoop cluster
Type of Degree: Master's Thesis
Recommender systems apply knowledge discovery techniques to the problem of making personalized product recommendations based on customers' usage patterns. Techniques such as k-nearest neighbors and neighborhood-based collaborative filtering have achieved widespread success in e-commerce. The tremendous growth in customers and products in recent years poses two key challenges for recommender systems: producing high-quality recommendations, and performing many recommendations per second for millions of customers and products. New recommender system technologies are needed to produce high-quality recommendations quickly, even for very large-scale problems. A sequential implementation that derives recommendations for a large user base has severe performance issues. We address these issues by implementing the algorithm with the Hadoop Map-Reduce framework combined with similarity-based collaborative filtering techniques. Map-Reduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Similarity is computed for all pairs of users using the map/reduce programming approach. Using these similarities, a ranking is computed for each item. Ratings from the users with the highest degree of similarity contribute more to an item's ranking and are given higher priority for recommendation. Map jobs compute the recommendation ratings for all products on multiple nodes, and reduce jobs combine the results from all the nodes to form the final recommendation list. In this study, we focus on user-based collaborative filtering methods, which are well-known techniques in recommender systems, and implement them using Hadoop Map-Reduce.
Neighborhood-based collaborative filtering methods are either user-based or item-based; in both cases, user preferences are inferred solely from the items that they and other users in the dataset have interacted with. Experiments show that the Hadoop implementation achieves higher performance as the number of data nodes increases, compared with the same implementation on a single node.
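The map and reduce steps described in the abstract can be illustrated with a minimal single-process sketch. It is not the thesis implementation: `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job, the toy `RATINGS` data is invented for illustration, and cosine similarity over co-rated items is an assumed choice, since the abstract does not name the similarity measure.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: (user, item, rating), one rating per user-item pair.
RATINGS = [
    ("u1", "A", 5.0), ("u1", "B", 3.0),
    ("u2", "A", 4.0), ("u2", "B", 2.0), ("u2", "C", 5.0),
    ("u3", "A", 1.0), ("u3", "B", 5.0), ("u3", "C", 2.0),
]

def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one job: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

def user_similarities(ratings):
    # Job 1: key by item, then emit one record per pair of users who co-rated it.
    def map_by_item(rec):
        user, item, rating = rec
        yield item, (user, rating)

    def reduce_to_pairs(item, values):
        for (u, ru), (v, rv) in combinations(sorted(values), 2):
            yield (u, v), (ru, rv)

    # Job 2: key by user pair, then compute cosine similarity (assumed measure).
    def map_identity(rec):
        yield rec

    def reduce_cosine(pair, values):
        dot = sum(ru * rv for ru, rv in values)
        norm = sqrt(sum(ru * ru for ru, _ in values)) * sqrt(sum(rv * rv for _, rv in values))
        if norm > 0:
            yield pair, dot / norm

    pairs = run_mapreduce(ratings, map_by_item, reduce_to_pairs)
    return dict(run_mapreduce(pairs, map_identity, reduce_cosine))

def recommend(ratings):
    sims = user_similarities(ratings)
    sims.update({(v, u): s for (u, v), s in list(sims.items())})  # symmetric lookup
    rated = defaultdict(set)
    for user, item, _ in ratings:
        rated[user].add(item)

    # Job 3: each rating becomes a similarity-weighted vote for users who lack the item.
    def map_votes(rec):
        neighbor, item, rating = rec
        for user in rated:
            if user != neighbor and item not in rated[user] and (user, neighbor) in sims:
                yield (user, item), (sims[(user, neighbor)], rating)

    def reduce_score(key, values):
        weight = sum(abs(s) for s, _ in values)
        if weight > 0:
            yield key, sum(s * r for s, r in values) / weight

    return dict(run_mapreduce(ratings, map_votes, reduce_score))
```

Calling `recommend(RATINGS)` returns a dictionary keyed by `(user, item)` with a predicted score for each item the user has not yet rated. In this toy data only `("u1", "C")` is scored, and the score leans toward the rating given by `u2`, the more similar neighbor. On a real cluster, each `run_mapreduce` call would correspond to a separate job, and the similarity table used in the third step would have to be distributed to the mappers as side data.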