Building a Movie Recommendation Service with Apache Spark
In this tutorial I’ll show you building a movie recommendation service with Apache Spark. Two users are alike if they rated a product similarly. For example, if Alice rated a book 3/5 and Bob also rated the same book 3.3/5 they are very much alike. Now if Bob buys another book and rates it 4/5 we should suggest that book to Alice, that’s what a recommender system does. See references if you want to know more about how recommender systems work. We are going to use Alternating Least Squares method from MLLib, and MovieLens 100K dataset which is only 5 MB in size. Download the dataset from https://grouplens.org/datasets/movielens/. Code :
from pyspark.mllib.recommendation import ALS,MatrixFactorizationModel, Rating
from pyspark import SparkContext
sc = SparkContext ()
#Replace filepath with appropriate data
movielens = sc.textFile(“filepath/u.data”)
movielens.first() #u’196\t242\t3\t881250949’
movielens.count() #100000
#Clean up the data by splitting it,
#movielens readme says the data is split by tabs and
#is user product rating timestamp
clean_data = movielens.map(lambda x:x.split(‘\t’))
#We’ll need to map the movielens data to a Ratings object
#A Ratings object is made up of (user, item, rating)
mls = movielens.map(lambda l: l.split(‘\t’))
ratings = mls.map(lambda x: Rating(int(x[0]),\
int(x[1]), float(x[2])))
#Setting up the parameters for ALS
rank = 5 # Latent Factors to be made
numIterations = 10 # Times to repeat process
#Need a training and test set, test set is not used in this example.
train, test = ratings.randomSplit([0.7,0.3],7856)
#Create the model on the training data
model = ALS.train(train, rank, numIterations)
For Product X, Find N Users to Sell To
model.recommendUsers(242,100)
For User Y Find N Products to Promote
model.recommendProducts(196,10)
#Predict Single Product for Single User
model.predict(196, 242)
References:
Building a Movie Recommendation Service with Apache Spark
http://idevji.com/building-movie-recommendation-service-apache-spark.html