In this tutorial I’ll show you how to build a movie recommendation service with Apache Spark. Two users are alike if they rated a product similarly. For example, if Alice rated a book 3/5 and Bob rated the same book 3.3/5, they are very much alike. Now if Bob buys another book and rates it 4/5, we should suggest that book to Alice; that’s what a recommender system does. See references...
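The Alice and Bob reasoning can be sketched in plain Python. This is only an illustration of the similarity idea, not the Spark implementation the tutorial builds; the item names, the `similarity` and `recommend` helpers, and both thresholds are assumptions made up for this example.

```python
# Hypothetical toy data: user and item names are illustrative only.
ratings = {
    "Alice": {"book_a": 3.0},
    "Bob": {"book_a": 3.3, "book_b": 4.0},
}


def similarity(u, v, ratings, scale=5.0):
    """Return 1 minus the mean absolute rating gap on shared items, divided by `scale`."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0  # no common items, so no evidence of similarity
    gap = sum(abs(ratings[u][i] - ratings[v][i]) for i in shared) / len(shared)
    return 1.0 - gap / scale


def recommend(user, ratings, min_similarity=0.8, min_rating=4.0):
    """Return items that similar users rated highly and `user` has not rated."""
    suggestions = set()
    for other in ratings:
        if other == user:
            continue
        if similarity(user, other, ratings) < min_similarity:
            continue
        for item, score in ratings[other].items():
            if item not in ratings[user] and score >= min_rating:
                suggestions.add(item)
    return sorted(suggestions)


if __name__ == "__main__":
    print(recommend("Alice", ratings))
```

Because Alice and Bob disagree by only 0.3 on the shared book, Bob's highly rated second book is suggested to Alice, while nothing is suggested to Bob.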

## GraphFrames PySpark Example : Learn Data Science

In this post, a GraphFrames PySpark example is discussed using the shortest path problem. GraphFrames is a Spark package that provides DataFrame-based graphs in Spark. Spark version 1.6.2 is used for all examples. Including the package with the PySpark shell: `pyspark --packages graphframes:graphframes:0.1.0-spark1.6` Code: `from pyspark import SparkContext` `from pyspark.sql import SQLContext` sc =...
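To make the shortest path problem concrete without a Spark cluster, here is a minimal plain-Python sketch that computes hop counts with breadth-first search. It does not use GraphFrames; the vertex ids, the edge list, and the `shortest_path_lengths` helper are hypothetical, chosen only to mirror the vertices-plus-edges shape of a DataFrame-based graph.

```python
from collections import deque

# Hypothetical graph: vertex ids and directed edges as (src, dst) pairs.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")]


def shortest_path_lengths(vertices, edges, source):
    """Return hop counts from `source` to every reachable vertex (BFS)."""
    neighbours = {v: [] for v in vertices}
    for src, dst in edges:
        neighbours[src].append(dst)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in distance:  # first visit is the shortest in an unweighted graph
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance


if __name__ == "__main__":
    print(shortest_path_lengths(vertices, edges, "a"))
```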

## Logistic Regression with Spark : Learn Data Science

Logistic regression with Spark is done using MLlib. Logistic regression returns binary class labels, that is, “0” or “1”. In this example, we consider a data set that consists of only one variable, “study hours”, and the class label is whether the student passed (1) or did not pass (0). from pyspark import SparkContext import numpy as np...
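As a rough illustration of what the model learns, the sketch below fits a one-variable logistic regression with plain gradient descent in pure Python. It is not the MLlib code from the post; the study-hours data, the learning rate, the step count, and the `train`, `predict_proba`, and `predict` helpers are all assumptions for this example.

```python
import math

# Hypothetical data: study hours and whether the student passed (1) or not (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, learning_rate=0.1, steps=5000):
    """Fit weight and intercept by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_proba(x, w, b):
    """Estimated probability of passing after `x` study hours."""
    return sigmoid(w * x + b)


def predict(x, w, b):
    """Binary class label: 1 if the estimated probability is at least 0.5."""
    return 1 if predict_proba(x, w, b) >= 0.5 else 0


w, b = train(hours, passed)

if __name__ == "__main__":
    print(predict(1.0, w, b), predict(8.0, w, b))
```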

## k-Means Clustering Spark Tutorial : Learn Data Science

k-Means clustering with Spark is easy to understand. MLlib comes bundled with a k-Means implementation (KMeans), which can be imported from the pyspark.mllib.clustering module. Here is a very simple example of clustering data with height and weight attributes. Arguments to KMeans.train: k is the number of desired clusters; maxIterations is the maximum number of iterations to run; runs is the number of...
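For readers who want to see the algorithm itself, here is a minimal pure-Python sketch of Lloyd's k-means iteration on height and weight pairs. It is not MLlib's KMeans; the data points and the `kmeans` helper are hypothetical, its `k` and `max_iterations` parameters only mirror the roles of `k` and `maxIterations` described above, and seeding the centres with the first `k` points is a simplification chosen to keep the example deterministic.

```python
# Hypothetical (height in cm, weight in kg) pairs.
points = [
    (150.0, 50.0), (180.0, 85.0),
    (152.0, 53.0), (178.0, 82.0),
    (155.0, 55.0), (182.0, 88.0),
]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=20):
    """Return (centres, labels) after Lloyd's iterations."""
    centres = list(points[:k])  # assumption: seed with the first k points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centre.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                dims = len(members[0])
                new_centres.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
                )
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged: centres no longer move
        centres = new_centres
    return centres, labels


centres, labels = kmeans(points, k=2)

if __name__ == "__main__":
    print(centres, labels)
```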