
Logistic Regression with Spark

Logistic regression in Spark is available through MLlib. Logistic regression returns binary class labels, i.e. “0” or “1”. In this example, we consider a data set consisting of a single feature, “study hours”, where the class label indicates whether the student passed (1) or failed (0).

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
 
sc = SparkContext()
 
def createLabeledPoints(label, points):
    return LabeledPoint(label, points)
 
studyHours = [
 [ 0, [0.5]],
 [ 0, [0.75]],
 [ 0, [1.0]],
 [ 0, [1.25]],
 [ 0, [1.5]],
 [ 0, [1.75]],
 [ 1, [1.75]],
 [ 0, [2.0]],
 [ 1, [2.25]],
 [ 0, [2.5]],
 [ 1, [2.75]],
 [ 0, [3.0]],
 [ 1, [3.25]],
 [ 0, [3.5]],
 [ 1, [4.0]],
 [ 1, [4.25]],
 [ 1, [4.5]],
 [ 1, [4.75]],
 [ 1, [5.0]],
 [ 1, [5.5]]
]
 
# Convert each (label, features) pair into a LabeledPoint
data = []

for label, features in studyHours:
    data.append(createLabeledPoints(label, features))
 
model = LogisticRegressionWithLBFGS.train(sc.parallelize(data))

print(model)

print(model.predict([1]))

Output:

$ spark-submit regression-mllib.py
(weights=[0.215546777333], intercept=0.0)
1
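To see where the predicted label “1” comes from, we can apply the logistic (sigmoid) function to the weights the model printed above. This is a minimal sketch in plain Python, assuming the reported weight (0.215546777333) and intercept (0.0); `predict_prob` is a hypothetical helper, not part of MLlib:

```python
import math

# Weight and intercept as reported by the trained model above
w, b = 0.215546777333, 0.0

def sigmoid(z):
    # Logistic function: maps any real score into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(hours):
    # Logistic regression scores an example with sigmoid(w * x + b)
    return sigmoid(w * hours + b)

p = predict_prob(1.0)       # probability of passing after 1 study hour
label = 1 if p > 0.5 else 0 # default threshold of 0.5 yields the class label
print(p, label)
```

Since the probability for 1 study hour is just above 0.5, the default threshold turns it into the label 1, matching the output of `model.predict([1])`. To have MLlib return probabilities instead of hard labels, call `model.clearThreshold()` before predicting.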

