Logistic Regression with Spark : Learn Data Science


Logistic regression with Spark is provided by MLlib. Logistic regression returns a binary class label, i.e. "0" or "1". In this example, we consider a data set with a single feature, "study hours", where the class label indicates whether the student passed (1) or failed (0).
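Conceptually, logistic regression passes a weighted sum of the features through the sigmoid function and thresholds the resulting probability. A minimal sketch in plain Python (independent of Spark; the function names here are illustrative, not part of MLlib):

```python
import math

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(weights, intercept, features, threshold=0.5):
    # Weighted sum of features plus intercept, squashed through the
    # sigmoid; label is 1 if the probability reaches the threshold.
    z = sum(w * x for w, x in zip(weights, features)) + intercept
    return 1 if sigmoid(z) >= threshold else 0
```

MLlib's `LogisticRegressionWithLBFGS` learns the weights and intercept from the training data; prediction then follows this rule.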

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

sc = SparkContext()

def createLabeledPoints(label, points):
    return LabeledPoint(label, points)

studyHours = [
 [ 0, [0.5]],
 [ 0, [0.75]],
 [ 0, [1.0]],
 [ 0, [1.25]],
 [ 0, [1.5]],
 [ 0, [1.75]],
 [ 1, [1.75]],
 [ 0, [2.0]],
 [ 1, [2.25]],
 [ 0, [2.5]],
 [ 1, [2.75]],
 [ 0, [3.0]],
 [ 1, [3.25]],
 [ 0, [3.5]],
 [ 1, [4.0]],
 [ 1, [4.25]],
 [ 1, [4.5]],
 [ 1, [4.75]],
 [ 1, [5.0]],
 [ 1, [5.5]]
]

data = []

for x, y in studyHours:
    data.append(createLabeledPoints(x, y))

model = LogisticRegressionWithLBFGS.train(sc.parallelize(data))

print(model)

print(model.predict([1]))

Output:

$ spark-submit regression-mllib.py
(weights=[0.215546777333], intercept=0.0)
1
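The prediction can be checked by hand against the reported weights. With weight 0.2155 and intercept 0.0, one hour of study gives a probability just above 0.5, so the model predicts class 1 (a quick sanity check, assuming the default 0.5 decision threshold):

```python
import math

# Weights reported by the trained model in the output above
w, b = 0.215546777333, 0.0
x = 1.0  # one hour of study

prob = 1.0 / (1.0 + math.exp(-(w * x + b)))  # P(pass | x)
label = 1 if prob >= 0.5 else 0

print(prob)   # ~0.554
print(label)  # 1, matching model.predict([1])
```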

