Logistic regression in Spark is available through MLlib. A binary logistic regression model returns a class label of “0” or “1”. In this example, the data set has a single feature, “study hours”, and the class label indicates whether the student passed (1) or did not pass (0).

    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS

    sc = SparkContext()

    def createLabeledPoints(label, points):
        # Wrap a class label and its feature list in an MLlib LabeledPoint
        return LabeledPoint(label, points)

    # Each entry is [label, [study hours]]: 1 = passed, 0 = did not pass
    studyHours = [
        [0, [0.5]], [0, [0.75]], [0, [1.0]], [0, [1.25]], [0, [1.5]],
        [0, [1.75]], [1, [1.75]], [0, [2.0]], [1, [2.25]], [0, [2.5]],
        [1, [2.75]], [0, [3.0]], [1, [3.25]], [0, [3.5]], [1, [4.0]],
        [1, [4.25]], [1, [4.5]], [1, [4.75]], [1, [5.0]], [1, [5.5]]
    ]

    data = []
    for label, hours in studyHours:
        data.append(createLabeledPoints(label, hours))

    # Train on the distributed data set
    model = LogisticRegressionWithLBFGS.train(sc.parallelize(data))

    print(model)
    # Predict the class for 1 hour of study
    print(model.predict([1]))

**Output:**

    $ spark-submit regression-mllib.py
    (weights=[0.215546777333], intercept=0.0)
    1
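To see why the model answers 1 for one hour of study, the prediction can be reproduced by hand from the printed weight and intercept. The following is a minimal sketch in plain Python, not MLlib code: the helper names `sigmoid`, `pass_probability` and `predict` are hypothetical, and the 0.5 decision threshold is an assumption about how the probability is turned into a label.

```python
import math

# Values copied from the printed model above
weight = 0.215546777333
intercept = 0.0

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def pass_probability(hours):
    # Estimated probability of class 1 (passed) for the given study hours
    return sigmoid(weight * hours + intercept)

def predict(hours, threshold=0.5):
    # Assumed decision rule: label 1 when the probability exceeds the threshold
    return 1 if pass_probability(hours) > threshold else 0

print(pass_probability(1.0))  # roughly 0.55
print(predict(1.0))           # 1, matching the output above
print(predict(0.5))           # also 1
```

Because the intercept is 0.0, the probability is above 0.5 for any positive number of study hours, so this particular model labels every student who studied at all as passing. The `train()` method accepts an `intercept` argument that is off by default; fitting with it enabled would likely give a more useful decision boundary for this data, though the resulting weights would differ from the output shown here.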
