# Logistic Regression with Spark : Learn Data Science

L

Logistic regression with Spark is achieved using MLlib. Logistic regression returns binary class labels that is “0” or “1”. In this example, we consider a data set that consists only one variable “study hours” and class label is whether the student passed (1) or not passed (0).

```from pyspark import SparkContext
from pyspark import SparkContext
import numpy as np
from numpy import array
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

sc = SparkContext ()

def createLabeledPoints(label, points):
return LabeledPoint(label, points)

studyHours = [
[ 0, [0.5]],
[ 0, [0.75]],
[ 0, [1.0]],
[ 0, [1.25]],
[ 0, [1.5]],
[ 0, [1.75]],
[ 1, [1.75]],
[ 0, [2.0]],
[ 1, [2.25]],
[ 0, [2.5]],
[ 1, [2.75]],
[ 0, [3.0]],
[ 1, [3.25]],
[ 0, [3.5]],
[ 1, [4.0]],
[ 1, [4.25]],
[ 1, [4.5]],
[ 1, [4.75]],
[ 1, [5.0]],
[ 1, [5.5]]
]

data = []

for x, y in studyHours:
data.append(createLabeledPoints(x, y))

model = LogisticRegressionWithLBFGS.train( sc.parallelize(data) )

print (model)

print (model.predict([1]))
```

Output:

```[email protected]:\$ spark-submit regression-mllib.py
(weights=[0.215546777333], intercept=0.0)
1
```

References:

### Add Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.