k-Means Clustering Spark Tutorial: Learn Data Science

k-Means clustering with Spark is easy to understand. MLlib comes bundled with a k-Means implementation (KMeans), which can be imported from the pyspark.mllib.clustering package. Here is a very simple example of clustering data with height and weight attributes.

Arguments to KMeans.train:

- k is the number of desired clusters
- maxIterations is the maximum number of iterations to run
- runs is the number of times to run the k-means algorithm
- initializationMode can be either 'random' or 'k-means||'

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans
from numpy import array

sc = SparkContext()
sc.setLogLevel("ERROR")

# 12 records with height, weight data
data = array([185,72,
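The original example breaks off after the first data values. Below is a minimal, self-contained sketch of how the clustering step might be completed; the 12 height/weight pairs, the choice of k=2, and maxIterations=10 are placeholder assumptions for illustration, not the tutorial's original values.

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans
from numpy import array

sc = SparkContext()
sc.setLogLevel("ERROR")

# 12 records with height, weight data (placeholder values)
data = array([[185, 72], [170, 56], [168, 60], [179, 68],
              [182, 72], [188, 77], [180, 71], [180, 70],
              [183, 84], [180, 88], [180, 67], [177, 76]])

# Distribute the records as an RDD of 2-element vectors
rdd = sc.parallelize(data)

# Train the model: 2 clusters, at most 10 iterations,
# centres initialised with the 'k-means||' scheme
model = KMeans.train(rdd, k=2, maxIterations=10,
                     initializationMode='k-means||')

# Inspect the learned cluster centres and assign one record to a cluster
print('Cluster centres:', model.clusterCenters)
print('Cluster of first record:', model.predict(data[0]))

# Within-set sum of squared errors, a rough measure of cluster tightness
print('WSSSE:', model.computeCost(rdd))

sc.stop()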

About the author

Devji Chhanga

I have taught computer science at the University of Kutch, the westernmost district of India, since 2011. At iDevji, I share tech stories that excite me. You will love reading the blog if you too believe in the disruptive power of technology. Some stories are purely technical, while others take an empathetic approach to problem solving using technology.
