k-means Clustering Algorithm with Python : Learn Data Science


The k-means clustering algorithm groups samples (items) into k clusters, where k is specified by the user. Each cluster is represented by its centroid, the mean of the samples assigned to it (hence the name k-means), and each sample is assigned to the cluster whose centroid is nearest. Euclidean distance is used as the distance measure. See the references for more information on the algorithm. This article describes a k-means clustering implementation in Python.
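To make the distance step concrete, here is a minimal sketch of the Euclidean distance and the nearest-centroid lookup. The function names `euclidean_distance` and `nearest_centroid` and the sample values are illustrative assumptions, not part of the implementation that follows.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(sample, centroids):
    """Return the 1-based ID of the centroid closest to the sample."""
    distances = [euclidean_distance(sample, c) for c in centroids]
    return distances.index(min(distances)) + 1

# Two hypothetical centroids in a two-variable space
centroids = [[1, 1], [5, 7]]
print(nearest_centroid([2, 1], centroids))  # 1
print(nearest_centroid([4, 6], centroids))  # 2
```

If two centroids are equally close, `index` returns the first one, so ties go to the lower cluster ID.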

About this implementation :

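  • The first k samples are used as the initial cluster centroids
  • Each sample is assigned once to its nearest initial centroid; the centroids are not recomputed afterwards
  • Cluster IDs start with 1
  • Final assignments are written to a file named assignment-results.txt
  • Final assignments are written in the format : [var1, var2 … varn] <tab> Cluster-ID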
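As an illustration of the output format, the following sketch reads one result line back into a sample and a cluster ID. The helper `parse_assignment_line` is hypothetical, and it assumes the sample is written as a Python list literal separated from the cluster ID by a tab.

```python
import ast

def parse_assignment_line(line):
    """Split one result line into (sample values, cluster ID)."""
    sample_text, cluster_text = line.split("\t")
    # literal_eval safely turns the text "[2, 1]" back into a list
    return ast.literal_eval(sample_text.strip()), int(cluster_text)

# Build a line the same way the implementation writes it, then parse it
line = "{} \t {}\n".format([2, 1], 1)
print(parse_assignment_line(line))  # ([2, 1], 1)
```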

Implementation :

import math

nsample = int(input("Number of Samples: "))
nvar = int(input("Number of Variables: "))
k = int(input("Number of Clusters: "))

# Input samples: one list of nvar values per sample
sampleList = [[0 for x in range(nvar)] for y in range(nsample)]
for sampleCount, sample in enumerate(sampleList):
	print("\n\nCollecting Data for Sample #{}:".format(sampleCount + 1))
	print("----------------------------------------")
	for i in range(nvar):
		sample[i] = int(input("Data for var-{} : ".format(i + 1)))

# First k samples are chosen as cluster centroids (copied, not aliased)
centroidList = [list(sampleList[i]) for i in range(k)]

# distanceList holds the Euclidean distance from the current sample
# to each of the k centroids
distanceList = [0.0 for x in range(k)]

# Open file for writing assignments; the with block closes it on exit
with open("assignment-results.txt", "w") as fileObject:
	for sample in sampleList:
		for n, centroid in enumerate(centroidList):
			total = 0
			for var in range(nvar):
				total += (sample[var] - centroid[var]) ** 2
			distanceList[n] = math.sqrt(total)
		# Write the sample and the 1-based ID of its nearest centroid
		fileObject.write("{} \t {}\n".format(sample, distanceList.index(min(distanceList)) + 1))

print("\n\n Final assignments successfully written to file! \n")
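
The listing above makes a single assignment pass against the initial centroids. In the standard k-means procedure, each centroid is then recomputed as the mean of its assigned samples, and the assignment is repeated until it stops changing. The following is a minimal sketch of that loop, not the author's code: the names `assign`, `update_centroids` and `kmeans`, the `max_iter` limit, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for illustration.

```python
import math

def assign(samples, centroids):
    """Return the 0-based index of the nearest centroid for each sample."""
    labels = []
    for s in samples:
        d = [math.sqrt(sum((x - y) ** 2 for x, y in zip(s, c))) for c in centroids]
        labels.append(d.index(min(d)))
    return labels

def update_centroids(samples, labels, centroids):
    """Recompute each centroid as the mean of the samples assigned to it."""
    new_centroids = []
    for idx, old in enumerate(centroids):
        members = [s for s, lab in zip(samples, labels) if lab == idx]
        if not members:
            # Assumption: a cluster with no members keeps its old centroid
            new_centroids.append(list(old))
            continue
        new_centroids.append([sum(col) / len(members) for col in zip(*members)])
    return new_centroids

def kmeans(samples, k, max_iter=100):
    # First k samples are the initial centroids, as in the listing above
    centroids = [list(s) for s in samples[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(samples, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        centroids = update_centroids(samples, labels, centroids)
    return labels, centroids

samples = [[1, 1], [9, 9], [1, 2], [8, 9]]
labels, centroids = kmeans(samples, k=2)
print([lab + 1 for lab in labels])  # 1-based cluster IDs: [1, 2, 1, 2]
print(centroids)                    # [[1.0, 1.5], [8.5, 9.0]]
```

With these four hypothetical samples, the assignments are already stable after the first update, so the loop stops on its second pass.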
References :

  1. K Means Clustering Algorithm: Explained

