The k-means clustering algorithm groups samples (items) into k clusters, where k is specified by the user. Each sample is assigned to the cluster whose centroid is nearest, and a centroid is the mean of the samples in its cluster, hence the name k-means clustering. Euclidean distance is used as the distance measure. See the references for more information on the algorithm. This article describes a simple implementation of the k-means clustering algorithm in Python.
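As a minimal sketch of the distance measure mentioned above, the function below computes the Euclidean distance between two samples of equal length. The name euclidean_distance is illustrative and does not appear in the article's listing.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences.

    Hypothetical helper for illustration only.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same number of variables")
    # Sum the squared differences per variable, then take the square root
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Worked example: a 3-4-5 right triangle
d = euclidean_distance([0, 0], [3, 4])
print(d)  # 5.0
```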

• The first k samples are used as the cluster centroids; the implementation below does not recompute them afterwards
• Final assignments are written to a file named assignment-results.txt
• Each assignment is written in the format: [var1, var2 … varn] <tab> Cluster-ID
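The conventions in the list above can be sketched in a few lines. The helper names initial_centroids and format_assignment are hypothetical, chosen only for illustration. The output line mirrors the format string used in the listing, which places a space on either side of the tab.

```python
def initial_centroids(samples, k):
    """Copy the first k samples to serve as starting centroids (hypothetical helper)."""
    if k > len(samples):
        raise ValueError("k cannot exceed the number of samples")
    # list(s) copies each sample so later changes do not alias the input
    return [list(s) for s in samples[:k]]

def format_assignment(sample, cluster_id):
    """Build one output line: the sample, a tab, then the 1-based cluster ID."""
    return "{} \t {}\n".format(sample, cluster_id)

samples = [[1, 1], [8, 9], [2, 1]]
centroids = initial_centroids(samples, 2)
line = format_assignment(samples[0], 1)
print(centroids)   # [[1, 1], [8, 9]]
print(repr(line))  # '[1, 1] \t 1\n'
```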

Implementation:

    import math

    nsample = int(input("Number of Samples: "))
    nvar = int(input("Number of Variables: "))
    k = int(input("Number of Clusters: "))

    sampleList = [[0 for x in range(nvar)] for y in range(nsample)]

    # Input samples
    for sampleCount, sample in enumerate(sampleList):
        print("\n\nCollecting Data for Sample #{}:".format(sampleCount + 1))
        print("----------------------------------------")
        for i in range(nvar):
            sample[i] = int(input("Data for var-{} : ".format(i + 1)))

    # First k samples are chosen as cluster centroids (copied, not aliased)
    centroidList = [list(sampleList[i]) for i in range(k)]

    # distanceList holds the Euclidean distance from the current sample
    # to each of the k centroids
    distanceList = [0.0 for x in range(k)]

    # Open file for writing assignments; it is closed automatically on exit
    with open("assignment-results.txt", "w") as fileObject:
        for sample in sampleList:
            for n, centroid in enumerate(centroidList):
                total = 0
                for var in range(nvar):
                    total += (sample[var] - centroid[var]) ** 2
                distanceList[n] = math.sqrt(total)
            # Write assignment: sample, tab, 1-based ID of the nearest centroid
            fileObject.write("{} \t {}\n".format(sample, distanceList.index(min(distanceList)) + 1))

    print("\n\n Final assignments successfully written to file! \n")
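The listing performs a single assignment pass against fixed centroids. The standard k-means procedure alternates that assignment step with an update step that moves each centroid to the mean of its members, repeating until the assignments stop changing. The following is a rough sketch of that loop, not the article's code: the function name kmeans, the max_iter parameter, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for illustration. Cluster indices here are 0-based, whereas the listing writes 1-based IDs.

```python
import math

def kmeans(samples, k, max_iter=100):
    """Iterative k-means sketch seeded with the first k samples (hypothetical helper)."""
    if k > len(samples):
        raise ValueError("k cannot exceed the number of samples")
    centroids = [[float(v) for v in s] for s in samples[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each sample
        new_assignments = []
        for s in samples:
            distances = [
                math.sqrt(sum((x - c) ** 2 for x, c in zip(s, centroid)))
                for centroid in centroids
            ]
            new_assignments.append(distances.index(min(distances)))
        if new_assignments == assignments:
            break  # assignments are stable, so the centroids would not move
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [s for s, a in zip(samples, assignments) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

samples = [[1, 1], [1, 2], [8, 8], [9, 8]]
assignments, centroids = kmeans(samples, 2)
print(assignments)  # [0, 0, 1, 1]
print(centroids)    # [[1.0, 1.5], [8.5, 8.0]]
```

In this example the first pass places three samples in the second cluster because both starting centroids sit close together; the update step then pulls the second centroid toward the far points, and the next pass separates the two groups.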
References: