iDevji

Posted 2017-11-11Updated 2023-08-22Big Data / Algorithms / Data Mining / Machine Learning2 minutes read (About 291 words)

k-means Clustering Algorithm with Python : Learn Data Science

_k_-means clustering algorithm is used to group samples (items) in k clusters; k is specified by the user. The method works by calculating mean distance between cluster centroids and samples, hence the name k-means clustering. Euclidean distance is used as distance measure. See references for more information on the algorithm. This is a article describes k-means Clustering Algorithm with Python.

About this implementation :

First k samples are assigned as cluster centroids
Cluster IDs start with 1
Final assignments are printed in the file named assignment-results.txt
Final assignments are printed

Implementation :

import math

nsample = int (input ("Number of Samples: "))
nvar = int (input ("Number of Variables: "))
k = int (input ("Number of Clusters: "))

sampleList = [[0 for x in range(nvar)] for y in range(nsample)]

#Input samples
sampleCount = 0
for sample in sampleList:
print ("\n\nCollecting Data for Sample #{}:".format(sampleCount+1))
print ("----------------------------------------")
i = 0
while i < nvar:
sample [i] = int (input ("Data for var-{} : ".format(i+1)))
i += 1

#First k samples are chosen as cluster centroids
centroidList = [[0 for x in range(nvar)] for y in range(k)]
i = 0
while i < k:
  j = 0
  while j < nvar:
    centroidList[i][j] = sampleList[i][j]
    j += 1
  i += 1

# distanceList maintains Euclidean distance of given sample
# for all clusters k
distanceList = [0.0 for x in range (k)]

#Open file for writing assignments
fileObject = open ("assignment-results.txt","w")

for sample in sampleList:
  n = 0
  for centroid in centroidList:
    var = 0
    total = 0
    while var < nvar:
      temp = (sample[var] - centroid[var]) ** 2 var += 1
      total += temp
    distanceList[n] = math.sqrt (total)
n += 1

#Write assignments to file
fileObject.write("{} \t {}\n".format(sample, distanceList.index(min(distanceList))+1))

#Close the file
fileObject.close()
print ("\n\n Final assignments successfully written to file! \n")

References :

K Means Clustering Algorithm: Explained

Links

Categories

Recents

Archives

Tags

Subscribe for updates