
The K-means algorithm falls under the clustering category.

Clustering is a method by which a large set of data is grouped into smaller sets (clusters) of similar data.

pal.JPG

After clustering, the balls above will be arranged as follows.

pal.JPG

K-means clustering algorithm

In this algorithm, we first decide how many clusters (K) we want.

We need to select K initial cluster instances in such a way that they are “farthest apart” from each other.
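One way to read “farthest apart” is a farthest-first heuristic, sketched below. This is an illustration only, not necessarily how PAL picks its starting instances. The function name `farthest_first_centers` and the choice to start from the first point are assumptions made for this example.

```python
import math

def farthest_first_centers(points, k):
    """Pick k starting centers that are spread far apart (assumes k <= len(points))."""
    centers = [points[0]]  # assumption: start from the first point in the list
    while len(centers) < k:
        # Choose the point whose distance to its nearest chosen center is largest.
        next_center = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centers),
        )
        centers.append(next_center)
    return centers

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (5.0, 9.0)]
print(farthest_first_centers(points, 3))
# prints [(0.0, 0.0), (5.0, 9.0), (10.0, 0.0)]
```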

Our aim is to partition the points into K groups such that the sum of squared distances from each point to its assigned cluster center is minimized.
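To make the quantity being minimized concrete, here is a small sketch that computes it for a given assignment. The function name and the sample points are made up for illustration, and Euclidean distance is assumed.

```python
def within_cluster_sum_of_squares(points, centers, assignment):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, cluster in zip(points, assignment):
        center = centers[cluster]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Hypothetical sample data: two tight groups and one center per group.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers = [(1.0, 1.5), (8.5, 8.0)]
assignment = [0, 0, 1, 1]  # index of the center each point belongs to

print(within_cluster_sum_of_squares(points, centers, assignment))
# prints 1.0
```

Assigning a point to the wrong group, for example `[0, 1, 0, 1]`, gives a much larger value, which is why the algorithm prefers the first assignment.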

This can be illustrated using Voronoi cells. From Wikipedia:

Suppose we want to estimate the number of customers of a given shop. With all else being equal (price, products, quality of service, etc.), it is reasonable to assume that customers choose their preferred shop simply by distance considerations: they will go to the shop located nearest to them.

http://upload.wikimedia.org/wikipedia/commons/8/89/2Ddim-L2norm-10site.png
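The shop example maps directly onto the assignment step of K-means: each customer belongs to the cell of the nearest shop. A minimal sketch, with invented shop and customer coordinates and a hypothetical helper `nearest_shop`:

```python
import math

def nearest_shop(customer, shops):
    """Return the index of the shop closest to the customer (Euclidean distance)."""
    return min(range(len(shops)), key=lambda i: math.dist(customer, shops[i]))

# Invented coordinates, for illustration only.
shops = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
customers = [(1.0, 1.0), (9.0, 2.0), (5.0, 6.0), (4.0, 1.0)]

# Estimate the number of customers per shop by counting nearest-shop choices.
counts = [0] * len(shops)
for customer in customers:
    counts[nearest_shop(customer, shops)] += 1

print(counts)
# prints [2, 1, 1]
```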

To perform K-means clustering programmatically, we use Lloyd’s algorithm in PAL, which uses an iterative refinement technique.

  1. Choose a value for K, the total number of clusters.
  2. Randomly choose K points as cluster centers.
  3. Assign the remaining instances to their closest cluster center.
  4. Calculate a new cluster center for each cluster.
  5. Repeat steps 3-4 until the cluster centers do not change.
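The steps above can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm, not PAL's actual implementation. The function name `lloyd_kmeans`, the `seed` and `max_iterations` parameters, and the handling of empty clusters are assumptions made for this example.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: random starting centers
    clusters = []
    for _ in range(max_iterations):  # safety cap (assumption, not in the steps)
        # Step 3: assign every point to its closest center.
        # Points chosen as centers are at distance 0 from themselves.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centers[i]))
            clusters[nearest].append(point)
        # Step 4: the new center of each cluster is the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[i])  # assumption: keep the old center if empty
        # Step 5: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Step 1: choose K. Hypothetical sample data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = lloyd_kmeans(points, k=2)
print(centers)
print(clusters)
```

Because the starting centers are random, different seeds can lead to different final clusters on less clearly separated data, so in practice the algorithm is often run several times.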

Organizations can thus divide their customers into segments based on variables such as demographics, customer behavior, customer profitability, and measure of risk.
