~/K-Medians

Brandon Rozek

Photo of Brandon Rozek

PhD Student @ RPI studying Automated Reasoning in AI and Linux Enthusiast.

This is a variation of k-means clustering where instead of calculating the mean for each cluster to determine its centroid we are going to calculate the median instead.

This has the effect of minimizing error over all the clusters with respect to the Manhattan norm as opposed to the Euclidean squared norm which is minimized in K-means

Algorithm

Given an initial set of $k$ medians, the algorithm proceeds by alternating between two steps.

Assignment step: Assign each observation to the cluster whose median has the leas Manhattan distance.

Update Step: Calculate the new medians to be the centroids of the observations in the new clusters

The algorithm is known to have converged when assignments no longer change. There is no guarantee that the optimum is found using this algorithm.

The result depends on the initial clusters. It is common to run this multiple times with different starting conditions.