K-means and K-medoids

K-means looks for \(K\) groups that minimize within-cluster variation:

\[ \min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{x_i \in C_k} \|x_i - \mu_k\|^2, \]

where \(\mu_k\) is the centroid of cluster \(k\).
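This objective is just a sum of squared distances, so it is easy to evaluate directly. A minimal sketch in NumPy (the function name `kmeans_objective` and the toy data are illustrative, not from any library):

```python
import numpy as np

def kmeans_objective(X, labels, centers):
    """Within-cluster sum of squared distances (the K-means objective)."""
    return sum(
        np.sum((X[labels == k] - centers[k]) ** 2)
        for k in range(len(centers))
    )

# two tight pairs of points, with each center at its pair's mean
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])
print(kmeans_objective(X, labels, centers))  # 1.0
```

Each point contributes its squared distance to its own center (here, 0.25 per point), so the total is 1.0.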

The algorithm alternates between:

  1. assigning each point to its nearest center,
  2. recomputing each center as the mean of its assigned points.

This loop repeats until assignments stabilize.
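The two alternating steps can be sketched in a few lines of NumPy. This is a teaching sketch, not a library implementation; the function name `kmeans`, the simple random initialization, and the stopping rule are all assumptions for illustration:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal sketch of Lloyd's algorithm with random initialization."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # 1. assign each point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 2. recompute each center as the mean of its assigned points
        centers_new = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(centers_new, centers):  # assignments have stabilized
            break
        centers = centers_new
    return labels, centers

# two well-separated blobs of three points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(X, K=2)
```

On data this well separated, the loop recovers the two blobs in a couple of iterations regardless of which points are drawn as initial centers.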

K-means is attractive because it is:

  • fast,
  • easy to implement,
  • and effective when clusters are roughly compact and spherical.

Its main limits are equally important:

  • it requires \(K\) in advance,
  • it is sensitive to outliers,
  • and it may converge to a poor local optimum, depending on initialization.

That is why multiple random starts or k-means++ initialization are common.
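k-means++ seeding picks the first center uniformly at random, then picks each subsequent center with probability proportional to its squared distance from the nearest center chosen so far, so initial centers tend to be spread apart. A sketch (the function name `kmeanspp_init` is an assumption, not a library API):

```python
import numpy as np

def kmeanspp_init(X, K, rng):
    """k-means++ seeding: sample centers proportional to squared distance
    from the nearest already-chosen center."""
    centers = [X[rng.integers(len(X))]]          # first center: uniform
    for _ in range(K - 1):
        # squared distance of each point to its nearest chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
C = kmeanspp_init(X, 2, rng)
```

Because an already-chosen point has squared distance zero to itself, it can never be picked again, and the second center is very likely to land in the other blob.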

K-medoids is similar in spirit, but instead of representing a cluster by its mean, it uses an actual data point as its center.

That makes it:

  • more robust to outliers,
  • compatible with more general distance measures,
  • but usually slower than K-means.

The classic algorithm is PAM (Partitioning Around Medoids), which repeatedly tests swaps between medoids and non-medoids and keeps any swap that reduces the clustering cost.
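The swap loop can be sketched over a precomputed distance matrix, which is what makes K-medoids compatible with arbitrary dissimilarities. This is a simplified, greedy version of PAM for illustration (the function name `pam` and the first-improvement swap strategy are assumptions):

```python
import numpy as np

def pam(D, K, seed=0):
    """Greedy PAM sketch: swap medoids with non-medoids while cost drops.
    D is a precomputed n x n dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = list(rng.choice(n, size=K, replace=False))

    def cost(ms):
        # each point contributes its distance to the nearest medoid
        return D[:, ms].min(axis=1).sum()

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(K):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                c = cost(trial)
                if c < best:
                    medoids, best = trial, c
                    improved = True
    return medoids, best

# four points on a line: two pairs, {0, 1} and {10, 11}
pts = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(pts[:, None] - pts[None, :])
medoids, best = pam(D, K=2)
```

Here each cluster's medoid is one of its own two points, and the total cost is 2.0 (each non-medoid point sits at distance 1 from its medoid). The full PAM algorithm evaluates swaps more carefully, but the cost-driven swap idea is the same.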

Selecting \(K\) is part of the problem, not an afterthought.

Common tools include:

  • the elbow method,
  • the silhouette score,
  • the gap statistic,
  • and information criteria for probabilistic models.
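As a concrete example of one of these tools, the silhouette score compares, for each point, its mean distance to its own cluster (\(a\)) with its mean distance to the nearest other cluster (\(b\)), via \((b - a)/\max(a, b)\); values near 1 indicate well-separated clusters. A minimal sketch (the function name `silhouette` is an assumption; it assumes every cluster has at least two points):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette score over all points.
    a = mean distance to own cluster (excluding self),
    b = smallest mean distance to any other cluster."""
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    scores = []
    for i, li in enumerate(labels):
        own = (labels == li) & (np.arange(len(X)) != i)
        a = D[i, own].mean()
        b = min(D[i, labels == k].mean() for k in set(labels) if k != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
print(silhouette(X, [0, 0, 1, 1]))  # close to 1: clusters match the blobs
print(silhouette(X, [0, 1, 0, 1]))  # negative: points sit nearer the other cluster
```

In practice one computes this score for each candidate \(K\) and favors the value of \(K\) with the highest score, alongside the other criteria above.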

No rule is perfect. Good practice combines quantitative criteria with domain judgment.

In this lesson we covered:

  1. The K-means objective function
  2. The assignment-update loop behind the algorithm
  3. Why K-means is fast but sensitive to outliers and initialization
  4. How K-medoids replaces centroids with representative data points
  5. Why choosing \(K\) requires both metrics and interpretation

Next: We will soften the hard cluster assignments of K-means with Gaussian mixture models and EM.