K-means and K-medoids
K-means Objective
K-means looks for \(K\) groups that minimize within-cluster variation:
\[ \min \sum_{k=1}^{K} \sum_{x_i \in C_k} \|x_i - \mu_k\|^2, \] where \(\mu_k\) is the centroid of cluster \(k\).
The algorithm alternates between:
- assigning each point to its nearest center,
- recomputing each center as the mean of its assigned points.
This loop repeats until assignments stabilize.
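The assignment-update loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans` and its parameters are our own, initialization is plain random sampling, and empty clusters are not handled (a real implementation would re-seed them).

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        # (Empty clusters are not handled in this sketch.)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers
```

On well-separated data the loop typically stabilizes in a handful of iterations, long before `n_iter` is reached.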
Why K-means Is Popular
K-means is attractive because it is:
- fast,
- easy to implement,
- and effective when clusters are roughly compact and spherical.
Its main limits are equally important:
- it requires \(K\) to be specified in advance,
- it is sensitive to outliers, since squared distances let extreme points drag the means,
- and it converges only to a local optimum, so the result depends on initialization.
That is why multiple random starts or k-means++ initialization are common.
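The k-means++ seeding rule can be sketched compactly: pick the first center uniformly at random, then sample each subsequent center with probability proportional to its squared distance from the nearest center chosen so far. The function name `kmeanspp_init` here is illustrative.

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++ seeding: spread initial centers by sampling points
    with probability proportional to squared distance from chosen centers."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        # Far-away points are proportionally more likely to be picked next.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

Because distant points are favored, the seeds tend to land in different clusters, which makes a good local optimum far more likely than uniform random initialization.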
K-medoids (PAM)
K-medoids is similar in spirit, but instead of representing a cluster by its mean, it uses an actual data point as its center.
That makes it:
- more robust to outliers,
- compatible with more general distance measures,
- but usually slower than K-means.
The most classical algorithm is PAM (Partitioning Around Medoids), which repeatedly tests swaps between medoids and non-medoids to reduce clustering cost.
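The swap search in PAM can be sketched on a precomputed distance matrix. This brute-force version (the function name `pam` and its naive first-\(k\)-points initialization are our own choices) tries every medoid/non-medoid swap and keeps the best improvement; real implementations evaluate swaps much more cleverly.

```python
import numpy as np
from itertools import product

def pam(D, k, max_iter=100):
    """Tiny PAM sketch on a precomputed n-by-n distance matrix D.
    Cost = sum of each point's distance to its nearest medoid."""
    n = len(D)
    medoids = list(range(k))  # naive initialization: first k points
    cost = D[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        best = None
        # Try every (medoid, non-medoid) swap; keep the largest improvement.
        for m, x in product(list(medoids), range(n)):
            if x in medoids:
                continue
            cand = [x if c == m else c for c in medoids]
            cand_cost = D[:, cand].min(axis=1).sum()
            if cand_cost < cost:
                cost, best = cand_cost, cand
        if best is None:
            break  # no swap improves the cost: local optimum reached
        medoids = best
    return medoids, cost
```

Note that only a distance matrix is required, which is why K-medoids works with arbitrary dissimilarities, not just Euclidean distance.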
Choosing the Number of Clusters
Selecting \(K\) is part of the problem, not an afterthought.
Common tools include:
- the elbow method,
- the silhouette score,
- the gap statistic,
- and information criteria for probabilistic models.
No rule is perfect. Good practice combines one or more quantitative criteria with domain judgment.
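As one concrete example, the silhouette score can be computed directly from pairwise distances: for each point, compare its mean intra-cluster distance \(a\) with its mean distance \(b\) to the nearest other cluster, via \((b - a)/\max(a, b)\). This small sketch (the function name `silhouette` is our own) follows that definition.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette: (b - a) / max(a, b) per point, where a is the mean
    intra-cluster distance and b the mean distance to the nearest other cluster."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    vals = []
    for i, li in enumerate(labels):
        same = labels == li
        if same.sum() == 1:
            vals.append(0.0)  # convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # exclude the point itself
        b = min(D[i, labels == lj].mean() for lj in set(labels) if lj != li)
        vals.append((b - a) / max(a, b))
    return float(np.mean(vals))
```

Scores close to 1 indicate tight, well-separated clusters; values near 0 or below suggest overlapping clusters or a poorly chosen \(K\).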
Summary
In this lesson we covered:
- The K-means objective function
- The assignment-update loop behind the algorithm
- Why K-means is fast but sensitive to outliers and initialization
- How K-medoids replaces centroids with representative data points
- Why choosing \(K\) requires both metrics and interpretation
Next: We will soften the hard cluster assignments of K-means with Gaussian mixture models and EM.