# Introduction and Hierarchical Clustering
## What Clustering Tries to Do
Clustering is an unsupervised learning task. Instead of predicting known labels, we try to discover structure that is already present in the data.
The basic goal is simple:
- points inside a cluster should be similar to one another,
- points from different clusters should be less similar.
This makes clustering useful for exploration, compression, segmentation, and pattern discovery.
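The goal above can be checked numerically: on well-separated data, the average distance within a group is much smaller than the average distance between groups. A minimal sketch, using made-up Gaussian blobs:

```python
# A toy check of the clustering goal: points in the same group should be
# closer to one another than to points in a different group.
# The two blobs below are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))  # tight group near (0, 0)
cluster_b = rng.normal(loc=5.0, scale=0.5, size=(20, 2))  # tight group near (5, 5)

def mean_pairwise_distance(x, y):
    """Average Euclidean distance between every point in x and every point in y."""
    diffs = x[:, None, :] - y[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

within = mean_pairwise_distance(cluster_a, cluster_a)
between = mean_pairwise_distance(cluster_a, cluster_b)
print(within, between)  # within-cluster distances are much smaller
```

Every clustering algorithm in this section is, in some form, trying to find group assignments that make this gap large.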
## Common Families of Clustering
The methods in this section fall into four broad families:
- hierarchical methods,
- partition-based methods such as K-means,
- probabilistic methods such as Gaussian mixtures,
- density-based methods such as DBSCAN and OPTICS.
Each family makes different assumptions about shape, scale, and what a "cluster" should mean.
## Hierarchical Clustering
Hierarchical clustering builds a nested sequence of groups.
- In the agglomerative version, each point starts alone and clusters are merged step by step.
- In the divisive version, all points start together and the data is split recursively.
The result is a dendrogram, which lets us inspect cluster structure at multiple resolutions.
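A minimal sketch of the agglomerative version using SciPy (assuming `scipy` is available; the toy data is invented for illustration). `linkage` records the full merge history, which is exactly the dendrogram in matrix form, and `fcluster` cuts the tree into flat clusters at a chosen resolution:

```python
# Agglomerative clustering: every point starts as its own cluster and
# the two closest clusters are merged step by step.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # one tight group
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],   # another tight group
])

# Each row of the linkage matrix records which two clusters merged
# and at what distance -- the dendrogram in tabular form.
merges = linkage(points, method="average")

# Cutting the tree into 2 flat clusters recovers the two groups.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)
```

Passing `merges` to `scipy.cluster.hierarchy.dendrogram` draws the tree, which is the usual way to inspect the structure at multiple resolutions before committing to a cut.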
## Distance Measures and Linkage Criteria
To run hierarchical clustering, we need:
- a distance between individual observations,
- a rule for the distance between clusters.
Common linkage choices are:
| Linkage | Main idea | Typical behavior |
|---|---|---|
| Single | nearest pair across clusters | can create long chained clusters |
| Complete | farthest pair across clusters | favors compact groups |
| Average | average pairwise distance | balanced compromise |
| Ward | smallest increase in within-cluster variance | often produces compact spherical groups |
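The chaining behavior of single linkage in the table can be seen directly. On a chain of evenly spaced points, single linkage always finds a nearest pair at the same small distance and merges its way along the chain, while complete linkage measures cluster diameters, which keep growing. A small sketch (assuming `scipy`; the 1-D chain is invented for illustration):

```python
# Single vs. complete linkage on a chain of evenly spaced points.
import numpy as np
from scipy.cluster.hierarchy import linkage

chain = np.arange(10, dtype=float).reshape(-1, 1)  # points at 0, 1, ..., 9 on a line

single = linkage(chain, method="single")      # distance = nearest pair across clusters
complete = linkage(chain, method="complete")  # distance = farthest pair across clusters

# Column 2 of the linkage matrix holds the merge distances.
print(single[:, 2])    # every single-linkage merge happens at distance 1.0 (chaining)
print(complete[:, 2])  # complete-linkage merge distances grow toward the chain's length
```

This is why single linkage can absorb elongated or bridged structures into one long cluster, while complete linkage resists and favors compact groups.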
## Complexity and When to Use It
Hierarchical clustering is especially useful when:
- you want an exploratory view of the data,
- you do not want to commit to the number of clusters immediately,
- or you want a visual structure rather than just labels.
Its main drawback is scalability. Standard agglomerative algorithms need all \(n(n-1)/2\) pairwise distances, so memory grows as \(O(n^2)\) and runtime at least as fast, which makes hierarchical clustering usually unsuitable for very large \(n\).
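The quadratic cost is easy to make concrete with a back-of-the-envelope calculation (a sketch; the helper function below is illustrative, not part of any library):

```python
# Why all-pairs distances get expensive: a condensed distance matrix
# stores n * (n - 1) / 2 values.
def pairwise_storage_gb(n, bytes_per_value=8):
    """Memory needed to hold all pairwise distances as 64-bit floats, in GB."""
    return n * (n - 1) / 2 * bytes_per_value / 1e9

print(pairwise_storage_gb(1_000))    # ~0.004 GB: trivially fine
print(pairwise_storage_gb(100_000))  # ~40 GB: usually not fine
```

Going from a thousand to a hundred thousand points multiplies the storage by roughly ten thousand, which is why hierarchical clustering is often run on a sample or on pre-aggregated data when \(n\) is large.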
## Summary
In this lesson we covered:
- What clustering tries to discover in unlabeled data
- The main families of clustering methods
- Agglomerative and divisive hierarchical clustering
- How dendrograms summarize nested cluster structure
- Why linkage choice changes the final result
- Why hierarchical clustering is powerful but not always scalable
Next: We will move to partition-based methods with K-means and K-medoids.