Introduction and Hierarchical Clustering

Clustering is an unsupervised learning task. Instead of predicting known labels, we try to discover structure that is already present in the data.

The basic goal is simple:

  • points inside a cluster should be similar to one another,
  • points from different clusters should be less similar.

This makes clustering useful for exploration, compression, segmentation, and pattern discovery.

The methods in this section fall into four broad families:

  • hierarchical methods,
  • partition-based methods such as K-means,
  • probabilistic methods such as Gaussian mixtures,
  • density-based methods such as DBSCAN and OPTICS.

Each family makes different assumptions about shape, scale, and what a "cluster" should mean.

Hierarchical clustering builds a nested sequence of groups.

  • In the agglomerative version, each point starts alone and clusters are merged step by step.
  • In the divisive version, all points start in a single cluster, which is split recursively.

The result is a dendrogram, which lets us inspect cluster structure at multiple resolutions.
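The merge-and-cut workflow can be sketched with SciPy; the toy data and seed below are illustrative, not from the lesson:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated 2-D blobs of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)),
               rng.normal(5, 0.3, (10, 2))])

# Agglomerative clustering: each point starts alone; 'ward' repeatedly
# merges the pair of clusters whose union increases variance the least.
Z = linkage(X, method="ward")  # Z encodes the full merge tree

# Instead of reading the dendrogram by eye, cut the tree into 2 clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # one label for the first blob, another for the second
```

In a notebook, `scipy.cluster.hierarchy.dendrogram(Z)` draws the same tree, letting you inspect the structure at every resolution before choosing a cut.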

To run hierarchical clustering, we need:

  1. a distance between individual observations,
  2. a rule for the distance between clusters.

Common linkage choices are:

  Linkage    Main idea                                      Typical behavior
  Single     nearest pair across clusters                   can create long chained clusters
  Complete   farthest pair across clusters                  favors compact groups
  Average    average pairwise distance                      balanced compromise
  Ward       smallest increase in within-cluster variance   often produces compact, spherical groups
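The differences between these rules can be made concrete with a small experiment. On a hypothetical 1-D "chain" of equally spaced points plus one far point, each linkage judges the final merge at a different height:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: a chain of equally spaced points plus one far point.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    # Column 2 of the linkage matrix holds merge distances; the last
    # row is the final merge, joining the chain {0,1,2,3} with 10.
    print(f"{method:>8}: final merge at distance {Z[-1, 2]:.2f}")

# single sees only the nearest cross-pair (7.0), complete the
# farthest (10.0), average the mean of all cross-pairs (8.5).
```

This is the chaining effect in miniature: single linkage happily absorbs a distant point through its nearest neighbor, while complete linkage resists it.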

Hierarchical clustering is especially useful when:

  • you want an exploratory view of the data,
  • you do not want to commit to the number of clusters immediately,
  • or you want a visual structure rather than just labels.

Its main drawback is scalability. Computing and storing all pairwise distances takes at least \(O(n^2)\) time and memory (and some linkage implementations cost more), so hierarchical clustering is usually less suitable for very large \(n\).
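A quick back-of-the-envelope sketch shows how fast the pairwise-distance matrix grows:

```python
# The condensed distance matrix stores n*(n-1)/2 entries, so memory
# (and the time to fill it) grows roughly quadratically with n.
for n in (1_000, 10_000, 100_000):
    pairs = n * (n - 1) // 2
    gb = pairs * 8 / 1e9  # float64 = 8 bytes per distance
    print(f"n={n:>7,}: {pairs:>13,} distances (~{gb:.1f} GB)")
```

At n = 100,000 the distances alone approach 40 GB, which is why partition-based methods are usually preferred at that scale.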

In this lesson we covered:

  1. What clustering tries to discover in unlabeled data
  2. The main families of clustering methods
  3. Agglomerative and divisive hierarchical clustering
  4. How dendrograms summarize nested cluster structure
  5. Why linkage choice changes the final result
  6. Why hierarchical clustering is powerful but not always scalable

Next: We will move to partition-based methods with K-means and K-medoids.