Gaussian Mixtures and the EM Algorithm

K-means assigns each point to exactly one cluster. Gaussian mixture models (GMMs) take a softer view: a point can belong to several components with different probabilities.

A GMM assumes that the data distribution is a mixture of \(K\) Gaussian components:

\[ p(x) = \sum_{j=1}^{K} \tau_j \mathcal{N}(x; \mu_j, \Sigma_j), \]

where:

  • \(\tau_j\) is the weight of component \(j\),
  • \(\mu_j\) is its mean,
  • \(\Sigma_j\) is its covariance matrix.
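The mixture density above can be evaluated directly once the parameters are fixed. The following sketch, assuming SciPy is available, uses made-up weights, means, and standard deviations for a hypothetical two-component 1-D mixture:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for a 2-component 1-D GMM (illustrative values only).
tau = np.array([0.4, 0.6])     # component weights tau_j, summing to 1
mu = np.array([-1.0, 2.0])     # component means mu_j
sigma = np.array([0.5, 1.0])   # component standard deviations (1-D Sigma_j)

def mixture_pdf(x):
    # p(x) = sum_j tau_j * N(x; mu_j, sigma_j^2)
    return sum(t * norm.pdf(x, loc=m, scale=s)
               for t, m, s in zip(tau, mu, sigma))

print(mixture_pdf(0.0))
```

Because the weights sum to 1 and each component is a proper density, the mixture integrates to 1 as well.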

Because the component responsible for each observation is unknown, GMMs are usually fit with the Expectation-Maximization (EM) algorithm.

E step (expectation): compute the responsibility of each component for each point, i.e. the posterior probability that component \(j\) generated observation \(x_i\). By Bayes' rule,

\[ T_{j,i} = P(Z_i = j \mid X_i = x_i) = \frac{\tau_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \tau_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}. \]

This gives a soft assignment rather than a hard label.

M step (maximization): update the component weights \(\tau_j\), means \(\mu_j\), and covariances \(\Sigma_j\) using those responsibilities as weights.

The two steps alternate until the log-likelihood stops improving meaningfully.
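The EM loop above can be sketched with scikit-learn's `GaussianMixture`, which runs exactly this alternation internally. The synthetic data below is illustrative; `predict_proba` returns the E-step responsibilities as soft assignments:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data: two well-separated clusters (200 and 300 points).
x = np.concatenate([rng.normal(-2.0, 0.5, 200),
                    rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

# n_init restarts EM from several initializations to reduce the risk
# of converging to a poor local optimum.
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(x)

# Responsibilities: one row per point, one column per component,
# each row summing to 1 (soft assignments, not hard labels).
resp = gmm.predict_proba(x)
print(gmm.weights_)
print(resp[:3])
```

With well-separated clusters the fitted weights land near the true mixing proportions (0.4 and 0.6 here), and most responsibilities are close to 0 or 1.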

Because GMMs are probabilistic models, criteria such as AIC and BIC are natural tools for choosing \(K\).

In general:

  • lower AIC or BIC means a better trade-off between fit and complexity,
  • BIC usually penalizes complexity more heavily than AIC.
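A typical model-selection loop fits the mixture for a range of candidate \(K\) values and keeps the one with the lowest criterion. A minimal sketch, assuming scikit-learn and synthetic two-cluster data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic 1-D data drawn from two well-separated Gaussians.
x = np.concatenate([rng.normal(-3.0, 0.6, 250),
                    rng.normal(2.0, 0.8, 250)]).reshape(-1, 1)

# Fit a GMM for each candidate K and record both criteria.
scores = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(x)
    scores[k] = (gmm.aic(x), gmm.bic(x))

best_k = min(scores, key=lambda k: scores[k][1])  # lowest BIC wins
print(best_k, scores[best_k])
```

On data actually drawn from two components, BIC typically selects \(K = 2\); AIC's lighter penalty occasionally favors a slightly larger \(K\).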

GMMs are a strong choice when:

  • clusters are not well represented by the convex Voronoi regions that K-means induces,
  • soft membership is meaningful,
  • or cluster covariances differ across groups.

Their main drawbacks are:

  • sensitivity to initialization,
  • convergence to local optima,
  • and the need to choose the number of components.

In this lesson we covered:

  1. Why Gaussian mixtures provide soft cluster membership
  2. The mixture-model form of a GMM
  3. The logic of the EM algorithm
  4. How responsibilities differ from hard assignments
  5. Why AIC and BIC help choose the number of components
  6. When GMMs are more flexible than K-means

Next: We will finish with density-based methods that can detect irregular shapes and outliers.