Gaussian Mixtures and the EM Algorithm

K-means assigns each point to exactly one cluster. Gaussian mixture models (GMMs) take a softer view: a point can belong to several components with different probabilities.

A GMM assumes that the data distribution is a mixture of \(K\) Gaussian components:

\[ p(x) = \sum_{j=1}^{K} \tau_j \mathcal{N}(x; \mu_j, \Sigma_j), \]

where:

  • \(\tau_j\) is the weight of component \(j\),
  • \(\mu_j\) is its mean,
  • \(\Sigma_j\) is its covariance matrix.
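The mixture density above can be evaluated directly once the parameters are fixed. The following sketch, assuming SciPy is available, uses made-up weights, means, and standard deviations for a hypothetical two-component 1-D mixture:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for a 2-component 1-D GMM (illustrative values only).
tau = np.array([0.4, 0.6])     # component weights tau_j, summing to 1
mu = np.array([-1.0, 2.0])     # component means mu_j
sigma = np.array([0.5, 1.0])   # component standard deviations (1-D Sigma_j)

def mixture_pdf(x):
    # p(x) = sum_j tau_j * N(x; mu_j, sigma_j^2)
    return sum(t * norm.pdf(x, loc=m, scale=s)
               for t, m, s in zip(tau, mu, sigma))

print(mixture_pdf(0.0))
```

Because the weights sum to 1 and each component is a proper density, the mixture integrates to 1 as well.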

Because the component responsible for each observation is unknown, GMMs are usually fit with the Expectation-Maximization (EM) algorithm.

E step (expectation): compute the responsibility of each component for each point, i.e. the posterior probability that component \(j\) generated observation \(x_i\). By Bayes' rule,

\[ T_{j,i} = P(Z_i = j \mid X_i = x_i) = \frac{\tau_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \tau_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}. \]

This gives a soft assignment rather than a hard label.

M step (maximization): update the component weights \(\tau_j\), means \(\mu_j\), and covariances \(\Sigma_j\) using those responsibilities as weights.

The two steps alternate until the log-likelihood stops improving meaningfully.
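The EM loop above can be sketched with scikit-learn's `GaussianMixture`, which runs exactly this alternation internally. The synthetic data below is illustrative; `predict_proba` returns the E-step responsibilities as soft assignments:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data: two well-separated clusters (200 and 300 points).
x = np.concatenate([rng.normal(-2.0, 0.5, 200),
                    rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

# n_init restarts EM from several initializations to reduce the risk
# of converging to a poor local optimum.
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(x)

# Responsibilities: one row per point, one column per component,
# each row summing to 1 (soft assignments, not hard labels).
resp = gmm.predict_proba(x)
print(gmm.weights_)
print(resp[:3])
```

With well-separated clusters the fitted weights land near the true mixing proportions (0.4 and 0.6 here), and most responsibilities are close to 0 or 1.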

Because GMMs are probabilistic models, criteria such as AIC and BIC are natural tools for choosing \(K\).

In general:

  • lower AIC or BIC means a better trade-off between fit and complexity,
  • BIC usually penalizes complexity more heavily than AIC.
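A typical model-selection loop fits the mixture for a range of candidate \(K\) values and keeps the one with the lowest criterion. A minimal sketch, assuming scikit-learn and synthetic two-cluster data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic 1-D data drawn from two well-separated Gaussians.
x = np.concatenate([rng.normal(-3.0, 0.6, 250),
                    rng.normal(2.0, 0.8, 250)]).reshape(-1, 1)

# Fit a GMM for each candidate K and record both criteria.
scores = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(x)
    scores[k] = (gmm.aic(x), gmm.bic(x))

best_k = min(scores, key=lambda k: scores[k][1])  # lowest BIC wins
print(best_k, scores[best_k])
```

On data actually drawn from two components, BIC typically selects \(K = 2\); AIC's lighter penalty occasionally favors a slightly larger \(K\).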

GMMs are a strong choice when:

  • clusters are not well represented by the convex Voronoi regions that K-means induces,
  • soft membership is meaningful,
  • or cluster covariances differ across groups.

Their main drawbacks are:

  • sensitivity to initialization,
  • convergence to local optima,
  • and the need to choose the number of components.

In this lesson we covered:

  1. Why Gaussian mixtures provide soft cluster membership
  2. The mixture-model form of a GMM
  3. The logic of the EM algorithm
  4. How responsibilities differ from hard assignments
  5. Why AIC and BIC help choose the number of components
  6. When GMMs are more flexible than K-means

Next: We will finish with density-based methods that can detect irregular shapes and outliers.