Taxonomy of Explainability Methods

The lecture organizes explainability methods along three simple axes. This is a helpful way to avoid mixing techniques that answer very different questions.

Axis        First side        Second side       Main question
Timing      Ante hoc          Post-hoc          Is the model interpretable by design, or explained after training?
Scope       Global            Local             Do we want to understand the whole model or one prediction?
Dependence  Model-specific    Model-agnostic    Does the method need internal access to the model?

The accompanying slide is worth keeping because it compresses the full taxonomy into one visual map; below we discuss each axis separately.

Ante Hoc vs Post-Hoc

  • Ante hoc methods are interpretable by construction. Decision trees and generalized linear models are the main examples in this course.
  • Post-hoc methods are applied after a model has already been trained. They aim to summarize or approximate the behavior of a more complex predictor.

Ante hoc explanations are usually cleaner because the model itself is transparent. Post-hoc explanations are more flexible, but they are often approximate.
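
To make "interpretable by construction" concrete, here is a minimal sketch of an ante hoc model: a hand-written decision tree whose prediction path is itself the explanation. The feature names and thresholds are illustrative assumptions, not examples from the course.

```python
# An ante hoc model: this decision tree IS its own explanation.
# Features and thresholds are toy assumptions for illustration.

def approve_loan(income, debt_ratio):
    """Toy decision tree; every branch is a human-readable rule."""
    if income >= 50_000:
        if debt_ratio < 0.4:
            return "approve"
        return "review"
    return "deny"

# Reading the path taken gives the explanation directly:
print(approve_loan(60_000, 0.3))  # approve: income >= 50k and debt_ratio < 0.4
print(approve_loan(30_000, 0.1))  # deny: income < 50k
```

No separate explanation step is needed here; the transparency comes from the model class itself, which is exactly what post-hoc methods give up in exchange for flexibility.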

Global vs Local

  • A global explanation tells us how the model behaves overall.
  • A local explanation tells us why one particular input received one particular prediction.

These are complementary rather than competing views. A model can look sensible globally while still failing on specific edge cases, and a convincing local explanation does not guarantee that the entire model is well-behaved.
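
A local explanation can be sketched as a simple counterfactual search: the smallest change to one feature that flips the prediction for one specific input. The model, the feature searched, and the step size below are all toy assumptions, not course code.

```python
# Local post-hoc sketch: a counterfactual for one input.
# The model and search parameters are illustrative assumptions.

def model(income, debt_ratio):
    return "approve" if income >= 50_000 and debt_ratio < 0.4 else "deny"

def counterfactual_income(income, debt_ratio, step=1_000, max_steps=100):
    """Increase income until the prediction flips; return the needed income."""
    if model(income, debt_ratio) == "approve":
        return income  # already approved, no change needed
    for k in range(1, max_steps + 1):
        if model(income + k * step, debt_ratio) == "approve":
            return income + k * step
    return None  # no flip found within the search budget

# Local explanation for one applicant: "you would be approved at 50,000".
print(counterfactual_income(45_000, 0.3))  # 50000
```

Note how local this statement is: it says nothing about how the model treats other applicants, which is precisely the gap a global method would have to fill.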

Model-Specific vs Model-Agnostic

  • Model-specific methods use internal structure such as trees, gradients, or hidden-layer activations.
  • Model-agnostic methods treat the predictor as a black box and only rely on inputs and outputs.

Model-agnostic tools are portable, but they can be slower or less exact. Model-specific tools can be sharper, but only for one model family.
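
The model-agnostic idea can be sketched with permutation importance: we only ever call the predictor, never inspect it. The data and black box below are toy assumptions, and for determinism this sketch reverses the feature column instead of shuffling it randomly and averaging over repeats, as real implementations do.

```python
# Model-agnostic sketch: permutation importance via inputs/outputs only.
# Toy data and black box; column reversal stands in for random shuffling.

def predict(row):
    # Black box: in truth it depends only on feature 0.
    return 1 if row[0] > 0.5 else 0

def accuracy(rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature):
    """Accuracy drop when one feature column is permuted (here: reversed)."""
    column = [r[feature] for r in rows][::-1]
    permuted = [r[:feature] + [v] + r[feature + 1:]
                for r, v in zip(rows, column)]
    return accuracy(rows, labels) - accuracy(permuted, labels)

rows = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
labels = [1, 0, 1, 0]
print(permutation_importance(rows, labels, 0))  # 1.0: feature 0 drives every prediction
print(permutation_importance(rows, labels, 1))  # 0.0: feature 1 is ignored
```

Because the method only needs `predict`, the same code would work unchanged for a tree, a neural network, or any other predictor, which is the portability the bullet point above describes.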

The categories become easier to remember once we attach a few examples to them.

Category          Typical methods from this course
Ante hoc          Decision trees, GLMs
Global post-hoc   Permutation importance, PDP, LOFO, surrogate models
Local post-hoc    LIME, SHAP, counterfactuals, anchors
Model-specific    Tree importance, TCAV, Grad-CAM, LRP
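
One of the global post-hoc entries, the surrogate model, can be sketched in a few lines: fit a simple interpretable model (here a one-split "stump") to the predictions of a black box over a sample of inputs. The black box and the candidate thresholds are toy assumptions for illustration.

```python
# Global post-hoc sketch: a one-split surrogate for a black box.
# The black box and the input sample are illustrative assumptions.

def black_box(x):
    # Pretend this is a complex model; it secretly hides a simple rule.
    return 1 if x > 0.37 else 0

def fit_stump(xs, ys):
    """Pick the threshold whose rule 1[x > t] best matches the labels ys."""
    best_t, best_acc = None, -1.0
    for t in sorted(xs):
        acc = sum((1 if x > t else 0) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

xs = [i / 100 for i in range(100)]   # sample of inputs
ys = [black_box(x) for x in xs]      # black-box predictions as labels
t, acc = fit_stump(xs, ys)
print(f"surrogate rule: predict 1 if x > {t:.2f} (fidelity {acc:.2f})")
```

The fidelity score is the crucial caveat: the surrogate explains the black box only as well as it imitates it, which is why post-hoc explanations of this kind are often approximate.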

Before applying any method, it helps to ask:

  1. Do I need to explain the full model or a single decision?
  2. Can I access the internals of the model?
  3. Do I need a faithful explanation, a simple approximation, or both?

Once these questions are clear, method selection becomes much easier.

In this lesson we classified explanations along three axes:

  1. Ante hoc vs post-hoc
  2. Global vs local
  3. Model-specific vs model-agnostic

Each axis comes with a practical question that guides method selection.

Next: We move to intrinsically interpretable models, where the explanation is built into the model itself.