Model-Specific Explainability

Model-specific methods take advantage of what the model is made of. Instead of treating the predictor as a pure black box, they use tree splits, gradients, activations, or relevance propagation rules to produce sharper explanations.

For tree ensembles, the lecture highlights two common families of importance scores.

  • Gini importance in random forests: measures how much each feature reduces impurity, summed across all trees in the forest.
  • Gradient-boosted tree importance: use gain, split frequency, or cover to summarize how often and how usefully a feature is used.

A common generic form is

\[ I_j = \sum_{t \in T_j} p(t)\,\Delta i(t), \]

where \(T_j\) is the set of tree nodes split on feature \(j\), \(p(t)\) is the probability of reaching node \(t\), and \(\Delta i(t)\) is the impurity decrease there.

These tools are fast and convenient, but they can be biased toward high-cardinality variables or features that create many plausible split points.
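The generic formula above can be traced by hand on a tiny tree. The node statistics below are made-up numbers for illustration; in practice they would come from a fitted ensemble (e.g. scikit-learn exposes the result as `feature_importances_`).

```python
# Toy illustration of I_j = sum over nodes t split on feature j of p(t) * delta_i(t).
# Each internal node records: feature index, samples reaching it, its impurity,
# and the (samples, impurity) of its left and right children. All numbers are
# hypothetical.
nodes = [
    # (feature, n_t, i_t, n_left, i_left, n_right, i_right)
    (0, 100, 0.50, 60, 0.30, 40, 0.20),  # root splits on feature 0
    (1,  60, 0.30, 30, 0.10, 30, 0.05),  # left child splits on feature 1
    (0,  40, 0.20, 20, 0.00, 20, 0.10),  # right child splits on feature 0 again
]
n_total = 100

def impurity_decrease(n_t, i_t, n_l, i_l, n_r, i_r):
    """Delta i(t): parent impurity minus the weighted child impurities."""
    return i_t - (n_l / n_t) * i_l - (n_r / n_t) * i_r

importance = {}
for feat, n_t, i_t, n_l, i_l, n_r, i_r in nodes:
    p_t = n_t / n_total  # probability of reaching node t
    importance[feat] = importance.get(feat, 0.0) + p_t * impurity_decrease(
        n_t, i_t, n_l, i_l, n_r, i_r)

print(importance)  # feature 0 accumulates over both of its split nodes
```

Note how feature 0 picks up contributions from two different nodes; the same summation is what a forest performs over every tree.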

Testing with Concept Activation Vectors (TCAV) moves beyond raw pixels and asks whether a human-interpretable concept influences the model.

Examples of concepts include:

  • striped texture,
  • curvature,
  • medical imaging patterns,
  • or other domain-defined visual properties.

Instead of saying "these individual pixels matter," TCAV asks whether movement in a concept direction changes the class score. That makes it a useful bridge between feature-level explanations and more semantic reasoning.

At a high level, a TCAV score is the fraction of examples for which the concept pushes the class score in a positive direction:

\[ \mathrm{TCAV}_{C,k,l} = \frac{1}{n}\sum_{i=1}^n \mathbf 1\!\left\{S_{C,k,l}(x_i) > 0\right\}, \]

where \(S_{C,k,l}(x_i)\) is the directional derivative of the class-\(k\) score at layer \(l\) along the concept direction for \(C\), evaluated at example \(x_i\).
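This score can be sketched in a few lines of numpy. Both the concept activation vector and the per-example gradients are random placeholders here; in practice the vector comes from a linear classifier separating concept examples from random examples at layer \(l\), and the gradients come from backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-d layer. v_C stands in for a concept activation vector
# (normally the normal of a linear probe trained at layer l).
v_C = np.array([1.0, 0.0, 0.5, 0.0, -0.2])
v_C /= np.linalg.norm(v_C)

# Gradients of the class-k score w.r.t. layer-l activations, one row per
# example (random stand-ins for backprop results).
grads = rng.normal(size=(200, 5))

# S_{C,k,l}(x_i): directional derivative along the concept direction.
S = grads @ v_C

# TCAV score: fraction of examples with a positive directional derivative.
tcav = float(np.mean(S > 0))
print(tcav)
```

A score near 0.5 on random gradients is the expected "no influence" baseline; systematic deviation toward 0 or 1 is what signals that the concept matters for the class.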

For neural networks on images, local gradient-based explanations are often the first stop.

  • Vanilla saliency maps compute the gradient of the class score with respect to input pixels.
  • Grad-CAM pushes that idea into the last convolutional feature maps, then produces a heatmap showing which regions most influenced the class.

Grad-CAM is especially popular because its heatmaps are visually intuitive and align well with CNN structure.

Its core computation is

\[ \alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{\mathrm{Grad\text{-}CAM}}^c = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right). \]

So we first estimate how important each feature map \(A^k\) is for class \(c\) by averaging the gradients over the \(Z\) spatial positions, then combine those maps into one class-specific heatmap.
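Those two steps translate directly into numpy. The activations and gradients below are random stand-ins for the last convolutional layer of a real CNN:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical last-conv-layer tensors: K=4 feature maps of size 7x7.
A = rng.normal(size=(4, 7, 7))       # activations A^k
dY_dA = rng.normal(size=(4, 7, 7))   # gradients of class score y^c w.r.t. A^k

# alpha_k^c: global-average-pool the gradients over the Z = 7*7 positions.
alpha = dY_dA.mean(axis=(1, 2))

# L^c: ReLU of the alpha-weighted sum of feature maps -> class heatmap.
heatmap = np.maximum(0.0, np.einsum('k,kij->ij', alpha, A))

print(heatmap.shape)
```

In a real pipeline the heatmap is then upsampled to the input resolution and overlaid on the image, which is where the visual appeal comes from.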

The lecture visual is important here: Grad-CAM is not just a formula, and much of its appeal lies in the heatmap itself.

The lecture also introduces Layer-wise Relevance Propagation (LRP) and mentions DeepLIFT in the same family of backward explanation methods.

  • Gradient methods tell us about sensitivity.
  • Relevance methods try to attribute the prediction itself back through the network.

LRP is built around a conservation idea: relevance is propagated backward so that the total relevance remains tied to the output score. This often yields explanations that feel more attribution-focused than raw gradients.

\[ \sum_j R_j^{[l]} = \sum_k R_k^{[l+1]} \]

That conservation view is one reason LRP often feels closer to accounting than to raw derivative analysis.
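The conservation property can be checked directly. The sketch below applies the basic LRP-0 rule to a single dense layer, with made-up activations, weights, and output relevances; whatever the numbers, the total relevance passes through unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)

# One dense layer with hypothetical sizes: 4 inputs -> 3 outputs.
a = rng.uniform(0.1, 1.0, size=4)    # input activations a_j
W = rng.normal(size=(4, 3))          # weights w_jk
R_out = np.array([0.6, 0.3, 0.1])    # relevance R_k at layer l+1

# LRP-0 rule: R_j = sum_k (a_j * w_jk) / (sum_j' a_j' * w_j'k) * R_k,
# i.e. each output's relevance is shared among inputs in proportion to
# their contribution z_jk = a_j * w_jk.
z = a @ W                            # denominators z_k
R_in = ((a[:, None] * W / z) * R_out).sum(axis=1)

print(R_in.sum(), R_out.sum())       # the two totals match
```

Because the per-output shares sum to one by construction, each layer redistributes relevance rather than creating or destroying it, which is exactly the accounting intuition above.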

In this lesson we covered:

  1. Tree-specific importance measures
  2. TCAV for concept-level explanations
  3. Saliency maps and Grad-CAM for image models
  4. The distinction between gradient sensitivity and relevance propagation

Next: We will bring the main ideas together in a credit-risk case study using one model and several explainability tools.