Explainability in AI

This course studies how we make machine learning systems understandable enough to trust, debug, and govern. We begin with the motivations behind explainability, then organize the field into clear categories before moving through global, local, and model-specific explanation tools.

The material follows the lectures' progression: intrinsically interpretable models first; then post-hoc methods such as PDP, ICE, LIME, and SHAP; and finally concept- and gradient-based techniques for deep models. We close with a credit-risk case study that ties several methods together in one realistic pipeline.

By the end, you'll be able to:

  • Explain why interpretability matters for trust, fairness, and regulatory compliance.
  • Distinguish ante hoc from post-hoc explanations, and global from local ones.
  • Use the main explainability families appropriately for tabular models and deep networks.
  • Read explanation plots critically instead of treating them as unquestionable truth.

Topics covered, in order:

  • Motivations, case studies, and the black-box challenge
  • Taxonomy of explainability methods
  • Decision trees, GLMs, confounding, and Simpson's paradox
  • Permutation importance, PDP, ICE, LOFO, interaction strength, and surrogate models
  • LIME, Shapley values, SHAP, and neighboring local-explanation ideas
  • Tree-specific importance, TCAV, saliency maps, Grad-CAM, and LRP
  • A practical credit-risk workflow using global and local explanations together
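
As a preview of the post-hoc family listed above, here is a minimal sketch of permutation importance on a tabular model. The use of scikit-learn and synthetic data is an assumption for illustration; the course may use different tooling and datasets.

```python
# Minimal permutation-importance sketch (scikit-learn assumed available).
# Synthetic data is used purely for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic tabular data: the first 3 columns are informative,
# the last 2 are noise (shuffle=False keeps that ordering).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3,
    n_redundant=0, shuffle=False, random_state=0,
)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature column in turn and measure the drop in score;
# a large drop means the model relied on that feature.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance {imp:.3f}")
```

The informative columns should show clearly higher importance than the noise columns; that contrast, not the raw numbers, is what the plot-reading objective above asks you to check.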

Introduction and Motivations