What Is Interpretability?

Understand the mechanics behind black-box machine learning model predictions

Interpretability is the degree to which humans can understand machine learning algorithms. Machine learning models are often referred to as “black boxes” because their representations of knowledge are not intuitive, which makes it difficult to understand how they work. Interpretability techniques help to reveal how black-box machine learning models make predictions.

By revealing how various features contribute (or do not contribute) to predictions, interpretability techniques can help you validate that the model is using appropriate evidence for predictions, and find biases in your model that were not apparent during training. Some machine learning models, such as linear regression, decision trees, and generalized additive models, are inherently interpretable. However, interpretability often comes at the expense of predictive power and accuracy.

Figure 1: Trade-off between model performance and explainability.

Interpretability and explainability are closely related. The term “interpretability” is used more often in the context of classic machine learning, while “AI explainability” is more common in the context of deep neural networks.

Applying Interpretability

Practitioners seek model interpretability for three main reasons:

  • Debugging: Understanding where or why predictions go wrong and running “what-if” scenarios can improve model robustness and eliminate bias.
  • Guidelines: Black-box models may violate corporate technology best practices or conflict with personal preference.
  • Regulations: Some government regulations require interpretability for sensitive applications, such as in finance, public health, and transportation.

Model interpretability addresses these concerns and increases trust in the models in situations where explanations for predictions are important or required by regulation.

Interpretability is typically applied at two levels:

  • Global Methods: Provide an overview of the most influential variables in the model based on input data and predicted output.
  • Local Methods: Provide an explanation of a single prediction result.

Figure 2 illustrates the difference between the local and global scope of interpretability. You can also apply interpretability to groups within your data and arrive at conclusions at the group level, such as why a group of manufactured products was classified as faulty.

Figure 2: Local versus global interpretability: The two classes are represented by purple and orange dots.

Popular techniques for local interpretability include Local Interpretable Model-Agnostic Explanations (LIME) and Shapley values. For global interpretability, many start with feature ranking (or importance) and with partial dependence plots, which show the results visually. You can apply these techniques using MATLAB®.

Using Interpretability Techniques in MATLAB

Using MATLAB for machine learning, you can apply techniques to interpret most popular machine learning models, which can be highly accurate but are not inherently interpretable.

Local Interpretable Model-Agnostic Explanations: This approach involves approximating a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree. You can then use the simpler model as a surrogate to explain how the original (complex) model works. Figure 3 illustrates the three main steps of applying LIME.

Figure 3: By fitting a lime object in MATLAB, you can obtain LIME explanations via a simple interpretable model.
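The idea behind LIME can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the behavior of the MATLAB lime object: the `black_box` function, the Gaussian perturbation scale, and the kernel width are all hypothetical choices made for the example. The sketch perturbs the query point, weights each synthetic sample by its proximity to the query, and fits a weighted linear surrogate whose coefficients serve as the local explanation.

```python
import math
import random


def black_box(x):
    """Hypothetical nonlinear model standing in for any opaque predictor."""
    return math.sin(x[0]) + x[1] ** 2


def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def lime_explain(model, query, num_samples=2000, scale=0.3,
                 kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around `query`.

    Returns [intercept, coef_1, ..., coef_d], where the coefficients
    multiply the offset of each feature from the query point.
    """
    rng = random.Random(seed)
    d = len(query)
    rows, targets, weights = [], [], []
    for _ in range(num_samples):
        # Step 1: generate a synthetic sample near the query point.
        offset = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [q + o for q, o in zip(query, offset)]
        # Step 2: score it with the complex model and weight it by proximity.
        dist2 = sum(o * o for o in offset)
        rows.append([1.0] + offset)
        targets.append(model(sample))
        weights.append(math.exp(-dist2 / kernel_width ** 2))
    # Step 3: weighted least squares via the normal equations.
    k = d + 1
    a = [[sum(wt * r[i] * r[j] for r, wt in zip(rows, weights))
          for j in range(k)] for i in range(k)]
    b = [sum(wt * r[i] * y for r, wt, y in zip(rows, weights, targets))
         for i in range(k)]
    return solve_linear_system(a, b)


query = [0.0, 1.0]
intercept, coef_x0, coef_x1 = lime_explain(black_box, query)
print(intercept, coef_x0, coef_x1)
```

For this toy model, the surrogate coefficients should land near the local slopes of the black box at the query point (roughly 1 for the first feature and 2 for the second), which is what makes the simple model usable as a local explanation.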

Partial Dependence (PDP) and Individual Conditional Expectation (ICE) Plots: With these methods, you examine the effect of one or two predictors on the overall prediction. A partial dependence plot averages the output of the model over the observed values of all other predictors for each value of the predictors of interest, while ICE plots show the same relationship separately for each observation. Figure 4 shows a partial dependence plot that was generated with the MATLAB function plotPartialDependence.

Figure 4: Partial dependence plot showing that the probability of “standing” decreases sharply if the gyroscope indicates significant angular velocity.

Strictly speaking, a partial dependence plot just shows that certain ranges in the value of a predictor are associated with specific likelihoods for prediction; that’s not sufficient to establish a causal relationship between predictor values and prediction. However, if a local interpretability method like LIME indicates the predictor significantly influenced the prediction (in an area of interest), you can arrive at an explanation of why a model behaved a certain way in that local area.
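The computation behind partial dependence and ICE curves can be illustrated with a short Python sketch. It is not a reproduction of plotPartialDependence; the `model` function and the four-row `data` set are hypothetical. For each grid value of the chosen predictor, every observation is copied with that predictor overwritten, giving one ICE curve per observation; the partial dependence curve is the pointwise average of those curves.

```python
def model(x):
    """Hypothetical fitted model with an interaction between two features."""
    return 2.0 * x[0] + x[0] * x[1]


# Hypothetical observations: [feature 0, feature 1]
data = [[0.0, 1.0], [1.0, -1.0], [2.0, 3.0], [3.0, 1.0]]


def ice_curves(model, data, feature, grid):
    """One curve per observation: predictions as `feature` sweeps the grid."""
    curves = []
    for row in data:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, data, feature, grid):
    """Pointwise average of the ICE curves over all observations."""
    curves = ice_curves(model, data, feature, grid)
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]


grid = [0.0, 1.0, 2.0]
ice = ice_curves(model, data, 0, grid)
pdp = partial_dependence(model, data, 0, grid)
print(pdp)
```

In this toy example the individual ICE curves have different slopes because of the interaction term, while the averaged partial dependence curve hides that variation. This is one reason to look at ICE curves alongside the partial dependence plot.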

Shapley Values: This technique explains how much each predictor contributes to the deviation of a prediction of interest from the average prediction. This method is particularly popular within the finance industry because it is grounded in game theory, and because it satisfies the regulatory requirement of providing complete explanations: the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. The MATLAB function shapley computes Shapley values for a query point of interest.

Figure 5: The Shapley values indicate how much each predictor shifts the prediction at the point of interest away from the average prediction, indicated by the vertical line at zero.

Figure 5 shows that in the context of predicting heart arrhythmia near the sample of interest, MFCC4 had a strong positive impact on predicting “abnormal,” while MFCC11 and MFCC5 leaned against that prediction, i.e., towards a “normal” heart.

Evaluating all combinations of features generally takes a long time because the number of combinations grows exponentially with the number of features. In practice, Shapley values are therefore often approximated with Monte Carlo simulation.
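Both the exact computation and the Monte Carlo approximation can be sketched in plain Python. This is an illustration under assumptions, not the algorithm used by the MATLAB shapley function: the `model`, the `background` data, and the way a coalition is valued (features in the coalition fixed at the query values, the rest taken from background rows) are choices made for this example. The exact version averages marginal contributions over every feature ordering; the Monte Carlo version averages over a random sample of orderings.

```python
import itertools
import random


def model(x):
    """Hypothetical fitted model: one additive term and one interaction."""
    return 3.0 * x[0] + 2.0 * x[1] * x[2]


# Hypothetical background data used to represent "absent" features.
background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [1.0, 1.0, 0.0]]
query = [2.0, 3.0, 1.0]


def coalition_value(model, query, background, coalition):
    """Mean prediction with coalition features fixed at the query values."""
    total = 0.0
    for row in background:
        mixed = [query[j] if j in coalition else row[j]
                 for j in range(len(query))]
        total += model(mixed)
    return total / len(background)


def marginal_contributions(model, query, background, order):
    """Contribution of each feature when added in the given order."""
    contrib = [0.0] * len(query)
    coalition = set()
    previous = coalition_value(model, query, background, coalition)
    for j in order:
        coalition.add(j)
        current = coalition_value(model, query, background, coalition)
        contrib[j] = current - previous
        previous = current
    return contrib


def shapley_exact(model, query, background):
    """Average marginal contributions over all feature orderings."""
    d = len(query)
    totals = [0.0] * d
    count = 0
    for order in itertools.permutations(range(d)):
        contrib = marginal_contributions(model, query, background, order)
        totals = [t + c for t, c in zip(totals, contrib)]
        count += 1
    return [t / count for t in totals]


def shapley_monte_carlo(model, query, background, num_permutations=200, seed=0):
    """Approximate Shapley values from randomly sampled orderings."""
    rng = random.Random(seed)
    d = len(query)
    totals = [0.0] * d
    for _ in range(num_permutations):
        order = list(range(d))
        rng.shuffle(order)
        contrib = marginal_contributions(model, query, background, order)
        totals = [t + c for t, c in zip(totals, contrib)]
    return [t / num_permutations for t in totals]


average_prediction = sum(model(row) for row in background) / len(background)
exact = shapley_exact(model, query, background)
approx = shapley_monte_carlo(model, query, background)
print(exact, sum(exact), model(query) - average_prediction)
```

In both versions the values sum to the deviation of the query prediction from the average prediction over the background data, which is the completeness property described above. The exact loop visits d! orderings for d features, so it is only practical for a handful of features.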

Predictor Importance Estimations by Permutation: MATLAB also supports permuted predictor importance for random forests. This approach uses the effect that changes in predictor values have on model prediction error as an indication of predictor importance. The function shuffles the values of a predictor on test or training data and observes the magnitude of the resulting change in error.
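The shuffling procedure can be illustrated with a small Python sketch. It is a generic version of the idea, assuming a hypothetical `model` and synthetic data, and it does not reproduce the MATLAB implementation for random forests. Each predictor column is shuffled in turn, and the average increase in mean squared error over several repeats is taken as that predictor's importance.

```python
import random


def model(x):
    """Hypothetical fitted model: strong use of feature 0, weak use of
    feature 1, and no use of feature 2."""
    return 4.0 * x[0] + 0.5 * x[1]


def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, num_repeats=10, seed=0):
    """Mean increase in error after shuffling each predictor column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(num_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(model, X_perm, y) - baseline)
        importances.append(sum(increases) / num_repeats)
    return importances


# Synthetic evaluation data (an assumption made for this example).
data_rng = random.Random(1)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [model(row) + data_rng.gauss(0.0, 0.1) for row in X]

importance = permutation_importance(model, X, y)
print(importance)
```

A predictor the model ignores gets an importance of zero because shuffling it leaves every prediction unchanged, while predictors the model relies on more heavily produce larger increases in error.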

Choosing a Method for Interpretability

Figure 6 provides an overview of inherently explainable machine learning, various (model-agnostic) interpretability methods, and guidance on how to apply them.

Figure 6: How to select the appropriate interpretability method.

Different interpretability methods have their own limitations. A best practice is to be aware of those limitations as you match these methods to your use cases. Interpretability tools help you understand why a machine learning model makes the predictions that it does. These approaches are likely to become increasingly relevant as regulatory and professional bodies continue to work towards a framework for certifying AI for sensitive applications, such as autonomous transportation and medicine.

See also: artificial intelligence, machine learning, supervised learning, deep learning, AutoML