
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability

https://arxiv.org/abs/2008.05030

 

Authors: Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

Affiliations: UC Irvine, Harvard University

Venue: NeurIPS 2021

 

 

 


Introduction

 

Background
1) LIME:  Local Interpretable Model-Agnostic Explanation
2) KernelSHAP
3) Difference between LIME and KernelSHAP

Methodology
1) Notion of uncertainty
2) Estimating the Number of Perturbations
3) Focused Sampling of Perturbations

Experiment
1) Setup
2) Quality of Uncertainty Estimates
3) Correctness of Estimated Number of Perturbations
4) Efficiency of Focused Sampling
5) Stability of BayesLIME & BayesSHAP
6) User Study

Conclusion
Own Review

 

 

 

 

Introduction

ML models achieve state-of-the-art accuracy, but they are hard for decision makers to understand. Explainability is becoming more important as models are increasingly deployed in domains such as healthcare and criminal justice.

 

 

To address the black-box problem created by model complexity, post-hoc explanation methods construct interpretable local approximations (e.g., LIME [1], SHAP [2], MAPLE [3], Anchors [4]). Despite their generality across domains, prior post-hoc methods can be unstable and offer little guidance on how to set key hyperparameters of the local explanation, such as the number of perturbations.

 

Figure: A pipeline for Post-hoc Explanation. 

 

Figure: While LIME produces very different and contradictory feature importances for different numbers of perturbations (a), BayesLIME provides credible intervals that add context for an instance from the COMPAS dataset (b).

 



The proposed Bayesian framework introduces two novel methods, BayesLIME and BayesSHAP, which address the uncertainty and unreliability of local explanations. These Bayesian versions of post-hoc explanations allow concrete inferences to be drawn from the posteriors over the explanations while reducing computational cost. In particular, the number of perturbations is a critical hyperparameter for generating explanations that satisfy a desired level of confidence.

 

 

 

Background

BayesLIME and BayesSHAP build on LIME and KernelSHAP, summarized below.

 

 

LIME (Local Interpretable Model-agnostic Explanations)
Pros
Human-friendly
Applicable to tabular, text, image, and even embedding data

Cons
Unstable: repeated runs can produce noticeably different explanations

LIME

 

 

SHAP (SHapley Additive exPlanations)
Pros
Based on Shapley values, a concept from game theory
Provides a connection between LIME and Shapley values

Cons
KernelSHAP is very slow and ignores dependence among features

 

SHAP

 

 

1) LIME: Local Interpretable Model-Agnostic Explanations
LIME trains an interpretable surrogate model on the vicinity of the prediction that needs to be explained.
The objective searches for a model $ g $ that minimizes the locality-weighted loss $ L $ together with the complexity penalty $ Ω $:

$ \xi(x) = \underset{g \in G}{\arg\min} \; L(f, g, \pi_{x}) + \Omega(g) $

$ z $: Perturbed instances
$ π_{x}(z) $: Proximity weight depending on the distance between the sample $ z $ and $ x $
$ f $: The original black-box model

$ g $: The interpretable surrogate model (e.g., a weighted Lasso)
$ Ω(g) $: Complexity penalty, e.g., the number of non-zero weights in the Lasso regression
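As a rough illustration of this objective, here is a minimal sketch (not the reference LIME implementation): perturb $ x $, weight the perturbations by proximity, and fit a weighted Lasso whose coefficients play the role of the local feature importances. The function name, the binary masking scheme, and the kernel width are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lime_explain(f, x, num_perturbations=1000, kernel_width=0.75, alpha=0.01):
    """LIME-style sketch: fit a proximity-weighted Lasso around x.

    f: black-box prediction function (takes an (N, d) array, returns N scores).
    x: 1-D numpy array, the instance to explain.
    """
    d = x.shape[0]
    # Binary interpretable representation: which features of x are kept.
    z_binary = np.random.binomial(1, 0.5, size=(num_perturbations, d))
    z = x * z_binary  # masked-out features are set to 0 here (a simplification)

    # Proximity weights pi_x(z): exponential kernel on the L2 distance to x.
    dist = np.linalg.norm(z - x, axis=1)
    pi = np.exp(-(dist ** 2) / kernel_width ** 2)

    # Sparse local surrogate g; its coefficients are the feature importances.
    g = Lasso(alpha=alpha)
    g.fit(z_binary, f(z), sample_weight=pi)
    return g.coef_
```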

 

 

 


2) KernelSHAP
SHAP explains the prediction for an instance $ x $ by computing how much each feature contributes to it.

SHAP is a feature attribution method.

 

Properties of a feature attribution method:

- Local Accuracy

When the simplified input $ x' $ is fed into the explanation model, the output $ g(x') $ should match the prediction $ f(x) $ for the original input $ x $.

- Missingness

A feature that is missing from the simplified input receives zero attribution.

- Consistency

If a feature's marginal contribution increases or stays the same regardless of the other features, its attribution should not decrease.

 


KernelSHAP is Linear LIME plus Shapley values.
With an appropriately chosen weighting kernel, the coefficients of the weighted linear regression (the local surrogate model) recover the Shapley values:

$ \pi_{x}(z') = \dfrac{m - 1}{\binom{m}{|z'|}\,|z'|\,(m - |z'|)} $

$ m $: The number of features
$ |z'| $: The number of non-zero features of the simplified input $ z' $
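The weighting kernel itself is cheap to compute. Below is a small sketch following the formula above; the large constant used for the empty and full coalitions is a practical stand-in for the theoretically infinite weight, not part of the formula.

```python
from math import comb

def shapley_kernel_weight(m, num_nonzero):
    """Weighting kernel for a coalition z' with |z'| non-zero features out of m.

    Coalitions with no features or all features receive (theoretically) infinite
    weight; a large constant is a common practical stand-in.
    """
    if num_nonzero == 0 or num_nonzero == m:
        return 1e6  # stand-in, not part of the formula
    return (m - 1) / (comb(m, num_nonzero) * num_nonzero * (m - num_nonzero))
```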

 

 


3) Difference between LIME and KernelSHAP

The difference between them lies in how $ π_{x}(z) $ is chosen.
In LIME, it is chosen heuristically: $ π_{x}(z) $ is computed from the cosine or $ L2 $ distance between $ z $ and $ x $.
KernelSHAP instead leverages game-theoretic principles to compute $ π_{x}(z) $, guaranteeing that the explanations satisfy the properties above.

 

 

 

 

Methodology


1) Notion of uncertainty
We model the black-box prediction for each perturbation $ z $ as a linear combination of the corresponding feature values ($ \phi^{T}z $) plus an error term ($ e $).
The error term is modeled as a Gaussian whose variance depends on the proximity function $ π_{x}(z) $: perturbations closer to $ x $ are assumed to have smaller error variance.
The distributions over the error $ e $ and the feature importances $ \phi $ share the parameter $ \sigma^2 $, so the smaller the error, the more confident the feature-importance estimates.

 


This generative process captures two sources of uncertainty in local explanations. As the number of perturbations around $ x $ goes to infinity, i.e., $ N \rightarrow \infty $:

 

 

Feature importance uncertainty

The estimate of $ \phi $ converges to the true feature importance scores, and its uncertainty converges to $ 0 $.


Error uncertainty

The uncertainty of the error term $ e $ converges to the bias of the local linear model $ \phi $.

To assess how well the local linear model fits, we evaluate the probability density function (PDF) of the error posterior at $ 0 $.
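To make the generative process concrete, here is a minimal sketch of the proximity-weighted Bayesian linear regression that such a framework can be built on, assuming an uninformative prior and a simple plug-in estimate of $ \sigma^2 $ (a simplification, not the paper's exact derivation).

```python
import numpy as np

def bayes_local_posterior(Z, y, pi):
    """Sketch of a proximity-weighted Bayesian linear regression
    (uninformative prior, plug-in estimate of sigma^2).

    Z  : (N, d) perturbations in the interpretable representation.
    y  : (N,) black-box predictions f(z) for each perturbation.
    pi : (N,) proximity weights pi_x(z); larger weight = smaller error variance.
    """
    N, d = Z.shape
    Pi = np.diag(pi)

    # Weighted least-squares mean of the feature importances phi.
    V = np.linalg.pinv(Z.T @ Pi @ Z)
    phi_mean = V @ Z.T @ Pi @ y

    # Proximity-weighted sum of squared errors drives the estimate of sigma^2.
    resid = y - Z @ phi_mean
    sse = resid @ Pi @ resid
    sigma2_hat = sse / max(N - d, 1)

    # Posterior covariance of phi; its diagonal gives per-feature uncertainties.
    phi_cov = sigma2_hat * V
    return phi_mean, phi_cov, sigma2_hat
```

The diagonal of `phi_cov` gives per-feature variances, from which credible intervals for each feature importance can be read off.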

 


2) Estimating the Number of Perturbations

A limitation of LIME and KernelSHAP is that they provide no guidance on how to choose the number of perturbations. We leverage the uncertainty estimates output by our framework to compute perturbations-to-go ($ G $).

$ G $ is the number of additional perturbations required to achieve a credible interval of width $ W $ on the feature importances of a data point $ x $, at a user-specified confidence level $ \alpha $.

 


$ \overline{π_{S}} $: The average proximity weight $ π_{x}(z) $ over the sampled perturbations
$ S_{S}^{2} $: Empirical sum of squared errors (SSE) between the black-box and local linear model predictions
$ \Phi^{-1}(\alpha) $: Two-tailed inverse normal CDF at confidence level $ \alpha $
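The following is a rough sketch of the PTG idea under the simplifying assumption that the per-feature posterior variance scales like $ \sigma^2 / (N\,\overline{π_{S}}) $; this is an approximation for illustration, not the paper's exact expression, and the helper name and arguments are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def perturbations_to_go(sse, pi_mean, n_current, width, alpha=0.95):
    """Rough sketch of the PTG idea: how many more perturbations are needed
    before the credible interval of a feature importance shrinks to `width`.

    Approximation (not the paper's exact expression): the per-feature posterior
    variance is taken to scale like sigma^2 / (N * pi_mean).
    """
    sigma2_hat = sse / max(n_current, 1)      # error variance estimate from the SSE
    z = norm.ppf((1 + alpha) / 2)             # two-tailed inverse normal CDF
    # Solve  width = 2 * z * sqrt(sigma2_hat / (n_needed * pi_mean))  for n_needed.
    n_needed = (2 * z) ** 2 * sigma2_hat / (width ** 2 * pi_mean)
    return max(int(np.ceil(n_needed - n_current)), 0)
```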

 

 

 

 

3) Focused Sampling of Perturbations

If perturbations-to-go ($ G $) is large, the computational cost of querying the black-box model becomes expensive.
To reduce this cost, instead of querying on randomly drawn perturbations, we use a sampling procedure called focused sampling.

 

 

We draw a batch of $ A $ candidate perturbations, compute their predictive variance under the current Bayesian explanation, and induce a distribution over the candidates by applying a softmax to the variances with temperature parameter $ \tau $. Candidates whose predictions the surrogate is most uncertain about are therefore the most likely to be queried next.
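Here is a minimal sketch of one focused-sampling step under the model above; the predictive-variance expression and the function signature are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def focused_sample(candidates, phi_cov, sigma2_hat, pi_candidates, batch_size, tau=1.0):
    """One focused-sampling step: prefer candidates whose predictions the
    current Bayesian surrogate is least certain about.

    candidates    : (A, d) pool of candidate perturbations.
    phi_cov       : (d, d) current posterior covariance of the feature importances.
    sigma2_hat    : current estimate of the error variance sigma^2.
    pi_candidates : (A,) proximity weights pi_x(z) for each candidate.
    """
    # Predictive variance of the local surrogate at each candidate:
    # z^T Cov(phi) z plus the proximity-scaled noise term.
    pred_var = np.einsum('ij,jk,ik->i', candidates, phi_cov, candidates)
    pred_var = pred_var + sigma2_hat / np.maximum(pi_candidates, 1e-12)

    # Softmax with temperature tau turns variances into a sampling distribution.
    logits = pred_var / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Sample the next batch to send to the black box, biased toward uncertainty.
    idx = np.random.choice(len(candidates), size=batch_size, replace=False, p=probs)
    return candidates[idx]
```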

 

 

 

 

Experiment

 

1) Setup

Dataset

COMPAS [5]: Criminal history, jail and prison time, and demographic attributes of 6,172 defendants.
German Credit [6]: Financial and demographic information (including account information, credit history, employment, and gender) for 1,000 loan applications.
MNIST [7]: Handwritten digits dataset.
ImageNet [8]: A sample of 100 images from the classes French Bulldog, Scuba Diver, Corn, and Broccoli.

 

 



2) Quality of Uncertainty Estimates

 

Table: How often the 95% credible intervals computed with 100 perturbations include the true values (estimated with 10,000 perturbations).

 

Our methods are well calibrated and therefore highly reliable in capturing the uncertainty of the feature importances.

 

 

 

 

3) Correctness of Estimated Number of Perturbations

Figure: Perturbations-to-go ($ G $).

 

We then leverage these uncertainty estimates to compute $ G $ for 6 different certainty levels. We generate explanations with $ G $ perturbations, where $ G $ is computed from the desired credible interval width (x-axis), and compare the desired widths to the observed credible interval widths (y-axis); the blue line indicates ideal calibration. Results are averaged over 100 MNIST images of the digit "4". We see that $ G $ provides a good approximation of the additional perturbations needed.

 

 

 


4) Efficiency of Focused Sampling

Figure: Efficiency of focused sampling

Focused sampling results in faster convergence to reliable and high quality explanations.
Focused sampling stabilizes within a couple hundred model queries while random sampling takes over 1,000.

 

 

 

 

5) Stability of BayesLIME & BayesSHAP
There is a clear improvement in stability (53% on average) in all cases except BayesSHAP on German Credit.

 


Figure: Assessing the % increase in stability of BayesLIME and BayesSHAP over LIME and SHAP respectively.

 

 

 


6) User Study

We perform a user study with 31 subjects to compare BayesLIME and LIME explanations on MNIST. Users are asked to guess the digit in images whose most important regions, according to the explanation, have been masked out. The better the explanation, the more difficult it should be for users to guess correctly.

 

We find that the explanations output by our methods focus on more informative parts of the image, since hiding those parts makes it harder for humans to guess the digit: users had an error rate of 25.7% for LIME versus 30.7% for BayesLIME.

 

 

 

Conclusion

We developed a Bayesian framework to overcome several shortcomings of local explanations:

Difficult to set hyperparameters: PTG (perturbations-to-go)

Unclear when you have a good explanation: Credible intervals

Unstable, re-runs lead to different explanations: Credible intervals

Naive random sampling: Focused sampling

 

 

 

 

Own Review

As AI spreads into domains that touch people's lives, such as healthcare and criminal justice, the complexity of models is becoming increasingly hard for decision makers to understand. When frequentist guarantees offer little help, decision makers tend to rely on intuition rather than reason.

Applying post-hoc analysis methods with no guidance on their hyperparameters after training leaves us in an unstable, uncertain position. A Bayesian approach like this one, which gives a principled way to set those final hyperparameters, helps decision makers judge reasonably rather than fall back on intuition.
Going further, it would be worthwhile to extend this notion of uncertainty more broadly, so that chance factors outside a specific local explanation can also be accounted for.

 

 

 

 

Reference

[1] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD), 2016.

[2] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In
Advances in Neural Information Processing Systems, pages 4765–4774, 2017.

[3] Gregory Plumb, Denali Molitor, and Ameet S Talwalkar. Model agnostic supervised local
explanations. In Neural Information Processing Systems, 2018.

[4] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

 

[5] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. In ProPublica,
2016.

 

[6] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.

 

[7] Yann LeCun, Corinna Cortes, and CJ Burges. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010.

 

[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.

 

[9] Christoph Molnar, Interpretable Machine Learning : A Guide for Making Black Box Models Explainable, Lulu.com, 2020.

 

[10] Nickil Maveli. Demystifying Post-hoc Explainability for ML Models. Spectra, September 2021.

 

 
