Explainability and Bias in
ML/AI Models
Naveen Sundar Govindarajulu
August 9, 2019
Visit and sign up: RealityEngines.AI
Why now?
Life-impacting ML & AI models
COMPAS
From: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Non-recidivating black people were twice as likely to be labelled high risk as non-recidivating white people.
Why Explainability?
• More use of ML/AI models by laypersons.
• Laypersons need explanations.
• Developers also need quick explanations to debug models faster.
• There may be a legal need for explanations:
• If you deny someone a loan, you may need to explain the reason for the denial.
Explainability
Explainability using Interpretable Models
[Decision tree over "Prior offenses <= 0" and "Armed offense?" with Low Risk, Med Risk, and High Risk leaves.]
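To make the idea concrete, here is a minimal sketch of an interpretable model: a shallow scikit-learn decision tree over hypothetical risk-assessment features (the feature names and data below are made up for illustration, not taken from COMPAS).

```python
# A shallow decision tree whose learned rules can be read directly.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [
    # [prior_offenses, armed_offense] -- hypothetical features
    [0, 0], [0, 1], [2, 0], [3, 1], [1, 0], [4, 1],
]
y = ["low", "low", "med", "high", "med", "high"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# export_text prints the learned if/then rules, which a layperson can follow.
print(export_text(tree, feature_names=["prior_offenses", "armed_offense"]))
```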
Explainability vs Performance
Tradeoff
• Some machine learning models are more explainable than others.
[Chart: explainability vs. performance. Deep learning models sit at the high-performance, low-explainability end; linear models and decision trees sit at the high-explainability, lower-performance end.]
Explainability Method:
Feature Attribution
[Diagram: input features go into the classifier, which produces an output; an explainer takes the features and the classifier's output and returns "weights" for the features.]
What Features?
Interpretable Features
• We need interpretable features.
• It is difficult for laypersons to understand raw feature spaces (e.g. word embeddings).
• Humans are good at understanding the presence or absence of components.
Interpretable Instance
• E.g.
• For text:
• Convert to a binary vector indicating presence or absence of words (see the sketch below).
• For images:
• Convert to a binary vector indicating presence or absence of pixels or contiguous regions.
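A minimal sketch of the text case, using scikit-learn's binary bag-of-words representation (the toy documents are made up):

```python
# Binary presence/absence vectors over the vocabulary: interpretable features.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie is not bad", "the movie is bad"]
vectorizer = CountVectorizer(binary=True)   # 1 = word present, 0 = word absent
X = vectorizer.fit_transform(docs)

vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)          # the interpretable features (one per word)
print(X.toarray())    # one binary vector per document
```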
Method 1: LIME
Local Interpretable Model-agnostic Explanations
From: https://github.com/marcotcr/lime
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust
You?: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (pp. 1135-1144). ACM.
Method 1: LIME
[Diagram: binary perturbation vectors around the instance are labelled by the black-box model (any classifier); a linear model is fit to these local predictions, and after enforcing sparsity only a few weights remain (e.g. -2.1, 2.2, -3, 5.6). The weights of this linear model then give us the feature importances. A conceptual code sketch of this procedure follows below.]
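The following is a conceptual sketch of that procedure, not the lime library itself: perturb the binary representation of one instance, label the perturbations with the black-box model, and fit a sparse, locally weighted linear surrogate. The function `black_box_predict` is an assumed stand-in for any classifier that scores the interpretable (binary) representation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lime_weights(x_binary, black_box_predict, n_samples=500, alpha=0.01):
    """Return local feature weights for one instance, LIME-style."""
    x = np.asarray(x_binary)
    d = x.shape[0]
    rng = np.random.RandomState(0)
    # Random binary perturbations: switch off random subsets of the instance's features.
    Z = rng.randint(0, 2, size=(n_samples, d)) * x
    y = black_box_predict(Z)                    # black-box scores for the perturbations
    # Weight perturbations by similarity (fewer features switched off = more similar).
    sim = np.exp(-np.abs(Z - x).sum(axis=1) / d)
    surrogate = Lasso(alpha=alpha)              # L1 penalty enforces sparsity
    surrogate.fit(Z, y, sample_weight=sim)
    return surrogate.coef_                      # local "importance" per feature
```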
Example:
Text Sentiment Classification
“The movie is not bad”
[LIME word weights for the sentence: This = 0, movie = 0, is = 0, not = 2.3, bad = -1.5.]
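A hedged sketch of the same example using the lime package linked on the slide; the tiny sentiment model below is a stand-in so the example is self-contained, and the resulting weights will differ from the slide's numbers.

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment classifier (stand-in for any real model).
train_texts = ["great movie", "not bad at all", "terrible film", "really bad"]
train_labels = [1, 1, 0, 0]                       # 1 = positive, 0 = negative
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("The movie is not bad",
                                 model.predict_proba,   # text in, probabilities out
                                 num_features=5)
print(exp.as_list())   # per-word weights, analogous to the slide's example
```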
LIME with Images
Explanation for “Cat”
From: https://github.com/marcotcr/lime
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
Explanations for Multi-Label
Classifiers
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
Using LIME for Debugging (E.g. 1)
https://github.com/marcotcr/lime
Using LIME for Debugging (E.g. 2)
https://github.com/marcotcr/lime
Method 2: SHAP
Unifies many different feature attribution methods and has some
desirable properties.
1. LIME
2. Integrated Gradients
3. Shapley values
4. DeepLift
Lundberg, S.M. and Lee, S.I., 2017. A unified approach to interpreting model predictions. In
Advances in Neural Information Processing Systems (pp. 4765-4774).
Method 2: SHAP
• Derives from game-theoretic foundations.
• Shapley values are used in game theory to assign values to players in cooperative games.
What are Shapley values?
• Suppose there is a set S of N players participating in a game, with the payoff for any subset T ⊆ S of participating players given by a function v(T).
• Shapley values provide one fair way of dividing up the total payoff v(S) among the N players.
Shapley value for player i:
φ_i(v) = Σ over T ⊆ S \ {i} of [ |T|! (N - |T| - 1)! / N! ] · ( v(T ∪ {i}) - v(T) )
where v(T ∪ {i}) is the payoff for the group including player i and v(T) is the payoff for the group without player i.
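A minimal sketch that computes exact Shapley values for a tiny three-player game by enumerating all coalitions; the payoff function below is made up for illustration.

```python
from itertools import combinations
from math import factorial

players = ["A", "B", "C"]

def v(subset):
    # Hypothetical payoffs for each coalition.
    payoffs = {(): 0, ("A",): 10, ("B",): 20, ("C",): 30,
               ("A", "B"): 40, ("A", "C"): 50, ("B", "C"): 60,
               ("A", "B", "C"): 90}
    return payoffs[tuple(sorted(subset))]

def shapley(i):
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for k in range(n):                      # size of coalition T (without player i)
        for T in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v(T + (i,)) - v(T))
    return total

for p in players:
    print(p, shapley(p))    # the three values sum to the total payoff v({A, B, C}) = 90
```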
SHAP Explanations
• Players are features.
• Payoff is the model’s real-valued prediction.
SHAP Implementation
(https://github.com/slundberg/shap)
Different kinds of explainers:
1. TreeExplainer: fast and exact SHAP values for tree ensembles
2. KernelExplainer: approximate explainer for black box estimators
3. DeepExplainer: high-speed approximate explainer for deep learning models.
4. ExpectedGradients: SHAP-based extension of integrated gradients
XGBoost on UCI Income Dataset
Output is probability of income
over 50k
[SHAP force plot: starting from the base value, features such as f87, f23, f3, f34, and f41 push the output toward the model's prediction.]
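A hedged sketch of that workflow with the shap package on the UCI income (Adult) task; exact dataset loading and plotting calls may differ across shap versions.

```python
import shap
import xgboost

# UCI Adult/income dataset bundled with shap; y is "income over 50K".
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y.astype(int))

explainer = shap.TreeExplainer(model)        # fast, exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)

# Force plot for one prediction: features push the output away from the base value.
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :],
                matplotlib=True)
```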
Note: SHAP values are Model
Dependent.
[Force plots for the same instance under two different models (Model 1 and Model 2), showing different SHAP attributions.]
Is This Form of Explainability
Enough?
• Explainability does not provide us with recourse.
• Recourse: Information needed to change a specific prediction to a
desired value.
• “If you had paid your credit card balance in full for the last three
months, you would have got that loan.”
Issues with SHAP and LIME
For non-linear models, SHAP and LIME values can be highly variable across instances that are very similar.
On the Robustness of Interpretability Methods: https://arxiv.org/abs/1806.08049
Issues with SHAP and LIME
SHAP and LIME values don’t provide insight into how the model will behave on new instances.
Anchors: High-Precision Model-Agnostic Explanations. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16982
Take-home message
• Explainability is possible and need not come at the cost of performance.
• Explainability by itself is not enough:
• We also need recourse, etc.
Bias
Fairness and Bias in Machine
Learning
1. Bias in this context is unfairness (more or less).
2. Note we are not talking about standard statistical bias in machine
learning (the bias in the bias vs. variance tradeoff).
3. For completeness, this is one definition of statistical bias in machine
learning.
• Bias = Expected value of model - true value
Definitions of Fairness or Bias
1. Many, many, many definitions exist.
2. They are application dependent; no single definition is better than the rest.
3. See the “21 Definitions of Fairness” tutorial by Arvind Narayanan, ACM FAT* 2018.
• Key point: dozens of definitions exist (and not just 21).
Setting
1. Classifier C with a binary output d in {+, -} and a real-valued score s.
1. Instances or data points are generally humans.
2. The + class is desired and the - class is not desired.
2. Input X, including
1. one or more sensitive/protected attributes G (e.g. gender) that are part of the input, e.g. with possible values G = {m, f}.
3. A set of instances sharing a common sensitive-attribute value is privileged (receives more + labels); the other is unprivileged (receives fewer + labels).
4. True output Y.
1. Fairness through
Unawareness
• Simple idea: do not consider any sensitive attributes when building the model.
• Advantage: some support in the law (disparate treatment)?
• Disadvantage: other attributes may be correlated with sensitive attributes (such as job history, geographical location, etc.)
2. Statistical Parity Difference
• Different groups should have the same proportion (or probability) of positive and negative labels. Ideally the value below should be close to zero:
• Statistical parity difference = P(d = + | G = unprivileged) - P(d = + | G = privileged)
• Advantages: Legal support in the form of a rule known as the four-fifths rule. May remove historical bias.
• Disadvantages:
• Trivial classifiers, such as classifiers that randomly assign the same proportion of labels across different groups, satisfy this definition.
• A perfect classifier (d = Y) may not be allowed if ground-truth rates of labels differ across groups.
3. Equal Opportunity
Difference
• Different groups should have the same true positive rate. Ideally the value below should be close to zero:
• Equal opportunity difference = P(d = + | Y = +, G = unprivileged) - P(d = + | Y = +, G = privileged)
• Advantages:
• The perfect classifier is allowed.
• Disadvantages:
• May perpetuate historical biases.
• E.g. a hiring application with 100 privileged and 100 unprivileged candidates, but 40 qualified among the privileged and only 4 among the unprivileged.
• By hiring 20 of the qualified privileged and 2 of the qualified unprivileged candidates (the same true positive rate in each group), you satisfy this definition. (A code sketch of both metrics follows below.)
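Here is a minimal NumPy sketch (my own illustration, not a library API) of the two metrics above, computed from model decisions d, ground truth y, and group membership g; the arrays are made up.

```python
import numpy as np

def statistical_parity_difference(d, g, unpriv, priv):
    # P(d = 1 | unprivileged) - P(d = 1 | privileged); ideally close to 0.
    return d[g == unpriv].mean() - d[g == priv].mean()

def equal_opportunity_difference(d, y, g, unpriv, priv):
    # TPR(unprivileged) - TPR(privileged); ideally close to 0.
    tpr_u = d[(g == unpriv) & (y == 1)].mean()
    tpr_p = d[(g == priv) & (y == 1)].mean()
    return tpr_u - tpr_p

d = np.array([1, 0, 1, 1, 0, 1])        # model decisions (+ = 1)
y = np.array([1, 0, 1, 1, 1, 0])        # true labels
g = np.array(["f", "f", "f", "m", "m", "m"])
print(statistical_parity_difference(d, g, "f", "m"))      # 0.0
print(equal_opportunity_difference(d, y, g, "f", "m"))    # 0.5
```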
4. False Negative Error
Balance
• If the application is punitive in nature, different groups should have the same false negative rate.
• Example:
• The proportion of black defendants who don’t recidivate and yet receive high risk scores
should be the same as
• the proportion of white defendants who don’t recidivate and yet receive high risk scores.
5. Test Fairness
• Scores should have the same meaning across different groups.
Impossibility Results
• Core of the debate in COMPAS.
• ProPublica: false negative rates should be the same across different groups.
• Northpointe: scores should have the same meaning across groups (test fairness).
• Result: if prevalence rates (the ground-truth proportion of labels across different groups) are different, and test fairness is satisfied, then false negative rates will differ across groups.
Chouldechova, A., 2017. Fair prediction with disparate impact: A study of bias in recidivism
prediction instruments. Big data, 5(2), pp.153-163.
Tools for Measuring Bias
https://github.com/IBM/AIF360
AI Fairness 360 (AIF 360):
Measuring Bias
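A hedged sketch of measuring bias with AIF360. The COMPAS loader and the race encoding (1 = privileged group) follow the toolkit's defaults as I recall them and may need adjusting; the loader may also ask you to download the raw COMPAS data first.

```python
from aif360.datasets import CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric

dataset = CompasDataset()
metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=[{"race": 0}],
                                  privileged_groups=[{"race": 1}])

print(metric.statistical_parity_difference())  # ideally close to 0
print(metric.disparate_impact())               # ideally close to 1
```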
Mitigation: Removing Bias
• Mitigation can happen in three different places:
• Before the model is built, in the training data
• In the model
• After the model is built, with the predictions
[Baseline COMPAS model: accuracy = 66%.]
Before the model is built
• Reweighing (roughly, at a high level; see the sketch below):
• Increase weights for some
• Unprivileged with positive labels
• Privileged with negative labels
• Decrease weights for some
• Unprivileged with negative labels
• Privileged with positive labels
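A hedged sketch of pre-processing mitigation with AIF360's Reweighing; `dataset` is assumed to be a BinaryLabelDataset (e.g. the CompasDataset() loaded earlier), and the group encodings are assumptions.

```python
from aif360.algorithms.preprocessing import Reweighing

rw = Reweighing(unprivileged_groups=[{"race": 0}],
                privileged_groups=[{"race": 1}])
dataset_transf = rw.fit_transform(dataset)

# Instance weights are adjusted as described in the bullets above; train any
# downstream classifier using dataset_transf.instance_weights as sample weights.
print(dataset_transf.instance_weights[:10])
```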
[COMPAS with reweighing: accuracy = 66% before and 66% after mitigation. AI Fairness 360 Toolkit: https://aif360.mybluemix.net]
In the model
Zhang, B.H., Lemoine, B. and Mitchell, M., 2018, December. Mitigating
unwanted biases with adversarial learning. In Proceedings of the 2018
AAAI/ACM Conference on AI, Ethics, and Society (pp. 335-340). ACM.
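A hedged sketch of in-processing mitigation with AIF360's AdversarialDebiasing (an implementation of the Zhang et al. approach). It requires a TensorFlow v1 session; the argument names follow the toolkit as I recall them, and `dataset_train` / `dataset_test` are assumed BinaryLabelDataset splits.

```python
import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing import AdversarialDebiasing

tf.disable_eager_execution()
sess = tf.Session()

debiaser = AdversarialDebiasing(unprivileged_groups=[{"race": 0}],
                                privileged_groups=[{"race": 1}],
                                scope_name="adv_debiasing",
                                debias=True,
                                sess=sess)
debiaser.fit(dataset_train)                  # trains a classifier plus an adversary
dataset_pred = debiaser.predict(dataset_test)
```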
[COMPAS with adversarial de-biasing: accuracies of 66% and 67%. AI Fairness 360 Toolkit: https://aif360.mybluemix.net]
After the model is built
• Reject option classification:
• Assume the classifier outputs a probability score.
• If the classifier score is within a small band around 0.5:
• If unprivileged then predict positive
• If privileged then predict negative (see the sketch below)
[Plot with axes “Probability of + label for unprivileged” vs. “Probability of - label for unprivileged”, each from 0 to 1.]
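A minimal sketch (plain NumPy, not AIF360's reject option implementation) of the post-processing rule just described: inside a small band around 0.5, flip decisions in favour of the unprivileged group. The band width and arrays are made up.

```python
import numpy as np

def reject_option(scores, group, unpriv, band=0.05):
    d = (scores > 0.5).astype(int)               # default decisions
    uncertain = np.abs(scores - 0.5) <= band     # small band around 0.5
    d[uncertain & (group == unpriv)] = 1         # unprivileged -> predict positive
    d[uncertain & (group != unpriv)] = 0         # privileged -> predict negative
    return d

scores = np.array([0.52, 0.48, 0.90, 0.47])
group = np.array(["f", "m", "m", "f"])
print(reject_option(scores, group, unpriv="f"))  # -> [1 0 1 1]
```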
[COMPAS with reject option classification: accuracy = 66% before and 65% after mitigation. AI Fairness 360 Toolkit: https://aif360.mybluemix.net]
Tools
https://github.com/IBM/AIF360
AI Fairness 360 (AIF 360):
Mitigating Bias
Take-home message
• Many forms of fairness and bias exist; most of them are incompatible with each other.
• Bias can be decreased with algorithms (usually with some loss in performance).
Thank you
Extras
Choosing Definitions
From: https://dsapp.uchicago.edu/projects/aequitas/