Explainability and Bias in
ML/AI Models
Naveen Sundar Govindarajulu
August 9, 2019
Visit and sign up: RealityEngines.AI
Why now?
Life-impacting ML & AI models
COMPAS
From: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Non-recidivating black people were twice as likely to be labelled high risk as non-recidivating white people.
Why Explainability?
• More use of ML/AI models by laypersons.
• Laypersons need explanations.
• Developers also need quick explanations to debug models faster.
• There may be a legal need for explanations:
• If you deny someone a loan, you may need to explain the reason for the denial.
Explainability
Explainability using Interpretable Models
[Decision tree over "Prior offenses <= 0" and "Armed offense?" with Low Risk, Med Risk, and High Risk leaves.]
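To make the idea concrete, here is a minimal sketch of an interpretable model: a shallow scikit-learn decision tree over hypothetical risk-assessment features (the feature names and data below are made up for illustration, not taken from COMPAS).

```python
# A shallow decision tree whose learned rules can be read directly.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [
    # [prior_offenses, armed_offense] -- hypothetical features
    [0, 0], [0, 1], [2, 0], [3, 1], [1, 0], [4, 1],
]
y = ["low", "low", "med", "high", "med", "high"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# export_text prints the learned if/then rules, which a layperson can follow.
print(export_text(tree, feature_names=["prior_offenses", "armed_offense"]))
```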
Explainability vs Performance
Tradeoff
• Some machine learning models are more explainable than others.
[Chart: explainability vs. performance. Deep learning models sit at the high-performance, low-explainability end; linear models and decision trees sit at the high-explainability, lower-performance end.]
Explainability Method:
Feature Attribution
[Diagram: input features go into the classifier, which produces an output; an explainer takes the features and the classifier's output and returns "weights" for the features.]
What Features?
Interpretable Features
• We need interpretable features.
• It is difficult for laypersons to understand raw feature spaces (e.g. word embeddings).
• Humans are good at understanding the presence or absence of components.
Interpretable Instance
• E.g.
• For text:
• Convert to a binary vector indicating presence or absence of words (see the sketch below).
• For images:
• Convert to a binary vector indicating presence or absence of pixels or contiguous regions.
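A minimal sketch of the text case, using scikit-learn's binary bag-of-words representation (the toy documents are made up):

```python
# Binary presence/absence vectors over the vocabulary: interpretable features.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie is not bad", "the movie is bad"]
vectorizer = CountVectorizer(binary=True)   # 1 = word present, 0 = word absent
X = vectorizer.fit_transform(docs)

vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)          # the interpretable features (one per word)
print(X.toarray())    # one binary vector per document
```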
Method 1: LIME
Local Interpretable Model-agnostic Explanations
From: https://github.com/marcotcr/lime
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust
You?: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (pp. 1135-1144). ACM.
Method 1: LIME
[Diagram: binary perturbation vectors around the instance are labelled by the black-box model (any classifier); a linear model is fit to these local predictions, and after enforcing sparsity only a few weights remain (e.g. -2.1, 2.2, -3, 5.6). The weights of this linear model then give us the feature importances. A conceptual code sketch of this procedure follows below.]
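The following is a conceptual sketch of that procedure, not the lime library itself: perturb the binary representation of one instance, label the perturbations with the black-box model, and fit a sparse, locally weighted linear surrogate. The function `black_box_predict` is an assumed stand-in for any classifier that scores the interpretable (binary) representation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lime_weights(x_binary, black_box_predict, n_samples=500, alpha=0.01):
    """Return local feature weights for one instance, LIME-style."""
    x = np.asarray(x_binary)
    d = x.shape[0]
    rng = np.random.RandomState(0)
    # Random binary perturbations: switch off random subsets of the instance's features.
    Z = rng.randint(0, 2, size=(n_samples, d)) * x
    y = black_box_predict(Z)                    # black-box scores for the perturbations
    # Weight perturbations by similarity (fewer features switched off = more similar).
    sim = np.exp(-np.abs(Z - x).sum(axis=1) / d)
    surrogate = Lasso(alpha=alpha)              # L1 penalty enforces sparsity
    surrogate.fit(Z, y, sample_weight=sim)
    return surrogate.coef_                      # local "importance" per feature
```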
Example:
Text Sentiment Classification
“The movie is not bad”
[LIME word weights for the sentence: This = 0, movie = 0, is = 0, not = 2.3, bad = -1.5.]
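A hedged sketch of the same example using the lime package linked on the slide; the tiny sentiment model below is a stand-in so the example is self-contained, and the resulting weights will differ from the slide's numbers.

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment classifier (stand-in for any real model).
train_texts = ["great movie", "not bad at all", "terrible film", "really bad"]
train_labels = [1, 1, 0, 0]                       # 1 = positive, 0 = negative
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("The movie is not bad",
                                 model.predict_proba,   # text in, probabilities out
                                 num_features=5)
print(exp.as_list())   # per-word weights, analogous to the slide's example
```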
LIME with Images
Explanation for “Cat”
From: https://github.com/marcotcr/lime
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
Explanations for Multi-Label
Classifiers
Ribeiro, M.T., Singh, S. and Guestrin, C., 2016, August. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
Using LIME for Debugging (E.g. 1)
https://github.com/marcotcr/lime
Using LIME for Debugging (E.g. 2)
https://github.com/marcotcr/lime
Method 2: SHAP
Unifies many different feature attribution methods and has some
desirable properties.
1. LIME
2. Integrated Gradients
3. Shapley values
4. DeepLift
Lundberg, S.M. and Lee, S.I., 2017. A unified approach to interpreting model predictions. In
Advances in Neural Information Processing Systems (pp. 4765-4774).
Method 2: SHAP
• Derives from game-theoretic foundations.
• Shapley values are used in game theory to assign values to players in cooperative games.
What are Shapley values?
• Suppose there is a set S of N players participating in a game, with the payoff for any subset T ⊆ S of participating players given by a function v(T).
• Shapley values provide one fair way of dividing up the total payoff v(S) among the N players.
Shapley value for player i:
φ_i(v) = Σ over T ⊆ S \ {i} of [ |T|! (N - |T| - 1)! / N! ] · ( v(T ∪ {i}) - v(T) )
where v(T ∪ {i}) is the payoff for the group including player i and v(T) is the payoff for the group without player i.
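A minimal sketch that computes exact Shapley values for a tiny three-player game by enumerating all coalitions; the payoff function below is made up for illustration.

```python
from itertools import combinations
from math import factorial

players = ["A", "B", "C"]

def v(subset):
    # Hypothetical payoffs for each coalition.
    payoffs = {(): 0, ("A",): 10, ("B",): 20, ("C",): 30,
               ("A", "B"): 40, ("A", "C"): 50, ("B", "C"): 60,
               ("A", "B", "C"): 90}
    return payoffs[tuple(sorted(subset))]

def shapley(i):
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for k in range(n):                      # size of coalition T (without player i)
        for T in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v(T + (i,)) - v(T))
    return total

for p in players:
    print(p, shapley(p))    # the three values sum to the total payoff v({A, B, C}) = 90
```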
SHAP Explanations
• Players are features.
• Payoff is the model’s real-valued prediction.
SHAP Implementation
(https://github.com/slundberg/shap)
Different kinds of explainers:
1. TreeExplainer: fast and exact SHAP values for tree ensembles
2. KernelExplainer: approximate explainer for black box estimators
3. DeepExplainer: high-speed approximate explainer for deep learning models.
4. ExpectedGradients: SHAP-based extension of integrated gradients
XGBoost on UCI Income Dataset
Output is probability of income
over 50k
[SHAP force plot: starting from the base value, features such as f87, f23, f3, f34, and f41 push the output toward the model's prediction.]
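A hedged sketch of that workflow with the shap package on the UCI income (Adult) task; exact dataset loading and plotting calls may differ across shap versions.

```python
import shap
import xgboost

# UCI Adult/income dataset bundled with shap; y is "income over 50K".
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y.astype(int))

explainer = shap.TreeExplainer(model)        # fast, exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)

# Force plot for one prediction: features push the output away from the base value.
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :],
                matplotlib=True)
```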
Note: SHAP values are Model
Dependent.
[Force plots for the same instance under two different models (Model 1 and Model 2), showing different SHAP attributions.]
Is This Form of Explainability
Enough?
• Explainability does not provide us with recourse.
• Recourse: Information needed to change a specific prediction to a
desired value.
• “If you had paid your credit card balance in full for the last three
months, you would have got that loan.”
Issues with SHAP and LIME
For non-linear models, SHAP and LIME values can be highly variable across instances that are very similar.
On the Robustness of Interpretability Methods: https://arxiv.org/abs/1806.08049
Issues with SHAP and LIME
SHAP and LIME values don’t provide insight into how the model will behave on new instances.
Anchors: High-Precision Model-Agnostic Explanations. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16982
Take-home message
• Explainability is possible and need not come at the cost of performance.
• Explainability by itself is not enough:
• We also need recourse, etc.
Bias
Fairness and Bias in Machine
Learning
1. Bias in this context is unfairness (more or less).
2. Note we are not talking about standard statistical bias in machine
learning (the bias in the bias vs. variance tradeoff).
3. For completeness, this is one definition of statistical bias in machine
learning.
• Bias = Expected value of model - true value
Definitions of Fairness or Bias
1. Many, many, many definitions exist.
2. They are application dependent; no single definition is better than the rest.
3. See the “21 Definitions of Fairness” tutorial by Arvind Narayanan, ACM FAT* 2018.
• Key point: dozens of definitions exist (and not just 21).
Setting
1. Classifier C with a binary output d in {+, -} and a real-valued score s.
1. Instances or data points are generally humans.
2. The + class is desired and the - class is not desired.
2. Input X, including
1. one or more sensitive/protected attributes G (e.g. gender) that are part of the input, e.g. with possible values G = {m, f}.
3. A set of instances sharing a common sensitive-attribute value is privileged (receives more + labels); the other is unprivileged (receives fewer + labels).
4. True output Y.
1. Fairness through
Unawareness
• Simple idea: do not consider any sensitive attributes when building the model.
• Advantage: some support in the law (disparate treatment)?
• Disadvantage: other attributes may be correlated with sensitive attributes (such as job history, geographical location, etc.)
2. Statistical Parity Difference
• Different groups should have the same proportion (or probability) of positive and negative labels. Ideally the value below should be close to zero:
• Statistical parity difference = P(d = + | G = unprivileged) - P(d = + | G = privileged)
• Advantages: Legal support in the form of a rule known as the four-fifths rule. May remove historical bias.
• Disadvantages:
• Trivial classifiers, such as classifiers that randomly assign the same proportion of labels across different groups, satisfy this definition.
• A perfect classifier (d = Y) may not be allowed if ground-truth rates of labels differ across groups.
3. Equal Opportunity
Difference
• Different groups should have the same true positive rate. Ideally the value below should be close to zero:
• Equal opportunity difference = P(d = + | Y = +, G = unprivileged) - P(d = + | Y = +, G = privileged)
• Advantages:
• The perfect classifier is allowed.
• Disadvantages:
• May perpetuate historical biases.
• E.g. a hiring application with 100 privileged and 100 unprivileged candidates, but 40 qualified among the privileged and only 4 among the unprivileged.
• By hiring 20 of the qualified privileged and 2 of the qualified unprivileged candidates (the same true positive rate in each group), you satisfy this definition. (A code sketch of both metrics follows below.)
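Here is a minimal NumPy sketch (my own illustration, not a library API) of the two metrics above, computed from model decisions d, ground truth y, and group membership g; the arrays are made up.

```python
import numpy as np

def statistical_parity_difference(d, g, unpriv, priv):
    # P(d = 1 | unprivileged) - P(d = 1 | privileged); ideally close to 0.
    return d[g == unpriv].mean() - d[g == priv].mean()

def equal_opportunity_difference(d, y, g, unpriv, priv):
    # TPR(unprivileged) - TPR(privileged); ideally close to 0.
    tpr_u = d[(g == unpriv) & (y == 1)].mean()
    tpr_p = d[(g == priv) & (y == 1)].mean()
    return tpr_u - tpr_p

d = np.array([1, 0, 1, 1, 0, 1])        # model decisions (+ = 1)
y = np.array([1, 0, 1, 1, 1, 0])        # true labels
g = np.array(["f", "f", "f", "m", "m", "m"])
print(statistical_parity_difference(d, g, "f", "m"))      # 0.0
print(equal_opportunity_difference(d, y, g, "f", "m"))    # 0.5
```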
4. False Negative Error
Balance
• If the application is punitive in nature, different groups should have the same false negative rate.
• Example:
• The proportion of black defendants who don’t recidivate and yet receive high risk scores
should be the same as
• the proportion of white defendants who don’t recidivate and yet receive high risk scores.
5. Test Fairness
• Scores should have the same meaning across different groups.
Impossibility Results
• Core of the debate in COMPAS.
• ProPublica: false negative rates should be the same across different groups.
• Northpointe: scores should have the same meaning across groups (test fairness).
• Result: if prevalence rates (the ground-truth proportion of labels across different groups) are different, and test fairness is satisfied, then false negative rates will differ across groups.
Chouldechova, A., 2017. Fair prediction with disparate impact: A study of bias in recidivism
prediction instruments. Big data, 5(2), pp.153-163.
Tools for Measuring Bias
https://github.com/IBM/AIF360
AI Fairness 360 (AIF 360):
Measuring Bias
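A hedged sketch of measuring bias with AIF360. The COMPAS loader and the race encoding (1 = privileged group) follow the toolkit's defaults as I recall them and may need adjusting; the loader may also ask you to download the raw COMPAS data first.

```python
from aif360.datasets import CompasDataset
from aif360.metrics import BinaryLabelDatasetMetric

dataset = CompasDataset()
metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=[{"race": 0}],
                                  privileged_groups=[{"race": 1}])

print(metric.statistical_parity_difference())  # ideally close to 0
print(metric.disparate_impact())               # ideally close to 1
```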
Mitigation: Removing Bias
• Mitigation can happen in three different places:
• Before the model is built, in the training data
• In the model
• After the model is built, with the predictions
[Baseline COMPAS model: accuracy = 66%.]
Before the model is built
• Reweighing (roughly, at a high level; see the sketch below):
• Increase weights for some
• Unprivileged with positive labels
• Privileged with negative labels
• Decrease weights for some
• Unprivileged with negative labels
• Privileged with positive labels
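A hedged sketch of pre-processing mitigation with AIF360's Reweighing; `dataset` is assumed to be a BinaryLabelDataset (e.g. the CompasDataset() loaded earlier), and the group encodings are assumptions.

```python
from aif360.algorithms.preprocessing import Reweighing

rw = Reweighing(unprivileged_groups=[{"race": 0}],
                privileged_groups=[{"race": 1}])
dataset_transf = rw.fit_transform(dataset)

# Instance weights are adjusted as described in the bullets above; train any
# downstream classifier using dataset_transf.instance_weights as sample weights.
print(dataset_transf.instance_weights[:10])
```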
[COMPAS with reweighing: accuracy = 66% before and 66% after mitigation. AI Fairness 360 Toolkit: https://aif360.mybluemix.net]
In the model
Zhang, B.H., Lemoine, B. and Mitchell, M., 2018, December. Mitigating
unwanted biases with adversarial learning. In Proceedings of the 2018
AAAI/ACM Conference on AI, Ethics, and Society (pp. 335-340). ACM.
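A hedged sketch of in-processing mitigation with AIF360's AdversarialDebiasing (an implementation of the Zhang et al. approach). It requires a TensorFlow v1 session; the argument names follow the toolkit as I recall them, and `dataset_train` / `dataset_test` are assumed BinaryLabelDataset splits.

```python
import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing import AdversarialDebiasing

tf.disable_eager_execution()
sess = tf.Session()

debiaser = AdversarialDebiasing(unprivileged_groups=[{"race": 0}],
                                privileged_groups=[{"race": 1}],
                                scope_name="adv_debiasing",
                                debias=True,
                                sess=sess)
debiaser.fit(dataset_train)                  # trains a classifier plus an adversary
dataset_pred = debiaser.predict(dataset_test)
```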
[COMPAS with adversarial de-biasing: accuracies of 66% and 67%. AI Fairness 360 Toolkit: https://aif360.mybluemix.net]
After the model is built
• Reject option classification:
• Assume the classifier outputs a probability score.
• If the classifier score is within a small band around 0.5:
• If unprivileged then predict positive
• If privileged then predict negative (see the sketch below)
[Plot with axes “Probability of + label for unprivileged” vs. “Probability of - label for unprivileged”, each from 0 to 1.]
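A minimal sketch (plain NumPy, not AIF360's reject option implementation) of the post-processing rule just described: inside a small band around 0.5, flip decisions in favour of the unprivileged group. The band width and arrays are made up.

```python
import numpy as np

def reject_option(scores, group, unpriv, band=0.05):
    d = (scores > 0.5).astype(int)               # default decisions
    uncertain = np.abs(scores - 0.5) <= band     # small band around 0.5
    d[uncertain & (group == unpriv)] = 1         # unprivileged -> predict positive
    d[uncertain & (group != unpriv)] = 0         # privileged -> predict negative
    return d

scores = np.array([0.52, 0.48, 0.90, 0.47])
group = np.array(["f", "m", "m", "f"])
print(reject_option(scores, group, unpriv="f"))  # -> [1 0 1 1]
```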
[COMPAS with reject option classification: accuracy = 66% before and 65% after mitigation. AI Fairness 360 Toolkit: https://aif360.mybluemix.net]
Tools
https://github.com/IBM/AIF360
AI Fairness 360 (AIF 360):
Mitigating Bias
Take-home message
• Many forms of fairness and bias exist; most of them are incompatible with each other.
• Bias can be decreased with algorithms (usually with some loss in performance).
Thank you
Extras
Choosing Definitions
From: https://dsapp.uchicago.edu/projects/aequitas/