Explaining Black-Box
Machine Learning Predictions
Sameer Singh
University of California, Irvine
Machine Learning is Everywhere…
Classification: Wolf or a Husky?
Machine
Learning
Model
Wolf!
Classification: Wolf or a Husky?
Machine
Learning
Model
Husky!
Only 1 mistake!
Classification: Wolf or a Husky?
More Complex: Question Answering
Is there a moustache in the picture?
> Yes
What is the moustache made of?
> Banana
Essentially black-boxes!
How can we trust the
predictions are correct?
How do we know they are
not breaking regulations?
How do we avoid
“stupid mistakes”?
Trust
How can we understand and
predict the behavior?
Predict
How do we improve it to
prevent potential mistakes?
Improve
Only 1 mistake!
Classification: Wolf or a Husky?
We’ve built a
snow detector…
Visual Question Answering
What is the moustache made of?
> Banana
What are the eyes made of?
> Bananas
What?
> Banana
What is?
> Banana
Text Classification
Why did this
happen?
From: Keith Richards
Subject: Christianity is the answer
NNTP-Posting-Host: x.x.com
I think Christianity is the one true religion.
If you’d like to know more, send me a note
Applying for a Loan
Machine
Learning
I would like to apply for a loan.
Here is my information.
Sorry, your request has been denied
Why? What were the reasons?
Currently
Cannot explain.. [0.25,-4.5,3.5,-10.4,…]
How did we get here?
Big Data and Deep Learning
Simple Data
X1
X2
Linear Classifiers
X1
X2
You can interpret it…
- Both have a positive effect
- X1 > X2
if: 10X1 + X2 - 5 > 0
otherwise
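As a minimal sketch (Python; the input values are made up for illustration), this rule can be read off the weights and applied directly:

    # Reading off the linear classifier's decision rule from the slide.
    # Weights (10, 1) and bias (-5) come from the slide; the inputs are hypothetical.
    def predict(x1, x2):
        score = 10 * x1 + 1 * x2 - 5
        return "positive" if score > 0 else "negative"

    print(predict(0.6, 0.2))  # 10*0.6 + 0.2 - 5 = 1.2 > 0 -> "positive"
    print(predict(0.3, 1.0))  # 10*0.3 + 1.0 - 5 = -1.0    -> "negative"

The coefficient on X1 is ten times the one on X2, which is what "X1 > X2" on the slide means: both features push toward the positive class, but X1 matters far more.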
Decision trees
X1
X2
X1 > 0.5
X2 > 0.5
You can interpret it…
- X2 is irrelevant if X1<0.5
- Otherwise X2 is enough
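The tree can be read the same way; a minimal sketch (Python), with the 0.5 thresholds taken from the slide and the leaf labels assumed for illustration:

    # The decision tree from the slide written as explicit nested rules.
    def predict(x1, x2):
        if x1 < 0.5:
            return "negative"                          # X2 is irrelevant on this branch
        return "positive" if x2 > 0.5 else "negative"  # otherwise X2 alone decides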
Looking at the structure
How can we trust the
predictions are correct?
Trust
How can we understand and
predict the behavior?
Predict
How do we improve it to
prevent potential mistakes?
Improve
Test whether the structure
agrees with our intuitions.
Structure tells us exactly what
will happen on any data.
Structure tells you where the
error is, thus how to fix it.
Arrival of Big Data
Big Data: Applications of ML
Big Data: More Complexity
X1
X2
http://www.mckinsey.com/industries/high-tech/our-insights/an-executives-guide-to-machine-learning
Big Data: More Dimensions
Savings
Income
Profession
Loan Amount
Age
Marital
Status
Past defaults
Credit scores
Recent defaults
This easily goes to hundreds
- Images: thousands
- Text: tens of thousands
- Video: millions
- … and so on
X1
X2
Complex Surfaces
Savings
Income
Profession
Loan Amount
Age
Married
Past defaults
Credit scores
Recent defaults
…
Lots of dimensions
+
Black-boxes!
Accuracy vs Interpretability
Interpretability
Accuracy
10X1 + X2 - 5 > 0
X1 > 0.5
X2 > 0.5
millions of weights,
complex features
Real-world use case
Research on
“interpretable models”
Deep Learning
Interpretability
Accuracy
Real-world use case
Research on
“interpretable models”
Focus on accuracy!
Human-level
Looking at the structure
How can we trust the
predictions are correct?
Trust
How can we understand and
predict the behavior?
Predict
How do we improve it to
prevent potential mistakes?
Improve
Test whether the structure
agrees with our intuitions.
Structure tells us exactly what
will happen on any data.
Structure tells you where the
error is, thus how to fix it.
Explaining Predictions
The LIME Algorithm
Being Model-Agnostic…
No assumptions about the internal structure…
X1 > 0.5
X2 > 0.5
f(x)
Explain any existing, or future, model
Data Decision
LIME: Explain Any Classifier!
Interpretability
Accuracy
Real-world use case
Make everything interpretable!
What an explanation looks like
Why did this
happen?
From: Keith Richards
Subject: Christianity is the answer
NNTP-Posting-Host: x.x.com
I think Christianity is the one true religion.
If you’d like to know more, send me a note
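In practice an explanation like this can be produced with the open-source lime package. A minimal sketch, assuming the 20-newsgroups atheism-vs-christianity setup used in the LIME paper (the vectorizer and classifier here are illustrative stand-ins for the black box):

    # Sketch: explain one text prediction with the lime package (pip install lime).
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from lime.lime_text import LimeTextExplainer

    cats = ["alt.atheism", "soc.religion.christian"]
    train = fetch_20newsgroups(subset="train", categories=cats)
    black_box = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    black_box.fit(train.data, train.target)

    explainer = LimeTextExplainer(class_names=["atheism", "christian"])
    exp = explainer.explain_instance(train.data[0], black_box.predict_proba, num_features=6)
    print(exp.as_list())  # [(word, weight), ...] -- the words that drove this prediction

The output is a short list of (word, weight) pairs, which is exactly the kind of word highlighting shown on the email above.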
Being Model-Agnostic…
“Global” explanation is too complicated
Explanation is an interpretable model,
that is locally accurate
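Schematically, LIME builds that local model by sampling around the instance; a minimal sketch (Python/NumPy with scikit-learn), simplifying the interpretable representation to "keep or zero out each feature":

    # Minimal sketch of the LIME idea: perturb one instance, query the black box,
    # weight samples by proximity, and fit a simple linear model locally.
    import numpy as np
    from sklearn.linear_model import Ridge

    def lime_explain(f, x, num_samples=1000, kernel_width=0.75):
        d = len(x)
        # 1. Perturb: randomly keep or zero out each feature.
        mask = np.random.randint(0, 2, size=(num_samples, d))
        perturbed = mask * x
        # 2. Query the black box f on every perturbation.
        preds = np.array([f(z) for z in perturbed])
        # 3. Weight each sample by its proximity to the original instance.
        dist = np.sqrt(((1 - mask) ** 2).sum(axis=1)) / np.sqrt(d)
        weights = np.exp(-(dist ** 2) / kernel_width ** 2)
        # 4. Fit a weighted linear model; its coefficients are the explanation.
        local = Ridge(alpha=1.0).fit(mask, preds, sample_weight=weights)
        return local.coef_  # local importance of each feature

Here f would be any function returning, say, the black box's probability for the predicted class; the real algorithm also enforces sparsity on the explanation, but the locally weighted fit is the core idea.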
Google’s Object Detector
P( ) = 0.21
P( ) = 0.24
P( ) = 0.32
Only 1 mistake!
Classification: Wolf or a Husky?
Neural Network Explanations
We’ve built a great snow detector…
Understanding Behavior
We’ve built a great snow detector…
Comparing Classifiers
Classifier 1
Classifier 2
Explanations?
Look at Examples?
Deploy and Check?
“I have a gut feeling..”
Accuracy?
Change the model
Different data
Different parameters
Different “features”
…
Comparing Classifiers
Original Image “Bad” Classifier “Good” Classifier
Explanation for a bad classifier
From: Keith Richards
Subject: Christianity is the answer
NNTP-Posting-Host: x.x.com
I think Christianity is the one true religion.
If you’d like to know more, send me a note
After looking at the explanation,
we shouldn’t trust the model!
“Good” Explanation
It seems to be picking
up on more reasonable
things.. good!
Recent Work
Counter-examples and Counter-factuals
Understanding via Predicting
Users “understand” a model if they can
predict its behavior on unseen instances
Precision is much more
important than Coverage!
Precision
How accurate are the users guesses?
If the users guess wrong, they don’t understand
Coverage
How often do the users make confident guesses?
It’s okay not to be able to guess!
It’s much better not to guess than to guess
confidently, but be completely wrong!
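Both quantities are straightforward to measure once users have made (or declined to make) guesses; a small sketch with hypothetical values:

    # Precision vs. coverage of user guesses about the model's behaviour.
    # None means the user declined to guess; all values here are hypothetical.
    guesses = ["pos", None, "neg", "pos", None, "neg"]
    model   = ["pos", "neg", "neg", "neg", "pos", "neg"]

    made = [(g, m) for g, m in zip(guesses, model) if g is not None]
    coverage  = len(made) / len(guesses)                  # how often users dared to guess
    precision = sum(g == m for g, m in made) / len(made)  # how often those guesses were right
    print(coverage, precision)  # 0.67 0.75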
Linear Explanations
This movie is not bad. This movie is not very good.
LIME
LIME
D
D
…
D
This director is always bad.
This movie is not nice.
This stuff is rather honest.
This star is not bad.
Problem 1: Where is the explanation good?
Explanation is wrong in this region
This explanation is a better approximation
than the other one.
Problem 2: What is the coverage?
Explanation doesn’t apply here
→ Users will make mistakes!
Anchors: Precise Counter-factuals
Anchor: ”not bad” →
This movie is not bad.
This audio is not bad.
This novel is not bad.
This footage is not bad.
D(.|A)
Positive
This movie is not very good.
Anchor: ”not good” →
This poster is not ever good.
This picture is not rarely good.
This actor is not incredibly good.
D(.|A)
Negative
anchor
anchor
LIME
LIME
D
D
An anchor is a sufficient condition
Clear (and adaptive) coverage
Probabilistic guarantee avoids human mistakes
Salary Prediction
IF Education < High School
Then Predict Salary < 50K
Salary
71% <$50K
29% >$50K
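Those two numbers are the anchor's precision and coverage. A sketch of how they could be estimated for this rule against the model's predictions (the records and the 12-year threshold for "high school" are illustrative assumptions):

    # Estimate precision and coverage of the anchor
    # "IF Education < High School THEN predict Salary < 50K".
    data = [  # hypothetical records paired with the black box's predictions
        {"education_years": 10, "model_pred": "<50K"},
        {"education_years": 16, "model_pred": ">50K"},
        {"education_years": 9,  "model_pred": "<50K"},
        {"education_years": 11, "model_pred": ">50K"},
    ]
    anchored  = [r for r in data if r["education_years"] < 12]  # rows the rule covers
    coverage  = len(anchored) / len(data)                       # 0.75 on this toy data
    precision = sum(r["model_pred"] == "<50K" for r in anchored) / len(anchored)  # 0.67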
Visual QA
Encoder/Decoder LSTMs
What’s a Good Explanation?
We want to understand the models
Compact description
Lines, Decision Trees,
Simple Rules, etc.
When we read them,
we imagine instances
where they apply, and
where they don’t
Directly show useful examples?
What examples describe the behavior?
Closest Counter-example:
How can we change this example
to change the prediction?
Adversarial Examples
Goodfellow et al, "Explaining and Harnessing Adversarial Examples", ICLR 2015.
original MNIST digit "3" + .02 × adversarial noise = adversary predicted as "2"
"inputs formed by applying small but
intentionally worst-case perturbations
to examples from the dataset, such
that the perturbed input results in the
model outputting an incorrect answer
with high confidence"
Adversarial Examples: Pros
Advantages:
◦ Applicable to any gradient-based classifier
◦ Useful to evaluate the robustness of the model against adversaries
◦ Small perturbations often yield adversarial examples that are imperceptibly different from the original
Adversarial Examples: Cons
Disadvantages:
◦ Examples are unnatural
◦ may not look like anything you would naturally see in the "wild"
◦ Distance is not always meaningful
◦ E.g. color change or translation/rotation of an image
◦ Cannot be used for structured domains like text, code, etc.:
◦ E.g. replacing/removing words results in sentences that are not grammatical
◦ Do not provide insights into why the sample is an adversary
◦ How is the model working?
◦ How to fix the model?
Example: MNIST Digits
Example: Church vs Tower
Machine Translation
Debug Google Translate, remotely!
Explanations are important!
How can we trust the
predictions are correct?
Trust
How can we understand and
predict the behavior?
Predict
How do we improve it to
prevent potential mistakes?
Improve
Model Agnostic
Explanations
Thanks! sameer@uci.edu
sameersingh.org
Model Agnostic
Explanations
Work with Marco T. Ribeiro, Carlos Guestrin, Dheeru Dua, and Zhengli Zhao
Editor's Notes
  • #16: X2 = mX1 + c; at X1 = 0.5, m/2 + c = 0, so m = -2c; at X2 = 10, c = 10, m = -20; Y = 20X1 - X2 + 10
  • #20: Applications: Biology, Linguistics, Journalism, Econometrics, Humanities, Medicine, … Data types: Time series, Text, Images, Audio, Video, Graphs, User Histories, …
  • #23: Instead of tens: Images: hundreds; Text: thousands; Video: millions; … and so on