"You can't just turn the crank"
Machine learning for fighting abuse on the consumer web
David Freeman
Research Scientist/Engineer, Facebook Inc.
ScAINet 2018
Atlanta, GA USA, 11 May 2018
The consumer web
What do they try to do?
Malware, Payment Fraud, Scraping, Click Fraud, Phishing, Spam, Social Engineering, Fake Products, Scams, "Like" Fraud, Promotion Fraud, Identity Theft
What do we see?
Fake Reviews, Misinformation, Financial Theft, Account Resale
Fundamental question: Which requests are bad?
• Perfect for machine learning!
What could possibly go wrong?
Machine learning workflow
Label → Train → Validate → Launch → Measure → Profit! Lots!
How do we obtain labeled data?
(hint: not from your users)
Machine learning workflow
Label
Labeling: Gold standard
Objective measurement
• Human labeling of random samples.
  • Labelers don't always know what they're looking for
  • Labelers are inconsistent (with themselves and each other)
  • Labelers get tired (esp. if most samples are good)
• Apply crowdsourcing best practices:
  • Precise definitions, multiple labeling, ML-assisted sampling
• But will it scale?
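One way to act on "multiple labeling" is to collect several votes per sample and keep only the labels that enough labelers agree on. The sketch below is an illustration, not something from the talk: the function name, the 0.6 agreement cutoff, and the sample IDs are all assumptions.

```python
from collections import Counter

def aggregate_labels(votes, min_agreement=0.6):
    """Majority-vote one sample's labels.

    Returns (label, agreement). The label is None when the winning share
    falls below min_agreement, so the sample can be sent back for review.
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement

# Hypothetical votes; req-3 was only labeled twice and the labelers split.
votes_by_sample = {
    "req-1": ["bad", "bad", "bad"],
    "req-2": ["good", "bad", "good"],
    "req-3": ["good", "bad"],
}
results = {sid: aggregate_labels(v) for sid, v in votes_by_sample.items()}
needs_review = [sid for sid, (label, _) in results.items() if label is None]
```

Tracking the agreement value per sample also gives a rough read on how inconsistent the labelers are.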
Labeling: Silver standard
Automatic labeling
• Find high-precision signals of badness
  • Examples: unusual user-agent, malformed header
• DO NOT BLOCK ON THESE SIGNALS
  • They are controlled by the adversary
  • When the adversary adapts you will lose visibility
• Automatically generate signals using anomaly detection.
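A minimal sketch of the idea, assuming requests are dictionaries with hypothetical `user_agent` and `malformed_header` fields. Rarity of a user-agent stands in for anomaly detection here, and the 1% cutoff is an assumed value. The signals only produce training labels; nothing in this code blocks a request.

```python
from collections import Counter

def rare_user_agents(requests, max_share=0.01):
    """Return user-agents seen in less than max_share of requests.

    A crude stand-in for anomaly detection; max_share is an assumed cutoff.
    """
    counts = Counter(r["user_agent"] for r in requests)
    total = len(requests)
    return {ua for ua, n in counts.items() if n / total < max_share}

def silver_label(request, rare_uas):
    """Label a request for training only; nothing is blocked here."""
    if request["malformed_header"] or request["user_agent"] in rare_uas:
        return "bad"
    return "unlabeled"

# Hypothetical traffic: 200 ordinary requests, one odd client, one bad header.
traffic = [{"user_agent": "Mozilla/5.0", "malformed_header": False} for _ in range(200)]
traffic.append({"user_agent": "curl-bot/0.1", "malformed_header": False})
traffic.append({"user_agent": "Mozilla/5.0", "malformed_header": True})

rare = rare_user_agents(traffic)
labels = [silver_label(r, rare) for r in traffic]
```

Requests that trip no signal stay "unlabeled" rather than "good", since absence of a signal says little once the adversary adapts.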
Labeling: Bronze standard
Be scrappy
• Use whatever you have!
  • CS data, rules, other models
• Mitigate risks of blindness and feedback loops:
  • Oversample manually labeled examples
  • Oversample false positives and false negatives when retraining.
  • Undersample positive examples from previous iterations of this model.
  • Sample and label examples near the decision boundary
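The four mitigations can be sketched as sampling weights plus a boundary filter. The talk gives no numbers, so every multiplier, the 0.5 threshold, the 0.1 margin, and the field names below are assumptions chosen only to make the example concrete.

```python
def retraining_weight(example):
    """Sampling weight for one example; all multipliers are illustrative."""
    weight = 1.0
    if example["label_source"] == "manual":
        weight *= 5.0   # oversample manually labeled examples
    if example.get("misclassified_by_previous_model"):
        weight *= 3.0   # oversample known false positives / false negatives
    if example["label_source"] == "previous_model" and example["label"] == "bad":
        weight *= 0.5   # undersample positives this model produced itself
    return weight

def near_decision_boundary(scored, threshold=0.5, margin=0.1):
    """Pick examples whose score is within margin of the threshold, for manual labeling."""
    return [ex for ex in scored if abs(ex["score"] - threshold) <= margin]

examples = [
    {"id": 1, "label_source": "manual", "label": "bad"},
    {"id": 2, "label_source": "previous_model", "label": "bad"},
    {"id": 3, "label_source": "manual", "label": "good",
     "misclassified_by_previous_model": True},
]
weights = [retraining_weight(ex) for ex in examples]

scored = [{"id": 4, "score": 0.45}, {"id": 5, "score": 0.9}, {"id": 6, "score": 0.58}]
to_label = [ex["id"] for ex in near_decision_boundary(scored)]
```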
Labeling: Iron standard
Have your users do the work
• Users are terrible at reporting.
• Product flows bias reporting.
• Reports can be gamed.
• Reports can serve as a directional measure.
Assembling a training set
Labeling is just the beginning
• Segment the problem
  • e.g. status with link from country X
• Downsample intelligently
  • if your distribution is lumpy, sample from all the lumps
• Learning the prior vs. focusing on the bad stuff
  • no golden rule here -- you have to experiment
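"Sample from all the lumps" can be read as stratified downsampling: cap the number of examples per segment so that small segments survive. This is one possible reading, sketched with a hypothetical `segment` field and an assumed cap of 5 per segment.

```python
import random
from collections import defaultdict

def downsample_by_segment(examples, key, per_segment, seed=0):
    """Keep at most per_segment examples from every segment (lump)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[key(ex)].append(ex)
    sample = []
    for segment in sorted(groups):
        members = groups[segment]
        k = min(per_segment, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: one huge lump and one tiny one.
examples = (
    [{"segment": ("status_with_link", "X"), "id": i} for i in range(1000)]
    + [{"segment": ("photo", "Y"), "id": 1000 + i} for i in range(3)]
)
sample = downsample_by_segment(examples, key=lambda ex: ex["segment"], per_segment=5)
```

A uniform random sample of 8 from these 1003 examples would almost always miss the small lump; the per-segment cap keeps all 3 of its members.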
Refreshing your data
Don't forget the past!
[Figure: Training set 1, Training set 2, ..., Training set N, producing Model v1, Model v2, ..., Model vN]
Mitigation:
• Keep old attacks around (exponential decay?)
• Keep old models around (raise thresholds?)
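The slide floats exponential decay as a question, so the following is only a sketch of what it could look like: older batches stay in the training set with a weight that halves every fixed period. The 90-day half-life and the batch contents are assumptions.

```python
def decay_weight(age_days, half_life_days=90.0):
    """Weight that halves every half_life_days (the half-life is an assumed value)."""
    return 0.5 ** (age_days / half_life_days)

def weighted_training_set(batches):
    """Merge (age_days, examples) batches into a list of (example, weight) pairs."""
    merged = []
    for age_days, examples in batches:
        w = decay_weight(age_days)
        merged.extend((ex, w) for ex in examples)
    return merged

# Hypothetical batches, oldest first.
batches = [
    (180, ["old_attack_a", "old_attack_b"]),
    (90, ["last_quarter_attack"]),
    (0, ["current_attack"]),
]
training = weighted_training_set(batches)
```

Old attacks never reach weight zero, so a model retrained this way still sees them if the adversary returns to an earlier technique.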
How do you know your model is ready to go?
Machine learning workflow
Train
Validate
Validating Performance
Don't trust offline replay
• Labels aren't perfect
  • Often miss on recall
• Models interact with each other
• Use offline P-R and ROC to stack-rank model candidates
[Figure: diagram labeled Model A, Model B, and FP]
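Stack-ranking by offline ROC could look like the sketch below. It uses a plain-Python pairwise AUC so it runs without dependencies; the labels, scores, and model names are made up. In line with the slide, the numbers are only used to order candidates, not as a prediction of production performance.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stack_rank(labels, candidates):
    """Sort candidate names by offline AUC, best first."""
    aucs = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical offline labels (1 = bad) and scores from two candidates.
labels = [1, 1, 0, 0, 0]
candidates = {
    "model_a": [0.9, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.8, 0.7, 0.3, 0.2, 0.1],
}
ranking = stack_rank(labels, candidates)
```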
The Perils of A/B Testing
• Fundamental A/B testing assumption: experiment effects are independent of the cohorts chosen

Start with a small experiment
[Figure: cohorts A and B, with an X marker]
• Looks good so far...

Roll it out to (almost) everyone: Option 1
[Figure: cohorts A and B, with an X marker]
• Did the adversary give up or iterate?

Roll it out to (almost) everyone: Option 2
[Figure: cohorts A and B]
• Now your experiment is a vulnerability
Using Shadow Mode
• Run new model online in "log-only" mode
• Evaluate performance where the new model disagrees with the old one.
  • ideally via sampling & labeling
• Push based on FP/FN tradeoff
[Figure: diagram labeled Prod model, New model, and FP]
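A rough sketch of shadow mode, assuming models are callables that return a score and that a single threshold of 0.5 turns a score into a block decision. Both toy models, the event fields, and the threshold are hypothetical. Only the production decision would be enforced; the new model's decision is just logged and compared.

```python
import random

def shadow_disagreements(events, prod_model, new_model, threshold=0.5):
    """Score each event with both models and return the ones they disagree on."""
    out = []
    for event in events:
        prod_block = prod_model(event) >= threshold
        new_block = new_model(event) >= threshold   # logged only, never enforced
        if prod_block != new_block:
            out.append({"event": event, "prod_block": prod_block, "new_block": new_block})
    return out

def sample_for_labeling(disagreements, k, seed=0):
    """Draw up to k disagreements to send to human labelers."""
    rng = random.Random(seed)
    return rng.sample(disagreements, min(k, len(disagreements)))

def prod_model(event):
    return 0.9 if event["links"] > 5 else 0.1

def new_model(event):
    return 0.9 if event["links"] > 5 or event["account_age_days"] < 1 else 0.1

events = [
    {"id": 1, "links": 10, "account_age_days": 30},   # both models block
    {"id": 2, "links": 0, "account_age_days": 0},     # only the new model blocks
    {"id": 3, "links": 0, "account_age_days": 400},   # neither model blocks
]
disagreements = shadow_disagreements(events, prod_model, new_model)
to_label = sample_for_labeling(disagreements, k=10)
```

Labeling the sampled disagreements gives an estimate of the new model's extra false positives and extra catches, which is the FP/FN tradeoff the push decision rests on.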
How do you figure out if it worked?
Machine learning workflow
Launch
Measure
True Positives Don't Matter
What's happening here?
[Figure: precision plotted over time]
[Figure: two pairs of plots of TP and FP over time, compared side by side]
• Really want # of good users affected
• Solution: use one minus specificity (aka FPR)
$1 - \frac{TN}{FP + TN}$
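A small worked example of why precision can mislead, under one possible reading of the plots: the attack volume drops while the number of good users caught stays the same. All counts are invented for illustration.

```python
def precision(tp, fp):
    """Share of blocked requests that were actually bad."""
    return tp / (tp + fp)

def false_positive_rate(fp, tn):
    """Share of good requests that were blocked: FP / (FP + TN)."""
    return fp / (fp + tn)

# Hypothetical weekly counts: the adversary sends less, good traffic is unchanged.
week_1 = {"tp": 900, "fp": 100, "tn": 99_900}
week_2 = {"tp": 100, "fp": 100, "tn": 99_900}

p1 = precision(week_1["tp"], week_1["fp"])
p2 = precision(week_2["tp"], week_2["fp"])
fpr1 = false_positive_rate(week_1["fp"], week_1["tn"])
fpr2 = false_positive_rate(week_2["fp"], week_2["tn"])

# Same quantity written as one minus specificity.
one_minus_specificity = 1 - week_2["tn"] / (week_2["fp"] + week_2["tn"])
```

Precision falls from 0.9 to 0.5 even though exactly the same number of good users is affected in both weeks; FPR stays at 0.001 because it does not depend on TP.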
Not so fast!
Machine learning workflow
Profit! Adapt!
What not to Do (I)
Show the adversary what your limits are
Message 500 people 🛑
Message 400 people 🛑
Message 300 people ✅
What to Do (I)
Don't give immediate feedback
• Introduce delay in blocking response (and/or)
• Undo the damage without telling the user.
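One way to delay the blocking response is to schedule enforcement at a randomized later time, so the adversary cannot tie a block to the action that triggered it. The delay range (10 minutes to 6 hours), the function names, and the account IDs are assumptions, not values from the talk.

```python
import random

def schedule_enforcement(detected_at, rng, min_delay_s=600, max_delay_s=6 * 3600):
    """Pick a randomized time (in seconds) at which to act on a detection."""
    return detected_at + rng.uniform(min_delay_s, max_delay_s)

def due_actions(pending, now):
    """Return account ids whose scheduled enforcement time has passed."""
    return [account for account, when in pending.items() if when <= now]

rng = random.Random(0)
# Both hypothetical accounts were detected at time 0.
pending = {acct: schedule_enforcement(0.0, rng) for acct in ["acct-1", "acct-2"]}

blocked_immediately = due_actions(pending, now=0.0)
blocked_eventually = due_actions(pending, now=6 * 3600)
```

The randomization matters as much as the delay: a fixed delay would still let the adversary work out the limit by subtracting a constant.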
"We don't want to be the ones solving the CAPTCHAs"
What not to Do (II)
Look for specific content to block
What to Do (II)
Focus on bad behavior, not only bad content
What to Do (III)
Use data the adversary doesn't know/control
What to Do (IV)
Defense in depth
• Scoring at Entry Points: prevent access to accounts
• Clustering, Anomaly Detection: prevent accounts from doing damage
• User Reporting: find false negatives
• Behavioral Analysis: detect bad activity
[Figure axes: increasing speed; more information available]
Takeaways
• Think about each step of the ML process.
• It's hard to build a good training set.
• Adversarial adaptation breaks many assumptions.
• Control the data & the response.
Thanks to: Hervé Robert, Isaac Fullinwider, Henry Lu, Sagar Patel, Hongyang Li, Nektarios Leontiadis
