Responsible Data Use
Sofus A. Macskássy
Data Science @ LinkedIn
smacskassy@linkedin.com
November 2019
Pillars of
Responsible
Data Use
Bias
Privacy
Explainability
Governance
The Coded Gaze [Joy Buolamwini 2016]
Face detection software: Fails for some darker faces
Bias
• Facial analysis software:
Higher accuracy for light
skinned men
• Error rates for dark skinned
women: 20% - 34%
Gender Shades
[Joy Buolamwini &
Timnit Gebru, 2018]
Bias
• Ethical challenges posed
by AI systems
• Inherent biases present in
society
• Reflected in training data
• AI/ML models prone to
amplifying such biases
Algorithmic Bias
Bias
Massachusetts Group
Insurance Commission
(1997): Anonymized medical
history of state employees
William Weld vs
Latanya Sweeney
Latanya Sweeney (MIT grad
student): $20 – Cambridge
voter roll
born July 31, 1945
resident of 02138
Privacy
64% uniquely identifiable with ZIP + birth date + gender (in the US population)
Golle, “Revisiting the Uniqueness of Simple Demographics in the US Population”, WPES 2006
Privacy
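To see why such quasi-identifiers are risky, one can measure how many records in a dataset are unique on (ZIP code, birth date, gender). A minimal sketch in Python with pandas, using a small hypothetical table (column names and data are illustrative):

```python
import pandas as pd

def uniqueness_rate(df: pd.DataFrame, quasi_identifiers: list) -> float:
    """Fraction of rows uniquely identified by the given quasi-identifier columns."""
    group_sizes = df.groupby(quasi_identifiers, dropna=False).size()
    return (group_sizes == 1).sum() / len(df)

# Hypothetical example data (not real member data).
people = pd.DataFrame({
    "zip_code":   ["02138", "02138", "94043", "10001"],
    "birth_date": ["1945-07-31", "1951-02-14", "1945-07-31", "1951-02-14"],
    "gender":     ["M", "F", "M", "F"],
})
print(uniqueness_rate(people, ["zip_code", "birth_date", "gender"]))  # 1.0: every row is unique here
```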
A self-driving car struck and killed a pedestrian in Tempe, AZ in 2018.
- Who is to blame? (accountability)
- How do we prevent this? (safety)
- Should we ban self-driving cars? (liability and policy evaluation)
The need for XAI
Explainability
https://www.nytimes.com/2018/03/19/technology/uber-driverless-fatality.html
A research paper showed that a classifier trained to distinguish wolves from huskies was basing its decisions solely on the presence of snow in the background.
The need for XAI
Ribeiro, Singh, and Guestrin. 2016. "Why Should I Trust You?": Explaining
the Predictions of Any Classifier. SIGKDD 2016.
Explainability
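The cited work (LIME) explains individual predictions; as a simpler illustration of the same failure mode, the sketch below builds a synthetic wolf-vs-husky style dataset with a spurious "snow background" feature and uses scikit-learn's permutation importance to show that the model leans on it. Feature names and data are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-in for the wolf-vs-husky setup: the label depends on "animal" features,
# but a spurious snow_background feature is almost perfectly correlated with it.
snout_length = rng.normal(0, 1, n)
ear_shape = rng.normal(0, 1, n)
y = (snout_length + 0.5 * ear_shape + rng.normal(0, 0.5, n) > 0).astype(int)
snow_background = (y + (rng.random(n) < 0.05)).clip(0, 1)

X = np.column_stack([snout_length, ear_shape, snow_background])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Permutation importance: how much does accuracy drop when each feature is shuffled?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["snout_length", "ear_shape", "snow_background"], result.importances_mean):
    print(f"{name:16s} {imp:.3f}")
# A large importance for snow_background is a red flag that the model keys on the
# spurious background rather than the animal itself.
```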
The need for XAI
Explainable AI is good for multiple reasons:
- Builds trust (why did you do this?)
- Can be judged (how much do I believe the prediction?)
- Can be corrected (new training or tweaks to correct errors)
- Is actionable (I know what to do next)
- …
Explainability
Data Governance
Governance
Reflects company policies
Ensures compliance
Protects data
Protects company
Involves all orgs in a company
https://www.dama.org/sites/default/files/download/DAMA-DMBOK2-Framework-V2-20140317-FINAL.pdf
Laws against Discrimination
- Citizenship: Immigration Reform and Control Act
- Disability status: Rehabilitation Act of 1973; Americans with Disabilities Act of 1990
- Race: Civil Rights Act of 1964
- Age: Age Discrimination in Employment Act of 1967
- Sex: Equal Pay Act of 1963; Civil Rights Act of 1964
And more...
Fairness Privacy
Transparency Explainability
Responsible
Data Use @
LinkedIn
Case studies
- Bias
- Privacy
- Governance
LinkedIn operates the largest professional network on the Internet
Tell your story
- 645M+ members
- 30M+ companies are represented on LinkedIn
- 90K+ schools listed (high school & college)
- 35K+ skills listed
- 20M+ open jobs on LinkedIn Jobs
- 280B feed updates
Bias @
LinkedIn
Fairness-aware Talent
Search Ranking
Guiding Principle:
“Diversity by Design”
Insights to
Identify Diverse
Talent Pools
Representative
Talent Search
Results
Diversity
Learning
Curriculum
“Diversity by Design” in LinkedIn’s Talent Solutions
Plan for Diversity
Identify Diverse Talent Pools
Inclusive Job Descriptions / Recruiter Outreach
Representative Ranking for Talent Search
S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD'19.
[Microsoft’s AI/ML
conference
(MLADS’18). Distinguished
Contribution Award]
Building Representative
Talent Search at LinkedIn
(LinkedIn engineering blog)
Intuition for Measuring and Achieving
Representativeness
• Ideal: Top ranked results should follow a desired distribution on
gender/age/…
• E.g., same distribution as the underlying talent pool
• Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16]
• Defined measures (skew, divergence) based on this intuition
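As an illustration of the skew idea (the exact measures are defined in the KDD'19 paper cited above), a minimal sketch comparing the share of an attribute value in the top-k results against its desired share:

```python
import math
from collections import Counter

def skew_at_k(ranked_attributes, desired_dist, value, k):
    """Log-ratio of a value's share in the top-k results vs. its desired share.
    Positive => over-represented; negative => under-represented; 0 => on target."""
    top_k = ranked_attributes[:k]
    observed = Counter(top_k)[value] / k
    eps = 1e-9  # avoid log(0)
    return math.log((observed + eps) / (desired_dist[value] + eps))

# Illustrative only: desired distribution taken from the underlying talent pool.
ranking = ["M", "M", "M", "F", "M", "M", "F", "M", "M", "M"]
desired = {"M": 0.7, "F": 0.3}
print(skew_at_k(ranking, desired, "F", k=10))  # negative: "F" is under-represented in the top 10
```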
Fairness-aware Reranking Algorithm (Simplified)
• Partition the set of potential candidates into different buckets for each
attribute value
• Rank the candidates in each bucket according to the scores assigned by
the machine-learned model
• Merge the ranked lists, balancing the representation requirements and
the selection of highest scored candidates
• Algorithmic variants based on how we choose the next attribute
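A minimal sketch of the simplified algorithm above: partition candidates by attribute value, keep each bucket sorted by model score, and greedily merge by picking the value that is furthest below its target share. This is an illustrative greedy variant, not the production implementation:

```python
from collections import defaultdict

def fairness_aware_rerank(candidates, desired_dist, k):
    """Greedy re-ranking sketch.
    candidates: list of (id, attribute_value, model_score); desired_dist: value -> target share."""
    # 1. Partition candidates into per-attribute-value buckets, each sorted by model score.
    buckets = defaultdict(list)
    for cand in candidates:
        buckets[cand[1]].append(cand)
    for value in buckets:
        buckets[value].sort(key=lambda c: c[2], reverse=True)

    # 2. Merge: at each position, pick the head of the bucket that is furthest
    #    below its target share so far, breaking ties by model score.
    reranked, counts = [], defaultdict(int)
    while len(reranked) < k and any(buckets.values()):
        def deficit(value):
            target = desired_dist.get(value, 0.0) * (len(reranked) + 1)
            return target - counts[value]
        value = max((v for v in buckets if buckets[v]),
                    key=lambda v: (deficit(v), buckets[v][0][2]))
        reranked.append(buckets[value].pop(0))
        counts[value] += 1
    return reranked

# Illustrative usage with hypothetical scores.
cands = [("a", "M", 0.95), ("b", "M", 0.92), ("c", "F", 0.90),
         ("d", "M", 0.88), ("e", "F", 0.85), ("f", "M", 0.80)]
print(fairness_aware_rerank(cands, {"M": 0.6, "F": 0.4}, k=4))
```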
Architecture
Validating Our Approach
• Gender Representativeness
• Over 95% of all searches are representative compared to the qualified
population of the search
• Business Metrics
• A/B test over LinkedIn Recruiter users for two weeks
• No significant change in business metrics (e.g., # InMails sent or accepted)
• Ramped to 100% of LinkedIn Recruiter users worldwide
Lessons
learned
• Post-processing approach desirable
• Model agnostic
• Scalable across different model choices
for our application
• Acts as a “fail-safe”
• Robust to application-specific business
logic
• Easier to incorporate as part of existing
systems
• Build as a stand-alone service or
component for post-processing
• No significant modifications to the existing
components
• Complementary to efforts to reduce bias from
training data & during model training
Engineering for Fairness in AI Lifecycle
Stages: Problem Formation → Dataset Construction → Algorithm Selection → Training Process → Testing Process → Deployment → Feedback
Questions to ask along the way:
- Is an algorithm an ethical solution to our problem?
- Does our data include enough minority samples? Are there missing/biased features? Do we need to apply debiasing algorithms to preprocess our data?
- Do we need to include fairness constraints in the function?
- Have we evaluated the model using relevant fairness metrics?
- Is the model's effect similar across all users? Are we deploying our model on a population that we did not train/test on?
- Does the model encourage feedback loops that can produce increasingly unfair outcomes?
Credit: K. Browne & J. Draper
Engineering for Fairness in AI Lifecycle
S. Vasudevan, K. Kenthapadi, FairScale: A Scalable Framework for Measuring Fairness in AI Applications, 2019
Fairness-aware Experimentation
[Saint-Jacques & Sepehri, KDD’19 Social Impact Workshop]
Imagine LinkedIn has 10 members, each with 1 session a day.
A new product increases sessions by +1 session per member on average.
That average can be achieved in very different ways: for example, every member gaining one extra session, or a single member gaining all ten extra sessions while the others gain none.
Both are +1 session / member on average, yet one is much more unequal than the other. We want to catch that.
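One way to catch this is to measure the inequality of the treatment effect in addition to its mean. A sketch using the Gini coefficient as one simple inequality measure (the workshop paper uses related inequality measures) on two hypothetical 10-member scenarios:

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative array (0 = perfectly equal, -> 1 = maximally unequal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

baseline = np.ones(10)                 # 10 members, 1 session each
equal_lift = baseline + 1              # everyone gains +1 session
concentrated_lift = baseline.copy()
concentrated_lift[0] += 10             # one member gains all 10 extra sessions

# Both treatments add +1 session per member on average ...
print(equal_lift.mean(), concentrated_lift.mean())   # 2.0 2.0
# ... but the inequality of the outcome differs sharply.
print(gini(equal_lift), gini(concentrated_lift))     # 0.0 vs ~0.45
```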
Privacy @
LinkedIn
Framework to compute robust, privacy-preserving analytics
Analytics & Reporting Products at LinkedIn
Profile View
Analytics
Content
Analytics
Ad Campaign
Analytics
All showing demographics
of members engaging
with the product
• Admit only a small # of predetermined query types
• Querying for the number of member actions, for a specified time period,
together with the top demographic breakdowns
Analytics & Reporting Products at LinkedIn
E.g., Title = “Senior
Director”
E.g., Clicks on a
given ad
Privacy Requirements
• Attacker cannot infer whether a member performed an action
• E.g., click on an article or an ad
• Attacker may use auxiliary knowledge
• E.g., knowledge of attributes associated with the target member (say,
obtained from this member’s LinkedIn profile)
• E.g., knowledge of all other members that performed similar action (say, by
creating fake accounts)
Possible Privacy Attacks
Targeting:
Senior directors in US, who studied at Cornell
Matches ~16k LinkedIn members
→ over minimum targeting threshold
Demographic breakdown:
Company = X
May match exactly one person
→ can determine whether the person
clicks on the ad or not
Require minimum reporting threshold
Attacker could create fake profiles!
E.g. if threshold is 10, create 9 fake profiles
that all click.
Rounding mechanism
E.g., report counts in increments of 10
Still amenable to attacks
E.g., using incremental counts over time to infer individuals' actions
Need rigorous techniques to preserve member privacy
(not reveal exact aggregate counts)
Problem Statement
• Compute robust, reliable analytics in a privacy-preserving manner, while addressing the product needs.
Differential Privacy
Defining Privacy
[Figure: a trusted curator collects members' data and releases only aggregate outputs, computed with and without your data]
Differential Privacy
Databases D and D′ are neighbors if they differ in one person’s data.
Differential Privacy: The distribution of the curator’s output M(D) on database
D is (nearly) the same as M(D′).
Dwork, McSherry, Nissim, Smith [TCC 2006]
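Formally, ε-differential privacy requires Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] for all neighboring databases D, D′ and output sets S. For count queries, the classic Laplace mechanism achieves this by adding noise of scale 1/ε; a minimal sketch (not LinkedIn's production system):

```python
import numpy as np

_rng = np.random.default_rng()

def dp_count(true_count: int, epsilon: float) -> float:
    """epsilon-differentially private count via the Laplace mechanism.
    A count query has sensitivity 1 (adding or removing one member changes it by at most 1),
    so Laplace noise with scale 1/epsilon suffices."""
    return true_count + _rng.laplace(loc=0.0, scale=1.0 / epsilon)

# E.g., reporting ad clicks broken down by title with a privacy budget of epsilon = 0.5.
true_clicks = {"Senior Director": 23, "Engineer": 112}
noisy_clicks = {title: round(dp_count(count, epsilon=0.5)) for title, count in true_clicks.items()}
print(noisy_clicks)  # rounding is post-processing, so the result stays differentially private
```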
Privacy System Architecture
Governance
@ LinkedIn
Keeping our data safe and
secure for members
Problem statement
• We have a lot of data
• Some may contain PII (personally identifiable information)
• How do we keep this secure?
• Removing PII data
• Tracking access
Policy: Keeping the data safe
Solution through Technology
• Metadata store
• Tag all attributes in all tables
• Know which fields are PII
• Know which fields need protection
• Audit access to data
• Obfuscate data wherever possible
Tracking pedigree of data
• Tables can be combined to
create new tables
• Automatically track
pedigree of attributes and
their PII value
• Assess new attributes for
PII as well
• Have authors be
accountable
Name | Type   | PII?
Name | string | Yes
Age  | string | Yes
A1   | string | No
A2   | url    | No

Name  | Type    | PII?
Name  | string  | Yes
Adult | boolean | No
B2    | string  | No
C1    | number  | No

Name | Type   | PII?
Name | string | Yes
B1   | number | No
B2   | string | No
B3   | string | No
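A minimal sketch of how a metadata store might tag columns and propagate PII status and lineage when tables are combined, as in the example above. All names are illustrative, not LinkedIn's internal system:

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    dtype: str
    pii: bool                                   # tagged when the column is registered
    lineage: set = field(default_factory=set)   # upstream columns this one was derived from

def derive_column(name, dtype, sources, declared_pii=None):
    """Create a derived column: it inherits PII status from its sources unless the author
    explicitly declares otherwise (the recorded lineage keeps that author accountable)."""
    inherited_pii = any(src.pii for src in sources)
    pii = inherited_pii if declared_pii is None else declared_pii
    lineage = {src.name for src in sources} | set().union(*(src.lineage for src in sources))
    return Column(name, dtype, pii, lineage)

age = Column("Age", "string", pii=True)
# "Adult" is derived from "Age"; the author declares it non-PII (a boolean reveals far less),
# and the lineage lets auditors review that decision.
adult = derive_column("Adult", "boolean", sources=[age], declared_pii=False)
print(adult)  # Column(name='Adult', dtype='boolean', pii=False, lineage={'Age'})
```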
Reflections
Fairness in ML
• Application specific challenges
• Conversational AI systems: Unique bias/fairness/ethics considerations
• E.g., Hate speech, Complex failure modes
• Beyond protected categories, e.g., accent, dialect
• Entire ecosystem (e.g., including apps such as Alexa skills)
• Two-sided markets: e.g., fairness to buyers and to sellers, or to content consumers
and producers
• Fairness in advertising (externalities)
• Tools for ensuring fairness (measuring & mitigating bias) in AI lifecycle
• Pre-processing (representative datasets; modifying features/labels)
• ML model training with fairness constraints
• Post-processing
• Experimentation & Post-deployment
Explainability in ML
• Actionable explanations
• Balance between explanations & model secrecy
• Robustness of explanations to failure modes (Interaction between ML
components)
• Application-specific challenges
• Conversational AI systems: contextual explanations
• Gradation of explanations
• Tools for explanations across AI lifecycle
• Pre & post-deployment for ML models
• Model developer vs. End user focused
Privacy in ML
• Privacy-preserving model training, robust against adversarial
membership inference attacks
• Privacy for highly sensitive data: model training & analytics using secure
enclaves, homomorphic encryption, federated learning / on-device
learning, or a hybrid
• Privacy-preserving transfer learning (broadly, privacy-preserving
mechanisms for data marketplaces)
Thank you
Sofus A. Macskássy
Data Science @ LinkedIn
smacskassy@linkedin.com