Finding and Fixing Bias in Natural Language Processing
Yves Peirsman
A primer in NLP
Natural Language Processing is a branch of Artificial Intelligence. Typical NLP tasks include:
● Machine translation
● Sentiment analysis
● Information retrieval
● Information extraction
● Text classification
We provide consultancy for companies that need guidance in the NLP domain. We develop software and train custom NLP models for challenging or domain-specific applications.
NLP Town
Training data → Training process → Model
● We help annotate training data.
● We train models for NLP applications.
● We integrate models with workflows.
● We provide consultancy for NLP projects.
Bias in Natural Language Processing
A primer in NLP
Training data → Training process → Model
Word Embeddings
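Word embeddings represent words as dense vectors, so that words with similar meanings get similar vectors. This allows NLP models to generalize better: what a model learns about one word carries over to words with similar embeddings.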
Word embeddings capture both general knowledge (Paris is to France as Berlin is to Germany) and linguistic knowledge (walk is to walked as swim is to swam).
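Analogies like these are usually tested with vector arithmetic. To solve "a is to b as c is to ?", compute b − a + c and return the nearest remaining word by cosine similarity. This is a minimal sketch that assumes tiny, hand-made 3-dimensional vectors. The words and numbers are hypothetical and do not come from a real embedding model, which would have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(emb, a, b, c):
    """Solve 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The query words themselves are excluded from the candidates.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Hypothetical toy embeddings; dimensions loosely mean (royalty, gender, person).
toy_embeddings = {
    "man":   [0.0,  1.0, 1.0],
    "woman": [0.0, -1.0, 1.0],
    "king":  [1.0,  1.0, 1.0],
    "queen": [1.0, -1.0, 1.0],
    "apple": [0.0,  0.0, -1.0],
}

print(analogy(toy_embeddings, "man", "king", "woman"))  # → queen
```

The same offset trick that recovers "queen" here is what surfaces biased completions for the analogies that follow.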
Word embeddings also encode bias:
● Man is to king as woman is to ___.
● Man is to programmer as woman is to ___.
Experiment:
● Measure how similar each occupation word is to:
○ A set of “male” words: man, son, father, he, him, etc.
○ A set of “female” words: woman, daughter, mother, she, her, etc.
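The experiment above can be sketched as a score per occupation: the mean cosine similarity to the "male" words minus the mean similarity to the "female" words. This is an assumption about how such a score might be computed, not necessarily the exact measure behind the slide. The 2-dimensional vectors below are made up for illustration and say nothing about real embeddings. With a real model, you would pass in its word vectors instead.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def gender_association(emb, word, male_words, female_words):
    """Mean similarity to male words minus mean similarity to female words.

    Positive scores: closer to the male set. Negative: closer to the female set."""
    male = [cosine(emb[word], emb[m]) for m in male_words if m in emb]
    female = [cosine(emb[word], emb[f]) for f in female_words if f in emb]
    if not male or not female:
        raise ValueError("need at least one male and one female word in the vocabulary")
    return sum(male) / len(male) - sum(female) / len(female)

# Hypothetical toy vectors: first dimension ~ gender, second ~ "occupation-ness".
emb = {
    "he": [1.0, 0.1], "man": [0.9, 0.2],
    "she": [-1.0, 0.1], "woman": [-0.9, 0.2],
    "programmer": [0.5, 1.0], "nurse": [-0.5, 1.0],
}

for occupation in ["programmer", "nurse"]:
    score = gender_association(emb, occupation, ["he", "man"], ["she", "woman"])
    print(occupation, round(score, 3))  # programmer > 0, nurse < 0 in this toy data
```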
Pretrained NLP models
Pretrained language models are a significant recent breakthrough in NLP:
● Language models are trained to predict words that have been masked (hidden) in a text.
● In doing so, they learn a lot about language.
● This knowledge can be reused in “downstream” tasks.
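Example sentences:
○ This movie won her an Oscar for best actress.
○ The keys to the house are on the table.
If “actress” is masked in the first sentence, the model needs “her” to predict it. If “are” is masked in the second, it needs to recognize “keys”, not “house”, as the subject.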
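The training signal for masked-word prediction comes from raw text: hide a word and use it as the target. This is a minimal sketch of that data preparation. It assumes whitespace tokenization and hides exactly one token per example. BERT-style models instead mask a random 15% of subword tokens, so this is a simplification, and the function name is hypothetical.

```python
MASK = "[MASK]"

def masked_examples(sentence):
    """Turn one raw sentence into (input, target) pairs for masked-word prediction.

    Each example hides exactly one token; a model would have to predict the
    hidden token from the rest of the sentence."""
    tokens = sentence.split()
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

for masked, target in masked_examples("The keys to the house are on the table."):
    if target == "are":
        print(masked, "->", target)  # → The keys to the house [MASK] on the table. -> are
```

No labels are needed: every sentence in a large corpus yields training examples for free, which is why these models can learn so much before any downstream task.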
ULMFiT (Howard and Ruder 2018)
Pretrained language models
Experiment: association with a large number of positive adjectives
● One of several recent Dutch BERT models
● Association between 240 positive adjectives and hij/zij (“he”/“she”):
○ aantrekkelijk (attractive), ambitieus (ambitious), intelligent (intelligent), slim (smart), knap (good-looking, clever), nauwkeurig (precise), nieuwsgierig (curious), etc.
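The slides do not say exactly how the association was measured. One common approach, assumed here, fills a template with each adjective and compares the model's probability of "hij" and "zij" at the masked position. In this sketch, the template "Volgens mij is [MASK] {adj}." ("I think [MASK] is {adj}.") and the `prob_fn` hook are hypothetical. You would wrap your own masked language model in `prob_fn` and use that model's mask token. The `fake_prob` stand-in only demonstrates the arithmetic and is not real model output.

```python
import math
from typing import Callable, Dict, Iterable

# prob_fn(sentence, candidate) -> probability of `candidate` at the mask position.
# Hypothetical hook: wrap whichever masked language model you use.
ProbFn = Callable[[str, str], float]

def pronoun_association(adjectives: Iterable[str], prob_fn: ProbFn,
                        template: str = "Volgens mij is [MASK] {adj}.") -> Dict[str, float]:
    """log P(hij) - log P(zij) per adjective; positive scores lean towards 'hij'."""
    scores = {}
    for adj in adjectives:
        sentence = template.format(adj=adj)
        scores[adj] = math.log(prob_fn(sentence, "hij")) - math.log(prob_fn(sentence, "zij"))
    return scores

def share_leaning_male(scores: Dict[str, float]) -> float:
    """Fraction of adjectives whose score favours 'hij'."""
    return sum(1 for s in scores.values() if s > 0) / len(scores)

def fake_prob(sentence: str, candidate: str) -> float:
    # Stand-in for a real model, for demonstration only.
    return 0.6 if candidate == "hij" else 0.3

scores = pronoun_association(["ambitieus", "slim"], fake_prob)
print(share_leaning_male(scores))  # → 1.0
```

A log ratio is used so that "twice as likely" and "half as likely" get scores of equal size and opposite sign.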
The problem with bias
Step 1: Identify bias with explainable AI
Challenge
● First, we need to find out whether our models are biased: we should search for known, but also for unexpected, bias.
● Explainable AI plays an important role here.
Experiment
● A simple classifier for toxic comments
● Example: "Stupid peace of shit stop deleting my stuff asshole go die and fall in a hole go to hell!"
● Visualize the classifier’s features and their weights, to see which words drive its predictions.
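For a linear bag-of-words classifier, inspecting the features and their weights can be as simple as sorting the learned weights. This sketch trains a tiny logistic regression with plain stochastic gradient descent on six made-up comments. The data, all function names, and the placeholder token `group_a` are hypothetical. `group_a` stands in for an identity term that happens to occur only in toxic training comments. The point is that such a term can end up with a positive "toxicity" weight purely because of the training data.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def featurize(text):
    # Binary bag-of-words features: the set of lowercased tokens.
    return set(text.lower().split())

def train_logreg(texts, labels, epochs=100, lr=0.5):
    """Stochastic gradient descent for binary logistic regression."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = featurize(text)
            p = sigmoid(bias + sum(weights[f] for f in feats))
            err = p - y  # gradient of the log loss w.r.t. the logit
            bias -= lr * err
            for f in feats:
                weights[f] -= lr * err
    return dict(weights), bias

def explain(text, weights):
    """Per-word contributions to one prediction, largest first (unseen words: 0)."""
    return sorted(((f, weights.get(f, 0.0)) for f in featurize(text)),
                  key=lambda fw: fw[1], reverse=True)

# Hypothetical toy data (1 = toxic, 0 = not toxic).
texts = [
    "you are stupid",
    "go away idiot",
    "group_a people are idiots",
    "thanks for the edit",
    "nice work on this article",
    "thanks for the help",
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logreg(texts, labels)
# A harmless comment still gets positive contributions from 'group_a' and 'people'.
print(explain("group_a people are welcome here", weights))
```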
Step 2: Fixing and avoiding bias
Training data → Training process → Model
Training data: ensure the training data is as free of bias as possible.
Bias in annotation
● Inform annotators about possible confounding factors, such as dialect.
● Example: when annotators are told that a tweet is written in African American English, they are less likely to label it as offensive (Sap et al. 2019).
Bias in text
● If you create a new corpus, ensure your texts contain as little bias as possible.
● If you use existing data, try to mitigate its biases through data augmentation, over- and/or undersampling, etc.
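One concrete form of data augmentation against gender bias is counterfactual augmentation: add a copy of each training text with its gendered words swapped, so the model sees both versions with the same label. This sketch assumes a tiny, hypothetical swap list. "her" is ambiguous (his or him), and a plain word list cannot resolve that. Oversampling is shown for class labels; undersampling would instead drop examples from the larger groups.

```python
import random
import re

# Hypothetical, deliberately small swap list; real lists are much longer.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her", "her": "his",  # 'her' is ambiguous
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

def gender_swap(text):
    """Counterfactual copy of `text` with gendered words swapped, keeping capitals."""
    def swap(match):
        word = match.group(0)
        repl = SWAPS.get(word.lower())
        if repl is None:
            return word
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"[A-Za-z]+", swap, text)

def augment(dataset):
    """Add a gender-swapped copy of every (text, label) pair that actually changes."""
    extra = [(gender_swap(t), y) for t, y in dataset if gender_swap(t) != t]
    return dataset + extra

def oversample(dataset, seed=0):
    """Randomly duplicate examples of smaller classes until all classes are equally large."""
    rng = random.Random(seed)
    by_label = {}
    for text, y in dataset:
        by_label.setdefault(y, []).append((text, y))
    target = max(len(v) for v in by_label.values())
    result = []
    for examples in by_label.values():
        result.extend(examples)
        result.extend(rng.choice(examples) for _ in range(target - len(examples)))
    return result

print(gender_swap("He is a doctor and his son is a nurse."))
# → She is a doctor and her daughter is a nurse.
```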
Training data → Training process → Model
Training process: pick a training procedure that makes the system blind to bias.
Adversarial training
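Train your model to excel at your task, but to fail at predicting “protected variables” such as gender or race. An adversary tries to predict the protected variable from the model’s internal representation, and the model is trained to make that adversary fail.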
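One common way to formalize adversarial training is a two-player objective; the slide itself gives no formula, so this is an assumption. Let $f_\theta$ be the model's encoder and $g_\psi$ its task head, with loss $\mathcal{L}_{\text{task}}$ on label $y$. Let $a_\phi$ be an adversary with loss $\mathcal{L}_{\text{adv}}$ for predicting the protected variable $z$ from the encoder output. The weight $\lambda > 0$ trades off the two goals. All symbols are introduced here for illustration.

```latex
\begin{aligned}
\phi^\ast &= \arg\min_{\phi}\; \mathcal{L}_{\text{adv}}\big(a_\phi(f_\theta(x)),\, z\big) \\
(\theta^\ast, \psi^\ast) &= \arg\min_{\theta,\psi}\; \mathcal{L}_{\text{task}}\big(g_\psi(f_\theta(x)),\, y\big)
  \;-\; \lambda\, \mathcal{L}_{\text{adv}}\big(a_{\phi^\ast}(f_\theta(x)),\, z\big)
\end{aligned}
```

The adversary predicts $z$ as well as it can. The encoder is rewarded when the adversary's loss is high, that is, when its representation carries little usable information about $z$. In practice the two updates alternate, or a gradient reversal layer flips the sign of the adversary's gradient before it reaches the encoder.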
Training data → Training process → Model
Model: change the weights of the model so that the bias is reduced.
Word embeddings
Transform the embeddings so that bias is removed.
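A well-known transformation of this kind is the "neutralize" step of hard debiasing (Bolukbasi et al. 2016). It estimates a gender direction and removes each word vector's component along it. This minimal sketch assumes made-up 3-dimensional vectors and estimates the direction from a single he/she pair; real implementations typically use several word pairs. It removes bias along that one direction only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def neutralize(vec, direction):
    """Remove the component of `vec` that lies along `direction`."""
    d = normalize(direction)
    proj = dot(vec, d)
    return [x - proj * di for x, di in zip(vec, d)]

# Hypothetical toy embeddings (not from a real model).
emb = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "programmer": [0.4, 0.1, 0.9],
}

# Gender direction estimated from one pair of definitional words.
gender_dir = [a - b for a, b in zip(emb["he"], emb["she"])]
debiased = neutralize(emb["programmer"], gender_dir)
print(debiased)  # → [0.0, 0.1, 0.9]
```

After this step, "programmer" is equally similar to "he" and "she" along the estimated direction. Any gender information spread over other dimensions is left untouched.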
Pre-trained models
Fine-tune on non-biased data, so that the models “forget” their bias.
None of these methods is foolproof:
● You need to be aware of a bias before you can remove it.
● Often only “superficial” bias is removed, while deeper bias remains (Gonen and Goldberg 2019).
As AI developers, it is our responsibility to deploy our systems in such a way that potentially harmful side effects are minimized, for example through:
● Effective feedback loops
● Human-in-the-loop AI
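A minimal human-in-the-loop set-up routes uncertain predictions to a person and keeps that person's decisions as feedback for the next training round. In this sketch, the `ReviewQueue` class and its thresholds (0.2 and 0.8) are hypothetical. In practice, thresholds would be tuned on validation data and on the cost of each type of error.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ReviewQueue:
    """Hypothetical human-in-the-loop wrapper around a classifier's scores."""
    low: float = 0.2
    high: float = 0.8
    pending: List[Tuple[str, float]] = field(default_factory=list)
    corrections: List[Tuple[str, int]] = field(default_factory=list)

    def route(self, text: str, p_toxic: float) -> str:
        # Confident predictions are handled automatically...
        if p_toxic >= self.high:
            return "auto: toxic"
        if p_toxic <= self.low:
            return "auto: ok"
        # ...uncertain ones are queued for a human reviewer.
        self.pending.append((text, p_toxic))
        return "human review"

    def record_decision(self, text: str, label: int) -> None:
        # Human decisions are stored so they can be added to the training data.
        self.corrections.append((text, label))

queue = ReviewQueue()
print(queue.route("some borderline comment", 0.55))  # → human review
queue.record_decision("some borderline comment", 0)
```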
http://www.nlp.town yves@nlp.town
Thanks! Questions?
He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processing), presented by Yves Peirsman, CTO at NLP Town