He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processing), presented by Yves Peirsman, CTO at NLP Town


The document discusses the identification and mitigation of bias in natural language processing (NLP), highlighting the significance of training data and model integration. It outlines steps for recognizing bias using explainable AI and emphasizes the importance of ensuring that training data is free from bias while employing techniques such as adversarial training and adjusting embeddings. Ultimately, the document underscores the responsibility of AI developers to minimize harmful side effects in deployed systems.


Finding and Fixing Bias
in Natural Language Processing
Yves Peirsman

A primer in NLP
Artificial Intelligence
Natural Language Processing
● Machine translation
● Sentiment analysis
● Information retrieval
● Information extraction
● Text classification

We provide consultancy for companies that need guidance in the NLP domain.
We develop software and train custom NLP models for challenging or domain-specific applications.

NLP Town
Training data → Training process → Model
● We help annotate training data.
● We train models for NLP applications.
● We integrate models with workflows.
● We provide consultancy for NLP projects.

A primer in NLP
Training data → Training process → Model

Word Embeddings
Word embeddings allow NLP models to generalize better.

Word embeddings capture both general and linguistic knowledge.

Word embeddings also encode bias:
● Man is to king as woman is to ___.
● Man is to programmer as woman is to ___.
Experiment:
● Measure the similarity between occupations and
○ A set of “male” words: man, son, father, he, him, etc.
○ A set of “female” words: woman, daughter, mother, she, her, etc.
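The experiment above can be sketched in a few lines. This is only an illustration under stated assumptions: the 3-dimensional vectors are invented for the example (a real run would load pretrained embeddings), and the helper names `mean_similarity` and `gender_bias_score` are hypothetical, not taken from the talk.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real experiment would load pretrained word embeddings instead.
EMBEDDINGS = {
    "man": [1.0, 0.1, 0.0],
    "he": [0.9, 0.2, 0.1],
    "father": [0.8, 0.0, 0.3],
    "woman": [-1.0, 0.1, 0.0],
    "she": [-0.9, 0.2, 0.1],
    "mother": [-0.8, 0.0, 0.3],
    "programmer": [0.4, 0.9, 0.1],
    "nurse": [-0.5, 0.8, 0.2],
}

MALE_WORDS = ["man", "he", "father"]
FEMALE_WORDS = ["woman", "she", "mother"]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, attribute_words, embeddings):
    """Average cosine similarity between a word and a set of attribute words."""
    sims = [cosine(embeddings[word], embeddings[a]) for a in attribute_words]
    return sum(sims) / len(sims)


def gender_bias_score(word, embeddings=EMBEDDINGS):
    """Positive: closer to the male set. Negative: closer to the female set."""
    return (mean_similarity(word, MALE_WORDS, embeddings)
            - mean_similarity(word, FEMALE_WORDS, embeddings))


for occupation in ["programmer", "nurse"]:
    print(occupation, round(gender_bias_score(occupation), 3))
```

A score near zero would suggest that the occupation is about equally close to both word sets. With real embeddings the word lists would normally be longer, as on the slide.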

Pretrained NLP models
Pretrained language models are a significant recent breakthrough in NLP:
● Language models predict masked words.
● They learn a lot about language.
● This knowledge can be reused in “downstream” tasks.
Example sentences:
● This movie won her an Oscar for best actress.
● The keys to the house are on the table.

Figure: ULMFiT (Howard and Ruder 2018)

Pretrained language models
Experiment: association with a large number of positive adjectives
● One of several recent Dutch BERT models
● Association between 240 positive adjectives and hij/zij (he/she):
○ aantrekkelijk (attractive), ambitieus (ambitious), intelligent (intelligent), slim (smart), knap (handsome, clever), nauwkeurig (precise), nieuwsgierig (curious), etc.
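One way such an association could be measured is to mask the pronoun in a fixed template and compare the probabilities the model assigns to "hij" and "zij". The sketch below assumes this setup; it is not necessarily how the experiment in the talk was run. The function `toy_mask_probs` is a hypothetical stand-in for a real masked language model, and its numbers are invented and say nothing about any actual model.

```python
import math


def toy_mask_probs(sentence):
    """Hypothetical stand-in for a masked language model.

    Returns made-up probabilities for candidate fillers of the [MASK] slot.
    Replace this with calls to a real model to run the actual experiment.
    """
    table = {
        "[MASK] is ambitieus.": {"hij": 0.30, "zij": 0.10},
        "[MASK] is aantrekkelijk.": {"hij": 0.08, "zij": 0.24},
        "[MASK] is nieuwsgierig.": {"hij": 0.15, "zij": 0.15},
    }
    return table[sentence]


def pronoun_log_ratio(adjective, mask_probs, template="[MASK] is {adj}."):
    """log P(hij) - log P(zij): above 0 leans 'hij', below 0 leans 'zij'."""
    probs = mask_probs(template.format(adj=adjective))
    return math.log(probs["hij"]) - math.log(probs["zij"])


def association_report(adjectives, mask_probs):
    """Map each adjective to its pronoun log-ratio."""
    return {adj: pronoun_log_ratio(adj, mask_probs) for adj in adjectives}


report = association_report(
    ["ambitieus", "aantrekkelijk", "nieuwsgierig"], toy_mask_probs
)
for adj, score in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{adj:15s} {score:+.2f}")
```

Averaging the log-ratio over all 240 adjectives would give a single summary number, while the per-adjective scores show where the model leans most strongly.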

Step 1: Identify bias with explainable AI
Challenge
● First we need to find out whether our models are biased: search for known, but also unexpected, bias
● An important role for explainable AI
Experiment
● A simple classifier for toxic comments
● Example: "Stupid peace of shit stop deleting my stuff asshole go die and fall in a
hole go to hell!"

● Visualize the classifier features and their weights:
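The slide's figure is not preserved in this transcript. As a rough, self-contained illustration of what inspecting features and weights can look like, the sketch below trains a tiny bag-of-words logistic regression and lists its most influential tokens. The dataset is invented, and the names `train_logreg` and `top_features` are hypothetical. The classifier used in the talk may well have been built differently.

```python
import math
from collections import Counter

# Tiny invented dataset: (comment, label) with 1 = toxic, 0 = not toxic.
EXAMPLES = [
    ("you are stupid", 1),
    ("stupid idiot go away", 1),
    ("you idiot stop editing", 1),
    ("thanks for editing the page", 0),
    ("great work on the page", 0),
    ("you are great thanks", 0),
]


def tokenize(text):
    return text.lower().split()


def train_logreg(examples, epochs=200, lr=0.5):
    """Bag-of-words logistic regression trained with plain SGD."""
    weights = {tok: 0.0 for text, _ in examples for tok in tokenize(text)}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            counts = Counter(tokenize(text))
            z = bias + sum(weights[tok] * n for tok, n in counts.items())
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - label  # gradient of the log loss w.r.t. z
            bias -= lr * error
            for tok, n in counts.items():
                weights[tok] -= lr * error * n
    return weights, bias


def top_features(weights, k=5):
    """Features with the largest absolute weight, most influential first."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


weights, bias = train_logreg(EXAMPLES)
for token, weight in top_features(weights):
    print(f"{token:10s} {weight:+.2f}")
```

In a bias audit, the interesting cases are tokens that carry a large weight without being toxic in themselves, for example identity terms or dialect markers that happen to co-occur with toxic labels in the training data.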


Step 2: Fixing and avoiding bias
Training data → Training process → Model

Training data: ensure the training data is free of bias.

Bias in annotation
Inform annotators about possible confounding factors, such as dialect.
● Example: if people are informed that a tweet contains African American English dialect, they are less likely to label it as offensive (Sap et al. 2019).
Bias in text
● If you create a new corpus, ensure your texts contain as little bias as possible.
● If you use existing data, try mitigating biases through data augmentation, over- and/or undersampling, etc.
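The data-side mitigations named above, augmentation and oversampling, can be sketched as follows. This is a minimal illustration under assumptions: the swap list is deliberately tiny, it skips ambiguous words such as "her" and "his", and the helper names `gender_swap`, `augment` and `oversample` are hypothetical.

```python
import random
import re
from collections import Counter

# Small, deliberately incomplete swap list. Ambiguous forms such as
# "her" (him/his) are left out to keep the sketch simple.
PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("son", "daughter")]
SWAPS = {}
for a, b in PAIRS:
    SWAPS[a] = b
    SWAPS[b] = a


def gender_swap(text):
    """Replace each gendered word by its counterpart, keeping capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", repl, text)


def augment(examples):
    """Add a gender-swapped copy of every example that contains a gendered word."""
    extra = [(gender_swap(t), y) for t, y in examples if gender_swap(t) != t]
    return examples + extra


def oversample(examples, group_of, seed=0):
    """Resample smaller groups with replacement until all groups are equal in size."""
    rng = random.Random(seed)
    groups = {}
    for ex in examples:
        groups.setdefault(group_of(ex), []).append(ex)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


data = [
    ("He is a programmer", "tech"),
    ("She is a nurse", "care"),
    ("The meeting is at noon", "other"),
]
augmented = augment(data)
balanced = oversample(augmented, group_of=lambda ex: ex[1])
print(Counter(label for _, label in balanced))
```

Swapping gendered words is only one kind of augmentation, and oversampling can just as well balance groups other than the label, such as dialect or author demographics, when that information is available.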

Training process: pick a training procedure that makes the system blind to bias.

Adversarial training
Train your model to shine at your task, but to fail at predicting “protected variables”, such as gender or race.
[Diagram labels: CV, Model]
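One common way to write this idea down is as a pair of objectives over a shared encoder. The formulation below is a sketch, not necessarily the exact setup used in the talk. The symbols are assumptions introduced here: an encoder E, a task head T, an adversary head A, a task label y, a protected variable z, and a trade-off weight λ.

```latex
% E: shared encoder, T: task head, A: adversary head (hypothetical names)
% y: task label, z: protected variable, \lambda > 0: trade-off weight
\begin{align*}
\min_{\theta_A} \;& \mathcal{L}_{\text{adv}}\bigl(A(E(x)),\, z\bigr) \\
\min_{\theta_E,\, \theta_T} \;& \mathcal{L}_{\text{task}}\bigl(T(E(x)),\, y\bigr)
  \;-\; \lambda\, \mathcal{L}_{\text{adv}}\bigl(A(E(x)),\, z\bigr)
\end{align*}
```

The adversary tries to recover the protected variable from the encoded input, while the encoder is rewarded for making that prediction fail without hurting the main task. A larger λ puts more weight on hiding the protected variable and usually costs some task accuracy.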

Model: change the weights of the model so that the bias is reduced.

Word embeddings
Transform the embeddings so that bias is removed.
Pretrained models
Fine-tune on unbiased data, so that the models “forget” their bias.
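For word embeddings, one well-known transformation estimates a "gender direction" from word pairs and projects it out of the other vectors. The sketch below assumes that approach, which may differ from the one the speaker had in mind. The vectors are invented toy values, and the names `bias_direction` and `neutralize` are hypothetical.

```python
import math

# Toy vectors, invented for illustration only.
EMBEDDINGS = {
    "man": [1.0, 0.2, 0.1],
    "woman": [-0.8, 0.3, 0.0],
    "he": [0.9, 0.1, 0.2],
    "she": [-0.9, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.1],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def bias_direction(embeddings, pairs):
    """Unit vector along the average difference between paired words."""
    diffs = [
        [a - b for a, b in zip(embeddings[m], embeddings[f])] for m, f in pairs
    ]
    mean = [sum(column) / len(diffs) for column in zip(*diffs)]
    return normalize(mean)


def neutralize(vector, direction):
    """Remove the component of `vector` that lies along `direction`."""
    projection = dot(vector, direction)
    return [x - projection * d for x, d in zip(vector, direction)]


direction = bias_direction(EMBEDDINGS, [("man", "woman"), ("he", "she")])
before = dot(EMBEDDINGS["programmer"], direction)
after = dot(neutralize(EMBEDDINGS["programmer"], direction), direction)
print(f"component along the gender direction: {before:.3f} -> {after:.3f}")
```

Projecting out a single direction only removes the bias that this direction captures, so a check like the one above can pass while other traces of the bias remain in the vectors.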

None of these methods is foolproof:
● You need to be aware of the bias before you can remove it.
● Often only “superficial” bias is removed, but deeper bias remains (Gonen and Goldberg 2019).
As AI developers, it is our responsibility to deploy our systems in such a way that potentially harmful side effects are minimized.
● Effective feedback loops
● Human-in-the-loop AI

Thanks! Questions?
http://www.nlp.town yves@nlp.town
