Less is More.
Working with Less Data in Natural Language Processing
Yves Peirsman
Artificial Intelligence ⊃ Natural Language Processing
Natural Language Processing covers tasks such as:
● machine translation
● sentiment analysis
● information retrieval
● information extraction
● text classification
We provide consultancy for companies that need guidance in the NLP domain and/or would like to develop their AI software in-house.
We develop software and train custom NLP models for challenging or domain-specific applications.
Example projects
● Sentiment analysis from tweets
● NER and text classification for personalization
Example projects
● Document parsing
● Text generation: from structured product attributes to a fluent description
Input:
“type”: “dress”, “color”: “red”, “length”: “knee-length”, “sleeve-length”: “short”, “style”: “60s-style”
Output:
We are selling this knee-length dress. Its 60s-style look and red color will completely win you over. With its short sleeves, it is perfect for long summer evenings. With one click, this fantastic dress can be yours.
Age of Big Data
We live in the age of big data.
● Enormous amounts of text are created every day: e-mails, tweets, text messages, blogs, research papers, news articles, legislation, books, etc.
● This holds great promise for NLP:
○ We need NLP to uncover the information in these texts.
○ We can use these texts as training data.
Transfer Learning
The problem
● Machine Learning is data-hungry.
● Labelling training data is difficult, time-consuming and expensive.
● This limits the application of NLP in low-resource domains or languages.
⇒ How can we train accurate Machine Learning models with little data?
The solution: Transfer Learning
Re-use the knowledge gained while solving one problem and apply it to a new, related problem.
Pretrained task-specific models
Benefit from pretrained models.
● For many tasks, pretrained models are available that were trained on data different from yours.
● These models can often be finetuned on your data.
● Example: spaCy’s generic Dutch NER model, finetuned on a limited set of financial news articles (see the sketch below).
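As a rough illustration of this kind of finetuning, here is a minimal sketch using the spaCy v2 API that was current at the time of this talk. nl_core_news_sm is spaCy's pretrained Dutch pipeline; the two training sentences and their entity offsets are made-up examples, not data from the actual project.

```python
import random
import spacy

# Load spaCy's pretrained Dutch pipeline, which includes a generic NER model.
nlp = spacy.load("nl_core_news_sm")

# Hypothetical financial-news examples: (text, character offsets, entity label).
TRAIN_DATA = [
    ("KBC rapporteert een hogere winst.", {"entities": [(0, 3, "ORG")]}),
    ("Argenta opent een nieuw kantoor in Leuven.",
     {"entities": [(0, 7, "ORG"), (35, 41, "LOC")]}),
]

# Resume training from the pretrained weights and update only the NER component.
optimizer = nlp.resume_training()
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):
    for epoch in range(10):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], sgd=optimizer, losses=losses)
        print(epoch, losses)
```

Because the model starts from pretrained weights rather than from scratch, a limited set of annotated articles can already move it meaningfully towards the new domain.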
From task-specific to generic models
Pretrained task-specific models
● are only useful for classic NLP tasks,
● are not available for custom tasks and smaller languages,
● still require lots of labelled training data.
Pretrained generic models
● are useful for virtually any NLP task,
● are easy to obtain for smaller languages,
● should require unlabelled data only.
From task-specific to generic models
Solution: language models predict a word on the basis of its context.
● Texts are self-labelled for language modelling tasks.
● Language models need knowledge of word meaning, syntax, co-reference, etc.
● This generic knowledge can be reused for specific NLP tasks.
Example contexts:
This movie won her an Oscar for best actress.
The keys to the house are on the table.
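To make the self-labelling idea concrete, here is a small illustration, assuming the Hugging Face transformers library (the talk itself does not prescribe a toolkit): mask a word in one of the example contexts and let a pretrained model predict it from the remaining words.

```python
from transformers import pipeline

# A pretrained masked language model predicts the hidden word from its context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The keys to the house are on the [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Plausible completions such as “table” show that the model has picked up word meaning and syntax from raw, unlabelled text, which is exactly the generic knowledge we want to reuse.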
From task-specific to generic models
Pre-trained language models can be finetuned for new NLP tasks.
ULMFiT (Howard and Ruder 2018)
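A minimal sketch of what such finetuning looks like in practice, assuming the Hugging Face transformers library and a BERT-style encoder (the deck shows no code; the two toy reviews are borrowed from the next slide):

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumption: any BERT-style pretrained model works here.
MODEL = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

texts = ["Good value for money. Recommended.",
         "Sound quality is a disaster."]
labels = [1, 0]  # 1 = positive, 0 = negative

class ReviewDataset(torch.utils.data.Dataset):
    """Wraps tokenized reviews and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-model", num_train_epochs=3),
    train_dataset=ReviewDataset(texts, labels),
)
trainer.train()
```

Only the small classification head is trained from scratch; the pretrained encoder weights are merely adjusted, which is why far fewer labelled examples are needed.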
Experiment: Sentiment Analysis
Sentiment analysis:
● distinguish between positive (four/five stars) and negative (one/two stars) product reviews (cf. Pang, Lee and Vaithyanathan 2002)
● 6 languages: English, Dutch, French, German, Italian and Spanish
● 1,000 training, 1,000 development and 1,000 test examples
● 50% positive, 50% negative
“This is a crap product. Not sure how plantronics labelled it a $50 headphones. Sound quality is a disaster.”
“Good value for money. Can't complain. Beats the stuff at regular stores. Recommended.”
Experiment: Models
Baseline: spaCy
● One of the most popular open-source NLP libraries
● Pre-trained parsing, part-of-speech tagging and NER models
● Allows users to train text-classification models based on a convolutional neural network (see the sketch below)
State of the art: BERT
● Popular transfer-learning model, developed by Google
● Pre-trained (mostly) by predicting masked words
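A minimal sketch of the spaCy baseline, again using the v2 API current at the time; the two toy reviews stand in for the real 1,000-example training set.

```python
import random
import spacy

# Build a blank pipeline with a CNN-based text classifier (spaCy v2 API).
nlp = spacy.blank("en")
textcat = nlp.create_pipe(
    "textcat", config={"exclusive_classes": True, "architecture": "simple_cnn"})
nlp.add_pipe(textcat)
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

TRAIN_DATA = [
    ("Good value for money. Recommended.",
     {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Sound quality is a disaster.",
     {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

# Unlike BERT, this model is trained from scratch on the labelled data only.
optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer, losses=losses)
    print(epoch, losses)
```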
First results
● spaCy: accuracy between 79.5% (Italian) and 83.4% (French)
● BERT: accuracy 8.4 percentage points higher, a 45% error reduction
Disadvantages
Transfer-learning models typically have hundreds of millions of parameters. This makes them heavy, slow and challenging to deploy. (source: Hugging Face)
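It is easy to verify how heavy such a model is. The snippet below (my own check, not from the talk) counts the parameters of a multilingual BERT base model with the transformers library.

```python
from transformers import AutoModel

# Download the pretrained encoder and count its trainable parameters.
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
print(f"{sum(p.numel() for p in model.parameters()):,}")  # well over 100 million
```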
Distillation
Options for shrinking these models:
● quantization: reduce the precision of the weights in a model by encoding them in fewer bits
● pruning: remove certain parts of a model completely (connection weights, neurons or even full weight matrices)
● distillation: train a small model to mimic the behaviour of a larger one (sketched below)
Experiment: can we use model distillation to train small spaCy models that rival BERT?
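For concreteness, here is a generic sketch of a distillation objective in PyTorch. This is the classic soft-target formulation of Hinton et al. (2015), not necessarily the exact loss used in these experiments (Tang et al. 2019, cited on the next slide, use a mean-squared error on the logits instead).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened outputs."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The customary T^2 factor keeps gradients comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy example: teacher (BERT) and student (spaCy CNN) logits for two reviews.
teacher_logits = torch.tensor([[3.0, -1.0], [-2.0, 2.5]])
student_logits = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
print(distillation_loss(student_logits, teacher_logits))
```

In the experiments described here, the teacher is the finetuned BERT model and the student is spaCy's much smaller CNN classifier.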
Augmented data
Challenge: distillation requires more than 1,000 labelled examples.
Solution: data augmentation (Tang et al. 2019); a toy sketch follows below.
● mask random words in the training data
○ I like this book ⇒ I [MASK] this book
● replace random words in the training data by another word with the same part of speech
○ I like this book ⇒ I like this screen
● sample a random n-gram of length 1 to 5 from the training example
● sample a random sentence from the training example
Use BERT’s output for 60,000 such examples as spaCy’s training input.
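To make these strategies concrete, here is a toy sketch of the first three. The whitespace tokenization, the probability p and the POS_VOCAB dictionary are all stand-in assumptions; Tang et al. (2019) use proper POS tagging and tuned sampling rates.

```python
import random

# Made-up stand-in for a real POS-indexed vocabulary.
POS_VOCAB = {"NOUN": ["book", "screen", "movie"]}

def mask_words(tokens, p=0.1):
    """Replace each token by [MASK] with probability p."""
    return [token if random.random() > p else "[MASK]" for token in tokens]

def replace_same_pos(tokens, pos_tags, p=0.1):
    """Replace random tokens by another word with the same part of speech."""
    return [random.choice(POS_VOCAB.get(tag, [token]))
            if random.random() < p else token
            for token, tag in zip(tokens, pos_tags)]

def sample_ngram(tokens, min_n=1, max_n=5):
    """Keep only a random n-gram of the training example."""
    n = random.randint(min_n, min(max_n, len(tokens)))
    start = random.randint(0, len(tokens) - n)
    return tokens[start:start + n]

tokens = "I like this book".split()
print(mask_words(tokens, p=0.3))
print(replace_same_pos(tokens, ["PRON", "VERB", "DET", "NOUN"], p=0.3))
print(sample_ngram(tokens))
```

Each augmented example is then labelled by the BERT teacher, so the student never needs extra human annotation.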
spaCy distilled
The distilled spaCy models perform almost as well as the BERT models: an improvement in accuracy of 7.3 percentage points over the spaCy baseline and an error reduction of 39%.
Conclusions
● Transfer learning allows us to train better NLP models with less data.
● Many transfer-learning models are huge and slow.
● For many tasks, you don't need hundreds of millions of parameters to achieve high accuracy.
● Approaches like model distillation allow us to train simpler models that rival more complex ones.
http://www.nlp.town yves@nlp.town
Thanks! Questions?
