IC-SDV 2019
April 9-10, 2019
Nice, France
Addressing requirements for
real-world deployments of ML & NLP
Stefan Geißler, Kairntech
Agenda
Looking back: the NLP landscape has changed
dramatically
Algorithms → Data!
Support dataset creation: The Kairntech Sherpa
Kairntech? Who are we
Conclusion
Looking back: the NLP landscape has changed
2000:
Very few open source components
Lexicons, Taggers, Morphology,
Parsers mostly proprietary, complex to
install and maintain, limited coverage
« Make or Buy »
High level of manual effort in
creating and maintaining lexical
knowledge bases, rule systems
Today
2019:
Sharing! (GitHub, …)
Lexicons, Taggers, Morphology,
Parsers often in the public domain
« Combine & Adapt »
Broad success of learning-based
approaches
2019: A tipping point in ML & NLP?
 « 2018 was the “ImageNet moment” for deep learning in NLP » (S.
Ruder)
 In image processing, a deep learning network won a public
contest by a large margin in 2012. In 2018 we saw exciting NLP
models implementing transfer learning: ELMo, ULMFiT, BERT
 « ML Engineering in NLP will truly blossom in 2019 » (E. Ameisen)
 Focus on Tools beyond model building! Link NLP/AI to production
use! What does it mean to build data-driven products and
services?
 « Enough papers: Let’s build AI now! » (A. Ng, 2017)
 « AI is the new electricity! »
Example: Named Entity Recognition
Cf.
https://www.researchgate.net/publication/329933780_A_Survey_on_Deep_Learning_for_Named_Entity_Recognition/download
Many, if not most, of these approaches are available with code
NLP: A commodity?
Named entity recognition in four steps:
$ pip install spacy
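$ python -m spacy download en_core_web_sm
$ cat > testspacy.py
import spacy
# Load the small English pipeline ("en" was a shortcut for it in spaCy 2.x)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Angela Merkel will meet Emmanuel Macron at the summit in Amsterdam")
# Print the text of every entity the model recognized
for entity in doc.ents:
    print(entity.text)
CTRL-D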
$ python testspacy.py
Angela Merkel
Emmanuel Macron
Amsterdam
Algorithms are a commodity
Even the top-scoring system from the list earlier is available on GitHub:
https://github.com/zalandoresearch/flair
For the record:
The survey does not list DeLFT (
https://github.com/kermitt2/delft),
implemented by Kairntech's chief
ML expert, which
•Also scores exactly 93.09% on
CoNLL-2003
•Creates models that are very
compact (~5 MB vs. >150 MB)
•Loads a model in ~2 sec at initialization
Nice and easy
But…
Pain points
 Off-the-shelf NLP models often don’t work for
specific needs
 Implementation is slowed down by the need to
build specific training datasets
 AI/NLP services often require integration of
business glossaries & knowledge graphs
 Absence of maintenance leads to quality degradation
Frequent requirements in real-world projects
 In many commercial scenarios around entity extraction, an entity not only has to be
recognized but also typed
 A DATE in a contract may be the date when the contract becomes effective,
when it was signed, when it will be terminated
 A PERSON in a legal opinion may be the defendant, the lawyer, the judge, the
witness …
 A DISEASE in a clinical study may be the core therapeutic area or a peripheral
occasional adverse event
 This goes beyond what the public named entity recognition modules provide
 Typically, for these decisions no training corpora exist. They must be established
within a project.
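The difference between recognizing an entity and typing it can be made concrete with a small data structure. What follows is only a minimal sketch, not Kairntech's data model: the class `TypedEntity`, its fields, the helper functions and the role labels `SIGNATURE_DATE` and `EFFECTIVE_DATE` are hypothetical names chosen to mirror the DATE example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypedEntity:
    text: str                   # surface string found in the document
    start: int                  # character offset where the mention begins
    end: int                    # character offset just past the mention
    label: str                  # coarse type, as a public NER model would return it
    role: Optional[str] = None  # project-specific subtype; None until annotated


def with_role(entity: TypedEntity, role: str) -> TypedEntity:
    """Return a copy of the entity carrying a project-specific role."""
    return TypedEntity(entity.text, entity.start, entity.end, entity.label, role)


contract = "Signed on 3 May 2018, this agreement takes effect on 1 June 2018."


def find(mention: str, label: str) -> TypedEntity:
    """Locate a mention in the sample contract and wrap it as an entity."""
    start = contract.index(mention)
    return TypedEntity(mention, start, start + len(mention), label)


# Both mentions share the coarse label DATE; only the role tells them apart.
signed = with_role(find("3 May 2018", "DATE"), "SIGNATURE_DATE")
effective = with_role(find("1 June 2018", "DATE"), "EFFECTIVE_DATE")

for ent in (signed, effective):
    print(ent.label, ent.role, contract[ent.start:ent.end])
```

A generic NER module could at best fill in `label`; the `role` field is the part for which a project typically has to create its own training data.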
You don’t have to take my word on that.
Let’s listen to what the experts say:
 Algorithms are a commodity, data is gold
Peter Norvig:
“We [at Google] don't have
better algorithms than anyone
else; we just have more data!”
“More data beats clever
algorithms.”
Angela Merkel:
“Data is the new oil of the 21st
century!”
So: We need data, not only algorithms
Charts copied from https://hackernoon.com/%EF%B8%8F-big-challenge-in-deep-learning-training-data-31a88b97b282
Requirements
What will be more important for
the success of your project?
Driving the training accuracy from, say,
92.4% to 93.6% on a pre-defined data set?
or
ML components that allow high quality with
small training sets and moderate annotation and
training time?
Example
 The CoNLL-2003 data set used in many academic NER
experiments contains >100,000 entities
 Assume 30 sec per entity → 100 person days of pure annotation
time! (With a single annotator)
Unrealistic in most commercial project settings.
Commercial projects have requirements that are different
from academic research!
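The estimate above can be checked with a few lines of arithmetic. This is a rough sketch: the 8-hour working day is an assumption that the slide does not state, and the function name `annotation_person_days` is made up for illustration.

```python
def annotation_person_days(n_entities: int,
                           seconds_per_entity: float,
                           hours_per_day: float = 8.0) -> float:
    """Pure annotation time in person-days for a single annotator."""
    total_hours = n_entities * seconds_per_entity / 3600
    return total_hours / hours_per_day


# Figures from the slide: 100,000 entities at 30 seconds each.
days = annotation_person_days(100_000, 30)
print(f"{days:.0f} person-days")
```

Under that assumption the result is about 104 person-days, in line with the rounded figure of 100 given above.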
On dataset preparation: Requirements
Web-based (no install), intuitive GUI, usable by domain experts
Limit manual annotation efforts: Active Learning
Collaboration (work in teams, measure inter-annotator agreement)
Not just NER annotation: Entity typing, document categorization, …
Must facilitate deployment-to-production
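Inter-annotator agreement, mentioned in the list above, is often reported as Cohen's kappa. The slides do not say which metric is meant, so the sketch below is an assumption: it shows one common choice for two annotators, with made-up labels.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators on the same six mentions.
annotator_1 = ["PER", "PER", "LOC", "ORG", "LOC", "PER"]
annotator_2 = ["PER", "ORG", "LOC", "ORG", "LOC", "PER"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on five of six items (0.83 observed agreement) while chance agreement is 0.33, which gives a kappa of 0.75.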
Why another tool?
 WebAnno:
 Scientific focus: « Annotate corpora to allow the study of
linguistic phenomena »
 Sentence-based, losing all layout information
 spaCy/Prodigy:
 Focus on local/lexical named entity recognition. By default, the underlying
model considers a narrow window of n (n=4) words
left and right.
 Brat:
 Interface-only. Integration with model building, semi-automatic
suggestions, deployment?
Kairntech Sherpa
[Diagram: annotation environment connecting raw or preannotated corpora (text, audio, …), the user, curated annotations, the ML model and its automatic annotation suggestions; output: datasets and ML models]
Search, collaboration, manual &
assisted annotation, quality
metrics, synchronisation into the ML
model
Active Learning?
 Reduce effort in manual annotation of data by presenting the user with data in
some informed order:
 Ask the user for feedback on the samples that promise the highest benefit:
Samples that are least certain*
(*) Diagrams used from datacamp.com
 Active Learning applied to NLP tasks has been shown to reduce the amount of
required training data dramatically
 7% of the samples, selected under an AL regime, yield the same quality as the full set under naive selection
(cf. Laws 2012: https://d-nb.info/1030521204/34)
 In a project that would mean 1 day of annotation instead of 14 days
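Selecting the "least certain" samples can be sketched as least-confidence sampling. This is only one of several uncertainty criteria and is not necessarily what the Sherpa implements; the function name and the probability values are hypothetical.

```python
def least_confident(probabilities, k):
    """Return indices of the k samples whose top class probability is lowest.

    probabilities: one list of class probabilities per unlabelled sample.
    """
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical model output for four unlabelled samples and three classes.
pool = [
    [0.98, 0.01, 0.01],  # model is nearly certain
    [0.40, 0.35, 0.25],  # model is unsure
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # model is least certain
]
to_annotate = least_confident(pool, k=2)
print(to_annotate)
```

The two samples returned are the ones a human annotator would be asked about first; the nearly certain sample is left for later or never shown.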
Benefits of AL?
 Accuracy on a
(simple) ML task grows as the number
of samples grows
 Naive selection (« Random »,
orange line) grows slowly
 Informed selection (« QBC »,
« query by committee », red
line) grows much faster
 AL promises to reduce effort
required for manual
annotation
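Query by committee can be illustrated with vote entropy: several models label each sample, and the samples with the most disagreement are sent to the annotator first. The sketch below assumes a committee of three models and made-up votes; it does not reproduce the experiment behind the chart.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the labels a committee assigned to one sample."""
    counts = Counter(votes)
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def query_by_committee(committee_votes, k):
    """Return indices of the k samples the committee disagrees on most."""
    scores = [vote_entropy(v) for v in committee_votes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical votes of a three-model committee on four unlabelled samples.
votes = [
    ["PER", "PER", "PER"],  # full agreement
    ["PER", "ORG", "LOC"],  # full disagreement
    ["LOC", "LOC", "ORG"],  # partial disagreement
    ["ORG", "ORG", "ORG"],  # full agreement
]
selected = query_by_committee(votes, k=2)
print(selected)
```

Samples on which all committee members agree get an entropy of zero and end up at the bottom of the ranking.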
A non-expert workflow for dataset creation
Ask the
application for
suggestions
(De-) validate
and retrain
Once satisfied,
export/deploy
About Kairntech
 Kairntech: The company
 Created in Dec. 2018, 10 partners
 France (Paris & Grenoble/Meylan), Germany
(Heidelberg)
 Kairntech: The team
 Background in Software engineering, Machine
Learning, Sales, Management
 15+ years of experience in NLP development and
deployment from Xerox, IBM, TEMIS. Development of
components currently in production at CERN, NASA,
EPO, …
Kairntech: Our profile
 Industrialize the creation of document sets (training
corpora) by offering an environment for data
preparation by domain experts that is easy and efficient to use
 Transform data sets into document analysis
services, adding value to enterprise knowledge
repositories (e.g. knowledge graphs)
 Industrial deployment and maintenance of these services.
Kairntech: Our offering
Conclusions
 So much data!
 But very little of it labelled and useful for supervised learning
 So many pretrained models!
 But most of the time they do not quite do what you need in
your project
 So many algorithms!
 But a library alone will not allow you to implement the solution
you need
 Kairntech is there to support you!
Thank you for your attention!
Stefan.Geissler@kairntech.com

Stefan Geissler kairntech - SDC Nice Apr 2019


Editor's Notes

  • #9: Attention: Numbers are not always comparable! Are the models trained with or without the validation set? Are the numbers the best of a set of n experiments? Or the average of n experiments? We have spent some effort in redoing the experiments reported in the literature and there are often slight variations. This does not mean that there is dishonesty involved!! But it means that when results are within a few tenths of a percent, the question “which approach is best” becomes blurry.
  • #14: https://blog.floydhub.com/ten-trends-in-deep-learning-nlp/ What does it mean for me? Can this research be applied to everyday applications? Or is the underlying technology still evolving so rapidly that it is not worth investing time developing an approach which may be considered obsolete with the next research paper?
  • #18: Also doccano, talen,