Cross-Evaluation of Entity Linking and Disambiguation
Systems for Clinical Text Annotation
Camilo Thorne Stefano Faralli Heiner Stuckenschmidt
Data and Web Science (DWS) Group
Universität Mannheim, Germany
{camilo,stefano,heiner}@informatik.uni-mannheim.de
SEMANTiCS 2016
C. Thorne et al. (UMa) EL for Clinical Text Leipzig, 14.09.2016 1 / 13
Motivation
Low dose pramipexole is neuroprotective in
the MPTP mouse model of Parkinson’s disease
(*)
Problems:
1 identify entities (nouns, noun phrases) within a text;
2 identify or resolve the meaning of such entities within the text by linking
them to a sense repository;
3 resolve the meaning of both domain-specific and generic terms
Question
Are there annotation services capable of both?
Annotators
MetaMap (Aronson and Lang, 2010): clinical domain; sense repository: UMLS; REST service; multilingual; sense: CUI
BabelFly (Moro et al., 2014): general domain; sense repository: BabelNet; REST service; multilingual; sense: babelsynset
TagMe (Ferragina and Scaiella, 2010): general domain; “sense” repository: Wikipedia; REST service; English/Italian; “sense”: Wiki page
WordNet (Lesk) (custom): general domain; sense repository: WordNet 3.0; baseline; English; sense: synset
Problem: the sense repositories are not aligned a priori
Solution: use linked data, in the form of DBpedia (Bizer et al., 2009), as a pivot
(partial mappings)
Note: UMLS can be mapped to DBpedia via Medline and the LinkedLifeData
initiative (Momtchev et al., 2009)
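The pivot idea can be sketched in a few lines. This is only an illustration, not the pipeline used in the experiments: the table `PIVOT` and the functions `to_dbpedia` and `same_entity` are hypothetical names, and the two entries are the pramipexole identifiers from example (*).

```python
from typing import Optional

DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical partial mapping tables, one per sense repository.
# Real tables would be built from the repositories' own exports.
PIVOT = {
    "umls": {"C0074710": DBPEDIA + "Pramipexole"},
    "babelnet": {"bn:03124207n": DBPEDIA + "Pramipexole"},
}

def to_dbpedia(repository: str, sense_id: str) -> Optional[str]:
    """Return the DBpedia URI for a sense ID, or None when no mapping exists."""
    return PIVOT.get(repository, {}).get(sense_id)

def same_entity(a: tuple, b: tuple) -> bool:
    """Two (repository, sense ID) pairs agree when both map to the same URI."""
    uri_a, uri_b = to_dbpedia(*a), to_dbpedia(*b)
    return uri_a is not None and uri_a == uri_b
```

Because the mappings are partial, a lookup may return nothing; the sketch treats such senses as not comparable rather than as mismatches.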
Annotations (Overview)
Use DBpedia as pivot:
Annotator        Sense                Sense ID               DBpedia URI
Clinical (Gold)  pramipexol           C0074710               http://dbpedia.org/resource/Pramipexole
Clinical (Gold)  Parkinson disease    C0030567               http://dbpedia.org/resource/Parkinson_disease
MetaMap          pramipexol           C0074710               http://dbpedia.org/resource/Pramipexole
MetaMap          Parkinson disease    C0030567               http://dbpedia.org/resource/Parkinson_disease
BabelFly         ATC code N04BC05     bn:03124207n           http://dbpedia.org/resource/Pramipexole
TagMe            pramipexole          https://goo.gl/twrSVu  http://dbpedia.org/resource/Pramipexole
TagMe            Parkinson’s disease  https://goo.gl/Xke6W3  http://dbpedia.org/resource/Parkinson's_disease
Annotations for example (*)
SemRep Corpus (Kilicoglu et al., 2011)
Experiments were run over the SemRep corpus
Small annotated clinical corpus
428 clinical excerpts (MedLine/PubMed)
13,948 word tokens
856 UMLS-annotated clinical terms
For each sentence, two noun phrases annotated by clinicians with their
corresponding UMLS CUI
606 terms can be associated with a corresponding DBpedia URI
Example (*) taken from SemRep
Annotation Statistics
# of CUIs in corpus (total) = 856
# of corpus DBpedia URIs = 606
# of resolved corpus URIs = 404
# of MetaMap DBpedia URIs = 343
# of resolved MetaMap URIs = 242
# of BabelFly DBpedia URIs = 432
# of resolved BabelFly URIs = 269
# of TagMe DBpedia URIs = 469
# of resolved TagMe URIs = 320
# of WordNet DBpedia URIs = 182
# of resolved WordNet URIs = 97
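As a reading aid, the counts above can be turned into proportions. The sketch below only does arithmetic on the figures from this slide; the names `counts`, `resolution_rate` and `gold_coverage` are illustrative, and reading "resolved URIs / DBpedia URIs" as a resolution rate is an assumption.

```python
# Counts copied from the slide: (DBpedia URIs, resolved URIs) per source.
counts = {
    "corpus": (606, 404),
    "MetaMap": (343, 242),
    "BabelFly": (432, 269),
    "TagMe": (469, 320),
    "WordNet": (182, 97),
}
TOTAL_CUIS = 856

def resolution_rate(source: str) -> float:
    """Share of a source's DBpedia URIs that could be resolved."""
    uris, resolved = counts[source]
    return resolved / uris

# Share of gold CUIs that have a DBpedia URI at all.
gold_coverage = counts["corpus"][0] / TOTAL_CUIS

for source in counts:
    print(f"{source}: {resolution_rate(source):.3f}")
```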
Cross-Evaluation
\[
\mathrm{Pre} = \frac{\#\,\text{correct senses}}{\#\,\text{returned senses}}, \qquad
\mathrm{Rec} = \frac{\#\,\text{correct senses}}{\#\,\text{corpus senses}}, \qquad
F_1 = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}
\]
[Figure: two bar charts of Performance (0.0 to 1.0) showing Pre, Rec and F-1 for MetaMap, BabelFly, TagMe and WordNet; one panel for unresolved URIs, one for resolved URIs]
Conclusion
When URIs are resolved via sameAs links, generic EL systems such as TagMe and
BabelFly match domain-specific annotators like MetaMap
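The three metrics can be computed over sets of DBpedia URIs. The following is a minimal sketch under stated assumptions: annotations are compared as sets, "resolving" a URI means replacing it with a canonical representative taken from a sameAs-style map, and the functions `resolve` and `prf1` as well as the toy link map are hypothetical.

```python
def resolve(uri: str, same_as: dict) -> str:
    """Follow sameAs-style links to a canonical URI (assumes no cycles)."""
    while uri in same_as:
        uri = same_as[uri]
    return uri

def prf1(returned: set, gold: set, same_as: dict = None) -> tuple:
    """Precision, recall and F1 of returned senses against corpus senses."""
    same_as = same_as or {}
    returned = {resolve(u, same_as) for u in returned}
    gold = {resolve(u, same_as) for u in gold}
    correct = len(returned & gold)
    pre = correct / len(returned) if returned else 0.0
    rec = correct / len(gold) if gold else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, f1

R = "http://dbpedia.org/resource/"
gold = {R + "Pramipexole", R + "Parkinson_disease"}
returned = {R + "Pramipexole", R + "Parkinson's_disease"}
# Toy sameAs map, assumed here purely for illustration.
links = {R + "Parkinson_disease": R + "Parkinson's_disease"}

unresolved = prf1(returned, gold)        # URIs compared as plain strings
resolved = prf1(returned, gold, links)   # URIs compared after resolution
```

In this toy case the two Parkinson URIs only count as a match once the link map is applied, which is the effect the resolved and unresolved panels contrast.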
Semantic Relatedness Measures
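\[
\mathrm{syn}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0.2}(w, w')\}|}{|g(s)| + |g(s')|}
\]
\[
\mathrm{syn}^{+}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{wn}_{>0}(w, w')\}|}{|g(s)| + |g(s')|}
\]
\[
\mathrm{dsyn}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0.2}(w, w')\}|}{|g(s)| + |g(s')|}
\]
\[
\mathrm{dsyn}^{+}(s, s') = \frac{|\{(w, w') \in g(s) \times g(s') \mid \mathrm{dn}_{>0}(w, w')\}|}{|g(s)| + |g(s')|}
\]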
We measured:
1 WordNet similarity (low coverage, but better accuracy) under two
“synonymy” thresholds (“strict” > 0.2, “loose” > 0)
2 word embedding relatedness (standard Wikipedia-trained word2vec
word space models) under two “synonymy” thresholds (“strict” > 0.2
and “loose” > 0)
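All four measures share one shape: count the word pairs from two glosses whose similarity exceeds a threshold, then divide by the combined gloss length. The sketch below assumes that g(s) is the list of words in the gloss of sense s and that the numerator is a pair count; `relatedness` and `toy_sim` are hypothetical names, and `toy_sim` merely stands in for WordNet or word2vec scores.

```python
from itertools import product
from typing import Callable, Iterable

def relatedness(gloss_a: Iterable[str], gloss_b: Iterable[str],
                sim: Callable[[str, str], float], threshold: float) -> float:
    """Number of word pairs with sim > threshold, over |g(s)| + |g(s')|."""
    a, b = list(gloss_a), list(gloss_b)
    if not a and not b:
        return 0.0
    hits = sum(1 for w, v in product(a, b) if sim(w, v) > threshold)
    return hits / (len(a) + len(b))

# Toy similarity (hypothetical): 1.0 for equal words, 0.1 for a shared
# first letter, 0.0 otherwise.
def toy_sim(w: str, v: str) -> float:
    if w == v:
        return 1.0
    return 0.1 if w[0] == v[0] else 0.0

g1 = ["drug", "dopamine", "agonist"]
g2 = ["dopamine", "disorder"]

strict = relatedness(g1, g2, toy_sim, 0.2)  # "strict" threshold
loose = relatedness(g1, g2, toy_sim, 0.0)   # "loose" threshold
```

Passing the similarity function as a parameter lets the same code cover both the WordNet-based and the embedding-based variants. Read this way, the value is not bounded by 1, because the pair count can exceed the combined gloss length.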
Annotation Relatedness
[Figure: bar chart of average coverage (0.0 to 0.5) under sy, sy+, dsy and dsy+ for MetaMap, BabelFly, TagMe and WordNet]

Annotations             Avg. len. (sent.)
Corpus sense glosses    66.41 words
BabelFly sense glosses  199.43 words
TagMe sense glosses     325.51 words
MetaMap sense glosses   191.76 words
WordNet sense glosses   50.50 words

Test            Null hyp.  p-value
Kruskal-Wallis  identical  0.897
Conclusion
No significant differences w.r.t. semantic relatedness
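For readers unfamiliar with the test, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic (average ranks, with tie correction). The slides do not say which implementation was used; the function name `kruskal_wallis_h` and the sample values are made up for illustration. A p-value is obtained by comparing H with a chi-square distribution with k-1 degrees of freedom for k groups; `scipy.stats.kruskal` returns both the statistic and the p-value.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign each distinct value the average of the 1-based ranks it spans.
    rank, ties, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2
        t = j - i
        ties += t**3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    correction = 1 - ties / (n**3 - n)
    return h / correction if correction else 0.0

# Made-up scores for three hypothetical annotators.
h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
h_identical = kruskal_wallis_h([1, 2], [1, 2])
```

A small H (and hence a large p-value, such as the 0.897 reported above) means the null hypothesis of identical distributions cannot be rejected.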
Summing up...
We have cross-evaluated generic WSD and linking systems (BabelFly,
TagMe) against a domain-specific annotator (MetaMap)
Generic WSD and linking systems show competitive results over the SemRep
gold standard
In particular, their greater coverage yields improvements in F1-score (TagMe
outperforms MetaMap in F1-score, but by a small margin)
In the future we plan to investigate whether domain adaptation yields better
results and improves linking
Thank You!
References I
Aronson, A. R. and Lang, F.-M. (2010). An overview of MetaMap: Historical
perspective and recent advances. Journal of the American Medical Informatics
Association, 17(3):229–236.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and
Hellmann, S. (2009). DBpedia - A crystallization point for the web of data.
Journal of Web Semantics, 7(3):154–165.
Ferragina, P. and Scaiella, U. (2010). TAGME: on-the-fly annotation of short text
fragments (by Wikipedia entities). In Proceedings of the 19th ACM International
Conference on Information and Knowledge Management (CIKM 2010).
Kilicoglu, H., Rosemblat, G., Fiszman, M., and Rindflesch, T. C. (2011).
Constructing a semantic predication gold standard from the biomedical
literature. BMC Bioinformatics, 12(486).
References II
Momtchev, V., Peychev, D., Primov, T., and Georgiev, G. (2009). Expanding the
pathway and interaction knowledge in linked life data. Proceedings of 2009
International Semantic Web Challenge.
Moro, A., Raganato, A., and Navigli, R. (2014). Entity linking meets word sense
disambiguation: a unified approach. Transactions of the Association for
Computational Linguistics, 2:231–244.
