Cross-Evaluation of Entity Linking and Disambiguation
Systems for Clinical Text Annotation
Camilo Thorne Stefano Faralli Heiner Stuckenschmidt
Data and Web Science (DWS) Group
Universität Mannheim, Germany
{camilo,stefano,heiner}@informatik.uni-mannheim.de
SEMANTiCS 2016
C. Thorne et al. (UMa) EL for Clinical Text Leipzig, 14.09.2016 1 / 13
Motivation
Low dose pramipexole is neuroprotective in
the MPTP mouse model of Parkinson’s disease
(*)
Problems:
1 identify entities (nouns, noun phrases) within a text;
2 identify or resolve the meaning of such entities within the text by linking
them to a sense repository;
3 resolve the meaning of both domain-specific and generic terms.
Question
Are there annotation services capable of both?
Annotators
MetaMap (Aronson and Lang, 2010)
  clinical domain
  sense repository: UMLS
  REST service
  multilingual
  sense: CUI
BabelFly (Moro et al., 2014)
  general domain
  sense repository: BabelNet
  REST service
  multilingual
  sense: babelsynset
TagMe (Ferragina and Scaiella, 2010)
  general domain
  “sense” repository: Wikipedia
  REST service
  English/Italian
  “sense”: Wiki page
WordNet (Lesk) (custom)
  general domain
  sense repository: WordNet 3.0
  Baseline
  English
  sense: synset
Problem: Sense repositories a priori not aligned
Solution: Use linked data in the form of DBpedia (Bizer et al., 2009) as pivot
(partial mappings)
Note: UMLS can be mapped to DBpedia via MEDLINE and the LinkedLifeData
initiative (Momtchev et al., 2009)
Annotations (Overview)
Use DBpedia as pivot:

Source           Sense                Sense ID               DBpedia URI
Clinical (Gold)  pramipexol           C0074710               http://dbpedia.org/resource/Pramipexole
                 Parkinson disease    C0030567               http://dbpedia.org/resource/Parkinson_disease
MetaMap          pramipexol           C0074710               http://dbpedia.org/resource/Pramipexole
                 Parkinson disease    C0030567               http://dbpedia.org/resource/Parkinson_disease
BabelFly         ATC code N04BC05     bn:03124207n           http://dbpedia.org/resource/Pramipexole
TagMe            pramipexole          https://goo.gl/twrSVu  http://dbpedia.org/resource/Pramipexole
                 Parkinson’s disease  https://goo.gl/Xke6W3  http://dbpedia.org/resource/Parkinson's_disease

Annotations for example (*)
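
The pivot idea behind the annotations above can be sketched as follows: each annotator's sense ID is mapped to a DBpedia URI, and URIs that denote the same resource are then collapsed to a single representative. This is only an illustration. The dictionaries `SENSE_TO_DBPEDIA` and `SAME_AS` and the functions `to_dbpedia` and `resolve` are hypothetical names, and the equivalence between the two Parkinson URIs is an assumption made for the example, not a mapping taken from the experiments.

```python
# Hypothetical pivot table: sense ID -> DBpedia URI (values from the example annotations).
SENSE_TO_DBPEDIA = {
    "C0074710": "http://dbpedia.org/resource/Pramipexole",        # UMLS CUI
    "C0030567": "http://dbpedia.org/resource/Parkinson_disease",  # UMLS CUI
    "bn:03124207n": "http://dbpedia.org/resource/Pramipexole",    # babelsynset
}

# Assumed equivalence links between DBpedia URIs (illustration only).
SAME_AS = {
    "http://dbpedia.org/resource/Parkinson's_disease":
        "http://dbpedia.org/resource/Parkinson_disease",
}


def to_dbpedia(sense_id):
    """Return the DBpedia URI for a sense ID, or None if the mapping is partial."""
    return SENSE_TO_DBPEDIA.get(sense_id)


def resolve(uri):
    """Follow equivalence links until a representative URI is reached."""
    seen = set()
    while uri in SAME_AS and uri not in seen:  # the 'seen' set guards against cycles
        seen.add(uri)
        uri = SAME_AS[uri]
    return uri


# A CUI and a babelsynset become comparable once both are pivoted to DBpedia.
print(to_dbpedia("C0074710") == to_dbpedia("bn:03124207n"))
# A Wikipedia-derived URI matches the gold URI only after resolution.
print(resolve("http://dbpedia.org/resource/Parkinson's_disease") == to_dbpedia("C0030567"))
```

Keeping the pivot and the resolution step separate makes it possible to report results for both unresolved and resolved URIs.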
SemRep Corpus (Kilicoglu et al., 2011)
Experiments were run over the SemRep corpus
Small annotated clinical corpus
428 clinical excerpts (MEDLINE/PubMed)
13,948 word tokens
856 UMLS-annotated clinical terms
For each sentence, two noun phrases annotated with their corresponding
UMLS CUI by clinicians
606 terms can be associated with a corresponding DBpedia URI
Example (*) taken from SemRep
Annotation Statistics
# of CUIs in corpus (total) = 856
# of corpus DBpedia URIs = 606
# of resolved corpus URIs = 404
# of MetaMap DBpedia URIs = 343
# of resolved MetaMap URIs = 242
# of BabelFly DBpedia URIs = 432
# of resolved BabelFly URIs = 269
# of TagMe DBpedia URIs = 469
# of resolved TagMe URIs = 320
# of WordNet DBpedia URIs = 182
# of resolved WordNet URIs = 97
Cross-Evaluation
\[
\mathrm{Pre} = \frac{\#\text{correct senses}}{\#\text{returned senses}}
\qquad
\mathrm{Rec} = \frac{\#\text{correct senses}}{\#\text{corpus senses}}
\qquad
F_1 = \frac{2 \cdot \mathrm{Pre} \cdot \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}
\]
[Figure: two bar charts of Performance (Pre, Rec, F-1; scale 0.0 to 1.0) for MetaMap, BabelFly, TagMe and WordNet, one panel for unresolved URIs and one for resolved URIs]
Conclusion
When URIs are resolved via sameAs, generic EL systems such as TagMe and
BabelFly match domain-specific annotators like MetaMap
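
The precision, recall and F1 definitions of this slide can be sketched in a few lines. The sketch assumes that a "correct sense" is an annotation returned by a system that also appears in the gold standard, so that the count of correct senses is a set intersection. The function name `precision_recall_f1`, the optional `same_as` argument and the toy annotations are illustrative assumptions, not the evaluation code or data of the experiments.

```python
def precision_recall_f1(returned, gold, same_as=None):
    """Compute (Pre, Rec, F1) over two sets of (sentence id, URI) annotations.

    same_as: optional dict mapping a URI to an equivalent representative URI;
    when given, both sets are normalised first (the "resolved URIs" setting).
    """
    if same_as:
        returned = {(sid, same_as.get(uri, uri)) for sid, uri in returned}
        gold = {(sid, same_as.get(uri, uri)) for sid, uri in gold}
    correct = len(returned & gold)                       # #correct senses
    pre = correct / len(returned) if returned else 0.0   # / #returned senses
    rec = correct / len(gold) if gold else 0.0           # / #corpus senses
    f1 = 2 * pre * rec / (pre + rec) if pre + rec > 0 else 0.0
    return pre, rec, f1


# Toy data: one sentence, two gold annotations, two system annotations.
gold = {("s1", "dbr:Pramipexole"), ("s1", "dbr:Parkinson_disease")}
returned = {("s1", "dbr:Pramipexole"), ("s1", "dbr:Parkinson's_disease")}
same_as = {"dbr:Parkinson's_disease": "dbr:Parkinson_disease"}  # assumed link

print(precision_recall_f1(returned, gold))           # unresolved URIs
print(precision_recall_f1(returned, gold, same_as))  # resolved URIs
```

In the toy data the second system annotation counts as correct only after resolution, which is the kind of effect the two panels contrast.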
Semantic Relatedness Measures
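\[
\begin{aligned}
\mathit{syn}(s, s') &= \frac{\left|\{(w, w') \in g(s) \times g(s') \mid wn_{>0.2}(w, w')\}\right|}{|g(s)| + |g(s')|} \\
\mathit{syn}^{+}(s, s') &= \frac{\left|\{(w, w') \in g(s) \times g(s') \mid wn_{>0}(w, w')\}\right|}{|g(s)| + |g(s')|} \\
\mathit{dsyn}(s, s') &= \frac{\left|\{(w, w') \in g(s) \times g(s') \mid dn_{>0.2}(w, w')\}\right|}{|g(s)| + |g(s')|} \\
\mathit{dsyn}^{+}(s, s') &= \frac{\left|\{(w, w') \in g(s) \times g(s') \mid dn_{>0}(w, w')\}\right|}{|g(s)| + |g(s')|}
\end{aligned}
\]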
We measured:
1 WordNet similarity (low coverage, but better accuracy) under two
“synonymy” thresholds (“strict” > 0.2, “loose” > 0)
2 word embedding relatedness (standard Wikipedia-trained word2vec
word space models) under two “synonymy” thresholds (“strict” > 0.2
and “loose” > 0)
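
The four measures share one shape and differ only in the word-level score and the threshold, which the following sketch makes explicit. It assumes that g(s) is the set of words in the gloss of sense s and that the numerator counts the word pairs whose score exceeds the threshold. The function `relatedness` and the toy scorer `toy_sim` are hypothetical: `toy_sim` stands in for a WordNet similarity or a word2vec relatedness score, and its values are invented for the example.

```python
from itertools import product


def relatedness(gloss_a, gloss_b, sim, threshold):
    """Count word pairs across two glosses whose score exceeds the threshold,
    normalised by the summed gloss sizes |g(s)| + |g(s')|."""
    words_a, words_b = set(gloss_a), set(gloss_b)
    if not words_a and not words_b:
        return 0.0  # avoid division by zero for two empty glosses
    related = sum(1 for w, v in product(words_a, words_b) if sim(w, v) > threshold)
    return related / (len(words_a) + len(words_b))


# Invented word-pair scores standing in for a real similarity function.
TOY_SCORES = {
    frozenset(("drug", "medication")): 0.9,
    frozenset(("dose", "disease")): 0.1,
}


def toy_sim(w, v):
    """Return 1.0 for identical words, else a score from the toy table (default 0.0)."""
    return 1.0 if w == v else TOY_SCORES.get(frozenset((w, v)), 0.0)


gloss_a = ["drug", "dose"]
gloss_b = ["medication", "disease"]
print(relatedness(gloss_a, gloss_b, toy_sim, 0.2))  # "strict" threshold
print(relatedness(gloss_a, gloss_b, toy_sim, 0.0))  # "loose" threshold
```

Passing the scorer as an argument lets the same function cover both the WordNet-based and the embedding-based variants.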
Annotation Relatedness
[Figure: bar chart of Coverage (avg.), scale 0.0 to 0.5, under the four measures sy, sy+, dsy and dsy+ for MetaMap, BabelFly, TagMe and WordNet]

Annotations             Avg. len. (sent.)
Corpus sense glosses    66.41 words
BabelFly sense glosses  199.43 words
TagMe sense glosses     325.51 words
MetaMap sense glosses   191.76 words
WordNet sense glosses   50.50 words

Test            Null hyp.  p-value
Kruskal-Wallis  identical  0.897
Conclusion
No significant differences w.r.t. semantic relatedness
Summing up...
We have cross-evaluated generic WSD and linking systems (BabelFly,
TagMe) with domain-specific (MetaMap) annotators
Generic WSD and linking systems show competitive results over the SemRep
gold standard
In particular, their greater coverage yields improvements in F1-score (TagMe
outperforms MetaMap in F1-score, but only by a small margin)
In the future we plan to investigate whether domain adaptation yields better
results, and to improve linking
Thank You!
References
Aronson, A. R. and Lang, F.-M. (2010). An overview of MetaMap: Historical
perspective and recent advances. Journal of the American Medical Informatics
Association, 17(3):229–236.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and
Hellmann, S. (2009). DBpedia - A crystallization point for the web of data.
Journal of Web Semantics, 7(3):154–165.
Ferragina, P. and Scaiella, U. (2010). TAGME: On-the-fly annotation of short text
fragments (by Wikipedia entities). In Proceedings of the 19th ACM International
Conference on Information and Knowledge Management (CIKM 2010).
Kilicoglu, H., Rosemblat, G., Fiszman, M., and Rindflesch, T. C. (2011).
Constructing a semantic predication gold standard from the biomedical
literature. BMC Bioinformatics, 12(486).
Momtchev, V., Peychev, D., Primov, T., and Georgiev, G. (2009). Expanding the
pathway and interaction knowledge in linked life data. Proceedings of 2009
International Semantic Web Challenge.
Moro, A., Raganato, A., and Navigli, R. (2014). Entity linking meets word sense
disambiguation: A unified approach. Transactions of the Association for
Computational Linguistics, 2:231–244.
