Entity Linking in Queries:
Tasks and Evaluation
ICTIR conference, September 2015
Faegheh Hasibi, Krisztian Balog, and Svein Erik Bratsberg
2
Entity linking
Definition from Wikipedia:
https://en.wikipedia.org/wiki/Natural_language_processing
https://en.wikipedia.org/wiki/Statistical_classification
3
Why entity linking in queries?
• ~70% of queries contain entities
• To exploit the semantic representation of queries
Improves:
• Ad-hoc document retrieval
• Entity retrieval
• Query understanding
• Understanding users’ task (Tasks track, TREC)
J. Pound, P. Mika, and H. Zaragoza. Ad-hoc object retrieval in the web of data. In Proc. of WWW '10.
[Figure: the query “tom cruise movies” and its semantic representation, including a <relation>]
4
It is different …
Different from conventional entity linking:
• Limited or even no context
• A mention may be linked to more than one entity
Query “france world cup 98”
Candidate entities: France, France national football team, FIFA world cup
Interpretations: {France, FIFA world cup} or {France national football team, FIFA world cup}
5
In this talk
How should entity linking be performed for queries?
➤ Task:
“Semantic Mapping” or “Interpretation Finding”?
➤ Evaluation metrics
➤ Test collections
➤ Methods
7
Entity linking
• Output is a set of entities
• Each mention is linked to a single entity
• Mentions do not overlap
• Entities are explicitly mentioned
Query “obama mother”: {Barack Obama}
Query “the music man”: {The Music Man}
Query “new york pizza manhattan”: {New York City, Manhattan}
8
Semantic mapping
• Output is a ranked list of entities
• Mentions can overlap and be linked to multiple entities
• Entities may not be explicitly mentioned
• Entities do not need to form semantically compatible sets
• False positives are not penalized
Query “obama mother”: Ann Dunham, Barack Obama
Query “the music man”: The Music Man, The Music Man (1962 film), The Music Man (2003 film), …
Query “new york pizza manhattan”: New York City, New York-style pizza, Manhattan, Manhattan pizza, …
9
Interpretation finding
• Output is set(s) of semantically related entity sets
• Each entity set is an interpretation of the query
• Mentions do not overlap within a set
Query “obama mother”: { {Barack Obama} }
Query “the music man”: { {The Music Man}, {The Music Man (1962 film)}, {The Music Man (2003 film)} }
Query “new york pizza manhattan”: { {New York City, Manhattan}, {New York-style pizza, Manhattan} }
D. Carmel, M.-W. Chang, E. Gabrilovich, B.J. P. Hsu, and K. Wang. ERD: Entity recognition and disambiguation challenge, 2014.
10
Tasks summary
 | Entity Linking | Semantic Mapping | Interpretation Finding
Entities explicitly mentioned? | Yes | No | Yes
Mentions can overlap? | No | Yes | No*
Results format | Set | Ranked list | Sets of sets
Evaluation criteria | Mentioned entities found | Relevant entities found | Interpretations found
Evaluation metrics | Set-based | Rank-based | Set-based
* Not within the same interpretation
✓ Entity linking requirements are relaxed in semantic mapping.
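The three result formats in the table can be illustrated with plain Python containers. This is only a sketch of the output shapes, using the “new york pizza manhattan” example from the earlier slides; the type alias names are made up here for illustration.

```python
from typing import FrozenSet, List, Set

# Entity linking: one flat set of entities for the query.
EntityLinkingOutput = Set[str]
# Semantic mapping: entities ordered by decreasing relevance.
SemanticMappingOutput = List[str]
# Interpretation finding: a set of entity sets (frozenset so that sets can nest).
InterpretationFindingOutput = Set[FrozenSet[str]]

query = "new york pizza manhattan"

entity_linking: EntityLinkingOutput = {"New York City", "Manhattan"}

semantic_mapping: SemanticMappingOutput = [
    "New York City",
    "New York-style pizza",
    "Manhattan",
    "Manhattan pizza",
]

interpretation_finding: InterpretationFindingOutput = {
    frozenset({"New York City", "Manhattan"}),
    frozenset({"New York-style pizza", "Manhattan"}),
}
```

Each inner set is one interpretation of the query, which is why a set-based metric has to compare whole sets rather than individual entities.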
12
Evaluation
• Macro-averaged metrics (precision, recall, F-measure)
• Matching condition:
– Interpretation sets should exactly match the ground truth
I: system query interpretations
Î: ground truth query interpretations
☞ What if I = ∅ or Î = ∅?
D. Carmel, M.-W. Chang, E. Gabrilovich, B.J. P. Hsu, and K. Wang. ERD: Entity recognition and disambiguation challenge, 2014.
13
Evaluation (revisited)
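Solution: define precision and recall explicitly for the empty cases:
• Both are empty: system output matches ground truth.
• Only one of them is empty: system output does not match ground truth.
☞ This evaluation is methodologically correct, but strict.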
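A minimal sketch of strict, interpretation-based evaluation for a single query, followed by macro-averaging over queries. It assumes the usual set-overlap definitions of precision and recall, that two empty outputs count as a perfect match, and that any one-sided empty case scores zero; the function names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Interpretations = Set[FrozenSet[str]]


def interpretation_prf(system: Interpretations,
                       truth: Interpretations) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one query; interpretations must match exactly."""
    matched = len(system & truth)
    if not system and not truth:
        # Assumption: nothing expected and nothing returned is a perfect match.
        precision = recall = 1.0
    else:
        # Assumption: an empty side paired with a non-empty side scores zero.
        precision = matched / len(system) if system else 0.0
        recall = matched / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


def macro_average(per_query: List[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Average each metric over queries, giving every query equal weight."""
    n = len(per_query)
    return tuple(sum(scores[i] for scores in per_query) / n for i in range(3))
```

For example, a system that returns one of two ground-truth interpretations gets precision 1.0 and recall 0.5 for that query.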
14
Lean evaluation
• Partial matches are not rewarded in interpretation-based metrics
• E.g. {{New York City, Manhattan}} ≠ {{New York City}, {Manhattan}}
Solution: Combine them with entity-based metrics.
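The example above can be scored with both kinds of metrics. This sketch assumes that entity-based metrics compare the union of all entities across interpretations, and that the combination is a simple arithmetic mean; the slide only states that the two are combined, so the averaging in `combined_pr` is an assumption made for illustration.

```python
from typing import FrozenSet, Set, Tuple

Interpretations = Set[FrozenSet[str]]


def _precision_recall(system: set, truth: set) -> Tuple[float, float]:
    """Set-overlap precision and recall, with both-empty treated as a match."""
    if not system and not truth:
        return 1.0, 1.0
    hits = len(system & truth)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def interpretation_pr(system: Interpretations, truth: Interpretations) -> Tuple[float, float]:
    """Strict: whole interpretations must match."""
    return _precision_recall(system, truth)


def entity_pr(system: Interpretations, truth: Interpretations) -> Tuple[float, float]:
    """Lenient: compare the entities pooled over all interpretations."""
    return _precision_recall(set().union(*system), set().union(*truth))


def combined_pr(system: Interpretations, truth: Interpretations) -> Tuple[float, float]:
    """Assumed combination: arithmetic mean of the two metric families."""
    p_int, r_int = interpretation_pr(system, truth)
    p_ent, r_ent = entity_pr(system, truth)
    return (p_int + p_ent) / 2, (r_int + r_ent) / 2


system = {frozenset({"New York City", "Manhattan"})}
truth = {frozenset({"New York City"}), frozenset({"Manhattan"})}
```

Here the interpretation-based score is zero because no set matches exactly, while the entity-based score is perfect because the same entities were found, so the combined score rewards the partial match.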
16
Test collections - ERD
The ERD challenge introduced two test collections:
• ERD-dev (91 queries)
• ERD-test (500 queries)
– Unavailable for traditional offline evaluation
Annotation rules:
• The longest mention is used for entities
• Only proper noun entities are annotated (e.g., companies, locations)
• Overlapping mentions are not allowed within a single interpretation
☞ ERD-dev is not suitable for training purposes (small)
ERD datasets: http://web-ngram.research.microsoft.com/erd2014/Datasets.aspx
17
Test collections - YSQLE
Yahoo Search Query Log to Entities (YSQLE)
• 2398 queries, manually annotated with Wikipedia entities
• Designed for training and testing entity linking systems for queries
Issues:
• Not possible to automatically form interpretation sets
– E.g. Query “france world cup 1998”
• Linked entities are not necessarily mentioned explicitly
– E.g. Query “charlie sheen lohan” is annotated with Anger Management (TV series)
• Annotations are not always complete
– E.g. Query “louisville courier journal” is not annotated with Louisville, Kentucky
Yahoo! Webscope L24 dataset - Yahoo! search query log to entities, v1.0. http://webscope.sandbox.yahoo.com/
☞ YSQLE is meant for the semantic mapping task
18
Test collections - Y-ERD
Y-ERD is manually re-annotated based on:
• YSQLE annotations
• ERD rules
Additional rules:
• Site search queries are not linked
– E.g. Query “facebook obama slur” is only linked to Barack Obama
• Clear policy about misspelled mentions
– Two versions of Y-ERD are made available
☞ Y-ERD is made publicly available
http://bit.ly/ictir2015-elq
20
Methods
Pipeline architecture for two tasks:
query → Mention detection → set of mentions → Candidate entity ranking → ranked list of entities → Interpretation finding → interpretations
• Semantic mapping: ends with the ranked list of entities
• Interpretation finding: ends with the interpretations (the goal of entity linking in queries)
21
Mention detection
Entity name variants are gathered from:
• KB: A manually curated knowledge base (DBpedia)
• WEB: Freebase Annotations of the ClueWeb Corpora (FACC)
E. Gabrilovich, et al. FACC1: Freebase annotation of ClueWeb corpora, 2013.
Recall in mention detection step
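A minimal sketch of dictionary-based mention detection: every n-gram of the query is looked up in a table of entity name variants. The tiny `SURFACE_FORMS` table is a hypothetical stand-in for name variants gathered from KB and WEB sources, and exact lowercase matching is an assumption.

```python
from typing import Dict, List, Set


def detect_mentions(query: str, surface_forms: Dict[str, Set[str]]) -> List[str]:
    """Return all query n-grams that are known entity name variants, longest first."""
    tokens = query.lower().split()
    mentions: List[str] = []
    for n in range(len(tokens), 0, -1):           # n-gram length, longest first
        for start in range(len(tokens) - n + 1):  # sliding window over the query
            ngram = " ".join(tokens[start:start + n])
            if ngram in surface_forms and ngram not in mentions:
                mentions.append(ngram)
    return mentions


# Hypothetical name-variant dictionary: surface form -> candidate entities.
SURFACE_FORMS: Dict[str, Set[str]] = {
    "new york": {"New York City"},
    "new york pizza": {"New York-style pizza"},
    "manhattan": {"Manhattan"},
}

mentions = detect_mentions("new york pizza manhattan", SURFACE_FORMS)
```

Overlapping mentions such as “new york” and “new york pizza” are both kept at this stage, since later steps decide which ones survive.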
23
Candidate entity ranking
Ranking using language models:
• Mixture of Language Models (MLM)
• Scores should be comparable across queries
– P(q) should be considered (query length normalization)
- P. Ogilvie and J. Callan. Combining document representations for known-item search. In Proc. of SIGIR ’03, 2003.
- W. Kraaij and M. Spitters. Language models for topic tracking. In Language Modeling for Information Retrieval, 2003.
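A minimal sketch of a query-length-normalized MLM score, computed in log space. It assumes that each query term is weighted by its relative frequency in the query and that the entity model is divided by a collection model standing in for P(q); the Jelinek-Mercer smoothing, the field names, the field weights, and the fallback probability are all illustrative assumptions, not the settings used in the talk.

```python
import math
from collections import Counter
from typing import Dict


def mlm_log_score(query: str,
                  entity_fields: Dict[str, str],
                  field_weights: Dict[str, float],
                  collection: Dict[str, float],
                  lam: float = 0.1) -> float:
    """Log of the length-normalized ratio P(q | entity) / P(q)."""
    q_terms = query.lower().split()
    log_score = 0.0
    for term, count in Counter(q_terms).items():
        p_t_q = count / len(q_terms)            # relative term frequency in the query
        p_t_c = collection.get(term, 1e-6)      # collection (background) probability
        p_t_e = 0.0
        for field, weight in field_weights.items():
            tokens = entity_fields.get(field, "").lower().split()
            p_t_f = tokens.count(term) / len(tokens) if tokens else 0.0
            # Mixture over fields, each smoothed with the collection model.
            p_t_e += weight * ((1 - lam) * p_t_f + lam * p_t_c)
        log_score += p_t_q * (math.log(p_t_e) - math.log(p_t_c))
    return log_score


entity = {"names": "jacksonville florida", "contents": "city in florida"}
weights = {"names": 0.5, "contents": 0.5}
background = {"jacksonville": 0.001, "florida": 0.01, "band": 0.01}

short = mlm_log_score("jacksonville", entity, weights, background)
repeated = mlm_log_score("jacksonville jacksonville", entity, weights, background)
unrelated = mlm_log_score("band", entity, weights, background)
```

Because term weights sum to one, repeating the query does not change the score, which is what makes scores comparable across queries of different lengths.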
24
Candidate entity ranking
Combining MLM and commonness:
• Commonness: probability of entity e being the link target of mention m
• Query length normalized MLM score
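A minimal sketch of commonness and of one way to combine it with an MLM score. Commonness is estimated here as the fraction of links with anchor text m that point to e; multiplying the best commonness over the query's mentions by a non-negative MLM score is an assumption for illustration, and the link counts are invented.

```python
from typing import Dict, List


def commonness(mention: str, entity: str,
               link_counts: Dict[str, Dict[str, int]]) -> float:
    """Estimate P(e | m) as count(m -> e) / count(m -> any entity)."""
    targets = link_counts.get(mention, {})
    total = sum(targets.values())
    return targets.get(entity, 0) / total if total else 0.0


def combined_score(entity: str, mentions: List[str],
                   link_counts: Dict[str, Dict[str, int]],
                   mlm_score: float) -> float:
    """Assumed combination: highest-scoring mention's commonness times the MLM score."""
    best = max((commonness(m, entity, link_counts) for m in mentions), default=0.0)
    return best * mlm_score


# Hypothetical link statistics: mention -> {entity: number of links}.
LINK_COUNTS = {
    "riverside": {
        "Riverside Park (Jacksonville)": 6,
        "Riverside (band)": 1,
        "Riverside, California": 13,
    },
}

band_commonness = commonness("riverside", "Riverside (band)", LINK_COUNTS)
band_score = combined_score("Riverside (band)", ["riverside", "jacksonville fl"],
                            LINK_COUNTS, mlm_score=0.5)
```

Taking the maximum over mentions means an entity is ranked by its most favorable mention in the query, so the score depends on the specific mention as well as on the query.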
25
Candidate entity ranking
Semantic mapping results on YSQLE:
TAGME is an entity linking system.
• Should not be evaluated using rank-based metrics
• Should not be compared with semantic mapping results
P. Ferragina and U. Scaiella. TAGME: On-the-fly annotation of short text fragments. In Proc. of CIKM 2010.
27
Interpretation finding
Greedy Interpretation Finding (GIF):
Example query: “jacksonville fl riverside”
Mention | Entity | Score
“jacksonville fl” | Jacksonville Florida | 0.9
“jacksonville” | Jacksonville Florida | 0.8
“riverside” | Riverside Park (Jacksonville) | 0.6
“jacksonville fl” | Naval Station Jacksonville | 0.2
“riverside” | Riverside (band) | 0.1
Step 1: Pruning based on a score threshold (0.3)
Step 2: Pruning containment mentions
Step 3: Forming interpretation sets
Result: { {Jacksonville Florida, Riverside Park (Jacksonville)} }
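The three steps can be sketched as follows. This is a simplified reading of the slide, not the authors' implementation: containment and overlap are checked on whitespace tokens, a contained or containing mention loses to the higher-scoring one, and each surviving candidate is added to the first interpretation whose mentions it does not overlap.

```python
from typing import List, Set, Tuple

Candidate = Tuple[str, str, float]  # (mention, entity, score)


def _contains(longer: str, shorter: str) -> bool:
    """True if `shorter` is a proper token sub-sequence of `longer`."""
    a, b = longer.split(), shorter.split()
    if len(b) >= len(a):
        return False
    return any(a[i:i + len(b)] == b for i in range(len(a) - len(b) + 1))


def _overlap(m1: str, m2: str) -> bool:
    """Simplification: mentions overlap if they share a token."""
    return bool(set(m1.split()) & set(m2.split()))


def greedy_interpretation_finding(candidates: List[Candidate],
                                  threshold: float = 0.3) -> List[Set[str]]:
    # Step 1: prune candidates below the score threshold, best first.
    ranked = sorted((c for c in candidates if c[2] >= threshold),
                    key=lambda c: c[2], reverse=True)
    # Step 2: prune containment mentions, keeping the higher-scoring one.
    kept: List[Candidate] = []
    for mention, entity, score in ranked:
        if any(_contains(m, mention) or _contains(mention, m) for m, _, _ in kept):
            continue
        kept.append((mention, entity, score))
    # Step 3: greedily form interpretation sets with non-overlapping mentions.
    interpretations: List[List[Tuple[str, str]]] = []
    for mention, entity, _ in kept:
        for interp in interpretations:
            if not any(_overlap(mention, m) for m, _ in interp):
                interp.append((mention, entity))
                break
        else:
            interpretations.append([(mention, entity)])
    return [{entity for _, entity in interp} for interp in interpretations]


CANDIDATES: List[Candidate] = [
    ("jacksonville fl", "Jacksonville Florida", 0.9),
    ("jacksonville", "Jacksonville Florida", 0.8),
    ("riverside", "Riverside Park (Jacksonville)", 0.6),
    ("jacksonville fl", "Naval Station Jacksonville", 0.2),
    ("riverside", "Riverside (band)", 0.1),
]
result = greedy_interpretation_finding(CANDIDATES)
```

On the slide's example this yields a single interpretation containing Jacksonville Florida and Riverside Park (Jacksonville). Two candidates that share the same mention end up in separate interpretations, which is how multiple interpretations can arise.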
28
Interpretation finding
[Result tables for ERD-dev and Y-ERD]
29
Take home messages
• Entity linking in queries is different from entity linking in documents
• Different flavors, different evaluation criteria:
– Interpretation finding (yes)
– Semantic mapping (no)
• Ultimate goal should be interpretation finding
• SM and EL should not be compared to each other
• Resources are available at http://bit.ly/ictir2015-elq
30
Thanks!



Editor's Notes

  • #3: Entity linking is a key enabling component for semantic search. It is so important that it has its own Wikipedia article, referring to the task and the research that has been done in this area. This text is the definition of EL from Wikipedia. In fact, this piece of text represents entity linking quite well, not because of the text, but because of the links it contains. Of course this has been done by humans, but the aim of entity linking is to do it automatically.
  • #4: The TREC Tasks track evaluates a system's understanding of the tasks users aim to achieve.
  • #5: This is the beginning of the stories
  • #6: Both approaches have their place, but there is an important distinction to be made as they are designed to accomplish different tasks.
  • #9: No difference is made between system A, which does not return anything, and system B, which returns meaningless or nonsense suggestions.
  • #11: Mentions can overlap. Entities do not need to form semantically compatible sets. False positives are not penalized in semantic mapping.
  • #13: According to this definition, if the query does not have any interpretations in the ground truth (Î = ∅), then precision is undefined;
  • #14: We correct for this behavior by defining precision and recall for interpretation-based evaluation: It captures the extent to which the interpretations of the query are identified. Interpretation-based metrics:
  • #18: As with any problem in information retrieval, the availability of public datasets is of key importance.
  • #19: Terms that are meant to restrict the search to a certain site (such as Facebook or IMDB) should not be linked.
  • #24: to rank items (here: entities) based on query likelihood:
  • #25: It ranks entities based on the highest scoring mention, i.e., ranking is dependent not only on the query but on the specific mention as well:
  • #26: Though we include results for TAGME, we note that this comparison, despite having been done in prior work (e.g., in [6]), is an unfair one. TAGME is not a baseline for the semantic mapping task.
  • #29: Recall what Y-ERD and ERD-dev are.
  • #30: The semantic mapping and entity linking tasks should not be compared to each other. Next: Exploiting multiple query interpretations in entity ranking
  • #31: Some questions:
Q1: Why multiple interpretations? ERD-dev is a better representation of Web search queries. Y-ERD is limited to annotations from YSQLE, and yes, it has its own limitations. Multiple interpretations are crucial; this is not just our idea, but also the ERD organizers’ idea.
Q2: Why use both KB and WEB for mention detection? Using KB is an offline process and it comes for free. All FACC entities are not used for entity ranking (it is not efficient). In this case, KB gives high-quality name variants and improves recall.
Q3: Are these evaluation metrics biased towards a system that tends to return fewer entities? The naïve baseline for these datasets has an F-measure of 0.5. A system should of course be better than this baseline. Refer to the results table; top-ranked results are low.
Q4: Why interpretation finding? It is closer to how humans think. It is part of understanding the ambiguity of queries.