Similarity-Based Methods for Word Sense Disambiguation
Ido Dagan, Lillian Lee, Fernando Pereira
The problem: how to make sense of unseen word pairs, i.e. pairs that do not appear in the training set. E.g.: "I want to be a scientist."; "Robbed the bank."
They compared four similarity-based estimation methods (KL divergence, total divergence to the average, L1 norm, and confusion probability) against two well-established methods: Katz's back-off scheme and maximum likelihood estimation (MLE).
Katz's back-off scheme (1987), widely used in bigram language modeling, estimates the probability of an unseen bigram from unigram estimates, via Bayes' theorem for conditional probability. E.g.: {make, take} plans.
Because the probability estimate for an unseen bigram depends only on unigram frequencies, the scheme has the undesirable result of assigning the same probability to any unseen bigrams made up of unigrams of equal frequency. E.g.: {a b} and {c b}.
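A minimal sketch of this weakness, assuming a constant back-off weight alpha rather than Katz's exact discounting: two unseen bigrams built from equal-frequency unigrams receive identical estimates, regardless of how plausible each pair actually is.

```python
from collections import Counter

# Toy counts: "a" and "c" have the same unigram frequency.
bigrams = Counter({("a", "x"): 5, ("c", "y"): 5, ("b", "z"): 2})
unigrams = Counter()
for (w1, w2), n in bigrams.items():
    unigrams[w1] += n
    unigrams[w2] += n
total = sum(unigrams.values())

def backoff_prob(w1, w2, alpha=0.4):
    if (w1, w2) in bigrams:                  # seen pair: (undiscounted) MLE
        return bigrams[(w1, w2)] / unigrams[w1]
    return alpha * unigrams[w2] / total      # unseen pair: back off to unigram P(w2)

# ("a", "b") and ("c", "b") are both unseen and built from equal-frequency
# unigrams, so the scheme gives them the same probability.
print(backoff_prob("a", "b") == backoff_prob("c", "b"))   # True
```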
In class-based methods, words of similar meaning are grouped statistically into classes, so a group of words has only one representative: its class. A word is therefore modeled by the average behavior of many words. When in doubt between two words, consult the data for the other words in their classes. E.g.: {a, b, c, d, e} & {f, g, h, i}.
Because a word is modeled by the average behavior of many words, the uniqueness of its meaning is lost. E.g.: "thanda". Also, unseen word pairs initially get zero probability, which leads to extremely inaccurate estimates of word-pair probabilities. E.g.: "periodic table".
In the similarity-based approach, the estimates for the words most compatible (similar) with a word w are combined; the evidence provided by a word w' is weighted by a function of its compatibility with w. No word pair is dropped, even a very rare one, as happens in Katz's back-off scheme.
Similarity-based estimation involves three components: (1) a scheme for deciding which word pairs require similarity-based estimation; (2) a method for combining information from similar words; (3) a function measuring similarity between words.
The strengths of Katz's back-off scheme and MLE are combined. The MLE probability is

P_ML(w2|w1) = c(w1, w2) / c(w1)

For the similarity-based estimate, seen pairs keep a discounted estimate and unseen pairs back off to a similarity-based distribution:

P(w2|w1) = P_d(w2|w1)             if c(w1, w2) > 0 (seen pair)
P(w2|w1) = α(w1) · P_r(w2|w1)     otherwise (unseen pair)
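A hedged sketch of this dispatch; P_d, P_r, and alpha are stand-ins for the paper's discounted estimate, similarity-based distribution, and reserved probability mass, not their exact definitions.

```python
def p_ml(w1, w2, bigram_c, unigram_c):
    """Maximum-likelihood estimate: P_ML(w2|w1) = c(w1, w2) / c(w1)."""
    return bigram_c.get((w1, w2), 0) / unigram_c[w1]

def p_combined(w1, w2, bigram_c, p_d, p_r, alpha):
    """Seen pairs keep the discounted estimate P_d; unseen pairs fall back
    to the similarity-based estimate P_r, scaled by the mass alpha(w1)."""
    if bigram_c.get((w1, w2), 0) > 0:
        return p_d(w1, w2)               # c(w1, w2) > 0: seen pair
    return alpha(w1) * p_r(w1, w2)       # unseen pair
```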
Similarity-based models assume that if word w1' is similar to word w1, then w1' can provide information about the probability of unseen word pairs involving w1: w2 is more likely to occur with w1 if it tends to occur with the words most similar to w1. The authors used a weighted average of the evidence provided by similar words, where the weight given to a particular word depends on its similarity to w1.
The number of words treated as similar to a word w1 is capped, because considering every word in a large training set would consume a very large amount of resources. The number of similar words (k) and the dissimilarity threshold between words (t) are tuned experimentally.
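A sketch of neighbour selection and the weighted combination. The exponential decay exp(-beta * d) is an illustrative weighting, not necessarily the paper's exact choice, and dissim can be any of the measures discussed below.

```python
import math

def similar_words(w1, vocab, dissim, k, t):
    """S(w1): the k words nearest to w1 whose dissimilarity stays below t."""
    scored = sorted((dissim(w1, w), w) for w in vocab if w != w1)
    return [(d, w) for d, w in scored[:k] if d <= t]

def p_sim(w2, w1, vocab, cond_prob, dissim, k=20, t=1.0, beta=5.0):
    """Weighted average of evidence from similar words: more similar
    (less dissimilar) neighbours contribute more."""
    neighbours = similar_words(w1, vocab, dissim, k, t)
    if not neighbours:
        return 0.0
    weights = [(math.exp(-beta * d), w) for d, w in neighbours]
    norm = sum(wt for wt, _ in weights)
    return sum(wt * cond_prob(w2, w) for wt, w in weights) / norm
```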
These word-similarity functions can be derived automatically from the statistics of the training data, as opposed to functions derived from manually constructed word classes: KL divergence, total divergence to the average, the L1 norm, and confusion probability.
KL divergence is the standard measure of dissimilarity between two probability mass functions:

D(w1 || w1') = Σ_w2 P(w2|w1) log( P(w2|w1) / P(w2|w1') )

For D to be defined, P(w2|w1') > 0 is required whenever P(w2|w1) > 0. This condition often fails to hold, so smoothing is required, which is very expensive for large vocabularies.
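A direct implementation of D, assuming each distribution is stored as a dict holding only its non-zero probabilities; note the support requirement in the docstring.

```python
import math

def kl_divergence(p, q):
    """D(w1 || w1') = sum over w2 of p(w2) * log(p(w2) / q(w2)).
    q[w2] must be > 0 (i.e. smoothed) wherever p[w2] > 0,
    otherwise the divergence is undefined."""
    return sum(pw * math.log(pw / q[w2]) for w2, pw in p.items())

p = {"plans": 0.6, "action": 0.3, "sense": 0.1}   # P(w2 | w1)
q = {"plans": 0.5, "action": 0.4, "sense": 0.1}   # P(w2 | w1')
print(kl_divergence(p, q))   # non-negative; 0 only when p == q
```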
Total divergence to the average is a relative measure based on the total KL divergence to the average of the two distributions:

A(w1, w1') = D(w1 || avg) + D(w1' || avg), where avg is the distribution ( P(·|w1) + P(·|w1') ) / 2

Writing p = P(w2|w1) and q = P(w2|w1'), this reduces to

A(w1, w1') = 2 log 2 + Σ_{w2: p, q > 0} [ p log( p / (p+q) ) + q log( q / (p+q) ) ]
A(w1, w1') is bounded, ranging between 0 and 2 log 2. Smoothed estimates are not required because no ratio of probabilities across the two distributions is involved. Calculating A(w1, w1') requires summing only over those w2 for which P(w2|w1) and P(w2|w1') are both non-zero, which makes the computation quite fast.
The L1 norm is defined as

L(w1, w1') = Σ_w2 | P(w2|w1) − P(w2|w1') |

which, again writing p = P(w2|w1) and q = P(w2|w1'), reduces to a form depending only on the w2 shared by both distributions:

L(w1, w1') = 2 − 2 Σ_{w2: p, q > 0} min(p, q)

It is also bounded, ranging between 0 and 2.
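Sketches of both measures using the reduced forms above. Since the dicts hold only non-zero probabilities, iterating over the shared keys is exactly the fast shared-support computation; no smoothing is needed.

```python
import math

def total_div_to_avg(p, q):
    """A(w1,w1') = D(p || (p+q)/2) + D(q || (p+q)/2)
    = 2*log(2) + sum over shared w2 of p*log(p/(p+q)) + q*log(q/(p+q))."""
    shared = set(p) & set(q)
    s = sum(p[w] * math.log(p[w] / (p[w] + q[w])) +
            q[w] * math.log(q[w] / (p[w] + q[w])) for w in shared)
    return 2 * math.log(2) + s

def l1_norm(p, q):
    """L(w1,w1') = sum |p - q| = 2 - 2 * sum over shared w2 of min(p, q)."""
    return 2 - 2 * sum(min(p[w], q[w]) for w in set(p) & set(q))

p = {"plans": 0.6, "action": 0.4}
q = {"plans": 0.5, "action": 0.3, "sense": 0.2}
print(total_div_to_avg(p, q), l1_norm(p, q))   # both are 0 iff p == q
```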
Confusion probability estimates whether a word w1' can be substituted for a word w1:

P_C(w1'|w1) = Σ_w2 P(w1'|w2) P(w2|w1)

Unlike D, A, and L, under this measure w1 may not be "closest" to itself; i.e., there may exist a word w1' such that P_C(w1'|w1) > P_C(w1|w1).
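A sketch of this substitutability score, assuming both conditional tables are available as nested dicts of non-zero entries (the paper's exact normalization may differ).

```python
def confusion_prob(w1p, w1, p_w2_given_w1, p_w1_given_w2):
    """P_C(w1'|w1) = sum over w2 of P(w1'|w2) * P(w2|w1).
    A substitutability score, not a true distance: it may rank some
    other word above w1 itself."""
    return sum(p * p_w1_given_w2.get(w2, {}).get(w1p, 0.0)
               for w2, p in p_w2_given_w1[w1].items())
```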
Because the senses of an actual word, as given by a dictionary, may be very fine- or very coarse-grained, and correctly sense-tagging the training data would take a large amount of resources, the experiment was done on pseudo-words. E.g.: {make, take} plans and {make, take} action, where {make, take} is a pseudo-word tested with "plans" and "action".
In the experiment, each method is given a noun and two verbs and must decide which verb is more likely to take the noun as a direct object. The experiment used 587,833 bigrams to build the bigram language model and tested on 17,152 unseen bigrams, divided into five equal parts T1 to T5. Error rate was used as the performance metric.
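A minimal sketch of the decision rule and the metric; the function and variable names are illustrative, and p stands for whichever estimator is being evaluated.

```python
def choose_verb(noun, v1, v2, p):
    """Pick the verb more likely to take `noun` as its direct object,
    according to the model's estimate p(noun | verb)."""
    return v1 if p(noun, v1) >= p(noun, v2) else v2

def error_rate(test_cases, p):
    """Fraction of (noun, correct_verb, distractor_verb) cases misclassified."""
    wrong = sum(choose_verb(n, v_ok, v_bad, p) != v_ok
                for n, v_ok, v_bad in test_cases)
    return wrong / len(test_cases)
```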
Because back-off consistently performed worse than MLE, back-off was excluded from the remaining comparisons. Because the experiments were run only on unsmoothed data, KL divergence was also excluded.
Similarity-based methods performed 40% better than the back-off and MLE methods. Singletons should not be omitted from the training data for similarity-based methods. The total-divergence-to-the-average method (A) performed best in all cases.