Natural Language Processing and Web Designing Notes
1. CS460/626 : Natural Language
Processing/Speech, NLP and the Web
Lecture 24, 25, 26
Wordnet
Pushpak Bhattacharyya
CSE Dept.,
IIT Bombay
17th and 19th (morning and night), 2013
12. WSD should be distinguished
from structural ambiguity
Correct groupings a must
…
Iran quake kills 87, 400 injured
When it rains cats and dogs run for cover
13. Should be distinguished from
structural ambiguity
Correct groupings a must
…
Iran quake kills 87, 400 injured
When it rains, cats and dogs run for
cover
When it rains cats and dogs, run for
cover
14. Groups of words (Multiwords)
and names can be ambiguous
Broken guitar for sale, no strings
attached (Pun)
Washington voted Washington to power
pujaa ne pujaa ke liye phul todaa
(Pujaa plucked flowers for worship)
(deep world knowledge) The use of a
shin bone is to locate furniture in a dark
room
16. Example of WSD
Operation, surgery, surgical operation, surgical procedure, surgical
process -- (a medical procedure involving an incision with instruments;
performed to repair damage or arrest disease in a living body; "they
will schedule the operation as soon as an operating room is available";
"he died while undergoing surgery") TOPIC->(noun) surgery#1
Operation, military operation -- (activity by a military or naval force
(as a maneuver or campaign); "it was a joint operation of the navy and
air force") TOPIC->(noun) military#1, armed forces#1, armed
services#1, military machine#1, war machine#1
Operation -- ((computer science) data processing in which the result
is completely specified by a rule (especially the processing that results
from a single instruction); "it can perform millions of operations per
second") TOPIC->(noun) computer science#1, computing#1
mathematical process, mathematical operation, operation --
((mathematics) calculation by mathematical methods; "the problems at
the end of the chapter demonstrated the mathematical processes
involved in the derivation"; "they were learning the basic operations of
arithmetic") TOPIC->(noun) mathematics#1, math#1, maths#1
IS WSD NEEDED IN LARGE APPLICATIONS?
17. Word ambiguity → topic drift in
IR
Query word:
“Madrid bomb blast case”
{case, container}
{case, suit, lawsuit}
{suit, apparel}
Drifted topic due to expanded term!!!
Drifted topic due to inapplicable sense!!!
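A minimal sketch of how the drift above can arise, assuming a naive query expander that adds every synonym of every synset it touches. The three synsets are the ones on the slide; the function name expand and the hops parameter are illustrative assumptions, not part of any IR system described in the lecture.

```python
# Synsets copied from the slide; everything else here is illustrative.
SYNSETS = [
    {"case", "container"},
    {"case", "suit", "lawsuit"},
    {"suit", "apparel"},
]

def expand(terms, hops=1):
    """Add all synonyms reachable in `hops` rounds of synset lookup."""
    expanded = set(terms)
    for _ in range(hops):
        frontier = set(expanded)  # snapshot, so each round is exactly one hop
        for synset in SYNSETS:
            if frontier & synset:
                expanded |= synset
    return expanded

one_hop = expand({"case"}, hops=1)
two_hops = expand({"case"}, hops=2)
print(sorted(one_hop))   # "container" comes from an inapplicable sense of "case"
print(sorted(two_hops))  # "apparel" comes from expanding the added term "suit"
```

Restricting expansion to the synset of the intended sense ({case, suit, lawsuit}) removes "container", and stopping after one hop removes "apparel".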
19. How about WSD and MT?
Zaheer Khan, the India fast
bowler, has been ruled out of the
remainder of the series against
England.
He will return to India and will be
replaced by left-arm seamer RP
Singh.
Zaheer picked up a hamstring
injury during the first Test at
Lord's.
He had been withdrawn from the
squad for India's recent Test series
in the West Indies due to a right
ankle injury.
भारत के तेज गेंदबाज, जहीर खान, इंग्लैंड
के खिलाफ श्रृंखला के शेष के बाहर शासन
किया गया है. (ruled in the
administrative sense??)
वह भारत लौटने और बाएँ हाथ के तेज
गेंदबाज आरपी सिंह द्वारा प्रतिस्थापित
किया जाएगा.
जहीर लॉर्ड्स में पहले टेस्ट के दौरान
हैमस्ट्रिंग चोट उठाया. (lifted??)
वह भारत की वेस्ट इंडीज में हाल ही में
एक सही (correct??) टखने की चोट के
कारण टेस्ट श्रृंखला के लिए टीम से वापस
ले लिया गया था.
21. Psycholinguistic Theory
Human lexical memory for nouns as a hierarchy.
Can a canary sing? - Pretty fast response.
Can a canary fly? - Slower response.
Does a canary have skin? - Slowest response.
Hierarchy (each property is stored at the most general node it applies to):
Animal (can move, has skin)
  Bird (can fly)
    canary (can sing)
Wordnet - a lexical reference system based on psycholinguistic theories of
human lexical memory.
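A minimal sketch of the response-time argument, assuming a toy three-node hierarchy built from the slide. The number of links climbed stands in for response time; the names PARENT, PROPERTIES and levels_to_property are illustrative and do not come from Wordnet.

```python
# Toy noun hierarchy from the slide: canary -> bird -> animal.
PARENT = {"canary": "bird", "bird": "animal", "animal": None}
PROPERTIES = {
    "canary": {"can sing"},
    "bird": {"can fly"},
    "animal": {"can move", "has skin"},
}

def levels_to_property(concept, prop):
    """Count the hypernym links climbed before `prop` is found, or return None."""
    hops = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES[node]:
            return hops
        node = PARENT[node]
        hops += 1
    return None

for question in ("can sing", "can fly", "has skin"):
    print(question, levels_to_property("canary", question))
```

More links climbed corresponds to the slower responses reported for "fly" and "skin".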
22. Essential Resource for WSD:
Wordnet
Lexical matrix (rows: Word Meanings; columns: Word Forms)
        F1               F2             F3            …                    Fn
M1      E1,1 (depend)    E1,2 (bank)    E1,3 (rely)
M2                       E2,2 (bank)                  E2,… (embankment)
M3                       E3,2 (bank)    E3,3
…                        …              …
Mm                                                                         Em,n
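A minimal sketch of the lexical matrix as a data structure, assuming a mapping from meanings to the word forms that express them. A row with several forms shows synonymy; a form that appears in several rows shows polysemy. The MATRIX contents follow the slide only loosely (M3 is reduced to the one form named there), and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical fragment of the matrix: meaning id -> word forms expressing it.
MATRIX = {
    "M1": ["depend", "bank", "rely"],
    "M2": ["bank", "embankment"],
    "M3": ["bank"],
}

def synonyms(meaning):
    """Forms in one row: these are synonyms for that meaning."""
    return MATRIX[meaning]

def senses_of(form):
    """Rows in which a form appears: more than one means the form is polysemous."""
    return sorted(m for m, forms in MATRIX.items() if form in forms)

def polysemous_forms():
    """All forms that express more than one meaning."""
    counts = defaultdict(int)
    for forms in MATRIX.values():
        for form in forms:
            counts[form] += 1
    return sorted(f for f, c in counts.items() if c > 1)

print(synonyms("M1"))
print(senses_of("bank"))
print(polysemous_forms())
```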
23. Wordnet: History
The first wordnet in the world was for English
developed at Princeton over 15 years.
The Eurowordnet - a linked structure of European
language wordnets - was built in 1998 over 3 years
with funding from the EC as a mission mode
project.
Wordnets for Hindi and Marathi being built at IIT
Bombay are amongst the first IL wordnets.
All these are proposed to be linked into the
IndoWordnet which eventually will be linked to the
English and the Euro wordnets.
24. Basic Principle
Words in natural languages are polysemous.
However, when synonymous words are put
together, a unique meaning often emerges.
Use is made of Relational Semantics.
25. Lexical and Semantic relations
in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word), rest are
semantic (synset to synset).
27. Fundamental Design Question
Syntagmatic vs. Paradigmatic relations?
Psycholinguistics is the basis of the design.
When we hear a word, many words come to
our mind by association.
For English, about half of the associated
words are syntagmatically related and half
are paradigmatically related.
For cat
animal, mammal- paradigmatic
mew, purr, furry- syntagmatic
28. Stated Fundamental Application of
Wordnet: Sense Disambiguation
Determination of the correct sense of the
word
The crane ate the fish vs.
The crane was used to lift the load
bird vs. machine
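A minimal sketch of one way to pick between the two senses of crane, assuming a simplified Lesk-style gloss overlap. The lecture does not name this technique at this point, and the two glosses below are written for illustration rather than copied from Wordnet. The sense whose gloss shares the most words with the sentence wins.

```python
# Hypothetical glosses for the two senses of "crane".
SENSES = {
    "bird": "large long-necked wading bird that eats fish and frogs",
    "machine": "machine used to lift and move a heavy load with a long arm",
}
STOPWORDS = {"the", "a", "to", "was", "and", "that", "with", "of"}

def tokens(text):
    """Lower-cased content words of a sentence or gloss."""
    words = (w.strip(".,;") for w in text.lower().split())
    return {w for w in words if w and w not in STOPWORDS}

def disambiguate(sentence, senses=SENSES):
    """Return the sense label with the largest gloss/context word overlap."""
    context = tokens(sentence)
    scores = {label: len(context & tokens(gloss)) for label, gloss in senses.items()}
    return max(scores, key=scores.get)  # ties go to the first sense listed

print(disambiguate("The crane ate the fish"))
print(disambiguate("The crane was used to lift the load"))
```

Exact word overlap is brittle ("ate" does not match "eats"), which is one reason the relations in a wordnet are used to widen the match.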
29. The problem of Sense tagging
Given a corpus, assign the correct
sense to each word.
This is sense tagging. It needs Word
Sense Disambiguation (WSD).
Highly important for Question
Answering, Machine Translation,
Text Mining tasks.
31. Example of sense marking: its
need
एक_4187 नए शोध_1138 के अनुसार_3123 जिन लोगों_1189 का सामाजिक_43540 जीवन_125623
व्यस्त_48029 होता है उनके दिमाग_16168 के एक_4187
हिस्से_120425 में अधिक_42403 जगह_113368 होती है।
(According to a new research, those people who have a busy social life, have larger space in a part of
their brain).
नेचर न्यूरोसाइंस में छपे एक_4187 शोध_1138 के अनुसार_3123 कई_4118 लोगों_1189 के दिमाग_16168
के स्कैन से पता_11431 चला कि दिमाग_16168 का एक_4187 हिस्सा_120425 एमिगडाला सामाजिक_43540
व्यस्तताओं_1438 के साथ_328602 सामंजस्य_166
के लिए थोड़ा_38861 बढ़_25368 जाता है। यह शोध_1138 58 लोगों_1189 पर किया गया जिसमें उनकी
उम्र_13159 और दिमाग_16168 की साइज़ के आँकड़े_128065
लिए गए। अमरीकी_413405 टीम_14077 ने पाया_227806 कि जिन लोगों_1189 की सोशल नेटवर्किंग
अधिक_42403 है उनके दिमाग_16168 का एमिगडाला
वाला हिस्सा_120425 बाकी_130137 लोगों_1189 की तुलना_में_38220 अधिक_42403 बड़ा_426602 है।
दिमाग_16168 का एमिगडाला वाला हिस्सा_120425
भावनाओं_1912 और मानसिक_42151 स्थिति_1652 से जुड़ा हुआ माना_212436 जाता है।
(According to research published in Nature Neuroscience, scans of many people's brains showed that
one part of the brain, the amygdala, grows slightly to cope with a busy social life. The study covered
58 people and recorded their age and brain size. The American team found that the amygdala is larger
in people who do more social networking than in others. The amygdala is believed to be linked to
emotions and mental state.)
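A minimal sketch of reading sense-marked text of this kind, assuming each tagged token has the shape word_ID with a numeric sense ID after the last underscore and that untagged tokens carry no sense. The function name parse_sense_tags and the regular expression are illustrative; the actual tagging tool's format may differ.

```python
import re

# A tagged token ends in "_<digits>"; the word part may itself contain "_".
TAGGED = re.compile(r"^(?P<word>.+)_(?P<sense>\d+)$")

def parse_sense_tags(text):
    """Split text on whitespace and return (word, sense_id or None) pairs."""
    pairs = []
    for token in text.split():
        match = TAGGED.match(token)
        if match:
            pairs.append((match.group("word"), int(match.group("sense"))))
        else:
            pairs.append((token, None))
    return pairs

sample = "एक_4187 शोध_1138 58 तुलना_में_38220"
for word, sense in parse_sense_tags(sample):
    print(word, sense)
```

Splitting on the last underscore keeps multiword units such as तुलना_में together as one tagged item.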
32. Ambiguity of लोग (People)
लोग, जन, लोक, जनमानस, पब्लिक - एक से अधिक
व्यक्ति "लोगों के हित में काम करना चाहिए"
(gloss: more than one person; "one should work in the interest of the people")
(English synset) multitude, masses, mass, hoi_polloi,
people, the_great_unwashed - the common people
generally "separate the warriors from the mass"
"power to the people"
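दुनिया, दुनियाँ, संसार, विश्व, जगत, जहाँ, जहान, ज़माना,
जमाना, लोक, दुनियावाले, दुनियाँवाले, लोग - संसार में
रहने वाले लोग "महात्मा गाँधी का सम्मान पूरी दुनिया
करती है / मैं इस दुनिया की परवाह नहीं करता / आज
की दुनिया पैसे के पीछे भाग रही है"
(gloss: the people living in the world; "the whole world respects Mahatma Gandhi / I do not care
about this world / today's world is running after money")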
(English synset) populace, public, world - people in
general considered as a whole "he is a hero in the
eyes of the public”
33. Basic Principle
Words in natural languages are polysemous.
However, when synonymous words are put
together, a unique meaning often emerges.
Use is made of Relational Semantics.
Componential Semantics where each word is
a bundle of semantic features (as in the
Schankian Conceptual Dependency system or
Lexical Componential Semantics) is to be
examined as a viable alternative.
34. Componential Semantics
Consider cat and tiger.
Decide on componential attributes.
         Furry   Carnivorous   Heavy   Domesticable
cat      Y       Y             N       Y
tiger    Y       Y             Y       N
Complete and correct attributes are difficult to design.
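A minimal sketch of the componential view, assuming each word is stored as a tuple of Boolean features in the order Furry, Carnivorous, Heavy, Domesticable. The values for cat and tiger are the ones on the slide; the helper names are illustrative.

```python
ATTRIBUTES = ("furry", "carnivorous", "heavy", "domesticable")

# Feature bundles from the slide: cat (Y, Y, N, Y), tiger (Y, Y, Y, N).
FEATURES = {
    "cat": (True, True, False, True),
    "tiger": (True, True, True, False),
}

def differing_attributes(a, b):
    """Names of the attributes on which two words disagree."""
    return [name for name, x, y in zip(ATTRIBUTES, FEATURES[a], FEATURES[b]) if x != y]

def similarity(a, b):
    """Fraction of attributes on which two words agree."""
    shared = sum(x == y for x, y in zip(FEATURES[a], FEATURES[b]))
    return shared / len(ATTRIBUTES)

print(differing_attributes("cat", "tiger"))
print(similarity("cat", "tiger"))
```

Adding a third animal usually forces a new attribute, which illustrates why a complete and correct attribute set is hard to design.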
35. Semantic relations in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word), rest are
semantic (synset to synset).
36. Synset: the foundation
(house)
1. house -- (a dwelling that serves as living quarters for one or more families; "he has a house on
Cape Cod"; "she felt she had to get out of the house")
2. house -- (an official assembly having legislative powers; "the legislature has two houses")
3. house -- (a building in which something is sheltered or located; "they had a large carriage house")
4. family, household, house, home, menage -- (a social unit living together; "he moved his family to
Virginia"; "It was a good Christian household"; "I waited until the whole house was asleep"; "the
teacher asked how many people made up his home")
5. theater, theatre, house -- (a building where theatrical performances or motion-picture shows can
be presented; "the house was full")
6. firm, house, business firm -- (members of a business organization that owns or operates one or
more establishments; "he worked for a brokerage house")
7. house -- (aristocratic family line; "the House of York")
8. house -- (the members of a religious community living together)
9. house -- (the audience gathered together in a theatre or cinema; "the house applauded"; "he
counted the house")
10. house -- (play in which children take the roles of father or mother or children and pretend to
interact like adults; "the children were playing house")
11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal
areas into which the zodiac is divided)
12. house -- (the management of a gambling house or casino; "the house gets a percentage of every
bet")
38. Synset creation (continued)
Home
John’s home was decorated with lights on the occasion of
Christmas.
Having worked for many years abroad, John returned home.
House
John’s house was decorated with lights on the occasion of
Christmas.
Mercury is situated in the eighth house of John’s horoscope.
39. Synsets (continued)
{house} is ambiguous.
{house, home} has the sense of a social unit living
together;
Is this the minimal unit?
{family, house , home} will make the unit completely
unambiguous.
For coverage:
{family, household, house, home} ordered according
to frequency.
Replaceability of the most frequent words is a
requirement.
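A minimal sketch of the "minimal unit" idea, assuming a hypothetical sense inventory in which each word lists the senses it can express. A set of words is unambiguous when exactly one sense is shared by all of them. The sense labels and the SENSES table are invented for illustration and are far smaller than the real entries.

```python
# Hypothetical sense inventory (labels are illustrative, not Wordnet IDs).
SENSES = {
    "house": {"dwelling", "legislature", "social_unit", "theatre", "zodiac"},
    "home": {"dwelling", "social_unit", "country"},
    "family": {"social_unit", "lineage"},
    "household": {"social_unit"},
}

def shared_senses(words):
    """Senses that every word in the candidate synset can express."""
    return set.intersection(*(SENSES[w] for w in words))

def is_unambiguous(words):
    """True when the words jointly pin down exactly one sense."""
    return len(shared_senses(words)) == 1

for candidate in (["house"], ["house", "home"], ["family", "house", "home"]):
    print(candidate, sorted(shared_senses(candidate)), is_unambiguous(candidate))
```

Under this toy inventory {house, home} still shares two senses, and adding family leaves only the social-unit sense.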
40. Synset creation
From first principles
Pick all the senses from good standard
dictionaries.
Obtain synonyms for each sense.
Needs hard and long hours of work.
41. Synset creation (continued)
From the wordnet of another language in the same
family
Pick the synset and obtain the sense from the
gloss.
Get the words of the target language.
Often the same words can be used - especially for
tatsama words.
Translation, Insertion and deletion.
42. Synset+Gloss+Example
Crucially needed for concept explication, wordnet building using
another wordnet and wordnet linking.
English Synset: {earthquake, quake, temblor, seism} -- (shaking
and vibration at the surface of the earth resulting from
underground movement along a fault plane or from volcanic
activity)
Hindi Synset: {भूकंप, भूचाल, भूडोल, जलजला, भूकम्प, भू-कंप, भू-कम्प,
ज़लज़ला, भूमिकंप, भूमिकम्प} - प्राकृतिक कारणों से पृथ्वी के भीतरी
भाग में कुछ उथल-पुथल होने से ऊपरी भाग के सहसा हिलने की
क्रिया "२००१ में गुज़रात में आये भूकंप में काफ़ी लोग मारे गये थे"
(shaking of the surface of earth; many were killed in the earthquake in
Gujarat)
Marathi Synset: {धरणीकंप, भूकंप} - पृथ्वीच्या पोटात क्षोभ होऊन पृष्ठभाग
हालण्याची क्रिया "२००१ साली गुजरातमध्ये झालेल्या धरणीकंपात अनेक लोक
मृत्युमुखी पडले"
(shaking of the earth's surface caused by a disturbance in its interior; many people died in the
earthquake in Gujarat in 2001)
43. Semantic Relations
Hypernymy and Hyponymy
Relation between word senses (synsets)
X is a hyponym of Y if X is a kind of Y
Hyponymy is transitive and asymmetrical
Hypernymy is inverse of Hyponymy
(lion->animal->animate entity->entity)
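A minimal sketch of transitivity and asymmetry, assuming the single chain on the slide is stored as a child-to-parent map. The names HYPERNYM, hypernym_chain and is_hyponym_of are illustrative; a real wordnet allows several hypernyms per synset.

```python
# The chain from the slide: lion -> animal -> animate entity -> entity.
HYPERNYM = {
    "lion": "animal",
    "animal": "animate entity",
    "animate entity": "entity",
}

def hypernym_chain(concept):
    """All ancestors of a concept, nearest first."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain

def is_hyponym_of(x, y):
    """X is a kind of Y if Y appears anywhere above X (transitivity)."""
    return y in hypernym_chain(x)

print(hypernym_chain("lion"))
print(is_hyponym_of("lion", "entity"))   # holds through two intermediate links
print(is_hyponym_of("entity", "lion"))   # the relation does not hold in reverse
```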
44. Semantic Relations (continued)
Meronymy and Holonymy
Part-whole relation, branch is a part of tree
X is a meronym of Y if X is a part of Y
Holonymy is the inverse relation of
Meronymy
{kitchen} ………………………. {house}
45. Lexical Relation
Antonymy
Oppositeness in meaning
Relation between word forms
Often determined by phonetics, word
length etc. ({rise, ascend} vs. {fall,
descend})
48. Entailment
Snoring entails sleeping.
Buying entails paying.
Proper Temporal Inclusion.
Inclusion can be in any way.
Sleeping temporally includes snoring.
Buying temporally includes paying.
Co-extensiveness. (Troponymy)
Limping is a manner of walking.
49. Opposition among verbs.
{Rise,ascend} {fall,descend}
Tie-untie (do-undo)
Walk-run (slow,fast)
Teach-learn (same activity different perspective)
Rise-fall (motion upward or downward)
Opposition and Entailment.
Hit or miss (entail aim) . Backward presupposition.
Succeed or fail (entail try.)
50. The causal relationship.
Show- see.
Give- have.
Causation and Entailment.
Giving entails having.
Feeding entails eating.
52. Kinds of Antonymy
Kind          Example
Size          Small - Big
Quality       Good - Bad
State         Warm - Cool
Personality   Dr. Jekyll - Mr. Hyde
Direction     East - West
Action        Buy - Sell
Amount        Little - A lot
Place         Far - Near
Time          Day - Night
Gender        Boy - Girl
53. Kinds of Meronymy
Kind                Example
Component-object    Head - Body
Stuff-object        Wood - Table
Member-collection   Tree - Forest
Feature-Activity    Speech - Conference
Place-Area          Palo Alto - California
Phase-State         Youth - Life
Resource-process    Pen - Writing
Actor-Act           Physician - Treatment
54. Gradation
Kind          Example
State         Childhood, Youth, Old age
Temperature   Hot, Warm, Cold
Action        Sleep, Doze, Wake
55. Metonymy
Associated with Metaphors which are
epitomes of semantics
Oxford Advanced Learner's Dictionary
definition: “The use of a word or phrase
to mean something different from the
literal meaning”
Does it mean Careless Usage?!
56. Insight from Sanskritic
Tradition
Power of a word
Abhidha, Lakshana, Vyanjana
Meaning of Hall:
The hall is packed (abhidha)
The hall burst into laughter (lakshana)
The Hall is full (unsaid: and so we cannot
enter) (vyanjana)
57. Metaphors in Indian Tradition
upamana and upameya
Former: object being compared with
Latter: object being compared
Puru was like a lion in the battle with
Alexander (Puru: upameya; Lion:
upamana)
58. Upamana, rupak, atishayokti
upamana: Explicit comparison
Puru was like a lion in the battle with
Alexander
rupak: Implicit comparison
Puru was a lion in the battle with
Alexander
Atishayokti (exaggeration): upamana
and upameya dropped
Puru’s army fled. But the lion fought on.
59. Modern study (1956 onwards,
Richards et al.)
Three constituents of metaphor
Vehicle (items used metaphorically)
Tenor (the metaphorical meaning of the former)
Ground (the basis for metaphorical extension)
“The foot of the mountain”
Vehicle: “foot”
Tenor: “lower portion”
Ground: “spatial parallel between the relationship
of the foot to the human body and that of the
lower portion of the mountain to the rest of the
mountain”
60. Interaction of semantic fields
(Haas)
Core vs. peripheral semantic fields
Interaction of two words in metonymic
relation brings in new semantic fields
with selective inclusion of features
Leg of a table
Does not stretch or move
Does stand and support
62. Mapping Relations: ontological
correspondences
Anger is heat of fluid in container
Heat (source)               Anger (target)
(i) Container               Body
(ii) Agitation of fluid     Agitation of mind
(iii) Limit of resistance   Limit of ability to suppress
(iv) Explosion              Loss of control
63. Image Schemas
Categories: Container Contained
Quantity
More is up, less is down: Outputs rose
dramatically; accident rates were lower
Linear scales and paths: Ram is by far the best
performer
Time
Stationary event: we are coming to exam time
Stationary observer: weeks rush by
Causation: desperation drove her to extreme
steps
64. Patterns of Metonymy
Container for contained
The kettle boiled (water)
Possessor for possessed/attribute
Where are you parked? (car)
Represented entity for representative
The government will announce new targets
Whole for part
I am going to fill up the car with petrol
65. Patterns of Metonymy (contd)
Part for whole
I noticed several new faces in the class
Place for institution
Lalbaug witnessed the largest Ganapati
Question: Can you have part-part metonymy?
66. Purpose of Metonymy
More idiomatic/natural way of expression
More natural to say the kettle is boiling as
opposed to the water in the kettle is boiling
Economy
Room 23 is answering (but not *is asleep)
Ease of access to referent
He is in the phone book (but not *on the back of
my hand)
Highlighting of associated relation
The car in the front decided to turn right (but not
*to smoke a cigarette)
67. Feature sharing not necessary
In a restaurant:
Jalebii ko abhi dudh chaiye (The jalebi now
wants milk) (no feature sharing)
The elephant now wants some coffee
(feature sharing)
68. Proverbs
Describes a specific event or state of
affairs which is applicable
metaphorically to a range of events or
states of affairs provided they have the
same or sufficiently similar image-
schematic structure
72. Size of Indian Language
wordnets (June, 2012) 1/2
Language    Size    Institution
Assamese    14958   Gauhati University, Guwahati, Assam
Bengali     23765   Indian Statistical Institute, Kolkata, West Bengal
Bodo        15785   Gauhati University, Guwahati, Assam
Gujarati    26580   Dharmsingh Desai University, Nadiad, Gujarat
Kannada     4408    Mysore University, Mysore, Karnataka
Kashmiri    23982   Kashmir University, Srinagar, Jammu and Kashmir
Konkani     25065   Goa University, Panaji, Goa
Malayalam   8557    Amrita University, Coimbatore, Tamil Nadu
Manipuri    16351   Manipur University, Imphal, Manipur
Marathi     24954   IIT Bombay, Mumbai, Maharashtra
73. Size of Indian Language
wordnets (June, 2012) 2/2
Language    Size    Institution
Nepali      11713   Assam University, Silchar, Assam
Oriya       31454   Hyderabad Central University, Hyderabad, Andhra Pradesh
Punjabi     22332   Thapar University and Punjabi University, Patiala, Punjab
Sanskrit    18980   IIT Bombay, Mumbai
Tamil       8607    Tamil University, Thanjavur, Tamil Nadu
Telugu      14246   Dravidian University, Kuppam, Andhra Pradesh
Urdu        23071   Jawaharlal Nehru University, New Delhi
74. Categories of Synsets (1/2)
•Universal: Synsets which have an indigenous lexeme in
all the languages (e.g. Sun ,Earth).
•Pan Indian: Synsets which have indigenous lexeme in all
the Indian languages but no English equivalent (e.g.
Paapad).
•In-Family: Synsets which have indigenous lexeme in the
particular language family (e.g. the term for Bhatija in
Dravidian languages).
75. Categories of Synsets (2/2)
•Language specific: Synsets which are unique to a
language (e.g. Bihu in Assamese language)
•Rare: Synsets which express technical terms (e.g. ngram).
•Synthesized: Synsets created in the language due to
influence of another language (e.g. Pizza).
76. Expansion approach: linking is
a subtle and difficult process
To link or not to link
While linking:
face lexical and semantic chasms
Syntactic divergences in the example
sentences
Change of POS
Copula drop (Hindi → Bangla)
77. Recap: Synset creation by
Expansion approach
From the wordnet of another language preferably in the
same family
Pick the synset and obtain the sense from the
gloss.
Get the words of the target language.
Often same words can be used- especially for
words with the same etymology borrowed from
the parent language in the typology.
Translation, Insertion and deletion.
78. Illustration of expansion
approach with noun1
English
bank (sloping land
(especially the slope
beside a body of water))
"they pulled the canoe up
on the bank"; "he sat on
the bank of the river and
watched the currents"
French (wrong!)
banque (les terrains en
pente (en particulier la
pente à côté d'un plan
d'eau)) "ils ont tiré le
canot sur la rive"; "il
était assis sur le bord de
la rivière et j'ai vu les
courants"
79. Illustration of expansion
approach with noun2
English
bank (sloping land
(especially the slope
beside a body of water))
"they pulled the canoe up
on the bank"; "he sat on
the bank of the river and
watched the currents"
French
{rive, rivage, bord} (les
terrains en pente (en
particulier la pente à côté
d'un plan d'eau)) "ils ont
tiré le canot sur la
rive"; "il était assis sur le
bord de la rivière et j'ai
vu les courants"
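[Figure: linking the English Wordnet synset {bank} to the French Wordnet synset {rive, rivage, bord}; other node labels: cote, edge, and a link marked "?". Annotation: no hypernymy in the synset.]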
80. Illustration of expansion
approach with verb3
English
trust, swear, rely, bank
(have confidence or faith
in) "We can trust in God";
"Rely on your friends";
"bank on your good
education"
French
compter_sur,
avoir_confiance_en,
se_fier_à,
faire_confiance_à (avoir
confiance ou foi en)
"Nous pouvons faire
confiance en Dieu"; "Fiez-
vous à vos amis",
Ordered by frequency
81. Case of Kashmiri
Linking kinship relations and fine
grained concepts
Relative
Uncle
Mama
Chacha
पानी (water) direct आब
पानी (water) hypernym त्रेश
82. Important decision
TWO kinds of linkages
Direct
Hypernymy
Case of Kashmiri
पानी (water) direct आब
पानी (water) hypernym त्रेश
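A minimal sketch of storing the two kinds of linkage, assuming each cross-lingual link records whether the target is an exact equivalent (direct) or only a more general concept (hypernymy). The class names, the LINKS table and the romanized entries are illustrative assumptions; the kinship entries follow the Relative > Uncle > Mama/Chacha hierarchy from the previous slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class LinkType(Enum):
    DIRECT = "direct"        # the target synset means the same thing
    HYPERNYMY = "hypernymy"  # the target synset is only a more general concept

@dataclass(frozen=True)
class CrossLink:
    source: str
    target: str
    kind: LinkType

# Hypothetical links from source-language words to English synset heads.
LINKS = {
    "paani": CrossLink("paani", "water", LinkType.DIRECT),
    "mama": CrossLink("mama", "uncle", LinkType.HYPERNYMY),
    "chacha": CrossLink("chacha", "uncle", LinkType.HYPERNYMY),
}

def describe(word: str) -> Optional[str]:
    """Explain how a word is linked, or return None when no link exists."""
    link = LINKS.get(word)
    if link is None:
        return None
    if link.kind is LinkType.DIRECT:
        return f"{link.source} = {link.target}"
    return f"{link.source} is a kind of {link.target}"

for word in ("paani", "mama", "chacha"):
    print(describe(word))
```

Recording the link type lets a downstream application know when a translation through the link loses a fine-grained distinction.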
84. Transliteration: often
employed
Synset ID : 39 POS : adjective Synonyms :
सनाथ (sanaatha)
Gloss : जिसका कोई पालन-पोषण या देखभाल करने वाला
हो (one who has someone to look after them; the opposite of an orphan)
Example statement : "सनाथ बालकों को अनाथ बालकों
की मदद करनी चाहिए (children who are looked after
should help the ones who are orphans)/ साधक प्रभु
का हो जाने पर अनाथ नहीं रहता, सनाथ हो जाता है (a seeker who comes to
belong to the Lord is no longer an orphan but becomes sanaatha)”
Transliterated and adopted by Bangla and Gujarati
86. Linking synsets across languages:
Influence on Hindi Wordnet
Hindi wordnet has to add new synsets to accommodate
language specific concepts, e.g., in Gujarati
ભૈરવજપ (bhairav jap)
ID :: 103040
CAT :: NOUN
CONCEPT :: मोक्ष के लिए जप करते हुए पर्वत पर से अपने आप को
गिराना (Taking God’s name and throwing oneself from
atop a mountain to attain liberation)
EXAMPLE :: गिरनार के शिखर पर से यात्रिक भैरवजप करते
थे ऐसा माना जाता है। (it is thought that pilgrims used to
do bhairav jap atop Girnar)
SYNSET-HINDI :: भैरवजप
95. Achievements so far
Significant progress in the number and
quality of synsets of all languages
Project on track
Deep insights obtained in the expansion
approach
Platform created for multilingual WSD,
Multilingual sense based dictionary
Paving way for CLIR, MT
Students graduated/graduating (PhD,
masters, bachelors)
Significant publications (LREC, GWC)