Natural Language Processing Topics for Engineering students
 Distinction between language processing applications
and data processing systems:
the need for knowledge of language
 Eg: use of wc in Linux
 Use of data processing: counting bytes and lines
 Use of knowledge: counting the words in a file (needs
knowledge about what it means to be a word)
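A minimal sketch in Python (an illustration not present in the original slides) contrasting the two: counting bytes and lines is pure data processing, while counting words already needs a, here very crude, notion of what a word is:

# Counting bytes and lines needs no linguistic knowledge;
# counting words needs at least a (naive) definition of "word".
def wc(path):
    with open(path, "rb") as f:
        data = f.read()
    n_bytes = len(data)                            # data processing
    text = data.decode("utf-8", errors="replace")
    n_lines = text.count("\n")                     # data processing
    n_words = len(text.split())                    # needs a notion of "word"
    return n_lines, n_words, n_bytes

print(wc("example.txt"))   # hypothetical file name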
 Eg: HAL, a conversational agent
 Able to recognize words from an audio
signal (speech recognition) and generate an
audio signal from a sequence of
words (speech synthesis).
 Knowledge required: phonetics, phonology
How words are
pronounced
How sounds are
realized acoustically
 Morphology: producing and recognizing
variations of individual words(like singular and
plural)
 Structural knowledge(syntax): properly string
together the words that constitute its response.
 Semantics: knowledge of meaning
 Lexical semantics (the meanings of individual words, like silk and export in the
following example)
 Compositional semantics (what does end mean when combined
with 18th century?)
Eg: How much Chinese silk was exported to Western Europe by
the end of the 18th century?
 Pragmatics: Knowledge of the relationship of
meaning to the goals and intentions of the
speaker.
Example:
Request: John, open the front door.
statement: John, the front door is open.
Information Question: John, is the front door
open?
 Knowledge about kind of actions that
speakers intend by their use of sentences.
 Also known as dialogue knowledge.
 Discourse: knowledge about linguistic units
larger than a single utterance.
Example: How many students were in the classroom
at that time?
To interpret phrases like that time, a QA system
needs to examine the earlier questions that
were asked; for example, in the above context, the preceding
utterance may be:
An ISRO scientist came to the CSE classroom for
a motivational speech by the last week of
January.
 The use of knowledge about how words like
that or pronouns like it or she refer to
previous parts of the discourse is known as
coreference resolution.
 If multiple, alternative linguistic structures
can be built for an input – it is ambiguous.
 Example: John went to the bank.
(The bank may be edge of the river or
financial bank.)
 Part-of-speech tagging: deciding whether a
word is a noun or a verb, etc.
 Word sense disambiguation: deciding the
correct sense based on context.
 Speech act interpretation: Determining
whether a sentence is a statement or
question.
 Probabilistic parsing: for addressing syntactic
disambiguation.
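As an illustrative sketch (not part of the original slides), NLTK's simplified Lesk implementation can be used for word sense disambiguation of the bank example, assuming nltk and its WordNet data are installed:

from nltk import word_tokenize
from nltk.wsd import lesk

# Pick the WordNet sense of "bank" that best overlaps with the context.
sentence = word_tokenize("John went to the bank to deposit his money")
sense = lesk(sentence, "bank", pos="n")
print(sense, "-", sense.definition() if sense else "no sense found")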
 various kinds of linguistic knowledge can be
captured through formal models or theories.
 state machines(formal models that consists of states, transitions
among states, and an input representation.)
 Rule systems(Regular grammars, context free grammars)
 Logic
 Probabilistic models
 Vector space models
 Models in turn lend themselves to a small
number of algorithms.
 State space search
Dynamic Programming
 Machine learning Algorithms
Classifiers
Expectation-Maximization
 Truly intelligent machines: the ability of a
computer to process language as skilfully as
humans do.
 The first and major work was by Alan Turing.
 Turing Test-game in which computer’s use of
language would form the basis for
determining if the machine could think. If the
machine could win the game, it would be
judged intelligent.
 There are three participants in Turing's game:
 two people and a computer.
 One of the people is a contestant and plays the
role of interrogator.
 To win, the interrogator has to determine which
of the other two participants is the machine by
asking a series of questions.
 Turing's test greatly influenced the invention of
ELIZA (a natural language processing system
capable of carrying on a limited form of
conversation with a user).
 ELIZA is a simple program that uses pattern
matching to process the input and translate it
into suitable outputs.
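A minimal ELIZA-style sketch (the patterns below are illustrative, not Weizenbaum's original script): ordered pattern-response pairs applied with regular-expression substitution:

import re

rules = [
    (r".*\bI am (depressed|sad)\b.*", r"I am sorry to hear you are \1."),
    (r".*\ball\b.*",                  "In what way?"),
    (r".*\balways\b.*",               "Can you think of a specific example?"),
]

def eliza_respond(utterance):
    # Apply the first rule whose pattern matches the input.
    for pattern, response in rules:
        if re.match(pattern, utterance, re.IGNORECASE):
            return re.sub(pattern, response, utterance, flags=re.IGNORECASE)
    return "Please go on."

print(eliza_respond("I am depressed about my work"))
# -> I am sorry to hear you are depressed.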
 Many people who interacted with ELIZA believed
that it understood them and their
problems. Even after the program's operation
had been explained to them, people
continued to believe in ELIZA.
 These observations led to the design of
conversational agents.
 Regular expression is one way of describing finite
automata.
 Finite-state automata (FSAs) are the theoretical
foundation of a good deal of computational
work in language processing.
 Any regular expression can be implemented as a
Finite state automaton.
 Symmetrically, any finite-state automaton can be
described with a regular expression.
 We can represent the automaton as a directed
graph: a finite set of vertices (also called nodes),
together with a set of directed links between pairs
of vertices called arcs. We’ll represent vertices
with circles and arcs with arrows.
 The automaton has five STATES, which are
represented by nodes in the graph. State 0 is the
start state. State 4 is the final state or accepting
state, which we represent by the double circle. It
also has four transitions, which we represent by
arcs in the graph.
 the sheep language can be defined as any
string from the following (infinite) set:
baa!
baaa!
baaaa!
baaaaa!
baaaaaa!
------.
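This set can be captured by a single regular expression; a small sketch (Python, for illustration):

import re

sheeptalk = re.compile(r"^baa+!$")       # b, then two or more a's, then !
print(bool(sheeptalk.match("baaaa!")))   # True
print(bool(sheeptalk.match("ba!")))      # False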
 FSA can be used for recognizing (we also say
accepting) strings in the following way. First,
think of the input as being written on a long
tape broken up into cells, with one symbol
written in each cell of the tape, as the
following figure shows.
 The machine starts in the start state (q0), and iterates the
following process:
 Check the next letter of the input. If it matches the
symbol on an arc leaving the current state, then cross that
arc, move to the next state, and also advance one symbol
in the input. If we are in the accepting state (q4) when we
run out of input, the machine has successfully recognized
an instance of sheeptalk.
 If the machine never gets to the final state
(either because it runs out of input, or it gets
some input that doesn't match an arc, or it
just happens to get stuck in some non-final
state), we say the machine rejects or fails
to accept the input.
 We can represent Automaton with a state
transition table.
 the state-transition table represents the start state,
the accepting states, and what transitions leave
each state with which symbols.
 State 4 is marked with a colon to indicate that it's a final
state (you can have as many final states as you want),
and ∅ indicates an illegal or missing transition.
 We can read the first row as "if we're in state 0 and
we see the input b we must go to state 1. If we're in
state 0 and we see the input a or !, we fail".
 Algorithm for recognizing a string using a
state-transition table. The algorithm is called
D-RECOGNIZE for “deterministic
recognizer”.
 A deterministic algorithm is one that has no
choice points; the algorithm always knows
what to do for any input.
 The algorithm will fail whenever there is no
legal transition for a given combination of
state and input.
 We can think of “empty” elements in the table
as if they all pointed at one “empty” state,
which we might call the fail state or sink
state.
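A minimal sketch of D-RECOGNIZE in Python (an illustration, not the textbook's pseudocode), using the sheeptalk transition table, with missing transitions simply absent from the table:

table = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
final_states = {4}

def d_recognize(tape, table, start=0, finals=final_states):
    state = start
    for symbol in tape:
        if symbol not in table[state]:   # no legal transition: reject
            return False
        state = table[state][symbol]
    return state in finals               # accept only if we end in a final state

print(d_recognize("baaa!", table))   # True
print(d_recognize("ba!", table))     # False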
 A model which can both generate and
recognize all and only the strings of a formal
language acts as a definition of the formal
language.
 A formal language is a set of strings, each
string composed of symbols from a
finite symbol set called an alphabet.
 We use L(m) to mean "the formal language
characterized by m".
Example:
The alphabet for the sheep language is the set Σ = {a, b, !}.
So the formal language defined by our sheeptalk
automaton m is the infinite set:
L(m) = {baa!, baaa!, baaaa!, baaaaa!, baaaaaa!, . . .}
 Formal languages are not the same as natural
languages, which are the kind of languages
that real people speak.
 a formal language may bear no resemblance at
all to a real language.
 But we often use a formal language to model
part of a natural language.
 In a deterministic FSA, each transition
is uniquely determined by its source state
and input symbol.
 But in an NFSA, for some state and input symbol,
there may be zero, one, or more possible
next states.
 Here, if we are in state 2 and we see an a, we don't
know whether to remain in state 2 or go on to
state 3. Automata with decision points like
these are called NFSAs.
 NFSA have been generalized in multiple ways:
◦ NFSA with Epsilon moves
◦ Finite State Transducers
◦ Push Down Automata
◦ Probabilistic Automata
 Since there is more than one choice point in an
NFSA, the machine may make a wrong choice. There are 3
solutions to the problem of this non-
determinism:
◦ Backup: Whenever we come to a choice point, we could put a
marker to mark where we were in the input, and what state the
automaton was in. Then if it turns out that we took the wrong
choice, we could back up and try another path.
◦ Look-ahead: We could look ahead in the input to help us decide
which path to take.
◦ Parallelism: Whenever we come to a choice point, we could look
at every alternative path in parallel.
 ND-RECOGNIZE accomplishes the task of
recognizing strings in a regular language by
providing a way to systematically explore all the
possible paths through a machine.
 If this exploration yields a path ending in an
accept state, it accepts the string, otherwise it
rejects it.
 Algorithms which operate by systematically
searching for solutions are known as state
space search algorithms.
 The goal is to explore the space of possible
solutions, return an answer when one is found, and
reject the input when the space has been
exhaustively explored.
 Effectiveness depends on the order in which
the states in the space are considered.
 Depth First Search or Last In First Out
 Breadth First Search or First In First Out
 Consider an ordering strategy where the
states that are considered next are the most
recently created ones.
 Such a policy can be implemented by placing
newly created states at the front of the
agenda and having NEXT return the state at
the front of the agenda when called.
 Thus the agenda is implemented by a stack.
 The second way to order the states in the search space
is to consider states in the order in which they are
created.
 Such a policy can be implemented by placing newly
created states at the back of the agenda and still having
NEXT return the state at the front of the agenda.
 Thus the agenda is implemented via a queue. This is
commonly referred to as a breadth-first search or
First In First Out (FIFO) strategy.
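A small agenda-based sketch (illustrative Python, not the textbook's ND-RECOGNIZE pseudocode) showing that the same search code does depth-first search with a stack and breadth-first search with a queue:

from collections import deque

# Non-deterministic sheeptalk: in state 2, an 'a' may stay in 2 or move to 3.
ndelta = {0: {"b": [1]}, 1: {"a": [2]}, 2: {"a": [2, 3]}, 3: {"!": [4]}, 4: {}}
finals = {4}

def nd_recognize(tape, use_stack=True):
    agenda = deque([(0, 0)])                      # search states: (machine state, input index)
    while agenda:
        state, i = agenda.pop() if use_stack else agenda.popleft()
        if i == len(tape) and state in finals:
            return True                           # accept
        if i < len(tape):
            for nxt in ndelta.get(state, {}).get(tape[i], []):
                agenda.append((nxt, i + 1))       # generate successor search states
    return False                                  # agenda exhausted: reject

print(nd_recognize("baaa!"))                    # True (depth-first, LIFO)
print(nd_recognize("baaa!", use_stack=False))   # True (breadth-first, FIFO)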
 Morphology is the study of the way words are built up from smaller
meaning-bearing units called morphemes.
 Example:
cats - consists of two morphemes, cat and -s.
fox - consists of only one morpheme, fox.
 Two broad classes of morphemes
 Stems-main morpheme of the word, which supplies the main
meaning.
 Affixes- add additional meanings of various kinds.
 Prefixes-precede the stem.
 Eg:unhappy –composed of stem happy and the prefix un-
 Suffixes- follow the stem
 Eg: eats-composed of stem eat and suffix –s.
 Infixes - inserted inside the stem
 Eg: fanbloomingtastic - the word blooming inserted in the middle of fantastic.
 Circumfixes - do both: one part precedes the stem and the
other follows it.
 English and Malayalam don't really have circumfixes,
but many other languages like German do.
 Eg: sagen (say) → ge-sag-t (said)
 A word can have more than one affix.
◦ Eg: rewrites :
stem - write
prefix - re-
suffix - –s.
◦ Eg:unbelievably:
stem - believe
Prefix - un-
Suffix - -able, -ly
 English words rarely have more than four or five
affixes.
 But some languages like Turkish can support
words with nine or ten affixes, called
agglutinative languages.
 Inflection-combination of a word stem with a
grammatical morpheme resulting in a word of the
same class as the original stem.
 Eg:-s for marking the plural on nouns and –ed for
marking the past tense on verbs.
 Derivation - combination of a word stem with a
grammatical morpheme resulting in a word of
a different class from the original stem.
 Eg: adding the derivational suffix -ation to
computerize makes computerization.
 Compounding-combination of multiple word stems
together.
 Eg: doghouse - concatenation of the morphemes dog and
house.
 Cliticization - combination of a word stem with a clitic.
A clitic acts syntactically like a
word, but is reduced in form and
attached to another word.
Example: I've
 English has relatively simple inflectional
system.
 Only nouns, verbs and some adjectives can
be inflected.
 Nouns can have two kinds of inflection-an
affix that marks plural and an affix that
marks possessive.
 English nouns may appear in two forms:
Regular nouns
The plural is formed with -s after
most nouns.
Eg: cat → cats
But for words ending in -s, -z,
-sh, -ch and sometimes -x the
plural is formed with -es.
Eg:
ibis → ibises
waltz → waltzes
thrush → thrushes
finch → finches
box → boxes
Irregular nouns
Eg:
mouse → mice
ox → oxen
 Possessive suffix is realized by apostrophe +
-s for regular singular nouns(eg: llama’s)
and plural nouns not ending in –s
eg:children’s.
 For regular plural nouns and nouns ending in -s,
the possessive suffix is realized by a lone
apostrophe after the noun.
Eg: Euripides'
 English verbal inflection is more complicated
than nominal inflection. First, English has three
kinds of verbs; main verbs, (eat, sleep,
impeach), modal verbs (can, will, should), and
primary verbs (be, have, do) .
 we will mostly be concerned with the main and
primary verbs, because it is these that have
inflectional endings.
 Of these verbs, a large class are
regular, that is to say all verbs of this class
have the same endings marking the same
functions. These regular verbs (e.g. walk, or
inspect) have four morphological forms, as
follows:
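The four forms can be sketched as follows (illustrative Python; it ignores spelling changes such as consonant doubling or e-deletion):

def regular_forms(stem):
    return {"stem": stem,
            "-s form": stem + "s",
            "-ing participle": stem + "ing",
            "past / -ed participle": stem + "ed"}

print(regular_forms("walk"))     # walk, walks, walking, walked
print(regular_forms("inspect"))  # inspect, inspects, inspecting, inspected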
 These verbs are called regular because just by
knowing the stem we can predict the other
forms by adding one of three predictable
endings and making some regular spelling
changes.
 Irregular verbs are those that have some more
or less idiosyncratic forms of inflection.
Irregular verbs in English often have five
different forms, but can have as many as eight
(e.g., the verb be) or as few as three (e.g. cut or
hit).
 Note that an irregular verb can
inflect in the past form (also called the
preterite) by changing its vowel (eat/ate), or
its vowel and some consonants
(catch/caught), or with no change at all
(cut/cut).
 Derivation in English is quite complex.
 Common kind of derivation in English is the
formation of new nouns from verbs or
adjectives.
 The process is known as nominalization.
 Verbs ending in the suffix -ize can be converted to
nouns by adding the suffix -ation.
Eg: computerize → computerization
kill → killer (suffix -er)
fuzzy → fuzziness (suffix -ness)
 Adjectives can also be derived from verbs and
nouns.
 Computation (noun) + -al → computational
 Embrace (verb) + -able → embraceable
 Clue (noun) + -less → clueless
 Derivation is less productive than inflection.
 There are subtle and complex meaning differences
among nominalizing suffixes.
 Clitics can be of two types:
 Proclitics: clitics preceding a word
 Enclitics: clitics following a word
Full form → clitic: am → 'm, are → 're, is → 's, will → 'll,
have → 've, has → 's, had → 'd, would → 'd
 Clitics in English are ambiguous.
eg: she's can be either she is or she has.
 Except for few such ambiguities, correctly
segmenting clitics in English is simplified by
the presence of an apostrophe.
 Clitics can be harder to parse in other
languages.
 Morphological parsing results in a
stem plus assorted morphological features.
Example:
cats → cat +N +PL
cat → cat +N +SG
caught → catch +V +Past
 Lexicon: the list of stems and affixes, together
with basic information about them (like whether
the stem is a noun or a verb).
 Morphotactics: model of morpheme ordering that
explains which classes of morphemes can follow
other classes of morphemes inside a word.
 Orthographic rules: spelling rules used to model
the changes that occur in a word, usually when
two morphemes combine.
 Example: city + -s → cities (y → ie)
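A toy sketch (illustrative Python; the lexicon and rules are assumptions, not a real FST) showing how a lexicon, simple morphotactics, and one orthographic rule combine when parsing plural nouns:

lexicon = {"cat": "N", "city": "N", "fox": "N"}   # stem -> class

def parse_noun(surface):
    analyses = []
    if surface in lexicon:                         # bare stem
        analyses.append(surface + "+N+SG")
    if surface.endswith("ies"):                    # orthographic rule: city + -s -> cities
        stem = surface[:-3] + "y"
        if lexicon.get(stem) == "N":
            analyses.append(stem + "+N+PL")
    elif surface.endswith("es"):                   # e-insertion plural: fox + -s -> foxes
        stem = surface[:-2]
        if lexicon.get(stem) == "N":
            analyses.append(stem + "+N+PL")
    elif surface.endswith("s"):                    # regular plural: cat + -s -> cats
        stem = surface[:-1]
        if lexicon.get(stem) == "N":
            analyses.append(stem + "+N+PL")
    return analyses

print(parse_noun("cities"))   # ['city+N+PL']
print(parse_noun("cats"))     # ['cat+N+PL']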
 A lexicon is a repository for words.
 The simplest possible lexicon would consist
of an explicit list of every word of the
language.
 In finite state morphology a word can be
represented as a correspondence between
two levels.
Lexical level: represents a concatenation of morphemes making up a word.
Surface level: represents the concatenation of letters which make up the actual
spelling of the word.
 Irregular plurals like geese will parse into the
correct stem goose +N +Pl.
 We do this by allowing the lexicon to also
have two levels. Since surface geese maps to
lexical goose, the new lexical entry will be
“g:g o:e o:e s:s e:e”.
 Regular forms are simpler; the two-level
entry for fox will now be “f:f o:o x:x”, but by
relying on the orthographic convention that f
stands for f:f and so on, we can simply refer
to it as fox and the form for geese as “g o:e
o:e s e”.
 Since the output symbols include the
morpheme and word boundary markers ˆ and
#, the lower labels of above Figure do not
correspond exactly to the surface level. Hence
we refer to tapes with these morpheme
boundary markers as intermediate tapes.
 English often requires spelling changes at
morpheme boundaries, which we handle by introducing
spelling rules (or orthographic rules).
 the ability to implement rules as a transducer
turns out to be useful throughout speech and
language processing.
 FST Lexicons can be combined with
orthographic rules for parsing and generating.
 Lexical transducer maps between the lexical
level and an intermediate level.
 A host of transducers, each representing a
single spelling rule constraint, all run in
parallel to map between this intermediate level
and the surface level.
 The above architecture is a two level cascade
of transducers.
 Cascading two automata means running them
in series with the output of the first feeding
the input to the second.
 Cascades can be of arbitrary depth.
 Each level might be built out of many
transducers.
 Cascade can be run top-down to generate a
string.
 Cascade can be run bottom up to parse it.
 Parsing may be slightly more complicated than
generation.
 Ambiguity may arise in parsing.
 Example: foxes can also be a verb rather than a
noun.
So, after parsing there are two possibilities:
fox +N +PL
fox +V +3Sg
 The transducer will enumerate the possible
choices and transduce both.
 Local ambiguity may also arise during parsing.
 To handle this type of non-determinism, FST
parsing algorithms need to incorporate some
sort of search algorithm.
 Running a cascade can be made more
efficient by composing and intersecting the
transducers.
 Transducers in general are not closed
under intersection.
 Transducers between strings of equal length
are closed under intersection.
 The intersection of two transducers F and G
defines a relation R such that R(x,y) if and
only if F(x,y) and G(x,y).
 The intersection algorithm just takes the
Cartesian product of the
states, i.e., for each state qi in machine 1 and
state qj in machine 2, we create a new state
qij. Then for any input symbol a, if machine
1 would transition to state qn and machine 2
would transition to state qm, we transition to
state qnm.
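A sketch of this product construction for two deterministic automata (illustrative Python; the function and variable names are assumptions):

def intersect(delta1, start1, finals1, delta2, start2, finals2, alphabet):
    # States of the intersection are pairs (qi, qj); a pair is final only if
    # both components are final in their own machines.
    start = (start1, start2)
    delta, finals = {}, set()
    frontier, seen = [start], {start}
    while frontier:
        q1, q2 = frontier.pop()
        if q1 in finals1 and q2 in finals2:
            finals.add((q1, q2))
        for a in alphabet:
            if a in delta1.get(q1, {}) and a in delta2.get(q2, {}):
                nxt = (delta1[q1][a], delta2[q2][a])
                delta[((q1, q2), a)] = nxt      # the combined transition
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return start, delta, finals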
 One of the most widely used such stemming
algorithms is the simple and efficient Porter
(1980) algorithm, which is based on a series
of simple cascaded rewrite rules.
 Since cascaded rewrite rules are just the sort
of thing that could be easily implemented as
an FST, we think of the Porter algorithm as a
lexicon-free FST stemmer.
 Example Rules:
 ATIONAL → ATE (e.g., relational → relate)
 ING → ε if the stem contains a vowel (e.g.,
motoring → motor)
 SSES → SS (e.g., grasses → grass)
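A quick hedged example using NLTK's implementation of the Porter stemmer (assumes nltk is installed; the outputs are whatever the stemmer produces):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["motoring", "grasses", "relational"]:
    # Applies cascaded rules such as ING -> "" and SSES -> SS.
    print(word, "->", stemmer.stem(word))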
 stemming tends to somewhat improve the
performance of information retrieval,
especially with smaller documents (the larger
the document, the higher the chance the
keyword will occur in the exact form used in
the query).
 Nonetheless, not all IR engines use
stemming, partly because of stemmer errors
such as shown in the following.
 Psycholinguistic studies investigate how multi-
morphemic words are represented in the
minds of speakers of English.
 Example: walk, walked, walks
Are all three words listed in the human
lexicon, or only walk along with -ed and -s?
 Two hypotheses have been proposed about how
the lexicon is represented in the human
mind.
 Full Listing Hypothesis: all words of a
language are listed in the mental lexicon
without any internal morphological structure.
 Minimum redundancy Hypothesis: suggests
that only the constituent morphemes are
represented in the lexicon, and when
processing walks, we must access both
morphemes (walk and -s) and combine them.
 Some of the earliest evidence that the human
lexicon represents at least some morphological
structure comes from speech errors, also called
slips of the tongue.
 inflectional and derivational affixes can appear
separately from their stems. The ability of these
affixes to be produced separately from their
stem suggests that the mental lexicon contains
some representation of morphological structure.
 Example:
◦ it’s not only us who have screw looses (for “screws
loose”)
◦ words of rule formation (for “rules of word formation”)
◦ easy enoughly (for “easily enough”)
 More recent experimental evidence suggests
that neither the full listing nor the minimum
redundancy hypotheses may be completely
true. Instead, it’s possible that some but not all
morphological relationships are mentally
represented.
 This has been studied using repetition priming experiments.
Findings:
A word is recognized faster if it has been seen before (if it is primed).
Spoken derived words can prime their stems, but only if the meaning
of the derived form is closely related to the stem.
Example: government primes govern, but
department does not prime depart.
 Early results suggest that (at least) productive
morphology like inflection does play an online
role in the human lexicon.
 Studies have also shown that words with a
larger morphological family size are recognized
faster. (Morphological family size: the number of other multi-
morphemic words and compounds in which a word appears.)
Example: the family for fear includes
fearful, fearfully, fearfulness, fearless, fearlessly,
fearlessness, fearsome, and God-fearing, for a total size
of 9.
 Recent work has further shown that word
recognition speed is affected by the total
amount of information (or entropy) contained
in the morphological paradigm.
Entropy is a measure of information. It can be used as a metric
for how much information there is in a particular
grammar, or for how well a given grammar matches a given
language, etc.
End of Module I
WORD CLASSES
&
PART OF SPEECH TAGGING
(Module II)
 Parts of speech can be divided into two broad
categories:
◦ Open classes: those that do not have fixed membership
◦ Closed classes: those that have relatively fixed
membership
 prepositions are a closed class because there
is a fixed set of them in English; new
prepositions are rarely coined.
 nouns and verbs are open classes because
new nouns and verbs are continually coined
or borrowed from other languages.
 Closed class words are also generally function
words like of, it, and, or you, which tend to
be very short, occur frequently, and often
have structuring uses in grammar.
 There are four major open classes that occur
in the languages of the world; nouns, verbs,
adjectives, and adverbs. It turns out that
English has all four of these, although not
every language does.
 Divided into two types:
 Proper nouns(names of specific people or entities)
 In written English, proper nouns are capitalized.
 Example: John, IBM
 Common nouns
Divided into : count nouns
mass nouns
 Count nouns are those that allow
grammatical enumeration; that is, they can
occur in both the singular and plural
(goat/goats, relationship/relationships) and
they can be counted (one goat, two goats).
 Mass nouns are used when something is
conceptualized as a homogeneous group. So
words like snow, salt, and communism are
not counted.
 The verb class includes most of the words
referring to actions and processes,
including main verbs like draw, provide,
differ, and go.
 Verbs have a number of morphological
forms, including the progressive (e.g., eating), the past
participle (e.g., eaten), etc.
 semantically this class includes many terms
that describe properties or qualities.
 Most languages have adjectives for the
concepts of color (white, black), age (old,
young), and value (good, bad), but there are
languages without adjectives.
 eg: Korean
 Adverbs are rather a hodge-podge (heterogeneous mixture),
both semantically and formally.
 An adverb is a word that describes (modifies, as
grammarians put it) a verb, an adjective or
another adverb. Adverbs mostly tell us more
about the action.
 Example: He quickly runs
 She slowly walks
 Directional adverbs or locative adverbs
(home, here, downhill) specify the direction
or location of some action;
 degree adverbs (extremely, very, somewhat)
specify the extent of some action, process, or
property.
 manner adverbs (slowly, delicately) describe
the manner of some action or process
 temporal adverb describe the time that some
action or event took place (yesterday,
Monday).
 Closed classes differ more from language to language
than open classes do.
 important closed classes in English, with
a few examples of each:
• prepositions: on, under, over, near, by, at,
from, to, with
• determiners: a, an, the
• pronouns: she, who, I, others
• conjunctions: and, but, or, as, if, when
• auxiliary verbs: can, may, should, are
• particles: up, down, on, off, in, out, at, by,
• numerals: one, two, three, first, second, third
 There are various tagsets for English, many of which evolved
from the 87-tag tagset used for the Brown corpus.
 The Brown corpus is a 1 million word collection of
samples from 500 written texts from different genres
(newspaper, novels, non-fiction, academic, etc.)
 This corpus was tagged with parts-of-speech by first
applying the TAGGIT program and then hand-
correcting the tags.
 Brown tagset
 Small 45-tag Penn Treebank tagset
 Medium-sized 61-tag C5 tagset
 The Penn Treebank tagset is one of the most widely
used.
 It has been used to tag the Brown corpus, the Wall Street Journal
corpus, the Switchboard corpus and many others.
 The following will show the complete tagset
for Penn Treebank.
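A minimal hedged example of Penn Treebank-style tags via NLTK (assuming nltk and its default tokenizer and tagger models are installed):

import nltk

tokens = nltk.word_tokenize("Book that flight to Denver")
print(nltk.pos_tag(tokens))   # list of (word, Penn Treebank tag) pairs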
 Tag Indeterminacy and tokenization
 Unknown Words
 Part of Speech Tagging for other Languages
 Combining taggers
 Tag indeterminacy arises when a word is
ambiguous between multiple tags and it is
impossible or very difficult to disambiguate.
 Some taggers allow the use of multiple tags.
 Example: Penn treebank, BNC
 Common tag indeterminacies include
adjective versus preterite versus past
participle (JJ/VBD/VBN).
 There are 3 ways to deal with tag
indeterminacy:
 Somehow replace the indeterminate tags
with only one tag.
 In testing, count a tagger as having
correctly tagged an indeterminate token if it
gives either of the correct tags. In training,
somehow choose only one of the tags
for the word.
 Treat the indeterminate tag as a single
complex tag.
 The second approach is the most sensible.
 The third approach is widely used, but it requires
more tags, which increases the size of
tagsets such as the Penn Treebank and BNC tagsets.
 One issue is differentiating periods (.) used as
sentence-final markers from word-internal
periods (eg: etc., B.Tech).
 A second issue is word splitting
(example: children's, wouldn't).
 the special Treebank tag POS is used only for
the morpheme ’s which must be segmented
off during tokenization.
 Another tokenization issue concerns multi-part
words.
 The Treebank tagset assumes that tokenization
of words like New York is done at whitespace.
The phrase a New York City firm is tagged in
Treebank notation as five separate words:
a/DT New/NNP York/NNP City/NNP
firm/NN.
 The C5 tagset, by contrast, allows prepositions
like "in terms of" to be treated as a single word
by adding numbers to each tag, as in in/II31
terms/II32 of/II33.
 All the tagging algorithms we have discussed require
a dictionary that lists the possible parts-of-speech of
every word. But the largest dictionary will still not
contain every possible word. Proper names and
acronyms are created very often, and even new
common nouns and verbs enter the language at a
surprising rate.
 Therefore, in order to build a complete tagger, we
cannot always use a dictionary to give us P(wi | ti). We
need some method for guessing the tag of an
unknown word.
 The simplest possible unknown-word
algorithm is to pretend that each unknown
word is ambiguous among all possible tags,
with equal probability.
 A slightly more complex algorithm is based
on the idea that the probability distribution of
tags over unknown words is very similar to
the distribution of tags over words that
occurred only once in the training set.
 These words that only occur once are known
as hapax legomena (singular hapax
legomenon).
 Most unknown word algorithms, however, make use
of a much more powerful source of information: the
morphology of the words.
 For example, words that end in -s are likely to be
plural nouns (NNS), words ending with -ed tend to be
past participles (VBN), words ending with able tend
to be adjectives (JJ), and so on.
 Even if we’ve never seen a word, we can use facts
about its morphological form to guess its part-of-
speech.
 Besides morphological knowledge,
orthographic information can be very helpful.
 For example words starting with capital letters
are likely to be proper nouns (NP).
 The presence of a hyphen is also a useful
feature; hyphenated words in the Treebank
version of Brown are most likely to be
adjectives (JJ).
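A toy sketch of such heuristics (illustrative Python; the suffix/tag pairs below are rough rules of thumb, not a trained model):

def guess_tag(word):
    if word[0].isupper():
        return "NNP"        # capitalized words are likely proper nouns
    if "-" in word:
        return "JJ"         # hyphenated words are often adjectives
    if word.endswith("able"):
        return "JJ"
    if word.endswith("ed"):
        return "VBN"        # -ed words tend to be past participles
    if word.endswith("s"):
        return "NNS"        # -s words are likely plural nouns
    return "NN"             # default guess

for w in ["Bridgestone", "well-weathered", "frobnicated", "glorps"]:
    print(w, guess_tag(w))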
 Part-of-speech tagging algorithms can also be
applied to other languages without major
modifications.
 But a number of augmentations and changes
become necessary when dealing with highly
inflected or agglutinative languages.
 One problem with these languages is simply
the large number of words, when compared
to English.
 Agglutinative languages like Malayalam are
those in which words contain long strings of
morphemes,
 where each morpheme has relatively few
surface forms, and so it is often possible to
clearly see the morphemes in the surface
text.
 For these languages, a corpus may
contain a large number of word forms derived from one
particular root.
 The large vocabulary size seems to cause a
significant degradation in tagging performance
when the HMM algorithm is applied directly
to agglutinative languages.
 One difficulty in tagging highly inflected and
agglutinative languages is the tagging of unknown
words.
 A second issue with such languages is the vast
amount of information that is coded in the
morphology of the word.
 For this reason, tagsets for agglutinative and
highly inflectional languages are usually
much larger than the 50-100 tags we have
seen for English.
 Various part of speech tagging algorithms
can be combined.
 The most common approach to tagger
combination is to run multiple taggers in
parallel on the same sentence, and then
combine their output, either by voting or by
training another classifier to choose which
tagger to trust in a given context.
 Another option is to combine taggers in series,
for example using a rule-based approach to
remove some of the impossible tag
possibilities for each word, and then an HMM
tagger to choose the best sequence from the
remaining tags.
 The fundamental idea of constituency is that
groups of words may behave as a single unit
or phrase, called a constituent.
 For example we will see that a group of words
called a noun phrase often acts as a unit;
noun phrases include single words like she or
Michael and phrases like the house, Russian
Hill, and a well-weathered three-story
structure.
 Noun phrases can occur before verbs.
 Other kinds of evidence for constituency
come from what are called preposed or
postposed constructions. For example, the
prepositional phrase on September
seventeenth can be placed in a number of
different locations in the following examples,
including preposed at the beginning, and
postposed at the end.
 On September seventeenth, I’d like to fly from
Atlanta to Denver
 I’d like to fly on September seventeenth from
Atlanta to Denver
 I’d like to fly from Atlanta to Denver on
September seventeenth
 But again, while the entire phrase can be
placed differently, the individual words
making up the phrase cannot be:
 *On September, I’d like to fly seventeenth
from Atlanta to Denver
 *On I’d like to fly September seventeenth
from Atlanta to Denver
 *I’d like to fly on September from Atlanta to
Denver seventeenth
 The most commonly used mathematical
system for modelling constituent structure in
English and other natural languages is the
Context-Free Grammar, or CFG.
Context free grammars are also called
Phrase-Structure Grammars, and the
formalism is equivalent to what is also called
Backus-Naur Form or BNF.
 A context-free grammar consists of a set of
rules or productions, each of which expresses
the ways that symbols of the language can be
grouped and ordered together, and a lexicon
of words and symbols.
 Example:
NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun
 Context-free rules can be hierarchically
embedded, so we can combine the previous
rules with others like the following which
express facts about the lexicon:
Det → a
Det → the
Noun → flight
 Two classes:
 Terminals: The symbols that correspond to
words in the language (“the”, “nightclub”) are
called terminal symbols; the lexicon is the set
of rules that introduce these terminal
symbols.
 Nonterminals: The symbols that express
clusters or generalizations of these are called
non-terminals.
 In each context free rule, the item to the right
of the arrow (→) is an ordered list of one or
more terminals and non-terminals, while to
the left of the arrow is a single non-terminal
symbol expressing some cluster or
generalization.
 A CFG can be thought of in two ways: as a
device for generating sentences, and as a
device for assigning a structure to a given
sentence.
 A sequence of rule expansions is called a
derivation of the string of words.
 Example:
E → E+T | F
T → T*F | F
F → id
 For the string a+b*c:
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id
⇒ F+id*id ⇒ id+id*id (i.e., a+b*c)
 It is common to represent a derivation by a
parse tree.
 a flight
 Starting from NP:
NP ⇒ Det Nominal ⇒ Det Noun ⇒ a flight
 The parse tree can be represented as follows:
 S → NP VP
 VP → Verb NP
 VP → Verb NP PP
 VP → Verb PP
 PP → Preposition NP
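These rules can be tried out directly; a small sketch using NLTK's CFG support (the lexical entries such as booked, John and Denver are added for illustration):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal | ProperNoun
Nominal -> Noun | Nominal Noun
VP -> Verb NP | Verb NP PP | Verb PP
PP -> Preposition NP
Det -> 'a' | 'the'
Noun -> 'flight' | 'morning'
ProperNoun -> 'John' | 'Denver'
Verb -> 'booked'
Preposition -> 'to'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John booked a flight to Denver".split()):
    print(tree)   # prints the parse tree assigned by the grammar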
Parsing with CFG
 Syntactic parsing
◦ The task of recognizing a sentence and assigning a
syntactic structure to it
 Since CFGs are a declarative formalism, they
do not specify how the parse tree for a given
sentence should be computed.
 Parse trees are useful in applications such as
◦ Grammar checking
◦ Semantic analysis
◦ Machine translation
◦ Question answering
◦ Information extraction
 The parser can be viewed as searching through
the space of all possible parse trees to find the
correct parse tree for the sentence.
 How can we use the grammar to produce the
parse tree?
 Top-down parsing
 Bottom-up parsing
 Comparisons
◦ The top-down strategy never wastes time exploring
trees that cannot result in an S.
◦ In the bottom-up strategy, by contrast, trees that
have no hope of leading to an S, or of fitting in with
any of their neighbors, are generated with wild
abandon.
 The left branch of Fig. 10.4 is completely wasted effort.
◦ The top-down strategy, however, spends considerable effort on S trees
that are not consistent with the input.
 The first four of the six trees in Fig. 10.3 cannot match
the word book.
 Use depth-first strategy
 A top-down, depth-first, left-to-right derivation
 Adding bottom-up filtering
 Left-corner notion
◦ For nonterminals A and B, B is a left-corner of A if the following
relation holds: A ⇒* Bα
 The three S-rules are:
S → NP VP
S → Aux NP VP
S → VP
 Using the left-corner notion, it is easy to see that only the S →
Aux NP VP rule is a viable candidate, since the word Does cannot
serve as the left-corner of the other two S-rules.
Category (left corners):
S: Det, Proper-Noun, Aux, Verb
NP: Det, Proper-Noun
Nominal: Noun
VP: Verb
 Problems with the top-down parser
◦ Left-recursion
◦ Ambiguity
◦ Inefficient reparsing of subtrees
 Then, introducing the Earley algorithm
 Exploring an infinite search space when left-recursive
grammars are used.
 A grammar is left-recursive if it contains at least one
non-terminal A such that A ⇒* αAβ, for some α and β where α ⇒* ε.
Left-recursive rules:
NP → Det Nominal
Det → NP 's
NP → NP PP
VP → VP PP
S → S and S
 Two reasonable methods for dealing with left-
recursion in a backtracking top-down parser:
◦ Rewriting the grammar
◦ Explicitly managing the depth of the search during
parsing
 Rewrite each left-recursive rule
A → A β | α
as
A → α A'
A' → β A' | ε
 Common structural ambiguity
◦ Attachment ambiguity
◦ Coordination ambiguity
◦ NP bracketing ambiguity
 Example of PP attachment
 The gerundive-VP flying to Paris can be
◦ part of a gerundive sentence, or
◦ an adjunct modifying the VP
We saw the Eiffel Tower flying to Paris.
 The sentence “Can you book TWA flights” is ambiguous
◦ “Can you book flights on behalf of TWA”
◦ “Can you book flights run by TWA”
 Coordination ambiguity
◦ Different set of phrases that can be conjoined by a
conjunction like and.
◦ For example old men and women can be
 [old [men and women]] or [old men] and [women]
 Parsing sentence thus requires
disambiguation:
◦ Choosing the correct parse from a multitude of
possible parses
◦ Requiring both statistical (Ch 12) and semantic
knowledge (Ch 17)
 Parsers which do not incorporate disambiguators
may simply return all the possible parse trees for
a given input.
 We do not want all possible parses from the
robust, highly ambiguous, wide-coverage
grammars used in practical applications.
 Reason:
◦ Potentially exponential number of parses that are
possible for certain inputs
◦ Given the ATIS example:
 Show me the meal on Flight UA 386 from San Francisco to
Denver.
◦ The three PP’s at the end of this sentence yield a total of
14 parse trees for this sentence.
 The parser often builds valid
parse trees for portions of the
input, then discards them
during backtracking, only to
find that it has to rebuild them
again.
(Figure: while parsing a flight from Indianapolis to Houston on TWA,
the constituent a flight is built 4 times, from Indianapolis 3 times,
to Houston 2 times, on TWA once, a flight from Indianapolis 3 times,
a flight from Indianapolis to Houston 2 times, and
a flight from Indianapolis to Houston on TWA once.)
 Solving three kinds of problems afflicting
standard bottom-up or top-down parsers
 Dynamic programming providing a framework for
solving this problem
◦ Systematically fill in tables of solutions to sub-problems.
◦ When complete, the tables contain solution to all sub-
problems needed to solve the problem as a whole.
◦ Reducing an exponential-time problem to a polynomial-
time one by eliminating the repetitive solution of sub-
problems inherent in backtracking approaches
◦ O(N^3), where N is the number of words in the input
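The Earley algorithm itself is not reproduced here; as a hedged illustration of the dynamic-programming idea, a minimal CKY-style recognizer sketch (assuming a toy grammar already in Chomsky normal form; the grammar and lexicon below are illustrative):

from itertools import product

binary_rules = {                      # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "Nominal"): {"NP"},
    ("Verb", "NP"): {"VP"},
}
lexical_rules = {                     # word -> possible categories
    "John": {"NP"}, "booked": {"Verb"}, "that": {"Det"}, "flight": {"Nominal"},
}

def cky_recognize(words, start="S"):
    n = len(words)
    # table[i][j] holds the non-terminals that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical_rules.get(w, set()))
    for span in range(2, n + 1):              # width of the span
        for i in range(0, n - span + 1):      # start of the span
            j = i + span
            for k in range(i + 1, j):         # split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary_rules.get((B, C), set())
    return start in table[0][n]

print(cky_recognize("John booked that flight".split()))   # True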
 (10.7) Book that flight
S → • VP, [0,0]
NP → Det • Nominal, [1,2]
VP → V NP •, [0,3]
Sequence of states created in the
chart while parsing Book that
flight, including structural
information
 Partial parsing or shallow parsing
◦ Some language processing tasks do not require complete parses.
◦ E.g., information extraction algorithms generally do not extract all the
possible information in a text; they simply extract enough to fill out
some sort of template of required data.
 Many partial parsing systems use cascades of finite-state
automata instead of CFGs.
◦ Use FSA to recognize basic phrases, such as noun groups, verb
groups, locations, etc.
◦ FASTUS of SRI
Company Name: Bridgestone Sports Co.
Verb Group: said
Noun Group: Friday
Noun Group: it
Verb Group: had set up
Noun Group: a joint venture
Preposition: in
Location: Taiwan
Preposition: with
Noun Group: a local concern
Conjunction: and
Noun Group: a Japanese trading house
Verb Group: to produce
Noun Group: golf clubs
Verb Group: to be shipped
Preposition: to
Location: Japan
 Detection of noun groups
NG → Pronoun | Time-NP | Date-NP
(e.g., she, him, them, yesterday)
NG → (DETP) (Adjs) HdNns | DETP Ving HdNns
(e.g., the quick and dirty solution,
the frustrating mathematics problem,
the rising index)
DETP → DETP-CP | DETP-INCP
DETP-CP → …
DETP-INCP → …
Adjs → AdjP …
AdjP → …
HdNns → HdNn …
HdNn → PropN | PreNs …
PreNs → PreN …
PreN → …
Parsing with Unification-Based Grammars
 Feature Structures and Unification
 Unification-Based Grammars
 Chart Parsing with Unification-Based Grammars
 Type Hierarchies
 We had a problem adding agreement to CFGs.
What we needed were features, e.g., a way to say:
◦ [number sg
person 3 ]
 A structure like this allows us to state properties,
e.g., about a noun phrase
◦ [cat NP
number sg
person 3 ]
 Each feature (e.g., ‘number’) is paired with a
value (e.g., ‘sg’)
◦ A bundle of feature-value pairs can be put into an
attribute-value matrix (AVM)
 Values can be atomic (e.g. ‘sg’ or ‘NP’ or ‘3’), or
can be complex, and thus we can define feature
paths
[cat NP
agreement [number sg
person 3]]
 The value of the path [agreement number] is ‘sg’
 A grammar with only atomic feature values can
be converted to a CFG.
◦ e.g., the AVM on the previous page corresponds to a category like NP_3,sg
◦ However, when the values are complex, it is more
expressive than a CFG and can represent more linguistic
phenomena
 Feature structures embedded in feature
structures can share the same values
 That is, two features have the exact same value—
they share precisely the same object as their
value
◦ we’ll indicate this with a tag like *1
[cat S
head [agr *1[num sg
per 3]
subj [agr *1 ]]]
 In this example, the agreement features of both
the matrix sentence and the embedded subject
are identical
 This is referred to as reentrancy
 Technically, feature structures are directed
acyclic graphs (DAGs)
 So, the feature structure represented by the
attribute-value matrix (AVM):
[cat NP
agreement [number sg
person 3]]
 is really the graph:
CAT
AGR



NUM
PER


sg
3
np
 Unification (U) = a basic operation to merge
two feature structures into a resultant feature
structure (FS)
 The two feature structures must be
compatible, i.e., have no values that conflict
 Identical FSs:
◦ [number sg] U [number sg] = [number sg]
 Conflicting FSs:
◦ [number sg] U [number pl] = Fail
 Merging with an unspecified FS:
◦ [number sg] U [number []] = [number sg]
 Merging FSs with different features specified:
◦ [number sg] U [person 3] = [number sg
person 3]
 More examples:
◦ [cat NP] U [agreement [number sg]] =
[cat NP, agreement [number sg]]
◦ [agr [num sg], subj [agr [num sg]]] U [subj [agr [num sg]]] =
[agr [num sg], subj [agr [num sg]]]
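NLTK provides a feature-structure implementation that can be used to try these unifications (a hedged sketch; assumes nltk is installed):

from nltk.featstruct import FeatStruct

fs1 = FeatStruct(number='sg')
fs2 = FeatStruct(person=3)
print(fs1.unify(fs2))                        # merged structure with both features
print(fs1.unify(FeatStruct(number='pl')))    # None: conflicting values, unification fails

# Reentrancy (structure sharing) written with a shared tag:
fs3 = FeatStruct("[agr=(1)[num='sg'], subj=[agr->(1)]]")
print(fs3)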
 Remember that structure-sharing means they are the
same object:
[agr *1 [num sg, per 3], subj [agr *1]] U [subj [agr [per 3, num sg]]]
= [agr *1 [num sg, per 3], subj [agr *1]]
 When unification takes place, shared values are copied
over:
[agr *1, subj [agr *1]] U [subj [agr [per 3, num sg]]]
= [agr *1, subj [agr *1 [per 3, num sg]]]
 And remember that having similar values is not
the same as structure-sharing:
[agr [num sg], subj [agr [num sg]]] U [subj [agr [per 3, num sg]]]
= [agr [num sg], subj [agr [per 3, num sg]]]
 With structure-sharing, you have to make sure
the values are compatible everywhere that
structure-sharing is specified:
[agr *1 [num sg, per 3], subj [agr *1]] U
[agr [num sg, per 3], subj [agr [num pl, per 3]]] = Fail
 We can see that a more general feature structure
(less values specified) subsumes a more specific
feature structure
(1) [num sg]
(2) [per 3]
(3) [num sg
per 3]
 So, we have the following subsumption relations,
where
◦ (1) subsumes (3)
◦ (2) subsumes (3)
◦ (1) does not subsume (2), and (2) does not subsume (1)
 Syntactic constraints are difficult to express
using context free grammars alone.
 Feature structures and unification are the way
to elegantly express the syntactic constraints.
 The better way is to integrate feature
structures and unification operations into the
specification of a grammar.
 This can be accomplished by augmenting the
rules of ordinary context-free grammars with
attachments that specify feature structures for
the constituents of the rules, along with
appropriate unification operations that express
constraints on those constituents.
 to associate complex feature structures with
both lexical items and instances of
grammatical categories.
 to guide the composition of feature structures
for larger grammatical constituents based on
the feature structures of their component parts.
 to enforce compatibility constraints between
specified parts of grammatical constructions.
 β0 → β1 · · · βn
{set of constraints}
The specified constraints have one of the
following forms:
<βi feature path> = Atomic value
<βi feature path> = <βj feature path>
 The notation <βi feature path> denotes a feature
path through the feature structure associated
with the βi component of the context-free part
of the rule.
 The first style of constraint specifies that the
value found at the end of the given path must
unify with the specified atomic value. The
second form specifies that the values found at
the end of the two given paths must be
unifiable.
 For example, the rule
S → NP VP
can be augmented with an attachment of the
feature structure for number agreement as
follows:
S → NP VP
<NP number> = <VP number>
 If there are two or more constituents of the
same syntactic category in a rule, we will
subscript the constituents to keep them
straight, as in
VP→V NP1 NP2.
 in this approach the simple generative nature
of context-free rules has been fundamentally
changed by this augmentation.
 Agreement
 Grammatical Heads
 Sub categorization
 Long distance dependencies
 We discuss how unification can be used to
capture two types of English agreement
phenomena:
 Subject-verb agreement
 Determiner-nominal agreement
 Look at the following sentences:
 Does this flight serve breakfast?
 Do these flights serve breakfast?
 In these questions, the subject NP must agree with
the auxiliary verb rather than the main verb
of the sentence. This agreement constraint
can be handled by the following rule:
S → Aux NP VP
<Aux AGREEMENT> = <NP AGREEMENT>
 Handled in a similar fashion.
 Constraints can be enforced with grammar
rules as follows:
NPDet Nominal
<Det Agreement>=<Nominal Agreement>
<NP Agreement>=<Nominal Agreement>
 This rule states that the AGREEMENT feature of the
Det must unify with the AGREEMENT feature of the
Nominal, and moreover, that the AGREEMENT
feature of the NP must also unify with the Nominal.
 The simpler lexical constituents, Aux and Det, receive
values for their respective agreement features directly
from the lexicon as in the following rules.
 Returning to the first S rule, let us first consider the
AGREEMENT feature for the VP constituent. The
constituent structure for this VP is specified by the
following rule:
VP → Verb NP
 It seems clear that the agreement constraint for this
constituent must be based on its constituent verb.
This verb, as with the previous lexical entries, can
acquire its agreement feature values directly from
lexicon as in the following rules.
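A hedged sketch of these agreement constraints using NLTK's feature-grammar notation (an illustration; the lexical entries below are assumed, and the notation is NLTK's rather than the one used in the slides):

import nltk

fg = nltk.grammar.FeatureGrammar.fromstring("""
% start S
S -> NP[AGR=?a] VP[AGR=?a]
NP[AGR=?a] -> Det[AGR=?a] Nominal[AGR=?a]
Nominal[AGR=?a] -> N[AGR=?a]
VP[AGR=?a] -> V[AGR=?a]
Det[AGR=[NUM='sg']] -> 'this'
Det[AGR=[NUM='pl']] -> 'these'
N[AGR=[NUM='sg']] -> 'flight'
N[AGR=[NUM='pl']] -> 'flights'
V[AGR=[NUM='sg']] -> 'serves'
V[AGR=[NUM='pl']] -> 'serve'
""")

parser = nltk.parse.FeatureChartParser(fg)
print(len(list(parser.parse("this flight serves".split()))))    # 1 parse
print(len(list(parser.parse("these flight serves".split()))))   # 0 parses: agreement fails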
 The features for most grammatical categories
are copied from one of the children to the
parent.
 The child that provides the features is called
the head of the phrase.
 features copied are referred to as head
features.
VP  Verb NP
<VP Agreement>=<Verb Agreement>
 verb is the head of the verb phrase.
 The constituent providing the agreement feature
structure to its parent is the head of the phrase.
 So we can say the agreement feature structure is a
head feature.
 We can rewrite our rules to reflect these
generalizations by placing the agreement
feature structure under a HEAD feature.
VP  Verb NP
<VP Agreement>=<Verb Agreement>
It becomes,
VP  Verb NP
<VP HEAD>=<Verb HEAD>
 Traditional grammar distinguishes between
transitive verbs (which take a direct object NP) and
intransitive verbs (which don't).
 Traditional Grammars subcategorize verbs
into these two categories.
 Modern grammars distinguish as many as
100 subcategories.
 Feature structures are introduced to
distinguish among various members of the
verb category.
 Can accomplish this goal by associating with
each of the verbs in the lexicon an atomic
feature called SUBCAT, with an appropriate
value.
 The transitive version of serves could be assigned the
following feature structure in the lexicon:
Verb → serves
<Verb HEAD AGREEMENT NUMBER> = sg
<Verb HEAD SUBCAT> = trans
 SUBCAT feature signals to the rest of the grammar
that this verb should only appear in verb phrases with
a single noun phrase argument.
 The model of sub categorization has two
components.
 Each head word has a SUBCAT feature which
contains a list of the complements it expects.
 Then phrasal rules like the VP rule match up
each expected complement in the SUBCAT list
with an actual constituent.
 This mechanism works fine when the
complements of a verb are in fact to be found
in the verb phrase.
 Sometimes, however, a constituent
subcategorized for by the verb is not locally
instantiated, but stands in a long-distance
relationship with its predicate.
 Example sentences:
 What cities does Continental service?
 What flights do you have from Boston
to Baltimore?
 What time does that flight leave
Atlanta?
 First example, the constituent what cities is
subcategorized for by the verb service, but
because the sentence is an example of a wh-
non-subject question, the object is located at
the front of the sentence.
 phrase-structure rule for a wh-non-subject-
question is something like the following
 S → Wh-NP Aux NP VP
 we can augment this phrase-structure rule to
require the Aux and the NP to agree (since the
NP is the subject).
 But we also need some way to augment the
rule to tell it that the Wh-NP should fill some
sub categorization slot in the VP.
 The representation of such long-distance
dependencies is a quite difficult problem,
because the verb whose sub categorization
requirement is being filled can be quite distant
from the filler
 One solution to long-distance
dependencies is the gap list,
 implemented as a feature GAP, which is
passed up from phrase to phrase in the parse
tree. The filler (for example, which flight
above) is put on the gap list, and must
eventually be unified with the sub-
categorization frame of some verb.
 The unification operator takes two feature
structures as input and returns a single merged
feature structure if successful, or a failure
signal if the two inputs are not compatible.
 The input feature structures are represented as
directed acyclic graphs (DAGs), where
features are depicted as labels on directed
edges, and feature values are either atomic
symbols or DAGs.
 A notable aspect of this algorithm is that rather
than constructing a new feature structure with
the unified information from the two
arguments, it destructively alters the
arguments so that in the end they point to
exactly the same information.
 destructive nature of this algorithm
necessitates certain minor extensions to the
simple graph version of feature structures as
DAGs.
 Each feature structure consists of two fields:
1. Content field: may be null or contain an
ordinary feature structure.
2. Pointer field: may be null or contain a
pointer to another feature structure.
 If the pointer field of the DAG is null, then the
content field of the DAG contains the actual
feature structure to be processed.
 If the pointer field is non-null, then the
destination of the pointer represents the actual
feature structure to be processed.
 The merger aspects of unification will be
achieved by altering the pointer fields of
DAGs during processing.
 From the figure, it can be seen that the first argument now
contains all the correct information, but the second one
does not; it lacks a NUMBER feature.
 We could, of course, add a NUMBER feature to this
argument with a pointer to the appropriate place in
the first one.
 This change would result in the two arguments
having all the correct information from this
unification.
 Unfortunately, this solution is inadequate since it
does not meet our requirement that the two arguments
be truly unified.
 The solution to this problem is to simply set
the POINTER field of the second argument to
point at the first one. When this is done any
future change to either argument will be
immediately reflected in both.
 The following DAG will result after
unification.
 The first step in this algorithm is to acquire the
true contents of both of the arguments.
 Recall that if the pointer field of an extended
feature structure is non null, then the real
content of that structure is found by following
the pointer found in pointer field.
 The variables f1 and f2 are the result of this
pointer following process, often referred to as
dereferencing.
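A compact sketch of this destructive, pointer-based unification (illustrative Python; the class and function names are assumptions, not from a library):

class DagNode:
    def __init__(self, content=None):
        self.content = content    # None, an atomic symbol, or a dict of feature -> DagNode
        self.pointer = None       # forwarding pointer set during unification

def dereference(node):
    # Follow pointer fields to reach the real content of a feature structure.
    while node.pointer is not None:
        node = node.pointer
    return node

def unify(f1, f2):
    f1, f2 = dereference(f1), dereference(f2)
    if f1 is f2:
        return f1
    if f1.content is None:                    # unspecified: just point it at the other
        f1.pointer = f2
        return f2
    if f2.content is None:
        f2.pointer = f1
        return f1
    if not isinstance(f1.content, dict) or not isinstance(f2.content, dict):
        if f1.content == f2.content:          # atomic values must match exactly
            f2.pointer = f1
            return f1
        return None                           # failure
    # Both complex: unify shared features, copy the rest, then link f2 to f1
    # so that any future change is reflected in both arguments.
    for feat, val in f2.content.items():
        if feat in f1.content:
            if unify(f1.content[feat], val) is None:
                return None
        else:
            f1.content[feat] = val
    f2.pointer = f1
    return f1

# Example: [agreement [number sg]] U [agreement [person 3]]
a = DagNode({"agreement": DagNode({"number": DagNode("sg")})})
b = DagNode({"agreement": DagNode({"person": DagNode("3")})})
print(unify(a, b) is not None)   # True: the structures merge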
 The basic feature structures have two problems.
 The first problem is that there is no way to place a
constraint on what can be the value of a
feature (e.g., that the value of NUMBER can only be sg or pl).
 The second problem with simple feature structures is
that there is no way to capture generalizations across
them.
 A general solution to both of these problems is the
use of types.
 Types come in two kinds: simple and complex.
 A simple type is an atomic symbol like sg or pl.
 All types are organized into a multiple-inheritance
type hierarchy (a kind of partial order called a lattice).
 Complex types are also part of the type
hierarchy.
 Subtypes of complex types inherit all the
features of their parents, together with the
constraints on their values.
 The simplest augmentation of the context-free grammar.
 Also known as a stochastic context-free grammar (SCFG).
 A context-free grammar G is defined by four parameters (N,
Σ, R, S).
 A probabilistic context-free grammar augments each rule in
R with a conditional probability.
 A PCFG differs from a standard CFG by
augmenting each rule in P with a conditional
probability:
A → β [p]
 Here p expresses the probability that the given
non-terminal A will be expanded to the
sequence β. That is, p is the conditional
probability of the expansion β given the
left-hand-side (LHS) non-terminal A,
 i.e., P(β | A)
[Figure omitted: example PCFG grammar with rule probabilities]
 The sum of the probabilities of all possible
expansions of a non-terminal must be one.
 In the example grammar above, the probabilities of
the expansions of S sum to 1 (0.80 + 0.15 + 0.05).
 A PCFG is said to be consistent if the sum of
the probabilities of all sentences in the
language equals 1.
 poor independence assumptions: CFG rules
impose an independence assumption on
probabilities, resulting in poor modelling of
structural dependencies across the parse tree.
 lack of lexical conditioning: CFG rules don’t
model syntactic facts about specific words,
leading to problems with sub categorization
ambiguities, preposition attachment, and
coordinate structure ambiguities
 In a CFG the expansion of a non-terminal is
independent of the context, i.e., of the other nearby
non-terminals in the parse tree.
 Similarly, in a PCFG, the probability of a particular
rule like NP → Det N is also independent of the rest of
the tree.
 the probability of a group of independent events is
the product of their probabilities.
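 As a concrete illustration of this independence assumption, the following Python sketch scores a parse tree as the product of its rule probabilities; the toy grammar, the tuple tree encoding, and the assumption that lexical rules have probability 1 are illustrative choices, not taken from the slides.

```python
import math

# Toy PCFG: (LHS, RHS) -> probability (illustrative numbers).
pcfg = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("Aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
    ("NP", ("Det", "N")): 0.60,
    ("VP", ("V", "NP")): 0.50,
}

def tree_log_prob(tree):
    """Log probability of a parse tree encoded as nested tuples
    (label, child1, child2, ...); a leaf is a bare word string.
    Under the PCFG independence assumption the tree probability is
    the product of the probabilities of all rules used in it."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return 0.0                           # pre-terminal over a word: prob 1 here
    rhs = tuple(child[0] for child in children)
    logp = math.log(pcfg[(label, rhs)])      # this expansion's probability
    for child in children:
        logp += tree_log_prob(child)         # independence: multiply (add logs)
    return logp

tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "bit"), ("NP", ("Det", "the"), ("N", "man"))))
print(math.exp(tree_log_prob(tree)))         # 0.80 * 0.60 * 0.50 * 0.60
```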
 PCFGs can achieve extremely high parsing
accuracy if the grammar rule symbols are
redesigned via automatic splits and merges.
 An alternative is, instead of modifying the
grammar rules, to modify the probabilistic model
of the parser to allow for lexicalized rules.
 Examples: the Collins and Charniak parsers.
 Each non terminal in the tree is annotated
with its lexical head.
 Example:
VP → VBD NP PP
can be extended as
VP(dumped) → VBD(dumped) NP(sacks) PP(into)
 In some cases, it can be extended with the
head tag (part of speech tag of the head
words).
 VP(dumped,VBD) → VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)
 Then, a lexicalized parse tree can be shown as
follows:
[Figure omitted: lexicalized parse tree]
 A lexicalized grammar consists of two kinds of rules:
 Lexical rules
 Internal rules
 Lexical rules express the expansion of a pre-terminal to a word:
NNS(workers,NNS) → workers
 Internal rules express the other rule expansions:
NP(workers,NNS) → NNS(workers,NNS)
 The rules in a lexicalized grammar are associated with
different probabilities.
 Lexical rules always have probability
1 (e.g., NN(bin,NN) can only expand to the
word bin).
 But for internal rules we need to estimate
probabilities.
 Given a treebank , we can compute the
probability of each expansion of a non
terminal by counting the number of times that
expansion occurs and then normalizing.
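 A minimal Python sketch of this count-and-normalize estimate is shown below; the handful of treebank rules is made up for illustration.

```python
from collections import Counter

# Rules read off a (tiny, made-up) treebank as (LHS, RHS) pairs.
treebank_rules = [
    ("VP", ("VBD", "NP", "PP")),
    ("VP", ("VBD", "NP")),
    ("VP", ("VBD", "NP", "PP")),
    ("NP", ("DT", "NNS")),
]

rule_counts = Counter(treebank_rules)                    # count(LHS -> RHS)
lhs_counts = Counter(lhs for lhs, _ in treebank_rules)   # count(LHS)

# P(LHS -> RHS) = count(LHS -> RHS) / count(LHS)
rule_prob = {rule: count / lhs_counts[rule[0]]
             for rule, count in rule_counts.items()}

print(rule_prob[("VP", ("VBD", "NP", "PP"))])            # 2/3 on this toy data
```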
 Constituents and phrase-structure rules do not
play any fundamental role.
 Instead, the syntactic structure of a sentence is
described in terms of words and the syntactic
relations between these words.
 Quite important in speech and language
processing.
 Strong predictive parsing power that words
have for their dependents.
 Knowing the identity of the verb can help in
deciding which noun is the subject or the
object.
 Ability to handle languages with relatively
free word order.
 A phrase structure grammar would need a
separate rule for each possible place in the
parse tree.
 Dependency grammar abstracts away from word-order
variation, representing only the
information that is necessary for the parse.
 Example: stanford parser, Link grammar.
 Early lexicalized grammar model
 Two components:
Categorial lexicons-associate each word
with a syntactic and semantic category.
Combinatory rules- allow functions and
arguments to be combined.
 Categories are of two types:
Functors and Arguments
 Arguments, like nouns, have simple
categories like N. Verbs or determiners act as
functors.
 For example, a determiner can be thought of
as a function that applies to an N on its right
to produce an NP.
 Such complex categories can be built using
the X/Y and X\Y operators.
 X/Y means a function from Y to X, that is,
something which combines with a Y on its
right to produce an X. Determiners thus
receive the category NP/N: something that
combines with an N on its right to produce an
NP.
 The simplest combination rules just combine
an X/Y with a Y on its right to produce an X,
or an X\Y with a Y on its left to produce an X.
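 The sketch below illustrates these two application rules in Python, encoding categories as plain strings such as "NP/N" and "S\NP"; the tiny lexicon and the restriction to non-nested categories are simplifying assumptions for illustration.

```python
def forward_apply(functor, argument):
    """X/Y combined with a Y on its right yields X."""
    if "/" in functor:
        x, y = functor.split("/", 1)
        if y == argument:
            return x
    return None

def backward_apply(argument, functor):
    """A Y on the left combined with X\\Y yields X."""
    if "\\" in functor:
        x, y = functor.split("\\", 1)
        if y == argument:
            return x
    return None

# Illustrative lexicon: the determiner is a functor NP/N; the
# intransitive verb is a functor S\NP looking for its subject.
lexicon = {"the": "NP/N", "dog": "N", "barks": "S\\NP"}

np = forward_apply(lexicon["the"], lexicon["dog"])   # 'NP'
s = backward_apply(np, lexicon["barks"])             # 'S'
print(np, s)
```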
 Does human sentence processing make use of
probabilistic parsing methods?
 Recent studies show that there are at least
two ways in which humans apply probabilistic
parsing.
 There is still some disagreement on the
details.
 One family of studies has shown that when
humans read, the predictability of a word
seems to influence the reading time; more
predictable words are read more quickly.
 One way of defining predictability is from
simple bigram measures.
 It is found that the higher the bigram
predictability of a word, the shorter the time
that participants looked at the word (the initial-
fixation duration).
 The second family of studies has examined
how humans disambiguate sentences which
have multiple possible parses, suggesting
that humans prefer whichever parse is more
probable.
 Studies were done on garden path sentences.
 These are sentences which are cleverly
constructed to have three properties that
combine to make them very difficult for
people to parse.
 They are temporarily ambiguous: The sentence is
unambiguous, but its initial portion is ambiguous.
 One of the two or more parses in the initial portion is
somehow preferable to the human parsing mechanism.
 But the dispreferred parse is the correct one for the
sentence.
 The result of these three properties is that people are
“led down the garden path” toward the incorrect parse,
and then are confused when they realize it’s the wrong
one.
 Besides grammatical knowledge, some other
factors also influence human parsing.
 Semantics is the study of the meaning of
linguistic utterances.
 Study of formal representations that capture the
meaning of linguistic utterances.
 Study of algorithms for mapping from
linguistic utterances to meaning
representations.
 The meaning of a linguistic utterance can be
captured in formal structures, called
meaning representations.
 Frameworks used to specify the syntax
and semantics of these representations are
called meaning representation languages.
 In the representational approach, we take
linguistic inputs and construct meaning
representations that are made up of the same
kind of stuff that is used to represent this kind
of everyday commonsense knowledge of the
world. The process whereby such
representations are created and assigned to
linguistic inputs is called semantic analysis.
 Reading a menu and deciding what to order, giving
advice about where to go to dinner, following a
recipe, and generating new recipes all require deep
knowledge about food, its preparation, what people
like to eat and what restaurants are like.
 First order Logic
 Semantic Network
 Conceptual Dependency
 Frame based Representation
 Meaning Representation will be helpful for the
systems to give appropriate responses by using
a knowledge base of relevant domain
knowledge.
 Representations can be used to determine the relationship
between the meaning of a sentence and the world as we know
it.
 That is, we need to be able to determine the truth of our
representations.
 The system must have the ability to compare the meaning
representations with the representations in its knowledge base.
 A knowledge base simply stores information about
its world.
 Maharani serves vegetarian food.
 We can gloss this representation as:
Serves(Maharani, VegetarianFood)
The representation will be matched against a
knowledge base describing a set of restaurants. If the
system finds a match, it can return an
affirmative answer.
 Otherwise, it must either say NO if its
knowledge of local restaurants is
complete, or say that it doesn't know if
there is reason to believe that its
knowledge is incomplete. This notion is
known by the name verifiability.
 System’s ability to compare the state of
affairs described by a representation to
the states of affairs in some world as
modelled in a knowledge base.
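 A minimal Python sketch of this kind of verification is given below; the knowledge-base facts and the completeness flag are illustrative assumptions.

```python
# Ground facts stored as tuples: predicate followed by its arguments.
knowledge_base = {
    ("Serves", "Maharani", "VegetarianFood"),
    ("Serves", "AyurCafe", "VegetarianFood"),
}

def verify(fact, kb, kb_is_complete=True):
    """Return 'yes' if the fact matches the KB; otherwise 'no' when the
    KB is known to be complete, or 'unknown' when it may be incomplete."""
    if fact in kb:
        return "yes"
    return "no" if kb_is_complete else "unknown"

print(verify(("Serves", "Maharani", "VegetarianFood"), knowledge_base))  # yes
```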
 The domain of semantics is also subject to
ambiguity.
 The answer generated by the system for this
request will depend on which interpretation is
chosen as the correct one.
 To resolve ambiguities, some means of determining
that certain interpretations are
preferable (or dispreferred) relative to
others is needed.
 A concept closely related to ambiguity is
vagueness.
 Vagueness can make it difficult to determine what to do with
a particular input on the basis of its meaning
representation.
 But it doesn't give rise to multiple
representations.
 I want to eat Italian food.
 It provides enough information to a restaurant
advisor to make reasonable
recommendations, but is quite vague in terms
of what item the user wants to eat.
 The representation is useful in some
situations.
 There is possibility that distinct inputs may
lead to the same meaning representations.
 Examples:
 Does Maharani have vegetarian Dishes?
 Do they have vegetarian dishes at Maharani?
 Are vegetarian dishes served at Maharani?
 Does Maharani serve vegetarian fare?
 These alternatives use different words and
syntactic structures.
 It would be unreasonable to expect each of them to have a
different meaning representation.
 If the KB contains only one of these meaning
representations, matching may fail for any of
the other alternatives.
 If we have stored all alternative
representations, it will lead to some other
problems like keeping such a KB consistent.
 The notion that inputs that mean the same
thing should have the same meaning
representation is known as the canonical form.
 This simplifies various reasoning tasks,
since systems need to deal with only a single
meaning representation.
 Canonical forms, however, complicate the task of
semantic analysis.
 Inference is the system’s ability to draw valid
conclusions based on the meaning
representation of inputs and its store of
background knowledge.
 To answer some types of request, a more
complex kind of matching that involves the use of
variables is required.
 I’d like to find a restaurant where I can get vegetarian
food.
Representation can be expressed with the use of
variables as follows:
Serves(x, VegetarianFood)
Matching will succeed only if the variable can be
replaced by some known object in the KB.
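 A minimal Python sketch of this variable matching is shown below; marking variables with a leading '?' and the toy facts are illustrative conventions, not from the text.

```python
knowledge_base = {
    ("Serves", "Maharani", "VegetarianFood"),
    ("Serves", "Pizzeria", "ItalianFood"),
}

def find_bindings(pattern, kb):
    """Yield substitutions for the variables in a pattern (marked with
    a leading '?') that make it match some fact in the KB."""
    for fact in kb:
        if len(fact) != len(pattern):
            continue
        bindings = {}
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                bindings.setdefault(p, f)   # bind the variable on first use
                if bindings[p] != f:
                    break                   # clashes with an earlier binding
            elif p != f:
                break                       # constant mismatch
        else:
            yield bindings                  # every position matched

for b in find_bindings(("Serves", "?x", "VegetarianFood"), knowledge_base):
    print(b)                                # {'?x': 'Maharani'}
```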
 A meaning representation must be expressive enough
to handle an extremely wide range of subject matter.
 Ideally, we would like a single meaning representation
language that could adequately represent the meaning of any
sensible natural language utterance.
 First Order Logic is expressive enough to handle
quite a lot of what needs to be represented.
 Words and sentences have parts that combine
in patterns, exhibiting the grammar of the
language.
 Syntax and semantics involve studying
patterns in sentence structure from the
vantages of form and meaning respectively.
The various methods by which human languages convey meaning
are:
 Conventional form meaning associations
 Word order regularities
 Tense systems
 Conjunctions and quantifiers
 Predicate argument structure.
Among these, predicate-argument structure has had the greatest
practical influence on the meaning structure of language.
 All human languages have a form of predicate
argument arrangement at the core of their
semantic structure.
 One of the most important jobs of a grammar
is to help organize this predicate argument
structure.
I want Italian food.
This can be classified as the following syntactic
argument frame.
NP want NP
 The syntactic frames specify the number, position and
syntactic category of the arguments that are expected
to accompany a verb.
 For example, the frame in the above example
specifies the following facts:
 There are two arguments to this predicate.
 Both arguments must be NPs.
 The first argument is pre-verbal and plays the
role of the subject.
 The second argument is post verbal and plays
the role of the direct object.
 Facts of this kind provide a valuable
link between syntax and meaning
representations.
 The notion of semantic roles can be
understood by looking at the similarities among
the arguments. The study of roles associated
with specific verbs and across classes of verbs
is referred to as thematic role or case role
analysis.
 The notion of semantic restrictions arises
directly from these semantic roles.
 Selection restrictions are one such notion,
whereby verbs can specify semantic
restrictions on their arguments.
 Predicate-argument structure is not based
only on verbs; nouns, prepositions, etc. can also serve as predicates.
The following examples illustrate the concept
in more detail.
 An Italian restaurant under fifteen dollars.
 Meaning representation associated with the
preposition ‘under’ can have the following structure.
Under(ItalianRestaurant, $15)
 Make a reservation for this evening for a table for
two persons at 8.
 Meaning representation will be:
Reservation(hearer,today,8 PM,2)
 Here the predicate-argument structure is based on the
noun Reservation, rather than on make, the main verb in
the phrase.
 The following are the different kinds of semantic
information that languages convey:
 Variable arity predicate argument structures.
 The semantic labelling of arguments to
predicates.
 The statement of semantic constraints on the
fillers of argument roles.
 Flexible, well understood and computationally
tractable approach to the representation of
knowledge.
 Satisfies many of the requirements for a meaning
representation language.
 It makes very few specific commitments as to how
things ought to be represented.
 Terms
 Predicates
 Logical connectives
 Terms are the FOPC device for representing objects.
FOPC provides three ways to represent these basic
building blocks:
Constants
Functions
Variables
 Refer to specific objects in the world being described.
 Constants are conventionally depicted as either single
capitalized letters or single capitalized words that are
often reminiscent of proper nouns such as Maharani
or Harry.
 Constants refer to exactly one object.
 Objects can have multiple constants that refer to
them.
 FOPC functions are syntactically the same as single-
argument predicates.
 Example: LocationOf(Maharani)
 Functions provide a way to refer to specific objects
without having to associate a named constant with
them.
 Variables give us the ability to make assertions and
draw inferences about objects without having
to make reference to any particular named
object.
 Variables are normally depicted as single lower-case letters.
 One of the FOPC mechanisms used to state
relations that hold among objects.
Eg:Maharani serves vegetarian food.
 Reasonable FOPC representation might look
like the following formula:
Serves(Maharani, VegetarianFood)
 Here Serves is a two-place predicate that holds
between the objects denoted by the constants
Maharani and VegetarianFood.
Maharani is a restaurant.
FOPC Representation
Restaurant(Maharani)
 This is an example of a one-place predicate that is used
not to relate multiple objects, but rather to assert a
property of a single object.
 Larger composite representations can be put
together through the use of logical
connectives.
 It gives the ability to create larger
representations by conjoining logical formulas
using one of three operators.
 I only have five dollars and I don’t have a lot of time.
 Have(Speaker, FiveDollars) ∧ ¬Have(Speaker, LotOfTime)
 An FOPC sentence can be assigned a value of true or false
based on whether or not the propositions it encodes are in accord
with the world.
 Semantic analysis identifies the terms and predicates that
correspond to the various grammatical elements of
the sentence, and creates logical formulas that
capture the relations implied by the words and syntax
of the sentence.
Ilahia is near Mulavoor.
It will yield the following representation.
Near(LocationOf(Ilahia),LocationOf(Mulavoor))
This sentence can be assigned a value of true or false
based on whether or not the real Ilahia is actually
close to the real Mulavoor.
 For determining the truth of our logical
formulas, a database semantics may be used.
 Atomic formulas are taken to be true if they are
literally present in the KB or if they can be
inferred from other formulas that are in the
knowledge base.
 Variables are used in two ways in FOPC.
 To refer to particular anonymous objects
 To refer generically to all objects in a collection.
 These two uses are made possible through two
operators known as quantifiers.
 They are two types:
 Universal quantifier denoted ∀, and is
pronounced as “for all”.
 Existential quantifier denoted by ∃, and is
pronounced as “there exists”
 A restaurant that serves Chinese food near
Cochin.
 Here reference is made to an anonymous
object. So the reasonable representation of the
meaning is:
 ∃x Restaurant(x)
∧ Serves(x, ChineseFood)
∧ Near(LocationOf(x), LocationOf(Cochin))
 For the sentence to be true, there must be at least
one object such that substituting it for the
variable x yields a true formula.
 ∧ indicates that the sentence will be true only if all
the atomic formulas are true.
 All vegetarian restaurants serve vegetarian Food.
 Reasonable representation for this sentence will be:
 ∀x VegetarianRestaurant(x) ⇒ Serves(x, VegetarianFood)
 For this sentence to be true, it must be the case that
every substitution of a known object for x must result
in a sentence that is true.
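 Over a finite domain of known objects, both quantifiers can be checked directly against the knowledge base, as in the following Python sketch; the facts and the domain are illustrative.

```python
facts = {
    ("Restaurant", "Maharani"),
    ("VegetarianRestaurant", "Maharani"),
    ("Serves", "Maharani", "VegetarianFood"),
    ("VegetarianRestaurant", "Udupi"),
    ("Serves", "Udupi", "VegetarianFood"),
}
domain = {"Maharani", "Udupi", "Pizzeria"}

def holds(*atom):
    """Database semantics: an atomic formula is true if it is in the KB."""
    return atom in facts

# Existential: there is at least one restaurant serving vegetarian food.
exists = any(holds("Restaurant", x) and holds("Serves", x, "VegetarianFood")
             for x in domain)

# Universal with implication: every vegetarian restaurant serves vegetarian food.
forall = all((not holds("VegetarianRestaurant", x))
             or holds("Serves", x, "VegetarianFood")
             for x in domain)

print(exists, forall)   # True True
```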
 The ability to add valid new propositions to a KB.
 The most important inference method provided by
FOPC is modus ponens(informally known as if then
reasoning).
 We can abstractly define modus ponens as follows,
where α and β should be taken as FOPC formulas:
α
α ⇒ β
─────
β
 In general, schemas like this indicate that the formula
below the line can be inferred from the formulas
above the line by some form of inference. Modus
ponens simply states that if the left-hand side of an
implication rule is present in the knowledge base,
then the right-hand side of the rule can be inferred.
 Modus ponens is typically put to practical use
in one of two ways:
Forward chaining
Backward chaining
 As soon as a new fact is added to the knowledge
base, all applicable implication rules are found and
applied, each resulting in the addition of new facts to the
knowledge base. These new propositions in turn can
be used to fire implication rules applicable to them.
The process continues until no further facts can be
deduced.
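 A minimal Python sketch of forward chaining over simple propositional if-then rules is given below; the rule and fact strings are illustrative.

```python
# Each rule is (set of antecedent facts, consequent fact).
rules = [
    ({"VegetarianRestaurant(Maharani)"}, "Serves(Maharani,VegetarianFood)"),
    ({"Serves(Maharani,VegetarianFood)"}, "SuitsVegetarians(Maharani)"),
]
facts = {"VegetarianRestaurant(Maharani)"}

changed = True
while changed:                          # keep applying modus ponens
    changed = False
    for antecedents, consequent in rules:
        if antecedents <= facts and consequent not in facts:
            facts.add(consequent)       # the right-hand side is inferred
            changed = True

print(facts)   # both consequents have been added
```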
 Modus ponens is run in reverse to prove specific propositions,
called queries.
 The first step is to see if the query formula is true by
determining if it is present in the knowledge base.
 If it is not, then the next step is to search for applicable
implication rules present in the knowledge base. An applicable
rule is one where the consequent of the rule matches the query
formula.
 If there are any such rules, then the query can be proved if the
antecedent of any one of them can be shown to be true.
 Not surprisingly, this can be performed recursively by
backward chaining on the antecedent as a new query.
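 The same rule format supports a minimal backward-chaining sketch in Python, shown below; cycle handling is omitted and all names are illustrative.

```python
def backward_prove(query, facts, rules):
    """Prove a query by checking the KB, otherwise by finding rules
    whose consequent matches the query and recursively proving their
    antecedents (no cycle detection in this sketch)."""
    if query in facts:                        # query literally present in KB
        return True
    for antecedents, consequent in rules:
        if consequent == query:               # an applicable rule
            if all(backward_prove(a, facts, rules) for a in antecedents):
                return True                   # all antecedents proved
    return False

rules = [
    ({"VegetarianRestaurant(Maharani)"}, "Serves(Maharani,VegetarianFood)"),
]
facts = {"VegetarianRestaurant(Maharani)"}
print(backward_prove("Serves(Maharani,VegetarianFood)", facts, rules))  # True
```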
 Both backward and forward chaining are sound, but
neither of them is complete.
 Another inference technique, named resolution, exists,
which is both sound and complete.
 But it is computationally more expensive.
 In practice, most systems use one of the
chaining methods.
 This section provides an introduction to the
meaning representations of:
Categories
Events
Time
Beliefs.
 Categories are represented in the following way:
◦ create a unary predicate for each category of interest.
◦ Such predicates are then asserted for each member of the
category.
 Example: VegetarianRestaurant(Maharani) (a unary predicate)
 Similar Logical formulas are included in our KB for each
known vegetarian restaurant.
 Here categories are relations rather than full-fledged
objects. It is therefore difficult to make assertions about
categories themselves, rather than about their
individual members.
 Example:
MostPopular(Maharani, VegetarianRestaurant)
This is not a legal FOPC formula (because arguments
must be terms, not other predicates).
 One way to solve this problem is to represent all the
concepts that we want to make statements about as
full-fledged objects via a technique called reification.
 In this case, we can represent the category of
VegetarianRestaurant as an object just as Maharani
is. The notion of membership in such a category is
then denoted via a membership relation as in the
following:
ISA(Maharani,VegetarianRestaurant)
 The relation denoted by ISA (is a) holds between objects
and the categories of which they are members.
 This technique can be extended to create hierarchies
of categories through the use of other similar
relations, as in the following:
 AKO(VegetarianRestaurant,Restaurant)
 Here, the relation AKO (a kind of) holds between
categories and denotes a category inclusion
relationship.
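 The following Python sketch shows reified categories with ISA membership and AKO inclusion, including inheritance up the AKO hierarchy; the relation encoding is an illustrative assumption.

```python
isa = {("Maharani", "VegetarianRestaurant")}          # object ISA category
ako = {("VegetarianRestaurant", "Restaurant")}        # category AKO category

def member_of(obj, category):
    """True if obj ISA category directly, or via AKO category inclusion."""
    cats = {c for o, c in isa if o == obj}
    while cats:
        if category in cats:
            return True
        cats = {parent for c in cats for child, parent in ako if child == c}
    return False

print(member_of("Maharani", "Restaurant"))            # True, via AKO
```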
 The representation of events consisted of
single predicates with as many arguments as
are needed to incorporate all the roles
associated with a given example.
 Example:
 Reservation(hearer,Maharani,Today,8 PM,2)
 In the case of verbs, this approach simply
assumes that the predicate representing the
meaning of a verb has the same number of
arguments as are present in the verb’s
syntactic sub categorization frame.
 The following problems will occur due to this
approach:
 Determining the correct number of roles for any
given event.
 Representing facts about the roles associated with an
event.
 Ensuring that all the correct inferences can be derived
directly from the representation of an event.
 Ensuring that no incorrect inferences can be derived
from the representation of an event.
 The following example will illustrate a series of
representation for events, based on the verb eat.
 I ate.
 I ate a vegetable sandwich.
 I ate a vegetable sandwich at my desk.
 I ate at my desk.
 I ate lunch.
 I ate a vegetable sandwich for lunch.
 I ate a vegetable sandwich for lunch at my desk.
 The variable number of arguments for a predicate-
bearing verb like eat poses a tricky problem.
 FOPC predicates have fixed arity, i.e., they take a fixed
number of arguments, yet all of these examples describe the
same kind of eating event.
 One possible solution is to create a subcategorization
frame for each of the configurations of arguments
that the verb allows.
 The semantic analog to this approach is to
create as many different eating predicates as
are needed to handle all of the ways that eat
behaves. Such an approach would yield the
following kinds of representations:
 Eating₁(Speaker)
 Eating₂(Speaker, VegetableSandwich)
 Eating₃(Speaker, VegetableSandwich, Desk)
 Eating₄(Speaker,Desk)
 Eating₅(Speaker,Lunch)
 Eating₆(Speaker,VegetableSandwich,Lunch)
 Eating₇(Speaker,VegetableSandwich,Lunch,Desk)
 Distinct predicates are created for each sub
categorization frame.
 This solves the problem of how many arguments
the eat predicate should take.
 But it comes at a high cost.
 Other than the suggestive names of the predicates,
there is nothing to tie these events to one another
even though there are obvious logical relations
among them.
 One method to solve this problem is meaning
postulates.
 Such a postulate explicitly ties together the
semantics of two of our predicates, for example:
 ∀w,x,y,z Eating₇(w,x,y,z) ⇒ Eating₆(w,x,y)
 In the previous discussion, representations were given
in terms of events.
 Time can also be incorporated along with them.
 The relevant domain is known as temporal logic.
 It deals with how human languages convey
temporal information, including tense logic, the ways
that verb tenses convey temporal information.
 Time flows forward, and events are associated
with either points or intervals in time.
 An ordering can be imposed on distinct events by
situating them on the timeline.
 More specifically, we can say that one event
precedes another, if the flow of time leads from the
first event to the second.
 There are a number of schemes for representing temporal
information.
 Here we discuss one simple scheme that stays
within the FOPC framework.
 I arrived in New York.
 I am arriving in New York.
 I will arrive in New York.
 These sentences will refer to the same kind of events
and differ only in the tense of verb.
 In the scheme for representing events, all three would
share the following kind of information, which lacks
temporal information:
[Figure omitted: tenseless event representation shared by the three sentences]
 Here w is an event variable. Similarly, we can
also add a temporal variable.
 Such an approach yields the following
representation:
[Figures omitted: event representations augmented with temporal variables]
 Unfortunately, the relation between simple verb
tenses and points in time is by no means
straightforward.
 Consider the following examples.
 Ok, we fly from San Francisco to Boston at 10.
 Flight 1390 will be at the gate an hour from now.
 In the first example, the present tense of the verb fly
is used to refer to a future event, while in the second
the future tense is used to refer to a past event.
 More complications occur when we consider some of
the other verb tenses.
 Consider the following examples.
 Flight 1902 arrived late.
 Flight 1902 had arrived late.
 Although both refer to events in the past, representing
them in the same way seems wrong.
 The second example seems to have another unnamed
event lurking in the background (eg. Flight 1902 had
already arrived late when something else happened).
 To account for these phenomena, Reichenbach (1947)
introduced the notion of a reference point. In our simple
temporal reference scheme, the current moment in time is
equated with the time of the utterance and is used as a
reference point for when the event occurred (before, at,
or after).
 In Reichenbach’s approach, the notion of the reference
point is separated out from the utterance time and the
event time.
 The following examples illustrate the basics of this
approach.
 When Mary’s flight departed, I ate lunch.
 When Mary’s flight departed, I had eaten lunch.
 In both of these examples, the eating event has
happened in the past, ie. prior to the utterance.
However, the verb tense in the first example indicates
that the eating event began when the flight departed,
while the second example indicates that the eating
was accomplished prior to the flight’s departure.
 Therefore, in Reichenbach’s terms the departure
event specifies the reference point. These facts can be
accommodated by asserting additional constraints
relating the eating and departure events. In the first
example, the reference point precedes the eating
event, and in the second example, the eating precedes
the reference point.
 Concerns a cluster of related topics, including
whether an event has ended or is ongoing, whether it
is conceptualized as happening at a point in time or
over some interval, and whether or not any particular
state in the world comes about because of it.
 Based on these and related notions, event expressions
have traditionally been divided into four general
classes:
 Statives
 Activities
 Accomplishments
 Achievements.
 Stative expressions represent the notion of an event
participant having a particular property, or being in a
state, at a given point in time.
 As such, they can be thought of as capturing an
aspect of a world at a single point in time.
 Example:
I know my departure gate.
 Stative verbs are distinctly odd when used in the
progressive form.
 Statives are also odd when used as imperatives.
 Statives are not easily modified by adverbs like
deliberately and carefully.
 Activity expressions describe events undertaken by a
participant that have no particular end-point. Unlike
statives, activities are seen as occurring over some
span of time, and are therefore not associated with
single points in time.
 Example:
 John is flying.
 Unlike statives, activity expressions are fine in both
the progressive and imperative forms.
Examples:
 She is living in Brooklyn.
 Drive a Mazda!
 However, like statives, activity expressions are odd
when temporally modified with temporal expressions
using in.
 Examples:
 I live in Brooklyn in a month.
She drove a Mazda in an hour.
 They can, however, successfully be used with for
temporal adverbials.
 Example:
I live in Brooklyn for a month.
She drove a Mazda for an hour.
 Unlike activities, accomplishment expressions
describe events that have a natural end-point and
result in a particular state.
 Consider the following examples.
He booked me a reservation.
United flew me to New York.
 In these examples, there is an event that is seen as
occurring over some period of time that ends when
the intended state is accomplished.
 A number of diagnostics can be used to
distinguish accomplishment events from
activities. Consider the following examples,
which make use of the word stop as a test.
 I stopped living in Brooklyn.
 She stopped booking my flight.
 In the first example, which describes an activity, one
can safely conclude that the statement I lived
in Brooklyn is true, even though this activity came to
an end.
 However, from the second example, one
cannot conclude that the statement She booked her
flight is true, since the activity was stopped before
the intended state was accomplished.
 Therefore, although stopping an activity
entails that the activity took place, stopping
an accomplishment event indicates that the
event did not succeed.

More Related Content

Similar to Natural Language Processing Topics for Engineering students (20)

PPTX
Week 3 - to FiniteAutomata DrJunaid.pptx
MFaizan35
 
PPT
String Matching with Finite Automata,Aho corasick,
8neutron8
 
PPT
02. Chapter 3 - Lexical Analysis NLP.ppt
charvivij
 
PPT
Automata_and_Complexity_Theommmmmmmry-RevisionforExitExam.ppt
jacobdiriba
 
PPTX
03-FiniteAutomata.pptx
ssuser47f7f2
 
PPTX
Regular-expressions in NLP and regular expression with example
Kirti Verma
 
PPTX
DIU_BD_AvaGandu_SE-234-Lecture-02-DFA.pptx
mafen98815
 
PPT
a simple idealized machine used to recognize patterns within input taken from...
NALESVPMEngg
 
PPTX
AUTOMATA THEORY AUTOMATA THEORYAutomata3Chapter2.pptx
ArjayBalberan1
 
PDF
Natural language Processing: Word Level Analysis
siddiquitanveer1
 
PPT
Finite automata
ankitamakin
 
PPT
Finite automata
ankitamakin
 
PPT
Finite automata(For college Seminars)
Naman Joshi
 
PPTX
Theory of automata
Arslan905905
 
PPTX
hghghghhghghgggggggggggggggggggggggggggggggggg
adugnanegero
 
PPTX
Theory of computation Unit 1 Lecute 2.pptx
RishabhGupta238479
 
PPTX
FiniteAutomata_anim.pptx
Meghnadh
 
PPTX
TCS MUBAI UNIVERSITY ATHARVA COLLEGE OF ENGINEERING.pptx
userqwerty2612
 
PDF
Introduction to Computational Linguistics
shaimashimo
 
PPTX
FiniteAutomata_anim.pptx
amara jyothi
 
Week 3 - to FiniteAutomata DrJunaid.pptx
MFaizan35
 
String Matching with Finite Automata,Aho corasick,
8neutron8
 
02. Chapter 3 - Lexical Analysis NLP.ppt
charvivij
 
Automata_and_Complexity_Theommmmmmmry-RevisionforExitExam.ppt
jacobdiriba
 
03-FiniteAutomata.pptx
ssuser47f7f2
 
Regular-expressions in NLP and regular expression with example
Kirti Verma
 
DIU_BD_AvaGandu_SE-234-Lecture-02-DFA.pptx
mafen98815
 
a simple idealized machine used to recognize patterns within input taken from...
NALESVPMEngg
 
AUTOMATA THEORY AUTOMATA THEORYAutomata3Chapter2.pptx
ArjayBalberan1
 
Natural language Processing: Word Level Analysis
siddiquitanveer1
 
Finite automata
ankitamakin
 
Finite automata
ankitamakin
 
Finite automata(For college Seminars)
Naman Joshi
 
Theory of automata
Arslan905905
 
hghghghhghghgggggggggggggggggggggggggggggggggg
adugnanegero
 
Theory of computation Unit 1 Lecute 2.pptx
RishabhGupta238479
 
FiniteAutomata_anim.pptx
Meghnadh
 
TCS MUBAI UNIVERSITY ATHARVA COLLEGE OF ENGINEERING.pptx
userqwerty2612
 
Introduction to Computational Linguistics
shaimashimo
 
FiniteAutomata_anim.pptx
amara jyothi
 

Recently uploaded (20)

PDF
Viol_Alessandro_Presentazione_prelaurea.pdf
dsecqyvhbowrzxshhf
 
PPTX
What is Shot Peening | Shot Peening is a Surface Treatment Process
Vibra Finish
 
PDF
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
PPTX
Depth First Search Algorithm in 🧠 DFS in Artificial Intelligence (AI)
rafeeqshaik212002
 
PPTX
Day2 B2 Best.pptx
helenjenefa1
 
PPTX
fatigue in aircraft structures-221113192308-0ad6dc8c.pptx
aviatecofficial
 
PPTX
Shinkawa Proposal to meet Vibration API670.pptx
AchmadBashori2
 
PDF
Water Industry Process Automation & Control Monthly July 2025
Water Industry Process Automation & Control
 
PDF
MAD Unit - 1 Introduction of Android IT Department
JappanMavani
 
DOC
MRRS Strength and Durability of Concrete
CivilMythili
 
PPTX
Arduino Based Gas Leakage Detector Project
CircuitDigest
 
PDF
Pressure Measurement training for engineers and Technicians
AIESOLUTIONS
 
DOCX
CS-802 (A) BDH Lab manual IPS Academy Indore
thegodhimself05
 
PPTX
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
PPTX
Introduction to Design of Machine Elements
PradeepKumarS27
 
PPTX
Element 11. ELECTRICITY safety and hazards
merrandomohandas
 
PDF
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
PDF
MAD Unit - 2 Activity and Fragment Management in Android (Diploma IT)
JappanMavani
 
PPTX
Green Building & Energy Conservation ppt
Sagar Sarangi
 
PPTX
Solar Thermal Energy System Seminar.pptx
Gpc Purapuza
 
Viol_Alessandro_Presentazione_prelaurea.pdf
dsecqyvhbowrzxshhf
 
What is Shot Peening | Shot Peening is a Surface Treatment Process
Vibra Finish
 
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
Depth First Search Algorithm in 🧠 DFS in Artificial Intelligence (AI)
rafeeqshaik212002
 
Day2 B2 Best.pptx
helenjenefa1
 
fatigue in aircraft structures-221113192308-0ad6dc8c.pptx
aviatecofficial
 
Shinkawa Proposal to meet Vibration API670.pptx
AchmadBashori2
 
Water Industry Process Automation & Control Monthly July 2025
Water Industry Process Automation & Control
 
MAD Unit - 1 Introduction of Android IT Department
JappanMavani
 
MRRS Strength and Durability of Concrete
CivilMythili
 
Arduino Based Gas Leakage Detector Project
CircuitDigest
 
Pressure Measurement training for engineers and Technicians
AIESOLUTIONS
 
CS-802 (A) BDH Lab manual IPS Academy Indore
thegodhimself05
 
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
Introduction to Design of Machine Elements
PradeepKumarS27
 
Element 11. ELECTRICITY safety and hazards
merrandomohandas
 
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
MAD Unit - 2 Activity and Fragment Management in Android (Diploma IT)
JappanMavani
 
Green Building & Energy Conservation ppt
Sagar Sarangi
 
Solar Thermal Energy System Seminar.pptx
Gpc Purapuza
 
Ad

Natural Language Processing Topics for Engineering students

  • 2.  Distinction between language processing applications and Data processing Systems- Knowledge of Language  Eg: use of wc in Linux  Usage of data processing - count bytes and lines  Usage of knowledge- count the words in a file(needs a knowledge about what it means to be a word)
  • 3.  Eg:HAL, conversational agent  Able to recognize words from an audio signal(speech recognition) and generate audio signal from a sequence of words(speech synthesis).  Knowledge required: phonetics, phonology How words are pronounced How sounds are realized acoustically
  • 4.  Morphology: producing and recognizing variations of individual words(like singular and plural)  Structural knowledge(syntax): properly string together the words that constitute its response.  Semantics: knowledge of meaning  Lexical semantics(meaning of all words, like silk, export in the following eg)  Compositional semantics(what does end mean when combined with 18th century) Eg: how much Chinese silk was exported to western Europe by the end of 18th century?
  • 5.  Pragmatics: Knowledge of the relationship of meaning to the goals and intentions of the speaker. Example: Request: John, open the front door. statement: John, the front door is open. Information Question: John, is the front door open?  Knowledge about kind of actions that speakers intend by their use of sentences.  Also known as dialogue knowledge.
  • 6.  Discourse :Knowledge about linguistic units larger than a single utterance. Example: How many students were in classroom that time? To interpret words like that year ,a QA System needs to examine the earlier questions that were asked,for example in above context,qn may be ISRO Scientist came to the CSE class room for the motivational speech by last week of January.
  • 7.  The use of knowledge about how words like that or pronouns like it or she refer to previous parts of the discourse known as coreference resolution.
  • 8.  If multiple, alternative linguistic structures can be built for an input – it is ambiguous.  Example: John went to the bank. (The bank may be edge of the river or financial bank.)
  • 9.  Part of speech tagging: deciding whether the word is noun or verb.  Word sense disambiguation: deciding the correct sense based on context.  Speech act interpretation: Determining whether a sentence is a statement or question.  Probabilistic parsing: for addressing syntactic disambiguation.
  • 10.  various kinds of linguistic knowledge can be captured through formal models or theories.  state machines(formal models that consists of states, transitions among states, and an input representation.)  Rule systems(Regular grammars, context free grammars)  Logic  Probabilistic models  Vector space models
  • 11.  Models in turn lend themselves to a small number of algorithms.  State space search Dynamic Programming  Machine learning Algorithms Classifiers Expectation-Maximization
  • 12.  Truly intelligent machines-Ability of computer to process language skilfully as humans do.  First and major work by-Alan Turing  Turing Test-game in which computer’s use of language would form the basis for determining if the machine could think. If the machine could win the game, it would be judged intelligent.
  • 13.  Three participants are there in turing’s game.  Two people and a computer.  One of the people is contestant and plays the role of interrogator.  To win, interrogator have to determine which of the other participant is the machine by asking a series of questions.  Turing System helped a lot in the invention of ELIZA(natural language processing system capable of carrying on a limited form of conversation with a user.)
  • 14.  ELIZA is a simple program that uses pattern matching to process the input and translate it into suitable outputs.  Many people interacted with ELIZA believed that it can understood them and their problems. Even after the program operation have explained to them, still the people were continue their belief in ELIZA.  These facts lead to the design of conversational agents.
  • 15.  Regular expression is one way of describing finite automata.  FSA Finite-state automata are the theoretical foundation of a good deal of the computational work.  Any regular expression can be implemented as a Finite state automaton.  Symmetrically, any finite-state automaton can be described with a regular expression.
  • 17.  We can represent the automaton as a directed graph: a finite set of vertices (also called nodes), together with a set of directed links between pairs of vertices called arcs. We’ll represent vertices with circles and arcs with arrows.  The automaton has five STATES, which are represented by nodes in the graph. State 0 is the start state. State 4 is the final state or accepting state, which we represent by the double circle. It also has four transitions, which we represent by arcs in the graph.
  • 18.  the sheep language can be defined as any string from the following (infinite) set: baa! baaa! baaaa! baaaaa! baaaaaa! ------.
  • 20.  FSA can be used for recognizing (we also say accepting) strings in the following way. First, think of the input as being written on a long tape broken up into cells, with one symbol written in each cell of the tape, as the following figure shows.
  • 22.  The machine starts in the start state (q0), and iterates the following process:  Check the next letter of the input. If it matches the symbol on an arc leaving the current state, then cross that arc, move to the next state, and also advance one symbol in the input. If we are in the accepting state (q4) when we run out of input, the machine has successfully recognized an instance of sheeptalk.
  • 23.  If the machine never gets to the final state, either because it runs out of input, or it gets some input that doesn’t match an arc, or if it just happens to get stuck in some non-final state, we say the machine rejects or fails REJECTS to accept an input.
  • 24.  We can represent Automaton with a state transition table.  the state-transition table represents the start state, the accepting states, and what transitions leave each state with which symbols.
  • 26.  marked state 4 with a colon to indicate that it’s a final state (you can have as many final states as you want), and the /0 indicates an illegal or missing transition.  We can read the first row as “if we’re in state 0 and we see the input b we must go to state 1. If we’re in state 0 and we see the input a or !, we fail”.
  • 28.  Algorithm for recognizing a string using a state-transition table. The algorithm is called D-RECOGNIZE for “deterministic recognizer”.  A deterministic algorithm is one that has no choice points; the algorithm always knows what to do for any input.
  • 31.  The algorithm will fail whenever there is no legal transition for a given combination of state and input.  We can think of “empty” elements in the table as if they all pointed at one “empty” state, which we might call the fail state or sink state.
  • 33.  A model which can both generate and recognize all and only the strings of a formal language acts as a definition of the formal language.  A formal language is a set of strings, each string composed of symbols from a ALPHABET finite symbol-set called an alphabet.
  • 34. So the formal language defined by our sheeptalk automaton m in Fig. 2.10 (and Fig. 2.12) is the infinite set: (2.1) L(m) = {baa!,baaa!,baaaa!,baaaaa!,baaaaaa!, . . .}
  • 35.  A formal language is a set of strings, each string composed of symbols from a finite symbol-set called an alphabet.  we can use L(m) to mean “the formal language characterized by m”. Example: The alphabet for the sheep language is the set S = {a,b, !}. So the formal language defined by our sheep talk automaton m is the infinite set: L(m) = {baa!,baaa!,baaaa!,baaaaa!,baaaaaa!, . . .}
  • 36.  Formal languages are not the same as natural languages, which are the kind of languages that real people speak.  a formal language may bear no resemblance at all to a real language.  But we often use a formal language to model part of a natural language.
  • 37.  In Deterministic FSAs each of its transactions is uniquely determined by its source state and input symbol.  But in NFSA, for some state and input symbol, the next state may be nothing or one or two or more possible state.
  • 39.  Here, we get in state2,if we seen an a, we don’t know whether to remain in state2 or go on to state 3.Automata with decision points like these are called NFSA.  NFSA have been generalized in multiple ways: ◦ NFSA with Epsilon moves ◦ Finite State Transducers ◦ Push Down Automata ◦ Probabilistic Automata
  • 40.  Since there is more than one choice point in NFAs may lead to a wrong choice. There are 3 solutions to the problem of this non determinism: ◦ Backup: Whenever we come to a choice point, we could put a marker to mark where we were in the input, and what state the automaton was in. Then if it turns out that we took the wrong choice, we could back up and try another path. ◦ Look-ahead: We could look ahead in the input to help us decide which path to take. ◦ Parallelism: Whenever we come to a choice point, we could look at every alternative path in parallel.
  • 41.  ND-RECOGNIZE accomplishes the task of recognizing strings in a regular language by providing a way to systematically explore all the possible paths through a machine.  If this exploration yields a path ending in an accept state, it accepts the string, otherwise it rejects it.
  • 42.  Algorithms which operates by systematically searching for solutions are known as state space search algorithms.  Goal is to explore the space of possible solutions, return answer when one is found, or rejects the input when the space has been exhaustively explored.  Effectiveness depends on the order in which the state in the space are considered.
  • 43.  Depth First Search or Last In First Out  Breadth First Search or First In First Out
  • 44.  Consider an ordering strategy where the states that are considered next are the most recently created ones.  Such a policy can be implemented by placing newly created states at the front of the agenda and having NEXT return the state at the front of the agenda when called.  Thus the agenda is implemented by a stack.
  • 46.  The second way to order the states in the search space is to consider states in the order in which they are created.  Such a policy can be implemented by placing newly created states at the back of the agenda and still have NEXT return front of the agenda.  Thus the agenda is implemented via a queue. This is commonly referred to as a breadth-first search or First In First Out (FIFO) strategy.
  • 48.  Study of the way words are built up from smaller meaning-bearing units morphemes.  Example: cats - consisting of two morpheme cat and –s. fox -consists of only one morpheme fox.  Two broad classes of morphemes  Stems-main morpheme of the word, which supplies the main meaning.  Affixes- add additional meanings of various kinds.
  • 49.  Prefixes-precede the stem.  Eg:unhappy –composed of stem happy and the prefix un-  Suffixes- follow the stem  Eg: eats-composed of stem eat and suffix –s.  Infixes-inserted inside the stem  Eg: fanbloomingtastic-word blooming in middle of fantastic.  Circumfixes- do both including precede or follow.  English and Malayalam doesn’t really have circumfixes, but many other languages like German do.  Eg:segan(say)-gesagt(said)
  • 50.  A word can have more than one affix. ◦ Eg: rewrites : stem - write prefix - re- suffix - –s. ◦ Eg:unbelievably: stem - believe Prefix - un- Suffix - -able, -ly
  • 51.  English doesn’t support more than four or five affixes.  But some languages like Turkish can support words with nine or ten affixes, called agglutinative languages.
  • 52.  Inflection-combination of a word stem with a grammatical morpheme resulting in a word of the same class as the original stem.  Eg:-s for marking the plural on nouns and –ed for marking the past tense on verbs.  Derivation-combination of a word stem with a grammatical morpheme resulting in a word of the a different class as the original stem.  Eg:computerize by adding derivational suffix –ation can make computerization.
  • 53.  Compounding-combination of multiple word stems together.  Eg:doghouse-concatenation of morphemes dog and house.  Cliticization-combination of a word stem with clitic. Acts syntactically like a word,but is reduced in form and attached to anoth er word. Example:I ‘ve
  • 54.  English has relatively simple inflectional system.  Only nouns, verbs and some adjectives can be inflected.  Nouns can have two kinds of inflection-an affix that marks plural and an affix that marks possessive.
  • 55.  English nouns may appear in two forms: Regular Nouns Plural is formed with –s after most nouns Eg:catcats But for words ending in –s,-z,- sh,-ch and sometimes –x plural is formed with –es. Eg: ibisibises waltzwaltzes thrushthrushes finchfinches boxboxes Irregular nouns Eg: Mousemice oxoxen
  • 56.  Possessive suffix is realized by apostrophe + -s for regular singular nouns(eg: llama’s) and plural nouns not ending in –s eg:children’s.  For regular plural nouns and for ends in –s, possessive suffix will be realized by lone apostrophe after nouns. Eg:Euripides’
  • 57.  English verbal inflection is more complicated than nominal inflection. First, English has three kinds of verbs; main verbs, (eat, sleep, impeach), modal verbs (can, will, should), and primary verbs (be, have, do) .  we will mostly be concerned with the main and primary verbs, because it is these that have inflectional endings.  Of these verbs a large class REGULAR are regular, that is to say all verbs of this class have the same endings marking the same functions. These regular verbs (e.g. walk, or inspect) have four morphological forms, as follow:
  • 59.  These verbs are called regular because just by knowing the stem we can predict the other forms by adding one of three predictable endings and making some regular spelling changes.  Irregular verbs are those that have some more or less idiosyncratic forms of inflection. Irregular verbs in English often have five different forms, but can have as many as eight (e.g., the verb be) or as few as three (e.g. cut or hit).
  • 60.  Note that PRETERITE an irregular verb can inflect in the past form (also called the preterite) by changing its vowel (eat/ate), or its vowel and some consonants (catch/caught), or with no change at all (cut/cut).
  • 62.  Derivation in English is quite complex.  Common kind of derivation in English is the formation of new nouns from verbs or adjectives.  The process is known as nominalization.  Verbs ending in suffix –ize can be converted to nouns by adding the suffix –ation. Eg: computerizecomputerization. Killkiller Fuzzyfuzziness
  • 63.  Adjectives can also be derived from verbs and nouns.  Computation(Noun) + -al computational  Embrace(verb) + -ableembraceable  Clue(Noun) + -lessclueless
  • 64.  Less productive.  Subtle and complex meaning differences among nominalising suffixes.
  • 65.  Clitic can be two types:  Proclitics:clitics preceding a word  Enclitics: clitics following a word Full form Clitic Full form clitic am ‘m have ‘ve are ‘re has ‘s is ‘s had ‘d will ‘ll would ‘d
  • 66.  Clitics in English are ambiguous. eg: she ‘s  it can be she is or she has.  Except for few such ambiguities, correctly segmenting clitics in English is simplified by the presence of an apostrophe.  Clitics can be harder to parse in other languages.
  • 67.  Morphological parsing results stem+assorted morphological features. Example: Cats cat+N+PL Cat cat+N+SG Caught catch+V+Past
  • 68.  Lexicon: the list of stems and affixes,together with basic information about them(like whether the stem is verb or noun).  Morphotactics: model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word.  Orthographic rules: spelling rules used to model the changes that occur in a word, usually when two morphemes combine.  Example:city+-scities(y ie)
  • 69.  A lexicon is a repository for words.  The simplest possible lexicon would consist of an explicit list of every word of the language.
  • 70.  In finite state morphology a word can be represented as a correspondence between two levels. Lexical Level Surface Level represents a concatenation of morphemes making up a word. represents the concatenation of letters which make up the actual spelling of the word.
  • 73.  Irregular plurals like geese will parse into the correct stem goose +N +Pl.  We do this by allowing the lexicon to also have two levels. Since surface geese maps to lexical goose, the new lexical entry will be “g:g o:e o:e s:s e:e”.  Regular forms are simpler; the two-level entry for fox will now be “f:f o:o x:x”, but by relying on the orthographic convention that f stands for f:f and so on, we can simply refer to it as fox and the form for geese as “g o:e o:e s e”.
  • 75.  Since the output symbols include the morpheme and word boundary markers ˆ and #, the lower labels of above Figure do not correspond exactly to the surface level. Hence we refer to tapes with these morpheme boundary markers as intermediate tapes.
  • 77.  English often requires spelling changes at morpheme boundaries by introducing spelling rules (or orthographic rules).  the ability to implement rules as a transducer turns out to be useful throughout speech and language processing.
  • 80.  FST Lexicons can be combined with orthographic rules for parsing and generating.  Lexical transducer maps between the lexical level and an intermediate level.  A host of transducers each representing a single spelling rule constraint, all run in parallel to map between this intermediate level to surface level.
  • 82.  The above architecture is a two level cascade of transducers.  Cascading two automata means running them in series with the output of the first feeding the input to the second.  Cascades can be of arbitrary depth.  Each level might be built out of many transducers.  Cascade can be run top-down to generate a string.  Cascade can be run bottom up to parse it.
  • 83.  Parsing may be slightly complicated than generation.  Ambiguity may arise in parsing.  Example:Foxes can also be a verb rather than noun. So, after parsing two possibilities: Fox+N+PL Fox+V+3Sg  Transducer will enumerate the possible choices and transduces both.
  • 84.  Local ambiguity may also arise during parsing.  To handle this type of non determinism ,FST parsing algorithms need to incorporate some sort of search algorithm.
  • 85.  Running a cascade can be made more efficient by composing and intersecting the transducers.  Transducers in general are not not closed under intersection.  Transducers between strings of equal length are closed under intersection.
  • 86.  The intersection of two transducers F and G defines a relation R such that R(x,y) if and only if F(x,y) and G(x,y).  intersection algorithm just takes the INTERSECTION Cartesian product of the states, i.e., for each state qi in machine 1 and state qj in machine 2, we create a new state qi j . Then for any input symbol a, if machine 1 would transition to state qn and machine 2 would transition to state qm, we transition to state qnm.
  • 88.  One of the most widely used such stemming algorithms is the simple and efficient Porter (1980) algorithm, which is based on a series of simple cascaded rewrite rules.  Since cascaded rewrite rules are just the sort of thing that could be easily implemented as an FST, we think of the Porter algorithm as a lexicon-free FST stemmer.
  • 89.  Example Rules:  ATIONAL→ ATE (e.g., relational→ relate)  ING→ € if stem contains vowel (e.g., motoring→ motor)  SSES-> SS(eg: grasses grass)  stemming tends to somewhat improve the performance of information retrieval, especially with smaller documents (the larger the document, the higher the chance the keyword will occur in the exact form used in the query).
  • 90.  Nonetheless, not all IR engines use stemming, partly because of stemmer errors such as shown in the following.
  • 92.  psycholinguistic studies on how multi- morphemic words are represented in the minds of speakers of English.  Example: walk walked,walks All the three words are listed in the human lexicon? or walk along with –ed and –s?  Two hypothesis are proposed by scientists based on human’s lexicon representation in mind.
  • 93.  Full Listing Hypothesis: all words of a language are listed in the mental lexicon without any internal morphological structure.  Minimum redundancy Hypothesis: suggests that only the constituent morphemes are represented in the lexicon, and when processing walks, we must access both morphemes (walk and -s) and combine them.
  • 94.  Some of the earliest evidence that the human lexicon represents at least some morphological structure comes from speech errors, also called slips of the tongue.  inflectional and derivational affixes can appear separately from their stems. The ability of these affixes to be produced separately from their stem suggests that the mental lexicon contains some representation of morphological structure.  Example: ◦ it’s not only us who have screw looses (for “screws loose”) ◦ words of rule formation (for “rules of word formation”) ◦ easy enoughly (for “easily enough”)
  • 95.  More recent experimental evidence suggests that neither the full listing nor the minimum redundancy hypothesis may be completely true.  Instead, it is possible that some, but not all, morphological relationships are mentally represented.  This was studied using repetition priming experiments.  Findings: a word is recognized faster if it has been seen before (if it is primed); spoken derived words can prime their stems, but only if the meaning of the derived form is closely related to the stem.  Example: government primes govern, but department does not prime depart.
  • 96.  Early results suggest that (at least) productive morphology like inflection does play an online role in the human lexicon.  Studies have also shown that words with a larger morphological family size are recognized faster.  A word's morphological family size is the number of other multi-morphemic words and compounds in which it appears.  Example: the family for fear includes fearful, fearfully, fearfulness, fearless, fearlessly, fearlessness, fearsome, and god-fearing, for a total size of 9.
  • 97.  Studies have shown that words with a larger morphological family size are recognized faster.  Recent work has further shown that word recognition speed is affected by the total amount of information (or entropy) contained by the morphological paradigm.  Entropy is a measure of information; it can be used as a metric for how much information there is in a particular grammar, for how well a given grammar matches a given language, etc.
  • 99. WORD CLASSES & PART OF SPEECH TAGGING (Module II)
  • 100.  Parts of speech can be divided into two broad categories: ◦ Open classes: those that do not have fixed membership ◦ Closed classes: those that have relatively fixed membership
  • 101.  prepositions are a closed class because there is a fixed set of them in English; new prepositions are rarely coined.  nouns and verbs are open classes because new nouns and verbs are continually coined or borrowed from other languages.
  • 102.  Closed class words are also generally function words like of, it, and, or you, which tend to be very short, occur frequently, and often have structuring uses in grammar.  There are four major open classes that occur in the languages of the world; nouns, verbs, adjectives, and adverbs. It turns out that English has all four of these, although not every language does.
  • 103.  Nouns are divided into two types:  Proper nouns (names of specific people or entities).  In written English, proper nouns are capitalized.  Example: John, IBM  Common nouns, which are further divided into count nouns and mass nouns.
  • 104.  Count nouns are those that allow grammatical enumeration; that is, they can occur in both the singular and plural (goat/goats, relationship/relationships) and they can be counted (one goat, two goats).  Mass nouns are used when something is conceptualized as a homogeneous group. So words like snow, salt, and communism are not counted.
  • 105.  The verb class includes most of the words referring to actions and processes, including main verbs like draw, provide, differ, and go.  Verbs have a number of morphological forms, including the progressive (e.g., eating) and the past participle (eaten).
  • 106.  Semantically, the adjective class includes many terms that describe properties or qualities.  Most languages have adjectives for the concepts of color (white, black), age (old, young), and value (good, bad), but there are languages without adjectives, e.g., Korean.
  • 107.  Adverbs are rather a hodge-podge (heterogeneous mixture), both semantically and formally.  An adverb is a word that describes (modifies, as grammarians put it) a verb, an adjective, or another adverb; it typically tells us more about the action.  Example: He quickly runs.  She slowly walks.
  • 108.  Directional adverbs or locative adverbs (home, here, downhill) specify the direction or location of some action;  degree adverbs (extremely, very, somewhat) specify the extent of some action, process, or property.  manner adverbs (slowly, delicately) describe the manner of some action or process  temporal adverb describe the time that some action or event took place (yesterday, Monday).
  • 109.  Closed classes differ more from language to language than open classes do.  Important closed classes in English, with a few examples of each: • prepositions: on, under, over, near, by, at, from, to, with • determiners: a, an, the • pronouns: she, who, I, others • conjunctions: and, but, or, as, if, when • auxiliary verbs: can, may, should, are • particles: up, down, on, off, in, out, at, by • numerals: one, two, three, first, second, third
  • 110.  There are various tagsets for English, many of which evolved from the 87-tag tagset used for the Brown corpus.  The Brown corpus is a 1 million word collection of samples from 500 written texts from different genres (newspaper, novels, non-fiction, academic, etc.)  This corpus was tagged with parts-of-speech by first applying the TAGGIT program and then hand-correcting the tags.
  • 111.  The 87-tag Brown tagset  The small 45-tag Penn Treebank tagset  The medium-sized 61-tag C5 tagset
  • 112.  The Penn Treebank tagset is one of the most widely used.  It has been used to tag the Brown corpus, the Wall Street Journal corpus, the Switchboard corpus, and many others.  The following slides show the complete Penn Treebank tagset.
  • 115.  Tag Indeterminacy and tokenization  Unknown Words  Part of Speech Tagging for other Languages  Combining taggers
  • 116.  Tag indeterminacy arises when a word is ambiguous between multiple tags and it is impossible or very difficult to disambiguate.  Some taggers allow the use of multiple tags.  Example: Penn Treebank, BNC  Common tag indeterminacies include adjective versus preterite versus past participle (JJ/VBD/VBN).
  • 117.  There are 3 ways to deal with tag indeterminacy:  Somehow replace the indeterminate tags with only one tag.  In testing, count a tagger as having correctly tagged an indeterminate token if it gives either of the correct tags. In training, somehow choose only one of the tags for the word.  Treat the indeterminate tag as a single complex tag.
  • 118.  The second approach is the most sensible.  The third approach is also widely used, but it requires more tags and so increases the size of tagsets such as those of the Penn Treebank and the BNC.
  • 119.  One issue is distinguishing the period (.) used as a sentence-final marker from word-internal periods (e.g., etc., B.Tech).  A second issue is word splitting (e.g., children's, wouldn't): the special Treebank tag POS is used only for the morpheme 's, which must be segmented off during tokenization.
  • 120.  Another tokenization issue concerns multi-part words.  The Treebank tagset assumes that tokenization of words like New York is done at whitespace.  The phrase a New York City firm is tagged in Treebank notation as five separate words: a/DT New/NNP York/NNP City/NNP firm/NN.  The C5 tagset, by contrast, allows multi-word prepositions like in terms of to be treated as a single word by adding numbers to each tag, as in in/II31 terms/II32 of/II33.
  • 121.  All the tagging algorithms we have discussed require a dictionary that lists the possible parts-of-speech of every word.  But even the largest dictionary will not contain every possible word.  Proper names and acronyms are created very often, and even new common nouns and verbs enter the language at a surprising rate.  Therefore, to build a complete tagger we cannot always rely on a dictionary to give us P(wi|ti); we need some method for guessing the tag of an unknown word.
  • 122.  The simplest possible unknown-word algorithm is to pretend that each unknown word is ambiguous among all possible tags, with equal probability.  A slightly more complex algorithm is based on the idea that the probability distribution of tags over unknown words is very similar to the distribution of tags over words that occurred only once in the training set.  These words that only occur once are known as hapax legomena (singular hapax legomenon).
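A minimal sketch of the hapax-legomenon idea: the tag distribution over words seen exactly once in a (tiny, made-up) tagged training set is used as an estimate of the tag distribution over unknown words.

from collections import Counter

tagged = [("the", "DT"), ("flights", "NNS"), ("leave", "VB"),
          ("gates", "NNS"), ("early", "RB"), ("the", "DT")]

word_counts = Counter(w for w, _ in tagged)
hapax_tags = Counter(t for w, t in tagged if word_counts[w] == 1)
total = sum(hapax_tags.values())
p_tag_given_unknown = {t: c / total for t, c in hapax_tags.items()}
print(p_tag_given_unknown)    # {'NNS': 0.5, 'VB': 0.25, 'RB': 0.25}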
  • 123.  Most unknown word algorithms, however, make use of a much more powerful source of information: the morphology of the words.  For example, words that end in -s are likely to be plural nouns (NNS), words ending with -ed tend to be past participles (VBN), words ending with -able tend to be adjectives (JJ), and so on.  Even if we have never seen a word, we can use facts about its morphological form to guess its part-of-speech.
  • 124.  Besides morphological knowledge, orthographic information can be very helpful.  For example words starting with capital letters are likely to be proper nouns (NP).  The presence of a hyphen is also a useful feature; hyphenated words in the Treebank version of Brown are most likely to be adjectives (JJ).
  • 125.  Part of speech tagging algorithms can also be applied to other languages without major modifications.  But a number of augmentations and changes become necessary when dealing with highly inflected or agglutinative languages.  One problem with these languages is simply the large number of word forms, when compared to English.
  • 126.  Agglutinative languages like Malayalam are those in which words contain long strings of morphemes, where each morpheme has relatively few surface forms, so it is often possible to clearly see the morphemes in the surface text.  For these types of languages, a corpus may contain a large number of word forms derived from one particular root.
  • 127.  The large vocabulary size seems to cause a significant degradation in tagging performance when the HMM algorithm is applied directly to agglutinative languages.  one difficulty in tagging highly inflected and agglutinative languages is tagging of unknown words.
  • 128.  A second issue with such languages is the vast amount of information that is coded in the morphology of the word.  For this reason, tagsets for agglutinative and highly inflectional languages are usually much larger than the 50-100 tags we have seen for English.
  • 129.  Various part of speech tagging algorithms can be combined.  The most common approach to tagger combination is to run multiple taggers in parallel on the same sentence, and then combine their output, either by voting or by training another classifier to choose which tagger to trust in a given context.
  • 130.  Another option is to combine taggers in series.  This uses a rule-based approach to remove some of the impossible tag possibilities for each word, and then an HMM tagger chooses the best sequence from the remaining tags.
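A related, readily runnable form of serial combination is NLTK's backoff chaining, in which each tagger handles the words it can and passes the rest to the next tagger in the chain. This is backoff chaining rather than the rule-filter-plus-HMM scheme described above, and it assumes the Brown corpus has been downloaded (nltk.download('brown')).

import nltk
from nltk.corpus import brown

train = brown.tagged_sents(categories="news")[:3000]
t0 = nltk.DefaultTagger("NN")                  # last resort: call everything a noun
t1 = nltk.UnigramTagger(train, backoff=t0)     # most frequent tag for each word
t2 = nltk.BigramTagger(train, backoff=t1)      # tag in the context of the previous tag
print(t2.tag("the flight leaves at noon".split()))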
  • 132.  The fundamental idea of constituency is that groups of words may behave as a single unit or phrase, called a constituent.  For example we will see that a group of words called a noun phrase often acts as a unit; noun phrases include single words like she or Michael and phrases like the house, Russian Hill, and a well-weathered three-story structure.
  • 133.  Noun phrases can occur before verbs.  Other kinds of evidence for constituency come from what are called preposed or postposed constructions. For example, the prepositional phrase on September seventeenth can be placed in a number of different locations in the following examples, including preposed at the beginning, and postposed at the end.
  • 134.  On September seventeenth, I’d like to fly from Atlanta to Denver  I’d like to fly on September seventeenth from Atlanta to Denver  I’d like to fly from Atlanta to Denver on September seventeenth
  • 135.  But again, while the entire phrase can be placed differently, the individual words making up the phrase cannot be:  *On September, I’d like to fly seventeenth from Atlanta to Denver  *On I’d like to fly September seventeenth from Atlanta to Denver  *I’d like to fly on September from Atlanta to Denver seventeenth
  • 136.  The most commonly used mathematical system for modelling constituent structure in English and other natural languages is the Context-Free Grammar, or CFG.  Context-free grammars are also called Phrase-Structure Grammars, and the formalism is equivalent to what is also called Backus-Naur Form or BNF.
  • 137.  A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.  Example: NP → Det Nominal NP → ProperNoun Nominal → Noun | Nominal Noun
  • 138.  Context-free rules can be hierarchically embedded, so we can combine the previous rules with others like the following which express facts about the lexicon: Det → a Det → the Noun → flight
  • 139.  Two classes:  Terminals: The symbols that correspond to words in the language (“the”, “nightclub”) are called terminal symbols; the lexicon is the set of rules that introduce these terminal symbols.  Nonterminals: The symbols that express clusters or generalizations of these are called non-terminals.
  • 140.  In each context free rule, the item to the right of the arrow (→) is an ordered list of one or more terminals and non-terminals, while to the left of the arrow is a single non-terminal symbol expressing some cluster or generalization.  A CFG can be thought of in two ways: as a device for generating sentences, and as a device for assigning a structure to a given sentence.
  • 141.  A sequence of rule expansions is called a derivation of the string of words.  Example grammar: E → E+T | T,  T → T*F | F,  F → id  For the string id+id*id, a rightmost derivation is:  E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id  (reading the ids as a, b, c gives the string a+b*c).  It is common to represent a derivation by a parse tree.
  • 142.  a flight  Starting from NP: NP ⇒ Det Nominal ⇒ Det Noun ⇒ a flight  The parse tree can be represented as follows:
  • 144.  S → NP VP  VP → Verb NP  VP → Verb NP PP  VP → Verb PP  PP → Preposition NP
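A toy version of these rules can be tried out in NLTK; the extra lexical entries (the Pronoun rule and the additional nouns and verbs) are made up here purely so that the example sentence parses.

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nominal | ProperNoun | Pronoun
Nominal -> Noun | Nominal Noun
VP -> Verb NP | Verb NP PP | Verb PP
PP -> Preposition NP
Det -> 'a' | 'the'
Noun -> 'flight' | 'morning'
ProperNoun -> 'Houston'
Pronoun -> 'I'
Verb -> 'prefer' | 'book'
Preposition -> 'to'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I prefer a morning flight".split()):
    tree.pretty_print()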
  • 148. Parsing with CFG  Syntactic parsing ◦ The task of recognizing a sentence and assigning a syntactic structure to it  Since CFGs are a declarative formalism, they do not specify how the parse tree for a given sentence should be computed.  Parse trees are useful in applications such as ◦ Grammar checking ◦ Semantic analysis ◦ Machine translation ◦ Question answering ◦ Information extraction
  • 149. Parsing with CFG  The parser can be viewed as searching through the space of all possible parse trees to find the correct parse tree for the sentence.  How can we use the grammar to produce the parse tree?
  • 150. Parsing with CFG  Top-down parsing
  • 151. Parsing with CFG  Bottom-up parsing
  • 152. Parsing with CFG  Comparisons ◦ The top-down strategy never wastes time exploring trees that cannot result in an S. ◦ The bottom-up strategy, by contrast, generates with wild abandon trees that have no hope of leading to an S or of fitting in with any of their neighbors.  The left branch of Fig. 10.4 is completely wasted effort. ◦ The top-down strategy, however, spends considerable effort on S trees that are not consistent with the input.  The first four of the six trees in Fig. 10.3 cannot match the word book.
  • 153. Parsing with CFG  Use depth-first strategy
  • 155. Parsing with CFG  A top-down, depth-first, left-to-right derivation
  • 157. Parsing with CFG  Adding bottom-up filtering  Left-corner notion ◦ For nonterminals A and B, B is a left-corner of A if the following relation holds: A ⇒* B α  Using the left-corner notion, it is easy to see that only the S → Aux NP VP rule is a viable candidate, since the word Does cannot serve as the left-corner of the other two S rules (S → NP VP and S → VP).
  • 158. Parsing with CFG  Left-corner table:
  Category    Left Corners
  S           Det, Proper-Noun, Aux, Verb
  NP          Det, Proper-Noun
  Nominal     Noun
  VP          Verb
  • 159. Parsing with CFG  Problems with the top-down parser ◦ Left-recursion ◦ Ambiguity ◦ Inefficient reparsing of subtrees  These problems motivate introducing the Earley algorithm.
  • 160. Parsing with CFG  Exploring an infinite search space, when left-recursive grammars are used  A grammar is left-recursive if it contains at least one non-terminal A, such that A ⇒* αAβ, for some α and β, where α ⇒* ε.  Left-recursive rules:  NP → Det Nominal  Det → NP 's  NP → NP PP  VP → VP PP  S → S and S
  • 161. Parsing with CFG  Two reasonable methods for dealing with left-recursion in a backtracking top-down parser: ◦ Rewriting the grammar ◦ Explicitly managing the depth of the search during parsing  Rewrite each left-recursive rule A → Aβ | α as:  A → αA'  A' → βA' | ε
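The rewrite A → Aβ | α becomes A → αA', A' → βA' | ε is mechanical enough to sketch in a few lines of Python. This simplified version handles only immediate left recursion; the list-of-symbols rule format and the primed-nonterminal naming are assumptions of the sketch.

def remove_immediate_left_recursion(nt, productions):
    """productions: list of right-hand sides (each a list of symbols) for nt."""
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]   # the betas
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]       # the alphas
    if not recursive:
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt: [rhs + [new_nt] for rhs in others],                # A  -> alpha A'
        new_nt: [rhs + [new_nt] for rhs in recursive] + [[]],  # A' -> beta A' | epsilon
    }

# NP -> NP PP | Det Nominal   becomes   NP -> Det Nominal NP',  NP' -> PP NP' | []
print(remove_immediate_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"]]))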
  • 162. Parsing with CFG  Common structural ambiguity ◦ Attachment ambiguity ◦ Coordination ambiguity ◦ NP bracketing ambiguity
  • 163. Parsing with CFG  Example of PP attachment
  • 164. Parsing with CFG  The gerundive-VP flying to Paris can be ◦ part of a gerundive sentence, or ◦ an adjunct modifying the VP  We saw the Eiffel Tower flying to Paris.
  • 165. Parsing with CFG  The sentence “Can you book TWA flights” is ambiguous ◦ “Can you book flights on behalf of TWA” ◦ “Can you book flights run by TWA”
  • 166. Parsing with CFG  Coordination ambiguity ◦ Different sets of phrases can be conjoined by a conjunction like and. ◦ For example, old men and women can be  [old [men and women]] or [old men] and [women]  Parsing a sentence thus requires disambiguation: ◦ Choosing the correct parse from a multitude of possible parses ◦ Requiring both statistical (Ch 12) and semantic knowledge (Ch 17)
  • 167. Parsing with CFG  Parsers which do not incorporate disambiguators may simply return all the possible parse trees for a given input.  We do not want all possible parses from the robust, highly ambiguous, wide-coverage grammars used in practical applications.  Reason: ◦ Potentially exponential number of parses that are possible for certain inputs ◦ Given the ATIS example:  Show me the meal on Flight UA 386 from San Francisco to Denver. ◦ The three PP’s at the end of this sentence yield a total of 14 parse trees for this sentence.
  • 168. Parsing with CFG  The parser often builds valid parse trees for portions of the input, then discards them during backtracking, only to find that it has to rebuild them again.  Constituents rebuilt while parsing "a flight from Indianapolis to Houston on TWA": a flight (4 times), from Indianapolis (3), to Houston (2), on TWA (1), a flight from Indianapolis (3), a flight from Indianapolis to Houston (2), a flight from Indianapolis to Houston on TWA (1).
  • 169. Parsing with CFG  Solving three kinds of problems afflicting standard bottom-up or top-down parsers  Dynamic programming provides a framework for solving these problems: ◦ Systematically fill in tables of solutions to sub-problems. ◦ When complete, the tables contain solutions to all sub-problems needed to solve the problem as a whole. ◦ This reduces an exponential-time problem to a polynomial-time one by eliminating the repetitive solution of sub-problems inherent in backtracking approaches. ◦ O(N^3), where N is the number of words in the input.
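The table-filling idea can be illustrated with a CKY-style recognizer for a grammar in Chomsky normal form. This is a sketch of the dynamic-programming principle only, not of the Earley algorithm itself (which handles arbitrary CFGs), and the toy grammar and lexicon are made up.

from collections import defaultdict

binary = {("NP", "VP"): {"S"}, ("Det", "Noun"): {"NP"}, ("Verb", "NP"): {"VP"}}
lexicon = {"I": {"NP"}, "book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def cky_recognize(words):
    n = len(words)
    table = defaultdict(set)            # table[i, j] = non-terminals spanning words[i:j]
    for i, w in enumerate(words):
        table[i, i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):   # try every split point
                for b in table[i, k]:
                    for c in table[k, j]:
                        table[i, j] |= binary.get((b, c), set())
    return "S" in table[0, n]

print(cky_recognize("I book that flight".split()))    # True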
  • 170. Parsing with CFG  (10.7) Book that flight  S → VP, [0,0]  NP → Det Nominal, [1,2]  VP → V NP, [0,3]
  • 173. Parsing with CFG  Sequence of states created in the chart while parsing "Book that flight", including structural information
  • 174. Parsing with CFG  Partial parsing or shallow parsing ◦ Some language processing tasks do not require complete parses. ◦ E.g., information extraction algorithms generally do not extract all the possible information in a text; they simply extract enough to fill out some sort of template of required data.  Many partial parsing systems use a cascade of finite-state automata instead of CFGs. ◦ Use FSAs to recognize basic phrases, such as noun groups, verb groups, locations, etc. ◦ FASTUS of SRI.  Example analysis of "Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan":
  Company Name: Bridgestone Sports Co.  Verb Group: said  Noun Group: Friday  Noun Group: it  Verb Group: had set up  Noun Group: a joint venture  Preposition: in  Location: Taiwan  Preposition: with  Noun Group: a local concern  Conjunction: and  Noun Group: a Japanese trading house  Verb Group: to produce  Noun Group: golf clubs  Verb Group: to be shipped  Preposition: to  Location: Japan
  • 175. Parsing with CFG  Detection of noun groups  NG → Pronoun | Time-NP | Date-NP  (she, him, them, yesterday)  NG → (DETP) (Adjs) HdNns | DETP Ving HdNns  (the quick and dirty solution, the frustrating mathematics problem, the rising index)  DETP → DETP-CP | DETP-INCP  DETP-CP → …  DETP-INCP → …  Adjs → AdjP …  AdjP → …  HdNns → HdNn …  HdNn → PropN | PreNs …  PreNs → PreN …  PreN → …
  • 179.  Feature Structures and Unification  Unification-Based Grammars  Chart Parsing with Unification-Based Grammars  Type Hierarchies
  • 180.  We had a problem adding agreement to CFGs. What we needed were features, e.g., a way to say: ◦ [number sg person 3 ]  A structure like this allows us to state properties, e.g., about a noun phrase ◦ [cat NP number sg person 3 ]  Each feature (e.g., ‘number’) is paired with a value (e.g., ‘sg’) ◦ A bundle of feature-value pairs can be put into an attribute-value matrix (AVM)
  • 181.  Values can be atomic (e.g. ‘sg’ or ‘NP’ or ‘3’), or can be complex, and thus we can define feature paths [cat NP agreement [number sg person 3]]  The value of the path [agreement number] is ‘sg’  A grammar with only atomic feature values can be converted to a CFG. ◦ e.g., the AVM on the previous page corresponds to NP3,sg ◦ However, when the values are complex, it is more expressive than a CFG  can represent more linguistic phenomena
  • 182.  Feature structures embedded in feature structures can share the same values  That is, two features have the exact same value— they share precisely the same object as their value ◦ we’ll indicate this with a tag like *1 [cat S head [agr *1[num sg per 3] subj [agr *1 ]]]  In this example, the agreement features of both the matrix sentence and the embedded subject are identical  This is referred to as reentrancy
  • 183.  Technically, feature structures are directed acyclic graphs (DAGs)  So, the feature structure represented by the attribute-value matrix (AVM): [cat NP agreement [number sg person 3]]  is really a graph in which a CAT arc points to the value np and an AGR arc points to a node whose NUM and PER arcs point to sg and 3.
  • 184.  Unification (U) = a basic operation to merge two feature structures into a resultant feature structure (FS)  The two feature structures must be compatible, i.e., have no values that conflict  Identical FSs: ◦ [number sg] U [number sg] = [number sg]  Conflicting FSs: ◦ [number sg] U [number pl] = Fail  Merging with an unspecified FS: ◦ [number sg] U [number []] = [number sg]
  • 185.  Merging FSs with different features specified: ◦ [number sg] U [person 3] = [number sg person 3]  More examples: ◦ [cat NP] U [agreement [number sg]] = [cat NP agreement [number sg]] ◦ [agr [num sg] subj [agr [num sg]]] U [subj [agr [num sg]]]= [agr [num sg] subj [agr [num sg]]]
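A minimal, non-destructive sketch of these unification examples, treating attribute-value matrices as plain Python dicts; reentrancy (structure sharing, covered next) is not modelled here.

def unify(f1, f2):
    if f1 == f2:
        return f1
    if isinstance(f1, dict) and not f1:
        return f2                            # [] (unspecified) unifies with anything
    if isinstance(f2, dict) and not f2:
        return f1
    if isinstance(f1, dict) and isinstance(f2, dict):
        result = dict(f1)
        for feat, val in f2.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None              # conflict somewhere below
                result[feat] = sub
            else:
                result[feat] = val
        return result
    return None                              # conflicting atomic values

print(unify({"number": "sg"}, {"person": "3"}))   # {'number': 'sg', 'person': '3'}
print(unify({"number": "sg"}, {"number": {}}))    # {'number': 'sg'}
print(unify({"number": "sg"}, {"number": "pl"}))  # None (Fail)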
  • 186.  Remember that structure-sharing means they are the same object:  [agr *1 [num sg, per 3], subj [agr *1]] U [subj [agr [per 3, num sg]]] = [agr *1 [num sg, per 3], subj [agr *1]]  When unification takes place, shared values are copied over:  [agr *1, subj [agr *1]] U [subj [agr [per 3, num sg]]] = [agr *1, subj [agr *1 [per 3, num sg]]]
  • 187.  And remember that having similar values is not the same as structure-sharing:  [agr [num sg], subj [agr [num sg]]] U [subj [agr [per 3, num sg]]] = [agr [num sg], subj [agr [per 3, num sg]]]  With structure-sharing, you have to make sure the values are compatible everywhere that structure-sharing is specified:  [agr *1 [num sg, per 3], subj [agr *1]] U [agr [num sg, per 3], subj [agr [num pl, per 3]]] = Fail
  • 188.  We can see that a more general feature structure (fewer values specified) subsumes a more specific feature structure (1) [num sg] (2) [per 3] (3) [num sg per 3]  So, we have the following subsumption relations, where ◦ (1) subsumes (3) ◦ (2) subsumes (3) ◦ (1) does not subsume (2), and (2) does not subsume (1)
  • 189.  Syntactic constraints are difficult to express using context-free grammars alone.  Feature structures and unification offer a way to express these syntactic constraints elegantly.  The best way to do this is to integrate feature structures and unification operations into the specification of the grammar itself.
  • 190.  This can be accomplished by augmenting the rules of ordinary context-free grammars with attachments that specify feature structures for the constituents of the rules, along with appropriate unification operations that express constraints on those constituents.
  • 191.  to associate complex feature structures with both lexical items and instances of grammatical categories.  to guide the composition of feature structures for larger grammatical constituents based on the feature structures of their component parts.  to enforce compatibility constraints between specified parts of grammatical constructions.
  • 192.  β0 → β1 … βn  {set of constraints}  The specified constraints have one of the following forms:  βi feature path = Atomic value  βi feature path = βj feature path
  • 193.  The notation βi feature path denotes a feature path through the feature structure associated with the βi component of the context-free part of the rule.  The first style of constraint specifies that the value found at the end of the given path must unify with the specified atomic value. The second form specifies that the values found at the end of the two given paths must be unifiable.
  • 194.  For example, the rule S → NP VP can be augmented with an attachment of the feature structure for number agreement as follows: S → NP VP  <NP number> = <VP number>
  • 195.  If there are two or more constituents of the same syntactic category in a rule, we will subscript the constituents to keep them straight, as in VP → V NP1 NP2.  Note that in this approach the simple generative nature of context-free rules has been fundamentally changed by the augmentation.
  • 196.  Agreement  Grammatical Heads  Sub categorization  Long distance dependencies
  • 197.  Discussing how unification can be used to capture two types of English agreement phenomena:  Subject-Verb agreement  Determiner-Nominal agreement
  • 198.  Look at the following sentences:  Does this flight serve breakfast?  Do these flights serve breakfast?  In these questions, the subject NP must agree with the auxiliary verb rather than the main verb of the sentence.  This agreement constraint can be handled by the following rule: S → Aux NP VP  <Aux agreement> = <NP agreement>
  • 199.  Determiner-Nominal agreement is handled in a similar fashion.  The constraints can be enforced with grammar rules as follows: NP → Det Nominal  <Det Agreement> = <Nominal Agreement>  <NP Agreement> = <Nominal Agreement>
  • 200.  This rule states that the AGREEMENT feature of the Det must unify with the AGREEMENT feature of the Nominal, and moreover, that the AGREEMENT feature of the NP must also unify with the Nominal.  The simpler lexical constituents, Aux and Det, receive values for their respective agreement features directly from the lexicon as in the following rules.
  • 202.  Returning to the first S rule, let us first consider the AGREEMENT feature for the VP constituent.  The constituent structure for this VP is specified by the following rule: VP → Verb NP  It seems clear that the agreement constraint for this constituent must be based on its constituent verb.  This verb, as with the previous lexical entries, can acquire its agreement feature values directly from the lexicon, as in the following rules.
  • 205.  The features for most grammatical categories are copied from one of the children to the parent.  The child that provides the features is called the head of the phrase.  features copied are referred to as head features.
  • 206.  VP → Verb NP  <VP Agreement> = <Verb Agreement>  The verb is the head of the verb phrase.  The constituent providing the agreement feature structure to its parent is the head of the phrase.  So we can say that the agreement feature structure is a head feature.
  • 207.  We can rewrite our rules to reflect these generalizations by placing the agreement feature structure under a HEAD feature.  VP → Verb NP  <VP Agreement> = <Verb Agreement>  becomes  VP → Verb NP  <VP HEAD> = <Verb HEAD>
  • 208.  Traditional grammar distinguishes between transitive verbs (which take a direct object NP) and intransitive verbs (which do not).  Traditional grammars subcategorize verbs into these two categories.  Modern grammars distinguish as many as 100 subcategories.
  • 209.  Feature structures are introduced to distinguish among various members of the verb category.  Can accomplish this goal by associating with each of the verbs in the lexicon an atomic feature called SUBCAT, with an appropriate value.
  • 210.  The transitive version of serves could be assigned the following feature structure in the lexicon: Verb → serves  <Verb HEAD AGREEMENT NUMBER> = sg  <Verb HEAD SUBCAT> = trans  The SUBCAT feature signals to the rest of the grammar that this verb should only appear in verb phrases with a single noun phrase argument.
  • 213.  The model of sub categorization has two components.  Each head word has a SUBCAT feature which contains a list of the complements it expects.  Then phrasal rules like the VP rule match up each expected complement in the SUBCAT list with an actual constituent.  This mechanism works fine when the complements of a verb are in fact to be found in the verb phrase.
  • 214.  Sometimes, however, a constituent subcategorized for by the verb is not locally instantiated, but stands in a long-distance relationship with its predicate.  Example sentences:  What cities does Continental service?  What flights do you have from Boston to Baltimore?  What time does that flight leave Atlanta?
  • 215.  First example, the constituent what cities is subcategorized for by the verb service, but because the sentence is an example of a wh- non-subject question, the object is located at the front of the sentence.  phrase-structure rule for a wh-non-subject- question is something like the following  S → Wh-NP Aux NP VP
  • 216.  we can augment this phrase-structure rule to require the Aux and the NP to agree (since the NP is the subject).  But we also need some way to augment the rule to tell it that the Wh-NP should fill some sub categorization slot in the VP.  The representation of such long-distance dependencies is a quite difficult problem, because the verb whose sub categorization requirement is being filled can be quite distant from the filler
  • 217.  One solution to long-distance dependencies is implemented as a feature GAP, which is passed up from phrase to phrase in the parse tree.  The filler (for example, which flight above) is put on the gap list, and must eventually be unified with the sub categorization frame of some verb.
  • 218.  The unification operator takes two feature structures as input and returns a single merged feature structure if successful, or a failure signal if the two inputs are not compatible.  The input feature structures are represented as directed acyclic graphs (DAGs), where features are depicted as labels on directed edges, and feature values are either atomic symbols or DAGs.
  • 219.  A notable aspect of this algorithm is that rather than constructing a new feature structure with the unified information from the two arguments, it destructively alters the arguments so that in the end they point to exactly the same information.  destructive nature of this algorithm necessitates certain minor extensions to the simple graph version of feature structures as DAGs.
  • 220.  Each feature structure consists of two fields: 1. a content field, and 2. a pointer field.  The content field may be null or contain an ordinary feature structure.  The pointer field may be null or contain a pointer to another feature structure.
  • 221.  If the pointer field of the DAG is null, then the content field of the DAG contains the actual feature structure to be processed.  If the pointer field is non-null, then the destination of the pointer represents the actual feature structure to be processed.  The merger aspects of unification will be achieved by altering the pointer fields of DAGs during processing.
  • 225.  From the figure, it can be seen that while the first argument now contains all the correct information, the second one does not; it lacks a NUMBER feature.  We could, of course, add a NUMBER feature to this argument with a pointer to the appropriate place in the first one.  This change would result in the two arguments having all the correct information from this unification.  Unfortunately, this solution is inadequate, since it does not meet our requirement that the two arguments be truly unified.
  • 226.  The solution to this problem is to simply set the POINTER field of the second argument to point at the first one. When this is done any future change to either argument will be immediately reflected in both.  The following DAG will result after unification.
  • 228.  The first step in this algorithm is to acquire the true contents of both of the arguments.  Recall that if the pointer field of an extended feature structure is non null, then the real content of that structure is found by following the pointer found in pointer field.  The variables f1 and f2 are the result of this pointer following process, often referred to as dereferencing.
  • 230.  The basic feature structures have two problems.  The first problem is that there is no way to place a constraint on what can be the value of a feature (e.g., that the value of NUMBER can only be sg or pl).  The second problem with simple feature structures is that there is no way to capture generalizations across them.  A general solution to both of these problems is the use of types.
  • 232.  Types come in two kinds: simple and complex.  A simple type is an atomic symbol like sg or pl.  All types are organized into a multiple-inheritance type hierarchy (a kind of partial order called a lattice).
  • 236.  Complex types are also part of the type hierarchy.  Subtypes of complex types inherit all the features of their parents, together with the constraints on their values.
  • 237.  The simplest augmentation of the context-free grammar.  Also known as Stochastic Context-Free Grammar (SCFG).  A context-free grammar G is defined by four parameters (N, Σ, R, S): a set of non-terminals, a set of terminals, a set of rules, and a start symbol.  A probabilistic context-free grammar augments each rule in R with a conditional probability.
  • 239.  A PCFG differs from a standard CFG by augmenting each rule in R with a conditional probability: A → β [p]  Here p expresses the probability that the given non-terminal A will be expanded to the sequence β.  That is, p is the conditional probability of a given expansion β given the left-hand-side (LHS) non-terminal A, i.e., P(β | A), often written P(A → β).
  • 241.  The sum of the probabilities of all possible expansions of a non-terminal must be one.  In the example above, the probabilities of the expansions of S sum to 1 (0.80 + 0.15 + 0.05).  A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1.
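A toy PCFG can be written directly in NLTK, which checks when the grammar is read in that the expansions of each non-terminal sum to 1; the rules and probabilities below are illustrative and are not the grammar from the slides.

import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP [0.80] | Aux NP VP [0.15] | VP [0.05]
NP -> Det Noun [1.0]
VP -> Verb NP [1.0]
Aux -> 'does' [1.0]
Det -> 'the' [1.0]
Noun -> 'flight' [1.0]
Verb -> 'book' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("book the flight".split()):
    print(tree)      # the most probable parse, annotated with its probability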
  • 242.  poor independence assumptions: CFG rules impose an independence assumption on probabilities, resulting in poor modelling of structural dependencies across the parse tree.  lack of lexical conditioning: CFG rules don’t model syntactic facts about specific words, leading to problems with sub categorization ambiguities, preposition attachment, and coordinate structure ambiguities
  • 243.  In a CFG the expansion of a non-terminal is independent of the context, i.e., of the other nearby non-terminals in the parse tree.  Similarly, in a PCFG, the probability of a particular rule like NPDet N is also independent of the rest of the tree.  the probability of a group of independent events is the product of their probabilities.
  • 244.  PCFGs can achieve extremely high parsing accuracy if the grammar rule symbols are redesigned via automatic splits and merges.  An alternative is, instead of modifying the grammar rules, to modify the probabilistic model of the parser to allow for lexicalized rules.  Examples are the Collins and Charniak parsers.
  • 245.  Each non-terminal in the tree is annotated with its lexical head.  Example: VP → VBD NP PP can be extended as VP(dumped) → VBD(dumped) NP(sacks) PP(into)
  • 246.  In some cases, it can be extended with the head tag (the part-of-speech tag of the head word):  VP(dumped,VBD) → VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)  Then a lexicalized parse tree can be shown as follows:
  • 248.  A lexicalized grammar consists of two kinds of rules:  Lexical rules: express the expansion of a pre-terminal to a word, e.g., NNS(workers,NNS) → workers  Internal rules: express the other rule expansions, e.g., NP(workers,NNS) → NNS(workers,NNS)
  • 249.  Rules in a lexicalized grammar are associated with probabilities.  Lexical rules always have probability 1 (e.g., NN(bin,NN) can only expand to the word bin).  For internal rules, however, we need to estimate probabilities.
  • 250.  Given a treebank , we can compute the probability of each expansion of a non terminal by counting the number of times that expansion occurs and then normalizing.
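A sketch of this counting-and-normalizing step over NLTK's small Penn Treebank sample (assumes the 'treebank' data has been downloaded via nltk.download('treebank')).

from collections import Counter
from nltk.corpus import treebank

rule_counts, lhs_counts = Counter(), Counter()
for tree in treebank.parsed_sents():
    for prod in tree.productions():
        rule_counts[prod] += 1
        lhs_counts[prod.lhs()] += 1

# P(A -> beta) = Count(A -> beta) / Count(A)
def rule_prob(prod):
    return rule_counts[prod] / lhs_counts[prod.lhs()]

some_rule = next(iter(rule_counts))
print(some_rule, rule_prob(some_rule))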
  • 253.  In dependency grammars, constituents and phrase-structure rules do not play any fundamental role.  Instead, the syntactic structure of a sentence is described in terms of words and the syntactic relations between those words.  Dependency grammars are quite important in speech and language processing.
  • 257.  One advantage is the strong predictive parsing power that words have for their dependents: knowing the identity of the verb can help in deciding which noun is the subject or the object.  Another is the ability to handle languages with relatively free word order.
  • 258.  A phrase structure grammar would need a separate rule for each possible place in the parse tree.  A dependency grammar abstracts away from word-order variation, representing only the information that is necessary for the parse.  Examples: Stanford parser, Link Grammar.
  • 259.  Categorial grammar is an early lexicalized grammar model.  It has two components:  Categorial lexicon: associates each word with a syntactic and semantic category.  Combinatory rules: allow functions and arguments to be combined.
  • 260.  Categories are of two types: functors and arguments.  Arguments, like nouns, have simple categories like N.  Verbs or determiners act as functors.  For example, a determiner can be thought of as a function that applies to an N on its right to produce an NP.  Such complex categories can be built using the X/Y and X\Y operators.
  • 261.  X/Y means a function from Y to X, that is, something which combines with a Y on its right to produce an X.  Determiners thus receive the category NP/N: something that combines with an N on its right to produce an NP.  The simplest combination rules just combine an X/Y with a Y on its right to produce an X, or an X\Y with a Y on its left to produce an X.
  • 262.  Does human sentence processing use probabilistic parsing methods?  Recent studies show that there are at least two ways in which humans apply probabilistic reasoning in parsing, although there is still some disagreement on the details.
  • 263.  One family of studies has shown that when humans read, the predictability of a word seems to influence the reading time; more predictable words are read more quickly.  One way of defining predictability is from simple bigram measures.  It is found that the higher the bigram predictability of a word, the shorter the time that participants looked at the word (the initial- fixation duration).
  • 264.  The second family of studies has examined how humans disambiguate sentences which have multiple possible parses, suggesting that humans prefer whichever parse is more probable.  Studies were done on garden path sentences.  These are sentences which are cleverly constructed to have three properties that combine to make them very difficult for people to parse.
  • 265.  They are temporarily ambiguous: The sentence is unambiguous, but its initial portion is ambiguous.  One of the two or more parses in the initial portion is somehow preferable to the human parsing mechanism.  But the dispreferred parse is the correct one for the sentence.  The result of these three properties is that people are “led down the garden path” toward the incorrect parse, and then are confused when they realize it’s the wrong one.
  • 268.  Besides grammatical knowledge, other factors also influence human parsing.
  • 270.  Semantics is the study of the meaning of linguistic utterances.  It involves the study of formal representations that capture the meaning of linguistic utterances, and of algorithms that map from linguistic utterances to meaning representations.
  • 271.  Meaning of linguistic utterance can be captured in formal structures , called meaning representations.  Frame works used to specify the syntax and semantics of this representation are called meaning representation languages.
  • 272.  In the representational approach, we take linguistic inputs and construct meaning representations that are made up of the same kind of stuff that is used to represent this kind of everyday commonsense knowledge of the world. The process whereby such representations are created and assigned to linguistic inputs is called semantic analysis.
  • 273.  Reading a menu and deciding what to order, giving advice about where to go to dinner, following a recipe, and generating new recipes all require deep knowledge about food, its preparation, what people like to eat and what restaurants are like.
  • 274.  First order Logic  Semantic Network  Conceptual Dependency  Frame based Representation
  • 275.  Meaning Representation will be helpful for the systems to give appropriate responses by using a knowledge base of relevant domain knowledge.
  • 276.  Representations can be used to determine the relationship between the meaning of a sentence and the world as we know it.  That is, we need to be able to determine the truth of our representations.  The system must be able to compare the meaning representation of an input with the representations in its knowledge base.  The knowledge base simply stores the system's information about its world.
  • 277.  Maharani serves vegetarian food.  We can gloss this representation as: Serves(Maharani, VegetarianFood)  This representation is matched against the knowledge base of facts about restaurants.  If the system finds a match, it can return an affirmative answer.
  • 278.  Otherwise, it must either say no, if its knowledge of local restaurants is complete, or say that it does not know, if there is reason to believe that its knowledge is incomplete.  This notion is known by the name verifiability.
  • 279.  Verifiability is the system's ability to compare the state of affairs described by a representation to the state of affairs in some world as modelled in a knowledge base.
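A minimal sketch of verifiability: a ground formula such as Serves(Maharani, VegetarianFood) is checked against a knowledge base whose facts are represented simply as tuples (the extra facts are made up for illustration).

KB = {
    ("Serves", "Maharani", "VegetarianFood"),
    ("Serves", "AyCaramba", "MexicanFood"),
    ("Restaurant", "Maharani"),
}

def verify(predicate, *args):
    """True if the ground atomic formula is literally present in the KB."""
    return (predicate, *args) in KB

print(verify("Serves", "Maharani", "VegetarianFood"))   # True: affirmative answer
print(verify("Serves", "Maharani", "MexicanFood"))      # False, or "unknown" if the KB is incomplete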
  • 280.  The domain of semantics is also subject to ambiguity.  The answer generated by the system for a request will depend on which interpretation is chosen as the correct one.  To resolve ambiguities, some means of determining that certain interpretations are preferable (or dispreferable) to others is needed.
  • 281.  A concept closely related to ambiguity is vagueness.  Vagueness can make it difficult to determine what to do with a particular input on the basis of its meaning representation, but, unlike ambiguity, it does not give rise to multiple distinct representations.
  • 282.  I want to eat Italian food.  This provides enough information for a restaurant advisor to make reasonable recommendations, but it is quite vague in terms of what dish the user wants to eat.  The representation is still useful in some situations.
  • 283.  There is the possibility that distinct inputs may lead to the same meaning representation.  Examples:  Does Maharani have vegetarian dishes?  Do they have vegetarian dishes at Maharani?  Are vegetarian dishes served at Maharani?  Does Maharani serve vegetarian fare?
  • 284.  These alternatives use different words and syntactic structures, but it would be unreasonable for them to have different meaning representations.  If the KB contains only one meaning representation, queries phrased as any of the other alternatives may fail.  If we store all alternative representations, this leads to other problems, such as keeping such a KB consistent.
  • 285.  The notion that inputs that mean the same thing should have the same meaning representation is known as canonical form.  This simplifies various reasoning tasks, since systems need to deal with only a single meaning representation, but canonical forms complicate the task of semantic analysis.
  • 286.  Inference is the system's ability to draw valid conclusions based on the meaning representation of inputs and its store of background knowledge.  Answering some kinds of requests requires a more complex kind of matching that involves the use of variables.
  • 287.  I’d like to find a restaurant where I can get vegetarian food.  The representation can be expressed with the use of variables as follows: Serves(x, VegetarianFood)  Matching succeeds only if the variable can be replaced by some known object in the KB.
  • 288.  A meaning representation must be expressive enough to handle an extremely wide range of subject matter.  Ideally, we would like a single meaning representation language that could adequately represent the meaning of any sensible natural language utterance.  First Order Logic is expressive enough to handle quite a lot of what needs to be represented.
  • 289.  Words and sentences have parts that combine in patterns, exhibiting the grammar of the language.  Syntax and semantics involve studying patterns in sentence structure from the vantages of form and meaning respectively.
  • 290. Various methods by which human languages convey meaning are:  Conventional form-meaning associations  Word order regularities  Tense systems  Conjunctions and quantifiers  Predicate-argument structure.  Among these, predicate-argument structure has the greatest practical influence on the meaning structure of language.
  • 291.  All human languages have a form of predicate argument arrangement at the core of their semantic structure.  One of the most important jobs of a grammar is to help organize this predicate argument structure.
  • 292. I want Italian food. This can be classified as the following syntactic argument frame. NP want NP  The syntactic frames specify the number, position and syntactic category of the arguments that are expected to accompany a verb.
  • 293.  For example, the frame in the above example specifies the following facts:  There are two arguments to this predicate.  Both arguments must be NPs.  The first argument is pre-verbal and plays the role of the subject.  The second argument is post verbal and plays the role of the direct object.
  • 294.  These kinds of facts provide valuable links between syntax and meaning representations.  The notion of semantic roles can be understood by looking at the similarities among the arguments.  The study of roles associated with specific verbs and across classes of verbs is referred to as thematic role or case role analysis.
  • 295.  The notion of semantic restrictions arises directly from these semantic roles.  Selectional restrictions are one such notion, whereby verbs can specify semantic restrictions on their arguments.  Predicate-argument structure is not based only on verbs; nouns, prepositions, etc. can also act as predicates.  The following examples illustrate the concept in more detail.
  • 296.  An Italian restaurant under fifteen dollars.  The meaning representation associated with the preposition ‘under’ can have the following structure: Under(ItalianRestaurant, $15)  Make a reservation for this evening for a table for two persons at 8.  The meaning representation will be: Reservation(Hearer, Today, 8PM, 2)  Here the predicate-argument structure is based on the noun reservation, rather than on ‘make’, the main verb in the phrase.
  • 297.  The following are different kinds of semantic information that languages convey:  Variable-arity predicate-argument structures.  The semantic labelling of arguments to predicates.  The statement of semantic constraints on the fillers of argument roles.
  • 298.  First Order Logic is a flexible, well-understood, and computationally tractable approach to the representation of knowledge.  It satisfies many of the requirements for a meaning representation language.  It makes very few specific commitments as to how things ought to be represented.
  • 299.  Term  Predicates  Logical connectives
  • 300.  A term is the FOPC device for representing objects.  FOPC provides three ways to represent these basic building blocks: constants, functions, and variables.
  • 301.  Refer to specific objects in the world being described.  Constants are conventionally depicted as either single capitalized letters or single capitalized words that are often reminiscent of proper nouns such as Maharani or Harry.  Constants refer to exactly one object.  Objects can have multiple constants that refer to them.
  • 302.  FOPC functions are syntactically the same as single-argument predicates.  Example: LocationOf(Maharani)  Functions provide a way to refer to specific objects without having to associate a named constant with them.
  • 303.  Variables give us the ability to make assertions and draw inferences about objects without having to make reference to any particular named object.  They are normally depicted as single lower-case letters.
  • 304.  Predicates are the FOPC mechanism used to state relations that hold among objects.  E.g., Maharani serves vegetarian food.  A reasonable FOPC representation might look like the following formula: Serves(Maharani, VegetarianFood)  Here Serves is a two-place predicate that holds between the objects denoted by the constants Maharani and VegetarianFood.
  • 305. Maharani is a restaurant.  FOPC representation: Restaurant(Maharani)  This is an example of a one-place predicate that is used not to relate multiple objects, but rather to assert a property of a single object.
  • 306.  Larger composite representations can be put together through the use of logical connectives.  It gives the ability to create larger representations by conjoining logical formulas using one of three operators.
  • 307.  I only have five dollars and I don’t have a lot of time.  Have(Speaker, FiveDollars) ∧ ¬Have(Speaker, LotOfTime)
  • 308.  An FOPC sentence can be assigned a value of true or false based on whether the propositions it encodes are in accord with the world or not.  The analysis identifies the terms and predicates that correspond to the various grammatical elements of the sentence, and creates logical formulas that capture the relations implied by the words and syntax of the sentence.
  • 309. Ilahia is near Mulavoor.  This yields the following representation: Near(LocationOf(Ilahia), LocationOf(Mulavoor))  This sentence can be assigned a value true or false based on whether or not the real Ilahia is actually close to the real Mulavoor.
  • 310.  For determining the truth of our logical formulas, a database semantics may be used.  Atomic formulas are taken to be true if they are literally present in the KB or if they can be inferred from other formulas that are in the knowledge base.
  • 312.  Variables are used in two ways in FOPC.  To refer to particular anonymous objects  To refer generically to all objects in a collection.  These two uses are made possible through two operators known as quantifiers.
  • 313.  There are two types:  The universal quantifier, denoted ∀, and pronounced “for all”.  The existential quantifier, denoted ∃, and pronounced “there exists”.
  • 314.  A restaurant that serves Chinese food near Cochin.  Here reference is made to an anonymous object, so a reasonable representation of the meaning is:  ∃x Restaurant(x) ∧ Serves(x, ChineseFood) ∧ Near(LocationOf(x), LocationOf(Cochin))
  • 315.  For the sentence to be true, there must be at least one object such that substituting it for the variable x makes the resulting formula true.  The connective ∧ indicates that the whole formula is true only if all the conjoined atomic formulas are true.
  • 316.  All vegetarian restaurants serve vegetarian food.  A reasonable representation for this sentence will be:  ∀x VegetarianRestaurant(x) ⇒ Serves(x, VegetarianFood)  For this sentence to be true, it must be the case that every substitution of a known object for x must result in a sentence that is true.
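Over a small finite domain, both quantified formulas can be checked mechanically; the facts below are made up for illustration, and LocationOf is simplified to plain constants.

objects = {"Maharani", "AyCaramba", "GreatWall"}
facts = {
    ("Restaurant", "GreatWall"), ("Serves", "GreatWall", "ChineseFood"),
    ("Near", "GreatWall", "Cochin"),
    ("VegetarianRestaurant", "Maharani"), ("Serves", "Maharani", "VegetarianFood"),
}
holds = lambda *atom: atom in facts

# Exists x: Restaurant(x) and Serves(x, ChineseFood) and Near(x, Cochin)
print(any(holds("Restaurant", x) and holds("Serves", x, "ChineseFood")
          and holds("Near", x, "Cochin") for x in objects))                 # True

# Forall x: VegetarianRestaurant(x) => Serves(x, VegetarianFood)
print(all(not holds("VegetarianRestaurant", x) or holds("Serves", x, "VegetarianFood")
          for x in objects))                                                # True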
  • 317.  Inference is the ability to add valid new propositions to a KB.  The most important inference method provided by FOPC is modus ponens (informally known as if-then reasoning).  We can abstractly define modus ponens as follows, where α and β should be taken as FOPC formulas.
  • 318.  α,  α ⇒ β  ⊢  β  (the formulas α and α ⇒ β appear above the line of the schema, and β below it.)  In general, schemas like this indicate that the formula below the line can be inferred from the formulas above the line by some form of inference.  Modus ponens simply states that if the left-hand side of an implication rule is present in the knowledge base, then the right-hand side of the rule can be inferred.
  • 320.  Modus ponens is typically put to practical use in one of two ways: Forward chaining Backward chaining
  • 321.  As soon as a new fact is added to the knowledge base, all applicable implication rules are found and applied, each resulting in the addition of new facts to the knowledge base.  These new propositions in turn can be used to fire implication rules applicable to them.  The process continues until no further facts can be deduced.
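A tiny forward-chaining sketch of modus ponens over ground facts; the single rule and the starting fact are illustrative only.

facts = {("VegetarianRestaurant", "Maharani")}
# Rule: VegetarianRestaurant(x) => Serves(x, VegetarianFood)
rules = [("VegetarianRestaurant", lambda x: ("Serves", x, "VegetarianFood"))]

changed = True
while changed:                               # keep firing rules until nothing new appears
    changed = False
    for antecedent, consequent in rules:
        for fact in list(facts):
            if fact[0] == antecedent:
                new = consequent(*fact[1:])
                if new not in facts:
                    facts.add(new)
                    changed = True

print(("Serves", "Maharani", "VegetarianFood") in facts)   # True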
  • 322.  Modus ponens is run in reverse to prove specific propositions, called queries.  The first step is to see if the query formula is true by determining whether it is present in the knowledge base.  If it is not, then the next step is to search for applicable implication rules in the knowledge base.  An applicable rule is one whose consequent matches the query formula.  If there are any such rules, then the query can be proved if the antecedent of any one of them can be shown to be true.  Not surprisingly, this can be performed recursively by backward chaining on the antecedent as a new query.
  • 323.  Both backward and forward chaining are sound, but neither of them is complete.  Another inference technique, resolution, is both sound and complete, but it is computationally more expensive.  In practice, most systems use one of the chaining methods.
  • 324.  This section provides introductions to the meaning representations of categories, events, time, and beliefs.
  • 325.  Categories are represented in the following way: ◦ create a unary predicate for each category of interest. ◦ Such predicates are then asserted for each member of the category.  Example: VegetarianRestaurant(Maharani), a unary predicate.  Similar logical formulas are included in our KB for each known vegetarian restaurant.
  • 326.  Here categories are relations, rather than full-fledged objects.  It is therefore difficult to make assertions about categories themselves, rather than about their individual members.  Example: MostPopular(Maharani, VegetarianRestaurant)  This is not a legal FOPC formula (because arguments must be terms, not other predicates).
  • 327.  One way to solve this problem is to represent all the concepts that we want to make statements about as full-fledged objects via a technique called reification.  In this case, we can represent the category of VegetarianRestaurant as an object just as Maharani is. The notion of membership in such a category is then denoted via a membership relation as in the following: ISA(Maharani,VegetarianRestaurant)  Relation denoted by ISA(is a) holds between object and the categories in which they are members.
  • 328.  This technique can be extended to create hierarchies of categories through the use of other similar relations, as in the following:  AKO(VegetarianRestaurant,Restaurant)  Here, the relation AKO (a kind of) holds between categories and denotes a category inclusion relationship.
  • 329.  The representation of events consists of single predicates with as many arguments as are needed to incorporate all the roles associated with a given example.  Example:  Reservation(Hearer, Maharani, Today, 8PM, 2)
  • 330.  In the case of verbs, this approach simply assumes that the predicate representing the meaning of a verb has the same number of arguments as are present in the verb’s syntactic sub categorization frame.  The following problems will occur due to this approach:
  • 331.  Determining the correct number of roles for any given event.  Representing facts about the roles associated with an event.  Ensuring that all the correct inferences can be derived directly from the representation of an event.  Ensuring that no incorrect inferences can be derived from the representation of an event.
  • 332.  The following examples illustrate a series of representations for events, based on the verb eat.  I ate.  I ate a vegetable sandwich.  I ate a vegetable sandwich at my desk.  I ate at my desk.  I ate lunch.  I ate a vegetable sandwich for lunch.  I ate a vegetable sandwich for lunch at my desk.
  • 333.  The variable number of arguments for a predicate-bearing verb like eat poses a tricky problem.  We would like all of these examples to involve the same eat predicate, but FOPC predicates have fixed arity; they take a fixed number of arguments.  One possible solution is to create a subcategorization frame for each configuration of arguments that the verb allows.
  • 334.  The semantic analog to this approach is to create as many different eating predicates as are needed to handle all of the ways that eat behaves. Such an approach would yield the following kinds of representations:
  • 335.  Eating₁(Speaker)  Eating₂(Speaker, VegetableSandwich)  Eating₃(Speaker, VegetableSandwich, Desk)  Eating₄(Speaker,Desk)  Eating₅(Speaker,Lunch)  Eating₆(Speaker,VegetableSandwich,Lunch)  Eating₇(Speaker,VegetableSandwich,Lunch,Desk)
  • 336.  Distinct predicates are created for each subcategorization frame.  This solves the problem of how many arguments the eat predicate should take, but at a high cost: apart from the suggestive names of the predicates, there is nothing to tie these events to one another, even though there are obvious logical relations among them.
  • 337.  One way to address this problem is with meaning postulates.  A meaning postulate explicitly ties together the semantics of two of our predicates, for example:  ∀ w,x,y,z Eating₇(w,x,y,z) ⇒ Eating₆(w,x,y)
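A rough sketch of how such a postulate could be applied as an inference rule over assertions in a knowledge base; the tuple encoding of facts and the helper name apply_eating_postulate are illustrative, with Eating7/Eating6 standing in for Eating₇/Eating₆.

```python
# Sketch: a meaning postulate as an inference rule linking two predicates.
# Eating7(w, x, y, z) => Eating6(w, x, y): drop the location argument.
def apply_eating_postulate(fact):
    """If the fact is an Eating7 assertion, derive the corresponding Eating6 one."""
    name, args = fact
    if name == "Eating7" and len(args) == 4:
        return ("Eating6", args[:3])
    return None

kb = [("Eating7", ("Speaker", "VegetableSandwich", "Lunch", "Desk"))]
derived = [d for d in (apply_eating_postulate(f) for f in kb) if d is not None]
print(derived)  # [('Eating6', ('Speaker', 'VegetableSandwich', 'Lunch'))]
```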
  • 338.  In the previous discussions, representations were given in terms of events.  Time can also be incorporated into these representations.  The relevant domain is known as temporal logic, which deals with how human languages convey temporal information, including tense logic, the ways that verb tenses convey temporal information.
  • 339.  Time flows forward, and events are associated with either points or intervals in time.  An ordering can therefore be imposed on distinct events by situating them on the timeline; more specifically, we can say that one event precedes another if the flow of time leads from the first event to the second.  There are a number of schemes for representing temporal information.  Here we discuss a simple one that stays within the FOPC framework.
  • 340.  I arrived in New York.  I am arriving in New York.  I will arrive in New York.  These sentences refer to the same kind of event and differ only in the tense of the verb.  In the scheme for representing events, all three would share the following kind of information, which lacks any temporal component.
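The slide with the formula itself is not reproduced here; the following is a sketch of the kind of tense-neutral representation intended, using an event variable w and illustrative role names (Arriver, Destination).

```latex
\exists w\;\; Arriving(w) \wedge Arriver(w, Speaker) \wedge Destination(w, NewYork)
```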
  • 342.  Here w is an event variable. We can similarly introduce a temporal variable.  Such an approach yields the following kind of representation:
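Again, the original formula slides are not reproduced; below is a sketch of what adding temporal variables typically looks like in this kind of scheme, relating the interval of the event and its end point to the time of utterance (IntervalOf, EndPoint, Precedes, MemberOf, and Now are illustrative names, not taken from the slides).

```latex
% Past ("I arrived in New York"): the event's interval ends before Now
\exists w, i, e\;\; Arriving(w) \wedge Arriver(w, Speaker) \wedge Destination(w, NewYork)
  \wedge IntervalOf(w, i) \wedge EndPoint(i, e) \wedge Precedes(e, Now)

% Present ("I am arriving ..."): replace Precedes(e, Now) with MemberOf(i, Now)
% Future  ("I will arrive ..."): replace it with Precedes(Now, e)
```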
  • 345.  Unfortunately, the relation between simple verb tenses and points in time is by no means straightforward.  Consider the following examples.  Ok, we fly from San Francisco to Boston at 10.  Flight 1390 will be at the gate an hour now.  In the first example, the present tense of the verb fly is used to refer to a future event, while in the second the future tense is used to refer to a past event.
  • 346.  More complications occur when we consider some of the other verb tenses.  Consider the following examples.  Flight 1902 arrived late.  Flight 1902 had arrived late.  Although both refer to events in the past, representing them in the same way seems wrong.  The second example seems to have another unnamed event lurking in the background (e.g., Flight 1902 had already arrived late when something else happened).
  • 347.  To account for these phenomena, Reichenbach (1947) introduced the notion of a reference point. In our simple temporal reference scheme, the current moment in time is equated with the time of the utterance and is used as a reference point for when the event occurred (before, at, or after it).  In Reichenbach's approach, the notion of the reference point is separated from the utterance time and the event time.
  • 348.  The following examples illustrate the basics of this approach.  When Mary’s flight departed, I ate lunch.  When Mary’s flight departed, I had eaten lunch.  In both of these examples, the eating event has happened in the past, i.e., prior to the utterance. However, the verb tense in the first example indicates that the eating event began when the flight departed, while the second example indicates that the eating was accomplished prior to the flight’s departure.
  • 349.  Therefore, in Reichenbach’s terms the departure event specifies the reference point. These facts can be accommodated by asserting additional constraints relating the eating and departure events. In the first example, the reference point precedes the eating event, and in the second example, the eating precedes the reference point.
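One compact way to visualize Reichenbach's proposal is as an ordering of event time E, reference time R, and utterance time U for each tense. The sketch below encodes a few common analyses in this three-point notation; the tense labels and orderings are standard textbook renderings, not taken from the slides.

```python
# Sketch: Reichenbach-style tenses as orderings of event time (E),
# reference point (R), and utterance time (U). '<' means precedes,
# ',' means simultaneous. Labels and analyses are illustrative.
reichenbach = {
    "simple past":     "E,R < U",    # I ate lunch.
    "past perfect":    "E < R < U",  # I had eaten lunch.
    "simple present":  "E,R,U",      # I eat lunch.
    "present perfect": "E < R,U",    # I have eaten lunch.
    "simple future":   "U < E,R",    # I will eat lunch. (one common analysis)
}

for tense, order in reichenbach.items():
    print(f"{tense:16s} {order}")
```

The two lunch examples above then differ only in where R sits: in the simple past, R coincides with the eating event; in the past perfect, the eating event precedes R (the departure), which in turn precedes U.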
  • 351.  Aspect concerns a cluster of related topics, including whether an event has ended or is ongoing, whether it is conceptualized as happening at a point in time or over some interval, and whether or not any particular state in the world comes about because of it.  Based on these and related notions, event expressions have traditionally been divided into four general classes:
  • 352.  Statives  Activities  Accomplishments  Achievements.
  • 353.  Stative expressions represent the notion of an event participant having a particular property, or being in a state, at a given point in time.  As such, they can be thought of as capturing an aspect of the world at a single point in time.  Example: I know my departure gate.
  • 354.  Stative verbs are distinctly odd when used in the progressive form.  Statives are also odd when used as imperatives.  Statives are not easily modified by adverbs like deliberately and carefully.
  • 355.  Activity expressions describe events undertaken by a participant that have no particular end-point. Unlike statives, activities are seen as occurring over some span of time, and are therefore not associated with single points in time.  Example:  John is flying.
  • 356.  Unlike statives, activity expressions are fine in both the progressive and imperative forms. Examples:  She is living in Brooklyn.  Drive a Mazda!  However, like statives, activity expressions are odd when modified by temporal expressions using in.  Examples:  I live in Brooklyn in a month. She drove a Mazda in an hour.
  • 357.  They can, however, successfully be used with for temporal adverbials.  Example: I live in Brooklyn for a month. She drove a Mazda for an hour.
  • 358.  Unlike activities, accomplishment expressions describe events that have a natural end-point and result in a particular state.  Consider the following examples. He booked me a reservation. United flew me to New York.  In these examples, there is an event that is seen as occurring over some period of time that ends when the intended state is accomplished.
  • 359.  A number of diagnostics can be used to distinguish accomplishment events from activities. Consider the following examples, which make use of the word stop as a test.  I stopped living in Brooklyn.  She stopped booking my flight.
  • 360.  In the first example, which involves an activity, one can safely conclude that I lived in Brooklyn, even though the activity came to an end.  From the second example, however, one cannot conclude that she booked my flight, since the activity was stopped before the intended state was accomplished.  Therefore, although stopping an activity entails that the activity took place, stopping an accomplishment event indicates that the event did not succeed.
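The diagnostics used in this discussion can be summarized in a small lookup. The class names and judgments follow the examples in the text; the encoding itself is only illustrative, and achievements (introduced later) are omitted here.

```python
# Sketch: diagnostic profile of the event classes discussed above.
# Each value records whether the construction is natural for that class.
diagnostics = {
    "stative": {
        "progressive ok": False,              # statives are odd in the progressive
        "imperative ok": False,               # and odd as imperatives
    },
    "activity": {
        "progressive ok": True,               # She is living in Brooklyn.
        "imperative ok": True,                # Drive a Mazda!
        "'in an hour' ok": False,             # She drove a Mazda in an hour. (odd)
        "'for an hour' ok": True,             # She drove a Mazda for an hour.
        "stop => event occurred": True,       # I stopped living in Brooklyn => I lived in Brooklyn.
    },
    "accomplishment": {
        "natural end-point": True,            # She booked my flight.
        "stop => event occurred": False,      # She stopped booking my flight does not entail completion.
    },
}

for cls, tests in diagnostics.items():
    print(cls, tests)
```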