CE 634: Natural Language
Processing
Chapter 9
Building Feature based Grammar
Presented by:
Dr CRS Kumar
1
Overview
 Grammatical Features
 Processing Feature Structures
 Extending Feature based Grammar
 Summary
2
Goal of the chapter
 Natural languages have an extensive range of grammatical constructions which are hard
to handle with the simple methods described in Chapter 8. In order to gain more flexibility, we
change our treatment of grammatical categories like S, NP and V. In place of atomic
labels, we decompose them into structures like dictionaries, where features can take on a
range of values.
 The goal of this chapter is to answer the following questions:
 How can we extend the framework of context free grammars with features so as to gain
more fine-grained control over grammatical categories and productions?
 What are the main formal properties of feature structures and how do we use them
computationally?
 What kinds of linguistic patterns and grammatical constructions can we now capture
with feature based grammars?
 Along the way, we will cover more topics in English syntax, including phenomena such as agreement, subcategorization, and unbounded dependency constructions.
3
Grammatical Features
 In an earlier chapter, we described how to build classifiers that rely on detecting features of text.
 Such features may be quite simple, such as extracting the last letter of a word, or more
complex, such as a part-of-speech tag which has itself been predicted by the classifier.
 In this chapter, we will investigate the role of features in building rule-based grammars.
 In contrast to feature extractors, which record features that have been automatically detected,
we are now going to declare the features of words and phrases.
 We start off with a very simple example, using dictionaries to store features and their values.
 >>> kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
 >>> chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}
 The objects kim and chase both have a couple of shared features, CAT (grammatical category)
and ORTH (orthography, i.e., spelling).
 In addition, each has a more semantically-oriented feature: kim['REF'] is intended to give the
referent of kim, while chase['REL'] gives the relation expressed by chase.
 In the context of rule-based grammars, such pairings of features and values are known
as feature structures, and we will shortly see alternative notations for them.
4
Feature structures
 Feature structures contain various kinds of information about grammatical entities.
 The information need not be exhaustive, and we might want to add further
properties.
 For example, in the case of a verb, it is often useful to know what "semantic role" is
played by the arguments of the verb.
 In the case of chase, the subject plays the role of "agent", while the object has the
role of "patient".
 Let's add this information, using 'sbj' and 'obj' as placeholders which will get filled
once the verb combines with its grammatical arguments:
 >>> chase['AGT'] = 'sbj'
 >>> chase['PAT'] = 'obj'
 If we now process a sentence Kim chased Lee, we want to "bind" the verb's agent
role to the subject and the patient role to the object.
 We do this by linking to the REF feature of the relevant NP.
5
Example
 We make the simple-minded assumption that the NPs immediately to the left and right of the verb are
the subject and object respectively.
 We also add a feature structure for Lee to complete the example.
 >>> sent = "Kim chased Lee"
 >>> tokens = sent.split()
 >>> lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
 >>> def lex2fs(word):
 ... for fs in [kim, lee, chase]:
 ... if fs['ORTH'] == word:
 ... return fs
 >>> subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])
 >>> verb['AGT'] = subj['REF']
 >>> verb['PAT'] = obj['REF']
 >>> for k in ['ORTH', 'REL', 'AGT', 'PAT']:
 ... print("%-5s => %s" % (k, verb[k]))
 ORTH => chased
 REL => chase
 AGT => k
 PAT => l
6
Ad hoc
 Feature structures are pretty powerful, but the way in which we have
manipulated them is extremely ad hoc.
 Our next task in this chapter is to show how the framework of context free
grammar and parsing can be expanded to accommodate feature structures, so
that we can build analyses like this in a more generic and principled way.
 We will start off by looking at the phenomenon of syntactic agreement; we will
show how agreement constraints can be expressed elegantly using features, and
illustrate their use in a simple grammar.
 Since feature structures are a general data structure for representing information
of any kind, we will briefly look at them from a more formal point of view, and
illustrate the support for feature structures offered by NLTK.
 In the final part of the chapter, we demonstrate that the additional
expressiveness of features opens up a wide spectrum of possibilities for
describing sophisticated aspects of linguistic structure.
7
Syntactic Agreement
 The following examples show pairs of word sequences, the first of which is grammatical and the
second not. (We use an asterisk at the start of a word sequence to signal that it is ungrammatical.)
 (1)a.this dog
 b.*these dog
 (2)a.these dogs
 b.*this dogs
 In English, nouns are usually marked as being singular or plural. The form of the demonstrative also
varies: this (singular) and these (plural).
 Examples (1) and (2) show that there are constraints on the use of demonstratives and nouns within a noun phrase: either
both are singular or both are plural. A similar constraint holds between subjects and predicates:
 (3)a.the dog runs
 b.*the dog run
 (4)a.the dogs run
 b.*the dogs runs
 Here we can see that morphological properties of the verb co-vary with syntactic properties of the
subject noun phrase. This co-variance is called agreement.
8
Agreement Paradigm for English
Regular Verbs
             singular          plural
 1st per    I run             we run
 2nd per    you run           you run
 3rd per    he/she/it runs    they run
 We can make the role of morphological properties a bit more explicit, as in the paradigm above.
 These representations indicate that the verb agrees with its subject in person and number.
 (We use "3" as an abbreviation for 3rd person, "SG" for singular and "PL" for plural.)
 Let's see what happens when we encode these agreement constraints in a context-free grammar. We will
begin with the simple CFG in (5).
 (5)S -> NP VP
 NP -> Det N
 VP -> V
 Det -> 'this'
 N -> 'dog'
 V -> 'runs'
 Grammar (5) allows us to generate the sentence this dog runs; however, what we really want to do is also
generate these dogs run while blocking unwanted sequences like *this dogs run and *these dog runs.
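 A quick way to see the problem before fixing it: if we naively add plural entries to (5), the CFG happily accepts ungrammatical strings. Below is a minimal sketch using the current NLTK API (CFG.fromstring and ChartParser); the plural entries are added here just for illustration:
 >>> import nltk
 >>> grammar = nltk.CFG.fromstring("""
 ... S -> NP VP
 ... NP -> Det N
 ... VP -> V
 ... Det -> 'this' | 'these'
 ... N -> 'dog' | 'dogs'
 ... V -> 'runs' | 'run'
 ... """)
 >>> parser = nltk.ChartParser(grammar)
 >>> len(list(parser.parse('this dogs run'.split()))) # *this dogs run still parses!
 1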
9
Blowing up
 The most straightforward approach is to add new non-terminals and productions to the grammar:
 (6)S -> NP_SG VP_SG
 S -> NP_PL VP_PL
 NP_SG -> Det_SG N_SG
 NP_PL -> Det_PL N_PL
 VP_SG -> V_SG
 VP_PL -> V_PL
 Det_SG -> 'this'
 Det_PL -> 'these'
 N_SG -> 'dog'
 N_PL -> 'dogs'
 V_SG -> 'runs'
 V_PL -> 'run'
 In place of a single production expanding S, we now have two productions, one covering the
sentences involving singular subject NPs and VPs, the other covering sentences with plural
subject NPs and VPs.
10
Growing grammar size
 In fact, every production in (5) has two counterparts in (6). With a small
grammar, this is not really such a problem, although it is aesthetically
unappealing.
 However, with a larger grammar that covers a reasonable subset of
English constructions, the prospect of doubling the grammar size is very
unattractive.
 Let's suppose now that we used the same approach to deal with first,
second and third person agreement, for both singular and plural.
 This would lead to the original grammar being multiplied by a factor of 6,
which we definitely want to avoid.
 Can we do better than this? In the next section we will show that
capturing number and person agreement need not come at the cost of
"blowing up" the number of productions.
11
Using Attributes and
Constraints
 We spoke informally of linguistic categories having properties; for example, that a noun has the property of being
plural. Let's make this explicit:
 (7)N[NUM=pl]
 In (7), we have introduced some new notation which says that the category N has a
(grammatical) feature called NUM (short for 'number') and that the value of this feature is pl (short for 'plural').
 We can add similar annotations to other categories, and use them in lexical entries:
 (8) Det[NUM=sg] -> 'this'
 Det[NUM=pl] -> 'these'
 N[NUM=sg] -> 'dog'
 N[NUM=pl] -> 'dogs'
 V[NUM=sg] -> 'runs'
 V[NUM=pl] -> 'run'
 Does this help at all? So far, it looks just like a slightly more verbose alternative to what was specified in (6).
 Things become more interesting when we allow variables over feature values, and use these to state constraints:
 (9) S -> NP[NUM=?n] VP[NUM=?n]
 NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
 VP[NUM=?n] -> V[NUM=?n]
 We are using ?n as a variable over values of NUM; it can be instantiated either to sg or pl, within a given production.
 We can read the first production as saying that whatever value NP takes for the feature NUM, VP must take the
same value.
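 We can check this constraint directly. A minimal runnable sketch, assuming the current NLTK names FeatureGrammar.fromstring and FeatureChartParser:
 >>> from nltk.grammar import FeatureGrammar
 >>> from nltk.parse import FeatureChartParser
 >>> fg = FeatureGrammar.fromstring("""
 ... S -> NP[NUM=?n] VP[NUM=?n]
 ... NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
 ... VP[NUM=?n] -> V[NUM=?n]
 ... Det[NUM=sg] -> 'this'
 ... Det[NUM=pl] -> 'these'
 ... N[NUM=sg] -> 'dog'
 ... N[NUM=pl] -> 'dogs'
 ... V[NUM=sg] -> 'runs'
 ... V[NUM=pl] -> 'run'
 ... """)
 >>> parser = FeatureChartParser(fg)
 >>> len(list(parser.parse('these dogs run'.split()))) # agreement satisfied
 1
 >>> len(list(parser.parse('this dogs run'.split()))) # blocked: ?n cannot be both sg and pl
 0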
12
Trees of depth one
 In order to understand how these feature constraints work,
it's helpful to think about how one would go about building
a tree.
 Lexical productions will admit the following local trees
(trees of depth one):
13
Combining productions
 Local trees can be combined into larger trees only when their feature values unify; combinations with conflicting NUM values are prohibited (figure omitted).
14
VP[NUM=?n] -> V[NUM=?n]
 Production VP[NUM=?n] -> V[NUM=?n] says that the NUM value of the
head verb has to be the same as the NUM value of the VP parent.
 Combined with the production for expanding S, we derive the consequence
that if the NUM value of the subject head noun is pl, then so is the NUM value
of the VP’s head verb.
15
Unspecified NUM
 The grammar above illustrated lexical productions for determiners like this and these, which
require a singular or plural head noun respectively.
 However, other determiners in English are not choosy about the grammatical number of
the noun they combine with.
 One way of describing this would be to add two lexical entries to the grammar, one each
for the singular and plural versions of a determiner such as the:
 Det[NUM=sg] -> 'the' | 'some' | 'several'
 Det[NUM=pl] -> 'the' | 'some' | 'several'
 However, a more elegant solution is to leave the NUM value underspecified and let it
agree in number with whatever noun it combines with. Assigning a variable value to
 NUM is one way of achieving this result:
 Det[NUM=?n] -> 'the' | 'some' | 'several'
16
Example feature-based grammar.
 >>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
 % start S
 # ###################
 # Grammar Productions
 # ###################
 # S expansion productions
 S -> NP[NUM=?n] VP[NUM=?n]
 # NP expansion productions
 NP[NUM=?n] -> N[NUM=?n]
 NP[NUM=?n] -> PropN[NUM=?n]
 NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
 NP[NUM=pl] -> N[NUM=pl]
 # VP expansion productions
 VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
 VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
17
Lexical productions
 # ###################
 # Lexical Productions
 # ###################
 Det[NUM=sg] -> 'this' | 'every'
 Det[NUM=pl] -> 'these' | 'all'
 Det -> 'the' | 'some' | 'several'
 PropN[NUM=sg] -> 'Kim' | 'Jody'
 N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
 N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
 IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
 TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
 IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
 TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
 IV[TENSE=past] -> 'disappeared' | 'walked'
 TV[TENSE=past] -> 'saw' | 'liked'
18
Fcfg file
 Notice that a syntactic category can have more than one feature: for example,
 V[TENSE=pres, NUM=pl].
 In general, we can add as many features as we like.
 A final detail about Example 9-1 is the statement %start S. This “directive”
tells the parser to take S as the start symbol for the grammar.
 In general, when we are trying to develop even a very small grammar, it is
convenient to put the productions in a file where they can be edited, tested,
and revised.
 We have saved Example 9-1 as a file named feat0.fcfg in the NLTK data
distribution.
 You can make your own copy of this for further experimentation using
nltk.data.load().
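 A short sketch of loading the file programmatically (this assumes the book_grammars data package has been installed, e.g. via nltk.download('book_grammars')):
 >>> import nltk
 >>> grammar = nltk.data.load('grammars/book_grammars/feat0.fcfg') # a feature grammar object
 >>> print(grammar.start()) # the start symbol set by the %start directive
 >>> for p in grammar.productions()[:3]: # peek at the first few productions
 ...     print(p)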
19
Earley chart parser
 Feature-based grammars are parsed in NLTK using an Earley chart parser.
 After tokenizing the input, we import the load_parser function, which takes a grammar filename as input and returns a chart parser cp.
 Calling the parser's nbest_parse() method returns a list trees of parse trees;
 trees will be empty if the grammar fails to parse the input, and otherwise will contain one or more parse trees,
 depending on whether the input is syntactically ambiguous.
20
Trace of feature-based chart
parser.
 >>> tokens = 'Kim likes children'.split()
 >>> from nltk import load_parser
 >>> cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)
 >>> trees = cp.nbest_parse(tokens)
 |.Kim .like.chil.|
 |[----]    .    .| PropN[NUM='sg'] -> 'Kim' *
 |[----]    .    .| NP[NUM='sg'] -> PropN[NUM='sg'] *
 |[---->    .    .| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
 |.    [----]    .| TV[NUM='sg', TENSE='pres'] -> 'likes' *
 |.    [---->    .| VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
 |.    .    [----]| N[NUM='pl'] -> 'children' *
 |.    .    [----]| NP[NUM='pl'] -> N[NUM='pl'] *
 |.    .    [---->| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
 |.    [---------]| VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
 |[==============]| S[] -> NP[NUM='sg'] VP[NUM='sg'] *
21
Flow Upwards
 There is an implementation issue which bears on our earlier discussion of grammar size.
 One possible approach to parsing productions containing feature constraints is to compile out all admissible values of the features in question, so that we end up with a large, fully specified CFG.
 By contrast, the parser process illustrated in the previous examples works directly with
the underspecified productions given by the grammar.
 Feature values “flow upwards” from lexical entries, and variable values are then
associated with those values via bindings (i.e., dictionaries) such as {?n: 'sg', ?t: 'pres'}.
 As the parser assembles information about the nodes of the tree it is building, these
variable bindings are used to instantiate values in these nodes;
 thus the underspecified VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] NP[]
becomes instantiated as VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres']
NP[] by looking up the values of ?n and ?t in the bindings.
22
Atomic values
 So far, we have only seen feature values like sg and pl. These simple values
are usually called atomic—that is, they can’t be decomposed into subparts.
 A special case of atomic values are Boolean values, that is, values that just
specify whether a property is true or false.
 For example, we might want to distinguish auxiliary verbs such as can, may, will, and do with the Boolean feature AUX. Then the production
V[TENSE=pres, aux=+] -> 'can' means that can receives the value pres for
TENSE and + or true for AUX.
 There is a widely adopted convention that abbreviates the representation of Boolean features: instead of aux=+ or aux=-, we use +aux and -aux respectively.
 These are just abbreviations, however, and the parser interprets them as
though + and - are like any other atomic value.
23
Feature Annotations
 The following are some representative productions:
 (17) V[TENSE=pres, +aux] -> 'can'
 V[TENSE=pres, +aux] -> 'may'
 V[TENSE=pres, -aux] -> 'walks'
 V[TENSE=pres, -aux] -> 'likes'
 We have spoken of attaching “feature annotations” to syntactic categories.
 A more radical approach represents the whole category—that is, the non-
terminal symbol plus the annotation—as a bundle of features.
 For example, N[NUM=sg] contains part-of-speech information which can be
represented as POS=N.
 An alternative notation for this category, therefore, is [POS=N, NUM=sg].
24
Attribute Value Matrix
 In addition to atomic-valued features, features may take values that are themselves
feature structures.
 For example, we can group together agreement features (e.g., person, number, and
gender) as a distinguished part of a category, serving as the value of AGR.
 In this case, we say that AGR has a complex value. (18) depicts the structure, in a
format known as an attribute value matrix (AVM).
 (18) [ POS = N               ]
 [                            ]
 [ AGR = [ PER = 3  ]         ]
 [       [ NUM = pl ]         ]
 [       [ GND = fem ]        ]
 Rendering a feature structure as an attribute value matrix.
25
Bundling Agreement Features
 Once we have the possibility of using features like AGR, we can refactor a grammar so that agreement features are bundled together.
 A tiny grammar illustrating this idea is shown in (20).
 (20) S -> NP[AGR=?n] VP[AGR=?n]
 NP[AGR=?n] -> PropN[AGR=?n]
 VP[TENSE=?t, AGR=?n] -> Cop[TENSE=?t, AGR=?n] Adj
 Cop[TENSE=pres, AGR=[NUM=sg, PER=3]] -> 'is'
 PropN[AGR=[NUM=sg, PER=3]] -> 'Kim'
 Adj -> 'happy'
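 Grammar (20) is small enough to run as-is. A sketch, again assuming the FeatureGrammar and FeatureChartParser names from current NLTK:
 >>> from nltk.grammar import FeatureGrammar
 >>> from nltk.parse import FeatureChartParser
 >>> g = FeatureGrammar.fromstring("""
 ... S -> NP[AGR=?n] VP[AGR=?n]
 ... NP[AGR=?n] -> PropN[AGR=?n]
 ... VP[TENSE=?t, AGR=?n] -> Cop[TENSE=?t, AGR=?n] Adj
 ... Cop[TENSE=pres, AGR=[NUM=sg, PER=3]] -> 'is'
 ... PropN[AGR=[NUM=sg, PER=3]] -> 'Kim'
 ... Adj -> 'happy'
 ... """)
 >>> for tree in FeatureChartParser(g).parse('Kim is happy'.split()):
 ...     print(tree) # one parse; the complex AGR values unify as a unit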
26
Processing Feature Structures
 In this section, we will show how feature structures can be constructed and manipulated in NLTK.
 We will also discuss the fundamental operation of unification, which allows us to combine the information
contained in two different feature structures.
 Feature structures in NLTK are declared with the FeatStruct() constructor.
 Atomic feature values can be strings or integers.
 >>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
 >>> print fs1
 [ NUM = 'sg' ]
 [ TENSE = 'past' ]
 A feature structure is actually just a kind of dictionary, and so we access its values by indexing in the usual
way.
 We can use our familiar syntax to assign values to features:
 >>> fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
 >>> print fs1['GND']
 fem
 >>> fs1['CASE'] = 'acc'
27
Complex Values in FeatStruct
 We can also define feature structures that have complex values, as discussed earlier.
 >>> fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
 >>> print fs2
 [ [ CASE = 'acc' ] ]
 [ AGR = [ GND = 'fem' ] ]
 [ [ NUM = 'pl' ] ]
 [ [ PER = 3 ] ]
 [ ]
 [ POS = 'N' ]
 >>> print fs2['AGR']
 [ CASE = 'acc' ]
 [ GND = 'fem' ]
 [ NUM = 'pl' ]
 [ PER = 3 ]
 >>> print fs2['AGR']['PER']
 3
28
Directed Acyclic Graphs
 Feature structures are not inherently tied to linguistic objects; they are general-purpose structures for
representing knowledge.
 For example, we could encode information about a person in a feature structure:
 >>> print nltk.FeatStruct(name='Lee', telno='01 27 86 42 96', age=33)
 [ age = 33 ]
 [ name = 'Lee' ]
 [ telno = '01 27 86 42 96' ]
 In the next couple of pages, we are going to use examples like this to explore standard operations
over feature structures.
 This will briefly divert us from processing natural language, but we need to lay the groundwork
before we can get back to talking about grammars.
 It is often helpful to view feature structures as graphs, more specifically, as directed acyclic graphs (DAGs). The DAG in (21) is equivalent to the preceding AVM.
 The feature names appear as labels on the directed arcs, and feature values appear as labels on the
nodes that are pointed to by the arcs.
29
Feature Path
 When we look at such graphs, it is
natural to think in terms of paths
through the graph.
 A feature path is a sequence of arcs
that can be followed from the root
node.
 We will represent paths as tuples of
arc labels. Thus, ('ADDRESS',
'STREET') is a feature path whose
value is the node labeled 'rue Pascal'.
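 In NLTK, following a feature path is just repeated indexing (the NAME/ADDRESS structure comes from the running example):
 >>> import nltk
 >>> fs = nltk.FeatStruct("[NAME='Lee', ADDRESS=[NUMBER=74, STREET='rue Pascal']]")
 >>> fs['ADDRESS']['STREET'] # the path ('ADDRESS', 'STREET')
 'rue Pascal'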
30
Structure Sharing
 The value of the path ('ADDRESS') in (24) is identical to the value of the path ('SPOUSE', 'ADDRESS'). DAGs such as (24) are said to involve structure sharing or reentrancy.
 When two paths have the same value, they are said to be
equivalent.
 In order to indicate reentrancy in our matrix-style representations, we will prefix the first occurrence of a shared feature structure with an integer in parentheses, such as (1).
 Any later reference to that structure will use the notation ->(1), as shown here.
 >>> print nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
 ... SPOUSE=[NAME='Kim', ADDRESS->(1)]]""")
 [ ADDRESS = (1) [ NUMBER = 74 ] ]
 [ [ STREET = 'rue Pascal' ] ]
 [ ]
 [ NAME = 'Lee' ]
 [ ]
 [ SPOUSE = [ ADDRESS -> (1) ] ]
 [ [ NAME = 'Kim' ] ]
 The bracketed integer is sometimes called a tag or a coindex.
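 Reentrancy is more than a printing convention: both paths resolve to one shared object, so (assuming NLTK shares rather than copies the tagged structure, as its reentrancy support suggests) an identity check succeeds:
 >>> fs = nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
 ... SPOUSE=[NAME='Kim', ADDRESS->(1)]]""")
 >>> fs['ADDRESS'] is fs['SPOUSE']['ADDRESS']
 True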
31
Subsumption
 It is standard to think of feature structures as providing partial information about some object, in
the sense that we can order feature structures according to how general they are.
 For example, (25a) is more general (less specific) than (25b), which in turn is more general than
(25c).
 (25) a. [NUMBER = 74]
 b. [NUMBER = 74 ]
 [STREET = 'rue Pascal']
 c. [NUMBER = 74 ]
 [STREET = 'rue Pascal']
 [CITY = 'Paris' ]
 This ordering is called subsumption; a more general feature structure subsumes a less general one. If FS0 subsumes FS1 (formally, we write FS0 ⊑ FS1), then FS1 must have all the paths and path equivalences of FS0, and may have additional paths and equivalences as well.
 Thus, (23) subsumes (24) since the latter has additional path equivalences.
 It should be obvious that subsumption provides only a partial ordering on feature structures, since some feature structures are incommensurable. For example, (26) neither subsumes nor is subsumed by (25a).
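 NLTK exposes this ordering through the subsumes() method on feature structures; a sketch over the structures in (25):
 >>> a = nltk.FeatStruct("[NUMBER=74]")
 >>> b = nltk.FeatStruct("[NUMBER=74, STREET='rue Pascal']")
 >>> c = nltk.FeatStruct("[NUMBER=74, STREET='rue Pascal', CITY='Paris']")
 >>> a.subsumes(b), b.subsumes(c) # more general subsumes less general
 (True, True)
 >>> c.subsumes(a)
 False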
32
Unification
 How do we go about specializing a given feature structure? For example, we
might decide that addresses should consist of not just a street number and a
street name, but also a city.
 That is, we might want to merge graph (27a) with (27b) to yield (27c).
 Merging information from two feature structures is called unification and is
supported by the unify() method.
 >>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
 >>> fs2 = nltk.FeatStruct(CITY='Paris')
 >>> print fs1.unify(fs2)
 [ CITY = 'Paris' ]
 [ NUMBER = 74 ]
 [ STREET = 'rue Pascal' ]
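 Unification can also fail: if the two structures assign conflicting values to the same path, unify() returns None instead of a merged structure:
 >>> fs3 = nltk.FeatStruct(NUMBER=12)
 >>> print fs1.unify(fs3) # NUMBER clash: 74 vs. 12
 None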
33
Unification
34
Subcategorization
 Earlier, we augmented our category labels to represent different kinds of verb, and
used the labels IV and TV for intransitive and transitive verbs respectively.
 This allowed us to write productions like the following:
 (27) VP -> IV
 VP -> TV NP
 Although we know that IV and TV are two kinds of V, they are just atomic
nonterminal symbols from a CFG, as distinct from each other as any other pair
of symbols.
 This notation doesn't let us say anything about verbs in general, e.g. we cannot
say "All lexical items of category V can be marked for tense", since walk, say,
is an item of category IV, not V.
 So, can we replace category labels such as TV and IV by V along with a
feature that tells us whether the verb combines with a following NP object or
whether it can occur without any complement?
35
Generalized Phrase Structure
Grammar
 A simple approach, originally developed for a grammar framework called
Generalized Phrase Structure Grammar (GPSG), tries to solve this problem by
allowing lexical categories to bear a SUBCAT feature which tells us what
subcategorization class the item belongs to.
 While GPSG used integer values for SUBCAT, the example below adopts
more mnemonic values, namely intrans, trans and clause:
 (28) VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]
 VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP
 VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar
 V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
 V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'
 V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says' | 'claims'
36
SUBCAT
 When we see a lexical category like V[SUBCAT=trans], we can interpret the SUBCAT
specification as a pointer to a production in which V[SUBCAT=trans] is introduced as the head
child in a VP production.
 By convention, there is a correspondence between the values of SUBCAT and the productions
that introduce lexical heads.
 On this approach, SUBCAT can only appear on lexical categories; it makes no sense, for
example, to specify a SUBCAT value on VP.
 As required, walk and like both belong to the category V.
 Nevertheless, walk will only occur in VPs expanded by a production with the feature
SUBCAT=intrans on the right hand side, as opposed to like, which requires a SUBCAT=trans.
 In our third class of verbs above, we have specified a category SBar. This is a label for
subordinate clauses such as the complement of claim in the example You claim that you like
children.
 We require two further productions to analyze such sentences:
 (29)SBar -> Comp S
 Comp -> 'that'
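 A trimmed, runnable sketch of the SUBCAT approach (a hypothetical fragment with just enough lexicon for one sentence; tense and number features omitted, FeatureGrammar/FeatureChartParser names assumed as before):
 >>> from nltk.grammar import FeatureGrammar
 >>> from nltk.parse import FeatureChartParser
 >>> g = FeatureGrammar.fromstring("""
 ... S -> NP VP
 ... VP -> V[SUBCAT=trans] NP
 ... VP -> V[SUBCAT=clause] SBar
 ... SBar -> Comp S
 ... Comp -> 'that'
 ... NP -> 'you' | 'children'
 ... V[SUBCAT=trans] -> 'like'
 ... V[SUBCAT=clause] -> 'claim'
 ... """)
 >>> for tree in FeatureChartParser(g).parse('you claim that you like children'.split()):
 ...     print(tree)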
37
SUBCAT tree representation
38
Categorial grammar
 An alternative treatment of subcategorization, due originally to a framework known
as categorial grammar, is represented in feature based frameworks such as PATR and
Head-driven Phrase Structure Grammar.
 Rather than using SUBCAT values as a way of indexing productions, the SUBCAT
value directly encodes the valency of a head (the list of arguments that it can
combine with).
 For example, a verb like put that takes NP and PP complements (put the book on the
table) might be represented as (31):
 (31) V[SUBCAT=<NP, NP, PP>]
 This says that the verb can combine with three arguments. The leftmost element in
the list is the subject NP, while everything else — an NP followed by a PP in this
case — comprises the subcategorized-for complements.
 When a verb like put is combined with appropriate complements, the requirements
which are specified in the SUBCAT are discharged, and only a subject NP is needed.
39
Tree representation
 This category, which corresponds to what is traditionally thought of as VP,
might be represented as follows.
 (32) V[SUBCAT=<NP>]
 Finally, a sentence is a kind of verbal category that has no requirements for
further arguments, and hence has a SUBCAT whose value is the empty list.
 The tree (33) shows how these category assignments combine in a parse of
Kim put the book on the table.
40
Heads revisited
 We noted in the previous section that by factoring subcategorization information out of
the main category label, we could express more generalizations about properties of
verbs.
 Another property of this kind is the following: expressions of category V are heads of
phrases of category VP. Similarly, Ns are heads of NPs, As (i.e., adjectives) are heads of
APs, and Ps (i.e., prepositions) are heads of PPs.
 Not all phrases have heads—for example, it is standard to say that coordinate phrases
(e.g., the book and the bell) lack heads.
 Nevertheless, we would like our grammar formalism to express the parent/head-child
relation where it holds.
 At present, V and VP are just atomic symbols, and we need to find a way to relate them
using features (as we did earlier to relate IV and TV).
41
Phrasal Level
 X-bar syntax addresses this issue by abstracting out the notion of phrasal level.
 It is usual to recognize three such levels.
 If N represents the lexical level, then N' represents the next level up, corresponding to
the more traditional category Nom, and N'' represents the phrasal level, corresponding to
the category NP.
 (36a) illustrates a representative structure, while (36b) is the more conventional
counterpart.
42
Projections
 The head of the structure (36a) is N, and N' and N'' are called (phrasal)
projections of N.
 N'' is the maximal projection, and N is sometimes called the zero
projection.
 One of the central claims of X-bar syntax is that all constituents share a
structural similarity.
 Using X as a variable over N, V, A, and P, we say that directly subcategorized
complements of a lexical head X are always placed as siblings of the head,
whereas adjuncts are placed as siblings of the intermediate category, X'.
 Thus, the configuration of the two P'' adjuncts in (37) contrasts with that of the
complement P'' in (36a).
43
Bar levels
 The productions in (38) illustrate how bar levels can be encoded using feature structures.
 The nested structure in (37) is achieved by two applications of the recursive rule expanding N[BAR=1].
 (38) S -> N[BAR=2] V[BAR=2]
 N[BAR=2] -> Det N[BAR=1]
 N[BAR=1] -> N[BAR=1] P[BAR=2]
 N[BAR=1] -> N[BAR=0] P[BAR=2]
44
Auxiliary Verbs and Inversion
 Inverted clauses—where the order of subject and verb is switched—occur in English
interrogatives and also after “negative” adverbs:
 (39) a. Do you like children?
 b. Can Jody walk?
 (40) a. Rarely do you see Kim.
 b. Never have I seen this dog.
 However, we cannot place just any verb in pre-subject position:
 (41) a. *Like you children?
 b. *Walks Jody?
 (42) a. *Rarely see you Kim.
 b. *Never saw I this dog.
 Verbs that can be positioned initially in inverted clauses belong to the class known as
auxiliaries, and as well as do, can, and have include be, will, and shall.
45
Inverted Clause
 That is, a clause marked as [+inv] consists of an auxiliary verb
followed by a VP.
 (In a more detailed grammar, we would need to place some constraints
on the form of the VP, depending on the choice of auxiliary.) (44)
illustrates the structure of an inverted clause:
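 Written in the feature notation used earlier, the production for an inverted clause (as in the NLTK book's treatment) is: S[+INV] -> V[+AUX] VP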
46
Unbounded Dependency
Constructions
 Consider the following contrasts:
 (45) a. You like Jody.
 b. *You like.
 (46) a. You put the card into the slot.
 b. *You put into the slot.
 c. *You put the card.
 d. *You put.
 The verb like requires an NP complement, while put requires both a
following NP and PP.
 (45) and (46) show that these complements are obligatory: omitting
them leads to ungrammaticality.
47
Filler and Gaps
 Yet there are contexts in which obligatory complements can be omitted, as (47) and
(48) illustrate.
 (47) a. Kim knows who you like.
 b. This music, you really like.
 (48) a. Which card do you put into the slot?
 b. Which slot do you put the card into?
 That is, an obligatory complement can be omitted if there is an appropriate filler in the
sentence, such as the question word who in (47a), the preposed topic this music in (47b),
or the wh phrases which card/slot in (48).
 It is common to say that sentences like those in (47) and (48) contain gaps where the
obligatory complements have been omitted, and these gaps are sometimes made
explicit using an underscore:
 (49) a. Which card do you put __ into the slot?
 b. Which slot do you put the card into __?
48
Dependency
 So, a gap can occur if it is licensed by a filler. Conversely, fillers can occur
only if there is an appropriate gap elsewhere in the sentence, as shown by the
following examples:
 (50) a. *Kim knows who you like Jody.
 b. *This music, you really like hip-hop.
 (51) a. *Which card do you put this into the slot?
 b. *Which slot do you put the card into this one?
 The mutual co-occurrence between filler and gap is sometimes termed a
“dependency.”
 One issue of considerable importance in theoretical linguistics has been the nature
of the material that can intervene between a filler and the gap that it licenses;
 in particular, can we simply list a finite set of sequences that separate the two?
 The answer is no: there is no upper bound on the distance between filler and gap.
49
Unbounded Dependency
Construction
 This can be illustrated with constructions involving sentential complements, as shown in (52).
 (52) a. Who do you like __?
 b. Who do you claim that you like __?
 c. Who do you claim that Jody says that you like __?
 Since we can have indefinitely deep recursion of sentential
complements, the gap can be embedded indefinitely far inside the
whole sentence.
 This constellation of properties leads to the notion of an unbounded
dependency construction, that is, a filler-gap dependency where
there is no upper bound on the distance between filler and gap.
50
Slash Categories
 A variety of mechanisms have been suggested for handling unbounded
dependencies in formal grammars; here we illustrate the approach due to
Generalized Phrase Structure Grammar that involves slash categories.
 A slash category has the form Y/XP; we interpret this as a phrase of
category Y that is missing a subconstituent of category XP.
 For example, S/NP is an S that is missing an NP. The use of slash categories is illustrated in the sketch below.
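 NLTK's book grammar feat1.fcfg implements this slash treatment. A sketch of loading and running it (using the current parse() method in place of the older nbest_parse() shown earlier in these slides):
 >>> from nltk import load_parser
 >>> cp = load_parser('grammars/book_grammars/feat1.fcfg')
 >>> for tree in cp.parse('who do you claim that you like'.split()):
 ...     print(tree) # the gap after 'like' is threaded through S/NP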
51
Case and Gender in German
 Compared with English, German has a relatively rich morphology for
agreement.
 For example, the definite article in German varies with case, gender,
and number, as shown in Table 9-2.
 Table 9-2. Morphological paradigm for the German definite article
 Case          Masculine   Feminine   Neuter   Plural
 Nominative    der         die        das      die
 Genitive      des         der        des      der
 Dative        dem         der        dem      den
 Accusative    den         die        das      die
52
Dative case
 Subjects in German take the nominative case, and most verbs govern their objects in the
accusative case.
 However, there are exceptions, such as helfen, that govern the dative case:
 (55) a. Die Katze sieht den Hund
 the.NOM.FEM.SG cat.3.FEM.SG see.3.SG the.ACC.MASC.SG dog.3.MASC.SG
 ‘the cat sees the dog’
 b. *Die Katze sieht dem Hund
 the.NOM.FEM.SG cat.3.FEM.SG see.3.SG the.DAT.MASC.SG dog.3.MASC.SG
 c. Die Katze hilft dem Hund
 the.NOM.FEM.SG cat.3.FEM.SG help.3.SG the.DAT.MASC.SG dog.3.MASC.SG
 ‘the cat helps the dog’
 d. *Die Katze hilft den Hund
 the.NOM.FEM.SG cat.3.FEM.SG help.3.SG the.ACC.MASC.SG dog.3.MASC.SG
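 NLTK ships a small German grammar covering this case-government data (grammars/book_grammars/german.fcfg); the example sentence follows the NLTK book, though the exact lexicon is an assumption here:
 >>> from nltk import load_parser
 >>> cp = load_parser('grammars/book_grammars/german.fcfg')
 >>> for tree in cp.parse('ich folge den Katzen'.split()):
 ...     print(tree) # folgen governs the dative den Katzen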
53
Summary
 The traditional categories of context-free grammar are atomic symbols. An
important motivation for feature structures is to capture fine-grained distinctions
that would otherwise require a massive multiplication of atomic categories.
 By using variables over feature values, we can express constraints in grammar
productions that allow the realization of different feature specifications to be
inter-dependent.
 Typically we specify fixed values of features at the lexical level and constrain
the values of features in phrases to unify with the corresponding values in their
children.
 Feature values are either atomic or complex. A particular sub-case of atomic
value is the Boolean value, represented by convention as [+/- f].
 Two features can share a value (either atomic or complex). Structures with
shared values are said to be re-entrant. Shared values are represented by
numerical indexes (or tags) in AVMs.
54
Summary
 A path in a feature structure is a tuple of features corresponding to the labels on
a sequence of arcs from the root of the graph representation.
 Two paths are equivalent if they share a value.
 Feature structures are partially ordered by subsumption. FS0 subsumes FS1 when
all the information contained in FS0 is also present in FS1.
 The unification of two structures FS0 and FS1, if successful, is the feature
structure FS2 that contains the combined information of both FS0 and FS1.
 If unification adds information to a path π in FS, then it also adds information to
every path π' equivalent to π.
 We can use feature structures to build succinct analyses of a wide variety of
linguistic phenomena, including verb subcategorization, inversion constructions,
unbounded dependency constructions and case government.
55

More Related Content

Similar to Natural Language Processing 9th Chapter.ppt (20)

PPT
IKL presentation for Ontolog
Pat Hayes
 
PPTX
Python ppt_118.pptx
MadhuriAnaparthy
 
PPTX
NLP Concepts detail explained in details.pptx
FaizRahman56
 
PPTX
lec3 AI.pptx
someyamohsen2
 
PPT
ppt3-conditionalstatementloopsdictionaryfunctions-240731050730-455ba0fa.ppt
avishekpradhan24
 
PPT
NLP Natural Language Processing 10th Chapter.ppt
pandeyharshita00
 
PPTX
Natural Language Processing Datascience.pptx
Anandh798253
 
PPTX
Lfg and gpsg
SubramanianMuthusamy3
 
PPTX
LFG and GPSG.pptx
Subramanian Mani
 
PDF
[Emnlp] what is glo ve part ii - towards data science
Nikhil Jaiswal
 
PPTX
Syntax
Sovanna Kakk
 
PPT
Lecture 7: Definite Clause Grammars
CS, NcState
 
PDF
2023-12, PhD Viva at Cardiff University, Representing Relational Knowledge wi...
asahiushio1
 
PDF
Final exam in advance dbms
Md. Mashiur Rahman
 
PPTX
Syntax presetation
qamaraftab6
 
PDF
Word2vec on the italian language: first experiments
Vincenzo Lomonaco
 
PPTX
(Semantics) saeed's book ch 9
VivaAs
 
PPT
unit -3 part 1.ppt
LSURYAPRAKASHREDDY
 
PPT
PPT3-CONDITIONAL STATEMENT LOOPS DICTIONARY FUNCTIONS.ppt
RahulKumar812056
 
PPTX
Ics1019 ics5003
Matt Montebello
 
IKL presentation for Ontolog
Pat Hayes
 
Python ppt_118.pptx
MadhuriAnaparthy
 
NLP Concepts detail explained in details.pptx
FaizRahman56
 
lec3 AI.pptx
someyamohsen2
 
ppt3-conditionalstatementloopsdictionaryfunctions-240731050730-455ba0fa.ppt
avishekpradhan24
 
NLP Natural Language Processing 10th Chapter.ppt
pandeyharshita00
 
Natural Language Processing Datascience.pptx
Anandh798253
 
Lfg and gpsg
SubramanianMuthusamy3
 
LFG and GPSG.pptx
Subramanian Mani
 
[Emnlp] what is glo ve part ii - towards data science
Nikhil Jaiswal
 
Syntax
Sovanna Kakk
 
Lecture 7: Definite Clause Grammars
CS, NcState
 
2023-12, PhD Viva at Cardiff University, Representing Relational Knowledge wi...
asahiushio1
 
Final exam in advance dbms
Md. Mashiur Rahman
 
Syntax presetation
qamaraftab6
 
Word2vec on the italian language: first experiments
Vincenzo Lomonaco
 
(Semantics) saeed's book ch 9
VivaAs
 
unit -3 part 1.ppt
LSURYAPRAKASHREDDY
 
PPT3-CONDITIONAL STATEMENT LOOPS DICTIONARY FUNCTIONS.ppt
RahulKumar812056
 
Ics1019 ics5003
Matt Montebello
 

Recently uploaded (20)

PPTX
美国电子版毕业证南卡罗莱纳大学上州分校水印成绩单USC学费发票定做学位证书编号怎么查
Taqyea
 
PDF
PORTFOLIO Golam Kibria Khan — architect with a passion for thoughtful design...
MasumKhan59
 
DOCX
CS-802 (A) BDH Lab manual IPS Academy Indore
thegodhimself05
 
PPTX
Arduino Based Gas Leakage Detector Project
CircuitDigest
 
PPTX
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
PPTX
Mechanical Design of shell and tube heat exchangers as per ASME Sec VIII Divi...
shahveer210504
 
PDF
MAD Unit - 1 Introduction of Android IT Department
JappanMavani
 
PDF
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
PPTX
Element 7. CHEMICAL AND BIOLOGICAL AGENT.pptx
merrandomohandas
 
PPT
Carmon_Remote Sensing GIS by Mahesh kumar
DhananjayM6
 
PPTX
Introduction to Design of Machine Elements
PradeepKumarS27
 
PPTX
VITEEE 2026 Exam Details , Important Dates
SonaliSingh127098
 
PPTX
Server Side Web Development Unit 1 of Nodejs.pptx
sneha852132
 
PDF
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
PPTX
Element 11. ELECTRICITY safety and hazards
merrandomohandas
 
PPTX
265587293-NFPA 101 Life safety code-PPT-1.pptx
chandermwason
 
PPTX
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
PDF
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
PPTX
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
PPTX
Hashing Introduction , hash functions and techniques
sailajam21
 
美国电子版毕业证南卡罗莱纳大学上州分校水印成绩单USC学费发票定做学位证书编号怎么查
Taqyea
 
PORTFOLIO Golam Kibria Khan — architect with a passion for thoughtful design...
MasumKhan59
 
CS-802 (A) BDH Lab manual IPS Academy Indore
thegodhimself05
 
Arduino Based Gas Leakage Detector Project
CircuitDigest
 
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
Mechanical Design of shell and tube heat exchangers as per ASME Sec VIII Divi...
shahveer210504
 
MAD Unit - 1 Introduction of Android IT Department
JappanMavani
 
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
Element 7. CHEMICAL AND BIOLOGICAL AGENT.pptx
merrandomohandas
 
Carmon_Remote Sensing GIS by Mahesh kumar
DhananjayM6
 
Introduction to Design of Machine Elements
PradeepKumarS27
 
VITEEE 2026 Exam Details , Important Dates
SonaliSingh127098
 
Server Side Web Development Unit 1 of Nodejs.pptx
sneha852132
 
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
Element 11. ELECTRICITY safety and hazards
merrandomohandas
 
265587293-NFPA 101 Life safety code-PPT-1.pptx
chandermwason
 
Damage of stability of a ship and how its change .pptx
ehamadulhaque
 
AI TECHNIQUES FOR IDENTIFYING ALTERATIONS IN THE HUMAN GUT MICROBIOME IN MULT...
vidyalalltv1
 
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
Hashing Introduction , hash functions and techniques
sailajam21
 
Ad

Natural Language Processing 9th Chapter.ppt

  • 1. CE 634: Natural Language Processing Chapter 9 Building Feature based Grammar Presented by: Dr CRS Kumar 1
  • 2. Overview  Grammatical Features  Processing Feature Structures  Extending Feature based Grammar  Summary 2
  • 3. Goal of the chapter  Natural languages have an extensive range of grammatical constructions which are hard to handle with the simple methods described in 8.. In order to gain more flexibility, we change our treatment of grammatical categories like S, NP and V. In place of atomic labels, we decompose them into structures like dictionaries, where features can take on a range of values.  The goal of this chapter is to answer the following questions:  How can we extend the framework of context free grammars with features so as to gain more fine-grained control over grammatical categories and productions?  What are the main formal properties of feature structures and how do we use them computationally?  What kinds of linguistic patterns and grammatical constructions can we now capture with feature based grammars?  Along the way, we will cover more topics in English syntax, including phenomena such as agreement, sub categorization, and unbounded dependency constructions. 3
  • 4. Grammatical Features  we described how to build classifiers that rely on detecting features of text.  Such features may be quite simple, such as extracting the last letter of a word, or more complex, such as a part-of-speech tag which has itself been predicted by the classifier.  In this chapter, we will investigate the role of features in building rule-based grammars.  In contrast to feature extractors, which record features that have been automatically detected, we are now going to declare the features of words and phrases.  We start off with a very simple example, using dictionaries to store features and their values.  >>> kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}  >>> chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}  The objects kim and chase both have a couple of shared features, CAT (grammatical category) and ORTH (orthography, i.e., spelling).  In addition, each has a more semantically-oriented feature: kim['REF'] is intended to give the referent of kim, while chase['REL'] gives the relation expressed by chase.  In the context of rule-based grammars, such pairings of features and values are known as feature structures, and we will shortly see alternative notations for them. 4
  • 5. Feature structures  Feature structures contain various kinds of information about grammatical entities.  The information need not be exhaustive, and we might want to add further properties.  For example, in the case of a verb, it is often useful to know what "semantic role" is played by the arguments of the verb.  In the case of chase, the subject plays the role of "agent", while the object has the role of "patient".  Let's add this information, using 'sbj' and 'obj' as placeholders which will get filled once the verb combines with its grammatical arguments:  >>> chase['AGT'] = 'sbj‘  >>> chase['PAT'] = 'obj'  If we now process a sentence Kim chased Lee, we want to "bind" the verb's agent role to the subject and the patient role to the object.  We do this by linking to the REF feature of the relevant NP. 5
  • 6. Example  we make the simple-minded assumption that the NPs immediately to the left and right of the verb are the subject and object respectively.  We also add a feature structure for Lee to complete the example.  >>> sent = "Kim chased Lee“  >>> tokens = sent.split()  >>> lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}  >>> def lex2fs(word):  ... for fs in [kim, lee, chase]:  ... if fs['ORTH'] == word:  ... return fs  >>> subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])  >>> verb['AGT'] = subj['REF']  >>> verb['PAT'] = obj['REF']  >>> for k in ['ORTH', 'REL', 'AGT', 'PAT']:  ... print("%-5s => %s" % (k, verb[k]))  ORTH => chased REL => chase AGT => k PAT => l 6
  • 7. Ad hoc  Feature structures are pretty powerful, but the way in which we have manipulated them is extremely ad hoc.  Our next task in this chapter is to show how the framework of context free grammar and parsing can be expanded to accommodate feature structures, so that we can build analyses like this in a more generic and principled way.  We will start off by looking at the phenomenon of syntactic agreement; we will show how agreement constraints can be expressed elegantly using features, and illustrate their use in a simple grammar.  Since feature structures are a general data structure for representing information of any kind, we will briefly look at them from a more formal point of view, and illustrate the support for feature structures offered by NLTK.  In the final part of the chapter, we demonstrate that the additional expressiveness of features opens up a wide spectrum of possibilities for describing sophisticated aspects of linguistic structure. 7
  • 8. Syntactic Agreement  The following examples show pairs of word sequences, the first of which is grammatical and the second not. (We use an asterisk at the start of a word sequence to signal that it is ungrammatical.)  (1)a.this dog  b.*these dog  (2)a.these dogs  b.*this dogs  In English, nouns are usually marked as being singular or plural. The form of the demonstrative also varies: this (singular) and these (plural).  show that there are constraints on the use of demonstratives and nouns within a noun phrase: either both are singular or both are plural. A similar constraint holds between subjects and predicates:  (3)a.the dog runs  b.*the dog run  (4)a.the dogs run  b.*the dogs runs  Here we can see that morphological properties of the verb co-vary with syntactic properties of the subject noun phrase. This co-variance is called agreement. 8
  • 9. Agreement Paradigm for English Regular Verbs  singular plural  1st perI runwe run  2nd per you runyou run  3rd per he/she/it runsthey run  We can make the role of morphological properties a bit more explicit as illustrated in ex-runs and ex-run.  These representations indicate that the verb agrees with its subject in person and number.  (We use "3" as an abbreviation for 3rd person, "SG" for singular and "PL" for plural.)  Let's see what happens when we encode these agreement constraints in a context-free grammar. We will begin with the simple CFG in (5).  (5)S -> NP VP  NP -> Det N  VP -> V  Det -> 'this'  N -> 'dog'  V -> 'runs'  Grammar (5) allows us to generate the sentence this dog runs; however, what we really want to do is also generate these dogs run while blocking unwanted sequences like *this dogs run and *these dog runs. 9
  • 10. Blowing up  The most straightforward approach is to add new non-terminals and productions to the grammar:  (6)S -> NP_SG VP_SG  S -> NP_PL VP_PL  NP_SG -> Det_SG N_SG  NP_PL -> Det_PL N_PL  VP_SG -> V_SG  VP_PL -> V_PL  Det_SG -> 'this'  Det_PL -> 'these'  N_SG -> 'dog'  N_PL -> 'dogs'  V_SG -> 'runs'  V_PL -> 'run'  In place of a single production expanding S, we now have two productions, one covering the sentences involving singular subject NPs and VPs, the other covering sentences with plural subject NPs and VPs. 10
  • 11. Growing grammar size  In fact, every production in (5) has two counterparts in (6). With a small grammar, this is not really such a problem, although it is aesthetically unappealing.  However, with a larger grammar that covers a reasonable subset of English constructions, the prospect of doubling the grammar size is very unattractive.  Let's suppose now that we used the same approach to deal with first, second and third person agreement, for both singular and plural.  This would lead to the original grammar being multiplied by a factor of 6, which we definitely want to avoid.  Can we do better than this? In the next section we will show that capturing number and person agreement need not come at the cost of "blowing up" the number of productions. 11
  • 12. Using Attributes and Constraints  We spoke informally of linguistic categories having properties; for example, that a noun has the property of being plural. Let's make this explicit:  (7)N[NUM=pl]  In (7), we have introduced some new notation which says that the category N has a (grammatical) feature called NUM (short for 'number') and that the value of this feature is pl (short for 'plural').  We can add similar annotations to other categories, and use them in lexical entries:  (8)Det[NUM=sg] -> 'this' Det[NUM=pl] -> 'these' N[NUM=sg] -> 'dog' N[NUM=pl] -> 'dogs' V[NUM=sg] -> 'runs' V[NUM=pl] -> 'run'  Does this help at all? So far, it looks just like a slightly more verbose alternative to what was specified in (6).  Things become more interesting when we allow variables over feature values, and use these to state constraints:  (9)S -> NP[NUM=?n] VP[NUM=?n] NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n] VP[NUM=?n] -> V[NUM=?n]  We are using ?n as a variable over values of NUM; it can be instantiated either to sg or pl, within a given production.  We can read the first production as saying that whatever value NP takes for the feature NUM, VP must take the same value.  12
  • 13. Trees of depth one  in order to understand how these feature constraints work, it's helpful to think about how one would go about building a tree.  Lexical productions will admit the following local trees (trees of depth one): 13
  • 15. VP[NUM=?n] -> V[NUM=?n]  Production VP[NUM=?n] -> V[NUM=?n] says that the NUM value of the head verb has to be the same as the NUM value of the VP parent.  Combined with the production for expanding S, we derive the consequence that if the NUM value of the subject head noun is pl, then so is the NUM value of the VP’s head verb. 15
  • 16. Unspecified NUM  Grammar illustrated lexical productions for determiners like this and these, which require a singular or plural head noun respectively.  However, other determiners in English are not choosy about the grammatical number of the noun they combine with.  One way of describing this would be to add two lexical entries to the grammar, one each for the singular and plural versions of a determiner such as the:  Det[NUM=sg] -> 'the' | 'some' | 'several'  Det[NUM=pl] -> 'the' | 'some' | 'several'  However, a more elegant solution is to leave the NUM value underspecified and let it agree in number with whatever noun it combines with. Assigning a variable value to  NUM is one way of achieving this result:  Det[NUM=?n] -> 'the' | 'some' | 'several' 16
  • 17. Example feature-based grammar.  >>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')  % start S  # ###################  # Grammar Productions  # ###################  # S expansion productions  S -> NP[NUM=?n] VP[NUM=?n]  # NP expansion productions  NP[NUM=?n] -> N[NUM=?n]  NP[NUM=?n] -> PropN[NUM=?n]  NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]  NP[NUM=pl] -> N[NUM=pl]  # VP expansion productions  VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]  VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP 17
  • 18. Lexical productions  # ###################  # Lexical Productions  # ###################  Det[NUM=sg] -> 'this' | 'every'  Det[NUM=pl] -> 'these' | 'all'  Det -> 'the' | 'some' | 'several'  PropN[NUM=sg]-> 'Kim' | 'Jody'  N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'  N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'  IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'  TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'  IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'  TV[TENSE=pres, NUM=pl] -> 'see' | 'like'  IV[TENSE=past] -> 'disappeared' | 'walked'  TV[TENSE=past] -> 'saw' | 'liked' 18
  • 19. Fcfg file  Notice that a syntactic category can have more than one feature: for example,  V[TENSE=pres, NUM=pl].  In general, we can add as many features as we like.  A final detail about Example 9-1 is the statement %start S. This “directive” tells the parser to take S as the start symbol for the grammar.  In general, when we are trying to develop even a very small grammar, it is convenient to put the productions in a file where they can be edited, tested, and revised.  We have saved Example 9-1 as a file named feat0.fcfg in the NLTK data distribution.  You can make your own copy of this for further experimentation using nltk.data.load(). 19
  • 20. Earley chart parser  Feature-based grammars are parsed in NLTK using an Earley chart parser  After tokenizing the input, we import the load_parser function , which takes a grammar filename as input and returns a chart parser cp .  Calling the parser’s nbest_parse() method will return a list trees of parse trees;  trees will be empty if the grammar fails to parse the input and otherwise will contain one or more parse trees,  depending on whether the input is syntactically ambiguous. 20
  • 21. Trace of feature-based chart parser.  >>> tokens = 'Kim likes children'.split()  >>> from nltk import load_parser  >>> cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)  >>> trees = cp.nbest_parse(tokens)  |.Kim .like.chil.|  |[----] . .| PropN[NUM='sg'] -> 'Kim' *  |[----] . .| NP[NUM='sg'] -> PropN[NUM='sg'] *  |[----> . .| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}  |. [----] .| TV[NUM='sg', TENSE='pres'] -> 'likes' *  |. [----> .| VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[]  {?n: 'sg', ?t: 'pres'}  |. . [----]| N[NUM='pl'] -> 'children' *  |. . [----]| NP[NUM='pl'] -> N[NUM='pl'] *  |. . [---->| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}  |. [---------]| VP[NUM='sg', TENSE='pres']  -> TV[NUM='sg', TENSE='pres'] NP[] *  |[==============]| S[] -> NP[NUM='sg'] VP[NUM='sg'] * 21
  • 22. Flow Upwards  there is an implementation issue which bears on our earlier discussion of grammar size.  One possible approach to parsing productions containing feature constraints is to compile out all admissible values of the features in question so that we end up with a large, fully specified CFG  By contrast, the parser process illustrated in the previous examples works directly with the underspecified productions given by the grammar.  Feature values “flow upwards” from lexical entries, and variable values are then associated with those values via bindings (i.e., dictionaries) such as {?n: 'sg', ?t: 'pres'}.  As the parser assembles information about the nodes of the tree it is building, these variable bindings are used to instantiate values in these nodes;  thus the underspecified VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] NP[] becomes instantiated as VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] by looking up the values of ?n and ?t in the bindings. 22
  • 23. Atomic values  So far, we have only seen feature values like sg and pl. These simple values are usually called atomic—that is, they can’t be decomposed into subparts.  A special case of atomic values are Boolean values, that is, values that just specify whether a property is true or false.  For example, we might want to distinguish auxiliary verbs such as can may, will, and do with the Boolean feature AUX. Then the production V[TENSE=pres, aux=+] -> 'can' means that can receives the value pres for TENSE and + or true for AUX.  There is a widely adopted convention that abbreviates the representation of Boolean features f; instead of aux=+ or aux=-, we use +aux and -aux respectively.  These are just abbreviations, however, and the parser interprets them as though + and - are like any other atomic value. 23
  • 24. Feature Annotations  shows some representative productions:  (17) V[TENSE=pres, +aux] -> 'can'  V[TENSE=pres, +aux] -> 'may'  V[TENSE=pres, -aux] -> 'walks'  V[TENSE=pres, -aux] -> 'likes'  We have spoken of attaching “feature annotations” to syntactic categories.  A more radical approach represents the whole category—that is, the non- terminal symbol plus the annotation—as a bundle of features.  For example, N[NUM=sg] contains part-ofspeech information which can be represented as POS=N.  An alternative notation for this category, therefore, is [POS=N, NUM=sg]. 24
  • 25. Attribute Value Matrix  In addition to atomic-valued features, features may take values that are themselves feature structures.  For example, we can group together agreement features (e.g., person, number, and gender) as a distinguished part of a category, serving as the value of AGR.  In this case, we say that AGR has a complex value. (18) depicts the structure, in a format known as an attribute value matrix (AVM).  [POS = N ]  [ ]  [AGR = [PER = 3 ] ]  [ [NUM = pl ] ]  [ [GND = fem ] ]  Rendering a feature structure as an attribute value matrix. 25
Bundling Agreement Features
 Once we have the possibility of using features like AGR, we can refactor a grammar so that agreement features are bundled together.
 A tiny grammar illustrating this idea is shown in (20); a parsing sketch follows below.
 (20) S -> NP[AGR=?n] VP[AGR=?n]
      NP[AGR=?n] -> PropN[AGR=?n]
      VP[TENSE=?t, AGR=?n] -> Cop[TENSE=?t, AGR=?n] Adj
      Cop[TENSE=pres, AGR=[NUM=sg, PER=3]] -> 'is'
      PropN[AGR=[NUM=sg, PER=3]] -> 'Kim'
      Adj -> 'happy'
26
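 The grammar in (20) can be tried out directly. This is a hedged sketch assuming a recent NLTK, where FeatureGrammar.fromstring and FeatureChartParser are available (the book's load_parser on an .fcfg file is equivalent):

import nltk

grammar = nltk.grammar.FeatureGrammar.fromstring("""
S -> NP[AGR=?n] VP[AGR=?n]
NP[AGR=?n] -> PropN[AGR=?n]
VP[TENSE=?t, AGR=?n] -> Cop[TENSE=?t, AGR=?n] Adj
Cop[TENSE=pres, AGR=[NUM=sg, PER=3]] -> 'is'
PropN[AGR=[NUM=sg, PER=3]] -> 'Kim'
Adj -> 'happy'
""")
parser = nltk.parse.FeatureChartParser(grammar)
for tree in parser.parse('Kim is happy'.split()):
    print(tree)  # the AGR bundle unifies across S, NP and VP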
Processing Feature Structures
 In this section, we will show how feature structures can be constructed and manipulated in NLTK.
 We will also discuss the fundamental operation of unification, which allows us to combine the information contained in two different feature structures.
 Feature structures in NLTK are declared with the FeatStruct() constructor.
 Atomic feature values can be strings or integers.
 >>> fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
 >>> print(fs1)
 [ NUM   = 'sg'   ]
 [ TENSE = 'past' ]
 A feature structure is actually just a kind of dictionary, and so we access its values by indexing in the usual way.
 We can use our familiar syntax to assign values to features:
 >>> fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
 >>> print(fs1['GND'])
 fem
 >>> fs1['CASE'] = 'acc'
27
Complex Values in FeatStruct
 We can also define feature structures that have complex values, as discussed earlier.
 >>> fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
 >>> print(fs2)
 [       [ CASE = 'acc' ] ]
 [ AGR = [ GND  = 'fem' ] ]
 [       [ NUM  = 'pl'  ] ]
 [       [ PER  = 3     ] ]
 [                        ]
 [ POS = 'N'              ]
 >>> print(fs2['AGR'])
 [ CASE = 'acc' ]
 [ GND  = 'fem' ]
 [ NUM  = 'pl'  ]
 [ PER  = 3     ]
 >>> print(fs2['AGR']['PER'])
 3
28
Directed Acyclic Graphs
 Feature structures are not inherently tied to linguistic objects; they are general-purpose structures for representing knowledge.
 For example, we could encode information about a person in a feature structure:
 >>> print(nltk.FeatStruct(name='Lee', telno='01 27 86 42 96', age=33))
 [ age   = 33               ]
 [ name  = 'Lee'            ]
 [ telno = '01 27 86 42 96' ]
 In the next couple of pages, we are going to use examples like this to explore standard operations over feature structures.
 This will briefly divert us from processing natural language, but we need to lay the groundwork before we can get back to talking about grammars.
 It is often helpful to view feature structures as graphs, more specifically, as directed acyclic graphs (DAGs). (21) is equivalent to the preceding AVM.
 The feature names appear as labels on the directed arcs, and feature values appear as labels on the nodes that are pointed to by the arcs.
29
Feature Path
 When we look at such graphs, it is natural to think in terms of paths through the graph.
 A feature path is a sequence of arcs that can be followed from the root node.
 We will represent paths as tuples of arc labels. Thus, ('ADDRESS', 'STREET') is a feature path whose value is the node labeled 'rue Pascal'.
30
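 In NLTK, a feature path can be used directly as an index: a tuple of feature names walks the nested structure. A minimal sketch (assuming NLTK):

import nltk

person = nltk.FeatStruct(
    "[NAME='Lee', ADDRESS=[NUMBER=74, STREET='rue Pascal']]")
# Index with a feature path, i.e., a tuple of arc labels ...
print(person[('ADDRESS', 'STREET')])  # rue Pascal
# ... which is equivalent to stepwise indexing:
print(person['ADDRESS']['STREET'])    # rue Pascal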
Structure Sharing
 The value of the path ('ADDRESS') in (24) is identical to the value of the path ('SPOUSE', 'ADDRESS'). DAGs such as (24) are said to involve structure sharing or reentrancy.
 When two paths have the same value, they are said to be equivalent.
 In order to indicate reentrancy in our matrix-style representations, we will prefix the first occurrence of a shared feature structure with an integer in parentheses, such as (1). Any later reference to that structure will use the notation ->(1), as shown here.
 >>> print(nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
 ... SPOUSE=[NAME='Kim', ADDRESS->(1)]]"""))
 [ ADDRESS = (1) [ NUMBER = 74           ] ]
 [               [ STREET = 'rue Pascal' ] ]
 [                                         ]
 [ NAME    = 'Lee'                         ]
 [                                         ]
 [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
 [           [ NAME    = 'Kim' ]           ]
 The bracketed integer is sometimes called a tag or a coindex.
31
Subsumption
 It is standard to think of feature structures as providing partial information about some object, in the sense that we can order feature structures according to how general they are.
 For example, (25a) is more general (less specific) than (25b), which in turn is more general than (25c).
 (25) a. [NUMBER = 74]
      b. [NUMBER = 74           ]
         [STREET = 'rue Pascal' ]
      c. [NUMBER = 74           ]
         [STREET = 'rue Pascal' ]
         [CITY   = 'Paris'      ]
 This ordering is called subsumption; a more general feature structure subsumes a less general one. If FS0 subsumes FS1 (formally, we write FS0 ⊑ FS1), then FS1 must have all the paths and path equivalences of FS0, and may have additional paths and equivalences as well.
 Thus, (23) subsumes (24), since the latter has additional path equivalences.
 It should be obvious that subsumption provides only a partial ordering on feature structures, since some pairs of feature structures are incommensurable: a structure such as [TELNO = '01 27 86 42 96'], for example, neither subsumes nor is subsumed by (25a).
32
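 NLTK exposes this ordering through the subsumes() function in nltk.featstruct. A minimal sketch, using the structures of (25):

import nltk
from nltk.featstruct import subsumes

fs_a = nltk.FeatStruct(NUMBER=74)
fs_b = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
fs_c = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal', CITY='Paris')

print(subsumes(fs_a, fs_b))  # True: (25a) is more general than (25b)
print(subsumes(fs_b, fs_c))  # True: (25b) is more general than (25c)
print(subsumes(fs_c, fs_a))  # False: subsumption is only a partial order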
Unification
 How do we go about specializing a given feature structure? For example, we might decide that addresses should consist of not just a street number and a street name, but also a city.
 That is, we might want to merge graph (27a) with (27b) to yield (27c).
 Merging information from two feature structures is called unification, and it is supported by the unify() method.
 >>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
 >>> fs2 = nltk.FeatStruct(CITY='Paris')
 >>> print(fs1.unify(fs2))
 [ CITY   = 'Paris'      ]
 [ NUMBER = 74           ]
 [ STREET = 'rue Pascal' ]
33
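 Unification merges compatible information, but it fails outright when two structures assign conflicting values to the same path. A minimal sketch (assuming NLTK):

import nltk

fs1 = nltk.FeatStruct(NUMBER=74)
fs2 = nltk.FeatStruct(NUMBER=36)
# Conflicting atomic values cannot be reconciled, so unify() returns None.
print(fs1.unify(fs2))  # None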
Subcategorization
 Earlier, we augmented our category labels to represent different kinds of verb, and used the labels IV and TV for intransitive and transitive verbs respectively.
 This allowed us to write productions like the following:
 (27) VP -> IV
      VP -> TV NP
 Although we know that IV and TV are two kinds of V, they are just atomic nonterminal symbols from a CFG, as distinct from each other as any other pair of symbols.
 This notation doesn't let us say anything about verbs in general; e.g., we cannot say "All lexical items of category V can be marked for tense", since walk, say, is an item of category IV, not V.
 So, can we replace category labels such as TV and IV by V, along with a feature that tells us whether the verb combines with a following NP object or whether it can occur without any complement?
35
Generalized Phrase Structure Grammar
 A simple approach, originally developed for a grammar framework called Generalized Phrase Structure Grammar (GPSG), tries to solve this problem by allowing lexical categories to bear a SUBCAT feature, which tells us what subcategorization class the item belongs to.
 While GPSG used integer values for SUBCAT, the example below adopts more mnemonic values, namely intrans, trans, and clause:
 (28) VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]
      VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP
      VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar
      V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
      V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'
      V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says' | 'claims'
36
SUBCAT
 When we see a lexical category like V[SUBCAT=trans], we can interpret the SUBCAT specification as a pointer to a production in which V[SUBCAT=trans] is introduced as the head child in a VP production.
 By convention, there is a correspondence between the values of SUBCAT and the productions that introduce lexical heads.
 On this approach, SUBCAT can only appear on lexical categories; it makes no sense, for example, to specify a SUBCAT value on VP.
 As required, walk and like both belong to the category V.
 Nevertheless, walk will only occur in VPs expanded by a production with the feature SUBCAT=intrans on the right-hand side, as opposed to like, which requires SUBCAT=trans.
 In our third class of verbs above, we have specified a category SBar. This is a label for subordinate clauses, such as the complement of claim in the example You claim that you like children.
 We require two further productions to analyze such sentences (see the sketch below):
 (29) SBar -> Comp S
      Comp -> 'that'
37
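 Putting (28) and (29) together with a few NPs gives a small working grammar. This is a hedged sketch; the S production and the NP entries are illustrative additions, not part of the original productions:

import nltk

grammar = nltk.grammar.FeatureGrammar.fromstring("""
S -> NP VP
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar
SBar -> Comp S
Comp -> 'that'
NP -> 'Kim' | 'Lee' | 'children'
V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'walks'
V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'likes'
V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says'
""")
parser = nltk.parse.FeatureChartParser(grammar)
# 'says' selects an SBar complement, licensing the embedded clause.
for tree in parser.parse('Kim says that Lee walks'.split()):
    print(tree)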
Categorial Grammar
 An alternative treatment of subcategorization, due originally to a framework known as categorial grammar, is represented in feature-based frameworks such as PATR and Head-driven Phrase Structure Grammar (HPSG).
 Rather than using SUBCAT values as a way of indexing productions, the SUBCAT value directly encodes the valency of a head (the list of arguments that it can combine with).
 For example, a verb like put that takes NP and PP complements (put the book on the table) might be represented as in (31):
 (31) V[SUBCAT=<NP, NP, PP>]
 This says that the verb can combine with three arguments. The leftmost element in the list is the subject NP, while everything else — an NP followed by a PP in this case — comprises the subcategorized-for complements.
 When a verb like put is combined with appropriate complements, the requirements which are specified in the SUBCAT are discharged, and only a subject NP is needed.
39
Tree representation
 This category, which corresponds to what is traditionally thought of as VP, might be represented as follows:
 (32) V[SUBCAT=<NP>]
 Finally, a sentence is a kind of verbal category that has no requirements for further arguments, and hence has a SUBCAT whose value is the empty list.
 The tree (33) shows how these category assignments combine in a parse of Kim put the book on the table.
40
Heads revisited
 We noted in the previous section that by factoring subcategorization information out of the main category label, we could express more generalizations about properties of verbs.
 Another property of this kind is the following: expressions of category V are heads of phrases of category VP. Similarly, Ns are heads of NPs, As (i.e., adjectives) are heads of APs, and Ps (i.e., prepositions) are heads of PPs.
 Not all phrases have heads—for example, it is standard to say that coordinate phrases (e.g., the book and the bell) lack heads.
 Nevertheless, we would like our grammar formalism to express the parent/head-child relation where it holds.
 At present, V and VP are just atomic symbols, and we need to find a way to relate them using features (as we did earlier to relate IV and TV).
41
Phrasal Level
 X-bar syntax addresses this issue by abstracting out the notion of phrasal level.
 It is usual to recognize three such levels.
 If N represents the lexical level, then N' represents the next level up, corresponding to the more traditional category Nom, and N'' represents the phrasal level, corresponding to the category NP.
 (36a) illustrates a representative structure, while (36b) is the more conventional counterpart.
42
Projections
 The head of the structure (36a) is N, and N' and N'' are called (phrasal) projections of N.
 N'' is the maximal projection, and N is sometimes called the zero projection.
 One of the central claims of X-bar syntax is that all constituents share a structural similarity.
 Using X as a variable over N, V, A, and P, we say that directly subcategorized complements of a lexical head X are always placed as siblings of the head, whereas adjuncts are placed as siblings of the intermediate category, X'.
 Thus, the configuration of the two P'' adjuncts in (37) contrasts with that of the complement P'' in (36a).
43
Bar levels
 The productions in (38) illustrate how bar levels can be encoded using feature structures.
 The nested structure in (37) is achieved by two applications of the recursive rule expanding N[BAR=1]; a runnable sketch follows below.
 (38) S -> N[BAR=2] V[BAR=2]
      N[BAR=2] -> Det N[BAR=1]
      N[BAR=1] -> N[BAR=1] P[BAR=2]
      N[BAR=1] -> N[BAR=0] P[BAR=2]
44
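 To experiment with (38), the productions need a base case for N[BAR=1] and some lexical entries; the additions below are illustrative assumptions, not part of (38):

import nltk

grammar = nltk.grammar.FeatureGrammar.fromstring("""
S -> N[BAR=2] V[BAR=2]
N[BAR=2] -> Det N[BAR=1]
N[BAR=1] -> N[BAR=1] P[BAR=2]
N[BAR=1] -> N[BAR=0] P[BAR=2]
N[BAR=1] -> N[BAR=0]
V[BAR=2] -> V[BAR=0]
P[BAR=2] -> P[BAR=0] N[BAR=2]
Det -> 'the'
N[BAR=0] -> 'dog' | 'park'
V[BAR=0] -> 'barked'
P[BAR=0] -> 'in'
""")
parser = nltk.parse.FeatureChartParser(grammar)
# Prints one tree per PP attachment: as complement (sibling of N[BAR=0])
# and as adjunct (sibling of N[BAR=1]).
for tree in parser.parse('the dog in the park barked'.split()):
    print(tree)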
Auxiliary Verbs and Inversion
 Inverted clauses—where the order of subject and verb is switched—occur in English interrogatives and also after "negative" adverbs:
 (39) a. Do you like children?
      b. Can Jody walk?
 (40) a. Rarely do you see Kim.
      b. Never have I seen this dog.
 However, we cannot place just any verb in pre-subject position:
 (41) a. *Like you children?
      b. *Walks Jody?
 (42) a. *Rarely see you Kim.
      b. *Never saw I this dog.
 Verbs that can be positioned initially in inverted clauses belong to the class known as auxiliaries; as well as do, can, and have, they include be, will, and shall.
45
Inverted Clause
 We can express the restriction to auxiliaries with the following production, in which the feature INV marks inversion:
 (43) S[+INV] -> V[+AUX] NP VP
 That is, a clause marked as [+inv] consists of an auxiliary verb followed by a VP.
 (In a more detailed grammar, we would need to place some constraints on the form of the VP, depending on the choice of auxiliary.) (44) illustrates the structure of an inverted clause; a parsing sketch follows below.
46
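 A minimal sketch of (43) in action; the VP production and the lexical entries are illustrative assumptions:

import nltk

grammar = nltk.grammar.FeatureGrammar.fromstring("""
S[+INV] -> V[+AUX] NP VP
VP -> V[-AUX] NP
V[+AUX] -> 'do'
V[-AUX] -> 'like'
NP -> 'you' | 'children'
""")
parser = nltk.parse.FeatureChartParser(grammar)
for tree in parser.parse('do you like children'.split()):
    print(tree)  # rooted in S[+INV], headed by the auxiliary 'do'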
Unbounded Dependency Constructions
 Consider the following contrasts:
 (45) a. You like Jody.
      b. *You like.
 (46) a. You put the card into the slot.
      b. *You put into the slot.
      c. *You put the card.
      d. *You put.
 The verb like requires an NP complement, while put requires both a following NP and PP.
 (45) and (46) show that these complements are obligatory: omitting them leads to ungrammaticality.
47
Fillers and Gaps
 Yet there are contexts in which obligatory complements can be omitted, as (47) and (48) illustrate.
 (47) a. Kim knows who you like.
      b. This music, you really like.
 (48) a. Which card do you put into the slot?
      b. Which slot do you put the card into?
 That is, an obligatory complement can be omitted if there is an appropriate filler in the sentence, such as the question word who in (47a), the preposed topic this music in (47b), or the wh-phrases which card/slot in (48).
 It is common to say that sentences like those in (47) and (48) contain gaps where the obligatory complements have been omitted, and these gaps are sometimes made explicit using an underscore:
 (49) a. Which card do you put __ into the slot?
      b. Which slot do you put the card into __?
48
Dependency
 So, a gap can occur if it is licensed by a filler. Conversely, fillers can occur only if there is an appropriate gap elsewhere in the sentence, as shown by the following examples:
 (50) a. *Kim knows who you like Jody.
      b. *This music, you really like hip-hop.
 (51) a. *Which card do you put this into the slot?
      b. *Which slot do you put the card into this one?
 The mutual co-occurrence between filler and gap is sometimes termed a "dependency."
 One issue of considerable importance in theoretical linguistics has been the nature of the material that can intervene between a filler and the gap that it licenses; in particular, can we simply list a finite set of sequences that separate the two?
 The answer is no: there is no upper bound on the distance between filler and gap.
49
Unbounded Dependency Construction
 This fact can easily be illustrated with constructions involving sentential complements, as shown in (52).
 (52) a. Who do you like __?
      b. Who do you claim that you like __?
      c. Who do you claim that Jody says that you like __?
 Since we can have indefinitely deep recursion of sentential complements, the gap can be embedded indefinitely far inside the whole sentence.
 This constellation of properties leads to the notion of an unbounded dependency construction, that is, a filler-gap dependency where there is no upper bound on the distance between filler and gap.
50
Slash Categories
 A variety of mechanisms have been suggested for handling unbounded dependencies in formal grammars; here we illustrate the approach, due to Generalized Phrase Structure Grammar, that involves slash categories.
 A slash category has the form Y/XP; we interpret this as a phrase of category Y that is missing a subconstituent of category XP.
 For example, S/NP is an S that is missing an NP. The use of slash categories is illustrated by the parsing sketch below.
51
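 NLTK's sample grammar feat1.fcfg implements slash categories (along with the inversion analysis above), so a filler-gap question can be parsed directly; the example sentence is the book's:

import nltk
from nltk import load_parser

tokens = 'who do you claim that you like'.split()
cp = load_parser('grammars/book_grammars/feat1.fcfg')
# In NLTK 3, parse() returns an iterator of trees; older versions
# exposed the same results via nbest_parse().
for tree in cp.parse(tokens):
    print(tree)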
Case and Gender in German
 Compared with English, German has a relatively rich morphology for agreement.
 For example, the definite article in German varies with case, gender, and number, as shown in Table 9-2.
 Table 9-2. Morphological paradigm for the German definite article
 Case        Masculine  Feminine  Neuter  Plural
 Nominative  der        die       das     die
 Genitive    des        der       des     der
 Dative      dem        der       dem     den
 Accusative  den        die       das     die
52
Dative case
 Subjects in German take the nominative case, and most verbs govern their objects in the accusative case.
 However, there are exceptions, such as helfen, that govern the dative case:
 (55) a. Die Katze sieht den Hund
         the.NOM.FEM.SG cat.3.FEM.SG see.3.SG the.ACC.MASC.SG dog.3.MASC.SG
         'the cat sees the dog'
      b. *Die Katze sieht dem Hund
         the.NOM.FEM.SG cat.3.FEM.SG see.3.SG the.DAT.MASC.SG dog.3.MASC.SG
      c. Die Katze hilft dem Hund
         the.NOM.FEM.SG cat.3.FEM.SG help.3.SG the.DAT.MASC.SG dog.3.MASC.SG
         'the cat helps the dog'
      d. *Die Katze hilft den Hund
         the.NOM.FEM.SG cat.3.FEM.SG help.3.SG the.ACC.MASC.SG dog.3.MASC.SG
53
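 Case government of this kind can be enforced with a CASE feature on the object NP. NLTK ships a small German grammar that does exactly this; the example below is the book's (folgen, like helfen, takes a dative object):

import nltk
from nltk import load_parser

tokens = 'ich folge den Katzen'.split()
cp = load_parser('grammars/book_grammars/german.fcfg')
# The verb's case requirement unifies with the CASE feature of its object NP.
for tree in cp.parse(tokens):
    print(tree)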
Summary
 The traditional categories of context-free grammar are atomic symbols. An important motivation for feature structures is to capture fine-grained distinctions that would otherwise require a massive multiplication of atomic categories.
 By using variables over feature values, we can express constraints in grammar productions that allow the realization of different feature specifications to be interdependent.
 Typically we specify fixed values of features at the lexical level and constrain the values of features in phrases to unify with the corresponding values in their children.
 Feature values are either atomic or complex. A particular subcase of atomic values is the Boolean value, represented by convention as [+/- f].
 Two features can share a value (either atomic or complex). Structures with shared values are said to be re-entrant. Shared values are represented by numerical indexes (or tags) in AVMs.
54
Summary
 A path in a feature structure is a tuple of features corresponding to the labels on a sequence of arcs from the root of the graph representation.
 Two paths are equivalent if they share a value.
 Feature structures are partially ordered by subsumption. FS0 subsumes FS1 when all the information contained in FS0 is also present in FS1.
 The unification of two structures FS0 and FS1, if successful, is the feature structure FS2 that contains the combined information of both FS0 and FS1.
 If unification adds information to a path π in FS, then it also adds information to every path π' equivalent to π.
 We can use feature structures to build succinct analyses of a wide variety of linguistic phenomena, including verb subcategorization, inversion constructions, unbounded dependency constructions, and case government.
55