Words and Transducers
• Orthographic and morphological rules
• Survey of English morphology
- prefixes, suffixes
- infixes, circumfixes
- inflection, derivation
- compounding, cliticization
• Finite-state morphological parsing
- lexicon, morphotactics
- orthographic rules
• Building a finite-state lexicon
- working for words
- regular/irregular nouns
- regular/irregular verbs
• Transducers and orthographic rules
• Minimum edit distance
1. Words and Transducers (Some Concepts)
• Searching for a plural such as woodchucks is easy, because the plural just tacks an s onto the end; a disjunction written with the pipe symbol and parentheses, such as (woodchuck|woodchucks), finds both forms.
• Now consider words like fox, fish, and peccary (a wild, pig-like animal).
• Hunting for the plurals of these words takes more than just tacking on an s. The plural of
- fox is foxes;
- peccary is peccaries;
- and goose is geese.
• Further, fish usually does not change its form in the plural.
• It takes two kinds of knowledge to search correctly for the singulars and plurals of these forms:
(1) Orthographic rules tell us that English words ending in -y (after a consonant) are pluralized by changing the -y to -i- and adding -es.
(2) Morphological rules tell us that
- fish has a null plural, and
- the plural of goose is formed by changing the vowel.
• Recognizing that a word like foxes breaks down into component morphemes (fox and -es), and building a structured representation of this fact, is called morphological parsing.
• Parsing means taking an input and producing some sort of linguistic structure for it.
• To solve the morphological parsing problem, why couldn’t we just store all the plural forms of English nouns and the -ing forms of English verbs in a dictionary and parse by lookup?
Sometimes we can: for English speech recognition, for example, this is exactly what we do.
• But for many NLP applications this isn’t possible, because -ing is a productive suffix, meaning that it applies to every verb.
• Similarly, -s applies to almost every noun.
• Productive suffixes even apply to new words; thus the new word fax can automatically be used in the -ing form (faxing).
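A minimal Python sketch of why lookup alone breaks down for a productive suffix. The stored-forms table, the verb list, and both function names are hypothetical, and spelling changes such as e-deletion (make/making) are ignored.

```python
# Hypothetical toy data: a finite table of -ing forms stored in advance.
STORED_ING_FORMS = {"walking": "walk", "eating": "eat", "sleeping": "sleep"}


def parse_by_lookup(word):
    """Return the stem only if this exact surface form was stored."""
    return STORED_ING_FORMS.get(word)


def parse_by_rule(word, known_verbs):
    """Strip the productive -ing suffix and check the remaining stem."""
    if word.endswith("ing"):
        stem = word[: -len("ing")]
        if stem in known_verbs:
            return stem
    return None


verbs = {"walk", "eat", "sleep", "fax"}  # 'fax' entered the language later
print(parse_by_lookup("faxing"))         # None: faxing was never stored
print(parse_by_rule("faxing", verbs))    # fax
```

The rule needs only the stem list to grow; the stored table would need a new entry for every new verb.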
• In the next section, we survey the morphological knowledge needed for English and then study some algorithms to solve these problems.
2. Survey of English Morphology
• Morphology is the study of the way words are built up from smaller meaning-bearing units called morphemes.
- A morpheme is often defined as the minimal meaning-bearing unit in a language.
For example,
- the word fox consists of a single morpheme (the morpheme fox),
- while the word cats consists of two: (i) the morpheme cat and (ii) the morpheme -s.
• As the previous example suggests, it is often useful to distinguish two broad classes of morphemes: (1) stems and (2) affixes.
• The stem is the “main” morpheme of the word, supplying the main meaning.
- Example: in cat’s, cat is the stem.
• The affixes add “additional” meanings of various kinds.
- Example: in cat’s, ’s is the affix.
2.1 Categories of Affixes
• Affixes are further divided into four types: (1) prefixes, (2) suffixes, (3) infixes, and (4) circumfixes.
(1) Prefixes precede the stem,
e.g., the word unbuckle is composed of the stem buckle and the prefix un-.
(2) Suffixes follow the stem,
e.g., the word eats is composed of the stem eat and the suffix -s.
(3) Infixes are inserted inside the stem,
- i.e., a morpheme is inserted in the middle of a word,
e.g., in Tagalog, the affix um is infixed into the stem hingi “borrow” to produce humingi.
- Pairs such as bleed/bled are not infixation: they differ by a vowel change in the stem, not by an inserted morpheme.
(4) Circumfixes do both: part of the affix precedes the stem and part follows it.
- English doesn’t have any good examples of circumfixes, but many other languages do. In German, for example, the past participle of some verbs is formed by adding ge- to the beginning of the stem and -t to the end, so the past participle of the verb sagen (to say) is gesagt (said).
• Words can have more than one affix,
e.g., the word rewrites has
- the prefix re-,
- the stem write, and
- the suffix -s.
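The four affix types can be sketched as plain string operations on the examples above. The function names are hypothetical, and the infix position (after the first letter, as in h-um-ingi) is passed in by hand rather than derived from any general rule of Tagalog.

```python
def prefix(affix, stem):
    """Attach the affix before the stem (un- + buckle)."""
    return affix + stem


def suffix(stem, affix):
    """Attach the affix after the stem (eat + -s)."""
    return stem + affix


def infix(stem, affix, position):
    """Insert the affix inside the stem at a given character position."""
    return stem[:position] + affix + stem[position:]


def circumfix(before, stem, after):
    """Attach one part before and one part after the stem at the same time."""
    return before + stem + after


print(prefix("un", "buckle"))       # unbuckle
print(suffix("eat", "s"))           # eats
print(infix("hingi", "um", 1))      # humingi
print(circumfix("ge", "sag", "t"))  # gesagt
```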
3. Morphology to Create Words
• There are many ways to combine morphemes to create words.
• Four methods are common and play important roles in speech and language processing:
(1) inflection,
(2) derivation,
(3) cliticization, and
(4) compounding.
1. Inflection
Inflection is the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement.
- English has the inflectional morpheme -s for marking the plural on nouns, and
- the inflectional morpheme -ed for marking the past tense on verbs.
For example: play > played; player > players.
3.1 Inflectional Morphology (a. Nouns)
• English has a relatively simple inflectional system; only
(a) nouns,
(b) verbs, and
(c) sometimes adjectives
are inflected.
• Nouns have two kinds of inflection:
(i) an affix that marks plural (e.g., cat to cats), and
(ii) an affix that marks possessive (e.g., Ali’s pen).
(i) Affix that marks plural
• The regular plural is spelled -s after most nouns.
• It is spelled -es after words ending in -s (ibis/ibises), -z (waltz/waltzes), -sh (thrush/thrushes), -ch (finch/finches), and sometimes -x (box/boxes). Nouns ending in -y preceded by a consonant change the -y to -i and add -es (butterfly/butterflies).
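A minimal sketch of the regular plural spelling rules just listed. The function name is hypothetical; it always adds -es after -x even though the slide says this holds only “sometimes”, and irregular plurals (geese, fish) are out of scope.

```python
VOWELS = set("aeiou")


def regular_plural(noun):
    """Apply the regular English plural spelling rules to a noun."""
    # -es after sibilant endings: -s, -z, -sh, -ch, -x
    if noun.endswith(("s", "z", "sh", "ch", "x")):
        return noun + "es"
    # consonant + y: change -y to -i and add -es
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"


for n in ["cat", "ibis", "waltz", "thrush", "finch", "box", "butterfly", "day"]:
    print(n, "->", regular_plural(n))
```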
(ii) Affix that marks possessive
• The possessive suffix is realized by apostrophe + -s for regular singular nouns (llama’s)
• and for plural nouns not ending in -s (children’s).
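A small sketch of possessive spelling, assuming we already know whether the noun is plural. The lone-apostrophe case for regular plurals ending in -s (llamas’) is standard English spelling but is not on the slide; the function name is hypothetical, and the code uses a straight apostrophe.

```python
def possessive(noun, is_plural):
    """Spell the possessive of a noun whose number is already known."""
    if is_plural and noun.endswith("s"):
        return noun + "'"   # regular plural: lone apostrophe (llamas')
    return noun + "'s"      # singular (llama's) or plural without -s (children's)


print(possessive("llama", False))    # llama's
print(possessive("children", True))  # children's
print(possessive("llamas", True))    # llamas'
```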
3.1 Inflectional Morphology (b. Verbs)
• English verbal inflection is more complicated than nominal inflection.
• English has three kinds of verbs:
- main verbs {direct verb, action} (e.g., eat, sleep, impeach),
- modal verbs {indirect verb, weak action} (e.g., can, will, should), and
- primary verbs {supporting verb, action} (e.g., be, have, do).
• We will mostly be concerned with the main and primary verbs, because they have inflectional endings.
• Of these verbs, a large class is regular: all verbs of this class have the same endings marking the same functions.
• Regular verbs (e.g., walk) have four morphological forms:
- stem: walk
- -s form: walks
- -ing participle: walking
- past form or -ed participle: walked
• These verbs are called regular because, just by knowing the stem, we can predict the other forms by adding one of three predictable endings and making some regular spelling changes.
• Regular verbs and forms are significant in the morphology of English, first because they cover a majority of the verbs, and second because the regular class is productive.
• A productive class is one that automatically includes any new words that enter the language (e.g., fax to faxing).
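A minimal sketch that predicts the four forms of a regular verb from its stem. Only two spelling changes are handled (-es after sibilants, and the silent final -e); consonant doubling and y-replacement are deliberately left out, so beg would come out wrong here. The names are hypothetical.

```python
SIBILANT_ENDINGS = ("s", "z", "x", "sh", "ch")


def regular_verb_forms(stem):
    """Predict (stem, -s form, -ing participle, -ed form) for a regular verb."""
    s_form = stem + ("es" if stem.endswith(SIBILANT_ENDINGS) else "s")
    if stem.endswith("e"):
        ed_form = stem + "d"  # bake -> baked
        # a silent final -e drops before -ing (baking), but -ee keeps it (agreeing)
        ing_form = (stem if stem.endswith("ee") else stem[:-1]) + "ing"
    else:
        ed_form = stem + "ed"
        ing_form = stem + "ing"
    return stem, s_form, ing_form, ed_form


print(regular_verb_forms("walk"))  # ('walk', 'walks', 'walking', 'walked')
print(regular_verb_forms("bake"))  # ('bake', 'bakes', 'baking', 'baked')
print(regular_verb_forms("fax"))   # ('fax', 'faxes', 'faxing', 'faxed')
```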
• Irregular verbs are those that have more or less idiosyncratic forms of inflection.
• Irregular verbs in English often have five different forms, but can have as many as eight (e.g., be) or as few as three (e.g., cut or hit).
• Note that an irregular verb can inflect in the past form (also called the preterite) by changing its vowel (eat/ate), its vowel and some consonants (catch/caught), or with no change at all (cut/cut).
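One common way to combine idiosyncratic forms with the regular rule is an exception table consulted before the rule applies. This sketch assumes a tiny hypothetical table and a deliberately naive regular fallback with no spelling changes.

```python
# Hypothetical exception table: stem -> past (preterite) form.
IRREGULAR_PAST = {"eat": "ate", "catch": "caught", "cut": "cut", "hit": "hit"}


def past_form(stem):
    """Check the exception table first; otherwise fall back to regular -ed."""
    return IRREGULAR_PAST.get(stem, stem + "ed")


for verb in ["eat", "catch", "cut", "walk"]:
    print(verb, "->", past_form(verb))
```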
Irregular verb examples: (table not reproduced)
The -s form is used in the “habitual present” to distinguish the third-person singular ending (She jogs every Tuesday) from the other choices of person and number (I/you/we/they jog every Tuesday).
In addition to noting which suffixes can be attached to which stems, we need to capture the fact that a number of regular spelling changes occur at these morpheme boundaries.
For example, a single consonant letter is doubled before adding the -ing and -ed suffixes (beg/begging/begged).
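A rough sketch of the consonant-doubling rule. Whether to double really depends on stress (compare begin/beginning with visit/visiting), which spelling alone cannot reveal, so the heuristic here (a one-vowel-group stem ending in consonant + vowel + consonant, never doubling w, x, or y) is an assumption that covers beg but is not a complete rule.

```python
import re


def doubles_final_consonant(stem):
    """Heuristic: one vowel group, ending in consonant + single vowel + consonant."""
    one_vowel_group = len(re.findall(r"[aeiou]+", stem)) == 1
    # w, x and y are never doubled (snowing, faxing, playing)
    ends_cvc = re.search(r"[^aeiou][aeiou][^aeiouwxy]$", stem) is not None
    return one_vowel_group and ends_cvc


def add_suffix(stem, suffix):
    """Attach -ing or -ed, doubling the final consonant when the heuristic says so."""
    if doubles_final_consonant(stem):
        stem += stem[-1]
    return stem + suffix


for stem in ["beg", "stop", "walk", "fax", "visit"]:
    print(add_suffix(stem, "ing"), add_suffix(stem, "ed"))
```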
2. Derivation
Derivation is the combination of a word stem with a grammatical morpheme, mainly involving adjectives, nouns, and verbs, and usually resulting in a word of a different class, often with a meaning that is hard to predict exactly.
For example, the verb computerize can take the derivational suffix -ation to produce the noun computerization.
3.2 Derivational Morphology
Case 1: Verb/Adjective to Noun
While English inflection is relatively simple compared to that of other languages, derivation in English is quite complex.
A very common kind of derivation in English is the formation of new nouns, often from verbs or adjectives. This process is called nominalization.
For example, the suffix -ation produces nouns from verbs, often verbs ending in the suffix -ize (computerize → computerization). Other particularly productive English nominalizing suffixes include -ee (appoint → appointee), -er (kill → killer), and -ness (fuzzy → fuzziness).
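A sketch of the -ize → -ization pattern: drop the final -e of -ize, then add -ation. It is written as a partial function that fires only on verbs ending in -ize, mirroring the point that -ation is productive for that class but not for every verb; the function name is hypothetical.

```python
def nominalize_ize(verb):
    """Apply -ation to an -ize verb (computerize -> computerization).

    Returns None for verbs outside the -ize class.
    """
    if not verb.endswith("ize"):
        return None
    return verb[:-1] + "ation"  # drop the final -e, then add -ation


print(nominalize_ize("computerize"))  # computerization
print(nominalize_ize("fossilize"))    # fossilization
print(nominalize_ize("eat"))          # None
```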
Case 2: Verb/Noun to Adjective
Adjectives can also be derived from nouns and verbs, for example with the suffixes -al (computation → computational), -able (embrace → embraceable), and -less (clue → clueless).
Derivation in English is more complex than inflection for a number of reasons. One is that it is generally less productive: even a nominalizing suffix like -ation, which can be added to almost any verb ending in -ize, cannot be added to absolutely every verb (we cannot say *eatation or *spellation).
3. Cliticization
Cliticization is the combination of a word stem with a clitic.
A clitic is a morpheme that acts syntactically like a word, but is reduced in form and attached (phonologically and sometimes orthographically) to another word.
For example, the English morpheme ’ve in the word I’ve is a clitic.
3.3 Cliticization Morphology
• The phonological behavior of clitics is like that of affixes: they tend to be short and unaccented. Their syntactic behavior is more like that of words, often acting as pronouns, articles, conjunctions, or verbs.
• Clitics preceding a word are called proclitics (e.g., ’t in ’tis, i.e., it is), while those following a word are called enclitics (e.g., ’m in I’m).
• Note that clitics in English can be ambiguous: she’s can mean she is or she has. Correctly segmenting off clitics in English is, however, simplified by the presence of the apostrophe (’).
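Because English clitics are marked by the apostrophe, a first-pass segmenter can split at it and map each clitic to its possible full forms. The clitic table and function name are hypothetical; the sketch returns every reading of an ambiguous clitic such as ’s rather than choosing one, and it would also (wrongly) split a possessive like Ali’s.

```python
# Hypothetical sample of English enclitics and their possible full forms.
CLITICS = {"'ve": ["have"], "'m": ["am"], "'s": ["is", "has"], "'ll": ["will"]}


def split_clitic(token):
    """Split "she's" into ("she", ["is", "has"]); return (token, []) if no clitic."""
    token = token.replace("\u2019", "'")  # normalize a curly apostrophe
    for clitic, expansions in CLITICS.items():
        if token.lower().endswith(clitic) and len(token) > len(clitic):
            return token[: -len(clitic)], expansions
    return token, []


print(split_clitic("I've"))   # ('I', ['have'])
print(split_clitic("she's"))  # ('she', ['is', 'has'])
```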
4. Compounding
Compounding is the combination of multiple word stems.
For example, the noun doghouse is the concatenation of the morpheme dog with the morpheme house.
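A sketch of splitting a closed compound into two known stems, assuming a small hypothetical stem list. Real compounds can have more than two parts and more than one valid split, so this version returns every two-part split it finds.

```python
STEMS = {"dog", "house", "book", "case"}  # hypothetical stem list


def two_part_splits(word, stems):
    """Return every way to split word into two stems from the list."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in stems and word[i:] in stems
    ]


print(two_part_splits("doghouse", STEMS))  # [('dog', 'house')]
print(two_part_splits("bookcase", STEMS))  # [('book', 'case')]
```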
4. Finite-State Morphological Parsing
• Table (not reproduced): inputs from English, morphologically parsed in the Morphological Parse column.
• The second column contains the stem of each word as well as assorted morphological features. These features specify additional information about the stem.
For example, the features
+N : means that the word is a noun;
+Sg : means it is singular;
+Pl : means it is plural;
+PresPart : means present participle (ending in -ing);
+PastPart : means past participle (ending in -ed for regular verbs).
• Note that some of the input forms (like caught, goose, canto, or vino) will be ambiguous between different morphological parses. For now, we will consider the goal of morphological parsing to be merely listing all possible parses.
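The “list all possible parses” goal can be pictured as a mapping from each surface form to a list of stem-plus-feature analyses. The entries below are hand-written for illustration in the +N/+Sg/+Pl style of this section (+V and +Past are assumed feature names), not the output of a real parser.

```python
# Hypothetical hand-written analyses; +V and +Past are assumed feature names.
PARSES = {
    "cats": ["cat +N +Pl"],
    "cat": ["cat +N +Sg"],
    "geese": ["goose +N +Pl"],
    "goose": ["goose +N +Sg", "goose +V"],
    "caught": ["catch +V +PastPart", "catch +V +Past"],
}


def all_parses(word):
    """Return every listed analysis of a surface form (empty list if unknown)."""
    return PARSES.get(word, [])


for word in ["cats", "goose", "caught"]:
    print(word, "->", all_parses(word))
```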
• In order to build a morphological parser, we’ll need at least the following:
(1) Lexicon: the list of stems and affixes, together with basic information about them (whether a stem is a noun stem or a verb stem, etc.).
(2) Morphotactics: the model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word. For example, the fact that the English plural morpheme follows the noun rather than preceding it is a morphotactic fact (e.g., in cats, cat is the stem and -s is the plural morpheme that follows it).
(3) Orthographic rules: spelling rules used to model the changes that occur in a word, usually when two morphemes combine (e.g., the y → ie spelling rule that changes city + -s to cities rather than citys).
4.1 Building a Finite-State Lexicon (Working for Words)
• A lexicon is a repository for words.
• The simplest possible lexicon would consist of an explicit list of every word of the language (every word, including abbreviations such as “AAA” and proper names such as “Jane” or “Beijing”), as follows:
a, AAA, AA, Aachen, aardvark, aardwolf, aba, abaca, aback, . . .
• There are many ways to model morphotactics; one of the most common is the finite-state automaton.
4.2 Building a Finite-State Lexicon (Reg/Irreg Noun)
reg-noun: The FSA assumes that the lexicon includes regular nouns (reg-noun) that take the regular -s plural (e.g., cat, dog, fox, aardvark). These are the vast majority of English nouns; for now we will ignore the fact that the plural of words like fox has an inserted e: foxes.
irreg-sg-noun / irreg-pl-noun: The lexicon also includes irregular noun forms that don’t take -s,
- both singular irreg-sg-noun (goose, mouse) and
- plural irreg-pl-noun (geese, mice).
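A minimal recognizer for such a noun FSA, assuming three states: q0 (start), q1 (after a regular stem), and q2 (after an irregular form or a plural -s), with q1 and q2 accepting. The lexicon is a toy sample and all names are hypothetical. Like the FSA described above, it ignores e-insertion, so it rejects foxes and accepts foxs.

```python
# Toy lexicon (hypothetical): word class -> morphemes in that class.
LEXICON = {
    "reg-noun": {"cat", "dog", "fox", "aardvark"},
    "irreg-sg-noun": {"goose", "mouse"},
    "irreg-pl-noun": {"geese", "mice"},
    "plural": {"s"},
}

# Transitions: (state, word class) -> next state.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q1", "plural"): "q2",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
}
ACCEPTING = {"q1", "q2"}


def accepts(word, state="q0"):
    """Return True if word can be split into morphemes that the FSA accepts."""
    if word == "":
        return state in ACCEPTING
    for (src, word_class), dst in TRANSITIONS.items():
        if src != state:
            continue
        for morph in LEXICON[word_class]:
            if word.startswith(morph) and accepts(word[len(morph):], dst):
                return True
    return False


for word in ["cats", "geese", "mice", "mices", "foxes", "foxs"]:
    print(word, accepts(word))
```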
4.3 Building a Finite-State Lexicon (Reg/Irreg Verb)
• This lexicon has three stem classes (reg-verb-stem, irreg-verb-stem, and irreg-past-verb-form), plus four more affix classes (-ed past, -ed participle, -ing participle, and third singular -s).
Table (not reproduced): lexicon for the finite-state verb automaton.
• English derivational morphology is significantly more complex than English inflectional morphology, and so automata for modeling English derivation tend to be quite complex.
4.4 Building a Finite-State Lexicon (Example 1)
• Consider a relatively simple case of derivation: the morphotactics of English adjectives. Here are some examples from Antworth (1990):
e.g., big, bigger, biggest, ...
• An initial hypothesis might be that adjectives can have an optional prefix (un-), an obligatory root (big, cool, etc.), and an optional suffix (-er, -est, or -ly).
Figure (not reproduced): an FSA for these adjective combinations.
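The initial hypothesis (optional un-, obligatory root, optional -er/-est/-ly) can be written as a single regular expression. This sketch assumes a small hypothetical root list and ignores spelling changes (so it rejects bigger); it already shows the overgeneration discussed next, accepting unbig, unfast, and fastly.

```python
import re

ROOTS = ["big", "cool", "happy", "red", "real", "clear", "small", "fast"]

# (un)? root (er|est|ly)?  -- spelling changes are ignored
NAIVE_ADJ = re.compile(r"(un)?(" + "|".join(ROOTS) + r")(er|est|ly)?")


def naive_accepts(word):
    """Return True if the whole word matches the naive adjective pattern."""
    return NAIVE_ADJ.fullmatch(word) is not None


for word in ["coolest", "unclearly", "unbig", "unfast", "fastly"]:
    print(word, naive_accepts(word))
```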
Problem defined:
• While this FSA will recognize all the adjectives, it will also recognize ungrammatical forms like unbig, unfast, oranger, or smally. We need to set up classes of roots and specify their possible suffixes.
- Thus adj-root1 would include adjectives that can occur with un- and -ly (clear, happy, and real),
- while adj-root2 would include adjectives that can’t (big, small).
• A larger FSA for English derivational morphology (figure not reproduced) models a number of derivational facts, such as the well-known generalization that any verb ending in -ize can be followed by the nominalizing suffix -ation.
Case study:
Given the word fossilize, we can predict the word fossilization by following states q0, q1, and q2. Similarly, adjectives ending in -al or -able at q5 (equal, formal, realizable) can take the suffix -ity, or …
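Splitting the roots into classes removes that overgeneration. A sketch, assuming the two classes named above (adj-root1 may take un- and -ly; adj-root2 may not) and letting both take -er/-est; the class members are a small hypothetical sample and spelling changes (happier, bigger) are again ignored.

```python
ADJ_ROOT1 = {"clear", "happy", "real"}   # may take un- and -ly
ADJ_ROOT2 = {"big", "small", "fast"}     # may not take un- or -ly
SUFFIXES = ("", "er", "est", "ly")


def classed_accepts(word):
    """Accept (un-) root (-er | -est | -ly) subject to each root class's limits."""
    has_prefix = word.startswith("un")
    body = word[2:] if has_prefix else word
    for suffix in SUFFIXES:
        if not body.endswith(suffix):
            continue
        root = body[: len(body) - len(suffix)]
        if root in ADJ_ROOT1:
            return True
        if root in ADJ_ROOT2 and not has_prefix and suffix != "ly":
            return True
    return False


for word in ["unclearly", "really", "smallest", "unbig", "unfast", "fastly"]:
    print(word, classed_accepts(word))
```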
4.4 Building a Finite-State Lexicon (Class Participation)
• Design and build a finite-state lexicon for derivation in which the morphotactics of English adjectives and an FSA for the following combinations are defined [Note: design a single FSA covering all the words]:
- cool, cooler, coolest, coolly;
- happy, happier, happiest, happily;
- red, redder, reddest;
- unhappy, unhappier, unhappiest, unhappily;
- real, unreal, really;
- clear, clearer, clearest, clearly, unclear, unclearly.
4.4 Building a Finite-State Lexicon (Assignments)
• Consider the following FSA of English derivational morphology, and describe the following state sequences:
q0->q1->q2->q3
q0->q1->q2->q4
q0->q5->q6
q0->q5->q2->q3
q0->q5->q2->q4
q0->q5->q6
q0->q5->q9
q0->q8->q9
q0->q8->q6
q0->q7->q8->q9
q0->q10->q8->q6
q0->q10->q8->q9
q0->q10->q8->q6
q0->q11->q8->q9
q0->q11->q8->q6
Figure (not reproduced): FSA for English derivational morphology with states q0 to q11.
5. Transducers and Orthographic Rules
• The previous method will successfully recognize words like aardvarks and mice.
• But just concatenating the morphemes won’t work for cases where there is a spelling change: it would incorrectly reject an input like foxes and accept an input like foxs.
• We need to deal with the fact that English often requires spelling changes at morpheme boundaries by introducing spelling rules (or orthographic rules).
Some spelling rules: consonant doubling (beg/begging), E deletion (make/making), E insertion (watch/watches), Y replacement (try/tries), and K insertion (panic/panicked).
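To see why plain concatenation fails, compare it with a version that applies an E-insertion spelling rule at the morpheme boundary. This is a sketch on plain strings, not a transducer; the boundary test (stem ends in s, z, x, ch, or sh and the suffix is -s) follows the informal rule in this section, and both function names are hypothetical.

```python
def concatenate(stem, suffix):
    """Naive morphotactics: just glue the morphemes together."""
    return stem + suffix


def concatenate_with_e_insertion(stem, suffix):
    """Insert e between a sibilant-final stem and the suffix -s."""
    if suffix == "s" and stem.endswith(("s", "z", "x", "ch", "sh")):
        return stem + "e" + suffix
    return stem + suffix


print(concatenate("fox", "s"))                   # foxs
print(concatenate_with_e_insertion("fox", "s"))  # foxes
print(concatenate_with_e_insertion("cat", "s"))  # cats
```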
• We could write an E-insertion rule that performs the mapping from the intermediate level to the surface level.
• Such a rule might say something like “insert an e on the surface tape just when the lexical tape has a morpheme ending in s, z, x, ch, sh, etc., and the next morpheme is -s”.
• This rule can be formalized in the rule notation of Chomsky and Halle (1968).
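In the Chomsky and Halle notation, a → b / c __ d reads “rewrite a as b when it occurs between c and d”. One standard textbook form of the E-insertion rule is shown below, where ε is the empty string, ^ marks a morpheme boundary, and # marks a word boundary. The letter set {x, s, z} is the core case; extending it to ch and sh, as in the informal rule above, is a straightforward generalization.

```latex
\[
\varepsilon \rightarrow e \;/\; \{x, s, z\}\,\text{\textasciicircum}\ \underline{\hspace{1.5em}}\ s\#
\]
```

Read aloud: insert e after a morpheme-final x, s, or z when the next morpheme is s and it ends the word.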
6. Minimum Edit Distance
• The distance between two strings is a measure of how alike the two strings are.
• The minimum edit distance between two strings is the minimum number of editing operations (insertion, deletion, substitution) needed to transform one string into the other.
• For example, the gap between the words intention and execution is five operations.
• The minimum edit distance is computed by dynamic programming. Dynamic programming is the name for a class of algorithms that apply a table-driven method to solve problems by combining solutions to subproblems.
• This class includes many of the most commonly used algorithms in speech and language processing.
• The intuition of a dynamic programming problem is that a large problem can be solved by properly combining the solutions to various subproblems.
• For example, consider the sequence or “path” of transformed words that comprise the minimum edit distance between the strings intention and execution:
intention → ntention (delete i) → etention (substitute e for n) → exention (substitute x for t) → exenution (insert u) → execution (substitute c for n).
• Dynamic programming algorithms for sequence comparison work by creating a distance matrix with one column for each symbol in the target sequence and one row for each symbol in the source sequence (i.e., target along the bottom, source along the side).
• For minimum edit distance, this matrix is the edit-distance matrix. Each cell edit-distance[i,j] contains the distance between the first i characters of the target and the first j characters of the source.
• Each cell can be computed as a simple function of the surrounding cells; thus, starting from the beginning of the matrix, it is possible to fill every entry.
• The value in each cell is computed by taking the minimum of the three possible paths through the matrix that arrive there (an insertion, a deletion, or a substitution/match).
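A compact sketch of the dynamic-programming computation, assuming unit cost for insertion, deletion, and substitution; with these costs intention/execution comes out at five operations, while charging 2 per substitution gives 8. For readability, rows here index the source and columns the target, and D[i][j] is the distance between the first i source characters and the first j target characters, so the index order differs from edit-distance[i,j] above. The function and parameter names are hypothetical.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Fill the (len(source)+1) x (len(target)+1) table bottom-up."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost          # delete all of source[:i]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost          # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            D[i][j] = min(
                D[i - 1][j] + del_cost,                       # deletion
                D[i][j - 1] + ins_cost,                       # insertion
                D[i - 1][j - 1] + (0 if same else sub_cost),  # match/substitution
            )
    return D[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

Each cell looks only at its left, upper, and upper-left neighbours, which is why the table can be filled row by row in O(n·m) time.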

More Related Content

Similar to chapter4.pptx natural language processing (20)

PPT
Morphology
Mae Selim
 
PPTX
Chapter 5.1.pptx
brianjars
 
PDF
Morphology.....a major topic in Linguistics
saroshzainab
 
PPT
lect4-morphology.ppt
Waqar Ahmed Memon
 
PPT
lect4-morphology.ppt
Krismalita
 
PPTX
lect4-morphology.pptx
SahilAli23165
 
PDF
207 morphbooklet
Ignatius Joseph Estroga
 
PPT
unit III. Morphology.ppt
SistemadeEstudiosMed
 
PPTX
A Brief Introduction of Morphology
amna-shahid
 
PPTX
Morphology
Abdulsalam Mohammed
 
DOCX
Classes of Words
Tutik SR
 
PPTX
Types of Forming Words (Affiation, Suffi
teguhimansyah4
 
PPT
Morphemes & Types of morphemes
MahrukhShehzadi1
 
PPTX
Issues in Ling - Chap 4 in linguistics.pptx
ssuser2a8f15
 
PPTX
Su 2012 ss morphology pp
Christian Añamisi
 
PPT
unit 2.ppt english word structure and formation
NamDoMinh2
 
PDF
Morpho 12 13
blessedkkr
 
PPT
Lecture 05 sonia
sonianad
 
PPT
Morphology Son
fatmasima
 
Morphology
Mae Selim
 
Chapter 5.1.pptx
brianjars
 
Morphology.....a major topic in Linguistics
saroshzainab
 
lect4-morphology.ppt
Waqar Ahmed Memon
 
lect4-morphology.ppt
Krismalita
 
lect4-morphology.pptx
SahilAli23165
 
207 morphbooklet
Ignatius Joseph Estroga
 
unit III. Morphology.ppt
SistemadeEstudiosMed
 
A Brief Introduction of Morphology
amna-shahid
 
Morphology
Abdulsalam Mohammed
 
Classes of Words
Tutik SR
 
Types of Forming Words (Affiation, Suffi
teguhimansyah4
 
Morphemes & Types of morphemes
MahrukhShehzadi1
 
Issues in Ling - Chap 4 in linguistics.pptx
ssuser2a8f15
 
Su 2012 ss morphology pp
Christian Añamisi
 
unit 2.ppt english word structure and formation
NamDoMinh2
 
Morpho 12 13
blessedkkr
 
Lecture 05 sonia
sonianad
 
Morphology Son
fatmasima
 

More from ssuser77162c (14)

PPTX
REGULAR EXPRESSION FOR NATURAL LANGUAGES
ssuser77162c
 
PPTX
satellitesystems.pptx mobile database computing
ssuser77162c
 
PPTX
FINALEXAMCHAPTER3.pptx remote sensing diagram
ssuser77162c
 
PPT
CHAPTER2.ppt DATABASES FOR MULTIMEDIA COMPUTING
ssuser77162c
 
PPTX
UNIT-3.pptx GIS INFORMATION SYSTEMFOR COMPUTING
ssuser77162c
 
PPTX
chapter21-parallel processing. computing
ssuser77162c
 
PPT
RTET-2024_52.ppt research presentation for
ssuser77162c
 
PPTX
FSA.pptx natural language prsgdsgocessing
ssuser77162c
 
PPTX
Chapter 5-Numpy-Pandas.pptx python programming
ssuser77162c
 
PPT
Semantic natural language processing ppt
ssuser77162c
 
PPTX
arthimetic operator,classes,objects,instant
ssuser77162c
 
PPTX
Chapter1 python introduction syntax general
ssuser77162c
 
PPT
intro.ppt
ssuser77162c
 
PPT
8034.ppt
ssuser77162c
 
REGULAR EXPRESSION FOR NATURAL LANGUAGES
ssuser77162c
 
satellitesystems.pptx mobile database computing
ssuser77162c
 
FINALEXAMCHAPTER3.pptx remote sensing diagram
ssuser77162c
 
CHAPTER2.ppt DATABASES FOR MULTIMEDIA COMPUTING
ssuser77162c
 
UNIT-3.pptx GIS INFORMATION SYSTEMFOR COMPUTING
ssuser77162c
 
chapter21-parallel processing. computing
ssuser77162c
 
RTET-2024_52.ppt research presentation for
ssuser77162c
 
FSA.pptx natural language prsgdsgocessing
ssuser77162c
 
Chapter 5-Numpy-Pandas.pptx python programming
ssuser77162c
 
Semantic natural language processing ppt
ssuser77162c
 
arthimetic operator,classes,objects,instant
ssuser77162c
 
Chapter1 python introduction syntax general
ssuser77162c
 
intro.ppt
ssuser77162c
 
8034.ppt
ssuser77162c
 
Ad

Recently uploaded (20)

PDF
Chapter-V-DED-Entrepreneurship: Institutions Facilitating Entrepreneurship
Dayanand Huded
 
PDF
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
PPTX
CATEGORIES OF NURSING PERSONNEL: HOSPITAL & COLLEGE
PRADEEP ABOTHU
 
PDF
community health nursing question paper 2.pdf
Prince kumar
 
PPTX
Growth and development and milestones, factors
BHUVANESHWARI BADIGER
 
PDF
The History of Phone Numbers in Stoke Newington by Billy Thomas
History of Stoke Newington
 
PDF
QNL June Edition hosted by Pragya the official Quiz Club of the University of...
Pragya - UEM Kolkata Quiz Club
 
PDF
ARAL-Orientation_Morning-Session_Day-11.pdf
JoelVilloso1
 
PDF
Reconstruct, Restore, Reimagine: New Perspectives on Stoke Newington’s Histor...
History of Stoke Newington
 
PPTX
How to Set Maximum Difference Odoo 18 POS
Celine George
 
PDF
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - GLOBAL SUCCESS - CẢ NĂM - NĂM 2024 (VOCABULARY, ...
Nguyen Thanh Tu Collection
 
PPTX
STAFF DEVELOPMENT AND WELFARE: MANAGEMENT
PRADEEP ABOTHU
 
PPTX
How to Set Up Tags in Odoo 18 - Odoo Slides
Celine George
 
PDF
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
PDF
Isharyanti-2025-Cross Language Communication in Indonesian Language
Neny Isharyanti
 
PDF
Exploring the Different Types of Experimental Research
Thelma Villaflores
 
PPTX
MENINGITIS: NURSING MANAGEMENT, BACTERIAL MENINGITIS, VIRAL MENINGITIS.pptx
PRADEEP ABOTHU
 
PDF
The Different Types of Non-Experimental Research
Thelma Villaflores
 
PPTX
Stereochemistry-Optical Isomerism in organic compoundsptx
Tarannum Nadaf-Mansuri
 
PPTX
How to Convert an Opportunity into a Quotation in Odoo 18 CRM
Celine George
 
Chapter-V-DED-Entrepreneurship: Institutions Facilitating Entrepreneurship
Dayanand Huded
 
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
CATEGORIES OF NURSING PERSONNEL: HOSPITAL & COLLEGE
PRADEEP ABOTHU
 
community health nursing question paper 2.pdf
Prince kumar
 
Growth and development and milestones, factors
BHUVANESHWARI BADIGER
 
The History of Phone Numbers in Stoke Newington by Billy Thomas
History of Stoke Newington
 
QNL June Edition hosted by Pragya the official Quiz Club of the University of...
Pragya - UEM Kolkata Quiz Club
 
ARAL-Orientation_Morning-Session_Day-11.pdf
JoelVilloso1
 
Reconstruct, Restore, Reimagine: New Perspectives on Stoke Newington’s Histor...
History of Stoke Newington
 
How to Set Maximum Difference Odoo 18 POS
Celine George
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - GLOBAL SUCCESS - CẢ NĂM - NĂM 2024 (VOCABULARY, ...
Nguyen Thanh Tu Collection
 
STAFF DEVELOPMENT AND WELFARE: MANAGEMENT
PRADEEP ABOTHU
 
How to Set Up Tags in Odoo 18 - Odoo Slides
Celine George
 
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
Isharyanti-2025-Cross Language Communication in Indonesian Language
Neny Isharyanti
 
Exploring the Different Types of Experimental Research
Thelma Villaflores
 
MENINGITIS: NURSING MANAGEMENT, BACTERIAL MENINGITIS, VIRAL MENINGITIS.pptx
PRADEEP ABOTHU
 
The Different Types of Non-Experimental Research
Thelma Villaflores
 
Stereochemistry-Optical Isomerism in organic compoundsptx
Tarannum Nadaf-Mansuri
 
How to Convert an Opportunity into a Quotation in Odoo 18 CRM
Celine George
 
Ad

chapter4.pptx natural language processing

  • 1. Words and Transducers  Orthographic and Morphological rules,  Survey of English morphology, - Prefixes, suffixes, - Infixes, circumfixes, - inflection, derivation, - compounding, cliticization.  Finite-state Morphological parsing, - lexicon, morphotactics, - orthographic rules,  Building a finite-state Lexicon, - Working for words, - Reg/Irreg noun, - Reg/Irreg verb,  Transducers and Orthographic rules,  Minimum Edit Distance.
  • 2.  Plural e.g., woodchucks was easy to search these type of plurals just tacks an s on to the end. (e.g., using disjunctions or Pipe Symbol And Paranthesis)  Consider words like FOX, and a FISH, and PUCARRY a soft- drink.  Hunting for the plurals of these words takes more than just tacking on an S.  The plural of - fox is foxes; - of pucarry is pucarries; - and of goose is geese.  Further, fish don’t usually change their form when they are plural 1. Words and Transducers (Some Concepts)
  • 3.  It takes two kinds of knowledge to correctly search for singulars and plurals of these forms/ (1)Orthographic rules tell us that English words ending in -y are pluralized by changing the -y to -i- and adding an -es. (2)Morphological rules tell us that - fish has a null plural, and that - the plural of goose is formed by changing the vowel.  Recognizing that a word foxes breaks down into component morphemes (fox and -es) and building a structured representation of this fact is called morphological parsing  Parsing means taking an input and producing some sort of linguistic structure for it 1. Words and Transducers (Some Concepts) (Cont..)
  • 4.  To solve the morphological parsing problem, why couldn’t we just store all the plural forms of English nouns and -ing forms of English verbs in a dictionary and do parsing by lookup? Sometimes we can do this For example; for English speech recognition this is exactly what we do.  But, for many NLP applications this isn’t possible because -ing is a productive suffix.  Mean that it applies to every verb.  Similarly -s applies to almost every noun.  Productive suffixes even apply to new words; thus the new word fax can automatically be used in the -ing form 1. Words and Transducers (Some Concepts) (Cont..)
  • 5.  Now in next section, we will survey MORPHOLOGICAL KNOWLEDGE for English language and then study some algorithms to solve these problems. 1. Words and Transducers (Some Concepts) (Cont..)
  • 6.  Morphology is the study of the way words are built up from smaller meaning-bearing units, morphemes. - A Morpheme is often defined as the minimal meaning- bearing unit in a language. For example - the word fox consists of a single morpheme (the morpheme fox). -while, the word cats consists of two: (i) the morpheme cat and (ii) the morpheme -s. 2. Survey of English Morphology
  • 7.  Previous example suggests, it is often useful to distinguish two broad classes of morphemes: (1) stems and (2) affixes.  The stem is the “main” morpheme of the word, supplying the main meaning. - example; In Cat’s, Cat is stem.  The affixes add “additional” meanings of various kinds. - example; In Cat’s, ’s is affixes. 2. Survey of English Morphology (Cont..)
  • 8. 2. Survey of English Morphology 2.1 Categories of Affixes  Affixes are further divided into 4 types; (1)prefixes, (2) suffixes, (3) infixes, and (4) circumfixes. (1)Prefixes precede the stem, e.g., The word unbuckle is composed of a stem buckle and the prefix un-. (2)Suffixes follow the stem, e.g., the word eats is composed of a stem eat and the suffix -s. (3)Infixes, are inserted inside the stem. - a morpheme is inserted in the middle of a word. e.g., the affix e, infixed to the stem bled “borrow” to produce bleed. the affix um, infixed to the stem hingi “borrow” to
  • 9. - English doesn’t have any good examples of circumfixes, but many other languages do. In German, e.g., adding ge- to the beginning of the stem and -t to the end; so the past participle of the verb sagen (to say) is gesagt (said).  Words can have more than one affix e.g., word “rewrites” have  prefix “re”,  the stem “write” and  suffix “s” 2. Survey of English Morphology 2.1 Categories of Affixes (4) Circumfixe, circumfixes do both (prefixes and suffixes). (Cont..)
  • 10.  There are many ways to combine morphemes to create words.  Four methods are common and play important roles in speech and language processing: (1) Inflection, (2) Derivation, (3) Cliticization, and (4) Compounding. 3. Morphology to create Words
  • 11. 1. Inflection It is the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement. -English has the inflectional morpheme -s for marking the plural on nouns, and - the inflectional morpheme -ed for marking the past tense on verbs For example: Play > Played Player > Players 3. Morphology to create Words (Cont..)
  • 12.  English has simple inflectional system with; (a) nouns, (b) verbs and (c) some times adjectives.  Nouns have two kind of inflections: (i) Affix that marks plural. (e.g., cat to cats) (ii) Affix that marks possessive (e.g., Ali’s Pen) (iii)Affix that marks plural  Regular plural is spelled -s after most nouns,  it is spelled -es after words ending in -s (ibis/ibises), -z (waltz/waltzes), -sh (thrush/thrushes), -ch (finch/finches), and sometimes -x (box/boxes). Nouns ending in -y preceded by a consonant change the - y to -i (butterfly/butterflies). 3.1 Inflectional Morphology (a. Nouns)
  • 13. (ii) Affix that marks possessive (Tense) The possessive suffix is realized by apostrophe + -s for regular singular nouns (llama’s)  Plural nouns not ending in -s (children’s) 3.1 Inflectional Morphology (a. Nouns) (Cont…)
  • 14.  English verbal inflection is more complicated  English has 3 kinds of verbs;  main verbs, {direct verb, action} (e.g., eat, sleep, impeach),  modal verbs {indirect verb, week action} (e.g., can, will, should), and  primary verbs {supporting verb, action} (e.g., be, have, do)  We will mostly be concerned with the main and primary verbs, because it have inflectional endings.  Of these verbs a large class are regular, that is to say all verbs of this class have the same endings marking the same functions 3.1 Inflectional Morphology (b. Verbs)
  • 15.  Regular verbs (e.g. walk) have four morphological forms, as follow:  stem  -s form  -ing participle  Past form or -ed participle walk walks walking walked  These verbs are called regular because just by knowing the stem we can predict the other forms by adding one of three predictable endings and making some regular spelling changes  Regular verbs and forms are significant in the morphology of English first because they cover a majority of the verbs, and second because the regular class is Productive  A productive class is one that automatically includes any new words that enter the language (e.g., Fax to Faxing) 3.1 Inflectional Morphology (b. Verbs) (Cont…)
  • 16.  Irregular verbs are those that have more or less idiosyncratic forms of inflection.  Irregular verbs in English often have five different forms, but can have as many as eight (e.g., be) or as few as three (e.g., cut or hit).  Note that an irregular verb can inflect in the past form (also called the preterite) by changing its vowel (eat/ate), or its vowel and some consonants (catch/caught), or with no change at all (cut/cut). 3.1 Inflectional Morphology (b. Verbs) (Cont…)
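Irregular forms cannot be predicted from the stem, so they have to be listed explicitly. In the minimal sketch below, a lookup table is checked first and the regular -ed ending is used as a fallback. The table holds only the slide's three examples. `IRREGULAR_PAST` and `past_form` are hypothetical names, and spelling changes in the fallback are ignored.

```python
# Hypothetical mini-lexicon of irregular past forms (preterites) from the slide
IRREGULAR_PAST = {
    "eat": "ate",       # vowel change
    "catch": "caught",  # vowel and consonant change
    "cut": "cut",       # no change at all
}

def past_form(stem: str) -> str:
    # idiosyncratic forms must be looked up
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    # otherwise fall back to the regular -ed ending (spelling changes ignored)
    return stem + "ed"

print([past_form(v) for v in ["eat", "catch", "cut", "walk"]])
```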
  • 17. Irregular verbs Example : The -s form is used in the “habitual present” form to distinguish the - - third-person singular ending (She jogs every Tuesday) from the other choices of person and number (I/you/we/they jog every Tuesday). In addition to noting which suffixes can be attached to which stems, we need to capture the fact that a number of regular spelling changes occur at these morpheme boundaries. For Example, a single consonant letter is doubled before adding the – ing and -ed suffixes (beg/begging/begged). 3.1 Inflectional Morphology (b. Verbs) (Cont…)
  • 18. 2. Derivation is the combination of a word stem with a grammatical morpheme, - mainly deal with adjective, nouns and verbs. Resulting in a word of a different class, often with a meaning hard to predict exactly. For example the verb computerize can take the derivational suffix -ation to produce the noun computerization. 3. Morphology to create Words (Cont..)
  • 19. Case 1: Verb/Adjective to Noun:- While English inflection is relatively simple compared to other languages, derivation in English is quite complex. A very common kind of derivation in English is the formation of new nouns, often from verbs or adjectives. This process is called nominalization. For example, the suffix -ation produces nouns from verbs, often verbs ending in the suffix -ize (computerize → computerization). Here are examples of some particularly productive English nominalizing suffixes. 3.2 Derivational Morphology
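The -ize → -ization pattern is regular enough to be written as a single string operation: drop the final e of -ize and attach -ation. The sketch below covers only this one suffix pair. `nominalize_ize` is a hypothetical name.

```python
def nominalize_ize(verb: str) -> str:
    """Attach -ation to a verb ending in -ize (computerize -> computerization)."""
    if not verb.endswith("ize"):
        raise ValueError(f"{verb!r} does not end in -ize")
    # drop the final e of -ize, then add -ation
    return verb[:-1] + "ation"

print(nominalize_ize("computerize"))  # computerization
print(nominalize_ize("fossilize"))    # fossilization
```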
  • 20. Case 2: Verb/Noun to Adjective:- Adjectives can also be derived from nouns and verbs. Here are examples of a few suffixes deriving adjectives from nouns or verbs. Derivation in English is more complex than inflection for a number of reasons. One is that it is generally less productive: even a nominalizing suffix like -ation, which can be added to almost any verb ending in -ize, cannot be added to absolutely every verb (there is no *eatation or *spellation). 3.2 Derivational Morphology (Cont..)
  • 21. 3. Cliticization It is the combination of a word stem with a clitic. A clitic is a morpheme that acts syntactically like a word, but is reduced in form and attached (phonologically and sometimes orthographically) to another word For example English morpheme ’ve in the word “ I’ve ” is a clitic 3. Morphology to create Words (Cont..)
  • 22.  The phonological behavior of clitics is like that of affixes: they tend to be short and unaccented. Their syntactic behavior is more like that of words, often acting as pronouns, articles, conjunctions, or verbs.  Clitics preceding a word are called proclitics (e.g., ’tis for it is), while those following a word are called enclitics (e.g., I’m). • Note that clitics in English can be ambiguous: she’s can mean she is or she has. Correctly segmenting off clitics in English is simplified by the presence of the apostrophe (’). 3.3 Cliticization Morphology
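Since the apostrophe marks the segmentation point, splitting off an enclitic is easy. Choosing among its readings is the hard part. The sketch below splits at the apostrophe and returns every expansion listed in a small table. It assumes the toy table `ENCLITICS`, which is hypothetical and incomplete. It handles enclitics only and does not model possessive ’s or n’t.

```python
# Hypothetical toy table of English enclitics; 's and 'd are ambiguous
ENCLITICS = {
    "'s": ["is", "has"],   # she's: she is / she has (possessive 's not modeled)
    "'m": ["am"],
    "'ve": ["have"],
    "'re": ["are"],
    "'ll": ["will"],
    "'d": ["would", "had"],
}

def expand_enclitic(token: str) -> list[str]:
    """Split a token at its apostrophe and list every reading of the enclitic."""
    token = token.replace("\u2019", "'")  # treat a curly apostrophe like a straight one
    idx = token.find("'")
    if idx <= 0:                          # no apostrophe, or a proclitic such as 'tis
        return [token]
    host, clitic = token[:idx], token[idx:].lower()
    if clitic not in ENCLITICS:
        return [token]
    return [f"{host} {word}" for word in ENCLITICS[clitic]]

print(expand_enclitic("she's"))  # ['she is', 'she has']
print(expand_enclitic("I’ve"))   # ['I have']
```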
  • 23. 4. Compounding It is the combination of multiple word stems together., For example the noun doghouse is the concatenation of the morpheme dog with the morpheme house. 3. Morphology to create Words (Cont..)
  • 24. • English input words and their morphological parses, given in the Morphological Parse column. 4. Finite-State Morphological Parsing
  • 25.  The second column contains the stem of each word as well as assorted morphological features. These features specify additional information about the stem. For example:  +N means that the word is a noun;  +Sg means it is singular;  +Pl means it is plural;  +PresPart marks the present participle (ending in -ing);  +PastPart marks the past participle (typically ending in -ed).  Note that some of the input forms (like caught, goose, canto, or vino) will be ambiguous between different morphological parses. For now, we will consider the goal of morphological parsing merely to list all possible parses. 4. Finite-State Morphological Parsing (Cont…)
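The sketch below illustrates the output format only: every parse of a word, written as stem plus features. The hand-built table `PARSES` is hypothetical and covers only a few of the slide's words. A real parser would compute these parses instead of storing them, since productive suffixes rule out full lookup.

```python
# Hypothetical hand-built table: surface form -> all possible parses
PARSES = {
    "cats": ["cat +N +Pl"],
    "cat": ["cat +N +Sg"],
    "geese": ["goose +N +Pl"],
    "goose": ["goose +N +Sg", "goose +V"],               # ambiguous: noun or verb
    "caught": ["catch +V +PastPart", "catch +V +Past"],  # ambiguous: two verb forms
}

def morph_parse(word: str) -> list[str]:
    # list every known parse; an unknown word gets an empty list
    return PARSES.get(word.lower(), [])

print(morph_parse("caught"))
```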
  • 26.  In order to build a morphological parser, we’ll need at least the following: (1) Lexicon: the list of stems and affixes, together with basic information about them (whether a stem is a noun stem or a verb stem, etc.). (2) Morphotactics: the model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word. For example, the fact that the English plural morpheme follows the noun rather than preceding it is a morphotactic fact (e.g., in cats, cat is the stem and -s is the plural morpheme). (3) Orthographic rules: these spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (e.g., the y → ie spelling rule that changes city + -s to cities rather than citys). 4. Finite-State Morphological Parsing (Cont…)
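The three components can be seen working together in a toy noun parser. The version below uses analysis by synthesis: for each lexicon entry, it spells out the forms the morphotactics allow, applying the spelling rules, and then compares them with the input. This is a brute-force stand-in for a finite-state parser, not the finite-state method itself. `LEXICON`, `spell_plural` and `parse` are hypothetical names.

```python
VOWELS = set("aeiou")

# (1) Lexicon: stems with their class (toy entries)
LEXICON = {"cat": "N", "city": "N", "fox": "N"}

def spell_plural(stem: str) -> str:
    """(3) Orthographic rules applied to stem + -s."""
    if stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ies"      # y -> ie: city + -s -> cities
    if stem.endswith(("s", "z", "x", "sh", "ch")):
        return stem + "es"            # e-insertion: fox + -s -> foxes
    return stem + "s"

def parse(word: str) -> list[str]:
    parses = []
    for stem, cls in LEXICON.items():
        if cls != "N":
            continue
        # (2) Morphotactics: a noun stem, optionally followed by the plural -s
        if word == stem:
            parses.append(f"{stem} +N +Sg")
        if word == spell_plural(stem):
            parses.append(f"{stem} +N +Pl")
    return parses

print(parse("cities"), parse("citys"))  # ['city +N +Pl'] []
```

Because the spelling rules are applied before the comparison, the misspelled "citys" and "foxs" are correctly rejected.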
  • 27.  A lexicon is a repository for words.  The simplest possible lexicon would consist of an explicit list of every word of the language (every word, i.e., including abbreviations such as “AAA” and proper names such as “Jane” or “Beijing”), as follows: a, AAA, AA, Aachen, aardvark, aardwolf, aba, abaca, aback, . . .  There are many ways to model morphotactics; one of the most common is the finite-state automaton. 4.1 Building a Finite-State LEXICON (Working For Words)
  • 28. Reg-noun:- The FSA assumes that the lexicon includes regular nouns (reg-noun) that take the regular -s plural (e.g., cat, dog, fox, aardvark). These are the vast majority of English nouns, since for now we will ignore the fact that the plural of words like fox has an inserted e: foxes. irreg-pl-noun / irreg-sg-noun:- The lexicon also includes irregular noun forms that don’t take -s: both singular irreg-sg-noun (goose, mouse) and plural irreg-pl-noun (geese, mice). 4.2 Building a Finite-State LEXICON (Reg/Irreg Noun)
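The FSA described here can be written as a transition table over morpheme classes. Input is a sequence of morphemes, so spelling changes are ignored, just as on the slide. The state names and the toy lexicon `CLASS_OF` are assumptions made for illustration: q0 is the start state, a regular noun moves to q1, and the plural -s or an irregular form moves to q2.

```python
# Toy lexicon: morpheme -> class (entries from the slide's examples)
CLASS_OF = {
    "cat": "reg-noun", "dog": "reg-noun", "fox": "reg-noun", "aardvark": "reg-noun",
    "goose": "irreg-sg-noun", "mouse": "irreg-sg-noun",
    "geese": "irreg-pl-noun", "mice": "irreg-pl-noun",
    "-s": "plural -s",
}

# (state, class) -> next state
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural -s"): "q2",
}
ACCEPTING = {"q1", "q2"}

def accepts_noun(morphemes: list[str]) -> bool:
    state = "q0"
    for m in morphemes:
        # unknown morphemes or missing transitions reject the input
        state = TRANSITIONS.get((state, CLASS_OF.get(m)))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts_noun(["cat", "-s"]), accepts_noun(["geese", "-s"]))  # True False
```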
  • 29.  This lexicon has three stem classes (reg-verb-stem, irreg-verb-stem, and irreg-past-verb-form), plus four more affix classes (-ed past, -ed participle, -ing participle, and third singular -s). Table: Lexicon for finite-state  English derivational morphology is significantly more complex than English inflectional morphology, and so automata for modeling English derivation tend to be quite complex. 4.3 Building a Finite-State LEXICON (Reg/ Irreg Verb)
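A recognizer for verbal inflection can use the same table-driven pattern with the three stem classes and the affix classes named above. In this sketch the two -ed classes are merged because they share a spelling. The state layout is one plausible arrangement, not necessarily the one in the original figure, and the toy entries in `CLASS_OF` are hypothetical.

```python
# Toy lexicon: morpheme -> class (hypothetical entries)
CLASS_OF = {
    "walk": "reg-verb-stem", "talk": "reg-verb-stem",
    "cut": "irreg-verb-stem", "speak": "irreg-verb-stem",
    "caught": "irreg-past-verb-form", "ate": "irreg-past-verb-form",
    "-ed": "-ed",     # past and past participle share one spelling here
    "-ing": "-ing",
    "-s": "-s",
}

# (state, class) -> next state; an assumed layout
TRANSITIONS = {
    ("q0", "reg-verb-stem"): "q1",
    ("q0", "irreg-verb-stem"): "q2",
    ("q0", "irreg-past-verb-form"): "q3",
    ("q1", "-ed"): "q3", ("q1", "-ing"): "q3", ("q1", "-s"): "q3",
    ("q2", "-ing"): "q3", ("q2", "-s"): "q3",  # irregular stems take no -ed
}
ACCEPTING = {"q1", "q2", "q3"}  # bare stems are words too

def accepts_verb(morphemes: list[str]) -> bool:
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, CLASS_OF.get(m)))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts_verb(["walk", "-ed"]), accepts_verb(["cut", "-ed"]))  # True False
```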
  • 30.  Consider a relatively simple case of derivation: the morphotactics of English adjectives. Here are some examples from Antworth (1990): big, bigger, biggest.  An initial hypothesis might be that adjectives can have an optional prefix (un-), an obligatory root (big, cool, etc.) and an optional suffix (-er, -est, or -ly). 4.4 Building a Finite-State LEXICON (Example-1)
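The initial hypothesis (optional un-, obligatory root, optional -er/-est/-ly) translates directly into code over morpheme lists, with spelling changes ignored. The root list `ROOTS` is a hypothetical toy set drawn from the slides' examples. The last print shows that the hypothesis overgenerates: it accepts unbig.

```python
# Toy root list taken from the slides' examples (not a real lexicon)
ROOTS = {"big", "cool", "happy", "red", "real", "clear", "small", "fast", "orange"}

def naive_accepts(morphemes: list[str]) -> bool:
    """Optional un-, obligatory root, optional -er/-est/-ly."""
    rest = morphemes[1:] if morphemes[:1] == ["un"] else morphemes
    if not rest or rest[0] not in ROOTS:
        return False
    suffixes = rest[1:]
    return suffixes == [] or (len(suffixes) == 1 and suffixes[0] in {"er", "est", "ly"})

print(naive_accepts(["un", "happy", "ly"]))  # True
print(naive_accepts(["un", "big"]))          # True, although unbig is not a word
```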
  • 31. Problem Defined:  While this FSA will recognize all the adjectives, it will also recognize ungrammatical forms like unbig, unfast, oranger, or smally. We need to set up classes of roots and specify their possible suffixes. -Thus adj-root1 would include adjectives that can occur with un- and -ly (clear, happy, and real) - while adj-root2 will include adjectives that can’t (big, small),  This FSA models a number of derivational facts, such as the well known generalization that any verb ending in -ize can be followed by the nominalizing suffix –ation. CASE STUDY : - There is a word fossilize, we can predict the word fossilization by following states q0, q1, and q2. Similarly, adjectives ending in -al or - able at q5 (equal, formal, realizable) can take the suffix -ity, or 4.4 Building a Finite-State LEXICON (Example-1)
  • 32.  Design and build a finite-state lexicon for derivation, in which the morphotactics of English adjectives are modeled by an FSA accepting the following combinations [Note: design a single FSA covering all the words]:  cool, cooler, coolest, coolly;  happy, happier, happiest, happily;  red, redder, reddest;  unhappy, unhappier, unhappiest, unhappily;  real, unreal, really;  clear, clearer, clearest, clearly, unclear, unclearly 4.4 Building a Finite-State LEXICON (Class Participation)
  • 33.  Consider the following FSA of English derivational morphology and describe each of the following state sequences:  q0->q1->q2->q3  q0->q1->q2->q4  q0->q5->q6  q0->q5->q2->q3  q0->q5->q2->q4  q0->q5->q9  q0->q8->q9  q0->q8->q6  q0->q7->q8->q9  q0->q10->q8->q6  q0->q10->q8->q9  q0->q11->q8->q9  q0->q11->q8->q6 4.4 Building a Finite-State LEXICON (Assignments)
  • 34.  The previous method will successfully recognize words like aardvarks and mice.  But just concatenating the morphemes won’t work for cases where there is a spelling change: it would incorrectly reject an input like foxes and accept an input like foxs.  We need to deal with the fact that English often requires spelling changes at morpheme boundaries by introducing spelling rules (or orthographic rules). Some Spelling Rules 5. Transducers and Orthographic Rules
  • 35.  We could write an e-insertion rule that performs the mapping from the intermediate to the surface level shown.  Such a rule might say something like “insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, ch, sh, etc., and the next morpheme is -s”.  Here’s a formalization of the rule, in the rule notation of Chomsky and Halle (1968): ε → e / {x, s, z} ^ __ s#, read as “insert e (rewrite the empty string ε as e) after x, s or z followed by a morpheme boundary ^, and before an s that ends the word (#)”. 5. Transducers and Orthographic Rules (Cont…)
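The e-insertion rule can be sketched in Python as a rewrite over an intermediate-tape string, where ^ marks a morpheme boundary and # marks the end of the word. This sketch uses Python's `re` module in place of an explicit finite-state transducer. Its contexts (x, s, z, ch, sh) follow the prose description of the rule. `e_insertion` is a hypothetical name.

```python
import re

def e_insertion(intermediate: str) -> str:
    """Map an intermediate-tape string such as 'fox^s#' to its surface form."""
    # insert e after x, s, z, ch or sh when the boundary ^ is followed by word-final s#
    surface = re.sub(r"(ch|sh|[xsz])\^s#", r"\1es#", intermediate)
    # boundary symbols never appear on the surface tape
    return surface.replace("^", "").replace("#", "")

print(e_insertion("fox^s#"), e_insertion("cat^s#"))  # foxes cats
```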
  • 36.  The distance between two strings is a measure of how alike the two strings are.  The minimum edit distance between two strings is the minimum number of editing operations (insertion, deletion, substitution) needed to transform one string into the other.  For example, the minimum edit distance between the words intention and execution is five operations, counting each insertion, deletion or substitution as one. 6. Minimum Edit Distance
  • 37.  The minimum edit distance is computed by dynamic programming. Dynamic programming is the name for a class of algorithms that apply a table-driven method to solve problems by combining solutions to subproblems.  This class of algorithms includes the most commonly used algorithms in speech and language processing.  The intuition of a dynamic programming problem is that a large problem can be solved by properly combining the solutions to various subproblems.  For example, consider the sequence, or “path”, of transformed words that make up the minimum edit distance between the strings intention and execution. 6. Minimum Edit Distance (Cont…)
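To make the idea of a "path" concrete, the sketch below replays one five-step sequence of edits from intention to execution and records each intermediate word. The edit list is written by hand for illustration; it is not computed. Positions are 0-based, and `apply_edits` is a hypothetical name.

```python
def apply_edits(word: str, edits: list[tuple[str, int, str]]) -> list[str]:
    """Apply (operation, position, char) edits in order; return every intermediate word."""
    path = [word]
    for op, pos, ch in edits:
        if op == "delete":
            word = word[:pos] + word[pos + 1:]
        elif op == "insert":
            word = word[:pos] + ch + word[pos:]
        elif op == "substitute":
            word = word[:pos] + ch + word[pos + 1:]
        else:
            raise ValueError(f"unknown operation {op!r}")
        path.append(word)
    return path

# One hand-written 5-step path from intention to execution
EDITS = [
    ("delete", 0, ""),       # intention -> ntention
    ("substitute", 0, "e"),  # ntention  -> etention
    ("substitute", 1, "x"),  # etention  -> exention
    ("insert", 4, "u"),      # exention  -> exenution
    ("substitute", 3, "c"),  # exenution -> execution
]

print(" -> ".join(apply_edits("intention", EDITS)))
```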
  • 38.  Dynamic programming algorithms for sequence comparison work by creating a distance matrix with one column for each symbol in the target sequence and one row for each symbol in the source sequence (i.e., target along the bottom, source along the side).  For minimum edit distance, this matrix is the edit-distance matrix. Each cell edit-distance[i,j] contains the distance between the first i characters of the target and the first j characters of the source.  Each cell can be computed as a simple function of the surrounding cells; thus, starting from the beginning of the matrix, it is possible to fill every entry.  The value in each cell is computed by taking the minimum of the three possible paths through the matrix that arrive there. 6. Minimum Edit Distance (Cont…)
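The matrix filling described above can be sketched as a standard Levenshtein computation. Each cell takes the minimum of three neighbors: deletion from the cell above, insertion from the cell to the left, and substitution or match from the diagonal. This sketch indexes `d[i][j]` with i over the source and j over the target. That is the transpose of the slide's edit-distance[i,j] convention and gives the same result. It also adds row 0 and column 0 for the empty prefix. The `sub_cost` parameter is an added assumption.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance: insertion and deletion cost 1, substitution costs sub_cost."""
    n, m = len(source), len(target)
    # d[i][j] = distance between the first i chars of source and the first j chars of target
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i source characters
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j target characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[n][m]

print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

With unit costs this gives the five operations quoted earlier. If a substitution costs 2 (a common variant), the same pair scores 8.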