Machine Translation with Statistical Approach
What Is Machine Translation?
Machine translation is the study of designing systems that translate from one human language into another. A machine translation system essentially takes a text in one language (called the source language) and translates it into another language (called the target language). The source and target languages are natural languages such as English and Hindi.
This is a hard problem, since processing natural language requires work at several levels, and complexities and ambiguities arise at each of those levels. Hence an MT system can be said to be doing natural language processing (NLP). In fact, most machine translation applications require some degree of natural language understanding to do the translation.
History of Machine Translation
Machine translation as a discipline dates back to the early nineteen-fifties. The complexity of the problem was originally underestimated, and some early successful demonstrations of experimental systems led to unrealistic expectations which were hard to fulfil. In the early eighties, the Japanese Fifth Generation Computer Systems project revived interest in this work. The current approach to MT is more pragmatic and realistic.
It is now widely accepted that fully automatic, general-purpose, high-quality machine translation is a very difficult problem, but very useful and practical systems can nevertheless be developed by relaxing one or more of these criteria. Several useful systems have been built by doing so and are in use today. Such systems are being used to translate public announcements, weather bulletins, technical documents, and web pages.
Some machine translation services are starting to become available on the World Wide Web. For example, the web page of the Google search engine also provides a translation service that can translate simple sentences among a handful of languages.
Translation Telephone Technology (Speech-to-Speech Translation)
The JANUS project at the Interactive Systems Lab, Carnegie Mellon University, is working on a set of translation projects. Suppose you dial your colleague in Tokyo. You do not speak Japanese, and he does not speak English. So you need a system in which you speak into the phone in English, your speech is automatically translated into Japanese for him, he replies in Japanese, and you hear his reply in English.
Research MT System Example: The JANUS Translating Phone Project
This prototype system allows two users to communicate in a given domain via a videoconferencing connection. Each party sees the other conversant, hears his/her original voice, and sees/hears the translation of what he/she says as subtitles, captions, and synthetic speech. The situation is cooperative; that is, both users want to understand each other and collaborate via the system to achieve understanding.
After the record button is activated, the station accepts spoken input and first produces a paraphrase of the input sentence. Once the user has verified that the system properly understood the intended meaning, he/she activates the send button to send a translation of this intended meaning to the other side in the desired language. Various interactive correction mechanisms facilitate quick recovery should processing errors or miscommunication alter the intended meaning.
Machine Translation & Artificial Intelligence
MT is an important sub-discipline of the wider field of Artificial Intelligence (AI). AI (among other things) deals with getting machines to exhibit intelligent behaviour. As we might imagine, both AI and MT are interesting and challenging fields.
Components of MT
We can divide the machine translation task into three main phases. First, the system analyses the source-language input to create some internal representation. It then typically manipulates this internal representation to transfer it to a form suitable for the target language. Finally, it generates the output in the target language.
[Figure: Source Language -> Analysis -> Intermediate Representation based on source language -> Transfer -> Intermediate Representation based on target language -> Generation -> Target Language]
A typical MT system contains components for analysis, transfer, and generation, as shown in the diagram. These components incorporate a lot of knowledge about words (lexical knowledge) and about the language (linguistic knowledge). Such knowledge is stored in one or more lexicons, and possibly in other sources of linguistic knowledge, such as a grammar.
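As a rough illustration of the three phases, the following Python sketch translates a single English subject-verb-object sentence into romanized Hindi. Everything here is an assumption made for the example: the four-word lexicon, the role-labelled dictionary used as the intermediate representation, and the single reordering rule in the generator. It ignores gender agreement, inflection, and everything else a real system would need.

```python
# Toy analysis -> transfer -> generation pipeline (illustrative sketch only).
# Hypothetical lexicon: English word -> (part of speech, romanized Hindi equivalent).
LEXICON = {
    "ram": ("NOUN", "Ram"),
    "mohan": ("NOUN", "Mohan"),
    "mangoes": ("NOUN", "aam"),
    "eats": ("VERB", "khata hai"),
}


def analyse(sentence: str) -> dict[str, str]:
    """Analysis: build a source-based IR mapping roles to English words.

    Only 'subject verb object' sentences are handled; unknown words raise KeyError.
    """
    words = sentence.lower().rstrip(".").split()
    if len(words) != 3 or LEXICON[words[1]][0] != "VERB":
        raise ValueError("toy analyser only handles 'subject verb object'")
    subj, verb, obj = words
    return {"subj": subj, "verb": verb, "obj": obj}


def transfer(ir_s: dict[str, str]) -> dict[str, str]:
    """Transfer: replace each English word with its target-language equivalent."""
    return {role: LEXICON[word][1] for role, word in ir_s.items()}


def generate(ir_t: dict[str, str]) -> str:
    """Generation: Hindi is verb-final, so emit subject-object-verb order."""
    return f"{ir_t['subj']} {ir_t['obj']} {ir_t['verb']}"


def translate(sentence: str) -> str:
    return generate(transfer(analyse(sentence)))


print(translate("Ram eats mangoes."))  # Ram aam khata hai
```

The word-order change lives only in `generate`, so the analysis step never needs to know anything about Hindi.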
The user interface is invariably a crucial part of most MT systems. The interface allows the user to verify, disambiguate, and if necessary correct the output of the system. Another common feature of NLP work is the use of large corpora. A corpus is a large collection of text which is used for acquiring the required lexical and linguistic knowledge.
Some systems prefer to split the lexicon into a source lexicon, a target lexicon, and a transfer lexicon that maps between the two. An MT lexicon typically needs to be much more formal, precise, and elaborate than a typical human dictionary, since it is meant for mechanical processing, not for reading by humans. The lexicon plays a central role in modern MT systems.
Lexicon
The lexicon is an important component of an MT system. A lexicon contains all the relevant information about words and phrases that is required for the various levels of analysis and generation. A typical lexicon entry for a word would contain the following information about the word: its part of speech, and information about the equivalent word in the target language.
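A lexicon entry of the kind just described could be modelled as a small record keyed by word and part of speech. The Python sketch below is a hypothetical layout, not the structure of any particular MT system; the romanized Hindi equivalents are illustrative, and a real entry would carry much more information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One hypothetical lexicon entry: a word, its part of speech, and target equivalents."""
    lemma: str
    pos: str                      # part of speech, e.g. "NOUN" or "VERB"
    equivalents: tuple[str, ...]  # candidate target-language (romanized Hindi) words


# Keyed by (lemma, part of speech), so a noun reading and a verb reading stay separate.
LEXICON = {
    ("party", "NOUN"): LexiconEntry("party", "NOUN", ("dal", "daavat")),
    ("bank", "NOUN"): LexiconEntry("bank", "NOUN", ("bank", "kinaara")),
    ("join", "VERB"): LexiconEntry("join", "VERB", ("shaamil hona",)),
}


def equivalents(lemma: str, pos: str) -> tuple[str, ...]:
    """Return candidate target words, or an empty tuple if the word is not in the lexicon."""
    entry = LEXICON.get((lemma.lower(), pos))
    return entry.equivalents if entry else ()


print(equivalents("party", "NOUN"))  # ('dal', 'daavat')
```

Note that `equivalents("party", "NOUN")` returns two candidates; choosing between them is a disambiguation problem that the lexicon alone cannot solve.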
Approaches to MT
Based on how closely the internal representation depends on the source and target languages, approaches to MT can be divided into three major classes: direct, transfer-based, and interlingual.
A direct MT system tries to map the source language directly to the target language, and is therefore highly dependent on both the source and target languages. A transfer-based approach first converts the source language into an internal representation (IR_s) which is dependent on the source but not the target language. The system then transforms IR_s into a form IR_t which is independent of the source language and depends only on the target language, and finally generates the target-language output from IR_t.
The interlingual approach converts the input into a single internal representation (IR) that is independent of both the source and target languages, and then converts from this representation into the output.
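To make the dependence of a direct system concrete, here is a hypothetical direct translator for a toy English-to-Hindi (romanized) example. It builds no internal representation at all: it looks words up in a bilingual dictionary and applies an optional reordering rule that only makes sense for this particular language pair. The dictionary and the `verb_final` flag are assumptions for illustration.

```python
# Hypothetical English -> romanized Hindi bilingual dictionary, used directly.
DIRECT_DICTIONARY = {"ram": "Ram", "eats": "khata hai", "mangoes": "aam"}


def direct_translate(sentence: str, verb_final: bool = False) -> str:
    """Word-for-word lookup with an optional, pair-specific reordering rule."""
    words = sentence.lower().rstrip(".").split()
    out = [DIRECT_DICTIONARY.get(w, w) for w in words]  # unknown words pass through unchanged
    if verb_final and len(out) == 3:
        # Rule written for this language pair only: English S-V-O becomes Hindi S-O-V.
        out = [out[0], out[2], out[1]]
    return " ".join(out)


print(direct_translate("Ram eats mangoes."))                   # Ram khata hai aam
print(direct_translate("Ram eats mangoes.", verb_final=True))  # Ram aam khata hai
```

Without the hand-written rule the output keeps English word order; every such rule has to be rewritten for each new language pair, which is what makes direct systems so dependent on both languages.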
Levels of Natural Language Processing
Dealing with natural language typically requires processing at various levels. In increasing order of difficulty, they are:
- the lexical level (or word level);
- the syntactic level (or sentence level);
- the semantic level (or meaning level);
- the discourse and pragmatic level (or conversation context level).
The Lexical Level
This level deals with looking at the input string of characters and separating them into tokens, which may be words, spaces, or punctuation. This level also deals with issues like hyphenated words and misspelt words. It is the lexical level which tells us that the input “he joined the parti” consists of four words, of which the last is incorrect. This level is sometimes called tokenisation or lexical analysis.
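A minimal tokeniser along these lines can be written with Python's `re` module. This sketch discards whitespace rather than emitting it as tokens, keeps hyphenated words whole, and flags any word missing from a small vocabulary as a possible misspelling; the vocabulary is a hypothetical stand-in for a real lexicon.

```python
import re

# Hypothetical vocabulary; a real system would consult its full lexicon.
VOCABULARY = {"he", "joined", "the", "party"}

# A word is a run of word characters, optionally joined by hyphens (keeps
# "well-known" as one token); any other non-space character is punctuation.
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenise(text: str) -> list[str]:
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)


def unknown_words(tokens: list[str]) -> list[str]:
    """Return word tokens that are not in the vocabulary (candidate misspellings)."""
    return [t for t in tokens if t[0].isalnum() and t.lower() not in VOCABULARY]


tokens = tokenise("he joined the parti")
print(tokens)                 # ['he', 'joined', 'the', 'parti']
print(unknown_words(tokens))  # ['parti']
```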
The Syntactic Level
This level deals with identifying the structure of a sentence and verifying whether a sentence is grammatically correct. This level typically consists of a parser, which looks at the grammar of the language and the input sentence, and tries to form a parse tree. If it can form a parse tree, the sentence is syntactically correct, and the parse tree gives us the structure and the function of its various components.
For example, a typical English sentence would consist of a subject and a predicate. The subject is normally a noun phrase and the predicate is a verb phrase, and so on. The syntactic level tells us that the sentence “He the party joined” is (syntactically) incorrect, even though each word in it is (lexically) correct.
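The syntactic check can be sketched with a tiny hand-written grammar and a recursive-descent recogniser. The grammar (S -> NP VP, NP -> PRON | DET NOUN, VP -> VERB NP) and the part-of-speech table are assumptions that cover only the example sentences; a practical parser would use a much larger grammar and a general algorithm such as chart parsing.

```python
# Hypothetical part-of-speech table for the example words only.
POS = {"he": "PRON", "the": "DET", "party": "NOUN", "joined": "VERB"}


def parse_np(tags, i):
    """NP -> PRON | DET NOUN. Return (subtree, next position) or None."""
    if i < len(tags) and tags[i][1] == "PRON":
        return ("NP", tags[i]), i + 1
    if i + 1 < len(tags) and tags[i][1] == "DET" and tags[i + 1][1] == "NOUN":
        return ("NP", tags[i], tags[i + 1]), i + 2
    return None


def parse_vp(tags, i):
    """VP -> VERB NP. Return (subtree, next position) or None."""
    if i < len(tags) and tags[i][1] == "VERB":
        result = parse_np(tags, i + 1)
        if result is not None:
            np_tree, j = result
            return ("VP", tags[i], np_tree), j
    return None


def parse(sentence):
    """S -> NP VP. Return a parse tree if the whole sentence parses, else None."""
    # Out-of-lexicon words raise KeyError: rejecting them is the lexical level's job.
    tags = [(w, POS[w]) for w in sentence.lower().rstrip(".").split()]
    subject = parse_np(tags, 0)
    if subject is None:
        return None
    np_tree, i = subject
    predicate = parse_vp(tags, i)
    if predicate is None:
        return None
    vp_tree, j = predicate
    return ("S", np_tree, vp_tree) if j == len(tags) else None


print(parse("He joined the party"))
print(parse("He the party joined"))  # None: no VP can start with "the"
```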
The Semantic Level
This level deals with the meaning of the input and its components. It is the semantic level which tells us that the sentence “He ate the party” is semantically incorrect, though it is lexically and syntactically well-formed. In general, semantic analysis involves knowledge about the world, or at least about the relevant aspects of the world.
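One classical way to encode a small piece of this world knowledge is with selectional restrictions: each verb states what kind of object it expects, and each noun carries semantic features. The features and restrictions below are invented for this example and capture only enough to reject “He ate the party” while accepting “He joined the party”.

```python
# Hypothetical semantic features for a few nouns.
NOUN_FEATURES = {
    "party": {"event", "organisation"},
    "mango": {"food", "physical_object"},
    "bread": {"food", "physical_object"},
}

# Hypothetical selectional restrictions: the feature a verb's object must have.
OBJECT_RESTRICTION = {"ate": "food", "joined": "organisation"}


def object_is_plausible(verb: str, obj: str) -> bool:
    """Check a verb-object pair against the verb's selectional restriction."""
    required = OBJECT_RESTRICTION.get(verb.lower())
    if required is None:
        return True  # no restriction recorded for this verb, so accept it
    return required in NOUN_FEATURES.get(obj.lower(), set())


print(object_is_plausible("ate", "party"))     # False
print(object_is_plausible("ate", "bread"))     # True
print(object_is_plausible("joined", "party"))  # True
```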
The Conversation Context Level
This level deals with the information carried across multiple sentences, and with information that is not explicit in the input but is implicit in the socio-cultural context of the input passage or conversation. For example, the expected answer to the question “Do you know what the time is?” is something like “4 p.m.”, and not just “Yes”, though the latter is lexically, syntactically, and semantically accurate.
Issues in Machine Translation
Machine translation (and natural language processing) is a difficult problem, for two main, related reasons. The first reason is that natural language is highly ambiguous. Ambiguity occurs at all levels: lexical, syntactic, semantic, and pragmatic. A given word or sentence can have more than one meaning. For example, the word “party” could mean a political party or a social event, and deciding which meaning applies in a particular case is crucial to getting the right analysis and therefore the right translation.
The second reason is that when humans use natural language, they use an enormous amount of common sense and knowledge about the world, which helps to resolve the ambiguity. For example, in “He went to the bank, but it was closed for lunch”, we can infer that “bank” refers to a financial institution and not a river bank, because we know from our knowledge of the world that only the former type of bank can be closed for lunch.
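The “bank” example can be imitated, very crudely, with a simplified Lesk-style word sense disambiguation: pick the sense whose dictionary gloss shares the most content words with the sentence. This is a standard technique swapped in for illustration, not something the slides prescribe, and the glosses and stopword list below were written by hand for this example; they stand in for the world knowledge a human reader brings. On ties (including no overlap at all) the first listed sense wins.

```python
import re

# Hypothetical stopword list, just large enough for the examples below.
STOPWORDS = {"a", "an", "the", "to", "he", "it", "was", "but", "for", "or",
             "of", "and", "that", "can", "be", "take"}

# Hypothetical hand-written glosses for two ambiguous words.
GLOSSES = {
    "bank": {
        "financial institution": "an institution that holds money; it can be open "
                                 "or closed for business, and its staff take lunch",
        "river bank": "the sloping land beside a river or stream",
    },
    "party": {
        "political party": "an organisation that contests elections and forms a government",
        "social event": "a gathering of guests with food, music and dancing",
    },
}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str:
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = content_words(sentence) - {word}
    senses = GLOSSES[word]
    return max(senses, key=lambda sense: len(context & content_words(senses[sense])))


print(simplified_lesk("bank", "He went to the bank, but it was closed for lunch"))
# financial institution
```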
The Statistical Approach (Warren Weaver, 1949)
The statistical approach considers only the translation of individual sentences. Usually there are many acceptable translations of a particular sentence, the choice among them being largely a matter of taste. It takes the view that every sentence in one language is a possible translation of any sentence in the other.
To every pair of sentences (S, T) it assigns a probability $P(T \mid S)$, the probability that a translator will produce T in the target language when presented with S in the source language. Given a sentence T in the target language, we seek the sentence S from which the translator produced T. The chance of error is minimized by choosing the sentence S that is most probable given T. Thus, we wish to choose S so as to maximize $P(S \mid T)$.
Using Bayes' theorem,

$$P(S \mid T) = \frac{P(S)\,P(T \mid S)}{P(T)}$$

The denominator on the right of this equation does not depend on S, so it suffices to choose the S that maximizes the product $P(S)\,P(T \mid S)$:

$$\hat{S} = \arg\max_{S} \; P(S)\,P(T \mid S)$$

where $P(S)$ is the language-model probability of S, and $P(T \mid S)$ is the translation probability of T given S.
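The decision rule can be sketched in a few lines of Python. Assume a decoder has already proposed a handful of candidate source sentences S for one observed target sentence T, and that a language model and a translation model have scored them; the candidates and all probabilities below are made up for illustration. Scores are combined in log space, a common way to avoid floating-point underflow when many small probabilities are multiplied.

```python
import math

# Made-up language-model probabilities P(S) for three candidate sentences S.
LANGUAGE_MODEL = {
    "he joined the party": 0.004,
    "he the party joined": 0.00001,
    "he joined the celebration": 0.002,
}

# Made-up translation-model probabilities P(T | S) for one fixed, observed T.
TRANSLATION_MODEL = {
    "he joined the party": 0.30,
    "he the party joined": 0.35,
    "he joined the celebration": 0.10,
}


def decode(candidates: list[str]) -> str:
    """Return argmax_S P(S) * P(T|S); P(T) is the same for every S, so it is dropped."""
    return max(
        candidates,
        key=lambda s: math.log(LANGUAGE_MODEL[s]) + math.log(TRANSLATION_MODEL[s]),
    )


print(decode(list(LANGUAGE_MODEL)))  # he joined the party
```

Here the ill-formed candidate “he the party joined” has the highest translation probability, but its very low language-model probability keeps it from winning. This is the division of labour the product $P(S)\,P(T \mid S)$ provides: the translation model rewards faithfulness, and the language model rewards well-formed output.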
Conclusion
Two phenomena have given a new impetus to machine translation work: the globalisation of the world economy, and the explosion of the Internet and the World Wide Web. Both these developments mean that there is a need to make an immense collection of natural language documents available to a multilingual global audience, and translation tools and systems can go a long way in meeting that need.
The global translation market is estimated to be worth at least 12 billion dollars. A system that automatically translates Kalidasa and Shakespeare may still be a distant dream, but systems that translate stock market reports, weather bulletins, and technical documents are a reality today, and will continue to play an increasingly important role in the society of the next millennium.
THANK YOU

