Natural Language Processing
Unit 1 – Introduction
Anantharaman Narayana Iyer
narayana dot anantharaman at gmail dot com
7th Aug 2015
Topics
• Motivation: Why NLP?
• Course Outline
• Grading Policy
What are the opportunities for NLP?
NLP is a hugely important topic for both industry and academia
Trends that accelerate NLP research
• Availability of web and social data
• Mobile devices as a source of data
• Need for natural-language-based I/O for new devices
• Machine learning techniques, e.g. deep learning
• Increasing availability of open datasets on the web, e.g. Freebase, DBpedia
Motivation
• Google Search Engine
• Intelligently responding to a query, e.g. “Where is India Gate?”
• Predicting the next word for autocompletion
• Correcting spelling
• Segmenting words that have been joined without spaces
• Ranking the search results
• Google Translate
• Gmail
• e.g. understanding the contents of an e-mail through NLP and alerting the user
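One of the search features listed above, segmenting words that are joined without spaces, can be sketched with dynamic programming. This is a minimal sketch, assuming a tiny hypothetical vocabulary and a "fewest words wins" rule in place of the word probabilities a real search engine would use; `VOCAB` and `segment` are illustrative names, not Google's method.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical toy vocabulary; a real segmenter would use a large
# dictionary plus word probabilities estimated from a corpus.
VOCAB = {"where", "is", "india", "gate", "in", "dia", "a", "i"}


def segment(text: str) -> Optional[List[str]]:
    """Split `text` into vocabulary words, preferring the fewest words.

    Returns None when no segmentation exists.
    """

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        # Best segmentation of the suffix text[start:].
        if start == len(text):
            return ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in VOCAB:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        # Fewest words wins: a crude stand-in for a probabilistic model.
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


print(segment("whereisindiagate"))  # ['where', 'is', 'india', 'gate']
```

Under this vocabulary "in dia gate" is also a valid split; only the scoring rule selects "india gate", which is why practical segmenters rely on corpus statistics rather than a dictionary alone.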
Speech/NLP
• What technologies are involved here?
- Continuous Speech Recognition
- Keyword Spotting
- Text to speech
- Speech-in, speech-out systems
- Speaker identification
- Novel applications (to be explained on the board)
Disambiguation
• Consider the following example.
• We would like to collect tweets on a subject (say, Rahul Gandhi) and analyse their sentiment.
• We can search Twitter through its Search API with the keywords “Rahul Gandhi”.
• This might miss tweets that mention only “Rahul” and not “Gandhi”.
• If we instead search for the terms [“Rahul”, “Gandhi”] independently, we may get results that match any Rahul (e.g. Rahul Dravid or KL Rahul).
• NLP techniques can make the tweet search smarter by working out which Rahul a tweet refers to.
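The disambiguation idea can be sketched with hand-picked context cues: keep a tweet that mentions “Rahul” only when its surrounding words point more to politics than to cricket. This is a minimal sketch; the cue lists and the function name `is_about_rahul_gandhi` are hypothetical, and a practical system would learn such cues from labelled tweets rather than hard-coding them.

```python
import re

# Hypothetical context cues; a practical system would learn these from data.
POLITICS_CUES = {"gandhi", "congress", "parliament", "election", "rally"}
CRICKET_CUES = {"dravid", "kl", "batting", "wicket", "century", "ipl"}


def is_about_rahul_gandhi(tweet: str) -> bool:
    """Guess whether a tweet mentioning 'Rahul' refers to Rahul Gandhi."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    if "rahul" not in words:
        return False
    politics = len(words & POLITICS_CUES)
    cricket = len(words & CRICKET_CUES)
    # Keep the tweet only if political context outweighs cricket context.
    return politics > cricket


tweets = [
    "Rahul addressed a Congress rally today",
    "KL Rahul hits a century in the IPL",
    "What batting by Rahul, another century!",
    "Rahul Gandhi speaks in Parliament",
]
print([t for t in tweets if is_about_rahul_gandhi(t)])
# ['Rahul addressed a Congress rally today', 'Rahul Gandhi speaks in Parliament']
```

Tweets with no cues at all are dropped, so this rule trades recall for precision; a learned classifier would handle such cases more gracefully.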
Summarization
• The challenge we face is not a lack of information but an overload of it.
• Summarization is a core technology that can help address information overload.
• Related problems:
• How do we validate the quality and correctness of information?
• Summarizing multimedia
• How do we summarize social data, where:
• Data may have less signal, more noise!
• Data may be biased
• Data may not be factual
• Data may be repetitive
• Can we automatically generate a (set of) tweet(s) from a news article?
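As a reference point for the summarization questions above, here is a minimal extractive baseline using a classic word-frequency heuristic: score each sentence by the average frequency of its content words across the text and keep the top scorers. This is a swapped-in illustrative technique, not a method proposed in the course; the stop-word list and the `summarize` helper are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on", "for", "it"}


def content_words(sentence: str) -> List[str]:
    """Lowercased alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, n_sentences: int = 1) -> List[str]:
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


article = (
    "The monsoon arrived in Kerala. "
    "The monsoon brought heavy rain to Kerala. "
    "Fans watched cricket."
)
print(summarize(article))  # ['The monsoon arrived in Kerala.']
```

Because the score rewards frequently repeated words, repetitive social data (one of the concerns listed above) can easily mislead a heuristic like this.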
Answer Evaluation
• Answer evaluation is a core challenge for online education systems.
• Wouldn’t it be nice if questions could be both descriptive and objective?
• Can there be an automated answer evaluation system that doesn’t require peer evaluation?
Sentiment Analysis
• Measures the pulse of people from social media
• Can measure sentiment toward a brand, product or event
• A crowded space, but not a fully solved problem due to the inherent challenges in Natural Language Processing
• Can we build a sentiment analyser using RNNs and evaluate its performance?
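Before training an RNN, it helps to have a baseline and an evaluation routine to compare it against. The sketch below is a simple lexicon-count classifier (a swapped-in baseline, not an RNN) together with an accuracy function; the word lists and labelled examples are hypothetical. The negation example shows one of the inherent challenges mentioned above.

```python
import re
from typing import List, Tuple

# Hypothetical mini-lexicon; real sentiment lexicons hold thousands of scored words.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}


def predict(text: str) -> str:
    """Label text 'pos', 'neg' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"


def accuracy(examples: List[Tuple[str, str]]) -> float:
    """Fraction of (text, gold_label) pairs the classifier gets right."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


# Hypothetical labelled examples.
examples = [
    ("I love this phone, great camera", "pos"),
    ("Terrible battery and poor service", "neg"),
    ("The movie was not good", "neg"),  # negation fools a word-count lexicon
    ("Happy with the delivery", "pos"),
]
print(accuracy(examples))  # 0.75
```

An RNN-based analyser could be scored with the same `accuracy` function on the same held-out examples, which makes the comparison with the baseline direct.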
Plagiarism Detection
Dialog Systems
• Can we build commercially deployable dialog systems?
• Natural Language Processing
• Natural Language Generation
Can we build an NLG library and make it open source?
Demo
• http://www.manifestation.com/neurotoys/eliza.php3
Course Structure
• Foundational
• Emerging
• Applications
Course Positioning
• Classical NLP techniques (such as Language Models, MaxEnt classifiers, HMMs, CRFs, etc.) have proven effective in addressing problems like Part of Speech tagging, Text classification and Information Retrieval. However, they are inadequate for problems that depend more heavily on semantics.
• Modern approaches (such as deep learning) hold a lot of promise for problems involving semantics. They have also been shown to produce results better than or equal to classical techniques on typical NLP tasks.
• Internationally acclaimed courses, such as those offered by Dan Jurafsky, Christopher Manning and Michael Collins on Coursera and those offered at Stanford, are strong in the traditional topics but somewhat light on emerging topics.
• The recent course by Socher at Stanford is heavy on recurrent-network-based approaches but assumes that the student is reasonably familiar with traditional NLP.
• Our course takes the best of both worlds and backs it up with intensive hands-on work.
Key Topics
• Foundational
• Words, sentences: Tokenization, regular expressions, challenges of ambiguity, edit distance, spelling correction, string similarity, tf, tf-idf
• Stemming, Lemmatization
• Language models, smoothing, applications to speech, metrics
• Tagging problems: Viterbi Algorithm (HMM), POS tagging, NER tagging, SRL
• Parsing: PCFG, CKY algorithm
• Information Retrieval, Information Extraction, Word Sense Disambiguation, Summarization, Q&A systems, Dialogue Systems
• Natural Language Generation
• Emerging Approaches:
• Deep Learning and Vector Space approaches to: Word representation, Sentence and text compositionality, LM, Parsing, Q&A Systems
• Applications:
• Modern approaches to many exciting applications, including speech
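Edit distance and spelling correction, from the foundational topics above, fit in a few lines. This sketch uses the Levenshtein variant with unit costs for insertion, deletion and substitution (some textbook treatments charge 2 for a substitution) and picks the closest word from a hypothetical vocabulary; `edit_distance` and `correct` are illustrative names.

```python
from typing import Iterable


def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    m, n = len(source), len(target)
    # dp[i][j] holds the distance between source[:i] and target[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return dp[m][n]


def correct(word: str, vocabulary: Iterable[str]) -> str:
    """Return the vocabulary word closest to `word` (ties broken alphabetically)."""
    return min(vocabulary, key=lambda v: (edit_distance(word, v), v))


print(edit_distance("kitten", "sitting"))  # 3
print(correct("grammer", ["grammar", "hammer", "glimmer"]))  # grammar
```

Choosing purely by distance ignores how common each candidate word is; a fuller spelling corrector would also weigh candidates with a language model.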
Course Grading Policy
• Unit Evaluations (3 out of 5): 30%
• Lab sessions (2 out of 5): 10%
• T1: 15%
• Final Exam: 3 days of product development, 6 to 8 hours per day (run like a hackathon, with a 90-minute objective-type written test on day 1): 15% (for test) + 25% (for hands-on)
• Attendance: 5%
Challenges: Why is NLP hard?
The central challenge of Natural Language Processing is ambiguity, and it exists at every level or stage of NLP.
Poets and writers thrive on ambiguity in language semantics, while most of us abhor ambiguity!
Can an NLP system understand poetry, or better still, generate a poem? That seems to be the ultimate goal!
Another challenge is representation: How do we represent words? Sentences? Large bodies of text? How do we model real-world knowledge?
One prayer, 25 interpretations! (Ref: Raghuvamsa by Kalidasa)
Vagarthaviva sampriktau vagarthah pratipattaye | Jagatah pitarau vande parvathiparameshwarau || – Raghuvamsha 1.1
• Common meaning: I pray to the parents of the world, Lord Shiva and Mother Parvathi, who are as inseparable as speech and its meaning, to gain knowledge of speech and its meaning.
Ambiguity – some examples
• Homophones: words with the same pronunciation but different meanings
• Peace, piece: in a spoken sentence like “The PM attended the peace summit”, the term “peace” is ambiguous, as a speech-to-text system might transcribe it as “piece”
• Knew, new
• Weak, week
• Word boundary
• It’s all ready, looking great!
• It’s already looking great!
• Syntactic ambiguity: arises when the same input has more than one valid parse tree
• Phrase boundary
• Ananth created the presentation with video from the web: ‘with video’ can attach to the verb (“Ananth created the presentation, ‘with video’”) or to the noun (“Ananth created the ‘presentation with video’”)
• Semantic ambiguity: a sentence can be interpreted in many ways
• John and Susan are married (to each other? Separately?)
• Ram had a smooth sailing.
• Prices have gone through the roof
• India says it can’t accept the proposal
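For the homophone example, a language model can pick the transcription whose word sequence is more probable. This minimal sketch estimates add-one-smoothed bigram probabilities from a hypothetical toy corpus; the helper names are illustrative, and a real speech recognizer combines a far larger language model with acoustic scores.

```python
from collections import Counter
from typing import List

# Hypothetical toy corpus standing in for a large collection of text.
corpus = (
    "the pm attended the peace summit . "
    "leaders signed a peace treaty . "
    "she ate a piece of cake . "
    "the peace summit ended ."
).split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
VOCAB_SIZE = len(unigrams)


def bigram_prob(prev: str, word: str) -> float:
    # Add-one (Laplace) smoothing gives unseen bigrams a small non-zero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE)


def sentence_prob(words: List[str]) -> float:
    """Product of bigram probabilities over consecutive word pairs."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p


def pick(candidates: List[str]) -> str:
    """Return the candidate transcription with the highest bigram probability."""
    return max(candidates, key=lambda s: sentence_prob(s.split()))


print(pick(["the pm attended the piece summit", "the pm attended the peace summit"]))
# the pm attended the peace summit
```

The toy corpus lets bigrams run across sentence boundaries (the "." tokens), a simplification that a real model would avoid with explicit start and end markers.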
Representation: Text, Images, Audio, Video
• What are the distinguishing characteristics of text data, and what unique challenges does it pose?
• Text is made of words, images of pixels, audio of a sampled and digitized signal, and video of image frames in motion
• How do we represent a piece of text in the computer?
• Let’s do a simple exercise: what thoughts and emotions cross your mind when you hear the following words?
• Kalam
• Brilliant
• Pleasant
• Destruction
• Perfume
• Code
• Test
• Run
• Signal
• Words can be used in different contexts, and the context is key to interpreting the meaning of a word
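The closing point, that context drives meaning, is the idea behind distributional word representations. This minimal sketch builds a co-occurrence count vector for each word from a hypothetical four-sentence corpus (a window of 2 words on each side) and compares words with cosine similarity; the corpus, window size and helper names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical toy corpus: "run" appears in both a software and a sports context.
sentences = [
    "run the code and test the code",
    "compile the code then run the test",
    "athletes run the race on the track",
    "she will sprint the race on the track",
]


def context_vectors(sentences: List[str], window: int = 2) -> Dict[str, Counter]:
    """Count, for each word, the words appearing within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = context_vectors(sentences)
print(round(cosine(vectors["sprint"], vectors["run"]), 3))      # 0.485
print(round(cosine(vectors["sprint"], vectors["compile"]), 3))  # 0.354
```

The single vector for "run" mixes software words (code, test) with sports words (race, athletes), which is exactly the context problem raised above. Much of the similarity here also comes from the frequent word "the", a known weakness of raw counts that weighting schemes such as tf-idf try to reduce.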
