PROCESSING TEXT IN NATURAL LANGUAGE
Natural language processing (NLP) is a machine learning
technology that gives computers the ability to interpret, manipulate,
and comprehend human language.
Organizations today have large volumes of voice and text data from
various communication channels like emails, text messages, social
media newsfeeds, video, audio, and more.
They use NLP software to automatically process this data, analyze the
intent or sentiment in the message, and respond in real time to
human communication.
Why is NLP important?
• Natural language processing (NLP) is critical to fully and efficiently analyze
text and speech data.
• It can work through the differences in dialects, slang, and grammatical
irregularities typical in day-to-day conversations.
Companies use it for several automated tasks, such as to:
• Process, analyze, and archive large documents
• Analyze customer feedback or call center recordings
• Run chatbots for automated customer service
• Answer who-what-when-where questions
• Classify and extract text
Natural language processing (NLP) is a field that focuses on making natural
human language usable by computer programs.
NLTK, or Natural Language Toolkit, is a Python package that you can use for NLP.
If you’re familiar with the basics of using Python and would like to get your feet
wet with some NLP, then you’ve come to the right place.
• Find text to analyze
• Preprocess your text for analysis
• Analyze your text
• Create visualizations based on your analysis
Pre-processing Text
• Text pre-processing is a crucial step in sentiment analysis: it cleans and normalizes the text data, making it easier to analyze.
• The pre-processing step involves a series of techniques that help transform raw
text data into a form you can use for analysis.
• Some common text pre-processing techniques include tokenization, stop word
removal, stemming, and lemmatization.
Working with NLTK
• To work with NLTK, you first need to install it using pip install nltk.
• Then, import the library and download necessary data like corpora and models using
nltk.download().
• NLTK provides tools for various NLP tasks such as tokenization, stemming, part-of-speech
tagging, and more.
Here's a breakdown of how to get started:
• 1. Installation:
• Open your terminal or command prompt and run the command: pip install nltk.
• 2. Importing NLTK:
In your Python script, import the necessary module:
import nltk
Downloading NLTK Data:
NLTK requires downloading specific corpora and models for various tasks.
You can download them using the following command:
nltk.download()
This will open a graphical interface where you can choose which data to download.
For beginners, downloading the book collection is a good starting point.
Basic NLTK Operations:
Tokenization: Breaking down text into smaller units (words, sentences).
Tokenizing
By tokenizing, you can conveniently split up text by word or by sentence.
This lets you work with smaller pieces of text that are still relatively
coherent and meaningful even outside the context of the rest of the text.
It’s your first step in turning unstructured data into structured data, which is
easier to analyze.
When you’re analyzing text, you’ll be tokenizing by word and tokenizing by
sentence. Here’s what both types of tokenization bring to the table:
Tokenizing by word: Words are like the atoms of natural language. They’re the
smallest unit of meaning that still makes sense on its own.
Tokenizing your text by word allows you to identify words that come up
particularly often.
For example, if you were analyzing a group of job ads, then you might find that the
word “Python” comes up often.
That could suggest high demand for Python knowledge, but you’d need to look
deeper to know more.
Tokenizing by sentence: When you tokenize by sentence, you can analyze how
those words relate to one another and see more context.
Example:
import nltk
nltk.download('punkt')  # tokenizer models (one-time download)
from nltk.tokenize import word_tokenize, sent_tokenize

text = "This is rama and krishna. Avanthi is a college for Professionals"
words = word_tokenize(text)
sentences = sent_tokenize(text)
print(words)
print(sentences)
Stemming: Reducing words to their root form.
import nltk
nltk.download('punkt')  # tokenizer models (one-time download)
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer

text = "This is ram and krishna. Avanthi is a college for Professionals"
words = word_tokenize(text)
sentences = sent_tokenize(text)
print(words)
print(sentences)

stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
Example of POS Tagging
Consider the sentence: "The quick brown fox jumps over the lazy dog."
After performing POS Tagging:
• "The" is tagged as determiner (DT)
• "quick" is tagged as adjective (JJ)
• "brown" is tagged as adjective (JJ)
• "fox" is tagged as noun (NN)
• "jumps" is tagged as verb (VBZ)
• "over" is tagged as preposition (IN)
• "the" is tagged as determiner (DT)
• "lazy" is tagged as adjective (JJ)
• "dog" is tagged as noun (NN)
Part-of-Speech (POS) Tagging: Identifying the grammatical role of each word.
import nltk
nltk.download('punkt')                       # tokenizer models
nltk.download('averaged_perceptron_tagger')  # POS tagger model
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

text = "This is rama and krishna. Avanthi is a college for professional"
words = word_tokenize(text)
tagged_words = pos_tag(words)
print(tagged_words)
Stop Word Removal: Removing common words that don't carry much meaning (e.g., "the", "a", "is").
import nltk
nltk.download('punkt')      # tokenizer models
nltk.download('stopwords')  # stop word lists
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

text = "This is ram and krishna. Avanthi is a college"
words = word_tokenize(text)
print(words)

stop_words = set(stopwords.words('english'))
filtered_words = [w for w in words if w.lower() not in stop_words]
print(filtered_words)
Accessing Corpora:
NLTK provides access to various corpora (text datasets).
import nltk
nltk.download('gutenberg')  # Project Gutenberg text samples
from nltk.corpus import gutenberg

# List the files available in the Gutenberg corpus
print(gutenberg.fileids())

# Access a specific text (e.g., Shakespeare's Hamlet)
hamlet = gutenberg.words('shakespeare-hamlet.txt')
print(hamlet[:50])  # print the first 50 words
Sentiment Analysis:
NLTK can be used for sentiment analysis.
You'll need to download the vader_lexicon resource for this.
import nltk
nltk.download('vader_lexicon')  # lexicon used by the VADER analyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
text = "This movie was fantastic! I loved it."
scores = analyzer.polarity_scores(text)
print(scores)
