Welcome to Ducat India
Language | Industrial Training | Digital Marketing |
Web Technology | Testing+ | Database |
Networking | Mobile Application | ERP | Graphic |
Big Data | Cloud Computing
Apply Now
Training & Placement
Call now:
70-70-90-50-90
www.ducatindia.com
What is Text Analysis?
Text analysis, also known as text analytics, refers to the representation, processing, and modeling of textual data
to derive useful insights. An important element of text analysis is text mining, the process of finding
relationships and interesting patterns in large text collections.
Steps of Text Analysis
A text analysis problem usually includes three important steps: parsing, search and retrieval, and text mining.
Parsing:
Parsing takes unstructured text and imposes a structure on it for further analysis. The
unstructured text can be a plain text file, a weblog, an Extensible Markup Language (XML) file, a HyperText
Markup Language (HTML) file, or a Word document. Parsing deconstructs the provided text and renders it in a
more structured form for the subsequent steps.
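As a minimal sketch of parsing, the snippet below uses Python's standard-library html.parser module to strip the markup from a small HTML fragment and keep only its text. The class name TextExtractor, the helper extract_text, and the sample input are illustrative assumptions, not part of the original slides.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text found between tags; skip whitespace-only pieces.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the list of text pieces found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = "<html><body><h1>Text Analysis</h1><p>Parsing imposes structure.</p></body></html>"
print(extract_text(sample))
```

The extracted pieces can then be passed on to the search and text mining steps. Other input types (XML, Word documents) would need a different parser.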
Search and retrieval:
Search and retrieval is the identification of the documents in a corpus that contain search items such as specific
words, phrases, topics, or entities like people or organizations. These search items are generally known as key
terms. Search and retrieval originated in the field of library science and is now used extensively by web
search engines.
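One common way to support this kind of lookup is an inverted index, which maps each term to the documents that contain it. The sketch below is a simplified illustration under that assumption; the function names build_index and search and the three-document corpus are hypothetical, and real search engines add ranking, phrase matching, and much more.

```python
from collections import defaultdict


def build_index(corpus):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, key_term):
    """Return the sorted ids of documents that contain key_term."""
    return sorted(index.get(key_term.lower(), set()))


corpus = {
    "doc1": "text mining finds patterns",
    "doc2": "search engines index text",
    "doc3": "library science",
}
index = build_index(corpus)
print(search(index, "text"))
```

Looking up a key term is then a single dictionary access rather than a scan of every document.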
Text mining:
Text mining uses the terms and indexes produced by the prior two phases to find meaningful insights pertaining
to domains or problems of interest.
Representing Text
Tokenization is the task of separating words from the body of the text. After tokenization, raw text is
converted into a collection of tokens, where each token is generally a word. A common approach is
tokenizing on spaces. For example, consider the following tweet:
I once had a gf back in the day. Then the bPhone came out lol
Tokenization based on spaces would output the following list of tokens:
{I, once, had, a, gf, back, in, the, day., Then, the, bPhone, came, out, lol}
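A minimal sketch of space-based tokenization in Python follows. The function name tokenize_on_spaces is an illustrative assumption; it relies on the built-in str.split, which splits on runs of whitespace when called without arguments.

```python
def tokenize_on_spaces(text):
    """Split text into tokens wherever whitespace occurs."""
    return text.split()


tweet = "I once had a gf back in the day. Then the bPhone came out lol"
tokens = tokenize_on_spaces(tweet)
print(tokens)
```

Note that the period stays attached to "day." because only whitespace is treated as a separator.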
Tokenization is a much more difficult task than one may expect. For example, should words like state-of-the-art,
Wi-Fi, and San Francisco be considered one token or more? Should words like Résumé, résumé, and resume all
map to the same token? Tokenization is even more difficult beyond English. In German, for example, there are
many unsegmented compound nouns. In Chinese, there are no spaces between words. Japanese intermingles several
writing systems. The list goes on.
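Two of these questions can be explored with a short sketch. The regular expression below is one possible choice that keeps hyphenated words as a single token, and the accent-stripping helper is one possible way to map accented and unaccented spellings to the same token. Both function names and both design choices are assumptions for illustration, not a recommended standard.

```python
import re
import unicodedata


def tokenize(text):
    """Split text into word tokens, keeping hyphenated words whole."""
    # \w+ matches a run of letters, digits, or underscores; the optional
    # group lets hyphen-joined parts stay inside a single token.
    return re.findall(r"\w+(?:-\w+)*", text)


def strip_accents(text):
    """Remove diacritics so accented and plain spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(tokenize("A state-of-the-art Wi-Fi network in San Francisco."))
print({strip_accents(w).lower() for w in ["Résumé", "résumé", "resume"]})
```

With this pattern, "San Francisco" still comes out as two tokens; treating it as one would require a list of known multiword names or a trained model.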
Another text normalization technique is called case folding, which reduces all letters to lowercase (or the opposite
if applicable). For the previous tweet, after case folding the text would become this:
i once had a gf back in the day. then the bphone came out lol
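In Python, case folding can be sketched with the built-in string methods str.lower and str.casefold. The function name fold_case is an illustrative assumption.

```python
def fold_case(text):
    """Reduce all letters in text to lowercase."""
    return text.lower()


tweet = "I once had a gf back in the day. Then the bPhone came out lol"
folded = fold_case(tweet)
print(folded)

# str.casefold is a more aggressive variant intended for caseless matching
# across languages; for example, it maps the German sharp s to "ss".
print("Straße".casefold())
```

For plain English text the two methods give the same result, so the choice mainly matters for multilingual corpora.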
After normalizing the text by tokenization and case folding, it needs to be represented in a more structured way. A
simple yet widely used approach to represent text is called bag-of-words.
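As a rough sketch, a bag-of-words representation can be built by counting how often each token occurs while ignoring word order. The example below assumes space-based tokenization and lowercase folding as described above; the function name bag_of_words is hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Count token occurrences after case folding and splitting on whitespace."""
    tokens = text.lower().split()
    return Counter(tokens)


bow = bag_of_words("I once had a gf back in the day. Then the bPhone came out lol")
print(bow.most_common(1))
```

A Counter returns zero for tokens it has never seen, which is convenient when comparing documents with different vocabularies.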
Read More: https://tutorials.ducatindia.com/data-science/what-is-text-analysis/