Introduction to Sentiment Analysis
Rajesh Piryani
Department Of Computer Science
South Asian University, New Delhi
What is Sentiment Analysis?
It is a natural language processing task that uses an algorithmic formulation to
categorize an opinionated text into either “positive” or “negative” sentiment classes (or
sometimes a “neutral” class equivalent to having no opinion polarity).
Sentiment Analysis (SA) is defined as a quintuple
$\langle O_i, F_{ij}, S_{kijl}, H_k, T_l \rangle$
• $O_i$ = targeted object
• $F_{ij}$ = feature of the object
• $S_{kijl}$ = sentiment polarity
• $H_k$ = opinion holder $k$
• $T_l$ = time when the opinion is expressed
Example
$O_i$ = Samsung Mobile
$F_{ij}$ = Battery, Camera, Memory Card, Design, etc.
$S_{kijl}$ = positive for the first six months, negative after that
$H_k$ = Myself
$T_l$ = When I purchased the Samsung mobile it was good, but now, after 6 months, it gets heated in 4 to 5 minutes.
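The quintuple maps naturally onto a small record type. The sketch below is only an illustration: the class name `Opinion`, its field names, and the choice of "Battery" as the affected feature are assumptions, not part of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple <O_i, F_ij, S_kijl, H_k, T_l>."""
    target: str    # O_i: the object being discussed
    feature: str   # F_ij: the feature of the object
    polarity: str  # S_kijl: "positive", "negative" or "neutral"
    holder: str    # H_k: who holds the opinion
    time: str      # T_l: when the opinion is expressed

# The Samsung example as two quintuples, because the polarity changes over time.
# Attaching the heating complaint to "Battery" is an assumed, illustrative choice.
opinions = [
    Opinion("Samsung Mobile", "Battery", "positive", "Myself", "at purchase"),
    Opinion("Samsung Mobile", "Battery", "negative", "Myself", "after 6 months"),
]

for opinion in opinions:
    print(opinion.time, "->", opinion.polarity)
```

Splitting the review into two records keeps each quintuple tied to a single polarity and a single time.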
11/10/2017 INTRODUCTIONTO SENTIMENTANALYSIS 2
Why Sentiment Analysis
Mainly because of the Web; huge volumes of opinionated text
User-generated media: One can express opinions on anything in reviews, forums,
discussion groups, blogs
Opinions at global scale: no longer limited to:
• Individuals: one’s circle of friends
• Businesses: small-scale surveys, tiny focus groups, etc.
Example 1
I love this movie! It's sweet, but with satirical humor. The dialogue is great and the
adventure scenes are fun… It manages to be whimsical and romantic while laughing at
the conventions of the fairy tale genre. I would recommend it to just about anyone.
I've seen it several times, and I'm always happy to see it again whenever I have a
friend who hasn't seen it yet.
Example 2
My XYZ CAR was delivered yesterday. It looks fabulous. We went on a long
highway drive the very second day of getting the car. It was smooth, comfortable and
wonderful drive. Had a wonderful experience with family. Its an awesome car. I am
loving it..!
Classification of Sentence
Opinion without sentiment (Objectivity)
I believe the World is flat.
Samsung Galaxy has resolution of 14 MP.
Sentiment always involves the holder’s emotions or desires (Subjectivity)
I think intervention in Libya will put US in a
difficult situation.
The US attack on Afghanistan is wrong.
Video Quality of iPhone is awesome.
iPhone6 is newest in the market.
Figure 1. Classification of Sentence (diagram labels: Sentences; Objective, Subjective; Positive, Negative, Neutral)
Levels of Sentiment Analysis
• Document Level
• Sentence Level
• Aspect Level
Figure 2. Levels of Sentiment Analysis
Example 3
iPhone- User Review:
I bought an iPhone a few days ago. It was such a nice phone. The touch screen was
really cool. The voice quality was clear too. Although the battery life was not long, that
is ok for me. However, my mother was mad with me as I did not tell her before I
bought the phone. She also thought the phone was too expensive, and wanted me to
return it to the shop. …
Visual Comparison of Aspect-based Sentiment Analysis
Figure 3. Visual Comparison of Aspect Level based Sentiment Analysis
Approaches to perform Sentiment Analysis
Machine Learning Classifier Approach
Naïve Bayes, Maximum Entropy, Support Vector Machine etc.
Unsupervised Semantic Orientation Approach
Semantic Orientation using Pointwise Mutual Information and Information Retrieval (SO-PMI-IR)
Semi-supervised SentiWordNet based Approaches
SentiWordNet, SenticNet
ML Supervised Algorithm Block Diagram
Figure 4. Block diagram of ML Supervised Algorithm
Preprocessing of data for ML Algorithm
Review/Text → Tokenization → Stop word removal → Punctuation marks removal → Stemming
Figure 5. Steps for pre-processing of data
Preprocessing of data for ML Algorithm
Stop Words:
• common words that have low discrimination power (e.g., "the", "is", "and", "who")
• usually filtered out before processing the text
Stemming
• stemming collapses the different grammatical forms of a word (its noun, adjective, verb, and adverb forms, etc.) into a single form
• the goal is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form
• Example: "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu"
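The four pre-processing steps can be sketched in a few lines of Python. This is a toy under stated assumptions: the stop-word list is a tiny hand-picked sample, and `toy_stem` is a crude suffix stripper written for this illustration, not the Porter stemmer or any library routine. The names `STOP_WORDS`, `SUFFIXES`, `toy_stem` and `preprocess` are hypothetical.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "and", "who", "a", "an", "are", "of", "to"}

# Suffixes tried in this order. A toy rule set, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "e", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

def preprocess(text):
    # 1. Tokenization: lowercase, then split into word tokens and punctuation tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    # 2. Stop-word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Punctuation removal: keep only tokens with at least one letter or digit.
    tokens = [t for t in tokens if any(ch.isalnum() for ch in t)]
    # 4. Stemming.
    return [toy_stem(t) for t in tokens]

print([toy_stem(w) for w in ["argue", "argued", "argues", "arguing", "argus"]])
# ['argu', 'argu', 'argu', 'argu', 'argu']
print(preprocess("The dialogue is great and the adventure scenes are fun!"))
# ['dialogu', 'great', 'adventur', 'scen', 'fun']
```

Stems such as "dialogu" are not dictionary words; that is expected, since a stem only needs to be the same for all forms of a word.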
Supervised Machine Learning
Input:
• a document $d$
• a fixed set of classes $C = \{c_1, c_2, \ldots, c_n\}$
• a training set of $m$ hand-labeled documents $(d_1, c_1), \ldots, (d_m, c_m)$
Output:
• a learned classifier $Y: d \rightarrow c$
The bag of words representation
The bag of words representation: using a subset of words
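As a rough illustration of the bag-of-words idea, a document can be reduced to word counts with word order discarded, optionally restricted to a chosen subset of words. The function name `bag_of_words`, the sample sentence and the chosen subset are assumptions made for this sketch.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary=None):
    """Count token occurrences, ignoring order; optionally keep only a subset of words."""
    counts = Counter(tokens)
    if vocabulary is not None:
        counts = Counter({w: c for w, c in counts.items() if w in vocabulary})
    return counts

# Hypothetical, already lowercased and tokenized review text.
tokens = "i love this movie it is sweet and i would recommend it".split()

full = bag_of_words(tokens)
subset = bag_of_words(tokens, vocabulary={"love", "sweet", "recommend"})

print(full["i"], full["it"], sum(full.values()))  # 2 2 12
print(sorted(subset.items()))  # [('love', 1), ('recommend', 1), ('sweet', 1)]
```

Restricting the counts to a subset is one simple way to keep only words expected to carry sentiment.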
NB Machine Learning Approach
The probability of a document $d$ being in class $c$ is computed as
$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$
where $P(t_k \mid c)$ is the conditional probability of term $t_k$ occurring in a document of class $c$, and $n_d$ is the number of tokens in $d$.
The goal is to find the best class, i.e., the maximum a posteriori (MAP) class:
$$c_{map} = \arg\max_{c \in C} \; P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$
Because the logarithm is monotonic, this can be rewritten as
$$c_{map} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{1 \le k \le n_d} \log P(t_k \mid c) \right]$$
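One practical reason to prefer the log form is numerical: a product of many small probabilities underflows to zero in floating point, while a sum of logs does not. The sketch below uses made-up probabilities and a hypothetical 400-token document; the function names are illustrative.

```python
import math

def product_score(prior, token_probs):
    """P(c) * prod_k P(t_k | c); underflows to 0.0 for long documents."""
    score = prior
    for p in token_probs:
        score *= p
    return score

def log_score(prior, token_probs):
    """log P(c) + sum_k log P(t_k | c); same ranking, no underflow."""
    return math.log(prior) + sum(math.log(p) for p in token_probs)

# Hypothetical 400-token document: class A gives every token probability 0.01,
# class B gives every token probability 0.001.
probs_a = [0.01] * 400
probs_b = [0.001] * 400

print(product_score(0.5, probs_a), product_score(0.5, probs_b))  # 0.0 0.0
print(log_score(0.5, probs_a) > log_score(0.5, probs_b))  # True
```

With the plain product both classes score 0.0 and cannot be ranked; the log scores still separate them.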
NB Machine Learning Approach (Contd..)
$P(c)$ and $P(t_k \mid c)$ are maximum likelihood estimates based on training data and can be computed as:
$$P(c) = \frac{N_c}{N}$$
$$P(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$$
Laplace (add-1) smoothing for Naïve Bayes
$$P(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$$
where
• $N$ is the total number of documents,
• $N_c$ is the number of documents in class $c$,
• $T_{ct}$ is the number of occurrences of term $t$ in training documents from class $c$,
• $|V|$ is the number of unique words in the vocabulary.
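A short sketch of why smoothing matters: without it, a term that never occurs in a class gets probability zero, which would zero out the whole product for any document containing it. The counts and the three-word vocabulary below are invented for illustration, and the function names `mle` and `laplace` are hypothetical.

```python
from collections import Counter

def mle(term, counts, vocabulary):
    """Unsmoothed estimate T_ct / sum over t' of T_ct'."""
    total = sum(counts[t] for t in vocabulary)
    return counts[term] / total

def laplace(term, counts, vocabulary):
    """Add-1 estimate (T_ct + 1) / (sum over t' of T_ct' + |V|)."""
    total = sum(counts[t] for t in vocabulary)
    return (counts[term] + 1) / (total + len(vocabulary))

# Hypothetical term counts for one class; "boring" never occurs in it.
counts = Counter({"great": 3, "fun": 1})
vocabulary = ["great", "fun", "boring"]

print(mle("boring", counts, vocabulary))      # 0.0
print(laplace("boring", counts, vocabulary))  # 1/7, about 0.143
print(sum(laplace(t, counts, vocabulary) for t in vocabulary))  # about 1.0
```

The smoothed estimates still sum to one over the vocabulary, so they remain a valid probability distribution.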
Example
S: I love this fun film.
Steps:
• Assign each word a probability under each class: $P(word \mid c)$
• Assign the sentence a probability under each class: $P(s \mid c) = \prod P(word \mid c)$
Which class assigns the higher probability to $s$?
Word    Model Positive    Model Negative
I       0.1               0.2
love    0.1               0.001
this    0.01              0.01
fun     0.05              0.005
film    0.1               0.1
$P(s \mid pos) = 0.1 \times 0.1 \times 0.01 \times 0.05 \times 0.1 = 5 \times 10^{-7}$
$P(s \mid neg) = 0.2 \times 0.001 \times 0.01 \times 0.005 \times 0.1 = 10^{-9}$
$P(s \mid pos) > P(s \mid neg)$
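The comparison can be checked mechanically. The word probabilities below are the ones given for the two models in this example; the dictionary layout and the function name `sentence_probability` are assumptions of this sketch.

```python
import math

# Per-class word probabilities from the positive and negative models of the example.
model = {
    "pos": {"i": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1},
    "neg": {"i": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1},
}

def sentence_probability(words, word_probs):
    """P(s | c) as the product of P(word | c) over the words of s."""
    return math.prod(word_probs[w] for w in words)

sentence = "I love this fun film".lower().split()
scores = {c: sentence_probability(sentence, probs) for c, probs in model.items()}
best = max(scores, key=scores.get)

print(best)  # pos
```

The scores come out at roughly 5e-7 for the positive model and 1e-9 for the negative model, so the positive class wins.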
Example
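Doc   Words                                  Class
Training documents
1     Chinese Beijing Chinese                c
2     Chinese Chinese Shanghai               c
3     Chinese Macao                          c
4     Tokyo Japan Chinese                    j
Test document
5     Chinese Chinese Chinese Tokyo Japan    ?
Formulas
Prior: $P(c) = \frac{N_c}{N}$
Conditional probability: $P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$
• $|V|$: size of the vocabulary (unique words)
• $\mathrm{count}(c)$: total number of words in class $c$
• $\mathrm{count}(w, c)$: number of occurrences of $w$ in class $c$
For example:
Priors
$P(c) = \frac{3}{4}$, $P(j) = \frac{1}{4}$
Conditional probabilities for class $c$
$P(\text{Chinese} \mid c) = \frac{5 + 1}{8 + 6} = \frac{6}{14} = \frac{3}{7}$
$P(\text{Beijing} \mid c) = \frac{1 + 1}{8 + 6} = \frac{2}{14} = \frac{1}{7}$
$P(\text{Shanghai} \mid c) = \frac{1 + 1}{8 + 6} = \frac{2}{14} = \frac{1}{7}$
$P(\text{Macao} \mid c) = \frac{1 + 1}{8 + 6} = \frac{2}{14} = \frac{1}{7}$
$P(\text{Tokyo} \mid c) = \frac{0 + 1}{8 + 6} = \frac{1}{14}$
$P(\text{Japan} \mid c) = \frac{0 + 1}{8 + 6} = \frac{1}{14}$
Conditional probabilities for class $j$
$P(\text{Chinese} \mid j) = \frac{1 + 1}{3 + 6} = \frac{2}{9}$
$P(\text{Beijing} \mid j) = \frac{0 + 1}{3 + 6} = \frac{1}{9}$
$P(\text{Shanghai} \mid j) = \frac{0 + 1}{3 + 6} = \frac{1}{9}$
$P(\text{Macao} \mid j) = \frac{0 + 1}{3 + 6} = \frac{1}{9}$
$P(\text{Tokyo} \mid j) = \frac{1 + 1}{3 + 6} = \frac{2}{9}$
$P(\text{Japan} \mid j) = \frac{1 + 1}{3 + 6} = \frac{2}{9}$
CHOOSING A CLASS
$P(c \mid d_5) \propto \frac{3}{4} \cdot \left(\frac{3}{7}\right)^3 \cdot \frac{1}{14} \cdot \frac{1}{14} \approx 0.0003$
$P(j \mid d_5) \propto \frac{1}{4} \cdot \left(\frac{2}{9}\right)^3 \cdot \frac{2}{9} \cdot \frac{2}{9} \approx 0.0001$
Since $P(c \mid d_5) > P(j \mid d_5)$, document 5 is assigned to class $c$.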
Algorithm
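The sketch below is one possible rendering of multinomial Naïve Bayes training and classification with add-1 smoothing, run on the Chinese/Japan example. It is not the listing from the original slide, and the names `train`, `cond_prob` and `classify` are illustrative. It uses exact fractions and plain products so the result can be compared with the hand calculation; a practical implementation would usually sum logs instead.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def train(docs):
    """docs: list of (tokens, label). Returns priors, per-class term counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)
    vocabulary = {t for tokens, _ in docs for t in tokens}
    priors = {c: Fraction(n, len(docs)) for c, n in class_docs.items()}
    return priors, term_counts, vocabulary

def cond_prob(term, label, term_counts, vocabulary):
    """Add-1 smoothed P(term | label)."""
    total = sum(term_counts[label].values())
    return Fraction(term_counts[label][term] + 1, total + len(vocabulary))

def classify(tokens, priors, term_counts, vocabulary):
    """Return the best class and the unnormalised score of every class."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for t in tokens:
            if t in vocabulary:  # words never seen in training are skipped
                score *= cond_prob(t, label, term_counts, vocabulary)
        scores[label] = score
    return max(scores, key=scores.get), scores

training = [
    ("Chinese Beijing Chinese".split(), "c"),
    ("Chinese Chinese Shanghai".split(), "c"),
    ("Chinese Macao".split(), "c"),
    ("Tokyo Japan Chinese".split(), "j"),
]
priors, term_counts, vocabulary = train(training)
best, scores = classify("Chinese Chinese Chinese Tokyo Japan".split(),
                        priors, term_counts, vocabulary)
print(best, {c: round(float(s), 4) for c, s in scores.items()})
```

The scores round to 0.0003 for class c and 0.0001 for class j, matching the worked example, so the test document is assigned to class c.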
Performance Evaluation
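Definitions of some terms
• $tp$: a true positive ($tp$) decision assigns two similar documents to the same class
• $tn$: a true negative ($tn$) decision assigns two dissimilar documents to different classes
• $fp$: a false positive ($fp$) decision assigns two dissimilar documents to the same class
• $fn$: a false negative ($fn$) decision assigns two similar documents to different classes
For a single target class (e.g., "positive"), the same counts can be read per document: $tp$ and $fn$ count documents of that class that the classifier labels correctly and incorrectly, while $fp$ and $tn$ count documents of other classes that are labeled as the target class or not.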
Performance Evaluation
Accuracy (A)
$$A = \frac{\text{number of correctly classified documents}}{\text{total number of documents}}$$
Precision (P)
$$P = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{retrieved documents}|} = \frac{tp}{tp + fp}$$
Recall (R)
$$R = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{relevant documents}|} = \frac{tp}{tp + fn}$$
F-measure (F)
$$F = 2 \cdot \frac{P \cdot R}{P + R}$$
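These measures are easy to compute from gold labels and predictions. The sketch below treats one class as the target ("relevant") class; the labels and predictions are invented for illustration, and the function name `evaluate` is hypothetical.

```python
def evaluate(gold, predicted, target):
    """Accuracy, precision, recall and F-measure for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f

# Hypothetical gold labels and classifier output for six reviews.
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "pos", "neg", "neg"]

accuracy, precision, recall, f = evaluate(gold, predicted, target="pos")
print(accuracy, precision, recall, f)
```

Here tp = 2, fp = 1 and fn = 1, so accuracy, precision, recall and F-measure all come out at about 0.667.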
Exercise
Doc   Words                                   Class
Training documents
1     India Eden India Wicket                 Cricket
2     India India Sachin                      Cricket
3     Sachin India Eden                       Cricket
4     Japan Mesi India                        Football
Test document
5     India Sachin India Japan Eden Wicket    ?
Compute the conditional probability of each unique word and determine the class of document 5.
Hint
Formulas
Prior: $P(c) = \frac{N_c}{N}$
Conditional probability: $P(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$
• $|V|$: size of the vocabulary (unique words)
• $\mathrm{count}(c)$: total number of words in class $c$
• $\mathrm{count}(w, c)$: number of occurrences of $w$ in class $c$
For example, the priors are $P(C) = \frac{3}{4}$ and $P(F) = \frac{1}{4}$, where $C$ = Cricket and $F$ = Football.
Conditional probabilities for class $C$
$P(\text{India} \mid C) = ?$
$P(\text{Eden} \mid C) = ?$
$P(\text{Wicket} \mid C) = ?$
$P(\text{Sachin} \mid C) = ?$
$P(\text{Japan} \mid C) = ?$
$P(\text{Mesi} \mid C) = ?$
Conditional probabilities for class $F$
$P(\text{India} \mid F) = ?$
$P(\text{Eden} \mid F) = ?$
$P(\text{Wicket} \mid F) = ?$
$P(\text{Sachin} \mid F) = ?$
$P(\text{Japan} \mid F) = ?$
$P(\text{Mesi} \mid F) = ?$
CHOOSING A CLASS
$P(C \mid d_5) = ?$
$P(F \mid d_5) = ?$
