05 Text Classification - Naive Bayes

The document discusses text classification, focusing on the Naïve Bayes classifier and its applications in various domains such as spam detection and sentiment analysis. It covers the principles of the Naïve Bayes method, including the Bag of Words representation, learning process, and evaluation metrics like precision, recall, and F1 score. Additionally, it highlights the importance of handling unknown words and negation in sentiment classification, as well as the relationship between Naïve Bayes and language modeling.

Uploaded by

esmailelhariri272

TM340 Natural Language Processing

Text Classification

Naïve Bayes

Based on slides by Dan Jurafsky and Chris Manning

Agenda

 The Task of Text Classification

 The Naive Bayes Classifier

 Naive Bayes: Learning

 Sentiment and Binary Naive Bayes

 More on Sentiment Classification

 Naïve Bayes: Relationship to Language Modeling

 Text Classification Evaluation: Precision, Recall, and F1

2
The Task of Text Classification

3
Is this spam?

4
Positive or negative movie review?

 unbelievably disappointing

 Full of zany characters and richly applied satire, and some great plot twists

 this is the greatest screwball comedy ever filmed

 It was pathetic. The worst part about it was the boxing scenes.
5
What is the subject of this article?

 MeSH Subject Category Hierarchy

• Antagonists and Inhibitors
• Blood Supply
• Chemistry
• Drug Therapy
• Embryology
• Epidemiology
• …
6
Text Classification

 Input:

• a document d
• a fixed set of classes C = {c1, c2,…, cJ}

 Output: a predicted class c ∈ C

8
Classification Methods:
Hand-coded rules
 Rules based on combinations of words or other features

• Spam: black-list-address OR (“dollars” AND “have been selected”)

 Accuracy can be high if the rules are carefully refined by an expert

 But building and maintaining these rules is expensive

9
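The spam rule on the slide above can be written directly as a boolean expression. This is a minimal sketch: the function name `is_spam`, the blacklist contents, and the example messages are all made up for illustration.

```python
def is_spam(sender: str, body: str, blacklist: set[str]) -> bool:
    """Hand-coded rule: blacklisted sender OR ("dollars" AND "have been selected")."""
    text = body.lower()
    return sender.lower() in blacklist or (
        "dollars" in text and "have been selected" in text
    )

blacklist = {"promo@example.com"}  # hypothetical blacklist entry

print(is_spam("friend@example.org", "You have been selected to win a million dollars!", blacklist))  # True
print(is_spam("friend@example.org", "Lunch tomorrow?", blacklist))  # False
```

Every new spam pattern needs another hand-written clause, which is why maintaining such rules gets expensive.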
Classification Methods:
Supervised Machine Learning
 Input:

• a document d

• a fixed set of classes C = {c1, c2,…, cJ}

• a training set of m hand-labeled documents (d1,c1),…,(dm,cm)

 Output:

• a learned classifier γ: d → c

10
Classification Methods:
Supervised Machine Learning
 Any kind of classifier can be used:

• Naïve Bayes
• Logistic regression
• Support-vector machines
• k-Nearest Neighbors
•…

11
The Naive Bayes Classifier

12
Naive Bayes Intuition

 Simple ("naive") classification method based on Bayes rule

 Relies on a very simple representation of the document (bag of words)

13
The Bag of Words Representation

γ( bag of word counts ) = c

word        count
seen        2
sweet       1
whimsical   1
recommend   1
happy       1
...         ...
14
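A bag of words keeps only how often each word occurs and discards word order. A minimal sketch with `collections.Counter`; the tokenizer regex and the sample sentence are assumptions, not part of the slides.

```python
from collections import Counter
import re

def bag_of_words(text: str) -> Counter:
    """Lowercase the text, extract runs of letters/apostrophes, count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

bow = bag_of_words("I have seen it; sweet, whimsical. I would recommend it, seen twice!")
print(bow["seen"], bow["sweet"], bow["happy"])  # 2 1 0
```

A `Counter` returns 0 for words that never occurred, which is convenient when looking up counts for a whole vocabulary.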
Bayes’ Rule Applied to Documents and Classes

 For a document d and a class c

$$P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}$$

15
Naive Bayes Classifier

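MAP is “maximum a posteriori” = most likely class:

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(c \mid d)$$

Bayes rule:

$$c_{MAP} = \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}$$

Dropping the denominator (P(d) is the same for every class):

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(d \mid c)\, P(c)$$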

16
Naive Bayes Classifier

Document d represented as features x1..xn:

$$c_{MAP} = \operatorname*{argmax}_{c \in C} \underbrace{P(x_1, x_2, \ldots, x_n \mid c)}_{\text{likelihood}} \; \underbrace{P(c)}_{\text{prior}}$$
17
Naive Bayes Classifier

cMAP argmax P ( x1 , x2 ,..., xn | c) P (c)

cC

O(|X| •|C|) parameters

n
How often does this class
Could only be occur?
estimated if a very, We can just count
very large number of the relative
training examples frequencies in a
was available. corpus
18
Multinomial Naive Bayes Independence
Assumptions

$$P(x_1, x_2, \ldots, x_n \mid c)$$

 Bag of Words assumption: Assume position doesn’t matter

 Conditional Independence: Assume the feature probabilities P(xi | cj) are independent given the class c

$$P(x_1, \ldots, x_n \mid c) = P(x_1 \mid c) \cdot P(x_2 \mid c) \cdot P(x_3 \mid c) \cdots P(x_n \mid c)$$
19
Multinomial Naive Bayes Classifier

cMAP argmax P ( x1 , x2 ,..., xn | c) P (c)

cC

20
Applying Multinomial Naive Bayes Classifiers to Text
Classification
positions ← all word positions in test document

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$

21
Problems with multiplying lots of probs

 There's a problem with this:

 Multiplying lots of probabilities can result in floating-point underflow!

0.0006 * 0.0007 * 0.0009 * 0.01 * 0.5 * 0.000008….

 Idea: Use logs, because log(ab) = log(a) + log(b)

We'll sum logs of probabilities instead of multiplying probabilities!

22
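The underflow problem is easy to reproduce. The sketch below multiplies 100 made-up word likelihoods of 0.00001 each; the exact product, 1e-500, is far below what a 64-bit float can represent, while the sum of logs stays well within range.

```python
import math

probs = [1e-5] * 100          # 100 word likelihoods (made-up values)

product = 1.0
for p in probs:
    product *= p              # true value is 1e-500, below the float64 range

log_sum = sum(math.log(p) for p in probs)

print(product)                # 0.0: the product has underflowed
print(log_sum)                # about -1151.29, still usable for comparing classes
```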
We do everything in log space

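 Instead of this:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$

 This:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \Big[ \log P(c_j) + \sum_{i \in positions} \log P(x_i \mid c_j) \Big]$$

 Notes:
1) Taking log doesn't change the ranking of classes!
The class with highest probability also has highest log probability!
2) It's a linear model:
Just a max of a sum of weights: a linear function of the inputs
So Naive Bayes is a linear classifier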
23
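In log space the classifier is just "add up one weight per word, plus a bias per class, and take the max". A minimal sketch, assuming the log priors and log likelihoods are already available as dictionaries; the parameter values below are made up.

```python
import math

def classify(words, log_prior, log_likelihood):
    """Return argmax_c [ log P(c) + sum_i log P(w_i | c) ], skipping unknown words."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in words if w in log_likelihood[c]
        )
    return max(scores, key=scores.get), scores

# Made-up parameters for two classes, for illustration only.
log_prior = {"pos": math.log(0.5), "neg": math.log(0.5)}
log_likelihood = {
    "pos": {"great": math.log(0.10), "boring": math.log(0.01)},
    "neg": {"great": math.log(0.02), "boring": math.log(0.08)},
}

label, scores = classify(["great", "great", "boring"], log_prior, log_likelihood)
print(label)  # pos
```

Each `log_likelihood[c][w]` acts as the weight of word `w` for class `c`, and `log_prior[c]` acts as the bias term.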
Naive Bayes: Learning

24
Learning the Multinomial Naive Bayes Model

 First attempt: maximum likelihood estimates

• simply use the frequencies in the data

$$\hat{P}(c_j) = \frac{N_{c_j}}{N_{total}}$$

25
Parameter Estimation

$$\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$$

fraction of times word wi appears among all words in documents of topic cj

 Create mega-document for topic j by concatenating all docs in this topic
• Use frequency of w in mega-document
26
Problem with Maximum Likelihood

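 What if we have seen no training documents with the word fantastic and classified in the topic positive?

$$\hat{P}(\text{fantastic} \mid \text{positive}) = \frac{count(\text{fantastic}, \text{positive})}{\sum_{w \in V} count(w, \text{positive})} = 0$$

 Zero probabilities cannot be conditioned away, no matter the other evidence!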

27
Laplace (add-1) smoothing for Naïve Bayes

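 The solution: apply Laplace (add-1) smoothing for Naïve Bayes

$$\hat{P}(w_i \mid c) = \frac{count(w_i, c) + 1}{\sum_{w \in V} \big(count(w, c) + 1\big)} = \frac{count(w_i, c) + 1}{\left(\sum_{w \in V} count(w, c)\right) + |V|}$$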
28
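Add-1 smoothing in code: add 1 to every word count and |V| to the denominator, so a word never seen with a class still gets a small non-zero probability. A minimal sketch; the toy vocabulary and counts are invented.

```python
from collections import Counter

def laplace_likelihood(word, class_counts: Counter, vocab: set) -> float:
    """(count(w, c) + 1) / (sum of count(w', c) over the vocabulary + |V|)."""
    total = sum(class_counts[w] for w in vocab)
    return (class_counts[word] + 1) / (total + len(vocab))

vocab = {"fantastic", "good", "bad"}                 # toy vocabulary
positive_counts = Counter({"good": 3, "bad": 1})     # "fantastic" never seen with positive

print(laplace_likelihood("fantastic", positive_counts, vocab))  # 1/7 instead of 0
```

The smoothed estimates still sum to 1 over the vocabulary, so they remain a valid probability distribution.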
Multinomial Naïve Bayes: Learning

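 From training corpus, extract Vocabulary

 Calculate P(cj) terms
• For each cj in C do
- docsj ← all docs with class = cj
- P(cj) ← |docsj| / |total # documents|

 Calculate P(wk | cj) terms
• Textj ← single doc containing all docsj
• For each word wk in Vocabulary
- nk ← # of occurrences of wk in Textj
- P(wk | cj) ← (nk + 1) / (n + |Vocabulary|), where n is the total number of word tokens in Textj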

29
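The learning procedure above can be sketched in a few lines. This is one possible reading, assuming add-1 smoothing and documents that are already tokenized; `train_nb` and the toy data are illustrative names, not from the slides.

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns (log_prior, log_likelihood, vocab)."""
    vocab = {w for words, _ in docs for w in words}
    labels = {c for _, c in docs}
    log_prior, log_likelihood = {}, {}
    for c in labels:
        docs_c = [words for words, label in docs if label == c]
        log_prior[c] = math.log(len(docs_c) / len(docs))
        # Text_c: the concatenation of all documents of class c
        counts = Counter(w for words in docs_c for w in words)
        n = sum(counts.values())
        log_likelihood[c] = {
            w: math.log((counts[w] + 1) / (n + len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood, vocab

toy = [("good good".split(), "pos"), ("bad".split(), "neg")]
log_prior, log_likelihood, vocab = train_nb(toy)
print(math.exp(log_likelihood["pos"]["good"]))   # (2 + 1) / (2 + 2)
```

Storing log probabilities at training time means classification only needs additions.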
Unknown words
 What about unknown words

• that appear in our test data

• but not in our training data or vocabulary?

 We ignore them

• Remove them from the test document!

• Pretend they weren't there!
• Don't include any probability for them at all!

 Why don't we build an unknown word model?

• It doesn't help: knowing which class has more unknown words is not generally helpful!
30
Stop words

 Some systems ignore stop words

• Stop words: very frequent words like the and a.

- Sort the vocabulary by word frequency in training set
- Call the top 10 or 50 words the stopword list.
- Remove all stop words from both training and test sets
• As if they were never there!

 But removing stop words doesn't usually help

• So, in practice most NB algorithms use all words and don't use stopword lists
31
Sentiment and Binary Naive Bayes

32
Let's do a worked sentiment example!

33
A worked sentiment example with add-1 smoothing

1. Prior from training:

$$\hat{P}(c_j) = \frac{N_{c_j}}{N_{total}}$$

P(-) = 3/5
P(+) = 2/5

2. Drop "with"

3. Likelihoods from training:

$$p(w_i \mid c) = \frac{count(w_i, c) + 1}{\left(\sum_{w \in V} count(w, c)\right) + |V|}$$

4. Scoring the test set:

34
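The four steps can be checked numerically. The slide's training and test sentences did not survive in this text, so the sketch below assumes the five training sentences and the test sentence from the textbook version of this example (Jurafsky and Martin); they are consistent with the priors 3/5 and 2/5 and with dropping "with", but treat them as an assumption.

```python
from collections import Counter
from fractions import Fraction

# Assumed training set (textbook version of this example); not shown on the slide.
train = [
    ("-", "just plain boring"),
    ("-", "entirely predictable and lacks energy"),
    ("-", "no surprises and very few laughs"),
    ("+", "very powerful"),
    ("+", "the most fun film of the summer"),
]
test_doc = "predictable with no fun"

vocab = {w for _, text in train for w in text.split()}
counts = {"-": Counter(), "+": Counter()}
n_docs = Counter()
for label, text in train:
    counts[label].update(text.split())
    n_docs[label] += 1

def score(label):
    """P(c) times the add-1 smoothed likelihood of each known test word."""
    total = sum(counts[label].values())
    result = Fraction(n_docs[label], len(train))           # step 1: prior
    for w in test_doc.split():
        if w not in vocab:                                  # step 2: drops "with"
            continue
        result *= Fraction(counts[label][w] + 1, total + len(vocab))  # step 3
    return result

scores = {c: score(c) for c in counts}                      # step 4
print({c: float(s) for c, s in scores.items()})
print(max(scores, key=scores.get))                          # -
```

Under these assumed sentences the negative class scores about 6.1e-05 against about 3.3e-05 for the positive class, so the test sentence is labeled negative.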
Optimizing for sentiment analysis

 For tasks like sentiment, word occurrence seems to be more important than word frequency.
• The occurrence of the word fantastic tells us a lot
• The fact that it occurs 5 times may not tell us much more.
 Binary multinomial naive Bayes, or binary NB

• Clip our word counts at 1

35
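Clipping counts at 1 amounts to turning each document into a set of word types before counting. A minimal sketch with invented documents:

```python
from collections import Counter

def binarized_counts(docs):
    """Count each word at most once per document, then sum over documents."""
    counts = Counter()
    for words in docs:
        counts.update(set(words))    # set() clips the within-document count at 1
    return counts

docs = [
    "it was great great great".split(),
    "great film".split(),
]
print(binarized_counts(docs)["great"])   # 2: once per document, not 4
```

The clipping is per document, so a word that appears in several documents of a class can still have a count above 1.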
Binary Multinomial Naïve Bayes: Learning

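• From training corpus, extract Vocabulary

 Calculate P(cj) terms
• For each cj in C do
- docsj ← all docs with class = cj
- P(cj) ← |docsj| / |total # documents|

 Calculate P(wk | cj) terms
• Textj ← single doc containing all docsj
• For each word wk in Vocabulary
- nk ← # of occurrences of wk in Textj
- P(wk | cj) ← (nk + 1) / (n + |Vocabulary|), where n is the total number of word tokens in Textj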

36
Binary Multinomial Naïve Bayes: Learning

• From training corpus, extract Vocabulary

 Calculate P(cj) terms
• For each cj in C do
- docsj ← all docs with class = cj
- P(cj) ← |docsj| / |total # documents|

 Calculate P(wk | cj) terms
• Remove duplicates in each doc:
- For each word type w in docj
- Retain only a single instance of w
• Textj ← single doc containing all docsj
• For each word wk in Vocabulary
- nk ← # of occurrences of wk in Textj
- P(wk | cj) ← (nk + 1) / (n + |Vocabulary|), where n is the total number of word tokens in Textj

37
Binary Multinomial Naive Bayes
on a test document d
 First remove all duplicate words from d

 Then compute NB using the same equation:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)$$

38
Binary multinomial naive Bayes

Counts can still be 2! Binarization is within-doc!
39

 I really like this movie

 I really don't like this movie

 Negation changes the meaning of "like" to negative.

 Negation can also change negative to positive.

• Don't dismiss this film

• Doesn't let us get bored
41
Sentiment Classification: Dealing with Negation

 Simple baseline method:

 Add NOT_ to every word between negation and following punctuation:

 didn’t like this movie , but I

 didn’t NOT_like NOT_this NOT_movie but I

42
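The NOT_ baseline is a small token-level rewrite. The sketch below is one way to do it; the list of negation triggers, the punctuation set, and the tokenizer regex are assumptions, and punctuation is dropped from the output as in the slide's example.

```python
import re

NEGATIONS = {"not", "no", "never"}       # assumed trigger words, extend as needed
PUNCTUATION = set(".,;:!?")

def mark_negation(text: str) -> list[str]:
    """Prefix NOT_ to every token between a negation and the next punctuation mark."""
    out, negating = [], False
    for tok in re.findall(r"[\w'’]+|[.,;:!?]", text):
        if tok in PUNCTUATION:
            negating = False             # punctuation ends the negation scope
            continue
        if negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS or tok.lower().endswith(("n't", "n’t")):
                negating = True
    return out

print(" ".join(mark_negation("didn't like this movie , but I")))
# didn't NOT_like NOT_this NOT_movie but I
```

After this step, `NOT_like` and `like` are separate vocabulary entries, so the classifier can learn different weights for them.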
Sentiment Classification: Lexicons

 Sometimes we don't have enough labeled training data

 In that case, we can make use of pre-built word lists called lexicons

 There are various publicly available lexicons

43
MPQA Subjectivity Cues Lexicon

 Home page: https://mpqa.cs.pitt.edu/lexicons/subj_lexicon/

 6885 words from 8221 lemmas, annotated for intensity (strong/weak)

• 2718 positive
• 4912 negative
 + : admirable, beautiful, confident, dazzling, ecstatic, favor, glee, great

 − : awful, bad, bias, catastrophe, cheat, deny, envious, foul, harsh, hate

44
The General Inquirer

 Home page: http://www.wjh.harvard.edu/~inquirer

 List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
 Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
 Categories:

• Positiv (1915 words) and Negativ (2291 words)

• Strong vs Weak, Active vs Passive, Overstated versus Understated
• Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc

 Free for Research Use

45
Using Lexicons in Sentiment Classification

 Add a feature that gets a count whenever a word from the lexicon occurs
• E.g., a feature called "this word occurs in the positive lexicon" or "this word occurs in the negative lexicon"

 Now all positive words (good, great, beautiful, wonderful) or negative words count for that feature.

 Using 1-2 features isn't as good as using all the words.

• But when training data is sparse or not representative of the test set, dense lexicon features can help
46
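Lexicon features collapse many words into two dense counts. A minimal sketch, using tiny hand-picked word sets in place of a real lexicon file; the sets, the feature names, and the example sentence are illustrative only.

```python
POSITIVE = {"admirable", "beautiful", "great", "good", "wonderful"}   # tiny sample
NEGATIVE = {"awful", "bad", "harsh", "hate"}                          # tiny sample

def lexicon_features(words):
    """Count how many tokens fall in the positive and negative lexicons."""
    lowered = [w.lower() for w in words]
    return {
        "in_positive_lexicon": sum(w in POSITIVE for w in lowered),
        "in_negative_lexicon": sum(w in NEGATIVE for w in lowered),
    }

print(lexicon_features("a beautiful film with a great cast but awful pacing".split()))
# {'in_positive_lexicon': 2, 'in_negative_lexicon': 1}
```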
Naive Bayes in Spam Filtering

 SpamAssassin Features:

• Mentions millions of (dollar) ((dollar) NN,NNN,NNN.NN)

• From: starts with many numbers
• Subject is all capitals
• HTML has a low ratio of text to image area
• "One hundred percent guaranteed"
• Claims you can be removed from the list
47
Naive Bayes in Language ID

 Determining what language a piece of text is written in.

 Features based on character n-grams do very well

 Important to train on lots of varieties of each language

• (e.g., American English varieties like African-American English, or English varieties around the world like Indian English)
48
Summary: Naive Bayes is Not So Naive

 Very Fast, low storage requirements

 Works well with very small amounts of training data

 Robust to Irrelevant Features

 Very good in domains with many equally important features

 Optimal if the independence assumptions hold

 A good dependable baseline for text classification

49
Naïve Bayes: Relationship to Language Modeling

50
Naïve Bayes and Language Modeling

 Naïve Bayes classifiers can use any sort of feature

• URL, email address, dictionaries, network features

 But if:

• We use only word features

• We use all of the words in the text (not a subset)
 Then:

• Naïve Bayes has an important similarity to language modeling.

51
Each class = a unigram language model

Assigning each word: P(word | c)

Assigning each sentence: P(s | c)=Π P(word | c)

Class pos

P(word | pos)   word
0.1             I
0.1             love
0.01            this
0.05            fun
0.1             film
…

Sentence s:     I     love   this   fun    film
P(word | pos):  0.1   0.1    0.01   0.05   0.1

P(s | pos) = 0.1 × 0.1 × 0.01 × 0.05 × 0.1 = 0.0000005
52
Naïve Bayes as a Language Model

 Which class assigns the higher probability to s?

word    P(word | pos)   P(word | neg)
I       0.1             0.2
love    0.1             0.001
this    0.01            0.01
fun     0.05            0.005
film    0.1             0.1

s = "I love this fun film"

P(s | pos) = 0.1 × 0.1 × 0.01 × 0.05 × 0.1 = 0.0000005
P(s | neg) = 0.2 × 0.001 × 0.01 × 0.005 × 0.1 = 0.000000001

P(s|pos) > P(s|neg)
53
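Treating each class as a unigram language model, the comparison above is a product of per-word probabilities. The sketch below uses the word probabilities from the slide; the dictionary layout and the function name `sentence_prob` are illustrative choices.

```python
import math

# Per-class unigram probabilities taken from the slide.
models = {
    "pos": {"I": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1},
    "neg": {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1},
}

def sentence_prob(sentence: str, model: dict) -> float:
    """P(s | c) as the product of P(word | c) over the words of s."""
    return math.prod(model[w] for w in sentence.split())

s = "I love this fun film"
p_pos = sentence_prob(s, models["pos"])
p_neg = sentence_prob(s, models["neg"])
print(p_pos > p_neg)  # True
```

Multiplying each value by the class prior P(c) turns this comparison into the Naive Bayes decision rule.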
Evaluation of Text Classification
Precision, Recall, and F1

54
Evaluating Classifiers
How well does our classifier work?
Let's first address binary classifiers:

• Is this email spam?

spam (+) or not spam (-)

• Is this post about Delicious Pie Company?

about Del. Pie Co (+) or not about Del. Pie Co (-)

We'll need to know

1. What did our classifier say about each email or post?

2. What should our classifier have said, i.e., the correct answer, usually as defined by humans ("gold label")

55
First step in evaluation: The confusion matrix

56
Why don't we use accuracy?

Accuracy doesn't work well when we're dealing with uncommon or imbalanced classes

Suppose we look at 1,000,000 social media posts to find Delicious Pie-lovers (or haters)

• 100 of them talk about our pie

• 999,900 are posts about something unrelated

Imagine the following simple classifier

Every post is "not about pie"

57
Why don't we use accuracy?

Accuracy of our "nothing is pie" classifier

999,900 true negatives and 100 false negatives

Accuracy is 999,900/1,000,000 = 99.99%!

But useless at finding pie-lovers (or haters)!!

Which was our goal!

Accuracy doesn't work well for unbalanced classes

Most tweets are not about pie!

58
Instead of accuracy we use precision and recall

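Precision: % of selected items that are correct

Precision = true positives / (true positives + false positives)

Recall: % of correct items that are selected

Recall = true positives / (true positives + false negatives)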

59
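Precision/Recall aren't fooled by the
"just call everything negative" classifier!
Stupid classifier: Just say no: every tweet is "not about pie"

• 100 tweets talk about pie, 999,900 tweets don't

• Accuracy = 999,900/1,000,000 = 99.99%

But the Recall and Precision for this classifier are terrible:

• Recall = TP / (TP + FN) = 0 / (0 + 100) = 0
• Precision = TP / (TP + FP) = 0 / (0 + 0), undefined because no items were selected (conventionally scored as 0)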

60
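The pie numbers can be plugged into the confusion-matrix counts directly. A minimal sketch; `evaluate` is a hypothetical helper that returns `None` for a metric whose denominator is zero.

```python
def evaluate(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else None   # undefined when nothing is selected
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall

# "Nothing is pie" classifier on 1,000,000 posts, 100 of which are about pie.
accuracy, precision, recall = evaluate(tp=0, fp=0, fn=100, tn=999_900)
print(accuracy, precision, recall)   # 0.9999 None 0.0
```

Accuracy looks excellent only because true negatives dominate the counts; recall shows that none of the 100 pie posts were found.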
A combined measure: F1

 F1 is a combination of precision and recall: their harmonic mean.

$$F_1 = \frac{2PR}{P + R}$$

61
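F1 as the harmonic mean of precision and recall, in a short sketch. Returning 0.0 when both inputs are 0 is a convention chosen here to avoid dividing by zero.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.5, 0.5))   # 0.5
print(f1(0.9, 0.1))   # about 0.18, much lower than the arithmetic mean of 0.5
```

The harmonic mean stays close to the smaller of the two values, so a classifier cannot get a high F1 by doing well on only one of them.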
Suppose we have more than 2 classes?
 Lots of text classification tasks have more than two classes.

 Sentiment analysis (positive, negative, neutral), named entities (person, location, organization)

 We can define precision and recall for multiple classes like this 3-way email task:

62
How to combine P/R values for different classes:
Microaveraging vs Macroaveraging

63
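Macroaveraging computes the metric for each class and then averages the results; microaveraging pools the counts of all classes first and computes one metric. The sketch below shows both for precision; the per-class counts are hypothetical and not taken from the slide's figure.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

# Hypothetical per-class (true positive, false positive) counts for a 3-way task.
per_class = {"urgent": (8, 11), "normal": (60, 55), "spam": (200, 33)}

# Macroaverage: mean of the per-class precisions, every class weighs the same.
macro = sum(precision(tp, fp) for tp, fp in per_class.values()) / len(per_class)

# Microaverage: pool the counts first, so frequent classes dominate.
pooled_tp = sum(tp for tp, _ in per_class.values())
pooled_fp = sum(fp for _, fp in per_class.values())
micro = precision(pooled_tp, pooled_fp)

print(round(macro, 2), round(micro, 2))   # 0.6 0.73
```

With these counts the microaverage is pulled up by the large, well-classified spam class, while the macroaverage reflects the weaker results on the small classes.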
Thank You

