Copyright © 2014 EMC Corporation. All rights reserved.
Advanced Analytics - Theory and Methods
Advanced Analytics – Theory and Methods
During this lesson the following topics are covered:
• Naïve Bayesian Classifier
• Theoretical foundations of the classifier
• Use cases
• Evaluating the effectiveness of the classifier
• The Reasons to Choose (+) and Cautions (-) with the use of
the classifier
Naïve Bayesian Classifiers
The topics covered in this lesson are listed.
Classifiers
• Classification: assign labels to objects.
• Usually supervised: training set of pre-classified examples.
• Our examples:
Where in the catalog should I place this product listing?
Is this email spam?
Is this politician Democrat/Republican/Green?
The primary task performed by classifiers is to assign labels to
objects. Labels in classifiers are
pre-determined unlike in clustering where we discover the
structure and assign labels.
Classifier problems are supervised learning methods. We start with a training set of pre-classified examples and, using the probabilities learned from that set, assign class labels to new objects.
Some use case examples are shown in the slide. Based on the voting pattern on issues, we could classify whether a politician has an affiliation to a particular party or principle. Retailers use classifiers to assign products to the proper catalog entry locations. The classification of email as spam is another widely used application of classifier methods.
Logistic regression, discussed in the previous lesson, can be
viewed and used as a classifier. We
will discuss Naïve Bayesian Classifiers in this lesson and the
use of Decision Trees in the next
lesson.
Naïve Bayesian Classifier
• Determine the most probable class label for each object
– Based on the naïve assumption that the observed attributes are conditionally independent of each other
– For example, a spherical, yellow object less than 60 grams in weight may be classified (labeled) as a tennis ball
• Input variables are discrete
• Output:
– A probability score, proportional to the true probability
– A class label, assigned based on the highest probability score
The Naïve Bayesian Classifier is a probabilistic classifier based on Bayes' Law and naïve conditional independence assumptions. In simple terms, a Naïve Bayes Classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature.
For example, an object can be classified into a particular category based on its attributes such as shape, color, and weight. A reasonable classification for an object that is spherical, yellow, and less than 60 grams in weight may be a tennis ball. Even if these features depend on each other or upon the existence of the other features, a Naïve Bayesian Classifier considers all of these properties to contribute independently to the probability that the object is a tennis ball.
The input variables are generally discrete (categorical) but there
are variations to the
algorithms that work with continuous variables as well. For
this lesson, we will consider only
discrete input variables. Although weight may be considered a
continuous variable, in the
tennis ball example, weight was grouped into intervals in order
to make weight a categorical
variable.
The output typically returns a probability score and class membership. The output from most implementations is a log probability score for each class (we will address this later in the lesson), and we assign the class label that corresponds to the highest log probability score.
Naïve Bayesian Classifier - Use Cases
• Preferred method for many text classification problems; often tried before more complicated approaches
• Use cases
– Spam filtering
– Fraud detection
Naïve Bayesian Classifiers are among the most successful
known algorithms for learning to
classify text documents. Spam filtering is the best known use of
Naïve Bayesian Text
Classification. Bayesian Spam Filtering has become a popular
mechanism to distinguish
illegitimate spam email from legitimate email. Many modern
mail clients implement Bayesian
Spam Filtering.
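As a hedged illustration of Bayesian text classification, the short Python sketch below builds a toy spam filter with scikit-learn's MultinomialNB. The library choice, the example messages, and the labels are assumptions for demonstration only and are not part of the course material.

```python
# Illustrative sketch only: a tiny spam filter using scikit-learn's MultinomialNB.
# The texts and labels below are made up for demonstration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting agenda attached",
         "free money click here", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]      # hypothetical training labels

# Bag-of-words counts feed the multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))   # likely ['spam']
```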
Naïve Bayesian Classifiers are also used to detect fraud. For example, in auto insurance, based on a training data set with attributes such as driver's rating, vehicle age, vehicle price, whether the claim was filed by the policy holder, police report status, and whether the claim was genuine, we can classify a new claim as genuine or not.
References:
Spam filtering
(http://en.wikipedia.org/wiki/Bayesian_spam_filtering)
http://www.cisjournal.org/archive/vol2no4/vol2no4_1.pdf
Hybrid Recommender System Using Naive Bayes Classifier and Collaborative Filtering (http://eprints.ecs.soton.ac.uk/18483/)
Online applications (http://www.convo.co.uk/x02/)
Building a Training Dataset to Predict Good or Bad Credit
• Predict the credit behavior of a credit card applicant from the applicant's attributes: personal status, job type, housing type, and savings amount
• These are all categorical
variables and are better suited
to Naïve Bayesian Classifier
than to logistic regression.
Let us look into a specific use case example. We present here
the same example we worked
with in Lesson 2 of this module with the Apriori algorithm. The
training dataset consists of
attributes: personal status, job type, housing type and amount
of money in their savings
account. They are represented as categorical variables which are
well suited for Naïve Bayesian
Classifier.
With this training set we want to predict the credit behavior of a
new customer. This problem
could be solved with logistic regression as well. If there are
multiple levels for the outcome you
want to predict, then Naïve Bayesian Classifier is a better
solution.
Next, we will go through the technical basis for Naïve Bayesian
Classifiers and will revisit this
credit dataset later.
Technical Description - Bayes' Law
• C is the class label:
• A is the observed object attributes
• P(C | A) is the probability of C given A is observed
$$P(C \mid A) \;=\; \frac{P(A \cap C)}{P(A)} \;=\; \frac{P(A \mid C)\,P(C)}{P(A)}$$
Bayes' Law states: P(C | A)*P(A) = P(A | C)*P(C) = P(A ^ C).
That is, the conditional probability that C is true given that A is
true, denoted P(C|A), times the probability of A is
the same as the conditional probability that A is true given that
C is true, denoted P(A|C), times the probability of
C. Both of these terms are equal to P(A ^ C), that is, the probability that A and C are simultaneously true. If we divide all
three terms by P(A), then we get the form shown on the slide.
The reason that Bayes’ Law is important is that we may not
know P(C|A) (and we want to), but we do know P(A|C)
and P(C) for each possible value of C from the training data.
As we will see later, it is not necessary to know P(A)
for the purposes of Naïve Bayes Classifiers.
An example using Bayes Law:
John flies frequently and likes to upgrade his seat to first class.
He has determined that, if he checks in for his
flight at least two hours early, the probability that he will get
the upgrade is .75; otherwise, the probability that he
will get the upgrade is .35. With his busy schedule, he checks in
at least two hours before his flight only 40% of the
time. Suppose John didn’t receive an upgrade on his most
recent attempt. What is the probability that he arrived
late?
C = John arrives late
A = John did not receive an upgrade
P(C) = Probability John arrives late = .6
P(A) = Probability John did not receive an upgrade = 1 – ( .4 x
.75 + .6 x .35) = 1 -.51 = .49
P(A|C) = Probability that John did not receive an upgrade given
that he arrived late = 1 - .35 = .65
P(C|A) = Probability that John arrived late given that he did not
receive his upgrade
= P(A|C)P(C)/P(A) = (.65 x .6)/.49 = .80 (approx)
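To make the arithmetic above easy to verify, here is a minimal Python sketch that plugs the same numbers into Bayes' Law; the variable names are illustrative only.

```python
# Checking the slide's arithmetic for John's upgrade example
# (values taken directly from the text above).
p_late = 0.6                                  # P(C): checks in late 60% of the time
p_no_upgrade_given_late = 1 - 0.35            # P(A|C) = 0.65
p_no_upgrade = 1 - (0.4 * 0.75 + 0.6 * 0.35)  # P(A) = 0.49

p_late_given_no_upgrade = p_no_upgrade_given_late * p_late / p_no_upgrade
print(round(p_late_given_no_upgrade, 2))      # 0.8 (approx)
```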
In this simple example, C can take one of two possible values {arriving early, arriving late}, and there is only one
attribute, which can take one of two possible values {received upgrade, did not receive upgrade}. Next, we will
generalize Bayes’ Law to multiple attributes and apply the naïve
independence assumptions.
Apply the Naïve Assumption and Remove a Constant
• For observed attributes A = (a1, a2, …, am), we want to compute

$$P(C_i \mid a_1, a_2, \ldots, a_m) \;=\; \frac{P(a_1, a_2, \ldots, a_m \mid C_i)\,P(C_i)}{P(a_1, a_2, \ldots, a_m)}, \qquad i = 1, 2, \ldots, n$$

and assign the class label, Ci, with the largest P(Ci|A)
• Two simplifications to the calculations:
– Apply the naïve assumption: each aj is conditionally independent of each other given Ci, so

$$P(a_1, a_2, \ldots, a_m \mid C_i) \;=\; P(a_1 \mid C_i)\,P(a_2 \mid C_i)\cdots P(a_m \mid C_i) \;=\; \prod_{j=1}^{m} P(a_j \mid C_i)$$

– Remove the constant denominator P(a1, a2, …, am), which is the same for all class labels
The general approach is to assign the classifier label, Ci, to the
object with attributes A = (a1, a2,
… am) that corresponds to the largest value of P(Ci|A).
The probability that a set of attribute values A (comprised of m variables a1 through am) should be labeled with classification Ci equals the probability of observing a1 through am given that Ci is true, times the probability of Ci, all divided by the probability of the set of attribute values a1 through am.
The conditional independence assumption is that the probability
of observing the value of a
particular attribute given Ci is independent of the other
attributes. This naïve assumption
simplifies the calculation of P(a1, a2, …, am|Ci) as shown on
the slide.
Since P(a1, a2, …, am) appears in the denominator of P(Ci|A),
for all values of i, removing the
denominator will have no impact on the relative probability
scores and will simplify
calculations. Next, these two simplifications to the
calculations will be applied to build the
Naïve Bayesian Classifier.
Building a Naïve Bayesian Classifier
• Applying the two simplifications:

$$P(C_i \mid a_1, a_2, \ldots, a_m) \;\propto\; \prod_{j=1}^{m} P(a_j \mid C_i)\,P(C_i), \qquad i = 1, 2, \ldots, n$$

• To build a Naïve Bayesian Classifier, collect the following statistics from the training data: P(Ci) for each class label and P(aj | Ci) for each attribute value aj and each class label Ci, so that the product above can be evaluated for every class.
Applying the two simplifications, P(Ci|a1, a2, …, am) is
proportional to the product of the
various P(aj|Ci), for j=1,2,…m, times P(Ci). From a training
dataset, these probabilities can be
computed and stored for future classifier assignments. We now
return to the credit applicant
example.
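As a rough sketch of how these statistics might be collected, the Python snippet below tallies class priors and conditional probabilities from a tiny, made-up training set; the records and attribute names are hypothetical and not the actual course dataset.

```python
# Minimal sketch of the training step: estimate P(C_i) and P(a_j | C_i)
# from a toy training set (records below are hypothetical).
from collections import Counter, defaultdict

records = [
    {"housing": "own",  "job": "skilled",   "label": "good"},
    {"housing": "rent", "job": "unskilled", "label": "bad"},
    {"housing": "own",  "job": "skilled",   "label": "good"},
]

class_counts = Counter(r["label"] for r in records)
pair_counts = defaultdict(Counter)          # (attribute, class) -> value counts
for r in records:
    for attr, value in r.items():
        if attr != "label":
            pair_counts[(attr, r["label"])][value] += 1

n = len(records)
prior = {c: cnt / n for c, cnt in class_counts.items()}                  # P(C_i)
cond = {k: {v: cnt / class_counts[k[1]] for v, cnt in vals.items()}
        for k, vals in pair_counts.items()}                              # P(a_j | C_i)

print(prior)
print(cond[("housing", "good")])            # e.g. {'own': 1.0}
```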
Naïve Bayesian Classifiers for the Credit Example
• Class labels: {good, bad}, with P(good) = 0.7 and P(bad) = 0.3
• Conditional probabilities for each attribute value given each class label, for example P(own | good) = 0.75
To build a Naïve Bayesian Classifier we need to collect the
following statistics:
1. Probability of all class labels – the probability of good credit and the probability of bad credit. From all the data available in the training set, we determine P(good) = 0.7 and P(bad) = 0.3.
2. In the training set, there are several attributes:
personal_status, job, housing, and
saving_status. For each attribute and its possible values, we
need to compute the
conditional probabilities given bad or good credit. For
example, relative to the housing
attribute, we need to compute P(own|bad), P(own|good),
P(rent|bad), P(rent|good), etc.
Naïve Bayesian Classifier for a Particular Applicant
• Given applicant attributes of
A= {female single,
owns home,
self-employed,
savings > $1000}
• Since P(good|A) > P(bad|A), assign the applicant the label "good" credit
aj Ci P(aj| Ci)
female single good 0.28
female single bad 0.36
own good 0.75
own bad 0.62
self emp good 0.14
self emp bad 0.17
savings>1K good 0.06
savings>1K bad 0.02
P(good|A) ~ (0.28*0.75*0.14*0.06)*0.7 = 0.0012
P(bad|A) ~ (0.36*0.62*0.17*0.02)*0.3 = 0.0002
Here we have an example of an applicant who is female, single,
owns a home, is self-employed
and has savings over $1000 in her savings account. How will we
classify this person? Will she
be scored as a person with good or bad credit?
Having built the classifier with the training set we find
P(good|A) which is equal to 0.0012 (see
the computation on the slide) and P(bad|A) is 0.0002. Since
P(good|A) is the maximum of the
two probability scores, we assign the label “good” credit.
The score is only proportional to the probability. It doesn't
equal the probability, because we
haven't included the denominator. However, both formulas have
the same denominator, so we
don't need to calculate it in order to know which quantity is
bigger.
Notice, though, how small in magnitude these scores are. When
we are looking at problems
with a large number of attributes, or attributes with a very high
number of levels, these values
can become very small in magnitude.
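For concreteness, the following Python sketch reproduces the two scores on the slide from the tabulated conditional probabilities and the priors P(good) = 0.7 and P(bad) = 0.3; it is an illustration, not the lab implementation.

```python
# Reproducing the scoring on the slide with the conditional probabilities
# from the table; priors come from the earlier slide.
p_attr = {
    "good": [0.28, 0.75, 0.14, 0.06],   # female single, own, self emp, savings>1K
    "bad":  [0.36, 0.62, 0.17, 0.02],
}
prior = {"good": 0.7, "bad": 0.3}

scores = {}
for label, probs in p_attr.items():
    score = prior[label]
    for p in probs:
        score *= p                      # product of P(a_j | C_i), times P(C_i)
    scores[label] = score

print(scores)                           # {'good': ~0.0012, 'bad': ~0.0002}
print(max(scores, key=scores.get))      # 'good'
```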
Naïve Bayesian Implementation Considerations
• Numerical underflow
• Zero probabilities due to unobserved attribute/classifier pairs
– Addressed by smoothing (adjusting the estimated probabilities by a small amount)
• Assign the classifier label, Ci, that maximizes the value of

$$\sum_{j=1}^{m} \log P'(a_j \mid C_i) \;+\; \log P'(C_i)$$

where i = 1, 2, …, n and P' denotes the smoothing-adjusted probabilities
Multiplying several probability values, each possibly close to
zero, invariably leads to the
problem of numerical underflow. So an important
implementation guideline is to compute the
logarithm of the product of the probabilities, which is
equivalent to the summation of the
logarithm of the probabilities. Although the risk of underflow may increase as the number of attributes increases, the use of logarithms should be applied regardless of the number of attribute dimensions.
Additionally, to address the possibility of probabilities equal to
zero, smoothing techniques can
be employed to adjust the probabilities to ensure non-zero
values. Applying a smoothing
technique assigns a small non-zero probability to rare events not
included in the training
dataset. Also, the smoothing addresses the possibility of taking
the logarithm of zero.
The R implementation of Naïve Bayes incorporates the
smoothing directly into the probability
tables. Essentially, the Laplace smoothing that R uses adds one
(or a small value) to every
count. For example, if we have 100 "good" customers, and 20 of them rent their housing, the "raw" P(rent | good) = 20/100 = 0.2; with Laplace smoothing adding one to the counts, the calculation would be P(rent | good) ~ (20 + 1)/(100 + 3) ≈ 0.20388, where there are 3 possible values for housing (own, rent, for free).
Fortunately, the use of the logarithms and the smoothing
techniques are already implemented
in standard software packages for Naïve Bayes Classifiers.
However, if for performance
reasons, the Naïve Bayes Classifier algorithm needs to be coded
directly into an application,
these considerations should be implemented.
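A minimal sketch of these two safeguards, assuming the counts from the rent example above and the conditional probabilities from the earlier credit slide, might look like the following; it is illustrative and not a packaged implementation.

```python
# Sketch of the two implementation safeguards described above:
# Laplace smoothing and summing log-probabilities instead of multiplying.
import math

count_rent_good = 20
count_good = 100
n_housing_levels = 3                   # own, rent, for free

# Laplace smoothing: add 1 to the numerator and the number of levels
# to the denominator, matching the worked example in the text above.
p_rent_given_good = (count_rent_good + 1) / (count_good + n_housing_levels)
print(round(p_rent_given_good, 5))     # ~0.20388

# Log-probability scoring for one class: a sum of logs avoids numerical
# underflow when there are many attributes.
cond_probs = [0.28, 0.75, 0.14, 0.06]  # P(a_j | good) values from the earlier slide
prior_good = 0.7
log_score = sum(math.log(p) for p in cond_probs) + math.log(prior_good)
print(log_score)                       # log of ~0.0012
```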
Diagnostics
• Hold-out data
• Cross-validation
• ROC curve/AUC
The diagnostics we used in regression can also be used to validate the effectiveness of the model we built. The techniques of using hold-out data, performing N-fold cross-validation, and using ROC curves and the Area Under the Curve (AUC) can be deployed with the Naïve Bayesian Classifier as well.
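As a hedged sketch of how these diagnostics could be run, the snippet below uses scikit-learn's CategoricalNB on synthetic integer-coded data; the library, the data, and the parameter choices are assumptions for illustration only.

```python
# Illustrative diagnostics with scikit-learn on synthetic data
# (not part of the course material).
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))        # 4 categorical attributes, coded 0-2
y = rng.integers(0, 2, size=200)             # binary class label

# Hold-out data: fit on a training split, evaluate on the held-out split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = CategoricalNB().fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# N-fold cross-validation with ROC AUC as the scoring metric.
print("5-fold AUC:", cross_val_score(CategoricalNB(), X, y, cv=5, scoring="roc_auc"))
```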
Diagnostics: Confusion Matrix

                    Prediction
Actual Class        good           bad            Total
good                671 (TP)       29 (FN)        700
bad                 38 (FP)        262 (TN)       300
Total               709            291            1000

Overall success rate (or accuracy): (TP + TN) / (TP + TN + FP + FN) = (671 + 262)/1000 ≈ 0.93
TPR: TP / (TP + FN) = 671 / (671 + 29) = 671/700 ≈ 0.96
FPR: FP / (FP + TN) = 38 / (38 + 262) = 38/300 ≈ 0.13
FNR: FN / (TP + FN) = 29 / (671 + 29) = 29/700 ≈ 0.04
Precision: TP / (TP + FP) = 671/709 ≈ 0.95
Recall (or TPR): TP / (TP + FN) ≈ 0.96
A confusion matrix is a specific table layout that allows
visualization of the performance of a model. In
the hypothetical example of confusion matrix shown:
Of 1000 credit score samples, the model predicted 709 as good credit and 291 as bad credit. Of the 700 actual good credits, the model predicted 29 as bad; similarly, 38 of the actual bad credits were predicted as good. All correct guesses are located on the diagonal of the table, so it is easy to visually inspect the table for errors: they are represented by any non-zero values outside the diagonal.
We define the overall success rate (or accuracy), a metric of what we got right, as the ratio of the sum of the diagonal values (i.e., TP and TN) to the sum of the entire table. In other words, the confusion table of a good model has large numbers on the diagonal and small (ideally zero) numbers off the diagonal.
We saw a true positive rate (TPR) and a false positive rate
(FPR) when we discussed ROC curves:
• TPR – what percent of positive instances did we correctly
identify.
• FPR – what percent of negatives we marked positive.
Additionally, we can measure the false negative rate (FNR):
• FNR – what percent of positives we marked negative
The computation of TPR, FPR and FNR are shown in the slide.
Precision and Recall are accuracy metrics used by the
information retrieval community; they are often
used to characterize classifiers as well. We will detail these
metrics in lesson 8 of this module.
Note:
• precision – what percent of things we marked positive really
are positive
• recall – what percent of positive instances did we correctly
identify. Recall is equivalent to TPR.
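For reference, the short Python sketch below recomputes these metrics directly from the counts in the table; it simply restates the formulas above.

```python
# Re-computing the confusion-matrix metrics from the counts on the slide.
TP, FN = 671, 29     # actual good: predicted good / predicted bad
FP, TN = 38, 262     # actual bad:  predicted good / predicted bad

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 0.933
tpr       = TP / (TP + FN)                    # 0.959 (recall)
fpr       = FP / (FP + TN)                    # 0.127
fnr       = FN / (TP + FN)                    # 0.041
precision = TP / (TP + FP)                    # 0.946

print(accuracy, tpr, fpr, fnr, precision)
```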
Naïve Bayesian Classifier - Reasons to Choose (+) and Cautions (-)

Reasons to Choose (+):
• Handles missing values quite well
• Robust to irrelevant variables
• Easy to implement
• Easy to score data
• Resistant to over-fitting
• Computationally efficient
• Handles very high dimensional problems
• Handles categorical variables with a lot of levels

Cautions (-):
• Numeric variables have to be discrete (categorized) into intervals
• Sensitive to correlated variables ("double-counting")
• Not good for estimating probabilities; stick to class label or yes/no
The Reasons to Choose (+) and Cautions (-) of the Naïve Bayesian Classifier are listed. Unlike logistic regression, missing values are handled well by the Naïve Bayesian Classifier. It is also very robust to irrelevant variables (the values of irrelevant variables are distributed among all the classes, so their effects are not pronounced).
The model is easy to implement, and we will see in the lab how easily a basic version can be implemented without using any packages. Scoring data (predicting) is very simple, and the model is resistant to over-fitting. (Over-fitting refers to fitting the training data so well that we also fit its idiosyncrasies, aspects of the data that are not relevant to characterizing it.) It is computationally efficient and handles high dimensional problems efficiently. Unlike logistic regression, the Naïve Bayesian Classifier handles categorical variables with a lot of levels.
The Cautions (-) are that it is sensitive to correlated variables because the algorithm double counts their effect. For example, people with low income tend to default, and people with low credit tend to default. It is also true that people with low income tend to have low credit. If we try to score "default" with both low income and low credit as variables, we will see the double-counting effect in our model output and in the scoring.
Though probabilities are provided as an output of the scored data, the Naïve Bayesian Classifier is not very reliable for probability estimation and should be used for class label assignment only. The Naïve Bayesian Classifier in its simple form is used only with categorical variables, and any continuous variables should be discretized into intervals. You will learn more about this in the lab. However, it is not strictly necessary to discretize continuous variables; several standard implementations can handle continuous variables as well.
Check Your Knowledge
1. Consider the following Training Data Set:
• Apply the Naïve Bayesian Classifier to this
data set and compute the probability
score for P(y = 1|X) for X = (1,0,0)
Show your work
2. List some prominent use cases of the Naïve Bayesian
Classifier.
3. What gives the Naïve Bayesian Classifier the advantage of
being
computationally inexpensive?
4. Why should we use log-likelihoods rather than pure
probability
values in the Naïve Bayesian Classifier?
X1 X2 X3 Y
1 1 1 0
1 1 0 0
0 0 0 0
0 1 0 1
1 0 1 1
0 1 1 1
Training Data Set
Your Thoughts?
Record your answers here. More Check Your Knowledge
questions are on the next page.
Check Your Knowledge (Continued)
5. What is a confusion matrix, and how is it used to evaluate the effectiveness of the model?
6. Consider the following data set with two input features, temperature and season
• What is the Naïve Bayesian assumption?
• Is the Naïve Bayesian assumption satisfied for this problem?
Your Thoughts?
Temperature Season Electricity Usage
-10 to 50 F Winter High
50 to 70 F Winter Low
70 to 85 F Summer Low
85 to 110 F Summer High
Record your answers here.
Advanced Analytics – Theory and Methods
During this lesson the following topics were covered:
• Naïve Bayesian Classifier
• Theoretical foundations of the classifier
• Use cases
• Evaluating the effectiveness of the classifier
• The Reasons to Choose (+) and Cautions (-) with the use of
the classifier
Naïve Bayesian Classifiers - Summary
This lesson covered these topics. Please take a moment to
review them.