SlideShare a Scribd company logo
SWAPNA.C
Asst.Prof.
IT Dept.
Sridevi Women’s Engineering College.
EVALUATING HYPOTHESIS
Reasons to Evaluate hypothesis
 Understand whether to use the hypothesis.
 The evaluating hypotheses is an integral component of
many learning methods.
Swapna.C
 Difficulties:
 Bias in the estimate: Estimator accuracy for future is poor
which learned from training examples. To get unbiased
estimate test the hypothesis on some set of examples
chosen independently of the training examples.
 Variance in the estimate:
 With Unbiased set measured accuracy vary from true
accuracy. The smaller the set of test ex. The greater the
expected variance.
Swapna.C
ESTIMATING HYPOTHESIS ACCURACY
 When evaluating a learned hypothesis we are most often
interested in estimation the accuracy with which it will
classify future instances.
 Generally 2 questions arise:
 1.Given a hypothesis h and a data sample containing n ex.
drawn at random according to the distribution D, what is
the best estimate of the accuracy of h over future instances
drawn from the same distribution?
 2. What is the probable error in this accuracy estimate?
Swapna.C
Sample Error and True Error
Swapna.C
Swapna.C
Confidence Intervals for Discrete-
Valued Hypotheses
 suppose we wish to estimate the true error for some
discrete valued hypothesis h, based on its observed sample
error over a sample S, where the sample S contains n
examples drawn independent of one another, and
independent of h, according to the probability distribution
V
 n>30
 hypothesis h commits r errors over these n examples (i.e.,
errors(h) = r/n).
Swapna.C
 Under these conditions, statistical theory allows us to make
the following assertions:
 95% confidence interval.
 different constant is used to calculate the N%
confidence interval.
Swapna.C
 Data sample S contains n =40 examples on hypothesis
Errors r= 12
 Sample Error errors (h) =12/40=0.30
 Accuracy= 1-0.3= 0.7 95% confidence interval
 errorD (h) 0.30+(1.96*0.70)=0.30+0.14.
 If it satisfies then that is the correct interval
Swapna.C
BASICS OF SAMPLING THEORY
Basic Definitions:
Swapna.C
Swapna.C
BASICS OF SAMPLING THEORY
 Error Estimation and Estimating Binomial
Proportions: The Binomial distribution
Swapna.C
The Binomial Distribution
 The general setting to which the Binomial distribution applies is:
 1. There is a base, or underlying, experiment (e.g., toss of the
coin) whose outcome can be described by a random variable, say Y. The
random variable Y can take on two possible values (e.g., Y = 1 if heads, Y
= 0 if tails).
 2. The probability that Y = 1 on any single trial of the underlying
experiment is given by some constant p, independent of the outcome
of any other experiment. The probability that Y = 0 is therefore (1 - p).
Typically, p is not known in advance, and the problem is to estimate it.
 3. A series of n independent trials of the underlying experiment
is performed (e.g., n independent coin tosses), producing the
sequence of independent, identically distributed random variables Yl,
Yz, . . , Yn. Let R denote the number of trials for which Yi = 1 in this
series of n experiments
Swapna.C
 4. The probability that the random variable R will take on
a specific value r (e.g., the probability of observing exactly
r heads) is given by the Binomial
distribution
 The Binomial distribution characterizes the probability of
observing r heads from n coin flip experiments, as well as
the probability of observing r errors in a data sample
containing n randomly drawn instances.
Swapna.C
 Mean and Variance:
 Two properties of a random variable that are often of
interest are its expected value (also called its mean value)
and its variance. The expected value is the average of the
values taken on by repeatedly sampling the random
variable. More precisely
 Definition: Consider a random variable Y that takes
on the possible values yl, . . . yn.
 The expected value of Y, E[Y], is
Swapna.C
 For ex. If takes on the value 1 with probability 0.7 and
the value 2 with probability 0.3, then its expected value
is (1.0.7+2.0.3=1.3) E(Y)=np(Binomial Distribution)
Swapna.C
 By a Binomial distribution
Swapna.C
Estimators, Bias, and Variance
where n is the number of instances in the sample S, r
is the number of instances from S misclassified by h,
and p is the probability of misclassifying a single
instance drawn from D.
Swapna.C
 Statisticians call errors(h) an estimator for the true
error errorD(h).
 The estimation bias to be the difference between the
expected value of the estimator and the true value of
the parameter.
 If the estimation bias is zero, we say that Y is an
unbiased estimator for p.
Swapna.C
 The average of many random values of Y generated by
repeated random experiments (i.e., E[Y]) converges
toward p. The hypothesis on the training examples
provides an optimistically biased estimate of
hypothesis error, exactly notion of estimation bias.
 The estimation bias is a numerical quantity, whereas
the inductive bias is a set of assertions.
 Variance, this choice will yield the smallest expected
squared error between the estimate and the true value
of the parameter.
Swapna.C
 To test hypotheses r = 12 errors on a sample of n = 40 randomly
drawn test examples. Then an unbiased estimate for errorD(h) is
given by
errors(h) = r/n = 0.3.
 The variance in this estimate arises completely from the variance
in r, because n is a constant. Because r is Binomially distributed,
its variance is given by np(1 - p). Unfortunately p is unknown,
but we can substitute our estimate r/n for p. This yields an
estimated variance in r of 40. 0.3(1 -0.3) = 8.4, or a corresponding
standard deviation of a ;j: 2.9. his implies that the standard
deviation in errors(h) = r/n is approximately 2.9140 = .07. To
summarize, errors(h) in this case is observed to be 0.30, with a
standard deviation of approximately 0.07.
Swapna.C
 In given r errors in a sample of n independently drawn
test examples, the standard deviation errors(h)given
 Which can be approximated by substituting
r/n=errors(h) for p
Swapna.C
Confidence Intervals
For example, if we observe r = 12 errors in a sample of
n = 40 independently drawn examples, we can say
with approximately 95% probability that the interval
0.30+ 0.14 contains the true error errorD(h).
Swapna.C
 An easily calculated and very good approximation can
be found in most cases, based on the fact the for
sufficiently large sample sizes the Binomial
distribution can be closely approximated by the
Normal distribution.
 It is a bell-shaped distribution fully specified by its
mean and standard deviation .
 For large n, any Binomial distribution is very closely
approximated by a Normal distribution with the same
mean and variance.
Swapna.C
Normal or Gaussian Distribution
Swapna.C
 If a random variable Y obeys a Normal distribution
with mean and standard deviation , then the
measured random value y of Y will fall into the
following interval N% of the time
 Equivalently, the means will fall into the following
interval N% of the time
Swapna.C
 We can easily combine this fact with earlier facts to derive
the general expression for N% confidence intervals for
discrete-valued hypotheses .
 1st We know that errors(h)follows a Binomial distribution
with mean value errorD(h) and standard deviation.
 2nd Sufficiently large sample size n, this Binomial
distribution in well approximated by a Normal
distribution.
 3rd How to find the N% confidence interval for estimating
the mean value of a Normal distribution.
Swapna.C
 Substituting the mean and standard deviation of
errors (h) yields the expression for N% confidence
intervals for discrete-valued hypotheses
Swapna.C
2 Approximations to derive expression
 1.In estimating the standard deviation of errors(h),
We have approximated errorD (h) by errors (h) and
2.The binomial distribution has been approximated
by the Normal distribution.
 The common rule of thumb in statistics is that these two
approximations are very good as long as n >30, or when
np(1- p) > =5. For smaller values of n it is wise to use a table
giving exact values for the Binomial distribution.
Swapna.C
Two-sided and One-sided Bounds
 Confidence interval is a 2-sided bound , it bounds the
estimated quantity from above and below.
 The fact that the Normal distribution is symmetric
about its mean. Two-sided confidence interval based
on a Normal distribution can be converted to a
corresponding one-sided interval with twice the
confidence.
Swapna.C
Swapna.C
 That is, a 100(1- )% confidence interval with lower bound
L and upper bound U implies a 100(1- /2)% confidence
interval with lower bound L and no upper bound. It also
implies a 100(1 - /2)% confidence interval with upper
bound U and no lower bound. Here a corresponds
to the probability that the correct value lies outside the
stated interval. In other words, a is the probability that the
value will fall into the unshaded region in Figure (a),
and /2 is the probability that it will fall into the
unshaded region in Figure (b).
Swapna.C
 consider again the example in which h commits r = 12
errors over a sample of n = 40 independently drawn
examples. This leads to a (two-sided) 95% confidence
interval of 0.30 +0.14. In this case, 100(1 - ) = 95%,
so = 0.05. We can apply the rule to say with
100(1 - /2) = 97.5% confidence that error D(h) is at
most 0.30 + 0.14 = 0.44, making no assertion about the
lower bound on errorD (h). Thus, we have a one sided
error bound on errorD (h) with double the confidence
that we had in the corresponding two-sided bound.
Swapna.C
A GENERAL APPROACH FOR DERIVING
CONFIDENCE INTERVALS
 The general process includes the following steps:
1. Identify the underlying population parameter p to be
estimated, for example, errorD(h).
2. Define the estimator Y (e.g., errors(h)). It is desirable to
choose a minimum variance, unbiased estimator.
3. Determine the probability distribution Dy that governs
the estimator Y, including its mean and variance.
4. Determine the N% confidence interval by finding
thresholds L and U such that N% of the mass in the
probability distribution Dy falls between L and U.
Swapna.C
a)Central Limit Theorem
Swapna.C
 Let p denote the mean of the unknown distribution
governing each of the Yi and let a denote the standard
deviation. We say that these variables Yi are
independent, identically distributed random variables,
because they describe independent experiments, each
obeying the same underlying probability distribution.
In an attempt to estimate the mean p of the
distribution governing the Yi, we calculate the sample
mean
Swapna.C
 The Central Limit Theorem states that the probability
distribution governing Yn approaches a Normal
distribution as n-- regardless of the distribution
that governs the underlying random variables Yi.
Furthermore, the mean of the distribution governing
Yn approaches and the standard deviation
approaches / .
Swapna.C
 The Central Limit Theorem is a very useful fact,
because it implies that whenever we define an
estimator that is the mean of some sample (e.g.,
errors(h) is the mean error), the distribution
governing this estimator can be approximated
by a Normal distribution for sufficiently large n.
Swapna.C
DIFFERENCE IN ERROR OF TWO HYPOTHESES
 Consider the case where we have two hypotheses hl
and h2 for some discrete valued target function.
Hypothesis hl has been tested on a sample S1
containing n1 randomly drawn examples, and h2 has
been tested on an independent sampleS2 containing
n2 examples drawn from the same distribution.
Swapna.C
d gives an unbiased estimate of d, that is E[d]=d.
For large n1 and n2(e.g., both>30),both
errors1(h1) and errors2(h2) follow distribution
that are approximately Normal. Because of
2Normal distributions is also Normal
distribution, d will also follow a distribution
that is approximately Normal, with mean d.
Swapna.C
 Variance is the sum of the variances of errors =
 N% confidence interval estimate for d is
 h1 and h2 are tested on independent data samples
 d = errors(h1)-errors(h2) This is smaller than the
variance of set s1, and s2 to s.
Swapna.C
Hypothesis Testing
 suppose we measure the sample errors for hl and h2 using
two independent samples S1 and S2 of size 100 and find
that errors1 (hl) = .30 and errors2(h2) = .20, hence the
observed difference is d = .10.
 Note the probability Pr(d > 0) is equal to the probability
that d has not overestimated d by more than .10. Put
another way, this is the probability that d falls into the
one-sided interval d< d + .10. Since d is the mean of the
distribution governing d, we can equivalently express this
one-sided interval as d < + .10. It falls into one -sided
Swapna.C
1.64
we find that 1.64 standard deviations about the mean
corresponds to a two-sided interval with confidence level
90%. Therefore, the one-sided interval will have an associated
confidence level of 95%.
Therefore, given the observed d = .10, the probability that
error D(h1) > errorD(h2) is approximately .95. In the
terminology of the statistics literature, we say that we accept
the hypothesis that "errorD (hl) > error D(h2)" with confidence
0.95. Alternatively, we may state that we reject the opposite
hypothesis (often called the null hypothesis) at a (1 - 0.95) =
.05 level of significance.
Swapna.C
COMPARING LEARNING ALGORITHMS
 Comparing the performance of 2 learning algorithms
LA and LB rather than 2 specific hypotheses.
To determine 1 method is “on average” is to consider
the relative performance of 2 algorithms averaged over
all the training sets of size n over distribution D.
Other one is Expected value of the difference in their
errors
Swapna.C
where L(S) denotes the hypothesis output by learning
method L when given the sample S of training data and
where the subscript S c D indicates that the expected value
is taken over samples S drawn according to the underlying
instance distribution D.
Swapna.C
 Limited sample D0, divides into training set S0, Test
set T0. Algorithms LA and LB .
 Quantity is
 2 key differences between this estimator and the
quantity
 1.We are using error T0(h) to approximate errorD(h).
 2.Measuring the difference in errors for one training
set S0 rather than taking the expected values of this
difference over all samples S that might be drawn from
the distribution D.
Swapna.C
 D0 into disjoint training sets and test sets.
Swapna.C
 One way to improve on the estimator is to repeatedly
partition the data Do into disjoint training and test sets
and to take the mean of the test set errors for these
different experiments.
 This procedure first partitions the data into k disjoint
subsets of equal size, where this size is at least 30. It then
trains and tests the learning algorithms k times, using each
of the k subsets in turn as the test set, and using all
remaining data as the training set. In this way, the learning
algorithms are tested on k independent test sets, 'and the
mean difference in errors is returned as an estimate of the
difference between the two learning algorithms.
Swapna.C
 We can
 The approximate N% confidence interval for
estimating the quantity
Swapna.C
The first specifies the desired confidence level, as it did for our
earlier constant ZN. The second parameter, called the number
of degrees of freedom and usually denoted by v, is related to
the number of independent random events that go into
producing the value for the random variable 8. In the current
setting, the number of degrees of freedom is k - 1.
Swapna.C
Swapna.C
 The procedure described here for comparing two learning
methods involves testing the two learned hypotheses on
identical test sets.
 comparing hypotheses that have been evaluated using two
independent test sets. Tests where the hypotheses are
evaluated over identical samples are called paired tests.
Paired tests typically produce tighter confidence intervals
because any differences in observed errors in a paired test
are due to differences between the hypotheses. In contrast,
when the hypotheses are tested on separate data samples,
differences in the two sample errors might be partially
attributable to differences in the makeup of the two
samples.
Swapna.C
a)Paired t Tests
 consider the following estimation problem:
Swapna.C
 This problem of estimating the distribution mean p based
on the sample mean Y is quite general
 Use steps of comparing learning methods. Assume that
instead of having a fixed sample of data D0, we can request
new training examples drawn according to the underlying
instance distribution. Modify the steps on each iteration
through the loop it generates a new random training set Si
and new random test set Ti by drawing from this
underlying instance distribution instead of drawing from
this underlying instance distribution instead of drawing
from the fixed sample D0.
Swapna.C
 The t test applies, in which the task is to estimate the
sample mean of a collection of independent,
identically and Normally distributed random variables.
 Confidence interval =
Swapna.C
Swapna.C
b)Practical Considerations:
 In practice, the problem is that the only way to
generate new Si is to resample Do, dividing it into
training and test sets in different ways. The i are not
independent of one another in this case, because they
are based on overlapping sets of training examples
drawn from the limited subset Do of data, rather than
from the full distribution 'D.
Swapna.C
 k-fold method in which Do is partitioned into k disjoint,
equal-sized subsets. In this k-fold approach, each example
from Do is used exactly once in a test set, and k - 1 times in
a training set. A
 second popular approach is to randomly choose a test set of
at least 30 examples from Do, use the remaining examples
for training, then repeat this process as many times as
desired.
 This randomized method has the advantage that it can be
repeated an indefinite number of times, to shrink the
confidence interval to the desired width.
Swapna.C
 In contrast, the k-fold method is limited by the total
number of examples, by the use of each example only once
in a test set, and by our desire to use samples of size at least
30.
 However, the randomized method has the disadvantage
that the test sets no longer qualify as being independently
drawn with respect to the underlying instance distribution
D.
 In contrast, the test sets generated by k-fold cross
validation are independent because each instance is
included in only one test set.
Swapna.C

More Related Content

What's hot (20)

PPTX
First order logic
Megha Sharma
 
PPT
AI Lecture 7 (uncertainty)
Tajim Md. Niamat Ullah Akhund
 
PDF
AI 7 | Constraint Satisfaction Problem
Mohammad Imam Hossain
 
PPTX
Asymptotic notations
Nikhil Sharma
 
PPTX
Performance analysis(Time & Space Complexity)
swapnac12
 
PDF
Daa notes 3
smruti sarangi
 
PPTX
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
PPTX
Inductive bias
swapnac12
 
PPTX
Random forest
Musa Hawamdah
 
PPTX
Introduction to Machine Learning
Lior Rokach
 
PDF
Artificial Intelligence Notes Unit 1
DigiGurukul
 
PDF
Linear regression
MartinHogg9
 
PPTX
Knowledge representation In Artificial Intelligence
Ramla Sheikh
 
PPTX
Semantic Networks
Jenny Galino
 
PPTX
Learning With Complete Data
Vishnuprabhu Gopalakrishnan
 
PPT
Learning sets of rules, Sequential Learning Algorithm,FOIL
Pavithra Thippanaik
 
PDF
Lexical Analysis - Compiler design
Aman Sharma
 
PPT
Game Playing in Artificial Intelligence
lordmwesh
 
PPTX
Naive Bayes
Abdullah al Mamun
 
PPTX
Concept learning
Musa Hawamdah
 
First order logic
Megha Sharma
 
AI Lecture 7 (uncertainty)
Tajim Md. Niamat Ullah Akhund
 
AI 7 | Constraint Satisfaction Problem
Mohammad Imam Hossain
 
Asymptotic notations
Nikhil Sharma
 
Performance analysis(Time & Space Complexity)
swapnac12
 
Daa notes 3
smruti sarangi
 
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
Inductive bias
swapnac12
 
Random forest
Musa Hawamdah
 
Introduction to Machine Learning
Lior Rokach
 
Artificial Intelligence Notes Unit 1
DigiGurukul
 
Linear regression
MartinHogg9
 
Knowledge representation In Artificial Intelligence
Ramla Sheikh
 
Semantic Networks
Jenny Galino
 
Learning With Complete Data
Vishnuprabhu Gopalakrishnan
 
Learning sets of rules, Sequential Learning Algorithm,FOIL
Pavithra Thippanaik
 
Lexical Analysis - Compiler design
Aman Sharma
 
Game Playing in Artificial Intelligence
lordmwesh
 
Naive Bayes
Abdullah al Mamun
 
Concept learning
Musa Hawamdah
 

Similar to Evaluating hypothesis (20)

PPT
1_Standard error Experimental Data_ML.ppt
VGaneshKarthikeyan
 
PPTX
Statistics Applied to Biomedical Sciences
Luca Massarelli
 
PDF
Module 4_Machine Learning_Evaluating Hyp
Dr. Shivashankar
 
PPT
Statistical Methods
Enric Cecilla Real
 
ODP
QT1 - 07 - Estimation
Prithwis Mukerjee
 
PDF
Statistics_summary_1634533932.pdf
YoursTube1
 
PDF
Statistics_Cheat_sheet_1567847508.pdf
Akashyadav375896
 
PPT
Inferential statistics-estimation
Southern Range, Berhampur, Odisha
 
PPTX
5. RV and Distributions.pptx
SaiMohnishMuralidhar
 
PDF
Random variables
MenglinLiu1
 
PPTX
AI ML M5
Varshitha888314
 
DOCX
Estimation in statistics
Rabea Jamal
 
PDF
Ch3_Statistical Analysis and Random Error Estimation.pdf
Vamshi962726
 
PDF
An Introduction To Probability And Statistical Inference 1st Edition George G...
gopencripeou
 
PPTX
Standard Deviaton and Justification of the Mean
cypunksevenfold
 
PDF
Statistical Inference & Hypothesis Testing.pdf
Manash Kumar Mondal
 
PPTX
Statistical inference with Python
Johnson Ubah
 
PPT
Lesson 04 chapter 7 estimation
Ning Ding
 
PPT
random variation 9473 by jaideep.ppt
BhartiYadav316049
 
PDF
10-probability-distribution-sta102.pdfhh
primcejames
 
1_Standard error Experimental Data_ML.ppt
VGaneshKarthikeyan
 
Statistics Applied to Biomedical Sciences
Luca Massarelli
 
Module 4_Machine Learning_Evaluating Hyp
Dr. Shivashankar
 
Statistical Methods
Enric Cecilla Real
 
QT1 - 07 - Estimation
Prithwis Mukerjee
 
Statistics_summary_1634533932.pdf
YoursTube1
 
Statistics_Cheat_sheet_1567847508.pdf
Akashyadav375896
 
Inferential statistics-estimation
Southern Range, Berhampur, Odisha
 
5. RV and Distributions.pptx
SaiMohnishMuralidhar
 
Random variables
MenglinLiu1
 
AI ML M5
Varshitha888314
 
Estimation in statistics
Rabea Jamal
 
Ch3_Statistical Analysis and Random Error Estimation.pdf
Vamshi962726
 
An Introduction To Probability And Statistical Inference 1st Edition George G...
gopencripeou
 
Standard Deviaton and Justification of the Mean
cypunksevenfold
 
Statistical Inference & Hypothesis Testing.pdf
Manash Kumar Mondal
 
Statistical inference with Python
Johnson Ubah
 
Lesson 04 chapter 7 estimation
Ning Ding
 
random variation 9473 by jaideep.ppt
BhartiYadav316049
 
10-probability-distribution-sta102.pdfhh
primcejames
 
Ad

More from swapnac12 (18)

PPTX
Awt, Swing, Layout managers
swapnac12
 
PPTX
Applet
swapnac12
 
PPTX
Event handling
swapnac12
 
PPTX
Asymptotic notations(Big O, Omega, Theta )
swapnac12
 
PPTX
Introduction ,characteristics, properties,pseudo code conventions
swapnac12
 
PPTX
Inductive analytical approaches to learning
swapnac12
 
PPTX
Using prior knowledge to initialize the hypothesis,kbann
swapnac12
 
PPTX
Combining inductive and analytical learning
swapnac12
 
PPTX
Analytical learning
swapnac12
 
PPTX
Learning set of rules
swapnac12
 
PPTX
Genetic algorithms
swapnac12
 
PPTX
Instance based learning
swapnac12
 
PPTX
Computational learning theory
swapnac12
 
PPTX
Multilayer & Back propagation algorithm
swapnac12
 
PPTX
Artificial Neural Networks 1
swapnac12
 
PPTX
Advanced topics in artificial neural networks
swapnac12
 
PPTX
Introdution and designing a learning system
swapnac12
 
PPTX
Concept learning and candidate elimination algorithm
swapnac12
 
Awt, Swing, Layout managers
swapnac12
 
Applet
swapnac12
 
Event handling
swapnac12
 
Asymptotic notations(Big O, Omega, Theta )
swapnac12
 
Introduction ,characteristics, properties,pseudo code conventions
swapnac12
 
Inductive analytical approaches to learning
swapnac12
 
Using prior knowledge to initialize the hypothesis,kbann
swapnac12
 
Combining inductive and analytical learning
swapnac12
 
Analytical learning
swapnac12
 
Learning set of rules
swapnac12
 
Genetic algorithms
swapnac12
 
Instance based learning
swapnac12
 
Computational learning theory
swapnac12
 
Multilayer & Back propagation algorithm
swapnac12
 
Artificial Neural Networks 1
swapnac12
 
Advanced topics in artificial neural networks
swapnac12
 
Introdution and designing a learning system
swapnac12
 
Concept learning and candidate elimination algorithm
swapnac12
 
Ad

Recently uploaded (20)

PDF
The-Invisible-Living-World-Beyond-Our-Naked-Eye chapter 2.pdf/8th science cur...
Sandeep Swamy
 
PDF
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
PPT
DRUGS USED IN THERAPY OF SHOCK, Shock Therapy, Treatment or management of shock
Rajshri Ghogare
 
PPTX
Applied-Statistics-1.pptx hardiba zalaaa
hardizala899
 
PPTX
Artificial Intelligence in Gastroentrology: Advancements and Future Presprec...
AyanHossain
 
PPTX
Gupta Art & Architecture Temple and Sculptures.pptx
Virag Sontakke
 
PPTX
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
PPTX
Python-Application-in-Drug-Design by R D Jawarkar.pptx
Rahul Jawarkar
 
PDF
EXCRETION-STRUCTURE OF NEPHRON,URINE FORMATION
raviralanaresh2
 
DOCX
pgdei-UNIT -V Neurological Disorders & developmental disabilities
JELLA VISHNU DURGA PRASAD
 
PPTX
Basics and rules of probability with real-life uses
ravatkaran694
 
PPTX
INTESTINALPARASITES OR WORM INFESTATIONS.pptx
PRADEEP ABOTHU
 
PPTX
Unlock the Power of Cursor AI: MuleSoft Integrations
Veera Pallapu
 
PPTX
Dakar Framework Education For All- 2000(Act)
santoshmohalik1
 
PDF
BÀI TẬP TEST BỔ TRỢ THEO TỪNG CHỦ ĐỀ CỦA TỪNG UNIT KÈM BÀI TẬP NGHE - TIẾNG A...
Nguyen Thanh Tu Collection
 
PPTX
Virus sequence retrieval from NCBI database
yamunaK13
 
PDF
Module 2: Public Health History [Tutorial Slides]
JonathanHallett4
 
PPTX
Top 10 AI Tools, Like ChatGPT. You Must Learn In 2025
Digilearnings
 
PPTX
Translation_ Definition, Scope & Historical Development.pptx
DhatriParmar
 
PPTX
Digital Professionalism and Interpersonal Competence
rutvikgediya1
 
The-Invisible-Living-World-Beyond-Our-Naked-Eye chapter 2.pdf/8th science cur...
Sandeep Swamy
 
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
DRUGS USED IN THERAPY OF SHOCK, Shock Therapy, Treatment or management of shock
Rajshri Ghogare
 
Applied-Statistics-1.pptx hardiba zalaaa
hardizala899
 
Artificial Intelligence in Gastroentrology: Advancements and Future Presprec...
AyanHossain
 
Gupta Art & Architecture Temple and Sculptures.pptx
Virag Sontakke
 
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
Python-Application-in-Drug-Design by R D Jawarkar.pptx
Rahul Jawarkar
 
EXCRETION-STRUCTURE OF NEPHRON,URINE FORMATION
raviralanaresh2
 
pgdei-UNIT -V Neurological Disorders & developmental disabilities
JELLA VISHNU DURGA PRASAD
 
Basics and rules of probability with real-life uses
ravatkaran694
 
INTESTINALPARASITES OR WORM INFESTATIONS.pptx
PRADEEP ABOTHU
 
Unlock the Power of Cursor AI: MuleSoft Integrations
Veera Pallapu
 
Dakar Framework Education For All- 2000(Act)
santoshmohalik1
 
BÀI TẬP TEST BỔ TRỢ THEO TỪNG CHỦ ĐỀ CỦA TỪNG UNIT KÈM BÀI TẬP NGHE - TIẾNG A...
Nguyen Thanh Tu Collection
 
Virus sequence retrieval from NCBI database
yamunaK13
 
Module 2: Public Health History [Tutorial Slides]
JonathanHallett4
 
Top 10 AI Tools, Like ChatGPT. You Must Learn In 2025
Digilearnings
 
Translation_ Definition, Scope & Historical Development.pptx
DhatriParmar
 
Digital Professionalism and Interpersonal Competence
rutvikgediya1
 

Evaluating hypothesis

  • 2. EVALUATING HYPOTHESIS Reasons to Evaluate hypothesis  Understand whether to use the hypothesis.  The evaluating hypotheses is an integral component of many learning methods. Swapna.C
  • 3.  Difficulties:  Bias in the estimate: Estimator accuracy for future is poor which learned from training examples. To get unbiased estimate test the hypothesis on some set of examples chosen independently of the training examples.  Variance in the estimate:  With Unbiased set measured accuracy vary from true accuracy. The smaller the set of test ex. The greater the expected variance. Swapna.C
  • 4. ESTIMATING HYPOTHESIS ACCURACY  When evaluating a learned hypothesis we are most often interested in estimation the accuracy with which it will classify future instances.  Generally 2 questions arise:  1.Given a hypothesis h and a data sample containing n ex. drawn at random according to the distribution D, what is the best estimate of the accuracy of h over future instances drawn from the same distribution?  2. What is the probable error in this accuracy estimate? Swapna.C
  • 5. Sample Error and True Error Swapna.C
  • 7. Confidence Intervals for Discrete- Valued Hypotheses  suppose we wish to estimate the true error for some discrete valued hypothesis h, based on its observed sample error over a sample S, where the sample S contains n examples drawn independent of one another, and independent of h, according to the probability distribution V  n>30  hypothesis h commits r errors over these n examples (i.e., errors(h) = r/n). Swapna.C
  • 8.  Under these conditions, statistical theory allows us to make the following assertions:  95% confidence interval.  different constant is used to calculate the N% confidence interval. Swapna.C
  • 9.  Data sample S contains n =40 examples on hypothesis Errors r= 12  Sample Error errors (h) =12/40=0.30  Accuracy= 1-0.3= 0.7 95% confidence interval  errorD (h) 0.30+(1.96*0.70)=0.30+0.14.  If it satisfies then that is the correct interval Swapna.C
  • 10. BASICS OF SAMPLING THEORY Basic Definitions: Swapna.C
  • 12. BASICS OF SAMPLING THEORY  Error Estimation and Estimating Binomial Proportions: The Binomial distribution Swapna.C
  • 13. The Binomial Distribution  The general setting to which the Binomial distribution applies is:  1. There is a base, or underlying, experiment (e.g., toss of the coin) whose outcome can be described by a random variable, say Y. The random variable Y can take on two possible values (e.g., Y = 1 if heads, Y = 0 if tails).  2. The probability that Y = 1 on any single trial of the underlying experiment is given by some constant p, independent of the outcome of any other experiment. The probability that Y = 0 is therefore (1 - p). Typically, p is not known in advance, and the problem is to estimate it.  3. A series of n independent trials of the underlying experiment is performed (e.g., n independent coin tosses), producing the sequence of independent, identically distributed random variables Yl, Yz, . . , Yn. Let R denote the number of trials for which Yi = 1 in this series of n experiments Swapna.C
  • 14.  4. The probability that the random variable R will take on a specific value r (e.g., the probability of observing exactly r heads) is given by the Binomial distribution  The Binomial distribution characterizes the probability of observing r heads from n coin flip experiments, as well as the probability of observing r errors in a data sample containing n randomly drawn instances. Swapna.C
  • 15.  Mean and Variance:  Two properties of a random variable that are often of interest are its expected value (also called its mean value) and its variance. The expected value is the average of the values taken on by repeatedly sampling the random variable. More precisely  Definition: Consider a random variable Y that takes on the possible values yl, . . . yn.  The expected value of Y, E[Y], is Swapna.C
  • 16.  For ex. If takes on the value 1 with probability 0.7 and the value 2 with probability 0.3, then its expected value is (1.0.7+2.0.3=1.3) E(Y)=np(Binomial Distribution) Swapna.C
  • 17.  By a Binomial distribution Swapna.C
  • 18. Estimators, Bias, and Variance where n is the number of instances in the sample S, r is the number of instances from S misclassified by h, and p is the probability of misclassifying a single instance drawn from D. Swapna.C
  • 19.  Statisticians call errors(h) an estimator for the true error errorD(h).  The estimation bias to be the difference between the expected value of the estimator and the true value of the parameter.  If the estimation bias is zero, we say that Y is an unbiased estimator for p. Swapna.C
  • 20.  The average of many random values of Y generated by repeated random experiments (i.e., E[Y]) converges toward p. The hypothesis on the training examples provides an optimistically biased estimate of hypothesis error, exactly notion of estimation bias.  The estimation bias is a numerical quantity, whereas the inductive bias is a set of assertions.  Variance, this choice will yield the smallest expected squared error between the estimate and the true value of the parameter. Swapna.C
  • 21.  To test hypotheses r = 12 errors on a sample of n = 40 randomly drawn test examples. Then an unbiased estimate for errorD(h) is given by errors(h) = r/n = 0.3.  The variance in this estimate arises completely from the variance in r, because n is a constant. Because r is Binomially distributed, its variance is given by np(1 - p). Unfortunately p is unknown, but we can substitute our estimate r/n for p. This yields an estimated variance in r of 40. 0.3(1 -0.3) = 8.4, or a corresponding standard deviation of a ;j: 2.9. his implies that the standard deviation in errors(h) = r/n is approximately 2.9140 = .07. To summarize, errors(h) in this case is observed to be 0.30, with a standard deviation of approximately 0.07. Swapna.C
  • 22.  In given r errors in a sample of n independently drawn test examples, the standard deviation errors(h)given  Which can be approximated by substituting r/n=errors(h) for p Swapna.C
  • 23. Confidence Intervals For example, if we observe r = 12 errors in a sample of n = 40 independently drawn examples, we can say with approximately 95% probability that the interval 0.30+ 0.14 contains the true error errorD(h). Swapna.C
  • 24.  An easily calculated and very good approximation can be found in most cases, based on the fact the for sufficiently large sample sizes the Binomial distribution can be closely approximated by the Normal distribution.  It is a bell-shaped distribution fully specified by its mean and standard deviation .  For large n, any Binomial distribution is very closely approximated by a Normal distribution with the same mean and variance. Swapna.C
  • 25. Normal or Gaussian Distribution Swapna.C
  • 26.  If a random variable Y obeys a Normal distribution with mean and standard deviation , then the measured random value y of Y will fall into the following interval N% of the time  Equivalently, the means will fall into the following interval N% of the time Swapna.C
  • 27.  We can easily combine this fact with earlier facts to derive the general expression for N% confidence intervals for discrete-valued hypotheses .  1st We know that errors(h)follows a Binomial distribution with mean value errorD(h) and standard deviation.  2nd Sufficiently large sample size n, this Binomial distribution in well approximated by a Normal distribution.  3rd How to find the N% confidence interval for estimating the mean value of a Normal distribution. Swapna.C
  • 28.  Substituting the mean and standard deviation of errors (h) yields the expression for N% confidence intervals for discrete-valued hypotheses Swapna.C
  • 29. 2 Approximations to derive expression  1.In estimating the standard deviation of errors(h), We have approximated errorD (h) by errors (h) and 2.The binomial distribution has been approximated by the Normal distribution.  The common rule of thumb in statistics is that these two approximations are very good as long as n >30, or when np(1- p) > =5. For smaller values of n it is wise to use a table giving exact values for the Binomial distribution. Swapna.C
  • 30. Two-sided and One-sided Bounds  Confidence interval is a 2-sided bound , it bounds the estimated quantity from above and below.  The fact that the Normal distribution is symmetric about its mean. Two-sided confidence interval based on a Normal distribution can be converted to a corresponding one-sided interval with twice the confidence. Swapna.C
  • 32.  That is, a 100(1- )% confidence interval with lower bound L and upper bound U implies a 100(1- /2)% confidence interval with lower bound L and no upper bound. It also implies a 100(1 - /2)% confidence interval with upper bound U and no lower bound. Here a corresponds to the probability that the correct value lies outside the stated interval. In other words, a is the probability that the value will fall into the unshaded region in Figure (a), and /2 is the probability that it will fall into the unshaded region in Figure (b). Swapna.C
  • 33.  consider again the example in which h commits r = 12 errors over a sample of n = 40 independently drawn examples. This leads to a (two-sided) 95% confidence interval of 0.30 +0.14. In this case, 100(1 - ) = 95%, so = 0.05. We can apply the rule to say with 100(1 - /2) = 97.5% confidence that error D(h) is at most 0.30 + 0.14 = 0.44, making no assertion about the lower bound on errorD (h). Thus, we have a one sided error bound on errorD (h) with double the confidence that we had in the corresponding two-sided bound. Swapna.C
  • 34. A GENERAL APPROACH FOR DERIVING CONFIDENCE INTERVALS  The general process includes the following steps: 1. Identify the underlying population parameter p to be estimated, for example, errorD(h). 2. Define the estimator Y (e.g., errors(h)). It is desirable to choose a minimum variance, unbiased estimator. 3. Determine the probability distribution Dy that governs the estimator Y, including its mean and variance. 4. Determine the N% confidence interval by finding thresholds L and U such that N% of the mass in the probability distribution Dy falls between L and U. Swapna.C
  • 36.  Let p denote the mean of the unknown distribution governing each of the Yi and let a denote the standard deviation. We say that these variables Yi are independent, identically distributed random variables, because they describe independent experiments, each obeying the same underlying probability distribution. In an attempt to estimate the mean p of the distribution governing the Yi, we calculate the sample mean Swapna.C
  • 37.  The Central Limit Theorem states that the probability distribution governing Yn approaches a Normal distribution as n-- regardless of the distribution that governs the underlying random variables Yi. Furthermore, the mean of the distribution governing Yn approaches and the standard deviation approaches / . Swapna.C
  • 38.  The Central Limit Theorem is a very useful fact, because it implies that whenever we define an estimator that is the mean of some sample (e.g., errors(h) is the mean error), the distribution governing this estimator can be approximated by a Normal distribution for sufficiently large n. Swapna.C
  • 39. DIFFERENCE IN ERROR OF TWO HYPOTHESES  Consider the case where we have two hypotheses hl and h2 for some discrete valued target function. Hypothesis hl has been tested on a sample S1 containing n1 randomly drawn examples, and h2 has been tested on an independent sampleS2 containing n2 examples drawn from the same distribution. Swapna.C
  • 40. d gives an unbiased estimate of d, that is E[d]=d. For large n1 and n2(e.g., both>30),both errors1(h1) and errors2(h2) follow distribution that are approximately Normal. Because of 2Normal distributions is also Normal distribution, d will also follow a distribution that is approximately Normal, with mean d. Swapna.C
  • 41.  Variance is the sum of the variances of errors =  N% confidence interval estimate for d is  h1 and h2 are tested on independent data samples  d = errors(h1)-errors(h2) This is smaller than the variance of set s1, and s2 to s. Swapna.C
  • 42. Hypothesis Testing  suppose we measure the sample errors for hl and h2 using two independent samples S1 and S2 of size 100 and find that errors1 (hl) = .30 and errors2(h2) = .20, hence the observed difference is d = .10.  Note the probability Pr(d > 0) is equal to the probability that d has not overestimated d by more than .10. Put another way, this is the probability that d falls into the one-sided interval d< d + .10. Since d is the mean of the distribution governing d, we can equivalently express this one-sided interval as d < + .10. It falls into one -sided Swapna.C
  • 43. 1.64 we find that 1.64 standard deviations about the mean corresponds to a two-sided interval with confidence level 90%. Therefore, the one-sided interval will have an associated confidence level of 95%. Therefore, given the observed d = .10, the probability that error D(h1) > errorD(h2) is approximately .95. In the terminology of the statistics literature, we say that we accept the hypothesis that "errorD (hl) > error D(h2)" with confidence 0.95. Alternatively, we may state that we reject the opposite hypothesis (often called the null hypothesis) at a (1 - 0.95) = .05 level of significance. Swapna.C
  • 44. COMPARING LEARNING ALGORITHMS  Comparing the performance of 2 learning algorithms LA and LB rather than 2 specific hypotheses. To determine 1 method is “on average” is to consider the relative performance of 2 algorithms averaged over all the training sets of size n over distribution D. Other one is Expected value of the difference in their errors Swapna.C
  • 45. where L(S) denotes the hypothesis output by learning method L when given the sample S of training data and where the subscript S c D indicates that the expected value is taken over samples S drawn according to the underlying instance distribution D. Swapna.C
  • 46.  Limited sample D0, divides into training set S0, Test set T0. Algorithms LA and LB .  Quantity is  2 key differences between this estimator and the quantity  1.We are using error T0(h) to approximate errorD(h).  2.Measuring the difference in errors for one training set S0 rather than taking the expected values of this difference over all samples S that might be drawn from the distribution D. Swapna.C
  • 47.  D0 into disjoint training sets and test sets. Swapna.C
  • 48.  One way to improve on the estimator is to repeatedly partition the data Do into disjoint training and test sets and to take the mean of the test set errors for these different experiments.  This procedure first partitions the data into k disjoint subsets of equal size, where this size is at least 30. It then trains and tests the learning algorithms k times, using each of the k subsets in turn as the test set, and using all remaining data as the training set. In this way, the learning algorithms are tested on k independent test sets, 'and the mean difference in errors is returned as an estimate of the difference between the two learning algorithms. Swapna.C
  • 49.  We can  The approximate N% confidence interval for estimating the quantity Swapna.C
  • 50. The first specifies the desired confidence level, as it did for our earlier constant ZN. The second parameter, called the number of degrees of freedom and usually denoted by v, is related to the number of independent random events that go into producing the value for the random variable 8. In the current setting, the number of degrees of freedom is k - 1. Swapna.C
  • 52.  The procedure described here for comparing two learning methods involves testing the two learned hypotheses on identical test sets.  comparing hypotheses that have been evaluated using two independent test sets. Tests where the hypotheses are evaluated over identical samples are called paired tests. Paired tests typically produce tighter confidence intervals because any differences in observed errors in a paired test are due to differences between the hypotheses. In contrast, when the hypotheses are tested on separate data samples, differences in the two sample errors might be partially attributable to differences in the makeup of the two samples. Swapna.C
  • 53. a)Paired t Tests  consider the following estimation problem: Swapna.C
  • 54.  This problem of estimating the distribution mean p based on the sample mean Y is quite general  Use steps of comparing learning methods. Assume that instead of having a fixed sample of data D0, we can request new training examples drawn according to the underlying instance distribution. Modify the steps on each iteration through the loop it generates a new random training set Si and new random test set Ti by drawing from this underlying instance distribution instead of drawing from this underlying instance distribution instead of drawing from the fixed sample D0. Swapna.C
  • 55.  The t test applies, in which the task is to estimate the sample mean of a collection of independent, identically and Normally distributed random variables.  Confidence interval = Swapna.C
  • 57. b)Practical Considerations:  In practice, the problem is that the only way to generate new Si is to resample Do, dividing it into training and test sets in different ways. The i are not independent of one another in this case, because they are based on overlapping sets of training examples drawn from the limited subset Do of data, rather than from the full distribution 'D. Swapna.C
  • 58.  k-fold method in which Do is partitioned into k disjoint, equal-sized subsets. In this k-fold approach, each example from Do is used exactly once in a test set, and k - 1 times in a training set. A  second popular approach is to randomly choose a test set of at least 30 examples from Do, use the remaining examples for training, then repeat this process as many times as desired.  This randomized method has the advantage that it can be repeated an indefinite number of times, to shrink the confidence interval to the desired width. Swapna.C
  • 59.  In contrast, the k-fold method is limited by the total number of examples, by the use of each example only once in a test set, and by our desire to use samples of size at least 30.  However, the randomized method has the disadvantage that the test sets no longer qualify as being independently drawn with respect to the underlying instance distribution D.  In contrast, the test sets generated by k-fold cross validation are independent because each instance is included in only one test set. Swapna.C