ETC1000
BUSINESS AND ECONOMIC STATISTICS
10. Predictive Analytics
Estimation
PREDICTIVE ANALYTICS:
DATA AND DIFFERENCES
Consider data on the number of doctor visits people make.
PREDICTIVE ANALYTICS:
REAL DIFFERENCES VS CHANCE DIFFERENCES
Descriptive Analytics allows us to discover a number of patterns and relationships:
Mean doctor visits by sex: male = 1.871, female = 2.682
Mean doctor visits by income (low income = below $1,800 per month, high income = above $1,800 per month): low = 2.272, high = 2.289
Proportion with no doctor visits:
Full sample: 38.0%
Winter: 37.9%
Other Seasons: 38.1%
Correlation (doctor visits & education) = -0.092
Correlation (doctor visits & income) = -0.005
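The summaries above are ordinary sample statistics. As a minimal sketch of how they are computed, the Python below works on a tiny invented dataset (the records are hypothetical, not the data behind the figures above) and calculates group means, the proportion with no visits and a Pearson correlation. The helper names `group_mean` and `pearson_r` are illustrative, not from any particular package.

```python
from statistics import mean

# Hypothetical records: (doctor visits, sex, monthly income in $, years of education).
# These values are invented for illustration only.
records = [
    (0, "male", 1500, 12),
    (2, "female", 2100, 10),
    (1, "male", 2500, 16),
    (4, "female", 1200, 9),
    (0, "female", 3000, 15),
    (3, "male", 900, 11),
]

def group_mean(values, groups, label):
    """Mean of `values` over the records whose group equals `label`."""
    return mean(v for v, g in zip(values, groups) if g == label)

def pearson_r(x, y):
    """Sample (Pearson) correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

visits = [r[0] for r in records]
sex = [r[1] for r in records]
# Assumption: an income of exactly $1,800 is counted as "high" here.
income_group = ["low" if r[2] < 1800 else "high" for r in records]
education = [r[3] for r in records]

mean_male = group_mean(visits, sex, "male")
mean_female = group_mean(visits, sex, "female")
mean_low = group_mean(visits, income_group, "low")
mean_high = group_mean(visits, income_group, "high")
prop_no_visits = sum(v == 0 for v in visits) / len(visits)
r_visits_educ = pearson_r(visits, education)

print(f"Mean visits: male = {mean_male:.3f}, female = {mean_female:.3f}")
print(f"Mean visits: low income = {mean_low:.3f}, high income = {mean_high:.3f}")
print(f"Proportion with no visits = {prop_no_visits:.1%}")
print(f"Correlation (visits & education) = {r_visits_educ:.3f}")
```

With only six invented records these numbers say nothing about real behaviour; the point is simply that each figure is a statistic computed from one sample.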
Some differences in means and proportions are small, and some correlations are close to zero; others look larger.
Small differences or values may well be zero in the population, with the non-zero sample value due only to random variation. But how do we decide what counts as small?
We need to distinguish real, substantive differences from small, coincidental ones.
Real, substantive differences are likely to persist, so they are relevant to decisions and to predicting behaviour.
Predictive Analytics therefore requires the ideas of Statistical Inference.
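To see how chance alone produces non-zero differences, here is a minimal simulation sketch. It assumes a hypothetical population in which two groups have exactly the same distribution of visits (0 to 5 visits, each equally likely), so the true difference in means is zero and any difference between sample means is pure random variation. The group size of 200 and the 1,000 repetitions are arbitrary choices.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is reproducible

def draw_visits(n):
    """Draw n hypothetical visit counts from the SAME distribution for every group.

    Assumption for illustration: 0-5 visits, each equally likely (true mean 2.5).
    """
    return [random.randint(0, 5) for _ in range(n)]

# One pair of samples: the true difference in means is 0 by construction.
diff = mean(draw_visits(200)) - mean(draw_visits(200))
print(f"difference in sample means = {diff:.3f} (true difference = 0)")

# Repeat many times to see how big purely chance differences tend to be.
diffs = [mean(draw_visits(200)) - mean(draw_visits(200)) for _ in range(1000)]
abs_sorted = sorted(abs(d) for d in diffs)
chance_95 = abs_sorted[949]  # roughly the 95th percentile of |difference|
print(f"about 95% of chance differences are no larger than {chance_95:.3f}")
```

The 95th percentile of the absolute differences gives a rough benchmark for "small": a difference well inside that range is what chance alone would readily produce for samples of this size.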
PREDICTIVE ANALYTICS:
STATISTICAL INFERENCE
Statistical Inference: using a sample of data to estimate characteristics of a population.
A Sample Statistic is used as an estimator of a Population Parameter:
Population Parameter         Sample Statistic
Mean                         Sample Mean
Proportion                   Sample Proportion p
Difference between means     Difference between sample means
Difference in proportions    Difference in sample proportions
Correlation                  Sample Correlation r
etc.                         etc.
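The table's idea can be sketched in code by building a hypothetical population, so that the parameters are actually known, and then estimating them from a random sample. The population below (visit counts drawn from a made-up distribution), the variable names and the sample size of 100 are assumptions for illustration; with real data the population parameters would be unknown.

```python
import random
from statistics import mean

random.seed(2024)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 visit counts (invented for illustration).
population = [random.choice([0, 0, 0, 1, 2, 3, 4, 5]) for _ in range(10_000)]

# Population parameters: known here only because we built the population.
pop_mean = mean(population)
pop_prop_zero = sum(v == 0 for v in population) / len(population)

# Sample statistics from a random sample of 100 people (without replacement).
sample = random.sample(population, 100)
sample_mean = mean(sample)
sample_prop_zero = sum(v == 0 for v in sample) / len(sample)  # sample proportion p

print(f"mean: parameter = {pop_mean:.3f}, estimate = {sample_mean:.3f}")
print(f"proportion with no visits: parameter = {pop_prop_zero:.3f}, "
      f"estimate p = {sample_prop_zero:.3f}")
```

The estimates land near, but generally not exactly on, the parameters they estimate.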
Whenever we use a sample statistic to estimate a
population parameter, there is uncertainty: the
estimate is only an approximation to the true
value.
If we take a new sample, we will almost always get a different estimate. They cannot both be exactly correct!
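A short simulation makes the point concrete: repeated samples from one fixed population give a spread of different estimates, none of which need equal the population value exactly. The population is hypothetical (visit counts drawn from an invented distribution), and the 20 samples of size 100 are arbitrary choices.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 visit counts (invented for illustration).
population = [random.choice([0, 0, 0, 1, 2, 3, 4, 5]) for _ in range(10_000)]
pop_mean = mean(population)

# Twenty separate random samples of 100 people, each giving its own estimate.
estimates = [mean(random.sample(population, 100)) for _ in range(20)]

print(f"population mean = {pop_mean:.3f}")
print(f"sample means range from {min(estimates):.3f} to {max(estimates):.3f}")
```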
This uncertainty is captured using standard errors, confidence intervals and hypothesis tests.
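As a rough illustration of a standard error, the sketch below uses the usual textbook formulas: s/√n for a sample mean (s is the sample standard deviation, n the sample size) and √(p(1 - p)/n) for a sample proportion p. It then checks the first against the actual spread of sample means over many repeated samples. The population and the sample size are hypothetical.

```python
import math
import random
from statistics import mean, stdev

random.seed(11)  # fixed seed so the run is reproducible

# Hypothetical population of visit counts (invented for illustration).
population = [random.choice([0, 0, 0, 1, 2, 3, 4, 5]) for _ in range(10_000)]
n = 100

# Standard errors estimated from ONE sample, as we would with real data.
sample = random.sample(population, n)
se_mean = stdev(sample) / math.sqrt(n)
p = sum(v == 0 for v in sample) / n
se_prop = math.sqrt(p * (1 - p) / n)

# Check: the standard deviation of many sample means should be close to se_mean.
repeated_means = [mean(random.sample(population, n)) for _ in range(2000)]
spread = stdev(repeated_means)

print(f"SE of sample mean (formula) = {se_mean:.3f}")
print(f"SD of 2,000 sample means    = {spread:.3f}")
print(f"SE of sample proportion p   = {se_prop:.3f}")
```

The formula needs only the one sample we actually have, yet it approximates how much the estimate would vary from sample to sample, which is what makes it useful for judging whether a difference is "small".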