Confidence Interval Module
One of the key concepts of statistics, and the one that allows
statisticians to make accurate inferences about a population from
a sample, is the Central Limit Theorem. The Central Limit Theorem
can be stated in this way:
· For samples of a sufficiently large size, the sampling
distribution of means is approximately normal.
· The distribution of means gets closer and closer to normal as
the sample size gets larger and larger, regardless of what the
original variable looks like (positively or negatively skewed).
· In other words, the original variable does not have to be
normally distributed.
· This is because if we, as eccentric researchers, drew an almost
infinite number of random samples of the same size from a single
population (such as the student body of NMSU), the means
calculated from those samples would be approximately normally
distributed, and the mean of all of those sample means
would be a very close approximation to the true population
mean. It is this characteristic that makes it possible for us,
using sound probability-based sampling techniques, to make
accurate statements about characteristics of a population
based upon the statistics calculated from a sample drawn from that
population.
· Furthermore, we can calculate a statistic known as the
standard error of the mean (abbreviated s.e.) that describes the
variability of the distribution of all possible sample means in
the same way that we used the standard deviation to describe
the variability of a single sample. We will use the standard
error of the mean (s.e.) to calculate the statistic that is the topic
of this module, the confidence interval.
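
The points in the list above can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, the 5,000 repetitions, and the function name are all assumptions chosen for the demonstration and are not part of this module.

```python
import random
import statistics


def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw many random samples from a skewed population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Assumed population: exponential with mean 1.0 (strongly positively skewed).
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


means = simulate_sample_means(sample_size=50, n_samples=5000)

# The mean of the sample means should sit close to the population mean (1.0).
print("mean of sample means:", round(statistics.fmean(means), 3))

# The spread of the sample means is what the standard error describes.
# For this assumed population it should be near 1 / sqrt(50), about 0.141.
print("sd of sample means:", round(statistics.pstdev(means), 3))
```

Even though the population here is far from normal, the sample means cluster tightly and symmetrically around the population mean, which is the behavior the theorem describes.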
The formula that we use to calculate the standard error of the
mean is:
s.e. = s / √(N – 1)
where s = the standard deviation calculated from the sample;
and
N = the sample size.
So the formula tells us that the standard error of the mean is
equal to the standard deviation divided by the square root of
(the sample size minus 1).
This is the preferred formula for practicing professionals as it
accounts for errors that may be a function of the particular
sample we have selected.
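
As a minimal sketch, the formula above can be written as a short function. The function name and the illustrative values (s = 3.2, N = 140) are assumptions for demonstration. Note that this form gives the same result as the also common s / √N when s itself was computed with N in its denominator.

```python
import math


def standard_error(s, n):
    """Standard error of the mean as defined in this module: s / sqrt(N - 1)."""
    if n < 2:
        raise ValueError("sample size must be at least 2")
    return s / math.sqrt(n - 1)


# Illustrative values only: s = 3.2 and N = 140.
print(round(standard_error(3.2, 140), 2))  # 0.27
```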
THE CONFIDENCE INTERVAL (CI)
The formula for the CI depends on the sample size (N).
For sample sizes ≥ 100, the formula for the CI is:
CI = (the sample mean) ± Z(s.e.)
Let’s look at an example to see how this formula works.
* Please refer to the pdf document “how to solve the problem,”
which I have provided for you under the “notes” link.
Example 1
Suppose that we conducted interviews with 140 randomly
selected individuals (N = 140) in a large metropolitan area. We
assured these individuals that their answers would remain
confidential, and we asked them about their law-breaking
behavior. Among other questions the individuals were asked to
self-report the number of times per month they exceeded the
speed limit. One of the objectives of the study was to estimate
(make an inference about) the average number of times per
month residents in all metropolitan areas across the country
exceeded the speed limit. The sample statistics we obtained
were as follows:
Mean = 12.4 times
S = 3.2 times
N = 140
Let’s construct a 95% CI around our estimate of the mean drawn
from this sample.
The sample mean of 12.4 times tells us that, on average, the
individuals from our sample exceed the speed limit about 12.4
times a month. This sample mean estimate is our best point
estimate of the true population mean. We know full well that
12.4 times is not the true population mean and that repeated
samples will yield different means. What does our sample mean
tell us about the mean of the entire population of metropolitan
residents? This is the question we are really trying to answer.
We want to make our point estimate of 12.4 more reliable and at
the same time, give ourselves the ability to make a probability
statement about the confidence we have in our estimate. To do
this, we use the CI equation above to construct a 95%
confidence interval around the sample mean estimate of 12.4.
We have all the information we need to fill in the formula
except for the Z score. The Z score for a 95% CI is 1.96, and we
can find it in the Z Table as follows. Remember that the total
area under the normal curve equals 100% and that half of that
area, 50%, lies on each side of the mean. To find
the Z score corresponding to 95%, we first divide 95% in half,
leaving 47.5% on each side of the mean, with 2.5% in
each tail outside our 95% confidence interval.
Next we look inside the Z Table (the numbers
corresponding to areas under the normal curve) for the number
that comes closest to .4750 (47.5%) without going under .4750
and identify the corresponding Z score. The correct Z score is
1.96, where the area is .4750.
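
If a Z Table is not at hand, the same lookup can be sketched with Python's standard library. The function name below is an assumption for illustration. `NormalDist` is the standard normal distribution from the `statistics` module, and it works with the cumulative area to the left of Z rather than the area between the mean and Z that the table lists.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical Z for a confidence level given as a proportion."""
    tail = (1.0 - level) / 2.0          # area left in each tail
    return NormalDist().inv_cdf(1.0 - tail)


print(round(z_for_confidence(0.95), 2))            # 1.96

# Area between the mean and Z = 1.96, the quantity listed in the Z Table.
print(round(NormalDist().cdf(1.96) - 0.5, 4))      # 0.475
```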
Now we can solve the equation.
95% CI = 12.4 ± 1.96 (3.2 / √(140 – 1))
= 12.4 ± 1.96 (3.2 / √139)
= 12.4 ± 1.96 (3.2 / 11.79)
= 12.4 ± 1.96 (.27)
= 12.4 ± .53
12.4 – .53 = 11.87
12.4 + .53 = 12.93
95% CI = 11.87 to 12.93
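
The same calculation can be sketched in a few lines of code. The function name is an assumption for illustration, and the code carries full precision instead of rounding each intermediate step, so the last digit of a result can occasionally differ from a hand calculation.

```python
import math


def z_confidence_interval(mean, s, n, z):
    """CI = mean ± z * s.e., with s.e. = s / sqrt(N - 1) as in this module."""
    margin = z * s / math.sqrt(n - 1)
    return mean - margin, mean + margin


lower, upper = z_confidence_interval(mean=12.4, s=3.2, n=140, z=1.96)
print(round(lower, 2), round(upper, 2))  # 11.87 12.93
```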
So what does this interval tell us? It tells us that, based on our
sample data, we can be 95 percent confident that the mean
number of self-admitted speeding violations among all residents
of metropolitan areas lies between 11.87 and 12.93 times per
month. That is, theoretically speaking, if we had taken a large
number of random samples from this same population and
calculated 95% confidence intervals around the means obtained
from each sample, approximately 95% of these intervals would
include the true population mean and 5 percent would not.
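
This repeated-sampling interpretation can be illustrated with a simulation. The sketch below assumes a normally distributed population whose true mean is 12.4 and true standard deviation is 3.2. Those population values, the 2,000 repetitions, and the function name are assumptions made only for the demonstration.

```python
import math
import random


def coverage_rate(true_mean, true_sd, n, z, trials, seed=0):
    """Fraction of simulated samples whose CI contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with N in the denominator.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
        margin = z * s / math.sqrt(n - 1)  # Z times the module's s.e.
        if mean - margin <= true_mean <= mean + margin:
            hits += 1
    return hits / trials


rate = coverage_rate(true_mean=12.4, true_sd=3.2, n=140, z=1.96, trials=2000)
print("share of intervals containing the true mean:", rate)
```

The printed share should land near 0.95, with some run-to-run variation if the seed is changed.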
Example 2
Let’s say for the sake of argument that we only wanted a 90%
CI about our sample mean, rather than a 95% CI, for our point
estimate of 12.4. From the Z Table, we can find the correct Z
score corresponding to 90%. Remember that the total area
under the normal curve equals 100% and that half
of that area, 50%, lies on each side of the mean. To find
the Z score corresponding to 90%, we first divide
90% in half, leaving 45% on each side of the mean,
with 5% in each tail outside our 90% confidence
interval. Next we look inside the Z
Table (the numbers corresponding to areas under the normal
curve) for the number that comes closest to .4500 (45%) without
going under .4500 and identify the corresponding Z score. The
correct Z score is 1.65, where the area is .4505. A Z score of
1.64 would be incorrect because its area of .4495 is less than
45 percent, and thus our CI estimate would not truly be a 90%
confidence level estimate.
As in Example 1, we will insert 1.65 into the CI equation and
solve.
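
The "closest without going under" rule can be sketched as a search over two-decimal Z values. The function name and the search range of 0.00 to 4.00 are assumptions for illustration. The areas are computed from the standard normal distribution instead of being read from a printed table.

```python
from statistics import NormalDist


def table_z(level):
    """Smallest two-decimal Z whose mean-to-Z area is at least level / 2."""
    target = level / 2.0
    std_normal = NormalDist()
    for hundredths in range(0, 401):           # Z = 0.00, 0.01, ..., 4.00
        z = hundredths / 100.0
        area = std_normal.cdf(z) - 0.5         # area between the mean and Z
        if area >= target - 1e-9:              # small tolerance for float error
            return z
    raise ValueError("confidence level too high for this search range")


print(table_z(0.90))   # 1.65
print(table_z(0.95))   # 1.96
```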
90% CI = 12.4 ± 1.65 (3.2 / √(140 – 1))
= 12.4 ± 1.65 (3.2 / √139)
= 12.4 ± 1.65 (3.2 / 11.79)
= 12.4 ± 1.65 (.27)
= 12.4 ± .45
12.4 – .45 = 11.95
12.4 + .45 = 12.85
90% CI = 11.95 to 12.85
The interval indicates that we are 90 percent confident that the
true population mean speeding violation score falls between
11.95 and 12.85 times per month. Notice that the interval for a
90% confidence interval is narrower than for a 95% confidence
interval. You can see, then, that we are less confident (90
percent vs. 95 percent confident) that our true population mean
falls into this interval. By lowering our level of confidence, we
gained some precision in our estimate. We could reduce the
width of our confidence interval even more, but we would pay
the price in levels of confidence.
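
To see this trade-off across several confidence levels at once, here is a minimal sketch using the sample statistics from Examples 1 and 2. The 80% and 99% levels are added only for comparison and are assumptions, and the Z values are computed exactly instead of being taken from the table.

```python
import math
from statistics import NormalDist

mean, s, n = 12.4, 3.2, 140
se = s / math.sqrt(n - 1)  # s.e. as defined in this module

widths = {}
for level in (0.80, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # two-sided critical Z
    widths[level] = 2.0 * z * se                         # full width of the CI
    print(f"{level:.0%} CI width: {widths[level]:.2f}")
```

The widths grow as the confidence level rises, which is the price paid for greater confidence.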
Example 3
Let’s say that we took a new sample, only this time we randomly
selected and interviewed 901 individuals, asking the same questions.
Our sample data for this sample are:
Sample mean = 12.4 times
S = 3.2 times
N = 901
Now let’s recalculate our 90% CI.
90% CI = 12.4 ± 1.65 (3.2 / √(901 – 1))
= 12.4 ± 1.65 (3.2 / √900)
= 12.4 ± 1.65 (3.2 / 30)
= 12.4 ± 1.65 (.11)
= 12.4 ± .18
12.4 – .18 = 12.22
12.4 + .18 = 12.58
90% CI = 12.22 to 12.58
The interval indicates that we are 90 percent confident that the
true population mean speeding violation score falls between
12.22 and 12.58 times per month. Notice that the interval is
considerably narrower than in Example 2, where the sample size is
140. Why is this? By increasing the sample size, the s.e.
became smaller. We can see this mathematically, but what is
the theoretical reasoning for this change? As our sample size
increased, we captured a greater proportion of the variability in
self-reported speeding violations that exists in the total
population. Consequently, our confidence interval estimate is
more precise. The lesson learned is that whenever you have a
choice between a smaller or a larger sample, choose the larger
sample, as your estimates (inferences) about the population will
be more precise.
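
The effect of sample size can be sketched by holding s and Z fixed and changing only N. The function name is an assumption for illustration, and N = 3601 is a hypothetical sample size added to show that making N – 1 four times larger cuts the margin in half.

```python
import math


def margin_of_error(s, n, z):
    """Half-width of the CI: z * s / sqrt(N - 1)."""
    return z * s / math.sqrt(n - 1)


# Same s and Z as Examples 2 and 3; only N changes.
for n in (140, 901, 3601):
    print(n, round(margin_of_error(3.2, n, 1.65), 3))
```

Because N sits under a square root, each further gain in precision requires a much larger increase in sample size.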
Example 4
We have been calculating the confidence interval for samples
where N ≥ 100. What if the sample size is less than 100 (N <
100)?
In this situation, we must use the two-tailed “T” distribution
from the Table of T Values, which I have provided to you as a pdf
document under the “notes” link. We use the two-tailed T
distribution because we are working with a confidence interval
and are concerned with the area between two points, one on either
side of the mean. This means that we will use the column
headings beneath the label “Level of Significance for Two-Tailed
Test.”
Let’s continue with our effort to estimate the number of
self-reported speeding violations and construct a confidence interval
using the T distribution.
Let’s say we are short on research funds and are only able to
randomly select and interview 17 individuals, and we want to
construct a 90% CI around our estimate of the population mean.
From our sample we obtained the following statistics:
Sample mean = 12.4 times
S = 3.2 times
N = 17
The formula we use is the same as that for samples where N ≥
100, except that instead of Z we use T. The only trick is to
determine which value of T from the Table of T Values to
use. The first task is to determine the correct column. For a
90% confidence level we select the column labeled “.10”.
For a confidence level of 95% we would select the
column labeled “.05”, for 98% the column labeled “.02”, and
for 99% the column labeled “.01”.
These levels (.10, .05, .02, .01) represent the total area
remaining in the two tails of the curve outside of our
confidence interval. For example, when we construct a 90%
confidence interval, 10% of the area under the curve lies outside
the confidence interval boundaries (100 – 90 = 10), and that
remaining 10% is split equally between the two tails,
such that 5% lies below the lower boundary of the
confidence interval and 5% lies above the upper boundary
of the confidence interval. The same logic holds true for any
given level of confidence when we are constructing a
confidence interval.
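
The column rule described above amounts to a small piece of arithmetic, sketched here. The function name and its return format are assumptions for illustration.

```python
def two_tailed_column(confidence_percent):
    """Return (column label, area in each tail) for a confidence level in %."""
    total_tail_area = (100 - confidence_percent) / 100   # e.g. 90 -> 0.10
    each_tail = total_tail_area / 2                      # split across two tails
    return round(total_tail_area, 4), round(each_tail, 4)


for level in (90, 95, 98, 99):
    print(level, two_tailed_column(level))
```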
The second task is to select the correct row. To do this we must
calculate the degrees of freedom (abbreviated df):
df = N – 1. In this example, df =
17 – 1 = 16. Now we are able to find the appropriate value of
T to insert into our confidence interval formula. The degrees of
freedom are listed in the very first column of the table; they
begin with 1, run sequentially through 30, and then jump to 40,
60, 120, and infinity. Go down the df column until you arrive at 16.
Go across the row for 16 until you are in the column for .10.
That number is 1.746. Now we are ready to construct our 90%
CI.
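
The row-and-column lookup can be sketched with a small dictionary. The excerpt below holds only three rows of standard two-tailed T values and is not the full table from the “notes” link. The dictionary and function names are assumptions for illustration.

```python
# Excerpt of two-tailed critical T values: {df: {column: T}}.
# Only three rows are included here; a full table has many more.
T_TABLE_EXCERPT = {
    15: {0.10: 1.753, 0.05: 2.131, 0.02: 2.602, 0.01: 2.947},
    16: {0.10: 1.746, 0.05: 2.120, 0.02: 2.583, 0.01: 2.921},
    17: {0.10: 1.740, 0.05: 2.110, 0.02: 2.567, 0.01: 2.898},
}


def critical_t(n, column):
    """Look up T for sample size n and a two-tailed column such as 0.10."""
    df = n - 1                                  # degrees of freedom
    if df not in T_TABLE_EXCERPT:
        raise KeyError(f"df = {df} is not in this excerpt")
    return T_TABLE_EXCERPT[df][column]


print(critical_t(17, 0.10))  # 1.746
```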
90% CI = 12.4 ± 1.746 (3.2 / √(17 – 1))
= 12.4 ± 1.746 (3.2 / √16)
= 12.4 ± 1.746 (3.2 / 4)
= 12.4 ± 1.746 (0.8)
= 12.4 ± 1.397
12.4 – 1.397 = 11.003
12.4 + 1.397 = 13.797
90% CI = 11.003 to 13.797
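
As a check on the arithmetic, here is a minimal sketch of the small-sample calculation. The function name is an assumption for illustration. The last lines compare the T-based margin with what Z = 1.65 would have given for the same sample, purely to show how much the T value widens the interval.

```python
import math


def t_confidence_interval(mean, s, n, t_value):
    """CI = mean ± T * s.e., with s.e. = s / sqrt(N - 1) as in this module."""
    margin = t_value * s / math.sqrt(n - 1)
    return mean - margin, mean + margin


lower, upper = t_confidence_interval(mean=12.4, s=3.2, n=17, t_value=1.746)
print(round(lower, 3), round(upper, 3))   # 11.003 13.797

# For comparison only: margin using T = 1.746 versus Z = 1.65 at N = 17.
se = 3.2 / math.sqrt(17 - 1)
print(round(1.746 * se, 3), round(1.65 * se, 3))  # 1.397 1.32
```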
The interval indicates that we are 90 percent confident that the
true population mean speeding violation score falls between
11.003 and 13.797 times per month. Notice that the interval is
considerably wider than the intervals in any of the prior
examples. This difference
is due to the same phenomenon I discussed in Example 3 above
regarding the effect of sample size on the precision of our
estimates of the true population mean.
