Simulations in statistical inference
Recall the confidence interval for the population mean:
x̄ ± z SE(x̄)
The idea was that x̄ approximately follows the normal curve.
What if we are interested in an estimator θ̂ for some parameter θ and the normal
approximation is not valid for θ̂? What if there is no formula for SE(θ̂)?
In such situations, simulations can often be used to estimate these quantities quite well.
In fact, simulations may result in better estimates even in cases where the normal
approximation is applicable!
The Monte Carlo Method
What is the average height of all people living in the United States?
This is difficult to determine exactly but can easily be estimated quite well:
Sample n = 100 (say) people at random. Then use the average height of these n
people as an estimate of the average height of all people in the US.
This is an example of the general problem where we are interested in an unknown
parameter θ of a population.
We estimate θ with a statistic (estimator) θ̂, which is based on a sample of n
observations X_1, …, X_n drawn at random from the population:
θ̂ = average of the sample = (1/n) ∑_{i=1}^{n} X_i
The Monte Carlo Method
θ̂ = (1/n) ∑_{i=1}^{n} X_i tends to be close to the uncomputable population mean θ,
even for moderate sample sizes such as n = 100.
This example is a special case of the Monte Carlo Method or Simulation:
• We approximate a fixed quantity θ by the average of independent random variables
that have expected value θ.
• By the law of large numbers, the approximation error can be made arbitrarily small
by using a large enough sample size.
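As a toy illustration, here is a minimal Monte Carlo sketch in Python (assuming numpy; the exponential "population" with mean θ = 2 is a hypothetical stand-in):

```python
# Minimal Monte Carlo sketch: estimate the mean theta of a "population"
# by the average of n random draws. The exponential population is a
# hypothetical stand-in; its true mean is theta = 2.
import numpy as np

rng = np.random.default_rng(0)

theta = 2.0  # true (normally unknown) population mean
n = 100

sample = rng.exponential(scale=theta, size=n)  # X_1, ..., X_n
theta_hat = sample.mean()                      # average of the sample

print(f"true theta = {theta}, estimate theta_hat = {theta_hat:.3f}")
```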
The Monte Carlo Method
The Monte Carlo Method can also be used for more involved quantities. For example,
we can use it to compute the standard error (SE) of a statistic θ̂.
Recall that the standard error tells roughly how far off the statistic will be from its
expected value.
The precise definition is
SE(θ̂) = √( E[(θ̂ − E(θ̂))²] ).
To approximate SE(θ̂) with Monte Carlo:
• Get many (say 1,000) samples of 100 observations each.
• Compute θ̂ for each sample, resulting in 1,000 estimates θ̂_1, …, θ̂_1000.
• Compute the standard deviation of these 1,000 estimates:
s(θ̂_1, …, θ̂_1000) = √( (1/999) ∑_{i=1}^{1000} (θ̂_i − ave(θ̂_i))² )
Note that this is not an average of independent random variables. But it can be shown
that the law of large numbers still applies and Monte Carlo works:
s(θ̂_1, …, θ̂_1000) ≈ SE(θ̂).
We can use Monte Carlo only if we can draw many samples of size 100!
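Here is a minimal sketch of this recipe in Python (assuming numpy; the exponential "population" with mean 2 is a hypothetical stand-in, chosen because the exact SE σ/√n is known for comparison):

```python
# Monte Carlo estimate of SE(theta_hat), assuming we *can* draw repeatedly
# from the population (here a hypothetical exponential with mean 2).
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 1000

# theta_hat for each of the 1,000 samples of 100 observations
estimates = np.array([rng.exponential(scale=2.0, size=n).mean()
                      for _ in range(reps)])

# s(theta_hat_1, ..., theta_hat_1000): std dev with divisor reps - 1 = 999
se_mc = estimates.std(ddof=1)

# For the sample mean the exact SE is sigma / sqrt(n) = 2 / 10 = 0.2
print(f"Monte Carlo SE = {se_mc:.4f}, exact SE = {2.0 / np.sqrt(n):.4f}")
```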
The bootstrap principle
We have an estimate θ̂ for a parameter θ and want to know how accurate θ̂ is:
we would like to find SE(θ̂) or give a confidence interval for θ.
The bootstrap can do this in quite general settings.
Example: θ = average height of all people in the US.
θ is unknown but can be estimated by the average height θ̂ of 100 randomly selected
people.
This illustrates the plug-in principle:
We can’t compute the population mean because we can’t access the whole population.
So we ’plug in’ the sample in place of the population and compute the mean of the
sample instead.
The bootstrap principle
The rationale for the plug-in principle is that the sample mean θ̂ will be close to the
population mean θ because the sample histogram is close to the population histogram.
[Figure: histogram of the population alongside histogram of the sample]
The bootstrap principle
The bootstrap uses the plug-in principle and the Monte Carlo Method to approximate
quantities such as SE(θ̂).
Here is the reasoning behind the bootstrap:
Suppose we can draw as many samples from the population as we wish. Then we can
approximate SE(θ̂) with Monte Carlo:
• Draw a sample X_1, …, X_n and use it to compute θ̂.
• Repeat B times (say B = 1,000) to get θ̂_1, …, θ̂_B.
• The standard deviation of these B estimates is close to SE(θ̂) if B is large, by the
law of large numbers.
However, we have only one sample X_1, …, X_n and we can’t simulate more because
the population is not accessible.
The bootstrap uses the plug-in principle to get around this: It simulates from the
sample instead of from the population.
The bootstrap principle
The bootstrap pretends that the sample histogram is the population histogram and
then uses Monte Carlo to simulate the quantity of interest.
Simulating a bootstrap sample X*_1, …, X*_n means that we draw n times with
replacement from X_1, …, X_n.
The bootstrap consists of two steps:
• Draw B bootstrap samples and compute θ̂* for each bootstrap sample:
X_1^{*1}, …, X_n^{*1} → θ̂*_1
⋮
X_1^{*B}, …, X_n^{*B} → θ̂*_B
• Use θ̂*_1, …, θ̂*_B to approximate the quantity of interest.
For example, we approximate SE(θ̂) by the standard deviation of θ̂*_1, …, θ̂*_B.
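A minimal sketch of these two steps in Python (assuming numpy; the sample of "heights" is hypothetical):

```python
# Nonparametric bootstrap estimate of SE(theta_hat) for the sample mean.
import numpy as np

rng = np.random.default_rng(0)

# The one observed sample (hypothetical heights, in cm)
x = rng.normal(loc=170, scale=10, size=100)
n, B = len(x), 1000

# Step 1: draw B bootstrap samples (n draws with replacement from x)
# and compute theta_hat* for each
boot = np.array([rng.choice(x, size=n, replace=True).mean()
                 for _ in range(B)])

# Step 2: the std dev of the B bootstrap estimates approximates SE(theta_hat)
print(f"bootstrap SE of the mean = {boot.std(ddof=1):.3f}")
```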
More about the bootstrap
The nonparametric bootstrap simulates a bootstrap sample X*_1, …, X*_n by drawing
with replacement from X_1, …, X_n.
Sometimes a parametric model is appropriate for the data, e.g. a normal distribution
with unknown mean and standard deviation. Then one may be better off with the
parametric bootstrap, which simulates the bootstrap samples from this model, using
estimates for the unknown parameters.
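A sketch of the parametric bootstrap under a normal model (an assumed model; the data are hypothetical):

```python
# Parametric bootstrap: fit the assumed normal model to the data, then
# simulate bootstrap samples from the fitted model instead of resampling.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=170, scale=10, size=100)  # observed sample (hypothetical)
n, B = len(x), 1000

mu_hat, sigma_hat = x.mean(), x.std(ddof=1)  # estimated model parameters

boot = np.array([rng.normal(mu_hat, sigma_hat, size=n).mean()
                 for _ in range(B)])
print(f"parametric bootstrap SE = {boot.std(ddof=1):.3f}")
```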
So far, the bootstrap samples were drawn independently. If there is dependence in the
data (e.g. a time series), then this dependence needs to be incorporated into the
resampling, e.g. with the block bootstrap.
Bootstrap confidence intervals
If the sampling distribution of θ̂ is approximately normal, then
[ θ̂ − z_{α/2} SE(θ̂), θ̂ + z_{α/2} SE(θ̂) ]
is an approximate (1 − α)-confidence interval for θ.
SE(θ̂) can be estimated by the bootstrap.
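For instance, a sketch of this normal-approximation interval with a bootstrapped SE (hypothetical data, α = 0.05):

```python
# Normal-approximation CI with SE(theta_hat) estimated by the bootstrap.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=170, scale=10, size=100)  # hypothetical data

theta_hat = x.mean()
boot = np.array([rng.choice(x, size=len(x), replace=True).mean()
                 for _ in range(1000)])
se_boot = boot.std(ddof=1)  # bootstrap estimate of SE(theta_hat)

z = 1.96  # z_{alpha/2} for alpha = 0.05
print(f"95% CI: [{theta_hat - z * se_boot:.2f}, {theta_hat + z * se_boot:.2f}]")
```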
If θ̂ is far from normal, then we have to use the bootstrap to estimate the whole
sampling distribution of θ̂, not just SE(θ̂).
The sampling distribution of θ̂ can be approximated by that of θ̂*, which in turn can be
approximated by the histogram of θ̂*_1, …, θ̂*_B.
This gives the bootstrap percentile interval
[ θ̂*_{(α/2)}, θ̂*_{(1−α/2)} ]
where θ̂*_{(α/2)} is the α/2 percentile of θ̂*_1, …, θ̂*_B.
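A sketch of the percentile interval (hypothetical skewed data, α = 0.05):

```python
# Bootstrap percentile interval for the mean of skewed data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)  # hypothetical skewed data

boot = np.array([rng.choice(x, size=len(x), replace=True).mean()
                 for _ in range(1000)])

alpha = 0.05
# the alpha/2 and (1 - alpha/2) percentiles of the bootstrap estimates
lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
print(f"95% percentile interval: [{lo:.2f}, {hi:.2f}]")
```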
Bootstrap confidence intervals
An alternative to bootstrapping the distribution of θ̂ is to do so for θ̂ − θ.
The hope is that this approach is less sensitive to θ and therefore produces a more
accurate confidence interval.
This results in the bootstrap pivotal interval
[ 2θ̂ − θ̂*_{(1−α/2)}, 2θ̂ − θ̂*_{(α/2)} ]
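A sketch of the pivotal interval, reusing the percentile computation (hypothetical data):

```python
# Bootstrap pivotal interval: reflect the bootstrap percentiles around theta_hat.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)  # hypothetical skewed data

theta_hat = x.mean()
boot = np.array([rng.choice(x, size=len(x), replace=True).mean()
                 for _ in range(1000)])

alpha = 0.05
q_lo, q_hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# [2*theta_hat - q_hi, 2*theta_hat - q_lo]
print(f"95% pivotal interval: [{2 * theta_hat - q_hi:.2f}, "
      f"{2 * theta_hat - q_lo:.2f}]")
```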
Bootstrapping for regression
We have data (X_1, Y_1), …, (X_n, Y_n) from the simple linear regression model
Y_i = a + bX_i + e_i
From the data we can compute estimates â, b̂. How can we use the bootstrap to get
standard errors and confidence intervals?
• Compute the residuals ê_i = Y_i − â − b̂X_i.
• Resample from those residuals to get e*_1, …, e*_n.
• Compute the bootstrapped responses Y*_i = â + b̂X_i + e*_i.
This gives a bootstrap sample (X_1, Y*_1), …, (X_n, Y*_n), from which we can estimate
the parameters â* and b̂* in the usual way.
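A sketch of this residual bootstrap in Python (assuming numpy; the data and the choice B = 1,000 are hypothetical):

```python
# Residual bootstrap for simple linear regression Y = a + b*X + e.
import numpy as np

rng = np.random.default_rng(0)
n, B = 100, 1000

x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # hypothetical data: a = 1, b = 2

# Least-squares estimates; polyfit returns [slope, intercept] for deg=1
b_hat, a_hat = np.polyfit(x, y, deg=1)
resid = y - a_hat - b_hat * x  # residuals e_hat_i

boot_b = np.empty(B)
for i in range(B):
    e_star = rng.choice(resid, size=n, replace=True)  # resample residuals
    y_star = a_hat + b_hat * x + e_star               # bootstrapped responses
    boot_b[i], _ = np.polyfit(x, y_star, deg=1)       # b_hat* for this sample

# Std dev of the bootstrap slopes approximates SE(b_hat)
print(f"bootstrap SE of the slope = {boot_b.std(ddof=1):.4f}")
```

The same loop yields the standard error of â by also storing the bootstrapped intercepts.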