Chapter 11 - 250305 - 102157
CS1-11: Linear regression Page 27
Chapter 11 Practice Questions
11.1 A new computerised ultrasound scanning technique has enabled doctors to monitor the weights
of unborn babies. The table below shows the estimated weights for one particular baby at
fortnightly intervals during the pregnancy.
Gestation period (weeks)     30   32   34   36   38   40
Estimated baby weight (kg)   1.6  1.7  2.5  2.8  3.2  3.5
(c) $\hat{\sigma}^2 = 0.0234$.
(ii) Calculate the baby’s expected weight at 42 weeks (assuming it hasn’t been born by then).
(iii) (a) Calculate the residual sum of squares and the regression sum of squares for these
data.
(v) Construct an ANOVA table for the sum of squares from part (iii)(a) and carry out an F-test
stating the conclusion clearly.
(vi) (a) Estimate the mean weight of a baby at 33 weeks. Calculate the variance of this
mean predicted response.
(b) Hence, calculate a 90% confidence interval for the mean weight of a baby at 33
weeks.
(vii) (a) Estimate the actual weight of an individual baby at 33 weeks. Calculate the
variance of this individual predicted response.
(b) Hence, calculate a 90% confidence interval for the weight of an individual baby at
33 weeks.
[ctd.]
The table below shows some of the residuals:
Gestation period (weeks)   30     32     34     36     38     40
Residual                   0.07                 0.05   0.04   -0.07
(c) Plot the residuals against the x values and comment on the fit.
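The calculations behind Question 11.1 can be checked numerically. The following is a minimal sketch rather than a model solution. It assumes the six weights pair with the gestation periods 30 to 40 weeks listed in the residual table, and it hardcodes the upper 5% point of the t distribution with 4 degrees of freedom (2.132) as read from standard tables. All function and variable names are illustrative.

```python
import math

# Question 11.1 data. Pairing the weights with these gestation periods
# (taken from the residual table) is an assumption of this sketch.
weeks = [30, 32, 34, 36, 38, 40]
weight = [1.6, 1.7, 2.5, 2.8, 3.2, 3.5]

n = len(weeks)
x_bar = sum(weeks) / n
y_bar = sum(weight) / n
s_xx = sum((x - x_bar) ** 2 for x in weeks)
s_yy = sum((y - y_bar) ** 2 for y in weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, weight))

# Least squares estimates of the slope and intercept.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Partition of the total sum of squares, and the error variance estimate.
ss_tot = s_yy
ss_reg = s_xy ** 2 / s_xx
ss_res = ss_tot - ss_reg
sigma2_hat = ss_res / (n - 2)


def predict(x0):
    """Fitted value at x0."""
    return alpha_hat + beta_hat * x0


def var_mean_response(x0):
    """Estimated variance of the mean predicted response at x0."""
    return sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx)


def var_individual_response(x0):
    """Estimated variance of an individual predicted response at x0."""
    return sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)


# Upper 5% point of t with n - 2 = 4 degrees of freedom, from tables.
T_4_UPPER_5PCT = 2.132


def interval_90(x0, variance):
    """Two-sided 90% interval centred on the fitted value at x0."""
    half_width = T_4_UPPER_5PCT * math.sqrt(variance)
    return predict(x0) - half_width, predict(x0) + half_width


residuals = [y - predict(x) for x, y in zip(weeks, weight)]

print("slope, intercept:", beta_hat, alpha_hat)
print("SSREG, SSRES, sigma^2:", ss_reg, ss_res, sigma2_hat)
print("mean response at 33:", interval_90(33, var_mean_response(33)))
print("individual at 33:", interval_90(33, var_individual_response(33)))
print("residuals:", [round(r, 2) for r in residuals])
```

The individual interval is wider than the mean interval at the same point because its variance carries an extra $\hat{\sigma}^2$ term for the new observation's own error.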
11.2 An analysis using the simple linear regression model based on 19 data points gave:
(iii) Comment on the results of the tests in parts (i) and (ii).
11.3 The sums of the squares of the errors in a regression analysis are found to be:
$SS_{REG} = \sum (\hat{y}_i - \bar{y})^2 = 6.4 \qquad SS_{RES} = \sum (y_i - \hat{y}_i)^2 = 3.6 \qquad SS_{TOT} = \sum (y_i - \bar{y})^2 = 10.0$
Calculate the coefficient of determination and explain what this represents.
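The coefficient of determination in Question 11.3 is the ratio of the regression sum of squares to the total sum of squares. A short sketch of the arithmetic, using the three values quoted in the question (the variable names are illustrative):

```python
import math

# Sums of squares quoted in Question 11.3.
ss_reg = 6.4
ss_res = 3.6
ss_tot = 10.0

# The total sum of squares should split into regression plus residual.
assert math.isclose(ss_reg + ss_res, ss_tot)

# Proportion of the total variation in y explained by the fitted line.
r_squared = ss_reg / ss_tot
print(f"R^2 = {r_squared:.2f}")  # prints R^2 = 0.64
```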
(i) $y_i = a + b x_i^2 + e_i$
(ii) $y_i = a e^{b x_i}$
Exam style
11.5 A university wishes to analyse the performance of its students on a particular degree course. It
records the scores obtained by a sample of 12 students at entry to the course, and the scores
obtained in their final examinations by the same students. The results are as follows:
Student A B C D E F G H I J K L
Entrance exam score x (%) 86 53 71 60 62 79 66 84 90 55 58 72
Finals paper score y (%) 75 60 74 68 70 75 78 90 85 60 62 70
(ii) Assuming the full normal model, calculate an estimate of the error variance $\sigma^2$ and
obtain a 90% confidence interval for $\sigma^2$. [3]
(iii) By considering the slope parameter, formally test whether the data are positively
correlated. [3]
(iv) Calculate a 95% confidence interval for the mean finals paper score corresponding to an
individual entrance score of 53. [3]
(v) Calculate the proportion of variation explained by the model. Hence, comment on the fit
of the model. [2]
[Total 14]
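The slope test in part (iii) of Question 11.5 can be sketched as follows. This is an illustration under stated assumptions, not a model answer: it takes the hypotheses to be $H_0: \beta = 0$ against $H_1: \beta > 0$, assumes a 5% significance level (the question does not state one), and hardcodes the upper 5% point of the t distribution with 10 degrees of freedom (1.812) from standard tables.

```python
import math

# Scores for the 12 students in Question 11.5.
entrance = [86, 53, 71, 60, 62, 79, 66, 84, 90, 55, 58, 72]
finals = [75, 60, 74, 68, 70, 75, 78, 90, 85, 60, 62, 70]

n = len(entrance)
x_bar = sum(entrance) / n
y_bar = sum(finals) / n
s_xx = sum((x - x_bar) ** 2 for x in entrance)
s_yy = sum((y - y_bar) ** 2 for y in finals)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(entrance, finals))

# Slope estimate and unbiased estimate of the error variance.
beta_hat = s_xy / s_xx
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)

# Under H0: beta = 0, beta_hat / se(beta_hat) follows t with n - 2 df.
se_beta = math.sqrt(sigma2_hat / s_xx)
t_stat = beta_hat / se_beta

# Upper 5% point of t with 10 degrees of freedom, from tables.
# The 5% level is an assumption of this sketch.
T_10_UPPER_5PCT = 1.812
reject_h0 = t_stat > T_10_UPPER_5PCT

print("beta_hat:", beta_hat)
print("t statistic:", t_stat)
print("reject H0 in favour of beta > 0:", reject_h0)
```

A one-sided test is used because the question asks specifically about positive correlation.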
Exam style
11.6 The share price, in pence, of a certain company is monitored over an 8-year period. The results
are shown in the table below:
Time (years) 0 1 2 3 4 5 6 7 8
Price 100 131 183 247 330 454 601 819 1,095
$y_i = \alpha + \beta x_i + e_i, \quad i = 0, 1, \ldots, 8$
where $\{e_i\}$ are independent normal random variables with mean zero and variance $\sigma^2$.
(i) Determine the fitted regression line in which the price is modelled as the response and
the time as an explanatory variable. [2]
(iii) (a) State the ‘total sum of squares’ and calculate its partition into the ‘regression sum
of squares’ and the ‘residual sum of squares’.
(b) Use the values in part (iii)(a) to calculate the ‘proportion of variability explained by
the model’ and comment on the result. [5]
(iv) The actuary decides to check the fit of the model by calculating the residuals.
Time (years)   0     1     2     3     4     5      6     7     8
Residual       132         -21   -75         -104   -75   25
(c) Plot the residuals against time and hence comment on the appropriateness of the
linear model. [7]
[Total 19]
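The residual check in part (iv) of Question 11.6 can be reproduced with a few lines of Python. The sketch below fits the straight line by least squares and then summarises the sign of each residual in time order. The sign string is simply one illustrative way of spotting a systematic pattern without drawing the plot.

```python
# Share price data from Question 11.6.
times = list(range(9))
prices = [100, 131, 183, 247, 330, 454, 601, 819, 1095]

n = len(times)
x_bar = sum(times) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in times)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, prices))

# Least squares line: price = alpha_hat + beta_hat * time.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residual = observed price minus fitted price.
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(times, prices)]

# One character per observation: '+' above the line, '-' below it.
signs = "".join("+" if r > 0 else "-" for r in residuals)

print([round(r) for r in residuals])
print(signs)  # prints ++-----++
```

Residuals that are positive at both ends and negative in the middle indicate curvature that a straight line does not capture, which is the kind of pattern part (iv)(c) asks about.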
Exam style
11.7 A schoolteacher is investigating the claim that class size does not affect GCSE results. His
observations of nine GCSE classes are as follows:
Class                                        X1   X2   X3   X4   Y1   Y2   Y3   Y4   Y5
Students in class ($c$)                      35   32   27   21   34   30   28   24   7
Average GCSE point score for class ($p$)     5.9  4.1  2.4  1.7  6.3  5.3  3.5  2.6  1.6
(ii) Class X5 was not included in the results above and contains 15 students. Calculate an
estimate of the average GCSE point score for this individual class and specify the standard
error for this estimate assuming the full normal model. [4]
[Total 7]
Exam style
11.8 An actuary is fitting the following linear regression model through the origin:
$Y_i = \beta x_i + e_i, \quad e_i \sim N(0, \sigma^2), \quad i = 1, 2, \ldots, n$
$\hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^2}$ [3]
(ii) Derive the bias and mean square error of $\hat{\beta}$ under this model. [4]
[Total 7]
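Question 11.8 asks for algebraic derivations, which a simulation cannot replace. As a numerical sanity check on a derived answer, though, the estimator can be simulated. The sketch below uses entirely hypothetical values for the $x_i$, $\beta$ and $\sigma$, and compares the empirical bias and mean square error of $\hat{\beta} = \sum x_i Y_i / \sum x_i^2$ with the standard result $\sigma^2 / \sum x_i^2$ for its variance.

```python
import random
import statistics


def beta_hat_through_origin(xs, ys):
    """Least squares slope for the model Y = beta * x + e (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical design points and parameters, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
TRUE_BETA = 2.0
TRUE_SIGMA = 1.0
N_SIMULATIONS = 20000

rng = random.Random(11)  # fixed seed so the run is reproducible
estimates = []
for _ in range(N_SIMULATIONS):
    # Simulate responses from the model and re-estimate the slope.
    ys = [TRUE_BETA * x + rng.gauss(0.0, TRUE_SIGMA) for x in xs]
    estimates.append(beta_hat_through_origin(xs, ys))

empirical_bias = statistics.fmean(estimates) - TRUE_BETA
squared_errors = [(b - TRUE_BETA) ** 2 for b in estimates]
empirical_mse = statistics.fmean(squared_errors)
theoretical_variance = TRUE_SIGMA ** 2 / sum(x * x for x in xs)

print("empirical bias:", empirical_bias)
print("empirical MSE:", empirical_mse)
print("sigma^2 / sum(x^2):", theoretical_variance)
```

If the derivation is right, the empirical bias should sit close to zero and the empirical mean square error close to the theoretical variance.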
Exam style
11.9 A life assurance company is examining the force of mortality, $\mu_x$, of a particular group of
policyholders. It is thought that it is related to the age, $x$, of the policyholders by the formula:
$\mu_x = B c^x$
It is decided to analyse this assumption by using the linear regression model:
Age, $x$                                          30    32    34    36    38    40    42     44
Force of mortality, $\mu_x$ ($\times 10^{-4}$)    5.84  6.10  6.48  7.05  7.87  9.03  10.56  12.66
(i) (a) Apply a transformation to the original formula, $\mu_x = B c^x$, to make it suitable for
analysis by linear regression. Hence, write down expressions for $Y$, $\alpha$ and $\beta$ in
terms of $\mu_x$, $B$ and $c$.
(b) Plot a graph of $\ln \mu_x$ against the age of the policyholder, $x$. Hence comment on
the suitability of the regression model and state how this supports the
transformation in part (a). [4]
(ii) Use the data to calculate least squares estimates of B and c in the original formula. [3]
(iii) (a) Calculate the coefficient of determination between $\ln \mu_x$ and $x$. Hence comment
on the fit of the model to the data.
(b) Complete the table of residuals and use it to comment on the fit. [5]
Age, x 30 32 34 36 38 40 42 44
(iv) Calculate a 95% confidence interval for the mean predicted response $\ln \mu_{35}$ and hence
obtain a 95% confidence interval for the mean predicted value of $\mu_{35}$. [4]
[Total 16]
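The log transformation used in Question 11.9 can be sketched in code. Taking logs of $\mu_x = B c^x$ gives $\ln \mu_x = \ln B + x \ln c$, so a straight-line fit of $\ln \mu_x$ on $x$ yields estimates of $\ln B$ (intercept) and $\ln c$ (slope). The sketch below assumes the tabulated values are in units of $10^{-4}$. That scale affects only the estimate of $B$, not $c$. Variable names are illustrative.

```python
import math

# Data from Question 11.9. The 1e-4 scale factor is an assumption about
# the units of the tabulated force of mortality.
ages = [30, 32, 34, 36, 38, 40, 42, 44]
mu_scaled = [5.84, 6.10, 6.48, 7.05, 7.87, 9.03, 10.56, 12.66]
SCALE = 1e-4

# Transformed response: Y = ln(mu_x).
log_mu = [math.log(m * SCALE) for m in mu_scaled]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(log_mu) / n
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_yy = sum((y - y_bar) ** 2 for y in log_mu)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, log_mu))

# Least squares fit of ln(mu_x) = alpha + beta * x.
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Back-transform: alpha = ln B and beta = ln c.
B_hat = math.exp(alpha_hat)
c_hat = math.exp(beta_hat)

# Coefficient of determination between ln(mu_x) and x.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print("B estimate:", B_hat)
print("c estimate:", c_hat)
print("R^2:", r_squared)
```

The same back-transformation applies to interval endpoints: an interval for $\ln \mu_{35}$ is exponentiated endpoint by endpoint to give an interval for $\mu_{35}$.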
Exam style
11.10 The government of a country suffering from hyperinflation has sponsored an economist to monitor
the price of a ‘basket’ of items in the population’s staple diet over a one-year period. As part of his
study, the economist selected six days during the year and on each of these days visited a single
nightclub, where he recorded the price of a pint of lager. His report showed the following prices:
Day ( i ) 8 29 57 92 141 148
Price ( Pi ) 15 17 22 51 88 95
$\ln P_i = a + b i + e_i$
where $a$ and $b$ are constants and the $e_i$’s are uncorrelated $N(0, \sigma^2)$ random variables.
(iv) Determine a 95% confidence interval for the average price of a pint of lager on day 365:
Exam style
11.11 (i) Show that the maximum likelihood estimates (MLEs) of $\alpha$ and $\beta$ in the simple linear
regression model are identical to their least squares estimates. [5]
(ii) Show that the MLE of $\sigma^2$ has a different denominator to the least squares estimate. [4]
[Total 9]
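Question 11.11 requires a proof, but the result can be illustrated numerically. The sketch below uses a small hypothetical data set. It evaluates the normal log-likelihood at the least squares estimates and at nearby values, and compares the two variance estimates: the residual sum of squares divided by $n$ (MLE) and divided by $n - 2$ (least squares).

```python
import math

# Hypothetical data, used only to illustrate the result.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(xs)


def sse(alpha, beta):
    """Sum of squared errors for the line alpha + beta * x."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


def log_likelihood(alpha, beta, sigma2):
    """Log-likelihood under independent N(0, sigma2) errors."""
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse(alpha, beta) / (2 * sigma2)


# Least squares estimates of the intercept and slope.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

ss_res = sse(alpha_hat, beta_hat)
sigma2_mle = ss_res / n        # maximises the likelihood
sigma2_ls = ss_res / (n - 2)   # unbiased least squares version

best = log_likelihood(alpha_hat, beta_hat, sigma2_mle)
# Moving any one parameter away from these values lowers the likelihood.
nearby = [
    log_likelihood(alpha_hat + 0.1, beta_hat, sigma2_mle),
    log_likelihood(alpha_hat, beta_hat - 0.05, sigma2_mle),
    log_likelihood(alpha_hat, beta_hat, sigma2_ls),
]

print("log-likelihood at the estimates:", best)
print("log-likelihood at nearby points:", nearby)
print("ratio MLE / LS variance estimate:", sigma2_mle / sigma2_ls)
```

For fixed $\sigma^2$, the log-likelihood depends on $\alpha$ and $\beta$ only through the sum of squared errors, which is why maximising it gives the least squares line.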