Question 1
A number, 𝑛, of contestants are registered to take part in an archery contest. The distance between the centre of the target and the point that the 𝑖th archer’s arrow hits is given by the random variable 𝑋𝑖, for 𝑖 = 1, …, 𝑛. The random variables 𝑋1, …, 𝑋𝑛 are independent and identically distributed, each following an exponential distribution with a mean of 10 cm. Each archer has one shot and the archer whose arrow hits closest to the centre of the target wins the contest.
a) Determine the probability density function of the winner’s distance from the centre of the target, that is, the density of the random variable
𝑌 = 𝑚𝑖𝑛{𝑋1,…,𝑋𝑛}.
(3 marks)
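A derived density for part a) can be checked by simulation. The sketch below is not part of the question: the function name, the choice of 𝑛 = 10, the number of simulations and the seed are all illustrative assumptions. It draws the winner's distance repeatedly and reports its empirical mean, which can then be compared with the mean implied by the density you derive.

```python
import random


def simulate_min_exponential(n_archers, mean_cm=10.0, n_sims=100_000, seed=0):
    """Simulate Y = min(X_1, ..., X_n) for i.i.d. exponential distances."""
    rng = random.Random(seed)
    rate = 1.0 / mean_cm  # random.expovariate expects the rate, i.e. 1 / mean
    return [
        min(rng.expovariate(rate) for _ in range(n_archers))
        for _ in range(n_sims)
    ]


# Illustrative choice of n = 10 archers (an assumption, not fixed in part a).
samples = simulate_min_exponential(n_archers=10)
empirical_mean = sum(samples) / len(samples)
print(f"empirical mean of Y: {empirical_mean:.3f} cm")
```

Repeating the run for several values of `n_archers` shows how the winner's expected distance shrinks as more archers take part.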
b) Because of an outbreak of food poisoning in the hotel where the contestants are
staying, it is possible that not all registered archers can participate in the contest. The number of archers taking part is given by the discrete random variable 𝑁, such that
𝑃(𝑁=10)=0.8, 𝑃(𝑁=9)=0.1, 𝑃(𝑁=8)=0.1
and 𝑁 is independent of 𝑋1, 𝑋2, … Hence, the winner’s distance from the centre of the target is given by the random variable 𝑍 = 𝑚𝑖𝑛{𝑋1, … , 𝑋𝑁}. Using the properties of conditional expectation, calculate 𝐸(𝑍).
(4 marks)
Total: 7 marks
Question 2
Consider the random variables 𝑋1 and 𝑋2, with means 𝐸(𝑋1) = 𝐸(𝑋2) = 0 and variances 𝑉𝑎𝑟(𝑋1) = 𝑉𝑎𝑟(𝑋2) = 1. The random variables follow the bivariate normal distribution, which means that their joint probability density function is given by
$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{x_1^2 - 2 x_1 x_2 \rho + x_2^2}{2(1-\rho^2)} \right\}, \quad x_1, x_2 \in \mathbb{R},$$
where 𝜌 ∈ (−1,1). You can take as given that each of 𝑋1, 𝑋2 follows a standard normal distribution and that their correlation coefficient is 𝜌.
a) Show that if 𝜌 = 0, the random variables 𝑋1, 𝑋2 are statistically independent.
(1 mark)
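b) Show that the conditional density $f_{X_1 \mid X_2}$ takes the following form:
$$f_{X_1 \mid X_2}(x_1 \mid x_2) = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2} \left( \frac{x_1 - x_2 \rho}{\sqrt{1-\rho^2}} \right)^2 \right\}$$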
(1 mark)
c) For 𝜌 = 0.99, state the value of 𝑉𝑎𝑟(𝑋2|𝑋1 = 𝑥) for some 𝑥 and interpret it.
(2 marks)
d) Define the random variable 𝑋3 = (𝑋1)². Show that 𝐶𝑜𝑣(𝑋1, 𝑋3) = 0, carefully justifying all steps, and interpret your finding.
(4 marks)
Total: 8 marks
Question 3
Four university lecturers (A, B, C, and D) teach four modules each within a given academic year. The sample mean and variance of each lecturer’s module evaluation score, calculated across each lecturer’s modules, are given in the table below.

                    | Lecturer A | Lecturer B | Lecturer C | Lecturer D
Number of modules   | 4          | 4          | 4          | 4
Average score       | 2.60       | 3.13       | 3.56       | 3.92
Variance of scores  | 0.2196     | 0.3751     | 0.1851     | 0.2416

a) Perform a one-way Analysis of Variance for the above data, stating clearly the hypotheses tested and reporting your test result at the 5% significance level. (You may assume that all assumptions of the one-way ANOVA model are satisfied. You are given the following critical values of the F distribution, one of which will be needed to answer this question: $F_{3,12,0.025} = 4.474$, $F_{12,3,0.05} = 8.745$, $F_{3,12,0.05} = 3.490$, $F_{4,12,0.05} = 3.259$.)
(4 marks)
b) After being called in by his Head of Department to discuss his low feedback scores, Lecturer A claims that the reason his scores are comparatively low is that his class sizes were large. The following scatter-plot shows all lecturers’ scores plotted against the sizes of the four classes they each taught, together with the line of best fit, obtained via a simple linear regression model of the form
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2),$$
where $Y_i$ are the evaluation scores for individual modules and $x_i$ are the corresponding class sizes.
i. The estimate of the variance 𝜎² in the simple linear regression model is 𝑠² = 0.2744. Calculate the values of 𝑅² and of the correlation coefficient of the evaluation scores with the class size.
(4 marks)
Figure 1
ii. From the plot, estimate the value of the slope, 𝑏1. You are given that the standard error of 𝑏1 is 𝑠𝐵1 = 0.0039. Calculate a 95% confidence interval for the slope and state your conclusion. (You may assume that the relevant critical value of the t distribution is approximately 2.)
(3 marks)
c) The Head of Department is not convinced that class size explains poor evaluation scores. She states that it may just be a coincidence that the worst-performing lecturers teach larger classes. Explain what further analysis could be carried out to explore the issue.
(2 marks)
Total: 13 marks
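For part a) of Question 3, the F statistic can be computed directly from the summary statistics in the table, without the raw scores. The sketch below is one way to organise that arithmetic, not a model answer. It assumes that the reported variances are the usual sample variances with divisor 𝑛 − 1. The variable names are illustrative.

```python
# Summary statistics from the table, one entry per lecturer (A, B, C, D).
group_sizes = [4, 4, 4, 4]
group_means = [2.60, 3.13, 3.56, 3.92]
# Assumption: these are sample variances computed with the n - 1 divisor.
group_vars = [0.2196, 0.3751, 0.1851, 0.2416]

n_total = sum(group_sizes)
k = len(group_sizes)
grand_mean = sum(n * m for n, m in zip(group_sizes, group_means)) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(group_sizes, group_means))
ss_within = sum((n - 1) * v for n, v in zip(group_sizes, group_vars))

# Mean squares and the F statistic on (k - 1, n_total - k) degrees of freedom.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

f_critical = 3.490  # F_{3,12,0.05}, taken from the values given in the question
print(f"F = {f_stat:.3f} on ({k - 1}, {n_total - k}) degrees of freedom")
print("reject H0 at the 5% level" if f_stat > f_critical else "do not reject H0 at the 5% level")
```

Working from group means and variances keeps the calculation close to the hand computation expected in the question: the within-group sum of squares is recovered as (𝑛 − 1) times each group variance.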
Question 4
Let the random variable 𝑋 represent the effort that a randomly chosen actuarial science student puts towards studying for a statistics module (on a scale from 0 to 5) and the random variable 𝑌 represent that student’s final exam mark. Assume that the conditional expectation of 𝑌 given 𝑋 = 𝑥 is given by the following formula:
$$E(Y \mid X = x) = 20 + 10x + 20 \cdot \tanh(x - 2),$$
where
$$\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}$$
is the hyperbolic tangent function. The graph of the function 𝑔(𝑥) = 𝐸(𝑌|𝑋 = 𝑥) is represented by the solid line in Figure 2 below.
a) Let 𝑋 be normally distributed, with mean equal to 2 and standard deviation equal to 0.5. Provide a simulation algorithm for calculating numerically the unconditional expectation 𝐸(𝑌), starting from 𝑛 standard normal observations 𝑧1, … , 𝑧𝑛.
(5 marks)
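The algorithm asked for in part a) can be illustrated in code. The sketch below shows one possible approach, assuming the tower property 𝐸(𝑌) = 𝐸(𝑔(𝑋)): transform each standard normal observation to 𝑥 = 2 + 0.5𝑧, evaluate 𝑔 at each 𝑥, and average. The function names, the sample size and the seed are illustrative assumptions, and the 0 to 5 bound on effort is not enforced.

```python
import math
import random


def g(x):
    """Conditional expectation E(Y | X = x) from the question."""
    return 20 + 10 * x + 20 * math.tanh(x - 2)


def estimate_expected_mark(z_values, mean=2.0, sd=0.5):
    """Monte Carlo estimate of E(Y) from standard normal observations."""
    # Step 1: turn standard normal draws into draws of X ~ N(mean, sd^2).
    xs = [mean + sd * z for z in z_values]
    # Step 2: average g(X), using E(Y) = E(E(Y | X)) = E(g(X)).
    return sum(g(x) for x in xs) / len(xs)


# Illustrative input: n = 100,000 standard normal observations.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
estimate = estimate_expected_mark(z)
print(f"Monte Carlo estimate of E(Y): {estimate:.2f}")
```

The estimate's accuracy improves with the number of observations, with a standard error that shrinks in proportion to 1/√𝑛.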
b) Calculate 𝐸(𝑌|𝑋 = 0) and interpret the result.
(1 mark)
c) An education researcher, who is not aware of the formula for 𝐸(𝑌|𝑋 = 𝑥) given above, tries to understand the relationship between students’ effort and their final exam mark. The researcher manages to collect data on (𝑋, 𝑌) for 20 students. The data are shown in Figure 2 as points. The researcher fits two regression models to the data, with prediction equations:
Model 1: 𝑦̂ = −23.88 + 32.56 ⋅ 𝑥
Model 2: 𝑦̂ = 19.83 ⋅ 𝑥
Model 1 is a standard linear regression model. Model 2 is fitted by fixing the intercept to zero, that is, 19.83 is the value of 𝑏 that minimises
$$\sum_{j=1}^{20} (y_j - b \cdot x_j)^2.$$
Draw the regression lines corresponding to the two models on (a print-out of) Figure 2 and include this in your coursework submission.
(3 marks)
d) Comment on the rationale behind Model 2. Explain whether Model 1 or Model 2 is preferable.
(3 marks)
Total: 12 marks
Figure 2