Instructions
SMM045 Modelling Coursework Description
The coursework for SMM045 consists of 4 questions and you are required to answer all of them. Note that this is an individual assignment.
For all questions, you are expected to produce analytical solutions and typeset your answers with Microsoft Word or LaTeX. Parts of these questions may also require that you produce graph(s) to illustrate and comment on results. In such cases, you are encouraged to use Excel or any other software of your choice.
Answers to analytical questions should be presented in detail. Hence, you are advised not to skip steps and to present a clear exposition of your derivations and proofs. Any transgressions will result in a reduction of marks.
Deadline: 03 April 2020.
Submission Method: Online via Moodle.
Question 1
A quality control process in the car manufacturing industry requires that car parts are tested in a controlled environment in order to identify whether their lifetime has improved compared to older versions. It is assumed that, for a given part, the time taken until it malfunctions can be modelled with a hazard function of the form
$\mu_x = 0.00018 \times 1.06^{x}$
where x is the length of time (in hours) that the part has been tested for.
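As a minimal sketch of how survival probabilities follow from a hazard of this shape, assume the hazard is read as B × c raised to the power x (a Gompertz form). The integrated hazard then has a closed form, and the survival probability is its negative exponential. The function names and the parameter names `B`, `c` and `start` are illustrative assumptions, not part of the coursework.

```python
import math

def gompertz_survival(t, B, c, start=0.0):
    """P(part still works at time t | it works at time `start`) for mu_x = B * c**x."""
    # Integrated hazard between `start` and `t`: B * (c**t - c**start) / ln(c)
    cumulative_hazard = B * (c ** t - c ** start) / math.log(c)
    return math.exp(-cumulative_hazard)

def gompertz_quantile(p, B, c):
    """Time by which a proportion p of new parts is expected to have failed."""
    # Solve exp(-B * (c**t - 1) / ln(c)) = 1 - p for t
    return math.log(1.0 - math.log(1.0 - p) * math.log(c) / B) / math.log(c)

# Illustrative call using the constants quoted in the hazard above.
print(gompertz_survival(60.0, B=0.00018, c=1.06))
```

A conditional probability is obtained by passing a non-zero `start`, which is equivalent to dividing two unconditional survival probabilities.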
i. Determine the probability that a new part continues to function properly after 60 hours of testing. [2 marks]
ii. Calculate the probability that a part that is functioning properly after 20 hours of testing continues to function properly after 50 hours of testing. [2 marks]
iii. Calculate the length of time when 50% of new parts are expected to malfunction. [2 marks]
In reality, one of the hazards of car parts is that they deteriorate depending on the driving style or the quality of the road in the area where the car is driven.
iv. Suggest a different form of the hazard function that would allow for this real-world hazard, i.e. one function that models the whole hazard rather than just an additional function. [2 marks]
It has been suggested that it is not possible to claim the duration of these parts because it is not possible to replicate how parts perform in the real world. To investigate the latter, a sample of drivers are asked to record the length of time that the part was used before it malfunctioned.
A Cox regression model was used to fit the data as detailed below:
$h_i(t) = h_0(t) \times \exp\{-0.12(x_i - 3) + 0.3 g_i - 0.2 c_i\}$
where:
$x_i$ = average time of individual session where the car is driven
$g_i$ = 1 if the car was driven in both the city and rural areas; 0 if the car was driven only in the city
$c_i$ = 1 if the car is serviced at the designated time intervals; 0 if the car was not serviced at the designated time intervals
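The following is a minimal sketch of how a proportional hazards model of this kind compares two parts: the baseline hazard cancels, so the ratio of hazards depends only on the difference in linear predictors. The coefficients are those of the fitted model above; the function names and the two covariate profiles are hypothetical and chosen only for illustration.

```python
import math

# Coefficients as quoted in the fitted model; x is centred at 3.
BETA_X, BETA_G, BETA_C = -0.12, 0.3, -0.2
X_CENTRE = 3.0

def linear_predictor(x, g, c):
    """Exponent of the Cox model for covariates (x, g, c)."""
    return BETA_X * (x - X_CENTRE) + BETA_G * g + BETA_C * c

def hazard_ratio(profile_a, profile_b):
    """h_a(t) / h_b(t); h0(t) cancels, so the ratio is constant in t."""
    return math.exp(linear_predictor(*profile_a) - linear_predictor(*profile_b))

# Hypothetical profiles (x, g, c), not those asked about in the questions.
serviced_long_sessions = (5.0, 0, 1)
reference_part = (3.0, 0, 0)
print(hazard_ratio(serviced_long_sessions, reference_part))
```

Under proportional hazards, a ratio r between two parts implies that the survival function of the first equals that of the second raised to the power r, so a ratio below 1 corresponds to higher survival probabilities at every duration.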
v. State the baseline hazard function. [2 marks]
vi. What does the model say about the survival function of the following two parts:
– a part where the car is driven only in the city, the average driving session is 2 hours and the car is not properly maintained;
– compared to a part where the car is driven within both the city and rural areas, the average driving session lasts 4 hours and the car is not maintained. [2 marks]
The model appears to have a high level of standard error. To further improve the model it has been suggested that data should be gathered on the type of driving style.
vii. Comment on why the current model may have a high level of standard error and discuss the reasons for the suggestion to include driving style as a parameter and the problems that may occur when trying to fit this parameter. [4 marks]
viii. Suggest two other covariates that you could add to the model to try and improve the model’s predictability. For each covariate explain the reasoning for its inclusion and any problems that the covariate may cause. [4 marks]
[Total 20 marks]
Question 2
The following table provides data on preliminary results from a clinical trial that was carried out to evaluate the efficacy of a new chemotherapy designed for acute myelogenous leukemia (AML). After reaching a status of remission through treatment by chemotherapy, the patients who entered the study were assigned randomly to two groups. The first group received the new (Type II) chemotherapy, while the second group received the standard chemotherapy (Type I). The objective of the trial was to see if Type II chemotherapy prolonged the time until relapse. The length of complete remission in weeks is indicated under Type I and Type II.
Patient   Type II   Reason patient left     Type I    Reason patient left
Number    (weeks)   investigation           (weeks)   investigation
1         9         End of Investigation    5         End of Investigation
2         13        End of Investigation    5         End of Investigation
3         13        Withdrawal              8         End of Investigation
4         18        End of Investigation    8         End of Investigation
5         23        End of Investigation    12        End of Investigation
6         28        Withdrawal              16        Withdrawal
7         31        End of Investigation    23        End of Investigation
8         34        End of Investigation    27        End of Investigation
9         45        Withdrawal              30        End of Investigation
10        48        End of Investigation    33        End of Investigation
11        161       Withdrawal              43        End of Investigation
12        -         -                       45        End of Investigation

i. Describe the types of censoring that are present in the data. [3 marks]
ii. (a) Construct a Kaplan-Meier estimate of the time that a person is still considered to be in remission. [4 marks]
(b) Use Excel (or a software of your choice) to derive and demonstrate graphically the survival and hazard function. [3 marks]
(c) Carry out an appropriate test (using a software of your choice) to conclude on whether the new treatment results in a greater remission time. [4 marks]
iii. State the assumptions underlying the Kaplan-Meier estimate and whether you think they apply in this scenario. [5 marks]
iv. Calculate the probability that a patient’s status is classed as in remission after 30 weeks of treatment. [1 mark]
v. The manufacturer wants to claim that the drug will mean that after 30 weeks of treatment 40% of patients will be in remission. To be able to make this claim it must use a suitable hypothesis test at the 0.05 significance level. Carry out such a test, discuss your conclusions and what action should take place. [5 marks]
[Total 25 marks]
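The product-limit calculation behind a Kaplan-Meier estimate can be sketched as follows. This is a minimal illustration under stated assumptions: the durations and event flags are hypothetical (they are not the trial data), the function name `kaplan_meier` is invented here, observations censored at an event time are treated as still at risk at that time, and the standard error uses Greenwood's formula.

```python
import math
from collections import Counter

def kaplan_meier(times, observed):
    """Return [(t_j, S(t_j), se_j)] at each distinct event time.

    times: observed durations; observed: True/1 if the event was seen,
    False/0 if the observation was censored.
    """
    events = Counter(t for t, seen in zip(times, observed) if seen)
    survival, greenwood_sum, curve = 1.0, 0.0, []
    for t in sorted(events):
        d = events[t]                                # events at time t
        n = sum(1 for u in times if u >= t)          # number at risk just before t
        survival *= 1.0 - d / n                      # product-limit step
        if n > d:                                    # Greenwood term is undefined when n == d
            greenwood_sum += d / (n * (n - d))
        curve.append((t, survival, survival * math.sqrt(greenwood_sum)))
    return curve

# Hypothetical durations (weeks) and event indicators.
times = [2, 3, 3, 5, 7]
observed = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, observed)
for t, s, se in curve:
    print(t, round(s, 4), round(se, 4))
```

An approximate test of a claimed survival proportion at a fixed duration can then compare the estimate with the claimed value, using the Greenwood standard error and a normal approximation; with small samples this approximation is rough.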
Question 3
A country is split into five separate districts – North, South, East, West and Central. The government actuaries department of the country produces mortality tables for the whole population but in addition each of these districts has its own mortality tables to help forecast future populations.
These mortality tables are constructed by breaking the population down into four different age groups: 0-25; 26-64; 65-88; and 89+
The first three age groups are graduated using graduation by parametric formula with a different function fitted to each age range. The final group is graduated by reference to the country’s standard mortality table.
i. Explain why each district needs to calculate its own mortality rates. [2 marks]
ii. Discuss why the districts have used the above method to create their mortality tables. [4 marks]
The data collected over the last three years relating to 65 to 88 year olds for one of the districts is given below. To calculate the mortality rates the crude data is graduated using a function with four parameters.
Age   Initial Exposed   Actual   Graduated
      to Risk           Deaths   mortality rates
65    136120            1579     0.012000
66    134462            1792     0.013730
67    132598            2208     0.015832
68    130302            2278     0.017782
69    127933            2552     0.020248
70    125279            2886     0.022906
71    122277            3165     0.025780
72    118986            3490     0.029195
73    115356            3767     0.032977
74    111438            4162     0.037150
75    107110            4481     0.041738
76    102450            4790     0.046766
77    97468             5055     0.052256
78    92211             5296     0.058227
79    86703             5661     0.064701
80    80816             5811     0.071691
81    74772             5908     0.079210
82    68628             6012     0.087269
83    62375             5982     0.094871
84    56154             5886     0.103017
85    50033             5421     0.111702
86    44395             5505     0.121917
87    38670             5180     0.131647
88    33282             4837     0.143869
iii. (a) Carry out the following tests on the above data:
– Chi-squared Test
– Standardised Deviations Test
– Signs Test
– Grouping of Signs Test
– Cumulative Deviations Test over the whole range, half range and quarter ranges
For each test state the reason for carrying out the test and draw a brief conclusion. [8 marks]
(b) For each test, confirm the result using either Excel or any other software of your choice. [5 marks]
iv. You have also been asked to carry out a smoothness test on the graduated rates for this age range. As well as carrying out the test, explain why a smoothness test is not normally required and why it has been requested in this case. [3 marks]
v. Using your results from parts iii) and iv), comment on the suitability of the graduation, including actions that can be taken to improve the results, and the consequences of using these standard mortality rates to project demand for services (not including pensions) used by the elderly. [8 marks]
[Total 30 marks]
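The statistics that underlie the graduation tests named in this question can be sketched as below. This is an illustrative outline only: it assumes a binomial model, so that with initial exposed to risk E, deaths θ and graduated rate q the standardised deviation is (θ − Eq)/√(Eq(1 − q)); the function names are invented here and the four-age data set is hypothetical. Critical values and degrees of freedom (which depend on the number of fitted parameters) are deliberately left out.

```python
import math

def standardised_deviations(exposed, deaths, rates):
    """z_x = (actual - expected) / sqrt(E_x * q_x * (1 - q_x)) for each age."""
    return [(d - e * q) / math.sqrt(e * q * (1.0 - q))
            for e, d, q in zip(exposed, deaths, rates)]

def chi_squared(z):
    """Chi-squared statistic: sum of squared standardised deviations."""
    return sum(v * v for v in z)

def sign_counts(z):
    """Numbers of positive and negative deviations (signs test)."""
    return sum(1 for v in z if v > 0), sum(1 for v in z if v < 0)

def positive_groups(z):
    """Number of runs of consecutive positive deviations (grouping of signs test)."""
    groups, previous_positive = 0, False
    for v in z:
        if v > 0 and not previous_positive:
            groups += 1
        previous_positive = v > 0
    return groups

def cumulative_deviation(exposed, deaths, rates):
    """Sum of deviations divided by its standard deviation over the given ages."""
    total = sum(d - e * q for e, d, q in zip(exposed, deaths, rates))
    variance = sum(e * q * (1.0 - q) for e, q in zip(exposed, rates))
    return total / math.sqrt(variance)

def third_differences(rates):
    """Third differences of the graduated rates (smoothness check)."""
    diffs = list(rates)
    for _ in range(3):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs

# Hypothetical data for four ages, for illustration only.
exposed = [1000, 1000, 1000, 1000]
deaths = [12, 9, 15, 10]
rates = [0.010, 0.011, 0.012, 0.013]
z = standardised_deviations(exposed, deaths, rates)
print(chi_squared(z), sign_counts(z), positive_groups(z))
print(cumulative_deviation(exposed, deaths, rates), third_differences(rates))
```

For the half and quarter ranges, `cumulative_deviation` can be applied to slices of the three input lists. For smoothness, the usual criterion is that the third differences are small compared with the rates themselves and progress regularly.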
Question 4
A benefit scheme makes payments when an employee is long-term sick during their working life, pays a pension when the person retires and a lump sum on death if this occurs before retirement. An employee retires normally at age 65 though they may retire earlier if they retire through ill-health. The company sometimes offers early retirement to their staff who are over 60 though only as part of the company’s overall strategy and these offers are only for a short period of time (usually the opportunity only lasts for three months). The company is involved in heavy industry and both the mortality rates and the sickness rates for members of the scheme are higher than the average rates in the country.
The scheme is open to employees over the age of 25. The scheme is to be modelled using the following five-state model where the arrows indicate the possible transitions:
[Figure: five-state transition diagram. States: 1: Active; 2: Sick; 3: Retired; 4: Withdrawal; 5: Dead.]
You are given the following definitions with respect to this model:
– $\mu^{ij}_{x+t}$ is the force of transition from State i to State j at exact age (x + t), where $i \neq j$;
– ${}_{t}p^{ij}_{x}$ is the probability that a life in State i at exact age x is in State j at exact age (x + t),
where i = 1, 2; j = 1, 2, 3, 4, 5; $i \neq j$ and 1 > t ≥ 0.
i. Derive a solution for ${}_{t}p^{11}_{x}$ in terms of the transitions above (where t < 1) stating your assumptions. [6 marks]
ii. Derive a differential equation for ${}_{t}p^{22}_{x}$ in terms of the transitions above (where t < 1) and state the boundary condition. [3 marks]
iii. A student has queried whether there is an error in the model as there is no transition possible from State 2 to State 4. Explain why this is not an error. [1 mark]
iv. Comment on the suitability of the Markov assumption for this model. [3 marks]
v. It has been suggested that the calculations would be easier if forces of transition were assumed to be constant for each year of age x. Comment on the suitability of this suggestion. [5 marks]
It is decided that constant forces of transition can be assumed. Data relating to 50-year-olds in the scheme were collected over a three-year period and are given below:
– 39 transitions from state 1 to state 2
– 0 transitions from state 1 to state 3
– 10 transitions from state 1 to state 4
– 6 transitions from state 1 to state 5
– 28 transitions from state 2 to state 1
– 6 transitions from state 2 to state 3
– 3 transitions from state 2 to state 5
The time spent in the two states was 610 years for state 1, and 93 years for state 2.
vi. State the likelihood function for these data. [3 marks]
vii. Derive the maximum likelihood estimate for $\mu^{15}_{50}$. [2 marks]
viii. Calculate an estimate of the standard error of $\hat{\mu}^{15}_{50}$, where $\hat{\mu}^{15}_{50}$ is the maximum likelihood estimator of $\mu^{15}_{50}$. [2 marks]
[Total 25 marks]
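For a multiple-state model with constant forces of transition, the log-likelihood contribution of a single intensity μ with d observed transitions and total waiting time v in the originating state is −μv + d ln μ, which is maximised at d/v. A minimal sketch of that result follows; the function name and the counts are hypothetical, and the standard error uses the usual asymptotic approximation in which the variance μ/E[V] is estimated by d/v².

```python
import math

def intensity_mle(transitions, waiting_time):
    """MLE of a constant transition intensity and its approximate standard error.

    transitions: number of observed i -> j transitions (d)
    waiting_time: total time spent in state i, in years (v)
    """
    mu_hat = transitions / waiting_time                   # maximises -mu*v + d*ln(mu)
    standard_error = math.sqrt(transitions) / waiting_time  # sqrt(mu_hat / v)
    return mu_hat, standard_error

# Hypothetical counts, not the scheme data.
mu_hat, se = intensity_mle(transitions=4, waiting_time=200.0)
print(mu_hat, se)
```

The same function applies to each transition in turn, using the waiting time of the state the transition leaves.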