GLBH0031: Modelling for Decision Science
Coursework April 2020
An infectious disease caused by a Maths-2020 virus is assumed to be spreading across a population of 1,000,000 individuals. The transmission is assumed to be via person-to-person contact, with 10 infected individuals initially introduced into the wholly susceptible population.
We assume that once an individual is in contact with the virus, they become infectious after one week. In the absence of treatment, they remain infectious for a period of two weeks, after which they recover and are immune to re-infection for two months.
Using the above information, this coursework asks you to develop a mathematical model to describe the transmission dynamics of this infectious disease, parametrise the model, calibrate it against the data and discuss the results of the model. In your model, use a timestep of four weeks (i.e. a month) and assume that the model is run over a period of one year.
Q1: Design a mathematical model and apply it to modelling infectious disease spread
[45 marks]
1.1) Define the compartments of the model based on the narrative, design the model flow chart and assign appropriate parameters to the arrows linking different compartments.
[2 marks]
1.2) Using part 1.1 and based on the law of mass action, write the system of equations that describes the change in time of each model compartment.
[3 marks]
1.3) Using the information for the disease given in the narrative above, assign appropriate values to the model parameters you defined in 1.1.
[3 marks]
1.4) Based on the numerical codes we developed during the practical sessions, numerically code the system of equations from 1.2 in MATLAB using the ode45 command. Include your code in your write-up.
[10 marks]
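The ode45 call pattern can be sketched as follows. This is a minimal illustration on a simpler SIR model, not the model the coursework asks for. The values of beta and gamma, the variable names and the 13-step time span (taking one year as 13 four-week steps) are assumptions made for the example only.

```matlab
% Illustrative SIR model (NOT the coursework model); parameter values are hypothetical.
N     = 1e6;                 % total population, as in the narrative
beta  = 3;                   % transmission rate per four-week step (assumed)
gamma = 2;                   % recovery rate per four-week step (assumed)
y0    = [N - 10; 10; 0];     % initial state [S; I; R]
tspan = [0 13];              % 13 four-week steps, roughly one year (assumed)

% Right-hand side: mass-action transmission scaled by the population size.
sir = @(t, y) [ -beta*y(1)*y(2)/N;
                beta*y(1)*y(2)/N - gamma*y(2);
                gamma*y(2) ];

% ode45 returns the time points t and one column of y per compartment.
[t, y] = ode45(sir, tspan, y0);

plot(t, y(:,1), t, y(:,2), t, y(:,3));
legend('S', 'I', 'R');
xlabel('Time (four-week steps)');
ylabel('Number of individuals');
```

The same structure carries over to a model with more compartments: extend the state vector and add one row per equation to the right-hand side function.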
1.5) Without solving the system of equations, what are the possible outcomes of the mathematical model with respect to the spread of the virus in the population?
[2 marks]
1.6) Explain what the long-time steady states of the system of equations are, and describe how linear stability analysis can be used to determine these. Determine the steady states of the system of equations from 1.2 and discuss the stability of the long-term solutions. Discuss how the long-term solutions are related to whether or not the virus causes a pandemic, and align this with your answer to 1.5.
[5 marks]
1.7) Using your stability analysis results from 1.6 and the numerical code from 1.4, determine two possible temporal profiles for the cohorts of susceptible and infectious individuals, describing two different scenarios. Discuss the differences between the transient dynamics and the long-time solutions in the two scenarios.
[10 marks]
1.8) The epidemic curve of the system describes the number of infections caused by the virus over time. Epidemic curves change depending on how fast the virus is spreading within the population.
Using your numerical code from 1.4, and by varying the model parameters, derive two epidemic curves similar to those in the figure below, describing which model parameter(s) you changed and why.
The epidemic curve shown as a dashed line in the figure below is a "flatter" version of the epidemic curve shown as a solid line. Using this information, and the process by which you derived the two epidemic curves, discuss why "flattening the epidemic curve" may be a useful method to prevent the virus spreading.
[10 marks]
Q2: Scenario analysis [20 marks]
For the case of the infection with Maths-2020, we now assume that half of the newly infected individuals get quarantined immediately, with no contact with the rest of the population. Furthermore, we also assume that 10% of infected untreated people die every month.
2.1) Adapt the schematic of the mathematical model from Q1 part 1.1 to incorporate quarantine and disease-related deaths and clearly define any new parameters.
[2 marks]
2.2) Modify the equations in Q1, part 1.2 to account for the presence of quarantine and disease-related deaths. Solve the system of equations numerically and discuss how the long-term solutions differ from those in part 1.5.
[3 marks]
2.3) Let N denote the total population of the model from part 1.2. By adding all the derivatives in part 1.2 and integrating the resulting equation for N, show that the SEIR model from part 1.2 is a constant-population model.
[5 marks]
2.4) Now adapt the model from part 1.5 to include only natural deaths. Let M denote the total population in this case. By again adding all the derivatives and integrating the resulting equation for M, show that this SEIR model describes a population that decays exponentially.
[5 marks]
2.5) Without solving the equations of the model from 2.2, and by using the results from 2.3 and 2.4, discuss what you expect the total population of the model in 2.2 to look like.
[5 marks]
Q3: Calibration of the mathematical model [15 marks]
The data in the table below describes the monthly number of infected cases for this infectious disease over a period of a year in country X:
Month        Number of infected cases
January      10
February     10
March        15
April        25
May          27
June         30
July         35
August       40
September    60
October      65
November     70
December     70
Table 1: Monthly number of infected cases
3.1) Using Table 1, discuss the descriptive statistics of the number of infected cases, reporting the maximum, minimum, mean and range of infected cases. What type of disease may this represent?
[2 marks]
3.2) Plot the data from Table 1 and fit an exponential model, a polynomial of degree 2 and a polynomial of degree 3 to the data. Using R² as a measure of goodness of fit, discuss which of these models fits best. (Within your results, include a screenshot of your results from the Curve Fitting Tool, as well as snapshots of the model equations in each case and the reported R².)
[3 marks]
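As a command-line cross-check on the Curve Fitting Tool, the three fits can be sketched with base MATLAB functions. This is a hedged illustration rather than the required workflow: the variable names and the rsq helper are hypothetical, and the exponential model is fitted on the log scale with polyfit, which is an assumption and can give different coefficients and R² from a nonlinear fit.

```matlab
% Monthly infected cases from Table 1 (January = 1, ..., December = 12).
month = (1:12)';
cases = [10; 10; 15; 25; 27; 30; 35; 40; 60; 65; 70; 70];

% R^2 = 1 - SS_res / SS_tot for observed values c and fitted values f.
rsq = @(c, f) 1 - sum((c - f).^2) / sum((c - mean(c)).^2);

% Least-squares polynomial fits of degree 2 and degree 3.
p2 = polyfit(month, cases, 2);
p3 = polyfit(month, cases, 3);
r2Poly2 = rsq(cases, polyval(p2, month));
r2Poly3 = rsq(cases, polyval(p3, month));

% Exponential model a*exp(b*t), fitted as a straight line to log(cases).
pe     = polyfit(month, log(cases), 1);
expFit = exp(polyval(pe, month));
r2Exp  = rsq(cases, expFit);

fprintf('R^2: degree 2 = %.4f, degree 3 = %.4f, exponential = %.4f\n', ...
    r2Poly2, r2Poly3, r2Exp);

plot(month, cases, 'ko', month, polyval(p2, month), ...
     month, polyval(p3, month), month, expFit);
legend('Data', 'Degree 2', 'Degree 3', 'Exponential', 'Location', 'northwest');
xlabel('Month');
ylabel('Number of infected cases');
```

Because the degree 2 polynomial is a special case of the degree 3 polynomial, the degree 3 fit cannot have a lower R² on the same data, so R² alone tends to favour the more flexible model.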
3.3) Calibrate the model defined in part 1.2 to these data, showing the fitting plot and the R-squared value. What is the new value of beta? Discuss your results and make potential predictions for the future transmission of this disease.
(For this question, include a screenshot of your results using the Curve Fitting Tool showing the Results box, Fit Options box and the plot.)
[10 marks]
Q4: Data analysis and machine learning [20 marks]
The virology of the Maths-2020 virus is unclear and the risk of infection for the population is not fully understood. Three possible factors have been suggested as important in determining whether individuals infected with Maths-2020 will be hospitalised with the virus, and whether they will die as a consequence of it. These are: factor 1 = age, factor 2 = number of contacts with infected individuals, and factor 3 = underlying health conditions.
4.1) Discuss how a machine learning algorithm may be used to determine which of factors 1-3 may be the most significant determinant of people being hospitalised with the Maths-2020 infection and of people dying from Maths-2020.
Would your machine learning algorithm be a regression or a classification algorithm, and why?
[5 marks]
4.2) Consider the dataset below. It contains the values of factors 1-3 for 50 individuals. Obtain descriptive statistics of the three factors for hospitalised and non-hospitalised individuals, and for those who died and those who did not. Based on this, discuss which variable(s) you expect to be driving hospitalisations and deaths. Note: due to the small sample size and non-normality of some variables, only report median values and minimum-maximum intervals.
[5 marks]
4.3) Design two random forest algorithms: one describing the hospitalisation of people with the infection and another describing people dying from the virus.
Plot the first trained decision tree of each random forest and the out-of-bag (OOB) classification error with respect to the number of trees. Do the tree plots reinforce your conclusions from exercise 4.2? Discuss why or why not. Do the errors converge? If yes, discuss approximately how many decision trees are needed in each forest to reach convergence. Please include the MATLAB code in your coursework.
[10 marks]
Dataset for Q4:
Age = [62.2305; 78.4088; 65.1736; 58.4743; 71.9337; 65.5837; 66.7383; 68.5120; 68.5550; 72.9297; 66.7681; 69.2763; 65.7561; 66.5899; 71.0894; 45.8784; 63.9656; 48.1489; 57.0661; 69.5567; 54.7143; 65.2957; 50.4288; 67.9574; 72.4937; 65.6961; 68.8917; 67.3778; 70.4884; 69.5894; 70.4203; 58.7361; 63.2689; 67.5224; 62.7476; 60.8367; 49.0112; 56.4758; 63.7243; 69.2878; 58.0572; 57.2322; 56.5681; 67.0423; 79.5763; 76.9493; 55.0285; 55.4782; 78.1735; 49.9563];
Number_of_Contacts = [20;3;2;7;11;12;6;9;8;2;16;11;9;9;7;7;8;6;7;7;6;9;12;4;8;12;11;8;9;0;0;9;20;5;8;6;6;9;10;12;4 ;10;6;9;5;3;7;8;4;5];
Underlaying_Health_Conditions= [5;7;3;5;5;2;0;2;1;7;0;4;0;0;2;1;1;1;0;2;4;6;3;3;3;3;4;7;3;7;2;2;0;1;1;1;1;0;1;3;0;1;2;0;3;2;3;2; 4;0];
Hospitalisation = [1;1;1;1;1;0;0;1;0;1;0;1;0;0;1;0;0;0;0;1;0;1;0;1;1;0;1;1;1;1;1;0;0;0;0;0;0;0;0;1;0;0;0;0;1;1;0;0; 1;0];
Death = [1;0;0;1;1;NaN;NaN;1;NaN;0;NaN;1;NaN;NaN;1;NaN;NaN;NaN;NaN;1;NaN;1;NaN;0;1;NaN;1 ;1;1;0;0;NaN;NaN;NaN;NaN;NaN;NaN;NaN;NaN;1;NaN;NaN;NaN;NaN;1;0;NaN;NaN;0;NaN];
Note: Please copy and paste the data above into your MATLAB command window. For reproducibility, use a seed value of 0 at the beginning of this exercise (i.e., rng(0)).
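One possible starting point for the hospitalisation forest is sketched below. It assumes the dataset vectors above are already in the workspace and that the Statistics and Machine Learning Toolbox (TreeBagger) is available. The forest size nTrees and the variable names X and forest are hypothetical choices, not values prescribed by the coursework.

```matlab
% Assumes Age, Number_of_Contacts, Underlaying_Health_Conditions and
% Hospitalisation from the dataset above are already in the workspace.
rng(0);                                    % seed of 0 for reproducibility

X      = [Age, Number_of_Contacts, Underlaying_Health_Conditions];
nTrees = 100;                              % hypothetical forest size

% Bagged classification trees with out-of-bag predictions stored.
forest = TreeBagger(nTrees, X, Hospitalisation, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on');

% Graphical view of the first trained tree in the forest.
view(forest.Trees{1}, 'Mode', 'graph');

% Cumulative out-of-bag classification error against the number of trees.
oobErr = oobError(forest);
figure;
plot(oobErr);
xlabel('Number of grown trees');
ylabel('Out-of-bag classification error');
```

The Death vector contains NaN for some individuals, so a forest for deaths would need to be trained on the rows with recorded outcomes only, for example by selecting them with ~isnan(Death) before calling TreeBagger.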