ACTL2131/5101 Assignment
Submission deadline:
Friday, 26 May, 11am sharp via Turnitin
This assignment deals with analysing data and consists of practical questions. There are two main
tasks in this assignment. Both combine knowledge and problem solving, and both will also be used to
assess your communication skills.
Background
You are an actuarial analyst working for an insurance provider. Your manager has asked you to
download, prepare and analyse a data set of observations on unemployment insurance (UI) obtained
from the United States Department of Labor, available at
http://workforcesecurity.doleta.gov/unemploy/claimssum.asp
Data should be downloaded at a monthly frequency for a particular state, and the sample should cover
January 1971 (or the earliest available date) to December 2016. The data should consist of the
following 7 variables for that state:
• Initial claims
• First payments
• Weeks claimed
• Weeks compensated
• Average weekly benefit
• Benefits paid
• Final payments
There should be no missing data.
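The preparation steps above can be sketched in R as follows. This is a minimal illustration only: the data are simulated, and the column names (`initial_claims`, `weeks_claimed` and so on) are hypothetical placeholders rather than the names used in the Department of Labor download, so you will need to map them to the columns in your own file.

```r
# Minimal data-preparation sketch on SIMULATED data.
# Column names are hypothetical placeholders, not the official file's names.
set.seed(1)
dates <- seq(as.Date("1971-01-01"), as.Date("2016-12-01"), by = "month")
n <- length(dates)

ui <- data.frame(
  date               = dates,
  initial_claims     = rpois(n, 20000),
  first_payments     = rpois(n, 15000),
  weeks_claimed      = rpois(n, 200000),
  weeks_compensated  = rpois(n, 180000),
  avg_weekly_benefit = runif(n, 50, 400),
  benefits_paid      = runif(n, 1e6, 5e7),
  final_payments     = rpois(n, 3000)
)

# Keep only the required window (January 1971 to December 2016), in date order.
ui <- subset(ui, date >= as.Date("1971-01-01") & date <= as.Date("2016-12-31"))
ui <- ui[order(ui$date), ]

# The brief requires a complete data set: count gaps per column, then stop if any.
print(colSums(is.na(ui)))
stopifnot(!anyNA(ui))

# Log-transformed series used in Tasks 1 and 2.
ui$log_weeks_claimed     <- log(ui$weeks_claimed)
ui$log_weeks_compensated <- log(ui$weeks_compensated)
ui$log_benefits_paid     <- log(ui$benefits_paid)
```

With a full monthly sample from January 1971 to December 2016 you should end up with 552 rows.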
Note: you have been assigned one of the 53 states listed in the document StateStudent.pdf, uploaded to Moodle.
Note: for all the tasks below, use a 5% significance level wherever a test requires one.
Task 1
Your first task is to analyse the log-variables log(Weeks compensated) and log(Benefits paid). Many
practitioners assume that these variables follow a normal distribution. However, your manager doubts
this and suggests that an alternative distribution with heavier tails than the normal (e.g. log-normal
or Student-t) might be more appropriate.
1. Carefully investigate whether the manager's conjecture, that both series follow a log-normal or
Student-t distribution rather than a normal distribution, is correct. Use appropriate tests and
graphical illustrations to support your conclusions.
2. In addition, the manager is interested in whether the numbers of weeks claimed and compensated
are equal on average. Assist the manager by comparing the means of log(Weeks claimed) and
log(Weeks compensated). Comment on the choice of test and the results.
3. Finally, the manager is interested in the magnitude of Benefits paid and has asked you to test
the hypothesis that the mean of log(Benefits paid) is greater than the 60% quantile of this
variable. Comment on the choice of test and the results.
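One possible way to organise the three Task 1 analyses in R is sketched below, assuming simulated stand-in series (all variable names and numbers are hypothetical). It pairs a histogram and normal QQ plot with Shapiro-Wilk tests, checks log-normality by testing whether the logarithm of the series is normal, and compares normal and location-scale Student-t maximum-likelihood fits by AIC. It then uses a paired t-test (with a Wilcoxon signed-rank check) because both series are observed in the same months, and a one-sided one-sample t-test against the 60% sample quantile. These are not the only valid choices; note, for example, that the last t-test treats the sample quantile as a fixed constant.

```r
# Sketch of the Task 1 tests on SIMULATED stand-ins for the log series
# (hypothetical numbers); replace them with your own state's data.
set.seed(2)
n <- 552                                   # Jan 1971 to Dec 2016, monthly
log_weeks_claimed     <- rnorm(n, mean = 12, sd = 0.4)
log_weeks_compensated <- log_weeks_claimed - abs(rnorm(n, 0.05, 0.02))
log_benefits_paid     <- 5 + log_weeks_compensated + 0.1 * rt(n, df = 5)
alpha <- 0.05                              # significance level set in the brief

## 1.1 Normal versus heavier-tailed alternatives (shown for one series)
x <- log_benefits_paid
hist(x, breaks = 30, main = "log(Benefits paid)")
qqnorm(x); qqline(x)                       # heavy tails bend away from the line
sw_normal    <- shapiro.test(x)            # H0: x is normal
sw_lognormal <- shapiro.test(log(x))       # H0: x is log-normal (requires x > 0)

# Maximum-likelihood fits compared by AIC (lower is better).
norm_sd  <- sqrt(mean((x - mean(x))^2))    # MLE of the sd uses divisor n
aic_norm <- -2 * sum(dnorm(x, mean(x), norm_sd, log = TRUE)) + 2 * 2
nll_t <- function(par, x) {                # location-scale Student-t
  m <- par[1]; s <- exp(par[2]); nu <- exp(par[3])  # log scale keeps s, nu > 0
  -sum(dt((x - m) / s, df = nu, log = TRUE) - log(s))
}
fit_t <- optim(c(median(x), log(sd(x)), log(10)), nll_t, x = x)
aic_t <- 2 * fit_t$value + 2 * 3

## 1.2 Equal means? The two series are paired month by month.
paired    <- t.test(log_weeks_claimed, log_weeks_compensated, paired = TRUE)
paired_np <- wilcox.test(log_weeks_claimed, log_weeks_compensated, paired = TRUE)

## 1.3 H1: mean of log(Benefits paid) exceeds its 60% sample quantile
q60 <- unname(quantile(log_benefits_paid, 0.60))
greater <- t.test(log_benefits_paid, mu = q60, alternative = "greater")
print(greater$p.value < alpha)             # TRUE means reject H0 at the 5% level
```

Repeat the 1.1 block for log(Weeks compensated) so that both series are covered.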
Task 2
It is reasonable to expect the number of Weeks compensated to be correlated to some degree with the
number of Weeks claimed. In this task, you will analyse the relationship between these two variables.
1. Perform a detailed statistical analysis of whether a simple linear regression model is reasonable
for describing the relationship between the number of Weeks compensated and the number of Weeks
claimed over the period from January 1971 (or the earliest date for which your data are
available) to December 2005. Estimate this model and comment on your findings. Your discussion
should include, but not be limited to, the significance of the variables, model fit, residual
statistics and any other findings you find interesting.
2. At a recent conference, the manager heard that the so-called out-of-sample procedure can be used
to validate the predictive power of a model, and wants you to use it here. In this procedure, the
model is estimated in-sample (i.e. the model fitted in point 1, using data from January 1971 to
December 2005), and the remaining sub-sample (January 2006 to December 2016) is treated as an
out-of-sample period used to validate the quality of the model's predictions. Help the manager
validate the predictive power of the proposed simple linear regression model using the
out-of-sample procedure.
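A minimal sketch of both steps of Task 2 follows, again on simulated data with hypothetical names. The model is fitted with `lm()` on January 1971 to December 2005 only, and `predict()` then produces forecasts for January 2006 to December 2016. The accuracy measures shown (RMSE, MAE, and an out-of-sample R-squared against a naive forecast equal to the in-sample mean) are one reasonable choice, not a prescribed one.

```r
# Sketch of Task 2 on SIMULATED monthly data (hypothetical numbers and names).
set.seed(3)
dates <- seq(as.Date("1971-01-01"), as.Date("2016-12-01"), by = "month")
n <- length(dates)
weeks_claimed     <- round(rlnorm(n, meanlog = 12, sdlog = 0.3))
weeks_compensated <- round(0.9 * weeks_claimed + rnorm(n, 0, 5000))
ui <- data.frame(date = dates, weeks_claimed, weeks_compensated)

## 2.1 In-sample estimation: January 1971 to December 2005
in_sample <- subset(ui, date <= as.Date("2005-12-31"))
fit <- lm(weeks_compensated ~ weeks_claimed, data = in_sample)
print(summary(fit))                        # coefficient t-tests, R-squared, F-test
op <- par(mfrow = c(2, 2)); plot(fit); par(op)  # residual diagnostic plots
print(shapiro.test(residuals(fit)))        # normality of the residuals
acf(residuals(fit))                        # monthly data: residual autocorrelation

## 2.2 Out-of-sample validation: January 2006 to December 2016
out_sample <- subset(ui, date >= as.Date("2006-01-01"))
pred <- predict(fit, newdata = out_sample) # uses in-sample coefficients only
err  <- out_sample$weeks_compensated - pred
bench_err <- out_sample$weeks_compensated - mean(in_sample$weeks_compensated)

rmse       <- sqrt(mean(err^2))
mae        <- mean(abs(err))
rmse_naive <- sqrt(mean(bench_err^2))
oos_r2     <- 1 - sum(err^2) / sum(bench_err^2)  # > 0: beats the naive forecast
print(c(rmse = rmse, mae = mae, rmse_naive = rmse_naive, oos_r2 = oos_r2))
```

Plotting `pred` against the observed out-of-sample values over time is a useful complement to these summary numbers.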
Format
• This is an individual assignment. You MUST perform your analysis in R.
• The assignment should be typed, with the main tables, charts and results presented throughout
to highlight your responses to the assignment questions.
• Marks will be awarded for neatness, conciseness and clarity of answers; refer to the assessment
rubric.
• Maximum number of pages: 4 (excluding the title page and references). The first 3 pages must
be a self-contained report, and the fourth page can be used as a technical appendix describing
the detailed methodology used. Be as concise as you can, while clearly addressing each question.
• If the length exceeds 4 pages, the pages beyond page 4 will not be marked.
• Format of each page: Please use at least font size 11; single line spacing; and at least 1.27cm
page margins from the left, right, top and bottom.
• You must submit your R code; the assignment will not be marked without it. We will refer to
your R code if necessary.
• The assignment should be submitted via Turnitin on Moodle.
Plagiarism awareness
Students are reminded that the work they submit must be their own. While we have no problem with
students working together on the assignment problems, the material students submit for assessment
must be their own. This means that:
Students should make sure they understand what plagiarism is; cases of plagiarism have a very high
probability of being discovered. For collective work, having different people mark the
assignment does not decrease this probability.
Students should consult the Turnitin section of the website accessible to all ACTL students
well in advance, as it gives a non-exhaustive list of things that could go wrong and
explains how the policies above are implemented.