RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
REGRESSION MODELLING (STAT2008/STAT4038/STAT6014/STAT6038)
Assignment 1 for Semester 1, 2019
INSTRUCTIONS:
• This assignment is worth 15% of your overall marks for this course.
• Please submit your assignment on Wattle. When uploading to Wattle you must submit the following, combined into a single document:
1. Your assignment/report in a PDF document.
2. An ‘.R’ file containing the R code you have used for the assignment. Failure to upload the R code will result in a penalty.
• Assignments should be typed. Your assignment may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of these results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.
• Unless otherwise advised, use a significance level of 5%.
• Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix in addition to this page limit; however, the appendix will not be assessed. It will only be used if there is some question about what you have actually done.
• You may ask me (Abhinav Mehta) questions about this assignment up to 24 hours before the submission time. This will allow me enough time to respond to your questions.
• Late submissions will attract a penalty of 5% of your mark for each day of delay. No assignments will be accepted 10 days beyond the due date.
• Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission no later than 24 hours before the submission date. If you are granted an extension and submit your assignment after the extended deadline, the late submission penalty will still apply.
Assignment 1 – Sem 1, 2019 Page 1 of 3
Question 1 [50 Marks]
Data on eruptions of the Old Faithful Geyser in October 1980 were collected and stored in a .csv file, ‘oldfaithful’. The variables are Duration, the duration in seconds of the current eruption, and Interval, the time in minutes until the next eruption. Data were not collected between approximately midnight and 6 AM. It is suspected that Duration is associated with Interval.
(a) [5 marks] Conduct an exploratory data analysis to assess whether the two variables are associated. Is there a statistically significant correlation between the variables?
Use the cor.test() function to conduct a suitable hypothesis test. Clearly specify the hypotheses you are testing, and present and interpret the results.
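For reference, and assuming the default Pearson method of cor.test() is used, the test of zero correlation can be sketched as follows. Here r denotes the sample correlation between Duration and Interval and n the number of eruptions recorded; both symbols are notation introduced for illustration only.

```latex
H_0\colon \rho = 0 \quad\text{versus}\quad H_1\colon \rho \neq 0,
\qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \sim t_{n-2} \ \text{under } H_0 .
```

At the 5% significance level, H_0 is rejected when the two-sided p-value is below 0.05.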
(b) [20 marks] Fit a simple linear regression (SLR) model with Interval as the response variable and Duration as the predictor. Construct a plot of the residuals against the fitted values, a normal Q-Q plot of the residuals, a bar plot of the leverages for each observation and a bar plot of Cook’s distances for each observation. Use these plots (and other means) to comment on the model assumptions and on any unusual data points.
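The quantities shown in the leverage and Cook's distance bar plots can be written, for an SLR model with p = 2 coefficients, roughly as below. The symbols x_i (predictor value), e_i (residual), s^2 (residual mean square) and S_xx (corrected sum of squares of the predictor) are notation introduced here, not taken from the assignment.

```latex
\begin{aligned}
h_{ii} &= \frac{1}{n} + \frac{(x_i - \bar{x})^{2}}{S_{xx}},
  \qquad S_{xx} = \sum_{j=1}^{n} (x_j - \bar{x})^{2}, \\
D_i &= \frac{e_i^{2}}{p\, s^{2}} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}},
  \qquad p = 2 .
\end{aligned}
```

Because the leverages sum to p, their average is 2/n. One common rule of thumb flags leverages above 2p/n = 4/n, and Cook's distances close to or above 1 (some texts use 4/n instead); these cut-offs are conventions, not requirements of this assignment.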
(c) [10 marks] Produce the ANOVA (Analysis of Variance) table for the SLR model and interpret the results of the F-test. What is the coefficient of determination for this model and how should you interpret this summary measure?
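As a reminder of what the ANOVA table for an SLR model summarises, the sum-of-squares decomposition, the F statistic and the coefficient of determination are sketched below in generic notation (y_i for the response and ŷ_i for the fitted values; these symbols are introduced here). In R, anova() reports SSR on the predictor's row and SSE on the Residuals row. The same identities apply to the Question 2 model, with the transformed variables in place of the original ones.

```latex
\begin{aligned}
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}_{\text{SST}}
  &= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^{2}}_{\text{SSR}}
   + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}_{\text{SSE}}, \\
F &= \frac{\text{SSR}/1}{\text{SSE}/(n-2)} \sim F_{1,\,n-2}
  \ \text{under } H_0\colon \beta_1 = 0, \\
R^{2} &= \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}} .
\end{aligned}
```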
(d) [10 marks] What are the estimated coefficients of the SLR model in part (b) and the standard errors associated with these coefficients? Interpret the values of these estimated coefficients and perform t-tests to test whether or not these coefficients differ significantly from zero. What do you conclude as a result of these t-tests?
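The standard errors and t statistics reported by summary() for an SLR fit follow the textbook formulas sketched below. Here s^2 = SSE/(n − 2) is the residual mean square and S_xx is the corrected sum of squares of Duration; this notation is introduced for illustration.

```latex
\begin{aligned}
\operatorname{se}(\hat{\beta}_1) &= \frac{s}{\sqrt{S_{xx}}}, \qquad
\operatorname{se}(\hat{\beta}_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^{2}}{S_{xx}}}, \qquad
s^{2} = \frac{\text{SSE}}{n-2}, \\
t_j &= \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-2}
  \ \text{under } H_0\colon \beta_j = 0, \quad j = 0, 1 .
\end{aligned}
```

In SLR the square of the slope's t statistic equals the F statistic in the ANOVA table, so the two tests of β_1 = 0 reach the same conclusion.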
(e) [5 marks] If an eruption lasts 120 seconds, what interval of time before the next eruption does your model predict? Construct an appropriate interval estimate for the length of this interval.
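One natural reading of part (e) is that it concerns a single future eruption rather than the mean interval over all 120-second eruptions, in which case a prediction interval is appropriate. Under the SLR model its usual form is sketched below, with x_0 = 120 seconds, s the residual standard error and S_xx the corrected sum of squares of Duration (notation introduced here). Dropping the leading 1 under the square root gives the narrower confidence interval for the mean response.

```latex
\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0, \qquad
\hat{y}_0 \pm t_{n-2,\,0.975}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^{2}}{S_{xx}}},
\qquad x_0 = 120 .
```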
Question 2 [50 Marks]
On March 1, 1984, the Wall Street Journal published a survey of television advertisements conducted by Video Board Test, Inc., a New York ad-testing company that interviewed 4000 adults. These respondents were regular product users who were asked to cite a commercial they had seen for that product category in the past week. In this case, the response (return) is the number of millions of retained impressions per week. The predictor (spend) is the amount of money (in $ millions) spent by the firm on advertising. The data are available on Wattle in a .csv file called ‘advertising’.
(a) [10 marks] Is there a linear association between the two variables? You may want to experiment with some transformations, such as the natural log (log()) and the square root (sqrt()), applied to one or both of your variables to assess the linear association. At this stage, choose your transformed variables and provide justification for this choice.
(b) [15 marks] With your chosen transformations, fit a simple linear regression (SLR) model. Construct a plot of the residuals against the fitted values, a normal Q-Q plot of the residuals, a bar plot of the leverages for each observation and a bar plot of Cook’s distances for each observation. Use these plots (and other means) to comment on the model assumptions and on any unusual data points.
(c) [10 marks] Produce the ANOVA (Analysis of Variance) table for the SLR model and interpret the results of the F-test. What is the coefficient of determination for this model and how should you interpret this summary measure?
(d) [15 marks] Based on the model fit in part (b), write the mathematical expression for the regression model in the original untransformed variables. Interpret the effect of the coefficients on the response variable. In particular, for every $1 million increase in spending, how much increase in retained impressions is expected, based on your chosen model fit?
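To illustrate how a fit on transformed variables maps back to the original scale, two possible choices are sketched below with generic coefficients β_0 and β_1. They are examples only, not the required or expected transformation. A log-log fit corresponds to a power curve, and a square-root fit on both variables to a quadratic in √spend. In both cases the expected change Δ(spend) in return from a $1 million increase in spending depends on the current level of spend, so it is written as a difference rather than a single constant.

```latex
\begin{aligned}
\text{log-log:}\quad & \log(\text{return}) = \beta_0 + \beta_1 \log(\text{spend})
  \;\Rightarrow\; \text{return} = e^{\beta_0}\,\text{spend}^{\beta_1}, \\
& \Delta(\text{spend}) = e^{\beta_0}\left[(\text{spend}+1)^{\beta_1} - \text{spend}^{\beta_1}\right], \\
\text{sqrt-sqrt:}\quad & \sqrt{\text{return}} = \beta_0 + \beta_1 \sqrt{\text{spend}}
  \;\Rightarrow\; \text{return} = \left(\beta_0 + \beta_1 \sqrt{\text{spend}}\right)^{2}, \\
& \Delta(\text{spend}) = \left(\beta_0 + \beta_1 \sqrt{\text{spend}+1}\right)^{2}
  - \left(\beta_0 + \beta_1 \sqrt{\text{spend}}\right)^{2}.
\end{aligned}
```

In the log-log case, β_1 can also be read as an elasticity: a 1% increase in spend is associated with an approximate β_1% change in return.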