Statistical Project Assignment
ECON7300: Statistical Project Assignment (Part IIIb), Semester 2, 2021
Instructions:
(A) Questions in this paper should be answered by students whose surnames fall
within the range G-M.
(B) Use the Excel file ‘Dataset2_part3b’ to answer the questions asked.
(C) A heavy penalty will apply if your answers to the questions are not based on
Dataset2_part3b.
Instructions for Dataset2_part3b: Multiple Regression Analysis
Absenteeism is a serious employment problem in most countries. Two economists
launched a research project to learn more about the problem. They randomly selected
100 organisations to participate in a one-year study. For each organisation, they
recorded the average number of days absent per employee, the percentage of part-time
employees, the percentage of unionised employees and the availability of shift work.
The variables in the dataset are:
• adab (Y, average number of days absent)
• ptp (X1, percentage of part-time employees in each organisation)
• up (X2, percentage of unionised employees in each organisation)
• swork (X3, availability of shift work: coded 1 if yes and 0 if no)
The dependent variable for your analysis is adab.
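In the symbols above, the models referred to in the questions take the following general form. This is a notational sketch only: the coefficient symbols \beta_j and the error term \varepsilon_i are assumed standard notation, not part of the assignment text, and the estimates themselves must come from Dataset2_part3b.

```latex
% Two-predictor model; i indexes the 100 organisations
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i

% Model including the shift-work dummy variable X3
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i
% X_{3i} = 1 (shift work available):     intercept is \beta_0 + \beta_3
% X_{3i} = 0 (shift work not available): intercept is \beta_0
```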
Answer the following questions using Dataset2_part3b.
(a) Estimate a regression model using X1 and X2 to predict Y (state the multiple
regression equation).
(b) Interpret the meaning of the slopes.
(c) Predict Y when X1 = 15 and X2 = 40.
(d) Compute a 95% confidence interval estimate of the mean Y for all organisations
when X1 = 15 and X2 = 40 and interpret its meaning.
(e) Compute a 95% prediction interval of Y for a single organisation when X1 = 15
and X2 = 40 and interpret its meaning.
(f) Plot the residuals to test the assumptions of the regression model. Is there any
evidence of violation of the regression assumptions? Explain.
(g) Determine the variance inflation factor (VIF) for each independent variable (X1
and X2) in the model. Is there reason to suspect the existence of collinearity?
Why?
(h) At the 0.05 level of significance, determine whether each independent variable
(X1 and X2) makes a significant contribution to the regression model (use t tests
and follow all the necessary steps). On the basis of these results, indicate the
independent variables to include in the model.
(i) Test for the significance of the overall multiple regression model (with two
independent variables, X1 and X2) at the 5% level of significance.
(j) Determine whether there is a significant relationship between Y and each
independent variable (X1 and X2) at the 5% level of significance (hint: testing
portions of the multiple regression model using the partial F test).
(k) Compute the coefficients of partial determination for a multiple regression model
containing X1 and X2 and interpret their meaning.
(l) Estimate a regression model using X1, X2 and X3 to predict Y (state the multiple
regression equation, the regression equation for availability of shift work, the
regression equation for non-availability of shift work) and interpret the coefficient
for X3.
(m) Estimate a regression model using X1, X2, X3, an interaction between X1 and
X2, an interaction between X1 and X3, and an interaction between X2 and X3 to
predict Y.
(n) Test whether the three interactions significantly improve the regression model.
Assume a 5% level of significance (hint: test the joint significance of the three
interaction terms using the partial F test. If you reject the null hypothesis, test the
contribution of each interaction separately (using the partial F test) in order to
determine which interaction terms to include in the model).
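The quantities asked for in several of the questions above (the fitted equation, a point prediction, the VIFs, the overall F statistic, partial F statistics and a coefficient of partial determination) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not a solution: the data are synthetic and generated with arbitrary coefficients, they are NOT Dataset2_part3b, and the helper names `ols`, `r_squared` and `partial_f` are hypothetical. Critical values and the intervals in (d) and (e) are not computed here.

```python
import numpy as np

# Synthetic stand-in data: NOT the assignment dataset. The coefficients used
# to generate adab are arbitrary assumptions chosen only for illustration.
rng = np.random.default_rng(0)
n = 100
ptp = rng.uniform(0, 40, n)    # X1, hypothetical % part-time employees
up = rng.uniform(0, 80, n)     # X2, hypothetical % unionised employees
adab = 5.0 - 0.05 * ptp + 0.04 * up + rng.normal(0, 1, n)  # Y


def ols(y, *xs):
    """Least-squares fit with an intercept; returns (coefficients, SSE, SSR)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    sse = float(np.sum((y - fitted) ** 2))
    ssr = float(np.sum((fitted - y.mean()) ** 2))
    return beta, sse, ssr


def r_squared(y, *xs):
    """R^2 = SSR / SST for a regression of y on xs."""
    _, sse, ssr = ols(y, *xs)
    return ssr / (ssr + sse)


def partial_f(y, full_xs, reduced_xs):
    """Partial F: [(SSR_full - SSR_reduced) / q] / MSE_full,
    where q is the number of predictors dropped from the full model."""
    _, sse_f, ssr_f = ols(y, *full_xs)
    _, _, ssr_r = ols(y, *reduced_xs)
    q = len(full_xs) - len(reduced_xs)
    mse_f = sse_f / (len(y) - len(full_xs) - 1)
    return ((ssr_f - ssr_r) / q) / mse_f


# (a), (c): fit Y on X1 and X2, then predict at X1 = 15, X2 = 40
beta, sse_full, ssr_full = ols(adab, ptp, up)
k = 2                                   # number of predictors
mse_full = sse_full / (n - k - 1)
y_hat = float(beta @ np.array([1.0, 15.0, 40.0]))

# (g): VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other X
vif_ptp = 1.0 / (1.0 - r_squared(ptp, up))
vif_up = 1.0 / (1.0 - r_squared(up, ptp))

# (i): overall F = MSR / MSE
f_overall = (ssr_full / k) / mse_full

# (j): contribution of each variable given that the other is in the model
f_ptp_given_up = partial_f(adab, [ptp, up], [up])
f_up_given_ptp = partial_f(adab, [ptp, up], [ptp])

# (k): r^2_{Y1.2} = SSR(X1|X2) / [SST - SSR(X1,X2) + SSR(X1|X2)]
sst = ssr_full + sse_full
ssr_ptp_given_up = ssr_full - ols(adab, up)[2]
r2_y1_2 = ssr_ptp_given_up / (sst - ssr_full + ssr_ptp_given_up)

print(f"b0={beta[0]:.3f}, b1={beta[1]:.3f}, b2={beta[2]:.3f}, y_hat={y_hat:.3f}")
print(f"VIF: {vif_ptp:.3f}, {vif_up:.3f}; overall F: {f_overall:.2f}")
print(f"partial F: X1|X2={f_ptp_given_up:.2f}, X2|X1={f_up_given_ptp:.2f}")
print(f"r2_Y1.2={r2_y1_2:.3f}")
```

Each F statistic would then be compared with the F critical value for (q, n - k - 1) degrees of freedom at the 5% level. For the joint test of the three interactions, the same `partial_f` helper could be called with a full list that includes the product terms (for example `ptp * up`) and a reduced list that omits them.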