Before you start:
• Submit your assignment as a report that responds to each part of this assignment.
• Embed the relevant sections of your R output at appropriate places in your report. PLEASE NOTE THAT INCLUDING IRRELEVANT OUTPUT MAY BE PENALIZED.
• Please list any additional R packages you use to produce the outputs of your analysis.
• You will need the following datasets, provided in Canvas:
  a. HATCO.csv
  b. JOHNSON.csv
  c. STATPAK.csv
Question 1
For this question you will use the dataset stored in ‘STATPAK.csv’. This dataset can be used to explore the relationships between stat package, computing platform and experience on the one hand, and time taken to complete a task, satisfaction and comprehension on the other, when subjects perform a stat package task.
Independent Variables
Stat Package: which stat package was used.
Platform: which type of computer was used.
Experience: number of years that the subject has used stat packages.
Dependent Variables
Comprehension: objective quality of the task output, scored out of 120.
Time: time needed to finish the task. Subjects were allowed as much time as they thought necessary, until they felt they could make no further progress.
Satisfaction: self-reported level of satisfaction with the overall environment, given by each subject as a single value of up to 150.
Run two-way ANOVA models with stat package and platform as the factors and satisfaction as the dependent variable, looking for main and interaction effects (assumption checks required).
(a) Interpret the output and visualize the interaction plots.
• For each factor, rank the levels and state which differences between levels are significant.
• Which combination of stat package and platform produces the highest value for the dependent variable?
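For reference, the model behind Question 1 can be written as a standard fixed-effects two-way ANOVA with interaction. This is a sketch that assumes both stat package and platform are treated as fixed factors, with Y the satisfaction score; the symbols a and b (the number of levels of each factor) are illustrative.

```latex
% i = stat package level (1..a), j = platform level (1..b), k = subject within cell (i, j)
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)
% identifiability constraints
\sum_i \alpha_i = 0, \qquad \sum_j \beta_j = 0, \qquad
\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0

H_0^{(A)}:\ \alpha_1 = \cdots = \alpha_a = 0 \quad \text{(no stat package main effect)}
H_0^{(B)}:\ \beta_1 = \cdots = \beta_b = 0 \quad \text{(no platform main effect)}
H_0^{(AB)}:\ (\alpha\beta)_{ij} = 0 \ \text{for all } i, j \quad \text{(no interaction)}

F_{\text{effect}} = \frac{MS_{\text{effect}}}{MS_E}
```

The assumption checks correspond to the error term: independence, normality of the residuals and equal variance across the cells. In R, a formula such as `Satisfaction ~ Package * Platform` passed to `aov()` fits this model (the variable names here are hypothetical and should match the STATPAK.csv columns); for an unbalanced design `aov()` reports sequential (Type I) sums of squares, so the order of the terms matters.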
Question 2
Using the HATCO dataset:
X1 Delivery speed: amount of time it takes to deliver the product once an order has been confirmed
X2 Price level: perceived level of price charged by product suppliers
X3 Price flexibility: perceived willingness of HATCO representatives to negotiate price on all types of purchases
X4 Manufacturer’s image: overall image of the manufacturer/supplier
X5 Service: overall level of service necessary for maintaining a satisfactory relationship between supplier and purchaser
X6 Salesforce’s image: overall image of the manufacturer’s sales force
X7 Product quality: perceived level of quality of a particular product (e.g., performance or yield)
X8 Size of firm: size of the firm relative to others in this market. This variable has two categories: 1 = large, 0 = small
X9 Usage level: how much of the firm’s total product is purchased from HATCO, measured on a 100-point percentage scale ranging from 0 to 100 percent
X10 Satisfaction level: how satisfied the purchaser is with past purchases from HATCO, measured on the same graphic rating scale as the perceptions X1 to X7
Run two regression models, each including all seven independent variables X1 to X7: the first with X9 (usage level) as the dependent variable and the second with X10 (satisfaction level) as the dependent variable. Compare the two models in terms of the following (assumption checks required):
• Prediction accuracy
• Overall significance of the model
• Impact and significance of the coefficients
• Whether the effect of price level (X2) on satisfaction (X10) depends on firm size (X8)
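One way to write the two models and the firm-size question, assuming ordinary least squares with n observations and p = 7 predictors. The coefficient symbols are illustrative, and the moderation model is shown without the other perceptions for brevity; they can be kept alongside the interaction term.

```latex
% Model 1 (usage) and Model 2 (satisfaction), each with all seven perceptions
X_9    = \beta_0  + \sum_{j=1}^{7} \beta_j  X_j + \varepsilon
X_{10} = \gamma_0 + \sum_{j=1}^{7} \gamma_j X_j + \varepsilon

% Prediction accuracy (adjusted R^2) and overall significance (F test of all slopes = 0)
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} \sim F_{p,\,n-p-1} \ \text{under } H_0

% Moderation: does the effect of X2 on X10 depend on firm size X8 (0 = small, 1 = large)?
X_{10} = \delta_0 + \delta_1 X_2 + \delta_2 X_8 + \delta_3 (X_2 \cdot X_8) + \varepsilon
\frac{\partial\,\mathrm{E}[X_{10}]}{\partial X_2} = \delta_1 + \delta_3 X_8
% slope of X2 for small firms: delta_1; for large firms: delta_1 + delta_3; test H0: delta_3 = 0
```

In R's formula syntax, `X10 ~ X2 * X8` expands to both main effects plus the `X2:X8` interaction term, so the t test on the `X2:X8` coefficient is the test of H0: δ3 = 0.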
Question 3
Johnson Filtration Inc. provides maintenance service for water filtration systems throughout southern Florida. Customers contact Johnson with requests for maintenance service on their water filtration systems. To estimate service time and service cost, Johnson’s manager wants to predict the repair time needed for each maintenance request, so repair time in hours is the dependent variable. Repair time is believed to be related to three factors: the number of months since the last maintenance service, the type of repair problem (mechanical or electrical) and the repairperson who performs the repair (Donna Newton or Bob Jones). Data for a sample of ten service calls are reported in the following table (no assumption checks required):
Repair Time (Hours)   Months since last service   Type of Repair   Repairperson
2.9                   2                           Electrical       Donna Newton
3.0                   6                           Mechanical       Donna Newton
4.8                   8                           Electrical       Bob Jones
1.8                   3                           Mechanical       Donna Newton
2.9                   2                           Electrical       Donna Newton
4.9                   7                           Electrical       Bob Jones
4.2                   9                           Mechanical       Bob Jones
4.8                   8                           Mechanical       Bob Jones
4.4                   4                           Electrical       Bob Jones
4.5                   6                           Electrical       Donna Newton
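The ten calls in the table are few enough to cross-check the simple regression in the first report item outside R. The following is a minimal pure-Python sketch, not the R workflow the report requires. It uses the standard least-squares formulas and hard-codes the two-tailed critical value t(0.025, 8) = 2.306 from a t table rather than computing a p-value. The names `CALLS`, `simple_regression` and `T_CRIT_8DF` are illustrative only.

```python
import math

# Johnson Filtration service calls, transcribed from the table above:
# (repair time in hours, months since last service, type of repair, repairperson)
CALLS = [
    (2.9, 2, "Electrical", "Donna Newton"),
    (3.0, 6, "Mechanical", "Donna Newton"),
    (4.8, 8, "Electrical", "Bob Jones"),
    (1.8, 3, "Mechanical", "Donna Newton"),
    (2.9, 2, "Electrical", "Donna Newton"),
    (4.9, 7, "Electrical", "Bob Jones"),
    (4.2, 9, "Mechanical", "Bob Jones"),
    (4.8, 8, "Mechanical", "Bob Jones"),
    (4.4, 4, "Electrical", "Bob Jones"),
    (4.5, 6, "Electrical", "Donna Newton"),
]
T_CRIT_8DF = 2.306  # two-tailed t critical value, alpha = 0.05, n - 2 = 8 df (from a t table)


def simple_regression(x, y):
    """Return intercept b0, slope b1, R^2 and the t statistic for H0: slope = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sxy ** 2 / sxx                     # regression sum of squares
    sse = sst - ssr                          # error sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return b0, b1, ssr / sst, b1 / se_b1


hours = [c[0] for c in CALLS]
months = [c[1] for c in CALLS]
b0, b1, r2, t_stat = simple_regression(months, hours)
print(f"repair time = {b0:.4f} + {b1:.4f} * months, R^2 = {r2:.3f}, t = {t_stat:.3f}")
print("reject H0 at 0.05" if abs(t_stat) > T_CRIT_8DF else "fail to reject H0 at 0.05")

# Predicted value and residual for each call, sorted in ascending order of residual
rows = sorted(
    ((y - (b0 + b1 * x), b0 + b1 * x, y, x, kind, person) for y, x, kind, person in CALLS),
    key=lambda r: r[0],
)
for resid, y_hat, y, x, kind, person in rows:
    print(f"{x:>2} mo  y={y:.1f}  y_hat={y_hat:.3f}  resid={resid:+.3f}  {kind:<10}  {person}")
```

The sorted listing keeps the type of repair and the repairperson next to each residual, which is what the residual-pattern questions ask you to inspect.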
Managerial Report:
• Develop the simple linear regression equation to predict repair time given the number of months since the last maintenance service, and use the results to test, at the 0.05 level of significance, the hypothesis that no relationship exists between repair time and the number of months since the last maintenance service. How would you interpret this relationship? What does the coefficient of determination tell you about this model?
• Using the simple linear regression model developed in the previous part, calculate the predicted repair time and residual for each of the ten repairs in the data. Sort the data in ascending order of residual. Do you see any pattern in the residuals for the two types of repair? Do you see any pattern in the residuals for the two repairpersons? Do these results suggest any potential modifications to your simple linear regression model? Now create a scatter chart with months since last service on the x-axis and repair time in hours on the y-axis, in which the points representing electrical and mechanical repairs are shown in different shapes/colors. Create a similar scatter chart of months since last service and repair time in hours in which the points representing Bob Jones and Donna Newton repairs are shown in different shapes/colors. Do these charts and the results of your residual analysis suggest the same potential modification to your simple linear regression model?
• Create a new dummy variable that is equal to zero if the type of repair is mechanical and one if the type of repair is electrical. Develop the multiple regression equation to predict repair time, given the number of months since the last maintenance service and the type of repair. What are the interpretations of the estimated regression parameters? What does the coefficient of determination tell you about this model?
• Create a new dummy variable that is equal to zero if the repairperson is Bob Jones and one if the repairperson is Donna Newton. Develop the multiple regression equation to predict the repair time, given the number of months since the last maintenance service and the repairperson. What are the interpretations of the estimated regression parameters? What does the coefficient of determination tell you about this model?
• Develop the multiple regression equation to predict repair time, given the number of months since the last maintenance service, the type of repair (dummy variable: 0 for mechanical and 1 for electrical) and the repairperson (dummy variable: 0 for Bob Jones and 1 for Donna Newton). What are the interpretations of the estimated regression parameters? What does the coefficient of determination tell you about this model?
• Which of these models would you use? Why?
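As a cross-check on the R output for the dummy-variable items, the three multiple regressions can be fit directly from the normal equations. This is a minimal pure-Python sketch under ordinary least squares, using the dummy coding stated in the questions (electrical = 1, Donna Newton = 1). The helper names `ols`, `r_squared` and `designs` are illustrative, and the sketch reports only coefficients and R^2; R's `lm()` with `summary()` gives the same coefficients plus standard errors and p-values.

```python
# Transcribed from the Johnson Filtration table: (hours, months, type of repair, repairperson)
CALLS = [
    (2.9, 2, "Electrical", "Donna Newton"),
    (3.0, 6, "Mechanical", "Donna Newton"),
    (4.8, 8, "Electrical", "Bob Jones"),
    (1.8, 3, "Mechanical", "Donna Newton"),
    (2.9, 2, "Electrical", "Donna Newton"),
    (4.9, 7, "Electrical", "Bob Jones"),
    (4.2, 9, "Mechanical", "Bob Jones"),
    (4.8, 8, "Mechanical", "Bob Jones"),
    (4.4, 4, "Electrical", "Bob Jones"),
    (4.5, 6, "Electrical", "Donna Newton"),
]


def ols(X, y):
    """Least-squares coefficients: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry in this column to the diagonal
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(X, y, b):
    """Coefficient of determination 1 - SSE/SST for fitted coefficients b."""
    y_bar = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


hours = [c[0] for c in CALLS]
months = [c[1] for c in CALLS]
electrical = [1 if c[2] == "Electrical" else 0 for c in CALLS]  # 0 = mechanical, 1 = electrical
donna = [1 if c[3] == "Donna Newton" else 0 for c in CALLS]     # 0 = Bob Jones, 1 = Donna Newton

# Each design row is [intercept, predictors...]
designs = {
    "months + type": [[1, m, e] for m, e in zip(months, electrical)],
    "months + repairperson": [[1, m, d] for m, d in zip(months, donna)],
    "months + type + repairperson": [
        [1, m, e, d] for m, e, d in zip(months, electrical, donna)
    ],
}
results = {}
for name, X in designs.items():
    b = ols(X, hours)
    results[name] = (b, r_squared(X, hours, b))
    print(f"{name:<30} b = {[round(v, 4) for v in b]}  R^2 = {results[name][1]:.3f}")
```

Because each of these models contains the months-only model, its R^2 can only be equal or higher; adjusted R^2 and the significance of the added dummy coefficients are the fairer basis for choosing between models with different numbers of predictors.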