MM914 Foundations Of Probability And Statistics
Statistical Inference Minitab Lab Assignment
Louise Kelly, Johnathan Love & Kate Pyper
Instructions
This assignment is worth 15% of the overall mark for this course.
Your project should be submitted as a Word document by 12 noon on 4 January 2019. At the beginning of the document, include your name and registration number.
The project involves a statistical analysis of data using Minitab, and you can download the data from MYPLACE. Below are a number of questions relating to the datasets, and you should address these questions using the appropriate Minitab commands. Your final report should provide the answers to these questions with any required interpretation. There are marks for the correct answer and marks for using Minitab to compute the answer. The answer and the Minitab output used to compute the answer should be clearly shown in your report.
Your final submission should be clear and concise with only the minimum amount of relevant computer output included.
Question 1
The Minitab worksheet Movies contains data on 32 movies released in the period 1997-1998. There are 5 variables in the data set which are:
Column Name: Description
- C1 Movie: Title of movie
- C2 Opening: Gross receipts for the weekend after the movie was released (in millions of dollars)
- C3 Budget: Total budget for the movie (in millions of dollars)
- C4 Star: Whether or not the movie has a superstar: 1=Star, 2=No star
- C5 Summer: Whether or not the movie was released in summer: 1=Summer, 2=Not summer
- (a) Provide the appropriate measures of location and spread to summarise the gross receipts for the weekend after the movie was released and the total movie budget. For each, give a reason for your choice. (4 marks)
- (b) Recode the categorical variables and summarise in a table the number and percentage of movies with and without a star. (3 marks)
- (c) Perform a hypothesis test to determine if movies with stars are more likely to be released in the summer, giving a clear interpretation of the results. (5 marks)
- (d) Create a scatterplot with Opening on the y-axis and Budget on the x-axis with a clear title and axis labels. Write a sentence to describe what is observed from this plot. (4 marks)
- (e) Remove the two movies with the highest gross receipts from the data set and then compute the correlation coefficient between Opening and Budget. Explain what this correlation measures and how it should be interpreted. (4 marks)
- (f) For the reduced data set, compute the least squares estimates for the regression line which could be used to predict Opening from Budget, clearly stating how each of these coefficients can be interpreted. Produce a plot of the data with the least squares linear regression line superimposed. (5 marks)
- (g) Using the model produced in (f), determine the predicted gross opening receipts for a movie with a total budget of $70,000,000. Compute the coefficient of determination and using this, comment on the reliability of the prediction. (5 marks)
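As a rough illustration of the arithmetic behind parts (f) and (g), and not a substitute for the required Minitab output, the least squares coefficients, a prediction and the coefficient of determination can be sketched in Python. The function name least_squares and the five data points are assumptions made up for this sketch; they are not taken from the Movies worksheet.

```python
from statistics import mean


def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For simple linear regression, R-squared is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical values in millions of dollars; NOT the Movies worksheet.
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
opening = [5.0, 9.0, 10.0, 16.0, 20.0]

intercept, slope, r_squared = least_squares(budget, opening)

# Budget is recorded in millions, so $70,000,000 enters the model as 70.
predicted = intercept + slope * 70.0

print(f"Opening = {intercept:.2f} + {slope:.2f} * Budget")
print(f"Predicted opening for a budget of 70: {predicted:.1f}")
print(f"R-squared: {r_squared:.3f}")
```

In this made-up example the budget of 70 lies outside the range of the illustrative data, so the prediction is an extrapolation regardless of how large R-squared is.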
Question 2
Medical researchers recorded blood cholesterol levels of 28 heart-attack victims 2, 4 and 14 days following the attack. The levels of 30 individuals who had not had an attack were taken as a control group. The data are stored in the Minitab worksheet Cholesterol.
- (a) Produce a box-plot to show the cholesterol levels for each column of data and comment on the distribution of the cholesterol levels over time. (4 marks)
- (b) Produce a 95% confidence interval for the mean cholesterol levels of the individuals in the control group and write a sentence to explain how this interval should be interpreted. (4 marks)
- (c) Perform a hypothesis test to determine if there is a difference between the 2-Day cholesterol levels and those in the control group. Explain your choice of test. (5 marks)
- (d) Perform a hypothesis test to determine if there is a significant change in the cholesterol levels for those who have had a heart attack between Day-2 and Day-14. Explain your choice of test. (5 marks)
- (e) Comment on any impact the missing data may have on the analysis. (2 marks)
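Parts (c) and (d) compare two different kinds of samples: independent groups in (c) and repeated measurements on the same individuals in (d). The sketch below shows how the two corresponding t statistics are computed, purely as an illustration of the difference. The function names welch_t and paired_t and all of the values are assumptions for this sketch; they are not the Cholesterol worksheet, and the choice of test for the assignment still needs to be justified in the report.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """t statistic for two independent samples, without assuming equal variances."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def paired_t(first, second):
    """t statistic for paired measurements taken on the same individuals."""
    diffs = [f - s for f, s in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical cholesterol values; NOT the Cholesterol worksheet.
day2 = [250, 240, 260, 230, 270]
day14 = [240, 225, 255, 220, 250]         # same five individuals as day2
control = [200, 190, 210, 205, 195, 200]  # separate individuals

t_independent = welch_t(day2, control)
t_paired = paired_t(day2, day14)

print(f"Two-sample t (day 2 vs control): {t_independent:.3f}")
print(f"Paired t (day 2 vs day 14): {t_paired:.3f}")
```

The paired statistic uses only the within-individual differences, so an individual with a missing value at either time point cannot contribute to it.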