Research School of Finance, Actuarial Studies and Statistics ASSIGNMENT
Semester 2, 2022
STAT7055 Introductory Statistics for Business and Finance © 2022 ANU
INSTRUCTIONS TO STUDENTS
• The assignment is due at 6:00pm on Thursday October 20.
• Late submission of the assignment is not permitted. An assignment submitted without
an extension after the due date will receive a mark of 0.
Obtaining your Assignment
• There are different versions of the assignment and each student will be assigned a particular version of the assignment.
• Therefore, you must log in to Wattle with your own ANU credentials and download your assignment questions and data directly from Wattle.
Writing your Assignment
• The assignment is an individual piece of assessment and must be completed on your own.
• You will be required to write a report in an R Markdown document that contains R code, R output and written text. An example of an R Markdown document, which you can use as a template, has been provided on Wattle.
• When answering the assignment questions in your report, you will need to include all your R code and R output that you used to calculate any answers and you must also write your answers in proper sentences. For example, if you are required to calculate a sample mean, then you would include your R code for calculating the sample mean and the R output of the sample mean value and you would also write a proper sentence in the report such as “The sample mean is equal to …”.
• Make sure to be clear and concise in your answers.
• A good way to approach writing your report is to imagine that you are a statistical
consultant and that a client has asked you to do some statistical analyses. When presenting the results of your analyses to the client, you wouldn’t just give them pages of R code or pages of R output. Rather, you would give them a proper report that clearly outlines and explains the results of the analyses and that also includes the R code and R output used to produce the results.
• Therefore, presentation is very important. Marks will be deducted for poorly presented reports.
• Once you have finished writing your report in your R Markdown document, you will need to render the document by pressing the Knit button in RStudio to create an HTML file of your report.
• Further to the above point, it is good practice to regularly Knit your R Markdown document as you write your report. This is useful for checking that it’s rendering properly.
Submitting your Assignment
• Submission of the assignment will be through Wattle via Turnitin.
• A Turnitin link with further details regarding assignment submission will be provided
on Wattle.
• For submission you will need to submit two files: the R Markdown file of your report
(i.e., a “.Rmd” file) and the rendered HTML file of your report produced by pressing
the Knit button in RStudio (i.e., a “.html” file).
• Please name your two files as “uNNNNNNN.Rmd” and “uNNNNNNN.html”, where
uNNNNNNN is your student number.
• No other file types will be accepted or marked, e.g., “.R”, “.docx”, “.RData”, “.zip”,
etc. In particular, do not submit any compressed files.
Other Important Details
• You may only use built-in functions available in the default installation of R and you are not permitted to use functions in any additional R packages (e.g., ggplot2).
• You must use the appropriate R functions (and not the statistical tables) to calculate critical values or p-values for the normal, t and F distributions.
• You must use R for all calculations.
• Round all final numeric answers to 4 decimal places. However, as you will be using
R, keep all decimals during all intermediate steps to ensure the accuracy of your final
numeric answer.
• Please use the help function if you want to learn more about a particular R function,
e.g., enter help(mean) in the R console to learn more about the mean function.
• For questions that require writing mathematical symbols, you are welcome to use shorthand notation, provided you make the meaning clear (e.g., using “Mu” for μ, or “!=” for ≠).
• Answers (including hypotheses, explanations, conclusions, etc.) need to be written in
the text of the R Markdown document and not in the comments of code chunks.
• Do not print out the entire data sets in your R Markdown document or HTML file, as
this will only take up unnecessary space.
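As an illustration of the points above on using R functions for critical values and p-values and on rounding only at the end, here is a minimal sketch. The significance level, degrees of freedom and test statistic are assumed values chosen for illustration and are not taken from the assignment.

```r
# Hypothetical inputs for illustration only (not from the assignment data)
alpha  <- 0.05
z_stat <- -1.2345678                         # an assumed z test statistic

# Critical values from the normal, t and F distributions
z_crit <- qnorm(1 - alpha)                   # upper-tail normal critical value
t_crit <- qt(1 - alpha / 2, df = 24)         # two-sided t critical value, 24 df
f_crit <- qf(1 - alpha, df1 = 2, df2 = 297)  # upper-tail F critical value

# Lower-tail p-value for the assumed z statistic
p_value <- pnorm(z_stat)

# Keep full precision in intermediate steps, round only for reporting
round(c(z_crit = z_crit, t_crit = t_crit, f_crit = f_crit, p_value = p_value), 4)
```

Which tail to use in `qnorm`, `qt`, `qf` and the corresponding `p` functions depends on the alternative hypothesis being tested.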
Question 1 [25 marks]
Some data were collected to assess problem solving skills in secondary school students within a school district. A random sample of 500 year 7 students and a random sample of 500 year 8 students were selected (the two samples were selected independently of each other) and each student was given a test consisting of questions based on logic and reasoning. The time in minutes it took each student to complete the test was recorded. The data are stored in the file AssignmentData.RData in the data frame Q1.df. The data frame contains two columns, one for the test times of the year 7 students (Year7) and one for the test times of the year 8 students (Year8).
(a) [4 marks] Create a boxplot and a histogram of the test times for the year 7 students. Make sure to give each plot a proper descriptive title and label the x-axis of the histogram appropriately (do not just use the default titles or labels). Based on these plots, describe the distribution of the test times for the year 7 students. Be specific in your description, making sure to mention any interesting and/or important aspects of the distribution.
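A minimal sketch of titled and labelled base R plots follows. The vector `times` is simulated purely for illustration and stands in for a column such as Year7; the titles and labels are only suggestions.

```r
# Simulated stand-in for test times in minutes (hypothetical, not the real data)
set.seed(1)
times <- rnorm(500, mean = 45, sd = 4)

# Boxplot with a descriptive title and axis label
boxplot(times,
        main = "Boxplot of test completion times (simulated data)",
        ylab = "Test time (minutes)")

# Histogram with a descriptive title and a labelled x-axis
hist(times,
     main = "Histogram of test completion times (simulated data)",
     xlab = "Test time (minutes)")
```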
(b) [3 marks] Based on the definitions given in the lectures, calculate the sample range, the sample interquartile range and the sample coefficient of variation of the test times for the year 7 students.
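The sketch below computes these three summaries for a small made-up vector `x`. It assumes the common definitions (range as maximum minus minimum, interquartile range as third quartile minus first quartile, coefficient of variation as sample standard deviation divided by sample mean); the lecture definitions, in particular the quartile method, may differ.

```r
# Small hypothetical sample for illustration only
x <- c(40, 42, 44, 46, 48)

sample_range <- max(x) - min(x)   # maximum minus minimum
sample_iqr   <- IQR(x)            # Q3 - Q1 using R's default quantile method
sample_cv    <- sd(x) / mean(x)   # sample standard deviation over sample mean

round(c(range = sample_range, IQR = sample_iqr, CV = sample_cv), 4)
```

`IQR` and `quantile` accept a `type` argument, which can be changed if the lectures define quartiles differently from R's default.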
(c) [4 marks] The testing centre that collates all the students’ test results classifies a test taker’s performance into categories depending on the time it took them to complete the test. Specifically, a test time shorter than 39.35 minutes is considered “great”, a test time between 39.35 and 44.05 minutes is considered “good”, a test time between 44.05 and 45.85 minutes is considered “average”, a test time between 45.85 and 48.25 minutes is considered “mediocre” and a test time longer than 48.25 minutes is considered “poor”. Create a bar chart that describes the year 7 students’ test performance in terms of this classification. Make sure to give the bar chart a proper descriptive title and label the x-axis appropriately (do not just use the default title or label). Determine the least frequently occurring category.
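One possible way to build such a classification in base R is with `cut`, `table` and `barplot`, sketched below on simulated data. The cut points are the ones given in the question; the vector `times` and the plot text are assumptions. Note that `cut` by default puts a value exactly equal to a cut point into the lower category, a boundary case the question does not specify.

```r
# Simulated stand-in for test times in minutes (hypothetical)
set.seed(2)
times <- rnorm(500, mean = 45, sd = 4)

# Cut points from the question; labels follow the question's category names
breaks <- c(-Inf, 39.35, 44.05, 45.85, 48.25, Inf)
labels <- c("great", "good", "average", "mediocre", "poor")

category <- cut(times, breaks = breaks, labels = labels)
counts   <- table(category)

barplot(counts,
        main = "Test performance categories (simulated data)",
        xlab = "Performance category",
        ylab = "Number of students")

# Name of the least frequently occurring category
least_frequent <- names(counts)[which.min(counts)]
least_frequent
```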
(d) [3 marks] Test whether the population proportion of year 7 students within the school district that would complete the test in less than 43.05 minutes is less than 0.415. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 5%. Do not use any R functions that are designed to perform hypothesis tests.
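A lower-tailed one-sample z-test for a proportion can be carried out with only `mean`, `sqrt` and `pnorm`, as in the sketch below. The data are simulated for illustration; the threshold 43.05, the null value 0.415 and the 5% level come from the question.

```r
# Simulated stand-in for test times in minutes (hypothetical)
set.seed(3)
times <- rnorm(500, mean = 45, sd = 4)

n     <- length(times)
p0    <- 0.415                  # hypothesised proportion under H0
alpha <- 0.05

p_hat <- mean(times < 43.05)    # sample proportion below the threshold

# z statistic using the standard error under H0
z_stat <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Lower-tail test: H1 says the proportion is less than p0
z_crit  <- qnorm(alpha)
p_value <- pnorm(z_stat)

round(c(p_hat = p_hat, z = z_stat, z_crit = z_crit, p_value = p_value), 4)
```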
(e) [4 marks] Test whether the population proportion of year 7 students within the school district that would take longer than 44.05 minutes to complete the test is less than the population proportion of year 8 students within the school district that would complete the test in less than 49.55 minutes. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 2.5%. Do not use any R functions that are designed to perform hypothesis tests.
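A two-sample comparison of proportions can be sketched in the same manual style. The version below assumes the pooled-proportion z statistic, which is the usual choice when the null hypothesis is that the two proportions are equal; the vectors `year7` and `year8` are simulated stand-ins, while the thresholds and the 2.5% level are those in the question.

```r
# Simulated stand-ins for the two independent samples (hypothetical)
set.seed(4)
year7 <- rnorm(500, mean = 45, sd = 4)
year8 <- rnorm(500, mean = 46, sd = 4)

n1 <- length(year7)
n2 <- length(year8)
alpha <- 0.025

x1 <- sum(year7 > 44.05)        # year 7 students slower than 44.05 minutes
x2 <- sum(year8 < 49.55)        # year 8 students faster than 49.55 minutes
p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Pooled proportion, appropriate under H0: p1 = p2
p_pool <- (x1 + x2) / (n1 + n2)

z_stat  <- (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value <- pnorm(z_stat)        # lower tail, since H1 is p1 < p2

round(c(p1_hat = p1_hat, p2_hat = p2_hat, z = z_stat, p_value = p_value), 4)
```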
Assignment S2 2022 Page 3 of 5 STAT7055
For parts (f), (g) and (h), consider only the test times for the first six year 7 students in the sample (i.e., the first six rows of the Year7 column in the data frame).
(f) [3 marks] For a randomly selected sample of size three taken without replacement from among these six test times, determine the sampling distribution of the sample range.
(g) [2 marks] For a randomly selected sample of size three taken without replacement from among these six test times, if the sample range was greater than 3, find the probability that it was less than 3.5.
(h) [2 marks] For a randomly selected sample of size three taken without replacement from among these six test times, calculate the variance of the sample range.
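Parts (f), (g) and (h) can all be approached by enumerating every possible sample. The sketch below uses `combn` on six made-up values (not the assignment data) and assumes all 20 samples of size three are equally likely.

```r
# Six hypothetical test times for illustration only
x <- c(41.2, 43.5, 44.5, 45.0, 46.3, 47.8)

# Sample range of every possible sample of size 3 drawn without replacement
ranges <- combn(x, 3, FUN = function(s) max(s) - min(s))

# Sampling distribution: each of the choose(6, 3) = 20 samples is equally likely
dist <- table(round(ranges, 10)) / length(ranges)
dist

# Conditional probability P(range < 3.5 | range > 3)
cond_prob <- mean(ranges > 3 & ranges < 3.5) / mean(ranges > 3)

# Mean and variance of the sampling distribution (divide by 20, not 19,
# because this is a full probability distribution rather than a sample)
mean_R <- mean(ranges)
var_R  <- mean(ranges^2) - mean_R^2

round(c(cond_prob = cond_prob, mean = mean_R, variance = var_R), 4)
```

Rounding inside `table` only guards against floating-point noise splitting equal ranges into separate rows.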
Question 2 [43 marks]
A study was conducted to investigate possible relationships between upper body strength, body weight and handedness among people who play one of three recreational sports. A sample of 300 people in total was obtained by randomly selecting people from each of the three recreational sports, and the following were recorded for each person: their body weight in kilograms (Weight), their bench press amount, defined as the amount in kilograms that they can comfortably bench press three times (BenchPress), the recreational sport that they play (Sport) and whether they are left-handed or right-handed (Hand). The data are stored in the file AssignmentData.RData in the data frame Q2.df.
(a) [3 marks] Test whether the population variance of body weight is the same for people who play football and for people who play tennis. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 1%. Do not use any R functions that are designed to perform hypothesis tests.
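A two-sided F-test for equal variances needs only `var`, `qf` and `pf`. The sketch below uses simulated weights for two groups; the vectors `football` and `tennis` are hypothetical stand-ins, and the 1% level is the one in the question.

```r
# Simulated body weights for two groups (hypothetical)
set.seed(5)
football <- rnorm(100, mean = 85, sd = 8)
tennis   <- rnorm(100, mean = 80, sd = 7)

alpha <- 0.01
df1 <- length(football) - 1
df2 <- length(tennis) - 1

# Ratio of sample variances
f_stat <- var(football) / var(tennis)

# Two-sided rejection region and p-value
f_lower <- qf(alpha / 2, df1, df2)
f_upper <- qf(1 - alpha / 2, df1, df2)
p_value <- 2 * min(pf(f_stat, df1, df2), 1 - pf(f_stat, df1, df2))

reject <- (f_stat < f_lower) || (f_stat > f_upper)
round(c(F = f_stat, lower = f_lower, upper = f_upper, p_value = p_value), 4)
```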
(b) [4 marks] Test whether the population mean body weight of people who play football is greater than the population mean body weight of people who play tennis by more than 0.3 kilograms. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 1%. Do not use any R functions that are designed to perform hypothesis tests.
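The sketch below shows an upper-tailed two-sample t-test with a non-zero hypothesised difference. It assumes equal population variances and therefore uses the pooled variance; if the variances cannot be treated as equal, the unequal-variance form of the statistic and degrees of freedom would be needed instead. The data are simulated for illustration.

```r
# Simulated body weights for two groups (hypothetical)
set.seed(6)
football <- rnorm(100, mean = 85, sd = 8)
tennis   <- rnorm(100, mean = 80, sd = 8)

d0    <- 0.3                    # hypothesised difference in means under H0
alpha <- 0.01
n1 <- length(football)
n2 <- length(tennis)

# Pooled variance (assumes equal population variances)
sp2 <- ((n1 - 1) * var(football) + (n2 - 1) * var(tennis)) / (n1 + n2 - 2)

t_stat <- (mean(football) - mean(tennis) - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
df     <- n1 + n2 - 2

t_crit  <- qt(1 - alpha, df)    # upper-tail critical value
p_value <- 1 - pt(t_stat, df)   # upper-tail p-value

round(c(t = t_stat, t_crit = t_crit, p_value = p_value), 4)
```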
(c) [4 marks] Based on methods covered in the lectures, use a single test to test whether the population mean body weight is the same across all recreational sports. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 2%. Do not use any R functions that are designed to perform hypothesis tests or to perform, analyse or interpret an ANOVA.
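Assuming the single test intended here is a one-way ANOVA F-test, the sums of squares can be computed directly with `tapply`, as sketched below. The group labels "A", "B" and "C" and the simulated weights are hypothetical.

```r
# Simulated weights for three hypothetical groups
set.seed(7)
g <- factor(rep(c("A", "B", "C"), each = 100))
y <- rnorm(300, mean = rep(c(85, 80, 82), each = 100), sd = 8)

alpha <- 0.02
n <- length(y)
k <- nlevels(g)

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)
grand_mean  <- mean(y)

# Between-group (treatment) and within-group (error) sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - group_means[as.character(g)])^2)

# F statistic as the ratio of mean squares
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n - k))
f_crit  <- qf(1 - alpha, k - 1, n - k)
p_value <- 1 - pf(f_stat, k - 1, n - k)

round(c(F = f_stat, F_crit = f_crit, p_value = p_value), 4)
```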
(d) [7 marks] Discuss whether the assumptions for the test you performed in part (c) hold for this data. You do not need to conduct any hypothesis tests, but make sure to provide clear justifications for your answer.
(e) [3 marks] Test whether the population mean body weight of right-handed people who play tennis is less than 80.5 kilograms. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 0.5%. Do not use any R functions that are designed to perform hypothesis tests.
(f) [3 marks] Create a scatter plot of bench press amount against body weight. Make sure to give your plot an appropriate title and appropriate labels for the x and y axes. Describe the relationship between these two variables.
(g) [3 marks] Test whether the correlation between bench press amount and body weight is greater than zero. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 5%. Do not use any R functions that are designed to perform hypothesis tests.
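An upper-tailed test for a positive correlation can be based on the t statistic r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below applies it to simulated data; the vectors `weight` and `bench` are hypothetical stand-ins.

```r
# Simulated stand-ins for body weight and bench press amount (hypothetical)
set.seed(8)
weight <- rnorm(300, mean = 85, sd = 10)
bench  <- 10 + 0.8 * weight + rnorm(300, sd = 6)

alpha <- 0.05
n <- length(weight)
r <- cor(weight, bench)         # sample correlation coefficient

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom
t_stat  <- r * sqrt((n - 2) / (1 - r^2))
t_crit  <- qt(1 - alpha, df = n - 2)
p_value <- 1 - pt(t_stat, df = n - 2)

round(c(r = r, t = t_stat, t_crit = t_crit, p_value = p_value), 4)
```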
(h) [2 marks] Fit a simple linear regression model with bench press amount as the dependent variable and body weight as the independent variable. Write down the estimated regression model.
(i) [5 marks] Discuss whether the assumptions for a simple linear regression model hold for the model you fitted in part (h), making sure to provide clear justifications for your answer.
(j) [5 marks] Considering only left-handed people, fit a simple linear regression model with bench press amount as the dependent variable and body weight as the independent variable without using the lm function or any other R function designed to fit, analyse or interpret regression models. Write down the estimated regression model. Use the estimated regression model to predict the bench press amount for a left-handed person who weighs 84 kilograms without using any R functions that are designed to calculate any predictions.
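The least squares estimates can be computed from sums of squares and cross-products, as in the sketch below. The data frame `demo` is simulated; its column names mirror those described for Q2.df, but the coding of Hand as "Left" and "Right" is an assumption, as are all the values.

```r
# Simulated data frame with hypothetical values and an assumed Hand coding
set.seed(9)
demo <- data.frame(
  Weight = rnorm(300, mean = 85, sd = 10),
  Hand   = sample(c("Left", "Right"), 300, replace = TRUE)
)
demo$BenchPress <- 10 + 0.8 * demo$Weight + rnorm(300, sd = 6)

# Keep only the left-handed rows
left <- demo[demo$Hand == "Left", ]
x <- left$Weight
y <- left$BenchPress

# Least squares slope and intercept from the usual formulas
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Predicted bench press amount at a body weight of 84 kg
pred_84 <- b0 + b1 * 84

round(c(intercept = b0, slope = b1, prediction = pred_84), 4)
```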
(k) [4 marks] For right-handed people who weigh more than 89.5 kilograms, test whether the population mean body weight is greater than the population mean bench press amount by more than 11.5 kilograms. Clearly state your hypotheses, making sure to define any parameters, and use a significance level of α = 2.5%. Do not use any R functions that are designed to perform hypothesis tests.
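Because both measurements are taken on the same person, one reasonable approach is a matched-pairs t-test on the within-person differences; this choice is an assumption rather than something the question states. The sketch below uses a simulated data frame `demo` with an assumed "Left"/"Right" coding for Hand.

```r
# Simulated data frame with hypothetical values and an assumed Hand coding
set.seed(10)
demo <- data.frame(
  Weight = rnorm(300, mean = 88, sd = 8),
  Hand   = sample(c("Left", "Right"), 300, replace = TRUE)
)
demo$BenchPress <- demo$Weight - 12 + rnorm(300, sd = 5)

# Right-handed people weighing more than 89.5 kg
sub <- demo[demo$Hand == "Right" & demo$Weight > 89.5, ]

# Within-person differences: body weight minus bench press amount
d     <- sub$Weight - sub$BenchPress
n     <- length(d)
d0    <- 11.5                   # hypothesised mean difference under H0
alpha <- 0.025

t_stat  <- (mean(d) - d0) / (sd(d) / sqrt(n))
t_crit  <- qt(1 - alpha, df = n - 1)
p_value <- 1 - pt(t_stat, df = n - 1)

round(c(n = n, t = t_stat, t_crit = t_crit, p_value = p_value), 4)
```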
Presentation [2 marks]
END OF ASSIGNMENT