STA 104 Exam I Project, due
Friday, May 3rd in lecture before 9:50 AM
Read the following instructions carefully:
• You may work in a group of two, or by yourself.
• You are not allowed to discuss the questions with anyone other than the instructor or TA and your
group mate.
• Any outside help beyond that from the instructor or TA is considered plagiarism. This includes asking a tutor, asking your classmates (for example, comparing answers), posting the questions to homework help sites, etc. Should we believe you have sought outside help, you will be reported to the Student Judicial Affairs office.
• You are allowed to use or modify your previous functions, or the instructor's functions that are posted online.
• Do not share answers, or specific values for calculations, particularly on Piazza.
• You may ask clarifying questions about code and general approach on Piazza, but do not give away any numerical
answers. If you are concerned you may be giving something away, email me or the TA directly.
The group (of one or two people) will select one question from each topic, for a total of two questions.
Topic I
Question 1
The data used for this question is Toys.csv, and it has a column Broken. What has been measured is how many toys a particularly agile cat breed (a Bengal cat) “destroys” in a week. A frustrated owner believes that they have to buy more than 3 toys per week in order to keep up with their cat. Assess this claim using the median.
Question 2
The data used for this question is Play.csv, with a column Return. Border collies are dogs known for their obsessive behavior, and 16 border collie owners measured how many times in a row their dogs returned a ball to the owner after it was thrown. They claim that there is a 50% chance that the dogs will return the ball 12 times or more. Assess this claim using the median.
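As a reminder of the general idea only: one common non-parametric way to assess a claim about a median is the sign test, which counts how many observations fall above the hypothesized median and compares that count to a Binomial(n, 0.5) distribution. The sketch below is a minimal illustration under stated assumptions. The helper name sign_test is hypothetical, the data vector is made up (it is not the data in Toys.csv or Play.csv), and the choice of alternative is something you must justify for your own question.

```r
# Hypothetical helper: exact sign test for a hypothesized median m0.
# Observations equal to m0 (ties) and missing values are dropped.
sign_test <- function(x, m0,
                      alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  x <- x[!is.na(x) & x != m0]  # keep only informative observations
  n <- length(x)               # number of non-tied observations
  s <- sum(x > m0)             # count of observations above m0
  # Under H0 (median = m0), s follows a Binomial(n, 0.5) distribution.
  p <- binom.test(s, n, p = 0.5, alternative = alternative)$p.value
  list(n = n, above = s, p.value = p)
}

# Made-up values for illustration only (NOT from any course data file).
toy <- c(2, 5, 4, 3, 6, 1, 4, 5, 7, 3)
res <- sign_test(toy, m0 = 3, alternative = "greater")
res$n        # non-tied observations used
res$above    # how many of those exceed m0
res$p.value  # exact one-sided p-value
```

With real data you would read the file first, for example with read.csv, and pass the relevant column to the function in place of the made-up vector.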
Topic II
Question 1
The data is found in the file Drug.csv, with the following columns:
Column 1: Relief: The number of hours of relief provided.
Column 2: Groups: Either DrugA or DrugB.
The study was comparing the hours of pain relief for two common over-the-counter pain medications. Compare the two groups, being as specific as you can about your outcome.
Question 2
The data is found in the file soil.csv, with the following columns:
Column 1: condition: gap (the soil was under an opening in the forest canopy), and growth (the soil was taken under heavy
tree growth).
Column 2: respiration: The amount of carbon dioxide given off by each soil core (in mol CO2/g soil/hr).
Soil respiration is a measure of microbial activity in soil, which affects plant growth. Compare the two groups, being as specific as you can about your outcome.
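For orientation, one possible non-parametric comparison of two independent groups is the Wilcoxon rank-sum test, available in base R as wilcox.test. The sketch below is only an illustration under stated assumptions. The column names follow Drug.csv, but every value is made up, and whether this technique suits your data is something your report must justify from your own plots and summaries.

```r
# Made-up values for illustration only (NOT the contents of Drug.csv).
drug <- data.frame(
  Relief = c(1.2, 3.4, 2.2, 4.1, 2.9,   # hypothetical DrugA hours
             5.0, 4.8, 6.1, 3.9, 5.5),  # hypothetical DrugB hours
  Groups = rep(c("DrugA", "DrugB"), each = 5)
)

# Two-sided Wilcoxon rank-sum test. conf.int = TRUE also returns an
# estimate and confidence interval for the location shift (first group
# minus second group, with groups ordered alphabetically).
res <- wilcox.test(Relief ~ Groups, data = drug, conf.int = TRUE)

res$statistic  # W: number of (DrugA, DrugB) pairs with DrugA > DrugB
res$p.value    # exact p-value (no ties, small samples)
res$conf.int   # confidence interval for the shift
res$estimate   # Hodges-Lehmann estimate of the shift
```

Setting alternative = "less" or "greater" gives a one-sided test, which is one way to be more specific about the direction of a difference.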
3. The Report Format
Each question should be a short report. This means you write in full sentences, and have the following sections for each question, while being as specific as you can about your results. There should not be any copied and pasted R code or raw R output in this report. You must format the results you get from R.
I. Introduction. State the question you are trying to answer, why it is a question of interest (why might we be interested in the answer), and what statistical technique you are going to use. This must be a non-parametric technique.
II. Summary of your data (and only the data you are using for the question). This should include things like plots (histograms, boxplots) including the interpretation of the plots, and summary values such as sample means and standard deviations. This is where you should justify which non-parametric technique you are using. An R handout is available online for graphing and summaries of various data types.
III. Analysis. Report confidence intervals, test statistics, p-values, null and alternative hypotheses, etc. You may use tables here, but be sure that you organize your work. Remember to write your results in full sentences where possible.
IV. Interpretation. State your conclusion and any inference that you may draw from your corresponding tests or confidence intervals. These should all be in terms of your problem.
V. Conclusion. Summarize briefly your findings. Here you do not have to re-iterate your numeric values, but summarize all relevant conclusions.
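The data summary in Section II and the requirement to format results rather than paste R output can be sketched as follows. This is a minimal illustration under stated assumptions. The column names follow soil.csv, but the values are made up, and the object names by_group, summ, and sentence are hypothetical.

```r
# Made-up values for illustration only (NOT the contents of soil.csv).
soil <- data.frame(
  condition   = rep(c("gap", "growth"), each = 4),
  respiration = c(10, 14, 18, 22, 30, 34, 38, 42)
)

# Summary values per group: sample size, mean, median, standard deviation.
by_group <- split(soil$respiration, soil$condition)
summ <- t(sapply(by_group, function(v) {
  c(n = length(v), mean = mean(v), median = median(v), sd = sd(v))
}))
summ  # one row per group, one column per summary value

# Side-by-side boxplots for comparing the two groups visually.
boxplot(respiration ~ condition, data = soil,
        xlab = "condition", ylab = "respiration")

# Turn a number from R into a full sentence instead of pasting raw output.
sentence <- sprintf("The sample median for the %s group was %.1f (n = %d).",
                    "gap", summ["gap", "median"], as.integer(summ["gap", "n"]))
sentence
```

Code like this belongs in the code appendix. Only the formatted table, the plot with its interpretation, and the written sentences go in the body of the report.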
4. Details
Your report should be in the following format:
i. Typed.
ii. A title page including your name/s, the name of the class, and the name of your instructor (me).
iii. Treat each question as a small, stand-alone report. Then staple them together at the end.
iv. Double-sided pages.
v. An appendix of your R code used to produce the results. Do not include R code in the body of your report.
For example, your project should be put together in the following order (stapled):
Cover Page
Parts I-V for first question
Parts I-V for second question
Code appendix
Feel free to make your cover page “unique” so that it is easy to find when I hand them back.
Notice: your project will be graded as a group effort (if you have two people). This means that you are responsible for your own work and your partner's work. I will not assign two different grades to one project.