STA238 – FINAL PROJECT
PROJECT PROPOSAL DUE: MARCH 21, 2022 @ 8 PM ET
IF WORKING WITH A PARTNER: PAIR WORK CONTRACT DUE: MARCH 18, 2022 @ 11:59 PM ET
FINAL PROJECT DUE: APRIL 19, 2022 @ 8 PM ET
PROJECT INSTRUCTIONS TO COME AFTER PROPOSAL FEEDBACK
The purpose of this project is to give you an opportunity to perform a basic data analysis, from beginning to end, using tools and concepts covered in our course. This includes formulating a study question, reading in and cleaning data, performing exploratory data analysis and summarizing the data with selected numerical and graphical summaries, performing inferential statistics, and communicating your findings using both technical and non-technical language. The grading rubric for the proposal is the same whether you choose to work individually or with a partner.
INDIVIDUAL WORK
For this final project, you are permitted to work with ONE partner in the class if you choose to do so. Those working in pairs will have to perform one additional analysis as part of their final project component. If you prefer to work in pairs, scroll to the section titled ‘Pair Work’.
In your individual and independent final project, you will need to perform appropriate exploratory data analysis on your data set, along with at least two of the following statistical methods, appropriately chosen for your research question:
• Statistical inference using confidence intervals (single mean/variance or two population means) OR bootstrapped confidence intervals
• Statistical inference using hypothesis testing (single mean or two population means) OR simulated hypothesis tests
• Estimator analysis through simulation methods (such as simulating new data or bootstrapping)
• Goodness of fit test
• Simple Linear Regression, including diagnostics and inference on parameters (i.e. confidence intervals and/or hypothesis testing of beta parameters are part of SLR)
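As an illustration of the first method in the list above, here is a minimal R sketch of a percentile bootstrap confidence interval for a single mean, with the classical t interval alongside for comparison. The data are simulated stand-ins (not a course data set), and the names `x`, `B`, `boot_means`, `boot_ci`, and `t_ci` are hypothetical choices for this example only.

```r
# Hypothetical sample: simulated stand-in for one numeric variable
set.seed(238)
x <- rexp(60, rate = 1 / 5)   # 60 right-skewed observations

# Resample the data with replacement B times, recording the mean each time
B <- 2000
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

# Percentile bootstrap 95% confidence interval for the population mean
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# Classical t-based 95% confidence interval, for comparison
t_ci <- t.test(x, conf.level = 0.95)$conf.int

print(boot_ci)
print(t_ci)
```

Comparing the two intervals is one way to see whether the normality assumption behind the t interval matters for your variable. Which interval is appropriate depends on your data and research question.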
PAIR WORK
These instructions apply to students who plan to work with another classmate in the course. If you choose to work with a partner, it will be your and your partner’s responsibility to establish how work is fairly shared, and to manage yourselves in establishing work expectations, meeting deadlines, and agreeing on the calibre/quality of work you expect of each other. To facilitate this, students working in a partnership will be expected to complete a pair work contract, agreed upon by both parties beforehand. The pair work contract MUST be submitted between March 15-18 for the group to be created on Quercus.
In your final project working as a pair, you will need to submit two sets of data analyses:
1. Answer one research question using simple linear regression analysis, complete with inference on parameters and diagnostics
2. Perform appropriate exploratory data analysis on your data set, along with at least two of the following statistical methods, appropriately chosen for your second research question:
• Statistical inference using confidence intervals (single mean/variance or two population means) OR bootstrapped confidence intervals
• Statistical inference using hypothesis testing (single mean or two population means) OR simulated hypothesis tests
• Estimator analysis through simulation methods (such as simulating new data or bootstrapping)
• Goodness of fit test
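For the simple linear regression component (item 1 above), the following is a minimal R sketch of a fit with inference on the beta parameters and two common diagnostic plots. The data are simulated under assumed values (intercept 2, slope 0.5), and the names `dat`, `fit`, `coef_table`, and `beta_ci` are hypothetical. Treat it as one possible workflow, not a required template.

```r
# Hypothetical data: simulated predictor and response (not a course data set)
set.seed(238)
n <- 80
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

# Fit the simple linear regression model y = beta0 + beta1 * x + error
fit <- lm(y ~ x, data = dat)

# Inference on parameters: estimates, standard errors, t statistics, p-values
coef_table <- summary(fit)$coefficients
print(coef_table)

# 95% confidence intervals for beta0 and beta1
beta_ci <- confint(fit, level = 0.95)
print(beta_ci)

# Diagnostics: residuals vs fitted values, and a normal Q-Q plot of residuals
par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)
qqnorm(resid(fit))
qqline(resid(fit))
par(mfrow = c(1, 1))
```

The residual plot is used to check for non-linearity and non-constant variance, and the Q-Q plot to check approximate normality of the errors.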
PROPOSAL INSTRUCTIONS
Your proposal should only include an outline and plan for your project, which will be submitted for TA review and feedback. You are not expected to do any analysis for your project other than the initial EDA until after the feedback has been returned to you.
To write your project proposal, you will need to do the following tasks:
1. If you are working with a partner, complete your pair work contract together, agreed to and signed by both students before proceeding. This contract must be submitted on MS Forms between March 15 and March 18 in order to be able to work with a partner. If you are working alone, start at step 2.
2. Find a data set with sufficient potential for study on Open Data Toronto (https://open.toronto.ca/) or among the open data sets in R (some R packages include additional data sets that you can explore: https://cran.r-project.org/web/packages/available_packages_by_name.html). Here are some additional data sources you can use as long as they are classified as “open datasets”:
a. Gapminder which has open global data: https://www.gapminder.org/data/
b. Kaggle which has some freely available data sets: https://www.kaggle.com/datasets (make sure to check “Open Database” in the filters)
You may not use any data set that has already been studied in our course! Keep your options open and have a few data sets selected and pick the best choice among them. For the purposes of data analysis, it is recommended you limit yourself to .csv formatted data (scroll to “Formats” and filter by “.csv” files). Keep in mind what kind of analysis can be done on the data sets you are considering. Once you have a viable data set, begin thinking about what research question can be investigated with your chosen data.
3. In 2-3 sentences, provide a description and context of your data set. This is an overview of what data was collected, the variables included (including units of measurement). If there’s a time frame, include that information, such as frequency of data collection, or date range of your data. Include a citation and link to your data source (APA/MLA/Chicago styles only).
4. State your research question clearly. Your research question should be specific to your data set and context.
e.g. “Has rainfall gotten worse?” is not a clear research question. Include relevant contextual details, such as “Has the trend in the amount of rainfall that occurs over the summer in Toronto changed over the past 10 years?”
5. In 3-4 sentences, describe which variables from your data set you plan to use in your analyses to help answer your research question, with a brief justification for your choice.
6. Perform some exploratory data analyses on your data in R to get a sense of whether
a. You have a large enough data set to work with
b. Your data has enough interesting and relevant attributes that you can apply to your research question
c. Your data seems to be feasible to work with
You should only include in your proposal a selection of graphical displays and/or summary statistics that are relevant to your research question.
7. Describe any data cleaning/refining you think you will need to do on your data set before any analysis can be conducted.
8. Briefly describe which analyses or methodologies you intend to apply to your data. This portion should include some exploratory data analyses to justify your selection of methodologies (e.g. “I am using a side-by-side boxplot over a histogram because I want to see/verify _____.” or “I plan to use confidence intervals because I want to study ____ about the data.”).
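The initial exploratory checks in step 6 and the cleaning considerations in step 7 might look something like the R sketch below. Everything in it is a labeled assumption: the file is a temporary .csv generated inside the example so the code runs on its own, and the column names `ward` and `rainfall_mm` are hypothetical placeholders for the variables in your own data set.

```r
# Hypothetical stand-in for a downloaded .csv file (generated here so the
# example is self-contained; replace csv_path with your own file path)
set.seed(238)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    ward = sample(c("North", "South", "East", "West"), 120, replace = TRUE),
    rainfall_mm = c(round(rgamma(115, shape = 2, scale = 4), 1), rep(NA, 5))
  ),
  csv_path,
  row.names = FALSE
)

# Read in the data
dat <- read.csv(csv_path)

# (a) Is the data set large enough? Check rows and columns
print(dim(dat))

# (b) What attributes are available? Check variable names and types
str(dat)

# (c) Is it feasible to work with? Count missing values per column
missing_counts <- colSums(is.na(dat))
print(missing_counts)

# Numerical summary of the variable of interest
print(summary(dat$rainfall_mm))

# One possible cleaning step: drop rows with a missing response value
clean <- dat[!is.na(dat$rainfall_mm), ]

# Graphical summaries with informative titles and axis labels
hist(clean$rainfall_mm, breaks = 15,
     main = "Distribution of rainfall (hypothetical data)",
     xlab = "Rainfall (mm)")
boxplot(rainfall_mm ~ ward, data = clean,
        main = "Rainfall by ward (hypothetical data)",
        xlab = "Ward", ylab = "Rainfall (mm)")
```

Dropping incomplete rows is only one option. Whether it is reasonable depends on how much data is missing and why, which is worth noting in the data cleaning section of the proposal.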
FORMATTING PROPOSAL
Your proposal should be written using full sentences and include proper grammar and spelling, using course-level terminology as appropriate. At the same time, your project plans and ideas should be communicated clearly.
To write your proposal, it is recommended that you organize your tasks into sections, for example:
SECTION 1: STUDY QUESTION
SECTION 2: DESCRIPTION OF DATA (2-3 SENTENCES) INCLUDING LINK AND CITATION OF YOUR DATA
SECTION 3: DESCRIPTION OF RELEVANT VARIABLES TO BE STUDIED (3-4 SENTENCES)
SECTION 4: DESCRIPTION OF METHODS THAT YOU WILL USE (2-3 SENTENCES PER METHODOLOGY)
ANTICIPATED DATA CLEANING/REFINING
INITIAL EXPLORATORY DATA ANALYSIS
LINEAR REGRESSION ANALYSIS (ONLY IF YOU ARE WORKING WITH A PARTNER)
METHODOLOGY 1
METHODOLOGY 2
The written portion of your proposal should be approximately 500 words long, no more than 1 page in length (Times New Roman/Arial/Calibri, size 12 font), and maximum two pages when including your initial exploratory data analysis.
SUBMISSION
Submission link on Quercus will be available starting March 15, 2022. Your submission must include:
§ Written proposal with EDA (Times New Roman/Arial/Calibri, size 12 font, 500 words, ≤ 2 pages)
§ R markdown file for the EDA conducted
§ Data file you used in R markdown in your EDA
If working with a partner, you should not submit your proposal until after your group has been formed on Quercus, which will be available no later than March 20, 2022 (this is based on when your group submits their pair work contract).
Note the proposal is due by 8 PM ET on March 21.
GRADING RUBRIC
Criteria 3 2 1 0
Section 1: Study Question (2 points)
2 points: Study question/goal is clear and specific to the data set selected.
1 point: Study question/goal is unclear OR is not suitable for the data set selected.
0 points: Study question is missing OR does not align at all with the data set selected.
Section 2: Description of Data (3 points)
Self-Test: Can a reader understand your data set without returning to the data source?
3 points: Context of data is clearly described, including brief description of (relevant) variables contained.
2 points: Context of data is missing some details and/or variable descriptions are missing important details (e.g. rainfall variable measures amount of daily rainfall but missing descriptions like how they are measured, and units and frequency of measurement).
1 point: Context of data is not provided OR description of variables contained in data set is missing. Citation missing.
0 points: Description of data set is missing, or the description fails the self-test.
Section 3: Variable Selection (3 points)
3 points: Variables selected are well-justified and suitable for the study question/goal and analyses planned.
2 points: Variables selected are suitable for the study question/goal and analyses planned. Justification is missing or vague.
1 point: Variables selected are not ideal for the study question/goal and analyses planned with weak or missing justification of selection.
0 points: Variables selected are not suitable for the study question/goal and analyses planned.
Section 4: Exploratory Data Analysis (2 points)
2 points: Graphical display and numerical summaries are carefully selected to visually inspect the selected variables and study question/goal. Includes appropriate details in labels, titles. Aesthetics are also carefully selected.
1 point: Graphical display and numerical summaries are not optimal for inspecting the selected variables and study question/goal. Missing significant details in labels and titles, or poor aesthetic choices (e.g. too few/many bins, selecting summaries without justification, plotting/calculating every summary, even if not very relevant to study).
0 points: Missing OR selection is not relevant to study question/goal.
GRADING RUBRIC (CONTINUED)
Criteria 6 4.5 3 1.5 0
Section 4: Methodologies (includes SLR if paired) (6 points)
6 points: The methodologies selected are appropriate for the data and relevant to addressing the study question/goal. Investigates the study question/goal from multiple perspectives.
4.5 points: The methodologies selected are appropriate for the data and relevant to addressing the study question/goal, but lack some variety in the conclusions/inferences that can be drawn (e.g. confidence interval and hypothesis testing on the same parameter).
3 points: The methodologies selected are appropriate for the data. Relevance of methodology to study question/goal is unclear.
1.5 points: At most one of the methodologies is appropriate for the data. Relevance of methodology to study question/goal is unclear.
0 points: Missing OR not appropriate for data/question/goal (e.g. simulation methods without distribution info).
STA238 – FINAL PROJECT PAIR CONTRACT
Group Members UTORid U of T Email Address and/or Primary Email Address
Before commencing any work, make sure you and your potential partner meet and write out a plan for the following events. This is a contract you and your partner agree to uphold and is intended to facilitate clear communication of expectations in quality and timeline of work for the project, conflict resolution plans, meeting availability, etc. to each other so there are no surprises or unfair distribution of work in the partnership.
In addition to the agreements below, we expect students to communicate openly and respectfully with their peers and be active participants when choosing to work in a partnership. This project is a shared final project that carries sizeable weight in your individual final course grades. Any conflicts that cannot be resolved in a reasonable amount of time (e.g. 1-2 days) must be communicated to the teaching team as soon as possible and at least 5 days before the final project due date so that we can help with the resolution process. We cannot assist if we are only notified at the last minute of ongoing conflict. For this reason, group members are urged to schedule regular check-ins to ensure both parties are making good progress on their respective duties to avoid any last-minute surprises. If conflicts arise during the proposal period, this is a good indication to reconsider working with said partner.
When working with a partner, it is recommended that each member documents and keeps a record of the work they have completed individually. Should a conflict arise that cannot be resolved, and there is no documentation of who completed what work, each partner will be expected to complete the project independently with a new data set.
This contract MUST be completed and submitted HERE between March 15 and March 18 if you plan to work with a partner. If this is not received by March 18, you will not be permitted to work together.
Communication Method
How will you and your partner communicate for this project (Teams, email, Zoom, Shared document messages, in-person, etc.)? How often do you expect your partner to check for updates/messages? Are there ‘blackout’ times that both parties should respect (e.g. no messages after 10 PM)?
E.g. We will communicate over WhatsApp about our individual progress, which we should each check every other day. Any changes to plans will be communicated as soon as possible. No messages between 12 AM and 9 AM please!
Meeting Plans
How often and how long do you and your partner expect to work/discuss synchronously about the proposal and final project? Note the due date of March 21 is firm.
E.g. We plan to have one meeting on the week of ____ to decide how to split the work, another meeting to put together our individual parts, and a third meeting to review the proposal to make sure the text flows seamlessly.
Other Academic/Personal Obligations
Are there any other important academic or other obligations that you know will impact your availability to work on the project? If so, each member should list the dates/times they know they will be unavailable due to other obligations. Note: You do not need to disclose the nature of these obligations, but this helps set expectations so there are no surprises.
Shared Work Plan
Briefly outline how you and your partner plan to share the workload for this proposal and project. Both members are expected to contribute equally to the project, but this does not necessarily mean that both members will do the exact same work. E.g. Being responsible for editing, one type of analysis, peer reviewing, etc.
What tangible work does each member expect to have completed and by what date?
Conflict Resolution
If something comes up that will impact your timeline, you agree to disclose this to your partner ASAP and have a discussion on how this will affect your group/shared work. It’s a good idea to plan for these scenarios. In the case where you or your partner is suddenly unable to meet a deadline or does not turn in work by the agreed upon date, what resolution plans do you and your partner have in place?
Note: If issues are not resolved after 1-2 days, please alert the teaching team no later than 3 days before the proposal due date and 5 days before the project due date.
Miscellaneous
Are there any other specific needs you or your partner expect when working together that haven’t been addressed above? Discuss and list any agreed upon expectations below.
By signing below, you agree to adhere to the contract and expectations. Upon signing, each group must submit ONE copy of the signed group contract to the following MS Forms. Make sure to name your file with your UTORids in the following format: UTORid1-UTORid2.pdf.
Group Members UTORid Signature