Methods of Data Analysis 1
University of Toronto
Department of Statistical Sciences STA302H1F/1001H1F Fall 2022
Updated by Departmental Request on Sept. 13, 2022
—————————————————————————————————————————————–
Section details:
LEC0101/2001: Wed. 11am-1pm, Fri. 11am-12pm
LEC0201: Wed. 3pm-5pm, Fri. 3pm-4pm
Classroom: MS 3153 (all above lectures)
Instructor: Dr. UY9167
Course email:
Office Hours:
—————————————————————————————————————————————–
COURSE OVERVIEW
Course Description: The course provides a solid introduction to data analysis with a focus on the theory and application of linear regression. Topics to be covered include: initial examination of data, correlation, simple and multiple regression models using least squares, inference for regression parameters for normally distributed errors, confidence and prediction intervals, model diagnostics and remedial measures when the model assumptions are violated, interactions and dummy variables, ANOVA, and model selection and validation. Statistical software will be used throughout and will be required for the completion of various assessments during the term. The development of strong written communication skills will be emphasized.
Learning Outcomes: By the end of this course, all students should be able to:
1. Recognize the importance of assumptions and limitations of linear regression models to gauge when linear models are appropriate to use and to be critical of their results.
2. Interpret the results of an analysis involving linear models for technical and non-technical audiences.
3. Apply methods of linear models and data cleaning to new datasets correctly using statistical software in a reproducible way.
4. Explain statistical concepts and theory of linear models to various audiences as would be required in the job market or collaborative environment.
5. Outline the correct use of linear models in a coherent and reproducible analysis plan.
6. Apply and extend linear model theory through completion of problem-solving questions.
Pre-requisites: Pre-requisites are strictly enforced by the department, not the instructor. If you do not have the equivalent pre-requisites, you will be un-enrolled from the course. Students should have a second year statistics course, such as {STA238, STA248, STA255, or STA261}, a computer science course such as {CSC108, CSC120, CSC121, or CSC148} and a mathematics course such as {MAT221(70%), MAT223, or MAT240} or equivalent preparation as determined by the department.
COURSE MATERIALS
Course Content: We have a common Quercus course page for sections L0101/2001 and L0201 of this course. All lecture slides, any recordings and materials will be posted on this Quercus course page. Further, any important announcements will also be posted in Quercus. Please make sure to check it regularly.
Textbook: This course does not strictly follow any particular textbook, but rather merges material from a number of sources. All of the below recommended textbooks are freely available as an electronic copy through the University of Toronto Library. Our two primary reference texts will be
• Linear Models in Statistics, 2nd edition by . Rencher and G. (Wiley).
• A Modern Approach to Regression with R, by . Sheather (Springer)
Other helpful references from which practice problems may be assigned are:
• Applied Regression Modeling, 2nd edition, by (Wiley).
• Methods and Applications of Linear Models, 2nd edition, by . Hocking (Wiley)
• Applied Linear Regression, 3rd edition, by (Wiley).
These are all useful books, but may present the material in a different order or in a different way. They are still good for additional explanation and practice problems. Other useful resources will be posted on the Quercus course page.
Statistical Software: We will be using the R Statistical Software for performing statistical analyses in this course. R is free software that can either be downloaded onto your personal computer or used in a cloud environment. We encourage all students to use RStudio through the JupyterHub for University of Toronto. This will allow you to log in with your official UofT credentials and use RStudio without the need for a local installation, on any device that has access to an internet connection. More information about using RStudio in JupyterHub will be provided early in the term. R code shown in class will be available on the course page and, along with any additional resources, should be sufficient to complete any assessment involving data analysis.
COURSE COMPONENTS
Lectures: Lectures will be conducted in person in MS 3153 for sections L0101/2001 and L0201. Slides will be available no later than the Tuesday night preceding class. Class time each week will consist of a combination of lecturing, in-class activities, and code-along sessions. Where possible, you are encouraged to bring a laptop or tablet to follow along with the code.
Office Hours: Instructor and TAs will hold office hours in a combination of online and in-person formats. The office hour schedule and mode of delivery will be posted on Quercus once finalized. It is recommended that you visit office hours whenever you have a question about the material. It is always important to have material clarified as quickly as possible. Don’t wait until the last minute to ask your questions!
ED Discussion Board: We will be using the ED-STEM Discussion Board as an online discussion forum, which can be accessed through the Quercus course page. All questions about course material should be posted here or asked during TA/instructor office hours. The instructor and TAs will monitor the board and will help answer questions, but students are encouraged to answer posts and help their fellow classmates.
COMMUNICATION
How your instructor will communicate with you: All communication will be made through Quercus announcements or during lectures. Please ensure that you check Quercus regularly so you don’t miss anything important.
Where to send content questions: We will be using the ED Discussion board to collect student questions regarding course content, assignments, etc. All questions should be posted here.
When to email the instructor: The instructor will only respond to emails of a private or sensitive nature. If you email the instructor with content-related questions, you will be asked to repost your question on the content board so the answer may benefit all students. Should you need to email the instructor about a matter of a sensitive or personal nature, please use your official mail.utoronto.ca email and include your full name, student number and lecture section (e.g. L0101) in the text. Send all course-related emails to the course email address. Please allow up to 48 hours for a reply. Emails will not be monitored on evenings and weekends.
A note on email and discussion board etiquette: Please make sure that you communicate politely and respectfully with all members of the teaching team and your fellow classmates. Written communications can sometimes take a tone other than what was intended (e.g. can come off as dismissive, rude or insulting), so make sure you re-read or read out loud your email/post before sending it to make sure it has the tone you intended. For more tips on respectful communication, see professional communication tips. The ED discussion board is a teaching and learning tool and therefore should only be used as such. Any posts that detract from the learning goal of the board will be removed to keep the board a safe space.
GRADING SCHEME
Both undergraduate and graduate students will be offered two grading schemes that will be used to calculate your final grade. Your final grade for the course will automatically be determined by the higher of the two grading schemes. Undergraduate students will have the grading scheme as outlined below.
Graduate students will use the same grading schemes, with the exception that the Term Test will be worth 20% (or 24%, depending on scheme) while the Final Written Report (Part 3) will be worth 40%.
Assessment | Date Due/Occurring
In-class Participation | Ongoing during class time
Pre-requisite and Syllabus Quiz | By Sept. 26 at 11:59PM ET
Reproducible Writing Exercise Part 1: Draft/Create | Sept. 26 by 11:59PM ET
Reproducible Writing Exercise Part 2: Peer Feedback/Assess | Oct. 3 by 11:59PM ET
Reproducible Writing Exercise Part 3: Final Draft/Revise | Oct. 7 by 11:59PM ET
Term Test (during scheduled class), LEC0101/2001 | Oct. 26 11:10-12:50PM ET
Term Test (during scheduled class), LEC0201 | Oct. 26 3:10-4:50PM ET
Final Project Part 1: Research Question/Proposal | Oct. 20 by 11:59PM ET
Final Project Part 2: Analysis Flowchart | Nov. 17 by 11:59PM ET
Final Project Part 3: Written Final Report | Tentatively Dec. 20
Final Project weights under both grading schemes: Part 1 12.5%, Part 2 12.5%, Part 3 30%.
Please note that the last day to drop the course without penalty is November 16, 2022.
MINIMUM PASSING REQUIREMENT
In order for the instructor to be able to reasonably assess the ability of each student with the course material, a minimum amount of work must be submitted to provide enough evidence of proficiency. To this end, students must submit the following assessments in order to be considered for a passing grade in the course: the term test, and part 3 of the final project. As these are summative assessments, if a student fails to submit one or more of these assessments (even if all other assessments have been completed), it will not be possible to gauge the student’s proficiency with the material, and the student will therefore not be able to pass the course.
EVALUATION BREAKDOWN
In-Class Participation: We will be using Poll Everywhere to pose questions to students during lecture. Students will be asked to respond to these questions on any electronic device using Poll Everywhere. In order for your participation to be recorded towards your final grade, students should register for an account at PollEv.com/katherinedai702/register and use their UofT email address (this is how we will match responses to you for credit). It will be the student’s responsibility to ensure that they are logged into this account at the start of each class.
In-class participation is optional. Should a student choose not to participate, their final grade will be computed using Scheme 2. For students who opt in to participation, your in-class participation grade will be computed using the scheme below. Note that this scheme allows you to miss up to 25% of questions posed and still receive full credit for your participation throughout the term. Due to this flexibility, there is no accommodation for missed in-class participation for any reason.
% Polls Answered | Participation Mark (/4%)
0 < % answered < 25 | 0%
25 ≤ % answered < 50 |
50 ≤ % answered < 75 |
75 ≤ % answered ≤ 100 |
Syllabus and Pre-requisite Quiz: There will be 1 short multiple choice quiz early in the term to ensure that students are prepared for the course in terms of their knowledge of prerequisite material and the syllabus content. This quiz will be conducted on Quercus and will be open for students to take at any time until the deadline. Students will get 2 attempts and the highest score will be counted towards their final grade. On each attempt, students will be given 1 hour to complete the quiz, and each question will show up one at a time and will be locked once the question has been answered.
Reproducible Writing Exercise: This exercise is to highlight the importance of writing in science, specifically in a way that another independent researcher could reproduce what you have done based solely on a summary of your process. It also provides an opportunity for students to experience the scientific review and editing process. It will take place in three parts:
• Part 1 - Draft/Create: Students will submit a draft summary of a data analysis process that they applied to a dataset. This will be graded for completion only (i.e., on completeness of the submission, not on its quality).
• Part 2 - Peer Feedback/Assess: Students will have their draft reviewed by another student (peer) who will attempt to replicate their analysis. The reviewer will provide comments on what is good and what could be improved in the draft. This will be graded for completion only (i.e., on completeness of the submission, not on its quality).
• Part 3 - Final Draft/Revise: Students will revise their original draft, taking into account the feedback provided by their peer reviewer and submit their final product for grades. They will also rate the feedback provided to them by their reviewer based on helpfulness.
Term Test: The term test will be conducted in person during each section’s scheduled Wednesday lecture time on Wednesday, October 26 (see the section details at the top of this document). The test will be approximately 1 hour and 40 minutes long and will cover material from Weeks 0-6. All students will be required to write the test in the lecture time in which they are enrolled. More details will be communicated closer to the test date.
Final Project: The final project will be due during the final assessment period (date to be confirmed as soon as possible) and will consist of a data analysis on a novel dataset of your choice. Students will be required to demonstrate their understanding of the methods taught in lecture by developing a reasonable regression model that addresses a valid research question using the techniques taught in class. The students will be responsible for choosing the correct methods to apply and providing appropriate justifications defending their choices. The final project is a scaffolded assessment involving 3 parts:
• Part 1 - Research question and dataset selection: Students must find a dataset available online and define a research question that can be answered with this dataset using linear regression. Students will need to explain why their research question is important and how linear regression may be used to answer it. A short exploratory data analysis of the chosen dataset will also be required.
• Part 2 - Analysis Plan Flowchart: Students will be asked to put together a flowchart outlining the steps that they plan to take in their data analysis for the final project on their chosen dataset. This will help in developing a consistent analysis flow and make writing the final report easier.
• Part 3 - Final Project Report: Students will put together a scientific report that outlines the relevance of their proposed research question, the process of their analysis, the results of the performed data analysis, and a discussion of the meaning of the results as well as limitations of the analysis with respect to the statistical tools used/decisions made or the data used.
The final project will be done individually, and must be typed and submitted by the deadline. More detailed instructions will be provided at a later date.
LATE ASSESSMENT AND EXTENSION REQUEST POLICY
‘No Questions Asked’ Extensions: All students will have access to 2 ‘No Questions Asked’ (NQA) extensions to help manage multiple deadlines across multiple courses and for short-term illness or other unexpected situations. These are extensions of up to 48 hours each that a student can choose to use on any eligible assessment during the term without having to request an extension from the instructor. The NQA extensions work as follows:
• Students should not notify the instructor when using these extensions - we will simply accept the work up to 2 days late without any penalty.
• These extensions can only be used one at a time (i.e. cannot be combined) and only on the Revise part (part 3) of the Reproducible Writing Exercise and parts 1 and 2 of the final project.
• Once both NQA extensions have been used, late penalties will apply on any further assessment that is turned in late.
• We will apply these extensions by default to any eligible piece of work turned in late in the order that the work is turned in. This means that, for example, if a student submits the Revise portion of the Reproducible Writing Exercise and final project part 1 both late, they will have used up both their NQA extensions and thus late penalties will apply on any other late submission. We will not entertain requests for NQA extensions to be applied to later pieces of work after they have already been used on earlier submissions, nor will a student be able to request which assessment the NQA extensions will be applied towards.
• It is recommended, where possible, to keep the NQA extensions for assessments that have larger late penalties and/or are worth more towards your final grade.
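As an illustration only, the default allocation of NQA extensions described above can be sketched in Python (the course itself uses R). The function name and the input format are assumptions made for this sketch and are not part of the course policy. It takes late submissions in the order they were turned in and assigns an extension to the first two eligible ones.

```python
NQA_LIMIT = 2  # each student has 2 'No Questions Asked' extensions


def assign_nqa(late_submissions):
    """Return the names of late submissions that receive an NQA extension.

    late_submissions: list of (name, eligible) tuples, in the order the
    late work was turned in. This input format is hypothetical.
    """
    used = []
    for name, eligible in late_submissions:
        # Extensions go to eligible late work in submission order
        # until both have been used.
        if eligible and len(used) < NQA_LIMIT:
            used.append(name)
    return used


late = [
    ("Reproducible Writing Exercise Part 3", True),
    ("Final Project Part 1", True),
    ("Final Project Part 2", True),
]
print(assign_nqa(late))
```

In this sketch the third late submission receives no extension, so the usual late penalty would apply to it.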
Extreme Situations/Prolonged Illness Extensions: Should a student be experiencing a prolonged illness or other situation that prevents them from turning in their work even after the use of their ‘No Questions Asked’ extension, they should immediately contact their instructor and College Registrar to inform them of their situation. They should also submit an Absence Declaration form on ACORN that lists every day during which they were incapacitated and unable to work. Accommodations or further extensions will not be considered without a completed declaration, and will only be considered for extreme circumstances.
Accessibility-Related Extension Requests: Students registered with Accessibility Services should notify the instructor as soon as possible if additional time is needed on assessments that are eligible for extensions. Please notify the instructor by email of your situation and cc your accessibility advisor in the process. The instructor will work with the accessibility advisor to determine an appropriate extension for your situation.
Assessment | Late Penalty/Extension Policy
Syllabus/pre-requisite quiz, and in-class participation | No late submissions/extensions.
Reproducible Writing Exercise Parts 1 & 2 | 1-hour upload grace period, not eligible for any extensions, no late submissions accepted after grace period.
Reproducible Writing Exercise Part 3 | 1-hour upload grace period, will accept submission up to 3 days after deadline with 5% lost per day late, eligible for NQA extension.
Final Project Parts 1 & 2 | 1-hour upload grace period, will accept submission up to 3 days after deadline with 10% lost per day late, eligible for NQA extension.
Final Project Part 3 | 1-hour upload grace period, not eligible for any extensions, no late submissions accepted after grace period.
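The per-day penalties in the policy above can be worked through with a short Python sketch (the course itself uses R). This is an illustration under stated assumptions, not the official calculation. It assumes that lateness is measured in hours from the deadline, that a partial day counts as a full day, and that no NQA extension is applied. The function name is hypothetical.

```python
import math

GRACE_HOURS = 1      # 1-hour upload grace period
MAX_DAYS_LATE = 3    # late work is accepted up to 3 days after the deadline


def late_penalty(hours_late, percent_per_day):
    """Return the percent lost, or None if the work is no longer accepted.

    percent_per_day is 5 for Reproducible Writing Exercise Part 3 and
    10 for Final Project Parts 1 & 2.
    """
    if hours_late <= GRACE_HOURS:
        return 0  # inside the grace period: no penalty
    # Assumption: any part of a day late counts as a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_DAYS_LATE:
        return None  # past the 3-day window
    return days_late * percent_per_day


print(late_penalty(30, 5))    # 30 hours late on RWE Part 3
print(late_penalty(30, 10))   # 30 hours late on Final Project Part 1 or 2
```

Under these assumptions, a submission 30 hours late counts as 2 days late, so it loses 10% at the 5% rate and 20% at the 10% rate.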
MISSED ASSESSMENT POLICY
If you experience a prolonged absence due to illness or emergency that prevents you from completing any number of assessments, please contact your College Registrar as soon as possible so that any necessary arrangements can be made.
Missed Quiz and in-class participation: There will be no accommodations made for missing the syllabus and prerequisite quiz due to the flexible deadline and lengthy availability of the quiz on Quercus. There will be no accommodations made for missed in-class participation due to the flexible grading scheme (see evaluation breakdown for details).
Missed Reproducible Writing Exercise: Due to the scaffolded nature of the exercise, there is no accommodation for missing Parts 1 and 2 of the exercise. Part 3 of the exercise is eligible for an NQA extension (see above for details) but no further accommodations will be made.
Missed Term Test: If a student is experiencing a serious personal illness or emergency on the date of the test, the student must declare their absence on ACORN and notify the teaching team via email no later than one week after the date of the test. A make-up test will then be scheduled at a date and time determined by the instructor. The format of the make-up is at the discretion of the instructor and may not resemble the format of the original (e.g. an oral exam).
Missed Final Project: Parts 1 and 2 of the final project are eligible for NQA extensions and late submissions will be accepted (see above for d