2022 SA3 Midterm Exam
Using the NFL play-by-play data, perform the following tasks and answer the following questions. Document all steps of this process and show your results. The final product should be an RMarkdown file in which I can clearly follow your thought process, along with a knitted PDF file.
Use the methods that we learned in class to complete the tasks described below. The dataset is very large and is a fantastic example of why using R is much better than using something like Excel. This exam is fairly open-ended, so be creative and challenge yourself, but remember that it is an exam, so you should try your best to show your mastery of the topics that we have worked on so far in class. Please number and answer all questions in your markdown file in the order that they are asked, and use a separate chunk to show the code that helped you arrive at each conclusion.
NFL FastR Data Explanation
You can use any resources (the book, labs, lectures, etc.) to complete this exam. I will be available to clarify the questions and to answer any questions that you might have about the data. Also, if you are feeling uncomfortable with American football, please reach out and I can help you formulate a research question to use for this exam. I would suggest reading every question below before you get started.
1. Load the data into R. How many total plays are there in the dataset? Describe 5 of the variables that you find interesting.
2. Is this dataset considered tidy? If it is tidy, explain why. If it is not tidy, explain why not and clean the data using the tidyr and dplyr packages.
3. Use the mutate() function to create 3 new variables. Explain why you made these variables and what these variables mean.
a. 5 Points Extra Credit: Use a join to add data from another source to this dataset and explain why it would be helpful to have this added data in the dataset.
4. Examine summary statistics of variables that seem interesting to you in the dataset. Formulate a research question and state it clearly. You will be working on this research question for the rest of this assignment.
5. This dataset is very large and has a lot of unneeded variables. Subset/clean the dataset and create a new dataset for your research question. Explain what data and variables you decided to keep. Your new dataset should only include potentially relevant variables and relevant data for your research question.
6. Perform tasks associated with the EDA process that we worked on in class. This should include comparisons of summary statistics as well as relevant simple charts using ggplot2. Explain how this information has helped you understand how to approach your research question. (Remember that it might be good to look at individual years or teams, if relevant.)
7. Go through the bivariate analysis process with your dependent variable and at least 3 independent variables that you think might be relevant. Interpret the results for each.
8. Build a relevant model using one of the methods discussed in class (Multivariate Regression, Logistic Regression). Try at least 3 different versions of the model and explain how you decided which version was best.
a. 5 Points Extra Credit: Create a CART model, or a K-means or hierarchical clustering model, instead of a regression.
9. Think about how this model would be helpful for a team in the NFL or the league and explain how they might be able to use this information.
10. No model is perfect. Think about how you might be able to improve your model, how that improvement might change your results, and what impact (if any) it would have on your research question.