Statistics 507, Fall 2021
Problem Set 6
Due Friday November 12 by 5pm.
Instructions
Complete all questions of the assignment below and submit to Canvas by the due date. Remember, if
you are using late days you should submit a draft of the assignment by the due date and leave a
comment indicating how many late days you want to use.
For this problem set, you should submit source code as a plain text markdown script (with extension
.md) and an associated Jupyter notebook (with extension .ipynb). Use Jupytext
(https://jupytext.readthedocs.io/en/latest/install.html) to associate the two files.
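As one possible way to pair the two files, the sketch below builds the Jupytext command-line call that pairs a notebook with a markdown script and runs it only if Jupytext is installed. The file name ps6.ipynb is a placeholder, and the helpers pairing_command and pair are hypothetical names used for illustration. The --set-formats option is taken from the Jupytext documentation, so check the documentation for the version you install.

```python
import shutil
import subprocess


def pairing_command(notebook):
    """Return the Jupytext CLI call that pairs `notebook` with a .md script."""
    # `--set-formats ipynb,md` asks Jupytext to keep both representations paired.
    return ["jupytext", "--set-formats", "ipynb,md", notebook]


def pair(notebook):
    """Run the pairing command if Jupytext is on PATH; return True if it ran."""
    if shutil.which("jupytext") is None:
        return False  # Jupytext is not installed, so nothing is run.
    subprocess.run(pairing_command(notebook), check=True)
    return True


# "ps6.ipynb" is a placeholder name, not one required by the assignment.
print(" ".join(pairing_command("ps6.ipynb")))
```

Running the printed command directly in a terminal is equivalent to calling pair.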
Questions on this and future problem sets may ask you to use concepts or ideas that have not been
discussed in class. One of the goals of these assignments is to help you learn to be independent, read
documentation, and otherwise make reasonable decisions about how to analyze and present data, code,
or other data science material.
You may discuss the problem set and its solution with your peers, but you are required to work
independently on the files to be submitted and to submit your own original work. If you use or closely
follow patterns or code from sources other than the course notes or texts you should cite the source.
In addition to the content of your submission, you will be graded on the quality of your source code and
the professionalism of your notebook file.
Maintain a consistent and literate style in your markdown script and work to make your notebook look
professional and well polished. Follow all style rules from previous problem sets.
Question 0 – GitHub Repository [20 points]
In this question you will create a GitHub (github.com) account and create a repository for the work
you’ve done in the course.
1. If you don’t already have one, create a GitHub account. Make sure your username is professional
and appropriate – I recommend new users use their umich unique name as their username.
2. Create a repository Stats507 (or similar if you have one of that name already). It should be a
public repository to facilitate grading.
All you need to include in your notebook is a link to this repository.
Question 1 – Using Git [50 points]
For each part in this question, please succinctly record the steps you take – through a web browser or at
the command line – to complete each part.
Provide enough detail that someone without prior knowledge of git could emulate the steps for
themselves, but otherwise be concise and omit details that are only relevant to (e.g.) how your local file
tree is set up.
1. Extract your code from PS2, Question 3 into a stand alone script or notebook and add it to the
repo.
2. Create a README that briefly documents the purpose of the repo created in the warmup. In the
README, briefly document the script you included for the previous part – state what it does and
for what purpose. The README should include a link to this file. Hint: Use a local file path for
the link.
3. Commit the changes from the previous step and push them to the remote. Include a direct link to
the commit from the remote’s history in your write up.
4. Create a branch named “ps4”. Checkout that branch and edit the file from step 3 to include
“Gender” as you did for PS4 Q1. Commit these changes to the branch and create an upstream
branch on GitHub to track this branch. Don’t delete the branch (at least until after the assignment
is graded).
5. Merge the “ps4” branch into the “main” branch. Include a direct link to the commit from the
remote’s history in your write up.
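The command-line portion of the parts above might look roughly like the following sketch, written as Python argument lists so the sequence can be inspected before it is run. Several things here are assumptions and not requirements of the assignment: the script name ps2_q3.py, the file name README.md, the commit messages, a remote named origin, and the helper names git_steps and run_steps. The edit to the script in part 4 happens by hand between the checkout and the following add.

```python
import subprocess


def git_steps(script="ps2_q3.py", branch="ps4"):
    """Return git commands for parts 1-5 as argument lists (names are placeholders)."""
    return [
        # Parts 1-3: stage the script and README, commit, and push to the remote.
        ["git", "add", script, "README.md"],
        ["git", "commit", "-m", "Add PS2 Q3 script and README"],
        ["git", "push", "origin", "main"],
        # Part 4: create and check out the branch; edit the script by hand next.
        ["git", "checkout", "-b", branch],
        ["git", "add", script],
        ["git", "commit", "-m", "Include gender as in PS4 Q1"],
        # Create the upstream branch on GitHub and track it.
        ["git", "push", "--set-upstream", "origin", branch],
        # Part 5: return to main, merge the branch, and push the result.
        ["git", "checkout", "main"],
        ["git", "merge", branch],
        ["git", "push", "origin", "main"],
    ]


def run_steps(steps, repo_dir):
    """Run each command inside `repo_dir`, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Print the sequence without running it.
for cmd in git_steps():
    print(" ".join(cmd))
```

Your write up should still record the steps you actually took, since they may differ from this sequence.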
Question 2 – GitHub Collaboration [30 points]
In this question you will extract your notes on a pandas topic from PS4, Question 0 to a script of their
own. Then, you will collaborate to aggregate these into a single document for the course.
1. In your Stats507 repo make a folder called “pandas_notes”.
2. Extract your PS4, Question 0 topic tutorial and copy it into a script called “pd_topic_XYZ.py”
replacing XYZ with your UM unique name. Include your name and UM email on a title “slide”
(markdown cell) if you don’t have one already. Include a link in your writeup to this file.
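A minimal sketch of the folder and file naming in the two steps above, using only the standard library. The helper names topic_script_path and create_topic_script are hypothetical, and "uniqname" stands in for your own UM unique name. The demonstration writes into a temporary directory, not a real repository.

```python
import tempfile
from pathlib import Path


def topic_script_path(repo_root, uniqname):
    """Return the expected location of the topic script inside the repo."""
    return Path(repo_root) / "pandas_notes" / f"pd_topic_{uniqname}.py"


def create_topic_script(repo_root, uniqname):
    """Create the pandas_notes folder and an empty topic script if missing."""
    path = topic_script_path(repo_root, uniqname)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    return path


# Demonstrate in a throwaway directory; "uniqname" is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    created = create_topic_script(tmp, "uniqname")
    print(created.relative_to(tmp).as_posix())
```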
It’s okay if this next step takes longer than the November 12 due date. There are due dates for each stage
of the tree to help us finish before the end of the term. Due dates are based on your “level” in the tree:
– Level 4 (root) – Tuesday, November 16
– Level 3 – Friday, November 19
– Level 2 – Tuesday, November 23
– Level 1 – Tuesday, November 30
– Level 0 – Friday, December 3.
3. Use the fan out tree distributed through Canvas for this step. Email the people above you on the
fan out tree (listed in columns “person1” and “person2”) to ask for a link to their Stats507
repository. Complete the remainder of this step after the people above you on the tree have finished theirs. Clone the
repositories of the people above you on the tree, create a new Python script / notebook that adds
your topic to all topics above you. Name the combined script as indicated in the fan out tree.
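One simple way to build the combined script is to concatenate the scripts from the people above you with your own, as in the sketch below. Plain concatenation is an assumption here; the fan out tree or the starter script may call for a particular layout, and the helper name combine_topics and the file names in the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path


def combine_topics(upstream_scripts, own_script, combined_path):
    """Write the upstream scripts followed by your own into one file."""
    parts = [
        Path(p).read_text(encoding="utf-8").rstrip("\n")
        for p in [*upstream_scripts, own_script]
    ]
    combined = Path(combined_path)
    # Separate topics with a blank line and end the file with a newline.
    combined.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return combined


# Demonstrate with two tiny placeholder scripts in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.py").write_text("# topic A\n", encoding="utf-8")
    (tmp / "b.py").write_text("# topic B\n", encoding="utf-8")
    out = combine_topics([tmp / "a.py"], tmp / "b.py", tmp / "combined.py")
    demo_text = out.read_text(encoding="utf-8")
print(demo_text)
```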
Try to complete this question by your due date, so you don’t hold up the fan out. Please be professional
and respectful in your communications with one another. If either person (“person1” or “person2”)
above you on the tree has not completed their part by their due date, please reach out to those above
them on the list.
Note – those at the roots (level 4) of each tree (group) have a slightly different assignment; they should
clone the starter script (https://github.com/jbhender/Stats507_F21/blob/main/ps/starter_script.py)
and add their topic notes to it.
Note to those at level 0 – please email Dr. Henderson with a link when you’ve completed your portion.
Question 3 – PS Corrections [Optional, points vary]
This question is optional. If you lost points for style, lack of professional “polish” in your notebook, or
any other easily correctable mistake on any of the graded problem sets, you may correct them here.
You may correct these types of mistakes on (up to) any 2 assignments. You may not, however,
completely redo a question or a major part of a question. For example, you will not receive credit for
replacing your solution with the official solution or a peer’s solution. The instructors have final say on
whether corrections are in the spirit of the question.
To receive credit for your corrections please follow the steps below.
1. Create a commit in your “Stats507” repository (from question 0) with the original version of your
source files for the problem set(s) you are correcting.
2. Copy, paste, and clearly label the GSI comments from Canvas into your solution for this question.
Clearly describe which comments you are making corrections for.
3. Make your corrections and commit them to your Stats507 repo. Include a direct link to the
commit (showing the diff) in your submission. All changes should be in a single commit to
facilitate this – you may wish to work in a branch if you need to make intermediate commits.
Do not include the corrections in the submission – just link to the commit showing the corrections.
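For reference, a direct link to a commit on GitHub follows the pattern built in the sketch below, and the page it points to shows that commit's diff. The user name, repository name, and hash shown are placeholders, and commit_url is a hypothetical helper. The full hash of your latest commit can be printed with git rev-parse HEAD.

```python
def commit_url(user, repo, sha):
    """Return the GitHub page for a single commit, which displays its diff."""
    return f"https://github.com/{user}/{repo}/commit/{sha}"


# All three arguments are placeholders; substitute your own values.
print(commit_url("uniqname", "Stats507", "0123abc"))
```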