APS106 2020S Final Project Due date: April 18, 11:59 pm EDT
Crunching the Numbers on COVID-19
Preamble
The Final Project focuses on the reading and manipulation of data on the worldwide spread of COVID-19. As engineers around the world, including at UofT, are developing treatments, building ventilators, redesigning supply chains, and repurposing production lines, students of APS106 will get an opportunity to access real data on COVID-19 cases, manipulate data into a form that can be visualized, and develop data structures that allow quick access to the data. We hope that this task will give you a sense of the power of coding to gather and display data and help in understanding important worldwide events.
Submit the Final Project via the MarkUs system by Saturday, April 18, 11:59 pm EDT.
Each part of the project is graded based on the following criteria:
• a “pre-project” document (called project_report.pdf) specifying an algorithm plan and a programming plan. As with the pre-labs, these plans should be created before you start coding and do not have to match exactly what you end up doing. Marking is based on the extent to which you have demonstrated thought about the task and have a clear approach to attacking it, even if that approach is modified by what you actually code. See Details of the Pre-Project Document below.
• automated tests we run via MarkUs
• a set of tests that you create: you will submit tests for your code and accompanying documentation explaining what each test accomplishes. Grading is based on the comprehensiveness of your tests and the logic of your explanations. See A Note About Tests below.
• the quality of your code: clarity, simplicity, meaningful variable names, documentation, etc.
Each part of this project has a different degree of difficulty; however, each is worth the same number of marks. See the individual “Marking Scheme” in each part below.
An important difference from the labs: as in the labs, MarkUs will evaluate your code with tests that we write; however, we are not providing those tests to you. We are providing only several very simple MarkUs tests (see A Note About Tests below). It is up to you to create tests for your code.
The project problem comes with a package of files that includes: a data file (in csv format), a title page for your pre-project document, starter code (in project.py), and helper functions that we have written for you (in project_helper.py). Please unzip the project_problem.zip file. See Description of the Starter Package below.
IMPORTANT: DO NOT CHANGE ANY FILENAMES
Problem
To complete this project, you will build a program that reads in data on the geographic spread of COVID-19 over time and then filters the data to build data structures that can be used for visualization and efficient access.
Part 1 – Parsing COVID-19 Data [15 Marks]
In this part, you will complete the parse_covid19 function which reads a file containing data on the number of COVID-19 cases worldwide. The argument to the function is: filename, a string, the filename for a csv data file.
The csv file will have the following format:
Province_State,Country_Region,Last_Update,Confirmed,Deaths
,Afghanistan,2020-04-03,281,6
,Afghanistan,2020-04-04,299,7
Alberta,Canada,2020-04-03,969,13
Alberta,Canada,2020-04-04,1075,18
Ontario,Canada,2020-04-03,3255,67
Ontario,Canada,2020-04-04,3630,94
,Norway,2020-04-03,5370,59
,Norway,2020-04-04,5550,62
Notes:
• The file has a header row as shown above.
• Each row of the file after the header contains the total number of confirmed cases and deaths for a given country and given date. There are multiple entries for the same country for different dates. There are also multiple entries for the same date and same country when the data has been broken into regions inside the country.
• Some of the rows may have an empty cell in the first column (e.g., the first data row above starts with a comma, indicating that the first cell is empty).
• You can make no assumptions about the order of the data in the file (i.e., it is not necessarily alphabetical by country name nor in chronological order).
The parse_covid19 function returns a dictionary with the following format:
{country : [(date, num_cases), (date, num_cases), …], …}.
In terms of data types, the dictionary has the following format:
{str : [(str, int), (str, int), …], …}.
The keys are strings corresponding to country name and the values are a list of tuples where each tuple is:
• a string representing a date (in YYYY-MM-DD format). This is the Last_Update date from the corresponding entry in the csv file.
• an integer which is the sum of the number of confirmed cases on that date for that country. If a country has multiple entries for a given date, those entries must be summed.
The tuples do not have to be in any particular order in the list.
Example:
{'Afghanistan': [('2020-04-03', 281), ('2020-04-04', 299)], 'Canada': [('2020-04-03', 4224), ('2020-04-04', 4705)], 'Norway': [('2020-04-03', 5370), ('2020-04-04', 5550)]}
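A minimal sketch of one way parse_covid19 could work, not the official solution. It assumes the file always has the five columns shown above, in that order, after a single header row, and that Confirmed is always a whole number. Python's standard csv module reads a leading empty cell as an empty string, so rows with no Province_State need no special handling; totals are accumulated in a nested dictionary keyed by country and then date, which takes care of summing regions, and are converted to lists of tuples at the end.

```python
import csv


def parse_covid19(filename):
    """Return {country: [(date, num_cases), ...]} from a COVID-19 csv file.

    Sketch only: assumes the columns are Province_State, Country_Region,
    Last_Update, Confirmed, Deaths, in that order, after one header row.
    """
    totals = {}  # {country: {date: running total of confirmed cases}}
    with open(filename, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines, such as a trailing newline
            country, date, confirmed = row[1], row[2], int(row[3])
            by_date = totals.setdefault(country, {})
            # Regions of the same country on the same date are summed here.
            by_date[date] = by_date.get(date, 0) + confirmed
    # Convert each inner {date: total} dictionary into a list of (date, total) tuples.
    return {country: list(by_date.items()) for country, by_date in totals.items()}


# Example, assuming the sample rows above were saved to a hypothetical "sample.csv":
# parse_covid19("sample.csv")["Canada"] -> [("2020-04-03", 4224), ("2020-04-04", 4705)]
```

Because the intermediate structure is a dictionary of dictionaries, the order of rows in the file does not matter, which matches the note that the data may be in any order.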
Part 1 Marking Scheme
Component | Value
Pre-project | 4 marks
Automated MarkUs Tests | 8 marks
Student tests and explanations | 1 mark
Code quality | 2 marks
Part 2 – Selecting Countries [15 Marks]
In this part, you will complete the select_countries function which takes a list of countries and a dictionary like the one returned by parse_covid19 and returns another dictionary of the same format containing only entries for countries in the list.
The arguments to the function in the following order are:
• country_names: a list of strings. Names of some countries.
• covid19_data: a dictionary like the one returned by parse_covid19. It is not necessarily the case that all countries in country_names have entries in covid19_data. Just skip such countries.
The select_countries function returns a dictionary with the same format as covid19_data.
Part 2 Marking Scheme
Component | Value
Pre-project | 3 marks
Automated MarkUs Tests | 9 marks
Student tests and explanations | 1 mark
Code quality | 2 marks
SPECIAL VISUALIZATION: The output of select_countries can be passed to the display_growth() function that has been written for you (see project_helper.py) to graph the growth of COVID-19 over time in the selected countries.
Part 3 – COVID-19 Country Class [15 Marks]
In this part, you will define a class called Covid_Country that will store all the data about COVID-19 cases for a specified country.
An instance of Covid_Country will have two data attributes:
• country_name: the name of the country, a str
• daily_count: the daily number of confirmed cases for the country, a list of tuples of the same format as the dictionary values in Parts 1 and 2: [(date, num_cases), (date, num_cases), …].
Covid_Country objects are created as follows:
>>> Norway = Covid_Country("Norway", covid19_data)
The constructor takes in a country name, as a string, and a dictionary, like the one generated by parse_covid19. The data attribute country_name is assigned to the string and the daily_count attribute is set to be the COVID-19 data (i.e., all daily confirmed cases) for the country in country_name in the dictionary passed in. Technically, daily_count will be an alias to the value in the dictionary. There are no additional or optional arguments to the constructor. You can assume that an entry for country_name will exist in the passed-in dictionary.
This class will also have a method day_count(date) that takes in a date, a string in the format YYYY-MM-DD, and returns the number of confirmed cases for that date for the country. If the date does not exist for that country, the method should return None.
This is what a sample instantiation of Covid_Country and a call to the day_count() method looks like with a dictionary created (in Part 1) from the sample csv file shown above:
>>> Norway = Covid_Country("Norway", covid19_data)
>>> print(Norway.country_name)
Norway
>>> print(Norway.daily_count)
[('2020-04-03', 5370), ('2020-04-04', 5550)]
>>> print(Norway.day_count('2020-04-04'))
5550
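The constructor and day_count() described above might look like the following sketch, offered as one possible approach rather than the required one. The constructor stores the dictionary value itself, so daily_count is an alias to the list in covid19_data as the text requires. Finding a date with a linear scan of the tuples is an assumption chosen for simplicity, since the tuples are not guaranteed to be in any order.

```python
class Covid_Country:
    """Confirmed-case history for a single country (sketch of Part 3)."""

    def __init__(self, country_name, covid19_data):
        self.country_name = country_name
        # Alias, not a copy: this is the very list stored in covid19_data.
        self.daily_count = covid19_data[country_name]

    def day_count(self, date):
        """Return the confirmed cases on date (YYYY-MM-DD), or None if absent."""
        for day, num_cases in self.daily_count:
            if day == date:
                return num_cases
        return None


covid19_data = {"Norway": [("2020-04-03", 5370), ("2020-04-04", 5550)]}
norway = Covid_Country("Norway", covid19_data)
norway.day_count("2020-04-04")  # returns 5550
norway.day_count("2020-05-01")  # returns None: no entry for that date
```

Falling off the end of the loop and returning None explicitly makes the "date not found" case visible to a reader, which helps when writing the Part 3 tests.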
Part 3 Marking Scheme
Component | Value
Pre-project | 3 marks
Automated MarkUs Tests | 9 marks
Student tests and explanations | 1 mark
Code quality | 2 marks
Part 4 – Building a Binary Tree [15 Marks]
In this part, you will create a class Split_Node and implement two methods: the constructor and build_tree. The tree will organize a list of Covid_Country objects (see Part 3) into a binary tree that can be quickly used to identify countries with approximately the same number of cases on a given date.
The Split_Node class will have the following data attributes:
• split_number: a float
• countries: a list of Covid_Country objects
• left: a Split_Node indicating the left child in the binary tree
• right: a Split_Node indicating the right child in the binary tree
The first method of Split_Node you will create is the constructor. There are no optional arguments to the constructor and the method should set all data attributes to appropriate default values that you choose. The constructor should enable a Split_Node object to be created with the following code:
>>> sp_node = Split_Node()
The second method is build_tree(country_list, date): country_list is a list of Covid_Country objects and date is a string in YYYY-MM-DD format. You can assume that country_list will not be empty and that each element of country_list will have an entry for date.
The build_tree method does the following:
1. Calculates the mean number of cases across all countries in the list on date and assigns that value to split_number. This calculation must use the Covid_Country.day_count() method.
2. Divides country_list into three lists:
a. List 1: Countries with a day_count() strictly less than 0.7 * split_number.
b. List 2: Countries with a day_count() strictly greater than 1.3 * split_number.
c. List 3: The rest of the countries from country_list.
3. If List 1 is not empty, it should be used in a recursive call to build_tree on a new Split_Node assigned to the left data attribute. If List 1 is empty, nothing should be done with it.
4. If List 2 is not empty, it should be used in a recursive call to build_tree on a new Split_Node assigned to the right data attribute. If List 2 is empty, nothing should be done with it.
5. List 3 should be assigned to the countries data attribute of the current Split_Node. List 3 may be empty; that is fine.
6. The return value from build_tree is None.
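A minimal sketch of Split_Node following steps 1 to 6 above, under stated assumptions rather than as the reference solution. The constructor defaults (0.0, an empty list, and None for both children) are assumptions, since the handout leaves them to you. So that the block runs on its own, it repeats a compact version of the Covid_Country class from Part 3; the country names "A" to "D" and their counts in the usage lines are made up for illustration.

```python
class Covid_Country:
    """Compact copy of the Part 3 sketch, included so this block is self-contained."""

    def __init__(self, country_name, covid19_data):
        self.country_name = country_name
        self.daily_count = covid19_data[country_name]

    def day_count(self, date):
        for day, num_cases in self.daily_count:
            if day == date:
                return num_cases
        return None


class Split_Node:
    def __init__(self):
        self.split_number = 0.0  # assumed default until build_tree sets the mean
        self.countries = []      # List 3: countries close to split_number
        self.left = None         # subtree for List 1 (low counts), if any
        self.right = None        # subtree for List 2 (high counts), if any

    def build_tree(self, country_list, date):
        # Step 1: mean of the day counts, obtained through day_count().
        counts = [country.day_count(date) for country in country_list]
        self.split_number = sum(counts) / len(counts)

        # Step 2: divide the countries into three lists.
        low, high, middle = [], [], []
        for country, count in zip(country_list, counts):
            if count < 0.7 * self.split_number:
                low.append(country)        # List 1
            elif count > 1.3 * self.split_number:
                high.append(country)       # List 2
            else:
                middle.append(country)     # List 3

        # Steps 3 and 4: recurse only on non-empty lists.
        if low:
            self.left = Split_Node()
            self.left.build_tree(low, date)
        if high:
            self.right = Split_Node()
            self.right.build_tree(high, date)

        # Step 5: the remaining countries stay at this node.
        self.countries = middle
        return None  # Step 6


date = "2020-04-04"
data = {"A": [(date, 100)], "B": [(date, 1000)], "C": [(date, 5000)], "D": [(date, 2000)]}
root = Split_Node()
root.build_tree([Covid_Country(name, data) for name in data], date)
# Mean is 2025.0, so D stays at the root, C goes right, and A and B go left,
# where they are split again around their own mean of 550.0.
```

For non-negative counts, the largest count is never below 0.7 times the mean and the smallest is never above 1.3 times the mean, so List 1 and List 2 are always strictly shorter than country_list and the recursion terminates.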
Part 4 Marking Scheme
Component | Value
Pre-project | 4 marks
Automated MarkUs Tests | 8 marks
Student tests and explanations | 1 mark
Code quality | 2 marks
SPECIAL VISUALIZATION: After running build_tree, a Split_Node object can be passed to the display_tree() function that has been written for you (see project_helper.py) to print out the groups of countries with similar counts.
Details of the Pre-Project Document
The pre-project document (like your “pre-labs”) should contain your algorithm plan and programming plan for each section of the project. Only one pre-project file can be submitted. The document must be in PDF format and typed (e.g., using Microsoft Word or other word processing software); do not submit a picture of hand-written notes. Pictures of diagrams may be included: they must be embedded in the PDF and should be small enough in file size that the entire document meets the 5 MB limit below.
Your document must start with the title page and honour statement found in APS106_FinalProject_TitlePage.docx in the starter package. Your name and student number must appear on every page of the document. See Description of the Starter Package below.
Your entry for each part of the project should not exceed approximately 500 words. This is a guideline meant to reflect our expectations and is meant to stop you spending too much time on this part of the project. Longer plans are not necessarily better – don’t go crazy.
Appendix
As explained in the next section, the tests you submit should be written as code in the project.py file. However, (hint, hint) some of the tests (*cough* Part 1) will likely also consist of small csv files. Such files should be included (copy-pasted) in your pre-project document together with the written explanation of what the test is assessing. For each test, you should include a title (e.g., “Part 1 Test 1”), a short explanation of the test, the name of the csv file, and the csv file itself. The csv files should be as short as possible – only large enough to test the intended functionality.
The entire pre-project document must be less than 5 MB. MarkUs will not accept larger documents.
APS106 is not responsible for documents exceeding the 5 MB limit.
A Note About Tests
You must submit tests for each part of the project. These tests are in the form of Python code together with specified input and expected output and a written description of what the test does and why you proposed it. With the exception of possible csv files (see above), the tests should only be in the form of Python code and comments. These tests should be written in and called from the run_my_tests() function in the project.py file (see below). If necessary, you may create other test functions; however, each one should be called from run_my_tests().
The tests will be marked based on the extent to which they logically and comprehensively test the functionality of the code in each part of the project. Each part of the project specifies its marking scheme.
The tests do not have to automatically check if the output matches what is expected. While the input and expected output must be indicated in the test documentation, it is sufficient to manually compare the output to the expected output.
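One possible shape for a student test, sketched here for Part 2: run_my_tests() below calls select_countries on a small hand-built dictionary, records the input and expected output in comments, and uses assert so that no print() calls are needed. The one-line select_countries is a hypothetical solution included only so the block runs on its own, and "Narnia" is a deliberately made-up country used to check that countries missing from covid19_data are skipped.

```python
def select_countries(country_names, covid19_data):
    """Return only the entries of covid19_data whose keys appear in country_names.

    Hypothetical Part 2 sketch: names with no entry in covid19_data are skipped.
    """
    return {name: covid19_data[name] for name in country_names if name in covid19_data}


def run_my_tests():
    data = {
        "Norway": [("2020-04-03", 5370), ("2020-04-04", 5550)],
        "Afghanistan": [("2020-04-03", 281), ("2020-04-04", 299)],
    }

    # Part 2 Test 1: select a subset and skip a country that has no data.
    # Input: the dictionary above and ["Norway", "Narnia"].
    # Expected output: a dictionary containing only Norway's entry, unchanged.
    result = select_countries(["Norway", "Narnia"], data)
    assert result == {"Norway": [("2020-04-03", 5370), ("2020-04-04", 5550)]}

    # Part 2 Test 2: an empty list of names.
    # Expected output: an empty dictionary.
    assert select_countries([], data) == {}
```

Each test pairs a short description with its input and expected output, which is the same information the written test documentation is meant to capture.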
Unlike in your labs, we are not providing MarkUs tests to evaluate your code. It is your responsibility to create sufficient tests to be confident in the correctness of your code.
We are providing some very basic MarkUs tests to you. For Parts 1 and 2, the tests check:
• The submitted file contains a function with the expected name.
• The arguments to the function are correctly named and in the correct order.
For Parts 3 and 4, the tests check:
• The submitted file contains a class with the expected name.
• The class contains all expected methods.
• The arguments to the methods are correctly named and in the correct order.
These tests are not even close to sufficient for assessing whether your code works; that is up to your tests. Our tests provide a “sanity check” to help you figure out whether you are on the right track.
You will be able to run these tests in the same way you ran MarkUs tests for labs 2-9. That is, submit your code to MarkUs and then click on “Run Tests” under the automated testing tab. After a few minutes, refresh the page and the results of the tests will be displayed.
Description of the Starter Package
Please download project_problem.zip from Quercus and unzip the file to access the starter package. Once unzipped, you will find a package that includes the following:
1. Data file (in csv format)
This file contains information on the geographic spread of COVID-19 over time. This file (or any other csv file with the same structure as this file) will be read by parse_covid19(). Please see Part 1 for a full description of the data file.
2. Starter code (in project.py)
This is where you will be writing your code for the final project. Please refer to the table below for a brief description of each component in the project.py file.
3. Helper functions that are already written for you (in project_helper.py)
This file includes two helper functions to allow you to visualize the output of your code. You do not need to modify this file. Please refer to the table for a description of the two helper functions provided.
4. The title page for your pre-project document (APS106_FinalProject_TitlePage.docx)
This file must be used as the first page in your pre-project document. Please fill it out with the required information, including your name and student number on each page.
Item | Description
parse_covid19 function | Write this function to solve Part 1 of the project.
select_countries function | Write this function to solve Part 2 of the project.
Covid_Country class | Write this class to solve Part 3 of the project.
Split_Node class | Write this class to solve Part 4 of the project.
run_my_tests function | This function is provided to assist with testing your project. You must include all your test cases in this function. See A Note About Tests above for details.
run_tests flag | Set this flag to True to run your test cases in the run_my_tests function. Set this flag to False if you do not want to run your test cases.
run_visualization flag | As an optional activity, set this flag to True to visualize the output from the select_countries function graphed over time. You will also see a representation of your binary tree. To turn off the visualization, set the flag to False.
if __name__ == "__main__": | You should not make any changes to the remainder of the project.py file. This code is used to run your tests and output visualizations of your project code.
display_growth function | This function is already written for you in project_helper.py. As an optional activity, pass the output of select_countries into display_growth() to view a graphical representation of the growth of COVID-19 over time.
display_tree function | This function is already written for you in project_helper.py. As an optional activity, pass a Split_Node object and date string into this function to print your binary tree.
Table 1: Description of the starter and helper code.
Submitting Your Project
The APS106 Final Project is due by April 18th 11:59PM (EDT) on MarkUs. The late penalty is –20% per 24-hour period past the deadline; any submissions considered incomplete are subject to the late penalty until all parts are submitted correctly.
All students are encouraged to carefully check and then submit their projects prior to the deadline, with all required components.
Please note that technical difficulty is not in itself a valid reason for late submissions: students are strongly encouraged to verify and submit their completed assignments early, rather than wait until the last minute. Issues such as overloading of the MarkUs system, exceeding the 5 MB size limit, and intermittent outages are all part of the reason why we encourage students to submit early.
Separately, the minimum 1-week completion time is in place not as an indication that this project will take a week to complete (it should NOT take the majority of students that long to finish), but rather to account for the challenges of moving this to an online system. More time doesn’t necessarily mean a better product: work smartly and you can do this in a few days at most.
The APS106 team will be using advanced similarity-checking software on all submissions and will notify students as per university regulations if academic misconduct is detected. We have issued, and will continue to issue, Academic Misconduct penalties as appropriate, especially with this Final Project.
Your work for this project will be submitted as two files:
1. A Python file named project.py containing all the required functions and classes for each part of the project along with your tests and supporting documentation.
2. A PDF named project_report.pdf containing the pre-project document and appendix. The file size must be less than 5 MB. Larger files will not be accepted.
Both files should be submitted to the “Final_Project: Project” assignment on MarkUs. MarkUs will only accept submission of the two files listed above. After submission, you should review the submitted files listed under the “Submissions” tab and verify that both files correctly uploaded to MarkUs.
IMPORTANT: Do not change any file, function, class, or method names. Do not include any input() or print() statements in the submission of your project.py file.
Academic Integrity
The project is “open book” meaning that students are allowed to use a Python IDE (e.g. Wing 101), all course material (lecture notes and videos, labs, textbook), and other offline and online resources with the following restrictions:
• students are not to ask questions or otherwise consult any other person (either in the course or not in the course) other than the course instructors via the APS106 piazza site;
• students should not post answers to Piazza or any other bulletin board site on any topic that can reasonably be understood to be relevant to the final project;
• students are not to submit work not wholly created by the student;
• students should not copy code from anywhere;
• students are not to collaborate in any way – this is an individual assignment.
Doing any of the above is an academic offense. We will be using tools to detect such offenses.
Submission of your final assessment package constitutes agreement with the following statement.
In submitting this assessment, I confirm that my conduct adheres to the Code of Behaviour on Academic Matters. I confirm that I did not act in such a way that would constitute cheating, misrepresentation, or unfairness, including but not limited to, using unauthorized aids and assistance, impersonating another person, and committing plagiarism. I pledge upon my honour that I have not violated the Faculty of Applied Science & Engineering’s Honour Code during this assessment.
Given the difficult and unusual situation in which this final assessment is being administered, any academic offences will be pursued to the full extent of University regulations. Note that it is standard FASE policy that students who are found to have committed an academic offense are not allowed to subsequently drop a course. Please think about this given the CR/NCR and late drop policy that the FASE has adopted for the 2020S semester.