There are 2 projects to choose from.
(1) Food networks, based on and extending the Barabasi paper.
(2) Coronavirus outbreak: dataset from Kaggle or Power BI (2 options!)
Due Dates
Each project has 3 parts:
Part 1 – Preliminary Analysis: April 23, 2020
Part 2 – Presentations (all team members must present): May 14 or 21, 2020
Part 3 – Code and Data: May 25, 2020, via LMS
The project is worth 30% of the final grade.
Projects must be undertaken in groups of 2-4 (discuss with the coordinator if you need a different group size).
Submit a zipped folder containing: the Jupyter Notebook, a PDF version of the report, the presentation slides (PDF or PPT format), and supporting data.
We will have 10-minute team presentations in class in the last 2 weeks of classes.
Project Option 1
The paper is uploaded together with the project descriptions.
Please also download the tables provided by the authors to recreate their results.
Next, obtain Russian food recipe data from any repository or website you choose, create an additional Russian food network, and compare it to the networks in Barabasi’s paper.
Russian food is a must, and anything that is not in the paper is a benefit!
Part 1 Submission:
Submit your code and a table of the Russian food recipes in the same format as the data available from the authors. Outline which comparisons you will make as you develop the project further.
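As a starting point for the network itself, here is a minimal sketch of one simple construction: an ingredient co-occurrence network, where two ingredients are linked if they appear in the same recipe and the link weight counts how many recipes they share. This is an illustration only and is not necessarily the construction used in the paper; the recipes below are made up, and the function name `cooccurrence_edges` is hypothetical. The format of your submitted table should follow the authors' tables, not this example.

```python
from collections import Counter
from itertools import combinations

# Made-up sample recipes (assumption); replace with the Russian recipes you collect.
recipes = [
    ["beet", "cabbage", "potato", "dill"],
    ["potato", "dill", "sour cream"],
    ["beet", "sour cream", "dill"],
]

def cooccurrence_edges(recipes):
    """Count how many recipes each unordered ingredient pair shares."""
    edges = Counter()
    for ingredients in recipes:
        # Sort and deduplicate so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(ingredients)), 2):
            edges[pair] += 1
    return edges

edges = cooccurrence_edges(recipes)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list can be loaded into whichever network library you use in your notebook.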
Project Option 2
Download the latest coronavirus dataset from Kaggle:
[NeurIPS 2020] Data Science for COVID-19 (DS4C)
Two files contain patient_id, so you can match patients across them. Use both files to construct a network that grows with the chronological progression of infections. Investigate the network growth, and investigate any other factors relevant to the virus spread: for example, is infection more likely in men or women, or in some age groups?
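One possible way to look at network growth is to track the cumulative number of nodes and links by date. The sketch below assumes you have already derived a list of links of the form (source patient, target patient, date the target was confirmed); how you derive those links from the two files is up to you. The sample links and the function name `growth_by_date` are hypothetical.

```python
from datetime import date

# Hypothetical links (assumption): (source_patient, target_patient, date target was confirmed).
links = [
    ("P1", "P2", date(2020, 2, 1)),
    ("P1", "P3", date(2020, 2, 3)),
    ("P2", "P4", date(2020, 2, 3)),
    ("P3", "P5", date(2020, 2, 6)),
]

def growth_by_date(links):
    """Return [(date, cumulative_nodes, cumulative_edges)] in chronological order."""
    nodes = set()
    edge_count = 0
    timeline = []
    for source, target, day in sorted(links, key=lambda link: link[2]):
        nodes.update((source, target))
        edge_count += 1
        if timeline and timeline[-1][0] == day:
            # Same day as the previous link: overwrite that day's totals.
            timeline[-1] = (day, len(nodes), edge_count)
        else:
            timeline.append((day, len(nodes), edge_count))
    return timeline

timeline = growth_by_date(links)
for day, n_nodes, n_edges in timeline:
    print(day.isoformat(), n_nodes, n_edges)
```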
You will need to decide which location to use for each patient, as most patients have multiple reported locations on different days before being confirmed as infected.
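One candidate rule, shown here only as an example, is to keep the last location reported on or before the confirmation date. The sketch below assumes route records of the form (patient, visit date, location) and a separate mapping from patient to confirmation date; the sample values and the function name `last_location_before_confirmation` are hypothetical. Other rules (first location, most frequent location) may suit your analysis better.

```python
from datetime import date

# Hypothetical route records (assumption): (patient_id, visit_date, location).
routes = [
    ("P1", date(2020, 1, 28), "city_a"),
    ("P1", date(2020, 1, 30), "city_b"),
    ("P1", date(2020, 2, 2), "city_c"),  # after confirmation, so ignored
    ("P2", date(2020, 2, 1), "city_d"),
]
# Hypothetical confirmation dates per patient (assumption).
confirmed = {"P1": date(2020, 1, 31), "P2": date(2020, 2, 4)}

def last_location_before_confirmation(routes, confirmed):
    """Pick, per patient, the latest location reported on or before confirmation."""
    chosen = {}
    # Visiting records in date order means later visits overwrite earlier ones.
    for patient_id, visit_date, location in sorted(routes, key=lambda r: r[1]):
        if patient_id in confirmed and visit_date <= confirmed[patient_id]:
            chosen[patient_id] = location
    return chosen

chosen = last_location_before_confirmation(routes, confirmed)
print(chosen)
```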
Part 1 Submission:
Provide code to combine the patient information across the 2 files, together with the combined dataset. Outline how you will store the gender and age information as you develop the project further.
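A minimal sketch of such a combination, using only the Python standard library, might look like the following. Only patient_id is taken from the description above; the other column names (sex, age, date, city) and all sample rows are assumptions for illustration, so check them against the actual files. The function name `combine` is hypothetical.

```python
import csv
import io

# In-memory stand-ins for the two files (made-up rows; column names other
# than patient_id are assumptions).
info_csv = """patient_id,sex,age
1001,male,50s
1002,female,30s
1003,female,20s
"""
route_csv = """patient_id,date,city
1001,2020-01-22,city_a
1001,2020-01-24,city_b
1002,2020-01-25,city_a
"""

def combine(info_file, route_file):
    """Attach each patient's attributes to every one of their route rows."""
    info = {row["patient_id"]: row for row in csv.DictReader(info_file)}
    combined = []
    for route in csv.DictReader(route_file):
        patient = info.get(route["patient_id"])
        if patient is None:
            continue  # route row with no matching patient record
        combined.append({**patient, **route})
    return combined

combined = combine(io.StringIO(info_csv), io.StringIO(route_csv))
for row in combined:
    print(row)
```

With real files, pass objects opened with `open(path, newline="")` instead of `io.StringIO`. Note that this behaves like an inner join: patients with no route rows (1003 in the sample) do not appear in the result, so decide whether that is what you want.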
Alternatively, you can download daily numbers for the global virus spread from Power BI (Global Case Locations): https://app.powerbi.com/view?r=eyJrIjoiMWI4Y2IzZGItYTc0Ni00YWQwLWIyYzEtNzUwYzlmYjkwMzVhIiwidCI6IjliNmExNzUwLTAzYTItNDJkNC05MGM2LWU4ZmM2NTNjYzRjOCIsImMiOjZ9
Check whether you can write a program that supplies the dates to the app, to make it easier to obtain the daily data. You will then be able to show how the numbers grow each day and where. Analyse the data to show how the virus spreads: are there nodes that accumulate cases faster, which nodes are able to reduce their case numbers faster than others, etc.
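Once you have daily cumulative counts per node, a simple first comparison is the number of new cases per day and the average day-over-day growth factor. The sketch below assumes the data has already been collected into one list of cumulative counts per location; the numbers are made up, and the function names are hypothetical. It does not cover retrieving the data from the app.

```python
# Hypothetical cumulative case counts per location, one value per day (assumption).
cumulative = {
    "region_a": [10, 15, 25, 45],
    "region_b": [100, 110, 115, 117],
}

def daily_new_cases(series):
    """Difference between consecutive cumulative counts."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def average_growth_factor(series):
    """Mean ratio of each day's cumulative count to the previous day's."""
    ratios = [
        later / earlier
        for earlier, later in zip(series, series[1:])
        if earlier > 0
    ]
    return sum(ratios) / len(ratios) if ratios else float("nan")

# Rank locations from fastest to slowest average growth.
ranked = sorted(
    cumulative,
    key=lambda name: average_growth_factor(cumulative[name]),
    reverse=True,
)
for name in ranked:
    print(name, daily_new_cases(cumulative[name]))
```

A falling sequence of daily new cases, as in region_b here, is one simple signal that a node is slowing the spread.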
If you prefer to work with other data available from Power BI, please get approval first. You may be able to compare several countries; there is state-level data for the USA, Australia and Canada that may be interesting to compare.