In [1]:
# Basic
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading the dataset
1. Specify the link as url
2. Load the data into a pandas DataFrame
3. Convert the dateRep column into a timestamp
In [2]:
url = 'https://opendata.ecdc.europa.eu/covid19/casedistribution/csv'
covid_data = pd.read_csv(url)
covid_data['timestamp'] = pd.to_datetime(covid_data.dateRep, format='%d/%m/%Y')
covid_data.head()
Out[2]:

|   | dateRep | day | month | year | cases | deaths | countriesAndTerritories | geoId | countryterritoryCode | popData2019 | continentExp | Cumulative_number_for_14_days_of_COVID-19_cases_per_100000 | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 06/08/2020 | 6 | 8 | 2020 | 67 | 4 | Afghanistan | AF | AFG | 38041757.0 | Asia | 2.578745 | 2020-08-06 |
| 1 | 05/08/2020 | 5 | 8 | 2020 | 82 | 6 | Afghanistan | AF | AFG | 38041757.0 | Asia | 2.896817 | 2020-08-05 |
| 2 | 04/08/2020 | 4 | 8 | 2020 | 37 | 4 | Afghanistan | AF | AFG | 38041757.0 | Asia | 2.975677 | 2020-08-04 |
| 3 | 03/08/2020 | 3 | 8 | 2020 | 0 | 1 | Afghanistan | AF | AFG | 38041757.0 | Asia | 3.246433 | 2020-08-03 |
| 4 | 02/08/2020 | 2 | 8 | 2020 | 0 | 0 | Afghanistan | AF | AFG | 38041757.0 | Asia | 3.703825 | 2020-08-02 |
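
The date conversion can also be checked offline, which helps if the ECDC endpoint is slow or unavailable. The following is a minimal sketch on a hand-made frame: the `sample` rows are made up for illustration, and only the `dateRep` column name and the `%d/%m/%Y` format are taken from the cell above.

```python
import pandas as pd

# Hypothetical sample rows that mimic the day/month/year strings in dateRep
sample = pd.DataFrame({
    "dateRep": ["06/08/2020", "05/08/2020", "31/07/2020"],
    "cases": [67, 82, 71],
})

# An explicit format stops pandas from guessing a month-first layout
sample["timestamp"] = pd.to_datetime(sample["dateRep"], format="%d/%m/%Y")

# The first two rows fall in August, the last one in July
print(sample["timestamp"].dt.month.tolist())
```

Passing the format explicitly matters here because a string such as 06/08/2020 is ambiguous: without it, the value could be read as 8 June instead of 6 August.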

Display Methods
Define methods for displaying the raw dataset:
• plot_history() takes a dataframe for one continent and draws one subplot per country, showing the raw case and death counts over time; the y-axis is on a log scale
• plot_histogram() takes a dataframe for one continent and draws one subplot per country, showing the raw case and death counts as a histogram; the y-axis is on a log scale
These methods are examples of how to visualise the dataset and identify outliers or potentially false data points. The intention is to provide simple visualisations that help with selecting data for the experiments.
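
One check that supports this goal: zero and negative values cannot be drawn on a log-scaled axis, so it is worth listing them before relying on the plots. The following is a minimal sketch, assuming the `cases` and `deaths` column names from the dataset; the helper name `non_positive_rows` and the sample values are made up for illustration.

```python
import pandas as pd

def non_positive_rows(df, columns=("cases", "deaths")):
    """Return the rows where any of the given columns is zero or negative."""
    mask = (df[list(columns)] <= 0).any(axis=1)
    return df.loc[mask]

# Hypothetical sample: row 1 has zero counts, row 2 has a negative case count
sample = pd.DataFrame({
    "geoId": ["AA", "AA", "AA", "BB"],
    "cases": [12, 0, -3, 40],
    "deaths": [1, 0, 2, 5],
})

suspect = non_positive_rows(sample)
print(suspect)
```

Rows returned by such a helper are not necessarily wrong (a day with zero reported cases is plausible), but they are the ones to inspect by hand.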
In [3]:
def plot_history(covid_data_continent):
    countries = covid_data_continent.geoId.unique()

    fig = plt.figure(figsize=(30, 20))

    # Subplot indices are 1-based; enumerate covers every country
    # (the 8 x 7 grid holds at most 56 subplots)
    for num, country in enumerate(countries, start=1):
        plot_df = covid_data_continent.loc[covid_data_continent.geoId == country]
        ax = fig.add_subplot(8, 7, num)
        plot_df.plot(x="timestamp", y=['cases', 'deaths'], title=plot_df.countriesAndTerritories.iloc[0], ax=ax, legend=False)
        ax.set_yscale('log')  # set the scale on this subplot explicitly

    plt.tight_layout()
    plt.show()

def plot_histogram(covid_data_continent):
    countries = covid_data_continent.geoId.unique()

    fig = plt.figure(figsize=(30, 20))

    # Subplot indices are 1-based; enumerate covers every country
    # (the 8 x 7 grid holds at most 56 subplots)
    for num, country in enumerate(countries, start=1):
        plot_df = covid_data_continent.loc[covid_data_continent.geoId == country]
        ax = fig.add_subplot(8, 7, num)
        # A histogram bins the values themselves, so no x column is needed
        plot_df.plot.hist(y=['cases', 'deaths'], title=plot_df.countriesAndTerritories.iloc[0], ax=ax, legend=False)
        ax.set_yscale('log')  # set the scale on this subplot explicitly

    plt.tight_layout()
    plt.show()
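
Both functions place the subplots on a fixed 8 × 7 grid, which holds at most 56 countries. One possible alternative is to derive the grid from the number of countries. The following is a minimal sketch; the helper name `grid_shape` and its `n_cols` parameter are made up for illustration and are not part of the notebook.

```python
import math

def grid_shape(n_items, n_cols=7):
    """Return (rows, cols) large enough to hold n_items subplots."""
    n_rows = max(1, math.ceil(n_items / n_cols))
    return n_rows, n_cols

print(grid_shape(54))  # (8, 7)
print(grid_shape(6))   # (1, 7)
```

Inside the plotting functions this could be used as `rows, cols = grid_shape(len(countries))` followed by `fig.add_subplot(rows, cols, num)`, so a continent with few countries does not leave most of the figure empty.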

Displaying data
• Here we show how to subset the dataset and use the plot methods to display it.
In [4]:
covid_data.continentExp.unique()
Out[4]:
array(['Asia', 'Europe', 'Africa', 'America', 'Oceania', 'Other'],
      dtype=object)
In [5]:
plt.rcParams["figure.figsize"] = (20, 6)
plot_df = covid_data.loc[covid_data.geoId == 'NO']
plot_df.plot(x="timestamp", y=['cases', 'deaths'], title=plot_df.countriesAndTerritories[:1].values[0])
#plt.yscale('log')
Out[5]:


In [6]:
plot_df.plot(x="timestamp", y=['cases', 'deaths'], title=plot_df.countriesAndTerritories[:1].values[0])
plt.yscale('log')


In [7]:
#plot_history(covid_data.loc[covid_data.continentExp == 'Oceania'])
#plot_history(covid_data.loc[covid_data.continentExp == 'America'])
#plot_history(covid_data.loc[covid_data.continentExp == 'Asia'])
plot_history(covid_data.loc[covid_data.continentExp == 'Europe'])


In [8]:
plot_histogram(covid_data.loc[covid_data.continentExp == 'Europe'])


In [9]:
subset = covid_data.loc[covid_data.continentExp == 'Europe']  # only including Europe
# creating a subset for July data (1 to 30 July 2020, both ends inclusive)
subset = subset.set_index('timestamp')
subset = subset.loc['2020-07-01':'2020-07-30']
subset
Out[9]:

| timestamp | dateRep | day | month | year | cases | deaths | countriesAndTerritories | geoId | countryterritoryCode | popData2019 | continentExp | Cumulative_number_for_14_days_of_COVID-19_cases_per_100000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-07-30 | 30/07/2020 | 30 | 7 | 2020 | 108 | 2 | Albania | AL | ALB | 2862427.0 | Europe | 47.267581 |
| 2020-07-29 | 29/07/2020 | 29 | 7 | 2020 | 117 | 4 | Albania | AL | ALB | 2862427.0 | Europe | 46.464067 |
| 2020-07-28 | 28/07/2020 | 28 | 7 | 2020 | 117 | 6 | Albania | AL | ALB | 2862427.0 | Europe | 45.730424 |
| 2020-07-27 | 27/07/2020 | 27 | 7 | 2020 | 126 | 4 | Albania | AL | ALB | 2862427.0 | Europe | 45.730424 |
| 2020-07-26 | 26/07/2020 | 26 | 7 | 2020 | 67 | 6 | Albania | AL | ALB | 2862427.0 | Europe | 44.228202 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-07-05 | 05/07/2020 | 5 | 7 | 2020 | 579 | 67 | United_Kingdom | UK | GBR | 66647112.0 | Europe | 14.332204 |
| 2020-07-04 | 04/07/2020 | 4 | 7 | 2020 | 602 | 136 | United_Kingdom | UK | GBR | 66647112.0 | Europe | 14.942883 |
| 2020-07-03 | 03/07/2020 | 3 | 7 | 2020 | 651 | 89 | United_Kingdom | UK | GBR | 66647112.0 | Europe | 15.580570 |
| 2020-07-02 | 02/07/2020 | 2 | 7 | 2020 | 617 | 176 | United_Kingdom | UK | GBR | 66647112.0 | Europe | 16.123729 |
| 2020-07-01 | 01/07/2020 | 1 | 7 | 2020 | 730 | 155 | United_Kingdom | UK | GBR | 66647112.0 | Europe | 16.851443 |

1620 rows × 12 columns
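
The same kind of subset can be built without changing the index by combining boolean masks. The following is a minimal sketch on made-up rows, reusing only the `timestamp`, `continentExp` and `cases` column names from the dataset. Note that it uses 31 July as the upper bound, whereas the cell above stops at 30 July.

```python
import pandas as pd

# Hypothetical sample rows around the July boundaries
sample = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-06-30", "2020-07-01", "2020-07-15", "2020-07-31", "2020-08-01"]
    ),
    "continentExp": ["Europe", "Europe", "Asia", "Europe", "Europe"],
    "cases": [5, 7, 9, 11, 13],
})

in_europe = sample["continentExp"] == "Europe"
# between() includes both end points by default
in_july = sample["timestamp"].between("2020-07-01", "2020-07-31")

july_europe = sample.loc[in_europe & in_july]
print(july_europe["cases"].tolist())  # [7, 11]
```

Because `timestamp` stays a regular column here, the result can be passed to the plotting functions directly, without the `reset_index()` call.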
In [10]:
plot_histogram(subset.reset_index())

