Elements Of Data Processing (2020S1) – Week 2¶
DataFrames¶
DataFrames represents tabular data structure and can contain multiple rows and columns. They can be thought of as a dictionary of Series objects, and are one of the most important data structures you will use to store and manipulate information in data science.
A DataFrame has both row and column indices.
The Pandas DataFrame structure contains many useful methods to aid your analysis. Recall from week 1 the API reference is available which details all of the functionality provided by pandas. You will particularly need con consult the DataFrame reference page.

In [25]:
# as before, begin by importing the pandas library
import pandas as pd
In [26]:
# create a new series of the population
Aus_Population = {‘1990’:17065100, ‘2000’:19153000, ‘2007’:20827600,
‘2008’:21249200,’2009′:21691700,’2010′:22031750,
‘2011’:22340024, ‘2012’:22728254, ‘2013’:23117353}
population = pd.Series(Aus_Population)
In [27]:
# we will reuse the emissions data from last week
Aus_Emission = {‘1990’:15.45288167, ‘2000’:17.20060983, ‘2007’:17.86526004,
‘2008’:18.16087566,’2009′:18.20018196,’2010′:16.92095367,
‘2011’:16.86260095, ‘2012’:16.51938578, ‘2013’:16.34730205}
co2_Emission = pd.Series(Aus_Emission)
In [28]:
# verify the values in the series
population
Out[28]:
1990 17065100
2000 19153000
2007 20827600
2008 21249200
2009 21691700
2010 22031750
2011 22340024
2012 22728254
2013 23117353
dtype: int64
In [85]:
# create a DataFrame object from the series objects
australia = pd.DataFrame({‘co2_emission’:co2_Emission, ‘Population’:population})
australia
Out[85]:
co2_emission
Population
1990
15.452882
17065100
2000
17.200610
19153000
2007
17.865260
20827600
2008
18.160876
21249200
2009
18.200182
21691700
2010
16.920954
22031750
2011
16.862601
22340024
2012
16.519386
22728254
2013
16.347302
23117353
In [84]:
# create a DataFrame from a csv file
countries = pd.read_csv(‘data/countries.csv’,encoding = ‘ISO-8859-1’)
In [31]:
# check the top 10 countries in the DataFrame
countries.head(10) # the default value is set to 5
Out[31]:
Country
Region
IncomeGroup
0
Afghanistan
South Asia
Low income
1
Albania
Europe & Central Asia
Upper middle income
2
Algeria
Middle East & North Africa
Upper middle income
3
American Samoa
East Asia & Pacific
Upper middle income
4
Andorra
Europe & Central Asia
High income
5
Angola
Sub-Saharan Africa
Upper middle income
6
Antigua and Barbuda
Latin America & Caribbean
High income
7
Argentina
Latin America & Caribbean
Upper middle income
8
Armenia
Europe & Central Asia
Lower middle income
9
Aruba
Latin America & Caribbean
High income
In [32]:
# count the number of countries in each region
countries.Region.value_counts()
Out[32]:
Europe & Central Asia 58
Sub-Saharan Africa 48
Latin America & Caribbean 42
East Asia & Pacific 37
Middle East & North Africa 21
South Asia 8
North America 3
Name: Region, dtype: int64
In [33]:
# set the name of countries as the index
countries.set_index(‘Country’)
Out[33]:
Region
IncomeGroup
Country
Afghanistan
South Asia
Low income
Albania
Europe & Central Asia
Upper middle income
Algeria
Middle East & North Africa
Upper middle income
American Samoa
East Asia & Pacific
Upper middle income
Andorra
Europe & Central Asia
High income
Angola
Sub-Saharan Africa
Upper middle income
Antigua and Barbuda
Latin America & Caribbean
High income
Argentina
Latin America & Caribbean
Upper middle income
Armenia
Europe & Central Asia
Lower middle income
Aruba
Latin America & Caribbean
High income
Australia
East Asia & Pacific
High income
Austria
Europe & Central Asia
High income
Azerbaijan
Europe & Central Asia
Upper middle income
Bahamas, The
Latin America & Caribbean
High income
Bahrain
Middle East & North Africa
High income
Bangladesh
South Asia
Lower middle income
Barbados
Latin America & Caribbean
High income
Belarus
Europe & Central Asia
Upper middle income
Belgium
Europe & Central Asia
High income
Belize
Latin America & Caribbean
Upper middle income
Benin
Sub-Saharan Africa
Low income
Bermuda
North America
High income
Bhutan
South Asia
Lower middle income
Bolivia
Latin America & Caribbean
Lower middle income
Bosnia and Herzegovina
Europe & Central Asia
Upper middle income
Botswana
Sub-Saharan Africa
Upper middle income
Brazil
Latin America & Caribbean
Upper middle income
British Virgin Islands
Latin America & Caribbean
High income
Brunei Darussalam
East Asia & Pacific
High income
Bulgaria
Europe & Central Asia
Upper middle income
…
…
…
Sweden
Europe & Central Asia
High income
Switzerland
Europe & Central Asia
High income
Syrian Arab Republic
Middle East & North Africa
Lower middle income
Tajikistan
Europe & Central Asia
Lower middle income
Tanzania
Sub-Saharan Africa
Low income
Thailand
East Asia & Pacific
Upper middle income
Timor-Leste
East Asia & Pacific
Lower middle income
Togo
Sub-Saharan Africa
Low income
Tonga
East Asia & Pacific
Lower middle income
Trinidad and Tobago
Latin America & Caribbean
High income
Tunisia
Middle East & North Africa
Lower middle income
Turkey
Europe & Central Asia
Upper middle income
Turkmenistan
Europe & Central Asia
Upper middle income
Turks and Caicos Islands
Latin America & Caribbean
High income
Tuvalu
East Asia & Pacific
Upper middle income
Uganda
Sub-Saharan Africa
Low income
Ukraine
Europe & Central Asia
Lower middle income
United Arab Emirates
Middle East & North Africa
High income
United Kingdom
Europe & Central Asia
High income
United States
North America
High income
Uruguay
Latin America & Caribbean
High income
Uzbekistan
Europe & Central Asia
Lower middle income
Vanuatu
East Asia & Pacific
Lower middle income
Venezuela, RB
Latin America & Caribbean
Upper middle income
Vietnam
East Asia & Pacific
Lower middle income
Virgin Islands (U.S.)
Latin America & Caribbean
High income
West Bank and Gaza
Middle East & North Africa
Lower middle income
Yemen, Rep.
Middle East & North Africa
Lower middle income
Zambia
Sub-Saharan Africa
Lower middle income
Zimbabwe
Sub-Saharan Africa
Low income
217 rows × 2 columns
In [86]:
# create a new DataFrame for the CO2 emission from a csv file
emission = pd.read_csv(‘data/emission.csv’,encoding = ‘ISO-8859-1’)
#emission.head()
In [35]:
# Create a subset of emission dataset for Year 2010
yr2010 = emission[‘2010’]
names = emission[‘Country’]
yr2010.index = names
type(yr2010)
Out[35]:
pandas.core.series.Series
In [36]:
# Sort column values using sort_values
yr2010.sort_values()
Out[36]:
Country
Burundi 0.022480
Congo, Dem. Rep. 0.030197
Chad 0.043463
Rwanda 0.057354
Central African Republic 0.059398
Mali 0.063586
Somalia 0.063912
Niger 0.072026
Ethiopia 0.075215
Malawi 0.080690
Madagascar 0.092895
Eritrea 0.109471
Mozambique 0.112476
Uganda 0.118253
Burkina Faso 0.125501
Sierra Leone 0.125706
Guinea-Bissau 0.145855
Tanzania 0.155682
Comoros 0.183692
Nepal 0.188153
Zambia 0.194713
Liberia 0.201046
Haiti 0.212694
Timor-Leste 0.220073
Guinea 0.236422
Myanmar 0.241924
Lao PDR 0.261822
Gambia, The 0.279411
Kenya 0.301883
Afghanistan 0.302936
…
Canada 14.485639
Kazakhstan 15.110081
Gibraltar 15.153879
Australia 16.920954
Oman 17.112061
United States 17.484803
Saudi Arabia 18.531173
United Arab Emirates 19.306504
Brunei Darussalam 20.856947
Luxembourg 21.635136
Bahrain 23.101200
Aruba 24.182702
Kuwait 29.294309
Trinidad and Tobago 36.073741
Qatar 41.131162
American Samoa NaN
Channel Islands NaN
Curacao NaN
Guam NaN
Isle of Man NaN
Kosovo NaN
Monaco NaN
Northern Mariana Islands NaN
Puerto Rico NaN
San Marino NaN
Sint Maarten (Dutch part) NaN
South Sudan NaN
St. Martin (French part) NaN
Tuvalu NaN
Virgin Islands (U.S.) NaN
Name: 2010, Length: 217, dtype: float64
In [37]:
#Sort column values to find the top countries
yr2010.sort_values(ascending = False)
Out[37]:
Country
Qatar 41.131162
Trinidad and Tobago 36.073741
Kuwait 29.294309
Aruba 24.182702
Bahrain 23.101200
Luxembourg 21.635136
Brunei Darussalam 20.856947
United Arab Emirates 19.306504
Saudi Arabia 18.531173
United States 17.484803
Oman 17.112061
Australia 16.920954
Gibraltar 15.153879
Kazakhstan 15.110081
Canada 14.485639
New Caledonia 14.169288
Estonia 13.599689
Faroe Islands 12.986678
Norway 12.294955
Russian Federation 11.725820
Greenland 11.663773
Finland 11.543147
Korea, Rep. 11.469587
Turkmenistan 11.362475
Netherlands 10.932543
Czech Republic 10.652579
Palau 10.569272
Cayman Islands 10.173449
Belgium 9.889429
Libya 9.791235
…
Burkina Faso 0.125501
Uganda 0.118253
Mozambique 0.112476
Eritrea 0.109471
Madagascar 0.092895
Malawi 0.080690
Ethiopia 0.075215
Niger 0.072026
Somalia 0.063912
Mali 0.063586
Central African Republic 0.059398
Rwanda 0.057354
Chad 0.043463
Congo, Dem. Rep. 0.030197
Burundi 0.022480
American Samoa NaN
Channel Islands NaN
Curacao NaN
Guam NaN
Isle of Man NaN
Kosovo NaN
Monaco NaN
Northern Mariana Islands NaN
Puerto Rico NaN
San Marino NaN
Sint Maarten (Dutch part) NaN
South Sudan NaN
St. Martin (French part) NaN
Tuvalu NaN
Virgin Islands (U.S.) NaN
Name: 2010, Length: 217, dtype: float64
Exercise 1 ¶
• Retrieve the mean, median of CO2 emission generated in 2012 by all countries.
• Retrieve the top 5 countries with the most CO2 emission in 2012. How about the 5 countries with the least emission? (remember that sort_values has an ascending parameter that is set to True by default).
• Retrieve the sum of CO2 emission for all years and find the 2 years with the maximum CO2 emission.
In [87]:
##answer here
# CO2 emission generated in 2012 by all countries
yr2012 = emission[‘2012’]
yr2012.index = names
yr2012
Out[87]:
Country
Afghanistan 0.688084
Albania 1.624722
Algeria 3.471950
American Samoa NaN
Andorra 6.195194
Angola 1.472255
Antigua and Barbuda 5.887357
Argentina 4.569635
Armenia 1.912090
Aruba 12.713613
Australia 16.519386
Austria 7.384032
Azerbaijan 3.824877
Bahamas, The 5.150115
Bahrain 21.810997
Bangladesh 0.434634
Barbados 5.222200
Belarus 6.680341
Belgium 8.284838
Belize 1.426691
Benin 0.548419
Bermuda 6.847542
Bhutan 1.099541
Bolivia 1.835512
Bosnia and Herzegovina 5.814068
Botswana 2.085533
Brazil 2.322258
British Virgin Islands 6.173617
Brunei Darussalam 23.800884
Bulgaria 6.142043
…
Sweden 4.942301
Switzerland 4.719447
Syrian Arab Republic 2.193362
Tajikistan 0.385152
Tanzania 0.196219
Thailand 4.544431
Timor-Leste 0.255327
Togo 0.320733
Tonga 1.715040
Trinidad and Tobago 33.819694
Tunisia 2.505571
Turkey 4.402995
Turkmenistan 12.564208
Turks and Caicos Islands 6.106013
Tuvalu NaN
Uganda 0.115395
Ukraine 6.487201
United Arab Emirates 19.252223
United Kingdom 7.334311
United States 16.287053
Uruguay 2.559638
Uzbekistan 3.889733
Vanuatu 0.459305
Venezuela, RB 6.653708
Vietnam 1.781697
Virgin Islands (U.S.) NaN
West Bank and Gaza 0.543675
Yemen, Rep. 0.735086
Zambia 0.235347
Zimbabwe 0.887956
Name: 2012, Length: 217, dtype: float64
In [88]:
# mean
yr2012.mean()
Out[88]:
5.161476912617648
In [40]:
# median
yr2012.median()
Out[40]:
2.7530064495
In [105]:
# top 5 CO2 emitting countries
sorted2012 = yr2012.sort_values(ascending=False)
top5 = sorted2012[:5]
top5
Out[105]:
Country
Qatar 46.697477
Curacao 39.638551
Trinidad and Tobago 33.819694
Kuwait 29.578729
Brunei Darussalam 23.800884
Name: 2012, dtype: float64
In [106]:
# least CO2 emitting countries
sorted2012 = yr2012.sort_values()
least5 = sorted2012[:5]
least5
Out[106]:
Country
Burundi 0.027888
Congo, Dem. Rep. 0.036414
Chad 0.042682
Mali 0.061677
Central African Republic 0.063505
Name: 2012, dtype: float64
In [90]:
res = emission.sum()[1:] # columns To ignore header
res.sort_values(ascending=False)[:2]
# emission
#emission_T= emission.transpose()
#res = res.astype(float)
#emission_T.sum(axis=1) # rows
#emission_T.index
Out[90]:
Country
1990
2000
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
0
Afghanistan
0.216661
0.039272
0.087858
0.158962
0.249074
0.302936
0.425262
0.688084
0.693183
NaN
NaN
NaN
1
Albania
1.615624
0.978175
1.322335
1.484311
1.495600
1.578574
1.803972
1.624722
1.662185
NaN
NaN
NaN
2
Algeria
3.007911
2.819778
3.195865
3.168524
3.430129
3.307164
3.300558
3.471950
3.514780
NaN
NaN
NaN
3
American Samoa
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
4
Andorra
NaN
8.018181
6.350868
6.296125
6.049173
6.124770
5.968685
6.195194
6.473848
NaN
NaN
NaN
5
Angola
0.459698
0.633625
1.311096
1.295687
1.354389
1.369339
1.382752
1.472255
1.384496
NaN
NaN
NaN
6
Antigua and Barbuda
4.857267
4.439239
5.561525
5.628319
5.906292
6.011269
5.823804
5.887357
5.827427
NaN
NaN
NaN
7
Argentina
3.440711
3.835574
4.382704
4.682974
4.410973
4.558612
4.600430
4.569635
4.462298
NaN
NaN
NaN
8
Armenia
NaN
1.126529
1.694755
1.868611
1.469961
1.422998
1.656831
1.912090
1.837059
NaN
NaN
NaN
9
Aruba
27.850035
25.547679
25.613715
24.750133
24.876706
24.182702
23.922412
12.713613
8.515395
NaN
NaN
NaN
10
Australia
15.452882
17.200610
17.865260
18.160876
18.200182
16.920954
16.862601
16.519386
16.347302
NaN
NaN
NaN
11
Austria
7.513246
7.755493
8.405093
8.286723
7.555670
8.110593
7.776538
7.384032
7.360056
NaN
NaN
NaN
12
Azerbaijan
NaN
3.666271
3.555340
4.051383
3.565668
3.388226
3.646180
3.824877
3.785069
NaN
NaN
NaN
13
Bahamas, The
7.610436
5.600992
4.939204
4.292002
4.630520
6.829321
5.199844
5.150115
8.229959
NaN
NaN
NaN
14
Bahrain
25.058198
27.956644
26.001291
26.538479
22.815070
23.101200
22.484702
21.810997
23.682574
NaN
NaN
NaN
15
Bangladesh
0.146565
0.212287
0.302930
0.334291
0.359054
0.395683
0.414399
0.434634
0.438736
NaN
NaN
NaN
16
Barbados
4.126491
4.403042
4.939722
5.844668
5.807341
5.286054
5.449494
5.222200
5.127255
NaN
NaN
NaN
17
Belarus
NaN
5.376068
6.330945
6.623153
6.380896
6.602532
6.719672
6.680341
6.736650
NaN
NaN
NaN
18
Belgium
10.639672
11.229668
9.744777
9.751985
9.286298
9.889429
8.953622
8.284838
8.371639
NaN
NaN
NaN
19
Belize
1.661912
1.675499
1.536094
1.257606
1.378374
1.676100
1.815716
1.426691
1.502201
NaN
NaN
NaN
20
Benin
0.141510
0.230066
0.516299
0.491602
0.504357
0.535216
0.542585
0.548419
0.561654
NaN
NaN
NaN
21
Bermuda
10.013384
7.531723
11.246039
9.943759
6.927723
8.784043
6.134007
6.847542
5.585037
NaN
NaN
NaN
22
Bhutan
0.239671
0.701959
0.575768
0.606779
0.549146
0.677145
1.001576
1.099541
1.171089
NaN
NaN
NaN
23
Bolivia
0.797982
1.225923
1.306164
1.378957
1.433538
1.532870
1.602046
1.835512
1.894512
NaN
NaN
NaN
24
Bosnia and Herzegovina
NaN
3.635213
4.596619
5.248737
5.394822
5.547458
6.233013
5.814068
5.729428
NaN
NaN
NaN
25
Botswana
1.958655
2.177083
2.192110
2.290168
1.956623
2.342203
2.105751
2.085533
2.491830
NaN
NaN
NaN
26
Brazil
1.388940
1.865808
1.884034
1.990203
1.866522
2.113415
2.191394
2.322258
2.465870
NaN
NaN
NaN
27
British Virgin Islands
4.010329
4.973889
6.078310
6.301672
6.238752
6.331007
6.307461
6.173617
6.057402
NaN
NaN
NaN
28
Brunei Darussalam
24.105188
14.255144
22.474463
23.950011
20.311171
20.856947
24.272670
23.800884
18.918736
NaN
NaN
NaN
29
Bulgaria
8.690224
5.328034
6.925435
6.772088
5.731194
5.965396
6.714882
6.142043
5.445649
NaN
NaN
NaN
…
…
…
…
…
…
…
…
…
…
…
…
…
…
187
Sweden
6.069368
5.562430
5.253522
5.328098
4.631411
5.547348
5.474957
4.942301
4.617182
NaN
NaN
NaN
188
Switzerland
6.345085
5.435485
5.030574
5.283044
5.377974
4.983429
4.672038
4.719447
4.987795
NaN
NaN
NaN
189
Syrian Arab Republic
3.000384
3.120999
3.421673
3.367748
3.001799
2.973157
2.775850
2.193362
1.866465
NaN
NaN
NaN
190
Tajikistan
NaN
0.361593
0.455597
0.400363
0.330848
0.335663
0.303143
0.385152
0.442107
NaN
NaN
NaN
191
Tanzania
0.089017
0.077997
0.142010
0.143360
0.133837
0.155682
0.171744
0.196219
0.214119
NaN
NaN
NaN
192
Thailand
1.604832
2.891389
3.847628
3.842684
4.063510
4.327193
4.339749
4.544431
4.493869
NaN
NaN
NaN
193
Timor-Leste
NaN
NaN
0.180962
0.195691
0.216865
0.220073
0.219288
0.255327
0.372893
NaN
NaN
NaN
194
Togo
0.204317
0.279083
0.239054
0.278072
0.447486
0.408538
0.366913
0.320733
0.321782
NaN
NaN
NaN
195
Tonga
0.809305
0.973891
1.112637
1.176967
1.276514
1.128883
0.983562
1.715040
1.988025
NaN
NaN
NaN
196
Trinidad and Tobago
13.879875
18.844281
36.816763
35.455298
33.952984
36.073741
35.137310
33.819694
34.520237
NaN
NaN
NaN
197
Tunisia
1.627000
2.085612
2.357974
2.403508
2.374509
2.622539
2.437841
2.505571
2.541452
NaN
NaN
NaN
198
Turkey
2.701355
3.417944
4.094889
4.036995
3.898959
4.121155
4.364167
4.402995
4.243453
NaN
NaN
NaN
199
Turkmenistan
NaN
8.339388
11.535622
11.556826
10.110661
11.362475
12.256911
12.564208
12.765704
NaN
NaN
NaN
200
Turks and Caicos Islands
NaN
0.777071
5.762099
6.094875
5.940523
6.152486
6.010149
6.106013
5.981875
NaN
NaN
NaN
201
Tuvalu
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
202
Uganda
0.043664
0.060197
0.096346
0.102865
0.105434
0.118253
0.124480
0.115395
0.133853
NaN
NaN
NaN
203
Ukraine
NaN
6.525619
6.907397
6.723650
5.648607
6.641350
6.267086
6.487201
5.959633
NaN
NaN
NaN
204
United Arab Emirates
28.711160
36.904101
22.566623
22.804551
21.797583
19.306504
18.270092
19.252223
18.708236
NaN
NaN
NaN
205
United Kingdom
9.710882
9.199549
8.617164
8.414457
7.569499
7.864205
7.080979
7.334311
7.133722
NaN
NaN
NaN
206
United States
19.323368
20.207615
19.237460
18.489234
17.192367
17.484803
17.019439
16.287053
16.389758
NaN
NaN
NaN
207
Uruguay
1.284045
1.597640
1.797404
2.472148
2.401221
1.890871
2.292951
2.559638
2.231639
NaN
NaN
NaN
208
Uzbekistan
NaN
4.911023
4.300004
4.362477
3.858172
3.647049
3.874801
3.889733
3.413199
NaN
NaN
NaN
209
Vanuatu
0.450144
0.455754
0.450131
0.423112
0.524352
0.512110
0.545784
0.459305
0.420054
NaN
NaN
NaN
210
Venezuela, RB
6.150573
6.226035
5.810738
6.370396
6.295452
6.520630
6.076146
6.653708
6.128009
NaN
NaN
NaN
211
Vietnam
0.324281
0.691020
1.245243
1.386218
1.568338
1.694879
1.842551
1.781697
1.700368
NaN
NaN
NaN
212
Virgin Islands (U.S.)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
213
West Bank and Gaza
NaN
0.271058
0.665297
0.570947
0.564578
0.534015
0.572407
0.543675
0.584855
NaN
NaN
NaN
214
Yemen, Rep.
0.802312
0.822618
0.966383
1.000747
1.070024
0.993225
0.811478
0.735086
0.992680
NaN
NaN
NaN
215
Zambia
0.300362
0.172174
0.139614
0.166369
0.187044
0.194713
0.204524
0.235347
0.250863
NaN
NaN
NaN
216
Zimbabwe
1.478723
1.113890
0.743449
0.575777
0.607470
0.663130
0.815684
0.887956
0.924990
NaN
NaN
NaN
217 rows × 13 columns
In [44]:
# another solution for the 3rd sub-question
# list comprehension to find the sum of CO2 emission
year = [‘1990′,’2000′,’2007′,’2008′,’2009′,’2010′,’2011′,’2012′,’2013′,’2014′,’2015′,’2016’]
sum_per_year = [emission[y].sum() for y in year]
sum_per_year
Out[44]:
[734.2376925839999,
936.682642449,
1025.9772624969999,
1023.5284745380001,
975.0620494310001,
1010.472087137,
998.0177672359998,
1052.941290174,
1024.685642681,
0.0,
0.0,
0.0]
In [45]:
# create a series object of the sum and sort the values
s = pd.Series(sum_per_year, index= year)
s.sort_values(ascending=False)[:2]
Out[45]:
2012 1052.941290
2007 1025.977262
dtype: float64
More Sort Operations¶
Pandas allows you to sort your DataFrame by rows/columns as follows:
In [46]:
# Sort column values of a DataFrame
sorted2012 = emission.sort_values( by = ‘2012’,ascending = False )
sorted2012
Out[46]:
Country
1990
2000
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
158
Qatar
24.712027
58.522169
53.672746
46.684906
43.504331
41.131162
42.213752
46.697477
40.462355
NaN
NaN
NaN
50
Curacao
NaN
NaN
NaN
NaN
NaN
NaN
NaN
39.638551
34.161635
NaN
NaN
NaN
196
Trinidad and Tobago
13.879875
18.844281
36.816763
35.455298
33.952984
36.073741
35.137310
33.819694
34.520237
NaN
NaN
NaN
105
Kuwait
23.466084
27.759023
29.636849
30.581264
30.300789
29.294309
28.102662
29.578729
27.258964
NaN
NaN
NaN
28
Brunei Darussalam
24.105188
14.255144
22.474463
23.950011
20.311171
20.856947
24.272670
23.800884
18.918736
NaN
NaN
NaN
14
Bahrain
25.058198
27.956644
26.001291
26.538479
22.815070
23.101200
22.484702
21.810997
23.682574
NaN
NaN
NaN
171
Sint Maarten (Dutch part)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
21.489636
20.835796
NaN
NaN
NaN
115
Luxembourg
26.197659
18.885512
22.957283
22.385472
20.877125
21.635136
21.102970
20.084219
18.700782
NaN
NaN
NaN
204
United Arab Emirates
28.711160
36.904101
22.566623
22.804551
21.797583
19.306504
18.270092
19.252223
18.708236
NaN
NaN
NaN
165
Saudi Arabia
13.320831
13.880494
14.866777
16.085642
17.102371
18.531173
17.393401
19.188937
17.927485
NaN
NaN
NaN
147
Oman
6.283132
9.777453
16.846640
16.101568
15.963375
17.112061
17.512479
16.699720
15.660423
NaN
NaN
NaN
10
Australia
15.452882
17.200610
17.865260
18.160876
18.200182
16.920954
16.862601
16.519386
16.347302
NaN
NaN
NaN
206
United States
19.323368
20.207615
19.237460
18.489234
17.192367
17.484803
17.019439
16.287053
16.389758
NaN
NaN
NaN
74
Gibraltar
5.329743
12.200541
13.919080
14.137127
15.195251
15.153879
14.636367
14.713776
15.491110
NaN
NaN
NaN
99
Kazakhstan
NaN
7.934854
14.359712
11.879265
10.309399
15.110081
15.650630
14.574160
15.432794
NaN
NaN
NaN
140
New Caledonia
9.269475
10.473212
12.372273
11.849562
12.138518
14.169288
14.364823
14.016718
14.681943
NaN
NaN
NaN
35
Canada
15.659070
17.370452
16.446455
16.215164
14.837315
14.485639
14.481342
13.858455
13.532295
NaN
NaN
NaN
62
Estonia
NaN
10.636252
13.954884
13.081834
10.914320
13.599689
14.016733
13.324000
15.110411
NaN
NaN
NaN
160
Russian Federation
NaN
10.627121
11.672457
12.014507
11.023856
11.725820
12.367534
12.818345
12.466816
NaN
NaN
NaN
9
Aruba
27.850035
25.547679
25.613715
24.750133
24.876706
24.182702
23.922412
12.713613
8.515395
NaN
NaN
NaN
199
Turkmenistan
NaN
8.339388
11.535622
11.556826
10.110661
11.362475
12.256911
12.564208
12.765704
NaN
NaN
NaN
64
Faroe Islands
12.826216
14.828590
14.185395
12.970121
11.842902
12.986678
11.721212
12.199843
12.301292
NaN
NaN
NaN
103
Korea, Rep.
5.760374
9.520932
10.199582
10.369833
10.346506
11.469587
11.840241
11.680558
11.798150
NaN
NaN
NaN
131
Mongolia
4.573372
3.130988
4.653094
4.578057
4.907974
9.090930
9.997258
11.653855
14.546549
NaN
NaN
NaN
149
Palau
15.553582
5.928706
10.389651
10.333663
10.274233
10.569272
10.855430
10.776980
10.693006
NaN
NaN
NaN
170
Singapore
15.405065
12.166624
4.342606
7.466762
7.422120
8.656224
7.393864
10.282908
9.363847
NaN
NaN
NaN
139
Netherlands
10.594475
10.383090
10.468447
10.541492
10.234971
10.932543
10.327657
10.109168
10.114759
NaN
NaN
NaN
76
Greenland
9.958939
9.461121
11.282079
11.718151
10.221739
11.663773
12.440341
10.005017
10.062939
NaN
NaN
NaN
146
Norway
7.429148
8.834016
9.574850
11.679583
11.461829
12.294955
9.124767
9.940980
11.740324
NaN
NaN
NaN
97
Japan
8.873293
9.622352
9.782964
9.449534
8.619442
9.150005
9.318429
9.639072
9.764393
NaN
NaN
NaN
…
…
…
…
…
…
…
…
…
…
…
…
…
…
169
Sierra Leone
0.124061
0.104753
0.119034
0.120200
0.115584
0.125706
0.152044
0.170511
0.192879
NaN
NaN
NaN
30
Burkina Faso
0.066173
0.089717
0.115429
0.130385
0.127425
0.125501
0.137283
0.158476
0.179008
NaN
NaN
NaN
81
Guinea-Bissau
0.163177
0.111505
0.151257
0.145619
0.146971
0.145855
0.146811
0.147568
0.146084
NaN
NaN
NaN
134
Mozambique
0.076510
0.073884
0.101189
0.098553
0.106996
0.112476
0.128845
0.121269
0.151850
NaN
NaN
NaN
177
South Sudan
NaN
NaN
NaN
NaN
NaN
NaN
NaN
0.121225
0.126461
NaN
NaN
NaN
118
Madagascar
0.082260
0.119013
0.093705
0.094772
0.086416
0.092895
0.106734
0.119581
0.134206
NaN
NaN
NaN
202
Uganda
0.043664
0.060197
0.096346
0.102865
0.105434
0.118253
0.124480
0.115395
0.133853
NaN
NaN
NaN
143
Niger
0.092696
0.062072
0.049726
0.053722
0.061771
0.072026
0.078332
0.105836
0.106861
NaN
NaN
NaN
63
Ethiopia
0.062799
0.053424
0.074345
0.079184
0.077723
0.075215
0.085984
0.092877
0.112463
NaN
NaN
NaN
119
Malawi
0.065085
0.079937
0.069546
0.078854
0.070120
0.080690
0.078027
0.071937
0.078594
NaN
NaN
NaN
161
Rwanda
0.074757
0.065826
0.058789
0.055661
0.057431
0.057354
0.062874
0.068137
0.072161
NaN
NaN
NaN
175
Somalia
0.002900
0.070009
0.068327
0.065851
0.063881
0.063912
0.058707
0.063592
0.060711
NaN
NaN
NaN
37
Central African Republic
0.067403
0.071843
0.060213
0.059112
0.058013
0.059398
0.061509
0.063505
0.063054
NaN
NaN
NaN
122
Mali
0.049717
0.074356
0.073291
0.075282
0.054901
0.063586
0.066826
0.061677
0.061882
NaN
NaN
NaN
38
Chad
0.024619
0.021097
0.042863
0.045756
0.042689
0.043463
0.043830
0.042682
0.046305
NaN
NaN
NaN
44
Congo, Dem. Rep.
0.116420
0.016943
0.028927
0.029605
0.026937
0.030197
0.036246
0.036414
0.038311
NaN
NaN
NaN
31
Burundi
0.052263
0.042809
0.021964
0.021615
0.020868
0.022480
0.024721
0.027888
0.028030
NaN
NaN
NaN
3
American Samoa
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
39
Channel Islands
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
61
Eritrea
NaN
0.172191
0.131490
0.092069
0.111761
0.109471
0.124031
NaN
NaN
NaN
NaN
NaN
78
Guam
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
93
Isle of Man
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
104
Kosovo
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
130
Monaco
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
145
Northern Mariana Islands
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
157
Puerto Rico
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
163
San Marino
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
182
St. Martin (French part)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
201
Tuvalu
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
212
Virgin Islands (U.S.)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
217 rows × 13 columns
In [114]:
# Sort column values using two columns
emission.sort_values( by = [‘2012’],ascending = [True], inplace = True)
emission
Out[114]:
Country
1990
2000
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
31
Burundi
0.052263
0.042809
0.021964
0.021615
0.020868
0.022480
0.024721
0.027888
0.028030
NaN
NaN
NaN
44
Congo, Dem. Rep.
0.116420
0.016943
0.028927
0.029605
0.026937
0.030197
0.036246
0.036414
0.038311
NaN
NaN
NaN
38
Chad
0.024619
0.021097
0.042863
0.045756
0.042689
0.043463
0.043830
0.042682
0.046305
NaN
NaN
NaN
122
Mali
0.049717
0.074356
0.073291
0.075282
0.054901
0.063586
0.066826
0.061677
0.061882
NaN
NaN
NaN
37
Central African Republic
0.067403
0.071843
0.060213
0.059112
0.058013
0.059398
0.061509
0.063505
0.063054
NaN
NaN
NaN
175
Somalia
0.002900
0.070009
0.068327
0.065851
0.063881
0.063912
0.058707
0.063592
0.060711
NaN
NaN
NaN
161
Rwanda
0.074757
0.065826
0.058789
0.055661
0.057431
0.057354
0.062874
0.068137
0.072161
NaN
NaN
NaN
119
Malawi
0.065085
0.079937
0.069546
0.078854
0.070120
0.080690
0.078027
0.071937
0.078594
NaN
NaN
NaN
63
Ethiopia
0.062799
0.053424
0.074345
0.079184
0.077723
0.075215
0.085984
0.092877
0.112463
NaN
NaN
NaN
143
Niger
0.092696
0.062072
0.049726
0.053722
0.061771
0.072026
0.078332
0.105836
0.106861
NaN
NaN
NaN
202
Uganda
0.043664
0.060197
0.096346
0.102865
0.105434
0.118253
0.124480
0.115395
0.133853
NaN
NaN
NaN
118
Madagascar
0.082260
0.119013
0.093705
0.094772
0.086416
0.092895
0.106734
0.119581
0.134206
NaN
NaN
NaN
177
South Sudan
NaN
NaN
NaN
NaN
NaN
NaN
NaN
0.121225
0.126461
NaN
NaN
NaN
134
Mozambique
0.076510
0.073884
0.101189
0.098553
0.106996
0.112476
0.128845
0.121269
0.151850
NaN
NaN
NaN
81
Guinea-Bissau
0.163177
0.111505
0.151257
0.145619
0.146971
0.145855
0.146811
0.147568
0.146084
NaN
NaN
NaN
30
Burkina Faso
0.066173
0.089717
0.115429
0.130385
0.127425
0.125501
0.137283
0.158476
0.179008
NaN
NaN
NaN
169
Sierra Leone
0.124061
0.104753
0.119034
0.120200
0.115584
0.125706
0.152044
0.170511
0.192879
NaN
NaN
NaN
191
Tanzania
0.089017
0.077997
0.142010
0.143360
0.133837
0.155682
0.171744
0.196219
0.214119
NaN
NaN
NaN
43
Comoros
0.150162
0.153992
0.175048
0.176347
0.172098
0.183692
0.209990
0.199929
0.214645
NaN
NaN
NaN
138
Nepal
0.033849
0.129286
0.100315
0.129963
0.162992
0.188153
0.203593
0.212549
0.233576
NaN
NaN
NaN
80
Guinea
0.175022
0.145444
0.119193
0.209948
0.210799
0.236422
0.245626
0.221998
0.192423
NaN
NaN
NaN
83
Haiti
0.139971
0.159990
0.250172
0.247108
0.230375
0.212694
0.218685
0.225249
0.230610
NaN
NaN
NaN
215
Zambia
0.300362
0.172174
0.139614
0.166369
0.187044
0.194713
0.204524
0.235347
0.250863
NaN
NaN
NaN
111
Liberia
0.226694
0.147087
0.190516
0.153758
0.135300
0.201046
0.219324
0.245916
0.222905
NaN
NaN
NaN
135
Myanmar
0.101785
0.211621
0.253947
0.192081
0.199163
0.241924
0.274293
0.246147
0.237874
NaN
NaN
NaN
193
Timor-Leste
NaN
NaN
0.180962
0.195691
0.216865
0.220073
0.219288
0.255327
0.372893
NaN
NaN
NaN
70
Gambia, The
0.207986
0.223804
0.257765
0.258834
0.266260
0.279411
0.249484
0.261768
0.263208
NaN
NaN
NaN
34
Cameroon
0.144002
0.215493
0.305806
0.283310
0.335016
0.329289
0.273127
0.282904
0.306750
NaN
NaN
NaN
100
Kenya
0.248364
0.335351
0.263922
0.267802
0.314501
0.301883
0.324913
0.294184
0.304402
NaN
NaN
NaN
184
Sudan
0.202623
0.159134
0.332724
0.341440
0.346451
0.345245
0.330142
0.301083
0.309100
NaN
NaN
NaN
…
…
…
…
…
…
…
…
…
…
…
…
…
…
35
Canada
15.659070
17.370452
16.446455
16.215164
14.837315
14.485639
14.481342
13.858455
13.532295
NaN
NaN
NaN
140
New Caledonia
9.269475
10.473212
12.372273
11.849562
12.138518
14.169288
14.364823
14.016718
14.681943
NaN
NaN
NaN
99
Kazakhstan
NaN
7.934854
14.359712
11.879265
10.309399
15.110081
15.650630
14.574160
15.432794
NaN
NaN
NaN
74
Gibraltar
5.329743
12.200541
13.919080
14.137127
15.195251
15.153879
14.636367
14.713776
15.491110
NaN
NaN
NaN
206
United States
19.323368
20.207615
19.237460
18.489234
17.192367
17.484803
17.019439
16.287053
16.389758
NaN
NaN
NaN
10
Australia
15.452882
17.200610
17.865260
18.160876
18.200182
16.920954
16.862601
16.519386
16.347302
NaN
NaN
NaN
147
Oman
6.283132
9.777453
16.846640
16.101568
15.963375
17.112061
17.512479
16.699720
15.660423
NaN
NaN
NaN
165
Saudi Arabia
13.320831
13.880494
14.866777
16.085642
17.102371
18.531173
17.393401
19.188937
17.927485
NaN
NaN
NaN
204
United Arab Emirates
28.711160
36.904101
22.566623
22.804551
21.797583
19.306504
18.270092
19.252223
18.708236
NaN
NaN
NaN
115
Luxembourg
26.197659
18.885512
22.957283
22.385472
20.877125
21.635136
21.102970
20.084219
18.700782
NaN
NaN
NaN
171
Sint Maarten (Dutch part)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
21.489636
20.835796
NaN
NaN
NaN
14
Bahrain
25.058198
27.956644
26.001291
26.538479
22.815070
23.101200
22.484702
21.810997
23.682574
NaN
NaN
NaN
28
Brunei Darussalam
24.105188
14.255144
22.474463
23.950011
20.311171
20.856947
24.272670
23.800884
18.918736
NaN
NaN
NaN
105
Kuwait
23.466084
27.759023
29.636849
30.581264
30.300789
29.294309
28.102662
29.578729
27.258964
NaN
NaN
NaN
196
Trinidad and Tobago
13.879875
18.844281
36.816763
35.455298
33.952984
36.073741
35.137310
33.819694
34.520237
NaN
NaN
NaN
50
Curacao
NaN
NaN
NaN
NaN
NaN
NaN
NaN
39.638551
34.161635
NaN
NaN
NaN
158
Qatar
24.712027
58.522169
53.672746
46.684906
43.504331
41.131162
42.213752
46.697477
40.462355
NaN
NaN
NaN
3
American Samoa
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
39
Channel Islands
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
61
Eritrea
NaN
0.172191
0.131490
0.092069
0.111761
0.109471
0.124031
NaN
NaN
NaN
NaN
NaN
78
Guam
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
93
Isle of Man
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
104
Kosovo
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
130
Monaco
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
145
Northern Mariana Islands
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
157
Puerto Rico
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
163
San Marino
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
182
St. Martin (French part)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
201
Tuvalu
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
212
Virgin Islands (U.S.)
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
217 rows × 13 columns
Slicing using the .loc and .iloc method¶
Slicing allows you to take part of your DataFrame. You can use the .iloc method to select data using row/column numbers, or use .loc to select data using row/column headings. See this article for more examples
In [48]:
# Slicing using a range of rows and range of columns
emission.iloc[2:5,2:6]
Out[48]:
2000
2007
2008
2009
2
2.819778
3.195865
3.168524
3.430129
3
NaN
NaN
NaN
NaN
4
8.018181
6.350868
6.296125
6.049173
In [115]:
# Slicing using specific rows and specific columns
emission.loc[[3,5],[‘Country’,’1990′]]
—————————————————————————
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _multi_take(self, tup)
940 self._validate_read_indexer(keyarr, indexer,
–> 941 o._get_axis_number(axis))
942 d[axis] = (keyarr, indexer)
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis)
1205 u”None of [{key}] are in the [{axis}]”.format(
-> 1206 key=key, axis=self.obj._get_axis_name(axis)))
1207
KeyError: “None of [Int64Index([1, 2], dtype=’int64′)] are in the [columns]”
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
1 # Slicing using specific rows and specific columns
—-> 2 emission.loc[[3,5],[1,2]]
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1470 except (KeyError, IndexError):
1471 pass
-> 1472 return self._getitem_tuple(key)
1473 else:
1474 # we by definition only have the 0th axis
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
877 # ugly hack for GH #836
878 if self._multi_take_opportunity(tup):
–> 879 return self._multi_take(tup)
880
881 # no shortcut needed
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _multi_take(self, tup)
943 return o._reindex_with_indexers(d, copy=True, allow_dups=True)
944 except (KeyError, IndexingError) as detail:
–> 945 raise self._exception(detail)
946
947 def _convert_for_reindex(self, key, axis=None):
KeyError: KeyError(“None of [Int64Index([1, 2], dtype=’int64′)] are in the [columns]”)
In [50]:
# Specific rows and all columns
emission.loc[[3,5],:]
Out[50]:
Country
1990
2000
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
3
American Samoa
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
5
Angola
0.459698
0.633625
1.311096
1.295687
1.354389
1.369339
1.382752
1.472255
1.384496
NaN
NaN
NaN
In [51]:
# All rows and specific columns
emission.loc[:,[‘Country’,’1990′]]
Out[51]:
Country
1990
0
Afghanistan
0.216661
1
Albania
1.615624
2
Algeria
3.007911
3
American Samoa
NaN
4
Andorra
NaN
5
Angola
0.459698
6
Antigua and Barbuda
4.857267
7
Argentina
3.440711
8
Armenia
NaN
9
Aruba
27.850035
10
Australia
15.452882
11
Austria
7.513246
12
Azerbaijan
NaN
13
Bahamas, The
7.610436
14
Bahrain
25.058198
15
Bangladesh
0.146565
16
Barbados
4.126491
17
Belarus
NaN
18
Belgium
10.639672
19
Belize
1.661912
20
Benin
0.141510
21
Bermuda
10.013384
22
Bhutan
0.239671
23
Bolivia
0.797982
24
Bosnia and Herzegovina
NaN
25
Botswana
1.958655
26
Brazil
1.388940
27
British Virgin Islands
4.010329
28
Brunei Darussalam
24.105188
29
Bulgaria
8.690224
…
…
…
187
Sweden
6.069368
188
Switzerland
6.345085
189
Syrian Arab Republic
3.000384
190
Tajikistan
NaN
191
Tanzania
0.089017
192
Thailand
1.604832
193
Timor-Leste
NaN
194
Togo
0.204317
195
Tonga
0.809305
196
Trinidad and Tobago
13.879875
197
Tunisia
1.627000
198
Turkey
2.701355
199
Turkmenistan
NaN
200
Turks and Caicos Islands
NaN
201
Tuvalu
NaN
202
Uganda
0.043664
203
Ukraine
NaN
204
United Arab Emirates
28.711160
205
United Kingdom
9.710882
206
United States
19.323368
207
Uruguay
1.284045
208
Uzbekistan
NaN
209
Vanuatu
0.450144
210
Venezuela, RB
6.150573
211
Vietnam
0.324281
212
Virgin Islands (U.S.)
NaN
213
West Bank and Gaza
NaN
214
Yemen, Rep.
0.802312
215
Zambia
0.300362
216
Zimbabwe
1.478723
217 rows × 2 columns
Exercise 2 ¶
Create a DataFrame object that has the name, region and IncomeGroup of the top 10 emitting countries in 2012.
In [118]:
##answer here
# create a DataFrame for the countries
countries = pd.read_csv(‘data/countries.csv’,encoding = ‘ISO-8859-1’, index_col = ‘Country’)
# finding the top 10 emitting countries in 2012
sorted2012 = emission[‘2012’].sort_values(ascending=False)[:10]
Out[118]:
Region
IncomeGroup
Country
Afghanistan
South Asia
Low income
Albania
Europe & Central Asia
Upper middle income
Algeria
Middle East & North Africa
Upper middle income
American Samoa
East Asia & Pacific
Upper middle income
Andorra
Europe & Central Asia
High income
Angola
Sub-Saharan Africa
Upper middle income
Antigua and Barbuda
Latin America & Caribbean
High income
Argentina
Latin America & Caribbean
Upper middle income
Armenia
Europe & Central Asia
Lower middle income
Aruba
Latin America & Caribbean
High income
Australia
East Asia & Pacific
High income
Austria
Europe & Central Asia
High income
Azerbaijan
Europe & Central Asia
Upper middle income
Bahamas, The
Latin America & Caribbean
High income
Bahrain
Middle East & North Africa
High income
Bangladesh
South Asia
Lower middle income
Barbados
Latin America & Caribbean
High income
Belarus
Europe & Central Asia
Upper middle income
Belgium
Europe & Central Asia
High income
Belize
Latin America & Caribbean
Upper middle income
Benin
Sub-Saharan Africa
Low income
Bermuda
North America
High income
Bhutan
South Asia
Lower middle income
Bolivia
Latin America & Caribbean
Lower middle income
Bosnia and Herzegovina
Europe & Central Asia
Upper middle income
Botswana
Sub-Saharan Africa
Upper middle income
Brazil
Latin America & Caribbean
Upper middle income
British Virgin Islands
Latin America & Caribbean
High income
Brunei Darussalam
East Asia & Pacific
High income
Bulgaria
Europe & Central Asia
Upper middle income
…
…
…
Sweden
Europe & Central Asia
High income
Switzerland
Europe & Central Asia
High income
Syrian Arab Republic
Middle East & North Africa
Lower middle income
Tajikistan
Europe & Central Asia
Lower middle income
Tanzania
Sub-Saharan Africa
Low income
Thailand
East Asia & Pacific
Upper middle income
Timor-Leste
East Asia & Pacific
Lower middle income
Togo
Sub-Saharan Africa
Low income
Tonga
East Asia & Pacific
Lower middle income
Trinidad and Tobago
Latin America & Caribbean
High income
Tunisia
Middle East & North Africa
Lower middle income
Turkey
Europe & Central Asia
Upper middle income
Turkmenistan
Europe & Central Asia
Upper middle income
Turks and Caicos Islands
Latin America & Caribbean
High income
Tuvalu
East Asia & Pacific
Upper middle income
Uganda
Sub-Saharan Africa
Low income
Ukraine
Europe & Central Asia
Lower middle income
United Arab Emirates
Middle East & North Africa
High income
United Kingdom
Europe & Central Asia
High income
United States
North America
High income
Uruguay
Latin America & Caribbean
High income
Uzbekistan
Europe & Central Asia
Lower middle income
Vanuatu
East Asia & Pacific
Lower middle income
Venezuela, RB
Latin America & Caribbean
Upper middle income
Vietnam
East Asia & Pacific
Lower middle income
Virgin Islands (U.S.)
Latin America & Caribbean
High income
West Bank and Gaza
Middle East & North Africa
Lower middle income
Yemen, Rep.
Middle East & North Africa
Lower middle income
Zambia
Sub-Saharan Africa
Lower middle income
Zimbabwe
Sub-Saharan Africa
Low income
217 rows × 2 columns
In [92]:
#sorted2012 = yr2012.sort_values(ascending=False)[:10]
sorted2012 = emission[‘2012’].sort_values(ascending=False)[:10]
sorted2012
Out[92]:
Country
Qatar 46.697477
Curacao 39.638551
Trinidad and Tobago 33.819694
Kuwait 29.578729
Brunei Darussalam 23.800884
Bahrain 21.810997
Sint Maarten (Dutch part) 21.489636
Luxembourg 20.084219
United Arab Emirates 19.252223
Saudi Arabia 19.188937
Name: 2012, dtype: float64
In [54]:
sorted2012 = emission[‘2012’].sort_values(ascending=False)[:10]
cntry = sorted2012.index
# Find the information from countries dataframe
cntryInfo = (countries.loc[c,:] for c in cntry)
# # Final answer: create a new dataframe form cntryInfo
top10 = pd.DataFrame(cntryInfo)
top10
Out[54]:
Region
IncomeGroup
Qatar
Middle East & North Africa
High income
Curacao
Latin America & Caribbean
High income
Trinidad and Tobago
Latin America & Caribbean
High income
Kuwait
Middle East & North Africa
High income
Brunei Darussalam
East Asia & Pacific
High income
Bahrain
Middle East & North Africa
High income
Sint Maarten (Dutch part)
Latin America & Caribbean
High income
Luxembourg
Europe & Central Asia
High income
United Arab Emirates
Middle East & North Africa
High income
Saudi Arabia
Middle East & North Africa
High income
In [55]:
cntry
Out[55]:
Index([‘Qatar’, ‘Curacao’, ‘Trinidad and Tobago’, ‘Kuwait’,
‘Brunei Darussalam’, ‘Bahrain’, ‘Sint Maarten (Dutch part)’,
‘Luxembourg’, ‘United Arab Emirates’, ‘Saudi Arabia’],
dtype=’object’, name=’Country’)
In [56]:
#Another solution
sorted2012 = emission[‘2012’].sort_values(ascending=False)[:10]
cntry = sorted2012.index
top10 = countries.loc[cntry,:]
top10
Out[56]:
Region
IncomeGroup
Country
Qatar
Middle East & North Africa
High income
Curacao
Latin America & Caribbean
High income
Trinidad and Tobago
Latin America & Caribbean
High income
Kuwait
Middle East & North Africa
High income
Brunei Darussalam
East Asia & Pacific
High income
Bahrain
Middle East & North Africa
High income
Sint Maarten (Dutch part)
Latin America & Caribbean
High income
Luxembourg
Europe & Central Asia
High income
United Arab Emirates
Middle East & North Africa
High income
Saudi Arabia
Middle East & North Africa
High income
In [ ]:
Groupby¶
The Groupby method lets you separate the data into different groups based off shared characteristics. For example, we could group countries by region or income range and then analyse those groups individually. The official documentation on groupby can be found here. This tutorial is also well worth reading.
Groupby¶

Exercise 3 ¶
Using Countries data frame, group the rows using the Region column.
• Show the size of each group
• Find the number of high income and low income countries by region
In [120]:
##answer here
# Groupby using the Region column
grouped = countries.groupby(‘Region’)
# type(grouped)
#print(grouped.count()) # it applies the function on each inner df
grouped.describe()
# grouped.count()
Out[120]:
IncomeGroup
count
unique
top
freq
Region
East Asia & Pacific
37
4
Lower middle income
15
Europe & Central Asia
58
3
High income
37
Latin America & Caribbean
42
4
Upper middle income
20
Middle East & North Africa
21
3
High income
8
North America
3
1
High income
3
South Asia
8
3
Lower middle income
5
Sub-Saharan Africa
48
4
Low income
27
In [123]:
# Show the size of each group
for region,group in grouped:
print (region + ‘:\t’ + str(group.shape[0]))
#print (group.shape[0])#, type(group))
East Asia & Pacific: 37
Europe & Central Asia: 58
Latin America & Caribbean: 42
Middle East & North Africa: 21
North America: 3
South Asia: 8
Sub-Saharan Africa: 48
In [59]:
# Find the number of high income and low income countries by region
for region,group in grouped:
print (region)
count = group[‘IncomeGroup’].value_counts()
high_low = count.filter(items = [‘High income’, ‘Low income’] , axis=0)
for i in high_low.index:
print(‘\t’ + i + ‘: ‘ + str(high_low[i]))
#print(high_low)
#print (group[‘IncomeGroup’].value_counts())
East Asia & Pacific
High income: 13
Low income: 1
Europe & Central Asia
High income: 37
Latin America & Caribbean
High income: 16
Low income: 1
Middle East & North Africa
High income: 8
North America
High income: 3
South Asia
Low income: 2
Sub-Saharan Africa
High income: 1
Low income: 27
Basic visualization with Python¶
In these exercises you will:
• learn how to visualize a set of data using a Python library called matplotlib.
• find out different forms of visualization, such as bar charts, histograms, scatter plot, and boxplot.
You will be able to transform a set of data into an appropriate visualization form.
matplotlib is a Python 2D plotting library that enables you to produce figures and charts, both in a screen or in an image file.
The following example demonstrates a simple plot of the top 100 emissions in 2010.
In [98]:
# slice the data to get top 100 emissions in 2010
yr2010_sorted = yr2010.sort_values(ascending = False)
top100_yr2010 = yr2010_sorted[0:100]
top100_yr2010
Out[98]:
Country
Qatar 41.131162
Trinidad and Tobago 36.073741
Kuwait 29.294309
Aruba 24.182702
Bahrain 23.101200
Luxembourg 21.635136
Brunei Darussalam 20.856947
United Arab Emirates 19.306504
Saudi Arabia 18.531173
United States 17.484803
Oman 17.112061
Australia 16.920954
Gibraltar 15.153879
Kazakhstan 15.110081
Canada 14.485639
New Caledonia 14.169288
Estonia 13.599689
Faroe Islands 12.986678
Norway 12.294955
Russian Federation 11.725820
Greenland 11.663773
Finland 11.543147
Korea, Rep. 11.469587
Turkmenistan 11.362475
Netherlands 10.932543
Czech Republic 10.652579
Palau 10.569272
Cayman Islands 10.173449
Belgium 9.889429
Libya 9.791235
…
Barbados 5.286054
Suriname 5.067292
Hungary 5.021578
Switzerland 4.983429
St. Kitts and Nevis 4.973201
Croatia 4.627555
Lebanon 4.623957
Argentina 4.558612
Portugal 4.552753
Nauru 4.389426
Lithuania 4.348616
Thailand 4.327193
Chile 4.246294
Macedonia, FYR 4.171161
Montenegro 4.167664
Turkey 4.121155
Romania 3.916056
Latvia 3.849593
Mexico 3.752926
Macao SAR, China 3.710719
Uzbekistan 3.647049
Iraq 3.634669
Cuba 3.393589
Azerbaijan 3.388226
Algeria 3.307164
Jordan 3.249598
French Polynesia 3.201007
Mauritius 3.132083
Syrian Arab Republic 2.973157
Gabon 2.941808
Name: 2010, Length: 100, dtype: float64
In [61]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.boxplot(top100_yr2010) # a boxplot of the top 100 emissions in year 2020
Out[61]:
{‘whiskers’: [
‘caps’: [
‘boxes’: [
‘medians’: [
‘fliers’: [
‘means’: []}

Scatter plot¶
Scatter plot is often used to display the relationship between two variables (plot as x-y pairs). In this scatter plot example, we use famous Iris data set. The data is available here. This data set provides measurements on various parts of three types of Iris flower (Iris setosa, Iris versicolour, and Iris virginica). For each type, there are 50 measurements, or samples. Each data row in the CSV file contains (1) sepal length, (2) sepal width, (3) petal length, (4) petal width, and (5) the type of Iris flower.
The following code generates the scatter plot between petal length and petal width of the three Iris types.
In [126]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
iris=pd.read_csv(‘data/iris.csv’,encoding = ‘ISO-8859-1′,header=None)
iris
setosa=iris.loc[iris[4]==’Iris-setosa’]
versicolor=iris.loc[iris[4]==’Iris-versicolor’]
virginica=iris.loc[iris[4]==’Iris-virginica’]
plt.scatter(setosa.iloc[:,2],setosa.iloc[:,3],color=’green’)
plt.scatter(versicolor.iloc[:,2],versicolor.iloc[:,3],color=’red’)
plt.scatter(virginica.iloc[:,2],virginica.iloc[:,3],color=’blue’)
plt.xlim(0.5,7.5)
plt.ylim(0,3)
plt.ylabel(“petal width”)
plt.xlabel(“petal length”)
plt.grid(True)

From the scatter plot, we may be able to suggest a particular type of relationship or a formation of clusters. In the example above you may notice that, for Iris versicolor, the samples with longer petal tend to have wider petal. You can also see clearly that there exist clusters of these three Irises. As such, the measurements of petal and sepal can help identifying the type of Iris flower. This example demonstrates how botanists may indentify a certain species from phenotype characteristics.
Exercise 4 ¶
Modify the example above to generate the scatter plot of sepal length and petal length.
In [63]:
##answer here
#(0) sepal length, (1) sepal width, (2) petal length, (3) petal width, and (4) the type of Iris flower.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
iris=pd.read_csv(‘data/iris.csv’,encoding = ‘ISO-8859-1′,header=None)
setosa=iris.loc[iris[4]==’Iris-setosa’]
versicolor=iris.loc[iris[4]==’Iris-versicolor’]
virginica=iris.loc[iris[4]==’Iris-virginica’]
# index 2 is petal length, replace with index 0 for sepal length
plt.scatter(setosa.iloc[:,0],setosa.iloc[:,2],color=’green’)
plt.scatter(versicolor.iloc[:,0],versicolor.iloc[:,2],color=’red’)
plt.scatter(virginica.iloc[:,0],virginica.iloc[:,2],color=’blue’)
plt.xlim(4,8)
plt.ylim(0.5,8)
plt.ylabel(“sepal length”)
# change x lable to sepal length
plt.xlabel(“sepal length”)
plt.grid(True)

Bar chart¶
Bar chart is probably the most common type of chart. It displays a property or properties of a set of different entities. Bar chart is typically used to provide comparison, or to show contrast between different entities. For example, the bar chart below displays the GNP per capita of the three poorest and the three richest countries in the world (based on 2004 GNP per capita):
In [64]:
%matplotlib inline
import matplotlib.pyplot as plt
import calendar
from numpy import arange
countries = [‘Burundi’,’Ethiopia’,’Rep of Congo’,’Switzerland’,’Norway’,’Luxembourg’]
gnp = [90,110,110,49600,51810,56380] # GNP per capita (2004)
plt.bar(arange(len(gnp)),gnp)
plt.xticks( arange(len(countries)),countries, rotation=30)
plt.show()

Exercise 5 ¶
Modify the bar chart example to plot the average maximum temperature in all major Australian cities. The data is available here.
In [65]:
##answer here
tmp = pd.read_csv(‘data/max_temp.csv’,encoding = ‘ISO-8859-1’)
city_avg_tmp = tmp.iloc[:,1:].mean(axis=1)
city_avg_tmp
city = tmp.iloc[:,0]
In [66]:
%matplotlib inline
import matplotlib.pyplot as plt
import calendar
from numpy import arange
plt.bar(arange(len(city)),city_avg_tmp)
plt.xticks( arange(len(city)),city, rotation=30)
plt.show()

In a clustered bar chart, you can display a few measurements from the entities of interest. For example, the clustered bar chart below simultaneously shows the number of births and deaths in four countries of interest. The number of births is displayed as the blue-colored bar and the number of deaths as the red-colored bar:
In [67]:
%matplotlib inline
import matplotlib.pyplot as plt
import calendar
from numpy import arange
countries = [‘Afghanistan’, ‘Albania’, ‘Algeria’, ‘Angola’]
births = [1143717, 53367, 598519, 498887]
deaths = [529623, 16474, 144694, 285380]
plt.bar(arange(len(births))-0.3, births, width=0.3)
plt.bar(arange(len(deaths)),deaths, width=0.3,color=’r’)
plt.xticks(arange(len(countries)),countries, rotation=30)
Out[67]:
([
)

Histogram¶
Histogram displays a distribution of population samples (typically a large set of data like digital images or age of population). The following example creates a histogram of age within a small number of samples (assumes these are the age of your classmates).
In [135]:
%matplotlib inline
import matplotlib.pyplot as plt
ages = [17,18,18,19,21,19,19,21,20,23,19,22,20,21,19,19,14,23,16,17]
plt.hist(ages, bins=20)
plt.grid(which=’major’, axis=’y’)
plt.show()

Exercise 6 ¶
Change the number of bins in the previous example to 20. Plot the histogram
In [69]:
# answer here
%matplotlib inline
import matplotlib.pyplot as plt
ages = [17,18,18,19,21,19,19,21,20,23,19,22,20,21,19,19,14,23,16,17]
plt.hist(ages, bins=20)
plt.grid(which=’major’, axis=’y’)
plt.show()

In [ ]: