ECON215 Coursework
The ECON215 Coursework Exercise is 100% of the module mark. The deadline for submitting the assignment (online only, see below) is 5pm, Thursday 12th December. All exercises in the ECON215 Coursework Exercise need to be completed using Python 3 code in this Jupyter Notebook file, which needs to be converted to an HTML file as follows before submitting:
1. Within Jupyter, click on File, then Download as, then HTML (.html). The file will now be located in the Downloads directory of your MWS computer.
2. Save this file to your M drive. One way to do this on MWS computers is by right-clicking on the file, selecting copy, then right-clicking in your M drive and selecting paste.
Once this is done, the final step is as follows:
1. Submit the .html version of your assignment via the link that will appear in the Assessment folder on Vital shortly.
The assignment comprises three sections, with the following weights: Section A (40%, 5 marks per question), Section B (40%, 10 marks per question) and Section C (20%, 10 marks per question). Where a question has parts (a) and (b), the marks are split evenly.
Section A
Q1 (Python lists): Can you complete the print statement below, selecting the correct item in the drinks list, in order to print tea?
In:
drinks = ['orange juice', 'tea', 'water']
print()
Q2 (Python lists): Add an additional drink item, pineapple juice, to the list, then print the updated version of drinks.
In:
Q3 (Python lists): Create a list containing any two strings and any two integers.
In:
Q4 (Python lists): Remove the last integer in the list created in Q3.
In:
Q5 (Python dicts): Create a Python dict variable called passengers2017, containing the following information (key : value pairs) on the total number of aircraft departures and arrivals, in thousands. That is, set the keys to be the airport names and the values to be the numbers.
Heathrow: 474
Gatwick: 282
Manchester: 194
In:
passengers2017 =
Q6 (Python functions): Define a function myfunction that returns the value of $f(x) = 2x$ when supplied with a value of $x$.
In:
# the function definition:
def myfunction():
Q7 (Python functions): Define a function myfunction2 that returns the value of $f(x, y) = xy$ when supplied with values of $x$ and $y$.
In:
Q8 (Python for loop): Complete the following code in order to print the items in alist.
In:
alist = ['a', 'bb', 'ccc']
for item in alist:
Section B
In your solutions to the Section B questions, you must also include an explanation of each step in your code. For example, if you use functions belonging to a particular module, or methods belonging to a particular object, or core Python functions, you should say so. If you set particular options inside the functions or methods, you should explain what these do. If you create an object of a particular type (e.g. DataFrame or list), you should say so. As many steps as possible should be explained, though it is not necessary to re-explain any identical steps within a given solution (e.g. repeated use of the same function from the same module).
The explanation should be included as comments inside the code cell. Recall that there are two ways of writing comments in Python:
# This is a single line comment
"""This is
a multiline
comment"""
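As a rough illustration of the level of explanation described above, here is a minimal sketch on a toy task unrelated to the coursework questions. The data values and variable names are hypothetical, and the amount of commentary shown is only one reasonable reading of the requirement.

```python
import math  # standard-library module providing mathematical constants and functions

# Create a list object holding three radii (hypothetical values)
radii = [1.0, 2.5, 4.0]

# Build a new list of circle areas; math.pi is a constant from the math module
# and ** is the core Python power operator
areas = [math.pi * r ** 2 for r in radii]

"""
sum and round are core Python functions.
The second argument of round sets the number of
decimal places to keep (here, 2).
"""
rounded_total = round(sum(areas), 2)
print(rounded_total)
```

Note that a given step (here, the use of math.pi) is explained once rather than at every use.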
Q1 (matplotlib line plotting): Using matplotlib and the lists of data xdata and ydata provided in the code cell below, plot ydata against xdata.
In:
import matplotlib.pyplot as plt
xdata = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ydata = [0, 2.2, 4.3, 2.1, 9.3, 3.2, 13.4, 14.1, 15.6, 7.7, 10.2]
Q2 (matplotlib line plotting): Using matplotlib and the lists of data xdata, ydata and zdata provided in the code cell below, can you create graphs of ydata against xdata, and zdata against xdata, both on the same plotting area?
Better answers will specify different line styles, marker styles and colours for each line, and will include a legend describing the lines $y$ and $z$. Moreover, better answers will include labels for each axis ($x$ and $f(x)$), a title for the plot, and will use the annotate function from matplotlib.pyplot to point at the intersection of the lines.
In:
import matplotlib
import matplotlib.pyplot as plt
xdata = [1, 2, 3, 4, 5]
ydata = [5, 4, 3, 2, 1]
zdata = [2.5, 3, 3, 4, 4.5]
Q3 (Loading data, changing the DataFrame index, Figure objects and subplots): For this question, please download the files arableland2000.txt and arableland2001.txt from Vital and put them in the same directory as this Jupyter Notebook. This question has two parts, (a) and (b). Please put your solutions to parts (a) and (b) in the separate code cells below.
(a) Suppose you would like to read the contents of the files arableland2000.txt and arableland2001.txt into pandas as DataFrame objects, but you don't know the contents of the files. Show how you can look at the contents of these text files using Python, then read both of them into pandas as arableland2000 and arableland2001, setting the Country Name column to be the row index in each case, and with the Country Name column dropped in both cases.
Note: Part of question (a) is to read the data directly from the text files into the DataFrame objects. However, if you are unable to do this step, you can produce the required DataFrame objects in any other way (e.g. by copying and pasting values into code to create DataFrames from dicts) and continue with the rest of the question for partial marks.
(b) Show how, using a matplotlib Figure object and the plot method for pandas DataFrame and Series objects, or otherwise, you can create a figure containing two bar chart plots side by side, one for each of the two datasets (arableland2000.txt and arableland2001.txt). Each plot should have a title and the spacing between the plots should be increased.
Note: If you choose not to use the pandas plot method directly on the pandas objects arableland2000 and arableland2001, and instead use a matplotlib method, recall that the matplotlib methods only work for Series, not for DataFrame. The objects arableland2000 and arableland2001 will be DataFrames, so you would need to select the data column from each (a Series) in order to do this. Two cosmetic points (optional): as part of your solution, however you do it, you should have two AxesSubplot objects within your code. If these are called ax1 and ax2, note that the unnecessary $x$-axis label Country Name in the plots can be removed by running ax1.set_xlabel('') and ax2.set_xlabel(''). Moreover, if you are using the pandas plot method, note that the unnecessary legend can be removed by setting the option legend=False.
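The two cosmetic points in the note can be sketched on a single plot as follows. This is only an illustration under stated assumptions: the DataFrame toy, its column name Value and its numbers are hypothetical and are not taken from the coursework files, and a single Axes object is used rather than the two required in part (b).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy DataFrame whose row index is named "Country Name"
toy = pd.DataFrame(
    {"Value": [3.0, 5.5, 1.2]},
    index=pd.Index(["A", "B", "C"], name="Country Name"),
)

# plt.subplots is a matplotlib.pyplot function returning a Figure and an Axes object
fig, ax = plt.subplots()

# The pandas plot method draws onto the supplied Axes;
# legend=False stops pandas from adding a legend
toy.plot(kind="bar", ax=ax, legend=False)

# pandas labels the x-axis with the index name; an empty string removes that label
ax.set_xlabel("")
```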
Answer to a:
In:
Answer to b:
In:
Q4 (Python for loop): This question has two parts, (a) and (b). Please put your solutions to parts (a) and (b) in the separate code cells below.
(a) Consider the following code, which contains a list comprehension.
alist = ['a', 'bb', 'ccc']
newlist = [len(item) for item in alist]
Write a for loop that does the equivalent of the above, starting with the empty list newlist and using the list method append.
(b) Using a for loop, the list letters and the variable count that have been defined below, and either format strings or the string format method, can you create the following output?
Item 1 in letters is A
Item 2 in letters is B
Item 3 in letters is C
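For reference, the two string-building approaches named in part (b) can be compared on data unrelated to the question. This sketch assumes that "format strings" refers to formatted string literals (f-strings); the variables name and score are hypothetical.

```python
# Hypothetical data, unrelated to the coursework lists
name = "Ada"
score = 92.456

# Formatted string literal (f-string): expressions are written directly inside {}
line_f = f"{name} scored {score:.1f}"

# String format method: each {} placeholder is filled by the arguments, in order
line_format = "{} scored {:.1f}".format(name, score)

print(line_f)
print(line_format)
```

Both lines produce the same text; the :.1f format specification rounds the number to one decimal place in each case.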
Answer to a:
In:
alist = ['a', 'bb', 'ccc']
Answer to b:
In:
letters = ['A', 'B', 'C']
count = 0
In:
Section C
As in Section B, your solutions to the Section C questions must also include an explanation of each step in your code. For example, if you use functions belonging to a particular module, or methods belonging to a particular object, or core Python functions, you should say so. If you set particular options inside the functions or methods, you should explain what these do. If you create an object of a particular type (e.g. DataFrame or list), you should say so. As many steps as possible should be explained, though it is not necessary to re-explain any identical steps within a given solution (e.g. repeated use of the same function from the same module).
Q1 (Merging DataFrames, and operating on DataFrames): For this question, please download the files poverty2000.csv, poverty2001.csv, poverty2002.csv and poverty2003.csv from Vital and put them in the same directory as this Jupyter Notebook.
The code below produces four pandas Series objects, pov2000, pov2001, pov2002 and pov2003, each containing poverty headcount ratio data (percentage of population at US$1.90 a day, 2011 PPP) for countries in a given year (2000, 2001, 2002 and 2003, respectively).
Use the merge function to join the four Series into a DataFrame with four columns, where only data for the countries common to all four Series is included (there should be no missing data in the merged DataFrame). Then use pandas functions or methods to create a new Series containing the average values over 2000 to 2003.
Q2 (Indexing, hierarchical indexing, groupby, aggregation, data analysis): For this question, please download the files povertyfull.csv and regionandincomegroup.csv from Vital and put them in the same directory as this Jupyter Notebook.
Read both files into DataFrame objects, then use loc or iloc to create a new DataFrame containing only the following columns from the file povertyfull.csv: Country Name, Country Code, and the years 2010 to 2015.
Then obtain a new DataFrame that inner merges the two DataFrames on Country Code.
Using this data, perform some exploratory data analysis. The better answers will, as part of this, set a hierarchical index for the rows using IncomeGroup and Country Name, use groupby to find statistics by Region and IncomeGroup, and will include some visualisation of the data.
Two code cells are provided below for your solution: the first is for the data analysis, where you should print out key results, while the second is for any visualisation.
In:
# pandas code here
In:
# visualisation code here