Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\rightarrow$Run All). Check your output to make sure it all looks as you expected.
Make sure you fill in any place that says YOUR CODE HERE or “YOUR ANSWER HERE”, as well as your name below:
In [ ]:
NAME = "YOUR NAME HERE"
Week 3 Homework
Reminder: Do not use exit() in a notebook. Use return to exit a function.
This homework includes list comprehensions, which are a very common pattern in Python code. They can be schematically described as being of the form [something(x) for x in list], which returns a new list with the function something applied to each member x in the list (or other iterable, such as a tuple).
Make sure you have read the chapter on List Comprehensions here: http://introtopython.org/lists_tuples.html#List-Comprehensions
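As a minimal sketch of the schematic form above, the following applies a function to every member of a list. The names `shout` and `fruits` are made up for illustration and are not part of this homework.

```python
# Hypothetical helper and data, used only to illustrate the pattern.
def shout(word):
    """Return the word in upper case."""
    return word.upper()

fruits = ['apple', 'kiwi', 'plum']

# [something(x) for x in list], with something = shout
shouted = [shout(fruit) for fruit in fruits]
print(shouted)

# The same pattern works on other iterables, such as a tuple.
squares = [n * n for n in (1, 2, 3)]
print(squares)
```

The original list is left unchanged; the comprehension builds and returns a new list.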
In [ ]:
# Here is a list of names we will use for the first problem:
names = ['frederica', 'gilbert', 'amine', 'hasan', 'annie', 'bob']
Question 1: Write a function using a list comprehension to create a new list with just the length of each of the strings in the input list of names above. Return the new list of string lengths.
In [ ]:
def get_lengths(words):
    """ Takes a list of words, returns a list of their lengths. """
    # TYPE YOUR CODE HERE.
    # ALSO REMOVE THE LINE BELOW WITH "raise" IN IT:
    raise NotImplementedError()
In [ ]:
# show your code running here:
get_lengths(names)
In [ ]:
## A test for your code that you can ignore. We will check manually too.
assert get_lengths(names) == [9, 7, 5, 5, 5, 3]
In [ ]:
## another test to make sure you used a list comprehension
import ast
import inspect
src = inspect.getsource(get_lengths)
node = ast.parse(src)
class ListCompChecker(ast.NodeVisitor):
    def __init__(self):
        super().__init__()
        self.Found = False
    def visit_ListComp(self, node):
        # Called once for every list comprehension found in the source.
        self.Found = True
checker = ListCompChecker()
checker.visit(node)
assert checker.Found
Question 2: How would you convert the following for-loop code (for val in mylist) into a single-line list comprehension? (The print part can be a separate line after your list comprehension.)
In [ ]:
mylist = [2, 4, 8, 10, 3.4, 4, 2]
tens = []
for val in mylist:
    tens.append(val * 10)
print(tens)
In [ ]:
# TYPE YOUR CODE HERE.
# ALSO REMOVE THE LINE BELOW WITH "raise" IN IT:
raise NotImplementedError()
Chapter 9: Dictionaries
You should have read the book chapter up to Advanced Text Parsing (you can skip that if you want) and also read this: http://introtopython.org/dictionaries.html
Pay particular attention to looping through keys and values in dictionaries using items(), which is not mentioned in the book. This is the most common way to access the parts of the dictionary.
The pattern to use with items() is this:
for key, value in mydict.items():
    # do something with key and/or value
Make sure you understand this common pattern.
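A small sketch of the pattern in use, assuming a made-up dictionary `ages` that is not part of the homework data:

```python
# Hypothetical data: ages keyed by name.
ages = {'ana': 31, 'ben': 27}

lines = []
# items() hands back each key together with its value.
for name, age in ages.items():
    lines.append("{} is {} years old".format(name, age))

print(lines)
```

Looping over items() avoids a second lookup such as `ages[name]` inside the loop.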
Question 3:
Write a function that adds one to each numeric value in a dictionary. If the value is not numeric (i.e. not an int or float), don’t add anything, but keep the same value. The function should take a dictionary as argument, and return the modified dictionary.
In [ ]:
def addtovalue(dictionary):
    "Add one to each numeric value of the dictionary, return the dict"
    # TYPE YOUR CODE HERE.
    # ALSO REMOVE THE LINE BELOW WITH "raise" IN IT:
    raise NotImplementedError()
In [ ]:
testdict = {'fred': 3.3, 'marie': '5', 'jean': 14, 'angus': 44, 'amine': None}
result = addtovalue(testdict)
assert result['fred'] == 4.3
assert result['jean'] == 15
assert result['angus'] == 45
In [ ]:
testdict = {'fred': 3, 'marie': '5', 'jean': 14, 'angus': 44, 'amine': None}
result = addtovalue(testdict)
assert result['amine'] is None
assert result['marie'] == '5'
Question 4:
You can use multiple functions if you want, but if you do, you must have one main function that calls the others. This is because the test code relies on the main function to check your output.
Create a function that takes a file name (and path if needed) as the argument. In the function, open and read in the file mountains.csv. Use a try/except to be sure the file exists and is readable. If the file location is wrong or it can’t be opened, print an error that begins with “Error:”. (You can test it with a junk path or filename that doesn’t exist.)
The pattern I suggest for this is:
try:
    with open(filename, 'r') as handle:
        for line in handle:
            # ... do stuff here (you can have other try/except in here if you want)
except Exception:
    print("Error: Something wrong with your file location?")
    return
An alternate pattern is:
try:
    handle = open(filename, 'r')
except Exception:
    print("Error: trouble with file opening")
    return
But you must remember to close the handle if you do this. The book says:
“We could close the files which we open for read as well, but we can be a little sloppy if we are only opening a few files since Python makes sure that all open files are closed when the program ends. When we are writing files, we want to explicitly close the files so as to leave nothing to chance.”
If you are using the pattern I recommend, with the with open() as handle: idiom, you don’t need to close the file explicitly; it will be closed for you. That’s why it’s recommended.
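The following sketch shows the with open() idiom closing the file on its own, together with the error branch. The function `read_first_line` and the temporary file are assumptions made for illustration; they are not part of the homework and do not read mountains.csv.

```python
import os
import tempfile

def read_first_line(path):
    """Return (first line, handle.closed), or None if the file cannot be opened."""
    try:
        with open(path, 'r') as handle:
            first = handle.readline().strip()
        # Leaving the with block has already closed the file.
        return first, handle.closed
    except (OSError, TypeError):
        # OSError covers missing or unreadable files; TypeError covers a path of None.
        print("Error: could not open {!r}".format(path))
        return None

# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    tmp_path = tmp.name

good = read_first_line(tmp_path)
bad = read_first_line(tmp_path + '.does_not_exist')
os.remove(tmp_path)

print(good)
print(bad)
```

Catching specific exception types is a design choice in this sketch; a broader except clause, as in the patterns above, also works.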
The file mountains.csv is a comma-separated list of mountains, their heights in meters, and the ranges they belong to. (Look at it in a text editor, but don’t edit the file!) A CSV file is a common format for raw data. Other types of raw data files are semicolon-separated or tab-separated files. However the columns are separated, you must use that character in your “split” code.
In this case, it’s a comma. Split each line by the comma, and make a dictionary where the key is the mountain name (the first element) and the height is the value, the second element. Make sure to convert the height to a number. Then print the keys and values of the dictionary using .items(), in readable sentences that say, for instance, “The height of K2 is 8611 meters.” Return the dictionary at the end of the function.
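As a rough sketch of the splitting step, the example below uses made-up river data held in a list of strings rather than a file. The names `rows` and `lengths` are hypothetical.

```python
# Hypothetical rows in "name,length_km,continent" form, as they might
# come out of a file (each line still ends with a newline).
rows = ["Nile,6650,Africa\n", "Amazon,6400,South America\n"]

lengths = {}
for row in rows:
    # strip() removes the trailing newline; split(',') cuts at each comma.
    fields = row.strip().split(',')
    # First field is the key; second field is converted from str to int.
    lengths[fields[0]] = int(fields[1])

print(lengths)
```

Without the int() conversion the values would stay as strings, so a comparison such as `lengths['Nile'] == 6650` would be False.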
Reminder about print with {} in your string: use print(string.format(variable)) to fill in the {} with your variable. If there are 2 {}’s, use .format(var1, var2).
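A short illustration of the reminder above, using made-up strings and values:

```python
# One {} takes one argument.
greeting = "Hello, {}!".format("world")
print(greeting)

# Two {}'s are filled in order by two arguments.
template = "{} is {} km long."
sentence = template.format("The Nile", 6650)
print(sentence)
```

Numbers are converted to text automatically when they are placed into the string.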
In [ ]:
def mountain_height(filename):
    """ Read in a csv file of mountain names and heights.
    Parse the lines and print the names and heights.
    Return the data as a dictionary.
    The key is the mountain and the height is the value.
    """
    mountains = dict()
    msg = "The height of {} is {} meters."
    err_msg = "Error: File doesn't exist or is unreadable."
    # TYPE YOUR CODE HERE.
    # ALSO REMOVE THE LINE BELOW WITH "raise" IN IT:
    raise NotImplementedError()
In [ ]:
# Edit this to have the path to your file mountains.csv.
# Show that it runs.
filename = "./data_files/mountains.csv"
mountain_height(filename)
In [ ]:
# Test code for grading your function. You can ignore this.
filename = "./data_files/mountains.csv"
output = mountain_height(filename)
assert len(output.keys()) == 14
assert output['Annapurna'] == 8091
In [ ]:
# Test code for printing the data.
from io import StringIO
from unittest import mock
import sys
filename = "./data_files/mountains.csv"
with mock.patch('sys.stdout', new_callable=StringIO):
    mountain_height(filename)
    # sys.stdout is the StringIO replacement only inside this block.
    assert "8848" in sys.stdout.getvalue()
    assert "Mount Everest" in sys.stdout.getvalue()
In [ ]:
# Test for your error condition with a bad filename/path.
from io import StringIO
from unittest import mock
import sys
filename = None
with mock.patch('sys.stdout', new_callable=StringIO):
    mountain_height(filename)
    # sys.stdout is the StringIO replacement only inside this block.
    assert "Error:" in sys.stdout.getvalue()
Question 5:
Rewrite your function to use the collections module’s Counter to count how many times each mountain range is mentioned. Each row contains a mountain, its height, and the range it is part of. The ranges are still in the 3rd column of the mountains.csv file! You can use more than one function if you want.
Also add a dictionary that records all the heights of the mountains in a particular range. You will use a list for the values of the heights. So this is a dictionary with a list value! The key will be the range name. Each time you see a new mountain in the range, add the height to the list for that key. For example, after reading all the data, mountains['Himalayas'] == [8848, 8586, 8516, 8485, 8201, 8167, 8163, 8126, 8091, 8027]. (The “Himalayas” are the range.)
You may use a regular dict or a defaultdict, but you must beware of KeyError with a regular dictionary if the key doesn’t exist yet.
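To make the difference concrete, here is a sketch that groups made-up items both ways. The data in `pairs` is hypothetical and unrelated to the mountain files.

```python
from collections import defaultdict

# Hypothetical (category, item) pairs.
pairs = [('fruit', 'apple'), ('veg', 'leek'), ('fruit', 'pear')]

# Regular dict: create the empty list first, otherwise
# plain[kind].append(...) raises KeyError for a new key.
plain = {}
for kind, item in pairs:
    if kind not in plain:
        plain[kind] = []
    plain[kind].append(item)

# defaultdict(list): a missing key starts out as an empty list.
grouped = defaultdict(list)
for kind, item in pairs:
    grouped[kind].append(item)

print(plain)
print(dict(grouped))
```

Both approaches end with the same contents; the defaultdict version simply moves the "does the key exist yet?" check into the dictionary itself.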
Your output should print the top 2 ranges according to their Counter value (hint: look at the function most_common()). Adding the mountain range name to the counter requires a little care (look at update).
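A minimal sketch of the two Counter methods mentioned above, using made-up colour names. It also shows why update needs care: given a bare string, it counts the individual letters.

```python
from collections import Counter

counts = Counter()
for colour in ['red', 'blue', 'red', 'green', 'red', 'blue']:
    # update() expects an iterable of items, so the word is wrapped
    # in a list to count it as one item.
    counts.update([colour])

# most_common(2) returns the two highest counts as (item, count) pairs.
top_two = counts.most_common(2)
print(top_two)

# Passing the string itself counts each character separately.
letters = Counter()
letters.update('red')
print(letters)
```

An alternative to update with a one-item list is `counts[colour] += 1`, since a Counter returns 0 for keys it has not seen.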
Then, print the average height of the mountains in each range. (They don’t have to be in order. Hint: You may need to find out how to import a mean function, or else calculate it by hand.)
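For the averaging step, a small sketch with made-up numbers shows both options: the mean function from the statistics module and the calculation by hand. The dictionary `scores` is hypothetical.

```python
from statistics import mean

# Hypothetical dictionary with list values.
scores = {'a': [1, 2, 3], 'b': [10, 20]}

averages = {}
for key, values in scores.items():
    averages[key] = mean(values)
print(averages)

# The same result by hand: the sum divided by the number of values.
by_hand = sum(scores['a']) / len(scores['a'])
print(by_hand)
```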
Return the dictionary object with the ranges and their lists of mountain heights after all the printing.
Show that this code works with the other file, “highest_mountains.csv” too.
In [ ]:
# Using Counter()
from collections import Counter
from collections import defaultdict
from statistics import mean # this also exists in numpy if you prefer
# define your dicts inside the function, so they start empty each time it is called.
def mountain_ranges(filename):
    ranges = Counter()
    heights = defaultdict(list) # empty list is the default here, not an int!
    # TYPE YOUR CODE HERE.
    # ALSO REMOVE THE LINE BELOW WITH "raise" IN IT:
    raise NotImplementedError()
In [ ]:
# TYPE YOUR CODE HERE.
# ALSO REMOVE THE LINE BELOW WITH "raise" IN IT:
raise NotImplementedError()
In [ ]:
## Testing the output contains values we expect from the counts and means.
from unittest import mock
from io import StringIO
import sys
with mock.patch('sys.stdout', new_callable=StringIO):
    mountain_ranges("data_files/mountains.csv")
    # sys.stdout is the StringIO replacement only inside this block.
    assert "8321" in sys.stdout.getvalue()
    assert "10" in sys.stdout.getvalue()
In [ ]:
# Testing your output for the grade. Ignore this. Handgrading of the printed output.
filepath = "./data_files/mountains.csv"
result = mountain_ranges(filepath)
assert result['Karakoram'] == [8611, 8080, 8051, 8035]
In [ ]:
# Show your code works with the other file, highest_mountains too.
# Fix the path!
filepath = "./data_files/highest_mountains.csv"
mountain_ranges(filepath)
Congratulations, now you are doing basic data science!