DNDS6013 Scientific Python: 3rd class
Central European University, Winter 2019/2020
Instructor: Márton Pósfai, TA: Luis Natera Orozco
Emails: posfaim@ceu.edu, natera_luis@phd.ceu.edu
Novelties:
• Slack channel to show your solution to the exercises. Join here!
• There will be a small quiz, not part of your grade, just for (self-)assessment.
• Hints and solutions for the exercises
Today’s topics
• Sorting
• Analyze city data
• Dictionaries
Recap from last week
List comprehensions
So far we created lists like this:
In [ ]:
L = []
for x in range(5):
    L.append(x**2)
print(L)
We can also use list comprehensions:
In [ ]:
L = [x**2 for x in range(5)]
print(L)
In [ ]:
L = [x**2 for x in range(5) if x > 2]
print(L)
In [ ]:
L = []
for x in range(5):
    if x%2 == 0:
        L.append([x, x*x])
print(L)
In [ ]:
L = [[x, x**2] for x in range(5) if x%2 == 0]
print(L)
Functions
In [ ]:
def f(x,y):
    z = x + y
    return z
print(f(1,10))
The variable z is local:
In [ ]:
z=-100
f(1,2)
print(z)
The argument a of a function f(a) is passed by assignment.
For example, if I call f(10), a local variable a is created that refers to the value 10, as if a=10 had been executed inside the function.
In [ ]:
def f(a):
    a += 1
    return
b = 3
f(b)
print(b)
In [ ]:
def f(a):
    a[0] += 1
    return
b = [ 1, 2, 3 ]
f(b)
print(b)
• If x is mutable (lists, dictionaries, …): modifying x in place inside f(x) -> x also changes outside
• If x is immutable (numbers, strings, tuples, …): x cannot be modified in place, an operation such as a += 1 only rebinds the local name -> x does not change outside
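The difference between rebinding a name and modifying an object in place can be sketched as follows. The helper names rebind and mutate are made up for this illustration; even a mutable argument stays unchanged outside when the function only assigns a new object to its local name.

```python
def rebind(a):
    # builds a new list and rebinds the local name only
    a = a + [99]

def mutate(a):
    # modifies the caller's list in place
    a.append(99)

numbers = [1, 2, 3]
rebind(numbers)
after_rebind = list(numbers)   # snapshot after the rebinding call
mutate(numbers)
after_mutate = list(numbers)   # snapshot after the in-place call
print(after_rebind, after_mutate)
```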
Time for a little quiz!
Don’t panic! It does not affect your grade, it is only for self-assessment. You can find it on Moodle: 5 questions in 12 minutes.
Today’s first new topic: sorting
In [ ]:
l = [34,1,2,78,3]
sl = sorted(l)
print("Sorted list:",sl)
print()
print("Original:",l)
In-place sorting: sort()
In [ ]:
l.sort()
print(l)
In [ ]:
guests = ["Kate","Peter", "Adam", "Jenny", "Zack", "Eva"]
print("Zack">"Eva")
print(sorted(guests))
print(sorted(guests, reverse=True))
International characters
In [ ]:
guests = ["Kate","Peter", "Adam", "Jenny", "Zack", "Eva"]
guests.append('Ödön')
print(sorted(guests))
print()
# The problem: strings are compared by Unicode code point, so 'Ö' comes after 'Z'
print("Zack">"Odon")
print("Zack">"Ödön")
In [ ]:
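import locale
# locale.resetlocale() was removed in Python 3.13; this call also selects the user's default locale
locale.setlocale(locale.LC_ALL, '')
locale.getlocale()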
In [ ]:
# the hu_HU.UTF-8 locale must be installed on the system, otherwise setlocale raises locale.Error
locale.setlocale(locale.LC_ALL, ('hu_HU','UTF-8'))
print(locale.getlocale())
print(locale.strxfrm("Ödön"))
print(locale.strxfrm("Zack")>locale.strxfrm("Ödön"))
print()
print(sorted(guests, key=locale.strxfrm))
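If the Hungarian locale is not installed, a rough locale-independent alternative is to sort by the accent-stripped form of each name. This is only a sketch: strip_accents is a hypothetical helper built on the standard unicodedata module, and it does not reproduce real Hungarian collation, where Ö is a letter of its own; it merely treats Ö like O.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

guests = ["Kate", "Peter", "Adam", "Jenny", "Zack", "Eva", "Ödön"]
# The key is only used for comparison; the original strings are kept in the result.
by_base_letter = sorted(guests, key=strip_accents)
print(by_base_letter)
```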
In [ ]:
def f1(x):
    return x % 7
L = [15, 3, 11, 7]
print("Normal sort :", sorted(L) )
print("Sorted with key:", sorted(L, key = f1) )
Exercise 1
Write code that orders the list generated by
L = [[x, y] for x in range(5) for y in range(5) if x != y]
by the sum of the two elements of each item.
Click to reveal a hint.
Define a function that takes a list of two elements as input and returns their sum. Use this function as the key for sorting.
In [ ]:
L = [[x, y] for x in range(5) for y in range(5) if x != y]
Click to reveal solution.
```python
# solution 1
def f(z):
    return z[0]+z[1]
print(sorted(L,key=f))

# solution 2, Python already has a function for summing lists
print(sorted(L,key=sum))
```
Lambdas: one-line nameless functions
In [ ]:
f1 = lambda x: x%7
L = [15, 3, 11, 7]
print("Sorted with key:", sorted(L, key = f1) )
In [ ]:
L = [15, 3, 11, 7]
print("Sorted with key:", sorted(L, key = lambda x: x%7) )
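A lambda key can also return a tuple, which gives a sort on several criteria at once because tuples are compared element by element. The list below is made up for illustration: it sorts by the remainder modulo 7 first and, among equal remainders, by descending value.

```python
L = [15, 3, 11, 7, 8, 1]
# Primary criterion: x % 7 (ascending); tie-breaker: -x, i.e. larger values first.
result = sorted(L, key=lambda x: (x % 7, -x))
print(result)
```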
Exercise 2
Use a lambda to sort the list
L = [[x, y] for x in range(5) for y in range(5) if x != y]
by the second value of each element.
In [ ]:
L = [[x, y] for x in range(5) for y in range(5) if x != y]
Click to reveal solution.
```python
L.sort(key = lambda z: z[1])
print(L)
```
Analyzing city data
We will read a file containing data about Hungarian municipalities and compute a few quantities from it.
In [ ]:
f = open("Hun_cities.csv","r")
contents = str(f.read())
f.close()
print(contents[:500])
In [ ]:
contents.split('\n')[5]
In [ ]:
cities = []
for line in contents.split('\n'):
    cities.append(line.split(','))
print(cities[0])
print(cities[1])
Alternative way of doing this:
In [ ]:
cities = []
with open("Hun_cities.csv","r") as f:
    for line in f:
        cities.append(line.rstrip().split(','))
print(cities[0])
print(cities[1])
print(len(cities))
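Python's standard csv module is another option for this kind of file: csv.reader removes the double quotes around fields and keeps commas that appear inside quoted fields. The sketch below runs on a made-up in-memory sample rather than on Hun_cities.csv; the column names and values are placeholders, not the real file's content.

```python
import csv
import io

# Made-up sample data; with a real file one would pass
# open("Hun_cities.csv", newline="") to csv.reader instead.
sample = io.StringIO(
    'id,name,population\n'
    '1,"Alpha",1200\n'
    '2,"Beta, Upper",350\n'
)

rows = list(csv.reader(sample))
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Note that every field is still a string, so numeric columns need the same int() or float() conversion as before.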
In [ ]:
Convert coordinates from strings to numbers:
In [ ]:
for c in cities[1:]:
    c[5]=float(c[5])
    c[6]=float(c[6])
print(cities[1])
Exercise 3
Remove the double quotes from the city names.
Hint.
Iterate through the cities list and use the strip() method of strings to remove the double quotes.
In [ ]:
Solution.
```python
for c in cities:
    c[1]=c[1].strip('"')
    c[2]=c[2].strip('"')
print(cities[5])
```
In [ ]:
Exercise 4
• Print the 10 most populous cities
• Print the 10 cities with the largest area
• Print the smallest city with a university
• Print the 10 cities with the highest population density
Hint.
Use the sorted() function with the key argument to appropriately sort the cities list. Don’t forget to convert the strings into numbers!
Hint 2.
Solution for the first one:
```python
print("10 most populous:", [c[1] for c in sorted(cities[1:], key = lambda x: -int(x[3]))[:10]])
```
In [ ]:
Solution.
```python
print("10 most populous:", [c[1] for c in sorted(cities[1:], key = lambda x: -int(x[3]))[:10]])
print("10 largest area:", [c[1] for c in sorted(cities[1:], key = lambda x: -float(x[4]))[:10]])
print("smallest w university:", min([c for c in cities[1:] if int(c[-1])>0], key = lambda x: int(x[3]))[1])
print("10 largest pop density:", [c[1] for c in sorted(cities[1:], key = lambda x: -float(x[3])/float(x[4]))[:10]])
```
Exercise 5
• Find the two cities with the largest distance between them.
• If you are done, extend the code to find the two cities with the smallest distance between them.
A little help: to convert latitude and longitude to distance, use the haversine formula:
In [ ]:
import math
def latlongdist(lat1,long1,lat2,long2):
    rlat1 = math.radians(lat1)
    rlat2 = math.radians(lat2)
    rlong1 = math.radians(long1)
    rlong2 = math.radians(long2)
    dlat = rlat2 - rlat1
    dlong = rlong2 - rlong1
    a = math.sin(dlat / 2)**2 + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlong / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return 6371.0 * c
print(latlongdist(48.105625, 20.790556, 46.07308, 18.22857))
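A quick way to gain confidence in the haversine helper is to test it on a case with a known answer. The sketch below repeats the function so that it runs on its own. Along a meridian the distance for one degree of latitude should equal the radius times one degree in radians, about 111.19 km for the radius 6371 km used here; the test coordinates are arbitrary.

```python
import math

def latlongdist(lat1, long1, lat2, long2):
    """Great-circle distance in km between two points given in degrees."""
    rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
    dlat = rlat2 - rlat1
    dlong = math.radians(long2) - math.radians(long1)
    a = math.sin(dlat / 2)**2 + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlong / 2)**2
    return 6371.0 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# One degree of latitude at constant longitude (arbitrary test coordinates).
one_degree = latlongdist(47.0, 19.0, 48.0, 19.0)
expected = 6371.0 * math.radians(1.0)
print(one_degree, expected)

# The distance from a point to itself should vanish.
zero = latlongdist(47.0, 19.0, 47.0, 19.0)
print(zero)
```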
Write your solution here:
Hint.
Cycle through all city pairs using nested for loops:
```python
for a in range(1,len(cities)):
    for b in range(a+1,len(cities)):
```
In [ ]:
Solution.
```python
def alldist():
    maxa = 0
    maxb = 0
    maxdist = -1
    mina = 0
    minb = 0
    mindist = 1e10
    for a in range(1,len(cities)):
        for b in range(a+1,len(cities)):
            dist = latlongdist(float(cities[a][5]),float(cities[a][6]),\
                               float(cities[b][5]),float(cities[b][6]))
            if dist>maxdist:
                maxdist = dist
                maxa = a
                maxb = b
            if dist>0 and dist < mindist:
                mindist = dist
                mina = a
                minb = b
    print(cities[maxa][1],cities[maxb][1],maxdist)
    print(cities[mina][1],cities[minb][1],mindist)
```
In [ ]:
def alldist():
    maxa = 0
    maxb = 0
    maxdist = -1
    mina = 0
    minb = 0
    mindist = 1e10
    for a in range(1,len(cities)):
        for b in range(a+1,len(cities)):
            dist = latlongdist(float(cities[a][5]),float(cities[a][6]),\
                               float(cities[b][5]),float(cities[b][6]))
            if dist>maxdist:
                maxdist = dist
                maxa = a
                maxb = b
            if dist>0 and dist < mindist:
                mindist = dist
                mina = a
                minb = b
    print(cities[maxa][1],cities[maxb][1],maxdist)
    print(cities[mina][1],cities[minb][1],mindist)
In [ ]:
import timeit
timeit.timeit(alldist,number=1)
In [ ]:
Bucketing
On my laptop the calculation took ~9 seconds for $n = 2561$ cities. That is $\frac{n(n-1)}{2} = 3\,278\,080$ distances.
What about larger systems? -> Bucketing!
• Divide the space into boxes
• Put the cities into boxes
• Number of boxes is much less than number of cities
• Select only the boxes that are candidates for the given quantity: for the minimal distance only the same box and neighboring boxes, for the maximal distance the few boxes that are farthest apart
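The first two steps above, dividing the space into boxes and putting the points into them, can be sketched on made-up data. The random points, the grid shape and all variable names below are assumptions for illustration only; they are not the city coordinates.

```python
import random

random.seed(0)  # reproducible made-up points, not real city coordinates
points = [(random.uniform(0.0, 4.0), random.uniform(0.0, 7.0)) for _ in range(200)]

shape = (4, 7)  # number of boxes along each axis
x_min, x_max = min(p[0] for p in points), max(p[0] for p in points)
y_min, y_max = min(p[1] for p in points), max(p[1] for p in points)
dx = (x_max - x_min) / shape[0]   # box size along the first axis
dy = (y_max - y_min) / shape[1]   # box size along the second axis

# A grid of independent empty lists, one per box.
grid = [[[] for _ in range(shape[1])] for _ in range(shape[0])]
for p in points:
    # min(...) keeps the point with the maximal coordinate inside the last box
    i = min(int((p[0] - x_min) / dx), shape[0] - 1)
    j = min(int((p[1] - y_min) / dy), shape[1] - 1)
    grid[i][j].append(p)

counts = [[len(box) for box in row] for row in grid]
for row in counts:
    print(row)
```

Each point lands in exactly one box, so the box counts add up to the number of points.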
In [ ]:
lat = [ row[5] for row in cities[1:]]
print("Latitude:",min(lat),max(lat))
long = [ row[6] for row in cities[1:]]
print("Longitude:",min(long),max(long))
Create an empty grid:
In [ ]:
shape = (4, 7) # shape of the grid
#a grid of empty lists
grid = [[[] for k in range(shape[1])] for j in range(shape[0])]
for row in grid:
    print(row)
    print()
Get size of tiles:
In [ ]:
#The possible range of coordinates
la_range = (min(lat), max(lat))
lo_range = (min(long), max(long))
#size of tiles in the grid
dla = (la_range[1]-la_range[0])/shape[0]
dlo = (lo_range[1]-lo_range[0])/shape[1]
Fill in the grid:
In [ ]:
for c in cities[1:]:
    ilat = int((float(c[5]) - la_range[0]) / dla)
    ilong = int((float(c[6]) - lo_range[0]) / dlo)
    if ilat == shape[0]:
        ilat -= 1
    if ilong == shape[1]:
        ilong -= 1
    grid[ilat][ilong].append(c)
for i in range(shape[0]):
    for j in range(shape[1]):
        print("%3d" % len(grid[i][j]), end=" ")
    print()
Largest distance
• Get the largest distance between nonempty boxes (at most 28*27/2 = 378 box pairs)
• Pair the cities from the two boxes
• In general, this is most probably the largest distance between two cities (stop here)
• To be really sure, one should also pair the cities in box pairs whose distance is less than the distance found
In [ ]:
boxcoords = [[grid[j][k],(j+.5)*dla+la_range[0],(k+.5)*dlo+lo_range[0]] \
             for k in range(shape[1]) for j in range(shape[0])]
boxdists = [(b1[0],b2[0],latlongdist(b1[1],b1[2],b2[1],b2[2])) \
            for b1 in boxcoords for b2 in boxcoords if b1[0] and b2[0]]
maxboxdist = max(boxdists, key= lambda x: x[2])
citydists = [(c1[1],c2[1],latlongdist(c1[5],c1[6],c2[5],c2[6])) \
             for c1 in maxboxdist[0] for c2 in maxboxdist[1]]
# compare by distance (third element); a plain max() would compare the city names first
print(max(citydists, key= lambda x: x[2]))
In [ ]:
for c in cities[1:]:
    if c[2] in ['"Or"','"Ortilos"','"Uszka"', '"Szakonyfalu"']:
        ilat = int((float(c[5]) - la_range[0]) / dla)
        ilong = int((float(c[6]) - lo_range[0]) / dlo)
        if ilat == shape[0]:
            ilat -= 1
        if ilong == shape[1]:
            ilong -= 1
        print(c[1],ilat,ilong)
Exercise 6
Find the smallest distance between cities.
The closest cities are either in the same box or in neighboring boxes. Let’s break up this exercise into two smaller steps.
1. Find closest cities that are in the same box
Hint.
Do something similar to what we did before:
```python
citydists = [(c1[1],c2[1],latlongdist(c1[5],c1[6],c2[5],c2[6])) \
             for c1 in maxboxdist[0] for c2 in maxboxdist[1]]
```
Only instead of selecting cities from the boxes maxboxdist[0] and maxboxdist[1], select them from the same box.
In [ ]:
Solution.
```python
citypairs1 = []
for k in range(shape[0]):
    for j in range(shape[1]):
        citypairs1 += [(c1[1],c2[1],latlongdist(c1[5],c1[6],c2[5],c2[6])) \
                       for c1 in grid[k][j] for c2 in grid[k][j]]
min1 = min(citypairs1, key = lambda x: x[2] if x[2]>0 else 1000)
print(min1)
```
2. Find closest cities that are in neighboring boxes in directions:
• west-east
• north-south
• northeast-southwest
Hint.
The eastern neighbor of box grid[j][k] is grid[j][k+1], because the second index grows with longitude.
In [ ]:
Solution.
```python
citypairs2 = []
for k in range(shape[0]):
    for j in range(shape[1]):
        # west-east
        ...
min2 = min(citypairs2, key = lambda x: x[2] if x[2]>0 else 1000)
print(min2)
```
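As an illustration of the same-box plus neighboring-box idea, here is a minimal self-contained sketch on made-up points in the unit square with plain Euclidean distance. It is not the course's solution for the city data: the points, the 5 by 5 grid and the names dist and offsets are assumptions. Unlike the three directions listed in the exercise, it also checks the second diagonal, so that every pair of adjacent boxes is visited exactly once.

```python
import math
import random

random.seed(1)  # made-up points in the unit square, not real city data
points = [(random.random(), random.random()) for _ in range(300)]

n_boxes = 5  # boxes per axis (assumed value)
grid = [[[] for _ in range(n_boxes)] for _ in range(n_boxes)]
for p in points:
    i = min(int(p[0] * n_boxes), n_boxes - 1)
    j = min(int(p[1] * n_boxes), n_boxes - 1)
    grid[i][j].append(p)

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# The box itself plus four "forward" neighbors: together these offsets
# cover every pair of adjacent boxes (including both diagonals) once.
offsets = [(0, 0), (0, 1), (1, -1), (1, 0), (1, 1)]

best = math.inf
for i in range(n_boxes):
    for j in range(n_boxes):
        for di, dj in offsets:
            ni, nj = i + di, j + dj
            if not (0 <= ni < n_boxes and 0 <= nj < n_boxes):
                continue  # the neighbor would lie outside the grid
            for p in grid[i][j]:
                for q in grid[ni][nj]:
                    if p is not q:  # skip pairing a point with itself
                        best = min(best, dist(p, q))

# Brute-force reference over all pairs, for comparison.
brute = min(dist(p, q) for a, p in enumerate(points) for q in points[a + 1:])
print(best, brute)
```

The two results agree here because the closest pair is nearer than one box width, so its two points can only sit in the same box or in adjacent boxes. With very few points per box that assumption can fail, and a wider neighborhood would have to be searched.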
Put the two together:
In [ ]:
print(min([min1, min2], key = lambda x: x[2]))
Homework!
The homework is in a separate notebook uploaded to Moodle. You will have to investigate how the run time of finding the maximum and minimum distance depends on the resolution of the grid. You can reuse the code from the class.
Submission deadline: February 13, 3:30 pm
Dictionaries
Dictionaries are similar to lists, except that each element is a key-value pair. The syntax for dictionaries is {key1 : value1, key2 : value2, …}:
In [ ]:
fruits = {"bananas" : 1,
          "oranges" : 2,
          "apples" : 3,}
print(type(fruits))
print(fruits)
In [ ]:
print("bananas = " + str(fruits["bananas"]))
print("oranges = " + str(fruits["oranges"]))
print("apples = " + str(fruits["apples"]))
In [ ]:
# change value
fruits["bananas"] = "no bananas"
fruits["oranges"] = 100
# add a new entry
fruits["pineapples"] = "D"
print("bananas = " + str(fruits["bananas"]))
print("oranges = " + str(fruits["oranges"]))
print("apples = " + str(fruits["apples"]))
print("pineapples = " + str(fruits["pineapples"]))
Strings, numbers, and tuples work as keys, and any type can be a value. In general a key must be hashable: the built-in immutable types are, while mutable types such as lists and dictionaries cannot be used as keys. Looking up a key that is not in the dict raises a KeyError. Use in to check whether the key is in the dict, or use dict.get(key), which returns the value, or None if the key is not present (get(key, not_found) lets you specify what value to return in the not-found case).
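The restriction on key types can be seen directly in a small experiment. The dictionary name grid_counts below is made up for this illustration: a tuple is accepted as a key, while a list is rejected with a TypeError.

```python
grid_counts = {}
grid_counts[(2, 3)] = 5          # a tuple is hashable, so it works as a key

error_message = None
try:
    grid_counts[[2, 3]] = 5      # a list is mutable and therefore unhashable
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

print(grid_counts)
```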
In [ ]:
print(fruits['bananas']) ## Simple lookup
If you try to access something that does not exist in the dictionary, you will get an error:
In [ ]:
print(fruits['strawberries'])
To avoid key errors, you can simply check with an if that the key is present in the dictionary:
In [ ]:
if 'bananas' in fruits: print(fruits['bananas']) ## Yes, you can also write an if in this way
if 'strawberries' in fruits: print (fruits['strawberries']) ## and an if-else in this way
else: print("I don\'t know what a strawberry is")
An alternative way to access values in a dictionary is the get method. If the key does not exist, you get None:
In [ ]:
print(fruits.get('bananas'))
print(fruits.get('strawberries'))
You can also define a default different from None:
In [ ]:
print(fruits.get('strawberries',0))
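A common use of get with a default is counting: the default 0 stands in for keys that have not been seen yet, so no separate membership check is needed. The word list below is made up for this sketch.

```python
words = ["apple", "pear", "apple", "plum", "pear", "apple"]
counts = {}
for w in words:
    # get returns 0 the first time a word appears, then its current count
    counts[w] = counts.get(w, 0) + 1
print(counts)
```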
To iterate over key-value pairs of a dictionary:
In [ ]:
for key in fruits:
    print(key, fruits[key])
This iteration is equivalent to iterating over fruits.keys():
In [ ]:
for key in fruits.keys():
    print(key)
In [ ]:
type(fruits.keys())
If you want to show the values instead:
In [ ]:
for value in fruits.values():
    print(value)
If you want to show both at the same time, you can use fruits.items():
In [ ]:
for key, value in fruits.items():
    print(key + " = " + str(value))
Note that the keys are not sorted! Since Python 3.7 a dictionary iterates over its keys in the order in which they were inserted; older versions gave no guarantee about the order. If you want the keys in sorted order, you should sort them first:
In [ ]:
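for key in sorted(fruits.keys()):
    print(key, fruits[key])
In [ ]:
# sorting raises a TypeError if the values mix strings and numbers;
# uncomment the next two lines to make all values numeric
#fruits['bananas']=23
#fruits['pineapples']=45
for value in sorted(fruits.values()):
    print(value)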
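Sorting by value while keeping the keys is possible by sorting the key-value pairs from items() with a key function that picks the value. The dictionary stock below is a made-up example with purely numeric values.

```python
stock = {"bananas": 23, "oranges": 100, "apples": 3, "pineapples": 45}
# item is a (key, value) tuple; item[1] selects the value as the sort criterion
by_value = sorted(stock.items(), key=lambda item: item[1], reverse=True)
for name, amount in by_value:
    print(name, amount)
```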
Dictionary comprehensions
In [ ]:
D = {k:k*k for k in range(3)}
print(D)
In [ ]:
for value in D.values():
    print(value)
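Like list comprehensions, dictionary comprehensions can iterate over any sequence of pairs and can include a condition. The names and ages below are made-up values for this sketch: zip pairs the two lists, and the second comprehension filters an existing dictionary.

```python
names = ["Kate", "Peter", "Adam", "Jenny"]
ages = [31, 25, 42, 19]  # made-up values

# Build a dictionary from two parallel lists.
age_of = {name: age for name, age in zip(names, ages)}

# Keep only the entries that satisfy a condition.
over_30 = {name: age for name, age in age_of.items() if age > 30}
print(age_of)
print(over_30)
```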
Exercise 7
Build a dictionary from the cities list. The key should be the accentless city name (column 2) and the value should be the population.
Hint 1.
You can start from an empty dictionary and use a for loop to iterate through cities to populate the dictionary, or you can use a dictionary comprehension.
Hint 2.
The first element of cities is a header, exclude it by slicing as cities[1:].
In [ ]:
Solution.
```python
cdict = {c[2]:int(c[3]) for c in cities[1:]}
#test it
print(cdict['Szeged'])
```
Exercise 8
Build a dictionary containing dictionaries from the cities list. The key should be the accentless city name (column 2) and the value should be a dictionary with the population under ‘population’ and the size in km$^2$ under ‘area’.
In [ ]:
Solution.
```python
cdict2 = {c[2]:{"population":int(c[3]), "area":float(c[4])} for c in cities[1:]}
# test it
print(cdict2['Szeged'])
```