Places where code is to be entered are marked in red.
Question 2
Complete the code in the 2nd cell below for function min_distance() to compute the minimum distance between any two points in a list of x,y points. This is a more complex version of the function to find the maximum value in a list (see notebook edX3_to_5). Your code should be efficient, so it only checks unique combinations, i.e. recall from maths that the number of combinations is nCr, where n is the number of objects and r is the number of samples (here r = 2, a pair of points). (4 marks)
Your answer should only add code in place of # YOUR CODE GOES HERE, and not change anything else.
Hint: Use indexing with the Python range function to access values in the lists, i.e.
Hint: Use the basic distance calculation, i.e. dist = math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
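A minimal sketch of what "only unique combinations" means in practice, assuming the same test points used in this question. The helper `unique_index_pairs` is hypothetical (not part of the scaffold): starting the inner `range` at `i + 1` visits each pair once, so the number of pairs matches nCr with r = 2.

```python
import math

# Hypothetical helper (not part of the assignment scaffold): list every unique
# index pair (i, j) with i < j. Starting the inner range at i + 1 skips
# self-pairs (i, i) and mirrored duplicates (j, i).
def unique_index_pairs(n):
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((i, j))
    return pairs

xs = [1, 7, 2, 10, 3, 4, 8, 4]
ys = [1, 2, 4, 9, 16, 0, 12, 8]

pairs = unique_index_pairs(len(xs))
# nCr with r = 2: both numbers are 28 for 8 points (math.comb needs Python 3.8+)
print(len(pairs), math.comb(len(xs), 2))

# The hint's distance formula applied to the first pair, points 0 and 1
i, j = pairs[0]
dist = math.sqrt((xs[j] - xs[i])**2 + (ys[j] - ys[i])**2)
print(i, j, dist)  # 0 1 and sqrt(37), roughly 6.08
```

Checking every ordered pair would take n*n = 64 distance calculations for these 8 points; the unique pairs need only 28.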
Code
# Test set of points; assume xs,ys lists are the same length
import matplotlib.pyplot as plt
xs = [1, 7, 2, 10, 3, 4, 8, 4]
ys = [1, 2, 4, 9, 16, 0, 12, 8]
plt.scatter(xs, ys)
# find the minimum distance between a set of x,y pairs
import math
# Compute the minimum distance between points given as lists of
# x,y ordinates. Return the indices of the closest pair and the minimum distance
def min_distance(xs,ys):
    # assume inputs xs,ys are lists of same length representing
    # x,y point ordinates where points are distinct
    # start with a high number as lowest
    min_dist = 99999
    min_index1 = 0
    min_index2 = 0
    # iterate x,y ordinates to find minimum distance
    # YOUR CODE GOES HERE
    return(min_index1, min_index2, min_dist)
index1,index2,d = min_distance(xs,ys)
print("Closest ordinates {},{} with distance: {}".format(index1,index2,d))
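One possible way to complete the search, written as a separate, hypothetical function `min_distance_sketch` so the scaffold stays unchanged. It assumes the same test points and swaps the starting value 99999 for `math.inf`, which is guaranteed to be larger than any real distance.

```python
import math

# Hypothetical standalone sketch of the "# YOUR CODE GOES HERE" step
def min_distance_sketch(xs, ys):
    min_dist = math.inf            # any real distance is smaller than infinity
    min_index1, min_index2 = 0, 0
    n = len(xs)
    for i in range(n):
        # j starts at i + 1, so each of the nC2 unique pairs is checked once
        for j in range(i + 1, n):
            dist = math.sqrt((xs[j] - xs[i])**2 + (ys[j] - ys[i])**2)
            if dist < min_dist:    # strict <: the first closest pair found is kept
                min_dist = dist
                min_index1, min_index2 = i, j
    return (min_index1, min_index2, min_dist)

xs = [1, 7, 2, 10, 3, 4, 8, 4]
ys = [1, 2, 4, 9, 16, 0, 12, 8]
print(min_distance_sketch(xs, ys))
```

For the test points this reports indices 0 and 2 with distance sqrt(10), about 3.162. Points 0 and 5 are equally close; the strict `<` keeps the first pair encountered.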
Question 3
You are given data from a hypothetical train survey of passenger addresses (a suburb) and travel times (hours) for: (i) home to station, (ii) rail journey, and (iii) station to work.
The aim is to summarise the mean travel time by suburb. However, there are issues with inconsistent suburb names and missing travel time values. Your task is to ‘clean’ the data and compute a best-estimate summary with Pandas.
Code
# read the data from csv file
import pandas as pd
survey_data = pd.read_csv("https://uqmaps.maps.arcgis.com/sharing/rest/content/items/2ce2a8c9b88b4ee4a6c3ae9d1a11f30f/data")
survey_data
Question 3.1 Compute a preliminary summary
Use Pandas groupby on ‘suburb’ and compute the aggregate mean of tt_home, tt_rail and tt_work.
Code
# Use pandas groupby and agg to get mean for tt_home,tt_rail,tt_work
grouped_survey_data = survey_data.groupby('suburb')
# YOUR CODE GOES HERE, i.e. replace None with code to compute means
summary_survey_data = None
summary_survey_data
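A minimal sketch of the groupby-then-aggregate pattern, run on a small hypothetical DataFrame that only borrows the survey's column names (the suburb labels and times are invented for illustration; the real values come from the CSV).

```python
import pandas as pd

# Hypothetical toy data using the survey's column names
survey_data = pd.DataFrame({
    "suburb":  ["Suburb A", "Suburb A", "Suburb B"],
    "tt_home": [0.25, 0.5, 0.2],
    "tt_rail": [0.4, 0.6, 0.5],
    "tt_work": [0.1, 0.3, 0.2],
})

grouped_survey_data = survey_data.groupby("suburb")
# Select the three travel-time columns, then aggregate each with its mean
summary_survey_data = grouped_survey_data[["tt_home", "tt_rail", "tt_work"]].agg("mean")
print(summary_survey_data)
```

Passing a dict such as `{"tt_home": "mean", "tt_rail": "mean", "tt_work": "mean"}` to `agg` is an equivalent spelling. Pandas skips NaN values when computing the mean, so this preliminary summary averages only the travel times that were actually reported.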
Question 3.2 Suburb names vary and could be ‘cleaned’ to use one name
Clean the suburb names by:
Capitalising names
Substituting abbreviated terms with their full words (you are provided with data for common abbreviations)
Code
# common abbreviations: each key is an abbreviated term and each value is its substitute
survey_suburbs_abbrevations = {'Nth': 'North', 'Est': 'East', 'Wst': 'West', 'Sth': 'South'}
# YOUR CODE GOES HERE, i.e. essentially fix the suburbs as capitalised names
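A minimal sketch of the two cleaning steps, assuming the variants differ only in case, stray spaces and the listed abbreviations (the messy names below are hypothetical). Capitalising first matters: it turns "nth" and "NTH" into "Nth", so the dictionary keys then match.

```python
import pandas as pd

survey_suburbs_abbrevations = {'Nth': 'North', 'Est': 'East', 'Wst': 'West', 'Sth': 'South'}

# Hypothetical messy names; the real variants are in the survey CSV
survey_data = pd.DataFrame({"suburb": ["nth ridge", "North Ridge", " NTH RIDGE", "sth bank"]})

# 1. Trim stray spaces and capitalise each word ("nth ridge" -> "Nth Ridge")
suburbs = survey_data["suburb"].str.strip().str.title()

# 2. Swap each abbreviation for its full word; \b limits matches to whole
#    words, so an "Est" at the start of, say, "Estate" is left alone
for abbrev, full in survey_suburbs_abbrevations.items():
    suburbs = suburbs.str.replace(rf"\b{abbrev}\b", full, regex=True)

survey_data["suburb"] = suburbs
print(survey_data["suburb"].unique())  # ['North Ridge' 'South Bank']
```

If the real data also contains abbreviations with a trailing full stop (for example "Nth."), the pattern would need extending; that case is an assumption not covered here.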
Question 3.3 Fix missing values for travel times, i.e. null values in the DataFrame (2.5 marks)
Missing values occur when survey respondents skip entering data. Strategies for dealing with them include (in increasing order of correctness) removing rows with null values, filling in values based on similar associated data, or substituting an imputed value. See imputation (statistics) (https://en.wikipedia.org/wiki/Imputation_(statistics) ).
Code
# YOUR CODE GOES HERE, i.e. essentially fix missing data
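A minimal sketch contrasting the simplest strategy (dropping rows) with mean imputation within each suburb, on hypothetical data. Filling a gap with the mean of the same suburb is one reasonable choice, assumed here rather than prescribed; if a suburb has no values at all for a column, the sketch falls back to that column's overall mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (NaN) in the travel-time columns
survey_data = pd.DataFrame({
    "suburb":  ["Suburb A", "Suburb A", "Suburb A", "Suburb B", "Suburb B"],
    "tt_home": [0.2, np.nan, 0.4, 0.5, 0.3],
    "tt_rail": [0.5, 0.7, np.nan, np.nan, np.nan],
    "tt_work": [0.1, 0.2, 0.3, 0.2, np.nan],
})
tt_cols = ["tt_home", "tt_rail", "tt_work"]

# Strategy 1: drop any row with a missing travel time (keeps only 1 of 5 rows here)
dropped = survey_data.dropna(subset=tt_cols)

# Mean imputation: per-suburb column means, aligned to the original rows
imputed = survey_data.copy()
suburb_means = imputed.groupby("suburb")[tt_cols].transform("mean")
# Fill from the suburb mean first, then from the overall column mean
imputed[tt_cols] = imputed[tt_cols].fillna(suburb_means).fillna(imputed[tt_cols].mean())
print(imputed)
```

Dropping rows discards the valid values in those rows too, which is why it sits lowest on the correctness scale; imputation keeps every respondent at the cost of estimating the gaps.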
Question 4
This is extra credit beyond the 15 marks awarded for Questions 1-3. It is a very hard question and the marks do not reflect the level of difficulty; in other words, only attempt it if you like an algorithm coding challenge. (3 marks)
Using the data for Question 2, write a function in standard Python (i.e. without importing other packages) to compute the maximum-sized orthogonal rectangle that fits within a point set and its extent. This must be your own worked solution and not obtained from the web or from others!
Code
# function fit_rectangle: input a set of points and
# return x,y for the lower left corner, and width, height of the
# maximally contained orthogonal rectangle
def fit_rectangle(xs,ys):
    # return example result
    return (4,2,4,14)
xs = [1, 7, 2, 10, 3, 4, 8, 4]
ys = [1, 2, 4, 9, 16, 0, 12, 8]
x1,y1,w,h = fit_rectangle(xs,ys)
Example
# Example plot of a possible solution
import matplotlib.pyplot as plt
import matplotlib.patches as draw
fig, ax = plt.subplots()
ax.scatter(xs, ys)
ax.set_title("Example plot of a possible maximally contained rectangle")
rect = draw.Rectangle((x1,y1), w, h, fc='none', ec='r')
ax.add_patch(rect)
plt.show()
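A brute-force sketch, one possible approach rather than the expected answer, in standard Python with no imports. It assumes "fits within" means no point lies strictly inside the rectangle (points may sit on its edges) and the rectangle stays inside the bounding box of the points. A largest such rectangle can be grown until every side touches a point or the extent, and the extent's edges are themselves point coordinates, so only point coordinates need to be tried as sides. The name `fit_rectangle_sketch` is hypothetical, and the roughly O(n^5) running time is only practical for small point sets.

```python
# Hypothetical brute-force sketch: try every pair of candidate x sides and
# every pair of candidate y sides, keep the largest rectangle with no point
# strictly inside. Returns (x, y, width, height) like fit_rectangle.
def fit_rectangle_sketch(xs, ys):
    cand_x = sorted(set(xs))   # includes min(xs) and max(xs), the extent
    cand_y = sorted(set(ys))   # includes min(ys) and max(ys)
    best = (cand_x[0], cand_y[0], 0, 0)
    best_area = 0
    for a in range(len(cand_x)):
        for b in range(a + 1, len(cand_x)):
            left, right = cand_x[a], cand_x[b]
            # y values of points strictly between the two vertical sides
            inside_x = [y for x, y in zip(xs, ys) if left < x < right]
            for c in range(len(cand_y)):
                for d in range(c + 1, len(cand_y)):
                    bottom, top = cand_y[c], cand_y[d]
                    # empty if none of those points is strictly between bottom and top
                    if all(not (bottom < y < top) for y in inside_x):
                        area = (right - left) * (top - bottom)
                        if area > best_area:
                            best_area = area
                            best = (left, bottom, right - left, top - bottom)
    return best

xs = [1, 7, 2, 10, 3, 4, 8, 4]
ys = [1, 2, 4, 9, 16, 0, 12, 8]
print(fit_rectangle_sketch(xs, ys))
```

Under this interpretation the sketch reports (4, 2, 6, 10), an area of 60, whereas the example result (4, 2, 4, 14) has an area of 56. If the question intends a different meaning of "fits within", the emptiness check is the part to adjust.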
Code