Before you start working on the exercise¶
Use Python version 3.7 up to 3.9. Make sure not to use Python 3.10.
It is highly recommended to create a virtual environment for this course. You can find resources on how to create a virtual environment on the ISIS page of the course.
Make sure that no assertions fail or exceptions occur, otherwise points will be subtracted.
Use all the variables given to a function unless explicitly stated otherwise. If you are not using a variable you are doing something wrong.
Read the whole task description before starting with your solution.
After you submit the notebook more tests will be run on your code. The fact that no assertions fail on your computer locally does not guarantee that you completed the exercise correctly.
Please submit only the notebook file with its original name. If you do not submit an ipynb file you will fail the exercise.
Edit only between YOUR CODE HERE and END YOUR CODE.
Verify that no syntax errors are present in the file.
Before uploading your submission, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).
import sys
if (3, 7) <= sys.version_info[:2] <= (3, 9):
    print("Correct Python version")
else:
    print(f"You are using a wrong version of Python: {'.'.join(map(str, sys.version_info[:3]))}")

# This cell is for grading. DO NOT remove it

# Use unittest asserts
import unittest

t = unittest.TestCase()

from pprint import pprint
from typing import Tuple, List

# Helper assert function
def assert_percentage(val):
    t.assertGreaterEqual(val, 0.0, f"Percentage ({val}) cannot be < 0")
    t.assertLessEqual(val, 1.0, f"Percentage ({val}) cannot be > 1")
Before starting the homework sheet we recommend you finish these warm-up tasks. They won’t bring any points but should help you to get familiar with Python code.
Function and types (0 P)¶
Write a function using list comprehension that returns the types of list elements.
The function should be called types_of
The function expects a list as an input argument.
The function should return a list with the types of the given list elements.
Read the testing cell to understand how types_of is supposed to work.
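As a rough, ungraded illustration of the list-comprehension idea (using a made-up list called sample, not the test data below), one possible pattern looks like this:
# Hedged sketch: `sample` is an invented example list, not part of the exercise data
sample = [1, 2.5, "text"]
sample_types = [type(element) for element in sample]
print(sample_types)  # [<class 'int'>, <class 'float'>, <class 'str'>]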
# YOUR CODE HERE
raise NotImplementedError("Replace this line with your code")
# YOUR CODE HERE
# Test types_of function
types = types_of([7, 0.7, "hello", True, (2, "s")])
assert isinstance(types, list)
t.assertEqual(types[0], int)
t.assertEqual(types[1], float)
t.assertEqual(types[2], str)
t.assertEqual(types[3], bool)
t.assertEqual(types[-1], tuple)
Concatenation and enumerate (0 P)¶
Concatenate the strings from the list animals into one string.
Use counting += and string formatting (f-strings).
Use enumerate to get the index i of each element.
The result should look as follows: '|0: mouse |1: rabbit |2: cat |3: dog |'
Note that this is not the most efficient way to concatenate strings in Python, but part of this exercise is to showcase for-loops.
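As a rough illustration of the enumerate-plus-f-string pattern (with a made-up fruits list rather than the animals list used in the task), the loop typically looks like this:
# Hedged sketch with invented data; the exercise uses the `animals` list instead
fruits = ["apple", "pear"]
line = "|"
for i, fruit in enumerate(fruits):
    line += f"{i}: {fruit} |"
print(line)  # |0: apple |1: pear |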
animals = ["mouse", "rabbit", "cat", "dog"]
counting = "|"
for i, animal in enumerate(animals):
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
print(counting)
# Test of the enumeration loop
t.assertEqual(counting, "|0: mouse |1: rabbit |2: cat |3: dog |")
String formatting (0 P)¶
What does the following string formatting result in?
Write the result of the string formatting into the variables result1, result2, result3, and result4.
Example: string0 = "This is a {} string.".format("test")
Example solution: result0 = "This is a test string."
# first string
string1 = "The sky is {}. {} words in front of {} random words create {} random sentence.".format(
    "clear", "Random", "other", 1
)
# second string
a = "irony"
b = "anyone"
c = "room"
string2 = f"The {a} of the situation wasn't lost on {b} in the {c}."
# third string
string3 = f"{7*10} * {9/3} with three digits after the floating point looks like this: {70*3 :.3f}."
# fourth string
string4 = " Hello World. ".strip()
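# Editor's note (optional, not required by the task): printing a string with repr()
# shows quotes and whitespace exactly, which can help when writing the result
# variables below. For example:
print(repr(string4))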
# YOUR CODE HERE
raise NotImplementedError("Replace this line with your code")
# YOUR CODE HERE
# Test the string results
t.assertEqual(string1, result1)
t.assertEqual(string2, result2)
t.assertEqual(string3, result3)
t.assertEqual(string4, result4)
Exercise Sheet 1: Python Basics¶
This first exercise sheet tests the basic functionalities of the Python programming language in the context of a simple prediction task. We consider the problem of predicting the health risk of subjects from personal data and habits. For this task we first use a decision tree.
Make sure that you have downloaded the tree.png file from ISIS. For this exercise sheet, you are required to use only pure Python and not to import any module, including Numpy. Next week we are going to implement the nearest neighbor part of this exercise sheet using Numpy 😉.
Classifying a single instance (15 P)¶
In this sheet we will represent patient info as a tuple.
Implement the function decision that takes as input a tuple containing values for the attributes (smoker, age, diet) and computes the output of the decision tree. It should return either 'less' or 'more'. No other outputs are valid.
def decision(x: Tuple[str, int, str]) -> str:
    """
    This function implements the decision tree represented in the above image.
    As input the function receives a tuple with three values that represent some
    information about a patient.

    Args:
        x (Tuple[str, int, str]): Input tuple containing exactly three values.
            The first element represents whether a patient is a smoker; if the
            patient is a smoker this value will be 'yes'. All other values
            represent that the patient is not a smoker. The second element
            represents the age of a patient in years as an integer. The last
            element represents the diet of a patient. If a patient has a good
            diet this string will be 'good'. All other values represent that
            the patient has a poor diet.

    Returns:
        str: A string that has either the value 'more' or 'less'.
            No other return value is valid.
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
# Test decision function
# Test expected 'more'
x = ("yes", 31, "good")
output = decision(x)
print(f"decision({x}) --> {output}")
t.assertIsInstance(output, str)
t.assertEqual(output, "more")
# Test expected 'less'
x = ("yes", 29, "poor")
output = decision(x)
print(f"decision({x}) --> {output}")
t.assertIsInstance(output, str)
t.assertEqual(output, "less")
# This cell is for grading. DO NOT remove it
Reading a dataset from a text file (10 P)¶
In the previous task we created a method to classify the health risk of patients by manually setting rules that define for which inputs a patient is at more or less risk. In the next exercises we will approach the task differently. Our goal is to create a classification method based on data. In order to achieve this we also need functions that load the existing data into the program so that we can use it. Furthermore, we can apply the loaded data to our decision tree implementation and check what its outputs are.
The file health-test.txt contains several fictitious records of personal data and habits. We split this task into two parts. In the first part, we assume that we have read a line from the file and can now process it. In the second part, we load the file and process each line using the function we have defined for this purpose.
Read the file automatically using the methods introduced during the lecture.
Represent the dataset as a list of tuples. Make sure that the tuples have the same format as in the previous task, e.g. ('yes', 31, 'good').
Make sure that you close the file after you have opened it and read its content. If you use a with statement then you don't have to worry about closing the file.
Make sure not to use an absolute path when opening a file. An absolute path will work on your computer, but when your code is tested on the department's computers it will fail. Use relative paths when opening files.
Values read from files are always strings.
Each line contains a newline \n character at the end.
If you are using Windows as your operating system, refrain from opening any text files using Notepad. It will remove any line breaks \n. You should inspect the files using the Jupyter text editor or any other modern text editor.
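As a rough sketch of the general file-reading pattern (a relative path plus a with statement), shown here with a throwaway placeholder file named example_lines.txt rather than the real health-test.txt:
# Hedged sketch: example_lines.txt is a temporary placeholder file, not an exercise file
with open("example_lines.txt", "w") as f:      # create a tiny placeholder file first
    f.write("yes,23,good\n")
with open("example_lines.txt") as f:           # relative path; `with` closes the file for us
    for raw_line in f:
        fields = raw_line.strip().split(",")   # drop the trailing '\n' and split on commas
        print(fields)                          # ['yes', '23', 'good'] -- all values are strings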
def parse_line_test(line: str) -> Tuple[str, int, str]:
    """
    Takes a line from the file, including a newline, and parses it into a patient tuple.

    Args:
        line (str): A line from the `health-test.txt` file

    Returns:
        tuple: A tuple representing a patient
    """
    assert (
        line[-1] == "\n"
    ), "Did you change the contents of the line before calling this function?"
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
x = "yes,23,good\n"
parsed_line = parse_line_test(x)
smoker, age, diet = parsed_line
print(parsed_line)
t.assertIsInstance(parsed_line, tuple)
t.assertEqual(len(parsed_line), 3)
t.assertIsInstance(age, int)
t.assertNotIn("\n", diet, "Are you handling line breaks correctly?")
t.assertEqual(parsed_line[-1], "good")
# This cell is for grading. DO NOT remove it
def gettest() -> List[Tuple[str, int, str]]:
    """
    Opens the `health-test.txt` file and parses it into a list of patient tuples.
    You are encouraged to use the `parse_line_test` function but it is not
    necessary to do so. This function assumes that the `health-test.txt` file is
    located in the same directory as this notebook.

    Returns:
        List[Tuple[str, int, str]]: A list of patient tuples as read from the file.
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
testset = gettest()
pprint(testset)
t.assertIsInstance(testset, list)
t.assertEqual(len(testset), 8)
t.assertIsInstance(testset[0], tuple)
# This cell is for grading. DO NOT remove it
Applying the decision tree to the dataset (15 P)¶
Apply the decision tree to all points in the dataset and return the ratio of them that are classified as 'more'.
A ratio is a value in [0, 1]. So if, out of 50 data points, 15 return 'more', the value that should be returned is 0.3.
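A tiny, generic sketch of turning a count into a ratio, with made-up numbers that are unrelated to the health data:
# Hedged sketch with invented numbers
matching = 15            # e.g. points classified as 'more'
total = 50               # total number of points
print(matching / total)  # 0.3, a float between 0 and 1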
def evaluate_testset(dataset: List[Tuple[str, int, str]]) -> float:
    """
    Calculates the ratio of datapoints for which the decision function
    evaluates to 'more' for a given dataset.

    Args:
        dataset (List[Tuple[str, int, str]]): A list of patient tuples

    Returns:
        float: The ratio of data points which are evaluated to 'more'
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
ratio = evaluate_testset(gettest())
print(f"ratio --> {ratio}")
t.assertIsInstance(ratio, float)
assert_percentage(ratio)
t.assertTrue(0.3 < ratio < 0.4)
Learning from examples (10 P)¶
Suppose that instead of relying on a fixed decision tree, we would like to use a data-driven approach where data points are classified based on a set of training observations manually labeled by experts. Such a labeled dataset is available in the file health-train.txt. The first three columns have the same meaning as for health-test.txt, and the last column corresponds to the labels.
Read the health-train.txt file and convert it into a list of pairs. The first element of each pair is a triplet of attributes (the patient tuple), and the second element is the label.
Similarly to the previous exercise, we split the task into two parts. The first involves processing each line individually. The second handles opening the file and processing all of its lines.
Note: A triplet is a tuple that contains exactly three values; a pair is a tuple that contains exactly two values.
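As a rough sketch of this nested structure (a pair whose first element is a triplet), using a single hard-coded example string rather than the real file:
# Hedged sketch: `example` is an invented sample line, not read from health-train.txt
example = "no,50,good,less\n"
parts = example.strip().split(",")             # ['no', '50', 'good', 'less']
pair = ((parts[0], int(parts[1]), parts[2]), parts[3])
print(pair)                                    # (('no', 50, 'good'), 'less')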
def parse_line_train(line: str) -> Tuple[Tuple[str, int, str], str]:
    """
    This function works similarly to the `parse_line_test` function.
    It parses a line of the `health-train.txt` file into a tuple that
    contains a patient tuple and a label.

    Args:
        line (str): A line from the `health-train.txt` file

    Returns:
        Tuple[Tuple[str, int, str], str]: A tuple that contains a patient
            tuple and a label as a string
    """
    assert line[-1] == "\n"
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
x = "yes,67,poor,more\n"
parsed_line = parse_line_train(x)
print(parsed_line)
t.assertIsInstance(parsed_line, tuple)
t.assertEqual(len(parsed_line), 2)
data, label = parsed_line
t.assertIsInstance(data, tuple)
t.assertEqual(len(data), 3)
t.assertEqual(data[1], 67)
t.assertIsInstance(label, str)
t.assertNotIn("\n", label, "Are you handling line breaks correctly?")
t.assertEqual(label, "more")
# This cell is for grading. DO NOT remove it
def gettrain() -> List[Tuple[Tuple[str, int, str], str]]:
    """
    Opens the `health-train.txt` file and parses it into a list of patient
    tuples accompanied by their respective label.

    Returns:
        List[Tuple[Tuple[str, int, str], str]]: A list of tuples comprised
            of a patient tuple and a label
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
trainset = gettrain()
pprint(trainset)
t.assertIsInstance(trainset, list)
t.assertEqual(len(trainset), 16)
first_datapoint = trainset[0]
t.assertIsInstance(first_datapoint, tuple)
t.assertIsInstance(first_datapoint[0], tuple)
t.assertIsInstance(first_datapoint[1], str)
# This cell is for grading. DO NOT remove it
Nearest neighbor classifier (25 P)¶
We consider the nearest neighbor algorithm that classifies test points following the label of the nearest neighbor in the training data. You can read more about Nearest neighbor classifiers here. For this, we need to define a distance function between data points. We define it to be
distance(a, b) = (a[0] != b[0]) + ((a[1] - b[1]) / 50.0) ** 2 + (a[2] != b[2])
where a and b are two tuples representing two patients.
Implement the distance function.
Implement the function that retrieves for a test point the nearest neighbor in the training set, and classifies the test point accordingly (i.e. returns the label of the nearest data point).
Hint: You can use the special infinity floating point value with float('inf')
Keep in mind that bools in Python are also ints. True is the same as 1 and False is the same as 0
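A brief, self-contained illustration of these two hints (plain Python facts, not a reference implementation of the exercise):
# Hedged sketch: booleans behave like the integers 0 and 1 in arithmetic
print(True + True)         # 2
print(("a" != "b") + 0.5)  # 1.5, because the inequality evaluates to True == 1
print(float("inf") > 1e9)  # True; float('inf') is larger than any finite float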
def distance(a: Tuple[str, int, str], b: Tuple[str, int, str]) -> float:
    """
    Calculates the distance between two data points (patient tuples).

    Args:
        a, b (Tuple[str, int, str]): Two patient tuples for which we want to calculate the distance

    Returns:
        float: The distance between a and b according to the above formula
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
# Test distance
x1 = ("yes", 34, "poor")
x2 = ("yes", 51, "good")
dist = distance(x1, x2)
print(f"distance({x1}, {x2}) --> {dist}")
expected_dist = 1.1156
t.assertAlmostEqual(dist, expected_dist)
# This cell is for grading. DO NOT remove it
def neighbor(
    x: Tuple[str, int, str],
    trainset: List[Tuple[Tuple[str, int, str], str]],
) -> str:
    """
    Returns the label of the nearest data point in trainset to x.

    If x is `('no', 30, 'good')` and the nearest data point in trainset
    is `('no', 31, 'good')` with label `'less'`, then `'less'` will be returned.
    In case two elements have the exact same distance, the element that occurs
    first in the dataset is picked (the element with the smallest index).

    Args:
        x (Tuple[str, int, str]): The data point for which we want
            to find the nearest neighbor
        trainset (List[Tuple[Tuple[str, int, str], str]]):
            A list of tuples with patient tuples and a label

    Returns:
        str: The label of the nearest data point in the trainset.
            Can only be 'more' or 'less'
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
# Test neighbor
x = ("yes", 31, "good")
prediction = neighbor(x, gettrain())
print(f"prediction --> {prediction}")
expected = "more"
t.assertEqual(prediction, expected)
# This cell is for grading. DO NOT remove it
In this part we want to compare the decision tree we have implemented with the nearest neighbor method. Apply both the decision tree and the nearest neighbor classifier to the test set, and return the list of data point(s) for which the two classifiers disagree, together with the ratio of test points for which this happens.
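A generic sketch of comparing two classifiers over a list of points (clf_a, clf_b and points are invented placeholders, not the exercise's decision and neighbor functions):
# Hedged sketch with placeholder classifiers and made-up numeric points
def clf_a(p):
    return "more" if p > 0 else "less"

def clf_b(p):
    return "more" if p > 1 else "less"

points = [-2, 0.5, 3]
disagreements = [p for p in points if clf_a(p) != clf_b(p)]
print(disagreements, len(disagreements) / len(points))  # [0.5] 0.3333333333333333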
def compare_classifiers(
    trainset: List[Tuple[Tuple[str, int, str], str]],
    testset: List[Tuple[str, int, str]],
) -> Tuple[List[Tuple[str, int, str]], float]:
    """
    This function compares the two classification methods (decision tree,
    nearest neighbor) by finding all the datapoints for which the methods
    disagree. It returns a list of the test datapoints for which the two
    methods do not return the same label, as well as the ratio of those
    datapoints compared to the whole test set.

    Args:
        trainset (List[Tuple[Tuple[str, int, str], str]]):
            The training set used by the nearest neighbor classifier.
        testset (List[Tuple[str, int, str]]): Contains the elements
            which will be used to compare the decision tree and nearest
            neighbor classification methods.

    Returns:
        Tuple[List[Tuple[str, int, str]], float]: A list containing all the
            data points which yield different results for the two
            classification methods, and the ratio of datapoints for which
            the two methods disagree.
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # YOUR CODE HERE
    return disagree, percentage
# Test compare_classifiers
disagree, ratio = compare_classifiers(gettrain(), gettest())
print(f"ratio = {ratio}")
t.assertIsInstance(disagree, list)
t.assertIsInstance(ratio, float)
t.assertIsInstance(disagree[0], tuple)
t.assertEqual(len(disagree[0]), 3)
assert_percentage(ratio)
t.assertTrue(0.1 < ratio < 0.2)
One problem of simple nearest neighbors is that one needs to compare the point to predict to all data points in the training set. This can be slow for datasets of thousands of points or more. Alternatively, some classifiers train a model first, and then use it to classify the data.
Nearest mean classifier (25 P)¶
We consider one such trainable model, which operates in two steps:
Compute the average point for each class
Classify new points to be of the class whose average point is nearest to the point to predict.
For this classifier, we convert the attributes smoker and diet to real values (for smoker: 1.0 if 'yes' otherwise 0.0, and for diet: 0.0 if 'good' otherwise 1.0), and use the modified distance function:
distance(a, b) = (a[0] - b[0]) ** 2 + ((a[1] - b[1]) / 50.0) ** 2 + (a[2] - b[2]) ** 2
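A minimal sketch of the "average point per class" step on made-up numeric triplets (not the graded implementation; the real classifier must first convert smoker and diet to the real values described above):
# Hedged sketch: invented points already converted to (smoker, age, diet) real values
points_of_one_class = [(1.0, 30, 1.0), (0.0, 50, 1.0)]
n = len(points_of_one_class)
mean_point = tuple(sum(p[k] for p in points_of_one_class) / n for k in range(3))
print(mean_point)  # (0.5, 40.0, 1.0)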