Lancaster University
MSCI 517: Introduction to Python Coursework III (60% of module)
Deadline: 22/3/19, 10AM Lent Term Maximum Marks: 100
1 Coursework Description
This coursework is composed of two parts. You have been given the flexibility to structure the code in the way that you think is best following the principles of re-usability, maintainability, information hiding and clarity.
Part 1 (30 Marks)
You should submit a single .ipynb file containing a working version of all the tasks. The name of this file should be your library number (e.g. 123456789.ipynb, where 123456789 is an example library number). In case you want to import any packages apart from pandas, matplotlib, numpy, math and seaborn, you should contact the module leaders first.
A newly appointed health minister has heard reports that there are some hospitals in England where the average Length of Stay (LoS) of patients is over 1 week, whereas in many others the average LoS is under 1 week. You will need to download the file patients.csv from Moodle. The file contains the following data for patients admitted to one of five NHS hospitals in England:
Patient ID
Admission type String
String
Age
Height Weight Hospital LoS in hours
Integer Integer Integer String Integer
Task 1: (2 marks) Give the code that creates a DataFrame from the information in patients.csv. The DataFrame should be called df.
Task 2: (10 marks) LoS in weeks can be used to provide the following “Status”: LoS in weeks Status
<1
=1
>1 and<3 ≥3
“Short LoS” “Normal LoS” “Long LoS” “Too Long LoS”
Please Turn Over
MSCI 517 Introduction to Python Page 2 of 4
Give the commands that display the following columns: Patient ID, LoS in weeks, Status, and Flag. If Status is “Long LoS” or “Too Long LoS”, then Flag should be set to “True”, else it should be set to “False”.
Task 3: (6 marks) Give the code that calculates the correlation between the Age and Height (using Pearson’s correlation coefficient). Also give the code for producing a scatter-plot of this data.
Task 4: (6 marks) Give the code that creates a histogram of the Height and fit a normal curve over the histogram. Also give the code that calculates the mean, median, standard deviation, range and inter-quartile range of the Height.
Task 5: (6 marks) Give the code that displays only those patients with LoS more than one week. Now give the code that uses this data to calculate the mean Age of patients from each of the five NHS hospitals.
Part 2 (70 Marks)
You should submit a single .py file containing a working version of your program. The name of this file should be your library number (e.g. 123456789.py, where 123456789 is an example library number). In case you want to import any packages apart from pandas, matplotlib, numpy, math and seaborn, you should contact the module leaders first.
In this part, you are to write a program that attempts to solve a machine delivery problem using stochastic optimisation techniques. Specifically your program will read in a problem instance from a file and will attempt to produce a solution using heuristic methods.
It is important to break your code into small, meaningful functions and appropriately comment your code.
Problem description
There are machine requests from n customers that all have to be satisfied. There is one depot location where all the machines are located. Trucks are hired to transport the machines from the depot to the customers. The number of available trucks is ⌊√n⌋, where n is the number of customers. All trucks must be allocated exactly ⌊n/√n⌋ deliveries (locations). The route of a truck must start and end at the depot. We assume that it does not take any time to load a machine at the depot or to unload a machine at a customer.
The objective is to minimise the total distance travelled by the trucks. In order to determine the travelled distances, integer coordinates are provided for the customer locations. The location of the depot is always (0,0). The distance between coordinates (x1, y1) and (x2, y2) is defined as (x1 − x2)2 + (y1 − y2)2, i.e., the Euclidean distance.
The problem instances, which are available in .csv format, contain the customer IDs and the x and y coordinates of the customer locations (all in integers). An example of a problem instance file I1.csv is shown below.
Please Turn Over
MSCI 517 Introduction to Python Page 3 of 4
ID
0 1 2 3
x
2 -2 0 3
y
2 1 -1 1
In this example, we have n = 4 customers. The number of available trucks is ⌊√n⌋ = 2. All trucks must be allocated ⌊n/√n⌋ = 2 deliveries (locations). The locations of the first, second, third and fourth customers are (2, 2), (-2, 1), (0, -1) and (3, 1), respectively.
Using the problem instance given above, a solution which specifies the routes for the trucks can be stored as follows: [ [2, 1], [3, 0] ]. Here the route for the first truck is given as [2, 1], and the route for the second truck is given as [3, 0].
Note that all trucks depart from and must return to the depot, so this is not mentioned explicitly in this solution format. The distance travelled by the first truck is 6.064. This can be computed by calculating the straight line distance between the coordinates of the locations (i.e. distance between depot (0, 0) and customer 2 (0, -1) + distance between customer 2 (0, -1) and customer 1 (-2, 1) + distance between customer 1 (-2, 1) and depot (0, 0)). The distance travelled by the second truck in this example is 7.405. Therefore, the total distance travelled by all trucks is 6.064 + 7.405 = 13.469.
Task 1: (5 marks) On Moodle you will find three problem instances: I1.csv, I2.csv and I3.csv. The first task is to write a Python program that reads the coordinates of the customer locations from a given file.
Task 2: (10 marks) Generate an initial solution to this problem at random. To get the full mark you will need to add some ‘intelligence’ to the procedure to produce better quality initial solution.
Task 3: (10 marks) Implement a TotalDistance function that calculates the total distance of the solution in the way described above.
Task 4: (30 marks) Implement a function that improves the initially generated solution using e.g. hill climbing method which accepts only non-worsening moves. To do this, you will need to implement an appropriate operator, e.g. swap operator. To get the full mark, you should make your method more efficient and powerful. Here are some ideas: When performing a move, you may notice that only few trucks are altered. Thus, you may be able to speed up your program by only re-evaluating the effected trucks. You may also consider advanced techniques such as simulated annealing, tabu search, or great deluge algorithms (you will need to research these yourselves). You might also consider more than one operators (e.g. swap and insert operators). In any case, you program should be run for a fixed number of iterations between 100,000 to 1,000,000.
Task 5: (10 marks) At the end of the run, your best obtained solution should be saved in a .csv file. The file specifies the routes for the trucks, and the last row displays the total distance of the solution. This format is extremely important, as I will be using my own checker software tool to automatically examine your solution and your calculated total distance. The example solution described above would be displayed:
Please Turn Over
MSCI 517 Introduction to Python Page 4 of 4
Truck 1 Truck 2 Truck 3
21
30
13.469
Task 6: (5 marks) Update your code to provide a visual representation of the best solution. Below an example of a visual representation to a solution produced by solving the problem instance I2.csv.
40
30
Truck 4 Tru
ck 5 Truck 6
20
10
0
-40 -30 -20 -10 0 1 0
-10
-20
02
03
04
-30
-40
2 Coursework Submission
The coursework deadline is 10:00AM, Friday 22nd March, 2019. In accordance with University regulations, marks are deducted from any coursework which is not submitted by the deadline. However, if an extension is given then the rule applies from the date of the extension.
3 Plagiarism
According to university rules, instances of plagiarism will be treated very seriously. Penalties are in line with the institutional framework of the University. It is natural that students will sometimes exchange ideas and help each other out in completing coursework of this nature. However, please be mindful that flagrant instances of copying will result in both parties receiving zero.
Table 1: Version Control System Version Changes
Date
1.0 2.0 2.1
-
math package is allowed.
clarify how distance is calculated
23/02/19
27/02/19
28/02/19
Author
Ahmed Kheiri Ahmed Kheiri Ahmed Kheiri