1. Introduction
In this assignment you will help a restaurant with its capacity and scheduling decisions. You need to write a concise report that includes the choices you made and the activities that followed from those choices. The report should not be just a collection of notes; it needs a proper flow, with an introduction, conclusions, graphs, etc. The final source code needs to be properly formatted and included in the appendix. You can upload the report to Nestor.
The assessment criteria for the assignment are the following:
● Solid reasoning explaining modelling choices
● Appropriate use of the techniques and Python code
● Use of Gurobi and Elasticsearch
● Quality of the report
In the final assignment you will show that you master the various programming skills that were practiced during the course. You will work with CSV or JSON files, data structures, Elasticsearch, Gurobi, etc. Note that many roads lead to Rome: the techniques and libraries used during the practicals are sufficient to answer the various questions in this assignment, but you are also allowed to use additional libraries.
2. Case description
The restaurant you need to advise delivers meals from its address to home addresses in the city of Groningen. The outcome of the previous Dapom assignment was that bundling orders increases efficiency. The restaurant (located at an address with zip code 9714AB) has now come up with a good idea. Lunch orders (sandwiches) are usually placed a day in advance, and because they are not kept hot, the delivery moment is not very time sensitive. Furthermore, they are small and light. Therefore, the restaurant proposes to distribute lunch orders with cargo bikes, which can load dozens of lunches simultaneously. The company has made a representative data set available with the destinations (zip codes) of 114 lunch orders.
Optimizing the delivery of that many orders is a really hard Operations Research problem. At this stage, however, the company is only interested in an approximate solution. The assignment consists of three steps:
• Step 1: separate the orders into batches. The company knows from experience that addresses whose zip codes share the same first 4 digits are close together. Therefore, they propose you use this criterion for creating the batches. So all orders with zip codes starting with 9712 are in one batch, orders with zip codes starting with 9714 are in another batch, etc.
• Step 2: create an optimal route for orders within each batch, so from the restaurant (9714AB) to
all destinations, and then back to the restaurant. This is known as the Traveling Salesman Problem (TSP). You can find a reference Gurobi implementation here: https://gurobi.github.io/modeling-examples/traveling_salesman/tsp.html
and here:
https://www.gurobi.com/documentation/9.0/examples/tsp_py.html
Use these as a template for your own code, and adapt them to your restaurant problem.
• Step 3: visualize the routes with Folium.
3. Data
You have three files available:
● groningenPostcodeDistances.csv: the travelling time (in seconds) and distance (in meters) from each postcode in Groningen to all other postcodes in Groningen. Normally you would calculate the distance and predicted traveling time between locations on the basis of actual addresses. Still, using postcodes is considered to be accurate enough to determine realistic time and distance between locations, because each postcode on average consists of only 15 houses. Note that this still is a large file with almost 30.000.000 distances. It was generated with the Open Source Routing Machine (http://project-osrm.org/) using a bicycle profile.
● orders.csv: this contains the destination zip codes for the 114 lunch orders.
Note that, for reasons of privacy, this data is randomly generated, fictitious and not based on real data in any way.
● groningenAddresses.txt. This file contains all (well, most) addresses in Groningen, including postcode, latitude, and longitude.
4. Data Analysis questions
To start, look at the structure of the data files and determine the meaning of each column. You will see that the orders.csv file is very small. Therefore, you can read it directly in Python instead of indexing it in Elasticsearch.
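As a minimal sketch, reading the order file with Python's built-in csv module could look like this (the column name 'postcode' is an assumption; check the actual header of orders.csv first and adjust):

```python
import csv

def read_orders(path):
    """Read the destination zip codes from the orders CSV file.

    Assumes a header row with a column named 'postcode'; rename this
    to match whatever the actual file uses.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [row["postcode"].strip() for row in reader]
```

Using DictReader means the code keeps working if extra columns appear or the column order changes.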
Second, import the groningenPostcodeDistances.csv and groningenAddresses.txt files into Elasticsearch. Before you start the import, specify the kinds of fields that are in the data file, to make sure that the distance and time fields are indexed as float (for an example of how this was done for the taxi file, see section 5.5 of the B-practical manual). Then, use the elasticsearch_loader program you also used in the practical.
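As a sketch, creating the index with an explicit mapping before the import could look like the following (this assumes Elasticsearch 7.x and that the CSV columns are named distance and time; adjust the field names to the actual header):

```shell
curl -X PUT "localhost:9200/distances" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "properties": {
      "distance": { "type": "float" },
      "time": { "type": "float" }
    }
  }
}'
```

If you skip this step, Elasticsearch may guess the field types from the first documents it sees, which can leave numeric fields indexed as text.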
The groningenAddresses.txt file contains one JSON record per line. You can import it in the following way (adapt the file name such that it specifies the location of the file you downloaded from Nestor):
elasticsearch_loader --index groningen --encoding "iso-8859-1" json --lines "c:\path_to_the_file\groningenAddresses.txt"
The groningenPostcodeDistances.csv is a CSV file, which you need to tell the program, for example:
elasticsearch_loader --index distances --bulk-size 5000 --encoding "iso-8859-1" csv "c:\path_to_the_file\groningenPostcodeDistances.csv"
Adapt the path to the location where you stored the file, and execute the command for both files (make sure you give both files their own index, so you don’t accidentally add the distances file to the index with addresses). The complete command needs to be on one line in your terminal.
If you get an encoding error, try removing the complete encoding option (--encoding "iso-8859-1").
The file with postcodes is quite big, so importing it will take some time. On my reasonably fast laptop it took approximately 30 minutes. If it takes long and you are not sure whether the import is actually running, go to:
http://localhost:9200/insert-your-index-name-here/_count
After a minute or so, press F5. The count should increase. The total count for the postcode file should be around 29.000.000.
If you have imported all data successfully, do each step successively.
Step 1. Here you need to iterate over the destination postcodes in the orders file to make batches. One way is to create a dictionary named order_batches with the 4-digit zip code prefixes as keys and the 6-character zip codes as values, which printed would look like:
{'9725': [['9725JV'], ['9725EC'], ['9725KT']], '9715': [['9715GP'], ['9715LS'], ['9715BG'], ['9715PN'], ..etc..
Later, you can iterate this dictionary as:
for key, orders in order_batches.items():
    for order in orders:
        print(key, order[0])
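Building such a dictionary can be sketched as follows (a minimal example that groups on the first 4 characters of each zip code and produces the same structure as printed above):

```python
from collections import defaultdict

def make_batches(zip_codes):
    """Group 6-character zip codes by their 4-digit prefix.

    Returns a dict like {'9725': [['9725JV'], ['9725EC']], ...},
    matching the order_batches structure shown above.
    """
    batches = defaultdict(list)
    for code in zip_codes:
        batches[code[:4]].append([code])
    return dict(batches)

order_batches = make_batches(["9725JV", "9725EC", "9715GP"])
```

Wrapping each zip code in its own one-element list is only done here to match the printed example, which is why the iteration snippet above accesses order[0].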
Step 2. This is the complex and most important part of the assignment. You need to create the optimal route for each order batch (i.e., the optimal route between all destinations of which the zip code starts with the same 4 numbers). For example, the batch of destinations of which the zip code starts with 9725 consists of 9725JV, 9725EC, and 9725KT. In this step, you must use Gurobi to calculate the optimal route with which you will visit each of these destinations and the restaurant itself (which has the zip code 9714AB).
The TSP algorithm in the Gurobi example needs a distance matrix. This is a matrix that contains the distance from each place to visit to all the other places. The code in the Gurobi TSP template reads cities and their GPS coordinates. These GPS coordinates are used to calculate the Euclidean distance between each pair of cities. In our case, we have much better data, because the time specified in the groningenPostcodeDistances.csv file is the actual bicycling time needed, rather than the Euclidean distance. So in this step, you need to adapt the TSP example code in such a way that the distance matrix contains the mutual traveling times between all postcodes. You must extract these times from the Elasticsearch index.
Note that we talk about a ‘distance’ matrix, but in this matrix we do not need to necessarily put the distance in meters. It can also be the distance in time.
One thing you should be aware of is that the Gurobi algorithm expects the distance between a and b to be the same as the distance between b and a. Because our postcodeDistances file contains the actual bicycling times, it might very well be that a certain route takes longer than the reverse, due to one-way streets. Make sure that the distances in your distance matrix are the same from a to b and from b to a: just pick one of the two distances between a and b and use it for both directions (the differences are very small anyway).
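The symmetrization step can be sketched as below. How you fetch each one-way time from Elasticsearch depends on your index and field names, so the query is abstracted here into a time_lookup(a, b) function (an assumption for illustration, not part of the provided data):

```python
from itertools import combinations

def build_time_matrix(postcodes, time_lookup):
    """Build a symmetric travel-time 'distance' matrix for one batch.

    `postcodes` is the list of zip codes in the batch (restaurant
    included); `time_lookup(a, b)` should return the one-way bicycling
    time in seconds, e.g. via a query on your Elasticsearch distances
    index. The time from a to b is reused for b to a, as suggested above.
    """
    dist = {}
    for a, b in combinations(postcodes, 2):
        t = time_lookup(a, b)   # pick one of the two directions...
        dist[a, b] = t          # ...and use it both ways, so the
        dist[b, a] = t          # matrix is symmetric for Gurobi
    return dist
```

The resulting dict, keyed on (origin, destination) pairs, has the same shape as the distance dictionary used in the Gurobi TSP template, so it can be plugged into that code.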
Step 3.
Once you have the routes, you need to draw each of them in Folium, just as is done in the Gurobi example. You can get the GPS coordinates of the postcodes in the route from the Elasticsearch index you created based on the groningenAddresses.txt file. You might need to handle some cases of missing data. An additional requirement for this step is that each route needs its own color; otherwise the routes cannot be distinguished from each other.
Bonus question
5 points will be divided over the students who manage to add markers for each destination to the Folium map, such that clicking on a marker shows the zip code of that destination in a popup window. The maximum bonus given per person is 1 point. As the maximum grade for a repair is a 6 (which is the faculty rule), the bonus allows you to still get a 7.