Overview
In this project, you are going to build a simple webserver that implements a subset of the HTTP/1.1 protocol specification called TritonHTTP, defined here.
Project details
Basic web server functionality
At a high level, a web server listens for connections on a socket (bound to a specific address and port on a host machine). Clients connect to this socket and use the TritonHTTP protocol to retrieve files from the server. For this project, your server will need to be able to serve out HTML files as well as images in jpg and png formats. You do not need to support server-side dynamic pages, Node.js, server-side CGI, etc.
Mapping relative URLs to absolute file paths
Clients make requests to files using a Uniform Resource Locator, such as /images/crypto/enigma.jpg. One of the key things to keep in mind in building your web server is that the server must translate that relative URL into an absolute filename on the local filesystem. For example, you might decide to keep all the files for your server in ~aturing/cse101/server/www-files/, which we call the document root. When your server gets a request for the above-mentioned enigma.jpg file, it will prepend the document root to the specified file to get an absolute file name of ~aturing/cse101/server/www-files/images/crypto/enigma.jpg.
You need to ensure that malformed or malicious URLs cannot “escape” your document root to access other files. For example, if a client submits the URL /images/../../../.ssh/id_dsa, they should not be able to download the ~aturing/.ssh/id_dsa file. If a client uses one or more .. directories in such a way that the server would “escape” the document root, you should return a 404 Not Found error back to the client. Take a look at the realpath() function (os.path.realpath() in Python) for help in dealing with document roots.
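The mapping and the escape check described above can be sketched as follows. This is only one possible approach, not the required implementation: the function name resolve_path is made up for illustration, and the sketch assumes a POSIX-style filesystem. A return value of None would correspond to sending a 404 Not Found.

```python
import os

def resolve_path(doc_root, url_path):
    """Return the absolute file path for url_path, or None if it escapes doc_root.

    resolve_path is a hypothetical helper name, not part of the starter code.
    """
    # Canonicalize the root: makes it absolute and resolves symlinks and "..".
    root = os.path.realpath(doc_root)
    # Strip leading slashes so os.path.join() does not discard the root.
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    # commonpath() compares whole path components, so a sibling directory
    # such as "www-files-evil" is not mistaken for being inside "www-files".
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

Comparing canonical paths, rather than scanning the URL for "..", also covers symbolic links that point outside the document root.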
Program structure
At a high level, your program will be structured as follows.
Initialize
We will provide you with starter code that handles command-line arguments, and will call into your Python code with a port and the document root. Note that the document root and port number will be parameters that are passed into your program. Do not hard code file paths or ports, as we will be testing your code against our own document root. Also do not assume that the files to serve out are in the same directory as the web server. We will call your program with either an absolute or relative path to the document root that may or may not end in a final forward slash: e.g., “/var/home/htdocs” and/or “/var/home/htdocs/”, or “../../htdocs/”.
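Because the document root may arrive in any of these forms, it can help to normalize it once at startup. A minimal sketch, assuming a hypothetical helper named normalize_doc_root (the starter code may already do something equivalent):

```python
import os

def normalize_doc_root(doc_root):
    """Return doc_root as an absolute path with no trailing slash.

    normalize_doc_root is a hypothetical name used only for illustration.
    os.path.realpath() resolves relative paths against the current working
    directory, collapses "..", follows symlinks, and drops a trailing "/".
    """
    return os.path.realpath(doc_root)
```

With this, “/var/home/htdocs” and “/var/home/htdocs/” normalize to the same string, and a relative path such as “../../htdocs/” becomes absolute.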
Setup server socket and threading
Create a TCP server socket, and arrange for a thread to be spawned (or retrieved from a thread pool) when a new connection comes in. Using multiple processes via “fork” is OK too.
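One way to arrange this with the standard socket and threading modules is sketched below. The function names make_server_socket and accept_loop, and the handler(conn, addr) signature, are assumptions made for illustration; the starter code may structure this differently.

```python
import socket
import threading

def make_server_socket(port, host=""):
    """Create a TCP socket bound to (host, port) and start listening."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts on the same port during development.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    return server

def accept_loop(server, handler):
    """Accept connections forever, giving each one its own thread.

    handler is any callable taking (conn, addr); it is responsible for
    reading requests, sending responses, and closing conn.
    """
    while True:
        conn, addr = server.accept()
        # daemon=True lets the process exit even if clients are still connected.
        threading.Thread(target=handler, args=(conn, addr), daemon=True).start()
```

Spawning one thread per connection is the simplest option; a thread pool bounds the number of threads at the cost of a little more bookkeeping.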
Executable
Your server binary should be called httpd.py and should take two arguments. The first should be the port number, and the second should be the doc-root (given as either an absolute or relative path, with or without the trailing ‘/’):
$ python3 httpd.py [port] [doc_root]
for example:
$ python3 httpd.py 8080 /var/www/html
Implementation
You should use Python 3 to build your web server.
You must program the network directly with socket calls. You cannot use third-party web server or HTTP libraries.
Grading
Basic functionality for 200 status code responses (50 pts)
- This category represents error-free, valid requests that result in a 200 status code. Your server should correctly handle valid GET requests for HTML, JPEG, and PNG files.
- The response headers should be set correctly
- The response body should match the content
- You should support directories and subdirectories
- “http://server:port/” should be mapped to “http://server:port/index.html”
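Two of the items above, the index.html mapping and per-file-type headers, can be sketched as small helpers. The names map_index, content_type_for, and CONTENT_TYPES are hypothetical, and the MIME type table is an assumption; the TritonHTTP spec is the authority on which headers and values are required.

```python
import os

# Assumed extension-to-MIME-type table; check the TritonHTTP spec for the
# exact header names and values it expects.
CONTENT_TYPES = {
    ".html": "text/html",
    ".jpg": "image/jpeg",
    ".png": "image/png",
}

def map_index(url_path):
    """Map the root URL "/" to "/index.html"; leave other URLs unchanged."""
    return "/index.html" if url_path == "/" else url_path

def content_type_for(file_path):
    """Return the MIME type for a file, or None for an unknown extension."""
    ext = os.path.splitext(file_path)[1].lower()
    return CONTENT_TYPES.get(ext)
```

This sketch only rewrites the root URL, as in the example above; whether URLs for subdirectories should also map to an index.html is a question for the spec.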
Basic functionality for non-200 error code responses (40 pts):
- Handles 404 for files that aren’t found
- Handles 404 for URLs that escape the doc root
- Correctly handles malformed HTTP requests by issuing a 400 error
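As an illustration of detecting a malformed request, the sketch below validates only the first line of a request. It assumes a request line of the form "GET <url> HTTP/1.1" with single spaces between the three parts; the function name parse_request_line is hypothetical, and the TritonHTTP spec defines what actually counts as malformed (including header rules not covered here). A return value of None would correspond to a 400 response.

```python
def parse_request_line(line):
    """Return the URL from a request line, or None if the line is malformed.

    Assumes the line has already had its trailing CRLF removed.
    """
    parts = line.split(" ")
    if len(parts) != 3:
        return None
    method, url, version = parts
    # Assumption: only GET and HTTP/1.1 are accepted, and URLs start with "/".
    if method != "GET" or not url.startswith("/") or version != "HTTP/1.1":
        return None
    return url
```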
Concurrency (10 pts):
- Your server should be able to handle concurrent clients using threads
Autograder
Gradescope will run an autograder with its own htdocs directory filled with HTML, JPG, and PNG files (and subdirectories). Gradescope will only provide you with a very basic sanity check that your code runs against a simple test. It is your responsibility to ensure that your code precisely follows the TritonHTTP spec. The final autograder will include test cases not included in the version provided to you before the deadline.
Starter code (New)
To get a copy of the starter code, please use this invitation.
Submitting your work
Log into gradescope.com and upload your code. This assignment is to be done individually, or in a group of 2. If you choose to be in a group of 2, your group must be the same as in HW 3.
Due date/time
Friday Oct 26, 5pm
Points
This assignment is worth 10 points
Assigned TA
Bhargav Heeraguppe Sridharan