Your task for this part is to write a program that retrieves a file from a webserver via HTTP. Your program should use sockets to send HTTP requests and receive HTTP responses, and must be written in Python, Java, C, or C++. Python is recommended, as more support will be available for it in this course.
Description
Base Functionality (25 marks)
Your program should take an HTTP URL as a command line input, leading to a file on a webserver. This URL could just be a domain name (e.g. http://www.my.server.com) or include the resource location (e.g. http://www.my.server.com/about or http://www.my.server.com/file.json). The exact method of program invocation is described later.
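As a rough illustration of how a URL might be broken into the parts needed for a request, the sketch below uses urllib.parse, which the Library Restrictions section permits. The function name split_url and the shape of its return value are hypothetical choices for this example, not part of the specification.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (scheme, host, port, path) for an absolute URL.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname
    # Port 80 is the default for plain HTTP when the URL gives no port.
    port = parts.port or 80
    # A bare domain name has an empty path, so request "/" in that case.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return scheme, host, port, path
```

For a URL that is only a domain name, the path falls back to "/", which is what the request line needs.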
Your program should open a TCP connection to the webserver and print the following information to stdout:
URL Requested: [url]
Client: [client-ip-addr] [client-port-num]
Server: [server-ip-addr] [server-port-num]
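One possible way to open the connection and produce the Client and Server lines above is sketched here. The helper names open_connection and describe_connection, and the 10 second timeout, are assumptions made for this example.

```python
import socket


def open_connection(host, port=80, timeout=10.0):
    """Open a TCP connection to the webserver (timeout is an assumed value)."""
    return socket.create_connection((host, port), timeout=timeout)


def describe_connection(sock):
    """Return the Client and Server lines for a connected socket."""
    # getsockname() is the local (client) end, getpeername() the remote end.
    # Slicing to [:2] also copes with the longer tuples used for IPv6.
    client_ip, client_port = sock.getsockname()[:2]
    server_ip, server_port = sock.getpeername()[:2]
    return [
        "Client: {} {}".format(client_ip, client_port),
        "Server: {} {}".format(server_ip, server_port),
    ]
```

A caller might print "URL Requested: [url]" first and then print each line returned by describe_connection(open_connection(host, port)).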
You do not need to handle invalid domains. Upon successful connection, your program should make an HTTP request to the server. After receiving the associated response, the following information should then be printed:
Retrieval Successful
Date Accessed: [dd/mm/yyyy] [hh:mm:ss] AEST
Last Modified: [dd/mm/yyyy] [hh:mm:ss] AEST
Both fields should use the values given in the HTTP response, converted from UTC to AEST where required. The last modified field may not always be given; if this is the case, the text “Last Modified not available” should replace that line.
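The conversion and the fallback text can be sketched as follows. This assumes the response uses the common HTTP date format (for example "Mon, 04 Mar 2019 00:37:33 GMT"), that AEST is a fixed UTC+10 offset, and that the headers have already been parsed into a dictionary with lower-case names. The names to_aest, date_lines and headers are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AEST is treated as a fixed UTC+10 offset (no daylight saving).
AEST = timezone(timedelta(hours=10), "AEST")
# Assumption: dates arrive in the common HTTP format, e.g.
# "Mon, 04 Mar 2019 00:37:33 GMT"; older formats are not handled here.
HTTP_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def to_aest(http_date):
    """Convert an HTTP date string (UTC) to 'dd/mm/yyyy hh:mm:ss AEST'."""
    utc_time = datetime.strptime(http_date.strip(), HTTP_DATE_FORMAT)
    utc_time = utc_time.replace(tzinfo=timezone.utc)
    return utc_time.astimezone(AEST).strftime("%d/%m/%Y %H:%M:%S") + " AEST"


def date_lines(headers):
    """Build the two date lines from a dict of lower-cased header names."""
    lines = ["Date Accessed: " + to_aest(headers["date"])]
    last_modified = headers.get("last-modified")
    if last_modified:
        lines.append("Last Modified: " + to_aest(last_modified))
    else:
        lines.append("Last Modified not available")
    return lines
```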
The contents of the file retrieved from the webserver should be written to a file named “output.[extension]”. [extension] should be replaced with an appropriate extension based on the MIME type of the retrieved file. You only have to support the MIME types given in the table below. You can read more about MIME types at https://tools.ietf.org/html/rfc6838. Note that the extension of the URL is not always indicative of a file’s MIME type; you will need to retrieve this from the HTTP response.
Supported MIME Types
MIME Type                                    File Extension
text/plain                                   .txt
text/html                                    .html
text/css                                     .css
text/javascript or application/javascript    .js
application/json                             .json
application/octet-stream                     No extension
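The table above maps naturally onto a lookup, as in the sketch below. It assumes the MIME type is read from the Content-Type header, which may carry extra parameters such as "; charset=UTF-8". The names EXTENSIONS, output_filename and save_body are hypothetical, and unsupported types simply raise KeyError here since the specification does not require handling them.

```python
import os

# Mapping taken from the Supported MIME Types table.
EXTENSIONS = {
    "text/plain": ".txt",
    "text/html": ".html",
    "text/css": ".css",
    "text/javascript": ".js",
    "application/javascript": ".js",
    "application/json": ".json",
    "application/octet-stream": "",  # no extension
}


def output_filename(content_type):
    """Return 'output' plus the extension for a Content-Type header value."""
    # Drop parameters such as '; charset=UTF-8' and normalise the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return "output" + EXTENSIONS[mime]


def save_body(content_type, body, directory="."):
    """Write the response body (bytes) to the output file and return its path."""
    path = os.path.join(directory, output_filename(content_type))
    # Binary mode keeps the bytes exactly as received from the server.
    with open(path, "wb") as handle:
        handle.write(body)
    return path
```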
HINT: Because TCP is a stream-based protocol, a long HTTP response may arrive over multiple packets, so a single read from the socket may return only part of it. Your program should make sure it has received the entire file before terminating.
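A minimal sketch of one way to act on this hint is given below: keep reading until the end of the headers is seen, then use Content-Length if present, or otherwise read until the server closes the connection. The names build_request and read_response are hypothetical, the request asks for "Connection: close" as a simplifying assumption, and chunked transfer encoding is not handled.

```python
def build_request(host, path="/"):
    """Build a simple GET request (assumes 'Connection: close' is acceptable)."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def read_response(sock, bufsize=4096):
    """Read one full response; return (status_line, headers, body)."""
    data = b""
    # Keep reading until the blank line that ends the headers has arrived.
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:  # server closed the connection early
            break
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    head_lines = head.decode("iso-8859-1").split("\r\n")
    status_line = head_lines[0]
    headers = {}
    for line in head_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    length = headers.get("content-length")
    if length is not None:
        # Read exactly the number of body bytes the server promised.
        remaining = int(length) - len(body)
        while remaining > 0:
            chunk = sock.recv(min(bufsize, remaining))
            if not chunk:
                break
            body += chunk
            remaining -= len(chunk)
    else:
        # No length given: read until the server closes the connection.
        # Note: chunked transfer encoding is NOT handled in this sketch.
        while True:
            chunk = sock.recv(bufsize)
            if not chunk:
                break
            body += chunk
    return status_line, headers, body
```

A caller would typically send the request with sock.sendall(build_request(host, path)) and then call read_response(sock).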
Handling Unsupported URLs (10 marks)
Your program does not need to work for HTTPS URLs. If an HTTPS URL is requested, your program should print the following and terminate:
URL Requested: [url]
HTTPS Not Supported
If the status code of an HTTP response is in the range 400-599, your program should notify the user of this and terminate, as follows:
URL Requested: [url]
Client: [client-ip-addr] [client-port-num]
Server: [server-ip-addr] [server-port-num]
Retrieval Failed ([code])
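The two unsupported cases in this section might be checked along the following lines. The helper names is_https, status_code and failure_line are hypothetical, and the status line is assumed to have the usual "HTTP/1.1 404 Not Found" shape.

```python
from urllib.parse import urlsplit


def is_https(url):
    """True if the URL uses the https scheme, which is not supported."""
    return urlsplit(url).scheme.lower() == "https"


def status_code(status_line):
    """Extract the numeric code from a line like 'HTTP/1.1 404 Not Found'."""
    return int(status_line.split()[1])


def failure_line(code):
    """Return the failure message for 400-599 codes, otherwise None."""
    if 400 <= code <= 599:
        return "Retrieval Failed ({})".format(code)
    return None
```

The HTTPS check needs only the URL, so it can run before any socket is opened; the status check needs the first line of the response.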
Redirection (15 marks)
The final feature your program should support is handling of 301 and 302 status codes. If a resource has been moved, you should repeat the above process until either the resource is found, or an invalid/unsupported URL is given. For example:
URL Requested: [url]
Client: [client-ip-addr] [client-port-num]
Server: [server-ip-addr] [server-port-num]
Resource [temporarily/permanently] moved to [url]
Client: [client-ip-addr] [client-port-num]
Server: [server-ip-addr] [server-port-num]
Retrieval Successful
Date Accessed: [dd/mm/yyyy] [hh:mm:ss] AEST
Last Modified: [dd/mm/yyyy] [hh:mm:ss] AEST
It is possible that you are redirected multiple times, in which case, the “moved to” line and client/server info should be repeated for each redirection. You can complete redirections over a persistent TCP connection or you may choose to create a new socket on each redirection. If you choose to do the former, you will have to handle the server closing the connection (and you should still re-print the client/server information if the connection is not closed).
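The redirection process could be structured as a loop, as in the sketch below, which opens a fresh connection per hop. Here fetch is a hypothetical callable supplied by the caller that connects to a URL and returns (info_lines, code, headers), where info_lines are the Client and Server lines and headers uses lower-case names. The max_hops limit is an assumption added to avoid looping forever; the specification does not mention one.

```python
from urllib.parse import urljoin, urlsplit

MOVED = {301: "permanently", 302: "temporarily"}


def follow(url, fetch, max_hops=10):
    """Return the output lines for a request, following 301/302 redirects.

    fetch(url) is a hypothetical callable returning
    (info_lines, code, headers) for a single request.
    """
    lines = ["URL Requested: " + url]
    for _ in range(max_hops):  # max_hops is an assumed safety limit
        if urlsplit(url).scheme.lower() == "https":
            lines.append("HTTPS Not Supported")
            return lines
        info_lines, code, headers = fetch(url)
        lines.extend(info_lines)
        if code in MOVED:
            # urljoin also copes with a relative Location value.
            url = urljoin(url, headers["location"])
            lines.append("Resource {} moved to {}".format(MOVED[code], url))
            continue
        if 400 <= code <= 599:
            lines.append("Retrieval Failed ({})".format(code))
        else:
            # A full program would also add the date lines and save the file.
            lines.append("Retrieval Successful")
        return lines
    return lines
```

Passing the fetching step in as a function keeps the loop independent of whether a new socket or a persistent connection is used.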
Program Invocation
Your program should be able to be invoked from a UNIX command line as follows. url is the URL of the webpage to request.
Python
python3 assign1.py url
C/C++
make
./assign1 url
Java
make
java Assign1 url
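For the Python invocation, the URL arrives as the single command line argument. A minimal sketch of reading it is shown below; the function name url_from_argv and the usage message printed on a wrong argument count are assumptions, as the specification does not say how a missing argument should be handled.

```python
import sys


def url_from_argv(argv):
    """Return the URL argument from a list shaped like sys.argv."""
    if len(argv) != 2:
        # Assumed behaviour: exit with a usage message on a bad invocation.
        raise SystemExit("usage: python3 assign1.py url")
    return argv[1]
```

In assign1.py this would typically be called as url_from_argv(sys.argv).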
Example Output
Note that for the following examples, the client/server information and dates may not be accurate. The following is an example output for a request to http://uq.edu.au/:
URL Requested: http://uq.edu.au/
Client: 192.168.12.15 54321
Server: 10.187.2.85 80
Retrieval Successful
Date Accessed: 04/03/2019 10:37:33 AEST
Last Modified: 04/03/2019 10:35:01 AEST
In this case, the user should be able to view the contents of this webpage in a file named “output.html”. An example for a request to http://uq.edu.au/missing is below:
URL Requested: http://uq.edu.au/missing
Client: 192.168.12.15 54321
Server: 10.187.2.85 80
Retrieval Failed (404)
An example for a request to http://abc.net.au/ is below:
URL Requested: http://abc.net.au/
Client: 192.168.12.15 54321
Server: 10.187.2.85 80
Resource permanently moved to http://www.abc.net.au/
Client: 192.168.12.15 54321
Server: 10.187.2.85 80
Resource temporarily moved to https://www.abc.net.au/
HTTPS Not Supported
It should be noted that all output uses single-spacing, not double-spacing (as it may appear in some examples).
Library Restrictions
• You should use standard socket libraries to open the TCP connection and communicate between the client and server
• You should NOT use higher level libraries, packages, or programs which retrieve data from HTTP servers, such as the Python requests and urllib libraries.
– The only exception is that you may use text parsing functions which are a part of urllib (i.e. the urllib.parse module).
• If you are unsure about whether you may use a certain library, please ask the course staff on Piazza