COMP431 — INTERNET SERVICES & PROTOCOLS
Don Smith, Fall 2019
Programming Assignment 1
Assigned: August 21
Due: 3:00 PM, September 4
Introduction — Mini-steps Towards the Construction of Web Services
This assignment is a simple string-parsing and file-I/O problem whose solution we will build on in later assignments to build a simple web server and proxy server that will work with standard browsers such as Chrome, Firefox and Safari. At a high level, a web server is simply a program that receives commands from clients (most likely web browsers), processes the commands, and sends the results of the processing back to the client as a response to the command. In this abstract view, a web server is a program that executes a logically infinite loop wherein it receives and processes commands. In this first assignment you will develop a portion of the code that will be used by the web server to validate and process commands it receives. Specifically, you are to write a program to determine if a command (a text string) received by a server is a valid HTTP GET command, and if so, to read the requested file from disk.
The HTTP GET Command
An HTTP GET command is simply a line of input that looks like the following:
GET /home/public/hw1/index.html HTTP/1.0
The GET command (more commonly referred to as a “request”) is made up of three substrings:
• a “method” — the word GET (in upper case characters),
• a “request URL” — a set of words separated by slashes (“/”) (taken together, the words and slashes are
interpreted as a file path name), and
• an “HTTP version identifier” — a string of the form “HTTP/x.y” where x and y are digits.
All of these components must appear in the order listed above and in a single character string terminated by a carriage-return-line-feed combination. Components may be separated in this string by any amount of “whitespace” (spaces and tabs).
The HTTP GET request is part of the larger HTTP protocol. Protocols such as HTTP are typically specified more formally than the English description above by using a more precise specification notation. (These notations are, in essence, a textual form of the syntax diagrams — sometimes called “railroad diagrams” — that are used to specify the formal syntax of a programming language.)
For example, the formal definition of the HTTP GET “request line” is given by the following grammar:¹
Request-Line = Method WhSp Request-URL WhSp HTTP-Version *Space CRLF
Method = “GET”
Request-URL = Absolute-Path
HTTP-Version = “HTTP” “/” +DIGIT “.” +DIGIT
Absolute-Path = “/” *FileNameChar
FileNameChar = ALPHA | DIGIT | “.” | “_” | “/”
ALPHA = UPALPHA | LOALPHA
DIGIT = <any US-ASCII digit “0”..“9”>
UPALPHA = <any US-ASCII uppercase letter “A”..“Z”>
LOALPHA = <any US-ASCII lowercase letter “a”..“z”>
WhSp = +Space
Space = (SP | HT)
CRLF = <US-ASCII CR, carriage return (13)> <US-ASCII LF, line feed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal tab (9)>
¹ As an aside, this form of notation is a variation of a commonly used notation called Backus-Naur Form (BNF). You will often see the syntax of protocols expressed using BNF and variations on BNF.
In this notation:
• Items appearing on the left-hand side of an expression are called tokens,
• Tokens written in all uppercase characters are called terminals and represent a single character in the
string; tokens with lowercase characters are called non-terminals and represent substrings,
• Anything in quotes is interpreted as a literal string or character that must appear exactly as written,
• Text in angle brackets (“<>”) is interpreted as an English textual description of required text,
• Parentheses are a grouping operator,
• The vertical bar “|” is interpreted as an “or” operator and indicates a choice between components that are
mutually exclusive,
• The plus “+” denotes that one or more of the item following the plus sign must appear in sequence, and
• The asterisk “*” denotes that any number of the item following the asterisk (including 0) may appear in
sequence.
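As an informal illustration of the notation (not part of the assignment), the rules can be transcribed almost mechanically into Python regular expressions. The sketch below assumes the grammar as given above; the names REQUEST_LINE and is_valid_request are hypothetical. Note that the grammar's prefix operators “+” and “*” become suffix operators in a regular expression.

```python
import re

# Hypothetical transcription of the grammar rules into regex fragments.
DIGIT = r"[0-9]"
FILE_NAME_CHAR = r"[A-Za-z0-9._/]"   # ALPHA | DIGIT | "." | "_" | "/"
SPACE = r"[ \t]"                     # (SP | HT)

# "+X" in the grammar (one or more X) becomes "X+" in a regex;
# "*X" (zero or more X) becomes "X*".
HTTP_VERSION = rf"HTTP/{DIGIT}+\.{DIGIT}+"
ABSOLUTE_PATH = rf"/{FILE_NAME_CHAR}*"
WHSP = rf"{SPACE}+"

# Request-Line = Method WhSp Request-URL WhSp HTTP-Version *Space CRLF
REQUEST_LINE = re.compile(
    rf"GET{WHSP}{ABSOLUTE_PATH}{WHSP}{HTTP_VERSION}{SPACE}*\r\n"
)

def is_valid_request(line):
    """Return True if the entire string matches Request-Line."""
    return REQUEST_LINE.fullmatch(line) is not None

print(is_valid_request("GET /home/public/hw1/index.html HTTP/1.0\r\n"))
```

A single pattern like this can only answer yes or no. It does not say which token was at fault, so it is a reading aid for the grammar rather than a complete solution.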
For example, the GET request above conforms to the formal description and hence is a valid HTTP GET request (assuming it is terminated with a carriage-return-line-feed, the line-termination sequence that HTTP requires). The following strings do not conform to the formal description and would be rejected as invalid or illegal requests.
get /jasleen/public_html/Courses/Spring07 HTTP/1.0
GET /jasleen/public html/Courses/Spring07 HTTP/1.0
GET /jasleen/public_html/Courses/Spring07 HTTP /1.0
The first request contains an invalid Method token (the “get” is not in upper case); the second and third requests contain an invalid HTTP-Version token. In the case of the second request, note that for most parses of the request string, the substring “/jasleen/public” would be returned as a valid Absolute-Path and hence the next token searched for would be the HTTP-Version token. That is, having found the Absolute-Path “/jasleen/public,” a parser would next try to interpret the substring “html/Courses/Spring07” as the HTTP-Version token. Thus, although the error in the request was white space appearing in the file name, the error manifests itself in the parse as an invalid HTTP-Version token. In the case of the third request, the space following the HTTP string is not allowed in the HTTP-Version token.
The Assignment — A Parser for HTTP GET
For this assignment you are to write a Java or Python program on Linux to read in lines of characters from standard input (stdin) and determine which lines, if any, are legal HTTP GET requests. For each legal line, the program will read from the disk the file requested and print it on standard output.
For each line of input your program should:
• Echo the line of input (i.e., print the line of input to stdout).

• For valid requests, list on subsequent lines the components of the request, followed by the contents of the requested file (more details provided below).
• For invalid requests, print out an error message indicating which token is missing or ill-formed (more details provided below).
For example, if the four sample requests from before were read by your program, the output would be:
GET /Admin/Schedules/index.html HTTP/1.0
Method = GET
Request-URL = /Admin/Schedules/index.html
HTTP-Version = HTTP/1.0
get /jasleen/public_html/Courses/Spring07 HTTP/1.0
ERROR — Invalid Method token.
GET /jasleen/public html/Courses/Spring07 HTTP/1.0
ERROR — Invalid HTTP-Version token.
GET /jasleen/public_html/Courses/Spring07 HTTP /1.0
ERROR — Invalid HTTP-Version token.
Erroneous Input Processing
There are four possible errors that you can detect during a parse:
• An invalid method token,
• An invalid path token,
• An invalid HTTP version token, and
• Spurious text appearing between the version token and the end of the line.
Your program should format its output exactly as shown above and for each error encountered, should print one of the following error messages that corresponds to the error:
ERROR — Invalid Method token.
ERROR — Invalid Absolute-Path token.
ERROR — Invalid HTTP-Version token.
ERROR — Spurious token before CRLF.
For the purpose of deciding which token is the first to contain an error, assume tokens are delimited by <SP>, <HT>, <CR>, or <LF>. You should check for syntax errors and emit error messages as appropriate. All acknowledgement and error messages should be formatted exactly as shown above. When an error is encountered, skip all input until the next closest <CR>, <LF>, or <CR><LF> (remember that both <CR> and <LF> produce a newline character).
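The token-by-token checking described in this section might be sketched in Python as follows. This is one possible reading, not a reference solution: it assumes the line terminator has already been removed, that tokens within a line are separated only by spaces and tabs, and that a line beginning with whitespace has an invalid Method token. The function name first_error and its return values are hypothetical; it returns the name of the first offending token (or None for a valid request) and leaves printing the matching error message to the caller.

```python
import re

ABSOLUTE_PATH = re.compile(r"/[A-Za-z0-9._/]*")
HTTP_VERSION = re.compile(r"HTTP/[0-9]+\.[0-9]+")

def first_error(text):
    """Return the name of the first bad token in a request, or None.

    `text` is one request line with its line terminator already removed.
    """
    tokens = re.findall(r"[^ \t]+", text)   # runs of non-whitespace
    # Assumption: an empty line, or one that starts with whitespace,
    # has no valid Method token.
    if not tokens or text[0] in " \t" or tokens[0] != "GET":
        return "Method"
    # A missing token is reported the same way as an ill-formed one.
    if len(tokens) < 2 or not ABSOLUTE_PATH.fullmatch(tokens[1]):
        return "Absolute-Path"
    if len(tokens) < 3 or not HTTP_VERSION.fullmatch(tokens[2]):
        return "HTTP-Version"
    if len(tokens) > 3:
        return "Spurious"
    return None
```

Under this reading, a request whose file name contains a space is reported as an HTTP-Version error, because the text after the space becomes the third token.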
Valid Input Processing
If the request is parsed without syntax errors, the program will do the following. If the filename ends in any of the strings “.txt”, “.htm”, or “.html” then your program should attempt to open the specified file, read successive lines from the file, and output these lines to standard output (stdout). (For this assignment you may assume that any file having a file name ending with any of the above extensions contains only ASCII text lines with the normal line termination character sequences.) The test for the file extension should be case insensitive and hence any uppercase or upper/lower case variant of the above file extensions is acceptable. For example, for the GET request:
GET /foo/bar.html HTTP/1.0
your program would open the file foo/bar.html and write the contents of the file to standard output.

The file name represented by the Request-URL is to be interpreted relative to the current directory in which your program is executing. That is, for the Request-URL “/foo/bar.html,” your program should attempt to open the file foo/bar.html in the current working directory (note that the initial “/” should be deleted from the file name – if present it refers to the root directory of the file system, not the current working directory). If the file name does not end in one of the extension strings listed above (or has no extension), the following error message should be output to standard output:
501 Not Implemented: <Request-URL>
where “<Request-URL>” is the Request-URL from the GET request. If the Request-URL references a file that does not exist, the following error message should be output to standard output:
404 Not Found: <Request-URL>
For all other errors encountered in reading the file, simply output to standard output the string:
ERROR: <error-message>
where “<error-message>” is the error message string provided in the programming language used.
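The file-handling rules of this section could be sketched in Python roughly as below. The function name respond is hypothetical, and the exact spacing of the three messages (a single space after the colon, followed by a newline) is an assumption that should be checked against the provided sample output files.

```python
SUPPORTED_EXTENSIONS = (".txt", ".htm", ".html")

def respond(request_url):
    """Return the text to print for a syntactically valid request."""
    # Case-insensitive extension test: compare against the lowercased URL.
    if not request_url.lower().endswith(SUPPORTED_EXTENSIONS):
        return "501 Not Implemented: " + request_url + "\n"
    # Drop the leading "/" so the path is resolved relative to the
    # current working directory rather than the file system root.
    relative_path = request_url.lstrip("/")
    try:
        # newline="" leaves the file's own line terminators untouched.
        with open(relative_path, "r", newline="") as f:
            return f.read()
    except FileNotFoundError:
        return "404 Not Found: " + request_url + "\n"
    except OSError as err:
        # Any other I/O failure, e.g. a permission problem.
        return "ERROR: " + str(err) + "\n"
```

Checking the extension before touching the file system means a request for an unsupported file type yields the 501 message whether or not the file exists; that ordering is also an assumption of this sketch.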
All output should be written to standard output (stdout). Your program should format its output exactly as shown above. Your program should terminate when it reaches the end of the input file (for example, when control-D is typed from the keyboard under UNIX). Your program must not output any user prompts, debugging information, status messages, extra white spaces, etc. Your outputs will be graded by comparing with a pre-generated output and any of these “extras” that your program outputs will incur a penalty in the grade.
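Since input lines may end in a carriage return, a line feed, or both, a driver loop needs some way to carve standard input into lines without losing track of which terminator was present. A minimal sketch, assuming the whole input fits in memory; the names LINE and split_lines are hypothetical:

```python
import re

# A line is a run of non-terminator characters followed by CRLF, CR, or LF.
# "\r\n" is listed first so that a CRLF pair is kept together as one unit.
# The second alternative picks up a final fragment with no terminator.
LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+\Z")

def split_lines(data):
    """Split text into lines, keeping each line's terminator attached."""
    return LINE.findall(data)

print(split_lines("GET /a.txt HTTP/1.0\r\nGET /b.txt HTTP/1.0\n"))
```

A driver might obtain the text with sys.stdin.buffer.read().decode("ascii", errors="replace"); reading the raw bytes sidesteps any platform-dependent newline translation, and the read returns once end-of-file is reached.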
The purpose of this assignment is to get up to speed with protocol message parsing and the use of Linux program development tools. Note that, in the abstract, this assignment has nothing to do with networking and is just a simple text parsing problem. This assignment is, however, a useful first step in writing an HTTP protocol parser that must adhere to standards strictly.
Testing
Two virtual machines running Linux are provided for the exclusive use of the students in COMP 431. These machines are named comp431afa19.cs.unc.edu and comp431bfa19.cs.unc.edu. Use your ONYEN login (not CS) to access these machines. Home directories are named by your ONYEN login name and are located in /home/students.
To aid in testing, sample input and output files are provided on the two 431 virtual machines at /home/students/smithfd/TestCases/Assign1. These sample tests are not comprehensive (i.e., you should test your program much more thoroughly than these test files) – and grading will certainly rely on many additional tests. These sample files are provided simply to aid you in initial testing, as well as catching if your program is making basic formatting/syntax mistakes. Some notes on generating your own test input are included at the end of this document.
For this assignment you should name your final source program “HTTPserver” with the language-specific extension (.java or .py). Use the provided test cases to start testing your code on comp431afa19.cs.unc.edu or comp431bfa19.cs.unc.edu using the following steps, illustrated for test case 1 using redirection of stdin and stdout to files (stdin from Input1, stdout to myOutput1).
<program> < Input1 > myOutput1
Where <program> is:
python3 HTTPserver.py, or
java HTTPserver //HTTPserver is the .class file from javac
diff myOutput1 Output1
If your program works correctly, the diff command above will produce no output because the files match.

Grading
For this (and most other programming) assignments you will “turn in” your program for grading by placing it in a special directory on one of the 431 class machines specified below and filling out a Google form that will be provided on the course website. To ensure the TAs can grade your assignments in an efficient and timely fashion, please follow the guidelines below precisely. In particular, the order of these steps is critical. You should perform the steps below in exactly the order listed. WARNING: Failure to follow these steps exactly will result in the TA being unable to read your files. Should this occur, you will receive a grade of “0” for the assignment!
• Log on to comp431afa19.cs.unc.edu (use your onyen login id and password).
• In your Linux home directory on the above server, create the directory structure: comp431/submissions. (That is, create the directory comp431 in your home directory and inside this directory, create the
directory submissions.)
• Do not change any ACL settings for any files in your home directory.
• For each assignment you will create a subdirectory with a name specified in the assignment. You must
also name your program as specified in the assignment and store it in a directory named hw1 (inside your
~/comp431/submissions directory).
• When you have completed your assignment you should put your program in the specified subdirectory
and fill out the Google form linked from the course web page, indicating that the program is ready for
grading.
• Make sure that your program has the correct path by running the command below:
ls -l ~/comp431/submissions/hw1
• Do not change any of your files for this assignment after the submission deadline! The lateness of
assignments will be determined by the Linux timestamps on your program files. If the timestamps on the
files change after the submission deadline, you will be penalized for turning in a late assignment.
• All programs will be tested under Linux. You should be able to develop your programs in whatever development environment you prefer and then upload to Linux. However, it is your responsibility to test and ensure the program works properly in Linux (specifically, on the machine comp431afa19). In particular, if your program performs differently on your PC than it does on the 431 class server (e.g., because of some difference in library versions or compiler version), your grade will be based on your program’s performance on the 431 class server.
• The program should be neatly formatted (i.e., easy to read) and well documented.
• The homework grade will have the following distribution:
o 18% recognize invalid token for Method
o 18% recognize invalid token for Absolute-Path
o 18% recognize invalid HTTP-Version token
o 18% recognize spurious token before CRLF
o 28%: Valid Input Processing
Example input and output files covering each test case are provided on the 431 virtual machines at /home/students/smithfd/TestCases/Assign1.
Creating Additional Test Input for this assignment
Creating test input for your program (more than the sample test files provided on the course web page) is not as simple as typing a line of text into your favorite shell program. The issue is that different user interfaces map key presses on the keyboard to different characters or character sequences. In particular, many (most?) shell interfaces do not map the “Enter” key to the <CR><LF> sequence. You may get <CR> alone, <LF> alone, or some other combination. Further, trying to type something that looks like the character literals (escape sequences) \r\n will not work either.
The most straightforward way to generate test input that has the <CR> and <LF> included is to create a file of test lines and redirect your standard input to the file (see example below). In the file, terminate each line with a pair of bytes that have the appropriate values (<CR> is the value 13 (decimal) or 0D (Hex), and <LF> is the value 10 (decimal) or 0A (Hex)).
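The byte values mentioned above can be checked directly in Python. This small sketch encodes one request line as ASCII and prints its last two bytes; the variable names are illustrative only.

```python
line = "GET /home/public/hw1/index.html HTTP/1.0\r\n"
encoded = line.encode("ascii")

terminator = encoded[-2:]     # the last two bytes of the line
print(list(terminator))       # decimal values: [13, 10]
print(terminator.hex())       # hexadecimal: 0d0a
```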
The next question is how to create such a file with these byte values. The easy way is to write a simple Java or Python program that writes your test lines to standard output and redirect the output to a file.
For example, in a Python program you name makeLines.py:
import sys
…………………
sys.stdout.write('GET /home/public/hw1/index.html HTTP/1.0\r\n')
To create the file with this program use:
% python3 makeLines.py > testInputLines
In a Java program you name makeLines.java:
import java.io.*;
…………………
DataOutputStream lineOut = new DataOutputStream(System.out);
lineOut.writeBytes("GET /home/public/hw1/index.html HTTP/1.0\r\n");
To create the file with this program use:
% javac makeLines.java
% java makeLines > testInputLines