COMP30023 – Computer Systems
Application Layer – HTTP and HTML
Dr Lachlan Andrew
Recap
• History of the internet
• Network Protocol Models (stacks)
• OSI vs TCP/IP
• Acknowledgement:
• These slides are minor modifications of those prepared by Dr Chris Culnane
© 2021 University of Melbourne
Summary
• Top-down approach
– We’ll gradually peel away the layers over the coming weeks
• Application Layer
– HTTP (the web protocol), and in relation to it, HTML
• Wireshark – viewing network protocols in real-time
World Wide Web – A Short History
• Sir Tim Berners-Lee
– 1984 return to CERN (TCP/IP installed)
– Saw many online databases with different access mechanisms (FTP, Gopher, …)
– 1989 wrote the proposal “a large hypertext database with typed links” (No takers)
– by 1990, had designed and built: HTTP, HTML, httpd, WorldWideWeb (browser)
– 1992 left for MIT, after CERN IT Head described it as a misallocation of resources
• Hypertext
– Ted Nelson coined the term in 1963
– Creation and use of linked content
World Wide Web – A Short History
• The vision was that HTTP would be the “glue” between data on different existing protocols
– e.g., FTP (file transfer protocol) – many files available for download
• GOPHER – distributed database developed at U. Minnesota in 1991
– Hierarchical file structure
– More suited for text interfaces
– Lower network overhead
– February 1993, charging for server
• May 1994 first International WWW Conference (at CERN)
• September 1994 W3C formed (DARPA & European Community)
– Standardisation of web technologies
– Royalty free
• Browser wars 1994-1998 (Microsoft vs. Netscape)
• 1999 – 2001 .com boom
• 2002+ Ubiquitous web
• Web 2.0 – semantic web, social media
WWW – Components
• Client – typically browser-based access to pages
• Server – daemon-based content delivery of pages
• URL ≈ Protocol + DNS Name + file name
WWW – Architecture
HTTP – Overview
• HyperText Transfer Protocol
– Defined everything needed for the web
• TCP/IP Model vs OSI Model
– Application layer (except compression/encoding - Presentation)
• Resources are referenced by URLs
URL/URI
• Uniform Resource Locator
– Sir Tim called it the “universal resource locator”
– Defined in original HTTP specification
– An address for a resource
– Can be relative “./nextpage.html” or absolute “http://www.google.com”
• Separate specification by W3C in 1998 for URI – Uniform Resource Identifier
– scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]
– abc://username:password@example.com:123/path/data?key=value#fragid1
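To see how the generic URI syntax maps onto a concrete address, the example URI above can be split into its parts. This is a minimal sketch using Python's standard `urllib.parse`; the dictionary keys are names chosen here for illustration, not part of any specification.

```python
from urllib.parse import urlsplit

# The example URI from the slide.
uri = "abc://username:password@example.com:123/path/data?key=value#fragid1"

parts = urlsplit(uri)

# Collect each component named in the generic syntax.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:9} {value}")
```

Note that the fragment is handled by the client only; it is not normally sent to the server as part of the request.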
HTTP – Protocol Overview
• Overview:
– Client initiates TCP connection (creates socket) to server, port 80
– Server accepts TCP connection from client
– HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
– TCP connection closed
• Connections:
– HTTP 1.0 – single use connection
– HTTP 1.1 – persistent connections, additional headers
– HTTP/2 – 2015 – Further speed improvements (origins in SPDY)
– HTTP/3 (draft; in use) – Allow more parallelism in data loading (QUIC)
Non-persistent HTTP
Persistent vs. Non-persistent
• Non-persistent:
– requires 2 “response times” (one to initiate TCP connection and one for initial HTTP request) per object + file transmission time
– OS overhead for each TCP connection
– browsers often open parallel TCP connections to fetch referenced objects
• Persistent:
– server leaves connection open after sending response
– subsequent HTTP messages between same client/server sent over open connection
– client sends requests as soon as it encounters a referenced object, reducing overall response time
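The difference can be made concrete with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes objects are fetched one after another (no parallel connections, no pipelining), that every object takes the same time to transmit, and that a persistent connection pays the TCP set-up cost only once. The function names and the numbers are hypothetical.

```python
def non_persistent_time(n_objects, rtt, transmit):
    """Each object pays one RTT for TCP set-up, one RTT for the
    request/response, plus its own transmission time."""
    return n_objects * (2 * rtt + transmit)


def persistent_time(n_objects, rtt, transmit):
    """One RTT for TCP set-up, then one RTT plus transmission time
    per object over the same open connection."""
    return rtt + n_objects * (rtt + transmit)


# Hypothetical page: 10 objects, 50 ms round trip, 5 ms to transmit each.
n, rtt, transmit = 10, 50, 5
print(non_persistent_time(n, rtt, transmit))  # 1050 (ms)
print(persistent_time(n, rtt, transmit))      # 600 (ms)
```

Under these assumptions the saving is one round trip per object after the first, which is why persistent connections matter most on high-latency links.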
HTTP Request Connection
• HTTP with (a) multiple connections and sequential requests. (b) A persistent connection and sequential requests.
(c) A persistent connection and pipelined requests.
HTTP – Summary of key steps
• Steps that occur when a link is selected:
– Browser determines the URL
– Browser asks DNS for the IP address of the server (Resolving URL)
– DNS replies
– The browser makes a TCP connection
– Sends HTTP request for the page
– Server sends the page as HTTP response
– Browser fetches other URLs as needed
– The browser displays the page (progressively, as content arrives)
– The TCP connections are released
HTTP – Request Methods
HTTP Method   Safe   Idempotent   Cacheable
GET           Yes    Yes          Yes
HEAD          Yes    Yes          Yes
POST          No     No           Yes/No
PUT           No     Yes          No
DELETE        No     Yes          No
CONNECT       No     No           No
OPTIONS       Yes    Yes          No
TRACE         Yes    Yes          No
PATCH         No     No           No
• Idempotent – multiple identical requests have the same effect as a single request
• Safe – Only for information retrieval, should not change state
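The two properties are easier to see on a toy example. The sketch below models server state as a Python dictionary; `put`, `post`, `get` and the paths are hypothetical stand-ins for request handlers, not a real server API.

```python
# Hypothetical in-memory "server state": resource path -> content.
store = {}
next_id = 1


def put(path, content):
    """PUT: store content at a client-chosen path (idempotent)."""
    store[path] = content


def post(collection, content):
    """POST: create a new resource under a collection (not idempotent)."""
    global next_id
    path = f"{collection}/{next_id}"
    next_id += 1
    store[path] = content
    return path


def get(path):
    """GET: read only, never changes state (safe)."""
    return store.get(path)


put("/notes/a", "hello")
put("/notes/a", "hello")   # repeating the PUT still leaves one resource
post("/notes", "hello")
post("/notes", "hello")    # repeating the POST creates a second resource
print(sorted(store))
```

Repeating the PUT changes nothing after the first call, whereas each POST adds another entry. This is why a client can safely retry a PUT after a timeout but should be careful about retrying a POST.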
Wireshark Example
HTTP Request Example
GET /somedir/page.html HTTP/1.1
Host: www.somesite.com.au
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(extra new line)
• Request line (GET, POST, HEAD): the first line
• Header lines: the lines that follow the request line
• Blank line (2 LF or 2 CR/LF) indicates end of message
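The request format is simple enough to build and take apart by hand. The following is a minimal sketch for a request without a body; `build_request` and `parse_request` are hypothetical helper names, and a real client would normally rely on a library such as `http.client` instead.

```python
def build_request(method, path, headers):
    """Return the raw bytes of an HTTP/1.1 request with no body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CR LF; an empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_request(raw):
    """Split raw request bytes into (method, path, version, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, path, version, headers


raw = build_request("GET", "/somedir/page.html", {
    "Host": "www.somesite.com.au",
    "User-agent": "Mozilla/4.0",
    "Connection": "close",
    "Accept-language": "fr",
})
print(raw.decode("ascii"))
method, path, version, headers = parse_request(raw)
print(method, path, version, headers)
```

Header names are stored in lower case here because HTTP treats them as case-insensitive, so `User-agent` and `User-Agent` refer to the same header.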
HTTP Response Codes
Code   Meaning        Examples
1xx    Information    100 = server agrees to handle client’s request
2xx    Success        200 = request succeeded; 204 = no content present
3xx    Redirection    301 = page moved; 304 = cached page still valid
4xx    Client error   403 = forbidden page; 404 = page not found
5xx    Server error   500 = internal server error; 503 = try again later
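Because the class of a response is carried by the first digit, a client can decide how to react without knowing every individual code. The sketch below uses the standard `http.HTTPStatus` enum for the official reason phrases; the `CLASSES` table and `classify` function are illustrative names introduced here.

```python
from http import HTTPStatus

# Class names as used in the table above, keyed by the first digit.
CLASSES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def classify(code):
    """Map a three-digit status code to its class via the first digit."""
    return CLASSES[code // 100]


for code in (100, 200, 204, 301, 304, 403, 404, 500, 503):
    print(code, HTTPStatus(code).phrase, "->", classify(code))
```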
HTTP – Response
HTTP/1.1 200 OK
Connection: close
Date: Thu, 06 Aug 2009 12:00:15 GMT
Server: Apache/2.2.11 (Unix)
Last-modified: Mon, 22 Jun 2009
Content-Length: 6821
Content-Type: text/html
• Status line (protocol, status code and phrase): the first line
• Header lines: the lines that follow the status line
• Data, e.g., requested HTML file: follows the header lines
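A response can be split into status line, headers and data in the same way as a request. The sketch below parses a shortened, made-up response (the body and its `Content-Length` are chosen here for illustration); `parse_response` is a hypothetical helper, and the standard library's `http.client` does this job in practice.

```python
# A small hand-written response; the body is 20 bytes long.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 20\r\n"
    b"\r\n"
    b"<html>Hello!</html>\n"
)


def parse_response(raw):
    """Split raw response bytes into (version, code, phrase, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length tells the client how many bytes of data to expect.
    length = int(headers.get("content-length", len(body)))
    return version, int(code), phrase, headers, body[:length]


version, code, phrase, headers, body = parse_response(raw)
print(version, code, phrase)
print(headers)
print(body)
```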
HTTP Headers
Header              Type       Description
User-Agent          Request    Information about the browser and its platform
Accept              Request    The type of pages the client can handle
Accept-Charset      Request    The character sets that are acceptable to the client
Accept-Encoding     Request    The compression formats the client can handle
Accept-Language     Request    The natural languages the client can handle
If-Modified-Since   Request    Time and date to check freshness
If-None-Match       Request    Previously sent tags to check freshness
Host                Request    The server’s DNS name
Authorization       Request    A list of the client’s credentials
Referer             Request    The previous URL from which the request came
Cookie              Request    Previously set cookie sent back to the server
Set-Cookie          Response   Cookie for the client to store
Server              Response   Information about the server
HTTP Headers
Header             Type       Description
Content-Encoding   Response   How the content is encoded (e.g., gzip)
Content-Language   Response   The natural language used in the page
Content-Length     Response   The page’s length in bytes
Content-Type       Response   The page’s MIME type
Content-Range      Response   Identifies a portion of the page’s content
Last-Modified      Response   Time and date the page was last changed
Expires            Response   Time and date when the page stops being valid
Location           Response   Tells the client where to send its request
Accept-Ranges      Response   Indicates the server will accept byte range requests
Date               Both       Date and time the message was sent
Range              Both       Identifies a portion of a page
Cache-Control      Both       Directives for how to treat cache
ETag               Both       Tag for the contents of the page
Upgrade            Both       The protocol the sender wants to switch to
Client side processing
• Plugins/Extensions – integrated software modules which execute inside the browser, with direct access to the online context
• Helper – separate program which can be instantiated by the browser, but can only access a local cache of file content
– application/pdf
– application/msword
Server side processing – static page
• 5 step process:
– Accept TCP Connection from client (browser)
– Identify the file requested
– Get the specified file from the local storage (disk, RAM, …)
– Send the file to the client
– Release the TCP connection
Multi-threaded Web Server
• A multithreaded Web server with a front end and processing modules.
Multi-threaded Web Server – dynamic
• A processing module performs a series of steps:
– Resolve name of Web page requested.
– Perform access control on the Web page.
– Check the cache.
– Fetch requested page from disk or run program.
– Determine the rest of the response.
– Return the response to the client.
– Make an entry in the server log.
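The sequence of steps can be sketched as a single function. Everything below is a hypothetical stand-in: `DOCUMENTS` plays the role of the disk, `FORBIDDEN` the access-control list, and `handle` one processing module. The log entry is written just before returning, since a function cannot do work after its return statement.

```python
# Hypothetical stand-ins for a real server's disk, ACL, cache and log.
DOCUMENTS = {
    "/index.html": "<html>home</html>",
    "/private.html": "<html>secret</html>",
}
FORBIDDEN = {"/private.html"}
cache = {}
access_log = []


def handle(url_path):
    """Run one request through the processing steps; return (status, headers, body)."""
    path = "/index.html" if url_path == "/" else url_path  # resolve the name
    if path in FORBIDDEN:                                   # access control
        status, body = 403, ""
    elif path in cache:                                     # check the cache
        status, body = 200, cache[path]
    elif path in DOCUMENTS:                                 # fetch from "disk"
        status, body = 200, DOCUMENTS[path]
        cache[path] = body
    else:
        status, body = 404, ""
    headers = {"Content-Length": str(len(body))}            # rest of the response
    access_log.append((url_path, status))                   # server log entry
    return status, headers, body                            # return the response


print(handle("/"))
print(handle("/private.html"))
print(handle("/missing.html"))
print(access_log)
```

In a multi-threaded server several such modules run concurrently, so shared structures like the cache and the log would need locking, which this sketch omits.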
Web Cache
• Goal: satisfy client request without involving origin server – reduce response time.
Web proxy
• Used for caching, security and IP address sharing
• The browser sends all HTTP requests to the proxy. The proxy returns objects in its cache or else the proxy requests object from origin server, then returns object to client.
• Note: the proxy server acts as both client and server.
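The proxy's behaviour can be sketched in a few lines. This is an illustrative model only: `fetch_from_origin` is a hypothetical stand-in for a real HTTP request, and the sketch ignores expiry and revalidation, which a real cache must handle.

```python
origin_requests = []  # records which URLs actually reached the origin server


def fetch_from_origin(url):
    """Hypothetical stand-in for a real HTTP request to the origin server."""
    origin_requests.append(url)
    return f"<content of {url}>"


class CachingProxy:
    def __init__(self, fetch):
        self.fetch = fetch
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            # Cache miss: the proxy acts as a client towards the origin.
            self.cache[url] = self.fetch(url)
        # Either way, the proxy acts as a server towards the browser.
        return self.cache[url]


proxy = CachingProxy(fetch_from_origin)
proxy.get("http://example.com/a")
proxy.get("http://example.com/a")   # served from the cache
proxy.get("http://example.com/b")
print(origin_requests)
```

Three client requests result in only two requests to the origin, which is the reduction in response time and traffic that the cache is meant to achieve.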
Cookies
• The network stores no state about web sessions
• Cookies can place a small amount (<4 KB) of information on the user’s computer and re-use it deterministically (RFC 2109)
• Cookies have 5 fields
– domain, path, content, expiry, security
• How to keep state – maintain state at sender/receiver over multiple transactions; HTTP messages carry “state”
• Questionable mechanism for tracking users (invisibly perhaps) and learning about user behaviour
– e.g., competitor snooping, undesirable content etc.
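The round trip of a cookie can be sketched with Python's standard `http.cookies` module. The cookie name, value, domain and expiry date below are made up for illustration; the point is that the server sets all the fields once, and the client later returns only the name and content.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header (all values are hypothetical).
jar = SimpleCookie()
jar["session-id"] = "abc123"                                   # content
jar["session-id"]["domain"] = ".example.com"                   # domain
jar["session-id"]["path"] = "/"                                # path
jar["session-id"]["expires"] = "Tue, 01 Jan 2036 00:00:00 GMT"  # expiry
jar["session-id"]["secure"] = True                             # security
set_cookie_header = jar.output()
print(set_cookie_header)

# Client side: later requests carry only name=value pairs.
cookie_header = "Cookie: " + "; ".join(
    f"{morsel.key}={morsel.value}" for morsel in jar.values()
)
print(cookie_header)

# Server side again: parse the returned header to recover the state.
returned = SimpleCookie()
returned.load(cookie_header.split(": ", 1)[1])
print(returned["session-id"].value)
```

The server typically uses the returned value as a key into its own session store, so the state itself lives on the server while the cookie only identifies it.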
Example Cookies
amazon.com.au
Name              Value                     Domain                 Expires       HTTPOnly   Secure
Session-id        356-7554479-6471342       .amazon.com.au         2036-01-01.
Session-id-time   2082787201l               .amazon.com.au         2036-01-01.
ad-id             A3kfU1c7DE3Wqz474A25Zfs   .amazon.adsystem.com   2037-01-01.

nytimes.com
Name              Value                     Domain                 Expires       HTTPOnly   Secure
ad-id             A3kfU1c7DE3Wqz474A25Zfs   .amazon.adsystem.com   2037-01-01.
Static web documents
• HTML - Hypertext Markup Language
– a simple language designed to encode both content and presentational information
– Plain text encoding, with browser-based rendering
– Restricted to ISO-8859 Latin-1 character set (internationalisation not introduced until XHTML with UTF encodings)
• Web Page Components
– Structural divisions:
• Head
– Syntactically Restricted Tag Sets
– Attributes & Values
Beyond HTML
• HTML was originally an instance of SGML – standard generalized markup language
• People wanted an HTML-like language to describe data that is not hypertext – but SGML is too general / “heavy”
• XML (Extensible Markup Language) & XSL (Extensible Stylesheet Language)
– Primary feature: separation of content and presentational markup
– Stringent validation requirements
• XHTML
– Essentially an expression of HTML 4.0 as valid XML
– Major differences to HTML 4.0 are the requirements for conformance, case folding, well-formedness, attribute specification, nesting and embedding, and inclusion of a document type identifier
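The well-formedness requirement is easy to demonstrate with an XML parser. In the sketch below, the first fragment uses an unclosed `<br>`, which HTML 4 permits, while the second closes it in the XML style. The helper name `is_well_formed` and both fragments are illustrative; the check covers well-formedness only, not validation against a document type.

```python
import xml.etree.ElementTree as ET

html_style = "<p>First line<br>second line</p>"     # <br> is never closed
xhtml_style = "<p>First line<br/>second line</p>"   # empty element is closed


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

An HTML browser would render both fragments identically, whereas an XML parser rejects the first outright; this strictness is what the conformance and nesting requirements amount to in practice.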
Dynamic Content
Scripting
Client-side Scripting
• Technologies for producing interactive web applications include:
• JavaScript
• Java Applets – compiled Java code (platform independent)
• ActiveX – compiled code for Windows
• AJAX
– HTML and CSS: present information as pages.
– DOM: change parts of pages while they are viewed.
– XML: let programs exchange data with the server.
– An asynchronous way to send and retrieve XML data.
– JavaScript as a language to bind all this together
And finally…
• Tracking with cookies is well known
• Tracking companies have expanded beyond simple cookies
– Plug-in, browser fingerprinting
• https://coveryourtracks.eff.org/ Project to research tracking techniques in browsers
• How unique is your browser by Peter Eckersley (EFF): https://coveryourtracks.eff.org/static/browser-uniqueness.pdf