COMP3310/6331 – #15
The Web/HTTP
Dr Markus Buchhorn: markus.buchhorn@anu.edu.au
Applications choose their transport
• UDP-based applications:
– Short messages
– Simple request/response transactions
– Light server touch
– ARQ suffices
• TCP-based applications:
– Larger content transfers
– Longer, and more complex, sessions
– Reliability matters
– Packaging and presentation become important
– TCP is a bytestream
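To make the contrast concrete, here is a minimal sketch using Python's standard `socket` module on the loopback interface. The one-shot echo servers and the function names (`udp_server_once`, `tcp_server_once`, `udp_vs_tcp_demo`) are hypothetical, made up for illustration. UDP carries one self-contained datagram each way, while TCP delivers an undivided bytestream, so the receiver needs some way to find the end of a message (here, the sender half-closes the connection).

```python
import socket
import threading


def udp_server_once(sock: socket.socket) -> None:
    # One datagram in, one datagram out: message boundaries are preserved.
    data, addr = sock.recvfrom(2048)
    sock.sendto(data.upper(), addr)


def tcp_server_once(server: socket.socket) -> None:
    # TCP is a bytestream: keep reading until the client half-closes.
    conn, _ = server.accept()
    with conn:
        received = b""
        while chunk := conn.recv(4096):
            received += chunk
        conn.sendall(received.upper())


def udp_vs_tcp_demo() -> tuple[bytes, bytes]:
    # UDP: a short request/response transaction with no connection set-up.
    udp_srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp_srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=udp_server_once, args=(udp_srv,))
    worker.start()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_cli:
        # A real client would retransmit on timeout (simple ARQ); here we just wait.
        udp_cli.settimeout(2.0)
        udp_cli.sendto(b"ping", udp_srv.getsockname())
        udp_reply, _ = udp_cli.recvfrom(2048)
    worker.join()
    udp_srv.close()

    # TCP: connect, write the request in pieces, half-close, read the reply.
    tcp_srv = socket.create_server(("127.0.0.1", 0))
    worker = threading.Thread(target=tcp_server_once, args=(tcp_srv,))
    worker.start()
    with socket.create_connection(tcp_srv.getsockname(), timeout=2.0) as tcp_cli:
        tcp_cli.sendall(b"hello ")
        tcp_cli.sendall(b"world")          # two writes, one undivided stream
        tcp_cli.shutdown(socket.SHUT_WR)   # "no more data from me"
        tcp_reply = b""
        while chunk := tcp_cli.recv(4096):
            tcp_reply += chunk
    worker.join()
    tcp_srv.close()
    return udp_reply, tcp_reply


if __name__ == "__main__":
    print(udp_vs_tcp_demo())  # (b'PING', b'HELLO WORLD')
```

Reading until the peer closes its side is also how an HTTP/1.0 client can find the end of a response that carries no Content-Length.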
The Web
• Back in the old days… even on the Internet
– Everything was local
– Everything was standalone
– If two things were connected, you made a local copy
• Gopher and WAIS began to change that
• The World Wide Web actually changed that
– Sir Tim Berners-Lee (CS/Eng), at CERN, 1989
– Core idea – HTML to link “stuff”
– Which needed a protocol – HTTP (IETF)
– http://info.cern.ch/hypertext/WWW/TheProject.html
– Now heads the W3C (W3C.org), among many other roles
The Web
• HTTP underpins the web
– to deliver HTML and (many) associated content items
• Request(s)/response(s) from multiple resources/sites
– Port 80, TCP
– A few versions (timeline figure, 1990 to 2015): 0.9, 1.0, 1.1, SPDY, 2.0
Resources
• Aggregating and linking resources needs IDENTIFIERS
• Uniform Resource Identifiers (URI)
– Or is that a Uniform Resource Name (URN)?
– Or a Uniform Resource Locator (URL)?
• Stick with URLs here:
scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]
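Python's standard `urllib.parse.urlsplit` breaks a URL into exactly these components. The URL below is made up so that every optional part is present; treat it as an illustration of the syntax rather than a real address.

```python
from urllib.parse import urlsplit

# A made-up URL that exercises every optional part of the generic syntax
url = "http://alice:secret@www.example.com:8080/docs/index.html?lang=en&v=2#intro"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html
print(parts.query)     # lang=en&v=2
print(parts.fragment)  # intro
```

Note that `port` comes back as an integer, and as `None` when the URL does not name one.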
URLs – schemes
• scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]
• Registered schemes: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
• 280 of them!
• Some interesting ones:
– callto://
– file://
• And http:// and https://
URLs – the rest
• http://user:password@host:port[/path][?query][#fragment]
• You can provide authentication inline. If you want. In plain text…
• Host = something you find in the DNS (or an IP address)
• Port = if it’s not 80, tell me
• Path identifies the (absolute-path-to) resource on the host
– #fragment goes to a point within that resource
– http://en.wikipedia.org/wiki/IEEE_802#See_also
• Query passes information to that resource
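As a rough sketch of what a client does with these parts, the hypothetical helper `connection_target` below picks the host and port to connect to and the target to put in the request. It assumes only http and https with their well-known default ports (80 and 443), and it drops the #fragment, since the fragment is interpreted by the client and is not sent to the server.

```python
from urllib.parse import urlsplit

# Assumed defaults: the well-known ports for the two schemes handled here
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url: str) -> tuple[str, int, str]:
    """Return (host, port, request_target) for an http(s) URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]  # "if it's not 80, tell me"
    target = parts.path or "/"          # an empty path means the root resource
    if parts.query:
        target += "?" + parts.query     # the query travels to the server
    # The #fragment is deliberately dropped: the client uses it locally.
    return parts.hostname, port, target


print(connection_target("http://en.wikipedia.org/wiki/IEEE_802#See_also"))
# ('en.wikipedia.org', 80, '/wiki/IEEE_802')
```

For the Wikipedia example, the browser fetches /wiki/IEEE_802 and then scrolls to the "See_also" anchor itself.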
The magic of the web
• Static vs dynamic content
• Server-side vs client-side dynamic content
Or all of the above.
Anything that presents information can appear on a web page
8 Steps to HTTP happiness
1. Parse URL
2. Resolve DNS
3. Connect to host:port via TCP
4. Make HTTP request
5. Receive content
6. Close TCP connections
7. Unpack content
8. Render
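The steps above can be sketched with nothing more than Python's `socket` module and `urllib.parse`. This is a minimal illustration under stated assumptions, not a real browser: plain http only (no TLS), an HTTP/1.0 request so that the server closes the connection when it is done, no redirect handling, and step 8 left to the caller. The function name `fetch` is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def fetch(url: str) -> tuple[str, dict[str, str], bytes]:
    # 1. Parse URL (plain http only in this sketch)
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 80
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")

    # 2. Resolve DNS (getaddrinfo also accepts literal IP addresses)
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]

    # 3. Connect to host:port via TCP
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5.0)
        sock.connect(addr)
        # 4. Make HTTP request (Host is optional in HTTP/1.0 but helps virtual hosts)
        request = f"GET {target} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 5. Receive content until the server closes the bytestream
        raw = b""
        while chunk := sock.recv(4096):
            raw += chunk
    # 6. Leaving the with-block closes our end of the TCP connection

    # 7. Unpack content: status line and headers, a blank line, then the body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # 8. Render: left to the caller (a browser would lay out the HTML)
    return lines[0], headers, body
```

Because it reads until the server closes the connection, the sketch relies on HTTP/1.0 behaviour; a persistent HTTP/1.1 connection would need Content-Length or chunked decoding instead.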
HTTP requests – RFC1945 (HTTP 1.0)
• Request/response, text-based; each request starts with the method
– GET: get the resource at the given URL
– HEAD: get the headers about the resource at the given URL
– POST: append my contribution to the resource at the given URL
• Server returns headers, and a body (entity)
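A request is just lines of text. The hypothetical helper `build_request` below assembles an HTTP/1.0 request for the three methods above: the request line, optional headers, a blank line, then any body. The form-encoded Content-Type is an illustrative assumption; the Content-Length header reflects RFC 1945's requirement that HTTP/1.0 POST requests carry a valid Content-Length. Lines end in CRLF (\r\n), and an empty line marks the end of the headers.

```python
def build_request(method: str, target: str, body: bytes = b"") -> bytes:
    """Assemble a minimal HTTP/1.0 request: request line, headers, blank line, body."""
    if method not in ("GET", "HEAD", "POST"):
        raise ValueError(f"method not covered by this sketch: {method}")
    lines = [f"{method} {target} HTTP/1.0"]
    if method == "POST":
        # Illustrative choice of body type (an assumption, not required by HTTP)
        lines.append("Content-Type: application/x-www-form-urlencoded")
        # RFC 1945: HTTP/1.0 POST requests need a valid Content-Length
        lines.append(f"Content-Length: {len(body)}")
    # CRLF ends each line; an extra CRLF (the blank line) ends the headers
    head = "\r\n".join(lines) + "\r\n\r\n"
    # Only POST sends an entity body in this sketch
    return head.encode("ascii") + (body if method == "POST" else b"")


print(build_request("HEAD", "/"))  # b'HEAD / HTTP/1.0\r\n\r\n'
```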
Use ‘telnet’ as a client
markus@homemaster:~$ telnet www.google.com 80
GET / HTTP/1.0\n\n
HTTP/1.0 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.com.au/?gfe_rd=cr&dcr=0&ei=192xWr-uObPu8wfUm4noDQ
Content-Length: 272
Date: Wed, 21 Mar 2018 04:21:43 GMT

302 Moved
The document has moved here.
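Pulling the status code and a header out of a response is mostly string splitting. A minimal sketch, using the response head from the telnet session above as test data; the helper name `parse_response_head` is hypothetical, and it ignores real-world details such as repeated or folded header lines.

```python
def parse_response_head(raw: str) -> tuple[str, int, str, dict[str, str]]:
    """Split a response head into (version, status code, reason, headers)."""
    status_line, *header_lines = raw.strip("\r\n").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), reason, headers


# Response head copied from the telnet session above
head = (
    "HTTP/1.0 302 Found\r\n"
    "Cache-Control: private\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Location: http://www.google.com.au/?gfe_rd=cr&dcr=0&ei=192xWr-uObPu8wfUm4noDQ\r\n"
    "Content-Length: 272\r\n"
    "Date: Wed, 21 Mar 2018 04:21:43 GMT\r\n"
)
version, code, reason, headers = parse_response_head(head)
if 300 <= code < 400:
    # A browser would now go back to step 1 with the new URL
    print("redirect to", headers["location"])
```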
And get it wrong?
markus@homemaster:~$ telnet www.google.com 80
GET / HTTP/3.0\n\n
HTTP/1.0 400 Bad Request
Content-Type: text/html; charset=UTF-8
Content-Length: 1555
Date: Wed, 21 Mar 2018 04:25:14 GMT
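Status codes are grouped by their first digit, so a client can react sensibly even to a code it has never seen. A tiny sketch with a hypothetical `status_class` helper, applied to the 400 from the session above: the server is blaming the client, here for a version string it did not accept.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to its class, using the first digit."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")


status_line = "HTTP/1.0 400 Bad Request"  # from the telnet session above
code = int(status_line.split(" ", 2)[1])
print(code, status_class(code))  # 400 client error
```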