
COMP30023 – Computer Systems
Application Layer – HTTP and HTML
Dr Lachlan Andrew

Recap
• History of the internet
• Network Protocol Models (stacks)
• OSI vs TCP/IP
• Acknowledgement:
• These slides are minor modifications of those prepared by Dr Chris Culnane
© 2021 University of Melbourne

Summary
• Top-down approach
– We’ll gradually peel away the layers over the coming weeks
• Application Layer
– HTTP (the web protocol), and in relation to it, HTML
• Wireshark – viewing network protocols in real-time

World Wide Web – A Short History
• Sir Tim Berners-Lee
– 1984 return to CERN (TCP/IP installed)
– Saw many online databases with different access mechanisms (FTP, Gopher, …)
– 1989 wrote the proposal “a large hypertext database with typed links” (No takers)
– by 1990, had designed and built: HTTP, HTML, httpd, WorldWideWeb (browser)
– 1992 left for MIT, after CERN IT Head described it as a misallocation of resources
• Hypertext
– Ted Nelson coined the term in 1963
– Creation and use of linked content

World Wide Web – A Short History
• The vision was that HTTP would be the “glue” between data on different existing protocols
– e.g., FTP (file transfer protocol) – many files available for download
• GOPHER – distributed database developed at U. Minnesota in 1991
– Hierarchical file structure
– More suited for text interfaces
– lower network overhead
– February 1993, charging for server
• May 1994 first International WWW Conference (at CERN)
• September 1994 W3C formed (DARPA & European Community)
– Standardisation of web technologies
– royalty free
• Browser wars 1994-1998 (Microsoft vs. Netscape)
• 1999 – 2001 .com boom
• 2002+ Ubiquitous web
• Web 2.0 – semantic web, social media

WWW – Components
• Client – typically a browser based access to pages
• Server – daemon based content delivery of pages
• URL ≈ Protocol + DNS Name + file name

WWW – Architecture

HTTP – Overview
• HyperText Transfer Protocol
– Defined everything needed for the web
• TCP/IP Model vs OSI Model
– Application layer (except compression/encoding - Presentation)
• Resources are referenced by URLs

URL/URI
• Uniform Resource Locator
– Sir Tim called it the “universal resource locator”
– Defined in original HTTP specification
– An address for a resource
– Can be relative “./nextpage.html” or absolute “http://www.google.com”
• Separate specification by W3C in 1998 for URI – Uniform Resource Identifier
– scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]
– abc://username:password@example.com:123/path/data?key=value#fragid1
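As a rough illustration of the URI syntax above, the sketch below splits the slide's example URI into its components with Python's standard `urllib.parse.urlsplit`. The helper name `describe_uri` is hypothetical and not part of the course material.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into the components named in the generic syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,          # int, or None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    example = "abc://username:password@example.com:123/path/data?key=value#fragid1"
    for name, value in describe_uri(example).items():
        print(f"{name:9} {value}")
```

Optional components that are absent come back as `None` (user, password, port) or as an empty string (query, fragment).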

HTTP – Protocol Overview
• Overview:
– Client initiates TCP connection (creates socket) to server, port 80
– Server accepts TCP connection from client
– HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
– TCP connection closed
• Connections:
– HTTP 1.0 – single use connection
– HTTP 1.1 – persistent connections, additional headers
– HTTP/2 – 2015 – Further speed improvements (origins in SPDY)
– HTTP/3 (draft; in use) – Allow more parallelism in data loading (QUIC)

Non-persistent HTTP

Persistent vs. Non-persistent
• Non-persistent:
– requires 2 “response times” (one to initiate TCP connection and one for initial HTTP request) per object + file transmission time
– OS overhead for each TCP connection
– browsers often open parallel TCP connections to fetch referenced objects
• Persistent:
– server leaves connection open after sending response
– subsequent HTTP messages between same client/server sent over open connection
– client sends requests as soon as it encounters a referenced object, reducing overall response time
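The difference can be made concrete with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes objects are fetched strictly one after another (no parallel connections, no pipelining), that every object has the same transmission time, and that a persistent connection pays the connection set-up round trip only once. The function names and the example numbers are illustrative assumptions.

```python
def non_persistent_time(n_objects, rtt_ms, transmit_ms):
    """Each object pays one round trip for TCP set-up, one for the
    HTTP request/response, plus its transmission time."""
    return n_objects * (2 * rtt_ms + transmit_ms)


def persistent_time(n_objects, rtt_ms, transmit_ms):
    """Assumed model: one round trip to open the connection, then one
    round trip plus transmission time per object on that connection."""
    return rtt_ms + n_objects * (rtt_ms + transmit_ms)


if __name__ == "__main__":
    # Hypothetical page: 10 objects, 100 ms round trip, 10 ms to transmit each.
    print(non_persistent_time(10, 100, 10), "ms non-persistent")
    print(persistent_time(10, 100, 10), "ms persistent")
```

With these assumed numbers the non-persistent total is 2100 ms and the persistent total is 1200 ms; the saving is one round trip for every object after the first.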

HTTP Request Connection
• HTTP with (a) multiple connections and sequential requests, (b) a persistent connection and sequential requests, (c) a persistent connection and pipelined requests.

HTTP – Summary of key steps
• Steps that occur when a link is selected:
– Browser determines the URL
– Browser asks DNS for the IP address of the server (Resolving URL)
– DNS replies
– The browser makes a TCP connection
– Sends HTTP request for the page
– Server sends the page as HTTP response
– Browser fetches other URLs as needed
– The browser displays the page (progressively, as content arrives)
– The TCP connections are released
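The steps above can be sketched with plain sockets. To keep the example self-contained it starts a tiny throwaway server on the local machine and fetches one page from it; `DemoHandler`, `fetch` and `demo` are hypothetical names, and a real browser does far more (caching, redirects, fetching embedded objects, rendering).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in for a real web server: answers every GET with one tiny page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url):
    """Resolve, connect, send one GET request, and return the raw response."""
    parts = urlsplit(url)                        # browser determines the URL
    port = parts.port or 80
    # Name resolution step: host name -> IP address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        parts.hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(addr)                       # TCP connection
        request = (f"GET {parts.path or '/'} HTTP/1.1\r\n"
                   f"Host: {parts.netloc}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))    # HTTP request
        chunks = []
        while True:                              # read until the server closes
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)                      # socket released by `with`


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://localhost:{server.server_address[1]}/index.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("iso-8859-1"))
```

The request asks for `Connection: close`, so the end of the response is simply the point where the server closes the connection.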

HTTP – Request Methods
HTTP Method   Safe   Idempotent   Cacheable
GET           Yes    Yes          Yes
HEAD          Yes    Yes          Yes
POST          No     No           Yes/No
PUT           No     Yes          No
DELETE        No     Yes          No
CONNECT       No     No           No
OPTIONS       Yes    Yes          No
TRACE         Yes    Yes          No
PATCH         No     No           No
• Idempotent – multiple identical requests have same effect
• Safe – Only for information retrieval, should not change state
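A small in-memory model may help separate the two properties. `NoteStore` below is a hypothetical resource store, not a real HTTP server; each method mimics how the corresponding HTTP method is conventionally expected to behave.

```python
class NoteStore:
    """Hypothetical in-memory resource store used only for illustration."""

    def __init__(self):
        self.notes = {}
        self.next_id = 1

    def get(self, note_id):
        # Safe: only reads, never changes state.
        return self.notes.get(note_id)

    def put(self, note_id, text):
        # Idempotent: repeating the same call leaves the same state.
        self.notes[note_id] = text

    def delete(self, note_id):
        # Idempotent: the note is absent after one call or after many.
        self.notes.pop(note_id, None)

    def post(self, text):
        # Not idempotent: every call creates a new note.
        note_id = self.next_id
        self.next_id += 1
        self.notes[note_id] = text
        return note_id


if __name__ == "__main__":
    store = NoteStore()
    store.put(7, "a")
    store.put(7, "a")       # same state as after the first PUT
    store.post("b")
    store.post("b")         # two separate notes now exist
    print(store.notes)
```

Repeating the PUT changes nothing, while repeating the POST leaves two notes with the same text under different identifiers.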

Wireshark Example

HTTP Request Example
request line (GET, POST, HEAD):
GET /somedir/page.html HTTP/1.1
header lines:
Host: www.somesite.com.au
User-agent: Mozilla/4.0
Connection: close
Accept-language: fr
(extra new line)
Blank line (2 LF or 2 CR/LF) indicates end of message
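The layout of the request message can be checked by parsing it. The sketch below assumes a well-formed request that uses CR/LF line endings; `parse_request` is a hypothetical helper, not a library function.

```python
EXAMPLE_REQUEST = (
    b"GET /somedir/page.html HTTP/1.1\r\n"
    b"Host: www.somesite.com.au\r\n"
    b"User-agent: Mozilla/4.0\r\n"
    b"Connection: close\r\n"
    b"Accept-language: fr\r\n"
    b"\r\n"
)


def parse_request(raw):
    """Split a raw request into request line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")    # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, target, version, headers, body


if __name__ == "__main__":
    method, target, version, headers, body = parse_request(EXAMPLE_REQUEST)
    print(method, target, version)
    print(headers)
```

Header names are lower-cased on the way in because HTTP treats them as case-insensitive.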

HTTP Response Codes
Code   Meaning        Examples
1xx    Information    100 = server agrees to handle client’s request
2xx    Success        200 = request succeeded; 204 = no content present
3xx    Redirection    301 = page moved; 304 = cached page still valid
4xx    Client error   403 = forbidden page; 404 = page not found
5xx    Server error   500 = internal server error; 503 = try again later
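Because the class is carried by the first digit, a client can classify any status code without knowing every individual value. A minimal sketch follows; the class names are taken from the table, the reason phrases come from Python's standard `http.HTTPStatus`, and the function names are illustrative.

```python
from http import HTTPStatus

CLASSES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the class of a status code, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CLASSES[code // 100]


def describe(code):
    """Combine the standard reason phrase with the class name."""
    return f"{code} {HTTPStatus(code).phrase}: {status_class(code)}"


if __name__ == "__main__":
    for code in (100, 200, 204, 301, 304, 403, 404, 500, 503):
        print(describe(code))
```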

HTTP – Response
Status line (protocol status code and phrase):
HTTP/1.1 200 OK
header lines:
Connection: close
Date: Thu, 06 Aug 2009 12:00:15 GMT
Server: Apache/2.2.11 (Unix)
Last-modified: Mon, 22 Jun 2009
Content-Length: 6821
Content-Type: text/html
Data, e.g., requested HTML file

HTTP Headers
Header              Type       Description
User-Agent          Request    Information about the browser and its platform
Accept              Request    The type of pages the client can handle
Accept-Charset      Request    The character sets that are acceptable to the client
Accept-Encoding     Request    The compression formats the client can handle
Accept-Language     Request    The natural languages the client can handle
If-Modified-Since   Request    Time and date to check freshness
If-None-Match       Request    Previously sent tags to check freshness
Host                Request    The server’s DNS name
Authorization       Request    A list of the client’s credentials
Referer             Request    The previous URL from which the request came
Cookie              Request    Previously set cookie sent back to the server
Set-Cookie          Response   Cookie for the client to store
Server              Response   Information about the server

HTTP Headers
Header             Type       Description
Content-Encoding   Response   How the content is encoded (e.g., gzip)
Content-Language   Response   The natural language used in the page
Content-Length     Response   The page’s length in bytes
Content-Type       Response   The page’s MIME type
Content-Range      Response   Identifies a portion of the page’s content
Last-Modified      Response   Time and date the page was last changed
Expires            Response   Time and date when the page stops being valid
Location           Response   Tells the client where to send its request
Accept-Ranges      Response   Indicates the server will accept byte range requests
Date               Both       Date and time the message was sent
Range              Both       Identifies a portion of a page
Cache-Control      Both       Directives for how to treat cache
ETag               Both       Tag for the contents of the page
Upgrade            Both       The protocol the sender wants to switch to
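Several of these headers work in pairs. As one example, the sketch below shows how a server might combine ETag with the request header If-None-Match to answer 304 when the client's cached copy is still valid. Deriving the tag from a hash of the body is an assumption made for illustration (servers are free to generate tags however they like), and `make_etag` and `respond` are hypothetical names.

```python
import hashlib


def make_etag(body):
    """Derive a quoted tag from the page contents (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, request_headers):
    """Return (status, headers, body) for a GET of a page with this content."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still valid: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


if __name__ == "__main__":
    page = b"<html>version 1</html>"
    status, headers, _ = respond(page, {})
    print(status, headers)
    # The client repeats the request, presenting the tag it was given.
    print(respond(page, {"If-None-Match": headers["ETag"]})[0])
```

The second request carries the tag from the first response, so the server can skip sending the page again.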

Client side processing
• Plugins/Extensions – integrated software module which executes inside the browser,
– direct access to online context
• Helper – separate program which can be instantiated by the browser, but can only access local cache of file content
– application/pdf
– application/msword

Server side processing – static page
• 5 step process:
– Accept TCP Connection from client (browser)
– Identify the file requested
– Get the specified file from the local storage (disk, RAM, …)
– Send the file to the client
– Release the TCP connection
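The five steps map onto a few lines of socket code. The sketch below is deliberately minimal and makes several assumptions: requests are well formed, fit in a single `recv`, and carry no query string; only a 200 or 404 response is produced; and the function names are hypothetical.

```python
import socket
from pathlib import Path


def identify_file(request, docroot):
    """Step 2: map the request target onto a path inside docroot."""
    target = request.split(b"\r\n", 1)[0].split(b" ")[1].decode("ascii")
    root = Path(docroot).resolve()
    path = (root / target.lstrip("/")).resolve()
    if root not in path.parents:      # refuse paths that escape the docroot
        return None
    return path


def build_response(request, docroot):
    """Steps 2 to 4: find the file, read it, and wrap it in a response."""
    path = identify_file(request, docroot)
    if path is None or not path.is_file():
        status, body = "404 Not Found", b"not found"
    else:
        status, body = "200 OK", path.read_bytes()   # step 3: get the file
    head = (f"HTTP/1.1 {status}\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n")
    return head.encode("ascii") + body


def serve_once(listener, docroot):
    """Handle exactly one client on an already listening socket."""
    conn, _ = listener.accept()                  # step 1: accept connection
    with conn:                                   # step 5: released on exit
        request = conn.recv(65536)
        conn.sendall(build_response(request, docroot))   # step 4: send
```

Keeping `build_response` separate from the socket handling means the file lookup can be exercised without opening a connection.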

Multi-threaded Web Server
• A multithreaded Web server with a front end and processing modules.

Multi-threaded Web Server – dynamic
• A processing module performs a series of steps:
– Resolve name of Web page requested.
– Perform access control on the Web page.
– Check the cache.
– Fetch requested page from disk or run program
– Determine the rest of the response
– Return the response to the client.
– Make an entry in the server log.
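One way to picture the front end and its processing modules is a thread pool in which each worker runs the same sequence of steps. The sketch below is a toy model under stated assumptions: `PAGES` stands in for the disk, `FORBIDDEN` for the access-control list, and all names are hypothetical. The log entry is written just before the response is returned, rather than after, so that the whole sequence fits in one function.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PAGES = {"/index.html": b"<h1>home</h1>", "/admin.html": b"secret"}  # "disk"
FORBIDDEN = {"/admin.html"}
cache = {}
log = []
lock = threading.Lock()      # protects cache and log across worker threads


def process(path):
    """Processing module: run the steps for one request."""
    name = "/index.html" if path == "/" else path      # resolve name
    if name in FORBIDDEN:                              # access control
        status, body = 403, b""
    else:
        with lock:
            body = cache.get(name)                     # check the cache
        if body is None:
            body = PAGES.get(name)                     # fetch from "disk"
            if body is not None:
                with lock:
                    cache[name] = body
        status = 200 if body is not None else 404
        body = body or b""
    headers = {"Content-Length": str(len(body))}       # rest of the response
    with lock:
        log.append((name, status))                     # server log entry
    return status, headers, body                       # return the response


def front_end(paths, workers=4):
    """Front end: hand each incoming request to a worker thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, paths))


if __name__ == "__main__":
    for result in front_end(["/", "/admin.html", "/missing.html", "/index.html"]):
        print(result)
```

`pool.map` returns results in request order even though the workers may finish in a different order.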

Web Cache
• Goal: satisfy client request without involving origin server – reduce response time.

Web proxy
• Used for caching, security and IP address sharing
• The browser sends all HTTP requests to the proxy. The proxy returns objects in its cache or else the proxy requests object from origin server, then returns object to client.
• Note: the proxy server acts as both client and server.
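The proxy's dual role can be sketched as a small class: it serves clients from its cache and acts as a client itself when the object is missing. This is a simplified model that ignores expiry and Cache-Control; `CachingProxy` and the `fetch_from_origin` callable are hypothetical names.

```python
class CachingProxy:
    """Toy caching proxy: answers from the cache, else asks the origin."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin   # callable: url -> bytes
        self.cache = {}
        self.origin_requests = 0

    def get(self, url):
        if url in self.cache:                # acting as a server: cache hit
            return self.cache[url]
        self.origin_requests += 1            # acting as a client: cache miss
        obj = self.fetch_from_origin(url)
        self.cache[url] = obj
        return obj


if __name__ == "__main__":
    def origin(url):
        # Stand-in for a real origin server.
        return b"body of " + url.encode("ascii")

    proxy = CachingProxy(origin)
    proxy.get("http://example.com/a")
    proxy.get("http://example.com/a")        # served from the cache
    print(proxy.origin_requests)
```

Only the first request for a URL reaches the origin; the second is answered from the proxy's cache.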

Cookies
• The network stores no state about web sessions
• Cookies can place small amount (<4 KB) of information on the user’s computer and re-use deterministically (RFC 2109)
• Cookies have 5 fields
– domain, path, content, expiry, security
• How to keep state
– maintain state at sender/receiver over multiple transactions; HTTP messages carry “state”
• Questionable mechanism for tracking users (invisibly perhaps) and learning about user behaviour
– e.g., competitor snooping, undesirable content etc.

Example Cookies
amazon.com.au
Name              Value                     Domain                 Expires      HTTPOnly   Secure
Session-id        356-7554479-6471342       .amazon.com.au         2036-01-01
Session-id-time   2082787201l               .amazon.com.au         2036-01-01
ad-id             A3kfU1c7DE3Wqz474A25Zfs   .amazon.adsystem.com   2037-01-01
nytimes.com
Name              Value                     Domain                 Expires      HTTPOnly   Secure
ad-id             A3kfU1c7DE3Wqz474A25Zfs   .amazon.adsystem.com   2037-01-01

Static web documents
• HTML - Hypertext Markup Language
– a simple language designed to encode both content and presentational information
– Plain text encoding, with browser based rendering
– Restricted to ISO-8859 Latin-1 character set (internationalisation not introduced until XHTML with UTF encodings)
• Web Page Components
– Structural divisions:
• Head …
• Body …
– Syntactically Restricted Tag Sets
– Attributes & Values
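To make the page components concrete, the sketch below feeds a tiny hand-written page to Python's standard `html.parser.HTMLParser` and records the tags, one attribute value, and the text. The page content and the `OutlineParser` and `outline` names are made up for illustration.

```python
from html.parser import HTMLParser

PAGE = """<html>
<head><title>Demo</title></head>
<body><p>See the <a href="./nextpage.html">next page</a>.</p></body>
</html>"""


class OutlineParser(HTMLParser):
    """Collect start tags, link targets, and non-blank text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)                   # tag from a restricted set
        if tag == "a":
            # attrs is a list of (attribute, value) pairs
            self.links.extend(v for k, v in attrs if k == "href")

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    result = outline(PAGE)
    print(result.tags)
    print(result.links)
    print(result.text)
```

The head carries information about the page (here only a title), the body carries the content, and the `href` attribute on the `a` tag is what makes the text a hyperlink.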

Beyond HTML
• HTML was originally an instance of SGML – standard generalized markup language
• People wanted an HTML-like language to describe data that is not hypertext – but SGML is too general / “heavy”
• XML (Extensible Markup Language) & XSL (Extensible Stylesheet Language)
– Primary feature: separation of content and presentational markup
– Stringent validation requirements
• XHTML
– Essentially an expression of HTML 4.0 as valid XML
– Major differences to HTML 4.0 are the requirements for conformance, case folding, well-formedness, attribute specification, nesting and embedding, and inclusion of a document type identifier

Dynamic Content


Scripting

Client-side Scripting
• Technologies for producing interactive web applications include:
• JavaScript
• Java Applets – compiled Java code (platform independent)
• ActiveX – compiled code for Windows
• AJAX
– HTML and CSS: present information as pages.
– DOM: change parts of pages while they are viewed.
– XML: let programs exchange data with the server.
– An asynchronous way to send and retrieve XML data.
– JavaScript as a language to bind all this together

And finally…
• Tracking with cookies is well known
• Tracking companies have expanded beyond simple cookies
– Plug-in, browser fingerprinting
• https://coveryourtracks.eff.org/ Project to research tracking techniques in browsers
• How unique is your browser by Peter Eckersley (EFF): https://coveryourtracks.eff.org/static/browser-uniqueness.pdf