CIS 455/555: Internet and Web Systems
GFS and MapReduce
October 26, 2020
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
Plan for today
- Publish/subscribe
- XFilter
- Google File System
- Introduction to MapReduce
Decomposing into path nodes
Q1 = /politics[@topic="president"]/usa//body decomposes into path nodes Q1-1, Q1-2, Q1-3;
Q2 = //usa/*/body/p decomposes into path nodes Q2-1, Q2-2, Q2-3:

  Path node | Query ID | Position in state machine | Relative Position (RP) | Level
  Q1-1      | Q1       | 1                         | 0                      | 1
  Q1-2      | Q1       | 2                         | 1                      | 0
  Q1-3      | Q1       | 3                         | -1                     | -1
  Q2-1      | Q2       | 1                         | -1                     | -1
  Q2-2      | Q2       | 2                         | 2                      | 0
  Q2-3      | Q2       | 3                         | 1                      | 0

- Relative Position (RP) in tree:
  - 0 for the root if it's not preceded by "//"
  - -1 for any node preceded by "//"
  - else 1 + (number of "*" nodes since the predecessor node)
- Level:
  - begins as either 1, 0, or -1
  - 1 for the root if it's not preceded by "//"; -1 if RP = -1; else 0
- Finally, NextPathNodeSet points to the next path node
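To make the rules concrete, here is a minimal Python sketch of the decomposition (not from the lecture; the PathNode/decompose names are invented, and predicates are stripped here rather than stored):

```python
import re
from dataclasses import dataclass

@dataclass
class PathNode:
    query_id: str
    element: str     # element name this node waits for
    position: int    # position in the query's state machine (1-based)
    rel_pos: int     # Relative Position (RP): 0, -1, or 1 + number of "*" steps skipped
    level: int       # 1, 0, or -1 (a 0/-1 level is resolved during matching)

def decompose(query_id, xpath):
    # Split into (separator, element) pairs; predicates like [@topic="president"]
    # are dropped here, but a real implementation would keep them for validation.
    steps = re.findall(r'(//|/)([^/\[]+)(?:\[[^\]]*\])?', xpath)
    nodes, position, wildcards = [], 0, 0
    for sep, name in steps:
        if name == '*':              # wildcards are folded into the next node's RP
            wildcards += 1
            continue
        position += 1
        if sep == '//':
            rp, level = -1, -1       # node preceded by "//"
        elif position == 1:
            rp, level = 0, 1         # root node, not preceded by "//"
        else:
            rp, level = 1 + wildcards, 0
        nodes.append(PathNode(query_id, name, position, rp, level))
        wildcards = 0
    return nodes

# Reproduces the Q1-1..Q1-3 and Q2-1..Q2-3 values in the table above:
q1 = decompose("Q1", '/politics[@topic="president"]/usa//body')
q2 = decompose("Q2", '//usa/*/body/p')
```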
Indexing states by element names
- Query Index: one entry for each XML element name (tag)
  - Candidate List (CL) of active states
  - Wait List (WL) of pending states
- Events that cause state transitions are generated by the XML parser
  - Copy the relevant node from the WL to the CL
  - Populate the Level

Initial index for Q1 and Q2 (the first path node of each query starts on the CL of its element; all later nodes wait on the WL):

  Element  | CL   | WL
  politics | Q1-1 | -
  usa      | Q2-1 | Q1-2
  body     | -    | Q1-3, Q2-2
  p        | -    | Q2-3
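A rough sketch of this index, building on the hypothetical decompose() above (it assumes the first path node of each query starts on the CL and later nodes wait on the WL):

```python
from collections import defaultdict

def build_query_index(all_queries):
    """Index path nodes by element name, each entry holding a CL and a WL."""
    index = defaultdict(lambda: {"CL": [], "WL": []})
    for nodes in all_queries:                       # one PathNode list per query
        for node in nodes:
            bucket = "CL" if node.position == 1 else "WL"
            index[node.element][bucket].append(node)
    return index

index = build_query_index([q1, q2])
# index["usa"]  -> {"CL": [Q2-1], "WL": [Q1-2]}
# index["body"] -> {"CL": [],     "WL": [Q1-3, Q2-2]}
```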
Encountering an element
- Look up the element name in the Query Index and examine all nodes in the associated CL
- Validate that we actually have a match

Example: on startElement: politics, the Query Index entry for "politics" has Q1-1 on its CL:

  Path node | Query ID | Position | Rel. Position | Level
  Q1-1      | Q1       | 1        | 0             | 1

(Q1-1's NextPathNodeSet points to Q1-2.)
Validating a match
- We first check that the current XML depth matches the level in the user query:
  - If the level in the CL node is -1, then ignore the depth
  - else the level in the CL node must equal the current depth
  - This ensures we're matching at the right point in the tree!
- Finally, we validate any predicates against attributes (e.g., [@topic="president"])
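A sketch of the startElement check described on this and the previous slide (hedged: depth is the element's depth in the document, and predicates_hold is a hypothetical helper for attribute predicates):

```python
def matching_candidates(index, element, depth, attrs):
    """Return the CL nodes for this startElement that pass the level check."""
    matches = []
    for node in index[element]["CL"]:
        level_ok = (node.level == -1) or (node.level == depth)
        if level_ok and predicates_hold(node, attrs):   # e.g. [@topic="president"]
            matches.append(node)
    return matches
```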
Processing further elements
- Queries that don't meet validation are removed from the Candidate Lists
- For other queries, we advance to the next state
  - We copy the next node of the query from the WL to the CL, and update the level
- When we reach a final state (e.g., Q1-3), we can output the document to the subscriber
- When we encounter an end element, we must remove the corresponding nodes from the CL
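Continuing the sketch (assumed helper names; the full XFilter algorithm needs extra bookkeeping, e.g. multiple concurrent instances of the same path node, which is omitted here):

```python
def advance(index, query_nodes, node, depth, deliver):
    """Advance one matched query: either output the document or arm the next node."""
    nodes = query_nodes[node.query_id]          # the query's path nodes, in order
    if node.position == len(nodes):             # final state reached, e.g. Q1-3
        deliver(node.query_id)                  # send the document to the subscriber
    else:
        nxt = nodes[node.position]              # next path node (position is 1-based)
        if nxt.rel_pos != -1:
            nxt.level = depth + nxt.rel_pos     # "populate the Level"
        index[nxt.element]["CL"].append(nxt)    # copy from WL to CL

def end_element(index, added_here):
    """On an end tag, undo the CL entries that were added for this element."""
    for node in added_here:
        index[node.element]["CL"].remove(node)
```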
A simpler approach
- Instantiate a DOM tree for each document
- Traverse it and recursively match XPaths (see the sketch below)
- Pros and cons?
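For contrast, a toy version of this simpler approach (child and descendant axes only, no predicates; the (axis, name) step representation is invented for the example):

```python
from xml.dom.minidom import parseString

def matches(node, steps):
    """Recursively try to match a list of (axis, name) steps below a DOM node."""
    if not steps:
        return True
    axis, name = steps[0]                                 # axis is '/' or '//'
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if (name == '*' or child.tagName == name) and matches(child, steps[1:]):
            return True
        if axis == '//' and matches(child, steps):        # keep descending for '//'
            return True
    return False

doc = parseString('<politics topic="president"><usa><x><body/></x></usa></politics>')
print(matches(doc, [('/', 'politics'), ('/', 'usa'), ('//', 'body')]))   # True
```

Every document is matched against every query independently, which is exactly the per-query cost XFilter's shared index avoids.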
Recap: Publish-subscribe model
- Publish-subscribe model
  - Publishers produce events
  - Each subscriber is interested in a subset of the events
  - Challenge: efficient implementation
  - Comparison: XFilter vs. RSS
- XFilter
  - Interests are specified with XPaths (very powerful!)
  - Sophisticated technique for efficiently matching documents against many XPaths in parallel
Plan for today
- XFilter
- Google File System (also Hadoop DFS)
  - GFS/HDFS and MapReduce
  - Key assumptions
  - Architecture and operation
- Introduction to MapReduce
  - Programming model
  - Data flow
  - A few simple example tasks
Suppose We Have a Very Unreliable Stream Processing Substrate
- We are reading records from a spout, and emitting events
  - Each event gets sent to a bolt
- The bolts run on many compute nodes, and those nodes crash very frequently
- Key idea: distributed checkpointing with replication! This is one key idea underlying the Google File System (GFS) and MapReduce.
Background: Distributed filesystems
- Many distributed filesystems have been developed:
  - NFS and SMB are the most prevalent today
    - You probably use them here in SEAS – e.g., every time you log into eniac or a Linux box, or a Windows machine
  - Andrew File System (AFS) was also fairly popular
  - Hundreds of other research filesystems, e.g., Coda, Sprite, ..., with different properties
NFS in a Nutshell
- (Single) server, multi-client architecture
- Server is stateless, so clients must send all context (including the position to read from) in each request
- Mostly mimics UNIX semantics
  - Opening a file requires opening each directory along the way
  - fd = open("/x/y/z.txt") will do a
    - lookup for x from the root handle
    - lookup for y from x's handle
    - lookup for z from y's handle
- Server must commit writes immediately
- Pros/cons?
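A rough sketch of that lookup sequence, using hypothetical server stubs rather than the real NFS RPC names:

```python
def nfs_open(server, path):
    """Resolve a path one component at a time via stateless LOOKUP-style calls."""
    handle = server.root_handle()                    # well-known handle for "/"
    for component in path.strip("/").split("/"):     # x, then y, then z.txt
        handle = server.lookup(handle, component)
    return handle

# Every later read carries its full context, since the server keeps no open-file state:
# data = server.read(handle, offset=0, count=4096)
```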
The Google File System (GFS)
- Goals:
  - Support millions of huge (many-TB) files
  - Partition & replicate data across thousands of unreliable machines, in multiple racks (and even data centers)
- Willing to make some compromises to get there:
  - Modified APIs – doesn't plug into the POSIX APIs
    - In fact, relies on being built over the Linux filesystem
  - Doesn't provide transparent consistency to apps!
    - App must detect duplicate or bad records, support checkpoints
  - Performance is only good with a particular class of apps:
    - Stream-based reads
    - Atomic record appends
Key assumptions in GFS
- Component failures are the common case
  - Thousands of storage machines, built from commodity parts
  - Need monitoring, error detection, fault tolerance, recovery!
- Special application workload
  - Small number of very large, multi-GB files
  - Primarily large streaming reads + small random reads
  - Many large, sequential appends (e.g., many-way merging or producer/consumer); random writes are rare
  - Multiple clients may be writing/appending concurrently
  - Exact sequence of appended records does not matter
- Benefits from co-designing file system & apps
  - For example, can relax consistency w/o burdening the application
GFS Basic architecture & lookups
- Files are broken into fixed-size 64 MB "chunks"
- Master stores metadata; 3 chunkservers store each chunk
- As with Napster, the actual data transfer is from chunkservers to the client
- Clients cache metadata, but not files!
- Pros/cons?
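In code form, a client read might look roughly like this (hypothetical master and chunkserver stubs; the real GFS RPCs differ):

```python
CHUNK_SIZE = 64 * 1024 * 1024        # fixed 64 MB chunks

def gfs_read(master, chunk_cache, filename, offset, length):
    """Ask the master for chunk metadata (cached), then read from a chunkserver."""
    chunk_index = offset // CHUNK_SIZE
    key = (filename, chunk_index)
    if key not in chunk_cache:                               # clients cache metadata...
        chunk_cache[key] = master.lookup(filename, chunk_index)
    chunk_id, replicas = chunk_cache[key]                    # ...but never the file data
    return replicas[0].read(chunk_id, offset % CHUNK_SIZE, length)
```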
The Master: Metadata and Mappings
- Controls (and locks as appropriate):
  - Mapping from file names -> chunk IDs within each namespace
  - Maintains a log (replicated to backups) of all mutations to the above
  - Controls reallocation, garbage collection of chunks
- Also knows the mapping from chunk ID -> {machines}
  - Doesn't have persistent knowledge of what's on the chunkservers
  - Instead, during startup, it polls them
  - ... or, when a chunkserver joins, it registers itself
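The master's state can be pictured roughly like this (illustrative field names, not GFS's actual data structures):

```python
master_state = {
    # Persistent, with every mutation appended to a replicated operation log:
    "namespace": {},   # file name -> ordered list of chunk IDs
    # Not persistent: rebuilt by polling chunkservers at startup / on registration:
    "locations": {},   # chunk ID -> set of chunkservers currently holding a replica
}
```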
Primary/backup (in general)
[Diagram: clients send requests X←7, X?, and X←2 to the Primary, which forwards the ordered requests X←2, X?, X←7 to the Backups]
- Scenario: Multiple replicas, concurrent requests from different clients
  - How to ensure consistency?
- Idea: Designate one replica as the primary
  - Accepts all requests, orders them, and then forwards the ordered requests to the backups
  - When should the primary report success to a client?
  - What has to happen when the primary fails?
  - What kind of consistency does this provide?
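A minimal in-process sketch of the primary/backup idea (assumed request format; a real system forwards over the network and must decide when to acknowledge the client and how to handle primary failure):

```python
def apply_request(state, request):
    """Apply a request like ('write', 'X', 7) or ('read', 'X') to one replica."""
    op, key, *args = request
    if op == "write":
        state[key] = args[0]
    return state.get(key)

class Backup:
    def __init__(self):
        self.state = {}
    def apply_ordered(self, request):
        apply_request(self.state, request)

class Primary:
    def __init__(self, backups):
        self.state, self.log, self.backups = {}, [], backups
    def handle(self, request):
        self.log.append(request)               # the primary fixes the global order
        result = apply_request(self.state, request)
        for b in self.backups:                 # forward requests in that same order
            b.apply_ordered(request)
        return result                          # when to report success is one of the questions above

p = Primary([Backup(), Backup()])
p.handle(("write", "X", 7)); p.handle(("read", "X")); p.handle(("write", "X", 2))
```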