Spark tutorial 1 now online!
• Instructions to set up a local dev environment
• Instructions to check everything works as expected
• Give it a go this week!
Did you do your pre-session reading?
Data storage
• Assume we have data
– Where do we store it?
– How do we retrieve it?
File system abstraction
• What do you expect of a filesystem?
– A namespace (filenames, directory names, etc.)
– A storage format
– A mapping from filenames to storage
– Read/write access paths and API implementation
• Most filesystems store data in blocks. Why?
• Why should we bother with a DFS?
– Why/when is a centralised file system not a good enough option?
• Why don't we always use a DFS?
– Why/when is a centralised file system the best option?
• What is in a DFS? What are its basic components and architecture?
Distributed file systems
• Main components:
– A distributed back-end
• Storage layer: Stores data over a distributed fabric/overlay
• Access layer: Provides basic indexing/retrieval functionality
• Both: scalability, fault tolerance/reliability, load balancing
• Access control? Consistency? Concurrency?
– A client-side library
• Provides a familiar/standards-compliant API
• Hides away all of the implementation details
• Communicates with the back-end through RPC of some sort
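To make the client-side library concrete, here is a minimal sketch using the third-party Python package `hdfs` (HdfsCLI) against a WebHDFS endpoint. The host, port, user, and paths are illustrative assumptions, not part of the slides; the point is that the application sees a familiar filesystem-style API while the library hides the remote calls.

```python
# Minimal sketch (assumptions: the HdfsCLI package `hdfs` is installed and a
# WebHDFS endpoint is reachable at the hypothetical URL below).
from hdfs import InsecureClient

client = InsecureClient('http://namenode.example.com:9870', user='alice')

# Familiar, filesystem-like calls; the library hides the RPC/HTTP details.
client.makedirs('/data/logs')
client.write('/data/logs/day1.txt', data=b'hello hdfs\n', overwrite=True)

with client.read('/data/logs/day1.txt') as reader:
    print(reader.read())

print(client.list('/data/logs'))
```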
Network file systems
• Lots of them around:
– NFS
– CIFS
– AFS
– Coda
– ...
• What is wrong with these?
Intermission
• Let's take a step back
• Suppose you have 10,000 workers (e.g., computers, processes, etc.) downloading and storing The Web on a DFS, in order to then index it and query it
• What is the most important quality/characteristic of a distributed file system that is to be used for storing and indexing The Web?
a. Throughput
b. Latency
c. Fault tolerance
d. Scalability
Cluster Topology
• Nodes in a cluster are placed in racks
• Typical topology: see image
• Thus, nodes in the same rack are closer than nodes in separate racks
– Bandwidth is greater
– Latency is lower
– Otherwise, packets need to go through several network hops
• How can we know where each node is located and what the distance is between any two nodes?
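One answer is a rack-awareness map kept by the master: given which rack each node sits in, distance can be derived from the number of network hops. The sketch below is a toy Python version of that idea; the node and rack names are made up, and it is not Hadoop's actual NetworkTopology implementation.

```python
# A toy sketch of rack-aware distance: 0 = same node, 2 = same rack,
# 4 = different racks, mirroring the "hops to a common ancestor and back" idea.
RACK_OF = {                      # hypothetical cluster map
    'node1': '/rack1', 'node2': '/rack1',
    'node3': '/rack2', 'node4': '/rack2',
}

def distance(a: str, b: str) -> int:
    if a == b:
        return 0                 # same node: no network hop
    if RACK_OF[a] == RACK_OF[b]:
        return 2                 # same rack: through the top-of-rack switch
    return 4                     # different racks: through the core switch

print(distance('node1', 'node2'))   # 2
print(distance('node1', 'node3'))   # 4
```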
HDFS — A bird's eye view
• Operates over a cluster of commodity (off-the-shelf — i.e., inexpensive but
unreliable) nodes
• 2 main back-end entity types:
– NameNode: Single node; provides directory service and metadata storage
– DataNodes: Actual data storage
• Plus HDFS clients: Client-side RPC libraries
• All communication over TCP/IP — so that packets don't go AWOL
• Designed for write-once-read-many access
• Designed and built to provide
– High throughput batch storage and processing
– High availability/fault tolerance
– Scalability/elasticity
Availability
• How can we accomplish high availability?
• Typically:
– Partitioning (in blocks) — why?
– Per-block checksums
– Replication — why?
– Replica liveness checks + repair mechanisms
– Careful placement of replicas — meaning?
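As a toy illustration of the first two bullets, the Python sketch below partitions a byte stream into fixed-size blocks and keeps a CRC32 checksum per block, so a corrupted replica can later be detected and repaired from another copy. The block size and payload are illustrative only.

```python
# Sketch: partition a byte stream into fixed-size blocks and keep a CRC32
# checksum per block, the same idea HDFS uses to detect corrupted replicas.
import zlib

BLOCK_SIZE = 4 * 1024 * 1024     # 4 MiB for the toy example (HDFS default is 128 MiB)

def blocks_with_checksums(data: bytes):
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        yield offset // BLOCK_SIZE, block, zlib.crc32(block)

payload = b'x' * (10 * 1024 * 1024)          # pretend 10 MiB file
for block_id, block, checksum in blocks_with_checksums(payload):
    print(block_id, len(block), hex(checksum))
```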
High throughput
• How can we accomplish high write throughput?
• How can we accomplish high read throughput?
• Typically:
– Avoid touching the disk as much as possible
– If you can't avoid it, avoid random I/O as much as possible
– But how is this possible?
• In-memory caching for reads
• In-memory buffering for writes
– But then what about durability?
• Write-ahead log
• But now we're touching the disk?
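The sketch below ties these ideas together: every write is appended sequentially to a write-ahead log for durability before being applied to an in-memory map, while reads never touch the disk. It is a generic illustration of the pattern, not HDFS code; the class and file names are invented.

```python
# Sketch of the buffering-plus-WAL idea: every mutation is appended (sequential
# I/O only) to a log on disk before it is acknowledged, while reads are served
# from an in-memory map, so random disk I/O is avoided entirely.
import json, os

class BufferedStore:
    def __init__(self, wal_path='store.wal'):
        self.mem = {}                                  # in-memory state
        self.wal = open(wal_path, 'a')                 # append-only log

    def put(self, key, value):
        record = json.dumps({'k': key, 'v': value})
        self.wal.write(record + '\n')                  # sequential append
        self.wal.flush()
        os.fsync(self.wal.fileno())                    # durable before we ack
        self.mem[key] = value                          # then update memory

    def get(self, key):
        return self.mem.get(key)                       # served from RAM

store = BufferedStore()
store.put('block42', 'DN1,DN3')
print(store.get('block42'))
```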
Designing a filesystem to store and index the world wide web
• Which of the following designs would lead to the highest write and read performance?
a. One random-access writer/multiple random-access readers per file/block
b. One append-only writer/multiple random-access readers per file/block
c. One random-access writer/multiple sequential readers per file/block
d. One append-only writer/multiple sequential readers per file/block
An HDFS file
[Figure: a file with 4 blocks; each block has a 1st and a 2nd replica, spread across DataNodes DN0–DN3]
• Store blocks in different DNs
• Replicate each block on different DNs
• Checksum computed and stored per block
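A toy Python sketch of the placement idea in the figure: each block of a file is assigned a set of distinct DataNodes, so blocks are spread across the cluster and a block's replicas never share a node. The real HDFS policy is also rack-aware; the DN names and round-robin choice here are illustrative.

```python
# Toy placement sketch: each block gets `replication` distinct DataNodes,
# spreading blocks and their replicas across the cluster.
import itertools

DATANODES = ['DN0', 'DN1', 'DN2', 'DN3']        # hypothetical cluster

def place(num_blocks: int, replication: int = 2):
    ring = itertools.cycle(DATANODES)
    placement = {}
    for block in range(1, num_blocks + 1):
        chosen = []
        while len(chosen) < replication:
            dn = next(ring)
            if dn not in chosen:                 # replicas on distinct nodes
                chosen.append(dn)
        placement[f'Block {block}'] = chosen
    return placement

for block, dns in place(num_blocks=4).items():
    print(block, '->', dns)
```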
HDFS architectural principles
• HDFS provides a familiar interface (UNIX-like):
– A UNIX-like FileSystem interface:
• Open/close, read/write, etc.
– Unix-like inodes
– Unix-like filenames/namespace: pathnames
• Separates file content data from metadata
• Metadata is both global and local (per file block):
– NameNode (NN): stores global metadata
– DataNodes (DN): store file blocks and per-block metadata
HDFS architectural principles
• A single NN?! But why?!?!
• Isn't it a single point of failure?
• Isn't it a performance bottleneck?
HDFS reads
• How can we find all blocks for a file?
• How can we find all replica locations for a block?
NameNodes
• Store global metadata such as:
– The state of the namespace
• The set of directories and their files and their hierarchical organization
– Mapping path names to unique file IDs
– Access permissions for files
– Modification times for files
– Locations of the file blocks and their replicas
• How many blocks and which DNs store them
• How many replicas per block and which DNs store them
• Collectively called the filesystem image
– Loaded in RAM for as long as the NN is running — why?
– A snapshot is stored locally at the NN in a so called checkpoint file — why?
– Plus a journal file — a write-ahead log (WAL) for all mutations to the filesystem, also stored on the local filesystem of the NN — why?
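The sketch below shows why both files exist: on restart, the NN loads the last checkpoint and then replays the journal to recover any namespace mutations made since. The record format and file names are invented for illustration; real HDFS uses binary fsimage and edits files.

```python
# Sketch: recover the in-RAM namespace from a checkpoint plus a journal (WAL).
import json

# Write a tiny fake checkpoint and journal so the sketch runs end-to-end.
with open('fsimage.json', 'w') as f:
    json.dump({'/data/a.txt': {'blocks': ['blk_1']}}, f)
with open('edits.log', 'w') as f:
    f.write(json.dumps({'type': 'create', 'path': '/data/b.txt',
                        'meta': {'blocks': ['blk_2']}}) + '\n')
    f.write(json.dumps({'type': 'delete', 'path': '/data/a.txt'}) + '\n')

def recover(checkpoint_path='fsimage.json', journal_path='edits.log'):
    with open(checkpoint_path) as f:
        namespace = json.load(f)                 # last snapshot of the namespace
    with open(journal_path) as f:
        for line in f:                           # replay logged mutations in order
            op = json.loads(line)
            if op['type'] == 'create':
                namespace[op['path']] = op['meta']
            elif op['type'] == 'delete':
                namespace.pop(op['path'], None)
    return namespace                             # this is what the NN keeps in RAM

print(recover())    # {'/data/b.txt': {'blocks': ['blk_2']}}
```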
• HDFS rides on the native filesystem of cluster nodes
• Each HDFS file block is actually two files in native FS
– One file for the contents of the HDFS file block
– One file for the local metadata (per block):
• Checksums
• Size — Why?
• Each block is BIG: default = 128 MB
– Contrast to blocks in typical (local) filesystems
– Why such large blocks?
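A quick back-of-the-envelope calculation of why large blocks matter: the fewer blocks per file, the fewer metadata entries the NN must hold in RAM and the fewer seeks a sequential read needs. The file size below is arbitrary.

```python
# Compare how many blocks the NN would have to track for one large file
# under a typical local-FS block size vs the HDFS default.
import math

file_size = 10 * 1024**3                     # a 10 GiB file
for block_size in (4 * 1024, 128 * 1024**2): # 4 KiB (local FS) vs 128 MiB (HDFS)
    n_blocks = math.ceil(file_size / block_size)
    print(f'block size {block_size:>12} B -> {n_blocks:>10,} blocks to track')
```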
HDFS reads
• What is the best way to read a block, assuming we know the locations of all of its replicas in our (commodity hardware) datacentre?
a. Contact all replicas, return when the first responds
b. Contact a replica chosen uniformly at random so as to balance the load
c. Contact the replica that is closest to the client (network distance)
d. Contact the replica that is least loaded
HDFS reads
[Architecture diagram: client, NameNode, Secondary NameNode, and DataNodes reporting cluster membership]
NameNode: Maps a file to a file-id and a list of DataNodes
DataNode: Maps a block-id to a physical location on disk
SecondaryNameNode: Periodic merge of the transaction log
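The two mappings named in the diagram caption can be pictured as plain dictionaries, as in the sketch below. Every value is illustrative; the real structures also carry permissions, timestamps, and other per-file and per-block metadata.

```python
# Sketch of the NN-side and DN-side mappings as plain dictionaries.
namenode = {
    '/data/crawl.txt': {
        'file_id': 17,
        'blocks': [('blk_1', ['DN0', 'DN1']),    # (block-id, replica DNs)
                   ('blk_2', ['DN1', 'DN0'])],
    },
}

datanode_dn0 = {
    'blk_1': '/disk1/current/blk_1',             # block-id -> local file path
    'blk_2': '/disk2/current/blk_2',
}

# A read of /data/crawl.txt asks the NN for the block list, then fetches each
# block from one of its replica DNs.
for block_id, replicas in namenode['/data/crawl.txt']['blocks']:
    print(block_id, 'available at', replicas)
```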
HDFS reads
• When is it best to check the block checksums and who should do it?
a. The DN, right after the checksum is computed and stored
b. The NN, right after the checksum is computed and stored
c. The DN, periodically (every X minutes)
d. The client, when it reads a block
HDFS writes
• What is the best way for a client to create/write replicas of a block?
a. Client contacts all replica locations in parallel (broadcast or multi-unicast)
b. Client contacts all replicas one after the other
c. Client sends data to first replica, which sends data to second, etc.
d. Client sends data to first replica, then forwards to remaining replicas lazily (i.e., when idle)
Writing to HDFS: pipelining blocks to DNs
[Figure: the client streams data packets along a pipeline of DNs; packet transfers flow forward, control signals (acks) flow back]
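A toy sketch of the pipeline: the client hands the block to the first DN only, each DN forwards what it receives to the next DN in the pipeline, and acknowledgements flow back upstream. Real HDFS streams the block as a sequence of packets; here a single buffer stands in for that.

```python
# Toy sketch of pipelined replication: the client streams a block to the first
# DataNode, each DataNode forwards it downstream, and acks flow back upstream.
def pipeline_write(block: bytes, pipeline: list) -> bool:
    if not pipeline:
        return True                              # nothing left downstream: ack
    head, *rest = pipeline
    print(f'{head} stores {len(block)} bytes')   # local write at this DN
    downstream_ok = pipeline_write(block, rest)  # forward to the next DN
    print(f'{head} acks upstream: {downstream_ok}')
    return downstream_ok

pipeline_write(b'x' * 1024, ['DN0', 'DN3', 'DN2'])
```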
Leases vs locks
• What is the difference?
• When is one preferable over the other?
• What does this mean for read consistency?
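A minimal sketch of the lease idea: the holder gets exclusive write access for a bounded period and must keep renewing it, so if the writer crashes the lease simply expires, whereas a lock would need an explicit release. The durations below are illustrative.

```python
# Sketch of a lease: valid only until it expires, renewable by a live holder,
# and self-cleaning if the holder disappears (unlike a lock).
import time

class Lease:
    def __init__(self, holder: str, duration_s: float):
        self.holder = holder
        self.expires_at = time.time() + duration_s

    def is_valid(self) -> bool:
        return time.time() < self.expires_at         # no release call needed

    def renew(self, duration_s: float) -> None:
        self.expires_at = time.time() + duration_s   # holder must keep renewing

lease = Lease('client-42', duration_s=0.1)
print(lease.is_valid())        # True while the writer is alive and renewing
time.sleep(0.2)
print(lease.is_valid())        # False: the writer vanished, writes can proceed again
```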
NN <> DNs communication
• On startup of DN: Handshake with NN
– Ensure correct version of SW and namespaceID
• Register, after successful handshake
– Obtain unique storageID
• Unique ID of the DN, independent of IP or port
• Block Reports:
– List of (block_id, timestamp, length) for every block in the DN
– Think: Why does the NN need to know block length info?
• Heartbeats
– Periodic (every 3 seconds)
– Piggyback storage use and capacity, etc.
NN <> DNs communication
• Did you notice that the NN will never initiate a connection to a DN?
– How can the NN check that a DN is alive and well?
– How can a DN check that the NN is alive and well?
– How can the NN get hold of the DNs' metadata?
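One common answer, sketched below: the DN sends periodic heartbeats, and the NN piggybacks any pending commands (e.g., replicate or delete a block) on the heartbeat response, so the NN never has to open a connection itself. The message shapes and IDs are invented for illustration.

```python
# Sketch of the "NN never calls the DN" pattern: the DN heartbeats periodically,
# and the NN returns any queued work in the heartbeat *response*.
PENDING_COMMANDS = {                       # queued at the NN, keyed by storageID
    'DN-storage-007': [{'cmd': 'replicate', 'block': 'blk_9', 'target': 'DN2'}],
}

def handle_heartbeat(storage_id: str, used: int, capacity: int) -> list:
    # NN side: record liveness and storage usage, then hand back queued commands.
    print(f'{storage_id} alive, {used}/{capacity} bytes used')
    return PENDING_COMMANDS.pop(storage_id, [])

# DN side: send a heartbeat (every 3 seconds in HDFS) and execute what comes back.
commands = handle_heartbeat('DN-storage-007', used=10**9, capacity=10**12)
for command in commands:
    print('DN executes:', command)
```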
Replication
• Who checks that all blocks have enough replicas? Why?
• What happens when a block has fewer replicas than requested?
• What happens when a block has more replicas than requested?
• Anything we need to be aware of/careful about?
• How is the storage load balanced across nodes?
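A sketch of the NN-side replication monitor implied by these questions: compare each block's live replica count against its target, schedule re-replication when it is short, and ask a DN to drop a surplus copy when it is over. The block map and target factor below are illustrative.

```python
# Sketch of the replication check the NN performs over its block map.
TARGET_REPLICATION = 3
block_replicas = {
    'blk_1': ['DN0', 'DN1', 'DN2'],        # healthy
    'blk_2': ['DN1'],                      # under-replicated (DN failures)
    'blk_3': ['DN0', 'DN1', 'DN2', 'DN3'], # over-replicated (a node rejoined)
}

for block, replicas in block_replicas.items():
    if len(replicas) < TARGET_REPLICATION:
        print(f'{block}: schedule {TARGET_REPLICATION - len(replicas)} new replica(s)')
    elif len(replicas) > TARGET_REPLICATION:
        print(f'{block}: ask one DN to delete its surplus copy')
    else:
        print(f'{block}: OK')
```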
NN fault tolerance
• Note: single NN = bottleneck + SPoF!
• What if the NN dies?
– How is its state saved/restored?
– What if the checkpoint and journal files disappear with it?
Coming next (but not just yet)
• Transactional data?
– E.g., consistent concurrent reads and concurrent writes to the same data
• Structured data?
– E.g., record-oriented views, columns
• Relational table data support?
– E.g., indexing and random access
• HDFS was not designed for these
• Enter NoSQL