
Spark tutorial 1 now online!
• Instructions to set up a local dev environment
• Instructions to check everything works as expected
• Give it a go this week!


Did you do your pre-session reading?

Data storage…
• Assume we have data
– Where do we store it?
– How do we retrieve it?

File system abstraction
• What do you expect of a filesystem?
– A namespace (filenames, directory names, etc.)
– A storage format
– A mapping from filenames to storage
– Read/write access paths and API implementation
• Most filesystems store data in blocks. Why?

• Why should we bother with a DFS?
– Why/when is a centralised file system not a good enough option?
• Why don't we always use a DFS?
– Why/when is a centralised file system the best option?
• What is in a DFS? What are its basic components and architecture?

Distributed file systems
• Main components:
– A distributed back-end
• Storage layer: Stores data over a distributed fabric/overlay
• Access layer: Provides basic indexing/retrieval functionality
• Both: scalability, fault tolerance, reliability, load balancing, …
• Access control? Consistency? Concurrency?
– A client-side library
• Provides a familiar/standards-compliant API
• Hides away all of the implementation details
• Communicates with the back-end through RPC (of some sort…)
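
To make the client-side library idea concrete, below is a minimal sketch using Hadoop HDFS's Java FileSystem API (the API calls are real; the cluster address and file path are hypothetical). Note that it reads just like local file I/O: block lookups, replica selection, and the RPCs themselves stay hidden behind the interface.

// Minimal HDFS client sketch: open a file and stream it to stdout.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client only names the NameNode; everything else is hidden.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000"); // hypothetical address
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/input.txt"))) { // hypothetical path
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n); // looks exactly like local file I/O
            }
            System.out.flush();
        }
    }
}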

Network file systems
• Lots of them around…
– NFS
– CIFS
– AFS
– Coda
– …
• What is wrong with these?

Intermission
• Let's take a step back…
• Suppose you have 10,000 workers (e.g., computers, processes, etc.) downloading and storing The Web on a DFS, in order to then index it and query it
• What is the most important quality/characteristic of a distributed file system that is to be used for storing and indexing The Web?
a. Throughput
b. Latency
c. Fault tolerance
d. Scalability

Cluster Topology
• Nodes in a cluster are placed in racks
• Typical topology: nodes connect to a top-of-rack switch, and rack switches connect through one or more aggregation/core switches (see image)
• Thus, nodes in same rack are "closer" than nodes in separate racks
– Bandwidth is greater
– Latency is lower
– Otherwise, packets need to go through several network hops
• How can we know where each node is located and what the distance between any two nodes is?
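
One common answer, which Hadoop itself uses: administrators supply a topology script (configured via the net.topology.script.file.name property) that maps each node to a rack path such as /dc1/rack1, and distance is then the number of hops through the resulting tree. Below is a hedged sketch of that distance computation; the rack labels and the helper are illustrative, not Hadoop's actual code.

// Sketch of tree-based network distance: same node = 0, same rack = 2,
// different racks = 4 (each extra tree level adds more hops).
public class NetworkDistance {
    static int distance(String locA, String nodeA, String locB, String nodeB) {
        if (locA.equals(locB) && nodeA.equals(nodeB)) return 0; // same node
        String[] a = locA.substring(1).split("/");
        String[] b = locB.substring(1).split("/");
        int common = 0; // depth of the closest common ancestor
        while (common < a.length && common < b.length
               && a[common].equals(b[common])) common++;
        // Hops up from each rack to the common ancestor, plus the two leaf hops.
        return (a.length - common) + (b.length - common) + 2;
    }

    public static void main(String[] args) {
        System.out.println(distance("/dc1/rack1", "dn0", "/dc1/rack1", "dn1")); // 2
        System.out.println(distance("/dc1/rack1", "dn0", "/dc1/rack2", "dn3")); // 4
    }
}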

HDFS — A bird's eye view
• Operates over a cluster of commodity (off-the-shelf — i.e., inexpensive but unreliable) nodes
• 2 main back-end entity types:
– NameNode: Single node, provides directory service and metadata storage
– DataNodes: Actual data storage
• Plus HDFS clients: client-side RPC libraries
• All communication over TCP/IP — so that packets don't go AWOL
• Designed for write-once-read-many access
• Designed and built to provide
– High throughput batch storage and processing
– High availability/fault tolerance
– Scalability/elasticity

Availability
• How can we accomplish high availability?
• Typically:
– Partitioning (in blocks) — why?
– Per-block checksums (a minimal sketch follows below)
– Replication — why?
– Replica liveness checks + repair mechanisms
– Careful placement of replicas — meaning?
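
A minimal sketch of the per-block checksum idea, assuming one CRC32 per fixed-size chunk (HDFS's default is one CRC per 512-byte chunk; the block file name here is illustrative):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class BlockChecksums {
    // Compute one CRC32 per chunk; these are stored alongside the block
    // and re-verified on read to detect silent corruption.
    static List<Long> checksums(InputStream block, int chunkSize) throws Exception {
        List<Long> sums = new ArrayList<>();
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = block.read(chunk)) > 0) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, n);
            sums.add(crc.getValue());
        }
        return sums;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get("block_0001"))) { // illustrative
            System.out.println(checksums(in, 512));
        }
    }
}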

High throughput
• How can we accomplish high write throughput?
• How can we accomplish high read throughput?
• Typically:
– Avoid touching the disk as much as possible
– If you can't avoid it, avoid random I/O as much as possible
– But how is this possible?
• In-memory caching for reads
• In-memory buffering for writes
– But then what about durability?
• Write-ahead log
• But now we're touching disk?
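
Yes, but only sequentially: appending to a write-ahead log avoids the random I/O that makes disks slow, so durability costs one sequential append per mutation rather than a seek. A minimal sketch of the pattern (the key=value record format and file names are illustrative):

import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class WalSketch {
    private final Map<String, String> memState = new HashMap<>(); // reads served from RAM
    private final FileOutputStream wal;

    WalSketch(String walPath) throws Exception {
        wal = new FileOutputStream(walPath, /*append=*/true);
    }

    void put(String key, String value) throws Exception {
        String record = key + "=" + value + "\n";
        wal.write(record.getBytes(StandardCharsets.UTF_8));
        wal.getFD().sync();       // durable: sequential append + fsync, no random I/O
        memState.put(key, value); // only then mutate the in-memory state
    }

    String get(String key) { return memState.get(key); } // never touches disk
}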

Designing a filesystem to store and index the world wide web…
• Which of the following designs would lead to the highest write and read performance?
a. One random-access writer/multiple random-access readers per file/block
b. One append-only writer/multiple random-access readers per file/block
c. One random-access writer/multiple sequential readers per file/block
d. One append-only writer/multiple sequential readers per file/block

An HDFS file
[Figure: a file with 4 blocks; 1st and 2nd replicas of each block are spread across DataNodes DN0–DN3]
• Replicate blocks in different DNs
• Store blocks in different DNs
• Checksum computed and stored per block
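
The figure shows two replicas per block; HDFS's default replication factor is three, placed by its default policy: first replica on the writer's node, second on a node in a different rack, third on a different node in the same rack as the second. A simplified sketch (the node/rack representation is illustrative):

import java.util.ArrayList;
import java.util.List;

public class PlacementSketch {
    record Node(String name, String rack) {}

    static List<Node> place(Node writer, List<Node> cluster) {
        List<Node> replicas = new ArrayList<>();
        replicas.add(writer); // 1st: writing locally is cheap
        Node second = cluster.stream()
            .filter(n -> !n.rack().equals(writer.rack()))
            .findFirst().orElseThrow(); // 2nd: different rack, survives a whole-rack failure
        replicas.add(second);
        Node third = cluster.stream()
            .filter(n -> n.rack().equals(second.rack()) && !n.equals(second))
            .findFirst().orElseThrow(); // 3rd: same rack as 2nd, saves cross-rack bandwidth
        replicas.add(third);
        return replicas;
    }
}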

HDFS architectural principles
• HDFS provides a familiar interface (UNIX-like):
– A UNIX-like FileSystem interface:
• Open/close, read/write, etc.
– Unix-like inodes
– Unix-like filenames/namespace: pathnames
• Separates file content data from metadata
• Metadata: "global" and local (per file block)
– NameNode (NN): stores 'global' metadata
– DataNodes (DN): store file blocks and per-block metadata

HDFS architectural principles
• A single NN?! But why?!?!
• Isn't it a single point of failure?!
• Isn't it a performance bottleneck?!

HDFS reads
• How can we find all blocks for a file?
• How can we find all replica locations for a block?

NameNodes
• Store 'global' metadata, such as:
– The state of the namespace
• The set of directories and their files and their hierarchical organization
– Mapping path names to unique file IDs
– Access permissions for files
– Modification times for files
– Locations of the file blocks and their replicas
• How many blocks and which DNs store them
• How many replicas per block and which DNs store them
• Collectively called the filesystem's image
– Loaded in RAM for as long as the NN is running — why?
– A snapshot is stored locally at the NN in a so-called checkpoint file — why?
– Plus a journal file — a write-ahead log (WAL) for all mutations to the filesystem, also stored on the local filesystem of the NN — why?
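
A sketch of why both files exist: on restart, the NN rebuilds its in-memory image by loading the checkpoint (a full snapshot) and then replaying the journal (every mutation logged since that snapshot). The file names and key=value record format below are illustrative, not HDFS's actual on-disk format.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class ImageRecovery {
    public static void main(String[] args) throws Exception {
        Map<String, String> image = new HashMap<>();
        // 1. Load the checkpoint: a complete snapshot of the namespace.
        for (String line : Files.readAllLines(Paths.get("checkpoint"))) {
            String[] kv = line.split("=", 2);
            image.put(kv[0], kv[1]);
        }
        // 2. Replay the journal: re-apply every mutation since the checkpoint.
        for (String line : Files.readAllLines(Paths.get("journal"))) {
            String[] kv = line.split("=", 2);
            image.put(kv[0], kv[1]);
        }
        // The image now reflects all acknowledged mutations; keep it in RAM.
        System.out.println("Recovered " + image.size() + " namespace entries");
    }
}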

• HDFS 'rides' on the native filesystem of cluster nodes
• Each HDFS file block is actually two files in native FS
– One file for the contents of the HDFS file block
– One file for the "local" metadata (per block)
• Checksums
• Size — Why?
• Each block is BIG: default = 128 MB
– Contrast to blocks in typical (local) filesystems
– Why such large blocks?
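
A back-of-the-envelope answer, with illustrative numbers: assume a 10 ms seek and a 100 MB/s transfer rate. Reading one 128 MB block costs about 1.28 s of transfer against a single 10 ms seek, i.e., under 1% seek overhead. Splitting the same 128 MB into 4 KB blocks scattered across the disk could require up to 32,768 seeks, roughly 5.5 minutes of seeking alone. Large blocks also mean far fewer blocks per file, which keeps the NN's in-memory metadata small.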

HDFS reads
• What is the best way to read a block, assuming we know the locations of all of its replicas in our (commodity hardware) datacentre?
a. Contact all replicas, return when the first responds
b. Contact a replica chosen uniformly at random so as to balance the load
c. Contact the replica that's closest to the client (network distance)
d. Contact the replica that's least loaded

HDFS reads
[Figure: HDFS architecture: clients, NameNode, Secondary NameNode, and DataNodes reporting cluster membership]
• NameNode: maps a file to a file ID and a list of DataNodes
• DataNode: maps a block ID to a physical location on disk
• SecondaryNameNode: periodic merge of the transaction log

HDFS reads
• When is it best to check the block checksums and who should do it?
a. The DN, right after the checksum is computed and stored
b. The NN, right after the checksum is computed and stored
c. The DN, periodically (every X minutes)
d. The client, when it reads a block

HDFS writes
• What is the best way for a client to create/write replicas of a block?
a. Client contacts all replica locations in parallel (broadcast or multi-unicast)
b. Client contacts all replicas one after the other
c. Client sends data to first replica, which sends data to second, etc.
d. Client sends data to first replica, which forwards it to the remaining replicas lazily (i.e., when idle)

Writing to HDFS

Writing to HDFS:
pipelining blocks to DNs
[Figure: the client streams packets to the first DN, which forwards them along the pipeline; control signals and packet transfers shown separately]
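
This is option (c) from the quiz above, and it is what HDFS does: each node in the pipeline persists a packet and forwards it downstream, so the client's uplink carries the data only once. A minimal sketch of one pipeline stage (hosts, ports, packet size, and file names are illustrative):

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class PipelineNode {
    // Receive packets on `port`, persist them locally, and forward them to
    // the next node in the pipeline (nextHost == null for the last replica).
    static void run(int port, String nextHost, int nextPort) throws Exception {
        try (ServerSocket server = new ServerSocket(port);
             Socket upstream = server.accept();
             InputStream src = upstream.getInputStream();
             FileOutputStream local = new FileOutputStream("block_replica");
             Socket next = nextHost == null ? null : new Socket(nextHost, nextPort)) {
            OutputStream fwd = (next == null) ? null : next.getOutputStream();
            byte[] packet = new byte[64 * 1024]; // HDFS streams blocks in small packets
            int n;
            while ((n = src.read(packet)) > 0) {
                local.write(packet, 0, n);                // persist locally ...
                if (fwd != null) fwd.write(packet, 0, n); // ... and forward downstream
            }
        }
    }
}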

Leases vs locks
• What is the difference?
• When is one preferable over the other?
• What does this mean for read consistency?

NN <-> DNs communication
• On startup of DN: Handshake with NN
– Ensure correct version of SW and namespaceID
• Register, after successful handshake
– Obtain unique storageID
• Unique ID of DN (independent of IP, port, …)
• Block Reports:
– List of (block_id, timestamp, length) for every block in the DN
– Think: Why does the NN need to know block length info?
• Heartbeats
– Periodic (every 3 seconds)
– Piggyback storage use and capacity, etc.
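
A sketch of the DN side of this exchange; because the DN always initiates, the NN can only piggyback commands (e.g., replicate or delete a block) on its heartbeat replies. The RPC interface below is a hypothetical stand-in, not Hadoop's actual one.

import java.util.List;

public class HeartbeatLoop {
    // Hypothetical stand-in for the NN's RPC interface.
    interface NameNodeRpc {
        List<String> heartbeat(String storageID, long usedBytes, long capacityBytes);
    }

    static void run(NameNodeRpc nn, String storageID) throws InterruptedException {
        while (true) {
            // Piggyback storage usage and capacity on every heartbeat.
            List<String> commands = nn.heartbeat(storageID, usedBytes(), capacityBytes());
            for (String cmd : commands) {
                execute(cmd); // e.g., replicate or delete a block
            }
            Thread.sleep(3_000); // default heartbeat interval: 3 seconds
        }
    }

    static long usedBytes() { return 0; }     // placeholder
    static long capacityBytes() { return 0; } // placeholder
    static void execute(String cmd) { }       // placeholder
}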

NN <-> DNs communication
• Did you notice that the NN will never initiate a connection to a DN?
– How can the NN check that a DN is alive and well?
– How can a DN check that the NN is alive and well?
– How can the NN get hold of the DNs' metadata?

Replication
• Who checks that all blocks have enough replicas? Why?
• What happens when a block has fewer replicas than requested?
• What happens when a block has more replicas than requested?
• Anything we need to be aware of/careful about?
• How is the storage load balanced across nodes?

NN fault tolerance
• Note: single NN = bottleneck + SPoF!
• What if the NN dies?
– How is its state saved/restored?
– What if the checkpoint and journal files disappear with it?

Coming up next (but not just yet)…
• Transactional data?
– E.g., "consistent" concurrent reads and concurrent writes to the same data
• Structured data?
– E.g., record-oriented views, columns
• Relational table data support?
– E.g., indexing and random access
• HDFS was not designed for these
• Enter NoSQL…
