Issues in Replication and Resiliency
Speculative Execution In Distributed File Systems and
External Synchrony
Nightingale et al.
Slides based on the SOSP and OSDI presentations
Issues in Maintaining Consistency across Replicas in Distributed File Systems for Writes
Aspects to be Kept in Mind: Consistency, Availability, Partition Tolerance
Failure Resiliency in Distributed File Systems
Think of a multi-tier application that needs to protect the data it writes.
Writes are performed by Applications on the Local (Primary) node
Writes are to be replicated to a Remote Replica (secondary)
Need to ensure remote replica is consistent with the primary, so that switchover from primary to secondary on failure is correct.
Impact on write performance is of concern in addition to C, A, P.
Synchronous abstractions:
strong reliability guarantees
but are slow
Asynchronous counterparts:
relax reliability guarantees
reasonable (better) performance
[Figure: timelines showing Processing interleaved with Replicate Write steps]
But performance very sensitive to latency!
Remote writes can be batched and pipelined
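A rough back-of-the-envelope model of why synchronous replication is so sensitive to latency, and why batching and pipelining help. The function names and the example numbers (1 ms local write, 50 ms round trip, batches of 10) are illustrative assumptions, not figures from the slides, and the model ignores bandwidth limits and failures.

```python
def sync_total_ms(n_writes, local_ms, rtt_ms):
    """Each write waits for its remote acknowledgement before the next one starts."""
    return n_writes * (local_ms + rtt_ms)


def batched_total_ms(n_writes, local_ms, rtt_ms, batch_size):
    """Writes are applied locally, then shipped in batches: one round trip per batch."""
    n_batches = -(-n_writes // batch_size)  # ceiling division
    return n_writes * local_ms + n_batches * rtt_ms


def pipelined_total_ms(n_writes, local_ms, rtt_ms):
    """Remote writes overlap with local work: only the final acknowledgement is waited for."""
    return n_writes * local_ms + rtt_ms


if __name__ == "__main__":
    # Hypothetical workload: 100 writes, 1 ms each locally, 50 ms round trip.
    print(sync_total_ms(100, 1, 50))
    print(batched_total_ms(100, 1, 50, batch_size=10))
    print(pipelined_total_ms(100, 1, 50))
```

In this simplified model the round-trip time is paid once per write in the synchronous case, once per batch when batching, and once overall when fully pipelined.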
(A)Synchronous Replication
[Figure: local writes under synchronous and asynchronous replication]
• Synchronous replication
– No data loss
– No unsafe replies
• Asynchronous replication
– Typically some data loss
– Client view not consistent with replica
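The trade-off between the two replication modes can be sketched with a toy in-memory model. The class `ReplicatedStore` and its methods are hypothetical names introduced here for illustration. The sketch assumes a single primary, a single secondary, and a failure that discards every write not yet applied at the secondary.

```python
class ReplicatedStore:
    """Toy primary/secondary pair (hypothetical model, not a real system)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.in_flight = []  # sent to the replica but not yet applied there
        self.acked = []      # writes the client has been told are done

    def write(self, value):
        self.primary.append(value)
        if self.synchronous:
            # Reply only after the replica has the write.
            self.secondary.append(value)
        else:
            # Reply immediately; the replica catches up later.
            self.in_flight.append(value)
        self.acked.append(value)

    def drain(self):
        """Background replication catches up with all in-flight writes."""
        self.secondary.extend(self.in_flight)
        self.in_flight.clear()

    def fail_over(self):
        """Primary fails. Return acknowledged writes missing from the secondary."""
        self.in_flight.clear()
        return [v for v in self.acked if v not in self.secondary]


if __name__ == "__main__":
    sync_store = ReplicatedStore(synchronous=True)
    sync_store.write("a")
    sync_store.write("b")
    print(sync_store.fail_over())   # nothing acknowledged is missing

    async_store = ReplicatedStore(synchronous=False)
    async_store.write("a")
    async_store.drain()
    async_store.write("b")          # acknowledged, still in flight
    print(async_store.fail_over())  # "b" was acknowledged but is lost
```

A write that was acknowledged but lost on failover is an "unsafe reply": the client's view is no longer consistent with the replica.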
External Synchrony
• provide the reliability and simplicity of a synchronous abstraction
• approximate the performance of an asynchronous abstraction.
Rethink the Sync
Nightingale, Chen, et al. OSDI 2006
Speculative Execution in a Distributed File System
Nightingale, Chen, et al. ACM Transactions on Computer Systems 2006
External Synchrony
• Question
– How to improve both durability and performance for a local file system?
• Two extremes
– Synchronous I/O
• Easy to use
• Guarantees ordering
– Asynchronous I/O
• Good performance
When a sync() is really async
• On sync() data written only to volatile cache
– 10x performance penalty and data NOT safe
[Figure: Operating System and the drive's Volatile Cache]
• 100x slower than asynchronous I/O if we disable cache
From Nightingale’s presentation
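For context, this is roughly what an application does today when it wants a write to be durable. It is a minimal sketch using Python's standard `os.fsync`. The helper names `durable_write` and `demo` are made up for this example. As the slide warns, a successful return may still mean the data sits only in the drive's volatile cache, depending on the device and file system configuration.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes and ask the OS to push them to the storage device."""
    with open(path, "wb") as f:
        f.write(data)         # lands in Python's user-space buffer
        f.flush()             # moves it into the OS page cache
        os.fsync(f.fileno())  # asks the OS to write it out to the device
        # Assumption: whether the bytes are on stable media at this point
        # depends on the drive's volatile cache and file system settings.


def demo():
    """Write a small file durably in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        durable_write(path, b"work done")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```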
To whom are guarantees provided?
• Synchronous I/O definition:
– Caller blocked until operation completes
• Guarantee provided to application
• Guarantee really needs to be provided to the user
Example: Synchronous I/O
write(buf_1);          // application blocks
write(buf_2);          // application blocks
print("work done");
foo();
%work done
%
Observing synchronous I/O
write(buf_1);
write(buf_2);          // depends on 1st write
print("work done");    // depends on 1st & 2nd write
foo();
• Sync I/O externalizes output based on causal ordering
– Enforces causal ordering by blocking an application
• External sync: Same causal ordering without blocking applications
Example: External synchrony
write(buf_1);
write(buf_2);
print("work done");
foo();
%work done
%
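The idea of preserving causal ordering without blocking can be sketched as a toy model in which writes return immediately and any output is held back until every write it depends on is durable. The class `ExternallySyncProcess` and its methods are hypothetical names for illustration. This is not how xsyncfs is implemented, only a minimal single-process approximation of the behavior described above.

```python
class ExternallySyncProcess:
    """Toy model: output is buffered until the writes it depends on commit."""

    def __init__(self):
        self.next_seq = 0        # number of writes issued so far
        self.committed_upto = 0  # writes 1..committed_upto are durable
        self.buffered = []       # (writes_needed, text) output held back
        self.screen = []         # what the user has actually seen

    def write(self, buf):
        """Issue a write and return immediately. It is not yet durable."""
        self.next_seq += 1

    def output(self, text):
        """External output causally depends on every write issued so far."""
        self.buffered.append((self.next_seq, text))
        self._release()

    def commit(self, upto=None):
        """The file system reports writes durable up to `upto` (default: all)."""
        self.committed_upto = self.next_seq if upto is None else upto
        self._release()

    def _release(self):
        # Release output in order, but only once its dependencies are durable.
        while self.buffered and self.buffered[0][0] <= self.committed_upto:
            self.screen.append(self.buffered.pop(0)[1])


if __name__ == "__main__":
    p = ExternallySyncProcess()
    p.write(b"buf_1")
    p.write(b"buf_2")
    p.output("work done")  # the process keeps running; the user sees nothing yet
    print(p.screen)
    p.commit()             # both writes durable, so the output is released
    print(p.screen)
```

The process never blocks, yet the user cannot observe "work done" before both writes are durable, which is the same guarantee synchronous I/O gives an external observer.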
External Synchrony Design Overview
• Synchrony defined by externally observable behavior.
– I/O is externally synchronous if output cannot be distinguished from output that could be produced from synchronous I/O.
– File system does all the same processing as for synchronous I/O.
• Two optimizations made to improve performance.
– Group committing is used (commits are atomic).
– External output is buffered and processes continue execution.
• Output guaranteed to be committed every 5 seconds.
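A minimal sketch of group commit with a bounded delay, under stated assumptions: time is passed in explicitly as a number of seconds, a commit is modeled as moving a list of pending writes into a list of groups, and the class `GroupCommitter` is a hypothetical name. Committing early when external output is waiting is included as an assumption about how buffered output gets released promptly.

```python
class GroupCommitter:
    """Toy group commit: many writes, few atomic commits, bounded delay."""

    MAX_DELAY = 5.0  # seconds, matching the 5-second bound stated above

    def __init__(self):
        self.pending = []  # writes waiting for the next commit
        self.oldest = None # arrival time of the oldest pending write
        self.commits = []  # each entry is one atomic group of writes

    def write(self, data, now):
        if not self.pending:
            self.oldest = now
        self.pending.append(data)

    def tick(self, now):
        """Timer: commit once the oldest pending write has waited MAX_DELAY."""
        if self.pending and now - self.oldest >= self.MAX_DELAY:
            self.commit()

    def external_output(self, now):
        """Assumed policy: waiting output triggers an immediate commit."""
        if self.pending:
            self.commit()

    def commit(self):
        # All pending writes become durable together as a single group.
        self.commits.append(list(self.pending))
        self.pending.clear()
        self.oldest = None


if __name__ == "__main__":
    g = GroupCommitter()
    g.write("a", now=0.0)
    g.write("b", now=1.0)
    g.write("c", now=2.0)
    g.tick(now=5.0)
    print(g.commits)  # three writes committed as one group
```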
External Synchrony Implementation
• Xsyncfs leverages Speculator infrastructure for output buffering and dependency tracking for uncommitted state.
• Speculator tracks commit dependencies between processes and uncommitted file system transactions.
• ext3 operates in journaled mode.
(ext3, in mainline Linux since 2001, is a journaled file system: it records changes not yet committed to the file system in a journal, a circular log. After a system crash or power failure, such file systems can be brought back online more quickly and with less risk of data corruption.)
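The journaling idea can be illustrated with a toy write-ahead log. This is a sketch of the general technique only: the record layout, the class `Journal`, and its methods are assumptions made for this example, not ext3's on-disk format, and the log here is an unbounded list and not a circular one.

```python
class Journal:
    """Toy write-ahead journal: replay applies only committed transactions."""

    def __init__(self):
        # Records are ("data", txid, key, value) or ("commit", txid).
        self.log = []
        self.disk = {}  # stands in for the file system's main state

    def log_write(self, txid, key, value):
        """Record an intended change in the journal before touching the disk."""
        self.log.append(("data", txid, key, value))

    def commit(self, txid):
        """A commit record makes every change in the transaction count."""
        self.log.append(("commit", txid))

    def replay(self):
        """After a crash, apply changes from committed transactions only."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "data" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value
        return self.disk


if __name__ == "__main__":
    j = Journal()
    j.log_write(1, "a", 1)
    j.commit(1)
    j.log_write(2, "b", 2)  # crash happens before transaction 2 commits
    print(j.replay())       # only transaction 1 is applied
```

Recovery only has to scan the journal and does not need to check the whole file system, which is why such file systems come back online quickly.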
Evaluation
• Durability
• Performance
– IO intensive application (Postmark)
– Application that synchronizes explicitly (MySQL)
– Network intensive, Read-heavy application
– Output-trigger commit on delay
Postmark benchmark
[Figure: Time (Seconds) for ext3-async, xsyncfs, ext3-sync, and ext3-barrier]
• Xsyncfs within 7% of ext3 mounted asynchronously
Specweb99 throughput
[Figure: Throughput (Kb/s) for ext3-async, xsyncfs, ext3-sync, and ext3-barrier]
• Xsyncfs within 8% of ext3 mounted asynchronously
The MySQL benchmark
[Figure: Transactions Per Minute versus Number of db clients (0 to 20), including ext3-barrier]
• Xsyncfs can group commit from a single client
Specweb99 latency
Request size    ext3-async        xsyncfs
[...]           0.064 seconds     0.097 seconds
[...]           0.150 seconds     0.180 seconds
[...]           1.084 seconds     1.094 seconds
100-1000 KB     10.253 seconds    10.072 seconds
• Xsyncfs adds no more than 33 ms of delay
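The 33 ms figure can be checked against the latency values listed above. This small sketch assumes the values pair up, in the order they appear, as (ext3-async, xsyncfs) for each request-size row. That pairing is an inference from the ordering, and the names `ROWS_MS` and `added_delay_ms` are introduced only for this example.

```python
# (ext3-async, xsyncfs) latency in milliseconds, one pair per row,
# in the order the values appear. The pairing is an assumption.
ROWS_MS = [(64, 97), (150, 180), (1084, 1094), (10253, 10072)]


def added_delay_ms(rows):
    """Extra latency of xsyncfs over ext3-async for each row, in ms."""
    return [xsync - async_ for async_, xsync in rows]


def max_added_delay_ms(rows):
    """Largest extra latency across all rows, in ms."""
    return max(added_delay_ms(rows))


if __name__ == "__main__":
    print(added_delay_ms(ROWS_MS))
    print(max_added_delay_ms(ROWS_MS))
```

Under that pairing the largest difference is 33 ms, in the first row, which agrees with the stated bound.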