
PERSISTENCE: Distributed File Systems (NFS + AFS)
Andrea Arpaci-Dusseau CS 537, Fall 2019

ADMINISTRIVIA
Project 7: xv6 File Systems: Improvements + Checker
Specification Quiz – Worth Project Points
7a due yesterday, 7b due Today
Can still request project partner if needed…
Final Exam
Friday, December 13th 7:25-9:25 pm
Two Rooms: Last name A-H in SOC SCI 5206, I-Z in Humanities 3650
Slightly cumulative (some T/F from Virtualization and Concurrency – 25%)
Exam Review
Next Tuesday: You ask questions to cover by Monday at 5:00pm
Next Wednesday: discussion sections

AGENDA / LEARNING OUTCOMES
What is the NFS stateless protocol?
What are idempotent operations and why are they useful?
What state is tracked on NFS clients?
What is the AFS protocol?
Why is AFS more scalable, and why is its consistency model more intuitive?

What is a Distributed System?
A distributed system is one where a machine I’ve never heard of can cause my program to fail.
— Leslie Lamport
Definition:
More than 1 machine working together to solve a problem
Examples:
– client/server: web server and web client
– cluster: page rank computation, running massively parallel map-reduce

Why Go Distributed?
More computing power
– throughput
– latency
More storage capacity
Fault tolerance
Data sharing

New Challenges
System failure: need to worry about partial failure
Communication failure: network links unreliable – bit errors
– packet loss
– link failure
Individual nodes (machines) crash and recover
– Some of our focus today

Distributed File Systems
Local FS (FFS, ext3/4, LFS):
Processes on same machine access shared files
Network FS (NFS, AFS):
Processes on different machines access shared files in the same way
Many clients with single server…

Goals for distributed file systems
Fast + simple crash recovery
– Both clients and file server may crash
Transparent access
– Can’t tell accesses are over the network
– Normal UNIX semantics
Reasonable performance
– Scale with number of clients?

NFS: Network File System
Think of NFS as more of a protocol than a particular file system
Many companies have implemented NFS since 1980s: Oracle/Sun, NetApp, EMC, IBM
We’re looking at NFSv2
– NFSv4 has many changes
Why look at an older protocol?
– Simpler, focused goals (simplified crash recovery, stateless)
– To compare and contrast NFS with AFS

NFS Overview: Architecture, Network API, Caching

NFS Architecture
[Diagram: many clients, each with its own cache, connect over RPC to a single file server backed by a local FS]
RPC: Remote Procedure Call
Clients cache individual blocks of NFS files

Client FS
[Diagram: client namespace tree – / contains backups (bak1, bak2, bak3), etc, bin, and home; home/tyler contains .bashrc, 537, p1, p2]
Mount: attach a device or an FS protocol at a point in the namespace
/dev/sda1 on /
/dev/sdb1 on /backups
NFS on /home/tyler

General Strategy: Export FS
[Diagram: same namespace tree on the client; the client runs both a local FS and NFS, the server runs its own local FS]
/dev/sda1 on /
/dev/sdb1 on /backups
NFS on /home/tyler
Where will a read to /backups/bak1 go?
(To the client’s local FS on /dev/sdb1 – the server is not involved)

General Strategy: Export FS
[Diagram: same setup as above]
/dev/sda1 on /
/dev/sdb1 on /backups
NFS on /home/tyler
Where will a read to /home/tyler/.bashrc go?
(Over NFS to the server’s local FS, since /home/tyler is the NFS mount)

NFS Overview
Architecture
Network API: How do clients communicate with the NFS server?
Caching

API Strategy 1
Attempt: Wrap regular UNIX system calls using RPC (Remote Procedure Call)
– open() on client calls open() on server
– open() on server returns fd back to client
– read(fd) on client calls read(fd) on server
– read(fd) on server returns data back to client
[Diagram: client (local FS + NFS) sends the read over RPC to the server (local FS)]

File Descriptors
[Diagram: with this strategy the server keeps an in-memory table of per-client fds; the client’s open() returns fd 2, and a later read(2) refers to that server-side table entry]
Remember: What is the fd tracking?

Strategy 1 Problems
What about server crashes (and reboots)?
int fd = open("foo", O_RDONLY);
read(fd, buf, MAX);
read(fd, buf, MAX);
(Server crash!)
read(fd, buf, MAX);
Goal: behave like a slow read
[Diagram: the server’s in-memory fd table is lost in the crash, so the client’s next read(2) refers to state the server no longer has]

Potential Solutions
1. Run some crash recovery protocol when server reboots
– Complex
2. Persist fds on server disk
– Slow for disks
– How long to keep fds? What if a client crashes? misbehaves?
[Diagram: client read(2) against the server’s fd table]

API Strategy 2: put all info in requests
Every request from client completely describes desired operation
Use “stateless” protocol!
– server maintains no state about clients
– server can still keep other state just as hints (cached copies)
– can crash and reboot with no correctness problems (just performance)
– Main idea of NFSv2

Eliminate File Descriptors
[Diagram: client (local FS + NFS) and server (local FS), with no per-client fd table on the server]

Strategy 2: put all info in requests
Use “stateless” protocol!
– server maintains no state about clients
Need API change; get rid of fds. One possibility:
pread(char *path, buf, size, offset);
pwrite(char *path, buf, size, offset);
Specify path and offset in each message
Server need not remember anything from clients
Pros? Cons?
+ Server can crash and reboot transparently to clients
– Too many path lookups
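As a concrete illustration of “every request completely describes the desired operation,” here is a minimal sketch (in C; field names and sizes are illustrative, not the actual NFS wire format) of a fully self-describing read request:

#include <stdint.h>

/* Hypothetical self-describing read request: the server needs nothing
 * beyond this message, so it keeps no per-client state between calls. */
struct stateless_read_req {
    char     path[256];   /* full path, re-resolved by the server on every request */
    uint64_t offset;      /* where in the file to start reading */
    uint32_t count;       /* how many bytes to read */
};

The cost is visible in the struct itself: every request carries a path, so the server does a full path lookup each time.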

API Strategy 3: inode requests
inode = open(char *path);
pread(inode, buf, size, offset);
pwrite(inode, buf, size, offset);
With some new interfaces on the server, this is pretty good!
Any correctness problems?
If file is deleted, the inode could be reused
– Inode not guaranteed to be unique over time

API Strategy 4: file handles
fh = open(char *path);
pread(fh, buf, size, offset);
pwrite(fh, buf, size, offset);
File Handle = Opaque to client (client should not interpret internals)
One of the fields in an inode is generation #,
incremented each time inode is allocated to new file/directory
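The client just stores and echoes these bytes; only the server interprets them. A rough sketch of what the server might pack inside the handle (field names are illustrative, not the real NFSv2 layout):

#include <stdint.h>

/* Hypothetical NFS-style file handle: opaque to clients, meaningful to the server. */
struct file_handle {
    uint32_t volume_id;   /* which exported file system */
    uint32_t inode_num;   /* which inode within that file system */
    uint32_t generation;  /* copied from the inode; bumped each time the inode is
                             reused, so handles to deleted files become stale */
};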

Can NFS Protocol include Append?
fh = open(char *path);
pread(fh, buf, size, offset);
pwrite(fh, buf, size, offset);
append(fh, buf, size);
Problem with append()?
RPC often has “at-least-once” semantics (may call procedure on server multiple times)
(implementing “exactly once” requires state on server, which we are trying to avoid)
If the RPC library replays messages, what happens when append() is retried on the server?
Could wrongly append() multiple times if the server crashes and reboots

Idempotent Operations
Solution:
Design the API so there is no harm if a function executes more than once
If f() is idempotent, then:
f() has the same effect as f(); f(); … f(); f()
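A small local illustration of the difference, using POSIX pwrite() and an append-mode write as stand-ins for the remote operations (file names are made up): repeating the pwrite leaves the file unchanged, while repeating the append keeps growing it.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *msg = "BB";

    /* Idempotent: writing "BB" at offset 1 twice leaves the same bytes. */
    int fd = open("demo.txt", O_RDWR | O_CREAT, 0644);
    write(fd, "AAAA", 4);
    pwrite(fd, msg, strlen(msg), 1);   /* AAAA -> ABBA */
    pwrite(fd, msg, strlen(msg), 1);   /* still ABBA   */
    close(fd);

    /* Not idempotent: each append adds more bytes. */
    int afd = open("demo_append.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    write(afd, "A", 1);
    write(afd, "B", 1);                /* AB  */
    write(afd, "B", 1);                /* ABB -- a retry changes the result */
    close(afd);
    return 0;
}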

pwrite is idempotent
[Diagram: starting from file contents AAAA, one pwrite yields ABBA; repeating the same pwrite leaves the file as ABBA every time]

append is NOT idempotent
[Diagram: starting from file contents A, each repeated append adds another byte: A → AB → ABB → ABBB]

What operations are Idempotent?
Idempotent
– any sort of read that doesn’t change anything
– pwrite
Not idempotent
– append
What about these?
– mkdir
– creat

API Strategy 4: file handles
Do not include append() in protocol
fh = open(char *path);
pread(fh, buf, size, offset);
pwrite(fh, buf, size, offset);
(append(fh, buf, size) is removed from the protocol)
Can applications still call append()?

Final API Strategy 5: client logic
Build normal UNIX API on client side on top of idempotent, RPC-based API
Clients maintain their own file descriptors
Client open() creates a local fd object
Local fd object contains:
– file handle (returned by server)
– current offset (maintained by client)
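A rough sketch of that client-side bookkeeping (structure and function names are invented, and the RPC is stubbed out): the fd object pairs the server’s file handle with a client-maintained offset, and read() becomes an idempotent pread RPC.

#include <stddef.h>
#include <sys/types.h>

struct file_handle { unsigned char data[32]; };  /* opaque bytes from the server */

/* Per-open-file state kept only on the client. */
struct client_fd {
    struct file_handle fh;   /* returned by the server at open time */
    off_t offset;            /* current offset, maintained purely by the client */
};

/* Stand-in for the real pread RPC; here it just pretends to read n bytes. */
static ssize_t rpc_pread(struct file_handle fh, void *buf, size_t n, off_t off) {
    (void)fh; (void)buf; (void)off;
    return (ssize_t)n;
}

/* UNIX-style read() built on the stateless protocol. */
ssize_t nfs_read(struct client_fd *f, void *buf, size_t n) {
    ssize_t got = rpc_pread(f->fh, buf, n, f->offset);
    if (got > 0)
        f->offset += got;    /* the client, not the server, advances the offset */
    return got;
}

Because the server never sees the offset, it has nothing to lose in a crash; retrying a lost pread simply repeats an idempotent operation.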

Final API Strategy 5: client logic
[Diagram: application calls read(fd=5, size=1024); the client looks up fd 5 in its local fd table, finds fh=<...> and offset=123, and issues the RPC pread(fh, 123, 1024); the server extracts the inode from fh and performs a local read]

NFS Overview
Architecture
Network API
Caching

Cache Consistency
NFS can cache data in three places:
– server memory
– client disk
– client memory
How to make sure all versions are in sync?

NFS Architecture
[Diagram, repeated from earlier: many clients, each with its own cache, connect over RPC to a single file server backed by a local FS; clients cache individual blocks of NFS files]

Cache Problem 1: Server Memory
[Diagram: the client’s write goes over NFS to the server, where it lands in an in-memory write buffer in front of the local FS]
NFS server often buffers writes to improve performance; server might acknowledge a write before it is pushed to disk
What happens if the server crashes?

Server Memory – Lost on crash
client writes blocks 0, 1, 2:
write A to 0; write B to 1; write C to 2
then: write X to 0; write Y to 1; write Z to 2
[Diagram: when the server crashes, some of the second round of writes are still only in server memory; the disk ends up holding X, B, Z in blocks 0–2]
Problem:
No write failed, but the disk state doesn’t match any point in time
What could have happened? Solutions????

Server Write Buffers
[Diagram: the client’s write travels over NFS into the server’s write buffer in front of the local FS]
Solution 1: Don’t use a server write buffer
(persist data to disk before acknowledging the write)
Problem: Slow!

Server Write Buffers
[Diagram: same as above, but the server’s write buffer is battery-backed]
Solution 2: Use a persistent (battery-backed) write buffer (more expensive)

Cache Problem 2 + 3: Distributed Cache
[Diagram: Client 1 caches block A; Client 2’s cache is empty; both clients talk NFS to the server’s local FS]
Clients must cache some data
– Too slow to always contact server
– Server would become a severe bottleneck

Cache
[Diagram: Client 2 reads the file from the server; after the read, Client 1, Client 2, and the server all hold copy A]
Clients must cache some data
– Too slow to always contact server
– Server would become a severe bottleneck

Cache problem 2: Update visibility
[Diagram: Client 1 writes locally, so its cache holds B while the server and Client 2 still hold A]
“Update Visibility” problem: server doesn’t have the latest version
What happens if a process on Client 2 (or any other client) reads the data?
It sees the old version (different semantics than a local FS)

Solution to Update Visibility
[Diagram: Client 1’s cache holds B; the server still holds A]
When a client buffers a write, how can the server (and other clients) see the update?
– Client flushes the cache entry to the server
When should the client perform the flush????? (3 reasonable options??)

NFS Update Visibility
Possibilities
– After every write (too slow)
– Periodically after some interval (odd semantics)
NFS solution: Flush blocks
– required on close()
– other times optionally too – e.g., when low on memory
Problems not solved by NFS:
– file flushes not atomic (one block of file at a time)
– two clients flush at once: mixed data
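A hedged sketch of the flush-on-close rule (structure and RPC names are invented): on close(), the client walks its dirty cached blocks for the file and pushes each one to the server with a pwrite-style RPC, which is also where the two problems above show up (blocks go one at a time, and two closing clients can interleave).

#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define BLOCK_SIZE 4096

struct cached_block {
    off_t offset;             /* block's offset within the file */
    char  data[BLOCK_SIZE];
    bool  dirty;              /* modified locally, not yet on the server */
};

struct open_file {
    struct cached_block *blocks;
    size_t nblocks;
};

/* Stand-in for the pwrite RPC to the server. */
static int rpc_pwrite(off_t off, const void *buf, size_t n) {
    (void)off; (void)buf; (void)n;
    return 0;
}

/* NFS-style close(): flush every dirty block so the server
 * (and therefore other clients) can see this client's updates. */
int nfs_close(struct open_file *f) {
    for (size_t i = 0; i < f->nblocks; i++) {
        if (f->blocks[i].dirty) {
            if (rpc_pwrite(f->blocks[i].offset, f->blocks[i].data, BLOCK_SIZE) != 0)
                return -1;               /* flushed block-by-block, not atomically */
            f->blocks[i].dirty = false;
        }
    }
    return 0;
}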

Cache Problem 3: Stale Cache
[Diagram: Client 1 flushes B to the server, so Client 1 and the server hold B, but Client 2 still caches A]
“Stale Cache” problem: Client 2 doesn’t have the latest version from the server
What happens if a process on Client 2 reads the data?
It sees the old version (different semantics than a local FS)

Solution to Stale Cache
[Diagram: the server holds B; Client 2 still caches A]
Problem: Client 2 has a stale copy of the data; how can it get the latest?
One possible solution:
– If the NFS server had state, it could push updates to the relevant clients
NFS stateless solution:
– Clients recheck whether the cached copy is current before using the data (a recheck is faster than refetching the data)

Solution to Stale Cache
[Diagram: the server holds B, modified at time t2; Client 2 caches A, fetched at time t1]
The client cache records the time each data block was fetched (t1)
Before using a data block, the client sends a STAT request for the file to the server
– gets the last-modified timestamp for this file (t2) (for the whole file, not the block…)
– compares it to the block’s fetch timestamp
– if the file changed after the block was fetched (t2 > t1), refetch the data block

Measure then Build
[Diagram: Client 2 checks t2 against t1 by sending a stat to the server before each use of its cached block]
NFS developers found the server overloaded – this limits the number of clients
Found that stat accounted for 90% of server requests
Why? Because clients frequently recheck the cache

Reducing Stat Calls
[Diagram: Client 2 keeps both the cached block (fetched at t1) and cached attributes (t2)]
Partial solution: client caches the result of stat (an attribute cache)
What is the result?
Never see updates on the server!
Solution: Make stat cache entries expire after a given time (e.g., 3 seconds)
(discard t2 at Client 2)
What is the result?
Could read data that is up to 3 seconds old
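A rough sketch of the resulting client-side check (names and timestamps simplified; real NFS gets per-file attributes via a GETATTR RPC): the client refreshes its cached attributes only when they are older than the timeout, then compares the file’s modification time against the time the block was fetched.

#include <stdbool.h>
#include <time.h>

#define ATTR_TIMEOUT 3   /* seconds an attribute-cache entry stays valid */

struct cached_file {
    time_t file_mtime;     /* last-modified time reported by the server (t2) */
    time_t attr_fetched;   /* when we last asked the server for attributes */
    time_t block_fetched;  /* when the cached data block was fetched (t1) */
};

/* Stand-in for the stat/GETATTR RPC: returns the file's mtime on the server. */
static time_t rpc_getattr_mtime(void) { return 0; }

/* Returns true if the cached block may be used; false means refetch it. */
bool block_is_fresh(struct cached_file *f) {
    time_t now = time(NULL);

    /* Only contact the server if the cached attributes have expired. */
    if (now - f->attr_fetched > ATTR_TIMEOUT) {
        f->file_mtime   = rpc_getattr_mtime();
        f->attr_fetched = now;
    }

    /* File changed after the block was fetched (t2 > t1): stale. */
    return f->file_mtime <= f->block_fetched;
}

The attribute cache is exactly why a client can read data up to ~3 seconds old: a change on the server stays invisible until the cached attributes expire.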

NFS Summary
NFS handles client and server crashes very well; robust APIs are often:
– stateless: servers don’t remember clients or open files
– idempotent: repeating operations gives same results
Caching and write buffering is hard in distributed systems, especially with crashes
Problems:
– Consistency model is odd
(client may not see updates until 3 seconds after file is closed)
– Scalability limitations as more clients call stat() on server

AFS Goals
Andrew File System: Carnegie Mellon University in the 1980s
More reasonable semantics for concurrent file access
Improved scalability (many clients per server)
Willing to sacrifice simplicity and statelessness

AFS Whole-File Caching
Approach
– Measurements show most files are read in their entirety
– Upon open, AFS client fetches the whole file, storing it in local memory or on local disk
– Upon close, client flushes the file to the server (if the file was written)
Convenient and intuitive semantics:
– Use the same version of the file the entire time between open and close
Performance advantages:
– AFS needs to do work only for open/close
– Reads/writes are completely local
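A rough sketch of the whole-file approach (structure and RPC names are invented): open fetches the entire file into a local copy, reads and writes touch only that copy, and close ships the whole file back if it was modified.

#include <stdbool.h>
#include <stddef.h>

struct afs_file {
    char  *data;     /* entire file, cached locally (memory or local disk) */
    size_t size;
    bool   dirty;    /* written locally since open */
};

/* Stand-ins for the fetch/store RPCs to the AFS server. */
static void rpc_fetch_whole_file(const char *path, struct afs_file *f) { (void)path; (void)f; }
static void rpc_store_whole_file(const char *path, const struct afs_file *f) { (void)path; (void)f; }

void afs_open(const char *path, struct afs_file *f) {
    rpc_fetch_whole_file(path, f);      /* one round trip; all later reads/writes are local */
    f->dirty = false;
}

void afs_close(const char *path, struct afs_file *f) {
    if (f->dirty)
        rpc_store_whole_file(path, f);  /* whole file replaced on server: last closer wins */
}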

AFS Cache Consistency
1. Update visibility:
How are updates sent to the server?
2. Stale cache:
How are other caches kept in sync with server?

AFS Update Visibility
AFS solution:
– Like NFS, also flush on close
– Buffer whole files on local disk; update the file on the server atomically
Concurrent writes?
– Last writer (i.e., last file closer) wins
– Never get data mixed from multiple versions on server (unlike NFS)

AFS Stale Cache Problem
[Diagram: Client 1 and the server hold B; Client 2 still caches A]
“Stale Cache” problem: Client 2 doesn’t have the latest version

AFS: No Stale Cache
[Diagram: the server holds B; Client 2 still caches A]
AFS solution: Server tells clients when data is overwritten
– Server must remember which clients have this file open right now
– Server is no longer stateless!
When clients cache data (on open), they ask for a “callback” from the server if the file changes
– Clients can use the data (during this open) without checking all the time
– Client only verifies the callback when it open()s the file (not on every read), and may not need to refetch on the next open()
– Operate on the same version of the file from open to close
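A rough sketch of the server-side callback bookkeeping (data structures and names are invented): the server records which clients hold a cached copy, and when a client stores a new version it breaks every other client’s callback so they know to refetch on their next open.

#include <stddef.h>

#define MAX_CLIENTS 64

struct afs_server_file {
    int    callbacks[MAX_CLIENTS];  /* ids of clients caching this file */
    size_t ncallbacks;              /* per-file, per-client state on the server */
};

/* Stand-in for the RPC that tells a client its cached copy is now stale. */
static void rpc_break_callback(int client_id) { (void)client_id; }

/* A client registers a callback when it fetches the file. */
void add_callback(struct afs_server_file *f, int client_id) {
    if (f->ncallbacks < MAX_CLIENTS)
        f->callbacks[f->ncallbacks++] = client_id;
}

/* A client closed a modified file: invalidate everyone else's copies. */
void on_store(struct afs_server_file *f, int writer_id) {
    for (size_t i = 0; i < f->ncallbacks; i++)
        if (f->callbacks[i] != writer_id)
            rpc_break_callback(f->callbacks[i]);
    f->callbacks[0] = writer_id;    /* the writer still holds the current version */
    f->ncallbacks = 1;
}

This table is exactly the state that makes client crashes, server memory pressure, and server crashes tricky, as the next slides discuss.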

AFS Callbacks: Dealing with STATE
1. What if client crashes?
2. What if server runs out of memory?
3. What if server crashes?

Detail 1: Client Crash
[Diagram: Client 2 reboots with old data (A) still cached on its local disk while the server holds B]
Concern? The client may have missed a notification that its cached copy changed
What should the client do after reboot? (remember cached data can be on disk too…)
Option 1: evict everything from the cache
Option 2: ??? (recheck entries before using them)

Detail 2: Low Server Memory
[Diagram: the server holds B; Client 2 caches A]
Strategy: tell clients you are dropping their callback
What should the client do?
Option 1: Discard the entry from the cache
Option 2: ??? (mark the entry for recheck)

Detail (?) 3: Server Crashes
What if server crashes?
Option: tell all clients to recheck all data before next read
Handling server and client crashes without inconsistencies or race conditions is very difficult…

AFS Summary
State is useful for scalability, but makes handling crashes hard
– Server tracks callbacks for clients that have the file cached
– Lose callbacks when server crashes…
Workload drives design: whole-file caching
– More intuitive semantics
(see version of file that existed when file was opened)

Cache consistency comparison
• When will clients see changes?
• NFS
– Individual reads: 3 seconds after the other client closes the file
• AFS
– Whole file: the next time the file is opened after the other client closes it

NFS vs AFS Protocols
When will server be contacted for NFS? For AFS? What data will be sent? What will each client see?

NFS Protocol

AFS Protocol