COMP90015 Distributed Systems Semester 2, 2022
Topic: Distributed File Systems (DFS)
Dr Tawfiq Islam
School of Computing and Information Systems (CIS) The University of Melbourne, Australia
Learning Outcomes
• Understanding the need for Distributed File Systems (DFS)
• Revisiting the basics of Unix File Systems
• Understanding the key requirements for DFS
• Exploring the file service architecture
• Case Study: Sun Network File Systems (NFS)
• Reading: Distributed Systems: Concepts and Design (5th edition). Chapter 12. Sections: 12.1, 12.2, 12.3
A Case for DFS
Uhm… perhaps time has come to buy a rack of servers….
I want to store my thesis on the server!
I need to have my book always available..
I need storage for my reports
My boss wants…
I need to store my analysis and reports safely…
A Case for DFS
Same here… I don’t remember..
Hey… but where did I put my docs?
Uhm… … maybe we need a DFS?… Well after the paper and a nap…
Wow… now I can store a lot more documents…
I am not sure whether server A, or B, or C…
A Case for DFS
Distributed File System
It is reliable, fault tolerant, highly available, location transparent…. I hope I can finish my newspaper now…
Nice… my boss will promote me!
Wow! I do not have to remember which server I stored the data into…
Good… I can access my folders from anywhere..
Storage systems and their properties
Properties compared: sharing, persistence, distributed cache/replicas, consistency maintenance, example.
Storage system types, with the consistency entry and example where given:
• Main memory
• File system: consistency 1; example: UNIX file system
• Distributed file system: example: Sun NFS
• Web: example: Web server
• Distributed shared memory: example: Ivy (Ch. 16)
• Remote objects (RMI/ORB): consistency 1; example: CORBA
• Persistent object store: consistency 1; example: CORBA Persistent Object Service
• Peer-to-peer storage system: consistency 2; example: OceanStore
Types of consistency between copies:
1 – strict one-copy consistency
√ – approximate/slightly weaker guarantees
X – no automatic consistency
2 – considerably weaker guarantees
Characteristics of File Systems
• Files contain both data and attributes. Data is usually in the form of a sequence of bytes and can be accessed and modified. Attributes include things like the length of the file, timestamps, file type, owner’s identity and access-control lists.
• Files have a name. Some files are directories that contain a list of other files; and they may themselves be (sub-) directories. This leads to a hierarchical naming scheme for the file. The pathname is the concatenation of the directory names and the file name.
File Attribute Record Structure
Updated by system:
• File length
• Creation timestamp
• Read timestamp
• Write timestamp
• Attribute timestamp
• Reference count
Updated by owner:
• Access control list (e.g., for UNIX: rw-rw-r--)
File System Modules (non-DFS)
Modules shown in the figure include: Directories, Blocks, Device.
UNIX File System Operations
filedes = open(name, mode): Opens an existing file with the given name.
filedes = creat(name, mode): Creates a new file with the given name.
  Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes): Closes the open file filedes.
count = read(filedes, buffer, n): Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n): Transfers n bytes to the file referenced by filedes from buffer.
  Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence): Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name): Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2): Adds a new name (name2) for a file (name1).
status = stat(name, buffer): Gets the file attributes for file name into buffer.
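The link, unlink and stat operations can be tried together to watch the reference count attribute change. The following is a minimal sketch, assuming a POSIX system whose file system supports hard links; the function names name_count and link_demo are hypothetical and chosen only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the number of names (hard links) the file currently has,
   or -1 if stat fails, e.g. because the name no longer exists. */
long name_count(const char *name)
{
    struct stat attrs;
    if (stat(name, &attrs) < 0) return -1;
    return (long)attrs.st_nlink;
}

/* Creates name1, gives it a second name, then removes both names.
   counts[0..3] record the reference count after each step.
   Returns 0 on success, -1 if any call fails. */
int link_demo(const char *name1, const char *name2, long counts[4])
{
    int fd;
    assert(counts != 0);
    fd = open(name1, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    close(fd);
    counts[0] = name_count(name1);      /* one name */
    if (link(name1, name2) < 0) { unlink(name1); return -1; }
    counts[1] = name_count(name1);      /* two names for the same file */
    if (unlink(name1) < 0) return -1;
    counts[2] = name_count(name2);      /* one name left, file still exists */
    if (unlink(name2) < 0) return -1;
    counts[3] = name_count(name2);      /* stat fails: last name removed */
    return 0;
}
```

Calling link_demo with two unused pathnames on the same file system should leave the reference counts 1, 2, 1 in the first three slots and -1 in the last, since the file is deleted once its last name is removed.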
Code in C: Copy File Program
Write a simple C program to copy a file using the UNIX file system operations.
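#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFSIZE 1024
void copyfile(char* oldfile, char* newfile)
{ char buf[BUFSIZE]; int n=1, fdold, fdnew;
  if((fdold = open(oldfile, O_RDONLY))>=0) {
    /* create the target if needed and truncate it, rather than appending */
    fdnew = open(newfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    while (n>0) {
      n = read(fdold, buf, BUFSIZE);
      if(n > 0 && write(fdnew, buf, n) < 0) break;
    }
    close(fdold); close(fdnew);
  }
  else printf("Copyfile: couldn't open file: %s \n", oldfile);
}
int main(int argc, char** argv) {
  if(argc != 3) { printf("usage: %s oldfile newfile\n", argv[0]); return 1; }
  copyfile(argv[1], argv[2]);
  return 0;
}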
Distributed File Systems (DFS)
• A file system provides a convenient programming interface for disk storage along with features such as access control and file-locking that allow file sharing.
• A basic distributed file system emulates the same functionality as a (non- distributed) file system for client programs running on multiple remote computers.
• A file service allows programs to store and access remote files exactly as they do local ones, allowing users to access their files from any computer in an intranet.
• Hosts that provide a file service can be optimized for persistent storage devices, e.g., for multiple disk drives, and can supply file services for a wide range of other services in an organization, e.g., for the web services and email services. This further facilitates management of the persistent storage, including backups and archiving.
Distributed File Systems Requirements (1)
Transparency :
• Access transparency – Client programs should be unaware of the distribution of files. The same API is used for accessing local and remote files, so programs written to operate on local files can, unchanged, operate on remote files.
• Location transparency – Client programs should see a uniform file name space; the names of files should be consistent regardless of where the files are actually stored and where the clients are accessing them from.
• Mobility transparency – Client programs and client administration services do not need to change when the files are moved from one place to another.
• Performance transparency – Client programs should continue to perform satisfactorily while the load on the service varies within a specified range.
• Scaling transparency – The service can be expanded by incremental growth to deal with a wide range of loads and network sizes.
Distributed File Systems Requirements (2)
Concurrent file updates : Multiple clients’ updates to files should not interfere with each other. Policies should be manageable.
File replication : Each file can have multiple copies distributed over several servers, which provides better capacity for accessing the file and better fault tolerance.
Hardware and operating system heterogeneity : The service should not require the client or server to have specific hardware or operating system dependencies.
Fault tolerance : Transient communication problems should not lead to file corruption. Servers can use at-most-once invocation semantics or the simpler at-least-once semantics with idempotent operations. Servers can also be stateless.
Distributed File Systems Requirements (3)
Consistency : Multiple, possibly concurrent, accesses to a file should see a consistent representation of that file, i.e. differences in the file's location or update latencies should not lead to the file looking different at different times. File metadata should be consistently represented on all clients.
Security : Client requests should be authenticated and data transfer should be encrypted.
Efficiency : Should be of a comparable level of performance to conventional file systems.
File Service Architecture (FSA)
• Flat file service: The flat file service is concerned with implementing operations on the contents of files. A unique file identifier (UFID) is given to the flat file service to refer to the file to be operated on. The UFID is unique over all the files in the distributed system. The flat file service creates a new UFID for each new file that it creates.
• Directory service: The directory service provides a mapping between text names and their UFIDs. The directory service creates directories and can add and delete files from the directories. The directory service is itself a client of the flat file service since the directory files are stored there.
• Client module: The client module integrates the directory service and flat file service to provide whatever application programming interface is expected by the application programs. The client module maintains a list of available file servers. It can also cache data in order to improve performance.
File Service Architecture (FSA)
Figure: File service architecture. On the client computer, application programs call the client module. On the server computer, the directory service offers Lookup, AddName, UnName and GetNames, and the flat file service offers Read, Write, Create, Delete, GetAttributes and SetAttributes.
FSA: Flat File Service Interface
Read(UFID, i, n) → Data: Reads up to n items from position i in the file.
Write(UFID, i, Data): Writes the data starting at position i in the file. The file is extended if necessary.
Create() → UFID: Creates a new file of length 0 and returns a UFID for it.
Delete(UFID): Removes the file from the file store.
GetAttributes(UFID) → Attr: Returns the file attributes for the file.
SetAttributes(UFID, Attr): Sets the file attributes.
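The interface can be made concrete with a small in-memory model. The following is only a sketch under stated assumptions: the ffs_ function names, the integer UFID, and the fixed limits MAX_FILES and MAX_LEN are hypothetical, and a real flat file service would keep files on persistent storage and generate UFIDs that are unique across the whole distributed system.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILES 16     /* assumed limits, for illustration only */
#define MAX_LEN   256

typedef int UFID;        /* stand-in for a real unique file identifier */

struct flat_file { int in_use; size_t length; char data[MAX_LEN]; };
static struct flat_file store[MAX_FILES];
static UFID next_ufid = 0;   /* UFIDs are never reused, even after Delete */

static int valid(UFID id) { return id >= 0 && id < next_ufid && store[id].in_use; }

/* Create() -> UFID: new file of length 0, or -1 if the store is full. */
UFID ffs_create(void)
{
    if (next_ufid >= MAX_FILES) return -1;
    store[next_ufid].in_use = 1;
    store[next_ufid].length = 0;
    return next_ufid++;
}

/* Write(UFID, i, Data): writes n bytes at position i, extending the file if
   necessary. Returns 0, or -1 for a bad UFID or a write beyond MAX_LEN. */
int ffs_write(UFID id, size_t i, const char *data, size_t n)
{
    assert(data != 0);
    if (!valid(id) || i > MAX_LEN || n > MAX_LEN - i) return -1;
    if (i > store[id].length)                 /* zero-fill any gap */
        memset(store[id].data + store[id].length, 0, i - store[id].length);
    memcpy(store[id].data + i, data, n);
    if (i + n > store[id].length) store[id].length = i + n;
    return 0;
}

/* Read(UFID, i, n) -> Data: copies up to n bytes from position i into buf.
   Returns the number of bytes copied, or -1 for a bad UFID. */
long ffs_read(UFID id, size_t i, size_t n, char *buf)
{
    assert(buf != 0);
    if (!valid(id)) return -1;
    if (i >= store[id].length) return 0;
    if (n > store[id].length - i) n = store[id].length - i;
    memcpy(buf, store[id].data + i, n);
    return (long)n;
}

/* Delete(UFID): removes the file from the file store. */
int ffs_delete(UFID id)
{
    if (!valid(id)) return -1;
    store[id].in_use = 0;
    return 0;
}
```

Every read and write names its own position i, so the model keeps no per-client read-write pointer; repeating ffs_write(f, 0, "hello", 5) leaves the file in the same state as issuing it once.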
FSA: Flat File Service Interface
Difference with UNIX interface:
• Recall that the UNIX interface shown earlier requires that the UNIX file system maintains state, i.e., a file pointer, that is manipulated during reads and writes.
• The flat file service interface differs from the UNIX interface due to fault tolerance requirements:
• repeatable operations – with the exception of Create(), the operations are idempotent, allowing the use of at-least-once RPC semantics.
• stateless server – the flat file service does not need to maintain any state and can be restarted after a failure and resume operation without any need for clients or the server to restore any state.
• Also note that UNIX files require an explicit open command before they can be accessed, while files in the flat file service can be accessed immediately.
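The effect of the file pointer on repeated requests can be observed on a single UNIX host. The sketch below is an analogy rather than an implementation of the flat file service: it assumes a POSIX system with a writable /tmp, uses pwrite (which carries its own offset, like Write(UFID, i, Data)) against write (which depends on the read-write pointer), and the function name duplicate_request is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Issues the same 3-byte request twice, as a client retransmitting under
   at-least-once semantics might. If positional is 1, each request carries
   its own offset (pwrite); if 0, it relies on the read-write pointer held
   for the descriptor (write). Returns the resulting file length, or -1. */
long duplicate_request(int positional)
{
    char path[] = "/tmp/gr_ffs_XXXXXX";   /* scratch file, name is arbitrary */
    int fd, attempt;
    off_t len;
    assert(positional == 0 || positional == 1);
    fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                          /* file lives until fd is closed */
    for (attempt = 0; attempt < 2; attempt++) {
        ssize_t r = positional ? pwrite(fd, "abc", 3, 0)
                               : write(fd, "abc", 3);
        if (r != 3) { close(fd); return -1; }
    }
    len = lseek(fd, 0, SEEK_END);
    close(fd);
    return (long)len;
}
```

With the positional form the duplicate overwrites the same three bytes and the file length stays 3; with the pointer-based form the duplicate lands after the first copy and the length becomes 6, which is why a pointer-based interface is unsafe to repeat.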
Flat File Service Access Control
• The service needs to authenticate the RPC caller and needs to ensure that illegal operations are not performed, e.g., that UFIDs are legal and that file access privileges are not ignored.
• The server cannot retain the results of access checks between requests, as it would then no longer be stateless.
Two ways to do this:
1. An access check can be made whenever a file name is converted to a UFID, and the results can be encoded in the form of a capability that is returned to the client for submission to the flat file server.
2. A user identity can be submitted with every client request, and access checks can be performed by the flat file server for every file operation.
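The second approach can be sketched as a check that the server runs on every operation using only the request and the file's stored attributes. This is an illustrative sketch: the struct file_acl layout, the permission constants and the function access_allowed are hypothetical, and authentication of the submitted user identity is assumed to have happened already.

```c
#include <assert.h>

enum { PERM_READ = 4, PERM_WRITE = 2 };

/* Hypothetical access record kept with the file's attributes:
   UNIX-style permission bits for the owner and for everyone else. */
struct file_acl { int owner_uid; int owner_perms; int other_perms; };

/* Called for every operation with the identity carried in the request.
   Nothing is remembered between calls. Returns 1 if allowed, else 0. */
int access_allowed(const struct file_acl *acl, int caller_uid, int wanted)
{
    int granted;
    assert(acl != 0);
    granted = (caller_uid == acl->owner_uid) ? acl->owner_perms
                                             : acl->other_perms;
    return (granted & wanted) == wanted;
}
```

For a file owned by user 1000 with read and write for the owner and read-only for others, a write request from user 1001 is refused while a read request from the same user is allowed.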
Directory Service Interface
• The primary purpose of the directory service is to provide a translation from file names to UFIDs. An abstract directory service interface is shown on the next slide.
• The directory server maintains directory files that contain mappings between text file names and UFIDs. The directory files are stored in the flat file server and so the directory server is itself a client to the flat file server.
• A hierarchical file system can be built up from repeated accesses. E.g., the root directory has name “/” and contains subdirectories with names “usr”, “home”, “etc”, which themselves contain other subdirectories or files. A client function can make requests for the UFIDs in turn, to proceed through the path to the file or directory at the end.
Directory Service Interface
Lookup(Dir, Name) → UFID: Returns the UFID for the file name in the given directory.
AddName(Dir, Name, UFID): Adds the file name with UFID to the directory.
UnName(Dir, Name): Removes the file name from the directory.
GetNames(Dir, Pattern) → Names: Returns all the names in the directory that match the pattern.
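Resolving a hierarchical pathname by repeated Lookup calls can be sketched as follows. The directory contents, the names "alice" and "thesis.txt", the integer UFIDs and the helper names lookup and resolve are assumptions made for illustration; in the architecture described here the client module would issue each Lookup to the directory service as a remote call.

```c
#include <assert.h>
#include <string.h>

typedef int UFID;
#define NO_FILE   (-1)
#define ROOT_UFID 0

/* Hypothetical directory contents: (directory UFID, name) -> UFID. */
struct dir_entry { UFID dir; const char *name; UFID ufid; };
static const struct dir_entry entries[] = {
    {0, "usr", 1}, {0, "home", 2}, {0, "etc", 3},
    {2, "alice", 4}, {4, "thesis.txt", 5},
};

/* Lookup(Dir, Name) -> UFID, with the name passed as pointer plus length. */
UFID lookup(UFID dir, const char *name, size_t len)
{
    size_t k;
    for (k = 0; k < sizeof entries / sizeof entries[0]; k++)
        if (entries[k].dir == dir && strlen(entries[k].name) == len &&
            memcmp(entries[k].name, name, len) == 0)
            return entries[k].ufid;
    return NO_FILE;
}

/* Resolves an absolute pathname one component at a time, starting at the
   root directory. Returns NO_FILE if any component is missing. */
UFID resolve(const char *path)
{
    UFID cur = ROOT_UFID;
    assert(path != 0);
    if (path[0] != '/') return NO_FILE;
    while (*path && cur != NO_FILE) {
        size_t len;
        while (*path == '/') path++;        /* skip separators */
        len = strcspn(path, "/");
        if (len == 0) break;                /* trailing slash or end of path */
        cur = lookup(cur, path, len);
        path += len;
    }
    return cur;
}
```

With the table above, resolve("/home/alice/thesis.txt") performs three lookups (home, alice, thesis.txt) and yields UFID 5, while a path with a missing component yields NO_FILE.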
File Group
• A file group is a collection of files located on a given server. A server may hold several file groups and file groups can be moved between servers, but a file cannot change file group.
• File groups allow the file service to be implemented over several servers. Files are given UFIDs that ensure uniqueness across different servers, e.g., by concatenating the server IP address (32 bits) with a date that the file was created (16 bits). This allows the files in a group, i.e., that have a common part to their UFID called the file group identifier, to be relocated to a different server without conflicting with files already on that server.
• The file service needs to maintain a mapping of UFIDs to servers. This can be cached at the client module.
file group id: IP address (32 bits) | date (16 bits)
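The 48-bit file group identifier can be packed and unpacked with shifts and masks. This is a minimal sketch: the function names are hypothetical, and how the 16-bit date is encoded (for example as a day count) is an assumption the slides do not specify.

```c
#include <assert.h>
#include <stdint.h>

/* File group identifier: a 32-bit IP address followed by a 16-bit date,
   held in the low 48 bits of a 64-bit integer. */
uint64_t make_group_id(uint32_t ip, uint16_t date)
{
    return ((uint64_t)ip << 16) | date;
}

/* Extracts the IP address part of a file group identifier. */
uint32_t group_ip(uint64_t gid)
{
    assert((gid >> 48) == 0);        /* only 48 bits are meaningful */
    return (uint32_t)(gid >> 16);
}

/* Extracts the date part of a file group identifier. */
uint16_t group_date(uint64_t gid)
{
    assert((gid >> 48) == 0);
    return (uint16_t)(gid & 0xFFFF);
}
```

Because the identifier records the server address at creation time rather than the current location, moving the group to another server does not change it; the mapping from identifiers to current servers is kept separately by the file service.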
Sun Network File System (NFS)
• The Sun Network File System (NFS) follows the abstract system shown earlier.
• There are many implementations of NFS and they all follow the NFS protocol using a set of RPCs that provide the means for the client to perform operations on the remote file store.
• We consider a UNIX implementation.
• The NFS client makes requests to the NFS server to access files.
NFS: System Architecture
Figure: NFS system architecture. On the client computer, application programs make system calls to the UNIX kernel, where the virtual file system directs operations on local files to the UNIX file system and operations on remote files to the NFS client. The NFS client uses the NFS protocol (remote operations) to reach the NFS server on the server computer, which accesses the server's UNIX file system through the server's virtual file system.
NFS: Virtual File System
• UNIX uses a virtual file system (VFS) to provide transparent access to any number of different file systems. The NFS is integrated in the same way. The VFS maintains a VFS structure for each filesystem in use. The VFS structure relates a remote filesystem to the local filesystem; i.e., it combines the remote and local file system into a single filesystem.
• The VFS maintains a v-node for each open file, and this records an indicator as to whether the file is local or remote.
• If the file is local, then the v-node contains a reference to the file’s i-node on the UNIX file system.
• If the file is remote then the v-node contains a reference to the file’s NFS file handle which is a combination of filesystem identifier, i-node number and whatever else the NFS server needs to identify the file.
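The v-node described above is naturally a tagged union. The following sketch uses hypothetical field names and widths; real NFS file handles are opaque to the client and their exact contents are decided by the server.

```c
#include <assert.h>
#include <stdint.h>

/* File handle fields named in the text. The 'extra' field is a placeholder
   for whatever else the NFS server needs to identify the file. */
struct nfs_file_handle {
    uint32_t filesystem_id;
    uint32_t inode_number;
    uint32_t extra;
};

enum vnode_kind { VNODE_LOCAL, VNODE_REMOTE };

struct vnode {
    enum vnode_kind kind;                 /* local or remote indicator */
    union {
        uint32_t local_inode;             /* i-node in the UNIX file system */
        struct nfs_file_handle handle;    /* handle for a remote file */
    } ref;
};

/* Names the layer that should carry out an operation on this v-node. */
const char *route(const struct vnode *v)
{
    assert(v != 0);
    return v->kind == VNODE_LOCAL ? "UNIX file system" : "NFS client";
}
```

An application sees one interface in both cases; only the VFS inspects the indicator and passes the request either to the local file system or to the NFS client.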
NFS: Client Integration
The NFS client is integrated within the kernel so that:
• user programs can access files via UNIX system calls without recompilation or reloading;
• a single client module serves all of the user-level processes, with a shared cache;
• the encryption key used to authenticate user IDs passed to the server can be retained in the kernel, preventing impersonation by user-level clients.
The client transfers blocks of files from the server host to the local host and caches them, sharing the same buffer cache as that used by the local input-output system. Since several hosts may be accessing the same remote file, caching presents a problem of consistency.
NFS: Server Interface
• The NFS server interface integrates both the directory and file operations in a single service. The creation and insertion of file names in directories is performed by a single create operation, which takes the text name of the new file and file handle for the target directory as arguments.
• The primitives of the interface largely resemble the UNIX filesystem primitives.
NFS: Mount Service
• Each server maintains a file that describes which parts of the local filesystems are available for remote mounting.
cat /etc/exports
# /etc/exports: the access control list for filesystems which may be
# exported to NFS clients.
/store 192.168.1.0/255.255.255.0(rw)
• In the above example all hosts on the subnet can mount the filesystem directory /store with read and write access.
• A hard-mounted filesystem will block on each access until the access is complete. A soft-mounted filesystem will retry a few times and then return an error to the calling process.
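The difference between the two mount behaviours comes down to the retry loop around each remote access. The sketch below is a simulation, not NFS client code: the names nfs_access, remote_op and flaky_server are hypothetical, the retry limit of 3 is an assumed value standing in for "a few times", and a real client would also wait between attempts.

```c
#include <assert.h>
#include <errno.h>

enum mount_kind { HARD_MOUNT, SOFT_MOUNT };
#define SOFT_RETRIES 3      /* assumed limit, for illustration only */

/* Hypothetical remote call: returns 0 on success, -1 if no reply arrived. */
typedef int (*remote_op)(void *arg);

/* Issues op according to the mount kind. A hard mount keeps retrying until
   the access completes; a soft mount gives up after SOFT_RETRIES attempts
   and returns an error to the calling process. */
int nfs_access(enum mount_kind kind, remote_op op, void *arg)
{
    int attempt;
    assert(op != 0);
    for (attempt = 1; ; attempt++) {
        if (op(arg) == 0) return 0;
        if (kind == SOFT_MOUNT && attempt >= SOFT_RETRIES) {
            errno = ETIMEDOUT;
            return -1;
        }
    }
}

/* Stand-in server: fails until the counter pointed to by arg reaches zero. */
int flaky_server(void *arg)
{
    int *remaining_failures = arg;
    if (*remaining_failures > 0) { (*remaining_failures)--; return -1; }
    return 0;
}
```

With a server that fails five times in a row, the soft mount returns an error after its third attempt, whereas the hard mount blocks through all five failures and then succeeds.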
NFS: Example NFS Mount
Figure: Example NFS mounting in two different file systems. Directory labels in the figure include export, people, var, students, mohammad, marvin and jonathan; two remote mounts are shown.
NFS: Server Caching
In conventional UNIX systems:
• Data or pages read from the disk are retained in a main memory buffer cache and are evicted when the buffer space is required for other pages. Accesses to cached data do not require a disk access.
• Read-ahead anticipates read accesses and fetches the pages following those that have been recently read.
• Delayed-write or write-back optimizes writes to the disk by only writing pages when they have been both modified and evicted. A UNIX sync operation flushes modified pages to disk every 30 seconds.
This works for a conventional filesystem, on a single host, because there is only one cache, and file accesses cannot bypass the cache.
NFS: Server Caching
• Use of the cache at the server for client reads does not introduce any problems. However, use of the cache for writes requires special care to ensure that clients can be confident that the writes are persistent, especially in the event of a server crash.
There are two options for cache policies that are used by the server:
• Write-through – data is written to cache and directly to the disk. This increases disk I/O and increases the latency for write operations. The operation completes when the data has been written to disk.
• Commit – data is written to cache and is written to disk when a commit operation is received for the data. A reply to the commit is sent when the data has been written to disk.
• The first option is poor when the server receives a large number of write requests for the same data. It however saves network bandwidth.
• The second option uses more network bandwidth and may lead to uncommitted data being lost. However, it receives the full benefit of the cache.
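The two server policies differ in when data is forced to disk. The sketch below imitates them on a local file with the POSIX fsync call, assuming a POSIX system with a writable /tmp; the function names are hypothetical, and an NFS server would perform the equivalent steps in response to write and commit requests from clients.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Opens an unnamed scratch file for the demonstration, or returns -1. */
int scratch_file(void)
{
    char path[] = "/tmp/gr_cache_XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0) unlink(path);
    return fd;
}

/* Write-through: every write is forced to disk before it is acknowledged.
   Returns the number of forced disk flushes, or -1 on error. */
int write_through(int fd, const char *blocks[], const size_t lens[], int count)
{
    int k, flushes = 0;
    assert(count >= 0);
    for (k = 0; k < count; k++) {
        if (write(fd, blocks[k], lens[k]) != (ssize_t)lens[k]) return -1;
        if (fsync(fd) < 0) return -1;
        flushes++;
    }
    return flushes;
}

/* Commit: writes stay in the cache until one explicit commit flushes them.
   Returns the number of forced disk flushes, or -1 on error. */
int write_then_commit(int fd, const char *blocks[], const size_t lens[], int count)
{
    int k;
    assert(count >= 0);
    for (k = 0; k < count; k++)
        if (write(fd, blocks[k], lens[k]) != (ssize_t)lens[k]) return -1;
    if (fsync(fd) < 0) return -1;     /* the commit operation */
    return 1;
}
```

For three writes, the write-through version forces three flushes and the commit version one; in the commit version, anything written but not yet flushed is what could be lost if the server crashed before the commit.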
NFS: Client Caching
• The NFS client also caches data reads, writes, attributes and directory operations in order to reduce network I/O.
• Caching at the client introduces the cache consistency problem since now there is a cache at the client and the server, and there may be more than one client as well, each with its own cache.
• Note that reading is a problem as well as writing, because a write on another client in between two read operations will lead to the second read operation being incorrect.
• In NFS, clients poll the server to check for updates.
NFS: Client Caching
• Let Tc be the time when a cache block was last validated by the client. Let Tm be the time when a block was last modified.
• A cache block is said to be valid at time T if (i) T − Tc