CS 111 Summer 2022
Lecture 13 Page 1
Operating System Principles: File Systems – Allocation, Naming, Performance, and Reliability
Operating Systems
Copyright By PowCoder代写 加微信 powcoder
• Allocating and managing file system free space
• Other performance improvement strategies
• File naming and directories
• File system reliability issues
CS 111 Summer 2022
Lecture 13 Page 2
Free Space and Allocation Issues
• How do I keep track of a file system’s free space?
• How do I allocate new disk blocks when needed?
– And how do I handle deallocation?
CS 111 Summer 2022
Lecture 13 Page 3
The Allocation/Deallocation Problem
• File systems usually aren’t static
• You create and destroy files
• You change the contents of files
– Sometimes extending their length in the process
• Such changes convert unused disk blocks to used blocks (or visa versa)
• Need correct, efficient ways to do that
• Typically implies a need to maintain a free list
of unused disk blocks
CS 111 Summer 2022
Lecture 13 Page 4
Remember Free Lists?
We talked about them in the context of memory allocation
– Primarily for variable sized partitions These aren’t variable sized partitions
– The free elements are fixed size blocks
But there are other issues
– For hard disks, locality matters
– For flash, there are issues of erasure and load leveling
These issues may affect free list organization Lecture 13
CS 111 Summer 2022
Creating a • Allocate a free file control block
– For UNIX
• Search the super-block free I-node list
• Take the first free I-node – For DOS
• Search the parent directory for an unused directory entry • Initialize the new file control block
– With file type, protection, ownership, … • Give new file a name
CS 111 Summer 2022
Lecture 13 Page 6
Extending a File
• Applicationrequestsnewdatabeassignedtoafile
– May be an explicit allocation/extension request
– May be implicit (e.g., write to a currently non-existent block – remember sparse files?)
• Findafreechunkofspace
– Traverse the free list to find an appropriate chunk – Remove the chosen chunk from the free list
• Associateitwiththeappropriateaddressinthefile
– Go to appropriate place in the file or extent descriptor
– Update it to point to the newly allocated chunk
CS 111 Summer 2022
Lecture 13 Page 7
Deleting a File
• Release all the space that is allocated to the file
– For UNIX, return each block to the free block list
– DOS does not free space
• It uses garbage collection
• So it will search out deallocated blocks and add them to the free list at some future time
• Deallocate the file control lock
– For UNIX, zero inode and return it to free list
– For DOS, zero the first byte of the name in the parent directory
CS 111 Summer 2022
• Indicating that the directory entry is no longer in use
Lecture 13
Free Space Maintenance • Filesystemmanagermanagesthefreespace
• Getting/releasingblocksshouldbefastoperations – They are extremely frequent
– We’d like to avoid doing I/O as much as possible
• Unlikememory,itmatterswhichblockwechoose – Can’t write fully-written flash blocks
– May want to do wear-levelling and keep data contiguous – Other issues for hard disk drives
• Free-listorganizationmustaddressbothconcerns – Speed of allocation and deallocation
CS 111 – Ability to allocate preferred device space Summer 2022
Lecture 13 Page 9
Other Performance Improvement Strategies
• Beyond disk layout issues
– Which are only relevant for hard drives, not flash
or other solid state devices
– Though they have their own unique issues
• Transfer size • Caching
CS 111 Summer 2022
Lecture 13 Page 10
Allocation/Transfer Size • Per operation overheads are high
– DMA startup, interrupts, device-specific costs
• Larger transfer units are more efficient
– Amortize fixed per-op costs over more bytes/op – Multi-megabyte transfers are very good
• What unit do we use to allocate storage space? – Small chunks reduce efficiency
– Large fixed size chunks -> internal fragmentation – Variable sized chunks -> external fragmentation
– Tradeoff between fragmentation and efficiency Summer 2022
Lecture 13 Page 11
Flash Drive Issues
• Flash is becoming the dominant technology
– Sales overtook HDD in 2021
• Special flash characteristics:
– Faster than hard disks, slower than RAM
– Any location equally fast to access
– But write-once/read-many access • Until you erase
– You can only erase very large chunks of memory • Think about this as we discuss other file
CS 111 system issues Summer 2022
Lecture 13 Page 12
Caching • Caching for reads
• Caching for writes
CS 111 Summer 2022
Lecture 13 Page 13
Read Caching
• Persistent storage I/O takes a long time
– Deep queues, large transfers improve efficiency – They do not make it significantly faster
• We must eliminate much of our persistent storage I/O
– Maintain an in-memory cache
– Depend on locality, reuse of the same blocks – Check cache before scheduling I/O
CS 111 Summer 2022
Lecture 13 Page 14
Read-Ahead
Request blocks from the device before any process asked for them
Reduces process wait time
When does it make sense?
– When client specifically requests sequential access – When client seems to be reading sequentially
What are the risks?
– May waste device access time reading unwanted blocks
– May waste buffer space on unneeded blocks Summer 2022
Lecture 13 Page 15
Write Caching
• Most device writes go to a write-back cache
– They will be flushed out to the device later • Aggregates small writes into large writes
– If application does less than full block writes
• Eliminates moot writes
– If application subsequently rewrites the same data – If application subsequently deletes the file
• Accumulates large batches of writes
– A deeper queue to enable better disk scheduling
CS 111 Summer 2022
Lecture 13 Page 16
Common Types of Disk Caching
• General block caching
– Popular files that are read frequently
– Files that are written and then promptly re-read
– Provides buffers for read-ahead and deferred write
• Special purpose caches
– Directory caches speed up searches of same dirs – Inode caches speed up re-uses of same file
• Special purpose caches are more complex
– But they often work much better by matching cache granularities to actual needs
CS 111 Summer 2022
Lecture 13 Page 17
Naming in File Systems
• Each file needs some kind of handle to allow us to refer to it
• Low level names (like inode numbers) aren’t usable by people or even programs
• We need a better way to name our files
– User friendly
– Allowing for easy organization of large numbers of files
– Readily realizable in file systems
CS 111 Summer 2022
Lecture 13 Page 18
File Names and Binding
• Filesystemknowsfilesbydescriptorstructures
• Wemustprovidemoreusefulnamesforusers
• Thefilesystemmusthandlename-to-filemapping
– Associating names with new files That’s what we mean by binding – Finding the underlying representation for a given name
– Changing names associated with existing files
– Allowing users to organize files using names
• Namespaces–thetotalcollectionofallnames known by some naming mechanism
– Sometimes means all names that could be created
CS 111 Summer 2022
by the mechanism
Lecture 13 Page 19
Name Space Structure
• There are many ways to structure a name space
– Flat name spaces
• All names exist in a single level
– Graph-based name spaces
• Can be a strict hierarchical tree
• Or a more general graph (usually directed)
• Are all files on the machine under the same name structure?
• Or are there several independent name spaces?
CS 111 Summer 2022
Lecture 13 Page 20
Some Issues in Name Space Structure
• Howmanyfilescanhavethesamename? – One per file system … flat name spaces
– One per directory … hierarchical name spaces
• Howmanydifferentnamescanonefilehave? – A single “true name”
– Only one “true name”, but aliases are allowed – Arbitrarily many
– What’s different about “true names”?
• Dodifferentnameshavedifferentcharacteristics? – Does deleting one name make others disappear too?
CS 111 – Do all names see the same access permissions? Summer 2022
Lecture 13 Page 21
Hierarchical Name Spaces • Essentiallyagraphicalorganization
• Typicallyorganizedusingdirectories – A file containing references to other files – A non-leaf node in the graph
– It can be used as a naming context
• Each process has a current directory
• File names are interpreted relative to that directory
• Nesteddirectoriescanformatree
– A file name describes a path through that tree
– The directory tree expands from a “root” node
• A name beginning from root is called “fully qualified”
– May actually form a directed graph
• If files are allowed to have multiple names
CS 111 Summer 2022
Lecture 13 Page 22
A Rooted Directory Tree root
(/user_2/file_b)
(/user_3/file_c)
(/user_1/file_a)
(/user_3/dir_a)
(/user_3/dir_a/file_b)
Lecture 13 Page 23
(/user_1/dir_a)
(/user_1/dir_a/file_a)
CS 111 Summer 2022
Directories Are Files • Directoriesareaspecialtypeoffile
– Used by OS to map file names into the associated files • Adirectorycontainsmultipledirectoryentries
– Each directory entry describes one file and its name
• Userapplicationsareallowedtoreaddirectories – To get information about each file
– To find out what files exist
• UsuallyonlytheOSisallowedtowritethem
– Users can cause writes through special system calls
– The file system depends on the integrity of directories
CS 111 Summer 2022
Lecture 13 Page 24
Traversing the Directory Tree
• Some entries in directories point to child directories
– Describing a lower level in the hierarchy
• To name a file at that level, name the parent
directory and the child directory, then the file
– With some kind of delimiter separating the file name components
• Moving up the hierarchy is often useful
– Directories usually have special entry for parent
– Many file systems use the name “..” for that
CS 111 Summer 2022
Lecture 13 Page 25
File Names Vs. Path Names
• Insomenamespacesystems,fileshad“truenames” – Only one possible name for a file,
– Kept in a record somewhere
• E.g.,inDOS,afileisdescribedbyadirectoryentry
– Local name is specified in that directory entry
– Fully qualified name is the path to that directory entry • E.g., start from root, to user_3, to dir_a, to file_b
• Whatiffileshadnointrinsicnamesoftheirown? – All names came from directory paths
CS 111 Summer 2022
Lecture 13 Page 26
Example: Unix Directories • Afilesystemthatallowsmultiplefilenames
– So there is no single “true” file name, unlike DOS • File names separated by slashes
– E.g.,/user_3/dir_a/file_b
• Theactualfiledescriptorsaretheinodes
– Directory entries only point to inodes
– Association of a name with an inode is called a hard link – Multiple directory entries can point to the same inode
• ContentsofaUnixdirectoryentry – Name (relative to this directory)
CS 111 – Pointer to the inode of the associated file Summer 2022
Lecture 13 Page 27
But what’s this “.” entry?
It’s a directory entry that points to the directory itself!
Root directory, inode #1
Unix Directories
Useful if we use defaults to attach prefixes to file names.
Directory /user_3, inode #114
Here’s a “..” entry, pointing to the parent directory
CS 111 Summer 2022
Lecture 13 Page 28
Multiple File Names In Unix • Howdolinksrelatetofiles?
– They’re the names only
• All other metadata is stored in the file inode
– File owner sets file protection (e.g., read-only)
• All links provide the same access to the file
– Anyone with read access to file can create new link
– But directories are protected files too
• Not everyone has read or search access to every directory
• Alllinksareequal
– There is nothing special about the first (or owner’s) link
CS 111 Summer 2022
Lecture 13 Page 29
Links and De-allocation Files exist under multiple names
What do we do if one name is removed?
If we also removed the file itself, what about the other names?
– Do they now point to something non-existent? The Unix solution says the file exists as long
as at least one name exists
Implying we must keep and maintain a reference count of links
– In the file inode, not in a directory Summer 2022
Lecture 13 Page 30
Unix Hard Link Example
user_1 user_3
Note that we now associate names with links rather than with files.
file_a /user_1/file_a and
/user_3/dir_a/file_b
are both links to the same inode
CS 111 Summer 2022
Lecture 13 Page 31
Hard Links, Directories, and Files
inode #1, root directory
inode #9, directory
inode #114, directory
inode #29, file
Link count = 2
CS 111 Summer 2022
Lecture 13 Page 32
Symbolic Links
• Adifferentwayofgivingfilesmultiplenames
• Symboliclinksimplementedasaspecialtypeoffile – An indirect reference to some other file
– Contents is a path name to another file
• Filesystemrecognizessymboliclinks
– Automatically opens associated file instead
– If file is inaccessible or non-existent, the open fails
• Symboliclinkisnotareferencetotheinode
– Symbolic links don’t prevent deletion or update link count
– Do not guarantee ability to follow the specified path
CS 111 – Internet URLs are similar to symbolic links Summer 2022
Lecture 13 Page 33
Symbolic Link Example
root user_1 user_3
(/user_1/file_a)
CS 111 Summer 2022
Lecture 13 Page 34
The link count for this file is still 1, though
Symbolic Links, Files, and Directories
inode #1, root directory
inode #9, directory
inode #29, file
inode #114, directory
Link count still equals 1!
inode #46, symlink
/user_1/file_a
CS 111 Summer 2022
Lecture 13 Page 35
File Systems Reliability • Whatcangowronginafilesystem?
• Dataloss
– File or data is no longer present
– Some/all of data cannot be correctly read back
• Filesystemcorruption
– Lost free space
– References to non-existent files
– Corrupted free-list multiply allocates space
– File contents over-written by something else
– Corrupted directories make files un-findable
– Corrupted inodes lose file info/pointers
CS 111 Summer 2022
Lecture 13 Page 36
File Systems – System Failures • Caused by system crashes or OS bugs
• Queued writes that don’t get completed
– Client writes that will not be persisted – Client creates that will not be persisted – Partial multi-block file system updates
• Can also be caused by power failures
– Solution: NVRAM disk controllers
– Solution: Uninterruptable Power Supply (UPS)
CS 111 Summer 2022
Lecture 13 Page 37
The Core Reliability Problem
• File system writes typically involve multiple operations
– Not just writing a data block to disk/flash
– But also writing one or more metadata blocks – The inode, the free list, maybe directory blocks
• All must be committed to disk for the write to succeed
• But each block write is a separate hardware operation
CS 111 Summer 2022
Lecture 13 Page 38
Deferred Writes –
A Worst Case Scenario
• Process allocates a new block to file A – We get a new block (x) from the free list
So we need to write a free list block.
– We write out the updated inode for file A
– We defer free-list write-back (happens all the time)
• The system crashes, and after it reboots – A new process wants a new block for file B – We get block x from the (stale) free list
So we need to write an inode.
• Two different files now contain the same block
– When file A is written, file B gets corrupted
– When file B is written, file A gets corrupted
CS 111 Summer 2022
Lecture 13 Page 39
Application Expectations When
• Applicationsmakesystemcallstoperformwrites
• Whensystemcallreturns,theapplication(anduser) expect the write to be “safe”
– Meaning it will persist even if system crashes
• Wecanblockthewritingapplicationuntilreallysafe
• Butthatmightblockapplicationforquiteawhile…
• Crashesarerare
– So persistence failure caused by them are also rare
– Must we accept big performance penalties for occasional safety?
CS 111 Summer 2022
Lecture 13 Page 40
Buffered Writes
• Don’twaitforthewritetoactuallybepersisted
• KeeptrackofitinRAM
• Telltheapplicationit’sOK
• Atsomelaterpoint,actuallywritetopersistent memory
• Up-sides:
– Less application blocking
– Deeper and optimizable write queues
• Down-side:
– What if there’s a crash between lying and fixing the lie?
CS 111 Summer 2022
Lecture 13 Page 41
Robustness – Ordered Writes
• Carefully ordered writes can reduce potential damage
• Write out data before writing pointers to it – Unreferenced objects can be garbage collected – Pointers to incorrect info are more serious
• Write out deallocations before allocations
– Disassociate resources from old files ASAP
– Free list can be corrected by garbage collection
– Improperly shared data is more serious than missing data
CS 111 Summer 2022
Lecture 13 Page 42
Practicality of Ordered Writes
• Greatly reduced I/O performance
– Eliminates accumulation of near-by operations
– Eliminates consolidation of updates to same block
• May not be possible
– Modern devices may re-order queued requests
• Doesn’t actually solve the problem
– Does not eliminate incomplete writes
– It chooses minor problems over major ones
CS 111 Summer 2022
Lecture 13 Page 43
Robustness – Audit and Repair
• Design file system structures for audit and repair
– Redundant information in multiple distinct places
• Maintain reference counts in each object
• Children have pointers back to their parents • Transaction logs of all updates
– All resources can be garbage collected • Discover and recover unreferenced objects
Mounting initializes a file
system and makes it available for use.
• Audit file system for correctness (prior to mount) – All objects are well formatted
– All references and free-lists are correct and consistent
• Use redundant info to enable automatic repair
CS 111 Summer 2022
Lecture 13 Page 44
Practicality of Audit and Repair
• Integrity checking a file system after a crash – Verifying check-sums, reference counts, etc.
– Automatically correct any inconsistencies
– A standard practice for many years (see fsck(8))
• No longer practical
– Checking a 2TB FS at 100MB/second = 5.5 hours
• We need more efficient partial write solutions – File systems that are immune to them
– File systems that enable very fast recovery
CS 111 Summer 2022
Lecture 13 Page 45
Journaling
• Create a circular buffer journaling device – Journal writes are always sequential
– Journal writes can be batched
– Journal is relatively small, may use NVRAM
• Journal all intended file system updates – Inode updates, block write/alloc/free
• Efficiently schedule actual file system updates – Write-back cache, batching, motion-scheduling
• Journal completions when real writes happen
CS 111 Summer 2022
Lecture 13 Page 46
A Journaling Example
Write to /a/foo
Let’s say that’s two pages of data Replacing two existing pages
Put a start record in the journal
Plus metadata about where to put the two blocks
Then write the two pages to the journal
Put an end record in the journal
Tell the writing process that it’s done
When the OS has spare time, copy the data pages to their true locations
Then get rid of the log entry
CS 111 Summer 2022
Journal space
Data Pages
The storage device
Lecture 13 Page 47
Batched Journal Entries
• Operation is safe after journal entry persisted
– Caller must wait for this to happen
• Small writes are still inefficient
• Accumulate batch until full or max wait time
if there is no current in-memory journal page
allocate a new page
add my transaction to the current journal page if curre
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com