
Recall: Components of a File System
[Figure: file name in directory structure → file number → file header structure (inode) → data blocks; one block = multiple sectors (e.g., 512-byte sectors, 4K block)]

Recall: FAT Properties
• File is collection of disk blocks (Microsoft calls them “clusters”)
• FAT is array of integers mapped 1-1 with disk blocks
– Each integer is either:
» Pointer to next block in file; or
» End-of-file flag; or
» Free block flag
• File Number is index of root of block list for the file
– Follow list to get block #
– Directory must map name to block number at start of file
[Figure: FAT as an array indexed by block number; file number 31 is the head of a chain linking File 31's blocks 0, 1, 2, 3, which are scattered among the disk blocks]
• But: Where is FAT stored?
– Beginning of disk, before the data blocks
– Usually 2 copies (to handle errors)
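As a concrete illustration of the chain walk above, here is a minimal sketch in C, assuming the FAT has already been loaded into an in-memory array; the tiny FAT, the marker values, and the file number used here are made up for the example.

#include <stdio.h>

#define N_BLOCKS 8
#define FAT_EOF  (-1)            /* end-of-file marker (illustrative) */
#define FAT_FREE (-2)            /* free-block marker  (illustrative) */

/* Tiny example FAT for file number 2.  Chain: 2 -> 5 -> 3 -> EOF,
 * i.e. a 3-block file whose blocks are scattered on disk. */
static int fat[N_BLOCKS] = { FAT_FREE, FAT_FREE, 5, FAT_EOF, FAT_FREE, 3, FAT_FREE, FAT_FREE };

/* Return the disk block holding block `offset` of the file, or -1. */
static int fat_block_for_offset(int file_number, int offset) {
    int block = file_number;               /* file number = head of chain */
    for (int i = 0; i < offset; i++) {
        if (block < 0) return -1;          /* chain ended too early */
        block = fat[block];
    }
    return (block < 0) ? -1 : block;
}

int main(void) {
    for (int off = 0; off < 4; off++)
        printf("block offset %d -> disk block %d\n", off, fat_block_for_offset(2, off));
    return 0;
}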

Recall: Multilevel Indexed Files (Original 4.1 BSD)
• Sample file in multilevel indexed format:
– 10 direct ptrs, 1K blocks
– How many accesses for block #23 (assume file header accessed on open)?
» Two: One for indirect block, one for data
– How about block #5?
» One: One for data
– Block #340?
» Three: double indirect block, indirect block, and data
• UNIX 4.1 Pros and cons
– Pros:
» Simple (more or less)
» Files can easily expand (up to a point)
» Small files particularly cheap and easy
– Cons:
» Lots of seeks
» Very large files must read many indirect blocks (four I/Os per block!)
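A small sketch of how those access counts fall out, assuming 10 direct pointers, 1 KB blocks, and 4-byte block pointers (so 256 pointers per indirect block); this is an illustration, not code from the lecture.

#include <stdio.h>

#define NDIRECT      10
#define PTRS_PER_BLK 256   /* 1 KB block / 4-byte pointer */

/* How many disk accesses to reach logical block n, assuming the
 * inode itself is already in memory (fetched at open)? */
static int accesses_for_block(long n) {
    if (n < NDIRECT)                           return 1; /* data only             */
    n -= NDIRECT;
    if (n < PTRS_PER_BLK)                      return 2; /* indirect + data       */
    n -= PTRS_PER_BLK;
    if (n < (long)PTRS_PER_BLK * PTRS_PER_BLK) return 3; /* dbl ind + ind + data  */
    return 4;                                            /* triple indirect path  */
}

int main(void) {
    printf("block 5:   %d accesses\n", accesses_for_block(5));    /* 1 */
    printf("block 23:  %d accesses\n", accesses_for_block(23));   /* 2 */
    printf("block 340: %d accesses\n", accesses_for_block(340));  /* 3 */
    return 0;
}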

CASE STUDY:
BERKELEY FAST FILE SYSTEM (FFS)

Fast File System (BSD 4.2, 1984)
• Same inode structure as in BSD 4.1
– same file header and triply indirect blocks like we just studied
– Some changes to block sizes from 1024 ⇒ 4096 bytes for performance
• Paper on FFS: “A Fast File System for UNIX”
– McKusick, Joy, Leffler, and Fabry
– Off the “resources” page of course website
– Take a look!
• Optimization for Performance and Reliability:
– Distribute inodes among different tracks to be closer to data
– Uses bitmap allocation in place of freelist
– Attempt to allocate files contiguously
– 10% reserved disk space
– Skip-sector positioning (mentioned later)

FFS Changes in Inode Placement: Motivation
• In early UNIX and DOS/Windows’ FAT file system, headers stored in special array in outermost cylinders
– Fixed size, set when disk is formatted
» At formatting time, a fixed number of inodes are created
» Each is given a unique number, called an “inumber”
• Problem #1: Inodes all in one place (outer tracks)
– Head crash potentially destroys all files by destroying inodes
– Inodes not close to the data that they point to
» To read a small file, seek to get header, seek back to data
• Problem #2: When creating a file, you don’t know how big it will become (in UNIX, most writes are by appending)
– How much contiguous space do you allocate for a file?
– Makes it hard to optimize for performance

FFS Locality: Block Groups
• The UNIX BSD 4.2 (FFS) distributed the header information (inodes) closer to the data blocks
– Often, inode for file stored in same “cylinder group” as parent directory of the file
– makes an “ls” of that directory run very fast
• File system volume divided into set of block groups
– Close set of tracks
• Data blocks, metadata, and free space interleaved within block group
– Avoid huge seeks between user data and system structure
• Put directory and its files in common block group
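A minimal sketch of the placement policy just described: put a new file in the block group of its parent directory. The structure and the fallback loop here are invented for the sketch; the real FFS allocator is more elaborate.

#include <stdio.h>

#define NGROUPS 32

struct group_info {
    int free_inodes;
    int free_blocks;
};

static struct group_info groups[NGROUPS];

/* Pick the block group for a new file: prefer the parent directory's
 * group so the file lands near its directory; otherwise fall back to
 * any group with free space (simplified). */
static int group_for_new_file(int parent_dir_group) {
    if (groups[parent_dir_group].free_inodes > 0 &&
        groups[parent_dir_group].free_blocks > 0)
        return parent_dir_group;
    for (int g = 0; g < NGROUPS; g++)
        if (groups[g].free_inodes > 0 && groups[g].free_blocks > 0)
            return g;
    return -1;   /* file system full */
}

int main(void) {
    for (int g = 0; g < NGROUPS; g++) { groups[g].free_inodes = 100; groups[g].free_blocks = 1000; }
    printf("new file under a dir in group 7 -> group %d\n", group_for_new_file(7));
    return 0;
}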

FFS Locality: Block Groups (Con’t)
• First-Free allocation of new file blocks
– To expand file, first try successive blocks in bitmap, then choose new range of blocks
– Few little holes at start, big sequential runs at end of group
– Avoids fragmentation
– Sequential layout for big files
• Important: keep 10% or more free!
– Reserve space in the Block Group
• Summary: FFS Inode Layout Pros
– For small directories, can fit all data, file headers, etc. in same cylinder ⇒ no seeks!
– File headers much smaller than whole block (a few hundred bytes), so multiple headers fetched from disk at same time
– Reliability: whatever happens to the disk, you can find many of the files (even if directories disconnected)

UNIX 4.2 BSD FFS First Fit Block Allocation
• Fills in the small holes at the start of block group
• Avoids fragmentation, leaves contiguous free space at end
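A minimal sketch of the first-fit scan over a block group's free bitmap (illustrative, not the BSD code); note how it fills small holes near the start first, leaving long runs free at the end of the group.

#include <stdio.h>
#include <stdint.h>

#define BLOCKS_PER_GROUP 1024

/* 1 bit per block: 1 = free, 0 = in use (convention chosen for the sketch). */
static uint8_t free_bitmap[BLOCKS_PER_GROUP / 8];

static int  bit_is_free(int b) { return (free_bitmap[b / 8] >> (b % 8)) & 1; }
static void mark_used(int b)   { free_bitmap[b / 8] &= ~(1 << (b % 8)); }

/* First-fit: scan from the start of the group and take the first free block. */
static int alloc_block_first_fit(void) {
    for (int b = 0; b < BLOCKS_PER_GROUP; b++) {
        if (bit_is_free(b)) {
            mark_used(b);
            return b;
        }
    }
    return -1;   /* group is full */
}

int main(void) {
    for (size_t i = 0; i < sizeof free_bitmap; i++) free_bitmap[i] = 0xFF; /* all free */
    mark_used(0); mark_used(2);                       /* leave a hole at block 1 */
    printf("allocated: %d\n", alloc_block_first_fit());  /* fills the hole: block 1 */
    printf("allocated: %d\n", alloc_block_first_fit());  /* then block 3 */
    return 0;
}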

Attack of the Rotational Delay
• Problem 3: Missing blocks due to rotational delay
– Issue: Read one block, do processing, and read next block. In meantime, disk has continued turning: missed next block! Need 1 revolution/block!
– Solution 1: Skip sector positioning (“interleaving”)
» Place the blocks from one file on every other block of a track: give time for processing to overlap rotation
» Can be done by OS or in modern drives by the disk controller
– Solution 2: Read ahead: read next block right after first, even if application hasn’t asked for it yet
» This can be done either by OS (read ahead)
» By disk itself (track buffers) – many disk controllers have internal RAM that allows them to read a complete track
• Modern disks + controllers do many things “under the covers”
– Track buffers, elevator algorithms, bad block filtering
[Figure: track laid out with skip-sector interleaving; track buffer in the disk controller holds a complete track]
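A small sketch of skip-sector (“interleaved”) placement, assuming a hypothetical 16-sector track and an interleave factor of 2, so logically consecutive blocks land on every other physical sector and the CPU gets one sector time to process each block.

#include <stdio.h>

#define SECTORS_PER_TRACK 16
#define INTERLEAVE        2    /* place file blocks on every other sector */

/* Map the i-th logical block of a file onto a physical sector of the track.
 * With interleave 2, sectors 0,2,4,... are used first, then 1,3,5,... */
static int physical_sector(int logical) {
    int idx = logical * INTERLEAVE;
    return (idx % SECTORS_PER_TRACK) + (idx / SECTORS_PER_TRACK);
}

int main(void) {
    for (int i = 0; i < SECTORS_PER_TRACK; i++)
        printf("logical %2d -> physical sector %2d\n", i, physical_sector(i));
    return 0;
}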

UNIX 4.2 BSD FFS
• Pros:
– Efficient storage for both small and large files
– Locality for both small and large files
– Locality for metadata and data
– No defragmentation necessary!
• Cons:
– Inefficient for tiny files (a 1-byte file requires both an inode and a data block)
– Inefficient encoding when file is mostly contiguous on disk
– Need to reserve 10-20% of free space to prevent fragmentation

Linux Example: Ext2/3 Disk Layout
• Disk divided into block groups
– Provides locality
– Each group has two block-sized bitmaps (free blocks/inodes)
– Block sizes settable at format time: 1K, 2K, 4K, 8K…
• Actual inode structure similar to 4.2 BSD – with 12 direct pointers
• Ext3: Ext2 with Journaling
– Several degrees of protection with comparable overhead
– We will talk about Journalling later
• Example: create a file1.dat under /dir1/ in Ext3

Recall: Directory Abstraction
• Directories are specialized files
– Contents: List of <file name : file number> pairs
[Figure: directory tree containing /usr, /usr/lib4.3, and /usr/lib4.3/foo]
• System calls to access directories
– open / creat traverse the structure
– mkdir / rmdir add/remove entries
– link / unlink (rm)
• libc support
– DIR * opendir (const char *dirname)
– struct dirent * readdir (DIR *dirstream)
– int readdir_r (DIR *dirstream, struct dirent *entry, struct dirent **result)
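A short, self-contained example of the libc directory API listed above (readdir_r is deprecated in current glibc, so this uses readdir):

#include <stdio.h>
#include <dirent.h>

/* List the names in a directory using the libc directory API. */
int main(int argc, char *argv[]) {
    const char *path = (argc > 1) ? argv[1] : ".";
    DIR *d = opendir(path);
    if (d == NULL) { perror("opendir"); return 1; }

    struct dirent *entry;
    while ((entry = readdir(d)) != NULL)
        printf("%s\n", entry->d_name);   /* each entry pairs a name with an inumber (d_ino) */

    closedir(d);
    return 0;
}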

Hard Links
• Hard link
– Mapping from name to file number in the directory structure
– First hard link to a file is made when file created
– Create extra hard links to a file with the link() system call
– Remove links with unlink() system call
• When can file contents be deleted?
– When there are no more hard links to the file
– Inode maintains reference count for this purpose
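A small illustration of link()/unlink() and the inode's reference count; the file names used here are placeholders for this sketch.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

static void show_nlink(const char *path) {
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: inode %lu, link count %lu\n",
               path, (unsigned long) st.st_ino, (unsigned long) st.st_nlink);
}

int main(void) {
    int fd = open("demo.txt", O_CREAT | O_WRONLY, 0644);   /* first hard link */
    if (fd < 0) { perror("open"); exit(1); }
    close(fd);
    show_nlink("demo.txt");                 /* link count 1 */

    if (link("demo.txt", "demo-link.txt") < 0) { perror("link"); exit(1); }
    show_nlink("demo.txt");                 /* link count 2: two names, one inode */

    unlink("demo.txt");                     /* remove one name */
    show_nlink("demo-link.txt");            /* link count back to 1 */

    unlink("demo-link.txt");                /* last link gone: contents can be freed */
    return 0;
}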

Soft Links (Symbolic Links)
• Soft link or Symbolic Link or Shortcut
– Directory entry contains the path and name of the file
– Map one name to another name
• Contrast these two different types of directory entries:
– Normal directory entry: <name : file number>
– Symbolic link: <name : path of another file>
• OS looks up destination file name each time program accesses source file name
– Lookup can fail (error result from open)
• Unix: Create soft links with symlink syscall
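A short example of creating and inspecting a soft link with the symlink and readlink calls; the file names are placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* Directory entry "alias.txt" stores the *name* "target.txt",
     * not a file number; the target need not even exist. */
    if (symlink("target.txt", "alias.txt") < 0) { perror("symlink"); exit(1); }

    char buf[256];
    ssize_t n = readlink("alias.txt", buf, sizeof(buf) - 1);
    if (n < 0) { perror("readlink"); exit(1); }
    buf[n] = '\0';
    printf("alias.txt -> %s\n", buf);

    /* open("alias.txt", ...) would follow the link and can fail with
     * ENOENT if target.txt does not exist. */
    unlink("alias.txt");
    return 0;
}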

Directory Traversal
• What happens when we open /home/cs162/stuff.txt?
• “/” – inumber for root inode configured into kernel, say 2
– Read inode 2 from its position in inode array on disk
– Extract the direct and indirect block pointers
– Determine block that holds root directory (say block 49358)
– Read that block, scan it for “home” to get inumber for this directory (say 8086)
• Read inode 8086 for /home, extract its blocks, read block (say 7756), scan it for “cs162” to get its inumber (say 732)
• Read inode 732 for /home/cs162, extract its blocks, read block (say 12132), scan it for “stuff.txt” to get its inumber, say 9909
• Read inode 9909 for /home/cs162/stuff.txt
• Set up file description to refer to this inode so reads / writes can access the data blocks referenced by its direct and indirect pointers
• Check permissions on the final inode and each directory’s inode…
[Figure: traversal chain: inode 2 → block 49358 (“home”:8086) → inode 8086 → block 7756 (“cs162”:732) → inode 732 → block 12132 (“stuff.txt”:9909) → inode 9909]
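A schematic version of that walk; read_inode and lookup_in_dir are hypothetical helpers standing in for the real on-disk operations.

#include <string.h>

#define ROOT_INUMBER 2   /* configured into the kernel, per the slide */

/* Hypothetical helpers: fetch an inode, and search a directory's data
 * blocks for a name, returning the entry's inumber (or -1). */
struct inode;                                  /* opaque for this sketch */
extern struct inode *read_inode(int inumber);
extern int lookup_in_dir(struct inode *dir, const char *name);

/* Resolve an absolute path like "/home/cs162/stuff.txt" to an inumber. */
int path_to_inumber(const char *path) {
    char buf[256];
    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    int inumber = ROOT_INUMBER;
    for (char *name = strtok(buf, "/"); name != NULL; name = strtok(NULL, "/")) {
        struct inode *dir = read_inode(inumber);   /* read directory's inode     */
        inumber = lookup_in_dir(dir, name);        /* scan its blocks for `name` */
        if (inumber < 0)
            return -1;                             /* component not found        */
    }
    return inumber;  /* caller then reads this inode and sets up the file description */
}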

Large Directories: B-Trees (dirhash)
in FreeBSD, NetBSD, OpenBSD

CASE STUDY: WINDOWS NTFS

New Technology File System (NTFS)
• Default on modern Windows systems
• Variable length extents
– Rather than fixed blocks
• Instead of FAT or inode array: Master File Table
– Like a database, with max 1 KB size for each table entry
– Everything (almost) is a sequence of <attribute : value> pairs
» Meta-data and data
• Each entry in MFT contains metadata and:
– File’s data directly (for small files)
– A list of extents (start block, size) for file’s data
– For big files: pointers to other MFT entries with more extent lists
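A rough sketch of the shape of these structures; the field names and sizes are illustrative, not the actual on-disk NTFS format.

#include <stdint.h>

/* An extent: a variable-length run of contiguous blocks. */
struct extent {
    uint64_t start_block;
    uint64_t block_count;
};

/* Simplified 1 KB MFT entry, illustrating the three cases above. */
struct mft_entry {
    /* metadata: timestamps, owner, flags, file name attribute(s), ... */
    enum { DATA_RESIDENT, DATA_EXTENTS, DATA_IN_OTHER_ENTRIES } data_kind;
    union {
        uint8_t resident_data[700];      /* small file: data lives in the entry */
        struct {                         /* medium file: list of extents        */
            uint32_t      n_extents;
            struct extent extents[40];
        } ext;
        struct {                         /* huge file: more MFT entries, each   */
            uint32_t n_entries;          /* holding further extent lists        */
            uint64_t mft_entry_numbers[80];
        } more;
    } u;
};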

• Master File Table
– Database with flexible 1KB entries for metadata/data
– Variable-sized attribute records (data or metadata)
– Extend with variable depth tree (non-resident)
• Extents – variable length contiguous regions
– Block pointers cover runs of blocks
– Similar approach in Linux (ext4)
– File create can provide hint as to size of file
• Journaling for reliability
– Discussed later
http://ntfs.com/ntfs-mft.htm

NTFS Small File: Data stored with Metadata
[Figure: a single MFT record for a small file: standard info (create time, modify time, access time, owner id, security specifier, flags (RO, hidden, sys)), file name, attribute list, and a resident data attribute holding the file’s data]

NTFS Medium File: Extents for File Data

NTFS Large File: Pointers to Other MFT Records

NTFS Huge, Fragmented File: Many MFT Records

NTFS Directories
• Directories implemented as B Trees
• File’s number identifies its entry in MFT
• MFT entry always has a file name attribute
– Human readable name, file number of parent dir
• Hard link? Multiple file name attributes in MFT entry

MEMORY MAPPED FILES

Memory Mapped Files
• Traditional I/O involves explicit transfers between buffers in the process address space and regions of a file
– This involves multiple copies into caches in memory, plus system calls
• What if we could “map” the file directly into an empty region of our address space?
– Implicitly “page it in” when we read it
– Write it and “eventually” page it out
• Executable files are treated this way when we exec the process!!

Recall: Who Does What, When?
[Figure: on a memory access, the MMU translates the virtual address to a physical address (frame# + offset); if the PTE is invalid, a page fault traps to the OS page fault handler, which loads the page from disk, updates the PT entry, and restarts the instruction]

Using Paging to mmap() Files
[Figure: mmap() maps a file into a region of the virtual address space; the OS creates PT entries for the mapped region marked as “backed” by the file. On a page fault in that region, the page fault handler reads the page in from the file; once resident, accesses are satisfied from memory]

mmap() system call
• May map a specific region or let the system find one for you
– Tricky to know where the holes are
• Used both for manipulating files and for sharing between processes

An mmap() Example
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int something = 162;

int main(int argc, char *argv[]) {
  int myfd;
  char *mfile;

  printf("Data at: %16lx\n", (long unsigned int) &something);
  printf("Heap at : %16lx\n", (long unsigned int) malloc(1));
  printf("Stack at: %16lx\n", (long unsigned int) &mfile);

  /* Open the file */
  myfd = open(argv[1], O_RDWR | O_CREAT, 0644);
  if (myfd < 0) { perror("open failed!"); exit(1); }

  /* Map the file */
  mfile = mmap(0, 10000, PROT_READ|PROT_WRITE, MAP_FILE|MAP_SHARED, myfd, 0);
  if (mfile == MAP_FAILED) { perror("mmap failed"); exit(1); }
  printf("mmap at : %16lx\n", (long unsigned int) mfile);

  puts(mfile);                               /* print current file contents */
  strcpy(mfile + 20, "Let's write over it"); /* store through the mapping   */
  close(myfd);
  return 0;
}

A sample run, where the file test initially contains:

This is line one
This is line two
This is line three
This is line four

$ ./mmap test
Data at: ...
Heap at : ...
Stack at: ...
mmap at : ...
This is line one
This is line two
This is line three
This is line four

(The printed addresses, e.g. 7f8a33c04b70 or 7fff59e9db10 in the original run, vary from run to run.) The strcpy() through the mapping has modified the file itself:

$ cat test
This is line one
ThiLet's write over its line three
This is line four

Sharing through Mapped Files
[Figure: two processes map the same file into (different) regions of their virtual address spaces; stores by one process’s instructions become visible to loads by the other]
• Also: anonymous memory between parents and children
– no file backing – just swap space

THE BUFFER CACHE

Buffer Cache
• Kernel must copy disk blocks to main memory to access their contents and write them back if modified
– Could be data blocks, inodes, directory contents, etc.
– Possibly dirty (modified and not written back)
• Key Idea: Exploit locality by caching disk data in memory
– Name translations: Mapping from paths → inodes
– Disk blocks: Mapping from block address → disk content
• Buffer Cache: Memory used to cache kernel resources, including disk blocks and name translations
– Can contain “dirty” blocks (with modifications not on disk)

File System Buffer Cache
• OS implements a cache of disk blocks for efficient access to data, directories, inodes, freemap
[Figure: file system state on disk (data blocks, directory blocks, inodes, free bitmap) with copies of blocks held in the in-memory buffer cache; each cached block has an associated state]

File System Buffer Cache: open
• Directory lookup, repeat as needed:
– load block of directory
– search for map
[Figure: the directory’s data blocks pulled into the buffer cache during the lookup]

File System Buffer Cache: open (con’t)
• Directory lookup, repeat as needed:
– load block of directory
– search for map
• Create reference via open file descriptor
[Figure: the file’s inode, found via its <name : inumber> directory entry, now cached; the open file description refers to it]

File System Buffer Cache: Read?
• Read Process
– From inode, traverse index structure to find data block
– load data block
– copy all or part to read data buffer
[Figure: the requested data block loaded into the buffer cache and copied (all or part) to the user’s read buffer]

File System Buffer Cache: Write?
• Process similar to read, but may allocate new blocks (update free map), blocks need to be written back to disk; inode?
[Figure: newly allocated data blocks, the free bitmap, and the inode updated in the buffer cache, awaiting write-back to disk]

File System Buffer Cache: Eviction?
• Blocks being written back to disk go through a transient state
[Figure: a block in the buffer cache marked “dirty” while it is being written back to disk]

Buffer Cache Discussion
• Implemented entirely in OS software
– Unlike memory caches and TLB
• Blocks go through transitional states between free and in-use
– Being read from disk, being written to disk
– Other processes can run, etc.
• Blocks are used for a variety of purposes
– inodes, data for dirs and files, freemap
– OS maintains pointers into them
• Termination – e.g., process exit – open, read, write
• Replacement – what to do when it fills up?
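A minimal sketch of what a buffer cache entry and its lookup might look like, with the transitional states mentioned above; all names are illustrative, not from any particular kernel.

#include <stdint.h>

/* Transitional states a cached block moves through. */
enum buf_state { BUF_FREE, BUF_READING, BUF_VALID, BUF_DIRTY, BUF_WRITING };

struct buf {
    uint64_t       block_no;    /* which disk block this frame caches      */
    enum buf_state state;
    int            refcount;    /* pointers held by the OS / in-flight I/O */
    struct buf    *hash_next;   /* chained hash on block_no for lookup     */
    struct buf    *lru_prev, *lru_next;   /* LRU list for replacement      */
    uint8_t        data[4096];
};

/* get_block(block_no):
 *   1. hash block_no; if found, bump refcount and return the entry
 *   2. otherwise take a victim from the LRU tail (writing it back first
 *      if BUF_DIRTY), set state = BUF_READING, issue the disk read, and
 *      mark it BUF_VALID when the I/O completes
 * release_block(b): drop refcount; when it hits 0, move b to the LRU head.
 */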

File System Caching
• Replacement policy? LRU
– Can afford overhead of a full LRU implementation
– Advantages:
» Works very well for name translation
» Works well in general as long as memory is big enough to accommodate a host’s working set of files
– Disadvantages:
» Fails when some application scans through file system, thereby flushing the cache with data used only once
» Example: find . -exec grep foo {} \;
• Other Replacement Policies?
– Some systems allow applications to request other policies
– Example, ‘Use Once’:
» File system can discard blocks as soon as they are used

File System Caching (con’t)
• Cache Size: How much memory should the OS allocate to the buffer cache vs virtual memory?
– Too much memory to the file system cache ⇒ won’t be able to run many applications
– Too little memory to file system cache ⇒ many applications may run slowly (disk caching not effective)
– Solution: adjust boundary dynamically so that the disk access rates for paging and file access are balanced

File System Prefetching
• Read Ahead Prefetching: fetch sequential blocks early
– Key Idea: exploit fact that most common file access is sequential by prefetching subsequent disk blocks
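From user space, an application can at least cooperate with kernel read-ahead by declaring its access pattern; a small sketch using the standard posix_fadvise call (the file argument is whatever the user supplies):

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (64 * 1024)

int main(int argc, char *argv[]) {
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); exit(1); }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); exit(1); }

    /* Tell the kernel we will read sequentially, so it can enlarge its
     * read-ahead window for this file. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[CHUNK];
    ssize_t n;
    long total = 0;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;   /* while we process this chunk, the kernel can
                         already be fetching the next blocks */
    printf("read %ld bytes\n", total);
    close(fd);
    return 0;
}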
