The Design of File systems
Recap: How your application reaches storage
Applications
Copyright By PowCoder代写 加微信 powcoder
I/O libraries Buffer
data fread/fwrite — input.bin/output.bin fread/fwrite — input.bin/output.bin
read/write — 0, 512, 4096, … (block address)
read/write — block addresses
Device Driver Device Driver
read/write — block addresses
Device Controller Device Controller Device #2 2 Device #3 Device #4
File system
Device independent I/O interface (e.g. ioctl)
evice Driver
Kernel data
Device Controller
Recap: The “file” abstraction
• Operations:open,close,read,write
What is a file system?
• Alogically-structuredcollectionoffiles
• Definesthenamespaceofafile
• Providespersistence,accesscontrol,andotherprotection/security mechanisms
What is a file?
• Alogicalunitofstorage(e.g.anmp3),adeviceoradirectory
Files / File System provides an abstraction for secondary 3
Abstractions in operating systems
Process — the abstraction of a von Neumann machine
Thread — the abstraction of a processor
Virtual memory — the abstraction of memory
File system — the abstraction of space/location on a storage device, the storage device itself, as well as other peripherals
The “Old” UNIX File System BSD’s Fast File System Log-structured File System
Recap: Numbering the disk space with block addresses
0 Disk blocks 7
8 15 16 23 24 31 32 39 40 47 48 55 56 63
Questions for file systems
• Howdowemanagefileandfilesystemmetadata? How do we allocate storage space?
How do we make the file system fast?
How do we ensure file integrity?
How do we locate files?
• Howdowemanagehierarchicalnamespace?
How the original UNIX file system use disk blocks
0 Disk blocks 7 8 15
16 23 24 31 32 39 40 47 48 55 56 63
Information about the “file system” itself. (e.g. free blocks)
Information about the “files”.e.g.inodes
File System Metadata (Superblock)
File Metadata
Superblock — metadata of the file system
Contains critical file system information • Thevolumesize
• Thenumberofnodes
• Pointertotheheadofthefreelist
Located at the very beginning of the file system
inode — metadata of each file
File types: directory, file File size
Permission
Attributes
Unix inode
File types: directory, file File size
Permission
Attributes
Types of pointers:
• Direct: Access single data block
• Single Indirect: Access n data blocks
• Double indirect: Access n2 data blocks
• Triple indirect: Access n3 data blocks
If data block size is 512B and n = 256: max file size = (12+256+2562+2563)*512 = 8GB
inode has 15 pointers: 12 direct, 1 each
single-, double-, and triple-indirect
For a file /home/hungwei/CS202/foo.c , how many disk accesses does the original/old, unoptimized UNIX file system need to perform to reach the actual file content in the worst case?
E. Atleast10
Number of disk accesses
https://www.pollev.com/hungweitseng close in
For a file /home/hungwei/CS202/foo.c , how many disk accesses does the original/old, unoptimized UNIX file system need to perform to reach the actual file content in the worst case?
E. Atleast10
Number of disk accesses
What must be done to reach your files
Procedure: File system will…
• Open“/”file(Thisisinknownfromsuperblock.) • Locateentryfor“home,”openthatfile
• Locateentryfor“hungwei”,openthatfile
• Locateentryfor“foo.c”andopenthatfile
Let’s use “strace” to see what happens 17
Scenario: User wants to access
/home/hungwei/CS202/foo.c
How to reach /home/hungwei/CS202/foo.c
Superblock Disk blocks
8 15 16 23 24 31 32 39 40 47 48 55 56 63
index node (inode)
0 755 dir 24
permission
File System Metadata (Superblock)
File Metadata
permission
0 755 dir 31
permission
0 755 dir 34
permission
0 755 dir 44
permission
0 755 file 55
A Fast File System for UNIX
Marshall K. McKusick, . Joy, . Leffler and . Fabry
Computer Systems Research Group
Why do we care about fast file system
We want better performance!!! We want new features!
Let’s make file systems great again!
Problems in the “old” file system
Lots of seeks when accessing a file
• inodesareseparatedfromdatalocations
• datablocksbelongtothesamefilecanbespreadout
• 1out11inourpreviousexample—lessthan10%iffilesaresmall Limited file size
Crash recovery
Device oblivious
Low bandwidth utilization
• onlytheverylastisretrievingdata
What does fast file system propose?
Cylinder groups Larger block sizes Fragments Allocators
New features
• longfilenames • filelocking
• symboliclinks • renaming
How many of the following does FFS propose? ! Cylindergroupstoimproveaverageaccesstime ” Largerblocksizetoimprovebandwidth
# Largerblocksizetosupportlargerfiles
$ Replicatedsuperblocksfordatarecovery
% Pre-allocateblockstoimprovewriteperformance A. 1
What FFS proposes?
https://www.pollev.com/hungweitseng close in
How FFS use disk blocks
0 Disk blocks 7
8 15 16 23 24 31 32 39 40 47 48 55 56 63
File System Metadata (Superblock)
Backup Superblock
File Metadata
File Metadata
cylinder group
cylinder group
Cylinder groups
Consists of one or more consecutive cylinders on a disk
Each cylinder group contains the following
• redundantcopyofthesuperblock
• what’s the benefit?
• why not a cylinder group for all superblocks? inode space
bitmap of free blocks within the cylinder group summary of block usage
Improves average disk access time
• Allocatingblockswithinthesamecylindergroupforthesamefile • Placinginodealongwithdatawithinthesamecylindergroup
Cylinder groups
Which of the following factor of disk access can cylinder
groups help to improve when manage files?
A. Seektime
B. Rotationaldelay
C. Data transfer latency D. AandB
https://www.pollev.com/hungweitseng close in
Cylinder groups
Which of the following factor of disk access can cylinder
groups help to improve when manage files?
A. Seektime
B. Rotationaldelay
C. Data transfer latency D. AandB
Larger block sizes
Each file can only contain a fixed number of blocks Cannot fully utilize the I/O interface bandwidth
• EachI/Orequestcancarrymoredatatoimprovebandwidth However, larger block size leads to internal fragments
The block size of the old file system is aligned with the block
(sector) size of the disk
The new file system supports larger block sizes
• Supportslargerfiles
How larger block sizes improves bandwidth
SATA II (300MB/s in theory), 7200 R.P.M., seek time around 8 ms. Assume the controller overhead is 0.2ms. What’s the bandwidth of accessing 512B sectors and 4MB consecutive sectors?
Latency = seek time + rotational delay + transfer time + controller overhead
= 8 ms + 4.17 ms + 13.33 ms + 0.2 ms = 25.69 ms Bandwidth = volume_of_data over period_of_time
= 4MB = 155.7 MB/sec Trading latencies with bandwidth 25.69ms
= 8 ms + 4.17 ms + 0.00167 us + 0.2 ms = 12.36 ms
= 0.5KB = 40.45KB/sec 12.36ms
Addressable units within a block
Allocates fragments from a block with free fragments if the
writing file content doesn’t fill up a block
Global allocators Allocators
• Trytoallocateinodesbelongtosamefiletogether
• Spreadoutdirectoriesacrossthedisktoincreasethesuccessfulrate of the previous
Local allocators — allocate data blocks upon the request of the
global allocator
Rotationally optimal block in the same cylinder
Allocate a block from the cylinder group if global allocator needs one
Search for blocks from other cylinder group if the current cylinder group is exhausted
How many of the following does FFS propose? ! Cylindergroupstoimproveaverageaccesstime ” Largerblocksizetoimprovebandwidth
# Largerblocksizetosupportlargerfiles
$ Replicatedsuperblocksfordatarecovery
% Pre-allocateblockstoimprovewriteperformance A. 1
What FFS proposes?
following statements is/are true
How is BSD FFS doing?
Regarding the performance of BSD FFS, please identify how many of the
! BSDFFSisperformingbetterthanUFSregardlessofreadsandwrites
” TheperformanceofreadingdataisfasterthanwritingdatainBSDFFS,while the reading is slower than writing in UFS
# ThebandwidthutilizationofBSDFFSisbetterthanUFS $ TheCPUoverheadofBSDFFSishigherthanUFS
https://www.pollev.com/hungweitseng close in
Performance of FFS
not the case for old FS
writes in FFS are slower than reads
CPU load is fine given that UFS is way too slow!
How is BSD FFS doing?
Regarding the performance of BSD FFS, please identify how many of the
following statements is/are true
! BSDFFSisperformingbetterthanUFSregardlessofreadsandwrites
” TheperformanceofreadingdataisfasterthanwritingdatainBSDFFS,while the reading is slower than writing in UFS
# ThebandwidthutilizationofBSDFFSisbetterthanUFS $ TheCPUoverheadofBSDFFSishigherthanUFS
Larger overheads than the old file system as the new file system allocates blocks after write requests occur — Why not optimize for writes?
10% of overall time
writes are a lot faster already
Consistency
Writing metadata is synchronous rather than asynchronous —
What’s the benefit of synchronous writes?
What does fast file system propose?
• Cylinder groups — improve spread-out data locations
• Largerblocksizes—improvebandwidthandfilesizes
• Fragments —improvelowspaceutilizationduetolargeblocks
Allocators — address device oblivious
New features
• longfilenames • filelocking
• symboliclinks • renaming
The design and implementation of a
log-structured file system
and . Ousterhout Univ. of California, Berkeley
systems trying address?
How many of the following problems is/are Log-structured file
! Theperformanceofsmallrandomwrites
” Theefficiencyoflargefileaccesses
# Thespaceoverheadofmetadatainthefilesystem
$ Reducethemainmemoryspaceusedbythefilesystem A. 0
https://www.pollev.com/hungweitseng close in
How many of the following problems is/are Log-structured file
systems trying address?
! Theperformanceofsmallrandomwrites
” Theefficiencyoflargefileaccesses
# Thespaceoverheadofmetadatainthefilesystem
$ Reducethemainmemoryspaceusedbythefilesystem A. 0
Writes will dominate the traffic between main memory and disks — Unix FFS is designed under the assumption that only 10% of the traffic are writes
Whoiswrong? UFSispublishedin1984
As system memory grows, frequently read data can be cached
Every modern OS aggressively caches — use “free” in Linux to
Gaps between sequential access and random access Conventional file systems are not RAID aware
efficiently
systems trying address?
How many of the following problems is/are Log-structured file
! Theperformanceofsmallrandomwrites
” Theefficiencyoflargefileaccesses
# Thespaceoverheadofmetadatainthefilesystem
$ Reducethemainmemoryspaceusedbythefilesystem A. 0
Problems with BSD FFS
Data are spread out the whole disk
• Canachievesequentialaccesswithineachfile,butthedistancebetweenfilescanbe
An inode needs a standalone I/O in addition to file content
Creating files take at least five I/Os with seeks — can only use 5% bandwidth for data
• 2 for file attributes
• You have to check if the file exists or not
• You have to update after creating the file 1 for file data
1 for directory data
1 for directory attributes
Writes to metadata are synchronous
• Goodforcrashrecovery,badforperformance
Buffering changes in the system main memory and commit those changes sequentially to the disk with fewest amount of write operations
What does LFS propose?
write buffer
LFS in motion
Data chuck #1
Datachuck #2
Updated Data chuck #1
Data chuck
invalidate
Data chuck inode Updated Data
#2 Disk Space chuck #1
disk space (log)
systems trying address?
How many of the following problems is/are Log-structured file
! Theperformanceofsmallrandomwrites
” The efficiency of large file accesses leave it for the hardware designer
# Thespaceoverheadofmetadatainthefilesystem
$ Reducethemainmemoryspaceusedbythefilesystem
A. 0 B. 1 C. 2 D. 3 E. 4
increases due to write caching
increases due to garbage collection and inode maps
Checkpointing Crash recovery
• Createaredundantcopyofimportantfilesystemmetadata
periodically
Roll-forward
• Scanthrough/replaythelogaftercheckpointing
write buffer
LFS v.s. crash
Updated Data chuck #1
You still have a copy of data at some point
crash occurs
Data chuck #1
Data chuck inode Updated Data
#2 Disk Space chuck #1
disk space (log)
LFS v.s. write failed
You can try again!
write buffer
Updated Data chuck #1
You still have a copy of data at some point
write failed
Data chuck #1
Data chuck inode Updated Data
#2 Disk Space chuck #1
disk space (log)
Segment cleaning/Garbage collection
Reclaim invalidated segments in the log once the latest
updates are checkpointed
Rearrange the data allocation to make continuous segments
Must reserve enough space on the disk
• Otherwise,everywriteswilltriggergarbagecollection • Sinkthewriteperformance
Announcement
Reading quizzes due next Tuesday
Project due 3/3
Office hour
• CheckGoogleCalendar
• UsetheofficehourZoomlink,notthelectureone
Engineering
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com