
PERSISTENCE: FFS
Andrea Arpaci-Dusseau CS 537, Fall 2019

ADMINISTRIVIA
Make-up points for Projects 2 + 3 (more later)
– Canvas quizzes: Specification and Concepts/Implementation
– Can take only one time
– Up to 35 more points (scaled); if > 70 points, no benefit; if received < 50 points, expect to take it
– Due 1 week from when made available (Fri, Mon)
File System Structures Quiz: lecture content, due next Tuesday
Project 6: due next Friday
– Specification Quiz: due yesterday

AGENDA / LEARNING OUTCOMES
How does FFS improve performance?
• How to improve the performance of a complex system?
• Why do file systems obtain worse performance over time?
• How to choose the right block size? How to avoid internal fragmentation?
• How to place related blocks close to one another on disk?

RECAP SUMMARY
On-disk structures: super block, inode bitmap, data bitmap, inode table, data blocks (file data, directories, indirects)

FS LAYOUT
[Figure: on-disk layout, super block, bitmaps, and inode blocks at the front, followed by data blocks 0 through 63]

Inode
Multiple inodes per disk block
To modify an inode:
– Read the old disk block containing the inode
– Change the inode in memory
– Write the entire disk block out to disk
Inode fields: type (file or dir?), uid (owner), rwx (permissions), size (in bytes), blocks, time (access), ctime (create), links_count (# paths), pointers[N] (N data blocks), indirect pointer, double indirect pointer, triple indirect pointer
Direct pointers are better for small files! How to handle even larger files?
Indirect, double indirect, and triple indirect pointers: each indirect block holds roughly 1024 pointers, so each added level multiplies reach by 1024.

Simple Directory List Example
valid | name | inode
  1   |  .   |  134
  1   |  ..  |   35
  1   | foo  |   80
  1   | bar  |   23
unlink("foo")

FS Structs: SUPERBLOCK
Basic FS configuration metadata, like block size and # of inodes
[Figure: super block, inode bitmap, data bitmap, inode blocks, then data blocks 0 through 63]

FS Operations
– create file
– write
– open
– read
– close

REVIEW: create /foo/bar
(The original slides step through a table of reads and writes against the data bitmap, inode bitmap, root inode, foo inode, bar inode, root data, and foo data.)
[traverse path] read root inode, read root data, read foo inode, read foo data; verify that bar does not already exist
[allocate inode] read inode bitmap, write inode bitmap
[populate inode] read bar inode, write bar inode. Why must we read the bar inode? How do we initialize the inode?
[add bar to /foo] write foo data, write foo inode. Update the directory's inode (e.g., size) and data

open /foo/bar
read root inode, read root data, read foo inode, read foo data, read bar inode

write to /foo/bar (assume file exists, is empty, and has been opened)
read data bitmap, write data bitmap, read bar inode, write bar inode, write bar data
No reads or writes to / or /foo

append to /foo/bar (opened already)
read bar inode
[allocate block] read data bitmap, write data bitmap
[point to block] write bar inode
[write to block] write bar data
What assumptions did we make for append?
Needed to allocate a new data block (the appended data didn't fit in the current block)
How would the file system know this?
What steps would be skipped if we don't need to allocate a new data block?

read /foo/bar (assume opened)
read bar inode, read bar data
(Skipping updates to timestamps in the inode, which would cause another write)

close /foo/bar
Nothing to do on disk!

FAST FILE SYSTEM: FFS (1980s)

System Building
Beginner's approach:
1. get idea
2. build it!
Pro approach: measure, then build. Iterative process:
1. identify existing state of the art
2. measure it, identify and understand problems
3. get idea to improve; solutions often flow from deeply understanding the problem
4. build it!

Measure Old FS
State of the art: original UNIX file system
[Figure: old FS layout, super block, then inodes, then data blocks, spanning the disk from 0 to N]
Free lists are embedded in inodes and data blocks
Data blocks are 512-byte sectors (smallest possible)
Measure throughput for whole-file sequential reads/writes
Compare to theoretical max, which is? Disk bandwidth
Old UNIX file system: achieved only 2% of potential. Why?

Measurement 1: Aging?
What is performance before/after aging? Over time, files are created and deleted, and new files are created in the newly freed space
– New FS: 17.5% of disk bandwidth
– Few weeks old: 3% of disk bandwidth
Problem: FS becomes fragmented over time
– Free list makes contiguous chunks hard to find
Freelist solutions? (Hacky)
• Occasionally defragment the disk: move around existing files
• Keep the freelist sorted

Measurement 2: Block Size?
How does block size affect performance? Try doubling it (from 512 bytes to 1KB)
Result: performance more than doubled
Why double the performance?
• Logically adjacent blocks were not physically adjacent
• With half the blocks, only half as many seeks + rotations required
Why more than double the performance?
• Larger blocks require fewer indirect blocks

Multi-Level Indexing
Dynamically allocate a hierarchy of pointers to blocks as needed
Meta-data: a small number of pointers allocated statically, plus additional pointers to blocks of pointers (indirect, double indirect, triple indirect)
Examples: UNIX FFS-based file systems, ext2, ext3
Double the block size: reach 2x the data directly, and indirect blocks fit 2x as many pointers

Technique: Larger Blocks
Observation: doubling the block size for old FS more than doubled performance
Why not make blocks huge? Most files are very small, even today!

Impact of Larger Blocks
[Figure: percent wasted space vs. block size (512, 1024, 2048, 4096); waste grows with block size]
Lots of waste due to internal fragmentation in most blocks
Time vs. space tradeoffs...

Measurement 3: Block Layout
Blocks laid out poorly
– long distance between inodes / data
– related inodes not close to one another

FILE LAYOUT IMPORTANCE
[Figure: super block, bitmaps, and inodes at the front of the disk, data blocks spread from 0 to N; accessing data far from its inode is slow]
Layout is not disk-aware!

Old FS Summary
1. Free list becomes scrambled -> random allocations
2. Small blocks (512 bytes)
3. Blocks laid out poorly
– long distance between inodes/data
– related inodes not close to one another
Result: 2% of potential performance!
Problem: Old FS treats disk like RAM!

DISK-AWARE FILE SYSTEM
How to make the disk use more efficient?
Where to place meta-data and data on disk?

Placement Technique 1: Bitmaps
Use bitmaps instead of a free list
Bitmaps provide better speed, with a more global view
Fast to update: just flip a single bit to change state from free -> allocated or back; no sorting needed
Fast to do lookups: reading one bitmap block gives the state of many data blocks; quickly find contiguous free blocks
[Figure: disk layout, super block, bitmaps, inodes, then data blocks]
How many bits can be stored in one 4KB block? 4KB * 8 bits / Byte = 4096 * 8 = 32768
1 bitmap block gives state of 32768 data blocks
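(Not from the slides.) A minimal C sketch of the bitmap operations just described; names like find_free_block are made up for illustration, but the bit arithmetic matches the 4KB-block math above.

```c
#include <stdint.h>

#define BLOCK_SIZE     4096
#define BITS_PER_BLOCK (BLOCK_SIZE * 8)   /* 32768 data blocks tracked per bitmap block */

/* One in-memory copy of a 4KB bitmap block. */
typedef struct { uint8_t bits[BLOCK_SIZE]; } bitmap_t;

static int bitmap_test(const bitmap_t *bm, uint32_t n) {
    return (bm->bits[n / 8] >> (n % 8)) & 1;
}

static void bitmap_set(bitmap_t *bm, uint32_t n)   { bm->bits[n / 8] |=  (uint8_t)(1u << (n % 8)); }
static void bitmap_clear(bitmap_t *bm, uint32_t n) { bm->bits[n / 8] &= (uint8_t)~(1u << (n % 8)); }

/* Find the first free (0) bit: a linear scan over one bitmap block.
 * Returns the bit index, or -1 if every tracked block is allocated. */
static int find_free_block(const bitmap_t *bm) {
    for (uint32_t n = 0; n < BITS_PER_BLOCK; n++)
        if (!bitmap_test(bm, n))
            return (int)n;
    return -1;
}
```

Allocating a block is then just find_free_block + bitmap_set + writing the bitmap block back to disk: the single-bit flip the slide refers to.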

FS Structs: BITMAPS
[Figure: super block followed by separate data-bitmap and inode-bitmap blocks, then inode blocks and data blocks]
Separate data bitmap and inode bitmap

Technique 2: Layout Policy
[Figure: super block, bitmaps, inodes, then data blocks 0 through 31, with one file's inode allocated]
Assuming all free, which should be chosen?

Bad File Layout
[Figure: the file's data blocks (0, 3, 2, 1) are scattered across the disk and out of order, far from its inode]

Better File Layout
[Figure: the file's data blocks 0, 1, 2, 3 are laid out contiguously]

Best File Layout
[Figure: the file's data blocks 0, 1, 2, 3 are contiguous and placed close to its inode]
Can't do this for all files!

What does the FS do for ls? What does the FS do for ls -l?
ls: read directory data for the file names
ls -l: also read the inodes of all files to show size and other metadata
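(Not from the slides.) A toy "ls" in C makes the difference concrete: listing names needs only the directory's data (readdir), while the long listing must also stat each entry, which forces a read of every file's inode.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Toy "ls" vs "ls -l": long_format decides whether we touch each file's inode. */
static void list_dir(const char *path, int long_format) {
    DIR *d = opendir(path);
    if (!d) { perror("opendir"); return; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {              /* directory data only */
        if (!long_format) {
            printf("%s\n", e->d_name);
        } else {
            char full[4096];
            struct stat st;
            snprintf(full, sizeof full, "%s/%s", path, e->d_name);
            if (stat(full, &st) == 0)               /* forces the file's inode to be read */
                printf("%8lld %s\n", (long long)st.st_size, e->d_name);
        }
    }
    closedir(d);
}
```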
Does inode layout matter?
[Figure: an inode placed near its data blocks]
Keep data near its inode
Inodes in same directory should be near one another

Technique 2: Groups
[Figure: before, one set of structures (super block, bitmaps, inodes, data blocks) spanning the whole disk from 0 to N; the farther data sits from its inode, the slower the access]
Before: structures span the whole disk. How to keep each inode close to its data?

PLACEMENT Technique: Groups
[Figure: the disk divided into groups, each with its own super block (S), bitmaps (B), inodes (I), and data blocks (D); group 1 spans 0 to G, group 2 spans G to 2G, group 3 spans 2G to 3G]
Key idea: keep each inode close to its data
Use groups across the disk
Strategy: allocate inodes and related data blocks in the same group

PLACEMENT TECHNIQUE: Groups
In FFS, groups were ranges of cylinders called cylinder groups
In ext2, ext3, and ext4, groups are ranges of blocks called block groups

REPLICATED SUPER BLOCKS
[Figure: groups 1, 2, 3 (at offsets 0, G, 2G), each containing its own copy of the super block along with its bitmaps, inodes, and data blocks]
Is it useful to have multiple super blocks?
Yes, if some blocks (but not all) fail.

Problem: super-block copies stored at the same offset in every group all land on the top platter: correlated failures! What if the top platter is damaged?
Solution: for each group, store the super block at a different offset

Smart Policy
[Figure: groups 1, 2, 3, ... (at offsets 0, G, 2G, ...), each with its own super block, bitmaps, inodes, and data blocks]
Where should new inodes and data blocks go?

PLACEMENT Strategy
Put related pieces of data near each other (i.e., in the same group). Rules:
1. Put data blocks near inodes
2. Put inodes near directory entries
Problem: the file system is one big tree
All directories and files have a common root, so all data in the same FS is related in some way
Trying to put everything near everything else doesn't make any choices!

Where to cut the tree and start growing into another group?
[Figure: a directory inode points to its directory data, which names many related file inodes and their data blocks (B1, B2, B3); the "break" into a new group is placed at directory inodes]
FFS puts dir inodes in a new group
"ls -l" is fast on directories with many files

POLICY SUMMARY
File inodes: allocate in same group with dir
Dir inodes: allocate in new group
Which group to pick?
One with fewer used inodes than the average group
First data block: allocate near inode (same group)
Other data blocks: allocate near previous block (in case had to move groups)

Problem: Large Files
A single large file can fill nearly all of a group, displacing data for many small files
Most files are small!
Better to do one seek for one large file than one seek for each of many small files

SPLITTING LARGE FILES
Define “large” as requiring an indirect block
Starting at indirect (e.g., after first 48 KB) put blocks in a new block group
Each chunk corresponds to one indirect block
Block size 4KB, 4 bytes per address => 1024 addresses per indirect block; 1024 * 4KB = 4MB contiguous "chunk"
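(Not from the slides; a minimal sketch using the constants above.) It shows the chunk arithmetic: pointers per indirect block, chunk size, and which chunk a file offset falls into, assuming the first 48 KB stay in the file's starting group.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE   4096u            /* 4KB blocks                        */
#define PTR_SIZE     4u               /* 4-byte block addresses            */
#define DIRECT_BYTES (48u * 1024u)    /* first 48KB covered by direct ptrs */

int main(void) {
    uint32_t ptrs_per_indirect = BLOCK_SIZE / PTR_SIZE;      /* 1024 */
    uint32_t chunk_bytes = ptrs_per_indirect * BLOCK_SIZE;   /* 4MB  */
    printf("pointers per indirect block: %u\n", ptrs_per_indirect);
    printf("contiguous chunk size: %u bytes (%u MB)\n", chunk_bytes, chunk_bytes >> 20);

    /* Which chunk (and hence which candidate group) does an offset fall into?
       Chunk 0 is the direct region that stays in the file's starting group. */
    uint64_t offset = 10ull * 1024 * 1024;   /* e.g., byte 10MB of a large file */
    uint64_t chunk = (offset < DIRECT_BYTES)
                       ? 0
                       : 1 + (offset - DIRECT_BYTES) / chunk_bytes;
    printf("offset %llu lies in chunk %llu\n",
           (unsigned long long)offset, (unsigned long long)chunk);
    return 0;
}
```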

Large files: where to cut the tree and start growing into another group?
[Figure: the same directory tree as before, but for a large file the data blocks reached through an indirect block (B3, B4, ...) are broken off into another group]
Define "large" as requiring an indirect block
Starting at the indirect block (e.g., after the first 48 KB), put blocks in a new block group

POLICY SUMMARY
File inodes: allocate in same group with dir
Dir inodes: allocate in new group with fewer used inodes than average group
First data block: allocate near inode
Other data blocks: allocate near previous block
Large file data blocks: after 48KB, go to new group.
Move to another group (w/ fewer than avg blocks) every subsequent chunk (e.g., 1MB or 4MB).

Group Descriptor (aka Summary Block)
How does file system know which new group to pick?
Tracks the number of free inodes and data blocks in each group
Something else to update on writes
[Figure: per-group layout, super block, summary (group descriptor), bitmaps, inodes, data blocks]
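(Not FFS source; my own sketch.) It shows how such a summary could drive the "fewer used inodes than average" rule from the policy summary: each group descriptor records its free counts, and a new directory picks a group with more free inodes than the average.

```c
#include <stdint.h>

#define NGROUPS 16   /* illustrative number of block/cylinder groups */

/* Per-group summary ("group descriptor"): counts updated on every alloc/free. */
struct group_desc {
    uint32_t free_inodes;
    uint32_t free_blocks;
};

/* Pick a group for a new directory inode: one with more free inodes
 * than the average group (i.e., fewer used inodes than average). */
static int pick_group_for_dir(const struct group_desc g[NGROUPS]) {
    uint64_t total = 0;
    for (int i = 0; i < NGROUPS; i++)
        total += g[i].free_inodes;
    uint32_t avg = (uint32_t)(total / NGROUPS);

    for (int i = 0; i < NGROUPS; i++)
        if (g[i].free_inodes > avg)
            return i;
    return 0;   /* fall back to group 0 if all groups are equally full */
}
```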

OTHER FFS FEATURES
FFS also introduced several new features:
– long file names
– atomic rename
– symbolic links
– large blocks (with libc buffering / fragments)

Large or Small blocks
• Large blocks:
– Improve performance
– Waste space (most files are small)
• Small blocks
– Don’t waste as much space
– Bad performance (more seeks)
• How to get best of both worlds? Hybrid solution?

Solution: Fragments
Hybrid Solution
– Combine best of large blocks and best of small blocks
Use a large block when the file is large enough
Introduce "fragments" for files that use only parts of blocks
– Only the tail of a file uses fragments

Fragment Example
Block size = 4096, fragment size = 1024 (4 fragments per block)
Fragment bitmap: 0000 0000 1111 0010   (blk1 blk2 blk3 blk4)
Whether an address refers to a block or a fragment is inferred from the file offset
What about when files grow?
Must copy fragments to new block if no room to grow

Example: File A, size 5KB; File B, size 2KB
[Figure: A uses one full block (AAAA) plus one fragment; B's two fragments share the same partially filled block as A's fragment]

append to A -> File A, size 6KB; File B, size 2KB
[Figure: A now has one full block plus two fragments; the shared fragment block (B A B A) is full]

append to A -> File A, size 7KB; File B, size 2KB
[Figure: A would need a third tail fragment, which no longer fits in the shared block]
Not allowed to use fragments across multiple blocks! What to do instead?

append to A -> File A, size 8KB; File B, size 2KB
Copy A's fragments to a new block
[Figure: A now occupies two full blocks (AAAA AAAA); B's two fragments stay behind in the old fragment block]
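(Not from the slides; a small sketch using the 4KB/1KB sizes above.) It computes how many full blocks and tail fragments a file of a given size needs, matching the File A progression in the example.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE    4096u
#define FRAG_SIZE     1024u
#define FRAGS_PER_BLK (BLOCK_SIZE / FRAG_SIZE)   /* 4 fragments per block */

/* For a file of `size` bytes, report full blocks and tail fragments.
 * Only the tail of the file may live in fragments. */
static void layout(uint32_t size) {
    uint32_t full_blocks = size / BLOCK_SIZE;
    uint32_t tail        = size % BLOCK_SIZE;
    uint32_t tail_frags  = (tail + FRAG_SIZE - 1) / FRAG_SIZE;   /* round up */
    printf("%5u bytes -> %u full block(s) + %u fragment(s)\n",
           size, full_blocks, tail_frags);
}

int main(void) {
    layout(5 * 1024);   /* File A at 5KB: 1 block + 1 fragment */
    layout(6 * 1024);   /* 6KB: 1 block + 2 fragments          */
    layout(7 * 1024);   /* 7KB: 1 block + 3 fragments          */
    layout(8 * 1024);   /* 8KB: fragments copied into a new
                           full block -> 2 blocks + 0 fragments */
    return 0;
}
```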

Optimal Write Size
Writing less than a full block is inefficient
– Potentially requires moving fragments
– Small writes don’t get full sequential bandwidth of disk
Solution:
New API exposes optimal write size
Library caches writes in memory until they reach a large enough size
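(An illustration, not the historical FFS/libc interface.) POSIX stat still exposes a preferred I/O size in st_blksize; a minimal sketch of a library-style buffer that accumulates small writes until one optimally sized chunk is ready:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Buffer small writes and flush them in st_blksize-sized chunks. */
struct wbuf {
    int     fd;
    size_t  opt;     /* preferred write size reported by the FS */
    size_t  used;
    char   *buf;
};

static int wbuf_init(struct wbuf *w, int fd) {
    struct stat st;
    if (fstat(fd, &st) < 0) return -1;
    w->fd   = fd;
    w->opt  = (size_t)st.st_blksize;     /* "optimal" I/O size hint */
    w->used = 0;
    w->buf  = malloc(w->opt);
    return w->buf ? 0 : -1;
}

static void wbuf_flush(struct wbuf *w) {
    if (w->used > 0) {
        (void)write(w->fd, w->buf, w->used);  /* one large write, not many small ones */
        w->used = 0;
    }
}

static void wbuf_write(struct wbuf *w, const char *data, size_t len) {
    while (len > 0) {
        size_t n = w->opt - w->used;
        if (n > len) n = len;
        memcpy(w->buf + w->used, data, n);
        w->used += n; data += n; len -= n;
        if (w->used == w->opt)
            wbuf_flush(w);               /* only hit the disk at full size */
    }
}
```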

FFS: SECTOR placement
Similar to track skew in disks chapter
Modern disks: Disk cache

FFS SUMMARY
First disk-aware file system
– Bitmaps
– Locality groups
– Rotated superblocks
– Smart allocation policy
Inspired modern file systems, including ext2 and ext3

OTHER TAKEAWAYs
All hardware is unique
Treat disk like disk!
Treat flash like flash!
Treat random-access memory like random-access memory!

NEXT STEPS
Next class: How to provide consistency despite failures?