PERSISTENCE:
FILE SYSTEM structures
Andrea Arpaci-Dusseau CS 537, Fall 2019
ADMINISTRIVIA
Midterm 2 Grades in Canvas
Make-up Points for Projects 2 + 3 (more later) – Canvas Quizzes
Specification and Concepts/Implementation Can take only one time
Up to 35 more points (scaled);
if > 70 points, no benefit
if received < 50 points, expect to take Due 1 week from when made available (Fri, Mon)
Project 6
Due next Friday
Specification Quiz – Due by tomorrow 6pm (idea: before Discussion Section) Discussion this week:
Project 6 - MapReduce
AGENDA / LEARNING OUTCOMES
What on-disk structures represent files and directories? Contiguous, Extents, Linked, FAT, Indexed, Multi-level indexed
What disk operations are needed for: make directory
open file write/read file close file
RECAP
File API WITH FILE DESCRIPTORS
int fd = open(char *path, int flag, mode_t mode) read(int fd, void *buf, size_t nbyte)
write(int fd, void *buf, size_t nbyte)
close(int fd)
advantages:
- string names
- hierarchical
- traverse once
- offsets precisely defined
inodes 0
1 2 3
...
Example: open /etc/bashrc How many reads???
“bashrc”: 3, ... “etc”: 0, ...
# settings: ...
location size=12
location size
location size
location size=6
1. Inode #2 à Get location of root directory
2. Read root directory data; see “etc” maps to inode 0
3. Inode #0 à Get location of etc directory
4. Read /etc directory; see “bashrc” is at inode 3
5. Inode #3 à Get location of /etc/bashrc file data
6. Read /etc/bashrc file data
inode number
fd table
Descriptor objects
Code Snippet: Open vs. Dup
012345
“file.txt” in directory also points here
offset = 0 12 inode =
inode
location = ... size = ...
offset = 0 inode =
int fd1 = open(“file.txt”); // returns 3 read(fd1, buf, 12);
int fd2 = open(“file.txt”); // returns 4 int fd3 = dup(fd2); // returns 5
Deleting Files
There is no system call for deleting files!
Inode (and associated file) is garbage collected when there are no references
Paths are deleted when: unlink() is called
à File not deleted when other hard links point to it
à File could be deleted when only a symbolic link points to it
FDs are deleted when: close() or process quits à File not deleted if any process has it open
FILESYSTEM DISK STRUCTURES
FS Structs: how to use disk Blocks?
DDDDDDDD DDDDDDDD
0 78 15
DDDDDDDD DDDDDDDD
16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63 Assume each block is 4KB
Similarity to Memory?
Same principle:
map logical abstraction to physical resource
Process 1
Process 3 Process 2
Logical View: Address Spaces
Physical View
Allocation Strategies
Many different approaches
• Contiguous
• Extent-based
• Linked
• File-allocation Tables
• Indexed
• Multi-level Indexed
What metrics matter?
Questions
• Amount of fragmentation (internal and external)
• Ability to grow file over time?
• Performance of sequential accesses (contiguous layout)?
• Speed to find data blocks for random accesses?
• Wasted space for meta-data overhead (everything that isn’t data)?
– Meta-data must be stored persistently too!
Contiguous Allocation
Allocate each file to contiguous sectors on disk
– Meta-data: Starting block and size of file
– OS allocates by finding sufficient free space
• Must predict future size of file; Should space be reserved?
– Example: IBM OS/360
AAA BBBBCCC
Fragmentation (internal and external)? Ability to grow file over time?
Seek cost for sequential accesses? Speed to calculate random accesses? Wasted space for meta-data?
- Horrible external frag (needs periodic compaction) - May not be able to without moving
+ Excellent performance
+ Simple calculation
+ Little overhead for meta-data
Small # of ExtentS
Allocate multiple contiguous regions (extents) per file – Meta-data: Small array (2-6) designating each extent
Each entry: starting block and size AAA BBBBCCC
DDAAADBBBBCCCBB
Fragmentation (internal and external)? Ability to grow file over time?
Seek cost for sequential accesses? Speed to calculate random accesses? Wasted space for meta-data?
+/- Helps external fragmentation (until out of extents)
+/- Can grow (until run out of extents)
+ Still good performance (generally) + Still simple calculation
+ Still small overhead for meta-data
Linked Allocation
Allocate linked-list of fixed-sized blocks (multiple sectors)
– Meta-data:
Location of first block of file
–
Examples: TOPS-10, Alto
Each block also contains pointer to next block DDAAADBBBBCCCBBDBD
Fragmentation (internal and external)? Ability to grow file over time?
Seek cost for sequential accesses? Speed to calculate random accesses? Wasted space for meta-data?
+ No external frag (use any block); internal?
+ Can grow easily
+/- Depends on data layout
- Ridiculously poor
- Waste pointer per block Trade-off: Block size (does not need to equal sector size)
File-Allocation Table (FAT)
Variation of Linked allocation
– Keep linked-list information for all files in on-disk FAT table
– Meta-data: Location of first block of file
• And, FAT table itself
DDAAADBBBBCCCBBDBD 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Draw corresponding FAT Table?
Comparison to Linked Allocation
• Same basic advantages and disadvantages
• Disadvantage: Read from two disk locations for every data read
• Optimization: Cache FAT in main memory
– Advantage: Greatly improves random accesses
– What portions should be cached? Scale with larger file systems?
Indexed Allocation
Allocate fixed-sized blocks for each file
– Meta-data: Fixed-sized array of block pointers
– Allocate space for ptrs at file creation time
DDAAADBBBBCCCBBDBD
Advantages
• No external fragmentation
• Files can be easily grown up to max file size (determined by number of block pointers)
• Supports random access
Disadvantages
• Large overhead for meta-data:
– Wastes space for unneeded pointers (most files are small!)
• Additional pointers to blocks of pointers
– Examples: UNIX FFS-based file systems, ext2, ext3
Data blocks
double indirect
indirect
triple
indirect
Multi-Level Indexing
Variation of Indexed Allocation
– Dynamically allocate hierarchy of pointers to blocks as needed
– Meta-data: Small number of pointers allocated statically
inode
indirect
Comparison to Indexed Allocation
• Advantage: Does not waste space for unneeded pointers
+ Still fast access for small files
+ Can grow to very large size
• Disadvantage: Need to read indirect blocks of pointers to find addresses (extra disk read) – Keep indirect blocks cached in main memory (esp for sequential)
Flexible # of ExtentS
Modern file systems:
Dynamic multiple contiguous regions (extents) per file – Organize extents into multi-level tree structure
• Each leaf node: starting block and contiguous size for data
• Minimizes meta-data overhead when have few extents
• Allows growth beyond fixed number of extents
Fragmentation (internal and external)? Ability to grow file over time?
Seek cost for sequential accesses? Speed to calculate random accesses? Wasted space for meta-data?
+ Both reasonable + Can grow
+ Still good performance
+/- Some calculations depending on size + Relatively small overhead
Assume Multi-Level Indexing
Simple approach
More complex file systems build from these basic data structures
On-Disk Structures
- data block
- inode table
- indirect block - directories
- data bitmap
- inode bitmap - superblock
FS Structs: Empty Disk
DDDDDDDD DDDDDDDD
0 78 15
DDDDDDDD DDDDDDDD
16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63 Assume each block is 4KB
FS Structs: DATA BLOCKS
DDDDDDDD DDDDDDDD 0 78 15 DDDDDDDD DDDDDDDD 16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63
Not actual layout : Examine better layout in next lecture Relative number of blocks is important
INODes
DDDIIIII DDDDDDDD 0 78 15 DDDDDDDD DDDDDDDD 16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63
One Inode Block
Each inode is typically 256 bytes (depends on the FS, maybe 128 bytes)
– 4KB disk block
– 16 inodes per inode block
How to modify 1 inode?
Static calculation to determine where particular inode resides on disk
inode 16
inode 17
inode 18
inode 19
inode 20
inode 21
inode 22
inode 23
inode 24
inode 25
inode 26
inode 27
inode 28
inode 29
inode 30
inode 31
Inode
type (file or dir?)
uid (owner)
rwx (permissions) size (in bytes)
Num Blocks
time (access) ctime (create) mtime (modify) links_count (# paths) addrs[N] (N data blocks)
FS Structs: INODE DATA POINTERS
DDDIIIII DDDDDDDD 0 78 15 DDDDDDDD DDDDDDDD 16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63
Inode
Assume single level
(just pointers to data blocks)
What is max file size?
Assume 256-byte inodes
(assume all can be used for pointers) Assume 4-byte addrs
256 / 4 = 64 pointers 64 * 4K = 256 KB!
How to get larger files?
type (file or dir?)
uid (owner)
rwx (permissions) size (in bytes) Blocks
time (access) ctime (create) links_count (# paths) addrs[N] (N data blocks)
data
data
data
data
inode
indirect
indirect
indirect
indirect
Largest file size with 64 indirect blocks? 64 * 1024 * 4KB = 64 MB
Any Cons?
what if we want to optimize for small files?
Indirect blocks are stored in regular data blocks
inode
Multi-level index
data
data
data
indirect
Better for small files!
How to handle even larger files?
Double indirect blocks Triple indirect blocks
Example:
12 direct pointers + single + double indirect block. Block size of 4 KB and 4-byte pointers
(12 + 1024 + 1024*1024 ) × 4 KB) = 4 GB
inode
File systems vary
Common design:
Store directory entries in data blocks
Large directories just use multiple data blocks
Use bit in inode to distinguish directories from files
Various formats could be used - lists
- b-trees
Directories
Simple Directory List Example
valid name inode
1
.
134
1
..
35
1
foo
80
1
bar
23
unlink( “foo”)
Remove entry from directory What do we do to the inode?
Allocation
How do we find free data blocks or free inodes?
Free list
Bitmaps (one bit designates state of each block; set to 1 if allocated, 0 if free) Tradeoffs in next lecture...
FS Structs: BITMAPS
DIBDBI I I I I DDDDDDDD 0 78 15 DDDDDDDD DDDDDDDD 16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63
Superblock
Need to know basic FS configuration metadata, like: - block size
- # of inodes
Store this in superblock
FS Structs: SUPERBLOCK
SIBDBI I I I I DDDDDDDD
0 78 15
DDDDDDDD DDDDDDDD
16 23 24 31
DDDDDDDD DDDDDDDD
32 39 40 47
DDDDDDDD DDDDDDDD
48 55 56 63
SUMMARY
Super Block Inode Bitmap Data Bitmap
Data Block
directories indirects
Inode Table
- create file - write
- open
- read
- close
Part 2 : Operations
create /foo/bar
data inode root foo bar root foo bitmap bitmap inode inode inode data data
read write
read
read
read
read
write
write
read write
What needs to be read and written?
open /foo/bar
data inode root foo bar root foo bar bitmap bitmap inode inode inode data data data
read
read
read
read
read
write to /foo/bar (assume file exists and has been opened)
data inode root foo bar root foo bar bitmap bitmap inode inode inode data data data
read write
read write
write
read /foo/bar – assume opened
data inode root foo bar root foo bar bitmap bitmap inode inode inode data data data
read write
read
close /foo/bar
data inode root foo bar root foo bar bitmap bitmap inode inode inode data data data
nothing to do on disk!
Efficiency
How can we avoid this excessive I/O for basic ops?
Cache for:
- reads
- write buffering
Write Buffering
Why does procrastination help?
Overwrites, deletes, scheduling
Shared structs (e.g., bitmaps+dirs) often overwritten.
We decide: how much to buffer, how long to buffer... - tradeoffs?
NEXT STEPS
Next class: UNIX Fast-File System