File Systems
CPSC 313 – 2019W2 File Systems
1
CPSC 313
Where we are going
System calls we use to access files: open/close/read/write
Storage Devices (Disk, SSD)
CPSC 313 – 2019W2 File Systems
2
Why are we studying this?
• Understand the issues affecting file system performance
• Use this information to improve the performance of programs
• File systems are constrained by the medium they are stored on (disks)
– writing to disk is 100,000 (SSD) to 10,000,000 times slower than to memory
• To help inform our understanding of how file systems work, we need to know something about how disks work
CPSC 313 – 2019W2 File Systems
3
CPSC 313
Extending our View of a Computer System
Execution Cores
Execution Cores
L1 Cache
L1 Cache
Bus Interface
L2 Cache
Host bus adapter
System bus
I/O Bridge
Memory bus
Memory
IO bus
CPSC 313 – 2019W2 File Systems
4
CPSC 313
Why Disks? (Isn’t everything SSD?)
• Much of file systems work for the past forty years has been based on spinning disk. If you want to understand why file systems work as they do, then you also have to understand how disks work.
• Some newer file systems are designed to work specifically with SSD (flash), but the majority of file systems in use were designed with spinning disks in mind.
CPSC 313 – 2019W2 File Systems
5
CPSC 313
Disk
Platters
Tracks
Sectors
Disks
Disk arms
CPSC 313 – 2019W2 File Systems
6
CPSC 313
Disk Heads: Basics
Arm
Heads
Disk Head
Disk Surface
CPSC 313 – 2019W2 File Systems
7
Tracks
Surface Layout
Surface
Track k
Gaps
Spindle
Sectors
Adapted from: Computer Systems: A CPSC 313 – 2019W2 File Systems
8
Programmer’s Perspective
Rotating Disks characteristics • Hard disks: platter view (from the side)
Cylinder k
Surface 0
Surface 1 Surface 2
Surface 3 Surface 4
Surface 5
Platter 0 Platter 1
Platter 2
Spindle
CPSC 313 – 2019W2 File Systems
9
IBM 350 DISK System
From: http://en.wikipedia.org/wiki/File:BRL61-IBM_305_RAMAC.jpeg CPSC 313 – 2019W2 File Systems
152 cm long (60 inches) 172 cm high (68 inches) 74 cm wide (29 inches)
Weight: > 2000 lbs
10
IBM 350 Disk System – 1956
• 3.3MB
• 50platters@1200RPM
• 20trackstotheinch
• Recordedat100bits/inch
• 50,000 sectors/drive
• 1006bitcharacters/sector
• IBM350+diskleasedfor$3200 /month (equivalent to $27,287 in 2016 dollar).
• Weighedover1ton
From http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html CPSC 313 – 2019W2 File Systems
11
Disk Drive
CPSC 313 – 2019W2 File Systems
12
Head Crash
http://www.bolhuijo.com/gallery/diskdrive/aaa
CPSC 313 – 2019W2 File Systems
13
Disk in Action
Spindle
CPSC 313 – 2019W2 File Systems
14
CPSC 313 – 2019W2 File Systems
15
Today
– Midterm, one week from today! (A3 is still a really good way to study.)
• Hint: Both sections take the same exam, reviewing notes from both sections is undoubtedly a good idea.
– File systems will not be on the midterm.
• Today’sLearningOutcomes
– Use the file system APIs to read/write data.
– Identify the different things that a file system has to do to bridge between programmatic file system APIs and a storage device.
• Notes:
– Quiz due!
CPSC 313 – 2019W2 File Systems
17
Disk in Action
Spindle
CPSC 313 – 2019W2 File Systems
18
A Few terms
• Seek time: how long it takes for the head to move to the proper track
• Rotational latency: how long it takes for the sector to arrive at the head
• Transfer time : how long it takes to copy the data to, or from, a sector
• A sector is the smallest amount of data that can be transferred to/from a disk. Most disk drive sector sizes now are 4KiB (4096)
CPSC 313 – 2019W2 File Systems
19
CPSC 313
Performance and Capacity over Time
• Speed, as in how fast data can be transferred off a disk drive, is a function of:
– RPM – revolutions per minute (7200 – 10,000)
– Size of the device (1.8” – 3.5”)
– Density (bits per track; tracks per inch) 500-600Gbit/in2
Capacity
CPSC 313 – 2019W2 File Systems
20
How do we talk to these device? • File systems!
– When you access your data, you don’t see a disk, what do you see?
CPSC 313 – 2019W2 File Systems
21
How do we talk to these device? • File systems!
– When you access your data, you don’t see a disk, what do you see?
• Files
• Folders (directories)
• Names
• Automatic connections between files and applications (i.e., if you open a powerpoint document, you aren’t looking at the raw data).
CPSC 313 – 2019W2 File Systems
22
• File systems!
Hierarchical name space (provided by the file system)
How do we talk to these device?
– When you access your data, you don’t see a disk, what do you see?
• Files
• Folders (directories)
• Names
Functionality provided by your GUI and/or command interpreter (CLI)
• Automatic connections between files and applications (i.e., if you open a powerpoint document, you aren’t looking at the raw data).
CPSC 313 – 2019W2 File Systems
23
The POSIX File System API
• These are basically library functions (in libc) that access system
calls.
• open: takes a pathname (the name of a file) and returns a file
descriptor (fd), which you will use to access that file.
• close: takes an fd and frees up all the OS and library
resource that have been allocated to it.
• read: reads data from the file referenced by fd into a user- specified buffer.
• write: takes data from a buffer and writes it to the file. CPSC 313 – 2019W2 File Systems
24
CPSC 313
A Copy Function in Pseudo code
copy(infile, outfile)
fd_in = open infile for reading
fd_out = open outfile for writing
while (there is data left to read) read data from fd_in
write data to fd_out
close fd_in close fd_out
CPSC 313 – 2019W2 File Systems
25
open
• int open(const char *path, int oflag, int mode)
• Parameters:
– path: the path name of the file you wish to open
– oflag: flag values that tell you things like what to do if the file does/does not exist (O_CREATE, O_EXCL), how the file should behave with respect to any caching that the operating system does (O_SYNC, O_DIRECT), etc.
– mode: if the file does not exist and you create it, what should its access mode be set to (e.g., who is allowed to read/write it). You can ignore this for now; we’ll come back to it.
• Return value: a file descriptor
– non-negative integer on success; -1 on error).
• Example
– open(“myfile”, O_CREAT, 0600)
CPSC 313 – 2019W2 File Systems
26
CPSC 313
• int close(int fd)
• Parameters:
– fd: the file descriptor to be closed
• Return value: 0 on success; -1 on error
• Example
– int ret = close(fd);
close
CPSC 313 – 2019W2 File Systems
27
read
• ssize_t read(int fd, void *buf, size_t nbyte)
• Parameters:
– fd: file descriptor for the file from which to read – buf: a buffer into which the bytes read are placed – nbyte: the number of bytes to read
• Return value: a ssize_t (a signed long) – On success: the number of bytes read
– On end-of-file: 0
– On failure: -1
• Example
– char buf[4096];
– ssize_t bytes_read = read(fd, buf, 4096);
CPSC 313 – 2019W2 File Systems
28
write
• ssize_t write(int fd, const void *buf, size_t nbyte)
• Parameters:
– fd: file descriptor for the file to which to write – buf: a buffer containing the data to write
– nbyte: the number of bytes to write
• Return value: a ssize_t (a signed long) – On success: the number of bytes written
– On failure: -1
• Example
– ssize_t bytes_written = write(fd, buf, 4096);
CPSC 313 – 2019W2 File Systems
29
A Copy Program
#include
/*
* Usage: copy src dest
* This is not ideal error handling,
int main (int argc, char *argv[])
{
assert(argc == 3);
int fd_in = open(argv[1], O_RDONLY, 0); int fd_out = open(argv[2],
O_CREAT|O_TRUNC|O_WRONLY, 0666); assert(fd_in >= 0 && fd_out >= 0);
ssize_t nbytes; char buf[4096];
while ((nbytes =
read(fd_in, buf, 4096)) > 0) {
assert(write(fd_out, buf, nbytes) }/* == nbytes);
* If we get here, then either we’re at end of * file or we had an error.
*/
if (nbytes < 0)
* but is designed for brevity! */
}
fprintf(stderr,
"Read failed %s\n", strerror(errno));
CPSC 313 - 2019W2 File Systems
30
(void)close(fd_in); (void)close(fd_out);
CPSC 313
See if you can figure out some/all of the functionality that has to happen between the APIs we use and a storage device.
This is what a file system has to do!
Posix API: hierarchical name space, byte-streams, open, close, read, write
Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
CPSC 313 - 2019W2 File Systems
31
CPSC 313
See if you can figure out some/all of the functionality that has to happen between the APIs we use and a storage device.
This is what a file system has to do!
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Allocate and manage file descriptors
Map: fd to in- memory object
Find location of file metadata
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
Read/write metadata
Map: memory object to disk blocks
CPSC 313 - 2019W2 File Systems
32
CPSC 313
The Parts of a File System
• Metadata: Information about the entire file system.
• Naming: Mapping symbolic names to persistent data structures – Providing support for organizing names into a hierarchy
• Managing file descriptors:
– Allocating file descriptors
– Mapping file descriptors to actual files
• Storing files on disk: Mapping a file to its collection of blocks – Maintaining metadata for each file (or directory)
– Deciding where to place files on disk
CPSC 313 - 2019W2 File Systems
33
CPSC 313
Today • Today’s Learning Outcomes
– Explain the different data structures that map between file descriptors and files.
– Relate these different representations to what aspects of file management are shared among threads, processes, users, etc.
– Identify when two file accesses share an offset and when they do not.
CPSC 313 - 2019W2 File Systems
34
CPSC 313
See if you can figure out some/all of the functionality that has to happen between the APIs we use and a storage device.
This is what a file system has to do!
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Read/write metadata
Allocate and manage file descriptors
Map: fd to in- memory object
Find location of file metadata
Map: memory object to disk blocks
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
CPSC 313 - 2019W2 File Systems
35
What should be shared?
• If you and I are both allowed to read a file in the file system, should we be able to share the file’s data?
• If two processes are reading (writing) the same file, should they be using the same file offset?
• If a process opens the same file twice, should it have one file descriptor or two? In either case, should the two opens share an offset?
• If two threads are using the same file descriptor, should they be using the same file offset?
• If one process (the parent) creates another process (a child), should the child inherit the parent’s file descriptors? Will they use the same offset?
CPSC 313 - 2019W2 File Systems
36
Code
CPSC 313 - 2019W2 File Systems
37
CPSC 313
What should be shared?
• If you and I are both allowed to read a file in the file system, should we be
able to share the file’s data? Yes!
• If two processes are reading (writing) the same file, should they be using
the same file offset? No!
• If a process opens the same file twice, should it have one file descriptor or
two? In either case, should the two opens share an offset? No!
• If two threads are using the same file descriptor, should they be using the
same file offset? Yes!
• If one process (the parent) creates another process (a child), should the child inherit the parent’s file descriptors? Will they use the same offset?
Yes!
Yes!
CPSC 313 - 2019W2 File Systems
38
CPSC 313
Per-Process data Structures
• Facts:
– When a process issues the open system call, the return value is a small
integer called a file descriptor.
– Every process starts with three open file descriptors: • Standard input: fd = 0 (STDIN_FILENO)
• Standard output: fd = 1 (STDOUT_FILENO)
• Standard error: fd = 2 (STDERR_FILENO)
• Note: These are not stdin, stdout, stderr – those have different types. – What are stdin, stdout, stderr anyway?
• Given these facts: What data structure(s) do we need to retain for each process?
CPSC 313 - 2019W2 File Systems
39
CPSC 313
Per-Process data Structures
• Facts:
– When a process issues the open system call, the return value is a small
integer called a file descriptor.
– Every process starts with three open file descriptors: • Standard input: fd = 0 (STDIN_FILENO)
• Standard output: fd = 1 (STDOUT_FILENO)
• Standard error: fd = 2 (STDERR_FILENO)
• Note: These are not stdin, stdout, stderr – those have different types. – What are stdin, stdout, stderr anyway?
• Given these facts: What data structure(s) do we need to retain for each process? A file descriptor table.
CPSC 313 - 2019W2 File Systems
40
CPSC 313
(File) Descriptor Table
• A per-process data structure.
• The file descriptors returned by open correspond to indices into this table.
– Example: If a process calls open and gets an fd of 3 returned, then the file is represented by the 4th (recall that C arrays are 0-based) entry in that table.
• The table is mostly going to contain a pointer to other structures.
• Question: Can we store the offset in this table? CPSC 313 - 2019W2 File Systems
41
CPSC 313
(File) Descriptor Table
• Aper-processdatastructure.
• Thefiledescriptorsreturnedbyopencorrespondtoindicesinto
– Example: If a process calls open and gets an fd of 3 returned, then the file is represented by the 4th (recall that C arrays are 0-based) entry in that table.
• Thetableismostlygoingtocontainapointertootherstructures.
• Question: Can we store the offset in this table?
– No! Recall that a parent and child process are:
a. Different processes therefore have different file descriptor tables
b. But they share the offset of any file descriptors open when the parent forked.
this table.
CPSC 313 - 2019W2 File Systems
42
CPSC 313
So, where do we store offsets? • An offset corresponds to a call to open!
• So, we’re going to need a data a structure corresponding to each open.
• The (open) file table: Shared by all processes.
– Maintains the file position (i.e., offset in the file).
– Keeps a reference count (for when we have two fds referencing the same open file object (we’ll see how that can happen).
– Keeps a reference to an object that represents the actual file. CPSC 313 - 2019W2 File Systems
43
CPSC 313
Finally: In-memory objects representing files
• Vnode: Virtual node: in-memory representation of a file • Vnode table: collection of all vnodes; shared among all
processes.
– Contains a copy of the file’s meta-data
– Including a way to locate the file’s blocks on disk (we’ll talk more about precisely what this looks like when we talk about different ways to store files on disk).
CPSC 313 - 2019W2 File Systems
44
Descriptor table (one table
per process)
Open file table (shared by all processes)
v-node table (shared by all processes)
The Kernel View
File access
File size
File type
Pointers to Buffers
File A
File pos
refcnt=1
The above is one struct in the open file table
stdin fd0 stdout fd1 stderr fd2 fd 3 fd 4
Adapted from: Computer Systems: A Programmer’s Perspective CPSC 313 - 2019W2 File Systems
45
...
...
Actions on open()
Also includes info on how to find blocks of the file, if needed
v-node table
Descriptor table (one table
per process)
stdin fd0 stdout fd1 stderr fd2 fd 3 fd 4
fd = open("B",...)
Open file table (shared by all processes)
File perm
File size
File type
File perm
File size
File type
File A
File pos
refcnt=1
File B
File pos
refcnt=1
CPSC 313 - 2019W2 File Systems
46
... ...
... ...
Today
• Finish off from Wednesday: dup
• Today’s Learning Outcomes
– Explain how files might be represented on disk
– Use file system terminology: • Superblock
• Inode
• Inode number or inumber • Directory
• Buffer cache
• Internalfragmentation
• Externalfragmentation
– Compare and contrast different file representations and identify the tradeoffs among them.
CPSC 313 - 2019W2 File Systems
47
Midterm 2 – Monday March 16
• It will take place but whether it is online or in person is TBD
• Scenario 1: UBC switches to online mode
– The midterm will be moved online most likely through Canvas
– Access will begin at the scheduled time and end at the scheduled time
– Appropriate adjustments will be made for those who have reported a conflict, or write with the accessibility office, or are subject to other extenuating circumstances
• Scenario 2: Situation is the same as today
– To comply with the 250 person limit on gatherings • Section203(Donald’sclass)willwriteinDempster310 • Section204(Margo’sclass)willwriteinESB
• We will use piazza, Canvas announcements, and email to advise you if the midterm goes online as soon as we know what is happening at UBC on Monday.
CPSC 313 - 2019W2 File Systems
48
Recall - Actions on open()
Descriptor table (one table
per process)
stdin fd0 stdout fd1 stderr fd2 fd 3 fd 4
fd = open("B",...)
Open file table (shared by all processes)
v-node table
Also includes info on how to find blocks of the file, if needed
File perm
File size
File type
File perm
File size
File type
File A
File pos
refcnt=1
File B
File pos
refcnt=1
CPSC 313 - 2019W2 File Systems
50
... ...
... ...
Parent and Child have the same FD
Descriptor table (one table
per process)
fd 0 fd 1 fd 2 fd 3 fd 4
fd 0 fd 1 fd 2 fd 3 fd 4
Open file table (shared by all processes)
v-node table
File A
File B
File access
File size
File type
File access
File size
File type
File pos
refcnt=12
CPSC 313 - 2019W2 File Systems
51
...
...
...
...
Another way to have 2 file descriptors point to the same thing dup() and dup2()
• The Unix system call dup2, which has the form dup2(fd, newfd) copies fd to newfd in the descriptor table.
a
b
b
b
fd0 dup2(4,1) fd0
fd1 fd2 fd3 fd4
fd1 fd2 fd3 fd4
The dup (dup2) system call(s) “duplicates a file descriptor ” int dup(int fd);
int dup2(ind fd1, int fd2);
52
CPSC 313 - 2019W2 File Systems
open("/tmp/out",...);
dup2(4,1);
close(4);
Process file descriptor table
fd 0
fd 1
fd 2
fd 3
fd 4
terminal
dup2 example
File access
File size
File type
File pos
refcnt=10
File access
File size
File type
/tmp/out
File pos
refcnt=12
CPSC 313 - 2019W2 File Systems
53
...
...
...
...
Pseudo code for shell
for (;;) { // fgrep foo bar > foo.txt cmd = parse input
if (child = fork()) {
wait for child to finish
} else {
if (redirection needed) {
open file to use as stdout
} dup2
exec cmd }
}
CPSC 313 – 2019W2 File Systems
54
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Allocate and manage file descriptors
Map: fd to in- memory object
Find location of file metadata
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
Read/write metadata
Map: memory object to disk blocks
CPSC 313 – 2019W2 File Systems
55
What is the Problem we Have to Solve?
• How do we represent a file, given: – Files come in all different sizes
– Disk storage is arranged in sectors
• Constraints:
– All data and metadata has to live on the disk/SSD (which are accessed
in block-sized units).
• You can read it into memory and cache it, but when you turn off your machine, the data and metadata need to be on the disk so you can start up next time you turn on the machine.
CPSC 313 – 2019W2 File Systems
56
Traditional Implementation Details
• Metadatathatdescribesanentirefilesystemtypicallyresidesina
special block (on disk) called a superblock.
– There are frequently many copies of the superblock stored in fixed
locations on disk. Why?
• Historically, we assign every file (or directory) on a file system a
unique number, called an inumber (for index number).
• Inodenumbersareusedtolocateinodes,whicharetraditionally
the on-disk file metadata structure.
• Fortherestofthisdiscussion,we’llassumethatthereisawayto map an inode to specific disk block.
CPSC 313 – 2019W2 File Systems
57
Design Task 1: File Representation • How might you represent a file?
– Requirements
• Must support sequential and random access to a file.
• Must be reasonably efficient – not use too much space, provide good performance.
– Questions to think about:
• In what size pieces will you allocate disk space to files?
• What per-file metadata do you need?
• Where will you store metadata?
• What is the ratio of metadata to data for your representation?
• What are the advantages/disadvantages of different approaches?
CPSC 313 – 2019W2 File Systems
58
Allocation Units
• Allocation units (in what unit do we allocate disk sectors to files):
– Fixed sized block
– A small number of fixed size blocks – Variable sizes blocks (called extents)
• What are the pros and cons of each approach? CPSC 313 – 2019W2 File Systems
59
CPSC 313
CPSC 313 – 2019W2 File Systems
60
Midterm 2 – Monday March 16 — NOT!
• Wasn’t that exciting?
• The plus is that you are now comfortable with the workflow.
• We really are sorry (you have no idea just how sorry …)
• But moving forward:
– Monday, March 23: The Real Midterm 2
– Friday, March 27: Assignment 3 due
– Friday, April 3: Quiz #5
– Friday, April 10: Assignment 4 due (tentative)
CPSC 313 – 2019W2 File Systems
61
Some logistics • If things go right this will be recorded
• First when giving the presentation I can’t see who has a question etc
• Two TAs, Wes and Sameer, will moderate
– Use Chat to say you have a question or raise your hand
– When you are told to go ahead unmute your microphone mute it when you are done
– If you don’t have a mic – type your question and one of the TAs will repeat it for everyone
CPSC 313 – 2019W2 File Systems
62
Today • More on representing files:
– Compare and contrast different file representations and identify the tradeoffs among them.
• If we have time: naming!
CPSC 313 – 2019W2 File Systems
63
We were here!
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Map: fd to in- memory object
Read/write metadata
Allocate and manage file descriptors
Find location of file metadata
Map: memory object to disk blocks
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
CPSC 313 – 2019W2 File Systems
64
And we were trying to solve this problem:
• How do we represent a file, given: – Files come in all different sizes
– Disk storage is arranged in sectors
• Constraints:
– All data and metadata has to live on the disk/SSD (which are accessed
in block-sized units).
• You can read it into memory and cache it, but when you turn off your machine, the data and metadata need to be on the disk so you can start up next time you turn on the machine.
CPSC 313 – 2019W2 File Systems
65
Fragmentation
• Thisiswhathappenswhenyouhavespaceavailable,butit’snot arranged well or you have space allocated that you cannot use. This can happen in two ways:
• Internalfragmentation:YouaregivenablockofsizeN,butyouonly
need B bytes and B < N.
Your 100 bytes of file 4 KB block
Consider needing to allocate a file of 100 bytes, but the minimum unit the file system gives you is 4 KB.
wasted space
• Externalfragmentation:Youwanttoallocateacontiguousregionof size B bytes (e.g., 16 KB), and while you have B bytes available, they aren’t contiguous.
We need a block this big
Allocated blocks
Total disk Space
CPSC 313
CPSC 313 - 2019W2 File Systems
67
CPSC 313
Allocation Units: Fixed Sized Blocks
• What are the pros and cons of Fixed Sized Blocks?
+ Allocation is easy (all units are the same size)
- If the block size is small, then you have a lot of them, and your index is correspondingly large
- If the block size is large, then you might have a lot of internal fragmentation
- If the blocks aren’t close to each other, you may end up seeking all over the disk to read a file -> slow
File metadata (inode)
CPSC 313 – 2019W2 File Systems
68
Allocation Units: Extents • What are the pros and cons of Extents?
File metadata (inode)
CPSC 313 – 2019W2 File Systems
69
Allocation Units: Extents
• What are the pros and cons of Extents?
+ Can represent files very efficiently (just a few block addresses)
– If the block size is small, then you have a lot of them and your index is correspondingly large
– If the block size is large, then you might have a lot of internal fragmentation
– If the blocks aren’t close to each other, you may end up seeking all over the disk to read a file -> slow
File metadata (inode)
CPSC 313 – 2019W2 File Systems
70
Allocation Units: A Few Block Sizes
• What are the pros and cons of Having a few block sizes?
File metadata (inode)
CPSC 313 – 2019W2 File Systems
71
Allocation Units: A Few Block Sizes
• WhataretheprosandconsofHavingafewblocksizes? + Seems like an ideal compromise between the other two
+ Big blocks give you continuity on disk and a compact index
+ Small blocks limit internal fragmentation
– Diskmanagementismorecomplicated
– Howdoyoupicktheblocksizesandwhentouseeach?
File metadata (inode)
CPSC 313 – 2019W2 File Systems
72
File Representation
• Next,let’slookabitmorecloselyatwhatthemeta-datamightlook like for each of these representations. (First in words, then in pictures.)
– Single extent: Metadata is a single address (and perhaps a length) – A small (fixed) number of extents: Metadata is a few disk addresses
(perhaps with length)
– File is a large number of blocks
• Put blocks together in a linked list: metadata is an address
• Build a large flat index: Metadata is a large array of one address per block/extent • Build a multi-level index (like a multi level page table)
CPSC 313 – 2019W2 File Systems
73
Per-File Metadata • Thinkoftheper-filemeta-dataintwoparts:
– Attributes of the file, e.g., • Size
• Access control
• Timestamps: Last access time, last modified time, create time
– Index structure to map a logical block number to a block on the disk.
• Within a file, blocks are numbered from 0 to the maximum block in the file – these
are called logical block numbers.
• Our goal is to be able to find the right block on disk corresponding to any logical
block number; we are looking for disk addresses or physical block numbers. – Let’s look at some different index structures
CPSC 313 – 2019W2 File Systems
74
File Representation: Single Extent • Pros:
Index
File data
• Cons:
CPSC 313 – 2019W2 File Systems
75
File Representation: Single Extent
• Pros:
– Dead simple
– Good for both sequential and random access
– Very efficient • Cons:
– Inflexible – what happens if a file changes size?
– Have to pre allocate space at create time?
– Dynamic memory management – lots of external fragmentation
Index
File data
CPSC 313 – 2019W2 File Systems
76
File Representation: A Few Extents
• Pros:
– If extents are large you get good disk bandwidth
– Both sequential and random are good (you have to do some work for random)
– Metadataissmall • Cons:
– Lots of design decisions • How big are extents?
• How do you decide?
– Howdoyougrowfiles?
– Externalfragmentation
– Depending on answers to design, might have internal fragmentation
– Could end up with a file too big to represent CPSC 313 – 2019W2 File Systems
78
Index
File data
File data
…
…
CPSC 313
CPSC 313 – 2019W2 File Systems
79
Midterm 2 – redo • Monday March 23rd 5:30 – 6:30
• In response to comments we are:
– Looking at having the start and stop times be a window to reduce
contention issues – details will be provided
– Looking at providing extra time to hand in at the end – we did have that on Monday but we need to communicate that better
• Coverageexactlyasoriginallyscheduled
– In particular there will be nothing from Module 4 – File systems
– As announced the, number format, and breakdown of questions will be exactly as it was for the March 16th (e.g. there will be 5 questions on branch prediction)
– The cancelled March 16th exam will be very good practice
CPSC 313
CPSC 313 – 2019W2 File Systems
80
Temporary office hours • Today I’ll stay online until 12:20
• Normally Friday 4:00 – 5:00 – today 4:30 to 5 via Collaborate Ultra.
• Monday TBD but probably until 12:30
CPSC 313 – 2019W2 File Systems
81
Today
• More on representing files:
– Index structures for fixed-sizes block systems. – System calls for accessing file meta-data.
– On-disk meta-data structures.
• And maybe finally: naming!
CPSC 313
CPSC 313 – 2019W2 File Systems
82
We were here!
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Allocate and manage file descriptors
Map: fd to in- memory object
Find location of file metadata
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
Read/write metadata
Map: memory object to disk blocks
CPSC 313
CPSC 313 – 2019W2 File Systems
83
Terminology (clarifying confusion from class)
• We’ll use the term extent to mean a variable-sized disk allocation.
– It is a single allocation on disk, so it has a starting disk address and a length (number of disk sectors).
• We’ll use the term block to mean a fixed-size unit of allocation.
– A file system may choose to have multiple block sizes, but that suggests some
small number of sizes, not the (almost) infinite number of variable-sized extents.
• In an extent-based system, you can have either:
– 1 extent per file
– Multiple (but typically a small, fixed number) extents per file
• In a block-based system, you pretty much must support multiple blocks per file and possibly a lot of blocks to produce a large file.
CPSC 313
CPSC 313 – 2019W2 File Systems
84
File Representation: Linked Blocks • Pros:
Index
• Cons:
CPSC 313 – 2019W2 File Systems
85
Index
• Pros:
File Representation: Flat Index
• Cons:
CPSC 313 – 2019W2 File Systems
87
File Representation: Flat Index • Pros:
– Don’t waste space in unallocated blocks.
– Sequential and Random access are easy.
• Cons:
– Still have to know maximum file size
– Will pay lots of seeks between blocks.
Index
at creation.
CPSC 313 – 2019W2 File Systems
88
File Representation: Multi-level Index • Pros:
Index
• Cons:
CPSC 313 – 2019W2 File Systems
89
File Representation: Multi-level Index
Index
• Pros:
– Simple
– Although there is a maximum file size, it’s really big
• Cons:
– Inefficient for small files (always have N
levels of indexing)
– May require seeks between reads of adjacent blocks.
CPSC 313 – 2019W2 File Systems
90
File Representation: Hybrid Index • Pros:
Index
• Cons:
CPSC 313 – 2019W2 File Systems
91
…… …
File Representation: Hybrid Index
• Pros:
– Simple
– No wasted space for unallocated blocks.
– Efficient for small files.
– Although there is a maximum file size, it’s really big
• Cons:
– Can be inefficient for large files
– May require seeks between reads of adjacent blocks.
Index
CPSC 313 – 2019W2 File Systems
92
…… …
File Structure Summary
Index
Extent-based
Linked Indexed
Hybrid Index
Index
Index
Index
CPSC 313 – 2019W2 File Systems
93
…
… …
…
CPSC 313
Monday March 23
Please standby
Today:
– Explain how a file system translates a pathname to one or more disk accesses (that is, how it finds the associated item on disk).
– Use system calls to read directory entries.
– Use system calls to read file system metadata.
– Given the contents of a disk, identify the sequence of disk blocks accessed to resolve a pathname.
CPSC 313 – 2019W2 File Systems
94
Midterm 2 – 5:30 • ChecktheCanvasannouncement
• Coupleofthings
– We are providing you with a 30 minute start window 5:30 – 6:00 (If you
start after 6:00 you will only have until 7:00 to finish)
• You have 60 minutes from when you start taking the midterm to – Download the exam
– Complete the exam
– Save the exam as a PDF
– Add the exam to your repo
• After committing the PDF to your repo you have an additional 10 minutes to do the
push
– We are reusing your MT2_* repo from last week make sure to update your repo and clean out the cruft – instructions are in the announcement
CPSC 313 – 2019W2 File Systems
95
Today • Today’s Learning Outcomes
– Explain how a file system translates a pathname to one or more disk accesses (that is, how it finds the associated item on disk).
– Use system calls to read directory entries.
– Use system calls to read file system metadata.
– Given the contents of a disk, identify the sequence of disk blocks accessed to resolve a pathname.
CPSC 313 – 2019W2 File Systems
96
You are here!
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Map: fd to in- memory object
Read/write metadata
Allocate and manage file descriptors
Find location of file metadata
Map: memory object to disk blocks
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
CPSC 313 – 2019W2 File Systems
97
Examining a file’s metadata
• POSIX provides two* APIs to obtain information about a file’s metadata.
– int stat(const char *restrict path, struct stat *restrict buf);
– int fstat(ind fd, struct stat *buf);
• Returns (in the passed structure) the metadata for a file. – stat lets you access a file by name.
– fstat lets you access the file metadata by file descriptor.
* Actually more than two, but several are variants of the two I’ll introduce; feel free to use man pages to explore.
CPSC 313 – 2019W2 File Systems
98
struct stat { dev_t ino_t
st_dev;
st_ino;
/* device inode resides on */
/* inode’s number */
Let’s look at the stat structure
/* inode protection mode */
/* number of hard links to the file */
/* user-id of owner */
/* group-id of owner */
/* device type, for special file inode */
mode_t
nlink_t
uid_t
gid_t
dev_t
struct timespec st_atimespec; /* time of last access */
struct timespec st_mtimespec; /* time of last data modification */ struct timespec st_ctimespec; /* time of last file status change */
st_mode; st_nlink; st_uid; st_gid; st_rdev;
};
off_t
quad_t
u_long
u_long
u_long
st_size; /* file size, in bytes */
st_blocks; /* blocks allocated for file */
st_blksize;/* optimal file sys I/O ops blocksize */ st_flags; /* user defined flags for file */ st_gen; /* file generation number */
CPSC 313 – 2019W2 File Systems
99
What is the Problem we Have to Solve?
• Given a name, e.g., /Users/margo/UBC/TEACHING/313/midterm-with-answers.docx, how do we create a vnode for it?
• Constraints:
– All data and metadata has to live on the disk/SSD (which are accessed in block- sized units).
• You can read it into memory and cache it, but when you turn off your machine, the data and metadata need to be on the disk so you can start up next time you turn on the machine.
– We will look up names a component at a time. That is, you map each piece of the path separately, i.e.,
• Find “Users” in the “/” directory
• Find “margo” in the “Users” directory • Etc.
CPSC 313 – 2019W2 File Systems
100
Questions we need to answer
• Assuming that you need to implement this hierarchical name space (i.e., directories & files).
– How will we find the root directory (“/”)?
– How will we represent a directory?
– How will we support traversing up a directory tree (i.e., if you are in the directory /Users/margo, how do we get back to /Users)?
CPSC 313 – 2019W2 File Systems
101
• One directory for the entire disk (file system).
• Pros: • Cons:
• Small maximum name size.
• Set maximum number of files
at creation time. • Implementation:
– Pre-allocate space for the directory when you create the file system.
– The directory is essentially a big array of structures:
• char name[max-file-name- size];
• Either an actual file representation OR an id that easily maps to the file representation.
Naïve Naming
CPSC 313 – 2019W2 File Systems
102
• One directory for the entire disk (file system).
• Pros:
• Really simple
• Cons:
• Small maximum name size.
• Set maximum number of files
at creation time. • Implementation:
• •
Difficult to organize data No two objects may have the same name.
On a multi-user system, users might have name collisions
Names are limited.
– Pre-allocate space for the directory when you create the file system.
– The directory is essentially a big array of structures:
Naïve Naming
• size]; •
• char name[max-file-name-
• Either an actual file representation OR an id that easily maps to the file representation.
CPSC 313 – 2019W2 File Systems
103
• Generalized tree structure
– Directoriesareregularfileswithaspecial
– A bit in the file meta-data indicates that a file is of type directory.
– Adirectoryentryissimplyamapping between names and an inode number (a collection of name/value pairs).
• User programs can read directories just like they read files.
• Pros: • Cons:
format.
Hierarchical Naming
• Only the operating system can write directories (wouldn’t want a user to corrupt
the directory structure)
/
bin mach_kernel ls … cat
bin etc
usr
… lib
CPSC 313 – 2019W2 File Systems
104
• Generalized tree structure
– Directories are regular files with a
– A bit in the file meta-data indicates that a file is of type directory.
– A directory entry is simply a mapping between names and a file index (a collection of name/value pairs).
• User programs can read directories just like they read files.
• Only the operating system can write directories (wouldn’t want a user to corrupt the directory structure)
bin
ls … cat
• Pros:
• Makes names local to a directory (you and I can both have a file named “midterm2.txt”).
• Reuses whatever implementation we have for files for directories.
• Cons:
special format.
Hierarchical Naming
/ mach_kernel
•
Lookup is an iterative (and potentially expensive) operation.
usr
bin
etc
… lib
CPSC 313 – 2019W2 File Systems
105
Traditional Directory Implementation
• Directories are represented like files.
• Contents of directories are structured (struct dirents).
– Name
– Inode number (inumber) – Type
• Directories grow in chunks of dirents that fit on a single disk block.
• Root directory has a designated inode. CPSC 313 – 2019W2 File Systems
106
•
The Root Directory
This is the contents of the “/” directory on Margo’s old machine.
Name inumber
Applications 113 Documents 937803 Network 84416 Users 38892 cdrom 937840 etc 25116 net 3
sbin 4512 usr 40
.. 2
Name
Desktop Folder Library
System Volumes
cores home opt sw var
inumber
844727
213
37
26447
84418
5
937844
1024168
25156
Name inumber
Developer 844731 Marketocracy 937813 Updaters 937816 bin 24377 dev 296 mach_kernel 552433 private 214 tmp 25155
. 2
CPSC 313 – 2019W2 File Systems
107
Walking a Directory Path
• Given a path /C1/C2/C3 …
– Start at the root directory (a designated directory with a designated inumber).
1. Let inum = root directory inumber; current component = C1 2. Readthedirectorydataforinum
3. Find the entry with the name equal to the current component 4. Find the associated inumber
5. Read the inode for that inumber
– If it’s not a directory and this is the last component of the path, we’re done.
– If it’s not a directory and we have more components, then this is a bad pathname.
– If it is a directory, set inum to the inumber; set current component to next part of path and iterate back to step 2.
CPSC 313 – 2019W2 File Systems
108
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
Disk block number
100 101 102 … 200
201
202
203
204
Contents
Directory Example
Data Blocks inodes
CPSC 313 – 2019W2 File Systems
109
200 202 203
204 205
.,2 usr, 16
., 11
is in
.,8
csh, 105 .,9
font, 77 ., 16 share, 52
.., 2 boot, 35
.., 2 this file .., 2
.., 16
.., 2 ucb, 15
bin, 8 kadb, 27
Some text ls, 91 libc.a, 55
lib, 9 old, 66
Directory Example (1)
Disk block number
Contents
100
200
101
202
203
102
204
205
…
200
., 2
.., 2
bin, 8
usr,16
boot, 35
kadb, 27
201
., 11
.., 2
Some text
is in
this file
202
., 8
.., 2
ls, 91
csh, 105
203
., 9
.., 16
libc.a, 55
font, 77
204
., 16
.., 2
lib, 9
share, 52
ucb, 15
old, 66
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
1. The root inode has inumber 2, so we go to the first inode block (100) and access the third inode in that block.
2. Our picture indicates that that inode says the corresponding object (the root directory) is in disk block 200.
CPSC 313 – 2019W2 File Systems
110
Data Blocks inodes
Directory Example (2)
Disk block number
Contents
100
200
101
202
203
102
204
205
…
200
., 2
.., 2
bin, 8
usr,16
boot, 35
kadb, 27
201
., 11
.., 2
Some text
is in
this file
202
., 8
.., 2
ls, 91
csh, 105
203
., 9
.., 16
libc.a, 55
font, 77
204
., 16
.., 2
lib, 9
share, 52
ucb, 15
old, 66
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
3. Reading block 200 means we are reading the actual contents of the directory.
4. We look for an entry with the name “usr”; it is inumber 16.
CPSC 313 – 2019W2 File Systems
111
Data Blocks inodes
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
5. Inumber 16 is the 17th inode, so it lives in the 3rd inode block.
6. The 17th inode would be the 1st in this block; it says that the corresponding object can be found at block 204.
Disk block number
100 101 102 …
200
201
202
203
204
Contents
Directory Example (3)
200 202 203
204 205
.,2 usr, 16
., 11 is in
.,8
csh, 105
.,9 font, 77
., 16 share, 52
.., 2 boot, 35
.., 2 this file
.., 2
.., 16
.., 2 ucb, 15
bin, 8 kadb, 27
Some text
ls, 91
libc.a, 55
lib, 9 old, 66
Data Blocks inodes
CPSC 313 – 2019W2 File Systems
112
Directory Example (4)
Disk block number
Contents
100
200
101
202
203
102
204
205
…
200
., 2
.., 2
bin, 8
usr,16
boot, 35
kadb, 27
201
., 11
.., 2
Some text
is in
this file
202
., 8
.., 2
ls, 91
csh, 105
203
., 9
.., 16
libc.a, 55
font, 77
204
., 16
.., 2
lib, 9
share, 52
ucb, 15
old, 66
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
7. Block 204 contains the contents of the /usr directory.
8. We need to find “lib.” It has inumber 9.
CPSC 313 – 2019W2 File Systems
113
Data Blocks inodes
Directory Example (5)
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
9. Inumber 9 is the 10th inumber; it will be in the second block, 101.
10. The 10th inumber is the second in this block; its contents are in block 203.
Disk block number
100 101 102 …
200
201
202
203
204
Contents
200 202 203
204 205
.,2 usr, 16
., 11 is in
.,8
csh, 105
.,9 font, 77
., 16 share, 52
.., 2 boot, 35
.., 2 this file
.., 2
.., 16
.., 2 ucb, 15
bin, 8 kadb, 27
Some text
ls, 91
libc.a, 55
lib, 9 old, 66
Data Blocks inodes
CPSC 313 – 2019W2 File Systems
114
Directory Example (6)
Disk block number
Contents
100
200
101
202
203
102
204
205
…
200
., 2
.., 2
bin, 8
usr,16
boot, 35
kadb, 27
201
., 11
.., 2
Some text
is in
this file
202
., 8
.., 2
ls, 91
csh, 105
203
., 9
.., 16
libc.a, 55
font, 77
204
., 16
.., 2
lib, 9
share, 52
ucb, 15
old, 66
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
11. Block 203 contains the contents of the /usr/lib directory.
12. We want to find the entry for libc.a, which has inumber 55.
CPSC 313 – 2019W2 File Systems
115
Data Blocks inodes
Directory Example (7)
Disk block number
Contents
100
200
101
202
203
102
204
205
…
200
., 2
.., 2
bin, 8
usr,16
boot, 35
kadb, 27
201
., 11
.., 2
Some text
is in
this file
202
., 8
.., 2
ls, 91
csh, 105
203
., 9
.., 16
libc.a, 55
font, 77
204
., 16
.., 2
lib, 9
share, 52
ucb, 15
old, 66
Assume:
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains
8-15, etc.
• There are 100 blocks of inodes
Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
13. We would then need to go read the inode corresponding to inumber 55. (It would be in what block?)
14. How many IOs did we issue in total?
CPSC 313 – 2019W2 File Systems
116
Data Blocks inodes
Assume: Directory Example (8)
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains 8-15, etc.
• There are 100 blocks of inodes Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
13. We would then need to go read the inode corresponding to inumber 55. (It would be in what block?) 55 / 8 = 7 (55 is the last inode in the 7th block, and that block would be #106)
14. How many IOs did we issue in total? 7: root inode (100), root dir (200), /usr inode (102), /usr dir (204), lib inode (101), lib dir (203), libc.a inode (106)
Disk block number
100 101 102 … 200
201
202
203
204
Contents
Data Blocks inodes
CPSC 313 – 2019W2 File Systems
117
200 202 203
204 205
.,2 usr, 16
., 11
is in
.,8
csh, 105 .,9
font, 77 ., 16 share, 52
.., 2 boot, 35
.., 2 this file .., 2
.., 16
.., 2 ucb, 15
bin, 8 kadb, 27
Some text ls, 91 libc.a, 55 lib, 9
old, 66
CPSC 313
• Today
– More directory fun
– Deeper dive into file sizes and blocks
– Case Studies –
• Unix V6 file system • Ext2fs
CPSC 313 – 2019W2 File Systems
118
Admin
• Midterm 2 returned via gradescope – average 80% • Assignment 3 due Friday
• Yes there will be an A4
CPSC 313 – 2019W2 File Systems
119
Assume: Directory Example (8)
• Inode 2 is in disk block 100
• Inodes fit 8 to the block
• Block 100 contains inodes 0-7, 101 contains 8-15, etc.
• There are 100 blocks of inodes Exercise:
List all the blocks, in order that you need to read to open /usr/lib/libc.a
13. We would then need to go read the inode corresponding to inumber 55. (It would be in what block?) 55 / 8 = 7 (55 is the last inode in the 7th block, and that block would be #106)
14. How many IOs did we issue in total? 7: root inode (100), root dir (200), /usr inode (102), /usr dir (204), lib inode (101), lib dir (203), libc.a inode (106)
Disk block number
100 101 102 … 200
201
202
203
204
Contents
Data Blocks inodes
CPSC 313 – 2019W2 File Systems
120
200 202 203
204 205
.,2 usr, 16
., 11
is in
.,8
csh, 105 .,9
font, 77 ., 16 share, 52
.., 2 boot, 35
.., 2 this file .., 2
.., 16
.., 2 ucb, 15
bin, 8 kadb, 27
Some text ls, 91 libc.a, 55 lib, 9
old, 66
More Directory Fun
• In POSIX, every directory has two special entries “.” and “..”.
– The “.” directory refers to the directory itself.
– The “..” directory refers to the parent directory.
– This is how the file system implements paths such as ../asst2.
• It is possible for more than one directory entry to refer to a single file.
– Hardlink:thesameinumberappearsintwodifferentdirectories.Thereferencecountforthe inumber is incremented.
• Could you create a hard link between two directories in different file systems?
• When you remove (unlink) a file, you decrement its reference count and remove a name from a directory.
When the reference count goes to zero, the file’s blocks are actually freed. – Soft link (symbolic link): file that contains the name of another file.
• Files of this sort are identified by a bit in their file descriptor.
• When the OS encounters a symbolic link, it continues pathname resolution using the pathname that appears
in the file.
• Can you create a soft link between two directories?
• What is the minimum link count for a directory?
CPSC 313 – 2019W2 File Systems
121
In POSIX, every directory has two special
entries “.” and “..”.
• The “.” directory refers to the directory 101
200 202 203
itself.
• The “..” directory refers to the parent
directory.
This is how the file system
implements paths such as ../old.
102 204 205 …
200 .,2
usr, 16
201 ., 11 is in
202 .,8
csh, 105
203 .,9 font, 77
204 ., 16 share, 52
.., 2 boot, 35
.., 2 this file .., 2
.., 16
.., 2 ucb, 15
bin, 8 kadb, 27
Some text ls, 91 libc.a, 55 lib, 9
old, 66
More Directory Fun
Disk block number
100
Contents
Data Blocks inodes
CPSC 313 – 2019W2 File Systems
122
More Directory Fun
• It is possible for more than one directory entry to refer to a single file.
– Hard link: the same inumber appears in two different directories. The reference count for the inumber is incremented.
• Could you create a hard link between a directory and a file on a different file system?
• Could you create a hard link from a directory to another directory on the same file system?
• When you remove (unlink) a file, you decrement its reference count and remove a name from a directory. When the reference count goes to zero, the file’s blocks are actually freed.
– Soft link (symbolic link): file that contains the name of another file.
• Files of this sort are identified by a bit in the i_mode filed of an ext2 inode.
• When the OS encounters a symbolic link, it continues pathname resolution using the pathname that appears in the file. (remember an inode is the starting point of a file)
• Can you create a soft link between two directories?
• What is the minimum link count for a directory?
CPSC 313 – 2019W2 File Systems
123
Working Directory
• It is cumbersome (and inefficient for the OS) to use full pathnames every time you reference a file.
• POSIX maintains a single “current working directory” (cwd) for each process. The inumber of the cwd is stored in a structure that the operating system maintains for each user process.
• When the OS wants to translate a name to an inumber, it looks at the first character in the path. If that character is “/”, the OS begins looking at the root. If it is not a path, the OS begins looking in the current directory.
• Some systems allow you to have more than one current working directory. The list of directories that are in the “current working directory set” are called a search path. (This is also a feature supported by most shells.)
CPSC 313 – 2019W2 File Systems
124
};
15 blocks:
12 data blocks
1 indirect block
1 double indirect block 1 triple indirect block
The on-disk structure (from Linux ext2)
struct ext2_inode { __le16 i_mode;
/* File mode */
/* Low 16 bits of Owner Uid */ /* Size in bytes */
/* Access time */
/* Creation time */
/* Modification time */
/* Deletion Time */
/* Low 16 bits of Group Id */ /* Links count */
/* Blocks count */
/* File flags */
Why both a block count and size? Isn’t size enough?
__le16 i_uid;
__le32 i_size;
__le32 i_atime;
__le32 i_ctime;
__le32 i_mtime;
__le32 i_dtime;
__le16 i_gid;
__le16 i_links_count;
__le32 i_blocks;
__le32 i_flags;
__le32 /* Machine dependent field here */
__le32 i_block[EXT2_N_BLOCKS];/* Pointers to blocks */
__le32 i_generation; __le32 i_file_acl; __le32 i_dir_acl; __le32 i_faddr;
/* File version (for NFS) */ /* File ACL */
/* Directory ACL */
/* Fragment address */
/* More OS dependent stuff here. */
CPSC 313 – 2019W2 File Systems
125
• Today
– Admin stuff
– Case Studies –
• Unix V6 file system • Ext2fs
CPSC 313
CPSC 313 – 2019W2 File Systems
126
Admin
• Office hours
– After each class until 12:20 (or 12:30 if there are still people with
questions)
– Friday 16:30 – 17:00
– Margo’s hours remain unchanged
CPSC 313 – 2019W2 File Systems
127
New Course Grading Schemes
Science courses required to develop 2 new course grading schemes that meet the following requirements:
• Agradingschemethatweightsthefinalexam5%
• Agradingschemethatweightsthefinalexam30%
• Thefinalexamcannotbeworth0
• Otherassessmentsinthecoursecannotbereducedto0
• Anyrequirementtopassthefinaltopassthecoursemustbe removed
In Computer Science we also have to receive the approval of our Associate Head for Undergraduate Operations. We have done that.
CPSC 313 – 2019W2 File Systems
128
Other Things
• Your grade will be computed using both schemes and you will get the best of the two. You do not have to select one.
• If you are a Science student you will also have other options sent to you in a letter
• Non-Science students check with your home faculty to determine if there are other options available to you
CPSC 313 – 2019W2 File Systems
129
The 30% Option
• Assignments (4) 35%
• Midterms (2) 28%
• Quizzes (best 5 of 6) 7%
• Final
30% (Do not have to pass the final to pass the course)
• With respect to the assignment component, each assignment will be normalized so that each assignment is out of the same amount. The assignment component will be computed by weighting the best 3 assignments 10% each and the weakest assignment 5%,
• With respect to assignments, a grade of 17.5 out of the 35 on the assignment component of the course must be achieved to pass the course. If this requirement is not met the grade will be the lower of 45% or the computed course grade.
CPSC 313 – 2019W2 File Systems
130
The 5% Option
• Assignments (4) 40%
• Midterms (2) 45%
• Quizzes (best 5 of 6) 10%
• Final
5% (Do not have to pass the final to pass the course)
• With respect to the assignment component, each assignment will be normalized so that each assignment is out of the same amount. The assignment component will be computed by weighting the best 3 assignments 11% each and the weakest assignment 7%,
• With respect to assignments, a grade of 20 out of the 40 on the assignment component of the course must be achieved to pass the course. If this requirement is not met the grade will be the lower of 45% or the computed course grade.
CPSC 313 – 2019W2 File Systems
131
You are here!
Posix API: hierarchical name space, byte-streams, open, close, read, write
open
close
read write
Map: Name to file metadata
Allocate and manage file descriptors
Map: fd to in- memory object
Find location of file metadata
Read/write data from/to disk Persistent storage: Numbered disk blocks, checksums and ECC, bad block handling
Read/write metadata
Map: memory object to disk blocks
CPSC 313 – 2019W2 File Systems
132
Case Study – The Unix v6 File System
• What is v6?
– One of the first widely distributed versions of Unix (1975)
– Ran on PDP-11 (a minicomputer from Digital Equipment Corporation) – Released as full source code
– Documented by John Lions of the University of New South Wales.
• Why V6: simple, but ancestor of most modern file systems. • Overview
• Diskrepresentation
• Directory structure
• Recoverycharacteristics
• With enormous thanks to: – Ken Thompson and Dennis Ritchie
– John Lions (https://en.wikipedia.org/wiki/Lions’_Commentary_on_UNIX_6th_Edition,_with_Source_Code) – Keith Bostic
– Whoever is behind: http://v6.cuzuco.com/
CPSC 313 – 2019W2 File Systems
133
Overview
• Disk is divided up into 512-byte blocks
– Block 0: boot block
– Block 1: superblock (struct filsys)
– Blocks 2 – f(Ninodes): inodes (16 inodes/block)
– Rest of disk contains file data (and spare blocks for “bad block” handling)
• FreeSpacemanagement
– Up to 99 blocks, referenced directly in the superblock.
– 1 block as the head of a linked list of blocks containing addresses of other free blocks (pictures coming).
CPSC 313 – 2019W2 File Systems
134
File system level metadata
• struct filsys
– Today, we call this the superblock
– Created when you create the file system – Read when you mount the file system
struct filsys { int s_isize; int s_fsize; int s_nfree;
int s_free[100]; int s_ninode;
int s_inode[100]; char s_flock; char s_ilock; char s_fmod;
char s_ronly; int s_time[2]; int pad[50];
}
/* size in blocks of the I list */
/* size in blocks of the entire volume */ /* number of in core free blocks
(between 0 and 100) */
/* in core free blocks */
/* number of in core I nodes (0-100) */ /* in core free I nodes */
/* lock during free list manipulation */ /* lock during I list manipulation */
/* super block modified flag */
/* super block read-only flag */
/* current date of last update */
CPSC 313 – 2019W2 File Systems
135
Disk Layout
Block numbers
0 1 2 – ( 1 + f . s _ i s i z e ) ( 2 + f . s _ i s i z e ) – …( f . s _ f s i z e – 1 ) …
Boot block
Super block
inode blocks data blocks
CPSC 313 – 2019W2 File Systems
136
Free Space Management
Super block
s_free
…
Block numbers of free blocks
s_nfree
nfree
next in chain free block number free block number free block number free block number
Number (up to 100 Block numbers in this block)
…
CPSC 313 – 2019W2 File Systems
137
free block number
Per-file Metadata (on-disk) • On disk inode:
struct ino {
int i_mode;
/* File mode */
/* Link count */
/* Owner user id */
/* Group id */
/* most significant of size */ /* least sig */
/* Disk addresses of blocks */ /* Access time */
/* Modified time */
}
char i_nlink; char i_uid; char i_gid; char i_size0; char *i_size1; int i_addr[8]; int i_atime[2]; int i_mtime[2];
CPSC 313 – 2019W2 File Systems
138
Per-file Metadata (in-memory) • In-memoryinode:
struct inode {
char i_flag;
char i_count;
int i_dev;
int i_number;
int i_mode;
char i_nlink;
char i_uid;
char i_gid;
char i_size0;
char *i_size1;
int i_addr[8];
int i_lastr;
/* reference count */
/* device where inode resides */
/* i number 1:1 w/device addr */
/* directory entries*/
/* owner */
/* group of owner */
/* most significant of size */
/* least sig */
/* device addresses constituting file */
/* last logical block read */
}
CPSC 313 – 2019W2 File Systems
139
Different sized files (1)
ino
flag
…
daddr[0]
daddr[1]
daddr[2]
daddr[3]
daddr[4]
daddr[5]
daddr[6]
daddr[7]
Zero in file size bit here indicates “small” file
Small file:
• Fewer than 8 blocks
• Daddr contains addresses
of data blocks. • Max size:
• 512*7=3.5KB
CPSC 313 – 2019W2 File Systems
140
Different sized files (2)
ino
flag
…
daddr[0]
daddr[1]
daddr[2]
daddr[3]
daddr[4]
daddr[5]
daddr[6]
daddr[7]
One in file size bit here indicates “large” file
Large file:
• No more than 7 entries
• Daddr contains addresses of
indirect blocks. • Max size:
• 7*256*512=889KB
CPSC 313 – 2019W2 File Systems
141
Different sized files (3)
ino
flag … daddr[0] daddr[1] daddr[2] daddr[3] daddr[4] daddr[5] daddr[6] daddr[7]
One in file size bit here indicates “large” file
…
CPSC 313 – 2019W2 File Systems
142
Huge file:
• 7 indirect blocks addresses in
Daddr, AND
• Daddr[8] contains a double
indirect block.
• Max size:
• 7*256*512=889KB+ • 256*256*512=
• 16MB+889KB
Directory Entries
• Hierarchical directory structure that you know and love, including “.” and “..”.
• Directory entries are 16 bytes: – 2 bytes of inode number
– 14 bytes (right padded) of name
• A directory entry with inode = 0 is unused CPSC 313 – 2019W2 File Systems
143
The Linux ext2 File System • SecondExtendedFileSystem(1993)
• Whyext2:simple,moremodernthanv6,basisofext3,ext4 • Overview
• Disk representation
• Directory structure
• Recovery characteristics
• With enormous thanks to:
– https://www.nongnu.org/ext2-doc
– https://wiki.osdev.org/Ext2
– Kevin Elphinstone of UNSW: https://cgi.cse.unsw.edu.au/~cs3231/18s1/lectures/lect10.pdf
CPSC 313 – 2019W2 File Systems
144
Overview
• Disk is divided up into blocks – size is specified when you create the file system (1 KB, 2KB, 4KB, or 8KB; in theory they can be larger).
– Superblock: 1024 bytes located at byte 1024 from the beginning of the volume. • Contents are similar to what we saw in v6.
• In ext2, this structure is called the struct ext2_super_block
– Afterthesuperblock,wehaveablockdescriptortable,whichdescribestheBlockGroups.Ittells you where to find things in the block groups.
– The disk is divided into Block Groups, to facilitate placing file data close together. A block group contains:
• Superblock copy [In every group in version 0; in some groups in later versions] • Group descriptor table copy [See superblock]
• Data Block bitmap (1 block)
• Inode bitmap (1 block)
Limits size of block groups
• Inode table (contains the number of entries that the superblock says are allowed per group) • Data blocks (rest of the block group)
CPSC 313 – 2019W2 File Systems
145
File system level metadata
• Struct ext2_super_block
• Similar to what we saw in v6 (and pretty much every other file system).
• A few ext2-specific things: – s_blocks_per_group
– s_inodes_per_group – s_log_block_size
CPSC 313 – 2019W2 File Systems
146
Disk Layout
Copies of superblock and block descriptor table
File system with blocksize = 1 KB
Block numbers:
0 1
2
…
…File system with blocksize > 1 KB
147
Block Group 0
Block Group 1
Block Group 2
Reserved
Block descriptor table
Super block
01
Block Group 0
Block Group 1
Block Group 2
Reserved Superblock
table
Block descriptor
CPSC 313 – 2019W2 File Systems
Only in groups 0, 1, and multiples of 3, 5, 7
Data block bitmap
…
inode table
…
data blocks
Super block Block descriptor table
Inode bitmap
CPSC 313
CPSC 313 – 2019W2 File Systems
148
Disk Layout: Block Groups
CPSC 313 – 2019W2 File Systems
149
CPSC 313 – Finish off ext2 file system case study
• Today
– Virtual memory and program isolation
CPSC 313 – 2019W2 File Systems
150
Admin
• Assignment 4 is out:
– Due April 10 23:59
– Can be done with a partner (encouraged to do so)
• Quiz 5 (#6) available Friday, due Monday at standard times • Final exam time:
– We have been instructed that it is to be given at the scheduled time regardless of time zone you are in
– Unfortunately that may be a very inconvenient time for some who have left the Vancouver area
CPSC 313 – 2019W2 File Systems
151
Quick Overview of things
Block Group 0 Block Group 1
… Block Group n-2 Block Group n-1
Inode Inode Table Data Blocks Bitmap
Super Block
Block Block Descriptor Bitmap
CPSC 313 – 2019W2 File Systems
152
Free Space Management
• How do you determine which blocks on the disk can be allocated to a new file?
• The bitmaps in the block groups tell you:
– Each bit represents a block
– Bit = 1 says the block is in use; bit = 0 says the block is available.
CPSC 313 – 2019W2 File Systems
153
Per-file Metadata (on-disk)
struct ext2_inode {
__le16 i_mode;
/* File mode */
/* Low 16 bits of Owner Uid */
/* Size in bytes */
/* Access time */
/* Creation time */
/* Modification time */
/* Deletion Time */
/* Low 16 bits of Group Id */
__le16 i_uid;
__le32 i_size;
__le32 i_atime;
__le32 i_ctime;
__le32 i_mtime;
__le32 i_dtime;
__le16 i_gid;
__le16 i_links_count; /* Links count */
__le32 i_blocks; /* Blocks count */
__le32 i_flags; /* File flags */
__le32 /* Machine dependent field here */
__le32 i_block[EXT2_N_BLOCKS];/* Pointers to blocks */
__le32 i_generation; /* File version (for NFS) */
__le32 i_file_acl; /* File ACL */
15 blocks:
12 data blocks
1 indirect block
1 double indirect block 1 triple indirect block
};
__le32 i_dir_acl; /* Directory ACL */
__le32 i_faddr; /* Fragment address */
/* More OS dependent stuff here. */
CPSC 313 – 2019W2 File Systems
154
Per-file Metadata (in-memory)
struct ext2_inode_info {
__le32 i_data[15];
__u32 i_flags;
__u32 i_faddr;
__u8 i_frag_no;
__u8 i_frag_size;
__u16 i_state;
__u32 i_file_acl;
__u32 i_dir_acl;
__u32 i_dtime;
__u32 i_bock_group;
struct ext2_block_alloc_info *i_block_alloc_info; __u32 i_dir_start_lookup;
rwlock_t i_meta_lock; struct mutex truncate_mutex; struct inode vfs_inode; struct list_head i_orphan;
}
From disk
In memory only
CPSC 313 – 2019W2 File Systems
155
daddr[0] daddr[1] daddr[2] daddr[3] daddr[4] daddr[5] daddr[6] daddr[7] daddr[8] daddr[9] daddr[10] daddr[11] daddr[12] daddr[13] daddr[14]
Standard Hybrid Index
…
……… …………
… ……
CPSC 313 – 2019W2 File Systems
156
Directory Entries
• Hierarchical directory structure that you know and love, including “.” and “..”.
• A directory contains a linked list of directory entry structures:
struct ext2_dir_entry { __le32 inode;
/* Inode number */
/* Directory entry length */ /* Name length */
/* File name */
Later versions of ext2 split the name_len into an 8-bit name length and an 8-bit file type
}
__le16 rec_len; __le16 name_len; char name[];
• Directory entries are 4-byte aligned. CPSC 313 – 2019W2 File Systems
157
Inode = 1234 Rec_len = 12 Name_len = 1 “.”
Building a Directory
Inode = 2 Rec_len = 1012 Name_len = 2 “..”
Block allocated to a directory
CPSC 313 – 2019W2 File Systems
158
Inode = 1234 Rec_len = 12 Name_len = 1 “.”
Inode = 2 Rec_len = 10212 Name_len = 2 “..”
Let’s create file: fuzzer *
Inode = 1235 Rec_len = 1?000 Name_len = 6 “Fuzzer ”
Building a Directory
Block allocated to a directory
* Fuzzer is one of the Margo’s cats
CPSC 313 – 2019W2 File Systems
159
Exercise
• Critique this file system design:
1. Howwelldoesithandlesequentialaccess?
2. Howwelldoesithandlerandomaccess?
3. Whatkindsoffreespacemanagementproblemscanyouforesee? 4. Wouldthisworkwellforasystemofmanytinyfiles?
5. Howaboutasystemofmanyhugefiles?
6. Howwouldyoucomparethistov6?
CPSC 313 – 2019W2 File Systems
160