CS考试辅导 CS 111 Summer 2022

CS 111 Summer 2022
Lecture 13 Page 1
Operating System Principles: File Systems
Operating Systems

Copyright By PowCoder代写 加微信 powcoder

• File systems:
– Why do we need them?
– Why are they challenging?
• Basic elements of file system design
• Example file systems – DOS FAT
– Unix System V file system
CS 111 Summer 2022
Lecture 13 Page 2

Introduction
• Most systems need to store data persistently
– So it’s still there after reboot, or even power down
• Typically a core piece of functionality for the
– Which is going to be used all the time
• Even the operating system itself needs to be stored this way
• So we must store some data persistently
CS 111 Summer 2022
Lecture 13 Page 3

Our Persistent Data Options
• Use raw storage blocks to store the data
– On a hard disk, flash drive, whatever
– Those make no sense to users
– Not even easy for OS developers to work with
• Use a database to store the data
– Probably more structure (and possibly overhead)
than we need or can afford
• Use a file system
– Some organized way of structuring persistent data
CS 111 – Which makes sense to users and programmers Summer 2022
Lecture 13 Page 4

File Systems
Originallythecomputerequivalentofaphysical filing cabinet

Organizedbysomesimpleprinciple – E.g., alphabetically by title
– Or chronologically by date
Goalistoprovide:
– Persistence
– Ease of access
– Good performance Summer 2022
Lecture 13 Page 5

The Basic File System Concept • Organize data into natural coherent units
– Like a paper, a spreadsheet, a message, a program
• Store each unit as its own self-contained entity – A file
– Store each file in a way allowing efficient access • Provide some simple, powerful organizing
principle for the collection of files
– Making it easy to find them
– And easy to organize them
CS 111 Summer 2022
Lecture 13 Page 6

File Systems and Hardware
• File systems are typically stored on hardware providing persistent memory
– Flash memory, hard disks, tapes, etc.
• With the expectation that a file put in one
“place” will be there when we look again
• Performance considerations will require us to match the implementation to the hardware
• But ideally, the same user-visible file system should work on any reasonable hardware
CS 111 Summer 2022
Lecture 13 Page 7

What Hardware Do We Use?
• Until recently, file systems were designed primarily for hard disks
• Which required many optimizations based on disk latency factors
– File system design had to hide as much of the latency as possible
• Flash drives are now more common
– They have different characteristics than hard disks
– Requiring different adjustments in file system design
CS 111 Summer 2022
Lecture 13 Page 8

Flash Drives
• Solid state persistent storage devices
– I.e., no moving parts
• Reads and writes are fairly fast – Reads up to 100 MB/sec
– Writes up to 40 MB/sec
• But a given block can only be written once
– Writing again requires erasing
– Much slower and erases large sectors of the drive
CS 111 Summer 2022
How will these factors effect file system design?
Lecture 13 Page 9

Data and Metadata
• Filesystemsdealwithtwokindsofinformation
• Data–theinformationthatthefileisactually supposed to store
– E.g., the instructions of the program or the words in the letter
• Metadata–Informationabouttheinformationthefile stores
– E.g., how many bytes are there and when was it created – Sometimes called attributes
• Ultimately,bothdataandmetadatamustbestored persistently
– And usually on the same piece of hardware CS 111
Summer 2022
Lecture 13 Page 12

A Further Wrinkle
• Wewantourfilesystemtobeagnostictothestorage medium
• Sameprogramshouldaccessthefilesystemthesame way, regardless of medium
– Otherwise it’s hard to write portable programs
• Shouldworkforflashdrivesofdifferenttypes
• Orifweuseharddiskinsteadofflash
• OrifweuseaRAIDinsteadofonedisk
• Orifevenwedon’tusepersistentmemoryatall – E.g., RAM file systems
CS 111 Summer 2022
Lecture 13 Page 13

Desirable File System Properties
• Whatarewelookingforfromourfilesystem?
– Persistence
– Easy use model
• For accessing one file
• For organizing collections of files
– Flexibility
• No limit on number of files
• No limit on file size, type, contents
– Portability across hardware device types
– Performance
– Reliability
– Suitable security
CS 111 Summer 2022
Lecture 13 Page 14

The Performance Issue
• Howfastdoesourfilesystemneedtobe?
• Ideally,asfastaseverythingelse – Like CPU, memory, and the bus
– So it doesn’t provide a bottleneck
• Buttheseotherdevicesoperatetodayatnanosecond speeds
• Diskdrivesoperateatmillisecondspeeds
– Flash drives are more like microseconds, but still slower
• Suggestingwe’llneedtodosomeseriousworkto hide the mismatch
CS 111 Summer 2022
Lecture 13 Page 15

The Reliability Issue
• Persistence implies reliability
• We want our files to be there when we check, no matter what
• Not just on a good day
• So our file systems must be free of errors
– Hardware or software
• Remember our discussion of concurrency, race
conditions, etc.?
– Might we have some challenges here?
CS 111 Summer 2022
Lecture 13 Page 16

“Suitable” Security
• What does that mean?
• Whoever owns the data should be able to control who accesses it
– Using some well-defined access control model and mechanism
• With strong guarantees that the system will enforce his desired controls
– Implying we’ll check on access – To the extent performance allows
CS 111 Summer 2022
Lecture 13 Page 17

Basics of File System Design • Where do file systems fit in the OS?
• File control data structures
CS 111 Summer 2022
Lecture 13 Page 18

File Systems and the OS
file container operations
directory operations
system calls
virtual file system integration layer
device I/O
socket I/O
The file system API
A common internal interface for file systems
Device independent block I/O
Some example file systems
Non-file system services that use the same API
Lecture 13 Page 19
device driver interfaces (disk-ddi)
DVD disk drivers drivers
CS 111 Summer 2022
diskette drivers
flash drivers
EXT3 FS UNIX FS DOS FS UDF FS

File Systems and Layered
Abstractions
• At the top, apps think they are accessing files
• At the bottom, various block devices are reading and writing blocks
• There are multiple layers of abstraction in between
• Why not translate directly from application file
operations to devices’ block operations?
CS 111 Summer 2022
Lecture 13 Page 20

The File System API
App 1 App 2
App 3 App 4
system calls
file container operations
directory operations
virtual file system integration layer
Device independent block I/O
device driver interfaces (disk-ddi)
CD drivers
CS 111 Summer 2022
disk RAID drivers drivers
flash drivers
Lecture 13 Page 21
device I/O
socket I/O
EXT3 FS UNIX FS DOS FS CD FS

The File System API
• HighlydesirabletoprovideasingleAPIto programmers and users for all files
• Regardlessofhowthefilesystemunderneathis actually implemented
• Arequirementifonewantsprogramportability – Very bad if a program won’t work because there’s a
different file system underneath
• Threecategoriesofsystemcallshere
1. File container operations
2. Directory operations
3. File I/O operations
CS 111 Summer 2022
Lecture 13 Page 22

File Container Operations
• Standard file management system calls
– Manipulate files as objects
– These operations ignore the contents of the file
• Implemented with standard file system methods
– Get/set attributes, ownership, protection … – Create/destroy files and directories
– Create/destroy links
• Real work happens in file system implementation
CS 111 Summer 2022
Lecture 13 Page 23

Directory Operations
Directories provide the organization of a file system
– Typically hierarchical
– Sometimes with some extra wrinkles
At the core, directories translate a name to a lower-level file pointer
Operations tend to be related to that – Find a file by name
– Create new name/file mapping
– List a set of known names
CS 111 Summer 2022
Lecture 13 Page 24

File I/O Operations
• Open–usenametosetupanopeninstance
• Readdatafromfileandwritedatatofile
– Implemented using logical block fetches
– Copy data between user space and file buffer
– Request file system to write back block when done
– Change logical offset associated with open instance
• Mapfileintoaddressspace
– File block buffers are just pages of physical memory
– Map into address space, page it to and from file system
CS 111 Summer 2022
Lecture 13 Page 25

The Virtual File System Layer
App 1 App 2
system calls
virtual file system integration layer
App 3 App 4
device I/O
socket I/O
Device independent block I/O
device driver interfaces (disk-ddi)
DVD drivers
CS 111 Summer 2022
disk RAID drivers drivers
flash drivers
Lecture 13 Page 26
file container operations
directory operations
EXT3 FS UNIX FS DOS FS UDF FS

The Virtual File System
(VFS) Layer
• Federationlayertogeneralizefilesystems
– Permits rest of OS to treat all file systems as the same – Support dynamic addition of new file systems
• Plug-ininterfaceforfilesystemimplementations – DOS FAT, Unix, EXT3, ISO 9660, NFS, etc.
– Each file system implemented by a plug-in module
– All implement same basic methods
• Create, delete, open, close, link, unlink,
• Get/put block, get/set attributes, read directory, etc.
• Implementationishiddenfromhigherlevelclients CS 111 – All clients see are the standard methods and properties
Summer 2022
Lecture 13 Page 27

The File System Layer
system calls
virtual file system integration layer
App 3 App 4
device I/O
socket I/O
Device independent block I/O
device driver interfaces (disk-ddi)
DVD drivers
CS 111 Summer 2022
disk RAID drivers drivers
flash drivers
Lecture 13 Page 28
file container operations
directory operations
EXT3 FS UNIX FS DOS FS UDF FS

The File Systems Layer
• Desirabletosupportmultipledifferentfilesystems
• AllimplementedontopofblockI/O
– Should be independent of underlying devices
• Allfilesystemsperformsamebasicfunctions
– Map names to files
– Map into
– Manage free space and allocate it to files
– Create and destroy files
– Get and set file attributes
– Manipulate the file name space
CS 111 Summer 2022
Lecture 13 Page 29

Why Multiple File Systems?
Whynotinsteadchooseone“good”one? Theremaybemultiplestoragedevices
– E.g., hard disk and flash drive
– They might benefit from very different file systems
Differentfilesystemsprovidedifferentservices, despite the same interface
– Differing reliability guarantees – Differing performance
– Read-only vs. read/write
Differentfilesystemsusedfordifferentpurposes
– E.g., a temporary file system Summer 2022
Lecture 13 Page 30

Device Independent Block I/O Layer
system calls
virtual file system integration layer
file container operations
device I/O
socket I/O
Device independent block I/O
device driver interfaces (disk-ddi)
disk RAID drivers drivers
flash drivers
CS 111 Summer 2022
DVD drivers
Lecture 13 Page 31
directory operations
EXT3 FS UNIX FS DOS FS UDF FS

File Systems and Block I/O Devices
• FilesystemstypicallysitonageneralblockI/Olayer
• Ageneralizingabstraction–makealldiskslooksame
• Implementsstandardoperationsoneachblockdevice – Asynchronous read (physical block #, buffer, bytecount)
– Asynchronous write (physical block #, buffer, bytecount)
• Maplogicalblocknumberstodeviceaddresses – E.g., logical block number to
• Encapsulatealltheparticularsofdevicesupport
– I/O scheduling, initiation, completion, error handlings
– Size and alignment limitations
CS 111 Summer 2022
Lecture 13 Page 32

Why Device Independent Block I/O?
• Abetterabstractionthangenericdisks
• AllowsunifiedLRUbuffercachefordiskdata – Hold frequently used data until it is needed again
– Hold pre-fetched read-ahead data until it is requested
• Providesbuffersfordatare-blocking
– Adapting file system block size to device block size – Adapting file system block size to user request sizes
• Handlesautomaticbuffermanagement
– Allocation, deallocation
– Automatic write-back of changed buffers
CS 111 Summer 2022
Lecture 13 Page 33

Why Do We Need That Cache?
• File access exhibits a high degree of reference locality at multiple levels:
– Users often read and write parts of a single block in small operations, reusing that block
– Users read and write the same files over and over
– Users often open files from the same directory
– OS regularly consults the same meta-data blocks
• Having common cache eliminates many disk accesses, which are slow
CS 111 Summer 2022
Lecture 13 Page 34

Why A Single Block I/O Cache?
• Why not one per process (or user)?
• Or one per device?
• A single cache is more efficient when multiple users access the same file
• A single cache provides better hit ratio than several independent caches
– Whether per process, user, or device
– Generally true for caching, not just here
CS 111 Summer 2022
Lecture 13 Page 35

File Systems Control Structures • Afileisanamedcollectionofinformation
• Primaryrolesoffilesystem:
– To store and retrieve data
– To manage the media/space where data is stored
• Typicaloperations:
– Where is the first block of this file?
– Where is the next block of this file?
– Where is block 35 of this file?
– Allocate a new block to the end of this file
– Free all blocks associated with this file
CS 111 Summer 2022
Lecture 13 Page 36

Finding Data On Devices
• Essentially a question of how you managed the space on your device
• Space management on a device is complex
– There are millions of blocks and thousands of files – Files are continuously created and destroyed
– Files can be extended after they have been written – Data placement may have performance effects
– Poor management leads to poor performance
• Must manage the space assigned to each file CS 111 – On-device, master data structure for each file
Summer 2022
Lecture 13 Page 37

On-Device File Control Structures
• On-devicedescriptionofimportantattributesofafile
– Particularly where its data is located
• Virtuallyallfilesystemshavesuchdatastructures
– Different implementations, performance & abilities
– Implementation can have profound effects on what the file system can do (well or at all)
• Acoredesignelementofafilesystem
• Pairedwithsomekindofin-memoryrepresentation
of the same information
CS 111 Summer 2022
Lecture 13 Page 38

The Basic File Control Structure Problem
• A file typically consists of multiple data blocks
• The control structure must be able to find them
• Preferably be able to find any of them quickly – I.e., shouldn’t need to read the entire file to find a
block near the end
• Blocks can be changed
• New data can be added to the file – Or old data deleted
• Files can be sparsely populated CS 111
Summer 2022
Lecture 13 Page 39

Sparse Files
• Can we have a file that isn’t yet entirely “filled in”?
• Some parts have been written, but not all
– And some unwritten ones are in the middle
– Not just at the end
The empty blocks need not be allocated (yet)
Full Empty
4095 4096 8191 8192 12287 12288 16383 16384 20479 20480 24575 24576 28671
CS 111 Summer 2022
Lecture 13 Page 40

The In-Memory Representation
• There is an on-disk structure pointing to device blocks (and holding other information)
• When file is opened, an in-memory structure is created
• Not an exact copy of the device version
– The device version points to device blocks
– The in-memory version points to RAM pages • Or indicates that the block isn’t in memory
– Also keeps track of which blocks have been written and which aren’t
CS 111 Summer 2022
Lecture 13 Page 41

In-Memory Structures and
• What if multiple processes have the same file
• Should they share one control structure or have one each?
• In-memory structures typically contain a cursor pointer
– Indicating how far into the file data has been read/written
• Sounds like that should be per-process . . .
CS 111 Summer 2022
Lecture 13 Page 42

Per-Process or Not?
• What if cooperating processes are working with the same file?
– They might want to share a file cursor
• And how can we know when all processes are finished with an open file?
– So we can reclaim space used for its in-memory descriptor
• Implies a two-level solution
1. A structure shared by all
2. A structure shared by cooperating processes
CS 111 Summer 2022
Lecture 13 Page 43

The Unix Approach
Two processes can share one descriptor
Two descriptors can share one inode
offset options I-node ptr
offset options I-node ptr
offset options I-node ptr
offset options I-node ptr
offset options I-node ptr
CS 111 Summer 2022
I-node I-node
I-node I-node
Open-file references (UNIX user file descriptor)
in process descriptor
Open file instance descriptors
In-memory file descriptors (UNIX struct inode)
On-disk file descriptors (UNIX struct
dinode) Lecture 13 Page 44

File System Structure
• How do I organize a device into a file system?
CS 111 Summer 2022
Lecture 13 Page 45
– Linked extents
• The DOS FAT file system
– File index blocks
• Unix System V file system

Basics of File System Structure
Mostfilesystemsliveonblock-orienteddevices Suchvolumesaredividedintofixed-sizedblocks
– Many sizes are used: 512, 1024, 2048, 4096, 8192 …
“meta-data” – Description of the file system (e.g., layout and state)
– File control blocks to describe individual files
– Lists of free blocks (not yet allocated to any file)
Allfilesystemshavesuchdatastructures
– Different OSes and file systems have very different goals
– These result in very different implementations Summer 2022
Lecture 13 Page 46

The Boot Block
• The 0th block of a device is usually reserved
for the boot block
– Code allowing the machine to boot an OS – Not just for DOS, for all OSes
• Not usually under the control of a file system – It typically ignores the boot block entirely
• Not all devices are bootable
– But the 0th block is usually reserved, “just in case”
• So file systems start work at block 1
CS 111 Summer 2022
Lecture 13 Page 47

Managing Allocated Space
• Acoreactivityforafilesystem,withvariouschoices
• Whatifwegiveeachfilethesameamountofspace? – Internal fragmentation … just like memory
• Whatifweallocatejustasmuchasfileneeds?
– External fragmentation, compaction … just like memory
• Perhapsweshouldallocatespacein“pages” – How many chunks (“pages”) can a file contain?
• Thefilecontroldatastructuredeterminesthis
– It only has room for so many pointers, then file is “full”
• Sohowdowewanttoorganizethespaceinafile?
CS 111 Summer 2022
Lecture 13 Page 48

Linked Extents • Asimpleanswer
• Filecontrolblockcontainsexactlyonepointer
– To the first chunk of the file
– Each chunk contains a pointer to the next chunk
– Allows us to add arbitrarily many chunks to each file
• Pointerscanbeinthechunksthemselves
– This takes away a little of every chunk
– To find chunk N, you have to read the first N-1 chunks
• Orpointerscanbeinauxiliary“chunklinkage”table – Faster searches, especially if table kept in memory
CS 111 Summer 2022
Lecture 13 Page 49

block 0512 block 1512
block 2512
boot block
The DOS File System
Cluster size and FAT length are specified in the BPB
Data clusters begin immediately after the end of the FAT
Root directory begins in the first data cluster
File Allocation Table (FAT)
CS 111 Summer 2022
Lecture 13 Page 50
BIOS parameter block (BPB)
cluster #1 (root directory)
cluster #2 …

DOS File Sy

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com