COMP 3430 Operating systems – Chapter 38, 42 reading notes
Winter 2022
About these reading notes
Chapter 38: Redundant arrays of inexpensive disks (RAID)
38.1 Interface and RAID internals
38.2 Fault model
38.3 How to evaluate a RAID
38.4 RAID level 0: striping
  Chunk sizes
  Back to RAID-0 analysis
  Evaluating RAID performance
  Back to RAID-0 analysis, again
38.5 RAID level 1: mirroring
  RAID-1 analysis
38.6 RAID level 4: saving space with parity
  RAID-4 analysis
38.7 RAID Level 5: rotating parity
  RAID-5 analysis
38.8 RAID comparison: a summary
38.9 Other interesting RAID issues
Chapter 42: Crash Consistency: FSCK and Journaling
42.1 A detailed example
  Crash scenarios
  The crash consistency problem
42.2 Solution #1: the file system checker
42.3 Solution #2: journaling (or write-ahead logging)
  Data journaling
  Recovery
  Batching log updates
  Making the log finite
  Metadata journaling
  Tricky case: block reuse
  Wrapping up journaling: a timeline
42.4 Solution #3: other approaches
42.5 Summary
About these reading notes
These are my own personal reading notes that I took (me, Franklin) as I read the textbook. I’m providing these to you as an additional resource for you to use while you’re reading chapters from the textbook. These notes do not stand alone — you might be able to get the idea of a chapter while reading these, but you’re definitely not going to get the chapter by reading these alone.
These notes are inconsistently all of the following:
• Me summarizing parts of the text.
• Me commenting on parts of the text.
• Me asking questions to myself about the text.
  – … and sometimes answering those questions.
The way that I would expect you to read or use these notes is to effectively permit me to be your inner monologue while you’re reading the textbook. As you’re reading chapters and sections within chapters, you can take a look at what I’ve written here to get an idea of how I’m thinking about this content.
Chapter 38: Redundant arrays of inexpensive disks (RAID)
Back down from abstractions and filesystems, let’s try to address “disks are too small/slow/fragile” by adding more disks to the situation.
RAID is still a good approach, even given facts like these:

1. We have commodity disks that are 16TB in size (cost is prohibitive, $700 compared to 2×8TB at around $500, but density is higher).
2. SSDs are fast, but tiny compared to rotational disks.
Put simply: RAID makes a bunch of physical hard drives look like one single drive from the perspective of the OS.
RAID can be implemented in hardware or software. When it’s implemented in hardware, the OS just sees a big bag of blocks, the same as it did when it was looking at an individual disk. Really importantly, when RAID is implemented in hardware, the OS has no idea that the data it’s writing is going to multiple physical disks.
RAID gives us three desirable properties, each with a trade-off:

1. Performance
2. Capacity
3. Reliability
Note: This is not in or about the textbook, but I want to be clear: RAID is not a backup strategy. Not in any way, shape, or form. Yes, you get desirable properties that do look like backing stuff up (e.g., redundancy means that you’re writing the same blocks to multiple physical disks) and making things more resilient, but RAID, in and of itself, is not a backup strategy. If you want an actual backup strategy, look at something like the 3-2-1 backup strategy.
38.1 Interface and RAID internals
An OS sends logical I/O requests to a RAID controller, and the RAID controller has to translate those into physical requests – it has to figure out which drive the requested data actually belongs to.
Aside: we don’t actually see what a RAID physically looks like here, just a hand-wavy description of what it looks like.
Depending on how the RAID is configured (because there are multiple configurations), the RAID hardware may have to perform multiple physical writes for every logical write (again, the OS says “please write this block” and the RAID controller then has to decide how and where to place that block, across multiple physical disks).
The authors describe a bit of what RAID looks like as a controller, and they’re basically describing a tiny computer (yo dawg).
38.2 Fault model
Let’s just keep things simple for now. We will use a fault model called “fail-stop”, which is binary: an entire drive either works or it doesn’t. We don’t care about disks where some of the blocks work and others don’t, and we don’t care about data corruption.
38.3 How to evaluate a RAID
We’re going to be looking at multiple types of RAID, and we need to be able to compare them to each other. We’ll compare the different kinds of RAID on 3 metrics:
1. Capacity: given 𝑛 disks each with 𝑏 blocks, how many total blocks do we get to use?
2. Reliability: how many disks can fail without affecting the data on the entire array? How many can fail and we can still get all the written data back?
3. Performance: how quickly can we read or write to the entire array of disks compared to just a single disk?
38.4 RAID level 0: striping
The authors immediately note that striping is not Redundant at all, though it is an array (again: RAID is never a backup strategy).
The basic idea here is that we distribute all blocks across all disks in the array in a round robin fashion. We can trivially calculate which block belongs on which disk using modular arithmetic.
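Just to make “modular arithmetic” concrete, here’s a minimal sketch in C of the mapping for a chunk size of one block. The function names are mine, not the book’s code or anything a real controller exposes:

```c
#include <stdio.h>

/* Sketch: map a logical block address onto a (disk, offset) pair for
 * a RAID-0 array with a chunk size of one block. Zero-based numbering
 * assumed; a real controller tracks much more state than this. */
void raid0_map(int addr, int num_disks, int *disk, int *offset)
{
    *disk   = addr % num_disks;  /* round-robin across the disks */
    *offset = addr / num_disks;  /* position on the chosen disk  */
}

int main(void)
{
    int disk, offset;
    raid0_map(9, 4, &disk, &offset);               /* logical block 9, 4 disks */
    printf("disk %d, offset %d\n", disk, offset);  /* prints "disk 1, offset 2" */
    return 0;
}
```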
Aside: The authors have switched entirely to using the term “blocks” to represent the atomic concept on a drive. This is slightly different from the idea of a “sector” that we saw before, but they are effectively synonymous.
They describe the writing of blocks as “round-robin” – Hey! It’s that idea again!
The name striping comes from the idea that the blocks in a row on a table are referred to as a “stripe” (e.g., blocks 0, 1, 2, 3 in figure 38.1 are a stripe, blocks 4, 5, 6, 7 are a stripe, blocks 8, 9, 10, 11 are a stripe, etc).
We’ll see this in a second, but performance here is high: both reads and writes are distributed across multiple disks.
Aside: A couple of weeks ago we talked about I/O scheduling algorithms having a fundamental difference from process scheduling: I/O (at least to a single disk) is explicitly not concurrent. However, now that we introduce this idea of RAID, I/O is concurrent. Do scheduling algorithms have to change? Does the OS care? Does the RAID controller have the responsibility of having a more complex I/O scheduler?
Chunk sizes
The chunk size represents how many sequential blocks in a stripe would be placed on the same disk.
The chunk size will directly influence across how many drives a single file will have its blocks scattered. A bigger chunk size might mean that a file in its entirety will be put onto the same drive. A smaller chunk size might mean that the file will be placed across many drives.
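A sketch of one way to generalize that mapping to bigger chunks, assuming zero-based block numbers (again, these names and this layout are my own illustration, not code from the book):

```c
#include <stdio.h>

/* Sketch: RAID-0 mapping with a chunk size of `chunk` blocks, so that
 * `chunk` consecutive blocks land on one disk before moving to the next.
 * One possible layout under the stated assumptions, not the only one. */
void raid0_map_chunked(int addr, int num_disks, int chunk,
                       int *disk, int *offset)
{
    int chunk_no = addr / chunk;              /* which chunk the block is in */
    *disk   = chunk_no % num_disks;           /* round-robin over chunks     */
    *offset = (chunk_no / num_disks) * chunk  /* start of that chunk on disk */
            + (addr % chunk);                 /* plus position inside it     */
}

int main(void)
{
    int disk, offset;
    raid0_map_chunked(9, 4, 2, &disk, &offset);    /* block 9, 4 disks, chunks of 2 */
    printf("disk %d, offset %d\n", disk, offset);  /* prints "disk 0, offset 3"     */
    return 0;
}
```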
We’re trying to balance the sequential read/write performance of a single drive against the parallelism of coalescing bits from many drives.
The actual answer to this is that you’d have to run experiments on chunk size for the workload that you’re trying to evaluate. If you have lots of really big files that you want to improve random access for, you’re going to want a different chunk size than if you wanted to improve sequential access.
Aside: Think about this later: is chunk size strictly a RAID-0 thing? Can RAID-1’s performance be af- fected by chunk size?
Back to RAID-0 Analysis
Capacity of RAID-0 – it’s as perfect as you can get: 𝑛 disks with 𝑏 blocks gives us 𝑛 × 𝑏 capacity as a single volume.
Reliability is perfect—ly terrible! If a disk fails, all the data that was on that disk is lost without recovery options. As far as the filesystem is concerned, that means that many files are corrupted, and (possibly) the whole filesystem is lost if the superblock and other information regions are on the disk that’s been lost.
Performance is great! We can distribute reads and writes across all disks however we please.
Evaluating RAID performance
Now we’re getting into a more granular method of assessing what “good” means on a RAID array. Basically, we’ve got two metrics:
1. single-request latency: how long it takes to service a single I/O request (Aside: is this for one block or one file? Almost certainly one block.)
2. steady-state throughput: how many bytes can maximally be moving through this RAID controller at any given time?
We measure these two metrics using different workloads:
1. Sequential: We’re going to request 𝑛 blocks where the address that the OS sees is in sequence (not necessarily the same as the physical location)
2. Random: We’re going to request 𝑛 blocks where the address is totally random.
We’re now going back to our disk chapter and reminding ourselves that disks have a transfer rate. The transfer rate for sequential workloads is higher than for random workloads.
Aside: Why is that true? Can you convince yourself, using your knowledge of the physical layout of a disk and how blocks are allocated, that sequential reads are faster on a rotational disk?
Back to RAID-0 analysis, again
Yeesh, with the asides and back pointers.
I guess it makes sense: we want a variable to refer to random throughput vs sequential throughput.
Performance here is, again, as fast as the disks in the array are. For a single request, we just forward the request to one disk, so this is going to be as fast as any single disk. For steady-state throughput with sequential or random workloads, performance is optimal: with a sequential workload (assuming the data is striped evenly across the array), we get the sequential transfer rate of one disk times the number of disks. The same is true for random access.
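In the book’s notation (S for a single disk’s throughput under a sequential workload, R under a random workload, and N disks), RAID-0 works out to:

```latex
\text{seq.\ throughput} = N \cdot S
\qquad
\text{rand.\ throughput} = N \cdot R
```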
38.5 RAID level 1: mirroring
Now instead of distributing data evenly across disks, we’re duplicating data across disks.
We’re trying to address the reliability problem with RAID-0. Now we can say that 1 disk failing is OK; we’re not going to lose any data.
Aside: The authors make the distinction between RAID-10 and RAID-01. RAID-10 is: we pair up disks and do mirrors, then stripe across the pairs. Basically, treat each pair of disks as a single disk and then stripe across those. RAID-01 is a stripe, then a mirror? Are we duplicating data on a single drive? No, we’re making two stripes then mirroring them. That is, disks 0 and 1 are a stripe, and disks 2 and 3 are a mirror of that stripe.
Read and write performance are different on RAID-1. Reading can be done from any of the disks with the mirrored data, but writing must be done to all of the disks in the mirror. Writes can be done in parallel (provided the disk isn’t busy?).
RAID-1 analysis
1. Capacity: RAID-1 basically halves our capacity (or, more accurately, divides it by 𝑛, where 𝑛 is the number of mirrors that we have).
2. Reliability: In theory, if mirroring level is 2, then half of the array could fail, provided that the disks failing are only one of each pair.
3. Performance: Reads are good, but not as good as RAID-0. Single requests are still as fast as a single disk for reads, but a tiny bit slower for writes because we have to account for all of the disks’ seek times when writing to the mirror. Throughput is different, though. Sequential read throughput is lower because the worst-case scenario is that the RAID controller does a bad job of distributing reads to the disks. Random-access read throughput is where RAID-1 performs best, because we can (again) distribute reads across all disks, even for mirrored blocks (see the numbers sketched below).
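If I’m remembering figure 38.8 right (worth double-checking against the book), the RAID-1 throughput numbers with a mirroring level of 2 come out to:

```latex
\text{seq.\ read} = \frac{N}{2} \cdot S,
\quad
\text{seq.\ write} = \frac{N}{2} \cdot S,
\quad
\text{rand.\ read} = N \cdot R,
\quad
\text{rand.\ write} = \frac{N}{2} \cdot R
```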
38.6 RAID level 4: saving space with parity
Giving up half of our disks just to mirror data is not an ideal solution. Let’s use some maths to help us out with data integrity.
In fact, let’s use really simple maths and a simple idea: xor all the bits in a stripe. The xor tells us “Is there an even or an odd number of 1 bits in this stripe?”. If one disk fails, we can really easily determine what that disk’s bit was. NEAT.
Even neater: we can use xor on the remaining bits and the parity bit to tell us what the lost bit was!
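A toy demo of both directions in C (one byte standing in for a whole block, a 4-data-disk stripe; this is entirely my own construction, not the book’s):

```c
#include <stdio.h>
#include <stdint.h>

/* Toy parity demo: xor the "blocks" in a stripe to get the parity
 * block, then rebuild a failed disk's block by xoring the survivors
 * with the parity. One byte per block keeps the demo tiny. */
int main(void)
{
    uint8_t block[4] = { 0x0F, 0x33, 0x55, 0xF0 };

    /* Compute parity across the whole stripe. */
    uint8_t parity = 0;
    for (int i = 0; i < 4; i++)
        parity ^= block[i];

    /* Pretend disk 2 failed: rebuild its block from the others + parity. */
    uint8_t rebuilt = parity;
    for (int i = 0; i < 4; i++)
        if (i != 2)
            rebuilt ^= block[i];

    printf("lost 0x%02X, rebuilt 0x%02X\n", block[2], rebuilt); /* both 0x55 */
    return 0;
}
```

The key property is that xor is its own inverse, so “surviving blocks ⊕ parity” spits the lost block right back out.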
RAID-4 analysis
1. Capacity: We lose 1 disk, not half of the disks. So that’s better.
2. Reliability: We can lose any one disk. But, uh, only one. Lose two and we’ve lost data. This is definitely worse than mirroring where we could (in theory) lose multiple disks, provided they were in different pairs.
3. Performance: We can do pretty well with reads, since we can distribute sequential steady-state reads across (𝑁 − 1) disks. Same with random reads. The issues that come up here are with writing, since each write needs to calculate or recalculate the parity bit. “Full-stripe writes” offer similar performance to a RAID-0 array, where each disk in the array gets one write, done in parallel. Random writes are the tricky case: if we’re just writing one block, then we need to read the other blocks in the stripe to calculate the parity bit. Once we’ve calculated the parity bit we do two writes in parallel: the target block and the new parity block. This is called “additive parity”.
When doing a single-block write, we can also read the old value of the target block and the current parity bit, then xor all three of them together to get the new parity bit (the book calls this “subtractive parity”).
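Written out, with ⊕ standing in for xor (my notation, not a figure from the book):

```latex
P = B_0 \oplus B_1 \oplus \dots \oplus B_{n-1}
\qquad \text{(additive)}

P_{\text{new}} = (B_{\text{old}} \oplus B_{\text{new}}) \oplus P_{\text{old}}
\qquad \text{(subtractive)}
```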
Calculating the parity bits causes a bottleneck on I/O to the parity drive.
Aside: Is it OK to lose the parity disk?
Aside: Are parallel writes atomic?
38.7 RAID Level 5: rotating parity
Putting all the parity bits on one disk leads to that disk being a bottleneck for the whole system. The idea here is that we put parity bits on all the disks, where each one takes a turn for each stripe.
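Here’s a sketch of one possible rotation (my own illustration, and only one of several layouts real controllers use), where the parity block slides one disk to the left every stripe:

```c
#include <stdio.h>

/* Sketch of a rotating-parity layout for RAID-5: the parity block
 * moves one disk to the left each stripe, so no single disk absorbs
 * every parity write the way the parity disk does in RAID-4. */
int parity_disk(int stripe, int num_disks)
{
    return (num_disks - 1) - (stripe % num_disks);
}

int main(void)
{
    for (int stripe = 0; stripe < 5; stripe++)  /* 5 disks, first 5 stripes */
        printf("stripe %d -> parity on disk %d\n",
               stripe, parity_disk(stripe, 5));
    return 0;  /* prints parity on disks 4, 3, 2, 1, 0 */
}
```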
RAID-5 analysis
1. Capacity: The same as RAID-4, but the parity bits that would have been put on 1 drive are just spread across all of them.
2. Reliability: Again, we can lose any disk and not lose any data, but we can only lose one.
3. Performance: This is where things get different. Sequential read and write throughput are the same as RAID-4. Random reads are a bit better (+1 disk) because we can read from all drives instead of having to avoid the parity disk. Random writes also improve, because the problem we had with RAID-4 (the parity drive being a bottleneck) is gone: all disks share the load of writing parity bits.
The summary here is: RAID-5 is RAID-4 + 1, with no losses, so RAID-5 usually just wins.
38.8 RAID comparison: a summary
The summary the authors produce omits the constant values (like seek time on drives) and otherwise provides a handy summary in terms of the number of drives. Figure 38.8 does a real nice job of summarizing the properties of each RAID level.
Basically: Here’s what you should pick when you want to favour different things.
38.9 Other interesting RAID issues
The authors write that there are RAID levels 2 and 3 in “the original taxonomy”, plus RAID-6 that permits multiple disk failures (how?). They don’t describe what the “original taxonomy” is, but thankfully Wikipedia’s page on RAID bails us out and lists all the RAID levels, including 2 and 3.
They go on to say that there are other types of failure modes, and discuss what might happen while the array is being rebuilt.
Also: software RAID. Yeah, that’s also a thing. Is software RAID an OS responsibility? (yep)
Chapter 42: Crash Consistency: FSCK and Journaling
The big question here is: How does a file system handle a crash? More importantly, how does it deal with data that is corrupted because it didn’t get fully written, or some other issue (random gamma rays) corrupted the data?
Remember that the disk only guarantees that the write of a single sector is atomic, and we’ve learned that the sector size (what the disk considers to be an atomic unit) and block size (what the file system considers to be an atomic unit) are different from one another.
It’s also important to note that changes to the file system itself aren’t atomic. Think about that for a second (and also maybe look back at chapter 40). What does a file system have to do when you write a new file to the disk? How many different data structures need to be updated? Can you write those
changes to disk in one single atomic operation? (Uh, no). If your OS crashes in the middle of updating the data structures for writing a file, the file system itself is “inconsistent”. For example, you might have updated inodes, but not updated the inode bitmap. Or alternatively, you updated the free space bitmap, but never assigned that block to an inode. Or alternatively, … you get the idea.
There are two approaches that this chapter will explore:
1. Deal with the problem after it’s happened (fsck), at the expense of possibly losing data, but keeping writes to the FS fast.
2. Deal with the problem before it’s happened (journaling), at the expense of slowing down writes, but making sure you can’t have half-written data.
42.1 A detailed example
The basic setup here is that we’ve got three blocks to write to a simple file system when appending to an existing file:
1. A write to the inode to update a direct pointer (1 block)
2. A write to replace the entire data block bitmap (1 block)
3. A write to write the data block itself (1 block)
Again, remember that block size != sector size, and only sector writes are atomic. If the system crashes at any point in between writing these blocks, the data structures may be left in an inconsistent state (corrupted).
The diagram here (pg 2) is an extremely simplified version of the FS diagrams that we saw in chapter 40, and the text is pretty much just stepping through the operations that are happening to add a new block to this file (update the data bit map, update the inode itself, write the data block).
The data block itself is just filled with whatever it is users put into files. “Stolen music perhaps?”, the authors quip.
This dates the book. Nobody steals music anymore. We steal TV shows. Also nobody saves those files to their disk, do they?
The last little bit about this (just before “Crash scenarios”) is talking a bit about virtual memory (paging). You don’t need to be too concerned about this right now, other than to know that calling write on a file doesn’t actually change anything on a disk immediately; the memory manager will handle this.
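A tiny C illustration of that point (“demo.txt” is a made-up file name for this sketch): write() only dirties in-memory buffers, and fsync() is what forces them to disk:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* write() just dirties buffers in memory; the data isn't durable on
 * disk until the OS writes it back, or we force it with fsync(). */
int main(void)
{
    int fd = open("demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "hello\n", 6);  /* buffered in memory; a crash here can lose it */
    fsync(fd);                /* force the dirty blocks (and metadata) to disk */
    close(fd);
    return 0;
}
```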
Crash scenarios
Three block writes (data, inode, data bitmap) give three crash scenarios where only one of those three blocks is written to disk successfully.