
PERSISTENCE: I/O DEVICES
Andrea Arpaci-Dusseau CS 537, Fall 2019

ADMINISTRIVIA
Grades:
Project 3 (email TAs if problems)
Project 5 available now (xv6 Memory)
– Due Monday 11/4 (5 pm)
– Many lab hours through then
– Turn in any of 3 versions:
– v1a (alloc alternating pages, all marked as UNKNOWN PID)
– v1b (alternating pages, some marked UNKNOWN, most known PIDs)
– v2 (contiguous allocations when possible, some marked UNKNOWN, most known PIDs)
Midterm 2: Wed 11/6 from 7:30-9:30pm
– Practice exams available
– Room details on Canvas
– Mostly Concurrency
+ Some Virtualization (usually repeated from Midterm1)

AGENDA / LEARNING OUTCOMES
How does the OS interact with I/O devices? How can we optimize this?
What are the components of a hard disk drive?
How to calculate sequential and random throughput of a disk?
What algorithms are used to schedule I/O requests?

OPERATING SYSTEMS: THREE EASY PIECES
Three conceptual pieces:
1. Virtualization
2. Concurrency
3. Persistence

VIRTUALIZATION
Make each application believe it has each resource to itself: CPU and memory
Abstraction: process API, address spaces
Mechanism: limited direct execution, CPU scheduling, address translation (segmentation, paging, TLB)
Policy: MLFQ, LRU, etc.

CONCURRENCY
Events occur simultaneously and may interact with one another
Need to:
– Hide concurrency from independent processes
– Manage concurrency with interacting processes
– Provide abstractions (locks, semaphores, condition variables, etc.)
Correctness: mutual exclusion, ordering
Difficult to write multi-threaded applications!

OPERATING SYSTEMS: THREE EASY PIECES
Three conceptual pieces:
1. Virtualization
2. Concurrency
3. Persistence

Persistence
How to ensure data is available across reboots
– even after power outages, hardware failure, system crashes?
Topics:
– Persistent storage devices (HDDs, RAID, SSDs)
– File API for processes
– FS implementation (meta-data structures, allocation policies)
– Crash recovery (journaling)
– Advanced topics: distributed systems?

Motivation: Need Input + Output
What good is a computer without any I/O devices? keyboard, display, disks
what if no input?
what if no output?
what if no state recorded between different computations?
We want:
– H/W that will let us plug in different I/O devices
– OS that can interact with different combinations

Hardware Support for I/O
Hierarchical buses

Canonical Device
Device registers (the OS reads/writes to these):
STATUS, COMMAND, DATA
Hidden internals:
Microcontroller (CPU+RAM), extra RAM, other special-purpose chips

Example Write Protocol: Starting Point
Device registers: STATUS, COMMAND, DATA
(hidden internals: microcontroller, extra RAM, other special-purpose chips)

while (STATUS == BUSY)            // 1: wait until device is not busy
    ; // spin
Write data to DATA register       // 2
Write command to COMMAND register // 3: starts the I/O
while (STATUS == BUSY)            // 4: wait until device is done
    ; // spin

Starts I/O
[Timeline: the CPU spins through steps 1 and 4 on behalf of process A while the disk works on A's request; other runnable processes make no progress.]

while (STATUS == BUSY)            // 1
    ; // spin
Write data to DATA register       // 2
Write command to COMMAND register // 3
while (STATUS == BUSY)            // 4
    ; // spin

How to avoid spinning? Interrupts!
[Timeline: with polling, the CPU spins through steps 1 and 4 while other runnable processes (B, C) wait; the CPU could run them instead.]


Example Write Protocol: Interrupts
[Timeline: the CPU runs process B during steps 1 and 4 while A blocks; the disk interrupts the CPU when A's request completes.]

while (STATUS == BUSY)            // 1
    wait for interrupt;
Write data to DATA register       // 2
Write command to COMMAND register // 3
while (STATUS == BUSY)            // 4
    wait for interrupt;

Interrupts vs. Polling
Are interrupts always better than polling?
Fast device: better to spin than pay interrupt overhead
– Device time unknown? Hybrid approach (spin for a while, then use interrupts)
Flood of interrupts may arrive
– Can lead to livelock (all CPU time spent handling interrupts)
– Better to temporarily ignore interrupts and make some progress handling the ones already received
Other improvement:
– Interrupt coalescing (batch several interrupts together)

Protocol Variants
Given the device registers (STATUS, COMMAND, DATA), the protocol can vary along three axes:
– Status checks: polling vs. interrupts
– PIO vs. DMA
– Special instructions vs. memory-mapped I/O

DATA TRANSFER COSTS with PIO
[Timeline: with PIO, the CPU itself copies every byte in step 2, so it stays busy with A's transfer even though process B is runnable.]

while (STATUS == BUSY)            // 1
    wait for interrupt;
Write data to DATA register       // 2
Write command to COMMAND register // 3
while (STATUS == BUSY)            // 4
    wait for interrupt;

Programmed I/O vs. Direct Memory Access
PIO (Programmed I/O):
– CPU directly tells device what the data is
DMA (Direct Memory Access):
– CPU leaves data in memory
– Device reads data directly from memory

Data Transfer with DMA
[Timeline: with DMA, the CPU only initiates the transfer; process B runs while the DMA engine copies A's data between memory and the device.]

while (STATUS == BUSY)            // 1
    wait for interrupt;
Write data to DATA register       // 2 (performed by the DMA engine)
Write command to COMMAND register // 3
while (STATUS == BUSY)            // 4
    wait for interrupt;


Protocol Variants
Given the device registers (STATUS, COMMAND, DATA), the protocol can vary along three axes:
– Status checks: polling vs. interrupts
– PIO vs. DMA
– Special instructions vs. memory-mapped I/O

How does the OS actually read/write the device registers (STATUS, COMMAND, DATA)?

while (STATUS == BUSY)            // 1
    ; // spin
Write data to DATA register       // 2
Write command to COMMAND register // 3
while (STATUS == BUSY)            // 4
    ; // spin

Special Instructions vs. Mem-Mapped I/O
Special instructions
– each device has a port
– in/out instructions (x86) communicate with device
Memory-Mapped I/O
– H/W maps registers into address space
– loads/stores sent to device
Doesn’t matter much (both are used)

Protocol Variants
Given the device registers (STATUS, COMMAND, DATA), the protocol can vary along three axes:
– Status checks: polling vs. interrupts
– PIO vs. DMA
– Special instructions vs. memory-mapped I/O

Variety is a Challenge
Problem:
– many, many devices
– each has its own protocol
How can we avoid writing a slightly different OS for each H/W combination?
Write a device driver for each device
Drivers are 70% of Linux source code

DEVICE DRIVERS
Storage stack (top to bottom):
application
file system
scheduler
driver
hard drive
Build a common interface on top of all HDDs

HARD DISKS

HARD DISK INTERFACE
Mechanical (slow) nature of HDDs makes management “interesting”
Disk has a sector-addressable address space
Appears as an array of sectors
Sectors are typically 512 bytes
Main operations:
reads + writes to sectors

Disk components
[Figure: a platter attached to a spindle, with a surface on each side]

RPM?
Many platters may be bound to the spindle
A motor connected to the spindle spins the platters
Rate of rotation: RPM
10,000 RPM → a single rotation takes 6 ms

Surface is divided into rings: tracks
Stack of tracks (across surfaces): cylinder

Tracks are divided into numbered sectors
The OS views the disk as a linear array of sectors
[Figure: sectors 0–23 numbered around the tracks of one surface]
The actual mapping varies and uses knowledge of disk details; the OS doesn't need to know the mapping

Heads on a moving arm can read from each surface
[Figure: arm and heads positioned over the numbered tracks]

Disk Terminology
[Figure labels: spindle, platter, surface, track, sector, read/write head, cylinder (stack of tracks across all surfaces)]

Time to Read/write
Three components:
Time = seek + rotation + transfer time

READING DATA FROM DISK
Example: Read sector 1
1) Seek

READING DATA FROM DISK
Example: Read sector 1
2) Rotational delay
3) Transfer time

Seek, Rotate, Transfer
Calculate average seek cost for random I/O
Seek cost: a function of cylinder distance
– Not a purely linear cost
– (but we can use a linear model for calculations in this course)
Must accelerate, coast, decelerate, settle
– Settling alone can take 0.5 – 2 ms
An entire seek often takes several milliseconds
– 4 – 10 ms
Approximate average seek distance?
= 1/3 of the max seek distance (derivation in textbook)

Seek, Rotate, Transfer
Calculate average rotation cost for random I/O
Depends on rotations per minute (RPM)
– 7200 RPM is common, 15000 RPM is high end
With 7200 RPM, how long is one full rotation?
1 / 7200 RPM = 1 minute / 7200 rotations = 1 second / 120 rotations = 8.3 ms / rotation
Average rotation distance? ½ rotation
Average rotation time? 8.3 ms / 2 = 4.15 ms

Seek, Rotate, Transfer
Calculate average transfer cost for random I/O (transfer cost is the same as for sequential I/O)
Pretty fast — depends on RPM and sector density
100+ MB/s is typical for the maximum transfer rate
How long to transfer 512 bytes?
512 bytes × (1 s / 100 MB) ≈ 5 µs

Workload Performance
So…
– seeks are slow (ms)
– rotations are slow (ms)
– transfers are fast (µs)
What kind of workload is fastest for disks?
Sequential (access sectors in order) vs. random (access sectors in random order)?
Sequential: fast (no seek or rotation; transfer dominated)
Random: slow (seek + rotation dominated)

Disk Spec: Seq vs Random Throughput
              Cheetah     Barracuda
Capacity      300 GB      1 TB
RPM           15,000      7,200
Avg Seek      4 ms        9 ms
Max Transfer  125 MB/s    105 MB/s
Platters      4           4
Cache         16 MB       32 MB

Sequential workload: what is the throughput for each?
Cheetah: 125 MB/s    Barracuda: 105 MB/s

Disk Spec: Random Throughput
              Cheetah     Barracuda
Capacity      300 GB      1 TB
RPM           15,000      7,200
Avg Seek      4 ms        9 ms
Max Transfer  125 MB/s    105 MB/s
Platters      4           4
Cache         16 MB       32 MB

Random workload: what is the throughput for each? (what else do you need to know?)
What is the size of each random read? Assume 16-KB reads

Throughput for average random 16-KB read w/ Cheetah?
(Cheetah: 15,000 RPM, 4 ms avg seek, 125 MB/s max transfer)

Time = seek + rotation + transfer
Average seek? Seek = 4 ms
Average rotation in ms?
avg rotation = 1/2 × (1 min / 15,000 rotations) × (60 sec / 1 min) × (1000 ms / 1 sec) = 2 ms
Transfer of 16 KB?
transfer = 16 KB × (1 sec / 125 MB) × (1,000,000 µs / 1 sec) = 125 µs

Throughput for average random 16-KB read w/ Cheetah?
Time = seek + rotation + transfer
Cheetah time = 4 ms + 2 ms + 125 µs ≈ 6.1 ms
Random throughput (MB/s)?
throughput = (16 KB / 6.1 ms) × (1 MB / 1024 KB) × (1000 ms / 1 sec) ≈ 2.5 MB/s

Throughput for average random 16-KB read on Barracuda?
(Barracuda: 7,200 RPM, 9 ms avg seek, 105 MB/s max transfer)

Time = seek + rotation + transfer
Avg seek = 9 ms
avg rotation = 1/2 × (1 min / 7200 rotations) × (60 sec / 1 min) × (1000 ms / 1 sec) ≈ 4.1 ms
transfer = 16 KB × (1 sec / 105 MB) × (1,000,000 µs / 1 sec) ≈ 149 µs

Throughput for average random 16-KB read on Barracuda?
Barracuda time = 9 ms + 4.1 ms + 149 µs ≈ 13.2 ms
Throughput (MB/s) = (16 KB / 13.2 ms) × (1 MB / 1024 KB) × (1000 ms / 1 sec) ≈ 1.2 MB/s

              Sequential   Random
Cheetah       125 MB/s     2.5 MB/s
Barracuda     105 MB/s     1.2 MB/s

Disk Simulator
./example-rand.csh
./example-seq.csh

Other Improvements
– Track Skew
– Zones
– Cache

Problem
What if sequential request spans multiple tracks?
./example-skew.csh
./example-skew-fixed.csh

Track Skew
[Figure: sectors numbered around the tracks of one surface]
Imagine sequential reading (start with sector 8…)
How should sector numbers be laid out on disk?

Track Skew
When reading 16 after 15, the head can't settle quickly enough, so we must wait for a full rotation
[Figure: unskewed layout — sector 16 sits directly under sector 15 on the next track]

Track Skew
[Figure: skewed layout — each track's sector numbers are offset from the track above]

Track Skew
With skew, the head has enough time to settle before the next sequential sector arrives
[Figure: skewed layout]

Other Improvements
– Track Skew
– Zones
– Cache

Zones

ZBR (Zoned Bit Recording): more sectors on outer tracks
Goal: constant density across tracks

Disk Simulator: Zones
Performance characteristics of ZBR? Where do you want your data?
./example-zones-outer.csh
./example-zones-inner.csh

Other Improvements
– Track Skew
– Zones
– Cache

Drive Cache
Drives may cache both reads and writes
– The OS caches file data too (later lecture)
– Disks contain ~16 MB used as cache
What advantage does caching in the drive have for reads?
Spatial locality — read-ahead: “track buffer”
• Read the contents of the entire track into the track buffer during the rotational delay
What advantage does caching in the drive have for writes?
Write caching with volatile memory (i.e., not persistent)
• Immediate reporting: claim the write is on disk when it is not
• Data could be lost on power failure

Drive Cache: Buffering
Tagged command queueing (TCQ)
– Have multiple outstanding requests to the disk
– Disk can reorder (schedule) requests for better performance

I/O Schedulers

I/O Schedulers
Given stream of I/O requests, in what order should they be served?
Goal: Minimize seek + rotation time
Very different from CPU scheduling
The position of the disk head relative to the request position matters more than the length of the job

Impact of Disk Scheduling?
Assume seek + rotate = 10 ms for a random request
How long (roughly) does the workload below take?
– Requests are given as sector numbers
300001, 700001, 300002, 700002, 300003, 700003
FCFS: ~60 ms
Best possible: ~20 ms

Simulator
./example-sched-fifo.csh
What would be a better algorithm than FIFO?
./example-sched-sstf.csh
./example-rotate.csh
./example-rotate-sptf.csh
./example-rotate-question.csh

I/O Schedulers
Where should the I/O scheduler go?
[Figure: a scheduler in the OS, and another scheduler inside the disk]

SPTF (Shortest Positioning Time First)
Strategy: choose the request w/ the least positioning time (seek + rotation)
How to implement in the disk?
– Greedy algorithm (just looks for the best NEXT decision)
How to implement in the OS?
– Use Shortest Seek Time First (SSTF) instead
– Approximate by scheduling by sector number
Disadvantages?
Easy for far-away requests to starve

Starvation
./example-starve.csh

Avoid Starvation: SCAN
Elevator Algorithm (or SCAN)
– Sweep back and forth, from one end of the disk to the other, and back
– Serve requests as the head passes each cylinder, in each direction
– Sorts by cylinder number; ignores rotation delays
Disadvantage?
Middle cylinders are passed (and served) more often than cylinders near the edges
Better: C-SCAN (circular SCAN)
– Only sweep in one direction

Another approach: Bounded window
Must schedule all requests in one window before moving to the next
./example-starve-bsatf.csh
What is the impact of different window sizes?
./example-bsatf-w1.csh
./example-bsatf-wall.csh

Is SPTF best possible?
Compare the time for identical workloads…
./example-greedy-satf.csh
./example-optimal.csh
Not computationally feasible to determine the optimal schedule (even if all future requests are known)
Even worse: we don't know which requests will arrive next…

I/O Schedulers
Where should the I/O scheduler go?
[Figure: a scheduler in the OS, and another scheduler inside the disk]

What happens at OS Level?
Assume 2 processes A and B each call read() with C-SCAN
void reader(int fd) {
    char buf[1024];
    int rv;
    // note: read() takes the fd and a length: read(fd, buf, sizeof(buf))
    while ((rv = read(fd, buf, sizeof(buf))) != 0) {
        assert(rv > 0);
        process(buf, rv);   // takes a short time, e.g., < 1 ms