Recall: Magnetic Disks
• Cylinders: all the tracks under the head at a given point on all surfaces
• Reading/writing data is a three-stage process:
– Seek time: position the head/arm over the proper track
– Rotational latency: wait for desired sector to rotate under r/w head
– Transfer time: transfer a block of bits (sector) under r/w head
Disk Latency = Queueing Time + Controller time + Seek Time + Rotation Time + Xfer Time
Joseph & Kubiatowicz CS162 © UCB Spring 2022
[Figure: request path — Software Queue (Device Driver) → Hardware Controller → Media Time (Seek+Rot+Xfer)]
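The latency equation above is just a sum of the stage times. A minimal sketch with hypothetical but typical magnitudes (all numbers are illustrative, not measurements from the slide):

```python
# Sketch: disk access latency as the sum of its components.
# All input values below are illustrative examples, not measured data.
def disk_latency(queueing, controller, seek, rotation, transfer):
    """Disk Latency = Queueing + Controller + Seek + Rotation + Transfer (seconds)."""
    return queueing + controller + seek + rotation + transfer

# e.g., empty queue, 0.1 ms controller, 5 ms seek, 4 ms rotation,
# and a 64 KB transfer at an assumed 128 MB/s media rate (0.5 ms)
xfer = 64e3 / 128e6                   # seconds to move 64 KB at 128 MB/s
total = disk_latency(0.0, 0.1e-3, 5e-3, 4e-3, xfer)
print(round(total * 1e3, 2))          # 9.6 (ms)
```

Note how seek and rotation dominate for small transfers; this is the motivation for the scheduling techniques later in the lecture.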
Recall: Typical Numbers for Magnetic Disk
• Space/Density
– Space: 18TB (Seagate), 9 platters, in 3.5 inch form factor!
– Areal Density: ≥ 1 Terabit/square inch! (PMR, Helium, …)
• Average Seek Time
– Typically 4-6 milliseconds
• Average Rotational Latency
– Most laptop/desktop disks rotate at 3600-7200 RPM (16-8 ms/rotation). Server disks up to 15,000 RPM. Average latency is halfway around the disk, so 4-8 milliseconds
• Controller Time
– Depends on controller hardware
• Transfer Time
– Typically 50 to 250 MB/s. Depends on:
» Transfer size (usually a sector): 512B – 1KB per sector
» Rotation speed: 3600 RPM to 15000 RPM
» Recording density: bits per inch on a track
» Diameter: ranges from 1 in to 5.25 in
• Used to drop by a factor of two every 1.5 years (or faster), now slowing down
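The rotational-latency row follows directly from the spindle speed; a quick sketch:

```python
# Sketch: average rotational latency from spindle speed (RPM),
# matching the 3600-15000 RPM range quoted above.
def avg_rotational_latency_ms(rpm):
    ms_per_rotation = 60_000 / rpm    # 60,000 ms per minute / rotations per minute
    return ms_per_rotation / 2        # on average, wait half a rotation

for rpm in (3600, 7200, 15000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 3600 -> 8.33 ms, 7200 -> 4.17 ms, 15000 -> 2.0 ms
```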
4/5/2022 Joseph & Kubiatowicz CS162 © UCB Spring 2022 Lec 19.2
Recall: FLASH Memory
Samsung 2015: 512GB, NAND Flash
• Like a normal transistor but:
– Has a floating gate that can hold charge
– To write: raise or lower wordline high enough to cause charges to tunnel
– To read: turn on wordline as if normal transistor
» presence of charge changes threshold and thus measured current
• Two varieties:
– NAND: denser, must be read and written in blocks
– NOR: much less dense, fast to read and write
• V-NAND: 3D stacking (Samsung claims 1TB possible in 1 chip)
Recall: SSD Summary
• Pros (vs. hard disk drives):
– Low latency, high throughput (eliminate seek/rotational delay)
– No moving parts:
» Very light weight, low power, silent, very shock insensitive
– Read at memory speeds (limited by controller and I/O bus)
• Cons
– Small storage (0.1-0.5x disk), expensive (3-20x disk) (no longer true!)
» Hybrid alternative: combine small SSD with large HDD
– Asymmetric block write performance: read pg/erase/write pg
» Controller garbage collection (GC) algorithms have major effect on performance
– Limited drive lifetime
» 1-10K writes/page for MLC NAND
» Avg time to failure is 6 years, life expectancy is 9–11 years
• These are changing rapidly!
Nano-Tube Memory (NANTERO)
• Yet another possibility: Nanotube memory
– NanoTubes between two electrodes, slight conductivity difference between ones and zeros
– No wearout!
• Better than DRAM?
– Speed of DRAM, no wearout, non-volatile!
– Nantero promises 512 Gb/die for 8 Tb/chip! (with 16-die stacking)
Ways of Measuring Performance: Times (s) and Rates (op/s)
• Latency – time to complete a task
– Measured in units of time (s, ms, us, …, hours, years)
• Response Time – time to initiate an operation and get its response
– Able to issue one that depends on the result
– Know that it is done (anti-dependence, resource usage)
• Throughput or Bandwidth – rate at which tasks are performed
– Measured in units of things per unit time (ops/s, GFLOP/s)
• Start up or “Overhead” – time to initiate an operation
• Most I/O operations are roughly linear in b bytes
– Latency(b) = Overhead + b/TransferCapacity
• Performance???
– Operation time (4 mins to run a mile…)
– Rate (mph, mpg, …)
Example: Overhead in Fast Network
[Figure: effective bandwidth vs. transfer length (x)]
Example: 10 ms Startup Cost (e.g., Disk)
[Figure: effective bandwidth vs. transfer length (x); half-power point at x = 1,250,000 bytes!]
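The half-power point falls out of the linear latency model: with startup cost S and peak bandwidth B, effective bandwidth EffBW(n) = n / (S + n/B) reaches B/2 exactly when n = S×B. A sketch, assuming a peak bandwidth of 125 MB/s (an assumption chosen so a 10 ms startup cost reproduces the 1,250,000-byte figure):

```python
# Sketch: effective bandwidth for an n-byte transfer with startup cost S (s)
# and peak bandwidth B (bytes/s): EffBW(n) = n / (S + n/B) = B / (1 + S*B/n).
def eff_bw(n, S, B):
    return n / (S + n / B)

S = 0.010            # 10 ms startup cost (e.g., disk seek + rotation)
B = 125_000_000      # assumed peak transfer rate: 125 MB/s

# Half-power point: EffBW(n) = B/2 exactly when n = S*B.
n_half = S * B
print(int(n_half))                    # 1250000 bytes
print(eff_bw(n_half, S, B) / B)       # 0.5 (half of peak bandwidth)
```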
What Determines Peak BW for I/O?
• Bus Speed
– PCI-X: 1064 MB/s = 133 MHz x 64 bit
– ULTRA WIDE SCSI: 40 MB/s
– Serial Attached SCSI & Serial ATA & IEEE 1394 (firewire): 1.6 Gb/s full duplex (200 MB/s)
– USB 3.0 – 5 Gb/s
– Thunderbolt 3 – 40 Gb/s
• Device Transfer Bandwidth
– Rotational speed of disk
– Write / Read rate of NAND flash
– Signaling rate of network link
• Whatever is the bottleneck in the path…
Sequential Server Performance
Single Pipelined Server
[Figure: a stream of logical operations (L), each divided over distinct pipelined resources]
Example Systems “Pipelines”
[Figure: I/O processing path — User Process → Communication → File System → Upper Driver → Lower Driver, with queues between stages]
• Anything with queues between operational processes behaves roughly “pipeline like”
• Important difference is that “initiations” are decoupled from processing
– May have to queue up a burst of operations
– Not synchronous and deterministic like in 61C
Multiple Servers
Example Systems “Parallelism”
[Figure: multiple User Processes issuing I/O in parallel through Communication, File System, and Upper/Lower Drivers]
• Parallel Computation, Databases, …
I/O Performance
[Figure: User Thread → Queue [OS Paths] → Controller → I/O device; Response Time = Queue + I/O device service time. Plot: Response Time (ms) rising steeply as Throughput (Utilization, % total BW) approaches 100%]
• Performance of I/O subsystem
– Metrics: Response Time, Throughput
– Effective BW per op = transfer size / response time
» EffBW(n) = n / (S + n/B) = B / (1 + SB/n), with S the fixed overhead (time per op) and B the peak bandwidth
– Contributing factors to latency:
» Software paths (can be loosely modeled by a queue)
» Hardware controller
» I/O device service time
• Queuing behavior:
– Can lead to big increases of latency as utilization increases
– Solutions?
A Simple Deterministic World
[Figure: arrivals → queue (wait TQ) → Server (service TS) → departures; requests evenly spaced in time]
• Assume requests arrive at regular intervals, take a fixed time
to process, with plenty of time between …
• Service rate (μ = 1/TS) – operations per second
• Arrival rate: (λ = 1/TA) – requests per second
• Utilization: U = λ/μ , where λ < μ
• Average rate is the complete story
An Ideal Linear World
[Figure: two plots vs. Offered Load (TS/TA). Delivered Throughput rises linearly, then saturates at load 1. Queue delay is zero while the queue is empty; at saturation it grows unbounded at a rate ~ (TS/TA) till the request rate subsides]
• What does the queue wait time look like?
A Bursty World
[Figure: bursty Arrivals → queue (depth Q, wait TQ) → Server (TS) → departures]
• Requests arrive in a burst, must queue up till served
• Same average arrival time, but almost all of the requests experience large queue delays
• Even though average utilization is low
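The effect shows up in even a tiny FIFO-server simulation: the same number of requests with the same service time, but delivered as a burst instead of evenly spaced (numbers are illustrative, not from the slide):

```python
# Sketch: queueing delay at one FIFO server with fixed 1 s service time.
# Same average arrival rate either way; only the spacing differs.
def queue_delays(arrivals, service=1.0):
    delays, free_at = [], 0.0
    for t in sorted(arrivals):
        start = max(t, free_at)       # wait if the server is still busy
        delays.append(start - t)
        free_at = start + service
    return delays

even  = [i * 4.0 for i in range(8)]   # one request every 4 s: nobody waits
burst = [0.0] * 8                     # all 8 arrive at once: most wait long
print(sum(queue_delays(even)) / 8)    # 0.0
print(sum(queue_delays(burst)) / 8)   # 3.5 — big delays despite 25% utilization
```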
So how do we model the burstiness of arrival?
• Elegant mathematical framework if you start with exponential distribution
– Probability density function of a continuous random variable with a mean of 1/λ
– f(x) = λe^(−λx), for x ≥ 0
– “Memoryless”
» Likelihood of an event occurring is independent of how long we’ve been waiting
[Figure: exponential density with mean arrival interval 1/λ; lots of short arrival intervals (i.e., high instantaneous rate), few long gaps (i.e., low instantaneous rate)]
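A sketch of the exponential interarrival model, numerically checking the 1/λ mean and the memoryless property:

```python
import random

# Sketch: sample exponential interarrival times with rate lam.
# The sample mean should be close to 1/lam, and the distribution
# should be memoryless: P(X > s+t | X > s) ~ P(X > t).
random.seed(1)
lam = 2.0
xs = [random.expovariate(lam) for _ in range(200_000)]

mean = sum(xs) / len(xs)                            # ~ 1/lam = 0.5
p_gt = lambda t: sum(x > t for x in xs) / len(xs)   # empirical P(X > t)

s, t = 0.3, 0.4
p_cond = sum(x > s + t for x in xs) / sum(x > s for x in xs)
print(round(mean, 2))                 # ~0.5
print(round(p_cond - p_gt(t), 3))     # ~0: waiting tells you nothing
```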
Background: General Use of Random Distributions
[Figure: distribution of service times; memoryless]
• Server spends variable time (T) with customers
– Mean (Average): m = Σ p(T)×T
– Variance (stddev²): σ² = Σ p(T)×(T−m)² = Σ p(T)×T² − m²
– Squared coefficient of variance: C = σ²/m² — an aggregate description of the distribution
• Important values of C:
– No variance or deterministic ⇒ C = 0
– “Memoryless” or exponential ⇒ C = 1
» Past tells nothing about future
» Poisson process – purely or completely random process
» Many complex systems (or aggregates) are well described as memoryless
– Disk response times: C ≈ 1.5 (majority of seeks < average)
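The values of C quoted above can be checked on sampled data; a sketch:

```python
import random

# Sketch: squared coefficient of variance C = sigma^2 / m^2 of a sample,
# checking the reference values: deterministic -> C = 0, exponential -> C ~ 1.
def squared_cov(xs):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return var / (m * m)

random.seed(2)
c_det = squared_cov([20.0] * 1000)                                     # 0.0
c_exp = squared_cov([random.expovariate(5.0) for _ in range(200_000)]) # ~1.0
print(c_det, round(c_exp, 2))
```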
Introduction to Queuing Theory
[Figure: disk queuing system — arrivals → queue → controller → disk → departures]
• What about queuing time??
– Let’s apply some queuing theory
– Queuing theory applies to long-term, steady-state behavior ⇒ Arrival rate = Departure rate
• Arrivals characterized by some probabilistic distribution
• Departures characterized by some probabilistic distribution
Little’s Law
[Figure: arrivals (rate λ) → system holding N jobs → departures; L = time each job spends in the system]
• In any stable system
– Average arrival rate = Average departure rate
• The average number of jobs/tasks in the system (N) is equal to the arrival rate (λ) times the response time (L)
– N (jobs) = λ (jobs/s) × L (s)
• Holds regardless of structure, bursts of requests, or variation in service
– Instantaneous variations, but it washes out in the average
– Overall, arrivals match departures
[Figure: timeline 0–16 with jobs entering and leaving; on average N = 5 jobs in the system]
A: N = λ × L
• E.g., N = λ × L = 5
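Little's Law can also be checked on a toy single-server FIFO queue (illustrative numbers, not from the slide):

```python
# Sketch: verify N = lambda * L on a FIFO server with fixed service time.
def littles_check(arrivals, service):
    finish = 0.0
    total_time_in_system = 0.0
    for t in arrivals:
        start = max(t, finish)        # wait if the server is busy
        finish = start + service
        total_time_in_system += finish - t
    T = finish                        # observe until the last departure
    lam = len(arrivals) / T           # observed arrival (= departure) rate
    L = total_time_in_system / len(arrivals)   # mean time in system
    N = total_time_in_system / T      # time-average number of jobs in system
    return N, lam * L

N, lam_L = littles_check([i * 2.0 for i in range(100)], service=1.0)
print(N == lam_L or abs(N - lam_L) < 1e-12)   # True: N = lambda * L
```

The time-average occupancy equals total job-seconds divided by the window, which is exactly the area argument used in the proof sketch below.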
Little’s Theorem: Proof Sketch
[Figure: arrivals (rate λ) → system with N(t) jobs → departures; each job i drawn as a unit-height bar of length L(i) on a timeline of total length T]
L(i) = response time of job i; N(t) = number of jobs in system at time t
• What is the system occupancy, i.e., the average number of jobs in the system?
• Job i contributes area S(i) = L(i) × 1 = L(i)
• Total area S = S(1) + S(2) + ... + S(k) = L(1) + L(2) + ... + L(k)
• Average occupancy: Navg = S/T = (L(1) + ... + L(k))/T
• With Ntotal = k jobs over the interval: Navg = (Ntotal/T) × (L(1) + ... + L(k))/Ntotal = λavg × Lavg
• Hence Navg = λavg × Lavg
Little’s Law Applied to a Queue
• When Little’s Law is applied to just the queue, we get:
– Average length of the queue = Average arrival rate × Average time “waiting” in queue
A Little Queuing Theory: Computing TQ
[Figure: Arrival Rate λ → queue → server with Service Rate μ = 1/Tser; plot of Response Time (ms) climbing steeply toward 100% Throughput (Utilization, % total BW)]
Why does response/queueing delay grow unboundedly even though the utilization is < 1?
• Assumptions:
– System in equilibrium; no limit to the queue
– Time between successive arrivals is random and memoryless
• Parameters that describe our system:
– λ: mean number of arriving customers/second
– Tser: mean time to service a customer (“m1”)
– C: squared coefficient of variance = σ²/m1²
– μ: service rate = 1/Tser
– u: server utilization (0 ≤ u ≤ 1): u = λ/μ = λ × Tser
• Results:
– Memoryless service distribution (C = 1): (an “M/M/1 queue”):
» Tq = Tser × u/(1 – u)
– General service distribution, 1 server (an “M/G/1 queue”):
» Tq = Tser × ½(1 + C) × u/(1 – u)
System Performance in Presence of a Queue
[Figure: operation time vs. offered load]
• “Half-Power Point”: load at which the system delivers half of peak performance
– Design and provision systems to operate roughly in this regime
– Latency low and predictable, utilization good: ~50%
Why unbounded response time?
• Assume deterministic arrival process and service time
– Possible to sustain utilization = 1 with bounded response time!
[Figure: timeline where each service time exactly fills the gap until the next arrival]
Why unbounded response time?
• Assume stochastic arrival process (and service time)
– No longer possible to achieve utilization = 1
[Figure: with stochastic arrivals the server sometimes idles between requests; this wasted time can never be reclaimed, so we cannot achieve u = 1! Plot: Response Time (ms) vs. Throughput (Utilization, % total BW)]
A Little Queuing Theory: An Example
• Example Usage Statistics:
– User requests 10 x 8KB disk I/Os per second
– Requests & service exponentially distributed (C = 1.0)
– Avg. service = 20 ms (from controller + seek + rot + transfer)
• Questions:
– How utilized is the disk?
» Ans: server utilization, u = λ × Tser
– What is the average time spent in the queue?
– What is the number of requests in the queue?
– What is the avg response time for a disk request?
» Ans: Tsys = Tq + Tser
• Computation:
λ (avg # arriving customers/s) = 10/s
Tser (avg time to service customer) = 20 ms (0.02 s)
u (server utilization) = λ × Tser = 10/s × 0.02 s = 0.2
Tq (avg time/customer in queue) = Tser × u/(1 – u) = 20 × 0.2/(1 – 0.2) = 20 × 0.25 = 5 ms (0.005 s)
Lq (avg length of queue) = λ × Tq = 10/s × 0.005 s = 0.05
Tsys (avg time/customer in system) = Tq + Tser = 25 ms
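The same numbers drop out of a few lines of code, a sketch of the M/G/1 formula above (which reduces to M/M/1 at C = 1):

```python
# Sketch: M/G/1 queueing formula applied to the disk example above:
# lambda = 10 req/s, Tser = 20 ms, C = 1 (exponential service).
def tq_mg1(tser, u, C):
    # Tq = Tser * (1+C)/2 * u/(1-u); with C = 1 this is the M/M/1 result.
    return tser * 0.5 * (1 + C) * u / (1 - u)

lam, tser, C = 10.0, 0.020, 1.0
u = lam * tser               # server utilization = 0.2
tq = tq_mg1(tser, u, C)      # 0.005 s = 5 ms in queue
lq = lam * tq                # Little's Law on the queue: 0.05 requests
tsys = tq + tser             # 0.025 s = 25 ms response time
print(u, tq, lq, tsys)
```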
Queuing Theory Resources
• Resources page contains Queueing Theory Resources (under Readings):
– Scanned pages from Patterson and Hennessy book that gives further discussion and simple proof for general equation: https://cs162.eecs.berkeley.edu/static/readings/patterson_queue.pdf
– A complete website full of resources:
http://web2.uwindsor.ca/math/hlynka/qonline.html
• Some previous midterms with queueing theory questions
• Assume that Queueing Theory is fair game for Midterm III!
Optimize I/O Performance
[Figure: User Thread → Queue [OS Paths] → Controller → I/O device; Response Time = Queue + I/O device service time. Plot: Response Time (ms) vs. Throughput (Utilization, % total BW)]
• How to improve performance?
– Make everything faster ☺
– More decoupled (parallel) systems
» multiple independent buses or controllers
– Optimize the bottleneck to increase service rate
» Use the queue to optimize the service
– Do other useful work while waiting
• Queues absorb bursts and smooth the flow
• Admissions control (finite queues)
– Limits delays, but may introduce unfairness and livelock
When is Disk Performance Highest?
• When there are big sequential reads, or
• When there is so much work to do that requests can be piggybacked (via reordering queues, covered in a moment)
• OK to be inefficient when things are mostly idle
• Bursts are both a threat and an opportunity
• Other techniques:
– Reduce overhead through user level drivers
– Reduce the impact of I/O delays by doing other useful work in the meantime
Disk Scheduling (1/3)
• Disk can do only one request at a time; what order do you choose to do queued requests?
[Figure: queue of user requests 2,3 2,1 3,10 7,2 5,2 2,2 headed to the disk]
• FIFO Order
– Fair among requesters, but order of arrival may be to random spots on the disk ⇒ very long seeks
• SSTF: Shortest seek time first
– Pick the request that’s closest on the disk
– Although called SSTF, today must include rotational delay in calculation, since rotation can be as long as seek
– Con: SSTF good at reducing seeks, but may lead to starvation
Disk Scheduling (2/3)
• Disk can do only one request at a time; what order do you choose to do queued requests?
[Figure: queue of user requests 2,3 2,1 3,10 7,2 5,2 2,2 headed to the disk]
• SCAN: Implements an Elevator Algorithm: take the closest request in the direction of travel
– No starvation, but retains flavor of SSTF
Disk Scheduling (3/3)
• Disk can do only one request at a time; what order do you choose to do queued requests?
[Figure: queue of user requests headed to the disk]
• C-SCAN: Circular-Scan: only goes in one direction
– Skips any requests on the way back
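A toy sketch of SSTF and SCAN over track numbers only (a real scheduler would also weigh rotational position, as noted above; the head position and request list are illustrative):

```python
# Sketch: toy disk-arm schedulers over a queue of track numbers.
def sstf(head, requests):
    """Shortest seek time first: repeatedly pick the nearest pending track."""
    order, pending = [], list(requests)
    while pending:
        nxt = min(pending, key=lambda t: abs(t - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

def scan(head, requests, direction=+1):
    """Elevator algorithm: sweep one way serving requests, then reverse."""
    up   = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down if direction > 0 else down + up

q = [4, 6, 8, 1]                 # pending track numbers, head at track 5
print(sstf(5, q))                # [4, 6, 8, 1] — greedy, can zigzag
print(scan(5, q))                # [6, 8, 4, 1] — one sweep up, then back
```

Note how SSTF immediately jumps to the closest track regardless of direction, while SCAN finishes its sweep first, which is what prevents starvation.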