Recall: How does the Processor Talk to the Device?
[Figure: the processor reaches devices over the memory bus through bus adapters, alongside other devices. Each device controller has a bus interface, addressable registers (e.g., port 0x20) that may be port-mapped or memory-mapped (e.g., read/write control region 0x8f008020), optional memory and/or request queues, and the hardware controller itself.]
Joseph & Kubiatowicz CS162 © UCB Spring 2022
• CPU interacts with a Controller
– Contains a set of registers that can be read and written
– May contain memory for request queues, etc.
• Processor accesses registers in two ways:
– Port-Mapped I/O: in/out instructions
» Example from the Intel architecture: out 0x21,AL
– Memory-mapped I/O: load/store instructions
» Registers/memory appear in physical address space
» I/O accomplished with load and store instructions
[Figure labels: address + data lines on the bus reach both the controller and regular memory; the controller raises an Interrupt Request through the Interrupt Controller.]
Port-Mapped I/O in Pintos Speaker Driver
Pintos: devices/speaker.c
Pintos: threads/io.h
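The heart of the speaker driver is programming the PIT (timer chip) with a frequency divisor over I/O ports. A hedged sketch in C: the divisor math follows the standard PC convention (1,193,180 Hz input clock), and the commented `outb` sequence is illustrative, not a copy of the Pintos source.

```c
/* Sketch of the PIT divisor computation used by a PC speaker driver.
   The PIT is driven by a 1,193,180 Hz clock; programming channel 2
   with divisor d yields a square wave of PIT_HZ / d Hz. */
#include <stdint.h>

#define PIT_HZ 1193180

/* 16-bit divisor the driver would write to the PIT for FREQUENCY Hz. */
static uint16_t pit_divisor(uint32_t frequency) {
    return (uint16_t) (PIT_HZ / frequency);
}

/* On real hardware the driver would then do something like:
 *   outb(0x43, 0xB6);             // PIT command: channel 2, square wave
 *   outb(0x42, divisor & 0xFF);   // divisor low byte
 *   outb(0x42, divisor >> 8);     // divisor high byte
 *   outb(0x61, inb(0x61) | 0x3);  // gate the speaker on
 * where outb/inb are the port-mapped I/O wrappers from threads/io.h.
 */
```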
Example: Memory-Mapped Display Controller
• Memory-Mapped:
– Hardware maps control registers and display memory into physical address space
» Addresses set by HW jumpers or at boot time
– Simply writing to display memory (also called the “frame buffer”) changes image on screen
» Addr: 0x8000F000 — 0x8000FFFF
– Writing graphics description to cmd queue
» Say, enter a set of triangles describing some scene
» Addr: 0x80010000 — 0x8001FFFF
– Writing to the command register may cause on-board graphics hardware to do something
» Say, render the above scene
» Addr: 0x0007F004
• Can protect with address translation
[Figure: physical address space map — 0x80010000–0x80020000: Graphics Command Queue; 0x8000F000–0x80010000: Display Memory; 0x0007F000–0x0007F004: Command Register.]
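The memory-mapped style above can be sketched in C. This is a simulation: the frame buffer is replaced by an ordinary array so the code runs anywhere; on real hardware the pointer would cover a region like 0x8000F000, and the dimensions here are made up.

```c
/* Sketch: writing a pixel into a memory-mapped frame buffer.
   The "device memory" is simulated with a plain array; a store
   through the pointer IS the I/O operation. */
#include <stdint.h>
#include <stddef.h>

#define FB_WIDTH  64
#define FB_HEIGHT 64

/* Stand-in for the display memory region (e.g., 0x8000F000). */
static uint8_t fake_framebuffer[FB_WIDTH * FB_HEIGHT];

/* volatile: every store must reach the device, never be optimized away. */
static volatile uint8_t *framebuf = fake_framebuffer;

static void put_pixel(size_t x, size_t y, uint8_t color) {
    framebuf[y * FB_WIDTH + x] = color;   /* a plain store changes the screen */
}
```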
3/31/2022 Joseph & Kubiatowicz CS162 © UCB Spring 2022
There’s more than just a CPU in there!
Chip-scale Features of 2015 x86 (Sky Lake)
• Significant pieces:
– Four OOO cores with deeper buffers
» Intel MPX (Memory Protection Extensions)
» Intel SGX (Software Guard Extensions)
» Issue up to 6 μ-ops/cycle
– GPU, System Agent (Mem, Fast I/O)
– Large shared L3 cache with on-chip ring bus
» 2 MB/core instead of 1.5 MB/core
» High-BW access to L3 Cache
• Integrated I/O
– Integrated memory controller (IMC)
» Two independent channels of DRAM
– High-speed PCI-Express (for Graphics cards)
– Direct Media Interface (DMI) Connection to PCH (Platform Control Hub)
Sky Lake I/O: PCH
Sky Lake System Configuration
• Platform Controller Hub
– Connected to processor with proprietary bus
» Direct Media Interface
• Types of I/O on PCH:
– USB, Ethernet
– Thunderbolt 3
– Audio, BIOS support
– More PCI Express (lower speed than on Processor)
– SATA (for Disks)
Operational Parameters for I/O
• Data granularity: Byte vs. Block
– Some devices provide single byte at a time (e.g., keyboard)
– Others provide whole blocks (e.g., disks, networks, etc.)
• Access pattern: Sequential vs. Random
– Some devices must be accessed sequentially (e.g., tape)
– Others can be accessed “randomly” (e.g., disk, cd, etc.)
» Fixed overhead to start transfers
• Notification: Polling vs. Interrupts
– Some devices require continual monitoring
– Others generate interrupts when they need service
• Transfer Mechanism: Programmed IO and DMA
Transferring Data To/From Controller
• Programmed I/O:
– Each byte transferred via processor in/out or load/store
– Pro: Simple hardware, easy to program
– Con: Consumes processor cycles proportional to data size
• Direct Memory Access:
– Give controller access to memory bus
– Ask it to transfer data blocks to/from memory directly
• Sample interaction with DMA controller (from OSC book):
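The driver's side of that exchange can be sketched in C. Everything here is hypothetical: the register layout is invented, and the "controller" is simulated in software so the copy actually happens; real hardware would do the copy itself and raise an interrupt when done.

```c
/* Sketch of programming a DMA controller (register names hypothetical,
   controller simulated as a struct rather than real memory-mapped I/O). */
#include <stdint.h>
#include <string.h>

struct dma_regs {            /* stand-in for memory-mapped DMA registers */
    const void *src;
    void       *dst;
    uint32_t    count;
    uint32_t    start;       /* writing 1 kicks off the transfer */
};

static struct dma_regs dma;  /* the fake controller */

/* Simulated hardware: copies the block when START is written. */
static void dma_start(struct dma_regs *r) {
    r->start = 1;
    memcpy(r->dst, r->src, r->count);  /* HW moves data without the CPU */
    r->start = 0;                      /* done; real HW raises an interrupt */
}

/* Driver: program source, destination, and count, then go. */
static void dma_transfer(const void *src, void *dst, uint32_t n) {
    dma.src = src;
    dma.dst = dst;
    dma.count = n;
    dma_start(&dma);                   /* CPU is free to run other work */
}
```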
I/O Device Notifying the OS
• The OS needs to know when:
– The I/O device has completed an operation
– The I/O operation has encountered an error
• I/O Interrupt:
– Device generates an interrupt whenever it needs service
– Pro: handles unpredictable events well
– Con: interrupts relatively high overhead
• Polling:
– OS periodically checks a device-specific status register
» I/O device puts completion information in status register
– Pro: low overhead
– Con: may waste many cycles on polling if infrequent or unpredictable I/O operations
• Actual devices combine both polling and interrupts
– For instance, a high-bandwidth network adapter:
» Interrupt for first incoming packet
» Poll for following packets until hardware queues are empty
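The hybrid scheme can be sketched as follows (queue layout and names are hypothetical; Linux's NAPI works on this principle): take one interrupt for the first packet, then drain the hardware queue by polling with further interrupts masked.

```c
/* Sketch of interrupt-then-poll (NAPI-style) packet reception.
   The "hardware" receive queue is simulated with a ring buffer. */
#include <stdbool.h>

#define QLEN 16
static int  rx_queue[QLEN];          /* simulated HW receive ring */
static int  rx_head, rx_tail;
static bool irq_enabled = true;
static int  packets_handled;

/* Pop one packet from the HW queue; false if the queue is empty. */
static bool rx_pop(int *pkt) {
    if (rx_head == rx_tail)
        return false;
    *pkt = rx_queue[rx_head++ % QLEN];
    return true;
}

/* Called once per interrupt: mask interrupts, then poll until empty. */
static void rx_interrupt(void) {
    irq_enabled = false;             /* no more interrupts during the burst */
    int pkt;
    while (rx_pop(&pkt))             /* poll: drain the hardware queue */
        packets_handled++;
    irq_enabled = true;              /* re-arm for the next burst */
}
```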
Kernel Device Structure
The System Call Interface
[Diagram: the classic Linux kernel map — below the system call interface sit five subsystems, each over its lower layers:]
• Process Management (concurrency, multitasking) → Architecture Dependent Code
• Memory Management (virtual memory) → Memory Manager
• Filesystems (files and dirs: the VFS) → File System Types → Block Devices
• Device Control (TTYs and device access) → Device Control drivers
• Networking (connectivity) → Network Subsystem → IF drivers
Recall: Device Drivers
• Device Driver: Device-specific code in the kernel that interacts directly with the device hardware
– Supports a standard, internal interface
– Same kernel I/O system can interact easily with different device drivers
– Special device-specific configuration supported with the ioctl() system call
• Device Drivers typically divided into two pieces:
– Top half: accessed in call path from system calls
» implements a set of standard, cross-device calls like open(), close(), read(), write(), ioctl(), strategy()
» This is the kernel’s interface to the device driver
» Top half will start I/O to device, may put thread to sleep until finished
– Bottom half: run as interrupt routine
» Gets input or transfers next block of output
» May wake sleeping threads if I/O now complete
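A toy sketch of the two halves in C (all names are hypothetical, and the sleep/wakeup is simulated with a flag and an immediate "interrupt" so the code runs standalone):

```c
/* Sketch of a driver's top half (syscall path) and bottom half
   (interrupt handler) for a block read. */
#include <stdbool.h>
#include <string.h>

static char  device_data[32] = "block-0 contents";  /* simulated device */
static char *pending_buf;
static bool  io_done;

/* Bottom half: runs at interrupt time — grab the data, wake the sleeper. */
static void driver_interrupt(void) {
    memcpy(pending_buf, device_data, sizeof device_data);
    io_done = true;                 /* real code: wake the sleeping thread */
}

/* Top half: called from the read() syscall path — start I/O, then sleep. */
static void driver_read(char *buf) {
    pending_buf = buf;
    io_done = false;                /* here real code would start device I/O */
    driver_interrupt();             /* simulated: device "finishes" at once */
    while (!io_done)                /* real code: block the thread, not spin */
        ;
}
```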
Recall: Life Cycle of An I/O Request
User Program
Kernel I/O Subsystem
Device Driver Top Half
Device Driver Bottom Half
Device Hardware
The Goal of the I/O Subsystem
• Provide Uniform Interfaces, Despite Wide Range of Different Devices
– This code works on many different devices:
FILE *fd = fopen("/dev/something", "r+");
for (int i = 0; i < 10; i++) {
    fprintf(fd, "Count %d\n", i);
}
fclose(fd);
– Why? Because code that controls devices (“device driver”) implements standard interface
• We will try to get a flavor for what is involved in actually controlling devices in rest of lecture
– Can only scratch surface!
Want Standard Interfaces to Devices
• Block Devices: e.g. disk drives, tape drives, DVD-ROM
– Access blocks of data
– Commands include open(), read(), write(), seek()
– Raw I/O or file-system access
– Memory-mapped file access possible
• Character Devices: e.g. keyboards, mice, serial ports, some USB devices
– Single characters at a time
– Commands include get(), put()
– Libraries layered on top allow line editing
• Network Devices: e.g. Ethernet,Wireless, Bluetooth
– Different enough from block/character to have own interface
– Unix and Windows include socket interface
» Separates network protocol from network operation
» Includes select() functionality
– Usage: pipes, FIFOs, streams, queues, mailboxes
How Does User Deal with Timing?
• Blocking Interface: “Wait”
– When request data (e.g. read() system call), put process to sleep until data is ready
– When write data (e.g. write() system call), put process to sleep until device is ready for data
• Non-blocking Interface: “Don’t Wait”
– Returns quickly from read or write request with count of bytes successfully transferred
– Read may return nothing, write may write nothing
• Asynchronous Interface: “Tell Me Later”
– When request data, take pointer to user’s buffer, return immediately; later kernel fills buffer and notifies user
– When send data, take pointer to user’s buffer, return immediately; later kernel takes data and notifies user
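The non-blocking case can be shown with real POSIX calls: a read() on a descriptor opened with O_NONBLOCK returns immediately with EAGAIN/EWOULDBLOCK instead of putting the process to sleep. The wrapper name here is ours.

```c
/* Sketch of the "don't wait" interface using POSIX non-blocking I/O. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns bytes read, 0 if nothing was ready right now, -1 on real error. */
static ssize_t read_nowait(int fd, void *buf, size_t n) {
    ssize_t r = read(fd, buf, n);
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                    /* no data yet — caller tries again later */
    return r;
}
```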
Storage Devices
• Magnetic disks
– Storage that rarely becomes corrupted
– Large capacity at low cost
– Block level random access (except for SMR – later!)
– Slow performance for random access
– Better performance for sequential access
• Flash memory
– Storage that rarely becomes corrupted
– Capacity at intermediate cost (5-20x disk)
– Block level random access
– Good performance for reads; worse for random writes
– Erasure requirement in large blocks
– Wear patterns issue
Hard Disk Drives (HDDs)
[Images: Western Digital drive internals (http://www.storagereview.com/gu), IBM/Hitachi Microdrive, and a read/write head]
IBM Personal Computer/AT (1986): 30 MB hard disk, $500, 30–40 ms seek time, 0.7–1 MB/s (est.)
The Amazing Magnetic Disk
• Unit of Transfer: Sector
– Ring of sectors form a track
– Stack of tracks form a cylinder
– Heads position on cylinders
• Disk Tracks ~ 1 μm (micron) wide
– Wavelength of light is ~ 0.5 μm
– Resolution of human eye: 50μm
– 100K tracks on a typical 2.5” disk
• Separated by unused guard regions
– Reduces likelihood neighboring tracks are corrupted during writes (still a small non-zero chance)
The Amazing Magnetic Disk
• Track length varies across disk
– Outside: More sectors per track, higher bandwidth
– Disk is organized into regions of tracks with same # of sectors/track
– Only outer half of radius is used
» Most of the disk area in the outer regions of the disk
• Disks so big that some companies (like Google) reportedly only use part of disk for active data
– Rest is archival data
Shingled Magnetic Recording (SMR)
• Overlapping tracks yields greater density, capacity
• Restrictions on writing, complex DSP for reading
Review: Magnetic Disks
• Cylinders: all the tracks under the head at a given point on all surfaces
• Read/write data is a three-stage process:
– Seek time: position the head/arm over the proper track
– Rotational latency: wait for desired sector to rotate under r/w head
– Transfer time: transfer a block of bits (sector) under r/w head
Disk Latency = Queueing Time + Controller time + Seek Time + Rotation Time + Xfer Time
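The formula can be restated as code (a trivial sketch; all times in milliseconds, with values supplied by the workload and the drive):

```c
/* Disk Latency = Queueing + Controller + Seek + Rotation + Transfer,
   exactly as in the formula above. */
static double disk_latency_ms(double queue_ms, double controller_ms,
                              double seek_ms, double rotation_ms,
                              double xfer_ms) {
    return queue_ms + controller_ms + seek_ms + rotation_ms + xfer_ms;
}
```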
[Figure: an I/O request passes through the Software Queue (Device Driver), then the Hardware Controller, then Media Time (Seek+Rot+Xfer).]
Typical Numbers for Magnetic Disk
Info | Range
Space/Density | Space: 18 TB (Seagate), 9 platters, in 3½ inch form factor! Areal Density: ≥ 1 Terabit/square inch! (PMR, Helium, ...)
Average Seek Time | Typically 4–6 milliseconds
Average Rotational Latency | Most laptop/desktop disks rotate at 3600–7200 RPM (16–8 ms/rotation). Server disks up to 15,000 RPM. Average latency is halfway around the disk, so 4–8 milliseconds
Controller Time | Depends on controller hardware
Transfer Time | Typically 50 to 270 MB/s. Depends on:
  • Transfer size (usually a sector): 512 B – 1 KB per sector
  • Rotation speed: 3600 RPM to 15,000 RPM
  • Recording density: bits per inch on a track
  • Diameter: ranges from 1 in to 5.25 in
Cost | Used to drop by a factor of two every 1.5 years (or faster), now slowing down
Disk Performance Example
• Assumptions:
– Ignoring queuing and controller times for now
– Avg seek time of 5 ms
– 7200 RPM ⇒ Time for one rotation: 60,000 (ms/min) / 7200 (rev/min) ≈ 8 ms
– Transfer rate of 50 MByte/s, block size of 4 KByte ⇒
  4096 bytes / 50×10⁶ (bytes/s) = 81.92×10⁻⁶ s ≈ 0.082 ms for 1 sector
• Read block from random place on disk:
– Seek (5 ms) + Rot. Delay (4 ms) + Transfer (0.082 ms) = 9.082 ms
– Approx 9 ms to fetch/put data: 4096 bytes / 9.082×10⁻³ s ≈ 451 KB/s
• Read block from random place in same cylinder:
– Rot. Delay (4 ms) + Transfer (0.082 ms) = 4.082 ms
– Approx 4 ms to fetch/put data: 4096 bytes / 4.082×10⁻³ s ≈ 1.0 MB/s
• Read next block on same track:
– Transfer (0.082 ms): 4096 bytes / 0.082×10⁻³ s ≈ 50 MB/s
• Key to using disk effectively (especially for file systems) is to minimize seek and rotational delays
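The three cases above can be checked with a small sketch using the slide's assumptions (5 ms seek, 4 ms average rotation, 50 MB/s transfer, 4 KB blocks):

```c
/* Effective bandwidth of reading one block, given the per-request
   overheads from the disk performance example. Returns bytes/second. */
static double effective_bw(double seek_ms, double rot_ms,
                           double block_bytes, double xfer_mb_s) {
    /* Time to stream the block once the head is in place. */
    double xfer_ms  = block_bytes / (xfer_mb_s * 1e6) * 1e3;
    double total_ms = seek_ms + rot_ms + xfer_ms;
    return block_bytes / (total_ms * 1e-3);
}
```

Random read pays seek + rotation, same-cylinder pays only rotation, and sequential pays neither, which is why the three bandwidths differ by two orders of magnitude.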
Lots of Intelligence in the Controller
• Sectors contain sophisticated error correcting codes
– Disk head magnet has a field wider than track
– Hide corruptions due to neighboring track writes
• Sector sparing
– Remap bad sectors transparently to spare sectors on the same surface
• Slip sparing
– Remap all sectors (when there is a bad sector) to preserve sequential behavior
• Track skewing
– Sector numbers offset from one track to the next, to allow for disk head movement for sequential ops
Hard Drive Prices over Time
[Figure: $/GB of hard drives falling steadily over the decades]
Example of Current HDDs
• X18 (2020)
– 18 TB hard disk
» 9 platters, 18 heads
» Helium filled: reduce friction and power
– 4.16ms average seek time
– 4096 byte physical sectors
– 7200 RPM
– Dual 6 Gbps SATA /12Gbps SAS interface
» 270MB/s MAX transfer rate
» Cache size: 256MB
– Price: $562 (~$0.03/GB)
• IBM Personal Computer/AT (1986)
– 30 MB hard disk
– 30-40ms seek time
– 0.7-1 MB/s (est.)
– Price: $500 ($17K/GB, 340,000x more expensive !!)
Solid State Disks (SSDs)
• 1995 – Replace rotating magnetic media with non-volatile memory (battery backed DRAM)
• 2009 – Use NAND Multi-Level Cell (2 or 3-bit/cell) flash memory
– Sector (4 KB page) addressable, but stores 4-64 “pages” per memory block
– Trapped electrons distinguish between 1 and 0
• No moving parts (no rotate/seek motors)
– Eliminates seek and rotational delay (0.1–0.2 ms access time)
– Very low power and lightweight
– Limited “write cycles”
• Rapid advances in capacity and cost ever since!
FLASH Memory
Samsung 2015: 512GB, NAND Flash
• Like a normal transistor but:
– Has a floating gate that can hold charge
– To write: raise or lower wordline high enough to cause charges to tunnel
– To read: turn on wordline as if normal transistor
» Presence of charge changes threshold and thus measured current
• Two varieties:
– NAND: denser, must be read and written in blocks
– NOR: much less dense, fast to read and write
• V-NAND: 3D stacking (Samsung claims 1TB possible in 1 chip)
Flash Memory (Con’t)
• Data read and written in page-sized chunks (e.g. 4K)
– Cannot be addressed at byte level
– Random access at block level for reads (no locality advantage)
– Writing of new blocks handled in order (kinda like a log)
• Before writing, must be erased (256K block at a time)
– Requires free-list management
– CANNOT write over existing block (Copy-on-Write is normal case)
SSD Architecture – Reads
[Figure: SSD internals — host interface and Buffer Manager (software) in front of the Flash Memory Controller and arrays of flash dies.]
• Read 4 KB Page: ~25 μs
– No seek or rotational latency
– Transfer time: transfer a 4 KB page
» SATA: 300–600 MB/s ⇒ ~4×10³ bytes / 400×10⁶ bytes/s ⇒ 10 μs
– Latency = Queuing Time + Controller time + Xfer Time
– Highest Bandwidth: Sequential OR Random reads
SSD Architecture – Writes
• Writing data is complex! (~200μs – 1.7ms)
– Can only write empty pages in a block
– Erasing a block takes ~1.5ms
– Controller maintains pool of empty blocks by coalescing used pages (read, erase, write), also reserves some % of capacity
• Rule of thumb: writes cost ~10× reads, and erasure ~10× writes
https://en.wikipedia.org/wiki/Solid-state_drive
SSD Architecture – Writes
• SSDs provide same interface as HDDs to OS
– read and write chunk (4 KB) at a time
• But can only overwrite data 256KB at a time!
• Why not just erase and rewrite new version of entire 256KB block?
– Erasure is very slow (milliseconds)
– Each block has a finite lifetime, can only be erased and rewritten about 10K times
– Heavily used blocks likely to wear out quickly
Solution – Two Systems Principles
1. Layer of Indirection
– Maintain a Flash Translation Layer (FTL) in SSD
– Map virtual block numbers (which OS uses) to physical page numbers (which flash mem. controller uses)
– Can now freely relocate data w/o OS knowing
2. Copy on Write
– Don’t overwrite a page when OS updates its data
– Instead, write new version in a free page
– Update FTL mapping to point to new location
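The two principles can be sketched together in a toy FTL (sizes, names, and the free-page policy are all hypothetical; real FTLs also handle erasure, wear levelling, and power loss):

```c
/* Toy flash translation layer: copy-on-write writes plus a
   logical-to-physical map. Flash is simulated with arrays. */
#include <stdint.h>

#define N_PAGES   64
#define PAGE_SIZE 16

static char    flash[N_PAGES][PAGE_SIZE];  /* simulated flash pages      */
static int     ftl_map[N_PAGES];           /* logical -> physical page   */
static uint8_t page_used[N_PAGES];         /* 0 = garbage or free        */
static int     next_free;                  /* naive free-page allocator  */

static void ftl_init(void) {
    for (int i = 0; i < N_PAGES; i++)
        ftl_map[i] = -1;                   /* no mapping yet */
}

/* Copy-on-write: never overwrite in place — write a fresh page,
   repoint the map, and mark the old page as garbage. */
static void ftl_write(int logical, const char *data) {
    int phys = next_free++;                /* pick a fresh erased page */
    for (int i = 0; i < PAGE_SIZE; i++)
        flash[phys][i] = data[i];
    if (ftl_map[logical] >= 0)
        page_used[ftl_map[logical]] = 0;   /* old copy: garbage to collect */
    ftl_map[logical] = phys;               /* repoint the mapping */
    page_used[phys] = 1;
}

static const char *ftl_read(int logical) {
    return flash[ftl_map[logical]];        /* indirection hides relocation */
}
```

Because the OS only ever sees logical block numbers, the controller is free to relocate data for wear levelling and garbage collection without the OS knowing.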
Flash Translation Layer
• No need to erase and rewrite entire 256KB block when making small modifications
• SSD controller can assign mappings to spread workload across pages
– Wear Levelling
• What to do with old versions of pages?
– Garbage Collection in background
– Erase blocks with old pages, add to free list
Some “Current” (large) 3.5in SSDs
• SSD: 15.36 TB (2017)
– Dual 12 Gb/s interface
– Seq reads 860MB/s
– Seq writes 920MB/s
– Random Reads (IOPS): 102K
– Random Writes (IOPS): 15K
– Price (Amazon): $5495 ($0.36/GB)
• Nimbus SSD: 100TB (2019)
– Dual port: 12Gb/s interface
– Seq reads/writes: 500MB/s
– Random Read Ops (IOPS): 100K
– Unlimited writes for 5 years!
– Price: ~ $40K? ($0.4/GB)
» However, 50TB drive costs $12500 ($0.25/GB)
Amusing calculation:
Is a full Kindle heavier than an empty one?
• Actually, “Yes”, but not by much
• Flash works by trapping electrons:
– So, erased state lower energy than written state
• Assuming that:
– Kindle has 4GB flash
– ½ of all bits in full Kindle are in high-energy state
– High-energy state about 10⁻¹⁵ joules higher
– Then: Full Kindle is 1 attogram (10⁻¹⁸ gram) heavier (Using E = mc²)
• Of course, this is less than the most sensitive scale can measure (it can measure 10⁻⁹ grams)
• Of course, this weight difference overwhelmed by battery discharge, weight from getting warm, ….
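Checking the slide's arithmetic (inputs are exactly the slide's assumptions: 4 GB of flash, half the bits in the high-energy state, ~10⁻¹⁵ J extra per such bit):

```c
/* Extra mass of a "full" Kindle via E = mc^2, in grams. */
static double full_kindle_extra_grams(void) {
    double bits_high = 4.0 * 1024 * 1024 * 1024 * 8 / 2;  /* half of 4 GB */
    double extra_j   = bits_high * 1e-15;                 /* total extra energy */
    double c         = 3.0e8;                             /* speed of light, m/s */
    return extra_j / (c * c) * 1000.0;                    /* kg -> grams */
}
```

The result comes out on the order of an attogram (10⁻¹⁸ g), matching the slide.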
• Source: (Times, Oct 24, 2011)