
Carnegie Mellon
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
14-513
18-613

The Memory Hierarchy
15-213/18-213/14-513/15-513/18-613: Introduction to Computer Systems
10th Lecture, October 1, 2020

Announcements
■ C Bootcamp: Sunday 10/4, 7-9pm
■ Recitation Monday: Stacks and Attacks
■ Lab 3 (attacklab)
▪ Due Thu, Oct. 8, 11:59pm ET
■ Written Assignment 3 peer grading
▪ Due Wed, Oct. 7, 11:59pm ET
■ Written Assignment 4 available on Canvas
▪ Due Wed, Oct. 7, 11:59pm ET

Today
■ The memory abstraction
■ RAM: main memory building block
■ Locality of reference
■ The memory hierarchy
■ Storage technologies and trends

Writing & Reading Memory (from 5th lecture)
■ Write
▪ Transfer data from CPU to memory: movq %rax, 8(%rsp)
▪ "Store" operation
■ Read
▪ Transfer data from memory to CPU: movq 8(%rsp), %rax
▪ "Load" operation
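In C terms, the store/load pair above corresponds to writing a memory location and reading it back. A minimal sketch (the function and variable names are illustrative, not from the slides):

```c
#include <assert.h>
#include <stdint.h>

/* Round-trip a value through memory. The assignment to slot compiles to a
 * store (movq %rax, 8(%rsp) style); reading it back compiles to a load
 * (movq 8(%rsp), %rax style). volatile keeps the slot in memory rather
 * than letting the compiler cache it in a register. */
int64_t round_trip(int64_t x) {
    volatile int64_t slot;
    slot = x;       /* store: CPU -> memory */
    return slot;    /* load:  memory -> CPU */
}
```

Compiling this at -O1 and inspecting the assembly typically shows exactly such a mov pair through a stack slot.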

Traditional Bus Structure Connecting CPU and Memory
■ A bus is a collection of parallel wires that carry address, data, and control signals.
■ Buses are typically shared by multiple devices.
[Figure: CPU chip (register file, ALU, bus interface) connected by the system bus to an I/O bridge, which connects by the memory bus to main memory]

Memory Read Transaction (1)
■ CPU places address A on the memory bus.
Load operation: movq A, %rax
[Figure: the CPU chip (register file with %rax, ALU, bus interface) puts address A on the bus; main memory holds word x at address A]

Memory Read Transaction (2)
■ Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
Load operation: movq A, %rax
[Figure: word x travels from main memory through the I/O bridge back toward the CPU]

Memory Read Transaction (3)
■ CPU reads word x from the bus and copies it into register %rax.
Load operation: movq A, %rax
[Figure: word x arrives at the register file and is written into %rax; main memory still holds x at address A]

Memory Write Transaction (1)
■ CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.
Store operation: movq %rax, A
[Figure: register %rax holds word y; address A travels through the I/O bridge to main memory]

Memory Write Transaction (2)
■ CPU places data word y on the bus.
Store operation: movq %rax, A
[Figure: word y travels from the CPU across the bus toward main memory]

Memory Write Transaction (3)
■ Main memory reads data word y from the bus and stores it at address A.
Store operation: movq %rax, A
[Figure: word y is now stored at address A in main memory]

Today
■ The memory abstraction
■ RAM: main memory building block
■ Locality of reference
■ The memory hierarchy
■ Storage technologies and trends

Random-Access Memory (RAM)
■ Key features
▪ RAM is traditionally packaged as a chip, or embedded as part of the processor chip.
▪ Basic storage unit is normally a cell (one bit per cell).
▪ Multiple RAM chips form a memory.
■ RAM comes in two varieties:
▪ SRAM (Static RAM)
▪ DRAM (Dynamic RAM)

RAM Technologies
■ DRAM
▪ 1 transistor + 1 capacitor per bit
▪ Capacitor oriented vertically
▪ Must refresh state periodically
■ SRAM
▪ 6 transistors per bit
▪ Holds state indefinitely (but will still lose data on power loss)

SRAM vs DRAM Summary

      Trans.   Access  Needs     Needs
      per bit  time    refresh?  EDC?   Cost  Applications
SRAM  6 or 8   1x      No        Maybe  100x  Cache memories
DRAM  1        10x     Yes       Yes    1x    Main memories, frame buffers

EDC: Error detection and correction

■ Trends
▪ SRAM scales with semiconductor technology
▪ Reaching its limits
▪ DRAM scaling limited by need for minimum capacitance
▪ Aspect ratio limits how deep the capacitor can be made
▪ Also reaching its limits

Enhanced DRAMs
■ Operation of the DRAM cell has not changed since its invention
▪ Commercialized by Intel in 1970.
■ DRAM cores with better interface logic and faster I/O:
▪ Synchronous DRAM (SDRAM)
▪ Uses a conventional clock signal instead of asynchronous control
▪ Double data-rate synchronous DRAM (DDR SDRAM)
▪ Double-edge clocking sends two bits per cycle per pin
▪ Different types distinguished by size of small prefetch buffer:
– DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits), DDR4 (16 bits)
▪ By 2010, standard for most server and desktop systems
▪ Intel Core i7 supports DDR3 and DDR4 SDRAM
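A back-of-the-envelope consequence of double-edge clocking is that peak bandwidth is twice the bus clock times the bus width. A sketch (ddr_peak_mb_per_s is a hypothetical helper, and the test values are illustrative, not vendor figures):

```c
#include <assert.h>

/* Peak transfer rate of a DDR interface: double-edge clocking moves two
 * transfers per clock cycle, each as wide as the memory bus.
 * Returns MB/s with MB = 10^6 bytes. */
double ddr_peak_mb_per_s(double bus_mhz, int bus_bits) {
    return bus_mhz * 2.0 * (bus_bits / 8.0);
}
```

For example, an 800 MHz clock on a 64-bit bus gives 800 x 2 x 8 = 12,800 MB/s of peak (not sustained) bandwidth.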

Conventional DRAM Organization
■ d x w DRAM:
▪ d ⋅ w total bits organized as d supercells of size w bits
[Figure: 16 x 8 DRAM chip (toy example). The memory controller (to/from the CPU) drives 2-bit addr lines selecting among 4 rows and 4 cols of supercells; supercell (2,1) is highlighted; 8-bit data lines carry one supercell; an internal row buffer holds one row]

Reading DRAM Supercell (2,1)
Step 1(a): Row access strobe (RAS) selects row 2.
Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: memory controller sends RAS = 2 over the 2-bit addr lines of the 16 x 8 DRAM chip; all of row 2 is copied into the internal row buffer]

Reading DRAM Supercell (2,1)
Step 2(a): Column access strobe (CAS) selects column 1.
Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
Step 3: All data written back to the row to provide refresh.
[Figure: memory controller sends CAS = 1; supercell (2,1) leaves the internal row buffer over the 8-bit data lines, headed to the CPU]
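The RAS/CAS protocol above can be modeled in a few lines, assuming the 16 x 8 toy chip (4 rows x 4 cols of 8-bit supercells). All names here are illustrative, not a real DRAM API:

```c
#include <assert.h>

enum { ROWS = 4, COLS = 4 };  /* the toy 16 x 8 chip: 16 supercells */

/* Split a linear supercell address (0..15) into its RAS/CAS parts. */
int ras_of(int addr) { return addr / COLS; }  /* row strobe    */
int cas_of(int addr) { return addr % COLS; }  /* column strobe */

/* Read supercell (row, col): the row access copies the whole row into
 * the row buffer, then the column access picks one supercell out of it. */
unsigned char dram_read(unsigned char cells[ROWS][COLS], int addr) {
    unsigned char row_buffer[COLS];
    int r = ras_of(addr), c = cas_of(addr);
    for (int j = 0; j < COLS; j++)      /* step 1: RAS copies row r  */
        row_buffer[j] = cells[r][j];
    return row_buffer[c];               /* step 2: CAS selects col c */
}
```

Note that the whole row moves into the buffer even though only one supercell is wanted, which is why consecutive accesses to the same row are cheap.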

Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 through DRAM 7). The memory controller broadcasts addr (row = i, col = j); each chip returns its supercell (i,j), and the eight 8-bit supercells are assembled into the 64-bit word at main memory address A: DRAM 7 supplies bits 56-63, DRAM 6 bits 48-55, ..., DRAM 0 bits 0-7]
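The assembly step the memory controller performs can be sketched as follows, assuming DRAM k supplies bits 8k..8k+7 as in the figure (assemble_word is a hypothetical helper, not a real controller interface):

```c
#include <assert.h>
#include <stdint.h>

/* Combine the eight 8-bit supercells returned by DRAM 0..7 into one
 * 64-bit word: slice[k] lands in bit positions 8k..8k+7. */
uint64_t assemble_word(const uint8_t slice[8]) {
    uint64_t word = 0;
    for (int k = 0; k < 8; k++)
        word |= (uint64_t)slice[k] << (8 * k);
    return word;
}
```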

Today
■ The memory abstraction
■ DRAM: main memory building block
■ Locality of reference
■ The memory hierarchy
■ Storage technologies and trends

The CPU-Memory Gap
The gap between DRAM, disk, and CPU speeds.
[Figure: log-scale plot of time (ns) vs. year (1985-2015) with curves for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time. Effective CPU cycle time accounts for parallelism within the CPU (e.g., multiple cores per CPU)]

Locality to the Rescue!
The key to bridging this CPU-Memory gap is a fundamental property of computer programs known as locality.

Locality
■ Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently
■ Temporal locality:
▪ Recently referenced items are likely to be referenced again in the near future
■ Spatial locality:
▪ Items with nearby addresses tend to be referenced close together in time
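Spatial locality can be made concrete by recording which linear element offsets a traversal of a 2-D array touches: a stride-1 traversal visits memory neighbors, while a large stride does not. A sketch with illustrative dimensions and names:

```c
#include <assert.h>

enum { M = 3, N = 4 };  /* illustrative array dimensions */

/* Record the linear memory offset (i*N + j) of each element touched.
 * Row-major traversal of a[M][N] visits offsets 0, 1, 2, ... (stride 1). */
void visit_row_major(int order[M * N]) {
    int t = 0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            order[t++] = i * N + j;
}

/* Column-major traversal visits offsets 0, N, 2N, ... (stride N). */
void visit_col_major(int order[M * N]) {
    int t = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            order[t++] = i * N + j;
}
```

The difference between consecutive entries of the visit order is exactly the stride the next slides reason about.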

Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Spatial or Temporal Locality?
■ Data references
▪ Reference array elements in succession (stride-1 reference pattern): spatial
▪ Reference variable sum each iteration: temporal
■ Instruction references
▪ Reference instructions in sequence: spatial
▪ Cycle through loop repeatedly: temporal

Qualitative Estimates of Locality
■ Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
■ Question: Does this function have good locality with respect to array a? Hint: array layout is row-major order.

    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Answer: yes. Stride-1 reference pattern.
[Figure: row-major layout in memory: a[0][0] ... a[0][N-1], a[1][0] ... a[1][N-1], ..., a[M-1][0] ... a[M-1][N-1]]

Locality Example
■ Question: Does this function have good locality with respect to array a?

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Answer: no. Stride-N reference pattern. Note: if M is very small then good locality. Why?

Locality Example
■ Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?
    int sum_array_3d(int a[M][N][N])
    {
        int i, j, k, sum = 0;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < M; k++)
                    sum += a[k][i][j];
        return sum;
    }

    $ time ./loopijk
    real 0m2.765s
    user 0m2.328s
    sys  0m0.422s

    $ time ./loopkij
    real 0m1.651s
    user 0m1.234s
    sys  0m0.422s

Answer: make j the inner loop.

Today
■ The memory abstraction
■ DRAM: main memory building block
■ Locality of reference
■ The memory hierarchy
■ Storage technologies and trends

Memory Hierarchies
■ Some fundamental and enduring properties of hardware and software:
▪ Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
▪ The gap between CPU and main memory speed is widening.
▪ Well-written programs tend to exhibit good locality.
■ These fundamental properties complement each other beautifully.
■ They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

Example Memory Hierarchy
[Figure: storage pyramid. Devices toward the top (L0) are smaller, faster, and costlier per byte; devices toward the bottom are larger, slower, and cheaper per byte.]
L0: Regs. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from the L3 cache.
L3: L3 cache (SRAM). Holds cache lines retrieved from main memory.
L4: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote servers.
L6: Remote secondary storage (e.g., Web servers).

Caches
■ Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
■ Fundamental idea of a memory hierarchy:
▪ For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
■ Why do memory hierarchies work?
▪ Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
▪ Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
■ Big Idea (Ideal): The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

General Cache Concepts
[Figure: a cache holding blocks 8, 9, 14, 3 above a memory partitioned into blocks 0-15. The cache is a smaller, faster, more expensive memory that caches a subset of the blocks; data is copied between the levels in block-sized transfer units; the larger, slower, cheaper memory is viewed as partitioned into "blocks"]

General Cache Concepts: Hit
[Figure: Request: 14. Data in block b is needed; block b is in the cache (8, 9, 14, 3): Hit!]

General Cache Concepts: Miss
[Figure: Request: 12. Data in block b is needed; block b is not in the cache: Miss! Block b is fetched from memory and stored in the cache, here evicting block 9]
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (victim)
Impact of spatial locality on number of misses?
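The hit/miss bookkeeping above can be sketched as a tiny cache model. The mod-4 placement rule mirrors the conflict-miss example discussed later in this lecture; all names here are illustrative:

```c
#include <assert.h>

enum { NSLOTS = 4 };  /* a level-k cache with 4 block positions */

/* Count misses for a reference string under direct-mapped placement:
 * block i may live only in slot (i mod 4), and a reference misses when
 * that slot currently holds a different block (or is empty). */
int count_misses(const int *refs, int n) {
    int slot[NSLOTS];
    for (int s = 0; s < NSLOTS; s++) slot[s] = -1;  /* cold (empty) cache */
    int misses = 0;
    for (int r = 0; r < n; r++) {
        int b = refs[r], s = b % NSLOTS;
        if (slot[s] != b) {    /* cold or conflict miss */
            misses++;
            slot[s] = b;       /* fetch block b and place it in slot s */
        }
    }
    return misses;
}
```

Running it on the reference string 0, 8, 0, 8, 0, 8 shows the pathological case: both blocks map to slot 0, so every reference misses even though the cache has room to spare.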
General Caching Concepts: 3 Types of Cache Misses
■ Cold (compulsory) miss
▪ Cold misses occur because the cache starts empty and this is the first reference to the block.
■ Capacity miss
▪ Occurs when the set of active cache blocks (working set) is larger than the cache.
■ Conflict miss
▪ Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
▪ E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
▪ Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
▪ E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.

Examples of Caching in the Memory Hierarchy

Cache Type            What is Cached?       Where is it Cached?   Latency (cycles)  Managed By
Registers             4-8 byte words        CPU core              0                 Compiler
TLB                   Address translations  On-Chip TLB           0                 Hardware MMU
L1 cache              64-byte blocks        On-Chip L1            4                 Hardware
L2 cache              64-byte blocks        On-Chip L2            10                Hardware
Virtual memory        4-KB pages            Main memory           100               Hardware + OS
Buffer cache          Parts of files        Main memory           100               OS
Disk cache            Disk sectors          Disk controller       100,000           Disk firmware
Network buffer cache  Parts of files        Local disk            10,000,000        NFS client
Browser cache         Web pages             Local disk            10,000,000        Web browser
Web cache             Web pages             Remote server disks   1,000,000,000     Web proxy server

Quiz Time!
Check out: https://canvas.cmu.edu/courses/17808/

Today
■ The memory abstraction
■ RAM: main memory building block
■ Locality of reference
■ The memory hierarchy
■ Storage technologies and trends

Storage Technologies
■ Magnetic Disks
▪ Store on magnetic medium
▪ Electromechanical access
■ Nonvolatile (Flash) Memory
▪ Store as persistent charge
▪ Implemented with 3-D structure
▪ 100+ levels of cells
▪ 3 bits data per cell

What's Inside A Disk Drive?
[Figure: spindle, platters, arm, actuator, SCSI connector, and electronics (including a processor and memory!). Image courtesy of Seagate Technology]

Disk Geometry
■ Disks consist of platters, each with two surfaces.
■ Each surface consists of concentric rings called tracks.
■ Each track consists of sectors separated by gaps.
[Figure: one surface with its tracks; track k shown with its sectors and gaps around the spindle]

Disk Capacity
■ Capacity: maximum number of bits that can be stored.
▪ Vendors express capacity in units of gigabytes (GB) or terabytes (TB), where 1 GB = 10^9 bytes and 1 TB = 10^12 bytes.
■ Capacity is determined by these technology factors:
▪ Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.
▪ Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.
▪ Areal density (bits/in^2): product of recording and track density.
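The geometry factors above combine multiplicatively into a capacity estimate (bytes/sector x avg sectors/track x tracks/surface x surfaces/platter x platters/disk). A sketch; the function name and the parameter values in the example are illustrative:

```c
#include <assert.h>
#include <math.h>

/* Disk capacity in GB from the multiplicative geometry factors.
 * Uses the vendor convention 1 GB = 10^9 bytes. */
double disk_capacity_gb(double bytes_per_sector, double sectors_per_track,
                        double tracks_per_surface, double surfaces_per_platter,
                        double platters) {
    double bytes = bytes_per_sector * sectors_per_track * tracks_per_surface
                 * surfaces_per_platter * platters;
    return bytes / 1e9;
}
```

For example, 512-byte sectors, 300 sectors/track, 20,000 tracks/surface, 2 surfaces/platter, and 5 platters give 30.72 GB.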
Disk Operation (Single-Platter View)
■ The disk surface spins at a fixed rotational rate.
■ The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
■ By moving radially, the arm can position the read/write head over any track.
[Figure: a single platter spinning around its spindle, with the arm positioning the head over a track]

Disk Operation (Multi-Platter View)
■ Read/write heads move in unison from cylinder to cylinder.
[Figure: multiple platters on one spindle, with one arm per surface]

Disk Access – Service Time Components
[Figure: timeline for reading the BLUE sector and then the RED sector: after the BLUE read comes a seek for RED, then rotational latency, then data transfer. Service time = seek + rotational latency + data transfer]

Disk Access Time
■ Average time to access some target sector approximated by:
▪ Taccess = Tavg seek + Tavg rotation + Tavg transfer
■ Seek time (Tavg seek)
▪ Time to position heads over the cylinder containing the target sector.
▪ Typical Tavg seek is 3-9 ms
■ Rotational latency (Tavg rotation)
▪ Time waiting for the first bit of the target sector to pass under the r/w head.
▪ Tavg rotation = 1/2 x 1/RPM x 60 sec/1 min
▪ Typical rotational rate = 7,200 RPM
■ Transfer time (Tavg transfer)
▪ Time to read the bits in the target sector.
▪ Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min
(1/RPM x 60 secs/1 min is the time for one rotation; 1/(avg # sectors/track) is the fraction of a rotation to be read)

Disk Access Time Example
■ Given:
▪ Rotational rate = 7,200 RPM
▪ Average seek time = 9 ms
▪ Avg # sectors/track = 400
■ Derived:
▪ Tavg rotation = 1/2 x (60 secs / 7,200 RPM) x 1000 ms/sec = 4 ms
▪ Tavg transfer = 60/7,200 x 1/400 x 1000 ms/sec = 0.02 ms
▪ Taccess = 9 ms + 4 ms + 0.02 ms
■ Important points:
▪ Access time dominated by seek time and rotational latency.
▪ First bit in a sector is the most expensive, the rest are free.
▪ SRAM access time is about 4 ns/doubleword, DRAM about 60 ns.
▪ Disk is about 40,000 times slower than SRAM, and 2,500 times slower than DRAM.

I/O Bus
[Figure: CPU chip (register file, ALU, bus interface) on the system bus; an I/O bridge connects to main memory over the memory bus; the I/O bus connects a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters]

Reading a Disk Sector (1)
■ CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

Reading a Disk Sector (2)
■ Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
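The access-time arithmetic in the example can be checked with a short sketch (times in ms; function names are illustrative). Note that the unrounded rotational latency at 7,200 RPM is about 4.17 ms, which the slide rounds to 4 ms:

```c
#include <assert.h>
#include <math.h>

/* Taccess = Tseek + Trotation + Ttransfer, as on the slide. */

/* Half a rotation, on average, before the target sector arrives. */
double avg_rotation_ms(double rpm) {
    return 0.5 * (60.0 / rpm) * 1000.0;
}

/* One sector's fraction of a full rotation. */
double avg_transfer_ms(double rpm, double sectors_per_track) {
    return (60.0 / rpm) * (1.0 / sectors_per_track) * 1000.0;
}

double access_time_ms(double seek_ms, double rpm, double sectors_per_track) {
    return seek_ms + avg_rotation_ms(rpm)
                   + avg_transfer_ms(rpm, sectors_per_track);
}
```

With the example's parameters (9 ms seek, 7,200 RPM, 400 sectors/track) the total comes to about 13.2 ms, almost all of it seek plus rotation.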
Reading a Disk Sector (3)
■ When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).

Nonvolatile Memories
■ DRAM and SRAM are volatile memories
▪ Lose information if powered off.
■ Nonvolatile memories retain value even if powered off
▪ Read-only memory (ROM): programmed during production
▪ Electrically erasable PROM (EEPROM): electronic erase capability
▪ Flash memory: EEPROMs with partial (block-level) erase capability
▪ Wears out after about 100,000 erasings
▪ 3D XPoint (Intel Optane) & emerging NVMs
▪ New materials
■ Uses for Nonvolatile Memories
▪ Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
▪ Solid state disks (replacing rotating disks)
▪ Disk caches

Solid State Disks (SSDs)
[Figure: an SSD behind the I/O bus. Requests to read and write logical disk blocks arrive at a flash translation layer, which maps them onto flash memory organized as blocks 0 to B-1, each containing pages 0 to P-1; a DRAM buffer sits alongside]
■ Pages: 512B to 4KB; blocks: 32 to 128 pages
■ Data read/written in units of pages.
■ Page can be written only after its block has been erased.
■ A block wears out after about 100,000 repeated writes.
SSD Performance Characteristics
■ Benchmark of Samsung 940 EVO Plus
https://ssd.userbenchmark.com/SpeedTest/711305/Samsung-SSD-970-EVO-Plus-250GB

Sequential read throughput   2,126 MB/s
Sequential write throughput  1,880 MB/s
Random read throughput         140 MB/s
Random write throughput         59 MB/s

■ Sequential access faster than random access
▪ Common theme in the memory hierarchy
■ Random writes are somewhat slower
▪ Erasing a block takes a long time (~1 ms).
▪ Modifying a block page requires all other pages to be copied to a new block.
▪ Flash translation layer allows accumulating a series of small writes before doing a block write.

SSD Tradeoffs vs Rotating Disks
■ Advantages
▪ No moving parts → faster, less power, more rugged
■ Disadvantages
▪ Have the potential to wear out
▪ Mitigated by "wear leveling logic" in the flash translation layer
▪ E.g., Samsung 940 EVO Plus guarantees 600 writes/byte of writes before they wear out
▪ Controller migrates data to minimize wear level
▪ In 2019, about 4 times more expensive per byte
▪ And relative cost will keep dropping
■ Applications
▪ Smartphones, laptops
▪ Increasingly common in desktops and servers

Summary
■ The speed gap between CPU, memory, and mass storage continues to widen.
■ Well-written programs exhibit a property called locality.
■ Memory hierarchies based on caching close the gap by exploiting locality.
■ Flash memory progress outpacing all other memory and storage technologies (DRAM, SRAM, magnetic disk)
▪ Able to stack cells in three dimensions

Supplemental slides

Storage Trends

SRAM
Metric             1985     1990   1995  2000  2005   2010   2015    2015:1985
$/MB               2,900    320    256   100   75     60     25      116
access (ns)        150      35     15    3     2      1.5    1.3     115

DRAM
Metric             1985     1990   1995  2000  2005   2010   2015    2015:1985
$/MB               880      100    30    1     0.1    0.06   0.02    44,000
access (ns)        200      100    70    60    50     40     20      10
typical size (MB)  0.256    4      16    64    2,000  8,000  16,000  62,500

Disk
Metric             1985     1990   1995  2000  2005   2010   2015    2015:1985
$/GB               100,000  8,000  300   10    5      0.3    0.03    3,333,333
access (ms)        75       28     10    8     5      3      3       25
typical size (GB)  0.01     0.16   1     20    160    1,500  3,000   300,000

CPU Clock Rates

                   1985   1990   1995     2003   2005    2010        2015        2015:1985
CPU                80286  80386  Pentium  P-4    Core 2  Core i7(n)  Core i7(h)
Clock rate (MHz)   6      20     150      3,300  2,000   2,500       3,000       500
Cycle time (ns)    166    50     6        0.30   0.50    0.4         0.33        500
Cores              1      1      1        1      2       4           4           4
Effective cycle    166    50     6        0.30   0.25    0.10        0.08        2,075
time (ns)

(n) Nehalem processor  (h) Haswell processor
Inflection point in computer history when designers hit the "Power Wall"