Carnegie Mellon
The Memory Hierarchy
15-213/18-213/15-513: Introduction to Computer Systems 10th Lecture, June 9, 2020
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
Today
The memory abstraction
RAM: main memory building block
Storage technologies and trends
Locality of reference
The memory hierarchy
Writing & Reading Memory

Write
▪ Transfer data from CPU to memory: movq %rax, 8(%rsp)
▪ “Store” operation

Read
▪ Transfer data from memory to CPU: movq 8(%rsp), %rax
▪ “Load” operation
From 5th lecture
Traditional Bus Structure Connecting CPU and Memory
A bus is a collection of parallel wires that carry address, data, and control signals.
Buses are typically shared by multiple devices.
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory]
Memory Read Transaction (1)
CPU places address A on the memory bus.
Load operation: movq A, %rax
[Figure: the CPU chip (register file with %rax, ALU, bus interface) places address A on the bus; the I/O bridge forwards it to main memory, which holds word x at address A]
Memory Read Transaction (2)
Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
Load operation: movq A, %rax
[Figure: main memory places word x on the memory bus; it travels through the I/O bridge toward the CPU]
Memory Read Transaction (3)
CPU reads word x from the bus and copies it into register %rax.
Load operation: movq A, %rax
[Figure: word x arrives at the register file and is stored in %rax; main memory still holds x at address A]
Memory Write Transaction (1)
CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.
Store operation: movq %rax, A
[Figure: %rax holds word y; address A travels through the I/O bridge to main memory]
Memory Write Transaction (2)
CPU places data word y on the bus.
Store operation: movq %rax, A
[Figure: word y travels from %rax through the bus interface and I/O bridge toward main memory]
Memory Write Transaction (3)
Main memory reads data word y from the bus and stores it at address A.
Store operation: movq %rax, A
[Figure: main memory now holds y at address A]
Today
The memory abstraction
RAM: main memory building block
Storage technologies and trends
Locality of reference
The memory hierarchy
Random-Access Memory (RAM)
Key features
▪ RAM is traditionally packaged as a chip.
▪ or embedded as part of processor chip
▪ Basic storage unit is normally a cell (one bit per cell).
▪ Multiple RAM chips form a memory.
RAM comes in two varieties:
▪ SRAM (Static RAM)
▪ DRAM (Dynamic RAM)
RAM Technologies
DRAM
▪ 1 transistor + 1 capacitor per bit
▪ Capacitor oriented vertically
▪ Must refresh state periodically

SRAM
▪ 6 transistors per bit
▪ Holds state indefinitely
SRAM vs DRAM Summary
        Trans.    Access   Needs      Needs
        per bit   time     refresh?   EDC?    Cost   Applications
SRAM    6 or 8    1x       No         Maybe   100x   Cache memories
DRAM    1         10x      Yes        Yes     1x     Main memories, frame buffers

EDC: Error detection and correction
Trends
▪ SRAM scales with semiconductor technology
▪ Reaching its limits
▪ DRAM scaling limited by need for minimum capacitance
▪ Aspect ratio limits how deep the capacitor can be made
▪ Also reaching its limits
Enhanced DRAMs
Operation of DRAM cell has not changed since its invention
▪ Commercialized by Intel in 1970.
DRAM cores with better interface logic and faster I/O:
▪ Synchronous DRAM (SDRAM)
▪ Uses a conventional clock signal instead of asynchronous control
▪ Double data-rate synchronous DRAM (DDR SDRAM)
▪ Double edge clocking sends two bits per cycle per pin
▪ Different types distinguished by size of small prefetch buffer:
– DDR (2 bits), DDR2 (4 bits), DDR3 (8 bits), DDR4 (16 bits)
▪ By 2010, standard for most server and desktop systems
▪ Intel Core i7 supports DDR3 and DDR4 SDRAM
Conventional DRAM Organization
d x w DRAM:
▪ d ⋅ w total bits organized as d supercells of size w bits
[Figure: 16 x 8 DRAM chip with a 4 x 4 array of supercells (rows 0-3, cols 0-3); the memory controller (to/from CPU) drives a 2-bit addr bus and an 8-bit data bus; supercell (2,1) is highlighted; an internal row buffer sits below the array]
Reading DRAM Supercell (2,1)
Step 1(a): Row access strobe (RAS) selects row 2.
Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: the memory controller sends RAS = 2 on the 2-bit addr lines; row 2 of the 16 x 8 DRAM chip is copied into the internal row buffer]
Reading DRAM Supercell (2,1)
Step 2(a): Column access strobe (CAS) selects column 1.
Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
Step 3: All data written back to the row to provide refresh.
[Figure: the memory controller sends CAS = 1 on the addr lines; supercell (2,1) moves from the internal row buffer over the 8-bit data lines to the CPU]
Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 through DRAM 7). The memory controller broadcasts addr (row = i, col = j) to every chip; each chip supplies one byte of supercell (i,j), and the eight bytes fill bits 0-7 through bits 56-63 of the 64-bit word at main memory address A]
Today
The memory abstraction
RAM: main memory building block
Storage technologies and trends
Locality of reference
The memory hierarchy
Storage Technologies

Magnetic Disks
▪ Store on magnetic medium
▪ Electromechanical access

Nonvolatile (Flash) Memory
▪ Store as persistent charge
▪ Implemented with 3-D structure
▪ 100+ levels of cells
▪ 3 bits data per cell
What’s Inside A Disk Drive?
[Figure: drive internals showing the arm, actuator, spindle, platters, SCSI connector, and electronics (including a processor and memory!)]
Image courtesy of Seagate Technology
Disk Geometry
Disks consist of platters, each with two surfaces.
Each surface consists of concentric rings called tracks. Each track consists of sectors separated by gaps.
[Figure: a surface with concentric tracks around the spindle; track k shown with its sectors separated by gaps]
Disk Capacity
Capacity: maximum number of bits that can be stored.
▪ Vendors express capacity in units of gigabytes (GB) or terabytes (TB), where 1 GB = 10^9 bytes and 1 TB = 10^12 bytes
Capacity is determined by these technology factors:
▪ Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track.
▪ Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment.
▪ Areal density (bits/in^2): product of recording density and track density.
Disk Operation (Single-Platter View)
The disk surface spins at a fixed rotational rate.
The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
By moving radially, the arm can position the read/write head over any track.
[Figure: a single platter rotating about its spindle, with the arm sweeping the head across the tracks]
Disk Operation (Multi-Platter View)
Read/write heads move in unison from cylinder to cylinder
[Figure: several platters stacked on one spindle, with one arm and read/write head per surface]
Disk Access – Service Time Components
[Figure: timeline of servicing two requests. After the BLUE read (data transfer), the head seeks to the RED track (seek), waits for the sector (rotational latency), then reads RED (data transfer)]
Disk Access Time
Average time to access some target sector approximated by:
▪ Taccess = Tavg seek + Tavg rotation + Tavg transfer

Seek time (Tavg seek)
▪ Time to position heads over cylinder containing target sector.
▪ Typical Tavg seek is 3-9 ms
Rotational latency (Tavg rotation)
▪ Time waiting for first bit of target sector to pass under r/w head.
▪ Tavg rotation = 1/2 x 1/RPM x 60 secs/1 min
▪ Typical rotational rate = 7,200 RPM
Transfer time (Tavg transfer)
▪ Time to read the bits in the target sector.
▪ Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min
(1/RPM is the time for one rotation in minutes; 1/(avg # sectors/track) is the fraction of a rotation to be read.)
Disk Access Time Example
Given:
▪ Rotational rate = 7,200 RPM
▪ Average seek time = 9 ms
▪ Avg # sectors/track = 400
Derived:
▪ Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec = 4 ms
▪ Tavg transfer = 60/7200 x 1/400 x 1000 ms/sec = 0.02 ms
▪ Taccess = 9 ms + 4 ms + 0.02 ms
Important points:
▪ Access time dominated by seek time and rotational latency.
▪ First bit in a sector is the most expensive, the rest are free.
▪ SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
▪ Disk is about 40,000 times slower than SRAM,
▪ 2,500 times slower than DRAM.
I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge and over the memory bus to main memory. The I/O bus links the USB controller (mouse, keyboard), graphics adapter (monitor), disk controller (disk), and expansion slots for other devices such as network adapters]
Reading a Disk Sector (1)
CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with disk controller.
[Figure: the command travels from the CPU over the I/O bus to the disk controller]
Reading a Disk Sector (2)
Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
[Figure: sector data moves from the disk controller across the I/O bus and through the I/O bridge directly into main memory, without involving the CPU]
Reading a Disk Sector (3)
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU).
[Figure: the interrupt signal travels from the disk controller to the CPU chip]
Nonvolatile Memories
DRAM and SRAM are volatile memories
▪ Lose information if powered off.
Nonvolatile memories retain value even if powered off
▪ Read-only memory (ROM): programmed during production
▪ Electrically erasable PROM (EEPROM): electronic erase capability
▪ Flash memory: EEPROMs with partial (block-level) erase capability
▪ Wears out after about 100,000 erasings
▪ 3D XPoint (Intel Optane) & emerging NVMs
▪ New materials
Uses for Nonvolatile Memories
▪ Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems,…)
▪ Solid state disks (replacing rotating disks)
▪ Disk caches
Solid State Disks (SSDs)
[Figure: an SSD attached to the I/O bus. Requests to read and write logical disk blocks arrive at a flash translation layer (with a DRAM buffer), which maps them onto flash memory organized as blocks (Block 0 ... Block B-1), each holding pages (Page 0, Page 1, ..., Page P-1)]
Pages: 512B to 4KB, Blocks: 32 to 128 pages
Data read/written in units of pages.
Page can be written only after its block has been erased.
A block wears out after about 100,000 repeated writes.
SSD Performance Characteristics
Benchmark of Samsung 970 EVO Plus:
https://ssd.userbenchmark.com/SpeedTest/711305/Samsung-SSD-970-EVO-Plus-250GB

Sequential read throughput: 2,126 MB/s     Random read throughput: 140 MB/s
Sequential write throughput: 1,880 MB/s    Random write throughput: 59 MB/s

Sequential access faster than random access
▪ Common theme in the memory hierarchy
Random writes are somewhat slower
▪ Erasing a block takes a long time (~1 ms).
▪ Modifying a block page requires all other pages to be copied to a new block.
▪ Flash translation layer allows accumulating a series of small writes before doing a block write.
SSD Tradeoffs vs Rotating Disks
Advantages
▪ No moving parts→faster, less power, more rugged
Disadvantages
▪ Have the potential to wear out
▪ Mitigated by “wear leveling logic” in flash translation layer
▪ E.g. Samsung 970 EVO Plus guarantees 600 writes/byte of writes before wear-out
▪ Controller migrates data to minimize wear level
▪ In 2019, about 4 times more expensive per byte
▪ And, relative cost will keep dropping
Applications
▪ MP3 players, smart phones, laptops
▪ Increasingly common in desktops and servers
Today
The memory abstraction
DRAM: main memory building block
Storage technologies and trends
Locality of reference
The memory hierarchy
The CPU-Memory Gap
The gap between DRAM, disk, and CPU speeds.
[Figure: log-scale plot of time (ns) versus year (1985-2015) for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time; the gap between processor speed and DRAM/disk access time widens over the decades]
Locality to the Rescue!
The key to bridging this CPU-Memory gap is a fundamental property of computer programs known as locality.
Locality
Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently
Temporal locality:
▪ Recently referenced items are likely to be referenced again in the near future
Spatial locality:
▪ Items with nearby addresses tend to be referenced close together in time
Locality Example

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Spatial or Temporal Locality?
Data references
▪ Reference array elements in succession (stride-1 reference pattern). → spatial
▪ Reference variable sum each iteration. → temporal
Instruction references
▪ Reference instructions in sequence. → spatial
▪ Cycle through loop repeatedly. → temporal
Qualitative Estimates of Locality
Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
Question: Does this function have good locality with respect to array a?
Hint: array layout is row-major order
Answer: yes
int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
Row-major layout: a[0][0] ... a[0][N-1], a[1][0] ... a[1][N-1], ..., a[M-1][0] ... a[M-1][N-1]
Locality Example
Question: Does this function have good locality with respect to array a?
int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
Answer: no, unless... M is very small
Locality Example
Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?
int sum_array_3d(int a[M][N][N])
{
    int i, j, k, sum = 0;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++)
                sum += a[k][i][j];
    return sum;
}
Answer: make j the inner loop
Today
The memory abstraction
DRAM: main memory building block
Storage technologies and trends
Locality of reference
The memory hierarchy
Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
▪ Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
▪ The gap between CPU and main memory speed is widening.
▪ Well-written programs tend to exhibit good locality.
These fundamental properties complement each other beautifully.
They suggest an approach for organizing memory and storage systems known as a memory hierarchy.
Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit toward the bottom:
L0: Regs. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). Holds cache lines retrieved from the L3 cache.
L3: L3 cache (SRAM). Holds cache lines retrieved from main memory.
L4: Main memory (DRAM). Holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks). Holds files retrieved from disks on remote servers.
L6: Remote secondary storage (e.g., Web servers).
Caches
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
▪ For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
▪ Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
▪ Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
Big Idea (Ideal): The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
General Cache Concepts
[Figure: a cache holding copies of blocks 8, 9, 14, and 3 sits above a larger memory partitioned into blocks 0 through 15]
Smaller, faster, more expensive memory caches a subset of the blocks.
Data is copied in block-sized transfer units.
Larger, slower, cheaper memory viewed as partitioned into “blocks”.
General Cache Concepts: Hit
Data in block b is needed
Block b is in cache: Hit!
[Figure: a request for block 14 finds block 14 already in the cache (which holds blocks 8, 9, 14, 3)]
General Cache Concepts: Miss
Data in block b is needed
Block b is not in cache: Miss!
Block b is fetched from memory
Block b is stored in cache
• Placement policy: determines where b goes
• Replacement policy: determines which block gets evicted (victim)
[Figure: a request for block 12 misses in the cache (which holds blocks 8, 9, 14, 3); block 12 is fetched from memory and stored in the cache, evicting block 9]
General Caching Concepts: 3 Types of Cache Misses
Cold (compulsory) miss
▪ Cold misses occur because the cache starts empty and this is the first reference to the block.
Capacity miss
▪ Occurs when the set of active cache blocks (working set) is larger than the cache.
Conflict miss
▪ Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
▪ E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
▪ Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
▪ E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Examples of Caching in the Mem. Hierarchy

Cache Type            What is Cached?        Where is it Cached?    Latency (cycles)   Managed By
Registers             4-8 byte words         CPU core               0                  Compiler
TLB                   Address translations   On-Chip TLB            0                  Hardware MMU
L1 cache              64-byte blocks         On-Chip L1             4                  Hardware
L2 cache              64-byte blocks         On-Chip L2             10                 Hardware
Virtual Memory        4-KB pages             Main memory            100                Hardware + OS
Buffer cache          Parts of files         Main memory            100                OS
Disk cache            Disk sectors           Disk controller        100,000            Disk firmware
Network buffer cache  Parts of files         Local disk             10,000,000         NFS client
Browser cache         Web pages              Local disk             10,000,000         Web browser
Web cache             Web pages              Remote server disks    1,000,000,000      Web proxy server
Summary
The speed gap between CPU, memory and mass storage continues to widen.
Well-written programs exhibit a property called locality.
Memory hierarchies based on caching close the gap by
exploiting locality.
Flash memory progress outpacing all other memory and storage technologies (DRAM, SRAM, magnetic disk)
▪ Able to stack cells in three dimensions
Supplemental slides
Storage Trends

SRAM
Metric             1985     1990   1995   2000   2005   2010   2015    2015:1985
$/MB               2,900    320    256    100    75     60     25      116
access (ns)        150      35     15     3      2      1.5    1.3     115

DRAM
Metric             1985     1990   1995   2000   2005   2010   2015    2015:1985
$/MB               880      100    30     1      0.1    0.06   0.02    44,000
access (ns)        200      100    70     60     50     40     20      10
typical size (MB)  0.256    4      16     64     2,000  8,000  16,000  62,500

Disk
Metric             1985     1990   1995   2000   2005   2010   2015    2015:1985
$/GB               100,000  8,000  300    10     5      0.3    0.03    3,333,333
access (ms)        75       28     10     8      5      3      3       25
typical size (GB)  0.01     0.16   1      20     160    1,500  3,000   300,000
CPU Clock Rates

                           1985    1990    1995     2003    2005    2010        2015        2015:1985
CPU                        80286   80386   Pentium  P-4     Core 2  Core i7(n)  Core i7(h)
Clock rate (MHz)           6       20      150      3,300   2,000   2,500       3,000       500
Cycle time (ns)            166     50      6        0.30    0.50    0.40        0.33        500
Cores                      1       1       1        1       2       4           4           4
Effective cycle time (ns)  166     50      6        0.30    0.25    0.10        0.08        2,075

Inflection point in computer history when designers hit the “Power Wall”
(n) Nehalem processor  (h) Haswell processor