Many of the following slides are taken with permission from
Complete Powerpoint Lecture Notes for
Computer Systems: A Programmer’s Perspective (CS:APP)
Randal E. Bryant and David R. O’Hallaron
http://csapp.cs.cmu.edu/public/lectures.html
The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.
CacheMemoryandPerformance MemoryHierarchy 1
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
L0:
(SRAM)
L2 cache (SRAM)
Main memory (DRAM)
Smaller, faster, costlier per byte
Larger,
slower, cheaper per byte
L5:
L2: L3:
L1 cache holds cache lines retrieved from L2 cache
L2 cache holds cache lines retrieved from main memory
Main memory holds disk blocks retrieved from local disks
Local disks hold files retrieved from disks on remote network servers
L1:
Registers L1 cache
L4:
Local secondary storage (local disks)
Remote secondary storage
(tapes, distributed file systems, Web servers)
CPU registers hold words retrieved from L1 cache
An Example Memory Hierarchy
MemoryHierarchy 2
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Key features
– RAM is traditionally packaged as a chip.
– Basic storage unit is normally a cell (one bit per cell).
– Multiple RAM chips form a memory.
Static RAM (SRAM)
– Each cell stores a bit with a four or six-transistor circuit.
– Retains value indefinitely, as long as it is kept powered.
– Relatively insensitive to electrical noise (EMI), radiation, etc.
– Faster and more expensive than DRAM.
Dynamic RAM (DRAM)
– Each cell stores bit with a capacitor. One transistor is used for access
– Value must be refreshed every 10-100 ms.
– More sensitive to disturbances (EMI, radiation,…) than SRAM.
– Slower and cheaper than SRAM.
Random-Access Memory (RAM) Memory Hierarchy 3
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Trans. Access Needs Needs
per bit time refresh? EDC? Cost Applications
SRAM 4 or 6 1X No Maybe 100x Cache memories
DRAM 1 10X Yes Yes 1X Main memories, frame buffers
SRAM vs DRAM Summary Memory Hierarchy 4
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices.
CPU chip
Register file
ALU
Bus interface
System bus Memory bus
I/O bridge
Main memory
Traditional CPU-Memory Bus Structure
MemoryHierarchy 5
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
CPU places address A on the memory bus.
Register file %eax
Loadoperation:movl A, %eax
Main memory I/O bridge A 0
A
ALU
x
Bus interface
Memory Read Transaction (1) Memory Hierarchy 6
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
Register file %eax
Loadoperation:movl A, %eax
ALU
I/O bridge
x
Main memory 0
A
x
Bus interface
Memory Read Transaction (2) Memory Hierarchy 7
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
CPU read word x from the bus and copies it into register %eax.
Register file %eax
Loadoperation:movl A, %eax
x
ALU
I/O bridge
Main memory 0
A
x
Bus interface
Memory Read Transaction (3) Memory Hierarchy 8
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.
Register file %eax
Storeoperation:movl %eax, A
y
ALU
I/O bridge
A
Main memory 0
A
Bus interface
Memory Write Transaction (1) Memory Hierarchy 9
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
CPU places data word y on the bus.
Register file %eax
Storeoperation:movl %eax, A
ALU
y
I/O bridge
y
Main memory 0
A
Bus interface
Memory Write Transaction (2) Memory Hierarchy 10
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Main memory reads data word y from the bus and stores it at address A.
register file %eax
Storeoperation:movl %eax, A
y
ALU
I/O bridge
main memory 0
A
y
bus interface
Memory Write Transaction (3) Memory Hierarchy 11
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
CPU chip
Register file
ALU
Bus interface
System bus
Memory bus
Mouse Keyboard
Monitor
I/O bridge
I/O bus
Expansion slots for other devices such as network adapters.
Disk
Main memory
USB controller
Graphics adapter
Disk controller
The Bigger Picture: I/O Bus Memory Hierarchy 12
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
SRAM
Metric 1980 1985 1990 1995 2000 2005 2010 2010:1980
$/MB 19,200 2,900 320 256 100 75 60 320 access (ns) 300 150 35 15 3 2 1.5 200
DRAM
Metric 1980 1985 1990 1995 2000 2005 2010 2010:1980
$/MB 8,000 access (ns) 375 typical size (MB) 0.064
880 100 30 1 200 100 70 60 0.256 4 16 64
0.1 0.06 50 40 2,000 8,000
130,000 9 125,000
Disk
Metric 1980 1985 1990 1995 2000 2005 2010 2010:1980
$/MB 500 100 8 0.30 0.01 0.005 0.0003 1,600,000 access(ms) 87 75 28 10 8 4 3 29
typical size (MB) 1 10 160 1,000 20,000 160,000 1,500,000 1,500,000
Storage Trends Memory Hierarchy 13
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
The gap
100,000,000.0 10,000,000.0 1,000,000.0 100,000.0 10,000.0 1,000.0 100.0 10.0 1.0 0.1 0.0
between DRAM, disk, and CPU speeds.
Disk
SSD
DRAM
CPU
1980 1985 1990 1995 2000 2003 2005 2010
Year
Disk seek time
Flash SSD access time DRAM access time SRAM access time
CPU cycle time Effective CPU cycle time
The CPU-Memory Gap Memory Hierarchy 14
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
ns
Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently
Temporal locality:
– Recently referenced items are likely
to be referenced again in the near future
Spatial locality:
– Items with nearby addresses tend
to be referenced close together in time
Locality
Memory Hierarchy 15
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
sum = 0;
for (i = 0; i < n; i++)
sum += a[i];
return sum;
Data references
– Reference array elements in succession (stride-1 reference pattern).
– Reference variable sum each iteration.
Instruction references
– Reference instructions in sequence.
– Cycle through loop repeatedly.
Spatial locality Temporal locality
Spatial locality Temporal locality
Locality Example Memory Hierarchy 16
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby) items from disk to smaller DRAM memory – Main memory
Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory – Cache memory attached to CPU
Taking Advantage of Locality Memory Hierarchy 17
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
L0:
(SRAM)
L2 cache (SRAM)
Main memory (DRAM)
Smaller, faster, costlier per byte
Larger,
slower, cheaper per byte
L5:
L2: L3:
L1 cache holds cache lines retrieved from L2 cache
L2 cache holds cache lines retrieved from main memory
Main memory holds disk blocks retrieved from local disks
Local disks hold files retrieved from disks on remote network servers
L1:
Registers L1 cache
L4:
Local secondary storage (local disks)
Remote secondary storage
(tapes, distributed file systems, Web servers)
CPU registers hold words retrieved from L1 cache
An Example Memory Hierarchy
Memory Hierarchy 18
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
– For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
– Because of locality, programs tend to access the data at level k more often than they access the data at level k+1.
– Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
Big Idea: The memory hierarchy creates a large pool of storage that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
Caches Memory Hierarchy 19
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
84
9
140
3
Cache
Smaller, faster, more expensive memory caches a subset of the blocks
Data is copied in block-sized transfer units
Larger, slower, cheaper memory viewed as partitioned into “blocks”
Memory
140
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
General Cache Concepts Memory Hierarchy 20
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
8
9
14
3
Cache
Request: 14 Data in block b is needed
Block b is in cache:
Hit!
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Memory
General Cache Concepts: Hit Memory Hierarchy 21
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
8
192
14
3
Cache
Request: 12
Data in block b is needed Block b is not in cache:
Miss!
Block b is fetched from
memory
Block b is stored in cache
• Placement policy: determines where b goes
• Replacement policy: determines which block
gets evicted (victim)
Memory
12
Request: 12
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
General Cache Concepts: Miss Memory Hierarchy 22
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Cold (compulsory) miss
– Cold misses occur because the cache is empty. Conflict miss
– Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
– Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Capacity miss
– Occurs when the set of active cache blocks (working set) is larger than the cache.
Types of Cache Misses Memory Hierarchy 23
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Cache Type
What is Cached?
Where is it Cached?
Latency (cycles)
Managed By
Registers
4-8 bytes words
CPU core
0
Compiler
TLB
Address translations
On-Chip TLB
0
Hardware
L1 cache
64-bytes block
On-Chip L1
1
Hardware
L2 cache
64-bytes block
On/Off-Chip L2
10
Hardware
Virtual Memory
4-KB page
Main memory
100
Hardware + OS
Buffer cache
Parts of files
Main memory
100
OS
Disk cache
Disk sectors
Disk controller
100,000
Disk firmware
Network buffer cache
Parts of files
Local disk
10,000,000
AFS/NFS client
Browser cache
Web pages
Local disk
10,000,000
Web browser
Web cache
Web pages
Remote server disks
1,000,000,000
Web proxy server
Examples of Caching in the Hierarchy Memory Hierarchy 24
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
Cache memories are small, fast SRAM-based memories managed automatically in hardware.
– Hold frequently accessed blocks of main memory
CPU looks first for data in caches (e.g., L1, L2, and L3), then in main memory.
Typical system structure:
CPU chip
Register file
Cache memories
ALU
Bus interface
System bus Memory bus
I/O bridge
Main memory
Cache Memories Memory Hierarchy 25
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain