May 11
Caches III
https://what-if.xkcd.com/111/
CMPT 295
L16: Caches III
Making memory accesses fast!
Cache basics
Principle of locality
Memory hierarchies
Cache organization
Direct-mapped (sets; index + tag)
Associativity (ways)
Replacement policy
Handling writes
Program optimizations that consider caches
2
CMPT 295
L16: Caches III
Divide addresses into “index” and “tag”
General Cache Organization (, , )
3
= blocks/lines per set
= # sets
=
set
“line” (block plus
management bits)
0
1
2
K-1
Tag
V
valid bit
= bytes per block
Cache size:
data bytes
(doesn’t include V or Tag)
CMPT 295
L16: Caches III
“Layers” of a cache:
Block (data)
Line (data + management bits)
Set (many lines based on associativity)
Cache (many sets based on cache size & associativity)
Valid bit lets us know if this line has been initialized (“is valid”).
Notation Review
We just introduced a lot of new variable names!
Please be mindful of block size notation when you look at past exam questions or are watching videos
4
Variable This Quarter Formulas
Block size ( in book)
Cache size
Associativity
Number of Sets
Address space
Address width
Tag field width
Index field width
Offset field width ( in book)
CMPT 295
L16: Caches III
Review this on your own
DON’T MEMORIZE FORUMLAS
Example Cache Parameters Problem
4 KiB address space, 125 cycles to go to memory.
Fill in the following table:
5
Cache Size 256 B
Block Size 32 B
Associativity 2-way
Hit Time 3 cycles
Miss Rate 20%
Tag Bits
Index Bits
Offset Bits
AMAT
CMPT 295
L16: Caches III
Tag bits = 5
Index bits = 2
Offset bits = 5
AMAT = 3 + 0.2(125) = 28 cycles
Cache Read
6
0
1
2
K-1
tag
v
bits
bits
bits
Address of byte in memory:
tag
set
index
block
offset
data begins at this offset
Locate set
Check if any line in set
is valid and has
matching tag: hit
Locate data starting
at offset
valid bit
= # sets
=
= blocks/lines per set
= bytes per block
CMPT 295
L16: Caches III
Example: Direct-Mapped Cache ( = 1)
7
Direct-mapped: One line per set
Block Size = 8 B
bits
0…01
100
Address of int:
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
find set
= sets
CMPT 295
L16: Caches III
Example: Direct-Mapped Cache ( = 1)
8
bits
0…01
100
Address of int:
0
1
2
7
tag
v
3
6
5
4
match?: yes = hit
valid? +
block offset
Direct-mapped: One line per set
Block Size = 8 B
CMPT 295
L16: Caches III
Example: Direct-Mapped Cache ( = 1)
9
bits
0…01
100
Address of int:
0
1
2
7
tag
v
3
6
5
4
match?: yes = hit
valid? +
int (4 B) is here
block offset
No match? Then old line gets evicted and replaced
This is why we want alignment!
Direct-mapped: One line per set
Block Size = 8 B
CMPT 295
L16: Caches III
Example: Set-Associative Cache ( = 2)
10
bits
0…01
100
Address of short int:
find set
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
2-way: Two lines per set
Block Size = 8 B
CMPT 295
L16: Caches III
1 fewer s bit, 2x fewer sets
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
Example: Set-Associative Cache ( = 2)
11
bits
0…01
100
compare both
valid? +
match: yes = hit
block offset
tag
2-way: Two lines per set
Block Size = 8 B
Address of short int:
CMPT 295
L16: Caches III
0
1
2
7
tag
v
3
6
5
4
0
1
2
7
tag
v
3
6
5
4
Example: Set-Associative Cache ( = 2)
12
bits
0…01
100
valid? +
match: yes = hit
block offset
short int (2 B) is here
No match?
One line in set is selected for eviction and replacement
Replacement policies: random, least recently used (LRU), …
compare both
Address of short int:
2-way: Two lines per set
Block Size = 8 B
CMPT 295
L16: Caches III
Example Code Analysis Problem
Assuming the cache starts cold (all blocks invalid) and sum is stored in a register, calculate the miss rate:
= 12 bits, = 256 B, = 32 B, = 2
#define SIZE 8
long ar[SIZE][SIZE], sum = 0; // &ar=0x800
for (int i = 0; i < SIZE; i++)
for (int j = 0; j < SIZE; j++)
sum += ar[i][j];
13
CMPT 295
L16: Caches III
k = 5 bits
256 / 32 = 8 blocks
8 / 2 = 4 sets
s = 2 bits
t = 12 – 5 – 2 = 5 bits
0x800 = 0b 1000 0000 0000
8 bytes per elem
4 longs fit in a block
¼ miss rate (missed are compulsory)
What about writes?
Multiple copies of data exist:
L1, L2, possibly L3, main memory
What to do on a write-hit?
Write-through: write immediately to next level
Write-back: defer write to next level until line is evicted (replaced)
Must track which cache lines have been modified (“dirty bit”)
What to do on a write-miss?
Write-allocate: (“fetch on write”) load into cache, update line in cache
Good if more writes or reads to the location follow
No-write-allocate: (“write around”) just write immediately to memory
Typical caches:
Write-back + Write-allocate, usually
Write-through + No-write-allocate, occasionally
14
CMPT 295
L16: Caches III
Why? Reuse is common
Typically you would read, then write (incr)
Or after you initialize a value (say, to 0), likely read and write it again soon
Write-back, write-allocate example
15
0xBEEF
Cache
Memory
G
0xCAFE
0xBEEF
0
F
G
dirty bit
tag (there is only one set in this tiny cache, so the tag is the entire block address!)
In this example we are sort of
ignoring block offsets. Here a block
holds 2 bytes (16 bits, 4 hex digits).
Normally a block would be much
bigger and thus there would be
multiple items per block. While only
one item in that block would be
written at a time, the entire line would
be brought into cache.
Contents of memory stored at address G
CMPT 295
L16: Caches III
Valid bit not shown
Tiny cache, 1 set, tag is G
Dirty bit because write back
Write-back, write-allocate example
16
0xBEEF
Cache
Memory
G
0xCAFE
0xBEEF
0
F
G
STORE 0xFACE, F
dirty bit
CMPT 295
L16: Caches III
Miss, need to maybe write back
But not dirty, so I don’t have to
Pull in F
0xBEEF
U
0
Write-back, write-allocate example
17
0xCAFE
Cache
Memory
F
0xCAFE
0xBEEF
F
G
dirty bit
0xCAFE
0
Step 1: Bring F into cache
STORE 0xFACE, F
CMPT 295
L16: Caches III
Pull in F, then write
0xBEEF
U
0
Write-back, write-allocate example
18
0xCAFE
Cache
Memory
F
0xCAFE
0xBEEF
F
G
dirty bit
0xFACE
1
Step 2: Write 0xFACE
to cache only and set
dirty bit
STORE 0xFACE, F
CMPT 295
L16: Caches III
Write, make dirty (write-back)
0xBEEF
U
0
Write-back, write-allocate example
19
0xCAFE
Cache
Memory
F
0xCAFE
0xBEEF
F
G
mov 0xFEED, F
dirty bit
0xFACE
1
Write hit!
Write 0xFEED to
cache only
STORE 0xFACE, F
CMPT 295
L16: Caches III
Another write hit, don’t need to go to mem
0xBEEF
U
0
Write-back, write-allocate example
20
0xCAFE
Cache
Memory
F
0xCAFE
0xBEEF
F
G
LOAD G, %t1
dirty bit
0xFEED
1
STORE 0xFEED, F
STORE 0xFACE, F
CMPT 295
L16: Caches III
Read miss, and my line is dirty! Write back
Write-back, write-allocate example
21
0xBEEF
Cache
Memory
G
0xFEED
0xBEEF
0
F
G
dirty bit
1. Write F back to memory
since it is dirty
2. Bring G into the cache so
we can copy it into %t1
LOAD G, %t1
STORE 0xFEED, F
STORE 0xFACE, F
CMPT 295
L16: Caches III
/docProps/thumbnail.jpeg