20. Caches: Set Associative
EECS 370 – Introduction to Computer Organization – Fall 2020
Satish Narayanasamy
EECS Department
University of Michigan in Ann Arbor, USA
© Narayanasamy 2020
The material in this presentation cannot be copied in any form without written permission
Announcements
Upcoming deadlines:
HW4 due Nov 10th Project 3 due Nov. 12th
Grading policy: Best of two
2
Fully-associative caches
0123456789
10
11
12
13
14
15
Memory
100
110
120
130
140
150
160
170
180
190
200
210
220
230
240
250
tag data to any location
A block can go
Address:
tag
block offset
3 bits
1 bit
3
Recap
Fully-associative cache: Placement & Access
Address
Tag
Block Offset
Tag array
Data array
V
tag
Compare all tags
MUX
Block offset Data
Hit?
4
Recap
Direct-mapped caches
0 tagdata 234
1 56 2 78 39
A block can go to one location
Memory Cache 01
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
Address:
1 bit
2 bits
1 bit
10
11
12
13
14
15
tag
line index
block offset
5
Recap
Direct-mapped cache: Placement & Access
Address
Line Index
Block Offset
Tag
Tag array
Data array
V
tag
Block offset Data
=?
MUX
Hit?
6
Recap
This lecture
Set Associative Caches
Idea Illustration 3C problem
7
The middle ground…
Set associative caches
Partition memory into regions,like direct mapped but fewer partitions Associate a region to a set of cache lines
Check tags for all lines in a set to determine a HIT Treat each line in a set like a small fully associative cache
LRU (or LRU-like) policy generally used
8
Set-associative cache
Memory
0123456789 Way1
10
11
12
13
14
15
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
Set 0 Set 1
tag data
Way0
Address:
2 bit
1 bits
1 bit
tag
set index
block offset
9
Set-associative cache: Placement & Access
SET
Tag array
Data array
V
tag
V
tag
=?
=?
MUX MUX
Block offset
Set Index
Block Offset
Logic
Hit?
Address
Tag
10
Cache Organization Comparison
Cache size = 8 bytes (for all caches) Block size = 2 bytes
#blocks = 4
Fully associative
# blocks per set = all blocks = 4 in this example;
so, also correct to view this cache as 4-way associative
Vdlrutag data Vdlrutag data Vdlrutag data Vdlrutag data
Direct mapped: (#blocks per set = 1) Vdtag data
2-way associative (#blocks per set = 2) V d lru tag data
11
Cache Organization: Equations
Block
block_offset_size = log2(#block_size)
#cache lines = #blocks
= #lines / (lines per set)
(all lines are in 1 set)
Set
#blocks = cache size / block_size
Direct-mapped: 2-way associative: n-way associative: fully-associative:
#sets = #sets = #sets = #sets = #sets =
#lines/ #ways #lines/1 #lines/2 #lines/n
1
set_index_size = log2(#sets)
Tag size = address size – set_index_size – block_offset_size
12
Class Problem 1
For a 32-bit address and 16KB cache with 64-byte blocks, show the breakdown of the address for the following cache configuration:
A) fully associative cache B) 4-way set associative cache
C) Direct-mapped cache
13
Class Problem 1 (Solution)
For a 32-bit address and 16KB cache with 64-byte blocks, show the breakdown of the address for the following cache configuration: #lines = 16 KB / 64 byte = 256
A) fully associative cache
Tag = 32 – 6 = 26 bits
C) Direct-mapped cache
#sets = #lines / ways = #lines/1 = 256 Set_index_size = log(#sets) = log(256) = 8 bits Tag = 32 – 6 – 8 = 18 bits
Block_offset_size = log(block_size) = log(64)= 6 bits
B) 4-way set associative cache
#sets = #lines / ways = #lines/ 4 = 64 Set Index size = log(64) = 6 bits
Tag = address_size – set_index_size – block_offset_size = 32 – 6 – 6
= 20 bits
14
Set Associative Caches: Illustration
15
Set-associative cache example (Write-back, write allocate)
Processor
Cache Vdtag data
78
29
120
123
71
0
0
0
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
150
162
173
18
21
33
28
19
Misses: 0 Hits: 0
Memory
0123456789
10
11
12
13
14
15
16
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
Tag
Index Block_offset
Set-associative cache (REF 1)
Processor
Cache Vdtag data
Memory
0123456789
10
11
12
13
14
15
17
78
29
120
123
71
0
0
0
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
150
162
173
18
21
33
28
19
Misses: 0 Hits: 0
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
Tag
Index Block_offset
Set-associative cache (REF 1)
Processor
Cache Vdtag data
1
0
0
78
29
0
0
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
Misses: 1 Hits: 0
Memory
0123456789
10
11
12
13
14
15
18
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru
Tag
Index Block_offset
Set-associative cache (REF 2)
Processor
Cache Vdtag data
1
0
0
78
29
0
0
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
Misses: 1 Hits: 0
Memory
0123456789
10
11
12
13
14
15
19
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru
Tag
Index Block_offset
Set-associative cache (REF 2)
Processor
Cache Vdtag data
1
1
0
0
0
0
1
78
29
71
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
Misses: 2 Hits: 0
Memory
0123456789
10
11
12
13
14
15
20
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru
Tag
Index Block_offset
Set-associative cache (REF 3)
Processor
Cache Vdtag data
1
1
0
0
0
0
1
78
29
71
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
Misses: 2 Hits: 0
Memory
0123456789
10
11
12
13
14
15
21
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru
Tag
Index Block_offset
Set-associative cache (REF 3)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
0
71
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
Misses: 3 Hits: 0
Memory
0123456789
10
11
12
13
14
15
22
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru lru
Tag
Index Block_offset
Set-associative cache (REF 4)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
0
71
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
Misses: 3 Hits: 0
Memory
0123456789
10
11
12
13
14
15
23
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru lru
Tag
Index Block_offset
Set-associative cache (REF 4)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
1
29
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
Misses: 3 Hits: 1
Memory
0123456789
10
11
12
13
14
15
24
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru lru
Tag
Index Block_offset
Set-associative cache (REF 5)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
1
29
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
Misses: 3 Hits: 1
Memory
0123456789
10
11
12
13
14
15
25
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru lru
Tag
Index Block_offset
Set-associative cache (REF 5)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
1
29
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
78
Misses: 3 Hits: 2
Memory
0123456789
10
11
12
13
14
15
26
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru lru
Tag
Index Block_offset
Set-associative cache (REF 6)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
1
29
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
78
Misses: 3 Hits: 2
Memory
0123456789
10
11
12
13
14
15
27
78
29
120
123
71
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru lru
Tag
Index Block_offset
Set-associative cache (REF 6)
Processor
Cache Vdtag data
1
1
0
1
0
1
1
78
29
1
1
29
150
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
150
78
Misses: 3 Hits: 2
Memory
0123456789
10
11
12
13
14
15
28
78
29
120
123
29
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru
lru
Tag
Index Block_offset
Set-associative cache (REF 6)
Processor
Cache Vdtag data
1
1
0
1
0
2
1
78
29
1
0
18
21
162
150
0
Ld R1←M[ 1 ] Ld R2←M[ 5 ] St R2→M[ 7 ] St R1→M[ 4 ] Ld R3←M[ 0 ] Ld R2←M[ 8 ]
R0 R1 R2 R3
29
18
78
Misses: 4 Hits: 2
Memory
0123456789
10
11
12
13
14
15
29
78
29
120
123
29
150
162
173
18
21
33
28
19
200
210
225
0 0 0 01
00 00 00 11 00 10 01 01 01 00 01 11 01 10 10 01 10 00 10 11 10 10 11 01 11 00
11 11 11 1
lru
lru
Tag
Index Block_offset
Reasons for cache misses a.k.a. The 3C’s of Cache Misses
Compulsory miss
First reference to any block will always miss Also sometimes called a “cold start” miss
Capacity miss
Cache is too small to hold all the data Would have had a hit with an infinite cache
Conflict miss
Would have had a hit with a fully associative cache
30
Classifying Cache Misses
Can we classify a cache miss into one of the following?
Compulsory miss
Capacity miss
Conflict miss
Yes! Simulate three different caches
Simulate with a cache of unlimited size (cache size = memory size)
– Any misses must be compulsory misses
Simulate again with a fully associative cache of the intended size
– Any new misses must be capacity misses Simulate a third time, with the actual intended cache
– Any new misses must be conflict misses
31
Fixing cache misses
Compulsory misses
First reference to an address
No way to completely avoid these
Reduce by increasing block size (spatial locality) This reduces the total number of blocks
Capacity misses
Would have a hit with a large enough cache Reduce by building a bigger cache
Conflict misses
Would have had a hit with a fully associative cache Cache does not have enough associativity
Reduce by increasing associativity
32
3 C’s Sample Problem
Consider a cache with the following configuration: write-allocate
Cache size = 64 bytes
Block size = 16 bytes 2-way associative.
16-bit byte-addressable ISA. (address size is 16 bits). LRU replacement policy.
Assume the cache is empty at the start.
For the following memory accesses, indicate whether the reference is a hit or miss, and the type of a miss (compulsory, conflict, capacity)
33
3 C’s Practice Problem – Address sequence
Address
0x00 0x14 0x27 0x08 0x38 0x4A 0x18 0x27 0x0F 0x40
34
3 C’s Practice Problem – Simulate infinite cache Address
0478A87F0
0x0
0x1
0x2
0x0
0x3
0x4
0x1
0x2
0x0
0x4
35
Tag Block_offset
3 C’s Practice Problem – Simulate fully associative cache Address
tag
0x0
0x1
0x2
0x0
0x3
0x4
0x1
0x2
0x0
0x4
0478A87F0
36
Tag Block_offset
3 C’s Practice Problem – Simulate given set associative cache Address
0478A87F0
0x0
0x1
0x2
0x0
0x3
0x4
0x1
0x2
0x0
0x4
tag data Set 0
Set 1
tag data
37
Block_offset
Tag
Set index
3 C’s Practice Problem – 3 C’s
Address
Infinite
FA
SA
3Cs
0x00
M
M
M
0x14
M
M
M
0x27
M
M
M
0x08
H
H
H
0x38
M
M
M
0x4A
M
M
M
0x18
H
M
H
0x27
H
M
M
0x0F
H
M
M
0x40
H
H
M
38
3 C’s Practice Problem – 3 C’s
Address
Infinite
FA
SA
3Cs
0x00
M
M
M
Compulsory
0x14
M
M
M
Compulsory
0x27
M
M
M
Compulsory
0x08
H
H
H
—
0x38
M
M
M
Compulsory
0x4A
M
M
M
Compulsory
0x18
H
M
H
—
0x27
H
M
M
Capacity
0x0F
H
M
M
Capacity
0x40
H
H
M
Conflict
39
Cache Parameters vs. Miss Rate
Cache Size
Block Size Associativity Replacement policy
40
Questions to ask
Can block size be not power of 2?
Can number of sets be not power of 2?
Can number of ways be not power of 2? Can we have 3-way set associative cache?
41
Cache Size
Cache size in the total data (not including tag) capacity bigger can exploit temporal locality better
not ALWAYS better
Too large a cache adversely affects hit & miss latency
smaller is faster => bigger is slower access time may degrade critical path
Too small a cache
doesn’t exploit temporal locality well useful data replaced often
Working set: the whole set of data executing application references
Within a time interval
hit rate
“working set” size
cache size
42
Block size (also called Line size)
Block size is the data that is associated with an address tag Sub-blocking: A block divided into multiple pieces (each with V bit)
Can improve “write” performance
Too small blocks
don’t exploit spatial locality well have larger tag overhead
Too large blocks
too few total # of blocks likely-useless data transferred Extra bandwidth/energy consumed
hit rate
43
Block size
Associativity
How many blocks map to the same set (same set index)?
Larger associativity
lower miss rate, less variation among programs diminishing returns
Smaller associativity lower cost
faster hit time
Especially important for L1 caches
Power of 2 associativity?
hit rate
associativity
44