Caches II CSE 351 Autumn 2016
Caches II
CMPT 295
L15: Caches II
Making memory accesses fast!
Cache basics
Principle of locality
Memory hierarchies
Cache organization
Direct-mapped (sets; index + tag)
Associativity (ways)
Replacement policy
Handling writes
Program optimizations that consider caches
2
CMPT 295
L15: Caches II
Divide addresses into “index” and “tag”
Cache Organization (1)
Block Size (): unit of transfer between and Mem
Given in bytes and always a power of 2 (e.g. 64 B)
Blocks consist of adjacent bytes (differ in address by 1)
Spatial locality!
3
Note: The textbook uses “B” for block size
CMPT 295
L15: Caches II
Just like withinBlock() from Lab 1!
Cache Organization (1)
Block Size (): unit of transfer between and Mem
Given in bytes and always a power of 2 (e.g. 64 B)
Blocks consist of adjacent bytes (differ in address by 1)
Spatial locality!
Offset field
Low-order bits of address tell you which byte within a block
(address) mod = lowest bits of address
(address) modulo (# of bytes in a block)
4
Block Number
Block Offset
-bit address:
(refers to byte in memory)
bits
bits
Note: The textbook uses “b” for offset bits
CMPT 295
L15: Caches II
Just like withinBlock() from Lab 1!
Little k (or b) for offset bits
Peer Instruction Question
If we have 6-bit addresses and block size = 4 B, which block and byte does 0x15 refer to?
Block Num Block Offset
1 1
1 5
5 1
5 5
We’re lost…
5
CMPT 295
L15: Caches II
0b010101
Num =0b0101 = 5
Offset = 0b01 = 1
Cache Organization (2)
Cache Size (): amount of data the can store
Cache can only hold so much data (subset of next level)
Given in bytes () or number of blocks ()
Example: = 32 KiB = 512 blocks if using 64-B blocks
Where should data go in the cache?
We need a mapping from memory addresses to specific locations in the cache to make checking the cache for an address fast
What is a data structure that provides fast lookup?
Hash table!
6
CMPT 295
L15: Caches II
Review: Hash Tables for Fast Lookup
7
0
1
2
3
4
5
6
7
8
9
Insert:
5
27
34
102
119
Apply hash function to map data to “buckets”
CMPT 295
L15: Caches II
Simple hash function = mod10
Fast
Distributes data
Place Data in Cache by Hashing Address
Map to cache index from block address
Use next bits
(block address) mod (# blocks in cache)
How many bits do I need to specify an index in the cache
8
Block Num Block Data
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Memory
Cache
Index Block Data
00
01
10
11
Here = 4 B
and = 4
CMPT 295
L15: Caches II
Assume 6-bit addresses. 4bit bnum 2 bit offset
Don’t remember, you just use the next needed number of bits
Place Data in Cache by Hashing Address
Map to cache index from block address
Let adjacent blocks fit in cache simultaneously!
Consecutive blocks go in consecutive cache indices
9
Block Addr Block Data
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Memory
Cache
Index Block Data
00
01
10
11
Here = 4 B
and = 4
CMPT 295
L15: Caches II
Practice Question
6-bit addresses, block size = 4 B, and our cache holds = 4 blocks.
A request for address 0x2A results in a cache miss. Which index does this block get loaded into and which 3 other addresses are loaded along with it?
10
CMPT 295
L15: Caches II
0x2A = 0b101010
2 bit offset, 4 bit block num (2 bit index)
Index = 10
Other addresses
1010_00 = 0x28
1010_01 = 0x29
1010_10 = 0x2A (this one)
1010_11 = 0x2B
Place Data in Cache by Hashing Address
Collision!
Different index, same address
Don’t want to mix things up!
Solution?
11
Block Addr Block Data
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Memory
Cache
Index Block Data
00
01
10
11
Here = 4 B
and = 4
CMPT 295
L15: Caches II
Same index, different address (and data)
Tags Differentiate Blocks in Same Index
Tag = rest of address bits
bits =
Check this during a cache lookup
12
Block Addr Block Data
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Memory
Cache
Index Tag Block Data
00 00
01
10 01
11 01
Here = 4 B
and = 4
CMPT 295
L15: Caches II
Possible analogies for Tag:
Hash by first name, tag by last name
Hash by 7-digit phone number, tag by area code
Again, don’t memorize m – s – k
Checking for a Requested Address
CPU sends address request for chunk of data
Address and requested data are not the same thing!
Analogy: your friend ≠ his or her phone number
TIO address breakdown:
Index field tells you where to look in cache
Tag field lets you check that data is the block you want
Offset field selects specified start byte within block
Note: and sizes will change based on hash function
13
Tag ()
Offset ()
-bit address:
Block Number
Index ()
CMPT 295
L15: Caches II
Tio = uncle in spanish
Go through steps
Why are they in this order?
Offset must be last
You want index to ”change fast” to minimize collision
If you swapped tag and index, adjacent blocks would collide (jump forward two slides)
Cache Puzzle
Based on the following behavior, which of the following block sizes is NOT possible for our cache?
Cache starts empty, also known as a cold cache
Access (addr: hit/miss) stream:
(14: miss), (15: hit), (16: miss)
4 bytes
8 bytes
16 bytes
32 bytes
We’re lost…
14
CMPT 295
L15: Caches II
Lab 4!
Stream tells us they weren’t in same block
Draw blocks
Direct-Mapped Cache
Hash function: just index bits
Each memory address maps to exactly one index in the cache
Fast (and simpler) to find an address
15
Block Addr Block Data
00 00
00 01
00 10
00 11
01 00
01 01
01 10
01 11
10 00
10 01
10 10
10 11
11 00
11 01
11 10
11 11
Memory
Cache
Index Tag Block Data
00 00
01 11
10 01
11 01
Here = 4 B
and = 4
CMPT 295
L15: Caches II
Direct-Mapped Cache Problem
What happens if we access the following addresses?
8, 25, 8, 25, 8, …?
Conflict in cache (misses!)
Rest of cache goes unused
Solution?
16
Block Addr Block Data
00 00
00 01
00 10
00 11
01 00
01 01
01 10
01 11
10 00
10 01
10 10
10 11
11 00
11 01
11 10
11 11
Memory
Cache
Index Tag Block Data
00 ??
01 ??
10
11 ??
Here = 4 B
and = 4
CMPT 295
L15: Caches II
Conflict! Rest of cache unused!
8 = 0b 00 10 00
25 = 0b 01 10 01
Associativity
What if we could store data in any place in the cache?
More complicated hardware = more power consumed, slower
So we combine the two ideas:
Each address maps to exactly one set
Each set can store block in more than one way
17
0
1
2
3
4
5
6
7
0
1
2
3
Set
0
1
Set
1-way:
8 sets,
1 block each
2-way:
4 sets,
2 blocks each
4-way:
2 sets,
4 blocks each
0
Set
8-way:
1 set,
8 blocks
direct mapped
fully associative
CMPT 295
L15: Caches II
“Where is address 2?”
Cache Organization (3)
Associativity (): # of ways for each set
Such a cache is called an “-way set associative cache”
We now index into cache sets, of which there are
Use lowest = bits of block address
Direct-mapped: = 1, so = as we saw previously
Fully associative: = , so = 0 bits
18
Decreasing associativity
Fully associative
(only one set)
Direct mapped
(only one way)
Increasing associativity
Selects the set
Used for tag comparison
Selects the byte from block
Tag ()
Index ()
Offset ()
CMPT 295
L15: Caches II
Assume a fixed cache size
Example Placement
Where would data from address 0x1833 be placed?
Binary: 0b 0001 1000 0011 0011
19
= ?
block size: 16 B
capacity: 8 blocks
address: 16 bits
Set Tag Data
0
1
2
3
4
5
6
7
Direct-mapped
Set Tag Data
0
1
2
3
Set Tag Data
0
1
2-way set associative
4-way set associative
Tag ()
Offset ()
-bit address:
Index ()
=
=
= ––
= ?
= ?
CMPT 295
L15: Caches II
19
k = 4
E = 1; s = 3; idx = 3
E = 2; s = 2; idx = 3
E = 4; s = 1; idx = 1
Block Replacement
Any empty block in the correct set may be used to store block
If there are no empty blocks, which one should we replace?
No choice for direct-mapped caches
Caches typically use something close to least recently used (LRU)
(hardware usually implements “not most recently used”)
20
Set Tag Data
0
1
2
3
4
5
6
7
Direct-mapped
Set Tag Data
0
1
2
3
Set Tag Data
0
1
2-way set associative
4-way set associative
CMPT 295
L15: Caches II
Peer Instruction Question
We have a cache of size 2 KiB with block size of 128 B. If our cache has 2 sets, what is its associativity?
2
4
8
16
We’re lost…
If addresses are 16 bits wide, how wide is the Tag field?
21
CMPT 295
L15: Caches II
C = 2^11 B
K = 2^7 B
2^4 = 16 blocks
16 / 2 = 8 E
k = 7; s = 1; t = 16 – 7 – 1 = 8
/docProps/thumbnail.jpeg