程序代写代做代考 cache May 11

May 11

Caches III
https://what-if.xkcd.com/111/

CMPT 295
L16: Caches III

Making memory accesses fast!
Cache basics
Principle of locality
Memory hierarchies
Cache organization
Direct-mapped (sets; index + tag)
Associativity (ways)
Replacement policy
Handling writes
Program optimizations that consider caches
2

CMPT 295
L16: Caches III
Divide addresses into “index” and “tag”

General Cache Organization (, , )
3

= blocks/lines per set
= # sets
=
set
“line” (block plus
management bits)

0
1
2
K-1

Tag
V
valid bit

= bytes per block
Cache size:
data bytes
(doesn’t include V or Tag)

CMPT 295
L16: Caches III
“Layers” of a cache:
Block (data)
Line (data + management bits)
Set (many lines based on associativity)
Cache (many sets based on cache size & associativity)

Valid bit lets us know if this line has been initialized (“is valid”).

Notation Review
We just introduced a lot of new variable names!
Please be mindful of block size notation when you look at past exam questions or are watching videos
4
Variable This Quarter Formulas
Block size ( in book)

Cache size
Associativity
Number of Sets
Address space
Address width
Tag field width
Index field width
Offset field width ( in book)

CMPT 295
L16: Caches III
Review this on your own
DON’T MEMORIZE FORUMLAS

Example Cache Parameters Problem
4 KiB address space, 125 cycles to go to memory.
Fill in the following table:

5
Cache Size 256 B
Block Size 32 B
Associativity 2-way
Hit Time 3 cycles
Miss Rate 20%
Tag Bits
Index Bits
Offset Bits
AMAT

CMPT 295
L16: Caches III
Tag bits = 5
Index bits = 2
Offset bits = 5
AMAT = 3 + 0.2(125) = 28 cycles

Cache Read
6

0
1
2
K-1

tag
v

bits
bits
bits
Address of byte in memory:

tag
set
index
block
offset
data begins at this offset
Locate set
Check if any line in set
is valid and has
matching tag: hit
Locate data starting
at offset
valid bit
= # sets
=
= blocks/lines per set
= bytes per block

CMPT 295
L16: Caches III

Example: Direct-Mapped Cache ( = 1)
7

Direct-mapped: One line per set
Block Size = 8 B
bits
0…01
100
Address of int:

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4
find set
= sets

CMPT 295
L16: Caches III

Example: Direct-Mapped Cache ( = 1)
8
bits
0…01
100
Address of int:

0
1
2
7
tag
v
3
6
5
4
match?: yes = hit
valid? +
block offset
Direct-mapped: One line per set
Block Size = 8 B

CMPT 295
L16: Caches III

Example: Direct-Mapped Cache ( = 1)
9
bits
0…01
100
Address of int:

0
1
2
7
tag
v
3
6
5
4
match?: yes = hit
valid? +

int (4 B) is here
block offset
No match? Then old line gets evicted and replaced
This is why we want alignment!
Direct-mapped: One line per set
Block Size = 8 B

CMPT 295
L16: Caches III

Example: Set-Associative Cache ( = 2)
10
bits
0…01
100
Address of short int:
find set

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4
2-way: Two lines per set
Block Size = 8 B

CMPT 295
L16: Caches III
1 fewer s bit, 2x fewer sets

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4
Example: Set-Associative Cache ( = 2)
11
bits
0…01
100
compare both
valid? +
match: yes = hit
block offset
tag
2-way: Two lines per set
Block Size = 8 B
Address of short int:

CMPT 295
L16: Caches III

0
1
2
7
tag
v
3
6
5
4

0
1
2
7
tag
v
3
6
5
4
Example: Set-Associative Cache ( = 2)
12
bits
0…01
100
valid? +
match: yes = hit
block offset

short int (2 B) is here
No match?
One line in set is selected for eviction and replacement
Replacement policies: random, least recently used (LRU), …
compare both
Address of short int:
2-way: Two lines per set
Block Size = 8 B

CMPT 295
L16: Caches III

Example Code Analysis Problem
Assuming the cache starts cold (all blocks invalid) and sum is stored in a register, calculate the miss rate:
= 12 bits, = 256 B, = 32 B, = 2
#define SIZE 8
long ar[SIZE][SIZE], sum = 0; // &ar=0x800
for (int i = 0; i < SIZE; i++) for (int j = 0; j < SIZE; j++) sum += ar[i][j]; 13 CMPT 295 L16: Caches III k = 5 bits 256 / 32 = 8 blocks 8 / 2 = 4 sets s = 2 bits t = 12 – 5 – 2 = 5 bits 0x800 = 0b 1000 0000 0000 8 bytes per elem 4 longs fit in a block ¼ miss rate (missed are compulsory) What about writes? Multiple copies of data exist: L1, L2, possibly L3, main memory What to do on a write-hit? Write-through: write immediately to next level Write-back: defer write to next level until line is evicted (replaced) Must track which cache lines have been modified (“dirty bit”) What to do on a write-miss? Write-allocate: (“fetch on write”) load into cache, update line in cache Good if more writes or reads to the location follow No-write-allocate: (“write around”) just write immediately to memory Typical caches: Write-back + Write-allocate, usually Write-through + No-write-allocate, occasionally 14 CMPT 295 L16: Caches III Why? Reuse is common Typically you would read, then write (incr) Or after you initialize a value (say, to 0), likely read and write it again soon Write-back, write-allocate example 15 0xBEEF Cache Memory G 0xCAFE 0xBEEF 0 F G dirty bit tag (there is only one set in this tiny cache, so the tag is the entire block address!) In this example we are sort of ignoring block offsets. Here a block holds 2 bytes (16 bits, 4 hex digits). Normally a block would be much bigger and thus there would be multiple items per block. While only one item in that block would be written at a time, the entire line would be brought into cache. Contents of memory stored at address G CMPT 295 L16: Caches III Valid bit not shown Tiny cache, 1 set, tag is G Dirty bit because write back Write-back, write-allocate example 16 0xBEEF Cache Memory G 0xCAFE 0xBEEF 0 F G STORE 0xFACE, F dirty bit CMPT 295 L16: Caches III Miss, need to maybe write back But not dirty, so I don’t have to Pull in F 0xBEEF U 0 Write-back, write-allocate example 17 0xCAFE Cache Memory F 0xCAFE 0xBEEF F G dirty bit 0xCAFE 0 Step 1: Bring F into cache STORE 0xFACE, F CMPT 295 L16: Caches III Pull in F, then write 0xBEEF U 0 Write-back, write-allocate example 18 0xCAFE Cache Memory F 0xCAFE 0xBEEF F G dirty bit 0xFACE 1 Step 2: Write 0xFACE to cache only and set dirty bit STORE 0xFACE, F CMPT 295 L16: Caches III Write, make dirty (write-back) 0xBEEF U 0 Write-back, write-allocate example 19 0xCAFE Cache Memory F 0xCAFE 0xBEEF F G mov 0xFEED, F dirty bit 0xFACE 1 Write hit! Write 0xFEED to cache only STORE 0xFACE, F CMPT 295 L16: Caches III Another write hit, don’t need to go to mem 0xBEEF U 0 Write-back, write-allocate example 20 0xCAFE Cache Memory F 0xCAFE 0xBEEF F G LOAD G, %t1 dirty bit 0xFEED 1 STORE 0xFEED, F STORE 0xFACE, F CMPT 295 L16: Caches III Read miss, and my line is dirty! Write back Write-back, write-allocate example 21 0xBEEF Cache Memory G 0xFEED 0xBEEF 0 F G dirty bit 1. Write F back to memory since it is dirty 2. Bring G into the cache so we can copy it into %t1 LOAD G, %t1 STORE 0xFEED, F STORE 0xFACE, F CMPT 295 L16: Caches III /docProps/thumbnail.jpeg