
19. Caches: Direct Mapped
EECS 370 – Introduction to Computer Organization – Fall 2020
Satish Narayanasamy
EECS Department
University of Michigan, Ann Arbor
© Narayanasamy 2020
The material in this presentation cannot be copied in any form without written permission

Announcements
Upcoming deadlines:
• HW4 due Nov. 10th
• Project 3 due Nov. 12th

Recap: Cache Blocks and Write policy

Review: Cache Organization
[Figure: contents of a cache line for a 16-bit ISA — valid (V) and dirty (D) bits, LRU state, tag, and the data block (bytes); the block offset portion of the address selects a byte within the block.]
Cache blocks:
• Capture spatial locality (increase cache hit rate)
• Reduce tag overhead (number and size of tags)
• Need not store the block offset in the cache line
• The byte to be read/written is determined directly from the address

Review: How to find the tag from the address?
[Figure: worked example — a small fully-associative cache servicing the sequence Ld R1←M[ 1 ], Ld R2←M[ 5 ], Ld R3←M[ 1 ], Ld R3←M[ 4 ], Ld R2←M[ 0 ] against a 16-byte memory; the address 0101 is split into tag and block offset; the state shown has Misses: 2, Hits: 0.]

Review: Writes
Write-allocate vs. no-write-allocate caches
Policy that decides what to do on a cache miss for a store instruction:
• Write-allocate: first bring the block from memory into the cache, then write to it
• No-write-allocate: do not bring the block into the cache; write directly to memory only

Review: Writes
Write-through vs. write-back caches
Policy that decides when to write to the cache vs. memory vs. both:
• Write-through: write to both the cache and memory
• Write-back: write only to the cache; keep track of dirty cache lines and write a dirty line to memory when it is evicted

Review: Writes

Store w/ No-Allocate | Write-Back                                 | Write-Through
Hit?                 | Write to cache                             | Write to cache + memory
Miss?                | Write to memory                            | Write to memory
Replace block?       | If evicted block is dirty, write to memory | Do nothing

Store w/ Allocate    | Write-Back                                 | Write-Through
Hit?                 | Write to cache                             | Write to cache + memory
Miss?                | Read from memory to cache, allocate to     | Read from memory to cache, allocate to
                     | LRU block, write to cache                  | LRU block, write to cache + memory
Replace block?       | If evicted block is dirty, write to memory | Do nothing
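The policy table above can be sketched as a small decision function. This is a hypothetical Python sketch; the function name and action strings are mine, not from the lecture.

```python
# Hypothetical sketch of the store-policy table: given the policy pair and
# whether the store hits, list the actions the cache takes.
def store_actions(hit, allocate, write_back, evicted_dirty=False):
    actions = []
    if not hit:
        if not allocate:
            return ["write to memory"]          # no-write-allocate miss
        # write-allocate miss: bring the block into the cache first
        if evicted_dirty and write_back:
            actions.append("write evicted dirty block to memory")
        actions.append("read block from memory into cache (LRU line)")
    actions.append("write to cache")
    if not write_back:                          # write-through
        actions.append("write to memory")
    return actions

print(store_actions(hit=False, allocate=True, write_back=True))
```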

Direct Mapped Caches

Fully-associative caches
A block can go to any location.
[Figure: a 16-byte memory partitioned into two-byte blocks and mapped into a fully-associative cache; each cache line stores a tag and a data block.]
Address: 3-bit tag | 1-bit block offset

Fully-associative caches
We designed a fully-associative cache:
• A memory location can be copied to any cache line.
• We check every cache tag to determine whether the data is in the cache.
This approach can be too slow:
• Parallel tag searches are expensive in hardware and can be slow. Why?
EECS 370: Introduction to Computer Organization

Direct mapped caches
We can redesign the cache to eliminate the need for parallel tag lookups:
• Direct-mapped caches partition memory into as many regions as there are cache lines
• Each memory region maps to exactly one cache line in which its data can be placed
• You then only need to check a single tag: the one associated with the region containing the reference
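The address split this relies on can be sketched as follows; field widths follow the lecture's 4-bit running example (1 offset bit, 2 index bits, 1 tag bit), and the function name is mine.

```python
# Sketch of the direct-mapped address split: low bits select the byte within
# the block, middle bits select the cache line, the rest form the tag.
def split_address(addr, offset_bits=1, index_bits=2):
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(split_address(0b0101))  # (0, 2, 1): tag 0, line index 2, byte 1
```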

Mapping memory to cache (Direct-mapped)
A block can go to only one location.
[Figure: a 16-byte memory partitioned into two-byte blocks; each block maps to exactly one of the four cache lines.]
Address: 1-bit tag | 2-bit line index | 1-bit block offset

Direct-mapped cache: Placement & Access
[Figure: lookup datapath — the address is split into tag, line index, and block offset; the line index selects one entry in the tag array and the data array; the stored tag is compared (=?) with the address tag and, together with the valid bit (V), produces the Hit? signal; the block offset drives a MUX that selects the requested data from the block.]

Direct mapped caches
Two blocks in memory that map to the same cache index cannot be present in the cache at the same time (a conflict).
One index => one entry.
This can lead to a 0% hit rate if more than one block mapping to the same index is accessed in an interleaved manner:
• Assume addresses A and B have the same index bits but different tag bits
• Access pattern: A, B, A, B, A, B, A, B, …
• All accesses are conflict misses
EECS 370: Introduction to Computer Organization
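The A, B, A, B pattern can be sketched concretely: in the running 4-line, 2-byte-block cache, addresses 2 (0010) and 10 (1010) share index 01 but differ in tag, so each access evicts the other's block. The variable names here are mine.

```python
# Sketch of the conflict-miss pattern: alternating accesses to two addresses
# that map to the same line of a direct-mapped cache all miss.
BLOCK, LINES = 2, 4
cached_tag = {}                      # line index -> tag currently resident
misses = 0
for addr in [2, 10] * 4:             # A, B, A, B, A, B, A, B
    index = (addr // BLOCK) % LINES
    tag = addr // (BLOCK * LINES)
    if cached_tag.get(index) != tag:
        misses += 1                  # conflict miss: the other block is evicted
        cached_tag[index] = tag
print(misses)  # 8 accesses, 8 misses: 0% hit rate
```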

Direct-mapped cache: worked example
Reference stream (4-bit addresses, 2-byte blocks, 4 cache lines, write-allocate / write-back):
Ld R1←M[ 1 ], Ld R2←M[ 5 ], St R2→M[ 2 ], St R1→M[ 7 ], Ld R2←M[ 4 ]
[Figure sequence: per-reference cache and memory state, one slide per step.]
• REF 1, Ld R1←M[ 1 ] (0001: tag 0, index 00): miss; load block {0,1}
• REF 2, Ld R2←M[ 5 ] (0101: tag 0, index 10): miss; load block {4,5}
• REF 3, St R2→M[ 2 ] (0010: tag 0, index 01): miss; allocate block {2,3}, write to cache, mark dirty
• REF 4, St R1→M[ 7 ] (0111: tag 0, index 11): miss; allocate block {6,7}, write to cache, mark dirty
• REF 5, Ld R2←M[ 4 ] (0100: tag 0, index 10): hit; block {4,5} is already cached
Final tally: Misses: 4, Hits: 1
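The worked example above can be replayed with a small simulator. This is a hypothetical sketch of the slide's cache (4-bit addresses, 2-byte blocks, 4 lines, write-allocate / write-back); the class and attribute names are mine.

```python
# Hypothetical sketch of the worked example's direct-mapped cache.
class DirectMappedCache:
    def __init__(self, num_lines=4, block_size=2):
        self.num_lines, self.block_size = num_lines, block_size
        self.lines = [{"valid": False, "dirty": False, "tag": None}
                      for _ in range(num_lines)]
        self.misses = self.hits = 0

    def access(self, addr, is_store=False):
        index = (addr // self.block_size) % self.num_lines
        tag = addr // (self.block_size * self.num_lines)
        line = self.lines[index]
        if line["valid"] and line["tag"] == tag:
            self.hits += 1
        else:
            self.misses += 1                 # allocate on loads and stores
            line.update(valid=True, dirty=False, tag=tag)
        if is_store:
            line["dirty"] = True             # write-back: mark the line dirty

cache = DirectMappedCache()
for addr, store in [(1, False), (5, False), (2, True), (7, True), (4, False)]:
    cache.access(addr, store)
print(cache.misses, cache.hits)  # 4 1
```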

Class Problem – What is the state of the cache after executing the following instruction sequence?
Direct-mapped, write-allocate, write-back:
Ld R1←M[ 3 ], Ld R2←M[ 12 ], St R2→M[ 15 ], St R1→M[ 4 ], Ld R2←M[ 13 ]
Initial memory (addresses 0–15): 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225
[Figure sequence: per-reference cache and memory state, one slide per step.]
• Ld R1←M[ 3 ] (0011: tag 0, index 01): miss; load block {2,3}; R1 = 123
• Ld R2←M[ 12 ] (1100: tag 1, index 10): miss; load block {12,13}; R2 = 19
• St R2→M[ 15 ] (1111: tag 1, index 11): miss; allocate block {14,15}; write 19 to cache, mark dirty
• St R1→M[ 4 ] (0100: tag 0, index 10): miss; evict clean block {12,13}; allocate {4,5}; write 123 to cache, mark dirty
• Ld R2←M[ 13 ] (1101: tag 1, index 10): miss; evict dirty block {4,5} and write M[4] = 123 back to memory; load {12,13}; R2 = 200
Final tally: 5 misses, 0 hits; M[4] in memory is now 123; the dirty copy of M[15] = 19 remains in the cache.
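The class problem can be checked with a sketch that also models data and write-backs. The initial memory values are taken from the slide; all other names are mine, and this is an illustrative sketch, not the lecture's own code.

```python
# Hypothetical sketch of the class problem: 4-bit addresses, 2-byte blocks,
# 4 lines, direct-mapped, write-allocate / write-back.
BLOCK, LINES = 2, 4
memory = [78, 29, 120, 123, 71, 150, 162, 173,
          18, 21, 33, 28, 19, 200, 210, 225]
cache = {}                      # line index -> {"tag", "dirty", "data"}
misses = hits = 0
regs = {}

def access(addr, store_val=None):
    global misses, hits
    tag, index = addr // (BLOCK * LINES), (addr // BLOCK) % LINES
    line = cache.get(index)
    if line and line["tag"] == tag:
        hits += 1
    else:
        misses += 1
        if line and line["dirty"]:            # write back the evicted block
            for a, v in line["data"].items():
                memory[a] = v
        base = addr - addr % BLOCK
        line = {"tag": tag, "dirty": False,
                "data": {a: memory[a] for a in (base, base + 1)}}
        cache[index] = line
    if store_val is None:
        return line["data"][addr]             # load
    line["data"][addr] = store_val            # store: dirty the line
    line["dirty"] = True

regs["R1"] = access(3)          # miss: load block {2,3}
regs["R2"] = access(12)         # miss: load block {12,13}
access(15, regs["R2"])          # miss: allocate {14,15}, write 19
access(4, regs["R1"])           # miss: evict clean {12,13}, write 123
regs["R2"] = access(13)         # miss: evict dirty {4,5}, write back M[4]
print(misses, hits, memory[4])  # 5 0 123
```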

Class Problem
How many tag bits are required for:
• 32-bit address, byte-addressed, direct-mapped 32 KB cache, 128-byte block size, write-back
What is the overhead of this cache?

Class Problem
How many tag bits are required for:
• 32-bit address, byte-addressed, direct-mapped 32 KB cache, 128-byte block size, write-back
# Bytes in block = 128 => block offset = 7 bits (*byte addressable*)
# Lines = 32 KB / 128 B = 256 => line index = 8 bits
Tag bits = 32 - 7 - 8 = 17 bits
What is the overhead of this cache?
17 bits (tag) + 1 bit (valid) + 1 bit (dirty) = 19 bits / line
19 bits / line × 256 lines = 4864 bits
4864 bits / 32 KB = 1.9% overhead
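The arithmetic above can be written out as a short sketch (variable names are mine):

```python
# Sketch of the tag-bit and overhead arithmetic for the class problem:
# 32-bit byte addresses, 32 KB direct-mapped cache, 128-byte blocks.
addr_bits, cache_bytes, block_bytes = 32, 32 * 1024, 128
offset_bits = (block_bytes - 1).bit_length()        # log2(128) = 7
lines = cache_bytes // block_bytes                  # 256
index_bits = (lines - 1).bit_length()               # log2(256) = 8
tag_bits = addr_bits - offset_bits - index_bits     # 17
overhead_bits = (tag_bits + 1 + 1) * lines          # +valid +dirty per line
pct = 100 * overhead_bits / (cache_bytes * 8)       # vs. data bits stored
print(tag_bits, overhead_bits, round(pct, 1))  # 17 4864 1.9
```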

What about a cache for instructions?
Instructions should be cached as well. We have two choices:
1. Treat instruction fetches as normal data and allocate cache lines when fetched
2. Create a second cache (called the instruction cache, or ICache) which caches instructions only
How do you know which cache to use? What are the advantages of a separate ICache?

Integrating Caches into a Pipeline
How are caches integrated into a pipelined implementation?
• Replace the instruction memory with an Icache
• Replace the data memory with a Dcache
Issues:
• Memory accesses now have variable latency
• Both caches may miss at the same time

LC2K Pipeline with Caches
[Figure: the LC2K pipeline (IF/ID, ID/EX, EX/Mem, Mem/WB) with the instruction memory replaced by an ICache and the data memory replaced by a Dcache; the PC, register file, ALU, and forwarding MUXes are unchanged.]

Summary: Direct-mapped caches
[Figure: a 16-byte memory mapped into a 4-line direct-mapped cache, each line holding a tag and a data block.]
Address: 1-bit tag | 2-bit line index | 1-bit block offset

Next lecture: Get the advantage of both…
Set-associative caches:
• Partition memory into regions, like direct mapped but with fewer partitions
• Associate each region with a set of cache lines
• Check the tags of all lines in a set to determine a hit
• Treat each line in a set like a small fully-associative cache
• An LRU (or LRU-like) replacement policy is generally used

Set-associative cache
[Figure: a 16-byte memory mapped into a two-way set-associative cache with two sets (Set 0, Set 1) and two ways (Way 0, Way 1); each line holds a tag and a data block.]
Address: 2-bit tag | 1-bit set index | 1-bit block offset