18. Block Size and Writes
EECS 370 – Introduction to Computer Organization – Fall 2020
Satish Narayanasamy
EECS Department, University of Michigan, Ann Arbor
© Narayanasamy 2020
The material in this presentation may not be copied in any form without written permission.
Announcements
Upcoming deadlines:
- HW4 due Nov 10th
- Project 3 due Nov 12th
Midterm regrade issues are being worked out and should be resolved this week.
Recap: Memory Hierarchy and Cache: The Basics
Memory Technology Design Space
[Figure: memory and storage technologies plotted by access time (seconds) vs. die cost ($/GB): FLASH (storage), DRAM and SRAM (memory, the focus of 370), emerging new memory technologies, and an ideal point. Derived from a keynote by Martin Fink, NVMW 2017.]
Recap
Memory hierarchy: Leveraging locality of reference
- Cache (SRAM): fast, expensive, small. Temporarily move what you use here.
- Main memory (DRAM)
- Disk (magnetic or floating gate): slow, cheap, big. Have a copy of everything here.
For a program with good locality of reference, memory appears as fast as cache and as big as disk.
Recap
Memory in a Modern System
The ISA memory abstraction is maintained by manual/compiler register spilling, automatic HW cache management, and automatic demand paging.
- Register file: 32 words, sub-nsec
- L1 cache: ~32 KB, ~nsec
- L2 cache: 512 KB – 1 MB, many nsec
- L3 cache, ...
- Main memory (DRAM): GBs, ~100 nsec
- Swap disk: 100s of GB, ~10 msec (~10,000,000 nsec)
Recap
Intel Optane
Intel Optane web page
Recap
Cache: Importance
Caches consume most of a processor's die area.
[Die photo: four cores (CORE 0 through CORE 3), per-core L2 caches, a shared L3 cache, and the DRAM memory controller, DRAM interface, and DRAM banks.]
Review: A simple cache architecture
[Figure: a tag array (valid bit plus tag, e.g., Addr-3, Addr-5, Addr-6, Addr-7) paired with a data array of cache blocks (data-3, data-5, data-6, data-7); each tag/block pair is a cache line.]
A cache memory consists of multiple tag/block pairs (called cache lines). An address is searched across all tags in parallel; at most one tag will match.
If there is a tag match, it is a cache HIT.
If there is no tag match, it is a cache MISS.
Cache data that is likely to be used in the future (exploit temporal locality).
Recap
Temporal Locality
The principle of temporal locality in program references says that if you access a memory location (e.g., 0x1000), you are more likely to re-access that location than to reference some other random location.
Temporal locality says any missed location should be placed into the cache, since it is the most recently referenced location.
Temporal locality also says that the data in the least recently used (LRU) cache line should be evicted to make room for the new line: because the re-access probability falls the longer a cache line goes unreferenced, the LRU line is the one least likely to be re-referenced.
Recap
Tracking LRU
Naïve method (see the sketch below):
- Maintain an LRU_rank per cache line.
- Set the LRU_rank of the newly accessed block to 0; increment the others by 1.
- Replace the cache line with the highest LRU_rank.
- Area overhead: log2(number of cache lines) bits per cache line.
LRU with the least area overhead:
- The number of possible orderings of N cache lines is N!, so just log2(N!) bits suffice for all N cache lines.
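As a concrete sketch of the naïve method, here is a minimal illustrative Python version (my own code, not the course's; the class name is made up):

```python
class LRURanks:
    """Naive LRU tracking: one rank per cache line (0 = most recently used)."""

    def __init__(self, num_lines):
        self.rank = list(range(num_lines))  # start in an arbitrary full order

    def touch(self, line):
        """Called on every access to `line`: promote it to rank 0."""
        old = self.rank[line]
        for i in range(len(self.rank)):
            if self.rank[i] < old:   # lines more recent than `line`...
                self.rank[i] += 1    # ...age by one
        self.rank[line] = 0

    def victim(self):
        """Line to replace: the one with the highest rank (least recent)."""
        return max(range(len(self.rank)), key=lambda i: self.rank[i])
```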
Problem — AMAT
Assume main memory access time does not include cache access time.
Suppose that accessing the cache takes 10 ns while accessing main memory on a cache miss takes 100 ns. What is the average memory access time if the cache hit rate is 97%?
To improve performance, the cache size is increased. It is determined that this will increase the hit rate by 1%, but it will also increase the time for accessing the cache by 2 ns. Will this improve the overall average memory access time?
Problem — AMAT
Suppose that accessing the cache takes 10 ns while accessing main memory on a cache miss takes 100 ns. What is the average memory access time if the cache hit rate is 97%?
AMAT = 10 + (1 - 0.97) * 100 = 13 ns
To improve performance, the cache size is increased. It is determined that this will increase the hit rate by 1%, but it will also increase the time for accessing the cache by 2 ns. Will this improve the overall average memory access time?
AMAT = 12 + (1 - 0.98) * 100 = 14 ns
No: the larger cache makes AMAT worse (14 ns > 13 ns).
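A two-line check of this arithmetic (a minimal sketch; `amat` is just an illustrative helper, not course-provided code):

```python
def amat(cache_ns, mem_ns, hit_rate):
    # Miss penalty is on top of the cache access (per the problem statement).
    return cache_ns + (1 - hit_rate) * mem_ns

print(amat(10, 100, 0.97))  # ~13 ns: original cache
print(amat(12, 100, 0.98))  # ~14 ns: the bigger-but-slower cache loses
```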
This lecture
- Cache blocks
- Spatial locality
- Problems
- Stores: write-through vs. write-back
How can we reduce the overhead for each cache line?
[Figure: cache lines with lru, tag, and data fields; two valid lines holding tags 1 and 7 with data 110 and 170.]
Cache bigger units than bytes: each block has a single tag, and blocks can be whatever size we choose.
Increasing cache block size reduces (tag and other) overhead
Memory: 16 bytes, addresses 0-15, holding 74, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250.
Case 1: block size = 1 byte.
- Each cache line holds V, tag, and a 1-byte block (e.g., tag 0 -> 74; tag 6 -> 160).
- Bits per tag = log2(number of blocks in memory) = log2(16) = 4 bits.
Case 2: block size = 2 bytes.
- Each cache line holds V, tag, and a 2-byte block (e.g., tag 0 -> {74, 110}; tag 3, the block containing addresses 6-7 -> {160, 170}).
- Bits per tag = log2(number of blocks in memory) = log2(8) = 3 bits.
Block size: Idea
If you increase the cache block size:
- It increases the data per cache line.
- It increases the data per tag and reduces the tag size.
Block size for caches
Setup: a cache with 2 cache lines, a 3-bit tag field, and 2-byte blocks. Memory holds 100, 110, 120, ..., 250 at addresses 0-15. The processor executes:
Ld R1←M[1]; Ld R2←M[5]; Ld R3←M[1]; Ld R3←M[4]; Ld R2←M[0]
Trace (each miss brings in the whole 2-byte block, so the tag is the block number, address / 2):
- Ld R1←M[1] (addr 0001): miss. Load block 0 = {100, 110} into one line with tag 0; R1 = 110. Misses: 1, Hits: 0.
- Ld R2←M[5] (addr 0101): miss. Load block 2 = {140, 150} into the other line with tag 2; R2 = 150. Misses: 2, Hits: 0.
- Ld R3←M[1]: hit (tag 0 matches); R3 = 110. Misses: 2, Hits: 1.
- Ld R3←M[4]: hit (tag 2 matches); R3 = 140. Misses: 2, Hits: 2.
- Ld R2←M[0]: hit (tag 0 matches); R2 = 100. Misses: 2, Hits: 3.
Final score: 2 misses, 3 hits.
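The whole trace can be replayed with a small simulator. Below is a minimal sketch (my own illustrative code, not the course's simulator) of a fully associative cache with a configurable block size and LRU replacement; it reports the same 2 misses and 3 hits for this access sequence:

```python
from collections import OrderedDict

def simulate(addresses, num_lines, block_size):
    """Fully associative cache with LRU replacement.

    The OrderedDict keeps lines in recency order (most recent last).
    Returns (misses, hits) for the given byte-address stream.
    """
    lines = OrderedDict()            # tag -> True (data values don't matter here)
    misses = hits = 0
    for addr in addresses:
        tag = addr // block_size     # block number serves as the tag
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)   # mark as most recently used
        else:
            misses += 1
            if len(lines) == num_lines:
                lines.popitem(last=False)  # evict the LRU line
            lines[tag] = True
    return misses, hits

# The walkthrough above: 2 lines, 2-byte blocks -> (2, 3)
print(simulate([1, 5, 1, 4, 0], num_lines=2, block_size=2))
```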
Spatial Locality
When we accessed address 1, we also brought in address 0. This turned out to be a good thing, since we later referenced address 0 and found it in the cache.
Spatial locality in a program says that if we reference a memory location (e.g., 1000), we are more likely to reference a location near it (e.g., 999 or 1001) than some random location.
Storing arrays in memory: Row major vs. column major
Row major is a common choice; a sketch of the layout follows below.
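To make the layout concrete, here is a small illustrative sketch (assumptions mine: a square matrix stored as a flat list, the way C stores 2-D arrays) showing how row-major traversal touches consecutive addresses, which is exactly what a cache block exploits:

```python
N = 4
a = list(range(N * N))  # flat row-major storage of an N x N matrix

def addr(row, col):
    """Row-major: elements of one row sit at consecutive addresses."""
    return row * N + col

# Row-major traversal: addresses 0, 1, 2, 3, ... (great spatial locality)
row_order = [addr(r, c) for r in range(N) for c in range(N)]

# Column-major traversal: addresses 0, 4, 8, 12, 1, ... (strides of N)
col_order = [addr(r, c) for c in range(N) for r in range(N)]

print(row_order[:6])  # [0, 1, 2, 3, 4, 5]
print(col_order[:6])  # [0, 4, 8, 12, 1, 5]
```

Feeding these two address streams into the `simulate` sketch above shows far fewer misses for the row-major traversal.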
Spatial Locality
Observation: applications access data near to what they just accessed.
Importance of Cache on Performance
Cache-aware code can be several times faster than non-aware code.
Live demo: see the L1_3_370_Course_Overview video at minute 18:00.
Basic Cache organization
Decide on the block size. How? Simulate lots of different block sizes and see which one gives the best performance.
Common cache block sizes: 32, 64, or 128 bytes.
Larger block sizes reduce cache area overhead by:
- Reducing the number of cache lines (and therefore the number of tags and other metadata)
- Reducing each tag's size
An address is split into a tag and a block offset:
- block_size = 2^(number of block-offset bits), i.e., block_offset_size = log2(block_size)
- tag_size = address_size - block_offset_size
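A minimal sketch of that address split (illustrative helper name; assumes power-of-two block sizes):

```python
def split_address(addr, address_bits, block_size):
    """Split a byte address into (tag, block_offset)."""
    offset_bits = block_size.bit_length() - 1    # log2(block_size)
    tag = addr >> offset_bits                    # remaining high bits
    offset = addr & (block_size - 1)             # low bits within the block
    assert tag < (1 << (address_bits - offset_bits))
    return tag, offset

print(split_address(0b0101, address_bits=4, block_size=2))  # (2, 1): tag 2, offset 1
```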
Practice Problem: compute the register and cache state after executing the instruction sequence
Cache: 2 cache lines, 2-bit tag field, 4-byte blocks (V, tag, data per line).
Memory (addresses 0-15): 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225.
Instructions:
Ld R0←M[3]; Ld R2←M[12]; Ld R3←M[15]; Ld R1←M[4]; Ld R2←M[9]
Solution to Practice Problem
- Ld R0←M[3]: miss. Load block 0 (tag 0) = {78, 29, 120, 123}; R0 = 123.
- Ld R2←M[12]: miss. Load block 3 (tag 3) = {19, 200, 210, 225} into the other line; R2 = 210.
- Ld R3←M[15]: hit (tag 3); R3 = 225.
- Ld R1←M[4]: miss. Evict the LRU line (tag 0) and load block 1 (tag 1) = {71, 150, 162, 173}; R1 = 71.
- Ld R2←M[9]: miss. Evict the LRU line (tag 3) and load block 2 (tag 2) = {18, 21, 33, 28}; R2 = 21.
Final state: 4 misses, 1 hit. Registers: R0 = 123, R1 = 71, R2 = 21, R3 = 225. Cache: one line holds tag 1 {71, 150, 162, 173} (now the LRU line); the other holds tag 2 {18, 21, 33, 28}.
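The hit/miss pattern can be checked with the `simulate` sketch from earlier (same caveat: illustrative code, not the course simulator):

```python
# 2 lines, 4-byte blocks: expect 4 misses and 1 hit.
print(simulate([3, 12, 15, 4, 9], num_lines=2, block_size=4))  # (4, 1)
```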
Class Problem 1
Given a cache with the following configuration: cache size is 8 bytes, block size is 2 bytes, fully associative, LRU replacement. The memory address size is 16 bits and is byte addressable.
1. How many bits are in each tag? How many blocks are in the cache?
2. For the following reference stream, indicate whether each reference is a hit or miss: 0, 1, 3, 5, 12, 1, 2, 9, 4
3. What is the hit rate?
4. How many bits are needed for storage overhead for each block?
Class Problem 1
Given a cache with the following configuration: cache size is 8 bytes, block size is 2 bytes, fully associative, LRU replacement. The memory address size is 16 bits and is byte addressable. Assume 2 bits per cache line to implement LRU.
1. Tag = 16 - 1 (block offset) = 15 bits; 2-byte blocks and 8 bytes total = 4 blocks.
2. Reference stream 0, 1, 3, 5, 12, 1, 2, 9, 4: M, H, M, M, M, H, H, M, M
3. Hit rate: 3/9 = 33%
4. Overhead = 15 (tag) + 1 (V) + 2 (LRU) = 18 bits per block.
We are assuming 2 bits per cache line to store the LRU rank. More efficient encodings of LRU exist: log2(#cache_lines!) = log2(4!) = 5 bits for the whole cache.
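Both answers can be checked mechanically (illustrative one-liners; `simulate` is the sketch from earlier):

```python
import math

# 4 lines, 2-byte blocks: expect (6 misses, 3 hits) = M,H,M,M,M,H,H,M,M.
print(simulate([0, 1, 3, 5, 12, 1, 2, 9, 4], num_lines=4, block_size=2))

# Encoding all 4! = 24 LRU orderings needs ceil(log2(24)) = 5 bits.
print(math.ceil(math.log2(math.factorial(4))))  # 5
```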
Class Problem 2—Storage overhead
Consider the following cache:
- 32-bit byte-addressable ISA
- Cache size: 64 KB (kilobytes) = 64 * 8 kilobits = 512 kilobits
- Cache block size: 64 bytes
- Write-allocate, write-back, fully associative
Recall: 1 kilobyte = 1024 bytes (NOT 1000 bytes!)
What is the cache area overhead (tags, valid, dirty, LRU)?
Class Problem 2—Storage overhead
Same cache as above: 32-bit byte-addressable ISA, 64 KB cache, 64-byte blocks, write-allocate, write-back, fully associative.
What is the cache area overhead (tags, valid, dirty, LRU)? Assume log2(#blocks) bits for LRU.
- Tag = 32 (address) - 6 (block offset) = 26 bits
- #blocks = 64 KB / 64 B = 1024
- LRU bits = log2(#blocks) = 10 bits
- Overhead per block = 26 (tag) + 1 (V) + 1 (D) + 10 (LRU) = 38 bits
- Total overhead for the cache = 38 bits * 1024 blocks = 38,912 bits
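The same arithmetic as a reusable sketch (the helper name and the per-line LRU assumption are mine):

```python
import math

def cache_overhead_bits(addr_bits, cache_bytes, block_bytes, valid=1, dirty=1):
    """Total metadata bits for a fully associative write-back cache.

    Assumes log2(#blocks) LRU bits per line, as in the problem.
    """
    num_blocks = cache_bytes // block_bytes
    offset_bits = int(math.log2(block_bytes))
    tag_bits = addr_bits - offset_bits
    lru_bits = int(math.log2(num_blocks))
    per_block = tag_bits + valid + dirty + lru_bits
    return per_block * num_blocks

print(cache_overhead_bits(32, 64 * 1024, 64))  # 38912
```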
Handling Stores:
write-through vs write-back caches
What about stores?
Where should you write the result of a store to an address X?
If address X is in the cache:
- Write to the cache.
- Should we also write to memory? Yes: write-through policy. No: write-back policy (write to memory only when the modified cache block is evicted).
If address X is not in the cache, should we allocate X in the cache?
- Yes: allocate-on-write policy.
- No: write directly to memory (no allocate-on-write policy).
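Here is a minimal sketch of how these policy choices could plug into a store path (the one-block cache class and flag names are my own illustration, not the course's design):

```python
class TinyCache:
    """A one-block 'cache', just enough to make the store policies concrete."""
    def __init__(self, block_size=2):
        self.block_size = block_size
        self.tag = None        # which block is resident (None = empty)
        self.data = {}         # addr -> value for the resident block
        self.dirty = False

    def contains(self, addr):
        return self.tag == addr // self.block_size

def store(cache, memory, addr, value, write_through, write_allocate):
    if cache.contains(addr):
        cache.data[addr] = value          # update the cached copy
        if write_through:
            memory[addr] = value          # write-through: memory updated now
        else:
            cache.dirty = True            # write-back: defer until eviction
    elif write_allocate:
        # Bring the block in, then retry the store. (A real cache would first
        # evict the resident block; see the eviction sketch later.)
        base = (addr // cache.block_size) * cache.block_size
        cache.tag = addr // cache.block_size
        cache.data = {a: memory[a] for a in range(base, base + cache.block_size)}
        store(cache, memory, addr, value, write_through, write_allocate)
    else:
        memory[addr] = value              # no-allocate: bypass the cache

mem = {a: 100 + 10 * a for a in range(16)}
c = TinyCache()
store(c, mem, 5, 999, write_through=False, write_allocate=True)
print(c.data, c.dirty, mem[5])  # {4: 140, 5: 999} True 150 (memory untouched)
```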
Handling stores (write-through)
Setup: the same 2-line cache with 2-byte blocks and a 3-bit tag; memory holds 78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225 at addresses 0-15. The processor executes:
Ld R1←M[1]; Ld R2←M[7]; St R2→M[0]; St R1→M[5]; Ld R2←M[10]
Trace:
- Ld R1←M[1] (REF 1): miss. Load block 0 = {78, 29}; R1 = 29. Misses: 1, Hits: 0.
- Ld R2←M[7] (REF 2): miss. Load block 3 = {162, 173}; R2 = 173. Misses: 2, Hits: 0.
- St R2→M[0] (REF 3): hit (block 0 is cached). Write 173 into the cached copy of M[0] and, because this is write-through, also into memory. Misses: 2, Hits: 1.
- St R1→M[5] (REF 4): miss (block 2 is not cached). With write-allocate, evict the LRU line (block 3) and load block 2 = {71, 150}; then write 29 into the cached copy of M[5] and into memory. Misses: 3, Hits: 1.
- Ld R2←M[10] (REF 5): miss. Evict the LRU line (block 0, nothing to write back) and load block 5 = {33, 28}; R2 = 33. Misses: 4, Hits: 1.
Final score: 4 misses, 1 hit. Memory now holds M[0] = 173 and M[5] = 29.
How many reads/writes to main memory in a write-through cache?
- Each cache miss reads a block (2 bytes in our example) from main memory. Total reads from main memory: 8 bytes.
- Each store writes a byte to main memory. Total writes to main memory: 2 bytes.
But real programs have cache miss rates well under 20%, so the total main-memory reads due to cache misses are far fewer than the main-memory writes in a write-through cache.
Write-through vs. write-back
Can we design the cache to NOT write all stores to memory immediately?
Yes: we can keep the most recent copy in the cache and update memory only when that data is evicted from the cache (a write-back policy).
Do we need to write back all evicted cache blocks? No, only blocks that have been stored into. Keep a "dirty bit" per line:
- reset when the line is allocated
- set when the block is stored into
If a block is dirty when evicted, write its data back into memory (as sketched below).
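Continuing the earlier `TinyCache` sketch (still my illustrative code, not the course's), write-back eviction might look like:

```python
def evict(cache, memory):
    """On eviction, write the victim block back only if it was stored into."""
    if cache.dirty:
        memory.update(cache.data)  # one block-sized write-back
    cache.tag, cache.data, cache.dirty = None, {}, False
```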
Handling stores (write-back)
Same setup and instruction sequence as before, but each cache line now also has a dirty bit (V, d, tag, data).
Trace:
- Ld R1←M[1] (REF 1): miss. Load block 0 = {78, 29}, dirty = 0; R1 = 29. Misses: 1, Hits: 0.
- Ld R2←M[7] (REF 2): miss. Load block 3 = {162, 173}, dirty = 0; R2 = 173. Misses: 2, Hits: 0.
- St R2→M[0] (REF 3): hit. Write 173 only into the cached copy of M[0] and set that line's dirty bit; memory still holds 78 at address 0. Misses: 2, Hits: 1.
- St R1→M[5] (REF 4): miss. Evict the LRU line (block 3, clean, so nothing to write back) and load block 2 = {71, 150}; write 29 into the cached copy of M[5] and set its dirty bit; memory still holds 150 at address 5. Misses: 3, Hits: 1.
- Ld R2←M[10] (REF 5): miss. Evict the LRU line (block 0), which is dirty, so first write {173, 29} back to memory addresses 0-1; then load block 5 = {33, 28}; R2 = 33. Misses: 4, Hits: 1.
Final score: 4 misses, 1 hit. Block 2 ({71, 29}) is still dirty in the cache; it is written back whenever it is eventually evicted.
How many reads/writes to main memory in a write-back cache?
- Each cache miss reads a block (2 bytes in our example) from main memory. Total reads from main memory: 8 bytes.
- Dirty blocks are written back a block at a time: 2 bytes when block 0 is evicted, plus 2 more when block 2 eventually is. Total writes to main memory: 4 bytes.
For this example, would you choose write-back or write-through?
Summary: Questions to ask about a cache
Cache design (so far):
- How many bytes of data storage?
- What is the block size? (spatial locality)
- How many lines? (= cache size / cache block size)
- What is the replacement policy? (LRU? LFU? FIFO? Random?) (temporal locality)
Performance:
- What is the hit rate?
- What is the latency of an access?
Area overhead:
- How much overhead storage? (tags, valid, dirty, LRU)
Interactive tools
Interactive tool: caches
URL: Cache interactive simulator
Interactive tool: pipeline
URL: Pipeline interactive simulator