May 11
Agenda
AMAT
Multilevel Caches
Improving Cache Performance
Anatomy of a Cache Question
Example Cache Questions
Great Idea #3: Principle of Locality / Memory Hierarchy
Measuring Memory Accesses
In a machine without caches, the time it takes to access memory is easy to measure! If going to memory takes 1000 cycles, every access takes 1000 cycles
What happens when we introduce caches?
Every access first checks the cache
Some hit (are found) and so we don’t need to check memory at all!
Some miss and have to go to memory
We can represent this with an “average” memory access time
Average Memory Access Time
Average Memory Access Time (AMAT): average time to access memory considering both hits and misses
AMAT = Hit time + Miss rate × Miss penalty
(abbreviated AMAT = HT + MR × MP)
Hit time: time to check whether the data we’re looking for is in the cache (also called the “tag check”)
Miss rate: percentage of accesses of a program that are not in the cache when we look for them
Miss penalty: cost incurred by going to memory
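As a sanity check on the formula, here is a minimal C sketch (the function name and example numbers are illustrative, not from the slides):

    #include <stdio.h>

    /* AMAT = HT + MR * MP, all times in cycles. */
    double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        /* Illustrative numbers: 1-cycle hit, 5% miss rate, 100-cycle penalty. */
        printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0));  /* 6.0 */
        return 0;
    }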
AMAT Goals!
Goal 1: Examine how changing the different cache parameters affects our AMAT
Hit time, miss rate, miss penalty
Goal 2: Examine how to optimize your code for better cache performance (Project 4)
Cache blocking, access patterns, etc.
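As a preview of cache blocking, here is a hedged sketch (N, BLOCK, and the transpose itself are illustrative choices, not Project 4 code): instead of one long traversal that evicts data before reusing it, work on tiles small enough to stay resident in the cache.

    #define N 1024
    #define BLOCK 32   /* tile chosen so tiles of src and dst fit in the cache */

    /* Naive transpose: dst is written column-wise (stride N doubles),
     * so the writes miss frequently. */
    void transpose_naive(double dst[N][N], double src[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                dst[j][i] = src[i][j];
    }

    /* Blocked transpose: each BLOCK x BLOCK tile is reused while cached. */
    void transpose_blocked(double dst[N][N], double src[N][N]) {
        for (int ii = 0; ii < N; ii += BLOCK)
            for (int jj = 0; jj < N; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int j = jj; j < jj + BLOCK; j++)
                        dst[j][i] = src[i][j];
    }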
Hit Time
Time taken to search the cache for the address/information you want!
Contributing factors:
How many comparisons do you have to do?
Cache size decreases → hit time decreases
Generally, as associativity increases, so does hit time
Why? Increasing associativity decreases the number of sets without changing the number of blocks, so we have to compare against more tags per set for the same data.
Miss Rate
The fraction of accesses your program makes which do not hit in the cache; the fraction of accesses which require you to go to memory
Contributing factors:
Access pattern in your program
Goal: apply f(x) to all values in an array
Good: apply f(x) to one block of the array at a time
Bad: pick elements at random, or jump between blocks (see the sketch below)
Block/cache size, type
Larger blocks, larger caches, more associativity can hold more data, keep data around longer
Reducing misses (Three C’s!)
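To make the good/bad patterns concrete: both functions below compute the same sum over an N×N array (N is an illustrative size), but the first walks memory sequentially so every word of a fetched block gets used, while the second strides a full row between consecutive accesses.

    #define N 4096

    double sum_row_major(double a[N][N]) {   /* good: sequential, block-friendly */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    double sum_col_major(double a[N][N]) {   /* bad: stride of N doubles per access */
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }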
Sources of Cache Misses: The 3Cs
Compulsory: (Many names: cold start, process migration (switching processes), 1st reference)
First access to a block is impossible to avoid;
Effect is small for long-running programs
Capacity:
Cache cannot contain all blocks accessed by the program; even full associativity would not avoid these misses
Conflict: (collision)
Multiple memory locations mapped to the same cache location, so there’s a lack of associativity
The 3Cs: Design Solutions
Compulsory:
Increase block size (increases MP; too large blocks could increase MR)
Capacity:
Increase cache size (may increase HT)
Conflict:
Increase associativity (to fully associative) (may increase HT)
Which type of miss is it?
[Flowchart, reconstructed as steps:]
Have you pulled this data into your cache before?
  No. → STOP! This is a compulsory miss.
  Yes. → Have you seen more than B* unique blocks since then? (B* = # blocks in cache)
    Yes. → STOP! This is a capacity miss.
    No. → STOP! This is a conflict miss.
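The flowchart translates almost directly into code. A minimal sketch (the MAX_BLOCKS bound and the bookkeeping arrays are illustrative assumptions, not a real cache simulator):

    #include <stdbool.h>

    #define MAX_BLOCKS 1024  /* illustrative bound on distinct block numbers */

    typedef enum { COMPULSORY, CAPACITY, CONFLICT } miss_type;

    static bool ever_seen[MAX_BLOCKS];  /* block ever brought into the cache? */
    static long last_ref[MAX_BLOCKS];   /* logical time of most recent reference */
    static long now = 0;

    /* Distinct blocks referenced since block b was last touched. */
    static int unique_blocks_since(int b) {
        int count = 0;
        for (int i = 0; i < MAX_BLOCKS; i++)
            if (i != b && ever_seen[i] && last_ref[i] > last_ref[b])
                count++;
        return count;
    }

    /* Classify a miss on block b per the flowchart (B_star = # blocks in cache). */
    miss_type classify_miss(int b, int B_star) {
        miss_type t;
        if (!ever_seen[b])
            t = COMPULSORY;  /* first reference: unavoidable */
        else if (unique_blocks_since(b) > B_star)
            t = CAPACITY;    /* even a fully associative cache would have evicted b */
        else
            t = CONFLICT;    /* evicted only because of limited associativity */
        ever_seen[b] = true;
        last_ref[b] = ++now;
        return t;
    }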
Miss Penalty
“Cost” (usually in cycles, or fractions of a second) of a miss
Often, this is going to memory, but sometimes it is the cost of going to another cache!
Contributing factors:
How “big” is your memory hierarchy?
More steps == more penalty for things that miss at every level, but fewer accesses incur the full penalty
Smaller block size → lower MP
Agenda
AMAT
Multilevel Caches
Improving Cache Performance
Anatomy of a Cache Question
Example Cache Questions
Bonus: Contemporary Cache Specs
Minimising AMAT
To make AMAT small, we want to have a small hit time, low miss rate, and/or small miss penalty. It is hard to do all these at once!
Direct mapped caches have small hit time
Fully-Associative caches have a low miss rate
Small blocks help lower the miss penalty, as do more opportunities to store data along the way
Can we combine all these things into one structure to achieve our goal?
Multiple Cache Levels
With advancing technology, there is more room on chip for bigger L1 caches and for an L2 (and in some cases even an L3) cache
Higher numbered caches are lower-level (closer to memory)
Multilevel caching is a way to reduce miss penalty; picking the right kinds of caches can help reduce other AMAT factors
So what does this look like?
Multilevel Cache Diagram
[Diagram: CPU → L1$ → L2$ → … → Main Memory. A memory access checks L1$ first; each miss passes the request down to the next level, and a hit returns the data back along the same path to the CPU, storing it in each cache on the way. Write misses propagate the same way, with the stores into the caches occurring only if write allocate is used.]
The stores back into higher-level caches depend on your write-miss policy (write allocate or no-write allocate).
Design Considerations
L1$ focuses on low hit time (fast access)
minimize HT to achieve shorter clock cycle
L1 MP significantly reduced by presence of L2$, so can be smaller/faster even with higher MR
e.g. smaller $ (fewer rows)
L2$, L3$ focus on low miss rate
Avoid going to main memory as much as possible (heavy penalty)
e.g. larger $ with larger block sizes (same # rows)
Multilevel Cache AMAT
AMAT = L1 HT + L1 MR × L1 MP
Now L1 MP depends on other cache levels
Must trace ENTIRE path to memory
L1 MP = L2 HT + L2 MR × L2 MP
If more levels, then continue this chain
(i.e. MP_i = HT_{i+1} + MR_{i+1} × MP_{i+1})
Final MP is main memory access time
For two levels:
AMAT = L1 HT + L1 MR × (L2 HT + L2 MR × L2 MP)
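Mechanically, this chain is a fold from main memory back toward L1. A minimal sketch in C (the function and array layout are illustrative assumptions; the numbers are the 1/10/100-cycle, 4%/50% figures from the example two slides ahead):

    #include <stdio.h>

    /* ht[i], mr[i]: hit time and local miss rate of cache level i+1 (0-indexed);
     * mem: main memory access time. Folds MP_i = HT_{i+1} + MR_{i+1} * MP_{i+1}
     * from the last level back to L1. */
    double multilevel_amat(const double ht[], const double mr[], int n, double mem) {
        double mp = mem;                   /* miss penalty below the last cache */
        for (int i = n - 1; i >= 1; i--)
            mp = ht[i] + mr[i] * mp;
        return ht[0] + mr[0] * mp;
    }

    int main(void) {
        /* L1: HT 1 cycle, MR 4%; L2: HT 10 cycles, local MR 50%; MM: 100 cycles */
        double ht[] = {1.0, 10.0};
        double mr[] = {0.04, 0.5};
        printf("AMAT = %.2f cycles\n", multilevel_amat(ht, mr, 2, 100.0));  /* 3.40 */
        return 0;
    }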
Local vs. Global Miss Rates
Local miss rate: Fraction of references to one level of a cache that miss
e.g. L2$ local MR = L2$ misses/L1$ misses
Specific to level of caching (as used in AMAT)
Global miss rate: Fraction of all references that miss in all levels of a multilevel cache
Property of the overall memory hierarchy
Global MR is the product of all local MRs: Global MR = L1 MR × L2 MR × … × Ln MR
Each level’s accesses are the previous level’s misses, so the product telescopes to Ln misses / total accesses
So by definition, global MR ≤ any local MR
L1$ misses are the only requests that reach L2$
Global Miss Rates
We may also refer to the global miss rate of a particular level of cache
For example, the L2 Global MR
This means the fraction of total accesses that miss in both L1 and L2
As a result, we can talk about global miss rates without necessarily involving every level of cache
Local and Global Miss Rates
For every 1000 CPU-to-memory references
(access times: L1$ 1 cycle, L2$ 10 cycles, MM 100 cycles)
40 will miss in L1$; local MR = 40/1000 = 0.04
20 will miss in L2$; local MR = 20/40 = 0.5
Overall global miss rate = 20/1000 = 0.02
Rewriting Performance
For a two level cache, we know:
MR_global = L1 MR × L2 MR
AMAT:
AMAT = L1 HT + L1 MR × (L2 HT + L2 MR × L2 MP)
AMAT = L1 HT + L1 MR × L2 HT + MR_global × L2 MP
Aside: Sometimes might have to convert between global and local MR
L2 Global MR = L2 Local MR × L1 MR
L2 Local MR = L2 Global MR ÷ L1 MR
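A quick numeric check that the two AMAT forms above agree, using the figures from the 1000-reference example (a sketch; variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        double l1_mr = 0.04, l2_local_mr = 0.5;
        double l2_global_mr = l2_local_mr * l1_mr;   /* = 0.02 */
        double l1_ht = 1.0, l2_ht = 10.0, l2_mp = 100.0;

        /* Substituted form vs. expanded form using the global miss rate */
        double a = l1_ht + l1_mr * (l2_ht + l2_local_mr * l2_mp);
        double b = l1_ht + l1_mr * l2_ht + l2_global_mr * l2_mp;
        printf("%.2f cycles == %.2f cycles\n", a, b);   /* both 3.40 */
        return 0;
    }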
AMAT and CPI are calculated differently.
AMAT is a direct substitution, where L1 MP = L2 HT + L2 MR × L2 MP.
CPI_stall counts memory-stall cycles, so it is additive (HT is hidden away in CPI_base).
Multilevel Cache Practice (1/3)
Processor specs:
L1$ and L2$
5 cycle L1$ hit time and 4% L1$ miss rate
100 cycle penalty to go to main memory
0.5% L2$ global miss rate
25 cycle penalty to go to L2$
What is AMAT?
Multilevel Cache Practice (2/3)
We are given the L2$ global miss rate, but the AMAT formula needs the local MR:
L2 Local MR = L2 Global MR ÷ L1 MR = 0.5% ÷ 4% = 0.125
Multilevel Cache Practice (3/3)
Without L2$:
AMAT = (5 + 0.04×100)
= 9 cycles
With L2$:
AMAT = HTL1$ + MRL1$ × (HTL2$ + MRL2$ × MPL2$)
= 5 + .04 × ( 25 + .125 × 100)
= 6.5 cycles
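The same arithmetic as a self-checking C sketch (variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        double l1_ht = 5.0, l1_mr = 0.04;
        double l2_ht = 25.0, l2_global_mr = 0.005, mem = 100.0;
        double l2_local_mr = l2_global_mr / l1_mr;                    /* = 0.125 */

        double without_l2 = l1_ht + l1_mr * mem;                      /* 9.0 cycles */
        double with_l2 = l1_ht + l1_mr * (l2_ht + l2_local_mr * mem); /* 6.5 cycles */
        printf("without L2$: %.1f  with L2$: %.1f\n", without_l2, with_l2);
        return 0;
    }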
Agenda
AMAT
Multilevel Caches
Improving Cache Performance
Anatomy of a Cache Question
Example Cache Questions
Bonus: Contemporary Cache Specs
The Cache Design Space
Several interacting dimensions
Cache parameters:
Cache size, Block size, Associativity
Policy choices:
Write-through vs. write-back
Replacement policy
Optimal choice is a compromise
Depends on access characteristics
Workload and use (I$, D$)
Depends on technology / cost
Simplicity often wins
[Figure: the cache design space sketched as a surface over axes such as cache size, block size, and associativity; along each factor, performance ranges from Bad to Good as the factor varies from Less to More.]
No fancy replacement policy is needed for a direct-mapped cache.
As a matter of fact, that is what causes direct-mapped caches trouble to begin with: each block has only one place to go in the cache, which causes conflict misses.
Effect of Block and Cache Sizes on Miss Rate
[Figure: miss rate vs. block size, one curve per cache size.]
Miss rate goes up if the block size becomes a significant fraction of the cache size because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses)
Increasing the block size usually decreases the miss rate.
A more serious problem is that the miss penalty goes up, since it is primarily determined by the time to fetch the block from the next lower level of the hierarchy and load it into the cache.
Benefits of Set-Associative Caches
Consider cost of a miss vs. cost of implementation
Largest gains are in going from direct mapped to 2-way
(20%+ reduction in miss rate)
As cache sizes grow, the relative improvement from associativity increases only slightly; since the overall miss rate of a larger cache is lower, the opportunity for improving the miss rate decreases and the absolute improvement in miss rate from associativity shrinks significantly.