代写 C html parallel Many of the following slides are taken with permission from

Many of the following slides are taken with permission from
Complete Powerpoint Lecture Notes for
Computer Systems: A Programmer’s Perspective (CS:APP)
Randal E. Bryant and David R. O’Hallaron
http://csapp.cs.cmu.edu/public/lectures.html
The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.
Cache Memory and Performance Cache Organization 1
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
– Hold frequently accessed blocks of main memory
CPU looks first for data in caches (e.g., L1, L2, and L3), then in main memory. Typical system structure:
CPU chip
Register file
Cache memories
ALU
Bus interface
System bus Memory bus
I/O bridge
Main memory
Cache Memories Cache Organization 2
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

E = 2e lines (blocks) per set
0 1 2e-1
set
line (block)
S = 2s sets
0 1 2 3
2s-1
0
1
2
B-1
valid bit
C = S x E x B data bytes
B = 2b bytes per cache block (the data)
Cache size:
v
tag
General Cache Organization (S, E, B) Cache Organization 3
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

The “geometry” of the cache is defined by:
S = 2s E = 2e B = 2b
the number of sets in the cache
the number of lines (blocks) in a set the number of bytes in a line (block)
These values define a related way to think about the organization of DRAM: DRAM consists of a sequence of blocks of B bytes.
The bytes in a block (line) can be indexed by using b bits. DRAM consists of a sequence of groups of S blocks (lines).
The blocks (lines) in a group can be indexed by using s bits.
Each group contains SxB bytes, which can be indexed by using s + b bits.
Cache Defines View of DRAM Cache Organization 4
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Cache size:
C = S x E x B = 64 data bytes
E = 21 blocks (lines) per set
01 0
DRAM
0 1 2 3 4 5
7 … 252 253 254 255
..
S = 23 sets
1
2
3
4
5 66 7
0
1
2
3
v
tag
valid bit
B = 22 bytes per cache block (the data)
Cache (8, 2, 4) and 256-Byte DRAM Cache Organization 5
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Assume a cache has the following geometry:
address DRAM
00000000 00000001 00000010 00000011 00000100 00000101 00000110 00000111 …….. 00011100 00011101 00011110 00011111 ……..
S = 22 = 8 E = 21 = 2 B = 22 = 4
the number of sets in the cache
the number of lines (blocks) in a set the number of bytes in a line (block)
Suppose that DRAM consists of 256 bytes, so we have 8-bit addresses.
Then DRAM consists of:
– 64 blocks, each holding 4 bytes – 8groups,eachholding8blocks
group


Example of Cache View of DRAM Cache Organization 6
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
block 0 block 1 block 7

Pick an address:
011 group
01110101
101 01 block byte
address DRAM
…….. 01100000 …….. 01110000 01110001 01110010 01110011 01110100 01110101 01110110 01110111 01111000 01111001 01111010 01111011 ……..
……..
byte 00
a1a0 give the byte number
and the offset of the byte in the block.
……..
byte 00
byte 01
byte 10
byte 11
byte 00
group
byte 01
……..
byte 10
byte 11
byte 00
byte 01
byte 10
byte 11
Example of Cache View of DRAM Cache Organization 7
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
block 000 block 100 block 101 block 110

Pick an address:
011 group
01110101
101 01 block byte
address DRAM
…….. 01100000 …….. 01110000 01110001 01110010 01110011 01110100 01110101 01110110 01110111 01111000 01111001 01111010 01111011 ……..
……..
byte 00
……..
byte 00
a4a3a2 give the block number
byte 01
byte 10
byte 11
byte 00
group
byte 01
a4a3a200* equals the offset of the block in the group.
* a4a3a200 = a4a3a2 x 22
= a4a3a2 x (size of a block)
……..
byte 10
byte 11
byte 00
byte 01
byte 10
byte 11
Example of Cache View of DRAM Cache Organization 8
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
block 000 block 100 block 101 block 110

Pick an address:
011 group
01110101
101 01 block byte
address DRAM
…….. 01100000 …….. 01110000 01110001 01110010 01110011 01110100 01110101 01110110 01110111 01111000 01111001 01111010 01111011 ……..
……..
byte 00
byte 00
byte 01
byte 10
a7a6a5 give the group number a7a6a500000* equals the offset of the
group
byte 11
byte 00
group in the DRAM.
00000000 01100000
……..
* Why?
……..
byte 01
byte 10
byte 11
byte 00
byte 01
byte 10
byte 11
Example of Cache View of DRAM Cache Organization 9
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
block 000 block 100 block 101 block 110

011 101 01
DRAM
00000000
00100000
01000000
01100000
10000000
10100000
11000000
11100000
Group 011 in DRAM
01100000
01100100
01101000
01101100
01110000
01110100
01111000
01111100
Block 101 in Group 011
01110100 01110101 01110110 01110111
byte 00
byte 01
byte 10
byte 11
The BIG Picture Cache Organization 10
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Pick an address: 01110101
How does this address map into the cache?
The DRAM block number determines the cache set used to store the block.
Note this means that two DRAM blocks from the same DRAM group cannot map into the same cache set.
So the address: 01110101 maps to set 101 in the cache.
Each set in our cache can hold 2 blocks.
This block could be stored at either location within the corresponding set.
address DRAM
…….. 01110000 01110001 01110010 01110011 01110100 01110101 01110110 01110111 01111000 01111001 01111010 01111011 ……..
……..
byte 00
byte 01
byte 10
byte 11
byte 00
byte 01
byte 10
byte 11
byte 00
byte 01
byte 10
byte 11
……..
Example of Cache View of DRAM Cache Organization 11
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
block 100 block 101 block 110

DRAM block containing address: 01110101 Maps to cache set: 01110101
address DRAM
…….. 01110000 01110001 01110010 01110011 01110100 01110101 01110110 01110111 01111000 01111001 01111010 01111011 ……..
……..
Cache
byte 00
0 1 2 3 4 5 6 7
byte 01
byte 10
byte 11
byte 00
byte 01
byte 10
byte 11
or
valid tag
byte 00
byte 01
byte 10
0
1
2
3
1
011
byte 11
……..
Example of Cache View of DRAM Cache Organization 12
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain
block 100 block 101 block 110

So, to generalize, suppose a cache has:
S = 2s E = 2e B = 2b
sets
blocks (lines) per set bytes per block (line)
And, suppose that DRAM uses N-bit addresses. then for any address:
aN-1 … as+b
as+b-1 … ab
ab-1 … a0
Bits ab-1:a0 give the byte index within the block Bits ab+s-1:ab give the set index
Bits aN-1:as+b become the tag for the data
Note that these bits are only the same for blocks that are within the same DRAM group.
Cache View of DRAM Cache Organization 13
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

1. Locate set
2. Check if any line in set has matching tag
3. Yes + line valid: hit
4. Locate data starting at offset
Address of word:
Set
1
1
Line0 1 2e-1 0
t bits
s bits = K
b bits = J
tag
set block index offset
K 2s-1
K
2
data begins at this offset
3
4
0
1
J
2b-1
v
tag
Cache Read Cache Organization 14
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Direct mapped: One line per set Assume: cache block size 8 bytes
Address of int:
find set
0
1
2
3
4
5
6
7
v
tag
t bits
0…01
100
0
1
2
3
4
5
6
7
S = 2s sets
v
tag
0
1
2
3
4
5
6
7
v
tag
0
1
2
3
4
5
6
7
v
tag
Example: Direct Mapped Cache (E = 1) Cache Organization 15
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Direct mapped: One line per set Assume: cache block size 8 bytes
valid? + matching tagshit
Address of int:
v
tag
0
1
2
3
4
5
6
7
t bits
0…01
100
block offset
Example: Direct Mapped Cache (E = 1) Cache Organization 16
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

Direct mapped: One line per set Assume: cache block size 8 bytes
valid? + matching tagshit
Address of int:
v
tag
0
1
2
3
4
5
6
7
t bits
0…01
100
int (4 Bytes) is here
No match: old line (block) is evicted and replaced by requested block from DRAM
block offset
Example: Direct Mapped Cache (E = 1) Cache Organization 17
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

t=1 s=2 b=1
M=16 byte addresses, B=2 bytes/block, S=4 sets, E=1 Blocks/set
Address trace (reads, one byte per read):
0 [00002], miss
1 [00012], hit
7 [01112], miss
8 [10002],
0 [00002]
x
xx
x
miss miss
v Tag
Block
01
010?
M[80?-19]
1
0
M[6-7]
Set 0 Set 1 Set 2 Set 3
Direct-Mapped Cache Simulation Cache Organization 18
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

E = 2: Two lines per set
Assume: cache block size 8 bytes
4
5
6
7
0
0
1
1
2
2
3
3
4
4
Address of short int:
5
5
t bits
6
6
7
7
0…01
100
v
tag
0
1
2
3
v
tag
find set
v
tag
0
1
2
3
4
5
6
7
v
tag
v
tag
0
1
2
3
4
5
6
7
v
tag
0
1
2
3
4
5
6
7
v
tag
0
1
2
3
4
5
6
7
v
tag
0
1
2
3
4
5
6
7
E-way Set Associative Cache (Here: E = 2)
Cache Organization 19
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

E = 2: Two lines per set Assume: cache block size 8 bytes
valid? +
Address of short int:
compare both
matching tagshit
t bits
0…01
100
v
tag
0
1
2
3
4
5
6
7
v
tag
0
1
2
3
4
5
6
7
block offset
E-way Set Associative Cache (Here: E = 2)
Cache Organization 20
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

E = 2: Two lines per set Assume: cache block size 8 bytes
Address of short int:
valid? +
compare both matching tagshit
t bits
0…01
100
v
tag
0
1
2
3
4
5
6
7
v
tag
0
1
2
3
4
5
6
7
No match:
short int (2 Bytes) is here
• One line in set is selected for eviction and replacement
• Replacement policies: random, least recently used (LRU), …
block offset
E-way Set Associative Cache (Here: E = 2)
Cache Organization 21
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

t=2 s=1 b=1
M=16 byte addresses, B=2 bytes/block, S=2 sets, E=2 blocks/set
Address trace (reads, one byte per read):
xx
x
x
0 1 7 8 0
v Tag
[00002], [00012], [01112], [10002], [00002]
Block
miss hit miss miss hit
01
0?0
M[0?-1]
0 1
10
M[8-9]
Set 0 Set 1
0 1
01
M[6-7]
0
2-Way Set Associative Cache Simulation
Cache Organization 22
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

The “geometry” of the cache is defined by:
S = 2s E = 2e B = 2b
E = 1 (e = 0) S >1
E = K > 1
S = 1 (only one set) E = # of cache blocks
the number of sets in the cache
the number of lines (blocks) in a set the number of bytes in a line (block)
direct-mapped cache
only one possible location in cache for each DRAM block K-way associative cache
K possible locations (in same cache set) for each DRAM block
fully-associative cache
each DRAM block can be at any location in the cache
Cache Organization Types Cache Organization 23
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain

If we have an associative cache (K-way or fully), how do we determine if a given DRAM block occurs within a set?
Compare the tag we’re trying to match to all of the tags for blocks in the relevant set at the same time!
Then factor in the valid bits, also in parallel.
And employ a MUX
Searching a Set Cache Organization 24
CS@VT Computer Organization II ©2005-2015 CS:APP & McQuain