
CO101
Principle of Computer

Organization

Lecture 16: Memory 2
Liang Yanyan

澳門科技大學

Macau University of Science and Technology

Computer Organization
• The CPU clock rate is much faster than the memory speed.
• Slow memory can dramatically reduce overall performance.

[Figure: the processor (control and datapath) connected to input and output devices and to the memory hierarchy: cache, main memory, and secondary memory (disk).]

A Typical Memory Hierarchy
• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.

[Figure: a typical memory hierarchy. On-chip components: register file, instruction cache with ITLB, and data cache with DTLB; off-chip: a second-level cache (SRAM), main memory (DRAM), and secondary memory (disk).]

Speed (cycles): ½'s → 1's → 10's → 100's → 10,000's
Size (bytes): 100's → 10K's → M's → G's → T's
Cost per byte: highest → lowest

Important issues
• Where can data be placed in the cache when we copy it from main memory?
• We move data from memory into the cache to exploit temporal and spatial locality.

• How do we locate data in the cache?
• This is closely related to the first issue.

• If the cache is full, how do we choose an existing block to replace?


Processor reads data from memory
• Assume the memory address is M bits wide.
• Search the cache:

• Use the lower K bits of the memory address as an index to locate the corresponding cache entry.
• Check whether the tag equals the upper (M−K) bits of the memory address.
• Check whether the valid bit is 1.
• If all of these are true → cache hit; return the data from the cache to the processor.

[Figure: a direct-mapped cache with 1024 entries (index 0–1023). The CPU's 32-bit address is split into a 22-bit tag and a 10-bit index; the index selects one entry, the stored tag is compared (=) against the upper address bits, and together with the valid bit this produces the Hit signal and the data returned to the CPU.]
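As a software analogy, this lookup can be sketched in C. This is a minimal illustration rather than the hardware: the structure, the names, and the use of a 32-bit word address with K = 10 are assumptions matching the figure (one word per block, so no byte offset is modeled).

```c
#include <stdint.h>
#include <stdbool.h>

#define K 10                      /* index bits */
#define ENTRIES (1u << K)         /* 1024 cache entries, as in the figure */

/* One direct-mapped cache entry: valid bit, tag, one word of data. */
struct entry {
    bool     valid;
    uint32_t tag;                 /* upper (M - K) bits of the address */
    uint32_t data;
};

static struct entry cache[ENTRIES];

/* Look up a word address; returns true on a hit and stores the word in *out. */
bool cache_read(uint32_t addr, uint32_t *out)
{
    uint32_t index = addr & (ENTRIES - 1);  /* lower K bits select the entry */
    uint32_t tag   = addr >> K;             /* upper (M - K) bits to compare */

    if (cache[index].valid && cache[index].tag == tag) {
        *out = cache[index].data;           /* hit: return data to the CPU */
        return true;
    }
    return false;                           /* miss: must fetch from memory */
}
```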

Cache Example
• 8 blocks, 1 word/block, direct mapped
• Initial state


Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N

Cache Example


Word addr Binary addr Hit/miss Cache block
22 10 110 Miss 110

Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

Cache Example


Index V Tag Data
000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

Word addr Binary addr Hit/miss Cache block
26 11 010 Miss 010

Cache Example


Index V Tag Data
000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

Word addr Binary addr Hit/miss Cache block
22 10 110 Hit 110
26 11 010 Hit 010

Cache Example


Index V Tag Data
000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N

Word addr Binary addr Hit/miss Cache block
16 10 000 Miss 000
3 00 011 Miss 011
16 10 000 Hit 000

Cache Example


Index V Tag Data
000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N

Word addr Binary addr Hit/miss Cache block
18 10 010 Miss 010

Cache Example


Index V Tag Data
000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N

Word addr Binary addr Hit/miss Cache block
16 10 000 Hit 000

Example:
• 4-bit memory address
[Figure: a small direct-mapped cache with four 2-byte blocks (index 0–3). The 4-bit address is split into a 1-bit tag, a 2-bit index, and a 1-bit block offset; the tag comparison and valid bit determine Hit, and the block offset drives a mux that selects one of the two 8-bit bytes in the block.]

Example:

[Figure: the same 4-entry cache, now filled:

Index  Valid  Tag  Byte 0  Byte 1
00     1      0    0xCA    0xFE
01     1      1    0xDE    0xAD
10     1      0    0xBE    0xEF
11     1      1    0xFE    0xED]

For the addresses below, what byte is read from the cache (or is there a miss)?
• 1010
• 1110
• 0001
• 1101

A larger case
• Here is a cache with 1,024 blocks of 4 bytes each, and 32-bit memory addresses.
• Tag = 20 bits, Index = 10 bits, Block offset = 2 bits.
[Figure: the 10-bit index selects one of the 1024 entries (0–1023), the stored tag is compared against the upper 20 address bits, and the 2-bit block offset drives a mux that selects one of the four 8-bit bytes in the block.]

Cache Field Sizes
• The number of bits in a cache includes both the storage for data and for the tags.
• For a 32-bit byte address:
• For a direct-mapped cache with 2^n blocks, n bits are used for the index.
• For a block size of 2^m words (2^(m+2) bytes), m bits are used to address the word within the block and 2 bits are used to address the byte within the word.

• What is the size of the tag field?
• 32 − (n + m + 2)

• The total number of bits in a direct-mapped cache is then
• 2^n × (block size + tag field size + valid field size)
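As a quick worked example (matching the cache on the next slide): with n = 10 and one-word blocks (m = 0), the tag field is 32 − (10 + 0 + 2) = 20 bits, and the total size is 2^10 × (32 + 20 + 1) = 1024 × 53 = 54,272 bits, i.e. about 6.6KB of storage for a cache that holds 4KB of data.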


MIPS Direct Mapped Cache Example
• One word blocks, cache size = 1K words (or 4KB)

[Figure: a direct-mapped MIPS cache with 1024 one-word entries. Address bits 31–12 form the 20-bit tag, bits 11–2 the 10-bit index, and bits 1–0 the byte offset; on a hit, the selected 32-bit data word is returned.]

Multiword Block Direct Mapped Cache
• Four words/block, cache size = 1K words

[Figure: a direct-mapped cache with 256 four-word blocks (index 0–255). Address bits 31–12 form the 20-bit tag, bits 11–4 the 8-bit index, bits 3–2 the block offset selecting one of the four 32-bit words, and bits 1–0 the byte offset.]

Miss Rate vs Block Size vs Cache Size

• Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same-size cache becomes smaller (increasing capacity misses).


Disadvantages of direct-mapped cache
• The direct-mapped cache is easy: indices and offsets can be computed with bit operations or simple arithmetic, because each memory address maps to exactly one block.

• However, this isn't very flexible. Consider a program that keeps reading two addresses A1, A2, A1, A2, A1, A2, ... where A1 and A2 have the same cache index and block offset. What will happen?


Direct-mapped cache: keep reading 0000, 1000
• Memory block 0000–0001 (bytes A, B) and block 1000–1001 (bytes C, D) both map to cache index 00: the middle two address bits (the index) are identical, and only the tag bit differs.
• Read 0000: miss. Load the block from memory and store a copy (A, B, tag 0) at index 00.
• Read 1000: miss. Load the block from memory; (C, D, tag 1) replaces (A, B) at index 00.
• Read 0000: miss. Load it from memory again; (A, B, tag 0) replaces (C, D).
• Read 1000: miss. Load it from memory again; the two addresses keep evicting each other, so every access misses.

A fully associative cache
• A fully associative cache permits data to be stored in any cache block, instead of forcing each memory address into one particular block.
• When data is brought into the cache, it can be placed in any unused block; there is no cache index field to locate the entry.
• In this way we remove the conflict between two or more memory addresses that map to a single cache block.
• In the previous example, we might put memory address A1 in cache block 1 and address A2 in block 2. Subsequent accesses to A1 and A2 would then all be hits instead of misses.

• If all the blocks are already in use, it's usually best to replace the least recently used (LRU) block, i.e. the one accessed least recently. Another replacement strategy is most recently used (MRU), which replaces the block accessed most recently.


Fully associative cache: keep reading 0000, 1000

• Data can be stored in an arbitrary location in the cache, so there is no cache index field; the tag is the full 3-bit block address.
• Read 0000: miss. Load the block from memory and store a copy (A, B, tag 000) in a free block.
• Read 1000: miss. Load the block from memory and store a copy (C, D, tag 100) in another free block.
• Read 0000: hit. Load it from the cache.
• Read 1000: hit. Load it from the cache.
• Both blocks now coexist in the cache, so the access pattern that thrashed the direct-mapped cache always hits after the first two misses.

Disadvantages of fully associative cache
• Since data can be put in an arbitrary location, there is no cache index field.
• Assume the address is m bits and each block contains 2^k bytes of data.
• The tag is (m − k) bits long.
• To read data, we must scan all blocks in the cache and check whether any tag matches the address being read. → This requires one comparator per block: too many comparators and a huge hardware cost.

[Figure: a fully associative cache. Each entry holds a valid bit, an (m−k)-bit cache tag, and bytes 0 to 2^k−1 of data; the incoming tag is compared against every entry's tag in parallel (one "=" comparator per block), and the byte-select field picks the byte within the hitting block.]
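A sketch of that scan in C, with illustrative sizes (m = 4, k = 1, eight blocks): in software the scan is just a loop, but in hardware the loop body is replicated as one comparator per block, all working in parallel, which is exactly where the cost comes from.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_BLOCKS  8
#define BLOCK_BYTES 2             /* 2^k bytes per block, k = 1 */

struct fa_block {
    bool     valid;
    uint32_t tag;                 /* (m - k) address bits */
    uint8_t  data[BLOCK_BYTES];
};

static struct fa_block cache[NUM_BLOCKS];

/* Fully associative lookup: no index field, so every tag is checked. */
bool fa_read(uint32_t addr, uint8_t *out)
{
    uint32_t tag    = addr / BLOCK_BYTES;   /* upper (m - k) bits */
    uint32_t offset = addr % BLOCK_BYTES;   /* lower k bits: byte select */

    for (int i = 0; i < NUM_BLOCKS; i++) {  /* hardware: NUM_BLOCKS parallel comparators */
        if (cache[i].valid && cache[i].tag == tag) {
            *out = cache[i].data[offset];
            return true;
        }
    }
    return false;
}
```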

Set-associative cache
• A set-associative cache provides an intermediate solution.

• Cache blocks are divided into groups; each group is called a set.
• Each memory address is mapped to exactly one set in the cache, but data can be stored in an arbitrary block within that set.
• If each set has N blocks, the cache is called an N-way set-associative cache.
• Here are several possible organizations of an eight-block cache (listed below).

• 1-way associative: 8 sets, 1 block each.
• 2-way associative: 4 sets, 2 blocks each.
• 4-way associative: 2 sets, 4 blocks each.

Locating a set associative block
• We can determine where a memory address belongs in a set-associative cache in a similar way as before.
• If a cache has 2^s sets and each block holds 2^n bytes, an m-bit memory address is partitioned as follows:

Tag (m − s − n bits) | Index (s bits) | Block offset (n bits)

• Tag: used to check whether the current block is the one we are looking for.
• Index: used to address a set, instead of an individual data block as before.
• Block offset: used to address a byte within the block.

2-way set associative cache implementation
• How does an implementation of a 2-way set-associative cache compare with that of a fully associative cache?

• Only two comparators are needed.
• The cache tags are a little shorter too, since some address bits are used for the index.

[Figure: a 2-way set-associative cache. The k-bit index selects one set containing two blocks (block 0 and block 1); each block's (m−k−n)-bit tag is compared against the address tag, and a 2-to-1 mux selects the 2^n-byte data block from the hitting way.]
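A sketch of the figure's datapath in C, assuming small illustrative sizes (four sets, two ways, 2 bytes per block); the loop over the two ways corresponds to the two "=" comparators and the 2-to-1 mux:

```c
#include <stdint.h>
#include <stdbool.h>

#define SETS        4             /* 2^k sets */
#define WAYS        2
#define BLOCK_BYTES 2             /* 2^n bytes per block */

struct block {
    bool     valid;
    uint32_t tag;                 /* (m - k - n) address bits */
    uint8_t  data[BLOCK_BYTES];
};

static struct block cache[SETS][WAYS];

/* 2-way set-associative lookup: the index selects one set, then only
 * WAYS tags are compared (the two comparators in the figure). */
bool sa_read(uint32_t addr, uint8_t *out)
{
    uint32_t offset = addr % BLOCK_BYTES;           /* block offset: n bits */
    uint32_t index  = (addr / BLOCK_BYTES) % SETS;  /* set index: k bits */
    uint32_t tag    = (addr / BLOCK_BYTES) / SETS;  /* remaining upper bits */

    for (int w = 0; w < WAYS; w++) {
        if (cache[index][w].valid && cache[index][w].tag == tag) {
            *out = cache[index][w].data[offset];    /* the mux picks this way */
            return true;
        }
    }
    return false;
}
```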

Example: a 2-way set associative cache

• The middle two address bits are used as the cache index to locate a set; the first address bit is used as the tag.
• Read 0000: miss. Load the block from memory and store a copy (A, B, tag 0) into set 00; as block 0 is free, we can put it into block 0.
• Read 1000: miss. Load the block from memory and store a copy (C, D, tag 1) into set 00; as block 1 is free, we can put it into block 1.
• Both blocks now live in the same set, so subsequent reads of 0000 and 1000 hit.

Example
• Where would data from memory byte address 6195 be placed in the cache, assuming the eight-block cache designs below, with 16 bytes per block?
• 6195 in binary is 0...0 110000 011 0011 (grouped here as tag / 3-bit set index / 4-bit block offset for the 1-way case).
• Each block has 16 bytes, so the lowest 4 bits (0011) are the block offset.
• For the 1-way cache, the next three bits (011) are the set index.
• For the 2-way cache, the next two bits (11) are the set index.
• For the 4-way cache, the next one bit (1) is the set index.
• The data may go in any block within the selected set (highlighted in the original figure).

• 1-way associative: 8 sets, 1 block each → set 011.
• 2-way associative: 4 sets, 2 blocks each → set 11.
• 4-way associative: 2 sets, 4 blocks each → set 1.

Set associative
• By now you may have noticed that a 1-way set-associative cache is the same as a direct-mapped cache.

• Similarly, if a cache has N blocks, an N-way set-associative cache is the same as a fully associative cache.

• 1-way: 8 sets, 1 block each (direct mapped).
• 2-way: 4 sets, 2 blocks each.
• 4-way: 2 sets, 4 blocks each.
• 8-way: 1 set, 8 blocks (fully associative).

Range of Set Associative Caches
• For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (i.e., the number of ways) and halves the number of sets: it decreases the size of the index by 1 bit and increases the size of the tag by 1 bit.

Address fields: Tag | Index | Block offset | Byte offset
• The index selects the set, the tag is used for the tag compare, and the block offset selects the word in the block.
• Increasing associativity: fewer index bits, larger tags. Fully associative (only one set): the tag is all the bits except the block and byte offsets.
• Decreasing associativity: more index bits, smaller tags. Direct mapped (only one way): smallest tags, only a single comparator.

Spectrum of Associativity
• For a cache with 8 entries.
[Figure: the same eight-block cache drawn at each associativity, from direct mapped (1-way) through 2-way and 4-way to fully associative (8-way).]

Associativity Example
• Compare 4-block caches: direct mapped, 2-way set associative, and fully associative.
• Block access sequence: 0, 8, 0, 6, 8.

• Direct mapped (cache index = block address modulo 4):

Block addr  Cache index  Hit/miss  Cache content after access
                                   [0]     [1]  [2]     [3]
0           0            miss      Mem[0]
8           0            miss      Mem[8]
0           0            miss      Mem[0]
6           2            miss      Mem[0]       Mem[6]
8           0            miss      Mem[8]       Mem[6]

(5 misses.)

Associativity Example
• 2-way set associative (set index = block address modulo 2; LRU replacement within the set):

Block addr  Set index  Hit/miss  Cache content after access
                                 Set 0            Set 1
0           0          miss      Mem[0]
8           0          miss      Mem[0]  Mem[8]
0           0          hit       Mem[0]  Mem[8]
6           0          miss      Mem[0]  Mem[6]
8           0          miss      Mem[8]  Mem[6]

(4 misses, 1 hit: LRU evicts Mem[8] for block 6, then Mem[0] for block 8.)

• Fully associative:

Block addr  Hit/miss  Cache content after access
0           miss      Mem[0]
8           miss      Mem[0]  Mem[8]
0           hit       Mem[0]  Mem[8]
6           miss      Mem[0]  Mem[8]  Mem[6]
8           hit       Mem[0]  Mem[8]  Mem[6]

(3 misses, 2 hits.)

How Much Associativity
• Increased associativity decreases the miss rate, but with diminishing returns.

• Simulation of a system with a 64KB D-cache, 16-word blocks, SPEC2000:
• 1-way: 10.3%
• 2-way: 8.6%
• 4-way: 8.3%
• 8-way: 8.1%


Benefits of Set Associative Caches
• The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation.

[Figure: miss rate (%) versus associativity (1-way, 2-way, 4-way, 8-way) for cache sizes from 4KB to 512KB. Data from Hennessy & Patterson, Computer Architecture, 2003.]

Largest gains come from going from direct mapped to 2-way (20%+ reduction in miss rate).

What if the set is full?
• Data can be placed in any block within a set.
• Direct mapped: no choice; the indexed block is replaced.
• Least recently used (LRU): replace the block in the set that was accessed least recently.
• Requires hardware to keep track of when each way's block was used relative to the other blocks in the set.
• For a 2-way set-associative cache, one bit per set suffices: set the bit when one way is referenced (and thereby mark the other way as LRU); see the sketch below.

• Most recently used (MRU): replace the block that was accessed most recently.
• For random access patterns and repeated scans over large datasets (sometimes called cyclic access patterns), MRU has more hits than LRU because it tends to retain older data.

• Random: gives approximately the same performance as LRU for high associativity.
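A minimal sketch of the one-bit-per-set LRU scheme for a 2-way cache (the names and the set count are illustrative):

```c
#define SETS 4                 /* illustrative 2-way cache with four sets */

/* lru[s] records which way in set s was used LEAST recently. */
static int lru[SETS];

/* Called on every hit or fill of way w in set s:
 * referencing one way marks the other way as least recently used. */
void lru_touch(int s, int w) { lru[s] = 1 - w; }

/* On a miss with both ways valid, replace the LRU way. */
int lru_victim(int s) { return lru[s]; }
```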


Write-hit policy: write through
• Write hit: the processor writes data (e.g. sw) and the address is found in the cache.
• Write through: data is written both to the block in the cache and to main memory.
• Advantage: cache and main memory are always consistent.
• Disadvantage: every write goes to memory, occupying communication bandwidth.

[Figure: write-through example. The processor writes Mem[214] = 21763; address 214 = 1101 0110 selects cache index 110, and the value 21763 is written both to the cache entry (valid = 1, tag = 11010) and to main memory.]

Write buffer
• Write through is usually paired with a write buffer so that the processor doesn't need to wait for the slow main memory.

• A write buffer queues the write requests from the processor and performs the actual memory writes later.
• The processor doesn't need to wait for the write operation to complete; it just issues the write request to the write buffer and continues with other instructions.

• If the processor issues write requests faster than memory can absorb them, the extra requests are held in the buffer and written to main memory later.

[Figure: the processor writes into the cache and the write buffer; the write buffer drains to main memory in the background.]
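A write buffer behaves like a small FIFO between the cache and main memory; a minimal sketch in C (the size and names are illustrative, and memory_write stands in for the actual DRAM access):

```c
#include <stdint.h>
#include <stdbool.h>

#define BUF_SIZE 8
struct wb_req { uint32_t addr, data; };

static struct wb_req buf[BUF_SIZE];
static int head, tail, count;

/* Processor side: enqueue the write and continue immediately.
 * Returns false if the buffer is full (processor must stall). */
bool wb_enqueue(uint32_t addr, uint32_t data)
{
    if (count == BUF_SIZE) return false;       /* buffer full: stall */
    buf[tail] = (struct wb_req){addr, data};
    tail = (tail + 1) % BUF_SIZE;
    count++;
    return true;
}

/* Memory side: drain one pending write when memory is free. */
bool wb_drain(void)
{
    if (count == 0) return false;
    /* memory_write(buf[head].addr, buf[head].data);  -- slow DRAM access */
    head = (head + 1) % BUF_SIZE;
    count--;
    return true;
}
```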

Write-hit policy: write back
• Write back: data is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
• Advantage: writes to the cache are fast, and subsequent reads are served by the cache; there is no memory access on every write.
• Disadvantage: data inconsistency between the cache and main memory.
• A dirty bit is used to indicate the inconsistency.

[Figure: write-back example. The processor wrote Mem[214] = 21763; the cache entry at index 110 (valid = 1, dirty = 1, tag = 11010) holds 21763, while main memory still holds the stale value 42803 at address 1101 0110 (214), and 1225 at address 1000 1110.]

Write back: replacement when read miss
• If there is a read miss and the mapped cache location is occupied:
• Step 1: select the victim (e.g. the LRU block) and check the dirty bit of the mapped cache location. Dirty bit = 1 means the data in the cache has been modified; dirty bit = 0 means no modification.

• Step 2: if the dirty bit is 1, write the data from the cache back to memory.

• Step 3: copy the requested data from main memory into the cache and set the dirty bit to 0.

[Figure: before the miss, cache index 110 holds tag 11010, data 21763, dirty = 1, while memory holds 2138 at address 1101 0110 and 1225 at address 1000 1110. On a read of 1000 1110, the dirty victim is written back (memory at 1101 0110 becomes 21763), then the requested block is loaded: index 110 now holds tag 10001, data 1225, dirty = 0.]
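The three steps can be sketched in C; write_back_to_memory, fetch_from_memory, and tag_of are hypothetical stand-ins for the memory interface, not functions from the slides:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory interface (assumed, not from the slides). */
extern void     write_back_to_memory(uint32_t tag, uint32_t index, uint32_t data);
extern uint32_t fetch_from_memory(uint32_t addr);
extern uint32_t tag_of(uint32_t addr);

struct wb_entry {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data;
};

/* Read miss in a write-back cache: flush the victim first if it is dirty. */
uint32_t handle_read_miss(struct wb_entry *victim, uint32_t index, uint32_t addr)
{
    /* Steps 1-2: a set dirty bit means the cached copy was modified,
     * so it must be written back before being overwritten. */
    if (victim->valid && victim->dirty)
        write_back_to_memory(victim->tag, index, victim->data);

    /* Step 3: load the requested block and mark it clean. */
    victim->data  = fetch_from_memory(addr);
    victim->tag   = tag_of(addr);
    victim->valid = true;
    victim->dirty = false;
    return victim->data;
}
```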

Write-miss Policy: Write Allocate versus Not Allocate
• The processor writes data (e.g. sw), but the block is not found in the cache → write miss.
• Write Allocate
• Load the block from memory into the cache; the write is then retried and results in a write hit.
• The write-hit policy handles the rest.

• Write Not Allocate
• Just update memory only.

[Figure: write-miss example. The processor writes Mem[214] = 21763 (address 1101 0110), but the cache entry at index 110 holds tag 00010 (valid = 1, data 123456), so the write misses in the cache.]

Possible combinations
Write hit policy Write miss policy
1. Write Through Write Allocate
2. Write Through Write Not Allocate
3. Write Back Write Allocate
4. Write Back Write Not Allocate


Possible combinations
• Write Through with Write Allocate:

• On hits it writes to the cache and main memory.
• On misses it loads the block from main memory into the cache, then updates the cache.
• The purpose of Write Allocate is that a subsequent write to the same block will hit, so the subsequent write only needs to update the cache. However, Write Through cannot exploit this benefit:

• since the Write Through policy is used, even on a cache hit it generates a write to both the cache and main memory → memory is still updated, so the cache saves no time on writes.

• Write Through with Write Not Allocate
• On hits it writes to the cache and main memory.
• On cache misses it updates the block in main memory only.
• As a result, on a cache miss, time is saved by not loading the block from memory into the cache.


Possible combinations
• Write Back with Write Allocate

• On hits it writes to the cache only; main memory is not updated.
• On misses it loads the block from main memory into the cache, then updates the cache.
• The purpose of Write Allocate is that a subsequent write to the same block will hit, so the subsequent write only needs to update the cache. Write Back can exploit this benefit:

• subsequent writes to the same block hit in the cache, and Write Back just updates the cache and sets the block's dirty bit. This eliminates extra memory accesses and is much more efficient than the (Write Through + Write Allocate) combination.

• Write Back with Write Not Allocate
• On hits it writes to the cache only; main memory is not updated.
• On misses it updates the block in main memory only.
• If a block originally caused a write miss, subsequent writes to the same block will always miss again, resulting in very inefficient execution.


Two Machines’ Cache Parameters

• L1 cache organization & size:
• Intel Nehalem: split I$ and D$; 32KB each per core; 64B blocks.
• AMD Barcelona: split I$ and D$; 64KB each per core; 64B blocks.

• L1 associativity:
• Nehalem: 4-way (I), 8-way (D) set assoc.; ~LRU replacement.
• Barcelona: 2-way set assoc.; LRU replacement.

• L1 write policy: both write-back, write-allocate.

• L2 cache organization & size:
• Nehalem: unified; 256KB (0.25MB) per core; 64B blocks.
• Barcelona: unified; 512KB (0.5MB) per core; 64B blocks.

• L2 associativity: Nehalem 8-way set assoc., ~LRU; Barcelona 16-way set assoc., ~LRU.
• L2 write policy: both write-back, write-allocate.

• L3 cache organization & size:
• Nehalem: unified; 8192KB (8MB) shared by cores; 64B blocks.
• Barcelona: unified; 2048KB (2MB) shared by cores; 64B blocks.

• L3 associativity: Nehalem 16-way set assoc.; Barcelona 32-way set assoc., evicting the block shared by the fewest cores.
• L3 write policy: both write-back, write-allocate.
