
Chapter 6
Memory


6.1 Introduction

• Up until now, we have just spoken about how the CPU works, assuming main memory exists.

• In this chapter, we focus on memory organization.

• We'll talk about how different types of memory affect performance and how memory is accessed in a computer system.


6.2 Types of Memory

• Random Access Memory, RAM

– Static RAM (SRAM)
• Faster
• Built with flip-flops

– Dynamic RAM (DRAM)
• Cheaper; built with capacitors; requires a periodic refresh
• Has variants (SDRAM, DDR SDRAM)

• Read-Only Memory, ROM

– Permanent or complicated to modify
– Has variants: PROM, EPROM, EEPROM, Flash


6.3 The Memory Hierarchy

• Faster memory is often more expensive than slower memory.

• To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion.

• Small, fast storage elements like registers are kept in the CPU.

• Main memory is slightly slower and accessed through the main memory bus.

• Larger, slower storage like SSDs and hard drives is accessed via various input/output methods.


• This storage organization can be thought of as a pyramid:


• Our focus will be on registers, cache, main memory, and virtual memory.

• Cache is faster than main memory but is smaller.

• Virtual memory uses slower, larger storage to function as main memory.

• To access a particular piece of data:

– The CPU first sends a request to its nearest memory.
– If the data is not there, the next level down is queried.
– Once the data is located, the data and nearby data elements (called a block) are fetched into cache memory.


• A hit is when data is found at a given memory level.

– The hit rate is the percentage of accesses in which the data is found.
– The hit time is the time required to access the data.

• A miss is when it is not found.

– The miss rate is the percentage of accesses in which it is not:
Miss rate = 100% – hit rate.
– The miss penalty is the time required to process a miss, including the time that it takes to replace a block of memory plus the time it takes to deliver the data to the processor.


• An entire block of data is copied after a hit because the principle of locality tells us that once a byte is accessed, it is likely that a nearby data element will be needed soon.

• There are three forms of locality:

– Temporal locality – Recently-accessed data elements tend to be accessed again.
– Spatial locality – Accesses tend to cluster in memory.
– Sequential locality – Instructions tend to be accessed sequentially.


Questions?

• Types of memory

• Memory hierarchy

• What a cache is

• What virtual memory is


6.4 Cache Memory

• The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory.

• Although cache is much smaller than main memory, its access time is a fraction of that of main memory.

• There are different ways to handle storing a larger memory into a smaller memory; this is called a mapping.

• In all mappings, both levels of memory are grouped into equal-sized blocks.


• The simplest cache mapping scheme is direct mapped cache.

• In a direct mapped cache consisting of N blocks of cache, block X of main memory maps to cache block X mod N.

• Thus, if we have 10 blocks of cache, block 7 of cache may hold blocks 7, 17, 27, 37, . . . of main memory.

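A minimal Python sketch of this modular mapping (my own illustration, not from the slides):

```python
# Direct-mapped cache: memory block X maps to cache block X mod N.
N = 10  # number of cache blocks, as in the example above

for x in [7, 17, 27, 37]:
    print(f"memory block {x:2d} -> cache block {x % N}")

# All four memory blocks land in cache block 7, so they compete for
# the same slot and evict one another when accessed in turn.
```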



6.4 Cache Memory

• To perform direct mapping, the binary main memory

address is partitioned into the fields shown below.

– The offset field uniquely identifies an address within a

specific block.

– The block field selects a unique block of cache.

– The tag field is whatever is left over.

• The sizes of these fields are determined by characteristics

of both memory and cache.


• EXAMPLE 6.1 Consider a byte-addressable main memory consisting of 4 blocks, and a cache with 2 blocks, where each block is 4 bytes.

• This means Blocks 0 and 2 of main memory map to Block 0 of cache, and Blocks 1 and 3 of main memory map to Block 1 of cache.

• Using the tag, block, and offset fields, we can see how main memory maps to cache as follows.


• EXAMPLE 6.1 Cont’d Consider a byte-addressable main memory consisting of 4 blocks, and a cache with 2 blocks, where each block is 4 bytes.

– First, we need to determine the address format for mapping. Each block is 4 bytes, so the offset field must contain 2 bits; there are 2 blocks in cache, so the block field must contain 1 bit; this leaves 1 bit for the tag (as a main memory address has 4 bits, because there are a total of 2^4 = 16 bytes).
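This field calculation can be written as a small Python sketch (the function name and structure are my own, not from the text):

```python
from math import log2

def field_sizes(mem_bytes: int, cache_blocks: int, block_bytes: int):
    """Return (tag, block, offset) widths in bits for a direct-mapped
    cache. All sizes are assumed to be powers of two."""
    addr_bits   = int(log2(mem_bytes))     # total address width
    offset_bits = int(log2(block_bytes))   # selects a byte within a block
    block_bits  = int(log2(cache_blocks))  # selects a cache block
    tag_bits    = addr_bits - block_bits - offset_bits
    return tag_bits, block_bits, offset_bits

# Example 6.1: 16-byte memory, 2 cache blocks, 4-byte blocks
print(field_sizes(16, 2, 4))  # -> (1, 1, 2)
```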


• EXAMPLE 6.1 Cont’d

– Suppose we need to access main memory address 0x3 (0011 in binary). If we partition 0011 using the address format above, we get a tag of 0, a block of 0, and an offset of 11.

– Thus, the main memory address 0011 maps to cache block 0.

– The tag is stored in the cache along with the data so the block currently resident can be identified.


• If we now access a main memory address in block 2, it also maps to cache block 0, so block 2 is loaded and block 0 is evicted.

• EXAMPLE 6.2 Assume a byte-addressable memory consists of 2^14 bytes, cache has 16 blocks, and each block has 8 bytes.

– The number of memory blocks is 2^14 / 2^3 = 2^11 = 2048.

– Each main memory address requires 14 bits. Of this 14-bit address field, the rightmost 3 bits form the offset field.

– We need 4 bits to select a specific block in cache, so the block field consists of the middle 4 bits.

– The remaining 7 bits make up the tag field.
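Reusing the hypothetical field_sizes helper sketched above, these widths check out:

```python
print(field_sizes(2**14, 16, 8))  # -> (7, 4, 3): tag, block, offset
```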


• EXAMPLE 6.3 Assume a byte-addressable memory consisting of 16 bytes divided into 8 blocks. Cache contains 4 blocks. We know:

– A memory address has 4 bits.
– Each block is 2 bytes, so the 4-bit memory address divides into a 1-bit tag, a 2-bit block field, and a 1-bit offset.


• EXAMPLE 6.3 Cont’d The mapping for each memory reference follows directly from this partitioning; the sketch below reproduces it.
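The mapping table from the slides is not reproduced here, but this short sketch (output format is mine) enumerates it for all 16 addresses:

```python
# Example 6.3: 16-byte memory, 2-byte blocks, 4 cache blocks
# -> offset = 1 bit, block = 2 bits, tag = 1 bit.
for addr in range(16):
    offset = addr & 0b1
    block  = (addr >> 1) & 0b11   # memory block number mod 4
    tag    = addr >> 3
    print(f"address {addr:04b}: tag={tag} cache block={block} offset={offset}")
```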


• EXAMPLE 6.4 Consider 16-bit memory addresses and 64 blocks of cache where each block contains 8 bytes. We have:

– 3 bits for the offset
– 6 bits for the block
– 7 bits for the tag

• A memory reference for 0x0404 (0000 0100 0000 0100 in binary) therefore has tag 0000010, block 000000, and offset 100: it maps to cache block 0 with tag 2.
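A minimal sketch of this partitioning in Python (the helper name is mine):

```python
def partition(addr: int, offset_bits: int, block_bits: int):
    """Split an address into (tag, block, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    block  = (addr >> offset_bits) & ((1 << block_bits) - 1)
    tag    = addr >> (offset_bits + block_bits)
    return tag, block, offset

print(partition(0x0404, 3, 6))  # -> (2, 0, 4): tag 2, cache block 0, offset 4
```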


• In summary, direct mapped cache maps main memory blocks in a modular fashion to cache blocks. The mapping depends on:

• The size of main memory
– Determines the total number of bits needed to represent an address

• The size of the cache
– Determines the size of the block and tag fields

• How many addresses are in a block
– Determines the size of the offset field


Questions?

• Direct mapped cache

• Suppose instead of placing memory blocks in specific cache locations based on memory address, we could allow a block to go anywhere in cache.

• In this way, cache would have to fill up before any blocks are evicted.

• This is how fully associative cache works.

• A memory address is partitioned into only two fields: the tag and the word.


• Suppose, as before, we have 14-bit memory addresses and a cache with 16 blocks, each block of size 8. The field format of a memory reference is then a 3-bit word field and an 11-bit tag.

• When the cache is searched, all tags are searched in parallel to retrieve the data quickly.

• This requires special, costly hardware.
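In software we can only model the parallel tag comparison with a scan; a minimal sketch (names and data layout are my own):

```python
# Fully associative lookup: a block may sit in any of the 16 slots,
# so hardware compares every stored tag at once. Here we scan.
cache = [None] * 16            # each entry: (tag, block_data) or None

def lookup(addr: int):
    tag, word = addr >> 3, addr & 0b111   # 11-bit tag, 3-bit word field
    for entry in cache:
        if entry is not None and entry[0] == tag:
            return entry[1][word]         # hit: return the requested word
    return None                           # miss
```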


• You will recall that direct mapped cache evicts a block whenever another memory reference needs that cache location.

• With fully associative cache, we have no such mapping; thus we must devise an algorithm to determine which block to evict from the cache.

• There are several ways to pick a block to evict; we will discuss them shortly.


• Set associative cache combines the ideas of direct mapped cache and fully associative cache.

• An N-way set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in cache.

• Unlike direct mapped cache, a memory reference maps to a set of several cache blocks, similar to the way in which fully associative cache works.

• Instead of mapping anywhere in the entire cache, a memory reference can map only to the subset of cache slots in its set.


• The number of cache blocks per set in set associative cache varies according to overall system design.

– For example, a 2-way set associative cache can be conceptualized both logically and linearly: each set contains two different memory blocks. [Schematic not reproduced: logical view and linear view of a 2-way set associative cache.]


• In set associative cache mapping, a memory reference is divided into three fields: tag, set, and offset.

• As with direct mapped cache, the offset field chooses the word within the cache block, and the tag field uniquely identifies the memory address.

• The set field determines the set to which the memory block maps.


• EXAMPLE 6.5 Suppose we are using 2-way set associative mapping with a word-addressable main memory of 2^14 words and a cache with 16 blocks, where each block contains 8 words.

– Cache has a total of 16 blocks, and each set has 2 blocks, so there are 8 sets in cache.

– Thus, the set field is 3 bits, the offset field is 3 bits, and the tag field is 8 bits.
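The same calculation as a Python sketch (the helper name is mine):

```python
from math import log2

def sa_fields(mem_words: int, cache_blocks: int, block_words: int, ways: int):
    """(tag, set, offset) widths in bits for an N-way set associative
    cache; all sizes assumed to be powers of two."""
    sets        = cache_blocks // ways
    addr_bits   = int(log2(mem_words))
    offset_bits = int(log2(block_words))
    set_bits    = int(log2(sets))
    return addr_bits - set_bits - offset_bits, set_bits, offset_bits

# Example 6.5: 2^14 words, 16 blocks, 8 words per block, 2-way
print(sa_fields(2**14, 16, 8, 2))  # -> (8, 3, 3)
```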


• EXAMPLE 6.7 For a byte-addressable computer with an 8-block cache of 4 bytes each, trace the memory accesses 0x01, 0x04, 0x09, 0x05, 0x14, 0x21, and 0x01 for each mapping approach.

• The address format for direct mapped cache is a 2-bit offset, a 3-bit block field, and the remaining address bits as the tag.

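The trace table from the slides is not reproduced here, but a short simulation sketch (output format is mine) recovers it:

```python
# Direct-mapped trace: 8 blocks of 4 bytes -> offset 2 bits, block 3 bits.
cache = {}   # cache block index -> tag currently resident

for addr in [0x01, 0x04, 0x09, 0x05, 0x14, 0x21, 0x01]:
    block = (addr >> 2) & 0b111
    tag   = addr >> 5
    hit   = cache.get(block) == tag
    print(f"0x{addr:02X}: block {block}, tag {tag} -> {'hit' if hit else 'miss'}")
    cache[block] = tag
# Only 0x05 hits: 0x21 and the final 0x01 collide in cache block 0.
```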

• EXAMPLE 6.7 cont’d: For the same cache, trace the accesses 0x01, 0x04, 0x09, 0x05, 0x14, 0x21, and 0x01 under fully associative mapping.

• The address format for fully associative cache is a 2-bit word field, with all remaining address bits forming the tag.

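Again as a simulation sketch (the hit/miss outcomes follow from the accesses above):

```python
# Fully associative trace: a block may occupy any of the 8 slots, so
# with only 6 distinct blocks referenced, nothing is ever evicted.
resident = []   # tags (memory block numbers) currently in cache

for addr in [0x01, 0x04, 0x09, 0x05, 0x14, 0x21, 0x01]:
    tag = addr >> 2              # everything above the 2-bit word field
    hit = tag in resident
    print(f"0x{addr:02X}: tag {tag} -> {'hit' if hit else 'miss'}")
    if not hit:
        resident.append(tag)     # free slots remain; no replacement needed
# Two hits: 0x05 (block 1 resident) and the final 0x01 (block 0 resident).
```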

• EXAMPLE 6.7 cont’d: Finally, trace the same accesses under 2-way set associative mapping.

• The address format for 2-way set-associative cache is a 2-bit offset, a 2-bit set field (8 blocks / 2 ways = 4 sets), and the remaining address bits as the tag.

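And a corresponding sketch for the 2-way case (FIFO order within a set is my assumption, though no eviction actually occurs in this trace):

```python
from collections import defaultdict

# 2-way set associative trace: 4 sets of 2 blocks each.
sets = defaultdict(list)   # set index -> resident tags (oldest first)

for addr in [0x01, 0x04, 0x09, 0x05, 0x14, 0x21, 0x01]:
    s   = (addr >> 2) & 0b11   # 2-bit set field
    tag = addr >> 4            # remaining bits
    hit = tag in sets[s]
    print(f"0x{addr:02X}: set {s}, tag {tag} -> {'hit' if hit else 'miss'}")
    if not hit:
        if len(sets[s]) == 2:  # set full: evict the oldest resident
            sets[s].pop(0)
        sets[s].append(tag)
# Two hits: 0x05 (set 1 holds tag 0) and the final 0x01 (set 0 holds tag 0).
```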

Questions?

• Associative cache mapping

• Set associative cache mapping

• With fully associative and set associative cache, a replacement policy is used to determine which block to evict when a new block is needed.

• An optimal replacement policy would be able to look into the future and see which blocks won't be needed for the longest period.

• Although it is impossible to implement an optimal replacement algorithm, it can be used as a best case against which to compare other replacement policies.


• The replacement policy that we choose depends upon the locality that we are trying to optimize.

– Usually, we are interested in temporal locality.

• A least recently used (LRU) algorithm keeps track of the last time that a block was accessed.

– It is complex: LRU has to maintain an access history for each block, which ultimately slows down the cache.

• The first-in, first-out (FIFO) replacement policy is popular.

– The block that has been in the cache the longest is evicted.

• A random replacement policy is exactly that.

– It picks a block at random to evict.
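A minimal LRU sketch in Python (class and method names are mine; an OrderedDict stands in for the hardware access history):

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative cache with least-recently-used eviction."""
    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.blocks = OrderedDict()            # tag -> data, LRU first

    def access(self, tag, data=None):
        if tag in self.blocks:                 # hit: mark most recently used
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) == self.nblocks:   # full: evict the LRU block
            self.blocks.popitem(last=False)
        self.blocks[tag] = data                # miss: bring the block in
        return False
```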


• The performance of hierarchical memory is measured by its effective access time (EAT).

• EAT is a weighted average that takes into account the hit ratio and relative access times of successive levels of memory.

• The EAT for a two-level memory is given by:

EAT = H × AccessC + (1 − H) × AccessMM

where H is the cache hit rate and AccessC and AccessMM are the access times for cache and main memory, respectively.


• For example, suppose:

– Main memory access time is 200ns
– Cache has a 10ns access time and a hit rate of 99%
– Cache and main memory are read concurrently

• The effective access time (EAT) is:

0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns


• For example, suppose:

– Main memory access time is 200ns
– Cache has a 10ns access time and a hit rate of 99%
– Cache and main memory are read sequentially (the cache is checked first, then memory)

• The effective access time (EAT) is:

0.99(10ns) + 0.01(10ns + 200ns) = 9.9ns + 2.1ns = 12ns
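Both cases as a quick sketch (the function name is mine):

```python
def eat(hit_rate: float, t_cache: float, t_mem: float, sequential=False):
    """Effective access time in ns. A sequential miss pays the cache
    probe time plus the main memory access time."""
    miss_time = (t_cache + t_mem) if sequential else t_mem
    return hit_rate * t_cache + (1 - hit_rate) * miss_time

print(eat(0.99, 10, 200))                   # 11.9 (concurrent reads)
print(eat(0.99, 10, 200, sequential=True))  # 12.0 (sequential reads)
```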


• Caching depends upon programs exhibiting good locality.

– Some object-oriented programs have poor locality owing to their complex, dynamic structures.

– Arrays stored in column-major rather than row-major order can be problematic for certain cache organizations.

• With poor locality, caching can cause performance degradation rather than performance improvement.


• Cache replacement policies must account for dirty blocks: those blocks that have been updated while they were in the cache.

• Dirty blocks must be written back to memory. A write policy determines how this will be done.

• There are two types of write policies: write through and write back.

• Write through updates cache and main memory simultaneously on every write.


• Write back (also called copyback) updates memory only when the block is selected for replacement.

• The disadvantage of write through is that memory must be updated with each cache write, which slows down the access time on updates. This slowdown is usually negligible, because the majority of accesses tend to be reads, not writes.

• The advantage of write back is that memory traffic is minimized, but its disadvantage is that memory does not always agree with the value in cache, causing problems in systems with many concurrent users.
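A minimal write-back sketch (the class shape is mine): writes set a dirty bit, and the block is flushed to memory only at eviction.

```python
class WriteBackBlock:
    """Toy cache block illustrating the dirty bit used by write back."""
    def __init__(self, tag: int, data: list):
        self.tag, self.data, self.dirty = tag, data, False

    def write(self, offset: int, value):
        self.data[offset] = value
        self.dirty = True            # cache and memory now disagree

    def evict(self, memory: dict):
        if self.dirty:               # flush only blocks that were modified
            memory[self.tag] = self.data
```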


• The cache we have been discussing is called a unified or integrated cache, where both instructions and data are cached.

• Many modern systems employ separate caches for data and instructions.

– This is called a Harvard cache.

• The separation of data from instructions provides better locality, at the cost of greater complexity.

– Simply making the cache larger provides about the same performance improvement without the complexity.


• Cache performance can also be improved by adding a small associative cache (a victim cache) to hold blocks that have been evicted recently.

• A trace cache is a variant of an instruction cache that holds decoded instructions for program branches, giving the illusion that noncontiguous instructions are really contiguous.


• Most of today's small systems employ multilevel cache hierarchies.

• The levels of cache form their own small memory hierarchy.

• Level 1 cache (8KB to 64KB) is situated on the processor itself.

– Access time is typically about 4ns.

• Level 2 cache (64KB to 2MB) may be on the motherboard, or on an expansion card.

– Access time is usually around 15–20ns.


• In systems that employ three levels of cache, the Level 2 cache is placed on the same die as the CPU (reducing access time to about 10ns).

• Accordingly, the Level 3 cache (2MB to 256MB) refers to cache that is situated between the processor and main memory.

• Once the number of cache levels is determined, the next thing to consider is whether data (or instructions) can exist in more than one cache level.


Questions?

• Memory hierarchy

• Cache memory

• Direct mapping

• Associative mapping