Chapter 6
Memory
6.1 Introduction
• Until now, we have discussed how the CPU works,
assuming that main memory exists.
• In this chapter, we focus on memory
organization.
• We’ll talk about how different types of memory
affect performance and how memory is
accessed in a computer system.
6.2 Types of Memory
• Random Access Memory, RAM
– Static RAM (SRAM)
• Faster
• Built with flip-flops
– Dynamic RAM (DRAM)
• Cheaper; built with capacitors; requires periodic refresh
• Has variants (SDRAM, DDR SDRAM)
• Read-Only Memory, ROM
– Contents are permanent or difficult to modify
– Has variants, PROM, EPROM, EEPROM, Flash
6.3 The Memory Hierarchy
• Faster memory is often more expensive than slower
memory.
• To provide the best performance at the lowest cost,
memory is organized in a hierarchical fashion.
• Small, fast storage elements like registers are kept
in the CPU itself.
• Main memory is slightly slower and is accessed
through the main memory bus.
• Larger storage devices like SSDs and hard drives are
accessed via various input/output methods.
• This storage organization can be thought of as a pyramid:
• Our focus will be on
– registers, cache, main memory, and virtual memory.
• Cache is faster than main memory but is smaller.
• Virtual memory uses slower, larger storage to
function as an extension of main memory.
• To access a particular piece of data:
– the CPU first sends a request to its nearest memory.
– If the data is not there, the next level down is queried.
– Once the data is located, it and nearby data
elements (together called a block) are fetched into
cache memory.
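• As a minimal sketch of this top-down lookup order (the level names and contents below are illustrative, not from the text), in Python:

    def find_level(levels, address):
        # Query each memory level in order, nearest first; return the
        # first level that holds the address, i.e., the level that "hits".
        for name, contents in levels:
            if address in contents:
                return name
        raise KeyError("address not found at any level")

    # Hypothetical contents, for illustration only.
    levels = [("cache", {0x10}),
              ("main memory", {0x10, 0x20}),
              ("disk", {0x10, 0x20, 0x30})]
    print(find_level(levels, 0x20))   # "main memory"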
• A hit occurs when data is found at a given memory level.
– The hit rate is the percentage of accesses for which data is
found at that level.
– The hit time is the time required to access data at that level.
• A miss occurs when the data is not found.
– The miss rate is the percentage of accesses for which it is not.
• Miss rate = 100% – hit rate.
– The miss penalty is the time required to process a miss,
including the time it takes to replace a block of memory
plus the time it takes to deliver the data to the processor.
• An entire block of data is copied after a hit
because the principle of locality tells us that once a
byte is accessed, it is likely that a nearby data
element will be needed soon.
• There are three forms of locality:
– Temporal locality- Recently-accessed data elements tend
to be accessed again.
– Spatial locality – Accesses tend to cluster.
– Sequential locality – Instructions tend to be accessed
sequentially.
Questions?
• Types of memory
• Memory hierarchy
• What a cache is
• What virtual memory is
6.4 Cache Memory
• The purpose of cache memory is to speed up
accesses by storing recently used data closer to the
CPU, instead of storing it in main memory.
• Although cache is much smaller than main memory,
its access time is a fraction of that of main memory.
• There are different ways to handle storing a larger
memory in a smaller memory; each such scheme is
called a mapping.
• In all mappings, both levels of memory are grouped
into equal sized blocks.
• The simplest cache mapping scheme is direct
mapped cache.
• In a direct mapped cache consisting of N blocks
of cache, block X of main memory maps to cache
block X mod N.
• Thus, if we have 10 blocks of cache, block 7 of
cache may hold blocks 7, 17, 27, 37, . . . of main
memory.
The next slide illustrates this mapping.
• With direct mapped cache consisting of N blocks of
cache, block X of main memory maps to cache block
X mod N.
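• As a short Python sketch of this rule (the function name is ours):

    def cache_block(mem_block, n):
        # Direct mapping: main memory block X maps to cache block X mod N.
        return mem_block % n

    # With 10 cache blocks, memory blocks 7, 17, 27, and 37 all map to block 7:
    print([cache_block(x, 10) for x in (7, 17, 27, 37)])   # [7, 7, 7, 7]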
• A larger example.
• To perform direct mapping, the binary main memory
address is partitioned into the fields shown below.
– The offset field uniquely identifies an address within a
specific block.
– The block field selects a unique block of cache.
– The tag field is whatever is left over.
• The sizes of these fields are determined by characteristics
of both memory and cache.
• EXAMPLE 6.1 Consider a byte-addressable main
memory consisting of 4 blocks, and a cache with 2
blocks, where each block is 4 bytes.
• This means Blocks 0 and 2 of main memory map to
Block 0 of cache, and Blocks 1 and 3 of main
memory map to Block 1 of cache.
• Using the tag, block, and offset fields, we can see
how main memory maps to cache as follows.
• EXAMPLE 6.1 Cont’d Consider a byte-addressable main
memory consisting of 4 blocks, and a cache with 2
blocks, where each block is 4 bytes.
– First, we need to determine the address format for mapping.
Each block is 4 bytes, so the offset field must contain 2 bits; there
are 2 blocks in cache, so the block field must contain 1 bit; this
leaves 1 bit for the tag (as a main memory address has 4 bits
because there are a total of 2^4 = 16 bytes).
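• A sketch of this field-size arithmetic in Python (the function is ours, assuming all sizes are powers of two):

    import math

    def field_sizes(mem_bytes, cache_blocks, block_bytes):
        addr_bits   = int(math.log2(mem_bytes))     # 16 bytes -> 4-bit address
        offset_bits = int(math.log2(block_bytes))   # 4-byte blocks -> 2 bits
        block_bits  = int(math.log2(cache_blocks))  # 2 cache blocks -> 1 bit
        tag_bits    = addr_bits - block_bits - offset_bits  # whatever is left
        return tag_bits, block_bits, offset_bits

    print(field_sizes(16, 2, 4))   # (1, 1, 2)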
• EXAMPLE 6.1 Cont’d
– Suppose we need to access
main memory address 0x3
(0011 in binary). If we partition
0011 using the address format
from Figure a, we get Figure b.
– Thus, the main memory address
0011 maps to cache block 0.
– Figure c shows this mapping,
along with the tag that is also
stored with the data.
The next slide illustrates another mapping: load block 2,
evict block 0.
• EXAMPLE 6.2 Assume a byte-addressable memory
consists of 2^14 bytes, cache has 16 blocks, and each
block has 8 bytes.
– The number of memory blocks is 2^14 / 2^3 = 2^11 = 2048.
– Each main memory address requires 14 bits. Of this 14-bit address
field, the rightmost 3 bits reflect the offset field.
– We need 4 bits to select a specific block in cache, so the block
field consists of the middle 4 bits.
– The remaining 7 bits make up the tag field.
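• The same arithmetic for this example, as a quick sketch:

    addr_bits   = 14                 # 2^14 bytes of byte-addressable memory
    offset_bits = 3                  # 8 = 2^3 bytes per block
    block_bits  = 4                  # 16 = 2^4 cache blocks
    tag_bits    = addr_bits - offset_bits - block_bits   # 7 tag bits
    mem_blocks  = 2**14 // 2**3      # 2^11 = 2048 memory blocks
    print(tag_bits, mem_blocks)      # 7 2048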
• EXAMPLE 6.3 Assume a byte-addressable memory
consisting of 16 bytes divided into 8 blocks. Cache
contains 4 blocks. We know:
– A memory address has 4 bits.
– The 4-bit memory address is divided into the fields below.
• EXAMPLE 6.3 Cont’d The mapping for memory
references is shown below:
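• As a sketch (our code, using the 1-bit tag, 2-bit block, 1-bit offset format above), the mapping for every 4-bit address can be computed:

    for addr in range(16):              # every 4-bit memory address
        offset = addr & 0b1             # low bit selects the byte in a block
        block  = (addr >> 1) & 0b11     # middle 2 bits select the cache block
        tag    = addr >> 3              # high bit is the tag
        print(f"{addr:04b}: tag={tag} cache block={block} offset={offset}")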
• EXAMPLE 6.4 Consider 16-bit memory addresses and
64 blocks of cache where each block contains 8 bytes.
We have:
– 3 bits for the offset
– 6 bits for the block
– 7 bits for the tag.
• A memory reference for 0x0404 maps as follows:
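• As a sketch, the partitioning of 0x0404 can be checked in Python:

    addr   = 0x0404                    # 0000 0100 0000 0100 in binary
    offset = addr & 0b111              # low 3 bits
    block  = (addr >> 3) & 0b111111    # middle 6 bits
    tag    = addr >> 9                 # remaining 7 bits
    print(f"tag={tag:07b} block={block:06b} offset={offset:03b}")
    # tag=0000010 block=000000 offset=100, so 0x0404 maps to cache block 0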
• In summary, direct mapped cache maps main
memory blocks in a modular fashion to cache
blocks. The mapping depends on:
• The size of main memory
– Determines the total number of bits needed to
represent an address
• The size of the cache
– Determines the size of the block and tag field
• How many addresses are in a block
– Determines the size of the offset field
Questions?
• Direct mapped cache
• Suppose instead of placing memory blocks in
specific cache locations based on memory
address, we could allow a block to go anywhere
in cache.
• In this way, cache would have to fill up before
any blocks are evicted.
• This is how fully associative cache works.
• A memory address is partitioned into only two
fields: the tag and the offset (word).
• Suppose, as before, we have 14-bit memory
addresses and a cache with 16 blocks, each block
of size 8. The field format of a memory reference
is:
• When the cache is searched, all tags are searched
in parallel to retrieve the data quickly.
• This requires special, costly hardware.
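• Software can only model that parallel comparison with a scan or a hash lookup; a minimal sketch (names are ours, using the 3-bit offset format above):

    cache = {}                     # tag -> block data

    def lookup(addr):
        tag = addr >> 3            # 14-bit address minus 3-bit offset: 11-bit tag
        return cache.get(tag)      # None models a miss; real hardware
                                   # compares every stored tag simultaneously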
• You will recall that direct mapped cache evicts a
block whenever another memory reference maps to
that same cache block.
• With fully associative cache, we have no such
mapping, thus we must devise an algorithm to
determine which block to evict from the cache.
• There are several ways to pick a block to evict;
we will discuss them shortly.
• Set associative cache combines the ideas of direct
mapped cache and fully associative cache.
• An N-way set associative cache mapping is like
direct mapped cache in that a memory reference
maps to a particular location in cache.
• Unlike direct mapped cache, a memory reference
maps to a set of several cache blocks, like the way in
which fully associative cache works.
• Instead of mapping anywhere in the entire cache, a
memory reference can map only to the subset of
cache slots.
• The number of cache blocks per set in set associative
cache varies according to overall system design.
– For example, a 2-way set associative
cache can be conceptualized as shown in
the schematic below.
– Each set contains two different memory
blocks.
(Figure: logical view and linear view of a 2-way set associative cache.)
• In set associative cache mapping, a memory
reference is divided into three fields: tag, set,
and offset.
• As with direct-mapped cache, the offset field
chooses the word within the cache block, and
the tag field uniquely identifies the memory
address.
• The set field determines the set to which the
memory block maps.
• EXAMPLE 6.5 Suppose we are using 2-way set
associative mapping with a word-addressable main
memory of 2^14 words and a cache with 16 blocks,
where each block contains 8 words.
– Cache has a total of 16 blocks, and each set holds 2 blocks,
so there are 8 sets in cache.
– Thus, the set field is 3 bits, the offset field is 3 bits, and
the tag field is 8 bits.
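• A sketch of this example's arithmetic (names are ours):

    cache_blocks = 16
    ways         = 2
    sets         = cache_blocks // ways          # 8 sets
    offset_bits  = 3                             # 8 = 2^3 words per block
    set_bits     = 3                             # 8 = 2^3 sets
    tag_bits     = 14 - set_bits - offset_bits   # 8 tag bits

    def set_for(mem_block):
        # A memory block maps to set (block number mod number of sets).
        return mem_block % sets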
• EXAMPLE 6.7 A byte-addressable computer with an 8-
block cache of 4 bytes each, trace memory accesses:
0x01, 0x04, 0x09, 0x05, 0x14, 0x21, and 0x01 for each
mapping approach.
• The address format for direct mapped cache is:
Our trace is on the next slide.
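• As a cross-check, a minimal Python sketch of the direct mapped trace (2-bit offset, 3-bit block field, tag in the remaining bits):

    cache = [None] * 8                     # one stored tag per cache block
    for addr in (0x01, 0x04, 0x09, 0x05, 0x14, 0x21, 0x01):
        block = (addr >> 2) % 8            # 4-byte blocks, 8 cache blocks
        tag   = addr >> 5
        hit   = cache[block] == tag
        cache[block] = tag                 # on a miss, the new block replaces
        print(f"{addr:#04x}: block {block} {'hit' if hit else 'miss'}")
    # Only 0x05 hits; the final 0x01 misses because 0x21 evicted its block.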
• EXAMPLE 6.7 cont’d: A byte-addressable computer
with an 8-block cache of 4 bytes each, trace memory
accesses: 0x01, 0x04, 0x09, 0x05, 0x14, 0x21, and
0x01 for each mapping approach.
• The address format for fully associative cache is:
Our trace is on the next slide.
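• The same trace under fully associative mapping, as a sketch:

    slots = []                             # stored tags; up to 8 blocks, anywhere
    for addr in (0x01, 0x04, 0x09, 0x05, 0x14, 0x21, 0x01):
        tag = addr >> 2                    # everything above the 2-bit offset
        hit = tag in slots
        if not hit:
            slots.append(tag)              # only 5 distinct blocks: no eviction
        print(f"{addr:#04x}: {'hit' if hit else 'miss'}")
    # Both 0x05 and the final 0x01 hit, since nothing is ever evicted.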
• EXAMPLE 6.7 cont’d: A byte-addressable computer
with an 8-block cache of 4 bytes each, trace memory
accesses: 0x01, 0x04, 0x09, 0x05, 0x14, 0x21, and
0x01 for each mapping approach.
• The address format for 2-way set-associative cache is:
Our trace is on the next slide.
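• And the 2-way set associative version (8 blocks = 4 sets of 2, so a 2-bit offset and 2-bit set field), as a sketch with FIFO eviction:

    sets = [[] for _ in range(4)]          # each set holds at most 2 tags
    for addr in (0x01, 0x04, 0x09, 0x05, 0x14, 0x21, 0x01):
        s   = (addr >> 2) % 4              # set field
        tag = addr >> 4
        hit = tag in sets[s]
        if not hit:
            if len(sets[s]) == 2:
                sets[s].pop(0)             # set full: evict oldest (FIFO)
            sets[s].append(tag)
        print(f"{addr:#04x}: set {s} {'hit' if hit else 'miss'}")
    # 0x05 and the final 0x01 both hit; no set overflows in this trace.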
Questions?
• Fully associative cache mapping
• Set associative cache mapping
• With fully associative and set associative cache, a
replacement policy is used to determine which block
to evict when a new block is needed.
• An optimal replacement policy would be able to look
into the future to see which blocks will not be needed
for the longest period of time.
• Although it is impossible to implement an optimal
replacement algorithm, it can be used as a potential
best case to compare other replacement policies
against.
• The replacement policy that we choose depends upon
the locality that we are trying to optimize.
– Usually, we are interested in temporal locality.
• A least recently used (LRU) algorithm keeps track of
the last time that a block was accessed.
– It is complex: LRU has to maintain an access history for
each block, which ultimately slows down the cache.
• First-in, first-out (FIFO) replacement policy is popular.
– The block that has been in the cache the longest is evicted.
• A random replacement policy is exactly that.
– It picks a block at random to evict.
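• Of these, LRU needs the most bookkeeping; a sketch in Python (an OrderedDict stands in for the per-block access history, and the class is ours):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, n_blocks):
            self.n_blocks = n_blocks
            self.blocks = OrderedDict()          # tag -> data, oldest first

        def access(self, tag):
            if tag in self.blocks:
                self.blocks.move_to_end(tag)     # mark as most recently used
                return "hit"
            if len(self.blocks) == self.n_blocks:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[tag] = None              # load the new block
            return "miss"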
• The performance of hierarchical memory is
measured by its effective access time (EAT).
• EAT is a weighted average that takes into account
the hit ratio and relative access times of successive
levels of memory.
• The EAT for a two-level memory is given by:
EAT = H × AccessC + (1 – H) × AccessMM
where H is the cache hit rate and AccessC and AccessMM are
the access times for cache and main memory, respectively.
• For example,
– Main memory access time of 200ns
– Cache has a 10ns access time and a hit rate of 99%.
– Cache and main memory are read concurrently
• The effective access time (EAT) is:
0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns.
• For example,
– Main memory access time of 200ns
– Cache has a 10ns access time and a hit rate of 99%.
– Cache and main memory are read sequentially
• The effective access time (EAT) is:
0.99(10ns) + 0.01(10ns + 200ns)
= 9.9ns + 2.1ns
= 12ns.
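• Both computations, as a quick Python check:

    H, t_cache, t_mem = 0.99, 10, 200    # hit rate and access times (ns)
    eat_parallel   = H * t_cache + (1 - H) * t_mem              # 11.9 ns
    eat_sequential = H * t_cache + (1 - H) * (t_cache + t_mem)  # 12.0 ns
    print(eat_parallel, eat_sequential)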
• Caching depends upon programs exhibiting good locality.
– Some object-oriented programs have poor locality
owing to their complex, dynamic structures.
– Arrays stored in column-major rather than row-major
order can be problematic for certain cache
organizations.
• With poor locality, caching can cause performance
degradation rather than performance improvement.
• Cache replacement policies must account for
dirty blocks, those blocks that have been updated
while they were in the cache.
• Dirty blocks must be written back to memory. A
write policy determines how this will be done.
• There are two types of write policies, write through
and write back.
• Write through updates cache and main memory
simultaneously on every write.
• Write back (also called copyback) updates memory
only when the block is selected for replacement.
• The disadvantage of write through is that memory
must be updated with each cache write, which slows
down the access time on updates. This slowdown is
usually negligible, because the majority of accesses
tend to be reads, not writes.
• The advantage of write back is that memory traffic is
minimized, but its disadvantage is that memory does
not always agree with the value in cache, causing
problems in systems with many concurrent users.
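• A toy dict-based sketch of the two policies (the names and structure are ours, not a standard API):

    def write_through(cache, memory, addr, value):
        cache[addr] = value
        memory[addr] = value           # main memory updated on every write

    def write_back(cache, dirty, addr, value):
        cache[addr] = value
        dirty.add(addr)                # mark dirty; defer the memory update

    def evict(cache, dirty, memory, addr):
        if addr in dirty:              # dirty block: write it back now
            memory[addr] = cache[addr]
            dirty.discard(addr)
        cache.pop(addr, None)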
• The cache we have been discussing is called a
unified or integrated cache where both instructions
and data are cached.
• Many modern systems employ separate caches for
data and instructions.
– This is called a Harvard cache.
• The separation of data from instructions provides
better locality, at the cost of greater complexity.
– Simply making the cache larger provides about the same
performance improvement without the complexity.
• Cache performance can also be improved by
adding a small associative cache (a victim cache) to
hold blocks that have been evicted recently.
• A trace cache is a variant of an instruction
cache that holds decoded instructions for
program branches, giving the illusion that
noncontiguous instructions are really
contiguous.
• Most of today’s small systems employ multilevel
cache hierarchies.
• The levels of cache form their own small memory
hierarchy.
• Level 1 cache (8KB to 64KB) is situated on the
processor itself.
– Access time is typically about 4ns.
• Level 2 cache (64KB to 2MB) may be on the
motherboard, or on an expansion card.
– Access time is usually around 15–20ns.
• In systems that employ three levels of cache, the
Level 2 cache is placed on the same die as the CPU
(reducing access time to about 10ns)
• Accordingly, the Level 3 cache (2MB to 256MB)
refers to cache that is situated between the
processor and main memory.
• Once the number of cache levels is determined, the
next thing to consider is whether data (or
instructions) can exist in more than one cache level.
Questions?
• Memory hierarchy
• Cache memory
• Direct mapping
• Associative mapping