CO101
Principle of Computer
Organization
Lecture 15: Memory 1
Liang Yanyan
澳門科技大學
Macau University of Science and Technology
Computer Organization
• The CPU clock rate is much faster than the memory speed.
• Slow memory can dramatically reduce performance.
[Diagram: Processor (Control, Datapath) connected to Memory and Devices (Input, Output); the memory is organized as Cache, Main Memory, and Secondary Memory (Disk).]
Processor-Memory Performance Gap
Slow memory can reduce the performance of computer
systems.
[Figure: processor (µProc) performance improves ~55%/year (2X/1.5 yr, "Moore's Law"), while DRAM performance improves ~7%/year (2X/10 yrs); the processor-memory performance gap grows ~50%/year.]
The “Memory Wall”
• Processor vs DRAM speed disparity continues to grow
[Figure: clock cycles per instruction (Core) vs. clock cycles per DRAM access (Memory), on a log scale from 0.01 to 1000, from VAX/1980 through PPro/1996 to 2010+.]
Good memory hierarchy (cache) design is increasingly
important to overall performance
The Memory Hierarchy Goal
• Fact: Large memories are slow and fast memories are
small
• How do we create a memory that gives the illusion of
being large, cheap and fast (most of the time)?
• With a memory hierarchy
A Typical Memory Hierarchy
• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
[Diagram: a typical memory hierarchy. On-chip components: Control, Datapath, RegFile, Instr Cache, Data Cache, ITLB, DTLB; then Second-Level Cache (SRAM), Main Memory (DRAM), and Secondary Memory (Disk).]
Speed (cycles): ½'s → 1's → 10's → 100's → 10,000's
Size (bytes): 100's → 10K's → M's → G's → T's
Cost: highest → lowest
Memory Hierarchy Technologies
• Caches use SRAM for speed and technology compatibility:
• Fast (typical access times of 0.5 to 2.5 nsec).
• Low density (6 transistors per bit), higher power, expensive ($2000 to $5000 per GB in 2008).
• Static: content lasts "forever" (as long as power is left on).
• Main memory uses DRAM for size (density):
• Slower (typical access times of 50 to 70 nsec).
• High density (1 transistor per bit), lower power, cheaper ($20 to $75 per GB in 2008).
• Dynamic: needs to be "refreshed" regularly (~ every 8 ms).
• Refresh consumes 1% to 2% of the active cycles of the DRAM.
The Memory Hierarchy: Why Does it Work?
• Two Different Types of Locality:
• Temporal Locality (Locality in Time): If an item is referenced, it
will tend to be referenced again soon.
• Spatial Locality (Locality in Space): If an item is referenced,
items whose addresses are close by tend to be referenced soon.
• How can we take advantage of the principles of locality
to improve the performance of computer systems? At the
same time:
• Present the user with as much memory as is available in the
cheapest technology.
• Provide access at the speed offered by the fastest technology.
a = b + c;
d = c;
e = b + a;
for (i = 0; i < 100; i++) { a[i] = b[i] + c[i]; }

Make use of locality: add a cache
• Temporal locality
• The first time the CPU reads a datum D from main memory, D is also loaded into the cache.
• The next time the CPU requests D, D is loaded from the cache instead of main memory → reduced read latency.
• Spatial locality
• Assume an array D[10]. When the CPU reads D[0], it is likely to read the subsequent elements later, e.g. D[1], D[2], …
• So the first time the CPU reads D[0] from main memory, data close to D[0] are also loaded into the cache, e.g. addresses D+4, D+8, etc.
• The next time the CPU requests data close to D[0], e.g. the datum at address D+4, it is loaded from the cache instead of main memory → reduced read latency.
[Diagram: Processor ↔ Cache ↔ Main memory]

Characteristics of the Memory Hierarchy
[Diagram: L1$, L2$, Main Memory, and Secondary Memory, with access time and (relative) size increasing with distance from the processor. Transfer units: 4-8 bytes (word) between the processor and L1$; 8-32 bytes (block) between L1$ and L2$; 1 to 4 blocks between L2$ and Main Memory; 1,024+ bytes (disk sector = page) between Main Memory and Secondary Memory.]
• Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory.

Memory Hierarchy Levels
• Block (or line): the minimum unit of information that is present (or not) in a cache.
• A block may be multiple words.
• If the accessed data is present in the upper level:
• A cache hit occurs: the cache contains the data that the processor is looking for. Hits are good, because the cache can return the data to the processor much faster than main memory.
• If the accessed data is absent:
• A cache miss occurs: the cache does not contain the requested data. This is bad, since the processor has to load the data from the slow main memory.

Some Definitions
• There are two basic measurements of cache performance.
• The hit rate is the percentage of memory accesses that are handled by the cache.
• Hit Time: the time to access that level, which consists of the time to access the block + the time to determine hit/miss.
• The miss rate (1 - hit rate) is the percentage of accesses that must be handled by the slow main memory.
• Miss Penalty: the time to replace a block in that level with the corresponding block from a lower level, which consists of the time to access the block in the lower level + the time to transmit that block to the level that experienced the miss + the time to insert the block in that level + the time to pass the block to the requestor.
• Hit Time << Miss Penalty.
• Typical caches have a hit rate of 95% or higher, so in fact most memory accesses are handled by the cache, resulting in higher performance.

Important Issues
• Where can data be placed in the cache when we copy it from main memory to the cache?
• We move data from memory to the cache to make use of temporal and spatial locality.
• How do we locate data in the cache?
• This is related to the first issue.
• If the cache is full, how do we replace existing data?

Where can data be placed in the cache?
• Each memory location is mapped to a location in the cache using the least significant k bits of the address → direct-mapped cache.
• The least significant 2 bits are used in this example.
• E.g. the data at memory address 1110 will be stored at cache address (index) 10, and the data at memory address 0001 will be stored at cache address 01.
• Given a memory address A, the corresponding cache index C is: C = A mod 2^k.
• Multiple memory locations can be mapped to one cache location.
[Diagram: a 4-entry cache (indexes 00-11) beside a 16-entry main memory (addresses 0000-1111); the value 45 stored at one memory address is copied into the cache entry selected by the low 2 bits of its address.]

Load data from cache
• Since multiple memory locations can be mapped to one cache location:
• If we want to read data from memory address 0010, how can we know whether cache index 10 is storing the data for memory address 0010, 0110, 1010, or 1110?
• Each cache entry is associated with a tag that records the upper bits of the memory address whose data the cache location is currently storing.
[Diagram: a 4-entry cache with Index, Tag, and Data columns (tags 00, 11, 01, 01 at indexes 00-11). The full memory address of each cached block is tag + index: 00 + 00 = 0000, 11 + 01 = 1101, 01 + 10 = 0110, 01 + 11 = 0111.]

Is data in cache?
• Even if the Tag and Index match when reading data from the cache, how can we know whether the data is valid or not?
• At startup?
• When the CPU loads another program to execute?
• Add a valid bit:
• 0: the data is invalid for reading.
• 1: the data is valid and ready for reading.
[Diagram: the same cache with a Valid bit column (1, 0, 0, 1); entries with valid bit 0 are invalid regardless of their tag and data fields, so only 00 + 00 = 0000 and 01 + 11 = 0111 are valid.]

Processor read data from memory
• Assume the memory address is M bits wide.
• Search the cache:
• Use the lower K bits of the memory address as the index to locate the corresponding cache entry.
• Check whether the tag equals the upper (M-K) bits of the memory address.
• Check whether the valid bit is 1.
• If all of these are true → cache hit; return the data from the cache to the processor.
[Diagram: "CPU: I need the data at this address." A 32-bit address is split into a 22-bit tag and a 10-bit index; the index selects one of 1024 entries (0 to 1023), the stored tag is compared with the address tag, and the comparison result together with the valid bit produces the Hit signal while the data field goes to the CPU.]

Write data to cache (from processor to memory)
• To write data to the cache:
• Use the lower K bits of the memory address as the index to locate the corresponding cache entry.
• Store the upper (M-K) bits of the memory address into the tag field.
• Store the data into the cache's data field.
• Set the valid bit to 1.
• Different write policies will be discussed later.
[Diagram: "CPU: I want to write data to this address." The 32-bit address is again split into a 22-bit tag and a 10-bit index; the indexed entry's tag, data, and valid bit are updated.]