Digital Systems Design 4
Memory & Caches Part 1
Dr Stewart Smith
This Section
• Memory Access Locality
• Memory Hierarchy
• Cache Technology
‣ Directly Mapped Caches
‣ Misses and Writes
• Main Memory Technology
Classic Components of a Computer: Memory
The processor gets instructions and data from memory. Input writes data to
memory, and output reads data from memory. Control sends the signals that
determine the operations of the datapath, memory, input, and output.
Full Datapath

[Figure: the full processor datapath.]
Memory Technology
• Static RAM (SRAM)
‣ 0.5ns – 2.5ns, $2000 – $5000 per GB
• Dynamic RAM (DRAM)
‣ 50ns – 70ns, $20 – $75 per GB
• Magnetic disk
‣ 5ms – 20ms, $0.20 – $2 per GB
• Ideal memory
‣ Access time of SRAM
‣ Capacity and cost/GB of disk
Memory Technology

Current Technology          Speed    Size      Cost ($/bit)
SRAM                        Fastest  Smallest  Highest
DRAM                        (intermediate on all three)
Flash SSD or Magnetic Disk  Slowest  Biggest   Lowest

[Figure: hierarchy from the Processor down through Memory (caches),
Memory (main memory), and Memory/Storage (flash or disk).]
Binary Notation – Giga vs. “Gibi”

Decimal term  Abbrev.  Value   Binary term  Abbrev.  Value  % larger
Kilobyte      KB       10^3    Kibibyte     KiB      2^10   2%
Megabyte      MB       10^6    Mebibyte     MiB      2^20   5%
Gigabyte      GB       10^9    Gibibyte     GiB      2^30   7%
Terabyte      TB       10^12   Tebibyte     TiB      2^40   10%

The same applies to bits as to bytes: a gigabit (Gb) is 10^9 bits,
while a gibibit is 2^30 bits (1,073,741,824!).
Principle of Locality
• Programs access a small proportion of their
address space at any time
• Temporal locality
‣ Items accessed recently are likely to be accessed
again soon
‣ e.g., instructions in a loop, induction variables
• Spatial locality
‣ Items near those accessed recently are likely to be
accessed soon
‣ e.g., sequential instruction access, array data
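Spatial locality can be sketched in Python (illustrative only, not from the slides; pure-Python lists blunt the cache effect that contiguous arrays in C or NumPy would show):

```python
# Illustrative sketch of spatial locality.
# Row-major order visits elements in the order they sit in memory;
# column-major order strides across rows on every access.
N = 500
grid = [[1] * N for _ in range(N)]

row_major = sum(grid[i][j] for i in range(N) for j in range(N))  # sequential
col_major = sum(grid[i][j] for j in range(N) for i in range(N))  # strided

print(row_major == col_major)
# → True
```

Either order computes the same sum; on real hardware with contiguous 2-D arrays the row-major loop is typically much faster, because every cache block fetched is fully used before it is evicted.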
Taking Advantage of Locality
• Memory hierarchy
• Store everything on disk
• Copy recently accessed (and nearby) items
from disk to smaller DRAM memory
‣ Main memory
• Copy more recently accessed (and nearby)
items from DRAM to smaller SRAM memory
‣ Cache memory attached to CPU
Memory Hierarchy

[Figure: pyramid with the CPU at the top, then Level 1, Level 2, …, Level n.
Distance from the CPU in access time increases down the hierarchy, and so
does the size of the memory at each level.]
Memory Hierarchy Levels
• Block (aka line): unit of copying
‣ May be multiple words
• If accessed data is present in upper level:
‣ Hit: access satisfied by upper level
– Hit ratio: hits/accesses
• If accessed data is absent:
‣ Miss: block copied from lower level
– Time taken: miss penalty
– Miss ratio: misses/accesses
= 1 – hit ratio
‣ Then accessed data supplied from upper
level
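The ratio definitions above can be checked with made-up counts (the numbers here are hypothetical, chosen only to show the arithmetic):

```python
# Hypothetical counts for one level of the hierarchy
accesses = 1000
hits = 940
misses = accesses - hits

hit_ratio = hits / accesses      # hits / accesses
miss_ratio = misses / accesses   # equivalently 1 − hit ratio

print(hit_ratio, miss_ratio)
# → 0.94 0.06
```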
Cache Memory
• Cache memory
‣ The level(s) of the memory hierarchy closest to the CPU
• Cache: a safe place for hiding or storing things
• Given a sequence of accesses X1, …, Xn−1, Xn:
‣ How do we know if the data is present?
‣ Where do we look?
Direct Mapped Cache
• Location determined by address
• Direct mapped: only one choice
‣ (Block address) modulo (#Blocks in cache)
• #Blocks is a power of 2
• Use low-order address bits as the index
‣ e.g. 00101 → 101
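With a power-of-two block count, the modulo mapping is just keeping the low-order address bits, as in the 00101 → 101 example above (a small sketch):

```python
# Direct-mapped index: (block address) modulo (#blocks in cache)
num_blocks = 8            # 2^3 blocks → 3 index bits
block_addr = 0b00101

index = block_addr % num_blocks
assert index == block_addr & (num_blocks - 1)   # modulo == low-order bit mask

print(format(index, "03b"))
# → 101
```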
Tags and Valid Bits
• How do we know which particular block is
stored in a cache location?
‣ Store block address as well as the data
‣ Actually, only need the high-order bits
‣ Called the tag
• What if there is no data in a location?
‣ Valid bit: 1 = present, 0 = not present
‣ Initially 0
Cache Example 1
• 8 blocks, 1 word/block, direct mapped
• Initial state
‣ All valid bits = N (0, not present)
Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N
Cache Example 2

Word address  Binary address  Hit/miss  Cache index
22            10 110          Miss      110

Before: all entries invalid, as in Example 1. After:
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 3

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       ?         ?

Current state:
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 3

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010

After:
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 4

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010

Cache state (unchanged):
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 5

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011

After both accesses (the first miss fills index 000, the second index 011):
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 6

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Hit       000

Cache state (unchanged):
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 7

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       ?         ?

Current state:
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example 7

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010

Index 010 already held Mem[11010] (tag 11); the old block is evicted and
replaced. After:
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
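The access sequence of Examples 2–7 can be replayed with a small direct-mapped cache sketch (illustrative Python, not from the slides; 8 one-word blocks, as above):

```python
# Direct-mapped cache simulator mirroring the worked examples.
def simulate(addresses, num_blocks=8):
    valid = [False] * num_blocks     # valid bits, initially 0 (N)
    tags = [None] * num_blocks       # stored tags
    results = []
    for addr in addresses:
        index = addr % num_blocks    # low-order bits select the block
        tag = addr // num_blocks     # high-order bits stored as the tag
        if valid[index] and tags[index] == tag:
            results.append("Hit")
        else:
            results.append("Miss")   # fetch block, update valid bit and tag
            valid[index], tags[index] = True, tag
    return results

# Word addresses from Examples 2–7
print(simulate([22, 26, 22, 26, 16, 3, 16, 18]))
# → ['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Miss', 'Hit', 'Miss']
```

The final miss is address 18 evicting address 26 from index 010, exactly as in Example 7.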
Address Subdivision

[Figure: direct-mapped cache datapath with tag comparator.
1024 blocks = 2^10, so the index is 10 bits.]
Address Structure

• How can you go from the address structure to the cache structure?
‣ What is the block size (in words or bytes)?
‣ How many blocks/lines are in the cache?

Bits:   31 … 14   13 … 6   5 … 2          1 0
Field:  Tag       Index    Block offset   Byte offset
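Reading the field widths off the example address above gives the cache structure directly (a sketch; assumes 4-byte words):

```python
# Field widths from the example address layout:
# tag bits 31–14, index bits 13–6, block-offset bits 5–2, byte-offset bits 1–0
index_bits = 13 - 6 + 1        # 8
offset_bits = 5 - 2 + 1        # 4

num_lines = 2 ** index_bits             # blocks/lines in the cache
words_per_block = 2 ** offset_bits      # block size in words
bytes_per_block = words_per_block * 4   # assuming 4-byte words

print(num_lines, words_per_block, bytes_per_block)
# → 256 16 64
```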
Example of FastMATH Cache

[Figure: Intrinsity FastMATH cache organisation.]
Choice of Block Size
• Larger blocks exploit spatial locality
‣ Contiguous data in arrays for example
• However the benefits reduce as block size
increases
‣ Very large blocks mean fewer blocks in cache
and an increased miss rate
‣ Large blocks also increase the miss penalty of
retrieving the block from the next level
Cache Misses (Reads)
• On cache hit, CPU proceeds normally
• On cache miss
‣ Stall the CPU pipeline
‣ Fetch block from next level of hierarchy
‣ Instruction cache miss
– Restart instruction fetch
‣ Data cache miss
– Complete data access
Write-Through
• On data-write hit, could just update the block in
cache
‣ But then cache and memory would be inconsistent
• Write through: also update memory
• But it makes writes take longer
‣ e.g., if base CPI = 1, 10% of instructions are stores, and a
write to memory takes 100 cycles
– Effective CPI = 1 + 0.1×100 = 11 (more than a 10× slowdown!)
• Solution: write buffer
‣ Holds data waiting to be written to memory
‣ CPU continues immediately
– Only stalls on write if write buffer is already full
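The effective-CPI figure above is a one-line calculation (numbers from the slide):

```python
# Effective CPI with write-through and no write buffer
base_cpi = 1.0
store_fraction = 0.10    # 10% of instructions are stores
write_penalty = 100      # cycles to write to memory

effective_cpi = base_cpi + store_fraction * write_penalty
print(effective_cpi)
# → 11.0
```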
Write-Back
• Alternative: On data-write hit, just update
the block in cache
‣ Keep track of whether each block is dirty
• When a dirty block is replaced
‣ Write it back to memory
‣ A write buffer lets the replacing block be read before the
dirty block is written back
Write Allocation
• What should happen on a write miss?
• For write-back
‣ Usually fetch the block
‣ Need to make sure we don’t lose changed data
• Alternatives for write-through
‣ Allocate on miss: fetch the block
‣ Write around: don’t fetch the block
– Since programs often write a whole block before
reading it (e.g., initialization)
Write Through

[Figure: write-through policies — no allocation (write around) vs. write allocation.]
Write Back

[Figure: write-back policy with write allocation.]
Main Memory Supporting Caches
• Use DRAMs for main memory
‣ Fixed width (e.g., 1 word)
‣ Connected by fixed-width clocked bus
– Bus clock is typically slower than CPU clock
• Example cache block read
‣ 1 bus cycle for address transfer
‣ 15 bus cycles per DRAM access
‣ 1 bus cycle per data transfer
• For 4-word block, 1-word-wide DRAM
‣ Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
‣ Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle
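The miss-penalty arithmetic above can be checked directly (cycle counts from the slide):

```python
# Cache-block read over a 1-word-wide DRAM bus, 4-word block
addr_cycles = 1     # one bus cycle to send the address
dram_cycles = 15    # bus cycles per DRAM access
xfer_cycles = 1     # bus cycle per word transferred
words = 4

miss_penalty = addr_cycles + words * dram_cycles + words * xfer_cycles
bandwidth = 16 / miss_penalty   # 16 bytes in the block

print(miss_penalty, round(bandwidth, 2))
# → 65 0.25
```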
DRAM Technology
• Data stored as a charge in a capacitor
‣ Single transistor used to access the charge
‣ Must periodically be refreshed
– Read contents and write back
– Performed on a DRAM “row”
DRAM Performance Factors
• Row buffer
‣ Allows several words to be read and refreshed in parallel
• Synchronous DRAM
‣ Allows for consecutive accesses in bursts without
needing to send each address
‣ Improves bandwidth
• DRAM banking
‣ Allows simultaneous access to multiple DRAMs
‣ Improves bandwidth
Advanced DRAM Organisation
• Bits in a DRAM are organised as a
rectangular array
‣ DRAM accesses an entire row
‣ Burst mode: supply successive words from a row
with reduced latency
• Double data rate (DDR) DRAM
‣ Transfer on rising and falling clock edges
• Quad data rate (QDR) DRAM
‣ Separate DDR inputs and outputs
Row Hammer
• Activating a row in DRAM can cause
fluctuating potentials in adjacent rows
• Rapidly cycling the activation can cause
‘bit flips’
• Worse effects seen in more advanced
memory from DDR3 onwards
• Exploits have been demonstrated which
can bypass security on computers/mobiles
DRAM Generations

[Figure: DRAM access time in ns (log scale, 1–1000) vs. year, 1980–2018,
showing TRAC and TCAC both falling over time.]

Year  Capacity  $/GiB
1980  64 Kib    $6,480,000
1983  256 Kib   $1,980,000
1985  1 Mib     $720,000
1989  4 Mib     $128,000
1992  16 Mib    $30,000
1996  64 Mib    $9,000
1998  128 Mib   $900
2000  256 Mib   $840
2004  512 Mib   $150
2007  1 Gib     $40
2010  2 Gib     $13
2012  4 Gib     $5
2015  8 Gib     $7
2018  16 Gib    $6

TRAC – time to access a new row/column
TCAC – time to access a column in the active row
Increasing Memory Bandwidth
• 4-word wide memory
‣ Miss penalty = 1 + 15 + 1 = 17 bus cycles
‣ Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
• 4-bank interleaved memory
‣ Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
‣ Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
The primary method of achieving higher memory bandwidth is to increase the physical
or logical width of the memory system.
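The three miss penalties above follow from the same three cycle counts, varying only how many DRAM accesses and transfers overlap (a sketch using the slide's numbers):

```python
# Miss-penalty comparison for a 4-word block
def penalty(addr=1, dram=15, xfer=1, dram_accesses=1, transfers=1):
    return addr + dram_accesses * dram + transfers * xfer

narrow = penalty(dram_accesses=4, transfers=4)  # 1-word-wide DRAM, serial
wide = penalty()                                # 4-word-wide memory and bus
interleaved = penalty(transfers=4)              # 4 banks accessed in parallel,
                                                # words transferred one per cycle
print(narrow, wide, interleaved)
# → 65 17 20
```

Interleaving gets most of the benefit of a wide memory while keeping the narrow (cheaper) bus.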
Next Section
• Cache Performance
• Associative Caches
• Multilevel Caches
• Real examples