程序代写代做代考 C assembly algorithm flex mips go arm graph clock cache computer architecture CSC 230: Memory

CSC 230: Memory
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 1

Cache Memory
• Memory-system characteristics • Cache Memory principles
• Aspects of cache-memory design
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 2

On the face of it…
• Computermemorylookslikeasimpleconcept:
– There is some device which stores the value of a bit…
– … for retrieval at a later point in time (nanoseconds, second, days, etc.).
• Inreality,computermemoryisoneofthe strangest parts of a computer system
• Exhibitsthewidestrangeof: – “types”
– technologies – performance – cost
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 3

1951: Delay Line memory
Sonic pulse created at one end of a tube of mercury, and is detected
CSC 230: Computer Architecture and Assembly Language at the othUneivrersietynofdV.ictorPiaresent of absence of pulse is the value of a bit.
(A memory leak was literal occurrence!) Picture scale: about 6 feet
Department of Computer Science
Cache Memory: Slide 4
Images: http://bit.ly/2F27090, http://bit.ly/2GSLQdJ

1955: Magnetic core memory
Depending upon currents running through two lines, a core (i.e., metal
toroid) could be magnetized clockwise or counterclockwise. Depending
upon the direction, that core stores a 0 or a 1.
Picture on left compare 8 bytes with 8GB.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 5
Images: http://bit.ly/2EXE2ey, http://bit.ly/2t6zaye, http://bit.ly/2GPNOf2

1725: Paper tape, punched cards
Read-only form of memory, adopted in the 1950s from other industries
as a cheap form of permanent storage. Shown: tape-controlled loom from
1725; compositor at a keyboard recording letters on paper tape for
CSC 230: Computer Architecture and Assembly Language later casUtniivenrsgity oof Vfictomriaetal type using hot lead (around 1900); stack of
punched cards (1960s).
Cache Memory: Slide 6
Department of Computer Science
Images: http://bit.ly/2CqG2tP, http://bit.ly/2CpQeTz, https://ibm.co/2EXHmX4

Memory system characteristics
1. location
2. capacity
3. unit of transfer
4. access method
5. performance
6. physical type
7. physical characteristics
8. organization
University of Victoria Department of Computer Science
• Computermemoryisabig topic
• We’llorganizeourcoverage by using eight key characteristics to describe memory systems
• Coverageofmaterialwill by necessity be light.
• (Butifthereistimewecan go into a deep-dive about flash memory)
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 7

1. Location
• Is the memory internal to the computer …
• … or external?
• Main memory is internal…
– … where “internal” refers more to the nature of the address/data buses used to read/write such memory.
– … but there may be other forms of internal memory
• External storage:
– usually peripheral storage (disk, SSD, tape)
– normally accessed by the CPU via I/O controllers
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 8

2. Capacity
• A simple statement of either:
– number of words available in the memory device – number of bytes
• Note that word has different definitions
– For our course, a MIPS words is four bytes (32 bits)
– In other architectures (i.e., AVR as used in the Arduino Mega), a word is two bytes (16 bits)
– Some earlier computers had a variable word size!
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 9

3. Unit of Transfer
• Determinedlargelyby:
– Number of electrical lines leading into and out of memory module
• Relatedconcept:addressableunits
– Unit of memory addressed at level of byte? of word? or
something even bigger?
• Forinternalmemory:
– Number of bits read out of / written into memory at one time
• Forexternalmemory:
– Data transferred in units larger than a word (sometimes
called blocks)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 10

4. Access method
• Sequential access:
– Access is made in a specific linear sequence
– Usually consequence of physical medium
• Random access:
– Each memory location can be accessed independent of others at no extra cost (time & energy)
• Direct access:
– Location of memory block determined by location
– Searching, counting, waiting used to reach final location
• Associative
– Word retrieved based on the
contents of some key
University of Victoria Department of Computer Science
Diagram, image: http://bit.ly/2oEPPDd, http://bit.ly/2t3AeCJ
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 11

5. Performance
• Three parameters are used: access time, memory-cycle time, transfer rate
a) access time (latency)
– for random access: duration of time from the instant an address is presented to the instant the data is available for us
– for other memory types: time to position read/write mechanism at needed location
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 12

5. Performance
b) memory cycle time
– Access time…
– … plus additional time for electrical transients to die out on signal lines
– This is almost exclusively concerned with the system bus
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 13

5. Performance
c) transfer rate
– Rate at which memory can be transferred into or out of a memory unit
! ” = ! $ + ‘&
– Tn = average time to read or write n bits
– TA = average access time
– n = number of bits
– R = transfer rate in bits per second (bps)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 14

5. Performance
c) transfer rate
– Rate at which memory can be transferred into or out of a memory unit
! ” = ! $ + ‘&
– Tn = average time to read or write n bits
– TA = average access time
– n = number of bits
– R = transfer rate in bits per second (bps)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 15

(wee bit more about transfer rate)
• Foraharddrive:
– TA = time to position arm over correct track, rotational latency for sector to arrive under read/write head
– R = number of bytes transferred per second with read/write head in position
• ForDynamicRAM
– TA = time from when the appropriate page of memory cells is first “strobed” to when bytes from that page can be read/written
– R = number of bytes transferred per second with needed page active
– Note: Usually there exist many banks of DRAM working together in parallel!
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 16

512MByte ECC DIMM memory module
Image: https://bit.ly/2AOMZWF
Each of eight of the chips correspond to 512 MBits of storage
These eight chip contributes 8 bits to the memory module’s data width Therefore capacity = 8 * 512MBits = 512 MBytes
One of the nine chips is used to store an error-correcting code (which
University of Victoria
might invoDelpvaretmetnthofeComupsuter Scoiefnceparity).
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 17

6. Physical type
Linear-TapUneiverOsiptyeofnVic(torLiaTO) format tape & tape unit.
(LTO-4: 0.8 TB per tape; LTO-8: 12 TB per tape)
Cache Memory: Slide 18
Department of Computer Science
CSC 230: Computer Architecture and Assembly Language
• Most common today:
– semiconductor memory (SRAM, DRAM, flash, etc.) – magnetic surface memory
– tape, optical and magneto-optical
Image: http://bit.ly/2EYB6hA

7. Physical characteristics
• Volatile memory
– Informationthatdecays“naturally”…
– …orislostwhenelectricalpowerisremoved
• Non-volatile memory
– oncerecorded,informationremainswithoutdeterioration…
– …althoughatimescaledoesneedtobeindicated(i.e.,non- volatile does not necessarily mean archival)
• Semiconductor memory: some non-volatile examples
– Non-erasableversion:read-onlymemory(ROM)
– Re-writableversion:EEPROM(ElectricallyErasable Programmable Read-Only Memory)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 19

8. Organization
• How are bits physically arranged to form words?
– Banks
– Ranks
– Pages
– NAND, NOR memory (for flash memory)? – etc. etc.
– Answer is not always obvious
• Other bits in addition to data bits? – error-detection?
– error correction?
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 20

So far…
• Have now seen eight different dimensions in which to characterize memory:
– location,capacity
– unitoftransfer,accessmethod,performance – physicaltype,physicalcharacteristics
– organization
• Of course, memory is always part of a computer system
• Let us now look at what we know is true about memory
in computer systems
– Memoryhierarchy
– Localityofreference
– Cachingasmethodtoovercomememory-hierarchy problems by exploiting locality of reference
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 21

Memory Hierarchy
• We can summarize the previous eight characteristics in the ways they constrain a computer-system designer
– How much memory of a certain type can be obtained (capacity)?
– How fast is that memory (access time)?
– How expensive is it (cost per bit)
• And there are tradeoffs
– Faster access time à greater cost per bit
– Greater capacity à smaller cost per bit
– Greater capacity à slower access time
• This is a dilemma!
– And the only way out is to not rely on a single memory component
or technology
– Rather, we have a hierarchy of memory components which we exploit
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 22

Memory hierarchy
Diagram isUnivderesiftyiofnVictoreialy not to scale! Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 23

Memory Hierarchy
• Aswegodownthehierarchy,weseethat:
a) cost per bit decreases
b) capacity increases
c) access time increases (i.e., slower)
d) decreasing frequency of access of the memory by the processor
• Idea:
– small, more expensive, faster memories…
– … are supplemented by larger, cheaper, slower memories.
• Weneedtoexplore,however,themeaningofd) above…
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 24

Locality of reference
• During the execution a program:
– memoryreferencesaremadebytheprocessor(i.e.,reads and writes)…
– …forbothinstructionsanddata…
– …andthesereferencestendtocluster.
• The idea:
– Aprogramentersaprocedure,withitsloops,etc.
– Thesameprogrammayalsoaccessalotofdatainanarray (i.e., a row, or a column, or a slice).
– Thesereferencesareclusteredintoasmallerrangeof addresses than the whole program.
– Overashortwindowoftime,aprocessortendstowork within only a few clusters of addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 25

Locality of reference
Diagram shows a snapshot of some
program’s memory-reference
behavior. (Think of pages as blocks
of addresses.)
As time passes, the traces showing
mem-ref clusters also change
somewhat.
And at any one point in time, many
parts of the program’s memory are
not referenced.
University of Victoria Department of Computer Science
Image: Stallings, 10th ed. figure 4.2
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 26

Locality of reference
• We can exploit locality of reference by:
– Detecting when mem-ref clusters change…
– … and moving those clusters’ contents to faster memory (i.e., level 1) …
– … with the knowledge that most CPU mem-refs will be to this level 1 memory.
• As program behavior changes:
– old clusters are no longer valid…
– … and these clusters are moved into slower memory (level 2) …
– … in order to free up memory in faster memory (level 1).
• If the hit ratio is good (i.e., large fraction of mem-refs satisfied by level 1 references)…
– … the system’s performance improves.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 27

Locality of reference
If mem-ref can be satisfied from Level 1 memory: time for reference is T1
If mem-ref cannot be satisfied, from Level 1 (and ignore for now how that is determined), the mem- ref’s data is first obtained from Level 2 (required time T2) and is copied into, and the mem-ref restarted.
If level 1 access time is 0.01 μs (microseconds), and level 2 access time is 0.1 μs, and H (hit ratio) = 0.95, what is the average access time for a mem-ref?
T1 + T2 T2
T1
01
Fraction of accesses involving only Level 1 (Hit ratio)
Image: Stallings, 10th ed. figure 4.2
gure 4.2 Performance of a Simple Two-Level Memory
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 28
University of Victoria Department of Computer Science
Average access time
i

Caching
• The insights regarding locality of reference motivate the concept of a cache
– Combine the memory-access time of expensive, smaller, high speed memory…
– … with the much larger memory size of cheaper, larger, lower-speed memory.
• We can even expand out the cache itself into several levels
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 29

Cache in relation to main memory
A single cache
CPU CPU
Cache Cache
Main Memory Main Memory
CPU CPU
Level 1
(L1) cache Level 1
(L1) cache
Level 2
(L2) cache Level 2
(L2) cache
Main
Memory Main
Memory
A multi-level cache
(here: three levels)
Level 3
(L3) cache Level 3
(L3) cache
Word Transfer Word Transfer
Block Transfer Block Transfer
Slow Slow
(a) Single cache (a) Single cache
Fast Fast
Fastest Fastest
Fast Fast
Less Slow fast
Less Slow fast
Image: Stallings, 10th ed. figure 4.3
(b) Three-level cache organization
(b) Three-level cache organization Figure 4.3 Cache and Main Memory
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 30
University of Victoria
Figure 4.3 Cache and Main Memory
Department of Computer Science

Cache structure
• Let’s assume here that memory is referred to by its word address.
– Ifmemoryaddressesrefertobyteaddresses,thenitlooks like “word” = “byte” for that architecture
• We can view memory as a structure with addressable words
– Ifanaddresshasnbits,thenthenumberofwordsis2n
• We also view memory as consisting of non-overlapping
blocks of K words
– ” is the number of blocks in memory
– Therefore ” = 2%/’
• A cache consists of M blocks called lines – EachlineisKwordsinsize
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 31

Cache in relation to main memory
Main memory
)
–1
1
2
Block Length (K Words)
Cache
(a) Cache
Note that each line has slightly more memory in that line than just the block stored in it. That is, a tag is necessary to help identify which main-memory
block is in the line, and this tag increases the
size of the line.
2n – 1
Line
Number Tag 0
2
C –1
2 3
3
Line Number
Tag Block
Block
Memory address
Memory address
0
1
2
Block 0
00 11
Block 0 (K words)
(K words
C –1
Block Length (K Words)
(a) Cache
Image: Stallings, 10th ed. figure 4.4
(b) Main memory
Word Length
(b) Main memory
Block M
2n – 1
CSC 230: Computer Architecture and Assembly Language
Block M – 1
University of Victoria Department of Computer Science
Cache Memory: Slide 32
Word Figure 4.4 Cache/Main-Memory Structure Length
Figure 4.4 Cache/Main-Memory Structure

Cache read operation
Memory Read Address (RA) is received from the CPU
if (block containing RA is in a cache line) {
fetch RA word contents from cache;
} deliver word contents to CPU
else {
access main memory for block containing RA;
allocate cache line for main memory block;
parallel-tasks-begin {
// cache HIT
// cache MISS
[task A] load main-memory block contents into cache line;
[task B] deliver word contents to CPU;
} }
University of Victoria CSC 230: Computer Architecture and Assembly Language Department of Computer Science Cache Memory: Slide 33
• Whatfollowspseudocodedescription
– Remember that all this is done in hardware!
– All this is phrased from the “point of view” of the memory system

Typical cache organizatoin
Processor
Cache
Note the address & data buffers. These are disabled when a cache hit occurs (i.e., memory reference is completed from cache).
Address
Control
Address buffer
Control
Data buffer
Data
University of Victoria
CSC 230: Computer Architecture and Assembly Language
Department of Computer Science
Cache Memory: Slide 34
Figure 4.6
Typical Cache Organization
Image: Stallings, 10th ed. figure 4.6
System Bus

Cache design elements
1. cache addresses
2. cache size
3. mapping function
4. replacement algorithm
5. write policy
6. line size
7. number of caches
University of Victoria Department of Computer Science
• Inordertoexploreseveral cache implementation ideas…
• …weneedto simultaneously examine basic cache design elements.
• Thesehelptoclassifyand differentiate cache architectures
• Themostinvolvedofthese is the mapping function
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 35

1. Cache addresses
• This refers rather to the kind of memory references (to data and instructions) given to the cache
– Are they physical addresses (i.e., like those generated by an AVR processor)…
– or are they virtual addresses as generated by a laptop or desktop processor (i.e., running macOS, Unix, Windows, etc.)?
• Virtual addresses imply virtual memory
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 36

The code within a process sees a single address space.
That single address space, however, is not real (i.e., it is virtual). It is an illusion implemented by the computer + operating system.
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 37
In reality, different parts of the virtual address space are stored in real memory, and some (not needed at present) are stored on disk.
Only one virtual address space is shown here, but there are as many virtual address spaces as there are processes.

1. Cache addresses
• Two different kinds of addresses:
– Logical addressàgenerated in a virtual address space
– Physical addressàused to access real RAM
• To implement virtual memory, a memory management unit (MMU) is needed
– At run time, the MMU very quickly converts a logical address into a physical address
– Does this for the currently-running process
• If the cache is meant to use logical address mem-refs:
– then the address goes directly to the cache (called a logical cache) • If the cache is meant to use physical addresses:
– The logical address is first run through the MMU…
– … and then the resulting physical address is given to the cache
– … when is then called a physical cache
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 38

Logical vs. physical cache
Processor
Processor
MMU
Main memory
Main memory
Cache
logical cache
Processor
Processor
MMU
Main memory
Main memory
physical cache
Cache ysical addre
Logical address
Cache Logical address
MMU Physical address Physical address
Data
Data
(a) Logical Cache
(a) Logical Cache
Logical address
Logical address
Physical address
MMU Ph Data
ss
University of Victoria Department of Computer Science
Data
(b) Physical Cache
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 39
(b) Physical Cache
Cache

2. Cache size
• Wewant:
– a small enough cache (to keep costs down)
– but have it large enough (to keep hit ratio high)
• Anotherdilemma:
– the larger the cache, the slower it gets (for the same technology)
– and there may be other hard limits regarding chip layouts
• Cacheperformanceisverysensitivetonatureof workload
– We will see more about this when we reach the part of the course dealing with computer-system performance and benchmarks
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 40

2. Cache size
Processor
Type
Year
L1 Cache
L2 Cache
L3 Cache
IBM 360/85
mainframe
1968
16-32 kB


VAX 11/780
minicomputer
1978
16 kB


IBM 3090
mainframe
1985
128/256 kB


Pentium
PC
1993
8/8 kB
256-512 kB
PowerPC G4
PC/server
1999
32/32 kB
256 kB to 1 MB
2 MB
Pentium 4
PC/server
2000
8/8 kB
256 kB
IBM z10
mainframe
2008
64/128 kB
3 MB
24-48MB
Intel Core
i7-7700K
PC/server
2016
128 kB
1 MB
(per core)
8 MB
split numbers (with a “/”) are for architectures with separate instruction & data caches
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 41

3. Mapping function
• Cache is, by definition, smaller than main memory
– Each cache line holds a main memory block
– Therefore there are far fewer cache lines than main memory blocks
• The problem:
– When bringing in the contents of a main-memory block to cache…
– … how do we determine the cache line into which to store a main memory block?
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 42

Mapping: three main techniques
A. Direct mapping: the easiest to understand and implement, but the least flexible.
B. Associative mapping: the most flexible, but also the most expensive / most complex to implement
C. Set-associative mapping: an attempt to combine the best features of direct mapping with the flexibility of associative mapping
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 43

A. Direct Mapping
• Central idea: Each block of main memory must be mapped into exactly one cache line (where each cache line is size w)
i=j%m
where:
i = cache line number
j = main-memory block number m = number of lines in the cache
• Normally quantities such as the block size w and cache lines m are powers of 2
• Let’s also make a little adjustment to the way we diagram main memory…
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 44

Modified view of memory
Red byte is stored as offset 5 in the block.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 45

w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 46

w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 47

w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 48

w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 49

Again:
w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 50

Again:
w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 51

Again:
w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 52

Again:
w is 8 m is 16
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 53

A change:
w is 16 m is 64
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 54

A change:
w is 16 m is 64
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 55

A change:
w is 16 m is 64
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 56

A change:
w is 16 m is 64
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 57

Yet another change:
w is 8 m is 32
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 58

Yet another change:
w is 8 m is 32
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 59

Yet another change:
w is 8 m is 32
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 60

Yet another change:
w is 8 m is 32
1. Cache addresses
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 61

An observation
• Normally the mapping scheme is chosen by the computer architect
• Also: Normally the size of blocks and number of lines are chosen by the computer architect
• That is: The caching scheme is not something the user can configure!
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 62

Direct-mapping scheme
Cache
(64 bytes in size)
Blue block: bytes from 0x0001efe0 to 0x0001efe7 Main memory block # (j): 0x003dfc
Number of lines in cache (m): 8
i = j % m = (003dfc & 0x7) = 4
tag = 0x7bf
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 63
University of Victoria Department of Computer Science

Direct mapping: Important quantities
• Numberofbitstoidentifybyteinblock:w • Addresslength:s+wbits
– If architecture is 32-bits, and w = 3, then s = 29
• Number of addressable bytes: 2s+w bytes
• Size of a block = 2w bytes
• Numberofblocksinmainmemory:(2s+w/2w) – Also: 2s
• Numberoflinesincache:m=2r • Size of cache: 2r+w bytes
• Sizeoftag:(s–r)bits
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 64

Direct-Mapping Cache Organization
Note in this diagram that blocks (both main memory and cache lines) are shown vertically (rather than horizontally).
Tag
Line
Word
w
(hit in cache)
WO
W1
W2
W3
W4j
w
W(4j+1)
W(4j+2)
W(4j+3)
(miss in cache)
s+w
Cache
Tag Data
Main Memory
Memory Address
r
B0
s–r
L0
s–r
s
Compare
w Li B j
1 if match
0 if no match
0 if match
1 if no match
Lm–1
Image: Stallings, 10th ed. figure 4.9
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 65
University of Victoria Department of Computer Science
Figure 4.9 Direct-Mapping Cache Organization

A. Direct Mapping
• What happens when cache line to which the CPU’s memory address is mapped contains a different block?
– Thatis,whathappenswhenthetagofthememory-address doesn’t match the tag of the block currently in the cache?
• Simple answer: this is a cache miss
• More complete answer depends upon the write-
– Wewilleventuallylookatsuchpolicies…
– …butfornow,weassumethecurrentcontentsinthecache line are stored in main memory…
– …suchthatwecanreplacethatcachelinewithblockof current memory address.
through policy
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 66

0000
0000
0000
0000
0001
0001
0001
1111
1111
1111 1111
0000
0000
0000
0000
0110
0110
0110
0110
1111
1111
1111 1111
0000
0000
1111
1111
0000
0000
0000
0000
1111 1111
0000
0000
1111
1111
0000
0000
0000
0000
1111 1111
0000
1111
1111
0000
0000
1111
0000
0000
1111 1111
0000
0100
1000
1100
0000
0100
T
0000
0000
ag
0000
0000
0000
Line +
0000
Word
0000
0100
In this example:
s + w = 24 bits w = 2 bits
r = 14 bits
tag is most-significant 8 bits (i.e., s – r – w)
0000
13579246
0000
0000
0000
Data
1357924
0000
6
77777777
00 000000001111111111111000 00
000000001111111111111100
00 13579246
16 11235813
Line Tag Data Number
00 13579246 0000 16 11235813 0001
11235813
16 000101100000000000000000 77777777
16 000101101111111111111100 12345678 16
16 000101100000000000000100 11235813 FEDCBA98
16 FEDCBA98 FEDCBA98 0CE7
11223344 3FFE 12345678 3FFF
0001
0011
0011
1001
1100
16 000101100011001110011100 FEDCBA98
16 FF
1100
FF 111111110000000000000000
FF 11223344
8 bits 32 bits
1111
1111
FF 111111110000000000000100 12345678
16-Kline cache
16 12345678
0000
0100
1000 1100
1111
1111
1111
1111
1111
Main memory address =
Figur
111111
11
1000
11223344 24682468
1111
111111
11
1100
11223344
24682468
8 bits
e 4.10 Direct
Tag (hex)
00 00
00 00
16 16
16
16
FF FF
FF FF
Tag
Line + Word
Data
Main memory address (binary)
Tag (hex)
00 00
Line
Number
0000 0001
0CE7
3FFE 3FFF
Main memory address (binary)
Tag
Data
FF FF
8 bits
Note: Memory address values are in binary representation;
other values are in hexadecimal
32 bits 16-Kline cache
32 bits 16-MByte main memory
Tag
Line
14 bits
Mapping Example
WordNote the two pairs of blocks 2 bits
mapping to same cache line.
Note: Memory address values are in binary representation;
Image: Stallings,
other values are in hexadecimal
10th ed. figure 4.10
University of Victoria
32 bits 16-MByte main memory
CSC 230: Computer Architecture and Assembly Language
Department of Computer Science
Cache Memory: Slide 67

A. Direct Mapping
• Benefits: – simple
– inexpensive to implement
• Main disadvantage:
– If programs happens to reference words/bytes from two different blocks, one after the other …
– … and where the two blocks maps to the same cache line …
– … then the blocks will be repeatedly “swapped” into and out of the cache.
– This phenomenon is known as thrashing
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 68

B. Associative Mapping
• Each memory block may be loaded into any line of cache
– Thetaginacachelinenowuniquelyidentifiesablockin memory
• Eliminates the main disadvantage of direct mapping.
• To determine if a block is in the cache…
– …thecachelogicsimultaneouslyexamineseveryline’stag for a match (i.e., comparisons performed on all tags in parallel)
– This“lookup”isdoneinhardware(i.e.,itisveryfast).
– Resultoflookupisthe#ofthecachelineholdingblock(if
block is in the cache)
– Sometimesmemorysystemssupportingthesekindsof lookups are labeled as associative memory or content- addressable memory
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 69
– (Thisisalittlebitlikeahardwareimplementationofahash
table.)
University of Victoria
Department of Computer Science

Fully-Associative Cache Organization
W0
W1
W2
W3
W4j
W(4j+1)
W(4j+2)
W(4j+3)
Tag
Word
(hit in cache)
(miss in cache)
s+w
Image: Stallings, 10th ed. figure 4.11
Main Memory
Memory Address
Cache
Tag Data
s
L0
B0
w
w Lj
s w
Compare
1 if match
0 if no match
0 if match
1 if no match
Bj
s
Lm–1
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 70
Figure 4.11 Fully Associative Cache Organization

0000
0000
0000
0000
0000
00
00
0000
0000
0000
0000
0000
01
00
579246
0001
0110
0011
0011
1001
10
16
16
0
235813 DCBA98
In this example:
s + w = 24 bits w = 2 bits
tag is most-significant 22 bits (i.e., s – w)
Tag
3FFFFE
058CE7
Line Number
13579246 0000 11235813 0001
FEDCBA98 0CE7
32 bits
Data
11223344
FEDCBA98
FEDCBA98
33333333
13579246
24682468
00 16
16
FF 16
0001
0110
0011
0011
1001
11
00
0001
0110
0011
0011
1010
00
16
00
000000
000000
000000
000000
000101 0
000101
00
111111
111111
111111
111111
345678
11223344 3FFE 12345678 3FFF
3FFFFD
ts
16-Kline cache
3FFFFF
1111
1111
1111
1111
1111
01
00
1111
1111
1111
1111
1111
10
00
1111
1111
1111
1111
1111
11
00
Main memory address (binary)
Tag (hex)
000000
000001
Tag
Word Data
13579246
Main memory address (binary)
Tag
(hex) Tag Line + Word Data
0000000000000000000013 00000000000000000100
00001111111111111000 00001111111111111100
100000000000000000 77777777 100000000000000100 11
Tag
Data
Line
Number
0000 0001
3FFD
3FFE
3FFF
058CE6
058CE7
058CE8
FEDCBA98
0101100011001110011100 FE
16 000101101111111111111100 12 FF110000000000000000
FF110000000000000100
FF11111111111111100011223344 FF11111111111111110024682468
000000
Note: Memory address values are in binary representation;
other values are in hexadecimal
8 bi
32 bits 16 Kline Cache
22 bits
Main memory address =
32 bits 16-MByte main memory
Tag
8 bits
Line
14 bits
Word
2 bits
3FFFFD
3FFFFE
3FFFFF
33333333
11223344
24682468
32 bits
16 MByte Main Memory
Figure 4.10 Direct Mapping Example
Note: Memory address values are
in binary representation;
other values are in hexadecimal
University of Victoria
Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 71
Image: Stallings, 10th ed. figure 4.12

Associative mapping: Important quantities
• Numberoflinesincache:undetermined • Sizeofcache:undetermined
• Size of tag: s bits
changed from direct mapping
• Numberofbitstoidentifybyteinblock:w • Addresslength:s+wbits
– If architecture is 32-bits, and w = 3, then s = 29
• Number of addressable bytes: 2s+w bytes
• Size of a block = 2w bytes
• Numberofblocksinmainmemory:(2s+w/2w) – Also: 2s
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 72

B. Associative mapping
• Benefits:
– when a cache miss occurs, there is total flexibility in choosing which cache line into which to load a block
– choice can be made in such a way as to maximize the hit ratio
– therefore thrashing is practically eliminated
• Main disadvantage:
– Complex circuitry is required for the parallel lookup
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 73

C. Set-Associative mapping
• Observation 1:
– Direct mapping is simple to implement…
– … but is inflexible when frequently-used blocks are mapped to the same cache line.
• Observation 2:
– Associative mapping can be designed to improve cache-hit rates …
– … but the hardware complexity and cost is high
• A compromise: combine the two ideas together.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 74

C. Set-Associative Mapping
• Our cache now consists of sets of cache lines – Each set has the same number of lines
– Each set uses associative mapping
• And the chosen set for a memory block is determined from the memory address
• That is:
– A block can be mapped into any of the lines of its set…
– … but the set itself is determined by the block.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 75

C. Set-Associative Mapping
• Eachblockofmainmemorymaybemappedinto exactly one cache line
–”=$×&
–’=(%$
• Where
m = number of lines in the cache v = number of sets
k = number of lines in each set
d = cache set number
j = main memory block number
• Wealsorefertoak-wayset-associativemapping
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 76

v Associative Maps caches
B0
L0
Lk–1 Cache memory – set 0
Bv–1
First v blocks of main memory (equal to number of sets)
Image: Stallings, 10th ed. figure 4.13(a)
Cache memory – set v–1
(a) v associative-mapped caches University of Victoria
Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 77
k lines

Set-Associative Cache Organization
B0
B1
Bj
Tag
Set
Word
F0
d
w
F1
Fk–1
Fk
Compare
(hit in cache)
Fk+i
F2k–1
(miss in cache)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language
Note that F in the cache refers to
“Fully Associative”, while B in the
main memory refers to “Block” (i.e.,
we can’t really use F or B for both
areas of memory).
n
Cache Memory: Slide 78
s+w
Image: Stallings, 10th ed. figure 4.14
Cache
Tag Data
Main Memory
Memory Address
s–d
s–d
Set 0
s+w
Set 1
1 if match
0 if no match
0 if match
1 if no match
Figure 4.14
k-Way Set Associative Cache Organizatio

Set-Associative: Important quantities
• Number of lines in set: k
• Number of sets: v or 2d
• Numberoflinesincache:m=kv=k×2d
• Sizeofcache:k×2(d+w)
• Size of tag: (s – d) bits changed from (full) associative mapping
• Number of bits to identify byte in block: w
• Address length: s + w bits
– Ifarchitectureis32-bits,andw=3,thens=29
• Number of addressable bytes: 2s+w bytes
• Size of a block = line size = 2w bytes
• Number of blocks in main memory: (2s+w / 2w) – Also:2s
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 79

0000
0000
0000
0000
1111
1111
Main memory address (binary)
Main Memory Address =
Tag
(hex) Tag Set + Word
Data
Wd
00000
00001
00001
11111
000
111
111
000
000
0000
0000
000 000
02C 02C
02C FEDCBA98
1111
1111
0000
0000
0000
0000
1111
1111
0000
0000
0000
0100
1000
1100
Tag
Set
or
000 000
00000
000
13579246
9 bits 13 bits 2 bits
Note how sets are represented (i.e., each set has two lines, lines are side-by-side).
Main memory address (binary)
Tag (hex)Wordta
Tag Data
Set Number Tag
Data
00 00
0001
01100
000
0000
0000
0000
0001
01100
000
0000
0000
0100
77777777
11235813
000 13579246 0000 02C 11235813 0001
02C FEDCBA98 0CE7 Line
Tag Data Number
00 13579246 0000 16 11235813 0001
1FF 11223344 1FFE
00 0000000011111111 00 0000000011111111
0001
0110
0
011
0 1100 1110 0111
16 0001011000000000 16 0001011000000000
77777777 11235813
16 000101100011001110011100 FEDCBA98 000101100111111111111100 12345678
16 FEDCBA98 0CE7
02C
1FF 1FF
1FF 1FF
6
02
8 bits 32 bits
78 1FFF
C
1
2
3
4
5
16 000101101111111111111100 12345678 FF
E
33
16 12345678 3FFF
F
F 11
22
44
3
FF
University of Victoria
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 80
0011
Department of Computer Science
Image: Stallings, 10th ed. fFigiugruere4.41.515
Two-Way Set Associative Mapping Example
1001
1100
0000
0100
FF
9 bits 32 bits 9 bits 16-Kline cache
32 bits
0000
0000
FF 1111111111111111 FF 1111111111111111
11223344
0000
0000
0000
1111
1111
0000
0000
0100
1000
1100
Da
1357
0000
9246
02C
1FF
77777777
24682468
1111
1111
1111
1111
0000
0000
0000
0000
0000
0000
0000
0000
0100
0000
0100
11223344 24682468
16 Kline Cache
1111
1111
11111
1111
1111
1
1
111
111
1111
1111
1111
1111
1000
1100
1111
1111
1000
1100
Main memory address =
Line
14 bits
Word
2 bits
24682468
32 bits 16-MByte main memory
Tag
8 bits
Note: Memory address values are in binary representation;
other values are in hexadecimal
32 bits Figure 4.10 Direct Mapping Example 16 MByte Main Memory
Note: Memory address values are in binary representation;
other values are in hexadecimal
Two-Way Set-Associative Mapping Example≈
Tag Line +
0000000000000000

Reasoning about mapping
• Recall:
– v:numberofsets
– m:numberoflinesincache – k:numberoflinesinaset
• Ifv=mandk=1:
– Eachblockinmemorymapstoexactlyoneset
– Thereareonlymsuchsets,eachwithexactlyoneline – Thereforethisschemeisidenticaltodirectmapping
• Ifv=1andk=m:
– Allblocksmaptothesameset…
– …butthesetitselfconsistsofallthecachelines.
– Thereforethisschemeisidenticaltofullassociativemapping
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 81

So what is better?
• We want to:
– choose a cache size and cache design…
– … such that we improve the cache’s hit rate…
– … yet only have set-associative circuitry that is as complex as we need.
• Put differently:
– What design choices yield the best tradeoff of performance to complexity?
– (What gives us the “best bang for the buck”?)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 82

128k 256k 512k 1M
Varying Associativity over Cache Size
1.0
1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
0.9 0.8 0.7 0.6 0.5 0.4
1k 2k 4k 8k 16k 32k 64k Cache size (bytes)
direct 2-way 4-way 8-way 16-way
0.3
Image: Stallings, 10th ed.
figure 4.16
University of Victoria
Figure 4.16 Varyi
Department of Computer
Graphic shows results of one simulation study performed at F0re.2escale Semiconductor Inc. in 2004
For example: Improvement from 2-way to 4-way is much
0.1
CSC 230: Computer Architecture and Assembly Language
g Associativity over Cache Size Cache Memory: Slide 83
smaller in a 4K cache than it is in an 8K cache.
Science
Hit ratio
Hit ratio
n
0.0

Cache design elements
1. cache addresses
2. cache size
3. mapping function
4. replacement algorithm
5. write policy
6. line size
7. number of caches
• Recall our list…
• … and we will now continue with #4.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 84

4. Replacement algorithms
• When cache is filled (i.e., all lines occupied)…
– …andweneedtobringanewblockintothecache…
– …wemustchooseanexistingline(i.e.,ablockalreadyinthe cache) to replace.
• Direct mapping: no choice!
• Associative & set associative: use a replacement
– LRU(leastrecentlyused)
– FIFO(first-infirst-out)
– LFU(leastfrequentlyused) – Random
• Algorithm must be fast (therefore must be implemented in hardware)
algorithm
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 85

5. Write policy
• The need to replace blocks has some serious consequences
– Iftheblockcontainsmemorythathasonlyeverbeenread, then main-memory contents exactly match the cache contents for that block.
– Iftheblockcontainsmemorythathasbeenmodified(i.e., written), then main-memory and cache differ for the block.
• And the problem does not necessarily involved only block replacement
– Ifmultiplecoreshavetheirowncaches…
– …andthesamemain-memoryblockisintwoormorecaches …
– …thenamodificationoftheblockonecachehas consequences for the same block in the other cache(s)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 86

5. Write policy
• Write through:
– Simplest,butperhapsmostexpensive
– Anywritetoacachelinearealsomadetoitsmain-memory block
– Thewritetothecachelinecarriesthroughtomainmemory block
• Write back:
– Eachcachelinemaintainsadirtybit.
– Whenblockfirstloadedintoline,dirtybitiscleared
– Whencachelineismodified,dirtybitisset
– Whenlineischosenforreplacement,blockinlineisonly written back to memory if the dirty bit is already set
– Doeshavesomegotchas(i.e.,I/Omustnowgothrough cache)
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 87

6. Line size
• When a block is loaded into the cache…
– …itisusuallybecausesomememoryaddresswasnotinthe cache …
– …andsooncetheblockisloaded,otheraddressesaround the original “cache miss” address will now be in cache
• As line size increases…
– …this“pre-fetching”behaviorimprovesperformance… – andinitiallythehitratioimproves.
• However:
– Iflinesizebecomestoolarge…
– …thenthisincreasestheprobabilitythatmemorylocations in cache are not needed …
– …andsohitratiostopsimprovement.
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 88

7. Number of caches
CPU
Main Memory Main Memory
Cache
Main Memory
Main Memory
CPU
Level 1 (L1) cache
Level 2 (L2) cache
Level 3 (L3) cache
Level 3
• Recall one of our earlier diagrams:
Word Transfer Word Transfer
Block Transfer Block Transfer
CPU
Cache
Fast Slow
Fast
(a) Single cache Slow (a) Single cache
Fastest Fastest
University of Victoria Department of Computer Science
Level 1 (L1) cache
Fast Level 2 Less (L2) cache fast
CPU
(L3) cache
Slow
Slow
Fast
(b) Three-level cache organfaizsattion
Less
(b) Three-level cache organization Figure 4.3 Cache and Main Memory
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 89
Figure 4.3 Cache and Main Memory

7. Number of caches
• Multi-level caches are the current norm
• Level 1 is closest to the CPU
• Level 2 is normally now on the CPU chip
– But even if it isn’t, a special bus normally connects L2 to the CPU (i.e., L2 is not connected via the usual system bus)
• Level 3 is most often off chip
• It does appear, however, the ultimate benefit / payoff strongly depends upon the L1 hit rate
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 90

Total Hit Ratio
8-kB and 16-kB Level 1 caches where size of L2 varies.
From the same study examining the effect of varying associative over cache size.
0.98
0.96
0.94
0.92
0.90
0.88
0.86
0.84
0.82
0.80
0.78
University of Victoria
L1 = 16k
L1 = 8k
1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1M 2M L2 Cache size (bytes)
Image: Stallings, 10th ed. figure 4.17
CSC 230: Computer Architecture and Assembly Language
Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1
Cache Memory: Slide 91
Department of Computer Science
Hit ratio

Unified vs. Split caches
• Unified:
– Both instructions and data share the same cache
• Split:
– Instructions are in one dedicated cache…
– … while data is in its own dedicated cache.
• Pros and cons:
– Unified automatically balances load between instruction fetches and data fetches.
– Unified means one cache
– Split permits instruction and data fetches to occur in parallel (i.e., eliminates cache contention for instruction fetch and instruction execute)
• Current state of the art: Split L1 caches, unified at other levels
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 92

Summary
• History of memory devices
• Memory system characteristics
• Memory hierarchy
• Cache characteristics, especially mapping schemes
– direct mapping
– (full) associative mapping – set-associative mapping
• Some performance aspects
University of Victoria Department of Computer Science
CSC 230: Computer Architecture and Assembly Language Cache Memory: Slide 93