Computer Memory (I): Memory hierarchy and Cache
Computers need Memory to store information
In fact, the performance of most modern computers is limited by the bandwidth (i.e., bytes/second) of the connection between the CPU and main memory, the so-called memory bottleneck.
The goal of this lecture is to understand the work that has been done to mitigate this bottleneck.
Learning Objectives
◆Understand the characteristics of different memory units
◆Memory hierarchy – why and how
◆The interplay between Cache and Main Memory
Computer Memory
Storage Technologies (omitted for now in this lecture)
Storage Units
The abstraction of storage units, each with its own unique characteristics
Computer Memory
◆Computer memory refers to any physical device capable of storing information
– a wide range of storage technologies, resulting in various access speeds, capacities, costs, etc. -- the Characteristics of Memory
◆Memory organization: select the right memory under design constraints, and put it in the right place
Motivating Example
◆Consider buying a desk to fit in my room
everyone wants this, but too expensive
this one looks nice, but I want a desk with drawers -- it does not satisfy my needs
Motivating Example
◆Choosing the right memory (physical device) is just like buying the right desk
◆Underlying learning goals:
– trade-off between performance and cost
– design philosophies/techniques to use constrained cost to achieve better performance
Knowing the Characteristics of Memory
◆Memory can be categorized based on Technologies
– SRAM (Static Random Access Memory)
– DRAM (Dynamic Random Access Memory)
– PROM (Programmable Read-Only Memory)
– Magnetic Disk
◆These Technologies have resulted in different characteristics
Characteristics of Memory
◆Access method — Sequential Access
– memory is organized into units of data (records);
– access each record in sequence: move from the current
location to the desired location sequentially
– as a result, the time to access an arbitrary record is highly variable
– example: tape
Characteristics of Memory
◆Access method — Direct Access/Random Access
– each location in the memory has a unique address
– direct access often refers to secondary devices such as disks
(access the block of data directly, possibly followed by a sequential search within the block)
– random access often refers to main memory or cache
– the time to access a given location is typically independent of the sequence of prior accesses and is often constant
Characteristics of Memory
◆Measures of Performance — Access Time (latency)
– for random access, the time it takes to perform a read or write operation, i.e., from the time when an address is presented to memory to the time when data have been stored or become available
– for non-random access, the time from when the instruction is issued to when the data position is located for the read/write operation
Characteristics of Memory
◆Measures of Performance — Memory Cycle Time
– primarily applied to random access memory
– it consists of the access time plus any additional time required
before a second access can be made (cycle)
– example: in computer games, casting skills (QWER) + cooldown
– it is concerned with the system bus
– an associated concept is transfer rate = 1/(memory cycle time); e.g., a cycle time of 10 ns yields 10^8 transfers per second
Characteristics of Memory
◆Physical features — Volatile vs. Non-volatile
– volatile memory: need power to maintain the stored information (memory is lost when power is off), e.g., registers, RAM
– non-volatile memory: retains stored information without power, e.g., hard drive
Characteristics of Memory
◆Physical features — Erasable vs. Non-erasable
– erasable: can be deleted or rewritten with new data, e.g., registers, hard drive
– non-erasable: cannot be deleted/rewritten, e.g., ROM (Read Only Memory), CD
Characteristics of Memory
◆As a result, these storage units are used in
different locations in the computer system
– internal memory or external memory?
– internal: registers in processor, main memory, cache
– external: the peripheral storage devices, such as disk and tape — they are accessible to the processor via I/O controllers
Characteristics of Memory
Comparison among typical storage types
In general: even if we do not consider cost, we have two types:
smaller-and-faster vs. bigger-and-slower
Can we get the best of both worlds? (large, fast, cheap) — Memory Hierarchy
Memory Hierarchy
◆Our goal: to have a large, fast, and cheap main memory
– Clearly, we cannot achieve this using a single type of storage
◆Idea: we use a hierarchical system of memories to
emulate the large, fast, cheap memory
Memory Hierarchy Approaches
◆Approach One: Expose Hierarchy
– Provide the programmer some amount of each type of storage explicitly
– Let the programmer decide how to best allocate the memory resources for each particular computation
Memory Hierarchy Approaches
◆Approach Two: Hide Hierarchy
– For the programmer: a single memory with a single address space
– A memory system “behind the scene” to move the data between the levels of
the hierarchy
– Is this memory system possible? – it needs to automatically arrange for the right data to be in the right place at the right time
Example: Two-Level Memory
◆The processor has access to two levels of memory
– Level-1: capacity C1 = 100 words, access time T1 = 0.1s
– Level-2: capacity C2 = 1000 words, access time T2 = 1s
– access mechanism: if some data is in Level-1, directly access the data; otherwise, move the data from Level-2 to Level-1 first, and then access the data from Level-1
– access time: T1 on a hit vs. (T1 + T2) on a miss
Example: Two-Level Memory
◆Hit Ratio: the fraction of accesses involving only Level-1
If the hit ratio is high, the average access time is close to T1 (as if we were only using the faster Level-1 memory)
We can reduce the average access time only when "frequently-accessed data" has a high chance of being placed in the faster memory.
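As a sketch of this computation (the hit ratio H = 0.95 below is an assumed value, not from the slides), the average access time is H * T1 + (1 - H) * (T1 + T2):

```c
#include <stdio.h>

int main(void) {
    double T1 = 0.1, T2 = 1.0;   /* access times from the example, in seconds */
    double H  = 0.95;            /* assumed hit ratio, for illustration */
    /* a miss moves the data to Level-1 first, so it costs T1 + T2 */
    double avg = H * T1 + (1.0 - H) * (T1 + T2);
    printf("average access time = %.3f s\n", avg);  /* prints 0.150 s */
    return 0;
}
```

With a 95% hit ratio, the average is 0.15s, already close to T1 alone.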
Idea of Memory Hierarchy
◆In the hierarchy
– closer to the processor: we have faster but lower-capacity, more expensive memory
– further from the processor: we have slower but higher-capacity, less expensive memory
– idea: can we “allocate” the information such that frequently accessed information is stored in faster memory
– But how can we know “frequently accessed information”?
Locality of Reference
◆Phenomenon: during program execution, memory
accesses by processor tend to cluster
– technically, an access to address X at time t implies that an access to address X+ΔX at time t+Δt becomes more probable as ΔX and Δt approach zero
– reasons: programs typically contain loops/subroutines — repeatedly access a small set of instructions; also, for data, operations on tables and arrays involve access to a clustered set of data
Designing Memory based on Locality of Reference
◆Clustering effect
– during some specific time period, processor wants to access a
portion of main memory (one cluster)
– over time, processor accesses different clusters
cluster 1 at time t1
cluster 2 at time t2
cluster 3 at time t3
Level-1 memory (fast but low-capacity)
Level-2 memory (slow but high-capacity)
How should we design the memory?
Designing Memory based on Locality of Reference
◆Solution -- increase the hit ratio
Time t1 Time t2 Time t3
Level-1 memory (fast but low-capacity)
Level-2 memory (slow but high-capacity)
This is the basic idea of using Cache between Main Memory (RAM) and Processor
Cache Memory
◆Cache
– A small storage component that transparently retains (caches) data from recently accessed locations
– Very fast access if data is cached
– Located in CPU or on motherboard (On Chip) close to CPU
◆Learning goals
– basic principles of cache
– some design issues, in particular, mapping functions
Cache is fast but has relatively small capacity.
Basic mechanism: the cache contains copies of portions of main memory (blocks). When the CPU wants to read a word:
if the word is in cache, access the word
else a block of main memory (containing that word) is read into cache, then the word is delivered to the CPU
Due to locality of reference, the hit ratio is high (words can be found in the cache with high probability). As a result, it is as if there were a memory with large capacity and fast access speed.
Multiple-Level Cache Organization
A typical memory hierarchy
Structure of a Cache/Main-Memory System
Word: the basic memory unit (could have different lengths in different systems; differentiate it from the word we used in MIPS)
Each word has a unique n-bit address – so, the memory is composed of 2^n addressable words
A block consists of a fixed number K of words; in total we have M = 2^n/K blocks
Structure of a Cache/Main-Memory System
The Cache consists of m lines, where each line has a length of K words (the size of one block in memory)
m lines in Cache vs. M blocks in Memory (m << M)
Direct Mapping: each block j of main memory can be mapped only to cache line i = j mod m
It is a modular operation! (e.g., with m = 4 lines, blocks 0, 4, 8, ... all map to line 0)
Implementation of Direct Mapping
How does CPU access a word?
(1) Given an address of (s+w) bits
(2) Use r bits to locate the line in Cache
(3) Compare the (s-r) bits with the tag:
(3.1) if match, the line stores the desired block; use w bits to identify the desired word
(3.2) if no match, access main memory, copy the block into the line, and then deliver the word
Example of Direct Mapping
◆Settings:
– word = one byte, block = 4 words (4 bytes)
– main memory has 16 Mbytes (note: 1 MB = 1024 KB; 1 KB = 1024B);
that is, 2^{24} bytes -> 24 bits for the address
– we have 2^{22} blocks -> s = 22 bits, w = 2 bits
– cache has 64 KB = 2^{16} bytes
– question: how many lines are there in the Cache?
– 2^{14} lines -> among the 22 bits, r = 14 bits, 8 bits for the tag
Example of Direct Mapping
◆Mapping:
The first two hexadecimal digits serve as the tag
Blocks in the same row are mapped to the same cache line, but they have distinct tag numbers
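A minimal sketch of this field split (the bit widths come from the example above; the address value itself is made up for illustration):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* 24-bit address = tag (8 bits) | line (14 bits) | word (2 bits) */
    uint32_t addr = 0x16CA5B;               /* made-up 24-bit address */
    uint32_t word = addr & 0x3;             /* w = 2 bits: byte within the 4-byte block */
    uint32_t line = (addr >> 2) & 0x3FFF;   /* r = 14 bits: which cache line */
    uint32_t tag  = addr >> 16;             /* s - r = 8 bits: the first two hex digits */
    printf("tag=0x%02X line=0x%04X word=%u\n", tag, line, word);
    return 0;
}
```

Note that the tag is exactly the top byte of the address, which is why the first two hexadecimal digits serve as the tag.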
Direct Mapping Summary
◆Multiple Blocks to One Line
– address s + w bits
– cache will interpret it as three fields: tag (s-r), line (r), word (w)
– line (r) determines which line in cache the block will map to
– tag (s-r) determines the current block that line is storing
– CPU can directly check that line, and compare the tag field to determine whether the block is now in the cache
Direct Mapping
◆Pros and Cons
– simple and inexpensive to implement
– main disadvantage: multiple blocks in memory are mapped to a fixed Cache line
– if a program happens to access words repeatedly from two different blocks that map into the same line
– the two blocks will be continually swapped in the cache -- low hit ratio
Associative Mapping
◆Key difference from direct mapping:
– a block in memory can be mapped to any line in cache
– cache will interpret the address as two fields: Tag and Word
– address = s + w bits; 2^s blocks in memory
– s serves as the tag, and it is stored together with the data
– note: there is no field to determine the line number
Example of Associative Mapping
◆Settings:
– memory has 2^{22} blocks, each block has 4 bytes (data = 32 bits)
– address is 24 bits = 22 bits (s) + 2 bits (w)
– Tag: 22 bits
Implementation of Associative Mapping
A tag (s bits) is stored together with the block data
CPU needs to check every tag to determine if there is a hit — more complex than direct mapping
Example of Associative Mapping
A block can be mapped to any line in cache!
What if the cache is full?
For direct mapping, there is no choice. For associative mapping, we need to decide which block should be replaced — replacement algorithms
The main disadvantage of associative mapping is the complex circuitry required to examine the tags of all cache lines in parallel
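A minimal software sketch of a fully associative lookup, assuming a toy 8-line cache (hardware compares all tags in parallel; the loop below just makes the "check every tag" cost explicit):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 8                       /* toy cache size (assumed) */

struct line { bool valid; uint32_t tag; };

/* every line's tag must be examined to detect a hit */
static int lookup(const struct line cache[], uint32_t tag) {
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;                     /* hit: block found in line i */
    return -1;                            /* miss: block may be placed in ANY line */
}

int main(void) {
    struct line cache[NUM_LINES] = {0};
    cache[5] = (struct line){ .valid = true, .tag = 0x2ACF1 };
    printf("0x2ACF1 -> line %d\n", lookup(cache, 0x2ACF1));  /* 5: hit */
    printf("0x11111 -> line %d\n", lookup(cache, 0x11111));  /* -1: miss */
    return 0;
}
```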
Set-Associative Mapping (not required)
◆A compromise between direct mapping and associative mapping
– cache is organized into a number of sets, where each set contains a number of lines
– a block is mapped to a fixed set
– However, we have the freedom to choose which line in the set should be used
– Special case: two-way set associative (two lines in a set)
Measurements of Cache Performance
◆We need to measure how good a Cache design is
– Essentially, how fast we can find a word in Cache
– We have defined a few important metrics
◆Miss rate = 1 – hit rate (ratio)
– Fraction of memory accesses not found in the cache (misses/accesses)
– Typically, 3-10% for L1 cache; can be quite small (< 1%) for L2 cache, depending on size, etc.
Measurements of Cache Performance
◆Hit time
- Time to deliver a word in the cache to the processor (includes time to determine whether the word is in the cache)
- Typically, 1-2 clock cycles for L1; 5-20 clock cycles for L2
◆Miss Penalty
- Additional time required because of a miss (need to access main memory)
- Typically, 50 – 200 clock cycles
Measurements of Cache Performance
◆Huge difference between a hit and a miss
- Could be 100x, if just L1 and main memory
◆How much better is a 99% hit ratio than a 97% hit ratio?
- Suppose hit time = 1 cycle and miss penalty = 100 cycles
Measurements of Cache Performance
◆How much better is a 99% hit ratio than a 97% hit ratio?
- Suppose hit time = 1 cycle and miss penalty = 100 cycles
- Average access time:
- 97% hit: 0.97 * 1 cycle + 0.03 * (1 + 100) cycles = 1 + 0.03 * 100 = 4 cycles
- 99% hit: 0.99 * 1 cycle + 0.01 * (1 + 100) cycles = 1 + 0.01 * 100 = 2 cycles
- A 99% hit ratio is twice as good as a 97% hit ratio!
- This is why we often use miss rate (3% vs. 1%) instead of hit rate
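A small check of this arithmetic (hit time and miss penalty as assumed above):

```c
#include <stdio.h>

/* average access time = hit time + miss rate * miss penalty */
static double avg_cycles(double miss_rate) {
    return 1.0 + miss_rate * 100.0;   /* 1-cycle hit, 100-cycle penalty */
}

int main(void) {
    printf("97%% hits: %.0f cycles\n", avg_cycles(0.03));  /* 4 cycles */
    printf("99%% hits: %.0f cycles\n", avg_cycles(0.01));  /* 2 cycles */
    return 0;
}
```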
◆Previously, we introduced how to map blocks into cache
◆At a certain point, the cache will be full
◆The next question: which block (line in cache) should be replaced when a new block comes? -- Replacement Algorithms
Replacement Algorithms for Associative Mapping
◆Least Recently Used (LRU)
- replace the block that has been in the cache longest with no reference to it
- this requires some extra index to record the time when a line is referenced -- the price we need to pay
- Simple for two-way set-associative mapping: use a single bit USE to indicate the most recently used line in the set
- Most popular replacement algorithm
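A minimal sketch of LRU bookkeeping with per-line timestamps (the 4-line fully associative cache and all names below are illustrative, not a real hardware design):

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_LINES 4   /* toy fully associative cache (assumed size) */

/* each line records the "time" of its last reference;
   the victim is the line whose timestamp is oldest */
struct line { uint32_t tag; uint64_t last_used; };

static uint64_t now = 0;                       /* advances on every access */
static void touch(struct line *l) { l->last_used = ++now; }

static int lru_victim(const struct line cache[]) {
    int v = 0;
    for (int i = 1; i < NUM_LINES; i++)        /* find line referenced longest ago */
        if (cache[i].last_used < cache[v].last_used)
            v = i;
    return v;
}

int main(void) {
    struct line cache[NUM_LINES] = {0};
    for (int i = 0; i < NUM_LINES; i++) { cache[i].tag = i; touch(&cache[i]); }
    touch(&cache[0]);                          /* re-reference line 0 */
    printf("replace line %d\n", lru_victim(cache));  /* line 1 is now LRU */
    return 0;
}
```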
Replacement Algorithms for Associative Mapping
◆Other common replacement algorithms
- First-in-first-out (FIFO): replace that block in the set that has been in the cache longest (what's the difference from LRU)
- Least Frequently Used (LFU): associate a counter to each line
- Random (not based on usage): randomly pick one to replace – some papers show that Random is only slightly worse than previous usage-based algorithms
Write Policy (brief)
◆The problem: inconsistency between cache and main memory
◆Why does inconsistency happen?
- cache stores the copy of the same data in main memory
- two pieces of data can be modified (write) from different sources -- inconsistency
Write Policy (brief)
◆Why does inconsistency happen?
- multiple CPUs: each CPU has its own cache, which may store the same block in memory
- I/O can directly access main memory
- we need a policy to deal with such inconsistency
Write Policy I - Write Through
◆All writes go to cache as well as main memory
- multiple CPUs need to monitor main memory traffic to keep local (with respect to CPU) cache up to date
- lots of traffic
- slows down writes
Write Policy II - Write Back
◆Updates are initially made in the cache only; memory is updated later
- use an extra bit along with each cache line to indicate whether there's update in this line (set the bit if there's update)
- when this cache line needs to be replaced, check this bit, write back to memory if the bit is set (such that updates are not lost)
- requirement: I/O has to access cache, not main memory directly
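A minimal sketch of the write-back mechanism with a dirty bit (the toy memory and all interface names are assumptions for illustration; a real cache would write back the whole block to the address derived from the tag):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct line { uint32_t tag; uint32_t data; bool dirty; };

static uint32_t memory[16];                 /* toy main memory */

static void cache_write(struct line *l, uint32_t value) {
    l->data  = value;
    l->dirty = true;        /* update cache only; memory is now stale */
}

static void evict(struct line *l) {
    if (l->dirty)           /* write back only if the line was modified */
        memory[l->tag] = l->data;
    l->dirty = false;
}

int main(void) {
    struct line l = { .tag = 3, .data = memory[3], .dirty = false };
    cache_write(&l, 42);    /* memory[3] still holds the old value */
    evict(&l);              /* dirty bit set -> write back */
    printf("memory[3] = %u\n", memory[3]);   /* prints 42 */
    return 0;
}
```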
Summary of Today's Lecture
◆Characteristics of Different Memory
- access methods, access time, capacity, cost, ...
- the design of a memory system is to use a constrained budget to meet different needs -- trade-offs among various characteristics
- hierarchy of memory
Summary of Today's Lecture
◆Concept of Cache
- the mechanism to use fast but small memory combined with large but slow memory to emulate fast and large memory -- dynamically copy frequently used data into fast memory
- the supporting principle -- locality of reference
- important measurements – miss rate, hit time, miss penalty, average access time
Summary of Today's Lecture
◆Design elements of Cache
- mapping functions -- different rules of mapping blocks in memory to cache lines
- others: replacement algorithms, write policy, size of cache, levels of cache, ... (textbook reading)
Tasks after Lecture
◆Reading: Chapter 4.1 - 4.3
- some parts are not required (set-associative mapping, details of write policy, ...)