Chapter 6
Memory (Part 2)
• RAM vs. ROM
• Memory Hierarchy
• What is cache
• Direct Mapped cache mapping
• Fully Associative cache mapping
• Set Associative cache mapping
What we covered so far
• When an associative cache is used, a
replacement policy is needed to determine
which block to evict.
• An optimal replacement policy would be able to see
the future to see which blocks won’t be needed for
the longest period.
• Since we cannot see the future, an optimal
replacement algorithm cannot be implemented,
– but it serves as a best case against which other
replacement policies can be compared.
6.4 Cache Memory
• The replacement policy that we choose depends upon
the locality that we are trying to optimize.
– Usually, we are interested in temporal locality.
• A least recently used (LRU) algorithm keeps track of
the last time that a block was accessed.
– It is complex: LRU must maintain an access history for each
block, which ultimately slows down the cache.
• First-in, first-out (FIFO) replacement policy is popular.
– The block that has been in the cache the longest is evicted.
• A random replacement policy is exactly that.
– It picks a block at random to evict.
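As an illustrative sketch (the access trace and cache size below are invented, not from the slides), LRU and FIFO differ only in whether a hit refreshes a block's position in the eviction order:

```python
from collections import OrderedDict

def simulate(accesses, num_blocks, policy):
    """Count misses for a tiny fully associative cache under a policy."""
    cache = OrderedDict()              # insertion order doubles as age order
    misses = 0
    for block in accesses:
        if block in cache:
            if policy == "LRU":        # a hit refreshes age; FIFO does not
                cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) == num_blocks:
                cache.popitem(last=False)   # evict oldest (LRU) / first-in (FIFO)
            cache[block] = True
    return misses

trace = [1, 2, 3, 1, 4, 1, 5, 1, 2]
print(simulate(trace, 3, "LRU"))    # 6 misses
print(simulate(trace, 3, "FIFO"))   # 7 misses: FIFO evicts the hot block 1
```

On this trace LRU keeps the frequently reused block 1 resident, while FIFO evicts it simply because it arrived first.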
• Cache performance depends upon programs
exhibiting good locality.
– Some object-oriented programs have poor locality
owing to their complex, dynamic structures.
– Arrays stored in column-major rather than row-major
order can be problematic for certain cache
organizations.
• With poor locality, the miss rate is higher,
degrading performance because blocks must be
frequently reloaded into the cache.
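The row-major point can be modeled with a short sketch (array shape and line size are assumed for illustration; the model is a worst case that retains only the most recently fetched line):

```python
# Model: a 2D array stored row-major; a cache line holds LINE consecutive
# elements, and only the most recently fetched line is retained.
ROWS, COLS, LINE = 64, 64, 8

def lines_touched(order):
    """Count line fetches for a full in-order traversal."""
    fetches, last_line = 0, None
    coords = ((r, c) for r in range(ROWS) for c in range(COLS)) if order == "row" \
        else ((r, c) for c in range(COLS) for r in range(ROWS))
    for r, c in coords:
        line = (r * COLS + c) // LINE   # row-major address -> line number
        if line != last_line:           # a miss whenever we enter a new line
            fetches += 1
            last_line = line
    return fetches

print(lines_touched("row"))   # 512: one fetch per 8 consecutive elements
print(lines_touched("col"))   # 4096: every access jumps to a new line
```

Column-order traversal of a row-major array defeats spatial locality: every access lands on a different cache line.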
• We have only discussed reads; writes to the cache
must also make their way back to main memory.
• When the cache is written to,
the block is marked as dirty.
• Dirty blocks must be written back to memory.
A write policy determines how this will be done.
• There are two types of write policies:
– Write Through updates cache and main memory
simultaneously on every write
• Writes are slower but are often less frequent.
• Main memory stays consistent.
– Write Back updates memory when the block is
evicted.
• Writes are not slowed down.
• Main memory is inconsistent with the cache.
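A toy comparison (block names and write sequence are invented) of how many main-memory writes each policy performs for the same workload:

```python
# Hedged sketch: count main-memory writes under each write policy for
# the same sequence of writes to already-cached blocks.
def memory_writes(writes, policy):
    dirty = set()
    mem_writes = 0
    for block in writes:
        if policy == "write-through":
            mem_writes += 1        # memory is updated on every write
        else:                      # write-back: just mark the block dirty
            dirty.add(block)
    # On eviction (modeled here as the end of the run), each dirty
    # block is written back to memory exactly once.
    return mem_writes + len(dirty)

seq = ["A", "A", "B", "A", "C", "C"]
print(memory_writes(seq, "write-through"))   # 6: one per write
print(memory_writes(seq, "write-back"))      # 3: A, B, C written back once each
```

Write-back trades consistency for fewer memory writes: repeated writes to the same block cost only one write-back at eviction.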
• The cache we have been discussing is called a
unified or integrated cache where both instructions
and data are cached.
• Many modern systems employ separate caches for
data and instructions.
– This is called a Harvard cache.
• The separation of data from instructions provides
better locality, at the cost of greater complexity.
– Simply making the cache larger provides about the same
performance improvement without the complexity.
• Cache performance can also be improved by
adding a small associative cache to hold blocks
that have been evicted recently.
• A trace cache is a variant of an instruction
cache that holds decoded instructions.
• Many systems use multilevel cache hierarchies.
– The levels of cache form their own memory hierarchy.
– Level 1 and 2 caches are on the processor core.
– Level 3 cache is shared among all cores in a
multicore system.
– Each level is roughly three times slower than the one above.
• The performance of hierarchical memory is
measured by its effective access time (EAT).
• EAT is a weighted average that takes into account
the hit ratio and relative access times of successive
levels of memory.
• The EAT for a two-level memory is given by:
EAT = H × AccessC + (1 - H) × AccessMM,
where H is the cache hit rate and AccessC and AccessMM are
the access times for cache and main memory, respectively.
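The formula can be checked with a small helper, using the slide's figures (10ns cache, 200ns main memory, 99% hit rate). The `overlapped` flag distinguishes concurrent from sequential reads:

```python
def eat(hit_rate, t_cache, t_mem, overlapped=True):
    """Two-level effective access time in ns.
    overlapped: cache and main memory are read concurrently, so a miss
    costs only t_mem; otherwise the cache probe is added to the miss."""
    miss_penalty = t_mem if overlapped else t_cache + t_mem
    return hit_rate * t_cache + (1 - hit_rate) * miss_penalty

print(eat(0.99, 10, 200, overlapped=True))    # 11.9 ns (concurrent)
print(eat(0.99, 10, 200, overlapped=False))   # 12.0 ns (sequential)
```

Note how little the organization matters at a 99% hit rate: the extra 10ns probe on a miss is paid only 1% of the time.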
• For example,
– Main memory access time of 200ns
– Cache has a 10ns access time and a hit rate of 99%.
– Cache and main memory are read concurrently
• The Effective access time (EAT) is:
0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns.
• For example,
– Main memory access time of 200ns
– Cache has a 10ns access time and a hit rate of 99%.
– Cache and main memory are read sequentially
• The Effective access time (EAT) is:
0.99(10ns) + 0.01(10ns + 200ns)
= 9.9ns + 2.1ns
= 12ns.
• Cache Memory
• Direct, Associative Mappings
• Replacement Policies
• Write Policies
• Effective access time (EAT)
Questions?
6.5 Virtual Memory
• Cache memory gives us faster effective access to main memory.
• Virtual memory simulates having more
memory without having more memory.
• A portion of cheaper slower storage is used as an
extension of main memory.
• Access to the extra memory has a performance cost.
• The operating system swaps pages of data between
main memory and disk as needed.
• A Paged Memory Management Unit (PMMU) is
used to create a larger address space
– A physical address is a location in physical memory.
– A virtual address is an address managed by the PMMU.
– A virtual address is translated to a physical address.
• A page fault occurs when a virtual address is
accessed but does not yet map to a physical address.
– Similar to a cache miss.
• Memory fragmentation occurs when the paging process
creates small, unusable ranges of addresses.
• Main memory and virtual memory are divided into
equal sized pages.
– Like cache blocks
• The entire address space required by a process
need not be in memory at once. Some parts can be
on disk, while others are in main memory.
• Further, the pages allocated to a process need not
be stored contiguously, either on disk or in
memory.
• In this way, only the needed pages are in memory
at any time; the unneeded pages remain in slower
disk storage.
• Information about each page is in a data structure
called a page table (shown below).
• There is one page table for each active process.
• When a process generates a virtual address, the
operating system translates it into a physical
memory address.
• To accomplish this, the virtual address is divided
into two fields: A page field, and an offset field.
– The page field is the page number in virtual memory
– The offset is the location of the address in the page.
• The logical page number is translated into a
physical page frame through a lookup in the page
table.
• If the valid bit is zero in the page table, the page is
not in memory and must be fetched from disk.
– This is a page fault.
– If necessary, a page is evicted from memory and is replaced
by the page retrieved from disk, and the valid bit is set to 1.
• If the valid bit is 1, the virtual page number is
replaced by the physical frame number.
• The data is then accessed by adding the offset to the
physical frame number.
• As an example, suppose a system has a virtual address
space of 8K and a physical address space of 4K, and the
system uses byte addressing.
– We have 2¹³/2¹⁰ = 2³ virtual pages.
• A virtual address has 13 bits (8K = 2¹³), with 3 bits for the page
field and 10 for the offset, because the page size is 1024 (2¹⁰).
• A physical memory address requires 12 bits (4K = 2¹²): the first two
bits for the page frame and the trailing 10 bits for the offset.
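The field split can be verified directly (the two addresses are the ones used on the next slide; everything else follows from the 8K/1K geometry above):

```python
# 8K virtual space with 1K (2**10) pages -> 3-bit page field, 10-bit offset.
OFFSET_BITS = 10

def split(vaddr):
    """Split a 13-bit virtual address into (page, offset)."""
    page = vaddr >> OFFSET_BITS               # top 3 bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)  # low 10 bits
    return page, offset

print(split(5459))   # (5, 339): page 5, offset 339
print(split(4100))   # (4, 4):   page 4, offset 4
```

Translation then replaces the 3-bit page number with a 2-bit frame number from the page table and keeps the offset unchanged.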
• Suppose we have the page table shown below.
• What happens when the CPU generates address:
• 5459₁₀ = 1010101010011₂ = 0x1553?
• 4100₁₀ = 1000000000100₂ = 0x1004?
• We said earlier that effective access time (EAT) takes
all levels of memory into consideration.
• Thus, virtual memory is also a factor in the
calculation, and we also have to consider page table
access time.
• Suppose a main memory access takes 200ns, the
page fault rate is 1%, and it takes 10ms to load a
page from disk. We have:
EAT = 0.99(200ns + 200ns) + 0.01(10ms) = 100396ns.
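The same arithmetic as a check (all times converted to ns):

```python
# EAT with paging: every access reads the page table and then memory
# (2 x 200ns); a page fault (1%) instead costs a 10 ms disk transfer.
mem = 200                  # main memory access, ns
fault_rate = 0.01
disk = 10_000_000          # 10 ms in ns

eat = (1 - fault_rate) * (mem + mem) + fault_rate * disk
print(eat)                 # 100396.0 ns
```

The disk term dominates: even a 1% fault rate makes the average access roughly 500 times slower than a fault-free access.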
• Even if we had no page faults, the EAT would be
400ns because memory is always read twice: First to
access the page table, and second to load the page
from memory.
• Because page tables are read constantly, it makes
sense to keep them in a special cache called a
translation look-aside buffer (TLB).
• The TLB is a special associative cache that stores the
mapping of virtual pages to physical page frames and is
part of the PMMU.
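A minimal sketch of a TLB in front of a page table (the page-table contents here are hypothetical, and a real TLB is a small fixed-size associative memory, not an unbounded dict):

```python
# Hypothetical page table: virtual page -> physical frame, None = not resident.
page_table = {0: 2, 1: None, 2: 0, 3: 1}
tlb = {}   # cached translations

def translate(page):
    """Return (frame, outcome) for a virtual page number."""
    if page in tlb:
        return tlb[page], "TLB hit"        # no page-table access needed
    frame = page_table[page]
    if frame is None:
        return None, "page fault"          # must be loaded from disk
    tlb[page] = frame                      # cache the translation
    return frame, "TLB miss"

print(translate(2))   # (0, 'TLB miss'): first touch walks the page table
print(translate(2))   # (0, 'TLB hit'):  repeat access skips the table walk
```

A TLB hit avoids the extra memory read of the page table, which is why the "memory is always read twice" cost above is mostly avoided in practice.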
• Another approach to virtual memory is the use of
segmentation.
• Instead of dividing memory into equal-sized pages,
virtual address space is divided into variable-length
segments, often under the control of the programmer.
• A segment is located through its entry in a segment
table, which contains the segment’s memory location
and a bounds limit that indicates its size.
• After a page fault, the operating system searches for
a location in memory large enough to hold the
segment that is retrieved from disk.
• Both paging and segmentation can cause
fragmentation.
• Paging is subject to internal fragmentation.
– The whole page might not be needed
• Segmentation is subject to external fragmentation
– Segment sizes may vary as they are allocated and freed.
– Awkward sized gaps will form over time.
The next slides illustrate internal and external fragmentation.
• 32K of memory.
• 8 frames of 4K pages.
• 4 processes, P1 through P4, together need 31K.
• First three processes are loaded,
memory looks like this:
• All frames used by three processes.
• P4 has to wait for one of the other
three to terminate, because there are
no unallocated frames even though
there is enough unused memory.
• This is an example of internal
fragmentation.
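The slides give only the 31K total, so this sketch assumes hypothetical per-process sizes summing to 31K in order to reproduce the situation:

```python
import math

FRAME_K, FRAMES = 4, 8                       # 8 frames of 4K = 32K of memory
sizes = {"P1": 10, "P2": 14, "P3": 3, "P4": 4}   # hypothetical sizes, 31K total

# Each process gets whole frames, so its size is rounded up to 4K multiples.
frames_used = sum(math.ceil(sizes[p] / FRAME_K) for p in ("P1", "P2", "P3"))
wasted = frames_used * FRAME_K - (sizes["P1"] + sizes["P2"] + sizes["P3"])

print(frames_used)   # 8: the first three processes consume every frame
print(wasted)        # 5: 5K sits unused inside frames, yet P4's 4K cannot load
```

The 5K stranded inside partially filled frames is internal fragmentation: it exceeds P4's 4K requirement, but none of it is allocatable.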
• Suppose that instead of
frames, our 32K system
uses segmentation.
• The memory segments
of two processes are
shown in the table at the
right.
• The segments can be
allocated anywhere in
memory.
• All of the segments of P1 and one of
the segments of P2 are loaded as
shown at the right.
• Segment S2 of process P2 requires
11K of memory, and there is only 1K
free, so it waits.
• Eventually, Segment 2 of Process 1
is no longer needed, so it is
unloaded, giving 11K of free memory.
• But Segment 2 of Process 2 cannot
be loaded because the free memory
is not contiguous.
• Over time, the problem gets
worse, resulting in small
unusable blocks scattered
throughout physical memory.
• This is an example of external
fragmentation.
• Eventually, this memory is
recovered through compaction,
and the process starts over.
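A sketch with hypothetical hole sizes shows why total free space is not enough for a segment request:

```python
# After segments are allocated and freed, free memory is split into holes.
holes = [1, 4, 2, 4]     # hypothetical free hole sizes in K, 11K in total
request = 11             # a segment needing 11K of *contiguous* memory

print(sum(holes))                        # 11: enough memory in total
print(any(h >= request for h in holes))  # False: no single hole fits
```

This is the condition compaction repairs: sliding the live segments together merges the holes into one 11K run.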
• Paging requires large page tables, but its uniform
memory mapping makes page operations fast.
• Segmentation allows fast access to the segment table,
but segment loading is labor-intensive.
• Paging and segmentation can be combined to take
advantage of the best features of both by assigning
fixed-size pages within variable-sized segments.
• Each segment has a page table. This means that a
memory address will have three fields, one for the
segment, another for the page, and a third for the
offset.
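A sketch of the three-field split, with hypothetical field widths (the slides do not specify them; a 4-bit segment, 6-bit page, and 10-bit offset in a 20-bit virtual address are assumed):

```python
SEG_BITS, PAGE_BITS, OFF_BITS = 4, 6, 10   # hypothetical widths

def split_spo(vaddr):
    """Split a 20-bit virtual address into (segment, page, offset)."""
    offset = vaddr & ((1 << OFF_BITS) - 1)
    page = (vaddr >> OFF_BITS) & ((1 << PAGE_BITS) - 1)
    segment = vaddr >> (OFF_BITS + PAGE_BITS)
    return segment, page, offset

# The segment field selects a per-segment page table, the page field
# indexes into that table, and the offset is kept unchanged.
print(split_spo(0b0011_000101_0000000011))   # (3, 5, 3)
```

Translation is thus two lookups, segment table then page table, followed by appending the offset to the frame number.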
6.6 A Real-World Example
TLB: translation look-aside buffer
• Computer memory is organized in a hierarchy
• Cache memory gives faster access to main memory
• Virtual memory gives the illusion of larger memory.
• Cache maps blocks of main memory to blocks of
cache memory. Virtual memory maps virtual pages
to physical page frames.
– Direct mapped, fully associative and set associative.
• Replacement policies: LRU, FIFO, or random.
– Policies must also consider what to do with dirty blocks.
• All virtual memory systems must deal with fragmentation.
Chapter 6 Conclusion