
Chapter 6
Memory (Part 2)

What we covered so far

• RAM vs. ROM

• Memory Hierarchy

• What a cache is

• Direct Mapped cache mapping

• Fully Associative cache mapping

• Set Associative cache mapping

6.4 Cache Memory

• When an associative cache is used, a replacement policy is needed to determine which block to evict.

• An optimal replacement policy would be able to see into the future to determine which blocks won't be needed for the longest period of time.

• Time travel is impossible, so an optimal replacement algorithm can't be implemented,

– however, it can be used as a potential best case against which to compare other replacement policies.

• The replacement policy that we choose depends upon the locality that we are trying to optimize.

– Usually, we are interested in temporal locality.

• A least recently used (LRU) algorithm keeps track of the last time that a block was accessed.

– It is complex: LRU must maintain an access history for each block, which ultimately slows down the cache.

• The first-in, first-out (FIFO) replacement policy is popular.

– The block that has been in the cache the longest is evicted.

• A random replacement policy is exactly that.

– It picks a block at random to evict (LRU and FIFO are sketched below).
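
To make the history-based policies concrete, here is a minimal Python sketch of one associative cache set supporting both LRU and FIFO eviction. The class, its names, and the use of OrderedDict are illustrative assumptions, not how hardware actually tracks access history.

from collections import OrderedDict

class CacheSet:
    """One associative cache set with a pluggable replacement policy."""
    def __init__(self, num_ways, policy="LRU"):   # policy: "LRU" or "FIFO"
        self.num_ways = num_ways
        self.policy = policy
        self.blocks = OrderedDict()               # tag -> data, oldest first

    def access(self, tag):
        if tag in self.blocks:
            if self.policy == "LRU":              # LRU refreshes recency on a hit;
                self.blocks.move_to_end(tag)      # FIFO keeps arrival order
            return "hit"
        if len(self.blocks) == self.num_ways:     # set is full:
            self.blocks.popitem(last=False)       # evict the oldest entry
        self.blocks[tag] = None                   # bring the new block in
        return "miss"

For the access sequence 1, 2, 1, 3 on a two-way set, LRU evicts block 2 (block 1 was touched more recently), while FIFO evicts block 1 (it arrived first).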

• Cache performance depends upon programs exhibiting good locality.

– Some object-oriented programs have poor locality owing to their complex, dynamic structures.

– Arrays stored in column-major rather than row-major order can be problematic for certain cache organizations (a related sketch follows below).

• With poor locality, the miss rate will be higher, degrading performance because blocks must frequently be reloaded into the cache.
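
The same effect can be modeled with a toy simulation: traversing a row-major array in column order touches a new cache block on every access. The sizes and the one-block "cache" below are assumptions chosen to keep the sketch short; it counts block loads and ignores everything else about a real cache.

ROWS, COLS, BLOCK = 64, 64, 16    # array shape; 16 words per cache block

def block_loads(order):
    """Count block loads for a row-major array traversed in a given order."""
    if order == "row":
        coords = ((r, c) for r in range(ROWS) for c in range(COLS))
    else:                                           # column-order traversal
        coords = ((r, c) for c in range(COLS) for r in range(ROWS))
    loads, current = 0, None
    for r, c in coords:
        block = (r * COLS + c) // BLOCK             # address -> block number
        if block != current:                        # not the block loaded last
            loads, current = loads + 1, block
    return loads

print(block_loads("row"))     # 256: consecutive elements share a block
print(block_loads("column"))  # 4096: every access lands in a new block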

• So far we have only discussed reads; writes to the cache must also make their way back to main memory.

• When the cache is written to, the block is marked as dirty.

• Dirty blocks must be written back to memory. A write policy determines how and when this is done.

• There are two types of write policies:

– Write-through updates the cache and main memory simultaneously on every write.

• Writes are slower but are often less frequent.

• Main memory stays consistent with the cache.

– Write-back updates main memory only when the block is evicted.

• Writes are not slowed down.

• Main memory is inconsistent with the cache (both policies are sketched below).
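
Here is a minimal sketch contrasting the two policies, using a toy one-block cache over a dictionary that stands in for main memory; the class and its names are assumptions for illustration.

class OneBlockCache:
    def __init__(self, memory, policy="write-back"):
        self.memory = memory            # backing store: address -> value
        self.policy = policy            # "write-through" or "write-back"
        self.addr = self.value = None
        self.dirty = False

    def write(self, addr, value):
        self.addr, self.value = addr, value
        if self.policy == "write-through":
            self.memory[addr] = value   # memory is updated on every write
        else:
            self.dirty = True           # write-back: just mark the block dirty

    def evict(self):
        if self.dirty:                  # dirty blocks must be written back
            self.memory[self.addr] = self.value
        self.addr = self.value = None
        self.dirty = False

With write-through the dictionary always matches the cache; with write-back it lags behind until evict() runs.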

• The cache we have been discussing is called a unified or integrated cache, where both instructions and data are cached.

• Many modern systems employ separate caches for data and instructions.

– This is called a Harvard cache.

• The separation of data from instructions provides better locality, at the cost of greater complexity.

– Simply making the cache larger provides about the same performance improvement without the complexity.

• Cache performance can also be improved by adding a small associative cache (a victim cache) to hold blocks that have been evicted recently.

• A trace cache is a variant of an instruction cache that holds decoded instructions.

• Many systems use multilevel cache hierarchies.

– The levels of cache form their own memory hierarchy.

– Level 1 and Level 2 caches are on the processor core.

– The Level 3 cache is shared among all cores in a multicore system.

– Each level is roughly 3 times slower than the one above it (a sketch follows below).
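
As a sketch of how the levels combine, the effective access time formula introduced on the next slide generalizes to more levels by weighting each level's access time by the fraction of references it ends up serving. Every number below is hypothetical.

# Hypothetical (fraction of references served, access time in ns) per level.
levels = [(0.95, 1), (0.04, 3), (0.009, 9)]   # L1, L2, L3
t_main = 100                                  # main memory access time
miss_all = 1 - sum(f for f, _ in levels)      # 0.001 reaches main memory
eat = sum(f * t for f, t in levels) + miss_all * t_main
print(round(eat, 3))                          # 1.251 ns weighted average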

• The performance of hierarchical memory is measured by its effective access time (EAT).

• EAT is a weighted average that takes into account the hit ratio and the relative access times of successive levels of memory.

• The EAT for a two-level memory is given by:

EAT = H × AccessC + (1 − H) × AccessMM

where H is the cache hit rate and AccessC and AccessMM are the access times for cache and main memory, respectively.

• For example, suppose:

– Main memory has an access time of 200ns.

– The cache has a 10ns access time and a hit rate of 99%.

– Cache and main memory are read concurrently.

• The effective access time (EAT) is:

0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns

• For example, suppose:

– Main memory has an access time of 200ns.

– The cache has a 10ns access time and a hit rate of 99%.

– Cache and main memory are read sequentially, so a miss also pays for the failed cache probe.

• The effective access time (EAT) is:

0.99(10ns) + 0.01(10ns + 200ns) = 9.9ns + 2.1ns = 12ns
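
Both variants of the calculation can be checked with a few lines of Python (a sketch; times are in nanoseconds):

def eat(hit_rate, t_cache, t_mem, concurrent=True):
    """Two-level EAT: a miss costs t_mem alone when reads are concurrent,
    or t_cache + t_mem when the cache is probed first (sequential)."""
    miss_cost = t_mem if concurrent else t_cache + t_mem
    return hit_rate * t_cache + (1 - hit_rate) * miss_cost

print(eat(0.99, 10, 200, concurrent=True))    # ~11.9
print(eat(0.99, 10, 200, concurrent=False))   # ~12.0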

Questions?

• Cache Memory

• Direct, Associative Mappings

• Replacement Policies

• Write Policies

• Effective access time (EAT)

6.5 Virtual Memory

• Cache memory gives us faster effective access to main memory.

• Virtual memory simulates having more memory without actually having more memory.

• A portion of cheaper, slower storage is used as an extension of main memory.

• Access to this extra memory comes at a performance cost.

• The operating system swaps pages of data between main memory and disk as needed.

• A Paged Memory Management Unit (PMMU) is used to create a larger address space.

– A physical address is a location in physical memory.

– A virtual address is an address managed by the PMMU.

– A virtual address is translated to a physical address.

• A page fault occurs when a virtual address is accessed but does not currently map to a physical address.

– This is similar to a cache miss.

• Memory fragmentation occurs when the paging process creates small, unusable ranges of addresses.

• Main memory and virtual memory are divided into equal-sized pages.

– These are like cache blocks.

• The entire address space required by a process need not be in memory at once. Some parts can be on disk, while others are in main memory.

• Further, the pages allocated to a process do not need to be stored contiguously, either on disk or in memory.

• In this way, only the needed pages are in memory at any time; the unneeded pages remain in slower disk storage.

• Information about each page is kept in a data structure called a page table.

• There is one page table for each active process.

• When a process generates a virtual address, the operating system translates it into a physical memory address.

• To accomplish this, the virtual address is divided into two fields: a page field and an offset field.

– The page field is the page number in virtual memory.

– The offset is the location of the address within the page.

• The logical page number is translated into a physical page frame through a lookup in the page table.

• If the valid bit is zero in the page table entry, the page is not in memory and must be fetched from disk.

– This is a page fault.

– If necessary, a page is evicted from memory and is replaced by the page retrieved from disk, and the valid bit is set to 1.

• If the valid bit is 1, the virtual page number is replaced by the physical frame number.

• The data is then accessed by adding the offset to the physical frame number.

• As an example, suppose a system has a virtual address space of 8K and a physical address space of 4K, and the system uses byte addressing.

– We have 2^13 / 2^10 = 2^3 virtual pages.

• A virtual address has 13 bits (8K = 2^13), with 3 bits for the page field and 10 for the offset, because the page size is 1024.

• A physical memory address requires 12 bits: the first two bits for the page frame and the trailing 10 bits for the offset.

• Suppose we have the page table for this system.

• What happens when the CPU generates these addresses?

• 5459₁₀ = 1010101010011₂ = 0x1553

• 4100₁₀ = 1000000000100₂ = 0x1004
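
Here is a sketch of the translation for these two addresses. Only the field widths come from the example; the page table contents below are hypothetical, since the slide's table figure is not reproduced here.

OFFSET_BITS = 10                              # 13-bit address: 3-bit page field

# Hypothetical page table: virtual page -> (valid bit, physical frame).
page_table = {0: (1, 2), 1: (0, None), 2: (1, 0), 3: (0, None),
              4: (1, 1), 5: (1, 3), 6: (0, None), 7: (0, None)}

def translate(vaddr):
    page = vaddr >> OFFSET_BITS               # high 3 bits: page number
    offset = vaddr & ((1 << OFFSET_BITS) - 1) # low 10 bits: offset in page
    valid, frame = page_table[page]
    if not valid:
        return f"page fault on page {page}: fetch it from disk"
    return (frame << OFFSET_BITS) | offset    # frame number, same offset

print(translate(5459))   # page 5, offset 339 -> frame 3 -> 3411 (hypothetical)
print(translate(4100))   # page 4, offset 4   -> frame 1 -> 1028 (hypothetical)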

• We said earlier that effective access time (EAT) takes all levels of memory into consideration.

• Thus, virtual memory is also a factor in the calculation, and we must also consider page table access time.

• Suppose a main memory access takes 200ns, the page fault rate is 1%, and it takes 10ms to load a page from disk. We have:

EAT = 0.99(200ns + 200ns) + 0.01(10ms) = 100,396ns
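
The same arithmetic in Python, following the slide's formula (each reference pays for two memory reads, as the next slide explains):

t_mem, t_disk = 200, 10_000_000    # ns; the 10ms disk time in nanoseconds
fault_rate = 0.01
eat_vm = (1 - fault_rate) * (t_mem + t_mem) + fault_rate * t_disk
print(round(eat_vm))               # 100396 ns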

• Even if we had no page faults, the EAT would be 400ns, because memory is always read twice: once to access the page table, and once to load the page from memory.

• Because page tables are read constantly, it makes sense to keep them in a special cache called a translation look-aside buffer (TLB).

• A TLB is a small associative cache that stores the mapping of virtual pages to physical pages; it is part of the PMMU (a sketch follows below).
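
Extending the earlier translation sketch with a TLB in front of the page table (the table contents remain hypothetical):

OFFSET_BITS = 10
page_table = {4: (1, 1), 5: (1, 3)}   # hypothetical: page -> (valid, frame)
tlb = {}                              # page -> frame, checked before the table

def translate_with_tlb(vaddr):
    page = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if page in tlb:                   # TLB hit: no page table read needed
        return (tlb[page] << OFFSET_BITS) | offset
    valid, frame = page_table[page]   # TLB miss: read the page table
    if not valid:
        raise LookupError(f"page fault on page {page}")
    tlb[page] = frame                 # remember the translation
    return (frame << OFFSET_BITS) | offset

print(translate_with_tlb(5459))   # miss: walks the page table, returns 3411
print(translate_with_tlb(5459))   # hit: answered straight from the TLB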

• Another approach to virtual memory is the use of segmentation.

• Instead of dividing memory into equal-sized pages, virtual address space is divided into variable-length segments, often under the control of the programmer.

• A segment is located through its entry in a segment table, which contains the segment's memory location and a bounds limit that indicates its size (a lookup sketch follows below).

• After a fault, the operating system searches for a location in memory large enough to hold the segment that is retrieved from disk.
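
A minimal sketch of a segment table lookup with the bounds check; the base addresses and limits below are made up for illustration.

# Hypothetical segment table: segment number -> (base address, size limit).
segment_table = {0: (4096, 2048), 1: (10240, 512)}

def translate_segment(seg, offset):
    base, limit = segment_table[seg]
    if offset >= limit:               # the bounds limit guards the segment size
        raise MemoryError(f"offset {offset} is outside segment {seg}")
    return base + offset              # physical address = base + offset

print(translate_segment(0, 100))   # 4196
# translate_segment(1, 600)        # would raise MemoryError (600 >= 512)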

• Both paging and segmentation can cause fragmentation.

• Paging is subject to internal fragmentation.

– The whole page might not be needed.

• Segmentation is subject to external fragmentation.

– Segment sizes vary as they are allocated and freed.

– Awkwardly sized gaps form over time.

The next slides illustrate internal and external fragmentation.

• 32K of memory.

• 8 frames of 4K pages each.

• 4 processes, P1 to P4, together need 31K.

• The first three processes are loaded into memory.

• All eight frames are used by the three processes.

• P4 has to wait for one of the other three processes to terminate, because there are no unallocated frames even though there is enough unused memory.

• This is an example of internal fragmentation.

• Suppose that instead of frames, our 32K system uses segmentation.

• The memory segments of two processes are shown in the accompanying table.

• The segments can be allocated anywhere in memory.

• All of the segments of P1 and one of the segments of P2 are loaded into memory.

• Segment S2 of process P2 requires 11K of memory, and there is only 1K free, so it waits.

• Eventually, Segment 2 of Process 1 is no longer needed, so it is unloaded, giving 11K of free memory.

• But Segment 2 of Process 2 still cannot be loaded, because the free memory is not contiguous.

• Over time, the problem gets worse, resulting in small unusable blocks scattered throughout physical memory.

• This is an example of external fragmentation.

• Eventually, this memory is recovered through compaction, and the process starts over.

• Large page tables are slow to search, but paging's uniform memory mapping makes page operations fast.

• Segmentation allows fast access to the segment table, but segment loading is labor-intensive.

• Paging and segmentation can be combined to take advantage of the best features of both by assigning fixed-size pages within variable-sized segments.

• Each segment has its own page table. This means that a memory address will have three fields: one for the segment, another for the page, and a third for the offset (a sketch follows below).
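
A sketch of splitting such a three-field address; the field widths below are assumptions, not values from the text.

SEG_BITS, PAGE_BITS, OFF_BITS = 4, 6, 10   # hypothetical 20-bit address

def split(vaddr):
    offset = vaddr & ((1 << OFF_BITS) - 1)
    page = (vaddr >> OFF_BITS) & ((1 << PAGE_BITS) - 1)
    segment = (vaddr >> (OFF_BITS + PAGE_BITS)) & ((1 << SEG_BITS) - 1)
    return segment, page, offset   # segment table -> that segment's page table

print(split(0b0011_000101_0000000111))   # (3, 5, 7)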

6.6 A Real-World Example

(Figure: TLB, the translation look-aside buffer.)

Chapter 6 Conclusion

• Computer memory is organized in a hierarchy.

• Cache memory gives faster access to main memory.

• Virtual memory gives the illusion of a larger memory.

• A cache maps blocks of main memory to blocks of cache memory; virtual memory maps virtual pages to physical page frames.

– Cache mappings may be direct mapped, fully associative, or set associative.

• Replacement policies include LRU, FIFO, and random.

– Policies must also consider what to do with dirty blocks.

• All virtual memory systems must deal with fragmentation.
