
Use main memory as a “cache” for secondary (disk) storage
– Managed jointly by CPU hardware and the operating system (OS)
Programs share main memory
– Each gets a private virtual address space holding its frequently used code and data
– Protected from other programs
CPU and OS translate virtual addresses to physical addresses
– VM “block” is called a page
– VM translation “miss” is called a page fault
Virtual Memory
CS@VT Computer Organization II ©2005-2015 McQuain

Idea: hold in physical memory only the data that a process actually accesses
Maintain a map for each process:
{ virtual addresses } → { physical addresses } ∪ { disk addresses }
OS manages the mapping, deciding which virtual addresses map to physical memory (if allocated) and which to disk
Disk addresses include:
– Executable .text, initialized data
– Swap space (typically lazily allocated)
– Memory-mapped (mmap’d) files (see example)
Demand paging: bring data in from disk lazily, on first access
– Unbeknownst to application
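Demand paging can be observed from user code. A minimal sketch, assuming CPython with `mmap` support; the lazy frame allocation is OS behavior behind the scenes, not something the `mmap` API promises:

```python
# An anonymous mmap region illustrates demand paging: the OS reserves
# address space up front, but typically installs physical frames lazily,
# on the first access to each page -- invisibly to the program.
import mmap

PAGE = mmap.PAGESIZE                 # commonly 4096 bytes
region = mmap.mmap(-1, 16 * PAGE)    # 16 pages of anonymous memory

# The first write to each page triggers a (minor) page fault; the OS
# installs a zero-filled frame, unbeknownst to the application.
for i in range(0, 16 * PAGE, PAGE):
    region[i] = 1

print(region[0], region[PAGE])       # 1 1  (both pages now resident)
region.close()
```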
Paging to/from Disk

OS maintains the structure of each process's address space – which addresses are valid and what they refer to, even for addresses not currently in main memory
[Figure: layout of a process's virtual memory image, from high addresses down to address 0, with the backing store for each region]
– kernel virtual memory: not paged, or swap file
– stack (top of stack at %esp): swap file
– memory-mapped region for shared libraries: code from the shared .so file; data from the swap file (*)
– run-time heap (via malloc): swap file
– uninitialized data (.bss): swap file
– initialized data (.data): swap file (*)
– program text (.text): executable
Process Virtual Memory Image

Fixed-size pages (e.g., 4KB)
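With fixed-size pages, translation starts by splitting the virtual address into a page number and an offset. A sketch for the 4KB case (the example address is made up):

```python
# Splitting a virtual address with fixed-size 4 KB pages.
# The low 12 bits (2**12 = 4096) are the page offset and pass through
# translation unchanged; the high bits are the virtual page number.
PAGE_SIZE = 4096
OFFSET_BITS = 12                      # log2(4096)

def split(vaddr):
    vpn = vaddr >> OFFSET_BITS        # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)  # offset within the page
    return vpn, offset

vpn, off = split(0x12ABC)
print(hex(vpn), hex(off))             # 0x12 0xabc
```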
[Figure: address translation maps virtual pages to physical pages in memory or to the swap file]
Address Translation

On page fault, the page must be fetched from disk
– Takes millions of clock cycles
– Handled by OS code
Try to minimize page fault rate
– Fully associative placement
– Smart replacement algorithms
How bad is that?
Assume a 3 GHz clock rate. Then 1 million clock cycles would take 1/3000 seconds or 1/3 ms.
Subjectively, a single page fault would not be noticed… but page faults can add up.
We must try to minimize the number of page faults.
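The arithmetic above can be checked directly (the fault rate in the second step is an illustrative assumption, not a measured figure):

```python
# Cost of one page fault: ~1 million cycles at a 3 GHz clock.
clock_hz = 3.0e9                   # 3 GHz
fault_cycles = 1.0e6               # cycles to service one page fault
fault_s = fault_cycles / clock_hz  # = 1/3000 s, about 0.33 ms per fault

# Faults add up: even a modest 100 faults/second (assumed rate)
# would consume this fraction of every second of CPU time:
busy_fraction = 100 * fault_s      # 1/30, i.e. about 3.3%
```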
Page Fault Penalty

The page table stores placement information
– Array of page table entries, indexed by virtual page number
– Page table register in CPU points to page table in physical memory
If page is present in memory
– PTE stores the physical page number
– Plus other status bits (referenced, dirty, …)
If page is not present
– PTE can refer to location in swap space on disk
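The two cases above can be sketched with a toy page table. The PTE layout here is hypothetical, chosen only to mirror the bullets (present bit, physical page number, status bits, swap location):

```python
# A page table as an array of entries indexed by virtual page number.
OFFSET_BITS = 12

class PTE:
    def __init__(self, present, ppn=None, disk_slot=None):
        self.present = present       # page currently in physical memory?
        self.ppn = ppn               # physical page number, if present
        self.disk_slot = disk_slot   # swap-space location, if not present
        self.referenced = False      # status bits (referenced, dirty, ...)
        self.dirty = False

def translate(page_table, vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    pte = page_table[vpn]
    if not pte.present:              # page fault: PTE refers to disk
        raise LookupError(f"page fault: fetch swap slot {pte.disk_slot}")
    pte.referenced = True            # hardware sets the use bit on access
    return (pte.ppn << OFFSET_BITS) | offset

pt = [PTE(present=True, ppn=0x7), PTE(present=False, disk_slot=42)]
print(hex(translate(pt, 0x0ABC)))    # 0x7abc
```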
Page Tables

[Figure: translation using a page table, steps 1–5]
Translation Using a Page Table

Mapping Pages to Storage

To reduce page fault rate, prefer least-recently used (LRU) replacement (or approximation)
– Reference bit (aka use bit) in PTE set to 1 on access to page
– Periodically cleared to 0 by OS
– A page with reference bit = 0 has not been used recently
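The reference-bit scheme above is the basis of the classic "clock" approximation to LRU. A sketch (names and structures are illustrative, not any OS's API): sweep over resident pages; a page with use bit 1 gets a second chance and its bit is cleared, while a page with use bit 0 has not been used recently and becomes the victim.

```python
# Clock-style victim selection using per-page reference ("use") bits.
def clock_choose_victim(pages, hand):
    """pages: list of dicts with a 'ref' bit; hand: current sweep position.
    Returns (victim index, next hand position)."""
    while True:
        page = pages[hand]
        if page["ref"] == 0:                       # not used recently
            return hand, (hand + 1) % len(pages)
        page["ref"] = 0                            # second chance: clear bit
        hand = (hand + 1) % len(pages)

frames = [{"ref": 1}, {"ref": 1}, {"ref": 0}, {"ref": 1}]
victim, hand = clock_choose_victim(frames, 0)
print(victim)   # 2 -- the first frame found with reference bit 0
```

Even if every bit is 1, the sweep terminates: the first pass clears all the bits, so the second pass picks a victim.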
Disk writes take millions of cycles
– Write a whole page at once, not individual locations
– Write through is impractical
– Use write-back
– Dirty bit in PTE set when page is written
Replacement and Writes

Address translation would appear to require extra memory references
– One to access the PTE
– Then the actual memory access
We can't afford to keep all the PTEs at the processor level.
But access to page tables has good locality
– So use a fast cache of PTEs within the CPU
– Called a Translation Look-aside Buffer (TLB)
– Typical: 16–512 PTEs, 0.5–1 cycle for hit, 10–100 cycles for miss, 0.01%–1% miss rate
– Misses could be handled by hardware or software
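The hit/miss behavior above can be sketched with a toy fully associative TLB (a dict from VPN to PPN, backed by a page table; sizes, mappings, and the eviction policy here are illustrative):

```python
# A tiny fully associative TLB: hit -> use cached PPN; miss -> walk the
# page table and refill the TLB so the retry will hit.
OFFSET_BITS = 12
TLB_ENTRIES = 16

tlb = {}                                  # vpn -> ppn (the cached PTEs)
page_table = {0x12: 0x7, 0x13: 0x9}       # hypothetical mappings

def tlb_translate(vaddr):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:                        # TLB hit: ~0.5-1 cycle
        ppn = tlb[vpn]
    else:                                 # TLB miss: 10-100 cycles
        ppn = page_table[vpn]             # (page fault if absent; omitted)
        if len(tlb) >= TLB_ENTRIES:
            tlb.pop(next(iter(tlb)))      # crude oldest-entry eviction
        tlb[vpn] = ppn                    # refill the TLB
    return (ppn << OFFSET_BITS) | offset

print(hex(tlb_translate(0x12ABC)))   # 0x7abc  (miss, refilled)
print(hex(tlb_translate(0x12DEF)))   # 0x7def  (hit in the TLB)
```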
Fast Translation Using a TLB


If page is in memory
– Load the PTE from memory and retry
– Could be handled in hardware
  · Can get complex for more complicated page table structures
– Or in software
  · Raise a special exception, with an optimized handler
If page is not in memory (page fault)
– OS handles fetching the page and updating the page table
– Then restart the faulting instruction
TLB Misses

A TLB miss can indicate either
– Page present, but PTE not in TLB
– Page not present
Must recognize the TLB miss before the destination register is overwritten
– Raise exception
Handler copies PTE from memory to TLB
– Then restarts instruction
– If page not present, page fault will occur
TLB Miss Handler

Use the faulting virtual address to find the PTE
Locate the page on disk
Choose page to replace
– If dirty, write to disk first
Read page into memory and update page table
Make process runnable again
– Restart from faulting instruction
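The steps above can be sketched end to end (all structures here are illustrative dicts; a real handler runs in the kernel and manipulates hardware page tables):

```python
# A toy page-fault handler: find the PTE, choose a victim, write the
# victim back only if dirty, read the wanted page in, update the table.
def choose_victim(page_table):
    # toy policy: evict the first resident page
    # (a real OS would approximate LRU using reference bits)
    return next(vpn for vpn, e in page_table.items() if e["present"])

def handle_page_fault(vaddr, page_table, frames, disk):
    vpn = vaddr >> 12
    pte = page_table[vpn]                    # 1. faulting address -> PTE
    slot = pte["disk_slot"]                  # 2. locate the page on disk
    v_vpn = choose_victim(page_table)        # 3. choose page to replace
    victim = page_table[v_vpn]
    if victim["dirty"]:                      #    write back only if dirty
        disk[victim["disk_slot"]] = frames[victim["ppn"]]
    victim["present"] = False
    frames[victim["ppn"]] = disk[slot]       # 4. read page into memory
    pte.update(present=True, ppn=victim["ppn"], dirty=False)
    # 5. process made runnable; the faulting instruction restarts

disk = {0: b"old", 1: b"wanted"}
frames = {0: b"dirty-data"}
pt = {5: {"present": True,  "ppn": 0,    "dirty": True,  "disk_slot": 0},
      9: {"present": False, "ppn": None, "dirty": False, "disk_slot": 1}}
handle_page_fault(9 << 12, pt, frames, disk)
print(pt[9]["present"], frames[0])           # True b'wanted'
```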
Page Fault Handler

If the cache tag uses the physical address
– Need to translate before the cache lookup
Alternative: use virtual address tags
– Complications due to aliasing
  · Different virtual addresses for the same shared physical address
TLB and Cache Interaction

Different tasks can share parts of their virtual address spaces
– But need to protect against errant access
– Requires OS assistance
Hardware support for OS protection
– Privileged supervisor mode (aka kernel mode)
– Privileged instructions
– Page tables and other state information only accessible in supervisor mode
– System call exception (e.g., syscall in MIPS)
Memory Protection

Intel Nehalem 4-core processor
Per core: 32KB L1 I-cache, 32KB L1 D-cache, 512KB L2 cache
Multilevel On-Chip Caches

                    Intel Nehalem                        AMD Opteron X4
Virtual addr        48 bits                              48 bits
Physical addr       44 bits                              48 bits
Page size           4KB, 2/4MB                           4KB, 2/4MB
L1 TLB (per core)   L1 I-TLB: 128 entries for small      L1 I-TLB: 48 entries
                      pages, 7 per thread (2×) for       L1 D-TLB: 48 entries
                      large pages                        Both fully associative,
                    L1 D-TLB: 64 entries for small         LRU replacement
                      pages, 32 for large pages
                    Both 4-way, LRU replacement
L2 TLB (per core)   Single L2 TLB: 512 entries,          L2 I-TLB: 512 entries
                      4-way, LRU replacement             L2 D-TLB: 512 entries
                                                         Both 4-way, round-robin LRU
TLB misses          Handled in hardware                  Handled in hardware
2-Level TLB Organization

Nehalem Overview