Virtual Memory
Questions answered in this lecture:
How to run process when not enough physical memory?
When should a page be moved from disk to memory?
What page in memory should be replaced?
How can the LRU page be approximated efficiently?
CSE 2431
Introduction to Operating Systems
Based on slides by Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau
Reading
Chapters 21, 22
Motivation
OS goal: Support processes when not enough physical memory
Single process with very large address space (see next slide)
Multiple processes with large combined address spaces
User code should be independent of amount of physical memory
Correctness at least, if not performance
Virtual memory: OS provides illusion of more physical memory
Why does this work?
Relies on key properties of user processes (workload) and machine architecture (hardware)
How much memory?
Even in today's 64-bit systems, although addresses are 64 bits wide, fewer than 64 bits are actually used.
In most systems, only the 42 least significant bits of the 64-bit virtual address are used (the 22 most significant bits are always 0).
How much RAM (main memory) can be accessed using a 42-bit virtual address?
A 32-bit virtual address allows a 4 GB virtual address space; how much more does 42 bits allow?
2^10 × 4 GB = 4 TB (4 terabytes)!
Do YOU have 4 TB of RAM in your system?
Do you know anyone who does?
You can see we still have a lot of room for growth in future systems!
Combined address spaces
Suppose, in a system with 16 GB of RAM (quite typical today), we are running 20 different processes, and each process uses a 4 GB virtual address space.
What is the size of the combined address spaces?
20 × 4 GB = 80 GB
WAIT! We only have 16 GB of RAM. How can we run all 20 of these processes at the same time?
Using paging, it is not only possible, but quite straightforward! [See below]
Virtual Memory
[Figure sequence: a program image on disk (code, data) is used to create Process 1, whose virtual address space holds code, data, heap, and stack.]
What's in code? Besides the program itself (Prog), the code region holds many large libraries (LibA, LibB, LibC), some of which are rarely/never used.
How to avoid wasting physical pages to back rarely used virtual pages?
[Figure sequence: only the parts actually referenced (Prog, LibC, plus data/heap/stack) are backed by frames in Phys Memory; LibA and LibB live only on disk. When the process accesses LibB, the OS copies (or moves) LibB from disk to RAM.]
Called "paging".
Locality of Reference
Leverage locality of reference within processes
Spatial: reference memory addresses near previously referenced addresses
Temporal: reference memory addresses that have been referenced in the past
Processes spend majority of time in small portion of virtual address space
Estimate: 90% of time in 10% of virtual space
Implication:
Process only uses small amount of address space at any moment (over any short time window)
Only small amount of address space must be resident in physical memory
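To make the two properties concrete, here is a toy C loop (the array name and sizes are invented for illustration): consecutive iterations touch adjacent addresses (spatial locality), and every pass re-touches the same array and the same loop code (temporal locality).
```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];                       /* 4 KB of data: a handful of pages */
    long sum = 0;
    for (int pass = 0; pass < 4; pass++)   /* temporal: the same addresses are */
        for (int i = 0; i < N; i++)        /* touched again on every pass      */
            sum += a[i];                   /* spatial: a[i] sits next to a[i-1] */
    printf("sum = %ld\n", sum);
    return 0;
}
```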
Memory Hierarchy
Leverage memory hierarchy of machine architecture
Each layer (except the highest) acts as a "backing store" for the layer above
[Figure: registers, cache, main memory, disk storage; moving down the hierarchy, size grows while speed and cost per byte shrink.]
Virtual Memory Intuition
Idea: OS keeps unreferenced pages for each process on disk
Slower, cheaper, and larger backing store than memory
Process can run when not all pages are loaded into main memory (there must be a mechanism to make this work, which we will see below)
OS and hardware cooperate to provide illusion of large disk as fast as main memory
Same behavior (in terms of correctness) as if the entire address space of the process were in main memory
Hopefully similar performance as well (perhaps not quite as good, but the aim is performance close to having all of the process's pages loaded in main memory)
Requirements:
OS must have a mechanism to identify the location of each page of the address space: in memory, on disk, or neither (invalid)
OS must have a policy for determining which pages live in memory and which on disk at any given time (the split virtually always changes as the process runs)
Virtual Address Space Mechanisms
Each page in virtual address space of process maps to one of three locations:
Physical main memory (RAM): Smaller, faster, more expensive
Disk (backing store): Larger, slower, cheaper
No location (error): Invalid/unused page for process
Extend page table entries (PTEs) with an extra bit: present
Each PTE now holds: permissions (r/w/x), valid, present
Page in memory: present bit set (bit is 1) in PTE
Page on disk: present bit cleared (bit is 0) in PTE
PTE points to block on disk
Causes exception into OS when page is referenced
Exception: page fault (called a fault because instruction which causes fault will execute again when process resumes execution after the exception is handled by the OS)
NOTE: Saying a bit is SET means it has the value 1; saying a bit is CLEARED means it has the value 0.
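As a rough illustration of such a PTE, here is a sketch in C; the field widths and layout are invented for this example and do not match any real architecture.
```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical PTE layout: field widths are made up for illustration. */
typedef struct {
    uint64_t valid   : 1;   /* is this page part of the address space at all? */
    uint64_t present : 1;   /* 1 = in physical memory, 0 = out on disk */
    uint64_t prot    : 3;   /* permissions: r/w/x */
    uint64_t dirty   : 1;   /* has the page been modified? */
    uint64_t pfn     : 40;  /* frame number if present; disk block if not */
} Pte;

int main(void) {
    Pte p = { .valid = 1, .present = 0, .prot = 6, .pfn = 23 };  /* rw-, on disk */
    printf("page is %s\n", p.present ? "in memory" : "on disk");
    return 0;
}
```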
Present Bit

PFN  valid  prot  present
10   1      r-x   1
–    0      –     –
23   1      rw-   0
28   1      rw-   0
4    1      rw-   1
–    0      –     –    (8 unused entries)
16   1      rw-   1

[Figure: entries with present = 1 point to frames in Phys Memory; entries with present = 0 point to blocks on Disk.]

What if we access vpn 0xb? That PTE is invalid (valid = 0), so the access is an error, and the OS terminates the process (e.g., segmentation fault).
Virtual Memory Mechanisms
Hardware and OS cooperate to translate addresses
First, hardware checks TLB for virtual address
if TLB hit, address translation is done; page in physical memory
If TLB miss…
Hardware or OS walks the page tables (depends on whether the TLB is hardware-managed or OS-managed)
If PTE designates page is present, then page in physical memory
If page fault (i.e., present bit is cleared)
Trap into OS (page fault not handled by hardware)
OS selects victim page in memory to replace (if no frames free)
Write victim page out to disk if modified (add dirty bit to PTE)
OS reads referenced page from disk into memory
Page table is updated, present bit is set for page brought in
Process continues execution
What should scheduler do?
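(A likely answer: block the faulting process and run another one while the slow disk read completes.) The steps above can also be made concrete as a toy, user-level simulation; everything here (Pte, the round-robin victim choice, the page and frame counts) is invented for the sketch, and the TLB, scheduling, and real disk I/O are omitted.
```c
#include <stdio.h>
#include <stdbool.h>

#define PAGES  8   /* virtual pages per process */
#define FRAMES 4   /* physical frames */

typedef struct { bool valid, present, dirty; int pfn; } Pte;

static Pte pt[PAGES];            /* the process's page table */
static int frame_to_vpn[FRAMES]; /* reverse map, -1 = free frame */
static int hand = 0;             /* trivial round-robin victim choice */

static int translate(int vpn) {
    if (!pt[vpn].valid) return -1;        /* no location: error (segfault) */
    if (!pt[vpn].present) {               /* page fault: "trap into OS" */
        int f = hand++ % FRAMES;          /* select victim frame */
        int old = frame_to_vpn[f];
        if (old >= 0) {
            if (pt[old].dirty)            /* write back only if modified */
                printf("  write vpn %d back to disk\n", old);
            pt[old].present = false;      /* evict the victim */
        }
        printf("  read vpn %d from disk into frame %d\n", vpn, f);
        pt[vpn].pfn = f;
        pt[vpn].present = true;           /* faulting access is now retried */
        frame_to_vpn[f] = vpn;
    }
    return pt[vpn].pfn;
}

int main(void) {
    for (int i = 0; i < PAGES; i++) pt[i].valid = (i < 6); /* vpn 6,7 unused */
    for (int f = 0; f < FRAMES; f++) frame_to_vpn[f] = -1;
    int refs[] = {0, 1, 2, 3, 4, 0, 5, 7};
    for (int i = 0; i < 8; i++) {
        int f = translate(refs[i]);
        if (f < 0) printf("vpn %d: segfault\n", refs[i]);
        else       printf("vpn %d -> frame %d\n", refs[i], f);
    }
    return 0;
}
```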
Mechanism for Continuing a Process
Continuing a process after a page fault is tricky
Want page fault to be transparent to user
Page fault may have occurred in middle of instruction
When instruction is being fetched
When data is being loaded or stored
Requires hardware support
Precise interrupts: stop the CPU pipeline such that instructions before the faulting instruction have completed, and those after it can be restarted
Complexity depends upon instruction set
Can faulting instruction be restarted from beginning?
Must track side effects so hardware can undo
Virtual Memory Policies
Goal: Minimize number of page faults
Page faults require milliseconds to handle (reading from disk)
Implication: Plenty of time for OS to make good decision
OS has two decisions
Page selection
When should a page (or pages) on disk be brought into memory?
Page replacement (if no free frame available in memory)
Which resident page (or pages) in memory should be thrown out to disk?
Page Selection
When should a page be brought from disk into memory?
Demand paging: Load page only when page fault occurs
Intuition: Wait until page must absolutely be in memory
When process starts: No pages are loaded in memory
First instruction (typically, first few) will cause page faults
Problems: Pay cost of page fault for every newly accessed page
Prepaging (anticipatory, prefetching): Load page before referenced
OS predicts future accesses (oracle) and brings pages into memory early
Works well for some access patterns (e.g., sequential)
Problems?
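A minimal sketch of the two policies side by side, with an invented load_from_disk stand-in for real disk I/O: with prefetching on, a sequential scan of four pages costs two faults instead of four. (One problem, of course: a wrong guess wastes disk bandwidth and a frame.)
```c
#include <stdio.h>
#include <stdbool.h>

#define PAGES 16

static bool present[PAGES];
static int  faults;

static void load_from_disk(int vpn) {   /* stand-in for a real disk read */
    printf("disk read: vpn %d\n", vpn);
    present[vpn] = true;
}

static void on_page_fault(int vpn, bool prefetch) {
    faults++;
    load_from_disk(vpn);                /* demand paging: the page that faulted */
    if (prefetch && vpn + 1 < PAGES && !present[vpn + 1])
        load_from_disk(vpn + 1);        /* prepaging: bet on sequential access */
}

int main(void) {
    for (int vpn = 0; vpn < 4; vpn++)   /* a sequential scan of four pages */
        if (!present[vpn])
            on_page_fault(vpn, true);
    printf("faults: %d\n", faults);     /* 2 with prefetch, 4 without */
    return 0;
}
```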
Page Replacement
Which page in main memory should be selected as victim (to be replaced)?
Write out victim page to disk if modified (dirty bit set)
If victim page is not modified (clean), just “discard” (that is, overwrite)
SOME POSSIBLE POLICIES BELOW (but not all)
OPT: Replace page not used for longest time in future
Advantages: Guaranteed to minimize number of page faults
Disadvantages: Requires that OS predict the future; Not practical, but good for comparison
FIFO: Replace page that has been in memory the longest
Intuition: First referenced long time ago, probably done with it now
Advantages: Fair: All pages receive equal residency; Easy to implement (circular buffer)
Disadvantage: Some pages may always be needed
LRU: Least-recently-used: Replace page not used for longest time in past
Intuition: Use past to predict the future
Advantages: With locality, LRU approximates OPT
Disadvantages:
Harder to implement, must track which pages have been accessed
Does not handle all workloads well
Page Replacement Example
Page reference string: A B C A B D A D B C B
Three pages of physical memory. Metric: miss count.

Ref:   A  B  C  A  B  D  A  D  B  C  B
OPT:   M  M  M  H  H  M  H  H  H  M  H   -> 5 misses
FIFO:  M  M  M  H  H  M  M  H  M  M  H   -> 7 misses
LRU:   M  M  M  H  H  M  H  H  H  M  H   -> 5 misses
(M = miss, H = hit)
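The counts above can be checked with a small, self-contained C simulator; FRAMES, simulate, and the single-character page names are all invented for this sketch, and OPT is omitted since it requires knowledge of the future.
```c
#include <stdio.h>
#include <string.h>

#define FRAMES 3

/* Count misses for a reference string under FIFO (lru = 0) or LRU (lru = 1). */
static int simulate(const char *refs, int lru) {
    char frame[FRAMES];    /* resident pages; '\0' means the frame is free */
    long last_use[FRAMES]; /* time of last reference (LRU's metric) */
    long loaded[FRAMES];   /* time of load (FIFO's metric) */
    int misses = 0;
    memset(frame, 0, sizeof frame);
    for (long t = 0; refs[t] != '\0'; t++) {
        int hit = -1;
        for (int i = 0; i < FRAMES; i++)
            if (frame[i] == refs[t]) hit = i;
        if (hit >= 0) { last_use[hit] = t; continue; }
        misses++;
        int victim = 0;    /* a free frame if any, else smallest policy metric */
        for (int i = 0; i < FRAMES; i++) {
            if (frame[i] == '\0') { victim = i; break; }
            long v = lru ? last_use[victim] : loaded[victim];
            long c = lru ? last_use[i] : loaded[i];
            if (c < v) victim = i;
        }
        frame[victim] = refs[t];
        last_use[victim] = loaded[victim] = t;
    }
    return misses;
}

int main(void) {
    const char *refs = "ABCABDADBCB";
    printf("FIFO misses: %d\n", simulate(refs, 0)); /* 7 */
    printf("LRU  misses: %d\n", simulate(refs, 1)); /* 5 */
    return 0;
}
```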
Page Replacement Comparison
Add more physical memory, what happens to performance?
LRU, OPT: Add more memory, guaranteed to have fewer (or same number of) page faults
Smaller memory sizes are guaranteed to contain a subset of larger memory sizes
Stack property: smaller cache always subset of bigger
FIFO: Add more memory, usually have fewer page faults
Belady’s anomaly: May actually have more page faults!
FIFO Performance May Decrease!
Consider access stream: ABCDABEABCDE
Consider physical memory size: 3 pages vs. 4 pages
How many misses with FIFO?
3 pages: 9 misses
4 pages: 10 misses
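As a sanity check, the FIFO/LRU simulator sketched earlier reproduces this anomaly: with refs set to "ABCDABEABCDE", raising FRAMES from 3 to 4 raises the FIFO miss count from 9 to 10.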
Problems with
LRU-based Replacement
LRU does not consider frequency of accesses
Is a page accessed once in the past equal to one accessed N times?
Common workload problem:
Scanning (sequentially reading, with no reuse) one large data region flushes memory
Solution: Track frequency of accesses to page
Pure LFU (Least-frequently-used) replacement
Problem: LFU can never forget pages from the far past
There are other more sophisticated algorithms, sometimes used for certain kinds of applications:
LRU-K: Combines recency and frequency attributes
Expensive to implement, used in databases
Implementing LRU
Software Perfect LRU
OS maintains ordered list of physical pages by reference time
When page is referenced: Move page to front of list
When need victim: Pick page at back of list
Trade-off: Slow on memory reference, fast on replacement
Hardware Perfect LRU
Associate timestamp register with each page
When page is referenced: Store system clock in register
When need victim: Scan through registers to find oldest clock
Trade-off: Fast on memory reference, slow on replacement (especially as size of memory grows)
In practice, do not implement Perfect LRU
LRU is an approximation anyway, so use approximation to LRU
Goal: Find an old page, but not necessarily the very oldest
Clock Algorithm
Hardware
Keep use (or reference) bit for each page frame
When page is referenced: set use bit
Operating System
Page replacement: Look for page with use bit cleared
(has not been referenced for a while)
Implementation:
Keep pointer (the clock hand) to the last examined page frame
Traverse page frames as a circular buffer
Clear use bits as the search proceeds
Stop when a page with an already-cleared use bit is found; replace this page
Clock: Look For a Page
[Worked example from the figures: four page frames (0-3), one use bit each.]
Initially use = [1, 1, 0, 1], clock hand at page 0.
Hand at page 0: use = 1, so clear it and advance. use = [0, 1, 0, 1].
Hand at page 1: use = 1, so clear it and advance. use = [0, 0, 0, 1].
Hand at page 2: use = 0. Evict page 2 because it has not been recently used.
Later, page 0 is accessed, so its use bit is set: use = [1, 0, 0, 1].
On the next replacement, hand at page 3: use = 1, clear it and advance. use = [1, 0, 0, 0].
Hand at page 0: use = 1, clear it and advance. use = [0, 0, 0, 0].
Hand at page 1: use = 0. Evict page 1 because it has not been recently used.
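The scan traced above fits in a few lines of C. This self-contained sketch hard-codes the starting use bits from the figures and prints the same two victims (pages 2 and 1); loading the new page and setting its bits is omitted.
```c
#include <stdio.h>

#define FRAMES 4

static int use[FRAMES] = {1, 1, 0, 1};   /* the starting state in the figures */
static int hand = 0;                     /* the clock hand */

static int clock_evict(void) {
    for (;;) {
        if (use[hand] == 0) {            /* not recently used: victim found */
            int victim = hand;
            hand = (hand + 1) % FRAMES;
            return victim;
        }
        use[hand] = 0;                   /* referenced recently: clear and move on */
        hand = (hand + 1) % FRAMES;
    }
}

int main(void) {
    printf("evict page %d\n", clock_evict());  /* page 2, as in the trace */
    use[0] = 1;                                /* page 0 is accessed */
    printf("evict page %d\n", clock_evict());  /* page 1, as in the trace */
    return 0;
}
```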
Clock Extensions
Replace multiple pages at once
Intuition:
Expensive to run replacement algorithm and to write single block to disk
Find multiple victims each time and track free list
Add software counter (“chance”)
Intuition: Better ability to differentiate across pages (how much they are being accessed)
Increment software counter if use bit is 0
Replace when chance exceeds some specified limit
Use dirty bit to prefer replacing clean pages (avoid replacing dirty pages when possible)
Intuition: More expensive to replace dirty pages
Dirty pages must be written to disk; clean pages need not be
Replace pages that have both use bit and dirty bit cleared
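A sketch of how the last two extensions might combine; the chance counter, LIMIT, and the exact dirty-page rule are invented details, not a standard formulation. A page is evicted only after surviving LIMIT full sweeps unreferenced; clean pages are preferred, and a dirty page is taken only after surviving twice as long.
```c
#include <stdio.h>

#define FRAMES 4
#define LIMIT  2   /* sweeps a page must survive, unreferenced, before eviction */

static int use[FRAMES], dirty[FRAMES], chance[FRAMES];
static int hand;

static int clock_evict_ext(void) {
    for (;;) {
        int i = hand;
        hand = (hand + 1) % FRAMES;
        if (use[i]) { use[i] = 0; chance[i] = 0; continue; } /* recently used */
        if (++chance[i] < LIMIT) continue;       /* not old enough yet */
        if (!dirty[i]) return i;                 /* prefer clean victims: no write */
        if (chance[i] >= 2 * LIMIT) return i;    /* take a dirty page eventually */
    }
}

int main(void) {
    use[3] = 1;    /* page 3 was recently referenced */
    dirty[0] = 1;  /* evicting page 0 would need a disk write */
    printf("evict page %d\n", clock_evict_ext()); /* page 1: clean and old */
    return 0;
}
```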
What if no Hardware Support?
What can the OS do if hardware does not have use bit (or dirty bit)?
Can the OS “emulate” these bits?
Leading question:
How can the OS get control (i.e., generate a trap) every time the use bit should be set, i.e., whenever a page is accessed?
One approach: mark the page inaccessible in its PTE so the next access traps; in the handler, record the access in a software "use" bit, restore access, and resume.
Prior Use and Page Faults
Question: If a page is referenced/accessed by a process, and it has been accessed previously, can the reference cause a page fault?
We can ask the question a different way: Will any page which has been previously referenced by the process be in memory?
Answer
The answer is: Not necessarily.
If the page has been replaced since the last time it was referenced, it will not be in memory.
In that case, a page fault will occur, and the page will have to be read back into memory.
Conclusions
Illusion of virtual memory:
Processes can run when sum of virtual address spaces > amount of physical memory
Mechanism:
Extend page table entry with “present” bit
OS handles page faults (or page misses) by reading in desired page from disk
Policy:
Page selection – demand paging, prefetching, hints
Page replacement – OPT, FIFO, LRU, others
Implementations (clock) perform approximation of LRU