Virtual Memory
See chapter 9 in [SGG 9/E]
1. Preliminaries
2. The Principle of Locality
3. Revisiting PMMUs
4. Handling page faults
5. VM management policies
6. Page-replacement algorithms
7. Allocation and load control policies
8. Improving program behaviour
9. Other design issues
1. Preliminaries
VM systems allow the total logical space of all running processes to exceed the size of the available physical memory.
As such, VM is essential in
o running very large processes, and
o increasing the multiprogramming degree of an OS.
We’ll study paged VM systems, where the behaviour of a process is analyzed by monitoring its address reference string.
Observation #1: a process rarely needs its entire address space at once; error-handling code, over-sized arrays, and rarely used options may never be referenced in a given run.
Observation #2: it is possible to run a process efficiently with only some fraction of its pages in memory at any given time.
2. Program Behaviour: The Principle of Locality
“During any interval of execution, a program tends to favour subsets of its pages; these subsets predominate in the reference pattern in that they appear more often than other subsets and change membership slowly.”
Typical examples that cause this behaviour:
o Sequential loops: e.g., initializing a matrix to zero
o Access of data structures: e.g., arrays, linked lists, trees, etc.
o Solving one part of a multipart problem: e.g., lexical analysis, code generation, etc. in compilers
In fact, one can distinguish the following types of localities:
o Temporal locality: the tendency to reference in the near future the pages referenced in the recent past
o Spatial locality: the tendency to make references to a portion of the address space in the neighbourhood of the last reference
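A small C illustration (not from the slides): summing a matrix row by row visits consecutive addresses and exhibits strong spatial locality, while summing it column by column jumps far apart on every access and typically touches a different page each time.

    #include <stdio.h>

    #define N 1024
    static double m[N][N];          /* rows are contiguous in memory (row-major) */

    int main(void) {
        double sum = 0.0;

        /* Good spatial locality: consecutive references fall on the same page. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i][j];

        /* Poor spatial locality: successive references are N*sizeof(double)
           bytes apart, so they usually land on different pages.              */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i][j];

        printf("%f\n", sum);
        return 0;
    }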
3. Revisiting MC68851 PMMU
o The MMU page table is typically a small part of the process’ page table
o Upon receiving a logical address, the MMU searches
• its page table CAM, and
• the remaining part of the page table in main memory;
if the page is not found, a page fault is generated, causing a trap to the OS.
o Recall also the used, modified, and write-protect bits.
Why is it difficult to support VM on the 68000 CPU?
o Consider: MOVEM.L D0-D7,-(A0)
o Suppose D7 through D4 have been stored, and consequently A0 has been decremented by 16; what if a page fault occurs at this stage?
The 68000 can only restart an entire instruction; re-executing the whole MOVEM would start from the already-decremented A0, leading to a wrong result.
Fix: which of the following is feasible/practical?
o Use a compiler that is aware of the CPU limitations.
o Design a CPU that can restart from a microinstruction.
o Design a CPU that can test instruction execution before an actual execution.
o Design a CPU that uses temporary registers to hold the values of overwritten locations. If there is a page fault, restore the original values before initiating a trap.
4. Handling Page Faults
Typically, the OS performs the following steps:
o block the current process
o find an empty memory frame
o search the swap area for the missing page
o perform i/o to read the page
o update the page table and mark the page as valid
o restart the process
Impact on Performance:
o Assume:
• memory access time ≈ 100 nanoseconds
• the disk overhead is ≈ 25 milliseconds (e.g., 8 msec average rotational latency + 15 msec average seek time + 2 msec transfer time)
o Let p = page-fault probability; then
Effective access time = (1 − p) × (100 nanoseconds) + p × (25 msec)
o So, missing once every 1000 accesses (i.e., p = 10^-3) causes a slowdown by a factor of about 250!
• Certainly an issue for the user
• Can also be an issue for the overall system throughput
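The same arithmetic as a small C calculation (the 100 ns and 25 ms figures are the assumed values above):

    #include <stdio.h>

    int main(void) {
        double mem_ns   = 100.0;     /* memory access time (ns), assumed value   */
        double fault_ns = 25e6;      /* page-fault service time: 25 ms, in ns    */
        double p        = 1e-3;      /* page-fault probability                   */

        double eat = (1.0 - p) * mem_ns + p * fault_ns;
        printf("effective access time = %.1f ns (slowdown ~ %.0fx)\n",
               eat, eat / mem_ns);   /* prints ~25099.9 ns, ~251x                */
        return 0;
    }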
5. VM Management Policies
Fetching: When to bring the pages into memory?
o Request paging: let the process identify its needs (not practical)
o Pre-paging: start with a few pages pre-loaded; pre-fetch other (not yet referenced) pages as you go
o Demand paging: start with no pages, bring pages as needed
Placement: Where in main memory should the fetched pages be placed?
o This issue arises in segmentation (not paging)
Replacement: What page should be removed from memory?
Allocation/Load Control:
o How much physical memory to allocate to each process?
o How to set the multiprogramming level?
6. Page-Replacement Algorithms
General Remarks
For pure demand paging, the minimum # of page faults incurred by any page-replacement algorithm is the # of distinct pages in the process’ reference string.
Replacing a modified page requires a disk i/o.
Algorithm success criteria:
o Fast and inexpensive
o Minimizes the # of page faults for many applications
o Avoids Belady’s anomaly [’69]: “for some page-replacement algorithms, the page-fault rate may increase as the number of allocated frames increases”
Random Replacement
o Select a random victim, with preference to unmodified pages.
o Easy to implement, yet unpredictable
FIFO Replacement
Store the pages in a queue; append new pages to the end. If required, replace the front (oldest) page.
Pros and cons:
o [+ve] requires no special hardware
o [-ve] requires a dynamic queue data structure
o [-ve] suffers from Belady’s anomaly:
• Example: Consider S = (1,2,3,4,1,2,5,1,2,3,4,5); verify that
– 3 frames → 9 faults
– 4 frames → 10 faults !!
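A small C simulation (illustrative, not from the slides) that counts FIFO faults for the string above; running it with 3 and then 4 frames reproduces the 9-versus-10 anomaly.

    #include <stdio.h>

    /* Count page faults for FIFO replacement on reference string s[0..n-1]. */
    static int fifo_faults(const int *s, int n, int nframes) {
        int frames[16], head = 0, used = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (frames[j] == s[i]) { hit = 1; break; }
            if (hit) continue;
            faults++;
            if (used < nframes)
                frames[used++] = s[i];        /* fill a free frame       */
            else {
                frames[head] = s[i];          /* replace the oldest page */
                head = (head + 1) % nframes;
            }
        }
        return faults;
    }

    int main(void) {
        int s[] = {1,2,3,4,1,2,5,1,2,3,4,5};
        int n = sizeof s / sizeof s[0];
        printf("3 frames: %d faults\n", fifo_faults(s, n, 3));   /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(s, n, 4));   /* 10 */
        return 0;
    }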
OPT (or MIN)
At this point, one may ask:
o Is there a provably optimum strategy?
o If one exists, does it avoid Belady’s anomaly?
The answer is YES to both questions.
An optimum strategy works by replacing the page that will not be used for the longest period of time.
Example: using OPT on S = (1,2,3,4,1,2,5,1,2,3,4,5) with 4 frames → 6 faults.
OPT requires knowledge of the future; hence it is mostly useful as an absolute lower bound.
LRU (Least Recently Used)
LRU uses the recent past as an approximation of the near future.
o Example: using LRU on S = (1,2,3,4,1,2,5,1,2,3,4,5) with 4 frames → 8 faults.
LRU belongs to a class of replacement algorithms called stack algorithms [Mattson et al. ’70] that are not subject to Belady’s anomaly.
Exact Implementation Strategies:
o Use time stamps. Cons:
• [-] clock management overhead
• [-] search for a victim page is time consuming
o Use a stack. Cons:
• [-] stack management overhead
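A user-space sketch of the timestamp approach (illustrative only, on the same reference string): a logical clock is stamped on every reference, and the victim is the frame with the smallest stamp, which is exactly the linear search cost noted above.

    #include <stdio.h>

    #define NFRAMES 4

    int main(void) {
        int s[] = {1,2,3,4,1,2,5,1,2,3,4,5};
        int n = sizeof s / sizeof s[0];
        int page[NFRAMES], stamp[NFRAMES], used = 0, clock = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            int hit = -1;
            for (int j = 0; j < used; j++)
                if (page[j] == s[i]) { hit = j; break; }
            if (hit >= 0) { stamp[hit] = ++clock; continue; }   /* refresh timestamp */
            faults++;
            if (used < NFRAMES) { page[used] = s[i]; stamp[used++] = ++clock; }
            else {
                int victim = 0;                               /* linear search     */
                for (int j = 1; j < used; j++)
                    if (stamp[j] < stamp[victim]) victim = j; /* oldest stamp wins */
                page[victim] = s[i]; stamp[victim] = ++clock;
            }
        }
        printf("LRU, %d frames: %d faults\n", NFRAMES, faults);   /* prints 8 */
        return 0;
    }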
Approximate Implementation Strategies:
o Using the MMU used (or reference) bit:
a) each page access causes the MMU to set the corresponding used bit
b) every n (say, n = 100) msecs the OS clears all used bits
c) pages with zero bits are considered “least recently used”.
o Using history registers:
a) as above, each page access causes the MMU to set the used bit
b) the OS associates a history register with each page
c) every n msecs, the OS updates each register, e.g., by shifting it right one bit and copying the page’s used bit into the most significant bit (then clearing the used bit)
d) pages with small history values are considered “least recently used”.
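A sketch of that aging step in C (assuming 8-bit history registers and that the used bits have already been sampled; the page count and bit values are made up):

    #include <stdint.h>
    #include <stdio.h>

    #define NPAGES 4

    /* One aging step: shift each history register right and insert the used
       bit at the most significant position, then clear the used bits.       */
    static void age(uint8_t hist[], uint8_t used[], int npages) {
        for (int i = 0; i < npages; i++) {
            hist[i] = (uint8_t)((hist[i] >> 1) | (used[i] << 7));
            used[i] = 0;
        }
    }

    int main(void) {
        uint8_t hist[NPAGES] = {0}, used[NPAGES] = {1, 0, 1, 0};
        age(hist, used, NPAGES);
        used[2] = 1;                       /* page 2 referenced again          */
        age(hist, used, NPAGES);
        for (int i = 0; i < NPAGES; i++)   /* smallest value ~ least recently used */
            printf("page %d: history 0x%02x\n", i, hist[i]);
        return 0;
    }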
The Clock Algorithm (2nd Chance)
Yet another approximation of LRU.
o Example: using 2nd Chance on S = (1,2,3,4,1,2,5,1,2,3,4,5) with 4 frames → ?? faults
Basic Ingredients:
a) Initially, a process is allocated some free frames (with zero used bits). A pointer (the clock hand) points to the first frame.
b) Choosing a victim: rotate the hand until either
• a zero used bit is found, or
• the hand completes one revolution;
clear all used bits encountered during the rotation.
c) Replace the page to which the hand is pointing.
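A sketch of victim selection in C (illustrative; not any particular kernel's code, and the frame contents below are made up): the hand advances, clearing used bits, until it reaches a frame whose bit is already zero.

    #include <stdio.h>

    #define NFRAMES 4

    static int page[NFRAMES] = {1, 2, 3, 4};   /* pages currently in the frames */
    static int used[NFRAMES] = {1, 0, 1, 1};   /* used (reference) bits         */
    static int hand = 0;                       /* the clock hand                */

    /* Rotate the hand, clearing used bits, until a zero bit is found.
       After at most one full revolution some bit must be zero.          */
    static int choose_victim(void) {
        while (used[hand]) {
            used[hand] = 0;              /* give the page a second chance        */
            hand = (hand + 1) % NFRAMES;
        }
        int victim = hand;
        hand = (hand + 1) % NFRAMES;     /* the next search starts past the victim */
        return victim;
    }

    int main(void) {
        int v = choose_victim();
        printf("replace page %d in frame %d\n", page[v], v);   /* page 2, frame 1 */
        return 0;
    }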
o What does it mean if the hand is moving very slowly? Very fast?
An Enhanced Implementation (Mac OS?)
a) The OS clears the used bits from time to time to figure out how often pages are used.
b) Classify each page according to its (used bit, modify bit) pair:
• (0,0) neither recently used, nor modified
• (0,1) not recently used, but modified
• (1,0) recently used, but clean
• (1,1) recently used and modified
c) Choosing a victim: rotate the hand to the first page encountered in the lowest nonempty class.
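A hedged sketch of the classification step (for simplicity it scans all frames from index 0 rather than rotating a hand; the bit values are made up): each frame's (used, modified) pair is encoded as a two-bit class and a frame in the lowest class is chosen.

    #include <stdio.h>

    #define NFRAMES 4

    int main(void) {
        int used[NFRAMES]     = {1, 0, 1, 0};
        int modified[NFRAMES] = {1, 1, 0, 0};

        /* class 0 = (0,0), 1 = (0,1), 2 = (1,0), 3 = (1,1) */
        int victim = -1, best = 4;
        for (int i = 0; i < NFRAMES; i++) {
            int cls = (used[i] << 1) | modified[i];
            if (cls < best) { best = cls; victim = i; }
        }
        printf("victim frame %d, class (%d,%d)\n",
               victim, used[victim], modified[victim]);   /* frame 3, class (0,0) */
        return 0;
    }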
Global versus Local Replacement Policies
Global – select a victim among all processes
Pros and cons:
o [+] may result in better throughput
o [-] the page fault rate of a process is affected by the behaviour of other processes
Local – select a victim among the pages of the faulting process
Unix 4BSD uses a variant of the Clock algorithm as a global replacement policy.
7. Allocation and Load Control Policies
A basic strategy to increase CPU utilization is to increase the Multiprogramming Level (MPL), i.e., the number of processes using the memory.
A system-wide issue: How to allocate free frames to processes?
Side remarks:
o BSD Unix keeps a list of free frames;
o page faults are serviced by a special function pagein()
o if the free list is empty, pagein() blocks
o a page daemon remains dormant until the number of free frames drops below a certain threshold
o when the daemon discovers a shortage, it executes a variant of the clock algorithm
To start a discussion, let’s consider the following issues:
o local replacement policies encourage static allocation
o global replacement policies encourage adaptive allocation
o What is the minimum # of frames required by a process?
• Architecture dependent
• Note: empirical results show an initial heavy paging activity by new processes. Moreover, a small number of instructions are executed within a page before control moves to another page.
In static allocation:
o How about using equal allocation among processes?
• [-ve] different processes have different sizes
o How about using proportional allocation (frames in proportion to process size)?
• [-ve] well, it is not the total size of a process that determines its memory requirement; rather it is the size of its locality set (also called its working set).
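For reference, proportional allocation would give process i roughly (s_i / Σ s_j) · m frames; a small C sketch with made-up process sizes and frame count:

    #include <stdio.h>

    int main(void) {
        int size[] = {10, 127};          /* virtual sizes in pages (assumed values) */
        int nproc  = 2, m = 62;          /* m = total free frames (assumed)         */

        int total = 0;
        for (int i = 0; i < nproc; i++) total += size[i];

        for (int i = 0; i < nproc; i++)  /* each process gets a share of m          */
            printf("process %d: %d frames\n", i, size[i] * m / total);
        return 0;                        /* prints 4 and 57 frames                  */
    }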
So, we need an adaptive allocation algorithm that deals with working-sets that change dynamically, right?
What if a process is not allocated “enough memory”?
o It will page fault quickly, and bounce back and forth between the ready queue and the swap device queue, without significant execution (called thrashing).
o How does thrashing differ from frequent (normal) file i/o?
o Does thrashing of a single process affect the entire system?
Yet another system-wide issue:
How do we decide when to increase/decrease the MPL?
How “not” to do it: avoid the global replacement vicious circle:
a) the OS detects low CPU utilization, and increases the MPL
b) processes steal pages from each other
c) page faults increase; the swap device queue gets longer
d) CPU utilization decreases, and the OS goes back to the first step!
Next, we discuss some adaptive allocation strategies
A Quick Recap
A Working Set Based Strategy [Denning ’69]
Analyzing working sets to understand program behaviour:
o Provides a reasonable starting point
o Came after initial experiments with FIFO and LRU
o Predates the development of some advanced replacement algorithms (e.g., the Clock algorithm)
o Led to the Principle of Locality
Approximating working sets using the immediate past: define the working set W(t, Δ) to be the set of pages referenced during the interval (t − Δ, t).
o Example (for a sample reference string): W(t, Δ=6) = {1,2}
A possible strategy:
o Select a suitable history interval Δ
o Monitor the working set of every process, and allocate just enough frames to accommodate its working set
o Don’t victimize a page if it is part of the working set of some process
o A process should be run iff its working set is in memory
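A sketch of computing W(t, Δ) offline in C (it reuses the reference string from the earlier examples; the choice of t and Δ is arbitrary): the working set at time t is just the set of distinct pages among the last Δ references.

    #include <stdio.h>

    /* Print W(t, delta): distinct pages among the last delta references before t. */
    static void working_set(const int *s, int t, int delta) {
        int seen[64] = {0};
        int start = (t - delta > 0) ? t - delta : 0;
        printf("W(t=%d, delta=%d) = {", t, delta);
        for (int i = start; i < t; i++) {
            if (!seen[s[i]]) {
                seen[s[i]] = 1;
                printf(" %d", s[i]);
            }
        }
        printf(" }\n");
    }

    int main(void) {
        int s[] = {1,2,3,4,1,2,5,1,2,3,4,5};    /* assumed reference string        */
        working_set(s, 12, 6);                  /* pages seen in the last 6 refs   */
        return 0;
    }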
How does the WS algorithm compare with LRU?
Does the WS algorithm satisfy Mattson et al.’s stack property?
Problems:
o How to choose a suitable Δ?
o Exact computation of W(t,Δ) is very difficult.
o Approximating W(t,Δ) is not the best we can do.
Page-Fault Frequency [Chu & Opderbeck ’72]
Determining the working set is an “over-achievement” on the OS’s part; we only need to know how many pages are “enough”.
On page fault, the PFF algorithm examines two quantities:
o Tcritical ≡ the critical inter-page-fault time (set by the OS),
o T ≡ the measured average inter-page-fault time (as monitored by the OS using a clock over some time interval)
Increase decisions: if T ≤ Tcritical, bring the required page into memory and do not replace any resident page.
Decrease decisions: if T > Tcritical, the OS may free pages that have not been referenced since the last page fault.
The OS may also choose two critical threshold values, and try to keep the inter-page-fault time between the two bounds.
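A minimal sketch of that decision rule (Tcritical and the measured times below are assumed values; a real system would read them from a clock):

    #include <stdio.h>

    /* Page-Fault Frequency: grow the allocation when faults come too fast,
       shrink it when they are rare.  Times are in milliseconds (assumed).   */
    static void pff_decision(double T, double T_critical) {
        if (T <= T_critical)
            printf("T=%.1f <= %.1f: add a frame, replace nothing\n", T, T_critical);
        else
            printf("T=%.1f  > %.1f: may free frames unreferenced since last fault\n",
                   T, T_critical);
    }

    int main(void) {
        pff_decision(5.0, 20.0);    /* faulting frequently -> increase allocation */
        pff_decision(80.0, 20.0);   /* faulting rarely     -> decrease allocation */
        return 0;
    }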
Case Study: 4.3BSD Page Replacement and Load Control
A Simplified Description:
o 4 times per second the system checks the memory needs (e.g., the size of the freelist).
o If the freelist is short (e.g., < 64K bytes in a 512K-byte memory), the pagedaemon is awakened.
o The pagedaemon determines the number of frames that should be added to the freelist; it then calls a variant of the clock algorithm to mark the pages to be replaced (and then calls pageout()).
The Swapper:
o Swaps an entire process out of main memory: that is, the process’ page tables, data and stack segments, the user structure, and the text segment
o Swapping is done if:
• the system becomes so short of memory that the pagedaemon cannot free memory fast enough
• a process page table could not be allocated contiguously
• a process is inactive for a long time (e.g., 20 seconds)
o Criteria for swapping out: process size and residency in main memory
8. Improving Program Behaviour: Program Restructuring
A User’s Perspective: assume 1 KByte pages. How many frames do we need to execute each of the following:
o "for (i=1; i <= 1000; i++) a[i] = b[i] = 0;" ?
o "for (i=1; i <= 1000; i++) a[i] = 0;  for (i=1; i <= 1000; i++) b[i] = 0;" ?
A Clustering Approach:
o Divide the program into blocks; execute the program to collect information on the dynamic referencing of blocks
o Example: block reference string "1 3 1 3 4 6 1 2 4 6 2 5 2 5 4 6"
o Build a weighted graph where (informally) a higher weight wij indicates more desirability to put blocks i and j in the same page
o Run a clustering algorithm to group blocks into pages
o The clustering optimization problem is NP-complete
o Selected blocks may not fit in the same page
9. Other Design Issues
Choice of Page Size.
o "Small" pages give less internal fragmentation, and better memory utilization, but result in large page tables
o Memory technology trends: memories are getting larger and cheaper
I/O Interlock.
o Avoid the following scenario:
• Process A issues a disk read request; the OS schedules a DMA transfer into one of A’s pages, and blocks A
• The OS reallocates the page to process B (e.g., as in global replacement algorithms); the DMA overwrites one of B’s pages
o Possible solutions:
• Use a lock bit to lock (pin) pages in memory
• Never execute I/O directly to user memory
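On systems that expose page pinning to user code, the lock-bit idea corresponds to calls such as POSIX mlock()/munlock(); a minimal sketch (the buffer size and usage are illustrative only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 4096;
        char *buf = malloc(len);
        if (buf == NULL) return 1;

        /* Pin the buffer's pages in physical memory so the pager
           cannot reclaim them while a device transfer is in progress. */
        if (mlock(buf, len) != 0) { perror("mlock"); return 1; }

        /* ... start I/O into buf, wait for completion ... */

        munlock(buf, len);   /* allow the pages to be paged out again */
        free(buf);
        return 0;
    }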
o If the freelist is short (e.g, < 64K Bytes in 512K memory), the pagedaemon is awakened. o The pagedaemon determines the number of frames that should be added to the freelist; it then calls a variant of the clock algorithm to mark the pages to be replaced (and then calls pageout()). CMPUT 379 (E.S. Elmallah) The Swapper: o Swaps an entire process from main memory: that is, the process’ page tables, data and stack segments, the user structure, and the text segment o Swapping is done if: • the system becomes short of memory that the pagedaemon cannot free memory fast enough • a process page table could not be allocated contiguously • a process is inactive for a long time (e.g. 20 seconds) o Criteria for swapping out: process size and residency in main memory CMPUT 379 (E.S. Elmallah) 8. Improving Program Behaviour: Program Restructuring A User’s Perspective: assume 1K Byte pages. How many frames do we need to execute each of the following: o "for (i=1, i <= 1000, i++) o "for (i=1, i <= 1000, i++) for (i=1, i <= 1000, i++) a[i]= b[i]= 0;" ? a[i]= 0; b[i]= 0;" ? CMPUT 379 (E.S. Elmallah) A Clustering Approach: o Divide the program into blocks; execute the program to collect information on the dynamic referencing of blocks o Example: block reference string “1 3 1 3 4 6 1 2 4 6 2 5 2 5 4 6” CMPUT 379 (E.S. Elmallah) o build a weighted graph where (informally) higher wij more desirability to put blocks i and j in the same page o run a clustering algorithm to group blocks into pages o the clustering optimization problem is NP-complete o selected blocks may not fit in the same page CMPUT 379 (E.S. Elmallah) 9. Other Design Issues Choice of Page Size. o “small” pages give less internal fragmentation, and better memory utilization, but result in large page tables o Memory technology trends: getting larger and cheaper CMPUT 379 (E.S. Elmallah) I/O Interlock. o Avoid the following scenario: • Process issues a disk read request; the OS schedules a DMA transfer to one of A’s pages, and blocks A • the OS reallocates the page to process B (e.g., as in global replacement algorithms); the DMA overwrites one of B’s pages o Possible solutions: • Use a lock bit to lock pages in memory • Never execute I/O to user memory CMPUT 379 (E.S. Elmallah) 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com