WSCLoca – A Simple and Effective Algorithm for Virtual Memory Management
Richard W. Cart 1 Department of Computer Science Stanford University
A new virtual memory management algorithm WSCLOCKhas been synthesized from the local working set (WS) algorithm, the global CLOCK algorithm, and a new load control mechanism for auxiliary memory access. The new algorithm combines the most useful feature of WS-a natural and efti:ctive load control that prevents thrashing-with the simplicity and efficiency of CLOCK. Studies are presented to show that the performance of WS and WSCLOCKare equivalent, even if the savings in overhead are ignored.
Introduction
Copyright By PowCoder代写 加微信 powcoder
Modern memory management policies optimize performance by varying the space allocated to each task as its perceived need changes. Such policies also vary the load (i.e., the number of active tasks) to achieve high levels of multiprogramming while avoiding thrashing. Modern va,’iable-space, variable-load memory management policies have been divided into local policies and globalptflicies. Ideally. a local policy estimates the memory needs, or locality, of each task independently of other tasks and allocates sufficient main memory to hold the each active task’s locality. A global policy correlates a task’s memory allocation with its locality, but makes no explicit, independent measure of the locality, and does not necessarily allocate sufficient main memory for each active task’s locality.
This work was supported by the Departanent of Energy, Contract DE-AC03-76SF00515.
IAuthor’s current address: Tandem Computers Inc., 19333 Vallco Parkway, Cupertino, CA 95014
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
. Systems Laboratory Stanford University
Local policies are typified by the working set (WS) policy which was first defined by Denning [DI~N~68] and has been the object of much study [DENN72, RODR73, PRIE73, SMIT76, MARS79]. Global policies are typified by the global least-recently-used (LRU) approximation algorithm CLOCKthat is used in MULTICS; studies of CLOCK have appeared infrequently [CORB68, EAST76]. Although a local policy, such as WS, isolates tasks from each other and may be better at preventing thrashing, a global policy is often used in real systems because it is simpler to implement and has less computational overhead.
This paper presents a new policy WSCLOCK that combines the operational advantages of WS with the simplicity and efficiency of CLOCK. We describe the data structures, replacement algorithm and load control used to implement WSCLOCK. We introduce a simple new Loading Task~Running Task (LT/RT) load control mechanism to control competiton for access to auxiliary memory; although LT/RT is a general control for all memory management policies, it is shown to be’particularly appropriate for WSCLOCK. Finally, we describe the use of a realistic simulation model to demonstrate the effectiveness of both the LT/RT control and of WSCLOCK.
Our primary focus in this paper is virtual memory management methods in interactive systems. Interactive systems are characterized by large numbers of tasks which make numerous small processing requests; in such systems, the amount of memory used by all tasks can be many times the size of main memory and, thus, is where one finds the greatest advantage of virtual memory. Compared to batch systems, interactive systems activate and deactivate tasks very frequently and perform the basic memory management functions more often; these systems benefit the most from algorithmic simplicity and efficiency. Although the methods presented in this paper may be most effective in interactive systems, they do not appear to have any disadvantages when used in batch systems.
© 1981 ACM 0-89791-062-1-12/81-0087 $00.75
General Model of a Virtual Memory Computer System
This section is a brief summary of a virtual memory computer system model presented in [CARR81]. The model is designed specifically to compare scheduling, memory management, and load control policies on conventional large-scale computers. The model incorporates both an accurate representation of program behavior based on measured program reference strings, and a general, but detailed, model of a virtual memory operating system.
Task Model
A task is modeled by a virtual address space P={ Pi I i=l,2,…,m} of pages and a reference string {rt I t= 1,2,…,T}. Each reference rt is an ordered pair (p,d), where p E P and d is Boolean variable which is true if the reference changes or dirties the page. The virtual time VT of a task is the number of references that have been completed for that task. Tasks make I / 0 requests at stochastically- distributed intervals of virtual time.
Configuration Model
The computer system configuration model contains (1) a central processor, (2) a main memory of M page frames, and (3) a collection of 1/0 devices. The central processor is implicity defined as capable of executing one reference for one task in each unit of virtual time. Associated with each frame are the use-bit, set when the frame is referenced, and the dirty-bit, set by each dirty reference. I/O devices are modeled as simple servers with independent and indentically distributed service times.
One or more of the l/O devices is designated as an auxiliary memory, which contains a copy of every task page. Although the model permits both task and paging I/O requests to access the same device, the studies presented in this paper assume that paging devices are separated from task I/O devices.
Operating System Model
The operating system model has three main components: the scheduler, the memory manager, and the load control.
Each task occupies one of two scheduler queues (the ready queue and the active queue) or is dormant. The scheduler orders the ready queue and assigns a time-slice to each ready task in order to balance objectives of response time, processor ntilization, throughput and externally specified priorities. The model uses a multi-level load-balancing queue, described in [CARR81], which can be parameterized to operate as any of the simpler queueing disciplines, such as first-in-first-out or round-robin. Tasks in the active queue arc selected for execution in round-robin order for short time quanta to approximate a processor sharing discipline. A task remains in the active queue until (1) the time-slice is exhausted, (2) the task is deactivated by the load control, or (3) the task completes or becomes dormant.
The memory manager allocates main memory frames to each task, and requests paging I/O operations to copy pages between main and auxiliary memory. At any given time, each task address space P is partitioned into a resident set R, of pages which occupy main memory frames, and a missing set R’. (R’ denotes the complement of the set R.) If the processor executes a reference (p,d) and p l[ R, a page fault blocks the task until the missing page is made resident by copying it from auxiliary memory. The model assumes demand- paging: a page p is loaded only after a page fault occurs for p. A p E R is clean whenever it is copied from main memory to auxiliary memory, or vice-versa; p is dirty following any reference which sets the frame dirty-bit.”
To achieve optimal performance in a virtual memory computer, the operating system seeks to maximize the number of active tasks without inducing thrashing. Thrashing occurs when so many tasks are active that the sum of their memory needs exceeds the size of main memory and; memory becomes overcommitted. The load control monitors the commitment of main memory (either directly or indirectly) and when memory appears to be undcrcommitted, load control may move tasks from the ready queue to the active queue; when memory appears to be overcommitted, the load control moves tasks from the active queue to the ready queue.
Working Set Policy
The reader should be familiar with the basic concepts of the WS policy (see [Dt~NN68] or [DENN70]). We limit our discussion to the relevant details of its implementation.
Working Set Determination
The WS policy defines the working set W to be all p C P referenced in the previous 0 units of the task’s virtual time. Implementation of WS load control and replacement requires some timely mechnism to determine each task’s W. If a task’s page table contains only p E W, then the arrival of a new p (~ W is signalled by a page fault. To detect the departure of a page from W is more difficult. Typically, we require (1) a task’s virtual time VT, (2) the last reference time LR(p) for each page, and (3) a procedure to detect pages for which VT-LR(p) > 0.
Task VT is easily obtained by summing the time intervals that the task has executed. Pure WS assumes some mechanism that can update LR(p) automatically in parallel with program exeqution, but practical implementations use either the page frame use-bit or special hardware to approximate LR(p). To use the frame use-bit, each p (~ R of a given task is examined by software at various times (e.g., at faults or at fixed intervals). The use-bit is tested and cleared, and if the page was recently referenced then LR(p) is set to the task’s current VT. To detect p • W and for which VT-LR(p) > 0impliesaWS-scanofeachpEPtofindeachpEW. (To
scan only p £ W for a given task would require the maintenance of an additional data structure.) The WS-scan can incorporate the use-bit test to approximate LR(p) with little extra cost.
With the hardware support for WS on the Maniac II (see [MORR72]) each page frame has an associated counter that approximates the virtual-time-since-last-reference VT- LR(p) directly. The counter is cleared whenever the page is referenced and the processor automatically increments the counter of each page of the task every .25 msec. that the task executes. Since the Maniac II has only 64 page frames, the processor time is minimal, but with the larger memories on modern machines, the time required to update thousands of counters might be excessive. The Maniac II implementation still requires a WS-scan to remove the pages for which VT-LR(p) > 0. In essence, the Maniac It scheme is equivalent to the ordinary use-bit method, except that the scanning is scheduled and performed without the overhead associated with a system interrupt.
Load Control and Replacement Algorithm
When a task is active, W = [W[ frames are committed to the task. The total memory commitment Wactiveis the sum of the W of all active tasks. The WS load control will activate a ready task unless the W of the first ready task exceeds M-Wactive.
A set A of available frames is replenished whenever a task is deactivated or when a WS-scan removes resident pages from an active ,task’s W. The replacement algorithm simply chooses some frame in A. If A is empty, then the load control selects a task to deactivate and that task’s resident pages are placed in A.
Page Writing and Reclamation
When a dirty page is placed in A, it must be cleaned before it is replaced. A simple approach is to couple the writing of a dirty page and the reading of the page that replaces it; this method blocks a faulting task for the time of two paging I/Os instead of one. Another approach is to replace only the clean pages in A and to clean the dirty pages in A when there are no outstanding page read requests for a device: if all pages in A are dirty, then cleaning operations will naturally have precedence over page reads.
Although a page in A is eligible for replacement, it may not be replaced for some time. If a task references a page in A, the system can avoid the delay and a page-in I/O operation if it can reclaim the page in A. This requires a special procedure to search A each time a page fault occurs.
CLOCK Policy
CLOCK is a simple approximation of the global LRU replacement algorithm [Cold~68]. All main memory page frames are ordered in a fixed circular list as illustrated in Figure 1. A pointer or “hand” always points to the last frame replaced. When a frame is needed to hold a missing page, the pointer is advanced “clockwise”,
“‘~ LastFrame
_L-‘=-‘:’—~’ I
]Reaaced ]NotReplaceable
IAdvanceCLOCKL Pointer I”
~TestandClear
L Use-Bit ] ¢
[__L_~N°tReplaceable Replaceable
SchedulePage ForCleaning I
Figure 1. CLOCK Replacement Algorithm
scanning frames in circular order. The use-bit is tested and cleared: if the ,bit was set, the frame is recently-used and is not replaced; if the bit was clear, the frame is not-recently-used and is replaceable if the page is clean. If a replaceable page is dirty, then it is scheduled for cleaning and is not replaced. When the CLOCK-scan locates a clean and not-recently-used page, the algorithm halts, leaving the pointer pointing to the chosen frame to mark the starting point for the next scan. Note that a page is never removed from R until it is actually replaced.
Global policies, such as CLOCK, allow all active tasks to compete for main memory allocation. There is no mechanism to determine a task’s memory needs independently of the other tasks. Thus, instead of a load control based on explicit estimates of the main memory committment, global policies typically require an adaptive feedback control mechanism. For example, the control may monitor the page fault rate (or auxiliary memory traffic) and adjust the multiprograrnming level if the rate is too high or too low.
ComparingWS and CLOCK
The WS and CLOCK policies can be compared at many levels. At the implementation level, WS appears to be more complex than CLOCK. In particular, WS requires:
(1) scheduling and executing the WS-scan procedure,
> ReplacePageJ
(2) memory to store LR(p) for every page of every task, and (3) algorithms to maintain the set A of available pages,
including a method to reclaim pages from A.
The need to store LR(p) effectively doubles the size of the page tables. In a system where the total of all task virtual memory is many times the size of main memory, minimization of page tables is an important consideration.
The implementation of the CLOCKreplacement algorithm is simpler and consumes less memory, but CLOCK requires an adaptive feedback load control mechanism that is heuristic and, compared to the WS load control, may be more difficult to tune.
At the policy level, WS appears to have the advantages of good task isolation, and a predictive load control. Under a global policy, a task’s resident set depends on how actively it references its current locality relative to the other active tasks. Mathematical models of global replacement (see [SMrrS0]) show that some tasks can monopolize main memory and force other tasks to execute slowly and inefficiently; global algorithms can also lead tb thrashing [I)F,NN70]. WS isolates tasks from each other and guarantees each active task can acquire sufficient main memory to hold its working set.
At the performance evaluation level, no conclusive comparisons of WS and CLOCK have been performed, Analytical models are too weak to characterize the. differences between local and global
memory management policies in general, or the WS and CLOCK policies in particular. Empirical and simulation studies have not addressed the problem with sufficient completeness. This work makes no claim that either WS or CLOCK is more effective than the other; it is entirely likely that a well-implemented version of either policy will have approximately the same performance if the overhead of computing the policy is eliminated. This widely-held conjecture is supported by studies in [CARR81]. The purpose of dais paper is to present a policy which is as effective as both WS and CLOCK and avoids many of the implementation difficulties of
The WSCLOCK policy combines the best features of WS and CLOCK. It retains the thrashing-preventative load control and task isolation properties of WS, but it eliminates:
(1) the WS-scan,
(2) the space for LR(p) for each task page, (3) the available frame set A, and
(4) the page reclamation procedure.
The WSCLOCK replacement algorithm uses the simple mechanism found in CI.OCKbut does not require an adaptive feedback load control. WSCLOCK is simpler than either WS or Ct,OCK.
Data Slruclures
Main memory frames are arranged in a fixed circular CLOCK-like list. The CLOCK pointer identifies the last frame replaced in the previous CLOCK-SCan. Instead of an LR(p) for all p E P, LR(p) is defined only for the resident pages, p E R, in a storage cell associated with each page frame.
When a page fault occurs, a page read request is placed on a paging queue. When an auxiliary memory device is available, a request for that device is removed and processed; at that time, the replacement algorithm is invoked to obtain a frame containing a clean replaceable page to hold the incoming page.
Replacement Algorithm
The WSCI,OCK replacement algorithm uses the CLOCK scanning method to apply the WS replacement rule as shown in Figure 2.
To examine a frame, WSCLOCK tests and clears the frame use-bit. If the bit was set, I,R(p) is set to the owning task’s I/T. Otherwise, if KT- LR(p) >_8 then the page is renloved from W. A page p is replaceable if either (1) p ¢ W , or (2) the owning task is not active. If a replaceable page is dirty, then it is scheduled for cleaning and not replaced. When WSCLOCK finds a clean replaceable page, it halts, and leaves the pointer at the chosen frame.
I AdvanceCLOCK~: Pointer i
TestandClear Use-Bit I
I SchedulePage I ForCleaning
I Yes I No @e,
Figure 2. WSCLOCK Replacement Algorithm
WSCLoCK eliminates the available page set A because it simply searches for and finds a replaceable page when one is needed. Page reclamation is eliminated because a page is not removed from a task’s R until it is selected for replacement. Pages to be cleaned are placed on the paging queue with the read requests. The queue can be ordered by time of request (FIFO) or by placing reads before writes.
In general, it is unnecessary to remove a page being cleaned from R. The dirty-bit is cleared when the I/O to write the page is initiated; if the page is updated subsequently (even during the I/O) the dirty-bit is reset and the page will be cleaned again before it is replaced. After a page is cleaned, it will be replaced on the next circuit of the CI.OCK pointer unless it is referenced (and reenters W). Alternatively, a list of recently-cleaned pages can be maintained; the replacement algorithm takes a page from this list in preference to performing the CLOCK-SCan.
Load Control
Since WSCLOCK scans only p E R, it does not approximate W if W contains some p ~ R. WSCLOCKapproximates only the resident working set RW = R I”1 W. WSCLOCKload control uses the same rules as WS load control, but using RW = IRWI as the memory commitment of each task. RWactive is defined to be the sum of the RW of the active tasks. When a task is deactivated, all of its pages are eligible for replacement and, thus, RW of a ready task is the value of RW when that task was deactivated.
WSCLOCK detects overcommitment when the CLOCK-scan fails to find a clean replaceable page in a full circuit of the frames. If there are any page cleaning requests on the paging queue, these are processed to produce a clean replaceable page. Otherwise, there are no replaceable pages, either clean or dirty, and some task must be deactivated to relieve overcommitment.
LT/RT Control
Introduction
When a task is activated, it usually enters a loading phase in which it has few p E R and must load missing pages to execute efficiently. When the task has loaded a sufficient resident set, it enters a running phas
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com