Lottery Scheduling: Flexible Proportional-Share Resource Management
Carl A. Waldspurger . Weihl MIT Laboratory for Computer Science
This paper presents lottery scheduling, a novel randomized resource allocation mechanism. Lottery scheduling provides ef- ficient, responsive control over the relative execution rates of computations. Such control is beyond the capabilities of conven- tional schedulers, and is desirable in systems that service requests of varying importance, such as databases, media-based applica- tions, and networks. Lottery scheduling also supports modular resource management by enabling concurrent modules to insulate their resource allocation policies from one another. A currency ab- straction is introduced to flexibly name, share, and protect resource rights. We also show that lottery scheduling can be generalized to manage many diverse resources, such as I/O bandwidth, mem- ory, and access to locks. We have implemented a prototype lottery scheduler for the Mach 3.0 microkernel, and found that it provides flexible and responsive control over the relative execution rates of a wide range of applications. The overhead imposed by our unoptimized prototype is comparable to that of the standard Mach timesharing policy.
1 Introduction
Copyright By PowCoder代写 加微信 powcoder
Scheduling computations in multithreaded systems is a complex, challenging problem. Scarce resources must be multiplexed to service requests of varying importance, and the policy chosen to manage this multiplexing can have an enormous impact on throughput and response time. Accu- rate control over the quality of service provided to users and applications requires support for specifying relative computation rates. Such control is desirable across a wide spectrum of systems. For long-running computations such as scientific applications and simulations, the consumption of computing resources that are shared among users and ap- plications of varying importance must be regulated [Hel93]. For interactive computations such as databases and media- based applications, programmers and users need the ability
E-mail: fcarl, World Wide Web: http://www.psg.lcs.mit.edu/. The first author was supported in part by an AT&T USL Fellowship and by a grant from the MIT X Con- sortium. Prof. Weihl is currently supported by DEC while on sabbatical at DEC SRC. This research was also supported by ARPA under contract N00014-94-1-0985, by grants from AT&T and IBM, and by an equipment grant from DEC. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.
to rapidly focus available resources on tasks that are cur- rently important [Dui90].
Few general-purpose schemes even come close to sup- porting flexible, responsive control over service rates. Those that do exist generally rely upon a simple notion of priority that does not provide the encapsulation and modu- larity properties required for the engineering of large soft- ware systems. In fact, with the exception of hard real-time systems, it has been observed that the assignment of pri- orities and dynamic priority adjustment schemes are often ad-hoc [Dei90]. Even popular priority-based schemes for CPU allocation such as decay-usage scheduling are poorly understood, despite the fact that they are employed by nu- merous operating systems, including Unix [Hel93].
Existing fair share schedulers [Hen84, Kay88] and mi- croeconomic schedulers [Fer88, Wal92] successfully ad- dress some of the problems with absolute priority schemes. However, the assumptions and overheads associated with these systems limit them to relatively coarse control over long-running computations. Interactive systems require rapid, dynamic control over scheduling at a time scale of milliseconds to seconds.
We have developed lottery scheduling, a novel ran- domized mechanism that provides responsive control over the relative execution rates of computations. Lottery scheduling efficiently implements proportional-share re- source management — the resource consumption rates of active computations are proportional to the relative shares that they are allocated. Lottery scheduling also provides excellent support for modular resource management. We have developed a prototype lottery scheduler for the Mach 3.0 microkernel, and found that it provides efficient, flexible control over the relative execution rates of compute-bound tasks, video-based applications, and client-server interac- tions. This level of control is not possible with current operating systems, in which adjusting scheduling parame- ters to achieve specific results is at best a black art.
Lottery scheduling can be generalized to manage many diverse resources, such as I/O bandwidth, memory, and access to locks. We have developed a prototype lottery- scheduled mutex implementation, and found that it provides flexible control over mutex acquisition rates. A variant of lottery scheduling can also be used to efficiently manage space-shared resources such as memory.
Cambridge, MA 02139 USA
In the next section, we describe the basic lottery schedul- ing mechanism. Section 3 discusses techniques for modular resource management based on lottery scheduling. Imple- mentation issues and a description of our prototype are presented in Section 4. Section 5 discusses the results of several quantitative experiments. Generalizations of the lottery scheduling approach are explored in Section 6. In Section 7, we examine related work. Finally, we summarize our conclusions in Section 8.
The number of lotteries required for a client’s first win
has a geometric distribution. The expected number of
lotteries n that a client must wait before its first win is
E n 1p, with variance 2 1 pp2. Thus, a n
client’s average response time is inversely proportional to its ticket allocation. The properties of both binomial and geometric distributions are well-understood [Tri82].
With a scheduling quantum of 10 milliseconds (100 lot- teries per second), reasonable fairness can be achieved over subsecond time intervals. As computation speeds continue to increase, shorter time quanta can be used to further im- prove accuracy while maintaining a fixed proportion of scheduler overhead.
Since any client with a non-zero number of tickets will eventually win a lottery, the conventional problem of star- vation does not exist. The lottery mechanism also operates fairly when the number of clients or tickets varies dynami- cally. For each allocation, every client is given a fair chance of winning proportional to its share of the total number of tickets. Since any changes to relative ticket allocations are immediately reflected in the next allocation decision, lottery scheduling is extremely responsive.
Lottery Scheduling
Lottery scheduling is a randomized resource allocation mechanism. Resource rights are represented by lottery tickets.1 Each allocation is determined by holding a lot- tery; the resource is granted to the client with the winning ticket. This effectively allocates resources to competing clients in proportion to the number of tickets that they hold.
2.1 Resource Rights
Lottery tickets encapsulate resource rights that are ab- stract, relative, and uniform. They are abstract because they quantify resource rights independently of machine details. Lottery tickets are relative, since the fraction of a resource that they represent varies dynamically in proportion to the contention for that resource. Thus, a client will obtain more of a lightly contended resource than one that is highly contended; in the worst case, it will receive a share propor- tional to its share of tickets in the system. Finally, tickets are uniform because rights for heterogeneous resources can be homogeneously represented as tickets. These properties of lottery tickets are similar to those of money in computa- tional economies [Wal92].
2.2 Lotteries
Scheduling by lottery is probabilistically fair. The ex- pected allocation of resources to clients is proportional to the number of tickets that they hold. Since the scheduling algorithm is randomized, the actual allocated proportions are not guaranteed to match the expected proportions ex- actly. However, the disparity between them decreases as the number of allocations increases.
The number of lotteries won by a client has a binomial
distribution. The probability p that a client holding t tickets
will win a given lottery with a total of T tickets is simply
p tT . After n identical lotteries, the expected number of
winsw isE w np,withvariance 2 np1 p.The w
Modular Resource Management
coefficient of variation for the observed proportion of wins
The explicit representation of resource rights as lot- tery tickets provides a convenient substrate for modular resource management. Tickets can be used to insulate the resource management policies of independent modules, be- cause each ticket probabilistically guarantees its owner the right to a worst-case resource consumption rate. Since lot- tery tickets abstractly encapsulate resource rights, they can also be treated as first-class objects that may be transferred in messages.
This section presents basic techniques for implementing resource management policies with lottery tickets. Detailed examples are presented in Section 5.
3.1 Ticket Transfers
Ticket transfers are explicit transfers of tickets from one client to another. Ticket transfers can be used in any situ- ation where a client blocks due to some dependency. For example, when a client needs to block pending a reply from an RPC, it can temporarily transfer its tickets to the server on which it is waiting. This idea also conveniently solves the conventional priority inversion problem in a manner similar to priority inheritance [Sha90]. Clients also have the ability to divide ticket transfers across multiple servers on which they may be waiting.
3.2 Ticket Inflation
Ticket inflation is an alternative to explicit ticket transfers in which a client can escalate its resource rights by creating more lottery tickets. In general, such inflation should be
p Ew
is proportional to its ticket allocation, with accuracy that
1A single physical ticket may represent any number of logical tick- ets. This is similar to monetary notes, which may be issued in different denominations.
. Thus, a client’s throughput
improves with
disallowed, since it violates desirable modularity and load insulation properties. For example, a single client could easily monopolize a resource by creating a large number of lottery tickets. However, ticket inflation can be very use- ful among mutually trusting clients; inflation and deflation can be used to adjust resource allocations without explicit communication.
3.3 Ticket Currencies
In general, resource management abstraction barriers are desirable across logical trust boundaries. Lottery schedul- ing can easily be extended to express resource rights in units that are local to each group of mutually trusting clients. A unique currency is used to denominate tickets within each trust boundary. Each currency is backed, or funded, by tickets that are denominated in more primitive currencies. Currency relationships may form an arbitrary acyclic graph, such as a hierarchy of currencies. The effects of inflation can be locally contained by maintaining an exchange rate between each local currency and a base currency that is conserved. The currency abstraction is useful for flexibly naming, sharing, and protecting resource rights. For exam- ple, an access control list associated with a currency could specify which principals have permission to inflate it by creating new tickets.
3.4 Compensation Tickets
total = 20
random [0 .. 19] = 15
Σ = 10 Σ > 15?
A client which consumes only a fraction f
located resource quantum can be granted a compensation ticket that inflates its value by 1f until the client starts its next quantum. This ensures that each client’s resource con- sumption, equal to f times its per-lottery win probability p, is adjusted by 1f to match its allocated share p. Without compensation tickets, a client that does not consume its en- tire allocated quantum would receive less than its entitled share of the processor.
4 Implementation
We have implemented a prototype lottery scheduler by modifying the Mach 3.0 microkernel (MK82) [Acc86, Loe92] on a 25MHz MIPS-based DECStation 5000/125. Full support is provided for ticket transfers, ticket inflation, ticket currencies, and compensation tickets.2 The schedul- ing quantum on this platform is 100 milliseconds.
4.1 Random Numbers
An efficient lottery scheduler requires a fast way to gen- erate uniformly-distributed random numbers. We have im- plemented a pseudo-random number generator based on the
2 Our first lottery scheduler implementation, developed for the Prelude [Wei91] runtime system, lacked support for ticket transfers and currencies.
Figure 1: Example Lottery. Five clients compete in a list-based lottery with a total of 20 tickets. The fifteenth ticket is randomly selected, and the client list is searched for the winner. A running ticket sum is accumulated until the winning ticket value is reached. In this example, the third client is the winner.
Park-Miller algorithm [Par88, Car90] that executes in ap- proximately 10 RISC instructions. Our assembly-language implementation is listed in Appendix A.
4.2 Lotteries
A straightforward way to implement a centralized lot- tery scheduler is to randomly select a winning ticket, and then search a list of clients to locate the client holding that ticket. This requires a random number generation and O n operations to traverse a client list of length n, accumulating a running ticket sum until it reaches the winning value. An example list-based lottery is presented in Figure 1.
Various optimizations can reduce the average number of clients that must be examined. For example, if the distri- bution of tickets to clients is uneven, ordering the clients by decreasing ticket counts can substantially reduce the av- erage search length. Since those clients with the largest number of tickets will be selected most frequently, a simple “move to front” heuristic can be very effective.
For large n, a more efficient implementation is to use a tree of partial ticket sums, with clients at the leaves. To locate the client holding a winning ticket, the tree is traversed starting at the root node, and ending with the winning client leaf node, requiring only O lg n operations. Such a tree-based implementation can also be used as the basis of a distributed lottery scheduler.
4.3 Interface
The kernel representation of tickets and currencies is depicted in Figure 2. A minimal lottery scheduling interface is exported by the microkernel. It consists of operations to create and destroy tickets and currencies, operations to fund and unfund a currency (by adding or removing a ticket from its list of backing tickets), and operations to compute the current value of tickets and currencies in base units.
Our lottery scheduling policy co-exists with the standard timesharing and fixed-priority policies. A few high-priority threads (such as the Ethernet driver) created by the Unix server (UX41) remain at their original fixed priorities.
of its al-
Σ = 12 Σ > 15?
Σ = 17 Σ > 15?
amount currency
list of backing tickets
unique name
active amount
list of issued tickets
Figure 2: Kernel Objects. A ticket object contains an amount denominated in some currency. A currency object contains a name, a list of tickets that back the currency, a list of all tickets issued in the currency, and an active amount sum for all issued tickets.
4.4 Ticket Currencies
Our prototype uses a simple scheme to convert ticket amounts into base units. Each currency maintains an ac- tive amount sum for all of its issued tickets. A ticket is active while it is being used by a thread to compete in a lottery. When a thread is removed from the run queue, its tickets are deactivated; they are reactivated when the thread rejoins the run queue.3 If a ticket deactivation changes a currency’s active amount to zero, the deactivation propa- gates to each of its backing tickets. Similarly, if a ticket activation changes a currency’s active amount from zero, the activation propagates to each of its backing tickets.
A currency’s value is computed by summing the value of its backing tickets. A ticket’s value is computed by multi- plying the value of the currency in which it is denominated by its share of the active amount issued in that currency. The value of a ticket denominated in the base currency is defined to be its face value amount. An example currency graph with base value conversions is presented in Figure 3. Currency conversions can be accelerated by caching values or exchange rates, although this is not implemented in our prototype.
Our scheduler uses the simple list-based lottery with a move-to-front heuristic, as described earlier in Section 4.2. To handle multiple currencies, a winning ticket value is selected by generating a random number between zero and the total number of active tickets in the base currency. The run queue is then traversed as described earlier, except that the running ticket sum accumulates the value of each thread’s currency in base units until the winning value is reached.
3 A blocked thread may transfer its tickets to another thread that will actively use them. For example, a thread blocked pending a reply from an RPC transfers its tickets to the server thread on which it is waiting.
Figure 3: Example Currency Graph. Two users compete for computing resources. Alice is executing two tasks: task1 is cur- rently inactive, and task2 has two runnable threads. Bob is exe- cuting one single-threaded task, task3. The current values in base units for the runnable threads are thread2 = 400, thread3 = 600, and thread4 = 2000. In general, currencies can also be used for groups of users or applications, and currency relationships may form an acyclic graph instead of a strict hierarchy.
4.5 Compensation Tickets
As discussed in Section 3.4, a thread which consumes only a fraction f of its allocated time quantum is automati- cally granted a compensation ticket that inflates its value by 1f until the thread starts its next quantum. This is consis- tent with proportional sharing, and permits I/O-bound tasks that use few processor cycles to start quickly.
For example, suppose threads A and B each hold tickets valued at 400 base units. Thread A always consumes its entire 100 millisecond time quantum, while thread B uses only 20 milliseconds before yielding the processor. Since both A and B have equal funding, they are equally likely to win a lottery when both compete for the processor. How- ever, thread B uses only f 15 of its allocated time, allowing thread A to consume five times as much CPU, in violation of their 1 : 1 allocation ratio. To remedy this situation, thread B is granted a compensation ticket valued at 1600 base units when it yields the processor. When B next competes for the processor, its total funding will be 400f 2000 base units. Thus, on average B will win the processor lottery five times as often as A, each time consuming 15 as much of its quantum as A, achieving the desired 1 : 1 allocation ratio.
4.6 Ticket Transfers
The mach msg system call was modified to temporarily transfer tickets from client to server for synchronous RPCs. This automatically redirects resource rights from a blocked client to the
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com