Wait-Free Synchronization
MAURICE HERLIHY
Digital Equipment Corporation
A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The problem of constructing a wait-free implementation of one data object from another lies at the heart of much recent work in concurrent algorithms, concurrent data structures, and multiprocessor architectures. First, we introduce a simple and general technique, based on reduction to a consensus protocol, for proving statements of the form, “there is no wait-free implementation of X by Y.” We derive a hierarchy of objects such that no object at one level has a wait-free implementation in terms of objects at lower levels. In particular, we show that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy: they cannot be used to construct wait-free implementations of many simple and familiar data types. Moreover, classical synchronization primitives such as test&set and fetch&add, while more powerful than read and write, are also computationally weak, as are the standard message-passing primitives. Second, nevertheless, we show that there do exist simple universal objects from which one can construct a wait-free implementation of any sequential object.
Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming; D.3.3 [Programming Languages]: Language Constructs-abstract data types, concurrent programming structures; [Operating Systems]: Process Management-concurrency, mutual exclusion, synchronization
General Terms: Algorithms, Languages, Verification
Additional Key Words and Phrases: Linearizability, wait-free synchronization
A preliminary version of this paper appeared in the Proceedings of the 7th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (Aug. 1988), pp. 276-290.
Author’s address: Digital Equipment Corporation Cambridge Research Laboratory, One Kendall Square, Cambridge, MA 02139.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1991 ACM 0164-0925/91/0100-0124 $01.50
ACM Transactions on Programming Languages and Systems, Vol. 11, No. 1, January 1991, Pages 124-149.
1. INTRODUCTION
A concurrent object is a data structure shared by concurrent processes. Algorithms for implementing concurrent objects lie at the heart of many important problems in concurrent systems. The traditional approach to implementing such objects centers around the use of critical sections: only one process at a time is allowed to operate on the object. Nevertheless, critical sections are poorly suited for asynchronous, fault-tolerant systems: if a faulty process is halted or delayed in a critical section, nonfaulty processes will also be unable to progress. Even in a failure-free system, a process can encounter unexpected delay as a result of a page fault or cache miss, by exhausting its scheduling quantum, or if it is swapped out. Similar problems arise in heterogeneous architectures, where some processors may be inherently faster than others, and some memory locations may be slower to access.
A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The wait-free condition provides fault-tolerance: no process can be prevented from completing an operation by undetected halting failures of other processes, or by arbitrary variations in their speed. The fundamental problem of wait-free synchronization can be phrased as follows:
Given two concurrent objects X and Y, does there exist a wait-free implementation of X by Y?
It is clear how to show that a wait-free implementation exists: one displays it. Most of the current literature takes this approach. Examples include “atomic” registers from nonatomic “safe” registers [18], complex atomic registers from simpler atomic registers [4, 5, 16, 23, 25, 26, 29, 31], read-modify-write operations from combining networks [11, 15], and typed objects such as queues or sets from simpler objects [14, 19, 20].
It is less clear how to show that such an implementation does not exist. In the first part of this paper, we propose a simple new technique for proving statements of the form “there is no wait-free implementation of X by Y.” We derive a hierarchy of objects such that no object at one level can implement any object at higher levels (see Figure 1). The basic idea is the following: each object has an associated consensus number, which is the maximum number of processes for which the object can solve a simple consensus problem. In a system of n or more concurrent processes, we show that it is impossible to construct a wait-free implementation of an object with consensus number n from an object with a lower consensus number.
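As an illustration of what it means for an object to solve a consensus problem, the following Python sketch (not taken from the paper) lets each process propose a value and decide on the first value installed in a shared compare&swap register, one of the objects that Figure 1 places at the top of the hierarchy. The names `CASRegister`, `decide`, and `run` are hypothetical, inputs are assumed to differ from `None`, and the lock merely simulates the atomicity that a hardware primitive would supply.

```python
import threading


class CASRegister:
    """Simulated compare&swap register (hypothetical helper).

    The lock only stands in for the atomicity hardware would provide;
    it is a simulation device, not part of the protocol.
    """

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Install `new` if the value equals `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def decide(register, my_input):
    """One process's side of the protocol: a single compare&swap."""
    old = register.compare_and_swap(None, my_input)
    # If the register was still empty, this process won and decides its own
    # input; otherwise it adopts the value installed by the winner.
    return my_input if old is None else old


def run(inputs):
    """Run one thread per input and collect each thread's decision."""
    register = CASRegister()
    decisions = [None] * len(inputs)

    def worker(i):
        decisions[i] = decide(register, inputs[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Each process performs exactly one compare&swap regardless of how the others are scheduled, so all decisions agree and the decided value is one of the inputs.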
These impossibility results do not by any means imply that wait-free synchronization is impossible or infeasible. In the second part of this paper, we show that there exist universal objects from which one can construct a wait-free implementation of any object. We give a simple test for universality, showing that an object is universal in a system of n processes if and only if it has a consensus number greater than or equal to n. In Figure 1, each object at level n is universal for a system of n processes. A machine architecture or programming language is computationally powerful enough to support arbitrary wait-free synchronization if and only if it provides a universal object as a primitive.
Most recent work on wait-free synchronization has focused on the construction of atomic read/write registers [4, 5, 16, 18, 23, 25, 26, 29, 31]. Our results address a basic question: what are these registers good for? Can they be used to construct wait-free implementations of more complex data structures? We show that atomic registers have few, if any, interesting applications in this area. From a set of atomic registers, we show that it is impossible to construct a wait-free implementation of (1) common data types such as sets, queues, stacks, priority queues, or lists, (2) most if not all the classical synchronization primitives, such as test&set, compare&swap, and fetch&add, and (3) such simple memory-to-memory operations as move or memory-to-memory swap. These results suggest that further progress in understanding wait-free synchronization requires turning our attention from the conventional read and write operations to more fundamental primitives.
[Figure 1: table with columns Object and Consensus Number. Object rows, in the order listed:
   read/write registers
   test&set, swap, fetch&add, queue, stack
   n-register assignment
   memory-to-memory move and swap, augmented queue, compare&swap, fetch&cons, sticky byte]
Fig. 1. Impossibility and universality hierarchy.
Our results also illustrate inherent limitations of certain multiprocessor architectures. The NYU ultracomputer project [10] has investigated architectural support for wait-free implementations of common synchronization primitives. They use combining networks to implement fetch&add, a generalization of test&set. IBM’s RP3 [8] project is investigating a similar approach. The fetch&add operation is quite flexible: it can be used for semaphores, for highly concurrent queues, and even for database synchronization [11, 14, 30]. Nevertheless, we show that it is not universal, disproving a conjecture of Gottlieb et al. [11]. We also show that message-passing architectures such as hypercubes [28] are not universal either.
This paper is organized as follows. Section 2 defines a model of computation, Section 3 presents impossibility results, Section 4 describes some universal objects, and Section 5 concludes with a summary.
2. THE MODEL
Informally, our model of computation consists of a collection of sequential threads of control called processes that communicate through shared data structures called objects. Each object has a type, which defines a set of possible states and a set of primitive operations that provide the only means to manipulate that object. Each process applies a sequence of operations to objects, issuing an invocation and receiving the associated response. The basic correctness condition for concurrent systems is linearizability [14]: although operations of concurrent processes may overlap, each operation appears to take effect instantaneously at some point between its invocation and response. In particular, operations that do not overlap take effect in their “real-time” order.
2.1 I/O Automata
Formally, we model objects and processes using a simplified form of I/O automata [22]. Because the wait-free condition does not require any fairness or liveness conditions, and because we consider only finite sets of processes and objects, we do not make use of the full power of the I/O automata formalism. Nevertheless, simplified I/O automata provide a convenient way to describe the basic structure of our model and to give the basic definition of what it means for one object to implement another. For brevity, our later constructions and impossibility results are expressed less formally using pseudocode. It is a straightforward exercise to translate this notation into I/O automata.
An I/O automaton A is a nondeterministic automaton with the following components:¹
(1) States(A) is a finite or infinite set of states, including a distinguished set of starting states.
(2) In(A) is a set of input events.
(3) Out(A) is a set of output events.
(4) Int(A) is a set of internal events.
(5) Steps(A) is a transition relation given by a set of triples (s’, e, s), where s and s’ are states and e is an event. Such a triple is called a step, and it means that an automaton in state s’ can undergo a transition to state s, and that transition is associated with the event e.
If (s’, e, s) is a step, we say that e is enabled in s’. I/O automata must satisfy the additional condition that inputs cannot be disabled: for each input event e and each state s’, there exist a state s and a step (s’, e, s).
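The components listed above, together with the enabledness condition, can be sketched in Python for the finite case. This is a minimal illustration, not the paper's formalism: the class name `Automaton`, its field names, and the example automaton `flag` are hypothetical, and only finite state and event sets are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    """Finite I/O automaton; a step is a triple (s_prime, e, s)."""
    states: frozenset
    start: frozenset
    inputs: frozenset
    outputs: frozenset
    internals: frozenset
    steps: frozenset

    def enabled(self, event, s_prime):
        """True if some step leaves state s_prime on the given event."""
        return any(p == s_prime and e == event for (p, e, _s) in self.steps)

    def input_enabled(self):
        """True if every input event is enabled in every state."""
        return all(self.enabled(e, s)
                   for e in self.inputs for s in self.states)


# Hypothetical example: a one-bit flag. The input "set" is accepted in
# every state; the output "report" is enabled only once the flag is 1.
flag = Automaton(
    states=frozenset({0, 1}),
    start=frozenset({0}),
    inputs=frozenset({"set"}),
    outputs=frozenset({"report"}),
    internals=frozenset(),
    steps=frozenset({(0, "set", 1), (1, "set", 1), (1, "report", 1)}),
)
```

Here `flag.input_enabled()` holds because "set" has a step from both states, whereas the output "report" is not enabled in state 0, which the definition permits for non-input events.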
An execution fragment of an automaton A is a finite sequence $s_0, e_1, s_1, \ldots, e_n, s_n$ or infinite sequence $s_0, e_1, s_1, \ldots$ of alternating states and events such that each $(s_i, e_{i+1}, s_{i+1})$ is a step of A. An execution is an execution fragment where $s_0$ is a starting state. A history fragment of an automaton is the subsequence of events occurring in an execution fragment, and a history is the subsequence occurring in an execution.
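For finite sequences, these definitions translate directly into checks over an alternating list of states and events. The sketch below is illustrative only: the function names are hypothetical, a step is assumed to be a tuple `(s_prime, e, s)`, and infinite executions are not represented.

```python
def is_execution_fragment(steps, seq):
    """Check that seq = [s0, e1, s1, ..., en, sn] follows the step relation."""
    if len(seq) % 2 == 0:
        return False  # a fragment must begin and end with a state
    return all((seq[i], seq[i + 1], seq[i + 2]) in steps
               for i in range(0, len(seq) - 2, 2))


def is_execution(steps, start_states, seq):
    """An execution is a fragment whose first state is a starting state."""
    return is_execution_fragment(steps, seq) and seq[0] in start_states


def history_of(seq):
    """The subsequence of events, which sit at the odd positions of seq."""
    return list(seq[1::2])
```

For a step relation containing `(0, "set", 1)` and `(1, "report", 1)`, the sequence `[0, "set", 1, "report", 1]` is an execution when 0 is a starting state, and its history is `["set", "report"]`.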
A new I/O automaton can be constructed by composing a set of compatible I/O automata. (In this paper we consider only finite compositions.) A set of automata are compatible if they share no output or internal events. A state of the composed automaton S is a tuple of component states, and a starting state is a tuple of component starting states. The set of events of S, Events(S), is the union of the components’ sets of events. The set of output events of S, Out(S), is the union of the components’ sets of output events; the set of internal events, Int(S), is the union of the components’ sets of internal events; and the set of input events of S, In(S), is the union of the components’ sets of input events minus Out(S), that is, all the component input events that are not output events for some component. A triple (s’, e, s) is in Steps(S) if and only if, for all component automata A, one of the following holds: (1) e is an event of A, and the projection of the step onto A is a step of A, or (2) e is not an event of A, and A’s state components are identical in s’ and s. Note that composition is associative. If H is a history of a composite automaton and A a component automaton, H | A denotes the subhistory of H consisting of events of A.
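The composition rules can be sketched as follows for finite automata. This is an illustration under stated assumptions rather than a definitive encoding: the names are hypothetical, and "share no output or internal events" is read here as "no event is an output or internal event of two different components".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automaton:
    inputs: frozenset
    outputs: frozenset
    internals: frozenset
    steps: frozenset  # triples (s_prime, e, s)

    def events(self):
        return self.inputs | self.outputs | self.internals


def compatible(components):
    """Assumed reading: output/internal event sets are pairwise disjoint."""
    seen = set()
    for a in components:
        controlled = a.outputs | a.internals
        if controlled & seen:
            return False
        seen |= controlled
    return True


def composed_signature(components):
    """Return (In(S), Out(S), Int(S)) of the composition."""
    outs = frozenset().union(*(a.outputs for a in components))
    ints = frozenset().union(*(a.internals for a in components))
    ins = frozenset().union(*(a.inputs for a in components)) - outs
    return ins, outs, ints


def is_composed_step(components, s_prime, e, s):
    """s_prime and s are tuples holding one state per component."""
    if not any(e in a.events() for a in components):
        return False  # e is not an event of the composition at all
    for a, before, after in zip(components, s_prime, s):
        if e in a.events():
            if (before, e, after) not in a.steps:
                return False  # the projection onto a must be a step of a
        elif before != after:
            return False      # components not involved in e stay unchanged
    return True


# Hypothetical process/object pair that synchronize on shared events.
P = Automaton(frozenset({"RESPOND"}), frozenset({"INVOKE"}), frozenset(),
              frozenset({("idle", "INVOKE", "waiting"),
                         ("waiting", "RESPOND", "idle"),
                         ("idle", "RESPOND", "idle")}))
X = Automaton(frozenset({"INVOKE"}), frozenset({"RESPOND"}), frozenset(),
              frozenset({("ready", "INVOKE", "busy"),
                         ("busy", "RESPOND", "ready"),
                         ("busy", "INVOKE", "busy")}))
```

In this example every input of one component is an output of the other, so the composition of `P` and `X` has no input events left, and an INVOKE step must move both components at once.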
¹ To remain consistent with the terminology of [14], we use “event” where Lynch and Tuttle use “operation,” and “history” where they use “schedule.”
2.2 Concurrent Systems
A concurrent system is a set of processes and a set of objects. Processes represent sequential threads of control, and objects represent data structures shared by processes. A process P is an I/O automaton with output events INVOKE(P, op, X), where op is an operation of object X, and input events RESPOND(P, res, X), where res is a result value. We refer to these events as invocations and responses. An invocation and a response match if their process and object names agree. To capture the notion that a process represents a single thread of control, we say that a process history is well formed if it begins with an invocation and alternates matching invocations and responses. An invocation is pending if it is not followed by a matching response. An object X has input events INVOKE(P, op, X), where P is a process and op is an operation of the object, and output events RESPOND(P, res, X), where res is a result value. Process and object names are unique, ensuring that process and object automata are compatible.
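The notions of matching, well-formedness, and pending invocations can be checked mechanically on a finite history. The following sketch assumes a hypothetical encoding in which events are tuples `("INVOKE", P, op, X)` and `("RESPOND", P, res, X)`, and it treats the empty history as well formed; neither choice comes from the paper.

```python
def matches(invocation, response):
    """An invocation and a response match if process and object names agree."""
    return (invocation[0] == "INVOKE" and response[0] == "RESPOND"
            and invocation[1] == response[1]
            and invocation[3] == response[3])


def project(history, process):
    """The subhistory consisting of the events of one process."""
    return [e for e in history if e[1] == process]


def is_well_formed(process_history):
    """Starts with an invocation and alternates matching invocations/responses."""
    for i, event in enumerate(process_history):
        if i % 2 == 0:
            if event[0] != "INVOKE":
                return False
        elif not matches(process_history[i - 1], event):
            return False
    return True


def pending(history):
    """Invocations that are not followed by a matching response."""
    return [event for i, event in enumerate(history)
            if event[0] == "INVOKE"
            and not any(matches(event, later) for later in history[i + 1:])]
```

For example, in a history where P enqueues and receives "ok" while R's dequeue has not yet returned, `pending` reports only R's invocation, and each per-process projection is well formed.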
A concurrent system $\{P_1, \ldots, P_n; A_1, \ldots, A_m\}$ is an I/O automaton composed from processes $P_1, \ldots, P_n$ and objects $A_1, \ldots, A_m$, where processes and objects are composed by identifying corresponding INVOKE and RESPOND events. A history H of a concurrent system is well formed if each $H | P_i$ is well formed, and a concurrent system is well formed if each of its histories is well formed. Henceforth, we restrict our attention to well-formed concurrent systems.
An execution is sequential if its first event is an invocation, and it alternates matching invocations and responses. A history is sequential if it is derived from a sequential execution. (Notice that a sequential execution permits process steps to be interleaved, but at the granularity of complete operations.) If we restrict our attention to sequential histories, then the behavior of an object can be specified in a particularly simple way: by giving pre- and postconditions for each operation. We refer to such a specification as a sequential specification. In this paper, we consider only objects whose sequential specifications are total: if the object has a pending invocation, then it has a matching enabled response. For example, a partial deq might be undefined when applied to an empty queue, while a total deq would return an exception. We restrict our attention to objects whose operations are total because it is unclear how to interpret the wait-free condition for partial operations. For example, the most natural way to define the effects of a partial deq in a concurrent system is to have it wait until the queue becomes nonempty, a specification that clearly does not admit a wait-free implementation.
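The distinction between partial and total operations can be illustrated with a sequential queue whose deq always has a response. This is a minimal sketch: the class name `TotalQueue` and the distinguished result `EMPTY`, which stands in for the exception mentioned above, are hypothetical.

```python
from collections import deque


class TotalQueue:
    """Sequential FIFO queue whose operations are all total."""

    EMPTY = "empty"  # hypothetical exceptional result for deq on an empty queue

    def __init__(self):
        self._items = deque()

    def enq(self, x):
        """Postcondition: x is the last item of the queue."""
        self._items.append(x)
        return "ok"

    def deq(self):
        """Total: defined in every state, including the empty queue."""
        if not self._items:
            return TotalQueue.EMPTY
        return self._items.popleft()
```

A partial deq would instead have no enabled response in the empty state, which is why waiting for the queue to become nonempty cannot be reconciled with the wait-free condition.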
Each history H induces a partial “real-time” order $<_H$ on its operations: $op_0$
2.3 Implementations
An implementation of an object A is a concurrent system $\{F_1, \ldots, F_n; R\}$, where the $F_i$ are called front-ends, and R is called the representation object. Informally, R is the data structure that implements A, and $F_i$ is the procedure called by process $P_i$ to execute an operation. An object implementation is shown schematically in Figure 2.
Fig. 2. Schematic view of object implementation.
(1) The external events of the implementation are just the external events of A: each input event INVOKE($P_i$, op, A) of A is an input event of $F_i$, and each output event RESPOND($P_i$, res, A) of A is an output event of $F_i$.
(2) The implementation has the following internal events: each input event INVOKE($F_i$, op, R) of R is composed with the matching output event of $F_i$, and each output event RESPOND($F_i$, res, R) of R is composed with the matching input event of $F_i$.
(3) To rule out certain trivial solutions, front-ends share no events; they communicate indirectly through R.
Let $I_j$ be an implementation of $A_j$. $I_j$ is correct if, for every history H of every system $\{P_1, \ldots, P_n; A_1, \ldots, I_j, \ldots, A_m\}$, there exists a history H’ of $\{P_1, \ldots, P_n; A_1, \ldots, A_j, \ldots, A_m\}$ such that $H | \{P_1, \ldots, P_n\} = H' | \{P_1, \ldots, P_n\}$.
An implementation is wait-free if the following are true:
(1) It has no history in which an invocation of $P_i$ remains pending across an infinite number of steps of $F_i$.
(2) If $P_i$ has a pending invocation in a state s, then there exists a history fragment starting from s, consisting entirely of events of $F_i$ and R, that includes the response to that invocation.
The first condition rules out unbounded busy-waiting: a front-end cannot take an infinite number of steps without responding to an invocation. The second condition rules out conditional waiting: $F_i$ cannot block waiting for another process to make a condition true. Note that we have not found it necessary to make fairness or liveness assumptions: a wait-free implementation guarantees only that if R eventually responds to all invocations of $F_i$, then $F_i$ will eventually respond to all invocations of $P_i$, independently of process speeds.
An implementation is bounded wait-free if there exists N such that there is no history in which an invocation of $P_i$ remains pending across N steps of $F_i$. Bounded wait-free implies wait-free, but not vice versa. We use the wait-free condition for impossibility results and the bounded wait-free condition for constructions.
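To give a feel for the bounded wait-free condition, the sketch below shows a front-end whose operations each take a fixed number of steps on the representation, with no loop whose termination depends on another process. It is an illustrative example rather than a construction from the paper: the class `SlotCounter` is hypothetical, each list slot is assumed to behave like an atomic register written by a single process, and the `accesses` field is instrumentation meant for a sequential demonstration only.

```python
class SlotCounter:
    """Increment-only counter kept in n single-writer slots (hypothetical)."""

    def __init__(self, n):
        self.slots = [0] * n   # slot i is written only by process i
        self.accesses = 0      # slot accesses so far (sequential demo only)

    def increment(self, i):
        """Process i's increment: one read and one write of its own slot."""
        value = self.slots[i]
        self.slots[i] = value + 1
        self.accesses += 2

    def read(self):
        """Exactly n slot reads; never waits for another process."""
        total = 0
        for value in self.slots:
            total += value
        self.accesses += len(self.slots)
        return total
```

Here the bound N can be taken proportional to the number of slots: an increment uses two slot accesses and a read uses n, whatever the other processes are doing. A front-end that instead spun on a flag until another process cleared it would violate the condition ruling out conditional waiting.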
For brevity, we say that R implements A if there exists a wait-free implementation $\{F_1, \ldots, F_n; R\}$ of A. It is immediate from the definitions that implements is a reflexive partial order on the universe of objects. In the rest of the paper, we investigate the mathematical structure of the implements relation. In the next section, we introduce a simple technique for proving that one object does not implement another, and in the following section we display some “universal” objects capable of implementing any