程序代写代做代考 concurrency data structure distributed system graph algorithm ER Systems, Networks & Concurrency 2019

Systems, Networks & Concurrency 2019
Safety & Livene7ss Uwe R. Zimmer – The Australian National University

[Ben2006]
[Chandy1983]
Ben-Ari, M
Chandy, K, Misra, Jayadev & Haas, Laura
Principles of Concurrent and Dis- tributed Programming
second edition, Prentice-Hall 2006
Distributed deadlock detection
© 2019 Uwe R. Zimmer, The Australian National University
page 460 of 757 (chapter 7: “Safety & Liveness” up to page 462)
Safety & Liveness
References for this chapter
Transactions on Computer Sys- tems (TOCS) 1983 vol. 1 (2)
[Silberschatz2001]
Silberschatz, Abraham, Gal- vin, Peter & Gagne, Greg Operating System Concepts John Wiley & Sons, Inc., 2001

Safety properties: Liveness properties:
Safety & Liveness
Repetition
Correctness concepts in concurrent systems
Extended concepts of correctness in concurrent systems: ¬ Terminationisoftennotintendedorevenconsideredafailure
(P (I) /Processes (I, S)) & XQ (I, S)
where XQ means that Q does always hold
(P (I) /Processes (I, S)) & oQ (I, S)
where oQ means that Q does eventually hold (and will then stay true)
and S is the current state of the concurrent system
© 2019 Uwe R. Zimmer, The Australian National University page 461 of 757 (chapter 7: “Safety & Liveness” up to page 462)

Examples:
Safety & Liveness
Repetition
Correctness concepts in concurrent systems
Liveness properties:
(P (I) /Processes (I, S)) & oQ (I, S)
where oQ means that Q does eventually hold (and will then stay true)
• Requests need to complete eventually.
• The state of the system needs to be displayed eventually. • No part of the system is to be delayed forever (fairness).
Interesting liveness properties can become very hard to proof
© 2019 Uwe R. Zimmer, The Australian National University page 462 of 757 (chapter 7: “Safety & Liveness” up to page 462)

Liveness properties:
(P (I) /Processes (I, S)) & oQ (I, S)
Safety & Liveness
• Weak fairness: ?4R & ?G … eventually, if a process requests continually.
Liveness
Fairness
where oQ means that Q does eventually hold (and will then stay true) Fairness (as a means to avoid starvation): Resources will be granted …
• Strong fairness: 4?R & ?G … eventually, if a process requests infinitely often.
• Linear waiting: ?R & ?G … before any other process had the same resource granted more than once (common fairness in distributed systems).
• First-in, first-out: ?R & ?G … before any other process which applied for the same resource at a later point in time (common fairness in single-node systems).
© 2019 Uwe R. Zimmer, The Australian National University page 463 of 757 (chapter 7: “Safety & Liveness” up to page 512)

Safety & Liveness
Revisiting
Correctness concepts in concurrent systems
Safety properties: Examples:
(P (I) /Processes (I, S)) & XQ (I, S)
where XQ means that Q does always hold
• Mutual exclusion (no resource collisions) has been addressed • Absence of deadlocks to be addressed now
(and other forms of ‘silent death’ and ‘freeze’ conditions)
• Specified responsiveness or free capabilities Real-time systems (typical in real-time / embedded systems or server applications)
© 2019 Uwe R. Zimmer, The Australian National University page 464 of 757 (chapter 7: “Safety & Liveness” up to page 512)

Safety & Liveness
Deadlocks
Most forms of synchronization may lead to
Deadlocks
(Avoidance / prevention of deadlocks is one central safety property)
How to predict them?
How to find them?
How to resolve them?
… or are there structurally dead-lock free forms of synchronization?
© 2019 Uwe R. Zimmer, The Australian National University page 465 of 757 (chapter 7: “Safety & Liveness” up to page 512)

process P1;
statement X;
var reserve_1, reserve_2 : semaphore := 1; process P2;
wait (reserve_1);
wait (reserve_2);
wait (reserve_2);
wait (reserve_1);
statement Y; — employ all resources
signal (reserve_2);
signal (reserve_1);
statement B; — employ all resources
signal (reserve_1);
signal (reserve_2);
statement Z; end P1;
statement C; end P2;
Sequence of operations: A
© 2019 Uwe R. Zimmer, The Australian National University
C; X
Z; 6X,Z ; A,B,C@; 6A,C ; X,Y,Z@; J6B ; Y@
page 466 of 757 (chapter 7: “Safety & Liveness” up to page 512)
< B < or: 6A ; X@ followed by a deadlock situation. Safety & Liveness Towards synchronization Reserving resources in reverse order < Y < statement A; process P1; statement X; process P2; statement A; process P3; statement K; wait (reserve_1); wait (reserve_2); wait (reserve_2); wait (reserve_3); wait (reserve_3); wait (reserve_1); statement Y; statement B; statement L; signal (reserve_2); signal (reserve_1); signal (reserve_3); signal (reserve_2); signal (reserve_1); signal (reserve_3); statement Z; end P1; statement C; end P2; statement M; end P3; Sequence of operations: A < B < C; X < Y < Z; K < L < Safety & Liveness Towards synchronization Circular dependencies var reserve_1, reserve_2, reserve_3 : semaphore := 1; M; 6X,Z ; A,B,C ; K,M@; 6A,C ; X,Y,Z ; K,M@; 6A,C ; K,L,M ; X,Z@; J6B ; Y ; L@ or: 6A ; X ; K@ followed by a deadlock situation. © 2019 Uwe R. Zimmer, The Australian National University page 467 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Necessary deadlock conditions: 1. Mutual exclusion: resources cannot be used simultaneously. © 2019 Uwe R. Zimmer, The Australian National University page 468 of 757 (chapter 7: “Safety & Liveness” up to page 512) 1. Mutual exclusion: resources cannot be used simultaneously. Safety & Liveness Deadlocks Necessary deadlock conditions: 2. Hold and wait: a process applies for a resource, while it is holding another resource (sequential requests). © 2019 Uwe R. Zimmer, The Australian National University page 469 of 757 (chapter 7: “Safety & Liveness” up to page 512) 1. Mutual exclusion: resources cannot be used simultaneously. Safety & Liveness Deadlocks Necessary deadlock conditions: 2. Hold and wait: a process applies for a resource, while it is holding another resource (sequential requests). 3. No pre-emption: resources cannot be pre-empted; only the process itself can release resources. © 2019 Uwe R. Zimmer, The Australian National University page 470 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Necessary deadlock conditions: 1. Mutual exclusion: resources cannot be used simultaneously. 2. Hold and wait: a process applies for a resource, while it is holding another resource (sequential requests). 3. No pre-emption: resources cannot be pre-empted; only the process itself can release resources. 4. Circular wait: a ring list of processes exists, where every process waits for release of a resource by the next one. © 2019 Uwe R. Zimmer, The Australian National University page 471 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Necessary deadlock conditions: 1. Mutual exclusion: resources cannot be used simultaneously. 2. Hold and wait: a process applies for a resource, while it is holding another resource (sequential requests). 3. No pre-emption: resources cannot be pre-empted; only the process itself can release resources. 4. Circular wait: a ring list of processes exists, where every process waits for release of a resource by the next one. A system may become deadlocked, if all these conditions apply! © 2019 Uwe R. Zimmer, The Australian National University page 472 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness • Deadlock detection & recovery find deadlocked processes and recover the system in a coordinated way Deadlocks Deadlock strategies: • Ignorance & restart Kill or restart unresponsive processes, power-cycle the computer, ... • Deadlock avoidance the resulting system state is checked before any resources are actually assigned • Deadlock prevention the system prevents deadlocks by its structure © 2019 Uwe R. Zimmer, The Australian National University page 473 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Deadlock prevention (Remove one of the four necessary deadlock conditions) 1. Break Mutual exclusion: Mutual exclusion Hold and wait No pre-emption Circular wait © 2019 Uwe R. Zimmer, The Australian National University page 474 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Deadlock prevention (Remove one of the four necessary deadlock conditions) 1. Break Mutual exclusion: By replicating critical resources, mutual exclusion becomes un- necessary (only applicable in very specific cases). Mutual exclusion Hold and wait No pre-emption Circular wait 2. Break Hold and wait: © 2019 Uwe R. Zimmer, The Australian National University page 475 of 757 (chapter 7: “Safety & Liveness” up to page 512) 1. Break Mutual exclusion: By replicating critical resources, mutual exclusion becomes un- necessary (only applicable in very specific cases). Mutual exclusion Hold and wait No pre-emption Circular wait 2. Break Hold and wait: Allocation of all required resources in one request. Processes can either hold none or all of their required resources. 3. Introduce Pre-emption: : Safety & Liveness Deadlocks Deadlock prevention (Remove one of the four necessary deadlock conditions) © 2019 Uwe R. Zimmer, The Australian National University page 476 of 757 (chapter 7: “Safety & Liveness” up to page 512) 1. Break Mutual exclusion: By replicating critical resources, mutual exclusion becomes un- necessary (only applicable in very specific cases). Mutual exclusion Hold and wait No pre-emption Circular wait 2. Break Hold and wait: Allocation of all required resources in one request. Processes can either hold none or all of their required resources. Safety & Liveness Deadlocks Deadlock prevention (Remove one of the four necessary deadlock conditions) 3. Introduce Pre-emption: Provide the additional infrastructure to allow for pre-emption of resources. Mind that re- sources cannot be pre-empted, if their states cannot be fully stored and recovered. 4. Break Circular waits: © 2019 Uwe R. Zimmer, The Australian National University page 477 of 757 (chapter 7: “Safety & Liveness” up to page 512) 1. Break Mutual exclusion: By replicating critical resources, mutual exclusion becomes un- necessary (only applicable in very specific cases). Mutual exclusion Hold and wait No pre-emption Circular wait 2. Break Hold and wait: Allocation of all required resources in one request. Processes can either hold none or all of their required resources. Safety & Liveness Deadlocks Deadlock prevention (Remove one of the four necessary deadlock conditions) 3. Introduce Pre-emption: Provide the additional infrastructure to allow for pre-emption of resources. Mind that re- sources cannot be pre-empted, if their states cannot be fully stored and recovered. 4. Break Circular waits: E.g. order all resources globally and restrict processes to request resources in that order only. © 2019 Uwe R. Zimmer, The Australian National University page 478 of 757 (chapter 7: “Safety & Liveness” up to page 512) RAG = "V,E,; Resource allocation graphs consist of vertices V and edges E. V = P jR; Vertices V can be processes P or Resource types R. Pi with processes P = "P1,f,Pn, and resources types R = "R1,fRk, requests Safety & Liveness Deadlocks Pi Resource Allocation Graphs holds (Silberschatz, Galvin & Gagne) E = Ec jEr jEa; Edges E can be “claims” Ec, “requests” Er or “assignments” Ea with claims Ec = $Pi " Rj,f. requests Er = $Pi " Rj,f. and assignments Ea = $Rj " Pi,f. Rj Pi Note: any resource type Rj can have more than one instance of a resource. © 2019 Uwe R. Zimmer, The Australian National University page 479 of 757 (chapter 7: “Safety & Liveness” up to page 512) Rj claims Rj Resource Allocation Graphs © 2019 Uwe R. Zimmer, The Australian National University page 480 of 757 (chapter 7: “Safety & Liveness” up to page 512) (Silberschatz, Galvin & Gagne) R1 Safety & Liveness Deadlocks P1 P2 R2 Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 Two process, reverse allocation deadlock: P1 P2 R2 Safety & Liveness Deadlocks © 2019 Uwe R. Zimmer, The Australian National University page 481 of 757 (chapter 7: “Safety & Liveness” up to page 512) Resource Allocation Graphs © 2019 Uwe R. Zimmer, The Australian National University page 482 of 757 (chapter 7: “Safety & Liveness” up to page 512) (Silberschatz, Galvin & Gagne) R1 R3 Safety & Liveness Deadlocks P1 P2 P3 R2 Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 No circular dependency no deadlock: P1 P2 P3 R2 Safety & Liveness Deadlocks © 2019 Uwe R. Zimmer, The Australian National University page 483 of 757 (chapter 7: “Safety & Liveness” up to page 512) Resource Allocation Graphs © 2019 Uwe R. Zimmer, The Australian National University page 484 of 757 (chapter 7: “Safety & Liveness” up to page 512) (Silberschatz, Galvin & Gagne) R1 R3 Safety & Liveness Deadlocks P1 P2 P3 R2 Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 Two circular dependencies deadlock: P1 "R1 "P2 "R3 "P3 "R2 "P1 as well as: P2 " R3 " P3 " R2 " P2 P1 P2 P3 R2 Derived rule: If some processes are deadlocked then there are cycles in the resource allocation graph. Safety & Liveness Deadlocks © 2019 Uwe R. Zimmer, The Australian National University page 485 of 757 (chapter 7: “Safety & Liveness” up to page 512) Edge Chasing (for the distributed version see Chandy, Misra & Haas) 6blocking processes: R1 R3 Send a probe to all requested yet unassigned resources con- taining ids of: [the blocked, the sending, the targeted node]. P1 P2 P3 R2 6nodes on probe reception: Propagate the probe to all processes holding the critical resources or to all requested yet unassigned resources – while updating the second and third entry in the probe. 7a process receiving its own probe: (blocked-id = targeted-id) Circular dependency detected. © 2019 Uwe R. Zimmer, The Australian National University page 486 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 P1 P2 P3 Knowledge of claims: Claims are potential future requests which have no blocking ef- Safety & Liveness Deadlocks fect on the claiming process – while actual requests are blocking. R2 © 2019 Uwe R. Zimmer, The Australian National University page 487 of 757 (chapter 7: “Safety & Liveness” up to page 512) Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 Assignment of resources such that circular dependencies are avoided: P1 P2 R2 P3 © 2019 Uwe R. Zimmer, The Australian National University page 488 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Resource Allocation Graphs (Silberschatz, Galvin & Gagne) Earlier derived rule: R1 R3 If some processes are deadlocked then there are cycles in the resource allocation graph. P1 P2 P3 R2 Reverse rule for multiple instances: If there are cycles in the resource allocation graph and there are multiple instances per resource then the involved processes are potentially deadlocked. Reverse rule for single instances: If there are cycles in the resource allocation graph and there is exactly one instance per resource then the involved processes are deadlocked. © 2019 Uwe R. Zimmer, The Australian National University page 489 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 Reverse rule for single instances: P1 P2 R2 If there are cycles in the resource allocation graph and there is exactly one instance per resource then the involved processes are deadlocked. Actual deadlock identified © 2019 Uwe R. Zimmer, The Australian National University page 490 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 Reverse rule for multiple instances: P1 P2 P3 R2 If there are cycles in the resource allocation graph and there are multiple instances per resource then the involved processes are potentially deadlocked. Potential deadlock identified © 2019 Uwe R. Zimmer, The Australian National University page 491 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 Reverse rule for multiple instances: P1 P2 P3 R2 If there are cycles in the resource allocation graph and there are multiple instances per resource then the involved processes are potentially deadlocked. Potential deadlock identified – yet clearly not an actual deadlock here © 2019 Uwe R. Zimmer, The Australian National University page 492 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks P4 Resource Allocation Graphs (Silberschatz, Galvin & Gagne) R1 R3 How to detect actual deadlocks in the general case? (multiple instances per resource) P1 P2 R2 P3 © 2019 Uwe R. Zimmer, The Australian National University page 493 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks • Claimed [i, j] • Requested [i, j] • Completed [i] • Simulated_Free [j] Safety & Liveness Deadlocks Banker’s Algorithm There are processes Pi ! {P1,f,Pn} and resource types Rj ! {R1,f,Rm} and data structures: • Allocated [i, j] • Free [j] the number of resources of type j currently allocated to process i. the number of currently available resources of type j. the number of resources of type j required by process i eventually. the number of currently requested resources of type j by process i. boolean vector indicating processes which may complete. Number of available resources assuming that complete processes deallocate their resources. © 2019 Uwe R. Zimmer, The Australian National University page 494 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Banker’s Algorithm 1. Simulated_Free % Free; 6i: Completed [i] % False; 2.While 7i: JCompleted [i] and 6j: Requested [i, j] < Simulated_Free [j] do: 6j: Simulated_Free [j] % Simulated_Free [j] + Allocated [i, j]; Completed [i] % True; 3. If 6i: Completed [i] then the system is currently deadlock-free! else all processes i with JCompleted [i] are involved in a deadlock!. © 2019 Uwe R. Zimmer, The Australian National University page 495 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Deadlocks Banker’s Algorithm 1. Simulated_Free % Free; 6i: Completed [i] % False; 2.While 7i: JCompleted [i] and 6j: Claimed [i, j] < Simulated_Free [j] do: 6j: Simulated_Free [j] % Simulated_Free [j] + Allocated [i, j]; Completed [i] % True; 3. If 6i: Completed [i] then the system is safe! A safe system is a system in which future deadlocks can be avoided assuming the current set of available resources. © 2019 Uwe R. Zimmer, The Australian National University page 496 of 757 (chapter 7: “Safety & Liveness” up to page 512) if (Request < Claimed) and (Request < Free) then Free := Free - Request; Claimed := Claimed - Request; Allocated := Allocated + Request; if System_is_safe (checked by e.g. Banker’s algorithm) then Grant request else Safety & Liveness Deadlocks Banker’s Algorithm Check potential future system safety by simulating a granted request: (Deadlock avoidance) Restore former system state: (Free, Claimed, Allocated) end if; end if; © 2019 Uwe R. Zimmer, The Australian National University page 497 of 757 (chapter 7: “Safety & Liveness” up to page 512) (keeping most processes close to the resources they require). Organize the nodes in an adequate topology (e.g. a tree). Safety & Liveness Check for deadlock inside nodes with blocked resource requests and detect/avoid local deadlock immediately. Exchange resource status information between nodes occasionally and detect global deadlocks eventually. Deadlocks Distributed deadlock detection Observation: Deadlock detection methods like Banker’s Algorithm are too communication intensive to be commonly applied in full and at high frequency in a distributed system. Therefore a distributed version needs to: Split the system into nodes of reasonable locality © 2019 Uwe R. Zimmer, The Australian National University page 498 of 757 (chapter 7: “Safety & Liveness” up to page 512) A deadlock has been detected now what? Breaking the circular dependencies can be done by: Either pre-empt an assigned resource which is part of the deadlock. or stop a process which is part of the deadlock. Safety & Liveness Deadlocks Deadlock recovery Usually neither choice can be implemented ‘gracefully’ and deals only with the symptoms. Deadlock recovery does not address the reason for the problem! (i.e. the deadlock situation can re-occur again immediately) © 2019 Uwe R. Zimmer, The Australian National University page 499 of 757 (chapter 7: “Safety & Liveness” up to page 512) • Deadlock avoidance System state is checked with every resource assignment. Safety & Liveness Deadlocks Deadlock strategies: • Deadlock prevention System prevents deadlocks by its structure or by full verification More generally applicable, yet computationally very expensive. • Deadlock detection & recovery Detect deadlocks and break them in a ‘coordinated’ way. Less computationally expensive (as lower frequent), yet usually ‘messy’. • Ignorance & random kill Kill or restart unresponsive processes, power-cycle the computer, ... More of a panic reaction than a method. © 2019 Uwe R. Zimmer, The Australian National University page 500 of 757 (chapter 7: “Safety & Liveness” up to page 512) The best approach if applicable. Definitions of atomicity: An operation is atomic if the processes performing it ... • (by communication) ... do not communicate with other processes while the atomic operation is performed. • (by means of states) ... cannot detect any outside state change and do not reveal their own state changes until the atomic operation is complete. Short: Safety & Liveness Atomic & idempotent operations Atomic operations • (by ‘awareness’) ... are not aware of the existence of any other active process, and no other active process is aware of the activity of the processes during the time the processes are performing the atomic operation. An atomic operation can be considered to be indivisible and instantaneous. © 2019 Uwe R. Zimmer, The Australian National University page 501 of 757 (chapter 7: “Safety & Liveness” up to page 512) Indivisible phases Commitment times Atomic Operations Flow of tasks Safety & Liveness Atomic & idempotent operations Atomic operations 0 5 10 15 20 25 30 35 40 45 time © 2019 Uwe R. Zimmer, The Australian National University page 502 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Atomic & idempotent operations Atomic operations Important implications: 1. An atomic operation is either performed in full or not at all. 2. A failed atomic operation cannot have any impact on its surroundings (must keep or re-instantiate the full initial state). 3. If any part of an atomic operation fails, then the whole atomic operation is declared failed. 4. All parts of an atomic operations (including already completed parts) must be prepared to declare failure until the final global commitment. © 2019 Uwe R. Zimmer, The Australian National University page 503 of 757 (chapter 7: “Safety & Liveness” up to page 512) Definition of idempotent operations: An operation is idempotent if the observable effect of the oper- ation are identical for the cases of executing the operation: • once, • multipletimes, • infinitelyoften. Observations: • Idempotent operations are often atomic, but do not need to be. • Atomic operations do not need to be idempotent. • Idempotent operations can ease the requirements for synchronization. Safety & Liveness Atomic & idempotent operations Idempotent operations © 2019 Uwe R. Zimmer, The Australian National University page 504 of 757 (chapter 7: “Safety & Liveness” up to page 512) Reliability ::= measure of success with which a system conforms to its specification. Failure Error Fault ::= a deviation of a system from its specification. ::= the system state which leads to a failure. ::= the reason for an error. ::= low failure rate. Safety & Liveness Reliability, failure & tolerance ‘Terminology of failure’ or ‘Failing terminology’? © 2019 Uwe R. Zimmer, The Australian National University page 505 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Reliability, failure & tolerance Faults during different phases of design • Inconsistent or inadequate specifications frequent source for disastrous faults • Software design errors • Component & communication system failures © 2019 Uwe R. Zimmer, The Australian National University page 506 of 757 (chapter 7: “Safety & Liveness” up to page 512) frequent source for disastrous faults rare and mostly predictable Safety & Liveness Reliability, failure & tolerance Faults in the logic domain • Non-termination / -completion Systems ‘frozen’ in a deadlock state, blocked for missing input, or in an infinite loop Watchdog timers required to handle the failure • Range violations and other inconsistent states Run-time environment level exception handling required to handle the failure • Value violations and other wrong results User-level exception handling required to handle the failure © 2019 Uwe R. Zimmer, The Australian National University page 507 of 757 (chapter 7: “Safety & Liveness” up to page 512) • Transient faults • Intermittent faults • Permanent faults Single ‘glitches’, interference, ... very hard to handle Faults of a certain regularity ... require careful analysis Faults which stay ... the easiest to find © 2019 Uwe R. Zimmer, The Australian National University page 508 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Reliability, failure & tolerance Faults in the time domain fail never fail uncontrolled © 2019 Uwe R. Zimmer, The Australian National University page 509 of 757 (chapter 7: “Safety & Liveness” up to page 512) too early too late never (omission) Constraint Value Time domain Value domain fail silent fail stop fail controlled Safety & Liveness Reliability, failure & tolerance Observable failure modes Failure modes error error Safety & Liveness Reliability, failure & tolerance Fault prevention, avoidance, removal, ... and / or Fault tolerance © 2019 Uwe R. Zimmer, The Australian National University page 510 of 757 (chapter 7: “Safety & Liveness” up to page 512) • Full fault tolerance Safety & Liveness Reliability, failure & tolerance Fault tolerance the system continues to operate in the presence of ‘foreseeable’ error conditions , without any significant loss of functionality or performance — even though this might reduce the achievable total operation time. • Graceful degradation (fail soft) the system continues to operate in the presence of ‘foreseeable’ error conditions, • Fail safe Full fault tolerance is not maintainable for an infinite operation time! while accepting a partial loss of functionality or performance. the system halts and maintains its integrity. Graceful degradation might have multiple levels of reduced functionality. © 2019 Uwe R. Zimmer, The Australian National University page 511 of 757 (chapter 7: “Safety & Liveness” up to page 512) • Liveness • Fairness • Safety • Deadlockdetection • Deadlockavoidance • Deadlockprevention • Atomic & Idempotent operations • Definitions & implications • Failure modes • Definitions, fault sources and basic fault tolerance © 2019 Uwe R. Zimmer, The Australian National University page 512 of 757 (chapter 7: “Safety & Liveness” up to page 512) Safety & Liveness Summary Safety & Liveness