SWEN90010 – High Integrity Systems Engineering
Fault Tolerant Design
Toby MD 8.17 (Level 8, Doug McDonell Bldg)
http://people.eng.unimelb.edu.au/tobym @tobycmurray

INTRODUCTION TO FAULT TOLERANCE

Copyright University of Melbourne 2016, provided under Creative Commons Attribution License

Fault Tolerance
A system is fault tolerant if it can continue to function according to specification in the presence of a finite number of faults.
“… finite number of faults …” means we, as engineers, choose the degree of fault tolerance.
(How? HAZOP etc.)
“… according to spec in the presence of … faults …” means the system has to be able to detect occurrences of faults and work around them

Definitions (cf. SWEN90006)
Failure: System behaviour deviates from specification.
Fault: Cause of a failure (incorrect step, process, data definition etc.).
Error: Manifestation (occurrence) of a fault.
Reliability: Probability that the system operates without failure over a specific time interval.

Two Kinds of Faults
Hardware Fault: A physical defect that can cause the system or component to produce an error.
Software Fault: A defect in the source of the software that can cause the system or component to produce an error.

Hardware vs Software Faults
int quotient(int a, int b) {
    return a / b;  /* software fault: no check for b == 0 */
}
Hardware and software fail in different ways
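One way to stop the fault in quotient from turning into a failure is to detect the problematic inputs before dividing. The following is a minimal sketch in C, not part of the original slides: the name checked_quotient and its boolean return convention are assumptions made for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from the slides): divides a by b only when the
 * result is well defined. Returns true and stores the quotient in *result
 * on success; returns false and leaves *result untouched otherwise. */
bool checked_quotient(int a, int b, int *result)
{
    if (b == 0) {
        return false;            /* division by zero */
    }
    if (a == INT_MIN && b == -1) {
        return false;            /* quotient would not fit in an int */
    }
    *result = a / b;
    return true;
}
```

Detection is only half of the job: the caller still has to decide how to work around a false return, for example by falling back to a safe default value.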

Failure Curve for Hardware
[Figure: failure curve for hardware]
Does this make sense for software failures? No.

Hardware and Software Failures
Hardware tends to fail randomly, e.g. through degradation of components or changes in environmental conditions.
Software tends to fail systematically: with the same state and the same inputs, it fails every time or succeeds every time (although beware concurrency and other sources of nondeterminism).

Causes of Failures
Faults in the Specification: Need good spec validation techniques.
Faults in System Components (s/w or h/w): Need good engineering techniques (focus of most of this subject).
Faults due to Environment Effects: e.g. radiation for space vehicles, temperature, G-forces, etc. (Ideally identified during HAZOP etc.)

3 Ways to Class Failures
Temporal Behaviour: permanent, intermittent or transient
Output Behaviour: non-malicious or Byzantine
Independence and Correlation: independent or correlated

Temporal Behaviour
[Figure: time axis showing each kind of failure]
Permanent Failure
Intermittent Failure
Transient Failure

Output Behaviour
Non-Malicious: output able to be interpreted consistently by all components receiving it

Byzantine: output not able to be interpreted consistently by all components receiving it

Independence vs Correlated
Two identical servers. Failure = server crash. Independent or Correlated?
cf. Reliability Block Diagrams (SWEN90006)
RBDs assume that server failure is not correlated.
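To see why the independence assumption matters, consider the usual reliability block diagram calculation for two components in parallel: if failures are independent, the pair fails only when both servers fail. The sketch below is an illustration only; the function name parallel_reliability and the example figures are assumptions, not taken from the slides.

```c
/* Hypothetical helper (not from the slides): reliability of two servers in
 * parallel, ASSUMING their failures are independent.
 * r1 and r2 are each server's probability of surviving the interval. */
double parallel_reliability(double r1, double r2)
{
    /* The pair fails only if both servers fail. */
    return 1.0 - (1.0 - r1) * (1.0 - r2);
}
```

With r1 = r2 = 0.9 this gives approximately 0.99. If the two identical servers share a cause of failure, such as the same software fault, their crashes are correlated and the real figure can be as low as 0.9, the reliability of a single server.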

REDUNDANCY

Redundancy
Essential if a system is to continue to function with loss of components
Hardware Redundancy: Multiple processors, duplicate hardware components and computations
Software Redundancy: Multiple implementations of the software
Information Redundancy: Error checking and error correcting codes
Time Redundancy: Retrying tasks, transaction rollback etc.
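As an illustration of information redundancy, a single parity bit is about the simplest error-detecting code: it detects any odd number of flipped bits in a word but cannot correct them. The following is a minimal sketch; the helper names parity_bit and parity_ok are assumptions and do not appear in the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the byte has an odd number of 1 bits,
 * otherwise 0. Storing this bit with the data gives even overall parity. */
uint8_t parity_bit(uint8_t data)
{
    uint8_t p = 0;
    while (data != 0) {
        p ^= (uint8_t)(data & 1u);
        data >>= 1;
    }
    return p;
}

/* Hypothetical helper: true if the received data still matches the parity
 * bit that was stored or sent alongside it. */
bool parity_ok(uint8_t data, uint8_t stored_parity)
{
    return parity_bit(data) == stored_parity;
}
```

Flipping one bit of a protected byte makes parity_ok return false. Flipping two bits goes undetected, which is why stronger error-correcting codes are used where multiple-bit errors are plausible.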

Example: Airbus A330/340 (https://de.slideshare.net/sommerville-videos/airbus-fcs)

Hardware Redundancy
To detect and tolerate specific errors in a system
Example: Static Pair
Monitor and Interface check each other
Assumption: P1 and P2 fail independently

Detecting Errors
Monitor receives 32.0 from P1 and 32.1 from P2: OK
Monitor receives 32.0 from P1 and 4.3 from P2: ERROR
But which of P1 or P2 is at fault?
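The monitor's check can be sketched as a comparison with a tolerance. This is an illustration only: the names pair_status and compare_pair are assumptions, and so is the tolerance parameter, since the slide does not say how close two readings must be to count as agreeing.

```c
/* Hypothetical result type for a static-pair monitor (not from the slides). */
typedef enum { PAIR_OK, PAIR_ERROR } pair_status;

/* Compares the outputs of P1 and P2. tolerance is an assumed parameter:
 * readings that differ by no more than this are treated as agreeing. */
pair_status compare_pair(double p1, double p2, double tolerance)
{
    double diff = p1 - p2;
    if (diff < 0.0) {
        diff = -diff;            /* absolute difference */
    }
    return (diff <= tolerance) ? PAIR_OK : PAIR_ERROR;
}
```

With an assumed tolerance of 0.5, the readings 32.0 and 32.1 give PAIR_OK and the readings 32.0 and 4.3 give PAIR_ERROR. The result says only that the pair disagrees; nothing in the comparison identifies which processor produced the bad value.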

Comparing more than two components allows determining which components might have failed.

Approximate Agreement
Two measures are sufficiently equal (SE) when they are within some small distance ε of each other.

P1 and P2 are sufficiently equal but P3 is very distant from both, so likely erroneous
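With three readings, pairwise comparison can point at the likely faulty component. The following is a minimal sketch under stated assumptions: the names sufficiently_equal and find_outlier are hypothetical, ε is chosen by the caller, and at most one of the three readings is taken to be wrong.

```c
#include <stdbool.h>

/* True when a and b are within eps of each other (the SE relation). */
bool sufficiently_equal(double a, double b, double eps)
{
    double diff = a - b;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= eps;
}

/* Hypothetical helper: returns the index (0, 1 or 2) of the one reading
 * that is distant from two mutually agreeing readings. Returns -1 when no
 * single outlier can be identified (two or three pairs agree), and -2 when
 * no two readings agree at all. */
int find_outlier(const double v[3], double eps)
{
    bool se01 = sufficiently_equal(v[0], v[1], eps);
    bool se02 = sufficiently_equal(v[0], v[2], eps);
    bool se12 = sufficiently_equal(v[1], v[2], eps);
    int agreeing_pairs = se01 + se02 + se12;

    if (agreeing_pairs == 0) {
        return -2;
    }
    if (agreeing_pairs >= 2) {
        return -1;
    }
    /* Exactly one pair agrees: the remaining reading is the outlier. */
    if (se01) {
        return 2;
    }
    if (se02) {
        return 1;
    }
    return 0;
}
```

For the readings 32.0, 32.1 and 4.3 with an assumed ε of 0.5, find_outlier returns 2, i.e. the third reading. Because SE is not transitive, two of the three pairs can agree while the third does not; the sketch reports -1 in that case rather than guess.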

Voting Algorithms