
SWEN90010 – High Integrity Systems Engineering
High Integrity Systems, Safety
Toby Murray, MD 8.17 (Level 8, Doug McDonell Bldg)
http://people.eng.unimelb.edu.au/tobym @tobycmurray


INTRODUCTION TO SAFETY ENGINEERING

What is “safety”?
What does it mean for software?
How do we get it? (safety engineering)
Copyright University of Melbourne 2016, provided under Creative Commons Attribution License

Doesn’t cause unacceptable harm to the environment or people

Doesn’t cause unacceptable harm to the environment or people
e.g. we accept some risk of planes crashing
How we define and quantify unacceptability is a big part of safety engineering.
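To make “quantifying unacceptability” concrete, here is a minimal, purely illustrative sketch in Python of the likelihood-times-severity scoring that safety standards commonly use. The categories, scores and threshold are assumptions made for this example, not something defined in these slides.

```python
# Illustrative sketch only: one common way to quantify whether a risk is
# "acceptable" is to score likelihood x severity against a threshold.
# The categories, scores and threshold below are invented for this example.

LIKELIHOOD = {"frequent": 5, "probable": 4, "occasional": 3, "remote": 2, "improbable": 1}
SEVERITY = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}

def risk_score(likelihood: str, severity: str) -> int:
    """Risk as likelihood x severity on a simple ordinal scale."""
    return LIKELIHOOD[likelihood] * SEVERITY[severity]

def acceptable(likelihood: str, severity: str, threshold: int = 6) -> bool:
    """A risk is acceptable only if its score falls below the chosen threshold."""
    return risk_score(likelihood, severity) < threshold

# e.g. a remote chance of a catastrophic outcome (2 x 4 = 8) is judged
# unacceptable, while an improbable, marginal one (1 x 2 = 2) is accepted.
print(acceptable("remote", "catastrophic"))   # False
print(acceptable("improbable", "marginal"))   # True
```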

Can software harm anyone on its own?
Only when in the right context
Only as part of a larger system
Software engineering is always about the larger system, but even more so for safety.

CASE STUDY: THERAC-25

mid-1980s Radiation Therapy Machine

Therac-25 Harm
June 1985 – Jan 1987: 6 radiation overdoses, multiple deaths.
March 21, 1986 (3rd incident): “Inside the treatment room Cox was hit with a powerful shock. He knew from previous treatments this was not supposed to happen. He tried to get up. Not seeing or hearing him because of the broken communications between the rooms, the technician pushed the “p” key, meaning “proceed.” Cox was hit again. The treatment finally stopped when Cox stumbled to the door of the room and beat it with his fists.”
He was sent home but returned to the hospital a few weeks later … diagnosed with radiation overexposure. It later paralysed his left arm, both legs, his left vocal cord, and his diaphragm. He died nearly five months later.

Therac-25 Causes
Software errors
“Therac-25 typically issued up to four error messages a day”
Poor usability
“At the computer console she typed in the prescription data for an electron beam of 180 rads, then noticed she’d made an error by typing in command x (for x-ray treatments) instead of e (for electron). She ran the cursor up the screen to change the command x to e”

Software Overconfidence
“On Therac-6 and 20, hardware lockout mechanisms do not allow the operator to do something dangerous”
“When it came time to make Therac-25, AECL decided to keep only the computer control. They removed not only the manual controls, but also the hardware lockout mechanisms.”
Initially the manufacturer refused to believe the machine was faulty, until more incidents occurred.

CASE STUDY: LONDON AMBULANCE DISPATCH SYSTEM

26 October 1992
“Just a few hours [into its deployment] AVLS was unable to keep track of the ambulances and their statuses in the system. It began sending multiple units to some locations and no units to other locations. … The system began to generate such a great quantity of exception messages … that calls got lost. The problem was compounded when people called back additional times because the ambulances they were expecting did not arrive. … The next day, the LAS switched back to a part-manual system, and shut down the computer system completely when it quit working altogether eight days later”

“There were as many as 46 deaths that would have been avoided had the requested ambulance arrived on time.”
“One heart attack patient waited six hours for an ambulance before her son took her to the hospital.”
“Another woman called the LAS every 30 minutes for almost three hours before an ambulance arrived. It was too late, as her husband had already died.”
“One ambulance crew arrived only to find that the patient had not only died, but his body had been taken away by a mortician.”

Further systems were launched in 2005 and 2011; both were extremely unreliable initially.
Ambulance scheduling is not exceedingly difficult.
The problem is that these systems weren’t engineered as if they were safety critical.

CASE STUDY: AIRBUS A320

Airbus A320
A computer network in a plane.
~150 ECUs, so a big distributed system.
First civilian fly-by-wire aircraft.

26 June, 1988
Airbus A320 first passenger flight
Low-speed flyover at Habsheim Air Show, at 10 metres altitude
One woman and two children died.

What Happened
Official Report
Pilot flew too low, too slow. Failed to see forest. Cause: Pilot Error
Captain’s Report
Fly-by-wire system prevented plane from levelling at correct altitude and climbing.
Forest not shown on airport map given to pilots.
Pilots were expecting a different (longer) runway.
Cause: system failure, not (only) pilot error.

CASE STUDY: BOEING 737 MAX

29 October 2018: Lion Air Flight 610 crashes
10 March 2019: Ethiopian Airlines Flight 302 crashes
Both were Boeing 737 MAX aircraft
346 people killed
Precise causes still unknown
However, it is hypothesised that a major contributing factor was the MCAS system (a new system added to the 737 MAX series)


MCAS Function

Angle of Attack Sensors

Immediate Cause
MCAS repeatedly activated
Repeatedly pushed plane’s nose down
Pilots were unable to correct its effects
Plane eventually crashed into the ground
MCAS’s repeated activation is believed to have been caused by faulty AoA readings

Contributing Factors
Lack of redundancy: MCAS took input from only one AoA sensor (despite the 737 MAX having two AoA sensors)
Lack of human override: Pilots repeatedly tried to bring plane nose up by moving the control column. But MCAS was designed so that it couldn’t be overridden this way.
Lack of Documentation/Training: MCAS not described in manual or training — apparently deliberate choice to maximise similarity to prior 737s to minimise total cost of ownership
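As an illustration of the first two factors, here is a minimal, hypothetical sketch in Python of how cross-checking both AoA sensors and honouring a pilot override would remove the single point of failure and the lack of human override. The function name and thresholds are invented for illustration; this is in no way Boeing’s actual logic.

```python
# Hypothetical sketch only: NOT Boeing's design. Shows how using both AoA
# sensors and honouring a pilot override addresses the two flaws above.

AOA_DISAGREEMENT_LIMIT = 5.5   # degrees; illustrative threshold only
AOA_ACTIVATION_LIMIT = 15.0    # degrees; illustrative threshold only

def may_push_nose_down(aoa_left: float, aoa_right: float,
                       pilot_override: bool) -> bool:
    """Decide whether automatic nose-down trim is permitted."""
    # Human override: the pilots' command always wins.
    if pilot_override:
        return False
    # Redundancy: if the two sensors disagree, trust neither reading
    # rather than acting on a possibly faulty single sensor.
    if abs(aoa_left - aoa_right) > AOA_DISAGREEMENT_LIMIT:
        return False
    # Only act when both (agreeing) sensors indicate a dangerously high AoA.
    return min(aoa_left, aoa_right) > AOA_ACTIVATION_LIMIT

# e.g. a single faulty sensor reading 40 degrees while the other reads
# 5 degrees does not trigger a nose-down command.
print(may_push_nose_down(40.0, 5.0, pilot_override=False))  # False
```

The design point is simply that an automated control action should never depend on a single sensor and should always be overridable by the crew.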

Culture: Economics vs Safety
MCAS resulted from an economic incentive to make the 737 MAX as attractive to potential buyers as possible
(Bigger engines = more efficient plane; but necessitated MCAS to maximise similarity to prior 737s)
(Maximising similarity = minimal retraining; but meant pilots were unaware of MCAS)
(Using only 1 AoA sensor = no need to re-certify; but meant a single point of failure)

Consequences
346 deaths
Indefinite grounding of all 737 MAXs pending investigation and re-certification
Projected cost: 18.4 billion USD
CEO removed
2019: airlines cancelled 183 orders (at $100M per plane = 18.3 billion USD)
The cheapest way to build something is to build it right the first time

Culture and Ethics
Engineers have a responsibility to act ethically
In cultures that (implicitly) prioritise economics over safety or other concerns, this may require challenging that culture or refusing to follow management directives
Remember: the first Volkswagen employee sent to prison over its emissions cheating system was a software engineer

SAFETY ENGINEERING

Importance of The System
Safety cannot be talked about without considering the entire system.
e.g. a pilot might fail to notice a signal, and be blamed (“pilot error”), but blame is tricky to assign if the signal was placed in the cockpit at a location that was hard to see.
c.f. early SSL/TLS warning dialogs

The System
Software
Hardware
Operating Procedures
(this is where people, users, come in.)
The context in which the system sits.

In light of the above, a more precise definition:
Software and hardware, used under correct operating conditions, don’t cause unacceptable harm to people or the environment.

Safety Engineering Process

Safety Engineering
How do we engineer safe systems?
Safety engineers are experts in their domain: medical, rail signalling, aviation, etc.
Safety engineers are experts in past accidents, incidents and failures.
It is difficult to guard against what you can’t predict.
This is why air crashes are so thoroughly investigated.
