Advanced Software Engineering Week 1 – Introduction
Lecturer: Dr. Hana Chockler
(c) 2007 Mauro Pezzè & Michal Young Ch 1, slide 1
Course assessment
• Exam counts towards 100% of your mark
– 3 out of 3 questions
– 2 hours
– questions are either bookwork, taken from the tutorials, or like the questions in the tutorials
– Previous years’ exams are available
– The structure of the exam will be similar – expect some changes, since the module is MSc-level only this year
– The questions will reflect the material studied this year
• Problems solved through the course and during the tutorials – do not count towards your mark
• but can appear in the exam
• also help you to understand the course material better, so please invest time and effort in solving them
Lectures and Tutorials
• Two-hour lecture
• Tutorial question sheets appear on KEATS after the lecture, in the week they are given
• Tutorial solutions appear after the small-group tutorial sessions, alongside the questions
• Small group tutorials
– led by the TAs
– are interactive discussions of problems following the module curriculum
– important for succeeding in the exam
– cover the material studied in the previous week
– try to solve the problems yourself before the tutorial
– no tutorial in the first week of term!
Course material
• Software Testing and Analysis, by Mauro Pezze and Michal Young
• Introduction to Software Testing, by Paul Ammann and Jeff Offutt
• KEATS – lecture slides and tutorial questions
• References in the slides
Lecture capture
• We provide lecture capture for most modules.
• It is important to use lecture capture wisely:
• Lecture recordings are a study and revision aid.
• Watching lectures online is NOT a replacement for attending lectures.
• Statistically, there is a clear and direct link between attendance and attainment: Students who do not attend lectures do less well in exams.
• Attending a lecture is more than watching it online – if you do not attend, you miss out!
Expectations of inclusive behaviour
• The Department of Informatics is committed to providing an inclusive learning and working environment.
• Staff and students are expected to behave respectfully to one another – during lectures, outside of lectures and when communicating online or through email.
• We won’t tolerate inappropriate or demeaning comments related to gender, gender identity and expression, sexual orientation, disability, physical appearance, race, religion, age, or any other personal characteristic.
• If you witness or experience any behaviour you are concerned about, please speak to someone about it. This could be one of your lecturers, your personal tutor, a programme administrator, the Informatics equality & diversity lead, or any other member of staff you feel comfortable talking to.
• The College also has a range of different support and reporting procedures that you might find helpful: kcl.ac.uk/harassment
Week numbers
– Week numbers are in logical order
– There is a reading week in the middle of the term – no lecture that week
Check the KEATS page regularly for any important announcements.
Module topics
(indicative – please check KEATS regularly)
1. This introduction
2. A framework for test and analysis; Basic principles
3. Test and analysis activities within a software process
4. Finite models; FSM; finite state verification
5. CFG; Functional testing; combinatorial testing
6. Model-based testing; specification-based testing
7. Condition-decision testing
8. Fault-based testing; symbolic simulation
9. Program analysis; planning and monitoring
10. Test execution
11. Revision lecture
Following last year’s feedback and exam results:
• Introduction to formal methods is removed from the curriculum
– There is a module 6CCS3VER on formal verification
• More time is dedicated to the explanation of the finite state machines and control flow graphs
• The tutorials are in small groups
• Added material about current testing practices
Jobs
The course has real practical relevance.
Several students in the past have walked straight into jobs in testing, based on their expertise from this course.
Much of the research underlying this course was done here at King’s.
Motivation – Famous errors Therac-25
• medical radiation therapy machine
• 6 massive overdoses, several fatal (1985–87, USA & Canada)
• cause: errors in the control program due to concurrency; no hardware safety backup
Analysis [Leveson 1995]:
• excessive trust in software when designing system
• reliability ≠ safety
• lack of hardware interlocks
• lack of appropriate software engineering practices (defensive design, specification, documentation, simplicity, formal analysis, testing)
• correcting one error does not necessarily make system safer !
https://en.wikipedia.org/wiki/Therac-25
Motivation – Famous errors Ariane 5 space rocket
Self-destructed due to malfunction 40 seconds after launch (1996)
• Cause: a 64-bit float → 16-bit int conversion raised an uncaught exception in its Ada program
• Cost: $500M (rocket), $7 billion (project)
Analysis
• main cause: inappropriate software reuse
• code taken over from Ariane 4 without judicious analysis
– higher than expected velocity was deemed impossible, but Ariane 5 was faster than Ariane 4!
– no analysis of overflow for unprotected variables
– ⇒ necessity of specifying and observing an interface
• bad design of system fault tolerance: the inertial reference system and the backup system were affected by the same error
http://www.around.com/ariane.html
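The failure mode can be sketched in a few lines. Python is used here purely for illustration (the actual flight code was Ada), and the numeric values are hypothetical: the point is that a value within Ariane 4's flight envelope fits in 16 bits, while Ariane 5's higher velocity does not.

```python
def to_int16(value: float) -> int:
    """Convert a float to a signed 16-bit integer, raising instead of
    silently wrapping when the value does not fit."""
    v = int(value)
    if not -32768 <= v <= 32767:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return v

# Hypothetical horizontal-velocity bias values, for illustration only:
print(to_int16(20000.0))   # within the Ariane 4 envelope: converts fine
try:
    to_int16(40000.0)      # Ariane 5 flies faster: conversion overflows
except OverflowError as e:
    print("unhandled on Ariane 5:", e)
```

On Ariane 5 the analogous exception was unhandled, which shut down the inertial reference system.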
Motivation – Famous errors Mars space probes
Mars Pathfinder, 1997
• Problem: on Mars, the space probe was resetting frequently
– total system resets, each resulting in loss of data!
• Cause: priority inversion between processes sharing a common resource
1. Process A (low priority) acquires resource R
2. A is interrupted by C (high priority)
3. C waits for R to be freed; switch back to A
4. A is interrupted by B (medium priority, A < B < C)
⇒ C waits for the lower-priority B, without directly depending on it!
Solution:
Motivation – Famous errors Mars space probes
priority inheritance: raising the priority of a process (A) that holds a resource to the level of the highest-priority process (C) that can request the resource
• Issue and solution were well known in literature !
[Sha, Rajkumar, Lehoczky. Priority Inheritance Protocols, 1990]
http://research.microsoft.com/en-us/um/people/mbj/mars_pathfinder/mars_pathfinder.html
Motivation – Famous errors Mars space probes
Mars Climate Orbiter, 1998
• disintegrated upon entry into the Mars atmosphere
• technical error: mismatch between imperial and metric units (pound-seconds instead of newton-seconds) in the computation of impulse
1 pound-sec ≈ 4.45 newton-sec
• multiple process errors: lack of formal interfaces between modules
• Cost: $327.6M (orbiter and lander) + $193.1M (spacecraft development) + $91.7M (launch) + $42.8M (mission operations)
https://en.wikipedia.org/wiki/Mars_Climate_Orbiter
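The defect is easy to reproduce: a module boundary that passes a bare number with no unit attached. A minimal sketch of the missing check follows; the function name and readings are hypothetical, and only the conversion factor is a real physical constant.

```python
LBF_S_TO_N_S = 4.44822  # one pound(-force)-second in newton-seconds

def impulse_in_si(value: float, unit: str) -> float:
    """Normalise a thruster impulse to newton-seconds at the module
    boundary -- the explicit unit check the MCO interface lacked."""
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_S_TO_N_S
    raise ValueError(f"unknown unit: {unit}")

raw = 10.0                           # hypothetical reading, in lbf*s
wrong = raw                          # read as N*s: off by a factor ~4.45
right = impulse_in_si(raw, "lbf*s")  # correctly converted
print(wrong, right)
```

Ground software emitted pound-seconds while the trajectory code assumed newton-seconds; a formal interface carrying units would have caught the factor-of-4.45 error.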
Motivation – Famous errors Mars space probes
Mars Polar Lander, 1999
• landing gear deployed prematurely upon entry into the atmosphere
• the resulting shock was interpreted as landing, and the engines were shut down
• error: lack of integration testing
• happened two and a half months after the loss of the Mars Climate Orbiter.
https://en.wikipedia.org/wiki/Mars_Polar_Lander
Techniques from the motivational slides
• Software engineering practices: defensive design, specification, documentation, simplicity, formal analysis, testing
• Testing for errors due to re-used code
• Attention to concurrency-specific issues (like priorities)
• Definition of formal interfaces between components
• System testing
• Integration testing
Software Test and Analysis in a Nutshell
Learning objectives
• View the “big picture” of software quality in the context of a software development project and organization:
• Introduce the range of software verification and validation activities
• Provide a rationale for selecting and combining them within a software development process.
Engineering processes
• Sophisticated tools
– amplify capabilities
– but do not remove human error
• Engineering disciplines pair
– construction activities with
– activities that check intermediate and final products
• Software engineering is no exception: construction of high quality software requires
– construction and
– verification activities
Verification and design activities
• Verification and design activities take various forms
– suited to highly repetitive construction of non-critical items for mass markets
– highly customized or highly critical products.
• Appropriate verification activities depend on
– engineering discipline
– construction process
– final product
– quality requirements.
Peculiarities of software
Software has some characteristics that make V&V particularly difficult:
– Many different quality requirements
– Evolving (and deteriorating) structure
– Inherent non-linearity
– Uneven distribution of faults
Example
If an elevator can safely carry a load of 1000 kg, it can also safely carry any smaller load;
If a procedure correctly sorts a set of 256 elements, it may fail on a set of 255 or 53 or 12 elements,
as well as on 257 or 1023.
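The sorting example suggests a concrete testing practice: because correctness is not monotone in input size, a procedure must be exercised on many sizes, not just one. A sketch of such a harness follows (the sort under test and the size list are illustrative, checked against Python's built-in `sorted` as oracle):

```python
import random

def insertion_sort(xs):
    """A straightforward insertion sort used as the procedure under test."""
    out = list(xs)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out

# Correctness is not monotone in input size, so exercise many sizes --
# including 0, 1, and sizes around powers of two -- against an oracle:
random.seed(0)
for n in [0, 1, 2, 12, 53, 255, 256, 257, 1023]:
    data = [random.randint(-1000, 1000) for _ in range(n)]
    assert insertion_sort(data) == sorted(data)
print("all sizes pass")
```

Passing at size 256 says nothing about size 255 or 257, so the boundary sizes are tested explicitly.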
Discussion:
• Why can a procedure work fine on 256 elements, but fail on 237 or on 270 elements?
• Where in testing do we assume monotonicity? Why? Is it justified? Examples.
Impact of new technologies
• Advanced development technologies
– can reduce the frequency of some classes of errors
– but do not eliminate errors
• New development approaches can introduce new kinds of faults
examples
– deadlock or race conditions for distributed software
– new problems due to the use of polymorphism, dynamic binding and private state in object-oriented software.
Variety of approaches
• There are no fixed recipes
• Test designers must
– choose and schedule the right blend of techniques
• to reach the required level of quality
• within cost constraints
– design a specific solution that suits
• the problem
• the requirements
• the development environment
Five Basic Questions
1. When do verification and validation (V&V) start?
When are they complete?
2. What particular techniques should be applied during development?
3. How can we assess the readiness of a product?
4. How can we control the quality of successive releases?
5. How can the development process itself be improved?
Intermission:
talk from Google test engineering director
• https://www.youtube.com/watch?v=KXGnXq5uXR4
– start at 1:24 until 4:40; then 7:20 to 10:00
– ”dog fooding” = company staff use the product before it is released to customers
Intermission:
talk from Google test engineering director
• https://www.youtube.com/watch?v=KXGnXq5uXR4
– start at 1:24 until 4:20; then 7:20 to 10:00
• “I have never seen high-quality software”
• “Quality cannot be tested in”
• The cost of a bug goes up if it is discovered late rather than early
• What are the most difficult bugs?
1: When do V&V start? When are they complete?
• Test is not a (late) phase of software development
– Execution of tests is a small part of the verification and validation process
• V&V start as soon as we decide to build a software product, or even before
• V&V last far beyond the product delivery
as long as the software is in use, to cope with evolution and adaptations to new conditions
Early start: from feasibility study
• The feasibility study of a new project must take into account the required qualities and their impact on the overall cost
• At this stage, quality related activities include
– risk analysis
– measures needed to assess and control quality at each stage of development.
– assessment of the impact of new features and new quality requirements
– contribution of quality control activities to development cost and schedule.
Long lasting: beyond maintenance
• Maintenance activities include
– analysis of changes and extensions
– generation of new test suites for the added functionalities
– re-executions of tests to check for non regression of software functionalities after changes and extensions
– fault tracking and analysis
2: What particular techniques should be applied during development?
No single analysis and test (A&T) technique can serve all purposes
The primary reasons for combining techniques are:
– Effectiveness for different classes of faults
example: analysis instead of testing for race conditions
– Applicability at different points in a project
example: inspection for early requirements validation
– Differences in purpose
example: statistical testing to measure reliability
– Tradeoffs in cost and assurance
example: expensive technique for key properties
Static Analysis Techniques (no execution of the code)
• analysis includes
– manual inspection techniques
– automated analysis
• can be applied at any development stage
• well suited for the early stages of specification and design
Inspection
• can be applied to essentially any document
– requirements statements
– architectural and detailed design documents
– test plans and test cases
– program source code
• may also have secondary benefits
– spreading good practices
– instilling shared standards of quality.
• takes a considerable amount of time
• re-inspecting a changed component can be expensive
• used primarily
– where other techniques are inapplicable
– where other techniques do not provide sufficient coverage
Automatic Static Analysis
• More limited in applicability
– can be applied to some formal representations of requirements models
– not to natural language documents
• Are selected when available
– substituting machine cycles for human effort makes them particularly cost-effective
• Tend to have many false alarms
Testing
• Start as early as possible
• Early test generation has several advantages
– Tests generated independently from code, when the specifications are fresh in the mind of analysts
– The generation of test cases may highlight inconsistencies and incompleteness of the corresponding specifications
– tests may be used as a compendium of the specifications by the programmers
Staging A&T techniques
[Figure: A&T activities staged across the development process. Phases: Requirements Elicitation, Requirements Specification, Architectural Design, Detailed Design, Unit Coding, Integration & Delivery, Maintenance. Activities: identify qualities; plan acceptance test; plan system test; plan unit & integration test; monitor the A&T process; validate specifications; analyze and inspect architectural design; inspect detailed design; generate system, integration, unit, structural and regression tests; design scaffolding; design oracles; code inspection; execute unit test and analyze coverage; execute integration, system, acceptance and regression tests; update regression test; collect data on faults; analyze faults and improve the process.]
Process improvement
[Figure: the quality process as a cycle – planning & monitoring; verification of specs; generation of tests; test case execution and software validation; process improvement.]
3: How can we assess the readiness of a product?
• A&T during development aim at revealing faults
• We cannot reveal or remove all faults
• A&T cannot last indefinitely: we want to know if products meet the quality requirements
• We must specify the required level of dependability
– and determine when that level has been attained.
Different measures of dependability
• Availability measures the quality of service in terms of running versus down time
• Mean time between failures (MTBF) measures the quality of the service in terms of time between failures
• Reliability indicates the fraction of all attempted operations that complete successfully (= probability of failure-free operation)
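Given an uptime/downtime log, the first two measures are straightforward to compute. A sketch with hypothetical data (each tuple is one run/repair cycle of a service):

```python
# Hypothetical service log: (uptime_hours, downtime_hours) per
# run/repair cycle.
cycles = [(200.0, 2.0), (150.0, 1.0), (400.0, 4.0), (250.0, 3.0)]

up = sum(u for u, _ in cycles)
down = sum(d for _, d in cycles)

availability = up / (up + down)   # fraction of total time in service
mtbf = up / len(cycles)           # mean running time between failures

print(f"availability = {availability:.4f}, MTBF = {mtbf:.1f} h")
```

The two numbers answer different questions: availability ignores how often failures happen as long as repairs are quick, while MTBF ignores how long each repair takes.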
Example of different dependability measures
Web application:
• 50 interactions terminating with a credit card charge.
• The software always operates flawlessly up to the point that a credit card is to be charged, but on half the attempts it charges the wrong amount.
What is the reliability of the system?
Example of different dependability measures
Web application:
• 50 interactions terminating with a credit card charge.
• The software always operates flawlessly up to the point that a credit card is to be charged, but on half the attempts it charges the wrong amount.
What is the reliability of the system?
• If we count the fraction of individual interactions carried out correctly, only one operation in 100 fails: the system is 99% reliable.
• If we count entire sessions, the system is only 50% reliable, since half the sessions result in an improper credit card charge.
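The two figures follow from the same counts, sliced differently:

```python
sessions = 100                      # e.g. 100 complete user sessions
interactions_per_session = 50       # the last interaction is the charge
failed_sessions = sessions // 2     # half the sessions charge wrongly

# One failing interaction per failing session:
per_interaction = 1 - failed_sessions / (sessions * interactions_per_session)
per_session = 1 - failed_sessions / sessions

print(per_interaction)  # 0.99 -> "99% reliable"
print(per_session)      # 0.5  -> "50% reliable"
```

The lesson is that a reliability figure is meaningless without stating the unit of operation being counted.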
Dependability Qualities
• Correctness:
– A program is correct if it is consistent with its specification
• seldom practical to check fully for non-trivial systems
• cannot be “30% correct” – either correct or incorrect
• meaningfulness of correctness depends fully on the quality of the specification (see the lift example)
• Reliability:
– probability of failure-free software operation for a specified period of time in a specified environment
• relative to a specification and usage profile
• statistical approximation to correctness (100% reliable = correct)
• Safety:
– preventing hazards
– considered separately from correctness
• Robustness:
– acceptable (degraded) behavior under extreme conditions
Is a system that is correct and free of hazards always useful?
Is a system that is correct and free of hazards always useful?
• No!
• It is possible for a system to be fully correct and free of hazards, yet completely useless
– for example, a system that does nothing
– a less extreme example: a system that is too slow, has a terrible user interface, lacks documentation, or is missing critical features
Example of Dependability Qualities
• Correctness, reliability: let traffic pass according to correct pattern and central scheduling
• Robustness, safety: Provide degraded function when possible; never signal conflicting greens.
• Blinking red / blinking yellow is better than no lights; no lights is better than conflicting greens
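The safety ordering above can be expressed as an invariant that a controller must satisfy in both normal and degraded modes. A sketch follows, with a hypothetical phase table for two crossing directions:

```python
# Hypothetical phase table for two crossing directions (NS, EW).
PHASES = [
    ("green", "red"),
    ("yellow", "red"),
    ("red", "green"),
    ("red", "yellow"),
]
FALLBACK = ("blinking-red", "blinking-red")  # degraded but safe

def next_phase(step: int, healthy: bool):
    """Lights for a given step; fall back to a degraded mode on fault."""
    return PHASES[step % len(PHASES)] if healthy else FALLBACK

def safe(ns: str, ew: str) -> bool:
    """The safety property: never conflicting greens."""
    return not (ns == "green" and ew == "green")

# The invariant holds in normal and in degraded operation:
for step in range(8):
    assert safe(*next_phase(step, healthy=True))
assert safe(*next_phase(0, healthy=False))
print("no conflicting greens")
```

Note that the fallback mode sacrifices correctness (no scheduled traffic pattern) while preserving safety, which is exactly the distinction the slide draws.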
Relation among Dependability Qualities
[Diagram: overlapping regions for Reliable, Correct, Safe, Robust]
• reliable but not correct: failures occur rarely
• correct but not safe or robust: the specification is inadequate
• safe but not correct: doesn’t always work correctly
• robust but not safe: catastrophic failures can occur
Assessing dependability
• Randomly generated tests following an operational profile
• Alpha test: tests performed by users in a controlled environment, observed by the development organization
• Beta test: tests performed by real users in their own environment, performing actual tasks without interference or close monitoring
4: How can we control the quality of successive releases?
• Software test and analysis does not stop at the first release.
• Software products operate for many years, and undergo many changes:
– They adapt to environment changes
– They evolve to serve new and changing user requirements.
• Quality tasks after delivery
– test and analysis of new and modified code
– re-execution of system tests
– extensive record-keeping
5: How can the development process itself be improved?
• The same defects are encountered in project after project
• We need to improve the process by
– identifying and removing weaknesses in the development process
– identifying and removing weaknesses in test and analysis that allow them to remain undetected
A four step process to improve fault analysis and process
1. Define the data to be collected and implement procedures for collecting them
2. Analyze collected data to identify important fault classes
3. Analyze selected fault classes to identify weaknesses in development and quality measures
4. Adjust the quality and development process
An example of process improvement
1. Faults that affect security were given highest priority
2. During A&T we identified several buffer overflow problems that may affect security
3. Faults were due to bad programming practice and were revealed late due to lack of analysis
4. Action plan: Modify programming discipline and environment and add specific entries to inspection checklists
Summary
• The quality process has three different goals:
– Improving a software product
– assessing the quality of the software product
– improving the quality process
• We need to combine several A&T techniques through the software process
• A&T depend on organization and application domain.
• Cost-effectiveness depends on the extent to which techniques can be re-applied as the product evolves.
• Planning and monitoring are essential to evaluate and refine the quality process.
Home reading
• Chapter 1 and parts of Chapter 4 of the book Software Testing and Analysis, by Mauro Pezze and Michal Young
– Software test and analysis in a nutshell