Course Code: SOF204
Course Name: Software Architecture and Design Patterns
Lecturer: Dr. Tee Sim Hui
Academic Session: 202009
Assessment Title: Assignment 1
Submission Due Date: 08 Dec 2020
Prepared by:
Student ID: DMT1909215
Student Name: ZHAI XINGXIANG
Date Received:
Feedback from Lecturer:
Mark:
Own Work Declaration
I/We hereby understand my/our work would be checked for plagiarism or other misconduct, and the softcopy would be saved for future comparison(s).
I/We hereby confirm that all the references or sources of citations have been correctly listed or presented and I/we clearly understand the serious consequence caused by any intentional or unintentional misconduct.
This work is not based on the work of any other student (past or present), and it has not been submitted to any other course or institution before.
Signature:
Date: 07 Dec 2020
Task(s)
Suppose that you are a software architect for an online merchant. Formulate a proposal that contains:
(a). Concrete scenarios for availability
Scenario 1:
Source of Stimulus: Customers
Stimulus: Large shopping festivals (11.11)
Artifact: All systems
Environment: Normal operation, but with very heavy traffic
Response: Web pages fail to load or respond slowly
Response Measure: The system is available again within 1 minute
Scenario 2:
Source of Stimulus: Customers
Stimulus: The product screening method is imperfect (the search engine is slow)
Artifact: System database
Environment: Normal operation
Response: The web page cannot return an optimal recommendation within the expected time
Response Measure: The system is available again within 1 minute
Scenario 3:
Source of Stimulus: The network architecture
Stimulus: Transaction payment problem
Artifact: Payment system
Environment: Normal operation
Response: The payment fails, or the payment succeeds but no order is generated
Response Measure: The information is updated, and the network architecture is upgraded
Scenario 4:
Source of Stimulus: Hackers
Stimulus: Malicious attack
Artifact: All systems
Environment: Normal operation
Response: Recover the system as soon as possible
Response Measure: The system is available
Scenario 5:
Source of Stimulus: Electricity supply
Stimulus: The system loses power
Artifact: All systems
Environment: Normal operation
Response: Recover the system as soon as possible
Response Measure: The system is available and messages are recovered
Scenario 6:
Source of Stimulus: Customers
Stimulus: Many people compete for the last item at the same time
Artifact: Database
Environment: Normal operation
Response: A data mismatch may cause system failure
Response Measure: The system is available and the item is allocated to one customer
(b). Tactics to enhance the availability of the software system of the online merchant.
Availability tactics are designed to keep faults from becoming failures, or to make repair possible. They can be classified into three categories: fault detection, fault recovery, and fault prevention (Bass, Clements, & Kazman, 1998).
Figure 1. Abstract (adapted from Alenezi et al., 2020, Figure 1)
Fault detection
The main idea of fault detection is to respond to availability issues in a predictable and defined way (Atchison, 2016). This means being alerted when problems occur so that people can take action. In addition, the development team leader should establish processes and procedures that the team can follow to diagnose issues and to fix common failure scenarios easily. The main fault detection tactics are:
Ping/echo: One component sends a ping and waits for an echo from the component being checked. A time threshold must be set that tells the pinging component how long to wait for the echo before considering the pinged component to have failed. In the online merchant's software system, the Internet server and the database are essential components, so priority is given to making sure that they are active and responsive.
Figure 2. Ping example
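The ping/echo tactic can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the merchant's actual implementation: the "ping" is approximated by opening a TCP connection, the 1-second threshold is arbitrary, and the names `tcp_echo`, `failed_components`, and the component addresses are hypothetical.

```python
import socket
from typing import Callable, Dict, List, Tuple


def tcp_echo(host: str, port: int, timeout_s: float) -> bool:
    """Assumed probe: the component 'echoes' if it accepts a TCP
    connection before the time threshold expires."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, name lookup failed, or timed out
        return False


def failed_components(
    components: Dict[str, Tuple[str, int]],
    timeout_s: float = 1.0,
    probe: Callable[[str, int, float], bool] = tcp_echo,
) -> List[str]:
    """Ping every component and return the names that did not echo in time."""
    return [
        name
        for name, (host, port) in components.items()
        if not probe(host, port, timeout_s)
    ]
```

The probe is passed in as a parameter so that the threshold logic can be exercised without a real network. For example, calling `failed_components({"database": ("db.internal", 5432)})` with the default probe would report the database as failed if it does not accept a connection within one second (the host name here is made up).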
Monitor: A system monitor tracks the state of health of various parts of the system: processors, processes, I/O, memory, and so on. It can detect failure or congestion in shared resources such as the network.
Heartbeat: A periodic message is exchanged between a system monitor and the process being monitored. If the monitor does not receive a heartbeat within a set interval, the component that should have sent it is assumed to have failed.
Time stamp: This tactic establishes a temporal order among a set of events by assigning the state of a local clock to each event immediately after it occurs. Checking the sequence of events then reveals whether they happened in an incorrect order.
Sanity checking: Based on knowledge of the internal design, sanity checking can be employed at interfaces to examine a specific information flow by checking the validity or reasonableness of specific operations or outputs of a component.
Exception detection: This tactic detects conditions that alter the normal flow of execution. It can be further refined into system exceptions and timeouts. System exceptions include faults such as division by zero and address faults. A timeout is implemented by placing timing constraints on the interactions between components; if the time limit is exceeded, a timeout exception is raised.
Self-test: Components can run self-test procedures to check themselves for correct operation. These procedures can be initiated by the component itself or invoked by a system monitor (Bass et al., 1998).
Intrusion detection system (IDS): An IDS is software that monitors system and network resources and activities. It notifies network security personnel when it detects a possible intrusion (Stair & Reynolds, 2018).
Figure 3. IDS
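Of the detection tactics listed above, the heartbeat is the easiest to show in code. The following is a rough sketch only: the class name `HeartbeatMonitor`, the component names, and the 3-second interval are assumptions made for illustration. Timestamps can be passed in explicitly so that the behavior is easy to check.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Records the last heartbeat of each component and reports the
    components whose heartbeat is overdue."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, component: str, now: Optional[float] = None) -> None:
        """Called whenever a heartbeat message arrives from a component."""
        self._last_seen[component] = time.monotonic() if now is None else now

    def failed(self, now: Optional[float] = None) -> List[str]:
        """Return the components that have been silent longer than timeout_s."""
        current = time.monotonic() if now is None else now
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if current - seen > self.timeout_s
        )
```

In this sketch, if the payment service keeps sending heartbeats while the search service stays silent for more than three seconds, only the search service is reported by `failed()`.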
Fault recovery
Fault recovery tactics are categorized into preparation-and-repair tactics and reintroduction tactics. Preparation-and-repair tactics are based on combinations of retrying a computation or introducing redundancy. Reintroduction is where a failed component is reintroduced after it has been corrected.
Preparation-and-repair tactics:
Active and passive redundancy: Active redundancy refers to a configuration in which all the nodes in a group receive and process identical inputs in parallel. The parallel processing allows the redundant spares to maintain a synchronous state with the active nodes. Because a redundant spare holds the same state as the active node, it can take over from a failed node within milliseconds. Passive redundancy refers to a configuration in which only the active members of the protection group process input traffic, and the redundant spares receive periodic state updates.
Retry and rollback: The retry tactic assumes that the failure is transient, such as an accidental connection failure, so repeating the operation may succeed. The rollback tactic reverts the system to a previous known good state.
Degradation: The system may drop some less critical functions and maintain the most critical ones. For example, when the online merchant system receives too many customers at once, it may degrade itself to keep the most critical functions running.
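The retry and degradation tactics above can be combined, as the following minimal sketch suggests. It assumes that a transient failure shows up as a `ConnectionError` and that a simpler fallback result (for example, a static bestseller list instead of personalised recommendations) is acceptable. The function name `with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, List


def with_retry(
    operation: Callable[[], List[str]],
    fallback: Callable[[], List[str]],
    attempts: int = 3,
    delay_s: float = 0.0,
) -> List[str]:
    """Retry an operation that may fail transiently; if every attempt
    fails, degrade to the less critical fallback result."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Assume the failure is transient and wait before the next try.
            if attempt < attempts - 1:
                time.sleep(delay_s)
    # All attempts failed: keep the page working with reduced functionality.
    return fallback()
```

The number of attempts is bounded on purpose, because unlimited retries during a traffic peak would add load to a component that is already struggling.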
Reintroduction:
The shadow tactic: This refers to operating a previously failed or in-service upgraded component in a “shadow mode”, during which its behavior can be monitored for correctness and it can repopulate its state.
State resynchronization: This refers to synchronizing the state of two or more machines after a repair.
Some other tactics, such as software update, ignoring faulty behavior, and reconfiguration, are also important. A basic recovery plan covers notification, protection of activity logs, activity log maintenance, incident containment, eradication, and incident follow-up. In an incident, the primary goal should be to regain control and limit damage, not to attempt to monitor or catch an intruder (Stair & Reynolds, 2018).
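The two reintroduction tactics can be sketched together in Python. This is a simplified illustration, not a production design: `CartService` is a hypothetical replica that keeps one running total per customer, `resynchronize` copies the state of a healthy replica to the repaired one, and `shadow_run` feeds the same requests to both while only the active replica's results would be served.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class CartService:
    """Hypothetical replica that keeps a cart total per customer."""
    totals: Dict[str, int] = field(default_factory=dict)

    def add(self, customer: str, amount: int) -> int:
        self.totals[customer] = self.totals.get(customer, 0) + amount
        return self.totals[customer]


def resynchronize(repaired: CartService, healthy: CartService) -> None:
    """State resynchronization: copy the healthy replica's state."""
    repaired.totals = dict(healthy.totals)


def shadow_run(
    active: CartService,
    shadow: CartService,
    requests: Iterable[Tuple[str, int]],
) -> bool:
    """Shadow mode: both replicas process every request, but only the
    active result is served. Returns True if the shadow always agreed."""
    agreed = True
    for customer, amount in requests:
        served = active.add(customer, amount)
        observed = shadow.add(customer, amount)
        if observed != served:
            agreed = False
    return agreed
```

A repaired replica that has not been resynchronized will disagree with the active one as soon as a request touches existing state, which is the signal that it is not yet ready to be promoted.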
Fault prevention
Fault prevention embodies the idea of “build with failure in mind” (Atchison, 2016). The tactics to prevent faults from happening are:
Increase competence set: A program’s competence set is the set of states in which it is competent to operate. Increasing a component’s competence set means designing it to handle more cases, including faults, as part of its normal operation. For example, the system can allow many functions to read a shared resource at the same time, but allow only one function to write to it at a time.
Predictive model: It is used to monitor the state of health of a system to ensure that the system is operating within its nominal operating parameters, and to take corrective action when it detects conditions that are likely to lead to faults.
Removal from service: This tactic temporarily takes a module of the web application out of operation so that activities can be carried out to prevent predicted failures.
Other methods are transactions and process monitors. A transaction bundles several consecutive steps so that the entire bundle can be undone at once. A process monitor, once it has detected a fault in a process, can delete the nonperforming process and create a new instance of it (Harrison, Avgeriou, & Zdun, 2010).
In general, the system takes the above actions to detect faults and errors in the running system, to prevent faults from affecting the integrity of the system, and to recover gracefully from faults if they do occur. Runtime tactics are the specific actions the system takes while it is running to achieve the desired quality attribute (Alenezi, Agrawal, Kumar, & Khan, 2020).
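The transaction idea, in which several steps are undone together, can be sketched in Python as follows. This is an in-memory illustration only: a real online merchant would rely on the transaction support of its database, and the `transaction` helper and the shop data used with it are hypothetical.

```python
import copy
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def transaction(state: dict) -> Iterator[dict]:
    """Run several steps against `state`; if any step raises, restore
    the snapshot so that all steps are undone at once."""
    snapshot = copy.deepcopy(state)
    try:
        yield state
    except Exception:
        # Undo every change made inside the block, then report the error.
        state.clear()
        state.update(snapshot)
        raise
```

If, for example, the stock is reduced inside the `with transaction(shop):` block and the payment step then raises an exception, the stock count returns to its previous value, so the shop never ends up with a paid item and no order, or a reserved item and no payment.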
References
Alenezi, M., Agrawal, A., Kumar, R., & Khan, R. A. (2020). Evaluating performance of web application security through a fuzzy based hybrid multi-criteria decision-making approach: Design tactics perspective. IEEE Access, 8, 25543-25556.
Atchison, L. (2016). Architecting for Scale: High Availability for Your Growing Applications. O’Reilly Media, Inc.
Bass, L., Clements, P. C., & Kazman, R. (1998). Software Architecture in Practice, Third Edition.
Harrison, N. B., Avgeriou, P., & Zdun, U. (2010). On the impact of fault tolerance tactics on architecture patterns. Paper presented at the Proceedings of the 2nd International Workshop on Software Engineering for Resilient Systems.
Stair, R., & Reynolds, G. (2018). Principles of Information Systems (13th Edition). Cengage Learning.
APPENDIX 1
Marking Rubrics
Component Title: Assignment
Percentage (%): 15
Criteria | Excellent (5) | Good (4) | Average (3) | Need Improvement (2) | Poor (1) | Weight (%) | Marks
Format | Strict compliant | Compliant in most parts of the document | Compliant in some parts of the document | Non-compliant in most parts of the document | Completely non-compliant | 3 |
Concrete scenarios for availability | Relevant and comprehensive | Mostly relevant and comprehensive | Relevant and comprehensive to the moderate extent | Mostly irrelevant and non-comprehensive | Completely irrelevant and non-comprehensive | 5 |
Tactics to enhance the availability | Relevant and comprehensive tactics | Mostly relevant and comprehensive tactics | Relevant and comprehensive to the moderate extent | Mostly irrelevant and non-comprehensive | Completely irrelevant and non-comprehensive | 7 |
TOTAL | | | | | | 15 |
Note to students: Please print out and attach this appendix together with the submission of coursework