Chapter 13 – Dependability engineering
Lecture 1
Chapter 13 Dependability Engineering
Topics covered
  Redundancy and diversity
    Fundamental approaches to achieve fault tolerance.
  Dependable processes
    How the use of dependable processes leads to dependable systems.
  Dependable systems architectures
    Architectural patterns for software fault tolerance.
  Dependable programming
    Guidelines for programming to avoid errors.
Software dependability
  In general, software customers expect all software to be dependable. However, for non-critical applications, they may be willing to accept some system failures.
  Some applications (critical systems) have very high dependability requirements, and special software engineering techniques may be used to achieve this.
    Medical systems
    Telecommunications and power systems
    Aerospace systems
Dependability achievement
  Fault avoidance
    The system is developed in such a way that human error is avoided and thus system faults are minimised.
    The development process is organised so that faults in the system are detected and repaired before delivery to the customer.
  Fault detection
    Verification and validation techniques are used to discover and remove faults in a system before it is deployed.
  Fault tolerance
    The system is designed so that faults in the delivered software do not result in system failure.
[Figure: The increasing costs of residual fault removal]
Regulated systems
  Many critical systems are regulated systems, which means that their use must be approved by an external regulator before the systems go into service.
    Nuclear systems
    Air traffic control systems
    Medical devices
  A safety and dependability case has to be approved by the regulator. Therefore, critical systems development has to create the evidence to convince a regulator that the system is dependable, safe and secure.
Diversity and redundancy
  Redundancy
    Keep more than one version of a critical component available so that if one fails, a backup is available.
  Diversity
    Provide the same functionality in different ways so that the components will not fail in the same way.
  However, adding diversity and redundancy adds complexity, and this can increase the chances of error.
  Some engineers argue that simplicity and extensive V & V are a more effective route to software dependability.
Diversity and redundancy examples
  Redundancy. Where availability is critical (e.g. in e-commerce systems), companies normally keep backup servers and switch to these automatically if failure occurs.
  Diversity. To provide resilience against external attacks, different servers may be implemented using different operating systems (e.g. Windows and Linux).
Process diversity and redundancy
  Process activities, such as validation, should not depend on a single approach, such as testing, to validate the system.
  Rather, multiple different process activities that complement each other and allow for cross-checking help to avoid process errors, which may lead to errors in the software.
Dependable processes
  To ensure a minimal number of software faults, it is important to have a well-defined, repeatable software process.
  A well-defined, repeatable process is one that does not depend entirely on individual skills; rather, it can be enacted by different people.
  Regulators use information about the process to check whether good software engineering practice has been used.
  For fault detection, the process activities should include significant effort devoted to verification and validation.
[Figure: Attributes of dependable processes]
Validation activities
  Requirements reviews.
  Requirements management.
  Formal specification.
  System modeling.
  Design and code inspection.
  Static analysis.
  Test planning and management.
  Change management, discussed in Chapter 25, is also essential.
Fault tolerance
  In critical situations, software systems must be fault tolerant. Fault tolerance is required where there are high availability requirements or where system failure costs are very high.
  Fault tolerance means that the system can continue in operation in spite of software failure.
  Even if the system has been proved to conform to its specification, it must also be fault tolerant, as there may be specification errors or the validation may be incorrect.
Dependable system architectures
  Dependable systems architectures are used in situations where fault tolerance is essential. These architectures are generally all based on redundancy and diversity.
  Examples of situations where dependable architectures are used:
    Flight control systems, where system failure could threaten the safety of passengers.
    Reactor systems, where failure of a control system could lead to a chemical or nuclear emergency.
    Telecommunication systems, where there is a need for 24/7 availability.
Protection systems
  A specialized system that is associated with some other control system and can take emergency action if a failure occurs.
    System to stop a train if it passes a red light
    System to shut down a reactor if temperature or pressure is too high
  Protection systems independently monitor the controlled system and the environment.
  If a problem is detected, the protection system issues commands to take emergency action to shut down the system and avoid a catastrophe.
[Figure: Protection system architecture]
Protection system functionality
  Protection systems are redundant because they include monitoring and control capabilities that replicate those in the control software.
  Protection systems should be diverse and use different technology from the control software.
  They are simpler than the control system, so more effort can be expended in validation and dependability assurance.
  The aim is to ensure that there is a low probability of failure on demand for the protection system.
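The idea of a small, independently validated protection check can be sketched as follows. This is only an illustration in Python: the trip limits, the `SensorReading` type and the `protection_check` function are hypothetical names and values, not taken from the slides.

```python
from dataclasses import dataclass

# Hypothetical trip limits, chosen only for illustration.
MAX_TEMPERATURE_C = 350.0
MAX_PRESSURE_KPA = 15_000.0


@dataclass
class SensorReading:
    temperature_c: float
    pressure_kpa: float


def protection_check(reading: SensorReading) -> str:
    """Return 'SHUTDOWN' if any limit is exceeded, otherwise 'OK'.

    The logic is deliberately minimal so that it is easy to validate
    separately from the (more complex) control software.
    """
    if reading.temperature_c > MAX_TEMPERATURE_C:
        return "SHUTDOWN"
    if reading.pressure_kpa > MAX_PRESSURE_KPA:
        return "SHUTDOWN"
    return "OK"
```

A reading of 300 °C and 10,000 kPa passes the check here, while any reading above either limit triggers the shutdown command.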
Self-monitoring architectures
  Multi-channel architectures where the system monitors its own operations and takes action if inconsistencies are detected.
  The same computation is carried out on each channel and the results are compared. If the results are identical and are produced at the same time, then it is assumed that the system is operating correctly.
  If the results are different, then a failure is assumed and a failure exception is raised.
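A minimal sketch of the comparison step, assuming two diverse software channels that compute the same result by different algorithms. The function names and the `ChannelMismatchError` exception are hypothetical, and the check that results arrive at the same time is omitted for brevity.

```python
class ChannelMismatchError(Exception):
    """Raised when the two channels produce different results."""


def sum_by_loop(n: int) -> int:
    # Channel A: add the integers 1..n one at a time.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_by_formula(n: int) -> int:
    # Channel B: a different algorithm for the same specification.
    return n * (n + 1) // 2


def self_checked(channel_a, channel_b, n: int) -> int:
    """Run both channels and raise a failure exception if they disagree."""
    result_a = channel_a(n)
    result_b = channel_b(n)
    if result_a != result_b:
        raise ChannelMismatchError(f"channels disagree: {result_a} != {result_b}")
    return result_a
```

Note that the comparison only detects a disagreement; it cannot tell which channel is wrong, which is why the outcome is an exception rather than a corrected value.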
[Figure: Self-monitoring architecture]
Self-monitoring systems
  Hardware in each channel has to be diverse so that common mode hardware failure will not lead to each channel producing the same results.
  Software in each channel must also be diverse, otherwise the same software error would affect each channel.
  If high availability is required, you may use several self-checking systems in parallel.
  This is the approach used in the Airbus family of aircraft for their flight control systems.
[Figure: Airbus flight control system architecture]
Airbus architecture discussion
  The Airbus FCS has 5 separate computers, any one of which can run the control software.
  Extensive use has been made of diversity:
    Primary systems use a different processor from the secondary systems.
    Primary and secondary systems use chipsets from different manufacturers.
    Software in secondary systems is less complex than in the primary system; it provides only critical functionality.
    Software in each channel is developed in different programming languages by different teams.
    Different programming languages are used in primary and secondary systems.
Key points
  Dependability in a program can be achieved by avoiding the introduction of faults, by detecting and removing faults before system deployment, and by including fault tolerance facilities.
  The use of redundancy and diversity in hardware, software processes and software systems is essential for the development of dependable systems.
  The use of a well-defined, repeatable process is essential if faults in a system are to be minimized.
  Dependable system architectures are system architectures that are designed for fault tolerance. Architectural styles that support fault tolerance include protection systems, self-monitoring architectures and N-version programming.
Chapter 13 – Dependability engineering
Lecture 2
N-version programming
  Multiple versions of a software system carry out computations at the same time. There should be an odd number of computers involved, typically 3.
  The results are compared using a voting system and the majority result is taken to be the correct result.
  The approach is derived from the notion of triple-modular redundancy, as used in hardware systems.
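The voting step can be sketched as a simple majority voter. This is an illustrative Python sketch, not a definitive design: `majority_vote` and `NoMajorityError` are hypothetical names, and results are assumed to be hashable values that can be compared for exact equality.

```python
from collections import Counter


class NoMajorityError(Exception):
    """Raised when no result is produced by more than half of the versions."""


def majority_vote(results):
    """Return the value produced by a strict majority of the versions."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise NoMajorityError(f"no majority among {results!r}")
    return value
```

With three versions, `majority_vote([4, 4, 5])` returns 4 and masks the single dissenting version, whereas three different results raise `NoMajorityError`.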
Hardware fault tolerance
  Depends on triple-modular redundancy (TMR).
  There are three replicated identical components that receive the same input and whose outputs are compared.
  If one output is different, it is ignored and component failure is assumed.
  This is based on the assumptions that most faults result from component failures rather than design faults, and that there is a low probability of simultaneous component failure.
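To see why a low probability of simultaneous failure matters, the following sketch computes the probability that at least two of three components work. It assumes that components fail independently, that each has the same reliability r, and that the voter itself never fails; these assumptions, and the function name `tmr_reliability`, are illustrative and do not come from the slides.

```python
def tmr_reliability(r: float) -> float:
    """Probability that at least two of three independent components work.

    All three work: r**3. Exactly two work: 3 * r**2 * (1 - r).
    The sum simplifies to 3*r**2 - 2*r**3.
    """
    return 3 * r**2 - 2 * r**3
```

Under these assumptions, a component reliability of 0.9 gives a TMR reliability of about 0.972, while a component reliability of 0.5 gives no improvement at all (0.5).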
[Figure: Triple modular redundancy]
[Figure: N-version programming]
N-version programming
  The different system versions are designed and implemented by different teams. It is assumed that there is a low probability that they will make the same mistakes. The algorithms used should be different, but may not be.
  There is some empirical evidence that teams commonly misinterpret specifications in the same way and choose the same algorithms in their systems.
Software diversity
  Approaches to software fault tolerance depend on software diversity, where it is assumed that different implementations of the same software specification will fail in different ways.
  It is assumed that implementations (a) are independent and (b) do not include common errors.
  Strategies to achieve diversity:
    Different programming languages
    Different design methods and tools
    Explicit specification of different algorithms
Problems with design diversity
  Teams are not culturally diverse, so they tend to tackle problems in the same way.
  Characteristic errors
    Different teams make the same mistakes. Some parts of an implementation are more difficult than others, so all teams tend to make mistakes in the same place.
  Specification errors
    If there is an error in the specification, then this is reflected in all implementations.
    This can be addressed to some extent by using multiple specification representations.
Specification dependency
  Both approaches to software redundancy are susceptible to specification errors. If the specification is incorrect, the system could fail.
  This is also a problem with hardware, but software specifications are usually more complex than hardware specifications and harder to validate.
  This has been addressed in some cases by developing separate software specifications from the same user specification.
Improvements in practice
  In principle, if diversity and independence can be achieved, multi-version programming leads to very significant improvements in reliability and availability.
  In practice, observed improvements are much less significant, but the approach seems to lead to reliability improvements of between 5 and 9 times.
  The key question is whether or not such improvements are worth the considerable extra development costs for multi-version programming.
Dependable programming
  Good programming practices can be adopted that help reduce the incidence of program faults.
  These programming practices support:
    Fault avoidance
    Fault detection
    Fault tolerance
[Figure: Good practice guidelines for dependable programming]
Control the visibility of information in a program
  Program components should only be allowed access to data that they need for their implementation.
  This means that accidental corruption of parts of the program state by these components is impossible.
  You can control visibility by using abstract data types where the data representation is private and you only allow access to the data through predefined operations such as get() and put().
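As a rough illustration of an abstract data type with get() and put() operations, consider the sketch below. The `BoundedCounter` class and its limit are hypothetical. Note also that Python marks attributes as private only by convention (a leading underscore); languages with enforced access control give a stronger guarantee.

```python
class BoundedCounter:
    """Abstract data type whose representation is hidden behind get() and put()."""

    def __init__(self, limit: int) -> None:
        self._limit = limit  # private by convention: not part of the interface
        self._value = 0

    def get(self) -> int:
        return self._value

    def put(self, value: int) -> None:
        # All updates pass through this one operation, so the invariant
        # 0 <= value <= limit is checked in a single place.
        if not 0 <= value <= self._limit:
            raise ValueError(f"value {value} outside 0..{self._limit}")
        self._value = value
```

Because callers never touch `_value` directly, an out-of-range update is rejected at the interface rather than silently corrupting the state.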
Check all inputs for validity
  All programs take inputs from their environment and make assumptions about these inputs.
  However, program specifications rarely define what to do if an input is not consistent with these assumptions.
  Consequently, many programs behave unpredictably when presented with unusual inputs and, sometimes, these are threats to the security of the system.
  You should therefore always check inputs against the assumptions made about them before processing.
Validity checks
  Range checks
    Check that the input falls within a known range.
  Size checks
    Check that the input does not exceed some maximum size, e.g. 40 characters for a name.
  Representation checks
    Check that the input does not include characters that should not be part of its representation, e.g. names do not include numerals.
  Reasonableness checks
    Use information about the input to check if it is reasonable rather than an extreme value.
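The four kinds of check can be sketched for two example inputs, a name and an age. The 40-character limit comes from the slide; the age limits, constant names and function names are hypothetical values chosen for illustration.

```python
from typing import List

MAX_NAME_LENGTH = 40       # size limit from the slide's example
MIN_AGE, MAX_AGE = 0, 150  # hypothetical valid range
REASONABLE_MAX_AGE = 110   # hypothetical threshold for a "surprising" value


def check_name(name: str) -> List[str]:
    """Return the names of the checks that the input fails."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:            # size check
        problems.append("size")
    if any(ch.isdigit() for ch in name):       # representation check
        problems.append("representation")
    return problems


def check_age(age: int) -> List[str]:
    problems = []
    if not MIN_AGE <= age <= MAX_AGE:          # range check
        problems.append("range")
    elif age > REASONABLE_MAX_AGE:             # reasonableness check
        problems.append("reasonableness")
    return problems
```

Returning the list of failed checks, rather than a single boolean, lets the caller decide whether to reject the input or merely flag it for confirmation.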
Provide a handler for all exceptions
  A program exception is an error or some unexpected event such as a power failure.
  Exception handling constructs allow for such events to be handled without the need for continual status checking to detect exceptions.
  Using normal control constructs to detect exceptions needs many additional statements to be added to the program. This adds a significant overhead and is potentially error-prone.
[Figure: Exception handling]
Exception handling
  Three possible exception handling strategies:
    Signal to a calling component that an exception has occurred and provide information about the type of exception.
    Carry out some alternative processing to the processing where the exception occurred. This is only possible where the exception handler has enough information to recover from the problem that has arisen.
    Pass control to a run-time support system to handle the exception.
  Exception handling is a mechanism to provide some fault tolerance.
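The three strategies can be sketched with a hypothetical sensor-reading example. The names `SensorError`, `read_sensor` and `read_with_fallback` are invented for illustration.

```python
class SensorError(Exception):
    """Hypothetical exception type used to tell the caller what went wrong."""


def read_sensor(raw):
    # Strategy 1: signal the caller, with the exception type and message
    # carrying information about the problem.
    if raw is None:
        raise SensorError("no reading available")
    # A malformed string makes float() raise ValueError, which nothing here
    # catches. Strategy 3: it propagates up to the run-time system.
    return float(raw)


def read_with_fallback(raw, last_good: float) -> float:
    # Strategy 2: alternative processing. The handler has enough information
    # (the last good value) to recover from a missing reading.
    try:
        return read_sensor(raw)
    except SensorError:
        return last_good
```

Here `read_with_fallback(None, 21.5)` recovers by returning 21.5, while a malformed reading is deliberately left for a higher level to deal with.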
Minimize the use of error-prone constructs
  Program faults are usually a consequence of human error because programmers lose track of the relationships between the different parts of the system.
  This is exacerbated by error-prone constructs in programming languages that are inherently complex or that don’t check for mistakes when they could do so.
  Therefore, when programming, you should try to avoid or at least minimize the use of these error-prone constructs.
Error-prone constructs
  Unconditional branch (goto) statements
  Floating-point numbers
    Inherently imprecise. The imprecision may lead to invalid comparisons.
  Pointers
    Pointers referring to the wrong memory areas can corrupt data. Aliasing can make programs difficult to understand and change.
  Dynamic memory allocation
    Run-time allocation can cause memory overflow.
Error-prone constructs
  Parallelism
    Can result in subtle timing errors because of unforeseen interaction between parallel processes.
  Recursion
    Errors in recursion can cause memory overflow as the program stack fills up.
  Interrupts
    Interrupts can cause a critical operation to be terminated and make a program difficult to understand.
  Inheritance
    Code is not localised. This can result in unexpected behaviour when changes are made and problems of understanding the code.
Error-prone constructs
  Aliasing
    Using more than one name to refer to the same state variable.
  Unbounded arrays
    Buffer overflow failures can occur if there is no bound checking on arrays.
  Default input processing
    An input action that occurs irrespective of the input.
    This can cause problems if the default action is to transfer control elsewhere in the program. An incorrect or deliberately malicious input can then trigger a program failure.
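The point about floating-point imprecision leading to invalid comparisons can be illustrated briefly. In this sketch, `math.isclose` is the Python standard-library function for tolerance-based comparison; the tolerance value is an arbitrary choice for the example.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation.
exact_equal = (total == 0.3)

# Comparing within a tolerance avoids the invalid comparison.
close_enough = math.isclose(total, 0.3, rel_tol=1e-9)
```

`exact_equal` is False and `close_enough` is True, even though both comparisons look equivalent on paper.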
Provide restart capabilities
  For systems that involve long transactions or user interactions, you should always provide a restart capability that allows the system to restart after failure without users having to redo everything that they have done.
  Restart depends on the type of system:
    Keep copies of forms so that users don’t have to fill them in again if there is a problem.
    Save state periodically and restart from the saved state.
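Saving state periodically and restarting from it might look like the sketch below. The function names and the choice of a JSON file as the checkpoint format are assumptions made for illustration.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place.

    os.replace swaps the file in a single step, so a crash during the write
    leaves the previous checkpoint intact rather than a half-written one.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the saved state, or the default if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

On restart, the program calls `load_checkpoint` first and resumes from whatever step the saved state records, instead of asking the user to start again.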
Check array bounds
  In some programming languages, such as C, it is possible to address a memory location outside of the range allowed for in an array declaration.
  This leads to the well-known ‘buffer overflow’ vulnerability, where attackers write executable code into memory by deliberately writing beyond the top element in an array.
  If your language does not include bound checking, you should therefore always check that an array access is within the bounds of the array.
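The explicit check described above can be sketched as follows. Python is used here only for consistency with the other examples; it already rejects indices past the end of a list, but it silently accepts negative indices (which count from the end), so an explicit check is still meaningful. The function name `checked_get` is hypothetical.

```python
def checked_get(buffer, index: int):
    """Return buffer[index] only if index is within 0..len(buffer)-1."""
    if not 0 <= index < len(buffer):
        raise IndexError(f"index {index} outside 0..{len(buffer) - 1}")
    return buffer[index]
```

In a language such as C, the same test would be written before every array access that uses an index derived from input.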
Include timeouts when calling external components
  In a distributed system, failure of a remote computer can be ‘silent’ so that programs expecting a service from that computer may never receive that service or any indication that there has been a failure.
  To avoid this, you should always include timeouts on all calls to external components. After a defined time period has elapsed without a response, your system should then assume failure and take whatever actions are required to recover from this.
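One way to put a timeout around a call is sketched below using Python's standard `concurrent.futures` module. The wrapper name `call_with_timeout` and its return convention are hypothetical, and the external component is represented by an ordinary function.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_s: float, *args):
    """Return ('ok', result), or ('timeout', None) if no answer arrives in time."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except FutureTimeout:
        # Assume failure and let the caller start its recovery action.
        # The abandoned call is not cancelled; it keeps running in its thread.
        return ("timeout", None)
    finally:
        executor.shutdown(wait=False)
```

For real network calls, the timeout parameters offered by the socket or HTTP library in use are usually the better tool, because they also release the underlying connection.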
Name all constants that represent real-world values
  Always give constants that reflect real-world values (such as tax rates) names rather than using their numeric values, and always refer to them by name.
  You are less likely to make mistakes and type the wrong value when you are using a name rather than a value.
  This also means that when these ‘constants’ change (they are not really constant), you only have to make the change in one place in your program.
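A small sketch of the guideline, using a made-up tax rate. The constant name, its value and the function are hypothetical; `Decimal` is used instead of floating point so that the monetary arithmetic is exact.

```python
from decimal import Decimal

# Hypothetical rate: defined once, referred to by name everywhere else.
SALES_TAX_RATE = Decimal("0.20")


def price_with_tax(net_price: Decimal) -> Decimal:
    """Return the gross price for a net price, using the named rate."""
    return net_price * (1 + SALES_TAX_RATE)
```

If the rate changes, only the definition of `SALES_TAX_RATE` needs to be edited; every calculation that refers to it picks up the new value.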
Key points
  Software diversity is difficult to achieve because it is practically impossible to ensure that each version of the software is truly independent.
  Dependable programming relies on the inclusion of redundancy in a program to check the validity of inputs and the values of program variables.
  Some programming constructs and techniques, such as goto statements, pointers, recursion, inheritance and floating-point numbers, are inherently error-prone. You should try to avoid these constructs when developing dependable systems.

Ch13-Software Engineering 9

  • 1. Chapter 13 – Dependability engineeringLecture 11Chapter 13 Dependability Engineering
  • 2. Topics coveredRedundancy and diversityFundamental approaches to achieve fault tolerance.Dependable processesHow the use of dependable processes leads to dependable systemsDependable systems architecturesArchitectural patterns for software fault toleranceDependable programmingGuidelines for programming to avoid errors.Chapter 13 Dependability Engineering2
  • 3. Software dependabilityIn general, software customers expect all software to be dependable. However, for non-critical applications, they may be willing to accept some system failures.Some applications (critical systems) have very high dependability requirements and special software engineering techniques may be used to achieve this.Medical systemsTelecommunications and power systemsAerospace systems3Chapter 13 Dependability Engineering
  • 4. Dependability achievementFault avoidanceThe system is developed in such a way that human error is avoided and thus system faults are minimised.The development process is organised so that faults in the system are detected and repaired before delivery to the customer.Fault detectionVerification and validation techniques are used to discover and remove faults in a system before it is deployed.Fault toleranceThe system is designed so that faults in the delivered software do not result in system failure.4Chapter 13 Dependability Engineering
  • 5. The increasing costs of residual fault removal 5Chapter 13 Dependability Engineering
  • 6. Regulated systemsMany critical systems are regulated systems, which means that their use must be approved by an external regulator before the systems go into service. Nuclear systemsAir traffic control systemsMedical devicesA safety and dependability case has to be approved by the regulator. Therefore, critical systems development has to create the evidence to convince a regulator that the system is dependable, safe and secure.Chapter 13 Dependability Engineering6
  • 7. Diversity and redundancyRedundancyKeep more than 1 version of a critical component available so that if one fails then a backup is available.DiversityProvide the same functionality in different ways so that they will not fail in the same way.However, adding diversity and redundancy adds complexity and this can increase the chances of error.Some engineers advocate simplicity and extensive V & V is a more effective route to software dependability.7Chapter 13 Dependability Engineering
  • 8. Diversityand redundancy examplesRedundancy. Where availability is critical (e.g. in e-commerce systems), companies normally keep backup servers and switch to these automatically if failure occurs.Diversity. To provide resilience against external attacks, different servers may be implemented using different operating systems (e.g. Windows and Linux)8Chapter 13 Dependability Engineering
  • 9. Process diversity and redundancyProcess activities, such as validation, should not depend on a single approach, such as testing, to validate the systemRather, multiple different process activities the complement each other and allow for cross-checking help to avoid process errors, which may lead to errors in the softwareChapter 13 Dependability Engineering9
  • 10. Dependable processesTo ensure a minimal number of software faults, it is important to have a well-defined, repeatable software process.A well-defined repeatable process is one that does not depend entirely on individual skills; rather can be enacted by different people.Regulators use information about the process to check if good software engineering practice has been used.For fault detection, it is clear that the process activities should include significant effort devoted to verification and validation.10Chapter 13 Dependability Engineering
  • 11. Attributes of dependable processes11Chapter 13 Dependability Engineering
  • 12. Validation activitiesRequirements reviews.Requirements management.Formal specification.System modelingDesign and code inspection.Static analysis.Test planning and management.Change management, discussed in Chapter 25, is also essential.12Chapter 13 Dependability Engineering
  • 13. Fault toleranceIn critical situations, software systems must be fault tolerant. Fault tolerance is required where there are high availability requirements or where system failure costs are very high.Fault tolerance means that the system can continue in operation in spite of software failure.Even if the system has been proved to conform to its specification, it must also be fault tolerant as there may be specification errors or the validation may be incorrect.13Chapter 13 Dependability Engineering
  • 14. Dependable system architecturesDependable systems architectures are used in situations where fault tolerance is essential. These architectures are generally all based on redundancy and diversity.Examples of situations where dependable architectures are used:Flight control systems, where system failure could threaten the safety of passengersReactor systems where failure of a control system could lead to a chemical or nuclear emergencyTelecommunication systems, where there is a need for 24/7 availability.Chapter 13 Dependability Engineering14
  • 15. Protection systemsA specialized system that is associated with some other control system, which can take emergency action if a failure occurs.System to stop a train if it passes a red lightSystem to shut down a reactor if temperature/pressure are too highProtection systems independently monitor the controlled system and the environment.If a problem is detected, it issues commands to take emergency action to shut down the system and avoid a catastrophe.Chapter 13 Dependability Engineering15
  • 16. Protection system architecture16Chapter 13 Dependability Engineering
  • 17. Protection system functionalityProtection systems are redundant because they include monitoring and control capabilities that replicate those in the control software.Protection systems should be diverse and use different technology from the control software.They are simpler than the control system so more effort can be expended in validation and dependability assurance.Aim is to ensure that there is a low probability of failure on demand for the protection system.Chapter 13 Dependability Engineering17
  • 18. Self-monitoring architecturesMulti-channel architectures where the system monitors its own operations and takes action if inconsistencies are detected.The same computation is carried out on each channel and the results are compared. If the results are identical and are produced at the same time, then it is assumed that the system is operating correctly.If the results are different, then a failure is assumed and a failure exception is raised.Chapter 13 Dependability Engineering18
  • 19. Self-monitoring architecture19Chapter 13 Dependability Engineering
  • 20. Self-monitoring systemsHardware in each channel has to be diverse so that common mode hardware failure will not lead to each channel producing the same results.Software in each channel must also be diverse, otherwise the same software error would affect each channel.If high-availability is required, you may use several self-checking systems in parallel.This is the approach used in the Airbus family of aircraft for their flight control systems.Chapter 13 Dependability Engineering20
  • 21. Airbus flight control system architecture
  • 22. Airbus architecture discussion: The Airbus FCS has 5 separate computers, any one of which can run the control software. Extensive use has been made of diversity: primary systems use a different processor from the secondary systems; primary and secondary systems use chipsets from different manufacturers; software in secondary systems is less complex than in the primary system and provides only critical functionality; software in each channel is developed in different programming languages by different teams; different programming languages are used in primary and secondary systems.
  • 23. Key points: Dependability in a program can be achieved by avoiding the introduction of faults, by detecting and removing faults before system deployment, and by including fault tolerance facilities. The use of redundancy and diversity in hardware, software processes and software systems is essential for the development of dependable systems. The use of a well-defined, repeatable process is essential if faults in a system are to be minimized. Dependable system architectures are system architectures that are designed for fault tolerance. Architectural styles that support fault tolerance include protection systems, self-monitoring architectures and N-version programming.
  • 24. Chapter 13 – Dependability engineering, Lecture 2
  • 25. N-version programming: Multiple versions of a software system carry out computations at the same time. There should be an odd number of computers involved, typically 3. The results are compared using a voting system and the majority result is taken to be the correct result. The approach is derived from the notion of triple-modular redundancy, as used in hardware systems.
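The voting step can be sketched as a simple majority function. The name `majority_vote` and the choice to raise an error when no majority exists are assumptions for this example; the slides do not say what a voter should do in that case. The sketch also assumes that outputs are hashable and can be compared exactly.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the versions."""
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required; with three versions that means two or more.
    if count * 2 <= len(outputs):
        raise RuntimeError(f"no majority among outputs: {outputs}")
    return value
```

With three versions, `majority_vote([4, 4, 5])` returns `4`, masking the single dissenting version. An odd number of versions avoids ties between two candidate values.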
  • 26. Hardware fault tolerance: Depends on triple-modular redundancy (TMR). There are three replicated identical components that receive the same input and whose outputs are compared. If one output is different, it is ignored and component failure is assumed. The approach rests on two assumptions: most faults result from component failures rather than design faults, and there is a low probability of simultaneous component failure.
  • 27. Triple modular redundancy
  • 28. N-version programming
  • 29. N-version programming: The different system versions are designed and implemented by different teams. It is assumed that there is a low probability that they will make the same mistakes. The algorithms used should be different, but in practice they may not be. There is some empirical evidence that teams commonly misinterpret specifications in the same way and choose the same algorithms in their systems.
  • 30. Software diversity: Approaches to software fault tolerance depend on software diversity, where it is assumed that different implementations of the same software specification will fail in different ways. It is assumed that implementations (a) are independent and (b) do not include common errors. Strategies to achieve diversity: different programming languages; different design methods and tools; explicit specification of different algorithms.
  • 31. Problems with design diversity: Teams are not culturally diverse so they tend to tackle problems in the same way. Characteristic errors: different teams make the same mistakes. Some parts of an implementation are more difficult than others, so all teams tend to make mistakes in the same place. Specification errors: if there is an error in the specification, it is reflected in all implementations. This can be addressed to some extent by using multiple specification representations.
  • 32. Specification dependency: Both approaches to software redundancy are susceptible to specification errors. If the specification is incorrect, the system could fail. This is also a problem with hardware, but software specifications are usually more complex than hardware specifications and harder to validate. This has been addressed in some cases by developing separate software specifications from the same user specification.
  • 33. Improvements in practice: In principle, if diversity and independence can be achieved, multi-version programming leads to very significant improvements in reliability and availability. In practice, observed improvements are much less significant, but the approach seems to lead to reliability improvements of between 5 and 9 times. The key question is whether or not such improvements are worth the considerable extra development costs of multi-version programming.
  • 34. Dependable programming: Good programming practices can be adopted that help reduce the incidence of program faults. These programming practices support fault avoidance, fault detection and fault tolerance.
  • 35. Good practice guidelines for dependable programming
  • 36. Control the visibility of information in a program: Program components should only be allowed access to data that they need for their implementation. This means that accidental corruption of parts of the program state by these components is impossible. You can control visibility by using abstract data types where the data representation is private and you only allow access to the data through predefined operations such as get() and put().
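An abstract data type with a hidden representation might be sketched as follows. The class name `BoundedQueue` and its capacity rule are assumptions for illustration. Python marks private attributes by convention (a leading underscore) rather than enforcing privacy, so this shows the design idea rather than a hard guarantee.

```python
from collections import deque


class BoundedQueue:
    """Queue whose internal representation is reachable only via put() and get()."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque()      # private representation, by convention
        self._capacity = capacity

    def put(self, item) -> None:
        if len(self._items) >= self._capacity:
            raise OverflowError("queue is full")
        self._items.append(item)

    def get(self):
        if not self._items:
            raise LookupError("queue is empty")
        return self._items.popleft()

    def __len__(self) -> int:
        return len(self._items)
```

Because callers never touch `_items` directly, the representation could later change (for example to a fixed-size list) without affecting any other component.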
  • 37. Check all inputs for validity: All programs take inputs from their environment and make assumptions about these inputs. However, program specifications rarely define what to do if an input is not consistent with these assumptions. Consequently, many programs behave unpredictably when presented with unusual inputs and, sometimes, these are threats to the security of the system. You should therefore always check inputs against the assumptions made about them before processing.
  • 38. Validity checks: Range checks: check that the input falls within a known range. Size checks: check that the input does not exceed some maximum size, e.g. 40 characters for a name. Representation checks: check that the input does not include characters that should not be part of its representation, e.g. names do not include numerals. Reasonableness checks: use information about the input to check if it is reasonable rather than an extreme value.
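The four kinds of check can be sketched as small functions. The 40-character name limit comes from the slide; the age bounds, the `max_jump` threshold and all function names are assumptions made for the example.

```python
MAX_NAME_LENGTH = 40  # size limit taken from the slide's example


def check_name(name: str) -> str:
    # Size check: reject names longer than the maximum.
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name exceeds maximum length")
    # Representation check: names should not contain numerals.
    if any(ch.isdigit() for ch in name):
        raise ValueError("name contains numerals")
    return name


def check_age(age: int) -> int:
    # Range check: the bounds 0..150 are an assumed plausible range.
    if not 0 <= age <= 150:
        raise ValueError("age outside permitted range")
    return age


def is_reasonable_reading(new: float, previous: float, max_jump: float = 5.0) -> bool:
    # Reasonableness check: flag readings that change implausibly fast.
    return abs(new - previous) <= max_jump
```

A reasonableness check differs from a range check in that it uses context (here, the previous reading) rather than fixed limits.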
  • 39. Provide a handler for all exceptions: A program exception is an error or some unexpected event such as a power failure. Exception handling constructs allow such events to be handled without the need for continual status checking to detect exceptions. Using normal control constructs to detect exceptions requires many additional statements to be added to the program. This adds a significant overhead and is potentially error-prone.
  • 40. Exception handling
  • 41. Exception handling: Three possible exception handling strategies: (1) Signal to a calling component that an exception has occurred and provide information about the type of exception. (2) Carry out some alternative processing to the processing where the exception occurred. This is only possible where the exception handler has enough information to recover from the problem that has arisen. (3) Pass control to a run-time support system to handle the exception. Exception handling is a mechanism to provide some fault tolerance.
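The first two strategies can be illustrated with a short sketch. `SensorError`, `read_sensor` and `read_with_fallback` are hypothetical names, and falling back to the last good value is only one possible form of alternative processing.

```python
class SensorError(Exception):
    """Tells the caller what kind of problem occurred."""


def read_sensor(raw: str) -> float:
    # Strategy 1: signal the caller, with information about the exception type.
    try:
        return float(raw)
    except ValueError as exc:
        raise SensorError(f"unreadable sensor value: {raw!r}") from exc


def read_with_fallback(raw: str, last_good: float) -> float:
    # Strategy 2: alternative processing. The handler has enough information
    # (the last good value) to recover from the problem.
    try:
        return read_sensor(raw)
    except SensorError:
        return last_good
```

The third strategy corresponds to not catching the exception at all, so that it propagates to the run-time system, which in Python terminates the program with a traceback by default.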
  • 42. Minimize the use of error-prone constructs: Program faults are usually a consequence of human error because programmers lose track of the relationships between the different parts of the system. This is exacerbated by error-prone constructs in programming languages that are inherently complex or that don’t check for mistakes when they could do so. Therefore, when programming, you should try to avoid or at least minimize the use of these error-prone constructs.
  • 43. Error-prone constructs: Unconditional branch (goto) statements. Floating-point numbers: inherently imprecise; the imprecision may lead to invalid comparisons. Pointers: pointers referring to the wrong memory areas can corrupt data, and aliasing can make programs difficult to understand and change. Dynamic memory allocation: run-time allocation can cause memory overflow.
  • 44. Error-prone constructs: Parallelism: can result in subtle timing errors because of unforeseen interaction between parallel processes. Recursion: errors in recursion can cause memory overflow as the program stack fills up. Interrupts: interrupts can cause a critical operation to be terminated and make a program difficult to understand. Inheritance: code is not localised, which can result in unexpected behaviour when changes are made and in problems understanding the code.
  • 45. Error-prone constructs: Aliasing: using more than one name to refer to the same state variable. Unbounded arrays: buffer overflow failures can occur if there is no bound checking on arrays. Default input processing: an input action that occurs irrespective of the input. This can cause problems if the default action is to transfer control elsewhere in the program. An incorrect or deliberately malicious input can then trigger a program failure.
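Of the constructs listed above, the floating-point comparison problem is easy to demonstrate. The sketch below uses the standard library's `math.isclose`; the tolerance value is an assumption and would need to be chosen for the application.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation.
exact_match = total == 0.3

# Comparing within a tolerance avoids the invalid comparison.
close_match = math.isclose(total, 0.3, rel_tol=1e-9)
```

Here `exact_match` is `False` while `close_match` is `True`, which is why equality tests on floating-point values are usually replaced by tolerance-based comparisons.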
  • 46. Provide restart capabilities: For systems that involve long transactions or user interactions, you should always provide a restart capability that allows the system to restart after failure without users having to redo everything that they have done. Restart depends on the type of system: keep copies of forms so that users don’t have to fill them in again if there is a problem; save state periodically and restart from the saved state.
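Saving state periodically and restarting from it can be sketched with a JSON checkpoint file. The function names and the use of JSON are assumptions for illustration; the write goes to a temporary file first so that a crash during saving does not destroy the previous checkpoint.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write the checkpoint to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # replaces the old checkpoint in one step


def load_state(path: str, default: dict) -> dict:
    """Return the last saved state, or the default if no checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

On restart, a long-running job would call `load_state` first and resume from the recorded step instead of starting again.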
  • 47. Check array bounds: In some programming languages, such as C, it is possible to address a memory location outside of the range allowed for in an array declaration. This leads to the well-known buffer overflow vulnerability, where attackers write executable code into memory by deliberately writing beyond the top element in an array. If your language does not include bound checking, you should therefore always check that an array access is within the bounds of the array.
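Python checks simple index accesses itself, but an explicit check is still useful for buffer writes, because slice assignment on a `bytearray` silently resizes the buffer instead of failing. The helper name `write_at` is an assumption for this sketch.

```python
def write_at(buffer: bytearray, offset: int, data: bytes) -> None:
    """Copy data into buffer at offset, refusing writes that leave its bounds."""
    if offset < 0 or offset + len(data) > len(buffer):
        raise IndexError("write would fall outside the buffer")
    buffer[offset:offset + len(data)] = data
```

In a language such as C the same guard would be placed before the copy, since nothing else stops the write from running past the end of the array.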
  • 48. Include timeouts when calling external components: In a distributed system, failure of a remote computer can be ‘silent’ so that programs expecting a service from that computer may never receive that service or any indication that there has been a failure. To avoid this, you should always include timeouts on all calls to external components. After a defined time period has elapsed without a response, your system should then assume failure and take whatever actions are required to recover from this.
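A timeout wrapper can be sketched with the standard library's `concurrent.futures`. `call_with_timeout` and `ServiceTimeout` are hypothetical names. This sketch only stops waiting: the worker thread is not cancelled and keeps running in the background, so a real client would preferably use the timeout option of its network library.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class ServiceTimeout(Exception):
    """Raised when an external call does not answer within the time limit."""


def call_with_timeout(func, timeout_s: float, *args):
    """Run func(*args) in a worker thread and give up after timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Assume the remote component has failed and let the caller recover.
        raise ServiceTimeout(f"no response within {timeout_s} s") from None
    finally:
        pool.shutdown(wait=False)  # do not block on a call that may never return
```

A caller would catch `ServiceTimeout` and then retry, switch to a backup service, or report the failure, depending on the recovery action the system requires.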
  • 49. Name all constants that represent real-world values: Always give constants that reflect real-world values (such as tax rates) names rather than using their numeric values, and always refer to them by name. You are less likely to make mistakes and type the wrong value when you are using a name rather than a value. This means that when these ‘constants’ change (they are rarely truly constant), you only have to make the change in one place in your program.
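A named constant for a tax rate might look like this. The rate of 20% and the names `VAT_RATE` and `price_with_tax` are assumptions for the example; `Decimal` is used so that the money arithmetic does not depend on floating-point rounding.

```python
from decimal import Decimal

# Hypothetical tax rate, defined once and referred to by name everywhere.
VAT_RATE = Decimal("0.20")


def price_with_tax(net_price: Decimal) -> Decimal:
    """Return the gross price for a net price, using the named rate."""
    return net_price * (1 + VAT_RATE)
```

If the rate changes, only the definition of `VAT_RATE` has to be edited, and every calculation that refers to it picks up the new value.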
  • 50. Key points: Software diversity is difficult to achieve because it is practically impossible to ensure that each version of the software is truly independent. Dependable programming relies on the inclusion of redundancy in a program to check the validity of inputs and the values of program variables. Some programming constructs and techniques, such as goto statements, pointers, recursion, inheritance and floating-point numbers, are inherently error-prone. You should try to avoid these constructs when developing dependable systems.