Five Common Mistakes when
Conducting Software Failure
Modes Effects Analysis
Ann Marie Neufelder
SoftRel, LLC
amneufelder@softrel.com
http://www.softrel.com
© SoftRel, LLC 2019. This presentation may not be reprinted in whole or part
without written permission from amneufelder@softrel.com
Five Common Mistakes when
Conducting Software Failure Modes
Effects Analysis
 The software FMECA is a powerful tool for identifying
software failure modes but there are 5 common mistakes
that can derail the effectiveness of the analysis.
 #1 - Software is analyzed as a black box (and
shouldn't be).
 #2 - It's assumed that the software will work as
expected
 #3 - It's conducted far too late in development life
cycle
 #4 - It's conducted at wrong level of abstraction
 #5 - The most common failure modes aren't
considered
#1 - Software is analyzed as a black
box (and shouldn’t be).
 The single most common mistake is to analyze the software
based on what it "is" instead of what it "does".
 The black box approach is common for hardware FMECA.
 However, it doesn't work well for software.
 Software doesn't wear out - it fails because the code doesn't
perform the required functions.
 Hence, it must be analyzed from a functional standpoint rather than a
black box standpoint.
 Examples of “Black box” SFMECA which should be avoided:
LRU | Failure mode | Recommendation
Executive CSCI | CSCI fails to execute | Doesn’t address states, timing, missing functionality, wrong data, faulty error handling, etc.
Executive CSCI | CSCI fails to perform required function | CSCI performs far too many features and functions. List each feature and what can go wrong instead.
Examples of functional SFMEA
 Example of a use case to move a turret, analyzed
based on what it does/doesn’t do and not what it is
Use case | Failure mode | Root causes
Move turret | Faulty timing | Turret moves too late; turret moves too early
Move turret | Faulty sequencing and state management | Turret moves inadvertently; turret fails to move when commanded
Move turret | Faulty error handling | Turret exceeds the maximum range allowed; failures in turret hardware aren’t detected
Move turret | Faulty processing | Turret moves upon startup after an abnormal shutdown
Move turret | Faulty data | Turret moves to the wrong location because of improperly formatted, improperly scaled or null data; turret comes too close to a hard stop because of overly tight specifications; turret doesn’t move the entire spectrum of possible radians
Move turret | Faulty functionality | Use case doesn’t meet the system requirements
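Several of the "Faulty data" and "Faulty error handling" root causes in the turret table can be guarded against at the point where a command is accepted. The following is only a minimal sketch: the travel limits, the hard-stop margin, the `TurretCommand` type and the `validate_turret_command` function are all hypothetical and are not taken from any real turret software.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical travel limits and hard-stop margin in radians (assumed values).
MIN_ANGLE_RAD = -math.pi / 2
MAX_ANGLE_RAD = math.pi / 2
HARD_STOP_MARGIN_RAD = 0.05


@dataclass(frozen=True)
class TurretCommand:
    angle_rad: Optional[float]  # requested position; None models null data


def validate_turret_command(cmd: TurretCommand) -> Tuple[bool, str]:
    """Return (accepted, reason) instead of assuming the data is well formed."""
    angle = cmd.angle_rad
    if angle is None:
        return False, "null angle"
    if isinstance(angle, bool) or not isinstance(angle, (int, float)):
        return False, "improperly formatted angle"
    if math.isnan(angle) or math.isinf(angle):
        return False, "non-finite angle"
    if abs(angle) > 2 * math.pi:
        # A magnitude this large suggests degrees were sent instead of radians.
        return False, "improperly scaled angle"
    low = MIN_ANGLE_RAD + HARD_STOP_MARGIN_RAD
    high = MAX_ANGLE_RAD - HARD_STOP_MARGIN_RAD
    if not low <= angle <= high:
        return False, "angle outside allowed range"
    return True, "ok"
```

Each rejection reason corresponds to one root cause an analyst would otherwise have to assume the code handles. The scaling check is a heuristic only and would not catch small values sent in degrees.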
#2 - It's assumed that the
software will work as expected
 The "software" FMECA focuses on how the "software" fails.
 Yet many analysts assume that the software will work
perfectly.
 There's no point in doing a "software" FMECA if you're
going to assume that the software always works.
 One must assume that
 1) Unwritten assumptions will lead to failures
 2) If an important detail isn't in writing it won't get coded or
tested
 3) If the requirements don't discuss fault handling the
software won't handle faults
 4) Even when the requirements are complete, the code
may not be written to meet the requirements.
Example: Unwritten assumptions in the
software requirements leading to a failure
European Space Agency CryoSat-1
1) SRS specifications are missing the requirement for main engine cutoff (unwritten assumption that software engineers would know that the software must cut off the main engine after the first stage of launch).
2) First stage of launch on 10/8/05 is successful.
3) Second stage stops performing when the required command to cut off the main engine doesn’t occur.
4) Engine continues to operate until fuel is consumed.
5) Satellite is lost at a cost of $186 million.
Example: Important details missing from
requirements won’t get coded or tested
This is the specification for the logging feature:
1) The software shall log all warnings, failures and successful
missions.
2) At least 8 hours of operation shall be captured
3) Logging to an SD card shall be supported in addition to logging
to the computer drive
This is what you know about the software organization and software
itself
1) Logging function will be called from nearly every use case since
nearly every use case checks for warnings, failures and
successes
2) Testing will cover the requirements. But no plans to cover stress
testing, endurance testing, path testing, fault insertion testing.
3) Software engineers have discretion to test their code as they see
fit.
Example: Important details missing from requirements
won’t get coded or tested
 These are the faults that can/will fall through the cracks
 No checking of read/write errors, file open, file exist errors which are
common
 No rollover of log files once drive is full (may be beyond 8 hours)
 No checking of SD card (not present, not working)
 No consideration of heavy usage versus light or normal usage (the drive
might fill in less than 8 hours under heavy usage)
 This is why these faults aren’t found prior to operation
 No one is required to explicitly test these faults
 No one is required to review the code for this fault checking
 No one is required to test beyond 8 hours of operation
 This is the effect if any of these faults happens
 Entire system is down because it crashes on nearly every function
once the drive is full, the SD card is removed, a file is left open or a read/write error occurs
 With the SFMEA you cannot assume that best practices will be followed unless
there is a means to guarantee that
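The faults listed above (unchecked read/write errors, no rollover, no SD card check) can be made concrete with a short sketch of what the unwritten requirements would have to ask for. This is an illustration under assumptions, not the logging design of any real system: the `FaultTolerantLog` class, its size budget and the single `.1` rollover file are all hypothetical.

```python
import os
import tempfile
from typing import List, Optional


class FaultTolerantLog:
    """Append-only log that degrades instead of crashing the caller."""

    def __init__(self, primary_path: str, sd_path: Optional[str], max_bytes: int):
        self.paths: List[str] = [primary_path] + ([sd_path] if sd_path else [])
        self.max_bytes = max_bytes
        self.dropped = 0  # entries that could not be written to any target

    def _append(self, path: str, line: str) -> bool:
        try:
            size = len(line.encode("utf-8"))
            # Roll over before the file grows past its budget.
            if os.path.exists(path) and os.path.getsize(path) + size > self.max_bytes:
                os.replace(path, path + ".1")
            with open(path, "a", encoding="utf-8") as handle:
                handle.write(line)
            return True
        except OSError:
            # Missing SD card, full drive, permission or read/write error.
            return False

    def log(self, message: str) -> bool:
        results = [self._append(path, message + "\n") for path in self.paths]
        if not any(results):
            self.dropped += 1
        return any(results)


def demo() -> dict:
    """Exercise the rollover and missing-SD-card paths in a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        primary = os.path.join(root, "mission.log")
        missing_sd = os.path.join(root, "no_card", "mission.log")
        log = FaultTolerantLog(primary, missing_sd, max_bytes=20)
        first = log.log("0123456789")
        second = log.log("abcdefghij")
        rolled = os.path.exists(primary + ".1")
        dead = FaultTolerantLog(missing_sd, None, max_bytes=20)
        return {"first": first, "second": second, "rolled": rolled,
                "dead_ok": dead.log("x"), "dead_dropped": dead.dropped}
```

The design choice worth noting is that `log` reports failure to the caller rather than raising, so a full drive or a removed SD card cannot take down every use case that calls it. Whether dropped entries are acceptable is itself a requirement that would need to be written down.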
Example: If the requirements don't discuss fault
handling the software won't handle faults
 This state diagram, based on the written software requirements,
doesn’t have a faulty state or transitions to/from a faulty state
 Hence, these faults are unlikely to be handled in design, code
or test plan
 The SFMECA should not assume otherwise
State diagram: Initialization, Ready, Prepare for launch, Launch
 Initialization: fails to account for initialization failures in HW, SW
 Prepare for launch: fails to account for failures in launch preparation
 Launch: fails to account for launch failures such as hang fire, misfire, etc.
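One way to see what the diagram leaves out is to write the four states down with an explicit fault state added. The sketch below assumes the states are visited in the order Initialization, Ready, Prepare for launch, Launch; the `FAULT` state, the `NOMINAL` table and the `next_state` function are hypothetical additions, not part of the original requirements.

```python
from enum import Enum, auto


class State(Enum):
    INITIALIZATION = auto()
    READY = auto()
    PREPARE_FOR_LAUNCH = auto()
    LAUNCH = auto()
    FAULT = auto()  # assumed addition, absent from the diagram


# Nominal transitions, assuming the four states are visited in this order.
NOMINAL = {
    State.INITIALIZATION: State.READY,
    State.READY: State.PREPARE_FOR_LAUNCH,
    State.PREPARE_FOR_LAUNCH: State.LAUNCH,
}


def next_state(current: State, step_succeeded: bool) -> State:
    """Advance on success; route any failure to FAULT."""
    if current is State.FAULT:
        return State.FAULT  # stays latched; recovery is not modelled here
    if not step_succeeded:
        return State.FAULT
    return NOMINAL.get(current, current)
```

With only the `NOMINAL` table, a failed initialization or a misfire has nowhere to go, which is the gap the SFMECA should record.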
Example: Even when the requirements are
complete, the code may not be written to meet
the requirements
Mariner 1 rocket failure in 1962. [Mariner]
The requirements document clearly indicated an overbar, which was supposed to be an averaging function of velocity. However, the programmer ignored the overbar when transcribing the formula into code.
Without the smoothing function the software treated normal variations in velocity as if they were serious.
Faulty corrections sent the rocket off course.
Rocket destroyed 293 seconds after liftoff.
Cost = $18.5 million in 1962 dollars.
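The effect of dropping an averaging function can be illustrated with a generic moving average. This is not a reconstruction of the Mariner 1 guidance code: the `corrections` function, the window size, the tolerance and the sample velocities are all invented for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List


def corrections(velocities: Iterable[float], nominal: float,
                tolerance: float, window: int) -> List[bool]:
    """Flag a correction whenever the windowed mean velocity leaves tolerance.

    window=1 uses each raw sample, which is equivalent to no smoothing.
    """
    recent: Deque[float] = deque(maxlen=window)
    flags = []
    for velocity in velocities:
        recent.append(velocity)
        mean = sum(recent) / len(recent)
        flags.append(abs(mean - nominal) > tolerance)
    return flags


# Invented samples that vary around a nominal value of 100.
SAMPLES = [100.0, 103.0, 97.0, 102.0, 98.0, 100.0]
RAW = corrections(SAMPLES, nominal=100.0, tolerance=2.5, window=1)
SMOOTHED = corrections(SAMPLES, nominal=100.0, tolerance=2.5, window=4)
```

With these samples the unsmoothed run flags corrections on the second and third readings, while the smoothed run flags none. A test that compared the coded formula against the written one, term by term, is the kind of check the SFMECA cannot assume was performed.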
#3 It's conducted far too late
in the development life cycle
 The perfect time to conduct a software FMECA is
immediately after the first pass of the software
requirements/use cases and before the code is written to
those requirements.
 Typically the first pass of the SRS and use cases is when
the "shalls" are defined.
 The second pass is when the "shall nots" or alternative
flows should be defined.
 The SFMECA can be used to strengthen the requirements
and can even be used as a requirements review tool.
 If SFMECA is conducted after code is written
 Less effective, but there is still time to affect test procedures
 If SFMECA is conducted after testing is finished
 Significantly less effective – can only affect user training or the
next release of software
#4 It's conducted at the
wrong level of abstraction
 Some analysts work through the code one line at a time
and analyze how that single line of code could fail.
 For software functions that are associated with
particularly high hazards that may be appropriate but not
necessarily sufficient.
 When analyzing one line of code at a time the analyst
misses the failure modes caused by
 1) required code that is missing altogether
 2) defects that span more than one line of code.
 Effective software FMECAs focus on the requirements, use
cases, interfaces, detailed design and usability.
Focusing at too high or too low a level of
abstraction
Levels of abstraction, from highest to lowest:
 System requirements
 Software requirements
 Software interface design
 Software design – state diagrams, timing diagrams, sequence diagrams, DB design, GUI design
 Module and class design
 Functions, procedures (code)
 Line of code
Too high: Not enough coverage across the software and not enough coverage of design or software only requirements.
Too low: Analyzing one line of code at a time has potential to miss the design and requirements related faults.
#5 The most common failure
modes aren't considered
 The most common failure modes that apply to all
software intensive systems are:
 Faulty functionality - missing required functionality, function
doesn't work as required
 Faulty processing - can't perform after an interruption of
service or extended usage
 Faulty error handling - doesn't handle hardware, interfaces,
software or user faults
 Faulty state management - executes when it shouldn't,
encounters dead states, faulty state transitions, etc.
 Faulty timing - race conditions, a function executes too
early, too late, accumulates timing errors when left on too
long, etc.
 Faulty data - missing, corrupt, improperly sized, improperly
formatted, improperly scaled data isn't handled
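The six failure modes above can be applied as a checklist so that none is skipped for any use case. The following is a rough sketch of such a worksheet; the `FailureMode` enum, the row layout and both function names are hypothetical and are not a standard SFMEA format.

```python
from enum import Enum
from typing import Dict, List


class FailureMode(Enum):
    FUNCTIONALITY = "Faulty functionality"
    PROCESSING = "Faulty processing"
    ERROR_HANDLING = "Faulty error handling"
    STATE_MANAGEMENT = "Faulty state management"
    TIMING = "Faulty timing"
    DATA = "Faulty data"


def worksheet(use_cases: List[str]) -> List[Dict[str, str]]:
    """Create one blank row per (use case, failure mode) pair."""
    return [
        {"use_case": use_case, "failure_mode": mode.value, "root_causes": ""}
        for use_case in use_cases
        for mode in FailureMode
    ]


def unanalyzed(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows the analyst has not yet filled in."""
    return [row for row in rows if not row["root_causes"].strip()]
```

For example, `worksheet(["Move turret"])` yields six rows, and `unanalyzed` shows which failure modes still have no root causes recorded against them.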
Just a few examples of failure modes
that cause major failure events
Failure event | Associated failure mode
Several patients suffered radiation overdose from the Therac 25 equipment in the mid-1980s. [THERAC] | Faulty timing - A race condition combined with ambiguous error messages and missing hardware overrides.
AT&T long distance service was down for 9 hours in January 1990. [AT&T] | Faulty sequencing - An improperly placed “break” statement was introduced into the code while making another change.
Ariane 5 explosion in 1996. [ARIAN5] | Faulty data - An unhandled mismatch between 64 bit and 16 bit format.
NASA Mars Climate Orbiter crash in 1999. [MARS] | Faulty data - Metric/English unit mismatch. Mars Climate Orbiter was written to take thrust instructions using the metric unit Newton (N), while the software on the ground that generated those instructions used the Imperial measure pound-force (lbf).
On October 8th, 2005, the European Space Agency's CryoSat-1 satellite was lost shortly after launching. [CRYOSAT] | Faulty functionality - Flight Control System code was missing a required command from the on-board flight control system to the main engine.
A rail car fire in a major underground metro system in April 2007. [RAILCAR] | Faulty error handling - Missing error detection and recovery by the software.
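Two of the "Faulty data" entries, the 64 bit to 16 bit mismatch and the N versus lbf mismatch, come down to conversions that were not checked. The sketch below shows generic checked conversions only; it does not reproduce the Ariane 5 or Mars Climate Orbiter code, and the function names are hypothetical.

```python
import math

LBF_TO_NEWTON = 4.4482216152605  # exact by definition of the pound-force
INT16_MIN, INT16_MAX = -32768, 32767


def thrust_in_newtons(value: float, unit: str) -> float:
    """Convert a thrust value to newtons, refusing unknown or missing units."""
    if unit == "N":
        return value
    if unit == "lbf":
        return value * LBF_TO_NEWTON
    raise ValueError(f"unsupported thrust unit: {unit!r}")


def to_int16(value: float) -> int:
    """Narrow a float to a 16-bit signed integer, failing loudly on overflow."""
    if not math.isfinite(value):
        raise OverflowError("non-finite value cannot be narrowed")
    rounded = round(value)
    if not INT16_MIN <= rounded <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return rounded
```

Raising an error is only half of the handling. The SFMECA should also ask what the caller does when the conversion is refused, since an unhandled exception is itself a faulty error handling failure mode.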
Additional references
 Effective Application of Software Failure Modes Effects
Analysis
 This book provides practical guidance and examples for
conducting effective software FMECAs.
 If you want to learn more, attend the 2 day software
FMECA technical interchange or the online self guided
software FMEA training.