Software Engineering - Fault Tree
CIS 2341 Lesson 5 Failure Mode and Effect Analysis.
References
Failure Mode Effect Analysis: FMEA from Theory to Execution, D.H. Stamatis
The FMEA Pocket Handbook, Kenneth W. Dailey
Softrel.com
Effects of Failures
If we only assured 99.9% quality in the US, the failure rate would result in the following effects:
Risk reduction via FMEA
The elimination, control or reduction of risk is a total commitment by the entire organization, and it is more often than not the responsibility of the engineering department. FMEA is a specific methodology to evaluate a system, design, process, or service for possible ways in which failures (problems, errors, risks, concerns) can occur.
Why conduct an FMEA? Benefits of executing this method:
Helps to define the most significant opportunity for achieving fundamental differentiation
Improves the quality, reliability, and safety of a product or service
Helps to select alternatives with high reliability and high safety potential during early phases of the system development life cycle
Improves the company’s image and competitiveness
Helps to increase customer satisfaction
Helps to determine redundancy of the system
Helps to establish the forum for defect prevention
Helps to define the corrective action
Establishes a priority for design improvement actions
Lists potential failures and identifies the relative magnitude of their effects
Provides the basis for the test program during development and final validation of the system, design, process or service
What is a failure mode
A failure mode is the effect by which a failure is observed in a system component. It is important that all possible or potential failure modes of a system be listed as this is the essential basis of the FMEA.
For new components, reference can be made to other components with similar functions and structures and tests performed on them
For commonly used components already in service, records on their performance, reported failures, and existing tests can be consulted
Complex components that can be broken down into elements can be analyzed qualitatively, treating each systems
Two common ways of classifying failure modes
1. By identifying general failure modes (such as premature operation, time out, failure during operation, failure to cease operation)
2. By listing as completely as possible all generic failure modes (such as erroneous input, loss of output, security issues, communications, etc.)
System evaluation
Evaluation of a system using the FMEA method
Standards methodology
Requirements
Development
Quality Assurance Evaluation
Evaluation Reports
Quality Assurance tracks corrective action
The process of FMEA
To conduct FMEA effectively, one must follow a systematic approach.
Select the team and brainstorm
Functional block diagram and/or process flowchart
Prioritize
Data Collection
Analysis
Results
Confirm, evaluate, measure
Repeat (do it all over again)
What happens after the completion of FMEA?
Is the problem identification specific?
Was the effect, symptom or root cause identified?
Is the corrective action measurable?
Is the corrective action proactive?
Is the corrective action realistic and sustainable?
FMEA Process
The basis of FMEA
Information necessary to perform the FMEA:
The different system elements with their characteristics
The connection between elements, tasks, components
Redundancy level and nature of redundancy
Data pertaining to functions, characteristics and performance
A failure mode is the effect by which a failure is observed in a system component. It is important that all possible or potential failure modes of a system be listed.
GENERAL FAILURE MODES
Premature operation
Failure to operate at a prescribed time
Failure to cease operation at a prescribed time
Failure during operation
Failure to start, stop, switch, close
Loss of input, loss of output, erroneous input/output
Tolerance failures
Code errors
Security issues
Intermittent operation
Identification of Failure Modes
Identification of failure modes, their causes and effects, their relative importance, and their sequence:
The operation of a successful FMEA is dependent on the performance of critical system elements. The key to evaluation of system performance is the identification of critical elements. The procedures for identifying failure modes, their causes and effects can be effectively enhanced by the preparation of a list of failure modes anticipated in view of:
System usage
Mode of operation
Pertinent operation specifications
Time constraints
Environment
Failure Mode Checklist (example)
Logic Missing
Are all constants defined and used?
Are all defaults checked explicitly (blanks in an input field)?
If character strings are created, are they complete? Are delimiters used and necessary?
If a keyword has many unique values, are they all checked?
Are all keywords tested in a macro?
Are all keyword related parameters tested in a service routine?
Are all increment counts properly initialized?
After processing a data entry table, should any value be decremented/incremented?
Is provision made for possible processing at logical checkpoints (end-of-file, etc.)?
If a queue is being manipulated, can the execution be interrupted?
After queuing/de-queuing, should any value be decremented or incremented?
Should any registers be saved on entry?
Should any registers be restored on exits?
Logic Wrong
Are literals used where there should be constant data names?
On comparison of group items, should all fields be compared?
Are internal variables unique?
Logic Extra
Are all data areas necessary?
Does this module contain redundant logic?
Control block definition/usage missing
Are pointers declared as XX bit pointers?
Is the bit configuration for input/output parameters defined?
Is the field property defined in the control block/data area?
If the design depends on building/creating/deleting various control blocks/data areas, is that provided for in the code?
Bits, Byte, Reset etc.
Initialize all variables before usage – never assume zeroes
Initialize all fields of a control block, do not leave garbage
Reserved fields must be initialized to zero
Early termination – pointer values not reset
First buffer released, but not others
Data types, variable lengths
When defining counters, make sure boundaries are sufficient
Make code data independent whenever possible
Permutations and parameter values, labels
Parms passed in wrong order
Update return code on error conditions
Missing parameters (comma missing, moved/copied code etc.)
Duplicate labels
Made-up labels as coder went along
Loop logic errors
Consider all flags on each iteration
Consider 3 loop conditions: 1st pass, last pass, middle iteration
Initialize all flags and counters before entering loop
Increment counters on each iteration
Update all pointers on each iteration
Wrong bit checked
DO WHILE instead of DO UNTIL
OR instead of AND on IF statement
Tested OFF instead of ON
X'YY' should have been X'10'
Resetting of Bits in Wrong Place
Flag set in control block at wrong time
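Several of the loop checks above can be illustrated with a short Python sketch. The function name and the (enabled, ready) tuples are hypothetical, chosen only for illustration. The sketch initializes its flag and counter before the loop, uses AND rather than OR in both conditions, and is exercised on the first pass, a middle iteration, and the last pass.

```python
def find_first_ready(items):
    """Return the index of the first item that is enabled AND ready, or -1."""
    index = 0        # counter initialized before entering the loop
    found = False    # flag initialized before entering the loop

    # AND is required here: with OR, the loop would run past the end of the
    # list whenever no item matches and raise an IndexError.
    while index < len(items) and not found:
        enabled, ready = items[index]
        if enabled and ready:      # both flags must be ON, not just one
            found = True
        else:
            index += 1             # counter advanced on every non-matching pass

    return index if found else -1


# The three loop conditions from the checklist, plus the degenerate cases.
assert find_first_ready([(True, True), (False, True)]) == 0                  # first pass
assert find_first_ready([(True, False), (True, True), (False, False)]) == 1  # middle iteration
assert find_first_ready([(False, True), (True, True)]) == 1                  # last pass
assert find_first_ready([]) == -1                                            # loop never entered
assert find_first_ready([(True, False)]) == -1                               # no match
```

Writing one assertion per loop condition keeps the checklist items visible in the tests themselves.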
13 Easy Steps of FMEA
Create a detailed Component List
Identify functions
Identify failure modes
Describe the effects
Assign Severity ranking
Identify root causes
Assign occurrence ranking (OR)
Identify current design controls
Assign detection ranking (DR)
Calculate Risk Priority Number (RPN)
Sort RPN from high to low
Set action items and take corrective action
Recalculate the resulting RPN and return to step 10
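The ranking and sorting steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming the common convention that the RPN is the product of the severity, occurrence, and detection rankings, each on a 1 to 10 scale. The class name, the components, the failure modes, and the rankings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of an FMEA worksheet (all example values are hypothetical)."""
    component: str
    mode: str
    severity: int    # assumed scale: 1 (negligible) to 10 (most severe)
    occurrence: int  # assumed scale: 1 (unlikely) to 10 (almost certain)
    detection: int   # assumed scale: 1 (almost always detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number = severity * occurrence * detection
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Sort failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


worksheet = [
    FailureMode("login form", "erroneous input accepted", 8, 4, 3),
    FailureMode("session store", "loss of output", 6, 2, 5),
    FailureMode("payment call", "failure to cease operation", 9, 3, 7),
]

for fm in rank_by_rpn(worksheet):
    print(f"{fm.rpn:4d}  {fm.component}: {fm.mode}")
```

After corrective action, the rankings of the affected rows would be updated and the list sorted again, which corresponds to the recalculation in the last step.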
Example 1 of Fault Tree Analysis
Module/function : a user accessing an application on a web site (any with authorization)
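A fault tree for a module like this one can also be written down and evaluated in code. The sketch below assumes a tree built only from AND and OR gates over basic events. The event names and the gate structure are hypothetical and are not taken from the lesson's example.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Gate:
    """An intermediate or top event whose inputs are combined by AND or OR."""
    name: str
    kind: str                          # "AND" or "OR"
    inputs: List[Union["Gate", str]]   # a plain string is a basic event


def occurs(node: Union[Gate, str], failed: Set[str]) -> bool:
    """Return True if `node` occurs, given the set of basic events that failed."""
    if isinstance(node, str):
        return node in failed
    results = [occurs(child, failed) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the user cannot access the application if authentication
# fails OR if both the primary and the backup server are down.
top = Gate("user cannot access application", "OR", [
    Gate("authentication fails", "OR", [
        "credential store unreachable",
        "valid session rejected",
    ]),
    Gate("no server available", "AND", [
        "primary server down",
        "backup server down",
    ]),
])

print(occurs(top, {"primary server down"}))                        # False
print(occurs(top, {"primary server down", "backup server down"}))  # True
print(occurs(top, {"credential store unreachable"}))               # True
```

An AND gate models redundancy: the top event is reached through that branch only when every redundant element has failed.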
Example 2 of Fault Tree Analysis
Example 3 of Fault Tree Analysis
Critical Failure modes and effects
Security
Failures: any security breach or penetration is a critical failure. Effect: loss of business, loss of Walmart brand recognition, and potentially legal liability. Example: injection attack possibility or email fraud at <input value="" placeholder="Email address" title="Email address" data-tl-id="footer-GlobalEmailSignup-formInput" class="form-control " data-reactid=".2glvuk2txc0.1.0.3.0.1.0.0.1.0.0.1.2.1">
Data
Failures: any significant data corruption or data transfer failure is critical. Effect: loss of consumer data, loss of advertising data, and ultimately loss of Walmart revenue. Example: source data transferred from advertisers can become corrupted <img src="" alt="" data-triggered="0" data-beacon-src="//beam.hlserve.com/b/I8K39PA7KUCqLL0niGw31Q?hlpt=H&fid=96&pageguid=fbcc557f-1ebf-40ba-8b8e-
Navigation
Failures: any inaccessible link or non-working clickable image is critical. Effect: user cannot find the item and cannot complete the purchase, ultimately a revenue loss to Walmart. Example: clickable link does not navigate the user to the correct target page <img src="https://tpc.googlesyndication.com/simgad/17347722438563713569" border="0" width="300" height="250" alt="" class="img_ad">
Infrastructure
Failures: any significant outage, whether a network or a server failure, is critical. Effect: user cannot operate the web site, ultimately a revenue loss to Walmart. Example: the DNS server goes down and becomes unresponsive href="//beacon.walmart.com" rel="dns-prefetch"/><link
Practice
1. Prepare a Fault Tree analysis of the www.zillow.com home page
2. Identify how many critical failure modes can occur in the feature set you selected and record the effects of each.
Homework
Reading:
The FMEA Pocket Handbook, Kenneth W. Dailey
Writing, individual submission: will be posted in Moodle
1. Prepare a Fault Tree analysis of the www.ndnu.edu/admissions/request-info-freshman page.
2. Identify how many critical failure modes can occur in the feature set you selected and record the effects of each.