
Fault Tree Analysis

Fault Tree Analysis (FTA) is a popular and productive hazard identification tool. It provides a standardized discipline to evaluate and control hazards. The FTA process is used to solve a wide variety of problems ranging from safety to management issues.

This tool is used by the professional safety and reliability community to both prevent and resolve hazards and failures. Both qualitative and quantitative methods are used to identify areas in a system that are most critical to safe operation. Either approach is effective. The output is a graphical presentation providing technical and administrative personnel with a map of "failure or hazard" paths. FTA symbols may be found in Figure 8-5. The reviewer and the analyst must develop an insight into system behavior, particularly those aspects that might lead to the hazard under investigation.

Qualitative FTAs are cost-effective and invaluable safety engineering tools. Generating a qualitative fault tree is always the first step. Quantitative approaches multiply the usefulness of the FTA, but they are more expensive and often very difficult to perform.

An FTA (similar to a logic diagram) is a "deductive" analytical tool used to study a specific undesired event such as "engine failure." The deductive approach begins with a defined undesired event, usually a postulated accident condition, and systematically considers all known events, faults, and occurrences that could cause or contribute to the occurrence of the undesired event. Top-level events may be identified through any safety analysis approach, through operational experience, or through a "Could it happen?" hypothesis. The procedural steps for performing an FTA are:

  1. Assume a system state, then identify and clearly document the top-level undesired event(s). This is often accomplished by using the PHL or PHA. Alternatively, design documentation such as schematics, flow diagrams, and level B & C documentation may be reviewed.
  2. Develop the upper levels of the tree via a top-down process. That is, determine the intermediate failures and the combinations of failures or events that are the minimum needed to cause the next higher level event to occur. The logical relationships are graphically generated, as described below, using standardized FTA logic symbols.
  3. Continue the top-down process until the root causes for each branch are identified and/or until further decomposition is not considered necessary.
  4. Assign probabilities of failure to the lowest-level event in each branch of the tree. These may come from predictions, allocations, or historical data.
  5. Establish a Boolean equation for the tree using Boolean logic and evaluate the probability of the undesired top-level event.
  6. Compare the result to the system-level requirement. If the requirement is not met, implement corrective action. Corrective actions vary from redesign to analysis refinement.
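Steps 4 through 6 can be sketched in Python for a very small tree. This is a minimal illustration, not a production FTA tool: it assumes the basic events are statistically independent and that no basic event appears more than once in the tree. The event names, the probabilities in `BASIC_P`, and the `REQUIREMENT` threshold are all hypothetical values chosen for illustration.

```python
from math import prod

# Step 4 (hypothetical values): probabilities assigned to the lowest-level events.
BASIC_P = {"pump": 1e-4, "filter": 5e-5, "ign_a": 1e-3, "ign_b": 1e-3}

# A node is either a basic-event name (str) or a (gate, [children]) tuple.
TREE = ("OR", [
    ("OR", ["pump", "filter"]),    # fuel flow failure
    ("AND", ["ign_a", "ign_b"]),   # redundant ignition: both must fail
])

REQUIREMENT = 1e-3  # hypothetical system-level requirement used in step 6


def probability(node, p):
    """Step 5: evaluate a node's probability gate by gate.

    Valid only if basic events are independent and none is repeated;
    trees with repeated events need a minimal-cut-set reduction first.
    """
    if isinstance(node, str):
        return p[node]
    gate, children = node
    child_ps = [probability(child, p) for child in children]
    if gate == "AND":
        return prod(child_ps)                      # all inputs must occur
    if gate == "OR":
        return 1 - prod(1 - q for q in child_ps)   # at least one input occurs
    raise ValueError(f"unknown gate: {gate!r}")


p_top = probability(TREE, BASIC_P)
# Step 6: compare against the requirement; a miss would trigger corrective action.
print(f"P(top) = {p_top:.3e}; requirement met: {p_top <= REQUIREMENT}")
```

With these illustrative numbers the top-event probability comes out at about 1.5e-4, dominated by the fuel flow branch, so the hypothetical requirement is met.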

The FTA is a graphical logic representation of fault events that may occur to a functional system. This logical analysis must be a functional representation of the system and must include all combinations of system fault events that can cause or contribute to the undesired event. Each contributing fault event should be further analyzed to determine the logical relationships of underlying fault events that may cause them. This tree of fault events is expanded until all "input" fault events are defined in terms of basic, identifiable faults that may then be quantified for computation of probabilities, if desired. When the tree has been completed, it becomes a logic gate network of fault paths, both singular and multiple, containing combinations of events and conditions that include primary, secondary, and upstream inputs that may influence or command the hazardous mode.

A non-technical person can, with minimal training, determine from the fault tree the combinations and alternatives of events that may lead to failure or a hazard. The figure above is a sample fault tree for an aircraft engine failure. In this sample there are three possible causes of engine failure: fuel flow failure, coolant failure, or ignition failure. The alternatives and combinations leading to any of these conditions may also be determined by inspection of the FTA.
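Reading combinations off a tree by inspection is what a small evaluator does mechanically. The sketch below encodes a simplified version of the engine example as described in the text; the figure itself is not reproduced here, so the tree structure and event names are assumptions. It lists which single failures are enough on their own to fail the engine.

```python
# Simplified, hypothetical encoding of the engine example as described in
# the prose; the original figure may contain additional events.
ENGINE_TREE = ("OR", [
    ("OR", ["fuel pump failed", "fuel filter blocked"]),    # fuel flow failure
    ("OR", ["coolant pump seal failed"]),                   # coolant failure
    ("AND", ["ignition 1 failed", "ignition 2 failed"]),    # redundant ignition
])


def occurs(node, failed):
    """True if the event at `node` occurs when the events in `failed` have occurred."""
    if isinstance(node, str):
        return node in failed
    gate, children = node
    results = [occurs(child, failed) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate!r}")


def basic_events(node):
    """List the basic (leaf) events under `node`, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in basic_events(child)]


# Single failures that, alone, cause the top event.
single_points = [e for e in basic_events(ENGINE_TREE) if occurs(ENGINE_TREE, {e})]
print(single_points)
# One ignition failure alone is not enough, because the AND gate needs both.
print(occurs(ENGINE_TREE, {"ignition 1 failed"}))
```

The same `occurs` check can be applied to any candidate combination of failures, which is the automated counterpart of tracing alternatives by eye.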

Based on available data, a probability of occurrence can be assigned to each event. Algebraic expressions can then be formulated to determine the probability of the top-level event occurring. This probability can be compared to acceptable thresholds to determine whether corrective action is necessary and what direction it should take.

The FTA shows the logical connections between failure events and the top-level hazard or event. "Event," as the term is used here, is an occurrence of any kind; hazards and normal or abnormal system operations are examples. For example, both "engine overheats" and "frozen bearing" are abnormal events. Events are shown as some combination of rectangles, circles, triangles, diamonds, and "houses." Rectangles represent events that are a combination of lower-level events. Circles represent events that require no further expansion. Triangles reflect events that are dependent on lower-level events where the analyst has chosen to develop the fault tree further. Diamonds represent events that are not developed further, usually due to insufficient information. Depending upon criticality, it may be necessary to develop these branches further.

In the aircraft engine example, a coolant pump failure may be caused by a seal failure; this level was not developed further. The example does not include a "house," the symbol that illustrates a normal (as opposed to failure) event. If the hazard were "unintentional stowing of the landing gear," a normal condition for the hazard would be the presence of electrical power.
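A house event behaves like a switch rather than a failure: it is set to present or absent, which turns a branch of the tree on or off. The sketch below assumes the landing gear hazard is an AND of the normal condition named in the text (electrical power present) and a hypothetical failure called `spurious retract command` with an illustrative probability; neither that second event nor its value comes from the source.

```python
# House events are normal conditions, set to occurring (True) or not (False).
HOUSE = {"electrical power present": True}

# Hypothetical failure and probability, for illustration only.
FAILURE_P = {"spurious retract command": 2e-5}


def p_unintended_stowing(house, failure_p):
    """AND gate: the hazard needs the normal condition AND the failure.

    A house event contributes a factor of 1 when present and 0 when absent,
    so it switches the whole branch on or off.
    """
    condition = 1.0 if house["electrical power present"] else 0.0
    return condition * failure_p["spurious retract command"]


print(p_unintended_stowing(HOUSE, FAILURE_P))                                # power on
print(p_unintended_stowing({"electrical power present": False}, FAILURE_P))  # power off
```

Setting the house event to absent shows that, in this assumed structure, the hazard cannot occur without electrical power regardless of the failure probability.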

FTA symbols can depict all aspects of National Airspace System (NAS) events. The example reflects a hardware-based problem. More typically, software (incorrect assumptions or boundary conditions), human factors (inadequate displays), and environmental conditions (ice) are also included, as appropriate.

Events can be further broken down into primary and secondary events. A coolant pump failure caused by a bad bearing is a primary event. A pump failure caused by ice, because antifreeze was omitted from the coolant on a cold day, would be a secondary event. The analyst may also distinguish between faults and failures: an ignition turned off at the wrong time is a fault, whereas an ignition switch that will not conduct current is a failure.

Events are linked together by "AND" and "OR" logic gates. OR gates are used in the example for both fuel flow and carburetor failures; for instance, fuel flow failure can be caused by either a failed fuel pump or a blocked fuel filter. An AND gate is used for the ignition failure, illustrating that the ignition systems are redundant: both must fail before ignition failure can cause the engine to fail. These logic gates are called Boolean gates or operators, and Boolean algebra is used for the quantitative approach. In the figure above, the AND and OR gates are numbered sequentially as A# and O#, respectively.
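For the quantitative approach, the two gates translate into simple probability rules. The forms below are standard results that hold only if the input events E_1, ..., E_n are statistically independent; the last two lines apply them to the engine example's fuel flow (OR) and ignition (AND) branches, with the symbols P_pump, P_filter, P_ign,1, and P_ign,2 introduced here for illustration.

```latex
\begin{aligned}
P_{\text{AND}} &= \prod_{i=1}^{n} P(E_i) \\
P_{\text{OR}}  &= 1 - \prod_{i=1}^{n} \bigl(1 - P(E_i)\bigr) \;\le\; \sum_{i=1}^{n} P(E_i) \\
P_{\text{fuel flow}} &= 1 - \bigl(1 - P_{\text{pump}}\bigr)\bigl(1 - P_{\text{filter}}\bigr) \\
P_{\text{ignition}}  &= P_{\text{ign},1}\, P_{\text{ign},2}
\end{aligned}
```

The sum on the right of the OR line is the usual rare-event approximation: it is always an upper bound and is close to the exact value when every P(E_i) is small. The AND line shows why redundancy helps, since the product of two small probabilities is much smaller than either one.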

As previously stated, the FTA is built through a deductive, "top down" process. It is deductive in that it considers combinations of events in the "cause" path, as opposed to the inductive approach, which does not. The process asks a series of logical questions such as "What could cause the engine to fail?" When all causes are identified, the series of questions is repeated at the next lower level, e.g., "What would prevent fuel flow?" Interdependent relationships are established in the same manner.
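The level-by-level questioning described above maps naturally onto a breadth-first walk of a cause map. In this sketch the cause map `CAUSES` is a hypothetical encoding of the engine example; events that do not appear as keys are treated as basic events, where the questioning stops.

```python
from collections import deque

# Hypothetical cause map for the engine example: event -> (gate, immediate causes).
# Events that are not keys are basic events, where the questioning stops.
CAUSES = {
    "engine fails": ("OR", ["fuel flow fails", "coolant fails", "ignition fails"]),
    "fuel flow fails": ("OR", ["fuel pump fails", "fuel filter blocked"]),
    "coolant fails": ("OR", ["coolant pump seal fails"]),
    "ignition fails": ("AND", ["ignition 1 fails", "ignition 2 fails"]),
}


def questions(top, causes=CAUSES):
    """Ask 'What could cause X?' one level at a time (breadth-first, top down)."""
    asked, queue = [], deque([top])
    while queue:
        event = queue.popleft()
        if event not in causes:
            continue  # basic event: no further decomposition
        gate, inputs = causes[event]
        asked.append(f"What could cause '{event}'? {gate} of: {', '.join(inputs)}")
        queue.extend(inputs)  # the next lower level is questioned after this one
    return asked


for q in questions("engine fails"):
    print(q)
```

A breadth-first order is used so that all causes at one level are identified before the questions move down a level, mirroring the procedure in the text.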

When a quantitative analysis is performed, a probability of occurrence is assigned to each event. The values are determined through analytical processes such as reliability predictions, engineering estimates, or the reduction of field data (when available). A completed tree is called a Boolean model. The probability of occurrence of the top-level hazard is calculated by generating a Boolean equation, which expresses the chain of events required for the hazard to occur. Such an equation may reflect several alternative paths. Boolean equations rapidly become very complex, even for simple-looking trees, and usually require computer modeling for solution.
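Computer tools commonly reduce the Boolean equation to its minimal cut sets. The sketch below uses one straightforward technique, written here for illustration rather than taken from the handbook: a top-down expansion (OR gates add alternatives, AND gates combine them) followed by Boolean absorption. The tree `TREE` is hypothetical and deliberately repeats event `A` under two branches to show why the reduction matters.

```python
from itertools import product

# Hypothetical tree in which basic event "A" appears under two branches.
TREE = ("AND", [("OR", ["A", "B"]), ("OR", ["A", "C"])])


def cut_sets(node):
    """Expand a node into cut sets (frozensets of basic events), unreduced."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    expanded = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any input is a cut set of the OR gate.
        return [cs for sets in expanded for cs in sets]
    if gate == "AND":
        # Pick one cut set from every input and merge them.
        return [frozenset().union(*combo) for combo in product(*expanded)]
    raise ValueError(f"unknown gate: {gate!r}")


def minimal_cut_sets(node):
    """Apply absorption (X + X*Y = X): drop any cut set that strictly contains another."""
    unique = set(cut_sets(node))
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


for cs in minimal_cut_sets(TREE):
    print(" AND ".join(sorted(cs)))
```

The result reads as the Boolean equation top = A OR (B AND C). Multiplying gate probabilities directly would treat the two OR branches as independent even though both contain A, and would give the wrong answer; working from the minimal cut sets avoids that.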

In addition to evaluating the significance of a risk and its likelihood of occurrence, FTAs facilitate the presentation of hazards and causes and the discussion of safety issues. They can also contribute to the generation of the Master Minimum Equipment List (MMEL).

The FTA's graphical format is superior to the tabular or matrix format in that the interrelationships are obvious. The graphical format is also a good tool for an analyst who is not knowledgeable about the system being examined. The matrix format is still necessary for a hazard analysis to capture severity, criticality, family tree, probability of event, cause of event, and other information. Because it is a top-down approach, unlike the fault hazard analysis and the FMECA, the FTA may miss some non-obvious top-level hazards.

Evaluating a Fault Tree Analysis

FTA is a technique that can be used for any formal system safety program analysis (PHA, SSHA, O&SHA). The FTA is one of several deductive logic model techniques and is by far the most common. The FTA begins with a stated top-level hazardous/undesired event and uses logic diagrams to identify single events and combinations of events that could cause the top event. The logic diagram can then be analyzed to identify single and multiple events that can cause the top event. Probability of occurrence values are assigned to the lowest events in the tree. FTA uses Boolean algebra to determine the probability of occurrence of the top (and intermediate) events. When properly done, the FTA shows all the problem areas and makes the critical areas stand out. The FTA has two drawbacks:

  1. Depending on the complexity of the system being analyzed, it can be time-consuming and therefore very expensive.
  2. It does not identify all system hazards; it only identifies failures associated with the predetermined top event being analyzed. For example, an FTA will not identify "ruptured tank" as a hazard in a home water heater, although once that top event is chosen, it will show all failures that lead to it. In other words, the analyst must identify the hazards themselves by means other than the fault tree.

The graphic symbols used in an FTA are provided in the figure below.

The first area for evaluation (and probably the most difficult) is the top event. This top event should be very carefully defined and stated. If it is too broad (e.g., aircraft crashes), the resulting FTA will be overly large. On the other hand, if the top event is too narrow (e.g., aircraft crashes due to pitch-down caused by broken bellcrank pin), then the time and expense for the FTA may not yield significant results. The top event should specify the exact hazard and define the limits of the FTA. In this example, a good top event would be "uncommanded aircraft pitch-down," which would center the fault tree around the aircraft flight control system, but would draw in other factors, such as pilot inputs and engine failures. In some cases, a broad top event may be useful to organize and tie together several fault trees.

Some fault trees do not lend themselves to quantification because the factors that tie the occurrence of a second level event to the top event are normally outside the control/influence of the operator (e.g., an aircraft that experiences loss of engine power may or may not crash depending on altitude at which the loss occurs).

A quick evaluation of a fault tree may be possible by looking at the logic gates. Most fault trees will have a substantial majority of OR gates. If a fault tree has too many OR gates, every fault or event may lead to the top event. This may not actually be the case, but a large majority of OR gates will certainly suggest it. An evaluator needs to be sure that logic symbols are well defined and understood. If nonstandard symbols are used, they must not be confused with other symbols.
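Tallying gate types is easy to automate. The sketch below counts AND and OR gates in a nested-tuple tree (a hypothetical representation) and reports the OR share. The handbook gives no numeric cutoff, so `OR_SHARE_FLAG` is purely an illustrative assumption for prompting a closer look, not a pass/fail criterion.

```python
from collections import Counter

OR_SHARE_FLAG = 0.9  # hypothetical review trigger, not a handbook value

# Hypothetical tree: strings are basic events, tuples are (gate, [children]).
TREE = ("OR", [
    ("OR", ["fuel pump fails", "fuel filter blocked"]),
    ("AND", ["ignition 1 fails", "ignition 2 fails"]),
])


def gate_counts(node):
    """Count gates by type in a (gate, [children]) tree."""
    if isinstance(node, str):
        return Counter()
    gate, children = node
    counts = Counter({gate: 1})
    for child in children:
        counts.update(gate_counts(child))
    return counts


counts = gate_counts(TREE)
or_share = counts["OR"] / sum(counts.values())
flag = " (review closely)" if or_share > OR_SHARE_FLAG else ""
print(f"AND={counts['AND']} OR={counts['OR']} OR share={or_share:.0%}{flag}")
```

A high OR share only tells the evaluator where to look; whether every fault really leads to the top event still has to be judged from the system itself.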

Check for proper control of transfers. Transfers are reference numbers that link pages of FTA graphics. Fault trees can be extremely large, requiring many pages and clear interpage references. Occasionally, a transfer number may be changed during fault tree construction; if the corresponding sub-tree does not carry the same transfer number, improper logic will result. Cut sets (minimum combinations of events that lead to the top event) need to be evaluated for completeness and accuracy. Establishing the correct number of cuts and their depth is a matter of engineering judgment.
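A transfer check is a simple cross-reference. In the sketch below, each sub-tree is stored under its transfer number and a leaf written as `("XFER", id)` points to another page; this representation and the sample pages are hypothetical. The check reports references with no matching sub-tree ("dangling") and sub-trees that nothing references ("orphaned"), which together are the symptom of a transfer number changed on only one side.

```python
# Hypothetical multi-page tree. "T2" was renumbered to "T3" on its own page
# but not at the point of reference, the error described in the text.
PAGES = {
    "TOP": ("OR", ["a", ("XFER", "T1"), ("XFER", "T2")]),
    "T1": ("AND", ["b", "c"]),
    "T3": ("OR", ["d", "e"]),
}


def transfer_refs(node):
    """Collect the transfer numbers referenced anywhere under `node`."""
    if isinstance(node, str):
        return set()
    kind, body = node
    if kind == "XFER":
        return {body}
    refs = set()
    for child in body:
        refs |= transfer_refs(child)
    return refs


def check_transfers(pages, root="TOP"):
    """Compare transfer references against the sub-trees actually defined."""
    referenced = set().union(*(transfer_refs(tree) for tree in pages.values()))
    defined = set(pages) - {root}
    return {
        "dangling": sorted(referenced - defined),  # reference with no sub-tree
        "orphaned": sorted(defined - referenced),  # sub-tree nobody points to
    }


print(check_transfers(PAGES))
```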

Each fault tree should include a list of minimum cut sets. Without this list, it is difficult to identify critical faults or combinations of events. For large or complicated fault trees, a computer is necessary to catch all of the cut sets; it is nearly impossible for a single individual to find all of the cut sets. For a large fault tree, it may be difficult to determine whether or not the failure paths were completely developed. If the evaluator is not totally familiar with the system, the evaluator may need to rely upon other means. A good indication is the shape of the symbols at the branch bottom. If the symbols are primarily circles (primary failures), the tree is likely to be complete. On the other hand, if many symbols are diamonds (secondary failures or areas needing development), then it is likely the fault tree needs expansion.
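The symbol-shape heuristic can be tallied mechanically once each bottom-level event is tagged with its symbol. The tags in `SYMBOL`, the event names, and the tree below are hypothetical. As the text says, the count is only an indicator for the evaluator, not proof that the tree is complete.

```python
from collections import Counter

# Hypothetical symbol tags for bottom-level events:
# "circle" = basic (primary) failure, "diamond" = undeveloped event.
SYMBOL = {
    "fuel pump fails": "circle",
    "fuel filter blocked": "circle",
    "coolant pump seal fails": "diamond",
    "ignition 1 fails": "circle",
    "ignition 2 fails": "circle",
}

TREE = ("OR", [
    ("OR", ["fuel pump fails", "fuel filter blocked"]),
    ("OR", ["coolant pump seal fails"]),
    ("AND", ["ignition 1 fails", "ignition 2 fails"]),
])


def leaves(node):
    """Yield the bottom-level events of a (gate, [children]) tree."""
    if isinstance(node, str):
        yield node
        return
    for child in node[1]:
        yield from leaves(child)


mix = Counter(SYMBOL[leaf] for leaf in leaves(TREE))
diamond_share = mix["diamond"] / sum(mix.values())
print(dict(mix), f"diamonds: {diamond_share:.0%}")
```

A rising diamond share points to branches that may need expansion; the undeveloped coolant seal event here is an example.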

Faulty logic is probably the most difficult area to evaluate, unless the faults lie within the gates, which are relatively easy to spot. A gate-to-gate connection shows that the analyst might not completely understand the workings of the system being evaluated. Each gate must lead to a clearly defined specific event, i.e., what is the event and when does it occur? If the event consists of any component failures that can directly cause that event, an OR gate is needed to define the event. If the event does not consist of any component failures, look for an AND gate.
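A gate-to-gate connection can be detected automatically if every gate is stored together with the event it defines. In this hypothetical representation, an intermediate node is `(event_text, gate, inputs)`, and a gate with an empty event text feeds another gate directly. The sketch lists such gates so the analyst can state the missing event.

```python
# Hypothetical representation: intermediate nodes are (event_text, gate, inputs);
# strings are basic events. An empty event_text means the gate has no defined event.
TREE = ("engine fails", "OR", [
    ("fuel flow fails", "OR", ["fuel pump fails", "fuel filter blocked"]),
    ("", "AND", ["ignition 1 fails", "ignition 2 fails"]),  # gate-to-gate
])


def undefined_gates(node, path="top"):
    """Return paths to gates whose output event is not stated."""
    if isinstance(node, str):
        return []
    event, gate, inputs = node
    found = [] if event.strip() else [f"{path}: {gate} gate with no event"]
    for i, child in enumerate(inputs):
        found += undefined_gates(child, f"{path}/{i}")
    return found


print(undefined_gates(TREE))
```

This catches only the structural symptom; deciding whether an event calls for an OR or an AND gate still rests on the evaluator's understanding of the system.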

When reviewing an FTA with quantitative hazard probabilities of occurrence, identify the events with relatively large probability of occurrence. They should be discussed in the analysis summaries, probably as primary cause factors.
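Given the minimal cut sets and basic-event probabilities, the dominant contributors can be ranked. This sketch assumes independent events, takes each cut set's probability as the product of its event probabilities, and compares it with the sum over all cut sets (the rare-event approximation of the top-event probability). The cut sets, the values in `P`, and the 10% `SHARE_FLAG` are hypothetical.

```python
from math import prod

# Hypothetical basic-event probabilities and minimal cut sets.
P = {"pump": 1e-4, "filter": 5e-5, "ign_a": 1e-3, "ign_b": 1e-3}
CUT_SETS = [frozenset({"pump"}), frozenset({"filter"}), frozenset({"ign_a", "ign_b"})]
SHARE_FLAG = 0.10  # hypothetical: flag cut sets contributing more than 10%


def rank_cut_sets(cut_sets, p):
    """Return (cut set, probability, share of total), largest first."""
    probs = [(cs, prod(p[e] for e in cs)) for cs in cut_sets]
    total = sum(q for _, q in probs)  # rare-event approximation of P(top)
    ranked = sorted(probs, key=lambda item: item[1], reverse=True)
    return [(cs, q, q / total) for cs, q in ranked]


ranked = rank_cut_sets(CUT_SETS, P)
for cs, q, share in ranked:
    mark = "  <- discuss in summary" if share > SHARE_FLAG else ""
    print(f"{' AND '.join(sorted(cs)):<16} {q:.1e} {share:6.1%}{mark}")

flagged = [cs for cs, _, share in ranked if share > SHARE_FLAG]
```

With these numbers the single-point fuel pump and filter failures dominate, while the redundant ignition pair contributes well under 1%.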

A large fault tree performed manually is susceptible to errors and omissions. Computer modeling has many advantages over manual analysis of complex systems:

  • Logic errors and event (or branch) duplications can be quickly spotted.
  • Cut sets (showing minimum combinations leading to the top event) can be listed.
  • Numerical calculations (e.g., event probabilities) can be quickly done.
  • A neat, readable fault tree can be drawn.
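As one example of the checks listed above, duplicated events are easy to find by machine. The sketch counts how often each basic event appears in a nested-tuple tree (a hypothetical representation with placeholder event names). A repeated event is not necessarily an error, but it means gate-by-gate probability multiplication is no longer valid and the tree must be reduced through its cut sets.

```python
from collections import Counter

# Hypothetical tree in which basic event "A" appears in two branches.
TREE = ("OR", [
    ("AND", ["A", "B"]),
    ("AND", ["A", "C"]),
    "D",
])


def leaf_counts(node):
    """Count occurrences of each basic event in a (gate, [children]) tree."""
    if isinstance(node, str):
        return Counter([node])
    counts = Counter()
    for child in node[1]:
        counts.update(leaf_counts(child))
    return counts


duplicates = sorted(e for e, n in leaf_counts(TREE).items() if n > 1)
print(duplicates)
```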

Source: FAA System Safety Handbook, Ch. 9.


Copyright ©2000-2016 Geigle Safety Group, Inc. All rights reserved. Federal copyright prohibits unauthorized reproduction by any means without permission. Students may reproduce materials for personal study. Disclaimer: This material is for training purposes only to inform the reader of occupational safety and health best practices and general compliance requirement and is not a substitute for provisions of the OSH Act of 1970 or any governmental regulatory agency. CertiSafety is a division of Geigle Safety Group, Inc., and is not connected or affiliated with the U.S. Department of Labor (DOL), or the Occupational Safety and Health Administration (OSHA).