
Procedure for Failure Modes and Effects Analysis (FMEA)

The procedure for performing an FMEA consists of the following nine steps. Each step is explained in more detail in the sections that follow.

1.0 Define the system of interest. Specify and clearly define the boundaries of the system for which risk-related information is needed.

2.0 Define the problems of interest for the analysis. Specify the problems of interest that the analysis will address. These may include safety issues, failures in systems such as steering or propulsion, etc.

3.0 Choose the type of FMEA approach for the study. Select a hardware approach (bottom-up), functional approach (top-down), or hybrid approach for applying FMEA.

4.0 Subdivide the system for analysis. Section the system according to the type of FMEA approach selected.

5.0 Identify potential failure modes for elements of the system. Define the fundamental ways that each element of the system can fail to achieve its intended functions. Determine which failures can lead to accidents of interest for the analysis.

6.0 Evaluate potential failure modes capable of producing accidents of interest. For each potential failure that can lead to accidents of interest, evaluate the following:

  • The range of possible effects
  • Ways in which the failure mode can occur
  • Ways in which the failure mode can be detected and isolated
  • Safeguards that are in place to protect against accidents resulting from the failure mode

7.0 Perform quantitative evaluation (if necessary). Extend the analysis of potentially important failures by characterizing their likelihood, their severity, and the resulting levels of risk. FMEAs that incorporate this step are referred to as failure modes, effects, and criticality analyses (FMECAs).

8.0 Transition the analysis to another level of resolution (if necessary or otherwise useful). For top-down FMEAs, follow-on analyses at lower (i.e., more detailed) levels of analysis may be useful for finding more specific contributors to system problems. For bottom-up FMEAs, follow-on analyses at higher (i.e., less detailed) levels of analysis may be useful for characterizing performance problems in broader categories. Typically, this would involve system and subsystem characterizations based on previous component-level analyses.

9.0 Use the results in decision making. Evaluate recommendations from the analysis and implement those that will bring more benefits than they will cost over the life cycle of the system.

1.0 Define the system of interest

Intended functions. Because all risk assessments are concerned with ways in which a system can fail to perform an intended function, a clear definition of the intended functions for a system is an important first step.

Boundaries. Few systems operate in isolation. Most are connected to or interact with other systems. By clearly defining the boundaries of a system, especially boundaries with support systems such as electric power and compressed air, analysts can avoid (1) overlooking key elements of a system at interfaces and (2) penalizing a system by associating other equipment with the subject of the study. A diagram or schematic of the system is helpful for identifying boundaries.

2.0 Define the problems of interest for the analysis

Safety problems. The analysis team may be asked to look for ways in which failures in a hardware system may result in personnel injury. These injuries may be caused by many mechanisms, including the following:

  • Steering or propulsion failures
  • Hoist and rigging failures
  • Exposure to high temperatures (e.g., through steam leaks)
  • Fires and explosions

Environmental issues. The analysis team may be asked to look for ways in which the failure of a system can undesirably affect the environment. These environmental issues may be caused by many mechanisms, including the following:

  • Equipment failures that result in an unplanned discharge of material into the water
  • Equipment failures, such as seal failures, that result in a material spill

Economic impacts. The analysis team may be asked to look for ways in which the failure of a system may have adverse economic impacts. These economic risks may be categorized in many ways, including the following:

  • Business risks, such as a vessel detained at port, contractual penalties, or lost revenue
  • Environmental restoration costs
  • Replacement costs, such as the cost of replacing damaged equipment

A particular analysis may focus only on events above a certain threshold of concern in one or more of these categories.
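
The idea of a threshold of concern can be sketched as a simple filter. The Python below is only an illustration: the category names, the consequence scales, the threshold values, and the `above_threshold` helper are all hypothetical and are not taken from the USCG guidelines.

```python
from dataclasses import dataclass


@dataclass
class PotentialEvent:
    description: str
    category: str        # "safety", "environmental", or "economic"
    consequence: float   # hypothetical scale: injuries, barrels spilled, or dollars


# Hypothetical thresholds of concern, one per problem category.
THRESHOLDS = {"safety": 1, "environmental": 10, "economic": 50_000}


def above_threshold(events, thresholds=THRESHOLDS):
    """Keep only events whose consequence meets the threshold for their category."""
    return [
        e for e in events
        if e.category in thresholds and e.consequence >= thresholds[e.category]
    ]


events = [
    PotentialEvent("Steam leak burns a crew member", "safety", 1),
    PotentialEvent("Seal failure spills lube oil", "environmental", 2),
    PotentialEvent("Vessel detained at port", "economic", 120_000),
]

in_scope = above_threshold(events)
for e in in_scope:
    print(e.category, "-", e.description)
```

In this sketch the small spill falls below the assumed environmental threshold, so the analysis would not pursue it further.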

3.0 Choose the type of FMEA approach for the study

Hardware approach (bottom-up). The hardware approach is normally used when hardware items can be uniquely identified from schematics, drawings, and other engineering and design data. The hardware approach typically focuses on the potential failure modes of basic components of the system. This is generally the lowest level of resolution that provides valuable information to decision makers. The hardware approach for defining an FMEA is a good choice when every component of a system must be reviewed (e.g., to make design or maintenance decisions). It can be difficult or inefficient, however, when analyzing (1) complex systems or (2) systems that are not yet well defined at the time of the analysis.

Functional approach (top-down). The functional approach is normally used when hardware items cannot be uniquely identified or when system complexity requires progressive analysis, with each successive level of analysis focusing in more detail on only the most important contributors. This approach focuses on ways in which functional intents of a system may go unsatisfied rather than on the specific failure modes of individual equipment items. The functional approach to an FMEA is particularly effective if the analysis focuses on only a limited set of accidents of interest, or if it must directly address only the most important contributors to potential problems rather than every individual component.

Hybrid of the two. An FMEA may begin with a functional approach and then transition to a focus on equipment, especially equipment that directly contributes to functional failures identified as important. Traditional reliability-centered maintenance analysis uses this hybrid approach, beginning with identification of important system functional failures and then identifying the specific equipment failure modes that produce those system functional failures.

4.0 Subdivide the system by equipment or functions for analysis

This step defines the elements of a system that will provide the basic structure of the initial FMEA. These elements may be equipment items for a hardware approach or intended functions for a functional approach. Example structures for both approaches are illustrated in the following figures.

[Figure: Example of the hardware approach (bottom-up)]

[Figure: Example of the functional approach (top-down)]
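
One way to picture the two kinds of subdivision is as a nested structure whose lowest-level entries become the elements analyzed in the FMEA. The sketch below is an assumption-laden illustration, not the content of the example figures: the compressed air system, its subsystems, and the `leaf_elements` helper are all hypothetical.

```python
def leaf_elements(tree, path=()):
    """Yield the full path to each lowest-level element of a nested dict."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            yield from leaf_elements(children, here)
        else:
            yield here


# Hypothetical hardware breakdown: system -> subsystem -> component.
hardware = {
    "Compressed air system": {
        "Compression subsystem": {"Compressor": {}, "Motor": {}},
        "Distribution subsystem": {"Receiver tank": {}, "Relief valve": {}},
    }
}

# Hypothetical functional breakdown: function -> subfunction.
functional = {
    "Supply compressed air": {
        "Compress intake air": {},
        "Store and distribute air": {},
    }
}

for label, tree in (("hardware", hardware), ("functional", functional)):
    for path in leaf_elements(tree):
        print(label, ":", " > ".join(path))
```

Either tree can be extended to a deeper level later, which is how an analysis would move to another level of resolution.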

5.0 Identify potential failure modes for elements of the system

The list of typical failure conditions above applies to equipment items and functional statements. The next five pages provide examples of these conditions applied to a wide range of typical industrial equipment. Below is an example of the typical failure conditions applied to one functional statement.

6.0 Evaluate potential failure modes capable of producing accidents of interest

Evaluating potential failure modes generally defines the following:

Mission phase/operational mode. A description of how the system is being used. This perspective is important for understanding the impacts of failure modes. More than one mission phase or operational mode may have to be considered for each potential failure mode.

Effects. The accidents that are expected if the failure mode occurs are often divided into the following categories:

  • Local effects. The initial changes in system conditions that will occur if the postulated failure mode occurs
  • Higher level effects. The change in condition of the next higher level of equipment or system function caused by the occurrence of the postulated failure mode
  • End effects. The overall effects on the system, typically related to one or more of the accidents of interest for the analysis. The end effect may be possible only if planned mitigating safeguards for the failure mode also fail

Causes. In a hardware-based FMEA, the causes are typically the failure modes of equipment at the next lower level of resolution for the system, as well as human errors and external events that cause equipment problems at this level of resolution. In a function-based FMEA, the causes are typically lower-level functional failures.

Indications. Indications are the identifiable characteristics that suggest to a crew member or some other inspector or troubleshooter that this failure mode has occurred. Indications can include visual, audible, physical, and odor clues.

Safeguards. Safeguards are the equipment, procedures, and administrative controls in place to help (1) prevent the postulated situation from occurring or (2) mitigate the effects if the situation does occur.

Recommendations/remarks. These are the suggestions for system improvements that the team believes are appropriate. Generally, they are suggestions for additional safeguards.
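
The items evaluated for each failure mode map naturally onto one row of an FMEA worksheet. The following is a minimal sketch of such a row as a Python data structure; the class names, field names, and the steering pump example are hypothetical and are not a format prescribed by the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Effects:
    local: str          # initial change in system conditions
    higher_level: str   # change at the next higher level of equipment or function
    end: str            # overall effect on the system


@dataclass
class FailureModeEntry:
    element: str
    failure_mode: str
    mission_phase: str
    effects: Effects
    causes: List[str] = field(default_factory=list)
    indications: List[str] = field(default_factory=list)
    safeguards: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)


# Hypothetical example row.
entry = FailureModeEntry(
    element="Steering hydraulic pump",
    failure_mode="Fails to run",
    mission_phase="Maneuvering in port",
    effects=Effects(
        local="Loss of hydraulic pressure",
        higher_level="Steering gear does not respond",
        end="Loss of steering; possible collision or grounding",
    ),
    causes=["Motor failure", "Loss of electric power"],
    indications=["Low-pressure alarm", "Rudder does not follow helm"],
    safeguards=["Standby pump", "Emergency steering procedure"],
    recommendations=["Test standby pump changeover periodically"],
)

print(entry.element, "/", entry.failure_mode, "->", entry.effects.end)
```

Because more than one mission phase may apply to a failure mode, an analysis could hold several such rows for the same element and failure mode.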

There are three basic levels of documentation possible for an FMEA:

  • Complete. Full descriptions for failure modes and a complete list of recommendations generated from the analysis
  • Streamlined. Descriptions for failure modes that result in suggestions for improvement, along with the complete list of recommendations generated from the analysis
  • Minimal. Complete list of recommendations generated from the analysis
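
The three documentation levels differ only in which failure mode descriptions are kept; the full list of recommendations appears in all of them. A rough sketch, assuming each worksheet entry is a dictionary with hypothetical `failure_mode` and `recommendations` keys:

```python
from enum import Enum


class DocLevel(Enum):
    COMPLETE = "complete"
    STREAMLINED = "streamlined"
    MINIMAL = "minimal"


def build_report(entries, level):
    """Select what to document for a given documentation level."""
    # Every level carries the complete list of recommendations.
    recommendations = [r for e in entries for r in e["recommendations"]]
    if level is DocLevel.COMPLETE:
        described = entries
    elif level is DocLevel.STREAMLINED:
        # Only failure modes that produced a suggestion for improvement.
        described = [e for e in entries if e["recommendations"]]
    else:
        described = []
    return {
        "failure_modes": [e["failure_mode"] for e in described],
        "recommendations": recommendations,
    }


# Hypothetical worksheet entries.
entries = [
    {"failure_mode": "Pump fails to run", "recommendations": ["Add standby pump test"]},
    {"failure_mode": "Gauge reads high", "recommendations": []},
]

for level in DocLevel:
    print(level.value, build_report(entries, level))
```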

7.0 Perform quantitative evaluation (if necessary)

Quantifying the risks associated with potential failure modes of a system provides more precise results than qualitative analysis alone. It has several benefits, including the following:

  • Overall levels of risk can be judged against risk acceptance guidelines, if such guidelines exist
  • Risk-based prioritization of potential failure modes provides a highly cost-effective way of allocating resources (design, maintenance, etc.) to best manage the most significant risks
  • Risk reductions can be estimated to help justify the cost of recommendations generated during the analysis
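
As an illustration of risk-based prioritization, the sketch below characterizes each failure mode by an assumed frequency and severity and ranks the results. The source does not prescribe a formula; treating risk as frequency multiplied by severity (expected loss per year) is a common simplification assumed here, and the failure modes, numbers, and acceptance guideline are hypothetical.

```python
def risk(frequency_per_year, severity_cost):
    """Assumed risk measure: expected loss per year."""
    return frequency_per_year * severity_cost


# Hypothetical failure modes: (name, frequency per year, cost per occurrence).
failure_modes = [
    ("Pump fails to start", 0.5, 10_000),
    ("Relief valve fails closed", 0.01, 2_000_000),
    ("Gauge reads high", 2.0, 500),
]

# Rank from highest to lowest risk.
ranked = sorted(failure_modes, key=lambda fm: risk(fm[1], fm[2]), reverse=True)

# Hypothetical risk acceptance guideline, in expected loss per year.
ACCEPTABLE_RISK = 10_000
exceeds_guideline = [fm[0] for fm in ranked if risk(fm[1], fm[2]) > ACCEPTABLE_RISK]

for name, freq, cost in ranked:
    print(f"{name}: about {risk(freq, cost):,.0f} per year")
print("Above guideline:", exceeds_guideline)
```

Note that the rare but severe relief valve failure ranks above the more frequent pump failure, which a purely frequency-based ranking would miss.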

8.0 Transition the analysis to another level of resolution (if necessary or otherwise useful)

Hardware approach (bottom-up). Summaries of important issues at higher levels (systems and subsystems) are sometimes needed. When this type of information is needed, the results of lower-level analyses may be compiled into composite analyses for the higher levels. This includes composite risk characterizations.
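
A composite risk characterization for a bottom-up FMEA can be sketched as a roll-up of component-level results to the subsystem level. The example assumes that component risks are expressed in the same units (here, expected loss per year) and can simply be added; the subsystems, components, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical component-level results: (subsystem, component, risk per year).
component_risks = [
    ("Propulsion", "Fuel pump", 4_000),
    ("Propulsion", "Gearbox", 1_500),
    ("Steering", "Hydraulic pump", 2_500),
]


def roll_up(rows):
    """Sum component-level risks into a composite figure for each subsystem."""
    totals = defaultdict(int)
    for subsystem, _component, component_risk in rows:
        totals[subsystem] += component_risk
    return dict(totals)


subsystem_risks = roll_up(component_risks)
system_risk = sum(subsystem_risks.values())

print(subsystem_risks)
print("System total:", system_risk)
```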

Functional approach (top-down). Further subdivision and analysis of system functions occur only if decision makers need information at a more detailed level. Often, only a few areas must be expanded further.

9.0 Use the results in decision making

System improvements. FMEA results generally present a number of specific, practical suggestions for reducing accident exposure associated with a specific system. These suggestions often cover a range of issues from changes in design configuration and equipment specifications to better operating and maintenance practices. The qualitative and quantitative results from FMEAs also present the case for implementing the suggestions.
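
The test of whether a recommendation brings more benefit than it costs over the life cycle of the system can be sketched as a simple comparison. This is a rough illustration under stated assumptions: benefits are taken as the reduction in expected annual loss, no discounting is applied, and the `net_benefit` helper and all figures are hypothetical.

```python
def net_benefit(risk_before, risk_after, upfront_cost, annual_cost, years):
    """Avoided expected loss minus total cost over the life cycle (no discounting)."""
    avoided_loss = (risk_before - risk_after) * years
    total_cost = upfront_cost + annual_cost * years
    return avoided_loss - total_cost


# Hypothetical recommendation: add a redundant relief valve.
result = net_benefit(
    risk_before=20_000,   # expected loss per year without the safeguard
    risk_after=5_000,     # expected loss per year with the safeguard
    upfront_cost=50_000,
    annual_cost=1_000,
    years=10,
)

print("Net benefit over life cycle:", result)
print("Implement" if result > 0 else "Do not implement on cost grounds alone")
```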

Maintenance task planning. One very prominent use of FMEAs is in maintenance task planning. Approaches like reliability-centered maintenance and other similar tools use the systematic analysis of FMEA as a basis for establishing effective maintenance plans.

Spare parts inventories. Another prominent use of FMEAs is in determining the types and numbers of spare parts to have on hand.

Troubleshooting guidelines. FMEAs that address indications and isolation of failures contain the information needed to develop highly effective troubleshooting guidelines.

Source: USCG Risk-based Decision-making (RBDM) Guidelines.


Copyright ©2000-2019 Geigle Safety Group, Inc. All rights reserved. Federal copyright prohibits unauthorized reproduction by any means without permission. Disclaimer: This material is for training purposes only to inform the reader of occupational safety and health best practices and general compliance requirements and is not a substitute for provisions of the OSH Act of 1970 or any governmental regulatory agency. CertiSafety is a division of Geigle Safety Group, Inc., and is not connected or affiliated with the U.S. Department of Labor (DOL), or the Occupational Safety and Health Administration (OSHA).