Accident Investigation

Hazard Analysis Basics

What is the Role of the Hazard Analysis?

Hazard analyses are performed to identify and define hazardous conditions/risks for the purpose of their elimination or control. Analyses examine the system, subsystems, components, and interrelationships.

Steps in performing a hazard analysis:

  1. Describe and bound the system in accordance with system description instructions in Chapter 3.
  2. Perform functional analysis if appropriate to the system under study.
  3. Develop a preliminary hazard list.
  4. Identify contributory hazards, initiators, or any other causes.
  5. Establish hazard control baseline by identifying existing controls when appropriate.
  6. Determine potential outcomes, effects, or harm.
  7. Perform a risk assessment of the severity of consequence and likelihood of occurrence.
  8. Rank hazards according to risk.
  9. Develop a set of recommendations and requirements to eliminate or control risks.
  10. Provide managers, designers, test planners, and other affected decision makers with the information and data needed to permit effective trade-offs.
  11. Conduct hazard tracking and risk resolution of medium and high risks. Verify that recommendations and requirements identified in Step 9 have been implemented.
  12. Demonstrate compliance with given safety related technical specifications, operational requirements, and design criteria.
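
As a minimal sketch of steps 8 and 11 above, the following Python fragment keeps a small hazard log, ranks it by risk, and lists the medium and high risks whose recommendations have not yet been verified. The record fields, the three risk labels, and the sample hazards are illustrative assumptions, not part of the source procedure.

```python
from dataclasses import dataclass

# Assumed ordering of risk labels, highest first (illustrative only).
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class HazardRecord:
    hazard_id: str
    description: str
    risk: str                  # "high", "medium", or "low"
    recommendation: str
    implemented: bool = False  # set True once the fix has been verified


def rank_hazards(log):
    """Step 8: order records from highest to lowest risk."""
    return sorted(log, key=lambda record: RISK_ORDER[record.risk])


def open_items(log):
    """Step 11: medium and high risks still awaiting verification."""
    return [
        record
        for record in rank_hazards(log)
        if record.risk in ("high", "medium") and not record.implemented
    ]


# Hypothetical sample log.
log = [
    HazardRecord("H-1", "Pinch point at conveyor", "medium", "Install guard"),
    HazardRecord("H-2", "Trip hazard in aisle", "low", "Re-route cable"),
    HazardRecord("H-3", "Exposed live terminal", "high", "Add cover", implemented=True),
    HazardRecord("H-4", "Unvented solvent storage", "high", "Add ventilation"),
]

for record in open_items(log):
    print(record.hazard_id, record.risk, record.recommendation)
```

Low risks stay in the log but are left out of the tracking list, which mirrors the wording of step 11.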

What are the Basic Elements of a Hazard Analysis?

The analytical approach to safety requires four key elements if the resulting output is to impact the system in a timely and cost effective manner. They are:

  1. Hazard identification
  2. Hazard evaluation
  3. Hazard resolution
  4. Timely solutions

These concepts are described in detail below:

Identification of a risk is the first step in the risk control process. Identifying a risk provides no assurance that it will be eliminated or controlled. The risk must be documented, evaluated (likelihood and severity), and when appropriate, highlighted to those with decision making authority.

Evaluation of risks requires determination of how frequently a risk occurs and how severe it could be if an accident occurs as a result of the hazards. A severe risk that has a realistic possibility of occurring requires action; one that has an extremely remote chance may not require action. Similarly, a non-critical accident that has a realistic chance of occurring may not require further study. Frequency may be characterized qualitatively by terms such as "frequent" or "rarely." It may also be measured quantitatively, such as by a probability (e.g., one in a million flight hours). In summary, the evaluation step prioritizes and focuses the system safety activity and maximizes the return-on-investment for safety expenditures.
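
One way to picture this evaluation step is a simple lookup that combines a severity category with a likelihood category. The sketch below is illustrative only: the category names, the additive scoring, and the thresholds are assumptions chosen for the example and are not prescribed by the source.

```python
# Assumed qualitative scales, ordered from least to most serious.
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]
LIKELIHOOD = ["extremely remote", "remote", "probable", "frequent"]


def risk_level(severity, likelihood):
    """Combine the two category positions into a coarse risk level.

    The score is the sum of the two list indexes (0 through 6);
    the cut-off values below are arbitrary illustrative thresholds.
    """
    score = SEVERITY.index(severity) + LIKELIHOOD.index(likelihood)
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# A severe outcome with a realistic chance of occurring ranks highest.
print(risk_level("catastrophic", "probable"))
# The same outcome with an extremely remote chance ranks lower.
print(risk_level("catastrophic", "extremely remote"))
# A minor outcome that is unlikely to occur ranks lowest.
print(risk_level("negligible", "remote"))
```

A real program would take its categories and acceptance thresholds from the governing safety standard or program plan rather than from fixed values like these.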

The timing of safety analysis and resulting corrective action is critical to minimizing the impact on cost and schedule. The later in the equipment life cycle that safety modifications are incorporated, the greater the impact on cost and schedule. The analysis staff should work closely with the designers and feed recommendations or, at a minimum, objections back to them as soon as they are identified. A safe design is the end product, not a hazard analysis. When the analysts work closely with the design team, hazards can be eliminated or controlled in the most efficient manner. An inefficient alternative is for the safety engineer to work alone, perform an independent safety analysis, and formally report the results. This approach has several disadvantages.

Significant risks will be corrected later than they would be if the design engineer were alerted to the problem shortly after the safety engineer detected it. The delay requires a more costly fix, leads to program resistance to change, and may result in a less effective control. In addition, the published risk may not be as severe as the safety engineer, operating in a vacuum, determined it to be, or it may already have been overcome by subsequent design evolution.

Once the risks have been analyzed and evaluated, the remaining task of safety engineering is to follow the development and verify that the agreed-upon safety requirements are met by the design or that the risks are controlled to an acceptable level.

What is the Relationship Between Safety and Reliability?

Reliability and system safety analyses complement each other. Each can provide the other with more information than either obtains on its own. One can rarely be substituted for the other, but when performed in collaboration they can lead to better and more efficient products.

Two reliability analyses (one a subset of the other) are often compared to hazard analyses. Performance of a Failure Modes and Effects Analysis (FMEA) is the first step in generating the Failure Modes, Effects, and Criticality Analysis (FMECA). Either analysis can serve as a final product, depending on the situation. An FMECA is generated from an FMEA by adding a criticality figure of merit. These analyses are performed to obtain reliability and supportability information.

A hazard analysis uses a top-down methodology that first identifies risks and then isolates all possible (or probable) causes. For an operational system, it is performed for specific suspect hazards. In the case of the hazard analysis, failures, operating procedures, human factors, and transient conditions are included in the list of hazard causes.

The FMECA is limited even further in that it only considers hardware failures. It may be performed either top-down or bottom-up, usually the latter. It is generated by asking questions such as "If this fails, what is the impact on the system? Can I detect it? Will it cause anything else to fail?" If so, the induced failure is called a secondary failure.
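
The relationship between the two worksheets can be sketched as a record that answers the three questions above, extended with a criticality figure. The source does not define the figure of merit, so the product of a severity weight and a failure rate used here is purely an assumption for illustration, as are the field names and sample values.

```python
from dataclasses import dataclass, field


@dataclass
class FmeaRow:
    component: str
    failure_mode: str
    system_effect: str   # "What is the impact on the system?"
    detectable: bool     # "Can I detect it?"
    # "Will it cause anything else to fail?" (secondary failures)
    secondary_failures: list = field(default_factory=list)


@dataclass
class FmecaRow(FmeaRow):
    severity_weight: int = 1        # assumed scale: larger is worse
    failures_per_hour: float = 0.0  # from a reliability prediction

    @property
    def criticality(self):
        """Assumed figure of merit: severity weight times failure rate."""
        return self.severity_weight * self.failures_per_hour


# Hypothetical worksheet entries.
rows = [
    FmecaRow("Pump", "Loss of pressure", "Brake assist lost", True,
             ["Accumulator depletion"], severity_weight=3, failures_per_hour=2e-5),
    FmecaRow("Indicator lamp", "Open filament", "No status display", False,
             severity_weight=1, failures_per_hour=1e-4),
]

# List the failure modes from most to least critical.
for row in sorted(rows, key=lambda r: r.criticality, reverse=True):
    print(row.component, row.failure_mode, row.criticality)
```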

Reliability predictions establish either a failure rate for an assembly (or component) or a probability of failure. This quantitative data, at both the component and assembly level, is a major input to quantitative reliability analysis, and it must be understood to be used correctly. In summary, however, hazard analyses are first performed in a qualitative manner, identifying risks, their causes, and the significance of the hazards associated with each risk.

What General Procedures Should Be Followed in the Performance of a Hazard Analysis?

1. Establish safety requirements baseline and applicable history (i.e., system restraints):

  • Specifications/detailed design requirements
  • Mission requirements (e.g., How is it supposed to operate?)
  • General statutory regulations (e.g., noise abatement)
  • Human factors standardized conventions (e.g., switches "up" or "forward" for on)
  • Accident experience and failure reports

2. Identify general and specific potential accident contributory factors (hazards):

  • In the equipment (hardware, software, and human)
  • Operational and maintenance environment
  • Human machine interfaces (e.g., procedural steps)
  • Operation
  • All procedures
  • All configurations (e.g., operational and maintenance)

3. Identify risks for each contributory factor (e.g., risks caused by the maintenance environment and the interface hazards). An example would be performing maintenance tasks incompatible with gloves in a very cold environment.

4. Assign severity categories and determine probability levels. Risk probability levels may be assigned either qualitatively or quantitatively. Risk severity is determined through hazard analysis and reflects, using a qualitative measure, the worst credible accident that may result from the risk. Severity categories range from death to negligible effect on personnel and equipment. Evaluating the safety of the system, or the risk of the hazard(s), quantitatively requires the development of a probability model and the use of Boolean algebra. Boolean algebra is used to identify possible states or conditions (and combinations thereof) that may result in accidents. The model is used to quantify the likelihood of those conditions occurring.

5. Develop corrective actions for critical risks. This may take the form of design or procedural changes.
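
The quantitative evaluation mentioned in step 4, which pairs Boolean algebra with a probability model, can be illustrated with a very small fault-tree calculation. The sketch below assumes that the basic events are statistically independent, which real systems often violate; the event names and probabilities are hypothetical.

```python
from math import prod


def and_gate(*probabilities):
    """All inputs must occur: multiply the independent probabilities."""
    return prod(probabilities)


def or_gate(*probabilities):
    """At least one input occurs: one minus the chance that none occur."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities for one mission.
p_primary_pump_fails = 0.01
p_backup_pump_fails = 0.1
p_operator_closes_wrong_valve = 0.001

# Boolean model: loss of cooling = (primary AND backup fail) OR operator error.
p_both_pumps_fail = and_gate(p_primary_pump_fails, p_backup_pump_fails)
p_loss_of_cooling = or_gate(p_both_pumps_fail, p_operator_closes_wrong_valve)

print(f"Both pumps fail: {p_both_pumps_fail:.6f}")
print(f"Loss of cooling: {p_loss_of_cooling:.6f}")
```

The Boolean expression identifies which combinations of conditions lead to the accident, and the gate functions supply the likelihood of those combinations.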

What Outputs Can Be Expected from a Hazard Analysis?

  • An assessment of the significant safety problems of the program/system
  • A plan for follow-on action such as additional analyses, tests, and training
  • Identification of failure modes that can result in hazards and improper usage
  • Selection of pertinent criteria, requirements, and/or specifications
  • Safety factors for trade-off considerations
  • An evaluation of hazardous designs and the establishment of corrective/preventative action priorities
  • Identification of safety problems in subsystem interfaces
  • Identification of factors leading to accidents
  • A quantitative assessment of how likely hazardous events are to occur with the critical paths of cause
  • A description and ranking of the importance of risks
  • A basis for program oriented precautions, personnel protection, safety devices, emergency equipment-procedures-training, and safety requirements for facilities, equipment, and environment
  • Evidence of compliance with program safety regulations.

Qualitative and Quantitative Analysis

Hazard analyses can be performed in either a qualitative or quantitative manner, or a combination of both.

Qualitative Analysis

A qualitative analysis is a review of all factors affecting the safety of a product, system, operation, or person. It involves examination of the design against a predetermined set of acceptability parameters. All possible conditions and events and their consequences are considered to determine whether they could cause or contribute to injury or damage. A qualitative analysis always precedes a quantitative one. The objectives of the two are similar; the qualitative method is simply less precise.

Qualitative analysis verifies the proper interpretation and application of the safety design criteria established by the preliminary hazard study. It also verifies that the system will operate within the safety goals and parameters established by the Operational Safety Assessment (OSA). It ensures that the search for design weaknesses is approached in a methodical, focused way.

Quantitative Analysis

Quantitative analysis takes qualitative analysis one logical step further. It evaluates more precisely the probability that an accident might occur. This is accomplished by calculating probabilities.

In a quantitative analysis, the risk probability is expressed using a number or rate. The objective is to achieve maximum safety by minimizing, eliminating, or establishing control over significant risks. Significant risks are identified through engineering estimations, experience, and documented history of similar equipment.

A probability is the expectation that an event will occur a certain number of times in a specific number of trials. Actuarial methods employed by insurance companies are a familiar example of the use of probabilities for predicting future occurrences based on past experiences. Reliability engineering uses similar techniques to predict the likelihood (probability) that a system will operate successfully for a specified mission time. Reliability is the probability of success. It is calculated from the probability of failure, in turn calculated from failure rates (failures/unit of time) of hardware (electronic or mechanical). An estimate of the system failure probability or unreliability can be obtained from reliability data using the formula:

$$P = 1 - e^{-\lambda t}$$

where $P$ is the probability of failure, $e$ is the base of the natural logarithm, $\lambda$ is the failure rate in failures per hour, and $t$ is the number of hours operated.
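
As a worked illustration of the formula above, the short Python function below computes the failure probability from a failure rate and an operating time. The function name and the sample numbers are assumptions made for the example.

```python
import math


def failure_probability(failures_per_hour, hours_operated):
    """Return 1 - exp(-rate * time), the exponential failure probability.

    math.expm1(x) computes exp(x) - 1 accurately for small x, so
    -expm1(-rate * time) equals 1 - exp(-rate * time) without the
    rounding loss that direct subtraction suffers when the product is tiny.
    """
    return -math.expm1(-failures_per_hour * hours_operated)


# Hypothetical component: 1 failure per 10,000 hours, operated for 1,000 hours.
p = failure_probability(1e-4, 1000)
print(f"Probability of failure: {p:.4f}")  # about 0.0952
```

For small values of the rate multiplied by the time, the result is close to that product itself, which is why a failure rate of one in ten thousand hours over a thousand hours gives a probability slightly under 0.1.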

However, system safety analyses predict the probability of a broader definition of failure than does reliability. This definition includes:

  • A failure must equate to a specific hazard
  • Hardware failures that are hazards
  • Software malfunctions
  • Mechanically correct but functionally unsafe system operation due to human or procedural errors
  • Human error in design
  • Unanticipated operation due to an unplanned sequence of events, actions or operating conditions.
  • Adverse environment

It is important to note that the likelihood of damage or injury reflects a broader range of events or possibilities than reliability. Many situations exist in which equipment can fail and no damage or injury occurs because systems can be designed to fail safe. Conversely, many situations exist in which personnel are injured using equipment that functioned reliably (the way it was designed) but at the wrong time because of an unsafe design or procedure. A simple example is an electrical shock received by a repair technician working in an area where power has not failed.

Source: Missouri Department of Labor and Industrial Relations

Copyright ©2000-2019 Geigle Safety Group, Inc. All rights reserved. Federal copyright prohibits unauthorized reproduction by any means without permission. Disclaimer: This material is for training purposes only to inform the reader of occupational safety and health best practices and general compliance requirement and is not a substitute for provisions of the OSH Act of 1970 or any governmental regulatory agency. CertiSafety is a division of Geigle Safety Group, Inc., and is not connected or affiliated with the U.S. Department of Labor (DOL), or the Occupational Safety and Health Administration (OSHA).