The transition from defect detection and removal activities to defect prevention activities may not be as smooth as you would like. You may find yourself asking, "Where do I start?" Or you may have the feeling that you are not getting much benefit from your defect prevention activities. You may also find yourself needing to explore, evaluate, and adopt new metrics. This article discusses some quantitative (non-statistical process control [SPC]) methods for looking at your data; I will then show the results of applying SPC to the same information and provide a few "what next" options. The intent of this article is to give process improvement team members, program managers, and supervisors ideas for defect prevention metrics that will help them identify and analyze problem areas and prioritize and plan their defect prevention activities. Rather than discussing complex mathematical algorithms, I have chosen to provide charts that help the reader brainstorm metrics useful for his or her own situation.
You know it is impossible to fix every problem at once, so you review the defect information looking for something that will jump out and say, "Fix me." During your review of the data, you find an item that grabs your attention. You are confident that you can reduce type xyz defects by 90 percent simply by providing the organization with an annual eight-hour refresher-training course. You estimate that it will cost $20,000 to develop a formal training course, and you get management approval to implement the idea. A few weeks later, you provide the first eight-hour training course to a team of 50 employees.
Six months later you analyze the data and, to your credit, you have exceeded your goal: Type xyz defects were reduced by 95 percent. Unfortunately, you learn that your savings in development and rework costs are significantly less than the annual cost of the training. You also realize that all type xyz defects were detected internally and none were ever released to the customer. To maintain your integrity, you brief management on your findings and recommend discontinuing the annual eight-hour training course.
You cannot solve every type of defect at once, so you clearly need a way of prioritizing your efforts. You also need a way of evaluating the possible solutions (cost versus benefit) to determine the most effective one. This article is aimed at giving the reader some ideas on what type of defect information should be captured and ways to present that data. Armed with the proper information, a defect prevention team will be able to prioritize its efforts, evaluate the effectiveness of proposed solutions, and determine the proper corrective action.
As our Software Engineering Division at the Ogden-Air Logistics Center increased its focus on defect prevention activities, the Extended Software Engineering Process Group (ESEPG) found that it was not receiving much utility from its existing quality metrics. At the request of the ESEPG, I began analyzing its data in an effort to recommend some potential metrics that would facilitate defect prevention activities.
In my data analysis, I explored a variety of ways to present the data in order to give the ESEPG the ability to prioritize its efforts. Our group had collected a vast amount of information, so the first task was to develop appropriate filters so that I could extract the data in a form that facilitated the analysis. My first look at the information was by the category and severity of the defect, as shown in Figure 1. If defects of a high severity were getting through the process, then that would be a logical starting point for defect prevention activities.
As seen in Figure 1, almost all of the recent defects were identified as being a minor severity. At this point, I changed the filters to extract the information for 18 different categories and types of defects, and then again for 19 different categories and locations for the defects. Table 1 provides an example of how each defect is characterized by category, type, and location.
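For readers who want to reproduce this kind of breakdown, the following is a minimal sketch of how defect records could be tallied by category, severity, type, and location. The field names and sample records are assumptions for illustration only, not our division's actual schema.

```python
# Hypothetical sketch: tally peer-review defects by category/severity,
# category/type, and category/location, mirroring the filters described above.
from collections import Counter

defects = [
    {"category": "documentation", "defect_type": "typographical", "location": "engineering doc", "severity": "minor"},
    {"category": "software", "defect_type": "style guide", "location": "source code", "severity": "minor"},
    {"category": "software", "defect_type": "logic", "location": "source code", "severity": "major"},
    # ... one record per peer-review finding
]

# Figure 1 style: counts by category and severity
by_severity = Counter((d["category"], d["severity"]) for d in defects)

# Figure 2 style: counts by category and defect type
by_type = Counter((d["category"], d["defect_type"]) for d in defects)

# Counts by category and defect location
by_location = Counter((d["category"], d["location"]) for d in defects)

for (category, defect_type), count in by_type.most_common():
    print(f"{category:15s} {defect_type:15s} {count}")
```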
The documentation defects analysis showed that typographical errors in the engineering documentation used to maintain the product were the most common defect type found during peer reviews. I then began to perform a similar analysis on the software defects using the same type of metrics developed for documentation defects. Too much information on a chart can make it difficult to understand, so to keep the information presentable, the documentation metrics were displayed on one chart and the software metrics on another. A few items from both categories were selected to display on the chart shown in Figure 2.
The information shown in Figure 2 can be used quite easily to convince a defect prevention team that they need to jump in and begin taking action to reduce the number of typographical errors. But the information presented so far does not answer the question, "Is working the typographical errors the best use of our time?" To answer this, I developed a chart similar to the one shown in Figure 3.
Figure 3 shows an example of the rework costs; this chart was developed to enable an easy comparison between Figures 2 and 3. Presenting and comparing the information in this manner (as shown in Figures 1-3) is a method that you may want to consider to help prioritize your defect prevention activities.
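One way to make the Figure 2 versus Figure 3 comparison concrete is to weight each defect type's count by its average rework cost and rank the results. The sketch below uses invented counts and cost figures purely to illustrate the ranking.

```python
# Hypothetical sketch: combine defect counts (Figure 2 style) with average
# rework cost per defect (Figure 3 style) so defect types can be ranked by
# total rework dollars rather than by raw count. All numbers are invented.
counts = {"typographical": 120, "style guide": 45, "logic": 12, "interface": 8}
avg_rework_cost = {"typographical": 15.0, "style guide": 60.0, "logic": 900.0, "interface": 1200.0}

total_rework = {
    defect_type: counts[defect_type] * avg_rework_cost[defect_type]
    for defect_type in counts
}

# Rank by total rework cost: a defect type with few occurrences can still
# dominate the rework bill if each instance is expensive to fix.
for defect_type, cost in sorted(total_rework.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{defect_type:15s} ${cost:>10,.2f}")
```

A ranking like this often reorders the priorities suggested by raw counts alone, which is exactly the question posed by Figure 3.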
Knowing the information discussed earlier, many teams may think, "We know everything that we need to know. What can statistical process control (SPC) tell us that we don't already know?" To start with, the information shown in Figures 1-3 does not identify whether or not the process is under control, and the charts do not identify random events versus non-random events. Non-random events can be assigned to specific causes, which you may be able to prevent or take into future consideration as a risk.
At least seven watch-for indicators have been identified as events that can be assigned to a cause; each has a very low probability of being random in nature.
Using the same data, I generated the Sample (X) and moving Range (mR) control charts (the XmR charts) for the total number of defects found during each peer review. The Sample (X) run chart is shown in Figure 4.
The LNPL shown in Figure 4 was not allowed to go below zero because it is impossible to have a negative number of findings. As can be seen in Figure 4, only one anomaly occurred where the number of peer review findings exceeded the UNPL.
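For readers who want to reproduce the natural process limits, the sketch below computes the center line, LNPL, and UNPL for an individuals (X) chart using the conventional 2.66 scaling of the average moving range, with the LNPL clamped at zero as described above. The defect counts shown are placeholder data, not our project's numbers.

```python
# Sketch of the XmR (individuals and moving range) limit calculation used
# for a Figure 4 style chart. Placeholder data only.
def xmr_limits(values):
    """Return (center, lnpl, unpl) for an individuals (X) chart."""
    n = len(values)
    center = sum(values) / n
    moving_ranges = [abs(values[i] - values[i - 1]) for i in range(1, n)]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    unpl = center + 2.66 * mr_bar
    # A negative number of findings is impossible, so clamp the LNPL at zero.
    lnpl = max(0.0, center - 2.66 * mr_bar)
    return center, lnpl, unpl

findings_per_review = [4, 7, 3, 5, 6, 2, 8, 5, 4, 21, 6, 5]  # placeholder data
center, lnpl, unpl = xmr_limits(findings_per_review)

# Flag reviews whose total findings exceed the UNPL (the anomalies discussed above).
anomalies = [(i, x) for i, x in enumerate(findings_per_review) if x > unpl]
print(f"center={center:.1f}  LNPL={lnpl:.1f}  UNPL={unpl:.1f}  anomalies={anomalies}")
```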
I was concerned that by including all defect types in the run chart, I was masking defects that could be assigned to a cause. I then developed individual XmR charts for 18 different types of defects and for 19 different defect locations (okay, so I need a life). Peeling back the data and looking at the specific defects revealed an additional 18 anomalies where the quantity exceeded the UNPL. Figure 5 shows one of these additional charts; in this case, there were five instances in which the quantity of defects exceeded the UNPL.
The result of this effort identified a total of 19 anomalies in which the quantity of defects exceeded the UNPL. As I started looking at each anomaly, a common attribute appeared in the data. All 19 anomalies pointed back to one small, highly skilled team working on a project in which the original proposal was too optimistic and based upon an unproven technology. The project quickly went over schedule as soon as the unproven technology failed to meet or exceed the anticipated productivity. The team was under a lot of pressure from both the customer and management to bring the project back on schedule. The harder the team tried to bring the project back on schedule, the louder the voice of the process became.
As I further analyzed the project's data, I started using this analogy: putting three valves on the end of a garden hose does not increase the flow of the water through the hose. The process capability was limited by constraints within the process such as manpower, equipment availability, and equipment throughput. In essence, the process capability resisted heroic efforts to bring the project back into the contract schedule. When the employees tried to rush through their own personal quality checks, they were met with higher defect rates found during the peer reviews.
The following is a comparison of the two methods of quantitative analysis.
Non-SPC
The benefit of quantitative non-SPC types of metrics is simplicity. The metrics and charts may seem easier to develop, the metrics may take less time to develop, and the audience may find these charts a lot easier to understand. Depending upon the data collected, these may be about the only metrics the team can develop. One drawback is that you do not necessarily know up front if the causes of the defects are random in nature or attributable to specific causes.
Based upon the software style guide rework costs shown in Figure 3, I recommended that the ESEPG first consider a variety of training options to reduce the style guide defects. The corrective actions for these defects could range from creating a heightened awareness (such as a team staff meeting) of the need to follow the style guide, to providing the team with formalized training on it. The cost of implementing each of the proposed solutions can be calculated, the annual rework costs are known, and based upon the perceived success of the proposed solutions, the defect prevention team can determine the appropriate corrective action plan.
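A simple way to structure that decision is to compare each proposal's implementation cost against its expected annual savings (the annual rework cost multiplied by the expected defect reduction). The sketch below uses invented numbers solely to illustrate the comparison; it is not the ESEPG's actual analysis.

```python
# Hypothetical cost-versus-benefit comparison of proposed corrective actions.
# All figures are invented for illustration.
annual_rework_cost = 30_000.0  # known from the rework data (Figure 3 style)

proposals = [
    # (name, implementation cost per year, expected reduction in defects)
    ("staff-meeting awareness briefing", 500.0, 0.30),
    ("formal style guide training", 20_000.0, 0.90),
]

for name, cost, reduction in proposals:
    expected_savings = annual_rework_cost * reduction
    net_benefit = expected_savings - cost
    print(f"{name:35s} savings=${expected_savings:>9,.0f}  cost=${cost:>9,.0f}  net=${net_benefit:>9,.0f}")
```

As the opening anecdote about the $20,000 training course suggests, the proposal with the largest defect reduction is not always the one with the largest net benefit.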
SPC
The benefits of applying SPC techniques as a project management tool are that they may help identify problems that could remain hidden by other quantitative analysis methodologies. The calculations are a little more complex, but once you set them up in something like a spreadsheet file, the file can easily be updated for each new set of data.
The results of this analysis led to a decision that every program manager will probably have to make sometime in his or her career. The proper corrective action was obvious, but at first it was not well received by the customer. After determining the process capability, I calculated a new baseline for the project and presented it to the customer. My analysis included the negative quality impacts experienced from trying to bring the project back on schedule and the argument that the new baseline would reduce life-cycle costs by providing the customer with higher quality products. Repairing the damage to customer satisfaction took many months, but the last feedback I received was that customer satisfaction did improve over time. The team met the re-baselined plan and provided the customer with a higher quality product.
All of the charts discussed in this article provide a historical view of process activities. Displaying the data in a manner that shows trends may enable management to move from reactive management toward proactive management activities. I explored a variety of options for watching for trends in quality. One option that seemed to give some insight into the process was to show the trend in the probability of one or more defects of a given type being found: for each peer review, I set a yes/no flag to indicate whether any defects of that nature occurred. I based the probability calculation on the sum of those flags over the last 50 peer reviews. By using the information from the last 50 reviews, I was able to develop a chart with a moving window (last 50) that would show a trend in the data.
I chose to use the last 50 reviews for two reasons. First, the sample was large enough to give a fair representation of the probability of the defect occurring in the product. Second, even with a sample size of 50, the time period spanned by the reviews was less than a year. Figure 6 shows the trends for two of the defect types; the undesirable trends include the increasing probability of finding style guide and typographical defects. Smaller improvements in other defect types added up to a noticeable improvement trend in the probability of not finding any defects. The probability of not finding any defects was promising, but the undesirable trends again reinforced the need to take action to reduce the style guide and typographical defects.
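To make the moving-window calculation concrete, the sketch below flags each peer review as yes/no and reports the fraction of flagged reviews in each trailing window of 50. The flag sequence is randomly generated placeholder data, not the data behind Figure 6.

```python
# Sketch of the moving-window probability described above: one yes/no flag
# per peer review, and the probability is the fraction of flagged reviews
# in the most recent 50. Placeholder data only.
import random

WINDOW = 50
flags = [random.random() < 0.3 for _ in range(200)]  # one flag per peer review

trend = []
for i in range(WINDOW, len(flags) + 1):
    window = flags[i - WINDOW:i]
    trend.append(sum(window) / WINDOW)  # probability over the last 50 reviews

# Plotting `trend` against review number gives a Figure 6 style trend line.
print(trend[:5])
```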
The three attributes of the product being developed are cost, schedule, and quality. When projects fall behind schedule and/or go over budget, efforts are made to bring the project back on track, but it is undesirable to do this at the expense of quality. Applying the SPC concepts to the process revealed that our current course of action on one project risked delivering poor-quality products to the customer. In this case, the application of the SPC concepts enabled us to change our course of action and improve the quality of the products delivered to the customer.
As shown earlier, a lot of knowledge can be gained by a careful analysis of the data. By carefully analyzing the data and comparing the perceived benefits versus the costs, the defect prevention teams can select activities that provide the best return on investment.
You may find automated charts to be one of your greatest assets, but they can also be one of your greatest liabilities. The person who extracts the data, performs the calculations, and builds the charts tends to have a much better understanding of the data behind the chart than the person who simply receives the charts from an automated process.
Source: David B. Putman, Ogden-Air Logistics Center