Resources - Statistics

Engineering Statistical Glossary

Welcome to the hypertext dictionary. It contains selected terms from engineering statistics. Current topic areas include experimental design, metrology, survey questionnaires, statistical process control, and computer experiments.


[A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z]

[A]
accuracy In metrology, the total measurement variation, including not only precision (reproducibility), but also the systematic offset between the average of measured values and the true value.
additive effect A property of a model describing a physical process whereby the average or expected change from changing a particular input factor does not depend upon the values of other input factors. An additive effect has no associated interactions.
affine calibration Especially in computer experiments, the practice of improving the agreement between predicted values and empirical responses by modifying one or more of the input factors (usually of the simulator) by linear transformations. To be distinguished from global calibration.
analysis of manufacturing variance (AMV) Especially in the analysis of computer experiments, the decomposition of the projected distribution of a process in manufacturing into components attributable to each of several factors, or combination of factor interactions.

To be distinguished from analysis of variance, because AMV depends on assuming a (usually Gaussian) distribution for each factor. To be distinguished from variance components, because AMV is usually applied to computer experiments.

analysis of variance A way of presenting the calculations for the significance of a particular factor's effect, especially for data in which the influence of several factors is being considered simultaneously. Analysis of variance decomposes the sum of squared residuals from the mean into non-negative components attributable to each factor, or combination of factor interactions.

Usually it is useful to distinguish between fixed and random effects. In the case of only random effects, the term variance components is often preferred.

assignable cause A synonym for special cause.
attenuation
  1. As a fuzzy concept, the lessening of any signal due to the presence of variation.

  2. In metrology, the tendency to estimate the sensitivity to a signal with a bias toward zero, especially due to the effect of variability or measurement uncertainty in the calibration standard.
audit The periodic observation of performed activities to verify compliance with documented requirements.
average outgoing quality The average level of defective product that is delivered to the customer, after any benefit from inspection and rectification has been taken into account. Usually reported in parts per million. See also Hahn estimator.


[B]
bar chart A graph that reports several values by drawing a bar from zero to each value. Each bar is suitably labeled. Critics of good graphic design distinguish further between vertical bar charts and horizontal bar charts. Vertical bar charts, sometimes also called "column charts," plot the values against the y-axis and the labels along the x-axis; they are recommended for reporting data in time order. Horizontal bar charts plot the values along the x-axis and the labels along the y-axis; they are recommended for reporting data that is not chronological. Note that for horizontal bar charts, the labels are naturally oriented horizontally, and more readable than for vertical bar charts. Finally, note that recommended practice is to order the axes meaningfully, either using a natural sequence in the labels, or by ranking the values plotted. See also Pareto charts.
bias The difference between the average or expected value of a distribution, and the true value. In metrology, the difference between precision and accuracy is that measures of precision are not affected by bias, whereas accuracy measures degrade as bias increases.
binomial distribution An important theoretical distribution used to model discrete events, especially the count of defectives. The binomial distribution depends on two parameters, n and p. n is the total number of trials; for each trial, the chance of observing the event of interest is p, and of not observing it, 1-p. The binomial distribution assumes each trial's outcome is independent of that of any other trial, and models the sum of events observed. Unlike the Poisson distribution, the binomial distribution sets a maximum on the number of events that can be observed, namely the sample size n. Unlike the hypergeometric distribution, the binomial distribution assumes the events it counts are independent.
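
The binomial probabilities described above can be computed directly. The following is a minimal sketch, assuming a hypothetical sample of 50 units with a 2 percent chance that each is defective; the function name and the example numbers are illustrative only.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent trials, each with chance p."""
    if not 0 <= k <= n:
        return 0.0  # the binomial cannot produce more than n events
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 50 units sampled, 2 percent chance each is defective.
n, p = 50, 0.02
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(f"P(no defectives)       = {probabilities[0]:.4f}")
print(f"P(at most 1 defective) = {probabilities[0] + probabilities[1]:.4f}")
```
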
blocking The practice of partitioning an experiment into subgroups, each of which is restricted in size, time, and/or space. Good experimental design practice has all factors changing within blocks, unit assignment within blocks randomized, and block order and assignment randomized.
Box-Behnken experiment An experimental design with three levels on each factor that allows the estimation of a full quadratic model, including interactions. Box-Behnken designs have two parts:
  1. centerpoints, and
  2. points lying on one sphere, equally distant from the centerpoint.
The latter points consist of small two-level full factorials where some factors are fixed at their center values. The number of centerpoints is chosen to establish rotatability. Compare to central composite designs.
box plot A univariate graphical display of a distribution designed to facilitate the comparison of several groups, especially when each group has a substantial number of observations. Each group is represented by a box; the ends of the box denote the 25th and 75th percentiles; a mid-line denotes the median. In addition, from the ends of the box outward are two lines drawn to either (a) the largest and smallest values of the distribution, or (b) the largest and smallest values that are not considered outliers. By the latter convention, individual values that are considered outliers are plotted as particular points. Some software plots the average value also.
brushing The technique of highlighting a subgroup of observations, especially in a scatterplot matrix, but sometimes also in a histogram or other graphical display. In a scatterplot matrix, brushing helps in visualizing multivariate data. Typical computer implementations allow the user to redefine the subgroup in real time.


[C]
calibration In metrology, the process or method for comparing actual readings to their known values, and also of making suitable adjustments so that the agreement between the two is improved.
constraint In either an experiment or a production process, a limitation on the range of a factor or combination of factors, imposed because settings outside that range are either physically impossible or greatly undesirable to execute.
capability The natural variation of a process due to common causes.
capability index, Cpk A measure of the natural variation of a stable process compared to the closeness of the specification limit(s). When the process is both stable and normally distributed, it is possible to estimate from Cpk the fraction of product out of specification.

Let LSL denote the lower specification limit and let USL denote the upper specification limit. Let AVG denote the mean or similar typical value of a distribution, and let SIGMA denote an estimate of the total common cause variation. Then Cpk is defined as the smaller of [ AVG - LSL ]/(3*SIGMA) and [ USL - AVG ]/(3*SIGMA).

Sometimes only a lower or only an upper specification is appropriate. For a lower limit, the one-sided capability index called Cpl, defined as [ AVG - LSL ]/(3*SIGMA), can be used instead; for an upper limit, Cpu, defined as [ USL - AVG ]/(3*SIGMA). Because of their similarity, Cpk is sometimes used as a general term to include the cases of both one- and two-sided specifications.
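
The definitions of Cpl, Cpu, and Cpk can be sketched in a few lines. This sketch assumes that AVG is the arithmetic mean and that SIGMA is the sample standard deviation of the readings; the glossary only requires "an estimate of the total common cause variation," so other estimates may be more appropriate in practice. The function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def capability_indices(
    values: Sequence[float],
    lsl: Optional[float] = None,
    usl: Optional[float] = None,
) -> Dict[str, Optional[float]]:
    """Return Cpl, Cpu, and Cpk; a missing limit yields None for its index."""
    avg = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation estimates SIGMA
    cpl = (avg - lsl) / (3 * sigma) if lsl is not None else None
    cpu = (usl - avg) / (3 * sigma) if usl is not None else None
    available = [c for c in (cpl, cpu) if c is not None]
    if not available:
        raise ValueError("at least one specification limit is required")
    # Cpk is the smaller of the one-sided indices that apply.
    return {"Cpl": cpl, "Cpu": cpu, "Cpk": min(available)}


# Hypothetical readings and specification limits.
print(capability_indices([9.0, 10.0, 11.0], lsl=4.0, usl=13.0))
```

With only one limit supplied, the returned Cpk equals the corresponding one-sided index, matching the general usage of the term noted above.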

capability index, Cpk with 25 percent precision and 95 percent confidence If the process is repeated with at least 33 distinct repetitions, Cp, defined as [USL-LSL]/(6*SIGMA), has a 95 percent confidence interval that is about plus or minus 25 percent of the estimated Cp value. This same principle holds approximately for the confidence interval of Cpk.

Further detail is available in AMD technical report 320.

Alternative methods are available to achieve comparable precision with smaller sample sizes under some circumstances. See the AMD technical report 326.
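
The figure of 33 repetitions can be checked with a rough calculation. The sketch below is not taken from the AMD reports; it assumes normally distributed data and uses the common large-sample approximation that the relative standard error of a sample standard deviation is about 1/sqrt(2(n-1)), which carries over to Cp because Cp is inversely proportional to SIGMA.

```python
from math import sqrt
from statistics import NormalDist


def cp_relative_half_width(n: int, confidence: float = 0.95) -> float:
    """Approximate half-width of the Cp confidence interval, as a fraction of Cp."""
    if n < 2:
        raise ValueError("at least two repetitions are required")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    return z / sqrt(2 * (n - 1))


for n in (10, 33, 100):
    print(f"n = {n:3d}: about +/- {100 * cp_relative_half_width(n):.1f} percent")
```
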

capability study Any study of the common cause variability of a process.
capable process
  1. A process in which there is sufficient tolerance in the specification range that, in principle, one can detect out-of-control situations and effect corrective action without placing production material in jeopardy.

  2. A process for which the capability index Cpk exceeds 1.0. (Other criteria for Cpk are sometimes promoted. Among these are 1.33, 1.5, and 2.0, but these latter values are usually reserved for the label "manufacturable.")
cause-effect diagram Also called a CE diagram, an Ishikawa diagram, and a fish-bone diagram. First presented by Kaoru Ishikawa, a picture describing the various causes and sources of variation on a particular quality of interest. The quality of interest is usually placed at the right, at the tip of a horizontal arrow. Major categories of causes branch off this main arrow in a manner reminiscent of bones of a splayed fish. Other coding conventions draw boxes around cause labels when the influence of a cause is quantified, and underline labels when such causes are believed to be important, but when the effect is not yet quantified.
census The method of data collection that involves assessing all the units in the sample frame, i.e. population. As opposed to a sample.
centerpoint In an experiment with quantitative factors, the experimental condition corresponding to all factors being set to the mid-point between their high and low values. Centerpoints serve to test for the presence of curvature, and give information about quadratic effects. When repeated, centerpoints also provide estimates of the magnitude of the experimental error.
central composite design Also known as a Box-Wilson or star composite design. An experimental design of three parts:
  1. A two-level full or fractional factorial design;

  2. "Star" or axial points in which each factor is varied to high and low levels with all other factors held constant;

  3. centerpoints.
The configuration of star points leads to variations: If one codes the two-level design part with -1 and +1, then the original Box-Wilson proposal varied the star points at particular values larger than 1; the precise value was chosen to ensure rotatability. Another alternative is to restrict the star points to +/-1.

Central composite designs have a further appeal in that they are amenable to iterative experiments and blocking. Compare to Box-Behnken designs.

characteristic A distinguishing feature of a process or its output on which variables or attributes data can be collected. The response of a process.
characterization Any description of a process or its measurable output that aids in the prediction of its performance.
checklist A method of data recording, or of data analysis, in which the scale of the measurement is broken into distinct intervals, one per line. On observing a value that falls in a particular interval, one records a vertical stroke on that line. Each fifth stroke is drawn horizontally across the preceding four.
Clifford's method The robust calculation of control limits for the individuals chart. The centerline is calculated by the median, and the upper (lower) control limits are at the centerline plus (minus) 3.15 times the median moving range. Clifford originally proposed this method to reduce hand calculation; it has good properties for automated control limit recalculation also.
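
Clifford's method as described above can be sketched as follows. The function name and the example readings are hypothetical; the moving range is taken here as the absolute difference between consecutive observations.

```python
from statistics import median
from typing import Sequence, Tuple


def clifford_limits(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower limit, centerline, upper limit) for an individuals chart."""
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    centerline = median(values)
    # Moving ranges: absolute differences between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    half_width = 3.15 * median(moving_ranges)
    return centerline - half_width, centerline, centerline + half_width


# Hypothetical readings.
lower, center, upper = clifford_limits([10, 12, 11, 13, 12])
print(f"LCL = {lower:.3f}, centerline = {center}, UCL = {upper:.3f}")
```

Because both the centerline and the spread are medians, a single extreme reading moves the limits far less than it would with limits based on averages.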
close-ended question In a survey, a question format that poses a question, and attempts to structure the answer (by yes-no, scale from 1 to 10, etc.)
common cause A source of natural variation that affects all of the individual values of the process output being studied. Typically, common causes are numerous, individually contribute little to the total variation (although the total variation can still be substantial), and are difficult to eliminate.
computer experiment A study of a fundamental physical process by the use of one or more computer simulators. Like empirical experiments, input variables (factors) are systematically changed to assess their impact upon simulator outputs (responses). Unlike empirical experiments, the simulator responses are deterministic, and this has implications: Computer experiments can appropriately have their factors with intermediate levels and the scope, especially the number of runs, can be more ambitious. Further, modeling methods based on interpolators (especially kriging) emerge as a viable approach. Good practice is to use Latin hypercubes for computer experiments, and advanced nonparametric modeling methods such as kriging, neural networks, and multivariate adaptive regression splines (MARS) in the data analysis stage. Important applications of computer experimental methods are for determining process optima and for evaluating process tolerances.
confidence interval
  1. Any statement that an unknown parameter is between two values with a certain probability. For example, if one says that the 95 percent confidence interval for theta is 1.1 to 10.3, this corresponds to the probability statement that Pr{ 1.1 <= theta <= 10.3 }=0.95.

  2. Based on the observation of a certain set of data, the range of plausible values of an unknown parameter that are consistent with observing that data. For example, if one says the 95 percent confidence interval for theta is 1.1 to 10.3, then this is equivalent to saying that based on the data observed, there is a 95 percent chance that theta is between 1.1 and 10.3.
confidentiality In surveys, the degree to which the respondents' identities are kept unknown to the public, other respondents, and especially to the survey planners, interviewers, and administrators.
control A corrective action based on feedback.
control chart A graphical representation of a process characteristic. A time-sequence chart showing plotted values of a statistic or individual measurement, including a central line and one or more statistically derived control limits.

Some typical examples of control charts are X--R charts, batch averages ("individuals") control charts, within-wafer range and standard deviation charts, wafer-to-wafer range and standard deviation charts, cumsum charts, exponentially weighted moving average control charts, analysis of means control charts, and cumulative count control charts.

control factor Especially in an experiment, a factor or process input that is easy to control, has a strong effect on the typical value of a response, and has little effect on the magnitude of its variability. Usually distinguished from noise factors.
control group The set of observations in an experiment or prospective study that do not receive the experimental treatment(s). These observations serve (a) as a comparison point to evaluate the magnitude and significance of each experimental treatment, (b) as a reality check to compare the current observations with previous observation history, and (c) as a source of data for establishing the natural experimental error.
control limits The maximum allowable variation of a process characteristic due to common causes alone. Variation beyond a control limit is evidence that special causes may be affecting the process. Control limits are calculated from process data.
convenience sample In a survey, the selection of observational units according to the convenience of the investigator or the interviewer. To be distinguished from scientific randomization.
correlation Correlation is a measure of the strength of the (usually linear) relationship between two variables. The usual correlation coefficient, called the Pearson correlation coefficient, ranges from -1 to 1. A value of +1 corresponds to the case where the two variables are related perfectly by an increasing relationship; a value of -1 corresponds to a perfect, but decreasing relationship. In the case of the Pearson correlation coefficient, a value of +1 (-1) implies the relationship is linear and increasing (decreasing).
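
A minimal sketch of the Pearson correlation coefficient follows, written out from sums of squares and cross-products rather than taken from any particular library. The example data are hypothetical and chosen so that the relationships are exactly linear.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing
```
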
coverage error In surveys, the error that results when the pool of potential respondents (the sample frame) does not match the population to which one wishes to make generalizations.
critical parameters A critical parameter is a measurable characteristic of a material, process, equipment, measurement instrument, facility, or product that is directly or indirectly related to the fitness for use of a product, process, or service.
critical process module A node in the process flow whose output has a significant impact on the total process. Sometimes also called a critical process step.
cross-validation A family of methods based on the idea that the least biased test of a model's predictive error is obtained by applying the model to data that was not used in building it. A common application is to partition a dataset into two parts, to fit the model on the first part, and to assess the predictive capability of that model on the second part.
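
The two-part partition described above can be sketched as follows. This is only an illustration: it assumes a straight-line model fitted by least squares, a random split, and root-mean-square error as the measure of predictive capability. All names and data are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def holdout_rmse(xs, ys, test_fraction: float = 0.5, seed: int = 0) -> float:
    """Fit on one part of the data, report prediction error on the other part."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # random partition, reproducible via seed
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    squared_errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx]
    return mean(squared_errors) ** 0.5


# Hypothetical data: roughly linear with some scatter.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.3, 16.9, 19.2]
print(f"hold-out RMSE: {holdout_rmse(xs, ys):.3f}")
```
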
cusum chart A control chart based on CUmulative SUMs, sometimes also called a "cumsum" chart. If the value measured at time t is X(t), a cusum chart plots the value SUM{ X(u)-target: u=1,2,...,t }. Cusum charts are sensitive to drift, and to processes running systematically above or below target. They are most suitable for adjustable processes, and when the entity has only one recipe running in high volume. These properties make the cusum chart similar to an EWMA chart.

Cusum control limits take the form of a backward-facing V-mask. A process is in control when all plotted points lie within this V-mask.
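
The plotted cusum values, SUM{ X(u)-target: u=1,2,...,t }, can be sketched as a running total. The V-mask control limits are not implemented here; the function name and data are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def cusum(values: Sequence[float], target: float) -> List[float]:
    """Cumulative sum of deviations from target, one plotted value per time point."""
    return list(accumulate(x - target for x in values))


# Hypothetical readings for a process with target 10.
print(cusum([10, 11, 12, 9, 11, 12], target=10))
```

A process running consistently above target produces a steadily rising cusum, which is why the chart is sensitive to small sustained shifts.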

customers Organizations that use the products, information, or services of an operation.


[D]
data-driven The property of requiring data and facts, but not requiring subjective opinions. As opposed to opinion-driven.
data reduction The process of calculating from several numbers one or a few summary numbers. For example, one might have 9 readings taken across a wafer. A common data reduction would be to use the average and standard deviation, which is only two numbers. The benefits of data reduction are usually simplicity, interpretation ease, greater focus on issues of interest, and small data files.
demographics In surveys, information such as age, gender, place of residence, and annual income that can be taken to elucidate the responses, especially by identifying the respondent as a member of a particular group.
detection The class of process corrective monitors designed to determine whether production material is conforming to specifications. See also disposition. As opposed to prevention.
deterministic The property of being perfectly repeatable, and without experimental or observational error. Usually achievable only in computer experiments.
diagnostic A calculation or graph that serves to test one or more assumptions about a model.
digidot chart A hybrid chart that consists of a stem and leaf chart on the y axis, with the leaves pointing left, and a time trend plot to the right. In both cases the values plotted are the most significant few digits of the observed value.
disposition The class of product decisions that evaluate what is to be done with production material that has been manufactured outside specification. See also detection. As opposed to prevention.
distribution A representation of the frequency of occurrence of values of a variable, especially of a response.
dot plot A form of a histogram for which an observation with a value within a certain range is plotted as a dot a fixed interval above the previous dot in that same range. Useful for small numbers of observations.


[E]
effect The change in the average or expected value of a given response due to the change of a given factor. The change of the given factor is usually from the lowest to the highest value of those tried experimentally, and the units of the effect are usually in the same units as the response.
efficiency A fuzzy concept for the precision that can be achieved by a given estimation method and sample size. An efficient method estimates a population parameter with the shortest possible confidence interval.
EVOP, Evolutionary Operation An abbreviation for "evolutionary operation". An EVOP is a special type of on-line experiment with several distinguishing features:
  1. The experimental material is production material intended to be delivered to customers.
  2. In each experimental cycle, the standard production recipe is changed.
  3. The experimental factor levels are less extreme than in conventional off-line experiments.
  4. The experiment is run over a longer term with more material than in conventional off-line experiments.
EWMA chart
  1. Any control chart based on Exponentially Weighted Moving Averages. An EWMA chart plots a weighted average of the current observation and the previously plotted point; the weight of the current observation is denoted by lambda. Values of lambda between 0.3 and 0.7 are generally recommended; the value of 0.7 is better for "noisier" processes. An EWMA chart with a lambda of 0.4 behaves approximately like a control chart with all 8 Western Electric rules active.

    EWMA charts are sensitive to drift, and to processes running systematically off target. They are most useful when the entity has only one process running in high volume. They share these features with the cusum chart.

  2. At AMD, the EWMA chart also allows for the calculation of an optimal lambda value. The AMD implementation plots the average of the current monitor point and compares this average to designated control limits. In this regard, it resembles an individuals control chart. The EWMA part is implemented as a set of trend rules to be used instead of the Western Electric rules.

  3. As originally proposed by J. Stuart Hunter, the EWMA chart would plot the exponentially weighted average of the current observation with the previously plotted point. These would be compared to appropriate control limits. In addition, the current observation would also be plotted, and compared to appropriate (e.g. individuals) control limits.
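
The basic EWMA recursion in definition 1 can be sketched as follows. This sketch covers only the plotted values, not the control limits or the AMD trend rules. The starting value is an assumption: it defaults to the first observation, but a process target can be supplied instead.

```python
from typing import List, Optional, Sequence


def ewma(values: Sequence[float], lam: float = 0.4,
         start: Optional[float] = None) -> List[float]:
    """Exponentially weighted moving averages; lam is the weight of the current observation."""
    if not 0 < lam <= 1:
        raise ValueError("lambda must be in the interval (0, 1]")
    if not values:
        return []
    # Assumption: begin from the first observation unless a starting value is given.
    previous = values[0] if start is None else start
    plotted = []
    for x in values:
        previous = lam * x + (1 - lam) * previous
        plotted.append(previous)
    return plotted


# Hypothetical readings for a process with target 10.
print(ewma([10, 12, 11, 13, 12, 14], lam=0.4, start=10))
```
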


[F]
face-to-face survey A method of administering surveys whereby the respondents are interviewed by persons who are physically present.
factor The input variable of a process, and especially of an experiment. Experimental factors are particularly those variables that are deliberately manipulated during the experiment. Experimental factors can be divided further into control factors and noise factors. Control factors are those factors that are easy to control, and usually have a strong influence on the response. (A classic example is the time involved for a deposition process.) Noise factors are factors that are either difficult or inconvenient to control. A difficult-to-control noise factor might be the ambient air flow around a furnace tube. An inconvenient-to-control noise factor might be the recent use history of a wet clean sink.
factor level In experimental design, the value that an input variable or factor takes on.
factor range In experimental design, and especially for a quantitative factor, the difference between the highest value that the factor takes on and the lowest.
factorial experiment An experiment in which the values of each factor are used in combination with all the values of all other factors. A fractional factorial experiment takes a judicious subset of all combinations, with the following objectives in mind:
  1. the total number of experiments is small,
  2. the experimental space is well covered,
  3. for subsets of factors (say of size 2, 3, or 4), the total number of experimental combinations is kept large.
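
A full factorial experiment can be enumerated mechanically. The sketch below lists every combination of levels for a hypothetical set of three two-level factors; choosing a fractional subset that meets the objectives above is a separate design problem and is not shown.

```python
from itertools import product
from typing import Dict, List, Sequence


def full_factorial(levels: Dict[str, Sequence]) -> List[dict]:
    """Every combination of the given factor levels, one dict per experimental run."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


# Hypothetical factors and levels.
runs = full_factorial({
    "temperature": [900, 1000],
    "time": [30, 60],
    "pressure": ["low", "high"],
})
for run in runs:
    print(run)
```
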
focus groups A method of interviewing people not individually, but in small groups. This method is often favored as a preliminary to formal questionnaire-based surveys. The groups are usually composed to be comparable in some way (income, age, etc). Its disadvantages include small sample sizes relative to the effort expended, potential biases from hearing other respondent views, and lack of structure for synthesizing results.
fuzzy concepts Concepts that, by their greater abstraction, admit both generalization and alternative approaches. For example, the average is usually calculated to estimate the typical value of a set of numbers. The average is a specific concept, whereas "typical value" is a fuzzy one.


[G]
gauge study A synonym for a metrology study.
Gaussian distribution See normal distribution.
global calibration In computer experiments, the practice of achieving good agreement between simulated and empirical readings by developing a function that transforms the raw simulated response. To be distinguished from affine calibration. See also calibration.
goodness-of-fit
  1. As a fuzzy concept, the opposite of lack of fit.
  2. Any measure of how closely a probability model reproduces the frequencies of an observed distribution.
  3. A measure, such as R-squared, of how closely a statistical model predicts observed values.


[H]
Hahn estimator A nonparametric estimate of average outgoing quality based on the binomial distribution and an assumption of an unobserved distribution on the binomial distribution's parameter p.
histogram A graphical display of a statistical distribution; a form of bar chart. One axis (usually x) is the scale of the values observed, the second (usually y) is the frequency that observations occur with (approximately) that value.
hypergeometric distribution An important distribution used to model discrete events, especially the count of defectives when sampling without replacement. The hypergeometric distribution depends on three parameters, N, n and D. N is the known and finite population size, n the known sample size (constrained to be less than or equal to N), and D, the unknown number of defectives. Unlike the binomial distribution, the hypergeometric distribution assumes sampling is without replacement, and that its parameters are all integer-valued.
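
The hypergeometric probabilities can be computed from binomial coefficients. The sketch below uses the parameters N, n, and D as defined above; the example lot of 20 units with 3 defectives is hypothetical.

```python
from math import comb


def hypergeometric_pmf(k: int, N: int, n: int, D: int) -> float:
    """Probability of k defectives in a sample of n drawn without replacement
    from a population of N units containing D defectives."""
    if k < 0 or k > n or k > D or n - k > N - D:
        return 0.0  # outcome is impossible for these parameters
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)


# Hypothetical lot: 20 units, 3 defective, sample of 5.
for k in range(4):
    print(f"P({k} defectives in sample) = {hypergeometric_pmf(k, 20, 5, 3):.4f}")
```
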


[I]
imputation The replacement of unknown, unmeasured, or missing data with a particular value. The simplest form of imputation is to replace all missing values with the average of that variable. More sophisticated imputation methods use the correlation structure among observed variables. Imputation is most common in surveys of human populations. It is also used in certain computer experiment applications.
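
The simplest form of imputation, replacing missing values with the average of the observed ones, can be sketched as follows. Missing values are represented by None here, which is an assumption of this example.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace each missing value (None) with the average of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical responses with two missing entries.
print(impute_mean([4.0, None, 5.0, 3.0, None]))
```
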
in control The opposite of being out of control.
individuals chart
  1. A control chart for variables data in which the rational subgroup size is one. A synonym for X chart.

  2. The algorithm for a variables data control chart in which the multiple readings of the rational subgroup are reduced to some single number, usually the average, and then limits calculated as if the rational subgroup size were one.
See also Clifford's method.
inspection The measurement of a characteristic and its comparison to a standard.
interaction A property of a physical process (or of a model describing such a process) wherein the average (or predicted average) change in the response from changing a particular input factor depends on the values of other input factors.
interpolator Any predictive algorithm that always perfectly reproduces the observations used for model construction. Useful for computer experiments.
interview survey A method of administering a survey in which the respondent is interviewed verbally and the answer recorded by the survey taker.
interviewer error In surveys, a component of measurement error that results from the respondent modifying answers in response to the interviewer's behavior, nonverbal cues, or verbal statements.
item stem and leaf plot For questionnaires, a method for presenting the average results of several questions with a common Likert scale. Each question is plotted at approximately its average value and labeled by its question number. The format resembles a histogram of the question averages, with the question labels used to imply the relative frequency of the observed averages.


[J] [K]
kriging An interpolator easily generalized to multiple dimensions and arbitrary configurations of observed points. Nonetheless, kriging is analogous to least squares. A point at which a kriging prediction is desired is thought to be more "correlated" to the closer observed points in the observation space. Further, as this point approaches another that is actually observed, the correlation approaches 1.0. From these ideas, one can formalize a prediction method, kriging. For an experiment of n observations, kriging requires the inversion of an n x n matrix, making it awkward to use for large n.


[L]
lack of fit A property of a model with respect to a set of observations. Lack of fit refers to the degree to which the model does not predict or fit the observations. Lack of fit can be due to experimental error and uncertainty in the process obtaining the observations, or it may be due to a defect in the model.
Latin hypercube design An experimental design consisting of n trials, and for which each factor has n distinct levels. Usually the factor levels are equally spaced. The best Latin hypercube designs are based on orthogonal arrays. Latin hypercube designs are especially useful for computer experiments.
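
A basic Latin hypercube design can be generated by giving each factor n equally spaced levels and permuting them independently. The sketch below uses random permutations on a coded 0-to-1 scale; it is not based on orthogonal arrays, so it illustrates the defining property only, not the best designs mentioned above.

```python
import random
from typing import List, Optional, Tuple


def latin_hypercube(n_trials: int, n_factors: int,
                    seed: Optional[int] = None) -> List[Tuple[float, ...]]:
    """n_trials runs in which every factor takes n_trials distinct, equally spaced levels."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        if n_trials > 1:
            levels = [i / (n_trials - 1) for i in range(n_trials)]  # coded 0..1
        else:
            levels = [0.5]
        rng.shuffle(levels)  # independent random permutation for each factor
        columns.append(levels)
    return [tuple(column[i] for column in columns) for i in range(n_trials)]


design = latin_hypercube(n_trials=5, n_factors=3, seed=1)
for run in design:
    print(run)
```

Each coded level would then be rescaled to the actual range of its factor before running the simulator.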
Latin hypercube sampling A computer experimental method that uses Latin hypercube designs in order to estimate distributions of the simulator outputs. The use of Latin hypercube designs allows Latin hypercube sampling to be quite a bit more precise than Monte Carlo methods. The distributions of the input factors are represented in the spacing of the factor levels.
LDL, lower detection limit The level at which a measurement system ceases to discriminate effectively between background noise and the actual value. An exact definition of LDL turns out to be difficult to operationalize. Common practice is to take samples lacking the trace characteristic entirely, assay them, and report the LDL as the average measurement plus two standard deviations.
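
The common practice described above reduces to a one-line calculation. The sketch assumes the sample standard deviation is the intended measure of spread; the blank readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def lower_detection_limit(blank_readings: Sequence[float]) -> float:
    """Average of the blank-sample measurements plus two standard deviations."""
    return mean(blank_readings) + 2 * stdev(blank_readings)


# Hypothetical assay results for samples lacking the trace characteristic.
print(lower_detection_limit([0.8, 1.1, 0.9, 1.3, 1.0, 0.9]))
```
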
Likert scale In questionnaires, the answer format that requires the respondent to pick one of a few values along a scale. 5-point and 9-point scales are common Likert scales. The two ends of a Likert scale are opposites, and the middle values represent degrees in between.
linearity In metrology, the difference in bias throughout the range of the measurement instrument. This definition is best understood if one views the relation between the measured result on the y-axis and the true value on the x-axis. Ideal linearity is a line with slope 1.0. (With slope 1.0, a pure bias would appear as a nonzero intercept; an intercept of 0.0 corresponds to no bias.) Linearity is a bit of a misnomer, for it refers to any difference from a line with slope of 1.0, and this can happen both by having a nonlinear relationship, and by having a linear relationship, but with a slope other than 1.0.
logistic function The function 1/(1+exp(-x)). The logistic function is skew-symmetric about the point (0, 0.5), since logistic(x)-0.5 = 0.5-logistic(-x). Applications include modeling dose-response curves and heavy-tailed distributions, and serving as a "squashing" function in neural network modeling.
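
A minimal sketch of the logistic function follows. It is written in two branches so that exp() is never called with a large positive argument, and it checks numerically that logistic(x) + logistic(-x) = 1, which is the symmetry of the curve about the point (0, 0.5).

```python
from math import exp


def logistic(x: float) -> float:
    """The logistic function 1/(1+exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)  # for negative x, exp(x) is at most 1, so this cannot overflow
    return z / (1.0 + z)


for x in (0.5, 2.0, 10.0):
    print(x, logistic(x), logistic(-x), logistic(x) + logistic(-x))
```
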


[M]
mail survey A method of administering surveys whereby the respondents are contacted by mail. Salant and Dillman (1994) present a series of strategies for improving mail survey response rates. These include interesting cover designs, accelerating reminder post cards and letters, and final contact by certified letter.
matching In a retrospective study, a method for identifying a comparison group. Matching pairs observational units: each unit that has both trait-of-interest A and nuisance effects B,C,... is paired with another unit that lacks trait-of-interest A, yet still shares B,C,... Low yielding lots (trait-of-interest is yield) are in this way compared to well yielding lots of the same product started at about the same time. Matches made in this way are more sensitive to key causal differences (for example, in the particular equipment set used) than would occur from taking "matches" from all available lots. Matching is a way of implementing commonality studies. Matching is a kind of blocking for retrospective studies.
MCA Measurement capability assessment, or sometimes a measurement capability analysis. A metrology characterization. Sematech definitions focus on (a) repeatability and (b) reproducibility. Broader definitions would assess (c) sensitivity to changes in the phenomenon being measured--such sensitivity is desirable--and (d) sensitivity to features other than the phenomenon being measured--such sensitivity is not desirable.
mean time between failures (MTBF) For one or a class of systems, the average time between one failure of a system and the next failure of a system. This average time excludes the time spent waiting for repair, the time spent being repaired, the time spent in being requalified, and so on; it is intended to measure only the time a system is available and operating.
measurement error
  1. The variability observed that can be attributed to the metrology or measurement system. Measurement error can be decomposed further into miscalibration, insensitivity, repeatability, and reproducibility.

  2. In surveys, the error that results when a respondent's answer is inaccurate, imprecise, or not easily compared to those of other respondents. Salant and Dillman (1994) divide such measurement error into errors in method, questionnaire, interviewer, and respondent.
meta-analysis A family of statistical methods that quantitatively combine the results of separate investigations into a single statement of overall significance.
metamodel calibration The practice of determining unknown parameters of a model by the following steps:
  1. Run a computer experiment by varying the unknown parameters, and recording the expected responses.
  2. Fit a model of general form, especially a neural network, using the responses of the computer experiment as inputs and the factors as the outputs.
  3. Extract the unknown parameters as the outputs that result from this model when the inputs are taken to be the empirically observed values.
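The three steps above can be sketched in Python under strong simplifying assumptions: the simulator is a made-up linear function of a single unknown parameter, and an ordinary least-squares line stands in for the neural network. All names (simulator, fit_line, theta_hat) and the numbers are hypothetical.

```python
def simulator(theta: float) -> float:
    """Hypothetical simulator: expected response for a given parameter value."""
    return 3.0 + 2.0 * theta

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Step 1: run the computer experiment over a grid of parameter values.
thetas = [0.1 * i for i in range(21)]
responses = [simulator(t) for t in thetas]

# Step 2: fit the inverse model, with responses as inputs and parameters as outputs.
slope, intercept = fit_line(responses, thetas)

# Step 3: feed the empirically observed response through the fitted model.
observed_response = 5.6      # assumed measurement, for illustration only
theta_hat = slope * observed_response + intercept
```

With a realistic, nonlinear simulator the straight line would be replaced by a more flexible model, which is where the neural network comes in.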
method error The part of the measurement error attributable to the details of the measurement process. In surveys, for example, one can administer questionnaires by face-to-face contact, by mail, by telephone, and so on. These different methods are recognized to give different results, and to the degree that they do, this is an example of a method error.
metrology study Sometimes called a gauge capability study, or measurement capability assessment. Such a study quantifies the capabilities and limitations of a measurement instrument, often estimating its repeatability, reproducibility, and sometimes its sensitivity.
mixture experiment An experimental design in which, for each experimental run, the factor levels are constrained to sum to a constant. The typical applications involve chemical experiments in which the factors are liquids, or sometimes gases. In such a case, it is the proportion of each liquid ingredient, not its weight or volume, that is the essential issue.
model A mathematical statement of the relation(s) among variables. Models can be of two basic types, or have two basic parts: statistical models, which predict a measured quantity, and probability models, which predict the relative frequency of different random outcomes.
monitor variable A measurable characteristic of a process that is particularly relevant and informative for purposes of process control. To be distinguished from a critical parameter, which is more relevant for product acceptance.
Monte Carlo sampling A computer experimental method that uses random numbers in order to estimate distributions of simulator outputs.
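A minimal Python sketch of the idea, assuming a made-up two-input simulator and assumed Gaussian input distributions (none of these names or numbers come from the glossary):

```python
import random
import statistics

def simulator(x1: float, x2: float) -> float:
    """Hypothetical simulator output as a function of two inputs."""
    return x1 + 2.0 * x2

rng = random.Random(12345)   # fixed seed so the run is repeatable

# Draw random inputs from their assumed distributions and record the outputs.
outputs = [
    simulator(rng.gauss(10.0, 1.0), rng.gauss(5.0, 0.5))
    for _ in range(20000)
]

# Summaries of the estimated output distribution.
mc_mean = statistics.fmean(outputs)
mc_sd = statistics.stdev(outputs)
```

For this linear simulator the exact answers are known (mean 20, standard deviation sqrt(2)), so the estimates can be checked. In practice the method is used when no such closed form exists.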


[N]
neural nets See neural network models.
neural network models A highly flexible modeling method that postulates one or more layers of unobserved variables. Each unobserved variable is a linear function of variables of the previous layer (the first layer consists of the factors, or model inputs). As output to the next layer, the output of each unobserved variable is nearly always transformed by a nonlinear function, most commonly the logistic function. Neural networks are sometimes used for analysis of computer experiments, especially when the size of the experiment makes kriging impractical.
noise factor Especially in an experiment, a factor or process input that can be either difficult or inconvenient to control. Noise factors also include product use conditions (the temperature, test conditions, environment). Usually distinguished from control factors.
noise-to-signal ratio The ratio of the measurement system's precision to the average measurement value; the reciprocal of the signal-to-noise ratio. The noise-to-signal ratio allows one to express the magnitude of measurement precision on a percentage scale.
nonresponse The event that occurs during an experiment, survey, or observational study in which the responses (results of interest) cannot be measured or completely recorded.
nonresponse error In surveys, the error that results when sampled respondents decline to answer the questionnaire, especially when these respondents, viewed as a whole, seem to be different from those who do answer in a way that is important to the study.
normal distribution A symmetric distribution with one high point or mode, sometimes also called the bell curve. The average is one of many statistical calculations that, even for only a moderate amount of data, tend to have a distribution that resembles the normal curve. In industry, there are four important properties of the normal distribution:
  1. it is symmetric,
  2. within plus and minus one standard deviation about 68 percent of the distribution is enclosed,
  3. within plus and minus two standard deviations, 95 percent, and
  4. within plus and minus three standard deviations, 99.7 percent.
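The three coverage figures can be checked with the error function, since for a normal variable the probability of falling within k standard deviations of the mean is erf(k/sqrt(2)). The function name below is chosen for this sketch.

```python
import math

def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within +/-{k} standard deviations: {100 * coverage_within(k):.1f} percent")
```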


[O]
objective methods Methods of data collection, and especially of data analysis, characterized by the fact that they do not depend on the opinions or knowledge particular to an individual. Objective methods are reproducible, in a scientific sense, and in principle amenable to reduction to software algorithms.
off-line SPC Techniques such as histograms, checklists, Pareto charts, capability indices, and designed experiments that are intended to characterize selected properties of a process without necessarily determining when to invoke a control algorithm to investigate or correct for special causes.
on-line SPC Techniques such as control charts that seek to monitor a process relative to its natural variation and seek to identify when the invocation of a control algorithm, either to investigate or correct for special causes, is warranted. Certain statistical techniques, such as EVOP, seek the dynamic optimization of a process; these are also on-line SPC techniques.
open-ended question In a survey, a question format that poses a question, but does not attempt to structure the answer (by yes-no, scale from 1 to 10, etc). Rather, the respondent is expected to reply in his or her own words, orally or in writing.
opinion-driven The property of depending on personal opinion, arbitrary fudge factors, or other choices not objectively grounded. As opposed to data-driven.
optimal design The approach to creating experimental designs using a computer algorithm that optimizes an objective function. The most common criterion minimizes the determinant of the coefficients' variance-covariance matrix (equivalently, maximizes the determinant of the information matrix); such designs are called D-optimum. In contrast to the optimal design approach is that based on orthogonal arrays.
optimum
  1. Especially as determined by an experiment, the combination of factor setpoints that achieve the best balance of the responses of interest.
  2. The average response values achieved at such a set of factor setpoints.
orthogonal array A table consisting of rows and columns with the property that for any pair of columns (factors) all combinations of values (levels) occur, and further, all combinations occur the same number of times.
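The defining property can be checked mechanically. The sketch below is one possible check in Python: the function name is hypothetical, and the example table is the familiar four-run, three-factor, two-level array.

```python
from collections import Counter
from itertools import combinations

def is_orthogonal_array(rows) -> bool:
    """True if, for every pair of columns, all level combinations occur equally often."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        levels_i = {row[i] for row in rows}
        levels_j = {row[j] for row in rows}
        counts = Counter((row[i], row[j]) for row in rows)
        # Every combination of levels must appear ...
        if len(counts) != len(levels_i) * len(levels_j):
            return False
        # ... and each must appear the same number of times.
        if len(set(counts.values())) != 1:
            return False
    return True

four_run_array = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
confounded_table = [(0, 0), (0, 0), (1, 1), (1, 1)]
```

Here is_orthogonal_array(four_run_array) holds, while confounded_table fails because the combinations (0, 1) and (1, 0) never occur.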
outliers Observations whose value is so extreme that they appear not to be consistent with the rest of the dataset. In a process monitor, outliers indicate that assignable or special causes are present. The deletion of a particular outlier from a data analysis is easiest to justify when such an unusual cause has been identified.
out of control A process is out of control when a statistic such as an average or a range exceeds control limits or when, although within the control limits, a significant trend or pattern in this statistic emerges. Being out of control defines a time-bounded state, not an intrinsic property of a process. By analogy, at any given time, a driver may be involved in an accident (out of control) or not. The intrinsic property of the process is whether the driver is a safe driver or not (whether the frequency of out-of-control conditions is excessive or not). Determining the latter, the intrinsic safety (stability), typically requires observation over a sustained period of time.


[P]
Pareto analysis A technique for problem solving in which all potential problem areas or sources of variation are ranked according to their contribution.
partially open question In a survey, a question format that poses a question, structures the answer somewhat, but also permits the respondent to reply in his or her own words. A hybrid of close-ended and open-ended questions.
PDC Passive data collection, sometimes called a prospective observational study. An early phase of engineering characterization in which a process is repeated and measured, but in which interventions--adjustments, modifications, recipe changes--are avoided. Associated with the Sematech qualification plan.
Poisson distribution An important theoretical distribution used to model discrete events, especially the count of defects in an area. The Poisson distribution depends on one parameter, lambda, which represents the average defect density per observation area (or volume, time interval, etc.). The Poisson distribution assumes that the counts of defects in two non-overlapping observation units are independent. Further, the Poisson distribution assumes the distribution of defect counts depends only on the area in which they are to be observed. Unlike the binomial distribution, the Poisson distribution in principle sets no limit to the number of defects that can be observed in any area. Of particular interest to the semiconductor industry, the Poisson probability of observing zero defects in a region of area A, exp{-lambda A}, is useful for yield modeling.
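As a brief sketch, the Poisson probabilities and the zero-defect yield term exp{-lambda A} can be computed as below. The function names and the example density and area are illustrative assumptions.

```python
import math

def poisson_pmf(k: int, mean_count: float) -> float:
    """Probability of observing exactly k defects when mean_count are expected."""
    return math.exp(-mean_count) * mean_count ** k / math.factorial(k)

def zero_defect_yield(defect_density: float, area: float) -> float:
    """Poisson probability of zero defects in a region: exp(-lambda * A)."""
    return math.exp(-defect_density * area)

# Assumed example: 0.5 defects per unit area on a die of area 2.0,
# so one defect is expected per die on average.
die_yield = zero_defect_yield(0.5, 2.0)
```

The yield term is simply the k = 0 case of the general formula with mean count lambda times A.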
population The entire set of potential observations (wafers, people, etc) about whose properties we would like to learn. As opposed to sample.
precision
  1. in metrology, the variability of a measurement process around its average value. Precision is usually distinguished from accuracy, the variability of a measurement process around the true value. Precision, in turn, can be decomposed further into short term variation or repeatability, and long term variation, or reproducibility.

  2. A fuzzy concept term for the general notion that one knows more or has shorter confidence intervals if one has more data; that is, more data gives greater precision in answers and decisions.
prevention The class of process monitors and corrective actions taken before production material is placed in jeopardy.
probability plot A plot designed to assess whether an observed distribution has a shape consistent with a theoretical distribution, especially with the normal distribution. The values observed are plotted against the expected order statistics from the theoretical distribution. When a straight line is apparent, the observed and theoretical distributions are said to have the same shape. Probability plots are especially good when the observed distribution consists of many observations, and useful for comparing at most only a few groups.
process A combination of people, procedures, machinery, material, measurement equipment, and environmental conditions for specific work activities. A repeatable sequence of activities with measurable inputs and outputs.
process signature The characterization of a process, including its sensitivity to input variables, its magnitude of natural variation, its sensitivity to variation in incoming material, and its dynamic and output profiles, both when operating naturally and when behaving aberrantly.
process capability study A study that quantifies the common cause variability of a process. See also capability study.
proctor survey A method of administering a survey in which the respondents are placed into a room that is attended by a person, a proctor. Proctor surveys preserve the confidentiality of the respondents' answers, yet provide sufficient administrative structure so that one can ensure high response rates.
prospective study A kind of nonexperimental study in which sample selection and all investigated phenomena occur after the onset of the study. See also PDC.
proxy In a characterization, a variable that is used to replace another either because, in the case of a response, it is easier to measure, or because, in the case of a factor, it is easier to manipulate.
P/T ratio In metrology when applied to a manufacturing situation, the "precision-tolerance" ratio. The precision element is usually the 3 standard deviation magnitude of measurement error (precision, reproducibility), and the tolerance element is usually the corresponding half-tolerance: USL-(USL+LSL)/2, where USL (LSL) denote the upper (lower) specification limits, respectively. A common goal for a new metrology or process development project is to achieve a P/T ratio of 0.1.
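The ratio as defined in this entry (3 standard deviations of measurement error over the half-tolerance) can be sketched as follows. The function name and the example numbers are assumptions made for illustration.

```python
def pt_ratio(sigma_measurement: float, usl: float, lsl: float) -> float:
    """Precision-tolerance ratio: 3 sigma of measurement error over the half-tolerance."""
    precision = 3.0 * sigma_measurement
    half_tolerance = usl - (usl + lsl) / 2.0   # same as (usl - lsl) / 2
    return precision / half_tolerance

# Assumed example: measurement sigma of 0.05 against specification limits 8.5 to 11.5.
example_ratio = pt_ratio(0.05, usl=11.5, lsl=8.5)
```

In this example the half-tolerance is 1.5 and the precision element is 0.15, so the ratio comes out at the commonly cited goal of 0.1.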


[Q]
questionnaire error In surveys, the part of the measurement error that can be attributed to the questionnaire's form, structure, and wording. More at validity.


[R]
randomization, scientific
  1. The assignment of experimental material to treatments and treatment order through the use of random number tables.
  2. The selection of observational units through the use of random number tables. Scientific randomization is to be distinguished from arbitrary assignments and selection, and from systematic assignments (e.g. wafers 1-12 receive treatment A, 13-24 treatment B).
range For a given set of observations, the difference between the highest and lowest values.
rational subgroups Multiple readings taken to monitor a process, including the magnitude of short term variation. Rational subgroups of size 2 to 6 are the most common. Well constituted rational subgroups are the basis of SPC's most sensitive Shewhart charts, the X-bar-R (X-bar-S) chart.
R chart A control chart that plots ranges. Like S charts, R charts are typically used to monitor process uniformity, and measurement precision. Constant sample sizes for the rational subgroups are strongly recommended. There is a special set of Western Electric rules for R charts when the rational subgroup size is two. When the rational subgroup size is greater than 9, S charts are preferred to R charts for reasons of efficiency.
reference group A group of observations, or a group that could be observed, that serves as a point of comparison in a study. A reference group has a function similar to that of a control group, but, unlike a control group, does not carry the connotation that it was constructed deliberately randomly.
repeatability In metrology, the component of measurement precision that is the variability in the short term, and that occurs under highly controlled situations (e.g. same metrology instrument, same operator, same setup, same ambient environment, etc.).
reproducibility In metrology, the total measurement precision, especially including the components of variability that occur in the long term, and occurring from one measurement instrument to another, one laboratory to another, etc.
residual The difference between the actual value observed and the prediction or fitted value derived from a model. Residuals give information both about the model's lack of fit, and also about experimental error of the measurement process.
resolution
  1. In experimental design, especially for two-level designs, the length of the word of the shortest confounding relationship. Geometrically, design resolution corresponds to 1 plus the strength.

  2. In metrology, the number of significant digits of a measurement system that can be meaningfully interpreted.
respondent error In surveys, a component of measurement error that results from the respondent deliberately or inadvertently answering incorrectly.
response The measured output of a process or experiment. Responses usually depend on the choice of metrology tool. In planning experiments, several responses are usually of interest, and their selection is tied closely to overall purpose of the study.
response surface model (RSM) A polynomial model of several factors, especially one including terms for linear, quadratic, and second-order crossproducts.
retrospective study A kind of nonexperimental study in which all the phenomena investigated occur prior to the onset of the study. Further, the samples of retrospective studies are usually chosen by the value the responses take. This latter point creates special conceptual issues regarding causality, and the composition of comparison samples (see matching) is especially important. One advantage of retrospective studies is that they allow one to investigate phenomena that are either unlikely or undesirable to occur in the future; further, since all key events occur in the past, retrospective studies can often be undertaken economically.
robust methods Methods of data analysis that are robust are not strongly affected by extreme changes to small portions of the data; their answers do not change very much from the presence of outliers. A classic example of a robust method is the median.
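A small illustration of the median's robustness, using made-up readings in which one value is grossly mis-recorded:

```python
import statistics

clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean[:-1] + [102.0]   # last reading replaced by an outlier

mean_shift = statistics.fmean(contaminated) - statistics.fmean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)
```

The single outlier moves the average by more than 18 units, while the median does not move at all.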
rotatable The property of an experimental design that minimizes the correlation among the terms of a full quadratic model, (including interactions), thereby allowing one to select some terms without regard to the significance of other terms. A generalization of orthogonality to response surface designs.
R-squared A statistic summarizing how well a predictive model fits the data from which the model was derived.
  1. R-squared is calculated as 1 minus the following ratio:
    SUM[ squared residuals from  model ]/
    SUM[ squared deviations from mean ]

    A perfectly fitting model yields an R-squared of 1.

  2. This definition is flawed by giving more credit to complicated models than is appropriate. To achieve an average value of zero when the model has no merit, R-squared-adjusted is often proposed.
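Both quantities can be sketched in a few lines of Python. The adjusted form below uses the common formula 1 - (1 - R-squared)(n - 1)/(n - p - 1), where p counts the fitted coefficients other than the intercept. The function names and the example data are assumptions for illustration.

```python
def r_squared(observed, fitted) -> float:
    """1 minus (sum of squared residuals) / (sum of squared deviations from the mean)."""
    mean = sum(observed) / len(observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_total = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_residual / ss_total

def r_squared_adjusted(observed, fitted, n_params: int) -> float:
    """Adjusted R-squared; n_params excludes the intercept."""
    n = len(observed)
    r2 = r_squared(observed, fitted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.9, 4.9]   # assumed predictions from a one-factor model

r2 = r_squared(observed, fitted)
r2_adj = r_squared_adjusted(observed, fitted, n_params=1)
```

The adjustment always pulls the value down, and more so as the number of fitted coefficients approaches the number of observations.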


[S]
sample
  1. The set of observational units (wafers, people, etc) whose properties our study is to observe. When we select a sample by scientific randomization, we are more easily able to generalize our conclusions to the population of interest. As opposed to population.

  2. For a given characteristic, the collection of measurements that are actually observed.
sample size The number of observations in, or planned to be in, a study or other investigation. Key considerations in selecting a particular sample size are
  1. value associated with any particular level of precision,
  2. the costs of obtaining observations, and
  3. available resources.
Some generic advice on sample sizes is
  1. 16, to estimate the center of a distribution by its average,
  2. 20, to estimate the correlation between two measurements,
  3. 32 per group, to estimate average difference between two groups,
  4. 50, to estimate the standard deviation of a distribution.
sample frame In sampling theory, the set of all units from which a sample is drawn. The sample frame is synonymous with the statistical population, but has a more technical and precise connotation relating to a particular enumeration of the population elements.
sampling distribution The distribution of a summary quantity or statistic.
sampling error In surveys, the error that results when the selection of respondents (the sample) is biased in a way so that the population about which one wishes to make conclusions is not accurately represented.
scatter plot A graph of a pair of variables that plots the first variable along the x-axis and the second variable along the y-axis. In a scatterplot, the points of successive pairs are not connected.
scatterplot matrix A graph of several variables that plots all pairs of variables in a corresponding scatterplot. In turn, these scatterplots are arranged in the form of an upper triangular matrix. In any row of this matrix, the y axes of all plots are always the same variable; in any column, the x axes are also the same variable.
S chart A control chart that plots standard deviations. Like R charts, S charts are typically used to monitor process uniformity, and measurement precision. Constant sample sizes for the rational subgroups are strongly recommended. There is a special set of Western Electric rules for S charts when the rational subgroup size is two. S charts are preferred to R charts for reasons of efficiency regardless of rational subgroup size, but this becomes especially important for sizes greater than 9.
sensitive methods Methods of data analysis that are able to detect the presence of phenomena in the presence of noise, or in spite of small samples. Sensitive methods make the most efficient use of available data, and are especially useful when analyzing small datasets, such as from experiments. A classic example of a sensitive method is the average. When the underlying data comes from a single normal or Gaussian distribution, the average is the most sensitive method for estimating the distribution's center.
sensitivity In metrology, the rate at which the average measurement changes in response to changes in the true value. Often reported in units of percentage change to unit percentage change. The term is also used in the interpretation of response surface models.
sensitivity study An investigation of a process that identifies how strongly the input parameters affect one or more desired output characteristics.
sequential studies A style of investigation, especially in experiments, whereby a study is broken into a series of distinct phases, and the results of each phase are allowed to influence subsequent phases.
special cause A source of variation that is large, intermittent or unpredictable, affecting only some of the individual values of the process output being studied. Also called an assignable cause.
specification limits The numerical values defining the interval of acceptability for a particular characteristic.
split A group of experimental units that is processed in identical fashion. For example, a 2x2 factorial experiment would have four splits. When applied to a lot of 24 wafers, 6 wafers would be assigned to each "split."
stability The degree to which observations of a process can be represented by a single random "white noise" distribution, in which the prediction of the next value is not improved by knowing the process history.
stable process A process that is in a state of statistical control.
standard deviation A measure of spread or dispersion of a distribution. It estimates the square root of the average squared deviation from the distribution average, sometimes called the root-mean-square. Among all measures of dispersion, the standard deviation is the most efficient for normally distributed data. Also, unlike the range, it converges to a single value as more data from the distribution is gathered.
standard error The standard deviation for a statistic's sampling distribution. Because many statistics have sampling distributions that are approximately normal, plus and minus 2 standard errors is usually an approximate 95 percent confidence interval.
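For the most familiar case, the average of independent observations, the standard error is the sample standard deviation divided by the square root of the sample size. The sketch below uses made-up readings. Other statistics have their own standard error formulas.

```python
import math
import statistics

readings = [10.2, 9.9, 10.4, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1]

average = statistics.fmean(readings)
standard_error = statistics.stdev(readings) / math.sqrt(len(readings))

# Approximate 95 percent confidence interval: average plus and minus 2 standard errors.
interval = (average - 2 * standard_error, average + 2 * standard_error)
```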
statistic A value calculated from sample data.
statistical control The state of a process that is influenced by common causes alone. See in control.
statistical design of experiments (SDE) Also called design of experiments (DoE, DoX).
  1. The theory of experimental design emphasizing factorial and fractional factorial designs, response surface modeling, and analysis of variance methods.

  2. A particular experiment based on this theory.

  3. The scientific principles, experimental design strategies, and model building and evaluation techniques that lead to the efficient and thorough characterization and/or optimization of products and processes.
statistical process control (SPC) The conversion of data to information using statistical techniques to document, correct, and improve process performance.
SPC tools A didactic list from Kaoru Ishikawa of the following easy-to-use tools:
  1. Pareto charts and bar charts
  2. cause-effect and process flow diagrams,
  3. stratification,
  4. checklist, stem-and-leaf plots, and digidot charts
  5. histograms and dot plot,
  6. scatterplots, and scatterplot matrices,
  7. trend charts and control charts.
Ishikawa has introduced slight variations to this list over time.
statistical quality control (SQC) Statistical methods and procedures used to measure, monitor, assure, and document compliance with requirements.
stem and leaf plot A variation on a checklist in which a value is recorded in an interval not with a stroke but with one or more of its most significant digits.
strata Plural of stratum. Especially in survey work, the several subgroups that define the population of interest. Often, sampling plans are modified for the different strata; in particular, in small important ones it is common to measure all members.
stratification
  1. In SPC sampling, the property by which samples are more systematically broad than would be expected by chance.
  2. In surveys, a systematic property of the sample, by which one or more demographics have an association with the response.
  3. Same property as in 2, except that the property is associated with the sample frame (i.e. population).
strength For an orthogonal array, the largest dimension t such that any subset of t factors is a full factorial design.
survey A method of data collection that involves asking a fixed set of questions of selected individuals. Key issues involve questionnaire development, (ideally random) sample selection, and nonresponse management.
survey errors The sources of variability in a survey. Following Salant and Dillman (1994), these errors can be classified as coverage error, sampling error, measurement error, and nonresponse.


[T]
target value The ideal value of a parameter or characteristic.
telephone survey A method of administering surveys whereby the respondents are contacted by telephone.
transformation A function that serves to modify a response or factor, usually motivated to make a particular model fit better or be more easily interpreted. The most common transformation is to replace a variable by its logarithm.
trend chart A graph that plots values on the y-axis against the time at which they either occurred or were measured on the x-axis. Usually, the plotted points are connected with a line to emphasize the time order.
troubleshooting guide A pre-defined action plan that defines appropriate corrective actions when out-of-control conditions occur. Troubleshooting guides may take the form of checklists or decision flowcharts, and may be implemented on paper or as computer applications.
troubleshooting guide, detailed A troubleshooting guide is detailed when
  1. it is non-trivial,
  2. it contains sections for production, engineering, and maintenance, and
  3. it addresses violations for each of the SPC rules designated for that monitor.
troubleshooting guide, non-trivial A troubleshooting guide is non-trivial (a) when it checks for issues other than the quality of the measurements and (b) when it provides for actions other than the calling for assistance.
troubleshooting guide, preliminary A preliminary troubleshooting guide is often trivial in the sense that
  1. it focuses on checking the quality of measurements and
  2. resolves problems by calling for assistance.
In addition, preliminary troubleshooting guides typically contain guidelines for monitoring the process, and recording observations and reacting to out-of-control points.
tweaking For a process, the practice of routine adjustment in order to compensate for deviations from target. For a process that is in control, tweaking actually increases process variation, and is therefore not recommended.
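The claim that tweaking inflates variation can be illustrated by simulation. The sketch below assumes a process that is in control with independent Gaussian noise, and an operator who shifts the setpoint by the full observed deviation after every run. The function name, seed, and run count are arbitrary choices.

```python
import random
import statistics

def run_process(n_runs: int, tweak: bool, seed: int = 2024,
                target: float = 0.0, sigma: float = 1.0):
    """Simulate an in-control process, optionally adjusting after every run."""
    rng = random.Random(seed)
    offset = 0.0                      # cumulative setpoint adjustment
    outputs = []
    for _ in range(n_runs):
        value = target + offset + rng.gauss(0.0, sigma)
        outputs.append(value)
        if tweak:
            # Compensate for the full deviation from target just observed.
            offset -= value - target
    return outputs

steady = run_process(20000, tweak=False)
tweaked = run_process(20000, tweak=True)
variance_ratio = statistics.variance(tweaked) / statistics.variance(steady)
```

Under these assumptions each tweaked output equals the current noise minus the previous noise, so the variance roughly doubles.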
2-level designs A category of experimental designs in which the input factors take only two distinct values (two distinct levels).

[U]
uncertainty A term for the fuzzy concept of qualifying statements of what is known or concluded with quantitative statements of probability. Uncertainty usually has two aspects:
  1. that created by the experimental (i.e. observational) error associated with taking observations, and
  2. that implied by the use of imperfect models.


[V]
validity In questionnaire development theory, validity is a measure of how well the questionnaire measures what it says it measures. Content validity is the statement that the questions seem appropriate and relevant when reviewed by experts in the field. Concurrent validity is the degree to which questionnaire results are related to other relevant measures taken at approximately the same time. Content validity is the degree to which a measure involves or reflects the actual content (e.g. English grammar on an English grammar test) of the variable measured. Predictive validity is the degree to which a measure predicts the future behavior or results it is designed to predict. See also questionnaire error.
variance components Variance components are estimates of contributions to total common cause variation that are attributable to distinct causal or sampling parameters. One example is to describe total thickness variation as the sum of contributions from variation in gases, temperature, power, etc. Another example is to describe the total variation in an electrical parameter in terms of the sum of contributions from lot-to-lot variation, wafer-to-wafer variation, within wafer variation, and measurement error.
variation The difference among individual outputs of a process. The causes of variation can be grouped into two major classes-- common causes and special causes; the common cause variation can often be decomposed into variance components.


[W]
Western Electric rules A set of rules that can be applied to most Shewhart control charts that define when a plotted point signals an out-of-control condition, even though it may be still within the control limits.


[X]
X-bar-R chart
  1. Any pair of control charts that plots both the average (the "X-bar") and the range of a rational subgroup. By convention, the plot of averages is above the plot of ranges, with the two x-axes, which denote rational subgroup sequence order, aligned.

  2. In addition, the algorithm for calculating the width of the control limits for the averages based solely on the centerline of the ranges. Such an approach can be appropriate for
    • single wafer processes,
    • when the rational subgroups consist of multiple wafers, and
    • when the ranges are computed from wafer averages only.
See also X-bar-S charts.
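A minimal sketch of the control-limit calculation for an X-bar-R chart, in which the width of the limits for the averages comes from the average range. A2, D3, and D4 are the standard tabulated Shewhart constants for subgroup sizes 2 through 6. The function name and the example data are assumptions for illustration.

```python
# Standard Shewhart constants (A2, D3, D4) indexed by rational subgroup size.
SHEWHART_CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
    5: (0.577, 0.0, 2.114),
    6: (0.483, 0.0, 2.004),
}

def xbar_r_limits(subgroups):
    """Return (lower limit, centerline, upper limit) for the averages and the ranges."""
    n = len(subgroups[0])             # assumes constant subgroup size
    a2, d3, d4 = SHEWHART_CONSTANTS[n]
    averages = [sum(g) / n for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_average = sum(averages) / len(averages)
    r_bar = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_average - a2 * r_bar, grand_average, grand_average + a2 * r_bar),
        "range": (d3 * r_bar, r_bar, d4 * r_bar),
    }

limits = xbar_r_limits([[10, 12, 11], [9, 11, 10], [11, 13, 12]])
```

Real charts would normally be set up from many more subgroups than the three used here.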
X-bar-S chart A control chart that plots both the average (the "X-bar") and the standard deviation of a rational subgroup. X-bar-S charts are completely analogous to X-bar-R charts, except that the role of the ranges has been replaced by standard deviations. As with R charts compared to S charts, X-bar-S charts are more efficient than X-bar-R charts.
X chart A synonym for an individuals chart.


[Y]
yield The number of units that pass some inspection criteria divided by the number submitted.

[Z]

Source: NIST/SEMATECH e-Handbook of Statistical Methods

NOTE: This glossary is being adapted from a glossary kindly donated by Bill Heavlin of Advanced Micro Devices.

