A Distribution-Free Control Chart for Retrospective
Location Analysis of Subgrouped Multivariate Data
by
Richard C. Bell, Jr.
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
August 6, 2011
Keywords: phase I, preliminary, in-control reference sample,
robust, nonparametric, data depth
Copyright 2011 by Richard C. Bell, Jr.
Approved by
Saeed Maghsoodloo, Co-chair, Professor of Industrial Engineering
L. Allison Jones-Farmer, Co-chair, Associate Professor of Management
Nedret Billor, Associate Professor of Mathematics and Statistics
Alice E. Smith, Professor of Industrial Engineering
Abstract
In multivariate quality control, a proper Phase I analysis is essential to the success of
Phase II monitoring. Even self-starting methods, which seek to minimize the Phase I process,
usually recommend a single retrospective analysis at some point in the control charting process.
This is true regardless of the underlying distribution of a process, which often cannot be assumed
to be multivariate normal. A literature review reveals no distribution-free Phase I multivariate
techniques in existence, so this research seeks to fill that gap by developing a distribution-free
method of establishing an in-control reference sample for subgrouped multivariate processes in
Phase I. The resulting multivariate sample, representing the in-control state of a process, can
then be used to estimate the appropriate parameters for the Phase II multivariate quality control
monitoring method of choice.
The proposed method, which assumes constant covariance within subgroups, uses data
depth in conjunction with robust estimators to detect both isolated and sustained shifts in
subgroup location. Using Monte Carlo simulation, the proposed method is compared to the
traditional Hotelling's T2 chart with a Phase I upper control limit. Although Hotelling's T2 chart
is preferred when data are multivariate normally distributed, the proposed method is shown to
perform significantly better than Hotelling's T2 chart when a process distribution is heavy-tailed
or skewed.
Acknowledgements
The author would first like to thank the United States Army for allowing him the
opportunity to pursue his dream of achieving a doctoral degree. He dedicates this work to all
veterans of the armed forces, in particular those who have given their lives in defense of this
great country. The author is also deeply grateful to Dr. L. Allison Jones-Farmer for suggesting
this research topic and spending countless hours guiding this research as committee co-chair, to
Dr. Saeed Maghsoodloo for his expert advice as committee co-chair, and to Dr. Nedret Billor and
Dr. Alice E. Smith for their valuable contributions as committee members. In addition, the
author is extremely thankful for the keen insights provided by researchers outside the university
such as Dr. Robert Serfling, Dr. Joe H. Sullivan, and Dr. Satyaki Mazumder. Finally, the author
would like to acknowledge that this work would not have been possible without the guiding hand
of God in his life and the unwavering support of his family and friends, especially his wonderful
mother Phyllis Carter and his beautiful fiancée Heide Matthews.
Table of Contents
Abstract
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
1 Introduction and Literature Review
1.1 Background and Motivation
1.2 Differences Between Phase I and Phase II
1.3 Phase II Multivariate Control Charting Methods
1.3.1 Phase II Multivariate Parametric Charts
1.3.2 Phase II Multivariate Distribution-Free, Nonparametric, and Robust Charts
1.3.3 Phase II Multivariate Rank-Based Charts
1.4 Self-Starting Multivariate Control Charting Methods
1.5 Phase I Multivariate Control Charting Methods
1.6 Developing a Distribution-Free Phase I Procedure -- A Univariate Example
1.7 Special Considerations in Multivariate Quality Control
1.8 Organization of Dissertation
2 Measuring Centrality of Multivariate Data Using Data Depth
2.1 Fundamentals of Data Depth
2.2 Desirable Properties of Data Depth Functions
2.3 Robust Mahalanobis Depth
2.4 Mahalanobis Spatial Depth
2.5 Simplicial Depth
3 The Multivariate Mean-Rank (MMR) Control Chart
3.1 Introduction
3.2 Design of the MMR Chart
3.2.1 The MMR Control Chart Statistic
3.2.2 Empirical Control Limits for the MMR Chart
3.2.3 Analytical Control Limits for the MMR Chart
3.3 Example Application of the MMR Chart
4 MMR Chart Performance Assessment Methodology
4.1 Introduction
4.2 Establishing Baseline Performance Using Hotelling's T2 Chart
4.3 Simulating Symmetric and Skewed Process Distributions
4.4 Evaluating In-Control Performance
4.5 Evaluating Out-of-Control Performance
4.6 Evaluating Out-of-Control Performance with Skewed Data
5 MMR Chart Performance Comparisons
5.1 Introduction
5.2 MMR Chart Performance with Symmetric Distributions
5.2.1 In-Control Performance with Symmetric Distributions
5.2.2 Isolated Shifts of the Mean with Symmetric Distributions
5.2.3 Sustained Shifts of the Mean with Symmetric Distributions
5.3 MMR Chart Performance with Skewed Data
5.3.1 In-Control Performance with Skewed Data
5.3.2 Isolated Shifts of the Mean with Skewed Data
5.3.3 Sustained Shifts of the Mean with Skewed Data
5.4 MMR Chart Performance with Larger Subgroup Sizes
5.5 Robust Estimators of Location and Scatter for the MMR Chart
6 An Example Phase I Analysis Using the MMR Chart
6.1 Simulating the Contaminated Reference Sample
6.2 Removing Outliers from the Sample
6.3 Analyzing the Results
7 Conclusion
7.1 Synopsis of Findings
7.2 Summary of Research Conducted
7.3 Recommendations for Phase I Analysis
7.4 Recommendations for Phase II Monitoring
7.5 Future Research Directions
References
Appendices
Appendix A: MATLAB Code for Computing Robust Mahalanobis Depth
Appendix B: MATLAB Code for Computing Mahalanobis Spatial Depth
Appendix C: Expanded Table of Empirical UCLs for the MMR Chart
Appendix D: MATLAB Code for Finding Empirical UCLs for the MMR Chart
Appendix E: Empirical UCLs for Hotelling's T2 Chart
Appendix F: MATLAB Code for Finding Empirical UCLs for Hotelling's T2 Chart
Appendix G: MATLAB Code for Assessing MMR Chart Performance
Appendix H: MATLAB Code for Assessing Hotelling's T2 Chart Performance
Appendix I: Simulation Results Using In-Control Symmetric Data
Appendix J: Simulation Results Using Symmetric Data with an IS in p = 2
Appendix K: Simulation Results Using Symmetric Data with an IS in p = 5
Appendix L: Simulation Results Using Symmetric Data with an IS in p = 10
Appendix M: Simulation Results Using Symmetric Data with a 5% SS in p = 2
Appendix N: Simulation Results Using Symmetric Data with a 15% SS in p = 2
Appendix O: Simulation Results Using Symmetric Data with a 30% SS in p = 2
Appendix P: Simulation Results Using Symmetric Data with a 5% SS in p = 10
Appendix Q: Simulation Results Using Symmetric Data with a 15% SS in p = 10
Appendix R: Simulation Results Using Symmetric Data with a 30% SS in p = 10
Appendix S: Simulation Results Using In-Control Skewed Data
Appendix T: Simulation Results Using Skewed Data with an IS in p = 2
Appendix U: Simulation Results Using Skewed Data with an IS in p = 5
Appendix V: Simulation Results Using Skewed Data with a 5% SS in p = 2
Appendix W: Simulation Results Using Skewed Data with a 15% SS in p = 2
Appendix X: Simulation Results Using Skewed Data with a 30% SS in p = 2
Appendix Y: Simulation Results Using Skewed Data with a SS in p = 5
Appendix Z: Subgroup Size Analysis Using In-Control Data
Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5
Appendix BB: Subgroup Size Analysis Using Data with a 15% SS in p = 5
List of Tables
Table 2.3.1 Data Ranked According to RMD
Table 2.4.1 Data Ranked According to MSD
Table 3.2.1 Empirical Control Limits for the MMR Chart
Table 3.2.2 Simulated IC FAPs Using Normal Theory Limits
Table 3.3.1 MMR Chart Data for the First Subgroup of a Bivariate Process
Table 4.3.1 Summary of Planned Experiments
Table 5.2.1 Recommended Phase I Control Chart Usage for Heavy-Tailed Data
Table 5.3.1 Recommended Phase I Control Chart Usage for Skewed Multivariate Data
Table 6.1.1 MMR Chart UCLs for Chapter 6 Example
List of Figures
Figure 1.1.1 The Unification of Relevant Research Areas
Figure 1.6.1 Initial (Top Panel) and Revised (Bottom Panel) Control Charts
Figure 2.3.1 Bivariate Random Sample
Figure 2.4.1 Illustration of Spatial Depth
Figure 2.5.1 Illustration of Simplicial Depth
Figure 3.2.1 Q-Q Plots of Zi for m = 50, n = 5(5)20
Figure 5.2.1 Empirical IC FAPs for Symmetric Bivariate Distributions
Figure 5.2.2 Empirical IC FAPs for t(3) Processes in Higher Dimensions
Figure 5.2.3 Control Chart Performance on Symmetric Bivariate Data with an IS
Figure 5.2.4 MMR-RMD/MSD Chart Performance on t(3) Data with an IS
Figure 5.2.5 Control Chart Performance on t(3) Data with an IS in Higher Dimensions
Figure 5.2.6 Control Chart Performance on Increasingly Contaminated Bivariate t(3) Data
Figure 5.2.7 MMR-RMD/MSD Chart Performance on Bivariate t(3) Data with a 30% SS
Figure 5.2.8 Control Chart Performance on t(3) Data with a 15% SS in p = 10
Figure 5.3.1 Empirical IC FAPs for Lognormal Processes in p = 2 and p = 5
Figure 5.3.2 Control Chart Performance on Bivariate Lognormal Data with an IS
Figure 5.3.3 MMR-MSD/RMD Chart Performance on Bivariate LGN Data with an IS
Figure 5.3.4 Control Chart Performance on LGN Data with an IS in p = 5
Figure 5.3.5 Control Chart Performance on Increasingly Contaminated LGN Data in p = 2
Figure 5.3.6 MMR-MSD Chart Performance on Increasingly Contaminated LGN Data
Figure 5.3.7 MSD and RMD Rankings for Bivariate LGN Data with a 30% SS
Figure 5.3.8 Scatterplots of MSD vs. RMD Ranks for Shifted Bivariate LGN Data
Figure 5.3.9 MMR-MSD/RMD Chart Performance on Increasingly Shifted LGN Data
Figure 5.3.10 MMR-RMD Chart Performance on Bivariate LGN Data with a 30% SS
Figure 5.3.11 Control Chart Performance on LGN Data with a 15% SS in p = 5
Figure 5.4.1 Effects of Subgroup Size on Control Chart Performance Under an IS in p = 5
Figure 5.4.2 Effects of Subgroup Size on Chart Performance Under a 15% SS in p = 5
Figure 5.5.1 Comparison of MMR-RMD (Using BACON Estimators) and HT2 Charts
Figure 5.5.2 The Effects of Increasing Shift Sizes on Univariate and Bivariate t(3) Data
Figure 5.5.3 Improvement in MMR-RMD Chart Performance with New Estimators
Figure 5.5.4 Change in Chart Performance When the Mean is Known
Figure 5.5.5 Redistribution of Ranks Under 5% and 30% Sustained Shifts
Figure 6.2.1 Initial Application of Phase I Control Charts to the Lognormal Sample
Figure 6.2.2 Second Iteration of the MMR-MSD Control Chart
Figure 6.2.3 Final Control Charts After Four Iterations of Phase I Analysis
List of Abbreviations
AP alarm probability
ARL average run length
BACON blocked adaptive computationally efficient outlier nominators
CL center line
CUSUM cumulative sum
EAP empirical alarm probability
EWMA exponentially weighted moving average
FAP false alarm probability
HT2 Hotelling's T2
IC in control
IS isolated shift
LCL lower control limit
LGN lognormal
MA moving average
MCD minimum covariance determinant
MCUSUM multivariate cumulative sum
MEWMA multivariate exponentially weighted moving average
MHD Mahalanobis depth
MMR multivariate mean-rank
MSD Mahalanobis spatial depth
MVE minimum volume ellipsoid
OC out of control
RBP replacement breakdown point
RL run length
RMCD reweighted minimum covariance determinant
RMD robust Mahalanobis depth
SD simplicial depth
SPD spatial depth
SS sustained shift
UCL upper control limit
Note: To avoid confusion, the reader should pay particular attention to the definitions provided
for OC, SD, and SS. These abbreviations are often used in statistical literature to stand for
operating characteristic, standard deviation, and sum of squares, respectively, but are defined
differently in this document.
1 Introduction and Literature Review
1.1 Background and Motivation
Multivariate statistical process control charts are necessary to simultaneously monitor
two or more correlated variables representing quality characteristics of an industrial or other
process. A multivariate control charting application usually involves a dimension reduction
technique that converts multivariate observations to one-dimensional control chart statistics,
which are then monitored using appropriate control limits. This approach accounts for the
correlation structure in the data, whereas monitoring correlated variables using separate
univariate control charts for each variable ignores the correlation among quality characteristics
and can lead to erroneous conclusions about the state of a process. The first multivariate quality
control chart is attributed to Hotelling (1947), who created the T2 chart to monitor bombsight
data during World War II.
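The danger of charting correlated variables separately can be illustrated with a minimal sketch. It assumes a bivariate process whose in-control mean, variances, and correlation are known (the values below are hypothetical and chosen only for illustration), so that the T2-type statistic can be compared to a chi-square quantile with 2 degrees of freedom. This is not the Phase I limit discussed later in this document; the function names are likewise illustrative.

```python
import math

def t2_known_bivariate(x, mean, var1, var2, rho):
    """T2-type statistic (x - mu)' Sigma^{-1} (x - mu) for one bivariate
    observation, written out for a 2 x 2 covariance matrix.
    Assumption: mean, variances, and correlation are known exactly."""
    z1 = (x[0] - mean[0]) / math.sqrt(var1)
    z2 = (x[1] - mean[1]) / math.sqrt(var2)
    return (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)

def chi2_quantile_2df(p):
    """Exact p-quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p)

# Hypothetical observation: each coordinate is only 1.5 standard deviations
# from its mean, but the pair contradicts a strong positive correlation.
x = (1.5, -1.5)
t2 = t2_known_bivariate(x, mean=(0.0, 0.0), var1=1.0, var2=1.0, rho=0.9)
ucl = chi2_quantile_2df(0.9973)

inside_univariate_limits = all(abs(v) < 3.0 for v in x)  # separate 3-sigma charts
signals_on_t2_chart = t2 > ucl                           # single multivariate chart

print(inside_univariate_limits, signals_on_t2_chart)
```

Under these assumptions, both coordinates plot inside their individual three-sigma limits, while the single multivariate statistic exceeds its limit because the observation is inconsistent with the correlation structure.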
Multivariate quality control charting has grown in both popularity and relevance since
Hotelling's introduction. In a review of statistical process control research issues and ideas,
Woodall and Montgomery (1999) pointed out the notable rise in multivariate quality control
research due to increased measurement capability and computing power. Montgomery (2005, p.
489) noted that larger manufacturing databases have greatly increased the use of multivariate
quality control methods in recent years. Bersimis, Psarakis, and Panaretos (2007) stated in a
multivariate statistical process control overview that multivariate Shewhart-type charts are the
most common control charts in industry today, adding that more examination of this area is very
important. In particular, they pointed out the need for more research into robust design of
Hotelling's T2 chart and nonparametric control charts.
As represented by Figure 1.1.1, the contribution of this research is the merger of three
separately researched but highly related fields (distribution-free Phase I quality control,
computational geometry, and robust parameter estimation) to provide a solution to the open
problem of establishing an outlier-free reference sample for a multivariate process without the
assumption of normality. Great strides have been made in each of the aforementioned research
areas in recent years, yet no one in the statistical quality control field has leveraged recent
developments in the manner accomplished by this research. The following chapters will detail
the multivariate extension of an existing univariate distribution-free control chart for subgroup
location, including the use of appropriate data depth functions for purposes of dimension
reduction and the implementation of an effective robust parameter estimation technique, to
provide a solution to this problem.
[Figure 1.1.1 The Unification of Relevant Research Areas: diagram joining Distribution-Free Phase I Quality Control, Robust Parameter Estimation, and Computational Geometry]
1.2 Differences Between Phase I and Phase II
A control charting application is typically divided into two distinct phases. In Phase I,
also known as the preliminary analysis phase, when little is known about a process being studied,
the objective is to identify an in-control (IC) reference sample. This involves retrospective
analysis of a historical data set in order to eliminate any data points which do not accurately
represent the routine operation of the process. The resulting data are described as IC because it
is believed that all remaining variability in the process is inherent to the process itself and not
due to assignable causes. Upon completion of Phase I, the IC reference sample is used to
establish control limits for Phase II, the monitoring stage of a control charting application. In
Phase II, newly observed data points are successively compared to the control limits to identify
significant departures from the IC state. Should an observation fall outside the control limits, a
search for an assignable cause is immediately undertaken. If the change in process behavior can
be linked to special causes or external factors, the process is deemed out of control (OC) and
remedial action is taken to correct the problem.
Prior to conducting any analysis in a control charting scenario, it is usually assumed that
the unedited reference sample may contain OC points and the control limits are unknown. The
challenging nature of a Phase I analysis under these conditions has been recognized since the
earliest days of statistical process control. Shewhart (1939, p. 76) said, "In the majority of
practical instances, the most difficult job of all is to choose the sample that is to be used as the
basis for establishing the tolerance range. If one chooses such a sample without respect to the
assignable causes present, it is practically impossible to establish a tolerance range that is not
subject to a huge error."
If a flawed Phase I analysis results in the erroneous inclusion of OC points in the IC
reference sample, the control limits for Phase II monitoring will be too wide and OC situations
will not be detected in a timely manner. This in turn will result in the production of poor quality
goods or services for an unnecessarily protracted period of time. When the OC condition is
finally detected, the substandard goods or services will have to be reworked, or scrapped and
completely reproduced. This can cost the goods or services facility money in terms of labor and
other operating expenses for rework or reproduction, additional materials necessary for
reproduction, lost future production while previous work is redone, financial penalties for failure
to meet contractual deadlines, and loss of customers due to dissatisfaction with faulty or
untimely goods or services received.
On the other hand, if a faulty Phase I analysis results in the erroneous exclusion of IC
points from the IC reference sample, the control limits for Phase II monitoring will be too narrow
and false alarms will repeatedly occur. False alarms require work stoppages to search for
assignable causes, potentially costing the goods or services facility money in terms of lower
throughput, idle workers while OC signals are investigated, overtime for quality control
personnel investigating OC signals, financial penalties for failure to meet contractual deadlines,
and loss of customers due to goods or services not being received in a timely manner.
Ultimately, whether the resulting control limits are too wide or too narrow, an incorrect Phase I
analysis can also cause a lack of confidence by all in the quality control methodology in place,
creating a challenging environment for managers.
Phase I control charts are designed with the goal of achieving a specified overall IC false
alarm probability (FAP), defined as the probability of one or more observations plotting outside
the control limits in the absence of assignable causes. Phase I usually involves iteratively
comparing the reference sample to trial control limits (corresponding to the desired overall IC
FAP) estimated from the sample. At each iteration of a Phase I analysis, an OC point is
eliminated from the reference sample if an assignable cause is identified, and trial control limits
are updated excluding the OC point. This iterative process continues until all points in the
reference sample are IC.
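The iterative procedure just described can be sketched as follows. This is a simplified illustration only: it uses plain three-sigma trial limits on a list of hypothetical chart statistics as a stand-in for limits calibrated to a desired overall IC FAP, and it assumes that an assignable cause is found for every flagged point. The function name, the multiplier k, and the data are all assumptions made for the example.

```python
import statistics

def phase1_trim(chart_stats, k=3.0, max_iter=20):
    """Recompute trial limits from the retained points and drop flagged points
    until none remain outside the limits (or max_iter is reached).
    Returns (indices kept, indices removed)."""
    kept = list(enumerate(chart_stats))
    removed = []
    for _ in range(max_iter):
        values = [v for _, v in kept]
        center = statistics.mean(values)
        spread = statistics.stdev(values)
        lcl, ucl = center - k * spread, center + k * spread
        flagged = [i for i, v in kept if v < lcl or v > ucl]
        if not flagged:
            break  # every retained point plots inside the trial limits
        # Assumption: an assignable cause was identified for each flagged point.
        removed.extend(flagged)
        kept = [(i, v) for i, v in kept if lcl <= v <= ucl]
    return [i for i, _ in kept], removed

# Hypothetical chart statistics; the last one is far from the rest.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7,
        10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 14.0]
kept_idx, removed_idx = phase1_trim(data)
print(removed_idx)
```

In practice a flagged point would be removed only after investigation, and the trial limits would be chosen to deliver the specified overall IC FAP rather than a fixed multiple of the sample standard deviation.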
Phase I analysis requires careful consideration when it involves methods which compute
independent control chart statistics consisting of individual observations (e.g. the univariate X
chart or the multivariate T2 chart) or subgrouped observations (e.g. the univariate X̄ chart or the
multivariate T2 chart). Provided the observations originate from random sampling, the control
chart statistics are independent of one another. However, because the control limits are
estimated from the reference sample itself in Phase I, the control limits are dependent on each
sample point included in their calculation. Thus, successive comparisons of chart statistics to
control limits are statistically dependent despite the control chart statistics themselves being
independent. These dependencies often make it difficult to correctly determine the overall IC
FAP for a Phase I analysis.
Phase II, on the other hand, consists of comparing new observations (in the form of a
chart statistic) to the control limits previously established in Phase I. Because the control limits
in Phase II are fixed through conditioning, successive comparisons of chart statistics to control
limits are independent provided the chart statistics are independent of one another (e.g. the X, X̄,
and T2 charts). This is in contrast to moving average (MA), exponentially weighted moving
average (EWMA), or cumulative sum (CUSUM) charts and their multivariate counterparts,
whose chart statistics include past observations and are therefore naturally dependent.
Chart performance in Phase II is often measured using moments of the run length (RL)
distribution. The RL is the number of observations until an OC signal is observed. If the
comparisons of the chart statistics to the control limits are independent, the RL is a geometric
random variable. The expected value of the IC RL is equal to 1/α, where α is equal to the
probability that a single chart statistic plots outside the control limits in the absence of assignable
causes. The expected value of the RL is known as the average run length (ARL) and is
commonly used to describe control chart performance in Phase II.
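The relationship between the per-observation signal probability and the ARL can be checked with a short simulation. The sketch below assumes independent comparisons, each signalling with the same probability (called alpha in the code); the value 0.05, the seed, and the number of replications are arbitrary choices made for illustration.

```python
import random

def simulate_run_length(alpha, rng):
    """Number of observations up to and including the first signal, when each
    observation signals independently with probability alpha."""
    n = 1
    while rng.random() >= alpha:
        n += 1
    return n

rng = random.Random(2011)   # fixed seed so the sketch is reproducible
alpha = 0.05                # hypothetical per-observation signal probability
runs = [simulate_run_length(alpha, rng) for _ in range(20000)]

arl_simulated = sum(runs) / len(runs)
arl_theory = 1.0 / alpha    # mean of a geometric random variable

print(arl_theory, round(arl_simulated, 2))
```

The simulated average should fall close to the theoretical value, with the discrepancy shrinking as the number of simulated runs grows.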
The purpose of this research is to develop a Phase I procedure for subgrouped
multivariate data that is distribution free when a process is IC. The procedure will be based on
the use of data depth in conjunction with robust estimators of location and scale to reduce
multivariate observations to univariate depth values, thus producing a center-outward ordering of
the multivariate data. The corresponding ranks of the univariate depth values, in the form of a
control statistic for each subgroup, will then be analyzed using a univariate chart. As the
following literature review will demonstrate, this is an area in much need of additional research.
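The general idea of reducing multivariate observations to depth values, ranking them, and summarizing the ranks by subgroup can be sketched for bivariate data. Several simplifications are assumed here and should not be read as the proposed method: the ordinary sample mean and covariance stand in for robust estimators, Mahalanobis depth 1/(1 + d^2) is the only depth function shown, ties are ignored, and the plain mean rank of each subgroup stands in for the control statistic defined in Chapter 3. The data and function names are hypothetical.

```python
import statistics

def mahalanobis_depths(points):
    """Mahalanobis depth 1 / (1 + d^2) of each bivariate point relative to the
    pooled sample. Assumption: non-robust sample mean and covariance."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx, syy = statistics.variance(xs), statistics.variance(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (len(points) - 1)
    det = sxx * syy - sxy * sxy
    depths = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance using the explicit 2 x 2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        depths.append(1.0 / (1.0 + d2))
    return depths

def subgroup_mean_ranks(subgroups):
    """Rank all pooled points by depth (rank 1 = least deep, i.e. most outlying)
    and return the mean rank of each subgroup."""
    pooled = [p for g in subgroups for p in g]
    depths = mahalanobis_depths(pooled)
    order = sorted(range(len(pooled)), key=lambda i: depths[i])
    rank = [0] * len(pooled)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    means, start = [], 0
    for g in subgroups:
        means.append(sum(rank[start:start + len(g)]) / len(g))
        start += len(g)
    return means

# Hypothetical data: the third subgroup is shifted away from the other two.
subgroups = [
    [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2)],
    [(-0.1, -0.3), (0.3, 0.0), (0.0, 0.3)],
    [(3.0, 3.1), (2.8, 3.3), (3.2, 2.9)],
]
mean_ranks = subgroup_mean_ranks(subgroups)
print(mean_ranks)
```

In this small example the shifted subgroup receives the lowest mean rank, since its points are the least deep on average. Because a third of the pooled sample is shifted, the non-robust estimates are themselves pulled toward the shifted points, which illustrates why robust estimators are paired with data depth.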
1.3 Phase II Multivariate Control Charting Methods
Existing Phase II multivariate control charting methods will be discussed first, beginning
with parametric charts. This will be followed by an examination of distribution-free,
nonparametric, and robust techniques, and will conclude with a synopsis of depth-based
nonparametric procedures for use in Phase II. Before undertaking this discussion, however, it is
important to distinguish precisely what is meant by the terms distribution free, nonparametric,
and robust.
Gibbons and Chakraborti (2003, p. 3) state, "In a distribution-free inference, whether for
testing or estimation, the methods are based on functions of the sample observations whose
corresponding random variable has a distribution which does not depend on the specific
distribution of the population from which the sample was drawn." In other words, a
"distribution-free" method uses a control chart statistic which follows the same distribution
regardless of the underlying distribution of the process itself. Gibbons and Chakraborti (2003, p.
3) add, "On the other hand, strictly speaking, the term nonparametric test implies a test for a
hypothesis which is not a statement about parameter values." This means that "nonparametric"
control charting methods assess whether the distribution of a process, as opposed to specific
parameters, has departed from the IC state. From this, it is clear that the terms distribution free
and nonparametric are not synonymous, as a control charting method could be distribution free
but not nonparametric and vice versa. Last but not least, the term "robust" will be used to refer
to methods in which the distribution of the statistics is similar regardless of the distribution of
the process data, but the methods may not be strictly distribution free. All characterizations of
control charting methods as being distribution free, nonparametric, or robust refer to the IC state
of a process only.
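A simple univariate illustration of the distribution-free property is the rank of an observation within a random sample: for independent draws from any continuous distribution, each rank is equally likely. The sketch below is an assumption-laden toy example rather than anything taken from the methods reviewed here; it compares a standard normal population with a standard lognormal population, and the sample size, seed, and number of replications are arbitrary.

```python
import random

def rank_of_first(sample):
    """Rank (1 = smallest) of the first observation within the sample."""
    return 1 + sum(1 for v in sample[1:] if v < sample[0])

def rank_frequencies(draw, n, reps, rng):
    """Relative frequency of each possible rank of the first observation
    across reps independent samples of size n."""
    counts = [0] * n
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        counts[rank_of_first(sample) - 1] += 1
    return [c / reps for c in counts]

rng = random.Random(42)
n, reps = 5, 20000

# Symmetric population versus skewed population.
freq_normal = rank_frequencies(lambda r: r.gauss(0.0, 1.0), n, reps, rng)
freq_lognormal = rank_frequencies(lambda r: r.lognormvariate(0.0, 1.0), n, reps, rng)

print([round(f, 3) for f in freq_normal])
print([round(f, 3) for f in freq_lognormal])
```

Both sets of frequencies should be close to 1/n for every rank, even though one population is symmetric and the other is heavily skewed. A statistic built only from such ranks therefore has the same IC distribution under either population.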
1.3.1 Phase II Multivariate Parametric Charts
Hotelling's T2 control chart is the most familiar multivariate quality control chart in
existence today [Montgomery (2005, p. 491)]. It is designed for detecting large shifts in the
mean vector of a multivariate normally distributed process because it uses information only from
the current sample, and it can be applied during both Phase I and Phase II using appropriate
control limits. Alternatively, authors such as Chenouri and Steiner (2009), Chenouri and
Variyath (2011), and Mohammadi, Midi, Arasan, and Al-Talib (2011) have proposed bypassing
Phase I by using the reweighted minimum covariance determinant (RMCD) method of Willems,
Pison, Rousseeuw, and Van Aelst (2002) to glean robust estimates of location and scatter from a
reference sample, and implementing those estimates directly in a Phase II T2 control chart. In all
cases, however, the T2 chart is reliant on the limiting assumption that the data follow a
multivariate normal distribution. This chart's lack of robustness to nonnormality is well
documented by distribution-free and nonparametric control chart authors such as Chou, Mason,
and Young (2001), Liu, Singh, and Teng (2004), and Fricker and Chang (2009a) who evaluated
their proposed methods in comparison to the traditional T2 chart applied to nonnormal process
data.
Crosier (1988) and Pignatiello and Runger (1990) proposed several multivariate
cumulative sum (MCUSUM) charts which are more sensitive to small or gradual location shifts
since they use past information in addition to the current sample, but these charts also rely on the
assumption of multivariate normally distributed data. Jackson (1991) presented a T2 chart using
principal components scores, a control chart for principal components residuals, and a control
chart for each independent principal component's scores, all based on the assumption of a
multivariate normally distributed process. The multivariate exponentially weighted moving
average (MEWMA) control chart developed by Lowry, Woodall, Champ, and Rigdon (1992) is,
like the MCUSUM chart, sensitive to small or gradual shifts but likewise based on the
assumption of multivariate normally distributed data. It can be designed to be robust to
nonnormality by using a small smoothing constant as noted by Stoumbos and Sullivan (2002),
Testik, Runger, and Borror (2003), and Testik and Borror (2004). However, the MEWMA chart
assumes that the IC process mean vector and covariance matrix are known, which is unlikely to
be the case in Phase I.
Numerous other parametric Phase II multivariate control charting methods, many of
which are variants of the well-known T2, MEWMA, and MCUSUM charts, have been proposed
but will not be detailed here. For comprehensive reviews of such charts, see Wierda (1994),
Lowry and Montgomery (1995), Mason, Champ, Tracy, Wierda, and Young (1997), Woodall
and Montgomery (1999), and Bersimis et al. (2007).
1.3.2 Phase II Multivariate Distribution-Free, Nonparametric, and Robust Charts
Nonparametric, distribution-free, and robust multivariate control charting methods have
been developed, yet they are usually designed for Phase II implementation. Hayter and Tsui
(1994) proposed a nonparametric multivariate control chart to detect location changes in
nonnormally distributed processes. This method is based on the empirical cumulative
distribution function of a statistic formed from an IC reference sample of 500 or more
observations, so it is strictly a Phase II method. Qiu and Hawkins (2001) developed a
distribution-free, rank-based CUSUM procedure for detecting a location shift, but this method
assumes knowledge of the IC mean vector. Chou et al. (2001) proposed a kernel smoothing
technique for estimating the distribution of the T2 control statistic and the upper control limit of
the T2 chart when the Phase II process data follow a multivariate exponential distribution. Qiu
and Hawkins (2003) also introduced a nonparametric CUSUM procedure for detecting mean
shifts in all directions. This method is based both on the order information among the
measurement components as well as the order information between measurement components
and their IC means, but it assumes that the IC distribution of a process is known. Sun and Tsung
(2003) developed a distribution-free multivariate control chart based on the distance between the
"kernel centre" of the known IC sample and the new observation, using support vector methods
to calculate the kernel distance. Thissen, Swierenga, de Weijer, Wehrens, Melssen, and Buydens
(2005) used a combination of mixture modeling, which separates the data into Gaussian clusters,
and statistical process control techniques to create a distribution-free multivariate control chart.
This method requires an IC reference sample to estimate the moments of the Gaussian clusters
and the fraction of observations in each cluster. Qiu (2008) proposed a distribution-free, log-
linear modeling-based approach to estimating the IC multivariate distribution, as well as a
distribution-free MCUSUM procedure for detecting location shifts in Phase II, but the
availability of a set of IC data is assumed. Fricker and Chang (2009a) used a Kolmogorov-
Smirnov test to compare the ranked kernel density estimates for a set of IC data and a set of the
most recent data points. This method is nonparametric but again requires the existence of a
multivariate reference sample.
1.3.3 Phase II Multivariate Rank-Based Charts
Nonparametric multivariate control charts have also been proposed using simplicial data
depth, which was first introduced by Liu (1990), as a dimension reduction technique. The idea
behind simplicial depth-based control charts is to use the simplicial depth of a given multivariate
point x within the data cloud formed by a multivariate reference sample $\{X_1, \ldots, X_n\}$ to produce
a univariate center-outward ranking of the data points. A precise definition of simplicial depth in
p dimensions will be presented in Chapter 2, but the simplicial depth of a bivariate point x is the
proportion of triangles formed by all possible triplets of points in $\{X_1, \ldots, X_n\}$ containing x.
Simplicial depth in higher dimensions follows the same logic.
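The bivariate description above can be sketched directly in Python. This is a brute-force illustration under stated assumptions: the reference points are hypothetical, they are assumed to be in general position (no three collinear), and a point on a triangle's boundary is counted as contained. The function names are not from Liu (1990).

```python
from itertools import combinations


def _orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_triangle(x, a, b, c):
    """True if x lies inside or on the boundary of the closed triangle abc."""
    d1 = _orientation(a, b, x)
    d2 = _orientation(b, c, x)
    d3 = _orientation(c, a, x)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_negative and has_positive)


def simplicial_depth_2d(x, reference):
    """Proportion of triangles from the reference sample that contain x."""
    triangles = list(combinations(reference, 3))
    hits = sum(in_triangle(x, a, b, c) for a, b, c in triangles)
    return hits / len(triangles)


# Hypothetical reference sample in general position.
reference = [(0, 0), (4, 0), (0, 4), (5, 5), (1, 2)]
print(simplicial_depth_2d((2, 1), reference))  # 0.3 (3 of the 10 triangles)
print(simplicial_depth_2d((6, 6), reference))  # 0.0 (outside the convex hull)
```

Enumerating all triplets costs on the order of n^3 operations, which hints at why the size of the reference sample and the dimension matter for the computational feasibility of simplicial depth.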
Liu's (1995) suggested procedure is to calculate the simplicial depth of a given
multivariate point, use the depth to create a control statistic reflecting the point's center-outward
ranking relative to an IC reference sample, plot the control statistic on a univariate control chart,
and finally compare the control statistic to control limits set to achieve a desired maximum IC
FAP. The resulting control charts, called r, Q, and S charts, are essentially $X$, $\bar{X}$, and cumulative
sum (CUSUM) charts respectively, using simplicial depth-based ranks instead of raw univariate
data to compute control statistics. Liu (1995) describes these charts as completely nonparametric
and able to simultaneously detect location and scale changes in a process. However, Stoumbos
and Jones (2000) showed that the 500-observation IC reference sample recommended for Liu's
(1995) charts was not large enough to achieve a satisfactory IC FAP for many process
distributions, thus limiting the method's potential for widespread implementation. Liu et al.
(2004) later introduced a simplicial data depth-based moving average (DDMA) control chart
which is described as having better ability than the r and Q charts to detect changes in location
while maintaining the same ability to detect changes in scale. The DDMA chart is also said to be
completely nonparametric but as with most nonparametric methods, if the process data follow a
multivariate normal distribution then a normal theory method (e.g. Hotelling's T2 chart) is
preferred. Notably, all results from this study are derived using an IC reference sample of 1000,
yet again raising the question of how one is to obtain such a large IC data set.
Other data depth-based nonparametric approaches to the Phase II multivariate quality
control problem have been developed, but they all assume a pre-existing IC reference sample.
Zarate (2004) used principal components analysis to reduce the dimensionality of a process, and
then employed a nonparametric control chart based on data depth to monitor some of the
principal components instead of the original variables. Beltran (2006) employed Liu's (1995) r
chart using the simplicial depth ranks of the first and last set of principal components.
Messaoud, Weihs, and Hering (2008) proposed a data depth-based, distribution-free EWMA
control chart for multivariate observations. This procedure consists of computing the
Mahalanobis or simplicial depth of a point with respect to the m most recent observations from a
process, converting each depth to a sequential rank among the m most recent observations, and
monitoring the standardized sequential ranks using the EWMA chart. The authors recommend
an IC reference sample of 100 or more points to initiate this method. For multivariate data
following an elliptical distribution, Hamurkaroglu, Mert, and Saykan (2004) developed a
nonparametric control chart which consists of computing the Mahalanobis depth of each point,
ranking each depth measurement with respect to a sample from an IC process, and then using r
and Q charts proposed by Liu (1995) to monitor the ranks. Once more, the Phase I problem of
identifying an IC reference sample must be solved before using any of these procedures.
1.4 Self-Starting Multivariate Control Charting Methods
Self-starting multivariate methods, in which successive observations are used to update
parameter estimates and check for OC conditions, have been suggested as a substitute for solving
the Phase I problem because they can be implemented at the very beginning of a process. These
methods are designed to reduce reliance on large and potentially costly Phase I samples required
by some multivariate control charting procedures. As noted by Sullivan and Jones (2002, p. 25),
they can be especially advantageous when production is slow, early OC production is expensive,
or there are insufficient samples available to estimate parameters.
One of the earliest attempts at a self-starting multivariate control chart is Quesenberry's
(1997) Q-chart, in which the author proposed computing a control chart statistic based on the
quadratic form of the deviation of the current observation vector from the estimated mean vector.
The control chart statistic is then transformed to a N(0, 1) scalar and monitored using a
univariate Shewhart-type control chart. Schaffer (1998) employed the same basic methodology
as Quesenberry (1997), but used a univariate EWMA scheme to monitor the resulting control
chart statistic. Both methods assume multivariate normally distributed process data.
Sullivan and Jones (2002) introduced a self-starting MEWMA chart, showing that it is
more effective than the methods of Quesenberry (1997) and Schaffer (1998) and has the added
advantage of robustness to nonnormality with an appropriate choice of smoothing constant.
Sullivan and Jones (2002) caution that because parameter estimates are updated with each new
observation, changes occurring near the beginning of a process can be unknowingly absorbed
into the parameter estimates, thus masking the shift. To guard against this, Sullivan and Jones
(2002) recommend augmenting their self-starting chart with a single retrospective analysis at a
suitable point in the process, with the exact timing dependent on the dimension as well as other
factors.
Zamba and Hawkins (2006) developed a multivariate change-point model which claims
to eliminate the requirement of a large Phase I sample. Their method analyzes standardized
differences between potential preshift and postshift observations to identify the point at which
the mean vector changes, but is only applicable to multivariate normal processes. Also, Zamba
and Hawkins' (2006) chart assumes that the mean vector remains constant after a single shift
occurs, so it is designed to detect a sustained shift of the mean only.
Hawkins and Maboudou-Tchao (2007) proposed a self-starting methodology which
transforms multivariate normal observations with unknown parameters into multivariate standard
normal observations which are then charted using the MEWMA chart or any other method
requiring known parameters, thus bypassing the difficult task of parameter estimation. However,
like most self-starting methods, this technique is susceptible to error resulting from early shifts in
the process. Although the authors argue their method eliminates the need for a Phase I - Phase II
distinction, they suggest that after the initial phase of data gathering, one should "start with the
most recent process reading and successively add and chart the earlier readings back to the start
of the sequence" [Hawkins and Maboudou-Tchao (2007, p. 206)] in order to diagnose undetected
shifts occurring earlier in the process.
These self-starting methods are viable alternatives under certain conditions.
Nevertheless, they have not diminished the need for a more universally applicable distribution-
free Phase I multivariate control chart procedure.
1.5 Phase I Multivariate Control Charting Methods
There exist a number of control chart methods developed for use in Phase I, though they
are mostly variations of Hotelling's T2 control chart based on the assumption of a multivariate
normally distributed process. In addition, the majority of them deal with individual as opposed
to subgrouped data. Hotelling's T2 control chart can be applied to individual data in Phase I
using control limits outlined by Tracy, Young, and Mason (1992). However, Sullivan and
Woodall (1996) showed that the usual practice of pooling all the individual observations to
estimate the covariance matrix for a T2 chart results in poor performance in detecting step
(sudden) and ramp (gradual) shifts in the mean vector. They instead proposed using the vector
differences between successive individual observations to estimate the IC covariance matrix for
the T2 statistic, and demonstrated that this method works better in detecting mean shifts but not
outliers.
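The contrast between the pooled estimator and the successive-difference estimator can be sketched as follows. The sketch assumes the common form of the successive-difference estimator, the sum of the outer products of the differences v_i = x_(i+1) - x_i divided by 2(m - 1). The data series and function names are hypothetical.

```python
def pooled_covariance(data):
    """Usual sample covariance matrix from pooling all m observations."""
    m, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / m for j in range(p)]
    return [[sum((row[j] - mean[j]) * (row[k] - mean[k]) for row in data) / (m - 1)
             for k in range(p)] for j in range(p)]


def successive_difference_covariance(data):
    """Covariance estimate from v_i = x_(i+1) - x_i, scaled by 1 / (2(m - 1))."""
    m, p = len(data), len(data[0])
    diffs = [[data[i + 1][j] - data[i][j] for j in range(p)] for i in range(m - 1)]
    return [[sum(v[j] * v[k] for v in diffs) / (2 * (m - 1))
             for k in range(p)] for j in range(p)]


# Hypothetical bivariate series with a step shift of +10 in the first variable
# halfway through; the variation within each regime is small.
series = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0),
          (10.0, 1.0), (11.0, 0.0), (10.0, -1.0), (9.0, 0.0)]

print(pooled_covariance(series)[0][0])
print(successive_difference_covariance(series)[0][0])
```

In this series the step shift enters every deviation from the overall mean but only one of the seven successive differences, so the pooled variance of the first variable is inflated far more than the successive-difference estimate.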
For processes consisting of either individual or subgrouped observations, Sullivan and
Woodall (1998) proposed modified MCUSUM and MEWMA charts using simulated control
limits to account for the correlation among control statistics as well as a regression-based method
with exact (not simulated) limits for detecting sustained shifts in the mean vector. Using
simulation, they showed that each of their three proposed methods is better at detecting small
shifts in the mean vector than Hotelling's T2 chart. Nedumaran and Pignatiello (2000) addressed
the issue of constructing T2 control chart limits for retrospective testing when the parameters of a
subgrouped multivariate normally distributed process are unknown. They described and
compared a computationally intensive method of determining the exact control limit, Bonferroni
adjustments to Alt's (1976) Phase I control limit, and Bonferroni adjustments to the standard $\chi^2$
limit, ultimately recommending Bonferroni adjustments to Alt's (1976) Phase I limit as the best
alternative.
Vargas (2003) proposed T2 control charts for Phase I analysis of individual multivariate
normally distributed data using robust estimators of location and dispersion instead of the usual
sample mean vector and sample covariance matrix. A total of five different estimators were
considered, including the minimum volume ellipsoid (MVE) estimators of Rousseeuw and Van
Zomeren (1990), first introduced by Rousseeuw (1984), and the minimum covariance
determinant (MCD) technique of Rousseeuw and Van Driessen (1999), also introduced by
Rousseeuw (1984). The MVE method finds the ellipsoid of minimum volume that covers a
specified minimum number of data points, and uses the geometrical center of the ellipsoid as the
location estimator and the matrix defining the ellipsoid itself (multiplied by a constant) as the
covariance matrix estimator. The MCD method finds the subset of data that has the smallest
covariance matrix determinant while covering a specified minimum number of points. It then
uses the sample mean vector and the sample covariance matrix (also multiplied by a constant) of
the points in the subset as estimators for location and dispersion. Vargas also considered a
trimming approach which removes a proportion of extreme values based on Mahalanobis
distance, Sullivan and Woodall's (1996) sample mean vector and covariance matrix estimated
from differences of successive observations, and an outlier detection algorithm proposed by
Sullivan and Woodall (1996). Based on simulation results, Vargas recommended using both a T2
control chart based on MVE estimators for detecting multiple outliers and the T2 control chart
suggested by Sullivan and Woodall (1996) to detect sustained shifts in the mean vector in Phase
I.
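The MCD idea described above can be illustrated by exhaustive search on a tiny bivariate sample. This sketch is not the algorithm of Rousseeuw and Van Driessen (1999). It enumerates every subset of size h, which is feasible only for very small samples, and it omits the multiplicative consistency constant mentioned in the text. The data and function names are hypothetical.

```python
from itertools import combinations


def mean_and_cov_2d(points):
    """Sample mean vector and 2x2 sample covariance matrix of bivariate points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def brute_force_mcd_2d(points, h):
    """Search every h-subset and keep the one with the smallest covariance determinant."""
    best = None
    for subset in combinations(points, h):
        mean, cov = mean_and_cov_2d(subset)
        det = cov[0][0] * cov[1][1] - cov[0][1] ** 2
        if best is None or det < best[0]:
            best = (det, mean, cov)
    return best[1], best[2]


# Hypothetical sample: six clustered points and two gross outliers.
data = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0), (2.0, 2.0),
        (1.5, 2.5), (30.0, 40.0), (35.0, 45.0)]
classical_mean, _ = mean_and_cov_2d(data)
mcd_mean, mcd_cov = brute_force_mcd_2d(data, h=6)
print(classical_mean, mcd_mean)
```

Here the classical mean is pulled toward the two outliers, whereas the MCD location estimate is the mean of the six clustered points.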
Jensen, Birch, and Woodall (2007) further detailed the advantages of using the MVE and
MCD methods in conjunction with T2 control charts for detecting outliers in individual
multivariate normally distributed data during Phase I. They determined that the MVE estimator
is best for smaller sample sizes and a smaller percentage of outliers, while the MCD estimator is
preferred for larger sample sizes or a larger percentage of outliers. The authors also provided
tables of simulated control limits for both estimators.
Other Phase I control charting efforts for multivariate normally distributed processes
include Alfaro and Ortega's (2008) proposal to trim each variable to obtain robust estimates for
the mean vector and covariance matrix, and then use those estimates in Hotelling's T2 chart with
Tracy et al.'s (1992) Phase I UCL to provide enhanced outlier detection. Jobe and Pokojovy
(2009) created a computationally intensive two-step method of identifying the largest bulk of
similar data from a time-ordered sequence of individual multivariate normally distributed points,
and used the estimated mean vector and covariance matrix from this bulk in the T2 statistic with
empirical control limits. The authors compared the performance of Hotelling's T2 chart using
their method, the classical method of parameter estimation, and the robust methods analyzed by
Vargas (2003) and Jensen et al. (2007), showing that their method results in improved
performance in detecting outliers as well as location shifts in Phase I. The authors attribute their
success to the fact that their method considers the time order of the data, whereas other methods
do not. Oyeyemi and Ipinyomi (2010) robustly estimated the covariance matrix for Hotelling's
T2 chart for individuals in Phase I by identifying a subset of data which meets specified
optimality criteria, and then iteratively expanding the subset to a predetermined size. Their
method was shown to outperform the MVE and MCD methods in a limited number of cases, but
only bivariate normally distributed samples of size m = 30 were considered. Most recently,
Yanez, Gonzalez, and Vargas (2010) proposed using biweight S estimators for location and
scatter in a T2 chart for individual multivariate normally distributed data with simulated limits,
showing that it outperforms Hotelling's T2 chart with MVE estimators for small samples.
Distribution-free and nonparametric Phase I methods, on the other hand, have received
little attention in multivariate quality control literature. The only chart found is Dai, Zhou, and
Wang's (2006a) unpublished halfspace (Tukey) data depth-based nonparametric MCUSUM
chart.
1.6 Developing a Distribution-Free Phase I Procedure -- A Univariate Example
Although unanswered in the multivariate domain, the challenge of developing a
distribution-free Phase I procedure has been addressed for the univariate case. The details of the
univariate Phase I solution are relevant to the multivariate Phase I problem because this research
will ultimately rely on a univariate chart to monitor control statistics resulting from dimension
reduction of a multivariate reference sample using data depth. The unique considerations
involved in developing a distribution-free Phase I procedure are best illustrated by an example.
Example 1.6.1
Consider a reference sample consisting of m = 25 independent subgroups, each
containing n = 5 observations from an unknown distribution. The widely used Shewhart $\bar{X}$ chart
with $3\sigma$ limits can be created using the procedure outlined in Montgomery (2005), under the
assumption that the distribution of subgroup averages is approximately normal due to the central
limit theorem. Since the IC parameters $\mu_o$ and $\sigma_o$ are unknown, the lower control limit (LCL),
center line (CL), and upper control limit (UCL) are estimated using
\[ \mathrm{LCL} = \hat{\mu}_o - 3\,\frac{\hat{\sigma}_o}{\sqrt{n}} \tag{1.6.1} \]
\[ \mathrm{CL} = \hat{\mu}_o \tag{1.6.2} \]
\[ \mathrm{UCL} = \hat{\mu}_o + 3\,\frac{\hat{\sigma}_o}{\sqrt{n}}, \tag{1.6.3} \]
where $\hat{\mu}_o$ and $\hat{\sigma}_o$ are unbiased estimators for $\mu_o$ and $\sigma_o$. Montgomery (2005, pp. 196-198)
discusses several choices for $\hat{\mu}_o$ and $\hat{\sigma}_o$.
Using Equations (1.6.1), (1.6.2), and (1.6.3), the initial Phase I control chart for this
example is illustrated in the top panel of Figure 1.6.1. Suppose that investigation of the potential
OC point represented by subgroup average number 11 reveals an assignable cause, so the point is
deemed OC. The revised control limits in the bottom panel of Figure 1.6.1 are narrower due
to the exclusion of subgroup 11, and all remaining subgroup averages now fall within the
updated control limits. The IC reference sample has been established, and the most recent
control limits can be used for Phase II monitoring.
Figure 1.6.1 Initial (Top Panel) and Revised (Bottom Panel) Control Charts
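The limit-then-revise cycle of Example 1.6.1 can be sketched in Python. The data below are hypothetical and are not the data behind Figure 1.6.1. The sketch assumes the grand mean as the location estimator and the average subgroup range divided by the tabulated constant d2 = 2.326 (for n = 5) as the scale estimator, which is one of the standard choices for Shewhart charts.

```python
import math

D2_N5 = 2.326  # tabulated d2 constant for subgroups of size n = 5


def xbar_limits(subgroups, d2=D2_N5):
    """Return (LCL, CL, UCL) using the grand mean and sigma-hat = R-bar / d2."""
    n = len(subgroups[0])
    means = [sum(g) / n for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = sum(means) / len(means)
    sigma_hat = (sum(ranges) / len(ranges)) / d2
    half_width = 3.0 * sigma_hat / math.sqrt(n)
    return center - half_width, center, center + half_width


def flagged(subgroups):
    """1-based positions of subgroups whose average falls outside the limits."""
    lcl, _, ucl = xbar_limits(subgroups)
    return [i for i, g in enumerate(subgroups, start=1)
            if not lcl <= sum(g) / len(g) <= ucl]


# Hypothetical data: m = 25 subgroups of n = 5, with subgroup 11 shifted upward.
pattern = [-1.0, -0.5, 0.0, 0.5, 1.0]
offsets = [[-0.2, 0.0, 0.2, 0.1, -0.1][i % 5] for i in range(25)]
offsets[10] = 3.0
sample = [[x + off for x in pattern] for off in offsets]

first_pass = flagged(sample)
revised = [g for i, g in enumerate(sample, start=1) if i not in first_pass]
print(first_pass, flagged(revised))  # [11] []
```

In this constructed sample every subgroup has the same range, so removing subgroup 11 moves only the center line; with real data the width of the limits generally changes as well.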
Determination of the overall IC FAP for the control chart in Example 1.6.1 would be
straightforward under conditions of normality of subgroup averages and known parameters. The
overall IC FAP or P(at least one false alarm among all m = 25 comparisons) would be calculated
as follows: $1 - (1 - 0.0027)^{25} = 0.0654$. The overall IC FAP, while considerably higher than
the individual FAP of 0.0027, could easily be lowered by using limits wider than $3\sigma$.
If, on the other hand, the underlying distribution of the subgroup averages is not normal,
the true overall IC FAP may be much larger. Suppose for example that the actual individual
FAP is 0.01. Then the overall IC FAP equals $1 - (1 - 0.01)^{25} = 0.2222$. With only a slight
increase in the individual IC FAP, the overall IC FAP increased dramatically. This could result
in a large number of IC subgroups being erroneously excluded during Phase I.
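The two overall IC FAP figures quoted above follow from the same formula, which holds only when the m comparisons are independent. A short Python sketch reproduces them and also inverts the formula to find the individual FAP needed for a target overall FAP; the function names and the 0.05 target are illustrative assumptions.

```python
def overall_fap(individual_fap, m):
    """P(at least one false alarm in m independent comparisons)."""
    return 1.0 - (1.0 - individual_fap) ** m


def individual_fap_for(overall, m):
    """Per-comparison FAP that yields the requested overall FAP under independence."""
    return 1.0 - (1.0 - overall) ** (1.0 / m)


print(round(overall_fap(0.0027, 25), 4))  # 0.0654
print(round(overall_fap(0.01, 25), 4))    # 0.2222
print(individual_fap_for(0.05, 25))       # individual FAP for an overall FAP of 0.05
```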
Furthermore, when the parameters are unknown as in Example 1.6.1, successive
comparisons of subgroup averages to control limits are dependent. Therefore, the overall IC
FAP cannot be determined as 1 minus the product of the complements of the m = 25
individual FAPs. Instead, control limits designed to achieve a specified overall IC FAP must be
determined using the joint density function or the simulated empirical distribution of the
subgroup averages.
Champ and Jones (2004) dealt with the case of a normally distributed process and
unknown parameters by using the (joint) multivariate t distribution of the m control statistics to
define control limits to achieve a desired overall IC FAP. For processes in which normality
cannot be established and parameters are unknown, Jones-Farmer, Jordan, and Champ (2009)
proposed a rank-based Phase I location chart which is essentially a Shewhart chart of
standardized subgroup mean ranks. This method uses approximate multivariate normal theory
control limits (for large subgroup sizes n) and simulated control limits (for smaller subgroup
sizes n) to achieve a specified overall IC FAP.
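The idea of charting standardized subgroup mean ranks can be sketched as follows. The standardization shown uses the mean (N + 1)/2 and variance (N - n)(N + 1)/(12n) of the average of n ranks drawn without replacement from 1, ..., N, where N = mn. This form is assumed here to correspond to the cited chart and should be checked against Jones-Farmer et al. (2009). The data are hypothetical and ties are assumed absent.

```python
import math


def standardized_mean_ranks(subgroups):
    """Standardized mean rank of each subgroup after pooling all N = m*n observations."""
    n = len(subgroups[0])
    pooled = sorted(x for g in subgroups for x in g)
    total = len(pooled)
    rank_of = {value: r for r, value in enumerate(pooled, start=1)}  # assumes no ties
    expected = (total + 1) / 2.0
    variance = (total - n) * (total + 1) / (12.0 * n)
    return [(sum(rank_of[x] for x in g) / n - expected) / math.sqrt(variance)
            for g in subgroups]


# Hypothetical sample: m = 4 subgroups of n = 3; the last subgroup is shifted upward.
sample = [[1.2, 3.4, 2.2], [2.9, 1.7, 3.1], [2.5, 1.9, 3.3], [7.8, 8.1, 9.4]]
print(standardized_mean_ranks(sample))
```

Because the standardized mean ranks always sum to zero, they are dependent by construction, which is why the control limits must be set jointly to achieve a specified overall IC FAP.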
The issues of data nonnormality and dependence among control statistics are problematic
for any parametric Phase II control charting method used in Phase I, including multivariate
procedures. These are precisely the problems this research seeks to address by developing a
distribution-free method of establishing an IC reference sample for a multivariate process
consisting of subgrouped data.
1.7 Special Considerations in Multivariate Quality Control
There are two drawbacks to multivariate quality control that must be kept in mind in any
research effort. The first is computational complexity. Multivariate control charting methods
are inherently more computationally intensive than univariate methods. Despite advances in
quality control software, complex methods can quickly become unmanageable as the dimension
of the data increases. Researchers must guard against developing methods that work only for two
or three variables, or that are too complex for practitioners to use.
The second downside to multivariate quality control is the issue of interpretation.
Multivariate control chart techniques do not directly identify which variable(s) caused an OC
signal. As previously discussed, it is insufficient to simply separate and individually chart each
variable belonging to an OC multivariate process, because correlated variables may behave
differently alone than when in combination with each other. As a result, many useful approaches
to interpreting OC signals in a multivariate setting have been proposed, and a summary of such
works is provided by Bersimis et al. (2007) in an overview of multivariate statistical process
control charts. While this problem will not be specifically addressed by this research, it should
be considered when developing a new procedure.
1.8 Organization of Dissertation
The remainder of this document is dedicated to the detailed development and application
of a data depth-based, distribution-free Phase I multivariate control charting method for detecting
location changes in subgrouped data. In Chapter 2, data depth is explored as a distribution-free
method of reducing multi-dimensional data to univariate ranks, and the advantages and
disadvantages of several depth functions considered for implementation are discussed. Chapter 3
addresses the actual design of the data depth-based, distribution-free Phase I control chart for
subgrouped multivariate data. In Chapter 4, the simulation-based performance assessment plan
for the proposed method is discussed, and detailed algorithms for measuring performance under
various location shifts in normal, heavy-tailed, and skewed distributions are provided. Chapter 5
contains the results of extensive simulation runs comparing the proposed data depth-based,
distribution-free Phase I multivariate method to Hotelling's T2 chart with Phase I UCL. Chapter
6 is dedicated to a comprehensive application of the proposed data depth-based, distribution-free
Phase I multivariate method to a simulated historical data set containing several location shifts.
This dissertation concludes in Chapter 7 with a synopsis of research conducted,
recommendations for Phase I analysis when dealing with subgrouped multivariate data under
conditions of normality and nonnormality, recommendations for subsequent Phase II monitoring,
and discussion of areas in need of further investigation.
2 Measuring Centrality of Multivariate Data Using Data Depth
2.1 Fundamentals of Data Depth
A data depth measures how deep (or central) a point $x \in \mathbb{R}^p$ is with respect to a certain
probability distribution F or a given data cloud $\mathbf{X}_n = \{X_1, \ldots, X_n\}$ in $\mathbb{R}^p$. A data depth is
computed by applying one of many known data depth functions to a multivariate data point, thus
reducing it from a p-vector to a univariate depth value. Assuming unimodality of the data, a
large depth value indicates centrality and a low depth value suggests outlyingness of a given
point. Depth values are usually normalized to have a range of [0, 1]. The point of maximal
depth is considered the center of the data and is referred to as the multivariate median. A data
depth function may be visualized in p-dimensional space as a series of nested contours around
the multivariate median, where each contour represents the set of p-dimensional points with
equal depth values. Some depth functions force contours of a particular geometric form (e.g.
elliptical), whereas others allow contours to follow the actual geometric shape of the data.
Data depth facilitates the extension of order statistics to higher dimensions, because depth
values can be ranked from largest to smallest to produce a center-outward ordering of the data.
The ordered depth values can then be used to detect outliers, which are known in multivariate
quality control literature as OC points. Data depth allows multivariate data from any distribution
to be characterized by the relative position of the data points rather than parameters estimated
from the actual data values. This rank-based perspective makes data depth potentially very
useful as a distribution-free method of multivariate analysis.
The concept of data depth dates as far back as Tukey (1975), but until recently its
usefulness for statistical quality control has been limited by the tradeoff between statistical
properties, robustness to nonnormality, and computational complexity. After a comprehensive
review of numerous existing depth functions, this research will implement robust Mahalanobis
depth and Mahalanobis spatial depth because they are computationally feasible in any
dimension, sufficiently robust to outliers under the assumptions of this research, and satisfy the
four desirable properties of data depth functions discussed by Zuo and Serfling (2000).
2.2 Desirable Properties of Data Depth Functions
For a depth function $D(x; F)$ to serve most effectively as an analytical tool, the
following four properties are required [Liu (1990), Zuo and Serfling (2000)]. Denote the class of
probability distributions on $\mathbb{R}^p$ by $\mathcal{F}$.
• Property 1: Affine invariance. The depth of a point $x \in \mathbb{R}^p$ should not depend on the
underlying coordinate system or, in particular, on the scales of the underlying
measurements. This ensures that a point classified as an outlier or nonoutlier in one
coordinate system is similarly classified in another coordinate system resulting from an
affine transformation. Formally stated, $D(Ax + b; F_{AX+b}) = D(x; F_X)$ holds for any
random vector X in $\mathbb{R}^p$, any p x p nonsingular matrix A, and any p-vector b.
• Property 2: Maximality at center. For a distribution having a uniquely defined "center"
(e.g., the point of symmetry with respect to some notion of symmetry), the depth function
should attain maximum value at this center. This supports an accurate center-outward
ordering of the data points. Formally stated, $D(\theta; F) = \sup_{x \in \mathbb{R}^p} D(x; F)$ holds for any
$F \in \mathcal{F}$ having center $\theta$, where $\mathcal{F}$ is the class of distributions on the Borel sets of $\mathbb{R}^p$.
• Property 3: Monotonicity relative to deepest point. As a point $x \in \mathbb{R}^p$ moves away from
the "deepest point" (the point at which the depth function attains maximum value; in
particular, for a symmetric distribution, the center) along any fixed ray through the
center, the depth at x should decrease monotonically. This also supports an accurate
center-outward ordering of the data points. Formally stated, for any $F \in \mathcal{F}$ having
deepest point $\theta$, $D(x; F) \le D(\theta + \alpha(x - \theta); F)$ holds for $\alpha \in [0, 1]$.
• Property 4: Vanishing at infinity. The depth of a point x should approach zero as $\|x\|$
approaches infinity, where $\|x\|$ is the Euclidean norm of x. This ensures the data depth
function is both bounded and nonnegative. Formally stated, $D(x; F) \to 0$ as $\|x\| \to \infty$,
for each $F \in \mathcal{F}$.
According to Zuo and Serfling (2000), depth functions which satisfy these four properties are
particularly well suited for nonparametric multivariate inference, so these properties will serve as
a useful basis for describing the data depth functions selected for implementation in this
research.
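Properties 1 and 2 can be checked numerically for a simple depth function. The sketch below uses the classical Mahalanobis depth, 1 / (1 + d^2), built on the ordinary sample mean vector and sample covariance matrix. It is not the robust Mahalanobis depth adopted later in this research; the bivariate data, the matrix A, and the vector b are hypothetical.

```python
def mahalanobis_depth_2d(x, sample):
    """Classical Mahalanobis depth 1 / (1 + d^2) with sample mean and covariance."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Quadratic form (x - mean)' S^{-1} (x - mean) via the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


def affine(point, a, b):
    """Apply the map x -> A x + b to a bivariate point."""
    return (a[0][0] * point[0] + a[0][1] * point[1] + b[0],
            a[1][0] * point[0] + a[1][1] * point[1] + b[1])


sample = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 5.0), (0.0, 1.0)]
x = (3.0, 4.0)
A, b = ((2.0, 1.0), (0.0, 3.0)), (5.0, -4.0)  # any nonsingular A and shift b

depth_before = mahalanobis_depth_2d(x, sample)
depth_after = mahalanobis_depth_2d(affine(x, A, b), [affine(p, A, b) for p in sample])
print(depth_before, depth_after)
```

The two depths agree up to floating-point error (Property 1), and the depth evaluated at the sample mean equals its maximum value of 1 (Property 2). The invariance relies on the sample mean and covariance being affine equivariant, which is why the same requirement is placed on any robust estimators substituted for them.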
A depth function may be viewed as a location estimator, and as such may be
characterized by its finite-sample replacement breakdown point (RBP). First defined by Donoho
and Huber (1983), the RBP is the minimum fraction of a sample which must be replaced by
outliers in order to completely ruin an estimate, so a low RBP indicates nonrobustness and a high
RBP signifies robustness to outliers. When used to describe a depth function, the RBP is usually
stated in reference to the multivariate median estimated by a depth function. The RBP of the
multivariate median is important because if the center of the data (as determined by the
multivariate median) is significantly affected by outliers, the subsequent center-outward ordering
will likewise be affected and outliers may be masked.
Whether a depth function has a high or low RBP is often determined by the robustness of
any location or scatter estimators used in its construction. The robustness of such location or
scatter measures is also described using the RBP. Precise definitions of RBPs for both location
and scatter estimators are adapted from Donoho and Huber (1983) and Lopuhaa and Rousseeuw
(1991). Let $\mathbf{X}_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ be a random sample of size n in $\mathbb{R}^p$. The RBP of a location
estimator T at $\mathbf{X}_n$, or the smallest fraction k/n of outliers which can take the resulting estimate
beyond any bound, is defined as
\[
RBP(T; \mathbf{X}_n) = \min\left\{ \frac{k}{n} : \sup_{\mathbf{X}_{n,k}} \left\| T(\mathbf{X}_n) - T(\mathbf{X}_{n,k}) \right\| = \infty \right\}, \tag{2.2.1}
\]
where $\mathbf{X}_{n,k}$ is a contaminated sample found by replacing k points of $\mathbf{X}_n$ with arbitrary values.
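The RBP of a scatter estimator C at $\mathbf{X}_n$, or the smallest fraction k/n of outliers which can drive
the largest eigenvalue of the resulting estimate to infinity or the smallest eigenvalue of the
resulting estimate to zero, is defined as
\[
RBP(C; \mathbf{X}_n) = \min\left\{ \frac{k}{n} : \sup_{\mathbf{X}_{n,k}} M\left( C(\mathbf{X}_n), C(\mathbf{X}_{n,k}) \right) = \infty \right\}, \tag{2.2.2}
\]
where $\mathbf{X}_{n,k}$ is defined as before,
\[
M(\mathbf{A}, \mathbf{B}) = \max\left\{ \left| \lambda_1(\mathbf{A}) - \lambda_1(\mathbf{B}) \right|, \left| \lambda_p(\mathbf{A})^{-1} - \lambda_p(\mathbf{B})^{-1} \right| \right\},
\]
and $\lambda_1(\mathbf{A}) \ge \cdots \ge \lambda_p(\mathbf{A})$ are the ordered eigenvalues of the matrix A.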
To illustrate the idea of an RBP, consider a sample of size n in $\mathbb{R}^1$ and two common
location estimators: the sample mean and the sample median. The sample mean has an RBP of
only 1/n because a single outlier could move the sample mean to infinity, so it is considered a
nonrobust location estimator. In contrast, the sample median has the highest possible RBP of 1/2
because 1/2 of the sample would have to be contaminated with outliers in order to effect a
corresponding shift in the sample median. Consequently, the sample median is the preferred
location estimator in $\mathbb{R}^1$ from a robustness standpoint.
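The contrast between the two estimators can be checked numerically. The following Python sketch is illustrative only: the data values, the outlier magnitude, and the helper names `contaminate` and `smallest_breaking_k` are assumptions introduced here, and a single large finite outlier stands in for the supremum in Equation (2.2.1).

```python
from statistics import mean, median

def contaminate(sample, k, outlier):
    """Replace the first k points of the sample with an arbitrary outlier value."""
    return [outlier] * k + list(sample[k:])

def smallest_breaking_k(estimator, sample, outlier=1e9, bound=1e6):
    """Smallest k for which k replaced points move the estimate by more than `bound`."""
    clean_estimate = estimator(sample)
    for k in range(1, len(sample) + 1):
        shifted = estimator(contaminate(sample, k, outlier))
        if abs(shifted - clean_estimate) > bound:
            return k
    return None

# Hypothetical univariate sample of size n = 7
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7]
k_mean = smallest_breaking_k(mean, sample)      # a single outlier is enough
k_median = smallest_breaking_k(median, sample)  # a majority of the points is needed
print(k_mean / len(sample), k_median / len(sample))
```

For this sample of size 7 the mean is ruined by one replaced point (fraction 1/7), whereas the median holds until four points are replaced (fraction 4/7), a fraction that approaches 1/2 as n grows.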
In addition to having a high RBP, any location or scatter estimator used in conjunction
with a data depth function should also be affine equivariant. From Lopuhaa and Rousseeuw
(1991), a location estimator T is affine equivariant if $T(\mathbf{A}\mathbf{X} + \mathbf{b}) = \mathbf{A}\,T(\mathbf{X}) + \mathbf{b}$ for any p-vector
b and any p x p nonsingular matrix A, and a positive definite scatter estimator C is said to be
affine equivariant if $C(\mathbf{A}\mathbf{X} + \mathbf{b}) = \mathbf{A}\,C(\mathbf{X})\,\mathbf{A}^T$ for any p-vector b and any p x p nonsingular
matrix A. Akin to the concept of affine invariance for data depth functions, affine equivariance
means that an estimator transforms in step with the data, so conclusions drawn from it do not
depend on the location, scale, or orientation of the coordinate system.
According to Lopuhaa and Rousseeuw (1991), finding affine equivariant estimators with high
RBPs is a challenging problem. However, these properties are of paramount importance to any
multivariate quality control application, so only estimators possessing these properties will be
considered in this research.
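As a concrete check of these two definitions, the Python sketch below verifies numerically that the classical sample mean vector and sample covariance matrix are affine equivariant. The data, the matrix A, the vector b, and all function names are hypothetical values chosen for illustration; the classical estimators are used only because they are easy to compute, not because they are robust.

```python
def mean_vector(X):
    n, p = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(p)]

def covariance(X):
    """Classical sample covariance matrix with divisor n - 1."""
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    return [[sum((row[a] - m[a]) * (row[b] - m[b]) for row in X) / (n - 1)
             for b in range(p)] for a in range(p)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def affine(X, A, b):
    """Apply x -> A x + b to every observation (each row of X is one point)."""
    return [[sum(A[i][k] * x[k] for k in range(len(x))) + b[i]
             for i in range(len(A))] for x in X]

X = [[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [3.0, 7.0], [6.0, 4.0]]  # hypothetical data
A = [[2.0, 1.0], [0.0, 3.0]]   # hypothetical nonsingular matrix
b = [5.0, -1.0]                # hypothetical shift vector

Y = affine(X, A, b)
m_X = mean_vector(X)
lhs_mean = mean_vector(Y)                                   # T(AX + b)
rhs_mean = [sum(A[i][k] * m_X[k] for k in range(2)) + b[i]  # A T(X) + b
            for i in range(2)]
lhs_cov = covariance(Y)                                     # C(AX + b)
rhs_cov = matmul(matmul(A, covariance(X)), transpose(A))    # A C(X) A^T

mean_gap = max(abs(u - v) for u, v in zip(lhs_mean, rhs_mean))
cov_gap = max(abs(lhs_cov[i][j] - rhs_cov[i][j]) for i in range(2) for j in range(2))
print(mean_gap, cov_gap)   # both gaps are zero up to floating-point rounding
```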
2.3 Robust Mahalanobis Depth
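The Mahalanobis depth (MHD) of a point x in $\mathbb{R}^p$ with respect to a distribution F in $\mathbb{R}^p$
is defined as
\[
MHD(\mathbf{x}; F) = \left[ 1 + d^2_{\boldsymbol{\Sigma}(F)}\left( \mathbf{x}, \boldsymbol{\mu}(F) \right) \right]^{-1}, \tag{2.3.1}
\]
where $\boldsymbol{\mu}(F)$ and $\boldsymbol{\Sigma}(F)$ are location and covariance measures defined on F and
$d^2_{\mathbf{M}}(\mathbf{x}, \mathbf{y}) = (\mathbf{x} - \mathbf{y})\,\mathbf{M}^{-1}(\mathbf{x} - \mathbf{y})'$ is the squared Mahalanobis distance [Mahalanobis (1936)] between two
points x and y in $\mathbb{R}^p$ with respect to a positive definite p x p matrix M. When the distribution F
is unknown and a random sample $\mathbf{X}_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ is used to estimate $\boldsymbol{\mu}(F)$ and $\boldsymbol{\Sigma}(F)$, the
sample version of the depth function is annotated as $MHD(\mathbf{x}; F_n)$, where $F_n$ denotes the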
empirical distribution function of the sample. MATLAB code for computing Mahalanobis
depth, based on a modification of S. Mazumder's (personal communication, July 7, 2010)
algorithm, is provided in Appendix A.
The Mahalanobis depth function satisfies the four desirable properties listed by Zuo and
Serfling (2000) and is relatively easy to compute, but assumes the underlying distribution F is
elliptical and therefore produces elliptical contours of equal depth. In addition, as noted by Zuo
and Serfling (2000), the RBP of the median determined by the Mahalanobis depth function is
completely dependent on the choice of location and covariance measures $\boldsymbol{\mu}(F)$ and $\boldsymbol{\Sigma}(F)$. If
the classical location and covariance estimators $\bar{\mathbf{X}}_n$ and $\mathbf{S}_n$ are used, the Mahalanobis depth
function is nonrobust. The presence of even a single outlier can contaminate the estimators $\bar{\mathbf{X}}_n$
and $\mathbf{S}_n$, possibly masking the presence of outliers. In order to preclude this, Mahalanobis depth
should be used in conjunction with robust estimators.
Mahalanobis depth will be referred to as robust Mahalanobis depth (RMD) when used
with robust location and scatter estimators. There are numerous robust estimation methods from
which to choose. Dang and Serfling (2010) noted that the computationally complex MCD
method proposed by Rousseeuw (1984) or the more efficient Fast-MCD method of Rousseeuw
and Van Driessen (1999) could be used to produce affine equivariant, robust location and
covariance estimates. As discussed in Chapter 1, the MCD method finds the subset of data that
has the smallest covariance matrix determinant while covering a user-specified number of points.
It then uses the sample mean vector and the sample covariance matrix of the points in the subset
as estimators for location and dispersion. According to Jensen et al. (2007), MCD estimators
have a maximum RBP of $\lfloor (n - p + 1)/2 \rfloor / n$, which is approximately 1/2 for reasonable values
of n and p, when the number of points used is equal to the integer value of $(n + p + 1)/2$. The
Fast-MCD program is available in many statistical software packages such as R, S-PLUS, and
SAS. In addition, a library of MATLAB codes for robust analysis including the Fast-MCD
program may be obtained from the LIBRA website at
http://wis.kuleuven.be/stat/robust/Libra.html.
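To give a sense of scale for the MCD breakdown point quoted above, the short Python sketch below evaluates the subset size and the maximum RBP for two (n, p) combinations. The function names and the chosen values of n and p are assumptions made for illustration; the formulas are those attributed to Jensen et al. (2007) in the preceding paragraph.

```python
def mcd_subset_size(n, p):
    """Number of points covered by the MCD subset: integer value of (n + p + 1) / 2."""
    return (n + p + 1) // 2

def mcd_max_rbp(n, p):
    """Maximum replacement breakdown point: floor((n - p + 1) / 2) / n."""
    return ((n - p + 1) // 2) / n

# Hypothetical sample sizes and dimensions
for n, p in [(50, 2), (100, 5)]:
    print(n, p, mcd_subset_size(n, p), mcd_max_rbp(n, p))
```

Both cases give a maximum RBP of 0.48, which is consistent with the statement that the bound is approximately 1/2 when n is large relative to p.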
Another alternative for finding robust estimators of location and scatter is the blocked
adaptive computationally efficient outlier nominators (BACON) method of Billor, Hadi, and
Velleman (2000). The BACON method is very computationally efficient, even for extremely
large data sets. It begins with a small outlier-free subset of the data, and then allows this subset
to grow rapidly until a stopping criterion is reached. Two versions of this iterative forward
selection method are available: Version 2 which is nearly affine equivariant and has RBPs
exceeding 40% for various combinations of dimension and sample size, and Version 1 which is
completely affine equivariant with RBP of approximately 20%. The Type I error probability ($\alpha$)
for the BACON method can be set to any number between 0 and 1, but $\alpha$ = 0.05 is suggested for
most applications. MATLAB code for the BACON method is available from the authors.
After several rounds of experimentation, it was decided to use the BACON method (with
$\alpha$ = 0.10) to estimate the process mean vector and $\bar{\mathbf{S}} = \frac{1}{m}\sum_{i=1}^{m} \mathbf{S}_i$, the scatter estimator for
Hotelling's $T^2$ chart when data are divided into m subgroups, to estimate the process covariance
matrix. The BACON method was chosen as the location estimator because of its excellent
balance between computational efficiency and robustness. Although $\bar{\mathbf{S}}$ is generally not
considered a robust estimator, it was chosen as the scatter estimator because it is highly robust to
location shifts (the focus of this research) when process data possess a common within-subgroup
covariance structure. Details are provided in Chapter 5.
Example 2.3.1
To illustrate an application of the robust Mahalanobis depth function, consider the
bivariate random sample $\mathbf{X}_5 = \{\mathbf{X}_1, \ldots, \mathbf{X}_5\}$ from an unknown distribution, where each
$\mathbf{X}_i = (X_1, X_2)$, i = 1, ..., 5, illustrated in Figure 2.3.1.

  i          Xi
  1    11.15   49.63
  2     7.91   36.46
  3     5.42   28.06
  4    16.22   38.77
  5     8.09   29.21

Figure 2.3.1 Bivariate Random Sample (scatter plot of X2 versus X1 showing the sample data and the BACON mean at (8.14, 35.84))

The first step in computing RMD for this
sample is to estimate the mean vector using the BACON method (with $\alpha$ = 0.10) and the
covariance matrix using Hotelling's $T^2$ scatter estimator for subgrouped data. Because this
example involves individual as opposed to subgrouped observations, Hotelling's $T^2$ scatter
estimator for subgrouped data reduces to the classical nonrobust sample covariance matrix.
Under these conditions, the robust BACON scatter estimator may be a better choice, but
Hotelling's $T^2$ scatter estimator is used to maintain consistency with the methodology employed
throughout the remainder of this research. Estimates of location and scatter are determined to be:
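\[
\bar{\mathbf{X}}_{BACON} = \begin{bmatrix} 8.14 & 35.84 \end{bmatrix}, \qquad
\mathbf{S}_{HT^2} = \begin{bmatrix} 17.18 & 20.45 \\ 20.45 & 75.48 \end{bmatrix}, \qquad
\mathbf{S}^{-1}_{HT^2} = \begin{bmatrix} 0.09 & -0.02 \\ -0.02 & 0.02 \end{bmatrix}.
\]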
Note that the BACON method excluded $\mathbf{X}_4 = (16.22, 38.77)$ from the estimated mean vector
due to its outlyingness relative to the other points.
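Using the RMD function,
\[
RMD(\mathbf{x}; F_n) = \left[ 1 + (\mathbf{x} - \bar{\mathbf{X}}_{robust})\, \mathbf{S}^{-1}_{robust}\, (\mathbf{x} - \bar{\mathbf{X}}_{robust})' \right]^{-1},
\]
and the location and scatter estimates, the robust Mahalanobis depth for $\mathbf{X}_1 = (11.15, 49.63)$ is computed
as follows:
\[
\begin{aligned}
RMD(\mathbf{X}_1; F_5)
&= \left[ 1 + (\mathbf{X}_1 - \bar{\mathbf{X}}_{BACON})\, \mathbf{S}^{-1}_{HT^2}\, (\mathbf{X}_1 - \bar{\mathbf{X}}_{BACON})' \right]^{-1} \\
&= \left[ 1 + \left( \begin{bmatrix} 11.15 & 49.63 \end{bmatrix} - \begin{bmatrix} 8.14 & 35.84 \end{bmatrix} \right)
   \begin{bmatrix} 0.09 & -0.02 \\ -0.02 & 0.02 \end{bmatrix}
   \left( \begin{bmatrix} 11.15 & 49.63 \end{bmatrix} - \begin{bmatrix} 8.14 & 35.84 \end{bmatrix} \right)' \right]^{-1} \\
&= \left[ 1 + \begin{bmatrix} 3.01 & 13.79 \end{bmatrix}
   \begin{bmatrix} 0.09 & -0.02 \\ -0.02 & 0.02 \end{bmatrix}
   \begin{bmatrix} 3.01 & 13.79 \end{bmatrix}' \right]^{-1} \\
&= \left[ 1 + 2.56 \right]^{-1} \\
&= 0.28.
\end{aligned}
\]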
RMD computations for the four remaining observations in the sample proceed in the same
manner.
The final results, along with corresponding rankings, are provided in Table 2.3.1. As
expected, $\mathbf{X}_2$ attains the highest depth value since it is closest to the center of the data set (as
defined by the BACON mean vector), and $\mathbf{X}_4$ receives the lowest depth value since it is most
outlying.

  i          Xi           RMD(Xi; F5)   rank
  1    11.15   49.63         0.28         4
  2     7.91   36.46         0.98         1
  3     5.42   28.06         0.55         2
  4    16.22   38.77         0.18         5
  5     8.09   29.21         0.54         3

Table 2.3.1 Data Ranked According to RMD
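The computations in Example 2.3.1 can be retraced with a few lines of code. The Python sketch below is not the MATLAB routine referenced in Appendix A; it is a minimal illustration restricted to the bivariate case, with function names chosen here for convenience. The BACON algorithm itself is not implemented: as an explicit assumption, its reported outcome (exclusion of the fourth observation) is imposed directly when the mean vector is computed.

```python
# Sample from Example 2.3.1 (each tuple is one bivariate observation)
X = [(11.15, 49.63), (7.91, 36.46), (5.42, 28.06), (16.22, 38.77), (8.09, 29.21)]

def mean_vector(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sample_covariance(points):
    """Classical 2 x 2 sample covariance matrix (divisor n - 1)."""
    n = len(points)
    m = mean_vector(points)
    s11 = sum((p[0] - m[0]) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m[1]) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points) / (n - 1)
    return ((s11, s12), (s12, s22))

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] / det, -M[0][1] / det), (-M[1][0] / det, M[0][0] / det))

def mahalanobis_depth(x, center, S_inv):
    """Depth = 1 / (1 + squared Mahalanobis distance from x to the center)."""
    d0, d1 = x[0] - center[0], x[1] - center[1]
    d2 = (d0 * (S_inv[0][0] * d0 + S_inv[0][1] * d1)
          + d1 * (S_inv[1][0] * d0 + S_inv[1][1] * d1))
    return 1.0 / (1.0 + d2)

# Assumption: BACON is not run here; its reported result (X4 excluded) is imposed
# by averaging the remaining four observations.
center = mean_vector([X[i] for i in (0, 1, 2, 4)])
S_inv = inverse_2x2(sample_covariance(X))

depths = [mahalanobis_depth(x, center, S_inv) for x in X]
order = sorted(range(len(X)), key=lambda i: -depths[i])
ranks = [order.index(i) + 1 for i in range(len(X))]   # rank 1 = deepest point
print([round(d, 2) for d in depths], ranks)
```

Under these assumptions the depth values agree with Table 2.3.1 to within rounding, and the resulting ranks match those in the table.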
2.4 Mahalanobis Spatial Depth
Mahalanobis spatial depth (MSD) [Dang and Serfling (2010)] is an attractive alternative
to robust Mahalanobis depth because it is only slightly more difficult to compute yet is not
restricted to elliptical distributions. This means that the contours of equal depth determined by
the depth function conform to the geometric structure and shape of the data, as opposed to being
constrained to an elliptical form. Mahalanobis spatial depth is based on the concept of spatial
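depth (SPD), defined by Vardi and Zhang (2000) for a point x in $\mathbb{R}^p$ with respect to a
distribution F in $\mathbb{R}^p$ as
\[
SPD(\mathbf{x}; F) = 1 - \left\| E\, \mathbf{S}(\mathbf{x} - \mathbf{X}) \right\|, \quad \text{where} \quad
\mathbf{S}(\mathbf{x}) = \begin{cases} \mathbf{x} / \|\mathbf{x}\| & \text{if } \mathbf{x} \ne \mathbf{0}, \\ \mathbf{0} & \text{if } \mathbf{x} = \mathbf{0}. \end{cases} \tag{2.4.1}
\]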
Intuitively, the spatial depth of a multivariate point x is equal to one minus the length of the
average of the unit vectors from x to all observations in the sample. Spatial depth is graphically illustrated in
Figure 2.4.1.
Figure 2.4.1 Illustration of Spatial Depth
The spatial depth function is quickly computable in any dimension, and its multivariate
median has a very favorable RBP of 1/2 [Vardi and Zhang (2000)]. It also satisfies the
properties of maximality at center (with some exceptions; see Zuo and Serfling (2000) for
details), monotonicity relative to deepest point, and vanishing at infinity. However, it is not
completely affine invariant. According to Serfling (2002), the spatial depth function is invariant
with respect to shift, orthogonal, and homogeneous scale transformations of the data, but not
heterogeneous scale transformations. This is sufficient if all variables share the same unit of
measure, but this is not always the case in a multivariate quality control application so a
modification of the spatial depth function is needed.
Serfling (2010) showed that a fully affine invariant modification of the spatial depth
function may be accomplished by standardizing the sample data using any weak covariance
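functional, which is defined as follows [Serfling (2010, p. 9)]:
"A symmetric positive definite p x p matrix-valued functional $\mathbf{C}(F)$ is called a
weak covariance functional if, for Y = AX + b with any nonsingular p x p matrix
A and any vector b, $\mathbf{C}(F_{\mathbf{Y}}) = k_1 \mathbf{A}\, \mathbf{C}(F_{\mathbf{X}})\, \mathbf{A}'$, with $k_1 = k_1(\mathbf{A}, \mathbf{b}, F_{\mathbf{X}})$ a positive
scalar function of A, b, and $F_{\mathbf{X}}$. The sample version for a data set
$\mathbf{X}_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ in $\mathbb{R}^p$ may be expressed, with $\mathbf{Y}_n = \mathbf{A}\mathbf{X}_n + \mathbf{b}$ and
$k_1 = k_1(\mathbf{A}, \mathbf{b}, \mathbf{X}_n)$, as $\mathbf{C}_n(\mathbf{Y}_n) = k_1 \mathbf{A}\, \mathbf{C}_n(\mathbf{X}_n)\, \mathbf{A}'$."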
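Application of a weak covariance functional transformation leads to Serfling's (2010) formula
for computation of Mahalanobis spatial depth (MSD) for a point x in $\mathbb{R}^p$ with respect to a
distribution F in $\mathbb{R}^p$:
\[
MSD(\mathbf{x}; F) = 1 - \left\| E\, \mathbf{S}\left( \mathbf{C}(F)^{-1/2} (\mathbf{x} - \mathbf{X}) \right) \right\|. \tag{2.4.2}
\]
The sample version for a point x with respect to a random sample $\mathbf{X}_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ in
$\mathbb{R}^p$ is
\[
MSD(\mathbf{x}; F_n) = 1 - \left\| E_{F_n}\, \mathbf{S}\left( \mathbf{C}(F_n)^{-1/2} (\mathbf{x} - \mathbf{X}) \right) \right\|. \tag{2.4.3}
\]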
There are a number of options available for determining the sample weak covariance functional
$\mathbf{C}_n(\mathbf{X}_n)$, but again $\bar{\mathbf{S}} = \frac{1}{m}\sum_{i=1}^{m} \mathbf{S}_i$, the scatter estimator for Hotelling's $T^2$ chart when data are
divided into m subgroups, will be used in this research because of its robustness to location shifts
under the assumption of constant within-subgroup covariance. MATLAB code for computing
Mahalanobis spatial depth, based on a modification of S. Mazumder's (personal communication,
July 7, 2010) algorithm, is provided in Appendix B.
Example 2.3.1 will be revisited to illustrate an application of the Mahalanobis spatial
depth function. Computing MSD begins by multiplying the data set by the inverse square root
of the weak covariance functional $\mathbf{C}_5(\mathbf{X}_5) = \mathbf{S}_{HT^2}$ as follows:
\[
\mathbf{X}^*_5 = \mathbf{X}_5 \left[ \mathbf{C}_5(\mathbf{X}_5) \right]^{-1/2} = \mathbf{X}_5\, \mathbf{S}^{-1/2}_{HT^2}
= \begin{bmatrix} 11.15 & 49.63 \\ 7.91 & 36.46 \\ 5.42 & 28.06 \\ 16.22 & 38.77 \\ 8.09 & 29.21 \end{bmatrix}
  \begin{bmatrix} 17.18 & 20.45 \\ 20.45 & 75.48 \end{bmatrix}^{-1/2}
= \begin{bmatrix} 0.43 & 5.74 \\ 0.24 & 4.23 \\ -0.01 & 3.29 \\ 2.50 & 4.06 \\ 0.69 & 3.29 \end{bmatrix}.
\]
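Next, the spatial depth formula is applied to each observation in the transformed sample,
beginning with $\mathbf{X}^*_1 = (0.43, 5.74)$. The first step in this process is to determine the unit vectors
from $\mathbf{X}^*_1$ to every point in the sample:
\[
\begin{aligned}
\frac{\mathbf{X}^*_1 - \mathbf{X}^*_1}{\left\| \mathbf{X}^*_1 - \mathbf{X}^*_1 \right\|} &= \begin{bmatrix} 0.00 & 0.00 \end{bmatrix} \text{ by definition} \\
\frac{\mathbf{X}^*_1 - \mathbf{X}^*_2}{\left\| \mathbf{X}^*_1 - \mathbf{X}^*_2 \right\|} &= \frac{\begin{bmatrix} 0.20 & 1.51 \end{bmatrix}}{1.52} = \begin{bmatrix} 0.13 & 0.99 \end{bmatrix} \\
\frac{\mathbf{X}^*_1 - \mathbf{X}^*_3}{\left\| \mathbf{X}^*_1 - \mathbf{X}^*_3 \right\|} &= \frac{\begin{bmatrix} 0.44 & 2.44 \end{bmatrix}}{2.48} = \begin{bmatrix} 0.18 & 0.98 \end{bmatrix} \\
\frac{\mathbf{X}^*_1 - \mathbf{X}^*_4}{\left\| \mathbf{X}^*_1 - \mathbf{X}^*_4 \right\|} &= \frac{\begin{bmatrix} -2.07 & 1.68 \end{bmatrix}}{2.66} = \begin{bmatrix} -0.78 & 0.63 \end{bmatrix} \\
\frac{\mathbf{X}^*_1 - \mathbf{X}^*_5}{\left\| \mathbf{X}^*_1 - \mathbf{X}^*_5 \right\|} &= \frac{\begin{bmatrix} -0.26 & 2.45 \end{bmatrix}}{2.46} = \begin{bmatrix} -0.11 & 0.99 \end{bmatrix}.
\end{aligned}
\]
Then, the average of the unit vectors from the point $\mathbf{X}^*_1$ to every point in the sample is computed:
\[
\frac{\begin{bmatrix} 0.00 & 0.00 \end{bmatrix} + \begin{bmatrix} 0.13 & 0.99 \end{bmatrix} + \begin{bmatrix} 0.18 & 0.98 \end{bmatrix} + \begin{bmatrix} -0.78 & 0.63 \end{bmatrix} + \begin{bmatrix} -0.11 & 0.99 \end{bmatrix}}{5} = \begin{bmatrix} -0.12 & 0.72 \end{bmatrix}.
\]
Finally, the Euclidean norm of the resulting vector is subtracted from one in order to arrive at the
Mahalanobis spatial depth value of the point $\mathbf{X}^*_1$:
\[
MSD(\mathbf{X}^*_1; F_5) = 1 - \sqrt{(-0.12)^2 + (0.72)^2} = 0.27.
\]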
Computations for the four remaining observations in the sample proceed in the same fashion.
The final results, along with corresponding rankings, are listed in Table 2.4.1. Rankings
for $\mathbf{X}_1$ and $\mathbf{X}_4$ were assigned as indicated because $MSD(\mathbf{X}^*_1; F_5) > MSD(\mathbf{X}^*_4; F_5)$ when
$MSD(\mathbf{X}^*_1; F_5)$ and $MSD(\mathbf{X}^*_4; F_5)$ are expanded to four significant digits. Depth values and
rankings using the MSD function are somewhat different than those obtained using the RMD
function. This is because RMD assumes the data are elliptically symmetric, whereas MSD
makes no distributional assumptions about the data.

  i          Xi           MSD(Xi*; F5)   rank
  1    11.15   49.63         0.27          4
  2     7.91   36.46         0.68          1
  3     5.42   28.06         0.35          3
  4    16.22   38.77         0.27          5
  5     8.09   29.21         0.53          2

Table 2.4.1 Data Ranked According to MSD
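The MSD computations for the example can likewise be sketched in code. The following Python fragment is an illustration rather than the MATLAB routine of Appendix B. It is restricted to two dimensions because the inverse square root is obtained from a closed form that holds only for 2 x 2 positive definite matrices, and the function names are assumptions introduced here.

```python
import math

# Sample from Example 2.3.1 (each tuple is one bivariate observation)
X = [(11.15, 49.63), (7.91, 36.46), (5.42, 28.06), (16.22, 38.77), (8.09, 29.21)]

def sample_covariance(points):
    """Classical 2 x 2 sample covariance matrix (divisor n - 1)."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return ((s11, s12), (s12, s22))

def inverse_sqrt_2x2(S):
    """Inverse symmetric square root of a 2 x 2 positive definite matrix.

    Uses sqrt(S) = (S + s*I) / t with s = sqrt(det S) and t = sqrt(trace S + 2*s);
    this closed form is specific to the 2 x 2 case.
    """
    a, b, c = S[0][0], S[0][1], S[1][1]
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2.0 * s)
    r11, r12, r22 = (a + s) / t, b / t, (c + s) / t   # entries of sqrt(S); det = s
    return ((r22 / s, -r12 / s), (-r12 / s, r11 / s))

def spatial_depth(x, points):
    """One minus the norm of the average unit vector from the sample points to x."""
    sx = sy = 0.0
    for p in points:
        dx, dy = x[0] - p[0], x[1] - p[1]
        norm = math.hypot(dx, dy)
        if norm > 0.0:            # the zero vector contributes (0, 0) by definition
            sx += dx / norm
            sy += dy / norm
    n = len(points)
    return 1.0 - math.hypot(sx / n, sy / n)

W = inverse_sqrt_2x2(sample_covariance(X))
# Standardize each row vector: x* = x S^(-1/2)
X_star = [(x[0] * W[0][0] + x[1] * W[1][0], x[0] * W[0][1] + x[1] * W[1][1]) for x in X]
msd = [spatial_depth(x, X_star) for x in X_star]
print([round(v, 2) for v in X_star[0]], [round(d, 4) for d in msd])
```

Printing the depths to four decimal places shows the small gap between the first and fourth observations that determines their relative ranks in Table 2.4.1.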
2.5 Simplicial Depth
As discussed in Chapter 1, simplicial data depth played a prominent role in early depth-
based nonparametric multivariate control charting efforts, so a justification for its exclusion from
this research is necessary. Introduced by Liu (1990), the simplicial depth (SD) of a point x in
$\mathbb{R}^p$ with respect to a distribution F in $\mathbb{R}^p$ is defined as the probability that x belongs to a
random simplex in $\mathbb{R}^p$, formally stated as
\[
SD(\mathbf{x}; F) = P_F\left( \mathbf{x} \in S[\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}] \right), \tag{2.5.1}
\]
where $\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}$ are independent observations from F and $S[\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}]$ denotes the p-
dimensional simplex with vertices $\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}$, or the set of all points in $\mathbb{R}^p$ that are convex
combinations of $\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}$.
For a random sample $\mathbf{X}_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$ from F in $\mathbb{R}^p$, the sample simplicial depth
function is derived from this definition to be
\[
SD(\mathbf{x}; F_n) = \binom{n}{p+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{p+1} \le n} I\left( \mathbf{x} \in S[\mathbf{X}_{i_1}, \ldots, \mathbf{X}_{i_{p+1}}] \right), \tag{2.5.2}
\]
where I is the indicator function. $SD(\mathbf{x}; F_n)$ computes the fraction of the random sample
simplices containing the point x. In order to check whether a point x in $\mathbb{R}^p$ is inside a simplex
$S[\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}]$, the following system of p + 1 equations with p + 1 unknowns must be solved:
\[
\mathbf{x} = a_1 \mathbf{x}_1 + a_2 \mathbf{x}_2 + \cdots + a_{p+1} \mathbf{x}_{p+1} \tag{2.5.3}
\]
\[
a_1 + a_2 + \cdots + a_{p+1} = 1. \tag{2.5.4}
\]
Equation (2.5.3) translates into p equations which check to see if the p-dimensional point x can
be expressed as a linear combination of the p + 1 vertices forming a given simplex
$S[\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}]$. Equation (2.5.4) represents a constraint that the coefficients $a_1, a_2, \ldots, a_{p+1}$ sum
to one. According to Liu (1990), if the simplex is nondegenerate, this system of equations has a
unique solution. Furthermore, the point x is inside the simplex if and only if the coefficients
$a_1, a_2, \ldots, a_{p+1}$ are all positive. For a given point x, this process must be repeated for each of the
$\binom{n}{p+1}$ possible p-dimensional simplices $S[\mathbf{X}_1, \ldots, \mathbf{X}_{p+1}]$ formed by the sample
$\mathbf{X}_n = \{\mathbf{X}_1, \ldots, \mathbf{X}_n\}$.
In order to illustrate the simplicial depth function, a simple graphical example is
provided. Consider a sample of size n = 5 from a continuous bivariate distribution F, and
suppose the simplicial depth of a point $\mathbf{x}$ is desired. There are a total of $\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$
possible triangles that can be formed from the sample, three of which contain the point $\mathbf{x}$ as
illustrated in Figure 2.5.1: $\{\mathbf{X}_1\mathbf{X}_2\mathbf{X}_4,\ \mathbf{X}_1\mathbf{X}_3\mathbf{X}_4,\ \mathbf{X}_1\mathbf{X}_4\mathbf{X}_5\}$. Therefore, the simplicial depth of the
point $\mathbf{x}$ is $SD(\mathbf{x}; F_5) = \frac{3}{10} = 0.30$.
Figure 2.5.1 Illustration of Simplicial Depth
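The triangle-counting procedure implied by Equations (2.5.2) through (2.5.4) can be sketched for the bivariate case as follows. Because the coordinates of the points in Figure 2.5.1 are not tabulated, the Python example below uses a hypothetical four-point sample; the function names are likewise assumptions, and the brute-force enumeration is shown only to make the definition concrete, not as an efficient algorithm.

```python
from itertools import combinations
from math import comb

def barycentric(x, v1, v2, v3):
    """Solve x = a1*v1 + a2*v2 + a3*v3 subject to a1 + a2 + a3 = 1 (Cramer's rule).

    Returns None when the triangle is degenerate (collinear vertices).
    """
    det = (v2[0] - v1[0]) * (v3[1] - v1[1]) - (v3[0] - v1[0]) * (v2[1] - v1[1])
    if det == 0:
        return None
    a2 = ((x[0] - v1[0]) * (v3[1] - v1[1]) - (v3[0] - v1[0]) * (x[1] - v1[1])) / det
    a3 = ((v2[0] - v1[0]) * (x[1] - v1[1]) - (x[0] - v1[0]) * (v2[1] - v1[1])) / det
    return (1.0 - a2 - a3, a2, a3)

def simplicial_depth_2d(x, sample):
    """Fraction of sample triangles whose interior contains the point x."""
    inside = 0
    for v1, v2, v3 in combinations(sample, 3):
        coeffs = barycentric(x, v1, v2, v3)
        if coeffs is not None and all(a > 0 for a in coeffs):
            inside += 1
    return inside / comb(len(sample), 3)

# Hypothetical sample: the four corners of a square
sample = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
depth = simplicial_depth_2d((1.0, 2.0), sample)
print(depth)
```

In this hypothetical sample two of the four possible triangles contain the query point, so its simplicial depth is 0.5. The number of simplices to examine grows as $\binom{n}{p+1}$, which is the computational burden discussed in this section.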
Liu (1990) showed that the simplicial depth function satisfies the affine invariance,
vanishing at infinity, maximality at center, and monotonicity properties for continuous
distributions. However, as demonstrated by Zuo and Serfling (2000), the maximality and
monotonicity properties fail for some discrete distributions, which could be problematic when
dealing with a finite sample. As noted by Li and Liu (2004), the exact simplicial depth may be
computed in any dimension by solving a system of linear equations, but more efficient
algorithms are needed due to the increased computational complexity in higher dimensions.
Rousseeuw and Ruts (1996) provided such an algorithm for the bivariate case, but for
dimensions greater than two this remains an open problem. Since computational feasibility in
higher dimensions is an important goal of this research, simplicial depth will not be implemented
in the multivariate quality control charting method proposed in the following chapter.
3 The Multivariate Mean-Rank (MMR) Control Chart
3.1 Introduction
A multivariate quality control Phase I analysis begins with a p-dimensional reference
sample, often from an unknown distribution, which may contain one or more OC points.
Application of a data depth function to the multivariate reference sample reduces the dimension
of the reference sample from p to one. Then a univariate control charting method, with control
limits adjusted to account for the dependence among successive comparisons of control chart
statistics to control limits, may be applied to the resulting depth values in order to identify and
remove the OC points, thus producing an IC reference sample which will serve as a basis for
Phase II monitoring.
Differences between Phase I and Phase II were explained in detail in Chapter 1, but will
be briefly reiterated here as these differences directly impact the manner in which control limits
are determined in a Phase I analysis. In Phase II, the monitoring stage of a control charting
application, each new observation is compared (through a control chart statistic) to fixed control
limits. With data depth-based methods such as those described by Liu (1995), control limits are
often fixed by using an IC reference sample to approximate the univariate distribution of the
control chart statistic. Knowledge of this distribution is used to set control limits designed to
achieve a certain maximum IC FAP.
In Phase I, the retrospective analysis stage of a control charting application, a fixed
number of m existing observations (or subgroups) from a reference sample are successively
compared through control chart statistics to trial control limits which are constantly revised as
OC points are identified and removed from the reference sample. This renders successive
comparisons of control chart statistics to control limits dependent, so control limits must be
determined by manipulation of the joint distribution of the control chart statistic, simulation of
the empirical joint distribution of the control chart statistic, or other techniques which account
for these dependencies.
Methods such as these will be necessary to design control limits for the data depth-based
variation of the $\bar{X}$ chart used in this research. The $\bar{X}$ chart for subgrouped data was selected as
the model for implementation because it is particularly well suited for use in a Phase I analysis.
The $\bar{X}$ chart analyzes only the information from the most recent observation or subgroup. This
makes it very effective at detecting single outliers or large shifts in a process which commonly
occur in Phase I. According to Montgomery (2005, p. 385), Shewhart-type charts (such as $\bar{X}$
charts) "are extremely useful in Phase I implementation of statistical process control, where the
process is likely to be OC and experiencing assignable causes that result in large shifts in the
monitored parameters."
In contrast, other methods such as cumulative sum (CUSUM), exponentially
weighted moving average (EWMA), and moving average (MA) charts use more information
from a sample and are therefore typically preferred for Phase II monitoring. A CUSUM chart is
used to plot the cumulative sum of deviations of sample values from a specified target value
[Montgomery (2005, p. 388)]. An EWMA control chart statistic is a weighted average of all
previous sample means, with the weights declining geometrically [Montgomery (2005, p. 406)].
The control chart statistic of an MA chart is a simple unweighted average of a specified number
of the most recent observations [Montgomery (2005, p. 417)]. Because they accumulate
information over time, CUSUM, EWMA, and MA charts detect small shifts in a process more
effectively than X and $\bar{X}$ charts, but are slower to respond to large shifts and have less ability to
detect single outliers. Furthermore, these charts are based on an implicit assumption that the
most recent observations are the most important. This assumption may not be reasonable in
Phase I when the sample size is fixed and new observations are not being added. Consistent with
this perspective, Montgomery (2005, p. 386) characterizes CUSUM and EWMA control charts
as "excellent alternatives to the Shewhart control chart for Phase II process monitoring
situations."
3.2 Design of the MMR Chart
The chart implemented in this research is the multivariate analog of Jones-Farmer et al.'s
(2009) Phase I mean-rank chart, which was designed as a distribution-free method of identifying
an IC reference sample for a univariate process with subgrouped data. The mean-rank chart is
similar in construct to the $\bar{X}$ chart for univariate subgrouped data, but it uses the standardized
average subgroup rank rather than the average of raw subgroup data values as a control statistic.
The use of ranks rather than actual data values renders the method distribution free, since the
distribution of ranks is the same regardless of the underlying distribution of a univariate process.
The mean-rank chart's IC and OC performance was shown to be comparable to the traditional $\bar{X}$
chart when a univariate process is normally distributed, and better than the $\bar{X}$ chart in many
scenarios when a univariate process follows a heavy-tailed or skewed distribution.
It will be shown that the mean-rank chart of Jones-Farmer et al. (2009) performs
similarly well when adapted for use with ranked data depth values corresponding to a
multivariate process. The mean-rank chart modified for use with data depth values from a
multivariate process will be hereafter referred to as the multivariate mean-rank (MMR) chart.
Like the mean-rank chart, the MMR chart will monitor standardized average subgroup ranks
which follow the same distribution regardless of the underlying distribution of a multivariate
process, so it too will be distribution free when a process is IC.
In general, any continuous process consisting of two or more correlated variables, usually
but not always representing quality characteristics, in which data are subgrouped by design or
can be rationally subgrouped, could potentially benefit from the MMR chart proposed by this
research. Since most existing multivariate Phase I methods rely on the assumption of a
multivariate normally distributed process, the MMR chart will be particularly useful when the
process under study is clearly nonnormal or lacks sufficient history to verify an assumption of
normality. In addition, because the MMR chart is computationally inexpensive, it will be
especially useful for processes consisting of a large number of variables. Example applications
of the MMR chart include, but are not limited to industrial (e.g. chemical, power, mining, steel,
petroleum, pharmaceutical, electronics, textile, polymer, and automotive), healthcare (e.g.
clinical trials and patient satisfaction), military (e.g. weapons development, combat operations,
and soldier performance), and service organizations (e.g. finance, marketing, and customer
support).
An example military application of the MMR chart, and the one which inspired this
author's interest in quality control, is charting the progress of combat operations in Iraq. This
problem rose to the forefront of the military operations research community in early 2007, when
the President of the United States ordered the deployment of approximately 40,000 additional
American troops (known as "The Surge") to reverse a trend of escalating violence in Iraq.
Because the troop increase was politically polarizing and therefore closely scrutinized by the
United States Congress, it was imperative that an accurate method of assessing its effectiveness
be emplaced. Military analysts thus faced a two-fold problem -- determining a historical data set
reflecting "normal" violence levels in Iraq and implementing an appropriate method of
prospectively monitoring future violence levels during "The Surge."
In hindsight, the difficult problem of determining a historical data set would have been a
prime opportunity for application of the MMR chart. First of all, the overall level of violence in
Iraq was measured by several correlated variables related to the performance of the US-led
coalition and Iraqi security forces, the terrorist actions of various insurgent groups in Iraq, and
the safety of the Iraqi civilian populace. In addition, early data on violence levels were extremely
volatile and highly skewed due to Iraq's troubled history as well as immature and often
inaccurate reporting procedures. Furthermore, data were collected daily but aggregated into
weekly subgroups to account for differences in the pace of combat operations on different days
of the week. In this situation, the MMR chart would have been a useful tool to establish an IC
reference sample against which future weekly violence levels during "The Surge" could have
been compared using a Phase II multivariate control chart.
An all-inclusive list of potential applications for the MMR chart is not possible, but it is
the opinion of this author that it has the potential to serve as a valuable analytical tool for a wide
range of organizations in diverse settings. Its ease of execution and flexibility in solving the
distribution-free Phase I multivariate quality control charting problem for subgrouped data fills
in many of the existing gaps in current literature, thus providing a useful methodology for
researchers and practitioners alike.
3.2.1 The MMR Control Chart Statistic
Consider a reference sample consisting of m subgroups of size n from a p-dimensional
multivariate process in which all variables are continuous. Let the random vector Xij represent
the 1 x p row vector containing the jth observation from the ith subgroup. Treating the
observations from the m mutually independent samples of size n as a single sample of size
$N = n \times m$ as described by Jones-Farmer et al. (2009) and attributed to Kruskal and Wallis
(1952), a data depth function is applied to each Xij, resulting in a corresponding depth value
$D(\mathbf{X}_{ij}; F_N)$, where $F_N$ denotes the empirical distribution function of the pooled reference
sample. Next, integer ranks Rij = 1, 2,..., N are assigned to each $D(\mathbf{X}_{ij}; F_N)$ in the pooled
sample of size N, beginning with the largest $D(\mathbf{X}_{ij}; F_N)$ and continuing in descending order. In
other words, Rij denotes the rank of $D(\mathbf{X}_{ij}; F_N)$ when compared to all other depth values in the
pooled sample of size N, with the largest $D(\mathbf{X}_{ij}; F_N)$ receiving rank 1 and the smallest receiving
rank N. When the process is IC, the mean of the random variable Rij is $E(R_{ij}) = \frac{N+1}{2}$ and the
variance is $Var(R_{ij}) = \frac{(N-1)(N+1)}{12}$ [Jones-Farmer et al. (2009, p. 306)].
In the event of a tie, the midrank method is used as a correction without affecting the
mean and variance of the random variable Rij [Jones-Farmer et al. (2009, p. 306)]. According to
the midrank method, each tied depth value receives the average of the ranks the tied values would
receive if the ties were broken [Lehman (2006, p. 18)]. For example, suppose the four depth values
{0.93, 0.67, 0.67, 0.22} are to be ranked in descending order. It is clear that the largest depth
value (0.93) should be assigned rank 1 and the smallest depth value (0.22) rank 4, but the
assignment of ranks 2 and 3 to the equivalent depth values (0.67, 0.67) is ambiguous. In order to
preserve the equality of these two depth values in terms of their ranks, they will both be assigned
the average of the middle two ranks. In this example, the duplicate depth values will both be
assigned rank = (2+3)/2 = 2.5. Thus, the set of ranks corresponding to the four depth values is
{1, 2.5, 2.5, 4}.
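The midrank assignment just described can be expressed compactly in code. The Python function below is a sketch written for this illustration (the name `descending_midranks` is an assumption, not part of any referenced software); it ranks depth values from largest to smallest and averages the ranks within each group of ties.

```python
def descending_midranks(values):
    """Rank values from largest (rank 1) to smallest, giving tied values their midrank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        midrank = ((pos + 1) + (end + 1)) / 2   # average of the ranks in the tied block
        for k in range(pos, end + 1):
            ranks[order[k]] = midrank
        pos = end + 1
    return ranks

example_ranks = descending_midranks([0.93, 0.67, 0.67, 0.22])
print(example_ranks)
```

Applied to the four depth values of the example, the function returns the ranks 1, 2.5, 2.5, and 4.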
Now consider the average of the ranks in each subgroup i, denoted by
\[
\bar{R}_i = \frac{\sum_{j=1}^{n} R_{ij}}{n}. \tag{3.2.1}
\]
If a process is IC, the ranks should be distributed evenly throughout the m subgroups, resulting in
approximately equal $\bar{R}_i$ for each subgroup. For an IC process, the mean and variance of $\bar{R}_i$ are,
respectively [Bakir (1989, pp. 764-765)]:
\[
E(\bar{R}_i) = \frac{N+1}{2} \tag{3.2.2}
\]
\[
Var(\bar{R}_i) = \frac{(N-n)(N+1)}{12n}. \tag{3.2.3}
\]
Invoking the central limit theorem, the random variable representing the standardized subgroup
mean rank,
\[
Z_i = \frac{\bar{R}_i - E(\bar{R}_i)}{\sqrt{Var(\bar{R}_i)}}, \tag{3.2.4}
\]
follows an approximate standard normal distribution when n is sufficiently large [Jones-Farmer
et al. (2009, p. 306)], although small subgroup sizes (e.g. n = 4, 5, or 6) are more likely in most
quality control applications [Montgomery (2005, p. 196)]. To create the MMR control chart for
use in Phase I, the control statistic Zi in Equation (3.2.4) is plotted for each of the m subgroups.
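A minimal Python sketch of the control statistic follows. It assumes the pooled ranks have already been assigned and arranged by subgroup; the function name `mmr_statistics` and the tiny data set (m = 3 subgroups of size n = 2) are hypothetical and serve only to show Equations (3.2.1) through (3.2.4) at work.

```python
import math

def mmr_statistics(ranks_by_subgroup):
    """Standardized subgroup mean ranks Z_i for m subgroups of equal size n."""
    m = len(ranks_by_subgroup)
    n = len(ranks_by_subgroup[0])
    N = m * n
    expected = (N + 1) / 2                    # Equation (3.2.2)
    variance = (N - n) * (N + 1) / (12 * n)   # Equation (3.2.3)
    return [(sum(r) / n - expected) / math.sqrt(variance) for r in ranks_by_subgroup]

# Hypothetical pooled ranks: subgroup 1 holds the deepest points (lowest ranks),
# subgroup 3 the most outlying points (highest ranks).
ranks = [[1, 2], [3, 4], [5, 6]]
z = mmr_statistics(ranks)
print([round(v, 4) for v in z])
```

In this toy case the third subgroup, whose observations have the lowest depth and therefore the highest ranks, produces the largest positive statistic, while the first subgroup produces a negative statistic of equal magnitude.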
3.2.2 Empirical Control Limits for the MMR Chart
As opposed to both lower and upper control limits required for the univariate mean-rank
chart of Jones-Farmer et al. (2009), the MMR chart has only an upper control limit. This is
because with the MMR chart, observations are ranked based on data depth values rather than raw
data values. An extremely negative control chart statistic Zi occurs when a subgroup consists of
observations having extremely high depth values and correspondingly low ranks. This indicates
near-perfect centrality with respect to the p-dimensional data cloud, and is therefore no cause for
concern. Conversely, an extremely positive control chart statistic Zi is realized when a subgroup
of observations is located far away from the center of the p-dimensional data cloud, resulting in
extremely low depth values and correspondingly high ranks. Such a subgroup indicates a
potential OC condition which requires further investigation.
For each m, n combination of interest, Monte Carlo simulation of the empirical joint
distribution of the standardized subgroup mean rank was used to determine the MMR chart upper
control limits in Table 3.2.1. Recall that the joint distribution is required because successive
comparisons of control chart statistics to control limits are dependent in Phase I. Limits are
tabled for a maximum overall IC FAP of 0.10, where the FAP is the probability that the Phase I
chart with m subgroups of size n signals at least once when the process is IC. Due to the discrete
nature of the mean-rank distribution as well as simulation noise, simulated FAP values do not
precisely match the desired FAP values. Conservative limits were chosen in order to ensure the
simulated FAP came as close as possible to the desired FAP without exceeding it. A more
comprehensive table of limits for various combinations of m, n, and FAP is provided in
Appendix C, and MATLAB code for simulating additional limits is provided in Appendix D.
Table 3.2.1 Empirical Control Limits for the MMR Chart (Desired FAP = 0.10)
m      n    UCL      Simulated FAP
20     5    2.476    0.0941
50     5    2.702    0.0983
100    5    2.854    0.0982
150    5    2.932    0.0983
200    5    2.985    0.0981
The general construct of the simulation algorithm is as follows:
1) Establish a trial UCL to attain the desired overall IC FAP for a given (m, n)
combination.
2) Simulate N = m x n random numbers from a Uniform(0, 1) distribution. Assign each
number a rank from largest (rank = 1) to smallest (rank = N). Divide the resulting
ranks into m subgroups of size n.
3) Compute the average rank $\bar{R}_i$ for each subgroup. Determine the corresponding
standardized subgroup mean rank Zi.
4) Compare each of the m standardized subgroup mean ranks, Zi, i = 1,...,m, to the trial
UCL. Increment a counter by one if any Zi exceeds the UCL.
5) Repeat steps 2 - 4 a total of 100,000 times.
6) Determine the empirical FAP = (final counter value)/100,000.
7) If the empirical FAP exceeds the desired FAP, increase the UCL. If the empirical
FAP is lower than the desired FAP, decrease the UCL.
8) Reset the counter to zero.
9) Repeat steps 2 - 8 until the desired overall IC FAP is achieved.
10) Record m, n, the desired FAP, the empirical FAP, and the UCL.
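Steps 2 through 6 of the algorithm above can be sketched as follows. This is an illustrative Python sketch rather than the MATLAB code of Appendix D. As a labeled shortcut, it shuffles the integers 1, ..., N instead of ranking Uniform(0, 1) draws; the two are equivalent because continuous draws have no ties and every rank ordering is equally likely. The function name and the `reps` and `seed` parameters are assumptions made for illustration.

```python
import math
import random


def simulate_fap(m, n, ucl, reps=100_000, seed=None):
    """Estimate the overall IC false alarm probability for a trial UCL."""
    rng = random.Random(seed)
    N = m * n
    expected = (N + 1) / 2                           # Equation (3.2.2)
    sd = math.sqrt((N - n) * (N + 1) / (12 * n))     # square root of Equation (3.2.3)
    ranks = list(range(1, N + 1))
    signals = 0
    for _ in range(reps):
        rng.shuffle(ranks)                           # random assignment of ranks (step 2)
        for i in range(m):
            mean_rank = sum(ranks[i * n:(i + 1) * n]) / n   # step 3
            if (mean_rank - expected) / sd > ucl:           # step 4
                signals += 1                         # count each replication at most once
                break
    return signals / reps                            # step 6


# Trial UCL from Table 3.2.1 for m = 20, n = 5; fewer replications than the
# 100,000 used in the text, so the estimate is noisier.
print(simulate_fap(20, 5, 2.476, reps=20_000, seed=1))
```

Steps 7 through 9 would wrap this function in a search that raises the trial UCL when the estimate exceeds the desired FAP and lowers it otherwise.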
3.2.3 Analytical Control Limits for the MMR Chart
Prior to simulating empirical limits for the MMR chart, analytical control limits were
attempted using the joint distribution of the standardized mean ranks. As reported by Jones-
Farmer et al. (2009), the central limit theorem suggests that the individual standardized mean
ranks follow a standard normal distribution for sufficiently large subgroup size n. From Bakir
(1989), the joint distribution of the standardized mean ranks is asymptotically multivariate
normal with correlation matrix
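$$\mathbf{R}_{m \times m} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1m} \\ \rho_{21} & 1 & \cdots & \rho_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{m1} & \rho_{m2} & \cdots & 1 \end{pmatrix},$$
where $\rho_{ij} = -\frac{1}{m-1}$ when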
subgroup sizes are equal. Using a zero mean vector and the correlation structure given by Rm x m,
asymptotic control limits for the MMR chart were numerically determined through a
modification of Genz' (2011) MATLAB algorithm for evaluating the multivariate normal
distribution. Control limits were computed to achieve a maximum IC FAP of 0.10.
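The equal-subgroup-size correlation can be checked with a short derivation. The sketch below assumes an IC process with no ties, so that the pooled ranks form a uniformly random permutation of 1, ..., N with N = mn; it is offered as a plausibility check and is not taken from Bakir (1989).

```latex
\begin{align*}
\operatorname{Var}(R_{ij}) &= \frac{N^2 - 1}{12}, \qquad
\operatorname{Cov}(R_{ij}, R_{kl}) = -\frac{N+1}{12}
  \quad \text{for two distinct observations}, \\
\operatorname{Cov}(\bar{R}_i, \bar{R}_k)
  &= \frac{1}{n^2} \cdot n^2 \left(-\frac{N+1}{12}\right)
   = -\frac{N+1}{12}, \qquad i \neq k, \\
\rho_{ik}
  &= \frac{-(N+1)/12}{(N-n)(N+1)/(12n)}
   = -\frac{n}{N-n}
   = -\frac{1}{m-1}.
\end{align*}
```

The denominator in the last line is the variance given in Equation (3.2.3), which is common to all subgroups when subgroup sizes are equal.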
Next, the IC performance of the multivariate normal theory control limits was evaluated
by simulating 10,000 applications of the MMR chart using robust Mahalanobis depth to IC
bivariate normally distributed data with zero mean vector and identity covariance matrix, without
loss of generality. Multivariate normal theory control limits and corresponding empirical IC
FAPs for m = 20, 50(50)200 subgroups of size n = 5(5)20 are recorded in Table 3.2.2.
Table 3.2.2 Simulated IC FAPs Using Normal Theory Limits (Desired FAP = 0.10)
m      n     MVN UCL    Simulated FAP
20     5     2.565      0.0752
20     10    2.565      0.0822
20     15    2.565      0.0881
20     20    2.565      0.0873
50     5     2.865      0.0485
50     10    2.865      0.0766
50     15    2.865      0.0889
50     20    2.865      0.0861
100    5     3.077      0.0296
100    10    3.077      0.0692
100    15    3.077      0.0831
100    20    3.077      0.0837
150    5     3.195      0.0199
150    10    3.195      0.0602
150    15    3.195      0.0744
150    20    3.195      0.0789
200    5     3.277      0.0131
200    10    3.277      0.0567
200    15    3.277      0.0725
200    20    3.277      0.0751
Multivariate normal theory control limits produced empirical IC FAPs which are close to
the desired IC FAP of 0.10 for large n but unacceptably low for small n. This is because small
subgroup sizes n are insufficient to ensure the individual standardized mean ranks Zi follow a
standard normal distribution in accordance with the central limit theorem, thus preventing the
joint distribution of the standardized mean ranks from achieving asymptotic multivariate
normality. This can be seen graphically in Figure 3.2.1 depicting Q-Q plots of simulated
standardized mean ranks for m = 50 and n = 5(5)20. The individual Q-Q plots show a clear
departure from normality when m = 50 and n = 5 (top left), and increasing normality as n is
raised to 20 (bottom right).
Figure 3.2.1 Q-Q Plots of Zi for m = 50, n = 5(5)20
[Figure: four panels, "QQ Plot of Zi (m = 50, n) versus Standard Normal" for n = 5, 10, 15, and 20; x-axis: Standard Normal Quantiles; y-axis: Quantiles of Input Sample]
Table 3.2.2 also illustrates that MMR chart performance using multivariate normal theory
control limits worsens with increasing m. This is easily understood if a Phase I analysis is
viewed as the partitioning of a desired overall IC FAP among m simultaneous individual
comparisons of control chart statistics to a UCL. A larger m means that a smaller portion of the
overall IC FAP is allocated to each of the m individual comparisons. This can be visualized as
the UCL being pushed progressively farther into the upper tail of the standard normal
distribution of each individual control chart statistic. As this happens, the effects of any
departures of the distribution of the individual control chart statistic from standard normality will
be exacerbated. This in turn will lead to undesired empirical FAPs for the MMR chart using
multivariate normal theory control limits.
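The partitioning argument can be illustrated numerically. The sketch below makes a simplifying assumption that is not the method used in the text: it treats the m comparisons as independent standard normal variables and ignores the small negative correlation between mean ranks. Under that assumption the per-comparison tail area is 1 - (1 - FAP)^(1/m), and the resulting limits land close to the MVN UCL column of Table 3.2.2. The function name is hypothetical.

```python
from statistics import NormalDist


def independent_normal_ucl(m, overall_fap=0.10):
    """Per-comparison tail area and normal upper limit, assuming independence."""
    alpha = 1 - (1 - overall_fap) ** (1 / m)
    return alpha, NormalDist().inv_cdf(1 - alpha)


for m in (20, 50, 100, 150, 200):
    alpha, ucl = independent_normal_ucl(m)
    # alpha shrinks and the limit moves farther into the upper tail as m grows.
    print(m, round(alpha, 5), round(ucl, 3))
```

Because the limit sits farther out in the tail for larger m, any mismatch between the true distribution of Zi and the standard normal matters more there.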
Multivariate normal theory control limits could be used to provide conservative limits for
a very small number of subgroups or very large subgroup sizes, but empirical control limits are
much more consistent in maintaining the desired IC FAP for the range of m and n considered in
this research. An alternative to the "one size fits all" multivariate normal theory control limits
for the MMR chart is to enumerate the distribution of the standardized mean rank for each
combination of number of subgroups m and subgroup size n, and use this information to derive
the corresponding joint distribution of the standardized mean ranks. However, this method is
clearly impractical for the number of subgroups considered in this research, again supporting the
use of empirically determined control limits for the MMR chart.
3.3 Example Application of the MMR Chart
In order to fully understand the workings of an MMR chart, a simple example is
provided. Consider the first subgroup of a bivariate process consisting of m = 50 subgroups of
size n = 5 from an unknown distribution F. Let the random vector Xij represent the 1 x 2 row
vector containing the jth observation from the ith subgroup, where i = 1 and j = 1 - 5. The data,
along with corresponding robust Mahalanobis depth values and ranks, are listed in Table 3.3.1.
Table 3.3.1 MMR Chart Data for the First Subgroup of a Bivariate Process
i    j    Xij                  RMD(Xij; F250)    Rij
1    1    (5.1880, 2.4570)     0.3311            197
1    2    (0.7332, 4.7681)     0.2904            218
1    3    (3.3695, 4.3434)     0.4533            127
1    4    (4.5465, 4.7078)     0.3258            201
1    5    (3.0102, 3.8656)     0.5677            61
Note that Rij reflects rankings with respect to the pooled reference sample of size N =
250. The average of the ranks in the first subgroup is
$$\bar{R}_1 = \frac{197 + 218 + 127 + 201 + 61}{5} = 160.80.$$
Using Equations (3.2.2) and (3.2.3),
$$E(\bar{R}_i) = \frac{N+1}{2} = \frac{250+1}{2} = 125.50$$
and
$$\operatorname{Var}(\bar{R}_i) = \frac{(N-n)(N+1)}{12n} = \frac{(250-5)(250+1)}{12(5)} = 1024.92.$$
Using Equation (3.2.4), the standardized mean rank for the first subgroup is
$$Z_1 = \frac{\bar{R}_1 - E(\bar{R}_i)}{\sqrt{\operatorname{Var}(\bar{R}_i)}} = \frac{160.80 - 125.50}{\sqrt{1024.92}} = 1.103.$$
Given a desired IC FAP of 0.10, the MMR chart UCL for m = 50, n = 5 is found from
Table 3.2.1 to be 2.702. Since Z1 is less than 2.702, it is concluded that the first subgroup is IC.
In order to complete the MMR chart, this process is repeated for subgroups i = 2 - 50. Any Zi
exceeding the UCL will have its corresponding subgroup Xi. removed from the sample if no
assignable cause is found, thus establishing the IC reference sample for use in Phase II.
Using the control limits in Table 3.2.1, the next step is to compare the performance of the
MMR chart using both robust Mahalanobis depth and Mahalanobis spatial depth to the best
multivariate parametric Phase I alternative. All control charts will be tested on normal, heavy-
tailed, and skewed multivariate data, with both isolated and sustained shifts of the mean. Details
concerning the testing and evaluation process are provided in Chapter 4.
4 MMR Chart Performance Assessment Methodology
4.1 Introduction
To assess the effectiveness of the MMR chart as a distribution-free method of
establishing an IC reference sample, its performance will be compared to an equivalent Phase I
parametric multivariate method. If there were any other multivariate nonparametric or
distribution-free Phase I methods in existence, they would also yield useful comparisons.
However, the MMR chart appears to be the first in this class of control charts.
Because the MMR chart is a Shewhart-type chart, it must naturally be compared to
another Shewhart-type chart for subgrouped multivariate data. From the literature review in
Chapter 1, there is no clear consensus on the preferred Phase I parametric method. Because the
original Hotelling's T2 chart is the most common baseline performance measure for subsequently
developed Phase I parametric multivariate methods, it will likewise be used as a basis of
comparison for the distribution-free MMR chart.
4.2 Establishing Baseline Performance Using Hotelling's T2 Chart
Constructing Hotelling's T2 chart for a reference sample consisting of m subgroups of size
n from a p-dimensional multivariate process requires first calculating unbiased estimates of the
mean vector and covariance matrix. From Montgomery (2005, p. 495) the classical estimators
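are
$$\bar{\bar{\mathbf{X}}} = \frac{1}{m}\sum_{i=1}^{m} \bar{\mathbf{X}}_i \qquad (4.2.1)$$
and
$$\bar{\mathbf{S}} = \frac{1}{m}\sum_{i=1}^{m} \mathbf{S}_i, \qquad (4.2.2)$$
where $\bar{\bar{\mathbf{X}}}$ represents the average of the m subgroup mean vectors and $\bar{\mathbf{S}}$ represents the average
of the m subgroup covariance matrices. Using these estimated parameters, the control statistic is
computed as
$$T_i^2 = n\,(\bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}})\,\bar{\mathbf{S}}^{-1}\,(\bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}})'. \qquad (4.2.3)$$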
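A minimal pure-Python sketch of Equations (4.2.1) through (4.2.3) follows. It is not the MATLAB code of Appendix H; all function names are hypothetical, and a small Gaussian elimination routine stands in for a matrix inverse so that no external library is assumed.

```python
def mean_vector(rows):
    """Column means of a list of 1 x p observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * p
    for r in reversed(range(p)):
        y[r] = (M[r][p] - sum(M[r][k] * y[k] for k in range(r + 1, p))) / M[r][r]
    return y


def hotelling_t2(subgroups):
    """T^2 statistic for each of m subgroups of n observations in p dimensions."""
    m, n = len(subgroups), len(subgroups[0])
    means = [mean_vector(g) for g in subgroups]
    covs = [covariance_matrix(g) for g in subgroups]
    p = len(means[0])
    grand_mean = [sum(mu[a] for mu in means) / m for a in range(p)]      # (4.2.1)
    s_bar = [[sum(S[a][b] for S in covs) / m for b in range(p)]
             for a in range(p)]                                          # (4.2.2)
    stats = []
    for mu in means:
        d = [mu[a] - grand_mean[a] for a in range(p)]
        y = solve(s_bar, d)                     # y = S_bar^{-1} d
        stats.append(n * sum(d[a] * y[a] for a in range(p)))             # (4.2.3)
    return stats


# Toy data: the second subgroup is the first shifted by (2, 2).
groups = [[(0, 0), (1, 0), (0, 1)], [(2, 2), (3, 2), (2, 3)]]
print(hotelling_t2(groups))  # both statistics are 36 up to rounding
```

In the toy data both subgroups share the covariance matrix [[1/3, -1/6], [-1/6, 1/3]], whose inverse is [[4, 2], [2, 4]], so each deviation of (±1, ±1) from the grand mean gives T² = 3 × 12 = 36.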
The control statistic for each subgroup is compared to the Phase I UCL given by Alt's (1976)
formula:
$$UCL_{T^2} = C(m,n,p)\,F_{\alpha,\,p,\,mn-m-p+1}, \quad \text{where } C(m,n,p) = \frac{p(m-1)(n-1)}{mn-m-p+1}. \qquad (4.2.4)$$
In Equation (4.2.4) above, $F_{\alpha,\,p,\,mn-m-p+1}$ represents the $(1 - \alpha)$th percentile of the F distribution
with p and (mn - m - p + 1) degrees of freedom, and $\alpha$ is the desired IC FAP for each individual
subgroup. In order to achieve a desired overall IC FAP for all m subgroups in a reference data
set, $\alpha$ must be set as follows:
$$\alpha = 1 - (1 - \alpha_{overall})^{1/m}, \qquad (4.2.5)$$
where $\alpha_{overall}$ is the desired overall IC FAP. For example, for a reference sample consisting of m
= 50 subgroups and a desired overall IC FAP of 0.05, $\alpha = 1 - (1 - 0.05)^{1/50} = 0.001025$ would be
used in Equation (4.2.4) to determine the Phase I UCL.
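The per-subgroup adjustment in Equation (4.2.5) and the multiplier C(m, n, p) from Equation (4.2.4) are easy to evaluate directly. The following is a small sketch with hypothetical function names; it reproduces the example value above and deliberately stops short of the F percentile, which the Python standard library does not provide.

```python
def per_subgroup_alpha(overall_fap, m):
    """Equation (4.2.5): FAP allotted to each of the m subgroup comparisons."""
    return 1 - (1 - overall_fap) ** (1 / m)


def alt_constant(m, n, p):
    """Multiplier C(m, n, p) applied to the F percentile in Equation (4.2.4)."""
    return p * (m - 1) * (n - 1) / (m * n - m - p + 1)


alpha = per_subgroup_alpha(0.05, 50)
print(round(alpha, 6))          # 0.001025
print(alt_constant(50, 5, 2))   # 392/199, roughly 1.97
```

If SciPy is available, the remaining factor could be obtained with `scipy.stats.f.ppf(1 - alpha, p, m*n - m - p + 1)`; that dependency is an assumption here and is not part of the sketch.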
Alt's (1976) formula given in Equation (4.2.4) was derived using the IC distribution of
the T2 statistic given in Equation (4.2.3) under the assumption of multivariate normally
distributed data. Therefore, it is not appropriate for use when the distribution of the data is
nonnormal because it will not result in the desired IC FAP. Having a common baseline level of
performance is essential to a valid comparison of OC performance among all charts considered,
so control limits for Hotelling's T2 chart must be empirically adjusted when the data under study
are nonnormally distributed. This will be accomplished using an algorithm similar to the one for
determining MMR empirical control limits detailed in Chapter 3. Hotelling's T2 empirical
control limits used in this research are provided in Appendix E, and the MATLAB code used to
determine them is provided in Appendix F.
4.3 Simulating Symmetric and Skewed Process Distributions
The MMR and Hotelling's T2 charts will be tested on IC as well as mean-shifted data
from normal, heavy-tailed, and skewed distributions with dimensions p = 2, 5, and 10. Due to
affine equivariance of the mean vector and covariance matrix, multivariate normal data will be
generated without loss of generality from the standard multivariate normal distribution, Np(0, I),
where 0 is a p-dimensional mean vector of all zeros and I is a p x p identity matrix. Heavy-tailed
data will be represented by the multivariate t distribution, also using Ip x p as the covariance
matrix. Variations of the multivariate t distribution will include both 10 and 3 degrees of
freedom corresponding to increasingly fatter tails. Finally, skewed data will come from a
multivariate lognormal distribution, standardized to have zero mean vector and identity
covariance matrix. The data will be simulated using MATLAB code from the MathWorks
Statistics Toolbox at http://www.mathworks.com/help/toolbox/stats/. A summary of all planned
experiments is illustrated in Table 4.3.1.
Table 4.3.1 Summary of Planned Experiments
Number/Size of Subgroups: m = 20, 50(50)200; n = 5
Process Distribution (in p = 2, 5, or 10 Dimensions): normal, t(10), t(3), lognormal
Control Chart           Shift Type                     Dimensions p tested (listed across the distribution columns)
Hotelling's T2 Chart    no shift                       2 5 10 2 2 5 10 2 5
Hotelling's T2 Chart    isolated shift                 2 5 10 2 2 5 10 2 5
Hotelling's T2 Chart    5/15/30% sustained shifts      2 10 2 10 2 5
MMR-RMD Chart           no shift                       2 5 10 2 2 5 10 2
MMR-RMD Chart           isolated shift                 2 5 10 2 2 5 10 2
MMR-RMD Chart           5/15/30% sustained shifts      2 10 2 10 2
MMR-MSD Chart           no shift                       2 2 2 5 2 5
MMR-MSD Chart           isolated shift                 2 2 2 5 2 5
MMR-MSD Chart           5/15/30% sustained shifts      2 2 5
4.4 Evaluating In-Control Performance
The MMR and Hotelling's T2 charts will first be evaluated based on their ability to
maintain a desired IC FAP for subgrouped data from multivariate normal, multivariate t, and
multivariate lognormal distributions. It is expected that only the MMR chart, because it is
distribution-free, will be able to maintain the desired IC FAP across all combinations of sample
and subgroup sizes. Furthermore, IC performance of the MMR chart should be invariant to the
choice of depth function used.
The algorithm for these simulations, which will be performed in MATLAB, is as follows:
1) Simulate m subgroups of size n from a p-dimensional normal, t, or lognormal
distribution.
2) Establish the UCL for the MMR or Hotelling's T2 chart.
3) Compute control chart statistics for each subgroup and compare to the UCL. If at
least one control chart statistic exceeds the UCL, increment a counter by one.
4) Repeat steps 1 - 3 a total of 10,000 times.
5) Estimate the overall IC FAP = (final counter value)/10,000.
This process will be repeated for all desired combinations of m, n, p, process distribution, and
control chart. MATLAB algorithms for simulating IC performance for the MMR and Hotelling's
T2 charts are provided in Appendix G and Appendix H, respectively.
4.5 Evaluating Out-of-Control Performance
Next, the MMR and Hotelling's T2 charts will be evaluated in terms of their ability to
detect isolated and sustained shifts of the mean. An isolated shift of the mean is defined as a
location shift occurring in a single subgroup of size n. Because the probability of detection is
independent of the location of a shift within a data set, isolated shifts will take place in the first
subgroup of each simulated data set without loss of generality. A sustained shift of the mean is
defined as a location shift occurring in a certain percentage of the pooled sample of size N.
Sustained shift percentages tested will include 5%, 15%, and 30%, and will take place at the end
of each data set. Sustained shifts could be induced anywhere in the data set without loss of
generality, but being at the end is most logical since it is unlikely that a process would go from
an OC state back to an IC state without outside intervention.
The magnitude of the various shifts imposed will vary depending on the scenario being
evaluated. This is because both the dimension of the data and the type of shift have a direct
impact on the probability of a shift being detected. In general, all shifts are easier to detect in
lower dimensions than in higher dimensions, and sustained shifts are easier to detect than
isolated shifts. The magnitude of a shift will be measured by the noncentrality parameter
$$\lambda = \sqrt{\boldsymbol{\delta}\,\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}'}, \qquad (4.5.1)$$
where the process mean vector shifts from $\boldsymbol{\mu}_0$ to $\boldsymbol{\mu}_0 + \boldsymbol{\delta}$ and $\boldsymbol{\Sigma}$ is the process covariance matrix.
Because the direction of a shift does not affect control chart performance with elliptically
symmetric distributions, shifts will be fixed in the direction of $\mathbf{e}_1 = (1, 0, \ldots, 0)$ without loss of
generality [Stoumbos and Sullivan (2002), p. 265]. Shift directions for skewed distributions will
be discussed in Section 4.6.
OC performance for a control chart will be quantified in terms of the empirical alarm
probability (EAP), where EAP is defined as the estimated probability of a chart signaling at least
once in an OC situation. Ideally, a control chart's EAP should be 100% for all scenarios
involving induced location shifts. It is hoped that the MMR chart's performance will match that
of Hotelling's T2 chart for normally distributed data and surpass the T2 chart's performance for
nonnormally distributed data.
The algorithm for simulating OC performance is slightly different than the IC case, and is
detailed as follows:
1) Simulate m subgroups of size n from a p-dimensional normal, t, or lognormal
distribution.
2) Add isolated or sustained location shifts to the desired subgroups.
3) Establish the UCL for the MMR or Hotelling's T2 chart.
4) Compute control chart statistics for each subgroup and compare to the UCL. If at
least one control chart statistic exceeds the UCL, increment a counter by one.
5) Repeat steps 1 - 4 a total of 10,000 times.
6) Estimate the EAP = (final counter value)/10,000.
This process will be repeated for all combinations of m, n, p, process distribution, shift type, and
control chart. MATLAB algorithms for simulating OC performance for the MMR and
Hotelling's T2 charts are also provided in Appendix G and Appendix H, respectively.
4.6 Evaluating Out-of-Control Performance with Skewed Data
Control chart performance with skewed distributions will be assessed using multivariate
lognormally distributed data, simulated using the transformational relationship between the
multivariate normal and the multivariate lognormal distributions. A p-dimensional multivariate
lognormal random vector X can be represented as $\mathbf{X} = (e^{Y_1}, e^{Y_2}, \ldots, e^{Y_p})$, where Y is multivariate
normal $N_p(\boldsymbol{\mu}_Y, \boldsymbol{\Sigma}_Y)$ [Law and Kelton (2000), p. 382]. Applying this transformation using a
multivariate normal random vector Y with mean vector $\boldsymbol{\mu}_Y = (\mu_1, \mu_2, \ldots, \mu_p)$ and covariance
matrix $\boldsymbol{\Sigma}_Y$ with $\sigma_{ij}$ the (i,j)th entry, the resulting multivariate lognormal random vector X has
the following properties [Law and Kelton (2000), p. 382]:
$$E(X_i) = e^{\mu_i + \sigma_{ii}/2} \qquad (4.6.1)$$
$$V(X_i) = e^{2\mu_i + \sigma_{ii}}\,(e^{\sigma_{ii}} - 1) \qquad (4.6.2)$$
$$Cov(X_i, X_j) = (e^{\sigma_{ij}} - 1)\,e^{\mu_i + \mu_j + (\sigma_{ii} + \sigma_{jj})/2}. \qquad (4.6.3)$$
Simulating multivariate lognormal observations is therefore simply a matter of generating
$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_p) \sim N_p(\boldsymbol{\mu}_Y, \boldsymbol{\Sigma}_Y)$ and then evaluating $\mathbf{X} = (e^{Y_1}, e^{Y_2}, \ldots, e^{Y_p})$. Without loss of
generality, this research will use $\mathbf{Y} \sim N_p(\mathbf{0}, \mathbf{I})$ to create multivariate lognormal data X having the
following properties:
$$E(X_i) = e^{1/2} = 1.6487 \qquad (4.6.4)$$
$$V(X_i) = e(e - 1) = 4.6708 \qquad (4.6.5)$$
$$Cov(X_i, X_j) = 0. \qquad (4.6.6)$$
In order to maintain consistency with other simulated distributions used in this research, the
multivariate lognormal data X will be standardized using $\mathbf{X}_i = (\mathbf{X}_i - \boldsymbol{\mu}_X)\,\boldsymbol{\Sigma}_X^{-1/2}$, where $\boldsymbol{\mu}_X$ is a 1
x p mean vector with all entries equal to 1.6487 and $\boldsymbol{\Sigma}_X$ is a p x p covariance matrix with
diagonal entries equal to 4.6708 and zeros everywhere else, resulting in X ~ lognormal (0, I).
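The generation and standardization steps described above can be sketched as follows. This is an illustrative Python sketch, not the MATLAB Statistics Toolbox code used in the research; the function name, seed, and sample size are arbitrary assumptions. Because the covariance matrix of X is diagonal here, multiplying by its inverse square root reduces to dividing each coordinate by the square root of 4.6708.

```python
import math
import random

MEAN_X = math.exp(0.5)            # Equation (4.6.4), about 1.6487
VAR_X = math.e * (math.e - 1)     # Equation (4.6.5), about 4.6708


def standardized_lognormal(p, rng):
    """One 1 x p observation: exponentiate N_p(0, I), then center and scale."""
    return [(math.exp(rng.gauss(0.0, 1.0)) - MEAN_X) / math.sqrt(VAR_X)
            for _ in range(p)]


rng = random.Random(2011)
sample = [standardized_lognormal(2, rng) for _ in range(50_000)]
first_coordinate_mean = sum(x[0] for x in sample) / len(sample)

print(round(MEAN_X, 4), round(VAR_X, 4))   # 1.6487 4.6708
print(first_coordinate_mean)               # close to 0
```

The sample variance of lognormal data converges slowly because of the heavy right tail, so only the mean is checked here.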
Once the multivariate lognormal data are simulated, isolated and sustained shifts will be
induced to evaluate OC performance. As noted by Stoumbos and Sullivan (2002, p. 265), while
the direction of a shift has no effect on control chart performance with elliptically symmetric
distributions, it can substantially affect a control chart's detection power with skewed
distributions. One method of handling this is to focus on the shift direction having the most
dramatic effect on control chart performance, but this is a difficult task because there are an
infinite number of shift directions from which to choose in a multivariate setting [Stoumbos and
Sullivan (2002), p. 267]. Even if the most impactful shift direction could be determined, its odds
of occurring in practice are unknown. As pointed out by J. Sullivan (personal communication,
February 2, 2011), there is no guidance found in the literature regarding the likelihood of certain
shift directions occurring, so a better approach is to assume that all shift directions are equally
probable. Under this assumption, as done by Stoumbos and Sullivan (2002), the effects of shift
directions randomly generated over a uniform distribution will be averaged.
The shift directions will be generated using an algorithm proposed by Johnson (1987, p.
127), who stated that a p-dimensional shift can be created by first generating p independent
standard normal random variates $Z_1, Z_2, \ldots, Z_p$. Next, the shift vector $\boldsymbol{\delta}$, which follows a uniform
distribution on the p-sphere, is computed using
$$\delta_i = \frac{Z_i}{\sqrt{Z_1^2 + \cdots + Z_p^2}}, \quad i = 1, 2, \ldots, p. \qquad (4.6.7)$$
A different $\boldsymbol{\delta}$ will be generated for each of the 10,000 iterations of the simulation, and the results
will be averaged at the conclusion of the simulation. In two dimensions, this method of creating
shift vectors is analogous to randomly generating a series of unit vectors which emanate from the
origin and terminate along the boundary of the unit circle.
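Equation (4.6.7) translates directly into code. The following is a minimal Python sketch of the direction generator; the function name and the seed are illustrative assumptions.

```python
import math
import random


def random_unit_direction(p, rng):
    """Draw a direction uniformly on the unit sphere in p dimensions."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]   # p independent N(0, 1) variates
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]                  # Equation (4.6.7)


rng = random.Random(0)
delta = random_unit_direction(5, rng)
print(sum(v * v for v in delta))   # 1.0 up to floating-point rounding
```

Normalizing independent standard normal variates works because their joint density depends only on the distance from the origin, so no direction is favored.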
As with elliptically symmetric distributions, the magnitude of the various shifts imposed
will be measured by the noncentrality parameter given in Equation (4.5.1), where the
multivariate lognormal process mean vector shifts from $\boldsymbol{\mu}_0$ to $\boldsymbol{\mu}_0 + \boldsymbol{\delta}$ and $\boldsymbol{\Sigma}$ is the asymptotic
covariance matrix of the multivariate lognormal process. With $\boldsymbol{\delta}$ as defined by Equation (4.6.7)
and $\boldsymbol{\Sigma}$ equal to the identity matrix, $\lambda$ always equals one. In order to induce shifts corresponding
to $\lambda \neq 1$, the shift vector $\boldsymbol{\delta}$ resulting from Equation (4.6.7) must be multiplied by the desired $\lambda$,
thus shortening or lengthening the unit vector to achieve the desired $\lambda$.
For example, suppose it is desired to induce a shift of size $\lambda$ = 3 into a bivariate
lognormal process with identity covariance matrix. Using Equation (4.6.7), a possible shift
vector is $\boldsymbol{\delta} = (-0.7468, 0.6651)$. If this shift vector is applied directly to the process without any
scaling, the magnitude of the resulting shift is $\lambda = \sqrt{\boldsymbol{\delta}\,\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}'} = 1$. However, using
$3\boldsymbol{\delta} = (-2.2404, 1.9953)$ produces the desired result of $\lambda = \sqrt{(3\boldsymbol{\delta})\,\boldsymbol{\Sigma}^{-1}(3\boldsymbol{\delta})'} = 3$. This methodology
will be employed for all simulations involving OC conditions in multivariate lognormally
distributed data. Once all simulations have been completed and results analyzed,
recommendations will be provided on how best to proceed in a Phase I multivariate quality
control scenario when a process distribution is normal, heavy-tailed, or skewed.
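The scaling example above can be checked numerically. This short sketch assumes an identity covariance matrix, in which case the noncentrality parameter of Equation (4.5.1) reduces to the Euclidean length of the shift vector; the function name is hypothetical.

```python
import math


def noncentrality(shift):
    """Noncentrality parameter of Equation (4.5.1) when the covariance matrix is I."""
    return math.sqrt(sum(v * v for v in shift))


delta = [-0.7468, 0.6651]           # unit-length direction from the example
scaled = [3 * v for v in delta]     # stretch the direction to the desired size

print(round(noncentrality(delta), 3))    # 1.0
print([round(v, 4) for v in scaled])     # [-2.2404, 1.9953]
print(round(noncentrality(scaled), 3))   # 3.0
```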
5 MMR Chart Performance Comparisons
5.1 Introduction
MMR chart performance comparisons to Hotelling's T2 (HT2) chart were focused
primarily on m = 20, 50(50)200 subgroups of size n = 5. The number of subgroups was chosen
to be relatively small because a Phase I analysis often occurs early in the life of a process when
very little historical data is available. A subgroup size of five was chosen because Jones-Farmer
et al. (2009) showed that this is the minimum subgroup size necessary for reliable univariate
mean-rank chart performance, and further testing using the MMR chart confirmed this to be true
in the multivariate case as well. Limited experimentation was conducted using subgroup sizes n
= 5(5)20 in order to demonstrate the enhancing effect of larger subgroup sizes on MMR chart
performance. In all simulations, the desired IC FAP was set to 0.10, but the results can be
generalized to other common IC FAPs such as 0.05.
5.2 MMR Chart Performance with Symmetric Distributions
Symmetric distributions tested include the multivariate normal, t(10), and t(3)
distributions. When evaluating IC performance of Hotelling's T2 chart, Alt's (1976) Phase I UCL
was used for all process distributions. For OC assessments, Alt's (1976) Phase I UCL was used
for the multivariate normal case only, and empirically adjusted UCLs were used for the t(10) and
t(3) cases. RMD was the primary depth function used in the MMR chart because it is well-suited
for elliptically symmetric distributions and one of the simplest depth functions to compute, but
MSD was implemented in a few cases for comparison purposes. Simulation results show that
when data are normally or nearly normally distributed, a normal-theory method such as
Hotelling's T2 chart is preferred. However, when data are heavy-tailed as with the t(3)
distribution, the distribution-free MMR chart is usually a superior alternative.
5.2.1 In-Control Performance with Symmetric Distributions
The fundamental advantage of a distribution-free control chart is its ability to maintain a
desired IC FAP for any process distribution. Accordingly, the MMR chart using both RMD and
MSD was first compared to Hotelling's T2 chart using IC bivariate normal, t(10), and t(3)
processes with a desired IC FAP of 0.10. For these comparisons, Hotelling's T2 chart was
constructed using only Alt's (1976) Phase I UCL given by Equation (4.2.4), adjusted for the
number of subgroups using Equation (4.2.5), in order to demonstrate the effects of applying a
normal-theory method to both normally and nonnormally distributed data.
As indicated in Figure 5.2.1, Hotelling's T2 chart maintains the desired IC FAP for the
bivariate normal process, but becomes progressively worse as the distribution deviates from
normality and the number of subgroups is increased. For a bivariate t(3) process, the IC FAP for
Hotelling's T2 chart using Alt's (1976) Phase I UCL ranges from approximately 30% when m =
20 to over 90% when m = 200. This is why, for OC assessments with nonnormally distributed
data, the UCL for Hotelling's T2 chart must be empirically tailored to achieve the desired IC
FAP of 0.10 for each (m, n) combination and process distribution studied. Although this is
impracticable outside of a simulation environment because it requires knowing the exact process
distribution, it is necessary in order to ensure a common basis of comparison for all charts
included in OC performance comparisons. The MMR chart, on the other hand, consistently
maintains the desired IC FAP for all process distributions and any number of subgroups. This
holds true regardless of the data depth measure used, so no adjustments to the MMR chart UCLs
given in Table 3.2.1 are necessary.
Figure 5.2.1 Empirical IC FAPs for Symmetric Bivariate Distributions
[Figure: three panels (Bivariate Normal Process, Bivariate t(10) Process, Bivariate t(3) Process); x-axis: (m,n); y-axis: Empirical FAP, 0.0 to 1.0; series: HT2, RMD, MSD]
Figure 5.2.2 shows the effects of dimensionality on control chart performance using a t(3)
process. Again, the MMR chart consistently maintains the desired IC FAP for any number of
subgroups m and any dimension p. Hotelling's T2 chart becomes distinctly worse in higher
dimensions, reaching empirical IC FAPs near 100% for all but the smallest number of subgroups
considered when p = 10. These results show that the MMR chart is distribution free in any
dimension when applied to elliptically symmetric data using RMD, MSD, or presumably any
other depth function with similar statistical properties. A complete table of IC performance data
for symmetric distributions is provided in Appendix I.
Figure 5.2.2 Empirical IC FAPs for t(3) Processes in Higher Dimensions
[Figure: three panels (Bivariate t(3) Process; t(3) Process, p = 5; t(3) Process, p = 10); x-axis: (m,n); y-axis: Empirical FAP, 0.0 to 1.0; series: HT2, RMD, MSD]
5.2.2 Isolated Shifts of the Mean with Symmetric Distributions
MMR-RMD and Hotelling's T2 chart performance for isolated shifts in two dimensions
for (m, n) combinations (20, 5), (100, 5), and (200, 5) are shown in Figure 5.2.3. Hotelling's T2
chart using Alt's (1976) Phase I limits is superior in the case of bivariate normally distributed
data, as expected. For slightly nonnormal data following a bivariate t(10) distribution,
Hotelling's T2 chart with empirically adjusted UCL maintains a smaller but still notable
advantage over the MMR-RMD chart. For heavy-tailed process data following a bivariate t(3)
distribution depicted in the bottom panel of Figure 5.2.3, however, the MMR-RMD chart is both
significantly better and much more consistent than Hotelling's T2 chart in terms of EAP. The two
control charts are roughly equivalent when m = 20, but the performance of Hotelling's T2 chart
declines dramatically as m is increased to 200, whereas MMR chart performance is far less
affected when the number of subgroups is increased. For example, in the case of bivariate t(3)
data with an isolated shift of magnitude $\lambda$ = 6, EAPs for m = 20, 100, and 200 are approximately
100% using the MMR-RMD chart as compared to 100%, 92%, and 46%, respectively, for
Hotelling's T2 chart with empirical UCL.
Figure 5.2.3 Control Chart Performance on Symmetric Bivariate Data with an IS
[Figure: three panels (Bivariate Normal Process with an IS, Bivariate t(10) Process with an IS, Bivariate t(3) Process with an IS); x-axis: Noncentrality Parameter; y-axis: Empirical Alarm Probability, 0.0 to 1.0; series: RMD (20,5), RMD (100,5), RMD (200,5), HT2 (20,5), HT2 (100,5), HT2 (200,5)]
MMR chart performance for isolated shifts is relatively invariant to the choice of depth
function. Figure 5.2.4 shows the application of an MMR chart using both RMD and MSD to a
bivariate t(3) process with (m, n) combinations (20, 5) and (200, 5). The MMR-RMD chart has a
slight advantage over the MMR-MSD chart for m = 20 subgroups, but the two charts are nearly
identical in terms of EAP when m = 200. Repeating this analysis using other symmetric
distributions yielded similar results in both two and five dimensions.
Figure 5.2.4 MMR-RMD/MSD Chart Performance on t(3) Data with an IS
The MMR chart loses some power to detect isolated shifts of the mean as the dimension
of the data is increased, but it retains clear superiority over Hotelling's T2 chart in most scenarios
considered. When applied to a heavy-tailed process represented by the t(3) distribution, the
MMR-RMD chart is a substantially better alternative for m ≥ 50 in five dimensions and for m ≥
100 in ten dimensions. This is illustrated in Figure 5.2.5.
[Figure 5.2.4 plot: Bivariate t(3) Process with an IS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD (20,5), RMD (200,5), MSD (20,5), MSD (200,5).]
Figure 5.2.5 Control Chart Performance on t(3) Data with an IS in Higher Dimensions
[Figure 5.2.5 plots: two panels (t(3) Process with an IS in p = 5; t(3) Process with an IS in p = 10). Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD and HT2 at (m, n) = (50,5), (100,5), (200,5) for p = 5 and at (100,5), (200,5) for p = 10.]
Complete tables of results for all simulations performed using symmetric distributions
with isolated shifts of the mean are provided in Appendices J - L.
5.2.3 Sustained Shifts of the Mean with Symmetric Distributions
The MMR-RMD chart is generally superior to Hotelling's T2 chart in detecting sustained
shifts of the mean in a bivariate t(3) process, although some loss of power is observed as the
level of contamination in the sample is increased. Figure 5.2.6 depicts control chart performance
for sustained mean shifts composing 5%, 15%, and 30% of the total data sets. For a 5%
contamination level, MMR-RMD chart performance matches Hotelling's T2 chart performance
for m = 20 and surpasses it by an increasing margin as m is increased from 50 to 200. Similar
trends are observed for a 15% level of contamination, but m ≥ 50 subgroups are necessary for
MMR-RMD chart performance to exceed that of Hotelling's T2 chart. When the level of
contamination is raised to 30%, m ≥ 150 subgroups are necessary for the MMR-RMD chart to
consistently outperform Hotelling's T2 chart.
For each sustained shift scenario considered, MMR-RMD chart performance is
remarkably consistent when at least 50 subgroups are present. For example, in the 15%
sustained shift scenario depicted in the middle panel of Figure 5.2.6, the lines representing
MMR-RMD chart performance for m = 50, 100, and 200 subgroups are nearly coincident. On
the other hand, Hotelling's T2 chart performance declines rapidly as the number of subgroups is
increased. However, the fact that the overall detection power of the MMR chart declines as the
level of contamination is raised from 5% to 30% is counterintuitive, as one would expect the
opposite to hold true. This is shown in Section 5.5 to be an unavoidable consequence of a rank-
based control charting method.
Figure 5.2.6 Control Chart Performance on Increasingly Contaminated Bivariate t(3) Data
[Figure 5.2.6 plots: three panels (Bivariate t(3) Process with a 5% SS; with a 15% SS; with a 30% SS). Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD and HT2 at (m, n) = (20,5), (100,5), (200,5) for 5% SS, at (50,5), (100,5), (200,5) for 15% SS, and at (150,5), (200,5) for 30% SS.]
For the MMR chart, RMD is a more effective depth measure than MSD in the presence
of a sustained mean shift in a bivariate t(3) process. MMR-MSD chart detection power lags only
slightly behind MMR-RMD chart performance under a 5% contamination level, but falls farther
behind when the contamination is increased to 15% and becomes unacceptably low at the 30%
contamination level. This effect is illustrated in Figure 5.2.7. Based on these results, RMD is
clearly the preferred depth measure for the MMR chart when data are elliptically symmetric.
Figure 5.2.7 MMR-RMD/MSD Chart Performance on Bivariate t(3) Data with a 30% SS
As with isolated shifts of the mean, the MMR chart's ability to detect sustained shifts of
the mean is somewhat degraded as the dimension of the data is increased. In the ten-dimensional
t(3) process with a 15% sustained mean shift shown in Figure 5.2.8, the MMR-RMD chart
matches or exceeds Hotelling's T2 chart performance for m ≥ 100, in contrast to m ≥ 50 required
for the bivariate t(3) case depicted in the middle panel of Figure 5.2.6. Similar results are seen
with 5% and 30% sustained shifts of the mean imposed upon a ten-dimensional t(3) process.
[Figure 5.2.7 plot: Bivariate t(3) Process with a 30% SS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD (20,5), RMD (200,5), MSD (20,5), MSD (200,5).]
Figure 5.2.8 Control Chart Performance on t(3) Data with a 15% SS in p = 10
Complete tables of results for all simulations performed using symmetric distributions
with sustained shifts of the mean are provided in Appendices M - R. In addition, a matrix of
recommended control chart usage with heavy-tailed multivariate data under both isolated and
sustained shifts of the mean is provided in Table 5.2.1. Although Hotelling's T2 chart
outperforms the MMR-RMD chart for all scenarios in which m ≤ 50, n = 5, and p = 10, even m =
50 subgroups of size n = 5 would be considered an exceptionally small reference sample for a
ten-dimensional process. The MMR-RMD chart is a better alternative than Hotelling's T2 chart
for most of the more realistic scenarios considered in ten dimensions. Furthermore, for any
scenario in which Hotelling's T2 chart outperforms the distribution-free MMR chart, it should be
reiterated that its implementation requires empirically adjusted UCLs based on the exact
distribution of the process under study. Since the process distribution is unlikely to be known in
practice, another control charting technique must be sought for scenarios in Table 5.2.1 labeled
"HT2."
[Figure 5.2.8 plot: t(3) Process with a 15% SS in p = 10. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD and HT2, each at (m, n) = (100,5), (150,5), (200,5).]
Table 5.2.1 Recommended Phase I Control Chart Usage for Heavy-Tailed Data
[Table 5.2.1: rows are contamination levels (IS, 5% SS, 15% SS, 30% SS) for p = 2 and for p = 10; columns are (m, n) = (20,5), (50,5), (100,5), (150,5), (200,5); each cell recommends either MMR-RMD or HT2.]
5.3 MMR Chart Performance with Skewed Data
The multivariate lognormal distribution was the lone skewed distribution tested. As with
the symmetric distributions evaluated, Hotelling's T2 chart was used in conjunction with Alt's
(1976) Phase I UCL for the IC case and with empirically adjusted UCLs for the OC scenarios.
Most MMR charts were created using MSD, since MSD was expected to outperform RMD on
skewed process data. Performance comparisons were focused on m = 20, 100, and 200
subgroups and dimensions p = 2 and 5 because MSD, although quickly computable for a single
data set, is considerably more time consuming than RMD when performing 10,000 replications.
Simulation results show that when data are skewed, the distribution-free MMR chart almost
always represents the best available control charting methodology.
5.3.1 In-Control Performance with Skewed Data
In order to validate the MMR chart's performance as a distribution-free method when process data are
skewed, MMR and Hotelling's T2 charts were first applied to IC lognormal processes in both two
and five dimensions using a desired IC FAP of 0.10. As with the symmetric distributions tested,
Hotelling's T2 chart was constructed using Alt's (1976) Phase I UCL given by Equation (4.2.4),
adjusted for the number of subgroups using Equation (4.2.5), in order to demonstrate the
negative consequences of applying a normal-theory method to skewed data. The results of the
IC performance analysis are illustrated in Figure 5.3.1.
Figure 5.3.1 Empirical IC FAPs for Lognormal Processes in p = 2 and p = 5
[Figure 5.3.1 plots: two panels (Bivariate Lognormal Process, with series HT2, RMD, MSD; Lognormal Process in p = 5, with series HT2, MSD). Vertical axis: Empirical FAP, 0.0 to 1.0; horizontal axis: (m, n).]
Hotelling's T2 chart using Alt's (1976) Phase I UCL categorically fails to maintain the
desired IC FAP for multivariate lognormal processes. The IC FAP for Hotelling's T2 chart
ranges from approximately 43% to 99% in two dimensions and from approximately 48% to
100% in five dimensions. In contrast, the MMR charts using RMD and MSD with the UCLs
from Table 3.2.1 consistently maintain the desired IC FAP of 0.10 for all (m, n) combinations
considered, solidifying the MMR chart's characterization as a distribution-free method. A complete table of IC
performance data for skewed data is provided in Appendix S.
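The inflation of the IC FAP under a normal-theory limit can be reproduced on a small scale. The following Python sketch is illustrative only and is not the simulation code used in this research. It assumes independent standard lognormal components (the lognormal parameters are not restated in this section), replaces Alt's (1976) F-based UCL with a limit calibrated by simulation on bivariate normal data, and uses far fewer replications than the 10,000 reported. The function names max_t2 and simulate_max_t2 are hypothetical.

```python
import numpy as np

def max_t2(data):
    """Largest subgroup T2 statistic for data of shape (m, n, p)."""
    m, n, p = data.shape
    means = data.mean(axis=1)                    # (m, p) subgroup means
    grand = means.mean(axis=0)                   # mean of the subgroup means
    dev = data - means[:, None, :]               # within-subgroup deviations
    # Average of the m subgroup covariance matrices.
    s_bar = np.einsum("ijk,ijl->kl", dev, dev) / (m * (n - 1))
    diff = means - grand
    t2 = n * np.einsum("ik,kl,il->i", diff, np.linalg.inv(s_bar), diff)
    return t2.max()

def simulate_max_t2(sampler, reps, m, n, p, rng):
    """Max T2 over subgroups for each of reps simulated IC data sets."""
    return np.array([max_t2(sampler(rng, (m, n, p))) for _ in range(reps)])

def normal(rng, shape):
    return rng.standard_normal(shape)

def lognormal(rng, shape):
    # Assumption: independent standard lognormal components.
    return np.exp(rng.standard_normal(shape))

rng = np.random.default_rng(2011)
m, n, p, target_fap = 20, 5, 2, 0.10

# Limit calibrated so that an IC normal process signals about 10% of the time.
ucl = np.quantile(simulate_max_t2(normal, 2000, m, n, p, rng), 1 - target_fap)

# A data set raises a false alarm if any subgroup statistic exceeds the limit.
fap_normal = np.mean(simulate_max_t2(normal, 2000, m, n, p, rng) > ucl)
fap_lognormal = np.mean(simulate_max_t2(lognormal, 2000, m, n, p, rng) > ucl)
print(f"normal FAP: {fap_normal:.3f}, lognormal FAP: {fap_lognormal:.3f}")
```

Under these assumptions the normal-calibrated limit should hold its nominal rate on fresh normal data but not on lognormal data, which is the qualitative behavior reported for Hotelling's T2 chart above.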
5.3.2 Isolated Shifts of the Mean with Skewed Data
The performance of MMR-MSD and Hotelling's T2 charts under isolated shifts of the
mean in bivariate lognormally distributed data is displayed in Figure 5.3.2. Even with UCLs
empirically adjusted to achieve an IC FAP of 0.10, Hotelling's T2 chart performance deteriorates
rapidly for m > 20. The MMR chart not only outperforms Hotelling's T2 chart by a wide margin,
but its performance is extremely consistent for all m.
Figure 5.3.2 Control Chart Performance on Bivariate Lognormal Data with an IS
[Figure 5.3.2 plot: Bivariate Lognormal Process with an IS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: MSD and HT2, each at (m, n) = (20,5), (100,5), (200,5).]
In Figure 5.3.3, the previous scenario is repeated for m = 100, n = 5 using the MMR-RMD
chart in order to compare the performance of MSD and RMD as depth functions. As
originally hypothesized, it is seen that the MMR-MSD chart detects smaller isolated shifts with
higher probability than the MMR-RMD chart, and offers equivalent performance in the case of
larger shifts.
Figure 5.3.3 MMR-MSD/RMD Chart Performance on Bivariate LGN Data with an IS
Although the MMR chart's gradual loss in power with symmetric distributions in higher
dimensions is also observed with skewed data, it remains notably better than Hotelling's T2 chart.
In the five-dimensional scenario depicted in Figure 5.3.4, the MMR-MSD chart matches
Hotelling's T2 chart performance for m = 20 and dominates for m > 20, making it clearly the best
alternative for detecting isolated shifts occurring in skewed process data when p ≤ 5.
[Figure 5.3.3 plot: Bivariate Lognormal Process with an IS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: MSD (100,5), RMD (100,5).]
Figure 5.3.4 Control Chart Performance on LGN Data with an IS in p = 5
Complete tables of results for all simulations performed using the multivariate lognormal
distribution with isolated shifts of the mean are provided in Appendices T and U.
5.3.3 Sustained Shifts of the Mean with Skewed Data
MMR-MSD chart performance in detecting sustained shifts of the mean in skewed
bivariate data varies greatly with the percentage of data shifted. As shown in Figure 5.3.5, the
MMR-MSD chart is universally more powerful than Hotelling's T2 chart in detecting 5% and
15% sustained shifts in a bivariate lognormal process and demonstrates very consistent
performance across the range of m considered. However, a different story is seen with a 30%
sustained shift of the mean, as MMR-MSD chart performance falls to unacceptable levels.
[Figure 5.3.4 plot: Lognormal Process with an IS in p = 5. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: MSD and HT2, each at (m, n) = (20,5), (100,5), (200,5).]
Figure 5.3.5 Control Chart Performance on Increasingly Contaminated LGN Data in p = 2
[Figure 5.3.5 plots: three panels (Bivariate Lognormal Process with a 5% SS; with a 15% SS; with a 30% SS). Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: MSD and HT2, each at (m, n) = (20,5), (100,5), (200,5).]
Further testing revealed that the MMR-MSD chart is robust to sustained shifts of the
mean in skewed bivariate data with contamination levels up to approximately 20%. MMR-MSD
chart performance for m = 100, n = 5 and contamination levels 5(5)30% is illustrated in Figure
5.3.6.
Figure 5.3.6 MMR-MSD Chart Performance on Increasingly Contaminated LGN Data
[Figure 5.3.6 plot: Bivariate Lognormal Process with Various SS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: 5% SS, 15% SS, 20% SS, 25% SS, 30% SS.]
Surprisingly, the breakdown at higher contamination levels occurs despite the fact that the multivariate median determined by the
MSD function has RBP equal to 1/2, indicating a high degree of robustness to outliers.
However, as noted by R. Serfling (personal communication, June 6, 2011), the RBPs of other
quantiles determined by MSD decrease from the median outward. This means that when a high
percentage of a data set is shifted, even though the center of the data is well estimated by MSD,
the overall center-outward ordering may be adversely affected by outlying points.
In order to demonstrate this, a simple example using bivariate lognormal data with a 30%
randomly directed, sustained mean shift with λ = 4 is presented in Figure 5.3.7. For illustrative
purposes, only 20 individual observations are simulated, so 14 points are IC and 6 are OC. In
Figure 5.3.7, each bivariate data point is labeled with two numbers representing ranks
determined by the MSD function and the RMD function, respectively. Ideally, the IC points
should be labeled with the highest ranks from 1 (most central) to 14, and the OC points should be
labeled with the lowest ranks from 15 to 20 (most outlying).
Figure 5.3.7 MSD and RMD Rankings for Bivariate LGN Data with a 30% SS
This holds true for the ranks determined by the RMD function, but is clearly not the case
with MSD. Close scrutiny of Figure 5.3.7 reveals that the center determined by the MSD
function is relatively close to the center determined by the RMD function, but the similarities end
there. The MSD function assigns the lowest ranks (indicating outlyingness) to points along the
outer limits of the entire data cloud consisting of both IC and OC points. This disrupts the entire
ranking scheme, resulting in several IC points being assigned ranks which suggest outlyingness,
and several OC points receiving ranks which indicate centrality. For example, consider the point
located at approximate coordinates (-0.75, -0.50). This point is assigned a rank of 19 by the
MSD function, which indicates a high degree of outlyingness, and a distinctly different rank of 11
by the RMD function, which strongly suggests that the point belongs to the IC cluster.
[Figure 5.3.7 scatterplot: horizontal axis X1, -1 to 5; vertical axis X2, -1 to 2; legend: IC Points, OC Points. Point labels (MSD rank, RMD rank): (5,5), (2,1), (3,13), (15,7), (14,8), (8,9), (4,14), (7,10), (11,12), (6,4), (16,6), (1,2), (9,3), (19,11), (18,19), (10,15), (12,16), (13,17), (20,20), (17,18).]
Further analysis was performed by simulating m = 200 subgroups of size n = 5 of
bivariate lognormal data with both 5% and 30% randomly directed, sustained mean shifts with λ
= 4. For each scenario, a scatterplot was constructed of ranks determined by the MSD function
versus ranks determined by the RMD function. A straight line was drawn to represent the path
the plotted ranks would follow if both depth functions generated equivalent rankings for each
observation. Results are provided in Figure 5.3.8.
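The rank comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code used in this research: MSD is taken to be an affine-invariant spatial depth and RMD a Mahalanobis depth of the form 1/(1 + d^2), the coordinate-wise median stands in for the BACON location estimate, the lognormal components are independent and standard, and the shift size delta is an arbitrary distance rather than the noncentrality parameter used in the simulations. All function names are hypothetical, and a smaller m is used to keep the pairwise computation light.

```python
import numpy as np

def mahalanobis_depth(x, center, cov):
    """Depth 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance."""
    diff = x - center
    d2 = np.einsum("ik,kl,il->i", diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

def spatial_depth(x, cov):
    """Spatial depth of each row of x within the sample, after whitening by cov."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, x.T).T             # whitened observations
    diff = z[:, None, :] - z[None, :, :]         # (N, N, p) pairwise differences
    norm = np.linalg.norm(diff, axis=2, keepdims=True)
    # Unit vectors (spatial signs); zero where a point is compared with itself.
    signs = np.divide(diff, norm, out=np.zeros_like(diff), where=norm > 0)
    return 1.0 - np.linalg.norm(signs.mean(axis=1), axis=1)

def depth_ranks(depth):
    """Rank 1 = deepest (most central) point, rank N = most outlying."""
    order = np.argsort(-depth)
    ranks = np.empty(len(depth), dtype=int)
    ranks[order] = np.arange(1, len(depth) + 1)
    return ranks

rng = np.random.default_rng(0)
m, n, p = 40, 5, 2
delta = 4.0                                      # hypothetical shift distance
data = np.exp(rng.standard_normal((m, n, p)))    # assumed lognormal process
n_oc = int(0.30 * m)                             # 30% of subgroups are shifted
angle = rng.uniform(0, 2 * np.pi)                # randomly directed shift
data[-n_oc:] += delta * np.array([np.cos(angle), np.sin(angle)])

# Average of the subgroup covariance matrices, used by both depth functions here.
dev = data - data.mean(axis=1, keepdims=True)
s_bar = np.einsum("ijk,ijl->kl", dev, dev) / (m * (n - 1))
x = data.reshape(-1, p)
center = np.median(x, axis=0)                    # stand-in for the BACON location

msd_rank = depth_ranks(spatial_depth(x, s_bar))
rmd_rank = depth_ranks(mahalanobis_depth(x, center, s_bar))
agreement = np.corrcoef(msd_rank, rmd_rank)[0, 1]
print(f"rank correlation between MSD and RMD orderings: {agreement:.3f}")
```

Plotting msd_rank against rmd_rank, with the last n_oc * n points marked as OC, gives a scatterplot of the same kind as the one described above; points on the diagonal are those ranked identically by both depth functions.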
Figure 5.3.8 Scatterplots of MSD vs. RMD Ranks for Shifted Bivariate LGN Data
[Figure 5.3.8 plots: two panels (5% Sustained Shift; 30% Sustained Shift). Horizontal axis: MSD Rank, 100 to 1000; vertical axis: RMD Rank, 100 to 1000; legend: IC Points, OC Points.]
In the case of a 5% sustained shift, MSD and RMD rankings are in general agreement for
the lowest and highest rankings, and there is a moderate amount of variation in the middle.
However, with a 30% sustained shift, the differences in depth functions become more apparent.
There is much more variability overall, but the rankings assigned to the OC points are especially
notable. The MSD function consistently assigns higher rankings to the OC points than does the
RMD function, as evidenced by the fact that most of the OC points fall well above the diagonal
line in the right panel of Figure 5.3.8. In other words, at the 30% contamination level, the MSD
function usually classifies OC points as more central than they truly are. Because of this, many
of the IC points correspondingly receive lower rankings (incorrectly suggesting outlyingness)
from the MSD function than from the RMD function. The rankings determined by the MSD and
RMD functions are similar only for the most extreme OC points (rankings near 1000). Although
simulation results vary with other randomly generated shift directions, the general conclusion
remains the same -- a more robust depth function is needed for skewed data with contamination
levels exceeding 15%.
Since the MMR chart using RMD did not break down at the 30% contamination level
with symmetric distributions, it was decided to rerun the skewed distribution scenarios depicted
in Figure 5.3.5 using RMD instead of MSD as the depth function. The results are displayed in
Figure 5.3.9.
Figure 5.3.9 MMR-MSD/RMD Chart Performance on Increasingly Shifted LGN Data
[Figure 5.3.9 plots: three panels (Bivariate Lognormal Process with a 5% SS; with a 15% SS; with a 30% SS). Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: MSD and RMD, each at (m, n) = (20,5), (100,5), (200,5).]
For a 5% sustained shift of the mean, the MMR-RMD chart is less effective than the
MMR-MSD chart in detecting shifts of magnitude λ = 0.5 - 1 and marginally better in detecting
shifts of magnitude λ = 1.5 - 2. The same is true for a 15% level of contamination, but the
differences in chart performance are slightly magnified. When 30% of the data are shifted,
however, the MMR-RMD chart is clearly the better alternative because it does not break down in
the presence of severe contamination levels. The MMR-RMD chart's performance as compared
to Hotelling's T2 chart with a 30% sustained shift of the mean is illustrated in Figure 5.3.10. The
MMR-RMD chart clearly outperforms Hotelling's T2 chart for m ≥ 100, but more importantly
offers reasonable distribution-free performance for all m even in the presence of severe
contamination levels.
Figure 5.3.10 MMR-RMD Chart Performance on Bivariate LGN Data with a 30% SS
[Figure 5.3.10 plot: Bivariate Lognormal Process with a 30% SS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD and HT2, each at (m, n) = (20,5), (100,5), (200,5).]
When the dimension is increased to five, the same trends in MMR-MSD chart
performance under sustained shifts of the mean in skewed data are observed, along with the
slight loss in power which accompanies increased dimensionality. Figure 5.3.11 shows the
results of applying the MMR-MSD and Hotelling's T2 charts to a five-dimensional lognormally
distributed process with a 15% sustained shift of the mean. At least 100 subgroups, as opposed
to m ≥ 20 in the bivariate case, are required for MMR-MSD chart performance to surpass
Hotelling's T2 chart performance.
Figure 5.3.11 Control Chart Performance on LGN Data with a 15% SS in p = 5
[Figure 5.3.11 plot: Lognormal Process with a 15% SS in p = 5. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: MSD and HT2, each at (m, n) = (20,5), (100,5), (200,5).]
Based on these results, it is concluded that when dealing with skewed data containing
sustained mean shifts, the MMR-MSD chart is preferred for contamination levels up to 15%, and
the MMR-RMD chart is the best option if the contamination level is suspected to exceed 15%.
Alternatively, both MMR-MSD and MMR-RMD charts could be run on the same data set in
order to provide maximum detection capability for all possible contamination levels.
Complete tables of results for all simulations performed using the multivariate lognormal
distribution with sustained shifts of the mean are provided in Appendices V - Y. In addition, a
matrix of recommended control chart usage with skewed multivariate data under both isolated
and sustained shifts of the mean is provided in Table 5.3.1. The MMR-MSD chart is almost
always preferred for contamination levels of 15% or less, and the MMR-RMD chart can be used
for higher contamination levels when the number of subgroups is sufficiently large. For the few
cases in which Hotelling's T2 chart outperforms the MMR chart, more research is necessary
because implementation of Hotelling's T2 chart with empirical UCL is only possible if the exact
process distribution is known.
Table 5.3.1 Recommended Phase I Control Chart Usage for Skewed Multivariate Data
[Table 5.3.1: rows are contamination levels (IS, 5% SS, 15% SS, 30% SS) for p = 2 and for p = 5; columns are (m, n) = (20,5), (100,5), (200,5); each cell recommends MMR-MSD, MMR-RMD, or HT2.]
5.4 MMR Chart Performance with Larger Subgroup Sizes
In order to assess the effects of larger subgroup sizes on MMR chart performance, a
targeted analysis using m = 100 and n = 5(5)20 was undertaken. The MMR-RMD chart was
evaluated under both isolated and 15% sustained shifts of the mean in five-dimensional t(3) and
lognormally distributed processes. Simulation results reveal that increasing the subgroup size for
a given m enhances the performance of both the MMR and Hotelling's T2 charts, but the MMR
chart outperforms Hotelling's T2 chart in all cases considered.
As exhibited in Figure 5.4.1, the empirical probability of an MMR-RMD or Hotelling's
T2 chart detecting an isolated shift in five dimensions is raised substantially by increasing the
subgroup size from 5 to 20. The difference in performance between the MMR-RMD and
Hotelling's T2 charts is smallest when n = 20, but the MMR-RMD chart remains the superior
alternative throughout the range of subgroup sizes evaluated. The overall trends for detection of
isolated shifts in heavy-tailed and skewed processes are very similar, although shifts of smaller
magnitude are detected more rapidly in a skewed process.
Figure 5.4.1 Effects of Subgroup Size on Control Chart Performance Under an IS in p = 5
[Figure 5.4.1 plots: two panels (t(3) Process with an IS in p = 5; Lognormal Process with an IS in p = 5). Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD (100,5), RMD (100,20), HT2 (100,5), HT2 (100,20).]
A comparable pattern of performance is witnessed in detection of 15% sustained shifts of
the mean by the MMR-RMD and Hotelling's T2 charts. Figure 5.4.2 shows that increasing the
subgroup size raises the EAP for both charts considerably, but the MMR-RMD chart always
performs better than Hotelling's T2 chart with empirically adjusted UCL.
Figure 5.4.2 Effects of Subgroup Size on Chart Performance Under a 15% SS in p = 5
[Figure 5.4.2 plots: two panels (t(3) Process with a 15% SS in p = 5; Lognormal Process with a 15% SS in p = 5). Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD (100,5), RMD (100,20), HT2 (100,5), HT2 (100,20).]
These results are somewhat surprising, as one might think that increasing the subgroup
size to n = 20 would result in approximate normality of subgroup averages. This in turn would
make a normal-theory method such as Hotelling's T2 chart a better option than the MMR chart.
Although normality of subgroup averages will eventually be achieved for sufficiently large n due
to the central limit theorem, it is unlikely that subgroup sizes n > 20 will be observed in reality.
For more practical subgroup sizes such as 5 ≤ n ≤ 20, the distribution-free MMR chart is clearly
the best alternative. Complete tables of results for all subgroup size analyses performed are
provided in Appendices Z - BB.
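A short calculation suggests why n = 20 is not large enough for approximate normality of subgroup averages. The worked example below is a univariate illustration only, and it assumes a standard lognormal marginal (log-scale variance of 1), which may differ from the lognormal parameters used in the simulations. For i.i.d. observations with skewness $\gamma_1$ and excess kurtosis $\gamma_2$, the average of n observations has

```latex
\gamma_1(\bar{X}) = \frac{\gamma_1}{\sqrt{n}}, \qquad
\gamma_2(\bar{X}) = \frac{\gamma_2}{n}.

% Standard lognormal (assumed log-scale variance 1):
\gamma_1 = (e + 2)\sqrt{e - 1} \approx 6.18, \qquad
\gamma_2 = e^{4} + 2e^{3} + 3e^{2} - 6 \approx 110.9.

% Subgroup averages with n = 20:
\gamma_1(\bar{X}) \approx \frac{6.18}{\sqrt{20}} \approx 1.38, \qquad
\gamma_2(\bar{X}) \approx \frac{110.9}{20} \approx 5.5.
```

Both values are zero for a normal distribution, so under this assumption subgroup averages of size 20 remain markedly skewed and heavy-tailed. For the t(3) distribution the third and fourth moments do not exist, so no comparable rate is available from this argument.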
5.5 Robust Estimators of Location and Scatter for the MMR Chart
It was originally decided to use the BACON method of Billor et al. (2000) to robustly
estimate both the mean vector and covariance matrix for use with the MMR chart. However, it
was later determined that using the BACON location estimator with Type I error probability α =
0.10 and Hotelling's T2 scatter estimator $\bar{S}$ would result in significantly enhanced MMR chart
performance. This choice of robust estimators was briefly addressed in Chapter 2, and will be
discussed in detail here.
In early test runs, the MMR-RMD chart using strictly BACON estimators was compared
to Hotelling's T2 chart with empirically adjusted UCL using a bivariate t(3) process with a
sustained shift of the mean. The BACON method of estimation with α = 0.05 performed nearly
perfectly in detecting large process shifts (λ ≥ 8) and subsequently excluding OC points from the
resulting location and scatter estimates. With smaller shifts (λ < 8), however, the BACON
method did not consistently identify outlying points, often resulting in estimated mean vectors
and covariance matrices which were approximately equivalent to the classical nonrobust
estimates. The contamination in the estimated parameters resulted in degraded performance of
the MMR chart and as indicated in Figure 5.5.1, this effect was magnified as the level of
contamination in the data set was raised from 15% to 30%. Limited testing of the MCD method
to determine robust location and scatter estimates yielded similar results at the cost of a
significantly higher computational burden.
Figure 5.5.1 Comparison of MMR-RMD (Using BACON Estimators) and HT2 Charts
Figure 5.5.2 shows why it is so difficult for even robust methods to distinguish IC from
OC data when λ is small. The univariate t(3) plots represent probability density functions for
various unshifted and shifted t(3) distributions. The bivariate graphs were created by randomly
generating 500 observations from a bivariate t(3) distribution and inducing a location shift upon
15% of the data. In the first row of Figure 5.5.2, a one unit shift is barely distinguishable. In the
second row, a four unit shift is more noticeable but still results in significant overlap between
unshifted and shifted data. It takes an eight unit shift, as depicted in the third row of Figure
5.5.2, to clearly separate shifted data from unshifted data.
[Figure 5.5.1 plot: Bivariate t(3) Process with 15% and 30% SS. Vertical axis: Empirical Alarm Probability, 0.0 to 1.0; horizontal axis: Noncentrality Parameter; series: RMD 15% SS, RMD 30% SS, HT2 15% SS, HT2 30% SS.]
Figure 5.5.2 The Effects of Increasing Shift Sizes on Univariate and Bivariate t(3) Data
[Figure 5.5.2 plots: three rows, for shifts of 1, 4, and 8 units. Left column: univariate densities f(x) versus x for t(3) and the shifted t(3); right column: bivariate scatterplots of X1 versus X2 for t(3) vs. the shifted t(3); legend: IC Points, OC Points.]
As also illustrated in Figure 5.5.1, Hotelling's T2 control chart with empirical UCL is
substantially less affected by higher contamination levels than the MMR-RMD chart. This is
because Hotelling's scatter estimator for data consisting of m subgroups of size n,
$$\bar{S} = \frac{1}{m} \sum_{i=1}^{m} S_i,$$
where $S_i$ has (k,l)th entry
$$\frac{1}{n-1} \sum_{j=1}^{n} (X_{ijk} - \bar{X}_{ik})(X_{ijl} - \bar{X}_{il}),$$
is robust to location shifts under the
assumption that shifted subgroups possess the same covariance structure as unshifted subgroups.
$\bar{S}$ represents the average of the m subgroup covariance matrices, each of which is computed
with respect to its subgroup mean $\bar{X}_i$ rather than the mean of the entire data set $\bar{X}$ as with the
classical covariance estimator S, which has (k,l)th entry
$$\frac{1}{N-1} \sum_{j=1}^{N} (X_{jk} - \bar{X}_{k})(X_{jl} - \bar{X}_{l}).$$
Accordingly, $\bar{S}$ is not inflated by OC subgroups as are classical methods which consider the
data set as a whole or robust methods which fail to exclude outliers.
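The contrast between the two scatter estimators can be checked numerically. The following Python sketch is an illustration under simplifying assumptions, not the code used in this research: it uses bivariate standard normal data, shifts whole subgroups by a fixed 4 units in the first variable, and the function names pooled_scatter and classical_scatter are hypothetical.

```python
import numpy as np

def pooled_scatter(data):
    """Average of the m subgroup covariance matrices for data of shape (m, n, p)."""
    m, n, _ = data.shape
    dev = data - data.mean(axis=1, keepdims=True)   # deviations from subgroup means
    return np.einsum("ijk,ijl->kl", dev, dev) / (m * (n - 1))

def classical_scatter(data):
    """Sample covariance of all N = m * n observations about the overall mean."""
    x = data.reshape(-1, data.shape[2])
    return np.cov(x, rowvar=False)                  # divides by N - 1

rng = np.random.default_rng(1)
m, n, p = 100, 5, 2
data = rng.standard_normal((m, n, p))

shifted = data.copy()
shifted[-30:, :, 0] += 4.0      # 30% sustained shift applied to whole subgroups

s_bar_before, s_bar_after = pooled_scatter(data), pooled_scatter(shifted)
s_before, s_after = classical_scatter(data), classical_scatter(shifted)

print("pooled, variance of X1:   ", s_bar_before[0, 0], "->", s_bar_after[0, 0])
print("classical, variance of X1:", s_before[0, 0], "->", s_after[0, 0])
```

Because each subgroup is centered at its own mean, shifting an entire subgroup leaves its covariance matrix, and hence the average, unchanged, whereas the classical estimate absorbs the between-subgroup shift as additional variance.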
This result is true only for subgrouped data. When individual data are encountered in a
control charting application, Hotelling's T2 scatter estimator becomes the nonrobust classical
covariance matrix. Under those circumstances, robust parameter estimation methods such as
BACON may be preferred because they exclude OC points corresponding to shifts with large λ.
Based on these findings, it was decided to substitute Hotelling's T2 scatter estimator $\bar{S}$
for the BACON scatter estimator in the MMR chart. To achieve a more robust location estimate
for the MMR chart, the BACON method was implemented with a higher Type I error
probability. Experimentation with the BACON method using α = 0.05, 0.10, 0.20, and 0.35
showed that α = 0.10 provides the best compromise between Type I and Type II error. As
indicated in Figure 5.5.3, implementation of the MMR-RMD chart using the new estimators
results in significantly enhanced performance over the MMR-RMD chart using strictly BACON
estimators, especially when the contamination level is high.
Figure 5.5.3 Improvement in MMR-RMD Chart Performance with New Estimators
Surprisingly, even with the new estimators, 30% sustained shifts of the mean are detected
by both charts with lower probability than 15% sustained shifts. In the case of Hotelling's T2
chart, this occurs because Hotelling's T2 scatter estimator is naturally robust but Hotelling's T2
location estimator
$$\bar{\bar{X}} = \frac{1}{m} \sum_{i=1}^{m} \bar{X}_i$$
is equivalent to the classical mean vector and is therefore
nonrobust. To verify this, the charts in Figure 5.5.1 were repeated using a known mean vector of
all zeros. As expected, Figure 5.5.4 shows that 30% sustained shifts are detected by Hotelling's
T2 chart with higher probability than 15% sustained shifts when the mean vector is known, yet
the same does not hold true for the MMR-RMD chart. Additional experimentation revealed that
this occurs because of the redistribution of the ranks assigned to depth values during the MMR
control charting process, and is simply an unavoidable consequence of rank-based control
charting.
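A rough expected-value calculation makes the role of the nonrobust grand mean concrete. As an illustrative assumption (not a result from the simulations), let the IC mean be zero and let a fraction c of the m subgroups be shifted by a common vector Δ; the symbols c and Δ are introduced here only for this sketch.

```latex
\[
E\bigl[\bar{\bar{\mathbf{X}}}\bigr] = c\,\boldsymbol{\Delta},
\qquad
E\bigl[\bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}}\bigr] =
\begin{cases}
(1-c)\,\boldsymbol{\Delta}, & \text{subgroup } i \text{ is OC},\\[2pt]
-\,c\,\boldsymbol{\Delta}, & \text{subgroup } i \text{ is IC}.
\end{cases}
\]
```

With c = 0.15 an OC subgroup mean is expected to sit about 0.85Δ from the grand mean, whereas with c = 0.30 it sits only about 0.70Δ away while every IC subgroup is displaced by about 0.30Δ in the opposite direction. Because T2 is quadratic in this difference, the effective shift seen by the chart scales with (1 - c)^2, roughly 0.72 versus 0.49, which is consistent with the lower detection probability at 30% contamination when the mean is estimated.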
[Figure 5.5.3 plot: empirical alarm probability versus noncentrality parameter for a bivariate t(3) process with 15% and 30% sustained shifts (SS). Series: BACON/HT2 estimators, 15% SS; BACON only, 15% SS; BACON/HT2 estimators, 30% SS; BACON only, 30% SS.]
Figure 5.5.4 Change in Chart Performance When the Mean is Known
To illustrate the redistribution of ranks in conjunction with higher contamination levels,
100 observations consisting of m = 20 subgroups of size n = 5 from an in-control bivariate
standard normal process were simulated. Depth values for each point were computed using
RMD with BACON location estimator (α = 0.10) and Hotelling's T2 scatter estimator, and ranks
were assigned to each point from nearest (rank = 1) to farthest (rank = 100) from the center.
Next, 5% of the data were shifted by three units to the right, RMD values were recomputed, and
new ranks were recorded. Finally, this process was repeated using a 30% contamination level.
For both 5% and 30% shifts, Figure 5.5.5 illustrates scatterplots and rank charts of IC and
OC data before and after the shifts. If the rankings of IC points were unaffected by the shifts as
one might expect, they would follow a straight line on the before versus after rank charts. For
the 5% shift in column one this is nearly the case, but the rank chart for the 30% shift in column
two shows that the IC rankings are significantly affected by the induced shift. This is true
because as previously illustrated in Figure 5.5.2, a shift of magnitude three is not large enough to
clearly separate the IC points from the OC points. Rather, many of the IC and OC points are
comingled, thus distorting the rankings and making it more difficult for the MMR chart to
distinguish between them. As the level of contamination in the data is raised, the level of
distortion in the rankings increases and MMR chart performance decreases accordingly.
[Figure 5.5.4 plot: empirical alarm probability versus noncentrality parameter for a bivariate t(3) process with known mean. Series: RMD 15% SS, RMD 30% SS, HT2 15% SS, HT2 30% SS.]
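The rank redistribution experiment described above can be approximated with a short simulation. This is a minimal sketch under stated assumptions: the coordinatewise median stands in for the BACON location estimate (BACON itself is not implemented here), the scatter estimate is the average within-subgroup covariance matrix, and the seed and function names are hypothetical. Ranking by squared Mahalanobis distance from nearest to farthest gives the same order as ranking by Mahalanobis depth from deepest to shallowest.

```python
import random
import statistics

def pooled_cov_2d(subgroups):
    """Average within-subgroup covariance matrix for bivariate subgroups."""
    m, n = len(subgroups), len(subgroups[0])
    s11 = s12 = s22 = 0.0
    for sg in subgroups:
        mx = sum(pt[0] for pt in sg) / n
        my = sum(pt[1] for pt in sg) / n
        s11 += sum((pt[0] - mx) ** 2 for pt in sg) / (n - 1) / m
        s12 += sum((pt[0] - mx) * (pt[1] - my) for pt in sg) / (n - 1) / m
        s22 += sum((pt[1] - my) ** 2 for pt in sg) / (n - 1) / m
    return s11, s12, s22

def distance_ranks(subgroups):
    """Rank all points from nearest (1) to farthest (N) from a robust center."""
    points = [pt for sg in subgroups for pt in sg]
    # Stand-in for the BACON location estimate: the coordinatewise median.
    cx = statistics.median(pt[0] for pt in points)
    cy = statistics.median(pt[1] for pt in points)
    s11, s12, s22 = pooled_cov_2d(subgroups)
    det = s11 * s22 - s12 * s12

    def d2(pt):
        dx, dy = pt[0] - cx, pt[1] - cy
        return (s22 * dx * dx - 2 * s12 * dx * dy + s11 * dy * dy) / det

    order = sorted(range(len(points)), key=lambda j: d2(points[j]))
    ranks = [0] * len(points)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks

def ic_rank_displacement(subgroups, n_shifted, shift=3.0):
    """Mean absolute rank change of the unshifted points after a shift."""
    n = len(subgroups[0])
    before = distance_ranks(subgroups)
    cut = len(subgroups) - n_shifted
    moved = [[(x1 + shift, x2) for x1, x2 in sg] if i >= cut else sg
             for i, sg in enumerate(subgroups)]
    after = distance_ranks(moved)
    ic_count = cut * n
    return sum(abs(a - b) for a, b in zip(before[:ic_count], after[:ic_count])) / ic_count

rng = random.Random(2011)
data = [[(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)] for _ in range(20)]
disp_5 = ic_rank_displacement(data, n_shifted=1)    # 5 of 100 points shifted
disp_30 = ic_rank_displacement(data, n_shifted=6)   # 30 of 100 points shifted
print("mean |rank change| of IC points: 5%% shift = %.2f, 30%% shift = %.2f"
      % (disp_5, disp_30))
```

If the IC rankings were unaffected by the shift, both displacement figures would be zero; the larger figure under 30% contamination corresponds to the scatter away from the straight line in the rank chart.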
Figure 5.5.5 Redistribution of Ranks Under 5% and 30% Sustained Shifts
[Figure 5.5.5 panels: scatterplots of IC and OC points before and after the 5% shift and before and after the 30% shift; rank charts for the 5% shift and the 30% shift, each plotting the rank of shifted data against the rank of unshifted data (both axes 0 to 100) for IC and OC points.]
Despite this effect, the MMR chart using data depth with BACON location estimator (α
= 0.10) and Hotelling's T2 scatter estimator is much more effective than Hotelling's T2 chart in
detecting isolated and sustained shifts of the mean in both symmetric and skewed multivariate
data. This was illustrated throughout Chapter 5 for various combinations of m = 20, 50(50)200,
n = 5(5)20, and p = 2, 5, and 10. The MMR chart has the added advantage of being
distribution-free, unlike Hotelling's T2 chart, which has to be tailored to the specific process distribution under
study. In order to illustrate a complete application of the MMR chart, an example is offered in
the following chapter.
6 An Example Phase I Analysis Using the MMR Chart
6.1 Simulating the Contaminated Reference Sample
In order to demonstrate an application of the MMR-MSD chart from start to finish, a
simulated example involving a five-dimensional, lognormally distributed reference sample with
m = 100, n = 5, and three isolated shifts of the mean is presented. Data and shift directions were
generated in accordance with the procedures outlined in Chapter 4. Isolated shifts of increasing
magnitude were applied to single subgroups as follows: 3 at subgroup 4, 5 at subgroup
41, and 200 at subgroup 91. The shift of magnitude 3 represents the smallest shift for
which the MMR-MSD chart was shown in Chapter 5 to have a nearly perfect detection ability,
and the shift of magnitude 200 is designed to illustrate the sensitivity of robust and nonrobust
estimators to extreme outliers. Using a desired IC FAP of 0.05, the MMR-MSD chart using
UCLs from Table 6.1.1 was compared to Hotelling's T2 chart with Alt's (1976) Phase I UCL.
Table 6.1.1 MMR Chart UCLs for Chapter 6 Example
               Desired FAP = 0.05
  m     n     UCL      Simulated FAP
100     5     2.992    0.0483
 99     5     2.990    0.0484
 98     5     2.987    0.0482
 97     5     2.986    0.0485
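One way such limits could be obtained is sketched below. The sketch assumes that the MMR statistic is the standardized subgroup mean rank used in the univariate mean-rank chart of Jones-Farmer et al. (2009) and that, for IC data from a continuous distribution, the N ranks behave like a random permutation; the replication count, seed, and function names are illustrative and are not the settings used to produce Table 6.1.1.

```python
import math
import random

def mmr_statistics(ranks_by_subgroup):
    """Standardized mean rank for each subgroup (one-sided chart statistic)."""
    m = len(ranks_by_subgroup)
    n = len(ranks_by_subgroup[0])
    big_n = m * n
    center = (big_n + 1) / 2
    scale = math.sqrt((big_n - n) * (big_n + 1) / (12 * n))
    return [(sum(r) / n - center) / scale for r in ranks_by_subgroup]

def simulate_ucl(m, n, fap=0.05, reps=2000, seed=0):
    """Estimate the UCL as the (1 - fap) quantile of max_i Z_i under random ranks."""
    rng = random.Random(seed)
    ranks = list(range(1, m * n + 1))
    maxima = []
    for _ in range(reps):
        rng.shuffle(ranks)
        groups = [ranks[i * n:(i + 1) * n] for i in range(m)]
        maxima.append(max(mmr_statistics(groups)))
    maxima.sort()
    # Order statistic exceeded by roughly a fraction `fap` of the simulated maxima.
    return maxima[math.ceil((1 - fap) * reps) - 1]

ucl_100 = simulate_ucl(m=100, n=5)
print("simulated UCL for m = 100, n = 5: %.3f" % ucl_100)
```

Because the limit depends only on m, n, and the desired FAP, the same routine applies regardless of the process distribution; a production run would use far more replications than this sketch.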
6.2 Removing Outliers from the Sample
MMR-MSD and Hotelling's T2 charts applied to the unedited reference sample are
pictured in Figure 6.2.1. Each chart contains a superimposed table of potential OC subgroups.
Figure 6.2.1 Initial Application of Phase I Control Charts to the Lognormal Sample
[Figure 6.2.1 panels: MMR-MSD chart (MMR statistic and MMR UCL) and Hotelling's T2 chart (HT2 statistic and HT2 UCL), each plotting the control chart statistic against subgroup number (0 to 100). Superimposed tables of potential OC subgroups:

MMR-MSD chart
  i     Z_i      UCL
  4     3.156    2.992
 41     3.598    2.992
 91     3.850    2.992

Hotelling's T2 chart
  i     T_i^2      UCL
  4     151.25     22.59
 41     238.79     22.59
 91     268,471    22.59]
The three shifted subgroups are readily apparent on the MMR-MSD chart, as they all fall above
the initial UCL for m = 100, n = 5. The extreme outlier represented by subgroup 91 does not
look considerably different than the other two outliers for several reasons. First of all, its
extreme outlyingness is mitigated by the rank-based nature of the MMR-MSD control chart
statistic. A rank from 1 to N assigned to a point in a reference sample represents only the
position of that point with respect to the other N = m × n points in the sample as determined by a
depth function -- the degree of outlyingness is not reflected in the ranking. The most outlying
point in a data set will receive a rank of N, regardless of whether the point is only marginally
more outlying than all others or a significant distance away from the rest of the p-dimensional
data cloud. Also, computation of MSD does not involve estimation of a location vector, hence
its robustness to isolated shifts in location no matter how extreme. As was shown in Chapter 5,
very high contamination levels can redistribute the ranks in such a manner that the MMR-MSD
chart becomes ineffective at detecting sustained shifts, but extreme isolated shifts are detected
with ease. Even if the RMD function (which does require an estimated mean vector to compute)
was used in this scenario, the BACON location estimator would exclude the extreme outlier
represented by subgroup 91 and the resulting MMR-RMD control chart would be very similar to
the MMR-MSD chart. As a result of these properties, the MMR chart is well insulated against
the effects of a single extreme outlier in a given reference sample.
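The ceiling that ranking imposes can be made explicit with a short calculation. The sketch below assumes that the MMR statistic is the standardized subgroup mean rank used in the univariate mean-rank chart of Jones-Farmer et al. (2009); that form is not restated in this chapter, so it should be read as an assumption. Here $\bar{R}_i$ denotes the mean rank of subgroup i.

```latex
\[
Z_i = \frac{\bar{R}_i - (N+1)/2}{\sqrt{(N-n)(N+1)/(12n)}},
\qquad
\max_i Z_i
  = \frac{\tfrac{1}{5}(496+497+498+499+500) - 250.5}{\sqrt{(495)(501)/60}}
  = \frac{247.5}{64.29} \approx 3.850
\quad (N = 500,\ n = 5).
\]
```

Under this assumption, 3.850 is the largest value any subgroup can attain when m = 100 and n = 5, and it coincides with the statistic reported for subgroup 91 in Figure 6.2.1. A shift of magnitude 200 therefore cannot plot any higher than a shift just large enough to push all five points of a subgroup to the top ranks.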
With Hotelling's T2 chart, however, the extreme outlier has a dramatic effect on the T2
statistic for each subgroup, as evidenced by the fact that the majority of the control chart
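statistics fall above the initial UCL. This occurs because the grand mean $\bar{\bar{\mathbf{X}}}$ used in computing
$T_i^2 = n\,(\bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}})'\,\mathbf{S}^{-1}(\bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}})$ is not robust to outliers. A more robust estimator for the mean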
vector such as BACON could prevent this from occurring, but is beyond the scope of this
research.
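The mechanism can be reproduced on a small scale. The following sketch is not the five-dimensional lognormal example of this chapter: it uses simulated bivariate standard normal data, takes the scatter estimate to be the average within-subgroup covariance matrix, and plants a single shift of 200 units at subgroup 91. The seed and function name are hypothetical.

```python
import random
import statistics

def t2_statistics(subgroups):
    """Hotelling's T2 per subgroup for bivariate data, using the grand mean
    and the average within-subgroup covariance matrix."""
    m, n = len(subgroups), len(subgroups[0])
    means = [(sum(pt[0] for pt in sg) / n, sum(pt[1] for pt in sg) / n)
             for sg in subgroups]
    gx = sum(mu[0] for mu in means) / m
    gy = sum(mu[1] for mu in means) / m
    s11 = s12 = s22 = 0.0
    for sg, (mx, my) in zip(subgroups, means):
        s11 += sum((pt[0] - mx) ** 2 for pt in sg) / (n - 1) / m
        s12 += sum((pt[0] - mx) * (pt[1] - my) for pt in sg) / (n - 1) / m
        s22 += sum((pt[1] - my) ** 2 for pt in sg) / (n - 1) / m
    det = s11 * s22 - s12 * s12
    stats = []
    for mx, my in means:
        dx, dy = mx - gx, my - gy
        # Quadratic form n * d' S^{-1} d written out for the 2 x 2 case.
        stats.append(n * (s22 * dx * dx - 2 * s12 * dx * dy + s11 * dy * dy) / det)
    return stats

rng = random.Random(6)
clean = [[(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)] for _ in range(100)]
# Plant one extreme isolated shift of 200 units at subgroup 91 (index 90).
planted = [[(x1 + 200.0, x2) for x1, x2 in sg] if i == 90 else sg
           for i, sg in enumerate(clean)]

median_clean = statistics.median(t2_statistics(clean))
median_planted = statistics.median(t2_statistics(planted))
print("median T2: clean = %.2f, one extreme subgroup = %.2f"
      % (median_clean, median_planted))
```

A single subgroup shifted by 200 moves the grand mean of 100 subgroups by about 2 units, so the statistic of every unshifted subgroup is inflated even though the scatter estimate itself is unaffected.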
The next step in a Phase I analysis is to investigate each potential OC subgroup for an
assignable cause. In this example, it is assumed that all potential OC subgroups have assignable
causes and therefore warrant removal from the data set. Some control chart authors advocate
removing all OC subgroups at once, and then recalculating control limits. Others believe that
OC subgroups should be removed one at a time beginning with the most outlying subgroup, with
the control limits being recalculated at each iteration. This example will take the latter approach.
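The one-at-a-time procedure can be sketched as a simple loop. The UCLs are copied from Table 6.1.1; everything else is hypothetical: the outlyingness values are constructed by hand so that three subgroups rank far above the rest, and the MMR statistic is assumed to be the standardized subgroup mean rank. In practice the outlyingness values would come from a depth function and each signal would be investigated for an assignable cause before removal.

```python
import math

# UCLs for n = 5 and a desired FAP of 0.05, copied from Table 6.1.1.
UCL_TABLE = {100: 2.992, 99: 2.990, 98: 2.987, 97: 2.986}

def mmr_statistics(outlyingness, n):
    """Rank all points by outlyingness and standardize each subgroup's mean rank.

    `outlyingness` maps subgroup label -> list of n values (larger = farther out).
    """
    labelled = sorted((value, label) for label, values in outlyingness.items()
                      for value in values)
    rank_sums = {label: 0 for label in outlyingness}
    for rank, (_, label) in enumerate(labelled, start=1):
        rank_sums[label] += rank
    big_n = len(labelled)
    center = (big_n + 1) / 2
    scale = math.sqrt((big_n - n) * (big_n + 1) / (12 * n))
    return {label: (total / n - center) / scale for label, total in rank_sums.items()}

def phase_one(outlyingness, n, ucl_table):
    """Remove the most outlying signalling subgroup, then re-chart, until no signal."""
    current = dict(outlyingness)
    removed = []
    while len(current) in ucl_table:          # stop if no UCL is tabulated for this m
        stats = mmr_statistics(current, n)
        worst = max(stats, key=stats.get)
        if stats[worst] <= ucl_table[len(current)]:
            break
        removed.append(worst)
        del current[worst]
    return removed, current

# Hypothetical outlyingness values: 100 well-mixed subgroups, three planted far out.
n = 5
values = {i: [i + 100.0 * j for j in range(n)] for i in range(1, 101)}
for label, bump in ((4, 1000.0), (41, 2000.0), (91, 3000.0)):
    values[label] = [v + bump for v in values[label]]

removed, reference = phase_one(values, n, UCL_TABLE)
print("removed in order:", removed, "| subgroups kept:", len(reference))
```

Recomputing the ranks after each removal matters because the remaining points are re-ranked from 1 to the new N, and the UCL is looked up for the reduced number of subgroups.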
The most extreme OC subgroup for both the MMR-MSD and Hotelling's T2 control
charts is subgroup 91, so it will be removed first. Once an OC subgroup is removed from the
data set, both control charts are reconstructed using control limits appropriate for the reduced
number of subgroups. The control charts for m = 99, n = 5 after removal of the first OC
subgroup are depicted in Figure 6.2.2.
Figure 6.2.2 Second Iteration of the MMR-MSD Control Chart
[Figure 6.2.2 panels: MMR-MSD chart (MMR statistic and MMR UCL) and Hotelling's T2 chart (HT2 statistic and HT2 UCL), each plotting the control chart statistic against subgroup number (0 to 100). Superimposed tables of potential OC subgroups:

MMR-MSD chart
  i     Z_i      UCL
  4     3.152    2.990
 41     3.661    2.990

Hotelling's T2 chart
  i     T_i^2     UCL
  4     151.25    22.57
 41     238.79    22.57]
After removing the extreme outlier, control chart statistics for the remaining two planted
outliers still exceed the UCLs for both the MMR-MSD and Hotelling's T2 control charts. Next,
the outlier represented by subgroup 41 is removed and both control charts are recalculated using
m = 98, n = 5. Finally, the outlier represented by subgroup 4 is eliminated and the control charts
are recomputed using m = 97, n = 5. The final MMR-MSD and Hotelling's T2 control charts
after sequentially removing all planted outliers are illustrated in Figure 6.2.3.
Figure 6.2.3 Final Control Charts After Four Iterations of Phase I Analysis
[Figure 6.2.3 panels: MMR-MSD chart (MMR statistic and MMR UCL) and Hotelling's T2 chart (HT2 statistic and HT2 UCL), each plotting the control chart statistic against subgroup number (0 to 100). Superimposed table of potential OC subgroups on the Hotelling's T2 chart:

  i     T_i^2    UCL
 58     24.82    22.53
 67     34.65    22.53
 82     32.98    22.53
 90     24.03    22.53]
At this point, all control chart statistics for the MMR-MSD chart fall under the UCL, so
the remaining reference sample consisting of m = 97 subgroups is correctly declared to be IC.
Hotelling's T2 chart, despite following the same outlier removal process as the MMR-MSD chart,
still identifies four potential OC subgroups once the three planted outliers have been removed. Further
iterations could result in the identification of even more potential OC subgroups because the
Phase I UCL for Hotelling's T2 chart is adjusted downward as the number of subgroups is
decreased with each iteration.
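For reference, the normal-theory Phase I limit for subgrouped data is commonly written in the form below, as given in standard texts such as Montgomery (2005). How the per-subgroup α is derived from the desired overall FAP is not restated in this chapter; the second expression is one common choice and is shown here only as an assumption.

```latex
\[
\mathrm{UCL} = \frac{p(m-1)(n-1)}{mn-m-p+1}\;F_{\alpha;\,p,\;mn-m-p+1},
\qquad
\alpha = 1-(1-\mathrm{FAP})^{1/m}.
\]
\[
\text{For } p = n = 5:\quad
\frac{p(m-1)(n-1)}{mn-m-p+1} = \frac{20(m-1)}{4(m-1)} = 5
\quad\text{for every } m .
\]
```

In this example the multiplier is therefore constant, and the small decreases in the limit across iterations (22.59, 22.57, 22.53) come entirely from the F quantile. Under the assumed form of α, removing subgroups raises the per-subgroup α and so lowers the quantile slightly.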
6.3 Analyzing the Results
Hotelling's T2 chart falsely identifies multiple potential OC subgroups because normal-
theory Phase I UCLs were applied to skewed data, illustrating the danger of applying a normal-
theory method without regard to the underlying distribution of a process. Using UCLs
empirically tailored to a five-dimensional multivariate lognormal distribution would solve the
problem of multiple false alarms, but would also result in a loss in detection power as only the
first two OC subgroups would be identified and removed. In addition, the exact process
distribution would not be known in anything but a simulation example such as the one presented
here, so empirical UCLs for Hotelling's T2 chart are not practical for widespread implementation.
The MMR chart is clearly a superior alternative because it offers accurate, distribution-free
performance with a low computational burden.
7 Conclusion
7.1 Synopsis of Findings
The MMR chart for detecting location shifts in subgrouped data represents the first
known distribution-free Phase I multivariate control chart. This work represents the culmination
of extensive research to synthesize appropriate statistical process control techniques, data depth
functions, and robust parameter estimation methods to create a distribution-free, computationally
feasible, and accurate Phase I multivariate control charting methodology. The MMR chart has
been shown to be extremely effective in detecting isolated and sustained shifts of the mean in
both heavy-tailed and skewed multivariate data.
7.2 Summary of Research Conducted
The MMR chart was created as a multivariate extension of Jones-Farmer et al.'s (2009)
univariate distribution-free Phase I mean-rank chart for subgroup location. Given an unedited p-
dimensional reference sample consisting of m subgroups of size n, data depth functions in
conjunction with robust estimators were used to reduce multivariate data to univariate depth
values. The robust Mahalanobis depth function for elliptically symmetric data was implemented
using the BACON location estimator and Hotelling's T2 scatter estimator for subgrouped data.
The Mahalanobis spatial depth function, which is not reliant on distributional assumptions and
does not require a location estimator, was employed using Hotelling's T2 scatter estimator for
subgrouped data. Depth values resulting from these functions were ranked and converted into
MMR control chart statistics for each subgroup, which were then compared to empirical UCLs
determined through simulation of the joint distribution of the MMR control chart statistic.
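To illustrate how a depth function reduces a multivariate point to a single value without a location estimate, the sketch below computes a sample Mahalanobis spatial depth for bivariate data in its usual form: one minus the length of the average unit vector pointing from the sample points to x, after whitening by the scatter matrix. It is not the implementation used in this research; the identity scatter matrix, the seed, and the function names are illustrative, and in the MMR chart the scatter would be Hotelling's T2 scatter estimator.

```python
import math
import random

def whitening_matrix(s11, s12, s22):
    """Return A with A'A = S^{-1} for a 2x2 covariance S, via its Cholesky factor."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    # A is the inverse of the lower-triangular Cholesky factor L (S = L L').
    return ((1.0 / l11, 0.0), (-l21 / (l11 * l22), 1.0 / l22))

def spatial_depth(x, sample, whiten):
    """One minus the length of the average unit vector from sample points to x."""
    sum_u = sum_v = 0.0
    for pt in sample:
        dx, dy = x[0] - pt[0], x[1] - pt[1]
        u = whiten[0][0] * dx + whiten[0][1] * dy
        v = whiten[1][0] * dx + whiten[1][1] * dy
        norm = math.hypot(u, v)
        if norm > 0.0:          # a point coinciding with x contributes a zero vector
            sum_u += u / norm
            sum_v += v / norm
    return 1.0 - math.hypot(sum_u, sum_v) / len(sample)

rng = random.Random(3)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
whiten = whitening_matrix(1.0, 0.0, 1.0)   # identity scatter, for illustration only
depth_center = spatial_depth((0.0, 0.0), sample, whiten)
depth_far = spatial_depth((6.0, 0.0), sample, whiten)
print("depth near center = %.3f, depth far out = %.3f" % (depth_center, depth_far))
```

A point surrounded by data sees unit vectors that largely cancel, giving a depth near one, while a point outside the cloud sees them all pointing the same way, giving a depth near zero. Ranking the N depth values then supplies the input to the chart statistic.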
Hotelling's T2 control chart with Alt's (1976) Phase I UCLs for normally distributed data
and empirically adjusted UCLs for nonnormally distributed data was used to establish a baseline
level of Phase I performance. Performance comparisons of the MMR chart to Hotelling's T2
chart included scenarios involving simulated multivariate normally distributed data, heavy-tailed
data represented by the multivariate t(3) distribution, and skewed data represented by the
multivariate lognormal distribution for m = 20, 50(50)200 subgroups of size n = 5 and
dimensions p = 2, 5, and 10. All data were standardized, without loss of generality, to have zero
mean vector and identity covariance matrix. IC performance was measured by each chart's
ability to maintain the desired FAP using simulated IC data. OC performance was measured by
each chart's EAP under isolated as well as 5%, 15%, and 30% sustained shifts of the mean
assuming constant within-subgroup covariance. Shifts were fixed in a specific direction with
elliptically symmetric distributions without loss of generality, and averaged over a uniform
distribution of shift directions with skewed distributions. Limited analysis was performed on the
effect of increased subgroup sizes on control chart performance in Phase I.
7.3 Recommendations for Phase I Analysis
A comprehensive simulation study shows that when normality of Phase I multivariate
process data can be established, Hotelling's T2 chart with Alt's (1976) Phase I UCL is preferred
for detecting isolated or sustained shifts of the mean. This is not unexpected, as one would
expect a normal-theory method to outperform a distribution-free method when a process is
multivariate normally distributed, and the original intent of the MMR chart was to provide a
distribution-free control charting methodology for processes demonstrating clear departures from
normality.
When Phase I process data are heavy-tailed or skewed, the MMR chart usually
outperforms Hotelling's T2 chart in detecting isolated or sustained shifts of the mean. More
importantly, the MMR chart offers truly distribution-free performance because the UCL for a
given application depends only on the number of subgroups, the size of each subgroup, and the
desired IC FAP without regard to the form of the underlying process distribution. UCLs for
Hotelling's T2 chart, on the contrary, must be empirically tailored to the exact distribution of a
nonnormally distributed process to achieve the desired IC FAP, something which is only possible
in a simulation environment. An added benefit of the MMR chart is that, for a given OC
scenario involving nonnormally distributed data, its performance is far more invariant to the size
of m than Hotelling's T2 chart with empirical UCL, thus making it even more attractive as a
distribution-free alternative.
As indicated in Table 5.2.1, the MMR-RMD chart is recommended for most situations
involving heavy-tailed data as long as the required minimum number of subgroups is present. As
shown in Table 5.3.1, when process data are skewed, the MMR-MSD chart is almost always
recommended if the contamination level is less than 15%, and the MMR-RMD chart is preferred
for contamination levels above 15% if the number of subgroups is sufficiently large. In all cases
tested, as the dimension of the data or the level of contamination is raised, the minimum number
of subgroups required for the MMR chart to achieve superiority over Hotelling's T2 chart with
empirical limits correspondingly increases but remains within reasonable bounds. These general
conclusions are based on a subgroup size of at least n = 5. Larger subgroup sizes reduce the
minimum number of subgroups required for MMR chart performance to surpass that of
Hotelling's T2 chart.
7.4 Recommendations for Phase II Monitoring
Once an IC reference sample has been determined through a successful Phase I analysis
using the MMR chart, it can be used in conjunction with an appropriate Phase II method to
monitor future observations for any departures from the IC state. As noted by C. Champ
(personal communication, May 12, 2011), since more is known about a process at the conclusion
of a Phase I analysis, the form of a Phase II control chart does not necessarily have to match the
form of a Phase I control chart. This research assumes nonnormally distributed data
throughout both the retrospective analysis and monitoring phases, but this observation means that, even though the
Phase I MMR chart is specifically designed for multivariate data collected in subgroups, the
search for the most suitable Phase II complement to the Phase I MMR chart need not be limited
to methods requiring subgrouped multivariate data.
After an extensive literature review, the MEWMA chart proposed by Lowry et al. (1992),
with small smoothing parameter as recommended by Stoumbos and Sullivan (2002), is
recommended for Phase II monitoring because it is easy to understand and implement, well
documented in statistical process control literature, and robust to the underlying process
distribution. The MEWMA control chart statistic represents a weighted average of all Phase II
observations, with the most recent observation assigned a weight equal to the smoothing constant
r and all previous observations assigned weights which geometrically decrease according to their
age. Stoumbos and Sullivan (2002) showed that the MEWMA chart can be successfully applied
to nonnormally distributed individual or subgrouped multivariate data if a sufficiently small
smoothing constant is chosen. Based on the results of a comprehensive simulation exercise, the
authors recommend a smoothing constant of r ∈ [0.02, 0.05] for five or fewer dimensions and r ≤
0.02 for more than five dimensions for reliable detection of sustained location shifts in heavy-tailed
or skewed multivariate data. A subsequent study by Testik et al. (2003) mirrored the
findings of Stoumbos and Sullivan (2002) regarding use of the MEWMA chart as a robust Phase
II method.
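The geometric weighting described above follows directly from the MEWMA recursion of Lowry et al. (1992), Z_t = r X_t + (1 - r) Z_{t-1}. The sketch below assumes a starting vector of zero and uses made-up observations; the function names are hypothetical, and the step that converts Z_t into a charted statistic is omitted.

```python
def mewma_path(observations, r, start=None):
    """Return the MEWMA vectors Z_t = r * X_t + (1 - r) * Z_{t-1}."""
    p = len(observations[0])
    z = list(start) if start is not None else [0.0] * p
    path = []
    for x in observations:
        z = [r * xk + (1.0 - r) * zk for xk, zk in zip(x, z)]
        path.append(z)
    return path

def mewma_weights(t, r):
    """Weights on X_t, X_{t-1}, ..., X_1 implied by the recursion after t steps."""
    return [r * (1.0 - r) ** age for age in range(t)]

r = 0.05
xs = [(0.2, -0.1), (1.0, 0.4), (-0.3, 0.9), (0.5, 0.5)]   # illustrative data only
z_last = mewma_path(xs, r)[-1]
weights = mewma_weights(len(xs), r)
# Rebuild Z_t directly as a weighted sum, newest observation first.
z_check = [sum(w * x[k] for w, x in zip(weights, reversed(xs))) for k in range(2)]
print("weights, newest first:", [round(w, 5) for w in weights])
```

With a small r the weights decay slowly, so each MEWMA vector averages over many observations; this averaging is what lends the chart its robustness to nonnormality, at the cost of slower reaction to a single large observation.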
It should be noted that all three aforementioned MEWMA chart studies are based on the
assumption that the IC mean vector and covariance matrix are known. If the MEWMA chart is
employed following a Phase I analysis using the MMR chart, the mean vector and covariance
matrix are not known but rather estimated from an IC reference sample. If the IC reference
sample is too small, using estimated as opposed to known parameters can lead to more frequent
false alarms and a lower probability of detecting OC conditions, especially when the
smoothing constant is small. For the univariate EWMA chart, this effect was detailed by Jones,
Champ, and Rigdon (2001), and design strategies to alleviate this problem were offered by Jones
(2002). For the MEWMA chart, Champ and Jones-Farmer (2007) showed that widening the
control limits through simulation to account for the additional variability introduced by the use of
estimated parameters results in nearly the same performance as the known parameter case. An
analytical method of determining control limits for the MEWMA chart with estimated
parameters as well as the minimum sample size required for estimated parameter performance to
equal known parameter performance are topics for future research. Despite these open issues,
the MEWMA chart represents the most broadly applicable control charting methodology for
Phase II monitoring of nonnormally distributed multivariate data.
A potential criticism of the MEWMA chart is that using a small smoothing parameter to
improve robustness to nonnormality decreases control chart sensitivity to large sustained mean
shifts and isolated outlying observations, but it can be argued that this is not a significant
disadvantage in a Phase II control charting scenario. As previously noted in Chapter 3 of this
document, Montgomery (2005, p. 386) characterizes control charts which accumulate
information from sequences of points (e.g. CUSUM, EWMA, and their multivariate
counterparts) as being ideally suited for Phase II monitoring because they are more sensitive to
small process shifts than Shewhart type charts which use information only from the most recent
observation. According to Montgomery (2005, p. 386), sensitivity to small shifts is desirable for
a Phase II control chart because in contrast to Phase I, "assignable causes do not typically result
in large process upsets or disturbances" in Phase II. If greater control chart sensitivity to large
sustained mean shifts or individual outliers is desired, the reader is directed to the Chapter 1
discussion of Phase II nonparametric, distribution-free, and robust control charts. Although a
few such methods could potentially supplement the MEWMA control chart in certain scenarios,
none have proven as effective as the MEWMA chart with small smoothing constant on a wide
range of nonnormally distributed data in higher dimensions.
7.5 Future Research Directions
The MMR chart fills a notable gap in current multivariate quality control literature, yet
much work remains to be done in the field of distribution-free Phase I multivariate quality
control. Although it is believed that the fundamental structure of the MMR chart is sound,
potential refinements include further exploration of the BACON method to determine optimal
input parameters (e.g. Type I error probability) for maximum robustness to shifts of all
magnitudes, implementation of other location and scatter estimators to improve robustness to
higher contamination levels, and experimentation with alternative data depth functions which
may enhance MMR chart performance. Additionally, since the MMR chart is designed to detect
location changes in subgrouped multivariate data during Phase I, an equivalent distribution-free
chart for detecting scale changes is needed for Phase I scenarios in which the assumption of
constant within-subgroup covariance is not appropriate. Finally, Phase I distribution-free charts
for detecting both location and scale changes in small subgroups (n < 5) and individual
multivariate observations (n = 1) should be sought as well. It is the hope of this author that the
success of the MMR chart as the first proposed distribution-free Phase I multivariate method will
serve as the catalyst for some or all of this additional research.
References
Alfaro, J.L., & Ortega, J.F. (2008). A Robust Alternative to Hotelling's T2 Control Chart Using
Trimmed Estimators. Quality and Reliability Engineering International, 24, 601-611.
Aloupis, G. (2005, August). Geometric and Combinatorial Issues in Data Depth. Presented at
the Franco-Canadian Workshop on Combinatorial Algorithms, Hamilton, Ontario.
Aloupis, G. (2006). Geometric Measures of Data Depth. In DIMACS Series in Discrete
Mathematics and Theoretical Computer Science (Vol. 72, pp. 147-158). Providence, RI:
American Mathematical Society.
Alt, F.B. (1976). Small Sample Probability Limits for the Mean of a Multivariate Normal
Process. ASQC Technical Conference Transactions, pp. 170-176.
Bakir, S.T. (1989). Analysis of Means Using Ranks. Communications in Statistics - Simulation
and Computation, 18(2), 757-776.
Beltran, L.A. (2006). Nonparametric Multivariate Statistical Process Control Using Principal
Component Analysis and Simplicial Depth. Dissertation, University of Central Florida.
Bersimis, S., Panaretos, J., & Psarakis, S. (2005). Multivariate Statistical Process Control Charts
and the Problem of Interpretation: A Short Overview and Some Applications in Industry.
Proceedings of the 7th Hellenic European Conference on Computer Mathematics and Its
Applications.
Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate Statistical Process Control Charts:
An Overview. Quality and Reliability Engineering International, 23, 517-543.
Billor, N., Hadi, A.S., & Velleman, P.F. (2000). BACON: Blocked Adaptive Computationally
Efficient Outlier Nominators. Computational Statistics & Data Analysis, 34, 279-298.
Chakraborty, B., & Chaudhuri, P. (1999). A Note on the Robustness of Multivariate Medians.
Statistics & Probability Letters, 45, 269-276.
Champ, C.W., & Jones, L.A. (2004). Designing Phase I X Charts with Small Sample Sizes.
Quality and Reliability Engineering International, 20, 497-510.
Champ, C.W., & Jones-Farmer, L.A. (2007). Properties of Multivariate Control Charts with
Estimated Parameters. Sequential Analysis, 26, 153-169.
Chatterjee, S., & Qiu, P. (2009). Distribution-Free Cumulative Sum Control Charts Using
Bootstrap-Based Control Limits. The Annals of Applied Statistics, 3(1), 349-369.
Chenouri, S., & Steiner, S.H. (2009). A Multivariate Robust Control Chart for Individual
Observations. Journal of Quality Technology, 41(3), 259-271.
Chenouri, S., & Variyath, A.M. (2011). A Comparative Study of Phase II Robust Multivariate
Control Charts for Individual Observations. Quality and Reliability Engineering
International, 27(3) [Electronic version].
Chou, Y.M., Mason, R.L., & Young, J.C. (2001). The Control Chart for Individual Observations
from a Multivariate Non-Normal Distribution. Communications in Statistics - Theory
and Methods, 30(8), 1937-1949.
Crosier, R.B. (1988). Multivariate Generalizations of Cumulative Sum Quality Control Schemes.
Technometrics, 30(3), 291-303.
Dai, Y., Zhou, C., & Wang, Z. (2006a). Multivariate CUSUM Control Charts Based on Data
Depth for Preliminary Analysis (Working paper).
Dang, X., & Serfling, R. (2010). Nonparametric Depth-Based Multivariate Outlier Identifiers,
and Masking Robustness Properties. Journal of Statistical Planning and Inference, 140,
198-213.
Donoho, D.L., & Huber, P.J. (1983). The Notion of a Breakdown Point. In P.J. Bickel, K.A.
Doksum and J.L. Hodges, Jr. (Eds.), A Festschrift for Eric L. Lehmann (pp. 157-184).
Belmont, CA: Wadsworth.
Fricker, R.D., & Chang, J.T. (2009a). The Repeated Two-Sample Rank (RTR) Procedure: A
Nonparametric Multivariate Individuals Control Chart (Working paper).
Gao, Y. (2003). Data Depth Based on Spatial Rank. Statistics & Probability Letters,
65(3), 217-225.
Genz, A. (2011). QSIMVNV. Retrieved April 22, 2011, from
http://www.math.wsu.edu/faculty/genz/software/software.html.
Gibbons, J.D., & Chakraborti, S. (2003). Nonparametric Statistical Inference (4th ed.). New
York: Marcel Dekker.
Hamurkaroglu, C., Mert, M., & Saykan, Y. (2004). Nonparametric Control Charts Based on
Mahalanobis Depth. Hacettepe Journal of Mathematics and Statistics, 33, 57-67.
Hawkins, D.M., & Maboudou-Tchao, E.M. (2007). Self-Starting Multivariate Exponentially
Weighted Moving Average Control Charting. Technometrics, 49(2), 199-209.
Hayter, A.J., & Tsui, K. (1994). Identification and Quantification in Multivariate Quality Control
Problems. Journal of Quality Technology, 26, 197-208.
Hotelling, H. (1947). Multivariate Quality Control - Illustrated by the Air Testing of Sample
Bombsights. In C. Eisenhart, M.W. Hastay, & W.A. Wallis (Eds.), Techniques of
Statistical Analysis (pp. 111-184). New York: McGraw-Hill.
Hugg, J., Rafalin, E., Seyboth, K., & Souvaine, D. (2006, January). An Experimental Study of
Old and New Depth Measures. Paper presented at the Workshop on Algorithm
Engineering and Experiments, Miami, FL.
Hugg, J., Rafalin, E., & Souvaine, D. (2006, July). Depth Explorer - A Software Tool for the
Analysis of Depth Measures. Presented at the International Conference on Robust
Statistics, Lisbon, Portugal.
Jackson, J.E. (1991). A User's Guide to Principal Components. New York: Wiley.
Jensen, W.A., Jones-Farmer, L.A., Champ, C.W., & Woodall, W.H. (2006). Effects of Parameter
Estimation on Control Chart Properties: A Literature Review. Journal of Quality
Technology, 38(4), 349-364.
Jensen, W.A., Birch, J.B., & Woodall, W.H. (2007). High Breakdown Estimation Methods for
Phase I Multivariate Control Charts. Quality and Reliability Engineering International,
23(5), 615-629.
Jobe, J.M., & Pokojovy, M. (2009). A Multistep, Cluster-Based Multivariate Chart for
Retrospective Monitoring of Individuals. Journal of Quality Technology, 41(4), 323-339.
Johnson, M.E. (1987). Multivariate Statistical Simulation. New York: Wiley.
Jones, L.A. (2002). The Statistical Design of EWMA Control Charts with Estimated Parameters.
Journal of Quality Technology, 34(3), 277-288.
Jones, L.A., & Woodall, W.H. (1998). The Performance of Bootstrap Control Charts. Journal of
Quality Technology, 30(4), 362-375.
Jones, L.A., Champ, C.W., & Rigdon, S.E. (2001). The Performance of Exponentially Weighted
Moving Average Charts with Estimated Parameters. Technometrics, 43(2), 156-167.
Jones-Farmer, L.A., Jordan, V., & Champ, C.W. (2009). Distribution-Free Phase I Control
Charts for Subgroup Location. Journal of Quality Technology, 41(3), 304-316.
Kruskal, W.H., & Wallis, W.A. (1952). Use of Ranks in One-Criterion Variance Analysis.
Journal of the American Statistical Association, 47, 583-621.
Law, A.M., & Kelton, W.D. (2000). Simulation Modeling and Analysis (3rd ed.). Boston:
McGraw-Hill.
Lehmann, E.L. (2006). Nonparametrics: Statistical Methods Based on Ranks (revised 1st ed.).
New York: Springer Science+Business Media, LLC.
Li, J., & Liu, R. (2004). New Nonparametric Tests of Multivariate Locations and Scales Using
Data Depth. Statistical Science, 19(4), 686-696.
Liu, R.Y. (1990). On a Notion of Data Depth Based on Random Simplices. The Annals of
Statistics, 18, 405-414.
Liu, R.Y. (1995). Control Charts for Multivariate Processes. Journal of the American Statistical
Association, 90, 1380-1388.
Liu, R.Y., & Singh, K. (1993). A Quality Index Based on Data Depth and Multivariate Rank
Tests. Journal of the American Statistical Association, 88, 252-260.
Liu, R.Y., Singh, K., & Teng, J.H. (2004). DDMA-Charts: Nonparametric Multivariate Moving
Average Control Charts Based on Data Depth. Allgemeines Statistisches Archiv, 88, 235-258.
Lowry, C.A., Woodall, W.H., Champ, C.W., & Rigdon, S.E. (1992). A Multivariate
Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46-53.
Lowry, C.A., & Montgomery, D.C. (1995). A Review of Multivariate Control Charts. IIE
Transactions, 27, 800-810.
Mahalanobis, P. C. (1936). On the Generalized Distance in Statistics. Proceedings of the
National Institute of Science of India, 12, 49-55.
Mason, R.L., Champ, C.W., Tracy, N.D., Wierda, S.J., & Young, J.C. (1997). Assessment of
Multivariate Process Control Techniques. Journal of Quality Technology, 29(2), 140-143.
Mason, R.L., Chou, Y.M., & Young, J.C. (2001). Applying Hotelling's T2 Statistic to Batch
Processes. Journal of Quality Technology, 33(4), 466-479.
Mason, R.L., & Young, J.C. (2002). Multivariate Statistical Process Control with Industrial
Applications. Alexandria, VA: American Statistical Association; Philadelphia, PA:
Society for Industrial and Applied Mathematics.
Messaoud, A., Weihs, C., & Hering, F. (2008). Detection of Chatter Vibration in a Drilling
Process Using Multivariate Control Charts. Computational Statistics & Data Analysis,
52(6), 3208-3219.
Mohammadi, M., Midi, H., Arasan, J., & Al-Talib, B. (2011). High Breakdown Estimators to
Robustify Phase II Multivariate Control Charts. Journal of Applied Sciences, 11(3), 503-
511.
Montgomery, D.C. (2005). Introduction to Statistical Quality Control (5th ed.). Hoboken, NJ:
Wiley.
Nedumaran, G., & Pignatiello, J.J. (2000). On Constructing T2 Control Charts for Retrospective
Examination. Communications in Statistics - Simulation and Computation, 29(2), 621-632.
Nedumaran, G., & Pignatiello, J.J. (2005). On Constructing Retrospective X̄ Control Chart
Limits. Quality and Reliability Engineering International, 21, 81-89.
Oyeyemi, G.M., & Ipinyomi, R.A. (2010). A Robust Method of Estimating Covariance Matrix in
Multivariate Data Analysis. African Journal of Mathematics and Computer Science
Research, 3(1), 1-18.
Pignatiello, J.J., & Runger, G.C. (1990). Comparison of Multivariate CUSUM Charts. Journal of
Quality Technology, 22(3), 173-186.
Polansky, A.M. (2005). A General Framework for Constructing Control Charts. Quality and
Reliability Engineering International, 21, 633-653.
Qiu, P. (2008). Distribution-Free Multivariate Process Control Based on Log-Linear Modeling.
IIE Transactions, 40(7), 664-691.
Qiu, P., & Hawkins, D. (2001). A Rank-Based Multivariate CUSUM Procedure. Technometrics,
43(2), 120-132.
Qiu, P., & Hawkins, D. (2003). A Nonparametric Multivariate Cumulative Sum Procedure for
Detecting Shifts In All Directions. The Statistician, 52(2), 151-164.
Quesenberry, C.P. (1997). SPC Methods for Quality Improvement. New York: Wiley.
Rafalin, E.K. (2005). Algorithms and Analysis of Depth Functions Using Computational
Geometry. Dissertation, Tufts University.
Rousseeuw, P.J. (1984). Least Median of Squares Regression. Journal of the American
Statistical Association, 79, 871-880.
Rousseeuw, P.J., & Ruts, I. (1996). Algorithm AS 307: Bivariate Location Depth. Applied
Statistics, 45, 516-526.
Rousseeuw, P.J., & Van Driessen, K. (1999). A Fast Algorithm for the Minimum Covariance
Determinant Estimator. Technometrics, 41, 212-223.
Rousseeuw, P.J., & Van Zomeren, B.C. (1990). Unmasking Multivariate Outliers and Leverage
Points. Journal of the American Statistical Association, 85, 633-651.
Schaffer, J.R. (1998, August). A Multivariate Application of the Q Chart. Paper presented at the
1998 Joint Statistical Meetings, Dallas, TX.
Serfling, R. (2002). A Depth Function and a Scale Curve Based on Spatial Quantiles. In Y.
Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (pp.
25-28). Berlin, Germany: Birkhäuser.
Serfling, R. (2006). Depth Functions in Nonparametric Multivariate Inference. In DIMACS
Series in Discrete Mathematics and Theoretical Computer Science (Vol. 72, pp. 1-16).
Providence, RI: American Mathematical Society.
Serfling, R. (2010). Equivariance and Invariance Properties of Multivariate Quantile and Related
Functions, and the Role of Standardization. Journal of Nonparametric Statistics, 22, 915-
936.
Serfling, R., & Zuo, Y. (2010). Discussion. The Annals of Statistics, 38(2), 676-684.
Shewhart, W.A. (1939). Statistical Method from the Viewpoint of Quality Control. New York:
Dover Publications.
Stoumbos, Z. G., & Jones, L. A. (2000). On the Properties and Design of Individuals Control
Charts Based on Simplicial Depth. Nonlinear Studies, 7(2), 147-178.
Stoumbos, Z. G., & Sullivan, J.H. (2002). Robustness to Non-Normality of the Multivariate
EWMA Control Chart. Journal of Quality Technology, 34(3), 260-276.
Sullivan, J.H., & Woodall, W.H. (1996). A Comparison of Multivariate Control Charts for
Individual Observations. Journal of Quality Technology, 28(4), 398-408.
Sullivan, J.H., & Woodall, W.H. (1998). Adapting Control Charts for the Preliminary Analysis
of Multivariate Observations. Communications in Statistics - Simulation and
Computation, 27(4), 953-979.
Sullivan, J.H., & Jones, L.A. (2002). A Self-Starting Control Chart for Multivariate Individual
Observations. Technometrics, 44(1), 24-33.
Sun, R., & Tsung, F. (2003). A Kernel-Distance-Based Multivariate Control Chart Using
Support Vector Methods. International Journal of Production Research, 41(13), 2975-
2989.
Teng, H.C. (2000). New Methodology in Regression and Multivariate Quality Control Via Data
Depth. Dissertation, Rutgers University.
Testik, M.C., Runger, G.C., & Borror, C.M. (2003). Robustness Properties of Multivariate
EWMA Control Charts. Quality and Reliability Engineering International, 19, 31-38.
Testik, M.C., & Borror, C.M. (2004). Design Strategies for the Multivariate Exponentially
Weighted Moving Average Control Chart. Quality and Reliability Engineering
International, 20, 571-577.
Thissen, U., Swierenga, H., de Weijer, A.P., Wehrens, R., Melssen, W.J., & Buydens, L.M.C.
(2005). Multivariate Statistical Process Control Using Mixture Modelling [sic]. Journal
of Chemometrics, 19, 23-31.
Tracy, N.D., Young, J.C., & Mason, R.L. (1992). Multivariate Control Charts for Individual
Observations. Journal of Quality Technology, 24, 88-95.
Tukey, J. W. (1975). Mathematics and Picturing Data. In R. James (Ed.), Proceedings of the
1974 International Congress of Mathematicians (Vol. 2, pp. 523-531). Vancouver, BC.
Vardi, Y., & Zhang, C. (2000). The Multivariate L1-Median and Associated Data Depth.
Proceedings of the National Academy of Sciences of the USA, 97(4), 1423-1426.
Vargas, J.A. (2003). Robust Estimation in Multivariate Control Charts for Individual
Observations. Journal of Quality Technology, 35(4), 367-376.
Wierda, S.J. (1994). Multivariate Statistical Process Control - Recent Results and Directions for
Future Research. Statistica Neerlandica, 48, 147-168.
Willems, G., Pison, G., Rousseeuw, P.J., & Van Aelst, S. (2002). A Robust Hotelling Test.
Metrika, 55, 125-138.
Wood, M., Kaye, M., & Capon, N. (1999). The Use of Resampling for Estimating Control Chart
Limits. Journal of the Operational Research Society, 50, 651-659.
Woodall, W.H., & Montgomery, D.C. (1999). Research Issues and Ideas in Statistical Process
Control. Journal of Quality Technology, 31(4), 376-386.
Yanez, S., Gonzalez, N., & Vargas, J.A. (2010). Hotelling's T2 Control Charts Based on Robust
Estimators. Dyna, 163, 239-247.
Zamba, K.D., & Hawkins, D.M. (2006). A Multivariate Change-Point Model for Statistical
Process Control. Technometrics, 48(4), 539-549.
Zarate, P.B. (2004). Design of Nonparametric Control Chart for Monitoring Multivariate
Processes Using Principal Components Analysis and Data Depth. Dissertation,
University of South Florida.
Zuo, Y. (2003). Projection-Based Depth Functions and Associated Medians. The Annals of
Statistics, 31(5), 1460-1490.
Zuo, Y., & He, X. (2006). On the Limiting Distributions of Multivariate Depth-Based Rank Sum
Statistics and Related Tests. The Annals of Statistics, 34(6), 2879-2896.
Zuo, Y., & Serfling, R. (2000). General Notions of Statistical Depth Functions. The Annals of
Statistics, 28(2), 461-482.
Appendices
Appendix A: MATLAB Code for Computing Robust Mahalanobis Depth
Appendix B: MATLAB Code for Computing Mahalanobis Spatial Depth
Appendix C: Expanded Table of Empirical UCLs for the MMR Chart
Appendix D: MATLAB Code for Finding Empirical UCLs for the MMR Chart
Appendix E: Empirical UCLs for Hotelling's T2 Chart
Appendix F: MATLAB Code for Finding Empirical UCLs for Hotelling's T2 Chart
Appendix G: MATLAB Code for Assessing MMR Chart Performance
Appendix H: MATLAB Code for Assessing Hotelling's T2 Chart Performance
Appendix I: Simulation Results Using In-Control Symmetric Data
Appendix J: Simulation Results Using Symmetric Data with an IS in p = 2
Appendix K: Simulation Results Using Symmetric Data with an IS in p = 5
Appendix L: Simulation Results Using Symmetric Data with an IS in p = 10
Appendix M: Simulation Results Using Symmetric Data with a 5% SS in p = 2
Appendix N: Simulation Results Using Symmetric Data with a 15% SS in p = 2
Appendix O: Simulation Results Using Symmetric Data with a 30% SS in p = 2
Appendix P: Simulation Results Using Symmetric Data with a 5% SS in p = 10
Appendix Q: Simulation Results Using Symmetric Data with a 15% SS in p = 10
Appendix R: Simulation Results Using Symmetric Data with a 30% SS in p = 10
Appendix S: Simulation Results Using In-Control Skewed Data
Appendix T: Simulation Results Using Skewed Data with an IS in p = 2
Appendix U: Simulation Results Using Skewed Data with an IS in p = 5
Appendix V: Simulation Results Using Skewed Data with a 5% SS in p = 2
Appendix W: Simulation Results Using Skewed Data with a 15% SS in p = 2
Appendix X: Simulation Results Using Skewed Data with a 30% SS in p = 2
Appendix Y: Simulation Results Using Skewed Data with a SS in p = 5
Appendix Z: Subgroup Size Analysis Using In-Control Data
Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5
Appendix BB: Subgroup Size Analysis Using Data with a 15% SS in p = 5
Appendix A: MATLAB Code for Computing Robust Mahalanobis Depth
function depth=computeRMDv1(X,Xbar_robust,S_robust)
% Computes the Robust Mahalanobis Depth (RMD) of each point in a multivariate
% data set.
% Adapted by Richard Bell on 20100928 from code provided by Satyaki Mazumder
% on 20100707.
% X is the multivariate reference data set.
% Xbar_robust is the robust location estimate.
% S_robust is the robust scatter estimate.
% Version 2 uses the square root in the Mahalanobis distance computation,
% whereas Version 1 does not.
rows=size(X,1); % identify the number of rows in the sample data set
depth=zeros(rows,1); % initialize the (rows x 1) vector of depth values for speed
for i=1:rows
    % compute the RMD for each observation in the sample; don't use the "mahal"
    % function in MATLAB because it uses the (nonrobust) sample mean vector and
    % covariance matrix
    depth(i)=1/(1+((X(i,:)-Xbar_robust)/S_robust*(X(i,:)-Xbar_robust)'));
end
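
As a quick illustration of the depth formula in Appendix A, the sketch below computes the same quantity, 1/(1 + squared Mahalanobis distance), in vectorized form on a toy data set. The five observations, the coordinate-wise median used as the location estimate, and the identity scatter matrix are assumptions made only for this example; they are not the estimates used in the dissertation.

```matlab
% Toy data: five bivariate observations (values chosen only for illustration)
X = [0 0; 1 0; 0 1; -1 0; 4 4];

% ASSUMPTION: coordinate-wise median and identity scatter stand in for the
% robust location and scatter estimates
center = median(X, 1);
S = eye(2);

% Deviations of every observation from the location estimate
D = X - repmat(center, size(X, 1), 1);

% Squared Mahalanobis distance of each row: (x - center) * inv(S) * (x - center)'
d2 = sum((D / S) .* D, 2);

% Depth is largest (1) at the center and decays toward 0 for outlying points
depth = 1 ./ (1 + d2);
disp(depth');
```

The outlying point (4, 4) receives a depth close to zero, while the point at the center receives the maximum depth of one.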
Appendix B: MATLAB Code for Computing Mahalanobis Spatial Depth
function depth=computeMSDfast(X,S_robust)
% Computes the Mahalanobis Spatial Depth of each point in a multivariate data
% set.
% Adapted by Richard Bell on 20100928 from code provided by Satyaki Mazumder
% on 20100707.
% X is the (N x p) multivariate reference data set.
% S_robust is the (p x p) robust scatter matrix, raised to the -1/2 power and
% used as the transformation-retransformation functional.
Xtr=X/(sqrtm(S_robust)); % transform the data using the TR functional
[rows,cols]=size(Xtr); % store the dimensions of the transformed data set
depth=zeros(rows,1); % initialize the vector of depth values for speed
% implementation of the Mahalanobis Spatial Depth function
for i=1:rows % perform the outer loop for each x
    % initialize the matrix of unit vectors from x to all Xi's in the sample
    e=zeros(rows,cols);
    % perform the inner loop to compare each x to all Xi's in the sample
    % (including itself)
    for j=1:rows
        % compute the Euclidean distance between the current x and the
        % current Xi
        Euclid=norm(Xtr(i,:)-Xtr(j,:));
        if (Euclid~=0)
            % if the Euclidean distance is nonzero, use it to normalize the
            % difference between the current x and the current Xi
            e(j,:)=(Xtr(i,:)-Xtr(j,:))/Euclid;
        else
            % if the Euclidean distance is zero, x is being compared to itself
            % so the normalized difference is zero
            e(j,:)=0;
        end
    end % end of inner loop
    % compute Mahalanobis Spatial Depth of the point x as one minus the norm
    % of the average of the unit vectors from x to all Xi's in the sample
    depth(i)=1-norm(mean(e,1));
end % end of outer loop
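
The behavior of the spatial depth in Appendix B can be checked on a small symmetric configuration. The sketch below repeats the same computation (one minus the norm of the average unit vector) with the inner loop vectorized; the five toy points and the identity scatter matrix are assumptions chosen so that the result can be verified by hand.

```matlab
% Toy data: four points on the axes plus their center (illustration only)
X = [1 0; -1 0; 0 1; 0 -1; 0 0];
S = eye(2);                 % ASSUMPTION: identity scatter, so the transform is a no-op

Xtr = X / sqrtm(S);         % transformation step, as in Appendix B
N = size(Xtr, 1);
depth = zeros(N, 1);
for i = 1:N
    D = repmat(Xtr(i,:), N, 1) - Xtr;   % differences between point i and every point
    len = sqrt(sum(D.^2, 2));           % Euclidean lengths of those differences
    keep = len > 0;                     % zero length means the point itself
    U = zeros(size(D));                 % unit vectors (zero rows where len == 0)
    U(keep,:) = D(keep,:) ./ repmat(len(keep), 1, size(D, 2));
    depth(i) = 1 - norm(mean(U, 1));    % one minus the norm of the average unit vector
end
disp(depth');
```

For the center point the unit vectors cancel, so its depth is one; each outer point has depth 1 - (2 + sqrt(2))/5, roughly 0.317.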
Appendix C: Expanded Table of Empirical UCLs for the MMR Chart
              Desired FAP = 0.10        Desired FAP = 0.05
  m    n     UCL   Simulated FAP       UCL   Simulated FAP
 20    5    2.476     0.0941          2.650     0.0486
 20   10    2.519     0.0984          2.737     0.0476
 30    5    2.581     0.0964          2.749     0.0471
 30   10    2.642     0.0975          2.849     0.0477
 40    5    2.650     0.0984          2.815     0.0488
 40   10    2.724     0.0982          2.925     0.0484
 50    5    2.702     0.0983          2.861     0.0487
 50   10    2.787     0.0981          2.980     0.0480
 60    5    2.743     0.0974          2.895     0.0485
 60   10    2.840     0.0983          3.030     0.0486
 70    5    2.776     0.0972          2.924     0.0472
 70   10    2.881     0.0982          3.065     0.0485
 80    5    2.810     0.0967          2.949     0.0487
 80   10    2.917     0.0982          3.100     0.0487
 90    5    2.831     0.0961          2.969     0.0485
 90   10    2.946     0.0983          3.127     0.0489
100    5    2.854     0.0982          2.992     0.0483
100   10    2.972     0.0974          3.150     0.0488
110    5    2.872     0.0980          3.008     0.0482
110   10    2.998     0.0967          3.176     0.0488
120    5    2.890     0.0974          3.019     0.0489
120   10    3.022     0.0980          3.198     0.0478
130    5    2.904     0.0975          3.038     0.0479
130   10    3.042     0.0984          3.214     0.0480
140    5    2.919     0.0984          3.048     0.0486
140   10    3.060     0.0969          3.226     0.0489
150    5    2.932     0.0983          3.057     0.0486
150   10    3.076     0.0971          3.244     0.0486
160    5    2.945     0.0980          3.067     0.0488
160   10    3.088     0.0984          3.262     0.0484
170    5    2.953     0.0983          3.082     0.0477
170   10    3.104     0.0976          3.274     0.0488
180    5    2.964     0.0982          3.089     0.0488
180   10    3.119     0.0977          3.285     0.0485
190    5    2.977     0.0961          3.098     0.0483
190   10    3.134     0.0983          3.300     0.0486
200    5    2.985     0.0981          3.104     0.0485
200   10    3.144     0.0985          3.310     0.0482
Appendix D: MATLAB Code for Finding Empirical UCLs for the MMR Chart
%=========================================================================%
% FINDING EMPIRICAL CONTROL LIMITS FOR THE MMR CHART %
%=========================================================================%
% -Created by Richard Bell on 3/1/2011; last updated on 3/22/2011. %
% -Variables named for robust Mahalanobis depth (RMD) are used here, %
% although this file is not reliant on any particular depth measure. %
%=========================================================================%
%>>>>> INSTRUCTIONS: Start with 10k iterations to get a ballpark estimate,
% then fine-tune with 100k iterations.
clear all % clear all objects in the MATLAB workspace
clc % clear the output screen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AUTOMATED INPUTS (for simulating multiple scenarios using an input file)
% read in m, n, and the starting UCL from an Excel file
iterations=100000; % number of simulation iterations to be performed
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:C50');
% determine the number of rows of data in the input file
inputRows=length(input(:,1));
% initialize the array of estimated alarm probability (AP) values for speed
APtable=zeros(inputRows,3);
% perform the simulation below for each m, n, and UCL combination in the
% input file
for row=1:inputRows
    m=input(row,1); % read in the desired value for sample size (m)
    n=input(row,2); % read in the desired value for subgroup size (n)
    UCL=input(row,3); % read in the upper control limit
    % determine the pooled sample size (=m in the case of individual
    % observations)
    N=m*n;
    % initialize the AP to 1 so at least one repetition of the UCL search
    % will be performed
    AP=1;
    % initialize the counter for the number of repetitions required to find
    % the optimal UCL
    reps=0;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%%%%%%%%%% GENERATE DATA AND COMPUTE ROBUST ESTIMATES %%%%%%%%%%%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % set the threshold AP based on the lower limit of an upper 95% CI for a
    % proportion
    while AP > 0.0985
        % set the desired increment for each iteration of the UCL search; use
        % 0.10 first, then 0.01 and 0.001 to refine
        UCL=UCL+0.001;
        % count the number of repetitions required to find the optimal UCL
        reps=reps+1;
        count=0; % initialize the counter for the number of iterations performed
        alarmCount=0; % initialize the alarm counter
        while count < iterations % run the entire loop for a set number of iterations
            %=====> SIMULATE UNIFORM(0,1) NUMBERS REPRESENTING DEPTH VALUES FROM 0 TO 1
            X=unifrnd(0,1,[N,1]);
            %=====> PARTITION DATA INTO SUBGROUPS
            % assign a subgroup identifier to each simulated data point
            i=1; % start with the first observation in the data set
            % initialize the total number of observations which have been
            % assigned subgroups
            assigned=0;
            ID=1; % initialize the subgroup identifier for the first subgroup
            % initialize the N x 1 vector of subgroup identifiers for speed
            subgroup=zeros(N,1);
            % perform loop until all observations in the data set have been
            % assigned subgroup identifiers
            while assigned <= N-n
                % initialize the number of observations contained in each
                % subgroup (named subgroupCount to avoid shadowing the built-in
                % "size" function)
                subgroupCount=0;
                while subgroupCount < n % perform loop until each subgroup reaches size n
                    subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation
                    subgroupCount=subgroupCount+1; % increment the number of observations in the current subgroup
                    i=i+1; % move to the next observation
                end
                ID=ID+1; % increment the subgroup identifier
                % increment the total number of observations which have been
                % assigned subgroups
                assigned=assigned+n;
            end
            %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            %%%%%%%%%% RANK DATA AND COMPUTE SUBGROUP MEAN RANKS %%%%%%%%%%
            %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            % rank each uniform random number generated; use the midrank method
            % in the event of a tie; MATLAB default is to rank from smallest
            % (rank=1) to largest (rank=N)
            RMDrank=tiedrank(X);
            % compute subgroup mean ranks
            % create a fictitious subgroup identifier for the nonexistent
            % (N+1)st rank so the following while loop doesn't cause an error
            % at the Nth rank in the data set
            subgroup(N+1)=0;
            RMDtotal=0; % initialize the total RMD rank for the first subgroup to 0
            % initialize the index for the N x 1 vector of ranks resulting from
            % the depth function
            i=1;
            % initialize the index for the m x 1 vector of subgroup mean ranks
            % to be computed
            k=1;
            alarm=0; % initialize the number of RMD alarms to 0
            % initialize the m x 1 vector of RMD subgroup mean ranks for speed
            RMDsubgrpAvg=zeros(m,1);
            % perform loop for all N ranks resulting from application of the
            % depth function
            while i <= N
                % initialize the rank identifier to point to the first
                % observation in each subgroup
                j=i;
                % initialize the total RMD rank for each subgroup to be the
                % first rank in the subgroup
                RMDtotal=RMDrank(j);
                % perform loop until the subgroup identifier changes
                while subgroup(j)==subgroup(j+1)
                    % add the next RMD rank in the current subgroup to the total
                    RMDtotal=RMDtotal+RMDrank(j+1);
                    j=j+1; % increment the rank identifier by 1
                end
                % compute the average subgroup RMD rank for the current subgroup
                RMDsubgrpAvg(k)=RMDtotal/n;
                k=k+1; % increment the index for the vector of subgroup mean ranks
                % count the number of ranks for which subgroup averages have
                % been computed in order to regulate the while loop
                i=i+n;
            end
            %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            % COMPARE STANDARDIZED SUBGROUP MEAN RANKS TO CONTROL LIMITS %%
            %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            % compute the theoretical mean and variance of subgroup mean ranks
            ExpRbar=(N+1)/2; % compute the expected value of the subgroup mean rank
            VarRbar=((N-n)*(N+1))/(12*n); % compute the variance of the subgroup mean rank
            % initialize the m x 1 vector of standardized subgroup RMD mean ranks
            Z_RMD=zeros(m,1);
            % standardize subgroup mean ranks resulting from the RMD function
            % and compare to the UCL
            for i = 1:m % perform loop for all m subgroup mean ranks
                if alarm==0 % continue loop as long as no alarms occur
                    % standardize each subgroup mean rank
                    Z_RMD(i)=(RMDsubgrpAvg(i)-ExpRbar)/sqrt(VarRbar);
                    % compare each standardized subgroup mean rank statistic
                    % to the UCL
                    if Z_RMD(i)>UCL
                        alarm=1; % signal if a standardized subgroup mean rank falls above the UCL
                    end
                end
            end
            if alarm==1
                % if a control chart issues an alarm, increment the counter
                % representing total alarms for all iterations
                alarmCount=alarmCount+1;
            end
            count=count+1; % increment counter for total number of iterations performed
        end
        % estimate the alarm probability (AP) for the current scenario
        AP=alarmCount/iterations;
        APtable(row,1)=reps; % record the results of each UCL evaluation in a table
        APtable(row,2)=UCL;
        APtable(row,3)=AP;
        disp(APtable); % display AP for the current scenario
    end
    % send the results to an Excel file
    xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',APtable,'Sheet1','A1');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
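
The charting statistic that the program above compares to the UCL can be traced on a very small example. The sketch below ranks six hypothetical depth values, averages the ranks within subgroups of size n = 2, and standardizes each subgroup mean rank using the same expected value and variance as the program. The depth values and the UCL of 1.5 are assumptions for illustration only (the UCL is not an entry from Appendix C), and a plain sort replaces tiedrank on the assumption that there are no ties.

```matlab
% Hypothetical depth values for N = 6 observations (illustration only)
depthVals = [0.9; 0.8; 0.1; 0.2; 0.5; 0.6];
n = 2;                     % subgroup size
N = numel(depthVals);      % pooled sample size
m = N / n;                 % number of subgroups

% Rank from smallest (1) to largest (N)
% ASSUMPTION: no tied values, so a plain sort gives the same ranks as midranks
[sortedVals, order] = sort(depthVals);
ranks = zeros(N, 1);
ranks(order) = (1:N)';

% Consecutive observations form a subgroup, so each column is one subgroup
Rbar = mean(reshape(ranks, n, m), 1);

% Theoretical mean and variance of a subgroup mean rank, as in the program
ExpRbar = (N + 1) / 2;
VarRbar = ((N - n) * (N + 1)) / (12 * n);
Z = (Rbar - ExpRbar) / sqrt(VarRbar);

UCL = 1.5;                 % hypothetical limit, not taken from Appendix C
signal = any(Z > UCL);     % true if any standardized mean rank exceeds the UCL
disp(Z);
```

Here the subgroup mean ranks are 5.5, 1.5, and 3.5, so the first standardized value (about 1.85) exceeds the illustrative limit.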
Appendix E: Empirical UCLs for Hotelling's T2 Chart
Desired FAP = 0.10

Process
Distribution    p     m    n      UCL   Simulated FAP
t(10)           2    20    5    11.51      0.0970
                     50    5    14.01      0.0963
                    100    5    15.88      0.0976
                    150    5    17.06      0.0971
                    200    5    17.94      0.0977
t(3)            2    20    5    15.66      0.0954
                     50    5    27.47      0.0973
                    100    5    43.79      0.0967
                    150    5    58.02      0.0965
                    200    5    70.40      0.0976
t(3)            5    20    5    25.79      0.0967
                     50    5    42.34      0.0967
                    100    5    68.76      0.0958
                    150    5    92.55      0.0967
                    200    5   114.51      0.0968
t(3)            5   100    5    68.76      0.0958
                    100   10    61.43      0.0980
                    100   15    57.42      0.0963
                    100   20    54.69      0.0970
t(3)           10    20    5    42.41      0.0970
                     50    5    59.65      0.0976
                    100    5    91.52      0.0971
                    150    5   123.51      0.0970
                    200    5   154.03      0.0959
lognormal       2    20    5    19.05      0.0977
                     50    5    33.05      0.0971
                    100    5    50.32      0.0964
                    150    5    64.26      0.0975
                    200    5    75.56      0.0963
lognormal       5    20    5    29.73      0.0978
                     50    5    45.79      0.0976
                    100    5    68.01      0.0975
                    150    5    86.07      0.0973
                    200    5   102.01      0.0977
lognormal       5   100    5    68.01      0.0975
                    100   10    56.94      0.0976
                    100   15    51.12      0.0974
                    100   20    47.25      0.0972
Appendix F: MATLAB Code for Finding Empirical UCLs for Hotelling's T2 Chart
%=========================================================================%
% FINDING EMPIRICAL CONTROL LIMITS FOR HOTELLING'S T^2 CONTROL CHART %
%=========================================================================%
% -Created by Richard Bell on 9/15/2010; last updated on 4/26/2011. %
% -Based on Hotelling's T2 control chart with Alt's (1976) Phase I UCL %
% adjusted for the number of subgroups. %
% -File is set up to run multiple scenarios; before using, undesired %
% sections must be commented out using "%". %
%=========================================================================%
%>>>>> INSTRUCTIONS: Start with 10k iterations to get a ballpark estimate,
% then fine-tune with 50k iterations.
clear all % clear all objects in the MATLAB workspace
clc % clear the output screen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AUTOMATED INPUTS (for simulating multiple scenarios using an input file)
% read in m, n, UCL, shift size, and p from an Excel file
iterations=50000; % number of simulation iterations to be performed
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:E50');
% determine the number of rows of data in the input file
inputRows=length(input(:,1));
% initialize the array of estimated alarm probability (AP) values for speed
APtable=zeros(inputRows,3);
% perform the simulation below for each m, n, p, UCL, and shift size
% combination in the input file
for row=1:inputRows
    m=input(row,1); % read in the desired value for sample size (m)
    n=input(row,2); % read in the desired value for subgroup size (n)
    UCL=input(row,3); % read in the upper control limit
    shiftSize=input(row,4); % read in the desired shift size
    p=input(row,5); % read in the number of variables
    % determine the pooled sample size (=m in the case of individual
    % observations)
    N=m*n;
    % initialize the AP to 1 so at least one repetition of the UCL search
    % will be performed
    AP=1;
    % initialize the counter for the number of repetitions required to find
    % the optimal UCL
    reps=0;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%%%%%%% GENERATE DATA AND CONSTRUCT HOTELLING'S T2 CHART %%%%%%%%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % set the threshold AP based on the lower limit of an upper 95% CI for a
    % proportion
    while AP > 0.0978
        % set the desired increment for each iteration of the UCL search; use
        % 1.0 first, then 0.25 and 0.01 to refine
        UCL=UCL+0.01;
        reps=reps+1; % count number of repetitions required to find the optimal UCL
        count=0; % initialize the counter for the number of iterations performed
        alarmCount=0; % initialize the alarm counter
        while count < iterations % run the entire loop for a set number of iterations
            %=====> SIMULATE MULTIVARIATE NORMAL AND MULTIVARIATE T DATA (ELLIPTICAL)
            % OPTION 1: Simulate in-control data.
            % multivariate normal distribution
            alpha=.10; % desired overall false alarm probability (FAP) for the chart
            alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison
            % Alt's Phase I upper control limit for Hotelling's T2 chart
            UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1);
            mu=zeros(1,p); % set the mean vector to all zeros
            sigma=eye(p); % set the covariance matrix equal to the identity matrix
            X=mvnrnd(mu,sigma,N); % generate multivariate normal data
            % multivariate t distribution
            df=3; % degrees of freedom for multivariate t distribution
            sigma=eye(p); % set the covariance matrix equal to the identity matrix
            % generate multivariate t data with specified degrees of freedom
            X=mvtrnd(sigma,df,N);
            % OPTION 2: Simulate out-of-control data with isolated or sustained
            % shifts of the mean.
            % multivariate normal -- isolated shift of the mean during the first
            % subgroup only
            alpha=.10; % desired overall false alarm probability (FAP) for the chart
            alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison
            % Alt's Phase I upper control limit for Hotelling's T2 chart
            UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1);
            mu=zeros(1,p); % set the mean vector to all zeros
            sigma=eye(p); % set the covariance matrix equal to the identity matrix
            shift=zeros(1,p); % initialize the shift vector
            % place the desired shift in the first position of the shift vector
            shift(1)=shiftSize;
            Xa=mvnrnd(mu+shift,sigma,n); % generate the shifted subgroup
            Xb=mvnrnd(mu,sigma,N-n); % generate the rest of the (unshifted) sample
            X=vertcat(Xa,Xb); % combine shifted and unshifted data
            % multivariate t -- isolated shift of the mean during the first
            % subgroup only
            df=3; % degrees of freedom for multivariate t distribution
            sigma=eye(p); % set the covariance matrix equal to the identity matrix
            shift=zeros(1,p); % initialize the shift vector
            % place the desired shift in the first position of the shift vector
            shift(1)=shiftSize;
            % generate the first subgroup and add the shift
            Xa=mvtrnd(sigma,df,n)+repmat(shift,n,1);
            Xb=mvtrnd(sigma,df,N-n); % generate the rest of the (unshifted) sample
            X=vertcat(Xa,Xb); % combine shifted and unshifted data
            % multivariate normal -- sustained shift of the mean during the last
            % "percentOC" % of the sample (irrespective of subgroups)
            alpha=.10; % desired overall false alarm probability (FAP) for the chart
            alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison
            % Alt's Phase I upper control limit for Hotelling's T2 chart
            UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1);
            percentOC=0.15; % designate the percentage of out-of-control points
            mu=zeros(1,p); % set the mean vector to all zeros
            sigma=eye(p); % set the covariance matrix equal to the identity matrix
            % determine the number of out-of-control points, rounded to the
            % nearest integer
            numberOC=round(percentOC*N);
            shift=zeros(1,p); % initialize the shift vector
            % place the desired shift in the first position of the shift vector
            shift(1)=shiftSize;
            Xa=mvnrnd(mu,sigma,N-numberOC); % generate the in-control points
            Xb=mvnrnd(mu+shift,sigma,numberOC); % generate the out-of-control points
            X=vertcat(Xa,Xb); % combine shifted and unshifted data
            % multivariate t -- sustained shift of the mean during the last
            % "percentOC" % of the sample (irrespective of subgroups)
            percentOC=0.15; % designate the percentage of out-of-control points
            df=3; % degrees of freedom for multivariate t distribution
            sigma=eye(p); % set the covariance matrix equal to the identity matrix
            % determine the number of out-of-control points, rounded to the
            % nearest integer
            numberOC=round(percentOC*N);
            shift=zeros(1,p); % initialize the shift vector
            % place the desired shift in the first position of the shift vector
            shift(1)=shiftSize;
            Xa=mvtrnd(sigma,df,N-numberOC); % generate the in-control points
            % generate the out-of-control points
            Xb=mvtrnd(sigma,df,numberOC)+repmat(shift,numberOC,1);
            X=vertcat(Xa,Xb); % combine shifted and unshifted data
            %=====> SIMULATE MULTIVARIATE LOGNORMAL DATA (SKEWED)
            % STEP 1: Simulate uniformly distributed vector of shift directions
            % using algorithm by Johnson (1987), page 127.
            StdNorm=zeros(1,p); % initialize vector of standard normal random numbers
            Unif=zeros(1,p); % initialize vector of shift directions
            for i = 1:p
                StdNorm(1,i)=normrnd(0,1); % generate p independent standard normal variates
            end
            for i = 1:p
                % create vector of shift directions in accordance with
                % Johnson (1987), page 127
                Unif(1,i)=StdNorm(1,i)/sqrt(sum(StdNorm.^2));
            end
            % STEP 2: Simulate the sample data set and standardize.
            mu_Y=zeros(1,p); % create a mean vector of all zeros
            sigma_Y=eye(p); % set the covariance matrix equal to the identity matrix
            Y=mvnrnd(mu_Y,sigma_Y,N); % simulate N multivariate normal observations
            % transform multivariate normal observations to multivariate
            % lognormal observations
            X=exp(Y);
            % NOTE: THE FOLLOWING RESULTS ONLY APPLY TO MULTIVARIATE LOGNORMAL DATA
            % CREATED USING MULTIVARIATE NORMAL DATA WITH ZERO MEAN VECTOR AND
            % IDENTITY COVARIANCE MATRIX!
            ExpX=exp(1/2); % compute theoretical expected value of X
            sigma_X=zeros(p,p); % initialize covariance matrix to all zeros
            for i=1:p % fill in diagonals of covariance matrix
                for j=1:p
                    if i==j
                        sigma_X(i,j)=exp(1)*(exp(1)-1); % from Law and Kelton (2000), page 382
                    end
                end
            end
            % standardize multivariate lognormal observations to have zero mean
            % vector and identity covariance matrix
            X=(X-ExpX)/sqrtm(sigma_X);
            % STEP 3: Scale the vector of shift directions to achieve a specified
            % noncentrality parameter.
            sigma_X=eye(p); % specify theoretical covariance matrix of standardized data
            Unif=shiftSize*Unif; % scale the directional shift vector
            % check the noncentrality parameter to ensure it equals the desired
            % value
            NCP=sqrt(Unif/sigma_X*Unif');
            % STEP 4: Induce isolated or sustained shifts of the mean.
            % isolated shift of the mean during the first subgroup only
            % replicate the shift vector n times and add it to the first subgroup
            Xa=X(1:n,:)+repmat(Unif,n,1);
            % identify the remaining (unshifted) observations in the data set
            Xb=X(n+1:N,:);
            X=vertcat(Xa,Xb); % combine shifted and unshifted data
            % sustained shift of the mean during the last "percentOC" % of the
            % sample (irrespective of subgroups)
            percentOC=0.15; % designate the percentage of out-of-control points
            % determine the number of out-of-control points, rounded to the
            % nearest integer
            numberOC=round(percentOC*N);
            Xa=X(1:(N-numberOC),:); % identify unshifted observations in the data set
            % replicate the shift vector and add it to the remaining observations
            Xb=X(N-numberOC+1:N,:)+repmat(Unif,numberOC,1);
            X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> PARTITION DATA INTO SUBGROUPS
% assign a subgroup identifier to each simulated data point
i=1; % start with the first observation in the data set
assigned=0; % initialize the total number of observations which have been
% assigned subgroups
ID=1; % initialize the subgroup identifier for the first subgroup
subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers
% for speed
while assigned <= N-n % perform loop until all observations in the data set
% have been assigned subgroup identifiers
size=0; % initialize the number of observations contained in each subgroup
while size < n % perform loop until each subgroup reaches size n
subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation
size=size+1; % increment the number of observations in the current subgroup
i=i+1; % move to the next observation
end
ID=ID+1; % increment the subgroup identifier
assigned=assigned+n; % increment the total number of observations which have
% been assigned subgroups
end
%=====> COMPUTE ROBUST ESTIMATES OF LOCATION AND SCATTER
subgroupMeans=zeros(m,p); % initialize the matrix of individual subgroup
% mean vectors
totalMeans=zeros(1,p); % initialize the total of all subgroup mean vectors
totalCovs=zeros(p,p); % initialize the total of all subgroup covariance
% matrices
subgroup(N+1)=0; % create a fictitious subgroup for the nonexistent (N+1)st
% observation so the following while loop doesn't cause an error at the Nth
% observation
i=1; % initialize the index for the N x p vector of observations
while i <= N % perform loop for all N observations
currentSubgroup=X(i,:); % start with first observation in the data set
j=i; % initialize the subgroup index to point to the first observation in
% each subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup
% identifier changes (this is where the fake subgroup is needed)
currentSubgroup=cat(1,currentSubgroup,X(j+1,:)); % combine individual
% observations into their respective subgroups
j=j+1; % increment the subgroup index by 1
end
subgroupMeans(j/n,:)=mean(currentSubgroup); % store individual subgroup
% means in a vector
totalMeans=totalMeans+subgroupMeans(j/n,:); % keep a running total of all
% subgroup mean vectors
totalCovs=totalCovs+cov(currentSubgroup); % keep a running total of all
% subgroup covariance matrices
i=i+n; % count the number of observations for which subgroup averages have
% been computed in order to regulate the while loop
end
Xbar_robust=totalMeans/m; % compute average of subgroup means; serves as
% unbiased estimate of mean vector
S_robust=totalCovs/m; % compute average of subgroup covariance matrices;
% serves as unbiased estimate of covariance matrix
%=====> COMPUTE HOTELLING'S T2 STATISTICS AND COMPARE TO UCL
alarm=0; % initialize indicator variable representing an alarm (=1) or no
% alarm (=0)
T2vector=zeros(m,1); % initialize vector of T2 statistics
for i=1:m
if alarm==0 % continue loop as long as no false alarms occur
T2stat=n*(subgroupMeans(i,:)-Xbar_robust)/S_robust* ...
(subgroupMeans(i,:)-Xbar_robust)'; % compute T2 control statistic
T2vector(i)=T2stat; % store T2 control statistics in a vector
if T2stat > UCL
alarm=1; % issue a false alarm if the T2 control statistic exceeds the UCL
end
end
end
if alarm==1
alarmCount=alarmCount+1; % if a control chart issues a false alarm,
% increment the counter representing total false alarms for all iterations
end
count=count+1; % increment the counter for the total number of iterations
% performed
end
AP=alarmCount/iterations; % estimate the alarm probability (AP) for the
% current scenario
APtable(row,1)=reps; % record the results of each UCL evaluation in a table
APtable(row,2)=UCL;
APtable(row,3)=AP;
disp(APtable); % display AP table for Hotelling's T2 chart on screen, if
% desired
end
% send the results to an Excel file
xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',APtable,'Sheet1','A1');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
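The subgroup bookkeeping and T2 computation in the program above can also be written without the sentinel entry subgroup(N+1). The following is a minimal sketch, not the code used to produce the results in this dissertation. It assumes equal subgroup sizes, uses illustrative values of m, n, and p with standard normal stand-in data, and introduces the names D and T2 for illustration only.

```matlab
% Illustrative sizes (assumed values, not taken from the simulation study)
m=6; n=5; p=3;
N=m*n;
X=randn(N,p); % stand-in for a simulated in-control data set

% Subgroup identifier for each observation: 1,...,1,2,...,2,...,m
subgroup=ceil((1:N)'/n);

% Subgroup mean vectors and running total of subgroup covariance matrices
subgroupMeans=zeros(m,p);
totalCovs=zeros(p,p);
for k=1:m
    rows=(subgroup==k); % logical index of the observations in subgroup k
    subgroupMeans(k,:)=mean(X(rows,:),1);
    totalCovs=totalCovs+cov(X(rows,:));
end
Xbar_robust=mean(subgroupMeans,1); % average of the subgroup mean vectors
S_robust=totalCovs/m; % average of the subgroup covariance matrices

% T2 statistic for every subgroup at once: n*d*inv(S)*d' for each row d of D
D=subgroupMeans-repmat(Xbar_robust,m,1);
T2=n*sum((D/S_robust).*D,2);
```

Each entry of T2 could then be compared to the UCL in the same way as T2stat is in the listing.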
Appendix G: MATLAB Code for Assessing MMR Chart Performance
%=========================================================================%
% MULTIVARIATE MEAN-RANK (MMR) CONTROL CHART PROGRAM FILE %
%=========================================================================%
% -Created by Richard Bell on 9/18/2010; last updated on 3/1/2011. %
% -Can be modified to find empirical APs for specified scenarios, %
% determine empirical UCLs for specific distributions, or construct %
% control charts for preliminary data sets. %
% -File is set up to run multiple scenarios; before using, undesired %
% sections must be commented out using "%". %
%=========================================================================%
clear all % clear all objects in the MATLAB workspace
clc % clear the output screen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AUTOMATED INPUTS (for simulating multiple scenarios using an input file)
% read in m, n, control limits, shift size, and p from an Excel file
iterations=10000; % number of simulation iterations to be performed
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:E50');
inputRows=length(input(:,1)); % determine the number of rows of data in the
% input file
RMD_APtable=zeros(inputRows,1); % initialize the array of estimated alarm
% probability (AP) values for the MMR chart using RMD
MSD_APtable=zeros(inputRows,1); % initialize the array of estimated alarm
% probability (AP) values for the MMR chart using MSD
for row=1:inputRows % perform the simulation below for each m, n, UCL, shift
% size, and p combination in the input file
m=input(row,1); % read in the desired value for sample size (m)
n=input(row,2); % read in the desired value for subgroup size (n)
UCL=input(row,3); % read in the upper control limit (UCL) corresponding to
% the m,n combination
shiftSize=input(row,4); % read in the size of the desired shift
p=input(row,5); % read in the number of variables
N=m*n; % determine the pooled sample size (=m in the case of individual
% observations)
count=0; % initialize the counter for the number of iterations performed
RMDalarmCount=0; % initialize the alarm counter for the RMD function
MSDalarmCount=0; % initialize the alarm counter for the MSD function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%% GENERATE DATA AND COMPUTE ROBUST ESTIMATES %%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
while count < iterations % run the entire loop for a set number of
% iterations
%=====> SIMULATE MULTIVARIATE NORMAL AND MULTIVARIATE T DATA (ELLIPTICAL)
% OPTION 1: Simulate in-control data.
% multivariate normal distribution
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
X=mvnrnd(mu,sigma,N); % generate multivariate normal data
% multivariate t distribution
df=3; % degrees of freedom for multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
X=mvtrnd(sigma,df,N); % generate multivariate t data with specified degrees
% of freedom
% OPTION 2: Simulate out-of-control data with isolated or sustained shifts of
% the mean.
% multivariate normal -- isolated shift of the mean during the first subgroup
% only
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvnrnd(mu+shift,sigma,n); % generate the shifted subgroup
Xb=mvnrnd(mu,sigma,N-n); % generate the rest of the (unshifted) sample
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate t -- isolated shift of the mean during the first subgroup only
df=3; % degrees of freedom for multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvtrnd(sigma,df,n)+repmat(shift,n,1); % generate the first subgroup and
% add the shift
Xb=mvtrnd(sigma,df,N-n); % generate the rest of the (unshifted) sample
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate normal -- sustained shift of the mean during the last
% "percentOC" % of the sample (irrespective of subgroups)
percentOC=0.15; % designate the percentage of out-of-control points
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
numberOC=round(percentOC*N); % determine the number of out-of-control
% points, rounded to the nearest integer
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvnrnd(mu,sigma,N-numberOC); % generate the in-control points
Xb=mvnrnd(mu+shift,sigma,numberOC); % generate the out-of-control points
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate t -- sustained shift of the mean during the last "percentOC" %
% of the sample (irrespective of subgroups)
percentOC=0.15; % designate the percentage of out-of-control points
df=3; % degrees of freedom for multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
numberOC=round(percentOC*N); % determine the number of out-of-control
% points, rounded to the nearest integer
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvtrnd(sigma,df,N-numberOC); % generate the in-control points
Xb=mvtrnd(sigma,df,numberOC)+repmat(shift,numberOC,1); % generate the
% out-of-control points
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> SIMULATE MULTIVARIATE LOGNORMAL DATA (SKEWED)
% STEP 1: Simulate uniformly distributed vector of shift directions using
% algorithm by Johnson (1987), page 127.
StdNorm=zeros(1,p); % initialize vector of standard normal random numbers
Unif=zeros(1,p); % initialize vector of shift directions
for i = 1:p
StdNorm(1,i)=normrnd(0,1); % generate p independent standard normal variates
end
for i = 1:p
Unif(1,i)=StdNorm(1,i)/sqrt(sum(StdNorm.^2)); % create vector of shift
% directions IAW Johnson (1987), page 127
end
% STEP 2: Simulate the sample data set and standardize.
mu_Y=zeros(1,p); % create a mean vector of all zeros
sigma_Y=eye(p); % set the covariance matrix equal to the identity matrix
Y=mvnrnd(mu_Y,sigma_Y,N); % simulate N multivariate normal observations
X=exp(Y); % transform multivariate normal observations to multivariate
% lognormal observations
% NOTE: THE FOLLOWING RESULTS ONLY APPLY TO MULTIVARIATE LOGNORMAL DATA
% CREATED USING MULTIVARIATE NORMAL DATA WITH ZERO MEAN VECTOR AND IDENTITY
% COVARIANCE MATRIX!
ExpX=exp(1/2); % compute theoretical expected value of X
VarX=exp(1)*(exp(1)-1); % compute theoretical variance of X
X=(X-ExpX)/sqrt(VarX); % standardize multivariate lognormal observations to
% have zero mean vector and identity covariance matrix (can use this method
% since the observations are independent)
% STEP 3: Scale the vector of shift directions to achieve a specified
% noncentrality parameter.
sigma_X=eye(p); % specify theoretical covariance matrix of standardized data
Unif=shiftSize*Unif; % scale the directional shift vector
NCP=sqrt(Unif/sigma_X*Unif'); % check the noncentrality parameter to ensure
% it equals the desired value
% STEP 4: Induce isolated or sustained shifts of the mean.
% isolated shift of the mean during the first subgroup only
Xa=X(1:n,:)+repmat(Unif,n,1); % replicate the shift vector n times and add
% it to the first subgroup
Xb=X(n+1:N,:); % identify the remaining (unshifted) observations in the data
% set
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% sustained shift of the mean during the last "percentOC" % of the
% sample (irrespective of subgroups)
percentOC=0.15; % designate the percentage of out-of-control points
numberOC=round(percentOC*N); % determine the number of out-of-control points,
% rounded to the nearest integer
Xa=X(1:(N-numberOC),:); % identify the unshifted observations in the data
% set
Xb=X(N-numberOC+1:N,:)+repmat(Unif,numberOC,1); % replicate the shift vector
% and add it to the remaining observations
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> PARTITION DATA INTO SUBGROUPS
% assign a subgroup identifier to each simulated data point
i=1; % start with the first observation in the data set
assigned=0; % initialize the total number of observations which have been
% assigned subgroups
ID=1; % initialize the subgroup identifier for the first subgroup
subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers
% for speed
while assigned <= N-n % perform loop until all observations in the data set
% have been assigned subgroup identifiers
size=0; % initialize the number of observations contained in each subgroup
while size < n % perform loop until each subgroup reaches size n
subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation
size=size+1; % increment the number of observations in the current subgroup
i=i+1; % move to the next observation
end
ID=ID+1; % increment the subgroup identifier
assigned=assigned+n; % increment the total number of observations which have
% been assigned subgroups
end
%=====> COMPUTE ROBUST ESTIMATES USING HOTELLING'S T^2 OR BACON METHODS
% OPTION 1: Hotelling's T^2 Method
totalMeans=zeros(1,p); % initialize the total of all subgroup mean vectors
totalCovs=zeros(p,p); % initialize the total of all subgroup covariance
% matrices
subgroup(N+1)=0; % create a fictitious subgroup for the nonexistent (N+1)st
% observation so the following while loop doesn't cause an error at the Nth
% observation
i=1; % initialize the index for the N x p vector of observations
while i <= N % perform loop for all N observations
currentSubgroup=X(i,:); % start with first observation in the data set
j=i; % initialize the subgroup index to point to the first observation
% in each subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup
% identifier changes
currentSubgroup=cat(1,currentSubgroup,X(j+1,:)); % combine individual
% observations into their respective subgroups
j=j+1; % increment the subgroup index by 1
end
totalMeans=totalMeans+mean(currentSubgroup); % keep a running total of all
% subgroup mean vectors
totalCovs=totalCovs+cov(currentSubgroup); % keep a running total of all
% subgroup covariance matrices
i=i+n; % count the number of observations for which subgroup averages have
% been computed in order to regulate the while loop
end
Xbar_robust=totalMeans/m; % compute average of subgroup means; serves as
% unbiased estimate of mean vector
S_robust=totalCovs/m; % compute average of subgroup covariance matrices;
% serves as unbiased estimate of covariance matrix
% OPTION 2: BACON method for estimating mean vector and covariance matrix
out=baconV(X,1,.10,4); % compute BACON estimates for location and scatter
% using Mahalanobis distance, alpha=0.10, and c=4; use version 2 (Euclidean
% distance) if expected contamination exceeds 20 percent
Xbar_robust=out.center3; % BACON estimate for mean vector
S_robust=out.cov3; % BACON estimate for covariance matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% RANK DATA USING DATA DEPTH %%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% NOTE: The following code simultaneously applies both the robust Mahalanobis
% depth (RMD) and Mahalanobis spatial depth (MSD) functions to the same data
% set.
[RMD]=computeRMDv1(X,Xbar_robust,S_robust); % compute the Robust Mahalanobis
% Depth of each point in the sample
RMDrank_interim=tiedrank(RMD); % rank each depth value; use the midrank
% method in the event of a tie; MATLAB default is to rank from smallest
% (rank=1) to largest (rank=N)
RMDrank=N-RMDrank_interim+1; % following data depth convention, adjust the
% ranks to go from largest depth value (rank=1) to smallest depth value
% (rank=N)
[MSD]=computeMSDfast(X,S_robust); % compute the Mahalanobis Spatial Depth of
% each point in the sample
MSDrank_interim=tiedrank(MSD); % rank each depth value; use the midrank
% method in the event of a tie; MATLAB default is to rank from smallest
% (rank=1) to largest (rank=N)
MSDrank=N-MSDrank_interim+1; % following data depth convention, adjust the
% ranks to go from largest depth value (rank=1) to smallest depth value
% (rank=N)
% compute subgroup mean ranks
subgroup(N+1)=0; % create a fictitious subgroup identifier for the
% nonexistent (N+1)st rank so the following while loop doesn't cause an error
% at the Nth rank in the data set
RMDtotal=0; % initialize the total RMD rank for the first subgroup to 0
MSDtotal=0; % initialize the total MSD rank for the first subgroup to 0
i=1; % initialize the index for the N x 1 vector of ranks resulting from the
% depth function
k=1; % initialize the index for the m x 1 vector of subgroup mean ranks to
% be computed
RMDalarm=0; % initialize the number of RMD alarms to 0
MSDalarm=0; % initialize the number of MSD alarms to 0
RMDsubgrpAvg=zeros(m,1); % initialize the m x 1 vector of RMD subgroup mean
% ranks for speed
MSDsubgrpAvg=zeros(m,1); % initialize the m x 1 vector of MSD subgroup mean
% ranks for speed
while i <= N % perform loop for all N ranks resulting from application of
% the depth function
j=i; % initialize the rank identifier to point to the first observation in
% each subgroup
RMDtotal=RMDrank(j); % initialize the total RMD rank for each subgroup to be
% the first rank in the subgroup
MSDtotal=MSDrank(j); % initialize the total MSD rank for each subgroup to be
% the first rank in the subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup
% identifier changes
RMDtotal=RMDtotal+RMDrank(j+1); % add the next RMD rank in the current
% subgroup to the total
MSDtotal=MSDtotal+MSDrank(j+1); % add the next MSD rank in the current
% subgroup to the total
j=j+1; % increment the rank identifier by 1
end
RMDsubgrpAvg(k)=RMDtotal/n; % compute the average subgroup RMD rank for the
% current subgroup
MSDsubgrpAvg(k)=MSDtotal/n; % compute the average subgroup MSD rank for the
% current subgroup
k=k+1; % increment the index for the vector of subgroup mean ranks
i=i+n; % count the number of ranks for which subgroup averages have been
% computed in order to regulate the while loop
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% COMPARE STANDARDIZED SUBGROUP MEAN RANKS TO CONTROL LIMITS %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% compute the theoretical mean and variance of subgroup mean ranks
ExpRbar=(N+1)/2; % compute the expected value of the subgroup mean rank
VarRbar=((N-n)*(N+1))/(12*n); % compute the variance of the subgroup mean
% rank
Z_RMD=zeros(m,1); % initialize the m x 1 vector of standardized subgroup RMD
% mean ranks
Z_MSD=zeros(m,1); % initialize the m x 1 vector of standardized subgroup MSD
% mean ranks
% standardize subgroup mean ranks resulting from the RMD function and compare
% to the UCL
for i = 1:m % perform loop for all m subgroup mean ranks
if RMDalarm==0 % continue loop as long as no alarms occur; no need to
% perform further computations once an alarm occurs (for example, recall that
% FAP is the probability of observing ONE OR MORE signals from a control chart
% when the process is in control, so the total number of false alarms on a
% single chart is irrelevant; same concept applies to EAP in out-of-control
% scenarios)
Z_RMD(i)=(RMDsubgrpAvg(i)-ExpRbar)/sqrt(VarRbar); % standardize each
% subgroup mean rank
if Z_RMD(i)>UCL % compare each standardized subgroup mean rank statistic to
% the UCL
RMDalarm=1; % signal if a standardized subgroup mean rank falls above UCL
end
end
end
if RMDalarm==1
RMDalarmCount=RMDalarmCount+1; % if a control chart issues an alarm,
% increment the counter representing total alarms for all iterations
end
% standardize subgroup mean ranks resulting from the MSD function and compare
% to the UCL
for i = 1:m
if MSDalarm==0
Z_MSD(i)=(MSDsubgrpAvg(i)-ExpRbar)/sqrt(VarRbar);
if Z_MSD(i)>UCL
MSDalarm=1;
end
end
end
if MSDalarm==1
MSDalarmCount=MSDalarmCount+1;
end
count=count+1; % increment the counter for the total number of iterations
% performed
end
% record results for both RMD and MSD methods
RMD_AP=RMDalarmCount/iterations; % estimate the RMD alarm probability (AP)
% for the current scenario and store in an array
RMD_APtable(row,1)=RMD_AP;
MSD_AP=MSDalarmCount/iterations; % estimate the MSD AP for the current
% scenario and store in an array
MSD_APtable(row,1)=MSD_AP;
disp('EAP Table for MMR-RMD');
disp(RMD_APtable); % display AP table for MMR chart using RMD on screen, if
% desired
disp('EAP Table for MMR-MSD');
disp(MSD_APtable); % display AP table for MMR chart using MSD on screen, if
% desired
% send the estimated APs to an Excel file (RMD in column A, MSD in column B)
xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',[RMD_APtable MSD_APtable],'Sheet1','A1');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
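The ranking and standardization steps of the MMR chart can be illustrated compactly. The following is a minimal sketch under stated assumptions, not the program used in this research: the depth values are uniform random stand-ins rather than RMD or MSD values, ties are assumed absent (the listing above uses tiedrank with midranks), and the names depth, depthRank, Rbar, and Z are introduced here for illustration only.

```matlab
% Illustrative sizes (assumed values)
m=4; n=5;
N=m*n;
depth=rand(N,1); % stand-in for the depth of each observation

% Rank from largest depth (rank=1) to smallest depth (rank=N); no ties assumed
[~,order]=sort(depth,'descend');
depthRank=zeros(N,1);
depthRank(order)=(1:N)';

% Subgroup mean ranks: column k of the reshaped matrix holds subgroup k
Rbar=mean(reshape(depthRank,n,m),1)';

% Standardize with the mean and variance used in the listing
ExpRbar=(N+1)/2; % expected value of a subgroup mean rank
VarRbar=((N-n)*(N+1))/(12*n); % variance of a subgroup mean rank
Z=(Rbar-ExpRbar)/sqrt(VarRbar);
```

A subgroup whose observations lie far from the center receives large ranks, so its entry in Z is large and positive, which is why only an upper control limit is applied.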
Appendix H: MATLAB Code for Assessing Hotelling's T2 Chart Performance
%=========================================================================%
% HOTELLING'S T^2 CONTROL CHART PROGRAM FILE %
%=========================================================================%
% -Created by Richard Bell on 9/15/2010; last updated on 3/22/2011. %
% -Based on Hotelling's T2 control chart with Alt's (1976) Phase I UCL %
% adjusted for the number of subgroups. %
% -Can be modified to find empirical APs for specified scenarios, %
% determine empirical UCLs for specific distributions, or construct %
% control charts for preliminary data sets. %
% -File is set up to run multiple scenarios; before using, undesired %
% sections must be commented out using "%". %
%=========================================================================%
clear all % clear all objects in the MATLAB workspace
clc % clear the output screen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AUTOMATED INPUTS (for simulating multiple scenarios using an input file)
% read in m, n, UCL, shift size, and p from an Excel file
iterations=10000; % number of simulation iterations to be performed
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:E50');
inputRows=length(input(:,1)); % determine the number of rows of data in the
% input file
APtable=zeros(inputRows,1); % initialize the array of estimated alarm
% probability (AP) values for speed
for row=1:inputRows % perform the simulation below for each m, n, p, UCL,
% and shift size combination in the input file
m=input(row,1); % read in the desired value for sample size (m)
n=input(row,2); % read in the desired value for subgroup size (n)
UCL=input(row,3); % read in the upper control limit
shiftSize=input(row,4); % read in the size of the desired shift
p=input(row,5); % read in the number of variables
N=m*n; % determine the pooled sample size (=m in the case of individual
% observations)
count=0; % initialize the counter for the number of iterations performed
alarmCount=0; % initialize the alarm counter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% GENERATE DATA AND CONSTRUCT HOTELLING'S T2 CHART %%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
while count < iterations % run the entire loop for a set number of
% iterations
%=====> SIMULATE MULTIVARIATE NORMAL AND MULTIVARIATE T DATA (ELLIPTICAL)
% OPTION 1: Simulate in-control data.
% multivariate normal distribution
alpha=.10; % desired overall false alarm probability (FAP) for the chart
alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual
% comparison
UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's
% Phase I upper control limit for Hotelling's T2 chart
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
X=mvnrnd(mu,sigma,N); % generate multivariate normal data
% multivariate t distribution
df=3; % degrees of freedom for multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
X=mvtrnd(sigma,df,N); % generate multivariate t data with specified degrees
% of freedom
% OPTION 2: Simulate out-of-control data with isolated or sustained shifts of
% the mean.
% multivariate normal -- isolated shift of the mean during the first subgroup
% only
alpha=.10; % desired overall false alarm probability (FAP) for the chart
alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual
% comparison
UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's
% Phase I upper control limit for Hotelling's T2 chart
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvnrnd(mu+shift,sigma,n); % generate the shifted subgroup
Xb=mvnrnd(mu,sigma,N-n); % generate the rest of the (unshifted) sample
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate t -- isolated shift of the mean during the first subgroup only
df=3; % degrees of freedom for multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvtrnd(sigma,df,n)+repmat(shift,n,1); % generate the first subgroup and
% add the shift
Xb=mvtrnd(sigma,df,N-n); % generate the rest of the (unshifted) sample
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate normal -- sustained shift of the mean during the last
% "percentOC" % of the sample (irrespective of subgroups)
alpha=.10; % desired overall false alarm probability (FAP) for the chart
alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual
% comparison
UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's
% Phase I upper control limit for Hotelling's T2 chart
percentOC=0.15; % designate the percentage of out-of-control points
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
numberOC=round(percentOC*N); % determine the number of out-of-control
% points, rounded to the nearest integer
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvnrnd(mu,sigma,N-numberOC); % generate the in-control points
Xb=mvnrnd(mu+shift,sigma,numberOC); % generate the out-of-control points
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate t -- sustained shift of the mean during the last "percentOC" %
% of the sample (irrespective of subgroups)
percentOC=0.30; % designate the percentage of out-of-control points
df=3; % degrees of freedom for multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
numberOC=round(percentOC*N); % determine the number of out-of-control
% points, rounded to the nearest integer
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the
% shift vector
Xa=mvtrnd(sigma,df,N-numberOC); % generate the in-control points
Xb=mvtrnd(sigma,df,numberOC)+repmat(shift,numberOC,1); % generate the
% out-of-control points
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> SIMULATE MULTIVARIATE LOGNORMAL DATA (SKEWED)
% STEP 1: Simulate uniformly distributed vector of shift directions using
% algorithm by Johnson (1987), page 127.
StdNorm=zeros(1,p); % initialize vector of standard normal random numbers
Unif=zeros(1,p); % initialize vector of shift directions
for i = 1:p
StdNorm(1,i)=normrnd(0,1); % generate p independent standard normal variates
end
for i = 1:p
Unif(1,i)=StdNorm(1,i)/sqrt(sum(StdNorm.^2)); % create vector of shift
% directions IAW Johnson (1987), page 127
end
% STEP 2: Simulate the sample data set and standardize.
mu_Y=zeros(1,p); % create a mean vector of all zeros
sigma_Y=eye(p); % set the covariance matrix equal to the identity matrix
Y=mvnrnd(mu_Y,sigma_Y,N); % simulate N multivariate normal observations
X=exp(Y); % transform multivariate normal observations to multivariate
% lognormal observations
% NOTE: THE FOLLOWING RESULTS ONLY APPLY TO MULTIVARIATE LOGNORMAL DATA
% CREATED USING MULTIVARIATE NORMAL DATA WITH ZERO MEAN VECTOR AND IDENTITY
% COVARIANCE MATRIX!
ExpX=exp(1/2); % compute theoretical expected value of X
sigma_X=zeros(p,p); % initialize covariance matrix to all zeros
for i=1:p % fill in diagonals of covariance matrix
for j=1:p
if i==j
sigma_X(i,j)=exp(1)*(exp(1)-1); % from Law and Kelton (2000), page 382
end
end
end
X=(X-ExpX)/sqrtm(sigma_X); % standardize multivariate lognormal observations
% to have zero mean vector and identity covariance matrix
% STEP 3: Scale the vector of shift directions to achieve a specified
% noncentrality parameter.
sigma_X=eye(p); % specify theoretical covariance matrix of standardized data
Unif=shiftSize*Unif; % scale the directional shift vector
NCP=sqrt(Unif/sigma_X*Unif'); % check the noncentrality parameter to ensure
% it equals the desired value
if abs(NCP-shiftSize)>0.00001 % display error message if calculated NCP does
% not equal the shift size (they should be equal since the theoretical
% covariance matrix of X is I)
disp('ERROR in NCP!')
end
% STEP 4: Induce isolated or sustained shifts of the mean.
% isolated shift of the mean during the first subgroup only
Xa=X(1:n,:)+repmat(Unif,n,1); % replicate the shift vector n times and add
% it to the first subgroup
Xb=X(n+1:N,:); % identify the remaining (unshifted) observations in the data
% set
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% sustained shift of the mean during the last "percentOC" % of the
% sample (irrespective of subgroups)
percentOC=0.15; % designate the percentage of out-of-control points
numberOC=round(percentOC*N); % determine the number of out-of-control points,
% rounded to the nearest integer
Xa=X(1:(N-numberOC),:); % identify the unshifted observations in the data
% set
Xb=X(N-numberOC+1:N,:)+repmat(Unif,numberOC,1); % replicate the shift vector
% and add it to the remaining observations
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> PARTITION DATA INTO SUBGROUPS
% assign a subgroup identifier to each simulated data point
i=1; % start with the first observation in the data set
assigned=0; % initialize the total number of observations which have been
% assigned subgroups
ID=1; % initialize the subgroup identifier for the first subgroup
subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers
% for speed
while assigned <= N-n % perform loop until all observations in the data set
% have been assigned subgroup identifiers
size=0; % initialize the number of observations contained in each subgroup
while size < n % perform loop until each subgroup reaches size n
subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation
size=size+1; % increment the number of observations in the current subgroup
i=i+1; % move to the next observation
end
ID=ID+1; % increment the subgroup identifier
assigned=assigned+n; % increment the total number of observations which have
% been assigned subgroups
end
%=====> COMPUTE ROBUST ESTIMATES OF LOCATION AND SCATTER
subgroupMeans=zeros(m,p); % initialize the matrix of individual subgroup
% mean vectors
totalMeans=zeros(1,p); % initialize the total of all subgroup mean vectors
totalCovs=zeros(p,p); % initialize the total of all subgroup covariance
% matrices
subgroup(N+1)=0; % create a fictitious subgroup for the nonexistent (N+1)st
% observation so the following while loop doesn't cause an error at the Nth
% observation
i=1; % initialize the index for the N x p vector of observations
while i <= N % perform loop for all N observations
currentSubgroup=X(i,:); % start with first observation in the data set
j=i; % initialize the subgroup index to point to the first observation in
% each subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup
% identifier changes (this is where the fake subgroup is needed)
currentSubgroup=cat(1,currentSubgroup,X(j+1,:)); % combine individual
% observations into their respective subgroups
j=j+1; % increment the subgroup index by 1
end
subgroupMeans(j/n,:)=mean(currentSubgroup); % store individual subgroup
% means in a vector
totalMeans=totalMeans+subgroupMeans(j/n,:); % keep a running total of all
% subgroup mean vectors
totalCovs=totalCovs+cov(currentSubgroup); % keep a running total of all
% subgroup covariance matrices
i=i+n; % count the number of observations for which subgroup averages have
% been computed in order to regulate the while loop
end
152
Xbar_robust=totalMeans/m; % compute average of subgroup means; serves as
unbaised estimate of mean vector
S_robust=totalCovs/m; % compute average of subgroup variances; serves as
unbiased estimate of covariance matrix
%=====> COMPUTE HOTELLING'S T2 STATISTICS AND COMPARE TO UCL
alarm=0; % initialize indicator variable representing an alarm (=1) or no alarm (=0)
T2vector=zeros(m,1); % initialize vector of T2 statistics
for i=1:m
    if alarm==0 % keep computing statistics only while no alarm has occurred
        T2stat=n*(subgroupMeans(i,:)-Xbar_robust)/S_robust*(subgroupMeans(i,:)-Xbar_robust)'; % compute T2 control statistic
        T2vector(i)=T2stat; % store T2 control statistics in a vector
        if T2stat > UCL
            alarm=1; % issue an alarm if the T2 control statistic exceeds the UCL
        end
    end
end
if alarm==1
    alarmCount=alarmCount+1; % if a control chart issues an alarm, increment the counter representing total alarms for all iterations
end
count=count+1; % increment the counter for the total number of iterations performed
end
AP=alarmCount/iterations; % estimate the alarm probability (AP) for the current scenario
APtable(row,1)=AP; % store the estimated AP in an array
disp('AP Table for Hotellings T2 Chart');
disp(APtable); % display AP table for Hotelling's T2 chart on screen, if desired
% send the estimated APs to an Excel file
xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',APtable,'Sheet1','A1');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Appendix I: Simulation Results Using In-Control Symmetric Data
Columns: empirical FAP for Hotelling's T2 chart (T2), the MMR-RMD chart, and the MMR-MSD chart, each at p = 2, 5, 10.
Process Distribution  m  n  T2 p=2  T2 p=5  T2 p=10  MMR-RMD p=2  MMR-RMD p=5  MMR-RMD p=10  MMR-MSD p=2  MMR-MSD p=5  MMR-MSD p=10
normal  20  5  0.0984  0.0992  0.0995  0.0919  0.0967  0.0947  0.0902  -  0.0947
normal  50  5  0.0979  0.0990  0.1019  0.0996  0.1020  0.0975  0.1006  -  0.0975
normal  100  5  0.1004  0.1006  0.0979  0.0998  0.0985  0.0972  0.0983  -  0.0972
normal  150  5  0.0997  0.1040  0.0995  0.1009  0.1019  0.0990  0.1019  -  0.0990
normal  200  5  0.0972  0.1005  0.1016  0.0973  0.0929  0.0973  0.0967  -  0.0973
t(10)  20  5  0.1205  -  -  0.0851  -  -  0.0862  -  -
t(10)  50  5  0.1634  -  -  0.0961  -  -  0.0981  -  -
t(10)  100  5  0.1907  -  -  0.1031  -  -  0.1035  -  -
t(10)  150  5  0.2203  -  -  0.0939  -  -  0.0940  -  -
t(10)  200  5  0.2317  -  -  0.0988  -  -  0.0988  -  -
t(3)  20  5  0.3040  0.3843  0.4266  0.0973  0.0930  0.0981  0.0959  0.0912  0.0981
t(3)  50  5  0.5892  0.7876  0.9055  0.0978  0.1010  0.1004  0.0973  0.1021  0.1004
t(3)  100  5  0.7876  0.9591  0.9932  0.0994  0.1022  0.0974  0.0974  0.1009  0.0974
t(3)  150  5  0.8864  0.9895  0.9998  0.0950  0.1037  0.0981  0.0957  0.1035  0.0981
t(3)  200  5  0.9348  0.9971  1.0000  0.1019  0.1013  0.0957  0.1035  0.1015  0.0957
Appendix J: Simulation Results Using Symmetric Data with an IS in p = 2
Empirical AP for an Isolated Shift Using Hotelling's T2 Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6
normal  20  5  0.0984  0.2523  0.8795  0.9835  0.9990  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0979  0.2091  0.8497  0.9847  0.9991  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.1004  0.1826  0.8222  0.9770  0.9983  0.9999  1.0000  1.0000  1.0000
normal  150  5  0.0997  0.1703  0.7876  0.9712  0.9979  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0972  0.1610  0.7827  0.9641  0.9985  1.0000  1.0000  1.0000  1.0000
t(10)  20  5  0.0974  0.1918  0.7438  0.9360  0.9917  0.9993  1.0000  1.0000  1.0000
t(10)  50  5  0.0934  0.1505  0.6629  0.9110  0.9894  0.9991  0.9998  1.0000  1.0000
t(10)  100  5  0.1019  0.1295  0.5951  0.8790  0.9815  0.9978  0.9999  1.0000  1.0000
t(10)  150  5  0.0970  0.1231  0.5417  0.8442  0.9733  0.9978  0.9998  1.0000  1.0000
t(10)  200  5  0.0997  0.1196  0.4977  0.8268  0.9751  0.9976  0.9999  1.0000  1.0000
t(3)  20  5  0.0985  0.1154  0.2468  0.4115  0.6287  0.8065  0.9060  0.9811  0.9952
t(3)  50  5  0.0987  0.1061  0.1170  0.1496  0.2546  0.4322  0.6503  0.9242  0.9852
t(3)  100  5  0.0990  0.0988  0.1058  0.0991  0.1122  0.1488  0.2296  0.6279  0.9233
t(3)  150  5  0.1025  0.1005  0.0957  0.0984  0.1055  0.1013  0.1257  0.3033  0.7200
t(3)  200  5  0.1007  0.0983  0.0973  0.1060  0.1008  0.1020  0.1115  0.1610  0.4645

Empirical AP for an Isolated Shift Using the MMR-RMD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6
normal  20  5  0.0919  0.1120  0.4927  0.7795  0.9408  0.9894  0.9984  1.0000  1.0000
normal  50  5  0.0996  0.1090  0.4482  0.7518  0.9268  0.9861  0.9980  1.0000  1.0000
normal  100  5  0.0998  0.1092  0.4013  0.7012  0.9063  0.9807  0.9969  0.9998  1.0000
normal  150  5  0.1009  0.1091  0.3694  0.6740  0.8936  0.9760  0.9959  0.9999  1.0000
normal  200  5  0.0973  0.1065  0.3565  0.6577  0.8752  0.9771  0.9953  0.9997  1.0000
t(10)  20  5  0.0851  0.1067  0.3870  0.6811  0.8891  0.9674  0.9912  0.9994  0.9999
t(10)  50  5  0.0961  0.1037  0.3447  0.6374  0.8679  0.9622  0.9891  0.9986  0.9998
t(10)  100  5  0.1031  0.0977  0.2989  0.5842  0.8339  0.9488  0.9862  0.9974  0.9996
t(10)  150  5  0.0939  0.1062  0.2754  0.5473  0.8055  0.9406  0.9804  0.9975  0.9995
t(10)  200  5  0.0988  0.1041  0.2518  0.5180  0.8026  0.9335  0.9773  0.9973  0.9992
t(3)  20  5  0.0973  0.1033  0.2135  0.3926  0.6432  0.8208  0.9263  0.9872  0.9956
t(3)  50  5  0.0978  0.1025  0.1774  0.3318  0.5795  0.7957  0.9095  0.9817  0.9939
t(3)  100  5  0.0994  0.1014  0.1411  0.2598  0.4940  0.7432  0.8892  0.9765  0.9933
t(3)  150  5  0.0950  0.1010  0.1265  0.2344  0.4493  0.6983  0.8667  0.9714  0.9912
t(3)  200  5  0.1019  0.0988  0.1251  0.2091  0.4061  0.6681  0.8471  0.9694  0.9914

Empirical AP for an Isolated Shift Using the MMR-MSD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6
normal  20  5  0.0902  0.1064  0.4549  0.7387  0.9219  0.9836  0.9976  1.0000  1.0000
normal  50  5  0.1006  0.1087  0.4302  0.7318  0.9168  0.9837  0.9978  1.0000  1.0000
normal  100  5  0.0983  0.1074  0.3899  0.6902  0.8987  0.9794  0.9968  0.9999  0.9999
normal  150  5  0.1019  0.1082  0.3631  0.6653  0.8902  0.9755  0.9958  0.9999  1.0000
normal  200  5  0.0967  0.1059  0.3515  0.6506  0.8717  0.9758  0.9956  0.9999  1.0000
t(10)  20  5  0.0862  0.1046  0.3508  0.6351  0.8559  0.9543  0.9856  0.9996  0.9997
t(10)  50  5  0.0981  0.1047  0.3302  0.6157  0.8513  0.9555  0.9872  0.9989  0.9996
t(10)  100  5  0.1035  0.0970  0.2907  0.5702  0.8237  0.9435  0.9817  0.9989  0.9995
t(10)  150  5  0.0940  0.1071  0.2679  0.5372  0.7977  0.9382  0.9781  0.9980  0.9995
t(10)  200  5  0.0988  0.1041  0.2480  0.5088  0.7948  0.9312  0.9762  0.9979  0.9992
t(3)  20  5  0.0959  0.1024  0.1914  0.3475  0.5767  0.7645  0.8873  0.9766  0.9956
t(3)  50  5  0.0973  0.1010  0.1679  0.3065  0.5462  0.7683  0.8948  0.9793  0.9932
t(3)  100  5  0.0974  0.1007  0.1388  0.2502  0.4760  0.7233  0.8790  0.9760  0.9924
t(3)  150  5  0.0957  0.0995  0.1253  0.2305  0.4371  0.6847  0.8611  0.9715  0.9902
t(3)  200  5  0.1035  0.0992  0.1249  0.2049  0.3936  0.6559  0.8397  0.9657  0.9902
Appendix K: Simulation Results Using Symmetric Data with an IS in p = 5
Empirical AP for an Isolated Shift Using Hotelling's T2 Chart (p = 5)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=3.5  ?=4  ?=4.5  ?=5  ?=6  ?=7
normal  20  5  0.0992  0.1791  0.7391  0.9939  0.9997  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0990  0.1532  0.7034  0.9944  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.1006  0.1408  0.6658  0.9935  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.1040  0.1323  0.6440  0.9939  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.1005  0.1212  0.6137  0.9892  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0987  0.1039  0.1697  0.4600  0.6703  0.8194  0.9184  0.9660  0.9930  0.9984
t(3)  50  5  0.1045  0.0960  0.1037  0.1538  0.2215  0.3817  0.5886  0.7732  0.9563  0.9934
t(3)  100  5  0.0973  0.0986  0.1024  0.0997  0.1047  0.1218  0.1495  0.2523  0.6051  0.9049
t(3)  150  5  0.0936  0.0990  0.0978  0.1055  0.1009  0.0960  0.1089  0.1176  0.2296  0.5849
t(3)  200  5  0.1008  0.0997  0.1001  0.0992  0.1010  0.1055  0.1043  0.1030  0.1235  0.2728

Empirical AP for an Isolated Shift Using the MMR-RMD Chart (p = 5)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=3.5  ?=4  ?=4.5  ?=5  ?=6  ?=7
normal  20  5  0.0967  0.1044  0.3214  0.8282  0.9539  0.9911  0.9993  0.9999  1.0000  1.0000
normal  50  5  0.1020  0.1060  0.2838  0.7967  0.9455  0.9891  0.9988  0.9998  1.0000  1.0000
normal  100  5  0.0985  0.1069  0.2295  0.7485  0.9173  0.9838  0.9980  1.0000  1.0000  1.0000
normal  150  5  0.1019  0.1068  0.2223  0.7105  0.9090  0.9799  0.9964  0.9991  1.0000  1.0000
normal  200  5  0.0929  0.0956  0.2025  0.6996  0.8992  0.9765  0.9965  0.9994  1.0000  1.0000
t(3)  20  5  0.0930  0.0980  0.1184  0.2897  0.4630  0.6691  0.8206  0.9200  0.9873  0.9981
t(3)  50  5  0.1010  0.1014  0.1032  0.2045  0.3414  0.5607  0.7768  0.9126  0.9881  0.9986
t(3)  100  5  0.1022  0.0971  0.1044  0.1474  0.2535  0.4412  0.6859  0.8720  0.9849  0.9973
t(3)  150  5  0.1037  0.0998  0.0998  0.1357  0.2105  0.3736  0.6098  0.8190  0.9810  0.9975
t(3)  200  5  0.1013  0.0977  0.1074  0.1222  0.1740  0.3234  0.5481  0.7898  0.9761  0.9971

Empirical AP for an Isolated Shift Using the MMR-MSD Chart (p = 5)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=3.5  ?=4  ?=4.5  ?=5  ?=6  ?=7
t(3)  20  5  0.0912  0.0980  0.1141  0.2520  0.3997  0.5950  0.7547  0.8708  0.9716  0.9943
t(3)  50  5  0.1021  0.0998  0.1041  0.1899  0.3110  0.5200  0.7321  0.8848  0.9847  0.9975
t(3)  100  5  0.1009  0.0970  0.1033  0.1431  0.2394  0.4147  0.6537  0.8424  0.9792  0.9978
t(3)  150  5  0.1035  0.0989  0.0997  0.1336  0.2046  0.3549  0.5846  0.8132  0.9779  0.9968
t(3)  200  5  0.1015  0.0988  0.1076  0.1199  0.1706  0.3137  0.5309  0.7691  0.9725  0.9964
Appendix L: Simulation Results Using Symmetric Data with an IS in p = 10
Empirical AP for an Isolated Shift Using Hotelling's T2 Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0995  0.1387  0.5438  0.9670  0.9998  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.1019  0.1268  0.5379  0.9764  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0979  0.1199  0.5038  0.9736  0.9999  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0995  0.1197  0.4692  0.9677  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.1016  0.1157  0.4553  0.9667  0.9999  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0987  0.1038  0.1395  0.3195  0.6983  0.9302  0.9894  0.9991  0.9996
t(3)  50  5  0.1030  0.1035  0.1068  0.1284  0.2556  0.5971  0.9037  0.9843  0.9981
t(3)  100  5  0.0943  0.0971  0.0964  0.1045  0.1131  0.1594  0.3705  0.7529  0.9531
t(3)  150  5  0.0979  0.0972  0.0949  0.0985  0.0995  0.1029  0.1342  0.2857  0.6659
t(3)  200  5  0.1005  0.0994  0.0993  0.1006  0.1011  0.0973  0.1085  0.1396  0.2816

Empirical AP for an Isolated Shift Using the MMR-RMD Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0947  0.1079  0.2509  0.6957  0.9668  0.9992  1.0000  1.0000  1.0000
normal  50  5  0.0975  0.0993  0.1889  0.6289  0.9546  0.9985  0.9999  1.0000  1.0000
normal  100  5  0.0972  0.1015  0.1644  0.5607  0.9379  0.9985  0.9999  1.0000  1.0000
normal  150  5  0.0990  0.1044  0.1518  0.5260  0.9225  0.9964  0.9999  1.0000  1.0000
normal  200  5  0.0973  0.1026  0.1494  0.5009  0.9162  0.9952  0.9999  1.0000  1.0000
t(3)  20  5  0.0981  0.0979  0.1070  0.1528  0.3362  0.6653  0.8966  0.9823  0.9968
t(3)  50  5  0.1004  0.1029  0.0966  0.1146  0.2004  0.4930  0.8380  0.9778  0.9973
t(3)  100  5  0.0974  0.0999  0.1059  0.1064  0.1402  0.3368  0.7346  0.9640  0.9972
t(3)  150  5  0.0981  0.0975  0.0985  0.1005  0.1273  0.2522  0.6426  0.9393  0.9956
t(3)  200  5  0.0957  0.1026  0.1034  0.0995  0.1139  0.2106  0.5652  0.9221  0.9947
Appendix M: Simulation Results Using Symmetric Data with a 5% SS in p = 2
Empirical AP for a 5% Sustained Shift Using Hotelling's T2 Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0956  0.2576  0.8755  0.9835  0.9996  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.1012  0.2906  0.9625  0.9987  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0990  0.4052  0.9991  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.1031  0.4354  0.9997  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0993  0.4841  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0985  0.1176  0.2473  0.4248  0.6404  0.8041  0.9079  0.9793  0.9936  0.9985  0.9991
t(3)  50  5  0.0987  0.1022  0.1313  0.1891  0.3091  0.5196  0.7544  0.9594  0.9941  0.9989  1.0000
t(3)  100  5  0.0990  0.0946  0.1058  0.1186  0.1597  0.2605  0.4569  0.8859  0.9902  0.9992  0.9997
t(3)  150  5  0.1025  0.1007  0.1122  0.1134  0.1186  0.1428  0.2103  0.5985  0.9344  0.9944  0.9990
t(3)  200  5  0.1007  0.0959  0.1038  0.1071  0.1163  0.1310  0.1561  0.3909  0.8497  0.9856  0.9986

Empirical AP for a 5% Sustained Shift Using the MMR-RMD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0919  0.1157  0.4920  0.7743  0.9414  0.9910  0.9989  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0996  0.1194  0.5842  0.8872  0.9892  0.9993  0.9999  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0998  0.1345  0.7941  0.9872  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.1009  0.1417  0.8360  0.9951  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0973  0.1503  0.8985  0.9987  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0973  0.0953  0.2167  0.4078  0.6423  0.8240  0.9264  0.9848  0.9963  0.9983  0.9993
t(3)  50  5  0.0978  0.0984  0.2060  0.4160  0.6855  0.8878  0.9728  0.9983  0.9994  0.9995  0.9998
t(3)  100  5  0.0994  0.0993  0.2359  0.4995  0.8355  0.9743  0.9960  0.9994  0.9998  0.9999  1.0000
t(3)  150  5  0.0950  0.0981  0.2209  0.4957  0.8335  0.9812  0.9980  0.9998  1.0000  1.0000  1.0000
t(3)  200  5  0.1019  0.0986  0.2272  0.5167  0.8672  0.9890  0.9988  0.9997  0.9999  0.9999  1.0000

Empirical AP for a 5% Sustained Shift Using the MMR-MSD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
t(3)  20  5  0.0959  0.0949  0.1953  0.3621  0.5784  0.7676  0.8914  0.9768  0.9939  0.9976  0.9990
t(3)  50  5  0.0973  0.0965  0.1868  0.3507  0.5913  0.8139  0.9349  0.9933  0.9994  0.9995  0.9999
t(3)  100  5  0.0974  0.0998  0.2100  0.4191  0.7379  0.9316  0.9879  0.9995  0.9997  0.9996  1.0000
t(3)  150  5  0.0957  0.0999  0.2005  0.4121  0.7169  0.9336  0.9934  0.9996  0.9995  0.9999  0.9999
t(3)  200  5  0.1035  0.1009  0.2029  0.4271  0.7623  0.9554  0.9957  0.9996  0.9994  1.0000  0.9999
Appendix N: Simulation Results Using Symmetric Data with a 15% SS in p = 2
Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0901  0.3966  0.9854  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.1026  0.4790  0.9996  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0990  0.6014  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.1009  0.6486  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0941  0.6901  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0985  0.1369  0.3595  0.5747  0.7921  0.9088  0.9617  0.9941  0.9976  0.9992  0.9995
t(3)  50  5  0.0987  0.1150  0.1713  0.2584  0.4344  0.6663  0.8420  0.9795  0.9950  0.9988  0.9998
t(3)  100  5  0.0990  0.0965  0.1229  0.1425  0.2015  0.3140  0.5217  0.8979  0.9888  0.9964  0.9995
t(3)  150  5  0.1025  0.1043  0.1056  0.1240  0.1388  0.1940  0.2585  0.6328  0.9326  0.9915  0.9980
t(3)  200  5  0.1007  0.1004  0.1106  0.1202  0.1339  0.1506  0.2000  0.4280  0.8296  0.9790  0.9981

Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0919  0.1266  0.5698  0.8621  0.9820  0.9984  0.9998  1.0000  0.9999  0.9997  1.0000
normal  50  5  0.0996  0.1402  0.6561  0.9375  0.9973  1.0000  1.0000  0.9999  0.9999  1.0000  1.0000
normal  100  5  0.0998  0.1519  0.7984  0.9876  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.1009  0.1541  0.8408  0.9945  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0973  0.1550  0.8786  0.9978  1.0000  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0973  0.0996  0.2185  0.3975  0.6356  0.8208  0.9317  0.9935  0.9985  0.9996  1.0000
t(3)  50  5  0.0978  0.1042  0.2132  0.3951  0.6463  0.8559  0.9593  0.9974  0.9989  0.9999  0.9999
t(3)  100  5  0.0994  0.1071  0.2188  0.4136  0.7085  0.9106  0.9813  0.9985  0.9996  1.0000  1.0000
t(3)  150  5  0.0950  0.1041  0.2084  0.4129  0.6911  0.9072  0.9854  0.9993  0.9994  0.9999  0.9999
t(3)  200  5  0.1019  0.1021  0.2042  0.3984  0.6999  0.9176  0.9870  0.9993  0.9997  1.0000  1.0000

Empirical AP for a 15% Sustained Shift Using the MMR-MSD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
t(3)  20  5  0.0959  0.0991  0.1814  0.2853  0.4414  0.5971  0.7399  0.9126  0.9722  0.9919  0.9976
t(3)  50  5  0.0973  0.1036  0.1751  0.2743  0.4180  0.5841  0.7152  0.9040  0.9677  0.9889  0.9966
t(3)  100  5  0.0974  0.1062  0.1811  0.2913  0.4575  0.6264  0.7664  0.9362  0.9844  0.9966  0.9993
t(3)  150  5  0.0957  0.1041  0.1771  0.2845  0.4415  0.6041  0.7576  0.9320  0.9835  0.9957  0.9987
t(3)  200  5  0.1035  0.1021  0.1740  0.2804  0.4408  0.6185  0.7657  0.9406  0.9859  0.9970  0.9991
Appendix O: Simulation Results Using Symmetric Data with a 30% SS in p = 2
Empirical AP for a 30% Sustained Shift Using Hotelling's T2 Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0970  0.4486  0.9877  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0971  0.5508  0.9994  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.1012  0.6391  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0974  0.6695  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0965  0.7107  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0985  0.1478  0.3740  0.5731  0.7725  0.8920  0.9542  0.9875  0.9961  0.9988  0.9993
t(3)  50  5  0.0987  0.1181  0.1860  0.2727  0.4284  0.6168  0.8144  0.9666  0.9934  0.9970  0.9986
t(3)  100  5  0.0990  0.1020  0.1335  0.1577  0.2076  0.2829  0.4083  0.7743  0.9599  0.9915  0.9973
t(3)  150  5  0.1025  0.1073  0.1236  0.1357  0.1635  0.1884  0.2384  0.4802  0.8231  0.9701  0.9917
t(3)  200  5  0.1007  0.1094  0.1127  0.1236  0.1404  0.1622  0.1990  0.3221  0.6076  0.9028  0.9820

Empirical AP for a 30% Sustained Shift Using the MMR-RMD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
normal  20  5  0.0952  0.1097  0.2626  0.4342  0.6330  0.7909  0.9085  0.9922  0.9998  0.9998  1.0000
normal  50  5  0.0985  0.1159  0.3212  0.5213  0.7209  0.8661  0.9517  0.9974  1.0000  1.0000  1.0000
normal  100  5  0.0969  0.1245  0.3571  0.5759  0.7708  0.9073  0.9694  0.9988  1.0000  1.0000  1.0000
normal  150  5  0.0973  0.1185  0.3755  0.6052  0.8109  0.9276  0.9794  0.9989  1.0000  1.0000  1.0000
normal  200  5  0.0996  0.1233  0.4044  0.6288  0.8314  0.9364  0.9811  0.9994  0.9999  1.0000  1.0000
t(3)  20  5  0.0942  0.0987  0.1379  0.1935  0.2853  0.3839  0.5116  0.7353  0.8950  0.9669  0.9947
t(3)  50  5  0.0945  0.0997  0.1406  0.1919  0.2743  0.3875  0.5054  0.7428  0.8997  0.9725  0.9957
t(3)  100  5  0.0948  0.0948  0.1345  0.1852  0.2681  0.3769  0.4966  0.7524  0.9073  0.9717  0.9947
t(3)  150  5  0.0992  0.1012  0.1316  0.1795  0.2609  0.3566  0.4955  0.7442  0.9049  0.9712  0.9933
t(3)  200  5  0.0954  0.0981  0.1288  0.1761  0.2598  0.3615  0.4859  0.7498  0.9043  0.9675  0.9915

Empirical AP for a 30% Sustained Shift Using the MMR-MSD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=2.5  ?=3  ?=3.5  ?=4  ?=5  ?=6  ?=7  ?=8
t(3)  20  5  0.0948  0.0941  0.1145  0.1354  0.1679  0.1864  0.2148  0.2668  0.2974  0.3257  0.3451
t(3)  50  5  0.0948  0.0992  0.1210  0.1412  0.1702  0.1945  0.2151  0.2586  0.3087  0.3292  0.3553
t(3)  100  5  0.0945  0.0972  0.1193  0.1379  0.1687  0.1921  0.2164  0.2722  0.3126  0.3422  0.3705
t(3)  150  5  0.0991  0.1021  0.1200  0.1363  0.1662  0.1845  0.2196  0.2712  0.3001  0.3332  0.3621
t(3)  200  5  0.0968  0.0981  0.1144  0.1353  0.1668  0.1944  0.2032  0.2691  0.3084  0.3494  0.3631
Appendix P: Simulation Results Using Symmetric Data with a 5% SS in p = 10
Empirical AP for a 5% Sustained Shift Using Hotelling's T2 Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
normal  20  5  0.1010  0.1417  0.5407  0.9664  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.1029  0.1569  0.7098  0.9978  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0977  0.1790  0.9170  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0936  0.1894  0.9526  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.1005  0.2041  0.9808  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0987  0.1050  0.1396  0.3116  0.6846  0.9251  0.9896  0.9990  0.9999  0.9999  1.0000
t(3)  50  5  0.1030  0.0996  0.1086  0.1491  0.3094  0.6567  0.9241  0.9894  0.9986  0.9999  1.0000
t(3)  100  5  0.0943  0.1013  0.1070  0.1153  0.1551  0.2771  0.6562  0.9387  0.9938  0.9992  0.9999
t(3)  150  5  0.0979  0.0985  0.1032  0.1086  0.1128  0.1446  0.2346  0.5196  0.8724  0.9851  0.9972
t(3)  200  5  0.1005  0.0981  0.0951  0.1016  0.1060  0.1215  0.1591  0.2767  0.6207  0.9207  0.9897

Empirical AP for a 5% Sustained Shift Using the MMR-RMD Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
normal  20  5  0.0947  0.1107  0.2460  0.6923  0.9681  0.9994  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0975  0.1095  0.2431  0.7505  0.9876  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0972  0.1073  0.3075  0.9262  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0990  0.1029  0.3264  0.9447  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0973  0.1034  0.3605  0.9756  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0981  0.0988  0.1094  0.1587  0.3444  0.6559  0.8976  0.9802  0.9969  0.9994  0.9999
t(3)  50  5  0.1004  0.1001  0.1022  0.1186  0.2185  0.4938  0.8180  0.9693  0.9979  0.9994  0.9999
t(3)  100  5  0.0974  0.0953  0.1037  0.1167  0.2178  0.5369  0.9032  0.9907  0.9996  1.0000  1.0000
t(3)  150  5  0.0981  0.1015  0.0995  0.1096  0.1747  0.4674  0.8701  0.9894  0.9998  1.0000  1.0000
t(3)  200  5  0.0957  0.1013  0.1030  0.1146  0.1757  0.4738  0.8888  0.9952  0.9996  1.0000  1.0000
Appendix Q: Simulation Results Using Symmetric Data with a 15% SS in p = 10
Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
normal  20  5  0.0995  0.1831  0.7445  0.9974  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0950  0.2261  0.9049  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0968  0.2464  0.9848  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0973  0.2713  0.9949  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0982  0.2946  0.9992  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0987  0.1121  0.1772  0.4380  0.8039  0.9680  0.9962  0.9997  0.9999  1.0000  1.0000
t(3)  50  5  0.1030  0.1065  0.1329  0.1984  0.4178  0.7694  0.9578  0.9949  0.9994  0.9998  1.0000
t(3)  100  5  0.0943  0.1026  0.1113  0.1282  0.1909  0.3578  0.6962  0.9421  0.9944  0.9988  0.9998
t(3)  150  5  0.0979  0.0964  0.1045  0.1205  0.1428  0.1828  0.2854  0.5475  0.8508  0.9762  0.9976
t(3)  200  5  0.1005  0.0980  0.1040  0.1071  0.1216  0.1386  0.1961  0.3136  0.5851  0.8851  0.9825

Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
normal  20  5  0.0947  0.1094  0.2859  0.7599  0.9839  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0975  0.1075  0.2744  0.7892  0.9951  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0972  0.1088  0.3130  0.9012  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0990  0.1082  0.3314  0.9240  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0973  0.1121  0.3573  0.9519  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0981  0.1004  0.1051  0.1649  0.3047  0.5756  0.8267  0.9511  0.9920  0.9993  1.0000
t(3)  50  5  0.1004  0.0959  0.1057  0.1272  0.2051  0.3934  0.6600  0.8851  0.9816  0.9993  1.0000
t(3)  100  5  0.0974  0.1009  0.1029  0.1186  0.1881  0.3850  0.6887  0.9214  0.9905  0.9998  1.0000
t(3)  150  5  0.0981  0.0908  0.1002  0.1147  0.1675  0.3294  0.6172  0.8768  0.9814  0.9993  1.0000
t(3)  200  5  0.0957  0.0973  0.1017  0.1126  0.1703  0.3129  0.6219  0.8908  0.9871  0.9996  1.0000
Appendix R: Simulation Results Using Symmetric Data with a 30% SS in p = 10
Empirical AP for a 30% Sustained Shift Using Hotelling's T2 Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
normal  20  5  0.0957  0.2040  0.7375  0.9952  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  50  5  0.0969  0.2504  0.9008  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.1030  0.2719  0.9682  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0978  0.2935  0.9860  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.0962  0.3092  0.9934  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0987  0.1122  0.1859  0.4187  0.7523  0.9482  0.9931  0.9994  0.9999  0.9999  1.0000
t(3)  50  5  0.1030  0.1094  0.1439  0.2264  0.4202  0.7460  0.9434  0.9938  0.9992  0.9998  1.0000
t(3)  100  5  0.0943  0.1066  0.1196  0.1510  0.1979  0.3169  0.5278  0.8218  0.9639  0.9954  0.9994
t(3)  150  5  0.0979  0.1004  0.1093  0.1254  0.1489  0.1960  0.2900  0.4356  0.6899  0.9171  0.9870
t(3)  200  5  0.1005  0.1012  0.1094  0.1200  0.1337  0.1589  0.2058  0.2651  0.4123  0.6195  0.8686

Empirical AP for a 30% Sustained Shift Using the MMR-RMD Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
normal  20  5  0.1013  0.1136  0.1759  0.4025  0.7251  0.9344  0.9944  0.9993  1.0000  1.0000  1.0000
normal  50  5  0.1024  0.1121  0.1818  0.4155  0.7651  0.9613  0.9976  1.0000  1.0000  1.0000  1.0000
normal  100  5  0.0969  0.1032  0.1736  0.4387  0.8137  0.9713  0.9989  1.0000  1.0000  1.0000  1.0000
normal  150  5  0.0967  0.1001  0.1769  0.4457  0.8223  0.9807  0.9996  1.0000  1.0000  1.0000  1.0000
normal  200  5  0.1021  0.1057  0.1835  0.4711  0.8435  0.9845  0.9990  1.0000  1.0000  1.0000  1.0000
t(3)  20  5  0.0946  0.0996  0.1112  0.1237  0.1710  0.2612  0.4224  0.6628  0.8747  0.9701  0.9923
t(3)  50  5  0.0967  0.1015  0.1052  0.1094  0.1406  0.2009  0.2903  0.4543  0.7347  0.9553  0.9973
t(3)  100  5  0.0985  0.0997  0.0996  0.1068  0.1324  0.1619  0.2527  0.3618  0.5623  0.9034  0.9960
t(3)  150  5  0.0995  0.0993  0.0999  0.1017  0.1253  0.1551  0.2254  0.3298  0.4823  0.8402  0.9928
t(3)  200  5  0.0961  0.0969  0.0991  0.1064  0.1221  0.1476  0.2184  0.3138  0.4491  0.7572  0.9903

Empirical AP for a 30% Sustained Shift Using the MMR-MSD Chart (p = 10)
Process Distribution  m  n  ?=0  ?=1  ?=2  ?=3  ?=4  ?=5  ?=6  ?=7  ?=8  ?=9  ?=10
t(3)  20  5  0.0909  0.0948  0.1041  0.1093  0.1278  0.1605  0.2134  0.2560  0.3182  0.3837  0.4417
t(3)  50  5  0.0960  0.0997  0.1029  0.1060  0.1176  0.1445  0.1661  0.2075  0.2530  0.3155  0.3456
t(3)  100  5  0.0979  0.0979  0.0980  0.1037  0.1211  0.1266  0.1586  0.1930  0.2341  0.2755  0.3192
t(3)  150  5  0.0992  0.0999  0.0997  0.0996  0.1131  0.1283  0.1493  0.1832  0.2119  0.2544  0.2983
t(3)  200  5  0.0965  0.0967  0.0994  0.1033  0.1141  0.1202  0.1459  0.1779  0.2087  0.2390  0.2912
Appendix S: Simulation Results Using In-Control Skewed Data
Columns: empirical FAP for Hotelling's T2 chart (T2), the MMR-RMD chart, and the MMR-MSD chart, each at p = 2 and p = 5.
Process Distribution  m  n  T2 p=2  T2 p=5  MMR-RMD p=2  MMR-RMD p=5  MMR-MSD p=2  MMR-MSD p=5
lognormal  20  5  0.4414  0.4803  0.0935  -  0.0965  0.0991
lognormal  50  5  0.7618  0.8676  0.1019  -  0.0994  0.0987
lognormal  100  5  0.9353  0.9827  0.1012  -  0.1030  0.1005
lognormal  150  5  0.9779  0.9972  0.0949  -  0.0957  0.1020
lognormal  200  5  0.9915  0.9998  0.0996  -  0.0997  0.1023
Appendix T: Simulation Results Using Skewed Data with an IS in p = 2
Empirical AP for an Isolated Shift Using Hotelling's T2 Chart (p = 2)
Process Distribution  m  n  ?=0  ?=0.5  ?=1  ?=1.5  ?=2  ?=2.5  ?=3
normal  20  5  0.0984  0.1234  0.2527  0.5734  0.8771  0.9857  0.9995
normal  50  5  0.0979  0.1138  0.2139  0.5128  0.8494  0.9812  0.9994
normal  100  5  0.1004  0.1041  0.1846  0.4545  0.8181  0.9777  0.9994
normal  150  5  0.0997  0.1071  0.1717  0.4171  0.7983  0.9682  0.9986
normal  200  5  0.0972  0.1036  0.1617  0.4075  0.7840  0.9668  0.9978
lognormal  20  5  0.0967  0.1071  0.1579  0.3973  0.7084  0.8826  0.9586
lognormal  50  5  0.0956  -  -  -  -  -  -
lognormal  100  5  0.1009  0.0991  0.1003  0.1016  0.1254  0.2423  0.5404
lognormal  150  5  0.1001  -  -  -  -  -  -
lognormal  200  5  0.0979  0.0999  0.1008  0.1017  0.1008  0.1066  0.1618

Empirical AP for an Isolated Shift Using the MMR-RMD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=0.5  ?=1  ?=1.5  ?=2  ?=2.5  ?=3
normal  20  5  0.0919  0.0899  0.1240  0.2274  0.4873  0.7792  0.9401
normal  50  5  0.0996  0.0962  0.1140  0.2012  0.4509  0.7458  0.9348
normal  100  5  0.0998  0.0977  0.1085  0.1858  0.4046  0.7065  0.9075
normal  150  5  0.1009  0.1001  0.1067  0.1725  0.3713  0.6672  0.8929
normal  200  5  0.0973  0.0995  0.1030  0.1555  0.3491  0.6587  0.8768
lognormal  20  5  0.0935  0.1048  0.2979  0.7470  0.9430  0.9861  0.9918
lognormal  50  5  0.1019  -  -  -  -  -  -
lognormal  100  5  0.1012  0.1027  0.1518  0.5256  0.9168  0.9788  0.9918
lognormal  150  5  0.0949  -  -  -  -  -  -
lognormal  200  5  0.0996  0.0956  0.1173  0.3561  0.8533  0.9684  0.9891

Empirical AP for an Isolated Shift Using the MMR-MSD Chart (p = 2)
Process Distribution  m  n  ?=0  ?=0.5  ?=1  ?=1.5  ?=2  ?=2.5  ?=3
normal  20  5  0.0902  0.0887  0.1181  0.2117  0.4489  0.7387  0.9216
normal  50  5  0.1006  0.0948  0.1107  0.1910  0.4327  0.7244  0.9163
normal  100  5  0.0983  0.0969  0.1087  0.1809  0.3958  0.6934  0.9005
normal  150  5  0.1019  0.1003  0.1066  0.1682  0.3618  0.6640  0.8891
normal  200  5  0.0967  0.0990  0.1033  0.1549  0.3465  0.6517  0.8745
lognormal  20  5  0.0965  0.1785  0.3991  0.6790  0.8966  0.9759  0.9914
lognormal  50  5  0.0994  -  -  -  -  -  -
lognormal  100  5  0.1030  0.1579  0.3564  0.6137  0.9052  0.9836  0.9938
lognormal  150  5  0.0957  -  -  -  -  -  -
lognormal  200  5  0.0997  0.1366  0.3322  0.5517  0.8622  0.9805  0.9918
Appendix U: Simulation Results Using Skewed Data with an IS in p = 5
Empirical AP for an Isolated Shift Using Hotelling's T2 Chart (p = 5)
Process Distribution  m  n  ?=0  ?=0.5  ?=1  ?=1.5  ?=2  ?=2.5  ?=3  ?=3.5  ?=4
lognormal  20  5  0.0934  0.1119  0.1279  0.2515  0.5461  0.8207  0.9455  0.9851  0.9945
lognormal  50  5  0.0963  -  -  -  -  -  -  -  -
lognormal  100  5  0.0958  0.0968  0.0975  0.1001  0.1110  0.1400  0.2783  0.6024  0.8719
lognormal  150  5  0.0968  -  -  -  -  -  -  -  -
lognormal  200  5  0.0967  0.0953  0.0976  0.1012  0.0989  0.1018  0.1128  0.1577  0.3426

Empirical AP for an Isolated Shift Using the MMR-MSD Chart (p = 5)
Process Distribution  m  n  ?=0  ?=0.5  ?=1  ?=1.5  ?=2  ?=2.5  ?=3  ?=3.5  ?=4
lognormal  20  5  0.0991  0.0977  0.1258  0.2897  0.5647  0.8285  0.9420  0.9855  0.9962
lognormal  50  5  0.0987  -  -  -  -  -  -  -  -
lognormal  100  5  0.1005  0.1001  0.0964  0.1760  0.4048  0.7213  0.9350  0.9926  0.9987
lognormal  150  5  0.1020  -  -  -  -  -  -  -  -
lognormal  200  5  0.1023  0.0969  0.1038  0.1370  0.3094  0.6104  0.8813  0.9850  0.9986
Appendix V: Simulation Results Using Skewed Data with a 5% SS in p = 2
Empirical AP for a 5% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0967   0.1072   0.1577   0.3914   0.7022   0.8842   0.9589   0.9839   0.9938   0.9967   0.9974
lognormal             2  50   5   0.0956
lognormal             2  100  5   0.1009   0.1028   0.1055   0.1258   0.1868   0.4334   0.7751   0.9465   0.9876   0.9978   0.9996
lognormal             2  150  5   0.1001
lognormal             2  200  5   0.0979   0.1012   0.1024   0.1102   0.1258   0.1842   0.3438   0.7156   0.9361   0.9885   0.9983

Empirical AP for a 5% Sustained Shift Using the MMR-MSD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0935   0.1072   0.3005   0.7387   0.9436   0.9821   0.9934   0.9966   0.9985   0.9990   0.9993
lognormal             2  50   5   0.1019
lognormal             2  100  5   0.1012   0.1024   0.2126   0.7795   0.9933   0.9995   0.9999   1.0000   1.0000   1.0000   1.0000
lognormal             2  150  5   0.0949
lognormal             2  200  5   0.0996   0.1020   0.1746   0.7258   0.9955   0.9999   0.9999   1.0000   1.0000   1.0000   1.0000

Empirical AP for a 5% Sustained Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0965   0.1868   0.4053   0.6796   0.8998   0.9737   0.9921   0.9963   0.9984   0.9995   0.9993
lognormal             2  50   5   0.0994
lognormal             2  100  5   0.1030   0.2552   0.5387   0.7700   0.9632   0.9987   0.9999   1.0000   1.0000   1.0000   1.0000
lognormal             2  150  5   0.0957
lognormal             2  200  5   0.0997   0.2678   0.5578   0.7762   0.9664   0.9998   0.9999   1.0000   1.0000   1.0000   1.0000
Appendix W: Simulation Results Using Skewed Data with a 15% SS in p = 2
Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0967   0.1150   0.2044   0.5074   0.8041   0.9374   0.9746   0.9904   0.9972   0.9984   0.9991
lognormal             2  50   5   0.0956
lognormal             2  100  5   0.1009   0.1058   0.1160   0.1553   0.2504   0.4650   0.7737   0.9310   0.9858   0.9960   0.9982
lognormal             2  150  5   0.1001
lognormal             2  200  5   0.0979   0.1011   0.1056   0.1287   0.1626   0.2307   0.4078   0.6638   0.8988   0.9806   0.9957

Empirical AP for a 15% Sustained Shift Using the MMR-MSD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0935   0.1099   0.2674   0.6685   0.9247   0.9895   0.9981   0.9993   0.9999   0.9998   0.9997
lognormal             2  50   5   0.1019
lognormal             2  100  5   0.1012   0.1075   0.1904   0.5607   0.9053   0.9927   0.9992   1.0000   1.0000   1.0000   1.0000
lognormal             2  150  5   0.0949
lognormal             2  200  5   0.0996   0.1050   0.1649   0.4819   0.8685   0.9907   0.9993   1.0000   1.0000   1.0000   1.0000

Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0965   0.2099   0.4151   0.6046   0.7839   0.9072   0.9648   0.9889   0.9955   0.9983   0.9993
lognormal             2  50   5   0.0994
lognormal             2  100  5   0.1030   0.2263   0.4239   0.6243   0.8080   0.9372   0.9857   0.9976   0.9997   0.9998   0.9999
lognormal             2  150  5   0.0957
lognormal             2  200  5   0.0997   0.2103   0.4125   0.6021   0.8056   0.9404   0.9894   0.9978   0.9989   0.9997   0.9996
Appendix X: Simulation Results Using Skewed Data with a 30% SS in p = 2
Empirical AP for a 30% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0967   0.1256   0.2241   0.4939   0.7701   0.9173   0.9714   0.9868   0.9949   0.9973   0.9989
lognormal             2  50   5   0.0956
lognormal             2  100  5   0.1009   0.1073   0.1247   0.1722   0.2658   0.4155   0.6337   0.8513   0.9522   0.9859   0.9962
lognormal             2  150  5   0.1001
lognormal             2  200  5   0.0979   0.1048   0.1144   0.1400   0.1746   0.2546   0.3735   0.5437   0.7171   0.8843   0.9654

Empirical AP for a 30% Sustained Shift Using the MMR-MSD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0955   0.1013   0.1440   0.2777   0.5222   0.7769   0.9319   0.9846   0.9972   0.9993   1.0000
lognormal             2  50   5   0.0983
lognormal             2  100  5   0.0982   0.0999   0.1368   0.2239   0.3961   0.5983   0.7691   0.9078   0.9808   0.9985   0.9997
lognormal             2  150  5   0.1008
lognormal             2  200  5   0.0992   0.1019   0.1386   0.2079   0.3640   0.5372   0.7020   0.8314   0.9433   0.9940   0.9997

Empirical AP for a 30% Sustained Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             2  20   5   0.0965   0.1458   0.1946   0.2119   0.2345   0.2724   0.2992   0.3175   0.3537   0.3645   0.3771
lognormal             2  50   5   0.0994
lognormal             2  100  5   0.1030   0.1390   0.1630   0.1993   0.2314   0.2801   0.3116   0.3556   0.3829   0.3986   0.4240
lognormal             2  150  5   0.0957
lognormal             2  200  5   0.0997   0.1258   0.1562   0.1843   0.2258   0.2756   0.3145   0.3499   0.3852   0.4220   0.4270
Appendix Y: Simulation Results Using Skewed Data with a SS in p = 5
Empirical AP for a 5% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  20   5   0.0934   0.1031   0.1236   0.2490   0.5473   0.8198   0.9456   0.9857   0.9970   0.9989   0.9998
lognormal             5  50   5   0.0963
lognormal             5  100  5   0.0958   0.0980   0.1020   0.1112   0.1419   0.2260   0.5187   0.8406   0.9713   0.9965   0.9994
lognormal             5  150  5   0.0968
lognormal             5  200  5   0.0967   0.0947   0.0973   0.1025   0.1097   0.1356   0.1916   0.3595   0.6965   0.9435   0.9919

Empirical AP for a 5% Sustained Shift Using the MMR-MSD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  20   5   0.0991   0.0920   0.1296   0.2995   0.5772   0.8201   0.9460   0.9865   0.9974   0.9989   0.9998
lognormal             5  50   5   0.0987
lognormal             5  100  5   0.1005   0.1050   0.1067   0.2296   0.5250   0.8371   0.9689   0.9974   0.9994   0.9999   1.0000
lognormal             5  150  5   0.1020
lognormal             5  200  5   0.1023   0.0998   0.1054   0.1922   0.4607   0.7969   0.9658   0.9989   0.9999   0.9999   1.0000

Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  20   5   0.0934   0.1109   0.1574   0.3359   0.6643   0.8916   0.9749   0.9945   0.9989   0.9996   0.9999
lognormal             5  50   5   0.0963
lognormal             5  100  5   0.0958   0.1013   0.1085   0.1309   0.1843   0.2998   0.5539   0.8396   0.9653   0.9960   0.9988
lognormal             5  150  5   0.0968
lognormal             5  200  5   0.0967   0.0958   0.0993   0.1123   0.1282   0.1688   0.2459   0.4111   0.6712   0.9096   0.9856

Empirical AP for a 15% Sustained Shift Using the MMR-MSD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  20   5   0.0991   0.0907   0.1251   0.2176   0.3862   0.6074   0.7847   0.9006   0.9572   0.9829   0.9931
lognormal             5  50   5   0.0987
lognormal             5  100  5   0.1005   0.1046   0.1055   0.1588   0.2801   0.4819   0.6916   0.8418   0.9446   0.9816   0.9932
lognormal             5  150  5   0.1020
lognormal             5  200  5   0.1023   0.0978   0.1034   0.1453   0.2316   0.3983   0.6119   0.8036   0.9173   0.9730   0.9907

Empirical AP for a 30% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  20   5   0.0934   0.1137   0.1713   0.3298   0.6137   0.8508   0.9568   0.9896   0.9973   0.9984   0.9997
lognormal             5  50   5   0.0963
lognormal             5  100  5   0.0958   0.1014   0.1143   0.1431   0.1914   0.2863   0.4627   0.6808   0.8868   0.9739   0.9942
lognormal             5  150  5   0.0968
lognormal             5  200  5   0.0967   0.0967   0.1020   0.1200   0.1401   0.1793   0.2447   0.3458   0.5092   0.7166   0.8836

Empirical AP for a 30% Sustained Shift Using the MMR-MSD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  20   5   0.0991   0.0878   0.1065   0.1140   0.1477   0.1793   0.2306   0.2725   0.3227   0.3643   0.4048
lognormal             5  50   5   0.0987
lognormal             5  100  5   0.1005   0.1028   0.0992   0.1102   0.1316   0.1546   0.2014   0.2350   0.2736   0.3244   0.3615
lognormal             5  150  5   0.1020
lognormal             5  200  5   0.1023   0.0975   0.0980   0.1150   0.1283   0.1491   0.1870   0.2183   0.2597   0.2960   0.3410
Appendix Z: Subgroup Size Analysis Using In-Control Data
Process Distribution  m    n   p   Empirical FAP for Hotelling's T2 Chart   Empirical FAP for MMR-RMD Chart
t(3)                  100  5   5   0.9541                                   0.0964
t(3)                  100  10  5   0.9054                                   0.1050
t(3)                  100  15  5   0.8708                                   0.1020
t(3)                  100  20  5   0.8332                                   0.1047
lognormal             100  5   5   0.9833                                   0.0986
lognormal             100  10  5   0.9437                                   0.1042
lognormal             100  15  5   0.9029                                   0.0949
lognormal             100  20  5   0.8602                                   0.1030
Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5
Empirical AP for an Isolated Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 1    ? = 2    ? = 3    ? = 4    ? = 5    ? = 6    ? = 7
t(3)                  5  100  5   0.0999   0.0991   0.0990   0.0990   0.1201   0.2394   0.6120   0.8989
t(3)                  5  100  10  0.0942   0.0999   0.0974   0.1434   0.5595   0.9532   0.9973   0.9998
t(3)                  5  100  15  0.0984   0.1000   0.1075   0.4214   0.9614   0.9990   1.0000   0.9999
t(3)                  5  100  20  0.1004   0.0994   0.1402   0.8056   0.9971   0.9999   1.0000   1.0000

Empirical AP for an Isolated Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 1    ? = 2    ? = 3    ? = 4    ? = 5    ? = 6    ? = 7
t(3)                  5  100  5   0.1022   0.1013   0.1055   0.1498   0.4484   0.8643   0.9851   0.9975
t(3)                  5  100  10  0.0986   0.1038   0.1395   0.5409   0.9767   0.9998   1.0000   1.0000
t(3)                  5  100  15  0.0964   0.1030   0.1915   0.8414   0.9995   1.0000   1.0000   0.9999
t(3)                  5  100  20  0.1021   0.1050   0.2741   0.9650   0.9999   1.0000   1.0000   1.0000

Empirical AP for an Isolated Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  100  5   0.0933   0.0925   0.0977   0.1019   0.1080   0.1333   0.2812   0.5998   0.8664   0.9692   0.9929
lognormal             5  100  10  0.1016   0.0951   0.0952   0.1190   0.2992   0.7866   0.9787   0.9987   0.9997   1.0000   1.0000
lognormal             5  100  15  0.0977   0.0998   0.1133   0.2610   0.8405   0.9947   0.9998   0.9999   1.0000   1.0000   1.0000
lognormal             5  100  20  0.0898   0.1038   0.1310   0.6108   0.9852   0.9999   1.0000   1.0000   1.0000   1.0000   1.0000

Empirical AP for an Isolated Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4    ? = 4.5  ? = 5
lognormal             5  100  5   0.0986   0.1016   0.1028   0.1130   0.2076   0.5308   0.8765   0.9826   0.9976   0.9994   1.0000
lognormal             5  100  10  0.1042   0.1012   0.1173   0.3757   0.9080   0.9981   1.0000   1.0000   1.0000   1.0000   1.0000
lognormal             5  100  15  0.0949   0.1093   0.1767   0.7495   0.9972   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
lognormal             5  100  20  0.1030   0.1085   0.2768   0.9339   0.9998   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000
Appendix BB: Subgroup Size Analysis Using Data with a 15% SS in p = 5
Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 1    ? = 2    ? = 3    ? = 4    ? = 5    ? = 6    ? = 7
t(3)                  5  100  5   0.0963   0.0980   0.1120   0.1361   0.2236   0.5038   0.8621   0.9806
t(3)                  5  100  10  0.0917   0.1067   0.1270   0.3049   0.8734   0.9970   1.0000   1.0000
t(3)                  5  100  15  0.1003   0.1015   0.1805   0.7814   0.9979   1.0000   1.0000   1.0000
t(3)                  5  100  20  0.1021   0.1144   0.2951   0.9834   0.9999   1.0000   1.0000   1.0000

Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 1    ? = 2    ? = 3    ? = 4    ? = 5    ? = 6    ? = 7
t(3)                  5  100  5   0.0950   0.0996   0.1130   0.2132   0.5115   0.8817   0.9861   0.9988
t(3)                  5  100  10  0.0989   0.1058   0.1997   0.7429   0.9961   0.9997   0.9999   1.0000
t(3)                  5  100  15  0.0998   0.0985   0.3297   0.9696   0.9996   1.0000   1.0000   1.0000
t(3)                  5  100  20  0.0996   0.1147   0.4881   0.9977   1.0000   1.0000   1.0000   1.0000

Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4
lognormal             5  100  5   0.0933   0.0966   0.1075   0.1304   0.1843   0.2998   0.5591   0.8385   0.9673
lognormal             5  100  10  0.1016   0.1014   0.1256   0.2416   0.6156   0.9663   0.9992   0.9998   1.0000
lognormal             5  100  15  0.0977   0.1111   0.1791   0.5726   0.9905   0.9999   1.0000   1.0000   1.0000
lognormal             5  100  20  0.0898   0.1202   0.2824   0.9500   0.9999   1.0000   1.0000   1.0000   1.0000

Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart
Process Distribution  p  m    n   ? = 0    ? = 0.5  ? = 1    ? = 1.5  ? = 2    ? = 2.5  ? = 3    ? = 3.5  ? = 4
lognormal             5  100  5   0.0986   0.1026   0.1078   0.1390   0.2592   0.5200   0.8104   0.9541   0.9957
lognormal             5  100  10  0.1042   0.1035   0.1601   0.5318   0.9633   0.9994   1.0000   1.0000   1.0000
lognormal             5  100  15  0.0949   0.1157   0.2799   0.8985   0.9995   1.0000   1.0000   1.0000   1.0000
lognormal             5  100  20  0.1030   0.1189   0.4525   0.9899   1.0000   1.0000   1.0000   1.0000   1.0000