A Robust Version of Hotelling's T2 Control Chart  
for Retrospective Location Analysis of Individuals Using BACON Estimators 
 
by 
 
Richard C. Bell, Jr. 
 
 
 
 
A thesis submitted to the Graduate Faculty of 
Auburn University 
in partial fulfillment of the 
requirements for the Degree of 
Master of Science 
 
Auburn, Alabama 
December 7, 2011 
 
 
 
 
Keywords:  multivariate statistical process control, phase I, preliminary,  
in-control reference sample, breakdown point, outliers 
 
 
Copyright 2011 by Richard C. Bell, Jr. 
 
 
Approved by 
 
Nedret Billor, Chair, Associate Professor of Mathematics and Statistics 
L. Allison Jones-Farmer, Co-chair, Associate Professor of Management 
Asheber Abebe, Associate Professor of Mathematics and Statistics 
 
  
ii 
 
 
 
 
 
 
 
Abstract 
 
 
  
 Hotelling's T2 chart is commonly used for Phase I analysis of individual multivariate 
normally distributed data.  However, the presence of only a few outliers can significantly distort 
classical estimates of location and scale, thus rendering the resulting analysis ineffective.  This 
poses a significant problem for the Hotelling's T2 chart practitioner because the desired output of 
a Phase I analysis is an outlier-free reference sample which can be used to estimate control limits 
for prospectively monitoring a process in Phase II.  Careful selection of a robust parameter 
estimation method is therefore critical when the initial reference sample is suspected to contain 
multiple outliers.   
 The purpose of this research is to propose a version of Hotelling's T2 chart that uses the 
blocked adaptive computationally efficient outlier nominators (BACON) algorithm to robustly 
estimate location and scale parameters in Phase I.  The proposed control chart, which assumes 
individual multivariate normally distributed data with constant covariance, is designed to detect 
both individual outliers and sustained mean shifts.  Using Monte Carlo simulation, the proposed 
method is compared to Hotelling's T2 chart using classical estimators as well as robust estimators 
such as the minimum volume ellipsoid (MVE), minimum covariance determinant (MCD), and 
clustering methods.  Although the BACON-based version of Hotelling's T2 chart turned out to be 
less powerful than expected, it is significantly better than the classical approach and offers some 
improvement over existing robust methods at a fraction of the computational expense.   
iii 
 
 
 
 
 
 
 
Table of Contents 
 
 
Abstract ........................................................................................................................................... ii 
List of Tables .................................................................................................................................. v 
List of Figures ................................................................................................................................ vi 
List of Abbreviations .................................................................................................................... vii 
1 Introduction and Literature Review ........................................................................................ 1 
1.1 Special Considerations in Phase I Control Charting ........................................................ 1 
1.2 Hotelling's T2 Control Chart for Individuals -- the Classical Approach .......................... 4 
1.3 Previous Attempts to Improve the Robustness of Hotelling's T2 Chart ........................... 6 
1.4 Organization of Thesis ..................................................................................................... 9 
2 Hotelling's T2 Chart Using Robust Estimates of Location and Scatter ................................. 10 
2.1 Desirable Properties of Robust Estimators..................................................................... 10 
2.2 The MVE and MCD Methods of Robust Parameter Estimation .................................... 12 
2.3 The BACON Method of Robust Parameter Estimation ................................................. 14 
2.4 Empirical Control Limits for Hotelling's T2 Chart with Robust Estimators .................. 18 
3 Evaluating Phase I Performance of Hotelling's T2 Chart ...................................................... 22 
3.1 Simulating Multivariate Normally Distributed Data ...................................................... 22 
iv 
 
3.2 Assessing Out-of-Control Performance ......................................................................... 23 
4 Comparing the BACON-Based Hotelling's T2 Chart to Other Robust Versions .................. 26 
4.1 Detecting Randomly Occurring Outliers........................................................................ 26 
4.2 Detecting a Sustained Shift of the Mean ........................................................................ 37 
4.3 Application to an Example Data Set .............................................................................. 39 
5 Summary and Conclusions ................................................................................................... 42 
References ..................................................................................................................................... 45 
Appendices .................................................................................................................................... 48 
Appendix A:  MATLAB Code for Simulating Hotelling?s T2 Chart Empirical Control Limits .. 49 
Appendix B:  MATLAB Code for Simulating Hotelling's T2 Chart Performance ....................... 52 
 
  
 
v 
 
 
 
 
 
 
 
List of Tables 
 
 
Table 2.4.1  Empirical and Classical UCLs for Hotelling's T2 Chart ........................................... 19 
Table 2.4.2  BACON Input Arguments for Each Dimension Evaluated ...................................... 19 
Table 3.1.1  Summary of Planned Experiments ........................................................................... 23 
Table 4.1.1  Empirical Alarm Probabilities for p = 3 ................................................................... 29 
Table 4.1.2  Empirical Alarm Probabilities for p = 5 ................................................................... 33 
Table 4.1.3  Empirical Alarm Probabilities for p = 10 ................................................................. 35 
Table 4.2.1  Empirical Alarm Probabilities Under a 50% Sustained Shift of the Mean .............. 38 
Table 4.3.1  Example Bivariate Data Set ...................................................................................... 39 
Table 4.3.2  T2 Statistics for Original and Altered Samples Using BACON Estimators ............. 40 
 
  
 
vi 
 
 
 
 
 
 
 
List of Figures 
 
 
Figure 2.3.1  IC FAPs for Hotelling's T2  Chart Using Tracy et al.'s (1992) UCL ....................... 17 
Figure 2.4.1  Hotelling's T2 Chart IC FAPs Using Robust Estimators & Empirical UCLs .......... 21 
Figure 4.1.1  Control Chart Performance on Bivariate Normal Data with k = 1, 3 Outliers ........ 27 
Figure 4.1.2  Control Chart Performance on Bivariate Normal Data with k = 5, 7 Outliers ........ 28 
Figure 4.1.3  Control Chart Performance with k Outliers When n = 30 and p = 3 ....................... 30 
Figure 4.1.4  Effect of Increasing k on Control Chart Performance ............................................. 31 
Figure 4.1.5  Control Chart Performance with Extreme Outliers ................................................. 32 
Figure 4.1.6  Control Chart Performance with k Outliers When n = 50 and p = 5 ....................... 34 
Figure 4.1.7  Control Chart Performance with 20% Outliers in Ten Dimensions ........................ 36 
Figure 4.1.8  Effect of Increasing Dimension on Control Chart Performance ............................. 37 
Figure 4.2.1  Control Chart Performance Under a 50% Sustained Shift of the Mean .................. 38 
Figure 4.3.1  Application of the BACON-Based Hotelling's T2 Chart to Altered Data ............... 41 
 
  
 
vii 
 
 
 
 
 
 
 
List of Abbreviations 
 
 
ARL  average run length 
BACON blocked adaptive computationally efficient outlier nominators 
CL  center line 
EAP  empirical alarm probability 
FAP  false alarm probability 
HT2  Hotelling's T2 
IC  in control 
LCL  lower control limit 
MCD  minimum covariance determinant 
MCUSUM multivariate cumulative sum 
MEWMA multivariate exponentially weighted moving average 
MVE  minimum volume ellipsoid 
OC  out of control 
RBP  replacement breakdown point 
RL  run length 
RMCD reweighted minimum covariance determinant 
RMVE  reweighted minimum volume ellipsoid 
UCL  upper control limit 
  
 
1 
 
 
 
 
 
 
 
1 Introduction and Literature Review 
  The first multivariate quality control chart is attributed to Harold Hotelling (1947), who 
created the T2 chart to monitor bombsight data during World War II.  Since its introduction, 
many variations and refinements of Hotelling's T2 chart have been proposed, and it remains the 
most familiar multivariate quality control chart in existence today [Montgomery (2005, p. 491)].  
This research seeks to further broaden the appeal of Hotelling's T2 chart for individual 
multivariate normally distributed data in a Phase I setting by using Billor, Hadi, and Velleman's 
(2000) blocked adaptive computationally efficient outlier nominators (BACON) method of 
robust parameter estimation to improve the T2 statistic's robustness to outliers. 
 
1.1 Special Considerations in Phase I Control Charting 
  A control charting application is typically divided into two distinct phases.  In Phase I, 
when little is known about a process being studied, the objective is to identify an in-control (IC) 
reference sample.  This involves retrospective analysis of a historical data set in order to 
eliminate any data points that do not accurately represent the routine operation of the process.  
The resulting data are described as in control because it is believed that all remaining variability 
in the process is inherent to the process itself and not due to assignable causes.  Upon completion 
of Phase I, the in-control reference sample is used to establish control limits for Phase II, the 
monitoring stage of a control charting application.  In Phase II, newly observed data points are 
successively compared to the control limits to identify significant departures from the in-control 
 
2 
 
state.  Should an observation fall outside the control limits, a search for an assignable cause is 
immediately undertaken.  If the change in process behavior can be linked to special causes or 
external factors, the process is deemed out of control (OC) and corrective action is implemented 
to fix the problem.   
 Prior to conducting any analysis in a control charting scenario, it is usually assumed that 
the unedited reference sample may contain out-of-control points and the control limits are 
unknown.  The challenging nature of a Phase I analysis under these conditions has been 
recognized since the earliest days of statistical process control.  Shewhart (1939, p. 76) said, "In 
the majority of practical instances, the most difficult job of all is to choose the sample that is to 
be used as the basis for establishing the tolerance range.  If one chooses such a sample without 
respect to the assignable causes present, it is practically impossible to establish a tolerance range 
that is not subject to a huge error."   
 Phase I control charts are designed with the goal of achieving a specified overall in-
control false alarm probability (FAP), defined as the probability of one or more observations 
plotting outside the control limits in the absence of assignable causes.  Phase I usually involves 
iteratively comparing the reference sample to trial control limits (corresponding to the desired 
overall in-control FAP) estimated from the sample.  At each iteration of a Phase I analysis, an 
out-of-control point is eliminated from the reference sample if an assignable cause is identified, 
and trial control limits are updated excluding the out-of-control point.  This iterative process 
continues until all points in the reference sample are in control.   
 Phase I analysis requires careful consideration when it involves methods such as 
Hotelling's T2 chart that compute independent control chart statistics consisting of individual 
observations.  Provided the observations originate from random sampling, the control chart 
 
3 
 
statistics are independent of one another.  However, because the control limits in Phase I are 
estimated from the reference sample itself, the control limits are dependent on each sample point 
included in their calculation.  Thus, simultaneous comparisons of chart statistics to control limits 
in Phase I are statistically dependent despite the control chart statistics themselves being 
independent.  These dependencies often make it difficult to correctly determine the overall in-
control FAP for a Phase I analysis.   
 Phase II, on the other hand, consists of comparing new observations (in the form of a 
control chart statistic) to the control limits previously established in Phase I.  Because the control 
limits in Phase II are fixed through conditioning, successive comparisons of chart statistics to 
control limits are independent provided the chart statistics are independent of one another as in 
the case of Hotelling's T2 chart.  This is in contrast to multivariate exponentially weighted 
moving average (MEWMA) or multivariate cumulative sum (MCUSUM) charts whose chart 
statistics include past observations and are therefore naturally dependent. 
 Chart performance in Phase II is often measured using moments of the run length (RL) 
distribution.  The RL is the number of observations until an out-of-control signal is observed.  If 
the comparisons of the chart statistics to the control limits are independent, the RL is a geometric 
random variable.  The expected value of the RL is equal to 1/?, where ? is equal to the 
probability that a single chart statistic plots outside the control limits in the absence of assignable 
causes.  The expected value of the RL is known as the average run length (ARL) and is 
commonly used to describe control chart performance in Phase II. 
  
 
4 
 
1.2 Hotelling's T2 Control Chart for Individuals -- the Classical Approach 
 Hotelling's T2 control chart may be used to detect outliers in a p-dimensional multivariate 
process, where each observation in the form of a p-vector 1( ,..., )ipXX?X  is assumed to come 
from a multivariate normal distribution when the process is in-control.   More specifically, each 
~ ( , ),ipNX ? ? where ? is the p-dimensional mean vector that defines the location of the process 
and ? is the positive definite p x p covariance matrix that specifies the dispersion of the process.  
Hotelling's T2 statistic is calculated for each iX  as 
 ? ? ? ?21 ,i i iT ??? ? ?XX??? (1.2.1) 
and subsequently compared to either a Phase I or Phase II upper control limit (UCL).   
 In a Phase II control charting application when ? and ? are known, the UCL for 
Hotelling's T2 chart is given by Montgomery (2005, p. 501) as  
 2,UCL ,p???  where ? = the desired in-control FAP.  (1.2.2) 
When ? and ? are unknown, as is usually the case in practice, they are typically replaced by the 
classical sample mean vector and sample covariance matrix estimated from an in-control 
reference sample consisting of n independent observations.  The classical estimators are defined 
as follows:  
 ? ? ? ?
11
11a n d .1nni i i
iinn??
?? ? ? ????X X S X X X X (1.2.3) 
Using these unbiased estimators, Hotelling's T2 statistic becomes 
 ? ? ? ?21i i iT ??? ? ?X X S X X (1.2.4) 
and according to Montgomery (2005, p. 501) has Phase II upper control limit  
 
5 
 
 
,,2( 1 ) ( 1 )U C L ,p n pp n n Fn n p ? ???? ?
 where ? = the desired in-control FAP. (1.2.5) 
 If an in-control reference sample is not available, a Phase I analysis must first be 
conducted using Hotelling's T2 statistic in Equation (1.2.4) and the following UCL specified by  
Tracy, Young, and Mason (1992):  
 2
, / 2 ,( 1 ) / 2( 1 )U C L ,p n pn n ?? ????
 where ? = the desired in-control FAP. (1.2.6) 
Note that in Phase I, ? represents the desired in-control FAP for each observation.  In order to set 
??to achieve a desired overall in-control FAP for all n observations in a reference data set, the 
following relation must be used: 
 ? ?1/1 1 .noverall??? ? ?  (1.2.7) 
For example, for a reference sample consisting of n = 50 observations and a desired overall in-
control FAP of 0.05, ? ?1 / 5 01 1 0 .0 5 0 .0 0 1 0 2 5? ? ? ? ? would be used in Equation (1.2.6) to 
determine the Phase I UCL.  If the purpose of Hotelling's T2 chart is solely to identify location 
shifts, the lower control limit (LCL) in both Phase I and Phase II is often defined as zero or not 
specified at all.  This is because location shifts in all directions result in increasingly positive T2 
values, so an LCL is not necessary.   
 As noted by Vargas (2003), Hotelling's T2 statistic using classical estimators in Phase I is 
effective in detecting a single moderately sized outlier.  However, its inability to detect multiple 
outliers has been well documented by Vargas (2003), Jobe and Pokojovy (2009), and Yanez, 
Gonzalez, and Vargas (2010), and its poor performance in detecting sustained shifts in the mean 
vector has been demonstrated by Sullivan and Woodall (1996) and Vargas (2003).  This is 
because the presence of even a single arbitrarily large outlier or a few moderately sized outliers 
 
6 
 
can significantly contaminate the parameter estimates and ,XS thus rendering the T2 statistic 
ineffective.  This is precisely the problem this research seeks to address. 
 
1.3 Previous Attempts to Improve the Robustness of Hotelling's T2 Chart 
  In recent years, improving the robustness of Hotelling's T2 chart for individual 
multivariate normally distributed data in Phase I has garnered much interest in multivariate 
quality control research.  Many proposed methods have been successful under certain conditions, 
but none have proven to be universally superior.  Thus, there is still room for improvement.  
Before detailing the BACON-based robust version of Hotelling's T2 chart proposed by this 
research, a brief summary of other robust versions of Hotelling's T2 chart for individual 
multivariate normally distributed data will be provided. 
 In one of the earliest attempts to improve the robustness of Hotelling's T2 control chart, 
Sullivan and Woodall (1996) proposed using vector differences between successive observations 
to estimate the in-control covariance matrix of a process, and showed that this method used in 
conjunction with Hotelling's T2 statistic results in enhanced detection ability of step (sudden) and 
ramp (gradual) shifts in the mean during retrospective analysis of a data set.  Later, Vargas 
(2003) evaluated the performance of five different types of robust estimators for use with 
Hotelling's T2 chart, including the minimum volume ellipsoid (MVE) estimators of Rousseeuw 
(1984) and Rousseeuw and Van Zomeren (1990), the minimum covariance determinant (MCD) 
method of Rousseeuw (1984) and Rousseeuw and Van Driessen (1999), a trimming approach 
based on Mahalanobis distance, the aforementioned method of vector differences proposed by 
Sullivan and Woodall (1996), and an outlier detection algorithm also proposed by Sullivan and 
Woodall (1996).   
 
7 
 
 Vargas (2003) ultimately recommended the MVE estimator for detecting multiple 
outliers and the Sullivan and Woodall (1996) successive differences estimator for identifying 
sustained shifts in the mean vector.  Jensen, Birch, and Woodall (2007) further detailed the 
advantages of the MVE and MCD methods as robust estimators and provided a detailed analysis 
of when to use each type of estimator with the T2 statistic in a Phase I control chart setting.  
Alfaro and Ortega (2008) proposed trimming each variable to obtain robust estimates for the 
location vector and covariance matrix, and then using those estimates in Hotelling's T2 chart with 
the Phase I UCL given in Equation (1.2.6) to provide enhanced outlier detection.  The method of 
Alfaro and Ortega (2008) demonstrated improvement over Hotelling's T2 chart using classical 
estimators, but no other performance comparisons were offered.  
 In one of the most comprehensive studies performed, Jobe and Pokojovy (2009) 
developed a computationally intensive two-step method of identifying the largest bulk of similar 
multivariate data from a time-ordered sequence of individual points, and used the estimated 
mean vector and covariance matrix from this bulk in the T2 statistic with empirical control limits.  
The authors compared the performance of Hotelling's T2 chart using their method, the classical 
method of parameter estimation, and the robust methods analyzed by Vargas (2003) and Jensen 
et al. (2007), showing that their method resulted in improved performance in detecting outliers as 
well as sustained shifts in location.  Based on these findings, the results of Jobe and Pokojovy 
(2009) will be used as the standard of comparison for the BACON-based version of Hotelling's 
T2 chart proposed by this research. 
  The most recent attempts to improve the robustness of Hotelling's T2 chart in Phase I 
include Oyeyemi and Ipinyomi's (2010) proposal to robustly estimate the covariance matrix by 
identifying a subset of data that meets specified optimality criteria, and then iteratively 
 
8 
 
expanding the subset to a predetermined size.  The method was shown to outperform the MVE 
and MCD methods in a limited number of cases, but only bivariate samples of size m = 30 were 
considered.  Yanez, Gonzalez, and Vargas (2010) proposed a T2 chart using biweight S 
estimators for location and scatter in conjunction with simulated limits, showing that it 
outperforms Hotelling's T2 chart with MVE estimators for small samples.   
 Other authors of robust Hotelling's T2 charts for individual multivariate normally 
distributed data focused their performance comparisons on Phase II rather than Phase I.  
Chenouri and Steiner (2009) proposed a robust Phase II Hotelling's T2 chart with simulated limits 
based on reweighted MCD (RMCD) estimators, as defined by Willems, Pison, Rousseeuw, and 
Van Alest (2002), obtained directly from an unedited reference sample.  The authors show that 
their version of Hotelling's T2 chart outperforms Hotelling's T2 chart using classical estimators, 
MVE estimators, and MCD estimators under certain conditions in Phase II.  Chenouri and 
Variyath (2011) built upon the work of Chenouri and Steiner (2009) by comparing Hotelling's T2 
chart using RMCD estimators to Hotelling's T2 chart using reweighted MVE (RMVE) and S 
estimators in Phase II, again favoring the RMCD estimators for location and scatter in most 
scenarios evaluated.  Mohammadi, Midi, Arasan, and Al-Talib (2011) also explored the merits of 
using RMCD and RMVE estimators in a Phase II Hotelling's T2 chart, recommending the RMVE 
method for small reference samples and the RMCD method for large reference samples.  
Assessment of a robust version of Hotelling's T2 control chart exclusively in terms of Phase II 
performance inherently assumes that a practitioner desires to use Hotelling's T2 chart in Phase II, 
which may or may not be the case in reality.  In order to allow a practitioner the flexibility to 
choose another proven Phase II method (e.g. the MEWMA or MCUSUM chart) after outlier 
 
9 
 
removal is completed in Phase I, this research will focus on Hotelling's T2 chart performance in 
Phase I only. 
 
1.4 Organization of Thesis 
 The remainder of this document is dedicated to the detailed development and application 
of a BACON-based robust version of Hotelling's T2 control chart for individual multivariate 
normally distributed data in Phase I.  Chapter 2 explores the properties of various robust 
parameter estimation methods including BACON, and addresses the design of a BACON-based 
Hotelling's T2 control chart.  Chapter 3 provides the simulation plan for assessing the BACON-
based Hotelling's T2 control chart's in- and out-of-control performance.  Chapter 4 contains the 
results of the simulation study and comparisons to several existing robust Hotelling's T2 control 
charts.  This thesis concludes in Chapter 5 with a synopsis of research conducted, 
recommendations to practitioners, and discussion of areas in need of further investigation. 
  
 
10 
 
 
 
 
 
 
 
2 Hotelling's T2 Chart Using Robust Estimates of Location and Scatter 
 Numerous robust parameter estimation methods have been used in control chart research, 
including the MVE, MCD, and several other methods mentioned in Chapter 1.  The BACON 
method was shown by Billor et al. (2000) to perform similarly to well known robust parameter 
estimation methods such as the MVE and MCD methods without the extreme computational 
burden, making it a seemingly ideal candidate for improving the robustness of Hotelling's T2 
chart.  The BACON method demonstrates excellent balance between computational complexity 
and robustness to outliers while also satisfying several other important statistical properties of 
robust parameter estimation methods.   
 
2.1 Desirable Properties of Robust Estimators 
 The performance of an estimator is commonly described by its finite-sample replacement 
breakdown point (RBP).  First defined by Donoho and Huber (1983), the RBP is the minimum 
fraction of a sample that must be replaced by outliers in order to completely ruin an estimate, so 
a low RBP indicates nonrobustness and a high RBP signifies robustness to outliers.  Precise 
definitions of RBPs for both location and scatter estimators are adapted from Donoho and Huber 
(1983) and Lopuhaa and Rousseeuw (1991).  Let ? ?1,...,nn?X XX be a random sample of size n 
in .pR   The RBP of a location estimator T at nX , or the smallest fraction k/n of outliers that can 
take the resulting estimate beyond any bound, is defined as 
 
11 
 
 
, ,
( ; ) m i n : s u p ( ) ( ) ,
nkn n n k
kR B P T T Tn??? ? ? ?????
XX X X
 (2.1.1) 
where ,nkX  is a contaminated sample found by replacing k points of nX  with arbitrary values.  
The RBP of a scatter estimator C at nX , or the smallest fraction k/n of outliers that can drive 
either the largest eigenvalue of the resulting estimate to infinity or the smallest eigenvalue of the 
resulting estimate to zero, is defined as 
 ? ?
, ,
( ; ) m i n : s u p ( ) , ( ) ,
nkn n n k
kR B P C M C Cn??? ? ?????
XX X X
 (2.1.2) 
where ,nkX  is defined as before, ? ?11
11( , ) m a x ( ) ( ) , ( ) ( ) ,ppM ? ? ? ???? ? ?A B A B A B
and 
1( ) ( )p????AA are the ordered eigenvalues of the matrix A.   
 To illustrate the idea of an RBP, consider a sample of size n in 1R  and two common 
location estimators:  the sample mean and the sample median.  The sample mean has an RBP of 
only 1/n because a single outlier could move the sample mean to infinity, so it is considered a 
nonrobust location estimator.  In contrast, the sample median has the highest possible RBP of 1/2 
because 1/2 of the sample would have to be contaminated with outliers in order to effect a 
corresponding shift in the sample median.  Consequently, the sample median is the preferred 
location estimator in 1R  from a robustness standpoint.   
 In addition to having a high RBP, a location or scatter estimator should also be affine 
equivariant.  From  Lopuhaa and Rousseeuw (1991), a location estimator T is affine equivariant 
if ( ) ( )TT? ? ?A X b A X b for any p-vector b and any p x p nonsingular matrix A, and a positive 
definite scatter estimator C is said to be affine equivariant if () TC C ( )??A X b A X Afor any p-
 
12 
 
vector b and any p x p nonsingular matrix A.  Affine equivariance means that an estimator does 
not depend on the location, scale, or orientation of the data.   
 
2.2 The MVE and MCD Methods of Robust Parameter Estimation 
 Rousseeuw's (1984) MVE method is an affine equivariant, computationally complex, 
robust parameter estimation technique.  The MVE method finds the ellipsoid of minimum 
volume that covers a subset of at least h points, and uses the geometrical center of the ellipsoid 
as the location estimator and the matrix defining the ellipsoid itself (multiplied by a constant) as 
the covariance matrix estimator.  Lopuhaa and Rousseeuw (1991) showed that the integer value 
of h = (n+p+1)/2 provides the highest possible RBP of [(n-p+1)/2]/n, which converges to 50% as 
.n??    
 Due to the computational complexity of Rousseeuw's (1984) original MVE method, 
Rousseeuw and Leroy (1987) proposed an alternative method that approximates MVE estimates 
using a subsampling algorithm.  With the subsampling method, a fixed number of subsets are 
first drawn from a data set.  Rousseeuw and Leroy (1987, p. 199) recommend at least 500 subsets 
for small data sets in low dimensions and even more subsets for larger n and p.  Next, the sample 
mean vector and sample covariance matrix are calculated for each subset.  This determines the 
shape of an ellipsoid, which is then increased in size through multiplication by a constant until 
the ellipsoid covers at least the required h data points.  Once this has been completed for each 
subset, the ellipsoid having the smallest volume is used to obtain the MVE estimates of location 
and scatter.   
 According to Jensen et al. (2007), the subsampling algorithm is widely used but suffers 
from repeatability issues.  More specifically, MVE estimates of location and scatter from the 
 
13 
 
same data set can vary widely depending on the number of subsets used in the subsampling 
algorithm.  The software package R was used in this research to calculate MVE estimates using 
Rousseeuw and Leroy's (1987) subsampling algorithm.  The MVE function in R was employed 
using default input arguments, which are believed to be h = (n+p+1)/2 and 500 subsamples, 
although the R documentation is not completely clear about this.  The fact that the MVE results 
determined by this research are somewhat different than those obtained by Vargas (2003) and 
Jensen et al. (2007) is likely due to the authors' use of different MVE input arguments. 
 Rousseeuw's (1984) MCD method, which is also robust, affine equivariant, and 
computationally complex, finds the subset of data that has the smallest covariance matrix 
determinant while covering a specified minimum number of points, h.  It then uses the sample 
mean vector and the sample covariance matrix (as in the MVE approach, also multiplied by a 
constant) of the points in the subset as estimators of location and scatter.  Rousseeuw and Van 
Driessen's (1999) FAST-MCD algorithm is a more computationally efficient version that 
approximates MCD estimates based on an iterative scheme involving randomly selected subsets 
of data.  According to Jensen et al. (2007), the FAST-MCD algorithm does not suffer from the 
same repeatability issues as the MVE subsampling method and is therefore a better estimator.   
Like the MVE method, the integer value of h = (n+p+1)/2 provides the highest possible RBP for 
the MCD method of [(n-p+1)/2]/n, which converges to 50% as .n??   Alternatively, h can be 
increased to 0.75n if the percentage of OC points is thought to be low.  This increases the 
efficiency of the estimator because more IC observations from the reference sample are being 
used.  All results in this research were derived using h = 0.75n because the vast majority of 
scenarios evaluated involve contamination levels less than or equal to 20%, so there is no need to 
sacrifice statistical efficiency for a higher RBP.  Using h = 0.75n, this research often achieved 
 
14 
 
better performance using the FAST-MCD method than Vargas (2003) and Jensen et al. (2007), 
especially for small samples with a low percentage of outliers.      
 
2.3 The BACON Method of Robust Parameter Estimation 
 The BACON method possesses all the desired properties of robust estimators and is very 
computationally efficient, even for extremely large data sets and higher dimensions.  It begins 
with a small outlier-free subset of the data and then allows this subset to grow rapidly until a 
stopping criteria is reached, sometimes taking only two to three iterations to converge to a 
satisfactory solution.  Two versions of this iterative forward selection method are available -- 
Version 2 which is nearly affine equivariant and has RBPs exceeding 40% for various 
combinations of dimension and sample size, and Version 1 which is completely affine 
equivariant with an approximate RBP of 20%.  MATLAB code for the BACON method is 
available from the authors upon request. 
 From Billor et al. (2000), the BACON algorithm is as follows: 
 Step 1:  Identify an initial basic subset of m > p observations that can safely be assumed 
free of outliers, where p is the dimension of the data and m is an integer chosen by the data 
analyst.  Billor et al. (2000) suggest m = cp, where c is a small integer (such as 4 or 5) chosen by 
the data analyst.   
 Using Version 1 of the BACON method, Step 1 involves computing the Mahalanobis 
distances 
 ? ? ? ? ? ?1, , 1 , . . . , ,i i id i n??? ? ? ?X S X X S X X (2.3.1) 
 
15 
 
where X and S are the classical mean vector and covariance matrix, respectively, of the n 
observations.  The m = cp observations with the smallest values of ? ?,id XS  are then nominated 
as a potential basic subset.  According to Billor et al. (2000), this start is not robust, but it is 
affine equivariant and computationally easy.  Furthermore, Billor et al. (2000) show that 
subsequent iterations make up for the nonrobust start as long as the fraction of outliers is 
relatively small (20% in 5 dimensions, 10% in 20 dimensions). 
 If Version 2 of the BACON method is used, Step 1 involves computing  
 , 1,..., ,i in??XM  (2.3.2) 
where M is the coordinatewise median and ?  is the vector norm.  The m = cp observations 
with the smallest values of i?XM are then nominated as a potential basic subset.  According 
to Billor et al. (2000), this start is robust but not affine equivariant (because the coordinatewise 
median is not affine equivariant) and slightly more computationally intensive because of the 
need to find medians in all directions.  However, Billor et al. (2000) also state that because 
subsequent iterations are affine equivariant, the overall procedure is nearly affine equivariant, 
and it achieves a high RBP of approximately 40%.   
 Step 2:  Fit an appropriate model to the basic subset, and from that model compute 
discrepancies for each of the observations.  This involves computing 
 ? ? ? ? ? ?1, , 1 , . . . , ,i b b i b b i bd i n??? ? ? ?X S X X S X X (2.3.3) 
where bX and bS  are the mean vector and covariance matrix, respectively, of the observations in 
the basic subset. 
 Step 3:  Find a larger basic subset consisting of observations known (by their 
discrepancies) to be homogeneous with the basic subset.  Generally, these are the observations 
 
16 
 
with the smallest discrepancies.  This new basic subset may omit some of the previous basic 
subset observations, but it must be as large as the previous basic subset. 
 In order to accomplish this, the new basic subset will include all points with 
discrepancies less than ,/,npr p nc ?? where 2,p??  is the 1 - ?  percentile of the chi-square 
distribution with p degrees of freedom, cnpr = cnp + chr is a correction factor, chr = max{0, (h - 
r)/(h + r)}, h = [(n + p + 1)/2], r is the size of the current basic subset, and  
 1 1 1 21 1 .
13np ppc n p n h p n p n p??? ? ? ? ? ?? ? ? ? ? ?
 (2.3.4) 
The parameter ?  can be set to any number between 0 and 1, but ? = 0.05 is suggested by Billor 
et al. (2000) for most applications.   
 Step 4:  Iterate Steps 2 and 3 until the size of the basic subset no longer changes. 
 Step 5:  Nominate the observations excluded by the final basic subset as outliers. 
 Once the final basic subset is determined, estimates of location and scatter using the 
classical formulas for X and S are computed and used in conjunction with Hotelling's T2 chart to 
perform a Phase I analysis of individual multivariate normally distributed data.  It is, however, 
important to recognize that the distribution of the T2 statistic using BACON estimates from the 
outlier-free data set is not the same as the distribution of the T2 statistic using classical estimates 
from the original data set.  The same is true when MVE, MCD, or other estimators are used in 
lieu of classical estimators.  In such cases, Tracy et al.'s (1992) Phase I UCL for Hotelling's T2 
chart given by Equation (1.2.6) is no longer appropriate.   
 Figure 2.3.1 is a graphical depiction of the results of incorrectly applying Tracy et al.'s 
(1992) Phase I UCL to Hotelling's T2 chart using several non-classical estimators of location and 
scatter.  In this example, the data are multivariate normally distributed and the desired IC FAP is 
 
17 
 
 
 
 
 
Figure 2.3.1  IC FAPs for Hotelling's T2  Chart Using Tracy et al.'s (1992) UCL 
 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
30 50 75 100 125 
Em
pir
ica
l IC 
FA
P 
Sample Size 
p = 3 
BACON 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
30 50 75 100 125 
Em
pir
ica
l IC 
FA
P 
Sample Size 
p = 5 
BACON 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
30 50 75 100 125 
Em
pirica
l I
C 
FAP
 
Sample Size 
p = 10 
BACON 
MVE 
MCD 
Classical 
 
18 
 
0.05, but only Hotelling's T2 chart using classical estimators maintains the desired IC FAP 
throughout the range of sample sizes and dimensions considered.  The empirical IC FAPs for 
Hotelling's T2 chart using BACON, MVE, and MCD estimators are in most cases significantly 
higher than the target of 0.05.  This effect is usually most pronounced with small sample sizes 
and worsens with increasing dimension.  If the distributions of the T2 statistics using BACON, 
MVE, and MCD estimators were known, corresponding Phase I UCLs for Hotelling's T2 chart 
could be derived mathematically.  Since the distribution of the T2 statistic using BACON 
estimators is unknown and the distribution of the T2 statistic using MVE or MCD estimators is 
known only asymptotically [Jensen et al. (2009, p. 20)], Phase I UCLs must be empirically 
determined through simulation.  
 
2.4 Empirical Control Limits for Hotelling's T2 Chart with Robust Estimators 
 For each combination of n and p, using a desired IC FAP of 0.05, empirical Phase I UCLs 
for Hotelling's T2 chart using BACON, MVE, and MCD estimators were determined using a 
methodology similar to the one outlined by Jensen et al. (2007): 
1) Simulate 100,000 sets of individual multivariate normally distributed data with zero mean 
vector and identity covariance matrix.  Due to the affine equivariance of the 2T  statistic, 
0 and I may be used without loss of generality, so the resulting limits are applicable for 
any ? and ?.    
2) For each data set, determine robust estimates of location and scatter, compute 2T  
statistics for all observations, and record the maximum 2T  statistic. 
 
19 
 
3) From the set of 100,000 maximum 2T  statistics, which represents the empirical 
distribution of the maximum 2T  statistic, determine the 95th percentile.  This represents 
the empirical UCL for a desired IC FAP of 0.05. 
A table of BACON, MVE, MCD, and classical control limits for all combinations of n and p 
used in this research is provided in Table 2.4.1, a table of input arguments for the BACON 
method is illustrated in Table 2.4.2, and MATLAB code for simulating additional control limits 
is provided in Appendix A.  The input parameters in Table 2.4.2 were established through trial 
and error in order to achieve the highest possible alarm probabilities under a variety of OC 
conditions. 
 
  
Table 2.4.1  Empirical and Classical UCLs for Hotelling's T2 Chart 
 
 
 
 
Table 2.4.2  BACON Input Arguments for Each Dimension Evaluated  
n p B A C O N  U C L M V E  U C L M C D  U C L C l as s i c al  U C L
30 2 21 .0 7 41 .6 5 27 .7 3 10 .5 5
30 3 24 .2 8 58 .6 5 41 .2 6 12 .2 1
50 3 22 .0 5 35 .3 9 28 .5 6 14 .1 4
100 3 21 .6 0 26 .1 5 24 .0 9 16 .4 1
30 5 34 .4 5 84 .2 6 69 .7 8 14 .9 2
50 5 29 .0 9 46 .4 9 43 .6 1 17 .4 1
100 5 27 .1 4 32 .5 6 31 .1 9 20 .2 1
30 10 20 .0 2 21 7. 91 19 2. 24 20 .0 5
50 10 49 .6 8 95 .3 5 10 3. 09 23 .9 8
100 10 40 .8 9 51 .3 2 52 .6 4 28 .0 9
D e s i r e d  I C  F A P  =  0 .0 5
p V e r s ion ? c
2 2 0.10 6
3 2 0.10 6
5 2 0.10 4
10 2 0.10 3
B A C O N  I n p u t  A r gu m e n t s
 
20 
 
 Repeating the experiment reflected in Figure 2.3.1 using the empirical UCLs for 
Hotelling's T2 chart with BACON, MVE, and MCD estimators in Table 2.4.1 yields the graphs 
pictured in Figure 2.4.1.  In this case, because the correct (empirical) UCLs were used, each 
robust version of Hotelling's T2 chart maintains the desired IC FAP of 0.05 throughout the range 
of sample sizes and dimensions evaluated.  Minor deviations from the target IC FAP of 0.05 are 
due to simulation noise.  All simulations were conducted using the MATLAB code for 
Hotelling's T2 chart provided in Appendix B, with the number of outliers set to zero. 
  
 
21 
 
 
 
 
 
Figure 2.4.1  Hotelling's T2 Chart IC FAPs Using Robust Estimators & Empirical UCLs 
  
0.00 
0.03 
0.05 
0.08 
0.10 
30 50 100 
Em
pir
ica
l IC 
FA
P 
Sample Size 
p = 3 
BACON 
MVE 
MCD 
0.00 
0.03 
0.05 
0.08 
0.10 
30 50 100 
Em
pirica
l I
C 
FAP
 
Sample Size 
p = 5 
BACON 
MVE 
MCD 
0.00 
0.03 
0.05 
0.08 
0.10 
30 50 100 
Em
pirica
l I
C 
FAP
 
Sample Size 
p = 10 
BACON 
MVE 
MCD 
 
22 
 
 
 
 
 
 
 
3 Evaluating Phase I Performance of Hotelling's T2 Chart 
 To assess the effectiveness of Hotelling's T2 chart using BACON estimators to establish 
an IC reference sample for individual multivariate normally distributed data, its performance was 
compared to Jobe and Pokojovy's (2009) cluster-based Hotelling's T2 chart.  This is a logical 
standard of performance because, as mentioned in Chapter 1, Jobe and Pokojovy's (2009) 
method was shown to be superior to previous attempts to improve the robustness of Hotelling's 
T2 chart in most cases tested.  Because the ultimate goal of this research is to match or exceed the 
performance of Jobe and Pokojovy's (2009) chart, the entire set of scenarios evaluated by Jobe 
and Pokojovy (2009) was replicated using Hotelling's T2 chart with BACON estimators.  Similar 
to Jobe and Pokojovy's (2009) design of experiments, Hotelling's T2 charts using MVE, MCD, 
and classical estimators were also simulated for comparison purposes.  
 
3.1 Simulating Multivariate Normally Distributed Data 
  Hotelling's T2 chart was tested on individual multivariate normally distributed data under 
a variety of OC conditions, including situations in which simulated data in p = 2, 3, 5, and 10 
dimensions contained a specified number of outliers as well as a scenario involving simulated 
bivariate data containing a 50% sustained shift of the mean.  Due to affine equivariance of the 
mean vector and covariance matrix, multivariate normal data were generated without loss of 
generality from the standard multivariate normal distribution, Np(0, I), where 0 is a p-
dimensional mean vector of all zeros and I is a p x p identity matrix.  The data were simulated 
 
23 
 
using MATLAB code from the MathWorks Statistics Toolbox at 
http://www.mathworks.com/help/toolbox/stats/.  A summary of all planned simulations is 
illustrated in Table 3.1.1, where p = dimension, n = sample size, k = number of outliers, and NCP 
= shift size.  
 
 
Table 3.1.1  Summary of Planned Experiments 
 
 
 
3.2 Assessing Out-of-Control Performance 
 Hotelling's T2 charts using BACON, MVE, MCD, and classical estimators were first 
evaluated in terms of their ability to detect k randomly occurring outliers, or mean-shifted data 
points.  The k outliers were randomly arranged throughout the data in order to achieve 
consistency with Jobe and Pokojovy's (2009) simulation methodology.  In general, the detection 
power of Hotelling's T2 chart in Phase I is unaffected by the placement of outliers.  This is 
because most parameter estimation methods (e.g. BACON, MVE, MCD, and classical methods) 
are unaffected by the time order of data, and Hotelling's T2 chart in Phase I does not consider the 
time order of data when making simultaneous comparisons of all control chart statistics to the 
UCL.  In contrast, the clustering method of parameter estimation used by Jobe and Pokojovy 
(2009) in their version of Hotelling's T2 chart does account for the sequencing of data, hence the 
necessity of randomizing outliers in their simulation methodology. 
p n k N C P
30 1( 2) 7 5( 5) 30
30 15 4, 5( 5) 30
30 2( 2) 6 5( 10) 25
50 2, 5, 10 5( 10) 25
100 5, 10, 20 5( 10) 25
2
3, 5, 10
 
24 
 
  Next, Hotelling's T2 charts using BACON, MVE, MCD, and classical estimators were 
evaluated in terms of their ability to detect a 50% sustained shift of the mean occurring in the 
latter half of a data set.  Sustained shifts can be induced anywhere in the data set without loss of 
generality, but were placed at the end of each data set for purposes of consistency with Jobe and 
Pokojovy's (2009) methodology.  It should be noted that this level of contamination exceeds the 
BACON method's maximum RBP of 40%, so a significant degradation in performance of 
Hotelling's T2 chart using BACON estimators was expected.   
 The magnitude of each shift was measured by the noncentrality parameter  
 1NCP ? ?????  (3.2.1) 
where the process mean vector shifts from o?  to o??? and ?  is the process covariance matrix.  
Because the direction of a shift does not affect control chart performance with elliptically 
symmetric distributions, shifts were fixed in the direction of ? ?1 1,0,...,0?e  without loss of 
generality [Stoumbos and Sullivan (2002), p. 265].   
 OC performance of Hotelling's T2 charts using BACON, MVE, MCD, and classical 
estimators was quantified in terms of the empirical alarm probability (EAP), where EAP is 
defined as the estimated probability of a chart signaling at least once in an OC situation.  Ideally, 
a control chart's EAP should be 100% for all scenarios involving outliers.  The algorithm for 
simulating OC performance of Hotelling's T2 chart is as follows: 
1) Simulate n observations from a p-dimensional standard normal distribution. 
2) Shift a specified number of randomly selected observations by the desired NCP. 
 
25 
 
3) Compute a 2T  statistic for each observation and compare it to the empirical UCL if 
robust estimators are used or Tracy et al.?s (1992) Phase I UCL if classical estimators 
are used.  If at least one 2T  statistic exceeds the UCL, increment a counter by one. 
4) Repeat steps 1 - 3 a total of 10,000 times. 
5) Estimate the overall EAP = (final counter value)/10,000. 
This process was repeated for all experiments listed in Table 3.1.1.  The MATLAB program for 
simulating Hotelling's T2 chart performance is provided in Appendix B. 
 
  
 
26 
 
 
 
 
 
 
 
4 Comparing the BACON-Based Hotelling's T2 Chart to Other Robust Versions 
 Through a variety of scenarios involving randomly occurring outliers, a 50% sustained 
shift of the mean, and an example application to a bivariate data set, Hotelling's T2 chart with 
BACON estimators was compared to Hotelling's T2 chart using Jobe and Pokojovy's (2009) 
clustering method as well as the MVE, MCD, and classical methods included in Jobe and 
Pokojovy's (2009) research.  Due to unavailability of complete computer code for Jobe and 
Pokojovy's (2009) algorithm, simulation results for Hotelling's T2 chart using clustering were 
taken directly from Jobe and Pokojovy (2009).     
For comparison to their cluster-based chart, Jobe and Pokojovy (2009) took tables of 
simulation results for Hotelling's T2 chart using the MVE, MCD, and classical methods directly 
from Vargas (2003).  However, this is problematic because it is unknown whether Vargas (2003) 
and Jobe and Pokojovy (2009) employed equivalent simulation methodologies (software, input 
arguments, number of iterations, etc.).  As discussed in Chapter 2, the choice of input arguments 
can have a substantial impact on the performance of the MVE and MCD methods.  In contrast, 
simulation results for Hotelling's T2 chart using the MVE, MCD, and classical methods were 
properly recreated here using the simulation methodology outlined in Chapter 3.   
 
4.1 Detecting Randomly Occurring Outliers 
 The first set of performance comparisons involved the detection of k randomly occurring 
outliers in a bivariate normally distributed data set of size n = 30.  As indicated in the top panel 
 
27 
 
of Figure 4.1.1, Hotelling's T2 chart with classical estimators is superior when only one outlier is 
present, although Hotelling's T2 chart using BACON estimators and Jobe and Pokojovy's (2009) 
cluster-based chart offer only slightly lesser performance.  As illustrated in the bottom panel of 
Figure 4.1.1, however, the performance of Hotelling's T2 chart with classical estimators quickly 
degrades as k is increased from 1 to 3, whereas Hotelling's T2 chart using MCD estimators, 
BACON estimators, and Jobe and Pokojovy's (2009) clustering method continue to perform 
exceptionally well. 
 
 
 
 
Figure 4.1.1  Control Chart Performance on Bivariate Normal Data with k = 1, 3 Outliers 
 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 10 15 20 25 30 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 1, n = 30, p = 2 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 10 15 20 25 30 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 3, n = 30, p = 2 
BACON 
J & P 
MVE 
MCD 
Classical 
 
28 
 
 As depicted in the top panel of Figure 4.1.2, when the number of outliers is increased to 
5, Hotelling's T2 chart using MCD estimators becomes the best option, although Jobe and 
Pokojovy's (2009) cluster-based chart remains competitive.  When the number of outliers is 
increased to 7 as shown in the bottom panel of Figure 4.1.2, Jobe and Pokojovy's (2009) cluster-
based chart is slightly better than Hotelling's T2 chart using MVE or MCD estimators.  For both k 
= 5 and k = 7, the performance of Hotelling's T2 chart  using BACON estimators falls behind the 
other robust methods, especially for small to moderate NCPs.   
 
 
 
 
Figure 4.1.2  Control Chart Performance on Bivariate Normal Data with k = 5, 7 Outliers 
 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 10 15 20 25 30 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 5, n = 30, p = 2 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 10 15 20 25 30 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 7, n = 30, p = 2 
BACON 
J & P 
MVE 
MCD 
Classical 
 
29 
 
 Trends similar to those witnessed with bivariate normally distributed data are also 
observed in the three-dimensional scenarios summarized in Table 4.1.1, where the highest EAP 
for each scenario is in bold.  Hotelling's T2 chart using BACON estimators provides excellent 
performance when the number of outliers is very small.  However, as the number of outliers is 
increased, Jobe and Pokojovy's (2009) cluster-based chart achieves superiority over all other 
methods. 
 
 
 
Table 4.1.1  Empirical Alarm Probabilities for p = 3 
 
 This trend is illustrated in Figure 4.1.3, which represents control chart performance using 
a three-dimensional normally distributed sample of size n = 30 with increasing k.  In the case of k 
= 2 outliers, Hotelling's T2 chart using BACON estimators and Jobe and Pokojovy's (2009) 
cluster-based chart provide nearly identical performance.  Both charts are substantially better 
than Hotelling's T2 chart using MCD, MVE, or classical estimators.  As k is increased to 6, 
however, the performance of Hotelling's T2 chart using BACON estimators drops below the 
performance of Hotelling's T2 chart using MVE or MCD estimators, whereas Jobe and 
M e t h od N C P k  =  2 k  =  4 k  =  6 k  =  2 k  =  5 k  =  10 k  =  5 k  =  10 k  =  20
p  =  3
B A C O N 5 0.0879 0.0706 0.0572 0.1027 0.0908 0.0476 0.1264 0.0904 0.0517
J  &  P 5 0.0710 0.0680 0.0730 0.0950 0.0890 0.0850 0.1410 0.1790 0.1760
M V E 5 0.0640 0.0631 0.0595 0.0770 0.0866 0.0659 0.1254 0.1221 0.0715
M C D 5 0.0782 0.0702 0.0660 0.0952 0.0907 0.0638 0.1240 0.1234 0.0588
C l a s s i c a l 5 0.0869 0.0663 0.0498 0.0988 0.0772 0.0462 0.1158 0.0830 0.0475
B A C O N 15 0.3871 0.2103 0.1194 0.5239 0.3056 0.0701 0.6671 0.2995 0.0478
J  &  P 15 0.3800 0.3770 0.3950 0.4860 0.6190 0.6640 0.7440 0.8490 0.9170
M V E 15 0.2031 0.2185 0.1904 0.3579 0.4550 0.3550 0.7194 0.7674 0.5733
M C D 15 0.3022 0.3162 0.2294 0.4780 0.5708 0.3362 0.7588 0.7864 0.3878
C l a s s i c a l 15 0.2627 0.0830 0.0475 0.4091 0.1391 0.0418 0.4205 0.1387 0.0435
B A C O N 25 0.7476 0.5382 0.3921 0.8846 0.6992 0.3191 0.9712 0.6730 0.1496
J  &  P 25 0.7470 0.7830 0.8070 0.8560 0.9490 0.9660 0.9870 0.9970 0.9990
M V E 25 0.4145 0.4634 0.4416 0.7044 0.8405 0.7999 0.9798 0.9928 0.9692
M C D 25 0.6084 0.6778 0.6056 0.8336 0.9260 0.8170 0.9874 0.9968 0.9266
C l a s s i c a l 25 0.4460 0.0897 0.0450 0.7469 0.1689 0.0468 0.6963 0.1633 0.0380
n  =  30 n  =  50 n  =  100
 
30 
 
Pokojovy's (2009) cluster-based chart maintains a significant performance advantage over all 
other methods.   
 
 
 
 
Figure 4.1.3  Control Chart Performance with k Outliers When n = 30 and p = 3 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 2, n = 30, p = 3 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 4, n = 30, p = 3 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 6, n = 30, p = 3 
BACON 
J & P 
MVE 
MCD 
Classical 
 
31 
 
 In all simulations performed, it was noted that for a given (k, n, p) combination, the 
performance of Hotelling's T2 chart using BACON estimators improved as the NCP was 
increased, which is what one would expect when using a robust parameter estimation method.  In 
general, OC points with the largest NCPs should be detected more frequently than those with 
small to moderate NCPs because they are farthest from the center of the data as defined by the 
robust mean estimate.  However, as shown in Figure 4.1.4, for a given (n, p, NCP) combination, 
the performance of Hotelling's T2 chart using BACON estimators actually worsened as k was 
increased, which is somewhat counterintuitive.  One would expect the presence of more OC 
points to result in a higher probability of an alarm, assuming the OC points are correctly 
excluded from computation of the T2 statistic by the BACON method.   
     
  
Figure 4.1.4  Effect of Increasing k on Control Chart Performance 
 
 These trends together suggest that the BACON method is unable to consistently exclude 
OC points with small to moderate NCPs from a data set, but rather performs optimally when the 
NCP corresponding to an OC point is very large.  In order to confirm this, additional simulations 
were performed using NCPs beyond the range of those tested by Jobe and Pokojovy (2009).  
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
k = 2 k = 4 k = 6 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Outliers 
n = 30, p = 3, NCP = 25 
BACON 
J & P 
MVE 
MCD 
Classical 
 
32 
 
Expanding the range of NCPs from NCP = 5(10)25 to NCP = 5(10)55, the performance of 
Hotelling's T2 charts using BACON, MVE, MCD, and classical estimators are displayed in 
Figure 4.1.5.  As suspected, even with a large percentage of outliers (6/30 = 20% in this case), 
Hotelling's T2 chart with BACON estimators gets progressively better as the NCP is increased, 
and is second only to Hotelling's T2 chart with MCD estimators when the NCP > 25.  Because of 
its comparatively low computational burden, Hotelling's T2 chart with BACON estimators is 
therefore an excellent option when it is suspected that extreme outliers are present in the data.  
Jobe and Pokojovy's (2009) cluster-based chart was not included in this performance comparison 
due to lack of availability of their simulation algorithm. 
 
 
Figure 4.1.5  Control Chart Performance with Extreme Outliers 
 
Table 4.1.2 shows empirical alarm probabilities for all charts evaluated using normally 
distributed data in five dimensions, with the highest EAP for each scenario in bold.  As with the 
two- and three-dimensional cases, Hotelling's T2 chart using BACON estimators is competitive 
as long as k is small relative to n, but Jobe and Pokojovy's (2009) cluster-based chart provides 
the best overall performance across the range of scenarios evaluated.  This is further illustrated 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 35 45 55 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 6, n = 30, p = 3 
BACON 
MVE 
MCD 
Classical 
 
33 
 
for sample size n = 50 in Figure 4.1.6.  In comparison to the results obtained in two and three 
dimensions, there is a loss of detection power observed for all control charts in five dimensions.  
This is to be expected because outliers are known to be more difficult to detect in higher 
dimensions than in low dimensions. 
 
 
 
Table 4.1.2  Empirical Alarm Probabilities for p = 5 
 
 
 
  
M e t h od N C P k  =  2 k  =  4 k  =  6 k  =  2 k  =  5 k  =  10 k  =  5 k  =  10 k  =  20
p  =  5
B A C O N 5 0.0696 0.0614 0.0570 0.0759 0.0669 0.0523 0.0938 0.0739 0.0489
J  &  P 5 0.0830 0.0670 0.0590 0.0840 0.0920 0.0690 0.0900 0.1070 0.0840
M V E 5 0.0568 0.0571 0.0496 0.0666 0.0644 0.0561 0.0945 0.0886 0.0525
M C D 5 0.0684 0.0598 0.0598 0.0700 0.0724 0.0548 0.0924 0.0758 0.0498
C l a s s i c a l 5 0.0704 0.0584 0.0541 0.0792 0.0650 0.0484 0.0886 0.0683 0.0513
B A C O N 15 0.2175 0.1073 0.0652 0.3428 0.1531 0.0548 0.4475 0.1507 0.0504
J  &  P 15 0.2270 0.2550 0.2200 0.3820 0.3690 0.3450 0.6130 0.7280 0.7610
M V E 15 0.1312 0.1238 0.0915 0.2386 0.2744 0.1333 0.5592 0.5305 0.1877
M C D 15 0.1916 0.1764 0.1002 0.2834 0.3384 0.1616 0.6018 0.5884 0.2012
C l a s s i c a l 15 0.1649 0.0673 0.0503 0.2717 0.1017 0.0501 0.2689 0.0999 0.0476
B A C O N 25 0.4723 0.2611 0.1918 0.7114 0.3937 0.1444 0.8363 0.3271 0.0688
J  &  P 25 0.5030 0.5450 0.5480 0.7410 0.8110 0.8230 0.9690 0.9930 0.9910
M V E 25 0.2639 0.2667 0.1896 0.5443 0.6441 0.3683 0.9302 0.9370 0.5694
M C D 25 0.3904 0.3874 0.2544 0.6122 0.7586 0.5420 0.9586 0.9740 0.8046
C l a s s i c a l 25 0.2638 0.0701 0.0490 0.5284 0.1190 0.0433 0.4623 0.1135 0.0461
n  =  30 n  =  50 n  =  100
 
34 
 
 
 
Figure 4.1.6  Control Chart Performance with k Outliers When n = 50 and p = 5 
 
 The loss of detection power in higher dimensions is most evident in the results for the 
ten-dimensional scenarios provided in Table 4.1.3.  With most charts depicted in Table 4.1.3, the 
detection power is relatively poor.  In fact, for NCP = 5, the highest EAPs for all n and k are 
barely above the IC FAP of 0.05.  Even Jobe and Pokojovy's (2009) cluster-based chart provides 
acceptable performance only for the largest sample size and shift evaluated (n = 100, NCP = 25).  
 
 
 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 2, n = 50, p = 5 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pir
ica
l A
lar
m 
Pr
oba
bil
ity
 
Noncentrality Parameter 
k = 5, n = 50, p = 5 
BACON 
J & P 
MVE 
MCD 
Classical 
 
35 
 
 
Table 4.1.3  Empirical Alarm Probabilities for p = 10 
 
 The difficulty in detecting shifts in higher dimension (especially with very small samples) 
is further illustrated in Figure 4.1.7, which represents ten-dimensional normally distributed data 
with various sample sizes containing 20% outliers.  Again, most charts are unable to detect the 
presence of the shifted data, although Jobe and Pokojovy's (2009) cluster-based chart has some 
success in doing so as the sample size, number of outliers, and NCP are raised.    
M e t h od N C P k  =  2 k  =  4 k  =  6 k  =  2 k  =  5 k  =  10 k  =  5 k  =  10 k  =  20
p  =  10
B A C O N 5 0.0544 0.0569 0.0570 0.0584 0.0588 0.0490 0.0671 0.0601 0.0516
J  &  P 5 0.0610 0.0570 0.0530 0.0690 0.0730 0.0580 0.0680 0.0490 0.0530
M V E 5 0.0555 0.0562 0.0508 0.0509 0.0550 0.0498 0.0604 0.0569 0.0493
M C D 5 0.0528 0.0496 0.0536 0.0490 0.0460 0.0422 0.0612 0.0608 0.0474
C l a s s i c a l 5 0.0585 0.0497 0.0472 0.0611 0.0620 0.0522 0.0648 0.0593 0.0460
B A C O N 15 0.0874 0.0624 0.0503 0.1443 0.0744 0.0519 0.1819 0.0871 0.0551
J  &  P 15 0.0790 0.0880 0.0960 0.1810 0.2120 0.2020 0.2190 0.2300 0.2100
M V E 15 0.0723 0.0578 0.0541 0.0925 0.0859 0.0542 0.2433 0.1597 0.0557
M C D 15 0.0740 0.0604 0.0550 0.0938 0.0970 0.0576 0.2794 0.2662 0.0690
C l a s s i c a l 15 0.0860 0.0528 0.0530 0.1320 0.0719 0.0525 0.1334 0.0772 0.0443
B A C O N 25 0.1150 0.0576 0.0552 0.3213 0.1021 0.0537 0.4046 0.1089 0.0500
J  &  P 25 0.1600 0.1760 0.1710 0.4160 0.5010 0.5000 0.7360 0.8060 0.8140
M V E 25 0.0979 0.0670 0.0591 0.1939 0.1546 0.0574 0.5923 0.3966 0.0754
M C D 25 0.1326 0.1022 0.0532 0.2176 0.2594 0.0830 0.7256 0.7650 0.2610
C l a s s i c a l 25 0.1065 0.0610 0.0519 0.2510 0.0810 0.0506 0.2240 0.0849 0.0492
n  =  30 n  =  50 n  =  100
 
36 
 
 
 
 
 
Figure 4.1.7  Control Chart Performance with 20% Outliers in Ten Dimensions 
 
 In order to visualize the changes in detection power that occur as the dimension is 
increased, Figure 4.1.8 shows control chart performance with k = 5 outliers using a sample size 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 6, n = 30, p = 10 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 10, n = 50, p = 10 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 20, n = 100, p = 10 
BACON 
J & P 
MVE 
MCD 
Classical 
 
37 
 
of n = 100 in both three and ten dimensions.  From the scenario depicted in Figure 4.1.8, it is 
evident that the detection power for each variation of Hotelling's T2 chart is much greater when p 
= 3 than it is when p = 10.   
 
 
 
 
Figure 4.1.8  Effect of Increasing Dimension on Control Chart Performance 
 
 
 
4.2 Detecting a Sustained Shift of the Mean 
 Following the outline of Jobe and Pokojovy (2009), the next scenario evaluated involved 
a sample of 30 bivariate normally distributed observations with a sustained shift of the mean 
occurring in the latter half of the data (so k = 15).  The numerical results are provided in Table 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 5, n = 100, p = 3 
BACON 
J & P 
MVE 
MCD 
Classical 
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
5 15 25 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 5, n = 100, p = 10 
BACON 
J & P 
MVE 
MCD 
Classical 
 
38 
 
4.2.1 where the highest EAP for each NCP is in bold, and the graphical results are portrayed in 
Figure 4.2.1.  Hotelling's T2 chart using BACON estimators was not expected to perform well 
since the maximum RBP of the BACON method is approximately 40%.  This proved to be the 
case, although the BACON method did perform significantly better than the MVE, MCD, and 
classical methods for the larger NCPs evaluated.  Jobe and Pokojovy's (2009) cluster-based chart 
demonstrated the best performance overall.  Hotelling's T2 chart using MVE, MCD, and classical 
estimators were unable to detect the presence of such a high level of contamination, producing 
EAPs that were in most cases even lower than the desired IC FAP of 0.05. 
 
 
Table 4.2.1  Empirical Alarm Probabilities Under a 50% Sustained Shift of the Mean 
 
  
Figure 4.2.1  Control Chart Performance Under a 50% Sustained Shift of the Mean 
 
 
M e t h o d  \  N C P 4 5 10 15 20 25 30
B A C O N 0 .0 3 3 6 0 .0 3 1 3 0 .0 2 8 7 0 .0 5 0 0 0 .0 9 0 0 0 .1 6 2 2 0 .2 4 8 3
J  &  P 0 .2 6 5 0 0 .3 5 6 0 0 .6 9 3 0 0 .8 7 3 0 0 .9 5 2 0 0 .9 8 2 0 0 .9 8 6 0
M V E 0 .0 3 9 9 0 .0 3 9 8 0 .0 4 4 3 0 .0 4 2 5 0 .0 4 3 1 0 .0 5 1 0 0 .0 4 8 3
M C D 0 .0 3 9 8 0 .0 3 0 4 0 .0 3 1 2 0 .0 3 2 0 0 .0 3 2 4 0 .0 3 7 0 0 .0 3 2 0
C l a s s i c a l 0 .0 3 1 6 0 .0 2 8 3 0 .0 2 4 2 0 .0 2 2 8 0 .0 2 1 5 0 .0 2 2 5 0 .0 2 3 0
0.0 0.1 
0.2 0.3 
0.4 0.5 
0.6 0.7 
0.8 0.9 
1.0 
4 5 10 15 20 25 30 
Em
pirica
l Ala
rm
 P
ro
ba
bil
ity
 
Noncentrality Parameter 
k = 15, n = 30, p = 2 
BACON 
J & P 
MVE 
MCD 
Classical 
 
39 
 
4.3 Application to an Example Data Set 
 The final performance evaluation for Hotelling's T2 chart using BACON estimators 
involved its application to the bivariate data set depicted in Table 4.3.1.  The original data set 
was presented by Quesenberry (2001) in 11 variables, but later pared down to two variables by 
Vargas (2003) for purposes of comparing the performance of Hotelling's T2 chart using MVE, 
MCD, and classical estimators as well as two methods proposed by Sullivan and Woodall 
(1996).  Jobe and Pokojovy (2009) subsequently used the bivariate data set from Vargas (2003) 
to compare the effectiveness of their cluster-based chart to the five aforementioned methods 
evaluated by Vargas (2003).   
 Simulation exercises resulted in observation 2 being identified as an outlier by all 
versions of Hotelling's T2 chart.  This is contrary to the findings of Vargas (2003), who reported 
that Hotelling's T2 chart using MCD estimators failed to identify the lone outlier.  As discussed in 
Chapter 2, this is probably because the MCD method in this research used a higher percentage of 
the sample than Vargas (2003) to compute more accurate estimates of location and scatter. 
 
 
Table 4.3.1  Example Bivariate Data Set 
 
i 1 2 3 4 5 6 7 8 9 10
x 1 0. 56 7 0. 53 8 0. 53 0 0. 56 2 0. 48 3 0. 52 5 0. 55 6 0. 58 6 0. 54 7 0. 53 1
x 2 60 .5 58 56 .3 03 59 .5 24 61 .1 02 59 .8 34 60 .2 28 60 .7 56 59 .8 23 60 .1 53 60 .6 40
i 11 12 13 14 15 16 17 18 19 20
x 1 0. 58 1 0. 58 3 0. 54 0 0. 45 8 0. 55 4 0. 46 9 0. 47 1 0. 45 7 0. 56 5 0. 66 4
x 2 59 .7 85 59 .6 75 60 .4 89 61 .0 67 59 .7 88 58 .6 40 59 .5 74 59 .7 18 60 .9 01 60 .1 80
i 21 22 23 24 25 26 27 28 29 30
x 1 0. 60 0 0. 58 6 0. 56 7 0. 49 6 0. 48 5 0. 57 3 0. 52 0 0. 55 6 0. 53 9 0. 55 4
x 2 60 .4 93 58 .3 70 60 .2 16 60 .2 14 59 .5 00 60 .0 52 59 .5 01 58 .4 76 58 .6 66 60 .2 39
 
40 
 
 Next, imitating the analysis of Vargas (2003), the bivariate sample was altered by 
changing observation 16 to (0.469, 56.23) and observation 24 to (0.496, 56.08) to make them 
outliers.  Of all the variations of Hotelling's T2 chart evaluated by Vargas (2003) and Jobe and 
Pokojovy (2009), as well as the BACON-based alternative presented here, only the clustering 
and BACON versions correctly identified all three observations (2, 16, and 24) as outliers.  The 
classical, MVE, MCD, and both Sullivan and Woodall (1996) versions of Hotelling's T2 chart 
failed to detect at least one of the three outlying observations.  Again, this is contrary to Vargas 
(2003), who determined that Hotelling's T2 chart using MVE estimators correctly identified all 
three outliers and that Hotelling's T2 chart using MCD estimators failed to do so.  As previously 
stated, this is likely due to each author using different MVE and MCD input arguments.   
 Control chart statistics for Hotelling's T2 chart using BACON estimators for the original 
and altered samples are provided in Table 4.3.2.  Entries in bold indicate control chart statistics 
that exceed the empirical UCL of 21.07 and therefore signal a potential out-of-control condition.  
The corresponding control chart for the altered data is presented in Figure 4.3.1. 
 
 
 
Table 4.3.2  T2 Statistics for Original and Altered Samples Using BACON Estimators 
 
 
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
O r i g i n a l 0 .9 2 2 4 .9 6 0 .3 5 2 .6 1 1 .5 1 0 .3 1 1 .2 9 0 .9 3 0 .0 9 1 .0 3 0 .7 7 0 .9 6 0 .5 9 6 .1 1 0 .1 2
A l t e r e d 0 .8 7 2 6 .6 8 0 .5 1 2 .6 2 1 .8 7 0 .3 4 1 .2 5 0 .8 0 0 .0 6 0 .9 9 0 .6 5 0 .8 3 0 .5 4 6 .0 9 0 .1 0
i 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
O r i g i n a l 4 .9 5 2 .3 0 3 .1 5 1 .8 7 6 .5 9 1 .9 0 5 .9 6 0 .3 9 1 .1 5 1 .6 3 0 .4 4 0 .5 1 4 .2 7 3 .0 4 0 .2 2
A l t e r e d 3 0 .1 5 2 .8 9 3 .7 8 1 .8 5 6 .5 5 1 .8 6 5 .9 3 0 .3 2 3 0 .9 4 2 .1 4 0 .3 5 0 .7 4 4 .5 1 3 .4 0 0 .1 7
 
41 
 
 
Figure 4.3.1  Application of the BACON-Based Hotelling's T2 Chart to Altered Data 
 
 
  
0 
5 
10 
15 
20 
25 
30 
35 
0 5 10 15 20 25 30 
T2
 Sta
tis
tic
 
Observation 
Hotelling's T2 Chart Using BACON 
UCL = 21.07 
 
42 
 
 
 
 
 
 
 
5 Summary and Conclusions 
     Hotelling's T2 chart using BACON estimators, although ultimately unsuccessful in 
exceeding the standard of performance established by Jobe and Pokojovy's (2009) cluster-based 
chart, was shown to be a viable option for Phase I analysis of individual multivariate normally 
distributed data under certain conditions.  If the level of contamination in a reference data set of 
size n = 30 - 100 is thought to be relatively small (less than 5 - 10%, depending on the sample 
size and dimension), or if any number of outliers are suspected to be extremely large (as 
measured by their NCP, and also dependent on n and p), Hotelling's T2 chart using BACON 
estimators offers good performance with a low computational burden.  This is in contrast to other 
robust parameter estimation methods such as MVE and MCD which, as discussed by Billor et al. 
(2000), are significantly more computationally complex.   
 If, on the other hand, a more universally robust procedure is desired, Jobe and Pokojovy's 
(2009) cluster-based chart was shown to be extremely effective across a wide range of scenarios, 
including the presence of a 50% sustained shift of the mean -- a level of contamination which for 
most methods renders IC data indistinguishable from OC data.  However, Jobe and Pokojovy's 
(2009) control chart is not without potential drawbacks.  The relatively complicated clustering 
algorithm may not be easy to understand for practitioners, and the computational complexity of 
the procedure is not detailed by the authors.  Furthermore, because complete computer code for 
Jobe and Pokojovy's (2009) cluster-based chart is not readily available, their simulation results 
cannot be validated.  It is therefore impossible to give their method an unqualified endorsement 
 
43 
 
at this time.  While it is assumed that the general observations regarding the comparative 
performance of Jobe and Pokojovy's (2009) chart to other robust Hotelling's T2 charts are valid, 
detailed conclusions cannot be drawn without replicating Jobe and Pokojovy's (2009) simulation 
results using the process outlined in Chapter 3.  This is a topic for future research.   
 Another area that merits further investigation is the effect of sample size on control chart 
performance.  In accordance with Jobe and Pokojovy's (2009) experimental design, only sample 
sizes of n = 30, 50, and 100 were evaluated in this research.  Though they may be commonly 
encountered in Phase I scenarios, these are very small sample sizes for all but the smallest 
dimensions considered in this research (p = 2, 3, 5, and 10).  Billor et al. (2000) originally 
designed the BACON method to be a computationally efficient robust parameter estimation 
method for extremely large data sets in higher dimensions, and the authors offered a variety of 
simulation results for n = 100 - 10,000 and p = 5 - 20 to demonstrate its performance.  The 
authors noted that the BACON method matched the performance of the MVE, MCD, and other 
robust parameter estimation methods on all published test problems, but at a mere fraction of the 
computational expense.  If used as originally intended (with large n and p), BACON estimators 
in conjunction with Hotelling's T2 chart might become a more attractive option than Hotelling's 
T2 chart using MVE or MCD estimators and perhaps even Jobe and Pokojovy's (2009) cluster-
based chart, both in terms of speed and accuracy. 
 In conclusion, improving the robustness of Hotelling's T2 chart applied to individual 
multivariate normally distributed data in Phase I has been the subject of much research over the 
years, yet this study has shown that many questions remain unanswered.  Numerous variations of 
Hotelling's T2 chart have been proposed, but none so far appear to have attained a balance of 
accuracy, robustness, computational complexity, and ease of implementation for a wide range of 
 
44 
 
n and p.  The BACON-based version of Hotelling's T2 chart presented here, despite falling short 
of its original performance objectives, provides a valuable contribution by demonstrating its 
strengths and weaknesses as a Phase I method when sample sizes are small, and in the process 
identifying several areas worthy of additional research.  
 
45 
 
 
 
 
 
 
 
References 
Alfaro, J.L., & Ortega, J.F. (2008). A Robust Alternative to Hotelling's T2 Control Chart Using 
Trimmed Estimators. Quality and Reliability Engineering International, 24, 601-611. 
 
Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate Statistical Process Control Charts:  
An Overview. Quality and Reliability Engineering International, 23, 517-543. 
 
Billor, N., Hadi, A.S., & Velleman, P.F. (2000). BACON: Blocked Adaptive Computationally 
Efficient Outlier Nominators. Computational Statistics & Data Analysis, 34, 279-298. 
 
Chenouri, S., & Steiner, S.H. (2009). A Multivariate Robust Control Chart for Individual 
Observations. Journal of Quality Technology, 41(3), 259-271. 
 
Chenouri, S., & Variyath, A.M. (2011). A Comparative Study of Phase II Robust Multivariate 
Control Charts for Individual Observations. Quality and Reliability Engineering 
International, 27(7), 857-865.   
 
Donoho, D.L. & Huber, P.J. (1983). The Notion of a Breakdown Point. In P.J. Bickel, K.A. 
Doksum and J.L. Hodges, Jr. (Eds.), A Festschrift for Eric L. Lehmann (pp. 157-184). 
Belmont, CA: Wadsworth. 
 
Hotelling, H. (1947). Multivariate Quality Control ? Illustrated By the Air Testing of Sample 
Bombsights. In C. Eisenhart, M.W. Hastay, & W.A. Wallis (Eds.), Techniques of 
Statistical Analysis (pp. 111-184). New York, NY: McGraw-Hill.   
 
Jensen, W.A., Jones-Farmer, L.A., Champ, C.W., & Woodall, W.H. (2006). Effects of Parameter 
Estimation on Control Chart Properties: A Literature Review. Journal of Quality 
Technology, 38(4), 349-364. 
 
Jensen, W.A., Birch, J.B., & Woodall, W.H. (2007). High Breakdown Estimation Methods for 
Phase I Multivariate Control Charts. Quality and Reliability Engineering International, 
23(5), 615-629. 
 
Jobe, J.M., & Pokojovy, M. (2009). A Multistep, Cluster-Based Multivariate Chart for 
Retrospective Monitoring of Individuals. Journal of Quality Technology, 41(4), 323-339. 
 
Lopuhaa, H.P., & Rousseeuw, P.J. (1991). Breakdown Points of Affine Equivariant Estimators 
of Location and Covariance Matrices. The Annals of Statistics, 19, 229-248. 
 
 
46 
 
Lowry, C.A., & Montgomery, D.C. (1995). A Review of Multivariate Control Charts. IIE 
Transactions, 27, 800-810. 
 
Mahalanobis, P. C. (1936). On the Generalized Distance in Statistics. Proceedings of the 
National Institute of Science of India, 12, 49-55. 
 
Mason, R.L., Champ, C.W., Tracy, N.D., Wierda, S.J., & Young, J.C. (1997). Assessment of 
Multivariate Process Control Techniques. Journal of Quality Technology, 29(2), 140-143. 
 
Mason, R.L., & Young, J.C. (2002). Multivariate Statistical Process Control with Industrial 
Applications. Alexandria, VA: American Statistical Association; Philadelphia, PA: 
Society for Industrial and Applied Mathematics. 
 
Mohammadi, M., Midi, H., Arasan, J., & Al-Talib, B. (2011). High Breakdown Estimators to 
Robustify Phase II Multivariate Control Charts. Journal of Applied Sciences, 11(3), 503-
511. 
 
Montgomery, D.C. (2005). Introduction to Statistical Quality Control (5th ed.). Hoboken, NJ: 
John Wiley & Sons, Inc.  
 
Mudholkar, G.S., & Srivastava, D.K. (2000). A Class of Robust Stepwise Alternatives to 
Hotelling's T2 Tests. Journal of Applied Statistics, 27(5), 599-619. 
 
Nedumaran, G., & Pignatiello, J.J. (2000). On Constructing T2 Control Charts for Retrospective 
Examination. Communications in Statistics ? Simulation and Computation, 29(2), 621-
632. 
 
Oyeyemi, G.M., & Ipinyomi, R.A. (2010). A Robust Method of Estimating Covariance Matrix in 
Multivariate Data Analysis. African Journal of Mathematics and Computer Science 
Research, 3(1), 1-18. 
 
Rousseeuw, P.J. (1984). Least Median of Squares Regression. Journal of the American 
Statistical Association, 79, 871-880. 
 
Rousseeuw, P.J., & Leroy, A.M. (1987). Robust Regression and Outlier Detection.  New York, 
NY: John Wiley & Sons, Inc. 
 
Rousseeuw, P.J., & van Driessen, K. (1999). A Fast Algorithm for the Minimum Covariance 
Determinant Estimator. Technometrics, 41, 212-223. 
 
Rousseeuw, P.J., & van Zomeren, B.C. (1990). Unmasking Multivariate Outliers and Leverage 
Points. Journal of the American Statistical Association, 85, 633-651. 
 
Shewhart, W.A. (1939). Statistical Method from the Viewpoint of Quality Control. New York: 
Dover Publications. 
 
 
47 
 
Srivastava, D.K., & Mudholkar, G.S. (2001). Trimmed T2: A Robust Analog of Hotelling's T2. 
Journal of Statistical Planning and Inference, 97, 343-358.  
 
Sullivan, J.H., & Woodall, W.H. (1996). A Comparison of Multivariate Control Charts for 
Individual Observations. Journal of Quality Technology, 28, 398-408. 
 
Sullivan, J.H., & Woodall, W.H. (1998). Adapting Control Charts for the Preliminary Analysis 
of Multivariate Observations. Communications in Statistics ? Simulation and 
Computation, 27(4), pp. 953-979. 
 
Tiku, M.L., & Balakrishnan, N. (1988). Robust Hotelling-Type T2 Statistics Based on the 
Modified Maximum Likelihood Estimators. Communications in Statistics ? Theory and 
Methods, 17(6), 1789-1810. 
 
Tiku, M.L., & Singh, M. (1982). Robust Statistics for Testing Mean Vectors of Multivariate 
Distributions. Communications in Statistics ? Theory and Methods, 11(9), 985-1001. 
 
Vargas, J.A. (2003). Robust Estimation in Multivariate Control Charts for Individual 
Observations. Journal of Quality Technology, 35(4), 367-376. 
 
Wierda, S.J. (1994). Multivariate Statistical Process Control ? Recent Results and Directions for 
Future Research. Statistica Neerlandica, 48, 147-168. 
 
Willems, G., Pison, G., Rousseeuw, P.J., & Van Aelst, S. (2002). A Robust Hotelling Test. 
Metrika, 55, 125-138. 
 
Woodall, W.H., & Montgomery, D.C. (1999). Research Issues and Ideas in Statistical Process 
Control. Journal of Quality Technology, 31(4), 376-386. 
 
Yanez, S., Gonzalez, N., & Vargas, J.A. (2010). Hotelling's T2 Control Charts Based on Robust 
Estimators. Dyna, 163, 239-247. 
 
  
 
48 
 
 
 
 
 
 
 
Appendices 
 
 
Appendix A:  MATLAB Code for Simulating Hotelling?s T2 Chart Empirical Control Limits 
Appendix B:  MATLAB Code for Simulating Hotelling's T2 Chart Performance 
 
 
 
 
 
49 
 
Appendix A:  MATLAB Code for Simulating Hotelling?s T2 Chart Empirical Control Limits 
 
%=========================================================================% 
%   FINDING EMPIRICAL CONTROL LIMITS FOR HOTELLING'S T^2 CONTROL CHART    % 
%=========================================================================% 
%  -Created by Richard Bell 8/12/2011; last updated 9/25/2011             % 
%  -Finds empirical UCLs for Hotelling's T2 control chart with BACON,     % 
%   MVE, or MCD location and scatter estimators                           % 
%  -Uses same method as Jensen, Birch, and Woodall (2007)                 % 
%=========================================================================% 
 
 
clear all  % clear all objects in the MATLAB workspace 
clc  % clear the output screen 
  
 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
  
iterations=100000;  % number of simulation iterations to be performed 
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','a1:b10'); 
inputRows=length(input(:,1));  % determine the number of rows of data in the 
 input file 
empUCLtable=zeros(inputRows,1);  % initialize the table of empirical UCLs to 
 all zeros 
  
% NOTE:  USE ONLY IF MVE METHOD IS EMPLOYED 
[status,msg] = openR; 
if status ~= 1 
    disp(['Problem connecting to R: ' msg]); 
end 
  
for row=1:inputRows  % perform the simulation below for each scenario in the 
 input file 
    n=input(row,1);  % read in the sample size 
    p=input(row,2);  % read in the number of variables 
 
    alpha=.05;  % desired overall false alarm probability (FAP) for the chart 
    percentile=(1-alpha)*100;  % percentile corresponding to the desired 
 alpha level 
    maxT2vector=zeros(iterations,1);  % initialize the vector of maximum T2 
 statistics to all zeros 
     
 
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
    %%%%%%%%%% GENERATE DATA AND CONSTRUCT HOTELLING'S T2 CHART %%%%%%%%%%% 
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
     
    count=0;  % initialize the counter for the number of iterations performed 
     
    while count < iterations  % run the entire loop for a set number of 
 iterations        
 
50 
 
        %=====> SIMULATE MULTIVARIATE NORMAL DATA 
         
        mu=zeros(1,p);  % set the mean vector to all zeros 
        sigma=eye(p);  % set the covariance matrix equal to the identity  
  matrix 
        X=mvnrnd(mu,sigma,n);  % generate multivariate normal data 
         
 
        %=====> COMPUTE ROBUST ESTIMATES OF LOCATION AND SCATTER (must code  
  out 2 of the 3 methods listed using % symbols) 
         
        % BACON METHOD 
        out=baconV(X,2,.10,6);  % compute BACON estimate for location using  
  Mahalanobis distance, alpha=0.05, and c=4; use version 2   
  (Euclidean distance) if expected contamination exceeds 20 percent 
        Xbar_robust=out.center3;  % BACON estimate for mean vector 
        S_robust=out.cov3;  % BACON estimate for covariance matrix 
         
        % MVE METHOD 
        evalR('library(MASS)'); 
        putRdata('X',X); 
        putRdata('n',n); 
        putRdata('p',p); 
        Xbar_robust=evalR('cov.mve(X, cor=FALSE, quantile.used=floor((n + p + 
  1)/2), nsamp = "best")$center'); 
        S_robust=evalR('cov.mve(X, cor=FALSE, quantile.used=floor((n + p +  
  1)/2), nsamp = "best")$cov'); 
         
        % MCD METHOD 
        [rew,raw]=mcdcov(X,'plots',0);  % compute MCD estimates for location  
  and scatter using default parameter values; suppress plot output  
  by adding the arguments ('plots',0) 
        Xbar_robust=rew.center;  % MCD estimate for mean vector 
        S_robust=rew.cov;  % MCD estimate for covariance matrix 
         
         
        %=====> COMPUTE HOTELLING'S T2 STATISTICS AND COMPARE TO UCL 
         
        T2vector=zeros(n,1);  % initialize the vector of T2 statistics 
         
        for i=1:n  % compute T2 control statistic for each observation  
            T2stat=(X(i,:)-Xbar_robust)/S_robust*(X(i,:)-Xbar_robust)';   
            T2vector(i)=T2stat;  % store the T2 statistics in a vector 
        end 
         
        count=count+1;  % increment the counter for the total number of  
  iterations performed 
         
        maxT2=max(T2vector);  % identify the maximum T2 statistic 
        maxT2vector(count,1)=maxT2;  % store the maximum T2 statistic in a  
  vector 
                
    end 
     
 
51 
 
    empUCL=prctile(maxT2vector,percentile);  % compute the empirical UCL for 
 the current scenario IAW Jensen, Birch, and Woodall (2007) 
     
    % store the results in a table and display the table on the output screen 
    empUCLtable(row,1)=empUCL; 
    disp(empUCLtable); 
     
    % send the results to an Excel file 
   
xlswrite('c:\Users\Rich\Documents\OutFile.xlsx',empUCLtable,'Sheet1','A1'); 
     
end 
  
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
  
 
52 
 
Appendix B:  MATLAB Code for Simulating Hotelling's T2 Chart Performance 
 
%=========================================================================% 
%               HOTELLING'S T^2 CONTROL CHART PROGRAM FILE                % 
%=========================================================================% 
%  -Created by Richard Bell on 9/15/2010; last updated on 9/25/2011       % 
%  -Based on Hotelling's T2 chart with classical or empirical UCLs        % 
%  -File is set up to run multiple scenarios; before using, undesired     % 
%   sections must be commented out using "%"                              % 
%=========================================================================% 
  
  
clear all  % clear all objects in the MATLAB workspace 
clc  % clear the output screen 
  
  
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
  
% read in m, n, UCL, shift size, and p from an Excel file 
iterations=10000;  % number of simulation iterations to be performed 
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','a1:e10'); 
inputRows=length(input(:,1));  % determine the number of rows of data in the 
 input file 
APtable=zeros(inputRows,1);  % initialize the table of alarm probabilities to 
 all zeros 
  
% NOTE:  USE ONLY IF MVE METHOD IS EMPLOYED 
[status,msg] = openR; 
if status ~= 1 
    disp(['Problem connecting to R: ' msg]); 
end 
  
for row=1:inputRows  % perform the simulation below for scenario in the input 
 file 
    n=input(row,1);  % read in the sample size 
    p=input(row,2);  % read in the number of variables 
    UCL=input(row,3);  % read in the empirical upper control limit 
    shiftSize=input(row,4); % read in the size of the shift (as defined by 
 the NCP) 
    numOC=input(row,5);  % read in the number of out-of-control points 
     
    count=0;  % initialize the counter for the number of iterations performed 
    alarmCount=0;  % initialize the alarm counter 
     
    while count < iterations  % run the entire loop for a set number of 
iterations 
         
 
 
 
         
 
53 
 
        %=====> SIMULATE MULTIVARIATE NORMAL DATA 
         
        % NOTE:  USE THIS UCL ONLY FOR NORMALLY DISTRIBUTED DATA! 
        alpha=.05;  % desired overall false alarm probability (FAP) for the  
  chart 
        alphaAdjusted=1-(1-alpha)^(1/n);  % desired FAP for each individual  
  comparison 
        UCL=((n-1)^2/n)*betainv(1-alphaAdjusted,p/2,(n-p-1)/2)  % Tracy,  
  Young, and Mason's (1992) Phase I UCL 
         
        mu=zeros(1,p);  % set the mean vector to all zeros 
        sigma=eye(p);  % set the covariance matrix equal to the identity  
  matrix 
        X=mvnrnd(mu,sigma,n);  % simulate multivariate normally distributed  
  data 
        shift=zeros(1,p);  % initialize the shift vector to all zeros 
        shift(1)=sqrt(shiftSize);  % place the desired shift in the first  
  position of the shift vector 
         
        % check the NCP to ensure it equals the desired value 
        NCP=shift/eye(p)*shift'; 
        if abs(NCP-shiftSize) > 0.0001  % display error message if calculated 
  NCP is significantly different than shift size (they should be  
  equal since the theoretical covariance matrix of X is I) 
            disp('ERROR in NCP!') 
        end 
                 
        % add the desired shift to randomly selected points 
        i=1; 
        randIndex=randperm(n)'; 
        while i <= numOC 
            X(randIndex(i),:)=X(randIndex(i),:)+shift; 
            i = i + 1; 
        end 
         
 
        %=====> COMPUTE ROBUST ESTIMATES OF LOCATION AND SCATTER (must code  
  out 2 of the 3 methods listed using % symbols) 
         
        % BACON METHOD 
        out=baconV(X,2,.10,3);  % compute BACON estimate for location using  
  Mahalanobis distance, alpha=0.05, and c=4; use version 2   
  (Euclidean distance) if expected contamination exceeds 20 percent 
        Xbar_robust=out.center3;  % BACON estimate for mean vector 
        S_robust=out.cov3;  % BACON estimate for covariance matrix 
         
        % MVE METHOD (requires code for R interface) 
        evalR('library(MASS)');  % call the R library named "MASS" 
        putRdata('X',X);  % send sample data to R 
        Xbar_robust=evalR('cov.mve(X)$center');  % use R to find MVE estimate 
  for mean vector 
        S_robust=evalR('cov.mve(X)$cov');  % use R to find MVE estimate for  
  covariance matrix 
         
 
 
54 
 
        % MCD METHOD 
        [rew,raw]=mcdcov(X,'plots',0);  % compute MCD estimates for location  
  and scatter using default parameter values; suppress plot output  
  by adding the arguments ('plots',0) 
        Xbar_robust=rew.center;  % MCD estimate for mean vector 
        S_robust=rew.cov;  % MCD estimate for covariance matrix 
         
        % CLASSICAL METHOD 
        Xbar_robust=mean(X); 
        S_robust=cov(X); 
         
         
        %=====> COMPUTE HOTELLING'S T2 STATISTICS AND COMPARE TO UCL 
         
        alarm=0;  % initialize indicator variable representing an alarm (=1)  
  or no alarm (=0) 
        T2vector=zeros(n,1);  % initialize vector of T2 statistics 
         
        for i=1:n  % perform loop for all observations in the sample 
            if alarm==0  % continue loop as long as no false alarms occur 
                T2stat=(X(i,:)-Xbar_robust)/S_robust*(X(i,:)-Xbar_robust)';   
   % compute T2 control statistic for each observation 
                T2vector(i)=T2stat;  % store T2 control statistics in a  
   vector 
                if T2stat > UCL 
                    alarm=1;  % issue an alarm if the current T2 control  
    statistic exceeds the UCL 
                end 
            end 
        end 
         
        if alarm==1 
            alarmCount=alarmCount+1;  % if a control chart issues an alarm,  
  increment the counter representing total alarms for all   
  iterations 
        end 
         
        count=count+1;  % increment the counter for the total number of  
  iterations performed 
         
    end 
     
    AP=alarmCount/iterations;  % estimate the alarm probability (AP) for the 
 current scenario 
    APtable(row,1)=AP;  % store the current AP in a table 
    disp(APtable);  % display AP table for Hotelling's T2 chart on screen 
     
    % send the estimated APs to an Excel file 
 
    xlswrite('c:\Users\Rich\Documents\OutFile.xlsx',APtable,'Sheet1','A1'); 
     
end 
  
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%