A Distribution-Free Control Chart for Retrospective Location Analysis of Subgrouped Multivariate Data

by

Richard C. Bell, Jr.

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 6, 2011

Keywords: phase I, preliminary, in-control reference sample, robust, nonparametric, data depth

Copyright 2011 by Richard C. Bell, Jr.

Approved by
Saeed Maghsoodloo, Co-chair, Professor of Industrial Engineering
L. Allison Jones-Farmer, Co-chair, Associate Professor of Management
Nedret Billor, Associate Professor of Mathematics and Statistics
Alice E. Smith, Professor of Industrial Engineering

Abstract

In multivariate quality control, a proper Phase I analysis is essential to the success of Phase II monitoring. Even self-starting methods, which seek to minimize the Phase I process, usually recommend a single retrospective analysis at some point in the control charting process. This is true regardless of the underlying distribution of a process, which often cannot be assumed to be multivariate normal. A literature review reveals no distribution-free Phase I multivariate techniques in existence, so this research seeks to fill that gap by developing a distribution-free method of establishing an in-control reference sample for subgrouped multivariate processes in Phase I. The resulting multivariate sample, representing the in-control state of a process, can then be used to estimate the appropriate parameters for the Phase II multivariate quality control monitoring method of choice. The proposed method, which assumes constant covariance within subgroups, uses data depth in conjunction with robust estimators to detect both isolated and sustained shifts in subgroup location. Using Monte Carlo simulation, the proposed method is compared to the traditional Hotelling's T2 chart with a Phase I upper control limit. Although Hotelling's T2 chart is preferred when data are multivariate normally distributed, the proposed method is shown to perform significantly better than Hotelling's T2 chart when a process distribution is heavy-tailed or skewed.

Acknowledgements

The author would first like to thank the United States Army for allowing him the opportunity to pursue his dream of achieving a doctoral degree. He dedicates this work to all veterans of the armed forces, in particular those who have given their lives in defense of this great country. The author is also deeply grateful to Dr. L. Allison Jones-Farmer for suggesting this research topic and spending countless hours guiding this research as committee co-chair, to Dr. Saeed Maghsoodloo for his expert advice as committee co-chair, and to Dr. Nedret Billor and Dr. Alice E. Smith for their valuable contributions as committee members. In addition, the author is extremely thankful for the keen insights provided by researchers outside the university such as Dr. Robert Serfling, Dr. Joe H. Sullivan, and Dr. Satyaki Mazumder. Finally, the author would like to acknowledge that this work would not have been possible without the guiding hand of God in his life and the unwavering support of his family and friends, especially his wonderful mother Phyllis Carter and his beautiful fiancée Heide Matthews.

Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
1 Introduction and Literature Review
1.1 Background and Motivation
1.2 Differences Between Phase I and Phase II
1.3 Phase II Multivariate Control Charting Methods
1.3.1 Phase II Multivariate Parametric Charts
1.3.2 Phase II Multivariate Distribution-Free, Nonparametric, and Robust Charts
1.3.3 Phase II Multivariate Rank-Based Charts
1.4 Self-Starting Multivariate Control Charting Methods
1.5 Phase I Multivariate Control Charting Methods
1.6 Developing a Distribution-Free Phase I Procedure -- A Univariate Example
1.7 Special Considerations in Multivariate Quality Control
1.8 Organization of Dissertation
2 Measuring Centrality of Multivariate Data Using Data Depth
2.1 Fundamentals of Data Depth
2.2 Desirable Properties of Data Depth Functions
2.3 Robust Mahalanobis Depth
2.4 Mahalanobis Spatial Depth
2.5 Simplicial Depth
3 The Multivariate Mean-Rank (MMR) Control Chart
3.1 Introduction
3.2 Design of the MMR Chart
3.2.1 The MMR Control Chart Statistic
3.2.2 Empirical Control Limits for the MMR Chart
3.2.3 Analytical Control Limits for the MMR Chart
3.3 Example Application of the MMR Chart
4 MMR Chart Performance Assessment Methodology
4.1 Introduction
4.2 Establishing Baseline Performance Using Hotelling's T2 Chart
4.3 Simulating Symmetric and Skewed Process Distributions
4.4 Evaluating In-Control Performance
4.5 Evaluating Out-of-Control Performance
4.6 Evaluating Out-of-Control Performance with Skewed Data
5 MMR Chart Performance Comparisons
5.1 Introduction
5.2 MMR Chart Performance with Symmetric Distributions
5.2.1 In-Control Performance with Symmetric Distributions
5.2.2 Isolated Shifts of the Mean with Symmetric Distributions
5.2.3 Sustained Shifts of the Mean with Symmetric Distributions
5.3 MMR Chart Performance with Skewed Data
5.3.1 In-Control Performance with Skewed Data
5.3.2 Isolated Shifts of the Mean with Skewed Data
5.3.3 Sustained Shifts of the Mean with Skewed Data
5.4 MMR Chart Performance with Larger Subgroup Sizes
5.5 Robust Estimators of Location and Scatter for the MMR Chart
6 An Example Phase I Analysis Using the MMR Chart
6.1 Simulating the Contaminated Reference Sample
6.2 Removing Outliers from the Sample
6.3 Analyzing the Results
7 Conclusion
7.1 Synopsis of Findings
7.2 Summary of Research Conducted
7.3 Recommendations for Phase I Analysis
7.4 Recommendations for Phase II Monitoring
7.5 Future Research Directions
References
Appendices
Appendix A: MATLAB Code for Computing Robust Mahalanobis Depth
Appendix B: MATLAB Code for Computing Mahalanobis Spatial Depth
Appendix C: Expanded Table of Empirical UCLs for the MMR Chart
Appendix D: MATLAB Code for Finding Empirical UCLs for the MMR Chart
Appendix E: Empirical UCLs for Hotelling's T2 Chart
Appendix F: MATLAB Code for Finding Empirical UCLs for Hotelling's T2 Chart
Appendix G: MATLAB Code for Assessing MMR Chart Performance
Appendix H: MATLAB Code for Assessing Hotelling's T2 Chart Performance
Appendix I: Simulation Results Using In-Control Symmetric Data
Appendix J: Simulation Results Using Symmetric Data with an IS in p = 2
Appendix K: Simulation Results Using Symmetric Data with an IS in p = 5
Appendix L: Simulation Results Using Symmetric Data with an IS in p = 10
Appendix M: Simulation Results Using Symmetric Data with a 5% SS in p = 2
Appendix N: Simulation Results Using Symmetric Data with a 15% SS in p = 2
Appendix O: Simulation Results Using Symmetric Data with a 30% SS in p = 2
Appendix P: Simulation Results Using Symmetric Data with a 5% SS in p = 10
Appendix Q: Simulation Results Using Symmetric Data with a 15% SS in p = 10
Appendix R: Simulation Results Using Symmetric Data with a 30% SS in p = 10
Appendix S: Simulation Results Using In-Control Skewed Data
Appendix T: Simulation Results Using Skewed Data with an IS in p = 2
Appendix U: Simulation Results Using Skewed Data with an IS in p = 5
Appendix V: Simulation Results Using Skewed Data with a 5% SS in p = 2
Appendix W: Simulation Results Using Skewed Data with a 15% SS in p = 2
Appendix X: Simulation Results Using Skewed Data with a 30% SS in p = 2
Appendix Y: Simulation Results Using Skewed Data with a SS in p = 5
Appendix Z: Subgroup Size Analysis Using In-Control Data
Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5
Appendix BB: Subgroup Size Analysis Using Data with a 15% SS in p = 5

List of Tables

Table 2.3.1 Data Ranked According to RMD
Table 2.4.1 Data Ranked According to MSD
Table 3.2.1 Empirical Control Limits for the MMR Chart
Table 3.2.2 Simulated IC FAPs Using Normal Theory Limits
Table 3.3.1 MMR Chart Data for the First Subgroup of a Bivariate Process
Table 4.3.1 Summary of Planned Experiments
Table 5.2.1 Recommended Phase I Control Chart Usage for Heavy-Tailed Data
Table 5.3.1 Recommended Phase I Control Chart Usage for Skewed Multivariate Data
Table 6.1.1 MMR Chart UCLs for Chapter 6 Example

List of Figures

Figure 1.1.1 The Unification of Relevant Research Areas
Figure 1.6.1 Initial (Top Panel) and Revised (Bottom Panel) Control Charts
Figure 2.3.1 Bivariate Random Sample
Figure 2.4.1 Illustration of Spatial Depth
Figure 2.5.1 Illustration of Simplicial Depth
Figure 3.2.1 Q-Q Plots of Zi for m = 50, n = 5(5)20
Figure 5.2.1 Empirical IC FAPs for Symmetric Bivariate Distributions
Figure 5.2.2 Empirical IC FAPs for t(3) Processes in Higher Dimensions
Figure 5.2.3 Control Chart Performance on Symmetric Bivariate Data with an IS
Figure 5.2.4 MMR-RMD/MSD Chart Performance on t(3) Data with an IS
Figure 5.2.5 Control Chart Performance on t(3) Data with an IS in Higher Dimensions
Figure 5.2.6 Control Chart Performance on Increasingly Contaminated Bivariate t(3) Data
Figure 5.2.7 MMR-RMD/MSD Chart Performance on Bivariate t(3) Data with a 30% SS
Figure 5.2.8 Control Chart Performance on t(3) Data with a 15% SS in p = 10
Figure 5.3.1 Empirical IC FAPs for Lognormal Processes in p = 2 and p = 5
Figure 5.3.2 Control Chart Performance on Bivariate Lognormal Data with an IS
Figure 5.3.3 MMR-MSD/RMD Chart Performance on Bivariate LGN Data with an IS
Figure 5.3.4 Control Chart Performance on LGN Data with an IS in p = 5
Figure 5.3.5 Control Chart Performance on Increasingly Contaminated LGN Data in p = 2
Figure 5.3.6 MMR-MSD Chart Performance on Increasingly Contaminated LGN Data
Figure 5.3.7 MSD and RMD Rankings for Bivariate LGN Data with a 30% SS
Figure 5.3.8 Scatterplots of MSD vs. RMD Ranks for Shifted Bivariate LGN Data
Figure 5.3.9 MMR-MSD/RMD Chart Performance on Increasingly Shifted LGN Data
Figure 5.3.10 MMR-RMD Chart Performance on Bivariate LGN Data with a 30% SS
Figure 5.3.11 Control Chart Performance on LGN Data with a 15% SS in p = 5
Figure 5.4.1 Effects of Subgroup Size on Control Chart Performance Under an IS in p = 5
Figure 5.4.2 Effects of Subgroup Size on Chart Performance Under a 15% SS in p = 5
Figure 5.5.1 Comparison of MMR-RMD (Using BACON Estimators) and HT2 Charts
Figure 5.5.2 The Effects of Increasing Shift Sizes on Univariate and Bivariate t(3) Data
Figure 5.5.3 Improvement in MMR-RMD Chart Performance with New Estimators
Figure 5.5.4 Change in Chart Performance When the Mean is Known
Figure 5.5.5 Redistribution of Ranks Under 5% and 30% Sustained Shifts
Figure 6.2.1 Initial Application of Phase I Control Charts to the Lognormal Sample
Figure 6.2.2 Second Iteration of the MMR-MSD Control Chart
Figure 6.2.3 Final Control Charts After Four Iterations of Phase I Analysis

List of Abbreviations

AP alarm probability
ARL average run length
BACON blocked adaptive computationally efficient outlier nominators
CL center line
CUSUM cumulative sum
EAP empirical alarm probability
EWMA exponentially weighted moving average
FAP false alarm probability
HT2 Hotelling's T2
IC in control
IS isolated shift
LCL lower control limit
LGN lognormal
MA moving average
MCD minimum covariance determinant
MCUSUM multivariate cumulative sum
MEWMA multivariate exponentially weighted moving average
MHD Mahalanobis depth
MMR multivariate mean-rank
MSD Mahalanobis spatial depth
MVE minimum volume ellipsoid
OC out of control
RBP replacement breakdown point
RL run length
RMCD reweighted minimum covariance determinant
RMD robust Mahalanobis depth
SD simplicial depth
SPD spatial depth
SS sustained shift
UCL upper control limit

Note: To avoid confusion, the reader should pay particular attention to the definitions provided for OC, SD, and SS. These abbreviations are often used in statistical literature to stand for operating characteristic, standard deviation, and sum of squares, respectively, but are defined differently in this document.

1 Introduction and Literature Review

1.1 Background and Motivation

Multivariate statistical process control charts are necessary to simultaneously monitor two or more correlated variables representing quality characteristics of an industrial or other process. A multivariate control charting application usually involves a dimension reduction technique of converting multivariate observations to single dimensional control chart statistics which are then monitored using appropriate control limits. This approach accounts for the correlation structure in the data, whereas monitoring correlated variables using separate univariate control charts for each variable ignores the correlation among quality characteristics and can lead to erroneous conclusions about the state of a process.

The first multivariate quality control chart is attributed to Hotelling (1947), who created the T2 chart to monitor bombsight data during World War II. Multivariate quality control charting has grown in both popularity and relevance since Hotelling's introduction. In a review of statistical process control research issues and ideas, Woodall and Montgomery (1999) pointed out the notable rise in multivariate quality control research due to increased measurement capability and computing power. Montgomery (2005, p. 489) noted that larger manufacturing databases have greatly increased the use of multivariate quality control methods in recent years. Bersimis, Psarakis, and Panaretos (2007) stated in a multivariate statistical process control overview that multivariate Shewhart-type charts are the most common control charts in industry today, adding that more examination of this area is very important.
In particular, they pointed out the need for more research into robust design of Hotelling's T2 chart and nonparametric control charts.

As represented by Figure 1.1.1, the contribution of this research is the merger of three separately researched but highly related fields (distribution-free Phase I quality control, computational geometry, and robust parameter estimation) to provide a solution to the open problem of establishing an outlier-free reference sample for a multivariate process without the assumption of normality. Great strides have been made in each of the aforementioned research areas in recent years, yet no one in the statistical quality control field has leveraged recent developments in the manner accomplished by this research. The following chapters will detail the multivariate extension of an existing univariate distribution-free control chart for subgroup location, including the use of appropriate data depth functions for purposes of dimension reduction and the implementation of an effective robust parameter estimation technique, to provide a solution to this problem.

Figure 1.1.1 The Unification of Relevant Research Areas

1.2 Differences Between Phase I and Phase II

A control charting application is typically divided into two distinct phases. In Phase I, also known as the preliminary analysis phase, when little is known about a process being studied, the objective is to identify an in-control (IC) reference sample. This involves retrospective analysis of a historical data set in order to eliminate any data points which do not accurately represent the routine operation of the process. The resulting data are described as IC because it is believed that all remaining variability in the process is inherent to the process itself and not due to assignable causes. Upon completion of Phase I, the IC reference sample is used to establish control limits for Phase II, the monitoring stage of a control charting application.

In Phase II, newly observed data points are successively compared to the control limits to identify significant departures from the IC state. Should an observation fall outside the control limits, a search for an assignable cause is immediately undertaken. If the change in process behavior can be linked to special causes or external factors, the process is deemed out of control (OC) and remedial action is taken to correct the problem.

Prior to conducting any analysis in a control charting scenario, it is usually assumed that the unedited reference sample may contain OC points and the control limits are unknown. The challenging nature of a Phase I analysis under these conditions has been recognized since the earliest days of statistical process control. Shewhart (1939, p. 76) said, "In the majority of practical instances, the most difficult job of all is to choose the sample that is to be used as the basis for establishing the tolerance range. If one chooses such a sample without respect to the assignable causes present, it is practically impossible to establish a tolerance range that is not subject to a huge error."

If a flawed Phase I analysis results in the erroneous inclusion of OC points in the IC reference sample, the control limits for Phase II monitoring will be too wide and OC situations will not be detected in a timely manner. This in turn will result in the production of poor quality goods or services for an unnecessarily protracted period of time.
When the OC condition is finally detected, the substandard goods or services will have to be reworked, or scrapped and completely reproduced. This can cost the goods or services facility money in terms of labor and other operating expenses for rework or reproduction, additional materials necessary for reproduction, lost future production while previous work is redone, financial penalties for failure to meet contractual deadlines, and loss of customers due to dissatisfaction with faulty or untimely goods or services received.

On the other hand, if a faulty Phase I analysis results in the erroneous exclusion of IC points from the IC reference sample, the control limits for Phase II monitoring will be too narrow and false alarms will repeatedly occur. False alarms require work stoppages to search for assignable causes, potentially costing the goods or services facility money in terms of lower throughput, idle workers while OC signals are investigated, overtime for quality control personnel investigating OC signals, financial penalties for failure to meet contractual deadlines, and loss of customers due to goods or services not being received in a timely manner. Ultimately, whether the resulting control limits are too wide or too narrow, an incorrect Phase I analysis can also cause a lack of confidence by all in the quality control methodology in place, creating a challenging environment for managers.

Phase I control charts are designed with the goal of achieving a specified overall IC false alarm probability (FAP), defined as the probability of one or more observations plotting outside the control limits in the absence of assignable causes. Phase I usually involves iteratively comparing the reference sample to trial control limits (corresponding to the desired overall IC FAP) estimated from the sample. At each iteration of a Phase I analysis, an OC point is eliminated from the reference sample if an assignable cause is identified, and trial control limits are updated excluding the OC point. This iterative process continues until all points in the reference sample are IC.

Phase I analysis requires careful consideration when it involves methods which compute independent control chart statistics consisting of individual observations (e.g. the univariate X chart or the multivariate T2 chart) or subgrouped observations (e.g. the univariate $\bar{X}$ chart or the multivariate T2 chart). Provided the observations originate from random sampling, the control chart statistics are independent of one another. However, because the control limits are estimated from the reference sample itself in Phase I, the control limits are dependent on each sample point included in their calculation. Thus, successive comparisons of chart statistics to control limits are statistically dependent despite the control chart statistics themselves being independent. These dependencies often make it difficult to correctly determine the overall IC FAP for a Phase I analysis.

Phase II, on the other hand, consists of comparing new observations (in the form of a chart statistic) to the control limits previously established in Phase I. Because the control limits in Phase II are fixed through conditioning, successive comparisons of chart statistics to control limits are independent provided the chart statistics are independent of one another (e.g. the X, $\bar{X}$, and T2 charts).
This is in contrast to moving average (MA), exponentially weighted moving average (EWMA), or cumulative sum (CUSUM) charts and their multivariate counterparts, whose chart statistics include past observations and are therefore naturally dependent.

Chart performance in Phase II is often measured using moments of the run length (RL) distribution. The RL is the number of observations until an OC signal is observed. If the comparisons of the chart statistics to the control limits are independent, the RL is a geometric random variable. The expected value of the IC RL is equal to 1/α, where α is the probability that a single chart statistic plots outside the control limits in the absence of assignable causes. The expected value of the RL is known as the average run length (ARL) and is commonly used to describe control chart performance in Phase II.

The purpose of this research is to develop a Phase I procedure for subgrouped multivariate data that is distribution free when a process is IC. The procedure will be based on the use of data depth in conjunction with robust estimators of location and scale to reduce multivariate observations to univariate depth values, thus producing a center-outward ordering of the multivariate data. The corresponding ranks of the univariate depth values, in the form of a control statistic for each subgroup, will then be analyzed using a univariate chart. As the following literature review will demonstrate, this is an area in much need of additional research.

1.3 Phase II Multivariate Control Charting Methods

Existing Phase II multivariate control charting methods will be discussed first, beginning with parametric charts. This will be followed by an examination of distribution-free, nonparametric, and robust techniques, and will conclude with a synopsis of depth-based nonparametric procedures for use in Phase II. Before undertaking this discussion, however, it is important to distinguish precisely what is meant by the terms distribution free, nonparametric, and robust.

Gibbons and Chakraborti (2003, p. 3) state, "In a distribution-free inference, whether for testing or estimation, the methods are based on functions of the sample observations whose corresponding random variable has a distribution which does not depend on the specific distribution of the population from which the sample was drawn." In other words, a "distribution-free" method uses a control chart statistic which follows the same distribution regardless of the underlying distribution of the process itself. Gibbons and Chakraborti (2003, p. 3) add, "On the other hand, strictly speaking, the term nonparametric test implies a test for a hypothesis which is not a statement about parameter values." This means that "nonparametric" control charting methods assess whether the distribution of a process, as opposed to specific parameters, has departed from the IC state. From this, it is clear that the terms distribution free and nonparametric are not synonymous, as a control charting method could be distribution free but not nonparametric and vice versa. Last but not least, the term "robust" will be used to refer to methods in which the distribution of the statistics is similar regardless of the distribution of the process data, but the methods may not be strictly distribution free. All characterizations of control charting methods as being distribution free, nonparametric, or robust refer to the IC state of a process only.
1.3.1 Phase II Multivariate Parametric Charts

Hotelling's T2 control chart is the most familiar multivariate quality control chart in existence today [Montgomery (2005, p. 491)]. It is designed for detecting large shifts in the mean vector of a multivariate normally distributed process because it uses information only from the current sample, and it can be applied during both Phase I and Phase II using appropriate control limits. Alternatively, authors such as Chenouri and Steiner (2009), Chenouri and Variyath (2011), and Mohammadi, Midi, Arasan, and Al-Talib (2011) have proposed bypassing Phase I by using the reweighted minimum covariance determinant (RMCD) method of Willems, Pison, Rousseeuw, and Van Aelst (2002) to glean robust estimates of location and scatter from a reference sample, and implementing those estimates directly in a Phase II T2 control chart. In all cases, however, the T2 chart relies on the limiting assumption that the data follow a multivariate normal distribution. This chart's lack of robustness to nonnormality is well documented by distribution-free and nonparametric control chart authors such as Chou, Mason, and Young (2001), Liu, Singh, and Teng (2004), and Fricker and Chang (2009a), who evaluated their proposed methods in comparison to the traditional T2 chart applied to nonnormal process data.

Crosier (1988) and Pignatiello and Runger (1990) proposed several multivariate cumulative sum (MCUSUM) charts which are more sensitive to small or gradual location shifts since they use past information in addition to the current sample, but these charts also rely on the assumption of multivariate normally distributed data. Jackson (1991) presented a T2 chart using principal components scores, a control chart for principal components residuals, and a control chart for each independent principal component's scores, all based on the assumption of a multivariate normally distributed process. The multivariate exponentially weighted moving average (MEWMA) control chart developed by Lowry, Woodall, Champ, and Rigdon (1992) is, like the MCUSUM chart, sensitive to small or gradual shifts but likewise based on the assumption of multivariate normally distributed data. It can be designed to be robust to nonnormality by using a small smoothing constant, as noted by Stoumbos and Sullivan (2002), Testik, Runger, and Borror (2003), and Testik and Borror (2004). However, the MEWMA chart assumes that the IC process mean vector and covariance matrix are known, which is unlikely to be the case in Phase I. Numerous other parametric Phase II multivariate control charting methods, many of which are variants of the well-known T2, MEWMA, and MCUSUM charts, have been proposed but will not be detailed here. For comprehensive reviews of such charts, see Wierda (1994), Lowry and Montgomery (1995), Mason, Champ, Tracy, Wierda, and Young (1997), Woodall and Montgomery (1999), and Bersimis et al. (2007).

1.3.2 Phase II Multivariate Distribution-Free, Nonparametric, and Robust Charts

Nonparametric, distribution-free, and robust multivariate control charting methods have been developed, yet they are usually designed for Phase II implementation. Hayter and Tsui (1994) proposed a nonparametric multivariate control chart to detect location changes in nonnormally distributed processes. This method is based on the empirical cumulative distribution function of a statistic formed from an IC reference sample of 500 or more observations, so it is strictly a Phase II method.
Qiu and Hawkins (2001) developed a distribution-free, rank-based CUSUM procedure for detecting a location shift, but this method assumes knowledge of the IC mean vector. Chou et al. (2001) proposed a kernel smoothing technique for estimating the distribution of the T2 control statistic and the upper control limit of the T2 chart when the Phase II process data follow a multivariate exponential distribution. Qiu and Hawkins (2003) also introduced a nonparametric CUSUM procedure for detecting mean shifts in all directions. This method is based both on the order information among the measurement components as well as the order information between measurement components and their IC means, but it assumes that the IC distribution of a process is known. Sun and Tsung (2003) developed a distribution-free multivariate control chart based on the distance between the "kernel centre" of the known IC sample and the new observation, using support vector methods to calculate the kernel distance. Thissen, Swierenga, de Weijer, Wehrens, Melssen, and Buydens (2005) used a combination of mixture modeling, which separates the data into Gaussian clusters, and statistical process control techniques to create a distribution-free multivariate control chart. This method requires an IC reference sample to estimate the moments of the Gaussian clusters and the fraction of observations in each cluster. Qiu (2008) proposed a distribution-free, log-linear modeling-based approach to estimating the IC multivariate distribution, as well as a distribution-free MCUSUM procedure for detecting location shifts in Phase II, but the availability of a set of IC data is assumed. Fricker and Chang (2009a) used a Kolmogorov-Smirnov test to compare the ranked kernel density estimates for a set of IC data and a set of the most recent data points. This method is nonparametric but again requires the existence of a multivariate reference sample.

1.3.3 Phase II Multivariate Rank-Based Charts

Nonparametric multivariate control charts have also been proposed using simplicial data depth, which was first introduced by Liu (1990), as a dimension reduction technique. The idea behind simplicial depth-based control charts is to use the simplicial depth of a given multivariate point x within the data cloud formed by a multivariate reference sample $\{X_1, \ldots, X_n\}$ to produce a univariate center-outward ranking of the data points. A precise definition of simplicial depth in p dimensions will be presented in Chapter 2, but the simplicial depth of a bivariate point x is the proportion of triangles formed by all possible triplets of points in $\{X_1, \ldots, X_n\}$ containing x. Simplicial depth in higher dimensions follows the same logic.

Liu's (1995) suggested procedure is to calculate the simplicial depth of a given multivariate point, use the depth to create a control statistic reflecting the point's center-outward ranking relative to an IC reference sample, plot the control statistic on a univariate control chart, and finally compare the control statistic to control limits set to achieve a desired maximum IC FAP. The resulting control charts, called r, Q, and S charts, are essentially X, $\bar{X}$, and cumulative sum (CUSUM) charts respectively, using simplicial depth-based ranks instead of raw univariate data to compute control statistics. Liu (1995) describes these charts as completely nonparametric and able to simultaneously detect location and scale changes in a process.
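To make the bivariate definition above concrete, the following MATLAB sketch computes the simplicial depth of a point by brute-force enumeration of all triplets of sample points. It is an illustrative sketch only, not the implementation used in this dissertation's appendices; the function name, the sign-based containment test, and the decision to ignore points lying exactly on a triangle boundary are choices made here for simplicity.

% Illustrative sketch: simplicial depth of a bivariate point x (1 x 2) with
% respect to the rows of X (n x 2), computed as the fraction of triangles
% formed by all triplets of sample points that contain x.
function d = simplicialDepth2(x, X)
    n = size(X, 1);
    T = nchoosek(1:n, 3);                       % indices of all possible triplets
    inside = 0;
    for k = 1:size(T, 1)
        A = X(T(k,1), :); B = X(T(k,2), :); C = X(T(k,3), :);
        % x lies inside triangle ABC when the three cross products below
        % share the same sign (points exactly on an edge are not handled here)
        s1 = sign(det([B - A; x - A]));
        s2 = sign(det([C - B; x - B]));
        s3 = sign(det([A - C; x - C]));
        inside = inside + (s1 == s2 && s2 == s3);
    end
    d = inside / size(T, 1);
end

For a sample of n points this brute-force approach examines all C(n, 3) triangles, which hints at why simplicial depth becomes computationally demanding for large samples and for higher dimensions, where simplices formed by p + 1 points must be enumerated.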
However, Stoumbos and Jones (2000) showed that the 500-observation IC reference sample recommended for Liu's (1995) charts was not large enough to achieve a satisfactory IC FAP for many process distributions, thus limiting the method's potential for widespread implementation. Liu et al. (2004) later introduced a simplicial data depth-based moving average (DDMA) control chart which is described as having better ability than the r and Q charts to detect changes in location while maintaining the same ability to detect changes in scale. The DDMA chart is also said to be completely nonparametric but as with most nonparametric methods, if the process data follow a multivariate normal distribution then a normal theory method (e.g. Hotelling's T2 chart) is preferred. Notably, all results from this study are derived using an IC reference sample of 1000, yet again raising the question of how one is to obtain such a large IC data set.

Other data depth-based nonparametric approaches to the Phase II multivariate quality control problem have been developed, but they all assume a pre-existing IC reference sample. Zarate (2004) used principal components analysis to reduce the dimensionality of a process, and then employed a nonparametric control chart based on data depth to monitor some of the principal components instead of the original variables. Beltran (2006) employed Liu's (1995) r chart using the simplicial depth ranks of the first and last set of principal components. Messaoud, Weihs, and Hering (2008) proposed a data depth-based, distribution-free EWMA control chart for multivariate observations. This procedure consists of computing the Mahalanobis or simplicial depth of a point with respect to the m most recent observations from a process, converting each depth to a sequential rank among the m most recent observations, and monitoring the standardized sequential ranks using the EWMA chart. The authors recommend an IC reference sample of 100 or more points to initiate this method. For multivariate data following an elliptical distribution, Hamurkaroglu, Mert, and Saykan (2004) developed a nonparametric control chart which consists of computing the Mahalanobis depth of each point, ranking each depth measurement with respect to a sample from an IC process, and then using r and Q charts proposed by Liu (1995) to monitor the ranks. Once more, the Phase I problem of identifying an IC reference sample must be solved before using any of these procedures.

1.4 Self-Starting Multivariate Control Charting Methods

Self-starting multivariate methods, in which successive observations are used to update parameter estimates and check for OC conditions, have been suggested as a substitute for solving the Phase I problem because they can be implemented at the very beginning of a process. These methods are designed to reduce reliance on large and potentially costly Phase I samples required by some multivariate control charting procedures. As noted by Sullivan and Jones (2002, p. 25), they can be especially advantageous when production is slow, early OC production is expensive, or there are insufficient samples available to estimate parameters.

One of the earliest attempts at a self-starting multivariate control chart is Quesenberry's (1997) Q-chart, in which the author proposed computing a control chart statistic based on the quadratic form of the deviation of the current observation vector from the estimated mean vector.
The control chart statistic is then transformed to a N(0, 1) scalar and monitored using a univariate Shewhart-type control chart. Schaffer (1998) employed the same basic methodology as Quesenberry (1997), but used a univariate EWMA scheme to monitor the resulting control chart statistic. Both methods assume multivariate normally distributed process data. Sullivan and Jones (2002) introduced a self-starting MEWMA chart, showing that it is more effective than the methods of Quesenberry (1997) and Schaffer (1998) and has the added advantage of robustness to nonnormality with an appropriate choice of smoothing constant. Sullivan and Jones (2002) caution that because parameter estimates are updated with each new observation, changes occurring near the beginning of a process can be unknowingly absorbed into the parameter estimates, thus masking the shift. To guard against this, Sullivan and Jones (2002) recommend augmenting their self-starting chart with a single retrospective analysis at a suitable point in the process, with the exact timing dependent on the dimension as well as other factors.

Zamba and Hawkins (2006) developed a multivariate change-point model which claims to eliminate the requirement of a large Phase I sample. Their method analyzes standardized differences between potential preshift and postshift observations to identify the point at which the mean vector changes, but is only applicable to multivariate normal processes. Also, Zamba and Hawkins' (2006) chart assumes that the mean vector remains constant after a single shift occurs, so it is designed to detect a sustained shift of the mean only.

Hawkins and Maboudou-Tchao (2007) proposed a self-starting methodology which transforms multivariate normal observations with unknown parameters into multivariate standard normal observations which are then charted using the MEWMA chart or any other method requiring known parameters, thus bypassing the difficult task of parameter estimation. However, like most self-starting methods, this technique is susceptible to error resulting from early shifts in the process. Although the authors argue their method eliminates the need for a Phase I - Phase II distinction, they suggest that after the initial phase of data gathering, one should "start with the most recent process reading and successively add and chart the earlier readings back to the start of the sequence" [Hawkins and Maboudou-Tchao (2007, p. 206)] in order to diagnose undetected shifts occurring earlier in the process.

These self-starting methods are certainly viable alternatives under certain conditions. Nevertheless, they have not diminished the need for a more universally applicable distribution-free Phase I multivariate control chart procedure.

1.5 Phase I Multivariate Control Charting Methods

There exist a number of control chart methods developed for use in Phase I, though they are mostly variations of Hotelling's T2 control chart based on the assumption of a multivariate normally distributed process. In addition, the majority of them deal with individual as opposed to subgrouped data. Hotelling's T2 control chart can be applied to individual data in Phase I using control limits outlined by Tracy, Young, and Mason (1992). However, Sullivan and Woodall (1996) showed that the usual practice of pooling all the individual observations to estimate the covariance matrix for a T2 chart results in poor performance in detecting step (sudden) and ramp (gradual) shifts in the mean vector.
They instead proposed using the vector differences between successive individual observations to estimate the IC covariance matrix for the T2 statistic, and demonstrated that this method works better in detecting mean shifts but not outliers. For processes consisting of either individual or subgrouped observations, Sullivan and Woodall (1998) proposed modified MCUSUM and MEWMA charts using simulated control limits to account for the correlation among control statistics as well as a regression-based method with exact (not simulated) limits for detecting sustained shifts in the mean vector. Using simulation, they showed that each of their three proposed methods is better at detecting small shifts in the mean vector than Hotelling's T2 chart. Nedumaran and Pignatiello (2000) addressed the issue of constructing T2 control chart limits for retrospective testing when the parameters of a subgrouped multivariate normally distributed process are unknown. They described and compared a computationally intensive method of determining the exact control limit, Bonferroni adjustments to Alt's (1976) Phase I control limit, and Bonferroni adjustments to the standard χ² limit, ultimately recommending Bonferroni adjustments to Alt's (1976) Phase I limit as the best alternative.

Vargas (2003) proposed T2 control charts for Phase I analysis of individual multivariate normally distributed data using robust estimators of location and dispersion instead of the usual sample mean vector and sample covariance matrix. A total of five different estimators were considered, including the minimum volume ellipsoid (MVE) estimators of Rousseeuw and Van Zomeren (1990), first introduced by Rousseeuw (1984), and the minimum covariance determinant (MCD) technique of Rousseeuw and Van Driessen (1999), also introduced by Rousseeuw (1984). The MVE method finds the ellipsoid of minimum volume that covers a specified minimum number of data points, and uses the geometrical center of the ellipsoid as the location estimator and the matrix defining the ellipsoid itself (multiplied by a constant) as the covariance matrix estimator. The MCD method finds the subset of data that has the smallest covariance matrix determinant while covering a specified minimum number of points. It then uses the sample mean vector and the sample covariance matrix (also multiplied by a constant) of the points in the subset as estimators for location and dispersion. Vargas also considered a trimming approach which removes a proportion of extreme values based on Mahalanobis distance, Sullivan and Woodall's (1996) sample mean vector and covariance matrix estimated from differences of successive observations, and an outlier detection algorithm proposed by Sullivan and Woodall (1996). Based on simulation results, Vargas recommended using both a T2 control chart based on MVE estimators for detecting multiple outliers and the T2 control chart suggested by Sullivan and Woodall (1996) to detect sustained shifts in the mean vector in Phase I.

Jensen, Birch, and Woodall (2007) further detailed the advantages of using the MVE and MCD methods in conjunction with T2 control charts for detecting outliers in individual multivariate normally distributed data during Phase I. They determined that the MVE estimator is best for smaller sample sizes and a smaller percentage of outliers, while the MCD estimator is preferred for larger sample sizes or a larger percentage of outliers. The authors also provided tables of simulated control limits for both estimators.
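Returning to the successive-differences idea of Sullivan and Woodall (1996) described earlier in this section, the MATLAB sketch below illustrates one way to form a covariance estimate from the differences of consecutive individual observations and to compute the corresponding T2 statistics. The divisor 2(m - 1), the placeholder data, and the variable names are choices made here for illustration, not taken verbatim from any source; control limits are omitted because they depend on the Phase I limit chosen.

% Illustrative sketch: T2 statistics for individual observations using a
% covariance estimate built from successive differences, in the spirit of
% Sullivan and Woodall (1996).  X is an m x p matrix of time-ordered data.
X = randn(50, 3);                        % placeholder data, m = 50, p = 3
[m, p] = size(X);
xbar = mean(X);                          % overall mean vector
V = diff(X);                             % (m-1) x p successive differences
SD = (V' * V) / (2 * (m - 1));           % covariance estimate from differences
dev = X - repmat(xbar, m, 1);            % deviations from the mean vector
T2 = sum((dev / SD) .* dev, 2);          % T2(i) = dev(i,:) * inv(SD) * dev(i,:)'

The intuition is that a sustained shift in the mean affects only the single difference that spans the change point, so a covariance estimate built from differences is not inflated by the shift the way the pooled estimate is, which is what improves the chart's sensitivity to step and ramp changes.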
Other Phase I control charting efforts for multivariate normally distributed processes include Alfaro and Ortega's (2008) proposal to trim each variable to obtain robust estimates for the mean vector and covariance matrix, and then use those estimates in Hotelling's T2 chart with Tracy et al.'s (1992) Phase I UCL to provide enhanced outlier detection. Jobe and Pokojovy (2009) created a computationally intensive two-step method of identifying the largest bulk of similar data from a time-ordered sequence of individual multivariate normally distributed points, and used the estimated mean vector and covariance matrix from this bulk in the T2 statistic with empirical control limits. The authors compared the performance of Hotelling's T2 chart using their method, the classical method of parameter estimation, and the robust methods analyzed by Vargas (2003) and Jensen et al. (2007), showing that their method results in improved performance in detecting outliers as well as location shifts in Phase I. The authors attribute their success to the fact that their method considers the time order of the data, whereas other methods do not. Oyeyemi and Ipinyomi (2010) robustly estimated the covariance matrix for Hotelling's T2 chart for individuals in Phase I by identifying a subset of data which meets specified optimality criteria, and then iteratively expanding the subset to a predetermined size. Their method was shown to outperform the MVE and MCD methods in a limited number of cases, but only bivariate normally distributed samples of size m = 30 were considered. Most recently, Yanez, Gonzalez, and Vargas (2010) proposed using biweight S estimators for location and scatter in a T2 chart for individual multivariate normally distributed data with simulated limits, showing that it outperforms Hotelling's T2 chart with MVE estimators for small samples.

Distribution-free and nonparametric Phase I methods, on the other hand, have received little attention in multivariate quality control literature. The only chart found is Dai, Zhou, and Wang's (2006a) unpublished halfspace (Tukey) data depth-based nonparametric MCUSUM chart.

1.6 Developing a Distribution-Free Phase I Procedure -- A Univariate Example

Although unanswered in the multivariate domain, the challenge of developing a distribution-free Phase I procedure has been addressed for the univariate case. The details of the univariate Phase I solution are relevant to the multivariate Phase I problem because this research will ultimately rely on a univariate chart to monitor control statistics resulting from dimension reduction of a multivariate reference sample using data depth. The unique considerations involved in developing a distribution-free Phase I procedure are best illustrated by an example.

Example 1.6.1 Consider a reference sample consisting of m = 25 independent subgroups, each containing n = 5 observations from an unknown distribution. The widely used Shewhart $\bar{X}$ chart with 3σ limits can be created using the procedure outlined in Montgomery (2005), under the assumption that the distribution of subgroup averages is approximately normal due to the central limit theorem. Since the IC parameters $\mu_o$ and $\sigma_o$ are unknown, the lower control limit (LCL), center line (CL), and upper control limit (UCL) are estimated using

$LCL = \hat{\mu}_o - 3\hat{\sigma}_o / \sqrt{n}$   (1.6.1)

$CL = \hat{\mu}_o$   (1.6.2)

$UCL = \hat{\mu}_o + 3\hat{\sigma}_o / \sqrt{n}$   (1.6.3)

where $\hat{\mu}_o$ and $\hat{\sigma}_o$ are unbiased estimators for $\mu_o$ and $\sigma_o$. Montgomery (2005, pp. 196-198) discusses several choices for $\hat{\mu}_o$ and $\hat{\sigma}_o$.
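As a concrete illustration of Example 1.6.1, the MATLAB sketch below applies Equations (1.6.1)-(1.6.3) to a simulated set of m = 25 subgroups of size n = 5. The use of the grand mean for $\hat{\mu}_o$ and of $\bar{S}/c_4$ for $\hat{\sigma}_o$ is only one of the estimator choices discussed by Montgomery (2005); the data and variable names are placeholders chosen here for illustration.

% Illustrative sketch of Equations (1.6.1)-(1.6.3) for m = 25 subgroups of
% size n = 5, using the grand mean and S-bar/c4 (one common unbiased choice).
m = 25; n = 5;
X = randn(m, n);                                    % placeholder subgrouped data
xbar = mean(X, 2);                                  % subgroup averages
sbar = mean(std(X, 0, 2));                          % average subgroup standard deviation
c4 = sqrt(2/(n-1)) * gamma(n/2) / gamma((n-1)/2);   % unbiasing constant for S
muHat = mean(xbar);                                 % estimate of mu_o
sigmaHat = sbar / c4;                               % unbiased estimate of sigma_o
LCL = muHat - 3*sigmaHat/sqrt(n);
CL  = muHat;
UCL = muHat + 3*sigmaHat/sqrt(n);
flagged = find(xbar < LCL | xbar > UCL)             % subgroups to investigate

In an actual Phase I analysis the flagged subgroups would be investigated, any points with assignable causes removed, and the limits recomputed, iterating exactly as described in Section 1.2.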
Using Equations (1.6.1), (1.6.2), and (1.6.3), the initial Phase I control chart for this example is illustrated in the top panel of Figure 1.6.1. Suppose that investigation of the potential OC point represented by subgroup average number 11 reveals an assignable cause, so the point is deemed OC. The revised control limits in the bottom panel of Figure 1.6.1 are narrower due to the exclusion of subgroup 11, and all remaining subgroup averages now fall within the updated control limits. The IC reference sample has been established, and the most recent control limits can be used for Phase II monitoring.

Figure 1.6.1 Initial (Top Panel) and Revised (Bottom Panel) Control Charts

Determination of the overall IC FAP for the control chart in Example 1.6.1 would be straightforward under conditions of normality of subgroup averages and known parameters. The overall IC FAP, or P(at least one false alarm among all m = 25 comparisons), would be calculated as follows: (1 - (1 - 0.0027)^25) = 0.0654. The overall IC FAP, while considerably higher than the individual FAP of 0.0027, could easily be lowered by using limits wider than 3σ. If, on the other hand, the underlying distribution of the subgroup averages is not normal, the true overall IC FAP may be much larger. Suppose for example that the actual individual FAP is 0.01. Then the overall IC FAP equals (1 - (1 - 0.01)^25) = 0.2222. With only a slight increase in the individual IC FAP, the overall IC FAP increased dramatically. This could result in a large number of IC subgroups being erroneously excluded during Phase I.

Furthermore, when the parameters are unknown as in Example 1.6.1, successive comparisons of subgroup averages to control limits are dependent. Therefore, the overall IC FAP may not be determined using 1 minus the product of the complements of the m = 25 individual FAPs. Instead, control limits designed to achieve a specified overall IC FAP must be determined using the joint density function or the simulated empirical distribution of the subgroup averages. Champ and Jones (2004) dealt with the case of a normally distributed process and unknown parameters by using the (joint) multivariate t distribution of the m control statistics to define control limits to achieve a desired overall IC FAP. For processes in which normality cannot be established and parameters are unknown, Jones-Farmer, Jordan, and Champ (2009) proposed a rank-based Phase I location chart which is essentially a Shewhart chart of standardized subgroup mean ranks. This method uses approximate multivariate normal theory control limits (for large subgroup sizes n) and simulated control limits (for smaller subgroup sizes n) to achieve a specified overall IC FAP.

The issues of data nonnormality and dependence among control statistics are problematic for any parametric Phase II control charting method used in Phase I, including multivariate procedures. These are precisely the problems this research seeks to address by developing a distribution-free method of establishing an IC reference sample for a multivariate process consisting of subgrouped data.

1.7 Special Considerations in Multivariate Quality Control

There are two drawbacks to multivariate quality control that must be kept in mind in any research effort. The first is computational complexity. Multivariate control charting methods are inherently more computationally intensive than univariate methods.
Despite advances in quality control software, complex methods can quickly become unmanageable as the dimension of the data increases. The development of methods that only work for two or three variables, or that are too complex to be used by practitioners, must be guarded against.

The second downside to multivariate quality control is the issue of interpretation. Multivariate control chart techniques do not directly identify which variable(s) caused an OC signal. As previously discussed, it is insufficient to simply separate and individually chart each variable belonging to an OC multivariate process, because correlated variables may behave differently alone than when in combination with each other. As a result, many useful approaches to interpreting OC signals in a multivariate setting have been proposed, and a summary of such works is provided by Bersimis et al. (2007) in an overview of multivariate statistical process control charts. While this problem will not be specifically addressed by this research, it should be considered when developing a new procedure.

1.8 Organization of Dissertation

The remainder of this document is dedicated to the detailed development and application of a data depth-based, distribution-free Phase I multivariate control charting method for detecting location changes in subgrouped data. In Chapter 2, data depth is explored as a distribution-free method of reducing multi-dimensional data to univariate ranks, and the advantages and disadvantages of several depth functions considered for implementation are discussed. Chapter 3 addresses the actual design of the data depth-based, distribution-free Phase I control chart for subgrouped multivariate data. In Chapter 4, the simulation-based performance assessment plan for the proposed method is discussed, and detailed algorithms for measuring performance under various location shifts in normal, heavy-tailed, and skewed distributions are provided. Chapter 5 contains the results of extensive simulation runs comparing the proposed data depth-based, distribution-free Phase I multivariate method to Hotelling's T2 chart with Phase I UCL. Chapter 6 is dedicated to a comprehensive application of the proposed data depth-based, distribution-free Phase I multivariate method to a simulated historical data set containing several location shifts. This dissertation concludes in Chapter 7 with a synopsis of research conducted, recommendations for Phase I analysis when dealing with subgrouped multivariate data under conditions of normality and nonnormality, recommendations for subsequent Phase II monitoring, and discussion of areas in need of further investigation.

2 Measuring Centrality of Multivariate Data Using Data Depth

2.1 Fundamentals of Data Depth

A data depth measures how deep (or central) a point $x \in \mathbb{R}^p$ is with respect to a certain probability distribution F or a given data cloud $X^n = \{X_1, \ldots, X_n\}$ in $\mathbb{R}^p$. A data depth is computed by applying one of many known data depth functions to a multivariate data point, thus reducing it from a p-vector to a univariate depth value. Assuming unimodality of the data, a large depth value indicates centrality and a low depth value suggests outlyingness of a given point. Depth values are usually normalized to have a range of [0, 1]. The point of maximal depth is considered the center of the data and is referred to as the multivariate median.
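As a small illustration of how a depth function reduces p-dimensional observations to univariate values and ranks, the MATLAB sketch below uses one common definition of Mahalanobis depth, $MHD(x) = [1 + (x - \bar{x})' S^{-1} (x - \bar{x})]^{-1}$, built from the ordinary sample mean and covariance. This is for intuition only and is an assumption of this sketch rather than the depth function ultimately adopted in this dissertation, which replaces $\bar{x}$ and S with robust estimates of location and scatter.

% Illustrative sketch: center-outward ranking of a bivariate sample using a
% classical Mahalanobis depth, MHD(x) = 1/(1 + (x - xbar)*inv(S)*(x - xbar)').
X = randn(50, 2);                         % placeholder bivariate sample
xbar = mean(X);
S = cov(X);
dev = bsxfun(@minus, X, xbar);            % deviations from the sample mean
d2 = sum((dev / S) .* dev, 2);            % squared Mahalanobis distances
depth = 1 ./ (1 + d2);                    % larger depth = more central point
[~, idx] = sort(depth, 'descend');        % center-outward ordering of rows
deepest = X(idx(1), :);                   % sample point playing the role of the multivariate median

Because this depth is a decreasing function of a covariance-based distance, its depth contours are ellipsoids, one example of the contour restrictions discussed below.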
A data depth function may be visualized in p-dimensional space as a series of nested contours around the multivariate median, where each contour represents the set of p-dimensional points with equal depth values. Some depth functions force contours of a particular geometric form (e.g. elliptical), whereas others allow contours to follow the actual geometric shape of the data. Data depth facilitates the extension of order statistics to higher dimensions, because depth values can be ranked from largest to smallest to produce a center-outward ordering of the data. The ordered depth values can then be used to detect outliers, which are known in multivariate quality control literature as OC points. Data depth allows multivariate data from any distribution to be characterized by the relative position of the data points rather than parameters estimated from the actual data values. This rank-based perspective makes data depth potentially very useful as a distribution-free method of multivariate analysis.

The concept of data depth dates as far back as Tukey (1975), but until recently its usefulness for statistical quality control has been limited by the tradeoff between statistical properties, robustness to nonnormality, and computational complexity. After a comprehensive review of numerous existing depth functions, this research will implement robust Mahalanobis depth and Mahalanobis spatial depth because they are computationally feasible in any dimension, sufficiently robust to outliers under the assumptions of this research, and satisfy the four desirable properties of data depth functions discussed by Zuo and Serfling (2000).

2.2 Desirable Properties of Data Depth Functions

For a depth function D(x; F) to serve most effectively as an analytical tool, the following four properties are required [Liu (1990), Zuo and Serfling (2000)]. Denote the class of probability distributions on R^p by ℱ.

• Property 1: Affine invariance. The depth of a point x ∈ R^p should not depend on the underlying coordinate system or, in particular, on the scales of the underlying measurements. This ensures that a point classified as an outlier or nonoutlier in one coordinate system is similarly classified in another coordinate system resulting from an affine transformation. Formally stated, D(Ax + b; F_{AX+b}) = D(x; F_X) holds for any random vector X in R^p, any p x p nonsingular matrix A, and any p-vector b.

• Property 2: Maximality at center. For a distribution having a uniquely defined "center" (e.g., the point of symmetry with respect to some notion of symmetry), the depth function should attain maximum value at this center. This supports an accurate center-outward ordering of the data points. Formally stated, D(θ; F) = sup_{x ∈ R^p} D(x; F) holds for any F ∈ ℱ having center θ, where ℱ is the class of distributions on the Borel sets of R^p.

• Property 3: Monotonicity relative to deepest point. As a point x ∈ R^p moves away from the "deepest point" (the point at which the depth function attains maximum value; in particular, for a symmetric distribution, the center) along any fixed ray through the center, the depth at x should decrease monotonically. This also supports an accurate center-outward ordering of the data points. Formally stated, for any F ∈ ℱ having deepest point θ, D(x; F) ≤ D(θ + α(x − θ); F) holds for α ∈ [0, 1].

• Property 4: Vanishing at infinity. The depth of a point x should approach zero as ‖x‖ approaches infinity, where ‖x‖ is the Euclidean norm of x.
This ensures the data depth function is both bounded and nonnegative. Formally stated, D(x; F) → 0 as ‖x‖ → ∞, for each F ∈ ℱ.

According to Zuo and Serfling (2000), depth functions which satisfy these four properties are particularly well suited for nonparametric multivariate inference, so these properties will serve as a useful basis for describing the data depth functions selected for implementation in this research.

A depth function may be viewed as a location estimator, and as such may be characterized by its finite-sample replacement breakdown point (RBP). First defined by Donoho and Huber (1983), the RBP is the minimum fraction of a sample which must be replaced by outliers in order to completely ruin an estimate, so a low RBP indicates nonrobustness and a high RBP signifies robustness to outliers. When used to describe a depth function, the RBP is usually stated in reference to the multivariate median estimated by a depth function. The RBP of the multivariate median is important because if the center of the data (as determined by the multivariate median) is significantly affected by outliers, the subsequent center-outward ordering will likewise be affected and outliers may be masked. Whether a depth function has a high or low RBP is often determined by the robustness of any location or scatter estimators used in its construction. The robustness of such location or scatter measures is also described using the RBP. Precise definitions of RBPs for both location and scatter estimators are adapted from Donoho and Huber (1983) and Lopuhaa and Rousseeuw (1991).

Let X_n = {X_1, ..., X_n} be a random sample of size n in R^p. The RBP of a location estimator T at X_n, or the smallest fraction k/n of outliers which can take the resulting estimate beyond any bound, is defined as

RBP(T; X_n) = \min_k \left\{ \frac{k}{n} : \sup_{X_{n,k}} \lVert T(X_{n,k}) - T(X_n) \rVert = \infty \right\},   (2.2.1)

where X_{n,k} is a contaminated sample found by replacing k points of X_n with arbitrary values. The RBP of a scatter estimator C at X_n, or the smallest fraction k/n of outliers which can drive the largest eigenvalue of the resulting estimate to infinity or the smallest eigenvalue of the resulting estimate to zero, is defined as

RBP(C; X_n) = \min_k \left\{ \frac{k}{n} : \sup_{X_{n,k}} M\left( C(X_{n,k}), C(X_n) \right) = \infty \right\},   (2.2.2)

where X_{n,k} is defined as before, M(A, B) = \max\left( \left| \lambda_1(A) - \lambda_1(B) \right|, \left| \lambda_p(A)^{-1} - \lambda_p(B)^{-1} \right| \right), and \lambda_1(A) \ge \cdots \ge \lambda_p(A) are the ordered eigenvalues of the matrix A.

To illustrate the idea of an RBP, consider a sample of size n in R^1 and two common location estimators: the sample mean and the sample median. The sample mean has an RBP of only 1/n because a single outlier could move the sample mean to infinity, so it is considered a nonrobust location estimator. In contrast, the sample median has the highest possible RBP of 1/2 because 1/2 of the sample would have to be contaminated with outliers in order to effect a corresponding shift in the sample median. Consequently, the sample median is the preferred location estimator in R^1 from a robustness standpoint.
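The breakdown contrast between the mean and the median is easy to demonstrate numerically. The following minimal MATLAB sketch, with an arbitrary illustrative sample and contamination value, replaces a single observation (k = 1 of n = 10) with an extreme value and shows the sample mean being carried away while the sample median barely moves.

    x = [4.1 5.0 4.7 5.3 4.9 5.1 4.8 5.2 4.6 5.4];
    x_contam = x;
    x_contam(1) = 1e6;                       % replace k = 1 of n = 10 observations
    [mean(x) mean(x_contam)]                 % the mean is ruined by a single outlier
    [median(x) median(x_contam)]             % the median moves only from 4.95 to 5.05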
In addition to having a high RBP, any location or scatter estimator used in conjunction with a data depth function should also be affine equivariant. From Lopuhaa and Rousseeuw (1991), a location estimator T is affine equivariant if T(AX_n + b) = A T(X_n) + b for any p-vector b and any p x p nonsingular matrix A, and a positive definite scatter estimator C is said to be affine equivariant if C(AX_n + b) = A C(X_n) A' for any p-vector b and any p x p nonsingular matrix A. Akin to the concept of affine invariance for data depth functions, affine equivariance means that an estimator does not depend on the location, scale, or orientation of the data. According to Lopuhaa and Rousseeuw (1991), finding affine equivariant estimators with high RBPs is a challenging problem. However, these properties are of paramount importance to any multivariate quality control application, so only estimators possessing these properties will be considered in this research.

2.3 Robust Mahalanobis Depth

The Mahalanobis depth (MHD) of a point x in R^p with respect to a distribution F in R^p is defined as

MHD(x; F) = \left[ 1 + d^2_{\Sigma(F)}\left( x, \mu(F) \right) \right]^{-1},   (2.3.1)

where μ(F) and Σ(F) are location and covariance measures defined on F and d^2_M(x, y) = (x - y) M^{-1} (x - y)' is the squared Mahalanobis distance [Mahalanobis (1936)] between two points x and y in R^p with respect to a positive definite p x p matrix M. When the distribution F is unknown and a random sample X_n = {X_1, ..., X_n} is used to estimate μ(F) and Σ(F), the sample version of the depth function is denoted MHD(x; F_n), where F_n denotes the empirical distribution function of the sample. MATLAB code for computing Mahalanobis depth, based on a modification of S. Mazumder's (personal communication, July 7, 2010) algorithm, is provided in Appendix A.

The Mahalanobis depth function satisfies the four desirable properties listed by Zuo and Serfling (2000) and is relatively easy to compute, but assumes the underlying distribution F is elliptical and therefore produces elliptical contours of equal depth. In addition, as noted by Zuo and Serfling (2000), the RBP of the median determined by the Mahalanobis depth function is completely dependent on the choice of location and covariance measures μ(F) and Σ(F). If the classical location and covariance estimators X̄_n and S_n are used, the Mahalanobis depth function is nonrobust. The presence of even a single outlier can contaminate the estimators X̄_n and S_n, possibly masking the presence of outliers. In order to preclude this, Mahalanobis depth should be used in conjunction with robust estimators. Mahalanobis depth will be referred to as robust Mahalanobis depth (RMD) when used with robust location and scatter estimators.

There are numerous robust estimation methods from which to choose. Dang and Serfling (2010) noted that the computationally complex MCD method proposed by Rousseeuw (1984) or the more efficient Fast-MCD method of Rousseeuw and Van Driessen (1999) could be used to produce affine equivariant, robust location and covariance estimates. As discussed in Chapter 1, the MCD method finds the subset of data that has the smallest covariance matrix determinant while covering a user-specified number of points. It then uses the sample mean vector and the sample covariance matrix of the points in the subset as estimators for location and dispersion. According to Jensen et al. (2007), MCD estimators have a maximum RBP of ⌊(n − p + 1)/2⌋/n, which is approximately 1/2 for reasonable values of n and p, when the number of points used is equal to the integer value of (n + p + 1)/2. The Fast-MCD program is available in many statistical software packages such as R, S-PLUS, and SAS. In addition, a library of MATLAB codes for robust analysis including the Fast-MCD program may be obtained from the LIBRA website at http://wis.kuleuven.be/stat/robust/Libra.html.
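As a complement to the Appendix A routine referenced above (which is not reproduced here), the following minimal MATLAB sketch of Equation (2.3.1) takes the location vector and scatter matrix as arguments, so supplying robust estimates (for example, MCD or BACON output) yields the robust Mahalanobis depth just described. The function name and interface are illustrative assumptions.

    function depth = mahalanobis_depth(X, mu, S)
    % Minimal sketch of Equation (2.3.1): X is an n x p data matrix, mu a
    % 1 x p location vector, and S a p x p positive definite scatter matrix.
    Xc = X - mu;                             % center each observation at mu
    d2 = sum((Xc / S) .* Xc, 2);             % squared Mahalanobis distances
    depth = 1 ./ (1 + d2);                   % depth values in (0, 1]
    end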
Another alternative for finding robust estimators of location and scatter is the blocked adaptive computationally efficient outlier nominators (BACON) method of Billor, Hadi, and Velleman (2000). The BACON method is very computationally efficient, even for extremely large data sets. It begins with a small outlier-free subset of the data, and then allows this subset to grow rapidly until a stopping criterion is reached. Two versions of this iterative forward selection method are available: Version 2, which is nearly affine equivariant and has RBPs exceeding 40% for various combinations of dimension and sample size, and Version 1, which is completely affine equivariant with an RBP of approximately 20%. The Type I error probability (α) for the BACON method can be set to any number between 0 and 1, but α = 0.05 is suggested for most applications. MATLAB code for the BACON method is available from the authors.

After several rounds of experimentation, it was decided to use the BACON method (with α = 0.10) to estimate the process mean vector and \bar{S} = \frac{1}{m} \sum_{i=1}^{m} S_i, the scatter estimator for Hotelling's T2 chart when data are divided into m subgroups, to estimate the process covariance matrix. The BACON method was chosen as the location estimator because of its excellent balance between computational efficiency and robustness. Although \bar{S} is generally not considered a robust estimator, it was chosen as the scatter estimator because it is highly robust to location shifts (the focus of this research) when process data possess a common within-subgroup covariance structure. Details are provided in Chapter 5.

Example 2.3.1

To illustrate an application of the robust Mahalanobis depth function, consider the bivariate random sample X_5 = {X_1, ..., X_5} from an unknown distribution, where each X_i = (X_{i1}, X_{i2}), i = 1, ..., 5, is plotted in Figure 2.3.1 and listed below:

i    X_i
1    (11.15, 49.63)
2    (7.91, 36.46)
3    (5.42, 28.06)
4    (16.22, 38.77)
5    (8.09, 29.21)

Figure 2.3.1 Bivariate Random Sample

The first step in computing RMD for this sample is to estimate the mean vector using the BACON method (with α = 0.10) and the covariance matrix using Hotelling's T2 scatter estimator for subgrouped data. Because this example involves individual as opposed to subgrouped observations, Hotelling's T2 scatter estimator for subgrouped data reduces to the classical nonrobust sample covariance matrix. Under these conditions, the robust BACON scatter estimator may be a better choice, but Hotelling's T2 scatter estimator is used to maintain consistency with the methodology employed throughout the remainder of this research. Estimates of location and scatter are determined to be:

\bar{X}_{BACON} = (8.14, 35.84), \quad S_{HT^2} = \begin{pmatrix} 17.18 & 20.45 \\ 20.45 & 75.48 \end{pmatrix}, \quad S_{HT^2}^{-1} = \begin{pmatrix} 0.09 & -0.02 \\ -0.02 & 0.02 \end{pmatrix}.

Note that the BACON method excluded X_4 = (16.22, 38.77) from the estimated mean vector due to its outlyingness relative to the other points. Using the RMD function,

RMD(x; F_n) = \left[ 1 + (x - \bar{X}_{robust}) \, S_{robust}^{-1} \, (x - \bar{X}_{robust})' \right]^{-1},

and the location and scatter estimates, the robust Mahalanobis depth for X_1 = (11.15, 49.63) is computed as follows:

RMD(X_1; F_5) = \left[ 1 + (X_1 - \bar{X}_{BACON}) \, S_{HT^2}^{-1} \, (X_1 - \bar{X}_{BACON})' \right]^{-1}
             = \left[ 1 + (3.01, 13.79) \begin{pmatrix} 0.09 & -0.02 \\ -0.02 & 0.02 \end{pmatrix} (3.01, 13.79)' \right]^{-1}
             = \left[ 1 + 2.56 \right]^{-1}
             = 0.28.
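For readers following along in MATLAB, the mahalanobis_depth sketch given earlier can be used to check this hand computation. With the reported BACON mean vector and Hotelling's T2 scatter estimate, it reproduces the depth values of Table 2.3.1 below up to rounding.

    X  = [11.15 49.63; 7.91 36.46; 5.42 28.06; 16.22 38.77; 8.09 29.21];
    mu = [8.14 35.84];                       % BACON mean vector reported above
    S  = [17.18 20.45; 20.45 75.48];         % Hotelling's T2 scatter estimate
    rmd = mahalanobis_depth(X, mu, S)        % approximately (0.28 0.98 0.55 0.18 0.54)'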
RMD computations for the four remaining observations in the sample proceed in the same manner. The final results, along with corresponding rankings, are provided in Table 2.3.1. As expected, X_2 attains the highest depth value since it is closest to the center of the data set (as defined by the BACON mean vector), and X_4 receives the lowest depth value since it is most outlying.

i    X_i               RMD(X_i; F_5)    rank
1    (11.15, 49.63)    0.28             4
2    (7.91, 36.46)     0.98             1
3    (5.42, 28.06)     0.55             2
4    (16.22, 38.77)    0.18             5
5    (8.09, 29.21)     0.54             3

Table 2.3.1 Data Ranked According to RMD

2.4 Mahalanobis Spatial Depth

Mahalanobis spatial depth (MSD) [Dang and Serfling (2010)] is an attractive alternative to robust Mahalanobis depth because it is only slightly more difficult to compute yet is not restricted to elliptical distributions. This means that the contours of equal depth determined by the depth function conform to the geometric structure and shape of the data, as opposed to being constrained to an elliptical form. Mahalanobis spatial depth is based on the concept of spatial depth (SPD), defined by Vardi and Zhang (2000) for a point x in R^p with respect to a distribution F in R^p as

SPD(x; F) = 1 - \left\lVert E \, S(x - X) \right\rVert, \quad \text{where} \quad S(x) = \begin{cases} x / \lVert x \rVert & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}   (2.4.1)

Intuitively, the spatial depth of a multivariate point x is equal to one minus the length of the average of the unit vectors from x to all observations in the sample. Spatial depth is graphically illustrated in Figure 2.4.1.

Figure 2.4.1 Illustration of Spatial Depth

The spatial depth function is quickly computable in any dimension, and its multivariate median has a very favorable RBP of 1/2 [Vardi and Zhang (2000)]. It also satisfies the properties of maximality at center (with some exceptions; see Zuo and Serfling (2000) for details), monotonicity relative to deepest point, and vanishing at infinity. However, it is not completely affine invariant. According to Serfling (2002), the spatial depth function is invariant with respect to shift, orthogonal, and homogeneous scale transformations of the data, but not heterogeneous scale transformations. This is sufficient if all variables share the same unit of measure, but this is not always the case in a multivariate quality control application, so a modification of the spatial depth function is needed.

Serfling (2010) showed that a fully affine invariant modification of the spatial depth function may be accomplished by standardizing the sample data using any weak covariance functional, which is defined as follows [Serfling (2010, p. 9)]: "A symmetric positive definite p x p matrix-valued functional C(F) is called a weak covariance functional if, for Y = AX + b with any nonsingular p x p matrix A and any vector b, C(F_Y) = k_1 A C(F_X) A', with k_1 = k_1(A, b, F_X) a positive scalar function of A, b, and F_X. The sample version for a data set X_n = {X_1, ..., X_n} in R^p may be expressed, with Y_n = A X_n + b and k_1 = k_1(A, b, X_n), as C(Y_n) = k_1 A C(X_n) A'."

Application of a weak covariance functional transformation leads to Serfling's (2010) formula for computation of Mahalanobis spatial depth (MSD) for a point x in R^p with respect to a distribution F in R^p:

MSD(x; F) = 1 - \left\lVert E \, S\!\left( C(F_X)^{-1/2} (x - X) \right) \right\rVert.   (2.4.2)

The sample version for a point x with respect to a random sample X_n = {X_1, ..., X_n} in R^p is

MSD(x; F_n) = 1 - \left\lVert E \, S\!\left( C_n(X_n)^{-1/2} (x - X) \right) \right\rVert.   (2.4.3)

There are a number of options available for determining the sample weak covariance functional C_n(X_n), but again \bar{S} = \frac{1}{m} \sum_{i=1}^{m} S_i, the scatter estimator for Hotelling's T2 chart when data are divided into m subgroups, will be used in this research because of its robustness to location shifts under the assumption of constant within-subgroup covariance. MATLAB code for computing Mahalanobis spatial depth, based on a modification of S. Mazumder's (personal communication, July 7, 2010) algorithm, is provided in Appendix B.
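The following minimal MATLAB sketch implements the sample version in Equation (2.4.3) directly from this definition. It is a simplified, assumed stand-in for the Appendix B routine, with the weak covariance functional supplied by the caller.

    function depth = mahalanobis_spatial_depth(X, C)
    % Minimal sketch of Equation (2.4.3): X is an n x p data matrix and C is
    % the chosen sample weak covariance functional (a symmetric positive
    % definite p x p matrix, e.g. the pooled scatter estimate discussed above).
    n = size(X, 1);
    Xs = X / sqrtm(C);                       % standardize: X * C^(-1/2)
    depth = zeros(n, 1);
    for i = 1:n
        D = Xs - Xs(i, :);                   % vectors from point i to every observation
        len = sqrt(sum(D.^2, 2));
        U = D(len > 0, :) ./ len(len > 0);   % unit vectors; the S(0) = 0 term is omitted
        depth(i) = 1 - norm(sum(U, 1) / n);  % one minus the length of their average
    end
    end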
Example 2.3.1 will be revisited to illustrate an application of the Mahalanobis spatial depth function. Computing MSD begins by multiplying the data set by the negative square root of the weak covariance functional C_5(X_5) = S_{HT^2} as follows:

X^* = X_5 \, S_{HT^2}^{-1/2} = \begin{pmatrix} 11.15 & 49.63 \\ 7.91 & 36.46 \\ 5.42 & 28.06 \\ 16.22 & 38.77 \\ 8.09 & 29.21 \end{pmatrix} \begin{pmatrix} 17.18 & 20.45 \\ 20.45 & 75.48 \end{pmatrix}^{-1/2} = \begin{pmatrix} 0.43 & 5.74 \\ 0.24 & 4.23 \\ -0.01 & 3.29 \\ 2.50 & 4.06 \\ 0.69 & 3.29 \end{pmatrix}.

Next, the spatial depth formula is applied to each observation in the transformed sample, beginning with X^*_1 = (0.43, 5.74). The first step in this process is to determine the unit vectors from X^*_1 to every point in the sample:

(X^*_1 - X^*_1) / \lVert X^*_1 - X^*_1 \rVert = (0.00, 0.00) by definition
(X^*_2 - X^*_1) / \lVert X^*_2 - X^*_1 \rVert = (-0.20, -1.51) / 1.52 = (-0.13, -0.99)
(X^*_3 - X^*_1) / \lVert X^*_3 - X^*_1 \rVert = (-0.44, -2.44) / 2.48 = (-0.18, -0.98)
(X^*_4 - X^*_1) / \lVert X^*_4 - X^*_1 \rVert = (2.07, -1.68) / 2.66 = (0.78, -0.63)
(X^*_5 - X^*_1) / \lVert X^*_5 - X^*_1 \rVert = (0.26, -2.45) / 2.46 = (0.11, -0.99).

Then, the average of the unit vectors from the point X^*_1 to every point in the sample is computed:

\frac{1}{5}\left[ (0.00, 0.00) + (-0.13, -0.99) + (-0.18, -0.98) + (0.78, -0.63) + (0.11, -0.99) \right] = (0.12, -0.72).

Finally, the Euclidean norm of the resulting vector is subtracted from one in order to arrive at the Mahalanobis spatial depth value of the point X^*_1:

MSD(X^*_1; F_5) = 1 - \sqrt{(0.12)^2 + (-0.72)^2} = 0.27.

Computations for the four remaining observations in the sample proceed in the same fashion. The final results, along with corresponding rankings, are listed in Table 2.4.1. Rankings for X_1 and X_4 were assigned as indicated because MSD(X^*_1; F_5) > MSD(X^*_4; F_5) when MSD(X^*_1; F_5) and MSD(X^*_4; F_5) are expanded to four significant digits.

i    X_i               MSD(X^*_i; F_5)    rank
1    (11.15, 49.63)    0.27               4
2    (7.91, 36.46)     0.68               1
3    (5.42, 28.06)     0.35               3
4    (16.22, 38.77)    0.27               5
5    (8.09, 29.21)     0.53               2

Table 2.4.1 Data Ranked According to MSD

Depth values and rankings using the MSD function are somewhat different than those obtained using the RMD function. This is because RMD assumes the data are elliptically symmetric, whereas MSD makes no distributional assumptions about the data.
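As a quick check (an assumed illustration rather than the author's original code), applying the mahalanobis_spatial_depth sketch above to the Example 2.3.1 data, with the sample covariance matrix standing in for the Hotelling's T2 scatter estimator as in the example, should reproduce the depth values in Table 2.4.1 up to rounding.

    X = [11.15 49.63; 7.91 36.46; 5.42 28.06; 16.22 38.77; 8.09 29.21];
    C = cov(X);                              % equals S_HT2 here (individual observations)
    msd = mahalanobis_spatial_depth(X, C)    % approximately (0.27 0.68 0.35 0.27 0.53)'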
2.5 Simplicial Depth

As discussed in Chapter 1, simplicial data depth played a prominent role in early depth-based nonparametric multivariate control charting efforts, so a justification for its exclusion from this research is necessary. Introduced by Liu (1990), the simplicial depth (SD) of a point x in R^p with respect to a distribution F in R^p is defined as the probability that x belongs to a random simplex in R^p, formally stated as

SD(x; F) = P_F\left( x \in S[X_1, \ldots, X_{p+1}] \right),   (2.5.1)

where X_1, ..., X_{p+1} are independent observations from F and S[X_1, ..., X_{p+1}] denotes the p-dimensional simplex with vertices X_1, ..., X_{p+1}, or the set of all points in R^p that are convex combinations of X_1, ..., X_{p+1}. For a random sample X_n = {X_1, ..., X_n} from F in R^p, the sample simplicial depth function is derived from this definition to be

SD(x; F_n) = \binom{n}{p+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{p+1} \le n} I\left( x \in S[X_{i_1}, \ldots, X_{i_{p+1}}] \right),   (2.5.2)

where I is the indicator function. SD(x; F_n) computes the fraction of the random sample simplices containing the point x. In order to check whether a point x in R^p is inside a simplex S[X_1, ..., X_{p+1}], the following system of p + 1 equations with p + 1 unknowns must be solved:

a_1 X_1 + a_2 X_2 + \cdots + a_{p+1} X_{p+1} = x   (2.5.3)

a_1 + a_2 + \cdots + a_{p+1} = 1.   (2.5.4)

Equation (2.5.3) translates into p equations which check to see if the p-dimensional point x can be expressed as a linear combination of the p + 1 vertices forming a given simplex S[X_1, ..., X_{p+1}]. Equation (2.5.4) represents a constraint that the coefficients a_1, a_2, ..., a_{p+1} sum to one. According to Liu (1990), if the simplex is nondegenerate, this system of equations has a unique solution. Furthermore, the point x is inside the simplex if and only if the coefficients a_1, a_2, ..., a_{p+1} are all positive. For a given point x, this process must be repeated for each of the \binom{n}{p+1} possible p-dimensional simplices S[X_{i_1}, ..., X_{i_{p+1}}] formed by the sample X_n = {X_1, ..., X_n}.

In order to illustrate the simplicial depth function, a simple graphical example is provided. Consider a sample of size n = 5 from a continuous bivariate distribution F, and suppose the simplicial depth of a point x is desired. There are a total of \binom{5}{3} = \frac{5!}{3!(5-3)!} = 10 possible triangles that can be formed from the sample, three of which contain the point x as illustrated in Figure 2.5.1: X_1X_2X_4, X_1X_3X_4, and X_1X_4X_5. Therefore, the simplicial depth of the point x is SD(x; F_5) = 3/10 = 0.30.

Figure 2.5.1 Illustration of Simplicial Depth

Liu (1990) showed that the simplicial depth function satisfies the affine invariance, vanishing at infinity, maximality at center, and monotonicity properties for continuous distributions. However, as demonstrated by Zuo and Serfling (2000), the maximality and monotonicity properties fail for some discrete distributions, which could be problematic when dealing with a finite sample. As noted by Li and Liu (2004), the exact simplicial depth may be computed in any dimension by solving a system of linear equations, but more efficient algorithms are needed due to the increased computational complexity in higher dimensions. Rousseeuw and Ruts (1996) provided such an algorithm for the bivariate case, but for dimensions greater than two this remains an open problem. Since computational feasibility in higher dimensions is an important goal of this research, simplicial depth will not be implemented in the multivariate quality control charting method proposed in the following chapter.

3 The Multivariate Mean-Rank (MMR) Control Chart

3.1 Introduction

A multivariate quality control Phase I analysis begins with a p-dimensional reference sample, often from an unknown distribution, which may contain one or more OC points. Application of a data depth function to the multivariate reference sample reduces the dimension of the reference sample from p to one.
Then a univariate control charting method, with control limits adjusted to account for the dependence among successive comparisons of control chart statistics to control limits, may be applied to the resulting depth values in order to identify and remove the OC points, thus producing an IC reference sample which will serve as a basis for Phase II monitoring. Differences between Phase I and Phase II were explained in detail in Chapter 1, but will be briefly reiterated here as these differences directly impact the manner in which control limits are determined in a Phase I analysis. In Phase II, the monitoring stage of a control charting application, each new observation is compared (through a control chart statistic) to fixed control limits. With data depth-based methods such as those described by Liu (1995), control limits are often fixed by using an IC reference sample to approximate the univariate distribution of the control chart statistic. Knowledge of this distribution is used to set control limits designed to achieve a certain maximum IC FAP. 41 In Phase I, the retrospective analysis stage of a control charting application, a fixed number of m existing observations (or subgroups) from a reference sample are successively compared through control chart statistics to trial control limits which are constantly revised as OC points are identified and removed from the reference sample. This renders successive comparisons of control chart statistics to control limits dependent, so control limits must be determined by manipulation of the joint distribution of the control chart statistic, simulation of the empirical joint distribution of the control chart statistic, or other techniques which account for these dependencies. Methods such as these will be necessary to design control limits for the data depth-based variation of the X chart used in this research. The X chart for subgrouped data was selected as the model for implementation because it is particularly well suited for use in a Phase I analysis. The X chart analyzes only the information from the most recent observation or subgroup. This makes it very effective at detecting single outliers or large shifts in a process which commonly occur in Phase I. According to Montgomery (2005, p. 385), Shewhart-type charts (such as X charts) "are extremely useful in Phase I implementation of statistical process control, where the process is likely to be OC and experiencing assignable causes that result in large shifts in the monitored parameters." On the contrary, other methods such as cumulative sum (CUSUM), exponentially weighted moving average (EWMA), and moving average (MA) charts use more information from a sample and are therefore typically preferred for Phase II monitoring. A CUSUM chart is used to plot the cumulative sum of deviations of sample values from a specified target value [Montgomery (2005, p. 388)]. An EWMA control chart statistic is a weighted average of all previous sample means, with the weights declining geometrically [Montgomery (2005, p. 406)]. 42 The control chart statistic of an MA chart is a simple unweighted average of a specified number of the most recent observations [Montgomery (2005, p. 417)]. Because they accumulate information over time, CUSUM, EWMA, and MA charts detect small shifts in a process more effectively than X and X charts, but are slower to respond to large shifts and have less ability to detect single outliers. 
Furthermore, these charts are based on an implicit assumption that the most recent observations are the most important. This assumption may not be reasonable in Phase I when the sample size is fixed and new observations are not being added. Consistent with this perspective, Montgomery (2005, p. 386) characterizes CUSUM and EWMA control charts as "excellent alternatives to the Shewhart control chart for Phase II process monitoring situations." 3.2 Design of the MMR Chart The chart implemented in this research is the multivariate analog of Jones-Farmer et al.'s (2009) Phase I mean-rank chart, which was designed as a distribution-free method of identifying an IC reference sample for a univariate process with subgrouped data. The mean-rank chart is similar in construct to the X chart for univariate subgrouped data, but it uses the standardized average subgroup rank rather than the average of raw subgroup data values as a control statistic. The use of ranks rather than actual data values renders the method distribution free, since the distribution of ranks is the same regardless of the underlying distribution of a univariate process. The mean-rank chart's IC and OC performance was shown to be comparable to the traditional X chart when a univariate process is normally distributed, and better than the X chart in many scenarios when a univariate process follows a heavy-tailed or skewed distribution. 43 It will be shown that the mean-rank chart of Jones-Farmer et al. (2009) performs similarly well when adapted for use with ranked data depth values corresponding to a multivariate process. The mean-rank chart modified for use with data depth values from a multivariate process will be hereafter referred to as the multivariate mean-rank (MMR) chart. Like the mean-rank chart, the MMR chart will monitor standardized average subgroup ranks which follow the same distribution regardless of the underlying distribution of a multivariate process, so it too will be distribution free when a process is IC. In general, any continuous process consisting of two or more correlated variables, usually but not always representing quality characteristics, in which data are subgrouped by design or can be rationally subgrouped, could potentially benefit from the MMR chart proposed by this research. Since most existing multivariate Phase I methods rely on the assumption of a multivariate normally distributed process, the MMR chart will be particularly useful when the process under study is clearly nonnormal or lacks sufficient history to verify an assumption of normality. In addition, because the MMR chart is computationally inexpensive, it will be especially useful for processes consisting of a large number of variables. Example applications of the MMR chart include, but are not limited to industrial (e.g. chemical, power, mining, steel, petroleum, pharmaceutical, electronics, textile, polymer, and automotive), healthcare (e.g. clinical trials and patient satisfaction), military (e.g. weapons development, combat operations, and soldier performance), and service organizations (e.g. finance, marketing, and customer support). An example military application of the MMR chart, and the one which inspired this author's interest in quality control, is charting the progress of combat operations in Iraq. 
This problem rose to the forefront of the military operations research community in early 2007, when 44 the President of the United States ordered the deployment of approximately 40,000 additional American troops (known as "The Surge") to reverse a trend of escalating violence in Iraq. Because the troop increase was politically polarizing and therefore closely scrutinized by the United States Congress, it was imperative that an accurate method of assessing its effectiveness be emplaced. Military analysts thus faced a two-fold problem -- determining a historical data set reflecting "normal" violence levels in Iraq and implementing an appropriate method of prospectively monitoring future violence levels during "The Surge." In hindsight, the difficult problem of determining a historical data set would have been a prime opportunity for application of the MMR chart. First of all, the overall level of violence in Iraq was measured by several correlated variables related to the performance of the US-led coalition and Iraqi security forces, the terrorist actions of various insurgent groups in Iraq, and the safety of the Iraqi civilian populace. In addition, early data on violence levels was extremely volatile and highly skewed due to Iraq's troubled history as well as immature and often inaccurate reporting procedures. Furthermore, data were collected daily but aggregated into weekly subgroups to account for differences in the pace of combat operations on different days of the week. In this situation, the MMR chart would have been a useful tool to establish an IC reference sample against which future weekly violence levels during "The Surge" could have been compared using a Phase II multivariate control chart. An all-inclusive list of potential applications for the MMR chart is not possible, but it is the opinion of this author that it has the potential to serve as a valuable analytical tool for a wide range of organizations in diverse settings. Its ease of execution and flexibility in solving the distribution-free Phase I multivariate quality control charting problem for subgrouped data fills 45 in many of the existing gaps in current literature, thus providing a useful methodology for researchers and practitioners alike. 3.2.1 The MMR Control Chart Statistic Consider a reference sample consisting of m subgroups of size n from a p-dimensional multivariate process in which all variables are continuous. Let the random vector Xij represent the 1 x p row vector containing the jth observation from the ith subgroup. Treating the observations from the m mutually independent samples of size n as a single sample of size xN n m? as described by Jones-Farmer et al. (2009) and attributed to Kruskal and Wallis (1952), a data depth function is applied to each Xij, resulting in a corresponding depth value ? ?;,ij NDFX where NF denotes the empirical distribution function of the pooled reference sample. Next, integer ranks Rij = 1, 2,..., N are assigned to each ? ?;ij NDFX in the pooled sample of size N, beginning with the largest ? ?;ij NDFX and continuing in descending order. In other words, Rij denotes the rank of ? ?;ij NDFX when compared to all other depth values in the pooled sample of size N, with the largest ? ?;ij NDFX receiving rank 1 and the smallest receiving rank N. When the process is IC, the mean of the random variable Rij is ? ? 12 ij NER ?? and the variance is ? ? ? ?? ?1112 ij NNV a r R ??? [Jones-Farmer et al. (2009, p. 306)]. 
In the event of a tie, the midrank method is used as a correction without affecting the mean and variance of the random variable R_ij [Jones-Farmer et al. (2009, p. 306)]. According to the midrank method, each tied depth value receives the average of the ranks they would receive if the ties were broken [Lehman (2006, p. 18)]. For example, suppose the four depth values {0.93, 0.67, 0.67, 0.22} are to be ranked in descending order. It is clear that the largest depth value (0.93) should be assigned rank 1 and the smallest depth value (0.22) rank 4, but the assignment of ranks 2 and 3 to the equivalent depth values (0.67, 0.67) is ambiguous. In order to preserve the equality of these two depth values in terms of their ranks, they will both be assigned the average of the middle two ranks. In this example, the duplicate depth values will both be assigned rank = (2 + 3)/2 = 2.5. Thus, the set of ranks corresponding to the four depth values is {1, 2.5, 2.5, 4}.

Now consider the average of the ranks in each subgroup i, denoted by

\bar{R}_i = \frac{1}{n} \sum_{j=1}^{n} R_{ij}.   (3.2.1)

If a process is IC, the ranks should be distributed evenly throughout the m subgroups, resulting in approximately equal \bar{R}_i for each subgroup. For an IC process, the mean and variance of \bar{R}_i are, respectively [Bakir (1989, pp. 764-765)]:

E(\bar{R}_i) = \frac{N + 1}{2}   (3.2.2)

Var(\bar{R}_i) = \frac{(N - n)(N + 1)}{12n}.   (3.2.3)

Invoking the central limit theorem, the random variable representing the standardized subgroup mean rank,

Z_i = \frac{\bar{R}_i - E(\bar{R}_i)}{\sqrt{Var(\bar{R}_i)}},   (3.2.4)

follows an approximate standard normal distribution when n is sufficiently large [Jones-Farmer et al. (2009, p. 306)], although small subgroup sizes (e.g. n = 4, 5, or 6) are more likely in most quality control applications [Montgomery (2005, p. 196)]. To create the MMR control chart for use in Phase I, the control statistic Z_i in Equation (3.2.4) is plotted for each of the m subgroups.
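To make the construction concrete, the following minimal MATLAB sketch computes the control statistic of Equations (3.2.1)-(3.2.4) for a subgrouped data matrix. Classical Mahalanobis depth is used here purely as a convenient stand-in for the depth functions of Chapter 2, and the function name and interface are illustrative assumptions rather than the Appendix code; tiedrank (Statistics Toolbox) supplies the midrank correction.

    function Z = mmr_statistic(X, m, n)
    % Minimal sketch of Equations (3.2.1)-(3.2.4). X is an (m*n) x p matrix
    % whose rows are ordered by subgroup (rows 1..n form subgroup 1, and so on).
    N = m * n;
    mu = mean(X, 1);                         % pooled location estimate
    S  = cov(X);                             % pooled scatter estimate
    Xc = X - mu;
    d2 = sum((Xc / S) .* Xc, 2);             % squared Mahalanobis distances
    depth = 1 ./ (1 + d2);                   % depth values for the pooled sample
    R = N + 1 - tiedrank(depth);             % descending ranks: deepest point gets rank 1
    Rbar = mean(reshape(R, n, m), 1)';       % subgroup mean ranks, Equation (3.2.1)
    ER = (N + 1) / 2;                        % Equation (3.2.2)
    VR = (N - n) * (N + 1) / (12 * n);       % Equation (3.2.3)
    Z = (Rbar - ER) ./ sqrt(VR);             % Equation (3.2.4)
    end

With m = 50 subgroups of size n = 5, for example, any entry of Z exceeding the empirical upper control limit of 2.702 reported in Table 3.2.1 (FAP = 0.10) would flag the corresponding subgroup for further investigation.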
3.2.2 Empirical Control Limits for the MMR Chart

As opposed to both lower and upper control limits required for the univariate mean-rank chart of Jones-Farmer et al. (2009), the MMR chart has only an upper control limit. This is because with the MMR chart, observations are ranked based on data depth values rather than raw data values. An extremely negative control chart statistic Z_i occurs when a subgroup consists of observations having extremely high depth values and correspondingly low ranks. This indicates near-perfect centrality with respect to the p-dimensional data cloud, and is therefore no cause for concern. Conversely, an extremely positive control chart statistic Z_i is realized when a subgroup of observations is located far away from the center of the p-dimensional data cloud, resulting in extremely low depth values and correspondingly high ranks. Such a subgroup indicates a potential OC condition which requires further investigation.

For each m, n combination of interest, Monte Carlo simulation of the empirical joint distribution of the standardized subgroup mean rank was used to determine the MMR chart upper control limits in Table 3.2.1. Recall that the joint distribution is required because successive comparisons of control chart statistics to control limits are dependent in Phase I. Limits are tabled for a maximum overall IC FAP of 0.10, where the FAP is the probability that the Phase I chart with m subgroups of size n signals at least once when the process is IC. Due to the discrete nature of the mean-rank distribution as well as simulation noise, simulated FAP values do not precisely match the desired FAP values. Conservative limits were chosen in order to ensure the simulated FAP came as close as possible to the desired FAP without exceeding it. A more comprehensive table of limits for various combinations of m, n, and FAP is provided in Appendix C, and MATLAB code for simulating additional limits is provided in Appendix D.

Table 3.2.1 Empirical Control Limits for the MMR Chart (desired FAP = 0.10)

m      n     UCL      Simulated FAP
20     5     2.476    0.0941
50     5     2.702    0.0983
100    5     2.854    0.0982
150    5     2.932    0.0983
200    5     2.985    0.0981

The general construct of the simulation algorithm is as follows:
1) Establish a trial UCL to attain the desired overall IC FAP for a given (m, n) combination.
2) Simulate N = m x n random numbers from a Uniform(0, 1) distribution. Assign each number a rank from largest (rank = 1) to smallest (rank = N). Divide the resulting ranks into m subgroups of size n.
3) Compute the average rank \bar{R}_i for each subgroup. Determine the corresponding standardized subgroup mean rank Z_i.
4) Compare each of the m standardized subgroup mean ranks, Z_i, i = 1, ..., m, to the trial UCL. Increment a counter by one if any Z_i exceeds the UCL.
5) Repeat steps 2 - 4 a total of 100,000 times.
6) Determine the empirical FAP = (final counter value)/100,000.
7) If the empirical FAP exceeds the desired FAP, increase the UCL. If the empirical FAP is lower than the desired FAP, decrease the UCL.
8) Reset the counter to zero.
9) Repeat steps 2 - 8 until the desired overall IC FAP is achieved.
10) Record m, n, the desired FAP, the empirical FAP, and the UCL.
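The following MATLAB sketch is one way to implement the search just outlined. It is a simplified, assumed version rather than the Appendix D code, using a bisection bracket for the trial UCL and treating the IC depth values as exchangeable Uniform(0, 1) draws as in step 2 (reduce reps for a quick check).

    m = 50; n = 5; N = m * n;
    reps = 100000; targetFAP = 0.10;         % step 5 of the algorithm above
    lo = 2.0; hi = 3.5;                      % assumed bisection bracket for the UCL
    ER = (N + 1) / 2;  VR = (N - n) * (N + 1) / (12 * n);
    for iter = 1:20                          % refine the trial UCL (steps 7 - 9)
        ucl = (lo + hi) / 2;
        alarms = 0;
        for r = 1:reps                       % steps 2 - 6 for the current trial UCL
            u = rand(1, N);                  % IC depth-value surrogates
            rk = zeros(1, N);
            [~, idx] = sort(u, 'descend');   % rank 1 = largest value
            rk(idx) = 1:N;
            Rbar = mean(reshape(rk, n, m), 1);
            Z = (Rbar - ER) / sqrt(VR);
            alarms = alarms + any(Z > ucl);
        end
        if alarms / reps > targetFAP, lo = ucl; else, hi = ucl; end
    end
    ucl                                      % conservative empirical UCL, near 2.702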
3.2.3 Analytical Control Limits for the MMR Chart

Prior to simulating empirical limits for the MMR chart, analytical control limits were attempted using the joint distribution of the standardized mean ranks. As reported by Jones-Farmer et al. (2009), the central limit theorem suggests that the individual standardized mean ranks follow a standard normal distribution for sufficiently large subgroup size n. From Bakir (1989), the joint distribution of the standardized mean ranks is asymptotically multivariate normal with correlation matrix

R_{m \times m} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1m} \\ \rho_{21} & 1 & \cdots & \rho_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{m1} & \rho_{m2} & \cdots & 1 \end{pmatrix},

where \rho_{ij} = -1/(m - 1) when subgroup sizes are equal. Using a zero mean vector and the correlation structure given by R_{m \times m}, asymptotic control limits for the MMR chart were numerically determined through a modification of Genz' (2011) MATLAB algorithm for evaluating the multivariate normal distribution. Control limits were computed to achieve a maximum IC FAP of 0.10.

Next, the IC performance of the multivariate normal theory control limits was evaluated by simulating 10,000 applications of the MMR chart using robust Mahalanobis depth to IC bivariate normally distributed data with zero mean vector and identity covariance matrix, without loss of generality. Multivariate normal theory control limits and corresponding empirical IC FAPs for m = 20, 50(50)200 subgroups of size n = 5(5)20 are recorded in Table 3.2.2.

Table 3.2.2 Simulated IC FAPs Using Normal Theory Limits (desired FAP = 0.10)

m      n     MVN UCL    Simulated FAP
20     5     2.565      0.0752
20     10    2.565      0.0822
20     15    2.565      0.0881
20     20    2.565      0.0873
50     5     2.865      0.0485
50     10    2.865      0.0766
50     15    2.865      0.0889
50     20    2.865      0.0861
100    5     3.077      0.0296
100    10    3.077      0.0692
100    15    3.077      0.0831
100    20    3.077      0.0837
150    5     3.195      0.0199
150    10    3.195      0.0602
150    15    3.195      0.0744
150    20    3.195      0.0789
200    5     3.277      0.0131
200    10    3.277      0.0567
200    15    3.277      0.0725
200    20    3.277      0.0751

Multivariate normal theory control limits produced empirical IC FAPs which are close to the desired IC FAP of 0.10 for large n but unacceptably low for small n. This is because small subgroup sizes n are insufficient to ensure the individual standardized mean ranks Z_i follow a standard normal distribution in accordance with the central limit theorem, thus preventing the joint distribution of the standardized mean ranks from achieving asymptotic multivariate normality. This can be seen graphically in Figure 3.2.1, which depicts Q-Q plots of simulated standardized mean ranks for m = 50 and n = 5(5)20. The individual Q-Q plots show a clear departure from normality when m = 50 and n = 5 (top left), and increasing normality as n is raised to 20 (bottom right).

Figure 3.2.1 Q-Q Plots of Z_i for m = 50, n = 5(5)20

Table 3.2.2 also illustrates that MMR chart performance using multivariate normal theory control limits worsens with increasing m. This is easily understood if a Phase I analysis is viewed as the partitioning of a desired overall IC FAP among m simultaneous individual comparisons of control chart statistics to an UCL. A larger m means that a smaller portion of the overall IC FAP is allocated to each of the m individual comparisons. This can be visualized as the UCL being pushed progressively farther into the upper tail of the standard normal distribution of each individual control chart statistic. As this happens, the effects of any departures of the distribution of the individual control chart statistic from standard normality will be exacerbated. This in turn will lead to undesired empirical FAPs for the MMR chart using multivariate normal theory control limits. Multivariate normal theory control limits could be used to provide conservative limits for a very small number of subgroups or very large subgroup sizes, but empirical control limits are much more consistent in maintaining the desired IC FAP for the range of m and n considered in this research.
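For readers who wish to reproduce the diagnostic behind Figure 3.2.1, the short MATLAB sketch below (an assumed illustration, not the simulation code of the appendices) generates IC standardized mean ranks for m = 50 and n = 5 and compares them to the standard normal with qqplot from the Statistics Toolbox; increasing n toward 20 pulls the points onto the reference line.

    m = 50; n = 5; N = m * n; reps = 200;    % pooled size and number of replications
    ER = (N + 1) / 2;  VR = (N - n) * (N + 1) / (12 * n);
    Z = zeros(m * reps, 1);
    for r = 1:reps
        u = rand(1, N);
        rk = zeros(1, N);
        [~, idx] = sort(u, 'descend');
        rk(idx) = 1:N;
        Rbar = mean(reshape(rk, n, m), 1);
        Z((r - 1) * m + (1:m)) = (Rbar - ER) / sqrt(VR);
    end
    qqplot(Z)                                % compare with Figure 3.2.1, top left panel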
An alternative to the "one size fits all" multivariate normal theory control limits for the MMR chart is to enumerate the distribution of the standardized mean rank for each combination of number of subgroups m and subgroup size n, and use this information to derive the corresponding joint distribution of the standardized mean ranks. However, this method is clearly impractical for the number of subgroups considered in this research, again supporting the use of empirically determined control limits for the MMR chart. 3.3 Example Application of the MMR Chart In order to fully understand the workings of an MMR chart, a simple example is provided. Consider the first subgroup of a bivariate process consisting of m = 50 subgroups of size n = 5 from an unknown distribution F. Let the random vector Xij represent the 1 x 2 row vector containing the jth observation from the ith subgroup, where i = 1 and j = 1 - 5. The data, along with corresponding robust Mahalanobis depth values and ranks, are listed in Table 3.3.1. 53 i j Xij RMD(Xij;F250) Rij 1 1 5.1880 2.4570 0.3311 197 1 2 0.7332 4.7681 0.2904 218 1 3 3.3695 4.3434 0.4533 127 1 4 4.5465 4.7078 0.3258 201 1 5 3.0102 3.8656 0.5677 61 Table 3.3.1 MMR Chart Data for the First Subgroup of a Bivariate Process Note that Rij reflects rankings with respect to the pooled reference sample of size N = 250. The average of the ranks in the first subgroup is ? ? 1 1 9 7 2 1 8 1 2 7 2 0 1 6 1 1 6 0 . 8 0 .5R ? ? ? ??? Using Equations (3.2.2) and (3.2.3), ? ? 1 2 5 0 1 1 2 5 . 5 022 i NER ??? ? ? and ? ? ? ? ? ? ? ? ? ?? ?1 2 5 0 5 2 5 0 1 1 0 2 4 . 9 2 .1 2 1 2 5i N n NV a r R n? ? ? ?? ? ? Using Equation (3.2.4), the standardized mean rank for the first subgroup is ? ?? ?1 1 1 6 0 . 8 0 1 2 5 . 5 0 1 . 1 0 3 . 1 0 2 4 . 9 2ii R E RZ V a r R ? ?? ? ? Given a desired IC FAP of 0.10, the MMR chart UCL for m = 50, n = 5 is found from Table 3.2.1 to be 2.702. Since Z1 is less than 2.702, it is concluded that the first subgroup is IC. In order to complete the MMR chart, this process is repeated for subgroups i = 2 - 50. Any Zi exceeding the UCL will have its corresponding subgroup Xi. removed from the sample if no assignable cause is found, thus establishing the IC reference sample for use in Phase II. Using the control limits in Table 3.2.1, the next step is to compare the performance of the MMR chart using both robust Mahalanobis depth and Mahalanobis spatial depth to the best multivariate parametric Phase I alternative. All control charts will be tested on normal, heavy- tailed, and skewed multivariate data, with both isolated and sustained shifts of the mean. Details concerning the testing and evaluation process are provided in Chapter 4. 54 4 MMR Chart Performance Assessment Methodology 4.1 Introduction To assess the effectiveness of the MMR chart as a distribution-free method of establishing an IC reference sample, its performance will be compared to an equivalent Phase I parametric multivariate method. If there were any other multivariate nonparametric or distribution-free Phase I methods in existence, they would also yield useful comparisons. However, the MMR chart appears to be the first in this class of control charts. Because the MMR chart is a Shewhart-type chart, it must naturally be compared to another Shewhart-type chart for subgrouped multivariate data. From the literature review in Chapter 1, there is no clear consensus on the preferred Phase I parametric method. 
Because the original Hotelling's T2 chart is the most common baseline performance measure for subsequently developed Phase I parametric multivariate methods, it will likewise be used as a basis of comparison for the distribution-free MMR chart.

4.2 Establishing Baseline Performance Using Hotelling's T2 Chart

Constructing Hotelling's T2 chart for a reference sample consisting of m subgroups of size n from a p-dimensional multivariate process requires first calculating unbiased estimates of the mean vector and covariance matrix. From Montgomery (2005, p. 495) the classical estimators are

\bar{\bar{X}} = \frac{1}{m} \sum_{i=1}^{m} \bar{X}_i   (4.2.1)

and

\bar{S} = \frac{1}{m} \sum_{i=1}^{m} S_i,   (4.2.2)

where \bar{\bar{X}} represents the average of the m subgroup mean vectors and \bar{S} represents the average of the m subgroup covariance matrices. Using these estimated parameters, the control statistic is computed as

T_i^2 = n \left( \bar{X}_i - \bar{\bar{X}} \right) \bar{S}^{-1} \left( \bar{X}_i - \bar{\bar{X}} \right)'.   (4.2.3)

The control statistic for each subgroup is compared to the Phase I UCL given by Alt's (1976) formula:

UCL_{T^2} = C(m, n, p) \, F_{\alpha, p, mn-m-p+1}, \quad \text{where} \quad C(m, n, p) = \frac{p(m - 1)(n - 1)}{mn - m - p + 1}.   (4.2.4)

In Equation (4.2.4) above, F_{\alpha, p, mn-m-p+1} represents the (1 - α)th percentile of the F distribution with p and (mn - m - p + 1) degrees of freedom, and α is the desired IC FAP for each individual subgroup. In order to achieve a desired overall IC FAP for all m subgroups in a reference data set, α must be set as follows:

\alpha = 1 - (1 - \alpha_{overall})^{1/m},   (4.2.5)

where α_overall is the desired overall IC FAP. For example, for a reference sample consisting of m = 50 subgroups and a desired overall IC FAP of 0.05, α = 1 - (1 - 0.05)^{1/50} = 0.001025 would be used in Equation (4.2.4) to determine the Phase I UCL.

Alt's (1976) formula given in Equation (4.2.4) was derived using the IC distribution of the T2 statistic given in Equation (4.2.3) under the assumption of multivariate normally distributed data. Therefore, it is not appropriate for use when the distribution of the data is nonnormal because it will not result in the desired IC FAP. Having a common baseline level of performance is essential to a valid comparison of OC performance among all charts considered, so control limits for Hotelling's T2 chart must be empirically adjusted when the data under study are nonnormally distributed. This will be accomplished using an algorithm similar to the one for determining MMR empirical control limits detailed in Chapter 3. Hotelling's T2 empirical control limits used in this research are provided in Appendix E, and the MATLAB code used to determine them is provided in Appendix F.

4.3 Simulating Symmetric and Skewed Process Distributions

The MMR and Hotelling's T2 charts will be tested on IC as well as mean-shifted data from normal, heavy-tailed, and skewed distributions with dimensions p = 2, 5, and 10. Due to affine equivariance of the mean vector and covariance matrix, multivariate normal data will be generated without loss of generality from the standard multivariate normal distribution, N_p(0, I), where 0 is a p-dimensional mean vector of all zeros and I is a p x p identity matrix. Heavy-tailed data will be represented by the multivariate t distribution, also using I_{p x p} as the covariance matrix. Variations of the multivariate t distribution will include both 10 and 3 degrees of freedom, corresponding to increasingly fatter tails. Finally, skewed data will come from a multivariate lognormal distribution, standardized to have zero mean vector and identity covariance matrix.
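As a small illustration of how the three process distributions just described can be generated, using standard Statistics Toolbox generators rather than the exact simulation code of Appendices G and H, consider the following MATLAB sketch; the sample size N, dimension p, and degrees of freedom are arbitrary illustrative choices.

    p = 2; N = 250; df = 3;                                   % illustrative choices
    Xnorm = mvnrnd(zeros(1, p), eye(p), N);                   % standard multivariate normal
    Xt    = mvtrnd(eye(p), df, N);                            % multivariate t(3), identity scale
    Y     = mvnrnd(zeros(1, p), eye(p), N);                   % lognormal via exp of a normal
    Xlog  = (exp(Y) - exp(0.5)) / sqrt(exp(1)*(exp(1) - 1));  % standardized lognormal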
The data will be simulated using MATLAB code from the MathWorks Statistics Toolbox at http://www.mathworks.com/help/toolbox/stats/. A summary of all planned experiments is illustrated in Table 4.3.1. Table 4.3.1 Summary of Planned Experiments 4.4 Evaluating In-Control Performance The MMR and Hotelling's T2 charts will first be evaluated based on their ability to maintain a desired IC FAP for subgrouped data from multivariate normal, multivariate t, and multivariate lognormal distributions. It is expected that only the MMR chart, because it is distribution-free, will be able to maintain the desired IC FAP across all combinations of sample and subgroup sizes. Furthermore, IC performance of the MMR chart should be invariant to the choice of depth function used. The algorithm for these simulations, which will be performed in MATLAB, is as follows: 1) Simulate m subgroups of size n from a p-dimensional normal, t, or lognormal distribution. n o s h i f t 2 5 10 2 2 5 10 2 5 i s ol a t e d s h i f t 2 5 10 2 2 5 10 2 5 5 / 15 / 30 % s u s t a i n e d s h i f t s 2 10 2 10 2 5 n o s h i f t 2 5 10 2 2 5 10 2 i s ol a t e d s h i f t 2 5 10 2 2 5 10 2 5 / 15 / 30 % s u s t a i n e d s h i f t s 2 10 2 10 2 n o s h i f t 2 2 2 5 2 5 i s ol a t e d s h i f t 2 2 2 5 2 5 5 / 15 / 30 % s u s t a i n e d s h i f t s 2 2 5 N u m b e r / S i z e o f S u b gr ou p s : m = 2 0, 5 0( 50 ) 20 0; n = 5 P r oc e s s D i s t r i b u t i on ( i n p = 2 , 5 , o r 1 0 D i m e n s i on s ) : n or m a l t ( 10 ) t ( 3) l og n or m a l H ot e l l i n g ' s T 2 C h a r t M M R - R M D C h a r t M M R - M S D C h a r t C on t r ol C h ar t S h i f t T yp e 58 2) Establish the UCL for the MMR or Hotelling's T2 chart. 3) Compute control chart statistics for each subgroup and compare to the UCL. If at least one control chart statistic exceeds the UCL, increment a counter by one. 4) Repeat steps 1 - 3 a total of 10,000 times. 5) Estimate the overall IC FAP = (final counter value)/10,000. This process will be repeated for all desired combinations of m, n, p, process distribution, and control chart. MATLAB algorithms for simulating IC performance for the MMR and Hotelling's T2 charts are provided in Appendix G and Appendix H, respectively. 4.5 Evaluating Out-of-Control Performance Next, the MMR and Hotelling's T charts will be evaluated in terms of their ability to detect isolated and sustained shifts of the mean. An isolated shift of the mean is defined as a location shift occurring in a single subgroup of size n. Because the probability of detection is independent of the location of a shift within a data set, isolated shifts will take place in the first subgroup of each simulated data set without loss of generality. A sustained shift of the mean is defined as a location shift occurring in a certain percentage of the pooled sample of size N. Sustained shift percentages tested will include 5%, 15%, and 30%, and will take place at the end of each data set. Sustained shifts could be induced anywhere in the data set without loss of generality, but being at the end is most logical since it is unlikely that a process would go from an OC state back to an IC state without outside intervention. The magnitude of the various shifts imposed will vary depending on the scenario being evaluated. This is because both the dimension of the data and the type of shift have a direct impact on the probability of a shift being detected. 
In general, all shifts are easier to detect in 59 lower dimensions than in higher dimensions, and sustained shifts are easier to detect than isolated shifts. The magnitude of a shift will be measured by the noncentrality parameter 1 ,? ? ?? ??? (4.5.1) where the process mean vector shifts from o? to o??? and ? is the process covariance matrix. Because the direction of a shift does not affect control chart performance with elliptically symmetric distributions, shifts will be fixed in the direction of ? ?1 1,0,...,0?e without loss of generality [Stoumbos and Sullivan (2002), p. 265]. Shift directions for skewed distributions will be discussed in Section 4.6. OC performance for a control chart will be quantified in terms of the empirical alarm probability (EAP), where EAP is defined as the estimated probability of a chart signaling at least once in an OC situation. Ideally, a control chart's EAP should be 100% for all scenarios involving induced location shifts. It is hoped that the MMR chart's performance will match that of Hotelling's T2 chart for normally distributed data and surpass the T2 chart's performance for nonnormally distributed data. The algorithm for simulating OC performance is slightly different than the IC case, and is detailed as follows: 1) Simulate m subgroups of size n from a p-dimensional normal, t, or lognormal distribution. 2) Add isolated or sustained location shifts to the desired subgroups. 3) Establish the UCL for the MMR or Hotelling's T2 chart. 4) Compute control chart statistics for each subgroup and compare to the UCL. If at least one control chart statistic exceeds the UCL, increment a counter by one. 5) Repeat steps 1 - 4 a total of 10,000 times. 60 6) Estimate the EAP = (final counter value)/10,000. This process will be repeated for all combinations of m, n, p, process distribution, shift type, and control chart. MATLAB algorithms for simulating OC performance for the MMR and Hotelling's T2 charts are also provided in Appendix G and Appendix H, respectively. 4.6 Evaluating Out-of-Control Performance with Skewed Data Control chart performance with skewed distributions will be assessed using multivariate lognormally distributed data, simulated using the transformational relationship between the multivariate normal and the multivariate lognormal distributions. A p-dimensional multivariate lognormal random vector X can be represented as ? ? 12, ,..., ,pYYYe e e?X where Y is multivariate normal ? ?N,p Y Y?? [Law and Kelton (2000), p. 382]. Applying this transformation using a multivariate normal random vector Y with mean vector ? ?12, ,...,Yp? ? ??? and covariance matrix Y? with ij? ?the (i,j)th entry, the resulting multivariate lognormal random vector X has the following properties [Law and Kelton (2000), p. 382]: ? ? ? ?/2i iiiE X e ???? (4.6.1) ? ? ? ? ? ?2 1i ii iiiV X e e?? ???? (4.6.2) ? ? ? ? 2, 1 .ii jjijijijC o v X X e e ????? ??????????? (4.6.3) Simulating multivariate lognormal observations is therefore simply a matter of generating ? ?12, ,..., pY Y Y?Y ~ ? ?N,p Y Y? ? and then evaluating ? ?12, ,..., .pYYYe e e?X Without loss of 61 generality, this research will use Y ~ ? ?,pN 0I to create multivariate lognormal data X having the following properties: ? ? 1/ 2 1 .6 4 8 7iE X e?? (4.6.4) ? ? ? ?1 4 .6 7 0 8iV X e e? ? ? (4.6.5) ? ?, 0.ijCov X X ? (4.6.6) In order to maintain consistency with other simulated distributions used in this research, the multivariate lognormal data X will be standardized using ? ? 1/ 2 ,i i X X???XX ? ? where X? 
Once the multivariate lognormal data are simulated, isolated and sustained shifts will be induced to evaluate OC performance. As noted by Stoumbos and Sullivan (2002, p. 265), while the direction of a shift has no effect on control chart performance with elliptically symmetric distributions, it can substantially affect a control chart's detection power with skewed distributions. One method of handling this is to focus on the shift direction having the most dramatic effect on control chart performance, but this is a difficult task because there are an infinite number of shift directions from which to choose in a multivariate setting [Stoumbos and Sullivan (2000), p. 267]. Even if the most impactful shift direction could be determined, its odds of occurring in practice are unknown. As pointed out by J. Sullivan (personal communication, February 2, 2011), the literature offers no guidance regarding the likelihood of particular shift directions occurring, so a better approach is to assume that all shift directions are equally probable. Under this assumption, as done by Stoumbos and Sullivan (2000), the effects of shift directions randomly generated over a uniform distribution will be averaged.

The shift directions will be generated using an algorithm proposed by Johnson (1987, p. 127), who stated that a p-dimensional shift direction can be created by first generating p independent standard normal random variates Z_1, Z_2, ..., Z_p. Next, the shift vector \delta, which follows a uniform distribution on the p-sphere, is computed using

    \delta_i = \frac{Z_i}{\sqrt{Z_1^2 + Z_2^2 + \cdots + Z_p^2}}, \quad i = 1, 2, \ldots, p.    (4.6.7)

A different \delta will be generated for each of the 10,000 iterations of the simulation, and the results will be averaged at the conclusion of the simulation. In two dimensions, this method of creating shift vectors is analogous to randomly generating a series of unit vectors which emanate from the origin and terminate on the boundary of the unit circle. As with elliptically symmetric distributions, the magnitude of the various shifts imposed will be measured by the noncentrality parameter given in Equation (4.5.1), where the multivariate lognormal process mean vector shifts from \mu_o to \mu_o + \delta and \Sigma is the asymptotic covariance matrix of the multivariate lognormal process. With \delta as defined by Equation (4.6.7) and \Sigma equal to the identity matrix, \lambda always equals one. In order to induce shifts corresponding to \lambda \neq 1, the shift vector \delta resulting from Equation (4.6.7) must be multiplied by the desired \lambda, thus shortening or lengthening the unit vector to achieve the desired \lambda. For example, suppose it is desired to induce a shift of size \lambda = 3 into a bivariate lognormal process with identity covariance matrix. Using Equation (4.6.7), a possible shift vector is \delta = (-0.7468, 0.6651)'. If this shift vector is applied directly to the process without any scaling, the magnitude of the resulting shift is \lambda = (\delta' \delta)^{1/2} = 1. However, using 3\delta = (-2.2404, 1.9953)' produces the desired result of \lambda = 3. This methodology will be employed for all simulations involving OC conditions in multivariate lognormally distributed data.
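A minimal MATLAB sketch of one pass of this shift-generation step is shown below, assuming an identity covariance matrix so that the noncentrality parameter reduces to the Euclidean length of the shift vector.

    % Sketch: randomly directed shift of noncentrality lambda (Equation 4.6.7),
    % assuming Sigma = I so that lambda = sqrt(delta' * delta).
    p = 2;  lambda = 3;
    Z = randn(p, 1);                 % p independent standard normal variates
    delta = Z / norm(Z);             % unit vector uniformly distributed on the p-sphere
    delta = lambda * delta;          % rescale so the induced shift has the desired lambda
    % delta' would then be added to each observation in the subgroups chosen to be OC,
    % with a fresh delta drawn for each of the 10,000 simulation iterations.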
Once all simulations have been completed and the results analyzed, recommendations will be provided on how best to proceed in a Phase I multivariate quality control scenario when a process distribution is normal, heavy-tailed, or skewed.

5 MMR Chart Performance Comparisons

5.1 Introduction

MMR chart performance comparisons to Hotelling's T2 (HT2) chart were focused primarily on m = 20, 50(50)200 subgroups of size n = 5. The number of subgroups was chosen to be relatively small because a Phase I analysis often occurs early in the life of a process, when very little historical data are available. A subgroup size of five was chosen because Jones-Farmer et al. (2009) showed that this is the minimum subgroup size necessary for reliable univariate mean-rank chart performance, and further testing using the MMR chart confirmed this to be true in the multivariate case as well. Limited experimentation was conducted using subgroup sizes n = 5(5)20 in order to demonstrate the enhancing effect of larger subgroup sizes on MMR chart performance. In all simulations, the desired IC FAP was set to 0.10, but the results can be generalized to other common IC FAPs such as 0.05.

5.2 MMR Chart Performance with Symmetric Distributions

Symmetric distributions tested include the multivariate normal, t(10), and t(3) distributions. When evaluating IC performance of Hotelling's T2 chart, Alt's (1976) Phase I UCL was used for all process distributions. For OC assessments, Alt's (1976) Phase I UCL was used for the multivariate normal case only, and empirically adjusted UCLs were used for the t(10) and t(3) cases. RMD was the primary depth function used in the MMR chart because it is well suited for elliptically symmetric distributions and is one of the simplest depth functions to compute, but MSD was implemented in a few cases for comparison purposes. Simulation results show that when data are normally or nearly normally distributed, a normal-theory method such as Hotelling's T2 chart is preferred. However, when data are heavy-tailed, as with the t(3) distribution, the distribution-free MMR chart is usually a superior alternative.

5.2.1 In-Control Performance with Symmetric Distributions

The fundamental advantage of a distribution-free control chart is its ability to maintain a desired IC FAP for any process distribution. Accordingly, the MMR chart using both RMD and MSD was first compared to Hotelling's T2 chart using IC bivariate normal, t(10), and t(3) processes with a desired IC FAP of 0.10. For these comparisons, Hotelling's T2 chart was constructed using only Alt's (1976) Phase I UCL given by Equation (4.2.4), adjusted for the number of subgroups using Equation (4.2.5), in order to demonstrate the effects of applying a normal-theory method to both normally and nonnormally distributed data. As indicated in Figure 5.2.1, Hotelling's T2 chart maintains the desired IC FAP for the bivariate normal process, but becomes progressively worse as the distribution deviates from normality and the number of subgroups is increased. For a bivariate t(3) process, the IC FAP for Hotelling's T2 chart using Alt's (1976) Phase I UCL ranges from approximately 30% when m = 20 to over 90% when m = 200. This is why, for OC assessments with nonnormally distributed data, the UCL for Hotelling's T2 chart must be empirically tailored to achieve the desired IC FAP of 0.10 for each (m, n) combination and process distribution studied.
Although this is impracticable outside of a simulation environment because it requires knowing the exact process distribution, it is necessary in order to ensure a common basis of comparison for all charts included in the OC performance comparisons. The MMR chart, on the other hand, consistently maintains the desired IC FAP for all process distributions and any number of subgroups. This holds true regardless of the data depth measure used, so no adjustments to the MMR chart UCLs given in Table 3.2.1 are necessary.

Figure 5.2.1 Empirical IC FAPs for Symmetric Bivariate Distributions
[Three panels (bivariate normal, t(10), and t(3) processes) plotting empirical FAP versus (m, n) for the HT2, RMD, and MSD charts.]

Figure 5.2.2 shows the effects of dimensionality on control chart performance using a t(3) process. Again, the MMR chart consistently maintains the desired IC FAP for any number of subgroups m and any dimension p. Hotelling's T2 chart becomes distinctly worse in higher dimensions, reaching empirical IC FAPs near 100% for all but the smallest number of subgroups considered when p = 10. These results show that the MMR chart is distribution-free in any dimension when applied to elliptically symmetric data using RMD, MSD, or presumably any other depth function with similar statistical properties. A complete table of IC performance data for symmetric distributions is provided in Appendix I.

Figure 5.2.2 Empirical IC FAPs for t(3) Processes in Higher Dimensions
[Three panels (t(3) processes in p = 2, 5, and 10 dimensions) plotting empirical FAP versus (m, n) for the HT2, RMD, and MSD charts.]

5.2.2 Isolated Shifts of the Mean with Symmetric Distributions

MMR-RMD and Hotelling's T2 chart performance for isolated shifts in two dimensions for (m, n) combinations (20, 5), (100, 5), and (200, 5) is shown in Figure 5.2.3. Hotelling's T2 chart using Alt's (1976) Phase I limits is superior in the case of bivariate normally distributed data, as expected. For slightly nonnormal data following a bivariate t(10) distribution, Hotelling's T2 chart with empirically adjusted UCL maintains a smaller but still notable advantage over the MMR-RMD chart. For heavy-tailed process data following a bivariate t(3) distribution, depicted in the bottom panel of Figure 5.2.3, however, the MMR-RMD chart is both significantly better and much more consistent than Hotelling's T2 chart in terms of EAP. The two control charts are roughly equivalent when m = 20, but the performance of Hotelling's T2 chart declines dramatically as m is increased to 200, whereas MMR chart performance is far less affected when the number of subgroups is increased. For example, in the case of bivariate t(3) data with an isolated shift of magnitude λ = 6, EAPs for m = 20, 100, and 200 are approximately 100% using the MMR-RMD chart, as compared to 100%, 92%, and 46%, respectively, for Hotelling's T2 chart with empirical UCL.
Figure 5.2.3 Control Chart Performance on Symmetric Bivariate Data with an IS
[Three panels (bivariate normal, t(10), and t(3) processes with an isolated shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

MMR chart performance for isolated shifts is relatively invariant to the choice of depth function. Figure 5.2.4 shows the application of an MMR chart using both RMD and MSD to a bivariate t(3) process with (m, n) combinations (20, 5) and (200, 5). The MMR-RMD chart has a slight advantage over the MMR-MSD chart for m = 20 subgroups, but the two charts are nearly identical in terms of EAP when m = 200. Repeating this analysis using other symmetric distributions yielded similar results in both two and five dimensions.

Figure 5.2.4 MMR-RMD/MSD Chart Performance on t(3) Data with an IS
[One panel (bivariate t(3) process with an isolated shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and MSD charts at (m, n) = (20, 5) and (200, 5).]

The MMR chart loses some power to detect isolated shifts of the mean as the dimension of the data is increased, but it retains clear superiority over Hotelling's T2 chart in most scenarios considered. When applied to a heavy-tailed process represented by the t(3) distribution, the MMR-RMD chart is a substantially better alternative for m ≥ 50 in five dimensions and for m ≥ 100 in ten dimensions. This is illustrated in Figure 5.2.5.

Figure 5.2.5 Control Chart Performance on t(3) Data with an IS in Higher Dimensions
[Two panels (t(3) processes with an isolated shift in p = 5 and p = 10) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts.]

Complete tables of results for all simulations performed using symmetric distributions with isolated shifts of the mean are provided in Appendices J - L.

5.2.3 Sustained Shifts of the Mean with Symmetric Distributions

The MMR-RMD chart is generally superior to Hotelling's T2 chart in detecting sustained shifts of the mean in a bivariate t(3) process, although some loss of power is observed as the level of contamination in the sample is increased. Figure 5.2.6 depicts control chart performance for sustained mean shifts composing 5%, 15%, and 30% of the total data sets. For a 5% contamination level, MMR-RMD chart performance matches Hotelling's T2 chart performance for m = 20 and surpasses it by an increasing margin as m is increased from 50 to 200. Similar trends are observed for a 15% level of contamination, but m ≥ 50 subgroups are necessary for MMR-RMD chart performance to exceed that of Hotelling's T2 chart. When the level of contamination is raised to 30%, m ≥ 150 subgroups are necessary for the MMR-RMD chart to consistently outperform Hotelling's T2 chart.
For each sustained shift scenario considered, MMR-RMD chart performance is remarkably consistent when at least 50 subgroups are present. For example, in the 15% sustained shift scenario depicted in the middle panel of Figure 5.2.6, the lines representing MMR-RMD chart performance for m = 50, 100, and 200 subgroups are nearly coincident. On the other hand, Hotelling's T2 chart performance declines rapidly as the number of subgroups is increased. However, the fact that the overall detection power of the MMR chart declines as the level of contamination is raised from 5% to 30% is counterintuitive, as one would expect the opposite to hold true. This is shown in Section 5.5 to be an unavoidable consequence of a rank-based control charting method.

Figure 5.2.6 Control Chart Performance on Increasingly Contaminated Bivariate t(3) Data
[Three panels (bivariate t(3) processes with 5%, 15%, and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at selected (m, n) combinations.]

For the MMR chart, RMD is a more effective depth measure than MSD in the presence of a sustained mean shift in a bivariate t(3) process. MMR-MSD chart detection power lags only slightly behind MMR-RMD chart performance under a 5% contamination level, but falls farther behind when the contamination is increased to 15% and becomes unacceptably low at the 30% contamination level. This effect is illustrated in Figure 5.2.7. Based on these results, RMD is clearly the preferred depth measure for the MMR chart when data are elliptically symmetric.

Figure 5.2.7 MMR-RMD/MSD Chart Performance on Bivariate t(3) Data with a 30% SS
[One panel (bivariate t(3) process with a 30% sustained shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and MSD charts at (m, n) = (20, 5) and (200, 5).]

As with isolated shifts of the mean, the MMR chart's ability to detect sustained shifts of the mean is somewhat degraded as the dimension of the data is increased. In the ten-dimensional t(3) process with a 15% sustained mean shift shown in Figure 5.2.8, the MMR-RMD chart matches or exceeds Hotelling's T2 chart performance for m ≥ 100, in contrast to the m ≥ 50 required for the bivariate t(3) case depicted in the middle panel of Figure 5.2.6. Similar results are seen with 5% and 30% sustained shifts of the mean imposed upon a ten-dimensional t(3) process.

Figure 5.2.8 Control Chart Performance on t(3) Data with a 15% SS in p = 10
[One panel (t(3) process in p = 10 with a 15% sustained shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at (m, n) = (100, 5), (150, 5), and (200, 5).]

Complete tables of results for all simulations performed using symmetric distributions with sustained shifts of the mean are provided in Appendices M - R. In addition, a matrix of recommended control chart usage with heavy-tailed multivariate data under both isolated and sustained shifts of the mean is provided in Table 5.2.1. Although Hotelling's T2 chart outperforms the MMR-RMD chart for all scenarios in which m ≤ 50, n = 5, and p = 10, even m = 50 subgroups of size n = 5 would be considered an exceptionally small reference sample for a ten-dimensional process.
The MMR-RMD chart is a better alternative than Hotelling's T2 chart for most of the more realistic scenarios considered in ten dimensions. Furthermore, for any scenario in which Hotelling's T2 chart outperforms the distribution-free MMR chart, it should be reiterated that its implementation requires empirically adjusted UCLs based on the exact distribution of the process under study. Since the process distribution is unlikely to be known in practice, another control charting technique must be sought for the scenarios in Table 5.2.1 labeled "HT2."

Table 5.2.1 Recommended Phase I Control Chart Usage for Heavy-Tailed Data
[Recommended chart (MMR-RMD or HT2) by shift type (IS, 5% SS, 15% SS, 30% SS) and (m, n) combination from (20, 5) through (200, 5), shown separately for p = 2 and p = 10.]

5.3 MMR Chart Performance with Skewed Data

The multivariate lognormal distribution was the lone skewed distribution tested. As with the symmetric distributions evaluated, Hotelling's T2 chart was used in conjunction with Alt's (1976) Phase I UCL for the IC case and with empirically adjusted UCLs for the OC scenarios. Most MMR charts were created using MSD, since MSD was expected to outperform RMD on skewed process data. Performance comparisons were focused on m = 20, 100, and 200 subgroups and dimensions p = 2 and 5 because MSD, although quickly computable for a single data set, is considerably more time consuming than RMD when performing 10,000 replications. Simulation results show that when data are skewed, the distribution-free MMR chart almost always represents the best available control charting methodology.

5.3.1 In-Control Performance with Skewed Data

In order to validate its performance as a distribution-free method when process data are skewed, the MMR and Hotelling's T2 charts were first applied to IC lognormal processes in both two and five dimensions using a desired IC FAP of 0.10. As with the symmetric distributions tested, Hotelling's T2 chart was constructed using Alt's (1976) Phase I UCL given by Equation (4.2.4), adjusted for the number of subgroups using Equation (4.2.5), in order to demonstrate the negative consequences of applying a normal-theory method to skewed data. The results of the IC performance analysis are illustrated in Figure 5.3.1.

Figure 5.3.1 Empirical IC FAPs for Lognormal Processes in p = 2 and p = 5
[Two panels (bivariate and five-dimensional lognormal processes) plotting empirical FAP versus (m, n); the bivariate panel shows the HT2, RMD, and MSD charts, and the five-dimensional panel shows the HT2 and MSD charts.]

Hotelling's T2 chart using Alt's (1976) Phase I UCL categorically fails to maintain the desired IC FAP for multivariate lognormal processes. The IC FAP for Hotelling's T2 chart ranges from approximately 43% to 99% in two dimensions and from approximately 48% to 100% in five dimensions. In contrast, the MMR charts using RMD and MSD with the UCLs from Table 3.2.1 consistently maintain the desired IC FAP of 0.10 for all (m, n) combinations considered, solidifying the MMR chart's characterization as a distribution-free method. A complete table of IC performance data for skewed processes is provided in Appendix S.
5.3.2 Isolated Shifts of the Mean with Skewed Data

The performance of the MMR-MSD and Hotelling's T2 charts under isolated shifts of the mean in bivariate lognormally distributed data is displayed in Figure 5.3.2. Even with UCLs empirically adjusted to achieve an IC FAP of 0.10, Hotelling's T2 chart performance deteriorates rapidly for m > 20. The MMR chart not only outperforms Hotelling's T2 chart by a wide margin, but its performance is extremely consistent for all m.

Figure 5.3.2 Control Chart Performance on Bivariate Lognormal Data with an IS
[One panel (bivariate lognormal process with an isolated shift) plotting empirical alarm probability versus the noncentrality parameter for the MSD and HT2 charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

In Figure 5.3.3, the previous scenario is repeated for m = 100, n = 5 using the MMR-RMD chart in order to compare the performance of MSD and RMD as depth functions. As originally hypothesized, the MMR-MSD chart detects smaller isolated shifts with higher probability than the MMR-RMD chart, and offers equivalent performance in the case of larger shifts.

Figure 5.3.3 MMR-MSD/RMD Chart Performance on Bivariate LGN Data with an IS
[One panel (bivariate lognormal process with an isolated shift) plotting empirical alarm probability versus the noncentrality parameter for the MSD and RMD charts at (m, n) = (100, 5).]

Although the MMR chart's gradual loss in power with symmetric distributions in higher dimensions is also observed with skewed data, it remains notably better than Hotelling's T2 chart. In the five-dimensional scenario depicted in Figure 5.3.4, the MMR-MSD chart matches Hotelling's T2 chart performance for m = 20 and dominates for m > 20, making it clearly the best alternative for detecting isolated shifts occurring in skewed process data when p ≤ 5.

Figure 5.3.4 Control Chart Performance on LGN Data with an IS in p = 5
[One panel (lognormal process in p = 5 with an isolated shift) plotting empirical alarm probability versus the noncentrality parameter for the MSD and HT2 charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

Complete tables of results for all simulations performed using the multivariate lognormal distribution with isolated shifts of the mean are provided in Appendices T and U.

5.3.3 Sustained Shifts of the Mean with Skewed Data

MMR-MSD chart performance in detecting sustained shifts of the mean in skewed bivariate data varies greatly with the percentage of data shifted. As shown in Figure 5.3.5, the MMR-MSD chart is universally more powerful than Hotelling's T2 chart in detecting 5% and 15% sustained shifts in a bivariate lognormal process and demonstrates very consistent performance across the range of m considered. However, a different story is seen with a 30% sustained shift of the mean, as MMR-MSD chart performance falls to unacceptable levels.
Figure 5.3.5 Control Chart Performance on Increasingly Contaminated LGN Data in p = 2
[Three panels (bivariate lognormal processes with 5%, 15%, and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the MSD and HT2 charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

Further testing revealed that the MMR-MSD chart is robust to sustained shifts of the mean in skewed bivariate data with contamination levels up to approximately 20%. MMR-MSD chart performance for m = 100, n = 5 and contamination levels 5(5)30% is illustrated in Figure 5.3.6.

Figure 5.3.6 MMR-MSD Chart Performance on Increasingly Contaminated LGN Data
[One panel (bivariate lognormal process with 5%, 15%, 20%, 25%, and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the MMR-MSD chart at (m, n) = (100, 5).]

Surprisingly, this breakdown happens despite the fact that the multivariate median determined by the MSD function has an RBP equal to 1/2, indicating a high degree of robustness to outliers. However, as noted by R. Serfling (personal communication, June 6, 2011), the RBPs of the other quantiles determined by MSD decrease from the median outward. This means that when a high percentage of a data set is shifted, even though the center of the data is well estimated by MSD, the overall center-outward ordering may be adversely affected by outlying points. In order to demonstrate this, a simple example using bivariate lognormal data with a 30% randomly directed, sustained mean shift with λ = 4 is presented in Figure 5.3.7. For illustrative purposes, only 20 individual observations are simulated, so 14 points are IC and 6 are OC. In Figure 5.3.7, each bivariate data point is labeled with two numbers representing the ranks determined by the MSD function and the RMD function, respectively. Ideally, the IC points should receive the most central ranks, 1 through 14, and the OC points should receive the most outlying ranks, 15 through 20.

Figure 5.3.7 MSD and RMD Rankings for Bivariate LGN Data with a 30% SS
[Scatterplot of the 20 simulated points, with IC and OC points distinguished and each point labeled with its MSD rank and its RMD rank.]

This holds true for the ranks determined by the RMD function, but is clearly not the case with MSD. Close scrutiny of Figure 5.3.7 reveals that the center determined by the MSD function is relatively close to the center determined by the RMD function, but the similarities end there. The MSD function assigns its most outlying ranks to points along the outer limits of the entire data cloud, which consists of both IC and OC points. This disrupts the entire ranking scheme, resulting in several IC points being assigned ranks which suggest outlyingness, and several OC points receiving ranks which indicate centrality. For example, consider the point located at approximate coordinates (-0.75, -0.50).
This point is assigned a rank of 19 by the MSD function, which indicates a high degree of outlyingness, and a distinctly different rank of 11 by the RMD function, which strongly suggests that the point belongs to the IC cluster.

Further analysis was performed by simulating m = 200 subgroups of size n = 5 of bivariate lognormal data with both 5% and 30% randomly directed, sustained mean shifts with λ = 4. For each scenario, a scatterplot was constructed of the ranks determined by the MSD function versus the ranks determined by the RMD function. A straight line was drawn to represent the path the plotted ranks would follow if both depth functions generated equivalent rankings for each observation. Results are provided in Figure 5.3.8.

Figure 5.3.8 Scatterplots of MSD vs. RMD Ranks for Shifted Bivariate LGN Data
[Two panels (5% and 30% sustained shifts) plotting MSD rank against RMD rank (1 to 1000) for IC and OC points, with the line of equal ranks superimposed.]

In the case of a 5% sustained shift, the MSD and RMD rankings are in general agreement for the lowest and highest rankings, and there is a moderate amount of variation in the middle. However, with a 30% sustained shift, the differences between the depth functions become more apparent. There is much more variability overall, but the rankings assigned to the OC points are especially notable. The MSD function consistently ranks the OC points as more central than does the RMD function, as evidenced by the fact that most of the OC points fall well above the diagonal line in the right panel of Figure 5.3.8. In other words, at the 30% contamination level, the MSD function usually classifies OC points as more central than they truly are. Because of this, many of the IC points correspondingly receive ranks from the MSD function that incorrectly suggest outlyingness. The rankings determined by the MSD and RMD functions are similar only for the most extreme OC points (rankings near 1000). Although simulation results vary with other randomly generated shift directions, the general conclusion remains the same -- a more robust depth function is needed for skewed data with contamination levels exceeding 15%. Since the MMR chart using RMD did not break down at the 30% contamination level with symmetric distributions, it was decided to rerun the skewed distribution scenarios depicted in Figure 5.3.5 using RMD instead of MSD as the depth function. The results are displayed in Figure 5.3.9.
Figure 5.3.9 MMR-MSD/RMD Chart Performance on Increasingly Shifted LGN Data
[Three panels (bivariate lognormal processes with 5%, 15%, and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the MSD and RMD charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

For a 5% sustained shift of the mean, the MMR-RMD chart is less effective than the MMR-MSD chart in detecting shifts of magnitude λ = 0.5 to 1 and marginally better in detecting shifts of magnitude λ = 1.5 to 2. The same is true for a 15% level of contamination, but the differences in chart performance are slightly magnified. When 30% of the data are shifted, however, the MMR-RMD chart is clearly the better alternative because it does not break down in the presence of severe contamination levels. The MMR-RMD chart's performance as compared to Hotelling's T2 chart with a 30% sustained shift of the mean is illustrated in Figure 5.3.10. The MMR-RMD chart clearly outperforms Hotelling's T2 chart for m ≥ 100 but, more importantly, offers reasonable distribution-free performance for all m even in the presence of severe contamination levels.

Figure 5.3.10 MMR-RMD Chart Performance on Bivariate LGN Data with a 30% SS
[One panel (bivariate lognormal process with a 30% sustained shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

When the dimension is increased to five, the same trends in MMR-MSD chart performance under sustained shifts of the mean in skewed data are observed, along with the slight loss in power which accompanies increased dimensionality. Figure 5.3.11 shows the results of applying the MMR-MSD and Hotelling's T2 charts to a five-dimensional lognormally distributed process with a 15% sustained shift of the mean. At least 100 subgroups, as opposed to m ≥ 20 in the bivariate case, are required for MMR-MSD chart performance to surpass Hotelling's T2 chart performance.

Figure 5.3.11 Control Chart Performance on LGN Data with a 15% SS in p = 5
[One panel (lognormal process in p = 5 with a 15% sustained shift) plotting empirical alarm probability versus the noncentrality parameter for the MSD and HT2 charts at (m, n) = (20, 5), (100, 5), and (200, 5).]

Based on these results, it is concluded that when dealing with skewed data containing sustained mean shifts, the MMR-MSD chart is preferred for contamination levels up to 15%, and the MMR-RMD chart is the best option if the contamination level is suspected to exceed 15%. Alternatively, both the MMR-MSD and MMR-RMD charts could be run on the same data set in order to provide maximum detection capability for all possible contamination levels. Complete tables of results for all simulations performed using the multivariate lognormal distribution with sustained shifts of the mean are provided in Appendices V - Y. In addition, a matrix of recommended control chart usage with skewed multivariate data under both isolated and sustained shifts of the mean is provided in Table 5.3.1.
The MMR-MSD chart is almost always preferred for contamination levels of 15% or less, and the MMR-RMD chart can be used for higher contamination levels when the number of subgroups is sufficiently large. For the few cases in which Hotelling's T2 chart outperforms the MMR chart, more research is necessary because implementation of Hotelling's T2 chart with an empirical UCL is only possible if the exact process distribution is known.

Table 5.3.1 Recommended Phase I Control Chart Usage for Skewed Multivariate Data
[Recommended chart (MMR-MSD, MMR-RMD, or HT2) by shift type (IS, 5% SS, 15% SS, 30% SS) and (m, n) combination (20, 5), (100, 5), and (200, 5), shown separately for p = 2 and p = 5.]

5.4 MMR Chart Performance with Larger Subgroup Sizes

In order to assess the effects of larger subgroup sizes on MMR chart performance, a targeted analysis using m = 100 and n = 5(5)20 was undertaken. The MMR-RMD chart was evaluated under both isolated and 15% sustained shifts of the mean in five-dimensional t(3) and lognormally distributed processes. Simulation results reveal that increasing the subgroup size for a given m enhances the performance of both the MMR and Hotelling's T2 charts, but the MMR chart remains superior in all cases considered.

As exhibited in Figure 5.4.1, the empirical probability of an MMR-RMD or Hotelling's T2 chart detecting an isolated shift in five dimensions is raised substantially by increasing the subgroup size from 5 to 20. The difference in performance between the MMR-RMD and Hotelling's T2 charts is smallest when n = 20, but the MMR-RMD chart remains the superior alternative throughout the range of subgroup sizes evaluated. The overall trends for detection of isolated shifts in heavy-tailed and skewed processes are very similar, although shifts of smaller magnitude are detected more readily in a skewed process.

Figure 5.4.1 Effects of Subgroup Size on Control Chart Performance Under an IS in p = 5
[Two panels (t(3) and lognormal processes in p = 5 with an isolated shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at (m, n) = (100, 5) and (100, 20).]

A comparable pattern of performance is witnessed in the detection of 15% sustained shifts of the mean by the MMR-RMD and Hotelling's T2 charts. Figure 5.4.2 shows that increasing the subgroup size raises the EAP for both charts considerably, but the MMR-RMD chart always performs better than Hotelling's T2 chart with empirically adjusted UCL.

Figure 5.4.2 Effects of Subgroup Size on Chart Performance Under a 15% SS in p = 5
[Two panels (t(3) and lognormal processes in p = 5 with a 15% sustained shift) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at (m, n) = (100, 5) and (100, 20).]

These results are somewhat surprising, as one might think that increasing the subgroup size to n = 20 would result in approximate normality of the subgroup averages, which in turn would make a normal-theory method such as Hotelling's T2 chart a better option than the MMR chart. Although normality of subgroup averages will eventually be achieved for sufficiently large n due to the central limit theorem, it is unlikely that subgroup sizes n > 20 will be observed in practice.
For more practical subgroup sizes such as 5 ≤ n ≤ 20, the distribution-free MMR chart is clearly the best alternative. Complete tables of results for all subgroup size analyses performed are provided in Appendices Z - BB.

5.5 Robust Estimators of Location and Scatter for the MMR Chart

It was originally decided to use the BACON method of Billor et al. (2000) to robustly estimate both the mean vector and the covariance matrix for use with the MMR chart. However, it was later determined that using the BACON location estimator with Type I error probability α = 0.10 together with Hotelling's T2 scatter estimator results in significantly enhanced MMR chart performance. This choice of robust estimators was briefly addressed in Chapter 2, and is discussed in detail here.

In early test runs, the MMR-RMD chart using strictly BACON estimators was compared to Hotelling's T2 chart with empirically adjusted UCL using a bivariate t(3) process with a sustained shift of the mean. The BACON method of estimation with α = 0.05 performed nearly perfectly in detecting large process shifts (λ ≥ 8) and subsequently excluding OC points from the resulting location and scatter estimates. With smaller shifts (λ < 8), however, the BACON method did not consistently identify outlying points, often resulting in estimated mean vectors and covariance matrices which were approximately equivalent to the classical nonrobust estimates. The contamination in the estimated parameters degraded the performance of the MMR chart, and as indicated in Figure 5.5.1, this effect was magnified as the level of contamination in the data set was raised from 15% to 30%. Limited testing of the MCD method to determine robust location and scatter estimates yielded similar results at the cost of a significantly higher computational burden.

Figure 5.5.1 Comparison of MMR-RMD (Using BACON Estimators) and HT2 Charts
[One panel (bivariate t(3) process with 15% and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at each contamination level.]

Figure 5.5.2 shows why it is so difficult for even robust methods to distinguish IC from OC data when λ is small. The univariate t(3) plots represent probability density functions for various unshifted and shifted t(3) distributions. The bivariate graphs were created by randomly generating 500 observations from a bivariate t(3) distribution and inducing a location shift upon 15% of the data. In the first row of Figure 5.5.2, a one-unit shift is barely distinguishable. In the second row, a four-unit shift is more noticeable but still results in significant overlap between unshifted and shifted data. It takes an eight-unit shift, as depicted in the third row of Figure 5.5.2, to clearly separate shifted data from unshifted data.

Figure 5.5.2 The Effects of Increasing Shift Sizes on Univariate and Bivariate t(3) Data
[Three rows of paired plots (shifts of one, four, and eight units), each showing the univariate t(3) density overlaid with its shifted counterpart and a bivariate t(3) scatterplot of IC and OC points.]

Also illustrated in Figure 5.5.1, Hotelling's T2 control chart with empirical UCL is substantially less affected by higher contamination levels than the MMR-RMD chart.
This is because Hotelling's scatter estimator for data consisting of m subgroups of size n,

    \bar{\mathbf{S}} = \frac{1}{m} \sum_{i=1}^{m} \mathbf{S}_i,

where \mathbf{S}_i has (k, l)th entry \frac{1}{n-1} \sum_{j=1}^{n} (X_{ijk} - \bar{X}_{ik})(X_{ijl} - \bar{X}_{il}), is robust to location shifts under the assumption that shifted subgroups possess the same covariance structure as unshifted subgroups. \bar{\mathbf{S}} represents the average of the m subgroup covariance matrices, each of which is computed with respect to its subgroup mean \bar{\mathbf{X}}_i rather than the mean of the entire data set, as is the case with the classical covariance estimator \mathbf{S}, which has (k, l)th entry \frac{1}{N-1} \sum_{j=1}^{N} (X_{jk} - \bar{X}_{k})(X_{jl} - \bar{X}_{l}). Accordingly, \bar{\mathbf{S}} is not inflated by OC subgroups as are classical methods which consider the data set as a whole or robust methods which fail to exclude outliers (a short MATLAB sketch contrasting the two estimators is given later in this section). This result is true only for subgrouped data. When individual data are encountered in a control charting application, Hotelling's T2 scatter estimator reduces to the nonrobust classical covariance matrix. Under those circumstances, robust parameter estimation methods such as BACON may be preferred because they exclude OC points corresponding to shifts with large λ.

Based on these findings, it was decided to substitute Hotelling's T2 scatter estimator \bar{\mathbf{S}} for the BACON scatter estimator in the MMR chart. To achieve a more robust location estimate for the MMR chart, the BACON method was implemented with a higher Type I error probability. Experimentation with the BACON method using α = 0.05, 0.10, 0.20, and 0.35 showed that α = 0.10 provides the best compromise between Type I and Type II error. As indicated in Figure 5.5.3, implementation of the MMR-RMD chart using the new estimators results in significantly enhanced performance over the MMR-RMD chart using strictly BACON estimators, especially when the contamination level is high.

Figure 5.5.3 Improvement in MMR-RMD Chart Performance with New Estimators
[One panel (bivariate t(3) process with 15% and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the MMR-RMD chart using the BACON/HT2 estimators and using strictly BACON estimators.]

Surprisingly, even with the new estimators, 30% sustained shifts of the mean are detected by both charts with lower probability than 15% sustained shifts. In the case of Hotelling's T2 chart, this occurs because Hotelling's T2 scatter estimator is naturally robust, but Hotelling's T2 location estimator \bar{\bar{\mathbf{X}}} = \frac{1}{m} \sum_{i=1}^{m} \bar{\mathbf{X}}_i is equivalent to the classical mean vector and is therefore nonrobust. To verify this, the charts in Figure 5.5.1 were repeated using a known mean vector of all zeros. As expected, Figure 5.5.4 shows that 30% sustained shifts are detected by Hotelling's T2 chart with higher probability than 15% sustained shifts when the mean vector is known, yet the same does not hold true for the MMR-RMD chart. Additional experimentation revealed that this occurs because of the redistribution of the ranks assigned to depth values during the MMR control charting process, and is simply an unavoidable consequence of rank-based control charting.
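The MATLAB sketch below contrasts the two scatter estimators discussed in this section. It is a simplified illustration, not the dissertation's implementation, and it assumes the rows of X are ordered subgroup by subgroup.

    % Sketch: Hotelling's pooled scatter estimator versus the classical
    % covariance of the pooled sample for m subgroups of size n.
    function [Sbar, Sclassical] = pooled_scatter(X, m, n)
        p = size(X, 2);
        Sbar = zeros(p, p);
        for i = 1:m
            Xi = X((i-1)*n + (1:n), :);      % observations in subgroup i
            Sbar = Sbar + cov(Xi);           % covariance about the subgroup mean (divisor n-1)
        end
        Sbar = Sbar / m;                     % average of the m subgroup covariance matrices
        Sclassical = cov(X);                 % covariance about the grand mean of all N = m*n points
    end
    % A sustained location shift inflates Sclassical but leaves Sbar largely unchanged,
    % provided the shifted subgroups keep the same covariance structure.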
Figure 5.5.4 Change in Chart Performance When the Mean is Known
[One panel (bivariate t(3) process with a known mean vector and 15% and 30% sustained shifts) plotting empirical alarm probability versus the noncentrality parameter for the RMD and HT2 charts at each contamination level.]

To illustrate the redistribution of ranks in conjunction with higher contamination levels, 100 observations consisting of m = 20 subgroups of size n = 5 from an in-control bivariate standard normal process were simulated. Depth values for each point were computed using RMD with the BACON location estimator (α = 0.10) and Hotelling's T2 scatter estimator, and ranks were assigned to each point from nearest (rank = 1) to farthest (rank = 100) from the center. Next, 5% of the data were shifted by three units to the right, RMD values were recomputed, and new ranks were recorded. Finally, this process was repeated using a 30% contamination level. For both the 5% and 30% shifts, Figure 5.5.5 illustrates scatterplots and rank charts of IC and OC data before and after the shifts. If the rankings of the IC points were unaffected by the shifts, as one might expect, they would follow a straight line on the before-versus-after rank charts. For the 5% shift in column one this is nearly the case, but the rank chart for the 30% shift in column two shows that the IC rankings are significantly affected by the induced shift. This is true because, as previously illustrated in Figure 5.5.2, a shift of magnitude three is not large enough to clearly separate the IC points from the OC points. Rather, many of the IC and OC points are commingled, thus distorting the rankings and making it more difficult for the MMR chart to distinguish between them. As the level of contamination in the data is raised, the level of distortion in the rankings increases and MMR chart performance decreases accordingly.

Figure 5.5.5 Redistribution of Ranks Under 5% and 30% Sustained Shifts
[Two columns (5% and 30% shifts), each showing scatterplots of the data before and after the shift and a rank chart plotting the rank of each point in the shifted data against its rank in the unshifted data, with IC and OC points distinguished.]

Despite this effect, the MMR chart using data depth with the BACON location estimator (α = 0.10) and Hotelling's T2 scatter estimator is much more effective than Hotelling's T2 chart in detecting isolated and sustained shifts of the mean in both symmetric and skewed multivariate data. This was illustrated throughout Chapter 5 for various combinations of m = 20, 50(50)200, n = 5(5)20, and p = 2, 5, and 10. The MMR chart has the added advantage of being distribution-free, unlike Hotelling's T2 chart, which has to be tailored to the specific process distribution under study. In order to illustrate a complete application of the MMR chart, an example is offered in the following chapter.

6 An Example Phase I Analysis Using the MMR Chart

6.1 Simulating the Contaminated Reference Sample

In order to demonstrate an application of the MMR-MSD chart from start to finish, a simulated example involving a five-dimensional, lognormally distributed reference sample with m = 100, n = 5, and three isolated shifts of the mean is presented.
Data and shift directions were generated in accordance with the procedures outlined in Chapter 4. Isolated shifts of increasing magnitude were applied to single subgroups as follows: λ = 3 at subgroup 4, λ = 5 at subgroup 41, and λ = 200 at subgroup 91. The shift of magnitude λ = 3 represents the smallest shift for which the MMR-MSD chart was shown in Chapter 5 to have nearly perfect detection ability, and the shift of magnitude λ = 200 is designed to illustrate the sensitivity of robust and nonrobust estimators to extreme outliers. Using a desired IC FAP of 0.05, the MMR-MSD chart using the UCLs from Table 6.1.1 was compared to Hotelling's T2 chart with Alt's (1976) Phase I UCL.

Table 6.1.1 MMR Chart UCLs for Chapter 6 Example (desired FAP = 0.05)

    m      n     UCL       Simulated FAP
    100    5     2.992     0.0483
    99     5     2.990     0.0484
    98     5     2.987     0.0482
    97     5     2.986     0.0485

6.2 Removing Outliers from the Sample

The MMR-MSD and Hotelling's T2 charts applied to the unedited reference sample are pictured in Figure 6.2.1. Each chart contains a superimposed table of potential OC subgroups.

Figure 6.2.1 Initial Application of Phase I Control Charts to the Lognormal Sample
[Two panels plotting the control chart statistic for each of the 100 subgroups. MMR-MSD chart: subgroups 4, 41, and 91 exceed the UCL of 2.992 with statistics 3.156, 3.598, and 3.850, respectively. Hotelling's T2 chart: subgroups 4, 41, and 91 exceed the UCL of 22.59 with statistics 151.25, 238.79, and 268,471, respectively.]

The three shifted subgroups are readily apparent on the MMR-MSD chart, as they all fall above the initial UCL for m = 100, n = 5. The extreme outlier represented by subgroup 91 does not look considerably different from the other two outliers for several reasons. First, its extreme outlyingness is mitigated by the rank-based nature of the MMR-MSD control chart statistic. A rank from 1 to N assigned to a point in a reference sample represents only the position of that point with respect to the other N = m x n points in the sample as determined by a depth function; the degree of outlyingness is not reflected in the ranking. The most outlying point in a data set will receive a rank of N, regardless of whether the point is only marginally more outlying than all others or a significant distance away from the rest of the p-dimensional data cloud. Also, computation of MSD does not involve estimation of a location vector, hence its robustness to isolated shifts in location no matter how extreme. As was shown in Chapter 5, very high contamination levels can redistribute the ranks in such a manner that the MMR-MSD chart becomes ineffective at detecting sustained shifts, but extreme isolated shifts are detected with ease. Even if the RMD function (which does require an estimated mean vector to compute) were used in this scenario, the BACON location estimator would exclude the extreme outlier represented by subgroup 91, and the resulting MMR-RMD control chart would be very similar to the MMR-MSD chart. As a result of these properties, the MMR chart is well insulated against the effects of a single extreme outlier in a given reference sample.

With Hotelling's T2 chart, however, the extreme outlier has a dramatic effect on the T2 statistic for each subgroup, as evidenced by the fact that the majority of the control chart statistics fall above the initial UCL. This occurs because the grand mean \bar{\bar{\mathbf{X}}} used in computing

    T_i^2 = n \left( \bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}} \right)' \bar{\mathbf{S}}^{-1} \left( \bar{\mathbf{X}}_i - \bar{\bar{\mathbf{X}}} \right)

is not robust to outliers. A more robust estimator for the mean vector, such as BACON, could prevent this from occurring, but is beyond the scope of this research.
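For reference, the following is a minimal MATLAB sketch of the Phase I T2 statistics computed for each subgroup, using the nonrobust grand mean and the pooled scatter estimator defined in Section 5.5; it is an illustration only, and the Phase I UCL itself (Alt's limit) is assumed to be supplied separately.

    % Sketch: Phase I T2 statistic for each of m subgroups of size n,
    % with the rows of X ordered subgroup by subgroup.
    function T2 = subgroup_T2(X, m, n)
        p = size(X, 2);
        xbars = zeros(m, p);
        Sbar = zeros(p, p);
        for i = 1:m
            Xi = X((i-1)*n + (1:n), :);      % observations in subgroup i
            xbars(i, :) = mean(Xi, 1);       % subgroup mean vector
            Sbar = Sbar + cov(Xi);           % covariance about the subgroup mean
        end
        Sbar = Sbar / m;                     % pooled within-subgroup covariance
        grand = mean(xbars, 1);              % grand mean of the subgroup means (nonrobust)
        T2 = zeros(m, 1);
        for i = 1:m
            d = (xbars(i, :) - grand)';
            T2(i) = n * (d' / Sbar) * d;     % n * (xbar_i - grand)' * inv(Sbar) * (xbar_i - grand)
        end
    end
    % Subgroups whose T2 value exceeds the Phase I UCL are flagged as potential OC subgroups.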
The next step in a Phase I analysis is to investigate each potential OC subgroup for an assignable cause. In this example, it is assumed that all potential OC subgroups have assignable causes and therefore warrant removal from the data set. Some control chart authors advocate removing all OC subgroups at once and then recalculating the control limits. Others believe that OC subgroups should be removed one at a time, beginning with the most outlying subgroup, with the control limits being recalculated at each iteration. This example will take the latter approach.

The most extreme OC subgroup for both the MMR-MSD and Hotelling's T2 control charts is subgroup 91, so it is removed first. Once an OC subgroup is removed from the data set, both control charts are reconstructed using control limits appropriate for the reduced number of subgroups. The control charts for m = 99, n = 5 after removal of the first OC subgroup are depicted in Figure 6.2.2.

Figure 6.2.2 Second Iteration of the MMR-MSD Control Chart
[Two panels plotting the control chart statistic for each remaining subgroup against the recalculated UCLs. MMR-MSD chart: subgroups 4 and 41 exceed the UCL of 2.990 with statistics 3.152 and 3.661. Hotelling's T2 chart: subgroups 4 and 41 exceed the UCL of 22.57 with statistics 151.25 and 238.79.]

After removing the extreme outlier, the control chart statistics for the remaining two planted outliers still exceed the UCLs for both the MMR-MSD and Hotelling's T2 control charts. Next, the outlier represented by subgroup 41 is removed and both control charts are recalculated using m = 98, n = 5. Finally, the outlier represented by subgroup 4 is eliminated and the control charts are recomputed using m = 97, n = 5. The final MMR-MSD and Hotelling's T2 control charts after sequentially removing all planted outliers are illustrated in Figure 6.2.3.

Figure 6.2.3 Final Control Charts After Four Iterations of Phase I Analysis
[Two panels plotting the control chart statistic for each of the remaining 97 subgroups. MMR-MSD chart: no statistics exceed the UCL. Hotelling's T2 chart: subgroups 58, 67, 82, and 90 exceed the UCL of 22.53 with statistics 24.82, 34.65, 32.98, and 24.03, respectively.]
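In outline, the one-at-a-time removal procedure just carried out can be sketched in MATLAB as follows; chart_statistics() and ucl_for() are hypothetical placeholders for computing the chosen chart's statistics on the remaining subgroups and for looking up the UCL appropriate to the reduced number of subgroups (for the MMR-MSD chart, the values in Table 6.1.1).

    % Sketch of the iterative Phase I outlier-removal procedure used in this example.
    keep = 1:m;                                 % indices of subgroups still in the sample
    while true
        stats = chart_statistics(X, keep, n);   % hypothetical: one statistic per remaining subgroup
        UCL = ucl_for(numel(keep));             % hypothetical: UCL for the current number of subgroups
        oc = find(stats > UCL);                 % potential OC subgroups
        if isempty(oc)
            break;                              % remaining reference sample declared IC
        end
        [~, j] = max(stats(oc));                % most extreme signalling subgroup
        % ... investigate subgroup keep(oc(j)) for an assignable cause here ...
        keep(oc(j)) = [];                       % remove it and recompute on the next pass
    end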
At this point, all control chart statistics for the MMR-MSD chart fall below the UCL, so the remaining reference sample consisting of m = 97 subgroups is correctly declared to be IC. Hotelling's T2 chart, despite following the same outlier removal process as the MMR-MSD chart, still identifies four potential OC subgroups after the third iteration of the Phase I analysis. Further iterations could result in the identification of even more potential OC subgroups because the Phase I UCL for Hotelling's T2 chart is adjusted downward as the number of subgroups decreases with each iteration.

6.3 Analyzing the Results

Hotelling's T2 chart falsely identifies multiple potential OC subgroups because normal-theory Phase I UCLs were applied to skewed data, illustrating the danger of applying a normal-theory method without regard to the underlying distribution of a process. Using UCLs empirically tailored to a five-dimensional multivariate lognormal distribution would solve the problem of multiple false alarms, but would also result in a loss of detection power, as only the first two OC subgroups would be identified and removed. In addition, the exact process distribution would not be known in anything but a simulation example such as the one presented here, so empirical UCLs for Hotelling's T2 chart are not practical for widespread implementation. The MMR chart is clearly a superior alternative because it offers accurate, distribution-free performance with a low computational burden.

7 Conclusion

7.1 Synopsis of Findings

The MMR chart for detecting location shifts in subgrouped data represents the first known distribution-free Phase I multivariate control chart. This work represents the culmination of extensive research to synthesize appropriate statistical process control techniques, data depth functions, and robust parameter estimation methods into a distribution-free, computationally feasible, and accurate Phase I multivariate control charting methodology. The MMR chart has been shown to be extremely effective in detecting isolated and sustained shifts of the mean in both heavy-tailed and skewed multivariate data.

7.2 Summary of Research Conducted

The MMR chart was created as a multivariate extension of Jones-Farmer et al.'s (2009) univariate distribution-free Phase I mean-rank chart for subgroup location. Given an unedited p-dimensional reference sample consisting of m subgroups of size n, data depth functions in conjunction with robust estimators were used to reduce the multivariate data to univariate depth values. The robust Mahalanobis depth function for elliptically symmetric data was implemented using the BACON location estimator and Hotelling's T2 scatter estimator for subgrouped data. The Mahalanobis spatial depth function, which is not reliant on distributional assumptions and does not require a location estimator, was employed using Hotelling's T2 scatter estimator for subgrouped data. Depth values resulting from these functions were ranked and converted into MMR control chart statistics for each subgroup, which were then compared to empirical UCLs determined through simulation of the joint distribution of the MMR control chart statistic.

Hotelling's T2 control chart with Alt's (1976) Phase I UCLs for normally distributed data and empirically adjusted UCLs for nonnormally distributed data was used to establish a baseline level of Phase I performance. Performance comparisons of the MMR chart to Hotelling's T2 chart included scenarios involving simulated multivariate normally distributed data, heavy-tailed data represented by the multivariate t(3) distribution, and skewed data represented by the multivariate lognormal distribution for m = 20, 50(50)200 subgroups of size n = 5 and dimensions p = 2, 5, and 10. All data were standardized, without loss of generality, to have a zero mean vector and identity covariance matrix. IC performance was measured by each chart's ability to maintain the desired FAP using simulated IC data. OC performance was measured by each chart's EAP under isolated as well as 5%, 15%, and 30% sustained shifts of the mean, assuming constant within-subgroup covariance. Shifts were fixed in a specific direction with elliptically symmetric distributions without loss of generality, and averaged over a uniform distribution of shift directions with skewed distributions. Limited analysis was performed on the effect of increased subgroup sizes on control chart performance in Phase I.
7.3 Recommendations for Phase I Analysis

A comprehensive simulation study shows that when normality of Phase I multivariate process data can be established, Hotelling's T2 chart with Alt's (1976) Phase I UCL is preferred for detecting isolated or sustained shifts of the mean. This is not surprising, as one would expect a normal-theory method to outperform a distribution-free method when a process is multivariate normally distributed, and the original intent of the MMR chart was to provide a distribution-free control charting methodology for processes demonstrating clear departures from normality.

When Phase I process data are heavy-tailed or skewed, the MMR chart usually outperforms Hotelling's T2 chart in detecting isolated or sustained shifts of the mean. More importantly, the MMR chart offers truly distribution-free performance because the UCL for a given application depends only on the number of subgroups, the size of each subgroup, and the desired IC FAP, without regard to the form of the underlying process distribution. UCLs for Hotelling's T2 chart, on the contrary, must be empirically tailored to the exact distribution of a nonnormally distributed process to achieve the desired IC FAP, something which is only possible in a simulation environment. An added benefit of the MMR chart is that, for a given OC scenario involving nonnormally distributed data, its performance is far less sensitive to the size of m than that of Hotelling's T2 chart with empirical UCL, thus making it even more attractive as a distribution-free alternative.

As indicated in Table 5.2.1, the MMR-RMD chart is recommended for most situations involving heavy-tailed data as long as the required minimum number of subgroups is present. As shown in Table 5.3.1, when process data are skewed, the MMR-MSD chart is almost always recommended if the contamination level is less than 15%, and the MMR-RMD chart is preferred for contamination levels above 15% if the number of subgroups is sufficiently large. In all cases tested, as the dimension of the data or the level of contamination is raised, the minimum number of subgroups required for the MMR chart to achieve superiority over Hotelling's T2 chart with empirical limits correspondingly increases but remains within reasonable bounds. These general conclusions are based on a subgroup size of at least n = 5. Larger subgroup sizes reduce the minimum number of subgroups required for MMR chart performance to surpass that of Hotelling's T2 chart.

7.4 Recommendations for Phase II Monitoring

Once an IC reference sample has been determined through a successful Phase I analysis using the MMR chart, it can be used in conjunction with an appropriate Phase II method to monitor future observations for any departures from the IC state. As noted by C. Champ (personal communication, May 12, 2011), since more is known about a process at the conclusion of a Phase I analysis, the form of a Phase II control chart does not necessarily have to match the form of a Phase I control chart. Although this research assumes nonnormally distributed data throughout the retrospective analysis and monitoring phases, this flexibility means that, even though the Phase I MMR chart is specifically designed for multivariate data collected in subgroups, the search for the most suitable Phase II complement to the Phase I MMR chart need not be limited to methods requiring subgrouped multivariate data.
(1992), with a small smoothing parameter as recommended by Stoumbos and Sullivan (2002), is recommended for Phase II monitoring because it is easy to understand and implement, well documented in the statistical process control literature, and robust to the underlying process distribution. The MEWMA control chart statistic represents a weighted average of all Phase II observations, with the most recent observation assigned a weight equal to the smoothing constant r and all previous observations assigned weights which decrease geometrically with their age; a brief sketch of this recursion is given at the end of this section. Stoumbos and Sullivan (2002) showed that the MEWMA chart can be successfully applied to nonnormally distributed individual or subgrouped multivariate data if a sufficiently small smoothing constant is chosen. Based on the results of a comprehensive simulation exercise, the authors recommend a smoothing constant of r ∈ [0.02, 0.05] for five or fewer dimensions and r ≤ 0.02 for more than five dimensions for reliable detection of sustained location shifts in heavy-tailed or skewed multivariate data. A subsequent study by Testik et al. (2003) mirrored the findings of Stoumbos and Sullivan (2002) regarding use of the MEWMA chart as a robust Phase II method.

It should be noted that all three aforementioned MEWMA chart studies are based on the assumption that the IC mean vector and covariance matrix are known. If the MEWMA chart is employed following a Phase I analysis using the MMR chart, the mean vector and covariance matrix are not known but rather estimated from an IC reference sample. If the IC reference sample is too small, using estimated rather than known parameters can lead to more frequent false alarms and a lower probability of detecting OC conditions, especially when the smoothing constant is small. For the univariate EWMA chart, this effect was detailed by Jones, Champ, and Rigdon (2001), and design strategies to alleviate this problem were offered by Jones (2002). For the MEWMA chart, Champ and Jones-Farmer (2007) showed that widening the control limits through simulation to account for the additional variability introduced by the use of estimated parameters results in nearly the same performance as the known parameter case. An analytical method of determining control limits for the MEWMA chart with estimated parameters, as well as the minimum sample size required for estimated parameter performance to equal known parameter performance, are topics for future research. Despite these open issues, the MEWMA chart represents the most broadly applicable control charting methodology for Phase II monitoring of nonnormally distributed multivariate data.

A potential criticism of the MEWMA chart is that using a small smoothing parameter to improve robustness to nonnormality decreases control chart sensitivity to large sustained mean shifts and isolated outlying observations, but it can be argued that this is not a significant disadvantage in a Phase II control charting scenario. As previously noted in Chapter 3 of this document, Montgomery (2005, p. 386) characterizes control charts which accumulate information from sequences of points (e.g., CUSUM, EWMA, and their multivariate counterparts) as being ideally suited for Phase II monitoring because they are more sensitive to small process shifts than Shewhart-type charts, which use information only from the most recent observation. According to Montgomery (2005, p. 386), sensitivity to small shifts is desirable for a Phase II control chart because, in contrast to Phase I, "assignable causes do not typically result in large process upsets or disturbances" in Phase II. If greater control chart sensitivity to large sustained mean shifts or individual outliers is desired, the reader is directed to the Chapter 1 discussion of Phase II nonparametric, distribution-free, and robust control charts. Although a few such methods could potentially supplement the MEWMA control chart in certain scenarios, none have proven as effective as the MEWMA chart with a small smoothing constant on a wide range of nonnormally distributed data in higher dimensions.
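The sketch below illustrates, in MATLAB, the MEWMA recursion of Lowry et al. (1992) as it might be applied after a Phase I MMR analysis. It is a minimal outline under stated assumptions rather than part of the cited studies: X is taken to be an (N x p) matrix of time-ordered Phase II observation vectors, Xbar0 and S0 are the mean vector and covariance matrix estimated from the IC reference sample, r is the smoothing constant, and h is a control limit that, following Champ and Jones-Farmer (2007), would be widened through simulation to account for the use of estimated parameters.

[N, p] = size(X); % number of Phase II observations and dimension
Z = zeros(1, p); % Z_0 starts at zero because the observations are centered below
T2 = zeros(N, 1); % vector of MEWMA chart statistics
for i = 1:N
    Z = r*(X(i,:) - Xbar0) + (1 - r)*Z; % Z_i = r(X_i - mu_0) + (1 - r)Z_{i-1}
    SZ = (r*(1 - (1 - r)^(2*i))/(2 - r))*S0; % exact covariance of Z_i (Lowry et al., 1992)
    T2(i) = Z/SZ*Z'; % MEWMA statistic: Z_i' * inv(SZ) * Z_i
end
signal = find(T2 > h); % observations at which the chart signals a possible mean shift

If Phase II data continue to be collected in subgroups of size n, the same recursion could be applied to the subgroup mean vectors with S0 replaced by S0/n.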
7.5 Future Research Directions

The MMR chart fills a notable gap in the current multivariate quality control literature, yet much work remains to be done in the field of distribution-free Phase I multivariate quality control. Although it is believed that the fundamental structure of the MMR chart is sound, potential refinements include further exploration of the BACON method to determine optimal input parameters (e.g., Type I error probability) for maximum robustness to shifts of all magnitudes, implementation of other location and scatter estimators to improve robustness to higher contamination levels, and experimentation with alternative data depth functions which may enhance MMR chart performance. Additionally, since the MMR chart is designed to detect location changes in subgrouped multivariate data during Phase I, an equivalent distribution-free chart for detecting scale changes is needed for Phase I scenarios in which the assumption of constant within-subgroup covariance is not appropriate. Finally, Phase I distribution-free charts for detecting both location and scale changes in small subgroups (n < 5) and individual multivariate observations (n = 1) should be sought as well. It is the hope of this author that the success of the MMR chart as the first proposed distribution-free Phase I multivariate method will serve as the catalyst for some or all of this additional research.

References

Alfaro, J.L., & Ortega, J.F. (2008). A Robust Alternative to Hotelling's T2 Control Chart Using Trimmed Estimators. Quality and Reliability Engineering International, 24, 601-611.
Aloupis, G. (2005, August). Geometric and Combinatorial Issues in Data Depth. Presented at the Franco-Canadian Workshop on Combinatorial Algorithms, Hamilton, Ontario.
Aloupis, G. (2006). Geometric Measures of Data Depth. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Vol. 72, pp. 147-158). Providence, RI: American Mathematical Society.
Alt, F.B. (1976). Small Sample Probability Limits for the Mean of a Multivariate Normal Process. ASQC Technical Conference Transactions, pp. 170-176.
Bakir, S.T. (1989). Analysis of Means Using Ranks. Communications in Statistics – Simulation and Computation, 18(2), 757-776.
Beltran, L.A. (2006). Nonparametric Multivariate Statistical Process Control Using Principal Component Analysis and Simplicial Depth. Dissertation, University of Central Florida.
Bersimis, S., Panaretos, J., & Psarakis, S. (2005). Multivariate Statistical Process Control Charts and the Problem of Interpretation: A Short Overview and Some Applications in Industry. Proceedings of the 7th Hellenic European Conference on Computer Mathematics and Its Applications.
Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate Statistical Process Control Charts: An Overview. Quality and Reliability Engineering International, 23, 517-543.
Billor, N., Hadi, A.S., & Velleman, P.F. (2000). BACON: Blocked Adaptive Computationally Efficient Outlier Nominators. Computational Statistics & Data Analysis, 34, 279-298.
Chakraborty, B., & Chaudhuri, P. (1999). A Note on the Robustness of Multivariate Medians. Statistics & Probability Letters, 45, 269-276.
Champ, C.W., & Jones, L.A. (2004). Designing Phase I X Charts with Small Sample Sizes. Quality and Reliability Engineering International, 20, 497-510.
Champ, C.W., & Jones-Farmer, L.A. (2007). Properties of Multivariate Control Charts with Estimated Parameters. Sequential Analysis, 26, 153-169.
Chatterjee, S., & Qiu, P. (2009). Distribution-Free Cumulative Sum Control Charts Using Bootstrap-Based Control Limits. The Annals of Applied Statistics, 3(1), 349-369.
Chenouri, S., & Steiner, S.H. (2009). A Multivariate Robust Control Chart for Individual Observations. Journal of Quality Technology, 41(3), 259-271.
Chenouri, S., & Variyath, A.M. (2011). A Comparative Study of Phase II Robust Multivariate Control Charts for Individual Observations. Quality and Reliability Engineering International, 27(3) [Electronic version].
Chou, Y.M., Mason, R.L., & Young, J.C. (2001). The Control Chart for Individual Observations from a Multivariate Non-Normal Distribution. Communications in Statistics – Theory and Methods, 30(8), 1937-1949.
Crosier, R.B. (1988). Multivariate Generalizations of Cumulative Sum Quality Control Schemes. Technometrics, 30(3), 291-303.
Dai, Y., Zhou, C., & Wang, Z. (2006a). Multivariate CUSUM Control Charts Based on Data Depth for Preliminary Analysis (Working paper).
Dang, X., & Serfling, R. (2010). Nonparametric Depth-Based Multivariate Outlier Identifiers, and Masking Robustness Properties. Journal of Statistical Planning and Inference, 140, 198-213.
Donoho, D.L., & Huber, P.J. (1983). The Notion of a Breakdown Point. In P.J. Bickel, K.A. Doksum and J.L. Hodges, Jr. (Eds.), A Festschrift for Erich L. Lehmann (pp. 157-184). Belmont, CA: Wadsworth.
Fricker, R.D., & Chang, J.T. (2009a). The Repeated Two-Sample Rank (RTR) Procedure: A Nonparametric Multivariate Individuals Control Chart (Working paper).
Gao, Y. (2003). Data Depth Based on Spatial Rank. Statistics and Probability Letters, 65(3), 217-225.
Genz, A. (2011). QSIMVNV. Retrieved April 22, 2011, from http://www.math.wsu.edu/faculty/genz/software/software.html.
Gibbons, J.D., & Chakraborti, S. (2003). Nonparametric Statistical Inference (4th ed.). New York: Marcel Dekker.
Hamurkaroglu, C., Mert, M., & Saykan, Y. (2004). Nonparametric Control Charts Based on Mahalanobis Depth. Hacettepe Journal of Mathematics and Statistics, 33, 57-67.
Hawkins, D.M., & Maboudou-Tchao, E.M. (2007). Self-Starting Multivariate Exponentially Weighted Moving Average Control Charting. Technometrics, 49(2), 199-209.
Hayter, A.J., & Tsui, K. (1994). Identification and Quantification in Multivariate Quality Control Problems. Journal of Quality Control, 26, 197-208.
Hotelling, H. (1947). Multivariate Quality Control – Illustrated By the Air Testing of Sample Bombsights. In C. Eisenhart, M.W. Hastay, & W.A. Wallis (Eds.), Techniques of Statistical Analysis (pp. 111-184). New York: McGraw-Hill.
Hugg, J., Rafalin, E., Seyboth, K., & Souvaine, D. (2006, January). An Experimental Study of Old and New Depth Measures. Paper presented at the Workshop on Algorithm Engineering and Experiments, Miami, FL.
Hugg, J., Rafalin, E., & Souvaine, D. (2006, July). Depth Explorer – A Software Tool for the Analysis of Depth Measures. Presented at the International Conference on Robust Statistics, Lisbon, Portugal.
Jackson, J.E. (1991). A User's Guide to Principal Components. New York: Wiley.
Jensen, W.A., Jones-Farmer, L.A., Champ, C.W., & Woodall, W.H. (2006). Effects of Parameter Estimation on Control Chart Properties: A Literature Review. Journal of Quality Technology, 38(4), 349-364.
Jensen, W.A., Birch, J.B., & Woodall, W.H. (2007). High Breakdown Estimation Methods for Phase I Multivariate Control Charts. Quality and Reliability Engineering International, 23(5), 615-629.
Jobe, J.M., & Pokojovy, M. (2009). A Multistep, Cluster-Based Multivariate Chart for Retrospective Monitoring of Individuals. Journal of Quality Technology, 41(4), 323-339.
Johnson, M.E. (1987). Multivariate Statistical Simulation. New York: Wiley.
Jones, L.A. (2002). The Statistical Design of EWMA Control Charts with Estimated Parameters. Journal of Quality Technology, 34(3), 277-288.
Jones, L.A., & Woodall, W.H. (1998). The Performance of Bootstrap Control Charts. Journal of Quality Technology, 30(4), 362-375.
Jones, L.A., Champ, C.W., & Rigdon, S.E. (2001). The Performance of Exponentially Weighted Moving Average Charts with Estimated Parameters. Technometrics, 43(2), 156-167.
Jones-Farmer, L.A., Jordan, V., & Champ, C.W. (2009). Distribution-Free Phase I Control Charts for Subgroup Location. Journal of Quality Technology, 41(3), 304-316.
Kruskal, W.H., & Wallis, W.A. (1952). Use of Ranks in One-Criterion Variance Analysis. Journal of the American Statistical Association, 47, 583-621.
Law, A.M., & Kelton, W.D. (2000). Simulation Modeling and Analysis (3rd ed.). Boston: McGraw-Hill.
Lehmann, E.L. (2006). Nonparametrics – Statistical Methods Based on Ranks (revised 1st ed.). New York: Springer Science+Business Media, LLC.
Li, J., & Liu, R. (2004). New Nonparametric Tests of Multivariate Locations and Scales Using Data Depth. Statistical Science, 19(4), 686-696.
Liu, R.Y. (1990). On a Notion of Data Depth Based on Random Simplices. The Annals of Statistics, 18, 405-414.
Liu, R.Y. (1995). Control Charts for Multivariate Processes. Journal of the American Statistical Association, 90, 1380-1388.
Liu, R.Y., & Singh, K. (1993). A Quality Index Based on Data Depth and Multivariate Rank Tests. Journal of the American Statistical Association, 88, 252-260.
Liu, R.Y., Singh, K., & Teng, J.H. (2004). DDMA-Charts: Nonparametric Multivariate Moving Average Control Charts Based on Data Depth. Allgemeines Statistisches Archiv, 88, 235-258.
Lowry, C.A., Woodall, W.H., Champ, C.W., & Rigdon, S.E. (1992). A Multivariate Exponentially Weighted Moving Average Control Chart. Technometrics, 34, 46-53.
Lowry, C.A., & Montgomery, D.C. (1995). A Review of Multivariate Control Charts. IIE Transactions, 27, 800-810.
Mahalanobis, P.C. (1936). On the Generalized Distance in Statistics. Proceedings of the National Institute of Science of India, 12, 49-55.
Mason, R.L., Champ, C.W., Tracy, N.D., Wierda, S.J., & Young, J.C. (1997). Assessment of Multivariate Process Control Techniques. Journal of Quality Technology, 29(2), 140-143.
Mason, R.L., Chou, Y.M., & Young, J.C. (2001). Applying Hotelling's T2 Statistic to Batch Processes. Journal of Quality Technology, 33(4), 466-479.
Mason, R.L., & Young, J.C. (2002). Multivariate Statistical Process Control with Industrial Applications. Alexandria, VA: American Statistical Association; Philadelphia, PA: Society for Industrial and Applied Mathematics.
Messaoud, A., Weihs, C., & Hering, F. (2008). Detection of Chatter Vibration in a Drilling Process Using Multivariate Control Charts. Computational Statistics & Data Analysis, 52(6), 3208-3219.
Mohammadi, M., Midi, H., Arasan, J., & Al-Talib, B. (2011). High Breakdown Estimators to Robustify Phase II Multivariate Control Charts. Journal of Applied Sciences, 11(3), 503-511.
Montgomery, D.C. (2005). Introduction to Statistical Quality Control (5th ed.). Hoboken, NJ: Wiley.
Nedumaran, G., & Pignatiello, J.J. (2000). On Constructing T2 Control Charts for Retrospective Examination. Communications in Statistics – Simulation and Computation, 29(2), 621-632.
Nedumaran, G., & Pignatiello, J.J. (2005). On Constructing Retrospective X Control Chart Limits. Quality and Reliability Engineering International, 21, 81-89.
Oyeyemi, G.M., & Ipinyomi, R.A. (2010). A Robust Method of Estimating Covariance Matrix in Multivariate Data Analysis. African Journal of Mathematics and Computer Science Research, 3(1), 1-18.
Pignatiello, J.J., & Runger, G.C. (1990). Comparison of Multivariate CUSUM Charts. Journal of Quality Technology, 22(3), 173-186.
Polansky, A.M. (2005). A General Framework for Constructing Control Charts. Quality and Reliability Engineering International, 21, 633-653.
Qiu, P. (2008). Distribution-Free Multivariate Process Control Based on Log-Linear Modeling. IIE Transactions, 40(7), 664-691.
Qiu, P., & Hawkins, D. (2001). A Rank-Based Multivariate CUSUM Procedure. Technometrics, 43(2), 120-132.
Qiu, P., & Hawkins, D. (2003). A Nonparametric Multivariate Cumulative Sum Procedure for Detecting Shifts in All Directions. The Statistician, 52(2), 151-164.
Quesenberry, C.P. (1997). SPC Methods for Quality Improvement. New York: Wiley.
Rafalin, E.K. (2005). Algorithms and Analysis of Depth Functions Using Computational Geometry. Dissertation, Tufts University.
Rousseeuw, P.J. (1984). Least Median of Squares Regression. Journal of the American Statistical Association, 79, 871-880.
Rousseeuw, P.J., & Ruts, I. (1996). Algorithm AS 307: Bivariate Location Depth. Applied Statistics, 45, 516-526.
Rousseeuw, P.J., & Van Driessen, K. (1999). A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics, 41, 212-223.
Rousseeuw, P.J., & Van Zomeren, B.C. (1990). Unmasking Multivariate Outliers and Leverage Points. Journal of the American Statistical Association, 85, 633-651.
Schaffer, J.R. (1998, August). A Multivariate Application of the Q Chart. Paper presented at the 1998 Joint Statistical Meetings, Dallas, TX.
Serfling, R. (2002). A Depth Function and a Scale Curve Based on Spatial Quantiles. In Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (pp. 25-28). Berlin, Germany: Birkhäuser.
Serfling, R. (2006). Depth Functions in Nonparametric Multivariate Inference. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Vol. 72, pp. 1-16). Providence, RI: American Mathematical Society.
Serfling, R. (2010). Equivariance and Invariance Properties of Multivariate Quantile and Related Functions, and the Role of Standardization. Journal of Nonparametric Statistics, 22, 915-936.
Serfling, R., & Zuo, Y. (2010). Discussion. The Annals of Statistics, 38(2), 676-684.
Shewhart, W.A. (1939). Statistical Method from the Viewpoint of Quality Control. New York: Dover Publications.
Stoumbos, Z.G., & Jones, L.A. (2000). On the Properties and Design of Individuals Control Charts Based on Simplicial Depth. Nonlinear Studies, 7(2), 147-178.
Stoumbos, Z.G., & Sullivan, J.H. (2002). Robustness to Non-Normality of the Multivariate EWMA Control Chart. Journal of Quality Technology, 34(3), 260-276.
Sullivan, J.H., & Woodall, W.H. (1996). A Comparison of Multivariate Control Charts for Individual Observations. Journal of Quality Technology, 28(4), 398-408.
Sullivan, J.H., & Woodall, W.H. (1998). Adapting Control Charts for the Preliminary Analysis of Multivariate Observations. Communications in Statistics – Simulation and Computation, 27(4), 953-979.
Sullivan, J.H., & Jones, L.A. (2002). A Self-Starting Control Chart for Multivariate Individual Observations. Technometrics, 44(1), 24-33.
Sun, R., & Tsung, F. (2003). A Kernel-Distance-Based Multivariate Control Chart Using Support Vector Methods. International Journal of Production Research, 41(13), 2975-2989.
Teng, H.C. (2000). New Methodology in Regression and Multivariate Quality Control Via Data Depth. Dissertation, Rutgers University.
Testik, M.C., Runger, G.C., & Borror, C.M. (2003). Robustness Properties of Multivariate EWMA Control Charts. Quality and Reliability Engineering International, 19, 31-38.
Testik, M.C., & Borror, C.M. (2004). Design Strategies for the Multivariate Exponentially Weighted Moving Average Control Chart. Quality and Reliability Engineering International, 20, 571-577.
Thissen, U., Swierenga, H., de Weijer, A.P., Wehrens, R., Melssen, W.J., & Buydens, L.M.C. (2005). Multivariate Statistical Process Control Using Mixture Modelling [sic]. Journal of Chemometrics, 19, 23-31.
Tracy, N.D., Young, J.C., & Mason, R.L. (1992). Multivariate Control Charts for Individual Observations. Journal of Quality Technology, 24, 88-95.
Tukey, J.W. (1975). Mathematics and Picturing Data. In R. James (Ed.), Proceedings of the 1974 International Congress of Mathematicians (Vol. 2, pp. 523-531). Vancouver, BC.
Vardi, Y., & Zhang, C. (2000). The Multivariate L1-Median and Associated Data Depth. Proceedings of the National Academy of Sciences of the USA, 97(4), 1423-1426.
Vargas, J.A. (2003). Robust Estimation in Multivariate Control Charts for Individual Observations. Journal of Quality Technology, 35(4), 367-376.
Wierda, S.J. (1994). Multivariate Statistical Process Control – Recent Results and Directions for Future Research. Statistica Neerlandica, 48, 147-168.
Willems, G., Pison, G., Rousseeuw, P.J., & Van Aelst, S. (2002). A Robust Hotelling Test. Metrika, 55, 125-138.
Wood, M., Kaye, M., & Capon, N. (1999). The Use of Resampling for Estimating Control Chart Limits. Journal of the Operational Research Society, 50, 651-659.
Woodall, W.H., & Montgomery, D.C. (1999). Research Issues and Ideas in Statistical Process Control. Journal of Quality Technology, 31(4), 376-386.
Yanez, S., Gonzalez, N., & Vargas, J.A. (2010). Hotelling's T2 Control Charts Based on Robust Estimators. Dyna, 163, 239-247.
Zamba, K.D., & Hawkins, D.M. (2006). A Multivariate Change-Point Model for Statistical Process Control. Technometrics, 48(4), 539-549.
Zarate, P.B. (2004). Design of Nonparametric Control Chart for Monitoring Multivariate Processes Using Principal Components Analysis and Data Depth. Dissertation, University of South Florida.
Zuo, Y. (2003). Projection-Based Depth Functions and Associated Medians. The Annals of Statistics, 31(5), 1460-1490.
Zuo, Y., & He, X. (2006). On the Limiting Distributions of Multivariate Depth-Based Rank Sum Statistics and Related Tests. The Annals of Statistics, 34(6), 2879-2896.
Zuo, Y., & Serfling, R. (2000). General Notions of Statistical Depth Functions. The Annals of Statistics, 28(2), 461-482.
Appendices

Appendix A: MATLAB Code for Computing Robust Mahalanobis Depth
Appendix B: MATLAB Code for Computing Mahalanobis Spatial Depth
Appendix C: Expanded Table of Empirical UCLs for the MMR Chart
Appendix D: MATLAB Code for Finding Empirical UCLs for the MMR Chart
Appendix E: Empirical UCLs for Hotelling's T2 Chart
Appendix F: MATLAB Code for Finding Empirical UCLs for Hotelling's T2 Chart
Appendix G: MATLAB Code for Assessing MMR Chart Performance
Appendix H: MATLAB Code for Assessing Hotelling's T2 Chart Performance
Appendix I: Simulation Results Using In-Control Symmetric Data
Appendix J: Simulation Results Using Symmetric Data with an IS in p = 2
Appendix K: Simulation Results Using Symmetric Data with an IS in p = 5
Appendix L: Simulation Results Using Symmetric Data with an IS in p = 10
Appendix M: Simulation Results Using Symmetric Data with a 5% SS in p = 2
Appendix N: Simulation Results Using Symmetric Data with a 15% SS in p = 2
Appendix O: Simulation Results Using Symmetric Data with a 30% SS in p = 2
Appendix P: Simulation Results Using Symmetric Data with a 5% SS in p = 10
Appendix Q: Simulation Results Using Symmetric Data with a 15% SS in p = 10
Appendix R: Simulation Results Using Symmetric Data with a 30% SS in p = 10
Appendix S: Simulation Results Using In-Control Skewed Data
Appendix T: Simulation Results Using Skewed Data with an IS in p = 2
Appendix U: Simulation Results Using Skewed Data with an IS in p = 5
Appendix V: Simulation Results Using Skewed Data with a 5% SS in p = 2
Appendix W: Simulation Results Using Skewed Data with a 15% SS in p = 2
Appendix X: Simulation Results Using Skewed Data with a 30% SS in p = 2
Appendix Y: Simulation Results Using Skewed Data with a SS in p = 5
Appendix Z: Subgroup Size Analysis Using In-Control Data
Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5
Appendix BB: Subgroup Size Analysis Using Data with a 15% SS in p = 5

Appendix A: MATLAB Code for Computing Robust Mahalanobis Depth

function depth=computeRMDv1(X,Xbar_robust,S_robust)
% Computes the Robust Mahalanobis Depth (RMD) of each point in a multivariate data set.
% Adapted by Richard Bell on 20100928 from code provided by Satyaki Mazumder on 20100707.
% X is the multivariate reference data set.
% Xbar_robust is the robust location estimate.
% S_robust is the robust scatter estimate.
% Version 2 uses the square root in the Mahalanobis distance computation, whereas Version 1 does not.
rows=length(X(:,1)); % identify the number of rows in the sample data set
depth=zeros(rows,1); % initialize the (rows x 1) vector of depth values for speed
for i=1:rows
    depth(i)=1/(1+((X(i,:)-Xbar_robust)/S_robust*(X(i,:)-Xbar_robust)')); % compute the RMD for each observation in the sample; don't use the "mahal" function in MATLAB because it uses the (nonrobust) sample mean vector and covariance matrix
end

Appendix B: MATLAB Code for Computing Mahalanobis Spatial Depth

function depth=computeMSDfast(X,S_robust)
% Computes the Mahalanobis Spatial Depth of each point in a multivariate data set.
% Adapted by Richard Bell on 20100928 from code provided by Satyaki Mazumder on 20100707.
% X is the (N x p) multivariate reference data set.
% S_robust is the (p x p) robust scatter matrix, raised to the -1/2 power and used as the transformation-retransformation functional.
Xtr=X/(sqrtm(S_robust)); % transform the data using the TR functional [rows,cols]=size(Xtr); % store the dimensions of the transformed data set depth=zeros(rows,1); % initialize the vector of depth values for speed % implementation of the Mahalanobis Spatial Depth function for i=1:rows % perform the outer loop for each x e=zeros(rows,cols); % initialize the matrix of unit vectors from x to all Xi's in the sample for j=1:rows % perform the inner loop to compare each x to all Xi's in the sample (including itself) Euclid=norm(Xtr(i,:)-Xtr(j,:)); % compute the Euclidean distance between the current x and all Xi's if (Euclid~=0) e(j,:)=(Xtr(i,:)-Xtr(j,:))/Euclid; % if the Euclidean distance is nonzero, use it to normalize the distance between the current x and all other Xi's in the sample else e(j,:)=0; % if the Euclidean distance is zero, x is being compared to itself so the normalized distance is zero end end % end of inner loop depth(i)=1-norm(mean(e)); % compute Mahalanobis Spatial Depth of the point x as one minus the average of the unit vectors from x to all Xi's in the sample end % end of outer loop 127 Appendix C: Expanded Table of Empirical UCLs for the MMR Chart UCL S i m u l at e d F A P UCL S i m u l at e d F A P 20 5 2.476 0.094 1 2.650 0.048 6 20 10 2.519 0.098 4 2.737 0.047 6 30 5 2.581 0.096 4 2.749 0.047 1 30 10 2.642 0.097 5 2.849 0.047 7 40 5 2.650 0.098 4 2.815 0.048 8 40 10 2.724 0.098 2 2.925 0.048 4 50 5 2.702 0.098 3 2.861 0.048 7 50 10 2.787 0.098 1 2.980 0.048 0 60 5 2.743 0.097 4 2.895 0.048 5 60 10 2.840 0.098 3 3.030 0.048 6 70 5 2.776 0.097 2 2.924 0.047 2 70 10 2.881 0.098 2 3.065 0.048 5 80 5 2.810 0.096 7 2.949 0.048 7 80 10 2.917 0.098 2 3.100 0.048 7 90 5 2.831 0.096 1 2.969 0.048 5 90 10 2.946 0.098 3 3.127 0.048 9 100 5 2.854 0.098 2 2.992 0.048 3 100 10 2.972 0.097 4 3.150 0.048 8 110 5 2.872 0.098 0 3.008 0.048 2 110 10 2.998 0.096 7 3.176 0.048 8 120 5 2.890 0.097 4 3.019 0.048 9 120 10 3.022 0.098 0 3.198 0.047 8 130 5 2.904 0.097 5 3.038 0.047 9 130 10 3.042 0.098 4 3.214 0.048 0 140 5 2.919 0.098 4 3.048 0.048 6 140 10 3.060 0.096 9 3.226 0.048 9 150 5 2.932 0.098 3 3.057 0.048 6 150 10 3.076 0.097 1 3.244 0.048 6 160 5 2.945 0.098 0 3.067 0.048 8 160 10 3.088 0.098 4 3.262 0.048 4 170 5 2.953 0.098 3 3.082 0.047 7 170 10 3.104 0.097 6 3.274 0.048 8 180 5 2.964 0.098 2 3.089 0.048 8 180 10 3.119 0.097 7 3.285 0.048 5 190 5 2.977 0.096 1 3.098 0.048 3 190 10 3.134 0.098 3 3.300 0.048 6 200 5 2.985 0.098 1 3.104 0.048 5 200 10 3.144 0.098 5 3.310 0.048 2 m n D e s i r e d F A P = 0.10 D e s i r e d F A P = 0.05 128 Appendix D: MATLAB Code for Finding Empirical UCLs for the MMR Chart %=========================================================================% % FINDING EMPIRICAL CONTROL LIMITS FOR THE MMR CHART % %=========================================================================% % -Created by Richard Bell on 3/1/2011; last updated on 3/22/2011. % % -Variables named for robust Mahalanobis depth (RMD) are used here, % % although this file is not reliant on any particular depth measure. % %=========================================================================% %>>>>> INSTRUCTIONS: Start with 10k iterations to get a ballpark estimate, then fine-tune with 100k iterations. 
clear all % clear all objects in the MATLAB workspace clc % clear the output screen %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % AUTOMATED INPUTS (for simulating multiple scenarios using an input file) % read in m, n, UCL, shift size, and p from an Excel file iterations=100000; % number of simulation iterations to be performed input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:C50'); inputRows=length(input(:,1)); % determine the number of rows of data in the input file APtable=zeros(inputRows,3); % initialize the array of estimated alarm probability (AP) values for speed for row=1:inputRows % perform the simulation below for each m, n, p, UCL, and shift size combination in the input file m=input(row,1); % read in the desired value for sample size (m) n=input(row,2); % read in the desired value for subgroup size (n) UCL=input(row,3); % read in the upper control limit N=m*n; % determine the pooled sample size (=m in the case of individual observations) AP=1; % initialize the AP to 1 so at least one repetition of the UCL search will be performed reps=0; % initialize the counter for the number of repetitions required to find the optimal UCL %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%% GENERATE DATA AND COMPUTE ROBUST ESTIMATES %%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% while AP > 0.0985 % set the threshold AP based on the lower limit of an upper 95% CI for a proportion UCL=UCL+0.001; % set the desired increment for each iteration of the UCL search; use 0.10 first, then 0.01 and 0.001 to refine reps=reps+1; % count the number of repetitions required to find the optimal UCL 129 count=0; % initialize the counter for the number of iterations performed alarmCount=0; % initialize the alarm counter while count < iterations % run the entire loop for a set number of iterations %=====> SIMULATE UNIFORM(0,1) NUMBERS REPRESENTING DEPTH VALUES FROM 0 TO 1 X=unifrnd(0,1,[N,1]); %=====> PARTITION DATA INTO SUBGROUPS % assign a subgroup identifier to each simulated data point i=1; % start with the first observation in the data set assigned=0; % initialize the total number of observations which have been assigned subgroups ID=1; % initialize the subgroup identifier for the first subgroup subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers for speed while assigned <= N-n % perform loop until all observations in the data set have been assigned subgroup identifiers size=0; % initialize the number of observations contained in each subgroup while size < n % perform loop until each subgroup reaches size n subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation size=size+1; % increment the number of observations in the current subgroup i=i+1; % move to the next observation end ID=ID+1; % increment the subgroup identifier assigned=assigned+n; % increment the total number of observations which have been assigned subgroups end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%% RANK DATA AND COMPUTE SUBGROUP MEAN RANKS %%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % rank each uniform random number generated RMDrank=tiedrank(X); % use the midrank method in the event of a tie; MATLAB default is to rank from smallest (rank=1) to largest (rank=N) % 
compute subgroup mean ranks subgroup(N+1)=0; % create a fictitious subgroup identifier for the nonexistent (N+1)st rank so the following while loop doesn't cause an error at the Nth rank in the data set RMDtotal=0; % initialize the total RMD rank for the first subgroup to 0 i=1; % initialize the index for the N x 1 vector of ranks resulting from the depth function k=1; % initialize the index for the m x 1 vector of subgroup mean ranks to be computed alarm=0; % initialize the number of RMD alarms to 0 130 RMDsubgrpAvg=zeros(m,1); % initialize the m x 1 vector of RMD subgroup mean ranks for speed while i <= N % perform loop for all N ranks resulting from application of the depth function j=i; % initialize the rank identifier to point to the first observation in each subgroup RMDtotal=RMDrank(j); % initialize the total RMD rank for each subgroup to be the first rank in the subgroup while subgroup(j)==subgroup(j+1) % perform loop until the subgroup identifier changes RMDtotal=RMDtotal+RMDrank(j+1); % add the next RMD rank in the current subgroup to the total j=j+1; % increment the rank identifier by 1 end RMDsubgrpAvg(k)=RMDtotal/n; % compute the average subgroup RMD rank for the current subgroup k=k+1; % increment the index for the vector of subgroup mean ranks i=i+n; % count the number of ranks for which subgroup averages have been computed in order to regulate the while loop end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % COMPARE STANDARDIZED SUBGROUP MEAN RANKS TO CONTROL LIMITS %% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % compute the theoretical mean and variance of subgroup mean ranks ExpRbar=(N+1)/2; % compute the expected value of the subgroup mean rank VarRbar=((N-n)*(N+1))/(12*n); % compute the variance of the subgroup mean rank Z_RMD=zeros(m,1); % initialize the m x 1 vector of standardized subgroup RMD mean ranks % standardize subgroup mean ranks resulting from the RMD function and compare to the UCL for i = 1:m % perform loop for all m subgroup mean ranks if alarm==0 % continue loop as long as no alarms occur Z_RMD(i)=(RMDsubgrpAvg(i)-ExpRbar)/sqrt(VarRbar); % standardize each subgroup mean rank if Z_RMD(i)>UCL % compare each standardized subgroup mean rank statistic to the UCL alarm=1; % signal if a standardized subgroup mean rank falls above the UCL end end end if alarm==1 alarmCount=alarmCount+1; % if a control chart issues an alarm, increment the counter representing total alarms for all iterations end count=count+1; % increment counter for total number of iterations performed end 131 AP=alarmCount/iterations; % estimate the alarm probability (AP) for the current scenario APtable(row,1)=reps; % record the results of each UCL evaluation in a table APtable(row,2)=UCL; APtable(row,3)=AP; disp(APtable); % display AP for the current scenario end % send the results to an Excel file xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',APtable,'Sheet1','A1'); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 132 Appendix E: Empirical UCLs for Hotelling's T2 Chart P r oc e s s D i s t r i b u t i on UCL S i m u l at e d F A P 20 5 11.51 0.097 0 50 5 14.01 0.096 3 100 5 15.88 0.097 6 150 5 17.06 0.097 1 200 5 17.94 0.097 7 20 5 15.66 0.095 4 50 5 27.47 0.097 3 100 5 43.79 0.096 7 150 5 58.02 0.096 5 200 5 70.40 0.097 6 20 5 25.79 0.096 7 50 5 42.34 0.096 7 100 5 68.76 0.095 8 150 5 92.55 0.096 7 200 5 114 .51 0.096 8 100 5 68.76 0.095 8 100 10 61.43 0.098 0 100 15 57.42 0.096 3 100 20 54.69 0.097 0 20 5 42.41 
0.097 0 50 5 59.65 0.097 6 100 5 91.52 0.097 1 150 5 123 .51 0.097 0 200 5 154 .03 0.095 9 20 5 19.05 0.097 7 50 5 33.05 0.097 1 100 5 50.32 0.096 4 150 5 64.26 0.097 5 200 5 75.56 0.096 3 20 5 29.73 0.097 8 50 5 45.79 0.097 6 100 5 68.01 0.097 5 150 5 86.07 0.097 3 200 5 102 .01 0.097 7 100 5 68.01 0.097 5 100 10 56.94 0.097 6 100 15 51.12 0.097 4 100 20 47.25 0.097 2 10 2 5 5 5t ( 3) log n or m a l D e s i r e d F A P = 0.10 t ( 3) t ( 3) t ( 3) t ( 10) log n or m a l log n or m a l m np 2 2 5 133 Appendix F: MATLAB Code for Finding Empirical UCLs for Hotelling's T2 Chart %=========================================================================% % FINDING EMPIRICAL CONTROL LIMITS FOR HOTELLING'S T^2 CONTROL CHART % %=========================================================================% % -Created by Richard Bell on 9/15/2010; last updated on 4/26/2011. % % -Based on Hotelling's T2 control chart with Alt's (1976) Phase I UCL % % adjusted for the number of subgroups. % % -File is set up to run multiple scenarios; before using, undesired % % sections must be commented out using "%". % %=========================================================================% %>>>>> INSTRUCTIONS: Start with 10k iterations to get a ballpark estimate, then fine-tune with 50k iterations. clear all % clear all objects in the MATLAB workspace clc % clear the output screen %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % AUTOMATED INPUTS (for simulating multiple scenarios using an input file) % read in m, n, UCL, shift size, and p from an Excel file iterations=50000; % number of simulation iterations to be performed input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:E50'); inputRows=length(input(:,1)); % determine the number of rows of data in the input file APtable=zeros(inputRows,3); % initialize the array of estimated alarm probability (AP) values for speed for row=1:inputRows % perform the simulation below for each m, n, p, UCL, and shift size combination in the input file m=input(row,1); % read in the desired value for sample size (m) n=input(row,2); % read in the desired value for subgroup size (n) UCL=input(row,3); % read in the upper control limit shiftSize=input(row,4); % read in the desired shift size p=input(row,5); % read in the number of variables N=m*n; % determine the pooled sample size (=m in the case of individual observations) AP=1; % initialize the AP to 1 so at least one repetition of the UCL search will be performed reps=0; % initialize the counter for the number of repetitions required to find the optimal UCL %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%% GENERATE DATA AND CONSTRUCT HOTELLING'S T2 CHART %%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% while AP > 0.0978 % set the threshold AP based on the lower limit of an upper 95% CI for a proportion 134 UCL=UCL+0.01; % set the desired increment for each iteration of the UCL search; use 1.0 first, then 0.25 and 0.01 to refine reps=reps+1; % count number of repetitions required to find the optimal UCL count=0; % initialize the counter for the number of iterations performed alarmCount=0; % initialize the alarm counter while count < iterations % run the entire loop for a set number of iterations %=====> SIMULATE MULTIVARIATE NORMAL AND MULTIVARIATE T DATA (ELLIPTICAL) % 
OPTION 1: Simulate in-control data. % multivariate normal distribution alpha=.10; % desired overall false alarm probability (FAP) for the chart alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's Phase I upper control limit for Hotelling's T2 chart mu=zeros(1,p); % set the mean vector to all zeros sigma=eye(p); % set the covariance matrix equal to the identity matrix X=mvnrnd(mu,sigma,N); % generate multivariate normal data % multivariate t distribution df=3; % degrees of freedom for multivariate t distribution sigma=eye(p); % set the covariance matrix equal to the identity matrix X=mvtrnd(sigma,df,N); % generate multivariate t data with specified degrees of freedom % OPTION 2: Simulate out-of-control data with isolated or sustained shifts of the mean. % multivariate normal -- isolated shift of the mean during the first subgroup only alpha=.10; % desired overall false alarm probability (FAP) for the chart alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's Phase I upper control limit for Hotelling's T2 chart mu=zeros(1,p); % set the mean vector to all zeros sigma=eye(p); % set the covariance matrix equal to the identity matrix shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvnrnd(mu+shift,sigma,n); % generate the shifted subgroup Xb=mvnrnd(mu,sigma,N-n); % generate the rest of the (unshifted) sample X=vertcat(Xa,Xb); % combine shifted and unshifted data % multivariate t -- isolated shift of the mean during the first subgroup only df=3; % degrees of freedom for multivariate t distribution sigma=eye(p); % set the covariance matrix equal to the identity matrix shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector 135 Xa=mvtrnd(sigma,df,n)+repmat(shift,n,1); % generate the first subgroup and add the shift Xb=mvtrnd(sigma,df,N-n); % generate the rest of the (unshifted) sample X=vertcat(Xa,Xb); % combine shifted and unshifted data % multivariate normal -- sustained shift of the mean during the last "percentOC" % of the sample (irrespective of subgroups) alpha=.10; % desired overall false alarm probability (FAP) for the chart alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's Phase I upper control limit for Hotelling's T2 chart percentOC=0.15; % designate the percentage of out-of-control points mu=zeros(1,p); % set the mean vector to all zeros sigma=eye(p); % set the covariance matrix equal to the identity matrix numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvnrnd(mu,sigma,N-numberOC); % generate the in-control points Xb=mvnrnd(mu+shift,sigma,numberOC); % generate the out-of-control points X=vertcat(Xa,Xb); % combine shifted and unshifted data % multivariate t -- sustained shift of the mean during the last "percentOC" % of the sample (irrespective of subgroups) percentOC=0.15; % designate the percentage of out-of-control points df=3; % degrees of freedom for multivariate t distribution sigma=eye(p); % set the covariance 
matrix equal to the identity matrix numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvtrnd(sigma,df,N-numberOC); % generate the in-control points Xb=mvtrnd(sigma,df,numberOC)+repmat(shift,numberOC,1); % generate the out- of-control points X=vertcat(Xa,Xb); % combine shifted and unshifted data %=====> SIMULATE MULTIVARIATE LOGNORMAL DATA (SKEWED) % STEP 1: Simulate uniformly distributed vector of shift directions using algorithm by Johnson (1987), page 127. StdNorm=zeros(1,p); % initialize vector of standard normal random numbers Unif=zeros(1,p); % initialize vector of shift directions for i = 1:p StdNorm(1,i)=normrnd(0,1); % generate p independent standard normal variates end for i = 1:p Unif(1,i)=StdNorm(1,i)/sqrt(sum(StdNorm.^2)); % create vector of shift directions IAW Johnson (1987), page 127 end 136 % STEP 2: Simulate the sample data set and standardize. mu_Y=zeros(1,p); % create a mean vector of all zeros sigma_Y=eye(p); % set the covariance matrix equal to the identity matrix Y=mvnrnd(mu_Y,sigma_Y,N); % simulate N multivariate normal observations X=exp(Y); % transform multivariate normal observations to multivariate lognormal observations % NOTE: THE FOLLOWING RESULTS ONLY APPLY TO MULTIVARIATE LOGNORMAL DATA CREATED USING MULTIVARIATE NORMAL DATA WITH ZERO MEAN VECTOR AND IDENTITY COVARIANCE MATRIX! ExpX=exp(1/2); % compute theoretical expected value of X sigma_X=zeros(p,p); % initialize covariance matrix to all zeros for i=1:p % fill in diagonals of covariance matrix for j=1:p if i==j sigma_X(i,j)=exp(1)*(exp(1)-1); % from Law and Kelton (2000), page 382 end end end X=(X-ExpX)/sqrtm(sigma_X); % standardize multivariate lognormal observations to have zero mean vector and identity covariance matrix % STEP 3: Scale the vector of shift directions to achieve a specified noncentrality parameter. sigma_X=eye(p); % specify theoretical covariance matrix of standardized data Unif=shiftSize*Unif; % scale the directional shift vector NCP=sqrt(Unif/sigma_X*Unif'); % check the noncentrality parameter to ensure it equals the desired value % STEP 4: Induce isolated or sustained shifts of the mean. 
% isolated shift of the mean during the first subgroup only Xa=X(1:n,:)+repmat(Unif,n,1); % replicate the shift vector n times and add it to the first subgroup Xb=X(n+1:N,:); % identify the remaining (unshifted) observations in the data set X=vertcat(Xa,Xb); % combine shifted and unshifted data % sustained shift of the mean during the last "percentOC" % of the sample (irrespective of subgroups) percentOC=0.15; % designate the percentage of out-of-control points numberOC=round(percentOC*N); % determine the number of in-control points, rounded to the nearest integer Xa=X(1:(N-numberOC),:); % identify unshifted observations in the data set Xb=X(N-numberOC+1:N,:)+repmat(Unif,numberOC,1); % replicate the shift vector and add it to the remaining observations X=vertcat(Xa,Xb); % combine shifted and unshifted data 137 %=====> PARTITION DATA INTO SUBGROUPS % assign a subgroup identifier to each simulated data point i=1; % start with the first observation in the data set assigned=0; % initialize the total number of observations which have been assigned subgroups ID=1; % initialize the subgroup identifier for the first subgroup subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers for speed while assigned <= N-n % perform loop until all observations in the data set have been assigned subgroup identifiers size=0; % initialize the number of observations contained in each subgroup while size < n % perform loop until each subgroup reaches size n subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation size=size+1; % increment the number of observations in the current subgroup i=i+1; % move to the next observation end ID=ID+1; % increment the subgroup identifier assigned=assigned+n; % increment the total number of observations which have been assigned subgroups end %=====> COMPUTE ROBUST ESTIMATES OF LOCATION AND SCATTER subgroupMeans=zeros(m,p); % initialize the matrix of individual subgroup mean vectors totalMeans=zeros(1,p); % initialize the total of all subgroup mean vectors totalCovs=zeros(p,p); % initialize the total of all subgroup covariance matrices subgroup(N+1)=0; % create a fictitious subgroup for the nonexistent (N+1)st observation so the following while loop doesn't cause an error at the Nth observation i=1; % initialize the index for the N x p vector of observations while i <= N % perform loop for all N observations currentSubgroup=X(i,:); % start with first observation in the data set j=i; % initialize the subgroup index to point to the first observation in each subgroup while subgroup(j)==subgroup(j+1) % perform loop until the subgroup identifier changes (this is where the fake subgroup is needed) currentSubgroup=cat(1,currentSubgroup,X(j+1,:)); % combine individual observations into their respective subgroups j=j+1; % increment the subgroup index by 1 end subgroupMeans(j/n,:)=mean(currentSubgroup); % store individual subgroup means in a vector totalMeans=totalMeans+subgroupMeans(j/n,:); % keep a running total of all subgroup mean vectors totalCovs=totalCovs+cov(currentSubgroup); % keep a running total of all subgroup covariance matrices i=i+n; % count the number of observations for which subgroup averages have been computed in order to regulate the while loop end 138 Xbar_robust=totalMeans/m; % compute average of subgroup means; serves as unbaised estimate of mean vector S_robust=totalCovs/m; % compute average of subgroup variances; serves as unbiased estimate of covariance matrix %=====> COMPUTE HOTELLING'S T2 STATISTICS AND COMPARE TO UCL alarm=0; % 
initialize indicator variable representing an alarm (=1) or no alarm (=0) T2vector=zeros(m,1); % initialize vector of T2 statistics for i=1:m if alarm==0 % continue loop as long as no false alarms occur T2stat=n*(subgroupMeans(i,:)-Xbar_robust)/S_robust*(subgroupMeans(i,:)- Xbar_robust)'; % compute T2 control statistic T2vector(i)=T2stat; % store T2 control statistics in a vector if T2stat > UCL alarm=1; % issue a false alarm if the T2 control statistic exceeds the UCL end end end if alarm==1 alarmCount=alarmCount+1; % if a control chart issues a false alarm, increment the counter representing total false alarms for all iterations end count=count+1; % increment the counter for the total number of iterations performed end AP=alarmCount/iterations; % estimate the alarm probability (AP) for the current scenario APtable(row,1)=reps; % record the results of each UCL evaluation in a table APtable(row,2)=UCL; APtable(row,3)=AP; disp(APtable); % display AP table for Hotelling's T2 chart on screen, if desired end % send the results to an Excel file xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',APtable,'Sheet1','A1'); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 139 Appendix G: MATLAB Code for Assessing MMR Chart Performance %=========================================================================% % MULTIVARIATE MEAN-RANK (MMR) CONTROL CHART PROGRAM FILE % %=========================================================================% % -Created by Richard Bell on 9/18/2010; last updated on 3/1/2011. % % -Can be modified to find empirical APs for specified scenarios, % % determine empirical UCLs for specific distributions, or construct % % control charts for preliminary data sets. % % -File is set up to run multiple scenarios; before using, undesired % % sections must be commented out using "%". 
% %=========================================================================% clear all % clear all objects in the MATLAB workspace clc % clear the output screen %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % AUTOMATED INPUTS (for simulating multiple scenarios using an input file) % read in m, n, control limits, shift size, and p from an Excel file iterations=10000; % number of simulation iterations to be performed input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:E50'); inputRows=length(input(:,1)); % determine the number of rows of data in the input file RMD_APtable=zeros(inputRows,1); % initialize the array of estimated alarm probability (AP) values for the MMR chart using RMD MSD_APtable=zeros(inputRows,1); % initialize the array of estimated alarm probability (AP) values for the MMR chart using MSD for row=1:inputRows % perform the simulation below for each m, n, UCL, shift size, and p combination in the input file m=input(row,1); % read in the desired value for sample size (m) n=input(row,2); % read in the desired value for subgroup size (n) UCL=input(row,3); % read in the upper control limit (UCL) corresponding to the m,n combination shiftSize=input(row,4); % read in the size of the desired shift p=input(row,5); % read in the number of variables N=m*n; % determine the pooled sample size (=m in the case of individual observations) count=0; % initialize the counter for the number of iterations performed RMDalarmCount=0; % initialize the alarm counter for the RMD function MSDalarmCount=0; % initialize the alarm counter for the MSD function %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%% GENERATE DATA AND COMPUTE ROBUST ESTIMATES %%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% while count < iterations % run the entire loop for a set number of iterations 140 %=====> SIMULATE MULTIVARIATE NORMAL AND MULTIVARIATE T DATA (ELLIPTICAL) % OPTION 1: Simulate in-control data. % multivariate normal distribution mu=zeros(1,p); % set the mean vector to all zeros sigma=eye(p); % set the covariance matrix equal to the identity matrix X=mvnrnd(mu,sigma,N); % generate multivariate normal data % multivariate t distribution df=3; % degrees of freedom for multivariate t distribution sigma=eye(p); % set the covariance matrix equal to the identity matrix X=mvtrnd(sigma,df,N); % generate multivariate t data with specified degrees of freedom % OPTION 2: Simulate out-of-control data with isolated or sustained shifts of the mean. 
% multivariate normal -- isolated shift of the mean during the first subgroup only mu=zeros(1,p); % set the mean vector to all zeros sigma=eye(p); % set the covariance matrix equal to the identity matrix shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvnrnd(mu+shift,sigma,n); % generate the shifted subgroup Xb=mvnrnd(mu,sigma,N-n); % generate the rest of the (unshifted) sample X=vertcat(Xa,Xb); % combine shifted and unshifted data % multivariate t -- isolated shift of the mean during the first subgroup only df=3; % degrees of freedom for multivariate t distribution sigma=eye(p); % set the covariance matrix equal to the identity matrix shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvtrnd(sigma,df,n)+repmat(shift,n,1); % generate the first subgroup and add the shift Xb=mvtrnd(sigma,df,N-n); % generate the rest of the (unshifted) sample X=vertcat(Xa,Xb); % combine shifted and unshifted data % multivariate normal -- sustained shift of the mean during the last "percentOC" % of the sample (irrespective of subgroups) percentOC=0.15; % designate the percentage of out-of-control points mu=zeros(1,p); % set the mean vector to all zeros sigma=eye(p); % set the covariance matrix equal to the identity matrix numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvnrnd(mu,sigma,N-numberOC); % generate the in-control points Xb=mvnrnd(mu+shift,sigma,numberOC); % generate the out-of-control points X=vertcat(Xa,Xb); % combine shifted and unshifted data 141 % multivariate t -- sustained shift of the mean during the last "percentOC" % of the sample (irrespective of subgroups) percentOC=0.15; % designate the percentage of out-of-control points df=3; % degrees of freedom for multivariate t distribution sigma=eye(p); % set the covariance matrix equal to the identity matrix numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer shift=zeros(1,p); % initialize the shift vector shift(1)=shiftSize; % place the desired shift in the first position of the shift vector Xa=mvtrnd(sigma,df,N-numberOC); % generate the in-control points Xb=mvtrnd(sigma,df,numberOC)+repmat(shift,numberOC,1); % generate the out- of-control points X=vertcat(Xa,Xb); % combine shifted and unshifted data %=====> SIMULATE MULTIVARIATE LOGNORMAL DATA (SKEWED) % STEP 1: Simulate uniformly distributed vector of shift directions using algorithm by Johnson (1987), page 127. StdNorm=zeros(1,p); % initialize vector of standard normal random numbers Unif=zeros(1,p); % initialize vector of shift directions for i = 1:p StdNorm(1,i)=normrnd(0,1); % generate p independent standard normal variates end for i = 1:p Unif(1,i)=StdNorm(1,i)/sqrt(sum(StdNorm.^2)); % create vector of shift directions IAW Johnson (1987), page 127 end % STEP 2: Simulate the sample data set and standardize. 
mu_Y=zeros(1,p); % create a mean vector of all zeros
sigma_Y=eye(p); % set the covariance matrix equal to the identity matrix
Y=mvnrnd(mu_Y,sigma_Y,N); % simulate N multivariate normal observations
X=exp(Y); % transform multivariate normal observations to multivariate lognormal observations
% NOTE: THE FOLLOWING RESULTS ONLY APPLY TO MULTIVARIATE LOGNORMAL DATA CREATED USING MULTIVARIATE NORMAL DATA WITH ZERO MEAN VECTOR AND IDENTITY COVARIANCE MATRIX!
ExpX=exp(1/2); % compute theoretical expected value of X
VarX=exp(1)*(exp(1)-1); % compute theoretical variance of X
X=(X-ExpX)/sqrt(VarX); % standardize multivariate lognormal observations to have zero mean vector and identity covariance matrix (valid here because the components are independent)
% STEP 3: Scale the vector of shift directions to achieve a specified noncentrality parameter.
sigma_X=eye(p); % specify the theoretical covariance matrix of the standardized data
Unif=shiftSize*Unif; % scale the directional shift vector
NCP=sqrt(Unif/sigma_X*Unif'); % check the noncentrality parameter to ensure it equals the desired value
% STEP 4: Induce isolated or sustained shifts of the mean.
% isolated shift of the mean during the first subgroup only
Xa=X(1:n,:)+repmat(Unif,n,1); % replicate the shift vector n times and add it to the first subgroup
Xb=X(n+1:N,:); % identify the remaining (unshifted) observations in the data set
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% sustained shift of the mean during the last "percentOC" of the sample (irrespective of subgroups)
percentOC=0.15; % designate the percentage of out-of-control points
numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer
Xa=X(1:(N-numberOC),:); % identify the unshifted observations in the data set
Xb=X(N-numberOC+1:N,:)+repmat(Unif,numberOC,1); % replicate the shift vector and add it to the remaining observations
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> PARTITION DATA INTO SUBGROUPS
% assign a subgroup identifier to each simulated data point
i=1; % start with the first observation in the data set
assigned=0; % initialize the total number of observations which have been assigned subgroups
ID=1; % initialize the subgroup identifier for the first subgroup
subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers for speed
while assigned <= N-n % perform loop until all observations in the data set have been assigned subgroup identifiers
size=0; % initialize the number of observations contained in each subgroup
while size < n % perform loop until each subgroup reaches size n
subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation
size=size+1; % increment the number of observations in the current subgroup
i=i+1; % move to the next observation
end
ID=ID+1; % increment the subgroup identifier
assigned=assigned+n; % increment the total number of observations which have been assigned subgroups
end
%=====> COMPUTE ROBUST ESTIMATES USING HOTELLING'S T^2 OR BACON METHODS
% OPTION 1: Hotelling's T^2 method
totalMeans=zeros(1,p); % initialize the total of all subgroup mean vectors
totalCovs=zeros(p,p); % initialize the total of all subgroup covariance matrices
subgroup(N+1)=0; % create a fictitious subgroup for the nonexistent (N+1)st observation so the following while loop doesn't cause an error at the Nth observation
i=1; % initialize the index for the N x p matrix of observations
while i <= N % perform loop for all N observations
currentSubgroup=X(i,:); % start with the first observation in the data set
j=i; % initialize the observation index to point to the first observation in each subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup identifier changes
currentSubgroup=cat(1,currentSubgroup,X(j+1,:)); % combine individual observations into their respective subgroups
j=j+1; % increment the observation index by 1
end
totalMeans=totalMeans+mean(currentSubgroup); % keep a running total of all subgroup mean vectors
totalCovs=totalCovs+cov(currentSubgroup); % keep a running total of all subgroup covariance matrices
i=i+n; % count the number of observations for which subgroup averages have been computed in order to regulate the while loop
end
Xbar_robust=totalMeans/m; % compute the average of the subgroup means; serves as an unbiased estimate of the mean vector
S_robust=totalCovs/m; % compute the average of the subgroup covariance matrices; serves as an unbiased estimate of the covariance matrix
% OPTION 2: BACON method for estimating mean vector and covariance matrix
out=baconV(X,1,.10,4); % compute BACON estimates for location and scatter using Mahalanobis distance, alpha=0.10, and c=4; use version 2 (Euclidean distance) if expected contamination exceeds 20 percent
Xbar_robust=out.center3; % BACON estimate for mean vector
S_robust=out.cov3; % BACON estimate for covariance matrix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% RANK DATA USING DATA DEPTH %%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% NOTE: The following code simultaneously applies both the robust Mahalanobis depth (RMD) and Mahalanobis spatial depth (MSD) functions to the same data set.
[RMD]=computeRMDv1(X,Xbar_robust,S_robust); % compute the robust Mahalanobis depth of each point in the sample
RMDrank_interim=tiedrank(RMD); % rank each depth value, using the midrank method in the event of a tie; MATLAB ranks from smallest (rank=1) to largest (rank=N)
RMDrank=N-RMDrank_interim+1; % following data depth convention, adjust the ranks to go from largest depth value (rank=1) to smallest depth value (rank=N)
[MSD]=computeMSDfast(X,S_robust); % compute the Mahalanobis spatial depth of each point in the sample
MSDrank_interim=tiedrank(MSD); % rank each depth value, using the midrank method in the event of a tie
MSDrank=N-MSDrank_interim+1; % adjust the ranks to follow the same data depth convention
% compute subgroup mean ranks
subgroup(N+1)=0; % create a fictitious subgroup identifier for the nonexistent (N+1)st rank so the following while loop doesn't cause an error at the Nth rank in the data set
RMDtotal=0; % initialize the total RMD rank for the first subgroup to 0
MSDtotal=0; % initialize the total MSD rank for the first subgroup to 0
i=1; % initialize the index for the N x 1 vector of ranks resulting from the depth function
k=1; % initialize the index for the m x 1 vector of subgroup mean ranks to be computed
RMDalarm=0; % initialize the RMD alarm indicator to 0
MSDalarm=0; % initialize the MSD alarm indicator to 0
RMDsubgrpAvg=zeros(m,1); % initialize the m x 1 vector of RMD subgroup mean ranks for speed
MSDsubgrpAvg=zeros(m,1); % initialize the m x 1 vector of MSD subgroup mean ranks for speed
while i <= N % perform loop for all N ranks resulting from application of the depth function
j=i; % initialize the rank index to point to the first observation in each subgroup
RMDtotal=RMDrank(j); % initialize the total RMD rank for each subgroup to be the first rank in the subgroup
MSDtotal=MSDrank(j); % initialize the total MSD rank for each subgroup to be the first rank in the subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup identifier changes
RMDtotal=RMDtotal+RMDrank(j+1); % add the next RMD rank in the current subgroup to the total
MSDtotal=MSDtotal+MSDrank(j+1); % add the next MSD rank in the current subgroup to the total
j=j+1; % increment the rank index by 1
end
RMDsubgrpAvg(k)=RMDtotal/n; % compute the average subgroup RMD rank for the current subgroup
MSDsubgrpAvg(k)=MSDtotal/n; % compute the average subgroup MSD rank for the current subgroup
k=k+1; % increment the index for the vector of subgroup mean ranks
i=i+n; % count the number of ranks for which subgroup averages have been computed in order to regulate the while loop
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% COMPARE STANDARDIZED SUBGROUP MEAN RANKS TO CONTROL LIMITS %%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% compute the theoretical mean and variance of subgroup mean ranks
ExpRbar=(N+1)/2; % compute the expected value of the subgroup mean rank
VarRbar=((N-n)*(N+1))/(12*n); % compute the variance of the subgroup mean rank
Z_RMD=zeros(m,1); % initialize the m x 1 vector of standardized subgroup RMD mean ranks
Z_MSD=zeros(m,1); % initialize the m x 1 vector of standardized subgroup MSD mean ranks
% standardize subgroup mean ranks resulting from the RMD function and compare to the UCL
for i = 1:m % perform loop for all m subgroup mean ranks
if RMDalarm==0 % continue only as long as no alarm has occurred; further computation is unnecessary once an alarm occurs, because the FAP is the probability of ONE OR MORE signals when the process is in control, so the number of alarms on a single chart is irrelevant (the same reasoning applies to the EAP in out-of-control scenarios)
Z_RMD(i)=(RMDsubgrpAvg(i)-ExpRbar)/sqrt(VarRbar); % standardize each subgroup mean rank
if Z_RMD(i)>UCL % compare each standardized subgroup mean rank statistic to the UCL
RMDalarm=1; % signal if a standardized subgroup mean rank falls above the UCL
end
end
end
if RMDalarm==1
RMDalarmCount=RMDalarmCount+1; % if the chart issues an alarm, increment the counter representing total alarms for all iterations
end
% standardize subgroup mean ranks resulting from the MSD function and compare to the UCL
for i = 1:m
if MSDalarm==0
Z_MSD(i)=(MSDsubgrpAvg(i)-ExpRbar)/sqrt(VarRbar);
if Z_MSD(i)>UCL
MSDalarm=1;
end
end
end
if MSDalarm==1
MSDalarmCount=MSDalarmCount+1;
end
count=count+1; % increment the counter for the total number of iterations performed
end
% record results for both RMD and MSD methods
RMD_AP=RMDalarmCount/iterations; % estimate the RMD alarm probability (AP) for the current scenario and store it in an array
RMD_APtable(row,1)=RMD_AP;
MSD_AP=MSDalarmCount/iterations; % estimate the MSD AP for the current scenario and store it in an array
MSD_APtable(row,1)=MSD_AP;
disp('EAP Table for MMR-RMD'); disp(RMD_APtable); % display the AP table for the MMR chart using RMD on screen, if desired
disp('EAP Table for MMR-MSD'); disp(MSD_APtable); % display the AP table for the MMR chart using MSD on screen, if desired
% send the estimated APs to an Excel file (RMD results in column A, MSD results in column B)
xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',RMD_APtable,'Sheet1','A1');
xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',MSD_APtable,'Sheet1','B1');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
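The centerpiece of the MMR chart computation above is the standardization of each subgroup mean rank before it is compared to the UCL. The following is a minimal illustrative sketch, not part of the original listing; the helper name standardizeMeanRank and its interface are introduced here only for illustration, and it uses the same in-control moments of the subgroup mean rank employed in the code above.

function Z = standardizeMeanRank(Rbar, N, n)
% Standardize a subgroup mean rank Rbar for a subgroup of size n drawn from a
% pooled Phase I sample of N ranks, using the in-control moments used above:
% E[Rbar] = (N+1)/2 and Var[Rbar] = (N-n)*(N+1)/(12*n).
ExpRbar = (N+1)/2; % expected subgroup mean rank when the process is in control
VarRbar = ((N-n)*(N+1))/(12*n); % variance of the subgroup mean rank
Z = (Rbar - ExpRbar)/sqrt(VarRbar); % standardized subgroup mean rank
end

For example, with N = 100 pooled observations in subgroups of size n = 5, a subgroup mean rank of 75 gives Z = standardizeMeanRank(75, 100, 5), approximately 1.94, which would then be compared to the chart's UCL.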
Appendix H: MATLAB Code for Assessing Hotelling's T2 Chart Performance

%=========================================================================%
% HOTELLING'S T^2 CONTROL CHART PROGRAM FILE                              %
%=========================================================================%
% -Created by Richard Bell on 9/15/2010; last updated on 3/22/2011.       %
% -Based on Hotelling's T2 control chart with Alt's (1976) Phase I UCL    %
%  adjusted for the number of subgroups.                                  %
% -Can be modified to find empirical APs for specified scenarios,         %
%  determine empirical UCLs for specific distributions, or construct      %
%  control charts for preliminary data sets.                              %
% -File is set up to run multiple scenarios; before using, undesired      %
%  sections must be commented out using "%".                              %
%=========================================================================%
clear all % clear all objects in the MATLAB workspace
clc % clear the output screen
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%% INPUT SIMULATION PARAMETERS %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% AUTOMATED INPUTS (for simulating multiple scenarios using an input file)
% read in m, n, UCL, shift size, and p from an Excel file
iterations=10000; % number of simulation iterations to be performed
input=xlsread('c:\Users\Rich\Documents\InputFile.xlsx','Sheet1','A1:E50');
inputRows=length(input(:,1)); % determine the number of rows of data in the input file
APtable=zeros(inputRows,1); % initialize the array of estimated alarm probability (AP) values for speed
for row=1:inputRows % perform the simulation below for each m, n, UCL, shift size, and p combination in the input file
m=input(row,1); % read in the desired number of subgroups (m)
n=input(row,2); % read in the desired value for subgroup size (n)
UCL=input(row,3); % read in the upper control limit
shiftSize=input(row,4); % read in the desired shift size
p=input(row,5); % read in the number of variables
N=m*n; % determine the pooled sample size (=m in the case of individual observations)
count=0; % initialize the counter for the number of iterations performed
alarmCount=0; % initialize the alarm counter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% GENERATE DATA AND CONSTRUCT HOTELLING'S T2 CHART %%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
while count < iterations % run the entire loop for a set number of iterations
%=====> SIMULATE MULTIVARIATE NORMAL AND MULTIVARIATE T DATA (ELLIPTICAL)
% OPTION 1: Simulate in-control data.
% multivariate normal distribution
alpha=.10; % desired overall false alarm probability (FAP) for the chart
alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison
UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's Phase I upper control limit for Hotelling's T2 chart
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
X=mvnrnd(mu,sigma,N); % generate multivariate normal data
% multivariate t distribution
df=3; % degrees of freedom for the multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
X=mvtrnd(sigma,df,N); % generate multivariate t data with the specified degrees of freedom
% OPTION 2: Simulate out-of-control data with isolated or sustained shifts of the mean.
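% Each block below generates one out-of-control scenario (multivariate normal or
% multivariate t data; isolated or sustained shift of the mean). As noted in the
% program header, only the block for the scenario under study should be left
% uncommented for a given run.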
% multivariate normal -- isolated shift of the mean during the first subgroup only
alpha=.10; % desired overall false alarm probability (FAP) for the chart
alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison
UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's Phase I upper control limit for Hotelling's T2 chart
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the shift vector
Xa=mvnrnd(mu+shift,sigma,n); % generate the shifted subgroup
Xb=mvnrnd(mu,sigma,N-n); % generate the rest of the (unshifted) sample
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate t -- isolated shift of the mean during the first subgroup only
df=3; % degrees of freedom for the multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the shift vector
Xa=mvtrnd(sigma,df,n)+repmat(shift,n,1); % generate the first subgroup and add the shift
Xb=mvtrnd(sigma,df,N-n); % generate the rest of the (unshifted) sample
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate normal -- sustained shift of the mean during the last "percentOC" of the sample (irrespective of subgroups)
alpha=.10; % desired overall false alarm probability (FAP) for the chart
alphaAdjusted=1-(1-alpha)^(1/m); % desired FAP for each individual comparison
UCL=((p*(m-1)*(n-1))/(m*n-m-p+1))*finv(1-alphaAdjusted,p,m*n-m-p+1); % Alt's Phase I upper control limit for Hotelling's T2 chart
percentOC=0.15; % designate the percentage of out-of-control points
mu=zeros(1,p); % set the mean vector to all zeros
sigma=eye(p); % set the covariance matrix equal to the identity matrix
numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the shift vector
Xa=mvnrnd(mu,sigma,N-numberOC); % generate the in-control points
Xb=mvnrnd(mu+shift,sigma,numberOC); % generate the out-of-control points
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% multivariate t -- sustained shift of the mean during the last "percentOC" of the sample (irrespective of subgroups)
percentOC=0.30; % designate the percentage of out-of-control points
df=3; % degrees of freedom for the multivariate t distribution
sigma=eye(p); % set the covariance matrix equal to the identity matrix
numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer
shift=zeros(1,p); % initialize the shift vector
shift(1)=shiftSize; % place the desired shift in the first position of the shift vector
Xa=mvtrnd(sigma,df,N-numberOC); % generate the in-control points
Xb=mvtrnd(sigma,df,numberOC)+repmat(shift,numberOC,1); % generate the out-of-control points
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> SIMULATE MULTIVARIATE LOGNORMAL DATA (SKEWED)
% STEP 1: Simulate a uniformly distributed vector of shift directions using the algorithm of Johnson (1987), page 127.
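% The two loops below implement the direction-generation step: dividing a vector
% of p independent standard normal variates by its Euclidean norm produces a point
% that is uniformly distributed on the surface of the unit p-dimensional sphere,
% giving a random unit-length shift direction that is scaled to the desired
% noncentrality parameter in STEP 3.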
StdNorm=zeros(1,p); % initialize the vector of standard normal random numbers
Unif=zeros(1,p); % initialize the vector of shift directions
for i = 1:p
StdNorm(1,i)=normrnd(0,1); % generate p independent standard normal variates
end
for i = 1:p
Unif(1,i)=StdNorm(1,i)/sqrt(sum(StdNorm.^2)); % create the vector of shift directions in accordance with Johnson (1987), page 127
end
% STEP 2: Simulate the sample data set and standardize.
mu_Y=zeros(1,p); % create a mean vector of all zeros
sigma_Y=eye(p); % set the covariance matrix equal to the identity matrix
Y=mvnrnd(mu_Y,sigma_Y,N); % simulate N multivariate normal observations
X=exp(Y); % transform multivariate normal observations to multivariate lognormal observations
% NOTE: THE FOLLOWING RESULTS ONLY APPLY TO MULTIVARIATE LOGNORMAL DATA CREATED USING MULTIVARIATE NORMAL DATA WITH ZERO MEAN VECTOR AND IDENTITY COVARIANCE MATRIX!
ExpX=exp(1/2); % compute theoretical expected value of X
sigma_X=zeros(p,p); % initialize the covariance matrix to all zeros
for i=1:p % fill in the diagonal of the covariance matrix
for j=1:p
if i==j
sigma_X(i,j)=exp(1)*(exp(1)-1); % from Law and Kelton (2000), page 382
end
end
end
X=(X-ExpX)/sqrtm(sigma_X); % standardize multivariate lognormal observations to have zero mean vector and identity covariance matrix
% STEP 3: Scale the vector of shift directions to achieve a specified noncentrality parameter.
sigma_X=eye(p); % specify the theoretical covariance matrix of the standardized data
Unif=shiftSize*Unif; % scale the directional shift vector
NCP=sqrt(Unif/sigma_X*Unif'); % check the noncentrality parameter to ensure it equals the desired value
if abs(NCP-shiftSize)>0.00001 % display an error message if the calculated NCP does not equal the shift size (they should be equal since the theoretical covariance matrix of X is the identity)
disp('ERROR in NCP!')
end
% STEP 4: Induce isolated or sustained shifts of the mean.
% isolated shift of the mean during the first subgroup only
Xa=X(1:n,:)+repmat(Unif,n,1); % replicate the shift vector n times and add it to the first subgroup
Xb=X(n+1:N,:); % identify the remaining (unshifted) observations in the data set
X=vertcat(Xa,Xb); % combine shifted and unshifted data
% sustained shift of the mean during the last "percentOC" of the sample (irrespective of subgroups)
percentOC=0.15; % designate the percentage of out-of-control points
numberOC=round(percentOC*N); % determine the number of out-of-control points, rounded to the nearest integer
Xa=X(1:(N-numberOC),:); % identify the unshifted observations in the data set
Xb=X(N-numberOC+1:N,:)+repmat(Unif,numberOC,1); % replicate the shift vector and add it to the remaining observations
X=vertcat(Xa,Xb); % combine shifted and unshifted data
%=====> PARTITION DATA INTO SUBGROUPS
% assign a subgroup identifier to each simulated data point
i=1; % start with the first observation in the data set
assigned=0; % initialize the total number of observations which have been assigned subgroups
ID=1; % initialize the subgroup identifier for the first subgroup
subgroup=zeros(N,1); % initialize the N x 1 vector of subgroup identifiers for speed
while assigned <= N-n % perform loop until all observations in the data set have been assigned subgroup identifiers
size=0; % initialize the number of observations contained in each subgroup
while size < n % perform loop until each subgroup reaches size n
subgroup(i)=ID; % assign the subgroup identifier "ID" to an observation
size=size+1; % increment the number of observations in the current subgroup
i=i+1; % move to the next observation
end
ID=ID+1; % increment the subgroup identifier
assigned=assigned+n; % increment the total number of observations which have been assigned subgroups
end
%=====> COMPUTE ROBUST ESTIMATES OF LOCATION AND SCATTER
subgroupMeans=zeros(m,p); % initialize the matrix of individual subgroup mean vectors
totalMeans=zeros(1,p); % initialize the total of all subgroup mean vectors
totalCovs=zeros(p,p); % initialize the total of all subgroup covariance matrices
subgroup(N+1)=0; % create a fictitious subgroup for the nonexistent (N+1)st observation so the following while loop doesn't cause an error at the Nth observation
i=1; % initialize the index for the N x p matrix of observations
while i <= N % perform loop for all N observations
currentSubgroup=X(i,:); % start with the first observation in the data set
j=i; % initialize the observation index to point to the first observation in each subgroup
while subgroup(j)==subgroup(j+1) % perform loop until the subgroup identifier changes (this is where the fictitious subgroup is needed)
currentSubgroup=cat(1,currentSubgroup,X(j+1,:)); % combine individual observations into their respective subgroups
j=j+1; % increment the observation index by 1
end
subgroupMeans(j/n,:)=mean(currentSubgroup); % store the individual subgroup means
totalMeans=totalMeans+subgroupMeans(j/n,:); % keep a running total of all subgroup mean vectors
totalCovs=totalCovs+cov(currentSubgroup); % keep a running total of all subgroup covariance matrices
i=i+n; % count the number of observations for which subgroup averages have been computed in order to regulate the while loop
end
Xbar_robust=totalMeans/m; % compute the average of the subgroup means; serves as an unbiased estimate of the mean vector
S_robust=totalCovs/m; % compute the average of the subgroup covariance matrices; serves as an unbiased estimate of the covariance matrix
%=====> COMPUTE HOTELLING'S T2 STATISTICS AND COMPARE TO UCL
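% The loop below computes, for each subgroup i = 1, ..., m, the Phase I statistic
%   T2(i) = n * (xbar_i - xbarbar) * inv(S) * (xbar_i - xbarbar)'
% where xbar_i is the ith subgroup mean vector, xbarbar (Xbar_robust) is the
% average of the m subgroup means, and S (S_robust) is the average of the m
% subgroup covariance matrices; each T2(i) is then compared to Alt's Phase I UCL
% computed above.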
alarm=0; % initialize the indicator variable representing an alarm (=1) or no alarm (=0)
T2vector=zeros(m,1); % initialize the vector of T2 statistics
for i=1:m
if alarm==0 % continue only as long as no alarm has occurred
T2stat=n*(subgroupMeans(i,:)-Xbar_robust)/S_robust*(subgroupMeans(i,:)-Xbar_robust)'; % compute the T2 control statistic
T2vector(i)=T2stat; % store the T2 control statistics in a vector
if T2stat > UCL
alarm=1; % signal an alarm if the T2 control statistic exceeds the UCL
end
end
end
if alarm==1
alarmCount=alarmCount+1; % if the chart issues an alarm, increment the counter representing total alarms for all iterations
end
count=count+1; % increment the counter for the total number of iterations performed
end
AP=alarmCount/iterations; % estimate the alarm probability (AP) for the current scenario and store it in an array
APtable(row,1)=AP;
disp('AP Table for Hotellings T2 Chart'); disp(APtable); % display the AP table for Hotelling's T2 chart on screen, if desired
% send the estimated APs to an Excel file
xlswrite('c:\Users\Rich\Documents\OutputFile.xlsx',APtable,'Sheet1','A1');
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% END OF PROGRAM %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Appendix I: Simulation Results Using In-Control Symmetric Data

[Table: empirical false alarm probabilities (FAP) for the Hotelling's T2, MMR-RMD, and MMR-MSD charts under in-control multivariate normal, t(10), and t(3) data; m = 20, 50, 100, 150, 200; n = 5; p = 2, 5, 10.]

Appendix J: Simulation Results Using Symmetric Data with an IS in p = 2
= 6 20 5 0.0984 0.2523 0.8795 0.9835 0.9990 1.0000 1.0000 1.0000 1.0000 50 5 0.0979 0.2091 0.8497 0.9847 0.9991 1.0000 1.0000 1.0000 1.0000 100 5 0.1004 0.1826 0.8222 0.9770 0.9983 0.9999 1.0000 1.0000 1.0000 150 5 0.0997 0.1703 0.7876 0.9712 0.9979 1.0000 1.0000 1.0000 1.0000 200 5 0.0972 0.1610 0.7827 0.9641 0.9985 1.0000 1.0000 1.0000 1.0000 20 5 0.0974 0.1918 0.7438 0.9360 0.9917 0.9993 1.0000 1.0000 1.0000 50 5 0.0934 0.1505 0.6629 0.9110 0.9894 0.9991 0.9998 1.0000 1.0000 100 5 0.1019 0.1295 0.5951 0.8790 0.9815 0.9978 0.9999 1.0000 1.0000 150 5 0.0970 0.1231 0.5417 0.8442 0.9733 0.9978 0.9998 1.0000 1.0000 200 5 0.0997 0.1196 0.4977 0.8268 0.9751 0.9976 0.9999 1.0000 1.0000 20 5 0.0985 0.1154 0.2468 0.4115 0.6287 0.8065 0.9060 0.9811 0.9952 50 5 0.0987 0.1061 0.1170 0.1496 0.2546 0.4322 0.6503 0.9242 0.9852 100 5 0.0990 0.0988 0.1058 0.0991 0.1122 0.1488 0.2296 0.6279 0.9233 150 5 0.1025 0.1005 0.0957 0.0984 0.1055 0.1013 0.1257 0.3033 0.7200 200 5 0.1007 0.0983 0.0973 0.1060 0.1008 0.1020 0.1115 0.1610 0.4645 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2.5 ? = 3 ? = 3.5 ? = 4 ? = 5 ? = 6 20 5 0.0919 0.1120 0.4927 0.7795 0.9408 0.9894 0.9984 1.0000 1.0000 50 5 0.0996 0.1090 0.4482 0.7518 0.9268 0.9861 0.9980 1.0000 1.0000 100 5 0.0998 0.1092 0.4013 0.7012 0.9063 0.9807 0.9969 0.9998 1.0000 150 5 0.1009 0.1091 0.3694 0.6740 0.8936 0.9760 0.9959 0.9999 1.0000 200 5 0.0973 0.1065 0.3565 0.6577 0.8752 0.9771 0.9953 0.9997 1.0000 20 5 0.0851 0.1067 0.3870 0.6811 0.8891 0.9674 0.9912 0.9994 0.9999 50 5 0.0961 0.1037 0.3447 0.6374 0.8679 0.9622 0.9891 0.9986 0.9998 100 5 0.1031 0.0977 0.2989 0.5842 0.8339 0.9488 0.9862 0.9974 0.9996 150 5 0.0939 0.1062 0.2754 0.5473 0.8055 0.9406 0.9804 0.9975 0.9995 200 5 0.0988 0.1041 0.2518 0.5180 0.8026 0.9335 0.9773 0.9973 0.9992 20 5 0.0973 0.1033 0.2135 0.3926 0.6432 0.8208 0.9263 0.9872 0.9956 50 5 0.0978 0.1025 0.1774 0.3318 0.5795 0.7957 0.9095 0.9817 0.9939 100 5 0.0994 0.1014 0.1411 0.2598 0.4940 0.7432 0.8892 0.9765 0.9933 150 5 0.0950 0.1010 0.1265 0.2344 0.4493 0.6983 0.8667 0.9714 0.9912 200 5 0.1019 0.0988 0.1251 0.2091 0.4061 0.6681 0.8471 0.9694 0.9914 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2.5 ? = 3 ? = 3.5 ? = 4 ? = 5 ? 
= 6 20 5 0.0902 0.1064 0.4549 0.7387 0.9219 0.9836 0.9976 1.0000 1.0000 50 5 0.1006 0.1087 0.4302 0.7318 0.9168 0.9837 0.9978 1.0000 1.0000 100 5 0.0983 0.1074 0.3899 0.6902 0.8987 0.9794 0.9968 0.9999 0.9999 150 5 0.1019 0.1082 0.3631 0.6653 0.8902 0.9755 0.9958 0.9999 1.0000 200 5 0.0967 0.1059 0.3515 0.6506 0.8717 0.9758 0.9956 0.9999 1.0000 20 5 0.0862 0.1046 0.3508 0.6351 0.8559 0.9543 0.9856 0.9996 0.9997 50 5 0.0981 0.1047 0.3302 0.6157 0.8513 0.9555 0.9872 0.9989 0.9996 100 5 0.1035 0.0970 0.2907 0.5702 0.8237 0.9435 0.9817 0.9989 0.9995 150 5 0.0940 0.1071 0.2679 0.5372 0.7977 0.9382 0.9781 0.9980 0.9995 200 5 0.0988 0.1041 0.2480 0.5088 0.7948 0.9312 0.9762 0.9979 0.9992 20 5 0.0959 0.1024 0.1914 0.3475 0.5767 0.7645 0.8873 0.9766 0.9956 50 5 0.0973 0.1010 0.1679 0.3065 0.5462 0.7683 0.8948 0.9793 0.9932 100 5 0.0974 0.1007 0.1388 0.2502 0.4760 0.7233 0.8790 0.9760 0.9924 150 5 0.0957 0.0995 0.1253 0.2305 0.4371 0.6847 0.8611 0.9715 0.9902 200 5 0.1035 0.0992 0.1249 0.2049 0.3936 0.6559 0.8397 0.9657 0.9902 2 2 2 m n m n p 2 2 2 p t ( 3) E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t n or m a l t ( 10) 2 2 2 p m n E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - M S D C h ar t t ( 3) n or m a l n or m a l t ( 10) t ( 10) t ( 3) 155 Appendix K: Simulation Results Using Symmetric Data with an IS in p = 5 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 ? = 6 ? = 7 20 5 0.0992 0.1791 0.7391 0.9939 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.0990 0.1532 0.7034 0.9944 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.1006 0.1408 0.6658 0.9935 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.1040 0.1323 0.6440 0.9939 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.1005 0.1212 0.6137 0.9892 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0987 0.1039 0.1697 0.4600 0.6703 0.8194 0.9184 0.9660 0.9930 0.9984 50 5 0.1045 0.0960 0.1037 0.1538 0.2215 0.3817 0.5886 0.7732 0.9563 0.9934 100 5 0.0973 0.0986 0.1024 0.0997 0.1047 0.1218 0.1495 0.2523 0.6051 0.9049 150 5 0.0936 0.0990 0.0978 0.1055 0.1009 0.0960 0.1089 0.1176 0.2296 0.5849 200 5 0.1008 0.0997 0.1001 0.0992 0.1010 0.1055 0.1043 0.1030 0.1235 0.2728 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 ? = 6 ? = 7 20 5 0.0967 0.1044 0.3214 0.8282 0.9539 0.9911 0.9993 0.9999 1.0000 1.0000 50 5 0.1020 0.1060 0.2838 0.7967 0.9455 0.9891 0.9988 0.9998 1.0000 1.0000 100 5 0.0985 0.1069 0.2295 0.7485 0.9173 0.9838 0.9980 1.0000 1.0000 1.0000 150 5 0.1019 0.1068 0.2223 0.7105 0.9090 0.9799 0.9964 0.9991 1.0000 1.0000 200 5 0.0929 0.0956 0.2025 0.6996 0.8992 0.9765 0.9965 0.9994 1.0000 1.0000 20 5 0.0930 0.0980 0.1184 0.2897 0.4630 0.6691 0.8206 0.9200 0.9873 0.9981 50 5 0.1010 0.1014 0.1032 0.2045 0.3414 0.5607 0.7768 0.9126 0.9881 0.9986 100 5 0.1022 0.0971 0.1044 0.1474 0.2535 0.4412 0.6859 0.8720 0.9849 0.9973 150 5 0.1037 0.0998 0.0998 0.1357 0.2105 0.3736 0.6098 0.8190 0.9810 0.9975 200 5 0.1013 0.0977 0.1074 0.1222 0.1740 0.3234 0.5481 0.7898 0.9761 0.9971 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 ? = 6 ? 
= 7 20 5 0.0912 0.0980 0.1141 0.2520 0.3997 0.5950 0.7547 0.8708 0.9716 0.9943 50 5 0.1021 0.0998 0.1041 0.1899 0.3110 0.5200 0.7321 0.8848 0.9847 0.9975 100 5 0.1009 0.0970 0.1033 0.1431 0.2394 0.4147 0.6537 0.8424 0.9792 0.9978 150 5 0.1035 0.0989 0.0997 0.1336 0.2046 0.3549 0.5846 0.8132 0.9779 0.9968 200 5 0.1015 0.0988 0.1076 0.1199 0.1706 0.3137 0.5309 0.7691 0.9725 0.9964 m n m n m n p 5 5 p t ( 3) E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t n or m a l n or m a l E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - R M D C h ar t t ( 3) p 5 5 t ( 3) E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - M S D C h ar t 5 156 Appendix L: Simulation Results Using Symmetric Data with an IS in p = 10 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0995 0.1387 0.5438 0.9670 0.9998 1.0000 1.0000 1.0000 1.0000 50 5 0.1019 0.1268 0.5379 0.9764 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0979 0.1199 0.5038 0.9736 0.9999 1.0000 1.0000 1.0000 1.0000 150 5 0.0995 0.1197 0.4692 0.9677 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.1016 0.1157 0.4553 0.9667 0.9999 1.0000 1.0000 1.0000 1.0000 20 5 0.0987 0.1038 0.1395 0.3195 0.6983 0.9302 0.9894 0.9991 0.9996 50 5 0.1030 0.1035 0.1068 0.1284 0.2556 0.5971 0.9037 0.9843 0.9981 100 5 0.0943 0.0971 0.0964 0.1045 0.1131 0.1594 0.3705 0.7529 0.9531 150 5 0.0979 0.0972 0.0949 0.0985 0.0995 0.1029 0.1342 0.2857 0.6659 200 5 0.1005 0.0994 0.0993 0.1006 0.1011 0.0973 0.1085 0.1396 0.2816 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0947 0.1079 0.2509 0.6957 0.9668 0.9992 1.0000 1.0000 1.0000 50 5 0.0975 0.0993 0.1889 0.6289 0.9546 0.9985 0.9999 1.0000 1.0000 100 5 0.0972 0.1015 0.1644 0.5607 0.9379 0.9985 0.9999 1.0000 1.0000 150 5 0.0990 0.1044 0.1518 0.5260 0.9225 0.9964 0.9999 1.0000 1.0000 200 5 0.0973 0.1026 0.1494 0.5009 0.9162 0.9952 0.9999 1.0000 1.0000 20 5 0.0981 0.0979 0.1070 0.1528 0.3362 0.6653 0.8966 0.9823 0.9968 50 5 0.1004 0.1029 0.0966 0.1146 0.2004 0.4930 0.8380 0.9778 0.9973 100 5 0.0974 0.0999 0.1059 0.1064 0.1402 0.3368 0.7346 0.9640 0.9972 150 5 0.0981 0.0975 0.0985 0.1005 0.1273 0.2522 0.6426 0.9393 0.9956 200 5 0.0957 0.1026 0.1034 0.0995 0.1139 0.2106 0.5652 0.9221 0.9947 10 10 10 p m n E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t t ( 3) n or m a l t ( 3) n or m a l p m n 10 157 Appendix M: Simulation Results Using Symmetric Data with a 5% SS in p = 2 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? 
= 8 20 5 0.0956 0.2576 0.8755 0.9835 0.9996 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.1012 0.2906 0.9625 0.9987 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0990 0.4052 0.9991 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.1031 0.4354 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0993 0.4841 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0985 0.1176 0.2473 0.4248 0.6404 0.8041 0.9079 0.9793 0.9936 0.9985 0.9991 50 5 0.0987 0.1022 0.1313 0.1891 0.3091 0.5196 0.7544 0.9594 0.9941 0.9989 1.0000 100 5 0.0990 0.0946 0.1058 0.1186 0.1597 0.2605 0.4569 0.8859 0.9902 0.9992 0.9997 150 5 0.1025 0.1007 0.1122 0.1134 0.1186 0.1428 0.2103 0.5985 0.9344 0.9944 0.9990 200 5 0.1007 0.0959 0.1038 0.1071 0.1163 0.1310 0.1561 0.3909 0.8497 0.9856 0.9986 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0919 0.1157 0.4920 0.7743 0.9414 0.9910 0.9989 1.0000 1.0000 1.0000 1.0000 50 5 0.0996 0.1194 0.5842 0.8872 0.9892 0.9993 0.9999 1.0000 1.0000 1.0000 1.0000 100 5 0.0998 0.1345 0.7941 0.9872 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.1009 0.1417 0.8360 0.9951 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0973 0.1503 0.8985 0.9987 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0973 0.0953 0.2167 0.4078 0.6423 0.8240 0.9264 0.9848 0.9963 0.9983 0.9993 50 5 0.0978 0.0984 0.2060 0.4160 0.6855 0.8878 0.9728 0.9983 0.9994 0.9995 0.9998 100 5 0.0994 0.0993 0.2359 0.4995 0.8355 0.9743 0.9960 0.9994 0.9998 0.9999 1.0000 150 5 0.0950 0.0981 0.2209 0.4957 0.8335 0.9812 0.9980 0.9998 1.0000 1.0000 1.0000 200 5 0.1019 0.0986 0.2272 0.5167 0.8672 0.9890 0.9988 0.9997 0.9999 0.9999 1.0000 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0959 0.0949 0.1953 0.3621 0.5784 0.7676 0.8914 0.9768 0.9939 0.9976 0.9990 50 5 0.0973 0.0965 0.1868 0.3507 0.5913 0.8139 0.9349 0.9933 0.9994 0.9995 0.9999 100 5 0.0974 0.0998 0.2100 0.4191 0.7379 0.9316 0.9879 0.9995 0.9997 0.9996 1.0000 150 5 0.0957 0.0999 0.2005 0.4121 0.7169 0.9336 0.9934 0.9996 0.9995 0.9999 0.9999 200 5 0.1035 0.1009 0.2029 0.4271 0.7623 0.9554 0.9957 0.9996 0.9994 1.0000 0.9999 t ( 3) 2 m n 2 p p m n 2 p 2 2 m E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t n or m a l t ( 3) n or m a l t ( 3) n E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t 158 Appendix N: Simulation Results Using Symmetric Data with a 15% SS in p = 2 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? 
= 8 20 5 0.0901 0.3966 0.9854 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.1026 0.4790 0.9996 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0990 0.6014 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.1009 0.6486 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0941 0.6901 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0985 0.1369 0.3595 0.5747 0.7921 0.9088 0.9617 0.9941 0.9976 0.9992 0.9995 50 5 0.0987 0.1150 0.1713 0.2584 0.4344 0.6663 0.8420 0.9795 0.9950 0.9988 0.9998 100 5 0.0990 0.0965 0.1229 0.1425 0.2015 0.3140 0.5217 0.8979 0.9888 0.9964 0.9995 150 5 0.1025 0.1043 0.1056 0.1240 0.1388 0.1940 0.2585 0.6328 0.9326 0.9915 0.9980 200 5 0.1007 0.1004 0.1106 0.1202 0.1339 0.1506 0.2000 0.4280 0.8296 0.9790 0.9981 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0919 0.1266 0.5698 0.8621 0.9820 0.9984 0.9998 1.0000 0.9999 0.9997 1.0000 50 5 0.0996 0.1402 0.6561 0.9375 0.9973 1.0000 1.0000 0.9999 0.9999 1.0000 1.0000 100 5 0.0998 0.1519 0.7984 0.9876 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.1009 0.1541 0.8408 0.9945 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0973 0.1550 0.8786 0.9978 1.0000 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0973 0.0996 0.2185 0.3975 0.6356 0.8208 0.9317 0.9935 0.9985 0.9996 1.0000 50 5 0.0978 0.1042 0.2132 0.3951 0.6463 0.8559 0.9593 0.9974 0.9989 0.9999 0.9999 100 5 0.0994 0.1071 0.2188 0.4136 0.7085 0.9106 0.9813 0.9985 0.9996 1.0000 1.0000 150 5 0.0950 0.1041 0.2084 0.4129 0.6911 0.9072 0.9854 0.9993 0.9994 0.9999 0.9999 200 5 0.1019 0.1021 0.2042 0.3984 0.6999 0.9176 0.9870 0.9993 0.9997 1.0000 1.0000 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0959 0.0991 0.1814 0.2853 0.4414 0.5971 0.7399 0.9126 0.9722 0.9919 0.9976 50 5 0.0973 0.1036 0.1751 0.2743 0.4180 0.5841 0.7152 0.9040 0.9677 0.9889 0.9966 100 5 0.0974 0.1062 0.1811 0.2913 0.4575 0.6264 0.7664 0.9362 0.9844 0.9966 0.9993 150 5 0.0957 0.1041 0.1771 0.2845 0.4415 0.6041 0.7576 0.9320 0.9835 0.9957 0.9987 200 5 0.1035 0.1021 0.1740 0.2804 0.4408 0.6185 0.7657 0.9406 0.9859 0.9970 0.9991 p 2 2 p 2 2 p 2 E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t m n m n n or m a l n or m a l t ( 3) t ( 3) E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t t ( 3) m n 159 Appendix O: Simulation Results Using Symmetric Data with a 30% SS in p = 2 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? 
= 8 20 5 0.0970 0.4486 0.9877 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.0971 0.5508 0.9994 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.1012 0.6391 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.0974 0.6695 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0965 0.7107 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0985 0.1478 0.3740 0.5731 0.7725 0.8920 0.9542 0.9875 0.9961 0.9988 0.9993 50 5 0.0987 0.1181 0.1860 0.2727 0.4284 0.6168 0.8144 0.9666 0.9934 0.9970 0.9986 100 5 0.0990 0.1020 0.1335 0.1577 0.2076 0.2829 0.4083 0.7743 0.9599 0.9915 0.9973 150 5 0.1025 0.1073 0.1236 0.1357 0.1635 0.1884 0.2384 0.4802 0.8231 0.9701 0.9917 200 5 0.1007 0.1094 0.1127 0.1236 0.1404 0.1622 0.1990 0.3221 0.6076 0.9028 0.9820 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0952 0.1097 0.2626 0.4342 0.6330 0.7909 0.9085 0.9922 0.9998 0.9998 1.0000 50 5 0.0985 0.1159 0.3212 0.5213 0.7209 0.8661 0.9517 0.9974 1.0000 1.0000 1.0000 100 5 0.0969 0.1245 0.3571 0.5759 0.7708 0.9073 0.9694 0.9988 1.0000 1.0000 1.0000 150 5 0.0973 0.1185 0.3755 0.6052 0.8109 0.9276 0.9794 0.9989 1.0000 1.0000 1.0000 200 5 0.0996 0.1233 0.4044 0.6288 0.8314 0.9364 0.9811 0.9994 0.9999 1.0000 1.0000 20 5 0.0942 0.0987 0.1379 0.1935 0.2853 0.3839 0.5116 0.7353 0.8950 0.9669 0.9947 50 5 0.0945 0.0997 0.1406 0.1919 0.2743 0.3875 0.5054 0.7428 0.8997 0.9725 0.9957 100 5 0.0948 0.0948 0.1345 0.1852 0.2681 0.3769 0.4966 0.7524 0.9073 0.9717 0.9947 150 5 0.0992 0.1012 0.1316 0.1795 0.2609 0.3566 0.4955 0.7442 0.9049 0.9712 0.9933 200 5 0.0954 0.0981 0.1288 0.1761 0.2598 0.3615 0.4859 0.7498 0.9043 0.9675 0.9915 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 2 . 5 ? = 3 ? = 3 . 5 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 20 5 0.0948 0.0941 0.1145 0.1354 0.1679 0.1864 0.2148 0.2668 0.2974 0.3257 0.3451 50 5 0.0948 0.0992 0.1210 0.1412 0.1702 0.1945 0.2151 0.2586 0.3087 0.3292 0.3553 100 5 0.0945 0.0972 0.1193 0.1379 0.1687 0.1921 0.2164 0.2722 0.3126 0.3422 0.3705 150 5 0.0991 0.1021 0.1200 0.1363 0.1662 0.1845 0.2196 0.2712 0.3001 0.3332 0.3621 200 5 0.0968 0.0981 0.1144 0.1353 0.1668 0.1944 0.2032 0.2691 0.3084 0.3494 0.3631 2 m n m n m n E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t n or m a l p 2 2 p t ( 3) t ( 3) n or m a l 2 2 E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t p t ( 3) 160 Appendix P: Simulation Results Using Symmetric Data with a 5% SS in p = 10 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? 
= 1 0 20 5 0.1010 0.1417 0.5407 0.9664 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.1029 0.1569 0.7098 0.9978 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0977 0.1790 0.9170 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.0936 0.1894 0.9526 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.1005 0.2041 0.9808 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0987 0.1050 0.1396 0.3116 0.6846 0.9251 0.9896 0.9990 0.9999 0.9999 1.0000 50 5 0.1030 0.0996 0.1086 0.1491 0.3094 0.6567 0.9241 0.9894 0.9986 0.9999 1.0000 100 5 0.0943 0.1013 0.1070 0.1153 0.1551 0.2771 0.6562 0.9387 0.9938 0.9992 0.9999 150 5 0.0979 0.0985 0.1032 0.1086 0.1128 0.1446 0.2346 0.5196 0.8724 0.9851 0.9972 200 5 0.1005 0.0981 0.0951 0.1016 0.1060 0.1215 0.1591 0.2767 0.6207 0.9207 0.9897 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? = 1 0 20 5 0.0947 0.1107 0.2460 0.6923 0.9681 0.9994 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.0975 0.1095 0.2431 0.7505 0.9876 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0972 0.1073 0.3075 0.9262 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.0990 0.1029 0.3264 0.9447 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0973 0.1034 0.3605 0.9756 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0981 0.0988 0.1094 0.1587 0.3444 0.6559 0.8976 0.9802 0.9969 0.9994 0.9999 50 5 0.1004 0.1001 0.1022 0.1186 0.2185 0.4938 0.8180 0.9693 0.9979 0.9994 0.9999 100 5 0.0974 0.0953 0.1037 0.1167 0.2178 0.5369 0.9032 0.9907 0.9996 1.0000 1.0000 150 5 0.0981 0.1015 0.0995 0.1096 0.1747 0.4674 0.8701 0.9894 0.9998 1.0000 1.0000 200 5 0.0957 0.1013 0.1030 0.1146 0.1757 0.4738 0.8888 0.9952 0.9996 1.0000 1.0000 p 10 10 m E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t n m n n or m a l t ( 3) p n or m a l t ( 3) 10 10 161 Appendix Q: Simulation Results Using Symmetric Data with a 15% SS in p = 10 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? = 1 0 20 5 0.0995 0.1831 0.7445 0.9974 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.0950 0.2261 0.9049 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0968 0.2464 0.9848 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.0973 0.2713 0.9949 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0982 0.2946 0.9992 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0987 0.1121 0.1772 0.4380 0.8039 0.9680 0.9962 0.9997 0.9999 1.0000 1.0000 50 5 0.1030 0.1065 0.1329 0.1984 0.4178 0.7694 0.9578 0.9949 0.9994 0.9998 1.0000 100 5 0.0943 0.1026 0.1113 0.1282 0.1909 0.3578 0.6962 0.9421 0.9944 0.9988 0.9998 150 5 0.0979 0.0964 0.1045 0.1205 0.1428 0.1828 0.2854 0.5475 0.8508 0.9762 0.9976 200 5 0.1005 0.0980 0.1040 0.1071 0.1216 0.1386 0.1961 0.3136 0.5851 0.8851 0.9825 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? 
= 1 0 20 5 0.0947 0.1094 0.2859 0.7599 0.9839 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.0975 0.1075 0.2744 0.7892 0.9951 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.0972 0.1088 0.3130 0.9012 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.0990 0.1082 0.3314 0.9240 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0973 0.1121 0.3573 0.9519 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0981 0.1004 0.1051 0.1649 0.3047 0.5756 0.8267 0.9511 0.9920 0.9993 1.0000 50 5 0.1004 0.0959 0.1057 0.1272 0.2051 0.3934 0.6600 0.8851 0.9816 0.9993 1.0000 100 5 0.0974 0.1009 0.1029 0.1186 0.1881 0.3850 0.6887 0.9214 0.9905 0.9998 1.0000 150 5 0.0981 0.0908 0.1002 0.1147 0.1675 0.3294 0.6172 0.8768 0.9814 0.9993 1.0000 200 5 0.0957 0.0973 0.1017 0.1126 0.1703 0.3129 0.6219 0.8908 0.9871 0.9996 1.0000 n m n p 10 10 p 10 10 E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t m n or m a l t ( 3) n or m a l t ( 3) 162 Appendix R: Simulation Results Using Symmetric Data with a 30% SS in p = 10 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? = 1 0 20 5 0.0957 0.2040 0.7375 0.9952 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 50 5 0.0969 0.2504 0.9008 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 100 5 0.1030 0.2719 0.9682 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 150 5 0.0978 0.2935 0.9860 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 200 5 0.0962 0.3092 0.9934 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 20 5 0.0987 0.1122 0.1859 0.4187 0.7523 0.9482 0.9931 0.9994 0.9999 0.9999 1.0000 50 5 0.1030 0.1094 0.1439 0.2264 0.4202 0.7460 0.9434 0.9938 0.9992 0.9998 1.0000 100 5 0.0943 0.1066 0.1196 0.1510 0.1979 0.3169 0.5278 0.8218 0.9639 0.9954 0.9994 150 5 0.0979 0.1004 0.1093 0.1254 0.1489 0.1960 0.2900 0.4356 0.6899 0.9171 0.9870 200 5 0.1005 0.1012 0.1094 0.1200 0.1337 0.1589 0.2058 0.2651 0.4123 0.6195 0.8686 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? = 1 0 20 5 0.1013 0.1136 0.1759 0.4025 0.7251 0.9344 0.9944 0.9993 1.0000 1.0000 1.0000 50 5 0.1024 0.1121 0.1818 0.4155 0.7651 0.9613 0.9976 1.0000 1.0000 1.0000 1.0000 100 5 0.0969 0.1032 0.1736 0.4387 0.8137 0.9713 0.9989 1.0000 1.0000 1.0000 1.0000 150 5 0.0967 0.1001 0.1769 0.4457 0.8223 0.9807 0.9996 1.0000 1.0000 1.0000 1.0000 200 5 0.1021 0.1057 0.1835 0.4711 0.8435 0.9845 0.9990 1.0000 1.0000 1.0000 1.0000 20 5 0.0946 0.0996 0.1112 0.1237 0.1710 0.2612 0.4224 0.6628 0.8747 0.9701 0.9923 50 5 0.0967 0.1015 0.1052 0.1094 0.1406 0.2009 0.2903 0.4543 0.7347 0.9553 0.9973 100 5 0.0985 0.0997 0.0996 0.1068 0.1324 0.1619 0.2527 0.3618 0.5623 0.9034 0.9960 150 5 0.0995 0.0993 0.0999 0.1017 0.1253 0.1551 0.2254 0.3298 0.4823 0.8402 0.9928 200 5 0.0961 0.0969 0.0991 0.1064 0.1221 0.1476 0.2184 0.3138 0.4491 0.7572 0.9903 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? = 6 ? = 7 ? = 8 ? = 9 ? 
= 1 0 20 5 0.0909 0.0948 0.1041 0.1093 0.1278 0.1605 0.2134 0.2560 0.3182 0.3837 0.4417 50 5 0.0960 0.0997 0.1029 0.1060 0.1176 0.1445 0.1661 0.2075 0.2530 0.3155 0.3456 100 5 0.0979 0.0979 0.0980 0.1037 0.1211 0.1266 0.1586 0.1930 0.2341 0.2755 0.3192 150 5 0.0992 0.0999 0.0997 0.0996 0.1131 0.1283 0.1493 0.1832 0.2119 0.2544 0.2983 200 5 0.0965 0.0967 0.0994 0.1033 0.1141 0.1202 0.1459 0.1779 0.2087 0.2390 0.2912 m n m n t ( 3) t ( 3) n or m a l p 10 10 10 E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t n or m a l p 10 t ( 3) E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t p 10 m n 163 Appendix S: Simulation Results Using In-Control Skewed Data P r o c e s s D i s t r i b u t i o n p = 2 p = 5 p = 2 p = 5 p = 2 p = 5 20 5 0 .4 4 1 4 0 .4 8 0 3 0 .0 9 3 5 0 .0 9 6 5 0 .0 9 9 1 50 5 0 .7 6 1 8 0 .8 6 7 6 0 .1 0 1 9 0 .0 9 9 4 0 .0 9 8 7 100 5 0 .9 3 5 3 0 .9 8 2 7 0 .1 0 1 2 0 .1 0 3 0 0 .1 0 0 5 150 5 0 .9 7 7 9 0 .9 9 7 2 0 .0 9 4 9 0 .0 9 5 7 0 .1 0 2 0 200 5 0 .9 9 1 5 0 .9 9 9 8 0 .0 9 9 6 0 .0 9 9 7 0 .1 0 2 3 l o g n o r m a l E m p i r i c a l F A P f o r M M R - M S D C h a r tE m p i r i c a l F A P f o r M M R - R M D C h a r tE m p i r i c a l F A P f o r H o t e l l i n g ' s T 2 C h a r tm n 164 Appendix T: Simulation Results Using Skewed Data with an IS in p = 2 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 20 5 0.0984 0.1234 0.2527 0.5734 0.8771 0.9857 0.9995 50 5 0.0979 0.1138 0.2139 0.5128 0.8494 0.9812 0.9994 100 5 0.1004 0.1041 0.1846 0.4545 0.8181 0.9777 0.9994 150 5 0.0997 0.1071 0.1717 0.4171 0.7983 0.9682 0.9986 200 5 0.0972 0.1036 0.1617 0.4075 0.7840 0.9668 0.9978 20 5 0.0967 0.1071 0.1579 0.3973 0.7084 0.8826 0.9586 50 5 0.0956 100 5 0.1009 0.0991 0.1003 0.1016 0.1254 0.2423 0.5404 150 5 0.1001 200 5 0.0979 0.0999 0.1008 0.1017 0.1008 0.1066 0.1618 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 20 5 0.0919 0.0899 0.1240 0.2274 0.4873 0.7792 0.9401 50 5 0.0996 0.0962 0.1140 0.2012 0.4509 0.7458 0.9348 100 5 0.0998 0.0977 0.1085 0.1858 0.4046 0.7065 0.9075 150 5 0.1009 0.1001 0.1067 0.1725 0.3713 0.6672 0.8929 200 5 0.0973 0.0995 0.1030 0.1555 0.3491 0.6587 0.8768 20 5 0.0935 0.1048 0.2979 0.7470 0.9430 0.9861 0.9918 50 5 0.1019 100 5 0.1012 0.1027 0.1518 0.5256 0.9168 0.9788 0.9918 150 5 0.0949 200 5 0.0996 0.0956 0.1173 0.3561 0.8533 0.9684 0.9891 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? 
= 3 20 5 0.0902 0.0887 0.1181 0.2117 0.4489 0.7387 0.9216 50 5 0.1006 0.0948 0.1107 0.1910 0.4327 0.7244 0.9163 100 5 0.0983 0.0969 0.1087 0.1809 0.3958 0.6934 0.9005 150 5 0.1019 0.1003 0.1066 0.1682 0.3618 0.6640 0.8891 200 5 0.0967 0.0990 0.1033 0.1549 0.3465 0.6517 0.8745 20 5 0.0965 0.1785 0.3991 0.6790 0.8966 0.9759 0.9914 50 5 0.0994 100 5 0.1030 0.1579 0.3564 0.6137 0.9052 0.9836 0.9938 150 5 0.0957 200 5 0.0997 0.1366 0.3322 0.5517 0.8622 0.9805 0.9918 2 2 p m n 2 2 n or m a l n or m a l l og n or m a l l og n or m a l E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - M S D C h ar t p m n E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t n or m a l l og n or m a l m np 2 2 165 Appendix U: Simulation Results Using Skewed Data with an IS in p = 5 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 20 5 0.0934 0.1119 0.1279 0.2515 0.5461 0.8207 0.9455 0.9851 0.9945 50 5 0.0963 100 5 0.0958 0.0968 0.0975 0.1001 0.1110 0.1400 0.2783 0.6024 0.8719 150 5 0.0968 200 5 0.0967 0.0953 0.0976 0.1012 0.0989 0.1018 0.1128 0.1577 0.3426 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 20 5 0.0991 0.0977 0.1258 0.2897 0.5647 0.8285 0.9420 0.9855 0.9962 50 5 0.0987 100 5 0.1005 0.1001 0.0964 0.1760 0.4048 0.7213 0.9350 0.9926 0.9987 150 5 0.1020 200 5 0.1023 0.0969 0.1038 0.1370 0.3094 0.6104 0.8813 0.9850 0.9986 l og n or m a l E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t l og n or m a l E m p i r i c al A P f or an I s ol at e d S h i f t U s i n g t h e M M R - M S D C h ar t p 5 p 5 m n m n 166 Appendix V: Simulation Results Using Skewed Data with a 5% SS in p = 2 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0967 0.1072 0.1577 0.3914 0.7022 0.8842 0.9589 0.9839 0.9938 0.9967 0.9974 50 5 0.0956 100 5 0.1009 0.1028 0.1055 0.1258 0.1868 0.4334 0.7751 0.9465 0.9876 0.9978 0.9996 150 5 0.1001 200 5 0.0979 0.1012 0.1024 0.1102 0.1258 0.1842 0.3438 0.7156 0.9361 0.9885 0.9983 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0935 0.1072 0.3005 0.7387 0.9436 0.9821 0.9934 0.9966 0.9985 0.9990 0.9993 50 5 0.1019 100 5 0.1012 0.1024 0.2126 0.7795 0.9933 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 150 5 0.0949 200 5 0.0996 0.1020 0.1746 0.7258 0.9955 0.9999 0.9999 1.0000 1.0000 1.0000 1.0000 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0965 0.1868 0.4053 0.6796 0.8998 0.9737 0.9921 0.9963 0.9984 0.9995 0.9993 50 5 0.0994 100 5 0.1030 0.2552 0.5387 0.7700 0.9632 0.9987 0.9999 1.0000 1.0000 1.0000 1.0000 150 5 0.0957 200 5 0.0997 0.2678 0.5578 0.7762 0.9664 0.9998 0.9999 1.0000 1.0000 1.0000 1.0000 2 n m n p m n p 2 p 2 m E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t l og n or m a l l og n or m a l E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t l og n or m a l E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t 167 Appendix W: Simulation Results Using Skewed Data with a 15% SS in p = 2 P r oc e s s D i s t r i b u t i on ? 
= 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0967 0.1150 0.2044 0.5074 0.8041 0.9374 0.9746 0.9904 0.9972 0.9984 0.9991 50 5 0.0956 100 5 0.1009 0.1058 0.1160 0.1553 0.2504 0.4650 0.7737 0.9310 0.9858 0.9960 0.9982 150 5 0.1001 200 5 0.0979 0.1011 0.1056 0.1287 0.1626 0.2307 0.4078 0.6638 0.8988 0.9806 0.9957 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0935 0.1099 0.2674 0.6685 0.9247 0.9895 0.9981 0.9993 0.9999 0.9998 0.9997 50 5 0.1019 100 5 0.1012 0.1075 0.1904 0.5607 0.9053 0.9927 0.9992 1.0000 1.0000 1.0000 1.0000 150 5 0.0949 200 5 0.0996 0.1050 0.1649 0.4819 0.8685 0.9907 0.9993 1.0000 1.0000 1.0000 1.0000 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0965 0.2099 0.4151 0.6046 0.7839 0.9072 0.9648 0.9889 0.9955 0.9983 0.9993 50 5 0.0994 100 5 0.1030 0.2263 0.4239 0.6243 0.8080 0.9372 0.9857 0.9976 0.9997 0.9998 0.9999 150 5 0.0957 200 5 0.0997 0.2103 0.4125 0.6021 0.8056 0.9404 0.9894 0.9978 0.9989 0.9997 0.9996 p m n 2 p m n 2 p m n 2 E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t l og n or m a l l og n or m a l l og n or m a l 168 Appendix X: Simulation Results Using Skewed Data with a 30% SS in p = 2 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0967 0.1256 0.2241 0.4939 0.7701 0.9173 0.9714 0.9868 0.9949 0.9973 0.9989 50 5 0.0956 100 5 0.1009 0.1073 0.1247 0.1722 0.2658 0.4155 0.6337 0.8513 0.9522 0.9859 0.9962 150 5 0.1001 200 5 0.0979 0.1048 0.1144 0.1400 0.1746 0.2546 0.3735 0.5437 0.7171 0.8843 0.9654 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0955 0.1013 0.1440 0.2777 0.5222 0.7769 0.9319 0.9846 0.9972 0.9993 1.0000 50 5 0.0983 100 5 0.0982 0.0999 0.1368 0.2239 0.3961 0.5983 0.7691 0.9078 0.9808 0.9985 0.9997 150 5 0.1008 200 5 0.0992 0.1019 0.1386 0.2079 0.3640 0.5372 0.7020 0.8314 0.9433 0.9940 0.9997 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0965 0.1458 0.1946 0.2119 0.2345 0.2724 0.2992 0.3175 0.3537 0.3645 0.3771 50 5 0.0994 100 5 0.1030 0.1390 0.1630 0.1993 0.2314 0.2801 0.3116 0.3556 0.3829 0.3986 0.4240 150 5 0.0957 200 5 0.0997 0.1258 0.1562 0.1843 0.2258 0.2756 0.3145 0.3499 0.3852 0.4220 0.4270 p m n 2 p m n 2 p m n E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - R M D C h ar t E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t l og n or m a l l og n or m a l l og n or m a l 2 169 Appendix Y: Simulation Results Using Skewed Data with a SS in p = 5 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? 
= 5 20 5 0.0934 0.1031 0.1236 0.2490 0.5473 0.8198 0.9456 0.9857 0.9970 0.9989 0.9998 50 5 0.0963 100 5 0.0958 0.0980 0.1020 0.1112 0.1419 0.2260 0.5187 0.8406 0.9713 0.9965 0.9994 150 5 0.0968 200 5 0.0967 0.0947 0.0973 0.1025 0.1097 0.1356 0.1916 0.3595 0.6965 0.9435 0.9919 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0991 0.0920 0.1296 0.2995 0.5772 0.8201 0.9460 0.9865 0.9974 0.9989 0.9998 50 5 0.0987 100 5 0.1005 0.1050 0.1067 0.2296 0.5250 0.8371 0.9689 0.9974 0.9994 0.9999 1.0000 150 5 0.1020 200 5 0.1023 0.0998 0.1054 0.1922 0.4607 0.7969 0.9658 0.9989 0.9999 0.9999 1.0000 p m n 5 p m n l og n or m a l l og n or m a l 5 E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t E m p i r i c al A P f or a 5 % S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0934 0.1109 0.1574 0.3359 0.6643 0.8916 0.9749 0.9945 0.9989 0.9996 0.9999 50 5 0.0963 100 5 0.0958 0.1013 0.1085 0.1309 0.1843 0.2998 0.5539 0.8396 0.9653 0.9960 0.9988 150 5 0.0968 200 5 0.0967 0.0958 0.0993 0.1123 0.1282 0.1688 0.2459 0.4111 0.6712 0.9096 0.9856 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0991 0.0907 0.1251 0.2176 0.3862 0.6074 0.7847 0.9006 0.9572 0.9829 0.9931 50 5 0.0987 100 5 0.1005 0.1046 0.1055 0.1588 0.2801 0.4819 0.6916 0.8418 0.9446 0.9816 0.9932 150 5 0.1020 200 5 0.1023 0.0978 0.1034 0.1453 0.2316 0.3983 0.6119 0.8036 0.9173 0.9730 0.9907 p m n 5 p m n l og n or m a l l og n or m a l 5 E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t E m p i r i c al A P f or a 1 5% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0934 0.1137 0.1713 0.3298 0.6137 0.8508 0.9568 0.9896 0.9973 0.9984 0.9997 50 5 0.0963 100 5 0.0958 0.1014 0.1143 0.1431 0.1914 0.2863 0.4627 0.6808 0.8868 0.9739 0.9942 150 5 0.0968 200 5 0.0967 0.0967 0.1020 0.1200 0.1401 0.1793 0.2447 0.3458 0.5092 0.7166 0.8836 P r oc e s s D i s t r i b u t i on ? = 0 ? = 0.5 ? = 1 ? = 1.5 ? = 2 ? = 2.5 ? = 3 ? = 3 . 5 ? = 4 ? = 4 . 5 ? = 5 20 5 0.0991 0.0878 0.1065 0.1140 0.1477 0.1793 0.2306 0.2725 0.3227 0.3643 0.4048 50 5 0.0987 100 5 0.1005 0.1028 0.0992 0.1102 0.1316 0.1546 0.2014 0.2350 0.2736 0.3244 0.3615 150 5 0.1020 200 5 0.1023 0.0975 0.0980 0.1150 0.1283 0.1491 0.1870 0.2183 0.2597 0.2960 0.3410 p m n 5 E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g H ot e l l i n g' s T 2 C h ar t E m p i r i c al A P f or a 3 0% S u s t ai n e d S h i f t U s i n g t h e M M R - M S D C h ar t l og n or m a l p m n 5 l og n or m a l 170 Appendix Z: Subgroup Size Analysis Using In-Control Data P r oc e s s E m p i r i c al F A P f or E m p i r i c al F A P f or D i s t r i b u t i on H ot e l l i n g' s T 2 C h ar t M M R - R M D C h ar t 100 5 0. 95 41 0. 09 64 100 10 0. 90 54 0. 10 50 100 15 0. 87 08 0. 10 20 100 20 0. 83 32 0. 10 47 100 5 0. 98 33 0. 09 86 100 10 0. 94 37 0. 10 42 100 15 0. 90 29 0. 09 49 100 20 0. 86 02 0. 10 30 t ( 3) l og n or m a l m np 5 5 171 Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5 P r oc e s s D i s t r i b u t i on ? = 0 ? = 1 ? = 2 ? = 3 ? = 4 ? = 5 ? 
Appendix AA: Subgroup Size Analysis Using Data with an IS in p = 5

Empirical AP for an Isolated Shift Using Hotelling's T2 Chart
(Process distribution: t(3), p = 5)

  m     n    δ=0     δ=1     δ=2     δ=3     δ=4     δ=5     δ=6     δ=7
  100   5    0.0999  0.0991  0.0990  0.0990  0.1201  0.2394  0.6120  0.8989
  100   10   0.0942  0.0999  0.0974  0.1434  0.5595  0.9532  0.9973  0.9998
  100   15   0.0984  0.1000  0.1075  0.4214  0.9614  0.9990  1.0000  0.9999
  100   20   0.1004  0.0994  0.1402  0.8056  0.9971  0.9999  1.0000  1.0000

Empirical AP for an Isolated Shift Using the MMR-RMD Chart
(Process distribution: t(3), p = 5)

  m     n    δ=0     δ=1     δ=2     δ=3     δ=4     δ=5     δ=6     δ=7
  100   5    0.1022  0.1013  0.1055  0.1498  0.4484  0.8643  0.9851  0.9975
  100   10   0.0986  0.1038  0.1395  0.5409  0.9767  0.9998  1.0000  1.0000
  100   15   0.0964  0.1030  0.1915  0.8414  0.9995  1.0000  1.0000  0.9999
  100   20   0.1021  0.1050  0.2741  0.9650  0.9999  1.0000  1.0000  1.0000

Empirical AP for an Isolated Shift Using Hotelling's T2 Chart
(Process distribution: lognormal, p = 5)

  m     n    δ=0     δ=0.5   δ=1     δ=1.5   δ=2     δ=2.5   δ=3     δ=3.5   δ=4     δ=4.5   δ=5
  100   5    0.0933  0.0925  0.0977  0.1019  0.1080  0.1333  0.2812  0.5998  0.8664  0.9692  0.9929
  100   10   0.1016  0.0951  0.0952  0.1190  0.2992  0.7866  0.9787  0.9987  0.9997  1.0000  1.0000
  100   15   0.0977  0.0998  0.1133  0.2610  0.8405  0.9947  0.9998  0.9999  1.0000  1.0000  1.0000
  100   20   0.0898  0.1038  0.1310  0.6108  0.9852  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000

Empirical AP for an Isolated Shift Using the MMR-RMD Chart
(Process distribution: lognormal, p = 5)

  m     n    δ=0     δ=0.5   δ=1     δ=1.5   δ=2     δ=2.5   δ=3     δ=3.5   δ=4     δ=4.5   δ=5
  100   5    0.0986  0.1016  0.1028  0.1130  0.2076  0.5308  0.8765  0.9826  0.9976  0.9994  1.0000
  100   10   0.1042  0.1012  0.1173  0.3757  0.9080  0.9981  1.0000  1.0000  1.0000  1.0000  1.0000
  100   15   0.0949  0.1093  0.1767  0.7495  0.9972  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
  100   20   0.1030  0.1085  0.2768  0.9339  0.9998  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000

Appendix BB: Subgroup Size Analysis Using Data with a 15% SS in p = 5

Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart
(Process distribution: t(3), p = 5)

  m     n    δ=0     δ=1     δ=2     δ=3     δ=4     δ=5     δ=6     δ=7
  100   5    0.0963  0.0980  0.1120  0.1361  0.2236  0.5038  0.8621  0.9806
  100   10   0.0917  0.1067  0.1270  0.3049  0.8734  0.9970  1.0000  1.0000
  100   15   0.1003  0.1015  0.1805  0.7814  0.9979  1.0000  1.0000  1.0000
  100   20   0.1021  0.1144  0.2951  0.9834  0.9999  1.0000  1.0000  1.0000

Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart
(Process distribution: t(3), p = 5)

  m     n    δ=0     δ=1     δ=2     δ=3     δ=4     δ=5     δ=6     δ=7
  100   5    0.0950  0.0996  0.1130  0.2132  0.5115  0.8817  0.9861  0.9988
  100   10   0.0989  0.1058  0.1997  0.7429  0.9961  0.9997  0.9999  1.0000
  100   15   0.0998  0.0985  0.3297  0.9696  0.9996  1.0000  1.0000  1.0000
  100   20   0.0996  0.1147  0.4881  0.9977  1.0000  1.0000  1.0000  1.0000

Empirical AP for a 15% Sustained Shift Using Hotelling's T2 Chart
(Process distribution: lognormal, p = 5)

  m     n    δ=0     δ=0.5   δ=1     δ=1.5   δ=2     δ=2.5   δ=3     δ=3.5   δ=4
  100   5    0.0933  0.0966  0.1075  0.1304  0.1843  0.2998  0.5591  0.8385  0.9673
  100   10   0.1016  0.1014  0.1256  0.2416  0.6156  0.9663  0.9992  0.9998  1.0000
  100   15   0.0977  0.1111  0.1791  0.5726  0.9905  0.9999  1.0000  1.0000  1.0000
  100   20   0.0898  0.1202  0.2824  0.9500  0.9999  1.0000  1.0000  1.0000  1.0000
Empirical AP for a 15% Sustained Shift Using the MMR-RMD Chart
(Process distribution: lognormal, p = 5)

  m     n    δ=0     δ=0.5   δ=1     δ=1.5   δ=2     δ=2.5   δ=3     δ=3.5   δ=4
  100   5    0.0986  0.1026  0.1078  0.1390  0.2592  0.5200  0.8104  0.9541  0.9957
  100   10   0.1042  0.1035  0.1601  0.5318  0.9633  0.9994  1.0000  1.0000  1.0000
  100   15   0.0949  0.1157  0.2799  0.8985  0.9995  1.0000  1.0000  1.0000  1.0000
  100   20   0.1030  0.1189  0.4525  0.9899  1.0000  1.0000  1.0000  1.0000  1.0000
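For the sustained-shift tables above, a companion Monte Carlo sketch is given below: the last 15% of the subgroups receive a location shift of size δ before a Phase I Hotelling's T2 chart is applied, and the alarm probability is estimated as the proportion of replications in which at least one shifted subgroup signals. The shift direction (first variable only), the Šidák-adjusted F-based limit, the lognormal data generator, and the definition of a detection used here are illustrative assumptions, not the exact conventions of the simulation study reported in these appendices.

import numpy as np
from scipy import stats

def hotelling_phase1_ap_sustained(m=100, n=5, p=5, delta=2.0, frac=0.15,
                                  fap_nominal=0.10, reps=2000, seed=2):
    """Monte Carlo sketch of the alarm probability (AP) of a Phase I
    Hotelling's T2 chart when the last `frac` fraction of the m
    subgroups carries a sustained location shift of size delta."""
    rng = np.random.default_rng(seed)
    alpha = 1.0 - (1.0 - fap_nominal) ** (1.0 / m)     # Sidak adjustment
    dfd = m * n - m - p + 1
    ucl = (p * (m - 1) * (n - 1)) / dfd * stats.f.ppf(1.0 - alpha, p, dfd)
    n_shift = int(round(frac * m))                     # number of shifted subgroups

    detections = 0
    for _ in range(reps):
        x = np.exp(rng.standard_normal((m, n, p)))     # skewed in-control data
        x[-n_shift:, :, 0] += delta                    # sustained shift, variable 1
        xbar = x.mean(axis=1)                          # subgroup mean vectors
        xbarbar = xbar.mean(axis=0)                    # grand mean
        s_pool = sum(np.cov(x[i].T) for i in range(m)) / m   # pooled covariance
        d = xbar - xbarbar
        t2 = n * np.einsum("ij,jk,ik->i", d, np.linalg.inv(s_pool), d)
        detections += np.any(t2[-n_shift:] > ucl)      # signal on a shifted subgroup
    return detections / reps

# Example (illustrative only): AP for a 15% sustained shift of size delta = 2.
# print(hotelling_phase1_ap_sustained(m=100, n=5, delta=2.0, frac=0.15))

Varying n in this sketch while holding m fixed mimics the subgroup size analysis of Appendices AA and BB, although the numerical values it produces will not reproduce the tabulated results exactly because the underlying simulation settings are assumed rather than taken from this research.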