Advanced Monitoring and Soft Sensor Development with Application to
Industrial Processes
by
Hector Joaquin Galicia Escobar
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
May 07, 2012
Keywords: Process Monitoring, Fault Detection, Fault Diagnosis, Soft Sensor, Online
Adaptation
Copyright 2012 by Hector Joaquin Galicia Escobar
Approved by
Dr. Jin Wang, Chair, Associate Professor of Chemical Engineering
Dr. Qinghua He, Associate Professor of Chemical Engineering, Tuskegee University
Dr. Harry T. Cullinan, Professor of Chemical Engineering
Dr. Mario R. Eden, Professor of Chemical Engineering
Abstract
This dissertation presents the research performed to develop innovative multivariate
process monitoring and soft sensor approaches for industrial processes.
In the operation of modern industrial processes, the aim is to improve product quality
and yield in shorter periods of time while observing tighter environmental regulations
and safety guidelines. Furthermore, these goals must be accomplished in the face of severe
global competition and shorter product life cycles. The most reliable approach to achieve
these goals is the development of first principle models (e.g. based on dynamic mass and
energy balances) to describe the physics, chemistry, and the different phenomena occurring
in the process. However, because industrial processes are generally quite complex to model
and are characterized by significant inherent nonlinearities, a rigorous theoretical modeling
approach is often impractical and requires a great amount of effort. An alternative approach
is to use process data. In most modern industries, a large number of variables are measured
and stored automatically by the governing control system (e.g. a DCS). The stored data
correspond to different measurement instruments that are used for closed-loop control or
simply as indicators of different conditions in process units. The process data can be used
for several purposes. These include monitoring the process over time
in order to detect special events (e.g. faults) and assign causes for them. Furthermore, the
data can be used to predict key process variables that are difficult to measure online (e.g.
quality) based on easy-to-measure process variables.
In the first part of this work (PART I), a new multivariate method to monitor continuous
and batch processes is developed based on the statistics pattern analysis (SPA) framework.
The SPA framework is developed to address challenges present in industrial data such as
nonlinearities and multi-modal distributions, which are characteristics of process data that
normally affect traditional multivariate monitoring methods. In addition, different unique
characteristics of the SPA framework are discussed in detail. Also presented are compar-
isons of SPA-based monitoring with several linear and nonlinear multivariate monitoring
methods. The comparison results
clearly demonstrate the superiority of the SPA-based monitoring in key aspects such as de-
tecting and diagnosing different faults. The research findings indicate that the SPA-based
monitoring is a promising alternative for monitoring industrial processes, especially where
nonlinearities and multi-modal distributions are present.
In the second part of this work (PART II), the idea of obtaining quality estimates of key
process variables using multivariate projection methods is explored. In many industrial pro-
cesses, the primary product variable(s) are not measured online or not measured frequently,
but are required for feedback control. To address these challenges, there has been increased
interest in developing data-driven software sensors (or soft sensors) that use secondary
measurements (i.e., easy-to-measure variables) and multivariate regression techniques.
The prediction of these key process variables not only provides real-time estimates, but
also aids in improving quality and preventing undesirable operating conditions.
In this work, the properties and characteristics of dynamic partial least squares (DPLS)
based soft sensors are studied in detail. In addition, a reduced-order DPLS (RO-DPLS) soft
sensor approach is proposed to address the limitation of the traditional DPLS soft sensor
when applied to model processes with large transport delay. The RO-DPLS soft sensor is
also extended to its adaptive version to cope with processes where frequent process changes
occur. Also presented are alternatives to address practical issues such as data scaling and
the presence of outliers off-line and online. The application of the proposed soft sensor is il-
lustrated with various simulated and industrial case studies of a continuous Kamyr digester.
In both simulation and industrial case studies, the proposed RO-DPLS soft sensor provides
reliable estimates of key process variables in the presence of different disturbances, data
outliers, and time-varying conditions.
Acknowledgments
This dissertation would not have been possible without the help of many people. First
and foremost, I'd like to thank my advisor, Dr. Jin Wang, whose encouragement, guidance
and support from the initial to the final level enabled me to develop a deeper understanding.
Her example of hard work and dedication has definitely made me a better researcher.
I would also like to thank my research committee members, Dr. Qinghua He, Dr. Mario
R. Eden, and Dr. Harry T. Cullinan for their words of encouragement and their valuable
comments which have helped me complete this dissertation.
I would like to show my gratitude to Dr. Qinghua He. His contributions to develop my
understanding of different aspects of my research field have been fundamental and definitely
beyond the call of duty. Special thanks are given to Dr. Christopher B. Roberts, Dr.
Mark E. Byrne and Dr. Mario R. Eden who have always provided advice, whether it was
scholarly or personal. I'd also like to thank Dr. John Hung for his classes and teaching
style, which have had a tremendous impact on my career and on my development as a researcher.
Thanks are also due to current and past members of Wang's Research Group, especially
Meng Liang, Min Kim, and Xiu Wang; I wish you the best in your future careers.
For their financial support, I would like to thank the National Science Foundation and the
Alabama Center for Paper and Bioresource Engineering.
On the non-technical side, I owe my deepest gratitude to my parents Joaquin Galicia
and Carmelina Escobar for teaching me that a good attitude and effort are a fundamental
part of a happy life. To my sister Melina, I will never forget you and will love you forever.
Last, but not least, I would like to thank my beautiful wife Arianna for everything she
is and does. You are the love of my life and my inspiration "Hasta El Final".
Thank you God for showing me the path of light through the heart of my wife, my
family, my friends, my professors, and the Auburn Family, I trust You with all my heart.
Hector J. Galicia Escobar
Auburn, Alabama
April 09, 2012
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
1 Introduction and Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . 1
PART I: Advanced Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Multivariate Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 The PCA Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Fault Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.4 Fault Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Dynamic Principal Component Analysis . . . . . . . . . . . . . . . . . . . . 16
2.3 Kernel Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.2 The KPCA Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Fault Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.4 Fault Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Independent Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.2 The ICA Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.3 Fault Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.4 Fault Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Statistics Pattern Analysis: A Novel Process Monitoring Framework . . . . . . . 28
3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 The SPA Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Fault Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3.1 Statistic patterns generation . . . . . . . . . . . . . . . . . . . . . . 31
3.3.2 Dissimilarity Quantification . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.3 Why SPA works? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Fault Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5.1 Contributions to Dp and Dr . . . . . . . . . . . . . . . . . . . . . . . 39
3.5.2 Hierarchical Contribution Approach . . . . . . . . . . . . . . . . . . . 41
3.6 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4 Comprehensive Comparison of Linear and Nonlinear Monitoring Methods . . . . 44
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Quadruple Tank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2.1 Simulation Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3 The Effect of Window Overlapping . . . . . . . . . . . . . . . . . . . . . . . 51
4.3.1 Fault 1: The Sensor Bias . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3.2 Fault 2: Leakage in Tank 1 . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4 Tennessee Eastman Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4.1 Simulation Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4.2 Monitoring Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5 Fault detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Detection Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.2 False Alarm Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5.3 Detection Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Fault diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.7 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
PART II: Soft Sensor Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Partial Least Squares Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1 PLS Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Recursive PLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Block-wise RPLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4 Block-wise RPLS with Moving Window . . . . . . . . . . . . . . . . . . . . . 81
5.5 Block-wise RPLS with Forgetting Factor . . . . . . . . . . . . . . . . . . . . 81
5.6 Online Data Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6 A Reduced-Order Dynamic PLS Soft Sensor Approach . . . . . . . . . . . . . . 86
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2 The Relationship of DPLS Soft Sensor and Subspace Identification . . . . . . 90
6.3 Reduced Order DPLS Soft Sensor . . . . . . . . . . . . . . . . . . . . . . . . 97
6.4 Implementation of the RO-DPLS Soft Sensor . . . . . . . . . . . . . . . . . . 100
6.4.1 Regressor Variable Selection . . . . . . . . . . . . . . . . . . . . . . . 101
6.4.2 Variable Delay Estimation . . . . . . . . . . . . . . . . . . . . . . . . 102
6.4.3 Variable Past Horizon Estimation . . . . . . . . . . . . . . . . . . . . 103
6.5 Application to a Simulated Continuous Digester . . . . . . . . . . . . . . . . 103
6.5.1 Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.5.2 Soft Sensors Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.5.3 Open Loop Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.5.4 Closed-Loop Case Study . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.6 Application to an Industrial Kamyr Digester . . . . . . . . . . . . . . . . . . 111
6.7 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7 Recursive Reduced-Order Dynamic PLS Soft Sensor . . . . . . . . . . . . . . . . 117
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.2 Brief Review of the RO-DPLS Soft Sensor . . . . . . . . . . . . . . . . . . . 119
7.3 Comparison of the RO-DPLS Soft Sensor Performance and Practical Aspects 120
7.3.1 Description of the Data Sets and Case Studies . . . . . . . . . . . . . 121
7.3.2 RO-DPLS Model Update Performance Comparison . . . . . . . . . . 123
7.4 Application of the Recursive RO-DPLS Soft Sensor to Digester Control . . . 139
7.4.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.4.2 Controller Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.4.3 Comparison of Closed-Loop Performance . . . . . . . . . . . . . . . . 141
7.5 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8 Bayesian Supervision of Outliers for a Recursive Soft Sensor Update . . . . . . . 145
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.2 Outlier Detection Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.2.1 Off-line Outlier Detection for Initial Model Building . . . . . . . . . . 147
8.2.2 Online Outlier Detection for Recursive Model Update . . . . . . . . . 147
8.3 Bayesian Supervisory Approach for Online Outlier Classification . . . . . . . 148
8.3.1 Bayesian Outlier Classification Procedure . . . . . . . . . . . . . . . . 150
8.4 Soft Sensor Recursive Update with Outlier Detection and Classification . . . 151
8.5 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9 Conclusions and Proposed Future Work . . . . . . . . . . . . . . . . . . . . . . 158
9.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.1.1 Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.1.2 Soft Sensor Development . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.2 Suggestions for Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
9.2.1 Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
9.2.2 Soft Sensor Development . . . . . . . . . . . . . . . . . . . . . . . . . 164
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
A Derivation of the linear relationship between ec(k) and es(k) . . . . . . . . . . . . 188
B Detailed evaluation of E{ecf(k) [[zp(k)]T [zf−1(k)]T]} . . . . . . . . . . . . . . . . 190
C Detailed derivation of Equation (5.13) and (5.14) for mean and variance recursive
update. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
D Derivation of Equation (5.16). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
E Detailed derivation of Equation (5.17) and (5.18) for block mean and variance
recursive update. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
List of Figures
3.1 Illustration of steps involved in multiway PCA (MPCA). (a) Original batch records
with unequal length; (b) 3-dimensional array after trajectory alignment and mean
shifting; (c) Batch information for MPCA after unfolding (e.g. a matrix). . . . . 30
3.2 Schematic of the SPA framework for fault detection in batch processes. (a)
Original batch records with unequal length; (b) SP generation (e.g. various
batch statistics); (c) fault detection by dissimilarity quantification. . . . . . . . 31
3.3 Schematic of the window-based SPA method for monitoring continuous processes:
(a) original process data; (b) computed statistics patterns; (c) fault detection by
dissimilarity quantification. A block or a window of process variables shaded in
(a) is used to generate an SP shaded in (b). SPs are used to generate a point of
dissimilarity measure (e.g. shaded in (c)) to perform fault detection. . . . . . . 31
3.4 Scatter plot of the process variables for the nonlinear illustrative example . . . . 36
3.5 Scores of normal and faulty samples for PCA with the T2 threshold calculated
at the 99% confidence level (dashed-line). . . . . . . . . . . . . . . . . . . . . . 37
3.6 T2 of testing data. (a) Fault A; (b) Fault B. The dashed line indicates the
threshold on the T2 at the 99% confidence level. The dashed-dotted line indicates
the fault onset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.7 Density histogram and intensity map for the multivariate distribution of x1 and
x2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.8 Density histogram intensity map for different pairs of statistics for normal oper-
ating data. (a) Mean of x1 and mean of x2; (b) Std. deviation of x1 and Std.
deviation of x2; (c) Skewness of x1 and Skewness of x2; (d) Kurtosis of x1 and
Kurtosis of x2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.9 Scores of normal and faulty samples for SPA with the Dp threshold calculated at
the 99% confidence level (dashed-line). . . . . . . . . . . . . . . . . . . . . . . . 41
3.10 Dp and Dr indices of testing data. (a) Fault A; (b) Fault B. The dashed line
indicates the threshold on the Dp and Dr indices calculated at the 99% confidence
level. The dashed-dotted line indicates the fault onset. . . . . . . . . . . . . . . 42
4.1 Schematic diagram of the quadruple-tank process . . . . . . . . . . . . . . . . . 48
4.2 Time series of process variables for the sensor fault in h4 . . . . . . . . . . . . . 49
4.3 Time series of process variables for the leakage fault in h1 . . . . . . . . . . . . 50
4.4 Fault detection performance. (a) PCA case II, (b) DPCA case II, (c) SPA case
I, (d) SPA case II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.5 Score plot of the first two principal components for the SPA method. (a) Highly
correlated SPs (95% window overlapping); (b) Uncorrelated SPs (i.e., with-
out window overlapping). The dashed-line indicates the control limit of the
Hotelling's T2 calculated at the 99% confidence level. . . . . . . . . . . . . . . . 54
4.6 Dp and Dr indices for normal testing data. (a) SPA case I, (b) SPA case II. . . 55
4.7 Contribution plots based on (a) PCA SPE; (b) PCA T2; (c) DPCA SPE; (d) DPCA
T2; (e) SPA Dr; (f) SPA Dp for case II with bias in h4, where h̄i and hi−Li
denote the mean of hi and a lagged measurement of hi, respectively. . . . . . . . 56
4.8 Fault detection performance. (a) PCA case II, (b) DPCA case II, (c) SPA case
I, (d) SPA case II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.9 Contribution plots based on (a) PCA SPE; (b) PCA T2; (c) DPCA SPE; (d) DPCA
T2; (e) SPA Dr; (f) SPA Dp for case II with bias in h4, where h̄i and hi−Li
denote the mean of hi and a lagged measurement of hi, respectively. . . . . . . . 58
4.10 Flow diagram of the Tennessee Eastman processes . . . . . . . . . . . . . . . . . 59
4.11 Detection performance of PCA, DPCA, KPCA, ICA and SPA for fault 5. (a) PCA;
(b) DPCA; (c) KPCA; (d) ICA; (e) SPA. The dash-dotted lines indicate the fault
onsets and dashed lines the index thresholds. . . . . . . . . . . . . . . . . . . . 67
4.12 Detection performance of PCA, DPCA, KPCA, ICA and SPA for fault 12.
(a) PCA; (b) DPCA; (c) KPCA; (d) ICA; (e) SPA. The dash-dotted lines in-
dicate the fault onsets and dashed lines the index thresholds. . . . . . . . . . . . 68
4.13 Detection performance of PCA, DPCA, KPCA, ICA and SPA for fault 18.
(a) PCA; (b) DPCA; (c) KPCA; (d) ICA; (e) SPA. The dash-dotted lines in-
dicate the fault onsets and dashed lines the index thresholds. . . . . . . . . . . . 69
4.14 Diagnosis of fault 10. (a) PCA (top T2, bottom SPE); (b) DPCA (top T2,
bottom SPE); (c) KPCA (reconstruction index); (d) ICA (top I2d, middle I2e,
bottom SPE); (e) SPA (top row Dp, bottom row Dr). . . . . . . . . . . . . . 71
4.15 Diagnosis of fault 5. (a) PCA (top T2, bottom SPE); (b) DPCA (top T2, bottom
SPE); (c) KPCA (reconstruction index); (d) ICA (top I2d, middle I2e, bottom
SPE); (e) SPA (top row Dp, bottom row Dr). . . . . . . . . . . . . . . . . . 72
4.16 Diagnosis of fault 12. (a) PCA (top T2, bottom SPE); (b) DPCA (top T2,
bottom SPE); (c) KPCA (reconstruction index); (d) ICA (top I2d, middle I2e,
bottom SPE); (e) SPA (top row Dp, bottom row Dr). . . . . . . . . . . . . . 73
6.1 Schematic of a high-yield single-vessel Kamyr digester . . . . . . . . . . . . . . 87
6.2 Different delays associated with variables at different locations. Subscripts in the
legend are the indices of the CSTRs used to simulate the digester where 1 and 50
correspond to the top and bottom of the digester respectively. Refer to Section 6.5
for the simulator details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.3 Flow diagram of the proposed RO-DPLS approach . . . . . . . . . . . . . . . . 101
6.4 Soft sensor performance comparison - open loop: (a) DPLS; (b) RO-DPLS . . . 109
6.5 Soft sensor performance comparison - closed-loop: (a) DPLS; (b) RO-DPLS . . 112
6.6 Soft sensor performance comparison - industrial case: (a) DPLS; (b) RO-DPLS 114
7.1 Effect of different update intervals in four model update schemes applied to a
simulated case study. (a) MSE; (b) ME. . . . . . . . . . . . . . . . . . . . . . . 126
7.2 Effect of different update intervals in four model update schemes applied to in-
dustrial case study I. (a) MSE ; (b) ME . . . . . . . . . . . . . . . . . . . . . . . 127
7.3 Effect of different update intervals in four model update schemes applied to in-
dustrial case study II. (a) MSE ; (b) ME . . . . . . . . . . . . . . . . . . . . . . 128
7.4 Effect of different forgetting factors on soft sensor performance for a fixed update
interval in three case studies. (a) MSE; (b) ME. . . . . . . . . . . . . . . . . . 129
7.5 Prediction performance of different model update schemes applied to a simulated
case study. RPLS vs. FF-BRPLS (a); BRPLS vs. MW-BRPLS (b). . . . . . . . 130
7.6 Prediction performance of selected cases for different model update schemes ap-
plied to industrial case study I. RPLS (a); BRPLS (b); MW-BRPLS (c); FF-
BRPLS with λ = 0.1 (d). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.7 Prediction performance of selected cases for different model update schemes ap-
plied to industrial case study II. RPLS (a), BRPLS (b), MW-BRPLS (c), and
FF-BRPLS with λ = 0.1 (d). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.8 Simulated case study, MSE (a) and ME (b) of RO-DPLS prediction with fixed
or cross-validation selected components for a range of updating intervals. . . . . 136
7.9 Histogram of the number of components determined through cross-validation
in the simulated case study for two different cross-validation methods. (a) four
contiguous blocks cross-validation with model updated every 10 samples; (b) four
contiguous blocks cross-validation with model updated every 100 samples; (c)
leave-one-out cross-validation with model updated every 10 samples; (d) leave-
one-out cross-validation with model updated every 100 samples. . . . . . . . . . 137
7.10 Histogram of the number of components determined through cross-validation (a)
and comparison of prediction performance for fixed number of components vs.
cross-validation approach (b) for industrial case study II. . . . . . . . . . . . . . 139
7.11 Open loop response to a wood type disturbance and soft sensor prediction . . . 142
7.12 Comparison of closed-loop response for CL-1 (Kappa measurement feedback) and
CL-2 (soft sensor prediction feedback) . . . . . . . . . . . . . . . . . . . . . . . 143
8.1 Bayesian disturbance classification procedure . . . . . . . . . . . . . . . . . . . 151
8.2 Prediction comparison of different approaches applied to a simulated case study 153
8.3 SPE indices for the simulated case study. (a) SPEx; (b) SPEy . . . . . . . . . . 154
8.4 Comparison of predictions of different approaches for the industrial case study . 155
8.5 SPE indices for the industrial case study. (a) SPEx; (b) SPEy . . . . . . . . . . 156
List of Tables
4.1 Simulation parameters for the quadruple tank case study* . . . . . . . . . . . . 48
4.2 Model settings for the quadruple tank case study . . . . . . . . . . . . . . . . . 52
4.3 Fault detection for bias in h4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4 Fault detection for leakage in tank 1 . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Disturbances in the TEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.6 Monitored variables in TEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.7 Settings of different fault detection methods . . . . . . . . . . . . . . . . . . . . 62
4.8 Fault detection rates (percentage) of PCA, DPCA, KPCA, ICA, and SPA for
the TEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.9 False alarm rates (percentage) of PCA, DPCA, KPCA, ICA, and SPA for the TEP 64
4.10 Fault detection delay (samples) of PCA, DPCA, KPCA, ICA, and SPA for the
TEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.1 Variances of the white measurement noise for the primary and secondary outputs 104
6.2 Selected regressor variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.3 Dynamic soft sensor setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.4 Variances of the white noise for the unmeasured disturbances . . . . . . . . . . 106
6.5 Comparison of DPLS and RO-DPLS: open loop cases . . . . . . . . . . . . . . . 108
6.6 Controller tuning parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.7 Comparison of DPLS and RO-DPLS: closed-loop case . . . . . . . . . . . . . . . 111
6.8 RO-DPLS model parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.9 Comparison of soft sensor performance in the industrial case . . . . . . . . . . . 113
7.1 Standard deviations of the white noise for the unmeasured disturbances . . . . . 121
7.2 Standard deviations of the white measurement noise for the primary and sec-
ondary outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3 Selected regressor variables for the simulated case study . . . . . . . . . . . . . 122
7.4 RO-DPLS model parameters for the simulated case study . . . . . . . . . . . . 123
7.5 RO-DPLS model parameters for the industrial case studies . . . . . . . . . . . . 124
7.6 Settings and performance indices of RO-DPLS with different model update schemes
for industrial case study I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.7 Settings and performance indices of RO-DPLS with different model update schemes
for industrial case study II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.8 PID controller settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.1 Performance of different soft sensors for the simulated case study . . . . . . . . 153
8.2 Performance of different soft sensors for the industrial case study . . . . . . . . 156
List of Abbreviations
(·)T transpose of a vector or matrix
(·)−1 inverse of a matrix
(ˆ·) prediction
α significance level
C ∈ Rm×p coefficient matrix in PLS
Λ diagonal matrix of the a retained eigenvalues in PCA
λ forgetting factor
⟨x, y⟩ inner product of vectors x and y
τD derivative time
τI integral time
θ process time delay
a number of retained principal components in PCA/PLS
Dp monitoring index for the principal component subspace in SPA
Dr monitoring index for the residual subspace in SPA
E{·} expected value of a random variable
f future horizon
Kc proportional gain
la time lag between measurements for DPLS and auto-/cross-correlation statistics
m number of inputs in X
n number of samples in X and Y
p number of outputs in Y
T2 Hotelling?s T2 statistic
V ∈ Rm×n noise matrix
w window size
A, B, Cs, Cc system matrices for an LTI system
K the steady-state Kalman filter gain
P ∈ Rm×a loading matrix for X
R ∈ Rn×p residual matrix of Y
T ∈ Rn×a score matrix for X
X ∈ Rm×n original process matrix
x(k) sample vector for inputs
U ∈ Rn×a score matrix for Y
B ∈ Ra×a diagonal matrix of inner model coefficients in PLS
Q ∈ Rp×a loading matrix for Y
Y ∈ Rn×p matrix of primary measurements
AA Active Alkali
ANN Artificial Neural Networks
BRPLS Block-Wise Recursive Partial Least Squares
DCS Distributed Control System
DL Dissolved Lignin
DPCA Dynamic Principal Component Analysis
DPLS Dynamic Partial Least Squares
DS Total Dissolved Solids
EA Effective Alkali
FF-BRPLS Forgetting Factor Block-Wise Recursive Partial Least Squares
HOS Higher-Order Statistics
IC Independent Component
ICA Independent Component Analysis
KPCA Kernel Principal Component Analysis
LTI Linear Time Invariant
MLR Multiple Linear Regression
MPC Model Predictive Controller
MPCA Multiway Principal Component Analysis
MPE Mean Prediction Error
MPLS Multiway Partial Least Squares
MSPE Mean Squared Prediction Error
MSPM Multivariate Statistical Process Monitoring
MW-BRPLS Moving Window Block-Wise Recursive Partial Least Squares
NIPALS Nonlinear Iterative Partial Least Squares
PCA Principal Component Analysis
PCR Principal Component Regression
PCS Principal Component Subspace
PID Proportional-Integral-Derivative Controller
PLS Partial Least Squares
RO-DPLS Reduced-order Dynamic Partial Least Squares
RPLS Recursive Partial Least Squares
RS Residual Subspace
SP Statistic Pattern
SPA Statistics Pattern Analysis
SPE Squared Prediction Error statistic
SPM Statistical Process Monitoring
SVD Singular Value Decomposition
SVM Support Vector Machine
TEP Tennessee Eastman Process
Chapter 1
Introduction and Dissertation Outline
Modern chemical plants normally have a large variety of sensors dedicated solely to the
task of monitoring equipment and process conditions. As a result, a large number of
variables are automatically measured and stored by the governing control system. This has led
to the collection of vast amounts of data at different rates (e.g. seconds, minutes or hours).
In the last few decades, researchers have dedicated considerable effort to developing and
improving ways to analyze these data. One of the most active research areas in recent decades
has been the development of Statistical Process Monitoring (SPM) methods [1–11]. Traditional
data-based monitoring methods include the Shewhart [1], cumulative sum (CUSUM) [3],
and exponentially weighted moving average (EWMA) control charts [4]. These methods are
categorized as "univariate" since they only look at the behavior of a single variable at a time.
Univariate approaches are utilized to monitor key process variables in order to distinguish
between common causes of variation and special (or assignable) causes [1, 12]. The first type
of variation refers to variations that affect the process all the time and are unavoidable within
the current process [6]. Special causes, on the other hand, are those producing deviations
that can potentially upset the process and drive it away from the desired operating point. It is
well known that univariate monitoring approaches can fail [2], as these methods ignore the
correlation among the different variables that affect a certain key process variable
(e.g., quality). To overcome the difficulty of simultaneously analyzing hundreds of process
variables collected at different rates, Multivariate Statistical Process Monitoring (MSPM)
techniques, which make use of the multivariate space, have been extensively studied by the
research community [13–15], and provide superior performance compared to the univariate
SPM control charts. MSPM techniques are designed to extract the main underlying phe-
nomena that are driving the process at a certain time from the process data. Most of the
MSPM techniques rely on the properties of chemometrics techniques. Wise et al. [16] define
chemometrics as the science of relating measurements made on a chemical system to the state
of the system via application of mathematical or statistical methods. In essence, chemomet-
rics techniques make use of the available data to develop an empirical model that allows
estimation of properties of interest such as quality or yield. The success of MSPM based on
chemometrics techniques relies on the fact that multivariate statistical models extract the
correlations among variables due to physical and chemical phenomena in an empirical way.
Chemometrics techniques such as principal component analysis (PCA) [2, 13, 17, 18] and
partial least squares (PLS) [19–25] are commonly applied. These methods are particularly
suited to analyzing large sets of correlated data. This is achieved by finding linear com-
binations of the original variables and projecting them in a lower-dimensional space. This
reduced space contains most of the variance observed originally and accounts for correlations
among different variables. Some pioneering work using PCA-based monitoring is presented
in the work of Jackson [14], Jackson and Mudholkar [15], Jackson [2], Kresta et al. [26], and
more recently Wise et al. [27]. The application of MSPM monitoring methods have been
reported successful in chemical and petrochemical industries [10, 11, 28?30] for batch and
continuous processes. In industry, one of the major benefits of MSPM monitoring methods
is associated with the detection and diagnosis of faults. By detecting failing sensors and
process upsets, MSPC techniques complement the role of automatic process control (APC)
to improve safety and process efficiency. It is important to point out that the goal of MSPM
approaches is different from automatic process control. APC is applied to reduce the vari-
ability of key process and product variables. This is achieved by compensating from the
observable behavior of disturbances in the process and product variables via manipulation
of less important variables ( e.g. manipulated variables) [6, 31]. MSPM methods are an
additional layer that is applied on top of the process and its automatic control system. It
is the goal of any monitoring approach to confirm that the process remains in the desired
operating region, and otherwise to detect and diagnose the process behavior that indicates
the occurrence of a special event.
Although MSPM methods such as PCA-based monitoring methods have been extremely
successful in many applications, their monitoring performance can deteriorate depending on
the characteristics of the process. There are two main reasons for this deterioration in
performance. First, PCA is a second-order method, i.e., only the mean and the variance-covariance of
the process variables are considered to characterize the process. As a result, PCA lacks
the capability of providing higher-order representations for non-Gaussian data [32], which
are common in industrial data due to the nonlinear characteristics of processes and the effect of
feedback control. In addition, the theoretical development to determine the thresholds for
the traditional monitoring indices in PCA (e.g., Hotelling's T² and SPE) is based on a
multivariate normal distribution assumption. Therefore, when the scores (or latent variables) are
non-Gaussian due to process nonlinearity or other reasons, the utilization of the
traditional monitoring indices may lead to a less satisfactory performance [32, 33].
In the first part of this work, a novel process monitoring framework termed statistics pattern
analysis (SPA) is developed. Its advantages for monitoring industrial processes are
discussed, and its performance is comprehensively compared with traditional linear and
nonlinear monitoring methods. In addition, the SPA framework is extended to fault diagnosis
based on contribution plots [5, 6, 9, 10, 34–36] to cope with the difficulties of traditional
methods in isolating the root cause of faults.
It is well known that regular PCA is only concerned with finding linear combinations of the
original variables contained in a matrix X that explain most of the variability originally
contained in them. In several cases, including a group of primary measurements of greater importance
(e.g., quality or yield) could help indicate how the process is behaving. Unfortunately,
in the majority of cases, these variables are not measured as frequently, for several
reasons (e.g., the lack of hardware sensors or online analyzers). To address this challenge, soft sensors
(or inferential estimators) have been developed. Soft sensors make use of the information
contained in the matrix of process variables X to estimate the primary measurements Y. Soft
sensors can be developed in the form of a Luenberger observer [37] or a Kalman filter [38]
using a first principles dynamic model of the process. However, because chemical processes
are generally quite complex to model and are characterized by significant inherent nonlinear-
ities, a rigorous theoretical modeling approach is often impractical, requiring a great amount
of effort [39]. As a consequence, data-driven soft sensors have been developed and studied
extensively. Multiple linear regression (MLR) is the most basic form of model to design a
soft sensor. However, due to correlations among process variables, the size of the data set, and
measurement noise, MLR methods lead to imprecise estimation and poor prediction [26, 40].
Data-driven soft sensors based on multivariate regression techniques have been developed to
overcome the dimensionality, measurement noise, and collinearity problems [8, 9, 28, 39,
41–49]. Because data-driven soft sensors are easy to develop and to implement online, they
are potentially more attractive than stochastic filters or deterministic observers [39]. Artifi-
cial neural networks (ANN), principal component regression (PCR) or principal component
analysis (PCA), and partial least squares (PLS) are widely used regression techniques, and
their successful application to the development of soft sensors has been reported for different
processes [9, 47, 50–54]. Dynamic PCA (DPCA) and dynamic PLS (DPLS), the extensions
of the conventional PCA and PLS algorithms for dynamic modeling, have been developed
to cope with the dynamic nature of processes. In these methods, the original input data
matrices are augmented with lagged measurements [29, 55].
Among the different soft sensor approaches, DPLS-based soft sensors have been reported as
successful in different types of processes [44, 48, 56–58]. However, despite many successful
applications, one limitation associated with DPLS is that it is usually viewed as an ad hoc
method. Additionally, there is a lack of theoretical understanding of whether it can capture
process dynamics adequately and whether it can provide unbiased estimates for both
open- and closed-loop data. Another problem encountered when performing DPLS is that
the addition of a relatively large number of lagged values in the input data matrix causes
a substantial increase in the dimension of the matrix. This implies that the computational
load required to develop the multivariate regression becomes considerably heavier,
and the regression procedure gets much lengthier. This is particularly true for spline PLS
and ANN PLS, which are based on computationally more intensive algorithms. As a result,
the model becomes more cumbersome to manipulate due to the presence of a higher number
of parameters [46]. Furthermore, as will be seen later in this work, when the process residence
time is long, a large number of lagged measurements and more training data are needed for
DPLS to capture the process dynamics and to provide satisfactory soft sensor performance.
To address these issues, in the second part of this work, a theoretical proof is derived to show
that DPLS-based soft sensors are dynamic estimators that can adequately capture process
dynamics and are able to provide unbiased estimates of primary variables for both open-loop
and closed-loop data. In addition, a reduced-order DPLS (RO-DPLS) approach with
optimally selected lagged measurements of secondary variables (i.e., variables that are easy
to measure) is proposed. The RO-DPLS soft sensor not only reduces model size and
improves prediction, but also significantly increases the prediction horizon. The RO-DPLS soft
sensor is also extended to an adaptive version to cope with processes that undergo frequent
changes, addressing practical issues such as data scaling and the presence of outliers both off-line
and online.
To summarize, this dissertation focuses on developing advanced data-driven monitoring and
soft sensor strategies that can cope with different industrial processes and their characteristics.
PART I : ADVANCED PROCESS MONITORING
Chapter 2 presents a review of the different data-driven techniques for process monitoring
compared in this work. These methods include principal component analysis (PCA), dynamic
PCA (DPCA), kernel PCA (KPCA), and independent component analysis (ICA). In this chapter,
the main characteristics of the different algorithms and methods for fault detection and di-
agnosis are discussed.
In Chapter 3, a novel process monitoring framework termed statistics pattern analysis (SPA)
is introduced [32, 59–62]. The major difference between SPA and other multivariate
statistical process monitoring methods is that in SPA, instead of monitoring the process variables
directly, the statistics of the process variables are monitored. In this chapter, the
characteristics of the SPA framework for fault detection are discussed. In addition, the framework
is extended to fault diagnosis based on contribution plots. This chapter contains illustrative
examples that show in detail how and why the SPA framework works. Finally, the
main characteristics and advantages of the SPA framework are discussed.
In Chapter 4, the potential of the SPA framework for fault detection and diagnosis is
evaluated using two case studies. In the first case study, a nonlinear system is simulated to
examine the performance and properties of the SPA framework.
In addition, the SPA framework performance is compared with traditionally applied linear
methods for process monitoring. In this chapter, it is demonstrated that SPA is able to han-
dle nonlinear processes, and is able to detect and diagnose subtle changes in the dynamics
of the system. In this case study, compared to the traditional PCA and DPCA methods,
SPA provides a more accurate, efficient, and reliable detection and diagnosis of faults. Two
cases with different training-sample sizes are also studied. The study indicates
that SPA can also perform well with a limited amount of data. However, for the case where
data are limited, window overlapping is necessary. If the window overlap is large, the
samples are no longer independent from each other, which increases the false alarm rate
for normal operating data. For the case where enough data are available (i.e., slight to no
window overlapping), SPA is not only able to provide high detection rates and a clearer
identification of a fault, but it also considerably reduces the false alarms for normal operating
data and shows limited fault detection delay. In the second case study, SPA-based
monitoring is tested at a plant-wide level.
The Tennessee Eastman process is used to evaluate the performance of the developed SPA-based
fault detection and diagnosis methods. In this case study, the performance of SPA is
compared with two commonly applied linear monitoring methods (i.e., PCA and DPCA)
and two nonlinear monitoring methods (i.e., ICA and KPCA). This case study shows that,
in general, all five methods work well in detecting and diagnosing most of the faults. However,
for some faults that are difficult to detect and/or diagnose with the PCA-, DPCA-, KPCA-, and
ICA-based methods, the SPA-based method provides superior performance. In addition, because the
SPA-based method breaks down the contribution of a fault into contributions from different
variable statistics, it provides extra information beyond identifying the major fault-contributing
variable(s). For example, the SPA-based contribution plots indicate whether the fault is due to a
change in a variable's mean or variance, which is a tremendous advantage for online process
monitoring. The excellent performance observed for fault detection and diagnosis, together with
the less intensive computational load required to implement SPA, makes the framework a promising
alternative for monitoring industrial processes, especially when nonlinearities are prevalent.
PART II : SOFT SENSOR DEVELOPMENT
In industrial processes, several process variables that indicate product quality or
yield are not measured frequently. This issue not only prevents the establishment of feedback
control strategies, but also forces the exclusion of these variables from the process
monitoring strategy. As a consequence, the performance of industrial processes deteriorates. To
address this challenge, soft sensors (or inferential estimators) have been developed. Among
the different methods to develop soft sensors, the focus here is on dynamic partial least
squares (DPLS) based soft sensors, which have been successfully applied
to provide estimates of primary variables (e.g., quality or yield) in several processes.
As mentioned before, despite many successful applications, one limitation associated with
DPLS is that it is usually viewed as an ad hoc method, as there is a lack of theoretical
understanding of whether it can capture process dynamics adequately and whether it can
provide unbiased estimates for both open- and closed-loop data. Another problem encountered
when performing DPLS is that the addition of a relatively large number of lagged
values in the input data matrix causes a substantial increase in the dimension of the matrix.
This implies that the computational load required to develop the multivariate regression
becomes considerably heavier, and the regression procedure gets much lengthier. As a result,
the model becomes more cumbersome to manipulate due to the presence of a higher number
of parameters. In addition, when the process residence time is long, a large number of lagged
measurements and more training data are needed for DPLS to capture the process dynamics
and to provide satisfactory soft sensor performance.
Given the importance of PLS in the development of this work, in Chapter 5, a brief review
of the PLS method, its adaptive versions, and the data scaling schemes that accompany the
different versions of a PLS model is presented.
In Chapter 6, a rigorous proof is derived to show that a DPLS-based soft sensor is a dynamic
estimator that can adequately capture process dynamics and provide unbiased estimates
of the primary variable for both open- and closed-loop data [63–66]. Furthermore, a
reduced order DPLS (RO-DPLS) approach with optimally selected lagged measurements of
the secondary variables is proposed. The RO-DPLS soft sensor not only reduces model size
and improves prediction, but also significantly increases prediction horizon (i.e., provides
future predictions). The performance of the proposed RO-DPLS is demonstrated using both
a simulated single-vessel continuous digester and an industrial Kamyr digester. The simula-
tion results confirm that the DPLS and RO-DPLS soft sensors can capture process dynamics
adequately for both open loop and closed-loop operations. In addition, it is shown that the
RO-DPLS soft sensor performs better than the traditional DPLS approach, especially when
training data are limited. The industrial case study also confirms that RO-DPLS provides
an effective and practical solution to estimate the Kappa number (i.e., the key indicator of
the extent of reaction) in a continuous digester.
Real industrial processes are usually time-varying. For example, in the operation of a
continuous digester, the cooking temperature, the wood type utilized, and other conditions vary
with time. Therefore, the soft sensor model developed off-line should be updated to cope
with these changes. To address this difficulty, different adaptive schemes to update the soft
sensor online are investigated in Chapter 7. In this chapter, the developed RO-DPLS soft
sensor is extended to an online adaptive version [67–70]. The properties of four different
model update schemes are examined. These adaptation techniques include recursive PLS
(RPLS), block-wise RPLS (BRPLS), moving-window BRPLS (MW-BRPLS), and forgetting-factor
BRPLS (FF-BRPLS). In addition, the data scaling methods that accompany
the different updating schemes are studied. Extensive experiments were performed with one
simulated case study and two industrial case studies to investigate the properties of different
model update schemes. Results of the study indicate that, for processes that
experience constant changes and various disturbances, such as the digester process, recursive
RO-DPLS with the RPLS model update scheme generally performs best with frequent
updates. The effectiveness of the recursive RO-DPLS soft sensor is demonstrated through
a digester closed-loop control case study, where a PID controller is used to reject a major
process disturbance (i.e., a wood type change) as well as other common process disturbances. By
comparing the closed-loop performance between measurement feedback and soft sensor pre-
diction feedback, it is shown that the closed-loop control performance can be significantly
improved if the soft sensor prediction is fed back to the PID controller. The improvement is
mainly due to the reduced overall process delay as well as the more frequent feedback enabled
by the reduced-order model and the high-frequency Kappa number predictions.
In the development of PLS-based soft sensors, outlier detection plays a critical role, especially
for recursive soft sensors that are updated online. In Chapter 8, multivariate approaches for
both off-line outlier detection (for initial soft sensor model building) and online outlier
detection (for soft sensor model recursive update) are proposed [71, 72]. Specifically, for off-line
outlier detection, leverage and y-studentized residuals are combined, while for online outlier
detection, the squared prediction error indices for X and Y are used to monitor the
independent-variable and dependent-variable spaces, respectively. For online outlier detection, to
differentiate outliers caused by erroneous readings from those caused by process changes,
a Bayesian supervisory approach is proposed to further analyze and classify the identified
outliers [71, 72]. Both simulated and industrial case studies demonstrate the superior
performance of the soft sensor with the Bayesian supervisory approach for outlier detection and
classification.
Finally, in Chapter 9, the major contributions of this work are summarized and suggestions
for future research directions are provided.
Chapter 2
Multivariate Process Monitoring
As discussed in the previous chapter, process monitoring techniques are not only key
in determining equipment malfunctions and instrument failures, but also are fundamental
in ensuring process safety, product quality, and process efficiency. As large numbers of
variables are measured and stored automatically by governing control systems, multivariate
statistical monitoring methods have become increasingly common in the process industry.
Multivariate process monitoring techniques are based on different mathematical algorithms.
In addition, the way faults are detected and diagnosed varies from one approach to another.
In this chapter, the main characteristics of some relevant multivariate process monitoring
techniques are reviewed. This chapter reviews process monitoring techniques such as PCA
and dynamic PCA, together with a brief review of nonlinear monitoring techniques such as
Kernel PCA (KPCA) and Independent Component Analysis (ICA). The complete treatment
of these process monitoring methods is beyond the scope of this dissertation. However, enough
background information is included so that the reader can follow the development of these
process monitoring techniques and understand the development of Part I of this work.
2.1 Principal Component Analysis
Given the conceptual importance of PCA in the development of process monitoring
schemes, in this chapter, the PCA method and related strategies for fault detection and
diagnosis based on PCA are reviewed.
2.1.1 Motivation
Highly automated control systems in industrial sites are able to collect thousands of
process measurements every day. These data capture the governing phenomena occurring in
the process. However, the m variables that are selected to represent any given process are
normally highly correlated and noisy. In addition, the number of variables that adequately
represent a process can be very large. These factors can make the analysis of process data
very difficult. The central idea of PCA is to reduce the dimensionality of the original
correlated process data while retaining as much as possible of the variation originally present. This
is achieved by projecting the process data onto a lower-dimensional space, where a new set
of uncorrelated variables, the principal components, captures most of the variation of the
original variables in the first few components [73].
2.1.2 The PCA Method
Let X ∈ ℝ^{n×m} denote the original process data matrix, scaled to zero mean and unit
variance, with n samples (rows) and m process variables (columns). PCA decomposes the
process data matrix X, by either nonlinear iterative partial least squares (NIPALS) [74] or
singular value decomposition (SVD) [75], as follows:

X = TPᵀ + X̃ = TPᵀ + T̃P̃ᵀ = [T T̃][P P̃]ᵀ    (2.1)
where T ∈ ℝ^{n×a} and P ∈ ℝ^{m×a} are known as the score and loading matrices, respectively.
The PCA projection reduces the original set of m variables to a principal components. The
residual matrix X̃ can be further decomposed into the product of the residual score matrix T̃ and
the residual loading matrix P̃. The column space of P is the principal component subspace (PCS),
with dimension a. The column space of P̃ is the residual subspace (RS), with dimension m − a.
The decomposition is made such that [T T̃] is orthogonal and [P P̃] is orthonormal. The columns
of P are the eigenvectors of the correlation matrix of the X variables associated with the
a largest eigenvalues, and they form a new orthogonal basis for the space spanned by X. The
remaining columns (i.e., P̃) correspond to the residual eigenvectors [7]. The p_i loading
vectors (the columns of P), often referred to as principal components, are linear combinations
of the original variables that explain the dominant directions of variability of the information
contained in the original process data matrix X [73, 76]. The score vectors t_i are the projections
of the process variables contained in X onto the new basis vectors p_i.
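As an illustration (not part of the original derivation), the decomposition in Eq. (2.1) can be sketched with NumPy; the function name `pca_decompose` and the simulated data are hypothetical:

```python
import numpy as np

def pca_decompose(X, a):
    """Decompose a scaled data matrix X (n samples x m variables) via SVD,
    returning retained scores T, loadings P, and the residual parts, so that
    X = T @ P.T + T_res @ P_res.T as in Eq. (2.1)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:a].T            # loadings: dominant eigenvectors of the correlation matrix
    T = U[:, :a] * s[:a]    # scores: projections of X onto the loadings
    P_res = Vt[a:].T        # residual loadings (span the residual subspace)
    T_res = U[:, a:] * s[a:]
    return T, P, T_res, P_res

# Example: 100 samples of 5 correlated variables, standardized beforehand
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)

T, P, T_res, P_res = pca_decompose(X, a=2)
print(np.allclose(X, T @ P.T + T_res @ P_res.T))  # the full decomposition is exact
```

With a = 2 retained components, T @ P.T captures the dominant variability, while the residual part contains mostly noise.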
2.1.3 Fault Detection
Once the PCA model has been determined based on historical data, new multivariate
observations in a sample vector x can be referenced against this model. This is achieved by
projecting these observations onto the plane defined by the PCA loading vectors to obtain
their scores and residuals [6]. Traditionally, the projections or scores are plotted to reveal the
relationships of the variables forming a new multivariate observation x. In addition, to reveal
data patterns, to identify clusters, and to determine the influence of different multivariate
observations, two-dimensional plots of the most significant scores, related to the dominant
directions of variability (i.e., the dominant eigenvectors), are used. Furthermore, an evaluation
of the magnitudes of the loadings can serve as an indication of the relationships between
the process variables contained in X. For fault detection, the Hotelling's T² [77] and SPE [15]
charts are commonly used. The SPE index measures variability that breaks the normal
process correlation [8]; in other words, it measures the variation in the residual subspace (RS),
and is defined as follows:

SPE = ‖x − x̂‖² = ‖x̃‖² = ‖(I − PPᵀ)x‖²    (2.2)

where x̂ is the prediction of the multivariate observation x. Assuming that x follows a
multivariate normal distribution, Jackson and Mudholkar [15] developed a confidence limit
for SPE. The process is considered normal if SPE ≤ δ²_α, where δ²_α denotes the upper control
limit for SPE at a significance level α. An alternative way of determining the confidence
limits for the SPE statistic is presented in [78]. The SPE statistic provides a way to test
whether the process data are outside the normal operating region. The SPE statistic is normally
complemented by the Hotelling's T² statistic [77]. The T² index measures the
distance to the origin in the principal component subspace (PCS), i.e., it provides an indication
of abnormal variability in the space defined by the PCS:

T² = xᵀPΛ⁻¹Pᵀx    (2.3)

where Λ is the diagonal matrix of the a largest eigenvalues of the correlation matrix of X. The
process is considered normal if T² ≤ T²_α, where the upper control limit can be determined using
the F-distribution [17]. It is important to point out that the SPE and T² limits can also be determined empirically
using validation data [32], which is often the case for industrial applications. Although the T²
and SPE indices are used in conjunction for process monitoring, as noted in [8], their roles
in process monitoring are asymmetric, i.e., they measure different situations in the process.
The normal region defined by the T² statistic is usually much larger (capturing normal
process variations) than the one defined by the SPE (noise and unexplained variability).
In general, a sample that only violates the upper control limit defined by the T² statistic
could indicate a fault, but it could also indicate a process change (i.e., a move away from the
origin of the PCS), which is not a fault. On the other hand, the SPE statistic defines a tighter
region of normal operation, and a sample that breaks the correlation structure defined in the
model can easily exceed the SPE control limit, thus giving a more reliable indication of the
presence of a fault. This causes the SPE index to have lower chances of false alarms (i.e.,
type I errors) and missed detections (i.e., type II errors) compared to the T² [8]. Evidently,
if a given set of process variables (e.g., quality variables) has a stationary nature (i.e., varies
around an operating point), the use of the T² statistic is preferred [8, 17].
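The two indices above can be computed with a short NumPy sketch (not from the dissertation); the helper name `monitoring_indices` is hypothetical, and the empirical 99th-percentile limits stand in for the analytical δ²_α and F-distribution limits mentioned in the text:

```python
import numpy as np

def monitoring_indices(x, P, lam):
    """SPE (Eq. 2.2) and Hotelling's T^2 (Eq. 2.3) for one observation x.

    P holds the a retained loading vectors; lam holds the a largest
    eigenvalues of the correlation matrix of the training data.
    """
    x_hat = P @ (P.T @ x)                   # projection onto the PCS
    spe = float((x - x_hat) @ (x - x_hat))  # variation in the residual subspace
    t = P.T @ x                             # scores of the observation
    t2 = float(t @ (t / lam))               # x^T P Lambda^-1 P^T x
    return spe, t2

# Build a PCA model from standardized training data, then score samples
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
corr = X.T @ X / (X.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
P = eigvecs[:, ::-1][:, :2]              # loadings: two dominant eigenvectors
lam = eigvals[::-1][:2]

# Empirical control limits (99th percentile over the training indices),
# as is often done for industrial applications
indices = np.array([monitoring_indices(x, P, lam) for x in X])
spe_limit, t2_limit = np.percentile(indices, 99, axis=0)
spe_new, t2_new = monitoring_indices(X[0], P, lam)
print(f"SPE = {spe_new:.3f} (limit {spe_limit:.3f}), T2 = {t2_new:.3f} (limit {t2_limit:.3f})")
```

A new sample that exceeds `spe_limit` breaks the model's correlation structure, while one that only exceeds `t2_limit` moves away from the origin of the PCS.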
2.1.4 Fault Diagnosis
One of the most important aspects in process monitoring is fault diagnosis. As described
before, traditionally, plots of the scores and residuals, together with the monitoring of the
time series of the T² and SPE indices, are utilized for fault detection. However, with
these tools, it is difficult to determine the source of the problem after it has been detected.
In fault diagnosis, one is interested in pinpointing the variable or group of variables that are
most related to the particular fault. After diagnosis, the actual problem can be eliminated
and improvements in quality and yield can be observed. One of the most utilized solutions
for fault diagnosis based on multivariate data analysis techniques is contribution plots [5,
6, 9, 10, 34–36]. Contribution plots provide a way to connect the information provided
by the monitoring index in the principal (e.g., T²) or residual (e.g., SPE) subspace to
the actual process variables. In these plots, the contribution of the different variables to
the given monitoring index is visualized. In this way, the variable(s) that contribute the most
to the fault are highlighted, and the related problems can be solved in a more targeted way.
The contributions of each process variable to the SPE statistic at instant i are the m (m being
the number of variables in the input space) elements of the squared residual x_i − x̂_i [34].
These m elements are plotted as bars to show the contributions of the variables to the SPE
index at instant i. For T², the contributions of the different variables can be calculated in
different ways [34, 79, 80]. Following [34], the contribution to the T² statistic is determined
by rewriting the scores t as a sum of weighted data, with the loadings of each principal
component direction used as weights:

t_{id} = ∑_{j=1}^{m} x_{ij} p_{jd}    (2.4)

where p_{jd} is the loading for variable j in dimension d (d goes from 1 to a, the number
of retained principal components) for the m process variables included in the input space.
The m terms x_{ij} p_{jd} represent the contributions of each variable to the score t_{id}. As with the
contributions to SPE, a bar chart of the m contributions to t_{id} is utilized for fault
diagnosis.
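The two contribution calculations above can be sketched as follows (an illustration, not from the dissertation; the loading vector, observation, and helper names are hypothetical):

```python
import numpy as np

def spe_contributions(x, P):
    """Per-variable contributions to SPE: the m squared residual elements."""
    residual = x - P @ (P.T @ x)
    return residual ** 2

def score_contributions(x, P, d):
    """Contributions of each variable to the score t_d (Eq. 2.4):
    the m terms x_j * p_{jd}, whose sum is the score in dimension d."""
    return x * P[:, d]

# Hypothetical 3-variable model with one retained component
P = np.array([[0.8], [0.6], [0.0]])  # illustrative loading vector
x = np.array([2.0, -1.0, 0.5])       # a new (scaled) observation
print(spe_contributions(x, P))       # bar heights for the SPE contribution plot
print(score_contributions(x, P, 0))  # bar heights for the t_1 contribution plot
```

Plotting either array as bars highlights which variable dominates the corresponding monitoring index.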
2.2 Dynamic Principal Component Analysis
In PCA, variables are assumed to be uncorrelated in time [29]. In other words, the
observations at sampling instant k are assumed to be independent from past observations,
i.e., those at sampling instant k − l_a, where l_a denotes the time lag between measurements.
As noted in [81], this assumption holds for processes where the sampling times are long.
However, for dynamic processes with fast sampling, a method needs to take into account
the serial correlations present in the data. In these types of processes, the process variables
will vary around the operating point based on their dynamics. This in turn makes the
values of the variables at the current time depend on past measurements of the same
variable and of other related variables (i.e., auto-correlation and cross-correlation). These
characteristics of industrial processes often make static PCA fail. To cope with these aspects,
dynamic extensions of PCA to monitor different types of
processes have been proposed [29, 82–85]. In dynamic PCA (DPCA), static PCA is extended
to apply for monitoring multivariate dynamic systems. This is achieved by developing a
multivariate time series model of the data before the application of PCA. In a dynamic
system, the current values of the process variables X(k) will depend on the past values (e.g.
X(k?1), where k is the current sample index). Typical cases of dynamic models include,
multivariate autoregressive (AR), multivariate autoregressive with exogenous inputs (ARX),
AR orARX timeseries models for processes with moving average (ARMA)[86]. The common
factor of these methods is the augmentation of lagged data to the process variables to capture
the dynamic characteristics of the process. In DPCA, both the dimensionality reduction and
model extraction are done in a single step [29]. Two parameters are needed to be defined
in DPCA, the number of components to retain, and the number of time lags la needed to
characterize the dynamic of the process. Once these two parameters are defined, PCA is
applied to the dynamic version of matrix X (i.e., X = [X(k) X(k−1) ··· X(k−l_a)]). Typically, l_a is set to 1 or 2, representing a first- or second-order dynamic system, respectively [29, 81]. The order can be increased to obtain better approximations of nonlinear relationships. Similar to PCA, after the principal components are determined based on the dynamic version of the matrix X, the Hotelling's T² statistic and SPE index are used for fault detection. For fault diagnosis, contribution plots of the T² and SPE statistics are typically utilized. This is achieved by summing up the contributions to T² and SPE of the variables at the different time lags [81]. In general, for processes where serial correlations are present, DPCA exhibits better performance than regular PCA [29, 81].
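The lag augmentation described above can be sketched as follows; a minimal NumPy example, assuming row k of X holds the sample at time k (the function name `augment_lags` is hypothetical):

```python
import numpy as np

def augment_lags(X, la):
    """Build the lag-augmented matrix [X(k) X(k-1) ... X(k-la)] from an
    N x m data matrix whose row k holds the sample at time k."""
    N, m = X.shape
    blocks = [X[la - j: N - j] for j in range(la + 1)]
    return np.hstack(blocks)       # (N - la) x m*(la + 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
Xd = augment_lags(X, la=2)         # second-order augmentation
# PCA (and T2/SPE monitoring) is then applied to Xd instead of X
```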
2.3 Kernel Principal Component Analysis
2.3.1 Motivation
PCA-based monitoring methods have been extremely successful in many applications. However, as PCA only considers the mean and the variance-covariance of the variables to characterize the process, and it assumes a linear relationship among the process variables, it lacks the capability to provide higher-order representations for non-Gaussian data. Complex chemical and biological processes are nonlinear in nature; as a consequence, the process data inherit a nonlinear behavior. The nonlinearity present in the process variables causes PCA to perform poorly [87]. Kernel PCA (KPCA) has emerged as a solution to address this problem [11, 88–93].
2.3.2 The KPCA Method
In KPCA, the input data x_1, x_2, ..., x_N ∈ R^m are first mapped into a higher-dimensional space via a nonlinear function Φ : R^m → F. After this nonlinear mapping, the variables in the feature space are more likely to vary linearly [94]. The higher-dimensional linear space is known as the feature space F [11, 88, 89]. In KPCA, PCA is carried out to find the eigenvectors v and eigenvalues λ of the covariance matrix C in
the feature space F:

C = \frac{1}{N} \sum_{j=1}^{N} \Phi(x_j)\Phi(x_j)^T \qquad (2.5)

where the input process data x_k ∈ R^m, k = 1,...,N, are zero mean and unit variance and Φ(·) is a nonlinear mapping function that projects the input data to F. To find the principal components, the eigenvalue problem has to be solved in F:

\lambda v = C v \qquad (2.6)
In Eq. (2.6), all solutions v with λ ≠ 0 must lie in the span of Φ(x_1),...,Φ(x_N), with coefficients α_i (i = 1,...,N) such that [11, 88, 89]

v = \sum_{i=1}^{N} \alpha_i \Phi(x_i) \qquad (2.7)
Let ⟨a, b⟩ denote the dot product between vectors a and b. By defining a matrix K ∈ R^{N×N} with elements [K]_{ij} = ⟨Φ(x_i), Φ(x_j)⟩, the eigenvalue decomposition in the feature space simplifies as follows (see [11, 88–90] for a detailed derivation):

\lambda \alpha = \frac{1}{N} K \alpha \qquad (2.8)

where α = [α_1 α_2 ··· α_N]^T are the eigenvectors with eigenvalues λ_1 ≥ λ_2 ≥ ··· ≥ λ_N. As in regular PCA, the dimensionality of the problem in the feature space can be reduced by selecting only the first a eigenvectors. The KPCA procedure requires the matrix K to be mean centered; the mean-centered matrix K̄ is obtained as follows [11, 88, 89]:

\bar{K} = K - KE - EK + EKE \qquad (2.9)
where

E = \frac{1}{N} \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix} \in R^{N \times N} \qquad (2.10)
To satisfy the eigenvector normality constraint ‖v‖² = 1, the coefficients α should be normalized during the KPCA decomposition (i.e., ‖α‖² = 1/(Nλ)). The score t of a test vector x is obtained by projecting Φ(x) onto the space defined by the principal components v^k in F [11, 88]:

t_k = \langle v^k, \Phi(x) \rangle = \sum_{i=1}^{N} \alpha_i^k \langle \Phi(x_i), \Phi(x) \rangle \qquad (2.11)
where k = 1,...,a indexes the retained eigenvectors. In KPCA, kernel functions of the form k(a, b) = ⟨Φ(a), Φ(b)⟩ are introduced to compute the dot products in the feature space and thereby solve the eigenvalue problem of Eq. (2.8) [11, 89]. Some commonly applied kernel functions are listed below:
• Linear kernel

k(a, b) = a^T b \qquad (2.12)

• Polynomial kernel

k(a, b) = (a^T b + d_1)^{d_2} \qquad (2.13)

• Gaussian kernel

k(a, b) = \exp\left( \frac{-0.5\,\|a - b\|^2}{d_3^2} \right) \qquad (2.14)

• Sigmoid kernel

k(a, b) = \tanh(d_4 (a^T b) + d_5) \qquad (2.15)
where the parameters d_i (i = 1,...,5) are empirically defined to achieve a given mapping Φ and feature space F [11].
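The four kernels above can be sketched directly in NumPy (an illustrative sketch, not the dissertation's code; the toy data and default parameter values are assumptions):

```python
import numpy as np

def linear_kernel(a, b):
    return a @ b                                    # Eq. (2.12)

def polynomial_kernel(a, b, d1=1.0, d2=2):
    return (a @ b + d1) ** d2                       # Eq. (2.13)

def gaussian_kernel(a, b, d3=1.0):
    return np.exp(-0.5 * np.sum((a - b) ** 2) / d3 ** 2)   # Eq. (2.14)

def sigmoid_kernel(a, b, d4=1.0, d5=0.0):
    return np.tanh(d4 * (a @ b) + d5)               # Eq. (2.15)

# Gram matrix [K]_ij = k(x_i, x_j) over N training vectors
rng = np.random.default_rng(2)
Xtr = rng.normal(size=(50, 3))
d3 = 5 * Xtr.shape[1]              # the "5m" heuristic discussed later in the text
K = np.array([[gaussian_kernel(xi, xj, d3) for xj in Xtr] for xi in Xtr])
```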
2.3.3 Fault Detection
Similar to PCA, for fault detection the T² and SPE statistics are utilized, as defined below [11, 88, 90, 91]:

T^2 = [t_1, \ldots, t_l] \, \Lambda^{-1} \, [t_1, \ldots, t_l]^T \qquad (2.16)

SPE = \sum_{j=1}^{N} t_j^2 - \sum_{j=1}^{l} t_j^2 \qquad (2.17)
where t_k is calculated by Eq. (2.11) and Λ is a diagonal matrix containing the eigenvalues of the l dimensions retained after applying PCA in the feature space. As in PCA, the confidence limits of the monitoring indices can be determined theoretically via the F-distribution [95] and the χ² distribution [6], respectively, or empirically using validation data. Once a KPCA model is obtained, the kernel vector k_te ∈ R^{1×N} of a test vector x_te ∈ R^m can be determined from the training data x_j by [k_te]_j = k(x_te, x_j) [11]. Before projecting the test vector onto the space spanned by the retained eigenvectors, the kernel k_te needs to be mean centered: k̄_te = k_te − E_1 K − k_te E + E_1 K E, where E_1 = (1/N)[1,...,1] ∈ R^{1×N} [11, 88]. The projection of
the test vector is calculated by

t_k = \sum_{i=1}^{N} \alpha_i^k \, \bar{k}_{te}(x_i, x_{te}) \qquad (2.18)
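The KPCA fault-detection procedure of Eqs. (2.8)-(2.18) can be sketched end to end. This is a minimal NumPy illustration under stated assumptions: toy Gaussian data stand in for autoscaled process data, the Gaussian kernel with the 5m heuristic is used, and a = 5 components are retained arbitrarily rather than by cross-validation:

```python
import numpy as np

rng = np.random.default_rng(3)
Xtr = rng.normal(size=(60, 3))                  # toy autoscaled training data
N, m = Xtr.shape
d3 = 5 * m                                      # ad-hoc 5m heuristic from the text

def gram(A, B):
    """Gaussian kernel of Eq. (2.14): exp(-0.5 ||a - b||^2 / d3^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / d3 ** 2)

K = gram(Xtr, Xtr)
E = np.ones((N, N)) / N
Kbar = K - K @ E - E @ K + E @ K @ E            # mean centering, Eq. (2.9)

mu, A = np.linalg.eigh(Kbar)                    # Kbar alpha = (N lambda) alpha
mu, A = mu[::-1], A[:, ::-1]                    # sort eigenvalues descending
lam = mu / N                                    # eigenvalues of Eq. (2.8)
keep = mu > 1e-12                               # drop numerically null directions
A = A[:, keep] / np.sqrt(mu[keep])              # enforce ||v||^2 = 1
lam = lam[keep]

a = 5                                           # retained PCs (use CV in practice)

# Monitoring statistics for one test vector, Eqs. (2.16)-(2.18)
xte = rng.normal(size=(1, 3))
kte = gram(xte, Xtr)                            # 1 x N kernel vector
E1 = np.ones((1, N)) / N
kbar = kte - E1 @ K - kte @ E + E1 @ K @ E      # center the test kernel
t = (kbar @ A).ravel()                          # scores in all retained directions
T2 = t[:a] @ np.diag(1.0 / lam[:a]) @ t[:a]
SPE = (t ** 2).sum() - (t[:a] ** 2).sum()
```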
2.3.4 Fault Diagnosis
In regular PCA, the identification of the process variables that are most related to a particular fault is traditionally done by utilizing contribution plots of the monitoring indices T² and SPE. For KPCA, however, this task is not straightforward, as a consequence of the nonlinear mapping of the input space performed prior to applying PCA in the feature space. In this work, the approach proposed by Choi et al. [11], based on the sensor reconstruction approach described in [96], is used for fault diagnosis. In this approach, a fault index is
defined as a function of two types of squared prediction errors (SPE) [90, 96]:

\zeta_i = \frac{\| \tilde{x}_i - \hat{\tilde{x}}_i \|^2}{\| x - \hat{x} \|^2} \qquad (2.19)
where x is a multivariate observation from the input space, x̂ is the estimate of x, x̃_i is identical to x except for the ith entry, which is replaced by its reconstructed value, and \hat{\tilde{x}}_i is its estimate. In other words, the denominator in Eq. (2.19) represents the SPE of the input variable space, while the numerator represents the SPE of the reconstructed x̃_i. The trend of the fault index ζ_i for the different variables is monitored to diagnose the variables that contribute the most to the fault. When a faulty variable is reconstructed, the corresponding SPE is reduced, or at least does not increase considerably; thus a significant reduction in the fault index ζ_i of the ith variable is observed, which indicates a variable with an abnormal pattern. In other words, once a fault is detected, the denominator in Eq. (2.19) increases, while the numerator decreases (or does not increase considerably) when the faulty variable is reconstructed. In this work, robust reconstruction is achieved using
the algorithm proposed by Mika et al. [92, 93]. To achieve an optimal reconstructed value z, the kernel function is restricted to the Gaussian form of Eq. (2.14), and the following iterative approach is implemented to compute z [90]:

z(t) = \frac{\sum_{i=1}^{N} \gamma_i \, k(x_i, z(t-1)) \, x_i}{\sum_{i=1}^{N} \gamma_i \, k(x_i, z(t-1))} \qquad (2.20)
where z(0) = x and γ_i = \sum_{k=1}^{l} t_k \alpha_i^k. In this way an estimate z of the testing vector x is obtained by reconstructing its projection onto the principal component subspace.
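The fixed-point iteration of Eq. (2.20) can be sketched as below. Note the hedge: the weights γ_i would normally come from a fitted KPCA model; here uniform weights stand in purely to illustrate the iteration, and the toy data and the name `kern` are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
Xtr = rng.normal(size=(40, 2))                  # toy training data
N, m = Xtr.shape
d3 = 5 * m

def kern(A, z):
    """Gaussian kernel of Eq. (2.14) between each row of A and the point z."""
    return np.exp(-0.5 * np.sum((A - z) ** 2, axis=-1) / d3 ** 2)

# gamma_i = sum_k t_k alpha_i^k would come from a fitted KPCA model;
# uniform weights are a stand-in here, purely for illustration.
gamma = np.ones(N) / N

x = np.array([0.3, -0.2])                       # test vector to reconstruct
z = x.copy()                                    # z(0) = x
for _ in range(100):                            # fixed-point iteration, Eq. (2.20)
    wts = gamma * kern(Xtr, z)                  # gamma_i * k(x_i, z(t-1))
    z_new = (wts @ Xtr) / wts.sum()
    done = np.linalg.norm(z_new - z) < 1e-12
    z = z_new
    if done:
        break
```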
For online implementation of KPCA, there are several factors to be considered. First, the dimension of the feature space needs to be determined; when the training data set is large, the time required to compute the matrix K ∈ R^{N×N} increases significantly.
Second, the number of eigenvectors retained after performing PCA in the feature space needs to be determined; typically, a cross-validation approach is used [90]. In addition, the parameters d_i involved in the computation of the kernel matrix K need to be determined. These parameters, for instance d_3 for the Gaussian kernel, play an important role not only in the nonlinear mapping, but also in the monitoring and diagnosis performance. For a Gaussian kernel, d_3 in Eq. (2.14) is typically set to a product of an ad hoc constant times the number of variables used in the monitoring approach (e.g. 5m or 10m, where m is the number of variables in the input space) [11, 90].
2.4 Independent Component Analysis
2.4.1 Motivation
The presence of unmeasured disturbances, correlations, and noise hides the valuable factors that govern industrial processes. Revealing these hidden factors can lead to a better characterization of a process. In PCA, this is achieved by projecting the input data onto a lower-dimensional space that captures most of the variance and accounts for correlations among variables. Since the objective of PCA is to decorrelate the scores (i.e., the projections), and PCA only considers statistics up to second order to characterize a process, it lacks the capability to represent processes where nonlinearities are prevalent (e.g. a non-Gaussian distribution of the input variables). Independent component analysis (ICA) is a recently developed technique [97–102] that, by enforcing formal statistical independence and implicitly using higher-order statistics, decomposes the process data into linear combinations of statistically independent components (ICs) that contain the main governing characteristics of the process. Applications of ICA have been reported in several fields of engineering; some examples are signal processing, machine vibration analysis, radio communication, and infrared optical source separation [98, 103].
2.4.2 The ICA Method
In ICA, the zero-mean measured variables x_i (i = 1,...,m) are expressed as a linear combination of statistically independent components s_j (j = 1,...,d ≤ m). Several algorithms have been proposed to estimate the independent components [104, 105]; in this work, the algorithm developed in [102] is briefly reviewed. Let x = [x_1,...,x_m]^T denote a column vector of observations of the process variables and s = [s_1,...,s_d]^T a column vector of independent components, related by a mixing matrix A ∈ R^{m×d} with coefficients a_{ij} (i.e., x_i = a_{i1}s_1 + a_{i2}s_2 + ... + a_{id}s_d for all i), as follows:
x = As (2.21)
The model of Eq. (2.21) is referred to as the ICA model [102]. In Eq. (2.21), both the mixing matrix A and the independent components s must be estimated based on the vector of observations x. The starting point of the estimation is the assumption of statistical independence of the components s_i [102], i.e., their joint probability function can be factorized into individual probability functions. For ICA to be possible, the independent components are restricted to be non-Gaussian [102], since the joint probability function of Gaussian, uncorrelated variables would not provide enough information to identify the dominant directions of the mixing matrix A. Assuming that d = m, the mixing matrix A is square. After the estimation of A, its inverse (the demixing matrix W = A^{-1}) can be computed, and the independent components can be obtained by [102, 105]:
s = Wx (2.22)
Before ICA is applied, sphering (or whitening) is performed to remove correlation and cross-correlation between the random input variables. Whitening can be achieved by performing regular PCA on the correlation matrix R = E{xx^T}, where the vector x is zero mean and unit variance and E{·} denotes the expected value. The decomposition of R by singular value decomposition (SVD) is as follows:
R = U \Lambda U^T \qquad (2.23)
where U is an orthogonal matrix of the eigenvectors of R and Λ the corresponding diagonal matrix of eigenvalues. A whitening transformation is achieved by

z = \Lambda^{-1/2} U^T x \qquad (2.24)

where Λ^{-1/2} = diag(λ_1^{-1/2}, ..., λ_d^{-1/2}). After whitening, the mixing matrix A is transformed into a new one, B, with better properties:

z = \Lambda^{-1/2} U^T x = \Lambda^{-1/2} U^T A s = B s \qquad (2.25)
where B is an orthogonal matrix (E{zz^T} = B E{ss^T} B^T = B B^T = I, since E{ss^T} = I) [33, 102, 106]. Thus, the estimation problem is reduced from determining the d² parameters of matrix A to the d(d−1)/2 parameters of the orthogonal matrix B [102]. The estimate of s, denoted ŝ, is determined from Eq. (2.25) by

\hat{s} = B^T z = B^T \Lambda^{-1/2} U^T x \qquad (2.26)

The relationship between the demixing matrix W and B is obtained from Eq. (2.22) and Eq. (2.26):

W = B^T \Lambda^{-1/2} U^T \qquad (2.27)
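The whitening step of Eqs. (2.23)-(2.25) can be sketched as follows (an illustrative NumPy example with simulated uniform sources, not the dissertation's code):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate x = A s (Eq. 2.21) with d = 3 non-Gaussian (uniform) sources
S = rng.uniform(-1, 1, size=(3, 5000))
Amix = rng.normal(size=(3, 3))
X = Amix @ S                                    # one column per observation

X = X - X.mean(axis=1, keepdims=True)           # zero-mean the data
R = (X @ X.T) / X.shape[1]                      # R = E{x x^T}
U, lam, _ = np.linalg.svd(R)                    # R = U Lambda U^T, Eq. (2.23)
Q = np.diag(lam ** -0.5) @ U.T                  # Lambda^{-1/2} U^T
Z = Q @ X                                       # whitened data z, Eq. (2.24)
Rz = (Z @ Z.T) / Z.shape[1]                     # E{z z^T} should be the identity
```

Because Q is built from the sample covariance itself, the whitened sample covariance Rz equals the identity up to floating-point error.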
A robust and reliable way of determining the ICs is through the use of the FastICA algorithm [100–102, 105]. The key of the algorithm is to find a vector b_i (a column of B) that maximizes the non-Gaussianity of ŝ_i = b_i^T z. In [102], it is shown that a measure of
non-Gaussianity is a direct indicator of statistical independence. A typical way of measuring departures from Gaussianity is the use of the absolute value of the fourth-order cumulant, or kurtosis. As kurtosis is sensitive to outliers, other measures of non-Gaussianity have been developed; indicators based on differential entropy, such as negentropy, are normally utilized. The entropy of a random variable gives an indication of its unpredictability and lack of structure [102]; Gaussian variables have the largest entropy among all random variables of equal variance [102]. Negentropy J is defined as follows:

J(y) = H(y_{gauss}) - H(y) \qquad (2.28)

where H denotes the differential entropy and y_{gauss} is a Gaussian random variable with the same variance as y. Negentropy is difficult to compute in practice; as a consequence, approximations of J have been developed:
J(y) \approx [E\{G(y)\} - E\{G(v)\}]^2 \qquad (2.29)
where G is a non-quadratic function, y is assumed to be zero mean and unit variance, and v is a Gaussian variable of zero mean and unit variance. To obtain robust approximations of negentropy, the following contrast functions are suggested [102, 105]:

G_1(u) = \frac{1}{a_1} \log \cosh(a_1 u) \qquad (2.30)

G_2(u) = \exp\left( -\frac{a_2 u^2}{2} \right) \qquad (2.31)
where 1 ≤ a_1 ≤ 2 and a_2 ≈ 1. In this work, G_1 is the contrast function selected as the indicator of non-Gaussianity for the FastICA algorithm. Based on this approximation, the FastICA algorithm calculates the columns of matrix B, which allows the identification of the corresponding ICs via Eq. (2.26). The details of the algorithm can be found in [100–102, 105]. A Matlab™ implementation of the FastICA algorithm is available at http://www.cis.hut.fi/projects/ica/fastica/.
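The fixed-point update itself is not spelled out in this section; the sketch below uses the standard one-unit FastICA iteration from the cited references, b ← E{z g(bᵀz)} − E{g′(bᵀz)} b with g = G₁′, followed by renormalization. It is an assumption-laden illustration (toy uniform sources, a₁ = 1), not a transcription of the dissertation's implementation:

```python
import numpy as np

rng = np.random.default_rng(6)

# Whitened data Z (Eq. 2.24): two uniform (sub-Gaussian) sources, mixed, then whitened
S = rng.uniform(-1, 1, size=(2, 20000))
X = rng.normal(size=(2, 2)) @ S
X = X - X.mean(axis=1, keepdims=True)
U, lam, _ = np.linalg.svd((X @ X.T) / X.shape[1])
Z = np.diag(lam ** -0.5) @ U.T @ X

a1 = 1.0
g = lambda u: np.tanh(a1 * u)                     # G1'(u) for the contrast of Eq. (2.30)
dg = lambda u: a1 * (1.0 - np.tanh(a1 * u) ** 2)  # G1''(u)

# One-unit fixed-point iteration for a column b of B
b = rng.normal(size=2)
b /= np.linalg.norm(b)
for _ in range(200):
    u = b @ Z
    b_new = (Z * g(u)).mean(axis=1) - dg(u).mean() * b
    b_new /= np.linalg.norm(b_new)
    converged = np.abs(b_new @ b) > 1.0 - 1e-9    # converged up to sign
    b = b_new
    if converged:
        break

s_hat = b @ Z                                     # one estimated independent component
```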
2.4.3 Fault Detection
The application of ICA-based process monitoring has been studied by several investigators [33, 103, 106–113]; in this work, the approach of [106] is followed. For fault detection, under the assumption that the number of variables (m) is equal to the number of independent components (d), the demixing matrix W and the estimates of the independent components Ŝ for normal operation are found by the FastICA algorithm. In addition, the matrices B, Λ^{-1/2}U^T, and A are obtained from the whitening step. In ICA, the ordering of the independent components is not determined by the variance of the variables, as in PCA [107]. Several criteria exist to order the independent components, including the extent of non-Gaussianity, the L∞ norm, the Euclidean norm, and the mean squared error of reconstruction [114–117]. In [106], the Euclidean norm approach is selected to sort the rows of the demixing matrix W and thereby the ICs. In addition, a SCREE plot of the Euclidean norm of the rows of W versus the number of ICs is used to reduce the dimensionality of the problem, i.e., to select only the rows of W with the most dominant Euclidean norm. The selected rows of W are assumed to have the greatest effect on the variation of the estimates of the independent components (Ŝ). Thus the matrix W is partitioned into two parts: the dominant part W_d (the selected rows of W) and the reduced part W_e (the remaining rows of W). Using the selected indices, the equivalent is done for matrix B to generate matrices B_d and B_e. For a new multivariate observation x_new, the new dominant and reduced independent data vectors can be computed after the whitening transformation, i.e., ŝ_new,d = W_d x_new and ŝ_new,e = W_e x_new, respectively.
In [106], three different statistics for fault detection are defined:

I^2 = \hat{s}_{new,d}^T \, \hat{s}_{new,d} \qquad (2.32)
I_e^2 = \hat{s}_{new,e}^T \, \hat{s}_{new,e} \qquad (2.33)

SPE = (x_{new} - \hat{x}_{new})^T (x_{new} - \hat{x}_{new}) \qquad (2.34)
where x̂_new = (Λ^{-1/2}U^T)^{-1} B_d W_d x_new. In Eq. (2.32), I² gives an indication of the variability observed in the dominant independent component subspace. I_e² in Eq. (2.33) accounts for errors in the selection of the dominant part of the independent component subspace [106]. SPE in Eq. (2.34) monitors the unmodeled part of the common-cause variation in the input data. In [107], the thresholds for the different monitoring statistics are determined by kernel density estimation to cope with the non-Gaussianity of the ICs. For the performance comparisons in subsequent chapters, and to provide a fair comparison of the different methods, the thresholds are determined empirically using validation data.
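The three statistics can be sketched as follows. Hedge: the orthogonal matrix B would come from FastICA on training data; a random orthogonal matrix stands in here so that only the monitoring arithmetic of Eqs. (2.32)-(2.34) is illustrated:

```python
import numpy as np

rng = np.random.default_rng(7)
m = 4

# Whitening matrix Q = Lambda^{-1/2} U^T from toy training data; the orthogonal
# B would come from FastICA, so a random orthogonal matrix stands in here.
Xtr = rng.normal(size=(m, 1000))
Xtr = Xtr - Xtr.mean(axis=1, keepdims=True)
U, lam, _ = np.linalg.svd((Xtr @ Xtr.T) / Xtr.shape[1])
Q = np.diag(lam ** -0.5) @ U.T
B, _ = np.linalg.qr(rng.normal(size=(m, m)))
W = B.T @ Q                                     # demixing matrix, Eq. (2.27)

# Partition W by the Euclidean norm of its rows
norms = np.linalg.norm(W, axis=1)
order = np.argsort(norms)[::-1]
dom, exc = order[:2], order[2:]                 # keep 2 dominant ICs
Wd, We = W[dom], W[exc]
Bd = B[:, dom]

xnew = rng.normal(size=m)
s_d = Wd @ xnew                                 # dominant ICs of the new sample
s_e = We @ xnew
I2 = s_d @ s_d                                  # Eq. (2.32)
I2e = s_e @ s_e                                 # Eq. (2.33)
xhat = np.linalg.inv(Q) @ Bd @ s_d              # reconstruction from dominant ICs
SPE = (xnew - xhat) @ (xnew - xhat)             # Eq. (2.34)
```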
2.4.4 Fault Diagnosis
For fault diagnosis (i.e., to reveal the variables that are most related to the fault), Lee et al. [107] developed contribution plots for the different monitoring statistics, i.e., Equations (2.32)-(2.34). The variable contributions of x_new to I², I_e², and SPE are determined as follows:

x_{c,d} = \frac{Q^{-1} B_d \hat{s}_{new,d}}{\| Q^{-1} B_d \hat{s}_{new,d} \|} \, \| \hat{s}_{new,d} \| \qquad (2.35)

x_{c,e} = \frac{Q^{-1} B_e \hat{s}_{new,e}}{\| Q^{-1} B_e \hat{s}_{new,e} \|} \, \| \hat{s}_{new,e} \| \qquad (2.36)

x_{c,SPE} = e = x_{new} - \hat{x}_{new} \qquad (2.37)
where Q = Λ^{-1/2}U^T. The contribution plots resulting from the application of Equations (2.35)-(2.37) can have negative and positive contributions. As with the contribution plots of regular PCA, the magnitude of the contribution indicates which variable or group of variables is most related to the fault.
Chapter 3
Statistics Pattern Analysis: A Novel Process Monitoring Framework
In this chapter, the development of a novel process monitoring method is presented. Statistics pattern analysis (SPA) is a multivariate process monitoring framework developed recently [32, 59–61]. In this dissertation, the fault detection and diagnosis properties of the SPA framework are developed, and its application is illustrated.
3.1 Motivation
The SPA framework originated to cope with the challenges in semiconductor manufacturing [59], i.e., a batch process. Traditional methods for monitoring batch processes, including multiway PCA (MPCA) and multiway PLS [6, 10, 28, 80, 118, 119], are faced with the difficulty of handling the characteristics of batch data, such as unequal batch and/or step length, unsynchronized or misaligned batch trajectories, and multi-modal batch trajectory distributions. These characteristics cause traditional process monitoring methods to fail or to require additional preprocessing steps, which are usually performed off-line and are difficult to automate [59]. As for continuous processes, batch process monitoring methods make use of multivariate data analysis techniques (e.g. MPCA) to extract the correlation among different variables and to reduce the dimensionality of the problem. However, the characteristics of certain processes, including those present in semiconductor manufacturing, cause the data to show significant deviations from a multivariate normal distribution. As discussed previously, such deviations make the monitoring performance of PCA or PLS deteriorate. In addition, including several preprocessing steps increases the difficulty of establishing and maintaining these models. For example, in a high-mix semiconductor fabrication site, the
number of models is in the range of several thousands to tens of thousands [59, 120]. This occurs due to the diversity of products manufactured in a semiconductor fabrication site, where a single machine can produce different types of products. As a consequence, model building and maintenance are much more labor intensive than in a typical chemical plant. In [59], the negative effect of adding preprocessing steps is illustrated by comparing the monitoring performance of traditional monitoring methods (e.g. MPCA and MPLS) as a function of the preprocessing steps. The study confirms that the effectiveness of traditional methods tends to deteriorate as the number of preprocessing steps increases.
3.2 The SPA Method
Unlike traditional process monitoring methods, which monitor the different process variables to perform fault detection and diagnosis, SPA-based monitoring makes use of the "statistics" of the process variables in the batch. In this way, the method not only eliminates the need for data preprocessing steps in monitoring batch processes, but also provides superior performance [59]. Figures 3.1 and 3.2 illustrate the steps involved in batch process monitoring for MPCA (a traditional method) and SPA, respectively. In SPA, a statistics pattern (SP) is defined as a collection of different statistics calculated from a batch; a set of SPs, which capture the characteristics of the process variables in a batch (such as means and variance-covariances), is used as the input to perform fault detection. Since batches are processed in the same equipment, the governing physical/chemical mechanisms, such as mass transport, kinetics, and thermodynamics, are the same. The SPs of "in-control" batches will therefore follow a certain pattern, whereas abnormal SPs will exhibit some deviation from the normal behavior. As illustrated in Figure 3.2, after assembling a matrix of SPs, the second step, called the dissimilarity quantification step, is performed. In this step, dissimilarities among SPs are measured and confidence limits are determined to distinguish between abnormal and normal batches. For a new batch, if its dissimilarity index is below the confidence limit, it is considered normal;
Figure 3.1: Illustration of the steps involved in multiway PCA (MPCA): (a) original batch records with unequal length; (b) 3-dimensional array after trajectory alignment and mean shifting; (c) batch information for MPCA after unfolding (i.e., a matrix).
otherwise, it is categorized as faulty. Continuous rather than batch processes are common in the chemical and petrochemical industries; as a consequence, the SPA framework was extended to monitor continuous processes [32, 60, 61]. For a continuous process, the definition of the statistics pattern (SP) varies: an SP for a continuous process is a collection of various statistics calculated from a window (or segment) of the process variables [32]. These statistics capture the characteristics of each individual variable (such as mean, variance, kurtosis, and skewness), the interactions among different variables (such as correlation), as well as the process dynamics (such as auto-correlation and cross-correlation). After the SPs are computed from the training data, the dissimilarities among the training SPs are quantified to determine an upper control limit of the detection index [32]. Figure 3.3 illustrates the two steps in SPA for monitoring continuous processes.
The following sections provide a brief summary of the characteristics and elements constituting the steps involved in SPA-based monitoring.
3.3 Fault Detection
In this section, the main characteristics of the two steps for fault detection in SPA are
described.
Figure 3.2: Schematic of the SPA framework for fault detection in batch processes. (a)
Original batch records with unequal length; (b) SP generation (e.g. various batch statistics);
(c) fault detection by dissimilarity quantification.
Figure 3.3: Schematic of the window-based SPA method for monitoring continuous processes:
(a) original process data; (b) computed statistics patterns; (c) fault detection by dissimilarity
quantification. A block or a window of process variables shaded in (a) is used to generate
an SP shaded in (b). SPs are used to generate a point of dissimilarity measure (e.g. shaded
in (c)) to perform fault detection.
3.3.1 Statistics pattern generation
Let the matrix X represent the collection of information of process variables (e.g. from
a window for continuous processes or from a batch for batch processes), as shown below:
X = [x_1 \; x_2 \; x_3 \; \cdots \; x_m]
X = \begin{bmatrix}
x_1(k-w+1) & x_2(k-w+1) & x_3(k-w+1) & \cdots & x_m(k-w+1) \\
x_1(k-w+2) & x_2(k-w+2) & x_3(k-w+2) & \cdots & x_m(k-w+2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_1(k) & x_2(k) & x_3(k) & \cdots & x_m(k)
\end{bmatrix} \qquad (3.1)
where w represents the window width for continuous processes (or the available samples in a batch), and k is the sample index of the last measurement in the window (i.e., the most recent sample, or the last sample of the batch). As previously discussed, an SP consists of different statistics calculated from this group of data. In general, the statistics can be lumped into three categories:

S = [\, \mu \;|\; \Sigma \;|\; \Upsilon \,] \qquad (3.2)

Here μ denotes the first-order statistics, i.e., the variable means, whose elements are calculated from the data contained in the window (or batch). Σ denotes the second-order statistics, which include the variance, correlation, auto-correlation, and cross-correlation of the different variables. Υ denotes the higher-order statistics, including skewness, kurtosis, and other higher-order moments. Higher-order statistics (HOS) measure the extent of nonlinearity and quantify the non-Gaussianity of the probability distribution of the process variables [32, 121]. Specifically, skewness measures the asymmetry and kurtosis the "peakedness" of a process variable's distribution; the HOS of variables that follow a normal distribution have a value of zero. After matrix S
has been formed, the final SP is generated by vectorizing S into a row vector. Note that, owing to the symmetry of some groups of statistics, such as the covariance (a symmetric matrix), the dimension of the SP can be reduced. Once the training SPs have been assembled into a matrix, the dissimilarity quantification step can be performed.
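The SP generation step can be sketched as below; a minimal NumPy illustration (the function name `statistics_pattern` is hypothetical, and auto-/cross-correlation terms are omitted for brevity):

```python
import numpy as np
from itertools import combinations

def statistics_pattern(Xw):
    """Assemble the SP of one window (w samples x m variables): first-order,
    second-order, and higher-order statistics, concatenated as a row vector.
    Auto- and cross-correlations are omitted here for brevity."""
    w, m = Xw.shape
    mu = Xw.mean(axis=0)                         # first-order statistics
    Xc = Xw - mu
    var = Xc.var(axis=0)                         # second-order: variances ...
    corr = np.corrcoef(Xw, rowvar=False)         # ... and correlations; keep the
    corr_ut = np.array([corr[i, j]               # upper triangle (symmetry)
                        for i, j in combinations(range(m), 2)])
    sd = Xc.std(axis=0)
    skew = (Xc ** 3).mean(axis=0) / sd ** 3      # higher-order statistics:
    kurt = (Xc ** 4).mean(axis=0) / sd ** 4 - 3  # excess kurtosis is 0 for Gaussian
    return np.concatenate([mu, var, corr_ut, skew, kurt])

rng = np.random.default_rng(8)
Xw = rng.normal(size=(50, 3))                    # one window: w = 50, m = 3
sp = statistics_pattern(Xw)                      # length m + m + m(m-1)/2 + m + m
```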
3.3.2 Dissimilarity Quantification
After the SP matrix is obtained for the training data, a modeling stage follows. The model is used to quantify the dissimilarities of the training and testing SPs. Dissimilarities can be quantified by measuring distances or angles between objects; in the case of PCA, distance-based metrics such as T² and SPE are used. In SPA, PCA is chosen to build a model relating the different SPs in the training data (i.e., normal operating data). Note that the framework is not limited to the use of PCA, as this method is just one way to determine the dissimilarities among different SPs [122]. The major difference
between the traditional PCA-based and the proposed SPA-based fault detection methods
is that PCA monitors the process variables, while SPA monitors various statistics of the
process variables. In other words, in PCA singular value decomposition (SVD) is applied
to the original process variables to build the model for normal operation, and the new
measurements of the process variables are projected onto the PCA model to perform fault
detection; while in SPA, SVD is applied to various statistics of the process variables to
build the model, and the statistics calculated using the new measurements are projected
onto the model to perform fault detection. In this way, different statistics that capture
the different characteristics of the process can be selected to build the model for normal
process operation, and various higher-order statistics can be utilized explicitly. As PCA is
used to craft a model for normal operation, fault detection indices such as T2 and SPE are
also applicable. To distinguish the PCA-based indices from the SPA-based indices, in this
work, Dp and Dr indices are defined to represent the T2 and SPE indices monitoring the
principal component subspace and the residual subspace, respectively. As in PCA, the process is considered to be within the normal operating region if both dissimilarity indices are below their thresholds calculated at the α significance level. Before modeling, the SPs from the training and testing data have to be normalized (zero mean and unit variance) to avoid the model being influenced by differences in scaling (i.e., units of measurement) and in operating point (i.e., different mean values) among the variables. In addition, the thresholds for D_p and D_r can
be obtained using analytical expressions based on normality assumptions [8, 15], or based
on an empirical method using the training data [123, 124].
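The dissimilarity quantification step can be sketched end to end; an illustrative NumPy example with a deliberately reduced SP (means and variances only) and empirical control limits (the toy data and window settings are assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)

# Training data: 2000 samples of m = 3 variables, cut into non-overlapping
# windows of w = 50 samples; each window yields one SP (here only means and
# variances, a reduced SP kept short for illustration).
X = rng.normal(size=(2000, 3))
w = 50
sps = np.array([np.concatenate([blk.mean(axis=0), blk.var(axis=0)])
                for blk in np.split(X, len(X) // w)])

# Normalize the SPs to zero mean and unit variance, then model them with PCA
mu, sd = sps.mean(axis=0), sps.std(axis=0)
S = (sps - mu) / sd
_, sv, Vt = np.linalg.svd(S, full_matrices=False)
a = 3                                           # retained principal components
P = Vt[:a].T
lam = sv[:a] ** 2 / (len(S) - 1)                # variances of the retained PCs

T = S @ P
Dp = np.sum(T ** 2 / lam, axis=1)               # T2-type index (PC subspace)
E = S - T @ P.T
Dr = np.sum(E ** 2, axis=1)                     # SPE-type index (residual subspace)

# Empirical 99% control limits from the training SPs
Dp_lim = np.quantile(Dp, 0.99)
Dr_lim = np.quantile(Dr, 0.99)
```

A testing SP is normalized with the training mu and sd, projected with P, and flagged as faulty when either index exceeds its limit.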
3.3.3 Why does SPA work?
In SPA, PCA is applied to quantify the dissimilarities among statistics patterns (SPs). As a result, the fundamental assumptions of PCA have to be satisfied, i.e., a multivariate Gaussian distribution and independent samples. The major claim in SPA is that although the process variables may show deviations from a multivariate Gaussian distribution, the statistics of these process variables, and their projections onto the principal component subspace (i.e., the scores), satisfy a multivariate Gaussian distribution much better than the process variables themselves. In other words, the SPs follow a multivariate Gaussian distribution asymptotically. This enables SPA to provide superior performance compared to applying PCA directly to the input data [59]. This can be justified by the classical central limit theorem (CLT), which states that "whatever the distribution of the independent variables ξ_v - subject to certain very general conditions - the sum ξ = ξ_1 + ··· + ξ_n is asymptotically Gaussian (m, σ)" [125], where

m = m_1 + m_2 + \cdots + m_n \qquad (3.3)

\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 \qquad (3.4)

and m_v and σ_v are the mean and standard deviation of ξ_v. Among the many extensions of the classical CLT, those relevant to this work are provided in [126, 127], where the requirement of independence among the different random variables is relaxed. In the following section, the SPA framework and its properties are illustrated to elucidate the previous discussion.
3.4 Illustrative Example
In this section an illustrative example of the application of SPA is presented. As dis-
cussed previously, the major difference and advantage of SPA is the use of the statistics of
the process variables rather than the process variables themselves for process monitoring.
The success of SPA is based from the fact that although the process variables may have
a non-Gaussian distribution, due to the central limit theorem (CLT), the statistics of the
process variables follow a Gaussian distribution asymptotically. This feature enables the
dissimilarity quantification based on PCA to truly capture the main characteristics of the
process variables, thus providing a better monitoring performance. In this section the appli-
cation and success of SPA is elucidated by using a simple illustrative example.
In this example, the process consists of two variables behaving according to the following
nonlinear relationship

x2 = x1^2 + e    (3.5)

where x1 is a random variable drawn from a uniform distribution on [-0.5, 0.5] and
e ~ N(0, 0.02^2). A normal data set consisting of 15,000 samples is simulated;
10,000 samples are used for training, and the rest of the samples are used for validation.
Two data sets of 5,000 samples each, corresponding to two groups of faults (A and B) are
simulated as follows:
Fault A

x1 = e1    (3.6)

x2 = 0.15 + e2    (3.7)

Fault B

x1 = -0.6 + e1a    (3.8)

x2 = 0.1 + e2a    (3.9)
Figure 3.4: Scatter plot of the process variables for the nonlinear illustrative example
where e1 and e1a are two different white noise realizations taken from a normal distribution
with variance σ^2 = 0.02^2. Similarly, e2 and e2a are two different white noise realizations
with variance σ^2 = 0.01^2. For each fault data set, the first 100 samples are normal samples,
and the remaining samples (4,900) are affected by the fault. Figure 3.4 shows the scatter
plot of some samples taken from the different data sets. It is clear from Figure 3.4 that the
two faults do not correspond to the behavior of the normal operating data. For comparison,
the PCA and SPA methods are used to detect the faults A and B. For PCA, two principal
components are selected to capture the variability of the process. As a result, only the
T2 index is applicable for fault detection (i.e., to monitor the principal component
subspace). For the SPA method, a nonoverlapping window width of 50 samples is used
for training and validation. For testing (i.e., the faulty data), after the first window is
assembled, the window is shifted by one sample to reduce the detection delay. Since this
process is static and there is no strong correlation between x1 and x2, no auto- or
cross-correlation coefficients are included.

Figure 3.5: Scores of normal and faulty samples for PCA with the T2 threshold calculated
at the 99% confidence level (dashed line).

The statistics selected
to characterize the process are mean, standard deviation, skewness, and kurtosis (i.e. 8
statistics). Based on cross-validation, four principal components are selected to capture the
main characteristics of the process. For PCA, Figure 3.5 shows that the scores corresponding
to the faults A and B are located well within the normal operation region defined by the T2
index. Therefore, it is expected that these faults cannot be detected by the T2 index. This is
illustrated in Figure 3.6, where the T2 indices fail to detect both faults. The failure of the
PCA method to detect these faults is a result of the non-Gaussian multivariate distribution
of the process data, which is caused by their nonlinear relationship. This is illustrated in
Figure 3.7, where the multivariate distribution of x1 and x2 is evidently non-Gaussian. The
nonlinearity is then inherited by the process variables projected onto the model defined
by the principal directions. As previously discussed, due to the central limit theorem, the
statistics of the process variables follow a normal distribution asymptotically. Figure 3.8
shows the multivariate distribution of pairs of different statistics (e.g. mean x1 - mean
x2). Both the density histogram and the intensity map of the different pairs of statistics
Figure 3.6: T2 of testing data. (a) Fault A; (b) Fault B. The dashed line indicates the
threshold on the T2 at the 99% confidence level. The dashed-dotted line indicates the fault
onset.
corroborate the central limit theorem, with all of the pairs showing a multivariate distribution
very close to a pure multivariate Gaussian distribution. It is important to point out that
as the number of samples used to compute each SP increases, the multivariate distribution
of the statistics approaches a multivariate Gaussian distribution asymptotically, i.e., it would
eventually completely resemble a multivariate Gaussian distribution. By using the SP matrix
for monitoring instead of the matrix of process variables for this nonlinear example, the
main requirements of PCA, i.e., a multivariate Gaussian distribution and independent scores,
are satisfied. As a consequence, the scores obtained by applying PCA to the SP matrix for
normal and faulty data (Figure 3.9) show a clear separation, with the
scores corresponding to faulty data located outside the normal operating region defined
by the Dp threshold. Consequently, both faults are expected to be detected by SPA. This is
illustrated in Figure 3.10, where both faults are detected by the Dp and Dr indices of SPA.
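The statistics-pattern generation step used in this example can be sketched as follows. This is a simplified illustration only: the dissimilarity quantification via PCA on the SP matrix is omitted, and the statistics are computed from the usual moment definitions.

```python
import random
from statistics import mean, pstdev

def moments(window):
    """Mean, standard deviation, skewness, and kurtosis of one window."""
    m = mean(window)
    s = pstdev(window)
    z = [(v - m) / s for v in window]
    skew = mean(v ** 3 for v in z)
    kurt = mean(v ** 4 for v in z)
    return [m, s, skew, kurt]

def statistics_pattern(windows):
    """Stack the statistics of each variable's window into one SP."""
    sp = []
    for w in windows:
        sp.extend(moments(w))
    return sp

# Normal operating data of Eq. (3.5): x2 = x1^2 + e
random.seed(1)
x1 = [random.uniform(-0.5, 0.5) for _ in range(10000)]
x2 = [v ** 2 + random.gauss(0, 0.02) for v in x1]

# One SP (8 statistics) per nonoverlapping window of 50 samples
width = 50
sps = [statistics_pattern((x1[i:i + width], x2[i:i + width]))
       for i in range(0, len(x1), width)]

print(len(sps), len(sps[0]))  # 200 SPs, 8 statistics each
```

The resulting SP matrix (200 rows by 8 columns) is what the dissimilarity quantification step, i.e., PCA with the Dp and Dr indices, would then operate on.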
3.5 Fault Diagnosis
After a fault is detected by one or more fault detection indices exceeding their control
limits, it is desirable to perform fault diagnosis to identify the root cause of the fault. For
Figure 3.7: Density histogram and intensity map for the multivariate distribution of x1 and
x2.
PCA-based fault detection methods, contribution plots are the most commonly applied fault
diagnosis method, based on the assumption that the variables with the largest
contributions to the fault detection index are most likely the faulty variables. Because in
the SPA-based fault detection method, PCA is applied to quantify the dissimilarity among
different statistics patterns, here contribution plots for Dp and Dr are constructed to perform
fault diagnosis.
3.5.1 Contributions to Dp and Dr
Let X denote the SP matrix with n samples (rows) and m statistics (columns). After
auto-scaling to zero mean and unit variance, the matrix X is decomposed as in Equation (2.1).
For fault detection on a new multivariate observation x, two indices, Dp and Dr, are used,
as defined below:

Dp = x^T P Λ^{-1} P^T x    (3.10)

Dr = ||x̃||^2 = x^T P̃ P̃^T x    (3.11)
Figure 3.8: Density histogram and intensity map for different pairs of statistics for normal
operating data. (a) Mean of x1 and mean of x2; (b) std. deviation of x1 and std. deviation of
x2; (c) skewness of x1 and skewness of x2; (d) kurtosis of x1 and kurtosis of x2.
where Λ is the diagonal matrix of the l largest eigenvalues of X^T X, and P̃ is the residual
loading matrix. The Dp statistic is a measure of the process variation in the principal
component subspace, while the Dr statistic indicates how well each SP conforms to the model.
The contributions of the ith statistic to Dp and Dr are the following:

C_i^p = (ξi^T (P Λ^{-1} P^T)^{1/2} x)^2    (3.12)

C_i^r = (ξi^T P̃ P̃^T x)^2    (3.13)

where ξi is the ith column of the m-dimensional identity matrix.
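Eqs. (3.10)-(3.13) can be sketched with explicit loops. The loadings P and P̃ and the eigenvalues Λ below form a hand-made two-dimensional toy decomposition (m = 2, l = 1), not the result of an actual SP matrix:

```python
import math

# Toy loadings for m = 2 statistics and l = 1 retained component;
# the columns of P and P_res are orthonormal, lam holds the retained eigenvalues.
P = [[1.0], [0.0]]      # principal loadings (m x l)
P_res = [[0.0], [1.0]]  # residual loadings (m x (m - l))
lam = [2.0]             # diagonal of Lambda

def D_p(x):
    """Dp = x^T P Lambda^-1 P^T x, Eq. (3.10)."""
    t = [sum(P[j][k] * x[j] for j in range(len(x))) for k in range(len(lam))]
    return sum(t[k] ** 2 / lam[k] for k in range(len(lam)))

def residual(x):
    """x_tilde = P_res P_res^T x."""
    t = [sum(P_res[j][k] * x[j] for j in range(len(x)))
         for k in range(len(P_res[0]))]
    return [sum(P_res[j][k] * t[k] for k in range(len(t)))
            for j in range(len(x))]

def D_r(x):
    """Dr = ||x_tilde||^2, Eq. (3.11)."""
    return sum(v ** 2 for v in residual(x))

def contributions_p(x):
    """C_i^p = (xi_i^T (P Lambda^-1 P^T)^(1/2) x)^2, Eq. (3.12).
    With orthonormal P, the square root is P Lambda^(-1/2) P^T."""
    t = [sum(P[j][k] * x[j] for j in range(len(x))) / math.sqrt(lam[k])
         for k in range(len(lam))]
    y = [sum(P[j][k] * t[k] for k in range(len(t))) for j in range(len(x))]
    return [v ** 2 for v in y]

def contributions_r(x):
    """C_i^r = (xi_i^T P_res P_res^T x)^2, Eq. (3.13)."""
    return [v ** 2 for v in residual(x)]

x = [3.0, 4.0]
print(D_p(x), D_r(x))  # 4.5 16.0
# The contributions decompose each index exactly:
assert abs(sum(contributions_p(x)) - D_p(x)) < 1e-9
assert abs(sum(contributions_r(x)) - D_r(x)) < 1e-9
```

The closing assertions reflect a general property: because the mapping matrices are symmetric, the contributions of all m statistics sum back to the corresponding index, which is what makes the decomposition meaningful for diagnosis.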
Figure 3.9: Scores of normal and faulty samples for SPA with the Dp threshold calculated
at the 99% confidence level (dashed-line).
3.5.2 Hierarchical Contribution Approach
The contributions can be determined based on a single point, or based on a group of
points. A visual inspection of the contributions highlights the variables that are more related
to the particular fault. In SPA, a hierarchical contribution approach is utilized. First,
the major group of statistics contributing to the fault (e.g. mean, variance, etc.) is
identified. Second, the contribution is broken down into individual elements, i.e., the
governing statistics of the different variables. In this way, SPA can not only isolate the
variable or variables most related to the fault, but also give an indication of the nature
of the fault (e.g. a fault due to a mean or standard deviation change). This unique feature
makes SPA-based contribution plots a more effective and informative way to tackle the
roots of faults online, e.g., instrument drift, noise, or bias.
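The two-step drill-down described above can be sketched as follows, with made-up contribution values used purely to show the grouping logic:

```python
# Contribution of each statistic, keyed by (statistic type, variable).
# The values are made up for illustration only.
contrib = {
    ("mean", "x1"): 0.2, ("mean", "x2"): 6.1,
    ("std",  "x1"): 0.3, ("std",  "x2"): 0.4,
    ("skew", "x1"): 0.1, ("skew", "x2"): 0.2,
    ("kurt", "x1"): 0.1, ("kurt", "x2"): 0.1,
}

# Step 1: identify the major group of statistics contributing to the fault.
group_totals = {}
for (stat, var), c in contrib.items():
    group_totals[stat] = group_totals.get(stat, 0.0) + c
top_group = max(group_totals, key=group_totals.get)

# Step 2: within that group, identify the governing variable.
top_var = max((v for s, v in contrib if s == top_group),
              key=lambda v: contrib[(top_group, v)])

print(top_group, top_var)  # mean x2 -> suggests a mean shift in x2
```

With these toy numbers the dominant group is the mean and the governing variable is x2, so the diagnosis would read "mean shift in x2", illustrating how the hierarchy indicates both the faulty variable and the nature of the fault.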
Figure 3.10: Dp and Dr indices of testing data. (a) Fault A; (b) Fault B. The dashed line
indicates the threshold on the Dp and Dr indices calculated at the 99% confidence level. The
dashed-dotted line indicates the fault onset.
3.6 Conclusion and Discussion
The idea of generating higher-order statistics (HOS) for fault detection is not new,
and other existing methods such as artificial neural networks, kernel principal component
analysis, and independent component analysis all make use of HOS implicitly. However,
Statistics Pattern Analysis (SPA) is the first available method that not only utilizes regular
statistics and HOS to better characterize the process, but also monitors
these statistics explicitly. As discussed in this chapter, SPA not only can be applied to
overcome the problems of manufacturing sites where mainly batch processes are utilized,
but it also can be applied to continuous processes, which are common in petrochemical and
chemical industries. As a result of the central limit theorem (CLT), the statistics patterns
(SPs) are guaranteed to approach a Gaussian distribution asymptotically, or at least to comply
with a Gaussian distribution much better than the original variables, due to the averaging
effect. The better properties offered by the SPs enhance the capabilities of the already
powerful PCA method at the dissimilarity quantification step. In addition, as the statistics
of the process variables are more likely to capture the main characteristics of faults, the
hierarchical contribution plot approach for fault diagnosis is able not only to isolate the
possible root of the fault, but to indicate the nature of the fault (e.g. mean or variance
change), which is a huge advantage for online implementation. The material presented in
this chapter highlights the promise of the SPA framework for process monitoring, especially
for industrial processes where nonlinearities are prevalent. In the following chapter, more
of the properties of SPA applied for fault detection and diagnosis will be studied, and its
performance evaluated by the application to different case studies.
Chapter 4
Comprehensive Comparison of Linear and Nonlinear Monitoring Methods
4.1 Introduction
In this chapter, the performance of the SPA-based fault detection and diagnosis method
is compared with some representative linear and nonlinear multivariate process monitoring
methods using two case studies.
As discussed previously, the increasing demand for safer and more reliable systems in mod-
ern process operations leads to the rapid development of process monitoring techniques. By
promptly detecting process upsets, equipment malfunctions, and other special events, online
process monitoring not only plays an important role in ensuring process safety, but also
improves process efficiency and product quality. Generally speaking, there are two main
categories of process monitoring methods: univariate methods and multivariate methods. In
univariate methods such as Shewhart, cumulative sum and exponentially weighted moving
average charts [1, 3, 4], each variable (usually product quality) is monitored separately to
identify the special causes of variation (i.e., faults) from natural or common causes of varia-
tion. In multivariate methods such as the principal component analysis (PCA) and partial
least squares (PLS) methods [5, 6, 8, 9, 16, 26, 28], multiple variables are monitored together
to detect faults that violate normal process variable correlations.
With a large number of variables measured and stored automatically by distributed control
systems (DCS), multivariate statistical monitoring methods have become increasingly com-
mon in process industry. Specifically, PCA-based process monitoring methods have gained
wide application in chemical and petrochemical industries. PCA-based monitoring meth-
ods can easily handle high dimensional, noisy and highly correlated data generated from
industrial processes, and provide superior performance compared to univariate methods.
In addition, the PCA-based process monitoring methods are attractive because they only
require a good historical data set of normal operation, which are easily available for the
computer-controlled industrial processes.
Although PCA-based monitoring methods have been very successful in many applications,
there are cases where they do not perform well. Two of the possible reasons are the following.
First, PCA only considers the mean and variance-covariance of the process data, and lacks
the capability of providing higher-order representation for non-Gaussian data. Second, the
control limits of Hotelling's T2 and the squared prediction error (SPE) charts are developed
based on the assumption that the latent variables follow a multivariate Gaussian distribu-
tion. When the latent variables are non-Gaussian distributed due to the process nonlinearity
or other reasons, using Hotelling's T2 and SPE may be misleading [33, 128].
To obtain better monitoring performance for the processes where PCA does not work well,
several alternative approaches have been developed. Luo et al. [84] apply discrete wavelet
analysis (DWA) to decompose the process measurements and then apply PCA to the wavelet
coefficients at different detail levels to monitor the process; Kano et al. [129] define a dis-
similarity index to monitor the distribution of the process time series data and to detect
changes in the correlations among process variables that cannot be detected by Hotelling's
T2 and SPE charts; more recently, several multivariate monitoring methods based on inde-
pendent component analysis (ICA) have been proposed, where ICA decomposes the process
data into linear combinations of statistically independent components [11, 33, 103]. Among
these approaches, Kano's approach is a second-order method similar to PCA, while the DWA-
and ICA-based fault detection methods involve higher-order statistics implicitly.
In Chapter 2, a new multivariate statistical process monitoring framework, named Statistics
Pattern Analysis (SPA) [32, 59], is introduced. The major difference between the PCA-
based and SPA-based fault detection methods is that PCA monitors process variables, while
SPA monitors the statistics of the process variables. In other words, SPA examines the
variance-covariance of the process variable statistics (e.g. mean, variance, autocorrelation,
cross-correlation etc.) to perform fault detection. In SPA, different statistics that capture
the different characteristics of the process can be selected to build the model for normal
process operation, and various higher-order statistics can be utilized explicitly. Fault detec-
tion methods based on the SPA framework have been shown to provide superior monitoring
performance for several batch and continuous processes compared to the PCA and dynamic
PCA based methods [32, 59].
In this chapter, the potential of the SPA framework for fault diagnosis of continuous processes
is explored. Specifically, the variable contributions from the fault detection indices gener-
ated by SPA are derived; then contribution plots are constructed to perform fault diagnosis.
Because the SPA-based fault detection method for continuous processes is a window-based
approach, the SPA-based contribution plot is an averaged contribution plot over all samples in
the window. In the first case study, the performance of the SPA framework is compared to linear
monitoring methods such as PCA and DPCA using a nonlinear quadruple-tank system. In
addition, as SPA-based monitoring for continuous processes is a window-based approach and
there might be cases where process data are limited, the effect of window overlapping for
calculating the SPs is also examined in this chapter.
In the second case study, the challenging Tennessee Eastman Process (TEP) [130] is used to
demonstrate the performance of the SPA-based fault detection and diagnosis method, which
is compared with linear process monitoring methods such as PCA, dynamic PCA (DPCA),
together with nonlinear process monitoring methods such as kernel PCA (KPCA), and inde-
pendent component analysis (ICA). In this case study, the main characteristics of SPA are
discussed. In addition, a comprehensive analysis of the important aspects of fault detection
and diagnosis is presented.
4.2 Quadruple Tank
In this section, a simulated quadruple-tank process is used to demonstrate the perfor-
mance of the proposed SPA method and investigate its properties. In addition, the moni-
toring performance of SPA is compared with the traditional PCA and DPCA methods.
The quadruple-tank process was originally developed by Johansson [131] as a novel multi-
variate laboratory process. This process consists of four interconnected water tanks, two
pumps, and associated valves. A schematic diagram of the process is shown in Figure 4.1.
The inputs are the voltages supplied to the pumps, v1 and v2, and the outputs are the water
levels h1 to h4. The flow to each tank is adjusted using the associated valves γ1 and γ2. A
nonlinear model is derived based on mass balances and Bernoulli's law.
dh1/dt = -(a1/A1)√(2 g h1) + (a3/A1)√(2 g h3) + (γ1 k1/A1) v1    (4.1)

dh2/dt = -(a2/A2)√(2 g h2) + (a4/A2)√(2 g h4) + (γ2 k2/A2) v2    (4.2)

dh3/dt = -(a3/A3)√(2 g h3) + ((1 - γ2) k2/A3) v2    (4.3)

dh4/dt = -(a4/A4)√(2 g h4) + ((1 - γ1) k1/A4) v1    (4.4)
For tank i, Ai is the cross-section of the tank, ai is the cross section of the outlet hole, and
hi is the water level. The voltage applied to pump i is vi and the corresponding flow is
ki vi. The parameters γ1 and γ2 ∈ (0,1) are determined from how the valves are set before
an experiment. The water flow rate to tank 1 (i.e., f1) is γ1 k1 v1 and to tank 4 (i.e., f4)
is (1 - γ1) k1 v1. The flow rates for tanks 2 and 3 are obtained similarly. The acceleration of
gravity is denoted by g. The parameter values of this process are given in Table 4.1.
Figure 4.1: Schematic diagram of the quadruple-tank process
Table 4.1: Simulation parameters for the quadruple-tank case study*
Parameter   Units            Value
A1; A3      cm^2             28
A2; A4      cm^2             32
a1; a3      cm^2             0.0071
a2; a4      cm^2             0.0057
k1; k2      cm^3 V^-1 s^-1   3.33
g           cm/s^2           9.81
*After Johansson [131]
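A minimal forward-Euler sketch of Eqs. (4.1)-(4.4) with the parameters of Table 4.1 is shown below; the valve settings γ1 = γ2 = 0.7 and the pump voltages v1 = v2 = 3.0 V are illustrative assumptions (the chapter does not state the operating point), and measurement noise is omitted:

```python
import math

# Parameters from Table 4.1 (values as listed there)
A = [28.0, 32.0, 28.0, 32.0]          # tank cross-sections A1..A4 (cm^2)
a = [0.0071, 0.0057, 0.0071, 0.0057]  # outlet cross-sections a1..a4 (cm^2)
k = [3.33, 3.33]                      # pump constants k1, k2
g = 9.81                              # gravity, value as listed in Table 4.1

# Assumed operating point (not given in the text)
gamma = [0.7, 0.7]  # valve settings
v = [3.0, 3.0]      # pump voltages (V)

def derivs(h):
    """Right-hand sides of Eqs. (4.1)-(4.4)."""
    out = [a[i] / A[i] * math.sqrt(2 * g * h[i]) for i in range(4)]
    dh1 = -out[0] + a[2] / A[0] * math.sqrt(2 * g * h[2]) + gamma[0] * k[0] * v[0] / A[0]
    dh2 = -out[1] + a[3] / A[1] * math.sqrt(2 * g * h[3]) + gamma[1] * k[1] * v[1] / A[1]
    dh3 = -out[2] + (1 - gamma[1]) * k[1] * v[1] / A[2]
    dh4 = -out[3] + (1 - gamma[0]) * k[0] * v[0] / A[3]
    return [dh1, dh2, dh3, dh4]

# Forward-Euler integration from an arbitrary initial condition
h = [12.0, 12.0, 1.5, 1.0]
dt = 1.0  # seconds
for _ in range(600):
    d = derivs(h)
    h = [max(hi + dt * di, 0.0) for hi, di in zip(h, d)]

print([round(x, 2) for x in h])  # levels remain positive and finite
```

A fixed-step Euler scheme is the simplest choice for a sketch like this; a production simulator would use an adaptive ODE solver and add the Gaussian measurement and input noise described in the next section.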
4.2.1 Simulation Settings
The data are generated by Eqs. (4.1)-(4.4), where γi and vi are corrupted by independent
Gaussian white noise with zero mean and standard deviations of 0.01 and 0.05,
respectively, which are about 1-2% of their upper limit or steady-state value. The measured hi
is corrupted by Gaussian white noise with zero mean and standard deviation of
0.1, and the measured γi and vi contain the same level of noise as the inputs γi and vi, respectively.
Three normal data sets are generated with different random seeds with a sampling frequency
of 10 seconds. They are denoted as training set, validation set and testing set, respectively.
The training set contains 240 hours of normal data and is used to build the process model;
the validation set contains 15 hours of normal data and is used to determine the thresholds of
the monitoring indices, i.e., Dp and Dr for SPA, and T2 and SPE for PCA and DPCA. Finally,
the testing set contains 15 hours of normal data, and is used to test the false alarm rate of
different monitoring methods. Two faults, a sensor fault and a tank leakage fault, are
examined in this work. For the sensor fault, a bias of Δh4 = 0.3 is added to the sensor measurement
h4. Figure 4.2 shows the time series data of the process variables, water levels hi and flow
rates fi (i = 1,...,4) corresponding to the sensor fault in h4. It is clear from Figure 4.2 that
this fault is very difficult to detect by the naked eye due to its small magnitude.

Figure 4.2: Time series of process variables for the sensor fault in h4

For the leakage fault, a leakage in tank 1 is introduced by assuming that there is a small hole at
the bottom of the tank with cross-section aleak = 0.005 cm^2. The mass balance for tank 1
changes from Eq. (4.1) to the following:

dh1/dt = -(a1/A1)√(2 g h1) + (a3/A1)√(2 g h3) + (γ1 k1/A1) v1 - (aleak/A1)√(2 g h1)    (4.6)
where the last term corresponds to the leakage of tank 1. Mass balance equations for the
other tanks do not change. Figure 4.3 shows the time series data of the process variables,
water levels hi and flow rates fi (i = 1,...,4) corresponding to the leakage in tank 1.

Figure 4.3: Time series of process variables for the leakage fault in h1

For both faults, we start the simulation with normal operation and then introduce the fault
after 0.5 hours. In total, 15 hours of data are generated.
4.3 The Effect of Window Overlapping
One of the major requirements for the application of SPA is the availability of enough
data, so that there are enough training SPs compared to the number of statistics included to
characterize the process. For continuous processes, if the data are limited, the number of
windows available to calculate the different statistics is reduced, and the determination of
the different coefficients of the SPA model might be affected. To circumvent this problem,
a window-overlapping approach is utilized, i.e., the window used to calculate the statistics is
shifted by a certain percentage of the total window size. Depending on the amount of
overlap, the number of SPs increases; thus the ratio of the number of statistics to the number
of available samples decreases, and the determination of the different coefficients relating
the statistics is more reliable. Although valid, this approach raises one
question: what is the effect of using a large window overlap, considering that one of the
major assumptions of PCA is the availability of independent samples, and a large window
overlap could violate this assumption?
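The trade-off can be quantified: with N samples, window width w, and overlap fraction f, the shift between adjacent windows is w(1 - f) and the number of SPs is floor((N - w)/shift) + 1. A small sketch using the numbers of case I below (3 hours of data at one sample every 10 s, i.e., 1080 samples, and a window width of 125):

```python
def num_sps(n_samples, width, overlap):
    """Number of statistics patterns obtainable from n_samples with a
    given window width and overlap fraction (0 <= overlap < 1)."""
    step = max(1, round(width * (1 - overlap)))
    if n_samples < width:
        return 0
    return (n_samples - width) // step + 1

n = 3 * 360  # 3 hours at one sample every 10 s = 1080 samples
print(num_sps(n, 125, 0.0))   # 8 SPs without overlap
print(num_sps(n, 125, 0.6))   # 20 SPs with 60% overlap
```

With only 3 hours of data, nonoverlapping windows yield just 8 SPs for an 18-statistic model, which is why case I resorts to 60% overlap (20 SPs) at the cost of correlated training SPs.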
In this work, to test the effect of window overlapping, two scenarios are compared: in case I,
3 hours of training data are used to build the model, and in case II, all 240 hours of training
data are used to build the model. For all three methods, the number of PCs is determined
through cross-validation. The thresholds are determined through an empirical method using
validation data. Specifically, a 99% confidence upper control limit is determined as the
index value below which 99% of the calibration samples are located. Standard contribution
plots [34] are used for fault diagnosis. To reduce the effect of process measurement noise
on the contribution plot, the average contribution plot from 100 samples is examined to
evaluate the fault diagnosis performance of different methods.
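The empirical threshold described above amounts to an order statistic of the validation index values. A minimal sketch follows (a simple order-statistic rule; interpolation conventions for percentiles vary across implementations):

```python
import math

def empirical_limit(index_values, confidence=0.99):
    """Upper control limit: the value below which `confidence` of the
    calibration samples are located (simple order-statistic rule)."""
    ordered = sorted(index_values)
    k = min(len(ordered) - 1, math.ceil(confidence * len(ordered)) - 1)
    return ordered[k]

# Example: validation index values 1..100 -> the 99% limit is the 99th value
vals = list(range(1, 101))
print(empirical_limit(vals))  # 99
```

The same rule is applied to each index (Dp, Dr, T2, SPE) on its own validation values, so the expected false alarm rate on normal data matches the chosen significance level.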
4.3.1 Fault 1: The Sensor Bias
PCA, DPCA and SPA are applied to perform fault detection. The model settings for all
three methods are presented in Table 4.2. For the SPA method, the window width to calculate
Table 4.2: Model settings for the quadruple-tank case study
Method   Variables   PCs   Lags   Window Width   Window Overlap
Case I
PCA      8    4    -   -     -
DPCA     24   6    2   -     -
SPA      18   11   2   125   60%
Case II
PCA      8    4    -   -     -
DPCA     24   6    2   -     -
SPA      18   9    2   125   0%
the SPs is chosen to be 125 samples. Due to the limited training samples in case I, a 60%
overlap is allowed between two adjacent windows. In case II, with enough training samples,
the window overlap is removed to eliminate any correlation among different training SPs,
which is expected to improve process monitoring performance. Figure 4.4 presents the fault
detection results from different methods. Because detection results from PCA and DPCA
are similar for case I and case II, only results from case II are presented. For SPA, detection
results for both cases are presented because there are some noticeable differences between
them. A summary of the fault detection rates, detection delays, and false alarm rates for both
cases is presented in Table 4.3.
Table 4.3: Fault detection for the bias in h4
                        PCA           DPCA          SPA
                        T2     SPE    T2     SPE    Dp     Dr
Case I
Detection rate (%)      1.6    2.6    1.1    3.5    98.5   99.8
False alarm rate (%)    1.4    1.0    1.1    0.9    10.0   6.7
Detection delay (min.)  16.5   0.17   16.3   0.17   13.8   6.16
Case II
Detection rate (%)      1.5    2.6    1.3    4.7    98.1   98.6
False alarm rate (%)    0.7    0.7    1.3    1.1    0.5    2.4
Detection delay (min.)  16.7   0.5    16.3   0.17   15.8   12.3
Table 4.3 indicates that the sensor bias is very difficult to detect by PCA and DPCA
methods, with detection rates lower than 5% for both cases. This is mainly due to the
relatively small bias in the level sensor h4. On the other hand, this fault can be easily
Figure 4.4: Fault detection performance. (a) PCA case II, (b) DPCA case II, (c) SPA case
I, (d) SPA case II.
detected by SPA for both cases. However, for case I, SPA has a higher false alarm rate
(10% for Dp and 6.7% for Dr). This is because window overlapping of the training data is
implemented due to the limited training samples. The overlapping causes the training SPs
to no longer be independent from each other; therefore the model may not adequately capture
the process characteristics, which results in a higher false alarm rate. In case II, when more
SPs are included to build the model and no window overlapping is present, not only is the
detection rate high, but the number of false alarms is also reduced significantly. The effect of
highly correlated SPs and their influence on the increase in the false alarm rate is illustrated
in Figure 4.5. In Figure 4.5, the scores of the first two principal components of highly
correlated and uncorrelated SPs for normal training and testing are shown. As observed
in Figure 4.5(a), the scores of the normal training data cannot fully describe the normal
operating region. This causes the scores of normal testing data to form a cluster that
occupies a different region in the plane defined by the first two principal components. This
occurs as a consequence of the high overlapping of windows for calculating the SPs (e.g.
95%), which causes an increase in the false alarm rate. In contrast, Figure 4.5(b) shows that
the scores of normal operating data, where no overlapping is used to calculate the statistics
of the process variables, form an ellipse that better describes the normal operating region.
This causes the scores for normal testing data to be well within that region. As a result, the
false alarm rate is reduced to the established significance level, as verified in
the results presented in Table 4.3. Figure 4.6 compares the monitoring results of SPA on
Figure 4.5: Score plot of the first two principal components for the SPA method. (a)
Highly correlated SPs (95% window overlapping); (b) uncorrelated SPs (i.e., without window
overlapping). The dashed line indicates the control limit of Hotelling's T2 calculated at
the 99% confidence level.
the testing set. Clearly, in case II with independent training samples, the SPA model is able
to capture the process characteristics adequately, and the false alarm rate is consistent
with the confidence limit (99%) that is used to determine the threshold. Note that, in order
to show the details, only segments of the process data are plotted for each method.
Contribution plots are generated to diagnose the fault for the three different methods
for case II. The average contribution plots for the three methods are presented in Figure 4.7.
Figures 4.7(a) and 4.7(b) show the contribution based on PCA-SPE and PCA-T2, respec-
tively. PCA is able to identify h4 as the root cause of the fault. However, the other level
measurements also show significant contribution to the fault, which makes pinpointing the
root cause of the fault difficult. On the other hand, Figure 4.7(c) shows that DPCA is
able to clearly identify the fault by SPE. However, it fails to correctly identify the fault by
T2, as shown in Figure 4.7(d). For SPA, Figures 4.7(e) and 4.7(f) show that both indices
(i.e., Dr and Dp) are able to clearly identify the root cause of the fault (i.e., a mean shift of
level h4).
4.3.2 Fault 2: Leakage in Tank 1
The model settings for fault detection are the same as for the previous fault. Table 4.4
summarizes the fault detection rates and detection delays for each case. Figure 4.8 shows
the detection details for different methods. For this fault, both PCA and DPCA are able
to detect the fault, with PCA showing detection rates ranging from 60% to 73% and DPCA
being able to detect the fault almost entirely by the SPE index. The SPA method is able to
detect the fault successfully, with detection rates close to 100% for both cases. The average
SPE contribution plots for the three methods are presented in Figure 4.9. Figures 4.9(a)
and 4.9(b) display the contribution based on PCA-SPE and PCA-T2, respectively. PCA is
able to identify h1 as the root cause of the fault in both indices. However, h3 is also identified
Figure 4.6: Dp and Dr indices for normal testing data. (a) SPA case I; (b) SPA case II.
Figure 4.7: Contribution plots based on (a) PCA SPE; (b) PCA T2; (c) DPCA SPE;
(d) DPCA T2; (e) SPA Dr; (f) SPA Dp for case II with the bias in h4, where hi and hi-Li
denote the mean of hi and a lagged measurement of hi, respectively.
as one major contributor to the fault in the contribution to SPE, which could lead to a more
difficult diagnosis. On the other hand, Figures 4.9(c), 4.9(d), 4.9(e), and 4.9(f)
indicate that both DPCA and SPA are able to clearly diagnose the fault by identifying a mean shift
of level h1 for SPA (e.g. the mean of h1) and a fault in level h1 for DPCA. However, the diagnosis of
Table 4.4: Fault detection for the leakage in tank 1
                        PCA           DPCA          SPA
                        T2     SPE    T2     SPE    Dp     Dr
Case I
Detection rate (%)      60.6   72.9   48.4   99.7   99.8   99.8
Detection delay (min.)  0.17   1.3    0.17   0.33   2.3    2.0
Case II
Detection rate (%)      49.2   75.3   16.0   99.0   99.7   99.8
Detection delay (min.)  1.3    1.3    4.5    0.17   2.5    1.7
Figure 4.8: Fault detection performance. (a) PCA case II, (b) DPCA case II, (c) SPA case
I, (d) SPA case II.
the nature of the fault is straightforward for SPA, since the contribution plot
indicates that an operating-point shift in h1 has occurred.
Figure 4.9: Contribution plots based on (a) PCA SPE; (b) PCA T2; (c) DPCA SPE;
(d) DPCA T2; (e) SPA Dr; (f) SPA Dp for case II with the leakage in tank 1, where hi and hi-Li
denote the mean of hi and a lagged measurement of hi, respectively.
4.4 Tennessee Eastman Process
The TEP simulator has been widely used by the process monitoring community
as a realistic example to compare various monitoring and control approaches [29, 124, 129].
The process consists of five major unit operations: a reactor, a product condenser, a vapor-
liquid separator, a recycle compressor, and a product stripper. Four reactants, A, C, D, and E,
plus the inert B are fed to the reactor to generate products G and H, as well as byproduct
F, through two exothermic reactions. Figure 4.10 shows a process flow diagram of the TEP.
The simulator was originally developed by Downs and Vogel [130], and different control
strategies were implemented in different modified versions [132-135]. In this work, Ricker's
simulator is used to generate normal and faulty data. The simulator is able to simulate
normal process operation together with 20 faulty conditions. Table 4.5 shows a list of the
20 process disturbances and their type.
Figure 4.10: Flow diagram of the Tennessee Eastman process
Table 4.5: Disturbances in the TEP
no. Description Type
1 A/C feed ratio, B composition constant (stream 4) Step
2 B composition, A/C ratio constant (stream 4) Step
3 D feed temperature (stream 2) Step
4 Reactor cooling water inlet temperature Step
5 Condenser cooling water inlet temperature Step
6 A feed loss (stream 1) Step
7 C header pressure loss-reduced availability (stream 4) Step
8 A,B,C feed composition (stream 4) Random variation
9 D feed temperature (stream 2) Random variation
10 C feed temperature (stream 4) Random variation
11 Reactor cooling water inlet temperature Random variation
12 Condenser cooling water inlet temperature Random variation
13 Reaction kinetics Slow drift
14 Reactor cooling water valve Sticking
15 Condenser cooling water valve Sticking
16-20 Unknown Unknown
4.4.1 Simulation Settings
Following [133], the data are collected at a sampling interval of 3 min. The process data include 11 manipulated variables, 22 continuous process measurements, and 19 composition measurements that are sampled less frequently. Similar to [33], the 22 continuous process measurements and 9 manipulated variables listed in Table 4.6 are used to monitor the process. The other two manipulated variables were fixed to constants in Ricker's decentralized control scheme and therefore are not included in the monitoring. Details about the process variables can be found in [130, 135]. A training set of 800 hr of normal data is used to build the model, 50 hr of normal data are used for false alarm rate calculation, and each fault data set consists of 50 hr, with the first 10 hr being normal.
4.4.2 Monitoring Settings
The monitoring settings of the different methods are listed in Table 4.7. The settings for
PCA, DPCA and SPA are similar to the settings in [32]. For SPA, the statistics included in
Table 4.6: Monitored variables in TEP
no. Variable description no. Variable description
Process Measurements
1 A feed (stream 1) 12 Product separator level
2 D feed (stream 2) 13 Product separator pressure
3 E feed (stream 3) 14 Product separator underflow (stream 10)
4 A and C feed (stream 4) 15 Stripper level
5 Recycle flow (stream 8) 16 Stripper pressure
6 Reactor feed rate (stream 6) 17 Stripper underflow (stream 11)
7 Reactor pressure 18 Stripper temperature
8 Reactor level 19 Stripper steam flow
9 Reactor temperature 20 Compressor work
10 Purge rate (stream 9) 21 Reactor cooling water outlet temperature
11 Product separator temperature 22 Separator cooling water temperature
Manipulated Variables
23 D feed flow (stream 2) 28 Separator pot liquid flow (stream 10)
24 E feed flow (stream 3) 29 Stripper liquid product flow (stream 11)
25 A feed flow (stream 1) 30 Reactor cooling water flow
26 A and C feed flow (stream 4) 31 Condenser cooling water flow
27 Purge valve (stream 9)
the statistic pattern (SP) include the mean, the standard deviation, and the correlation, auto-correlation, and cross-correlation coefficients. A correlation coefficient ri,j is retained if |ri,j| > 0.3 for more than 50% of the training SPs. Similarly, an auto-correlation coefficient r^d_i or cross-correlation coefficient r^d_i,j is retained if |r^d_i| > 0.3 or |r^d_i,j| > 0.3, respectively, for more than 50% of the training SPs, where d denotes the time lag (d = 2) between variables. For KPCA and ICA, the settings were found after a thorough search. For KPCA, a Gaussian kernel function, i.e., k(x, y) = exp(−‖x − y‖²/c), is used to compute the kernel matrix K, where c = αmβ². The parameters α = 8 and β = 0.9 were found by optimizing the detection performance, and m = 31 is the number of process variables. In addition, the number of components retained after applying PCA in the feature space and the number of components for testing are determined by optimizing the detection performance. For ICA, the FastICA [102] algorithm described in Chapter 2 is used to find the independent components. A contrast function approximating negentropy, G1(u) = (1/a1) log cosh(a1 u), with a1 = 1, is used. The number of independent components for testing is determined using the Euclidean norm approach [106]. For all methods, an empirical approach is used to determine the upper control limits so that different methods can be compared based on the same confidence level [32].
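The construction of statistics patterns and the correlation-retention rule described above can be sketched as follows. This is an illustrative Python sketch, not the implementation used in this work; all function and variable names are mine, and the window width (100) and shift (50) follow the SPA settings in Table 4.7.

```python
import numpy as np

def window_statistics(window, d=2):
    """Compute one statistics pattern (SP) from a data window (n x m):
    means, standard deviations, correlation coefficients, and lag-d
    auto-/cross-correlations (an illustrative subset of the SPA statistics)."""
    n, m = window.shape
    mean = window.mean(axis=0)
    std = window.std(axis=0, ddof=1)
    r = np.corrcoef(window, rowvar=False)      # instantaneous correlations
    # lag-d correlations between x_i(t) and x_j(t - d)
    a, b = window[d:], window[:-d]
    a = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    b = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    r_d = (a.T @ b) / (n - d - 1)              # diagonal: auto-correlations
    return mean, std, r, r_d

def retained_mask(r_list, threshold=0.3, frac=0.5):
    """Keep a coefficient if |r| > threshold in more than `frac` of the SPs."""
    exceeds = np.abs(np.stack(r_list)) > threshold
    return exceeds.mean(axis=0) > frac
```

During training, one SP is computed per window (shifted by 50 samples), and `retained_mask` is applied across all training SPs to decide which correlation coefficients enter the monitoring model.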
Table 4.7: Settings of different fault detection methods
Methods Variables PCs (ICs) lags Window width Window shifting step
PCA 31 9 - - -
DPCA 90 20 2 - -
ICA 31 6 - - -
KPCA 60 26 - - -
SPA 257 6 2 100 50
4.5 Fault detection
In this section, the most important aspects in fault detection are discussed based on the
results obtained in monitoring the 20 faults in the TEP benchmark problem.
4.5.1 Detection Rate
The detection rates of the five methods for all faults are listed in Table 4.8. It is worth noting that the results in Table 4.8 for PCA, DPCA and SPA are different from those reported in [32], because different data sets were used. In [32], the data set was generated by the control strategy developed in [133], where the plant-wide control structure recommended in [134] was implemented to generate the closed-loop normal and faulty process data. In the current work, the TEP simulator with the decentralized control system developed by Ricker [135] was used instead. By visually comparing the two control strategies under normal operating conditions, it was noticed that the process variation under the decentralized control is much smaller than that under the plant-wide control. Therefore, the normal process model with the decentralized control defines a tighter region of normal operation than that with the plant-wide control, which makes the model more sensitive and therefore more effective for fault detection, no matter what fault detection method is used. As a result, although similar settings were used, the fault detection rates of all three methods are higher for almost all faults in this work than those reported in [32], with PCA and DPCA improving the most. From Table 4.8, it is evident that all five methods are effective in detecting most of the faults. It is worth noting that faults 3, 9 and 15 have been suggested to be difficult to detect when the plant-wide control strategy is implemented. These results were confirmed in this work, since these faults are also difficult to detect by any of the five methods when the decentralized control strategy is implemented; therefore, these three faults are not listed in Table 4.8. In addition, fault 16 cannot be detected by any of the five methods when the decentralized control strategy is implemented. After visually inspecting the four difficult cases, it was found that these faults are not detected because the disturbances were completely rejected without introducing noticeable changes to any process variables. For the rest of the 16 faults, PCA, DPCA, KPCA and ICA have difficulty in detecting faults 5 and 12, with detection rates lower than 55% in most of the cases, as depicted in Figures 4.11 - 4.13. On the other hand, the SPA-based method is able to detect all 16 faults with detection rates higher than 90%. Additionally, for the faults that cannot be completely detected by the PCA, DPCA, KPCA and ICA methods, such as fault 18, SPA is more effective, as illustrated in Figure 4.13.
4.5.2 False Alarm Rate
In any process monitoring approach, it is attempted to reduce the false alarm rates to
the desired significance level (e.g. consistent samples labeled as faulty samples). In this
work, to ensure a fair comparison, the thresholds for monitoring are determined empirically
based on the values of the monitoring indexes in normal validation data. The false alarm
rates of all the methods for training and testing normal data are listed in Table 4.9. A
Table 4.8: Fault detection rates (percentage) of PCA, DPCA, KPCA, ICA, and SPA for the TEP
PCA DPCA KPCA ICA SPA
fault T2 SPE T2 SPE T2 SPE I2d I2e SPE Dp Dr
1 99.9 100 100 100 100 99.7 100 100 99.9 99.3 99.5
2 99.5 99.2 99.6 99.4 99.6 98.6 99.4 99.4 99.1 99.0 98.8
4 100 100 100 100 100 100 100 100 100 99.8 99.9
5 3.6 0.6 3.6 1.5 3.0 0.4 0.6 2.1 3.7 0.0 93.4
6 100 100 100 100 99.3 100 100 100 100 99.8 99.4
7 100 100 100 100 100 100 100 100 100 99.9 99.9
8 98.9 98.1 99.0 98.0 98.9 95.6 98.8 98.8 98.4 96.9 98.4
10 83.7 93.1 93.0 96.2 97.3 69.6 95.6 96.4 92.7 98.1 98.5
11 94.9 98.2 98.9 99.7 99.1 81.4 94.1 98.9 98.3 99.0 99.5
12 44.2 14.6 59.1 39.1 54.9 4.5 22.9 44.9 42.1 93.8 94.6
13 98.7 98.7 99.0 99.1 99.1 98.7 98.6 99.1 98.9 97.9 98.3
14 99.1 100 100 100 100 92.7 100 100 100 99.5 99.8
16 4.4 0.5 4.1 0.4 2.6 0.4 0.6 1.8 3.7 0.0 0.0
17 97.2 99.2 99.2 99.6 99.5 88.0 99.2 99.4 99.12 98.4 99.0
18 71.3 76.5 73.0 85.7 86.6 64.5 78.1 81.6 75.1 93.6 96.0
19 98.0 98.5 99.5 99.0 99.4 94.9 98.1 99.3 98.9 96.5 98.9
20 98.9 98.6 99.0 99.0 98.9 99.6 98.1 99.3 98.9 97.9 98.5
detailed analysis of the results shown in Table 4.9 indicates that the thresholds determined
empirically with 99% confidence level are reasonable and consistent, because the false alarm
rates are not far from 1% for all the methods.
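The empirical threshold determination can be sketched as follows; taking a quantile of the monitoring index over normal validation data is one common way to implement it, though the exact procedure in [32] may differ, and the names below are illustrative.

```python
import numpy as np

def empirical_limit(index_values, confidence=0.99):
    """Upper control limit as the `confidence` quantile of a monitoring
    index evaluated on normal validation data (no distributional assumption)."""
    return np.quantile(index_values, confidence)

def false_alarm_rate(index_values, limit):
    """Fraction of normal samples flagged as faulty by the limit."""
    return np.mean(index_values > limit)
```

Because the limit is set directly from the empirical distribution of the index, a 99% confidence level yields, by construction, a false alarm rate near 1% on normal data, which is what Table 4.9 reflects.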
Table 4.9: False alarm rates (percentage) of PCA, DPCA, KPCA, ICA, and SPA for the TEP
PCA DPCA KPCA ICA SPA
Data T2 SPE T2 SPE T2 SPE I2d I2e SPE Dp Dr
Training 2.1 0.4 2.4 0.7 0.6 1.7 0.6 1.2 1.6 1.0 1.0
Testing 1.5 0.3 1.4 0.7 1.1 1.0 0.6 0.3 0.8 0.0 2.6
4.5.3 Detection Delay
It is worth noting that although detection delay is usually associated with window-based
approaches, there is only minor detection delay associated with SPA method. This is because
the detection limit is much tighter in SPA [59], since the variance of the variable statistics
is only 1n of the process variable (n is the window width). This is illustrated in Table 4.10,
where the detection delay associated with SPA is not significant compared to the detection
delay observed for the rest of the methods. Table 4.10 shows that there is a slight increment
of detection delay for faults 5 and 12, which is related to the nature of the fault and the
type of control strategy implemented. From Table 4.8, it is observed that SPA is the only
method to detect fault 5. Fault 5 corresponds to a step change of an unmeasured disturbance
(i.e., the condenser cooling water inlet temperature). A detailed analysis of the process data
indicates that this fault is rejected by the control strategy fairly well. However, as it will
be discussed in the fault diagnosis section, it eventually affects the mean of variable 31 (i.e.,
the condenser cooling water flow), which is only detected by SPA. As shown in Figure 4.11,
after this fault propagates enough through the system, the detection of SPA gives a clear
indication of the fault. For fault 12, a random variation of the condenser cooling water inlet
temperature affects the system. Again, the control strategy rejects the disturbance fairly
well, and although there is a slight delay of SPA detecting this fault, SPA is the only method
that after this fault propagates enough, it gives a clear indication that the fault occurred.
This is illustrated in Figure 4.12, where the detection performance of Dp and Dr in SPA
does not observe a chattering effect (i.e., drastic up?s and downs) around the monitoring
threshold, as the monitoring indexes of the other methods. For fault 16, as this fault is
not detected by any of the methods, the associated delay time is obviously large. For the
rest of the faults, the delay associated with SPA varies from 1 to 5 samples (1 sample is
equivalent to 3 minutes), which is not significant if one considers the benefits of having a
clear indication that a fault has occurred, as illustrated in Figures 4.11 - 4.13. This behavior
is also observed for the rest of the detectable faults.
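The 1/n variance reduction behind SPA's tighter limits can be verified with a short numerical sketch (illustrative, not part of the original study):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100                                    # window width
# 20000 independent windows of a process variable with variance 4
x = rng.normal(loc=0.0, scale=2.0, size=(20000, n))
window_means = x.mean(axis=1)              # one "mean statistic" per window
var_ratio = window_means.var() / x.var()
print(var_ratio)                           # close to 1/n = 0.01
```

Because the mean statistic varies so little under normal operation, its control limit can sit much closer to the normal operating point, which is why SPA detects small shifts that the raw-variable indices miss.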
Table 4.10: Fault detection delay (samples) of PCA, DPCA, KPCA, ICA, and SPA for the TEP
PCA DPCA KPCA ICA SPA
fault T2 SPE T2 SPE T2 SPE I2d I2e SPE Dp Dr
1 2 1 1 1 1 3 2 1 1 7 5
2 5 7 4 4 4 11 4 3 7 9 11
4 1 1 1 1 1 1 1 1 1 3 2
5 > 70 > 70 > 70 > 70 > 70 > 70 > 70 > 70 > 70 > 70 54
6 1 1 1 1 1 1 1 1 1 3 2
7 1 1 1 1 1 1 11 1 1 2 2
8 10 13 9 9 10 28 > 70 10 13 19 14
10 12 8 10 6 7 14 3 7 9 16 13
11 3 2 1 1 2 3 21 2 2 9 5
12 19 15 10 40 15 > 70 11 14 15 49 44
13 11 11 9 8 8 11 1 9 10 18 15
14 1 1 1 1 1 2 198 1 17 5 3
16 > 70 > 70 > 70 > 70 > 70 > 70 5 > 70 > 70 > 70 > 70
17 8 7 6 4 5 8 29 6 7 14 9
18 39 31 41 22 25 33 5 30 38 52 33
19 7 8 3 6 4 11 8 4 6 29 10
20 10 11 9 8 10 3 1 7 8 18 13
Figure 4.11: Detection performance of PCA, DPCA, KPCA, ICA and SPA for fault 5. (a) PCA; (b) DPCA; (c) KPCA; (d) ICA; (e) SPA. The dash-dotted lines indicate the fault onsets and dashed lines the index thresholds.
Figure 4.12: Detection performance of PCA, DPCA, KPCA, ICA and SPA for fault 12. (a) PCA; (b) DPCA; (c) KPCA; (d) ICA; (e) SPA. The dash-dotted lines indicate the fault onsets and dashed lines the index thresholds.
Figure 4.13: Detection performance of PCA, DPCA, KPCA, ICA and SPA for fault 18. (a) PCA; (b) DPCA; (c) KPCA; (d) ICA; (e) SPA. The dash-dotted lines indicate the fault onsets and dashed lines the index thresholds.
4.6 Fault diagnosis
In this section, the proposed SPA-based fault diagnosis approach is applied to the TEP to identify the root cause of each fault. As a comparison, diagnosis based on PCA, DPCA, KPCA and ICA is also applied. For the sake of clarity, the diagnoses of all faults are not discussed in detail; instead, three illustrative examples are given below. It is important to mention that, in general, the PCA, DPCA, KPCA, ICA and SPA methods are effective in pinpointing the major fault-contributing process variables for most of the faults. For the case of KPCA diagnosis, the method was found to be quite sensitive to the monitoring settings, i.e., a good set of detection settings might not provide a good diagnosis and vice versa. For the contribution plots based on SPA, two subplots are used to show the contributions from the variable mean and standard deviation. The contributions from other statistics, such as auto- and cross-correlations, are small and are not shown. The example of fault 10 is shown in Figure 4.14, where random variation is introduced to the C feed temperature (stream 4 to the stripper in Figure 4.10). Figure 4.14 shows that the PCA, DPCA, KPCA, ICA and SPA methods correctly identify variable 18 (stripper temperature) as the major fault-contributing variable. In the case of KPCA, a significant drop for variable 18 is observed in the reconstruction index, thus indicating that variable 18 is the root cause of the fault (dashed line in Figure 4.14(c)). Compared to the other diagnoses, the SPA-based diagnosis via contribution plots provides extra information by identifying that it was the variance, not the mean, of variable 18 that contributed to this fault.
There are cases where SPA does a better job in fault detection than the rest of the methods; in those cases, SPA also does a better job in fault diagnosis. One such example is fault 5, shown in Figure 4.15, where a step change occurred in the condenser cooling water inlet temperature. Table 4.8 shows that PCA, DPCA, KPCA and ICA cannot detect this fault. Therefore, it is expected that they cannot diagnose the fault either, which is verified by Figures 4.15(a), 4.15(b), and 4.15(d), where a range of process variables across different units all contribute to this fault. In the case of KPCA, Figure 4.15(c) shows that variable 31
Figure 4.14: Diagnosis of fault 10. (a) PCA (top T2, bottom SPE); (b) DPCA (top T2, bottom SPE); (c) KPCA (reconstruction index); (d) ICA (top I2d, middle I2e, bottom SPE); (e) SPA (top row Dp, bottom row Dr).
Figure 4.15: Diagnosis of fault 5. (a) PCA (top T2, bottom SPE); (b) DPCA (top T2, bottom SPE); (c) KPCA (reconstruction index); (d) ICA (top I2d, middle I2e, bottom SPE); (e) SPA (top row Dp, bottom row Dr).
Figure 4.16: Diagnosis of fault 12. (a) PCA (top T2, bottom SPE); (b) DPCA (top T2, bottom SPE); (c) KPCA (reconstruction index); (d) ICA (top I2d, middle I2e, bottom SPE); (e) SPA (top row Dp, bottom row Dr).
exhibits a significant drop in the reconstruction index. However, to obtain this diagnosis performance, a different set of diagnosis settings (e.g., the width c of the Gaussian kernel) was used, which is an adverse factor for online implementation since it introduces more tunable parameters. On the other hand, SPA was able to isolate the fault root cause to the manipulated variable 31, the condenser cooling water flow rate. Since the cooling water inlet temperature is not measured, manipulated variable 31 is the most closely related variable. In addition, SPA was able to indicate that a mean change, not a variance change, in variable 31 is the root cause, which is directly related to the step change in the condenser cooling water inlet temperature.
The final example, fault 12 (random variation in the condenser cooling water inlet temperature), shown in Figure 4.16, illustrates that a successful fault detection does not necessarily lead to a correct fault diagnosis. From Table 4.8, all five methods were able to detect the fault. However, from Figure 4.16, PCA, DPCA, KPCA and ICA were not able to isolate the root cause of the fault. SPA was able to relate the fault to the separator temperature (variable 11) right after the condenser. Again, since neither the cooling water temperature nor the condenser temperature is measured, and this random variation would not trigger the manipulated cooling water flow rate to change, the downstream separator right after the condenser is the closest measured location to the actual fault. In other words, the random variation in the condenser cooling water inlet temperature (not measured) immediately leads to random variation in the separator temperature (measured, variable 11). It is also worth noting that SPA correctly identified that the standard deviation, not the mean, of variable 11 is the major fault contributor.
4.7 Conclusion and Discussion
In the first case study, a nonlinear system (quadruple-tank) is simulated to examine the performance of the SPA method. The example demonstrates that SPA is able to handle nonlinear processes, and to detect and diagnose subtle changes in the dynamics of a system. Additionally, SPA is able to detect and diagnose both faults more efficiently than the PCA and DPCA methods. The two cases with different training-sample sizes indicate that SPA can also perform well with a limited amount of data. However, when data are limited, window overlapping is necessary. Window overlapping causes the samples to no longer be independent from each other, which increases the false alarm rate for normal operating data. When enough data are available, SPA not only provides high detection rates and a clearer identification of a fault, but also considerably reduces the false alarms for normal operating data. It is worth noting that the cost associated with using more SPs is that a large amount of data is needed for model building, as the number of SPs is only 1/w of the number of original training samples, where w is the size of the window. However, for most industrial processes this limitation can be overcome, since large amounts of historical data are available. Additionally, although SPA is window-based, the associated detection delay is limited. In fact, the detection delay is comparable to that of the other monitoring methods, because the window in SPA can be shifted by one sample during online monitoring once the model is built.
In the second case study, the potential of the SPA framework for fault detection and diagnosis at a plant-wide level was explored. The Tennessee Eastman process is used to evaluate the performance of the developed fault detection and diagnosis method based on the SPA framework, which is compared with two commonly applied linear monitoring methods (PCA and DPCA) and two nonlinear monitoring methods (ICA and KPCA). The case study shows that, in general, all five methods work well in detecting and diagnosing most of the faults. However, for some faults that are difficult to detect and/or diagnose by the PCA, DPCA, KPCA and ICA based methods, the SPA-based method provides superior performance. As SPA-based process monitoring for continuous processes is a window-based approach, some detection delay is associated with its application. In this case study, it was shown that the detection delay associated with SPA is not significant, as the window can be shifted by one sample online. In addition, the detection of faults is much clearer than with the other methods and does not exhibit drastic changes in the monitoring index around the threshold. For fault diagnosis, because the SPA-based method breaks down the contribution of a fault into contributions from different variable statistics, it provides extra information in addition to identifying the major fault-contributing variable(s). For example, the SPA-based contribution plots tell us whether the fault is due to a change in variable mean or variance. It should also be noted that, in general, SPA requires more training data to build a reliable model due to the computation of variable statistics. However, this is not a big issue, because most modern processes are equipped with DCSs and are therefore data rich.
Chapter 5
Partial Least Squares Algorithms
Given the importance of partial least squares (PLS) and recursive PLS in the development of the second part of this work, this chapter briefly discusses the PLS algorithm and different recursive model update approaches. Specifically, the main characteristics of regular recursive PLS (RPLS), block-wise RPLS (BRPLS), moving window block-wise RPLS (MW-BRPLS), and forgetting factor block-wise RPLS (FF-BRPLS) are reviewed. In addition, the data scaling methods that accompany the different algorithms are developed and discussed.
5.1 PLS Regression
PLS regression has emerged as a useful alternative for analyzing systems of secondary measurements X that are strongly collinear, highly correlated, noisy, and of high dimensionality, which are typical characteristics of data from industrial processes [19, 22, 24, 25, 51, 136]. In PLS, the interest is in using the variables in the X matrix (i.e., easy-to-measure) to predict the behavior of one or several primary variables Y (e.g., quality or yield), which is one of the most common types of problems in science and engineering [24]. Multiple linear regression (MLR) is the most basic form of model to achieve this goal. However, due to correlations among process variables, the size of the data set, and measurement noise, MLR methods lead to imprecise estimates and poor predictions [26, 40]. In PLS, the properties of the NIPALS [24] algorithm are utilized to let the score matrix T (i.e., the projections of the X matrix onto the plane defined by the characteristic directions of variability) represent the original data matrix X. The scores T of X are calculated in such a way that they are good predictors of Y. In other words, the decompositions of the matrices X and Y are done so as to maximize the covariance between them [136]. In this way, the first few retained latent variables contain enough information to characterize X, and at the same time they contain useful predictive information, providing both dimensionality reduction and numerical robustness compared to MLR [136]. The PLS algorithm, as developed by Wold et al. [24], is briefly reviewed below. Details of the algorithm and its properties can be found in [19, 24, 25].
Let X ∈ R^(n×m) and Y ∈ R^(n×p) denote the input and output data matrices, where n is the number of samples, m the number of independent variables, and p the number of dependent variables.
Assume that X and Y are linearly related by:
Y = XC+V (5.1)
where C is the coefficient matrix and V is the noise matrix with appropriate dimensions.
The PLS regression method derives the linear model by first decomposing data matrices X
and Y into bilinear terms:
X = TP^T + E (5.2)
Y = UQ^T + F (5.3)
where T ∈ R^(n×a) and U ∈ R^(n×a) are the score matrices of X and Y, respectively, and a is the number of retained PLS components. The matrices P ∈ R^(m×a) and Q ∈ R^(p×a) are known as the loading matrices, and E and F are the corresponding residual matrices with appropriate dimensions. Equations (5.2) and (5.3) formulate the PLS outer model. The final goal of PLS
is to describe as much variation of Y as possible and at the same time get a useful relation
between X and Y [19]. This is done by relating the scores of X and Y with a linear inner
model:
U = TB+R (5.4)
where B ∈ R^(a×a) is a diagonal matrix of inner model coefficients that are obtained by minimizing the residual R.
The estimate Ŷ of Y is then obtained as:
Ŷ = TBQ^T (5.5)
For convenience, in this work the following notation is used to represent the modeling procedure that obtains the PLS model {T, P, U, Q, B} from the pair of data matrices {X, Y}:
{X, Y} --PLS--> {T, P, U, Q, B} (5.6)
Examples of the application of PLS regression for soft sensor development are given in the
subsequent chapters.
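A compact sketch of the NIPALS-based PLS procedure in Equations (5.2)-(5.6) is given below. This is an illustrative implementation written for this discussion, not the code of [24] or of this work; names and numerical details are assumptions.

```python
import numpy as np

def nipals_pls(X, Y, a, max_iter=500, tol=1e-12):
    """Decompose X = T P^T + E and Y = U Q^T + F with inner model U ~ T B,
    following the NIPALS procedure of Eqs. (5.2)-(5.4)."""
    X, Y = X.copy(), Y.copy()
    n, m = X.shape
    p = Y.shape[1]
    T, P = np.zeros((n, a)), np.zeros((m, a))
    U, Q = np.zeros((n, a)), np.zeros((p, a))
    W, b = np.zeros((m, a)), np.zeros(a)
    for k in range(a):
        u = Y[:, [0]]                          # initialize with a Y column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)             # X-weights
            t = X @ w                          # X-scores
            q = Y.T @ t
            q /= np.linalg.norm(q)             # Y-loadings
            u_new = Y @ q                      # Y-scores
            if np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new):
                u = u_new
                break
            u = u_new
        p_k = X.T @ t / (t.T @ t)              # X-loadings
        b[k] = (u.T @ t / (t.T @ t)).item()    # inner-model coefficient
        X -= t @ p_k.T                         # deflate X
        Y -= b[k] * t @ q.T                    # deflate Y
        T[:, k], P[:, k] = t[:, 0], p_k[:, 0]
        U[:, k], Q[:, k], W[:, k] = u[:, 0], q[:, 0], w[:, 0]
    return T, P, U, Q, np.diag(b), W

def pls_predict(Xnew, P, Q, B, W):
    """Y_hat = Xnew W (P^T W)^{-1} B Q^T for new (already scaled) data."""
    return Xnew @ W @ np.linalg.inv(P.T @ W) @ B @ Q.T
```

When the number of retained components equals the rank of X, the PLS fit coincides with the ordinary least-squares fit; with fewer components, PLS trades a small bias for the dimensionality reduction and robustness discussed above.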
5.2 Recursive PLS
In traditional PLS, a static model is built using a training data set, which is later used for predicting future observations. If either new information is introduced or the conditions reflected in the training data change, a new model needs to be trained using both the old and the new training samples to adequately capture the characteristics of the process [137]. If this approach is followed continuously, the amount of training data for the update quickly becomes an issue. Helland et al. [137] proposed a recursive PLS (RPLS) algorithm that efficiently overcomes this data build-up. In their work, instead of using the old data and the new data to rebuild a new PLS model, the old PLS model derived from the old data is augmented with the new data to obtain an updated PLS model. However, Helland's RPLS approach suffers from numerical instability for rank-deficient data. This issue was later addressed by Qin [138]. In this work, Qin's approaches are implemented to update the RO-DPLS model when new data become available. In this and the following sections, Qin's approaches are briefly reviewed.
If {X, Y} denotes the old data and {X1, Y1} the new data, it was shown in [138] that performing PLS on the stacked data pair
{[X; X1], [Y; Y1]}
results in the same regression model as performing PLS on the data pair
{[P^T; X1], [BQ^T; Y1]} (5.7)
provided that the residual matrix E of X in Equation (5.2) is essentially zero, where [A; A1] denotes vertical stacking of the matrices A and A1. If n and n1 denote the number of samples in the old data set and the new data set, respectively, the RPLS approach updates the model using a PLS run size of (a + n1), while regular PLS updates the model with a run size of (n + n1), where a is the number of PLS components used to replace the input matrix X. It is clear that RPLS is much more efficient than regular PLS if n >> a, which is typical in PLS applications [138].
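The result in Equation (5.7) can be illustrated numerically. The sketch below uses a minimal single-output NIPALS routine written for this illustration (not the implementation used in this work) and checks that fitting PLS on the reduced pair {[P^T; X1], [BQ^T; y1]} reproduces the regression vector of the full stacked data, using a rank-a X so that E = 0 as the theorem requires.

```python
import numpy as np

def pls1(X, y, a):
    """Minimal single-output NIPALS PLS; returns loadings P, inner
    coefficients b, y-loadings q (all 1 for one output), and weights W."""
    X, y = X.copy(), y.copy().reshape(-1, 1)
    m = X.shape[1]
    P, W = np.zeros((m, a)), np.zeros((m, a))
    b, q = np.zeros(a), np.ones(a)
    for k in range(a):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        p_k = X.T @ t / (t.T @ t)
        b[k] = (y.T @ t / (t.T @ t)).item()
        X -= t @ p_k.T                         # deflate X and y
        y -= b[k] * t
        P[:, k], W[:, k] = p_k[:, 0], w[:, 0]
    return P, b, q, W

def coef(P, b, W):
    """Regression vector c = W (P^T W)^{-1} b."""
    return W @ np.linalg.inv(P.T @ W) @ b.reshape(-1, 1)

rng = np.random.default_rng(1)
n, n1, m, a = 200, 40, 6, 3
L = rng.normal(size=(m, a))
X = rng.normal(size=(n, a)) @ L.T              # old data of exact rank a, so E = 0
X1 = rng.normal(size=(n1, a)) @ L.T            # new data block
c_true = rng.normal(size=(m, 1))
y, y1 = X @ c_true, X1 @ c_true

P, b, q, W = pls1(X, y, a)                     # old model
# RPLS (Eq. 5.7): replace the old data by {P^T, b*q} and append the new block
Pr, br, qr, Wr = pls1(np.vstack([P.T, X1]),
                      np.vstack([(b * q).reshape(-1, 1), y1]), a)
# Reference: ordinary PLS on the full stacked data
Pf, bf, qf, Wf = pls1(np.vstack([X, X1]), np.vstack([y, y1]), a)
# coef(Pr, br, Wr) and coef(Pf, bf, Wf) agree, with only a + n1 rows fitted
```

Note that the RPLS fit above works with a + n1 = 43 rows instead of n + n1 = 240, which is where the computational saving comes from.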
5.3 Block-wise RPLS
To further improve computation efficiency, Qin [138] also introduced the block-wise
RPLS (BRPLS), where a PLS sub-model is first built using the new data pair {X1,Y1},
i.e., {X1,Y1} PLS? {T1,P1,U1,Q1,B1}; then it is combined with the existing model to
derive the updated PLS model. It was shown in [138] that performing PLS on the data pair:
\[
\left\{ \begin{bmatrix} X \\ X_1 \end{bmatrix},\; \begin{bmatrix} Y \\ Y_1 \end{bmatrix} \right\}
\]
results in the same regression model as performing PLS on the data pair:
\[
\left\{ \begin{bmatrix} P^T \\ P_1^T \end{bmatrix},\; \begin{bmatrix} BQ^T \\ B_1 Q_1^T \end{bmatrix} \right\} \qquad (5.8)
\]
provided that the residual matrices of X and X_1 are essentially zero, i.e., E = 0 and E_1 = 0.
5.4 Block-wise RPLS with Moving Window
To adequately adapt to process changes, it is desirable to exclude older data from the
model by incorporating a moving window approach. Given \{X_i, Y_i\} \xrightarrow{\text{PLS}} \{T_i, P_i, U_i, B_i, Q_i\},
following [138], MW-BRPLS performs PLS regression on the following pair of matrices:
\[
\left\{
\begin{bmatrix} P_i^T \\ P_{i-1}^T \\ \vdots \\ P_{i-w+1}^T \end{bmatrix},\;
\begin{bmatrix} B_i Q_i^T \\ B_{i-1} Q_{i-1}^T \\ \vdots \\ B_{i-w+1} Q_{i-w+1}^T \end{bmatrix}
\right\} \qquad (5.9)
\]
When the new data pair \{X_{i+1}, Y_{i+1}\} becomes available, a PLS sub-model is first derived:
\[
\{X_{i+1}, Y_{i+1}\} \xrightarrow{\text{PLS}} \{T_{i+1}, P_{i+1}, U_{i+1}, B_{i+1}, Q_{i+1}\}
\]
then P_{i+1}^T and B_{i+1} Q_{i+1}^T are augmented into the top row of the matrices in Equation (5.9)
and the bottom row is dropped. The window size w controls how fast the old data are
excluded from the model; a narrow window makes the model adapt faster to the new data.
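The window update of Equation (5.9) is pure bookkeeping: keep the w most recent sub-model pairs \{P_i^T, B_i Q_i^T\}, stack them, and refit PLS on the stacked pair. A minimal sketch of that bookkeeping is given below; the sub-model matrices are random placeholders here, whereas in practice each pair would come from a PLS fit of the corresponding data block.

```python
from collections import deque
import numpy as np

def make_window(w):
    """FIFO window of PLS sub-model pairs (P_i^T, B_i Q_i^T); cf. Eq. (5.9)."""
    return deque(maxlen=w)   # a full deque drops the oldest pair automatically

def push_block(window, PT_i, BQT_i):
    window.appendleft((PT_i, BQT_i))   # newest sub-model goes on top

def stacked_pair(window):
    """Regressor/response matrices on which the MW-BRPLS model is refit."""
    X_stack = np.vstack([PT for PT, _ in window])
    Y_stack = np.vstack([BQ for _, BQ in window])
    return X_stack, Y_stack

# Illustration with placeholder sub-model matrices
# (a = 3 latent variables, m = 5 inputs, r = 1 output, window w = 4):
rng = np.random.default_rng(1)
win = make_window(w=4)
for i in range(6):          # push 6 blocks; only the 4 most recent are retained
    push_block(win, rng.standard_normal((3, 5)), rng.standard_normal((3, 1)))
Xs, Ys = stacked_pair(win)
print(len(win), Xs.shape, Ys.shape)
```

Refitting PLS on (Xs, Ys) then yields the combined MW-BRPLS model; the stacked matrices have only w·a rows regardless of how much raw data has been seen.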
5.5 Block-wise RPLS with Forgetting Factor
An alternative approach for online adaptation is to use a forgetting factor. Follow-
ing [138], the PLS model at step i is obtained by performing PLS on the following data
pair:
\[
\left\{ \begin{bmatrix} P_i^T \\ \lambda P_{ic}^T \end{bmatrix},\; \begin{bmatrix} B_i Q_i^T \\ \lambda B_{ic} Q_{ic}^T \end{bmatrix} \right\} \qquad (5.10)
\]
where the subscript ic denotes the combined model obtained at step i-1, and \lambda is the
corresponding forgetting factor. A \lambda close to zero forgets the old data faster.
5.6 Online Data Scaling
The PLS and RPLS algorithms discussed before are derived with the assumption that
the data pair {X,Y} is scaled to zero mean and unit variance, so that different variables
will equally influence the regression model. In addition, the linear relationship between X
and Y extracted from the process data can be easily distorted by measurement noise, the
presence of outliers, and/or autocorrelated measurements. Therefore, data preprocessing is
an important step in building a data-driven PLS soft sensor. The commonly applied data
preprocessing steps include data scaling, outlier detection, and data denoising [139]. For
off-line model building, the data preprocessing is usually an iterative process, i.e., the pre-
processing steps such as outlier detection are repeated several times until the data obtained
can be used for model training, testing, and validation. Currently, data preprocessing,
especially outlier detection, is the step that requires a significant amount of manual work
and expert knowledge about the underlying process characteristics [140]. For online applications,
data preprocessing becomes more challenging because it is generally impossible to repeat the
data preprocessing steps with plant expert inputs as in off-line model building.
The most commonly applied scaling method is normalization or autoscaling, i.e., removing
variable means and scaling variables to unit variance. For off-line PLS model building, the
mean and the variance of the n_0 training samples are calculated by:
\[
\bar{x}_0 = \frac{1}{n_0} \sum_{i=1}^{n_0} x_i \qquad (5.11)
\]
\[
\sigma_0^2 = \frac{1}{n_0 - 1} \sum_{i=1}^{n_0} (x_i - \bar{x}_0)^2 \qquad (5.12)
\]
Because many processes are time varying, the means and variances of the process variables
usually change over time; correspondingly, the means and variances should be updated as
well. In this work, three different preprocessing approaches that accompany the different
model update approaches are considered.
1. Recursive scaling and centering (RSC):
For each variable:
\[
\bar{x}_n = \frac{n-1}{n}\,\bar{x}_{n-1} + \frac{1}{n}\,x_n \qquad (5.13)
\]
\[
\sigma_n^2 = \frac{n-2}{n-1}\,\sigma_{n-1}^2 + \frac{1}{n-1}(x_n - \bar{x}_n)^2 + (\bar{x}_n - \bar{x}_{n-1})^2 \qquad (5.14)
\]
where n is the total number of the old and new data points; \bar{x}_n and \sigma_n are the updated
mean and standard deviation. The detailed derivation of Equations (5.13) and (5.14)
is given in Appendix C. Note that the commonly applied formula for variance update
is the following [139]:
\[
\sigma_n^2 = \frac{n-2}{n-1}\,\sigma_{n-1}^2 + \frac{1}{n-1}(x_n - \bar{x}_n)^2 \qquad (5.15)
\]
which is not exactly correct. However, comparing Equation (5.14) with Equation (5.15),
the only difference is the term (\bar{x}_n - \bar{x}_{n-1})^2. This term can be neglected
when n is large, because the ratio of this term to the second term in Equation (5.14)
can be calculated as
\[
\frac{(\bar{x}_n - \bar{x}_{n-1})^2}{\frac{1}{n-1}(x_n - \bar{x}_n)^2} = \frac{1}{n-1} \qquad (5.16)
\]
which approaches zero when n is large. Therefore, Equation (5.15) can be used as a
good approximation to update the variance for large n. Detailed derivation of Equa-
tion (5.16) is given in Appendix D. It is worth noting that one drawback with this
approach is that n keeps increasing as more data becomes available and the influence
of the new data point diminishes with increased n.
For block-wise algorithms, the variable mean and variance can be updated using the
following equations:
\[
\bar{x}_n = \frac{n-n_1}{n}\,\bar{x}_{n-n_1} + \frac{n_1}{n}\,\bar{x}_{n,B} \qquad (5.17)
\]
\[
\sigma_n^2 = \frac{n-n_1-1}{n-1}\,\sigma_{n-n_1}^2 + \frac{n_1-1}{n-1}\,\sigma_{n,B}^2 + \frac{n_1(n-n_1)}{n(n-1)}\,(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \qquad (5.18)
\]
where n_1 is the number of samples in the new data block; \bar{x}_{n,B} and \sigma_{n,B}^2 are the new
data block mean and variance. The detailed derivation of Equations (5.17) and (5.18)
is given in Appendix E.
2. Moving window scaling and centering (MWSC):
In this case, since the oldest measurement in the window is removed with the newest one
added, the total number of measurements stays constant. One can recalculate the mean
or variance for the measurements in the window, or use the following approximation:
\[
\bar{x}_n = \frac{w-1}{w}\,\bar{x}_{n-1} + \frac{1}{w}\,x_n \qquad (5.19)
\]
\[
\sigma_n^2 = \frac{w-2}{w-1}\,\sigma_{n-1}^2 + \frac{1}{w-1}(x_n - \bar{x}_n)^2 \qquad (5.20)
\]
where w is the window width, which stays constant. In MWSC, the effect of a new
sample does not diminish as more data becomes available.
For block-wise algorithms with block size n_1, the approximation becomes:
\[
\bar{x}_n = \frac{w-1}{w}\,\bar{x}_{n-n_1} + \frac{1}{w}\,\bar{x}_{n,B} \qquad (5.21)
\]
\[
\sigma_n^2 = \frac{(w-1)n_1 - 1}{w n_1 - 1}\,\sigma_{n-n_1}^2 + \frac{n_1 - 1}{w n_1 - 1}\,\sigma_{n,B}^2 \qquad (5.22)
\]
where \bar{x}_{n,B} and \sigma_{n,B}^2 are the new data block mean and variance.
3. Forgetting factor scaling and centering (FFSC):
For each variable:
\[
\bar{x}_n = \lambda\,\bar{x}_{n-1} + (1-\lambda)\,x_n \qquad (5.23)
\]
\[
\sigma_n^2 = \lambda\,\sigma_{n-1}^2 + (1-\lambda)(x_n - \bar{x}_n)^2 \qquad (5.24)
\]
For block-wise algorithms:
\[
\bar{x}_n = \lambda\,\bar{x}_{n-n_1} + (1-\lambda)\,\bar{x}_{n,B} \qquad (5.25)
\]
\[
\sigma_n^2 = \lambda\,\sigma_{n-n_1}^2 + (1-\lambda)\,\sigma_{n,B}^2 \qquad (5.26)
\]
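The sample-wise recursions of Equations (5.13)-(5.14) and the block-wise recursions of Equations (5.17)-(5.18) are exact, so they can be checked against batch statistics on a small series. A sketch (function names are illustrative):

```python
import numpy as np

def rsc_update(mean, var, x, n):
    """Recursive scaling/centering, Eqs. (5.13)-(5.14); x is the n-th sample."""
    mean_new = (n - 1) / n * mean + x / n
    var_new = ((n - 2) / (n - 1) * var
               + (x - mean_new) ** 2 / (n - 1)
               + (mean_new - mean) ** 2)
    return mean_new, var_new

def rsc_block_update(mean, var, mean_B, var_B, n, n1):
    """Block-wise update, Eqs. (5.17)-(5.18); n = old + new sample count."""
    mean_new = (n - n1) / n * mean + n1 / n * mean_B
    var_new = ((n - n1 - 1) / (n - 1) * var
               + (n1 - 1) / (n - 1) * var_B
               + n1 * (n - n1) / (n * (n - 1)) * (mean - mean_B) ** 2)
    return mean_new, var_new

data = np.array([1.0, 2.0, 4.0, 5.0, 9.0])

# Sample-wise: feed the series one point at a time
m, v = data[0], 0.0
for n, x in enumerate(data[1:], start=2):
    m, v = rsc_update(m, v, x, n)
print(np.isclose(m, data.mean()), np.isclose(v, data.var(ddof=1)))

# Block-wise: combine stats of the first three samples with a block of two
m3, v3 = data[:3].mean(), data[:3].var(ddof=1)
mb, vb = data[3:].mean(), data[3:].var(ddof=1)
m2, v2 = rsc_block_update(m3, v3, mb, vb, n=5, n1=2)
print(np.isclose(m2, data.mean()), np.isclose(v2, data.var(ddof=1)))
```

The MWSC and FFSC variants are one-line modifications of `rsc_update`: MWSC replaces n by the fixed window width w (an approximation rather than an exact recursion), and FFSC is an exponentially weighted moving average with weight \lambda.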
In the following chapters, the properties of PLS are used to develop different soft sensors,
and the properties of the different methods reviewed in this chapter are discussed in detail.
Chapter 6
A Reduced-Order Dynamic PLS Soft Sensor Approach
6.1 Introduction
The pulping process, which converts wood chips into pulp by removing lignin from cellulose
fibers, is one of the most important operations in a pulp and paper mill [141]. Yearly
US wood pulp production is around 52 million tons, and the pulp and paper industry is an
important sector of US manufacturing [142, 143]. Kraft pulping is the most commonly
used chemical pulping process, which usually utilizes a continuous Kamyr digester. A Kamyr
digester is a complex vertical plug flow reactor where the wood chips react with an aqueous
solution of sodium hydroxide and sodium sulfide, also known as white liquor, at elevated
temperatures. Fig. 6.1 shows the schematic diagram of a single-vessel Kamyr digester with
three distinct zones. Pre-steamed wood chips and white liquor enter the impregnation zone
from the top of the digester, and chemicals in white liquor diffuse into the pores of the wood
chips in this zone. Subsequently in the cook zone, the temperature is raised through the
upper and lower heated circulation and most of the delignification reactions occur. Finally
in the wash zone, the countercurrent flow of dilute wash liquor quenches the reactions and
removes the unreacted chemicals. It is well known that the performance of the pulping pro-
cess is of paramount importance to maximize the pulp quality and yield, reduce the overall
operating cost including energy usage, and minimize the adverse environmental impacts of
pulp mills [142]. The most important quality variable in the pulping process is the Kappa
number, which represents the residual lignin content of the pulp [142]. It is desired to minimize
the variations of the Kappa number in the pulp product. For a continuous digester, process
variables are measured at different frequencies. Primary measurement, i.e., the pulp Kappa
Figure 6.1: Schematic of a high-yield single-vessel Kamyr digester
number, is measured at a low frequency (e.g., one sample every 2 hrs with 30 mins measure-
ment delay); while secondary measurements are measured at a higher frequency (e.g., one
per 6 to 10 minutes) which include white liquor flow rate, wood chip flow rate, the tempera-
ture and composition of upper circulation, lower circulation and extraction [144]. In order to
describe the dynamics of a continuous digester and develop effective control methods, first
principles or physical models have been developed to simulate the pulping process by ap-
proximating the digester with a series of continuous stirred tank reactors (CSTRs) [145, 146].
However, control of a continuous digester is challenging due to the following characteristics
of the pulping process: (1) long residence time or delay; (2) lack of real-time measurement
of the extent of reaction (i.e., Kappa number); (3) complex nonlinear dynamic behavior;
(4) unmeasured stochastic disturbances. Some of these challenges are also present in other
industrial processes such as distillation columns [46, 48].
To address the challenge that the primary variable (i.e., the Kappa number) is not
measured online but is required for feedback control, soft sensors (or inferential estimators)
have been developed in digester control to estimate the Kappa numbers based on the sec-
ondary measurements [147?153]. Soft sensors can be developed in the form of a Luenberger
observer [37] or a Kalman filter [38] using a first principles dynamic model of the process.
However, because chemical processes are generally quite complex to model and are char-
acterized by significant inherent nonlinearities, a rigorous theoretical modeling approach is
often impractical, requiring a great amount of effort [39]. Consequently, there has recently
been increasing interest in the development of data-driven soft sensors using multivariate
regression techniques [8, 9, 28, 39, 41-49]. Because data-driven soft sensors are easy
to develop and to implement online, they are potentially more attractive than stochastic fil-
ters or deterministic observers [39]. Artificial neural networks (ANN), principal component
regression (PCR) or principal component analysis (PCA), and partial least squares (PLS)
are widely used regression techniques, and their successful application to the development of
soft sensors has been reported for different processes [9, 47, 50?54]. Dynamic PCA (DPCA)
and Dynamic PLS (DPLS), the extensions of the conventional PCA and PLS algorithms
for dynamic modeling, were studied through the augmentation of the original input data
matrices with lagged measurements [29, 55]. Recursive PCA/PLS algorithms have also been
proposed to deal with time-varying characteristics of industrial processes [138, 154, 155].
Because the DPLS soft sensor is the focus of this work, some representative work in this
area is briefly reviewed.
Lakshminarayanan et al. [44] proposed applying DPLS to the identification and control
of multivariate systems. In [44], the linear and nonlinear DPLS models were integrated
into the model predictive control (MPC) framework. Kano et al. [48] compared steady-state,
static, and dynamic PLS-based soft sensors and found that the dynamic DPLS soft sensor
could greatly improve the estimation accuracy. Kano and others [48] also applied the DPLS
soft sensor to estimate the product compositions of a multicomponent distillation column
from online measured process variables such as tray temperatures, reflux flow rate, reboiler
heat duty and pressure. Zamprogna et al. [46] exploited DPCA and DPLS soft sensors to
estimate the product composition profiles in a simulated batch distillation process using
temperature measurements. It was also shown in [46] that linear DPLS performed at least
as well as nonlinear DPLS algorithms (polynomial, spline or ANN DPLS) in the batch
distillation process case study. More recently, DPLS soft sensors have been applied to several
industrial processes, including a specialty chemical batch process [56], a splitter and a crude
column [156], a dearomatization process [57], and a cement kiln system [58].
Despite many successful applications, one limitation associated with DPLS is that it
is usually viewed as an ad hoc method as there is a lack of theoretical understanding on
whether it can capture process dynamics adequately and whether it can provide unbiased
estimate for both open and closed-loop data. Another problem encountered when performing
DPLS is that the addition of a relatively large number of lagged values in the input data
matrix causes a substantial increase in the dimension of the matrix. This implies that
the computational load required for the development of the multivariate regression becomes
considerably heavier, and the regression procedure gets much lengthier. This is particularly
true for spline PLS and ANN PLS, which are based on computationally more intensive
algorithms. As a result, the model becomes more cumbersome to manipulate due to the
presence of a higher number of parameters [46]. As can be seen later in this work, when
process residence time is long, which is true for all continuous digesters, a large number of
lagged measurements and more training data are needed for DPLS to capture the process
dynamics and to provide satisfactory soft sensor performance.
To address these issues, in this work, a rigorous proof is derived to show that: (1) DPLS-
based soft sensor is a dynamic estimator that can adequately capture process dynamics; (2)
DPLS-based soft sensor can provide unbiased estimate of the primary variable for both open
loop and closed-loop data. Next, a reduced order DPLS (RO-DPLS) approach with optimally
selected lagged measurements of the secondary variables is proposed. The RO-DPLS soft
sensor not only reduces model size and improves prediction but also significantly increases
prediction horizon [63?66].
6.2 The Relationship of DPLS Soft Sensor and Subspace Identification
It has been widely assumed and accepted that DPLS-based soft sensors can capture
the process dynamics, and DPLS-based soft sensor approaches have been used rather
commonly in industrial applications. The basic idea behind the DPLS soft sensor, i.e.,
using augmented measurements (or "lagged data") to capture process dynamics, is not new
and has been widely used in system identification, such as subspace identification, where an
extended state-space model is used to derive how the system matrices are extracted. In addition,
this idea has been used in process monitoring where dynamic PCA is used to remove the
time dependence of the self-correlated and cross-correlated dynamic process data. However,
to the best of the authors' knowledge, there has not been any rigorous proof to show how the
process dynamics is captured by the static linear regression employed in the DPLS-based
soft sensor. Therefore, compared to Luenberger observer or Kalman filter, DPLS has been
commonly viewed as an ad hoc approach which uses lead-lag elements determined by cross-
validation to handle process dynamics [152]. In addition, it has been assumed that DPLS
soft sensor works for closed-loop data and successful applications exist, but there has not
been any proof that in theory DPLS-based soft sensor can provide unbiased estimate for
closed-loop data. In this work, a rigorous theoretical analysis is provided to show that for
linear time invariant (LTI) systems the DPLS soft sensor can adequately capture process
dynamics and provide unbiased estimate of the primary variable for both open loop and
closed-loop data.
In the following, the DPLS soft sensor formulation is derived using the subspace iden-
tification (subspace ID) framework. Then, based on the derived soft sensor formulation, it
is shown that unbiased estimates of the primary variable can be obtained from DPLS soft
sensor under both open loop and closed-loop operations.
In subspace ID, a stochastic linear system can be represented in the following innova-
tion form, which is utilized in several subspace ID algorithms such as MOESP [157] and
SIMPCA [158].
\[
\begin{aligned}
x_{k+1} &= A x_k + B u_k + K e_k^s \\
y_k^s &= C_s x_k + e_k^s \qquad (6.1) \\
y_k^c &= C_c x_k + e_k^c
\end{aligned}
\]
where x_k \in R^n and u_k \in R^m are the system state and input; y_k^c \in R^q and y_k^s \in R^l are the primary
and secondary outputs; the innovations e_k^s \in R^l and e_k^c \in R^q are white noise; A, B, C_s, C_c
are system matrices with appropriate dimensions; and K is the steady-state Kalman filter gain.
The system represented in (6.1) is assumed to be minimal in the sense that (A, C_s) is
observable and (A, [B, K]) is controllable; in addition, the eigenvalues of A - KC_s are
assumed to be strictly inside the unit circle. In other words, it is assumed that the secondary
variables provide enough information to describe the dynamics of the overall system. This
assumption is reasonable, because if the primary variables provide information that the
secondary variables do not contain, one cannot expect to estimate the primary variables
accurately using the secondary variables. Without loss of generality, the following linear
relationship between the two innovation terms is assumed: e_k^c = F e_k^s, which can be justified
with the assumption that (A, C_s) is observable, as shown in Appendix A.
In order to derive the dynamic soft sensor formulation, the innovation form is trans-
formed into its predictor form, as used in [159].
\[
\begin{aligned}
x_{k+1} &= A_K x_k + B_K z_k \\
y_k^s &= C_s x_k + e_k^s \qquad (6.2) \\
y_k^c &= C_c x_k + e_k^c
\end{aligned}
\]
where z_k = \left[ u_k^T,\; (y_k^s)^T \right]^T is the augmented process input and secondary output, and
A_K = A - KC_s and B_K = [B,\; K] are the predictor matrices. The predictor form has a
numerical advantage for identifying both stable and unstable processes, because A_K = A - KC_s is
guaranteed to be stable even though the process matrix A may be unstable, while the
innovation form may lead to ill-conditioning for unstable processes [159].
By iterating the state equation in Eqn.(6.2), it is straightforward to derive the extended
state equation,
\[
x_k = \bar{L}_p z_p(k) + A_K^p x_{k-p} \approx \bar{L}_p z_p(k) \quad \text{for large } p \qquad (6.3)
\]
where \bar{L}_p = \left[ B_K,\; A_K B_K,\; \cdots,\; A_K^{p-1} B_K \right], the subscript p denotes the past horizon, and
the past data vector z_p(k) is defined as
\[
z_p(k) = \begin{bmatrix} z_{k-1} \\ z_{k-2} \\ \vdots \\ z_{k-p} \end{bmatrix} \qquad (6.4)
\]
Similarly, based on the state-space description in Eqn.(6.2), the extended output equation
can be obtained
\[
y_f^s(k) = \bar{\Gamma}_f^s \left( \bar{L}_p z_p(k) + A_K^p x_{k-p} \right) + \bar{G}_f^s z_{f-1}(k) + e_f(k) \qquad (6.5)
\]
\[
y_f^c(k) = \bar{\Gamma}_f^c \left( \bar{L}_p z_p(k) + A_K^p x_{k-p} \right) + \bar{G}_f^c z_{f-1}(k) + e_f^c(k) \qquad (6.6)
\]
where the subscript f denotes the future horizon; \bar{\Gamma}_f^i and \bar{G}_f^i (i = \{s, c\}) are the extended
observability matrix and Toeplitz matrix corresponding to the secondary and primary outputs;
the bar indicates that these matrices are constructed with the predictor matrices
instead of the original system matrices:
\[
\bar{\Gamma}_f^i = \begin{bmatrix} C_i \\ C_i A_K \\ \vdots \\ C_i A_K^{f-1} \end{bmatrix}, \qquad
\bar{G}_f^i = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ C_i B_K & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ C_i A_K^{f-2} B_K & C_i A_K^{f-3} B_K & \cdots & C_i B_K \end{bmatrix}
\]
The future data y_f^i(k) (i = \{s, c\}) and z_{f-1}(k) are defined as the following:
\[
y_f^i(k) = \begin{bmatrix} y_k^i \\ y_{k+1}^i \\ \vdots \\ y_{k+f-1}^i \end{bmatrix}, \qquad
z_{f-1}(k) = \begin{bmatrix} z_k \\ z_{k+1} \\ \vdots \\ z_{k+f-2} \end{bmatrix}
\]
When p is large, Eqns. (6.5) and (6.6) reduce to
\[
y_f^s(k) = \bar{\Gamma}_f^s \bar{L}_p z_p(k) + \bar{G}_f^s z_{f-1}(k) + e_f(k) \qquad (6.7)
\]
\[
y_f^c(k) = \bar{\Gamma}_f^c \bar{L}_p z_p(k) + \bar{G}_f^c z_{f-1}(k) + e_f^c(k) \qquad (6.8)
\]
It is worth noting that Eqn. (6.7) or (6.8) serves as the foundation for subspace ID, and
usually p \geq f > n is required to facilitate the estimation of the system matrices [158].
Next the main conclusion on the DPLS soft sensor is stated in the following theorem:
Theorem 1. For a linear time invariant (LTI) system as shown in Eqn. (6.1), there exists
a linear regression, with the primary output as the response and the past input and secondary
output as the regressor, that can provide unbiased estimates of the primary output for both open
loop and closed-loop operations.
Proof. First, note that Eqn.(6.8) not only captures the process dynamics, but also holds for
both open loop and closed-loop cases, as it is the extended state-space representation of the
LTI system defined by Eqn. (6.1).
Eqn. (6.8) can be reformulated into
\[
y_f^c(k) = \left[ \bar{\Gamma}_f^c \bar{L}_p \quad \bar{G}_f^c \right] \begin{bmatrix} z_p(k) \\ z_{f-1}(k) \end{bmatrix} + e_f^c(k) \qquad (6.9)
\]
which can serve as a soft sensor model to predict y_f^c(k) from z_p(k) and z_{f-1}(k). Eqn. (6.9)
indicates that if and only if the future innovation e_f^c(k) is uncorrelated with the regressor
\begin{bmatrix} z_p(k) \\ z_{f-1}(k) \end{bmatrix}, i.e.,
\[
E\left\{ e_f^c(k) \begin{bmatrix} z_p(k) \\ z_{f-1}(k) \end{bmatrix}^T \right\} = 0 \qquad (6.10)
\]
an unbiased prediction of y_f^c(k) can be obtained from the model given in Eqn. (6.9), i.e.,
\[
\hat{y}_f^c(k) = \left[ \bar{\Gamma}_f^c \bar{L}_p \quad \bar{G}_f^c \right] \begin{bmatrix} z_p(k) \\ z_{f-1}(k) \end{bmatrix} \qquad (6.11)
\]
Note that condition (6.10) always holds for open loop cases: because the future innovation
is white, without any feedback the future innovation e_f^c is independent of both the
past data z_p and the future data z_{f-1}. However, condition (6.10) does not hold for closed-loop
cases when f > 1 [160-164]. This is because under feedback control the future input is
correlated with the past innovation, i.e., E\{(e_k^c)^T u_{k+i}\} \neq 0 for all positive integers i. As
a result, E\{e_f^c(k) z_{f-1}(k)^T\} \neq 0, although E\{e_f^c(k) z_p(k)^T\} = 0, as the future innovation
is still independent of the past data. The detail of the matrix E\left\{ e_f^c(k) \begin{bmatrix} z_p(k) \\ z_{f-1}(k) \end{bmatrix}^T \right\} is
given in Appendix B, which shows that the lower triangular part of the matrix is non-zero.
When f = 1, z_{f-1}(k) is eliminated and Eqn. (6.9) reduces to
\[
y_k^c = \bar{\Gamma}_f^c \bar{L}_p z_p(k) + e_k^c \qquad (6.12)
\]
As the current innovation e_k^c is uncorrelated with past data for both open loop and closed loop,
i.e., E[e_k^c (z_p)^T] = 0, the soft sensor model obtained based on Eqn. (6.12) provides an unbiased
estimate of y_k^c, which is the one-step-ahead prediction, whether or not the primary variable is
used for feedback control. It is worth noting that in Eqn. (6.12) only the process inputs
(u) and secondary outputs (y^s) are included in the regressor.
To address the ill-conditioning problem caused by the correlated regressors (both between
the same variables at different time instants, and between different variables), partial
least squares (PLS) can be applied to estimate \left( \bar{\Gamma}_f^c \bar{L}_p \right) for the soft sensor model, which is
the commonly used DPLS soft sensor because of the way z_p(k) is constructed (see Eqn. (6.4)).
The above analysis clearly shows that both subspace ID and the DPLS soft sensor are derived
from the same extended state-space model (i.e., Eqn. (6.6)), with the DPLS soft sensor
being the special case f = 1. Therefore, the DPLS soft sensor is theoretically a dynamic
estimator that gives unbiased estimates for both open loop and closed-loop data.
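Theorem 1 with f = 1 can be illustrated numerically on a scalar LTI system in innovation form: regressing y^c_k on the lagged inputs and secondary outputs in z_p(k) yields essentially unbiased one-step predictions whose residual is close to the innovation e^c_k. All system values below are made-up illustrative numbers, and plain least squares stands in for PLS since the toy regressor is well conditioned.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 3000, 10                              # samples and past horizon (illustrative)
a_sys, b_sys, k_gain, F = 0.8, 0.5, 0.3, 0.5  # scalar A, B, K, and e^c = F e^s
e = 0.1 * rng.standard_normal(N)             # white innovation e^s_k
u = rng.standard_normal(N)                   # open-loop (exogenous) input
x = np.zeros(N); ys = np.zeros(N); yc = np.zeros(N)
for k in range(N):
    ys[k] = x[k] + e[k]                      # y^s_k = C_s x_k + e^s_k, C_s = 1
    yc[k] = 2.0 * x[k] + F * e[k]            # y^c_k = C_c x_k + e^c_k, C_c = 2
    if k + 1 < N:
        x[k + 1] = a_sys * x[k] + b_sys * u[k] + k_gain * e[k]

# Static regression of y^c_k on z_p(k) = [u_{k-1}, ys_{k-1}, ..., u_{k-p}, ys_{k-p}]
idx = np.arange(p, N)
cols = []
for j in range(1, p + 1):
    cols.append(u[idx - j])
    cols.append(ys[idx - j])
Z = np.column_stack(cols)
theta, *_ = np.linalg.lstsq(Z, yc[idx], rcond=None)
resid = yc[idx] - Z @ theta
bias, r2 = resid.mean(), 1.0 - resid.var() / yc[idx].var()
print(bias, r2)
```

Here A_K = A − KC_s = 0.5, so A_K^p is negligible for p = 10 and the truncation in Eqn. (6.3) introduces essentially no bias; the residual is dominated by the unpredictable innovation e^c_k.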
Remark. It is worth noting that the purpose of the soft sensor model in this work is to predict
the Kappa number, instead of estimating the system matrices A, B and Cs. Therefore, the
system does not have to be persistently excited; instead, only the observable part of the
process needs to be captured so that under a similar operation condition, the Kappa number
can be estimated from the secondary variables accurately. The purpose of the Theorem is
to show that by applying DPLS, one is able to construct a soft sensor that can adequately
capture the process dynamics under the given operation condition and provide unbiased
estimate of the primary variable, provided that enough secondary variables are used in the
regression.
Remark. Ku et al. [29] extended PCA to DPCA in order to monitor a dynamic process by
using the augmented measurements of process variables as the new variables, which is also
the basic idea behind DPLS. The fundamental reason that such augmentation can enable
static PCA and PLS to capture process dynamics can be explained by the extended state-
space model, which is the foundation of subspace identification. This was also pointed out
in [29]. However, in [29], the emphasis is on process monitoring, and on providing a heuristic
explanation on how monitoring lagged data can remove the time dependence of the self-
correlated and cross-correlated dynamic process data, rather than providing a generalized
proof/derivation on how process dynamics is captured by a static linear relationship between
lagged variables. In this work, one major contribution is to derive an exact static linear
regression between the augmented secondary variables and primary variables based on the
state-space representation of an LTI system, and to show that such a model provides unbiased
prediction of the primary variables under both open loop and closed-loop operations.
Remark. During the model building stage, measurements of both primary and secondary
variables collected at the same frequency are needed to estimate the model, i.e., \bar{\Gamma}_f^c \bar{L}_p.
Note that high frequency measurements of the primary variables are not required.
In this work, the secondary variable measurements are down-sampled to obtain the same
frequency for both primary and secondary variables in the industrial case study. During
the model prediction/application phase, the secondary variables are not down-sampled and the
Kappa numbers are predicted at the same higher frequency as the secondary variables. If
zero or first order hold were implemented for the Kappa number, only the Kappa numbers
that were actually measured should be used for model building.
Remark. In the traditional subspace ID methods such as MOESP [157] and N4SID [165],
the innovation form instead of the predictor form is used. The extended state-space form
using the Hankel data matrix is given below:
\[
Y_f = \Gamma_f X_k + H_f U_f + E_f \qquad (6.13)
\]
One of the first steps in both methods is to eliminate the contribution from the future input
(U_f) to the future output (Y_f). In MOESP, this is done by projecting out the future input
through post-multiplying Eqn. (6.13) by the orthogonal complement of the future input,
i.e., \Pi_{U_f}^{\perp} = I - U_f^T (U_f U_f^T)^{-1} U_f; in N4SID, it uses an oblique projection to remove the
contribution from both the future input (U_f) and the innovation (E_f), by projecting Eqn. (6.13)
onto the past data along the future input. In order to obtain unbiased estimates of the system
matrices, both methods require that the future innovation term (E_f) be uncorrelated with
the future input (U_f), i.e., \lim_{n \to \infty} \frac{1}{n}\{E_f U_f^T\} = 0, which is the fundamental reason that these
traditional methods only work for open loop cases, as this condition is true for open loop
only. For closed-loop data, due to feedback, the future input is correlated with the past
innovation. Similar to what we have shown in Eqn. (B.2), the lower triangular part of the
matrix \lim_{n \to \infty} \frac{1}{n}\{E_f U_f^T\} is non-zero due to feedback control. Note that for subspace ID, in
order to estimate the system matrices, it is usually required that the future horizon f > n - 1.
Therefore, the traditional subspace ID methods cannot provide unbiased estimates using
closed-loop data [163].
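The MOESP projection can be verified numerically in a few lines; the dimensions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
Uf = rng.standard_normal((4, 200))   # future-input data matrix (block rows x samples)
# Orthogonal complement projector: Pi = I - Uf^T (Uf Uf^T)^{-1} Uf
Pi = np.eye(200) - Uf.T @ np.linalg.solve(Uf @ Uf.T, Uf)

# Post-multiplying by Pi annihilates anything lying in the row space of Uf:
Yf = rng.standard_normal((3, 4)) @ Uf + rng.standard_normal((3, 200))
print(np.allclose(Uf @ Pi, 0.0, atol=1e-8),
      np.allclose((Yf @ Pi) @ Uf.T, 0.0, atol=1e-6))
```

After this projection, only the terms of Eqn. (6.13) that are orthogonal to the rows of U_f survive, which is what allows the extended observability matrix to be extracted in open loop.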
Remark. In practice, most processes are time-varying instead of LTI, which is difficult to
address under the subspace ID framework. However, this can be addressed by adaptive or
recursive update of the soft sensor. In other words, a sequence of LTI systems is used to
approximate a time-varying and possibly nonlinear system. The details on how to effectively
update the soft sensor are presented in Chapter 7.
6.3 Reduced Order DPLS Soft Sensor
For some chemical processes, the process orders are high due to transport delay, strong
process dynamics, and nonlinearity. For example, for pulp digesters, n = 25 and n = 50 have
been reported in [152] and [166] respectively. In addition, in order for the contribution from
the initial state to be negligible, the past horizon is usually much larger than the process
order. Therefore, a large number of regressor variables (including their lagged values) would
be included in the soft sensor if it is constructed as the traditional DPLS soft sensor following
Eqn.(6.12). The potential issue with this is that model building and maintenance would be
more difficult and the model tends to be slow in response to process changes. In addition,
DPLS can only provide one-step-ahead predictions.
To address these issues, in this work, a reduced order DPLS (RO-DPLS) approach is
proposed. This approach minimizes the number of regressor variables by taking the process
characteristics into account. For the digester control problem, one reason for the high system
order is the long transport delay. If the delay time can be identified and eliminated by
time shifting the measurements, the system order and the past horizon can be effectively
reduced. In digester control, because the wood chips flow from the top to the bottom of
the digester, the delays associated with the measurements collected at different locations
are different, e.g., the delay associated with the upper circulation would be different from
the delay associated with the extraction. To illustrate this point, the extended Purdue
model [145, 146] is implemented, which simulates the pulping process by approximating the
digester with a series of continuous stirred tank reactors (CSTRs). As shown in Figure 6.2,
after a disturbance in wood chip composition was introduced at 0 hour, the effect of the
disturbance shows up in the Kappa numbers with different delays at different locations
indicated by the indices of the CSTRs. Fig. 6.2 indicates that among all the variables
included in the past data vector zp, only the ones corresponding to their associated delays
contribute the most to the final Kappa number. Therefore, instead of using all elements in
zp(k) to predict the pulp Kappa number, based on the principle of parsimony, only those
that contribute the most will be selected to build the dynamic sensor. To illustrate this
point, denote u_k = [u_1(k), \cdots, u_m(k)]^T and y_k^s = [y_1(k), \cdots, y_l(k)]^T; the past data
vector z_p is then reorganized into the following format and compared with the reduced
regressor containing only the selected variables, denoted z_p^*(k):
\[
z_p(k) = \begin{bmatrix}
u_1(k-1) \\ \vdots \\ u_1(k-p) \\
u_2(k-1) \\ \vdots \\ u_2(k-p) \\
\vdots \\
u_m(k-1) \\ \vdots \\ u_m(k-p) \\
y_1(k-1) \\ \vdots \\ y_1(k-p) \\
\vdots \\
y_l(k-1) \\ \vdots \\ y_l(k-p)
\end{bmatrix}, \qquad
z_p^*(k) = \begin{bmatrix}
u_1(k-d_{u_1}) \\ \vdots \\ u_1(k-d_{u_1}-p^*) \\
u_2(k-d_{u_2}) \\ \vdots \\ u_2(k-d_{u_2}-p^*) \\
\vdots \\
u_m(k-d_{u_m}) \\ \vdots \\ u_m(k-d_{u_m}-p^*) \\
y_1(k-d_{y_1}) \\ \vdots \\ y_1(k-d_{y_1}-p^*) \\
\vdots \\
y_l(k-d_{y_l}) \\ \vdots \\ y_l(k-d_{y_l}-p^*)
\end{bmatrix}
\]
where d_{u_i} denotes the delay difference between the i-th input and the product Kappa number,
and d_{y_j} denotes the delay difference between the j-th secondary output and the product Kappa
number. Let d_{max} = \max\{d_{u_i}, d_{y_j}\} and d_{min} = \min\{d_{u_i}, d_{y_j}\} (i \in [1, \cdots, m], j \in [1, \cdots, l]).
It is clear that p \geq d_{max} + p^*, and in most cases p \gg p^*. Therefore, by incorporating the
process delay information, RO-DPLS effectively reduces the dimension of the regressor. Another
big advantage associated with RO-DPLS is that a d_{min}-step-ahead prediction can be obtained
directly, instead of the one-step-ahead prediction given by regular DPLS in Eqn. (6.12). It
is worth noting that multiple-step-ahead prediction depends on process characteristics. Only
one-step-ahead prediction is possible if d_{min} = 1. However, for most applications in practice
that require soft sensor approaches to estimate process quality variables, such as digesters
and distillation columns, d_{min} \gg 1. It is worth noting that because the goal of a soft sensor is
Figure 6.2: Different delays associated with variables at different locations. Subscripts in the
legend are the indices of the CSTRs used to simulate the digester, where 1 and 50 correspond
to the top and bottom of the digester, respectively. Refer to Section 6.5 for the simulator
details.
to obtain an accurate estimate of the primary output rather than the system matrices, the past
horizon p̄ associated with each variable can be different, which provides additional flexibility
compared to subspace ID methods. In addition, it is possible to reduce the number of sampling
points within the past horizon so as to further reduce the dimension of the regressor.
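As a concrete illustration, the reduced regressor z̄p(k) above can be assembled from recorded input and secondary-output series as follows. This is a minimal sketch rather than code from this work: the function name is ours, and a single common past horizon p̄ is assumed for all variables (per-variable horizons, as noted above, are equally possible).

```python
import numpy as np

def build_ro_regressor(U, Y, d_u, d_y, p_bar):
    """Assemble the RO-DPLS regressor rows for every feasible k.

    U: (N, m) array of inputs; Y: (N, l) array of secondary outputs.
    d_u, d_y: per-variable delay differences to the primary output.
    p_bar: reduced past horizon (one common value here, for simplicity).
    Each variable contributes the lags k-d, ..., k-d-p_bar.
    """
    N = U.shape[0]
    k0 = max(list(d_u) + list(d_y)) + p_bar  # first k with a full history
    rows = []
    for k in range(k0, N):
        feats = []
        for i, d in enumerate(d_u):
            feats.extend(U[k - d - j, i] for j in range(p_bar + 1))
        for i, d in enumerate(d_y):
            feats.extend(Y[k - d - j, i] for j in range(p_bar + 1))
        rows.append(feats)
    return np.array(rows)
```

Each row can then be paired with the corresponding primary output measurement to form the regressor and response matrices used by PLS.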
6.4 Implementation of the RO-DPLS Soft Sensor
The implementation of the RO-DPLS soft sensor is illustrated in Figure 6.3, which
consists of five steps: variable selection, variable delay estimation, variable past horizon
estimation, construction of the regressor and response matrices, and application of PLS to the
constructed matrices. As the last two steps are common to all DPLS-based soft sensors,
the details of the first three steps are discussed below.
Variable Selection → Variable Delay Estimation → Variable Past Horizon Estimation →
Regressor and Response Matrices Construction → PLS
Figure 6.3: Flow diagram of the proposed RO-DPLS approach
6.4.1 Regressor Variable Selection
Similar to DPLS, one potential issue with the RO-DPLS soft sensor approach is the
selection of the secondary variables. Inappropriate selection of secondary variables as soft
sensor inputs may lead to numerical problems, such as singularity and over-parameterization,
or may significantly reduce the estimation accuracy [39, 48]. It is well known that
satisfactory soft sensor performance is most likely to be achieved if only those secondary
variables that are most sensitive to the primary variables are employed. The selection can be
done based on process knowledge [167], iterative selection methods [168], sensitivity analysis
[169–171] and others [39, 152]. A brief review of these methods is given below.
Joseph and Brosilow [168] proposed an iterative method to select the optimal set of
secondary variables to be used as soft sensor inputs for a distillation column. The method
adds secondary variables to the optimal set one at a time. The
procedure is repeated until satisfactory estimation accuracy is obtained, or until all variables
have been included. Tolliver and McCune [169] suggested that the optimal temperature
locations (secondary measurements) in a distillation column be determined by evaluating
the column sensitivity to the material balance. Another soft sensor input variable selection
approach is based on the application of singular value decomposition (SVD) to the sensitivity
gain matrix [170, 171]. Amirthalingam and Lee suggested that variable selection should
be based on the achievable nominal performance [152]. In their approach, a closed-loop
performance index is established, and the achievable values for all candidate variables are
computed and ranked. More recently, Zamprogna et al. proposed a variable selection method
based on a PCA sensitivity analysis [39]. All these methods have their pros and cons. In
this work, Joseph and Brosilow's approach is adopted due to its simplicity. Because process
knowledge helps to remove most of the insignificant variables, the iterative procedure is not
time consuming.
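Joseph and Brosilow's iterative procedure amounts to a greedy forward search. A hedged sketch is given below: ordinary least squares stands in for the soft-sensor model, and the stopping tolerance and function name are our assumptions rather than part of the original method.

```python
import numpy as np

def forward_select(X, y, max_vars=5, tol=1e-4):
    """Greedy forward selection of secondary variables: at each pass, add
    the candidate that most reduces the (training) mean squared estimation
    error; stop when the improvement falls below tol or max_vars is hit."""
    n, nvar = X.shape
    selected, best_mse = [], float(np.var(y))
    while len(selected) < max_vars:
        trial = None
        for j in range(nvar):
            if j in selected:
                continue
            A = np.column_stack([X[:, selected + [j]], np.ones(n)])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            mse = float(np.mean((y - A @ coef) ** 2))
            if trial is None or mse < trial[1]:
                trial = (j, mse)
        if trial is None or best_mse - trial[1] < tol:
            break
        selected.append(trial[0])
        best_mse = trial[1]
    return selected, best_mse
```

In practice, process knowledge first prunes the candidate list, so only a handful of passes through the loop are needed.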
6.4.2 Variable Delay Estimation
One key step of the proposed RO-DPLS soft sensor is the estimation of the delays asso-
ciated with different variables. For the continuous digester, because wood chips flow from
top to bottom, the transport delays associated with the measurements collected at different
locations of the digester are different. By identifying the delay time associated with each
of the streams in the digester, the system order can be effectively reduced, and therefore the
past horizon required to predict the Kappa number is reduced.
To estimate the delay time associated with each stream of the digester (i.e., upper
heater, lower heater, extraction), the major source of disturbance in the continuous pulping
process is considered: wood composition variations. For the simulated case, a step change
can be introduced in the wood composition, and the delays of the secondary variables for the
different streams are estimated using the methods reported in [172, 173] by approximating the
responses with second-order-plus-time-delay (SOPTD) systems. For industrial digesters,
the variable delays can be estimated based on the wood chip flow rate and the physical
dimensions of the digesters, e.g., the diameters and heights of different zones and the physical
locations of different sensors. A similar variable delay estimation strategy has been reported
in [174].
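As a rough, data-driven alternative to the SOPTD step-response fits of [172, 173], the delay of a secondary variable relative to a recorded disturbance can be estimated from the lag of maximum cross-correlation. The sketch below is ours and is only a coarse stand-in for the cited methods.

```python
import numpy as np

def estimate_delay(d, y, max_lag):
    """Delay of y relative to disturbance record d, taken as the lag
    maximizing the absolute cross-correlation between the two series."""
    d = d - d.mean()
    y = y - y.mean()
    scores = []
    for lag in range(max_lag + 1):
        if lag == 0:
            scores.append(abs(np.dot(d, y)))
        else:
            scores.append(abs(np.dot(d[:-lag], y[lag:])))
    return int(np.argmax(scores))
```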
6.4.3 Variable Past Horizon Estimation
To extract the dynamic relations between the primary output and the secondary outputs,
one has to determine the necessary past horizon (p for DPLS or p̄ for RO-DPLS). For linear
systems, this can be determined based on the order of the dynamic system. For nonlinear
systems, the past horizon may need to be longer in order to obtain a better linear approximation
of the actual nonlinear relationships [29]. Two different approaches can be adopted to select the
past horizon. One approach is based on [29], where PCA is performed on the data matrix
zp(k) or z̄p(k), so that an implicit multivariate autoregressive (AR) model (an ARX model if the
process inputs are included) is extracted directly from the data. The Q statistic obtained
from PCA is then the squared prediction error of the ARX model. If a long enough past horizon,
p or p̄, is included in the data matrix, the Q statistic is statistically independent from one
time instant to the next. The other approach is based on [124], where autoregressive models
with different p or p̄ values are fitted to the training data. The past horizon is selected to
be the p or p̄ value that minimizes the small-sample AIC criterion [175]. In this work, the
second approach is used.
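The second approach can be sketched as follows: fit AR(p) models by least squares over a range of p and keep the p that minimizes the corrected (small-sample) AIC. The AR fit and the correction term are standard, but the function names are ours and a scalar series is assumed for brevity.

```python
import numpy as np

def aicc_ar(y, p):
    """Least-squares AR(p) fit; returns the small-sample (corrected) AIC."""
    N = len(y) - p
    # column i holds the lag-(i+1) values y(k-i-1) for k = p, ..., len(y)-1
    A = np.column_stack([y[p - i - 1:p - i - 1 + N] for i in range(p)]
                        + [np.ones(N)])
    b = y[p:]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    rss = np.sum((b - A @ coef) ** 2)
    k = p + 1                                   # AR coefficients + intercept
    aic = N * np.log(rss / N) + 2 * k
    return aic + 2 * k * (k + 1) / (N - k - 1)  # small-sample correction

def select_past_horizon(y, p_max=10):
    """Past horizon p minimizing the corrected AIC over 1..p_max."""
    return min(range(1, p_max + 1), key=lambda p: aicc_ar(y, p))
```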
6.5 Application to a Simulated Continuous Digester
In this section, the performance of the proposed RO-DPLS soft sensor approach will
be evaluated using a high-yield digester simulated with the extended Purdue model, and
compared with the traditional DPLS approach.
6.5.1 Simulator
The extended Purdue model developed in [146] was implemented as the ?plant? for sim-
ulation purposes. The single-vessel high-yield digester as shown in Fig. 6.1 is approximated
103
by 50 CSTRs in series, which results in 950 nonlinear ordinary differential equations (ODEs).
For this case study, the following process variables are employed:
• Output: The primary output is the product Kappa number; the secondary outputs
include the effective alkali (EA), dissolved lignin (DL) and free liquor temperature
(T) of the upper recirculation, lower recirculation and extraction flows. Both primary and
secondary outputs are measured every 6 minutes during the simulation. White noise
was added to both primary and secondary measurements during all plant simulations,
with variances listed in Table 6.1. Note that only EA and DL were used as regressor
variables in [152], but we also include T because our experience shows that temperature
contributions cannot be neglected in the DPLS and RO-DPLS approaches. Also note
that besides the variables mentioned above, there are many other variables that are
usually measured in practice, including flow rates of different streams, concentrations
of other liquid phase compositions, etc. However, for soft sensor construction in this
work, their contributions are negligible.
Table 6.1: Variances of the white measurement noise for the primary and secondary outputs
Output Variance
EA 0.015
DL 0.015
T 0.05
Kappa # 0.05
• Manipulated input: Although the upper heater temperature, lower heater temperature
and white liquor flow rate can all be adjusted in the real process, the lower heater tem-
perature was chosen as the sole manipulated variable based on the results in [152].
• Unmeasured disturbance: Three different types of unmeasured disturbances are
considered in this work, namely the entering white liquor EA concentration, the upper and
Table 6.2: Selected regressor variables
Variable Description Location/Stream CSTR index
EA12 Free liquor effective alkali concentration Upper cooking 12
DL12 Free liquor dissolved lignin concentration Upper cooking 12
T12 Free liquor temperature Upper cooking 12
TU Upper heater exit temperature Upper heater exit -
EA18 Free liquor effective alkali concentration Lower cooking 18
DL18 Free liquor dissolved lignin concentration Lower cooking 18
T18 Free liquor temperature Lower cooking 18
TL Lower heater exit temperature Lower heater exit -
EA35 Free liquor effective alkali concentration Extraction 35
DL35 Free liquor dissolved lignin concentration Extraction 35
T35 Free liquor temperature Extraction 35
lower heater temperatures, and the wood compositions. The disturbances can be introduced
to the simulator either separately or simultaneously, and both cases are studied
in this work.
6.5.2 Soft Sensors Setup
Following the procedures presented in Section 6.4, for both DPLS and RO-DPLS, the
same set of variables is selected to build the soft sensors, as listed in Table 6.2.
The RO-DPLS parameters associated with the selected variables, including the delay difference
from the product Kappa number (di) and the past horizon (p̄i), are listed in Table 6.3. For
DPLS, the past horizon p was chosen to be 34, which is the optimal number based on the
AIC criterion. For both DPLS and RO-DPLS, the number of principal components (PCs)
is determined by 4-contiguous-block cross-validation. In this work, the same settings are
applied to both open loop and closed-loop case studies.
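The 4-contiguous-block cross-validation can be sketched as follows. To keep the example self-contained, principal component regression stands in for PLS; the point being illustrated is the use of contiguous held-out blocks (rather than random folds, which would break the time ordering of process data).

```python
import numpy as np

def block_cv_mse(X, y, fit_predict, n_blocks=4):
    """Average MSE over contiguous held-out blocks (preserves time order)."""
    idx = np.array_split(np.arange(len(y)), n_blocks)
    mses = []
    for test_idx in idx:
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = fit_predict(X[train], y[train], X[test_idx])
        mses.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(mses))

def pcr_fit_predict(ncomp):
    """Principal component regression with ncomp components,
    standing in for a PLS fit to keep the sketch short."""
    def fp(Xtr, ytr, Xte):
        xm, ym = Xtr.mean(axis=0), ytr.mean()
        _, _, Vt = np.linalg.svd(Xtr - xm, full_matrices=False)
        V = Vt[:ncomp].T                  # leading loading directions
        T = (Xtr - xm) @ V                # scores
        b = np.linalg.lstsq(T, ytr - ym, rcond=None)[0]
        return ym + (Xte - xm) @ V @ b
    return fp
```

The number of components retained is then the candidate value minimizing `block_cv_mse`.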
6.5.3 Open Loop Case Studies
Open loop simulations were conducted first. During the simulations, the liquor measurements
and the Kappa number measurements were collected at the same frequency, i.e., every
Table 6.3: Dynamic soft sensor setup
Variable p̄i di
EA12 3 17
DL12 3 17
T12 3 17
TU 3 17
EA18 5 16
DL18 5 16
T18 5 16
TL 5 16
EA35 6 14
DL35 6 14
T35 5 14
Table 6.4: Variances of the white noise for the unmeasured disturbances
Disturbance Variance
ρs1 0.0003
ρs2 0.0007
ρs3 0.007
ρs4 0.0001
ρs5 0.002
6 minutes. To evaluate the performance of RO-DPLS, four different types of disturbances
are introduced:
Case 1: Integrated white noise with variance 0.0009 in the entering white liquor EA concentration.
Case 2: Integrated white noise with variance 0.005 in the upper and lower heater temperatures.
Case 3: Integrated white noises with the variances listed in Table 6.4 in the five wood
compositions (ρsi, i = 1, 2, ..., 5) at the inlet.
Case 4: The combination of the above three cases.
Note that the parameters listed in Tables 6.1 and 6.4 are the same as those in [166]. For
each case, 80 hours (800 samples) of simulated data were used for training; 700 hours (7000
samples) of simulated data were used for testing. To assess the performance of the different
soft sensors, three statistics are used, namely the maximum relative estimation error (εmax), the
mean prediction error (MPE) and the mean squared prediction error (MSPE), as defined below.
εmax = max_i | (yi − ŷi) / yi | (6.14)

MPE = (1/N) Σ_{i=1}^{N} (yi − ŷi) (6.15)

MSPE = (1/N) Σ_{i=1}^{N} (yi − ŷi)^2 (6.16)
where yi and ŷi are the measured and predicted Kappa numbers at the digester exit (i.e.,
the 50th CSTR), respectively.
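The three statistics translate directly into code; a small sketch (the function name is ours):

```python
import numpy as np

def soft_sensor_metrics(y, y_hat):
    """Maximum relative estimation error, mean prediction error and mean
    squared prediction error, as in Eqns. (6.14)-(6.16)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    return {
        "eps_max": float(np.max(np.abs(err / y))),
        "MPE": float(np.mean(err)),
        "MSPE": float(np.mean(err ** 2)),
    }
```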
The performances of the DPLS and RO-DPLS soft sensors for the four different cases are
summarized in Table 6.5. The results indicate that both methods can satisfactorily predict
the Kappa number, which is also confirmed by Fig. 6.4, where the predictions of both
methods are compared to the measurements for Case 4. In other words, the case studies
demonstrate our theoretical result that DPLS can adequately capture process dynamics if
a long enough past horizon is included in the model. It can also be seen that the proposed
RO-DPLS soft sensor performed better than DPLS in all cases. The advantage of RO-DPLS
over DPLS was most pronounced in the combined disturbance case (i.e., Case 4), which is
probably the most realistic in an actual pulp digester. More importantly, RO-DPLS predicts
the Kappa number 14 steps
ahead, while DPLS only provides one-step-ahead predictions. It is worth noting that if the
number of training samples is increased 10-fold to 800 hours (8000 samples), the
performance of DPLS would be similar to that of RO-DPLS. However, the requirement of
a large amount of training data makes model building and maintenance more difficult. For example, the
industrial digester data used in this work were collected over a year. After data preprocessing
(i.e., eliminating data during digester down time, sensor failure, and zero/first-order hold
values), only 1550 samples were left, which is equivalent to 155 hours of simulation. Also,
a model built on a large amount of training data would be slow to capture process changes when
implemented online with the corresponding recursive algorithm. In addition, it does not
change the fact that the RO-DPLS prediction is 13 steps ahead of DPLS, which is beneficial for
controller design. The aspects of recursive implementation and controller design are revisited
in Chapter 7.
Table 6.5: Comparison of DPLS and RO-DPLS: open loop cases
Case Case 1 Case 2 Case 3 Case 4
εmax
DPLS 0.9315 1.1044 1.5982 2.1831
RO-DPLS 0.9292 1.0211 1.2271 1.4879
% reduction 0.3% 7.5% 23.2% 31.8%
MPE
DPLS -0.1305 -0.0747 -0.3629 -0.6147
RO-DPLS -0.1094 0.0092 -0.1045 -0.0595
% reduction 16.2% 87.7% 71.2% 90.3%
MSPE
DPLS 0.07493 0.08208 0.1991 0.6307
RO-DPLS 0.06892 0.06840 0.08640 0.1924
% reduction 8.0% 16.7% 56.6% 69.5%
6.5.4 Closed-Loop Case Study
In this subsection, it is demonstrated that both the DPLS and the proposed RO-DPLS soft
sensors work not only for open loop data but also for closed-loop data, as discussed in Sec-
tion 6.2. In addition, it is shown that RO-DPLS performs better than the traditional DPLS
soft sensor.
Previous studies [148, 176] have shown that with real-time feedback of pulp Kappa num-
ber measurements, a well-tuned PID controller can provide satisfactory control performance
in rejecting wood feed variations. In this closed-loop case study, the lower heater temperature
is selected as the manipulated input as in [152], and the remaining 10 variables in Table 6.2
are secondary outputs. The plant model used to design the controller is a SOPTD model
identified from a step response in wood composition using the method reported in [172].
Figure 6.4: Soft sensor performance comparison - open loop: (a) DPLS; (b) RO-DPLS
Based on the SOPTD model approximation, a PID controller is designed with the trans-
fer function shown below, and the parameters based on the IMC-based controller tuning
relations [167] are summarized in Table 6.6.
G(s) = Kc e^(−θs) / (τ²s² + 2ζτs + 1) (6.17)
Table 6.6: Controller tuning parameters
Kc τI τD
−0.0997 22.24 54.00
In this case study, 39 hours (390 samples) of simulated data were used for training; 95
hours (950 samples) of simulated data were used for testing. White noise with variance
0.0002 was added to the Kappa measurements. Similar to the open loop cases, εmax, MPE,
and MSPE are used to evaluate the performance of DPLS and RO-DPLS, and the results
are summarized in Table 6.7. Figure 6.5 compares the predictions of both
methods to the measurements. It can be seen from Figure 6.5 that DPLS and RO-DPLS
are indeed able to capture process dynamics when the process is under closed-loop control,
which is consistent with our theoretical finding in Section 6.2. In addition, both Table 6.7
and Figure 6.5 indicate that RO-DPLS outperforms DPLS. Similar to the open loop cases, if
the number of training samples is increased 10-fold, the performance of DPLS would
be similar to that of RO-DPLS. Even so, RO-DPLS provides a 14-step-ahead prediction horizon
that the traditional DPLS cannot offer, which would allow the controller to be tuned more
aggressively for better control. It is worth noting that in this case study, it is assumed that the
real-time Kappa number is available for feedback control, which is not the case in practice. The
integration of the RO-DPLS soft sensor into digester control is currently under investigation.
It is worth noting that in this work the soft sensors were tuned (e.g., variable selection,
past horizon, number of PCs, etc.) to minimize the MSPE because that is what matters in practice.
Our experience has shown that if the soft sensors were tuned to minimize the MPE instead, a
higher MSPE would result.
Table 6.7: Comparison of DPLS and RO-DPLS: closed-loop case
εmax
DPLS 0.1062
RO-DPLS 0.0832
% reduction 21.7%
MPE
DPLS 0.0387
RO-DPLS 0.0225
% reduction 41.9%
MSPE
DPLS 1.90×10⁻³
RO-DPLS 0.523×10⁻³
% reduction 72.5%
6.6 Application to an Industrial Kamyr Digester
In this section, we apply the RO-DPLS soft sensor to predict the Kappa number of a
single-vessel Kamyr digester, using data from a paper mill located in Mahrt, Alabama, run by
MeadWestvaco Corporation. The digester has a DCS and a Duralyzer-NIR digester analyzer
system [177, 178] to provide secondary measurements such as EA, DL, total dissolved solids
(TS) and active alkali (AA) for the different zones of the digester. The performance of RO-
DPLS is compared with that of DPLS. The soft sensors were trained using 1100 samples and
tested using 450 samples. Based on Joseph and Brosilow's approach, a total of 16 secondary
outputs were selected as regressor variables; they are listed in Table 6.8, together with the
associated delays and past horizons used in building the RO-DPLS model. As discussed in
Sec. 6.3, it is not required that all variables have the same past horizon; in this
case, a past horizon of 3 happens to work for all variables. Fine-tuning the past horizon
for each variable would improve the performance in some cases, but the improvement is
incremental. For DPLS, the past horizon is set to 22 based on the AIC criterion. The
same set of regressor variables is used for both the DPLS and RO-DPLS soft sensors. The
Figure 6.5: Soft sensor performance comparison - closed-loop: (a) DPLS; (b) RO-DPLS
number of PCs to be retained was determined by 4-block cross-validation. Table 6.9 summarizes
the performance of the different soft sensors in terms of MPE and MSPE.
εmax is not included as a performance measure due to the significant measurement noise
and possible outliers in the lab Kappa measurements, as indicated by the occasional spikes
shown in Figure 6.6, where the soft sensor predictions are compared to the lab measurements.
Both Figure 6.6 and Table 6.9 show that RO-DPLS predicts the trend of Kappa number
changes better than DPLS. Another key aspect is that the RO-DPLS soft sensor offers
Table 6.8: RO-DPLS model parameters
Variable di p̄i
Top Circulation Temperature 10 3
Combined Heater Temperature 10 3
Upper Cooking [Na2S] 9 3
Upper Cooking [Na2CO3] 9 3
Upper Cooking [Lignin] 9 3
Upper Cooking [Total EA] 9 3
Lower Cooking [Lignin] 8 3
Lower Cooking [Na2CO3] 8 3
Lower Cooking [Na2S] 8 3
Lower Cooking [Total EA] 8 3
Extraction Temperature 6 3
Extraction [Na2CO3] 6 3
Extraction [Lignin] 6 3
Extraction [Na2S] 6 3
Extraction [Total EA] 6 3
Extraction [% Solids] 6 3
Table 6.9: Comparison of soft sensor performance in the industrial case
MPE
DPLS 1.0703
RO-DPLS 0.6267
% reduction 41.5%
MSPE
DPLS 14.6199
RO-DPLS 12.6850
% reduction 13.2%
a prediction horizon of 1.5 hours, which, as mentioned earlier, can be combined with more
aggressive controller settings to obtain better control of the Kappa number. In addition,
the DPLS soft sensor requires a much longer past horizon (22 samples, or 5.5 hr) compared
to that of RO-DPLS (3 samples, or 0.75 hr). The likely reason is that an LTI system is
used to approximate a nonlinear system with a large transport delay. To capture the large
delay using an approximate linear system, the resulting linear system usually has a very
high system order (n); in addition, the past horizon p is usually required to be much larger
than the system order n in order to ignore the effect of the initial state. Therefore, the resulting
DPLS soft sensor requires a much longer past horizon. This is consistent with the findings
in [152], where the linear model identified for a continuous Kamyr digester using subspace ID
has a system order of n = 40. Even for the reduced-order model in [166], the system order
is 25 for a continuous Kamyr digester. All of these advantages offered by the RO-DPLS soft
sensor make the proposed approach a viable option for online implementation.
Figure 6.6: Soft sensor performance comparison - industrial case: (a) DPLS; (b) RO-DPLS
6.7 Conclusion and Discussion
In this chapter, a theoretical analysis of the DPLS soft sensor is provided. By deriving
the DPLS soft sensor formulation within the subspace ID framework, it is shown that the
DPLS soft sensor is theoretically a dynamic estimator that can adequately capture the
process dynamics provided that the past horizon is long enough. In addition, by setting the
future horizon f = 1, the DPLS soft sensor works for both open loop and closed-loop data,
which is different from some subspace ID methods.
To reduce the number of lagged variables that must be included in the regressor
matrix for the traditional DPLS soft sensor, a reduced-order DPLS approach is proposed.
By explicitly estimating the transport delays involved in the process, time-shifted variables
are used as regressor variables to reduce the system order, thereby reducing the required
past horizon for prediction. In this way, not only is a reduced-order model with
superior performance obtained, but multiple-step-ahead prediction is achieved instead of
one-step-ahead prediction.
The performance of the proposed RO-DPLS approach is demonstrated using both a simulated
single-vessel continuous digester and an industrial Kamyr digester at MeadWestvaco Corp.
The simulation results confirm that the DPLS and RO-DPLS soft sensors can capture process
dynamics adequately for both open loop and closed-loop operations. In addition, it was
shown that the RO-DPLS soft sensor performs better than the traditional DPLS approach,
especially when training data are limited. The industrial case study also confirms that RO-
DPLS provides an effective and practical solution for estimating the product Kappa number
of continuous digesters.
It is worth noting that the application of the proposed RO-DPLS approach is not limited
to pulp digesters. Instead, it can be applied whenever DPLS is applicable. RO-DPLS will
be advantageous when significant delays exist between the available secondary output and
the primary output to be estimated.
Finally, as pointed out earlier, real industrial processes are usually time-varying. For
example, when the operation point such as cooking temperature changes, or when the wood
type changes, the soft sensor model should change as well. To address this difficulty, different
adaptive schemes to update the soft sensor online are investigated in the next chapter.
Chapter 7
Recursive Reduced-Order Dynamic PLS Soft Sensor
7.1 Introduction
In many industrial processes such as distillation columns and pulping digesters, the pri-
mary product variables that are required for feedback control are either not measured online
or not measured frequently. For example, for the Kamyr pulp digesters, the Kappa number,
a key indicator of the extent of reaction, is usually measured in a laboratory once per hour
with half an hour to one hour measurement delay. To address this challenge, many data-
driven soft sensors have been developed and implemented in process industry. Due to the
growing popularity, demonstrated usefulness and huge potential, soft sensor development has
attracted many research interests in the past two decades [39, 44, 46, 47, 179, 180]. The most
commonly applied modeling techniques in data-driven soft sensors are principal component
analysis (PCA), partial least squares (PLS), artificial neural networks (ANN) and support
vector machine (SVM) [51, 52, 54, 57, 181?183]. Kadlec et al. [139, 140] provide a compre-
hensive review on the characteristics of the data generated from the industrial processes and
different data-driven soft sensor modeling techniques. A more systematic and thorough dis-
cussion on the design procedures of soft sensors and their applications for solving industrial
problems can be found in [49].
Among different data-driven soft sensor approaches, the dynamic partial least squares (DPLS)
soft sensor approach has been applied to many industrial processes due to its simplicity and
available adaptive versions. In the previous chapter [63], a theoretical analysis of the DPLS
soft sensor modeling approach was provided, and it was proved that for a linear time-invariant
(LTI) system, the DPLS soft sensor is a dynamic estimator that can adequately capture
process dynamics provided that enough lagged variables are used to build the model. In
addition, a reduced-order DPLS (RO-DPLS) soft sensor approach was developed to address
some limitations of the traditional DPLS soft sensor when applied to processes with large
transport delays. Case studies of both simulated and industrial Kamyr digesters demonstrate
the improved performance of RO-DPLS soft sensor compared to the traditional DPLS soft
sensor.
In [63], only a static RO-DPLS soft sensor was considered. However, industrial processes
often experience time-varying changes, such as variations in process input materials, process
fouling and catalyst decay. As a result, the performance of a soft sensor developed off-line
often deteriorates. Under these circumstances, it is desirable to update the soft sen-
sor model with new process data to reflect the process changes. Various adaptation
techniques have been published to update DPLS soft sensors online. Helland et al. [137]
introduced a recursive PLS (RPLS) regression algorithm which updates the model based on
new data without increasing the size of the data matrices. Qin [138] further proposed a block-
wise RPLS, as well as moving-window and forgetting-factor adaptation schemes. Dayal and
MacGregor [154] proposed a computationally efficient RPLS by using a kernel algorithm [184]
for the PLS computation rather than the NIPALS method [24]. However, existing work on
recursive PLS schemes mainly focuses on investigating their theoretical properties, as well as
designing schemes to reduce the computational load. To the best of our knowledge, no results
have been published on the practical aspects of these schemes, such as how to choose the
updating frequency and how to select the number of PLS components in order to optimize
the model performance.
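To make the adaptation concrete, the sketch below fits a single-response PLS model by NIPALS and refits it after down-weighting the old data block with a forgetting factor. This is our simplified illustration, not Qin's algorithm: block-wise RPLS replaces the raw old block by its loading matrix to bound memory, whereas raw stacking is used here for brevity.

```python
import numpy as np

def pls_fit(X, y, ncomp):
    """Single-response PLS (PLS1) via NIPALS deflation.
    Returns the regression vector plus the centering means."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    W, P, Q = [], [], []
    for _ in range(ncomp):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        p = Xc.T @ t / (t @ t)
        q = (yc @ t) / (t @ t)
        Xc = Xc - np.outer(t, p)          # deflate X
        yc = yc - q * t                   # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q    # regression vector in X-space
    return B, X.mean(axis=0), y.mean()

def forgetting_factor_refit(X_old, y_old, X_new, y_new, ncomp, lam=0.95):
    """Down-weight the old block by lam, append the new block, refit.
    (Qin's block-wise RPLS represents the old block by its loading
    matrix to bound memory; raw stacking is used here for brevity.)"""
    X = np.vstack([lam * X_old, X_new])
    y = np.concatenate([lam * y_old, y_new])
    return pls_fit(X, y, ncomp)
```

A smaller lam discounts old data faster, trading noise sensitivity for quicker tracking of process changes, which is exactly the practical trade-off studied in this chapter.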
In this chapter, the previously proposed RO-DPLS soft sensor [63] is extended to its online
adaptive version in order to track process changes. Several model update schemes as well
as different data scaling methods are examined, with an emphasis on gaining understand-
ing of the practical aspects of the different recursive schemes. Simulated case studies and
two data sets collected from an industrial Kamyr digester are used to compare the different
approaches [67–70]. The rest of the chapter is organized as follows: Section 7.2 provides some
background information for the RO-DPLS soft sensor. Section 7.3 compares the performance
of RO-DPLS with different model updating schemes and data scaling methods reviewed in
Chapter 5 using simulated and real industrial data. In Section 7.4, the effectiveness of the
recursive RO-DPLS soft sensor is also demonstrated using a digester closed-loop control case
study. Finally, conclusions and discussions are presented in Section 7.5.
7.2 Brief Review of the RO-DPLS Soft Sensor
For some chemical processes such as Kraft pulping, the process orders are very high
due to transport delay, strong process dynamics and/or nonlinearity. For such processes,
if a traditional DPLS approach is used, a large number of regressor variables (including
their lagged values) would be included in the soft sensor to capture the process dynamics
adequately. The regressor variables in the traditional DPLS approach, denoted by zp(k), are
shown in Equation (7.1).
zp(k) = [u1(k−1), ··· , u1(k−p), ··· , um(k−1), ··· , um(k−p),
v1(k−1), ··· , v1(k−p), ··· , vl(k−1), ··· , vl(k−p)]T (7.1)
where ui (i = 1, ··· , m) are process input variables, p is the window of lagged variables,
which is usually (much) larger than the process order, and vi (i = 1, ··· , l) are secondary
output variables. The potential issue with this traditional DPLS approach is that model
building requires a large amount of training samples due to the large number of regressor variables,
which in turn slows down the model's response to process changes. To address this, in the
previous chapter, the RO-DPLS approach was proposed, which significantly reduces the number
of regressor variables by taking the process characteristics into account. For processes with
large transport delays, if the delay time can be identified and eliminated by time-shifting the
measurements, the system order can be effectively reduced, and many fewer lagged variables
can be used to build the soft sensor without sacrificing prediction performance. In the
RO-DPLS soft sensor, instead of including all the variables that are used in the traditional
DPLS approach, based on the principle of parsimony, only the variables that contribute
the most to the primary output variables are selected to build the dynamic soft sensor. The
regressor variables in the RO-DPLS approach, denoted by z̄p(k), are shown in Equation (7.2).
z̄p(k) = [u1(k−du1), ··· , u1(k−du1−p̄), ··· ,
um(k−dum), ··· , um(k−dum−p̄),
v1(k−dv1), ··· , v1(k−dv1−p̄), ··· ,
vl(k−dvl), ··· , vl(k−dvl−p̄)]T (7.2)
where dui denotes the delay difference between the ith input and the primary output, and
dvj denotes the delay difference between the jth secondary output and the primary output.
Let dmax = max{dui, dvj} and dmin = min{dui, dvj} (i ∈ [1, ··· , m], j ∈ [1, ··· , l]). It is
clear that p ≥ dmax + p̄, and in most cases p ≫ p̄. Therefore, by incorporating the process
delay information, RO-DPLS effectively reduces the dimension of the regressor. Another
important advantage of RO-DPLS is that a dmin-step-ahead prediction is obtained, compared
to the one-step-ahead prediction from the traditional DPLS.
7.3 Comparison of the RO-DPLS Soft Sensor Performance and Practical As-
pects
In this section, the previously developed RO-DPLS soft sensor is extended to its recursive
version by applying the different model updating schemes and data scaling methods discussed
in Chapter 5. In particular, the impact of two practical aspects on the different recursive PLS
schemes is investigated: the model update frequency and the way to determine the number of
PLS components for prediction. A simulated case study and two industrial case studies of a
single-vessel high-yield Kamyr digester are used to compare the performance of RO-DPLS
with different model updating schemes and the corresponding data scaling methods.
Table 7.1: Standard deviations of the white noise for the unmeasured disturbances
Disturbance σ
ρs1 0.1732
ρs2 0.2646
ρs3 0.8367
ρs4 0.1000
ρs5 0.4472
EA 0.0300
HS 0.0089
7.3.1 Description of the Data Sets and Case Studies
Simulated data set
The extended Purdue model [146] is implemented to simulate a single vessel high yield
Kamyr digester. The digester is approximated by 50 CSTRs in series. Open loop simulation
is conducted to generate the data set. During the simulation, the liquor measurements and
Kappa number measurements are collected at the same frequency, i.e., one sample every 6
minutes. Integrated white noise disturbances (d_{k+1} = d_k + w_k) are introduced to the white
liquor compositions of effective alkali (EA) and active hydrosulfide (HS), and to five wood
compositions (ρ_si, i = 1, 2, ..., 5) at the inlet. The standard deviations of the white noise
(w_k) for the unmeasured disturbances are listed in Table 7.1. In addition, measurement
noises are added to both primary and secondary variables throughout the simulation, with
standard deviations listed in Table 7.2. A total of 300 hours of data were simulated for
training and 250 hours for testing.
The RO-DPLS soft sensor is set up as in [63]. The secondary variables that are selected as
regressor variables are listed in Table 7.3, and the delay (di) and past horizon (pi) of each
regressor variable are listed in Table 7.4.
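The integrated white noise disturbance is simply a random walk, so generating it is straightforward. The sketch below (the function name and seed are illustrative, not from the dissertation) produces, for example, the EA white-liquor disturbance of Table 7.1 for the 300 h training run at one sample per 6 minutes:

```python
import numpy as np

def integrated_white_noise(sigma, n_steps, seed=0):
    """Random-walk disturbance d_{k+1} = d_k + w_k, with w_k ~ N(0, sigma^2)
    and d_0 = 0, returned as the cumulative sum of the white-noise draws."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, n_steps)
    return np.cumsum(w)

# EA disturbance of Table 7.1 (sigma = 0.03); 300 h at one sample per 6 min
d_ea = integrated_white_noise(0.03, 3000)
```

Because the variance of a random walk grows with time, such disturbances drift away from their initial values, which is exactly what makes the process time-varying and the model update necessary.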
Industrial data sets
Two industrial data sets were collected from a Kamyr digester at a pulp mill located
in Mahrt, Alabama run by MeadWestvaco Corp. The digester has a DCS system and a
Table 7.2: Standard deviations of the white measurement noise for the primary and secondary
outputs
Output σ
EA 0.1225
DL 0.1225
T 0.1225
Kappa # 0.2236
Table 7.3: Selected regressor variables for the simulated case study
Variable Description Location/Stream CSTR index
EA12 Free liquor EA concentration Upper cooking 12
DL12 Free liquor DL concentration Upper cooking 12
T12 Free liquor temperature Upper cooking 12
TU Upper heater exit temperature Upper heater exit -
EA18 Free liquor EA concentration Lower cooking 18
DL18 Free liquor DL concentration Lower cooking 18
T18 Free liquor temperature Lower cooking 18
TL Lower heater exit temperature Lower heater exit -
EA35 Free liquor EA concentration Extraction 35
DL35 Free liquor DL concentration Extraction 35
T35 Free liquor temperature Extraction 35
Duralyzer-NIR digester analyzer [177, 178] to measure secondary measurements such as
effective alkali (EA), total dissolved solids (DS), dissolved lignin (DL), and active alkali
(AA) for different streams of the digester. The first data set collected in 2006 contains
1570 samples with Kappa measurements (data set I). The second data set collected in 2010
contains 350 samples (data set II). Several commonly applied basic steps were taken to
preprocess the data sets, which include the removal of zero and first order hold readings,
as well as the removal of evident outliers (e.g. negative readings for flowrate). To avoid
distorting the information contained in the process data, no filtering technique (e.g. low
pass filter) was used because the presence of non-white noise could affect the performance
of the filtering step.
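The basic preprocessing steps described above can be sketched as a single flagging pass. This is a hedged illustration: the function name and tolerance are ours, and only zero-order-hold (frozen) readings are detected, whereas first-order-hold segments would need an analogous second-difference test.

```python
import numpy as np

def flag_bad_readings(x, hold_tol=0.0):
    """Flag samples to drop during basic preprocessing: zero readings,
    evident outliers (negative values, e.g. a negative flowrate), and
    held readings whose value is unchanged from the previous sample."""
    x = np.asarray(x, dtype=float)
    bad = (x == 0.0) | (x < 0.0)          # zeros and negative readings
    held = np.zeros(x.shape, dtype=bool)
    held[1:] = np.abs(np.diff(x)) <= hold_tol   # frozen (held) readings
    return bad | held
```

Note that, consistent with the text, no low-pass filtering is applied: flagged samples are removed rather than smoothed, so no information is distorted by a filter mismatched to non-white noise.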
For industrial case study I, 1100 sample points from data set I are used as the training data
Table 7.4: RO-DPLS model parameters for the simulated case study
Variable di pi
EA12 17 3
DL12 17 3
T12 17 3
TU 17 3
EA18 16 5
DL18 16 5
T18 16 5
TL 16 5
EA35 14 6
DL35 14 6
T35 14 5
to build the initial model and 420 sample points from data set I as the testing data for online
update. For industrial case study II, as in the previous case, 1100 sample points from data set
I are used as training data, but 300 samples from the data set II are used as the testing data
for online update. Clearly, case study II presents a significantly more challenging problem
as training and testing data sets are collected about 4 years apart. The soft sensor setup for
the industrial case is the same as that reported in [63]. The secondary variables selected as
regressors are listed in Table 7.5, together with the associated delay (d_i) and past horizon
(p_i) of each.
7.3.2 RO-DPLS Model Update Performance Comparison
To eliminate the effect of outliers on different RO-DPLS model update schemes, multi-
variate off-line outlier detection is performed to remove outliers from the industrial data
sets. In this way, a fair comparison of different model update schemes can be provided.
Specifically, the leverage and studentized residual of each sample are computed in order to
analyze the independent and dependent variable spaces and detect the outliers. To assess
the performance of the RO-DPLS model update schemes, the mean squared error (MSE)
Table 7.5: RO-DPLS model parameters for the industrial case studies
Variable di pi
Top Circulation Temperature 3 10
Combined Heater Temperature 10 3
Upper Cooking [Na2S] 9 3
Upper Cooking [Na2CO3] 9 3
Upper Cooking [Lignin] 9 3
Upper Cooking [Total EA] 9 3
Lower Cooking [Lignin] 8 3
Lower Cooking [Na2CO3] 8 3
Lower Cooking [Na2S] 8 3
Lower Cooking [Total EA] 8 3
Extraction Temperature 6 3
Extraction [Na2CO3] 6 3
Extraction [Lignin] 6 3
Extraction [Na2S] 6 3
Extraction [Total EA] 6 3
Extraction [% Solids] 6 3
and mean error (ME) are utilized. Their definitions are given below:

MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²    (7.3)

ME = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)    (7.4)

where y_i and ŷ_i are the measured and predicted Kappa numbers at the digester exit, respectively.
Because each model update scheme has certain tuning parameters associated with it,
extensive experiments have been done to test the different settings of each scheme in order to
gain better understanding on their properties. In this comparison, the effect of two factors
on model performance is examined in detail: model update frequency and how to select the
number of PLS components for prediction.
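Both indices are one-line computations; ME preserves the sign of the error and therefore reveals prediction bias that MSE averages away. A minimal sketch (the function name is ours):

```python
import numpy as np

def mse_me(y, y_hat):
    """Performance indices of Equations (7.3) and (7.4):
    mean squared error and mean (signed) error."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.mean(e**2), np.mean(e)   # (MSE, ME)
```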
The effect of model update frequency
How often the soft sensor model should be updated in order to achieve the optimal
performance depends on the process and the model update scheme. For all model update
schemes, a search is done over a wide range of updating intervals by changing the number
of new samples used for model update. The comparison results for the three case studies
are presented in Figures 7.1 - 7.3 and will be discussed in detail in the following paragraphs.
Note that for FF-BRPLS, besides the model update interval, the choice of forgetting factor
also plays an important role in determining model prediction performance. To address this,
a two-dimensional search is conducted, and in Figures 7.1 - 7.3 only the effect of the model
updating frequency under the optimal forgetting factor is presented. Figure 7.4 shows the
effect of forgetting factor on FF-BRPLS with a fixed model update interval (80 samples) for
all three case studies.
a) Simulated case study
For the simulation case study, Figure 7.1 shows that the RO-DPLS soft sensor performs
better when the model is updated more frequently, no matter what recursive model
update algorithm is used. Specifically, updating the model more frequently not only
reduces MSE, but also reduces bias (ME) in the prediction error. This is mainly due to
the strong time-varying characteristics of the process caused by the disturbances in the
wood composition as well as the white liquor composition. It is worth noting that al-
though Figure 7.1(a) indicates RO-DPLS with FF-BRPLS updating scheme provides
the smallest MSE, it does not necessarily mean that it provides the best prediction
performance as its estimates are more biased than some other updating schemes such
as RPLS when the model is updated relatively frequently. To illustrate this point,
the best prediction performances of different updating schemes are compared in Fig-
ures 7.5(a) (RPLS and FF-BRPLS) and 7.5(b) (BRPLS and MW-BRPLS). Both fig-
ures show that the RO-DPLS soft sensors with different model update schemes provide
Figure 7.1: Effect of different update intervals in four model update schemes applied to a
simulated case study. (a) MSE; (b) ME.
good prediction performance and track the process changes well. In addition, Fig-
ure 7.5(a) shows that although both FF-BRPLS and RPLS updating schemes provide
good prediction, the RPLS scheme tracks the process trend better than FF-BRPLS.
Figure 7.5(b) shows that MW-BRPLS tracks the process trend better than BRPLS,
but the predictions are noisier.
b) Industrial case studies
As shown in Figures 7.2 and 7.3, the results of RO-DPLS with different model updating
Figure 7.2: Effect of different update intervals in four model update schemes applied to
industrial case study I. (a) MSE; (b) ME.
schemes are mixed for the industrial case studies. There is no single method that
consistently outperforms others for different model update intervals. This is possibly
due to the fact that in the industrial data sets, the Kappa number measurements
contain significant amount of noises caused by different technicians who handle the
samples during different shifts; in addition, there are limited testing samples in both
data sets. Because of the same reasons, MSE may not be an accurate indicator of the
soft sensor performance, and ME seems to be a better performance index than MSE. To
Figure 7.3: Effect of different update intervals in four model update schemes applied to
industrial case study II. (a) MSE; (b) ME.
illustrate this point, the prediction performance of two selected settings are compared
in detail for each method. For industrial case study I, the specific settings and
performance indices of each recursive model update scheme are listed in Table 7.6 and
their prediction performance are plotted in Figure 7.6. Both Table 7.6 and Figure 7.6
show that when the RO-DPLS soft sensor is updated more frequently, it tracks the
process variation better, although it may have a slightly higher MSE. When the soft
sensor is updated less frequently, the model prediction tends to ignore the large process
Figure 7.4: Effect of different forgetting factors on soft sensor performance for a fixed update
interval in three case studies. (a) MSE; (b) ME.
variations and stays closer to the mean of the primary variable. These are true for all
updating schemes and are most clearly demonstrated by the RPLS updating scheme
(i.e., Figure 7.6(a)). Therefore, ME is a better performance index than MSE in this
specific industrial case study. For industrial case study II, Figure 7.7 compares the
prediction performances of the RO-DPLS soft sensor with two selected settings for each
model updating scheme. The specific settings and corresponding performance indices
are listed in Table 7.7. Compared to the industrial case I, the prediction performances
Figure 7.5: Prediction performance of different model update schemes applied to a simulated
case study. (a) RPLS vs. FF-BRPLS; (b) BRPLS vs. MW-BRPLS.
of the RO-DPLS soft sensor with different updating schemes are noticeably worse. This
is expected given that the initial model is the same as that of case study I, which is built
with the data collected in 2006, while the testing data of case study
II were collected in 2010. In addition, because ME provides better indication of soft
sensor performance than MSE in this case study, Table 7.7 shows that RO-DPLS soft
sensor with RPLS updating scheme provides the best prediction performance when
updated relatively frequently (i.e., every 10 samples or about every 2.5 hours). This is
mainly due to the fact that in the industrial case study II, the initial 2006 model does
not provide a good description of the 2010 testing data, and the soft sensor performance
relies heavily on how the model is updated. Because the information contained in the
new measurements is quite different from the initial model, it makes sense to use the
data themselves directly (i.e., the RPLS updating scheme) to update the model instead
of using the sub-PLS model extracted from the new data (i.e., the rest of the updating
schemes), as the sub-model will more or less remove some information from the data.
c) Discussion
The extensive experiments performed show that, for the digester process, the RO-DPLS
Figure 7.6: Prediction performance of selected cases for different model update schemes
applied to industrial case study I. (a) RPLS; (b) BRPLS; (c) MW-BRPLS; (d) FF-BRPLS
with λ = 0.1.
Table 7.6: Settings and performance indices of RO-DPLS with different model update
schemes for industrial case study I
Updating scheme Scaling method Update interval MSE ME
RPLS RSC 10 samples 16.9133 0.0968
80 samples 14.2192 0.1877
BRPLS RSC 20 samples 15.7986 0.0934
120 samples 14.5871 0.1429
MW-BRPLS MWSC 40 samples 17.8934 0.1968
140 samples 14.4135 0.2875
FF-BRPLS FFSC 80 samples 16.0281 0.1753
140 samples 15.8370 0.4732
Figure 7.7: Prediction performance of selected cases for different model update schemes
applied to industrial case study II. (a) RPLS; (b) BRPLS; (c) MW-BRPLS; (d) FF-BRPLS
with λ = 0.1.
soft sensor with regular RPLS model updating scheme usually provides the best predic-
tion performance when the model is updated frequently (e.g., with 10 sample interval),
Table 7.7: Settings and performance indices of RO-DPLS with different model update
schemes for industrial case study II
Updating scheme Scaling method Update interval MSE ME
RPLS RSC 10 samples 22.4115 0.0592
80 samples 20.7392 0.6181
BRPLS RSC 60 samples 28.9921 3.0328
80 samples 22.4171 2.2200
MW-BRPLS MWSC 40 samples 24.7838 0.7956
80 samples 23.7232 1.7756
FF-BRPLS FFSC 40 samples 22.2816 0.0787
80 samples 20.0813 0.8975
which is mainly due to the volatile nature of the process. In addition, generally speaking,
for processes that experience considerable variations, BRPLS model updating scheme
may not be a good choice. The benefits associated with BRPLS, i.e., computational
and storage efficiency, become more clear when the block size is large. However, large
block size usually results in sluggish tracking of process changes and may deteriorate
soft sensor performance for the processes that show significant time-varying behavior.
A question may arise: if it has been proven that both regular RPLS and BRPLS are
equivalent to regular PLS, why do their performances differ noticeably in these case
studies? The answer is that the condition for the equivalency to
hold, i.e., the PLS model for the historical data and the PLS sub-model for the new
data have to capture the variability in X and X1 data blocks completely, is generally
not satisfied, especially for industrial data. To elaborate on this point, the data pairs used
to build the model in the three algorithms are listed below.
• Regular PLS:  { [X; X_1], [Y; Y_1] }

• RPLS:  { [P^T; X_1], [BQ^T; Y_1] }

• BRPLS:  { [P^T; P_1^T], [BQ^T; B_1Q_1^T] }

where [A; B] denotes the row-wise stacking of A on top of B.
One way to interpret the RPLS algorithm is that P^T and BQ^T can be viewed as
the representative samples for the historical data X and Y, as they are the set of
orthogonal bases that span the principal subspaces of data blocks X and Y, respectively.
Similarly, for BRPLS, P_1^T and B_1Q_1^T can be viewed as the representative samples for
the new data block. When RPLS and BRPLS are implemented, the number of PLS
components is often determined through cross-validation, and is usually much smaller
than the number of samples in the data sets. Therefore, part of the information
contained in the original data will be lost when P^T and BQ^T (or P_1^T and B_1Q_1^T) are
used to replace the data blocks. By choosing the size of the new data block carefully,
the PLS sub-model (i.e., P_1^T and B_1Q_1^T) can adequately capture the variability in the
new data. However, this may be difficult to achieve when there are sudden changes
in the process, as shown in the industrial case study II. When the size of the new
data block is too small, the PLS sub-model may not adequately capture the variability
of the new process condition, as the variability in the residual space of the PLS sub-
model is completely discarded. On the other hand, if the size of the new data block is
too big, the delay associated with data accumulation may deteriorate the soft sensor
performance. Therefore, for the cases where the process is not sampled frequently,
RPLS is a more reliable choice compared to BRPLS.
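The three training pairs differ only in what is stacked before refitting. A minimal sketch of the stacking step (function names are ours; the PLS fit on the resulting pair is omitted): RPLS stacks the old model's loadings on the raw new block, while BRPLS stacks the loadings of both the old and the new sub-models.

```python
import numpy as np

def rpls_pair(P, BQt, X1, Y1):
    """Form the RPLS training pair {[P^T; X1], [B Q^T; Y1]}:
    old model summarized by its loadings, new data kept raw."""
    return np.vstack([P.T, X1]), np.vstack([BQt, Y1])

def brpls_pair(P, BQt, P1, B1Q1t):
    """Form the BRPLS training pair {[P^T; P1^T], [B Q^T; B1 Q1^T]}:
    both the old and the new blocks are summarized by sub-model loadings."""
    return np.vstack([P.T, P1.T]), np.vstack([BQt, B1Q1t])
```

The shapes make the storage argument concrete: the RPLS pair grows with the new block size n_1, whereas the BRPLS pair has only as many rows as retained components, which is why BRPLS is cheaper but discards the new block's residual-space variability.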
d) A modified data scaling approach
It has been recognized that data scaling affects the soft sensor performance, especially
when there exists operation point changes. One disadvantage associated with the
commonly used data scaling method for RPLS, i.e., Equations (5.13) and (5.14), is
that when n (i.e., the number of samples) is large, the mean and the variance do not
change much, no matter how the new data differs from the old data, as the weighting
assigned to the new data becomes negligible for large n. To address this limitation, a
modified data scaling method is proposed by replacing n with N, where N is a fixed
integer. In this way fixed weightings are assigned to the previous data and the new
data.
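As a sketch of this modification (hedged: Equations (5.13) and (5.14) are not reproduced in this chapter, so the standard sample-count-weighted mean/variance updates are assumed here; freezing the growing count n at a fixed N is the only change):

```python
import numpy as np

def modified_recursive_scaling(mean_old, var_old, x_new, N):
    """Recursive mean/variance update with the old data's weight frozen at N.

    In the standard update, the old block's weight n grows without bound, so
    new data eventually stop moving the scaling. Fixing the weight at N keeps
    the relative weighting of old and new data constant."""
    x_new = np.atleast_2d(x_new)
    n1 = x_new.shape[0]
    m1 = x_new.mean(axis=0)
    mean_new = (N * mean_old + n1 * m1) / (N + n1)
    # pooled second moment, re-centered about the updated mean
    s2 = (N * (var_old + (mean_old - mean_new) ** 2)
          + ((x_new - mean_new) ** 2).sum(axis=0)) / (N + n1)
    return mean_new, s2
```

With a fixed N, an operating-point shift in the new block always moves the scaling by the same fraction n_1/(N + n_1), no matter how much history has accumulated.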
The effect of the number of PLS components retained for prediction
For model prediction, Qin [138] recommended that the number of components be
determined through cross-validation for various recursive PLS algorithms. However, in the
same work, it was also pointed out that "An interesting issue that deserves further study is
how to perform cross-validation online to update the number of components needed for
prediction". In this subsection, the performance of the RO-DPLS soft sensor with different ways
of selecting the number of components for prediction is compared and our recommendations
are provided. The performances of the RO-DPLS soft sensor with five different component
selection methods are compared using the simulated data.
a) Number of components is determined through leave-one-out cross-validation with max-
imum allowed number of components as 10.
b) Number of components is determined through block cross-validation, with maximum
allowed number of components as 10, i.e., all training data are divided into four blocks
and three of them are used to build the model while the fourth one is used for cross-
validation.
c) Number of components for prediction is fixed to two.
d) Number of components for prediction is fixed to three.
e) Number of components for prediction is fixed to four.
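The component-selection settings above can be compared with a small search loop. Since the dissertation's PLS code is not shown, the sketch below substitutes principal-component regression for PLS purely to illustrate the contiguous-block cross-validation logic of setting b); all function and parameter names are illustrative.

```python
import numpy as np

def pcr_fit_predict(Xtr, ytr, Xte, k):
    """k-component principal-component regression (a stand-in for PLS here)."""
    mu, sd = Xtr.mean(0), Xtr.std(0) + 1e-12
    Z = (Xtr - mu) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:k].T                                   # top-k loading directions
    b = np.linalg.lstsq(Z @ V, ytr - ytr.mean(), rcond=None)[0]
    return ((Xte - mu) / sd) @ V @ b + ytr.mean()

def block_cv_components(X, y, k_max=10, n_blocks=4):
    """Contiguous-block cross-validation: hold out each block once, fit on
    the rest, and return the k with the smallest average held-out MSE."""
    idx = np.array_split(np.arange(len(y)), n_blocks)
    best_k, best_mse = 1, np.inf
    for k in range(1, k_max + 1):
        errs = []
        for b in range(n_blocks):
            te = idx[b]
            tr = np.hstack([idx[j] for j in range(n_blocks) if j != b])
            yhat = pcr_fit_predict(X[tr], y[tr], X[te], k)
            errs.append(np.mean((y[te] - yhat) ** 2))
        if np.mean(errs) < best_mse:
            best_k, best_mse = k, float(np.mean(errs))
    return best_k
```

Contiguous (rather than shuffled) blocks are the natural choice for time-series data, since they preserve the temporal ordering that the recursive soft sensor sees online.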
In this comparison, only the results on the RO-DPLS with regular RPLS update scheme
are presented. The results on other model update schemes are similar to those on RPLS.
Figure 7.8 shows the MSE and ME for the five cases over a wide range of model updating
intervals. Figure 7.8 indicates that the two cross-validation approaches perform better than
the approaches with fixed number of components, especially when the soft sensor model is
not updated frequently. However, around the optimal updating intervals, i.e., model update
every 10 samples, there is not much difference among the different component selection ap-
proaches in terms of MSE and ME. To gain a better understanding on how cross-validation
Figure 7.8: Simulated case study: MSE (a) and ME (b) of RO-DPLS prediction with fixed
or cross-validation-selected components for a range of updating intervals.
approaches perform, the number of components determined through cross-validation for each
Figure 7.9: Histogram of the number of components determined through cross-validation
in the simulated case study for two different cross-validation methods. (a) four contiguous
blocks cross-validation with model updated every 10 samples; (b) four contiguous blocks
cross-validation with model updated every 100 samples; (c) leave-one-out cross-validation
with model updated every 10 samples; (d) leave-one-out cross-validation with model updated
every 100 samples.
update is monitored over the entire testing period. Two representative cases are examined,
model update every 10 samples and model update every 100 samples. Figure 7.9 shows the
histogram of the number of components determined through cross-validation approaches.
From Figure 7.9, it is observed that when the model is updated every 10 samples, the num-
ber of components for prediction determined through two cross-validation approaches have
a wide distribution from 1 to 10. However, when 100 samples are used for update, the se-
lected number of components becomes either 1 or 2. One possible reason for such difference
is the large variation presented in the process. When fewer samples are used to update
the model, the disturbances and noises contained in the new measurements have a bigger
impact on cross-validation. Consequently, the number of components determined through
cross-validation shows a wide distribution. Because the dominant process variability is not
likely to change so dramatically within a short period of time, the great variability in the
number of components for prediction determined through cross-validation may not represent
the process correctly. Therefore, when the soft sensor is updated every 10 samples,
using a fixed number of components is preferred. On the other hand, when more
samples (e.g., 100 samples) are used to update the model, the process may experience a
sequence of changes during the update interval. Consequently, using more components will
provide better description for one segment of the data, but it may yield poorer prediction
for another segment of the data, as the process may have already changed. In contrast, if
fewer components are selected, the model will only capture the most dominant directions of
process variation which are not likely to change over a period of time. This is why only one or
two components are selected for optimal prediction when 100 samples are used to update the
model. For the simulated case study and industrial case study I, when the model is updated
every 10 samples, there is not much difference in the model prediction performance between
the cross-validation approaches and the fixed number of components approach. However, for
the industrial case study II, even though the MSEs of the two approaches are similar, their
detailed prediction performances look different. Figure 7.10(a) shows the histogram of the
number of components determined through cross-validation, and Figure 7.10(b) compares
the detailed prediction performance of the fixed number of components (i.e., 2 components)
and the cross-validation approach. Figure 7.10(a) shows that for 40% of the testing period,
only one component is selected; this is due to the extremely volatile nature of this industrial process.
Figure 7.10(b) indicates that due to the limited amount of process variability captured by
one component, several segments of the predictions are nearly constant. In contrast, when
the number of components is fixed at 2, there is no segment with nearly constant predictions.
This example further supports the choice of the fixed-number-of-components approach for
model prediction for processes of a volatile nature that require frequent model updates.
Figure 7.10: Histogram of the number of components determined through cross-validation
(a) and comparison of prediction performance for the fixed-number-of-components vs. cross-
validation approaches (b) for industrial case study II.
7.4 Application of the Recursive RO-DPLS Soft Sensor to Digester Control
In this section, the recursive version of the RO-DPLS soft sensor with RPLS updat-
ing scheme is applied to control the simulated Kamyr digester. Two control scenarios are
compared: PID control with Kappa number measurement feedback (CL-1), and PID con-
trol with the soft sensor predicted Kappa number feedback (CL-2). In this case study, a
challenging control problem is considered: rejecting the disturbance of wood type change.
It is worth noting that wood type change (from softwood to hardwood and vice versa) is
a major disturbance in pulping process, and usually results in off-spec product during the
transition [185].
7.4.1 Simulation Setup
The Purdue model introduced in Section 7.3.1 is implemented to simulate the Kamyr
digester. The simulation starts with softwood as feeding material. Ten hours after the
process reaches steady-state, the feed is switched from softwood to hardwood, i.e., a step
disturbance. The corresponding reaction kinetics parameters and stoichiometric coefficients
for the different types of wood can be found in [146]. The parameters of the integrated white
noise are the same as those listed in Table 6.4. The secondary variables are sampled every 6
minutes, and the primary variable (i.e., the Kappa number) is sampled every 30 minutes plus
30 minutes measurement delay. For soft sensor update, the RO-DPLS with regular RPLS
update scheme is implemented and the soft sensor model update interval is 10 samples of
the Kappa measurement.
7.4.2 Controller Design
Similar to the work presented in [176], the transfer function model for the Kamyr digester
is identified through step response. A second order plus time delay (SOPTD) model is used
to approximate the dynamics between the input (the lower heater temperature) and the
output (the Kappa number); i.e.,

G(s) = K_p e^{−θ_p s} / (τ²s² + 2ζτs + 1)    (7.5)
Table 7.8: PID controller settings
Case Feedback scheme K_c τ_I (min) τ_D (min)
CL-1 measurement feedback -0.0796 54.21 22.27
CL-2 soft sensor prediction feedback -0.1114 54.21 22.27
with K_p = −2.4, τ = 34.7 min, and ζ = 0.78. Here θ_p = 112.3 min represents the transport
delay associated with the pulping process. When the measurements are fed back to con-
trol the process, measurement delay has to be considered, and the overall transfer function
becomes
G(s) = K_p e^{−(θ_p + θ_m)s} / (τ²s² + 2ζτs + 1)    (7.6)

where θ_m = 30 minutes is the measurement delay associated with the Kappa number. Based
on the identified SOPTD model, the PID controller settings are determined following the
IMC tuning relationships [167]. To make the comparison consistent, the same closed-loop
time constant is used to tune the two controllers, where τ_c is set to θ_p + θ_m, which is a
relatively conservative setting. For Kappa number measurement feedback, the overall delay
time is θ = θ_p + θ_m = 142.3 minutes, and the feedback frequency is every 30 minutes. For
soft sensor estimated Kappa number feedback, because the soft sensor predicts the future
Kappa number with a future horizon (θ_f = 42 minutes), the overall delay is θ = θ_p − θ_f = 70.3
minutes. In addition, because the soft sensor provides the Kappa number every 6 minutes,
the feedback frequency is every 6 minutes. The same tuning rule is applied to both control
scenarios. The corresponding PID controller tuning parameters are listed in Table 7.8.
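The IMC tuning relations for a SOPTD model reproduce the settings of Table 7.8 to a close approximation. This is a hedged sketch: the common rule variant τ_I = 2ζτ, τ_D = τ/(2ζ), K_c = τ_I/(K_p(τ_c + θ)) is assumed, and small discrepancies with the tabulated values are presumably due to rounding or a slightly different rule in [167].

```python
def imc_pid_soptd(Kp, tau, zeta, theta, tau_c):
    """IMC-based PID tuning for G(s) = Kp e^{-theta s}/(tau^2 s^2 + 2 zeta tau s + 1).

    Assumed rule variant: tau_I = 2*zeta*tau, tau_D = tau/(2*zeta),
    Kc = tau_I / (Kp * (tau_c + theta))."""
    tau_i = 2.0 * zeta * tau
    tau_d = tau / (2.0 * zeta)
    Kc = tau_i / (Kp * (tau_c + theta))
    return Kc, tau_i, tau_d

# CL-1: measurement feedback, theta = theta_p + theta_m = 142.3 min
cl1 = imc_pid_soptd(-2.4, 34.7, 0.78, theta=142.3, tau_c=142.3)
# CL-2: soft sensor feedback, theta = theta_p - theta_f = 70.3 min
cl2 = imc_pid_soptd(-2.4, 34.7, 0.78, theta=70.3, tau_c=142.3)
```

Note that only K_c changes between CL-1 and CL-2: τ_I and τ_D depend only on the process lags (τ, ζ), while the reduced delay θ allows the larger gain magnitude of CL-2.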
7.4.3 Comparison of Closed-Loop Performance
In this section, the performance of the recursive RO-DPLS soft sensor is examined, then
the closed-loop performances for the two PID controllers are compared. Figure 7.11 shows
the Kappa number measurement vs. soft sensor prediction in the open loop operation after
a wood type change disturbance is introduced. Figure 7.11 indicates that with recursive
update, the RO-DPLS soft sensor can track the major process change very well.
Figure 7.11: Open loop response to a wood type disturbance and soft sensor prediction.
Figure 7.12 shows the closed-loop performance for the two control scenarios. By com-
paring the two closed-loop responses, it is clear that feeding back the predicted future Kappa
number can significantly improve the closed-loop performance by considerably reducing set-
tling time. This improvement is mainly achieved by the significantly reduced overall process
delay (from θ_p + θ_m to θ_p − θ_f), which allows a much more aggressive controller tuning. In
addition, the improvement is due to the more frequent feedback which is also enabled by
the soft sensor. It is worth noting that the controller performances depicted in Figure 7.12
present a fair comparison but are not optimized; both could be improved by further tuning
of the controller parameters. However, the conclusion that the closed-loop control with soft
sensor prediction feedback greatly outperforms the closed-loop control with measurement
feedback will not change under any fair comparison.
Figure 7.12: Comparison of closed-loop response for CL-1 (Kappa measurement feedback)
and CL-2 (soft sensor prediction feedback).
7.5 Conclusion and Discussion
In this chapter, the previously developed RO-DPLS soft sensor is extended to its online
adaptive version. Four different model update schemes are examined, namely RPLS, BR-
PLS, MW-BRPLS, and FF-BRPLS, together with their corresponding data scaling methods.
Extensive experiments were performed with one simulated case study and two industrial case
studies to investigate the properties of different model update schemes.
Results show that for the processes that experience constant changes and various distur-
bances such as the digester process, recursive RO-DPLS with RPLS model update scheme
generally performs the best with frequent update. This is expected because for processes that
are constantly changing, frequent update provides better tracking of the process changes.
Also, for frequent model update, it is preferred to use the new process data directly to update
the soft sensor model (i.e., the RPLS scheme), instead of using the sub-PLS model extracted
from the new data as in the BRPLS scheme. This is because the sub-PLS model built based
on the new process data with limited sample size may lose important process information
if it is not in the principal subspace defined by the directions of dominant process variation
143
during that short period. In addition, although the cross-validation method is usually rec-
ommended to determine the number of components for model prediction, it is found that
when the soft sensor model is updated frequently, it would be better to fix the number of
components for prediction. This is because when the number of samples for model update
is small, the results from the cross-validation method may be overly sensitive to process
disturbances and result in dramatic changes in the number of selected components, which
may negatively affect prediction performance.
The effectiveness of the recursive RO-DPLS soft sensor with RPLS update scheme is demon-
strated through a digester closed-loop control case study where a PID controller is used
to reject a major process disturbance, wood type change, and other common process dis-
turbances. By comparing the closed-loop performance between measurement feedback and
soft sensor prediction feedback, it is shown that the closed-loop control performance can be
significantly improved if the soft sensor prediction is fed back to the PID controller. The
improvement is mainly due to the reduced overall process delay as well as more frequent
feedback enabled by the reduced order model and the high frequency Kappa number predic-
tions.
Finally, it is worth noting that outlier detection plays a critical role in PLS-based soft sensors,
especially recursive soft sensors that are updated online. In this chapter, the industrial
data are pre-processed to remove all outliers before applying the soft sensors, as the focus
is to investigate the different recursive update schemes and data scaling methods. However,
for industrial application, online outlier detection is an imperative component for recursive
data-driven soft sensor development and will be addressed in the next chapter.
Chapter 8
Bayesian Supervision of Outliers for a Recursive Soft Sensor Update
8.1 Introduction
In many industrial processes such as distillation columns and pulping digesters, the
primary product variables that are required for feedback control are either not measured
online or not measured frequently. To address this challenge, many data-driven soft sensors
have been developed and implemented in the process industry (see comprehensive reviews in
[140], [49] and references cited therein). Various adaptation techniques have been published
to update data-driven soft sensors online, and Kadlec et al. [139] provide a comprehensive
review on the adaptation mechanisms for data-driven soft sensors. In Chapter 6, a reduced-
order dynamic PLS (RO-DPLS) soft sensor was developed to address some limitations of
the traditional DPLS soft sensor when applied to processes with large transport delays [63].
By taking the process characteristics into account, RO-DPLS soft sensor can significantly
reduce the number of regressor variables and improve prediction performance. More re-
cently, the RO-DPLS soft sensor was extended to its online adaptation version in order to
track process changes [67], as described in Chapter 7. Since the focus in [67] was to in-
vestigate the properties of different recursive updating schemes and data scaling methods,
the industrial datasets were preprocessed to remove all outliers before subjecting them to
different experiments. However, it should be noted that the PLS algorithms are sensitive
to outliers in the dataset [186]. Therefore, outlier detection and handling plays a critical
role in the development of the PLS based soft sensors, and there exist extensive studies on
outlier detection for off-line model building [187]. Typical approaches are based on statistics
of historical data such as the 3σ outlier detection method [188] and the more robust Ham-
pel identifier method [189]. In addition, multivariate outlier detection methods have also
been proposed, such as the PCA-based Jolliffe parameter [190]. Although many methods
have been published, outlier detection remains a challenging problem. Therefore, it is often
recommended to accompany any outlier detection method with a graphical inspection of
the residual space and model parameters to eliminate any possible outlier masking effect
(i.e., outliers are classified as consistent samples) and outlier swamping effect (i.e., consis-
tent samples are classified as outliers) [191]. For online adaptation of soft sensor models,
if erroneous readings are used to update the soft sensor model, future predictions from the
updated model may deteriorate significantly. In addition, online outlier detection is even
more challenging because, while outliers could be erroneous readings, they could also be nor-
mal samples of new process states. It is worth mentioning that an alternative approach to
reduce the effect of outliers on soft sensor performance is to use the robust versions of PLS
algorithms [186, 192–195]. However, the robust versions of PLS algorithms usually involve
much more complicated computation and may have various limitations, such as being
applicable only to a one-dimensional response variable, not being resistant to leverage points,
or not being applicable to high-dimensional regressors. More importantly, unlike the
conventional PLS algorithm, which has various recursive versions available for online
updating, there are currently no recursive versions of the robust PLS algorithms. Therefore,
they are not desirable for online model updating, and in this work only the conventional
PLS algorithm is considered. In this
chapter, multivariate monitoring methods are developed for both off-line and online outlier
detection. In addition, to differentiate the samples that represent a process change from
those of erroneous readings, a Bayesian supervisory approach to further analyze and classify
the detected outliers is proposed [71, 72].
8.2 Outlier Detection Methods
In this section, the proposed outlier detection methods for off-line and online detection
are described in detail.
8.2.1 Off-line Outlier Detection for Initial Model Building
A combination of automatic outlier detection, via leverage and y-studentized residuals,
together with a visual inspection of the behavior of the different variables (e.g. zero-order
hold and evident outliers) is proposed to detect outliers off-line. Leverage of an observation
is a concept developed in the ordinary regression theory [196], which defines the influence
that a given sample will have on a model and is related to the Hotelling's T2 statistic [191].
The leverage in terms of the T scores [197] is computed as follows:
h_i = 1/n + Σ_{j=1}^{a} t_{i,j}^2 / (t_j^T t_j)    (8.1)

where n is the total number of samples, t_{i,j} is the score of sample i on component j, t_j is
the vector of scores on component j, and a is the number of principal components retained.
A sample is classified as an outlier if its leverage h_i > c(1+a)/n, where c is a
constant (usually 2 or 3). In this work, c = 3 is chosen as this
setting provides a balanced performance between desired sensitivity and specificity. For a
given sample, the studentized residual is an indication of the lack of fit of the y-value, which
is defined as follows [197]:

r_i = f_i / [s(1 − h_i)^{1/2}]    (8.2)

where f_i = y_i − ŷ_i is the residual of the dependent variable, ŷ_i is the ith prediction of the
dependent variable provided by the soft sensor model, and s^2 = (1/(n−a−1)) Σ_{i=1}^{n} f_i^2
is the sample estimate of the residual variance. To detect outliers, the studentized residuals
are usually compared to a normal distribution N(0,1) [191]. In this work, if |r_i| > 3, the
sample is classified as an outlier.
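The two off-line detection rules above can be sketched as follows. This is a minimal sketch assuming NumPy; offline_outliers is a hypothetical helper, and the score matrix T is assumed to come from an already-fitted PLS/PCA model.

```python
import numpy as np

def offline_outliers(T, f, c=3.0):
    """Flag off-line outliers using leverage (Eq. 8.1) and
    y-studentized residuals (Eq. 8.2).

    T : (n, a) matrix of retained scores for the n training samples
    f : (n,) residuals f_i = y_i - yhat_i of the dependent variable
    c : leverage cutoff constant (2 or 3; this chapter uses 3)
    """
    n, a = T.shape
    # Leverage: h_i = 1/n + sum_j t_ij^2 / (t_j' t_j)
    h = 1.0 / n + np.sum(T**2 / np.sum(T**2, axis=0), axis=1)
    # Residual variance estimate with n - a - 1 degrees of freedom
    s = np.sqrt(np.sum(f**2) / (n - a - 1))
    # Studentized residuals, compared against N(0, 1)
    r = f / (s * np.sqrt(1.0 - h))
    # A sample is an outlier if either rule fires
    return (h > c * (1 + a) / n) | (np.abs(r) > 3.0)
```

In practice this automatic screen would be combined with the visual inspection described above before samples are discarded.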
8.2.2 Online Outlier Detection for Recursive Model Update
For online outlier detection, the SPEx and SPEy (squared prediction error for X and
Y) indices [6, 8] are used to monitor the independent variable and dependent variable space,
respectively. It is worth noting that Hotelling's T2 index is not used to detect outliers,
although it is commonly used in conjunction with SPE index for fault detection. Our choice
of SPE over T2 is mainly due to their different roles in process monitoring. It has been
suggested that for the samples whose T2 indices exceed the threshold but not the SPE
indices, in many cases they correspond to process operation changes instead of outliers [8].
Our own experiences also confirm that when a real outlier occurs, it is usually identified by
both the T2 index and SPE index. Therefore, in this chapter, the SPE index is used alone
to detect outliers in the independent and dependent variable space. Specifically, if the SPE
index (i.e., SPEx or SPEy) violates its corresponding control limit, the sample is declared
an outlier and should be analyzed further. SPEx and SPEy for a new sample to be used for
soft sensor update are defined as:
SPE_x = Σ_{i=1}^{m} (x_{new,i} − x̂_{new,i})^2    (8.3)

SPE_y = Σ_{i=1}^{p} (y_{new,i} − ŷ_{new,i})^2    (8.4)
where m is the number of independent variables, p the number of dependent variables,
x_{new,i} and y_{new,i} are the new samples of independent and dependent variables, and x̂_{new,i}
and ŷ_{new,i} are the corresponding predictions. The thresholds for SPEx and SPEy can be
determined based on the theorems developed by Box [198] using the training data, or they
can be determined empirically using the training or validating data under normal operating
conditions [123, 124].
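The online detection logic above can be sketched as follows. This is a minimal sketch under stated assumptions: spe, empirical_limit, and is_outlier are hypothetical helpers, and the empirical threshold is taken as a high quantile of SPE values under normal operation, one of the options mentioned above.

```python
import numpy as np

def spe(actual, predicted):
    """Squared prediction error of one sample: the sum of squared
    residuals over its variables (Eqs. 8.3 and 8.4)."""
    resid = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sum(resid**2))

def empirical_limit(spe_history, alpha=0.01):
    """Empirical control limit: the (1 - alpha) quantile of SPE values
    observed on training/validation data under normal operation."""
    return float(np.quantile(spe_history, 1.0 - alpha))

def is_outlier(spe_x, spe_y, limit_x, limit_y):
    """A new sample is declared an outlier if either SPE index
    violates its control limit."""
    return spe_x > limit_x or spe_y > limit_y
```

A sample flagged by is_outlier is not discarded immediately; it is passed to the Bayesian supervisory analysis described in the next section.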
8.3 Bayesian Supervisory Approach for Online Outlier Classification
For online outlier detection, once a new sample is identified as an outlier, it is necessary
to determine whether it corresponds to an erroneous reading, or it represents a new process
state. In this work, a Bayesian supervisory approach is proposed to perform this task. It
should be noted that this task is less difficult for off-line data, because the data collected after
the outlier(s) are available to help make the decision. In the proposed Bayesian supervisory
approach, once an outlier is detected by the SPE indices, we wait for a few more measurements
to become available and apply Bayesian inference to make the decision. The basic assumption
is that if an outlier is due to erroneous readings, the increase in the SPE indices will not
be sustainable and will result in an impulse or short step disturbance in the time series of
SPE indices. On the other hand, if an outlier is caused by a process change, the following
samples will all deviate from the previous model and will result in a sustained step or ramp
disturbance in the time series of SPE indices. Therefore, when an outlier is identified,
we attempt to classify the change in the SPE index as either an impulse/short
step or a ramp/step disturbance, in order to determine whether the outlier is due to an
erroneous reading or a process change. In this work, the classification is achieved through a
Bayesian approach. By definition, Bayes' Theorem is a simple mathematical formula used for
calculating conditional probabilities. Simply put, it gives the probability of a random event
A occurring given that we know a related event B occurred. This probability is denoted as
P(A|B) and is called the posterior probability, since it is computed after all other information
on A and B is known. Using Bayes' Theorem, the posterior probability can be computed as
P(A|B) = P(B|A)P(A)/P(B)    (8.5)
In our case, event B corresponds to the measurements collected after the outlier is detected
and event A corresponds to a specific disturbance type. In the proposed approach, instead of
using the values of the SPE indices directly, which are often affected by the stochastic nature
of the process, the index values are transformed into a more robust statistic-based probability
description using Bayesian statistics. In this way, different processes with different dynamic
characteristics can be analyzed using a unified statistical framework.
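As a minimal numeric illustration of Eq. 8.5 for a binary hypothesis, the posterior can be computed from the two conditional likelihoods by expanding P(B) with the law of total probability. The posterior helper and the likelihood values below are hypothetical; in the actual procedure the likelihoods come from the assumed disturbance models.

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Bayes' Theorem (Eq. 8.5) for a binary hypothesis A
    (e.g. 'the disturbance is an impulse') given evidence B
    (the SPE values observed after the detected outlier).
    P(B) is expanded by total probability over A and not-A."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b

# With the non-informative prior of 0.5 used in this chapter, the
# posterior is driven entirely by the likelihood ratio:
p = posterior(0.8, 0.5, 0.2)  # -> 0.8
```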
8.3.1 Bayesian Outlier Classification Procedure
Figure 8.1 shows the schematic diagram of the proposed Bayesian outlier classification
procedure, which is a modification of the previously developed Bayesian approach for detec-
tion and classification of different disturbances in the semiconductor processes [199]. The
classification algorithm is triggered by the identification of an outlier through SPE indices,
and a brief description is provided below.
1. Denote the time index of the outlier as k; construct the pre- and post-change windows
around the outlier k. The pre-change window contains the SPE indices of a few samples
prior to the identified outlier, while the post-change window contains the SPE indices
after (and including) the outlier. In this work, the width of the pre-change window is
fixed to 5 samples, while the width of the post-change window varies depending on the
assumed type of disturbance.
2. Wait until sample k+1 is available, then perform hypothesis testing using Bayes'
Theorem to determine whether the disturbance is an impulse.
3. If the hypothesis is rejected, we wait for more future samples to determine whether
the sample is part of a short step disturbance (with duration 2, 3 or 4).
4. If all previous hypotheses are rejected, it is concluded that a real process change has
occurred.
It is worth noting that the posterior probabilities in the post-change window form different
patterns depending on the pre-assumed disturbance type and observation values in the post-
change window. Instead of putting a threshold on a single posterior probability, which is
a univariate method [200], the proposed Bayesian approach compares the pattern of the
posterior probabilities in the post-change window to the predefined patterns in order to
classify the type of disturbance. This pattern matching approach is a multivariate method,
which is more robust compared to the univariate method, and greatly improves classification
Figure 8.1: Bayesian disturbance classification procedure
accuracy and reduces classification delay. In addition, it makes the classification results
insensitive to the a priori probability (in this work, the a priori probability is set to 0.5).
A detailed description of the different patterns of posterior probability in the post-change
window and the pattern matching procedure can be found in [199]. In addition, by specifying different
post-change window widths for different disturbances, the classification delay is minimized.
In other words, the classification decision is made when the minimum required information
becomes available.
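The decision logic of the four steps above can be sketched as follows. This is a deliberately simplified stand-in: instead of matching posterior-probability patterns, it merely counts consecutive control-limit violations, and classify_outlier, max_short_step, and the inputs are all hypothetical.

```python
def classify_outlier(spe_series, k, limit, max_short_step=4):
    """Simplified stand-in for the Bayesian classification procedure:
    count how many consecutive samples, starting at the outlier index k,
    keep the SPE index above its control limit.

    An impulse (duration 1) or short step (duration 2-4) is treated as
    an erroneous reading; a longer sustained violation (step or ramp)
    is treated as a real process change.
    """
    duration = 0
    for value in spe_series[k:k + max_short_step + 1]:
        if value > limit:
            duration += 1
        else:
            break
    return 'erroneous' if duration <= max_short_step else 'process_change'
```

The actual procedure reaches the same kind of decision, but by comparing posterior-probability patterns in the post-change window, which is more robust to process noise than raw threshold counting.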
8.4 Soft Sensor Recursive Update with Outlier Detection and Classification
In this section, the proposed outlier detection and classification methods are integrated
into the previously developed RO-DPLS soft sensor [63, 67] for online recursive update.
Based on the results of a comprehensive comparison of different recursive PLS update
schemes [67], regular recursive PLS updating scheme is chosen to update the RO-DPLS
soft sensor model. Both simulated case study and industrial case studies of the single vessel
Kamyr digester are used to demonstrate the performance of the proposed approach. In this
section, the performance of the RO-DPLS soft sensor is compared for four scenarios.
a) Static model without update;
b) Recursive update without outlier detection, i.e., all samples are used to update the
model;
c) Recursive update with outlier detection, i.e., all outliers identified by SPE indices are
excluded from model update;
d) Recursive update with Bayesian supervisory approach for outlier detection, i.e., erro-
neous readings are excluded from model update, while process changes are used for
model update.
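The gating logic that distinguishes scenarios (b), (c), and (d) can be sketched as follows. should_update is a hypothetical helper, and classify_fn stands in for the Bayesian classifier of Section 8.3; only scenario (d) consults it.

```python
def should_update(spe_x, spe_y, limit_x, limit_y, classify_fn):
    """Scenario (d): gate a new sample before the recursive model update.
    Normal samples always update the model; samples flagged by the SPE
    indices are used only if the classifier labels them a process change
    rather than an erroneous reading."""
    if spe_x <= limit_x and spe_y <= limit_y:
        return True   # normal sample: always used for update
    return classify_fn() == 'process_change'
```

Scenario (b) corresponds to always returning True, and scenario (c) to rejecting every flagged sample regardless of its cause.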
Simulated case study
In this subsection, the simulated Kamyr digester is used to illustrate how the Bayesian
supervisory approach works. The extended Purdue model [146] is implemented to simulate
a single vessel high yield Kamyr digester. The RO-DPLS soft sensor setup can be found
in Chapter 6. In this case study, a very challenging problem is considered: tracking the
disturbance of a wood type change. It is worth noting that wood type change (from softwood
to hardwood and vice versa) is a major disturbance in pulping processes, and usually results
in off-spec product during the transition. Both single and consecutive multiple outliers (i.e.
impulses and short steps) are added to the process, and the Kappa number measurements are
plotted in Figure 8.2. From Figure 8.2, it is seen that the dramatic change in Kappa number
during the transition period is due to the wood type change, and the samples during the
transition should be used to update the model; while the changes that occur at samples 10
(impulse), 45 (short step with duration 3) and 65 (short step with duration 2), are introduced
outliers and should not be used for model update. The soft sensor predicted Kappa number
values are also plotted in Figure 8.2, with the corresponding mean squared prediction error
(MSE) and mean prediction error (ME) given in Table 8.1. Both Figure 8.2 and Table 8.1
demonstrate the important roles of outlier detection and classification, and their impact
on soft sensor performance. It is shown that outlier detection alone may even deteriorate
the performance of a soft sensor if process changes were treated the same as erroneous
measurements. On the other hand, if the proposed Bayesian outlier classification mechanism
is integrated into outlier detection, the soft sensor can be made more robust to erroneous
measurements and at the same time is able to track process changes. Figures 8.3(a)
Table 8.1: Performance of different soft sensors for the simulated case study
Soft Sensor MSE ME
(a) 405.4800 -10.8817
(b) 1.1476 -0.1695
(c) 120.6689 -3.7614
(d) 0.8505 -0.1526
Figure 8.2: Prediction comparison of different approaches applied to a simulated case study
and 8.3(b) show the SPEx and SPEy indices for the Bayesian supervisory approach, together
with the classified outliers. Figure 8.3 shows that SPEx and SPEy indices can promptly
identify the outliers caused by both erroneous reading and process change, and the Bayesian
supervisory approach is effective in classifying the identified outliers. Without the Bayesian
supervisory approach, all identified outliers will be excluded from updating the model, which
results in poor prediction performance of approach (c), similar to the static model. For
approach (b) where all new samples are used for model update, the negative impact of using
erroneous readings for model update is illustrated more clearly by the inset in Figure 8.2.
Figure 8.3: SPE indices for the simulated case study. (a) SPEx; (b) SPEy
Industrial case study
In the industrial case study, the process data were collected from a Kamyr digester at a
pulp mill located in Mahrt, Alabama, run by MeadWestvaco Corp. The training data were
collected in 2006 which contain 1100 samples, while the testing data for online update were
collected in 2010 which contain 300 samples. Clearly, this case study presents a challenging
problem as training and testing data sets were collected about 4 years apart. The soft sensor
setup for the industrial case is the same as that reported in Chapter 6. Figure 8.4 plots a
Figure 8.4: Comparison of predictions of different approaches for the industrial case study
segment of the testing data to illustrate the prediction performance of different soft sensors
and compare them with the process measurements, and Table 8.2 lists the MSE and ME
of different approaches for the whole testing data set. Figures 8.5(a) and 8.5(b) show the
SPEx and SPEy indices for the Bayesian supervisory approach, together with the classified
outliers. It should be noted that for this case study the soft sensor that updates recursively
with outlier detection (but without classification) performs exactly the same as the static
soft sensor. This is due to the large difference between the training data and the testing
data, which causes all new data to be flagged as outliers. From this case study, it is clear that
Table 8.2: Performance of different soft sensors for the industrial case study
Soft Sensor MSE ME
(a) 75.3031 7.3024
(b) 34.8632 1.0535
(c) 75.3031 7.3024
(d) 29.4228 0.7468
Figure 8.5: SPE indices for the industrial case study. (a) SPEx; (b) SPEy
the Bayesian supervisory approach is effective and robust in determining whether an outlier
is caused by erroneous reading or process change.
8.5 Conclusion and Discussion
Outlier detection and handling plays a critical role in data-driven soft sensor develop-
ment. In this chapter, multivariate approaches for both off-line outlier detection (for initial
soft sensor model building) and online outlier detection (for soft sensor model recursive
update) are proposed. Specifically, for off-line outlier detection, leverage and y-studentized
residuals are combined, while for online outlier detection, the squared prediction error indices
for X and Y are used to monitor the independent variable and dependent variable space,
respectively. For online outlier detection, to differentiate the outliers caused by erroneous
reading from those caused by process changes, a Bayesian supervisory approach is proposed
to further analyze and classify the identified outliers. The classification of outliers is achieved
through matching the pattern generated by the posterior probabilities in the post-change
window to pre-defined patterns corresponding to different disturbances. Both simulated and
industrial case studies show that by updating the soft sensor model and correctly analyzing
the identified outliers, the soft sensor enhances its prediction performance qualitatively and
quantitatively.
Chapter 9
Conclusions and Proposed Future Work
9.1 Summary of Contributions
The overall goal of this work is to advance the state-of-the-art of industrial process
monitoring and soft sensor design techniques.
9.1.1 Process Monitoring
As industrial processes become more complex, advanced monitoring techniques play a
fundamental role in assessing the process performance. By promptly detecting process up-
sets and/or instrumentation faults, process monitoring techniques not only aid in improving
product quality and process efficiency, but also help ensure safe operation. In this work,
the potential for fault detection and diagnosis of a new state-of-the-art process monitoring
framework is studied. The statistics pattern analysis (SPA) framework, by utilizing the
statistics of the process variables instead of the process variables themselves, not only
improves the detection performance compared with traditional linear process monitoring
methods, but also outperforms nonlinear monitoring methods such as kernel PCA (KPCA)
and independent component analysis (ICA). Furthermore, due to the flexible nature of the
framework, different statistics can be incorporated into the statistics pattern (SP) to capture
dominant process characteristics such as nonlinearity, non-Gaussianity, and dynamics.
Additionally, in SPA, the diagnosis of the root cause of different faults is improved not only
by providing a clear indication of the contribution of the variable most related to the given
fault, but also by indicating the type of fault (e.g. mean or variance change).
In this work, the properties of the SPA-based monitoring of continuous processes are studied
using two different case studies. With the quadruple-tank case study, the SPA-based process
monitoring was shown to correctly detect and diagnose faults of a nonlinear process that
are difficult to detect visually or by traditional monitoring approaches. In addition,
this case study showed that SPA can offer excellent fault detection rates even when the
training data set contains limited samples. Naturally, because different statistics must be
calculated, when the training data are limited, window overlapping is necessary to obtain a
representative number of SPs. In the study, an increase in the false alarm rate is observed
as the percentage of overlap increases. This effect is caused by the lack of independence
of the samples (i.e., the SPs). The study showed that with an overlap of around
60%, SPA still achieves high detection performance and a reasonable false alarm rate. On
the other hand, when the number of training samples is large, SPA is able to detect
and diagnose faults in a more reliable manner: Type I errors in normal testing data (i.e.,
false alarms) are close to the expected confidence level, and both SPA-based fault
detection and diagnosis are significantly better than with traditional monitoring methods.
In the TEP case study, the SPA framework was tested to monitor a more complex process.
The good performance observed in the quadruple-tank case study is confirmed with the
more complicated TEP process. In this case study, the performance of SPA was compared
to other linear and nonlinear monitoring methods. By monitoring the statistics of the pro-
cess variables instead of the process variables themselves, SPA was able to outperform the
detection and diagnosis performance of the other monitoring methods. Furthermore, even
though detection delay is inherent to window-based approaches like SPA, this study found
that once the validating window is assembled, the detection delays of SPA
compared to the other methods are not significant. A closer look at the detection performance
indicates that SPA-based monitoring can offer a more consistent and reliable detection com-
pared to the other methods, which are important characteristics to consider in an online
implementation. Regarding computational load, the added complexity of SPA compared with
the two linear methods (i.e., PCA and DPCA) is not significant, and SPA is much less
expensive than the nonlinear methods evaluated in this study. For fault diagno-
sis, the contributions to the different monitoring indices in SPA show a clear and reliable
indication of the root cause of different faults. Additionally, the hierarchical contribution
approach provides the framework with extra information about the nature of the faults (e.g.
mean or variance change). In summary, the research findings indicate that the SPA-based
monitoring is a promising alternative for monitoring processes especially where nonlinearities
and multi-modal distributions are present.
9.1.2 Soft Sensor Development
In modern process operation, a massive amount of data from secondary measurements
(i.e., easy-to-measure variables) is readily available at high frequency, but data indicating
product quality and/or yield (i.e., hard-to-measure variables) are scarce and oftentimes
sampled at low frequency. As a result, developing data-driven software sensors (or soft
sensors) that use secondary measurements to predict primary measurements based on
multivariate regression techniques is gaining wide application in industry. The prediction of
these key process variables not only provides a way to have real time measurements, but also
aids in improving quality and preventing undesirable operating conditions. In this work, the
dynamic partial least squares (DPLS) soft sensor is studied in depth due to its successful
application in industry. Despite its successful applications, before this work, there was a
lack of theoretical understanding on the properties of the DPLS soft sensor. One of the
major contributions of this dissertation is the development of a theoretical understanding of
why a DPLS-based soft sensor can adequately capture process dynamics and provides unbi-
ased estimates under closed-loop operation. The theoretical analysis confirms that a DPLS
soft sensor is a dynamic estimator that gives unbiased estimate of primary measurements
for both open loop and closed-loop data. In addition, to overcome the difficulties associated
with processes with large transport delays, a reduced-order DPLS (RO-DPLS) soft sensor
approach is proposed.
Real industrial processes have time-varying behavior and soft sensors developed off-line are
prone to fail as time progresses, thus indicating that an updating strategy is necessary. To
address this issue, the RO-DPLS soft sensor is also extended to its adaptive version by
exploring the properties and the characteristics of different recursive update schemes. In
addition, practical issues such as data scaling and the presence of outliers off-line and online
are studied.
It is well known that introducing outliers at the design and update stage of any data-driven
approach will deteriorate its performance. Detecting outliers off-line is relatively easy
compared to detecting them online, largely because the complete data record and expert
knowledge are available to support the decision. For online outlier detection, resources are
limited, as expert input is not available to base the decision on. Furthermore, online outliers
fall into two groups: erroneous readings and process changes. Outliers that are part of a group
of erroneous readings need to be identified and discarded from the update procedure. On
the other hand, outliers that are part of a process change need to be kept
to update the soft sensor model. In this aspect, the contribution of this work is significant.
Based on multivariate techniques for online outlier detection (e.g. SPE), once an outlier is
detected, the monitoring indices are transformed into probabilities of the occurrence
of a given type of fault (e.g. impulse, short step, etc.). In this work, a Bayesian-based ap-
proach is developed to further analyze the detected outliers and the probabilities obtained
from the SPE indices. These probabilities are compared with patterns of probabilities cor-
responding to different type of events. For example, with the Bayesian-based supervisory
approach, SPE indices corresponding to samples that are part of impulses and short steps
can be categorized as outliers. On the other hand, if the pattern of probabilities defines
a sustained change (e.g. process change), those identified outliers are not discarded, and
the multivariate observations of these secondary measurements are used to update the soft
sensor model. In both, simulation and industrial cases, the recursive RO-DPLS soft sensor
with outlier detection and classification approach observed enhanced prediction capabilities
161
if compared with the static RO-DPLS and with recursive-only RO-DPLS. As a summary,
the RO-DPLS has been equipped with the necessary tools for handling challenging cases
where a prediction of a key indicator of production performance is required.
9.2 Suggestions for Future Work
Future research directions which deserve further investigation are summarized in this
section.
9.2.1 Process Monitoring
• As discussed in this work, SPA-based fault detection and diagnosis is a promising
alternative for monitoring industrial processes, especially when nonlinearities are prevalent.
One of the major requirements of the framework is the availability of enough data to obtain a
number of training SPs (i.e., observations) that is statistically significant compared to the
number of statistics included in the SP (i.e., variables). Nowadays, most processes can be
categorized as data rich, so this is not a severe limitation. Nevertheless, to address this
requirement, the following strategy is proposed for monitoring continuous processes only (for
batch processes the advantages provided by SPA are already far superior to those of other
methods). First, after collecting the process variables that will be part of the monitoring
strategy, a PCA decomposition is performed to obtain both scores and loadings representing the
process. Second, a subset of scores (defined by the retained number of principal components)
is used to characterize the process. In other words, although nonlinearities might be present
and are inherited by the PCA scores, redundancies and measurement noise are discarded with the
last few components, and only the first few scores are retained to represent the process.
Third, instead of calculating the statistics of the process variables to perform SPA, which
requires a good amount of data, the statistics of the first few retained scores are determined
to perform SPA in the latent variable space. At the time of the preparation of this document,
preliminary results indicate that although PCA fails to detect faults in some cases
(especially when nonlinearities are prevalent), the statistics of the projections (or scores)
contain enough information to correctly distinguish normal from faulty operating conditions,
thus enhancing the detection power of regular PCA. In addition, since the number of retained
scores a is normally much lower than the number of available variables m, far fewer
observations are needed to build the matrix of SPs (i.e., less data is required to calculate
the different statistics). One challenge associated with this approach is to develop a way to
diagnose the variable most related to a fault. In other words, since two PCA models are used
(one for reducing the dimensionality and one for SPA), a method is needed to back-calculate or
estimate the contributions of the input variables from the contributions of the different
statistics to a fault in the latent variable space.
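The proposed latent-variable SPA strategy can be outlined in a short sketch. The particular statistics computed here (mean, variance, skewness, and kurtosis of each retained score) and the use of non-overlapping windows are illustrative choices, not the definitive formulation:

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centered data onto the first n_components PCA loadings."""
    Xc = X - X.mean(axis=0)
    # SVD-based PCA: rows of Vt are the loading vectors
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def statistics_pattern(W):
    """One SP from a window of retained scores: per-score mean, variance,
    skewness, and kurtosis (an illustrative choice of statistics)."""
    mu = W.mean(axis=0)
    sd = W.std(axis=0)
    z = (W - mu) / sd
    return np.concatenate([mu, sd ** 2, (z ** 3).mean(axis=0),
                           (z ** 4).mean(axis=0)])

def latent_space_sps(X, n_components=3, window=50):
    """Slice the score trajectories into non-overlapping windows and
    compute one SP per window, giving the training SP matrix."""
    T = pca_scores(X, n_components)
    return np.array([statistics_pattern(T[i:i + window])
                     for i in range(0, len(T) - window + 1, window)])
```

Note how the SP dimension scales with the number of retained scores a rather than with the number of measured variables m, which is the data-efficiency argument made above.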
• A more thorough study of different properties of SPA needs to be performed. In this work,
the effect of adding/removing different statistics, the effect of window overlap, and fault
diagnosis issues were addressed. More insight is needed to generate rules of thumb that
define, based on the known characteristics of the process, which groups of statistics should
be included as part of the SP; this is important from the point of view of online
implementation. An additional factor is the study of assigning weights to different groups of
statistics. Preliminary results indicate that not only does defining the right set of
statistics enhance both detection and diagnosis performance, but the weights of the different
statistics can also contribute to achieving these two goals.
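Assigning weights to groups of statistics amounts to a simple rescaling of the SP matrix before model building; a minimal sketch, where the group boundaries and weight values are hypothetical tuning knobs:

```python
import numpy as np

def weight_statistics(SP, group_slices, group_weights):
    """Scale groups of statistics in each SP before model building.

    SP            : (n_windows, n_statistics) statistics-pattern matrix
    group_slices  : list of slices selecting each statistic group
    group_weights : one scalar weight per group (assumed tuning knobs)
    """
    W = SP.astype(float).copy()
    for sl, w in zip(group_slices, group_weights):
        W[:, sl] *= w
    return W
```

A weight above one emphasizes a group (e.g., higher-order statistics) in the subsequent monitoring model, while a weight below one de-emphasizes it.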
• In this work, the SPA framework has been shown to be a promising alternative for monitoring
industrial processes, especially where nonlinearities and multi-modal distributions are
present. Additionally, the framework has shown good results even in the event of window
overlapping, although, depending on the amount of overlap, an increase in the false alarm rate
is observed. Closer evaluation of these results raises one question: is overlapping the only
source of the increased false alarm rate? In this regard, our experience indicates that other
factors are involved and can be studied to overcome this issue. First, it has been observed
that although cross-validation is a reasonable way to determine the number of principal
components used to represent the input space, it can yield a number of PCs that improves fault
detection but does not offer a balanced reduction of Type I and Type II errors. More testing
needs to be done to evaluate this effect. In addition, for the sake of a fair comparison, the
empirical way of determining the SPs is utilized in this work. However, it is possible that a
method for determining the monitoring thresholds that takes the amount of overlap into account
could help balance fault detection performance against the false alarm rates on normal testing
and validation data.
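The empirical threshold mentioned above can be written as a quantile of the monitoring index on normal data, and the false alarm rate on validation data can then be checked directly against the significance level. A minimal sketch; the quantile form is an assumption about how the empirical limit is computed:

```python
import numpy as np

def empirical_limit(index_values, alpha=0.01):
    """Empirical control limit: the (1 - alpha) quantile of a monitoring
    index evaluated on normal training/validation data."""
    return np.quantile(index_values, 1.0 - alpha)

def false_alarm_rate(index_values, limit):
    """Fraction of normal samples flagged; ideally close to alpha."""
    return float(np.mean(np.asarray(index_values) > limit))
```

Comparing `false_alarm_rate` at several degrees of window overlap would indicate whether the limit needs an overlap-dependent correction.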
9.2.2 Soft Sensor Development
• Like PCA, the SPA framework can be extended to the development of soft sensors. In other
words, the statistics of secondary measurements in a batch or a window can be utilized to
predict key product quality variables. In this aspect, preliminary results show that SPA-based
soft sensors can effectively outperform the prediction power offered by soft sensors based on
PCA and PLS [201]. A preliminary study with data from a semiconductor fabrication site has
already shown promising results.
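The idea of using window statistics of secondary measurements as soft-sensor inputs can be sketched as follows. Ordinary least squares stands in for PLS here, and the mean/standard-deviation feature set is an illustrative subset of a full SP:

```python
import numpy as np

def sp_features(X, window):
    """Statistics of the secondary measurements in each non-overlapping
    window (mean and standard deviation, for brevity) used as inputs."""
    feats = []
    for i in range(0, len(X) - window + 1, window):
        W = X[i:i + window]
        feats.append(np.concatenate([W.mean(axis=0), W.std(axis=0)]))
    return np.array(feats)

def fit_soft_sensor(F, y):
    """Least-squares fit (standing in for PLS) on the SP features,
    with an intercept column appended."""
    A = np.column_stack([F, np.ones(len(F))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(F, coef):
    return np.column_stack([F, np.ones(len(F))]) @ coef
```

In a real application the per-window quality measurement (e.g., a lab value) would play the role of `y`, and PLS would replace the plain least-squares step to cope with collinear features.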
• As discussed previously, the recursive RO-DPLS soft sensor with the outlier detection and
classification approach has shown impressive results in tracking the variability of the kappa
number for the digester case study. In this regard, additional testing needs to be done to
extend the outlier detection and classification approach to handle updating of the monitoring
thresholds instead of using a static threshold. Preliminary results in this matter indicate
that updating the monitoring threshold can cope with the SPE indices transitioning to a
different operating region while still providing consistent outlier detection close to the
given significance level.
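One possible form of the threshold update suggested above is a recursive (EWMA-style) estimate of the SPE index's mean and variance, updated only with samples inside the current limit; the normal-approximation limit and the forgetting factor are assumptions of this sketch, not the formulation used in this work:

```python
class AdaptiveSPELimit:
    """Recursive SPE control limit from an EWMA of the index's mean and
    variance, with a normal-approximation limit (assumed form)."""

    def __init__(self, mean0, var0, lam=0.02, z=3.0):
        self.mean, self.var = mean0, var0
        self.lam, self.z = lam, z

    @property
    def limit(self):
        return self.mean + self.z * self.var ** 0.5

    def update(self, spe):
        # Update only with samples inside the current limit so that
        # detected outliers do not inflate the threshold.
        if spe <= self.limit:
            d = spe - self.mean
            self.mean += self.lam * d
            self.var = (1 - self.lam) * (self.var + self.lam * d * d)
        return self.limit
```

As the process drifts to a new operating region, in-limit samples gradually pull the mean (and hence the limit) along, while isolated spikes above the limit leave it unchanged.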
• Although some effort was made in this work toward using the prediction provided by the
RO-DPLS soft sensor for control, an extended comparison of the associated benefits can still
be carried out. In other words, different control strategies, such as a Smith predictor with
PID control and model predictive control (MPC), can be implemented to further highlight the
benefits of the future predictions provided by the RO-DPLS soft sensor.
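As a starting point for such a comparison, a discrete Smith predictor with PI control of a first-order plant with input delay can be simulated in a few lines; the plant parameters and controller tuning below are arbitrary illustrative values, not taken from the digester study:

```python
def simulate_smith_pi(a=0.9, b=0.1, d=5, Kp=2.0, Ki=0.2, sp=1.0, n=200):
    """Discrete Smith predictor + PI control of a delayed first-order
    plant: y[k+1] = a*y[k] + b*u[k-d]. Illustrative values only."""
    y = 0.0                   # plant output
    ym = 0.0                  # delay-free internal model output
    buf_u = [0.0] * d         # input delay line (plant side)
    buf_ym = [0.0] * d        # delayed model output
    integ = 0.0               # PI integrator state
    ys = []
    for _ in range(n):
        # Smith feedback: delay-free model plus model-mismatch correction
        fb = ym + (y - buf_ym[0])
        e = sp - fb
        integ += Ki * e
        u = Kp * e + integ
        # advance the plant (delayed input) and the delay-free model
        y = a * y + b * buf_u[0]
        ym_next = a * ym + b * u
        buf_u = buf_u[1:] + [u]
        buf_ym = buf_ym[1:] + [ym]
        ym = ym_next
        ys.append(y)
    return ys
```

With a perfect internal model the mismatch term cancels, the controller effectively sees the delay-free plant, and the output settles at the setpoint despite the dead time, which is the property that a soft-sensor-based prediction is meant to exploit.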
Bibliography
[1] Walter A. Shewhart. Economic Control of Quality of Manufactured Product. Van Nostrand, Princeton, 1931.
[2] J. Edward Jackson. Principal components and factor analysis: Part I - principal components. Journal of Quality Technology, 12:201–213, 1980.
[3] Douglas M. Hawkins and David H. Olwell. Cumulative Sum Charts and Charting for Quality Improvement. Springer, 1998.
[4] J. Stuart Hunter. The exponentially weighted moving average. Journal of Quality Technology, 18:203–210, 1986.
[5] John F. MacGregor, Christiane Jaeckle, Costas Kiparissides, and M. Koutoudi. Process monitoring and diagnosis by multiblock PLS methods. AIChE Journal, 40(5):826–838, 1994.
[6] J. F. MacGregor and T. Kourti. Statistical process control of multivariate processes. Control Engineering Practice, 3(3):403–414, 1995.
[7] Q. Peter He, S. Joe Qin, and Jin Wang. A new fault diagnosis method using fault directions in Fisher discriminant analysis. AIChE Journal, 51(2):555–571, 2005.
[8] S. Joe Qin. Statistical process monitoring: basics and beyond. Journal of Chemometrics, 17(8-9):480–502, 2003.
[9] Theodora Kourti, Paul Nomikos, and John F. MacGregor. Analysis, monitoring and fault diagnosis of batch processes using multiblock and multiway PLS. Journal of Process Control, 5(4):277–284, 1995.
[10] Theodora Kourti, Jennifer Lee, and John F. MacGregor. Experiences with industrial applications of projection methods for multivariate statistical process control. Computers and Chemical Engineering, 20(Supplement 1):S745–S750, 1996.
[11] Changkyu Lee, Sang Wook Choi, and In-Beum Lee. Sensor fault identification based on time-lagged PCA in dynamic processes. Chemometrics and Intelligent Laboratory Systems, 70(2):165–178, 2004.
[12] W.E. Deming. Some Theory of Sampling. John Wiley, New York, 1950.
[13] Karl Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2:559–572, 1901.
[14] J. Edward Jackson. Quality control methods for several related variables. Technometrics, 1(4):359–377, 1959.
[15] J. Edward Jackson and Govind S. Mudholkar. Control procedures for residuals associated with principal component analysis. Technometrics, 21(3):341–349, 1979.
[16] Barry M. Wise and Neal B. Gallagher. The process chemometrics approach to process monitoring and fault detection. Journal of Process Control, 6(6):329–348, 1996.
[17] J. Edward Jackson. A User's Guide to Principal Components. John Wiley and Sons, Inc., 1991.
[18] Sergio Valle, Weihua Li, and S. Joe Qin. Selection of the number of principal components: The variance of the reconstruction error criterion with a comparison to other methods. Industrial and Engineering Chemistry Research, 38(11):4389–4401, 1999.
[19] Paul Geladi and Bruce R. Kowalski. Partial least-squares regression: A tutorial. Analytica Chimica Acta, 185:1–17, 1986.
[20] P. Geladi. Notes on the history and nature of partial least squares (PLS) modelling. Journal of Chemometrics, 2(4):231–246, 1988.
[21] Bhupinder S. Dayal and John F. MacGregor. Improved PLS algorithms. Journal of Chemometrics, 11(1):73–85, 1997.
[22] Nouna Kettaneh, Anders Berglund, and Svante Wold. PCA and PLS with very large data sets. Computational Statistics and Data Analysis, 48(1):69–85, 2005.
[23] Philip R. C. Nelson, Paul A. Taylor, and John F. MacGregor. Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory Systems, 35(1):45–65, 1996.
[24] Svante Wold, Michael Sjöström, and Lennart Eriksson. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2):109–130, 2001.
[25] Agnar Höskuldsson. PLS regression methods. Journal of Chemometrics, 2(3):211–228, 1988.
[26] J. V. Kresta, J. F. MacGregor, and T. E. Marlin. Multivariate statistical monitoring of process operating performance. Canadian Journal of Chemical Engineering, 69:35–47, 1991.
[27] B.M. Wise, D.J. Veltkamp, B. Davis, N.L. Ricker, and B.R. Kowalski. Principal component analysis for monitoring the West Valley liquid fed ceramic melter. Waste Management '88 Proceedings, pages 811–818, 1988.
[28] Paul Nomikos and John F. MacGregor. Monitoring batch processes using multiway principal component analysis. AIChE Journal, 40(8):1361–1375, 1994.
[29] Wenfu Ku, Robert H. Storer, and Christos Georgakis. Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 30(1):179–196, 1995.
[30] Y. Rotem, A. Wachs, and D. R. Lewin. Ethylene compressor monitoring using model-based PCA. AIChE Journal, 46(9):1825–1836, 2000.
[31] J.J. Downs and J.E. Doss. Present status and future needs - a view from North American industry. In Y. Arkun and W.H. Ray, editors, AIChE Symposium Proceedings of the Fourth International Conference of Chemical Process Control, volume P-67, Padre Island, TX, 1991. AIChE publication.
[32] Jin Wang and Q. Peter He. Multivariate statistical process monitoring based on statistics pattern analysis. Industrial and Engineering Chemistry Research, 49(17):7858–7869, 2010.
[33] Jong-Min Lee, S. Joe Qin, and In-Beum Lee. Fault detection and diagnosis based on modified independent component analysis. AIChE Journal, 52(10):3501–3514, 2006.
[34] Paige Miller, Ronald E. Swanson, and Charles E. Heckler. Contribution plots: A missing link in multivariate quality control. Applied Mathematics and Computer Science, 8:755–792, 1988.
[35] Theodora Kourti. Application of latent variable methods to process control and multivariate statistical process control in industry. International Journal of Adaptive Control and Signal Processing, 19(4):213–246, 2005.
[36] A. K. Conlin, E. B. Martin, and A. J. Morris. Confidence limits for contribution plots. Journal of Chemometrics, 14(5-6):725–736, 2000.
[37] D.G. Luenberger. Observing the state of a linear system. IEEE Transactions on Military Electronics, MIL-8:74–80, 1964.
[38] R.E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82:35–45, 1960.
[39] Eliana Zamprogna, Massimiliano Barolo, and Dale E. Seborg. Optimal selection of soft sensor inputs for batch distillation columns using principal component analysis. Journal of Process Control, 15(1):39–52, 2005.
[40] N.R. Draper and H. Smith. Applied Regression Analysis. Wiley, New York, 1981.
[41] T. Kourti. 4.02 - Multivariate statistical process control and process control, using latent variables. In Stephen D. Brown, Romà Tauler, and Beata Walczak, editors, Comprehensive Chemometrics, pages 21–54. Elsevier, Oxford, 2009.
[42] Theodora Kourti and John F. MacGregor. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemometrics and Intelligent Laboratory Systems, 28(1):3–21, 1995.
[43] Rumana Sharmin, Uttandaraman Sundararaj, Sirish Shah, Larry Vande Griend, and Yi-Jun Sun. Inferential sensors for estimation of polymer quality parameters: Industrial application of a PLS-based soft sensor for a LDPE plant. Chemical Engineering Science, 61(19):6372–6384, 2006.
[44] S. Lakshminarayanan, Sirish L. Shah, and K. Nandakumar. Modeling and control of multivariable processes: Dynamic PLS approach. AIChE Journal, 43(9):2307–2322, 1997.
[45] E. Zamprogna, M. Barolo, and D.E. Seborg. Composition estimations in a middle-vessel batch distillation column using artificial neural networks. Chemical Engineering Research and Design, 79(6):689–696, 2001.
[46] Eliana Zamprogna, Massimiliano Barolo, and D.E. Seborg. Estimating product composition profiles in batch distillation via partial least squares regression. Control Engineering Practice, 12(7):917–929, 2004.
[47] Thor Mejdell and Sigurd Skogestad. Estimation of distillation compositions from multiple temperature measurements using partial-least-squares regression. Industrial and Engineering Chemistry Research, 30(12):2543–2555, 1991.
[48] Manabu Kano, Koichi Miyazaki, Shinji Hasebe, and Iori Hashimoto. Inferential control system of distillation compositions using dynamic partial least squares regression. Journal of Process Control, 10(2-3):157–166, 2000.
[49] Luigi Fortuna, S. Graziani, Alessandro Rizzo, and M. G. Xibilia. Soft Sensors for Monitoring and Control of Industrial Processes. Springer, London, 2010.
[50] Thor Mejdell and Sigurd Skogestad. Composition estimator in a pilot-plant distillation column using multiple temperatures. Industrial and Engineering Chemistry Research, 30(12):2555–2564, 1991.
[51] J. V. Kresta, T. E. Marlin, and J. F. MacGregor. Development of inferential process models using PLS. Computers and Chemical Engineering, 18(7):597–611, 1994.
[52] S. J. Qin and T. J. McAvoy. Nonlinear FIR modeling via a neural net PLS approach. Computers and Chemical Engineering, 20(2):147–159, 1996.
[53] S.J. Qin and T.J. McAvoy. Nonlinear PLS modeling using neural networks. Computers and Chemical Engineering, 16(4):379–391, 1992.
[54] S. Joe Qin. Neural networks for intelligent sensors and control - practical issues and some solutions. In Omid Omidvar and David L. Elliott, editors, Neural Systems for Control, pages 213–234. Academic Press, San Diego, 1997.
[55] Michael H. Kaspar and W. Harmon Ray. Dynamic PLS modelling for process control. Chemical Engineering Science, 48(20):3447–3461, 1993.
[56] Ognjen Marjanovic, Barry Lennox, David Sandoz, Keith Smith, and Milton Crofts. Real-time monitoring of an industrial batch process. Computers and Chemical Engineering, 30(10-12):1476–1481, 2006.
[57] Tiina Komulainen, Mauri Sourander, and Sirkka-Liisa Jämsä-Jounela. An online application of dynamic PLS to a dearomatization process. Computers and Chemical Engineering, 28(12):2611–2619, 2004.
[58] Bao Lin, Bodil Recke, Jørgen K.H. Knudsen, and Sten Bay Jørgensen. A systematic approach for soft sensor development. Computers and Chemical Engineering, 31(5-6):419–425, 2007.
[59] Q. P. He and J. Wang. Statistics pattern analysis: A new process monitoring framework and its application to semiconductor batch processes. AIChE Journal, 57:107–121, 2011.
[60] H.J. Galicia, Q.P. He, and J. Wang. Statistics pattern analysis - fault detection and diagnosis. In Proceedings of the ISA Automation Week, Mobile, AL, USA, 2011.
[61] H.J. Galicia, Q.P. He, and J. Wang. Statistics Pattern Analysis based fault detection and diagnosis. CPC VIII Conference, 55, 2012.
[62] H.J. Galicia, Q.P. He, and J. Wang. A comprehensive evaluation of Statistics Pattern Analysis based process monitoring. In International Symposium on Advanced Control of Chemical Processes (ADCHEM), 2012. Accepted.
[63] Hector J. Galicia, Q. Peter He, and Jin Wang. A reduced order soft sensor approach and its application to a continuous digester. Journal of Process Control, 21(4):489–500, 2011.
[64] H.J. Galicia, Q.P. He, J. Wang, R. Hodges, G. Krishnagopalan, and H. Cullinan. Outlier detection and process monitoring with application to a recursive soft sensor update for digester control. AIChE Annual Meeting, 2010.
[65] H.J. Galicia, Q.P. He, J. Wang, R. Hodges, G. Krishnagopalan, and H. Cullinan. A multi-model recursive subspace identification based dynamic soft sensor approach for digester control. AIChE Annual Meeting, 2009.
[66] H.J. Galicia, Q.P. He, J. Wang, R. Hodges, G. Krishnagopalan, and H. Cullinan. A subspace identification based dynamic soft sensor for digester control. AIChE Annual Meeting, 2008.
[67] H. J. Galicia, Q. P. He, and J. Wang. Comparison of the performance of a reduced-order dynamic PLS soft sensor with different updating schemes for digester control. Control Engineering Practice, 2012. Accepted.
[68] H.J. Galicia, Q.P. He, and J. Wang. Adaptive kappa number prediction via reduced-order dynamic PLS soft sensor for the control of a continuous Kamyr digester. In TAPPI Control Systems Conference, 2012. Accepted.
[69] H.J. Galicia, Q.P. He, and J. Wang. Recursive update of a reduced-order dynamic PLS soft sensor and its application to digester control. In Proceedings of the ISA Automation Week, Mobile, AL, USA, 2011.
[70] H.J. Galicia, Q.P. He, and J. Wang. Adaptive reduced-order dynamic PLS soft sensor with application to digester control. In Proceedings of the SHPE Conference, Anaheim, CA, USA, 2011.
[71] H.J. Galicia, Q.P. He, and J. Wang. A Bayesian supervisory approach of outlier detection for recursive soft sensor update. CPC VIII Conference, 54, 2012.
[72] H.J. Galicia, Q.P. He, and J. Wang. Adaptive outlier detection and classification for online soft sensor update. In International Symposium on Advanced Control of Chemical Processes (ADCHEM), 2012. Accepted.
[73] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[74] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.
[75] Jan J. Gerbrands. On the relationships between SVD, KLT and PCA. Pattern Recognition, 14(1-6):375–381, 1981.
[76] Barry M. Wise. Adapting Multivariate Analysis for Monitoring and Modeling Dynamic Systems. Ph.D. dissertation, 1991.
[77] H. Hotelling. The generalization of Student's ratio. Annals of Mathematical Statistics, 2(3):360–378, 1931.
[78] Paul Nomikos and J. F. MacGregor. Multivariate SPC charts for monitoring batch processes. Technometrics, 37(1):41–59, 1995.
[79] Paul Nomikos. Detection and diagnosis of abnormal batch operations based on multiway principal component analysis. ISA Transactions, 35(3):259–266, 1996 (World Batch Forum, Toronto, May 1996).
[80] S. Joe Qin, Sergio Valle, and Michael J. Piovoso. On unifying multiblock analysis with application to decentralized process monitoring. Journal of Chemometrics, 15(9):715–742, 2001.
[81] Evan Russell, Leo H. Chiang, and Richard D. Braatz. Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes. Springer, London, 2000.
[82] Junghui Chen and Kun-Chih Liu. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chemical Engineering Science, 57(1):63–75, 2002.
[83] Weihua Li and S. Joe Qin. Consistent dynamic PCA based on errors-in-variables subspace identification. Journal of Process Control, 11(6):661–678, 2001.
[84] Rongfu Luo, Manish Misra, and David M. Himmelblau. Sensor fault detection via multiscale analysis and dynamic PCA. Industrial and Engineering Chemistry Research, 38(4):1489–1495, 1999.
[85] Ningyun Lu, Yuan Yao, Furong Gao, and Fuli Wang. Two-dimensional dynamic PCA for batch process monitoring. AIChE Journal, 51(12):3300–3304, 2005.
[86] G.E.P. Box, G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, 1994.
[87] D. Dong and T. J. McAvoy. Nonlinear principal component analysis based on principal curves and neural networks. Computers and Chemical Engineering, 20(1):65–78, 1996.
[88] Ji-Hoon Cho, Jong-Min Lee, Sang Wook Choi, Dongkwon Lee, and In-Beum Lee. Fault identification for process monitoring using kernel principal component analysis. Chemical Engineering Science, 60(1):279–288, 2005.
[89] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
[90] Sang Wook Choi, Changkyu Lee, Jong-Min Lee, Jin Hyun Park, and In-Beum Lee. Fault detection and identification of nonlinear processes based on kernel PCA. Chemometrics and Intelligent Laboratory Systems, 75(1):55–67, 2005.
[91] Zhiqiang Ge, Chunjie Yang, and Zhihuan Song. Improved kernel PCA-based monitoring approach for nonlinear processes. Chemical Engineering Science, 64(9):2245–2255, 2009.
[92] S. Mika, B. Schölkopf, A. J. Smola, K.-R. Müller, M. Scholz, and G. Rätsch. Kernel PCA and de-noising in feature spaces. Advances in Neural Information Processing Systems 11, pages 536–542, 1999.
[93] Takashi Takahashi and Takio Kurita. Robust de-noising by kernel PCA. In Artificial Neural Networks (ICANN 2002), volume 2415 of Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, 2002.
[94] S. Haykin. Neural Networks. Prentice Hall, Englewood Cliffs, NJ, 1999.
[95] N. D. Tracy, J. C. Young, and R. L. Mason. Multivariate control charts for individual observations. Journal of Quality Technology, 24(2):88–95, 1992.
[96] Ricardo Dunia, S. Joe Qin, Thomas F. Edgar, and Thomas J. McAvoy. Identification of faulty sensors using principal component analysis. AIChE Journal, 42(10):2797–2812, 1996.
[97] Christian Jutten and Jeanny Herault. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24(1):1–10, 1991.
[98] M. Girolami. Self-Organising Neural Networks: Independent Component Analysis and Blind Source Separation. Springer-Verlag, London, 1991.
[99] Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.
[100] Aapo Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.
[101] Aapo Hyvärinen. The fixed-point algorithm and maximum likelihood estimation for independent component analysis. Neural Processing Letters, 10(1):1–5, 1999.
[102] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411–430, 2000.
[103] Manabu Kano, Shouhei Tanaka, Shinji Hasebe, Iori Hashimoto, and Hiromu Ohno. Monitoring independent components for fault detection. AIChE Journal, 49(4):969–976, 2003.
[104] T. Lee. Independent Component Analysis: Theory and Applications. Kluwer Academic, Boston, MA, 1998.
[105] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, New York, 2001.
[106] Jong-Min Lee, ChangKyoo Yoo, and In-Beum Lee. Statistical process monitoring with independent component analysis. Journal of Process Control, 14(5):467–485, 2004.
[107] Jong-Min Lee, ChangKyoo Yoo, and In-Beum Lee. Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chemical Engineering Science, 59(14):2995–3006, 2004.
[108] Jong-Min Lee, ChangKyoo Yoo, and In-Beum Lee. Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chemical Engineering Science, 59(14):2995–3006, 2004.
[109] Zhinong Li, Yongyong He, Fulei Chu, Jie Han, and Wei Hao. Fault recognition method for speed-up and speed-down process of rotating machinery based on independent component analysis and factorial hidden Markov model. Journal of Sound and Vibration, 291(1-2):60–71, 2006.
[110] George Stefatos and A. Ben Hamza. Dynamic independent component analysis approach for fault detection and diagnosis. Expert Systems with Applications, 37(12):8606–8617, 2010.
[111] Abdulhamit Subasi and M. Ismail Gursoy. EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Systems with Applications, 37(12):8659–8666, 2010.
[112] Yingwei Zhang and Yang Zhang. Fault detection of non-Gaussian processes based on modified independent component analysis. Chemical Engineering Science, 65(16):4630–4639, 2010.
[113] Zhizhou Li, Guohua Liu, Rui Zhang, and Zhencai Zhu. Fault detection, identification and reconstruction for gyroscope in satellite based on independent component analysis. Acta Astronautica, 68(7-8):1015–1023, 2011.
[114] Yiu-Ming Cheung and Lei Xu. An empirical method to select dominant independent components in ICA for time series analysis. In International Joint Conference on Neural Networks (IJCNN '99), volume 6, pages 3883–3887, 1999.
[115] A.D. Back and A.S. Weigend. A first application of independent component analysis to extracting structure from stock returns. International Journal of Neural Systems, 8(4):473–484, 1997.
[116] J. F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6):362–370, 1993.
[117] Yiu-Ming Cheung and Lei Xu. Independent component ordering in ICA time series analysis. Neurocomputing, 41(1-4):145–152, 2001.
[118] Paul Nomikos and John F. MacGregor. Multi-way partial least squares in monitoring batch processes. Chemometrics and Intelligent Laboratory Systems, 30(1):97–108, 1995.
[119] Johan A. Westerhuis, Stephen P. Gurden, and Age K. Smilde. Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems, 51(1):95–114, 2000.
[120] T. Adamson, G. Moore, M. Passow, J. Wong, and Y. Xu. Strategies for successfully implementing fab-wide FDC methodologies in semiconductor manufacturing, 2006.
[121] M. A. A. Shoukat Choudhury, Sirish L. Shah, and Nina F. Thornhill. Diagnosis of poor control-loop performance using higher-order statistics. Automatica, 40(10):1719–1728, 2004.
[122] Ashish Singhal and Dale E. Seborg. Pattern matching in multivariate time series databases using a moving-window approach. Industrial and Engineering Chemistry Research, 41(16):3822–3838, 2002.
[123] Barry M. Wise, Neal B. Gallagher, Stephanie Watts Butler, Daniel D. White, and Gabriel G. Barna. A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process. Journal of Chemometrics, 13(3-4):379–396, 1999.
[124] Evan L. Russell, Leo H. Chiang, and Richard D. Braatz. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems, 51(1):81–93, 2000.
[125] Harald Cramér. Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ, 1946.
[126] F. Pène. Rate of convergence in the multidimensional central limit theorem for stationary processes. Application to the Knudsen gas and to the Sinai billiard. Annals of Applied Probability, 15:2331–2392, 2005.
[127] Jérôme Dedecker and Emmanuel Rio. On mean central limit theorems for stationary sequences. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 44:693–726, 2008.
[128] E. B. Martin and A. J. Morris. Non-parametric confidence bounds for process performance monitoring charts. Journal of Process Control, 6(6):349–358, 1996.
[129] Manabu Kano, Koji Nagao, Shinji Hasebe, Iori Hashimoto, Hiromu Ohno, Ramon Strauss, and Bhavik R. Bakshi. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Computers and Chemical Engineering, 26(2):161–174, 2002.
[130] J.J. Downs and E.F. Vogel. A plant-wide industrial process control problem. Computers and Chemical Engineering, 17(3):245–255, 1993.
[131] K.H. Johansson. The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE Transactions on Control Systems Technology, 8(3):456–465, 2000.
[132] A. Banerjee and Y. Arkun. Control configuration design applied to the Tennessee Eastman plant-wide control problem. Computers and Chemical Engineering, 19(4):453–480, 1995.
[133] Leo H. Chiang, Evan L. Russell, and Richard D. Braatz. Fault Detection and Diagnosis in Industrial Systems. Springer, London, 2001.
[134] P. R. Lyman and C. Georgakis. Plant-wide control of the Tennessee Eastman problem. Computers and Chemical Engineering, 19(3):321–331, 1995.
[135] N.L. Ricker. Decentralized control of the Tennessee Eastman challenge process. Journal of Process Control, 6(4):205–221, 1996.
[136] D. J. H. Wilson and G. W. Irwin. PLS modelling and fault detection on the Tennessee Eastman benchmark. International Journal of Systems Science, 31(11):1449–1457, 2000.
[137] Kristian Helland, Hans E. Berntsen, Odd S. Borgen, and Harald Martens. Recursive algorithm for partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 14(1-3):129–137, 1992.
[138] S. Qin. Recursive PLS algorithms for adaptive data modeling. Computers and Chemical Engineering, 22(4-5):503–514, 1998.
[139] Petr Kadlec, Ratko Grbic, and Bogdan Gabrys. Review of adaptation mechanisms for data-driven soft sensors. Computers and Chemical Engineering, 35(1):1–24, 2011.
[140] Petr Kadlec, Bogdan Gabrys, and Sibylle Strandt. Data-driven soft sensors in the process industry. Computers and Chemical Engineering, 33(4):795–814, 2009.
[141] M. Kocurek, T. Grace, and E. Malcolm. Pulp and Paper Manufacture - Alkaline Pulping. TAPPI (USA) and CPPA (Canada), Norcross, GA, third edition, 1989.
[142] N. DeKing. Pulp and Paper Global Fact and Price Book 2003-2004. Paperloop, Inc., Boston, 2004.
[143] J. J. Castro and F. J. Doyle. A pulp mill benchmark problem for control: application of plantwide control design. Journal of Process Control, 14(3):329–347, 2004.
[144] M. Paulonis and G. Krishnagopalan. Adaptive inferential control of kraft batch digesters as based on pulping liquor analysis. TAPPI Journal, 74:169–175, 1991.
[145] C.C. Smith and T.J. Williams. Mathematical modeling, simulation and control of the operation of a Kamyr continuous digester for the kraft process. Technical Report 64, Purdue University, PLAIC, West Lafayette, IN 47907, 1974.
[146] Philip A. Wisnewski, Francis J. Doyle III, and Ferhan Kayihan. Fundamental continuous-pulp-digester model for simulation and control. AIChE Journal, 43(12):3175–3192, 1997.
[147] F.J. Doyle. Nonlinear inferential control for process applications. Journal of Process Control, 8(5-6):339–353, 1998.
[148] F.J. Doyle III and F. Kayihan. Reaction profile control of the continuous pulp digester. Chemical Engineering Science, 54:2679–2688, 1999.
181
[149] Philip A. Wisnewski and Francis J. Doyle. A reduced model approach to estimation
and control of a Kamyr digester. Computers and Chemical Engineering, 20(Supplement
2):S1053?S1058, 1996.
[150] Philip A. Wisnewski and Francis J. Doyle. Control structure selection and model
predictive control of the Weyerhaeuser digester problem. Journal of Process Control,
8(5-6):487?495, 1998.
[151] Jay H. Lee and A. K. Datta. Nonlinear inferential control of pulp digesters. AIChE
Journal, 40(1):50?64, 1994.
[152] Raja Amirthalingam and Jay H. Lee. Subspace identification based inferential control
applied to a continuous pulp digester. Journal of Process Control, 9(5):397?406, 1999.
[153] R. Amirthalingam, S. W. Sung, and J. H. Lee. Two-step procedure for data-based
modeling for inferential control applications. AIChE Journal, 46(10):1974?1988, 2000.
[154] Bhupinder S. Dayal and John F. MacGregor. Recursive exponentially weighted PLS
and its applications to adaptive control and prediction. Journal of Process Control,
7(3):169?179, 1997.
[155] Weihua Li, H. Henry Yue, Sergio Valle-Cervantes, and S. Joe Qin. Recursive PCA for
adaptive process monitoring. Journal of Process Control, 10(5):471?486, 2000.
[156] Sungyong Park and Chonghun Han. A nonlinear soft sensor based on multivariate
smoothing procedure for quality estimation in distillation columns. Computers and
Chemical Engineering, 24(2-7):871?877, 2000.
[157] C. T. Chou and Michel Verhaegen. Subspace algorithms for the identification of mul-
tivariable dynamic errors-in-variables models. Automatica, 33(10):1857?1869, 1997.
[158] Jin Wang and S. Joe Qin. A new subspace identification approach based on principal
component analysis. Journal of Process Control, 12(8):841?855, 2002.
182
[159] Alessandro Chiuso and Giorgio Picci. Consistency analysis of some closed-loop sub-
space identification methods. Automatica, 41(3):377?391, 2005.
[160] Lennart Ljung and Tomas McKelvey. Subspace identification from closed loop data.
Signal Processing, 52(2):209?215, 1996.
[161] U. Forssell and L. Ljung. Closed-loop identification revisited. Automatica, 35:1215?
1241, 1999.
[162] M. Gilson and P.V. den Hof. IV method for closed-loop system identification. In Pro-
ceedings of 13th IFAC symposium on System Identification, pages 537?542, Rotterdam,
Netherlands, 2003.
[163] J. Wang and S.J. Qin. Closed loop subspace identification using the parity space.
Automatica, 42:315?320, 2006.
[164] Biao Huang, Steven X. Ding, and S. Joe Qin. Closed-loop subspace identification: an
orthogonal projection approach. Journal of Process Control, 15(1):53?66, 2005.
[165] P. Van Overschee and B. De Moor. N4SID: Subspace algorithms for the identification
of combined deterministic-stochastic systems. Automatica, 30:75?93, 1994.
[166] P.A. Wisnewski and III Doyle, F.J. Model-based predictive control studies for a contin-
uous pulp digester. Control Systems Technology, IEEE Transactions on, 9(3):435?444,
2001.
[167] Dale E. Seborg, Thomas F. Edgar, and Duncan A. Mellichamp. Process Dynamics and
Control. John Wiley and Sons, Hoboken, NJ, 2nd edition, 2011.
[168] Babu Joseph and Coleman B. Brosilow. Inferential control of processes: Part I. steady
state analysis and design. AIChE Journal, 24(3):485?492, 1978.
[169] T. Tolliver and L. McCune. Distillation control design based on steady state simulation.
ISA Transactions, 17:3?10, 1978.
183
[170] B. Bequette and T. Edgar. AIChE Annual Meeting, 1984.
[171] C. Georgakis and D. Kindt. AIChE Annual Meeting, 1984.
[172] Gade P. Rangaiah and Peruvemba R. Krishnaswamy. Estimating second-order
plus dead time model parameters. Industrial and Engineering Chemistry Research,
33(7):1867?1871, 1994.
[173] G. P. Rangaiah and P. R. Krishnaswamy. Estimating second-order dead time parame-
ters from underdamped process transients. Chemical Engineering Science, 51(7):1149?
1155, 1996.
[174] K. Fujiwara, M. Kano, S. Hasebe, and A. Takinami. Soft-sensor development using
correlation-based just-in-time modeling. AIChE Journal, 55(7):1754?1765, 2009.
[175] W. Larimore. Statistical Methods in Control and Signal Processing. Marcel Dekker,
New York, 1997.
[176] J. Wang and Q. P. He. A practical solution for continuous digester control subspace
identification based inferential control revisited. Proceedings of Advanced Control of
Industrial Processes-International Conference, pages 433?438, 2008.
[177] R.E. Hodges. Applications of near infrared spectroscopy in the pulp and paper industry.
Ph.D. dissertation, 1999.
[178] R.E. Hodges and G.A. Krishnagopalan. Near-infrared spectroscopy for on-line analysis
of white and green liquors. TAPPI Journal, 82(9):101, 1999.
[179] Thor Mejdell and Bengt-Ove Andersson. Using temperature profile for product quality
estimation on a distillation column. ISA Transactions, 33(1):27?34, 1994.
[180] L. Fortuna, S. Graziani, and M. G. Xibilia. Soft sensors for product quality monitoring
in debutanizer distillation columns. Control Engineering Practice, 13(4):499?508, 2005.
184
[181] Manabu Kano and Yoshiaki Nakagawa. Data-based process monitoring, process con-
trol, and quality improvement: Recent developments and applications in steel industry.
Computers and Chemical Engineering, 32(1-2):12?24, 2008.
[182] Weiwu Yan, Huihe Shao, and Xiaofan Wang. Soft sensing modeling based on support
vector machine and Bayesian model selection. Computers and Chemical Engineering,
28(8):1489?1498, 2004.
[183] Feng Rui, Shen Wei, and Shao Huihe. A soft sensor modeling approach using support
vector machines. In American Control Conference, 2003. Proceedings of the 2003,
volume 5, pages 3702?3707 vol.5, 2003.
[184] Fredrik Lindgren, Paul Geladi, and Svante Wold. The kernel algorithm for PLS. Jour-
nal of Chemometrics, 7(1):45?59, 1993.
[185] Operator. Personal discussion with operator at MeadWestvaco Corp., 2007.
[186] M. Hubert and K. Vanden Branden. Robust methods for partial least squares regres-
sion. Journal of Chemometrics, 17(10):537?549, 2003.
[187] V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial Intel-
ligence Review, 22(2):85?126, 2004.
[188] R. K. Pearson. Outliers in process modeling and identification. Control Systems
Technology, IEEE Transactions on, 10(1):55?63, 2002.
[189] L. Davies and U. Gather. The identification of multiple outliers. Journal of the Amer-
ican Statistical Association, 88(423):782?792, 1993.
[190] I.T. Jolliffe. Principal Component Analysis. Springer, 2002.
[191] Harald Martens and Tormod Naes. Multivariate Calibration. John Wiley and Sons
Ltd., 1989.
185
[192] I. N. Wakelinc and H. J. H. Macfie. A robust PLS procedure. Journal of Chemometrics,
6(4):189?198, 1992.
[193] David J. Cummins and C. Webster Andrews. Iteratively reweighted partial least
squares: A performance analysis by Monte Carlo simulation. Journal of Chemometrics,
9(6):489?507, 1995.
[194] Juan A. Gil and Rosario Romera. On robust partial least squares (PLS) methods.
Journal of Chemometrics, 12(6):365?378, 1998.
[195] Randy J. Pell. Multiple outlier detection for multivariate calibration using robust
statistical techniques. Chemometrics and Intelligent Laboratory Systems, 52(1):87?
104, 2000.
[196] R.D. Cook and S. Weisberd. Residuals and Influence in Regression. Chapman and
Hall, London, 1982.
[197] B. Walczak and D. L. Massart. Robust principal components regression as a detection
tool for outliers. Chemometrics and Intelligent Laboratory Systems, 27(1):41?54, 1995.
[198] G.E.P. Box. Some theorems on quadratic forms applied in the study of analysis of
variance problems. The Annals of Mathematical Statistics, 25(2):290?302, 1954.
[199] Jin Wang and Q. Peter He. A Bayesian approach for disturbance detection and classifi-
cation and its application to state estimation in run-to-run control. IEEE Transactions
on semiconductor manufacturing, 20(2):126?136, 2007.
[200] A. Hu. An optimal Bayesian process control for flexible manufacturing processes. Ph.D.
dissertation, 1992.
[201] H.J. Galicia, Q.P. He, and J. Wang. Statistics Pattern Analysis based virtual metrology
for plasma etch processes. In American Control Conference, 2012. Accepted.
186
Appendices
Appendix A
Derivation of the linear relationship between $e^c_k$ and $e^s_k$

The state-space representation of the system which includes both the primary and secondary outputs is the following:

$$x_{k+1} = A x_k + B u_k + \begin{bmatrix} K^s & K^c \end{bmatrix} \begin{bmatrix} e^s_k \\ e^c_k \end{bmatrix} \quad \text{(A.1)}$$

$$\begin{bmatrix} y^s_k \\ y^c_k \end{bmatrix} = \begin{bmatrix} C^s \\ C^c \end{bmatrix} x_k + \begin{bmatrix} e^s_k \\ e^c_k \end{bmatrix} \quad \text{(A.2)}$$

Since we assume that the secondary output variables and the input variables can completely describe the system dynamics, the system can also be described by the following state-space representation:

$$x_{k+1} = A x_k + B u_k + K e^s_k \quad \text{(A.3)}$$

$$y^s_k = C^s x_k + e^s_k \quad \text{(A.4)}$$

By comparing Eqn. (A.1) and Eqn. (A.3), we can see that

$$\begin{bmatrix} K^s & K^c \end{bmatrix} \begin{bmatrix} e^s_k \\ e^c_k \end{bmatrix} = K e^s_k \quad \text{(A.5)}$$

By rearranging the above equation we have

$$e^c_k = (K^c)^{\dagger} (K - K^s)\, e^s_k = F e^s_k \quad \text{(A.6)}$$

where $(K^c)^{\dagger}$ denotes the pseudo-inverse of $K^c$. Eqn. (A.6) shows that $e^c_k$ and $e^s_k$ are linearly correlated under the assumption that $(A, C^s)$ is observable.
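The identity in Eqn. (A.6) can be checked numerically. The sketch below is only an illustration of the algebra: the dimensions and the matrices $K^s$, $K^c$, and $F$ are arbitrary random values, not quantities from the dissertation, and $K$ is constructed so that Eqn. (A.5) holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed): 3 states, 2 secondary innovations
# e^s_k, and 1 primary innovation e^c_k.
Ks = rng.standard_normal((3, 2))   # gain on e^s_k in Eqn. (A.1)
Kc = rng.standard_normal((3, 1))   # gain on e^c_k in Eqn. (A.1)
F_true = rng.standard_normal((1, 2))

# Build K so that Eqn. (A.5) holds for e^c_k = F_true e^s_k:
#   K e^s_k = Ks e^s_k + Kc (F_true e^s_k)  =>  K = Ks + Kc F_true
K = Ks + Kc @ F_true

# Recover F via Eqn. (A.6): F = (K^c)^+ (K - K^s)
F = np.linalg.pinv(Kc) @ (K - Ks)
print(np.allclose(F, F_true))  # True
```

The recovery is exact here because $K^c$ has full column rank, so its pseudo-inverse is a left inverse.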
Appendix B
Detailed evaluation of $E\left\{ e^c_f(k) \left[\, [z_p(k)]^T \;\; [z_{f-1}(k)]^T \,\right] \right\}$

$$
E\left\{ e^c_f(k) \begin{bmatrix} z_p(k) \\ z_{f-1}(k) \end{bmatrix}^T \right\}
= \lim_{n\to\infty} \frac{1}{n}
\begin{bmatrix}
e^c_k & e^c_{k+1} & \cdots & e^c_{k+n-1} \\
e^c_{k+1} & e^c_{k+2} & \cdots & e^c_{k+n} \\
\vdots & \vdots & & \vdots \\
e^c_{k+f-1} & e^c_{k+f} & \cdots & e^c_{k+f+n-2}
\end{bmatrix}
\begin{bmatrix}
u_{k-p} & u_{k-p+1} & \cdots & u_{k-p+n-1} \\
y^s_{k-p} & y^s_{k-p+1} & \cdots & y^s_{k-p+n-1} \\
\vdots & \vdots & & \vdots \\
u_{k} & u_{k+1} & \cdots & u_{k+n-1} \\
y^s_{k} & y^s_{k+1} & \cdots & y^s_{k+n-1} \\
\vdots & \vdots & & \vdots \\
u_{k+f-2} & u_{k+f-1} & \cdots & u_{k+f+n-3} \\
y^s_{k+f-2} & y^s_{k+f-1} & \cdots & y^s_{k+f+n-3}
\end{bmatrix}^T
\quad \text{(B.1)}
$$

$$
= E\left\{
\begin{bmatrix}
e^c_k u_{k-p} & e^c_{k+1} u_{k-p} & \cdots & e^c_{k+f-1} u_{k-p} \\
e^c_k y^s_{k-p} & e^c_{k+1} y^s_{k-p} & \cdots & e^c_{k+f-1} y^s_{k-p} \\
\vdots & \vdots & & \vdots \\
e^c_k u_{k} & e^c_{k+1} u_{k} & \cdots & e^c_{k+f-1} u_{k} \\
e^c_k y^s_{k} & e^c_{k+1} y^s_{k} & \cdots & e^c_{k+f-1} y^s_{k} \\
e^c_k u_{k+1} & e^c_{k+1} u_{k+1} & \cdots & e^c_{k+f-1} u_{k+1} \\
e^c_k y^s_{k+1} & e^c_{k+1} y^s_{k+1} & \cdots & e^c_{k+f-1} y^s_{k+1} \\
\vdots & \vdots & & \vdots \\
e^c_k u_{k+f-2} & e^c_{k+1} u_{k+f-2} & \cdots & e^c_{k+f-1} u_{k+f-2} \\
e^c_k y^s_{k+f-2} & e^c_{k+1} y^s_{k+f-2} & \cdots & e^c_{k+f-1} y^s_{k+f-2}
\end{bmatrix}\right\}^T
= \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
E\{e^c_k u_{k+1}\} & 0 & \cdots & 0 \\
E\{e^c_k y^s_{k+1}\} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
E\{e^c_k u_{k+f-2}\} & E\{e^c_{k+1} u_{k+f-2}\} & \cdots & 0 \\
E\{e^c_k y^s_{k+f-2}\} & E\{e^c_{k+1} y^s_{k+f-2}\} & \cdots & 0
\end{bmatrix}^T
\neq 0 \quad \text{(B.2)}
$$

It can be seen from the above derivation that the lower triangular part of the matrix is non-zero due to feedback control, which is the reason why the linear relationship in Eqn. (6.9) does not provide an unbiased estimate of $y^c_f(k)$ under closed-loop control when $f > 1$.
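The mechanism behind the non-zero lower-triangular entries, that feedback makes an innovation correlated with *future* inputs but not with past or current ones, can be illustrated with a minimal scalar simulation. Everything below is an assumed toy system (unit-variance innovations, a one-step-delayed proportional feedback law $u_{k+1} = -l\,y_k$), not the digester model from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a, b, k_gain, l = 0.8, 1.0, 0.5, 0.4   # illustrative plant and controller gains

e = rng.standard_normal(n)             # innovation sequence e_k
x = np.zeros(n); y = np.zeros(n); u = np.zeros(n)
for t in range(n - 1):
    y[t] = x[t] + e[t]                           # output carries the innovation
    u[t + 1] = -l * y[t]                         # feedback: u_{k+1} depends on y_k
    x[t + 1] = a * x[t] + b * u[t] + k_gain * e[t]

c0 = np.mean(e[:-1] * u[:-1])   # E{e_k u_k}: ~0, u_k was set before e_k occurred
c1 = np.mean(e[:-1] * u[1:])    # E{e_k u_{k+1}}: ~ -l * E{e_k^2} = -0.4 under feedback
print(abs(c0) < 0.05, abs(c1 + l) < 0.05)  # True True
```

Under open-loop operation (no feedback in $u$), both sample covariances would vanish, which is exactly why the bias in Eqn. (6.9) appears only in the closed-loop, multi-step ($f > 1$) case.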
Appendix C
Detailed derivation of Equations (5.13) and (5.14) for the recursive update of the mean and variance.

For the variable mean, plugging the definition of the mean into the right-hand side (RHS) of Equation (5.13):

$$
\text{RHS} = \frac{n-1}{n}\left(\frac{1}{n-1}\sum_{i=1}^{n-1} x_i\right) + \frac{1}{n}\, x_n
= \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}_n = \text{LHS} \quad \text{(C.1)}
$$

where LHS denotes the left-hand side of Equation (5.13).

For the variable variance in Equation (5.14):

$$
\begin{aligned}
\text{RHS} &= \frac{n-2}{n-1}\left[\frac{1}{n-2}\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2\right] + \frac{(x_n - \bar{x}_n)^2}{n-1} + (\bar{x}_n - \bar{x}_{n-1})^2 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n-1}(x_i - \bar{x}_n + \bar{x}_n - \bar{x}_{n-1})^2 + (x_n - \bar{x}_n)^2\right] + (\bar{x}_n - \bar{x}_{n-1})^2 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + (n-1)(\bar{x}_n - \bar{x}_{n-1})^2
+ 2(\bar{x}_n - \bar{x}_{n-1})\sum_{i=1}^{n-1}(x_i - \bar{x}_n)\right] + (\bar{x}_n - \bar{x}_{n-1})^2 \\
&= \sigma^2_n + (\bar{x}_n - \bar{x}_{n-1})^2 + \frac{2}{n-1}(\bar{x}_n - \bar{x}_{n-1})\left[(n-1)(\bar{x}_{n-1} - \bar{x}_n)\right] + (\bar{x}_n - \bar{x}_{n-1})^2 \\
&= \sigma^2_n + 2(\bar{x}_n - \bar{x}_{n-1})^2 - 2(\bar{x}_n - \bar{x}_{n-1})^2 \\
&= \sigma^2_n = \text{LHS}
\end{aligned} \quad \text{(C.2)}
$$
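The recursions verified above are straightforward to exercise numerically. The sketch below assumes Equations (5.13) and (5.14) take the recursive form used in the derivation (previous mean/variance plus the new sample); the data values are arbitrary.

```python
import statistics

def update_mean_var(mean_prev, var_prev, n, x_n):
    """Fold the n-th sample (n >= 2) into the running mean (Eq. 5.13)
    and running sample variance (Eq. 5.14)."""
    mean_n = (n - 1) / n * mean_prev + x_n / n
    var_n = ((n - 2) / (n - 1) * var_prev
             + (x_n - mean_n) ** 2 / (n - 1)
             + (mean_n - mean_prev) ** 2)
    return mean_n, var_n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, var = data[0], 0.0                  # initialize with the first sample
for n, x in enumerate(data[1:], start=2):
    mean, var = update_mean_var(mean, var, n, x)

print(abs(mean - statistics.mean(data)) < 1e-12)       # True
print(abs(var - statistics.variance(data)) < 1e-12)    # True
```

The recursion reproduces the batch sample mean and variance exactly (up to floating-point error), which is what makes it attractive for online soft sensor adaptation: no past data need be stored.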
Appendix D
Derivation of Equation (5.16).

$$\bar{x}_{n-1} = \frac{n}{n-1}\bar{x}_n - \frac{1}{n-1}x_n$$

$$(\bar{x}_n - \bar{x}_{n-1})^2 = \left(\bar{x}_n - \frac{n}{n-1}\bar{x}_n + \frac{1}{n-1}x_n\right)^2
= \left[\frac{1}{n-1}\left(x_n - \bar{x}_n\right)\right]^2$$

Therefore,

$$\frac{(\bar{x}_n - \bar{x}_{n-1})^2}{\frac{1}{n-1}(x_n - \bar{x}_n)^2} = \frac{1}{n-1} \quad \text{(D.1)}$$
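The identity behind Eqn. (D.1), that the squared shift of the mean equals $(x_n - \bar{x}_n)^2/(n-1)^2$, can be spot-checked on any data set; the values below are arbitrary.

```python
import statistics

data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
n = len(data)
mean_n = statistics.mean(data)        # mean including the n-th sample
mean_prev = statistics.mean(data[:-1])
x_n = data[-1]

# (x̄_n - x̄_{n-1})^2  ==  (x_n - x̄_n)^2 / (n-1)^2, per the derivation above
lhs = (mean_n - mean_prev) ** 2
rhs = ((x_n - mean_n) / (n - 1)) ** 2
print(abs(lhs - rhs) < 1e-12)  # True
```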
Appendix E
Detailed derivation of Equations (5.17) and (5.18) for the recursive update of the block mean and variance.

For the variable mean in Equation (5.17):

$$
\begin{aligned}
\text{RHS} &= \frac{n-n_1}{n}\cdot\frac{\sum_{i=1}^{n-n_1} x_i}{n-n_1} + \frac{n_1}{n}\cdot\frac{\sum_{i=n-n_1+1}^{n} x_i}{n_1} \\
&= \frac{1}{n}\left[\sum_{i=1}^{n-n_1} x_i + \sum_{i=n-n_1+1}^{n} x_i\right]
= \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}_n = \text{LHS}
\end{aligned} \quad \text{(E.1)}
$$

For the variable variance in Equation (5.18):

$$
\begin{aligned}
\text{RHS} &= \frac{n-n_1-1}{n-1}\sigma^2_{n-n_1} + \frac{n_1-1}{n-1}\sigma^2_{n,B}
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n-n_1}(x_i - \bar{x}_{n-n_1})^2 + \sum_{i=n-n_1+1}^{n}(x_i - \bar{x}_{n,B})^2\right]
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n-n_1}(x_i - \bar{x}_n + \bar{x}_n - \bar{x}_{n-n_1})^2
+ \sum_{i=n-n_1+1}^{n}(x_i - \bar{x}_n + \bar{x}_n - \bar{x}_{n,B})^2\right]
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \frac{1}{n-1}\Bigg[\sum_{i=1}^{n}(x_i - \bar{x}_n)^2
+ (n-n_1)(\bar{x}_n - \bar{x}_{n-n_1})^2
+ 2(\bar{x}_n - \bar{x}_{n-n_1})\sum_{i=1}^{n-n_1}(x_i - \bar{x}_n) \\
&\qquad + n_1(\bar{x}_n - \bar{x}_{n,B})^2
+ 2(\bar{x}_n - \bar{x}_{n,B})\sum_{i=n-n_1+1}^{n}(x_i - \bar{x}_n)\Bigg]
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \sigma^2_n + \frac{1}{n-1}\Big[(n-n_1)(\bar{x}_n - \bar{x}_{n-n_1})^2 - 2(n-n_1)(\bar{x}_n - \bar{x}_{n-n_1})^2 \\
&\qquad + n_1(\bar{x}_n - \bar{x}_{n,B})^2 - 2n_1(\bar{x}_n - \bar{x}_{n,B})^2\Big]
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \sigma^2_n - \frac{1}{n-1}\left[(n-n_1)(\bar{x}_n - \bar{x}_{n-n_1})^2 + n_1(\bar{x}_n - \bar{x}_{n,B})^2\right]
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2
\end{aligned} \quad \text{(E.2)}
$$

The first and second terms inside the bracket in Equation (E.2) can be rewritten using Equation (5.17) as follows:

$$(n-n_1)(\bar{x}_n - \bar{x}_{n-n_1})^2 = (n-n_1)\left[\frac{n_1}{n}\bar{x}_{n,B} - \frac{n_1}{n}\bar{x}_{n-n_1}\right]^2 \quad \text{(E.3)}$$

$$n_1(\bar{x}_n - \bar{x}_{n,B})^2 = n_1\left[\frac{n-n_1}{n}\bar{x}_{n-n_1} - \frac{n-n_1}{n}\bar{x}_{n,B}\right]^2 \quad \text{(E.4)}$$

Therefore, Equation (E.2) can be rearranged as follows:

$$
\begin{aligned}
\text{RHS} &= \sigma^2_n - \frac{1}{n-1}\left[\frac{(n-n_1)\,n_1^2}{n^2}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2
+ \frac{n_1(n-n_1)^2}{n^2}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2\right]
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \sigma^2_n - \frac{1}{n-1}\cdot\frac{n_1(n-n_1)}{n^2}\left[n_1 + (n-n_1)\right](\bar{x}_{n-n_1} - \bar{x}_{n,B})^2
+ \frac{n_1(n-n_1)}{n(n-1)}(\bar{x}_{n-n_1} - \bar{x}_{n,B})^2 \\
&= \sigma^2_n = \text{LHS}
\end{aligned} \quad \text{(E.5)}
$$
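The block update just verified can be exercised as a merge of two blocks' statistics. The sketch below assumes Equations (5.17) and (5.18) take the combined form used on the first line of (E.2), with the old window of $n - n_1$ samples and a new block of $n_1$ samples; the data are arbitrary.

```python
import statistics

def merge_blocks(mean_a, var_a, na, mean_b, var_b, nb):
    """Combine the means and sample variances of two data blocks
    into those of the pooled data (the form verified in Eqns. E.1-E.5)."""
    n = na + nb
    mean = (na / n) * mean_a + (nb / n) * mean_b               # Eq. (5.17)
    var = ((na - 1) / (n - 1) * var_a + (nb - 1) / (n - 1) * var_b
           + na * nb / (n * (n - 1)) * (mean_a - mean_b) ** 2)  # Eq. (5.18)
    return mean, var, n

old = [1.0, 2.0, 3.0, 4.0, 5.0]     # existing window (n - n1 = 5 samples)
new = [6.0, 7.0, 8.0]               # newly arrived block (n1 = 3 samples)
mean, var, n = merge_blocks(statistics.mean(old), statistics.variance(old), len(old),
                            statistics.mean(new), statistics.variance(new), len(new))

print(abs(mean - statistics.mean(old + new)) < 1e-12)      # True
print(abs(var - statistics.variance(old + new)) < 1e-12)   # True
```

Because only each block's mean, variance, and sample count are needed, the update supports block-wise model adaptation without revisiting individual past samples.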