The minimum covariance determinant (MCD) estimator [35, 36] of multivariate location
and covariance is popular for this case because of its high resistance to
outliers. A fast algorithm (FAST-MCD) [37] is also available for its computation.
In this algorithm, the initial MCD estimators are defined, based on $h$ observations
($h < n$), as the mean $\hat{\mu}_0$ and the covariance matrix $\hat{\Sigma}_0$. The covariance matrix of
these $h$ observations has the lowest determinant, and $h$ should be at least $\lfloor (n+p+1)/2 \rfloor$.
The MCD estimator can resist $n-h$ outliers, and with this choice it has
a breakdown value of $(n-h+1)/n$. The value of $h$ is taken approximately between
$0.5n$ and $0.75n$: $h \approx 0.5n$ when there is up to 50% contamination, and
$h \approx 0.75n$ when there is up to 25% contamination. When there are fewer
outliers, the value of $h$ is increased to obtain more precise estimates.
Reweighting is then done to increase the finite-sample efficiency. Each data
point receives weight 1 if its robust distance
\[ RD(x_i) = \sqrt{(x_i - \hat{\mu}_0)' \hat{\Sigma}_0^{-1} (x_i - \hat{\mu}_0)} \]
is at most $\sqrt{\chi^2_{p,0.975}}$, and weight 0 otherwise. The reweighted MCD estimator
is then defined as the classical mean $\hat{\mu}_M$ and covariance matrix $\hat{\Sigma}_M$ of the
observations with weight one. The robust loadings are the first $k_1$ eigenvectors
of $\hat{\Sigma}_M$, sorted in descending order of the eigenvalues [20, 21, 22].
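The reweighting step above can be sketched as follows. This is a minimal illustration of the weighting rule only: the FAST-MCD subset search is omitted, and a simple median-based estimate stands in for the initial MCD location and scatter.

```python
import numpy as np
from scipy.stats import chi2

def reweight_mcd(X, mu0, sigma0):
    """One reweighting step: keep points whose robust distance falls below
    the sqrt(chi2_{p,0.975}) cutoff, then recompute the classical mean and
    covariance of the kept points."""
    n, p = X.shape
    diff = X - mu0
    # squared robust distances RD^2(x_i) = (x_i - mu0)' Sigma0^{-1} (x_i - mu0)
    rd2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma0), diff)
    w = rd2 <= chi2.ppf(0.975, p)          # weight 1 inside the cutoff, else 0
    mu_M = X[w].mean(axis=0)
    sigma_M = np.cov(X[w], rowvar=False)
    return mu_M, sigma_M, w

# toy data with two gross outliers in the first two rows
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:2] += 10.0
# stand-in initial estimate (a real FAST-MCD search would supply this)
mu0 = np.median(X, axis=0)
sigma0 = np.cov(X[5:], rowvar=False)
mu_M, sigma_M, w = reweight_mcd(X, mu0, sigma0)
```

The two planted outliers receive weight 0, so the reweighted mean and covariance are computed from clean points only.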
ROBPCA for high dimensional data (p > n)
For high-dimensional data ($p > n$), the MCD estimator cannot be used because
the covariance matrix of $h < p$ observations is always singular, so its determinant
cannot be minimized. In this case the ROBPCA method suggested by Hubert et al. [20, 21, 22] is
applied to the data $X_{n,p}$. ROBPCA combines the projection pursuit
(PP) technique [19] with MCD covariance estimation in lower dimensions. PP is used
first to reduce the dimension; the MCD method is then applied in this low-dimensional
subspace to estimate the center and the scatter of the data.
Initial data preprocessing is done by applying a singular value decomposition to
$X_{n,p}$. This yields a large dimension reduction, as the data can be represented by at
most $n-1 = \operatorname{rank}(\tilde{X}_{n,p})$ variables without loss of information.
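This preprocessing step can be sketched with a thin SVD. The sketch below is illustrative: it centers a wide data matrix and keeps only the nonzero singular directions, giving a lossless representation in at most $n-1$ variables.

```python
import numpy as np

# For p > n, represent the centered data exactly with at most
# rank(X_tilde) <= n - 1 variables via a thin SVD.
rng = np.random.default_rng(1)
n, p = 20, 500
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)                 # column-center the data
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
r = int((d > 1e-10).sum())              # numerical rank, at most n - 1
scores = U[:, :r] * d[:r]               # n x r representation, no information lost
# reconstruction from the reduced representation is exact
assert np.allclose(scores @ Vt[:r], Xc)
```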
By applying PP, the high-dimensional data points are projected onto many univariate
directions. On every direction, the robust center $\hat{\mu}_r$ and scale $\hat{\sigma}_r$ (based on the
univariate MCD method) of the projected data points are computed. For each data
point, the Stahel-Donoho outlyingness measure
\[ \mathrm{Outl}(z_i) = \max_{v} \frac{| z_i' v - \hat{\mu}_r |}{\hat{\sigma}_r}, \qquad i = 1, \ldots, n, \]
is used to form the $h$-subset with the smallest outlyingness. An optimal number
$k_0 \le p$ of principal components is then selected from the covariance matrix of the
final $h$-subset, and the data are projected onto this $k_0$-dimensional subspace. Next,
the reweighted MCD estimator is used to compute the center and the scatter of the
data points in this low-dimensional subspace. The dominant $k_1$ eigenvectors of this
covariance matrix are the $k_1$ robust principal components, and the MCD location and
covariance estimates serve as robust estimates of the location vector $\mu$ and
covariance matrix $\Sigma$.
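The outlyingness screen above can be approximated numerically by scanning random unit directions. In this sketch the univariate MCD location and scale are replaced by the median and MAD for brevity (an assumption, not the ROBPCA prescription), and the maximum over all directions is approximated by the maximum over the sampled ones.

```python
import numpy as np

def outlyingness(Z, n_dirs=500, rng=None):
    """Approximate Stahel-Donoho outlyingness over random unit directions,
    using median/MAD as the robust univariate center and scale."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = Z.shape
    V = rng.normal(size=(d, n_dirs))
    V /= np.linalg.norm(V, axis=0)          # unit directions v
    P = Z @ V                               # projections z_i'v, shape (n, n_dirs)
    center = np.median(P, axis=0)
    scale = 1.4826 * np.median(np.abs(P - center), axis=0)
    return np.max(np.abs(P - center) / scale, axis=1)

rng = np.random.default_rng(2)
Z = rng.normal(size=(50, 4))
Z[0] += 8.0                                  # one clearly outlying point
out = outlyingness(Z, rng=rng)
h = int(0.75 * len(Z))
clean = np.argsort(out)[:h]                  # h-subset with smallest outlyingness
```

The planted outlier gets the largest outlyingness and is excluded from the clean $h$-subset.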
BACONPCA in low dimensions (n > p)
The Blocked Adaptive Computationally Efficient Outlier Nominators (BACON)
algorithm developed by Billor et al. [2] is used for this robust procedure. BACON
is a cost-effective, computationally fast method with a high breakdown point, based on
measuring robust distances from a basic subset that is free of outliers. The initial
basic subset is derived algorithmically in one of two ways: by Mahalanobis distances or by
Euclidean distances.
For BACONPCA in low dimensions, an initial dimension reduction of the mean-centered
data matrix $X_c$ is done by singular value decomposition (SVD):
\[ X_c = (X - 1_n \hat{\mu}') = U D \Gamma', \]
where $\hat{\mu}$ is the classical mean vector, $D$ is a $p \times p$ diagonal matrix based on the eigenvalues
of $X_c' X_c$, and $U'U = I_p = \Gamma'\Gamma$, with $\Gamma$ the matrix of eigenvectors corresponding
to the eigenvalues of $X_c' X_c$ and $I_p$ the $p \times p$ identity matrix. The next step is to obtain
the score matrix $Z = X_c \Gamma$. The robust mean $\hat{\mu}_B$ and the robust variance-covariance
matrix $\hat{\Sigma}_B$ are computed from the clean observations obtained from the BACON algorithm.
From the BACON-based covariance matrix $\hat{\Sigma}_B$, the number of robust PCs is determined
as $k_1$, and $\Gamma_1$ is the matrix of eigenvectors corresponding to the nonzero eigenvalues
of $\hat{\Sigma}_B$. Finally, the robust score matrix $Z_1 = (Z - 1_n \hat{\mu}_B') \Gamma_1$ is obtained [3].
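The BACON step of this pipeline can be sketched as follows. This is a simplified version: the basic subset grows by refitting the mean and covariance and admitting points below a chi-square distance cutoff, while the correction factors and subset-size rules of the full Billor et al. algorithm are omitted (an assumption for brevity).

```python
import numpy as np
from scipy.stats import chi2

def bacon_subset(Z, alpha=0.05, m=None, max_iter=50):
    """Simplified BACON sketch: grow a clean basic subset by repeatedly
    refitting mean/covariance and admitting points whose Mahalanobis
    distance falls below a chi-square cutoff."""
    n, p = Z.shape
    m = m or min(4 * p, n // 2)
    # initial basic subset: the m points nearest the coordinatewise median
    d0 = np.linalg.norm(Z - np.median(Z, axis=0), axis=1)
    subset = np.sort(np.argsort(d0)[:m])
    cut = np.sqrt(chi2.ppf(1 - alpha / n, p))
    for _ in range(max_iter):
        mu = Z[subset].mean(axis=0)
        S = np.cov(Z[subset], rowvar=False)
        diff = Z - mu
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff))
        new = np.flatnonzero(d < cut)
        if np.array_equal(new, subset):      # subset stabilized
            break
        subset = new
    return subset

rng = np.random.default_rng(3)
Z = rng.normal(size=(100, 3))
Z[:5] += 6.0                                 # five planted outliers
clean = bacon_subset(Z)
mu_B = Z[clean].mean(axis=0)
Sigma_B = np.cov(Z[clean], rowvar=False)
# robust scores: project centered data onto eigenvectors of Sigma_B
evals, Gamma1 = np.linalg.eigh(Sigma_B)
Z1 = (Z - mu_B) @ Gamma1[:, ::-1]            # eigenvectors in descending order
```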
BACONPCA in high dimensions (p > n)
In this case the BACONPCA method suggested by Billor et al. [3] is used on the
centered data $X_c$. The mean-centered data matrix $X_c$ is preprocessed by a singular
value decomposition (SVD) based on the eigenvalues and eigenvectors of $X_c X_c'$
instead of $X_c' X_c$, since the decomposition of $X_c X_c'$ is much faster. The score
matrix $Z = X_c \Gamma$ is then obtained, where $\Gamma$ is the matrix of eigenvectors
corresponding to the eigenvalues of $X_c X_c'$.
Since the BACON and MCD methods are useful only when $n > p$, they cannot be
used when $n < p$ to determine clean observations of $Z$, because of the singularity
of the covariance matrix. The Stahel-Donoho outlyingness measure is instead useful for
determining a clean set of $h$ observations of $Z$ (Hubert et al. [22]). The high-dimensional
data points $z_i$ are projected onto many univariate directions $v$. For every direction
$v$, a robust center $\hat{\mu}_r$ and robust standard deviation $\hat{\sigma}_r$ (based on the univariate
BACON method) are obtained for the projected observations $z_i'v$ ($i = 1, \ldots, n$). The
outlyingness measure based on these robust center and scale values is defined as
\[ \mathrm{Outl}(z_i) = \max_{v} \frac{| z_i' v - \hat{\mu}_r |}{\hat{\sigma}_r}, \qquad i = 1, \ldots, n. \]
This measure detects the points that are outlying along the projection directions, and it
yields a clean set of $h$ observations ($h = 0.75n$). For these $h$ observations, the
mean $\hat{\mu}_1$ and the scatter matrix $\hat{\Sigma}_1$ of the $Z$ matrix are obtained. The
spectral decomposition of $\hat{\Sigma}_1$ gives
\[ \hat{\Sigma}_1 = \Gamma_1 \Lambda_1 \Gamma_1', \]
where $\Gamma_1$ is the matrix of eigenvectors corresponding to the eigenvalues of $\hat{\Sigma}_1$ and
$\Lambda_1 = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$ is the diagonal matrix of its eigenvalues. The number
of principal components to retain, $k_0 < p$, is then determined using a technique such as
a scree plot. The data are projected onto the subspace spanned by the first $k_0$
eigenvectors of the covariance matrix $\hat{\Sigma}_1$, that is,
\[ Z_2 = (Z_{n \times p} - 1_n \hat{\mu}_1') \Gamma_{p \times k_0}, \]
where $\Gamma_{p \times k_0}$ is the matrix of the first $k_0$ eigenvectors of $\hat{\Sigma}_1$. Next, the BACON algorithm is
applied to find the mean vector $\hat{\mu}_B$ and scatter matrix $\hat{\Sigma}_B$ of the matrix $Z_2$. Based
on the robust covariance matrix $\hat{\Sigma}_B$, the robust PCs are obtained as
\[ Z_3 = (Z_2 - 1_n \hat{\mu}_B') \Gamma^*_{k_0 \times k_1}, \]
where $\Gamma^*_{k_0 \times k_1}$ is the matrix of eigenvectors corresponding to the first $k_1$ eigenvalues
of the robust BACON-based covariance matrix $\hat{\Sigma}_B$, with $k_1$ determined by a selection
criterion such as a scree plot [3].
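The projection step in this pipeline, from the clean $h$-subset scatter to the reduced scores $Z_2$, can be sketched as below. The clean subset here is a stand-in (an assumption): in the actual method it would come from the outlyingness screen described above, and $k_0$ from a scree plot.

```python
import numpy as np

# Scatter of a clean h-subset -> first k0 eigenvectors -> projected scores Z2.
rng = np.random.default_rng(4)
Z = rng.normal(size=(40, 10)) @ np.diag(np.linspace(3, 0.1, 10))
h = int(0.75 * len(Z))
clean = np.arange(h)                       # stand-in clean subset (assumption)
mu1 = Z[clean].mean(axis=0)
Sigma1 = np.cov(Z[clean], rowvar=False)
evals, Gamma1 = np.linalg.eigh(Sigma1)
order = np.argsort(evals)[::-1]            # sort eigenvalues descending
evals, Gamma1 = evals[order], Gamma1[:, order]
k0 = 3                                     # e.g., chosen from a scree plot
Z2 = (Z - mu1) @ Gamma1[:, :k0]            # centered projection onto k0 dims
```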
2.3 Classical and Robust Functional Principal Component Analysis and
Outlier Detection
When the dataset is in the form of curves, the procedure for PCA can be
generalized to functional principal component analysis (FPCA) to obtain the main modes
of variability of the curves.
2.3.1 Classical Functional Principal Component Analysis
Classical PCA generalizes to Functional Principal Component Analysis (FPCA)
by replacing the variable values $x_{ij}$ used in PCA with functional values $x_i(t)$,
so that the discrete index $j$ of the multivariate context becomes the continuous
index $t$. Unlike multivariate PCA, the components in FPCA are functions rather
than vectors, and summations over $j$ are replaced by integrations over $t$.
Let $\{x(t); t \in \mathcal{T}\}$ be a stochastic process, where $\mathcal{T}$ is an index set that is a
bounded interval on $\mathbb{R}$. The principal component score corresponding to a weight
function generalizes to an integral form,
\[ Z_i = \int \psi_j(t)\, x_i(t)\, dt. \qquad (2.2) \]
The weight function $\psi_j(t)$ is obtained by solving
\[ \max_{\langle \psi_j, \psi_m \rangle = I(j=m),\; m \le j} \; N^{-1} \sum_i \left( \int \psi_j x_i \right)^2, \qquad (2.3) \]
or, equivalently, by solving the functional eigenequation
\[ \int v(s,t)\, \psi(t)\, dt = \lambda\, \psi(s), \qquad \psi \in L^2, \qquad (2.4) \]
where $v$ is the covariance function of $x(t)$. The sequence of eigenfunctions $\psi_i$,
$i = 1, 2, \ldots$, sorted with respect to the corresponding eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots$, solves
the FPCA problem (2.3). The eigenequation is the same general equation as in PCA,
except that $\psi$ is now an eigenfunction rather than an eigenvector. There is, however,
a major difference between multivariate and functional eigenanalysis: in the multivariate
case there are $p$ (the number of variables) eigenvalue-eigenvector pairs, whereas in the
functional case there are infinitely many. In practice, the unknown covariance
function $v$ must be estimated from the sample curves $x_i(t)$, $1 \le i \le n$, where
each $x_i(t)$ is observed on a discrete set of points $t = \{t_1, \ldots, t_p\}$ for finite $p$.
The FPCA problem can be represented via a basis function approach, in which
the first $k$ basis functions of a basis $\{\phi_1, \ldots, \phi_k\}$ are used, where $k$ is large enough
that these functions can describe most of the features of the data. The basis is
selected based on the nature of the data; for example, a Fourier basis might be ideal
for smooth and periodic data, while B-splines might work better for data with many
local features. Each $x_i$ is approximated by
\[ \hat{x}_i(t) = \sum_{K=1}^{k} c_{iK}\, \phi_K(t). \qquad (2.5) \]
We can express all $n$ curves simultaneously by defining the vector-valued function $x$ to
have components $x_1, x_2, \ldots, x_n$ and the vector-valued function $\phi$ to have components
$\phi_1, \ldots, \phi_k$:
\[ x = C \phi, \qquad (2.6) \]
where the coefficient matrix $C$ is $n \times k$. In matrix terms the variance-covariance
function is
\[ v(s,t) = n^{-1} \phi(s)' C' C \phi(t). \qquad (2.7) \]
Define $W$ as a symmetric matrix of order $k$,
\[ W = \int \phi \phi'. \qquad (2.8) \]
Suppose that the weight function has the expansion
\[ \psi(s) = \sum_K b_K\, \phi_K(s), \qquad (2.9) \]
or, in matrix notation, $\psi(s) = \phi(s)' b$. Using equations (2.6)-(2.9), the left-hand side of
the eigenequation (2.4) becomes
\[ \int v(s,t)\, \psi(t)\, dt = \int n^{-1} \phi(s)' C' C \phi(t)\, \phi(t)' b\, dt
   = \phi(s)'\, n^{-1} C' C W b. \]
The eigenequation can therefore be written as
\[ \phi(s)'\, n^{-1} C' C W b = \lambda\, \phi(s)' b. \qquad (2.10) \]
As this equation holds for all $s$, it can be written in matrix form as
\[ n^{-1} C' C W b = \lambda b. \qquad (2.11) \]
The constraint $\|\psi\| = 1$ implies $b' W b = 1$, and similarly two functions $\psi_1$ and $\psi_2$ are
orthogonal if and only if the corresponding coefficient vectors satisfy $b_1' W b_2 = 0$. Defining
$u = W^{1/2} b$, the required principal components are obtained by solving the equivalent
symmetric eigenvalue problem
\[ n^{-1} W^{1/2} C' C W^{1/2} u = \lambda u \qquad (2.12) \]
and computing $b = W^{-1/2} u$ for each eigenvector. If the basis is orthonormal, then
$W = I$ and the functional PCA problem reduces to standard multivariate PCA of the
coefficient array $C$.
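With an orthonormal basis ($W = I$), the reduction to multivariate PCA can be sketched directly: eigenvectors $b$ of $n^{-1}C'C$ give the weight-function coefficients, so $\psi(s) = \phi(s)'b$. Centering of the coefficients is added here as is standard in PCA practice (an assumption; the derivation above works with $n^{-1}C'C$ as written).

```python
import numpy as np

# FPCA with an orthonormal basis reduces to PCA of the coefficient matrix C.
rng = np.random.default_rng(5)
n, k = 30, 6
C = rng.normal(size=(n, k)) * np.array([5, 3, 1, 0.5, 0.2, 0.1])
Cc = C - C.mean(axis=0)                    # center the coefficients
evals, B = np.linalg.eigh(Cc.T @ Cc / n)   # eigenpairs of n^{-1} C'C
evals, B = evals[::-1], B[:, ::-1]         # sort descending
scores = Cc @ B                            # functional PC scores Z_i
```

Each column of `B` holds the basis coefficients of one eigenfunction $\psi$, and the columns are orthonormal, matching $b'Wb = 1$ when $W = I$.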
In this section, we examined FPCA as a dimension reduction tool. Although
FPCA solves the dimensionality problem, it fails to deal with data containing outliers. In
the next section, a new robust FPCA method is given to overcome this
problem.
2.3.2 Robust Functional Principal Component Analysis and Outlier Detection
In this section, an outlier detection method for functional data via robust FPCA
is given. The goal of the robust FPCA method is to obtain functional principal components
that are less influenced by outliers. The outlier detection method proposed by Febrero et
al. [13] is discussed first, and then the construction of the robust FPCA method is described.
Outlier Detection using Likelihood Ratio Test
An outlier detection procedure for functional data using a Likelihood Ratio Test (LRT),
based on a distance measure, was developed by Febrero et al. [13]. Let the
functional sample be $x_1, \ldots, x_n$; the statistic is given as
\[ O_\alpha(x_i) = \left\| \frac{x_i - \hat{\mu}_{TM,\alpha}}{\hat{\sigma}_{TSD,\alpha}} \right\|,
   \qquad \Lambda = \max_{1 \le i \le n} O_\alpha(x_i), \]
where $\| \cdot \|$ is a norm in the functional space ($\| \cdot \|_1$, $\| \cdot \|_2$, or $\| \cdot \|_\infty$), $\hat{\mu}_{TM,\alpha}$ is the
$\alpha$-trimmed mean, and $\hat{\sigma}_{TSD,\alpha}$ is the $\alpha$-trimmed standard deviation. Hence $O_\alpha(x_i)$
is the distance between $x_i$ and $\hat{\mu}_{TM,\alpha}$ relative to $\hat{\sigma}_{TSD,\alpha}$. The presence of outliers is
determined by comparing the test statistic $\Lambda$ with a threshold, using an iterative
procedure.
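The statistic can be sketched on a discrete grid as follows. This is an illustrative version, not the authors' code: trimming is done pointwise at each grid point rather than by trimming whole curves, and the sup norm stands in for the generic functional norm (both assumptions).

```python
import numpy as np

def lrt_outlyingness(X, alpha=0.1):
    """O_alpha(x_i) = || (x_i - mu_trim) / sd_trim || on a discrete grid,
    using pointwise alpha-trimmed mean/sd and the sup norm (assumptions)."""
    n, p = X.shape
    k = int(np.floor(alpha * n))
    S = np.sort(X, axis=0)[k:n - k]          # trim k values at each tail, pointwise
    mu = S.mean(axis=0)                      # trimmed mean curve
    sd = S.std(axis=0, ddof=1)               # trimmed standard deviation curve
    O = np.max(np.abs(X - mu) / sd, axis=1)  # sup-norm distance per curve
    return O, O.max()

t = np.linspace(0, 1, 50)
rng = np.random.default_rng(6)
X = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(25, 50))
X[0] += 2.0                                  # one vertically shifted outlier curve
O, lam = lrt_outlyingness(X)
```

The shifted curve produces by far the largest $O_\alpha$ value, so $\Lambda$ flags it when compared with a threshold.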
Description of the Proposed Method
Consider the functions as processes in continuous time defined over an interval,
say $\mathcal{T} = [t_{\min}, t_{\max}]$. The $i$th functional observation is denoted
$x_i(t) \in L^2(\mathcal{T})$, $i = 1, \ldots, n$. In practice, it is impossible to observe functional
values in continuous time; we usually obtain data only on a finite, discrete grid
$t = \{t_1, t_2, \ldots, t_p\} \subset \mathcal{T}$, in the following manner:
\[ y_i = x_i(t) + \epsilon_i, \qquad 1 \le i \le n, \]
where $\epsilon_i$ is a random error (noise) with zero mean and variance function
$\sigma_i^2(t)$. For simplicity, we assume that all processes are observed at the same time
points, which are equally spaced on $\mathcal{T}$.
The function $x_i$ can be represented as a linear combination of the first $k$ basis
functions $\phi_K$, $K = 1, \ldots, k$ (with $k$ large enough, $k < p$), using the basis expansion
method given in Chapter 1:
\[ x_i(t) = \sum_{K=1}^{k} c_{iK}\, \phi_K(t), \]
where $\phi$ is the vector-valued function with components $\phi_1, \ldots, \phi_k$, and $C = [c_{iK}]$,
$1 \le i \le n$, $1 \le K \le k$, is the $n \times k$ coefficient matrix of the expansion. The
simultaneous expansion of all $n$ curves can be expressed in matrix notation as
\[ x = C \phi, \]
where $x$ is the vector-valued function with components $x_i$, $1 \le i \le n$.
We assume that the basis is orthonormal. To select the optimal number of basis
functions $k$, the GCV criterion developed by Craven and Wahba [9] is used, as described in
the following section.
Coefficient estimation: for the partially observed functions, the coefficients $c_{iK}$ are
computed by the least squares approach, minimizing, for $i = 1, \ldots, n$ and $K = 1, \ldots, k$,
\[ \sum_{i=1}^{n} \Big[ y_i(t) - \sum_{K=1}^{k} c_{iK}\, \phi_K(t) \Big]^2
   = (y - C\phi)'(y - C\phi)
   = \| y - C\phi \|^2. \]
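The least squares fit above can be sketched numerically. With $\Phi$ the $p \times k$ matrix of basis functions evaluated on the grid, each curve's coefficient vector solves an ordinary least squares problem; the Fourier-type basis below is an illustrative choice, not prescribed by the text.

```python
import numpy as np

t = np.linspace(0, 1, 100)
k = 5
# Fourier-type basis evaluated on the grid (illustrative choice)
Phi = np.column_stack([np.ones_like(t)] +
                      [f(2 * np.pi * (j + 1) * t) for j in range(2)
                       for f in (np.sin, np.cos)])
rng = np.random.default_rng(7)
true_C = rng.normal(size=(10, k))            # known coefficients for 10 curves
Y = true_C @ Phi.T + 0.01 * rng.normal(size=(10, len(t)))  # noisy observations
# least squares: each row of C solves min ||y_i - Phi c_i||^2
C, *_ = np.linalg.lstsq(Phi, Y.T, rcond=None)
C = C.T                                      # n x k coefficient matrix
```

With small observation noise, the recovered coefficient matrix `C` closely matches the generating coefficients.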
Since we deal with an orthonormal basis, the functional PCA problem
reduces to standard multivariate PCA of the coefficient array $C$ (see Section
2.3.1). Instead of dealing with FPCA directly, we apply classical PCA to $C$: applying PCA
to $C$ provides equivalent information about the structure of the covariance
function of the functional data $x(t)$, and outliers in $C$ correspond to outliers in the
functional data $x(t)$. Therefore, the diagnostic plots developed to detect outliers
with multivariate PCA methods can also be used to detect functional outliers.
The orthogonal-score diagnostic plot [22], a by-product of the robust PCA
method, is used for identification and classification of outliers. Using the robust PCA method,
we obtain robust scores $Z$ in the following manner:
\[ Z = C V, \]
where $Z$ is an $n \times k_1$ matrix, $C$ is the $n \times k$ matrix of coefficients, $V$ is the $k \times k_1$ matrix
of robust eigenvectors, and $k_1 \le k$. The number of components $k_1$ is selected based
on the eigenvalues, with a predetermined threshold of 90%: the optimal $k_1$ is the
smallest value for which the cumulative percentage of total variance is at least 90%.
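The 90% selection rule can be sketched as a small helper; the eigenvalues below are illustrative stand-ins for those of a robust covariance estimate.

```python
import numpy as np

def select_k1(eigvals, threshold=0.90):
    """Smallest k1 whose cumulative share of total variance reaches the
    threshold (90% in the text). Assumes eigvals are sorted descending."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, threshold) + 1)

eigvals = np.array([6.0, 2.5, 0.9, 0.4, 0.2])   # illustrative, sorted descending
k1 = select_k1(eigvals)                          # first k1 with >= 90% of variance
```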
Robust coe?cients are obtained by transforming the data back to