Robust Statistical Methods for the Functional Logistic Model
by
Melody B. Denhere
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
August 3, 2013
Keywords: functional data, outliers, diagnostics, logistic regression
Copyright 2013 by Melody B. Denhere
Approved by
Nedret Billor, Chair, Associate Professor of Mathematics and Statistics
Asheber Abebe, Associate Professor of Mathematics and Statistics
Peng Zeng, Associate Professor of Mathematics and Statistics
Abstract
Over the last decade or so, considerable interest has emerged in the field of functional data
analysis. This interest spans a broad spectrum of fields such as brain imaging studies,
biometrics, genetics, e-commerce and computer science. Statistical tools, models and methods
whose strength is in recognizing this structural aspect of data are being discussed and
developed, ranging from functional linear regression, functional ANOVA and functional
principal component analysis to functional outlier detection. In this work, we discuss statistical
methods for the functional logistic regression model, a model where the response is binary
and the covariate(s) functional. Essentially, we consider ways that allow the parameter
estimator to be resistant to outliers, in addition to eliminating the multicollinearity and high
dimensionality problems inherent in functional data. The methods include robust techniques
for estimating the parameter function of the model as well as diagnostic measures to assess
the fit of the model. Two estimation approaches are discussed; the first makes use of robust
principal component estimation techniques and the second uses a robust penalization
approach. Results from a simulation study and a real world example are also presented to
illustrate the performance of the proposed estimators.
Acknowledgments
The completion of this dissertation would not have been possible without the support
and encouragement from the following individuals who have been instrumental in my grad-
uate studies at Auburn University.
To my advisor, Dr. Nedret Billor, I am thankful for the many discussions and insights
provided in this work. For stirring the interest in this field of research, for knowing how
to keep me going and for always being available with immeasurable guidance, support and
encouragement. I sincerely appreciate that. To the supervisory committee made up of Drs.
Ash Abebe and Peng Zeng, whose knowledge and advice in this work were invaluable.
To all my fellow classmates, and in particular Guy-vanie Miakonkana, Achard Bindele,
Jasdeep Pannu, Brice Nguelifack, Pallavi Sawant, Ruchika Sabharwal and Julian Allagan;
the journey would not have been the same without you. To Rose and Dan Brauss, you were
God-sent! No words could ever describe how much I appreciate you both. For always being
there and believing in me, especially when I became discouraged. The road trips, the movie
nights, the endless talks, the many many many wonderful memories - I am forever grateful.
To my amazing sister Chiedza, you kept me sane through it all. Thank you for indulging
me and being everything a sister could ask for and more. To my brothers Godfrey,
Ngonidzashe and Michael, I appreciate your support and prayers. To my mom, for nurturing
my love for numbers at an early age and guiding me to the One who gave me wings to fly.
And to my dad, for seeing so much more in me than I ever did and showing me that even
the sky is not the limit!
This work is dedicated to the memory of my grandfather, the late B.M. Denhere, who
encouraged us to pursue education and excel in all we do. I know you would be proud
Sekuru. To God be the Glory, great things He has done! (Psalm 100:5)
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Functional Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Estimation of Smooth Functions . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Smoothing with Roughness Penalty . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Functional Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Estimation of Regression Parameters . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . 13
2.3.2 Principal Component Estimation . . . . . . . . . . . . . . . . . . . . 15
2.3.3 Ridge Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Robust Principal Component Functional Logistic Regression . . . . . . . . . . . 24
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Estimation of X(t) and β(t) . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Model Selection Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Canadian Weather Data Set . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Robust Penalized Functional Logistic Regression . . . . . . . . . . . . . . . . . . 45
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Estimation of X(t) and β(t) . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Robust Penalized Maximum Likelihood Estimation . . . . . . . . . . . . . . 48
4.3.1 Details of the Estimation Technique . . . . . . . . . . . . . . . . . . . 50
4.3.2 Tuning Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . 52
4.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.1 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4.2 Poblenou NOX Levels Data Set . . . . . . . . . . . . . . . . . . . . . 55
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5 Diagnostic Methods for the Functional Logistic Model . . . . . . . . . . . . . . 60
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Measures of Goodness-of-Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 Regression Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.4 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.4.1 Canadian Weather Diagnostics . . . . . . . . . . . . . . . . . . . . . . 67
5.4.2 Poblenou NOX Emission Diagnostics . . . . . . . . . . . . . . . . . . 68
6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
List of Figures
2.1 50 sample curves generated from the stochastic process X(t) = Z(t) + t/4 + E . 20
2.2 Comparison of the median β(t) estimates for the three different techniques against
the true parameter function sin(t + π/4). . . . . . . . . . . . . . . . . . . . . . 22
2.3 Median β(t) estimates (a) Maximum Likelihood Estimation (b) Principal Com-
ponent Estimation and (c) Ridge Estimation . . . . . . . . . . . . . . . . . . . . 23
3.1 50 sample curves generated from the overlapping stochastic process X_i(t) = a_{i1} +
a_{i2}t + W_i(t) and differentiated for each class of the response . . . . . . . . . . 32
3.2 Sampling curves for Models 1-4 at q = 5% . . . . . . . . . . . . . . . . . . . . . 34
3.3 Side-by-side boxplots on the effect of outlier curves on the maximum likelihood
estimate (MLE), classical PCA (CPCA) and robust PCA (RPCA) estimates of β₀ at
q = 5% contamination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Overlay comparison of the effect of outlier curves on classical PCR and robust
PCR when the first 4 PCs are included in the model at q = 5% contamination. . 38
3.5 The mean monthly temperatures for 23 Canadian weather stations used to predict
the risk of drought . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 Churchill weather station's temperatures are altered . . . . . . . . . . . . . . . 40
3.7 Parameter estimation without using principal component estimation . . . . . . 41
3.8 Parameter estimation when Churchill weather station's temperatures are shifted
and its effect on the principal component estimation methods . . . . . . . . . . 42
3.9 CPCA vs RPCA when Churchill weather station's temperatures are shifted . . . 42
4.1 Comparison of the penalized method and robust penalized method at differing
contamination levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 NOX emission levels for non-working (top) and working (bottom) days . . . . . 57
4.3 Functional form of the hourly trajectories with the outliers detected in the NOX
dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4 A comparison of the estimated β(t) with and without the 5 outliers for the non-
robust and robust penalized methods . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1 Index plots for the Canadian Weather data comparing residuals of the principal-
component based approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Weather stations with the larger residuals shown as peaks in Figure (5.1) . . . . 69
5.3 Diagnostic plots that show the influence of each observation on the different
diagnostic measures versus h_ii . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4 Index plot of the deviance residuals for the NOX penalized maximum likelihood
fitted models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5 Diagnostic plots that show the influence of each observation on the different
diagnostic measures versus h_ii for the NOX model . . . . . . . . . . . . . . . . 72
List of Tables
3.1 Median MSEB (standard error) for the estimation of the functional parameter
based on the optimum model for Model 1 and Model 2 . . . . . . . . . . . . . . 35
3.2 Median MSEB (standard error) for the estimation of the functional parameter
based on the optimum model for Model 3 and Model 4 . . . . . . . . . . . . . . 35
3.3 The model details for each estimation technique . . . . . . . . . . . . . . . . . . 40
3.4 Goodness of fit measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1 Mean MSEB (standard error) for the estimation of the functional parameter,
β(t), for Models 2, 3 and 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Chapter 1
Functional Data Analysis
1.1 Introduction
In the last decade, a substantial amount of attention has been drawn to the field of
functional data analysis, resulting in the development and generalization of many statistical
techniques for this type of data. Much work has been devoted to this field, with publications
on functional regression models forthcoming from James (22); Cardot and Sarda (4); Müller
and Stadtmüller (29); Escabias et al. (9); and Ferraty and Vieu (13), to name just a few. Ramsay
and Silverman (34) present functional data ideas and statistical methods in their book which
gave impetus to the functional data analysis community. Interest has come from a broad
spectrum of fields such as biometrics, genetics, e-commerce and computer science. Statistical
tools, models and methods whose strength is in recognizing this structural aspect of the data
have been discussed, ranging from functional linear regression, functional ANOVA and functional
principal component analysis to functional outlier detection. In fact, functional regression
methods have prompted a reexamination of some of the ways used to analyze longitudinal data.
There are many steps that arise in the analysis of functional data that differentiate it
from analysis approaches taken for non-functional data. An important first step in working
with functional data is understanding its properties.
The term "functional" is used to describe the intrinsic structure of the data; one thinks of
the observed data functions as whole entities and the analysis is based on these functions.
Naturally, the functional data is observed and recorded as discrete observations. However,
this data is viewed as having been generated by some function and so the analysis is made on
the function rather than viewing the data as a sequence of individual observations. Ramsay
and Silverman (34) discuss the details of the first steps in functional data analysis (FDA)
which include data representation issues, data registration and plotting pairs of derivatives.
In this chapter, we look at some of the major concepts that are integral in FDA. In particular,
we explore issues on estimation of the observed function from the discrete time points.
1.2 Estimation of Smooth Functions
We have already established that whilst analysis in FDA is based on observed functions,
in practice the data are recorded as discrete observations. Therefore, for n observations,
discrete recordings of (t_{ij}, X_{ij}) pairs are made, and X_{ij} is viewed as a snapshot of
the function X_i(t) at time t_j, where i = 1,…,n indexes the observations and
j = 1,…,n_i indexes the replicates for each observation. It is important to note
that time is not the only continuum over which functional data can be recorded. Other
continua such as space, position and frequency may be involved, and the term time is
being used loosely.
An initial step in FDA is the estimation of the function X_i(t) based on the observed
discrete points X_i(t_j). We assume that the function X_i(t) is smooth and therefore possesses
one or more derivatives. Without the smoothness property, not much is gained by treating
the data as functional rather than multivariate. This smoothness property ensures that any
two adjacent data values X_i(t_j) and X_i(t_{j+1}) are unlikely to be very different from each
other and are linked together to a certain extent. To simplify notation for this chapter,
we will denote the function to be estimated as X(t), since the estimation of the functional
observation from the discrete data is done independently for each i. There are several issues
to consider in estimating the smooth function X(t) from the discrete observed data. The
actual recorded data might be far from smooth due to the presence of measurement
error; the data might be sparsely sampled or few in number, making it difficult to give a
stable estimate of the function; and the time points at which the data are recorded might
be unevenly spaced. All these and other issues call for techniques that work around these
specific problems with the sampled data.
Since smoothness is a property that is essential for the function X(t), there is a need to
consider smoothing techniques in estimating this function from the raw discrete data in the
presence of measurement error or noise. The observed discrete data are expressed as

X_j = X(t_j) + ε_j,  j = 1,…,n_i,  i = 1,…,n,

where the measurement error ε_j introduces roughness into the observed data. One might choose
to filter out the noise or, alternatively, to leave the noise but ensure that the smoothness
property is met in the analysis of the results. The approach taken in this dissertation work
for estimating the smooth function X(t) is by way of some basis function system.
By definition, a basis function system is a set of known functions, φ_k, that are mathematically
independent of each other, with the property that one can approximate arbitrarily
well any function by taking a linear combination of a sufficiently large number of these
functions. Therefore, the basis function approach represents a function X(t) (in L²(T)) in terms
of K known basis functions φ_k as

X(t) = Σ_{k=1}^{K} c_k φ_k(t) = c′φ(t),    (1.1)

where c and φ are column vectors of length K. Two important issues arise from this basis
representation of a function. The first is the choice of the basis function
system. The determination of the appropriate basis function system is based on the
characteristics that are inherent in the data. In general, Fourier basis functions are used to model
periodic data and a B-spline basis is used for non-periodic data. Other basis choices such as
wavelets, trigonometric functions or even polynomial functions can also be used should that
be appropriate (Ramsay and Silverman (34)).
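As an illustration of the basis representation in (1.1), the sketch below evaluates X(t) = Σ c_k φ_k(t) for a Fourier basis; the grid, the dimension K and the coefficient vector c are hypothetical choices for demonstration, not values used elsewhere in this work.

```python
import numpy as np

def fourier_basis(t, K, period=1.0):
    """Evaluate K Fourier basis functions at the points t.

    Returns a len(t) x K matrix Phi with columns
    1, sin(w t), cos(w t), sin(2w t), ... where w = 2*pi/period.
    """
    t = np.asarray(t, dtype=float)
    Phi = np.ones((t.size, K))
    w = 2.0 * np.pi / period
    for k in range(1, K):
        m = (k + 1) // 2              # harmonic number
        if k % 2 == 1:
            Phi[:, k] = np.sin(m * w * t)
        else:
            Phi[:, k] = np.cos(m * w * t)
    return Phi

# X(t) = sum_k c_k phi_k(t) = Phi @ c for illustrative coefficients c
t = np.linspace(0.0, 1.0, 101)
K = 5
c = np.array([0.5, 1.0, -0.3, 0.2, 0.1])   # hypothetical coefficients
X = fourier_basis(t, K) @ c
```

Any other basis (B-splines, wavelets, polynomials) fits the same template: build the matrix of basis values and take a linear combination of its columns.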
A second important aspect in this representation is the determination of the dimension
of the expansion, K. An interpolation is achieved when K = n_i. Therefore, K can be viewed
as a smoothing parameter, since the degree to which the data X_j are smoothed is determined
by the number of basis functions. When an appropriate basis function system is selected
for the observed data, fewer basis functions are required to make a good approximation of
the function, i.e. the smaller K can be. There are two opposing considerations when it comes to
choosing the number, K, of basis functions. On the one hand, the larger the K, the better
the fit. This is, however, at the expense of fitting unnecessary noise that should ideally be
ignored. On the other hand, the smaller the K, the smoother the function (Ramsay and
Silverman (34)). The problem with this is that if K is made too small, then we risk
missing some important aspects of the function being estimated. Therefore, a
trade-off between fit and smoothness is necessary. An assortment of algorithms exists for this
purpose, including methods such as cross-validation and generalized cross-validation (Craven
and Wahba (7)).
In order to estimate the coefficients of the linear expansion in (1.1), the ordinary least
squares criterion can be used. This criterion is expressed as

SMSSE(X|c) = Σ_{j=1}^{n_i} [ X_j − Σ_{k=1}^{K} c_k φ_k(t_j) ]² = (X − Φc)′(X − Φc),    (1.2)

where Φ is the n_i × K matrix of basis function values φ_k(t_j). From this, the estimate, ĉ,
that minimizes SMSSE is

ĉ = (Φ′Φ)⁻¹Φ′X.
This approach is appropriate when we assume the standard model for error, i.e. the errors
ε_j are independently and identically distributed with mean zero and constant variance σ²_ε.
In the presence of non-stationary and/or autocorrelated errors, a weighted least squares
fit would be more appropriate. Thus, the weighted least squares estimate is

ĉ = (Φ′WΦ)⁻¹Φ′WX,

where W is a symmetric positive definite matrix that incorporates the unequal weighting of
the errors.
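The (weighted) least squares recovery of the basis coefficients from noisy discrete data can be sketched as follows; the test curve, noise level and polynomial basis are illustrative assumptions (the chapter itself favors Fourier or B-spline bases for real data).

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete noisy recordings X_j = X(t_j) + eps_j of a smooth curve
t = np.linspace(0.0, 1.0, 50)
X_true = np.sin(2.0 * np.pi * t)
X_obs = X_true + rng.normal(scale=0.1, size=t.size)

# Basis matrix Phi with entries phi_k(t_j) (polynomial basis, K = 6)
K = 6
Phi = np.vander(t, K, increasing=True)

# Ordinary least squares: c_hat = (Phi' Phi)^{-1} Phi' X
c_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ X_obs)

# Weighted least squares: c_hat = (Phi' W Phi)^{-1} Phi' W X
W = np.eye(t.size)              # identity weights reduce WLS to OLS
c_wls = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ X_obs)

X_hat = Phi @ c_ols             # the smoothed curve evaluated at t
```

With identity weights the two estimates coincide; a non-trivial W would downweight time points with larger or correlated errors.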
The next section looks at a more powerful option for approximating discrete data by a
function. The shortcoming of the (weighted) least squares criterion is that the degree
of smoothness is established in a discontinuous manner; better results can be obtained
with a roughness penalty.
1.3 Smoothing with Roughness Penalty
As discussed in the previous section, there are two competing objectives in function
estimation: good fit vs. smooth fit. The competing objectives can be viewed in terms of the
following statistical principle,

MSE = Bias² + Variance,

where MSE is the mean squared error, and the bias and sampling variance are defined as

Bias[X̂(t)] = X(t) − E[X̂(t)],
Var[X̂(t)] = E[(X̂(t) − E[X̂(t)])²].

Whilst an unbiased estimate is desirable, when the standard error model is not met, the
high variance associated with the curve will result in a compromised MSE. Therefore, the
MSE might be drastically reduced by sacrificing some bias so as to reduce some sampling
variance. This is achieved by imposing some smoothness on the estimated curve. The
roughness penalty makes explicit what is sacrificed in bias to achieve an improved MSE.
The penalized residual sum of squares defines the compromise of trading smoothness
for fit as

PENSSE(X|c) = (X − Φc)′W(X − Φc) + λ Pen(X(t)),

where c is the coefficient vector; Φ is the matrix of values of the appropriate basis functions;
W is a symmetric positive definite matrix that incorporates the unequal weighting of the
errors; λ is a smoothing parameter that measures the rate of exchange between fit to the
data and smoothness of the function; and Pen(X(t)) is a measure of the function's roughness.
The curvature of a function is usually used to quantify the idea of the roughness of a function.
Therefore, a natural measure of a function's roughness is given as

Pen(X(t)) = ∫_T [X″(s)]² ds = c′Pc,

where

P = ∫_T φ″(s){φ″(s)}′ ds

is the roughness penalty matrix. Thus, the objective function being optimized becomes

PENSSE(X|c) = (X − Φc)′W(X − Φc) + λc′Pc,

resulting in an estimate of the coefficient vector of

ĉ = (Φ′WΦ + λP)⁻¹Φ′WX.

When λ = 0, this reduces to the weighted least squares estimate. It can also be noticed
that as λ → ∞ and more weight is given to the roughness term, the estimated function
approaches the standard linear regression; as λ → 0, the estimated function approaches
an interpolation of the data. The selection of λ can be achieved by cross-validation and
generalized cross-validation methods, which will be discussed in more detail in later chapters
of this dissertation work.
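A minimal numerical sketch of the penalized estimate ĉ = (Φ′WΦ + λP)⁻¹Φ′WX, with the penalty matrix P approximated by quadrature over the second derivatives of the basis functions; the monomial basis, the grid and the value of λ are illustrative assumptions (in practice λ would be chosen by (generalized) cross-validation, as noted above).

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy discrete observations of a smooth curve
t = np.linspace(0.0, 1.0, 60)
X_obs = np.sin(2.0 * np.pi * t) + rng.normal(scale=0.2, size=t.size)

K = 8

def poly_basis(s, K):
    """Monomial basis 1, s, s^2, ..., s^(K-1)."""
    return np.vander(np.asarray(s, dtype=float), K, increasing=True)

def poly_basis_dd(s, K):
    """Second derivatives of the monomials: k(k-1) s^(k-2)."""
    s = np.asarray(s, dtype=float)
    D = np.zeros((s.size, K))
    for k in range(2, K):
        D[:, k] = k * (k - 1) * s ** (k - 2)
    return D

# Roughness penalty matrix P = int phi''(s){phi''(s)}' ds (trapezoid rule)
s_fine = np.linspace(0.0, 1.0, 2001)
w = np.full(s_fine.size, s_fine[1] - s_fine[0])
w[0] *= 0.5
w[-1] *= 0.5
Dd = poly_basis_dd(s_fine, K)
P = Dd.T @ (Dd * w[:, None])

Phi = poly_basis(t, K)
W = np.eye(t.size)              # identity weights for illustration
lam = 1e-6                      # smoothing parameter lambda

# Penalized estimate: c_hat = (Phi' W Phi + lam P)^{-1} Phi' W X
c_hat = np.linalg.solve(Phi.T @ W @ Phi + lam * P, Phi.T @ W @ X_obs)
X_hat = Phi @ c_hat             # the smoothed curve at the observation points
```

Increasing λ shrinks the fitted curvature toward a straight-line fit, while λ → 0 recovers the weighted least squares estimate.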
1.4 Conclusion
In this chapter we introduced the concept of functional data and discussed some of the
pertinent issues in the field of functional data analysis. The rest of this work is organized
as follows. Chapter 2 gives some background on the functional logistic model and discusses
some of the estimation techniques in the literature, as well as making a case for the contribution
of this work. Chapter 3 explores the concept of robust principal component estimation which
is one of the results of this work. The estimation technique is developed and its performance
considered in a simulation study as well as with a real world example. Chapter 4 examines
and proposes a robust penalized approach to the estimation of the parameter function in the
functional logistic model, with simulation results and application to a real world example.
Chapter 5 studies diagnostic issues for the functional logistic model and discusses some
model validation methods, extending results from the standard logistic model. In the last
chapter, there is a discussion of other aspects of the functional logistic model that may be
explored as future work.
Chapter 2
Functional Logistic Regression
2.1 Introduction
Ramsay and Silverman (34) discussed functional linear regression models, a case in which
the linear relationship between random variables and functions is explored. The different
models that can arise from this set-up include:

- A functional response and scalar independent variable(s). In this model, the observed
  values are of the form {X_{ij}, Y_i(t) : t ∈ T} for i = 1,…,n, j = 1,…,p, where T is the
  support of the functional predictors. The relationship between the functional response
  and scalar predictors can be formulated as

  Y_i(t) = Σ_{j=1}^{p} X_{ij} β_j(t) + ε_i(t),  i = 1,…,n.

- A scalar response and functional independent variable(s). The observed values in
  this case are of the form {X_i(t), Y_i} for i = 1,…,n, and this relationship is formulated as

  Y_i = ∫_T X_i(t)β(t)dt + ε_i,  i = 1,…,n.

- A functional response and functional independent variable(s). This last form occurs
  when both the response and predictor are functional, i.e. {X_i(t), Y_i(t)} for i = 1,…,n.
  The model is defined as

  Y_i(t) = ∫_T X_i(t)β(t)dt + ε_i(t),  i = 1,…,n.
Our work is focused on the functional logistic regression (FLR) model, where the response
is binary and the predictor(s) functional. Different approaches have been developed
for the estimation of the parameters of the functional logistic model.
Escabias et al. (9) discuss two approaches to parameter estimation that employ principal
component estimation. James (22) and Müller and Stadtmüller (29) discuss the generalized
functional linear model and consider estimation methods for its parameters. These estimation
techniques, however, are not resistant to outliers, and thus there is a need for
robust estimation methods.
We focus on functional logistic regression in particular as there are some interesting
problems in FDA that could benefit from this model. In addition to modeling functional
covariates with binary responses, functional logistic regression is also useful for classification
methods. An increasing amount of research is focusing on functional logistic regression and
its application to functional Magnetic Resonance Imaging (fMRI) data. Ratcliffe et al. (36),
Reiss et al. (37), Escabias et al. (10), Leng and Müller (25) and Tian (41) are some examples
of work that deal with applications of the functional logistic model to a variety of areas
including fMRI data.
The presence of outliers in any data set is almost inevitable, and their effect on the
model unquestionable. One school of thought proposes developing outlier detection methods
and using these to eliminate identified outliers when fitting the model. However, there are
many drawbacks to this approach. Firstly, there might be sample curves that are just at the
limit of outlyingness, and their removal from the data set might be too drastic an approach
to take. Over and above this, the process might need to be repeated before the data
set can be declared outlier-free. Secondly, there might be no consistency among different
researchers in the approaches taken to eliminate outliers, and therefore there is an element
of subjectivity. Lastly, making inferences from a model where data have been removed
is problematic due to the dependence between the "good" observations and
the "bad" observations (Victoria-Feser (44)). Therefore, robust estimation methods are more
viable in that the effect of the outliers is minimized without removing them from the data
set. Whilst there has been some work that addressed estimation of the parameter function
for this model, as cited above, to our knowledge there is no contribution to the field as far
as robust estimation methods for this model are concerned. It is with this void in mind that
this dissertation work is dedicated to contributing to the field of functional data analysis by
proposing robust estimation methods for the functional logistic model.
2.2 The Model
We consider independent functional covariates X₁(t),…,X_n(t), where t ∈ T
and T is the support of the sample curves X_i(t), i = 1,2,…,n; and a random sample of
a binary response variable Y given by Y₁,…,Y_n, that is, Y_i ∈ {0,1}, i = 1,2,…,n. Then the
random variable Y is such that the observations satisfy

Y_i ~ Bernoulli(π_i),

where π_i is the probability of a positive response given X_i(t), which is given as

π_i = P(Y = 1 | {X_i(t) : t ∈ T})
    = exp{β₀ + ∫_T X_i(t)β(t)dt} / (1 + exp{β₀ + ∫_T X_i(t)β(t)dt}),  i = 1,…,n.

β₀ is a real parameter and β(t) is a smooth function of t, both of which are unknown
parameters. The logit transformation is

l_i = log( π_i / (1 − π_i) ) = β₀ + ∫_T X_i(t)β(t)dt,  i = 1,…,n.    (2.1)
This can be viewed in the more general sense as a generalized functional linear model with
the logit link function. In this work, we consider the no-intercept model (i.e. β₀ = 0).
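To make the model concrete, the sketch below computes π_i for one sampled curve by approximating ∫_T X_i(t)β(t)dt with the trapezoid rule, under the no-intercept convention; the curve X_i(t) and the parameter function β(t) used here are hypothetical.

```python
import numpy as np

def flr_probability(x_vals, beta_vals, t_grid):
    """P(Y = 1 | X) for the no-intercept functional logistic model:
    eta = int_T X(t) beta(t) dt, approximated by the trapezoid rule,
    followed by the logistic transform."""
    f = x_vals * beta_vals
    eta = np.sum((f[:-1] + f[1:]) * np.diff(t_grid)) / 2.0
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical sampled curve X_i(t) and parameter function beta(t)
t = np.linspace(0.0, np.pi, 200)
x = np.sin(t)                        # X_i(t) sampled on the grid
beta = np.full(t.size, 0.5)          # a flat beta(t) = 0.5 for illustration
pi_i = flr_probability(x, beta, t)   # eta ~ 0.5 * int_0^pi sin(t) dt = 1
```

Since the integrand here gives η ≈ 1, the resulting probability is approximately 1/(1 + e⁻¹) ≈ 0.731.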
2.3 Estimation of Regression Parameters
Estimation of the parameters cannot be achieved by the usual method of maximum
likelihood [Ramsay and Silverman (34)], necessitating a different approach for functional
data. In particular, we consider the functional covariates, X_i(t), and the functional parameter,
β(t), as belonging to finite-dimensional spaces generated by (not necessarily the same)
bases. We assume X_i(t) ∈ L²(T), the space of square-integrable functions with the inner product

⟨f, g⟩ = ∫_T f(s)g(s)ds,  ∀ f, g ∈ L²(T),

such that

X_i(t) = Σ_{j=1}^{K_X} c_{ij} φ_j(t),  i = 1,…,n,    (2.2)

where {φ_j(t)} is an appropriate basis which is selected to reflect the characteristics of the
data.
With this assumption, we are able to reconstruct the functional form of the functional
covariates from the observed discrete points using two different approaches. On the one
hand, if the functional covariate(s) is observed with some measurement error, then the ith
subject at the kth replication is

X_{ik} = X_i(t_k) + ε_i(t_k),  k = 0,1,…,n_i,  i = 1,…,n,

where ε(t) and X(t) are independent. In this case, where the functional covariate is observed
with some noise, we use a least squares approximation approach to obtain the functional
form of the covariates by approximating the basis coefficients {c_{ij}} from the discrete
observations. On the other hand, if the functional covariate(s) is observed without error, then the
ith subject at the kth replication is

X_{ik} = X_i(t_k),  k = 0,…,n_i,  i = 1,…,n.

In this case, some interpolation method such as cubic spline interpolation can be used to get
the functional form of the predictors.
We also define

β(t) = Σ_{k=1}^{K_b} b_k ψ_k(t),    (2.3)

where {ψ_k(t)}, k = 1,…,K_b, is a known basis system.
Since estimates for the basis coefficients {c_{ij}} can be found either by smoothing (as
discussed in Chapter 1) or by interpolation, using the basis expansions of the covariates and
parameter function as defined in (2.2) and (2.3) in the regression model (2.1), the functional
logistic model reduces to a standard multiple logistic one,

l_i = β₀ + ∫_T X_i(t)β(t)dt = β₀ + Σ_{j=1}^{K_X} Σ_{k=1}^{K_b} c_{ij} ψ_{jk} b_k,  i = 1,…,n,

where ψ_{jk} = ∫_T φ_j(t)ψ_k(t)dt for j = 1,…,K_X, k = 1,…,K_b; c_{ij} is a basis coefficient; β₀ is an
unknown real parameter; and b_k is the unknown basis coefficient used to estimate the parameter
function β(t). In matrix form, this can be written as

L = β₀1 + CΨb,    (2.4)

where L = (l₁,…,l_n)′, C = {c_{ij}}_{n×K_X}, Ψ = {ψ_{jk}}_{K_X×K_b}, 1 = (1,…,1)′ and b = (b₁,…,b_{K_b})′.
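The entries ψ_{jk} = ∫_T φ_j(t)ψ_k(t)dt and the reduced design matrix CΨ can be approximated numerically; the monomial bases and the random coefficient matrix C in the sketch below are illustrative assumptions only.

```python
import numpy as np

# Monomial bases phi_j(t) = t^(j-1) and psi_k(t) = t^(k-1) on T = [0, 1]
K_X, K_b = 4, 3
t = np.linspace(0.0, 1.0, 2001)
w = np.full(t.size, t[1] - t[0])    # trapezoid quadrature weights
w[0] *= 0.5
w[-1] *= 0.5

Phi_vals = np.vander(t, K_X, increasing=True)    # columns phi_j(t)
Psi_vals = np.vander(t, K_b, increasing=True)    # columns psi_k(t)

# psi_jk = int_T phi_j(t) psi_k(t) dt, giving the K_X x K_b matrix Psi
Psi = Phi_vals.T @ (Psi_vals * w[:, None])

# Reduced design matrix C Psi for hypothetical basis coefficients c_ij
rng = np.random.default_rng(2)
C = rng.normal(size=(10, K_X))                   # n = 10 curves
design = C @ Psi                                 # n x K_b
```

For monomials on [0, 1] the entries are known exactly, ψ_{jk} = 1/(j + k − 1) with 1-based indices, which provides a useful check on the quadrature.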
Three different approaches to estimation can be discussed for this model (2.4). Firstly,
there is maximum likelihood estimation (MLE), which results in unstable and inaccurate
estimators due to the highly collinear nature of the covariates. To deal with this shortcoming
of the MLE, the second approach of principal component estimation, as discussed by Escabias
et al. (9), can be considered. Alternatively, a third approach is the penalized maximum
likelihood estimation approach, which is also explored in this work; an empirical comparison
of the three approaches is given by way of a simulation study.
2.3.1 Maximum Likelihood Estimation
The FLR model in (2.4) is now in the same form as the standard logistic regression model
and, therefore, the maximum likelihood parameter estimates can be found. The responses,
Y_i, are assumed to follow a Bernoulli(π_i) distribution, such that the log-likelihood function
can be written as

L(β; Y) = Σ_{i=1}^{n} [ Y_i log( π_i / (1 − π_i) ) + log(1 − π_i) ].    (2.5)

The derivative of this log-likelihood function with respect to π_i is

∂L/∂π_i = Σ_{i=1}^{n} (Y_i − π_i) / (π_i(1 − π_i)).

Using the fact that

log( π_i / (1 − π_i) ) = l_i = β₀ + C′_i Ψb,

together with the chain rule, the derivative of the log-likelihood with respect to β₀ is

∂L/∂β₀ = Σ_{i=1}^{n} (∂L/∂π_i)(∂π_i/∂l_i)(∂l_i/∂β₀)
       = Σ_{i=1}^{n} [ (Y_i − π_i) / (π_i(1 − π_i)) ] π_i(1 − π_i) · 1
       = Σ_{i=1}^{n} (Y_i − π_i).
Similarly, the derivative of the log-likelihood with respect to b is

∂L/∂b = Σ_{i=1}^{n} (∂L/∂π_i)(∂π_i/∂l_i)(∂l_i/∂b)
      = Σ_{i=1}^{n} [ (Y_i − π_i) / (π_i(1 − π_i)) ] π_i(1 − π_i) Ψ′C_i
      = Σ_{i=1}^{n} (Y_i − π_i)Ψ′C_i.

Therefore the likelihood equations for model (2.4) are:

Σ_{i=1}^{n} [Y_i − π_i] = 0,    (2.6)

Σ_{i=1}^{n} [(Y_i − π_i)Ψ′C_i] = 0.    (2.7)

Since these equations are non-linear in the parameters of interest, β₀ and b, an iterative
method is used to solve them and obtain estimates of the parameters. One such
method is the Newton-Raphson method, which is used to solve the non-linear likelihood
equations

X′(Y − π) = 0,
where, from (2.4), $X = (\mathbf{1} \mid C\Psi)$, $Y = (Y_1,\ldots,Y_n)'$ and $\pi = (\pi_1,\ldots,\pi_n)'$. The approximated parameter function estimate for our functional logistic model will then be,
$$ \hat\beta(t) = \sum_{k=1}^{K_b} \hat b_k \theta_k(t) = \hat b'\theta, \qquad (2.8) $$
where $\theta$ is the vector of known basis functions and $\hat b$ is the maximum likelihood estimate from the reduced functional logistic model (2.4).
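To make the Newton-Raphson step concrete, the following is a minimal pure-Python sketch for a model with an intercept and a single scalar covariate (a stand-in for the columns of $C\Psi$); `fit_logistic_newton` and the simulated data are illustrative assumptions, not part of the dissertation.

```python
import math, random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))          # guard against overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_newton(x, y, iters=15):
    """Newton-Raphson for a logistic model with intercept b0 and slope b1,
    solving the score equations sum(y_i - pi_i) = 0 and
    sum((y_i - pi_i) x_i) = 0 described in the text."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # score vector (gradient of the log-likelihood)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # observed information (negative Hessian), a 2x2 matrix
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta + H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(7)
x = [i / 50.0 for i in range(-100, 100)]
y = [1 if rng.random() < sigmoid(0.5 + 2.0 * xi) else 0 for xi in x]
b0_hat, b1_hat = fit_logistic_newton(x, y)
```

With well-separated classes the iteration can diverge, which is one symptom of the instability discussed next; here the data overlap enough for a stable fit.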
The estimation of the parameter function obtained using the maximum likelihood approach is not very accurate in the presence of highly correlated data. In fact, the design matrix $C\Psi$ in (2.4) has high correlation among its columns. Thus, there is a need to eliminate multicollinearity in order to obtain a more reliable estimate of the parameter function, $\beta(t)$, in our functional model. One such approach is discussed by Escabias et al. (9) and uses the idea of principal component estimation. Another well-known adapted approach, which we explore in this chapter, is ridge estimation.
2.3.2 Principal Component Estimation
The reduced functional logistic model in (2.4) is considered and, in order to deal with the multicollinearity problem highlighted, the principal components (PCs) of the design matrix $C\Psi$ are utilized instead. We let $Z = \{\xi_{ij}\}_{n\times K_b}$ be the matrix of PCs of the design matrix, such that
$$ Z = C\Psi V, $$
where $V$ is a $K_b\times K_b$ matrix whose columns are the eigenvectors associated with the eigenvalues of the covariance matrix of $C\Psi$. Then the model becomes,
$$ L = \beta_0\mathbf{1} + Z\gamma, \qquad (2.9) $$
where $\gamma = V'b$. As with the MLE, the parameter function $\beta(t)$ is then estimated as,
$$ \hat\beta(t) = \hat b'\theta, $$
where $\hat b = V\hat\gamma$. A selected number of PCs are included in the model instead, so that (2.9) becomes,
$$ L^{(s)} = \beta_0^{(s)}\mathbf{1} + Z^{(s)}\gamma^{(s)}, \qquad (2.10) $$
where $s$ denotes the number of selected PCs retained in the model. The number of selected PCs can be determined either by the cross-validation method or the generalized cross-validation method. These and other methods for selecting the number of PCs are discussed in greater detail in Chapter 3.
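As an illustration of the principal component step, the sketch below mean-centers a small design matrix, extracts the leading PC by power iteration (a simple stand-in for the eigendecomposition of the covariance matrix described above), and returns the scores that would replace the collinear columns; `pc_scores` and the toy matrix are assumptions of this sketch.

```python
import math

def pc_scores(X, n_pc=1, iters=200):
    """Leading principal component scores of a data matrix X (list of rows),
    via mean-centering, a sample covariance matrix, and power iteration."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # sample covariance matrix of the centered columns
    S = [[sum(Xc[i][a] * Xc[i][b_] for i in range(n)) / (n - 1)
          for b_ in range(p)] for a in range(p)]
    scores = []
    for _ in range(n_pc):
        v = [1.0] * p
        for _ in range(iters):  # power iteration -> leading eigenvector
            w = [sum(S[a][b_] * v[b_] for b_ in range(p)) for a in range(p)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            v = [wi / norm for wi in w]
        lam = sum(v[a] * sum(S[a][b_] * v[b_] for b_ in range(p))
                  for a in range(p))
        scores.append([sum(Xc[i][a] * v[a] for a in range(p))
                       for i in range(n)])
        # deflate to expose the next component
        S = [[S[a][b_] - lam * v[a] * v[b_] for b_ in range(p)]
             for a in range(p)]
    return scores

# two nearly collinear columns: the first PC captures almost all variability
X = [[t, 2 * t + 0.01 * ((-1) ** t)] for t in range(20)]
z1 = pc_scores(X, n_pc=1)[0]
```

For these two nearly collinear columns, the variance of the first score vector is close to the total variance, which is why a few PCs can replace the full collinear design.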
2.3.3 Ridge Estimation
A second approach that can be used to deal with the highly correlated covariates in model (2.4) takes the form of penalized maximum likelihood estimation. Le Cessie and van Houwelingen (23) extended the ridge regression theory used in standard linear regression to logistic regression. We adopt this approach to define a ridge estimator for the reduced functional logistic regression model. In earlier work, Schaefer (39) also discussed the ridge logit estimator to handle collinear data. In that work, the parameter estimator is given as,
$$ \hat\beta = (X'VX + \lambda I)^{-1}X'VX\,\hat\beta_{MLE}, $$
where $\hat\beta_{MLE}$ denotes the maximum likelihood estimator of the parameter; $X$ is the design matrix; $V$ is the diagonal matrix of MLEs of the success probabilities; $I$ is the identity matrix and $\lambda$ is the ridge parameter.
We consider the reduced functional logistic regression model (2.4) obtained by considering the functional observations, $X_i(t)$, and the parameter function, $\beta(t)$, as belonging to the finite-dimensional space generated by the bases $\{\phi_j(t)\}_{j=1}^{K_X}$ and $\{\theta_k(t)\}_{k=1}^{K_b}$, respectively. Under this model, the log-likelihood function as stated earlier is,
$$ L(\beta;Y) = \sum_{i=1}^{n}\left[ Y_i\log\left(\frac{\pi_i}{1-\pi_i}\right) + \log(1-\pi_i)\right]. $$
Using the approach discussed by Le Cessie and van Houwelingen (23), we consider the log-likelihood function above with a penalty on the norm of $b$, the unknown parameter in model (2.4), as,
$$ L_\lambda(\beta;Y) = L(\beta;Y) - \lambda\|b\|^2, \qquad (2.11) $$
where $\|b\| = \left(\sum_k b_k^2\right)^{1/2}$. Thus, instead of maximizing the log-likelihood function (2.5), we maximize $L_\lambda(\beta;Y)$, with a penalty on the norm of $b$. In this case, $\lambda$ controls the shrinkage of the norm of the parameter $b$. When $\lambda = 0$, the solution is the ordinary MLE. As the ridge parameter, $\lambda$, approaches $\infty$, the $b_k$'s tend to 0. As with the concept of ridge regression, the idea is to introduce some bias so as to obtain a biased estimator whose variance is smaller than that of the unbiased estimator (the MLE in this case). The choice of $\lambda$ would be such that MSE($\hat\beta_\lambda$) is smaller than MSE($\hat\beta_{MLE}$).

Several measures can be used to assess the predictive ability of the fitted model:

(i) Classification Error,
$$ CE = \begin{cases} 1 & : Y_{new} = 1 \text{ and } \hat\pi_{new} < 0.5, \\ \tfrac{1}{2} & : \hat\pi_{new} = 0.5, \\ 0 & : \text{otherwise.} \end{cases} $$
(ii) Squared Error,
$$ SE = (Y_{new} - \hat\pi_{new})^2. $$
(iii) Minus Log-Likelihood Error,
$$ ML = -\left\{Y_{new}\log(\hat\pi_{new}) + (1 - Y_{new})\log(1-\hat\pi_{new})\right\}, $$
where $Y_{new}$ denotes the response for a new observation and $\hat\pi_{new}$ denotes the estimated probability that the response is positive for a new covariate $X_{new}(t)$. Other measures include the cross-validation method, and this is the criterion we used in the simulation study
presented in the next section. This is defined for the mean squared error as,
$$ MSE_{CV} = \frac{1}{n}\sum_{i=1}^{n}\left\{Y_i - \hat\pi_{(-i)}\right\}^2, $$
where $\hat\pi_{(-i)}$ is the estimated probability of a positive response when the model is fit without the $i$th observation.
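The three prediction-error measures can be sketched directly in code; note that the classification error below is symmetrized to also count $Y_{new} = 0$ with $\hat\pi_{new} > 0.5$ as an error, which is an assumption beyond the definition given above.

```python
import math

def classification_error(y_new, p_new):
    """CE: 1 for a misclassification, 1/2 at p = 0.5, else 0.
    Symmetrized over both response classes (an assumption of this sketch)."""
    if p_new == 0.5:
        return 0.5
    wrong = (y_new == 1 and p_new < 0.5) or (y_new == 0 and p_new > 0.5)
    return 1.0 if wrong else 0.0

def squared_error(y_new, p_new):
    # SE = (Y_new - pi_new)^2
    return (y_new - p_new) ** 2

def minus_log_likelihood(y_new, p_new):
    # ML = -{Y_new log(pi_new) + (1 - Y_new) log(1 - pi_new)}
    return -(y_new * math.log(p_new) + (1 - y_new) * math.log(1 - p_new))

# a positive observation that the model rated as unlikely
ce = classification_error(1, 0.2)
se = squared_error(1, 0.2)
ml = minus_log_likelihood(1, 0.2)
```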
We compare the performance of each of these estimation methods discussed in this
section by way of a simulation study.
2.4 Simulation Study
The following steps were carried out in this simulation study in order to investigate how the different estimation techniques discussed in the previous section compare in the estimation of the parameters of the functional logistic regression model.
Step 1: Generate the simulated data.
(a) Generate n sample curves for the functional logistic model.
We generate 50 sample functional observations of a known stochastic process $X(\cdot)$, considered over the interval $[0,10]$ with 21 equally spaced time points. We define this stochastic process as $X(t) = Z(t) + t/4 + E$, where $Z(t)$ is a zero-mean Gaussian stochastic process with covariance function $C(s,t) = \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^{80|s-t|}$ and $E$ is a Bernoulli random variable with $p = 0.1$. This model is illustrated in Figure 2.1. To obtain the functional form of these sample curves, we consider them to belong to the finite-dimensional space generated by a cubic B-spline basis defined on equally spaced nodes on the $[0,10]$ interval. We used the generalized cross-validation (GCV) method to determine the number of basis functions ($K_X$) to use, as well as the smoothing parameter ($\lambda$), in the estimation of the functional form of the covariate, $X(t)$. The criterion is expressed as,
$$ \mathrm{GCV}(\lambda) = \left(\frac{n}{n - df(\lambda)}\right)\left(\frac{SSE}{n - df(\lambda)}\right), $$
Figure 2.1: 50 sample curves generated from the stochastic process $X(t) = Z(t) + t/4 + E$.
where $df(\lambda) = \mathrm{trace}\{S_{\phi,\lambda}\}$ is the equivalent degrees of freedom and $S_{\phi,\lambda}$ is the projection operator such that $\hat X = S_{\phi,\lambda}X$; $SSE$ is the residual sum of squares; and $\lambda$ is the smoothing parameter that controls the trade-off between fit to the data and smoothness. The optimal number of basis functions was achieved at $K_X = 10$ with $\lambda = 2048$, which is what was used for all the replications.
(b) Select a parameter function.
The natural cubic spline interpolation of the parameter function $\sin(t + \pi/4)$ is selected. The basis coefficients, $b = (b_1,\ldots,b_{K_b})'$, of the parameter function are known and are used to assess the estimation techniques presented in this work.
(c) Generate the values of the response variable.
The probabilities are,
$$ \pi_i = \frac{\exp\{l_i\}}{1 + \exp\{l_i\}}, $$
where $l_i$ is as defined in (2.1), $\beta_0$ is fixed at 0 and $i = 1,\ldots,n$. The $n$ values of the response are obtained by simulating observations of a Bernoulli distribution with probabilities $\pi_i$. The reduced functional logistic model in (2.4) was fit with the coefficient matrix as the covariate. The Hosmer-Lemeshow goodness-of-fit test was also carried out [Hosmer and Lemeshow (20)], which confirmed the validity of the model for the generated data.
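Step 1(c) above can be sketched as follows; `simulate_responses` and the example logits are illustrative, not taken from the simulation code.

```python
import math, random

def simulate_responses(logits, seed=0):
    """Simulate Y_i ~ Bernoulli(pi_i) from given logits l_i, with
    pi_i = exp(l_i) / (1 + exp(l_i)), as in step (c) of the text."""
    rng = random.Random(seed)
    probs = [math.exp(l) / (1.0 + math.exp(l)) for l in logits]
    ys = [1 if rng.random() < p else 0 for p in probs]
    return probs, ys

# hypothetical logits l_i = beta_0 + integral X_i(t) beta(t) dt
logits = [-2.0, -0.5, 0.0, 0.5, 2.0]
probs, ys = simulate_responses(logits)
```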
Step 2: Obtain the approximated estimates of the parameter function.
(a) Use maximum likelihood estimation with the coefficient matrix as the covariate of the logistic model.
(b) Perform PCA on the covariates in (2.4) and fit the logistic model with the retained $s$ PCs as the covariates.
(c) Use penalized maximum likelihood estimation with the coefficient matrix as the covariate.
Small values of MSEB indicate better estimation when the three different estimation techniques are compared.
All simulations were implemented using R [R Development Core Team (33)]. The packages fda [Ramsay et al. (35)], rrcov [Todorov (42)] and penalized [Goeman (17)] were particularly useful in obtaining the functional form of the data, performing principal component estimation and performing penalized maximum likelihood estimation, respectively. The cross-validated likelihood criterion was used to determine the optimal shrinkage parameter for the penalized ML estimation.
The median estimates of $\beta(t)$ for 250 simulations are shown in Figures 2.2 and 2.3, illustrating how the estimates compare with the true parameter function. As is evident in the graphs, the ML estimation method does not give accurate estimates, since no consideration is taken of the multicollinearity issue that is inherent in the design matrix.
Figure 2.2: Comparison of the median $\beta(t)$ estimates for the three different techniques against the true parameter function $\sin(t + \pi/4)$.
Both the PC-based and penalized ML estimation approaches give more accurate results, with the estimates of the parameter function being much closer to the true $\beta(t)$ function. This is especially evident when the mean $\beta(t)$ estimates (results not shown) are used. The median MSEB for each of the techniques is 2.1311 for MLE, 0.1762 for PC estimation, and 0.2934 for ridge estimation.
2.5 Conclusion
In this chapter, we have introduced the FLR model and reviewed some of the estimation methods discussed in the literature. It has been shown that, due to the formulation of the design matrix of the FLR model developed in this work, the estimation of the parameter function, $\beta(t)$, is greatly affected by the presence of multicollinearity. Two approaches discussed to minimize the effect of the collinear covariates were principal component estimation [Escabias et al. (9)] and penalized maximum likelihood estimation. These alternative approaches provide more reliable estimates of $\beta(t)$.

Figure 2.3: Median $\beta(t)$ estimates: (a) Maximum Likelihood Estimation, (b) Principal Component Estimation and (c) Ridge Estimation.

These results are consistent with the literature [Barker and Brown (2), Le Cessie and van Houwelingen (23), Escabias et al. (9) and Vágó and Kemény (43)] and we conclude that the parameter function $\beta(t)$ can be estimated with better accuracy using principal component estimation or penalized maximum likelihood estimation techniques. However, in the presence of outlying sample curves, there is a need to consider robust estimation methods. In Chapters 3 and 4, we turn our focus to this by proposing robust methods for the functional logistic regression model.
Chapter 3
Robust Principal Component Functional Logistic Regression
3.1 Introduction
Inherent in most functional data, and the estimation techniques discussed in this work, is the fact that we are often dealing with highly correlated data, which poses problems in parameter estimation, resulting in estimates that may be inaccurate and unreliable. The presence of functional observations that deviate from the overall pattern of the data creates additional inefficiencies in many procedures. In this work, we propose a robust estimation approach for functional data with a binary response and functional covariates that addresses these inadequacies. To our knowledge, no work has been done in this area of robust estimation for the functional logistic regression model. There are some robust methods proposed for the functional linear regression model and for functional principal component analysis; Boente and Fraiman (3), Gervini (15), Gervini (16), Bali et al. (1), Sawant et al. (38) and Lee et al. (24) are examples of such work. This work differs from these in that it looks at robust methods for the functional logistic regression model, for which no known robust approach has been proposed. Functional outliers are, however, inevitable, and it is therefore important that robust techniques for this model be developed.
The first approach taken in this chapter to develop robust estimators is that of using robust principal component estimation. The functional logistic model is reduced to a standard multiple logistic model by approximating the functional covariates as a linear combination of an appropriate basis [Ramsay and Silverman (34)]. This estimation approach for the functional data results in collinear and (potentially) high-dimensional data and, therefore, an initial focus is the elimination of such problems. Some of the approaches proposed in the literature to eliminate these issues are principal component estimation and
ridge estimation, as discussed in Chapter 2. However, in the presence of atypical curves, these estimates are unstable and inaccurate. Febrero et al. (11) define a functional outlier as a curve that has been generated by a stochastic process with a different distribution than the rest of the curves, which are assumed to be identically distributed. This definition appears to be all-encompassing, as it refers to curves that could be far away from most of the curves, curves that have a different pattern from the rest, or even curves that are atypical in some sub-interval of the period of interest. With real-world data, outliers are inevitable, and this makes it necessary to develop robust estimators; we consider robust estimation techniques using principal component estimation.
3.2 Estimation of $X(t)$ and $\beta(t)$
We consider the functional covariates, $X_i(t)$, and the functional parameter, $\beta(t)$, as belonging to a finite-dimensional space generated by (not necessarily) the same bases $\{\phi_j(t)\}_{j=1}^{K_X}$ and $\{\theta_k(t)\}_{k=1}^{K_b}$, respectively. We consider $X_i(t)\in L^2(T)$ with the usual inner product, such that
$$ X_i(t) \approx \sum_{j=1}^{K_X} c_{ij}\phi_j(t), \qquad (3.1) $$
where $\phi_j(t)$, $j = 1,\ldots,K_X$, is an appropriate basis, selected to reflect the characteristics and main features of the data. As discussed in Chapter 1, the truncation lag, $K_X$, is a parameter that is selected based on the features and characteristics of the data. This determines the dimension of the expansion; the larger $K_X$ is, the better the fit to the data. However, a large $K_X$ can result in over-fitting, capturing noise or variation in the data that ought to be ignored. A smaller $K_X$, on the other hand, whilst desirable, should not be so small that there is a risk of overlooking important features of the data.
The selection of the basis system is an important aspect of functional data analysis, in that the features evident in the observed data should be adequately captured by the selection
of an appropriate basis system. A good basis system selection potentially results in a smaller $K_X$, which means less computational time in estimating the functional covariate, and it ensures that the basis coefficients, $c_{ij}$, serve as interesting descriptors of the given data. In general, Fourier basis functions are used to model periodic data and the B-spline basis is used for non-periodic data. However, this is by no means the status quo, and careful consideration needs to be given in this regard. Thus, with the assumption that $X_i(t)$ belongs to the Hilbert space, we are able to reconstruct the functional form of the functional covariates from the observed discrete points using two different approaches: either smoothing or interpolation.
We also define the functional parameter,
$$ \beta(t) = \sum_{k=1}^{K_b} b_k\theta_k(t), \qquad (3.2) $$
where $\theta_k(t)$, $k = 1,\ldots,K_b$, is a known basis function, and $K_b$ is selected in such a way that $K_b \le K_X$. Since the estimates of the basis coefficients $\{c_{ij}\}$ can be found either by smoothing or interpolation, using the basis expansions of the covariates and parameter function as defined in (3.1) and (3.2) in the regression model (2.1), the functional model reduces to a standard multiple one,
$$ l_i = \beta_0 + \int_T X_i(t)\beta(t)\,dt = \beta_0 + \sum_{j=1}^{K_X}\sum_{k=1}^{K_b} c_{ij}\psi_{jk}b_k, \qquad i = 1,\ldots,n, $$
where $\psi_{jk} = \int_T \phi_j(t)\theta_k(t)\,dt$, $j = 1,\ldots,K_X$, $k = 1,\ldots,K_b$; $c_{ij}$ is the basis coefficient; $\beta_0$ is an unknown real parameter; and $b_k$ is the unknown basis coefficient used to estimate the parameter function $\beta(t)$. In matrix form, this can be written as,
$$ L = \beta_0\mathbf{1} + C\Psi b, \qquad (3.3) $$
where $L = (l_1,\ldots,l_n)'$, $C = \{c_{ij}\}_{n\times K_X}$, $\Psi = \{\psi_{jk}\}_{K_X\times K_b}$, $\mathbf{1} = (1,\ldots,1)'$ and $b = (b_1,\ldots,b_{K_b})'$.
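The entries $\psi_{jk} = \int_T \phi_j(t)\theta_k(t)\,dt$ can be approximated by numerical quadrature; the sketch below uses the trapezoidal rule with simple Fourier-type bases on $[0,10]$ purely for illustration (the text's bases are B-splines).

```python
import math

def trapezoid(f, a, b, n=2000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

# psi_jk = integral over T of phi_j(t) * theta_k(t) dt; illustrative bases
T0, T1 = 0.0, 10.0
phi = [lambda t: 1.0, lambda t: math.sin(2 * math.pi * t / 10)]
theta = [lambda t: 1.0, lambda t: math.cos(2 * math.pi * t / 10)]
Psi = [[trapezoid(lambda t: pj(t) * tk(t), T0, T1) for tk in theta]
       for pj in phi]
```

For these orthogonal trigonometric bases most cross-products integrate to zero over a full period; with B-splines the matrix $\Psi$ is instead banded and non-trivial.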
We develop the principal component estimation technique discussed in Escabias et al. (9) further, to cater for cases where there are functional outliers in the data. Because principal component estimation makes use of the eigendecomposition of the covariance matrix of the design matrix, $C\Psi$, the presence of outliers will greatly influence the PCs. This sensitivity to outliers results in the first few PCs being attracted towards the outliers, and this approach might therefore miss the main modes of variability of the rest of the observations.
Our proposed approach uses robust PC estimation techniques that eliminate multicollinearity and reduce the effect of functional outliers, resulting in a more accurate estimator in the presence of outliers. We use robust PCA methods on the covariate matrix to obtain robust PCs, which are used as the design matrix in the standard multiple logistic model. Robust Principal Component Analysis, ROBPCA, by Hubert et al. (21) is one such approach; it combines projection pursuit with robust covariance estimation based on the Minimum Covariance Determinant (MCD) method. The MCD method seeks an $h$-subset of observations whose classical covariance matrix has the smallest determinant.
The three basic steps in ROBPCA are given in the following algorithm:
Input: Data matrix $A = C\Psi$ of dimension $n\times K_b$, where $n$ is the number of observations and $K_b$ represents the initial number of variables.
Output: Robust PC scores $Z_{n\times k_0}$, where $k_0 < K_b$ is the number of eigenvectors retained.
1. A singular value decomposition (SVD) of the data is performed so as to project the observations onto the space spanned by themselves. This step is especially useful when $K_b \gg n$, as it yields a huge dimension reduction.
2. A measure of outlyingness is defined for each data point. This is achieved by projecting all the points onto many univariate directions, $v$, and then determining the standardized distance of each projected point to the center of the data. The $h\,(<n)$ least outlying data points are determined and retained, where outlyingness is defined as,
$$ \mathrm{Out}(a_i) = \max_{v}\frac{|a_i'v - \hat\mu_r|}{\hat\sigma_r}, \qquad i = 1,\ldots,n, $$
where $\hat\mu_r$ and $\hat\sigma_r$ are the univariate MCD-based location and scale estimates for the projected data points, $a_i'v$, respectively. The $h$ data points are projected onto the subspace spanned by the first $k_0$ eigenvectors of the sample covariance matrix of the $h$-subset.
3. The covariance matrix of the mean-centered data matrix, $\tilde A_{n\times k_0}$, obtained in the second step, is robustly estimated using the MCD estimator, and PCA is applied to this data matrix.
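Step 2's outlyingness measure can be sketched as below; for simplicity, the median and MAD stand in for the univariate MCD location and scale estimates, and random Gaussian directions stand in for the projection directions, both of which are assumptions of this sketch.

```python
import math, random

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def outlyingness(A, n_dirs=500, seed=0):
    """Projection-based outlyingness: max over unit directions v of
    |a_i'v - loc| / scale, with median/MAD as robust location and scale
    (a stand-in for the univariate MCD used by ROBPCA)."""
    rng = random.Random(seed)
    p = len(A[0])
    out = [0.0] * len(A)
    for _ in range(n_dirs):
        v = [rng.gauss(0, 1) for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        proj = [sum(ai * vi for ai, vi in zip(row, v)) for row in A]
        loc = median(proj)
        scale = 1.4826 * median([abs(z - loc) for z in proj])  # MAD
        if scale == 0:
            continue
        for i, z in enumerate(proj):
            out[i] = max(out[i], abs(z - loc) / scale)
    return out

# a Gaussian cloud plus one gross outlier at (8, 8)
rng = random.Random(1)
A = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)] + [[8.0, 8.0]]
scores = outlyingness(A)
```

The $h$ points with the smallest scores would then be retained for the covariance estimation in Step 3.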
We consider the functional logistic regression model as defined before in (2.1), where the functional covariate is defined as in (3.1), resulting in the standard multiple logistic regression model (3.3). We let $Z_{(r)} = \{\xi_{ij}\}_{n\times k_0}$ be the matrix of robust PCs of the design matrix, such that
$$ Z_{(r)} = \tilde A V_{(r)}, $$
where $\tilde A$ is the design matrix as described in Step 3 of the ROBPCA algorithm; $V_{(r)}$ is a $K_b\times k_0$ matrix whose columns are the eigenvectors associated with the eigenvalues of the robust covariance estimate based on the MCD of $\tilde A_{n\times k_0}$, the mean-centered data matrix. The logit transformation of the reduced functional logistic model becomes,
$$ L^{(s)}_{(r)} = \beta^{(s)}_0\mathbf{1} + Z^{(s)}_{(r)}\gamma^{(s)}, \qquad (3.4) $$
where $\gamma = V'_{(r)}b$ and $s$ is the number of retained PCs in the model.
The design matrix of this model is now void of collinear columns and, because of the robust approach taken to compute the PCs, the effect of outlying curves is minimized. Therefore, the estimate of the functional parameter is given by,
$$ \hat\beta(t) = \hat b'\theta, $$
where $\hat b = V_{(r)}\hat\gamma$.
3.3 Model Selection Issues
There are different criteria used in deciding which PCs should be included in the model. The natural order is to include PCs based on the explained variability. Other criteria are available, as discussed by Hocking (19) and by Müller and Stadtmüller (29), which take into consideration the predictive ability of the PCs. In the simulation study carried out in the next section, the simple natural order of explained variability is used to decide which PCs to include in the model.
Another issue in model selection is the decision concerning the number of PCs to include in the model. One of the measures that can be used is the integrated mean squared error of the $\beta(t)$ parameter function (IMSEB). This is defined as,
$$ \mathrm{IMSEB}(s) = \frac{1}{T}\int_T \left(\beta(t) - \hat\beta^{(s)}(t)\right)^2 dt, $$
where $\hat\beta^{(s)}(t)$ is the estimated parameter function for the logistic model with $s$ PCs.
Another available measure is the mean squared error of the $\beta(t)$ parameters (MSEB), which is defined as,
$$ \mathrm{MSEB}(s) = \frac{1}{K_b + 1}\left[\left(\beta_0 - \hat\beta^{(s)}_0\right)^2 + \left(\beta(t) - \hat\beta^{(s)}(t)\right)^2\right], $$
where $K_b$ is the number of basis functions, $\hat\beta^{(s)}_0$ is the estimated intercept in the model with $s$ PCs and $\hat\beta^{(s)}(t)$ is the estimated parameter function in the functional logistic model with $s$ PCs. The optimal model is selected as the model whose IMSEB or MSEB is smallest. In
the simulation study, we use the MSEB to determine s, the number of PCs to include in the
model. The PCs are then added based on the explained variability of each PC, starting with
the one with the largest explained variability until the optimal number of PCs is attained.
However, in the case of real data, both these measures cannot be computed, and therefore more practical approaches are required. Escabias et al. (9) suggest the use of the estimated variance of the estimated parameters, which is defined as,
$$ \widehat{\mathrm{Var}}\left(\hat\beta^{(s)}(t)\right) = V^{(s)}_{(r)}\left(Z'^{(s)}_{(r)} W^{(s)}_{(r)} Z^{(s)}_{(r)}\right)^{-1} V'^{(s)}_{(r)}, $$
where $W^{(s)}_{(r)} = \mathrm{diag}\left[\hat\pi^{(s)}(1-\hat\pi^{(s)})\right]$, $Z^{(s)}_{(r)}$ is the matrix of $s$ robust PC scores from the design matrix and $V^{(s)}_{(r)}$ is the matrix whose columns are the eigenvectors associated with the eigenvalues of the robust covariance estimate based on the MCD of the design matrix. Escabias et al. (9) suggested selecting the optimal number of PCs by plotting the estimated variance against the number of PCs, $s$, and selecting the $s$ just before a significant increase in the estimated variance. In the case where there are potentially multiple such points, the smallest $s$ is selected. The percentage of variance explained (PVE) is an alternative approach to determining the number of PCs to retain in the model; in this case, PCs with eigenvalues greater than 1.0 can be retained. In both these instances, the idea is to ensure that much of the variability in the model is retained.
Another method is the cross-validation (CV) method. Cross-validation involves partitioning the data into two sets; the first set, known as the training set, is used to determine a predictive model, whilst the second set, known as the test set, is used to validate the predictive model. The leave-one-out cross-validation method leaves out one observation and fits the model with the remaining $n-1$ observations. A prediction is then made for the left-out observation using this model, and this procedure is repeated for all the observations. For
the logistic model, this is defined as,
$$ \mathrm{CV}(s) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat\pi^{(s)}_{i,-i}\right)^2, $$
where $\hat\pi_{i,-i}$ indicates the predicted response with observation $i$ missing from the predictive model. The optimal number of PCs is selected as that with the minimum CV. The Information Criterion (IC) method is another alternative. The IC can be viewed as a compromise between the goodness of fit and the complexity of the model. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are among the widely used ICs; to select the optimal number of PCs, the IC is computed for varying values of $s$ and the optimal $s$ is chosen at the minimum.
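The IC-based choice of $s$ can be sketched as follows; the log-likelihood values are hypothetical, and $k = s + 1$ parameters (intercept plus $s$ PCs) is an assumption of this sketch.

```python
import math

def aic_bic(log_lik, n_params, n_obs):
    """AIC = -2 logL + 2k ; BIC = -2 logL + k log(n)."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, bic

# hypothetical log-likelihoods of logistic fits with s = 1..4 retained PCs;
# each extra PC improves the fit, but with diminishing returns
n = 50
loglik_by_s = {1: -30.2, 2: -24.1, 3: -23.6, 4: -23.5}
best_s = min(loglik_by_s,
             key=lambda s: aic_bic(loglik_by_s[s], s + 1, n)[1])  # min BIC
```

BIC's $\log(n)$ penalty stops adding PCs once the fit gain no longer justifies the extra parameter, which is exactly the goodness-of-fit versus complexity compromise described above.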
3.4 Numerical Examples
In this section, we study the performance of our proposed estimation approach by way of a simulation study, as well as by applying the methodology to the Canadian weather data. We show the improved accuracy of the parameter function estimation in the case where outliers are present.
3.4.1 Simulation Study
The following steps were carried out in this simulation study in order to investigate how the proposed estimation technique performs and compares to other existing approaches in the estimation of the parameter function of the functional logistic regression model. In the first step, to generate the data, the following were done:
(a) Generate $n = 50$ sample curves for the functional logistic model.
We generate 50 sample functional observations of a known stochastic process $X(\cdot)$, considered over the interval $[0,10]$ with 21 equally spaced time points. We define this
Figure 3.1: 50 sample curves generated from the overlapping stochastic process $X_i(t) = a_{i1} + a_{i2}t + W_i(t)$, differentiated for each class of the response.
process as,
$$ X_i(t) = a_{i1} + a_{i2}t + W_i(t), \qquad W_i(t) = \sum_{r=1}^{10}\left[ b_{i1}\sin\left(\tfrac{2\pi}{10}rt\right) + b_{i2}\cos\left(\tfrac{2\pi}{10}rt\right)\right], $$
where $a_{i1}\sim U[1,4]$ or $a_{i1}\sim U[2,4]$, $a_{i2}\sim N[1,0.2]$ or $a_{i2}\sim N[1,0.6]$, and $b_{i1}, b_{i2}\sim N[0, 1/r^2]$. This model is illustrated in Figure 3.1. Since the response is binary, the sample curves are differentiated for the two classes. To obtain the functional form of these sample curves, we consider them to belong to the finite-dimensional space generated by some basis, where $\phi(t)$ is a cubic B-spline basis defined on equally spaced nodes on the $[0,10]$ interval.
We used the generalized cross-validation (GCV) method to determine the number of basis functions to use, as well as the smoothing parameter, in estimating the functional covariate, $X(t)$. The criterion is expressed as,
$$ \mathrm{GCV}(\lambda) = \left(\frac{n}{n - df(\lambda)}\right)\left(\frac{SSE}{n - df(\lambda)}\right), $$
where $df(\lambda) = \mathrm{trace}\{S_{\phi,\lambda}\}$ is the equivalent degrees of freedom and $S_{\phi,\lambda}$ is the projection operator such that $\hat X = S_{\phi,\lambda}X$; $SSE$ is the residual sum of squares; and $\lambda$ is the smoothing parameter that controls the trade-off between fit to the data and smoothness. The minimization of GCV with respect to $\lambda$ is achieved by doing a grid search over a set of values of $\lambda$ and selecting the one with the minimum GCV value.
(b) The natural cubic spline interpolation of the parameter function $\sin(t + \pi/4)$ is selected as the true $\beta(t)$ function. The basis coefficients, $b = (b_1,\ldots,b_{K_b})'$, of the parameter function are known and are used to assess the estimation techniques presented in this work.
(c) The probabilities of a positive response ($Y_i = 1$) given $X_i(t)$ are,
$$ \pi_i = \frac{\exp\{l_i\}}{1 + \exp\{l_i\}}, \qquad i = 1,\ldots,n, $$
where $l_i$ is as defined before in (3.3), and $\beta_0$ is fixed at 0. The values of the response, $Y_i$, are obtained by simulating observations of a Bernoulli distribution with probabilities $\pi_i$.
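The GCV grid search used in step (a) can be sketched numerically; the (SSE, df) grid below is entirely hypothetical, chosen only to show the trade-off that the grid search minimizes.

```python
def gcv(sse, df, n):
    """GCV(lambda) = (n / (n - df)) * (SSE / (n - df)), as in the text."""
    return (n / (n - df)) * (sse / (n - df))

# hypothetical grid: heavier smoothing (larger lambda) raises SSE but
# lowers the equivalent degrees of freedom df(lambda)
n = 50
grid = [(2 ** k, 40.0 + 0.001 * 2 ** k, 20.0 - 0.8 * k) for k in range(5, 13)]
best_lam = min(grid, key=lambda g: gcv(g[1], g[2], n))[0]
```

With these made-up numbers the criterion bottoms out in the interior of the grid, mirroring the grid-search procedure described above.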
The second step involves contamination of the simulated data. We adopted the contamination process discussed by Fraiman and Muniz (14):
(i) Model 0: No Contamination
$X(t) = a_1 + a_2t + W(t)$ is the generated data, as discussed before.
(ii) Model 1: Asymmetric Contamination
$Z(t) = X(t) + cM$, where $c$ is 1 with probability $q$ and 0 with probability $1-q$; $q \in \{0\%, 5\%, 10\%, 15\%, 20\%\}$ is the contamination level; $M$ is the contamination constant, taking the value 25; and $X(t)$ is as defined in Model 0.
Figure 3.2: Sample curves for Models 1-4 at $q = 5\%$: (a) Model 1, (b) Model 2, (c) Model 3, (d) Model 4.
(iii) Model 2: Symmetric Contamination
$Z(t) = X(t) + c\sigma M$, where $X(t)$, $c$ and $M$ are as defined before and $\sigma$ is a sequence of random variables, independent of $c$, that takes the values 1 and $-1$ each with probability 0.5.
(iv) Model 3: Partial Contamination
$Z(t) = X(t) + c\sigma M$ if $t > T$, and $Z(t) = X(t)$ otherwise.

The robust PCA approach has the highest area under the
ROC curves, especially in the presence of the outlying sample curve. These goodness-of-fit measures indicate that the model using the robust approach has an AUC of 0.7125, whilst the one using the non-robust approach has an AUC of 0.6755. This is considered excellent discrimination by Hosmer and Lemeshow (20), an indication that the model based on the robust PCs better predicts the risk of drought.
3.5 Conclusion
The objective of this chapter was to propose a robust estimation technique for the functional logistic regression model. The estimation of the functional parameter in this model cannot be achieved by the regular method of maximum likelihood. Therefore, we approximate the functional observations and define the parameter function in a finite-dimensional space generated by a known basis. This reduces the functional model to a multiple model, albeit one with highly collinear covariates. The presence of multicollinearity and outliers causes the maximum likelihood estimate from this multiple logistic model to be unstable and, therefore, unreliable.
Robust estimation methods for the functional logistic model are therefore an important tool in the estimation of the parameter function derived in this manner. In this work, we have proposed an approach that makes use of robust principal component estimation; it reduces dimensionality and improves the estimation of the parameter function in the presence of multicollinearity and outliers. From the simulation study, we have shown that, in the presence of outliers, this approach results in better estimates of the parameter function and, subsequently, better interpretation of the model. We also illustrated the improved performance of the proposed method by analyzing a real data set.
Chapter 4
Robust Penalized Functional Logistic Regression
4.1 Introduction
In Chapter 3, we proposed an estimation technique that makes use of robust principal component estimation. This method was shown to be a more reliable approach for estimating the functional parameter in the functional logistic regression model, especially since it became necessary to use an approach that eliminated multicollinearity, as the resulting design matrix in the reduced standard logistic model was riddled with collinear columns. Even though the use of a robust principal component estimation technique resulted in better estimates for the model in the presence of outliers, the influence of outliers is still noticeable. This can be attributed to the fact that this approach still makes use of maximum likelihood estimation.
It has been shown that, for the standard logistic model, the MLE is sensitive to outliers [Hosmer and Lemeshow (20), Croux et al. (8), Carroll and Pederson (5), Pregibon (32)]. The robustness properties of the MLE have been analyzed in terms of the influence function. The influence function (IF) of an estimator is an asymptotic approximation to the behavior of that estimator when the sample contains a small fraction of identical outliers. An estimator with an unbounded IF is non-robust, since an infinitesimal model deviation can cause the bias to be arbitrarily large. The IF of the MLE for the standard logistic model is,
$$ \mathrm{IF}(y, x; \beta) = M^{-1}\left(y - \pi(\beta'x)\right)x, $$
where $M$ is the Fisher information matrix, $M = E[\pi'(\beta'x)xx']$. Therefore, the MLE is unbounded in $x$ and bounded in $y$. In fact, the kinds of outliers whose influence is large for the logistic model are such that either $\|x_i\|\to\infty$, $y_i = 1$ and $\beta'x_i$ is bounded away from $+\infty$, or $\|x_i\|\to\infty$, $y_i = 0$ and $\beta'x_i$ is bounded away from $-\infty$. In other words, extreme values in the design space lead to a biased MLE. Moreover, it has also been shown that misclassification errors, i.e. errors in the response, lead to a biased MLE [Pregibon (32), Copas (6)]. Therefore, there is a need to develop robust estimation methods that deal with these issues.
Many approaches have been discussed in the literature that look at ways to turn the MLE into an estimate with bounded influence. Such methods involve techniques for down-weighting high-leverage observations. In this chapter, we look at one such alternative approach, a Mallows-type estimate. We adopt the same estimation procedure as before, which reduces the functional logistic model to a standard logistic model by approximating the functional covariate and the functional parameter as linear combinations of appropriate bases. A Huber-type loss function is then used to down-weight the high-leverage observations, and a roughness penalty term ensures that the estimated parameter function, $\hat\beta(t)$, is smooth.
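To illustrate the idea of down-weighting, a common Huber-type weight function, shown here only as a generic example rather than the chapter's exact loss, is flat inside a central region and decays outside it:

```python
def huber_weight(r, k=1.345):
    """Generic Huber-type weight: 1 inside [-k, k], decaying as k/|r|
    outside, so observations far from the bulk receive reduced weight
    (k = 1.345 is a conventional tuning constant, an assumption here)."""
    return 1.0 if abs(r) <= k else k / abs(r)

ws = [huber_weight(r) for r in (-3.0, -1.0, 0.0, 2.0)]
```

Applying such weights to the score equations bounds the contribution of each observation, which is precisely what gives the estimator a bounded influence function.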
4.2 Estimation of $X(t)$ and $\beta(t)$
We consider the functional covariates, $X_i(t)$, and the functional parameter, $\beta(t)$, as belonging to a finite-dimensional space generated by (not necessarily) the same basis. We consider $X_i(t)\in H$ (i.e. a Hilbert space), such that,
$$ X_i(t) \approx \sum_{j=1}^{K_X} c_{ij}\phi_j(t), \qquad (4.1) $$
where $\{\phi_j(t)\}_{j=1}^{K_X}$ is an appropriate basis, selected to reflect the characteristics of the data.
We assume β(t) ∈ H and that the parameter is defined as a linear combination of a cubic B-spline basis {θ_k(t)}_{k=1}^{K_b}, such that

    β(t) = Σ_{k=1}^{K_b} b_k θ_k(t),

where the order of expansion, K_b, is estimated either by cross-validation or by generalized cross-validation, and is such that K_b ≤ K_X. This reduces the functional logistic model to a standard logistic model,
    π_i = P(Y_i = 1 | X_i(t)), with X_i(t) ∈ L²(T),
        = exp(β_0 + Σ_{j=1}^{K_X} Σ_{k=1}^{K_b} c_ij J_jk b_k) / [1 + exp(β_0 + Σ_{j=1}^{K_X} Σ_{k=1}^{K_b} c_ij J_jk b_k)],    i = 1, …, n,    (4.2)
whose logit transformation is given by

    l_i = β_0 + Σ_{j=1}^{K_X} Σ_{k=1}^{K_b} c_ij J_jk b_k,    i = 1, …, n,

or, in matrix form,

    L = β_0 1 + CJb,    (4.3)

where L = (l_1, …, l_n)′; C = {c_ij}_{n×K_X}; J = {J_jk}_{K_X×K_b} with J_jk = ∫_T φ_j(t) θ_k(t) dt; 1 = (1, …, 1)′; and b = (b_1, …, b_{K_b})′. In this reduced form, the maximum likelihood estimate can be derived and the unknown parameters obtained using an iterative method such as Newton-Raphson. The next section explains a robust penalization method that minimizes the effect of outliers on the estimate, resulting in a smooth robust estimate for the functional logistic model.
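As a concrete illustration, the reduction to (4.3) can be sketched numerically. The bases, grid and data below are hypothetical stand-ins (a trigonometric φ-basis and rescaled monomials in place of B-splines), not the dissertation's actual choices.

```python
import numpy as np

# Sketch: approximate J_jk = \int_T phi_j(t) theta_k(t) dt by trapezoidal
# quadrature on a grid, then form the logits L = beta_0 1 + C J b as in (4.3).
# Basis choices and coefficients are illustrative, not the actual bases used.

t = np.linspace(0.0, 10.0, 201)                   # grid over T = [0, 10]
dt = t[1] - t[0]
K_X, K_b = 5, 4

Phi = np.column_stack([np.sin((j + 1) * np.pi * t / 10) for j in range(K_X)])
Theta = np.column_stack([(t / 10) ** k for k in range(K_b)])  # stand-in basis

w = np.full(t.size, dt)
w[0] = w[-1] = dt / 2                             # trapezoidal weights
J = np.einsum("t,tj,tk->jk", w, Phi, Theta)       # (K_X, K_b) inner products

rng = np.random.default_rng(0)
n = 50
C = rng.normal(size=(n, K_X))                     # basis coefficients c_ij
b = 0.1 * rng.normal(size=K_b)                    # parameter coefficients b_k
beta0 = 0.5

logits = beta0 + C @ J @ b                        # l_i, one per curve
pi = 1.0 / (1.0 + np.exp(-logits))                # success probabilities
print(J.shape, pi.shape)
```

The same quadrature-weight trick evaluates any of the basis inner products appearing later in the chapter.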
4.3 Robust Penalized Maximum Likelihood Estimation
Cardot and Sarda (4), Goldsmith et al. (18) and Ogden and Reiss (30) have considered penalization in the estimation of the functional logistic model or the generalized functional linear model. The penalty term is introduced in the maximization of the log-likelihood function to ensure that the smoothness property of the parameter function is met. This regularization approach to obtaining smooth functional estimates is common, especially in estimating the functional form of the predictor, X(t), as discussed in Chapter 1, and also in functional principal component analysis [Ramsay and Silverman (34)].

The general form of the penalized likelihood is

    L_λ(β; Y) = L(β; Y) − λ Pen(β(t)),

where L(β; Y) is the log-likelihood function, λ is the smoothing parameter and Pen(β(t)) is the penalty term that ensures that the parameter function is smooth. Cardot and Sarda (4) considered a roughness penalty of the form λ‖θ^{(m)′}b‖², where θ denotes the vector of all the B-splines and θ^{(m)} the vector of derivatives of order m of all the B-splines, for some integer m. The GCV criterion is then used to determine the value of the smoothing parameter, λ.
Goldsmith et al. (18), on the other hand, proposed an automated way of deducing the smoothing parameter in the penalty term. They considered the parameter function, β(t), as the truncated power series spline basis expansion

    β(t) = b_0 + b_1 t + Σ_{k=3}^{K_b} b_k (t − κ_k)_+,

where {κ_k}_{k=3}^{K_b} are knots. They also considered the reduced functional model as a mixed effects model with K_b − 2 random effects, {b_k}_{k=3}^{K_b} ~ N(0, σ_b² I). The penalty term is therefore given as (1/(2σ_b²)) b′Db, where D is the penalty matrix and 1/σ_b² is the smoothing parameter. Thus, the smoothing parameter is estimated as a variance component in the mixed effects model.
We build on these ideas and propose a robust penalization method. It is important to note that the penalized likelihood methods discussed above, and similar methods in the literature, are sensitive to outlying observations. From (4.2), we denote the parameter vector as β = (β_0, b′)′ and the covariate vector as a_i = (1, {CJ}_i′)′ to simplify the notation. The proposed estimator [RPMLE] is defined as

    β̂ = arg max_β Σ_{i=1}^n w_i {y_i log(π_i) + (1 − y_i) log(1 − π_i)} − (λ/2) β′D₀β,    (4.4)

where π_i is short for π(a_i′β); λ is a smoothing parameter; the w_i are weights that may depend on a_i, y_i or both; and D₀ is the penalty matrix augmented by a leading column and row of K_b + 1 zeros. When w_i = 1 ∀i and λ = 0, (4.4) yields the maximum likelihood estimator. We consider a Mallows-class estimator [Mallows (26)] that addresses the problem of outliers in the design space for the reduced functional logistic model. The main idea of a Mallows-type estimator is to down-weight observations which have high leverage. In our proposed robust estimator in (4.4), this is achieved by the following weighting scheme, as proposed by Stefanski (40),
    w_i = W(h_n(x_i)),    h_n(x) = [(x − μ̂_n)′ Σ̂_n⁻¹ (x − μ̂_n)]^{1/2},

with μ̂_n and Σ̂_n being a robust location vector and a robust dispersion matrix estimate of the design matrix, respectively. W is a non-increasing function such that W(u)u is bounded. We use the weight function corresponding to Huber's ψ, which is defined as

    W_k(u) = min(1, k/|u|),
where k = 1.345 is a tuning constant. It is noted that while the Mallows-class estimator can be less efficient than the usual logistic MLE, its bias can be smaller than that of the MLE. Carroll and Pederson (5) illustrated that, when the design has extreme design points, selective down-weighting can lead to less biased estimates, and this decrease in bias, together with the robustness property, may be worth the lower efficiency.
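The weighting scheme above can be sketched as follows. Since a full MCD fit is beyond the scope of a short sketch, a coordinatewise median/MAD construction stands in for the robust location and scatter; this substitution is an assumption of the sketch, not the method used in the chapter (which is MCD-based).

```python
import numpy as np

# Sketch of the Mallows weights: Huber-type W_k(u) = min(1, k/|u|) applied
# to a robust Mahalanobis distance h_n(x) of each design row. The crude
# median/MAD-based scatter below is a stand-in for the MCD estimates.

def robust_location_scatter(X):
    """Crude robust location/scatter (illustrative stand-in for MCD)."""
    mu = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - mu), axis=0)  # consistency-scaled MAD
    R = np.corrcoef((X - mu).T)                       # shape from correlations
    Sigma = R * np.outer(mad, mad)
    return mu, Sigma

def mallows_weights(X, k=1.345):
    mu, Sigma = robust_location_scatter(X)
    d = X - mu
    h = np.sqrt(np.einsum("ij,jk,ik->i", d, np.linalg.inv(Sigma), d))
    return np.minimum(1.0, k / np.maximum(h, 1e-12))  # Huber weight W_k

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[0] = 20.0                                           # one gross leverage point
w = mallows_weights(X)
print(w[0], w.min() == w[0])
```

The leverage point receives a weight far below 1, while typical observations are down-weighted only mildly or not at all.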
The estimation of the smoothing parameter, λ, can be achieved by cross-validation or by the GCV method. The smoothness of the parameter function, β(t), is based on the idea of the curvature of a function. This curvature is quantified as

    ∫_T {β″(t)}² dt = Σ_{k=1}^{K_b} Σ_{l=1}^{K_b} b_k [∫_T θ″_k(t) θ″_l(t) dt] b_l    (4.5)
                    = b′Db,    (4.6)

where D = {∫_T θ″_k(t) θ″_l(t) dt}_{K_b×K_b} is the smoother. The approximation of the smoother is possible by means of a numerical quadrature scheme.
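One possible quadrature sketch for the smoother D follows, assuming finite-difference second derivatives and trapezoidal weights on a fine grid; a monomial basis stands in for the cubic B-splines, so the numbers are purely illustrative.

```python
import numpy as np

# Sketch: approximate D_kl = \int_T theta_k''(t) theta_l''(t) dt by
# central-difference second derivatives plus the trapezoidal rule.
# A monomial basis stands in for the cubic B-splines of the chapter.

t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
K_b = 5
Theta = np.column_stack([t ** k for k in range(K_b)])   # illustrative basis

# Second derivatives of each basis column (second-order accurate edges).
Theta_d = np.gradient(Theta, dt, axis=0, edge_order=2)
Theta_dd = np.gradient(Theta_d, dt, axis=0, edge_order=2)

w = np.full(t.size, dt)
w[0] = w[-1] = dt / 2                                   # trapezoidal weights
D = np.einsum("t,tj,tk->jk", w, Theta_dd, Theta_dd)     # (K_b, K_b) smoother

# b' D b then approximates \int {beta''(t)}^2 dt, as in (4.5)-(4.6).
b = np.array([0.0, 0.0, 1.0, 0.0, 0.0])                 # beta(t) = t^2, beta'' = 2
print(float(b @ D @ b))                                 # ~ \int_0^1 4 dt = 4
```

Because the test function is quadratic, the finite-difference second derivative is exact and b′Db recovers the curvature integral ∫₀¹ 4 dt = 4.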
4.3.1 Details of the Estimation Technique
Iterative algorithms are usually used for the computation of M-estimators. Since the likelihood equations for (4.4) are non-linear in the parameter of interest, β, an iterative method has to be used to solve them and obtain estimates of the parameters. A modified scoring algorithm or the Newton-Raphson method may be used to solve these non-linear likelihood equations.
To make use of the Newton-Raphson method, the gradient and Hessian matrices are required. We consider the likelihood function in (4.4) and note that

    log(π_i) = log[exp(a_i′β) / (1 + exp(a_i′β))] = a_i′β − log(1 + exp(a_i′β)),

and similarly,

    log(1 − π_i) = log[1 / (1 + exp(a_i′β))] = −log(1 + exp(a_i′β)).

The likelihood function in (4.4) can then be re-written as

    L(β; y) = Σ_{i=1}^n w_i {y_i log(π_i) + (1 − y_i) log(1 − π_i)} − (λ/2) β′D₀β
            = Σ_{i=1}^n w_i {y_i a_i′β − log(1 + exp(a_i′β))} − (λ/2) β′D₀β.

The first derivative (the gradient) then becomes

    L′(β; y) = Σ_{i=1}^n w_i [y_i a_i − a_i exp(a_i′β) / (1 + exp(a_i′β))] − λD₀β
             = Σ_{i=1}^n w_i a_i {y_i − π_i} − λD₀β
             = A′Wr − λD₀β,
where r = y − π, with π = (π_1, …, π_n)′, and W = diag(w_i). The second derivative (the Hessian) becomes

    L″(β; y) = −Σ_{i=1}^n w_i a_i a_i′ {exp(a_i′β)[1 + exp(a_i′β)] − [exp(a_i′β)]²} / [1 + exp(a_i′β)]² − λD₀
             = −Σ_{i=1}^n w_i a_i π_i(1 − π_i) a_i′ − λD₀
             = −(A′V^{1/2}WV^{1/2}A + λD₀),

where V = diag(π_i(1 − π_i)). The derivation of the Hessian matrix can be computationally expensive, and an approximation based on the gradient may be used instead. The Newton-Raphson method leads to the iterative scheme

    β^{(k+1)} = β^{(k)} + (A′V^{1/2}WV^{1/2}A + λD₀)⁻¹ (A′Wr − λD₀β^{(k)}),    (4.7)

in which V and r are evaluated at β^{(k)}. When all the weights are unity, i.e., w_i = 1 ∀i, and λ = 0, the above iterative scheme simplifies to the usual maximum likelihood iterative procedure for β̂. A grid search is done to obtain the optimal smoothing parameter, λ, by way of the cross-validation method. This selection criterion is discussed in the following section.
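A minimal sketch of the iteration (4.7) on simulated data follows; the weights, penalty matrix and design are illustrative placeholders rather than the functional-data quantities of the chapter.

```python
import numpy as np

# Sketch of the iterative scheme (4.7):
# beta <- beta + (A' V^{1/2} W V^{1/2} A + lam D0)^{-1} (A' W r - lam D0 beta),
# with r = y - pi and V = diag(pi_i (1 - pi_i)). Weights w_i, penalty D0 and
# the design below are illustrative placeholders.

def rpmle_newton(A, y, w, lam, D0, n_iter=50):
    beta = np.zeros(A.shape[1])
    W = np.diag(w)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-A @ beta))
        V = np.diag(pi * (1.0 - pi))
        H = A.T @ np.sqrt(V) @ W @ np.sqrt(V) @ A + lam * D0
        g = A.T @ W @ (y - pi) - lam * D0 @ beta
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.linalg.norm(step) < 1e-8:           # converged
            break
    return beta

rng = np.random.default_rng(2)
n, p = 200, 4
A = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.0, -1.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-A @ beta_true))).astype(float)

D0 = np.eye(p)
D0[0, 0] = 0.0                                    # leading zero: no intercept penalty
beta_hat = rpmle_newton(A, y, w=np.ones(n), lam=0.1, D0=D0)
print(np.round(beta_hat, 2))
```

With unit weights and λ = 0 the same loop reproduces the usual maximum likelihood iteration, matching the remark above.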
4.3.2 Tuning Parameter Selection
The influence of the penalty depends on the magnitude of the smoothing parameter, λ, which is non-negative. When λ = 0 we have un-penalized, and thus unbiased, estimates in which smoothness is not emphasized, whereas large values of λ indicate that more weight is given to the smoothness of the estimate. Increasing λ makes β(t) smoother, and an optimum is reached by a grid search over λ, varying it systematically and monitoring the prediction error using techniques such as cross-validation. The factor 1/2 in the penalty is used to cancel the factor 2 that appears when the penalty is differentiated.
The leave-one-out cross-validation method leaves out one observation and fits the model with the remaining n − 1 observations. A prediction is then made for the left-out observation using this fitted model, and the procedure is repeated for all observations. For the logistic model, this is defined as

    CV = (1/n) Σ_{i=1}^n (Y_i − π̂_{(−i)})²,

where π̂_{(−i)} corresponds to the predicted probability of a positive response given the predictor, with observation i omitted from the fitted model. The CV is computed for a range of λ values, and the smoothing parameter with minimum CV is selected. Another option for exponential family models is minimization of an information criterion (IC). An IC can be viewed as a compromise between the goodness of fit and the complexity of the model; the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are among the most widely used. To select the optimal smoothing parameter, a logarithmic grid search on the non-negative penalty parameter λ is performed and the IC computed; the optimal λ is chosen at the minimum.
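The grid search could be sketched as below. The inner estimator is a plain ridge-penalized logistic fit (penalizing all coefficients, intercept included, for brevity), a deliberate simplification of the weighted, D₀-penalized estimator of this chapter.

```python
import numpy as np

# Sketch of the lambda search: leave-one-out cross-validation
# CV(lambda) = (1/n) sum_i (y_i - pihat_{-i})^2 over a logarithmic grid.
# The inner ridge-penalized logistic fit is an illustrative simplification.

def fit_penalized_logistic(A, y, lam, n_iter=25):
    p = A.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):                       # Newton iterations
        eta = np.clip(A @ beta, -30, 30)          # guard against overflow
        pi = 1.0 / (1.0 + np.exp(-eta))
        H = A.T @ (A * (pi * (1 - pi))[:, None]) + lam * np.eye(p)
        beta = beta + np.linalg.solve(H, A.T @ (y - pi) - lam * beta)
    return beta

def loocv(A, y, lam_grid):
    n = len(y)
    cv = np.empty(len(lam_grid))
    for g, lam in enumerate(lam_grid):
        err = 0.0
        for i in range(n):                        # refit without observation i
            keep = np.arange(n) != i
            beta = fit_penalized_logistic(A[keep], y[keep], lam)
            pihat = 1.0 / (1.0 + np.exp(-np.clip(A[i] @ beta, -30, 30)))
            err += (y[i] - pihat) ** 2
        cv[g] = err / n
    return cv

rng = np.random.default_rng(3)
n = 60
A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.random(n) < 1 / (1 + np.exp(-A @ np.array([0.0, 1.0, -1.0])))).astype(float)

lam_grid = 10.0 ** np.arange(-3.0, 3.0)           # logarithmic grid
cv = loocv(A, y, lam_grid)
lam_opt = lam_grid[int(np.argmin(cv))]
print(lam_opt)
```

An AIC/BIC-based search would replace the squared-error criterion with the penalized deviance plus a complexity term, scanned over the same grid.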
4.4 Numerical Examples
To display the robustness properties of the proposed estimator, we conduct a simulation study and compare the robust approach discussed in this chapter with the existing maximum likelihood estimate (MLE),

    β̂ = arg max_β Σ_{i=1}^n {y_i log(π_i) + (1 − y_i) log(1 − π_i)},    (4.8)

as well as the penalized maximum likelihood estimate (PMLE),

    β̂ = arg max_β Σ_{i=1}^n {y_i log(π_i) + (1 − y_i) log(1 − π_i)} − (λ/2) β′D₀β.    (4.9)

An application of the estimation method to a real data set is also explored, and the results are discussed in this section.
4.4.1 Simulation Study
We use the same data generated in Chapter 3 for this simulation study. That is, we generate 50 sample functional observations of a known stochastic process X(·), considered over the interval [0, 10] with 21 equally spaced time points, where the process is defined as

    X_i(t) = a_{i1} + a_{i2}t + W_i(t),
    W_i(t) = Σ_{r=1}^{10} [b_{i1} sin(2πrt/10) + b_{i2} cos(2πrt/10)],

where a_{i1} ~ U[1, 4] or a_{i1} ~ U[2, 4], a_{i2} ~ N(1, 0.2) or a_{i2} ~ N(1, 0.6), and b_{i1}, b_{i2} ~ N(0, 1/r²).
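The simulation design can be sketched as follows. The frequency term 2πrt/10 and the reading of N(0, 1/r²) as a variance are our reconstruction of the garbled source and should be checked against Chapter 3.

```python
import numpy as np

# Sketch of the simulation design: 50 curves X_i(t) = a_i1 + a_i2 t + W_i(t)
# on 21 equally spaced points of [0, 10]. The frequency scaling 2*pi*r*t/10
# and sd = 1/r (variance 1/r^2) are assumptions in this sketch.

rng = np.random.default_rng(4)
n = 50
t = np.linspace(0.0, 10.0, 21)

a1 = rng.uniform(1, 4, size=n)               # or U[2, 4] in the second setting
a2 = rng.normal(1, 0.2, size=n)              # or N(1, 0.6)
X = np.empty((n, t.size))
for i in range(n):
    W = np.zeros_like(t)
    for r in range(1, 11):
        b1, b2 = rng.normal(0, 1 / r, size=2)     # coefficients shrink with r
        W += b1 * np.sin(2 * np.pi * r * t / 10) + b2 * np.cos(2 * np.pi * r * t / 10)
    X[i] = a1[i] + a2[i] * t + W
print(X.shape)
```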
The functional predictor is estimated using a cubic B-spline basis expansion, where the order of expansion, K_X, is determined by the GCV method. The true β(t) function is taken to be sin(t + π/4), and its natural cubic spline is used as an approximation of the function. Outlying sample curves are introduced for the stochastic process X(·) using the models of Fraiman and Muniz (14), as discussed in Chapter 3: asymmetric contamination (Model 1), symmetric contamination (Model 2), partial contamination (Model 3) and peak contamination (Model 4).

The smoothing parameter is determined by the cross-validation method. The Huber-type weights make use of the Mahalanobis distance, where the robust location and dispersion
Table 4.1: Mean MSEB (standard error) for the estimation of the functional parameter, β(t), for Models 2, 3 and 4
Model 2 Model 3 Model 4
Cont. (%) MLE PMLE RPMLE MLE PMLE RPMLE MLE PMLE RPMLE
0 26.3043 1.3211 0.5894 26.3043 1.3211 0.5894 26.3043 1.3211 0.5894
(9.5266) (0.5115) (0.1081) (9.5266) (0.5115) (0.1081) (9.5266) (0.5115) (0.1081)
5 20.5941 1.0453 0.5681 16.3381 0.9292 0.7444 20.5692 1.0591 0.5308
(12.7574) (0.3335) (0.1072) (6.2826) (0.2919) (0.3015) (6.9599) (0.3239) (0.0883)
10 8.1512 0.8327 0.4756 12.4196 0.8676 0.7055 18.2714 0.8676 0.6798
(3.6822) (0.2573) (0.0386) (4.9123) (0.2793) (0.2218) (4.7792) (0.2182) (0.2067)
15 4.3307 0.7522 0.4958 11.5515 1.0341 0.6168 17.0797 0.9183 0.6318
(2.9919) (0.2844) (0.0469) (4.7697) (0.3466) (0.1429) (4.4364) (0.2361) (0.1510)
20 3.7119 0.6544 0.4858 7.8925 0.8528 0.6115 17.5679 0.9504 0.6578
(4.4867) (0.2118) (0.0214) (3.3326) (0.2776) (0.1223) (5.1051) (0.2697) (0.1408)
estimates are both MCD-based. Table 4.1 summarizes the average of the MSEB measures for the three estimation methods discussed in this chapter, i.e., the maximum likelihood estimate (MLE), the penalized maximum likelihood estimate (PMLE) and our proposed robust penalized approach (RPMLE). There is improved estimation with the inclusion of the smoothing term, as expected and as discussed by Cardot and Sarda (4) and Goldsmith et al. (18). In the presence of outliers, however, our robust approach is a markedly better estimator.
A comparison of our robust penalized estimation approach with the non-robust penalized approach is shown in Figure 4.1. It is evident that, in the presence of outliers, down-weighting these leverage points results in a more reliable estimate of the parameters (with the notable exception of high contamination levels for Model 1). Essentially, the effect of the outlying curves on the estimation of the functional parameter, β(t), is minimized, resulting in a more reliable functional logistic model for the observed data.
4.4.2 Poblenou NOX Levels Data Set
(a) Model 1 (b) Model 2 (c) Model 3 (d) Model 4
Figure 4.1: Comparison of the penalized method and robust penalized method at differing contamination levels

This dataset was used by Febrero et al. (11) and is a collection of NOX emissions in Poblenou, Barcelona (Spain) over a period of 115 days, with recordings starting on 23 February and ending on 26 June 2005. Over that period, hourly measurements of NOX were recorded; we therefore split the whole sample of hourly measures into a dataset of functional trajectories of 24-hour observations (each curve represents the evolution of the levels in one day). Twelve curves that contained missing data were excluded from the study.
We consider the functional logistic regression model where the functional predictor is the functional trajectory of NOX levels over a 24-hour period, with a binary response indicating whether the day is a working day (Y = 1) or a non-working day (Y = 0). This data set is known to have outliers; a robust functional principal component method by Sawant et al. (38) identified the outlying curves shown in Figure 4.3. These outliers were identified for the following five days: 03/09, 03/11, 03/18, 04/29 and 05/02, all of which were working days.

Figure 4.2: NOX emission levels for non-working (top) and working (bottom) days
Figure 4.3: Functional form of the hourly trajectories with the outliers detected in the NOX dataset
The functional trajectories are estimated using a truncated Karhunen-Loève decomposition. Let Σ_{k=1}^∞ λ_k φ_k(s)φ_k(t) be the spectral decomposition of the covariance function C(s, t), where λ_1 ≥ λ_2 ≥ ⋯ are the non-increasing eigenvalues associated with the orthonormal eigenfunctions φ_1(t), φ_2(t), …. Supposing X_i(t) ∈ L²(T) and centered, then

    X_i(t) = Σ_{j=1}^{K_X} c_ij φ_j(t),

where the φ_j(t) are the first K_X eigenfunctions of the smooth covariance function C(s, t) = cov[X_i(s), X_i(t)] and c_ij = ∫_T X_i(t)φ_j(t) dt. The truncation lag, K_X, is determined using a 99% cut-off of the percent variance explained (PVE), defined as

    PVE(K_X) = Σ_{p=1}^{K_X} λ_p / Σ_{p=1}^{24} λ_p,

where K_X is the minimum number of principal components needed to explain 99% of the total variation in the model.
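The truncation rule can be sketched on simulated data; the eigendecomposition here is the classical one, and a robust variant would simply replace the covariance estimate.

```python
import numpy as np

# Sketch: choose the truncation lag K_X as the smallest number of
# covariance eigenvalues whose cumulative share (the PVE) reaches 99%.
# The 103 x 24 data matrix below is an illustrative stand-in for the
# NOX trajectories (days x hours).

rng = np.random.default_rng(5)
X = rng.normal(size=(103, 24)) @ np.diag(np.linspace(2, 0.1, 24))
Xc = X - X.mean(axis=0)                       # center the curves

lam = np.linalg.eigvalsh(np.cov(Xc.T))[::-1]  # eigenvalues, non-increasing
pve = np.cumsum(lam) / lam.sum()
K_X = int(np.searchsorted(pve, 0.99) + 1)     # minimum K with PVE >= 0.99
print(K_X, float(pve[K_X - 1]))
```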
The estimated parameter functions for the three methods discussed in this chapter are shown in Figure 4.4. The pattern of the parameter is similar for all three methods, with slight differences over the late-afternoon-to-early-morning interval (i.e., 18:00 hrs to 24:00 hrs and also between 01:00 hrs and 03:00 hrs). This difference is more pronounced for the MLE and PMLE in the presence of outliers, which inevitably gives a different model interpretation for those sub-intervals. Due to the down-weighting of high-leverage observations, the effect of outlying curves is minimized in our robust approach. It is interesting to note that the affected sub-interval for the estimators is the same sub-interval where the outlying functional trajectories behave differently from the rest of the sample curves.
4.5 Conclusion
In this chapter we proposed a robust estimation approach that down-weights high-leverage points. This was achieved by reducing the functional logistic model by way of basis expansion.
(a) PMLE (b) RPMLE
Figure 4.4: A comparison of the estimated β(t) with and without the 5 outliers for the non-robust and robust penalized methods
The influence function of the maximum likelihood estimator for this reduced logistic model is known to be unbounded in the x-direction. Therefore, functional observations whose pattern differs from the rest of the observations, even over some sub-interval, greatly influence the estimator. It was with this in mind that weights based on the design space were used, together with some regularization to impose the smoothness property on the estimated parameter.

The proposed robust penalized method is a Mallows-type estimator that uses Huber-type weights to down-weight outliers in the x-direction by means of the robust Mahalanobis distance of the covariate. The penalty term makes use of the curvature of the parameter function and ensures smoothness of the estimator. The Monte Carlo study showed the increased efficiency in estimation of the robust penalized estimator; the same conclusion was reached for a real-world data example.
Chapter 5
Diagnostic Methods for the Functional Logistic Model
5.1 Introduction
Often, after fitting a model, one needs to assess the fit of the model before using it to make any inference. For this to be done, it is imperative that there is an understanding of the model as well as of the fitting procedure used. The idea is to assess how good the fit is, i.e., how well the fitted model describes the outcome. In this chapter, we adapt diagnostic methods for the standard logistic model and extend these ideas to the functional logistic model, thereby contributing to the diagnostic measures for this model.
Febrero-Bande et al. (12) proposed two statistics that measure the influence of each curve on the functional slope of the functional linear model, with a scalar response and functional predictor(s), when the principal components method is used in the estimation of the model. Müller and Chiou (28) developed diagnostic measures, based on residual processes, for functional regression models where the response is functional and the predictor is either a multivariate vector or a random function. Malloy et al. (27) proposed a method that extends the Box-Tidwell score test and involves the construction of residual plots to detect non-linearity of the functional predictor(s) in the functional generalized linear model, which has a scalar response and functional predictors. Our proposed diagnostic measures differ from all this work in two primary aspects. Firstly, our measures are for the functional logistic model, which has a binary response and functional predictor(s). Secondly, we focus on measures of goodness-of-fit as well as measures to identify ill-fitting observations and/or observations that have a dominant influence on the fit of the model.
Pregibon (32) made a significant contribution to diagnostic measures for the standard logistic model by extending ideas from linear regression. We explore ways that these and other diagnostic methods can be adapted for data where the predictor is functional and the response binary. We base our measures on the principal component based method discussed in Chapters 2 and 3. This methodology can be extended to other estimation methods, and we illustrate this by assessing the fit of the numerical examples analyzed in Chapters 3 and 4.
5.2 Measures of Goodness-of-Fit
We consider the functional logistic model where the functional predictor, X(t) ∈ L²(T), is defined on the closed interval T = [a, b] ⊂ ℝ. We assume that the unknown parameter function β(t) ∈ L²(T) and that both functions are represented as linear combinations of known bases, such that the probability of a positive response given the functional predictor is as in (2.4). As discussed in Chapter 2, the maximum likelihood estimate of this model is unstable and inefficient due to multicollinearity issues with the design matrix X = (1|C). Therefore, we consider the principal component based approach, where we let Z = {z_ij}_{n×K_X} be the matrix of PCs of the design matrix, such that Z = CV, where V is a K_X × K_X matrix whose columns are the eigenvectors associated with the eigenvalues of the covariance matrix of C. Then the logit model (2.4) becomes, in matrix form,

    L^{(s)} = γ_0 1 + Z^{(s)} γ^{(s)},    (5.1)

where γ = V′b and s denotes the number of principal components retained in the model. The maximum likelihood estimate is then

    γ̂ = arg max_γ Σ_{i=1}^n y_i log[π(z_i′γ)] + (1 − y_i) log[1 − π(z_i′γ)].    (5.2)
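The principal component step can be sketched on an illustrative near-collinear coefficient matrix; the data and dimensions are placeholders, not the actual design of the chapter.

```python
import numpy as np

# Sketch: replace the ill-conditioned coefficient matrix C by its principal
# component scores Z = C V, where V holds the eigenvectors of cov(C); the
# logit model is then fitted on the first s columns of Z.

rng = np.random.default_rng(6)
n, K_X = 40, 6
B = rng.normal(size=(n, 2))
C = B @ rng.normal(size=(2, K_X)) + 0.01 * rng.normal(size=(n, K_X))  # near-collinear

Cc = C - C.mean(axis=0)
evals, V = np.linalg.eigh(np.cov(Cc.T))
order = np.argsort(evals)[::-1]               # sort eigenvalues, non-increasing
evals, V = evals[order], V[:, order]

Z = Cc @ V                                    # PC scores; columns are uncorrelated
s = 2                                         # components retained in the model
Z_s = Z[:, :s]
print(Z_s.shape)
```

The retained columns Z_s carry nearly all of the variation of C while being mutually uncorrelated, which removes the multicollinearity from the reduced design.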
Goodness-of-fit summary measures are useful in giving an indication of the fit of the model. In order to assess the fit of the model, covariate patterns need to be considered. These are distinct groupings of the covariates, whose number we denote by J; should some subjects share the same covariate values, then J < n. The number of observations with the same covariate pattern is denoted by m_j for j = 1, …, J, such that Σ m_j = n. The number of positive responses among the m_j observations having the same covariate pattern is denoted by y_j, so that the sum of the y_j is the total number of positive responses. The maximum likelihood estimate in (5.2) can then be re-written as

    γ̂ = arg max_γ Σ_{j=1}^J y_j log[π(z_j′γ)] + (m_j − y_j) log[1 − π(z_j′γ)].    (5.3)
We discuss two measures that can be used to quantify the difference between the observed and fitted values. We denote the fitted values as

    ŷ_j = m_j π̂_j = m_j exp(γ̂_0 + Σ_{k=1}^s z_jk γ̂_k) / [1 + exp(γ̂_0 + Σ_{k=1}^s z_jk γ̂_k)],    j = 1, …, J.

The Pearson residual is defined as

    r(y_j, π̂_j) = (y_j − m_j π̂_j) / √(m_j π̂_j(1 − π̂_j)),    j = 1, …, J.

For the case y_j = 0, this reduces to

    r(y_j, π̂_j) = −√(m_j π̂_j / (1 − π̂_j)),

whereas for y_j = m_j we have

    r(y_j, π̂_j) = √(m_j (1 − π̂_j) / π̂_j).

The Pearson chi-square statistic is based on it,

    X² = Σ_{j=1}^J r(y_j, π̂_j)².    (5.4)
The deviance residual is defined as

    d(y_j, π̂_j) = ± {2 [y_j log(y_j / (m_j π̂_j)) + (m_j − y_j) log((m_j − y_j) / (m_j(1 − π̂_j)))]}^{1/2},    j = 1, …, J,

where the sign is that of (y_j − m_j π̂_j). In the case of no positive response, i.e., y_j = 0,

    d(y_j, π̂_j) = −{2 m_j |log(1 − π̂_j)|}^{1/2},

whereas when all observations with that covariate pattern have a positive response, i.e., y_j = m_j,

    d(y_j, π̂_j) = {2 m_j |log(π̂_j)|}^{1/2}.

The deviance is then the summary statistic based on the deviance residual and is given as

    D = Σ_{j=1}^J d(y_j, π̂_j)².    (5.5)
Both summary statistics are distributed as χ² with J − (s + 1) degrees of freedom, under the assumption that the fitted model is correct. These statistics measure the goodness of fit of the model: X² measures the relative deviation between the observed and fitted values, whilst D measures the disagreement between the maxima of the observed and fitted log-likelihood functions. Large values indicate that the observations are poorly accounted for by the model.
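These residuals and the two summary statistics can be sketched as follows on illustrative grouped data (the pattern counts and fitted probabilities are placeholders).

```python
import numpy as np

# Sketch of the goodness-of-fit summaries: Pearson and deviance residuals
# over J covariate patterns, and X^2 = sum r_j^2, D = sum d_j^2 as in
# (5.4)-(5.5). Inputs (y_j, m_j, pihat_j) are illustrative.

def pearson_residuals(y, m, pi):
    return (y - m * pi) / np.sqrt(m * pi * (1 - pi))

def deviance_residuals(y, m, pi):
    # Terms of the form 0 * log(0) are taken as 0 (y_j = 0 or y_j = m_j cases).
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / (m * pi)), 0.0)
        t2 = np.where(m - y > 0, (m - y) * np.log((m - y) / (m * (1 - pi))), 0.0)
    return np.sign(y - m * pi) * np.sqrt(2 * (t1 + t2))

y = np.array([0, 2, 5, 3], dtype=float)    # positive responses per pattern
m = np.array([4, 6, 5, 3], dtype=float)    # pattern sizes, sum(m) = n
pi = np.array([0.2, 0.4, 0.7, 0.9])        # fitted probabilities pihat_j

r = pearson_residuals(y, m, pi)
d = deviance_residuals(y, m, pi)
X2, D = float(np.sum(r**2)), float(np.sum(d**2))
print(np.round(r, 3), np.round(d, 3))
```

Note the first pattern (y_j = 0) produces negative residuals under both definitions, matching the sign convention above.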
Graphical plots of these goodness-of-fit measures can be used to allow for easier detection of observations that are ill-fitting to the model. A plot of the Pearson residuals against standard normal quantiles is one such approach, where deviation from a linear pattern indicates lack of fit. Other useful plots are index plots of the residuals against the observation index or, more commonly, plots of the residuals against the fitted values, i.e., r_j vs. π̂_j or d_j vs. π̂_j. Ill-fitting observations will have large residuals.
5.3 Regression Diagnostics
The diagnostic measures discussed in this section assist in identifying observations that are not well explained by the model, and in pointing out observations that dominate some aspect of the fit and quantifying their effect on it. The work by Pregibon (32) was key in establishing the theory that extended diagnostics from the linear regression model to logistic regression. We discuss these ideas for the functional logistic model.

The analog of the projection matrix for the reduced functional logistic model (5.1) is given by

    P = I − H = I − U^{1/2} Z (Z′UZ)⁻¹ Z′ U^{1/2},

where U is a J × J diagonal matrix with general element u_j = m_j π̂_j(1 − π̂_j) and H is the hat matrix. A diagonal element of this hat matrix, h_j, is therefore of the form

    h_j = m_j π̂_j(1 − π̂_j) z_j′(Z′UZ)⁻¹z_j = u_j q_j,

where q_j = z_j′(Z′UZ)⁻¹z_j. Pregibon (32) showed that, just as in the linear model, P is symmetric, idempotent and spans the residual space. Therefore, small values of p_jj detect extreme points in the design space. Diagnostic plots useful for identifying outlying and influential curves include the Pearson residual (r_j) and the deviance residual (d_j), as discussed in the last section, as well as the diagonal elements of the projection matrix (p_jj).
Finally, we discuss diagnostic measures that look at the effect of infinitesimal model perturbations on the model fit and thereby quantify the effect of each observation (or subset of observations) on the fit. The diagnostic that measures the sensitivity of the estimated parameter in the reduced functional logistic model to infinitesimal perturbations of all observations with the same covariate pattern is given by

    Δβ̂_j = r_j² h_j / (1 − h_j)² = r_sj² h_j / (1 − h_j),

where r_sj is the studentized Pearson residual, defined as

    r_sj = r_j / √(1 − h_j),

and r_j is the Pearson residual. This measure, Δβ̂_j, is analogous to Cook's distance for the linear model. Peña (31) proposed a measure that differs from Cook's measure for the linear regression model: whereas Cook's distance determines the influence of an observation by deleting it and measuring the effect on the predictions, Peña's measure looks at the influence of an observation based on how that observation is itself influenced by the rest of the data. In the case of the reduced functional logistic model, the influence of the jth covariate pattern is measured as

    P_j = (ŷ_j − ŷ_{j(−j)})² / (m_j π̂_j(1 − π̂_j)),

where ŷ_{j(−j)} is the predicted response when the jth covariate pattern is removed from the sample.
If the observations with the jth covariate pattern are not well fit by the model, infinitesimal perturbations cause changes in the deviance and chi-square statistics discussed in the previous section, which most often can be isolated to the residual components. To measure the effect of each covariate pattern on the model fit, the rates of change of the deviance statistic and the chi-square statistic can be approximated by

    ΔD_j = d_j² + r_j² h_j / (1 − h_j)

and

    ΔX²_j = r_j² / (1 − h_j),

respectively. Covariate patterns that are poorly fit by the model are identified by large values of ΔD_j and ΔX²_j. Similarly, covariate patterns having the greatest influence on the estimated parameter values are identified by large values of Δβ̂_j.
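The hat values and the three influence measures can be sketched together; the grouped data below are illustrative and π̂ is taken as given rather than fitted.

```python
import numpy as np

# Sketch of the influence diagnostics: hat diagonal h_j from
# H = U^{1/2} Z (Z'UZ)^{-1} Z' U^{1/2}, Pearson and deviance residuals, and
# dbeta_j = r_j^2 h_j/(1-h_j)^2, dD_j = d_j^2 + r_j^2 h_j/(1-h_j),
# dX2_j = r_j^2/(1-h_j). Grouped data (y_j, m_j, pihat_j) are illustrative.

def logistic_diagnostics(Z, y, m, pi):
    u = m * pi * (1 - pi)                              # u_j = m_j pi_j (1 - pi_j)
    M = np.linalg.inv(Z.T @ (Z * u[:, None]))          # (Z'UZ)^{-1}
    h = u * np.einsum("jk,kl,jl->j", Z, M, Z)          # hat diagonal h_j = u_j q_j
    r = (y - m * pi) / np.sqrt(u)                      # Pearson residuals
    t1 = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / (m * pi)), 0.0)
    t2 = np.where(m > y, (m - y) * np.log(np.where(m > y, m - y, 1.0) / (m * (1 - pi))), 0.0)
    d = np.sign(y - m * pi) * np.sqrt(np.maximum(2 * (t1 + t2), 0.0))
    dbeta = r**2 * h / (1 - h) ** 2
    dD = d**2 + r**2 * h / (1 - h)
    dX2 = r**2 / (1 - h)
    return h, dbeta, dD, dX2

rng = np.random.default_rng(7)
Jn, cols = 30, 3                                       # J patterns, s + 1 = 3 columns
Z = np.column_stack([np.ones(Jn), rng.normal(size=(Jn, cols - 1))])
m = np.full(Jn, 5.0)
pi = 1 / (1 + np.exp(-Z @ np.array([0.2, 0.8, -0.5])))
y = rng.binomial(5, pi).astype(float)

h, dbeta, dD, dX2 = logistic_diagnostics(Z, y, m, pi)
print(h.sum())                                         # trace of H = rank of Z
```

Index plots of dbeta, dD and dX2 against j (or against h_j) are then the plots used in the numerical examples below.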
5.4 Numerical Examples
The diagnostic measures discussed in this chapter are applied to data sets used in earlier chapters. The Canadian Weather data set with the outlier is fitted using the (robust) principal component based approach. The Poblenou NOX emission data, which have been identified to contain 5 outlying curves, are fitted using the (robust) penalized maximum likelihood approach. In both instances, we compute the diagnostic and goodness-of-fit measures for the robust and non-robust approaches discussed in this dissertation, utilizing the two different estimation methods: principal component based estimation and penalized maximum likelihood based estimation.

(a) r_i vs. index (b) d_i vs. index
Figure 5.1: Index plots for the Canadian Weather data comparing residuals of the principal component based approaches
5.4.1 Canadian Weather Diagnostics
This data set has 23 samples representing weather stations, each with 12 mean monthly temperatures recorded. Of these sample curves, n₁ = 9 stations have drought risk and the remaining n₂ = 14 do not. The functional logistic model from this data set was used to predict the risk of drought for these 23 locations based on the functional temperature predictor. An outlier was introduced into this data set by shifting and stretching a random curve.
The classical principal component based estimation, as explained in this chapter, and our proposed robust principal component based estimation technique from Chapter 3 are used to fit these data, and their diagnostic measures are compared. No covariate patterns were distinguished, so J = n. Figure 5.1 displays the index plots for the two types of residuals discussed, i.e., the Pearson residual and the deviance residual. The residuals in both instances are relatively small, indicating that both models fit the data well. On average, the robust PCA fit has smaller residuals than the classical PCA fit of the functional logistic model. The summary statistics measuring the appropriateness of the fitted models were D = 2.8414 × 10⁻⁹ and X² = 1.4207 × 10⁻⁹ for the fit that used the classical PCA approach, whilst the fit that used our proposed robust PCA method had D = 8.6886 × 10⁻¹⁰ and X² = 4.344 × 10⁻¹⁰, on 19 degrees of freedom in both cases.

The weather stations identified as having the higher residuals by both fits are shown in Figure 5.2. The identified stations are Schefferville, Bagottville, Kamloops, Yellowknife and Resolute; the first two are classified as having no risk of drought whilst the last three are identified as having risk of drought. However, it is important to note that these residuals are still relatively small, so the appropriateness of the fit holds.

The diagnostic measures Δβ̂_j, ΔX²_j and ΔD_j were evaluated, and summaries of these statistics are plotted against h_jj in Figure 5.3. The values of these statistics are small, and therefore we can conclude that no observations are poorly fit or exert a large influence on the estimated parameter values. The temperature function for Schefferville (#7) has noticeably higher values for the non-robust model fit.
5.4.2 Poblenou NOX Emission Diagnostics
The Poblenou NOX example is considered as a functional logistic regression model where the functional predictor is the functional trajectory of NOX levels over a 24-hour period, with a binary response indicating whether the day is a working day (Y = 1) or a non-working day (Y = 0).
(a) No drought risk (b) Drought risk (c) All
Figure 5.2: Weather stations with the larger residuals shown as peaks in Figure 5.1
The approach taken to fit this model was that of (robust) penalized maximum likelihood, as discussed in Chapter 4. With this example we demonstrate that the diagnostic methods discussed in this chapter can be extended to other estimation approaches. Figure 5.4 displays the index plots of the deviance residuals (d_j) against the days on which the NOX levels were recorded. There are several days with large values for the deviance residuals; three noticeable days are 03/23 (#27), 03/25 (#29) and 05/16 (#72), with deviance residuals above 10.

Figure 5.3: Diagnostic plots that show the influence of each observation on the different diagnostic measures versus h_jj
The goodness-of-fit summary statistics for the non-robust penalized approach and the robust penalized approach were extremely large in both cases (p-value ≪ 0.01). Therefore, both models are poor fits of the data, and drawing inferences on the odds of a working day based on these hourly NOX emission levels would be inappropriate.
(a) Non-Robust (b) Robust (c) Overlay
Figure 5.4: Index plots of the deviance residuals for the NOX penalized maximum likelihood fitted models
The diagnostic plots in Figure 5.5 also indicate the observations with the largest influence on the parameter estimate as well as on the diagnostic summaries. These observations are consistent with those in the index plot.
Figure 5.5: Diagnostic plots that show the influence of each observation on the different diagnostic measures versus h_jj for the NOX model
Chapter 6
Conclusion
The aim of this dissertation work was to propose robust statistical methods for the functional logistic model, which has functional predictors and a binary response. Whilst an increasing amount of work has been directed toward functional data analysis, we are not aware of any work specifically looking at robust methods for the functional logistic model.
Firstly, we proposed a robust estimation technique that was principal component based. Both the functional predictor and the functional parameter were estimated using basis expansion, where the order of expansion was determined by the generalized cross-validation method. The reduced functional logistic model had multicollinearity issues, which were eliminated by robustly obtaining principal components that were then used as the covariates in the model. The resulting estimator of the functional parameter was shown to be more efficient in a simulation study as well as in an application to a real dataset.
Secondly, we proposed a Mallows-type estimator. This second robust estimation approach made use of a penalization technique as well as down-weighting of high leverage points. The penalty term, based on the curvature properties of the β̂(t) function, was introduced to ensure that the estimator retained the smoothness property. Huber-type weights based on the robust Mahalanobis distance of the covariate were used to minimize the effect of observations that were outliers in the design space. The proposed estimator was shown to perform better and to be more efficient in a simulation study and on real functional data with outlying curves.
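The core computation behind such a Mallows-type estimator can be sketched as a weighted, penalized logistic fit. The following is a minimal Newton-iteration sketch, not the dissertation's implementation: X stands for the reduced (basis-expanded) design matrix, w for the observation weights, and P for a curvature-based penalty matrix; all names and tuning choices are illustrative:

```python
import numpy as np

def mallows_penalized_fit(X, y, w, P, lam=1.0, n_iter=50):
    """Maximize the weighted logistic log-likelihood minus lam * beta' P beta
    by Newton iterations (illustrative sketch)."""
    X, y, w = (np.asarray(a, dtype=float) for a in (X, y, w))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        # gradient of the penalized weighted log-likelihood
        grad = X.T @ (w * (y - pi)) - 2.0 * lam * (P @ beta)
        # Newton (negative Hessian) matrix: X' diag(w pi (1-pi)) X + 2 lam P
        W = w * pi * (1.0 - pi)
        hess = X.T @ (X * W[:, None]) + 2.0 * lam * P
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta
```

The penalty matrix keeps the fitted coefficient (and hence the reconstructed β̂(t)) smooth, while the weights w reduce the contribution of high-leverage curves to both the gradient and the Hessian.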
Lastly, we explored goodness-of-fit and diagnostic measures for the functional logistic model. These measures were generalizations of widely used and well-known diagnostic measures for the standard logistic model. We showed that some adaptations are necessary when dealing with the functional logistic model, and illustrated their usefulness and performance by analyzing the fit of the real-world examples discussed in prior chapters.
6.1 Future Work
The robust estimation approaches proposed in Chapters 3 and 4 only considered outliers in the design space. However, for the functional logistic model, outliers can be viewed either as extreme sample curves in the design space or as misclassification errors in the response. The robust methods proposed, whilst resistant to outlying curves in the design space, are not resistant to misclassification errors. These errors occur when observations in the y = 0 class are recorded as being in the y = 1 class, and vice versa. To simultaneously address the problem of misclassification and deal with extreme observations in the design space, a weight function of the form wi = wyi wxi, where wyi denotes weights for misclassification problems and wxi denotes weights to down-weight extreme data points in the design space, can be incorporated into the robust penalized approach proposed in Chapter 4.
Croux et al. (8) showed that the breakdown point for the MLE is 2(p − 1)/n, where p indicates the number of predictors in the standard logistic model. The basis for the wxi is therefore the hat matrix H = X(X'X)^(−1)X', where X is the n × (p + 1) design matrix for our reduced functional model. In particular, one can consider the Huber-type weights

wxi = min{1, 2K bn / hii},

wyi = min{1, c |(yi − πi) / [πi(1 − πi)]^(1/2)|^(−1)},

where c is a tuning constant that controls the degree of robustness.
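A minimal sketch of these two weight components follows. The function name, the choice bn = (p − 1)/n tied to the breakdown point of Croux et al. (8), and the default values of the tuning constants K and c are all illustrative assumptions, not specifications from the dissertation:

```python
import numpy as np

def huber_weights(X, y, pi_hat, K=2.0, c=1.345):
    """Huber-type weights: w_xi down-weights high-leverage rows via the
    hat values h_ii; w_yi down-weights large Pearson residuals (potential
    misclassifications).  Returns the combined weights w_i = w_yi * w_xi."""
    X = np.asarray(X, dtype=float)
    n, p1 = X.shape
    # hat values: diagonal of H = X (X'X)^{-1} X'
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    b_n = (p1 - 1) / n                      # illustrative breakdown-based bound
    w_x = np.minimum(1.0, 2.0 * K * b_n / h)
    # Pearson residuals r_i = (y_i - pi_i) / sqrt(pi_i (1 - pi_i))
    pi_hat = np.clip(np.asarray(pi_hat, dtype=float), 1e-12, 1 - 1e-12)
    r = (np.asarray(y, dtype=float) - pi_hat) / np.sqrt(pi_hat * (1.0 - pi_hat))
    w_y = np.minimum(1.0, c / np.abs(r))
    return w_x * w_y
```

Observations that are unexceptional in both leverage and residual receive weight one, so the weighted fit reduces to the ordinary penalized fit on clean data.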
The estimation methods proposed in this dissertation work have taken an approach that reduces the functional predictors to multivariate predictors by way of basis expansion. Therefore, in both cases robust multivariate techniques were used. Zhang and Chen (45) showed that the smoothing step in these approaches may result in some bias. A fully functional approach would be ideal, especially in the case of the principal component approach. Escabias et al. (9) contrasted a functional PCA approach with one that used multivariate PCA techniques on a reduced functional logistic regression. Gervini (15) proposed a fully functional approach for spherical PCA. Lee et al. (24) also recently proposed an M-estimation type functional principal component analysis method. These methods result in robust principal functions that can be used as functional predictors in the functional logistic model.
The nature of the functional logistic model allows it to be used for classification purposes where the data have two classes and the observations are functional. Robust functional classification methods would be applicable in diverse fields where there are two or more classes. The functional logistic model is part of the generalized functional linear model family; therefore, a broader focus on robust estimation methods for this family could be pursued.
Finally, the attraction to working with functional data stemmed from brain imaging studies and the application of statistical methods and tools to functional Magnetic Resonance Imaging (fMRI) data. Interest in brain images and other neuro-imaging studies continues to grow. In future work, the application of these functional statistical methods to this particular type of data will be a main focus.
Bibliography
[1] J. L. Bali, G. Boente, D. E. Tyler, and J.-L. Wang. Robust functional principal components: A projection-pursuit approach. The Annals of Statistics, 39:2852–2882, 2011.

[2] L. Barker and C. Brown. Logistic regression when binary predictor variables are highly correlated. Statistics in Medicine, 20:1431–1442, 2001.

[3] G. Boente and R. Fraiman. Discussion of robust principal components for functional data by Locantore et al. Test, 8:28–35, 1999.

[4] H. Cardot and P. Sarda. Estimation in generalized linear models for functional data via penalized likelihood. Journal of Multivariate Analysis, 92(1):24–41, 2005.

[5] R.J. Carroll and S. Pederson. On robustness in the logistic regression model. Journal of the Royal Statistical Society (B), 55:693–706, 1993.

[6] J.B. Copas. Binary regression models for contaminated data. Journal of the Royal Statistical Society (B), 50:225–265, 1988.

[7] P. Craven and G. Wahba. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31:377–403, 1979.

[8] C. Croux, G. Dhaene, and D. Hoorelbeke. The breakdown behaviour of the maximum likelihood estimator in the logistic regression model. Statistics and Probability Letters, 60:377–386, 2002.

[9] M. Escabias, A.M. Aguilera, and M.J. Valderrama. Principal component estimation of functional logistic regression: Discussion of two different approaches. Journal of Nonparametric Statistics, 16(3–4):365–384, 2004.

[10] M. Escabias, A.M. Aguilera, and M.J. Valderrama. Modeling environment data by functional principal component logistic regression. Environmetrics, 16:95–107, 2005.

[11] M. Febrero, P. Galeano, and W. González-Manteiga. Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics, 19:331–345, 2007.

[12] M. Febrero-Bande, P. Galeano, and W. González-Manteiga. Influence in the functional linear model with scalar response. Journal of Multivariate Analysis, 101(2):327–339, 2010.

[13] F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis: Theory and Practice. Springer, 2006.

[14] R. Fraiman and G. Muniz. Trimmed means for functional data. Test, 10:419–440, 2001.

[15] D. Gervini. Robust functional estimation using the median and spherical principal components. Biometrika, 95:587–600, 2008.

[16] D. Gervini. Detecting and handling outlying trajectories in irregularly sampled functional datasets. The Annals of Applied Statistics, 3:1758–1775, 2009.

[17] J. J. Goeman. L1 penalized estimation in the Cox proportional hazards model. Biometrical Journal, 52(1):70–84, 2010.

[18] J. Goldsmith, J. Bobb, C.M. Crainiceanu, B. Caffo, and D. Reich. Penalized functional regression. Journal of Computational and Graphical Statistics, 20:830–851, 2011.

[19] R.R. Hocking. The analysis and selection of variables in linear regression. Biometrics, 32:1–49, 1976.

[20] D.W. Hosmer and S. Lemeshow. Applied Logistic Regression. Wiley, second edition, 2000.

[21] M. Hubert, P.J. Rousseeuw, and K.V. Branden. ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1):64–79, 2005.

[22] G.M. James. Generalized linear models with functional predictors. Journal of the Royal Statistical Society, Series B, 64(3):411–432, 2002.

[23] S. Le Cessie and J.C. van Houwelingen. Ridge estimators in logistic regression. Journal of the Royal Statistical Society, 41(1):191–201, 1992.

[24] S. Lee, H. Shin, and N. Billor. M-type smoothing spline estimation for principal functions. Computational Statistics and Data Analysis, 2013.

[25] X. Leng and H.G. Müller. Classification using functional data analysis for temporal gene expression data. Bioinformatics, 22:68–76, 2006.

[26] C.L. Mallows. On some topics in robustness. Technical Memorandum, 1975.

[27] E.J. Malloy, E.J. Bedrick, and T. Goldsmith. Diagnostics for the scale of functional predictors in generalized linear models. Technometrics, 49(4):480–489, 2007.

[28] H-G. Müller and J-M. Chiou. Diagnostics for functional regression via residual processes. Computational Statistics and Data Analysis, 51:4849–4863, 2007.

[29] H.G. Müller and U. Stadtmüller. Generalized functional linear models. The Annals of Statistics, 33(2):774–805, 2005.

[30] R.T. Ogden and P.T. Reiss. Functional generalized linear models with images as predictors. Biometrics, 66:61–69, 2010.

[31] D. Peña. A new statistic for influence in linear regression. Technometrics, 47(1):1–12, 2005.

[32] D. Pregibon. Logistic regression diagnostics. The Annals of Statistics, 9(4):705–724, 1981.

[33] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. URL http://www.R-project.org. ISBN 3-900051-07-0.

[34] J.O. Ramsay and B.W. Silverman. Functional Data Analysis. Springer, second edition, 2005.

[35] J.O. Ramsay, H. Wickham, S. Graves, and G. Hooker. Functional Data Analysis, 2012. URL http://www.cran.r-project.org/web/packages/fda.

[36] S.J. Ratcliffe, G.Z. Heller, and L.R. Leader. Functional data analysis with application to periodically stimulated foetal heart rate data II: Functional logistic regression. Statistics in Medicine, 21:1115–1127, 2002.

[37] P.T. Reiss, R.T. Ogden, J.J. Mann, and R.V. Parsey. Functional logistic regression with PET imaging data: A voxel-level clinical diagnostic tool. Journal of Cerebral Blood Flow and Metabolism, 25(S635), 2005.

[38] P. Sawant, N. Billor, and H. Shin. Functional outlier detection with robust functional principal component analysis. Computational Statistics, 27(1):83–102, 2012.

[39] R.L. Schaefer. Alternate estimators in logistic regression when the data are collinear. Journal of Statistical Computation and Simulation, 25(1–2):75–91, 1986.

[40] L.A. Stefanski. Influence and measurement error in logistic regression. PhD Thesis, 1983.

[41] T.S. Tian. Functional data analysis in brain imaging studies. Frontiers in Psychology, 1(35), 2010.

[42] V. Todorov. Robust Location and Scatter Estimation and Robust Multivariate Analysis with High Breakdown Point, 2012. URL http://www.cran.r-project.org/web/packages/rrcov.

[43] E. Vágó and S. Kemény. Logistic ridge regression for clinical data analysis (a case study). Applied Ecology and Environmental Research, 4(2):171–179, 2006.

[44] M. Victoria-Feser. Robust logistic regression for binomial responses. 2000. URL http://dx.doi.org/10.2139/ssrn.1763301.

[45] J.T. Zhang and J. Chen. Statistical inferences for functional data. The Annals of Statistics, 35:1052–1079, 2007.