Efficient Rank Regression with Wavelet Estimated Scores
by
Eddy Armand Kwessi Nyandjou
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
August 6, 2011
Keywords: Regression, Wavelets, Scores
Copyright 2011 by Eddy Armand Kwessi Nyandjou
Approved by
Asheber Abebe, Chair, Associate Professor of Mathematics and Statistics
Geraldo De Souza, Professor of Mathematics and Statistics
Nedret Billor, Associate Professor of Mathematics and Statistics
Peng Zeng, Associate Professor of Mathematics and Statistics
Abstract
Wavelets have lately been widely used in many areas such as physics, astronomy, the biological sciences, and, more recently, statistics. The main goal of this dissertation is to provide a new contribution to an important problem in statistics, and particularly in nonparametric statistics, namely estimating the optimal score function from data whose underlying distribution is unknown. This problem arises naturally in nonparametric linear regression models and can provide better insight into current problems in longitudinal and repeated measures analysis through mixed models.

Our approach to estimating the score function is to use suitable compactly supported wavelets, such as the Daubechies, Symlet, or Coiflet families. The smoothness and time-frequency properties of these wavelets allow us to find an asymptotically efficient estimator of the slope parameter of the linear model. Consequently, we are also able to provide a consistent estimator of the asymptotic variance of the regression parameter. For related mixed models, asymptotic relative efficiency is also discussed.
Acknowledgments
Without your blessings, Almighty God, I could never have achieved this. Thank you so much for the strength, the health, and the knowledge you gave me throughout these years and simply throughout my life.
I would like to dedicate this dissertation to my late parents NYANDJOU Felix and
NGUEBA Marie. Even though they did not live long enough to see this achievement,
every step of my journey was made possible because of the sacrifices they made in raising
me and my brothers. I wish to express my immense gratitude to the following:
My wife Corine Michelle KWESSI for her patience, understanding and total support
and for taking care of our wonderful children Eddy Harold KWESSI KWESSI, Alexandra
Felixia KWESSI and Alyssa Francesca KWESSI.
Dr. Asheber Abebe who suggested and supervised this work. His understanding,
availability and encouragement throughout this dissertation will never be forgotten.
Dr. Geraldo De Souza for helping me understand the field of Harmonic Analysis,
for mentoring me like a father, for being there when I needed him.
Dr. Nedret Billor for her valuable advice and for playing other wonderful and
immeasurable roles.
Drs. Michel Smith and Chris Rodger for their total support and encouragement
during my short stay at Auburn University as a graduate student.
All Professors who taught during the course work for the marvelous job done, par-
ticularly Drs. Peng Zeng, Mark Carpenter, Gary Gruenhage, Paul Schmidt, Amnon J.
Meir, Narenda Govil.
Prof. Charles Chidume, whose encouragement and support made my journey to
the US possible.
My Brothers Blaise Tchepnda Nyandjou, Thierry Roland Nono Nyandjou, Sidoine
Raoul Youtcha Nyandjou, Blaise II Tchepnda, my sisters Pelagie Njinkeu Nyandjou,
Elysee Tchatchoua Nyandjou, Grace Signe Nyandjou, Christelle Nyandjou for all the
time we spent together and all the support they provided me. My course mates, Guyvannie Miankokana, Achard Bindele, Guy Merlin Nguelefack, Sean O'Neil, Frank Sturm,
Daniel Roberts, Drs. Gerald Chidume, Julian Allagan, Ngwane Fidele, Paul Alfonso,
for the fun we had at Auburn. My Friends, Alain Desire Kounkeu, Ruben Bayemi, Vales
Ngoule, Moses Ntam, Drs. Foutse Khomh, Etienne Non, for their support.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Contribution of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Preliminaries: Review of Rank Based Analysis and Wavelet Theory . . . . . 4
2.1 Rank Based Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Estimation of regression parameters . . . . . . . . . . . . . . . . . 5
2.1.2 Tests of linear hypothesis . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Basic Wavelet Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 A motivating example . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Examples of wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Estimation of the Score Function . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 General Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.1 Discussion of the assumptions . . . . . . . . . . . . . . . . . . . . 28
3.3 Estimation of the coefficients . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Estimation of the score function . . . . . . . . . . . . . . . . . . . . . . . 37
4 Estimation of the Slope Parameter . . . . . . . . . . . . . . . . . . . . . . . 39
5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Issues related to simulations . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Simple Mixed Models with Dependent Error Structure . . . . . . . . . . 45
5.3 Adequate space for score functions . . . . . . . . . . . . . . . . . . . . . 49
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
List of Figures
2.1 Geometry of Rank Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Geometry of Rank Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Gibbs Phenomenon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 The Haar Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 The Shannon Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 The Mexican Hat Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 The Daubechies Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
List of Tables
5.1 Comparison between the different methods . . . . . . . . . . . . . . . . . 45
Chapter 1
Introduction
One of the most widely used models in statistical modeling is the linear regression model, in which one constructs a hyperplane that "best" describes the relationship between a response variable Y and a vector of covariates x. Typically, one has a set of data measured on, say, n subjects, (Y_1, x_1), ..., (Y_n, x_n), and the construction of the hyperplane involves errors, since not all the data points fall on a plane. These errors are assumed to be random.
The classical approach to estimating the hyperplane is the least squares (LS) procedure, in which the fitted hyperplane is taken to be the point of the column space of the matrix X = (x_1, ..., x_n)^T that is closest, in Euclidean distance, to the vector (Y_1, ..., Y_n)^T. It turns out, by the Gauss-Markov Theorem, that the LS procedure gives the best linear unbiased estimator (BLUE) if the errors have expectation zero, are uncorrelated, and have equal variances. However, the LS procedure is not robust in the presence of outliers and other violations of the underlying assumptions.
There are several approaches one can follow to achieve robustness. One of the more popular techniques is M-estimation (Huber, 1964), which includes the LS procedure as a special case. Another approach is R-estimation, which is based on the ranks of the data. This was initially proposed for the simple linear model by Adichie (1967), based on simple Hodges-Lehmann type location estimators, and was later generalized to the multiple regression model by Jureckova (1971) and Jaeckel (1972). In subsequent works, Hettmansperger, McKean, and colleagues developed methods for testing general linear hypotheses, constructing confidence intervals, etc., making R-estimation of linear models a complete treatment (Hettmansperger and McKean, 1998; Naranjo and McKean, 1997). As a result, the approach is also known as the Jaeckel-Hettmansperger-McKean approach to model fitting (Hollander and Wolfe, 1999).
1.1 Contribution of the Dissertation
One of the difficulties of using M- and R-estimators is that they depend on an unknown function (a score function) that needs to be chosen by the investigator. This choice is just about all the control the investigator has, so it is a very critical step. However, it is usually not very clear which score function to use. A common approach is to choose the function on the basis of a heuristic investigation of a fit based on a chosen (usually simple, such as linear) score function. Another common approach is to trade efficiency for robustness and use score functions that involve some form of trimming or Winsorization.
It is of interest to choose score functions that maximize the efficiency of the resulting estimator. The efficiency of an estimator depends on the underlying distribution of the random error terms, and this distribution is unknown. One approach is to use a density estimator (e.g., a kernel density estimator) to estimate the score function and then use the estimated score function to estimate the regression parameters. Such estimators, known as adaptive estimators, are discussed in Stone (1975) and Koul and Susarla (1983), among others.
A different approach, which uses the underlying structure of R-estimators to determine the score function that maximizes efficiency, was given by Dionne (1981) and Naranjo and McKean (1997). They employed a Fourier series approximation based on a first order Taylor expansion to determine the score function that maximizes the efficiency of the R-estimator. However, this approach lacked the flexibility required for certain underlying distributions; this was especially evident at the two extremes of the domain of the score function.
In this dissertation, we propose a wavelet based approximation of the score function built on a second order Taylor series approximation. As it turns out, the second order Taylor approximation is ideal under the smoothness considerations of the score function. We will provide a theoretical investigation of the optimality of this approach as well as establish the asymptotic properties of the resulting estimator. We will also consider the mixed model and investigate how our approach can be used to estimate the fixed-effects parameters. Moreover, we will take on the task of determining the function space that is best suited for such wavelet-based approximation. This leads us to consider the problem with greater generality from the perspective of harmonic analysis.
1.2 Organization
This dissertation is organized as follows. Chapter 2 contains a brief review of rank
based analysis of linear models and wavelet theory. In Chapter 3, we consider the problem
of estimation of score functions and give the main results of the dissertation. Chapter 4
gives an adaptive estimator of the slope parameter as well as the scale parameter. These
are then used to construct Wald type tests for general linear hypotheses on the slope
parameter. Chapter 5 provides a brief discussion of some computational issues and proposes extensions to dependent error structures, as well as a treatment of the spaces of score functions that are suitable for wavelet approximation.
Chapter 2
Preliminaries: Review of Rank Based Analysis and Wavelet Theory
In this chapter, we review basic notions of rank based analysis for linear models and
basic facts about wavelet theory.
2.1 Rank Based Analysis
We consider the following model:

Y_i = x_i^T β + e_i^*,  1 ≤ i ≤ n,  (2.1.1)

where Y_i denotes the ith response, x_i denotes a p × 1 vector of explanatory variables, β is a p × 1 vector of unknown regression parameters, and e_i^* is the ith random error with distribution F.
Our interest is to estimate β and to test linear hypotheses about it. Note that we could also have a model with an intercept parameter. Indeed, let α = L(e_i^*) be a location functional and let e_i = e_i^* − α. Then L(e_i) = 0 and the model can be written as

Y_i = α + x_i^T β + e_i,  1 ≤ i ≤ n.  (2.1.2)

Remark 2.1.1. β does not depend on the location functional used if F is a member of a location family of distributions.

Proof. Consider any location functional L of the error distribution and let α = L(F), where F is the common distribution of the e_i^* for 1 ≤ i ≤ n. Then e_i = e_i^* − α has distribution F_α(x) = F(x + α), and L(F_α) = 0. We then have F(x) = F_α(x − α), so L(F) = α. Furthermore, Y_i has distribution H(x) = F_α(x − (x_i^T β + α)). Thus L(H) = x_i^T β + α is a location functional for Y_i, and consequently β = (x_i x_i^T)^{−1} x_i (L(H) − L(F)), which does not depend on the particular choice of L.
Definition 2.1.2. Let Y = (Y_1, ..., Y_n)^T denote the vector of observations, let X denote the n × p matrix whose ith row is x_i^T, and let e = (e_1, ..., e_n)^T. The model in (2.1.2) can be expressed as

Y = α1 + Xβ + e,  (2.1.3)

where 1 is an n × 1 vector of ones, α1 is a vector of reals representing the intercept, and β is a p × 1 vector of unknown regression parameters.

Note that the model

Y = α1 + η + e  (2.1.4)

is called the Coordinate Free Model, where η = Xβ ∈ Ω_F, with Ω_F being the column space spanned by the columns of X.

In addition to estimating the parameter of interest β, we will also be interested in general linear tests of the form

H_0: Mβ = 0 versus H_A: Mβ ≠ 0,  (2.1.5)

where M is a q × p matrix of full row rank.
2.1.1 Estimation of regression parameters

Definition 2.1.3. An operator ||·||_* is called a pseudo-norm if it satisfies the following conditions:

1. ||x||_* ≥ 0 for all x ∈ R^n;
2. ||x||_* = 0 if and only if x_1 = ... = x_n;
3. ||αx||_* = |α| ||x||_* for all α ∈ R and x ∈ R^n;
4. ||x + y||_* ≤ ||x||_* + ||y||_* for all x, y ∈ R^n.

Consider the function

||x||_* = Σ_{i=1}^n a(R(x_i)) x_i,  (2.1.6)

where x = (x_1, ..., x_n) is a vector in R^n, R(x_i) is the rank of x_i among x_1, ..., x_n, and the a(i), called scores, are such that a(1) ≤ ... ≤ a(n), Σ_{i=1}^n a(i) = 0, and a(i) = −a(n + 1 − i).
Theorem 2.1.4. The function ||·||_* in (2.1.6) is a pseudo-norm.

Proof. 1. Positivity.
Using the connection between ranks and order statistics, we can write

||x||_* = Σ_{i=1}^n a(i) x_(i).

Suppose that x_(i_0) is the last order statistic with a negative score. Since Σ_{i=1}^n a(i) = 0, we have

||x||_* = Σ_{i=1}^n a(i)(x_(i) − x_(i_0))
        = Σ_{i ≤ i_0} a(i)(x_(i) − x_(i_0)) + Σ_{i ≥ i_0} a(i)(x_(i) − x_(i_0)).

Since both terms on the right are nonnegative, we have ||x||_* ≥ 0.
2. Furthermore, if ||x||_* = 0, then both terms in the last equality must be zero. Since a(1) < a(n) and a(1) = −a(n), we must have a(1) < 0 and a(n) > 0. Therefore the first term on the right can be written as Σ_{1 ≤ i ≤ i_0} a(i)(x_(i) − x_(i_0)). Thus Σ_{1 ≤ i ≤ i_0} a(i)(x_(i) − x_(i_0)) = 0 implies x_(1) = x_(2) = ... = x_(i_0). Likewise, we have x_(i_0) = x_(i_0+1) = ... = x_(n). This shows that ||x||_* = 0 implies x_1 = ... = x_n.
3. Homogeneity.
For a positive real α, we have R(αx_i) = R(x_i), while for a negative α the ranks are reversed: R(αx_i) = n + 1 − R(x_i). Clearly, for a positive real α,

||αx||_* = Σ_{i=1}^n a(R(αx_i))(αx_i) = α Σ_{i=1}^n a(R(x_i)) x_i = |α| ||x||_*.

If α < 0, then, using a(i) = −a(n + 1 − i),

||αx||_* = Σ_{i=1}^n a(R(αx_i))(αx_i)
         = α Σ_{i=1}^n a(n + 1 − i) x_(i)
         = −α Σ_{i=1}^n a(i) x_(i)
         = |α| ||x||_*.

4. Triangle inequality.

||x + y||_* = Σ_{i=1}^n a(R(x_i + y_i))(x_i + y_i)
            = Σ_{i=1}^n a(R(x_i + y_i)) x_i + Σ_{i=1}^n a(R(x_i + y_i)) y_i
            ≤ Σ_{i=1}^n a(i) x_(i) + Σ_{i=1}^n a(i) y_(i)   (by a rearrangement theorem of Hardy)
            = ||x||_* + ||y||_*.
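As an aside (a sketch of ours, not part of the original development), the pseudo-norm properties just proved can be checked numerically. The function names below are ours; the scores a(i) = i/(n + 1) − 1/2 are nondecreasing, sum to zero, and satisfy a(i) = −a(n + 1 − i):

```python
import random

def scores(n):
    # a(i) = h(i/(n+1)) with h(u) = u - 1/2: nondecreasing, sums to 0,
    # and satisfies a(i) = -a(n + 1 - i)
    return [(i / (n + 1)) - 0.5 for i in range(1, n + 1)]

def pseudo_norm(x):
    # ||x|| = sum_i a(R(x_i)) x_i, computed via order statistics as sum_i a(i) x_(i)
    a = scores(len(x))
    return sum(ai * xi for ai, xi in zip(a, sorted(x)))

random.seed(1)
for _ in range(100):
    n = 10
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    c = random.uniform(-3, 3)
    assert pseudo_norm(x) >= -1e-12                                    # positivity
    assert abs(pseudo_norm([c * xi for xi in x])
               - abs(c) * pseudo_norm(x)) < 1e-9                       # homogeneity
    xy = [xi + yi for xi, yi in zip(x, y)]
    assert pseudo_norm(xy) <= pseudo_norm(x) + pseudo_norm(y) + 1e-9   # triangle inequality
```

Note that `pseudo_norm` of a constant vector is exactly zero, consistent with property 2 of Definition 2.1.3.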
We now suppose that the scores are generated as a(i) = h(i/(n + 1)) for some nondecreasing function h defined on the interval (0, 1) and such that

∫_0^1 h(u) du = 0,  ∫_0^1 h²(u) du < ∞.

Consider the model in (2.1.4). Rewriting the pseudo-norm above as ||x||_h = Σ_{i=1}^n a(R(x_i)) x_i, a rank estimate of η is a vector Ŷ_h such that

D_h(β) = Distance(Y, Ω_F) = ||Y − Ŷ_h||_h = min_{η ∈ Ω_F} ||Y − η||_h.
The function D_h(β) is also called the dispersion function.
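To make the minimization concrete, here is a small illustrative sketch (ours, not from the dissertation) that recovers the slope in a simple regression Y_i = β x_i + e_i by a crude grid search over the dispersion function with Wilcoxon-type scores a(i) = i/(n + 1) − 1/2:

```python
import random

def dispersion(beta, x, y):
    # D_h(beta) = sum_i a(R(e_i)) e_i, with residuals e_i = y_i - beta * x_i
    # and scores a(i) = i/(n+1) - 1/2
    n = len(x)
    res = [yi - beta * xi for xi, yi in zip(x, y)]
    order = sorted(range(n), key=lambda i: res[i])
    a = [0.0] * n
    for rank, i in enumerate(order, start=1):
        a[i] = rank / (n + 1) - 0.5
    return sum(a[i] * res[i] for i in range(n))

random.seed(2)
n, beta_true = 200, 1.5
x = [random.uniform(-1, 1) for _ in range(n)]
y = [beta_true * xi + random.gauss(0, 0.3) for xi in x]

# D_h is convex in beta, so a grid search over a bracketing interval suffices here
grid = [i / 1000 for i in range(0, 3000)]
beta_hat = min(grid, key=lambda b: dispersion(b, x, y))
assert abs(beta_hat - beta_true) < 0.15
```

In practice the minimization is done with a proper convex optimizer rather than a grid, but the sketch shows that minimizing D_h(β) recovers the slope without any distributional assumption on the errors.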
Once η has been estimated, β can be estimated by solving the equation Xβ = Ŷ_h; hence the rank estimate of β is β̂_R = (X^T X)^{−1} X^T Ŷ_h. The intercept can be estimated by a location estimate based on the residuals ê = Y − Ŷ_h; we could use the median of the residuals, α̂_S = med_{i=1,...,n} {Y_i − x_i^T β̂_R}. Geometrically, the rank estimate of η is the vector that minimizes the normed difference between Y and Ω_F, as shown in Figure 2.1.
Figure 2.1: Geometry of Rank Estimation
The following result, due to Jaeckel (1972), justifies the existence of the rank estimate.
Remark 2.1.5. The dispersion function D_h(β) is a continuous, convex, almost everywhere differentiable function.

Proof. Continuity follows from the inequality |D_h(β_1) − D_h(β_2)| ≤ ||β_1 − β_2||_p ||x||_h, for all x ∈ R^n and all β_1, β_2 ∈ R^p. Convexity follows from the equalities and inequality

D_h(λβ_1 + (1 − λ)β_2) = ||Y − X(λβ_1 + (1 − λ)β_2)||_h
                       = ||λ(Y − Xβ_1) + (1 − λ)(Y − Xβ_2)||_h
                       ≤ λ D_h(β_1) + (1 − λ) D_h(β_2), for λ ∈ (0, 1).

Almost everywhere differentiability follows from the equality ∇D_h(β) = −Σ_{i=1}^n x_i a(R(Y_i − x_i^T β)).
Remark 2.1.6. The rank estimate β̂_R of β is location and scale equivariant; that is, β̂_R(kY) = k β̂_R(Y) and β̂_R(Y + Xδ) = β̂_R(Y) + δ, for k ∈ R and δ ∈ R^p.

Proof. D_h(β̂_R(Y)) = Dist(Y, Ω_F) = ||Y − Ŷ_h||_h. Therefore,

D_h(β̂_R(Y + Xδ)) = Dist(Y + Xδ, Ω_F)
                  = ||Y + Xδ − Ẑ_h||_h
                  = ||Y − (Ẑ_h − Xδ)||_h
                  = Dist(Y, Ω_F).

So we must have Ẑ_h = Ŷ_h + Xδ. Thus,

β̂_R(Y + Xδ) = (X^T X)^{−1} X^T [Ŷ_h + Xδ]
             = (X^T X)^{−1} X^T Ŷ_h + δ
             = β̂_R(Y) + δ.

In the same way, for any k ≠ 0,

D_h(β̂_R(kY)) = Dist(kY, Ω_F)
             = ||kY − Ŵ_h||_h
             = |k| ||Y − (1/k) Ŵ_h||_h
             = |k| Dist(Y, Ω_F) = |k| D_h(β̂_R(Y)).

So we must have (1/k) Ŵ_h = Ŷ_h. Thus,

β̂_R(kY) = (X^T X)^{−1} X^T Ŵ_h
         = (X^T X)^{−1} X^T (k Ŷ_h)
         = k (X^T X)^{−1} X^T Ŷ_h
         = k β̂_R(Y).
As a consequence, and without loss of generality, the theory we develop will be established under the assumption that the true β is 0, for simplicity.

We end this section with the following theorem, proved in the appendix, on the asymptotic properties of the rank estimator β̂_R of β.
Theorem 2.1.7. The estimators α̂_S and β̂_R are jointly asymptotically normal:

(α̂_S, β̂_R)^T  ≈  N_{p+1}( (α, β)^T , diag( n^{−1} τ_S² , τ_h² (X^T X)^{−1} ) ),

where τ_h and τ_S will be defined later.
2.1.2 Tests of linear hypotheses

Let us consider the model (2.1.2). Note that Distance(Y, Ω_F) is the amount of residual dispersion not accounted for by the model (2.1.2). Let Ω_RF be the subspace subject to H_0, that is, Ω_RF = {η ∈ Ω_F : η = Xβ for some β such that Mβ = 0}.

Clearly, Ω_RF is a subspace of Ω_F, and since MXβ = 0 is a system of q equations with p unknowns, Dim(Ω_RF) = p − q.

If Ŷ_Rh is the rank estimate when the reduced model is fitted, then the nonnegative quantity

RD_h = Dist(Y, Ω_RF) − Dist(Y, Ω_F)  (2.1.7)

represents the reduction in residual dispersion when we pass from the reduced model to the full model, as shown in Figure 2.2.

Thus, large values of RD_h indicate H_A, while small values support H_0. If RD_h is standardized, the ensuing asymptotic distribution theory suggests that F_h = (RD_h/q)/(τ̂_h/2) should be compared with the F critical values with q and n − (p + 1) degrees of freedom, at least for small sample studies, where τ̂_h is a consistent estimator of τ_h.

The rank analogue of Wald's test for the full model is given by

W_h = (Mβ̂_R)^T [M(X^T X)^{−1} M^T]^{−1} (Mβ̂_R) / (q τ̂_h²).  (2.1.8)

It can be proved (Hettmansperger and McKean, 1983) that W_h has an asymptotic χ²(q) distribution.
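As a numerical illustration of (2.1.8) (a sketch of ours, with made-up inputs, not from the dissertation), consider the single-predictor case p = q = 1 with M = [1], where the statistic reduces to W_h = β̂_R² Σ_i x_i² / τ̂_h²:

```python
def wald_statistic(beta_hat, tau_hat, x):
    # W_h = (M b)^T [M (X^T X)^{-1} M^T]^{-1} (M b) / (q tau^2),
    # specialized to p = q = 1 and M = [1], where (X^T X)^{-1} = 1 / sum(x_i^2)
    sxx = sum(xi * xi for xi in x)
    return beta_hat ** 2 * sxx / tau_hat ** 2

# hypothetical inputs: slope estimate 0.8, scale estimate 1.2, centered design points
x = [0.5, -1.0, 1.5, 2.0, -0.5]
W = wald_statistic(0.8, 1.2, x)
reject = W > 3.841  # 95th percentile of the chi-square(1) distribution
```

In a real analysis, `beta_hat` and `tau_hat` would come from the rank fit and a consistent scale estimate; the sketch only shows how the quadratic form is assembled and compared with the χ² critical value.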
Figure 2.2: Geometry of Rank Tests
2.2 Basic Wavelet Theory

In this section, we introduce the basics of wavelets and provide the motivation for their invention.

2.2.1 A motivating example

The Fourier series of a square integrable function f can be obtained by dilating the orthonormal basis {e^{ikx}}_{k∈Z}. However, each element of the basis obtained by dilation is a complex sinusoidal wave which is global in x; hence the Fourier coefficients do not provide information on the local behavior of the function f. For instance, consider
the 2π-periodic function

f(x) = { π/2,   x ∈ (0, π)
       { 0,     x = 0
       { −π/2,  x ∈ (−π, 0).

After computation of the Fourier coefficients, we have

f(x) = 2 Σ_{n=1}^∞ sin((2n − 1)x)/(2n − 1),  x ∈ (−π, π).
Clearly, for any finite interval (a, b), the behavior of the function

f_loc(x) = { f(x),  x ∈ (a, b)
           { 0,     otherwise

cannot be directly obtained from the Fourier coefficients, since those coefficients mix the local behavior of f on (−π, 0) and (0, π).
Another shortcoming of Fourier theory is the Gibbs phenomenon. Indeed, consider the previous function f. Since 2 Σ_{n=1}^∞ sin((2n − 1)x)/(2n − 1) evaluated at x = 0 is 0, the Fourier series of f also converges to f(x) at x = 0.

Let S_n(f, x) = 2 Σ_{k=1}^n sin((2k − 1)x)/(2k − 1). The jump in amplitude of f(x) at x = 0 is f(0+) − f(0−) = π. Moreover,

lim_{n→∞} S_n(f, π/(2n)) = ∫_0^π (sin t / t) dt ≈ 1.85193706

and

lim_{n→∞} S_n(f, −π/(2n)) = −∫_0^π (sin t / t) dt ≈ −1.85193706.

Thus,

lim_{n→∞} [ S_n(f, π/(2n)) − S_n(f, −π/(2n)) ] ≈ 3.70387412 ≈ 1.179 π.

The latter means that the amplitude of S_n(f, x) around 0 is about 1.179 times the jump of f at 0. This is the Gibbs phenomenon, as seen in Figure 2.3.
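The overshoot can be checked numerically. The short sketch below (ours) evaluates the partial sum S_n(f, π/(2n)) for a large n and recovers ∫_0^π (sin t / t) dt ≈ 1.8519:

```python
import math

def S(n, x):
    # partial Fourier sum S_n(f, x) = 2 * sum_{k=1}^n sin((2k-1)x) / (2k-1)
    return 2 * sum(math.sin((2 * k - 1) * x) / (2 * k - 1) for k in range(1, n + 1))

n = 2000
peak = S(n, math.pi / (2 * n))          # height of the first overshoot near 0
assert abs(peak - 1.85193706) < 1e-4
# the jump of f at 0 is pi, so the relative overshoot is about 1.179
assert abs(2 * peak / math.pi - 1.179) < 1e-3
```

The overshoot does not shrink as n grows; it only narrows, which is precisely why localized bases such as wavelets are attractive.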
Finally, another shortcoming of Fourier analysis worth mentioning is the problem of convergence. Indeed, in 1873, Paul du Bois-Reymond constructed a continuous 2π-periodic function whose Fourier series diverges at a given point.

Therefore, there is a natural need for an orthogonal system for which the local behavior of a function can be recognized from its coefficients, for which the Gibbs phenomenon can be avoided or at least dealt with, and for which the phenomenon discovered by du Bois-Reymond cannot happen. Wavelets provide an answer to these concerns.
Definition 2.2.1. A function ψ is called a wavelet function if {2^{j/2} ψ(2^j x − k)}_{j,k∈Z} is an orthonormal basis of L²(R).

Therefore, any function f in L²(R) can be uniquely represented as

f(x) = Σ_{j∈Z} Σ_{k∈Z} C_{jk} ψ_{jk}(x).  (2.2.1)

Note that (2.2.1) is called the homogeneous wavelet expansion of the function f.

Figure 2.3: Gibbs Phenomenon

There also exists an inhomogeneous wavelet expansion for f, defined as
f(x) = Σ_{k∈Z} C_{0k} φ_{0k}(x) + Σ_{j=0}^∞ Σ_{k∈Z} C_{jk} ψ_{jk}(x),  (2.2.2)

where φ and ψ are respectively called the "father wavelet" (or scaling function) and the "mother wavelet". Moreover, the mother wavelet can be obtained from the father wavelet through the relation ψ(x) = √2 Σ_{k∈Z} g_k φ(2x − k), where the g_k are carefully chosen coefficients. There are many examples of scaling functions.
2.2.2 Examples of wavelets

1. Haar Wavelet

It is the very first wavelet constructed. Its father wavelet and mother wavelet are respectively defined as

φ(x) = { 1,  x ∈ [0, 1)
       { 0,  otherwise

ψ(x) = { 1,   x ∈ [0, 1/2)
       { −1,  x ∈ [1/2, 1)
       { 0,   otherwise.

The functions ψ_{jk}(x) = 2^{j/2} ψ(2^j x − k), j, k ∈ Z, are called "daughter wavelets" and are compactly supported on the dyadic intervals [k 2^{−j}, (k + 1) 2^{−j}]. The left and right panels in Figure 2.4 respectively depict φ and ψ for the Haar wavelet system.
Figure 2.4: The Haar Wavelet
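A minimal sketch of the Haar system (ours, not part of the dissertation), with Riemann-sum checks of the defining properties — zero mean of ψ, unit norm of each ψ_{jk}, and orthogonality of daughters with disjoint supports:

```python
def phi(x):
    # Haar father wavelet (scaling function)
    return 1.0 if 0 <= x < 1 else 0.0

def psi(x):
    # Haar mother wavelet
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def psi_jk(j, k, x):
    # daughter wavelet, supported on the dyadic interval [k 2^-j, (k+1) 2^-j]
    return 2 ** (j / 2) * psi(2 ** j * x - k)

# midpoint Riemann sums on a fine grid over [-2, 2]
N = 100000
h = 4.0 / N
xs = [-2 + (i + 0.5) * h for i in range(N)]
assert abs(sum(psi(x) for x in xs) * h) < 1e-6                              # zero mean
assert abs(sum(psi_jk(2, 1, x) ** 2 for x in xs) * h - 1.0) < 1e-3          # unit norm
assert abs(sum(psi_jk(2, 1, x) * psi_jk(2, 2, x) for x in xs) * h) < 1e-6   # orthogonality
```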
2. Shannon Wavelet

Consider the space V_0^{Sh} = {f ∈ L²(R) : Support(Ff) ⊆ [−π, π]}, where Ff denotes the Fourier transform of f. Then, for all f ∈ V_0^{Sh},

f(x) = Σ_{k∈Z} f(k) sin(π(x − k)) / (π(x − k)).  (2.2.3)

This result is also known as the Sampling Theorem (see Hong et al. (2005)), since the function f can be recovered from its sample values f(k). It follows that the Shannon father wavelet and mother wavelet are respectively given by

φ(x) = sin(πx)/(πx),  ψ(x) = (sin(π(x − 1/2)) − sin(2π(x − 1/2))) / (π(x − 1/2)).

These are shown in Figure 2.5.
Figure 2.5: The Shannon Wavelet
3. Mexican Hat Wavelet

This is by far the most used wavelet in practice, owing to its simplicity. Its name comes from the resemblance of the graph of its mother wavelet to a Mexican hat. The father wavelet and mother wavelet are given respectively by

φ(x) = e^{−x²},  ψ(x) = (2/√3) π^{−1/4} (1 − x²) e^{−x²/2}.

These are shown in Figure 2.6.
Figure 2.6: The Mexican Hat Wavelet
4. Daubechies Wavelet

Ingrid Daubechies (1992) was the first to introduce continuous compactly supported wavelets. Indeed, she proved that there exists a scaling function φ ∈ L²(R) such that

φ(x) = √2 Σ_{k∈Z} h_k φ(2x − k),

where {h_k}_{k∈Z} is a sequence of real numbers such that

h_k = √2 ∫ φ(x) φ̄(2x − k) dx,  Σ_{k∈Z} |h_k|² < ∞,

where φ̄ is the complex conjugate of φ. Daubechies wavelets are classified as DN for N = 2, ..., 20, where N is even and represents the number of coefficients, and N/2 the number of vanishing moments. Vanishing moments describe the ability of the wavelet to encode a polynomial function or a signal. For example, D2 has one vanishing moment, so it is good for encoding constant signals. Note that D2 coincides with the Haar wavelet. It is actually the only explicit Daubechies wavelet, since the others do not have scaling functions that can be expressed in closed form. Daubechies wavelets can also be defined on any interval [a, b]; more information can be found in Andersson et al. (1994) and Cohen et al. (1993).

These are shown in Figure 2.7.
Figure 2.7: The Daubechies Wavelet
We end this section by noting one key difference between wavelet approximation and Fourier approximation: the Fourier approximation uses a single function (called a window function) that is translated over the interval of definition, whereas its wavelet counterpart uses a function that is both translated and dilated so as to adapt itself to the local properties of the function.
Chapter 3
Estimation of the Score Function
3.1 Introduction
In this chapter, we introduce the problem to be solved and put it into its historical
context.
Consider the linear model

Y_i = α + x_i^T β + e_i,  1 ≤ i ≤ n,  (3.1.1)

where e_1, ..., e_n are independent random variables with distribution function F and density f.

It is known that the least squares estimate β̂_LS of β is the one that minimizes the squared distance ||Y − Xβ||²_{R^n}, but it fails to be robust, in the sense that it is very sensitive to departures from normality and to perturbations. Robust alternatives to the least squares method include M-estimation and Z-estimation.
M-estimation consists of finding an estimate β̂_M of β that maximizes (hence the term M-estimation) a criterion function M_n(β) = (1/n) Σ_{i=1}^n m_β(X_i), where the m_β : X → R are known functions. Z-estimation consists of finding an estimate β̂_Z of β that almost maximizes the criterion function M_n(β) or is one of its near zeros (hence the term Z-estimation). Note that popular choices for the criterion function include the so-called Huber function and the biweight function, given respectively by

ρ_Huber(u) = { u²/2,         |u| ≤ k
             { k|u| − k²/2,  |u| > k,

ρ_biweight(u) = { (k²/6)[1 − (1 − (u/k)²)³],  |u| ≤ k
                { k²/6,                       |u| > k,

with k ∈ R.
Although the aforementioned methods solve the problem of robustness, they have their own shortcomings. First, it may be hard to find the zeros of M_n'(β). Second, the existence of zeros near the boundary of the parameter set may make the estimation problem ill-posed. Third, consistency and uniqueness are not guaranteed.
Consider the estimator β̂_R of β ∈ R^p that minimizes the dispersion function

D_h(β) = Σ_{i=1}^n a(R(Y_i − x_i^T β))(Y_i − x_i^T β),  (3.1.2)

where R(y_i − x_i^T β) is the rank of y_i − x_i^T β among y_1 − x_1^T β, ..., y_n − x_n^T β, and a(1) ≤ ... ≤ a(n) are scores. The scores are usually chosen as a(j) = h(j/(n + 1)), where h : (0, 1) → R is a nondecreasing score function.
Definition 3.1.1. The score function is the gradient, with respect to some parameter θ, of the logarithm of the likelihood function; that is, h(u) = (∂/∂θ) log f(θ, u).

Examples of score functions

1. Sign score function: h(u) = sign(u − 1/2).
2. Logistic score function: h(u) = 2u − 1.
3. Normal score function: h(u) = Φ^{−1}(u), where Φ is the standard normal distribution function.
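The three score functions above are easy to generate in code. A small sketch (ours), using the standard library's `statistics.NormalDist` for Φ^{−1}; the resulting scores a(i) = h(i/(n + 1)) are nondecreasing and sum to (approximately) zero, as required:

```python
from statistics import NormalDist

def sign_score(u):
    return -1.0 if u < 0.5 else (0.0 if u == 0.5 else 1.0)

def logistic_score(u):
    return 2 * u - 1

def normal_score(u):
    # h(u) = Phi^{-1}(u), the standard normal quantile function
    return NormalDist().inv_cdf(u)

def make_scores(h, n):
    # a(i) = h(i / (n + 1)), i = 1, ..., n
    return [h(i / (n + 1)) for i in range(1, n + 1)]

n = 99
for h in (sign_score, logistic_score, normal_score):
    a = make_scores(h, n)
    assert all(a[i] <= a[i + 1] for i in range(n - 1))   # nondecreasing
    assert abs(sum(a)) < 1e-9                            # centered at zero
```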
Remark 3.1.2. The mean of the score function given a parameter θ is zero, that is, E(h|θ) = 0. The variance of the score function, called the Fisher information I(f), is given by

I(f) = ∫_{−∞}^{∞} (f'(u)/f(u))² f(u) du.

Proof.

E(h|θ) = ∫_{−∞}^{∞} (∂ ln f(u, θ)/∂θ) f(u, θ) du
       = ∫_{−∞}^{∞} ∂f(u, θ)/∂θ du
       = (∂/∂θ) ∫_{−∞}^{∞} f(u, θ) du
       = ∂1/∂θ = 0.
Definition 3.1.3. Given a score function h, we define the scale parameter as τ_h = 1/γ, where

γ = ∫_0^1 h(u) h_F(u) du  and  h_F(u) = −f'(F^{−1}(u)) / f(F^{−1}(u)).
Suppose that an estimate T_{n_ν} of T(θ), based on n_ν observations, is such that, as ν → ∞,

√(n_ν) (T_{n_ν} − T(θ)) → N(0, σ²(θ)).  (3.1.3)

Definition 3.1.4. Given two estimators T_{n_ν,1} and T_{n_ν,2} of T(θ), let n_{ν,1}, n_{ν,2} be the numbers of observations needed to meet (3.1.3). Then the (Pitman) Asymptotic Relative Efficiency (ARE) of the two estimators is defined as

ARE(T_{n_ν,2}, T_{n_ν,1}) = lim_{ν→∞} n_{ν,1}/n_{ν,2} = σ_1²(θ)/σ_2²(θ).  (3.1.4)
It follows from the above definition that an estimator T_{n_ν,1} of T(θ) is said to be asymptotically as efficient as an estimator T_{n_ν,2} of T(θ) if ARE(T_{n_ν,2}, T_{n_ν,1}) → 1.

Theorem 3.1 (Asymptotic Relative Efficiency). If the score function h is such that h(u) = h_F(u), then the resulting R-estimate β̂_R of β in the linear model (3.1.1) is asymptotically as efficient as the least squares estimate β̂_LS.

Proof. Since var(β̂_LS) = σ²(X^T X)^{−1}, where σ² is the variance of the underlying normal distribution, and var(β̂_R) = τ_h²(X^T X)^{−1} by Theorem 2.1.7, we have

ARE(β̂_R, β̂_LS) = σ²/τ_h² = σ² ( ∫_0^1 h(u) h_F(u) du )²
               = σ² ( √Var(h) √Var(h_F) corr(h, h_F) )²
               = σ² I(f) ( corr(h, h_F) )²,

for standardized scores, since Var(h_F) = I(f). So optimality is obtained when corr(h, h_F) = 1, that is, when h(u) = h_F(u).
In the sequel, we assume the scores are chosen so that a(j) = h_F(j/(n + 1)), where

h_F(u) = −f'(F^{−1}(u)) / f(F^{−1}(u)).  (3.1.5)

In this case, the estimator β̂_R of β is asymptotically efficient; that is, it is as efficient as the least squares estimator β̂_LS. The choice of the least squares estimator for comparison is motivated by the Gauss-Markov theorem: it achieves the uniform minimum variance among all linear unbiased estimators.

Lemma 3.1.5. If the Fisher information I(f) is finite, then h_F ∈ L²(0, 1).
Proof.

I(f) = ∫_{−∞}^{∞} (f'(x)/f(x))² f(x) dx
     = ∫_0^1 ( f'(F^{−1}(u)) / f(F^{−1}(u)) )² du   (by the change of variable u = F(x))
     = ∫_0^1 h_F²(u) du.

Hence, I(f) < ∞ implies h_F ∈ L²(0, 1).
Therefore, under I(f) < ∞, there exist coefficients C_{jk} such that

h_F(t) = Σ_{j=−1}^∞ Σ_{k∈Z} C_{jk} ψ_{jk}(t),  (3.1.6)

where {ψ_{jk}}_{j,k∈Z} is an orthonormal system in L²(0, 1) with ψ_{jk}(t) = 2^{j/2} ψ(2^j t − k) for some function ψ, and

C_{jk} = ∫_0^1 h_F(s) ψ_{jk}(s) ds.  (3.1.7)

An asymptotically efficient estimate of the coefficients C_{jk} will yield an asymptotically efficient estimate of h_F.
A common approach in rank regression is to fix the score function a priori on the basis of robustness or simplicity considerations. However, for efficient results, a good approximation of h_F based on an approximate knowledge of F from a sample is of some value. To that end, Van Eeden (1970) proposed an asymptotically efficient estimate of location parameters using an estimate of h_F based on a subset of the data. Dionne (1981) used a similar subset-based technique to develop estimators of linear model parameters. Beran (1974), for the location model, and Naranjo and McKean (1997), for the linear model, provided Fourier series estimators of h_F based on the whole sample. A different approach, which uses density estimation and is based on quantile regression, was proposed by Koenker and Bassett (1978).
The estimator proposed in this dissertation also uses the whole sample to estimate $h_F$. Our approach differs from that of Naranjo and McKean (1997) in that:
1. we develop estimates of $h_F$ that provide asymptotically efficient estimators $\widehat{\beta}_R$ based on a large class of orthonormal bases of $L^2(0,1)$,
2. we develop estimates based on second order approximations (Beran (1974) and Naranjo and McKean (1997) used first order approximations),
3. we eliminate restrictive assumptions on the data, such as those in assumption (A6) of Naranjo and McKean (1997), by using second order approximations, and
4. we provide a consistent, wavelet-based estimator of the asymptotic variance of the estimator $\widehat{\beta}_R$.
Zygmund (1945) pointed out that second differences of functions are much more useful than first differences in estimating smooth functions. This motivates our use of second order approximations. Also, the use of the second derivative gives expressions for the coefficients that are easier to manipulate than the ones in Naranjo and McKean (1997). This allows us to avoid making restrictive distributional assumptions such as assumption (A6) of Naranjo and McKean (1997), which requires that the derivative $(\phi(F))' \circ F^{-1}$ be bounded, where $\phi(t) = \exp(-2\pi i k t)$ and $k$ is an integer. That assumption excludes a wide range of distributions, including the normal and the logistic.
In the following, we provide an asymptotically efficient estimate of the score function. We begin by laying out the assumptions and discussing their consequences.
3.2 General Assumptions

We will assume without loss of generality that $\alpha = 0$ and $\beta = 0$, in view of Remark 2.1.6, and that the $x_i$'s are centered to have mean 0 in (3.1.1). We assume the following conditions:

(H1) $\psi$ has compact support in $(0,1)$ and is three times differentiable with bounded derivatives.

(H2) $\dfrac{1}{\sqrt{n}}\max_{1\le i\le n}\|x_i\|_p = o(1)$.

(H3) $\dfrac{1}{n}\displaystyle\sum_{i=1}^n \|x_i\|_p^2 = O(1)$.

(H4) $f$ is absolutely continuous with $I(f) < \infty$ and $\dfrac{f'}{f}$ monotone.

(H5) There exists a sequence $\{\widehat{\beta}_n\}$ in $\mathbb{R}^p$ such that $\sqrt{n}\,\widehat{\beta}_n = O_p(1)$.

(H6) $\lim_{n\to\infty} n^{-1}X^TX = \Sigma$ for some positive definite matrix $\Sigma$.
3.2.1 Discussion of the assumptions

(H1) assumes that $\psi$ is the mother wavelet of a wavelet system with compact support. There exist many systems of wavelets satisfying (H1), among which are the Daubechies wavelets, Coiflets, and Symlets. This assumption implies that the "daughter" wavelets $\psi_{jk}$ satisfy
\[
\psi^{(l)}_{jk} = O(2^{2j}) \qquad \forall k\in\mathbb{N} \text{ and } l = 0,1,2,3. \tag{3.2.1}
\]
(H2) and (H3) guarantee that we can apply the Lindeberg-Feller Central Limit Theorem. To see how this is possible, let us recall that theorem.
Theorem 3.2.1 (Lindeberg-Feller Central Limit Theorem). Let $(\Omega,\mathcal{F},P)$ be a probability space and let $x_i : \Omega\to\mathbb{R}$, $i\in\mathbb{N}$, be independent random variables defined on that probability space. Assume that $E(x_i) = \mu_i$ and $\operatorname{Var}(x_i)$ exist and are finite. Let $S_n^2 = \sum_{i=1}^n \operatorname{Var}(x_i)$. If
\[
\lim_{n\to\infty}\frac{1}{S_n^2}\sum_{i=1}^n\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP = 0 \qquad \forall\,\epsilon > 0,
\]
then $Z_n = \dfrac{\sum_{i=1}^n (x_i-\mu_i)}{S_n}$ converges in distribution to the standard normal distribution as $n\to\infty$.
Indeed, for any $\epsilon > 0$, we have
\[
\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP \le \max_{1\le i\le n}|x_i-\mu_i|^2\int_{\{|x_i-\mu_i| > \epsilon S_n\}}dP.
\]
Applying the Chebyshev inequality to the integral on the right, we have
\[
\int_{\{|x_i-\mu_i| > \epsilon S_n\}}dP \le \frac{\operatorname{Var}(x_i)}{\epsilon^2 S_n^2}.
\]
Hence, applying the latter to the Lebesgue integral over $\mathbb{R}^p$ and taking the $x_i$'s to be centered, we have
\[
\frac{1}{S_n^2}\sum_{i=1}^n\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP \le \frac{1}{\epsilon^2}\,\frac{\max_{1\le i\le n}\|x_i\|_p^2}{\sum_{i=1}^n\|x_i\|_p^2} = o(1)
\]
by (H2) and (H3). Thus, we have the Lindeberg condition
\[
\lim_{n\to\infty}\frac{1}{S_n^2}\sum_{i=1}^n\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP = 0 \qquad \forall\,\epsilon > 0.
\]
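The key ratio in the bound above can be checked numerically. The following sketch (ours, not the dissertation's code) draws a random design and verifies that $\max_i \|x_i\|^2 / \sum_i \|x_i\|^2$, which controls the Lindeberg sum under (H2)-(H3), shrinks as $n$ grows:

```python
import numpy as np

# For a well-behaved design, the ratio max_i ||x_i||^2 / sum_i ||x_i||^2
# bounding the Lindeberg sum goes to 0 as n grows; this is what (H2)-(H3)
# require.  The Gaussian design here is an illustrative assumption.
rng = np.random.default_rng(0)

def lindeberg_ratio(n, p=3):
    x = rng.standard_normal((n, p))      # rows x_i of the design matrix
    norms2 = np.sum(x ** 2, axis=1)
    return norms2.max() / norms2.sum()

r_small, r_large = lindeberg_ratio(100), lindeberg_ratio(100_000)
```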
For practical applications, this assumption means that the contribution of any individual random variable $x_i$, $1\le i\le n$, to the variance $S_n^2$ is arbitrarily small for sufficiently large values of $n$.
Assumption (H4) implies that $f$ is uniformly bounded, uniformly continuous, and square integrable on $\mathbb{R}$.
Uniform boundedness.
\[
f(x) = |f(x)| = \left|\int_{-\infty}^{x} f'(t)\,dt\right| \quad\text{by absolute continuity}
\]
\[
= \left|\int_{-\infty}^{x}\frac{f'(t)}{\sqrt{f(t)}}\,\sqrt{f(t)}\,dt\right|
\le \sqrt{\int_{-\infty}^{x}\left(\frac{f'(t)}{\sqrt{f(t)}}\right)^2 dt}\ \sqrt{\int_{-\infty}^{x} f(t)\,dt} \quad\text{by the Cauchy-Schwarz inequality}
\]
\[
\le \sqrt{\int_{-\infty}^{\infty}\left(\frac{f'(t)}{\sqrt{f(t)}}\right)^2 dt} = \sqrt{I(f)}.
\]
Uniform continuity.
\[
|f(x) - f(y)| = \left|\int_{x}^{y} f'(t)\,dt\right|
\le \sqrt{\int_{x}^{y}\left(\frac{f'(t)}{\sqrt{f(t)}}\right)^2 dt}\ \sqrt{\int_{x}^{y} f(t)\,dt}
\le \sqrt{I(f)}\,\sqrt{|F(x)-F(y)|}
\]
\[
\le \sqrt{I(f)}\,\sqrt{f(\xi)\,|x-y|} \quad\text{by the Mean Value Theorem, where } \xi \text{ is between } x \text{ and } y,
\]
\[
\le I(f)^{3/4}\sqrt{|x-y|} \quad\text{by uniform boundedness.}
\]
Thus for any $\epsilon > 0$,
\[
|x-y| < \frac{\epsilon^2}{[I(f)]^{3/2}} \;\Longrightarrow\; |f(x)-f(y)| < \epsilon.
\]
Square integrability. Using $f(x) = \int_{-\infty}^{x} f'(t)\,dt$ and the uniform bound $f \le \sqrt{I(f)}$ obtained above,
\[
\int_{-\infty}^{\infty} f^2(x)\,dx = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{x} f'(t)\,dt\right) f(x)\,dx
\le \sup_{x}\left|\int_{-\infty}^{x} f'(t)\,dt\right|\int_{-\infty}^{\infty} f(x)\,dx \le \sqrt{I(f)}.
\]
Thus
\[
\int_{-\infty}^{\infty} f^2(x)\,dx \le \sqrt{I(f)} < \infty.
\]
Remark 3.2.2. Even if a function $f$ is uniformly bounded, uniformly continuous, and positive almost everywhere, it is not guaranteed that $I(f)$ is finite. An example is given by the function
\[
f(x) =
\begin{cases}
\dfrac{1-x}{2}, & 0 \le x \le 1 \\[4pt]
\dfrac{x-2j+1}{2^{j+2}}, & 2j-1 \le x \le 2j \\[4pt]
\dfrac{2j+1-x}{2^{j+2}}, & 2j \le x \le 2j+1,\ j \ge 1 \\[4pt]
f(-x), & x \le 0.
\end{cases}
\]
Note that there are numerous estimators satisfying assumption (H5), including the least squares estimator and the general rank estimator with a specified score function $h$; see Jureckova (1971) and Jaeckel (1972).
We end this discussion by noting that (H6) requires the design matrix $X$ to be such that the column sums of squares go to infinity at the same rate.
3.3 Estimation of the coefficients

The following lemma provides an alternative representation of the orthonormal basis coefficients $C_{jk}$ in the expansion of $h_F$.

Lemma 3.3.1. Assume that (H1) and (H4) hold. Then
\[
\int_0^1 h_F(t)\,\psi(t)\,dt = -\int_{-\infty}^{\infty}\frac{d^2}{dz^2}\Big[\psi\big(F(z)\big)\Big]\,F(z)\,dz. \tag{3.3.1}
\]
Proof. Eq. (5) of Naranjo and McKean (1997) gives
\[
\int_0^1 h_F(t)\,\psi(t)\,dt = \int_{-\infty}^{\infty}\frac{d}{dz}\big[\psi(F(z))\big]\,dF(z). \tag{3.3.2}
\]
Integrating the right-hand side of equation (3.3.2) by parts, we have
\[
\int_{-\infty}^{\infty}\frac{d}{dz}\big[\psi(F(z))\big]\,dF(z)
= \left[F(z)\,\frac{d}{dz}\big[\psi(F(z))\big]\right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty}\frac{d^2}{dz^2}\big[\psi(F(z))\big]\,F(z)\,dz.
\]
We can write
\[
\left[F(z)\,\frac{d}{dz}\big[\psi(F(z))\big]\right]_{-\infty}^{\infty}
= \lim_{z\to\infty}\big[F(z)f(z)\psi'(F(z)) - F(-z)f(-z)\psi'(F(-z))\big].
\]
But
\[
\lim_{z\to\infty}F(z) = 1, \qquad \lim_{z\to-\infty}F(z) = 0,
\]
and
\[
\lim_{z\to\infty}\psi'(F(z)) = \psi'(1) = 0 = \lim_{z\to-\infty}\psi'(F(z)) = \psi'(0), \quad\text{since } \psi \text{ is compactly supported in } (0,1).
\]
The Lemma follows.
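As a numerical sanity check (ours, not part of the dissertation), identity (3.3.1) can be verified for the logistic distribution, whose score function is $h_F(u) = 2u-1$, using a smooth compactly supported stand-in $\psi(t) = t^2(1-t)^2\sin(2\pi t)$ in place of an actual wavelet; it satisfies $\psi(0)=\psi(1)=\psi'(0)=\psi'(1)=0$ as required by the proof:

```python
import numpy as np

# psi is a smooth stand-in vanishing (with its derivative) at 0 and 1; it is
# not one of the wavelets used in the dissertation.
def psi(t):
    t = np.asarray(t, dtype=float)
    inside = (t > 0) & (t < 1)
    return np.where(inside, t**2 * (1 - t)**2 * np.sin(2 * np.pi * t), 0.0)

F = lambda z: 1.0 / (1.0 + np.exp(-z))       # logistic cdf

# Left-hand side: ∫ h_F(t) psi(t) dt with h_F(u) = 2u - 1, midpoint rule.
n = 100_000
u = (np.arange(n) + 0.5) / n
lhs = np.sum((2 * u - 1) * psi(u)) / n

# Right-hand side: -∫ (d^2/dz^2)[psi(F(z))] F(z) dz, with the second
# derivative taken by central differences.
m, dz, h = 30_000, 0.001, 1e-4
z = -15.0 + (np.arange(m) + 0.5) * dz
g2 = (psi(F(z + h)) - 2 * psi(F(z)) + psi(F(z - h))) / h**2
rhs = -np.sum(g2 * F(z)) * dz
```

Both sides agree to several decimal places, as the lemma predicts.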
Remark 3.3.2. If we replace assumption (H1) in Lemma 3.3.1 by the assumption that $f$ is absolutely continuous and $\int_{-\infty}^{\infty}|f'(x)|\,dx < \infty$, then (3.3.1) still holds. In fact, these two conditions ensure that $\lim_{z\to-\infty}f(z)$ and $\lim_{z\to\infty}f(z)$ exist and are both equal to zero. They are restrictive, though, since they require a well behaved source for the sample data. In comparison, we have numerous functions satisfying (H1), such as the Daubechies basis functions, Symlets, and Coiflets.
In the remainder of the dissertation, for $\beta\in\mathbb{R}^p$, we will let $F_n(\cdot\,;\beta)$ represent the empirical distribution function of $y_1 - x_1^T\beta,\ldots,y_n - x_n^T\beta$; that is,
\[
F_n(z;\beta) = \frac{1}{n}\sum_{i=1}^n I\big(y_i - x_i^T\beta \le z\big).
\]
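Computed directly, $F_n$ is just the empirical cdf of the residuals; a minimal sketch (variable names are ours):

```python
import numpy as np

# Empirical distribution function F_n(z; beta) of the residuals
# y_i - x_i^T beta, evaluated at a point z.
def F_n(z, y, X, beta):
    resid = y - X @ beta
    return np.mean(resid <= z)
```

For example, with residuals $(-1, 0, 1, 2)$, `F_n(0.5, ...)` returns the fraction of residuals at or below 0.5.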
The following lemma is a combination of Lemma 1 and Lemma 2 of Naranjo and McKean (1997) and gives the asymptotic linearity of $F_n(z;\beta_n)$ for $\beta_n$ converging to $0$ at a suitable rate. The proof follows from Section 2.3 of Koul (1992) and is given in the appendices.
Lemma 3.3.3. Assume (H1)-(H5). Then
\[
\sup_{z\in\mathbb{R}}\sqrt{n}\,\big|F_n(z;\widehat{\beta}_n) - F(z)\big| = O_p(1).
\]
Characterizations of orthonormal basis systems of $L^2(0,1)$ can be found in Meyer (1991), Cohen et al. (1993), and Andersson et al. (1994). If the scaling function $\varphi$ satisfies the conditions given in Theorem 9.6 of Härdle et al. (1998) (for example, certain Daubechies wavelets), then $h_F$ belongs to the Besov space $B^s_{2q}(\mathbb{R})$. Besov spaces can be characterized using wavelet coefficients; thus, they are the natural spaces for wavelet estimation of functions. Moreover, in some Besov spaces, wavelet coefficients decay faster than Fourier coefficients. For instance, it is shown in Zygmund (2002) that if a function belongs to the Zygmund space $B^1_{11}(0,1)$, then its Fourier coefficients $C_n$ are $O(n^{-1})$. It was proved in Meyer (1990) that the wavelet coefficients $W_{jk}$ of such a function are $O(2^{-3j/2})$.
We now start the estimation process for the score function $h_F$. The strategy consists of first estimating the coefficients $C_{jk}$ in the expansion (3.1.6) of $h_F$ by some $\widehat{C}_{njk}$ and then using these to provide an estimate $\widehat{h}_{nF}$ of $h_F$, for some fixed $n$.

Let $\{\delta_n\}_{n\in\mathbb{N}}$ and $\{M_n\}_{n\in\mathbb{N}}$ be sequences of real numbers such that $M_n = O(n^{\gamma})$, $0 < \gamma < 1/4$, and $M_n/(\sqrt{n}\,\delta_n^2)\to 0$, $M_n\delta_n^2\to 0$ as $n\to\infty$. This means that $\delta_n = O(n^{\eta})$, where $\eta = \gamma/2 - 1/4$ and $-1/4 < \eta < 0$. Given a scaling function $\varphi$ with corresponding "mother wavelet" $\psi$ and a $\sqrt{n}$-consistent estimator $\widehat{\beta}_n$ of $\beta$, equation (3.3.1) in Lemma 3.3.1, using second differences, suggests the estimator
\[
\widehat{C}_{njk} := \frac{1}{\delta_n^2}\int_{-\infty}^{\infty}\Big[2\,\psi_{jk}\big(F_n(z;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(z+\delta_n;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(z-\delta_n;\widehat{\beta}_n)\big)\Big]\,F_n(z;\widehat{\beta}_n)\,dz
\]
of $C_{jk} = \int_0^1 h_F(s)\,\psi_{jk}(s)\,ds$.
Remark 3.3.4. Note that $\widehat{C}_{njk}$ can be computed from the data as
\[
\widehat{C}_{njk} = \frac{2}{n\,\delta_n}\sum_{i=1}^n i\,\Big[2\,\psi_{jk}\big(F_n(e_{(i)};\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_{(i)}+\delta_n;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_{(i)}-\delta_n;\widehat{\beta}_n)\big)\Big], \tag{3.3.3}
\]
where $e_i = y_i - x_i^T\widehat{\beta}_n$ and $e_{(i)}$ denotes the $i$th order statistic of the residuals.
Proof. In fact, let $W(z) = \int_a^z F_n(s;\widehat{\beta}_n)\,ds$, where $a$ is a zero of $W$. Then
\[
\int_{-\infty}^{\infty}\frac{d^2}{dz^2}\big[\psi_{jk}(F(z))\big]F(z)\,dz
= \sum_{i=1}^n\int_{e_i-\delta_n}^{e_i+\delta_n}\frac{d^2}{dz^2}\big[\psi_{jk}\big(F_n(z;\widehat{\beta}_n)\big)\big]\,dW(z). \tag{3.3.4}
\]
For $\delta_n$ small enough that the only $e_j$ between $e_i-\delta_n$ and $e_i+\delta_n$ is $e_i$ itself, an approximation of the opposite of the $i$th integral on the right-hand side of equation (3.3.4) is
\[
-\frac{\psi_{jk}\big(F_n(e_i+\delta_n;\widehat{\beta}_n)\big) - 2\,\psi_{jk}\big(F_n(e_i;\widehat{\beta}_n)\big) + \psi_{jk}\big(F_n(e_i-\delta_n;\widehat{\beta}_n)\big)}{\delta_n^2}\,\big[W(e_i+\delta_n) - W(e_i-\delta_n)\big]
\]
\[
= -\frac{\psi_{jk}\big(F_n(e_i+\delta_n;\widehat{\beta}_n)\big) - 2\,\psi_{jk}\big(F_n(e_i;\widehat{\beta}_n)\big) + \psi_{jk}\big(F_n(e_i-\delta_n;\widehat{\beta}_n)\big)}{\delta_n^2}\left[\int_{e_i-\delta_n}^{e_i+\delta_n}F_n(z;\widehat{\beta}_n)\,dz\right]
\]
\[
= \frac{2\delta_n}{\delta_n^2}\Big[2\,\psi_{jk}\big(F_n(e_i;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_i+\delta_n;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_i-\delta_n;\widehat{\beta}_n)\big)\Big]\,\frac{1}{n}\sum_{j=1}^n I(e_j\le\xi_i),
\]
where $\xi_i\in[e_i-\delta_n,\,e_i+\delta_n]$. Since we can replace the $e_i$ by their order statistics, the form of $\widehat{C}_{njk}$ proposed in equation (3.3.3) follows.
The following lemma establishes the consistency of $\widehat{C}_{njk}$.

Lemma 3.3.5. Suppose that (H1)-(H5) are satisfied. Then $|M_n(\widehat{C}_{njk} - C_{jk})| = o_p(1)$.
Proof. Define $\widetilde{C}_{njk} = -\displaystyle\int_{-\infty}^{\infty}\frac{d^2}{dz^2}\big[\psi_{jk}(F(z))\big]\,F_n(z;\widehat{\beta}_n)\,dz$. Now
\[
\widetilde{C}_{njk} = \frac{1}{\delta_n^2}\int_{-\infty}^{\infty}\big[2\,\psi_{jk}(F(z)) - \psi_{jk}(F(z+\delta_n)) - \psi_{jk}(F(z-\delta_n))\big]\,F_n(z;\widehat{\beta}_n)\,dz + O(\delta_n^2).
\]
Taking the difference $\widehat{C}_{njk} - \widetilde{C}_{njk}$ and expanding $\psi_{jk}(F_n)$ about $\psi_{jk}(F)$, we have
\[
M_n(\widehat{C}_{njk} - \widetilde{C}_{njk}) = \frac{2M_n}{\delta_n^2\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z;\widehat{\beta}_n) - F(z)\big)\big]\,\psi'_{jk}(\xi_{1,n}(z))\,F_n(z;\widehat{\beta}_n)\,dz
\]
\[
\quad - \frac{M_n}{\delta_n^2\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z+\delta_n;\widehat{\beta}_n) - F(z+\delta_n)\big)\big]\,\psi'_{jk}(\xi_{2,n}(z))\,F_n(z;\widehat{\beta}_n)\,dz
\]
\[
\quad - \frac{M_n}{\delta_n^2\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z-\delta_n;\widehat{\beta}_n) - F(z-\delta_n)\big)\big]\,\psi'_{jk}(\xi_{3,n}(z))\,F_n(z;\widehat{\beta}_n)\,dz + O(M_n\delta_n^2),
\]
where $\xi_{1,n}(z)$ is between $F_n(z;\widehat{\beta}_n)$ and $F(z)$, $\xi_{2,n}(z)$ is between $F_n(z+\delta_n;\widehat{\beta}_n)$ and $F(z+\delta_n)$, and $\xi_{3,n}(z)$ is between $F_n(z-\delta_n;\widehat{\beta}_n)$ and $F(z-\delta_n)$. Since $\psi'_{jk}$ and $F_n$ are bounded with respect to $n$ and $\sup_z\big|\sqrt{n}(F_n(z;\widehat{\beta}_n) - F(z))\big| = O_p(1)$, we have
\[
\big|M_n(\widehat{C}_{njk} - \widetilde{C}_{njk})\big| = O_p\big(M_n/(\delta_n^2\sqrt{n})\big) + O(M_n\delta_n^2). \tag{3.3.5}
\]
On the other hand,
\[
M_n(C_{jk} - \widetilde{C}_{njk}) = M_n\left[-\int_{-\infty}^{\infty}(\psi_{jk}(F))''(z)\,F(z)\,dz + \int_{-\infty}^{\infty}(\psi_{jk}(F))''(z)\,F_n(z;\widehat{\beta}_n)\,dz\right]
\]
\[
\quad + \frac{M_n}{\delta_n^2}\int_{-\infty}^{\infty}\frac{\delta_n^3}{6}\big[(\psi_{jk}(F))'''(\eta_1(z)) - (\psi_{jk}(F))'''(\eta_2(z))\big]\,F_n(z;\widehat{\beta}_n)\,dz + O(M_n\delta_n^2)
\]
\[
= M_n\int_{-\infty}^{\infty}\big[F_n(z;\widehat{\beta}_n) - F(z)\big]\,(\psi_{jk}(F))''(z)\,dz + O_p(M_n\delta_n) + O(M_n\delta_n^2),
\]
where $\eta_1(z)\in(z-\delta_n, z)$ and $\eta_2(z)\in(z, z+\delta_n)$.
Thus we have
\[
M_n(C_{jk} - \widetilde{C}_{njk}) = \frac{M_n}{\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z;\widehat{\beta}_n) - F(z)\big)\big]\,(\psi_{jk}(F))''(z)\,dz + O_p(M_n\delta_n) + O(M_n\delta_n^2).
\]
But
\[
\int_{-\infty}^{\infty}(\psi_{jk}(F))''(z)\,dz = \int_{-\infty}^{\infty}\psi''_{jk}(F(z))\,f^2(z)\,dz + \int_{-\infty}^{\infty}\psi'_{jk}(F(z))\,f'(z)\,dz.
\]
The two integrals on the right are bounded since $f$ is absolutely continuous, $f\in L^2(\mathbb{R})$, and $\psi_{jk}$ has bounded derivatives. Therefore we have
\[
\big|M_n(C_{jk} - \widetilde{C}_{njk})\big| = O_p(M_n/\sqrt{n}) + O_p(M_n\delta_n) + O(M_n\delta_n^2). \tag{3.3.6}
\]
Equations (3.3.5) and (3.3.6) imply that
\[
\big|M_n(C_{jk} - \widehat{C}_{njk})\big| = O_p\big(M_n/(\delta_n^2\sqrt{n})\big) + O_p(M_n/\sqrt{n}) + O(M_n\delta_n^2) = o_p(1). \tag{3.3.7}
\]
Remark 3.3.6. In view of (3.2.1), this means that there is a constant $L > 0$ such that
\[
\big|\widehat{C}_{njk} - C_{jk}\big| \le \frac{L\,2^{2j}}{M_n}, \qquad \forall k\in\mathbb{N}.
\]
3.4 Estimation of the score function

Now define the wavelet estimator of $h_F$ as
\[
\widehat{h}_{nF}(t) = \sum_{j=0}^{j_1}\sum_{k}\widehat{C}_{njk}\,\psi_{jk}(t),
\]
where $j_1$ is some chosen resolution level in $\mathbb{N}\cup\{0\}$. Note that since we are using compactly supported wavelets, the sum over $k$ contains only a finite number of terms for a given value of $t$ (see Remark 10.1 on p. 127 of Härdle et al. (1998)).
Theorem 3.4.1. Under (H1)-(H5), we have $E\|h_F - \widehat{h}_{nF}\|_2^2 = o(1)$.
Proof. Write
\[
h_F(t) = \sum_{j=0}^{j_1}\sum_{k}C_{jk}\,\psi_{jk}(t) + \sum_{j>j_1}\sum_{k}C_{jk}\,\psi_{jk}(t),
\]
where the convergence is absolute in $L^2(0,1)$. Thus
\[
E\|h_F - \widehat{h}_{nF}\|_2^2 \le 2\,E\left(\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt\right) + 2\,E\left(\int_0^1\Big(\sum_{j>j_1}\sum_{k}C_{jk}\,\psi_{jk}(t)\Big)^2 dt\right).
\]
The second term on the right is $o(1)$ by absolute convergence in $L^2(0,1)$.
For the first term, we have, by the Cauchy-Schwarz inequality,
\[
\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt \le (j_1+1)\int_0^1\sum_{j=0}^{j_1}\Big(\sum_{k}|C_{jk} - \widehat{C}_{njk}|\,|\psi_{jk}(t)|\Big)^2 dt.
\]
But there is a positive constant $L$ such that $|C_{jk} - \widehat{C}_{njk}| \le L\,2^{2j}/M_n$ (see Remark 3.3.6). Thus
\[
\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt \le \frac{L^2(j_1+1)}{M_n^2}\sum_{j=0}^{j_1}2^{4j}\int_0^1\Big(\sum_{k}|\psi_{jk}(t)|\Big)^2 dt.
\]
Since, by Proposition 8.3 of Härdle et al. (1998), the integral on the right-hand side is uniformly bounded in $j$, there is a constant $C > 0$ such that
\[
\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt \le \frac{C}{M_n^2}(j_1+1)\sum_{j=0}^{j_1}2^{4j} \le \frac{L'}{M_n^2}(j_1+1)\,2^{4j_1+1}
\]
for some constant $L' > 0$.
Choosing $j_1$ such that $(j_1+1)\,2^{4j_1+1} = o(M_n^2)$ makes the first term $o(1)$ as well, which completes the proof.

For $\epsilon > 0$, consider
\[
P\left(\int_0^1\big(\widehat{h}_{nF}(t) - h_F(t)\big)^2\,dt > \epsilon\right),
\]
which is bounded by $\frac{1}{\epsilon}\,E\|h_F - \widehat{h}_{nF}\|_2^2$ by Markov's inequality. The desired result follows from Theorem 3.4.1.

One may estimate $\widehat{\tau}_{nh}^{-1}$ using numerical integration methods (such as Gaussian quadrature) on some grid in $(0,1)$, since $\widehat{h}_{nF}(t)$ can be computed for any given $t\in(0,1)$.
As one application, consider testing the general linear hypothesis
\[
H_0 : M\beta = 0 \quad\text{versus}\quad H_1 : M\beta \ne 0,
\]
where $M$ is a $q\times p$ matrix of full row rank forming the linear constraints. Under $H_0$, the quantity
\[
B_M = \frac{\big(M\widehat{\beta}_R\big)^{T}\big[M(X^TX)^{-1}M^T\big]^{-1}\big(M\widehat{\beta}_R\big)}{q\,\widehat{\tau}_{nh}^{2}}
\]
is asymptotically $\chi^2(q)$ by Theorem 4.0.3, Theorem 4.0.4, and Slutsky's lemma. Thus a level-$\alpha$ Wald test rejects $H_0$ if $B_M$ exceeds the $(1-\alpha)$ quantile of the $\chi^2(q)$ distribution.
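A minimal sketch of computing this Wald statistic (ours, with hypothetical inputs; `tau_hat` stands for whatever estimate of $\tau_h$ is available, such as the wavelet-based one):

```python
import numpy as np

# Wald statistic B_M = (M b)' [M (X'X)^{-1} M']^{-1} (M b) / (q tau^2).
def wald_statistic(beta_hat, M, X, tau_hat):
    q = M.shape[0]
    core = M @ np.linalg.inv(X.T @ X) @ M.T   # M (X'X)^{-1} M'
    mb = M @ beta_hat
    return float(mb @ np.linalg.solve(core, mb) / (q * tau_hat ** 2))
```

The value would then be compared against the $(1-\alpha)$ quantile of $\chi^2(q)$, e.g. 5.991 for $q = 2$ and $\alpha = 0.05$.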
Chapter 5
Discussion
In this dissertation, we developed an asymptotically efficient rank estimator based on score functions estimated using wavelets. A consistent estimator is given for the asymptotic variance of the rank estimator. This can be used in constructing Wald tests of general linear hypotheses.
5.1 Issues related to simulations

We consider the logistic density function $f(x) = \dfrac{e^{-x}}{(1+e^{-x})^2}$. Its cumulative distribution function is $F(x) = \dfrac{1}{1+e^{-x}}$, and the score function for the logistic distribution is given by $h_F(u) = 2u-1$, $u\in(0,1)$. This density function satisfies all of the assumptions (H1)-(H5). We then generate $N = 10, 15, 75, 100$ random points from this distribution. We apply, respectively, the classical Fourier approach, denoted (CF), and the first order estimate with Fourier basis functions, denoted (FOF), proposed by Naranjo and McKean, to estimate its score function. Our approach, which uses second order estimation and compactly supported wavelets, is denoted (SOW). Even though theoretical results show that our method is superior to both approaches in terms of flexibility and precision, it has a shortcoming of its own: it is very difficult to apply with current algorithms. The main reason for this complexity is the implicit nature of the compactly supported wavelets used. An idea of the complexity of the use of these wavelets can be seen in the cascade algorithm of Mallat (1989), which is the most widely used in applications of compactly supported wavelets. Indeed, this algorithm requires a sample of points from the function to be estimated, whereas our method does not require such a sample, but it remains very complex. We are working on an algorithm that will address this issue in the future. Figure 5.1 below shows the issues related to Naranjo and McKean's approach: the continuous curve, for the different values of $N$, appears to have less precision than the curve marked with $+$ symbols. The dotted line is the original score function.
Figure 5.1: Simulations
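The claim that the logistic score is $h_F(u) = 2u-1$ can be checked numerically (our sketch, not part of the simulations): the optimal score $-f'(F^{-1}(u))/f(F^{-1}(u))$ simplifies to $2F(x)-1$ for the logistic distribution.

```python
import math

# Verify -f'(F^{-1}(u)) / f(F^{-1}(u)) == 2u - 1 for the logistic law.
def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def score_at(u, eps=1e-6):
    x = math.log(u / (1.0 - u))              # F^{-1}(u)
    fprime = (logistic_pdf(x + eps) - logistic_pdf(x - eps)) / (2 * eps)
    return -fprime / logistic_pdf(x)

vals = [(u, score_at(u)) for u in (0.1, 0.25, 0.5, 0.9)]
```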
Table 5.1 below compares the different methods for approximating score functions in terms of convergence, flexibility, applicability, the Gibbs phenomenon, and precision.

                        CF     FOF    SOW
    Convergence         Yes    Yes    Yes
    Flexibility         No     No     Yes
    Applicability       Good   Good   Not yet
    Gibbs phenomenon    No     Yes    No
    Precision           Good   Fair   Best

Table 5.1: Comparison between the different methods
5.2 Simple Mixed Models with Dependent Error Structure

In our treatment we assumed that the errors are independent and identically distributed with a cdf $F$ that has pdf $f$. Estimating the score function to maximize efficiency is generally a very difficult problem for dependent error models. In simple dependent data problems, however, this may be tractable using some of the methodology developed earlier.

Consider the model
\[
Y_k = \alpha 1_{n_k} + X_k\beta + e_k, \qquad 1\le k\le m, \qquad N = \sum_{k=1}^m n_k. \tag{5.2.1}
\]
Suppose that $e_k = 1_{n_k}b_k + \varepsilon_k$, where the $\varepsilon_k$ are iid and $b_k$ is a continuous random variable independent of $\varepsilon_k$. We also assume the random effects $b_1,\ldots,b_m$ are iid. Then the errors $e_k$ are exchangeable with the same marginal distribution $F$.
Then the asymptotic representation of the R-estimate of $\beta$ is
\[
\sqrt{N}\,\widehat{\beta}_R = \tau_h\,N\,(X^TX)^{-1}U_N(\beta) + o_p(1),
\]
and from Brunner and Denker (1994) it follows that
\[
\widehat{\beta}_R \;\dot\sim\; N_p(\beta, V_h),
\]
where
\[
V_h = \tau_h^2\,(X^TX)^{-1}\left(\sum_{k=1}^m X_k^T C_{h,k} X_k\right)(X^TX)^{-1}
\]
and
\[
C_{h,k} = (1-\rho_h)I_{n_k} + \rho_h J_{n_k}, \qquad \rho_h = \operatorname{cov}\{h(G(e_{11})),\,h(G(e_{12}))\},
\]
where $I_{n_k}$ and $J_{n_k}$ are, respectively, the $n_k\times n_k$ identity matrix and the $n_k\times n_k$ matrix of ones. Assuming that $X$ is centered, the asymptotic relative efficiency of the rank estimator versus the least squares estimator is given by
\[
\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS}) = \frac{\sigma^2(1-\rho)}{\tau_h^2(1-\rho_h)},
\]
where
\[
\rho = \operatorname{Corr}(\varepsilon_1,\varepsilon_2), \qquad \sigma^2 = \operatorname{Var}(\varepsilon_1), \qquad \rho_h = \operatorname{Corr}\big[h(F(\varepsilon_1)),\,h(F(\varepsilon_2))\big],
\]
and
\[
\tau_h^{-1} = \int_0^1 h(u)\left\{-\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}\right\}du.
\]
We would like to find the $h$ that maximizes the ARE. This amounts to minimizing $\tau_h$ and maximizing $\rho_h$. However, the function that minimizes $\tau_h$ does not necessarily maximize $\rho_h$. Analytically finding an $h$ that would simultaneously do both is a difficult problem in the calculus of variations.
Simple case: the random error vector has a multivariate normal distribution.
Kloke and McKean (2009) proved that if $h(u) = \sqrt{12}\,(u-1/2)$ (the Wilcoxon score), then
\[
\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS}) = 12\,\sigma^2\left(\int f^2(t)\,dt\right)^2\frac{1-\rho}{1-\rho_h},
\]
where $\sigma^2$ is the variance of the underlying normal distribution, $\rho$ is the intraclass correlation, and $\rho_h$ is the Spearman correlation within each class. Hence
1. $\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS})\in[0.8660,\,0.9549]$ if $0 < \rho < 1$;
2. $\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS})\in[0.9549,\,0.9662]$ if $-1 < \rho < 0$.
However, for a general score function, estimating the optimal score function amounts to the following problem in the calculus of variations:
\[
\widehat{h} = \sup_{h}\,\frac{\tau_h^{-2}}{1-\rho_h}
= \sup_{h}\left\{\frac{\left(\displaystyle\int_0^1 h(u)\left\{-\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}\right\}du\right)^2}{1 - \displaystyle\iint_{\mathbb{R}^2} h(F(x))\,h(F(y))\,f(x,y)\,dx\,dy}\right\}. \tag{5.2.2}
\]
The wavelet method developed earlier provides an approximation of the minimizer of $\tau_h$. The maximizer of $\rho_h$ can also be found using the techniques in this dissertation, since a wavelet basis of $L^2(\mathbb{R}^2)$ can be given as a product of wavelet bases of $L^2(\mathbb{R})$. Indeed, consider the optimization problem
\[
\sup_{h}\iint_{\mathbb{R}^2} h(F(x))\,h(F(y))\,f(x,y)\,dx\,dy \tag{5.2.3}
\]
and let $J(h) = \displaystyle\iint_{\mathbb{R}^2} h(F(x))\,h(F(y))\,f(x,y)\,dx\,dy - 2\int_{\mathbb{R}} g(y)\,h(F(y))\,dy$, where $g$ is some continuous function on $\mathbb{R}$.
Then for any integrable function $\eta\ne 0$ and any scalar $\epsilon$,
\[
J(h+\epsilon\eta) - J(h) = \epsilon\int_{\mathbb{R}}\eta(F(y))\left(\int_{\mathbb{R}} f(x,y)\,h(F(x))\,dx\right)dy
+ \epsilon\int_{\mathbb{R}}\eta(F(x))\left(\int_{\mathbb{R}} f(x,y)\,h(F(y))\,dy\right)dx
\]
\[
+ \epsilon^2\iint_{\mathbb{R}^2} f(x,y)\,\eta(F(x))\,\eta(F(y))\,dx\,dy
- 2\epsilon\int_{\mathbb{R}} g(y)\,\eta(F(y))\,dy.
\]
Thus, if $f$ is a symmetric function of $x$ and $y$, we have
\[
J(h+\epsilon\eta) - J(h) = 2\epsilon\int_{\mathbb{R}}\eta(F(y))\left(\int_{\mathbb{R}} f(x,y)\,h(F(x))\,dx - g(y)\right)dy
+ \epsilon^2\iint_{\mathbb{R}^2} f(x,y)\,\eta(F(x))\,\eta(F(y))\,dx\,dy.
\]
Hence,
\[
\lim_{\epsilon\to 0}\frac{J(h+\epsilon\eta) - J(h)}{\epsilon} = 0 \;\Longrightarrow\; \int_{\mathbb{R}} f(x,y)\,h(F(x))\,dx = g(y).
\]
Therefore, if $f\in L^2(\mathbb{R}^2)$, then (5.2.3) becomes the homogeneous Fredholm integral equation
\[
\int_{\mathbb{R}} h(F(x))\,f(x,y)\,dx = g(y), \qquad\text{for some continuous function } g. \tag{5.2.4}
\]
1. If $f(x,y) = k(y-x)$ for some function $k$, then (5.2.4) has the solution
\[
h(F(x)) = \int_{-\infty}^{\infty}\frac{\mathcal{F}_y[g(y)](u)}{\mathcal{F}_y[k(y-x)](u)}\,e^{2\pi i u x}\,du,
\]
where $\mathcal{F}_y$ denotes the Fourier transform with respect to $y$.

2. In general, the solution to (5.2.4) can be written as
\[
h(x) = \sum_{i=1}^{\infty} a_i\,\langle g, l_i\rangle\,l_i(x),
\]
where $(a_i)$ is a decreasing sequence of reals, the $l_i$ are basis functions (which could be wavelets) in $L^2(\mathbb{R})$, and $\langle\cdot,\cdot\rangle$ is the scalar product in $L^2(\mathbb{R})$.
Thus, we have two approximations of $h$. The compromise is to use the estimated score from one in the estimation of the other. This can be iterated until the difference between the two approximations falls below a specified tolerance. Either one, or the average of the two approximations, can then be taken as the final approximation of $h$, as suggested by the algorithm below.

1. Get residuals from an initial fit. Use them to get an estimate of $F$, say $\widehat{F}_0$.
2. Use wavelets and $\widehat{F}_0$ to estimate the minimizer of $\tau_h$, say $\widehat{h}_{\tau F}$.
3. Fit the model using $\widehat{h}_{\tau F}$ and get new residuals and a new estimate of $F$, say $\widehat{F}_{\tau}$.
4. Use $\widehat{F}_{\tau}$ to estimate the maximizer of $\rho_h$, say $\widehat{h}_{\rho F}$.
5. Fit the model using $\widehat{h}_{\rho F}$ and get new residuals and a new estimate of $F$, say $\widehat{F}_{\rho}$.
6. If $\dfrac{\|\widehat{h}_{\tau F} - \widehat{h}_{\rho F}\|_2}{\frac{1}{2}\big(\|\widehat{h}_{\tau F}\|_2 + \|\widehat{h}_{\rho F}\|_2\big)} \ge \epsilon$, set $\widehat{F}_0 = \widehat{F}_{\rho}$ and go back to step 2; otherwise stop.
7. Take $\dfrac{\widehat{h}_{\tau F} + \widehat{h}_{\rho F}}{2}$ as the estimate of $h$.
5.3 Adequate space for score functions

In the previous sections, it was assumed that the score function belongs to the space of square integrable functions. Though this property of the score function is guaranteed if its Fisher information is finite, it is worth mentioning that this space is very "big." In fact, pinning down an adequate space in which score functions can be approximated by wavelets is of some value. Such a space should lie between the space of continuous functions, which are nice for practical applications, and the Sobolev spaces, which are nice for theory but rarely met in practice, and it should be contained in the space of square integrable functions. The Besov spaces are such spaces. Besov spaces can be completely characterized in terms of wavelets, in the sense that any function in such a space has a wavelet decomposition, and any decomposition with wavelet coefficients of the right size yields a function belonging to a Besov space. As in any space, Besov spaces contain "nice" and "bad" functions. By a "bad" function, we mean an analytic function that cannot be continued outside its disk of convergence; such functions are also called lacunary functions. Though the results we found give only a partial answer to the problem of an adequate space for score functions, the techniques used make them worthwhile.
Characterization of lacunary functions in Bergman-Besov-Lipschitz spaces

The space $B_\rho$ has been studied at length by various authors for various purposes. This space first appears in its simplest form in De Souza (1980), where it was denoted by $B^1$. This was later generalized to a weighted version $B_\rho$ in De Souza (1985) and De Souza (1983). It was shown in these papers that $B_\rho$ is the boundary value of those functions $F$ for which $\int_0^1\int_0^{2\pi}|F'(re^{i\theta})|\,(1-r)^{\frac{1}{q}-1}\,d\theta\,dr < \infty$ for the weight function $\rho(t) = t^{1/q}$. It was shown in Bloom and De Souza (1989) that $B_\rho$, for a general weight function $\rho$, is a real characterization of analytic functions in the unit disc for which $\int_0^1\int_0^{2\pi}|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr < \infty$, generalizing the results obtained in De Souza (1985) and De Souza (1983). The main result here is the analytic characterization of lacunary functions in the spaces $B_\rho$ for $\rho$ belonging to a class $S$ of weights satisfying some conditions that will be stated in the sequel.
Preliminaries

Definition 5.3.1. Lacunary functions are analytic functions $F(z) = \sum_{k=1}^{\infty} a_k z^{n_k}$ for which $\lambda = \inf_k \dfrac{n_{k+1}}{n_k} > 1$.
Definition 5.3.2. We define
\[
B_\rho = \left\{F : \mathbb{D}\to\mathbb{D},\ F \text{ analytic and } \int_0^1\!\!\int_0^{2\pi}|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr < \infty\right\}
\]
and
\[
b_\rho = \left\{F : \mathbb{D}\to\mathbb{D},\ F(z) = \sum_{n=0}^{\infty}a_n z^n,\ \sum_{n=0}^{\infty}2^n K(n,\rho)\Big(\sum_{k\in I_n}|a_k|^2\Big)^{1/2} < \infty\right\},
\]
where $I_n = \{k\in\mathbb{N} : 2^{n-1}\le k < 2^n\}$.

Note that $B_\rho$ and $b_\rho$ are endowed with the norms $\|F\|_{B_\rho} = \int_0^1\int_0^{2\pi}|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr$ and $\|F\|_{b_\rho} = \sum_{n=0}^{\infty}2^nK(n,\rho)\big(\sum_{k\in I_n}|a_k|^2\big)^{1/2}$, respectively.

Notation: If $\rho(t) = t^{1/q}$, $q\ge 1$, in Definition 5.3.2, then we denote $B_\rho$ by $B_q$ and $b_\rho$ by $b_q$.
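The $b_q$ norm is straightforward to compute for a finite Taylor series; a small sketch (ours), using $K(n,\rho) = 2^{-n/q}$ as in Lemma 5.3.4 below and blocks $I_n$ for $n\ge 1$ (the coefficient $a_0$ plays no role in the norm comparison, which is stated for series starting at $n=1$):

```python
import math

# b_q norm of a finite series sum a_k z^k, coefficients given as a dict
# {k: a_k}, with weight rho(t) = t^{1/q} and K(n, rho) = 2^{-n/q}.
def b_q_norm(a, q, n_max=12):
    total = 0.0
    for n in range(1, n_max):
        # I_n = {k : 2^{n-1} <= k < 2^n}
        block = sum(abs(a.get(k, 0.0)) ** 2 for k in range(2 ** (n - 1), 2 ** n))
        total += 2 ** n * 2 ** (-n / q) * math.sqrt(block)
    return total
```

For example, the single-term series $z$ with $q = 1$ has norm $2\cdot 2^{-1} = 1$.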
Definition 5.3.3. We say the weight function $\rho : [0,1]\to[0,\infty)$ belongs to the class $S$ if $\rho(0) = 0$, $\rho$ is nondecreasing, and there are positive constants $C_1$, $C_2$, $K(n,\rho)$ satisfying
\[
\int_0^1 r^{2^{n-1}-1}\,\frac{\rho(1-r)}{1-r}\,dr \le C_1 K(n,\rho), \qquad \forall n\ge 1, \tag{5.3.1}
\]
and
\[
\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}\,\frac{\rho(1-r)}{1-r}\,dr \ge C_2 K(n,\rho), \qquad \forall n\ge 2. \tag{5.3.2}
\]
Hereafter $c$ and $C$ denote generic positive constants, and when there is no ambiguity we shall name all constants $c$ and $C$. Similar weight function classes can be found in Mateljevic and Pavlovic (1984), where they were used to characterize weighted Hardy spaces.
Lemma 5.3.4. The class S is not empty.
Proof. Consider the family of weights defined by $U = \{\rho : \rho(t) = t^{1/q},\ 1\le q < \infty\}$, so that $\frac{\rho(1-r)}{1-r} = (1-r)^{\frac{1}{q}-1}$. Clearly $\rho(0) = 0$ and $\rho$ is a nondecreasing function of $t$, so the weight function satisfies the conditions stated in Bloom and De Souza (1989). We will show that $U\subset S$.

Take $\rho\in U$. Using a result of Alzer (2001) and the facts that $1/q\le 1$ and $2^{n-1}\ge 1$, we have
\[
\int_0^1 r^{2^{n-1}-1}(1-r)^{\frac{1}{q}-1}\,dr = B\Big(2^{n-1},\,\frac{1}{q}\Big) \le q\,2^{-(n-1)/q} \le 2q\,2^{-n/q}, \qquad n\ge 1, \tag{5.3.3}
\]
where $B(\cdot,\cdot)$ is the beta function defined by $B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$.

Also, for $1-2^{-(n-1)}\le r\le 1-2^{-n}$, we have $2^{n-1}2^{(1-n)/q} \le (1-r)^{\frac{1}{q}-1} \le 2^n 2^{-(n-1)/q}$. Thus, since $2^{1/q}\ge 1$ and $0\le r\le 1$, we obtain
\[
\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}(1-r)^{\frac{1}{q}-1}\,dr \ge 2^{n-1}2^{-n/q}\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}\,dr
\ge 2^{n-1}2^{-n/q}\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n}\,dr
\]
\[
= \frac{2^{n-1}2^{-n/q}}{2^n+1}\Big[(1-2^{-n})^{2^n+1} - (1-2^{-(n-1)})^{2^n+1}\Big]
= \frac{2^{n-1}2^{-n/q}}{2^n+1}\,\big(2^{-(n-1)} - 2^{-n}\big)\sum_{k=0}^{2^n}(1-2^{-n})^{2^n-k}(1-2^{-(n-1)})^{k}.
\]
But $1-2^{-n} \ge 1-2^{-(n-1)}$, so for $0\le k\le 2^n$, $(1-2^{-n})^{2^n-k}(1-2^{-(n-1)})^{k} \ge (1-2^{-(n-1)})^{2^n}$. Thus
\[
\sum_{k=0}^{2^n}(1-2^{-n})^{2^n-k}(1-2^{-(n-1)})^{k} \ge (2^n+1)\,(1-2^{-(n-1)})^{2^n}.
\]
Therefore
\[
\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}(1-r)^{\frac{1}{q}-1}\,dr \ge 2^{n-1}2^{-n/q}\,2^{-n}\,(1-2^{-(n-1)})^{2^n} \ge 2^{-5}\,2^{-n/q}, \qquad n\ge 2. \tag{5.3.4}
\]
Taking $K(n,\rho) = 2^{-n/q}$, (5.3.3) and (5.3.4) imply that $\rho\in S$.
Theorem 5.3.5. Suppose $\rho\in S$. Then $b_\rho$ is a Banach space. Moreover, for any function $F(z) = \sum_{n=1}^{\infty}a_n z^n$ belonging to $B_\rho$, there is a constant $C > 0$ such that
\[
\|F\|_{B_\rho} \le C\,\|F\|_{b_\rho}.
\]
If $F(z) = \sum_{k=1}^{\infty}a_k z^{n_k}$ is lacunary and belongs to $B_\rho$, then there is a constant $c > 0$ such that
\[
\|F\|_{B_\rho} \ge c\,\|F\|_{b_\rho}.
\]
Proof. We will first show that $b_\rho$ is a Banach space. First we show that $b_\rho$ is a linear space. Let $\alpha$ and $\beta$ be two complex numbers and let $f, g\in b_\rho$ with coefficients $(a_k)$ and $(b_k)$, respectively. By Minkowski's inequality we have
\[
\Big(\sum_{k\in I_n}|\alpha a_k + \beta b_k|^2\Big)^{1/2} \le |\alpha|\Big(\sum_{k\in I_n}|a_k|^2\Big)^{1/2} + |\beta|\Big(\sum_{k\in I_n}|b_k|^2\Big)^{1/2}.
\]
Thus $\alpha f + \beta g\in b_\rho$.
Now we show that $b_\rho$ is complete. To that end, let $\{F_s\}_{s\in\mathbb{N}}$ be a Cauchy sequence in $(b_\rho, \|\cdot\|_{b_\rho})$, where $F_s(z) = \sum_{n=0}^{\infty}a_{sn}z^n$. We will show that there is some $F\in b_\rho$ such that $F_s\to F$ in $b_\rho$. Given $\epsilon > 0$, there is some $S\in\mathbb{N}$ such that for $s,t\ge S$ we have $\|F_s - F_t\|_{b_\rho} < \epsilon$; that is,
\[
\sum_{n=0}^{\infty}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{sk} - a_{tk}|^2\Big)^{1/2} < \epsilon
\]
for $s,t\ge S$. This implies that for all $n\in\mathbb{N}$,
\[
\sum_{k\in I_n}|a_{sk} - a_{tk}|^2 < \frac{\epsilon^2}{\big(2^nK(n,\rho)\big)^2}
\]
for $s,t\ge S$. Therefore, for each $k$, $\{a_{sk}\}_{s\in\mathbb{N}}$ is a Cauchy sequence in $\mathbb{C}$, a complete metric space. This implies that for each $k$ there exists some $a_k\in\mathbb{C}$ such that $a_{sk}\to a_k$. Now let $F(z) = \sum_{n=0}^{\infty}a_n z^n$. We shall show that $F\in b_\rho$ and that $F_s\to F$ in $b_\rho$. To that end, we will prove that $F - F_s\in b_\rho$ and use the linearity of $b_\rho$ to conclude that $(F - F_s) + F_s = F\in b_\rho$.

Given $\epsilon > 0$, there exists $S\in\mathbb{N}$ such that
\[
\sum_{n=0}^{\infty}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{tk} - a_{sk}|^2\Big)^{1/2} < \epsilon
\]
whenever $s,t\ge S$. Now let $M > 0$ be arbitrary. Then for $s,t\ge S$,
\[
\sum_{n=0}^{M}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{tk} - a_{sk}|^2\Big)^{1/2} < \epsilon.
\]
Fixing $s\ge S$ and letting $t\to\infty$, we have
\[
\lim_{t\to\infty}\sum_{n=0}^{M}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{tk} - a_{sk}|^2\Big)^{1/2}
= \sum_{n=0}^{M}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_k - a_{sk}|^2\Big)^{1/2} \le \epsilon.
\]
$M$ being arbitrary, this shows that for a given $\epsilon > 0$ there exists $S\in\mathbb{N}$ such that
\[
\sum_{n=0}^{\infty}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_k - a_{sk}|^2\Big)^{1/2} \le \epsilon
\]
whenever $s\ge S$. That is, $F - F_s\in b_\rho$. This also proves that, given $\epsilon > 0$, $\|F - F_s\|_{b_\rho} \le \epsilon$ for $s\ge S$, and hence $F_s\to F$ in $b_\rho$.
Let us now prove that there are constants $C, c > 0$ such that
\[
\|F\|_{B_\rho} \le C\,\|F\|_{b_\rho}
\]
and, for lacunary functions,
\[
\|F\|_{B_\rho} \ge c\,\|F\|_{b_\rho}.
\]
Let
\[
J = \int_0^1\!\!\int_0^{2\pi}\frac{1}{2\pi}\,|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr.
\]
Note that $F'(re^{i\theta}) = \sum_{n=1}^{\infty}n\,a_n r^{n-1}e^{i(n-1)\theta}$. By the Cauchy-Schwarz inequality and Parseval's identity we have
\[
J \le C\int_0^1\Big(\sum_{n=1}^{\infty}|n\,a_n r^{n-1}|^2\Big)^{1/2}\frac{\rho(1-r)}{1-r}\,dr.
\]
But
\[
\int_0^1\Big(\sum_{n=1}^{\infty}|n\,a_n r^{n-1}|^2\Big)^{1/2}\frac{\rho(1-r)}{1-r}\,dr
= \int_0^1\Big(\sum_{n=1}^{\infty}\sum_{k\in I_n}k^2|a_k|^2 r^{2(k-1)}\Big)^{1/2}\frac{\rho(1-r)}{1-r}\,dr,
\]
and since $0\le r\le 1$, ...
It remains to show that, for every $\epsilon > 0$,
\[
\lim_{n\to\infty}B_n^{-2}\sum_{k=1}^{n}E\big[Z_{nk}^2\,I(|Z_{nk}| > \epsilon B_n)\big] = 0. \tag{5.3.6}
\]
Since $B_n^2$ converges to a positive constant, we need only show that the sum converges to $0$. By the definition of $Z_{nk}$ and the Cauchy-Schwarz inequality, we have
\[
|Z_{nk}| \le n^{-1/2}|t_1| + n^{-1/2}|t_2^Tx_k|\,|h(F(Y_k))|.
\]
Hence
\[
I(|Z_{nk}| > \epsilon B_n) \le I\big(n^{-1/2}|t_1| + n^{-1/2}|t_2^Tx_k|\,|h(F(Y_k))| > \epsilon B_n\big).
\]
By the Cauchy-Schwarz inequality,
\[
n^{-1/2}|t_2^Tx_k| \le n^{-1/2}\|x_k\|\,\|t\|_{p+1}
= \Big(n^{-1}\sum_{j=1}^{p}x_{kj}^2\Big)^{1/2}\|t\|_{p+1}
\le \Big(p\max_{j}n^{-1}x_{kj}^2\Big)^{1/2}\|t\|_{p+1}.
\]
Let $K_n = \big(p\max_{k,j}n^{-1}x_{kj}^2\big)^{1/2}$. $K_n$ is independent of $k$ and converges to $0$. Therefore we have
\[
I\left(|h(F(Y_k))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)
\ge I\big(n^{-1/2}|t_1| + n^{-1/2}|t_2^Tx_k|\,|h(F(Y_k))| > \epsilon B_n\big).
\]
Finally, we also have
\[
\sum_{k=1}^{n}E\left[Z_{nk}^2\,I\left(|h(F(Y_k))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]
= t_1^2\,E\left[I\left(|h(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]
\]
\[
+ \frac{2t_1}{n}\,E\left[\operatorname{sgn}(Y_1)\,h(F(Y_1))\,I\left(|h(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]\,t_2^T\sum_{k=1}^{n}x_k
\]
\[
+ E\left[h^2(F(Y_1))\,I\left(|h(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]\frac{1}{n}\sum_{k=1}^{n}\big(t_2^Tx_k\big)^2.
\]
Note that the sum in (5.3.6) is less than or equal to the expression above. The design matrix being centered, the middle term on the right-hand side is $0$.

In the expression $\frac{\epsilon B_n - n^{-1/2}t_1}{K_n}$, the numerator converges to a positive constant while the denominator converges to $0$; hence the expression goes to $\infty$. Since $h$ is bounded, the indicator converges to $0$. Using again the boundedness of $h$, the limit and the expectation can be interchanged by the Lebesgue Dominated Convergence Theorem. This shows that (5.3.6) holds, and hence $Z_n$ converges in distribution to a univariate normal distribution. Therefore $T_n$ converges to a multivariate normal distribution.