Efficient Rank Regression with Wavelet Estimated Scores
by
Eddy Armand Kwessi Nyandjou
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
August 6, 2011
Keywords: Regression, Wavelets, Scores
Copyright 2011 by Eddy Armand Kwessi Nyandjou
Approved by
Asheber Abebe, Chair, Associate Professor of Mathematics and Statistics
Geraldo De Souza, Professor of Mathematics and Statistics
Nedret Billor, Associate Professor of Mathematics and Statistics
Peng Zeng, Associate Professor of Mathematics and Statistics
Abstract
Wavelets have lately been widely used in many areas such as physics, astronomy, the biological sciences, and, more recently, statistics. The main goal of this dissertation is to provide a new contribution to an important problem in statistics, and particularly in nonparametric statistics, namely estimating the optimal score function from data whose underlying distribution is unknown. This problem arises naturally in nonparametric linear regression models and can provide better insight into current problems in longitudinal and repeated measures analysis through mixed models.

Our approach to estimating the score function is to use suitable compactly supported wavelets, such as the Daubechies, Symlet, or Coiflet families. The smoothness and time-frequency properties of these wavelets allow us to find an asymptotically efficient estimator of the slope parameter of the linear model. Consequently, we are also able to provide a consistent estimator of the asymptotic variance of the regression parameter. For related mixed models, asymptotic relative efficiency is also discussed.
Acknowledgments
Without your blessings, Almighty God, I could never have achieved this. Thank you so much for the strength, the health, and the knowledge you gave me throughout these years and simply throughout my life.
I would like to dedicate this dissertation to my late parents NYANDJOU Felix and
NGUEBA Marie. Even though they did not live long enough to see this achievement,
every step of my journey was made possible because of the sacrifices they made in raising
me and my brothers. I wish to express my immense gratitude to the following:
My wife Corine Michelle KWESSI for her patience, understanding and total support
and for taking care of our wonderful children Eddy Harold KWESSI KWESSI, Alexandra
Felixia KWESSI and Alyssa Francesca KWESSI.
Dr. Asheber Abebe who suggested and supervised this work. His understanding,
availability and encouragement throughout this dissertation will never be forgotten.
Dr. Geraldo De Souza for helping me understand the field of Harmonic Analysis,
for mentoring me like a father, for being there when I needed him.
Dr. Nedret Billor for her valuable advice and for playing other wonderful and
immeasurable roles.
Drs. Michel Smith and Chris Rodger for their total support and encouragement
during my short stay at Auburn University as a graduate student.
All Professors who taught during the course work for the marvelous job done, par-
ticularly Drs. Peng Zeng, Mark Carpenter, Gary Gruenhage, Paul Schmidt, Amnon J.
Meir, Narenda Govil.
Prof. Charles Chidume, whose encouragement and support made my journey to
the US possible.
My Brothers Blaise Tchepnda Nyandjou, Thierry Roland Nono Nyandjou, Sidoine
Raoul Youtcha Nyandjou, Blaise II Tchepnda, my sisters Pelagie Njinkeu Nyandjou,
Elysee Tchatchoua Nyandjou, Grace Signe Nyandjou, Christelle Nyandjou for all the
time we spent together and all the support they provided me. My course mates, Guyvannie Miankokana, Achard Bindele, Guy Merlin Nguelefack, Sean O'Neil, Frank Sturm,
Daniel Roberts, Drs. Gerald Chidume, Julian Allagan, Ngwane Fidele, Paul Alfonso,
for the fun we had at Auburn. My Friends, Alain Desire Kounkeu, Ruben Bayemi, Vales
Ngoule, Moses Ntam, Drs. Foutse Khomh, Etienne Non, for their support.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Contribution of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Preliminaries: Review of Rank Based Analysis and Wavelet Theory . . . . . 4
2.1 Rank Based Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Estimation of regression parameters . . . . . . . . . . . . . . . . . 5
2.1.2 Tests of linear hypothesis . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Basic Wavelet Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 A motivating example . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Examples of wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Estimation of the Score Function . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 General Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.1 Discussion of the assumptions . . . . . . . . . . . . . . . . . . . . 28
3.3 Estimation of the coefficients . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Estimation of the score function . . . . . . . . . . . . . . . . . . . . . . . 37
4 Estimation of the Slope Parameter . . . . . . . . . . . . . . . . . . . . . . . 39
5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Issues related to simulations . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Simple Mixed Models with Dependent Error Structure . . . . . . . . . . 45
5.3 Adequate space for score functions . . . . . . . . . . . . . . . . . . . . . 49
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
List of Figures
2.1 Geometry of Rank Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Geometry of Rank Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Gibbs Phenomenon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 The Haar Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 The Shannon Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 The Mexican Hat Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 The Daubechies Wavelet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
List of Tables
5.1 Comparison between the different methods . . . . . . . . . . . . . . . . . 45
Chapter 1
Introduction
One of the most widely used models in statistical modeling is the linear regression model, in which one constructs a hyperplane that "best" describes the relationship between a response variable Y and a vector of covariates x. Typically, one has a set of data measured on, say, n subjects, (Y_1, x_1), ..., (Y_n, x_n), and the construction of the hyperplane involves errors, since not all the data points fall on a plane. These errors are assumed to be random.
The classical approach to estimating the hyperplane is the least squares (LS) procedure, in which the fitted hyperplane is taken to be the point of the column space of the matrix X = (x_1, ..., x_n)^T that is closest, in Euclidean distance, to the vector (Y_1, ..., Y_n)^T. It turns out, by the Gauss-Markov Theorem, that the LS procedure gives the best linear unbiased estimator (BLUE) if the errors have expectation zero, are uncorrelated, and have equal variances. However, the LS procedure is not robust in the presence of outliers and other violations of the underlying assumptions.
There are several approaches one can follow to achieve robustness. One of the more popular techniques is M-estimation (Huber, 1964), which includes the LS procedure as a special case. Another approach is R-estimation, which is based on the ranks of the data. This was initially proposed for the simple linear model by Adichie (1967), based on simple Hodges-Lehmann type location estimators, and was later generalized to the multiple regression model by Jureckova (1971) and Jaeckel (1972). In subsequent works, Hettmansperger, McKean, and colleagues developed methods for testing general linear hypotheses, constructing confidence intervals, etc., making R-estimation of linear models a complete treatment (Hettmansperger and McKean, 1998; Naranjo and McKean, 1997). As a result, the approach is also known as the Jaeckel-Hettmansperger-McKean approach to model fitting (Hollander and Wolfe, 1999).
1.1 Contribution of the Dissertation
One of the difficulties of using M- and R-estimators is that they depend on an unknown function (a score function) that needs to be chosen by the investigator. This choice is just about all the control the investigator has, so it is a very critical step. However, it is usually not very clear which score function to use. A common approach is to choose the function on the basis of a heuristic investigation of a fit based on a chosen (usually simple, such as linear) score function. Another common approach is to trade efficiency for robustness and use score functions that involve some form of trimming or Winsorization.
It is of interest to choose score functions that maximize the efficiency of the resulting estimator. The efficiency of an estimator depends on the underlying distribution of the random error terms, and this distribution is unknown. One approach is to use a density estimator (e.g., a kernel density estimator) to estimate the score function and then use the estimated score function to estimate the regression parameters. Such estimators, known as adaptive estimators, are discussed in Stone (1975) and Koul and Susarla (1983), among others.
A different approach, which uses the underlying structure of R-estimators to determine the score function that maximizes efficiency, was given by Dionne (1981) and Naranjo and McKean (1997). They employed a Fourier series approximation based on a first order Taylor expansion to determine the score function that maximizes the efficiency of the R-estimator. However, this approach lacked the flexibility required for certain underlying distributions; this was especially evident at the two extremes of the domain of the score function.
In this dissertation, we propose a wavelet based approximation of the score function built on a second order Taylor series approximation. As it turns out, the second order Taylor approximation is ideal under the smoothness considerations of the score function. We will provide a theoretical investigation of the optimality of this approach as well as establish the asymptotic properties of the resulting estimator. We will also consider the mixed model and investigate how our approach can be used to estimate the fixed-effects parameters. Moreover, we will take on the task of determining the function space that is best suited for such wavelet-based approximation. This leads us to consider the problem with greater generality from the perspective of harmonic analysis.
1.2 Organization
This dissertation is organized as follows. Chapter 2 contains a brief review of rank
based analysis of linear models and wavelet theory. In Chapter 3, we consider the problem
of estimation of score functions and give the main results of the dissertation. Chapter 4
gives an adaptive estimator of the slope parameter as well as the scale parameter. These
are then used to construct Wald type tests for general linear hypotheses on the slope
parameter. Chapter 5 provides a brief discussion of some computational issues and proposes extensions to dependent error structures, as well as a treatment of the spaces of score functions that are suitable for wavelet approximation.
Chapter 2
Preliminaries: Review of Rank Based Analysis and Wavelet Theory
In this chapter, we review basic notions of rank based analysis for linear models and
basic facts about wavelet theory.
2.1 Rank Based Analysis
We consider the following model:

Y_i = x_i^T β + e_i^*,  1 ≤ i ≤ n,  (2.1.1)

where Y_i denotes the ith response, x_i denotes a p × 1 vector of explanatory variables, β is a p × 1 vector of unknown regression parameters, and e_i^* is the ith random error with distribution F.
Our interest is to estimate β and to test linear hypotheses about it. Note that we could also have a model with an intercept parameter. Indeed, let α = L(e_i^*) be a location functional and let e_i = e_i^* − α. Then L(e_i) = 0 and the model can be written as

Y_i = α + x_i^T β + e_i,  1 ≤ i ≤ n.  (2.1.2)

Remark 2.1.1. β does not depend on the location functional used if F is a member of a location family of distributions.

Proof. Consider any location functional L of the error distribution and let α = L(F), where F is the common distribution of the e_i^* for 1 ≤ i ≤ n. Then e_i = e_i^* − α has distribution F_α(x) = F(x + α), and L(F_α) = 0. We then have F(x) = F_α(x − α), so L(F) = α. Furthermore, Y_i has distribution H(x) = F_α(x − (x_i^T β + α)). Thus L(H) = x_i^T β + α is a location functional for Y_i, and consequently β = (x_i x_i^T)^{−1} x_i (L(H) − L(F)), which does not depend on the particular choice of L.
Definition 2.1.2. Let Y = (Y_1, ..., Y_n)^T denote the vector of observations, let X denote the n × p matrix whose ith row is x_i^T, and let e = (e_1, ..., e_n)^T. The model in (2.1.2) can be expressed as

Y = α1 + Xβ + e,  (2.1.3)

where 1 is an n × 1 vector of ones, α1 is a vector of reals representing the intercept, and β is a p × 1 vector of unknown regression parameters.

Note that the model

Y = α1 + η + e  (2.1.4)

is called the Coordinate Free Model, where η = Xβ ∈ Ω_F, with Ω_F being the column space spanned by the columns of X.

In addition to estimating the parameter of interest β, we will also be interested in general linear tests of the form

H_0: Mβ = 0 versus H_A: Mβ ≠ 0,  (2.1.5)

where M is a q × p matrix of full row rank.
2.1.1 Estimation of regression parameters

Definition 2.1.3. An operator ||·||_* is called a pseudo-norm if it satisfies the following conditions:

1. ||x||_* ≥ 0 for all x ∈ R^n;
2. ||x||_* = 0 if and only if x_1 = ... = x_n;
3. ||αx||_* = |α| ||x||_* for all α ∈ R and x ∈ R^n;
4. ||x + y||_* ≤ ||x||_* + ||y||_* for all x, y ∈ R^n.

Consider the function

||x||_* = Σ_{i=1}^n a(R(x_i)) x_i,  (2.1.6)

where x = (x_1, ..., x_n) is a vector in R^n, R(x_i) is the rank of x_i among x_1, ..., x_n, and the a(i), called scores, are such that a(1) ≤ ... ≤ a(n), Σ_{i=1}^n a(i) = 0, and a(i) = −a(n + 1 − i).
Theorem 2.1.4. The function ||·||_* in (2.1.6) is a pseudo-norm.

Proof. 1. Positivity.
Using the connection between ranks and order statistics, we can write

||x||_* = Σ_{i=1}^n a(i) x_(i).

Suppose that x_(i_0) is the last order statistic with a negative score. Since Σ_{i=1}^n a(i) = 0, we have

||x||_* = Σ_{i=1}^n a(i)(x_(i) − x_(i_0))
        = Σ_{i ≤ i_0} a(i)(x_(i) − x_(i_0)) + Σ_{i ≥ i_0} a(i)(x_(i) − x_(i_0)).

Since both terms on the right are nonnegative, we have ||x||_* ≥ 0.
2. Furthermore, if ||x||_* = 0, then both terms in the last equality must be zero. Since a(1) < a(n) and a(1) = −a(n), we must have a(1) < 0 and a(n) > 0. Therefore the first term on the right can be written as Σ_{1 ≤ i ≤ i_0} a(i)(x_(i) − x_(i_0)). Thus Σ_{1 ≤ i ≤ i_0} a(i)(x_(i) − x_(i_0)) = 0 implies x_(1) = x_(2) = ... = x_(i_0). Likewise, we have x_(i_0) = x_(i_0+1) = ... = x_(n). This shows that ||x||_* = 0 implies x_1 = ... = x_n.
3. Homogeneity.
For a positive real α, we have R(αx_i) = R(x_i), while for a negative α the ranks are reversed: R(αx_i) = n + 1 − R(x_i). Clearly, for a positive real α,

||αx||_* = Σ_{i=1}^n a(R(αx_i))(αx_i) = α Σ_{i=1}^n a(R(x_i)) x_i = |α| ||x||_*.

If α < 0, then, using a(i) = −a(n + 1 − i),

||αx||_* = Σ_{i=1}^n a(R(αx_i))(αx_i)
         = α Σ_{i=1}^n a(n + 1 − i) x_(i)
         = −α Σ_{i=1}^n a(i) x_(i)
         = |α| ||x||_*.

4. Triangle inequality.

||x + y||_* = Σ_{i=1}^n a(R(x_i + y_i))(x_i + y_i)
            = Σ_{i=1}^n a(R(x_i + y_i)) x_i + Σ_{i=1}^n a(R(x_i + y_i)) y_i
            ≤ Σ_{i=1}^n a(i) x_(i) + Σ_{i=1}^n a(i) y_(i)   (by a rearrangement theorem of Hardy)
            = ||x||_* + ||y||_*.
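As an aside (a sketch of ours, not part of the original development), the pseudo-norm properties just proved can be checked numerically. The function names below are ours; the scores a(i) = i/(n + 1) − 1/2 are nondecreasing, sum to zero, and satisfy a(i) = −a(n + 1 − i):

```python
import random

def scores(n):
    # a(i) = h(i/(n+1)) with h(u) = u - 1/2: nondecreasing, sums to 0,
    # and satisfies a(i) = -a(n + 1 - i)
    return [(i / (n + 1)) - 0.5 for i in range(1, n + 1)]

def pseudo_norm(x):
    # ||x|| = sum_i a(R(x_i)) x_i, computed via order statistics as sum_i a(i) x_(i)
    a = scores(len(x))
    return sum(ai * xi for ai, xi in zip(a, sorted(x)))

random.seed(1)
for _ in range(100):
    n = 10
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    c = random.uniform(-3, 3)
    assert pseudo_norm(x) >= -1e-12                                    # positivity
    assert abs(pseudo_norm([c * xi for xi in x])
               - abs(c) * pseudo_norm(x)) < 1e-9                       # homogeneity
    xy = [xi + yi for xi, yi in zip(x, y)]
    assert pseudo_norm(xy) <= pseudo_norm(x) + pseudo_norm(y) + 1e-9   # triangle inequality
```

Note that `pseudo_norm` of a constant vector is exactly zero, consistent with property 2 of Definition 2.1.3.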
We now suppose that the scores are generated as a(i) = h(i/(n + 1)) for some nondecreasing function h defined on the interval (0, 1) and such that

∫_0^1 h(u) du = 0,  ∫_0^1 h²(u) du < ∞.

Consider the model in (2.1.4). Rewriting the pseudo-norm above as ||x||_h = Σ_{i=1}^n a(R(x_i)) x_i, a rank estimate of η is a vector Ŷ_h such that

D_h(β) = Distance(Y, Ω_F) = ||Y − Ŷ_h||_h = min_{η ∈ Ω_F} ||Y − η||_h.
The function D_h(β) is also called the dispersion function.
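To make the minimization concrete, here is a small illustrative sketch (ours, not from the dissertation) that recovers the slope in a simple regression Y_i = β x_i + e_i by a crude grid search over the dispersion function with Wilcoxon-type scores a(i) = i/(n + 1) − 1/2:

```python
import random

def dispersion(beta, x, y):
    # D_h(beta) = sum_i a(R(e_i)) e_i, with residuals e_i = y_i - beta * x_i
    # and scores a(i) = i/(n+1) - 1/2
    n = len(x)
    res = [yi - beta * xi for xi, yi in zip(x, y)]
    order = sorted(range(n), key=lambda i: res[i])
    a = [0.0] * n
    for rank, i in enumerate(order, start=1):
        a[i] = rank / (n + 1) - 0.5
    return sum(a[i] * res[i] for i in range(n))

random.seed(2)
n, beta_true = 200, 1.5
x = [random.uniform(-1, 1) for _ in range(n)]
y = [beta_true * xi + random.gauss(0, 0.3) for xi in x]

# D_h is convex in beta, so a grid search over a bracketing interval suffices here
grid = [i / 1000 for i in range(0, 3000)]
beta_hat = min(grid, key=lambda b: dispersion(b, x, y))
assert abs(beta_hat - beta_true) < 0.15
```

In practice the minimization is done with a proper convex optimizer rather than a grid, but the sketch shows that minimizing D_h(β) recovers the slope without any distributional assumption on the errors.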
Once η has been estimated, β can be estimated by solving the equation Xβ = Ŷ_h; hence the rank estimate of β is β̂_R = (X^T X)^{−1} X^T Ŷ_h. The intercept can be estimated by a location estimate based on the residuals ê = Y − Ŷ_h; we could use the median of the residuals, α̂_S = med_{i=1,...,n} {Y_i − x_i^T β̂_R}. Geometrically, the rank estimate of η is the vector that minimizes the normed difference between Y and Ω_F, as shown in Figure 2.1.
Figure 2.1: Geometry of Rank Estimation
The following result, due to Jaeckel (1972), justifies the existence of the rank estimate.
Remark 2.1.5. The dispersion function D_h(β) is a continuous, convex, almost everywhere differentiable function.

Proof. Continuity follows from the inequality |D_h(β_1) − D_h(β_2)| ≤ ||β_1 − β_2||_p ||x||_h, for all x ∈ R^n and all β_1, β_2 ∈ R^p. Convexity follows from the equalities and inequality

D_h(λβ_1 + (1 − λ)β_2) = ||Y − X(λβ_1 + (1 − λ)β_2)||_h
                       = ||λ(Y − Xβ_1) + (1 − λ)(Y − Xβ_2)||_h
                       ≤ λ D_h(β_1) + (1 − λ) D_h(β_2), for λ ∈ (0, 1).

Almost everywhere differentiability follows from the equality ∇D_h(β) = −Σ_{i=1}^n x_i a(R(Y_i − x_i^T β)).
Remark 2.1.6. The rank estimate β̂_R of β is location and scale equivariant; that is, β̂_R(kY) = k β̂_R(Y) and β̂_R(Y + Xδ) = β̂_R(Y) + δ, for k ∈ R and δ ∈ R^p.

Proof. D_h(β̂_R(Y)) = Dist(Y, Ω_F) = ||Y − Ŷ_h||_h. Therefore,

D_h(β̂_R(Y + Xδ)) = Dist(Y + Xδ, Ω_F)
                  = ||Y + Xδ − Ẑ_h||_h
                  = ||Y − (Ẑ_h − Xδ)||_h
                  = Dist(Y, Ω_F).

So we must have Ẑ_h = Ŷ_h + Xδ. Thus,

β̂_R(Y + Xδ) = (X^T X)^{−1} X^T [Ŷ_h + Xδ]
             = (X^T X)^{−1} X^T Ŷ_h + δ
             = β̂_R(Y) + δ.

In the same way, for any k ≠ 0,

D_h(β̂_R(kY)) = Dist(kY, Ω_F)
             = ||kY − Ŵ_h||_h
             = |k| ||Y − (1/k) Ŵ_h||_h
             = |k| Dist(Y, Ω_F) = |k| D_h(β̂_R(Y)).

So we must have (1/k) Ŵ_h = Ŷ_h. Thus,

β̂_R(kY) = (X^T X)^{−1} X^T Ŵ_h
         = (X^T X)^{−1} X^T (k Ŷ_h)
         = k (X^T X)^{−1} X^T Ŷ_h
         = k β̂_R(Y).
As a consequence, and without loss of generality, the theory we develop will be established under the assumption that the true β is 0, for simplicity.

We end this section with the following theorem, proved in the appendix, on the asymptotic properties of the rank estimator β̂_R of β.
Theorem 2.1.7. The estimators α̂_S and β̂_R are jointly asymptotically normal:

(α̂_S, β̂_R)^T  ≈  N_{p+1}( (α, β)^T , diag( n^{−1} τ_S² , τ_h² (X^T X)^{−1} ) ),

where τ_h and τ_S will be defined later.
2.1.2 Tests of linear hypotheses

Let us consider the model (2.1.2). Note that Distance(Y, Ω_F) is the amount of residual dispersion not accounted for by the model (2.1.2). Let Ω_RF be the subspace subject to H_0, that is, Ω_RF = {η ∈ Ω_F : η = Xβ for some β such that Mβ = 0}.

Clearly, Ω_RF is a subspace of Ω_F, and since MXβ = 0 is a system of q equations with p unknowns, Dim(Ω_RF) = p − q.

If Ŷ_Rh is the rank estimate when the reduced model is fitted, then the nonnegative quantity

RD_h = Dist(Y, Ω_RF) − Dist(Y, Ω_F)  (2.1.7)

represents the reduction in residual dispersion when we pass from the reduced model to the full model, as shown in Figure 2.2.

Thus, large values of RD_h indicate H_A, while small values support H_0. If RD_h is standardized, the ensuing asymptotic distribution theory suggests that F_h = (RD_h/q)/(τ̂_h/2) should be compared with the F critical values with q and n − (p + 1) degrees of freedom, at least for small sample studies, where τ̂_h is a consistent estimator of τ_h.

The rank analogue of Wald's test for the full model is given by

W_h = (Mβ̂_R)^T [M(X^T X)^{−1} M^T]^{−1} (Mβ̂_R) / (q τ̂_h²).  (2.1.8)

It can be proved (Hettmansperger and McKean, 1983) that W_h has an asymptotic χ²(q) distribution.
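As a numerical illustration of (2.1.8) (a sketch of ours, with made-up inputs, not from the dissertation), consider the single-predictor case p = q = 1 with M = [1], where the statistic reduces to W_h = β̂_R² Σ_i x_i² / τ̂_h²:

```python
def wald_statistic(beta_hat, tau_hat, x):
    # W_h = (M b)^T [M (X^T X)^{-1} M^T]^{-1} (M b) / (q tau^2),
    # specialized to p = q = 1 and M = [1], where (X^T X)^{-1} = 1 / sum(x_i^2)
    sxx = sum(xi * xi for xi in x)
    return beta_hat ** 2 * sxx / tau_hat ** 2

# hypothetical inputs: slope estimate 0.8, scale estimate 1.2, centered design points
x = [0.5, -1.0, 1.5, 2.0, -0.5]
W = wald_statistic(0.8, 1.2, x)
reject = W > 3.841  # 95th percentile of the chi-square(1) distribution
```

In a real analysis, `beta_hat` and `tau_hat` would come from the rank fit and a consistent scale estimate; the sketch only shows how the quadratic form is assembled and compared with the χ² critical value.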
Figure 2.2: Geometry of Rank Tests
2.2 Basic Wavelet Theory

In this section, we introduce the basics of wavelets and provide the motivation for their invention.

2.2.1 A motivating example

The Fourier series of a square integrable function f can be obtained by dilating the orthonormal basis {e^{ikx}}_{k∈Z}. However, each element of the basis obtained by dilation is a complex sinusoidal wave which is global in x; hence the Fourier coefficients do not provide information on the local behavior of the function f. For instance, consider
the 2π-periodic function

f(x) = { π/2,   x ∈ (0, π)
       { 0,     x = 0
       { −π/2,  x ∈ (−π, 0).

After computation of the Fourier coefficients, we have

f(x) = 2 Σ_{n=1}^∞ sin((2n − 1)x)/(2n − 1),  x ∈ (−π, π).
Clearly, for any finite interval (a, b), the behavior of the function

f_loc(x) = { f(x),  x ∈ (a, b)
           { 0,     otherwise

cannot be directly obtained from the Fourier coefficients, since those coefficients mix the local behavior of f on (−π, 0) and (0, π).
Another shortcoming of Fourier theory is the Gibbs phenomenon. Indeed, consider the previous function f. Since 2 Σ_{n=1}^∞ sin((2n − 1)x)/(2n − 1) evaluated at x = 0 is 0, the Fourier series of f also converges to f(x) at x = 0.

Let S_n(f, x) = 2 Σ_{k=1}^n sin((2k − 1)x)/(2k − 1). The jump in amplitude of f(x) at x = 0 is f(0+) − f(0−) = π. Moreover,

lim_{n→∞} S_n(f, π/(2n)) = ∫_0^π (sin t / t) dt ≈ 1.85193706

and

lim_{n→∞} S_n(f, −π/(2n)) = −∫_0^π (sin t / t) dt ≈ −1.85193706.

Thus,

lim_{n→∞} [ S_n(f, π/(2n)) − S_n(f, −π/(2n)) ] ≈ 3.70387412 ≈ 1.179 π.

The latter means that the amplitude of S_n(f, x) around 0 is about 1.179 times the jump of f at 0. This is the Gibbs phenomenon, as seen in Figure 2.3.
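The overshoot can be checked numerically. The short sketch below (ours) evaluates the partial sum S_n(f, π/(2n)) for a large n and recovers ∫_0^π (sin t / t) dt ≈ 1.8519:

```python
import math

def S(n, x):
    # partial Fourier sum S_n(f, x) = 2 * sum_{k=1}^n sin((2k-1)x) / (2k-1)
    return 2 * sum(math.sin((2 * k - 1) * x) / (2 * k - 1) for k in range(1, n + 1))

n = 2000
peak = S(n, math.pi / (2 * n))          # height of the first overshoot near 0
assert abs(peak - 1.85193706) < 1e-4
# the jump of f at 0 is pi, so the relative overshoot is about 1.179
assert abs(2 * peak / math.pi - 1.179) < 1e-3
```

The overshoot does not shrink as n grows; it only narrows, which is precisely why localized bases such as wavelets are attractive.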
Finally, another shortcoming of Fourier analysis worth mentioning is the problem of convergence. Indeed, in 1873, Paul du Bois-Reymond constructed a continuous 2π-periodic function whose Fourier series diverges at a given point.

Therefore, there is a natural need for an orthogonal system for which the local behavior of a function can be recognized from its coefficients, for which the Gibbs phenomenon can be avoided or at least dealt with, and for which the phenomenon discovered by du Bois-Reymond cannot happen. Wavelets provide an answer to these concerns.
Definition 2.2.1. A function ψ is called a wavelet function if {2^{j/2} ψ(2^j x − k)}_{j,k∈Z} is an orthonormal basis of L²(R).

Therefore, any function f in L²(R) can be uniquely represented as

f(x) = Σ_{j∈Z} Σ_{k∈Z} C_{jk} ψ_{jk}(x).  (2.2.1)

Note that (2.2.1) is called the homogeneous wavelet expansion of the function f.

Figure 2.3: Gibbs Phenomenon

There also exists an inhomogeneous wavelet expansion for f, defined as
f(x) = Σ_{k∈Z} C_{0k} φ_{0k}(x) + Σ_{j=0}^∞ Σ_{k∈Z} C_{jk} ψ_{jk}(x),  (2.2.2)

where φ and ψ are respectively called the "father wavelet" (or scaling function) and the "mother wavelet". Moreover, the mother wavelet can be obtained from the father wavelet through the relation ψ(x) = √2 Σ_{k∈Z} g_k φ(2x − k), where the g_k are carefully chosen coefficients. There are many examples of scaling functions.
2.2.2 Examples of wavelets

1. Haar Wavelet

It is the very first wavelet constructed. Its father wavelet and mother wavelet are respectively defined as

φ(x) = { 1,  x ∈ [0, 1)
       { 0,  otherwise

ψ(x) = { 1,   x ∈ [0, 1/2)
       { −1,  x ∈ [1/2, 1)
       { 0,   otherwise.

The functions ψ_{jk}(x) = 2^{j/2} ψ(2^j x − k), j, k ∈ Z, are called "daughter wavelets" and are compactly supported on the dyadic intervals [k 2^{−j}, (k + 1) 2^{−j}]. The left and right panels in Figure 2.4 respectively depict φ and ψ for the Haar wavelet system.
Figure 2.4: The Haar Wavelet
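A minimal sketch of the Haar system (ours, not part of the dissertation), with Riemann-sum checks of the defining properties — zero mean of ψ, unit norm of each ψ_{jk}, and orthogonality of daughters with disjoint supports:

```python
def phi(x):
    # Haar father wavelet (scaling function)
    return 1.0 if 0 <= x < 1 else 0.0

def psi(x):
    # Haar mother wavelet
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def psi_jk(j, k, x):
    # daughter wavelet, supported on the dyadic interval [k 2^-j, (k+1) 2^-j]
    return 2 ** (j / 2) * psi(2 ** j * x - k)

# midpoint Riemann sums on a fine grid over [-2, 2]
N = 100000
h = 4.0 / N
xs = [-2 + (i + 0.5) * h for i in range(N)]
assert abs(sum(psi(x) for x in xs) * h) < 1e-6                              # zero mean
assert abs(sum(psi_jk(2, 1, x) ** 2 for x in xs) * h - 1.0) < 1e-3          # unit norm
assert abs(sum(psi_jk(2, 1, x) * psi_jk(2, 2, x) for x in xs) * h) < 1e-6   # orthogonality
```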
2. Shannon Wavelet

Consider the space V_0^{Sh} = {f ∈ L²(R) : Support(Ff) ⊆ [−π, π]}, where Ff denotes the Fourier transform of f. Then, for all f ∈ V_0^{Sh},

f(x) = Σ_{k∈Z} f(k) sin(π(x − k)) / (π(x − k)).  (2.2.3)

This result is also known as the Sampling Theorem (see Hong et al. (2005)), since the function f can be recovered from its sample values f(k). It follows that the Shannon father wavelet and mother wavelet are respectively given by

φ(x) = sin(πx)/(πx),  ψ(x) = (sin(π(x − 1/2)) − sin(2π(x − 1/2))) / (π(x − 1/2)).

These are shown in Figure 2.5.
Figure 2.5: The Shannon Wavelet
3. Mexican Hat Wavelet

This is by far the most used wavelet in practice, owing to its simplicity. Its name comes from the resemblance of the graph of its mother wavelet to a Mexican hat. The father wavelet and mother wavelet are given respectively by

φ(x) = e^{−x²},  ψ(x) = (2/√3) π^{−1/4} (1 − x²) e^{−x²/2}.

These are shown in Figure 2.6.
Figure 2.6: The Mexican Hat Wavelet
4. Daubechies Wavelet

Ingrid Daubechies (1992) was the first to introduce continuous compactly supported wavelets. Indeed, she proved that there exists a scaling function φ ∈ L²(R) such that

φ(x) = √2 Σ_{k∈Z} h_k φ(2x − k),

where {h_k}_{k∈Z} is a sequence of real numbers such that

h_k = √2 ∫ φ(x) φ̄(2x − k) dx,  Σ_{k∈Z} |h_k|² < ∞,

where φ̄ is the complex conjugate of φ. Daubechies wavelets are classified as DN for N = 2, ..., 20, where N is even and represents the number of coefficients, and N/2 the number of vanishing moments. Vanishing moments describe the ability of the wavelet to encode a polynomial function or a signal. For example, D2 has one vanishing moment, so it is good for encoding constant signals. Note that D2 coincides with the Haar wavelet. It is actually the only explicit Daubechies wavelet, since the others do not have scaling functions that can be expressed in closed form. Daubechies wavelets can also be defined on any interval [a, b]; more information can be found in Andersson et al. (1994) and Cohen et al. (1993).

These are shown in Figure 2.7.
Figure 2.7: The Daubechies Wavelet
We end this section by noting one key difference between wavelet approximation and Fourier approximation: the Fourier approximation uses a single function (called a window function) that is translated over the interval of definition, whereas its wavelet counterpart uses a function that is both translated and dilated so as to adapt itself to the local properties of the function.
Chapter 3
Estimation of the Score Function
3.1 Introduction
In this chapter, we introduce the problem to be solved and put it into its historical
context.
Consider the linear model

Y_i = α + x_i^T β + e_i,  1 ≤ i ≤ n,  (3.1.1)

where e_1, ..., e_n are independent random variables with distribution function F and density f.

It is known that the least squares estimate β̂_LS of β is the one that minimizes the squared distance ||Y − Xβ||²_{R^n}, but it fails to be robust, in the sense that it is very sensitive to departures from normality and to perturbations. Robust alternatives to the least squares method include M-estimation and Z-estimation.
M-estimation consists of finding an estimate β̂_M of β that maximizes (hence the term M-estimation) a criterion function M_n(β) = (1/n) Σ_{i=1}^n m_β(X_i), where the m_β : X → R are known functions. Z-estimation consists of finding an estimate β̂_Z of β that almost maximizes the criterion function M_n(β) or is one of its near zeros (hence the term Z-estimation). Note that popular choices for the criterion function include the so-called Huber function and the biweight function, given respectively by

ρ_Huber(u) = { u²/2,         |u| ≤ k
             { k|u| − k²/2,  |u| > k,

ρ_biweight(u) = { (k²/6)[1 − (1 − (u/k)²)³],  |u| ≤ k
                { k²/6,                       |u| > k,

with k ∈ R.
Although the aforementioned methods solve the problem of robustness, they have their own shortcomings. First, it may be hard to find the zeros of M_n'(β). Second, the existence of zeros near the boundary of the parameter set may make the estimation problem ill-posed. Third, consistency and uniqueness are not guaranteed.
Consider the estimator β̂_R of β ∈ R^p that minimizes the dispersion function

D_h(β) = Σ_{i=1}^n a(R(Y_i − x_i^T β))(Y_i − x_i^T β),  (3.1.2)

where R(y_i − x_i^T β) is the rank of y_i − x_i^T β among y_1 − x_1^T β, ..., y_n − x_n^T β, and a(1) ≤ ... ≤ a(n) are scores. The scores are usually chosen as a(j) = h(j/(n + 1)), where h : (0, 1) → R is a nondecreasing score function.
Definition 3.1.1. The score function is the gradient, with respect to some parameter θ, of the logarithm of the likelihood function; that is, h(u) = (∂/∂θ) log f(θ, u).

Examples of score functions

1. Sign score function: h(u) = sign(u − 1/2).
2. Logistic score function: h(u) = 2u − 1.
3. Normal score function: h(u) = Φ^{−1}(u), where Φ is the standard normal distribution function.
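The three score functions above are easy to generate in code. A small sketch (ours), using the standard library's `statistics.NormalDist` for Φ^{−1}; the resulting scores a(i) = h(i/(n + 1)) are nondecreasing and sum to (approximately) zero, as required:

```python
from statistics import NormalDist

def sign_score(u):
    return -1.0 if u < 0.5 else (0.0 if u == 0.5 else 1.0)

def logistic_score(u):
    return 2 * u - 1

def normal_score(u):
    # h(u) = Phi^{-1}(u), the standard normal quantile function
    return NormalDist().inv_cdf(u)

def make_scores(h, n):
    # a(i) = h(i / (n + 1)), i = 1, ..., n
    return [h(i / (n + 1)) for i in range(1, n + 1)]

n = 99
for h in (sign_score, logistic_score, normal_score):
    a = make_scores(h, n)
    assert all(a[i] <= a[i + 1] for i in range(n - 1))   # nondecreasing
    assert abs(sum(a)) < 1e-9                            # centered at zero
```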
Remark 3.1.2. The mean of the score function given a parameter θ is zero, that is, E(h|θ) = 0. The variance of the score function, called the Fisher information I(f), is given by

I(f) = ∫_{−∞}^{∞} (f'(u)/f(u))² f(u) du.

Proof.

E(h|θ) = ∫_{−∞}^{∞} (∂ ln f(u, θ)/∂θ) f(u, θ) du
       = ∫_{−∞}^{∞} ∂f(u, θ)/∂θ du
       = (∂/∂θ) ∫_{−∞}^{∞} f(u, θ) du
       = ∂1/∂θ = 0.
Definition 3.1.3. Given a score function h, we define the scale parameter as τ_h = 1/γ, where

γ = ∫_0^1 h(u) h_F(u) du  and  h_F(u) = −f'(F^{−1}(u)) / f(F^{−1}(u)).
Suppose that an estimate T_{n_ν} of T(θ), based on n_ν observations, is such that, as ν → ∞,

√(n_ν) (T_{n_ν} − T(θ)) → N(0, σ²(θ)).  (3.1.3)

Definition 3.1.4. Given two estimators T_{n_ν,1} and T_{n_ν,2} of T(θ), let n_{ν,1}, n_{ν,2} be the numbers of observations needed to meet (3.1.3). Then the (Pitman) Asymptotic Relative Efficiency (ARE) of the two estimators is defined as

ARE(T_{n_ν,2}, T_{n_ν,1}) = lim_{ν→∞} n_{ν,1}/n_{ν,2} = σ_1²(θ)/σ_2²(θ).  (3.1.4)
It follows from the above definition that an estimator T_{n_ν,1} of T(θ) is said to be asymptotically as efficient as an estimator T_{n_ν,2} of T(θ) if ARE(T_{n_ν,2}, T_{n_ν,1}) → 1.

Theorem 3.1 (Asymptotic Relative Efficiency). If the score function h is such that h(u) = h_F(u), then the resulting R-estimate β̂_R of β in the linear model (3.1.1) is asymptotically as efficient as the least squares estimate β̂_LS.

Proof. Since var(β̂_LS) = σ²(X^T X)^{−1}, where σ² is the variance of the underlying normal distribution, and var(β̂_R) = τ_h²(X^T X)^{−1} by Theorem 2.1.7, we have

ARE(β̂_R, β̂_LS) = σ²/τ_h² = σ² ( ∫_0^1 h(u) h_F(u) du )²
               = σ² ( √Var(h) √Var(h_F) corr(h, h_F) )²
               = σ² I(f) ( corr(h, h_F) )²,

for standardized scores, since Var(h_F) = I(f). So optimality is obtained when corr(h, h_F) = 1, that is, when h(u) = h_F(u).
In the sequel, we assume the scores are chosen so that a(j) = h_F(j/(n + 1)), where

h_F(u) = −f'(F^{−1}(u)) / f(F^{−1}(u)).  (3.1.5)

In this case, the estimator β̂_R of β is asymptotically efficient; that is, it is as efficient as the least squares estimator β̂_LS. The choice of the least squares estimator for comparison is motivated by the Gauss-Markov theorem: it achieves the uniform minimum variance among all linear unbiased estimators.

Lemma 3.1.5. If the Fisher information I(f) is finite, then h_F ∈ L²(0, 1).
Proof.

I(f) = ∫_{−∞}^{∞} (f'(x)/f(x))² f(x) dx
     = ∫_0^1 ( f'(F^{−1}(u)) / f(F^{−1}(u)) )² du   (by the change of variable u = F(x))
     = ∫_0^1 h_F²(u) du.

Hence, I(f) < ∞ implies h_F ∈ L²(0, 1).
Therefore, under I(f) < ∞, there exist coefficients C_{jk} such that

h_F(t) = Σ_{j=−1}^∞ Σ_{k∈Z} C_{jk} ψ_{jk}(t),  (3.1.6)

where {ψ_{jk}}_{j,k∈Z} is an orthonormal system in L²(0, 1) with ψ_{jk}(t) = 2^{j/2} ψ(2^j t − k) for some function ψ, and

C_{jk} = ∫_0^1 h_F(s) ψ_{jk}(s) ds.  (3.1.7)

An asymptotically efficient estimate of the coefficients C_{jk} will yield an asymptotically efficient estimate of h_F.
A common approach in rank regression is to fix the score function a priori on the basis of robustness or simplicity considerations. However, for efficient results, a good approximation of h_F based on an approximate knowledge of F from a sample is of some value. To that end, Van Eeden (1970) proposed an asymptotically efficient estimate of location parameters using an estimate of h_F based on a subset of the data. Dionne (1981) used a similar subset-based technique to develop estimators of linear model parameters. Beran (1974), for the location model, and Naranjo and McKean (1997), for the linear model, provided Fourier series estimators of h_F based on the whole sample. A different approach, which uses density estimation and is based on quantile regression, was proposed by Koenker and Bassett (1978).
The estimator proposed in this dissertation also uses the whole sample to estimate $h_F$. Our approach differs from that of Naranjo and McKean (1997) in that:
1. we develop estimates of $h_F$ that provide asymptotically efficient estimators $\widehat{\beta}_R$ based on a large class of orthonormal bases of $L^2(0,1)$,
2. we develop estimates based on second order approximations (Beran (1974) and Naranjo and McKean (1997) used first order approximations),
3. we eliminate restrictive assumptions on the data, such as those in assumption (A6) of Naranjo and McKean (1997), by using second order approximations, and
4. we provide a consistent, wavelet-based estimator of the asymptotic variance of the estimator $\widehat{\beta}_R$.
Zygmund (1945) pointed out that second differences of functions are much more useful than first differences in estimating smooth functions. This motivates our use of second order approximations. Also, the use of the second derivative gives expressions for the coefficients that are easier to manipulate than the ones in Naranjo and McKean (1997). This allows us to avoid making restrictive distributional assumptions such as assumption (A6) of Naranjo and McKean (1997), which requires that the derivative $(\phi(F))' \circ F^{-1}$ be bounded, where $\phi(t) = \exp(-2\pi i k t)$ and $k$ is an integer. That assumption excludes a wide range of distributions, including the normal and the logistic.
In the following, we provide an asymptotically efficient estimate of the score function. We begin by laying out the assumptions and discussing their consequences.
3.2 General Assumptions

We will assume without loss of generality that $\alpha = 0$ and $\beta = 0$, in view of Remark 2.1.6, and that the $x_i$'s are centered to have mean 0 in (3.1.1). We assume the following conditions:

(H1) $\psi$ has compact support in $(0,1)$ and is three times differentiable with bounded derivatives.

(H2) $\dfrac{1}{\sqrt{n}}\max_{1\le i\le n}\|x_i\|_p = o(1)$.

(H3) $\dfrac{1}{n}\displaystyle\sum_{i=1}^n \|x_i\|_p^2 = O(1)$.

(H4) $f$ is absolutely continuous with $I(f) < \infty$ and $\dfrac{f'}{f}$ monotone.

(H5) There exists a sequence $\{\widehat{\beta}_n\}$ in $\mathbb{R}^p$ such that $\sqrt{n}\,\widehat{\beta}_n = O_p(1)$.

(H6) $\lim_{n\to\infty} n^{-1}X^TX = \Sigma$ for some positive definite matrix $\Sigma$.
3.2.1 Discussion of the assumptions

(H1) assumes that $\psi$ is the mother wavelet of a wavelet system with compact support. There exist many systems of wavelets satisfying (H1), among which are the Daubechies wavelets, Coiflets, and Symlets. This assumption implies that the "daughter" wavelets $\psi_{jk}$ satisfy
\[
\psi^{(l)}_{jk} = O(2^{2j}) \qquad \forall k\in\mathbb{N} \text{ and } l = 0,1,2,3. \tag{3.2.1}
\]
(H2) and (H3) guarantee that we can apply the Lindeberg-Feller Central Limit Theorem. To see how this is possible, let us recall that theorem.
Theorem 3.2.1 (Lindeberg-Feller Central Limit Theorem). Let $(\Omega,\mathcal{F},P)$ be a probability space and let $x_i : \Omega\to\mathbb{R}$, $i\in\mathbb{N}$, be independent random variables defined on that probability space. Assume that $E(x_i) = \mu_i$ and $\operatorname{Var}(x_i)$ exist and are finite. Let $S_n^2 = \sum_{i=1}^n \operatorname{Var}(x_i)$. If
\[
\lim_{n\to\infty}\frac{1}{S_n^2}\sum_{i=1}^n\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP = 0 \qquad \forall\,\epsilon > 0,
\]
then $Z_n = \dfrac{\sum_{i=1}^n (x_i-\mu_i)}{S_n}$ converges in distribution to the standard normal distribution as $n\to\infty$.
Indeed, for any $\epsilon > 0$, we have
\[
\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP \le \max_{1\le i\le n}|x_i-\mu_i|^2\int_{\{|x_i-\mu_i| > \epsilon S_n\}}dP.
\]
Applying the Chebyshev inequality to the integral on the right, we have
\[
\int_{\{|x_i-\mu_i| > \epsilon S_n\}}dP \le \frac{\operatorname{Var}(x_i)}{\epsilon^2 S_n^2}.
\]
Hence, applying the latter to the Lebesgue integral over $\mathbb{R}^p$ and taking the $x_i$'s to be centered, we have
\[
\frac{1}{S_n^2}\sum_{i=1}^n\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP \le \frac{1}{\epsilon^2}\,\frac{\max_{1\le i\le n}\|x_i\|_p^2}{\sum_{i=1}^n\|x_i\|_p^2} = o(1)
\]
by (H2) and (H3). Thus, we have the Lindeberg condition
\[
\lim_{n\to\infty}\frac{1}{S_n^2}\sum_{i=1}^n\int_{\{|x_i-\mu_i| > \epsilon S_n\}}(x_i-\mu_i)^2\,dP = 0 \qquad \forall\,\epsilon > 0.
\]
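The key ratio in the bound above can be checked numerically. The following sketch (ours, not the dissertation's code) draws a random design and verifies that $\max_i \|x_i\|^2 / \sum_i \|x_i\|^2$, which controls the Lindeberg sum under (H2)-(H3), shrinks as $n$ grows:

```python
import numpy as np

# For a well-behaved design, the ratio max_i ||x_i||^2 / sum_i ||x_i||^2
# bounding the Lindeberg sum goes to 0 as n grows; this is what (H2)-(H3)
# require.  The Gaussian design here is an illustrative assumption.
rng = np.random.default_rng(0)

def lindeberg_ratio(n, p=3):
    x = rng.standard_normal((n, p))      # rows x_i of the design matrix
    norms2 = np.sum(x ** 2, axis=1)
    return norms2.max() / norms2.sum()

r_small, r_large = lindeberg_ratio(100), lindeberg_ratio(100_000)
```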
For practical applications, this assumption means that the contribution of any individual random variable $x_i$, $1\le i\le n$, to the variance $S_n^2$ is arbitrarily small for sufficiently large values of $n$.
Assumption (H4) implies that $f$ is uniformly bounded, uniformly continuous, and square integrable on $\mathbb{R}$.
Uniform boundedness.
\[
f(x) = |f(x)| = \left|\int_{-\infty}^{x} f'(t)\,dt\right| \quad\text{by absolute continuity}
\]
\[
= \left|\int_{-\infty}^{x}\frac{f'(t)}{\sqrt{f(t)}}\,\sqrt{f(t)}\,dt\right|
\le \sqrt{\int_{-\infty}^{x}\left(\frac{f'(t)}{\sqrt{f(t)}}\right)^2 dt}\ \sqrt{\int_{-\infty}^{x} f(t)\,dt} \quad\text{by the Cauchy-Schwarz inequality}
\]
\[
\le \sqrt{\int_{-\infty}^{\infty}\left(\frac{f'(t)}{\sqrt{f(t)}}\right)^2 dt} = \sqrt{I(f)}.
\]
Uniform continuity.
\[
|f(x) - f(y)| = \left|\int_{x}^{y} f'(t)\,dt\right|
\le \sqrt{\int_{x}^{y}\left(\frac{f'(t)}{\sqrt{f(t)}}\right)^2 dt}\ \sqrt{\int_{x}^{y} f(t)\,dt}
\le \sqrt{I(f)}\,\sqrt{|F(x)-F(y)|}
\]
\[
\le \sqrt{I(f)}\,\sqrt{f(\xi)\,|x-y|} \quad\text{by the Mean Value Theorem, where } \xi \text{ is between } x \text{ and } y,
\]
\[
\le I(f)^{3/4}\sqrt{|x-y|} \quad\text{by uniform boundedness.}
\]
Thus for any $\epsilon > 0$,
\[
|x-y| < \frac{\epsilon^2}{[I(f)]^{3/2}} \;\Longrightarrow\; |f(x)-f(y)| < \epsilon.
\]
Square integrability. Using $f(x) = \int_{-\infty}^{x} f'(t)\,dt$ and the uniform bound $f \le \sqrt{I(f)}$ obtained above,
\[
\int_{-\infty}^{\infty} f^2(x)\,dx = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{x} f'(t)\,dt\right) f(x)\,dx
\le \sup_{x}\left|\int_{-\infty}^{x} f'(t)\,dt\right|\int_{-\infty}^{\infty} f(x)\,dx \le \sqrt{I(f)}.
\]
Thus
\[
\int_{-\infty}^{\infty} f^2(x)\,dx \le \sqrt{I(f)} < \infty.
\]
Remark 3.2.2. Even if a function $f$ is uniformly bounded, uniformly continuous, and positive almost everywhere, it is not guaranteed that $I(f)$ is finite. An example is given by the function
\[
f(x) =
\begin{cases}
\dfrac{1-x}{2}, & 0 \le x \le 1 \\[4pt]
\dfrac{x-2j+1}{2^{j+2}}, & 2j-1 \le x \le 2j \\[4pt]
\dfrac{2j+1-x}{2^{j+2}}, & 2j \le x \le 2j+1,\ j \ge 1 \\[4pt]
f(-x), & x \le 0.
\end{cases}
\]
Note that there are numerous estimators satisfying assumption (H5), including the least squares estimator and the general rank estimator with a specified score function $h$; see Jureckova (1971) and Jaeckel (1972).
We end this discussion by noting that (H6) requires the design matrix $X$ to be such that the column sums of squares go to infinity at the same rate.
3.3 Estimation of the coefficients

The following lemma provides an alternative representation of the orthonormal basis coefficients $C_{jk}$ in the expansion of $h_F$.

Lemma 3.3.1. Assume that (H1) and (H4) hold. Then
\[
\int_0^1 h_F(t)\,\psi(t)\,dt = -\int_{-\infty}^{\infty}\frac{d^2}{dz^2}\Big[\psi\big(F(z)\big)\Big]\,F(z)\,dz. \tag{3.3.1}
\]
Proof. Eq. (5) of Naranjo and McKean (1997) gives
\[
\int_0^1 h_F(t)\,\psi(t)\,dt = \int_{-\infty}^{\infty}\frac{d}{dz}\big[\psi(F(z))\big]\,dF(z). \tag{3.3.2}
\]
Integrating the right-hand side of equation (3.3.2) by parts, we have
\[
\int_{-\infty}^{\infty}\frac{d}{dz}\big[\psi(F(z))\big]\,dF(z)
= \left[F(z)\,\frac{d}{dz}\big[\psi(F(z))\big]\right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty}\frac{d^2}{dz^2}\big[\psi(F(z))\big]\,F(z)\,dz.
\]
We can write
\[
\left[F(z)\,\frac{d}{dz}\big[\psi(F(z))\big]\right]_{-\infty}^{\infty}
= \lim_{z\to\infty}\big[F(z)f(z)\psi'(F(z)) - F(-z)f(-z)\psi'(F(-z))\big].
\]
But
\[
\lim_{z\to\infty}F(z) = 1, \qquad \lim_{z\to-\infty}F(z) = 0,
\]
and
\[
\lim_{z\to\infty}\psi'(F(z)) = \psi'(1) = 0 = \lim_{z\to-\infty}\psi'(F(z)) = \psi'(0), \quad\text{since } \psi \text{ is compactly supported in } (0,1).
\]
The Lemma follows.
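As a numerical sanity check (ours, not part of the dissertation), identity (3.3.1) can be verified for the logistic distribution, whose score function is $h_F(u) = 2u-1$, using a smooth compactly supported stand-in $\psi(t) = t^2(1-t)^2\sin(2\pi t)$ in place of an actual wavelet; it satisfies $\psi(0)=\psi(1)=\psi'(0)=\psi'(1)=0$ as required by the proof:

```python
import numpy as np

# psi is a smooth stand-in vanishing (with its derivative) at 0 and 1; it is
# not one of the wavelets used in the dissertation.
def psi(t):
    t = np.asarray(t, dtype=float)
    inside = (t > 0) & (t < 1)
    return np.where(inside, t**2 * (1 - t)**2 * np.sin(2 * np.pi * t), 0.0)

F = lambda z: 1.0 / (1.0 + np.exp(-z))       # logistic cdf

# Left-hand side: ∫ h_F(t) psi(t) dt with h_F(u) = 2u - 1, midpoint rule.
n = 100_000
u = (np.arange(n) + 0.5) / n
lhs = np.sum((2 * u - 1) * psi(u)) / n

# Right-hand side: -∫ (d^2/dz^2)[psi(F(z))] F(z) dz, with the second
# derivative taken by central differences.
m, dz, h = 30_000, 0.001, 1e-4
z = -15.0 + (np.arange(m) + 0.5) * dz
g2 = (psi(F(z + h)) - 2 * psi(F(z)) + psi(F(z - h))) / h**2
rhs = -np.sum(g2 * F(z)) * dz
```

Both sides agree to several decimal places, as the lemma predicts.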
Remark 3.3.2. If we replace assumption (H1) in Lemma 3.3.1 by the assumption that $f$ is absolutely continuous and $\int_{-\infty}^{\infty}|f'(x)|\,dx < \infty$, then (3.3.1) still holds. In fact, these two conditions ensure that $\lim_{z\to-\infty}f(z)$ and $\lim_{z\to\infty}f(z)$ exist and are both equal to zero. They are restrictive, though, since they require a well behaved source for the sample data. In comparison, we have numerous functions satisfying (H1), such as the Daubechies basis functions, Symlets, and Coiflets.
In the remainder of the dissertation, for $\beta\in\mathbb{R}^p$, we will let $F_n(\cdot\,;\beta)$ represent the empirical distribution function of $y_1 - x_1^T\beta,\ldots,y_n - x_n^T\beta$; that is,
\[
F_n(z;\beta) = \frac{1}{n}\sum_{i=1}^n I\big(y_i - x_i^T\beta \le z\big).
\]
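Computed directly, $F_n$ is just the empirical cdf of the residuals; a minimal sketch (variable names are ours):

```python
import numpy as np

# Empirical distribution function F_n(z; beta) of the residuals
# y_i - x_i^T beta, evaluated at a point z.
def F_n(z, y, X, beta):
    resid = y - X @ beta
    return np.mean(resid <= z)
```

For example, with residuals $(-1, 0, 1, 2)$, `F_n(0.5, ...)` returns the fraction of residuals at or below 0.5.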
The following lemma is a combination of Lemma 1 and Lemma 2 of Naranjo and McKean (1997) and gives the asymptotic linearity of $F_n(z;\beta_n)$ for $\beta_n$ converging to $0$ at a suitable rate. The proof follows from Section 2.3 of Koul (1992) and is given in the appendices.
Lemma 3.3.3. Assume (H1)-(H5). Then
\[
\sup_{z\in\mathbb{R}}\sqrt{n}\,\big|F_n(z;\widehat{\beta}_n) - F(z)\big| = O_p(1).
\]
Characterizations of orthonormal basis systems of $L^2(0,1)$ can be found in Meyer (1991), Cohen et al. (1993), and Andersson et al. (1994). If the scaling function $\varphi$ satisfies the conditions given in Theorem 9.6 of Härdle et al. (1998) (for example, certain Daubechies wavelets), then $h_F$ belongs to the Besov space $B^s_{2q}(\mathbb{R})$. Besov spaces can be characterized using wavelet coefficients; thus, they are the natural spaces for wavelet estimation of functions. Moreover, in some Besov spaces, wavelet coefficients decay faster than Fourier coefficients. For instance, it is shown in Zygmund (2002) that if a function belongs to the Zygmund space $B^1_{11}(0,1)$, then its Fourier coefficients $C_n$ are $O(n^{-1})$. It was proved in Meyer (1990) that the wavelet coefficients $W_{jk}$ of such a function are $O(2^{-3j/2})$.
We now start the estimation process for the score function $h_F$. The strategy consists of first estimating the coefficients $C_{jk}$ in the expansion (3.1.6) of $h_F$ by some $\widehat{C}_{njk}$ and then using these to provide an estimate $\widehat{h}_{nF}$ of $h_F$, for some fixed $n$.

Let $\{\delta_n\}_{n\in\mathbb{N}}$ and $\{M_n\}_{n\in\mathbb{N}}$ be sequences of real numbers such that $M_n = O(n^{\gamma})$, $0 < \gamma < 1/4$, and $M_n/(\sqrt{n}\,\delta_n^2)\to 0$, $M_n\delta_n^2\to 0$ as $n\to\infty$. This means that $\delta_n = O(n^{\eta})$, where $\eta = \gamma/2 - 1/4$ and $-1/4 < \eta < 0$. Given a scaling function $\varphi$ with corresponding "mother wavelet" $\psi$ and a $\sqrt{n}$-consistent estimator $\widehat{\beta}_n$ of $\beta$, equation (3.3.1) in Lemma 3.3.1, using second differences, suggests the estimator
\[
\widehat{C}_{njk} := \frac{1}{\delta_n^2}\int_{-\infty}^{\infty}\Big[2\,\psi_{jk}\big(F_n(z;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(z+\delta_n;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(z-\delta_n;\widehat{\beta}_n)\big)\Big]\,F_n(z;\widehat{\beta}_n)\,dz
\]
of $C_{jk} = \int_0^1 h_F(s)\,\psi_{jk}(s)\,ds$.
Remark 3.3.4. Note that $\widehat{C}_{njk}$ can be computed from the data as
\[
\widehat{C}_{njk} = \frac{2}{n\,\delta_n}\sum_{i=1}^n i\,\Big[2\,\psi_{jk}\big(F_n(e_{(i)};\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_{(i)}+\delta_n;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_{(i)}-\delta_n;\widehat{\beta}_n)\big)\Big], \tag{3.3.3}
\]
where $e_i = y_i - x_i^T\widehat{\beta}_n$ and $e_{(i)}$ denotes the $i$th order statistic of the residuals.
Proof. In fact, let $W(z) = \int_a^z F_n(s;\widehat{\beta}_n)\,ds$, where $a$ is a zero of $W$. Then
\[
\int_{-\infty}^{\infty}\frac{d^2}{dz^2}\big[\psi_{jk}(F(z))\big]F(z)\,dz
= \sum_{i=1}^n\int_{e_i-\delta_n}^{e_i+\delta_n}\frac{d^2}{dz^2}\big[\psi_{jk}\big(F_n(z;\widehat{\beta}_n)\big)\big]\,dW(z). \tag{3.3.4}
\]
For $\delta_n$ small enough that the only $e_j$ between $e_i-\delta_n$ and $e_i+\delta_n$ is $e_i$ itself, an approximation of the opposite of the $i$th integral on the right-hand side of equation (3.3.4) is
\[
-\frac{\psi_{jk}\big(F_n(e_i+\delta_n;\widehat{\beta}_n)\big) - 2\,\psi_{jk}\big(F_n(e_i;\widehat{\beta}_n)\big) + \psi_{jk}\big(F_n(e_i-\delta_n;\widehat{\beta}_n)\big)}{\delta_n^2}\,\big[W(e_i+\delta_n) - W(e_i-\delta_n)\big]
\]
\[
= -\frac{\psi_{jk}\big(F_n(e_i+\delta_n;\widehat{\beta}_n)\big) - 2\,\psi_{jk}\big(F_n(e_i;\widehat{\beta}_n)\big) + \psi_{jk}\big(F_n(e_i-\delta_n;\widehat{\beta}_n)\big)}{\delta_n^2}\left[\int_{e_i-\delta_n}^{e_i+\delta_n}F_n(z;\widehat{\beta}_n)\,dz\right]
\]
\[
= \frac{2\delta_n}{\delta_n^2}\Big[2\,\psi_{jk}\big(F_n(e_i;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_i+\delta_n;\widehat{\beta}_n)\big) - \psi_{jk}\big(F_n(e_i-\delta_n;\widehat{\beta}_n)\big)\Big]\,\frac{1}{n}\sum_{j=1}^n I(e_j\le\xi_i),
\]
where $\xi_i\in[e_i-\delta_n,\,e_i+\delta_n]$. Since we can replace the $e_i$ by their order statistics, the form of $\widehat{C}_{njk}$ proposed in equation (3.3.3) follows.
The following lemma establishes the consistency of $\widehat{C}_{njk}$.

Lemma 3.3.5. Suppose that (H1)-(H5) are satisfied. Then $|M_n(\widehat{C}_{njk} - C_{jk})| = o_p(1)$.
Proof. Define $\widetilde{C}_{njk} = -\displaystyle\int_{-\infty}^{\infty}\frac{d^2}{dz^2}\big[\psi_{jk}(F(z))\big]\,F_n(z;\widehat{\beta}_n)\,dz$. Now
\[
\widetilde{C}_{njk} = \frac{1}{\delta_n^2}\int_{-\infty}^{\infty}\big[2\,\psi_{jk}(F(z)) - \psi_{jk}(F(z+\delta_n)) - \psi_{jk}(F(z-\delta_n))\big]\,F_n(z;\widehat{\beta}_n)\,dz + O(\delta_n^2).
\]
Taking the difference $\widehat{C}_{njk} - \widetilde{C}_{njk}$ and expanding $\psi_{jk}(F_n)$ about $\psi_{jk}(F)$, we have
\[
M_n(\widehat{C}_{njk} - \widetilde{C}_{njk}) = \frac{2M_n}{\delta_n^2\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z;\widehat{\beta}_n) - F(z)\big)\big]\,\psi'_{jk}(\xi_{1,n}(z))\,F_n(z;\widehat{\beta}_n)\,dz
\]
\[
\quad - \frac{M_n}{\delta_n^2\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z+\delta_n;\widehat{\beta}_n) - F(z+\delta_n)\big)\big]\,\psi'_{jk}(\xi_{2,n}(z))\,F_n(z;\widehat{\beta}_n)\,dz
\]
\[
\quad - \frac{M_n}{\delta_n^2\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z-\delta_n;\widehat{\beta}_n) - F(z-\delta_n)\big)\big]\,\psi'_{jk}(\xi_{3,n}(z))\,F_n(z;\widehat{\beta}_n)\,dz + O(M_n\delta_n^2),
\]
where $\xi_{1,n}(z)$ is between $F_n(z;\widehat{\beta}_n)$ and $F(z)$, $\xi_{2,n}(z)$ is between $F_n(z+\delta_n;\widehat{\beta}_n)$ and $F(z+\delta_n)$, and $\xi_{3,n}(z)$ is between $F_n(z-\delta_n;\widehat{\beta}_n)$ and $F(z-\delta_n)$. Since $\psi'_{jk}$ and $F_n$ are bounded with respect to $n$ and $\sup_z\big|\sqrt{n}(F_n(z;\widehat{\beta}_n) - F(z))\big| = O_p(1)$, we have
\[
\big|M_n(\widehat{C}_{njk} - \widetilde{C}_{njk})\big| = O_p\big(M_n/(\delta_n^2\sqrt{n})\big) + O(M_n\delta_n^2). \tag{3.3.5}
\]
On the other hand,
\[
M_n(C_{jk} - \widetilde{C}_{njk}) = M_n\left[-\int_{-\infty}^{\infty}(\psi_{jk}(F))''(z)\,F(z)\,dz + \int_{-\infty}^{\infty}(\psi_{jk}(F))''(z)\,F_n(z;\widehat{\beta}_n)\,dz\right]
\]
\[
\quad + \frac{M_n}{\delta_n^2}\int_{-\infty}^{\infty}\frac{\delta_n^3}{6}\big[(\psi_{jk}(F))'''(\eta_1(z)) - (\psi_{jk}(F))'''(\eta_2(z))\big]\,F_n(z;\widehat{\beta}_n)\,dz + O(M_n\delta_n^2)
\]
\[
= M_n\int_{-\infty}^{\infty}\big[F_n(z;\widehat{\beta}_n) - F(z)\big]\,(\psi_{jk}(F))''(z)\,dz + O_p(M_n\delta_n) + O(M_n\delta_n^2),
\]
where $\eta_1(z)\in(z-\delta_n, z)$ and $\eta_2(z)\in(z, z+\delta_n)$.
Thus we have
\[
M_n(C_{jk} - \widetilde{C}_{njk}) = \frac{M_n}{\sqrt{n}}\int_{-\infty}^{\infty}\big[\sqrt{n}\big(F_n(z;\widehat{\beta}_n) - F(z)\big)\big]\,(\psi_{jk}(F))''(z)\,dz + O_p(M_n\delta_n) + O(M_n\delta_n^2).
\]
But
\[
\int_{-\infty}^{\infty}(\psi_{jk}(F))''(z)\,dz = \int_{-\infty}^{\infty}\psi''_{jk}(F(z))\,f^2(z)\,dz + \int_{-\infty}^{\infty}\psi'_{jk}(F(z))\,f'(z)\,dz.
\]
The two integrals on the right are bounded since $f$ is absolutely continuous, $f\in L^2(\mathbb{R})$, and $\psi_{jk}$ has bounded derivatives. Therefore we have
\[
\big|M_n(C_{jk} - \widetilde{C}_{njk})\big| = O_p(M_n/\sqrt{n}) + O_p(M_n\delta_n) + O(M_n\delta_n^2). \tag{3.3.6}
\]
Equations (3.3.5) and (3.3.6) imply that
\[
\big|M_n(C_{jk} - \widehat{C}_{njk})\big| = O_p\big(M_n/(\delta_n^2\sqrt{n})\big) + O_p(M_n/\sqrt{n}) + O(M_n\delta_n^2) = o_p(1). \tag{3.3.7}
\]
Remark 3.3.6. In view of (3.2.1), this means that there is a constant $L > 0$ such that
\[
\big|\widehat{C}_{njk} - C_{jk}\big| \le \frac{L\,2^{2j}}{M_n}, \qquad \forall k\in\mathbb{N}.
\]
3.4 Estimation of the score function

Now define the wavelet estimator of $h_F$ as
\[
\widehat{h}_{nF}(t) = \sum_{j=0}^{j_1}\sum_{k}\widehat{C}_{njk}\,\psi_{jk}(t),
\]
where $j_1$ is some chosen resolution level in $\mathbb{N}\cup\{0\}$. Note that since we are using compactly supported wavelets, the sum over $k$ contains only a finite number of terms for a given value of $t$ (see Remark 10.1 on p. 127 of Härdle et al. (1998)).
Theorem 3.4.1. Under (H1)-(H5), we have $E\|h_F - \widehat{h}_{nF}\|_2^2 = o(1)$.
Proof. Write
\[
h_F(t) = \sum_{j=0}^{j_1}\sum_{k}C_{jk}\,\psi_{jk}(t) + \sum_{j>j_1}\sum_{k}C_{jk}\,\psi_{jk}(t),
\]
where the convergence is absolute in $L^2(0,1)$. Thus
\[
E\|h_F - \widehat{h}_{nF}\|_2^2 \le 2\,E\left(\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt\right) + 2\,E\left(\int_0^1\Big(\sum_{j>j_1}\sum_{k}C_{jk}\,\psi_{jk}(t)\Big)^2 dt\right).
\]
The second term on the right is $o(1)$ by absolute convergence in $L^2(0,1)$.
For the first term, we have, by the Cauchy-Schwarz inequality,
\[
\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt \le (j_1+1)\int_0^1\sum_{j=0}^{j_1}\Big(\sum_{k}|C_{jk} - \widehat{C}_{njk}|\,|\psi_{jk}(t)|\Big)^2 dt.
\]
But there is a positive constant $L$ such that $|C_{jk} - \widehat{C}_{njk}| \le L\,2^{2j}/M_n$ (see Remark 3.3.6). Thus
\[
\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt \le \frac{L^2(j_1+1)}{M_n^2}\sum_{j=0}^{j_1}2^{4j}\int_0^1\Big(\sum_{k}|\psi_{jk}(t)|\Big)^2 dt.
\]
Since, by Proposition 8.3 of Härdle et al. (1998), the integral on the right-hand side is uniformly bounded in $j$, there is a constant $C > 0$ such that
\[
\int_0^1\Big(\sum_{j=0}^{j_1}\sum_{k}(C_{jk} - \widehat{C}_{njk})\,\psi_{jk}(t)\Big)^2 dt \le \frac{C}{M_n^2}(j_1+1)\sum_{j=0}^{j_1}2^{4j} \le \frac{L'}{M_n^2}(j_1+1)\,2^{4j_1+1}
\]
for some constant $L' > 0$.
Choosing $j_1$ such that $(j_1+1)\,2^{4j_1+1} = o(M_n^2)$ makes the first term $o(1)$ as well, which completes the proof.

For $\epsilon > 0$, consider
\[
P\left(\int_0^1\big(\widehat{h}_{nF}(t) - h_F(t)\big)^2\,dt > \epsilon\right),
\]
which is bounded by $\frac{1}{\epsilon}\,E\|h_F - \widehat{h}_{nF}\|_2^2$ by Markov's inequality. The desired result follows from Theorem 3.4.1.

One may estimate $\widehat{\tau}_{nh}^{-1}$ using numerical integration methods (such as Gaussian quadrature) on some grid in $(0,1)$, since $\widehat{h}_{nF}(t)$ can be computed for any given $t\in(0,1)$.
As one application, consider testing the general linear hypothesis
\[
H_0 : M\beta = 0 \quad\text{versus}\quad H_1 : M\beta \ne 0,
\]
where $M$ is a $q\times p$ matrix of full row rank forming the linear constraints. Under $H_0$, the quantity
\[
B_M = \frac{\big(M\widehat{\beta}_R\big)^{T}\big[M(X^TX)^{-1}M^T\big]^{-1}\big(M\widehat{\beta}_R\big)}{q\,\widehat{\tau}_{nh}^{2}}
\]
is asymptotically $\chi^2(q)$ by Theorem 4.0.3, Theorem 4.0.4, and Slutsky's lemma. Thus a level-$\alpha$ Wald test rejects $H_0$ if $B_M$ exceeds the $(1-\alpha)$ quantile of the $\chi^2(q)$ distribution.
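A minimal sketch of computing this Wald statistic (ours, with hypothetical inputs; `tau_hat` stands for whatever estimate of $\tau_h$ is available, such as the wavelet-based one):

```python
import numpy as np

# Wald statistic B_M = (M b)' [M (X'X)^{-1} M']^{-1} (M b) / (q tau^2).
def wald_statistic(beta_hat, M, X, tau_hat):
    q = M.shape[0]
    core = M @ np.linalg.inv(X.T @ X) @ M.T   # M (X'X)^{-1} M'
    mb = M @ beta_hat
    return float(mb @ np.linalg.solve(core, mb) / (q * tau_hat ** 2))
```

The value would then be compared against the $(1-\alpha)$ quantile of $\chi^2(q)$, e.g. 5.991 for $q = 2$ and $\alpha = 0.05$.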
Chapter 5
Discussion
In this dissertation, we developed an asymptotically efficient rank estimator based on score functions estimated using wavelets. A consistent estimator is given for the asymptotic variance of the rank estimator. This can be used in constructing Wald tests of general linear hypotheses.
5.1 Issues related to simulations

We consider the logistic density function $f(x) = \dfrac{e^{-x}}{(1+e^{-x})^2}$. Its cumulative distribution function is $F(x) = \dfrac{1}{1+e^{-x}}$, and the score function for the logistic distribution is given by $h_F(u) = 2u-1$, $u\in(0,1)$. This density function satisfies all of the assumptions (H1)-(H5). We then generate $N = 10, 15, 75, 100$ random points from this distribution. We apply, respectively, the classical Fourier approach, denoted (CF), and the first order estimate with Fourier basis functions, denoted (FOF), proposed by Naranjo and McKean, to estimate its score function. Our approach, which uses second order estimation and compactly supported wavelets, is denoted (SOW). Even though theoretical results show that our method is superior to both approaches in terms of flexibility and precision, it has a shortcoming of its own: it is very difficult to apply with current algorithms. The main reason for this complexity is the implicit nature of the compactly supported wavelets used. An idea of the complexity of the use of these wavelets can be seen in the cascade algorithm of Mallat (1989), which is the most widely used in applications of compactly supported wavelets. Indeed, this algorithm requires a sample of points from the function to be estimated, whereas our method does not require such a sample, but it remains very complex. We are working on an algorithm that will address this issue in the future. Figure 5.1 below shows the issues related to Naranjo and McKean's approach: the continuous curve, for the different values of $N$, appears to have less precision than the curve marked with $+$ symbols. The dotted line is the original score function.
Figure 5.1: Simulations
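The claim that the logistic score is $h_F(u) = 2u-1$ can be checked numerically (our sketch, not part of the simulations): the optimal score $-f'(F^{-1}(u))/f(F^{-1}(u))$ simplifies to $2F(x)-1$ for the logistic distribution.

```python
import math

# Verify -f'(F^{-1}(u)) / f(F^{-1}(u)) == 2u - 1 for the logistic law.
def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def score_at(u, eps=1e-6):
    x = math.log(u / (1.0 - u))              # F^{-1}(u)
    fprime = (logistic_pdf(x + eps) - logistic_pdf(x - eps)) / (2 * eps)
    return -fprime / logistic_pdf(x)

vals = [(u, score_at(u)) for u in (0.1, 0.25, 0.5, 0.9)]
```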
Table 5.1 below compares the different methods for approximating score functions in terms of convergence, flexibility, applicability, the Gibbs phenomenon, and precision.

                        CF     FOF    SOW
    Convergence         Yes    Yes    Yes
    Flexibility         No     No     Yes
    Applicability       Good   Good   Not yet
    Gibbs phenomenon    No     Yes    No
    Precision           Good   Fair   Best

Table 5.1: Comparison between the different methods
5.2 Simple Mixed Models with Dependent Error Structure

In our treatment we assumed that the errors are independent and identically distributed with a cdf $F$ that has pdf $f$. Estimating the score function to maximize efficiency is generally a very difficult problem for dependent error models. In simple dependent data problems, however, this may be tractable using some of the methodology developed earlier.

Consider the model
\[
Y_k = \alpha 1_{n_k} + X_k\beta + e_k, \qquad 1\le k\le m, \qquad N = \sum_{k=1}^m n_k. \tag{5.2.1}
\]
Suppose that $e_k = 1_{n_k}b_k + \varepsilon_k$, where the $\varepsilon_k$ are iid and $b_k$ is a continuous random variable independent of $\varepsilon_k$. We also assume the random effects $b_1,\ldots,b_m$ are iid. Then the errors $e_k$ are exchangeable with the same marginal distribution $F$.
Then the asymptotic representation of the R-estimate of $\beta$ is
\[
\sqrt{N}\,\widehat{\beta}_R = \tau_h\,N\,(X^TX)^{-1}U_N(\beta) + o_p(1),
\]
and from Brunner and Denker (1994) it follows that
\[
\widehat{\beta}_R \;\dot\sim\; N_p(\beta, V_h),
\]
where
\[
V_h = \tau_h^2\,(X^TX)^{-1}\left(\sum_{k=1}^m X_k^T C_{h,k} X_k\right)(X^TX)^{-1}
\]
and
\[
C_{h,k} = (1-\rho_h)I_{n_k} + \rho_h J_{n_k}, \qquad \rho_h = \operatorname{cov}\{h(G(e_{11})),\,h(G(e_{12}))\},
\]
where $I_{n_k}$ and $J_{n_k}$ are, respectively, the $n_k\times n_k$ identity matrix and the $n_k\times n_k$ matrix of ones. Assuming that $X$ is centered, the asymptotic relative efficiency of the rank estimator versus the least squares estimator is given by
\[
\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS}) = \frac{\sigma^2(1-\rho)}{\tau_h^2(1-\rho_h)},
\]
where
\[
\rho = \operatorname{Corr}(\varepsilon_1,\varepsilon_2), \qquad \sigma^2 = \operatorname{Var}(\varepsilon_1), \qquad \rho_h = \operatorname{Corr}\big[h(F(\varepsilon_1)),\,h(F(\varepsilon_2))\big],
\]
and
\[
\tau_h^{-1} = \int_0^1 h(u)\left\{-\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}\right\}du.
\]
We would like to find the $h$ that maximizes the ARE. This amounts to minimizing $\tau_h$ and maximizing $\rho_h$. However, the function that minimizes $\tau_h$ does not necessarily maximize $\rho_h$. Analytically finding an $h$ that would simultaneously do both is a difficult problem in the calculus of variations.
Simple case: the random error vector has a multivariate normal distribution.
Kloke and McKean (2009) proved that if $h(u) = \sqrt{12}\,(u-1/2)$ (the Wilcoxon score), then
\[
\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS}) = 12\,\sigma^2\left(\int f^2(t)\,dt\right)^2\frac{1-\rho}{1-\rho_h},
\]
where $\sigma^2$ is the variance of the underlying normal distribution, $\rho$ is the intraclass correlation, and $\rho_h$ is the Spearman correlation within each class. Hence
1. $\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS})\in[0.8660,\,0.9549]$ if $0 < \rho < 1$;
2. $\mathrm{ARE}(\widehat{\beta}_R, \widehat{\beta}_{LS})\in[0.9549,\,0.9662]$ if $-1 < \rho < 0$.
However, for a general score function, estimating the optimal score function amounts to the following problem in the calculus of variations:
\[
\widehat{h} = \sup_{h}\,\frac{\tau_h^{-2}}{1-\rho_h}
= \sup_{h}\left\{\frac{\left(\displaystyle\int_0^1 h(u)\left\{-\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}\right\}du\right)^2}{1 - \displaystyle\iint_{\mathbb{R}^2} h(F(x))\,h(F(y))\,f(x,y)\,dx\,dy}\right\}. \tag{5.2.2}
\]
The wavelet method developed earlier provides an approximation of the minimizer of $\tau_h$. The maximizer of $\rho_h$ can also be found using the techniques in this dissertation, since a wavelet basis of $L^2(\mathbb{R}^2)$ can be given as a product of wavelet bases of $L^2(\mathbb{R})$. Indeed, consider the optimization problem
\[
\sup_{h}\iint_{\mathbb{R}^2} h(F(x))\,h(F(y))\,f(x,y)\,dx\,dy \tag{5.2.3}
\]
and let $J(h) = \displaystyle\iint_{\mathbb{R}^2} h(F(x))\,h(F(y))\,f(x,y)\,dx\,dy - 2\int_{\mathbb{R}} g(y)\,h(F(y))\,dy$, where $g$ is some continuous function on $\mathbb{R}$.
Then for any integrable function $\eta\ne 0$ and any scalar $\epsilon$,
\[
J(h+\epsilon\eta) - J(h) = \epsilon\int_{\mathbb{R}}\eta(F(y))\left(\int_{\mathbb{R}} f(x,y)\,h(F(x))\,dx\right)dy
+ \epsilon\int_{\mathbb{R}}\eta(F(x))\left(\int_{\mathbb{R}} f(x,y)\,h(F(y))\,dy\right)dx
\]
\[
+ \epsilon^2\iint_{\mathbb{R}^2} f(x,y)\,\eta(F(x))\,\eta(F(y))\,dx\,dy
- 2\epsilon\int_{\mathbb{R}} g(y)\,\eta(F(y))\,dy.
\]
Thus, if $f$ is a symmetric function of $x$ and $y$, we have
\[
J(h+\epsilon\eta) - J(h) = 2\epsilon\int_{\mathbb{R}}\eta(F(y))\left(\int_{\mathbb{R}} f(x,y)\,h(F(x))\,dx - g(y)\right)dy
+ \epsilon^2\iint_{\mathbb{R}^2} f(x,y)\,\eta(F(x))\,\eta(F(y))\,dx\,dy.
\]
Hence,
\[
\lim_{\epsilon\to 0}\frac{J(h+\epsilon\eta) - J(h)}{\epsilon} = 0 \;\Longrightarrow\; \int_{\mathbb{R}} f(x,y)\,h(F(x))\,dx = g(y).
\]
Therefore, if $f\in L^2(\mathbb{R}^2)$, then (5.2.3) becomes the homogeneous Fredholm integral equation
\[
\int_{\mathbb{R}} h(F(x))\,f(x,y)\,dx = g(y), \qquad\text{for some continuous function } g. \tag{5.2.4}
\]
1. If $f(x,y) = k(y-x)$ for some function $k$, then (5.2.4) has the solution
\[
h(F(x)) = \int_{-\infty}^{\infty}\frac{\mathcal{F}_y[g(y)](u)}{\mathcal{F}_y[k(y-x)](u)}\,e^{2\pi i u x}\,du,
\]
where $\mathcal{F}_y$ denotes the Fourier transform with respect to $y$.

2. In general, the solution to (5.2.4) can be written as
\[
h(x) = \sum_{i=1}^{\infty} a_i\,\langle g, l_i\rangle\,l_i(x),
\]
where $(a_i)$ is a decreasing sequence of reals, the $l_i$ are basis functions (which could be wavelets) in $L^2(\mathbb{R})$, and $\langle\cdot,\cdot\rangle$ is the scalar product in $L^2(\mathbb{R})$.
Thus, we have two approximations of $h$. The compromise is to use the estimated score from one in the estimation of the other. This can be iterated until the difference between the two approximations falls below a specified tolerance. Either one, or the average of the two approximations, can then be taken as the final approximation of $h$, as suggested by the algorithm below.

1. Get residuals from an initial fit. Use them to get an estimate of $F$, say $\widehat{F}_0$.
2. Use wavelets and $\widehat{F}_0$ to estimate the minimizer of $\tau_h$, say $\widehat{h}_{\tau F}$.
3. Fit the model using $\widehat{h}_{\tau F}$ and get new residuals and a new estimate of $F$, say $\widehat{F}_{\tau}$.
4. Use $\widehat{F}_{\tau}$ to estimate the maximizer of $\rho_h$, say $\widehat{h}_{\rho F}$.
5. Fit the model using $\widehat{h}_{\rho F}$ and get new residuals and a new estimate of $F$, say $\widehat{F}_{\rho}$.
6. If $\dfrac{\|\widehat{h}_{\tau F} - \widehat{h}_{\rho F}\|_2}{\frac{1}{2}\big(\|\widehat{h}_{\tau F}\|_2 + \|\widehat{h}_{\rho F}\|_2\big)} \ge \epsilon$, set $\widehat{F}_0 = \widehat{F}_{\rho}$ and go back to step 2; otherwise stop.
7. Take $\dfrac{\widehat{h}_{\tau F} + \widehat{h}_{\rho F}}{2}$ as the estimate of $h$.
5.3 Adequate space for score functions

In the previous sections, it was assumed that the score function belongs to the space of square integrable functions. Though this property of the score function is guaranteed if its Fisher information is finite, it is worth mentioning that this space is very "big." In fact, pinning down an adequate space in which score functions can be approximated by wavelets is of some value. Such a space should lie between the space of continuous functions, which are nice for practical applications, and the Sobolev spaces, which are nice for theory but rarely met in practice, and it should be contained in the space of square integrable functions. The Besov spaces are such spaces. Besov spaces can be completely characterized in terms of wavelets, in the sense that any function in such a space has a wavelet decomposition, and any decomposition with wavelet coefficients of the right size yields a function belonging to a Besov space. As in any space, Besov spaces contain "nice" and "bad" functions. By a "bad" function, we mean an analytic function that cannot be continued outside its disk of convergence; such functions are also called lacunary functions. Though the results we found give only a partial answer to the problem of an adequate space for score functions, the techniques used make them worthwhile.
Characterization of lacunary functions in Bergman-Besov-Lipschitz spaces

The space $B_\rho$ has been studied at length by various authors for various purposes. This space first appears in its simplest form in De Souza (1980), where it was denoted by $B^1$. This was later generalized to a weighted version $B_\rho$ in De Souza (1985) and De Souza (1983). It was shown in these papers that $B_\rho$ is the boundary value of those functions $F$ for which $\int_0^1\int_0^{2\pi}|F'(re^{i\theta})|\,(1-r)^{\frac{1}{q}-1}\,d\theta\,dr < \infty$ for the weight function $\rho(t) = t^{1/q}$. It was shown in Bloom and De Souza (1989) that $B_\rho$, for a general weight function $\rho$, is a real characterization of analytic functions in the unit disc for which $\int_0^1\int_0^{2\pi}|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr < \infty$, generalizing the results obtained in De Souza (1985) and De Souza (1983). The main result here is the analytic characterization of lacunary functions in the spaces $B_\rho$ for $\rho$ belonging to a class $S$ of weights satisfying some conditions that will be stated in the sequel.
Preliminaries

Definition 5.3.1. Lacunary functions are analytic functions $F(z) = \sum_{k=1}^{\infty} a_k z^{n_k}$ for which $\lambda = \inf_k \dfrac{n_{k+1}}{n_k} > 1$.
Definition 5.3.2. We define
\[
B_\rho = \left\{F : \mathbb{D}\to\mathbb{D},\ F \text{ analytic and } \int_0^1\!\!\int_0^{2\pi}|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr < \infty\right\}
\]
and
\[
b_\rho = \left\{F : \mathbb{D}\to\mathbb{D},\ F(z) = \sum_{n=0}^{\infty}a_n z^n,\ \sum_{n=0}^{\infty}2^n K(n,\rho)\Big(\sum_{k\in I_n}|a_k|^2\Big)^{1/2} < \infty\right\},
\]
where $I_n = \{k\in\mathbb{N} : 2^{n-1}\le k < 2^n\}$.

Note that $B_\rho$ and $b_\rho$ are endowed with the norms $\|F\|_{B_\rho} = \int_0^1\int_0^{2\pi}|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr$ and $\|F\|_{b_\rho} = \sum_{n=0}^{\infty}2^nK(n,\rho)\big(\sum_{k\in I_n}|a_k|^2\big)^{1/2}$, respectively.

Notation: If $\rho(t) = t^{1/q}$, $q\ge 1$, in Definition 5.3.2, then we denote $B_\rho$ by $B_q$ and $b_\rho$ by $b_q$.
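The $b_q$ norm is straightforward to compute for a finite Taylor series; a small sketch (ours), using $K(n,\rho) = 2^{-n/q}$ as in Lemma 5.3.4 below and blocks $I_n$ for $n\ge 1$ (the coefficient $a_0$ plays no role in the norm comparison, which is stated for series starting at $n=1$):

```python
import math

# b_q norm of a finite series sum a_k z^k, coefficients given as a dict
# {k: a_k}, with weight rho(t) = t^{1/q} and K(n, rho) = 2^{-n/q}.
def b_q_norm(a, q, n_max=12):
    total = 0.0
    for n in range(1, n_max):
        # I_n = {k : 2^{n-1} <= k < 2^n}
        block = sum(abs(a.get(k, 0.0)) ** 2 for k in range(2 ** (n - 1), 2 ** n))
        total += 2 ** n * 2 ** (-n / q) * math.sqrt(block)
    return total
```

For example, the single-term series $z$ with $q = 1$ has norm $2\cdot 2^{-1} = 1$.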
Definition 5.3.3. We say the weight function $\rho : [0,1]\to[0,\infty)$ belongs to the class $S$ if $\rho(0) = 0$, $\rho$ is nondecreasing, and there are positive constants $C_1$, $C_2$, $K(n,\rho)$ satisfying
\[
\int_0^1 r^{2^{n-1}-1}\,\frac{\rho(1-r)}{1-r}\,dr \le C_1 K(n,\rho), \qquad \forall n\ge 1, \tag{5.3.1}
\]
and
\[
\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}\,\frac{\rho(1-r)}{1-r}\,dr \ge C_2 K(n,\rho), \qquad \forall n\ge 2. \tag{5.3.2}
\]
Hereafter $c$ and $C$ denote generic positive constants, and when there is no ambiguity we shall name all constants $c$ and $C$. Similar weight function classes can be found in Mateljevic and Pavlovic (1984), where they were used to characterize weighted Hardy spaces.
Lemma 5.3.4. The class S is not empty.
Proof. Consider the family of weights defined by $U = \{\rho : \rho(t) = t^{1/q},\ 1\le q < \infty\}$, so that $\frac{\rho(1-r)}{1-r} = (1-r)^{\frac{1}{q}-1}$. Clearly $\rho(0) = 0$ and $\rho$ is a nondecreasing function of $t$, so the weight function satisfies the conditions stated in Bloom and De Souza (1989). We will show that $U\subset S$.

Take $\rho\in U$. Using a result of Alzer (2001) and the facts that $1/q\le 1$ and $2^{n-1}\ge 1$, we have
\[
\int_0^1 r^{2^{n-1}-1}(1-r)^{\frac{1}{q}-1}\,dr = B\Big(2^{n-1},\,\frac{1}{q}\Big) \le q\,2^{-(n-1)/q} \le 2q\,2^{-n/q}, \qquad n\ge 1, \tag{5.3.3}
\]
where $B(\cdot,\cdot)$ is the beta function defined by $B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$.

Also, for $1-2^{-(n-1)}\le r\le 1-2^{-n}$, we have $2^{n-1}2^{(1-n)/q} \le (1-r)^{\frac{1}{q}-1} \le 2^n 2^{-(n-1)/q}$. Thus, since $2^{1/q}\ge 1$ and $0\le r\le 1$, we obtain
\[
\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}(1-r)^{\frac{1}{q}-1}\,dr \ge 2^{n-1}2^{-n/q}\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}\,dr
\ge 2^{n-1}2^{-n/q}\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n}\,dr
\]
\[
= \frac{2^{n-1}2^{-n/q}}{2^n+1}\Big[(1-2^{-n})^{2^n+1} - (1-2^{-(n-1)})^{2^n+1}\Big]
= \frac{2^{n-1}2^{-n/q}}{2^n+1}\,\big(2^{-(n-1)} - 2^{-n}\big)\sum_{k=0}^{2^n}(1-2^{-n})^{2^n-k}(1-2^{-(n-1)})^{k}.
\]
But $1-2^{-n} \ge 1-2^{-(n-1)}$, so for $0\le k\le 2^n$, $(1-2^{-n})^{2^n-k}(1-2^{-(n-1)})^{k} \ge (1-2^{-(n-1)})^{2^n}$. Thus
\[
\sum_{k=0}^{2^n}(1-2^{-n})^{2^n-k}(1-2^{-(n-1)})^{k} \ge (2^n+1)\,(1-2^{-(n-1)})^{2^n}.
\]
Therefore
\[
\int_{1-2^{-(n-1)}}^{1-2^{-n}} r^{2^n-1}(1-r)^{\frac{1}{q}-1}\,dr \ge 2^{n-1}2^{-n/q}\,2^{-n}\,(1-2^{-(n-1)})^{2^n} \ge 2^{-5}\,2^{-n/q}, \qquad n\ge 2. \tag{5.3.4}
\]
Taking $K(n,\rho) = 2^{-n/q}$, (5.3.3) and (5.3.4) imply that $\rho\in S$.
Theorem 5.3.5. Suppose $\rho\in S$. Then $b_\rho$ is a Banach space. Moreover, for any function $F(z) = \sum_{n=1}^{\infty}a_n z^n$ belonging to $B_\rho$, there is a constant $C > 0$ such that
\[
\|F\|_{B_\rho} \le C\,\|F\|_{b_\rho}.
\]
If $F(z) = \sum_{k=1}^{\infty}a_k z^{n_k}$ is lacunary and belongs to $B_\rho$, then there is a constant $c > 0$ such that
\[
\|F\|_{B_\rho} \ge c\,\|F\|_{b_\rho}.
\]
Proof. We will first show that $b_\rho$ is a Banach space. First we show that $b_\rho$ is a linear space. Let $\alpha$ and $\beta$ be two complex numbers and let $f, g\in b_\rho$ with coefficients $(a_k)$ and $(b_k)$, respectively. By Minkowski's inequality we have
\[
\Big(\sum_{k\in I_n}|\alpha a_k + \beta b_k|^2\Big)^{1/2} \le |\alpha|\Big(\sum_{k\in I_n}|a_k|^2\Big)^{1/2} + |\beta|\Big(\sum_{k\in I_n}|b_k|^2\Big)^{1/2}.
\]
Thus $\alpha f + \beta g\in b_\rho$.
Now we show that $b_\rho$ is complete. To that end, let $\{F_s\}_{s\in\mathbb{N}}$ be a Cauchy sequence in $(b_\rho, \|\cdot\|_{b_\rho})$, where $F_s(z) = \sum_{n=0}^{\infty}a_{sn}z^n$. We will show that there is some $F\in b_\rho$ such that $F_s\to F$ in $b_\rho$. Given $\epsilon > 0$, there is some $S\in\mathbb{N}$ such that for $s,t\ge S$ we have $\|F_s - F_t\|_{b_\rho} < \epsilon$; that is,
\[
\sum_{n=0}^{\infty}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{sk} - a_{tk}|^2\Big)^{1/2} < \epsilon
\]
for $s,t\ge S$. This implies that for all $n\in\mathbb{N}$,
\[
\sum_{k\in I_n}|a_{sk} - a_{tk}|^2 < \frac{\epsilon^2}{\big(2^nK(n,\rho)\big)^2}
\]
for $s,t\ge S$. Therefore, for each $k$, $\{a_{sk}\}_{s\in\mathbb{N}}$ is a Cauchy sequence in $\mathbb{C}$, a complete metric space. This implies that for each $k$ there exists some $a_k\in\mathbb{C}$ such that $a_{sk}\to a_k$. Now let $F(z) = \sum_{n=0}^{\infty}a_n z^n$. We shall show that $F\in b_\rho$ and that $F_s\to F$ in $b_\rho$. To that end, we will prove that $F - F_s\in b_\rho$ and use the linearity of $b_\rho$ to conclude that $(F - F_s) + F_s = F\in b_\rho$.

Given $\epsilon > 0$, there exists $S\in\mathbb{N}$ such that
\[
\sum_{n=0}^{\infty}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{tk} - a_{sk}|^2\Big)^{1/2} < \epsilon
\]
whenever $s,t\ge S$. Now let $M > 0$ be arbitrary. Then for $s,t\ge S$,
\[
\sum_{n=0}^{M}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{tk} - a_{sk}|^2\Big)^{1/2} < \epsilon.
\]
Fixing $s\ge S$ and letting $t\to\infty$, we have
\[
\lim_{t\to\infty}\sum_{n=0}^{M}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_{tk} - a_{sk}|^2\Big)^{1/2}
= \sum_{n=0}^{M}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_k - a_{sk}|^2\Big)^{1/2} \le \epsilon.
\]
$M$ being arbitrary, this shows that for a given $\epsilon > 0$ there exists $S\in\mathbb{N}$ such that
\[
\sum_{n=0}^{\infty}2^nK(n,\rho)\Big(\sum_{k\in I_n}|a_k - a_{sk}|^2\Big)^{1/2} \le \epsilon
\]
whenever $s\ge S$. That is, $F - F_s\in b_\rho$. This also proves that, given $\epsilon > 0$, $\|F - F_s\|_{b_\rho} \le \epsilon$ for $s\ge S$, and hence $F_s\to F$ in $b_\rho$.
Let us now prove that there are constants $C, c > 0$ such that
\[
\|F\|_{B_\rho} \le C\,\|F\|_{b_\rho}
\]
and, for lacunary functions,
\[
\|F\|_{B_\rho} \ge c\,\|F\|_{b_\rho}.
\]
Let
\[
J = \int_0^1\!\!\int_0^{2\pi}\frac{1}{2\pi}\,|F'(re^{i\theta})|\,\frac{\rho(1-r)}{1-r}\,d\theta\,dr.
\]
Note that $F'(re^{i\theta}) = \sum_{n=1}^{\infty}n\,a_n r^{n-1}e^{i(n-1)\theta}$. By the Cauchy-Schwarz inequality and Parseval's identity we have
\[
J \le C\int_0^1\Big(\sum_{n=1}^{\infty}|n\,a_n r^{n-1}|^2\Big)^{1/2}\frac{\rho(1-r)}{1-r}\,dr.
\]
But
\[
\int_0^1\Big(\sum_{n=1}^{\infty}|n\,a_n r^{n-1}|^2\Big)^{1/2}\frac{\rho(1-r)}{1-r}\,dr
= \int_0^1\Big(\sum_{n=1}^{\infty}\sum_{k\in I_n}k^2|a_k|^2 r^{2(k-1)}\Big)^{1/2}\frac{\rho(1-r)}{1-r}\,dr,
\]
and since $0\le r\le 1$, ...
It remains to show that, for every $\epsilon > 0$,
\[
\lim_{n\to\infty}B_n^{-2}\sum_{k=1}^{n}E\big[Z_{nk}^2\,I(|Z_{nk}| > \epsilon B_n)\big] = 0. \tag{5.3.6}
\]
Since $B_n^2$ converges to a positive constant, we need only show that the sum converges to $0$. By the definition of $Z_{nk}$ and the Cauchy-Schwarz inequality, we have
\[
|Z_{nk}| \le n^{-1/2}|t_1| + n^{-1/2}|t_2^Tx_k|\,|h(F(Y_k))|.
\]
Hence
\[
I(|Z_{nk}| > \epsilon B_n) \le I\big(n^{-1/2}|t_1| + n^{-1/2}|t_2^Tx_k|\,|h(F(Y_k))| > \epsilon B_n\big).
\]
By the Cauchy-Schwarz inequality,
\[
n^{-1/2}|t_2^Tx_k| \le n^{-1/2}\|x_k\|\,\|t\|_{p+1}
= \Big(n^{-1}\sum_{j=1}^{p}x_{kj}^2\Big)^{1/2}\|t\|_{p+1}
\le \Big(p\max_{j}n^{-1}x_{kj}^2\Big)^{1/2}\|t\|_{p+1}.
\]
Let $K_n = \big(p\max_{k,j}n^{-1}x_{kj}^2\big)^{1/2}$. $K_n$ is independent of $k$ and converges to $0$. Therefore we have
\[
I\left(|h(F(Y_k))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)
\ge I\big(n^{-1/2}|t_1| + n^{-1/2}|t_2^Tx_k|\,|h(F(Y_k))| > \epsilon B_n\big).
\]
Finally, we also have
\[
\sum_{k=1}^{n}E\left[Z_{nk}^2\,I\left(|h(F(Y_k))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]
= t_1^2\,E\left[I\left(|h(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]
\]
\[
+ \frac{2t_1}{n}\,E\left[\operatorname{sgn}(Y_1)\,h(F(Y_1))\,I\left(|h(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]\,t_2^T\sum_{k=1}^{n}x_k
\]
\[
+ E\left[h^2(F(Y_1))\,I\left(|h(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}t_1}{K_n}\right)\right]\frac{1}{n}\sum_{k=1}^{n}\big(t_2^Tx_k\big)^2.
\]
Note that the sum in (5.3.6) is less than or equal to the expression above. The design matrix being centered, the middle term on the right-hand side is $0$.

In the expression $\frac{\epsilon B_n - n^{-1/2}t_1}{K_n}$, the numerator converges to a positive constant while the denominator converges to $0$; hence the expression goes to $\infty$. Since $h$ is bounded, the indicator converges to $0$. Using again the boundedness of $h$, the limit and the expectation can be interchanged by the Lebesgue Dominated Convergence Theorem. This shows that (5.3.6) holds, and hence $Z_n$ converges in distribution to a univariate normal distribution. Therefore $T_n$ converges to a multivariate normal distribution.