Rank Based Methods for Repeated Measurement Data by Yankun Gong A dissertation submitted to the Graduate Faculty of Auburn University in partial ful llment of the requirements for the Degree of Doctor of Philosophy Auburn, Alabama August 6, 2011 Keywords: Rank, Nonparametric, Predation Preference, Weighted Function, IRLS, Repeated Measures, Sensitivity Curve, Relative E ciency Copyright 2011 by Yankun Gong Approved by Asheber Abebe, Chair, Associate Professor of Mathematics and Statistics Peng Zeng, Associate Professor of Mathematics and Statistics Mark Carpenter, Professor of Mathematics and Statistics Abstract This dissertation considers rank based methods for one sample and two sample repeated measurement data. As a speci c example, in Chapter 2, this dissertation considers nonpara- metric tests for selective predation. [1] proposed a nonparametric tests for selective predation using the linear score function. Motivated by this method, general rank tests are given for the case of one predatory species and prey characterized by a binary feature of interest and the case of two predatory species and prey characterized by either a continuous or a categori- cal feature of interest. The score functions used to construct the test statistics are monotone and hence the test is designed to detect simple ordered alternatives. The results based on the asymptotic Gaussian distribution of the test statistics show that the tests retain nominal Type-I error rates. The results also show that power of the asymptotic test depends on the chosen score function. In Chapter 2, we study the one sample and two sample repeated measurement data with random censoring and the simulation results show that we can take the asymptotic distribution as the underlying distribution of the test statistic even with a high censoring rate and a small sample size. In Chapter 3, this dissertation considers using rank based methods to test the trend of the di erence between two samples for two sample repeated measurement data. As a speci c example, we consider nonparametric methods for testing whether the rate of prey feature change in the selection of one species is faster than that of another species. Although the Page test is used in conjunction with a single randomized complete block design, we extend it to the situation where we have two randomized complete block designs. We derive the asymptotic distribution of a general test statistic which includes the Page statistic as a special case. The results based on the asymptotic Gaussian distribution of the test statistics show that the tests retain nominal Type-I error rates. ii In Chapter 4, the nite sample performance of the rank estimator of regression coe - cients obtained using the iteratively reweighted least squares (IRLS) of Sievers and Abebe (2004) is evaluated. E ciency comparisons show that the IRLS method does quite well in comparison to least squares or the traditional rank estimates in cases of moderate tailed er- ror distributions; however, the IRLS method does not appear to be suitable for heavy tailed data. Moreover, the results show that the IRLS estimator will have an unbounded in uence function even if we use an initial estimator with a bounded in uence function. In Chapter 5, this dissertation study how to test the trend of the di erence in the two sample repeated measurement data using two generalized estimating equation (GEE) models: the likelihood based GEE model proposed by Liang and Zeger (1986) and the IRLS Rank-based GEE model proposed by [2]. The results show that the GEE model proposed by [2] is robust to outliers in response space and can be used to analyze data with small sample sizes compared to the GEE model proposed by Liang and Zeger (1986). iii Acknowledgments My greatest thanks extend to Dr. Asheber Abebe for his patience and guidance through- out this endeavor. Dr. Abebe took considerable time and energy to further the progress in my studies of Statistics. I also wish to express my appreciation to my parents and my brother Yanfei Gong who have given their love and constant support throughout my life. I would also like to thank Dr. Peng Zeng and Dr. Mark Carpenter on the committee for their contribution to this dissertation and serving as committee members. iv Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Rank Tests for Selective Predation . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Data and Model in [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.1 Experiment Frame and Data . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.2 Test Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 General Rank Test for Preference: One-Sample . . . . . . . . . . . . . . . . 12 2.4 Two Sample Comparison of Preference Patterns . . . . . . . . . . . . . . . . 14 2.4.1 Continuous Prey Feature . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.4.2 Categorical Prey Feature . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5 Censored Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.5.1 One-sample Selection Problem . . . . . . . . . . . . . . . . . . . . . . 18 2.5.2 Two Sample Selection Problem . . . . . . . . . . . . . . . . . . . . . 19 2.6 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.6.1 Null Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.6.2 Power Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.6.3 Null Simulation for Two Sample Case with Censoring . . . . . . . . . 26 2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3 Testing the Di erence Trend in Two Sample Repeated Measurement Data . . . 28 v 3.1 Motivation of the Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.2 Test Statistic and Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.3 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.1 Null Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.2 Results for the data in [48] . . . . . . . . . . . . . . . . . . . . . . . 36 3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4 Iteratively Reweighted Rank Regression Estimator . . . . . . . . . . . . . . . . 39 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.2 Reweighted Least Squares Formulation . . . . . . . . . . . . . . . . . . . . . 41 4.3 Computational Details and Results . . . . . . . . . . . . . . . . . . . . . . . 43 4.3.1 Algorithm for Estimating Regression Coe cients . . . . . . . . . . . 43 4.3.2 Variance Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.3.3 In uence Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.4 Tests of Linear Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 5 IRLS Estimation of Rank Generalized Estimation Equations Model . . . . . . . 53 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.2 Ordinary LS based GEE models . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.3 Iterated Reweighted Rank-Based Estimates for GEE Models . . . . . . . . . 56 5.4 Our Case for the Ordinary LS based GEE Model . . . . . . . . . . . . . . . 59 5.5 Our Case for the Iterated Reweighted Rank-Based Estimators for GEE Models 61 5.6 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 Appendix: Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 vi List of Figures 2.1 Figure for Null Rejection Rates in Table 2.2 . . . . . . . . . . . . . . . . . . . . 23 3.1 Boxplots for CRP levels at time 0 and time 24 for the two groups HI and LO . 30 3.2 Mean and median pro les for the CRP levels in two groups HI and LO . . . . . 37 4.1 Sensitivity Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.1 Residul plots for The ordinary GEE and rank-based GEE . . . . . . . . . . . . 64 vii List of Tables 2.1 Selections of ve large or small armyworms in bird trials . . . . . . . . . . . . . 10 2.2 Simulated null rejection rates of W+ using its asymptotic normal critical constant for continuous distributions, (t)/t; = :05. . . . . . . . . . . . . . . . . . . 22 2.3 The simulated power of W+ using its asymptotic normal distribution where Xj N( j;1); Yj N(0;1); 1 j k; = 0:05 . . . . . . . . . . . . . . . . . . . . 25 2.4 Simulated null rejection rates ofW+ using its asymptotic normal critical constant, (t)/t; = :05;P("rj = 1) = P( sj = 1) = :5. . . . . . . . . . . . . . . . . . . 26 3.1 CRP levels in [48] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.2 Simulated null rejection rates of W using its asymptotic normal critical con- stant, (t)/t; = :05 and k = 5. . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.1 Relative e ciencies of IRLS versus HM and LS when the errors follow a tdf distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2 Relative e ciencies of IRLS versus HM and LS when the errors follow a CN( ;3) distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.3 REs of bV of IRLS vs HM when ei tdf . . . . . . . . . . . . . . . . . . . . . . . 48 4.4 REs of bV of IRLS vs HM when ei CN( ;3) . . . . . . . . . . . . . . . . . . . 48 4.5 Empirical levels for IRLS vs HM . . . . . . . . . . . . . . . . . . . . . . . . . 51 viii Chapter 1 Introduction In the 70+ years since their origin in the mid-1930s, nonparametric statistical methods have ourished and have emerged as valuable methodology for statisticians and other scien- tists performing data analysis. Roughly speaking, a nonparametric procedure is a statistical procedure that has certain desirable properties that hold under relatively mild assumptions regarding the underlying population from which the data are obtained. Nonparametric tech- niques have the following advantages compared to parametric techniques: 1. Nonparametric methods require few assumptions about the underlying population from which the data are drawn. In particular, nonparametric procedures forgo the tradi- tional normality assumption. 2. Nonparametric procedures enable the user to obtain exact p-values for tests, exact coverage probabilities for con dence intervals, exact experimentwise error rates for multiple comparison procedures, and exact coverage probabilities for con dence bands without relying on assumptions that the underlying populations are normal. 3. Nonparametric techniques are often (although not always) easier to apply than their normal theory counterparts. 4. Usually the nonparametric procedures are only slightly less e cient than their normal theory competitors when the underlying populations are normal, and they can be mildly or wildly more e cient than these competitors when the underlying populations are not normal. 5. Nonparametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of nonparametric 1 methods may be necessary when data have a ranking but no clear numerical interpre- tation, such as when assessing preference; in terms of levels of measurement, for data on an ordinal scale. 6. Due to the reliance on fewer assumptions, nonparametric methods are relatively robust to outlying observations and other violations. In this dissertation, we discuss the nonparametric techniques in repeated measurement data. Repeated measurement data, in which the same response variable is recorded on each ob- servational unit on several di erent occasions, occur frequently in many di erent disciplines. One example is that sequential experiments for treatment comparisons are widely used in clinical (e.g., a new treatment and a control or placebo) or pharmaceutical trials and in many other applied contexts. There are two possible inferential goals in these studies. One goal is to establish a signi cant overall di erence between the two treatments under study, the other goal is after we conclude that there does exist a di erence between the two treatments, to check if the di erence stays the same, decays, or expands over time. Several approaches have been considered in the past to deal with comparison of two groups with respect to their repeated measures. A useful and common initial step in the analysis of repeated measures data is to graph the data in some way. A method often employed, particularly in medical publications, is to plot means by treatment group for every time point. An example of such a plot for the data from the trail of two treatments for the control of intestinal parasites in cattle can be found in [3]. Another commonly used method of analysis for repeated measures that involve a number of treatment groups, particularly in medical and related research, is to compare the groups at each time point, by using either t-test or some nonparametric equivalent. [4] suggested that this approach may be quite useful if the occasions are few and the intervals between them are large. A more relevant, but still relatively straightforward, approach to the analysis of repeated measures data is that involving the use of summary measures, and sometimes known as 2 response feature analysis. Here the responses for each subject are used to construct a single number that summarizes some aspect of the subject?s response pro le. (In some cases more than a single summary measure may be used). The summary measure to be used needs to be chosen before the analysis of the data and should, of course, be relevant to the particular questions of interest in the study. [5] gave a list of potentially useful summary measures. [6] suggest three methods of analysis using summary statistics, which are post-treatment means (POST), mean changes (CHANGE) and analysis of covariance (ANCOVA). The multivariate procedure involves testing on a set of transformed variables represent- ing the within-subject di erence of each within-subject factor and their interactions. The hypothesis that the means of a set of transformed variables representing a within-subject factor, or an interaction between within-subject factors, are zero can be tested by using Hotelling?s T2 statistics which is introduced in [7]. To deal with the cases when the variables are not normally distributed or the sample size is small, [8] proposed numerous nonparamet- ric approaches. The repeated measures data most appropriately analysed by the methods described above are those from designed experiments where all subjects have the same number of ob- servations measured at equivalent time interval. Since these conditions often do not hold in clinical trials, methods to accommodate missing data have been developed ([9]). Many of these are likelihood-based methods ([10], [11]). When values are missing completely at ran- dom, non-parametric and semiparametric methods have been developed to analyze such data ([10], [12], [13]). When the missing values are not random, [14] propose a semiparametric estimation procedure for treatment di erences over time based on repeated measurements of an outcome variable when the patient?s follow-up time may depend on observed and/or missing measurements. They examine the e ect of switching to didanosine(ddI) from zi- dovudine(AZT) for HIV-infected patients who had tolerated AZT for at least 16 weeks, 304 patients were randomly chosen to continue AZT therapy and 298 patients were assigned to 3 take ddI at a daily dose of 500 mg. The patient?s CD4 cell count, a proxy for the progres- sion of HIV infection, was obtained periodically during the trial. In addition to the usual clinical endpoint analysis, they also evaluate the group di erence based on such repeated measurements. Apart from clinical trials, another important application eld of repeated measurements is to study the predation patterns in evolutionary Biology. In Chapter 2 of this dissertation, we use the Wilcoxon signed rank statistic to study the repeated measurement data model in [1]. Our intention is to develop nonparametric hypothesis tests for selective predation as a result of values taken by prey feature of interest. We do this by extending the method of [1] to allow for di ering gradients in prey selection and random censoring. We also extend the method to study the two sample predation preference. If we conclude that the two predator species do have di erent prey preference, in order to test whether the rate of prey feature change in the selections of one species is faster than that of another species, in Chapter 3, we need to use theory of long memory processes. The phenomenon of long memory had been known long before suitable stochastic mod- els were developed. Scientists in diverse elds of statistical applications observed empirically that correlations between observations that are far apart (in time or space) decay to zero at a slower rate than one would expect from independent data or data following classic Markov- type models. As a result of Mandelbrot?s pioneering work, self-similar and related stationary processes with long memory were later introduced to statistics, to provide a sound mathe- matical basis for statistical inference. Since then, long memory (or long-range dependence) has become a rapidly developing subject. Because of the diversity of applications, the litera- ture on the topic is broadly scattered in a large number of journals, including those in elds such as agronomy, astronomy, chemistry, economics, engineering, environmental sciences, mathematics, physics, geosciences, hydrology, and statistics. Best known is the occurrence of long-range dependence in geophysics and hydrology (for a review, see [15]). In particular, the so-called Hurst E ect ([16]) can be explained by slowly decaying correlations. However, 4 there are many other elds of application where this type of correlation occurs. As early as 1895 the astronomer Newcomb discussed the phenomenon of long-range dependence in astronomical data sets and called it "semi-systematic" errors. He also proposed a heuristic explanation by superposition of independent random errors and constant systematic errors. In 1902, Karl Pearson observed slowly decaying correlations in simulated astronomical obser- vations. Further examples are discussed, for instance, by [17] and [18] for economical data, [19] for data from biology, geophysics, meteorology and hydrology, [20] for meteorological data, [21] for telecommunication data and [22] and [23] for agricultural data. In Chapter 3, using stationary process with long-memory theory, we propose a procedure that is analogous to the Page test [24] to study the trend of the di erence between two samples of repeated measurement data. Although the Page test is used in conjunction with a single randomized complete block design, we extend it to the situation where we have two randomized complete block designs. We derive the asymptotic distribution of a general test statistic that includes the Page statistic as a special case. This provides an approximate level test for the changing rates in the simple ordering case. For the repeated measurement data, when the outcome variable is approximately Gaus- sian, statistical methods are well developed, e.g. [25] and [26]. For non-Gaussian outcomes, however, less development has taken place. For binary data, repeated measures models in which observations for a subject are assumed to have exchangeable correlations have been proposed by [27] using a probit link, by [28] using a logit link and by [29] using log linear models. Only the model proposed by [28] allows for time-dependent covariates. [10] pro- posed an extension of generalized linear models to the analysis of longitudinal data. They introduce a class of estimating equations that give consistent estimates of the regression pa- rameters and of their variance under mild assumptions about the time dependence. Using a working generalized linear model for the marginal distribution of the response, the estimating equations are derived without specifying the joint distribution of a subject?s observations yet 5 they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. The methods, however, are not robust against outliers since they are based on score equations from the maximum likelihood method of estimation. A solution proposed by [30] is to use M-type estimation by involving downweighting schemes. Another solution is one given by [31] who proposed an adaptation of the Wilcoxon-Mann-Whitney method of esti- mating linear regression parameters for use in longitudinal data analysis under the working independence model. They used joint ranking (JR) of all observations in their development. [32] consider the same model as [31] but they use separate between-subject and within- subject ranks to specify their Wilcoxon-Mann-Whitney estimating equation. [2] provided a direct rank estimation analogue of the GEE model of [10] obtained via the minimization of rank dispersion function of [33]. This is a generalization of the work of [31]. Their develop- ment used the iterated reweighted least squares (IRLS) formulation of the rank dispersion function given in [34]. In Chapter 4 of this dissertation, we evaluate the nite sample performance of the IRLS estimate of b R and its variance. We discuss the IRLS formulation, provide computational algorithms as well as simulation results pertaining to the estimator, its variance, and its in uence function. In Chapter 5, Using the LS based GEE model in [10] and the IRLS Rank-Based GEE model proposed by [2] (see also [35]), we study the problem of selective predation patterns. 6 Chapter 2 Rank Tests for Selective Predation 2.1 Introduction Repeated measurement data, in which the same response variable is recorded on each observational unit on several di erent occasions, occur frequently in many di erent disci- plines. One major eld that generates a great deal of repeated measures data is in Biology, when people study the predatory preference. In this chapter we will study the predatory preference problem as a special case of repeated measure data. All the test statistics we used and the corresponding asymptotic properties can be used for similar repeated measures data. Predation is necessary for life to exist. It is a force in evolution of species as natural selection is biased in favor of e ective predators and elusive prey. We are interested in the case where predatory behavior is guided by certain features of the prey. Dating back to the time of Darwin, scientists have long recognized that signals used by animals to attract mates can have the unintended consequence of attracting predators and resulting in gender selective predation [36]. Conversely, as a recent study of [37] shows, in some species, signaling places the receiver of the signal at a higher risk of predation than the signaler. Several methods have been given in the past to quantify these types of selective predation [38]. As pointed out in [39] and [1], many of these methods have major shortcomings in that they ignore the order in which the predator selects its prey. To that end, [39] proposed a test procedure, based on the maximum likelihood principle, that accounts for the order of selection. The performance of this method was unsatisfactory for small sample sizes. As a remedy, [1] suggested to use a Wilcoxon signed rank statistic to test for predatory preference. Their approach takes into account not only the selection of prey but also the order of in which they are selected. 7 Further, as most nonparametric methods, the method provides satisfactory results for small samples. However, this method, by using a linear score function, assumes that changes in consecutive prey selections are equally important regardless of when the selection is made. In this chapter, it will be assumed that there is only one feature of the prey that determines predatory preference. Hereafter, we will refer to this as \prey feature of interest." Our intention is to develop nonparametric hypothesis tests for selective predation as a result of values taken by prey feature of interest. We will do this by extending the method of [1] in three directions: 1. We provide a class of general rank tests for the experiment considered by [1]; that is, testing for selective predation of one random sample of predators where the prey feature of interest takes one of only two possible values. Examples of dichotomous prey features of interest are gender and size categorized as \large" and \small". General rank tests will allow one to place varied emphasis on selections at di erent stages of the selection experiment. This could be helpful in controlling the manner in which extraneous variables, such as the level of hunger of predators, can a ect selection preference at the beginning versus at the end of the experiment. 2. We propose a class of general rank score tests for the di erence in predation patterns of two random samples of predators (eg. two species). In this case, we treat the case where the prey feature of interest is continuous (\surface area", for example). We then give the test where prey feature is a categorical variable with m classes (m 2). The test statistic for this case becomes equivalent to that of the continuous case when there are no ties in the data. 3. If we look at our data as a repeated measures data, Xrj denote the measurement of the rth experimental subject in population X at condition or time point j, similar for Ysj, we may not have a complete data frame, i.e. some observations of the experimental subjects maybe missing, which is often the case in clinical trials. We propose the 8 test statistic for randomly missing data for the cases of one sample problem and two samples problem in repeated measures data. In all the three cases, we study the asymptotic distributions of the test statistics as the number of predators increases while the number of allowed selections remains constant for all predators. 2.2 Data and Model in [1] 2.2.1 Experiment Frame and Data The purpose of this chapter is to suggest a test statistic that can be useful to establish preference between two response in repeated measures experimental design settings. The context for this study was experimental work conducted at the University of Florida aimed at conservation of naturally occurring enemies of crop pests as a means of improving biological control of cropping systems. At issue was whether birds could be useful in crop pest control due to their consumption of pests that escape mortality from other agents of biological control. Specially, the research investigated aspects of pest insects that would make them more attractive food sources for birds. Trails were conducted using red-winged blackbirds with fall armyworms as their food source. The purpose of the trails was to determine whether the birds preferred larger versus smaller armyworms and whether the birds preferred non- parasitized armyworms to parasitized armyworms of the same size. The parasite used was larva of E.plathypenae, a species common in Florida and previously investigated for its biological control value. A description of the experiment can be found in [40]. In this experiment, a plastic chamber divided into two equal sized parts was used to present each bird with two food choices. The positions of the two foods in the subchambers (left or right) were varied over the trials and no directional preference was observed. In trials to determine preference for food size, for example, each bird was presented with ve large armyworms in one subchamber and ve small armyworms in the other. The researchers 9 recorded the rst ve choices made by the bird, including the order in which the armyworms were selected by the birds. One of the data sets from this experiments is presented in Table 2.1. It displays the selection by each bird: large (L) or small (S), in the order in which they were selected (left to right). Table 2.1: Selections of ve large or small armyworms in bird trials Trial number Choice: 1 2 3 4 5 Score: 5 4 3 2 1 1 L L L L S 2 L L S L S 3 S L L L S 4 L S L S S 5 L L L L S 6 L L L S L 7 L L L S S 8 L S L L S 9 S S L L L 10 L L S L L 11 L L L S S 12 L L L L S 13 L L L L S 14 L L L S S 15 L L L S S 16 S L S S L 17 L L S L S 18 L L S L S 19 L L L S S 20 S L L S L 21 L L S S L 2.2.2 Test Statistic When testing for preference, the null hypothesis is "no preference", i.e. that the bird randomly selects its food type (L or S) and the ve selections made by each bird are indepen- dent. In other words, each no preference selection by a bird is like a coin toss with probability p = 12 of selection L over S and with independence between selections. A Wilcoxon signed rank statistic can be used to describe the selections made in each trial. A "+" is assigned to 10 each large armyworm selected and a " " is assigned to each small armyworm. The score of 5 is assigned to the rst selection made by the bird, 4 to the second selection,. . . , and 1 to the fth selection made by the bird. Using these scores means that early selection of a food type and more selections of that food type, will both contribute to increasing the indication of preference for that food type. Under the alternative that the birds exhibit a preference for one of the two food types, we would expect p6= 12. For example, p > 12 would indicate a preference for L, and p < 12 would indicate a preference for S. Because of the particular design of this experiment, abundance of the two types of food is not an issue. If the two food types were mixed together in one food chamber, abundance would play a more important role in the analysis. However, in this setting they were in separate subchambers and the researchers recorded only which subchamber was chosen by the bird in each selection. In the jth bird trial we let T+j = (sum of the scores assigned to +(L) food selections) and T j = (sum of the scores assigned to -(S) food selections) Summing over the trials on di erent birds produces the test statistic T+ = nX j=1 T+j where n denotes the number of birds used in the experiment. In general, suppose there are k selections made by each subject in the experimental situation and n subjects in the experiment. Then under the null hypothesis, T+ has the distribution of the sum of n independent Wilcoxon signed rank statistics in which k is the 11 largest rank assigned in each of the signed rank statistics. It is clear that EH0[T+] = nk(k + 1)4 and VarH0[T+] = nk(k + 1)(2k + 1)24 When n is large and k is xed, the null distribution of T+ can be approximately by a normal distribution. This follows from the usual Central Limit Theorem because under the null, T+ is the sum of n independent and identically distributed statistics, Tj, each with nite mean and variance. From the data in Table 2.1, the exact p-value is 0:0001510. 2.3 General Rank Test for Preference: One-Sample Suppose that we have an experiment in which there is a random sample of n predators (n > 1) and that each predator is allowed to select k preys (k > 1). We shall assume that relative abundance of prey remains unchanged during successive selections, or equivalently, that successive selections are made with replacement. Let !ij represent the feature of interest (eg. color, size, etc) of the jth prey selected by predator i, where 1 i n and 1 j k. It will be assumed that predators make their selections independently of each other. As in [1], we will focus on the case where the prey feature of interest is categorized into two classes (2 sizes, 2 genders etc). In this case, we can de ne a random variable X = 0 or X = 1 depending on the category. We will write Xij for X(!ij) and fX = cg for f! : X(!) = cg; c = 0;1. Then, under the null hypothesis that there is no preferential predation and selections are independent, Xi1;:::;Xik are independently and identically distributed according to the Bernoulli distribution with probability of success p = 1=2, 1 i n. A value of p > 1=2 indicates preference for fX = 1g prey. We will con ne our attention to testing whether fX = 1g is the preferred prey feature in that predators tend to select fX = 1g prey more than fX = 0g prey at the beginning of the selection experiment than at the end. 12 [1] gave a test statistic for detecting such selection patterns. In our notation, the Wilcoxon signed rank test statistic given by [1] is T+ = nX i=1 kX j=1 (k j + 1)Xij : (2.3.1) As intended, the test rejects the null if T+ is large; that is, if there are more fX = 1g prey selections at the beginning of selection than at the end. This is done by giving less importance to later fX = 1g prey selections than earlier selections using the score function (j) k j + 1. However, the importance gradient remains constant since (j) decreases linearly as j goes from 1 to k. This does not allow investigators to factor in extraneous variables (such as time of day, predator?s hunger level, location, etc) that can potentially in uence the selection process. Using a general score function [35] in the de nition of the test statistic provides investigators the exibility to decide on a gradient. To that end, let be a non-increasing function de ned on the interval (0;1) such that Pkj=1 2fj=(k + 1)g<1. We now de ne the generalized Wilcoxon signed rank test statistic for preference patterns as T+ = nX i=1 kX j=1 jXij ; (2.3.2) where j = fj=(k + 1)g. The following theorem gives the rst two moments and the asymptotic distribution of the test statistic T+ . The proof is straightforward and may be found, for instance, in [35]. Theorem 2.1 Assume that the null hypothesis is true. Then for k<1 xed andPkj=1 2j < 1 we have T+ E0(T+ ) fvar0(T+ )g1=2 !N(0;1) in distribution as n!1; where E0(T+ ) = 2 1nPkj=1 j and var0(T+ ) = 4 1nPkj=1 2j. 13 The score function can be normalized such that Pkj=1 j = 0 and Pkj=1 2j = 1. One way to obtain normalized score functions is to take fj=(k + 1)g= c[b hfj=(k + 1)g] (2.3.3) and to solve for b and c using the constraints Pkj=1 j = 0 and Pkj=1 2j = 1. Here b and c are scalars and h(u) is a nondecreasing function de ned on [0;1] depending on u alone. The score function leading to a test statistic equivalent to (2.3.1) of [1] may be obtained by taking h(u) = u. Other simple score functions use h(u) = u2 and h(u) = sgn(u). Taking h(u) = u in (2.3.3) gives b = 1=2 and c = [f12(k + 1)g=fk(k 1)g]1=2. For the data given in [1], one obtains a p-value of 3:14 10 5 for h(u) = u and 1:22 10 5 for h(u) = u2 using the asymptotic test in Theorem 2.1. [1] found a p-value of 1:51 10 5 using a permutation test. 2.4 Two Sample Comparison of Preference Patterns Suppose now that we have two predatory species X and Y and record the rst k choices made by these two species. As earlier, we will assume that there is only one prey feature of interest. However, in this case, we will not restrict our analyses to the case where this feature is categorized into two classes. Instead, we will consider a case where the feature is continuously measured and also the case of an arbitrary number of categories. To simplify our argument, we will assume that prey are in nitely abundant or that selection is made with replacement. Our goal is to test whether there is a signi cant di erence in the predatory preferences of these two species. The data from this experiment are presented as Xrj (r = 1;:::;n1;j = 1;:::;k) and Ysj (s = 1;:::;n2;j = 1;:::;k). Here Xrj and Ysj denote the feature of interest of the jth prey selected by the rth individual from species X and the sth individual from species Y , respectively. When there is no confusion, we will use X and Y as generic random variables 14 denoting prey feature selected by species X and Y , respectively. We are in particular interested in testing whether the X species prefer larger values of the particular feature than the Y species. This will be characterized by larger values of pr(X > Y ) at the beginning of the experiment than at the end. We propose the following test statistic for comparing two species selection patterns: W+ = n1X r=1 n2X s=1 kX j=1 j?(Xrj Ysj) ; (2.4.1) where ?(x) = 8 >>>> >>< >>> >>>: 1 (x> 0) ; 1=2 (x = 0) ; 0 (x< 0) : In the following we will study the null asymptotic behavior of W+ for continuous and cate- gorical prey feature, respectively. All proofs are given in the Appendix. 2.4.1 Continuous Prey Feature Assume that X and Y are measured continuously. Under the null hypothesis that there is no preferential selection, pr(?(X Y ) = 1) = 0:5 = pr(?(X Y ) = 0). The null asymptotic distribution of W+ is given in the following theorem. Theorem 2.2 Under the null hypothesis that there is no di erence in preferences, k <1 xed, and Pkj=1 2j <1 we have W+ E0(W+ ) var 0(W+ ) 1=2 ! N(0;1) in distribution as min(n1;n2)!1, where E0(W+ ) = 2 1n1n2Pkj=1 j and var0(W+ ) = 12 1n1n2(n1 +n2 + 1)Pkj=1 2j. Proof 2.3 Write W+ = Pkj=1 jUn1n2j, where Un1n2j = Pn1r=1Pn2s=1 ?(Xrj Ysj) for j = 1;:::;k. For j = 1;:::;k xed, under the null hypothesis, Un1n2j is just the Mann-Whitney 15 U-statistic and thus [41] E0(Un1n2j) = n1n22 and var0(Un1n2j) = n1n2(n1 +n2 + 1)12 : Moreover, as min(n1;n2)!1 Un1n2j n1n2=2 fn1n2(n1 +n2 + 1)=12g1=2 ! N(0;1) in distribution whenever the null hypothesis is true. Thus W+ (n1n2=2)Pki=1 j fn1n2(n1 +n2 + 1)=12g1=2 = kX j=1 j U n1n2j n1n2=2 fn1n2(n1 +n2 + 1)=12g1=2 ! N 0; kX j=1 2j in distribution since k<1 and Pkj=1 2j <1. Once again, in applications, standardizing the score function such that Pkj=1 j = 0 and Pk j=1 2 j = 1 will simplify the expressions for mean and variance. 2.4.2 Categorical Prey Feature Assume that the prey feature of interest is categorized into m disjoint classes. We wish to extend the two sample selective predation problem to this case where selections are being made with replacement. Under the null hypothesis of no selective predation, the combined sample fX1j;:::;Xn1j;Y1j;:::;Yn2jg has a multinomial distribution with mass function P(T1 = tj1;:::;Tm = tjm) = (n1 +n2)! mY l=1 ptjll tjl! ; where Pml=1 pl = 1 and Pml=1tjl = n1 + n2 (j = 1;:::;k). In this case, we may have ties in our data that need to be taken into account. The following theorem gives the asymptotic normality of W+ . 16 Theorem 2.4 Suppose prey feature is categorical with m possible outcomes (m > 1) and that the null hypothesis is true. Let tjl be the observed frequencies for selection j (j = 1;:::;k; l = 1;:::;m). Then W+ E0(W+ ) var 0(W+ ) 1=2 ! N(0;1) in distribution as min(n1;n2)!1, where E0(W+ ) = 2 1n1n2Pkj=1 j and var0(W+ ) = n1n2(n1 +n2 + 1)12 kX j=1 2j n1n212(n 1 +n2)(n1 +n2 1) kX j=1 mX l=1 2j(t3jl tjl) : Proof 2.5 The proof follows from the proof of Theorem 2.2. The only di erence is in the computation of the null variance of Un1n2j which is given by [42] as var0(Un1n2j) = n1n2(n1 +n2 + 1)12 n1n212(n 1 +n2)(n1 +n2 1) mX l=1 (t3jl tjl) : We remark here that the added complexity presented by a discrete prey feature is the in- troduction of ties in X and X that one needs to take into account. As in most nonpara- metric statistics, although ties a ect the null variance, they do not usually a ect the null expected value of the statistic [41]. It can be easily seen that the null variance of the dis- crete case is the same as the null variance of the continuous prey feature case when tjl = 1 (j = 1;:::;k; l = 1;:::;m). 2.5 Censored Data In repeated measurement data study, especially in clinical trials, if the repeated mea- surements are taken over time during the study, some experimental subjects may not have a complete vector of observations. some of the observations may be censored. In this section we assume all the censored observations are censored completely at random. 17 De ne the i:i:d censor indicators "rj and sj to be: "rj = 8 >>< >>: 1; if Xrj is observed; 0; otherwise: and sj = 8 >>< >>: 1; if YSj is observed; 0; otherwise: for r = 1;:::;n1; s = 1;:::;n2 and j = 1;:::;k we also assume: P("rj = 1) = pj and P( sj = 1) = qj 2.5.1 One-sample Selection Problem The problem in Section 2.3 can be considered as a one sample repeated measures prob- lem. Let Xr = (Xr1;:::;Xrk)0 denote independent random samples for the rth subject, and j = 1;:::;k are indices representing a set of conditions or a series of prespeci ed condi- tions or time points for which repeated measurements from each experimental subject are obtained. Some experimental subjects may not have a complete vector of observations. We use "rj to indicate if the jth observation of rth experimental subject in group X is censored or not. Then the test statistic in (2.3.2) can be modi ed as: Tc = kX j=1 nX r=1 j"rjXrj (2.5.1) It is easy to see that E0(Tc) = n2 kX j=1 jE("j) = n2 kX j=1 pj j 18 Var0(Tc) = kX j=1 2jVar0( nX r=1 "rjXrj) = kX j=1 2jnVar0("rjXrj) = kX j=1 (E0("2rjX2rj) E20("rjXrj)) = kX j=1 (E("2rj)E0(X2rj) E2("rj)E20(Xrj)) = n kX j=1 (12E("2rj) 14E2("rj)) 2j = n kX j=1 (12pj 14p2j) 2j Then because Pnr=1 j"rjXrj are independently and identically distributed for j = 1;:::;k, using the Central limit theorem, we can get the asymptotic normality of Tc similar as the result in Theorem 2.1. 2.5.2 Two Sample Selection Problem For the two sample predation preference problem, we also can look it as a two sample repeated measures problem. For example, in clinical trials, we have control group X and the new treatment group Y . We record the data of the experimental subjects in both groups under some conditions or at some prespeci ed time points j = 1;:::;k. The investigator wishes to draw an overall conclusion whether the new treatment constitutes an "improve- ment" over the control for the entire study. As de ned in Section 2.4, we use X and Y to denote the two populations for the two groups and we use "rj to denote if the jth observation of rth experimental subject in group X is censored or not, similar for sj. Then the test statistic in (2.4.1) can be rewritten as: Wc = n1X r=1 n2X s=1 kX j=1 j"rj sj?(Xrj Ysj) (2.5.2) 19 We can calculate the null expectation and variance of Wc E0(Wc ) = n1n22 kX j=1 jE("rj)E( sj) = n1n22 kX j=1 jpjqj Var0(Wc ) = kX j=1 2jVar0[ n1X r=1 n2X s=1 "rj sj?(Xrj Ysj)] if we denote Qrsj = "rj sj?(Xrj Ysj) Var0(Wc ) = kX j=1 2j n1X r=1 n2X s=1 Var0(Qrsj) + kX j=1 2j n1X r=1 n2X s=1 n1X r1=1 n2X s1=1 Cov0(Qrsj;Qr1s1j) = kX j=1 2j[n1n2Var0(Q11j) +n1n2(n2 1)Cov0(Q11j;Q12j) +n1n2(n1 1)Cov0(Q11j;Q21j)] Where: Var0(Q11j) = E(Q2rsj) E2(Qrsj) = E0("1j 1j?(X1j Y1j))2 [E("1j)E( 1j)E0(?(X1j Y1j))]2 = 12E("21j)E( 21j) 14E2("1j)E2( 1j) = 12E("1j)E( 1j) 14E2("1j)E2( 1j) = 12pjqj 14p2jq2j 20 Cov0(Q11j;Q12j) = E0(Q11jQ12j) E0(Q11j)E0(Q12j) = E0("1j 1j?(X1j Y1j)"1j 2j?(X1j Y2j)) E20("1j 1j?(X1j Y2j)) = E("21j)E( 1j 2j)E0(?(X1j Y1j)?(X1j Y2j)) E2("1j)E2( 1j)E20(?(X1j Y1j)) = 13E("21j)E( 1j 2j) 14E2("1j 1j) = 13pjq2j 14p2jq2j and similarly Cov(Q11j;Q21j) = 13E("1j"2j)E( 21j) 14E2("1j 1j) = 13p2jqj 14p2jq2j Then becausePn1r=1Pn2s=1Qrsj are independently and identically distributed for j = 1;:::;k, using the Central limit theorem, we can get the asymptotic property of Wc following similar steps as in Theorem 2.2. 2.6 Monte Carlo Simulation We performed a large scale simulation to evaluate the nite sample performance of the asymptotic distribution of the statistics W+ and Wc . 2.6.1 Null Simulation First we performed a null simulation for the two-sample continuous prey feature case under the nominal = 0:05. For this study, independent, identically distributed samples were generated from the standard Cauchy (C), central t with 5 degrees of freedom (t5), standard normal (N), and the standard Laplace (L) distributions. We considered all 42 = 10 possible combinations of the distributions for the two samples. For the sample sizes we took 21 (n1;n2) combinations of (3;10), (3;3), (10;3), and (10;10). For the number of selections of each subject we took k = 3;6;and 12. The asymptotic distribution given in Theorem 2.2 is used to determine the null rejection rates. We performed 104 repetitions and found the proportion of cases for which fW+ E0(W+ )g=fvar0(W+ )g1=2 > 1:645. The results for (t) = 1(1 t) were quite similar to the results for (t) = 1 t which we report in Table 2.2. We note that for most distribution combinations, the rejection rates were very Table 2.2: Simulated null rejection rates of W+ using its asymptotic normal critical constant for continuous distributions, (t)/t; = :05. k N;N C;N t5;N N;L C;C C;t5 C;L t5;t5 t5;L L;L n1 = 10;n2 = 10 3 0.0482 0.0513 0.0496 0.0484 0.0470 0.0559 0.0538 0.0479 0.0508 0.0457 6 0.0518 0.0556 0.0490 0.0483 0.0484 0.0534 0.0528 0.0499 0.0532 0.0477 12 0.0496 0.0545 0.0510 0.0456 0.0517 0.0557 0.0531 0.0478 0.0500 0.0493 n1 = 3;n2 = 10 3 0.0449 0.0785 0.0576 0.0445 0.0487 0.0661 0.0671 0.0502 0.0472 0.0507 6 0.0469 0.0743 0.0572 0.0488 0.0516 0.0717 0.0701 0.0521 0.0508 0.0544 12 0.0492 0.0695 0.0548 0.0452 0.0480 0.0679 0.0667 0.0487 0.0510 0.0509 n1 = 3;n2 = 3 3 0.0440 0.0471 0.0435 0.0468 0.0454 0.0454 0.0481 0.0481 0.0472 0.0479 6 0.0514 0.0499 0.0542 0.0467 0.0524 0.0526 0.0495 0.0493 0.0560 0.0506 12 0.0509 0.0507 0.0502 0.0490 0.0458 0.0503 0.0502 0.0508 0.0481 0.0480 n1 = 10;n2 = 3 3 0.0483 0.0281 0.0473 0.0575 0.0488 0.0361 0.0342 0.0505 0.0529 0.0499 6 0.0531 0.0316 0.0439 0.0556 0.0480 0.0387 0.0347 0.0533 0.0516 0.0515 12 0.0487 0.0296 0.0454 0.0566 0.0488 0.0355 0.0345 0.0520 0.0494 0.0517 close to the nominal = 0:05 despite using the asymptotic distribution for such small samples. In the hybrid distribution cases where the two samples were drawn from two di erent distributions, when the sample size from the heavier tailed distribution is smaller than the sample size from the lighter tailed one, we found that the rejection rates were higher than 0.05. The reverse phenomenon occurs when the sample sizes are reversed. It appears that the asymptotic variances are under- and over-estimated, respectively, for the two cases. We can see the pattern more clearly from Figure 2.1. 22 Figure 2.1: Figure for Null Rejection Rates in Table 2.2 Distributions Null Rejection Rates, k=6 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 a71 C,C C,t5 C,N C,L t5,t5 t5,N t5,L N,N N,L L,L 0.0316 0.0500 0.0743 a71 a71 a71 n1=10,n2=10 n1=3, n2=10 n1=3, n2=3 n1=10,n2=3 In Figure 2.1, the horizontal axis denotes di erent distributional settings and the vertical axis denotes null rejection rate. Di erent color denotes di erent sample size combinations and the at line in the middle denotes the nominal = 0:05. The Figure shows the null rejection rate for di erent distributional settings and di erent sample size combinations. From this Figure, we can see some clear patterns: First, for most distributional settings, the null rejection rates are fairly close to 0:05 despite using the asymptotic distribution for such small sample size. Second, the red line is the most preferred line since overall, it?s the closed one to the at line which denotes the nominal = 0:05. This indicates the test statistic performs the best for equal large sample size (n1 = n2 = 10) which meets our expectation. Third, there are some abnormality due to including the extremely heavy tailed Cauchy distribution to generate the sample. If the sample size generated using Cauchy distribution is relatively larger compared to the other sample size, the asymptotic variance tends to be over-estimated, which would lead to a smaller null rejection rate. Similarly, if the sample 23 size generated using Cauchy distribution is relatively smaller compared to the other sample size, the asymptotic variance tends to be under-estimated, which would lead to a larger null rejection rate. 2.6.2 Power Simulation To determine the power of the test based on the asymptotic distribution in Theorem 2.2, we generated independent samples for the jth selection as Xj N( j;1) and Yj N(0;1) (j = 1;:::;k). We study the performance of the asymptotic test under simple ordering 1 k and umbrella ordering 1 q k, q2f1;:::;kg, of the X means. We used Wilcoxon scores (t) = 1 t and normal scores (t) = 1(1 t), where is the standard normal cdf. The proportion of cases out of 104 in which the standardized statistic exceeded 1.645 were recorded. In the rst part of the simulation, we generated the X samples by drawing the values of the rst approximately k=2 selections from the N(0:5;1) distribution and the remaining selections from the N(0;1) distribution. For the second part, the rst k=3 X selections were drawn from the N(1;1) distribution, the middle k=3 were from the N(0:5;1) distribution, and the nal k=3 were from the N(0;1) distribution. Finally, the third part used the rst k=3 selections from N(0;1), the middle k=3 from N(0:5;1) and the nal k=3 from N(0;1) creating an umbrella pattern. The results are given in Table 2.3. It is evident that the rejection rates were increasing with increasing k or increasing n1 and n2 for both score functions. It also appears that the test?s ability to detect simple ordered mean patterns increased when the number of means used to generate the X sample was larger. This goes hand in hand with our intuition that the test statistic becomes better trained to detect patterns when there are more distinct classes. The test based Wilcoxon scores was more powerful than the test based on normal scores. This is expected since the means are arranged in a linearly decreasing pattern where linear scores are optimal. Normal scores are optimal for the case where the means are quickly decreasing at the beginning and at the end but slowly decreasing in the middle. 24 Table 2.3: The simulated power of W+ using its asymptotic normal distribution where Xj N( j;1); Yj N(0;1); 1 j k; = 0:05 (t) = 1 t (t) = 1(1 t) n1 = n2 n1 = n2 Mean pattern k 2 5 7 2 5 7 f0:5 0:5 0g 3 0.111 0.138 0.164 0.046 0.112 0.134 12 0.240 0.492 0.624 0.154 0.273 0.350 f1 0:5 0g 3 0.248 0.453 0.567 0.072 0.220 0.286 12 0.569 0.939 0.984 0.270 0.562 0.711 f0 0:5 0g 3 0.099 0.109 0.112 0.026 0.046 0.047 12 0.115 0.185 0.228 0.055 0.050 0.051 The power of the test was low in detecting an umbrella ordered pattern in the alternative. We note that the test is not designed to detect such patterns since the score function is monotone non-increasing with each successive selection. If one is interested in such patterns, then the test of [43] for umbrella alternatives may be adapted for this case in a straightforward manner. If the populations that generated the samples are as in Table 2.3 but the X mean con guration is f1;1=4;0;0g, then a test using linear scores will not be as powerful as one using a score function that is decreasing fast at rst and slowly at the end. One such score function is (t) = G 1(1 t), where G is the Gamma(1=5;4) distribution. Using the Wilcoxon score function, for k = 4 selections, 103 replications gave rejection rates of 29%, and 40% for n1 = n2 values of 5 and 7, respectively. The corresponding rejection rates for the gamma score function were 40% and 54%. Clearly, the choice of score functions is important. It is possible to select bent scores as discussed on p. 100 of [44] or Section 4 of [45]. Alternatively, nonparametric estimates of the optimal score function may be obtained using the methods of [46] and [47]. Finally, we performed a simulation study with the selections were correlated. This was done by generating the X sample from a trivariate normal distribution with mean (1;1=2;0) 25 and an order one autoregressive (AR(1)) covariance matrix with (i;j)th entry ji jj for = 0;0:2;0:5;0:9. The rejection rates were nearly the same as the ones in Table 2.3. 2.6.3 Null Simulation for Two Sample Case with Censoring Similar to the null simulation for the two-sample continuous prey feature case, for this study, under the nominal = 0:05, independent, identically distributed samples were gen- erated from the standard Cauchy (C), central t with 5 degrees of freedom (t5), standard normal (N), and the standard Laplace (L) distributions. We considered all 42 = 10 possible combinations of the distributions for the two samples. For the sample sizes we took (n1;n2) combinations of (3;10), (3;3), and (10;3). For the number of repeated measurements of each subject we took k = 3;12. For the censor probability, we took (pj = qj = 0:5) The asymptotic distribution of Wc is used to determine the null rejection rates. We performed 104 repetitions and found the proportion of cases for which fWc E0(Wc )g=fvar0(Wc )g1=2 > 1:645. Table 2.4: Simulated null rejection rates of W+ using its asymptotic normal critical constant, (t)/t; = :05;P("rj = 1) = P( sj = 1) = :5. k N;N C;N t5;N N;L C;C C;t5 C;L t5;t5 t5;L L;L n1 = 3;n2 = 10 3 0.0436 0.0479 0.0427 0.0413 0.0431 0.0476 0.0521 0.0402 0.0439 0.0449 12 0.0459 0.0589 0.0460 0.0453 0.0481 0.0572 0.0526 0.0476 0.0469 0.0420 n1 = 3;n2 = 3 3 0.0504 0.0487 0.0509 0.0498 0.0512 0.0523 0.0451 0.0468 0.0524 0.0499 12 0.0469 0.0461 0.0460 0.0441 0.0437 0.0454 0.0431 0.0480 0.0444 0.0446 n1 = 10;n2 = 3 3 0.0445 0.0380 0.0401 0.0472 0.0421 0.0403 0.0404 0.0451 0.0447 0.0445 12 0.0482 0.0383 0.0449 0.0546 0.0460 0.0444 0.0365 0.0466 0.0493 0.0490 It is evident from Table 2.4, Although the same size is very small, n1; n2 just takes 3 and 10, for a very large censor probability pj = qj = 0:5, for all the distribution settings, the null rejection rate is still fairly close to the nominal = 0:05. 26 Finally, we performed simulations for di erent censoring probabilities (pj = 0:1; qj = 0:1), (pj = 0:1; qj = 0:9), (pj = 0:25; qj = 0:25), (pj = 0:25; qj = 0:75) and (pj = 0:9; qj = 0:9) . In all these cases, the null rejection rates are all close to the nomial 0:05, with less censoring probability corresponding to closer to 0:05, which meets our expectation. 2.7 Conclusion This chapter gave nonparametric rank based test statistics for detecting preference pat- terns in selective predation by extending the method of [1]. This test statistics can be used broadly in repeated measures data study. We provided a class of general rank score tests for the one sample case where we only have one species of predators and prey have two features of interest. This gives the exibility to place varied emphasis on consecutive selections at di erent stages of the selection experiment. This could be helpful in controlling the manner in which extraneous variables can a ect selection preference patterns. This chapter also proposes a class of general rank score tests for the di erence in preda- tion patterns of two predatory species, which is a special case for the two-sample repeated measures data. In this case, prey feature of interest can be continuous or categorical. It is shown that the test statistic for the categorical case becomes equivalent to the continuous case if the data have no ties. In both cases, the asymptotic distribution of the test statistic is Gaussian. We also study the test statistics and its asymptotic properties for the one-sample and two-sample repeated measures data with random censoring. The results of a simulation study using the asymptotic Gaussian distribution but small samples shows that the test has a satisfactory nite-sample performance. The null simulation shows that null rejection rates are close to nominal values. The asymptotic test is powerful in detecting simple-ordered alternatives. 27 Chapter 3 Testing the Di erence Trend in Two Sample Repeated Measurement Data 3.1 Motivation of the Method Consider the predator-prey relationship discussed in the previous chapter. The manner in which the predator selects its prey is crucial for its survival. Several methods have been given in the past to quantify selective predation. As pointed out in [1], one drawback of most existing methods is that they do not take into account the order in which di erent preys are taken. [1] suggest to use a Wilcoxon statistic to test for food preference while taking into account the order in which di erent prey are taken. The use of the Wilcoxon statistic means that the gradient of selection remains the same throughout the experiment. In the previous chapter we gave a generalized form of the Wilcoxon statistic used by [1] to allow experimenters place varied emphasis on the selection gradient at di erent parts of the experiment. Further we proposed a method to test the di erence in preference patterns of two di erent species of predators, where prey feature can be either continuous or categorical. In this chapter, we are in particular interested in testing whether the rate of prey feature change in the consecutive selections of one species is faster than that of another species. Another problem we consider involves an experiment conducted by the Department of Kinesiology at Auburn University. Let us consider part of the study by [48] concerning the e ect of a single session of high intensity aerobic exercise on in ammatory markers of subjects taken over time. Eighteen subjects were placed into two groups (High Fitness and Moderate Fitness) depending on their tness levels, nine in each group. The response here is the C- Reactive protein (CRP). Elevated CRP levels are a marker of low-grade chronic in ammation and may predict a higher risk for cardiovascular disease. Our e ect of interest is the trend of di erence of CRP levels between the two groups (High Fitness - Moderate Fitness), we 28 are interested in testing whether the di erence stays the same over time or decays to zero over time. Each individual in the two groups was required to nish an exercise session which consumes 500 kcal. CRP levels were obtained 24 hours and immediately prior to the acute bout of exercise and subsequently 24, 72 and 120 hours following exercise. Table 3.1: CRP levels in [48] id group -24 0 24 72 120 1 LO 1.79 0.78 0.68 0.86 0.83 2 LO 0.09 0.14 0.07 0.14 0.51 3 LO 1.13 0.99 1.06 0.91 0.92 4 LO 2.94 2.07 1.51 1.05 1.03 5 LO 1.82 2.28 2.99 1.95 1.78 6 LO 0.17 0.08 0.29 0.3 0.22 7 LO 0.38 0.19 0.29 0.49 0.26 8 LO 1.32 0.96 1.8 0.82 1.68 9 LO 0.4 0.11 0.25 0.2 0.31 10 HI 0.11 0.19 0.21 0.22 0.08 11 HI 0.15 0.17 0.15 0.11 0.21 12 HI 0.06 0.05 0.05 0.06 0.16 13 HI 0.31 0.2 0.28 0.14 0.24 14 HI 0.21 0.28 0.35 0.1 0.2 15 HI 0.36 0.37 4.54 1.8 1.1 16 HI 0.92 1.23 1.47 0.96 0.81 17 HI 0.52 0.52 0.51 0.31 0.35 18 HI 2.05 1.61 1.35 0.73 0.67 In this example, the data sample size is small, we only have 9 subjects in each group and each subject has only 5 repeated measurement data. The data are highly skewed and contain outliers, which we can see from Figure 3.1. Both the two-sample selective predation problem and the CRP level problem are speci c examples for two sample repeated measurement data, and the problem we are interested in can be generalized as studying the trend di erence between the two samples over repeated measurements. In the current chapter, we propose a procedure that is analogous to the Page test [24] to study the trend of the di erence between two samples in a repeated measurement data. Although the Page test is used for randomized complete block designs, we extend it to the situation where we have two randomized complete block designs. We derive the 29 Figure 3.1: Boxplots for CRP levels at time 0 and time 24 for the two groups HI and LO a71 a71 a71 HI LO 0.0 0.5 1.0 1.5 2.0 CRP levels immediatly prior to exercise time CRP a71 HI LO 0 1 2 3 4 CRP levels 24 hours following exercise time CRP asymptotic distribution of a general test statistic that includes the Page statistic as a special case. This provides an approximate level test for the changing rates in the simple ordering case. 3.2 Test Statistic and Theory Suppose now that we have two populations of repeated measurement data represented by X and Y , and each subject in the two groups has k repeated measurements. We will assume that the repeated measurements for each subject are continuously measured. Suppose we have a random sample n1 from X and n2 = N n1 from Y to participate in the experiment. The data from this experiment are presented as an n1 k matrix X and n2 k matrix Y. In both cases, rows represent individuals and columns represent repeated measurements. Let U = [X0 Y0]0 be the N k matrix of responses. In the following Jp q denotes a p q matrix of ones and 0p q Jp q Jp q is a p q matrix of zeros. 30 Consider the following multivariate linear model U = JN k + C + E ; (3.2.1) 2R is an unknown intercept parameter, C = [J0n1 k 00n2 k]0, = diag( 1;:::; k) is a diagonal matrix of k real unknown parameters, and E is an N k matrix of random error terms. In the model in (3.2.1), if we assume that E[E] = 0N k, then, for any r = 1;:::;n1 and s = 1;:::;n2, E(Xrj Ysj) = j, for j = 1;:::;k. Thus is a shift parameter. We are interested in testing H0 : 1 = = k HA : 1 k ; with at least one strict inequality. Rejecting H0 in favor of HA means that the gap between two groups is increasing over time. Let N = n1n2 and de ne N k matrix Z = X Jn2 1 Jn1 1 Y. Let R be the N k matrix of row-ranks with (i;j)th element of R is Rij = Pkt=1 IfZit Zijg. Selection ranks are ~R = R0 JN 1 : [24] introduced a rank test of H0 versus HA for the case where the ?s are parameters of a randomized complete block design by modifying the Friedman test. Extending this to our situation, we de ne an equivalent of the Page L statistic for testing H0 versus H1 as W = A0k ~R ; where A0k = (1;:::;k). A level test rejects H0 in favor of HA if W >w where w is chosen to satisfy P0(W >w ) = . Here P0 stands for the probability measure under the restriction imposed by H0. The exact value of w can be determined; however, this is a very tedious process. So, we opt for deriving an asymptotic test based on the asymptotic distribution of 31 W. We will derive the asymptotic distribution of a more general test statistic of which W is one special case. To that end, let = ( 1;:::; k)0 be a vector of contrast coe cients such that 1 k and normalized such that J0k 1 = 0 and 0 = 1. We then de ne our \generalized" Page test statistic as W = 0~R : (3.2.2) The following theorem gives the null expectation and variance of W . The proof is quite long and is found in the appendix. Theorem 3.1 Assume that the null hypothesis is true and the score function are normal- ized. Then E0(W ) = 0 and var0(W ) = 0V = n1n2k(k + 1)12 +n1n2(n1 +n2 2)f (2k 4)A (k 2)B + (k 1)(k 2)C + 3k 2 11k + 16 12 g where A = Z P0(U11 U(n1+1)1 +U(n1+1)2 >t)f1 P0(U11 U11 +U(n1+1)2 >t)gdF(t);; B = Z fP0(U11 U(n1+1)1 +U(n1+1)2 >t)g2dF(t) ; and C = Z fP0(U11 U(n1+1)1 +U(n1+1)2 0. We have short-range dependence in the case = 1 and long-range dependence (or long memory) in the case < 1. It is usual to characterize a long-memory process by the number H = 1 =2, the so called Hurst parameter. In the following, we will show that test statistic W is a sum of a long-memory process. We can write Var0(W ) = n1n2C1(k) +n1n2(n1 +n2 2)C2(k) Var0(W ) = 1N C1(k) + N 2N C2(k) where C1(k) and C2(k) are functions of k and N = n1n2. When N tends to 1, we can rewrite Var0(W ) as: Var0(W ) C2(k)N (2H 2) where the Hurst parameter H = 12 log(n1+n2) log(n1)+log(n2) + 1 with = 2 2H < 1 and H!3=4 as min(n1;n2)!1: If we rewrite the test statistic (3.2.2) as W = n1X i=1 n2X j=1 0Rij = n1X i=1 Ti then W = 1N n1X i=1 n2X j=1 0Rij = n1n2X i=1 Ti n1n2 where we assume Ti;i = 1;:::;n1 is a stationary process. So we conclude W is a sum of long-memory process. 33 To continue our discussion, we need to assume that there is a function G2L2(R), such that Ti=n2 = G(Xi), where Xi is a stationary Gaussian process with long memory. The Hurst parameter is now m = inffk > 0 : ck 6= 0g, where k is the coe cient in the Hermite polynomial expansion of G as G(x) = P1k=0 ckk!Hk(x) . Theorem 3.2 Let Gm be the class of all functions in L2(R) with Hermite rank m. Let G 2 Gm and Xt is a stationary Gaussian process with long-range correlation and Sn = Pn t=1 G(Xt). Then the following holds: If 1=2 2, 1=2 < 3=4 = H < 1 1=(2m), which falls under case 1 of Theorem 3.2, in which 2S = limn!1n 1 nX j;l=1 E[G(Xj)G(Xl)] exists. In our case, limn!1n 1 nX j;l=1 E[G(Xj)G(Xl)] = limn 1!1 n 11 n1X j;l=1 E[Tjnn Tlnn ] limn 1!1 (n1 1)Gk =1 where Gk is a function of k. So we conclude that this can not be the case, m > 2 is not true. 2. If m = 2, which falls under case 2 of Theorem 3.2, but in this case, the distribution of Zm is complicated. 3. If m = 1, which still falls in case 2 of Theorem 3.2. In our case, we can get that N HW D !pcmamN(0;1) where pcmam is a constant we need to estimate. Since Var0(W ) C2(k)N (2H 2), we get Var0(W) = C2(k)N 2H, we can use pC2(k) as an estimator of the constant pc mam. We will use simulation to show this is the case. 3.3 Monte Carlo Simulation 3.3.1 Null Simulation First we perform a null simulation for the two-sample continuous prey feature case under the nominal = 0:05. For this study, independent, identically distributed samples are generated from the standard Cauchy (C), central t with 5 degrees of freedom (t5), standard normal (N), and the standard Laplace (L) distributions. For the sample sizes we take combinations (n1;n2) as (2;2), (2;4),(2;7),(2;10),(4;4),(4;7),(4;10),(7;7),(7;10), and (10;10). For the number of selections of each subject we take k = 5. The asymptotic 35 distribution given in Theorem 3.2 is used to determine the null rejection rates. We perform 104 repetitions and nd the proportion of cases for which (n1n2)HW =pC2(k) > 1:645. We note that for all the distribution combinations, the null rejection rate are all close to Table 3.2: Simulated null rejection rates of W using its asymptotic normal critical constant, (t)/t; = :05 and k = 5. (n1;n2) N;N C;C t5;t5 L;L 2,2 0.0555 0.0590 0.0537 0.0574 2,4 0.0550 0.0613 0.0539 0.0580 2,7 0.0577 0.0554 0.0612 0.0553 2,10 0.0601 0.0586 0.0575 0.0616 4,4 0.0611 0.0558 0.0569 0.0569 4,7 0.0507 0.0522 0.0548 0.0571 4,10 0.0512 0.0529 0.0549 0.0548 7,7 0.0551 0.0536 0.0520 0.0548 7,10 0.0524 0.0591 0.0547 0.0509 10,10 0.0545 0.0524 0.0521 0.0522 and a little above the nominal = 0:05 despite using the asymptotic distribution as the underlying distribution for such small samples. We can conclude that in Theorem 3.2, we should take m = 1, and pC2(k) is a ne although not perfect estimate of the constant pc mam. To improve the simulation result, we need to nd a better estimate than pC 2(k) for the constant pcmam. 3.3.2 Results for the data in [48] For the data set in [48], we are interested in testing the following hypothesis: H0 : The di erence between two groups stays the same. Ha : The di erence between two groups decays to zero. Using the nondecreasing normalized linear score function and the asymptotic Gaussian dis- tribution of the test statistic W , we nd that the p value=.466. We would not reject H0 using this p value and we conclude that the di erence between the two groups (Moderate Fitness and High Fitness) does not decay to zero, which we can also see from the Figure 3.2 . 36 Figure 3.2: Mean and median pro les for the CRP levels in two groups HI and LO -20 0 20 40 60 80 100 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 Mean Profiles times Me a n s LO HI -20 0 20 40 60 80 100 0.2 0.4 0.6 0.8 1.0 Median Profiles times Me d i a n s LO HI In both the two gures, the horizontal axis denotes the times at which we obtain the CRP values for each individual. The rst gure shows the mean CRP levels in each group (Moderate Fitness and High Fitness) at the ve time points. The second gure shows the median CRP levels in each group (Moderate Fitness and High Fitness) at the ve time points. Since from gure 3.1 we know the data is highly skewed and there are outliers in this data, we prefer to use the median pro le, which clearly verify our result that the di erence of CRP levels between the two groups does not decay to zero over time. 3.4 Conclusion If we already have enough evidence to conclude that two species of predator tend to choose di erent prey, then we may want to study the di erence further, for example, to compare the trends of the two selection patterns. In this chapter, we proposed a rank based method to test the trend between two samples in repeated measurement data. We 37 establish the asymptotic normality of the test statistic. Our simulation study showed that the asymptotic test retains the nominal Type I error rate. If we want to test the trend of the di erence of the opposite direction, that is, Ha : 1 2 ::: k with at least one strict inequality, then we can just multiply the generalized normalized non-decreasing weight function by -1. This will not a ect the asymptotic properties of the test statistic. 38 Chapter 4 Iteratively Reweighted Rank Regression Estimator 4.1 Introduction In this chapter, we introduce and study IRLS rank estimators. This method of es- timating rank regression parameters is essential in developing a general rank approach to generalized estimating equations (GEE). We shall reconsider testing in the two-sample re- peated measures problems using rank GEE methods in the next chapter. Moreover, using the rank GEE approach we will develop a program for rank testing and estimation of treat- ment e ects for repeated measurement models with the response distribution assumed to be continuous and in the exponential family of distributions. Consider the general linear model Y = X + e ; (4.1.1) where 2Rp is a vector of unknown parameters, Y2Rn denotes the response vector, and X denotes the n p design matrix of predictors. The random error vector e = (e1;:::;en)T is such that e1;:::;en are independent and identically distributed according to a distribution function F with density f. Least squares (LS) procedures are widely used for the analysis of linear models such as the one given in (4.1.1). In addition to their mathematical clarity, they o er the user a uni ed methodology with which to attack many diverse problems. The LS estimator of in (4.1.1) is given by b LS = Argmin 2Rp kY X k2 ; 39 where k k2 is the squared Euclidean norm; that is kvk2LS = Pni=1v2i for v 2Rn. The LS estimator is given by b LS = (X0X) 1X0Y. Under regularity conditions, the large sample distribution is given by b LS D !N( ; 2(X0X) 1) : Since the LS estimator minimizes the Euclidean distance between the response vector and the space de ned by the linear model, the LS estimator is not robust to outliers and skewness. There are several classes of robust estimators that are analogous to the traditional LS estimators. They are usually obtained by replacing the Euclidean norm with another norm. Furthermore, depending on the selected norm, the analysis can be made robust and highly e cient compared to the LS analysis. A popular rank-based estimator uses the rank semi- norm instead of the Euclidean norm. The rank semi-norm on Rn is given by kvkR = nX i=1 a(R(vi))vi ; v2Rn where R(vi) denotes the rank of vi among v1;:::;vn, a(i) = (i=(n + 1)) and (u) is the score function, which can be any nondecreasing function on (0;1). The fact that this is a semi-norm is shown in [49]. The rank estimate of is a vector b R such that b R = Argmin 2Rp kY X kR : This method was proposed by [33]. Under regularity conditions found in Chapter 3 of [44], the asymptotic distribution of the rank estimator is given by b RD !N( ;VR) ; (4.1.2) where VR = 2(X0X) 1 with 1 = Z 1 0 (u) f 0(F 1(u)) f(F 1(u)) du: (4.1.3) 40 A common rank estimator is the Wilcoxon estimator that uses (u) =p12(u 0:5). In this case, 1 =p12R1 1f2(t)dt. It can be shown that the rank estimate solves the estimating equations X0a(Y X ) = 0, where a(Y X ) denotes the n-vector with ith component a[R(Yi x0i )]. The solution cannot be obtained in closed form, but there are some algorithms available for obtaining an approximate solution [44]. One approach proposed by [34] uses iteratively reweighted least squares (IRLS) to estimate b R. The appeal of this method is its simplicity and that it can be obtained using any package that can compute LS estimates. The purpose of this chapter is to evaluate the nite sample performance of the IRLS estimate of b R and its variance. We provide computational algorithms as well as simulation results pertaining to the estimator, its variance, and its in uence function. 4.2 Reweighted Least Squares Formulation For any 2Rp, letri( ) denote theith residualYi xTi and r( ) = (r1( );:::;rn( ))T. Now b n minimizes the dispersion function Dn( ) :=kr( )kR = nX i=1 R(r i( )) n+ 1 ri( ) : (4.2.1) We will assume that the score generating function is standardized so that R10 (u)du = 0 and R10 2(u)du = 1. Since is nondecreasing, ( ) = 0 for some 2 (0;1) and is such that (u) 0 ( 0) whenever u (u ). Let m ( ) be the th quantile of the ri( ); i = 1;:::;n. The dispersion function in (4.2.1) may be written as Dn( ) = nX i=1 R(r i( )) n+ 1 [ri( ) m ( )] = nX i=1 wi( )[ri( ) m ( )]2 41 where wi( ) = 8 >>< >>: nR(r i( )) n+1 o ri( ) m ( ) ; if jri( ) m ( )j6= 0 ; 0 ; if jri( ) m ( )j= 0 : Due to the centering of the residuals by m ( ), the weights, wi( ), are nonnegative. The expression above suggests an iterative scheme. Given the kth step estimate, b k; the (k+1)th step estimate b k+1 minimizes the (k + 1)th step dispersion given by D n( jb k) = nX i=1 wi(b k)((Yi xTi ) m (b k))2; k = 0;1;::: ; (4.2.2) that is b k+1 = Argmin D n( jb k) : Such algorithms have been considered in the past for computing estimators. [50] used such a method to obtain least absolute deviation estimators. [51] used a similar strategy to obtain M-estimates of regression coe cients. [52] develop an IRLS algorithm for the dispersion function given by (4.2.1) but centering using the mean of the residuals at every step. It is shown in [34] that pnkb k b nk! 0 in probability as n!1 for every k 1 under some regularity conditions including the consistency of the initial estimator b 0. They argued that since the asymptotic properties of b n are known, the asymptotic properties of b k are also known and equivalent to those of b n for every k 1. What are not known, however, are the nite sample properties of b k. It is well known that the LS estimator is more e cient than the rank estimator when the error distribution is normal and the rank estimator has higher asymptotic e ciency than the LS estimator when the error distribution has long tails [44]. For nite samples, however, we expect the IRLS estimator discussed in this chapter to borrow some of its properties from LS. As such, we hypothesize that the IRLS estimator will have higher e ciency than both the LS and the rank estimators for error distributions with moderate tail thicknesses. 42 4.3 Computational Details and Results 4.3.1 Algorithm for Estimating Regression Coe cients Let Y i ( ) = Yi m ( ). Taking the derivative of equation (4.2.2) with respect to and setting the resulting expression equal to 0 we obtain X i wi(b k)[Y i (b k) xTi ]xi = 0 (4.3.1) which is the weighted LS estimating equation for the model Y i (b k) = xTi +"i ; 1 i n. Given an initial estimate b 0, we nd the (k + 1)th step IRLS estimate as b k+1 = (X0W(b k)X) 1X0W(b k)(Y m (b k)) (4.3.2) for k = 0;1;:::. Here we describe the algorithm used for computing the rank estimate of using IRLS. Algorithm 4.1 1. Let the initial estimator b 0 be given. Set cw and c to small positive numbers (usually 10 5) and set k = 1. 2. If jri(b k 1) m (b k 1)jc kb k 1k, then set k = k + 1 and return to step 2; else, stop. We compare our approach to that of the traditional approach of obtaining the rank estimate of . The traditional approach uses the quasi-Newton algorithm proposed by [53]. It per- forms a Newton step starting with an initial estimate. It then checks if the step is successful in reducing the dispersion function. If the step is successful, then it will continue Newton 43 step iterations. Otherwise, it performs a linear search to bracket the solution. We will use the Wilcoxon semi-norm and the R code given by [54] that uses quantile regression methods of [55] to compute the estimate using the traditional approach. This code is available at http://www.stat.wmich.edu/mckean/HMC/Rcode. We will refer to the estimate obtained using this method the HM estimate. The simulation considers estimating = ( 2; 3; 4)T in the model Yi = 1 + 2x2i + 3x3i + 4x4i +ei ; i = 1;:::;n: We generated xji from the Uniform(0;10j) distribution, j = 2;3;4. The errors ei were randomly generated from the t distribution or the contaminated normal distribution in each repetition. We shall set the true parameter = (0;0;0)T; that is we take Yi = ei. In repetition b, let the estimates of obtained using IRLS, HM, and LS be b (b)Ij , b (b)HMj, and b (b)LSj, respectively, for j = 2;3;4 and b = 1;:::;B. The average mean squared error estimates IRLS, HM, and LS are found as [MSEI = 1 3B 3X j=1 BX b=1 (b (b)Ij )2 ; [MSEHM = 1 3B 3X j=1 BX b=1 (b (b)HMj )2 ; and [MSELS = 1 3B 3X j=1 BX b=1 (b (b)LSj )2 ; respectively. The estimated relative e ciencies of IRLS with respect to HM and LS are RE(I;HM) = [MSEHM[ MSEI and RE(I;LS) = [MSELS[ MSEI ; where RE > 1 indicates that IRLS is more e cient than the corresponding procedure. 44 The t distribution was chosen as an error distribution since it covers a wide range of tail thicknesses starting with the Cauchy (t1) to the normal (t1). The contaminated normal distribution is ideal for studying the e ect of outliers on estimates. We will consider a scale contamination of the N(0;1) distribution which is created by sampling a proportion 1 of the time from the N(0;1) distribution and a proportion of the time from the N(0; 2) distribution. We will use CN( ; ) to represent the contaminated normal distribution whose distribution function is given by F ; (x) = (1 ) (x) + (x= ) ; where is the standard normal distribution function. Table 4.1 and 4.2 give the results of our simulation study for B = 5000 repetitions. As can be seen in Table 4.1, when the errors are Table 4.1: Relative e ciencies of IRLS versus HM and LS when the errors follow a tdf distribution df = 3 df = 6 df = 12 n HM LS HM LS HM LS 10 0.981 1.267 1.051 1.042 1.057 .987 15 0.965 1.474 1.024 1.040 1.046 .993 20 0.945 1.433 1.011 1.065 1.039 1.005 generated from the tdf distribution, HM is more e cient than LS and IRLS when the error distribution is heavy tailed (df = 3). LS is more e cient than HM and IRLS when the tail of the distribution approaches the tails of the normal distribution (df = 12). IRLS is more e cient than HM and LS for a moderate-tailed distribution (df = 6). Table 4.2 contains Table 4.2: Relative e ciencies of IRLS versus HM and LS when the errors follow a CN( ;3) dis- tribution = 0 = :05 = :10 n HM LS HM LS HM LS 10 1.096 .970 1.040 1.062 1.041 1.022 15 1.059 .968 1.011 1.094 .984 1.146 20 1.057 .974 .996 1.117 .957 1.182 results of the case where the error distribution is N( ;3). The results for CN( ;5) were 45 similar; hence omitted. It is clear that LS is the most e cient of the three when the error distribution is normal ( = 0). For large contamination ( = :10), HM is more e cient than LS and IRLS except for very small sample sizes. IRLS seems to have superior performance for such cases. IRLS is the most e cient of the three for moderate contamination ( = :05). 4.3.2 Variance Estimator Suppose b k is the consistent kth step IRLS estimator of . Given b k, the variance of b k+1 given in (4.3.2) is bV = E[Var(b k+1jb k)] +Var[E(b k+1jb k)] ; where Var(b k+1jb k) = [(X0W(b k)X) 1X0W(b k)]Var(Y)[(X0W(b k)X) 1X0W(b k)]0 (4.3.3) and E(b k+1jb k) = (X0W(b k)X) 1X0W(b k)(X0 m (b k)) = (X0W(b k)X) 1X0W(b k)m (b k) (4.3.4) We will use the leave-one-out procedure described below to obtain estimates of (4.3.3) and (4.3.4). Algorithm 4.2 1. For i = 1;:::;n, leave the ith observation out, calculate the regression coe cient b (i) using the (n 1) observations in Algorithm 4.1. Let the number of steps required to converge be k(i) + 1. 2. Use (4.3.3) and (4.3.4) to calculate Var(b (i)k(i)+1jb (i)k(i)) and E(b (i)k(i)+1jb (i)k(i)) 46 3. Set bV1 = 1nPni=1 Var(b (i)k(i)+1jb (i)k(i)) and bV2 = Var( 1nPni=1E(b (i)k(i)+1jb (i)k(i))). 4. Take bV = bV1 + bV2 as estimate of V. In the literature, there are a number of proposed methods to estimate V = 2(X0X) 1. All of them are based on estimating from the set of residuals via either considering lengths of nonparametric intervals based on the Walsh averages of the residuals [53] or nonparametric density estimation [56]. We shall compare our estimates of V obtained through Algorithm 4.2 to that of [56]. The algorithm of [56], described brie y here, is implemented in the R code of [54]. The method of [56] uses kernel density estimates with rectangular kernels. Letbe1;:::;ben be the estimated residuals and let bHn(y) = 1 n nX i=1 nX j=1 [ (j=n) (i=n)]I(jbe(i) be(j)j y); where (u) = (u) (0) (1) (0) : A consistent estimator of is given by ^ = r n n p 1 ( [ (1) (0)]bHn(bH 1n ( ))=pn 2bH 1n ( )=pn ) 1 : The recommended values of are 0.8 if n > 5p and 0.9 otherwise. Similar to the case of coe cients, we will refer to the estimate of the variance obtained this way as HM. In the following, we perform a Monte Carlo comparison of the variances of the IRLS method to HM. The setting of the study is the same as the one used in Subsection 4.3.1 and the MSEs are computed in a similar fashion. Since the distribution of the errors is speci ed in the simulation, the true value of is found via exact or numerical integration using 1 =p12R1 1f2(t)dt. The relative e ciencies of IRLS versus HM using the average MSEs are given in Tables 4.3 and 4.4. We can see from Table 4.3 that when the errors 47 Table 4.3: REs of bV of IRLS vs HM when ei tdf n df = 3 df = 6 df = 12 10 .481 6.608 3.674 15 .391 2.200 2.482 20 .044 1.413 1.710 follow the tdf distribution, HM is more e cient for heavier tailed distributions (df = 3) and IRLS is more e cient for moderate and light tailed distributions (df = 6;12). From Table 4.4: REs of bV of IRLS vs HM when ei CN( ;3) n = 0 = :05 = :10 10 9.907 2.827 3.809 15 3.871 1.775 1.916 20 2.431 .855 .347 Table 4.4, we observe that for the CN( ;3) distribution, HM is more e cient for larger contamination ( = :05;:10) when the sample size is relatively large (n = 20). Generally IRLS gives more e cient estimates when there is no contamination = 0 or the sample size is small (n = 10;15). 4.3.3 In uence Function One measure of robustness is the in uence function [57] which measures the amount of change in the estimator that can be brought about by an in nitesimal local contamination. Following the approach of [58], we can derive the in uence function of b k+1 given in (4.3.2) at a given point (x0;y0). Let r = y x0 and de ne Aw = 1w w and z(x0;y0) = 1w w(y0 x00 )(y0 x00 )x0, where w = E r@w@r xx0 and w = E[wxx0] : 48 Now the in uence function of b k+1 at (x0;y0) is IF(b k+1; x0;y0) = kX i=0 Aiwz(x0;y0) +Ak+1w IF(b 0; x0;y0) : Note that for our weights z(x0;y0) = 1w (F(y0 x00 ))x0 : So, IF(b k+1; x0;y0) will be bounded in the y-direction if we use a bounded score function and IF(b 0; x0;y0) is bounded in the y-direction. However, IF(b k+1; x0;y0) remains un- bounded in the x-direction regardless of the boundedness of IF(b 0; x0;y0). This is in fact true for rank estimates. Rank estimates that achieve bounded in uence have been given by [47] and [59]. These use the so-called weighted Wilcoxon dispersion function proposed by [60]. The resulting dispersion function is not the same as the one in (4.2.1). To evaluate the robustness of the IRLS estimator in nite sample cases, we performed a Monte Carlo analysis using the sensitivity curve of the estimator. The sensitivity curve is the empirical in uence function de ned by [61, 62] SCn(z) = b (z;Z1;:::;Zn) b (Z1;:::;Zn) 1=(n+ 1) ; where Zi = (Xi;Yi) for i = 1;:::;n, z 2Rp+1, and b (Z1;:::;Zn) represents the estimator of based on the observations Z1;:::;Zn. This is used to get an idea of the sensitivity of b to local changes. We performed a Monte Carlo analysis of the sensitivity of the IRLS estimator. In our simulation, we considered a simple linear model through the origin given by Yi = Xi +"i ; i = 1;:::;100 ; 49 where Xi N(0;1), "i N(0;16 1), and the true value of was taken to be zero. We obtained 1000 IRLS estimates of , each time using the generated data and after a point (x;y) 2f( 15; 15);( 14:9; 14:9);:::;(15;15)g was added. As initial values, we used the least squares estimate of as well as the high-breakdown and bounded in uence MM estimate of [63]. The average sensitivity curve of the initial (left panel) and the fully iterated IRLS estimate (right panel) are given in Figure 4.1. It is clear from the gure that MM is Figure 4.1: Sensitivity Curves -15 -10 -5 0 5 10 15 0 10 30 50 70 LS z SC (z) -15 -10 -5 0 5 10 15 0.00 0.10 0.20 MM z SC (z) -15 -10 -5 0 5 10 15 0 2 4 6 8 10 LS-IRLS z SC (z) -15 -10 -5 0 5 10 15 0 2 4 6 8 MM-IRLS z SC (z) a robust estimator while LS has an unbounded sensitivity curve. It is also clear that, even when we start with a highly robust estimator, the IRLS estimate of remains sensitive to outlying values in the data. 50 4.4 Tests of Linear Hypotheses Consider testing a general linear hypothesis of the form H0 : A = 0 versus H1 : A 6= 0 (4.4.1) where A is a q p hypothesis matrix with rank q. For example, when A is the p p identity matrix, H1 corresponds to regression signi cance. The Wald test rejects H0 if T2 = (Ab )0(AbVA0) 1(Ab ) (4.4.2) is larger than 21 (q). However, as shown in [64], for nite samples, a better test rejects H0 if T2=q > F1 (q;n p 1), where F1 (q;n p 1) corresponds to the (1 )100% percentile of the F distribution with q and n p 1 degrees of freedom. We now perform a small Monte Carlo simulation to compare the performance of T2 obtained via IRLS and HM. To this end, we consider the model used in Subsection 4.3.1 with n = 30. We are interested in testing H0 : = 0 versus H0 : 6= 0 : This is equivalent to using A = I3, the 3 3 identity matrix, in the general linear hypoth- esis (4.4.1). Table 4.5 contains the proportion of rejections under H0 in 1000 repetitions. Table 4.5 shows that the Wald test based on IRLS appears to overestimate the nominal Table 4.5: Empirical levels for IRLS vs HM N(0;1) CN(:1;3) :01 :05 :10 :01 :05 :10 IRLS :024 :072 :129 :012 :063 :083 HM :004 :035 :074 :005 :037 :070 51 and the Wald test based on HM underestimates . IRLS comes closer to the nominal in the case of contaminated normal errors. 4.5 Conclusion For linear models, the least squares estimator of the regression coe cient is optimal if the error distribution is normal. For distributions with longer tails than the normal, however, rank estimators are more e cient than the least squares estimator. [34] proposed an IRLS method for estimating rank estimators. In this chapter, we study the nite sample performance of the IRLS method. For Wilcoxon scores, using the LS estimate as an initial estimate, we nd that the IRLS algorithm leads to estimates whose e ciency is between those of the LS and rank estimates obtained using the algorithm of [54]. The LS estimates are e cient when the distribution of the error is short-tailed. The algorithm of [54] gives e cient estimates when the distribution of the error is long-tailed while the IRLS algorithm gives e cient estimates when the distribution of the error is moderate-tailed. It is also observed that there is a need for an IRLS formulation of bounded in uence rank estimators [47]. The boundedness of the initial estimator is a necessary condition for the boundedness of the in uence function of IRLS estimator, but it is not su cient. 52 Chapter 5 IRLS Estimation of Rank Generalized Estimation Equations Model 5.1 Introduction The analysis of data resulting from repeated measurement designs such designs often be- comes complicated because of the non-zero within-subject correlation. To avoid this problem [10] proposed the Generalized Estimation Equations (GEE) Model, an extension of general- ized linear models to the analysis of longitudinal data. They introduced a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject?s observations yet they reduce to the score equations for multivariate Gaussian outcomes. The GEE specify only the relation- ship between the marginal mean of the response variable and covariates. Within-subject correlation is then accounted for through a ?working? correlation matrix. Since then, the GEE model has been used to deal with many speci c data models, for example, for binary models [65] [66], in nonlinear regression models [67], in proportional odds model [68], in Binomial model [69] [70], in logistic model [71], in contingency tables [72], in mixed logistic models [73], in normally distributed models [74]. However, the solution of GEEs proposed by [10] is obtained using an iterated reweighted least squares tting which is not robust. One solution proposed by [30] is by introducing a diagonal weight matrix for the within-subject correlation into the estimating equations. An- other solution is one given by [31] who proposed an adaption of the Wilcoxon-Mann-Whitney method of estimating linear regression parameters for use in longitudinal data analysis under the working independence model. They used joint ranking (JR) of all observations in their development. [32] consider the same model as [31] but they use separate between-subject 53 and within-subject ranks to specify their Wilcoxon-Mann-Whitney estimating equation. [2] proposed a rank-based tting procedure that only involves substituting a norm based on a score function for the Euclidean norm used by [10]. Their subsequent tting, while also an iterated reweighted least squares solution to GEEs, is robust to outliers in response space and further the weights can easily be adapted for robustness in factor space. In this Chapter, we will rst introduce the ordinary Least Square based GEE models proposed by [10] and the Iterated Reweighted Rank-Based Estimates for GEE models. Then under similar assumptions, we introduce our selective predation pattern problem as a special case of the two di erent GEEs. 5.2 Ordinary LS based GEE models Consider a longitudinal set of observations over K subjects, let yij denote the jth re- sponse for the ith subject for j = 1;2:::;ni and i = 1;2;:::;K. Assume that xij is a p 1 vector of corresponding covariates. Let N = PKi=1 ni denote the total sample size. Assume that the marginal distribution of yij is of the exponential class of distributions and is given by f(yij) = exp[yij ij a( ij) +b(yij)] (5.2.1) where > 0, ij = h( ij) and ij = xTij . Thus the mean and variance are given by E(yij) = a0( ij) and Var(yij) = a00( ij)= In this notation, the link function is h 1 (a0) 1. Let Yi = (yi1;:::;yini) and Xi = (xi1;:::;xini) denote the ni 1 vector of response and the ni p matrix of covariates, respectively, for the ith individual. We consider the general case where the components of the vector of responses for the ith subject, Yi, are dependent. Let i = ( i1; i2;:::; ini)T, so that E(Yi) = a0( i) = (a0( i1);:::;a0( ini))T. For a s 1 vector of unknown parameters , let Ri = Ri( ) denote a ni ni correlation matrix. De ne 54 the matrix Vi = A1=2i Ri( )A1=2i = (5.2.2) where Ai = diag(a00( i1);:::;a00( ini)). We refer to Ri as the working correlation matrix. Vi will be equal to cov(Yi) if Ri( ) is the true correlation matrix for the Yi. For estimation, let bVi be an estimate of Vi and bRi be an estimate of Ri, which, in general, requires estimation of and often an initial estimate of . In general, we will denote the estimator of by ^ ( ; ) to re ect its dependence on and . Liang and Zeger (1986) de ned their estimate in terms of general estimation equations (GEE). De ne the ni p Hessian matrix, Di = @a 0( i) @ ;i = 1;:::;K Then their GEE estimator b LS is the solution to the equations XK i=1D T i bV 1 i [Yi a 0( i)] = 0: (5.2.3) which we denoted by XK i=1Ui[ ; b f ; ^ ( )g] = 0 We can de ne the dispersion function in terms of the Euclidean norm. DLS( ) = XK i=1[Yi a 0( i)]TbV 1 i [Yi a 0( i)] = XK i=1[ bV 12i Yi bV 12i a0( i)]T[bV 12i Yi V 12i a0( i)] = XK i=1 Xni j=1[y ij dij( )] 2 55 where Y i = bV 12i Yi = (y i1;:::;y ik)T, dij( ) = cTj a0( i), and cTj is the jth row of bV 12i . The gradient of DLS( ) is rDLS( ) = XK i=1D T i bV 1 i [Yi a 0( i)]: (5.2.4) Thus the solution to (5.2.3) also can be expressed as ^ LS = ArgminDLS( ) (5.2.5) From this point of view, ^ LS is a nonlinear least squares (LS) estimator. The following theorem given by [10] establishes the asymptotic normality of ^ LS. Theorem 5.1 Under mild regularity conditions and given that: A.1 ^ is K12 -consistent given and . A.2 ^ is K12 -consistent given ; and A.3 j@b ( ; )=@ j H(Y; ) which is Op(1). Then K12 (b LS ) is asymptotically multivariate Gaussian with zero mean and covariance matrix VLS given by VLS = lim K!1 K( XK i=1D T i V 1 i Di) 1fXK i=1D T i V 1 i cov(YiV 1 i Di)g( XK i=1D T i V 1 i Di) 1 5.3 Iterated Reweighted Rank-Based Estimates for GEE Models [75] developed a class of nonlinear robust estimators. Similar to nonlinear LS estimators, these estimators minimize a norm of the residuals where, for a vector v2 Rn, the norm is de ned by kvk= nX i=1 [r(vi)=(n+ 1)]vi 56 where r(vi) denote the rank of vi among v1;:::;vn and the score function (u) is a nonde- creasing, square-integrable function de ned on the interval (0;1), Without loss of generality, we standardized so that Z (u)du = 0 and Z (u)2du = 1 For nonnegative weights, we need one other assumption on the score function. For discussion, we also assume that the score function is odd about 1=2; that is (1 u) = (u) let Y i = bV12i Yi = (y i1;:::;y ik)T,gij( ) = cTj a0( i), where cTj is the jth row of bV12i , and let G i = [gij]. The rank-based dispersion function is given by DR( ) = XK i=1 Xni j=1 [r(y ij gij( ))=(n+ 1)][y ij gij( )] (5.3.1) We next write the R estimator as weighted LS estimator. From this representation the asymptotic theory of the R estimator can be derived. Furthermore, it naturally suggests an IRLS algorithm. Let eij = y ij gij denote the (i;j)th residual and let m( ) = med(i;j)feij( )g denote the median of all the residuals. Then because the scores sum to 0, we have the identity, DR( ) = XK i=1 Xni j=1 [r(eij( ))=(n+ 1)][eij( ) m( )] = XK i=1 Xni j=1 [r(eij( ))=(n+ 1)] eij( ) m( ) [eij( ) m( )] = XK i=1 Xni j=1wij( )[eij( ) m( )] 2 where wij( ) = [r(eij( ))=(n+ 1)]=[eij( ) m( )] is a weight function. As usual, we take wij( ) = 0 if eij( ) m( ) = 0. Note that the weights are nonnegative. 57 Now let b (0)R denote an initial estimator of . As estimates of the weights, we use bwij(b (0)R ); i.e., the weight function evaluated at b (0), we have the dispersion function D R( jb (0)R ) = XK i=1 Xni j=1bwij( b (0)R )[eij( ) m(b (0)R )]2 = XK i=1 Xni j=1[ q ^wij(b (0)R )eij( ) q bwij(b (0)R )m(b (0)R )]2 Let b (1)R = ArgminD ( jb (0)R ) This establishes a sequence of IRLS estimates, b (k)R g;k = 1;2;::: After some algebraic simpli cation, we can obtain the gradient rD R( jb (k)R ) = 2 XK i=1D T i bV 12 i cWibV 12 i [Yi a 0( ) m (b (k) R )] (5.3.2) where m (b (k)R ) = bV12i m(b (k)R )1,1 denotes a ni 1 vector all of whose elements are 1, and cW = diagfbwi1;:::;bwin igis the diagonal matrix of weights for the ith subject. Hence b (k+1)R satis es the general estimating equations (GEE) given by, XK i=1D T i bV 12 i cWibV 12 i [Yi a 0( ) m ( (k) R )] = 0 (5.3.3) Which we denoted by XK i=1Zi( ; ( )) = 0 Theorem 5.2 Under these assumptions, A.1 pKj^ ( ) j= Op(1), as K!1, when is known. A.2 pKjb ( ; ) j= Op(1) when and are known. A.3 j@b ( ; )=@ j H(Y; ) which is Op(1) 58 A.4 (Lindeberg-Feller Conditions): For i = 1;:::;K, let Ui = V 12i and UN = [UT1 UT2 :::UTK]T. Denote the (l;j)th entry of UN by ulj,l = 1;2;:::;n;j = 1;2;:::;ni: Then max 1 l K u2ljP n m=1 u2mj!0;for all j = 1;:::;ni and lim K!1 1 nU T NUN exists and is positive de nite: A.5 The score function ?(u) is bounded and satis es the standardizing conditions. A.6 The marginal pdf of eyij = yij gij( ) is continuous and variance-covariance matrix given in the following is positive de nite. Assume that the initial estimate satis es pK(b 0R ) = Op(1). Then under the above assumptions, for k 1, pK(b (k)R ) has an asymptotic normal distribution with mean 0 and covariance matrix, lim K!1 Kf XK i=1D T i V 12 i WiV 12 i Dig 1fXK i=1D T i V 12 i Var(? y i)V 12 i Digf XK i=1D T i V 12 i WiV 12 i Dig where ?i denotes the ni 1 vector (?[r(eyi1=(n+ 1))];:::;?[r(eyini)=(n+ 1)])T and where Yyi = V 1=2i Yi = (yyi1;:::;yyik)T Gyi( ) = V 1=2i a0i( ) = [gyij] eyij = yyij gyij( ) 5.4 Our Case for the Ordinary LS based GEE Model In our case of studying two sample predation preference, we are considering a longitudi- nal set of observations over n1 +n2 subjects, Using the notation in Section 5.2, K = n1 +n2 59 andni = k. letuijdenote thejth response forith subject forj = 1;2:::;k andi = 1;2;:::;K then we can write the model as: uij = +xTij where = ( 1; 2;:::; k)T, is a known intercept. If is unknown here, use the median of xij, where n1 < i K. Here xij = I(i n1)Ij, where Ij is a k 1 vector with the jth item equal to 1, and all the other items equal to 0. If we denote uij by yij then our model becomes: yij = xTij (5.4.1) Let N = (n1 +n2)k denote the total sample size. Assume that the marginal distribution of yij is of the exponential class of distributions and is given by (5.2.1). In our case, ij = ij = xTij . the link function h 1 (a0) 1 = I and the mean and variance are given by E(yij) = a0( ij) = xTij and Var(yij) = a00( ij)= = 1 Ai = diagfa00( i1);:::;a00( ik)g= I Vi = A1=2i Ri( )A1=2i = = Ri( )= a0( i) = XTi , which means Di = 8 >>< >>: I; if i n1 ; 0; if n1 + 1 i K : So the GEE estimator b LS actually is the solution to the equations Xn1 i=1 bV 1i [Yi XTi ] = 0: (5.4.2) 60 which we denote by Xn1 i=1Ui[ ; b f ; ^ ( )g] = 0 and the gradient of DLS( ) is rDLs( ) = Xn1 i=1 bV 1i [Yi XTi ] We have the similar theorem as Theorem 5.1. The proof is given in the appendix. Theorem 5.3 Under mild regularity conditions and given that: A.1 b is K12 -consistent given and . A.2 ^ is K12 -consistent given ; and A.3 j@b ( ; )=@ j H(Y; ) which is Op(1). Then K12 (b LS ) is asymptotically multivariate Gaussian with zero mean and covariance matrix VLS given by VLS = limn 1!1 n1( Xn1 i=1V 1 i ) 1fXn1 i=1V 1 i cov(Yi)V 1 i g( Xn1 i=1V 1 i ) 1 5.5 Our Case for the Iterated Reweighted Rank-Based Estimators for GEE Models In our case, rst we need an initial intercept ^ which is given by median(uij xTij 0R), where 0R is the initial R satisfying pK(b 0R ) = Op(1) and n1 t)1 P0(U11 U11 +U(n1+1)2 >t)dF(t);; B = Z P0(U11 U(n1+1)1 +U(n1+1)2 >t)2dF(t) ; and C = Z fP0(U11 U(n1+1)1 +U(n1+1)2 t)1 P0(U11 U(n1+1)1 +U(n1+1)2 >t)dF(t) = A (4) p6= v;q = u E[I(U1p U(n1+1)p t)1 P0(U11 U(n1+1)1 +U(n1+1)2 >t)dF(t) = A Thus kX u=1 kX v=1;u6=v u v kX p=1;p6=u kX q=1;q6=v E[I(X1p Y1p t)2dF(t) = B Thus kX u=1 kX v=1 u v kX p=1;p6=u kX q=1;q6=v E[I(X1p Y1p