Rank Based Methods for Repeated Measurement Data
by
Yankun Gong
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial ful llment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
August 6, 2011
Keywords: Rank, Nonparametric, Predation Preference, Weighted Function, IRLS,
Repeated Measures, Sensitivity Curve, Relative E ciency
Copyright 2011 by Yankun Gong
Approved by
Asheber Abebe, Chair, Associate Professor of Mathematics and Statistics
Peng Zeng, Associate Professor of Mathematics and Statistics
Mark Carpenter, Professor of Mathematics and Statistics
Abstract
This dissertation considers rank based methods for one sample and two sample repeated
measurement data. As a speci c example, in Chapter 2, this dissertation considers nonpara-
metric tests for selective predation. [1] proposed a nonparametric tests for selective predation
using the linear score function. Motivated by this method, general rank tests are given for
the case of one predatory species and prey characterized by a binary feature of interest and
the case of two predatory species and prey characterized by either a continuous or a categori-
cal feature of interest. The score functions used to construct the test statistics are monotone
and hence the test is designed to detect simple ordered alternatives. The results based on
the asymptotic Gaussian distribution of the test statistics show that the tests retain nominal
Type-I error rates. The results also show that power of the asymptotic test depends on the
chosen score function. In Chapter 2, we study the one sample and two sample repeated
measurement data with random censoring and the simulation results show that we can take
the asymptotic distribution as the underlying distribution of the test statistic even with a
high censoring rate and a small sample size.
In Chapter 3, this dissertation considers using rank based methods to test the trend
of the di erence between two samples for two sample repeated measurement data. As a
speci c example, we consider nonparametric methods for testing whether the rate of prey
feature change in the selection of one species is faster than that of another species. Although
the Page test is used in conjunction with a single randomized complete block design, we
extend it to the situation where we have two randomized complete block designs. We derive
the asymptotic distribution of a general test statistic which includes the Page statistic as a
special case. The results based on the asymptotic Gaussian distribution of the test statistics
show that the tests retain nominal Type-I error rates.
ii
In Chapter 4, the  nite sample performance of the rank estimator of regression coe -
cients obtained using the iteratively reweighted least squares (IRLS) of Sievers and Abebe
(2004) is evaluated. E ciency comparisons show that the IRLS method does quite well in
comparison to least squares or the traditional rank estimates in cases of moderate tailed er-
ror distributions; however, the IRLS method does not appear to be suitable for heavy tailed
data. Moreover, the results show that the IRLS estimator will have an unbounded in uence
function even if we use an initial estimator with a bounded in uence function.
In Chapter 5, this dissertation study how to test the trend of the di erence in the
two sample repeated measurement data using two generalized estimating equation (GEE)
models: the likelihood based GEE model proposed by Liang and Zeger (1986) and the IRLS
Rank-based GEE model proposed by [2]. The results show that the GEE model proposed by
[2] is robust to outliers in response space and can be used to analyze data with small sample
sizes compared to the GEE model proposed by Liang and Zeger (1986).
iii
Acknowledgments
My greatest thanks extend to Dr. Asheber Abebe for his patience and guidance through-
out this endeavor. Dr. Abebe took considerable time and energy to further the progress
in my studies of Statistics. I also wish to express my appreciation to my parents and my
brother Yanfei Gong who have given their love and constant support throughout my life. I
would also like to thank Dr. Peng Zeng and Dr. Mark Carpenter on the committee for their
contribution to this dissertation and serving as committee members.
iv
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Rank Tests for Selective Predation . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Data and Model in [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Experiment Frame and Data . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Test Statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 General Rank Test for Preference: One-Sample . . . . . . . . . . . . . . . . 12
2.4 Two Sample Comparison of Preference Patterns . . . . . . . . . . . . . . . . 14
2.4.1 Continuous Prey Feature . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.2 Categorical Prey Feature . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Censored Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 One-sample Selection Problem . . . . . . . . . . . . . . . . . . . . . . 18
2.5.2 Two Sample Selection Problem . . . . . . . . . . . . . . . . . . . . . 19
2.6 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6.1 Null Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6.2 Power Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.3 Null Simulation for Two Sample Case with Censoring . . . . . . . . . 26
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Testing the Di erence Trend in Two Sample Repeated Measurement Data . . . 28
v
3.1 Motivation of the Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Test Statistic and Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.1 Null Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.2 Results for the data in [48] . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Iteratively Reweighted Rank Regression Estimator . . . . . . . . . . . . . . . . 39
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Reweighted Least Squares Formulation . . . . . . . . . . . . . . . . . . . . . 41
4.3 Computational Details and Results . . . . . . . . . . . . . . . . . . . . . . . 43
4.3.1 Algorithm for Estimating Regression Coe cients . . . . . . . . . . . 43
4.3.2 Variance Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.3 In uence Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 Tests of Linear Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 IRLS Estimation of Rank Generalized Estimation Equations Model . . . . . . . 53
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2 Ordinary LS based GEE models . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Iterated Reweighted Rank-Based Estimates for GEE Models . . . . . . . . . 56
5.4 Our Case for the Ordinary LS based GEE Model . . . . . . . . . . . . . . . 59
5.5 Our Case for the Iterated Reweighted Rank-Based Estimators for GEE Models 61
5.6 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Appendix: Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
vi
List of Figures
2.1 Figure for Null Rejection Rates in Table 2.2 . . . . . . . . . . . . . . . . . . . . 23
3.1 Boxplots for CRP levels at time 0 and time 24 for the two groups HI and LO . 30
3.2 Mean and median pro les for the CRP levels in two groups HI and LO . . . . . 37
4.1 Sensitivity Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.1 Residul plots for The ordinary GEE and rank-based GEE . . . . . . . . . . . . 64
vii
List of Tables
2.1 Selections of  ve large or small armyworms in bird trials . . . . . . . . . . . . . 10
2.2 Simulated null rejection rates of W+ using its asymptotic normal critical constant
for continuous distributions,  (t)/t;  = :05. . . . . . . . . . . . . . . . . . . 22
2.3 The simulated power of W+ using its asymptotic normal distribution where Xj 
N( j;1); Yj N(0;1); 1 j k;  = 0:05 . . . . . . . . . . . . . . . . . . . . 25
2.4 Simulated null rejection rates ofW+ using its asymptotic normal critical constant,
 (t)/t;  = :05;P("rj = 1) = P( sj = 1) = :5. . . . . . . . . . . . . . . . . . . 26
3.1 CRP levels in [48] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Simulated null rejection rates of W using its asymptotic normal critical con-
stant,  (t)/t;  = :05 and k = 5. . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1 Relative e ciencies of IRLS versus HM and LS when the errors follow a tdf
distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Relative e ciencies of IRLS versus HM and LS when the errors follow a CN( ;3)
distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 REs of bV of IRLS vs HM when ei tdf . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 REs of bV of IRLS vs HM when ei CN( ;3) . . . . . . . . . . . . . . . . . . . 48
4.5 Empirical  levels for IRLS vs HM . . . . . . . . . . . . . . . . . . . . . . . . . 51
viii
Chapter 1
Introduction
In the 70+ years since their origin in the mid-1930s, nonparametric statistical methods
have  ourished and have emerged as valuable methodology for statisticians and other scien-
tists performing data analysis. Roughly speaking, a nonparametric procedure is a statistical
procedure that has certain desirable properties that hold under relatively mild assumptions
regarding the underlying population from which the data are obtained. Nonparametric tech-
niques have the following advantages compared to parametric techniques:
1. Nonparametric methods require few assumptions about the underlying population from
which the data are drawn. In particular, nonparametric procedures forgo the tradi-
tional normality assumption.
2. Nonparametric procedures enable the user to obtain exact p-values for tests, exact
coverage probabilities for con dence intervals, exact experimentwise error rates for
multiple comparison procedures, and exact coverage probabilities for con dence bands
without relying on assumptions that the underlying populations are normal.
3. Nonparametric techniques are often (although not always) easier to apply than their
normal theory counterparts.
4. Usually the nonparametric procedures are only slightly less e cient than their normal
theory competitors when the underlying populations are normal, and they can be
mildly or wildly more e cient than these competitors when the underlying populations
are not normal.
5. Nonparametric methods are widely used for studying populations that take on a ranked
order (such as movie reviews receiving one to four stars). The use of nonparametric
1
methods may be necessary when data have a ranking but no clear numerical interpre-
tation, such as when assessing preference; in terms of levels of measurement, for data
on an ordinal scale.
6. Due to the reliance on fewer assumptions, nonparametric methods are relatively robust
to outlying observations and other violations.
In this dissertation, we discuss the nonparametric techniques in repeated measurement data.
Repeated measurement data, in which the same response variable is recorded on each ob-
servational unit on several di erent occasions, occur frequently in many di erent disciplines.
One example is that sequential experiments for treatment comparisons are widely used in
clinical (e.g., a new treatment and a control or placebo) or pharmaceutical trials and in many
other applied contexts. There are two possible inferential goals in these studies. One goal
is to establish a signi cant overall di erence between the two treatments under study, the
other goal is after we conclude that there does exist a di erence between the two treatments,
to check if the di erence stays the same, decays, or expands over time.
Several approaches have been considered in the past to deal with comparison of two
groups with respect to their repeated measures. A useful and common initial step in the
analysis of repeated measures data is to graph the data in some way. A method often
employed, particularly in medical publications, is to plot means by treatment group for
every time point. An example of such a plot for the data from the trail of two treatments
for the control of intestinal parasites in cattle can be found in [3].
Another commonly used method of analysis for repeated measures that involve a number
of treatment groups, particularly in medical and related research, is to compare the groups
at each time point, by using either t-test or some nonparametric equivalent. [4] suggested
that this approach may be quite useful if the occasions are few and the intervals between
them are large.
A more relevant, but still relatively straightforward, approach to the analysis of repeated
measures data is that involving the use of summary measures, and sometimes known as
2
response feature analysis. Here the responses for each subject are used to construct a single
number that summarizes some aspect of the subject?s response pro le. (In some cases more
than a single summary measure may be used). The summary measure to be used needs to
be chosen before the analysis of the data and should, of course, be relevant to the particular
questions of interest in the study. [5] gave a list of potentially useful summary measures. [6]
suggest three methods of analysis using summary statistics, which are post-treatment means
(POST), mean changes (CHANGE) and analysis of covariance (ANCOVA).
The multivariate procedure involves testing on a set of transformed variables represent-
ing the within-subject di erence of each within-subject factor and their interactions. The
hypothesis that the means of a set of transformed variables representing a within-subject
factor, or an interaction between within-subject factors, are zero can be tested by using
Hotelling?s T2 statistics which is introduced in [7]. To deal with the cases when the variables
are not normally distributed or the sample size is small, [8] proposed numerous nonparamet-
ric approaches.
The repeated measures data most appropriately analysed by the methods described
above are those from designed experiments where all subjects have the same number of ob-
servations measured at equivalent time interval. Since these conditions often do not hold in
clinical trials, methods to accommodate missing data have been developed ([9]). Many of
these are likelihood-based methods ([10], [11]). When values are missing completely at ran-
dom, non-parametric and semiparametric methods have been developed to analyze such data
([10], [12], [13]). When the missing values are not random, [14] propose a semiparametric
estimation procedure for treatment di erences over time based on repeated measurements
of an outcome variable when the patient?s follow-up time may depend on observed and/or
missing measurements. They examine the e ect of switching to didanosine(ddI) from zi-
dovudine(AZT) for HIV-infected patients who had tolerated AZT for at least 16 weeks, 304
patients were randomly chosen to continue AZT therapy and 298 patients were assigned to
3
take ddI at a daily dose of 500 mg. The patient?s CD4 cell count, a proxy for the progres-
sion of HIV infection, was obtained periodically during the trial. In addition to the usual
clinical endpoint analysis, they also evaluate the group di erence based on such repeated
measurements.
Apart from clinical trials, another important application  eld of repeated measurements
is to study the predation patterns in evolutionary Biology. In Chapter 2 of this dissertation,
we use the Wilcoxon signed rank statistic to study the repeated measurement data model
in [1]. Our intention is to develop nonparametric hypothesis tests for selective predation as
a result of values taken by prey feature of interest. We do this by extending the method of
[1] to allow for di ering gradients in prey selection and random censoring. We also extend
the method to study the two sample predation preference.
If we conclude that the two predator species do have di erent prey preference, in order
to test whether the rate of prey feature change in the selections of one species is faster than
that of another species, in Chapter 3, we need to use theory of long memory processes.
The phenomenon of long memory had been known long before suitable stochastic mod-
els were developed. Scientists in diverse  elds of statistical applications observed empirically
that correlations between observations that are far apart (in time or space) decay to zero at
a slower rate than one would expect from independent data or data following classic Markov-
type models. As a result of Mandelbrot?s pioneering work, self-similar and related stationary
processes with long memory were later introduced to statistics, to provide a sound mathe-
matical basis for statistical inference. Since then, long memory (or long-range dependence)
has become a rapidly developing subject. Because of the diversity of applications, the litera-
ture on the topic is broadly scattered in a large number of journals, including those in  elds
such as agronomy, astronomy, chemistry, economics, engineering, environmental sciences,
mathematics, physics, geosciences, hydrology, and statistics. Best known is the occurrence
of long-range dependence in geophysics and hydrology (for a review, see [15]). In particular,
the so-called Hurst E ect ([16]) can be explained by slowly decaying correlations. However,
4
there are many other  elds of application where this type of correlation occurs. As early
as 1895 the astronomer Newcomb discussed the phenomenon of long-range dependence in
astronomical data sets and called it "semi-systematic" errors. He also proposed a heuristic
explanation by superposition of independent random errors and constant systematic errors.
In 1902, Karl Pearson observed slowly decaying correlations in simulated astronomical obser-
vations. Further examples are discussed, for instance, by [17] and [18] for economical data,
[19] for data from biology, geophysics, meteorology and hydrology, [20] for meteorological
data, [21] for telecommunication data and [22] and [23] for agricultural data.
In Chapter 3, using stationary process with long-memory theory, we propose a procedure
that is analogous to the Page test [24] to study the trend of the di erence between two
samples of repeated measurement data. Although the Page test is used in conjunction with
a single randomized complete block design, we extend it to the situation where we have two
randomized complete block designs. We derive the asymptotic distribution of a general test
statistic that includes the Page statistic as a special case. This provides an approximate  
level test for the changing rates in the simple ordering case.
For the repeated measurement data, when the outcome variable is approximately Gaus-
sian, statistical methods are well developed, e.g. [25] and [26]. For non-Gaussian outcomes,
however, less development has taken place. For binary data, repeated measures models in
which observations for a subject are assumed to have exchangeable correlations have been
proposed by [27] using a probit link, by [28] using a logit link and by [29] using log linear
models. Only the model proposed by [28] allows for time-dependent covariates. [10] pro-
posed an extension of generalized linear models to the analysis of longitudinal data. They
introduce a class of estimating equations that give consistent estimates of the regression pa-
rameters and of their variance under mild assumptions about the time dependence. Using a
working generalized linear model for the marginal distribution of the response, the estimating
equations are derived without specifying the joint distribution of a subject?s observations yet
5
they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory
is presented for the general class of estimators.
The methods, however, are not robust against outliers since they are based on score
equations from the maximum likelihood method of estimation. A solution proposed by [30]
is to use M-type estimation by involving downweighting schemes. Another solution is one
given by [31] who proposed an adaptation of the Wilcoxon-Mann-Whitney method of esti-
mating linear regression parameters for use in longitudinal data analysis under the working
independence model. They used joint ranking (JR) of all observations in their development.
[32] consider the same model as [31] but they use separate between-subject and within-
subject ranks to specify their Wilcoxon-Mann-Whitney estimating equation. [2] provided a
direct rank estimation analogue of the GEE model of [10] obtained via the minimization of
rank dispersion function of [33]. This is a generalization of the work of [31]. Their develop-
ment used the iterated reweighted least squares (IRLS) formulation of the rank dispersion
function given in [34].
In Chapter 4 of this dissertation, we evaluate the  nite sample performance of the IRLS
estimate of b R and its variance. We discuss the IRLS formulation, provide computational
algorithms as well as simulation results pertaining to the estimator, its variance, and its
in uence function.
In Chapter 5, Using the LS based GEE model in [10] and the IRLS Rank-Based GEE
model proposed by [2] (see also [35]), we study the problem of selective predation patterns.
6
Chapter 2
Rank Tests for Selective Predation
2.1 Introduction
Repeated measurement data, in which the same response variable is recorded on each
observational unit on several di erent occasions, occur frequently in many di erent disci-
plines. One major  eld that generates a great deal of repeated measures data is in Biology,
when people study the predatory preference. In this chapter we will study the predatory
preference problem as a special case of repeated measure data. All the test statistics we
used and the corresponding asymptotic properties can be used for similar repeated measures
data.
Predation is necessary for life to exist. It is a force in evolution of species as natural
selection is biased in favor of e ective predators and elusive prey. We are interested in the
case where predatory behavior is guided by certain features of the prey. Dating back to the
time of Darwin, scientists have long recognized that signals used by animals to attract mates
can have the unintended consequence of attracting predators and resulting in gender selective
predation [36]. Conversely, as a recent study of [37] shows, in some species, signaling places
the receiver of the signal at a higher risk of predation than the signaler. Several methods
have been given in the past to quantify these types of selective predation [38]. As pointed out
in [39] and [1], many of these methods have major shortcomings in that they ignore the order
in which the predator selects its prey. To that end, [39] proposed a test procedure, based on
the maximum likelihood principle, that accounts for the order of selection. The performance
of this method was unsatisfactory for small sample sizes. As a remedy, [1] suggested to
use a Wilcoxon signed rank statistic to test for predatory preference. Their approach takes
into account not only the selection of prey but also the order of in which they are selected.
7
Further, as most nonparametric methods, the method provides satisfactory results for small
samples. However, this method, by using a linear score function, assumes that changes in
consecutive prey selections are equally important regardless of when the selection is made.
In this chapter, it will be assumed that there is only one feature of the prey that
determines predatory preference. Hereafter, we will refer to this as \prey feature of interest."
Our intention is to develop nonparametric hypothesis tests for selective predation as a result
of values taken by prey feature of interest. We will do this by extending the method of [1]
in three directions:
1. We provide a class of general rank tests for the experiment considered by [1]; that
is, testing for selective predation of one random sample of predators where the prey
feature of interest takes one of only two possible values. Examples of dichotomous prey
features of interest are gender and size categorized as \large" and \small". General
rank tests will allow one to place varied emphasis on selections at di erent stages of
the selection experiment. This could be helpful in controlling the manner in which
extraneous variables, such as the level of hunger of predators, can a ect selection
preference at the beginning versus at the end of the experiment.
2. We propose a class of general rank score tests for the di erence in predation patterns
of two random samples of predators (eg. two species). In this case, we treat the case
where the prey feature of interest is continuous (\surface area", for example). We then
give the test where prey feature is a categorical variable with m classes (m 2). The
test statistic for this case becomes equivalent to that of the continuous case when there
are no ties in the data.
3. If we look at our data as a repeated measures data, Xrj denote the measurement of the
rth experimental subject in population X at condition or time point j, similar for Ysj,
we may not have a complete data frame, i.e. some observations of the experimental
subjects maybe missing, which is often the case in clinical trials. We propose the
8
test statistic for randomly missing data for the cases of one sample problem and two
samples problem in repeated measures data.
In all the three cases, we study the asymptotic distributions of the test statistics as the
number of predators increases while the number of allowed selections remains constant for
all predators.
2.2 Data and Model in [1]
2.2.1 Experiment Frame and Data
The purpose of this chapter is to suggest a test statistic that can be useful to establish
preference between two response in repeated measures experimental design settings. The
context for this study was experimental work conducted at the University of Florida aimed at
conservation of naturally occurring enemies of crop pests as a means of improving biological
control of cropping systems. At issue was whether birds could be useful in crop pest control
due to their consumption of pests that escape mortality from other agents of biological
control. Specially, the research investigated aspects of pest insects that would make them
more attractive food sources for birds. Trails were conducted using red-winged blackbirds
with fall armyworms as their food source. The purpose of the trails was to determine whether
the birds preferred larger versus smaller armyworms and whether the birds preferred non-
parasitized armyworms to parasitized armyworms of the same size. The parasite used was
larva of E.plathypenae, a species common in Florida and previously investigated for its
biological control value. A description of the experiment can be found in [40].
In this experiment, a plastic chamber divided into two equal sized parts was used to
present each bird with two food choices. The positions of the two foods in the subchambers
(left or right) were varied over the trials and no directional preference was observed. In trials
to determine preference for food size, for example, each bird was presented with  ve large
armyworms in one subchamber and  ve small armyworms in the other. The researchers
9
recorded the  rst  ve choices made by the bird, including the order in which the armyworms
were selected by the birds. One of the data sets from this experiments is presented in
Table 2.1. It displays the selection by each bird: large (L) or small (S), in the order in which
they were selected (left to right).
Table 2.1: Selections of  ve large or small armyworms in bird trials
Trial number Choice: 1 2 3 4 5
Score: 5 4 3 2 1
1 L L L L S
2 L L S L S
3 S L L L S
4 L S L S S
5 L L L L S
6 L L L S L
7 L L L S S
8 L S L L S
9 S S L L L
10 L L S L L
11 L L L S S
12 L L L L S
13 L L L L S
14 L L L S S
15 L L L S S
16 S L S S L
17 L L S L S
18 L L S L S
19 L L L S S
20 S L L S L
21 L L S S L
2.2.2 Test Statistic
When testing for preference, the null hypothesis is "no preference", i.e. that the bird
randomly selects its food type (L or S) and the  ve selections made by each bird are indepen-
dent. In other words, each no preference selection by a bird is like a coin toss with probability
p = 12 of selection L over S and with independence between selections. A Wilcoxon signed
rank statistic can be used to describe the selections made in each trial. A "+" is assigned to
10
each large armyworm selected and a " " is assigned to each small armyworm. The score of
5 is assigned to the  rst selection made by the bird, 4 to the second selection,. . . , and 1 to
the  fth selection made by the bird. Using these scores means that early selection of a food
type and more selections of that food type, will both contribute to increasing the indication
of preference for that food type. Under the alternative that the birds exhibit a preference
for one of the two food types, we would expect p6= 12. For example, p > 12 would indicate
a preference for L, and p < 12 would indicate a preference for S. Because of the particular
design of this experiment, abundance of the two types of food is not an issue. If the two food
types were mixed together in one food chamber, abundance would play a more important
role in the analysis. However, in this setting they were in separate subchambers and the
researchers recorded only which subchamber was chosen by the bird in each selection.
In the jth bird trial we let
T+j = (sum of the scores assigned to +(L) food selections)
and
T j = (sum of the scores assigned to -(S) food selections)
Summing over the trials on di erent birds produces the test statistic
T+ =
nX
j=1
T+j
where n denotes the number of birds used in the experiment.
In general, suppose there are k selections made by each subject in the experimental
situation and n subjects in the experiment. Then under the null hypothesis, T+ has the
distribution of the sum of n independent Wilcoxon signed rank statistics in which k is the
11
largest rank assigned in each of the signed rank statistics. It is clear that
EH0[T+] = nk(k + 1)4 and VarH0[T+] = nk(k + 1)(2k + 1)24
When n is large and k is  xed, the null distribution of T+ can be approximately by a normal
distribution. This follows from the usual Central Limit Theorem because under the null, T+
is the sum of n independent and identically distributed statistics, Tj, each with  nite mean
and variance. From the data in Table 2.1, the exact p-value is 0:0001510.
2.3 General Rank Test for Preference: One-Sample
Suppose that we have an experiment in which there is a random sample of n predators
(n > 1) and that each predator is allowed to select k preys (k > 1). We shall assume that
relative abundance of prey remains unchanged during successive selections, or equivalently,
that successive selections are made with replacement. Let !ij represent the feature of interest
(eg. color, size, etc) of the jth prey selected by predator i, where 1 i n and 1 j k. It
will be assumed that predators make their selections independently of each other. As in [1],
we will focus on the case where the prey feature of interest is categorized into two classes (2
sizes, 2 genders etc). In this case, we can de ne a random variable X = 0 or X = 1 depending
on the category. We will write Xij for X(!ij) and fX = cg for f! : X(!) = cg; c = 0;1.
Then, under the null hypothesis that there is no preferential predation and selections are
independent, Xi1;:::;Xik are independently and identically distributed according to the
Bernoulli distribution with probability of success p = 1=2, 1  i n. A value of p > 1=2
indicates preference for fX = 1g prey. We will con ne our attention to testing whether
fX = 1g is the preferred prey feature in that predators tend to select fX = 1g prey more
than fX = 0g prey at the beginning of the selection experiment than at the end.
12
[1] gave a test statistic for detecting such selection patterns. In our notation, the
Wilcoxon signed rank test statistic given by [1] is
T+ =
nX
i=1
kX
j=1
(k j + 1)Xij : (2.3.1)
As intended, the test rejects the null if T+ is large; that is, if there are more fX = 1g
prey selections at the beginning of selection than at the end. This is done by giving less
importance to later fX = 1g prey selections than earlier selections using the score function
 (j) k j + 1. However, the importance gradient remains constant since  (j) decreases
linearly as j goes from 1 to k. This does not allow investigators to factor in extraneous
variables (such as time of day, predator?s hunger level, location, etc) that can potentially
in uence the selection process. Using a general score function [35] in the de nition of the test
statistic provides investigators the  exibility to decide on a gradient. To that end, let  be
a non-increasing function de ned on the interval (0;1) such that Pkj=1 2fj=(k + 1)g<1.
We now de ne the generalized Wilcoxon signed rank test statistic for preference patterns as
T+ =
nX
i=1
kX
j=1
 jXij ; (2.3.2)
where  j =  fj=(k + 1)g. The following theorem gives the  rst two moments and the
asymptotic distribution of the test statistic T+ . The proof is straightforward and may be
found, for instance, in [35].
Theorem 2.1 Assume that the null hypothesis is true. Then for k<1 xed andPkj=1 2j <
1 we have
T+  E0(T+ )
fvar0(T+ )g1=2 !N(0;1) in distribution as n!1;
where E0(T+ ) = 2 1nPkj=1 j and var0(T+ ) = 4 1nPkj=1  2j.
13
The score function  can be normalized such that Pkj=1  j = 0 and Pkj=1  2j = 1. One way
to obtain normalized score functions is to take
 fj=(k + 1)g= c[b hfj=(k + 1)g] (2.3.3)
and to solve for b and c using the constraints Pkj=1 j = 0 and Pkj=1  2j = 1. Here b and
c are scalars and h(u) is a nondecreasing function de ned on [0;1] depending on u alone.
The score function leading to a test statistic equivalent to (2.3.1) of [1] may be obtained by
taking h(u) = u. Other simple score functions use h(u) = u2 and h(u) = sgn(u).
Taking h(u) = u in (2.3.3) gives b = 1=2 and c = [f12(k + 1)g=fk(k 1)g]1=2. For
the data given in [1], one obtains a p-value of 3:14 10 5 for h(u) = u and 1:22 10 5 for
h(u) = u2 using the asymptotic test in Theorem 2.1. [1] found a p-value of 1:51 10 5 using
a permutation test.
2.4 Two Sample Comparison of Preference Patterns
Suppose now that we have two predatory species X and Y and record the  rst k choices
made by these two species. As earlier, we will assume that there is only one prey feature
of interest. However, in this case, we will not restrict our analyses to the case where this
feature is categorized into two classes. Instead, we will consider a case where the feature is
continuously measured and also the case of an arbitrary number of categories. To simplify
our argument, we will assume that prey are in nitely abundant or that selection is made with
replacement. Our goal is to test whether there is a signi cant di erence in the predatory
preferences of these two species.
The data from this experiment are presented as Xrj (r = 1;:::;n1;j = 1;:::;k) and
Ysj (s = 1;:::;n2;j = 1;:::;k). Here Xrj and Ysj denote the feature of interest of the jth
prey selected by the rth individual from species X and the sth individual from species Y ,
respectively. When there is no confusion, we will use X and Y as generic random variables
14
denoting prey feature selected by species X and Y , respectively. We are in particular
interested in testing whether the X species prefer larger values of the particular feature
than the Y species. This will be characterized by larger values of pr(X > Y ) at the
beginning of the experiment than at the end.
We propose the following test statistic for comparing two species selection patterns:
W+ =
n1X
r=1
n2X
s=1
kX
j=1
 j?(Xrj Ysj) ; (2.4.1)
where
?(x) =
8
>>>>
>><
>>>
>>>:
1 (x> 0) ;
1=2 (x = 0) ;
0 (x< 0) :
In the following we will study the null asymptotic behavior of W+ for continuous and cate-
gorical prey feature, respectively. All proofs are given in the Appendix.
2.4.1 Continuous Prey Feature
Assume that X and Y are measured continuously. Under the null hypothesis that
there is no preferential selection, pr(?(X  Y ) = 1) = 0:5 = pr(?(X  Y ) = 0). The null
asymptotic distribution of W+ is given in the following theorem.
Theorem 2.2 Under the null hypothesis that there is no di erence in preferences, k <1
 xed, and Pkj=1  2j <1 we have
W+  E0(W+ )
 var
0(W+ )
 1=2 ! N(0;1) in distribution as min(n1;n2)!1,
where E0(W+ ) = 2 1n1n2Pkj=1 j and var0(W+ ) = 12 1n1n2(n1 +n2 + 1)Pkj=1 2j.
Proof 2.3 Write W+ = Pkj=1  jUn1n2j, where Un1n2j = Pn1r=1Pn2s=1 ?(Xrj Ysj) for j =
1;:::;k. For j = 1;:::;k  xed, under the null hypothesis, Un1n2j is just the Mann-Whitney
15
U-statistic and thus [41]
E0(Un1n2j) = n1n22 and var0(Un1n2j) = n1n2(n1 +n2 + 1)12 :
Moreover, as min(n1;n2)!1
Un1n2j n1n2=2
fn1n2(n1 +n2 + 1)=12g1=2 ! N(0;1)
in distribution whenever the null hypothesis is true. Thus
W+  (n1n2=2)Pki=1  j
fn1n2(n1 +n2 + 1)=12g1=2 =
kX
j=1
 j
 U
n1n2j n1n2=2
fn1n2(n1 +n2 + 1)=12g1=2
 
! N
 
0;
kX
j=1
 2j
 
in distribution since k<1 and Pkj=1  2j <1.
Once again, in applications, standardizing the score function  such that Pkj=1 j = 0 and
Pk
j=1  
2
j = 1 will simplify the expressions for mean and variance.
2.4.2 Categorical Prey Feature
Assume that the prey feature of interest is categorized into m disjoint classes. We wish
to extend the two sample selective predation problem to this case where selections are being
made with replacement. Under the null hypothesis of no selective predation, the combined
sample fX1j;:::;Xn1j;Y1j;:::;Yn2jg has a multinomial distribution with mass function
P(T1 = tj1;:::;Tm = tjm) = (n1 +n2)!
mY
l=1
ptjll
tjl! ;
where Pml=1 pl = 1 and Pml=1tjl = n1 + n2 (j = 1;:::;k). In this case, we may have ties in
our data that need to be taken into account. The following theorem gives the asymptotic
normality of W+ .
16
Theorem 2.4 Suppose prey feature is categorical with m possible outcomes (m > 1) and
that the null hypothesis is true. Let tjl be the observed frequencies for selection j (j =
1;:::;k; l = 1;:::;m). Then
W+  E0(W+ )
 var
0(W+ )
 1=2 ! N(0;1) in distribution as min(n1;n2)!1,
where E0(W+ ) = 2 1n1n2Pkj=1 j and
var0(W+ ) = n1n2(n1 +n2 + 1)12
kX
j=1
 2j  n1n212(n
1 +n2)(n1 +n2 1)
kX
j=1
mX
l=1
 2j(t3jl tjl) :
Proof 2.5 The proof follows from the proof of Theorem 2.2. The only di erence is in the
computation of the null variance of Un1n2j which is given by [42] as
var0(Un1n2j) = n1n2(n1 +n2 + 1)12  n1n212(n
1 +n2)(n1 +n2 1)
mX
l=1
(t3jl tjl) :
We remark here that the added complexity presented by a discrete prey feature is the in-
troduction of ties in X and X that one needs to take into account. As in most nonpara-
metric statistics, although ties a ect the null variance, they do not usually a ect the null
expected value of the statistic [41]. It can be easily seen that the null variance of the dis-
crete case is the same as the null variance of the continuous prey feature case when tjl = 1
(j = 1;:::;k; l = 1;:::;m).
2.5 Censored Data
In repeated measurement data study, especially in clinical trials, if the repeated mea-
surements are taken over time during the study, some experimental subjects may not have a
complete vector of observations. some of the observations may be censored. In this section
we assume all the censored observations are censored completely at random.
17
De ne the i:i:d censor indicators "rj and  sj to be:
"rj =
8
>><
>>:
1; if Xrj is observed;
0; otherwise:
and
 sj =
8
>><
>>:
1; if YSj is observed;
0; otherwise:
for r = 1;:::;n1; s = 1;:::;n2 and j = 1;:::;k
we also assume:
P("rj = 1) = pj and P( sj = 1) = qj
2.5.1 One-sample Selection Problem
The problem in Section 2.3 can be considered as a one sample repeated measures prob-
lem. Let Xr = (Xr1;:::;Xrk)0 denote independent random samples for the rth subject, and
j = 1;:::;k are indices representing a set of conditions or a series of prespeci ed condi-
tions or time points for which repeated measurements from each experimental subject are
obtained. Some experimental subjects may not have a complete vector of observations. We
use "rj to indicate if the jth observation of rth experimental subject in group X is censored
or not. Then the test statistic in (2.3.2) can be modi ed as:
Tc =
kX
j=1
nX
r=1
 j"rjXrj (2.5.1)
It is easy to see that
E0(Tc) = n2
kX
j=1
 jE("j) = n2
kX
j=1
pj j
18
Var0(Tc) =
kX
j=1
 2jVar0(
nX
r=1
"rjXrj)
=
kX
j=1
 2jnVar0("rjXrj)
=
kX
j=1
(E0("2rjX2rj) E20("rjXrj))
=
kX
j=1
(E("2rj)E0(X2rj) E2("rj)E20(Xrj))
= n
kX
j=1
(12E("2rj) 14E2("rj)) 2j
= n
kX
j=1
(12pj 14p2j) 2j
Then because Pnr=1  j"rjXrj are independently and identically distributed for j = 1;:::;k,
using the Central limit theorem, we can get the asymptotic normality of Tc similar as the
result in Theorem 2.1.
2.5.2 Two Sample Selection Problem
For the two sample predation preference problem, we also can look it as a two sample
repeated measures problem. For example, in clinical trials, we have control group X and
the new treatment group Y . We record the data of the experimental subjects in both groups
under some conditions or at some prespeci ed time points j = 1;:::;k. The investigator
wishes to draw an overall conclusion whether the new treatment constitutes an "improve-
ment" over the control for the entire study. As de ned in Section 2.4, we use X and Y to
denote the two populations for the two groups and we use "rj to denote if the jth observation
of rth experimental subject in group X is censored or not, similar for  sj. Then the test
statistic in (2.4.1) can be rewritten as:
Wc =
n1X
r=1
n2X
s=1
kX
j=1
 j"rj sj?(Xrj Ysj) (2.5.2)
19
We can calculate the null expectation and variance of Wc 
E0(Wc ) = n1n22
kX
j=1
 jE("rj)E( sj)
= n1n22
kX
j=1
 jpjqj
Var0(Wc ) =
kX
j=1
 2jVar0[
n1X
r=1
n2X
s=1
"rj sj?(Xrj Ysj)]
if we denote Qrsj = "rj sj?(Xrj Ysj)
Var0(Wc ) =
kX
j=1
 2j
n1X
r=1
n2X
s=1
Var0(Qrsj) +
kX
j=1
 2j
n1X
r=1
n2X
s=1
n1X
r1=1
n2X
s1=1
Cov0(Qrsj;Qr1s1j)
=
kX
j=1
 2j[n1n2Var0(Q11j) +n1n2(n2 1)Cov0(Q11j;Q12j)
+n1n2(n1 1)Cov0(Q11j;Q21j)]
Where:
Var0(Q11j) = E(Q2rsj) E2(Qrsj)
= E0("1j 1j?(X1j Y1j))2 [E("1j)E( 1j)E0(?(X1j Y1j))]2
= 12E("21j)E( 21j) 14E2("1j)E2( 1j)
= 12E("1j)E( 1j) 14E2("1j)E2( 1j)
= 12pjqj 14p2jq2j
20
Cov0(Q11j;Q12j) = E0(Q11jQ12j) E0(Q11j)E0(Q12j)
= E0("1j 1j?(X1j Y1j)"1j 2j?(X1j Y2j)) E20("1j 1j?(X1j Y2j))
= E("21j)E( 1j 2j)E0(?(X1j Y1j)?(X1j Y2j))
 E2("1j)E2( 1j)E20(?(X1j Y1j))
= 13E("21j)E( 1j 2j) 14E2("1j 1j)
= 13pjq2j  14p2jq2j
and similarly
Cov(Q11j;Q21j) = 13E("1j"2j)E( 21j) 14E2("1j 1j)
= 13p2jqj 14p2jq2j
Then becausePn1r=1Pn2s=1Qrsj are independently and identically distributed for j = 1;:::;k,
using the Central limit theorem, we can get the asymptotic property of Wc following similar
steps as in Theorem 2.2.
2.6 Monte Carlo Simulation
We performed a large scale simulation to evaluate the  nite sample performance of the
asymptotic distribution of the statistics W+ and Wc .
2.6.1 Null Simulation
First we performed a null simulation for the two-sample continuous prey feature case
under the nominal  = 0:05. For this study, independent, identically distributed samples
were generated from the standard Cauchy (C), central t with 5 degrees of freedom (t5),
standard normal (N), and the standard Laplace (L) distributions. We considered all 42 = 10
possible combinations of the distributions for the two samples. For the sample sizes we took
21
(n1;n2) combinations of (3;10), (3;3), (10;3), and (10;10). For the number of selections of
each subject we took k = 3;6;and 12. The asymptotic distribution given in Theorem 2.2
is used to determine the null rejection rates. We performed 104 repetitions and found the
proportion of cases for which fW+  E0(W+ )g=fvar0(W+ )g1=2 > 1:645. The results for
 (t) =   1(1 t) were quite similar to the results for  (t) = 1 t which we report in
Table 2.2. We note that for most distribution combinations, the rejection rates were very
Table 2.2: Simulated null rejection rates of W+ using its asymptotic normal critical constant for
continuous distributions,  (t)/t;  = :05.
k N;N C;N t5;N N;L C;C C;t5 C;L t5;t5 t5;L L;L
n1 = 10;n2 = 10
3 0.0482 0.0513 0.0496 0.0484 0.0470 0.0559 0.0538 0.0479 0.0508 0.0457
6 0.0518 0.0556 0.0490 0.0483 0.0484 0.0534 0.0528 0.0499 0.0532 0.0477
12 0.0496 0.0545 0.0510 0.0456 0.0517 0.0557 0.0531 0.0478 0.0500 0.0493
n1 = 3;n2 = 10
3 0.0449 0.0785 0.0576 0.0445 0.0487 0.0661 0.0671 0.0502 0.0472 0.0507
6 0.0469 0.0743 0.0572 0.0488 0.0516 0.0717 0.0701 0.0521 0.0508 0.0544
12 0.0492 0.0695 0.0548 0.0452 0.0480 0.0679 0.0667 0.0487 0.0510 0.0509
n1 = 3;n2 = 3
3 0.0440 0.0471 0.0435 0.0468 0.0454 0.0454 0.0481 0.0481 0.0472 0.0479
6 0.0514 0.0499 0.0542 0.0467 0.0524 0.0526 0.0495 0.0493 0.0560 0.0506
12 0.0509 0.0507 0.0502 0.0490 0.0458 0.0503 0.0502 0.0508 0.0481 0.0480
n1 = 10;n2 = 3
3 0.0483 0.0281 0.0473 0.0575 0.0488 0.0361 0.0342 0.0505 0.0529 0.0499
6 0.0531 0.0316 0.0439 0.0556 0.0480 0.0387 0.0347 0.0533 0.0516 0.0515
12 0.0487 0.0296 0.0454 0.0566 0.0488 0.0355 0.0345 0.0520 0.0494 0.0517
close to the nominal  = 0:05 despite using the asymptotic distribution for such small
samples. In the hybrid distribution cases where the two samples were drawn from two
di erent distributions, when the sample size from the heavier tailed distribution is smaller
than the sample size from the lighter tailed one, we found that the rejection rates were higher
than 0.05. The reverse phenomenon occurs when the sample sizes are reversed. It appears
that the asymptotic variances are under- and over-estimated, respectively, for the two cases.
We can see the pattern more clearly from Figure 2.1.
22
Figure 2.1: Figure for Null Rejection Rates in Table 2.2
Distributions
Null Rejection Rates, k=6
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
a71
C,C C,t5 C,N C,L t5,t5 t5,N t5,L N,N N,L L,L
0.0316
0.0500
0.0743
a71
a71
a71
n1=10,n2=10
n1=3, n2=10
n1=3, n2=3
n1=10,n2=3
In Figure 2.1, the horizontal axis denotes di erent distributional settings and the vertical
axis denotes null rejection rate. Di erent color denotes di erent sample size combinations
and the  at line in the middle denotes the nominal  = 0:05. The Figure shows the null
rejection rate for di erent distributional settings and di erent sample size combinations.
From this Figure, we can see some clear patterns: First, for most distributional settings, the
null rejection rates are fairly close to 0:05 despite using the asymptotic distribution for such
small sample size. Second, the red line is the most preferred line since overall, it?s the closed
one to the  at line which denotes the nominal  = 0:05. This indicates the test statistic
performs the best for equal large sample size (n1 = n2 = 10) which meets our expectation.
Third, there are some abnormality due to including the extremely heavy tailed Cauchy
distribution to generate the sample. If the sample size generated using Cauchy distribution
is relatively larger compared to the other sample size, the asymptotic variance tends to be
over-estimated, which would lead to a smaller null rejection rate. Similarly, if the sample
23
size generated using Cauchy distribution is relatively smaller compared to the other sample
size, the asymptotic variance tends to be under-estimated, which would lead to a larger null
rejection rate.
2.6.2 Power Simulation
To determine the power of the test based on the asymptotic distribution in Theorem 2.2,
we generated independent samples for the jth selection as Xj  N( j;1) and Yj  N(0;1)
(j = 1;:::;k). We study the performance of the asymptotic test under simple ordering
 1       k and umbrella ordering  1       q       k, q2f1;:::;kg, of the X
means. We used Wilcoxon scores  (t) = 1 t and normal scores  (t) =   1(1 t), where
 is the standard normal cdf. The proportion of cases out of 104 in which the standardized
statistic exceeded 1.645 were recorded. In the  rst part of the simulation, we generated the
X samples by drawing the values of the  rst approximately k=2 selections from the N(0:5;1)
distribution and the remaining selections from the N(0;1) distribution. For the second part,
the  rst k=3 X selections were drawn from the N(1;1) distribution, the middle k=3 were
from the N(0:5;1) distribution, and the  nal k=3 were from the N(0;1) distribution. Finally,
the third part used the  rst k=3 selections from N(0;1), the middle k=3 from N(0:5;1) and
the  nal k=3 from N(0;1) creating an umbrella pattern. The results are given in Table 2.3.
It is evident that the rejection rates were increasing with increasing k or increasing n1 and n2
for both score functions. It also appears that the test?s ability to detect simple ordered mean
patterns increased when the number of means used to generate the X sample was larger.
This goes hand in hand with our intuition that the test statistic becomes better trained to
detect patterns when there are more distinct classes. The test based Wilcoxon scores was
more powerful than the test based on normal scores. This is expected since the means are
arranged in a linearly decreasing pattern where linear scores are optimal. Normal scores are
optimal for the case where the means are quickly decreasing at the beginning and at the end
but slowly decreasing in the middle.
24
Table 2.3: The simulated power of W+ using its asymptotic normal distribution where Xj  
N( j;1); Yj N(0;1); 1 j k;  = 0:05
 (t) = 1 t  (t) =   1(1 t)
n1 = n2 n1 = n2
Mean pattern k 2 5 7 2 5 7
f0:5 0:5 0g 3 0.111 0.138 0.164 0.046 0.112 0.134
12 0.240 0.492 0.624 0.154 0.273 0.350
f1 0:5 0g 3 0.248 0.453 0.567 0.072 0.220 0.286
12 0.569 0.939 0.984 0.270 0.562 0.711
f0 0:5 0g 3 0.099 0.109 0.112 0.026 0.046 0.047
12 0.115 0.185 0.228 0.055 0.050 0.051
The power of the test was low in detecting an umbrella ordered pattern in the alternative.
We note that the test is not designed to detect such patterns since the score function is
monotone non-increasing with each successive selection. If one is interested in such patterns,
then the test of [43] for umbrella alternatives may be adapted for this case in a straightforward
manner.
If the populations that generated the samples are as in Table 2.3 but the X mean
con guration is f1;1=4;0;0g, then a test using linear scores will not be as powerful as one
using a score function that is decreasing fast at  rst and slowly at the end. One such
score function is  (t) = G 1(1 t), where G is the Gamma(1=5;4) distribution. Using the
Wilcoxon score function, for k = 4 selections, 103 replications gave rejection rates of 29%,
and 40% for n1 = n2 values of 5 and 7, respectively. The corresponding rejection rates
for the gamma score function were 40% and 54%. Clearly, the choice of score functions is
important. It is possible to select bent scores as discussed on p. 100 of [44] or Section 4 of
[45]. Alternatively, nonparametric estimates of the optimal score function may be obtained
using the methods of [46] and [47].
Finally, we performed a simulation study with the selections were correlated. This was
done by generating the X sample from a trivariate normal distribution with mean (1;1=2;0)
25
and an order one autoregressive (AR(1)) covariance matrix with (i;j)th entry  ji jj for
 = 0;0:2;0:5;0:9. The rejection rates were nearly the same as the ones in Table 2.3.
2.6.3 Null Simulation for Two Sample Case with Censoring
Similar to the null simulation for the two-sample continuous prey feature case, for this
study, under the nominal  = 0:05, independent, identically distributed samples were gen-
erated from the standard Cauchy (C), central t with 5 degrees of freedom (t5), standard
normal (N), and the standard Laplace (L) distributions. We considered all  42 = 10 possible
combinations of the distributions for the two samples. For the sample sizes we took (n1;n2)
combinations of (3;10), (3;3), and (10;3). For the number of repeated measurements of each
subject we took k = 3;12. For the censor probability, we took (pj = qj = 0:5) The asymptotic
distribution of Wc is used to determine the null rejection rates. We performed 104 repetitions
and found the proportion of cases for which fWc  E0(Wc )g=fvar0(Wc )g1=2 > 1:645.
Table 2.4: Simulated null rejection rates of W+ using its asymptotic normal critical constant,
 (t)/t;  = :05;P("rj = 1) = P( sj = 1) = :5.
k N;N C;N t5;N N;L C;C C;t5 C;L t5;t5 t5;L L;L
n1 = 3;n2 = 10
3 0.0436 0.0479 0.0427 0.0413 0.0431 0.0476 0.0521 0.0402 0.0439 0.0449
12 0.0459 0.0589 0.0460 0.0453 0.0481 0.0572 0.0526 0.0476 0.0469 0.0420
n1 = 3;n2 = 3
3 0.0504 0.0487 0.0509 0.0498 0.0512 0.0523 0.0451 0.0468 0.0524 0.0499
12 0.0469 0.0461 0.0460 0.0441 0.0437 0.0454 0.0431 0.0480 0.0444 0.0446
n1 = 10;n2 = 3
3 0.0445 0.0380 0.0401 0.0472 0.0421 0.0403 0.0404 0.0451 0.0447 0.0445
12 0.0482 0.0383 0.0449 0.0546 0.0460 0.0444 0.0365 0.0466 0.0493 0.0490
It is evident from Table 2.4, Although the same size is very small, n1; n2 just takes
3 and 10, for a very large censor probability pj = qj = 0:5, for all the distribution settings,
the null rejection rate is still fairly close to the nominal  = 0:05.
26
Finally, we performed simulations for di erent censoring probabilities (pj = 0:1; qj =
0:1), (pj = 0:1; qj = 0:9), (pj = 0:25; qj = 0:25), (pj = 0:25; qj = 0:75) and (pj = 0:9; qj =
0:9) . In all these cases, the null rejection rates are all close to the nomial 0:05, with less
censoring probability corresponding to closer to 0:05, which meets our expectation.
2.7 Conclusion
This chapter gave nonparametric rank based test statistics for detecting preference pat-
terns in selective predation by extending the method of [1]. This test statistics can be used
broadly in repeated measures data study. We provided a class of general rank score tests for
the one sample case where we only have one species of predators and prey have two features
of interest. This gives the  exibility to place varied emphasis on consecutive selections at
di erent stages of the selection experiment. This could be helpful in controlling the manner
in which extraneous variables can a ect selection preference patterns.
This chapter also proposes a class of general rank score tests for the di erence in preda-
tion patterns of two predatory species, which is a special case for the two-sample repeated
measures data. In this case, prey feature of interest can be continuous or categorical. It is
shown that the test statistic for the categorical case becomes equivalent to the continuous
case if the data have no ties. In both cases, the asymptotic distribution of the test statistic is
Gaussian. We also study the test statistics and its asymptotic properties for the one-sample
and two-sample repeated measures data with random censoring. The results of a simulation
study using the asymptotic Gaussian distribution but small samples shows that the test has
a satisfactory  nite-sample performance. The null simulation shows that null rejection rates
are close to nominal  values. The asymptotic test is powerful in detecting simple-ordered
alternatives.
27
Chapter 3
Testing the Di erence Trend in Two Sample Repeated Measurement Data
3.1 Motivation of the Method
Consider the predator-prey relationship discussed in the previous chapter. The manner
in which the predator selects its prey is crucial for its survival. Several methods have been
given in the past to quantify selective predation. As pointed out in [1], one drawback of
most existing methods is that they do not take into account the order in which di erent
preys are taken. [1] suggest to use a Wilcoxon statistic to test for food preference while
taking into account the order in which di erent prey are taken. The use of the Wilcoxon
statistic means that the gradient of selection remains the same throughout the experiment.
In the previous chapter we gave a generalized form of the Wilcoxon statistic used by [1] to
allow experimenters place varied emphasis on the selection gradient at di erent parts of the
experiment. Further we proposed a method to test the di erence in preference patterns of
two di erent species of predators, where prey feature can be either continuous or categorical.
In this chapter, we are in particular interested in testing whether the rate of prey feature
change in the consecutive selections of one species is faster than that of another species.
Another problem we consider involves an experiment conducted by the Department of
Kinesiology at Auburn University. Let us consider part of the study by [48] concerning the
e ect of a single session of high intensity aerobic exercise on in ammatory markers of subjects
taken over time. Eighteen subjects were placed into two groups (High Fitness and Moderate
Fitness) depending on their  tness levels, nine in each group. The response here is the C-
Reactive protein (CRP). Elevated CRP levels are a marker of low-grade chronic in ammation
and may predict a higher risk for cardiovascular disease. Our e ect of interest is the trend
of di erence of CRP levels between the two groups (High Fitness - Moderate Fitness), we
28
are interested in testing whether the di erence stays the same over time or decays to zero
over time. Each individual in the two groups was required to  nish an exercise session which
consumes 500 kcal. CRP levels were obtained 24 hours and immediately prior to the acute
bout of exercise and subsequently 24, 72 and 120 hours following exercise.
Table 3.1: CRP levels in [48]
id group -24 0 24 72 120
1 LO 1.79 0.78 0.68 0.86 0.83
2 LO 0.09 0.14 0.07 0.14 0.51
3 LO 1.13 0.99 1.06 0.91 0.92
4 LO 2.94 2.07 1.51 1.05 1.03
5 LO 1.82 2.28 2.99 1.95 1.78
6 LO 0.17 0.08 0.29 0.3 0.22
7 LO 0.38 0.19 0.29 0.49 0.26
8 LO 1.32 0.96 1.8 0.82 1.68
9 LO 0.4 0.11 0.25 0.2 0.31
10 HI 0.11 0.19 0.21 0.22 0.08
11 HI 0.15 0.17 0.15 0.11 0.21
12 HI 0.06 0.05 0.05 0.06 0.16
13 HI 0.31 0.2 0.28 0.14 0.24
14 HI 0.21 0.28 0.35 0.1 0.2
15 HI 0.36 0.37 4.54 1.8 1.1
16 HI 0.92 1.23 1.47 0.96 0.81
17 HI 0.52 0.52 0.51 0.31 0.35
18 HI 2.05 1.61 1.35 0.73 0.67
In this example, the data sample size is small, we only have 9 subjects in each group
and each subject has only 5 repeated measurement data. The data are highly skewed and
contain outliers, which we can see from Figure 3.1.
Both the two-sample selective predation problem and the CRP level problem are speci c
examples for two sample repeated measurement data, and the problem we are interested in
can be generalized as studying the trend di erence between the two samples over repeated
measurements. In the current chapter, we propose a procedure that is analogous to the Page
test [24] to study the trend of the di erence between two samples in a repeated measurement
data. Although the Page test is used for randomized complete block designs, we extend it
to the situation where we have two randomized complete block designs. We derive the
29
Figure 3.1: Boxplots for CRP levels at time 0 and time 24 for the two groups HI and LO
a71
a71
a71
HI LO
0.0
0.5
1.0
1.5
2.0
CRP levels immediatly prior to exercise
time
CRP
a71
HI LO
0
1
2
3
4
CRP levels 24 hours following exercise
time
CRP
asymptotic distribution of a general test statistic that includes the Page statistic as a special
case. This provides an approximate  level test for the changing rates in the simple ordering
case.
3.2 Test Statistic and Theory
Suppose now that we have two populations of repeated measurement data represented
by X and Y , and each subject in the two groups has k repeated measurements. We
will assume that the repeated measurements for each subject are continuously measured.
Suppose we have a random sample n1 from X and n2 = N n1 from Y to participate in
the experiment. The data from this experiment are presented as an n1 k matrix X and
n2 k matrix Y. In both cases, rows represent individuals and columns represent repeated
measurements. Let U = [X0 Y0]0 be the N k matrix of responses. In the following Jp q
denotes a p q matrix of ones and 0p q Jp q Jp q is a p q matrix of zeros.
30
Consider the following multivariate linear model
U =  JN k + C + E ; (3.2.1)
 2R is an unknown intercept parameter, C = [J0n1 k 00n2 k]0,  = diag( 1;:::; k) is a
diagonal matrix of k real unknown parameters, and E is an N k matrix of random error
terms. In the model in (3.2.1), if we assume that E[E] = 0N k, then, for any r = 1;:::;n1
and s = 1;:::;n2, E(Xrj Ysj) =  j, for j = 1;:::;k. Thus  is a shift parameter.
We are interested in testing
H0 :  1 =   =  k
HA :  1       k ; with at least one strict inequality.
Rejecting H0 in favor of HA means that the gap between two groups is increasing over time.
Let N = n1n2 and de ne N  k matrix Z = X Jn2 1 Jn1 1 Y. Let R be the N k
matrix of row-ranks with (i;j)th element of R is Rij = Pkt=1 IfZit Zijg. Selection ranks
are
~R = R0 JN  1 :
[24] introduced a rank test of H0 versus HA for the case where the  ?s are parameters of a
randomized complete block design by modifying the Friedman test. Extending this to our
situation, we de ne an equivalent of the Page L statistic for testing H0 versus H1 as
W = A0k ~R ;
where A0k = (1;:::;k). A level  test rejects H0 in favor of HA if W >w where w is chosen
to satisfy P0(W >w ) =  . Here P0 stands for the probability measure under the restriction
imposed by H0. The exact value of w can be determined; however, this is a very tedious
process. So, we opt for deriving an asymptotic test based on the asymptotic distribution of
31
W.
We will derive the asymptotic distribution of a more general test statistic of which W is one
special case. To that end, let  = ( 1;:::; k)0 be a vector of contrast coe cients such that
 1       k and normalized such that J0k 1 = 0 and  0 = 1.
We then de ne our \generalized" Page test statistic as
W =  0~R : (3.2.2)
The following theorem gives the null expectation and variance of W . The proof is
quite long and is found in the appendix.
Theorem 3.1 Assume that the null hypothesis is true and the score function  are normal-
ized. Then E0(W ) = 0 and
var0(W ) =  0V 
= n1n2k(k + 1)12 +n1n2(n1 +n2 2)f (2k 4)A
 (k 2)B + (k 1)(k 2)C + 3k
2 11k + 16
12 g
where
A =
Z
P0(U11 U(n1+1)1 +U(n1+1)2 >t)f1 P0(U11 U11 +U(n1+1)2 >t)gdF(t);;
B =
Z
fP0(U11 U(n1+1)1 +U(n1+1)2 >t)g2dF(t) ; and
C =
Z
fP0(U11 U(n1+1)1 +U(n1+1)2 <t)g2dF(t) :
For practical purposes, the quantities A;B, and C in the expression for var0(W ) have to be
estimated from available data. One can use the nonparametric estimators ^An1n2, ^Bn1n2, and
^Cn
1n2 by using the empirical distribution function in the de ning integrals. This is equivalent
to using the Riemann sum of the integrals.
32
Let Zi; i = 1;:::;n be a stationary time series, let Zn = 1nPni=1Zi. Assuming the
variance of Zn decay to zero proportional to n  for some 0 <   1, that is, assuming
Var(Zn)  Cvarn  for some constant Cvar > 0. We have short-range dependence in the
case  = 1 and long-range dependence (or long memory) in the case  < 1. It is usual
to characterize a long-memory process by the number H = 1  =2, the so called Hurst
parameter. In the following, we will show that test statistic W is a sum of a long-memory
process. We can write
Var0(W ) = n1n2C1(k) +n1n2(n1 +n2 2)C2(k)
Var0(W ) = 1N C1(k) + N 2N C2(k)
where C1(k) and C2(k) are functions of k and N = n1n2. When N tends to 1, we can
rewrite Var0(W ) as:
Var0(W ) C2(k)N (2H 2)
where the Hurst parameter H = 12
 
log(n1+n2)
log(n1)+log(n2) + 1
 
with  = 2 2H < 1 and
H!3=4 as min(n1;n2)!1:
If we rewrite the test statistic (3.2.2) as
W =
n1X
i=1
n2X
j=1
 0Rij =
n1X
i=1
Ti
then
W = 1N 
n1X
i=1
n2X
j=1
 0Rij =
n1n2X
i=1
Ti
n1n2
where we assume Ti;i = 1;:::;n1 is a stationary process. So we conclude W is a sum of
long-memory process.
33
To continue our discussion, we need to assume that there is a function G2L2(R), such
that Ti=n2 = G(Xi), where Xi is a stationary Gaussian process with long memory. The
Hurst parameter is now m = inffk > 0 : ck 6= 0g, where k is the coe cient in the Hermite
polynomial expansion of G as G(x) = P1k=0 ckk!Hk(x) .
Theorem 3.2 Let Gm be the class of all functions in L2(R) with Hermite rank m. Let
G 2 Gm and Xt is a stationary Gaussian process with long-range correlation and Sn =
Pn
t=1 G(Xt). Then the following holds:
 If 1=2 <H < 1 1=(2m), then
 2S = limn!1n 1
nX
j;l=1
E[G(Xj)G(Xl)]
exists and
S n = n 12Sn!S in distribution as n!1
where S is a normal random variable with zero mean and variance  2S
 If 1 1=(2m) <H < 1, then
S n = n 1 m(H 1)Sn!pcmamm!Zm in distribution;
where
cm = 2c
m
 m!
(1 m(2 2H))(2 m(2 2H))
cm is a dealing with Xt and Zm is a nondegenerate random variable.
Let us discuss the possible values of m.
34
1. If m > 2, 1=2 < 3=4 = H < 1 1=(2m), which falls under case 1 of Theorem 3.2, in
which  2S = limn!1n 1
nX
j;l=1
E[G(Xj)G(Xl)] exists. In our case,
limn!1n 1
nX
j;l=1
E[G(Xj)G(Xl)] = limn
1!1
n 11
n1X
j;l=1
E[Tjnn Tlnn ]  limn
1!1
(n1 1)Gk =1
where Gk is a function of k. So we conclude that this can not be the case, m > 2 is
not true.
2. If m = 2, which falls under case 2 of Theorem 3.2, but in this case, the distribution of
Zm is complicated.
3. If m = 1, which still falls in case 2 of Theorem 3.2. In our case, we can get that
N HW D !pcmamN(0;1)
where pcmam is a constant we need to estimate. Since Var0(W )  C2(k)N (2H 2),
we get Var0(W) = C2(k)N 2H, we can use pC2(k) as an estimator of the constant
pc
mam.
We will use simulation to show this is the case.
3.3 Monte Carlo Simulation
3.3.1 Null Simulation
First we perform a null simulation for the two-sample continuous prey feature case
under the nominal  = 0:05. For this study, independent, identically distributed samples
are generated from the standard Cauchy (C), central t with 5 degrees of freedom (t5),
standard normal (N), and the standard Laplace (L) distributions. For the sample sizes we
take combinations (n1;n2) as (2;2), (2;4),(2;7),(2;10),(4;4),(4;7),(4;10),(7;7),(7;10), and
(10;10). For the number of selections of each subject we take k = 5. The asymptotic
35
distribution given in Theorem 3.2 is used to determine the null rejection rates. We perform
104 repetitions and  nd the proportion of cases for which (n1n2)HW =pC2(k) > 1:645.
We note that for all the distribution combinations, the null rejection rate are all close to
Table 3.2: Simulated null rejection rates of W using its asymptotic normal critical constant,
 (t)/t;  = :05 and k = 5.
(n1;n2) N;N C;C t5;t5 L;L
2,2 0.0555 0.0590 0.0537 0.0574
2,4 0.0550 0.0613 0.0539 0.0580
2,7 0.0577 0.0554 0.0612 0.0553
2,10 0.0601 0.0586 0.0575 0.0616
4,4 0.0611 0.0558 0.0569 0.0569
4,7 0.0507 0.0522 0.0548 0.0571
4,10 0.0512 0.0529 0.0549 0.0548
7,7 0.0551 0.0536 0.0520 0.0548
7,10 0.0524 0.0591 0.0547 0.0509
10,10 0.0545 0.0524 0.0521 0.0522
and a little above the nominal  = 0:05 despite using the asymptotic distribution as the
underlying distribution for such small samples. We can conclude that in Theorem 3.2, we
should take m = 1, and pC2(k) is a  ne although not perfect estimate of the constant
pc
mam. To improve the simulation result, we need to  nd a better estimate than
pC
2(k)
for the constant pcmam.
3.3.2 Results for the data in [48]
For the data set in [48], we are interested in testing the following hypothesis:
 H0 : The di erence between two groups stays the same.
 Ha : The di erence between two groups decays to zero.
Using the nondecreasing normalized linear score function and the asymptotic Gaussian dis-
tribution of the test statistic W , we  nd that the p value=.466. We would not reject H0
using this p value and we conclude that the di erence between the two groups (Moderate
Fitness and High Fitness) does not decay to zero, which we can also see from the Figure 3.2
.
36
Figure 3.2: Mean and median pro les for the CRP levels in two groups HI and LO
-20 0 20 40 60 80 100
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
Mean Profiles
times
Me
a
n
s
LO
HI
-20 0 20 40 60 80 100
0.2
0.4
0.6
0.8
1.0
Median Profiles
times
Me
d
i
a
n
s
LO
HI
In both the two  gures, the horizontal axis denotes the times at which we obtain the
CRP values for each individual. The  rst  gure shows the mean CRP levels in each group
(Moderate Fitness and High Fitness) at the  ve time points. The second  gure shows the
median CRP levels in each group (Moderate Fitness and High Fitness) at the  ve time
points. Since from  gure 3.1 we know the data is highly skewed and there are outliers
in this data, we prefer to use the median pro le, which clearly verify our result that the
di erence of CRP levels between the two groups does not decay to zero over time.
3.4 Conclusion
If we already have enough evidence to conclude that two species of predator tend to
choose di erent prey, then we may want to study the di erence further, for example, to
compare the trends of the two selection patterns. In this chapter, we proposed a rank
based method to test the trend between two samples in repeated measurement data. We
37
establish the asymptotic normality of the test statistic. Our simulation study showed that
the asymptotic test retains the nominal Type I error rate. If we want to test the trend of
the di erence of the opposite direction, that is, Ha :  1   2  :::   k with at least
one strict inequality, then we can just multiply the generalized normalized non-decreasing
weight function by -1. This will not a ect the asymptotic properties of the test statistic.
38
Chapter 4
Iteratively Reweighted Rank Regression Estimator
4.1 Introduction
In this chapter, we introduce and study IRLS rank estimators. This method of es-
timating rank regression parameters is essential in developing a general rank approach to
generalized estimating equations (GEE). We shall reconsider testing in the two-sample re-
peated measures problems using rank GEE methods in the next chapter. Moreover, using
the rank GEE approach we will develop a program for rank testing and estimation of treat-
ment e ects for repeated measurement models with the response distribution assumed to be
continuous and in the exponential family of distributions.
Consider the general linear model
Y = X + e ; (4.1.1)
where  2Rp is a vector of unknown parameters, Y2Rn denotes the response vector, and
X denotes the n p design matrix of predictors. The random error vector e = (e1;:::;en)T
is such that e1;:::;en are independent and identically distributed according to a distribution
function F with density f.
Least squares (LS) procedures are widely used for the analysis of linear models such
as the one given in (4.1.1). In addition to their mathematical clarity, they o er the user a
uni ed methodology with which to attack many diverse problems. The LS estimator of  in
(4.1.1) is given by
b LS = Argmin
 2Rp
kY X k2 ;
39
where k k2 is the squared Euclidean norm; that is kvk2LS = Pni=1v2i for v 2Rn. The LS
estimator is given by b LS = (X0X) 1X0Y. Under regularity conditions, the large sample
distribution is given by
b LS D !N( ; 2(X0X) 1) :
Since the LS estimator minimizes the Euclidean distance between the response vector and the
space de ned by the linear model, the LS estimator is not robust to outliers and skewness.
There are several classes of robust estimators that are analogous to the traditional LS
estimators. They are usually obtained by replacing the Euclidean norm with another norm.
Furthermore, depending on the selected norm, the analysis can be made robust and highly
e cient compared to the LS analysis. A popular rank-based estimator uses the rank semi-
norm instead of the Euclidean norm. The rank semi-norm on Rn is given by
kvkR =
nX
i=1
a(R(vi))vi ; v2Rn
where R(vi) denotes the rank of vi among v1;:::;vn, a(i) =  (i=(n + 1)) and  (u) is the
score function, which can be any nondecreasing function on (0;1). The fact that this is a
semi-norm is shown in [49]. The rank estimate of  is a vector b R such that
b R = Argmin
 2Rp
kY X kR :
This method was proposed by [33]. Under regularity conditions found in Chapter 3 of [44],
the asymptotic distribution of the rank estimator is given by
b RD !N( ;VR) ; (4.1.2)
where VR =  2(X0X) 1 with
  1 =
Z 1
0
 (u)
 
 f
0(F 1(u))
f(F 1(u))
 
du: (4.1.3)
40
A common rank estimator is the Wilcoxon estimator that uses  (u) =p12(u 0:5). In this
case,   1 =p12R1 1f2(t)dt.
It can be shown that the rank estimate solves the estimating equations X0a(Y X ) =
0, where a(Y X ) denotes the n-vector with ith component a[R(Yi x0i )]. The solution
cannot be obtained in closed form, but there are some algorithms available for obtaining an
approximate solution [44]. One approach proposed by [34] uses iteratively reweighted least
squares (IRLS) to estimate b R. The appeal of this method is its simplicity and that it can
be obtained using any package that can compute LS estimates.
The purpose of this chapter is to evaluate the  nite sample performance of the IRLS
estimate of b R and its variance. We provide computational algorithms as well as simulation
results pertaining to the estimator, its variance, and its in uence function.
4.2 Reweighted Least Squares Formulation
For any  2Rp, letri( ) denote theith residualYi xTi  and r( ) = (r1( );:::;rn( ))T.
Now b n minimizes the dispersion function
Dn( ) :=kr( )kR =
nX
i=1
 
 R(r
i( ))
n+ 1
 
ri( ) : (4.2.1)
We will assume that the score generating function  is standardized so that R10  (u)du = 0
and R10  2(u)du = 1. Since  is nondecreasing,  ( ) = 0 for some  2 (0;1) and  is such
that  (u)  0 ( 0) whenever u   (u   ). Let m ( ) be the  th quantile of the
ri( ); i = 1;:::;n. The dispersion function in (4.2.1) may be written as
Dn( ) =
nX
i=1
 
 R(r
i( ))
n+ 1
 
[ri( ) m ( )]
=
nX
i=1
wi( )[ri( ) m ( )]2
41
where
wi( ) =
8
>><
>>:
 
nR(r
i( ))
n+1
o
ri( ) m ( ) ; if jri( ) m ( )j6= 0 ;
0 ; if jri( ) m ( )j= 0 :
Due to the centering of the residuals by m ( ), the weights, wi( ), are nonnegative. The
expression above suggests an iterative scheme. Given the kth step estimate, b k; the (k+1)th
step estimate b k+1 minimizes the (k + 1)th step dispersion given by
D n( jb k) =
nX
i=1
wi(b k)((Yi xTi  ) m (b k))2; k = 0;1;::: ; (4.2.2)
that is
b k+1 = Argmin
 
D n( jb k) :
Such algorithms have been considered in the past for computing estimators. [50] used such a
method to obtain least absolute deviation estimators. [51] used a similar strategy to obtain
M-estimates of regression coe cients. [52] develop an IRLS algorithm for the dispersion
function given by (4.2.1) but centering using the mean of the residuals at every step.
It is shown in [34] that pnkb k b nk! 0 in probability as n!1 for every k  1
under some regularity conditions including the consistency of the initial estimator b 0. They
argued that since the asymptotic properties of b n are known, the asymptotic properties of
b k are also known and equivalent to those of b n for every k  1. What are not known,
however, are the  nite sample properties of b k.
It is well known that the LS estimator is more e cient than the rank estimator when
the error distribution is normal and the rank estimator has higher asymptotic e ciency than
the LS estimator when the error distribution has long tails [44]. For  nite samples, however,
we expect the IRLS estimator discussed in this chapter to borrow some of its properties from
LS. As such, we hypothesize that the IRLS estimator will have higher e ciency than both
the LS and the rank estimators for error distributions with moderate tail thicknesses.
42
4.3 Computational Details and Results
4.3.1 Algorithm for Estimating Regression Coe cients
Let Y i ( ) = Yi m ( ). Taking the derivative of equation (4.2.2) with respect to  
and setting the resulting expression equal to 0 we obtain
X
i
wi(b k)[Y i (b k) xTi  ]xi = 0 (4.3.1)
which is the weighted LS estimating equation for the model Y i (b k) = xTi  +"i ; 1 i n.
Given an initial estimate b 0, we  nd the (k + 1)th step IRLS estimate as
b k+1 = (X0W(b k)X) 1X0W(b k)(Y m (b k)) (4.3.2)
for k = 0;1;:::.
Here we describe the algorithm used for computing the rank estimate of  using IRLS.
Algorithm 4.1
1. Let the initial estimator b 0 be given. Set cw and c to small positive numbers (usually
10 5) and set k = 1.
2. If jri(b k 1) m (b k 1)j<cw, then set wi(b k 1) = 0 for i = 1;:::;n. Set W(b k 1) =
diag(w1(b k 1);:::;wn(b k 1)).
3. Calculate b k = (X0W(b k 1)X) 1X0W( k 1)(Y m (b k 1))
4. If kb k b k 1k>c kb k 1k, then set k = k + 1 and return to step 2; else, stop.
We compare our approach to that of the traditional approach of obtaining the rank estimate
of  . The traditional approach uses the quasi-Newton algorithm proposed by [53]. It per-
forms a Newton step starting with an initial estimate. It then checks if the step is successful
in reducing the dispersion function. If the step is successful, then it will continue Newton
43
step iterations. Otherwise, it performs a linear search to bracket the solution. We will use
the Wilcoxon semi-norm and the R code given by [54] that uses quantile regression methods
of [55] to compute the estimate using the traditional approach. This code is available at
http://www.stat.wmich.edu/mckean/HMC/Rcode. We will refer to the estimate obtained
using this method the HM estimate.
The simulation considers estimating  = ( 2; 3; 4)T in the model
Yi =  1 + 2x2i + 3x3i + 4x4i +ei ; i = 1;:::;n:
We generated xji from the Uniform(0;10j) distribution, j = 2;3;4. The errors ei were
randomly generated from the t distribution or the contaminated normal distribution in each
repetition. We shall set the true parameter  = (0;0;0)T; that is we take Yi = ei.
In repetition b, let the estimates of  obtained using IRLS, HM, and LS be b (b)Ij , b (b)HMj,
and b (b)LSj, respectively, for j = 2;3;4 and b = 1;:::;B. The average mean squared error
estimates IRLS, HM, and LS are found as
[MSEI = 1
3B
3X
j=1
BX
b=1
(b (b)Ij   )2 ;
[MSEHM = 1
3B
3X
j=1
BX
b=1
(b (b)HMj  )2 ; and
[MSELS = 1
3B
3X
j=1
BX
b=1
(b (b)LSj  )2 ;
respectively. The estimated relative e ciencies of IRLS with respect to HM and LS are
RE(I;HM) = [MSEHM[
MSEI
and RE(I;LS) = [MSELS[
MSEI
;
where RE > 1 indicates that IRLS is more e cient than the corresponding procedure.
44
The t distribution was chosen as an error distribution since it covers a wide range of
tail thicknesses starting with the Cauchy (t1) to the normal (t1). The contaminated normal
distribution is ideal for studying the e ect of outliers on estimates. We will consider a scale
contamination of the N(0;1) distribution which is created by sampling a proportion 1  
of the time from the N(0;1) distribution and a proportion  of the time from the N(0; 2)
distribution. We will use CN( ; ) to represent the contaminated normal distribution whose
distribution function is given by
F ; (x) = (1  ) (x) +  (x= ) ;
where  is the standard normal distribution function. Table 4.1 and 4.2 give the results of our
simulation study for B = 5000 repetitions. As can be seen in Table 4.1, when the errors are
Table 4.1: Relative e ciencies of IRLS versus HM and LS when the errors follow a tdf distribution
df = 3 df = 6 df = 12
n HM LS HM LS HM LS
10 0.981 1.267 1.051 1.042 1.057 .987
15 0.965 1.474 1.024 1.040 1.046 .993
20 0.945 1.433 1.011 1.065 1.039 1.005
generated from the tdf distribution, HM is more e cient than LS and IRLS when the error
distribution is heavy tailed (df = 3). LS is more e cient than HM and IRLS when the tail
of the distribution approaches the tails of the normal distribution (df = 12). IRLS is more
e cient than HM and LS for a moderate-tailed distribution (df = 6). Table 4.2 contains
Table 4.2: Relative e ciencies of IRLS versus HM and LS when the errors follow a CN( ;3) dis-
tribution
 = 0  = :05  = :10
n HM LS HM LS HM LS
10 1.096 .970 1.040 1.062 1.041 1.022
15 1.059 .968 1.011 1.094 .984 1.146
20 1.057 .974 .996 1.117 .957 1.182
results of the case where the error distribution is N( ;3). The results for CN( ;5) were
45
similar; hence omitted. It is clear that LS is the most e cient of the three when the error
distribution is normal ( = 0). For large contamination ( = :10), HM is more e cient than
LS and IRLS except for very small sample sizes. IRLS seems to have superior performance
for such cases. IRLS is the most e cient of the three for moderate contamination ( = :05).
4.3.2 Variance Estimator
Suppose b k is the consistent kth step IRLS estimator of  . Given b k, the variance of
b k+1 given in (4.3.2) is
bV = E[Var(b k+1jb k)] +Var[E(b k+1jb k)] ;
where
Var(b k+1jb k) = [(X0W(b k)X) 1X0W(b k)]Var(Y)[(X0W(b k)X) 1X0W(b k)]0 (4.3.3)
and
E(b k+1jb k) = (X0W(b k)X) 1X0W(b k)(X0  m (b k))
=   (X0W(b k)X) 1X0W(b k)m (b k) (4.3.4)
We will use the leave-one-out procedure described below to obtain estimates of (4.3.3) and
(4.3.4).
Algorithm 4.2
1. For i = 1;:::;n, leave the ith observation out, calculate the regression coe cient b (i)
using the (n 1) observations in Algorithm 4.1. Let the number of steps required to
converge be k(i) + 1.
2. Use (4.3.3) and (4.3.4) to calculate Var(b (i)k(i)+1jb (i)k(i)) and E(b (i)k(i)+1jb (i)k(i))
46
3. Set bV1 = 1nPni=1 Var(b (i)k(i)+1jb (i)k(i)) and bV2 = Var( 1nPni=1E(b (i)k(i)+1jb (i)k(i))).
4. Take bV = bV1 + bV2 as estimate of V.
In the literature, there are a number of proposed methods to estimate V =  2(X0X) 1. All
of them are based on estimating  from the set of residuals via either considering lengths of
nonparametric intervals based on the Walsh averages of the residuals [53] or nonparametric
density estimation [56]. We shall compare our estimates of V obtained through Algorithm 4.2
to that of [56]. The algorithm of [56], described brie y here, is implemented in the R code
of [54].
The method of [56] uses kernel density estimates with rectangular kernels. Letbe1;:::;ben
be the estimated residuals and let
bHn(y) = 1
n
nX
i=1
nX
j=1
[  (j=n)   (i=n)]I(jbe(i) be(j)j y);
where
  (u) =  (u)  (0) (1)  (0) :
A consistent estimator of  is given by
^ =
r n
n p 1
(
[ (1)  (0)]bHn(bH 1n ( ))=pn
2bH 1n ( )=pn
) 1
:
The recommended values of  are 0.8 if n > 5p and 0.9 otherwise. Similar to the case of
coe cients, we will refer to the estimate of the variance obtained this way as HM.
In the following, we perform a Monte Carlo comparison of the variances of the IRLS
method to HM. The setting of the study is the same as the one used in Subsection 4.3.1
and the MSEs are computed in a similar fashion. Since the distribution of the errors is
speci ed in the simulation, the true value of  is found via exact or numerical integration
using   1 =p12R1 1f2(t)dt. The relative e ciencies of IRLS versus HM using the average
MSEs are given in Tables 4.3 and 4.4. We can see from Table 4.3 that when the errors
47
Table 4.3: REs of bV of IRLS vs HM when ei tdf
n df = 3 df = 6 df = 12
10 .481 6.608 3.674
15 .391 2.200 2.482
20 .044 1.413 1.710
follow the tdf distribution, HM is more e cient for heavier tailed distributions (df = 3)
and IRLS is more e cient for moderate and light tailed distributions (df = 6;12). From
Table 4.4: REs of bV of IRLS vs HM when ei CN( ;3)
n  = 0  = :05  = :10
10 9.907 2.827 3.809
15 3.871 1.775 1.916
20 2.431 .855 .347
Table 4.4, we observe that for the CN( ;3) distribution, HM is more e cient for larger
contamination ( = :05;:10) when the sample size is relatively large (n = 20). Generally
IRLS gives more e cient estimates when there is no contamination  = 0 or the sample size
is small (n = 10;15).
4.3.3 In uence Function
One measure of robustness is the in uence function [57] which measures the amount of
change in the estimator that can be brought about by an in nitesimal local contamination.
Following the approach of [58], we can derive the in uence function of b k+1 given in (4.3.2)
at a given point (x0;y0). Let r = y x0 and de ne Aw =   1w  w and z(x0;y0) =   1w w(y0 
x00 )(y0 x00 )x0, where
 w = E
 
r@w@r xx0
 
and  w = E[wxx0] :
48
Now the in uence function of b k+1 at (x0;y0) is
IF(b k+1; x0;y0) =
kX
i=0
Aiwz(x0;y0) +Ak+1w IF(b 0; x0;y0) :
Note that for our weights
z(x0;y0) =   1w  (F(y0 x00 ))x0 :
So, IF(b k+1; x0;y0) will be bounded in the y-direction if we use a bounded score function
 and IF(b 0; x0;y0) is bounded in the y-direction. However, IF(b k+1; x0;y0) remains un-
bounded in the x-direction regardless of the boundedness of IF(b 0; x0;y0). This is in fact
true for rank estimates. Rank estimates that achieve bounded in uence have been given by
[47] and [59]. These use the so-called weighted Wilcoxon dispersion function proposed by
[60]. The resulting dispersion function is not the same as the one in (4.2.1).
To evaluate the robustness of the IRLS estimator in  nite sample cases, we performed
a Monte Carlo analysis using the sensitivity curve of the estimator. The sensitivity curve is
the empirical in uence function de ned by [61, 62]
SCn(z) =
b (z;Z1;:::;Zn) b (Z1;:::;Zn)
1=(n+ 1) ;
where Zi = (Xi;Yi) for i = 1;:::;n, z 2Rp+1, and b (Z1;:::;Zn) represents the estimator
of  based on the observations Z1;:::;Zn. This is used to get an idea of the sensitivity of
b to local changes.
We performed a Monte Carlo analysis of the sensitivity of the IRLS estimator. In our
simulation, we considered a simple linear model through the origin given by
Yi =  Xi +"i ; i = 1;:::;100 ;
49
where Xi  N(0;1), "i  N(0;16 1), and the true value of  was taken to be zero. We
obtained 1000 IRLS estimates of  , each time using the generated data and after a point
(x;y) 2f( 15; 15);( 14:9; 14:9);:::;(15;15)g was added. As initial values, we used
the least squares estimate of  as well as the high-breakdown and bounded in uence MM
estimate of [63]. The average sensitivity curve of the initial (left panel) and the fully iterated
IRLS estimate (right panel) are given in Figure 4.1. It is clear from the  gure that MM is
Figure 4.1: Sensitivity Curves
-15 -10 -5 0 5 10 15
0
10
30
50
70
LS
z
SC
(z)
-15 -10 -5 0 5 10 15
0.00
0.10
0.20
MM
z
SC
(z)
-15 -10 -5 0 5 10 15
0
2
4
6
8
10
LS-IRLS
z
SC
(z)
-15 -10 -5 0 5 10 15
0
2
4
6
8
MM-IRLS
z
SC
(z)
a robust estimator while LS has an unbounded sensitivity curve. It is also clear that, even
when we start with a highly robust estimator, the IRLS estimate of  remains sensitive to
outlying values in the data.
50
4.4 Tests of Linear Hypotheses
Consider testing a general linear hypothesis of the form
H0 : A = 0 versus H1 : A 6= 0 (4.4.1)
where A is a q p hypothesis matrix with rank q. For example, when A is the p p identity
matrix, H1 corresponds to regression signi cance. The Wald test rejects H0 if
T2 = (Ab )0(AbVA0) 1(Ab ) (4.4.2)
is larger than  21  (q). However, as shown in [64], for  nite samples, a better test rejects
H0 if T2=q > F1  (q;n p 1), where F1  (q;n p 1) corresponds to the (1  )100%
percentile of the F distribution with q and n p 1 degrees of freedom.
We now perform a small Monte Carlo simulation to compare the performance of T2
obtained via IRLS and HM. To this end, we consider the model used in Subsection 4.3.1
with n = 30. We are interested in testing
H0 :  = 0 versus H0 :  6= 0 :
This is equivalent to using A = I3, the 3 3 identity matrix, in the general linear hypoth-
esis (4.4.1). Table 4.5 contains the proportion of rejections under H0 in 1000 repetitions.
Table 4.5 shows that the Wald test based on IRLS appears to overestimate the nominal  
Table 4.5: Empirical  levels for IRLS vs HM
N(0;1) CN(:1;3)
:01 :05 :10 :01 :05 :10
IRLS :024 :072 :129 :012 :063 :083
HM :004 :035 :074 :005 :037 :070
51
and the Wald test based on HM underestimates  . IRLS comes closer to the nominal  in
the case of contaminated normal errors.
4.5 Conclusion
For linear models, the least squares estimator of the regression coe cient is optimal
if the error distribution is normal. For distributions with longer tails than the normal,
however, rank estimators are more e cient than the least squares estimator. [34] proposed
an IRLS method for estimating rank estimators. In this chapter, we study the  nite sample
performance of the IRLS method. For Wilcoxon scores, using the LS estimate as an initial
estimate, we  nd that the IRLS algorithm leads to estimates whose e ciency is between
those of the LS and rank estimates obtained using the algorithm of [54]. The LS estimates
are e cient when the distribution of the error is short-tailed. The algorithm of [54] gives
e cient estimates when the distribution of the error is long-tailed while the IRLS algorithm
gives e cient estimates when the distribution of the error is moderate-tailed. It is also
observed that there is a need for an IRLS formulation of bounded in uence rank estimators
[47]. The boundedness of the initial estimator is a necessary condition for the boundedness
of the in uence function of IRLS estimator, but it is not su cient.
52
Chapter 5
IRLS Estimation of Rank Generalized Estimation Equations Model
5.1 Introduction
The analysis of data resulting from repeated measurement designs such designs often be-
comes complicated because of the non-zero within-subject correlation. To avoid this problem
[10] proposed the Generalized Estimation Equations (GEE) Model, an extension of general-
ized linear models to the analysis of longitudinal data. They introduced a class of estimating
equations that give consistent estimates of the regression parameters and of their variance
under mild assumptions about the time dependence. The estimating equations are derived
without specifying the joint distribution of a subject?s observations yet they reduce to the
score equations for multivariate Gaussian outcomes. The GEE specify only the relation-
ship between the marginal mean of the response variable and covariates. Within-subject
correlation is then accounted for through a ?working? correlation matrix.
Since then, the GEE model has been used to deal with many speci c data models, for
example, for binary models [65] [66], in nonlinear regression models [67], in proportional
odds model [68], in Binomial model [69] [70], in logistic model [71], in contingency tables
[72], in mixed logistic models [73], in normally distributed models [74].
However, the solution of GEEs proposed by [10] is obtained using an iterated reweighted
least squares  tting which is not robust. One solution proposed by [30] is by introducing a
diagonal weight matrix for the within-subject correlation into the estimating equations. An-
other solution is one given by [31] who proposed an adaption of the Wilcoxon-Mann-Whitney
method of estimating linear regression parameters for use in longitudinal data analysis under
the working independence model. They used joint ranking (JR) of all observations in their
development. [32] consider the same model as [31] but they use separate between-subject
53
and within-subject ranks to specify their Wilcoxon-Mann-Whitney estimating equation. [2]
proposed a rank-based  tting procedure that only involves substituting a norm based on a
score function for the Euclidean norm used by [10]. Their subsequent  tting, while also an
iterated reweighted least squares solution to GEEs, is robust to outliers in response space
and further the weights can easily be adapted for robustness in factor space.
In this Chapter, we will  rst introduce the ordinary Least Square based GEE models
proposed by [10] and the Iterated Reweighted Rank-Based Estimates for GEE models. Then
under similar assumptions, we introduce our selective predation pattern problem as a special
case of the two di erent GEEs.
5.2 Ordinary LS based GEE models
Consider a longitudinal set of observations over K subjects, let yij denote the jth re-
sponse for the ith subject for j = 1;2:::;ni and i = 1;2;:::;K. Assume that xij is a p 1
vector of corresponding covariates. Let N = PKi=1 ni denote the total sample size. Assume
that the marginal distribution of yij is of the exponential class of distributions and is given
by
f(yij) = exp[yij ij a( ij) +b(yij)] (5.2.1)
where  > 0,  ij = h( ij) and  ij = xTij . Thus the mean and variance are given by
E(yij) = a0( ij) and Var(yij) = a00( ij)= 
In this notation, the link function is h 1 (a0) 1.
Let Yi = (yi1;:::;yini) and Xi = (xi1;:::;xini) denote the ni 1 vector of response and
the ni p matrix of covariates, respectively, for the ith individual. We consider the general
case where the components of the vector of responses for the ith subject, Yi, are dependent.
Let  i = ( i1; i2;:::; ini)T, so that E(Yi) = a0( i) = (a0( i1);:::;a0( ini))T. For a s 1
vector of unknown parameters  , let Ri = Ri( ) denote a ni ni correlation matrix. De ne
54
the matrix
Vi = A1=2i Ri( )A1=2i = (5.2.2)
where Ai = diag(a00( i1);:::;a00( ini)). We refer to Ri as the working correlation matrix. Vi
will be equal to cov(Yi) if Ri( ) is the true correlation matrix for the Yi. For estimation, let
bVi be an estimate of Vi and bRi be an estimate of Ri, which, in general, requires estimation
of  and often an initial estimate of  . In general, we will denote the estimator of  by
^ ( ; ) to re ect its dependence on  and  .
Liang and Zeger (1986) de ned their estimate in terms of general estimation equations
(GEE). De ne the ni p Hessian matrix,
Di = @a
0( i)
@ ;i = 1;:::;K
Then their GEE estimator b LS is the solution to the equations
XK
i=1D
T
i bV
 1
i [Yi a
0( i)] = 0: (5.2.3)
which we denoted by
XK
i=1Ui[ ;
b f ; ^ ( )g] = 0
We can de ne the dispersion function in terms of the Euclidean norm.
DLS( ) =
XK
i=1[Yi a
0( i)]TbV 1
i [Yi a
0( i)]
=
XK
i=1[
bV 12i Yi bV 12i a0( i)]T[bV 12i Yi V 12i a0( i)]
=
XK
i=1
Xni
j=1[y
 
ij dij( )]
2
55
where Y i = bV 12i Yi = (y i1;:::;y ik)T, dij( ) = cTj a0( i), and cTj is the jth row of bV 12i . The
gradient of DLS( ) is
rDLS( ) = 
XK
i=1D
T
i bV
 1
i [Yi a
0( i)]: (5.2.4)
Thus the solution to (5.2.3) also can be expressed as
^ LS = ArgminDLS( ) (5.2.5)
From this point of view, ^ LS is a nonlinear least squares (LS) estimator. The following
theorem given by [10] establishes the asymptotic normality of ^ LS.
Theorem 5.1 Under mild regularity conditions and given that:
A.1 ^ is K12 -consistent given  and  .
A.2 ^ is K12 -consistent given  ; and
A.3 j@b ( ; )=@ j H(Y; ) which is Op(1).
Then K12 (b LS  ) is asymptotically multivariate Gaussian with zero mean and covariance
matrix VLS given by
VLS = lim
K!1
K(
XK
i=1D
T
i V
 1
i Di)
 1fXK
i=1D
T
i V
 1
i cov(YiV
 1
i Di)g(
XK
i=1D
T
i V
 1
i Di)
 1
5.3 Iterated Reweighted Rank-Based Estimates for GEE Models
[75] developed a class of nonlinear robust estimators. Similar to nonlinear LS estimators,
these estimators minimize a norm of the residuals where, for a vector v2 Rn, the norm is
de ned by
kvk=
nX
i=1
 [r(vi)=(n+ 1)]vi
56
where r(vi) denote the rank of vi among v1;:::;vn and the score function  (u) is a nonde-
creasing, square-integrable function de ned on the interval (0;1), Without loss of generality,
we standardized  so that
Z
 (u)du = 0 and
Z
 (u)2du = 1
For nonnegative weights, we need one other assumption on the score function. For discussion,
we also assume that the score function is odd about 1=2; that is
 (1 u) =  (u)
let Y i = bV12i Yi = (y i1;:::;y ik)T,gij( ) = cTj a0( i), where cTj is the jth row of bV12i , and let
G i = [gij]. The rank-based dispersion function is given by
DR( ) =
XK
i=1
Xni
j=1 [r(y
 
ij gij( ))=(n+ 1)][y
 
ij gij( )] (5.3.1)
We next write the R estimator as weighted LS estimator. From this representation the
asymptotic theory of the R estimator can be derived. Furthermore, it naturally suggests
an IRLS algorithm. Let eij = y ij  gij denote the (i;j)th residual and let m( ) =
med(i;j)feij( )g denote the median of all the residuals. Then because the scores sum to
0, we have the identity,
DR( ) =
XK
i=1
Xni
j=1 [r(eij( ))=(n+ 1)][eij( ) m( )]
=
XK
i=1
Xni
j=1
 [r(eij( ))=(n+ 1)]
eij( ) m( ) [eij( ) m( )]
=
XK
i=1
Xni
j=1wij( )[eij( ) m( )]
2
where wij( ) =  [r(eij( ))=(n+ 1)]=[eij( ) m( )] is a weight function. As usual, we take
wij( ) = 0 if eij( ) m( ) = 0. Note that the weights are nonnegative.
57
Now let b (0)R denote an initial estimator of  . As estimates of the weights, we use
bwij(b (0)R ); i.e., the weight function evaluated at b (0), we have the dispersion function
D R( jb (0)R ) =
XK
i=1
Xni
j=1bwij(
b (0)R )[eij( ) m(b (0)R )]2
=
XK
i=1
Xni
j=1[
q
^wij(b (0)R )eij( ) 
q
bwij(b (0)R )m(b (0)R )]2
Let
b (1)R = ArgminD ( jb (0)R )
This establishes a sequence of IRLS estimates, b (k)R g;k = 1;2;:::
After some algebraic simpli cation, we can obtain the gradient
rD R( jb (k)R ) = 2
XK
i=1D
T
i bV
 12
i cWibV
 12
i [Yi a
0( ) m (b (k)
R )] (5.3.2)
where m (b (k)R ) = bV12i m(b (k)R )1,1 denotes a ni 1 vector all of whose elements are 1, and
cW = diagfbwi1;:::;bwin
igis the diagonal matrix of weights for the ith subject. Hence
b (k+1)R
satis es the general estimating equations (GEE) given by,
XK
i=1D
T
i bV
 12
i cWibV
 12
i [Yi a
0( ) m ( (k)
R )] = 0 (5.3.3)
Which we denoted by
XK
i=1Zi( ;  ( )) = 0
Theorem 5.2 Under these assumptions,
A.1 pKj^ ( )  j= Op(1), as K!1, when  is known.
A.2 pKjb ( ; )  j= Op(1) when  and  are known.
A.3 j@b ( ; )=@ j H(Y; ) which is Op(1)
58
A.4 (Lindeberg-Feller Conditions): For i = 1;:::;K, let Ui = V 12i and UN = [UT1 UT2 :::UTK]T.
Denote the (l;j)th entry of UN by ulj,l = 1;2;:::;n;j = 1;2;:::;ni: Then
max
1 l K
u2ljP
n
m=1
u2mj!0;for all j = 1;:::;ni
and
lim
K!1
1
nU
T
NUN exists and is positive de nite:
A.5 The score function ?(u) is bounded and satis es the standardizing conditions.
A.6 The marginal pdf of eyij = yij gij( ) is continuous and variance-covariance matrix
given in the following is positive de nite.
Assume that the initial estimate satis es pK(b 0R  ) = Op(1). Then under the above
assumptions, for k 1, pK(b (k)R   ) has an asymptotic normal distribution with mean 0
and covariance matrix,
lim
K!1
Kf
XK
i=1D
T
i V
 12
i WiV
 12
i Dig
 1fXK
i=1D
T
i V
 12
i Var(?
y
i)V
 12
i Digf
XK
i=1D
T
i V
 12
i WiV
 12
i Dig
where ?i denotes the ni 1 vector (?[r(eyi1=(n+ 1))];:::;?[r(eyini)=(n+ 1)])T and where
Yyi = V 1=2i Yi = (yyi1;:::;yyik)T
Gyi( ) = V 1=2i a0i( ) = [gyij]
eyij = yyij gyij( )
5.4 Our Case for the Ordinary LS based GEE Model
In our case of studying two sample predation preference, we are considering a longitudi-
nal set of observations over n1 +n2 subjects, Using the notation in Section 5.2, K = n1 +n2
59
andni = k. letuijdenote thejth response forith subject forj = 1;2:::;k andi = 1;2;:::;K
then we can write the model as:
uij =   +xTij 
where  = ( 1; 2;:::; k)T,   is a known intercept. If   is unknown here, use the median
of xij, where n1 < i K. Here xij = I(i n1)Ij, where Ij is a k 1 vector with the jth
item equal to 1, and all the other items equal to 0. If we denote uij  by yij then our
model becomes:
yij = xTij (5.4.1)
Let N = (n1 +n2)k denote the total sample size. Assume that the marginal distribution of
yij is of the exponential class of distributions and is given by (5.2.1). In our case,  ij =  ij =
xTij . the link function h 1 (a0) 1 = I and the mean and variance are given by
E(yij) = a0( ij) = xTij and Var(yij) = a00( ij)= = 1 
Ai = diagfa00( i1);:::;a00( ik)g= I
Vi = A1=2i Ri( )A1=2i = = Ri( )= 
a0( i) = XTi  , which means
Di =
8
>><
>>:
I; if i n1 ;
0; if n1 + 1 i K :
So the GEE estimator b LS actually is the solution to the equations
Xn1
i=1
bV 1i [Yi XTi  ] = 0: (5.4.2)
60
which we denote by
Xn1
i=1Ui[ ;
b f ; ^ ( )g] = 0
and the gradient of DLS( ) is
rDLs( ) = 
Xn1
i=1
bV 1i [Yi XTi  ]
We have the similar theorem as Theorem 5.1. The proof is given in the appendix.
Theorem 5.3 Under mild regularity conditions and given that:
A.1 b is K12 -consistent given  and  .
A.2 ^ is K12 -consistent given  ; and
A.3 j@b ( ; )=@ j H(Y; ) which is Op(1).
Then K12 (b LS  ) is asymptotically multivariate Gaussian with zero mean and covariance
matrix VLS given by
VLS = limn
1!1
n1(
Xn1
i=1V
 1
i )
 1fXn1
i=1V
 1
i cov(Yi)V
 1
i g(
Xn1
i=1V
 1
i )
 1
5.5 Our Case for the Iterated Reweighted Rank-Based Estimators for GEE
Models
In our case,  rst we need an initial intercept ^  which is given by median(uij xTij 0R),
where  0R is the initial  R satisfying pK(b 0R  ) = Op(1) and n1 <i K.
the gradient is given by
D R( jb (k)V ) = 2
n1X
i=1
bV 12i cWibV 12i [Yi X0  m (b (k)R )]
61
where Wi is the same as de ned in Section 5.3, Yi = Uki    (k) and Vi = Ri( )= 
The GEE for b (k+1)R is given by
n1X
i=1
bV 12i cWibV 12i [Yi a0( ) m ( (k)R )] = 0 (5.5.1)
which we denote by
n1X
i=1
Zi( ;  ( )) = 0 :
The following theorem gives the asymptotic distribution of b (k)R . The proof is given in
the appendix.
Theorem 5.4 Under these assumptions,
A.1 pKjb ( )  j= Op(1), as K!1, when  is known.
A.2 pKjb ( ; )  j= Op(1) when  and  are known.
A.3 j@b ( ; )=@ j H(Y; ) which is Op(1)
A.4 (Lindeberg-Feller Conditions): For i = 1;:::;K, let Ui = V 12i and Un = [UT1 UT2 :::UTn1]T.
Denote the (l;j)th entry of Un by ulj,l = 1;2;:::;K;j = 1;2;:::;k: Then
max
1 l K
u2ljP
K
m=1
u2mj!0;for all j = 1;:::;k
and
lim
K!1
1
KU
T
NUN exists and is positive de nite:
A.5 The score function ?(u) is bounded and satis es the standardizing conditions.
A.6 The marginal pdf of eyij = yij gij( ) is continuous and variance-covariance matrix
given in the following is positive de nite.
62
Assume that the initial estimate satis es pK(b 0R  ) = Op(1). Then under the above
assumptions, for k 1, pK(b (k)R   ) has an asymptotic normal distribution with mean 0
and covariance matrix,
limn
1!1
n1f
Xn1
i=1V
 12
i WiV
 12
i g
 1fXn1
i=1V
 12
i Var(?
y
i)V
 12
i gf
Xn1
i=1V
 12
i WiV
 12
i g
where ?i denotes the k 1 vector (?[r(eyi1=(n+ 1))];:::;?[r(eyik)=(n+ 1)])T and where
Yyi = V 1=2i Yi = (yyi1;:::;yyik)T
Gyi( ) = V 1=2i a0i( ) = [gyij]
eyij = yyij gyij( )
5.6 Example
Let us consider once again the study of [48] concerning the e ect of a single session of
high intensity aerobic exercise on in ammatory markers of subjects taken over time. Recall
that 18 subjects were placed into two groups (High Fitness and Moderate Fitness) depending
on their  tness levels, 9 in each group, and the response of interest here is the C-Reactive
protein (CRP). Our e ect of interest is the trend of di erence in CRP beween the two groups
(High Fitness - Moderate Fitness), we are intereted in testing will the di erence stay the
same over time or decay to zero over time. In particular, we are interested in testing the
following hypothesis:
 H0 : The di erence between two groups remains the same.
 Ha : The di erence between two groups is not the same.
Using the ordinary LS based GEE model, we get p value = 0:0087 and using the iterated
reweighted rank-based estimators for GEE model, we get p value = 0:443 and the estimate
63
of di erence is (0:594;0:331;0:003;0:254;0:413). We draw two di erent conclusions using
these two p values. We tend to believe in the p value of the iterated reweighted rank-
based estimators for GEE model because from Figure 3.2, it is clear that from either the
mean pro le or the median pro le, the di erence between the two groups HI and LO is not
zero over time.
For the data we used in [48], Figure 5.1 shows the residual values versus  tted response
Figure 5.1: Residul plots for The ordinary GEE and rank-based GEE
0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1
-1
0
1
2
3
MLE Deviance Residual Plot
fits
re
si
d
u
a
l
s
0.4 0.5 0.6 0.7 0.8 0.9 1.0
-1
0
1
2
3
4
5
Rank Deviance Residual Plot
fits
re
si
d
u
a
l
s
values for the two di erent GEE models: The LS based GEE model and the Rank based
GEE model. It is clear that the LS  t is adversely a ected by the outlying observations.
We  nd that overall the residuals are closer to 0 for the Rank Deviance Residual plot and it
is easier to identify the two outliers in this plot. This is not the case with the LS deviance
plot.
64
5.7 Conclusion
Repeated measurement designs occur in many areas of statistical research. The analy-
sis of data resulting from such designs often becomes complicated because of the non-zero
within-subject correlation. To avoid this problem [10] proposed the Generalized Estimation
Equations Model (GEE). However, the solution of GEEs proposed by igns often becomes
complicated because of the non-zero within-subject correlation, To avoid this problem [10]
is not robust because it is based on iterated reweighted least squares  tting. [2] proposed a
rank-based  tting procedure which is robust to outliers in response space. In this chapter,
we used these two GEE models to analyze the selective predation pattern problem, as a
special case of the two sample repeated measurement data. We were able to simplify some
assumptions and derive the asymptotic normality of our estimates.
65
Chapter 6
Summary
In Chapter 2, this dissertation gave nonparametric test statistics for detecting preference
patterns in selective predation by extending the method of [1]. We provided a class of general
rank score tests for the one sample case where we only have one species of predators and prey
have two features of interest. This gives the  exibility to place varied emphasis on consecutive
selections at di erent stages of the selection experiment. This could be helpful in controlling
the manner in which extraneous variables can a ect selection preference patterns.
This dissertation also proposes a class of general rank score tests for the di erence in
predation patterns of two predatory species. In this case, prey feature of interest can be
continuous or categorical. It is shown that the test statistic for the categorical case becomes
equivalent to the continuous case if the data have no ties. In both cases, the asymptotic
distribution of the test statistic is Gaussian. The results of a simulation study using the
asymptotic Gaussian distribution but small samples shows that the test has a satisfactory
 nite-sample performance. The null simulation shows that null rejection rates are close to
nominal  values. We also demonstrate that the asymptotic test is powerful in detecting
simple-ordered alternatives.
If we already have evidence to conclude that two species of predator tend to choose
di erent prey, then one might be interested in comparing the change in speed and the change
in direction of the two overall patterns. We proposed a method to compare the change in
the trends of prey selection patterns of the two species. The trend in the opposite direction
can be tested by multiplying the generalized normalized weight function by -1 in the test
statistic.
66
For linear models, the least squares estimator of the regression coe cient is optimal
if the error distribution is normal. For distributions with longer tails than the normal,
however, rank estimators are more e cient than the least squares estimator. [34] proposed
an IRLS method for estimating rank estimators. In Chapter 4, we studied the  nite sample
performance of the IRLS method. For Wilcoxon scores, using the LS estimate as an initial
estimate, we found that the IRLS algorithm leads to estimates whose e ciency is between
those of the LS and rank estimates obtained using the algorithm of [54]. The LS estimates
are e cient when the distribution of the error is short-tailed. The algorithm of [54] gives
e cient estimates when the distribution of the error is long-tailed while the IRLS algorithm
gives e cient estimates when the distribution of the error is moderate-tailed.
It is also observed that there is a need for an IRLS formulation of bounded in uence
rank estimators [47]. The boundedness of the initial estimator is a necessary condition for
the boundedness of the in uence function of IRLS estimator, but it is not su cient.
In Chapter 5, we used the IRLS representation of rank estimation to propose a rank
analogue of the GEE model proposed by Liang and Zegar (1986). We used these two GEE
methods to study the changing trend problem studied in Chapter 3. The results show that
the rank based GEE method gives superior performance than the GEE method of [10] for
small datasets with outliers.
67
Bibliography
[1] Marinela Capanu, Gregory A. Jones, and Ronald H. Randles. Testing for preference us-
ing a sum of Wilcoxon signed rank statistics. Computational Statistics & Data Analysis,
51(2):793{796, NOV 15 2006.
[2] A. Abebe, J. W. McKean, and J. D. Kloke. Iterated reweighted rank-based estimates
for gee models. Submitted, 2010.
[3] Michael G. Kenward. A method for comparing pro les of repeated measurements.
Journal of the Royal Statistical Society. Series C (Applied Statistics), 36(3):pp. 296{
308, 1987.
[4] Dj Finney. Repeated Measurements-What is Measured and What Repeats. Statistics
in Medicine, 9(6):639{644, JUN 1990.
[5] Jns Matthews, D Altman, Mj Campbell, and P Royston. Analysis of Serial Measure-
ments in Medical-Research - Reply. British Medical Journal, 300(6725):680, MAR 10
1990.
[6] L Frison and Sj Pocock. Repeated Measures In Clinical-Trials - Analysis Using
Mean Summary Statistics And Its Implications For Design. Ststistics In Medicine,
11(13):1685{1704, SEP 30 1992.
[7] T. W. Anderson. An introduction to multivariate statistical analysis. Wiley Publications
in Statistics. John Wiley & Sons Inc., New York, 1958.
[8] Peter C. O?Brien. Procedures for comparing samples with multiple endpoints. Biomet-
rics, 40(4):1079{1087, 1984.
[9] Roderick J. A. Little and Donald B. Rubin. Statistical analysis with missing data. Wiley
Series in Probability and Mathematical Statistics: Applied Probability and Statistics.
John Wiley & Sons Inc., New York, 1987.
[10] Kung Yee Liang and Scott L. Zeger. Longitudinal data analysis using generalized linear
models. Biometrika, 73(1):13{22, 1986.
[11] MC Wu and KR Bailey. Estimation and Comparison of Changes in the Presence of In-
formative Right Censoring-Conditional Linear-Model. Biometrics, 45(3):939{955, SEP
1989.
[12] L. J. Wei and Wayne E. Johnson. Combining dependent tests with incomplete repeated
measurements. Biometrika, 72(2):pp. 359{364, 1985.
68
[13] L. J. Wei and J. M. Lachin. Two-sample asymptotically distribution-free tests for
incomplete multivariate observations. Journal of the American Statistical Association,
79(387):pp. 653{661, 1984.
[14] D. V. Glidden and L. J. Wei. Rank estimation of treatment di erences based on repeated
measurements subject to dependent censoring. J. Amer. Statist. Assoc., 94(447):888{
895, 1999.
[15] A. J. Lawrance and N. T. Kottegoda. Stochastic modelling of river ow time series.
Journal of the Royal Statistical Society. Series A (General), 140(1):pp. 1{47, 1977.
[16] HE Hurst. Long-term Storage Capacity of Reservoirs. Transactions of The American
Society of Civil Engineers, 116:770{799, 1951.
[17] J. B. Carlin and A. P. Dempster. Sensitivity analysis of seasonal adjustments: Empirical
case studies. Journal of the American Statistical Association, 84(405):pp. 6{20, 1989.
[18] S Porter-Hudak. An Application of the Seasonal Fractionally Di erenced Model to the
Monetary Aggregates. Journal of the American Statistical Association, 85(410):338{344,
JUN 1990.
[19] Benoit Mandelbrot. Long-run linearity, locally gaussian process, h-spectra and in nite
variances. International Economic Review, 10(1):pp. 82{111, 1969.
[20] John Haslett and Adrian E. Raftery. Space-time modelling with long-memory depen-
dence: Assessing ireland?s wind power resource. Journal of the Royal Statistical Society.
Series C (Applied Statistics), 38(1):pp. 1{50, 1989.
[21] Jan Beran. A goodness-of- t test for time series with long range dependence. Journal
of the Royal Statistical Society. Series B (Methodological), 54(3):pp. 749{760, 1992.
[22] P. Whittle. On the variation of yield variance with plot size. Biometrika, 43(3/4):pp.
337{343, 1956.
[23] P. Whittle. Topographic correlation, power-law covariance functions, and di usion.
Biometrika, 49(3/4):pp. 305{314, 1962.
[24] Ellis Batten Page. Ordered hypotheses for multiple treatments: a signi cance test for
linear ranks. J. Amer. Statist. Assoc., 58:216{230, 1963.
[25] NM Laird and JH Ware. Random-E ects Models For Longitudinal Data. Biometrics,
38(4):963{974, 1982.
[26] JH Ware. Linear-Models For The Analysis Of Longitudinal-Studies. American Statis-
tician, 39(2):95{101, 1985.
[27] Y Ochi and RL Prentice. Likelihood Inference In A Correlated Probit Regression-Model.
Biometrika, 71(3):531{543, 1984.
69
[28] R Stiratelli, N Laird, and JH Ware. Random-E ects Models For Serial Observations
With Binary Response. Biometrics, 40(4):961{971, 1984.
[29] Gary G. Koch, J. Richard Landis, Jean L. Freeman, Jr. Freeman, Daniel H., and
Robert G. Lehnen. A general methodology for the analysis of experiments with re-
peated measurement of categorical data. Biometrics, 33(1):pp. 133{158, 1977.
[30] Bahjat F. Qaqish and John S. Preisser. Resistant  ts for regression with correlated
outcomes: an estimating equations approach. J. Statist. Plann. Inference, 75(2):415{
431, 1999. The Seventh Eugene Lukacs Conference (Bowling Green, OH, 1997).
[31] Sin-Ho Jung and Zhiliang Ying. Rank-based regression with repeated measurements
data. Biometrika, 90(3):732{740, 2003.
[32] You-Gan Wang and Min Zhu. Rank-based regression for analysis of repeated measures.
Biometrika, 93(2):459{464, 2006.
[33] Louis A. Jaeckel. Estimating regression coe cients by minimizing the dispersion of the
residuals. Ann. Math. Statist., 43:1449{1458, 1972.
[34] Gerald L. Sievers and Asheber Abebe. Rank estimation of regression coe cients using
iterated reweighted least squares. J. Stat. Comput. Simul., 74(11):821{831, 2004.
[35] T. P. Hettmansperger and J. W. McKean. Robust nonparametric statistical methods,
2nd ed. Monographs on Statistics & Applied Probability. CRC Press, Boca Raton, 2011.
[36] M Zuk and GR Kolluru. Exploitation of sexual signals by predators and parasitoids.
Quarterly Review of Biology, 73(4):415{438, DEC 1998.
[37] Nelika K. Hughes, Jennifer L. Kelley, and Peter B. Banks. Receiving behaviour is
sensitive to risks from eavesdropping predators. Oecologia, 160(3):609{617, JUN 2009.
[38] J Chesson. The Estimation and Analysis of Preference and its Relationship to Foraging
Models. Ecology, 64(5):1297{1304, 1983.
[39] B. F. J. Manly. A note on a model for selection experiments. Biometrics, 36(1):pp.
9{18, 1980.
[40] GA Jones, KE Sieving, ML Avery, and RL Meagher. Parasitized and non-parasitized
prey selectivity by an insectivorous bird. CROP PROTECTION, 24(2):185{189, FEB
2005.
[41] Myles Hollander and Douglas A. Wolfe. Nonparametric statistical methods. Wiley Series
in Probability and Statistics: Texts and References Section. John Wiley & Sons Inc.,
New York, second edition, 1999. A Wiley-Interscience Publication.
[42] J. H. Klotz. The Wilcoxon, ties, and the computer. J. Amer. Statist. Assoc., 61:772{787,
1966.
70
[43] Gregory A. Mack and Douglas A. Wolfe. k-sample rank tests for umbrella alternatives.
J. Amer. Statist. Assoc., 76(373):175{181, 1981.
[44] T. P. Hettmansperger and J. W. McKean. Robust nonparametric statistical methods,
volume 5 of Kendall?s Library of Statistics. Edward Arnold, London, 1998.
[45] Joseph W. McKean, Thomas J. Vidmar, and Gerald L. Sievers. A robust two-stage
multiple comparison procedure with application to a random drug screen. Biometrics,
45(4):pp. 1281{1297, 1989.
[46] Rudolf Beran. Asymptotically e cient adaptive rank estimates in location models. Ann.
Statist., 2:63{74, 1974.
[47] Joshua D. Naranjo and Joseph W. McKean. Rank regression with estimated scores.
Statist. Probab. Lett., 33(2):209{216, 1997.
[48] Eric P. Plaisance, J. Kyle Taylor, So ya Alhassan, Asheber Abebe, Michael L. Mestek,
and Peter W. Grandjean. Cardiovascular  tness and vascular in ammatory markers
after acute aerobic exercise. International Journal of Sport Nutrition and Exercise
Metabolism, 17(2):152{162, APR 2007.
[49] Joseph W. McKean and Ronald M. Schrader. The geometry of robust procedures in
linear models. J. Roy. Statist. Soc. Ser. B, 42(3):366{371, 1980.
[50] E. J. Schlossmacher. An iterative technique for absolute deviations curve  tting. Journal
of the American Statistical Association, 68(344):pp. 857{859, 1973.
[51] Peter J. Huber. Robust statistics. John Wiley & Sons Inc., New York, 1981. Wiley
Series in Probability and Mathematical Statistics.
[52] Kuo Sheuan Cheng and Thomas P. Hettmansperger. Weighted least-squares rank esti-
mates. Comm. Statist. A|Theory Methods, 12(9):1069{1086, 1983.
[53] Joseph W. McKean and Thomas P. Hettmansperger. A robust analysis of the general
linear model based on one step R-estimates. Biometrika, 65(3):571{579, 1978.
[54] JT Terpstra and JW McKean. Rank-based analyses of linear models using R. Journal
Of Statistical Software, 14(7), JUL 2005.
[55] Roger Koenker and Gilbert Bassett, Jr. Regression quantiles. Econometrica, 46(1):33{
50, 1978.
[56] Hira L. Koul, Gerald L. Sievers, and Joseph McKean. An estimator of the scale pa-
rameter for the rank analysis of linear models under general score functions. Scand. J.
Statist., 14(2):131{141, 1987.
[57] Frank R. Hampel. The in uence curve and its role in robust estimation. J. Amer.
Statist. Assoc., 69:383{393, 1974.
71
[58] Michael B. Dollinger and Robert G. Staudte. In uence functions of iteratively
reweighted least squares estimators. J. Amer. Statist. Assoc., 86(415):709{716, 1991.
[59] William H. Chang, Joseph W. McKean, Joshua D. Naranjo, and Simon J. Sheather.
High-breakdown rank regression. J. Amer. Statist. Assoc., 94(445):205{219, 1999.
[60] Gerald L. Sievers. A weighted dispersion function for estimation in linear models.
Comm. Statist. A|Theory Methods, 12(10):1161{1179, 1983.
[61] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw, and Werner A. Stahel.
Robust statistics. Wiley Series in Probability and Mathematical Statistics: Probability
and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986. The approach
based on in uence functions.
[62] Antonio Cuevas and Juan Romo. On the estimation of the in uence curve. Canad. J.
Statist., 23(1):1{9, 1995.
[63] V  ctor J. Yohai. High breakdown-point and high e ciency robust estimates for regres-
sion. Ann. Statist., 15(2):642{656, 1987.
[64] Thomas P. Hettmansperger and Joseph W. McKean. A robust alternative based on
ranks to least squares in analyzing linear models. Technometrics, 19(3):275{284, 1977.
[65] Kuo-Chin Lin, Yi-Ju Chen, and Yu Shyr. A nonparametric smoothing method for
assessing GEE models with longitudinal binary data. Stat. Med., 27(22):4428{4439,
2008.
[66] Sin-Ho Jung and Chul W. Ahn. Sample size for a two-group comparison of repeated
binary measurements using GEE. Stat. Med., 24(17):2583{2596, 2005.
[67] Zhongwen Tang. LOF of logistic GEE models and cost e cient Bayesian optimal designs
for nonlinear combinations of parameters in nonlinear regression models. ProQuest LLC,
Ann Arbor, MI, 2008. Thesis (Ph.D.){Kansas State University.
[68] Junxiang Luo. Goodness-of- t tests for proportional odds model with GEE for ordinal
categorical responses and estimating sampling frequency in pollen exposure assessment
over time. ProQuest LLC, Ann Arbor, MI, 2006. Thesis (Ph.D.){University of Cincin-
nati.
[69] Hui-Yi Lin and Leann Myers. Power and type I error rates of goodness-of- t statistics
for binomial generalized estimating equations (GEE) models. Comput. Statist. Data
Anal., 50(12):3432{3448, 2006.
[70] Hui-Yi Lin. A comparison of goodness-of- t tests for binomial generalized estimating
equations (GEE) models. ProQuest LLC, Ann Arbor, MI, 2004. Thesis (Ph.D.){Tulane
University.
[71] Scott Evans and Lingling Li. A comparison of goodness of  t tests for the logistic GEE
model. Stat. Med., 24(8):1245{1261, 2005.
72
[72] Ji-Hyun Lee and Bahjat F. Qaqish. Modi ed GEE and goodness of the marginal
 t (GOMF) test with correlated bindary responses for contingency tables. Biom. J.,
46(6):675{686, 2004.
[73] Mohand-Larbi Feddag, Ion Grama, and Mounir Mesbah. Generalized estimating equa-
tions (GEE) for mixed logistic models. Comm. Statist. Theory Methods, 32(4):851{874,
2003.
[74] Chi-tsung Wu, Marcia L. Gumpertz, and Dennis D. Boos. Comparison of GEE,
MINQUE, ML, and REML estimating equations for normally distributed data. Amer.
Statist., 55(2):125{130, 2001.
[75] A. Abebe and J. W. McKean. Highly e cient nonlinear regression based on the wilcoxon
norm. Festschrift in Honor of Mir Masoom Ali, pages 340{357, 2007.
73
Appendix: Proofs
Proof of Theorem 3.1
Under H0,  1 =    =  k =  . So, the expectations of all row ranks are equal
and equal to (k + 1)=2. Then E0( ~R) = (N (k + 1)=2)Jk 1. So, E0(W ) =  0E0( ~R) =
(N (k + 1)=2) 0Jk 1 = 0. Now var0(W ) =  0cov0( ~R) . But the (t;s)th element of
cov0( ~R) is
vts =
N X
i=1
N X
j=1
cov0(Rit;Rjs)
When t = s, we have
vts = N var(Rit) +N (N 2)f 112(k 1) + (C 14)(k 1)(k 2)g
= N k
2 1
12 +N
 (N 2)f 1
12(k 1) + (C 
1
4)(k 1)(k 2)g
Whent6= s, we have
vts = N cov(R1t;R1s) +N (N 2)f 112 + 2(k 2)(A 14) + (B 14)(k 2)g
= N ( k + 112 ) +N (N 2)f 112 + 2(k 2)(A 14) + (B 14)(k 2)g
where
A =
Z
P0(U11 U(n1+1)1 +U(n1+1)2 >t)1 P0(U11 U11 +U(n1+1)2 >t)dF(t);;
B =
Z
P0(U11 U(n1+1)1 +U(n1+1)2 >t)2dF(t) ; and
C =
Z
fP0(U11 U(n1+1)1 +U(n1+1)2 <t)g2dF(t) :
74
Denote Pks=1  s( sk+1)Ris = Qi; where  s =  ( sk+1); we have:
Var(W ) =
n1n2X
i=1
var(Qi) +
n1n2X
i=1
n1n2X
l=1
Cov[Qi;Ql]
= n1n2var(Q1) +n1n2(n1 +n2 2)Cov[Q1;Q2]
We obtain
Cov(Q1;Q2) = Cov(
kX
s=1
 sR1s;
kX
s=1
 sR2s;)
=
kX
u=1
kX
v=1
 u vCov(R1u;R2v)
=
kX
u=1
kX
v=1
 u vCov(
kX
p=1;p6=u
I(U1p U(n1+1)p <U1u U(n1+1)u) + 1;
kX
q=1;q6=v
I(U1q U(n1+2)q <U1v U(n1+1)v) + 1)
=
kX
u=1
kX
v=1
 u v
kX
p=1;p6=u
kX
q=1;q6=v
E[I(U1p U(n1+1)p <U1u U(n1+1)u)
I(U1q U(n1+2)q <U1v U(n1+2)v)]
 
kX
u=1
kX
v=1
 u v
kX
p=1;p6=u
kX
q=1;q6=v
E2(I(U1p U(n1+1)p <U1u U(n1+1)u))
The calculation of the  rst term in the above equation is as the following four cases:
Case 1. p6= q;u6= v;
(1) p6= v;q6= u;
E2(I(U1p U(n1+1)p <U1u U(n1+1)u)) = 14
75
(2) p = v;q = u
E[I(U1p U(n1+1)p <U1u U(n1+1)u)I(U1u U(n1+2)u <U1p U(n1+2)p]
=
Z Z
P0(U(n1+1)u U(n1+1)p <t1 t2)(1 P(U(n1+1)u U(n1+1)p <t1 t2))dFt1dFt2
= 16
(3) p = v;q6= u
E[I(U1p U(n1+1)p <X1u U(n1+1)u)I(U1u (n1+2)u <U1p U(n1+2)p]
=
Z
P0(U11 U(n1+1)1 +U(n1+1)2 >t)1 P0(U11 U(n1+1)1 +U(n1+1)2 >t)dF(t)
= A
(4) p6= v;q = u
E[I(U1p U(n1+1)p <U1u U(n1+1)u)I(U1u U(n1+2)u <U1p U(n1+2)p]
=
Z
P0(U11 U(n1+1)1 +U(n1+1)2 >t)1 P0(U11 U(n1+1)1 +U(n1+1)2 >t)dF(t)
= A
Thus
kX
u=1
kX
v=1;u6=v
 u v
kX
p=1;p6=u
kX
q=1;q6=v
E[I(X1p Y1p <X1u Y1u)I(X1q Y2q <X1v Y2v)]
=  [(2k 4)A+ (k 2)(k 3)14 + 16]
76
Case 2. p = q;u = v
E[I(U1p U(n1+1)p <U1u U(n1+1)u)I(U1q U(n1+2)q <U1v U(n1+2)v)]
=
Z Z
P20 (U(n1+1)u U(n1+1)p <t1 t2)dFt1dFt2
= 13
Thus
kX
u=1
kX
v=1
 u u
kX
p=1;p6=u
kX
p=1;p6=v
E[I(U1p U(n1+1)p <U1u U(n1+1)u)
I(U1q U(n1+2)q <U1v U(n1+2)v)]
= 13(k 1)
Case 3. p = q;u6= v
E[I(U1p U(n1+1)p <U1u U(n1+1)u)I(U1q U(n1+2)q <U1v U(n1+2)v)]
=
Z
P0(U11 U(n1+1)1 +U(n1+1)2 >t)2dF(t)
= B
Thus
kX
u=1
kX
v=1
 u v
kX
p=1;p6=u
kX
q=1;q6=v
E[I(X1p Y1p <X1u Y1u)I(X1q Y2q <X1v Y2v)]
=  (k 2)B
77
Case 4. p6= q;u = v
E[I(U1p U(n1+1)p <U1u U(n1+1)u)I(U1q U(n1+2)q <U1v U(n1+2)v)]
=
Z
fP0(U11 U(n1+1)1 +U(n1+1)2 <t)g2dF(t)
= C
Thus
kX
u=1
kX
v=1
 u u
kX
p=1;p6=u
kX
q=1;q6=v
E[I(U1p U(n1+1)p <U1u U(n1+1)u)
I(U1q U(n1+2)q <U1v Y(n1+2)v)]
= (k 1)(k 2)C
The second term
kX
u=1
kX
v=1
 u v
kX
p=1;p6=u
kX
q=1;q6=v
E2(I(U1p U(n1+1)p <U1u U(n1+1)u))
=
kX
u=1
kX
v=1
 u v
kX
p=1;p6=u
kX
q=1;q6=v
1
4(k 1)
2
Plug the result of Cov(Q1;Q2) in Var(W ), we proved the  rst part in Theorem 3.1, then
using the Central Limit Theorem, we proved the second part in Theorem 3.1.
Proof of Theorem 5.3
Write   ( ) = ^ f ; ^ ( )g. We  rst expand PKi=1 Ui( ;  ( )) in a Taylor series about
the true parameter  and evaluated at b LS. By the chain rule, the gradient in this expansion
78
is give by
ri =  Uif ; 
 ( )g
  
= @Uif ; 
 ( )g
@ +
@Uif ;  ( )g
@  
@  ( )
@ 
= Ai + BiC
Because d LS solves equation PKi=1 Ui( ;  ( )) = 0, we have
0 =
KX
i=1
Ui( ;  ( )) +
KX
i=1
ri(b LS  )
Solving for pK(b LS  ), we obtain
pK(b 
LS  ) = [ 
1
K
KX
i=1
ri] 1[ 1pK
KX
i=1
Uif ;\  ( )g]
Secondly, we  x  and expand 1pK PKi=1 Ui( ;  ( )) about the true parameter  and
evaluated at   to get
1p
K
KX
i=1
Ui( ;  ( )) = 1pK
KX
i=1
Ui( ; )
+ 1K
nX
i=1
@Ui( ; )
@ 
pK(    ) +o
p(1)
= A + B C +op(1)
79
Now, B = op(1), Using Law of Large Number and @Ui( ; )=@ are linear functions of
Yi XTi  , whose means are zeros, and condition (A.1) to (A.3) give
C =pn1(    )
=pK[^ f ; ^ ( )g ^ ( ; ) + ^ ( ; )  ]
=pK[@^ @ ( ; )(^   ) + ^ ( ; )  ]
= Op(1)
Consequently, 1pn1 PKi=1 Ui( ;  ( )) is asymptotically equivalent to A whose asymptotic
distribution is multivariate Gaussian with zero mean and covariance matrix
lim
K!1
f
KX
i=1
V 1covfYigV 1=Kg
Finally,PKi=1 = op(K) because @Ui( ;  )=@  are also linear functions of Yi XTi  ?s whose
means are zero. because
@  ( )
@ =
@  ( )
@ 
@ 
@ 
and then using condition (A.3), we can get C = Op(1), and that PKi=1 AiK converges as
K!1 to  PKi=1 V 1=K. This completes the proof.
Proof of Theorem 5.4
This proof is quite similar to the previous one.
Let   ( ) = ^ ( ; ^ ( )). Let k 1 be arbitrary but  xed. For i = 1;:::;K, let
Zi( ;  ( )) = bV 12i WibV 12i [Yi XTi   m ( )]
= bV 12i WibV12i [Yi XTi   m ( )]
= bV 12i Wi[Y i  G i( ) ml]
80
We  rst expand K 1=2PKi=1Zi( ;  ( )) in a Taylor series about the true parameter  
and evaluated at b (k)R , we can get that
pK(b (k)
R   ) =f 
1
K
KX
i=1
rig 1[ 1pK
KX
i=1
Zi( ;  ( ))]
where
ri = @Zi( ;  ( ))@ + @Zi( ;  ( ))@ @ @ = Ai + BiC
Secondly, we  xed  and expand K 1=2PKi=1 Zi( ;  ( )) about the true parameter  and
evaluated at   to get
1p
K
KX
i=1
Zi( ;  ( )) = 1pK
KX
i=1
Zi( ; ) + 1K
KX
i=1
@
@ Zi( ; )
pK(    ) +o
p(1)
= 1pK
KX
i=1
Zi( ; ) +B C +op(1)
Where the op(1) term is due to regularity conditions which imply that the remainder term is
1
KOp(1). Note that the weights are evaluated at the true parameters in this expansion too.
Because Zi( ; ) is evaluated at the true parameters, letting hTij be the jth row of the V 1=2i ,
we then have
1p
K
KX
i=1
Zi( ; ) = 1pK
KX
i=1
kX
j=1
hTijwij[yyij gyij m( )]
= 1pK
KX
i=1
kX
j=1
hTija[r(yyij gyij( ))]
= 1pK
KX
i=1
V 1=2i a[r(Yyi  Gyi( ))]
The second equality holds because the weights are evaluated at the true parameters.
By Assumption [A.5] and [A.6], it follows from Theorem 3:1 of Brunner and Denker (1994)
and the usual Cramer-Wold device that 1pn1 Pn1i=1Zi( ; ) is asymptotically normal with
81
mean 0 and variance-covariance matrix
M = 1K
KX
i=1
V 1=2i Var(?yi)V 1=2i
The remainder of the proof follows from the previous proof. In particular, the results that
B = op(1) and C = Op(1) hold here, the proof of Theorem 5.4 is complete.
82