g. This condition is satisfied if we have convergence in $L_1$ of $J_n$ [cf. also Doob (1994), Theorem VI.18]. To this end, we will marginally violate assumption (ii) of Lemma 1 and assume that
$$\sup_n \|J_n\|_p \equiv \sup_n \Big\{ \frac{1}{n}\sum_{i=1}^{n} \big|\varphi^{+}(i/(n+1))\big|^{p} \Big\}^{1/p} < \infty \qquad (2.4)$$
for $1 \le p \le \infty$. Notice also that
$$\frac{1}{n}\sum_{i=1}^{[nt]} \varphi^{+}(i/(n+1)) \;\le\; \int_0^t J_n \;\le\; \frac{1}{n}\sum_{i=1}^{[nt]+1} \varphi^{+}(i/(n+1)).$$
Taking the limit as $n\to\infty$ we obtain that $\lim_{n\to\infty}\int_0^t J_n = \int_0^t \varphi^{+}$ for all $t\in(0,1)$, provided that $\varphi^{+}$ has at most a finite number of discontinuities. Thus if $\varphi^{+}$ satisfies (2.4) and $g\in L_q$, all the conditions of Lemma 1 hold. The following corollary is a special case of this result.
Corollary 1. Let $W_1,\ldots,W_n$ be a random sample from a distribution $F$ with support on $\mathbb{R}^{+}$. Let $\psi : \mathbb{R}^{+} \to \mathbb{R}^{+}$ be a continuous Borel measurable function. Suppose, for $1 \le p, q \le \infty$ with $1/p + 1/q = 1$, $E[\psi(W)]^{q} < \infty$ and $\|\varphi^{+}\|_p < \infty$. Then
$$T_n \equiv n^{-1}\sum_{i=1}^{n} \varphi^{+}(i/(n+1))\,\psi(W_{(i)}) \xrightarrow{a.s.} \int_0^1 \varphi^{+}(u)\,\psi\big(F^{-1}(u)\big)\,du < \infty.$$
A formal proof of Corollary 1 may be constructed along the lines described in the paragraph preceding it with the function $g$ defined as $\psi\circ F^{-1}$. It will not be included here for the sake of brevity.
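The convergence in Corollary 1 can be illustrated numerically. The sketch below uses illustrative choices, not ones from the text: $W_i \sim \mathrm{Exp}(1)$, identity $\psi$, and Wilcoxon-type scores $\varphi^{+}(u) = u$, for which the limit is $\int_0^1 u\,F^{-1}(u)\,du = -\int_0^1 u\log(1-u)\,du = 3/4$.

```python
# Monte Carlo sketch of Corollary 1 (sample, psi, and scores are
# illustrative assumptions): W_i ~ Exp(1), psi(w) = w, phi_plus(u) = u,
# so T_n should approach int_0^1 -u*log(1-u) du = 3/4.
import random

def T_n(w, phi_plus, psi):
    """L-statistic n^{-1} sum_i phi^+(i/(n+1)) psi(W_(i))."""
    n = len(w)
    w_sorted = sorted(w)
    return sum(phi_plus(i / (n + 1)) * psi(w_sorted[i - 1])
               for i in range(1, n + 1)) / n

random.seed(1)
n = 200_000
sample = [random.expovariate(1.0) for _ in range(n)]
t = T_n(sample, phi_plus=lambda u: u, psi=lambda w: w)
print(t)  # should be close to the limit 3/4
```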
Lemma 2. Under assumptions A1-A3,
$$D_n(V;\theta) \xrightarrow{a.s.} \gamma(\theta) \quad \text{a.e. } V, \text{ uniformly for all } \theta\in\Theta, \qquad (2.5)$$
where $\gamma : \Theta \to \mathbb{R}$ is a function satisfying
$$\inf_{\theta\in\Theta^{*}} \gamma(\theta) > \gamma(\theta_0), \qquad (2.6)$$
for any $\Theta^{*}$ a closed subset of $\Theta$ not containing $\theta_0$.
Proof. The a.s. pointwise convergence of $D_n(V;\theta)$ follows from expression (2.3) and Corollary 1, which also furnishes the function
$$\gamma(\theta) \equiv \int_0^1 \varphi^{+}(u)\,\psi\big(\tilde{G}_{\theta}^{-1}(u)\big)\,du < \infty. \qquad (2.7)$$
Then, under A1-A3, Theorem 2 of Jennrich (1969) gives (2.5).
To establish (2.6) we follow a similar strategy as in Hössjer (1994). Under A1 and A3, for any $s > 0$ and $\theta\ne\theta_0$,
$$\tilde{G}_{\theta}(s) = P\big(|e - \{f(x,\theta) - f(x,\theta_0)\}| \le s\big) = E_x\big\{P_e\big(|e - \{f(x,\theta) - f(x,\theta_0)\}| \le s \,\big|\, x\big)\big\}$$
$$\varphi^{+}(u) = \begin{cases} -k, & \text{if } u < 2\Phi(-k) - 1, \\ \Phi^{-1}\big(\tfrac{u+1}{2}\big), & \text{if } 2\Phi(-k) - 1 \le u < 2\Phi(k) - 1, \\ k, & \text{if } u \ge 2\Phi(k) - 1, \end{cases}$$
where $\Phi$ denotes the standard normal distribution function.
Usually we take k = 4.
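A winsorized normal-scores function of this type can be sketched as follows; the truncation at $k$ mirrors the choice above, but this code is an illustration, not necessarily the text's exact definition.

```python
# Sketch of a winsorized normal-scores function: Phi^{-1}((u+1)/2)
# clamped to [-k, k]; k = 4 mirrors the choice in the text.
from statistics import NormalDist

def phi_plus(u, k=4.0):
    """Bounded, nondecreasing score function on (0, 1)."""
    z = NormalDist().inv_cdf((u + 1.0) / 2.0)
    return max(-k, min(k, z))

# Scores for n = 9 observations:
n = 9
scores = [phi_plus(i / (n + 1)) for i in range(1, n + 1)]
assert all(s2 >= s1 for s1, s2 in zip(scores, scores[1:]))  # nondecreasing
assert all(0.0 <= s <= 4.0 for s in scores)                 # bounded by k
```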
2.4 Breakdown Point
One of the virtues of the estimators discussed in this paper is that they allow for trimming. This in turn provides us with estimates that are robust when one or more of the
model assumptions are violated. In this section we will consider the breakdown point of our
estimator as a measure of its robustness. Assuming that the true value of the parameter to be estimated is in the interior of the parameter space $\Theta$, breakdown represents a severe form of inconsistency in that the estimator converges to a point on the boundary of $\Theta$ instead of to $\theta_0$.
Recall that $V = \{(x_1,y_1),\ldots,(x_n,y_n)\}\subset\mathcal{V}$ denotes the sample data points. Let $\mathcal{V}_m$ be the set of all data sets obtained by replacing any $m$ points in $V$ by arbitrary points. The finite sample breakdown point of an estimator $\hat{\theta}$ is defined as [see Donoho and Huber (1983)]
$$\varepsilon_n^{*}(\hat{\theta};V) = \min_{1\le m\le n}\Big\{\frac{m}{n} : \sup_{Z\in\mathcal{V}_m}\big|\hat{\theta}(Z) - \hat{\theta}(V)\big| = \infty\Big\}, \qquad (2.11)$$
where $\hat{\theta}(V)$ is the estimate obtained based on the sample $V$. In nonlinear regression, however, this definition of the breakdown point fails since $\varepsilon_n^{*}$ is not invariant to nonlinear reparameterizations. For a discussion of this see Stromberg and Ruppert (1992). We will adopt
the definition of breakdown point for nonlinear models given by Stromberg and Ruppert
(1992). The definition proceeds by defining finite sample upper and lower breakdown points,
$\varepsilon^{+}$ and $\varepsilon^{-}$, which depend on the regression model $f$. For any $x_0\in\mathcal{X}$, the upper and lower breakdown points are defined as
$$\varepsilon^{+}(f,\hat{\theta},V,x_0) = \begin{cases} \min\limits_{0\le m\le n}\Big\{\dfrac{m}{n} : \sup\limits_{Z\in\mathcal{V}_m} f\big(x_0,\hat{\theta}(Z)\big) = \sup\limits_{\theta\in\Theta} f(x_0,\theta)\Big\} & \text{if } \sup\limits_{\theta\in\Theta} f(x_0,\theta) > f\big(x_0,\hat{\theta}(V)\big), \\ 1 & \text{otherwise}, \end{cases} \qquad (2.12)$$
and
$$\varepsilon^{-}(f,\hat{\theta},V,x_0) = \begin{cases} \min\limits_{0\le m\le n}\Big\{\dfrac{m}{n} : \inf\limits_{Z\in\mathcal{V}_m} f\big(x_0,\hat{\theta}(Z)\big) = \inf\limits_{\theta\in\Theta} f(x_0,\theta)\Big\} & \text{if } \inf\limits_{\theta\in\Theta} f(x_0,\theta) < f\big(x_0,\hat{\theta}(V)\big), \\ 1 & \text{otherwise}. \end{cases} \qquad (2.13)$$
Suppose now that the scores satisfy $a_n(i) = 0$ for $i > k$, that is, $k = \max\{i : a_n(i) > 0\}$, where $k \ge [n/2] + 1$. Here $[b]$ stands for the greatest integer less than or equal to $b$. This forces at least the first half of the ordered absolute residuals to contribute to the dispersion
function. In light of this, the dispersion function may be written as
$$D_n(V;\theta) = \frac{1}{n}\sum_{i=1}^{k} a_n(i)\,\psi\big(|z(\theta)|_{(i)}\big).$$
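A quick numerical sketch shows why trimming helps: once $a_n(i) = 0$ for $i > k$, gross outliers never enter the sum. The residuals and the linear score choice below are illustrative only.

```python
# Sketch of the trimmed dispersion: only the k smallest ordered absolute
# residuals receive positive scores (a_n(i) = 0 for i > k, k >= [n/2]+1).
def dispersion(residuals, k, psi=lambda t: t):
    """D_n = (1/n) sum_{i<=k} a_n(i) psi(|z|_(i)) with a_n(i) = i/(n+1)."""
    n = len(residuals)
    z = sorted(abs(r) for r in residuals)
    return sum((i / (n + 1)) * psi(z[i - 1]) for i in range(1, k + 1)) / n

res_clean = [0.1, -0.2, 0.05, 0.3, -0.15, 0.12, -0.08, 0.2]
res_outlier = res_clean[:-1] + [1000.0]   # one gross outlier
k = len(res_clean) // 2 + 1               # k = [n/2] + 1
# trimming makes the dispersion insensitive to the single outlier:
print(dispersion(res_clean, k) == dispersion(res_outlier, k))  # True
```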
The following theorem is a version of Theorem 3 of Stromberg and Ruppert (1992). We impose the same conditions but give the result in terms of $k$. The results given are for upper breakdown. Analogues for lower breakdown are straightforward. The proof is obtained by replacing $\mathrm{med}_{1\le i\le n}$ with $n^{-1}\sum_{i=1}^{k}$ and $m$ with $n-k$ in Stromberg and Ruppert's (1992) proof of Theorem 3. In the following, $\#(A)$ denotes the cardinality of the set $A$.
Theorem 2. Assume for some fixed $x_0$ there exists $\Omega_k\subseteq\{x_i : 1\le i\le n\}$ with $\#(\Omega_k) = 2n - [n/2] - k$ such that
$$\lim_{M\uparrow\infty}\ \inf_{\{\theta\,:\,f(x_0,\theta)>M\}}\Big\{\inf_{x_i\in\Omega_k} f(x_i,\theta)\Big\} = \sup_{\theta\in\Theta} f(x_0,\theta).$$
Then
$$\varepsilon^{+}(f,\hat{\theta},V,x_0) \ge \frac{n-k+1}{n}.$$
Theorem 2 establishes that even when the regression function $f$ lies on the boundary for a portion of the data, the bias of the estimator of $\theta_0$ remains within reasonable bounds if trimming is implemented. The following corollary gives the asymptotic (as $n\to\infty$) breakdown point of $\hat{\theta}_n$.
Corollary 5. Let $\lambda = \sup\{u : \varphi^{+}(u) > 0\}$. The asymptotic breakdown point of $\hat{\theta}_n$ is at least $1 - \lambda$.
This is reminiscent of the breakdown point of a linear function of order statistics, which equals the smaller of the two fractions of mass at either end of the distribution that receive zero weight (Hampel, 1971). The same result obtained in Corollary 5 was given by Hampel (1971) for one-sample location estimators based on linear functions of order statistics (see Sec. 7(i) of Hampel (1971)).
Consider the class of models of the form $f(x,\theta) = g(\theta_0 + \theta_1 x)$, where $(\theta_0,\theta_1)\in\mathbb{R}^2$ and $g(t)$ is monotone increasing in $t$. This class of models is considered by Stromberg and Ruppert (1992) and contains popular models like the logistic regression model $g(\theta_0 + \theta_1 x) = \{1 + \exp(-(\theta_0 + \theta_1 x))\}^{-1}$. A breakdown point of $1 - \lambda$ can be achieved if $\hat{\theta}_n$ is obtained via a minimization of (2.2) with $a_n(i) = \varphi^{+}(i/(n+1))$ such that $\lambda = \sup\{u : \varphi^{+}(u) > 0\}$.
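The relation between the finite-sample bound of Theorem 2 and the asymptotic value $1 - \lambda$ of Corollary 5 can be checked numerically. The indicator score below is an illustrative trimming choice, not the text's exact score function.

```python
# Illustrative trimming score: phi_plus(u) = 1 for u <= lam, else 0, so
# lam = sup{u : phi_plus(u) > 0} and the asymptotic breakdown point is
# at least 1 - lam = 0.25 (Corollary 5).
lam = 0.75

def phi_plus(u):
    return 1.0 if u <= lam else 0.0

bounds = []
for n in (20, 200, 2000):
    # k = number of positive scores a_n(i) = phi_plus(i/(n+1))
    k = sum(1 for i in range(1, n + 1) if phi_plus(i / (n + 1)) > 0)
    bounds.append((n - k + 1) / n)    # Theorem 2 lower bound
print(bounds)   # decreases toward 1 - lam = 0.25 as n grows
```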
Remark 5. A definition of breakdown based on "badness measures" which includes the definition given by Stromberg and Ruppert (1992) was given by Sakata and White (1995). Under our assumptions this definition reduces to the one used in the current paper, as shown in Theorem 2.3 of Sakata and White (1995).
Chapter 3
Bounded Influence Nonlinear Signed-Rank Regression
3.1 Introduction
As in the previous chapter, let us consider the following nonlinear regression model
$$y_i = f(x_i,\theta_0) + e_i, \quad 1\le i\le n, \qquad (3.1)$$
where $\theta_0\in\Theta$ is a vector of parameters, $x_i$ is a vector of independent variables in a vector space $\mathcal{X}$, and $f$ is a real-valued function defined on $\mathcal{X}\times\Theta$. Let $V = \{(y_1,x_1),\ldots,(y_n,x_n)\}$ be the set of sample data points. Note that $V\subset\mathcal{V}\subseteq\mathbb{R}\times\mathcal{X}$. We shall assume that $\Theta$ is compact, $\theta_0$ is an interior point of $\Theta$, and $f(x,\theta)$ is a twice continuously differentiable function of $\theta$ for each $x\in\mathcal{X}$ and a measurable function of $x$ for each $\theta\in\Theta$. The errors $e_i$ are assumed to be iid with a distribution function $G$.
The asymptotic normality of the least squares (LS) estimator of $\theta_0$ has been discussed in Jennrich (1969), Wu (1981), and Wang (1996), among others. The asymptotic normality of the least absolute deviations (LAD) estimator of $\theta_0$ is discussed in Wang (1995). However, as pointed out in Haupt and Oberhofer (2009), the treatments of Wang (1995) and Wang (1996) were missing some necessary global conditions. The estimator that will be introduced in this chapter is based on a generalized form of the signed-rank objective function. It provides a unified treatment of a class of estimators including those considered in Wang (1995) and Wang (1996). Moreover, we show how a weight function can be incorporated to obtain estimators with a bounded influence function (Hampel, 1974). Simply stated, the
influence function represents the amount of change in the estimator caused by infinitesimal
contamination in the data. Thus it is a measure of the sensitivity of an estimator to outliers
and it is desired that this function be bounded.
Rank-based estimators of linear models (where $f(x,\theta_0) = x'\theta_0$ in (3.1)) have been studied extensively. Jaeckel (1972) gave a general class of rank estimators for linear regression parameters that are efficient and robust to outliers in the response space. These include the Wilcoxon estimator, which is equal to the median of pairwise slopes $(Y_j - Y_i)/(x_j - x_i)$ in the case of simple linear regression. These estimators, however, were found to be sensitive to outliers in the $x$ direction (Hettmansperger et al., 2000; Hettmansperger and McKean, 1998);
thus having an unbounded influence function. Sievers (1983) introduced weighted Wilcoxon
estimators that were later shown to possess a bounded influence function by Naranjo and
Hettmansperger (1994). Chang et al. (1999) provided one-step estimators that have a high breakdown point based on the weighted Wilcoxon pseudonorm, where the weights depend on a robust and consistent estimator of $\theta_0$.
The signed-rank (SR) estimator of the slope parameter in the linear model is also efficient and robust to outliers in the $y$ direction but sensitive to outliers in the $x$ direction
(Hettmansperger et al., 2000). Like the Wilcoxon estimator, the SR estimator is suitable when dealing with datasets from studies with controlled designs. However, it may be adversely affected when exploring datasets based on uncontrolled studies. To address the lack of robustness in the $x$ direction, Tableman (1990) provided a one-step signed-rank estimator for the linear model that has a bounded influence function. The results of Tableman
(1990) were motivated by the work of Krasker and Welsch (1982) who gave a class of M-
estimators with bounded influence function for linear regression estimation. A framework
similar to Tableman (1990) has been investigated by Wiens and Zhou (1994) who provided
bounded-influence rank estimators in the linear model using a general form of the SR objec-
tive function. They also show how the efficiency can be optimized by appropriate choices of
scores and weights under a boundedness constraint on the influence function.
For the general nonlinear model given in (3.1), Abebe and McKean (2007) studied the asymptotic properties of the Wilcoxon estimator of $\theta_0$. Just as in linear models, this estimator was shown to be efficient but sensitive to local changes in the direction of $x$. Jurečková (2008) also studied the asymptotic properties of the rank estimator of $\theta_0$ in (3.1). Her approach takes advantage of the asymptotic equivalence of regression quantiles and regression rank scores to provide rank scores based on the regression function. The approach results in a restricted set of scores. Also, the resulting estimator does not possess a bounded influence function.
In this chapter, we propose a class of rank-based estimators of $\theta_0$ in (3.1) based on the minimization of a weighted signed-rank objective function. In contrast with the approaches of Abebe and McKean (2007) and Jurečková (2008), this approach allows for a set of scores generated by any nondecreasing bounded score function that has at most a finite number of discontinuities. Also, by utilizing the theory of Sobolev spaces, this approach removes certain restrictive assumptions, such as compactness of $\mathcal{X}$, Lipschitz continuity of the regression function, and boundedness of the first derivative of the density of the error distribution, that were needed in the work of Jurečková (2008). Our objective function is very general. For instance, the LS objective function is a special case of our objective function. However, the objective function of Jurečková (2008) does not include the LS objective function. We also show how Krasker-Welsch type weights (Krasker and Welsch, 1982) can be defined based on the regression function $f$ to result in a bounded influence function. Moreover, simulation studies show that the proposed weighted estimators are also efficient, attaining a relative efficiency of 0.955 versus least squares when $G$ is Gaussian.
Other robust approaches to nonlinear regression include Stromberg (1993) who provided
computational algorithms for computing high breakdown nonlinear regression parameters
using the least median of squares (Rousseeuw, 1984) and MM (Yohai, 1987) estimators.
Stromberg (1995) establishes the consistency of the least trimmed squares (LTS) estimator
for the nonlinear model in (3.1). The LTS was shown to have a high breakdown point by
Stromberg and Ruppert (1992).
For linear models, the estimator proposed in this chapter can be regarded as a generalization of the objective function of Tableman (1990) to include other norms such as weighted
LAD and LS. Moreover, since we do not restrict ourselves to the linear model, not only is it
an extension of signed rank estimators for the linear model to the nonlinear regression case,
but it is also a generalization of LAD and LS type estimators for the nonlinear regression
model.
The remainder of the chapter is organized as follows. Our proposed estimator is given
in Section 3.2. Section 3.2 also contains asymptotic and robustness results concerning the
proposed weighted estimator. Section 3.3 gives results using a plug-in estimator of the weights based on a consistent estimator of the regression parameter. Real data and simulation examples are given in Section 3.4. Section 3.5 provides a discussion. Proofs and
technical results are given in the appendix.
3.2 Weighted SR Estimator
Consider the signed-rank (SR) estimator, $\hat{\theta}_S$, of $\theta_0$ in equation (3.1) that minimizes $T_n^{+}(\theta) = \sum_{i=1}^{n} R_i |z_i(\theta)|$, where $z_i(\theta) = y_i - f(x_i,\theta)$ and $R_i = \#\{j : |z_j(\theta)| \le |z_i(\theta)|\}$ is the rank of $|z_i(\theta)|$, $i = 1,\ldots,n$. The least squares (LS) and least absolute deviation (LAD) estimators of $\theta_0$ minimize $\sum_{i=1}^{n} z_i^2(\theta)$ and $\sum_{i=1}^{n} |z_i(\theta)|$, respectively. It is well known that the LS estimator is sensitive to outliers in both the $x$ and $y$ directions, while the SR and LAD estimators are sensitive to outliers in the $x$ direction. There is clearly a need for a method that is not sensitive to outliers in either direction. We obtain this by considering a weighted form of the SR estimator.
We define the weighted SR (WSR) estimator $\hat{\theta}_n$ of $\theta_0$ to be any vector minimizing
$$D_n(V;w,\theta) = \frac{1}{n}\sum_{i=1}^{n} w(x_i,\theta_0)\,a_n(i)\,\psi\big(|z(\theta)|_{(i)}\big), \qquad (3.2)$$
where $z_i(\theta) = y_i - f(x_i,\theta)$ and $|z(\theta)|_{(i)}$ is the $i$th ordered value among $|z_1(\theta)|,\ldots,|z_n(\theta)|$. The function $\psi : \mathbb{R}^{+}\to\mathbb{R}^{+}$ is continuous and strictly increasing. The numbers $a_n(i)$ are scores generated as $a_n(i) = \varphi^{+}(i/(n+1))$, for some bounded and nondecreasing score function $\varphi^{+} : (0,1)\to\mathbb{R}^{+}$ that has at most a finite number of discontinuities. The function $w : \mathcal{X}\times\Theta\to\mathbb{R}^{+}$ is a continuous weight function. Because $D_n(V;w,\theta)$ is continuous in $\theta$, Lemma 2 of Jennrich (1969) implies the existence of a minimizer of $D_n(V;w,\theta)$.
It is clear that weighted LS and LAD are special cases of WSR. Weighted LS is obtained by taking $\varphi^{+}\equiv 1$ and $\psi(t) = t^2$, $t\ge 0$, while weighted LAD is obtained by taking $\varphi^{+}\equiv 1$ and $\psi(t) = t$. In our analyses, however, LS and LAD refer to the unweighted versions obtained by taking $w\equiv 1$.
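The special cases above can be verified directly in code. Everything below is an illustrative sketch: the toy model, data, and the pairing of weights with ordered residuals by anti-rank are assumptions, not the text's exact construction.

```python
# Sketch of the WSR objective (3.2) for a toy one-parameter model
# f(x, theta) = exp(theta * x); data and weight pairing are illustrative.
import math

def wsr_objective(theta, xs, ys, w, phi_plus, psi, f):
    """D_n(V; w, theta): weights matched to residuals by anti-rank."""
    n = len(xs)
    z = [y - f(x, theta) for x, y in zip(xs, ys)]
    order = sorted(range(n), key=lambda i: abs(z[i]))
    return sum(w(xs[order[i]]) * phi_plus((i + 1) / (n + 1)) * psi(abs(z[order[i]]))
               for i in range(n)) / n

xs = [0.0, 0.5, 1.0, 1.5]
ys = [1.0, 1.6, 2.8, 4.4]
f = lambda x, th: math.exp(th * x)

# phi_plus == 1 and psi(t) = t*t recovers least squares (with w == 1):
ls = wsr_objective(1.0, xs, ys, lambda x: 1.0, lambda u: 1.0, lambda t: t * t, f)
direct_ls = sum((y - f(x, 1.0)) ** 2 for x, y in zip(xs, ys)) / len(xs)
# phi_plus == 1 and psi(t) = t recovers least absolute deviations:
lad = wsr_objective(1.0, xs, ys, lambda x: 1.0, lambda u: 1.0, lambda t: t, f)
direct_lad = sum(abs(y - f(x, 1.0)) for x, y in zip(xs, ys)) / len(xs)
print(abs(ls - direct_ls) < 1e-12, abs(lad - direct_lad) < 1e-12)
```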
In the following, we will establish the asymptotic properties of $\hat{\theta}_n$ and discuss how weights can be used to obtain a bounded influence function. As given in (3.2), the weights depend on the unknown true parameter $\theta_0$. This will make our derivations cleaner. However, to be of practical use, the weights would have to be estimated. In Section 3.3, we will discuss a plug-in estimator of the weights based on a consistent estimator of $\theta_0$ and show that estimators based on these estimated weights have the same asymptotic properties as their counterparts based on "true" weights.
3.2.1 Preliminaries
The following definitions and notations will be used throughout this paper. Let $\Omega$ be a domain. We denote by $L^p(\Omega,P)$, $1\le p\le\infty$, the space of $P$-measurable functions $h$ on $\Omega$ for which $\int_{\Omega}|h|^p\,dP < \infty$, with the usual modification for $p=\infty$. $C^{\infty}(\Omega)$ is the space of smooth (infinitely differentiable) functions defined in $\Omega$, $\mathcal{D}(\Omega)$ is the space of smooth functions with compact support in $\Omega$, and $L^1_{loc}(\Omega)$ is the space of locally integrable functions in $\Omega$; that is, functions that are integrable in any compact subset of $\Omega$. Let $\alpha = (\alpha_1,\ldots,\alpha_n)\in\mathbb{N}_0^n$, $\mathbb{N}_0 = \mathbb{N}\cup\{0\}$, be a multi-index. The differential operator is defined as
$$D^{\alpha} = \frac{\partial^{|\alpha|}}{\partial u_1^{\alpha_1}\cdots\partial u_n^{\alpha_n}},$$
where $|\alpha| = \sum_{i=1}^{n}\alpha_i$ and $u = (u_1,\ldots,u_n)$. Let $\varphi\in L^1_{loc}(\Omega)$. Given $\alpha\in\mathbb{N}_0^n$, a function $\nu\in L^1_{loc}(\Omega)$ is called the $\alpha$th weak derivative of $\varphi$ if, for all $\eta\in\mathcal{D}(\Omega)$,
$$\int_{\Omega} \varphi\, D^{\alpha}\eta\, du = (-1)^{|\alpha|}\int_{\Omega} \nu\,\eta\, du,$$
and we put $\nu = D^{\alpha}\varphi$.
As an example, consider $\varphi(u) = |u|$. Clearly, $\varphi$ is not differentiable in the usual sense at $0$. But $\varphi\in L^1_{loc}(\mathbb{R})$, $\varphi$ is weakly differentiable, and $\varphi'(u) = \mathrm{sgn}(u)$.
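The defining identity $\int |u|\,\eta'(u)\,du = -\int \mathrm{sgn}(u)\,\eta(u)\,du$ can be checked numerically for a smooth compactly supported test function. The bump function below, supported on $(-0.5, 1.5)$ and chosen asymmetric so neither side vanishes, is an arbitrary illustrative choice.

```python
# Numerical sanity check (a sketch) that sgn is the weak derivative of
# |u|: for a smooth test function eta with compact support,
#   int |u| eta'(u) du = - int sgn(u) eta(u) du.
import math

A, B = -0.5, 1.5                      # support of the test function

def g(u):
    return (u - A) * (B - u)          # positive on (A, B)

def eta(u):
    """Smooth bump function with compact support (A, B)."""
    return math.exp(-1.0 / g(u)) if A < u < B else 0.0

def eta_prime(u):
    # eta'(u) = eta(u) * g'(u) / g(u)**2, with g'(u) = A + B - 2u
    return eta(u) * (A + B - 2.0 * u) / g(u) ** 2 if A < u < B else 0.0

def midpoint(fn, a, b, n=100_000):
    h = (b - a) / n
    return sum(fn(a + (j + 0.5) * h) for j in range(n)) * h

lhs = midpoint(lambda u: abs(u) * eta_prime(u), A, B)
rhs = -midpoint(lambda u: math.copysign(1.0, u) * eta(u), A, B)
print(abs(lhs - rhs) < 1e-4)  # the two sides agree
```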
Let $m\in\mathbb{N}_0$ and $1\le p\le\infty$. The Sobolev space denoted by $W^{m,p}(\Omega)$ is defined as
$$W^{m,p}(\Omega) = \big\{\varphi\in L^p(\Omega) : D^{\alpha}\varphi\in L^p(\Omega) \text{ for } |\alpha|\le m\big\}.$$
Given a function $K\in L^1$ such that $\int_{\mathbb{R}^n} K(x)\,dx = 1$, let $K_{\epsilon}(x) = \epsilon^{-n}K(x/\epsilon)$. The family of functions $\{K_{\epsilon},\ \epsilon>0\}$ is called a mollifier with kernel $K$, and $K_{\epsilon}$ is known as the Friedrichs mollifier. Some important facts related to Sobolev spaces that may be useful in our discussion are listed below without proofs. A detailed discussion of these can be found in Brezis (1983) and Adams (1975).
(S1) $K_{\epsilon}\in C^{\infty}(\mathbb{R}^n)$, $\mathrm{supp}(K_{\epsilon}) = \{x\in\mathbb{R}^n : \|x\|\le\epsilon\}$, $K_{\epsilon}\ge 0$, and $\int_{\mathbb{R}^n} K_{\epsilon}(x)\,dx = 1$. Here $\mathrm{supp}(K_{\epsilon})$ denotes the support of $K_{\epsilon}$.
(S2) (Regularization Theorem) Let $K_{\epsilon}$ be a Friedrichs mollifier. If $\varphi\in L^1_{loc}(\mathbb{R}^n)$, then the convolution product
$$\varphi * K_{\epsilon}(x) = \int_{\mathbb{R}^n} \varphi(x-y)\,K_{\epsilon}(y)\,dy$$
exists for all $x\in\mathbb{R}^n$. Moreover, $\varphi * K_{\epsilon}\in C^{\infty}(\mathbb{R}^n)$, $\mathrm{supp}(\varphi * K_{\epsilon})\subseteq\mathrm{supp}(\varphi) + \bar{B}(0,\epsilon)$, where $\bar{B}(0,\epsilon) = \{x\in\mathbb{R}^n : \|x\|\le\epsilon\}$, $D^{\alpha}(\varphi * K_{\epsilon}) = \varphi * D^{\alpha}K_{\epsilon}$, and $\mathrm{supp}(D^{\alpha}K_{\epsilon})\subseteq\mathrm{supp}(K_{\epsilon})$. Also, if $\mathcal{M}$ is a compact set of points of continuity of $\varphi$, then $\varphi * K_{\epsilon}\to\varphi$ uniformly on $\mathcal{M}$ as $\epsilon\to 0$.
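Mollification can be illustrated by smoothing $\varphi(u) = |u|$ with a bump kernel. Since $||x-y| - |x|| \le |y| \le \epsilon$ on the kernel's support, the mollified function stays within $\epsilon$ of $|x|$ everywhere; the kernel and grid below are illustrative choices.

```python
# Sketch of mollification (S2): smoothing |u| with the bump kernel
# K(x) = c * exp(-1/(1 - x^2)) on (-1, 1), normalized numerically.
import math

def K(x):
    return math.exp(-1.0 / (1.0 - x * x)) if -1.0 < x < 1.0 else 0.0

def midpoint(fn, a, b, n=4000):
    h = (b - a) / n
    return sum(fn(a + (j + 0.5) * h) for j in range(n)) * h

c = 1.0 / midpoint(K, -1.0, 1.0)      # normalize so K integrates to 1

def mollify_abs(x, eps):
    """(|.| * K_eps)(x) with K_eps(y) = eps^{-1} c K(y/eps)."""
    return midpoint(lambda y: abs(x - y) * c * K(y / eps) / eps, -eps, eps)

eps = 0.1
grid = [i / 50.0 - 1.0 for i in range(101)]   # grid on [-1, 1]
err = max(abs(mollify_abs(x, eps) - abs(x)) for x in grid)
print(err <= eps)   # smoothed version stays within eps of |x|
```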
(S3) Let $K_{\epsilon}$ be a Friedrichs mollifier and let $\varphi\in W^{m,p}(\Omega)$ for $1\le p\le\infty$. Then $\varphi * K_{\epsilon}\to\varphi$ in $L^p(\Omega)$ and $\varphi * K_{\epsilon}\to\varphi$ in $W^{m,p}(\omega)$ as $\epsilon\to 0$ for all $\omega\subset\subset\Omega$. Here $\omega\subset\subset\Omega$ means that $\omega$ is open, the closure $\bar{\omega}$ is compact, and $\bar{\omega}\subset\Omega$.
3.2.2 Consistency
Let $(\Omega_0,\mathcal{F},P)$ be a probability space. For $i = 1,\ldots,n$, assume that $x_i$ and $e_i = y_i - f(x_i,\theta_0)$ are independent random variables (carried by $(\Omega_0,\mathcal{F},P)$) with distributions $H$ and $G$, respectively. Setting $\tilde{G}_{\theta}$ to denote the distribution of $|z(\theta)|$, we can rewrite $D_n(V;w,\theta)$ as
$$D_n(V;w,\theta) = \frac{1}{n}\sum_{i=1}^{n} w(x_i,\theta_0)\,a_n(i)\,(\psi\circ\tilde{G}_{\theta}^{-1})(\xi_{(i)}),$$
where $\xi_{(i)}$ are order statistics from the uniform $U(0,1)$ distribution.
Theorem 3. Let
(I1) $P(f(x,\theta) = f(x,\theta_0)) < 1$ for any $\theta\ne\theta_0$;
(I2) $w\in L^p(\mathcal{X}\times\Theta)$ and there exists a function $h\in L^q(\mathcal{V})$ such that $|\psi(\tilde{G}_{\theta}^{-1}(v))|\le h(v)$ for all $\theta\in\Theta$, where $1\le p,q\le\infty$ are such that $1/p + 1/q = 1$; and
(I3) $G$ has a density $g$ that is symmetric about $0$ and strictly decreasing on $\mathbb{R}^{+}$.
Then $\hat{\theta}_n\xrightarrow{a.s.}\theta_0$.
Before giving the proof, we state the following lemma without proof. The proof of this lemma may be constructed following that of Lemma 2.
Lemma 4. Under assumptions (A1)-(A3), $D_n(V;w,\theta)\xrightarrow{a.s.}\gamma(\theta)$ a.e. $V$, uniformly for all $\theta\in\Theta$, where $\gamma : \Theta\to\mathbb{R}$ is a function satisfying $\inf_{\theta\in\Theta^{*}}\gamma(\theta) > \gamma(\theta_0)$ for any $\Theta^{*}$ a closed subset of $\Theta$ not containing $\theta_0$.
Proof. By Lemma 1 of Wu (1981), to establish the consistency of $\hat{\theta}_n$ it is sufficient to show that
$$\liminf_{n\to\infty}\ \inf_{\theta\in\Theta^{*}}\big[D_n(V;w,\theta) - D_n(V;w,\theta_0)\big] > 0 \quad \text{a.s.} \qquad (3.3)$$
for any $\Theta^{*}$ a closed subset of $\Theta$ not containing $\theta_0$. To that end, let $A_n(V;w,\theta) = D_n(V;w,\theta) - \gamma(\theta)$, $B(\theta,\theta_0) = \gamma(\theta) - \gamma(\theta_0)$, and $C_n(V;w,\theta_0) = \gamma(\theta_0) - D_n(V;w,\theta_0)$.
By Lemma 4, we have $A_n(V;w,\theta)\xrightarrow{a.s.} 0$ uniformly for all $\theta\in\Theta$, $\inf_{\theta\in\Theta^{*}} B(\theta,\theta_0) > 0$, and $\liminf_{n\to\infty} C_n(V;w,\theta_0) = 0$ a.s. For the statement given in (3.3) to hold, it suffices to show that $\liminf_{n\to\infty}\inf_{\theta\in\Theta^{*}} A_n(V;w,\theta) = 0$ a.s. $A_n(V;w,\theta)$, being uniformly convergent and continuous on the compact set $\Theta$, is equicontinuous on $\Theta$ a.e. $V$. This gives $\liminf_{n\to\infty}\inf_{\theta\in\Theta^{*}}\{A_n(V;w,\theta)\} = 0$ a.s. and the proof is complete.
Assumption (I1) is a very weak condition needed for $\theta_0$ to be identified. The linear version of (I1) was given by Hössjer (1994) as $P(|\theta' x| = 0) < 1$ for any $\theta\ne 0$ under the assumption that $\theta_0 = 0$. Since $\varphi^{+}$ is bounded, by (I2) we have $\|w\varphi^{+}\|_p < \infty$. Moreover, (I2) and Hölder's inequality ensure that the product $(w\varphi^{+})\,\psi(\tilde{G}_{\theta}^{-1})$ is integrable. (I3) admits a wide variety of error distributions, examples of which are the normal, double exponential, and Cauchy distributions with location parameter equal to $0$.
3.2.3 Asymptotic Normality
Write $\psi_{\theta}(t) = \psi[\tilde{G}_{\theta}^{-1}(t)]$ and $\omega_i = w(x_i,\theta_0)\,a_n(R_i^{*})$, where $R_i^{*}$, $i = 1,\ldots,n$, are the ranks of $\xi_1,\ldots,\xi_n$. Then (3.2) can be written as
$$D_n(V;w,\theta) = \frac{1}{n}\sum_{i=1}^{n} w(x_i,\theta_0)\,a_n(i)\,(\psi\circ\tilde{G}_{\theta}^{-1})(\xi_{(i)}) = \frac{1}{n}\sum_{i=1}^{n}\omega_i\,\psi_{\theta}(\xi_i).$$
By (I2), $\|\omega_i\|_p < \infty$ for $1\le p\le\infty$. Now set $\Delta_n(\theta) = D^{\alpha}D_n(V;w,\theta)$ and $\dot{\psi}_{\theta}(t) = D^{\alpha}\psi_{\theta}(t)$ for $|\alpha| = 1$. Since the dependence of $\dot{\psi}_{\theta}$ on $y$ is only through $z(\theta)$, we will suppress $y$ in the notation and write $\dot{\psi}_{\theta}(x)$. Now denote by $X_{\theta}$ the $n\times p$ matrix $X_{\theta} = [\dot{\psi}_{\theta}(x_1)',\ldots,\dot{\psi}_{\theta}(x_n)']'$, and define $h_{nii}$ to be the $i$th diagonal component of $X_{\theta}(X_{\theta}^{T}X_{\theta})^{-1}X_{\theta}^{T}$. Now $\hat{\theta}_n$ is a zero of
$$\Delta_n(\theta) = \frac{1}{n}\sum_{i=1}^{n}\omega_i\,\dot{\psi}_{\theta}(\xi_i). \qquad (3.4)$$
Thus $\hat{\theta}_n$ can be seen as a weighted M-estimator with weights $\omega_1,\ldots,\omega_n$. So, under some conditions, the asymptotic theory of weighted M-estimation can be applied.
In addition to (I1)-(I3), consider the following conditions:
(I4) $\theta\mapsto\psi_{\theta}(t)$ is a map in $W^{3,p}(B)$, where $B$ is a neighborhood of $\theta_0$, for every fixed $t$.
(I5) There exist functions $M_{\alpha}\in W^{2,p}(\mathcal{V})$ such that $|D^{\alpha}\psi_{\theta}(t)|\le M_{\alpha}(t)$ for every $\theta\in B$ and $|\alpha|\le 2$.
(I6) $A_{\theta_0} = E\big[w(x,\theta_0)\,\varphi^{+}(\xi)\,[D^{\alpha}\psi_{\theta}(\xi)]_{\theta=\theta_0}\big]$, where $\xi\sim U(0,1)$, is a positive definite matrix for $|\alpha| = 1$.
(I7) $\lim_{n\to\infty}\max_{1\le i\le n} h_{nii} = 0$.
Example 1. The assumptions above allow us to define certain types of hybrid estimators which may be constructed in the interest of efficiency and robustness. One such estimator is one that behaves like an LS estimator for small absolute residuals and like an LAD estimator for large absolute residuals. As an illustration, let us consider the one-dimensional case with $\Theta = [a,b]\cup[b,c]$, where a
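A hybrid of the kind described in Example 1 can be sketched with a Huber-type $\psi$: quadratic (LS-like) for small absolute residuals and linear (LAD-like) beyond a cutoff. The cutoff value $b$ below is an illustrative choice, not one taken from the text.

```python
# Sketch of a hybrid psi in the spirit of Example 1: LS-like below the
# cutoff b, LAD-like above it; b = 1.345 is an illustrative choice.
def psi_hybrid(t, b=1.345):
    """Continuous, increasing on R+: t^2/2 below b, linear above."""
    return 0.5 * t * t if t <= b else b * t - 0.5 * b * b

# continuity at the join and monotonicity on a grid:
assert abs(psi_hybrid(1.345) - psi_hybrid(1.345 + 1e-9)) < 1e-6
vals = [psi_hybrid(0.1 * i) for i in range(50)]
assert all(v2 > v1 for v1, v2 in zip(vals, vals[1:]))
print("ok")
```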