New Classes of Multivariate Gamma Survival and Reliability Models

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Norou Dini Diawara

Certificate of Approval:

Asheber Abebe, Assistant Professor, Mathematics and Statistics
Mark Carpenter, Chair, Associate Professor, Mathematics and Statistics
Nedret Billor, Associate Professor, Mathematics and Statistics
Olav Kallenberg, Professor, Mathematics and Statistics
Jerzy Szulga, Professor, Mathematics and Statistics
Stephen L. McFarland, Dean, Graduate School

New Classes of Multivariate Gamma Survival and Reliability Models

Norou Dini Diawara

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 7, 2006

New Classes of Multivariate Gamma Survival and Reliability Models

Norou Diawara

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
August 7, 2006, Date of Graduation

Vita

Norou Dini Diawara, son of Moustapha Bassirou Diawara and Khady Diatou Dieng, was born in Ziguinchor, Sénégal on November 15, 1971. He attended public school in Dakar, Senegal. After graduating from Lamine Gueye's High School, he entered Cheikh Anta Diop University (U.C.A.D.) in September 1991, from which he received his B.S. in Mathematics and Physics in 1993. He started his graduate studies in the Department of Mathematics at U.C.A.D. With the help of his family and his uncle Alioune Diop, he went on to complete his Licence and Maîtrise at the University of Le Havre in Le Havre, France in 1996. He continued his studies at the University of South Alabama in Mobile, AL, USA, where he earned his M.S. in Mathematics and Statistics in 1999. After two years at the University of Georgia in Athens, GA, as a graduate teaching assistant, and a period working for the Canadian Consulate General in Atlanta, GA, he came to Auburn University in August 2002, where he completed his Ph.D. in Mathematics and Statistics in 2006.

Dissertation Abstract

New Classes of Multivariate Gamma Survival and Reliability Models

Norou Dini Diawara

Doctor of Philosophy, August 7, 2006
(M.S., University of South Alabama, Mobile, AL, 1999)
(M.S., Le Havre University, Le Havre, France, 1996)

121 Typed Pages

Directed by Mark Carpenter

Multivariate modeling and analysis based on the multivariate normal distribution is well established and widely used. However, when the marginal distributions have only positive support and are positively skewed, as in time-to-event models, the multivariate normal theory and the resulting approximations often fail. Accordingly, over the last fifty years, thousands of papers have been published suggesting ways of generating families of positive-support multivariate distributions, such as the gamma, Weibull and exponential. As evidenced by recent literature, this quest is still rigorously pursued today. In this dissertation, we provide a large and flexible class of multivariate gamma distributions that contains both absolutely continuous and discontinuous distributions on the positive hypercube. All of these models are applicable to reliability and survival modeling.
Acknowledgments

I would like to thank my advisor, Dr. Carpenter, for his academic guidance, generous advice, sharp comments and support during our many discussions. I am very appreciative of his encouragement and patience when things seemed fuzzy. Without his support, this thesis would not have been completed. It has been a privilege working with him.

I also wish to express my sincere gratitude to the members of my committee: Dr. Ash Abebe, Dr. Nedret Billor, Dr. Olav Kallenberg and Dr. Jerzy Szulga, for their time, patience and suggestions that led me to improve this work. I have truly enjoyed working with each one of them, and I really appreciate their full support. Special thanks to Dr. Olav Kallenberg for the knowledge I have gained through the seminars and the years of research we had started. I thank him for his continuous support since I came to Auburn University, his kindness, time, and collaboration. Thanks to the faculty and staff of our Department, and to my fellow students, for the great experience at Auburn.

I am deeply indebted to my family, who have always supported me, and especially to my parents and my siblings Papa Amadou, Amadou Belal, Ndeye Maguette, and Ndeye Fatou, for their love, guidance and vision. Special thanks to my uncle Alioune Diop and my aunt Ndiaya Mbengue for their kindness, support and encouragement. I am grateful to my wife, Oumy Ba, for her endless love, friendship, support and encouragement. I thank her for her patience during all the stages when we had to spend time apart, and for taking care of our baby Sidy Manoumbé on her own. She is an inspiration to me. I dedicate this dissertation to my beloved brother Sidy Manoumbé, and to all those who truly believed in me, encouraged me and supported me during this long path.

Style manual or journal used: Transactions of the American Mathematical Society (together with the style known as "auphd"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: the document preparation package TeX (specifically LaTeX) together with the departmental style file auphd.sty.

Table of Contents

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Research Question
  1.3 The Univariate Gamma Distribution
    1.3.1 Properties
    1.3.2 Estimation of Parameters
  1.4 Literature Review
  1.5 Marginal, Conditional and Joint Distributions
  1.6 Thesis Outline

2 Preliminaries
  2.1 The Laplace Transform
  2.2 The Dirac Delta Function
  2.3 Infinite Divisibility
  2.4 Stable Distributions

3 The Multivariate Gamma
  3.1 Properties and Characterization
  3.2 The Fatal Shock Model
  3.3 The Multivariate Gamma
  3.4 The Autoregressive Model
  3.5 Summary

4 The Continuous Case Model
  4.1 Definition of Model
  4.2 Properties
  4.3 Estimation Procedures

5 The Multivariate Exponential
  5.1 The Multivariate Exponential
  5.2 The Bivariate Exponential Model and MLE
  5.3 Simulation Example

6 Conclusion

Bibliography

List of Figures
  1.1 Graph of gamma pdf
  1.2 Graph of gamma cdf
  2.1 Graph of bivariate gamma distribution

Abbreviations

iid: independent and identically distributed
EM: expectation maximization
LST: Laplace-Stieltjes transform
MCMC: Markov chain Monte Carlo
MLE: maximum likelihood estimator
MSE: mean square error
cdf: cumulative distribution function
pdf: probability density function (or mass function in the discrete case)
pp.: pages
prob.: probability
et al.: and others
a.s.: almost surely
i.e.: that is
r.s.: random sample
r.v.: random variable
$a \wedge b$: minimum of $a$ and $b$
$a \vee b$: maximum of $a$ and $b$

Notations

$\mathbb{R}$: set of real numbers
$\mathbb{R}_+$: set of positive real numbers
$\mathbb{Z} = \{0, \pm 1, \ldots\}$: set of integers
$\mathbb{N} = \{1, 2, \ldots\}$: set of positive integers
$X$: random variable
$p$: dimension of the multivariate distribution
$[a]$: the integer part of a number $a$
$\mathcal{P}$: the power set
$|A|$: cardinality of a set $A$
$A(k)$: $k$th element of the set $A$
$A^c(k)$: $k$th element of the set $A^c$, the complement of $A$
$E(X)$, $\mathrm{Var}(X)$: expectation and variance of the r.v. $X$
$Ga(\nu, \lambda, \alpha)$: gamma distribution with location $\nu$, scale $\lambda$, and shape $\alpha$
$\Gamma(\alpha)$: gamma function with parameter $\alpha$
$\Theta$: parameter set of the full model
$\hat{\Theta}$: maximum likelihood estimate of the parameter set of the full model
$F_{X|Y}$ and $f_{X|Y}$: conditional cdf and pdf, respectively
$X \sim f(x)$: the random variable $X$ follows the density $f(x)$
$\mathcal{L}_X$: the Laplace transform of the random variable $X$
$\propto$: proportional to

Chapter 1
Introduction

1.1 Motivation

Multivariate survival or reliability analysis refers to the multivariate modeling of the different times to multiple events recorded on the same experimental unit. In such cases, when more than one time-to-event is observed for each individual under study, it is reasonable to assume that the event times are dependent. However, it is common practice to assume "working independence" (see Lawless (2003) [56], for example), where each event is analyzed separately while ignoring this dependence, effectively treating the events as independent. This seemingly pragmatic approach can lead to confusing interpretations and incorrect results. Carpenter et al. (2006) [12] showed that ignoring dependence between variables comes at a great cost in terms of bias, mean square error (MSE), and other objective criteria in the bivariate exponential case. This is quite intuitive, because ignoring dependence amounts to maximizing the wrong, or misspecified, likelihood function.
That is, the maximum likelihood estimators (MLEs) of the parameters based on the marginal likelihoods are not necessarily the same as the MLEs derived from the joint likelihood. Another problem with ignoring dependence is that such an analysis would require one to adjust for the multiple testing problem. In clinical trials, for example, the Food and Drug Administration routinely requires conservative adjustments for multiple endpoints, such as a primary endpoint of time-to-death due to a brain tumor and a secondary endpoint of time-to-recurrence of the tumor after surgery.

Because the marginal distributions have positive support and are often strongly positively skewed, treating the observations as multivariate normal can lead to incorrect results and/or convergence issues. For example, the multivariate mixture model is used in many applications for unsupervised classification. Hougaard (2001) [37] advises caution in using the multivariate normal when the underlying distribution does not conform to it. Also, these methods can be computationally intensive (see Borovkov and Utev (1984) [7], Chen (1982) [14], McLachlan and Peel (2000) [67], Johnson and Wichern (1998) [48]), even when the underlying latent populations are multivariate normal. In fact, McLachlan and Peel (2000) [67] devote a whole chapter to mixture models for failure time data.

Therefore, over the last fifty years there has been great interest in deriving and characterizing multivariate distributions with positive support and skewed marginals. A few of the many widely known references are Arnold (1967) [2], Barlow and Proschan (1965) [5], Fang (1990) [24], Furman (2005) [28], Ghosh and Gelfand (1998) [30], Hanagal (1996) [32], Hougaard (2000) [36], Hougaard (2001) [37], Joe (1997) [44], Kotz et al. (2000) [53], Lee (1979) [57], Lu and Bhattacharyya (1990) [59], Marshall and Olkin (1967) [61], Mathai and Moschopoulos (1991) [63], Mathai and Moschopoulos (1992) [64], Moran (1969) [69], Sivazlian (1981) [76], and Walker and Stephens (1999) [79].

In this dissertation, we provide a large and flexible class of multivariate gamma distributions that contains both absolutely continuous and discontinuous distributions on the positive hypercube. All of these models are applicable to reliability and survival modeling. Also, since this theoretical research grew out of a real-world application, we provide methods of parameter estimation for a variety of applications.

1.2 Research Question

Our focus is on generating multivariate survival and reliability models with specified marginal distributions of the same family, i.e., exponential, Weibull or gamma. One common use of these models, in survival analysis, is to examine the survival times of patients in medical or biological settings, based on factors such as treatment, age and education, in order to predict the time to death or the time to certain events. In the industrial engineering setting, the study of the time to failure of devices, machines or components in a system is referred to as reliability analysis. Although the applications are very different, these models can be used in both the survival and the reliability context.

Our approach to creating a dependence structure between multiple events is to indirectly associate the random variables (r.v.'s) linearly with latent, unobservable r.v.'s.
Specifically, let $p$ be some fixed integer, and let $X_0, X_1, \ldots, X_p$ be r.v.'s with specified distributions of the same family, say $f(x \mid \Theta)$, that are linked through the mutually independent r.v.'s $Z_1, Z_2, \ldots, Z_p$ in the following way:
$$X_i = a_i X_0 + Z_i, \qquad (1.1)$$
where the $a_i$'s are nonnegative constants and $Z_i$ is independent of $X_0$ for $i = 1, 2, \ldots, p$. Note that $X_0, Z_1, Z_2, \ldots, Z_p$ are latent, unobservable r.v.'s that generate the observable multivariate vector $X = (X_1, X_2, \ldots, X_p)'$.

Much of the research in this dissertation concerns the derivation and characterization of the classes of distributions of $Z_1, Z_2, \ldots, Z_p$ that produce the specified marginal distributions of $X_1, X_2, \ldots, X_p$, respectively. We derive and study estimators of the parameters under the resulting dependence structure. Interestingly, if we treat the latent variables $X_0, Z_1, \ldots, Z_p$ as missing observations, then methods based on conditional expectation and/or expectation-maximization (EM) algorithms have great potential for estimating the parameters associated with the latent population. We develop such approaches, as well as missing value substitution techniques, to improve our estimates of the parameters of the marginal distributions and of the parameters associated with the latent terms.

Since this dissertation focuses on a new multivariate gamma distribution, in the sections that follow we review the univariate gamma and the other multivariate gammas, in their various forms, found in the literature and in practice.

1.3 The Univariate Gamma Distribution

Positively skewed distributions on a positive support occur quite often in practical applications such as reliability and survival analysis. Lawless (2003) [56] gives examples of the most common models and their applications. The gamma family of distributions is right tailed, and is a natural choice for positive-valued distributions with heavy tails. It is widely used in reliability and survival analysis. For example, the gamma distribution is used in statistical studies of times between earthquakes, so estimation of its parameters is very important.

The gamma distribution belongs to the exponential family of distributions on the real line with respect to Lebesgue measure; see McCullagh and Nelder (1989) [66], for example. This family plays an important role in many areas of probability and statistics through the normal, exponential, gamma, Poisson, binomial and many other distributions. The exponential family is a large class that allows us to handle a wide range of distributions, and many of its properties are similar to those of the normal distribution. Lehmann (1986) [58] gives results for this class of distributions. McCullagh and Nelder (1989) [66] give examples of models where the gamma distribution is used in a regression structure under the generalized linear model (GLM). The GLM provides a unified theory for many models in practical statistics, including models for gamma responses and survival data.

We consider the univariate three-parameter gamma distribution $X$, denoted here as $Ga(\nu, \lambda, \alpha)$. If $\nu = 0$, we denote the two-parameter gamma distribution $X$ as $Ga(\lambda, \alpha)$.

Definition 1.1 The three-parameter gamma distribution (see Cohen and Whitten (1986) [16] and Billingsley (1986) [6] for examples) is defined by the density function $f_X(\cdot; \nu, \lambda, \alpha) : t \in (\nu, \infty) \mapsto f_X(t; \nu, \lambda, \alpha) \in \mathbb{R}_+$ of the three parameters $\nu \in \mathbb{R}$, $\lambda > 0$ and $\alpha > 0$, given as
$$f_X(t; \nu, \lambda, \alpha) = \frac{\lambda^\alpha}{\Gamma(\alpha)} (t - \nu)^{\alpha - 1} e^{-\lambda (t - \nu)} I_{[\nu, \infty)}(t), \qquad (1.2)$$
where $\nu \in \mathbb{R}$, $\lambda > 0$ and $\alpha > 0$ are the location, scale and shape parameters, respectively. Here $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$ is the gamma function.
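For a quick numerical check of (1.2), note that $Ga(\nu, \lambda, \alpha)$ corresponds, in the rate parametrization used here, to standard software routines with shape $\alpha$, location $\nu$ and scale $1/\lambda$. The sketch below is a minimal illustration assuming SciPy is available; the parameter values are arbitrary and not taken from the dissertation.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

# Illustrative parameter values (not from the dissertation): nu, lam, alpha
nu, lam, alpha = 1.0, 0.5, 3.0

def gamma3_pdf(t, nu, lam, alpha):
    """Three-parameter gamma density (1.2), written out directly."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    m = t >= nu
    out[m] = lam**alpha / gamma_fn(alpha) * (t[m] - nu)**(alpha - 1) * np.exp(-lam * (t[m] - nu))
    return out

t = np.linspace(nu, nu + 25, 6)
direct = gamma3_pdf(t, nu, lam, alpha)
# Same density via SciPy's parametrization: shape=alpha, loc=nu, scale=1/lam
via_scipy = gamma.pdf(t, a=alpha, loc=nu, scale=1.0 / lam)
print(np.allclose(direct, via_scipy))  # expect True
```

The mapping scale = $1/\lambda$ is the main point: the rate parametrization of (1.2) differs from the scale parametrization used by most software.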
The three-parameter gamma distribution is also known as the Type III distribution in Pearson's system of distributions. Rather than anchoring the distribution at the origin, it is sometimes more appropriate to shift it along the real axis, allowing the domain to be $[\nu, \infty)$, as is done in (1.2). The scale parameter $\lambda$ is also called the rate, or inverse scale under the inverse parametrization. The smaller the value of $\lambda$, the wider the spread, and vice versa; for the normal distribution, by contrast, the scale parameter is the standard deviation. When $0 < \alpha < 1$, the density has a vertical asymptote at $\nu$. When $\alpha > 1$, the density has a mode at $\frac{\alpha - 1}{\lambda} + \nu$ and a heavy right tail. See Figure 1.1 and Figure 1.2 below for the pdf and cdf, respectively.

The gamma function $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$, with $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$, is the normalizing constant of the distribution and satisfies the relation $\alpha \Gamma(\alpha) = \Gamma(\alpha + 1)$. The incomplete gamma function is defined as $I(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_0^t x^{\alpha-1} e^{-x}\,dx$, as in Lawless (2003) [56], and $\Gamma(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_t^\infty x^{\alpha-1} e^{-x}\,dx$ is called the complementary incomplete gamma function.

Several special cases of the gamma, such as the Erlang and the exponential, exist and are widely used. The gamma distribution of (1.2) with location at the origin and shape parameter equal to one is the exponential distribution. Most statistical packages, such as SAS and S-Plus, generate exponential variates. Its pdf is
$$f(t) = \lambda e^{-\lambda t}, \quad \text{where } \lambda > 0 \text{ and } t \ge 0. \qquad (1.3)$$
The displaced exponential is the exponential shifted by some value $\nu$, denoted $\exp(\nu, \lambda)$ and defined as
$$f(t) = \lambda e^{-\lambda (t - \nu)}, \quad \text{where } \lambda > 0 \text{ and } t \ge \nu. \qquad (1.4)$$

Figure 1.1: Graph of the gamma pdf ($\lambda = 1/2$; $\alpha = 1, 3, 5$).

The Erlang distribution is the particular case of the gamma distribution in which the shape parameter $\alpha$ in (1.2) is a positive integer, denoted here as $n$. Its pdf is
$$f(t) = \frac{\lambda^n}{\Gamma(n)} (t - \nu)^{n-1} e^{-\lambda (t - \nu)}, \quad t \ge \nu, \text{ where } \lambda > 0 \text{ and } n \in \mathbb{N}. \qquad (1.5)$$
The Erlang distribution is often used to model waiting times in queueing systems, particularly in telephone traffic engineering. It is the probability distribution of the waiting time until the $n$th arrival in a one-dimensional Poisson process with intensity $\lambda$; the case $n = 1$ reduces to the exponential distribution.

Figure 1.2: Graph of the gamma cdf ($\lambda = 1/2$; $\alpha = 1, 3, 5$).

The exponential distribution is the distribution of the time between successive occurrences of events in a Poisson process. Another interpretation, due to Barlow and Proschan (1965) [5], explains the Erlang distribution as the waiting time until failure of a device subjected to random shocks: the device fails when exactly $\alpha$ shocks, $\alpha \in \mathbb{N}$, have occurred randomly over time at a Poisson rate with parameter $\lambda$. That is, the Erlang is the convolution of $\alpha$ exponential distributions.
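The shock interpretation above is easy to reproduce by simulation. The following minimal sketch (assuming NumPy and SciPy are available; the rate $\lambda = 0.5$ and the number of shocks are illustrative choices, not values from the dissertation) sums independent exponential interarrival times and checks that the waiting time until the third shock behaves like a $Ga(0, \lambda, 3)$ variate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, n_shocks = 0.5, 3          # illustrative rate and number of shocks
n_sim = 100_000

# Waiting time until the n-th shock = sum of n iid exponential interarrival times
interarrivals = rng.exponential(scale=1.0 / lam, size=(n_sim, n_shocks))
waiting_times = interarrivals.sum(axis=1)

# Compare with the Erlang/gamma distribution Ga(0, lam, n_shocks)
ks = stats.kstest(waiting_times, "gamma", args=(n_shocks, 0.0, 1.0 / lam))
print(waiting_times.mean(), n_shocks / lam)   # both should be close to 6
print(ks.pvalue)                              # should not be small: consistent with Ga(0, 0.5, 3)
```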
1.3.1 Properties

The class of gamma distributions, which includes the exponential, the Erlang and the chi-square distributions, is widely used in applied and theoretical statistics. Unlike the normal distribution, the gamma distribution is asymmetric and cannot be characterized by its mean and variance (the first and second moments) alone. In addition to the mean and variance, this distribution is characterized by its third and fourth order moments, known as skewness and kurtosis, and by higher moments; see Grove and Coddington (2005) [31], for example. The expected value and variance of the gamma distribution in (1.2) are
$$E X = \frac{\alpha}{\lambda} + \nu \quad \text{and} \quad \mathrm{Var}(X) = \frac{\alpha}{\lambda^2}, \qquad (1.6)$$
respectively. The probability that the event of interest has not occurred by time $x$, the survival or reliability function, is given by
$$S_X(x) = 1 - F_X(x) = 1 - I\big(\alpha, \lambda(x - \nu)\big), \qquad (1.7)$$
where $I(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_0^t u^{\alpha-1} e^{-u}\,du$. The instantaneous rate of occurrence of the event, also called the hazard function or failure rate function, is obtained from (1.7) and is defined as
$$h(x) = \frac{f(x)}{S(x)} = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, \frac{(x - \nu)^{\alpha-1}}{1 - I\big(\alpha, \lambda(x - \nu)\big)}\, e^{-\lambda(x - \nu)}, \qquad (1.8)$$
and the cumulative hazard rate is
$$H(x) = \int_\nu^x h(u)\,du = -\log S(x). \qquad (1.9)$$

The hazard function (1.8) of the gamma density (1.2) is increasing from $0$ to $\lambda$ when $\alpha > 1$, decreasing from $\infty$ to $\lambda$ when $\alpha < 1$, and constant when $\alpha = 1$ (the exponential case). Many authors have recognized the importance of modeling and studying the hazard function (1.8) or the cumulative hazard rate (1.9). For example, Cox (1972) [20] used the hazard function to estimate the effects of different covariates on the times to failure of a system through his famous proportional hazards model. Balakrishnan and Wang (2000) [4] showed that the estimation problem for the three-parameter gamma has always been a challenging and interesting one, in particular when the shape parameter $\alpha$ is small (less than 2.5). As the shape $\alpha \to \infty$, the gamma distribution in (1.2) approaches normality.

Another property of the gamma pdf is that it is closed under multiplication by a positive scalar. More precisely, if $X \sim Ga(\nu, \lambda, \alpha)$, then
$$Y = cX \sim Ga\Big(c\nu, \frac{\lambda}{c}, \alpha\Big) \qquad (1.10)$$
for all $c > 0$, and
$$f_Y(y) = f_X\Big(\frac{y}{c}\Big)\,\frac{1}{c} = \frac{(\lambda/c)^\alpha}{\Gamma(\alpha)}\, (y - c\nu)^{\alpha-1}\, e^{-\frac{\lambda}{c}(y - c\nu)}.$$

The Laplace transform of (1.2) is given by
$$\mathcal{L}_X(t) = e^{-\nu t} \Big(\frac{\lambda}{\lambda + t}\Big)^\alpha. \qquad (1.11)$$
Some advantages of the gamma distribution lie in the simple forms of its density (1.2) and of its Laplace transform (1.11), further discussed in Example 2.2. Another important property is that the Erlang-type distribution $Ga(\nu, \lambda, m)$ can be thought of as the sum of $m$ independent displaced exponential distributions as in (1.4), with parameters $\nu_i$ and $\lambda$ such that $\sum_{i=1}^m \nu_i = \nu$. More precisely, if $X_i \sim \exp(\nu_i, \lambda)$, $i = 1, 2, \ldots, m$, then $X = \sum_{i=1}^m X_i \sim Ga\big(\sum_{i=1}^m \nu_i, \lambda, m\big)$. Dudewicz and Mishra (1988) [23] and Billingsley (1986) [6] showed that a sum of independent gamma r.v.'s with a common scale parameter $\lambda$ is again a gamma r.v., with the location and shape parameters added. The class of distributions $\{Ga(\nu, \lambda, m),\ m \in \mathbb{N}\}$ is then additive, since its sum operation is well defined. This is analogous to the informally defined multiplicative class of distributions that have a well-defined multiplication operation.

1.3.2 Estimation of Parameters

Since the gamma distribution has many applications and is widely used, obtaining reliable estimates of its parameters is a key issue, especially since the gamma can model a variety of hazard functions. There are several methods of estimation, such as graphical methods, the method of moments, the least squares method, and the maximum likelihood method. The estimates of the shape and scale are affected if the location has to be estimated by the method of moments; refer to Bowman and Shenton (2002) [9] for an example. The location parameter $\nu$, in most cases non-negative, is typically called the threshold parameter.
Here, it is interpreted as the minimum possible lifetime and can be estimated by the minimum order statistic, since the condition $x \ge \nu$ must always be satisfied, although this is an MLE only in the exponential case. Note that Balakrishnan and Wang (2000) [4] also considered a general linear combination of the order statistics, $\hat{\nu} = \sum_{i=1}^n c_i x_{(i)}$, where $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the order statistics from a given sample of size $n$. However, as mentioned for example in Bowman and Shenton (1988) [8], there is a high degree of deviation of the estimates from the parent distribution if one uses the method of moments.

In exploratory data analysis, the sample mean and the sample standard deviation are two quantities often computed as measures of center and spread, respectively. In the normal case, the population mean and variance are mathematically independent, and the sample mean and variance are known to be statistically independent. In the non-normal case, however, the population mean and variance are often linked mathematically (see McCullagh and Nelder (1989) [66]), i.e., the higher the mean, the higher the standard deviation, and in fact the sample mean and sample variance are not independent. The coefficient of variation establishes a similar independence property for the gamma case. The population coefficient of variation of a distribution $X$ with mean $\mu_X$ and standard deviation $\sigma_X$ is defined as $CV = \sigma_X / \mu_X$. Estimates are obtained, for example, by Hwang and Huang (2002) [41], among other authors, who establish that the sample mean $\bar{X}$ and the sample variance $S_n^2$ are unbiased estimators of the quantities in (1.6), respectively (see Hwang and Hu (1999) [39] and Hwang and Hu (2000) [40]). We recall from Dudewicz and Mishra (1988) [23] and Casella and Berger (1990) [13] that the independence of the sample mean and sample standard deviation characterizes the normal distribution. Theorem 1.2 establishes a similar characterization for the gamma distribution through the sample coefficient of variation $CV_n = S_n / \bar{X}_n$.

Theorem 1.2 Suppose that $n \ge 3$. Then $X_1, X_2, \ldots, X_n$ are independent $Ga(\nu, \lambda, \alpha)$ r.v.'s iff $\bar{X}_n$ and $CV_n$ are independent.

Proof: Proved in Hwang and Hu (1999) [39].

Hwang and Hu (1999) [39] use this characterization to obtain estimates of the shape and scale parameters of the gamma distribution. They showed that these new estimators perform better than the MLE and the method of moments estimators for samples of size $n \le 25$.

Using (1.2), the likelihood function based on an independent, identically gamma distributed random sample $X_1, X_2, \ldots, X_n$ of size $n$ is given by
$$L(\nu, \lambda, \alpha) = \frac{\lambda^{n\alpha}}{\Gamma^n(\alpha)} \prod_{i=1}^n (x_i - \nu)^{\alpha - 1}\, e^{-\lambda \sum_{i=1}^n (x_i - \nu)}\, I_{(\nu, \infty)}(x_i).$$
The likelihood is viewed as a function of the parameters $\nu$, $\lambda$ and $\alpha$ conditioned on the data. To ease the computations, one often works with the logarithm of the likelihood; the log likelihood is given by
$$l(\nu, \lambda, \alpha) = n\alpha \log \lambda - n \log \Gamma(\alpha) + (\alpha - 1) \sum_{i=1}^n \log(x_i - \nu) - \lambda \sum_{i=1}^n (x_i - \nu).$$
Based on the strong law of large numbers or ergodic theorems, the maximum likelihood estimators have useful properties such as consistency and sufficiency. They achieve the Cramér-Rao minimum variance bound asymptotically; that is, they have the smallest variance among all possible estimators asymptotically as the sample size increases. In other words, MLEs are statistically efficient and give better and often more reliable estimates than the method of moments.
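To illustrate maximum likelihood fitting numerically, the sketch below (assuming SciPy is available; the synthetic data and parameter values are for illustration only) takes the location as known, $\nu = 0$, and maximizes the log likelihood $l(\nu, \lambda, \alpha)$ above over $(\lambda, \alpha)$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)
# Synthetic sample from the two-parameter gamma Ga(lambda=0.5, alpha=3), i.e. nu = 0
lam_true, alpha_true = 0.5, 3.0
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=500)

def neg_loglik(theta, nu=0.0):
    # Negative of the log likelihood l(nu, lambda, alpha) given in the text
    log_lam, log_alpha = theta                 # optimize on the log scale so lam, alpha stay > 0
    lam, alpha = np.exp(log_lam), np.exp(log_alpha)
    n = x.size
    ll = (n * alpha * np.log(lam) - n * gammaln(alpha)
          + (alpha - 1.0) * np.log(x - nu).sum() - lam * (x - nu).sum())
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
lam_hat, alpha_hat = np.exp(res.x)
print(lam_hat, alpha_hat)   # should be close to 0.5 and 3 for this synthetic sample
```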
1.4 Literature Review

Several approaches and models for constructing a multivariate gamma distribution have been suggested and developed by many authors, such as Johnson and Kotz (1970) [45], Hougaard (2001) [37], Marshall and Olkin (1967) [61], Tanner (1996) [77], Moran (1969) [69], Mathai and Moschopoulos (1992) [64], and Walker and Stephens (1999) [79]. Defining a multivariate gamma distribution can be quite intuitive, but its properties are difficult to exhibit, which may explain why numerous versions of multivariate gamma distributions have been proposed in the literature. In particular, various forms of bivariate models have been described based on combinations of independent gamma distributions. Johnson et al. (1997) [53] give coverage of a useful variety.

One approach is to start with certain desirable statistical properties in the univariate case and build the multivariate extension through various mechanisms. We take one such approach in this dissertation. See, for example, Mathai and Moschopoulos (1991) [63], Gaver and Lewis (1980) [29], and Hanagal (1996) [32]. Another approach, which defines the multivariate gamma from the bivariate, is not as straightforward, since many bivariate gammas do not have a natural extension to multivariate dependence. For example, see Marshall and Olkin (1967) [61], Marshall and Olkin (1983) [62], and Sivazlian (1981) [76]. In fact, Marshall and Olkin (1967) [61] and Marshall and Olkin (1983) [62] proposed an approach that does not extend easily to the multivariate case. Sivazlian (1981) [76] gives a form of multivariate gamma with an independence structure.

Walker and Stephens (1999) [79] define a multivariate family of distributions on $\mathbb{R}_+^p$ in the context of survival analysis. It includes the Weibull, has the appealing properties that the marginal distributions are of the same family and that the relationship between components is flexible, leads to straightforward analysis, and is applicable to censored data. As they stated, however, the extension to the multivariate setting has proved quite problematic. Our goal is to solve that problem while maintaining these properties in our construction of the multivariate gamma distribution.

Another classical multivariate gamma, known as Cheriyan and Ramabhadran's gamma distribution, is given in Kotz et al. (2000) [53]. It is a multivariate additive distribution, a simple case of the one we define in (1.1), in which the nonnegative constants are taken to be unity. Results for this distribution are also noted in Mathai and Moschopoulos (1991) [63], who extend it further.

Henderson and Shimakura (2003) [34] presented a version of the multivariate gamma distribution with gamma marginal distributions and a specified correlation matrix, but without a closed form for the density. Moreover, the estimation of parameters and the asymptotic properties are not well studied in those cases. One needs to develop a multivariate gamma distribution that can be used in modeling processes, such as those defined by Marshall and Olkin (1983) [62] for the bivariate logistic model, or the bivariate exponential model of Mardia (1970) [60]. These methods and others are given in Coles and Tawn (1994) [19], where it is shown how statistical methods in the multivariate context may be applied to problems of data analysis, in particular for extremes.

As noted earlier, for the multivariate normal $X = (X_1, X_2, \ldots, X_p)'$ with $p \times p$ positive definite covariance matrix $\Sigma$
having $\mathrm{Var}(X_i) = \sigma_i^2$ ($1 \le i \le p$) as the diagonal elements of $\Sigma$, Chen (1987) [15], among others, obtained a characterization of $X$ as $U(X, \Sigma) = 1$, where
$$U(X, \Sigma) = \sup_{g \in C} \frac{\mathrm{Var}[g(X)]}{E[\nabla^t g(X)\, \Sigma\, \nabla g(X)]}.$$
Here $C$ is the class of measurable functions $g : \mathbb{R}^p \to \mathbb{R}$ such that $E[g(X)]^2 < \infty$ and $\mathrm{Var}[g(X)] > 0$, and $\nabla g(X)$ denotes the gradient. For the multivariate gamma, $U(X, \Sigma) > 1$.

While in the Gaussian case the correlation matrix and the marginal distributions completely specify the joint distribution, those two components (correlation matrix and marginal distributions) do not induce a unique joint distribution for correlated gamma r.v.'s. We cannot arbitrarily construct $X$ such that the marginal distributions are gamma, i.e., $X_i \sim Ga(0, \lambda_i, \alpha_i)$ for $i = 1, \ldots, n$, and $\mathrm{Corr}(X_i, X_j) = \rho_{ij} \in [-1, 1]$ for $i, j \in \{1, 2, \ldots, n\}$. The difficulty arises from the fact that the mean and variance/covariance cannot be modeled arbitrarily as in the normal case: in the gamma case, the $\{\rho_{ij}\}_{1 \le i, j \le n}$ are functions of the shape parameters. This is true for all multivariate gammas proposed in the literature to date.

In the spirit of the multivariate normal density, several versions of the multivariate gamma can be found. Fang and Zhang (1990) [24] describe a family of multivariate distributions: a $p$-dimensional r.v. $X = (X_1, X_2, \ldots, X_p)'$ has the symmetric Kotz type distribution (KTD), denoted $X \sim KTD_p(\mu, \Sigma_{p \times p}, \alpha, m, \beta)$, where

- $\mu$ is the location vector in $\mathbb{R}^p$,
- $\Sigma_{p \times p}$ is the positive definite covariance matrix,
- $\alpha$ is such that $\alpha > (2 - p)/2$,
- $m > 0$ and $\beta > 0$,

if it has a joint density of the form
$$f_X(x) = C_p |\Sigma|^{-1} \big[(x - \mu)^t \Sigma^{-1} (x - \mu)\big]^{\alpha - 1} \exp\big\{-m \big[(x - \mu)^t \Sigma^{-1} (x - \mu)\big]^{\beta}\big\},$$
where $C_p$ is the normalizing constant given by
$$C_p = \frac{\beta\, \Gamma(p/2)\, m^{p/(2\beta) + (\alpha - 1)/\beta}}{\pi^{p/2}\, \Gamma\big(p/(2\beta) + (\alpha - 1)/\beta\big)}.$$
When $\beta = 1$, $\alpha = 1$ and $m = \tfrac{1}{2}$, we obtain the multivariate normal distribution.

Following the approach of Fang and Zhang (1990) [24], and modifying the multivariate normal, we hope to obtain properties similar to those of the multivariate normal. We define the square root of a vector componentwise. We introduce a multivariate gamma of this type as follows:
$$f_X(x) = C_p |\Sigma|^{-1} \Big[\{m(x - \mu)\}^{(1/2)\,t}\, \Sigma^{-1}\, \{m(x - \mu)\}^{1/2}\Big]^{\alpha - 1} \exp\Big\{-\Big[\{m(x - \mu)\}^{(1/2)\,t}\, \Sigma^{-1}\, \{m(x - \mu)\}^{1/2}\Big]^{\beta}\Big\}.$$
Taking $\beta = 1$ and $m = (\sqrt{\alpha_1}, \ldots, \sqrt{\alpha_p})^t$, we have the form of the multivariate gamma distribution. When the $X_i$'s are independent, we obtain the desired form; the result does not hold when there is dependence. This multivariate gamma is an example of the multivariate dispersion models introduced by Jorgensen (1987) [49].

1.5 Marginal, Conditional and Joint Distributions

Here, we review the concepts of marginal, joint and conditional distributions that are referred to frequently in this dissertation. These concepts can be found in textbooks such as Feller (1971) [25], Dudewicz and Mishra (1988) [23], Folland (1999) [26], and Kallenberg (2002) [51]. We also define the concept of conditional independence.

Definition 1.3 Let $(\Omega, \mathcal{A}, P)$ be a probability space, i.e., $\Omega$ is a set, $\mathcal{A}$ is a $\sigma$-algebra (a family of subsets of $\Omega$), and $P$ is a mapping $P : \mathcal{A} \to [0, 1]$ such that (i) $P(\Omega) = 1$ and (ii) if $\{A_n, n \in \mathbb{N}\}$ are disjoint sets in $\mathcal{A}$, then $P(\bigcup_n A_n) = \sum_{n=1}^\infty P(A_n)$.

Definition 1.4 Let $\mathcal{B}(\mathbb{R})$ be the Borel $\sigma$-field generated on $\mathbb{R}$. A random variable (r.v.) $X$ is a measurable function on a probability space, i.e., $X$ is a mapping $X : \Omega \to \mathbb{R}$ such that for every $B \in \mathcal{B}(\mathbb{R})$, $X^{-1}B \in \mathcal{A}$.

Definition 1.5 The cumulative distribution function (cdf) of $X$ is the function $F_X$ (also denoted $F$ when there is no confusion), $F : \mathbb{R} \to [0, 1]$, defined as $F(x) = (P \circ X^{-1})((-\infty, x]) = P(X \le x)$, $x \in \mathbb{R}$.
The cdf is nondecreasing and right continuous, with limits $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

For a fixed integer $p \in \mathbb{N}$, the joint cumulative distribution function of $p$ r.v.'s, or $p$-variate vector, denoted $X = (X_1, \ldots, X_p)'$, is an extension of the above concept and is defined on a probability space $(\Omega, \mathcal{A}, P)$. When $\Omega = \mathbb{R}^p$, the joint cdf of $X$ is defined as $F : \mathbb{R}^p \to [0, 1]$ with
$$F(x_1, \ldots, x_p) = P(X_1 \le x_1, \ldots, X_p \le x_p). \qquad (1.12)$$
The joint cdf is right continuous and satisfies the following: $\lim_{x_i \to \infty} F(x_1, \ldots, x_i, \ldots, x_p) = F(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p)$, which is the joint cdf of the remaining $p - 1$ variates, and $\lim_{x_1, \ldots, x_p \to -\infty} F(x_1, \ldots, x_p) = 0$ and $\lim_{x_1, \ldots, x_p \to \infty} F(x_1, \ldots, x_p) = 1$. The marginal cdf of $X_k$, with $1 \le k \le p$, is defined as
$$F(x_k) = \lim_{x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_p \to \infty} F(x_1, \ldots, x_p).$$
The $p$ r.v.'s are said to be independent if (1.12) can be written as $F(x_1, \ldots, x_p) = F_{X_1}(x_1) \cdots F_{X_p}(x_p)$, and then $E(X_1 \cdots X_p) = E(X_1) \cdots E(X_p)$, provided that all the expectations exist. In addition, the r.v.'s $\{g_i(X_i), 1 \le i \le p\}$ are then independent. It is also known that uncorrelatedness can be deduced from independence, but uncorrelatedness does not always imply independence. Another useful concept is that of conditional distributions.

Definition 1.6 Suppose $(X_1, X_2, \ldots, X_p)'$ is a random vector. The conditional distribution of $(X_1, X_2, \ldots, X_k)'$, with $1 \le k \le p$, given $(X_{k+1}, \ldots, X_p)'$, is the ratio of the joint distribution of all the r.v.'s to the joint marginal distribution of the conditioning r.v.'s. It is given by
$$f_{X_1, X_2, \ldots, X_k \mid X_{k+1}, \ldots, X_p}(x_1, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_p) = \frac{f_{X_1, X_2, \ldots, X_p}(x_1, x_2, \ldots, x_p)}{f_{X_{k+1}, \ldots, X_p}(x_{k+1}, \ldots, x_p)},$$
provided that $f_{X_{k+1}, \ldots, X_p}(x_{k+1}, \ldots, x_p) > 0$, where $f$ may be a pdf or a pmf.

Definition 1.7 The $p$-variate vector $X = (X_1, X_2, \ldots, X_p)'$ is said to be conditionally independent given a variable, say $X_0$, if
$$f(x_1, x_2, \ldots, x_p \mid x_0) = f_1(x_1 \mid x_0) f_2(x_2 \mid x_0) \cdots f_p(x_p \mid x_0), \qquad (1.13)$$
where $f_i(\cdot \mid x_0)$ represents the conditional pdf of $X_i$, $i = 1, \ldots, p$, given $X_0$. Equation (1.13) is equivalent to $F(x_1, x_2, \ldots, x_p \mid x_0) = F_1(x_1 \mid x_0) F_2(x_2 \mid x_0) \cdots F_p(x_p \mid x_0)$, where $F_i(\cdot \mid x_0)$ represents the conditional cdf of $X_i$, $i = 1, \ldots, p$, given $X_0$.

Theorem 1.8 The joint density of $(X_0, X_1, \ldots, X_p)$ satisfying (1.1) is
$$f(x_0, x_1, x_2, \ldots, x_p) = f(x_0) \prod_{i=1}^p f_{Z_i}(x_i - a_i x_0),$$
and $(X_1, X_2, \ldots, X_p)'$ are conditionally independent given the latent variable $X_0$.

Proof: Using the independence of the $Z_i$'s, $i = 1, \ldots, p$, of each other and of $X_0$, the joint density of $X_0, Z_1, Z_2, \ldots, Z_p$ is $f(x_0, z_1, z_2, \ldots, z_p) = f(x_0) f(z_1) f(z_2) \cdots f(z_p)$. Hence, based on the linear transformation $X_i = a_i X_0 + Z_i \iff Z_i = X_i - a_i X_0$, for $i = 1, 2, \ldots, p$, we have
$$f(x_0, x_1, x_2, \ldots, x_p) = f(x_0) f(x_1 - a_1 x_0) \cdots f(x_p - a_p x_0) = f(x_0) \prod_{i=1}^p f_{Z_i}(x_i - a_i x_0).$$
From Definition 1.7, $(X_1, X_2, \ldots, X_p)'$ given $X_0$ are independent, since the conditional distribution of $(X_1, X_2, \ldots, X_p)'$ given $X_0$ is
$$f(x_1, x_2, \ldots, x_p \mid x_0) = \frac{f(x_0, x_1, x_2, \ldots, x_p)}{f(x_0)} = \prod_{i=1}^p f_{Z_i}(x_i - a_i x_0) = \prod_{i=1}^p f_{X_i}(x_i \mid x_0).$$

Note that, given $x_0$, $a_i x_0$ plays the role of a location parameter, so $x_i \ge a_i x_0$ for all $i = 1, 2, \ldots, p$. In Chapter 3, we see that when $x_0$ is unknown, a possible estimate for it is the minimum of the $x_i / a_i$'s, $i = 1, \ldots, p$.

Definition 1.9 The random vector $X = (X_1, X_2, \ldots, X_p)'$ is multivariate gamma if its components $X_i$, $i = 1, \ldots, p$, are marginally of the gamma family following the structure described in (1.1), and each subset of $X$ has a multivariate gamma form.

1.6 Thesis Outline

In this dissertation, we are concerned with the three-parameter gamma distribution.
The joint gamma distribution of $X = (X_1, X_2, \ldots, X_p)$ characterized here, based on (1.1):

- has Mathai and Moschopoulos (1992) [64] as a special case, with a more flexible linear structure,
- has Iyer et al. (2004) [43] as a special case, both in terms of structure and dimension,
- has Nadarajah and Kotz (2005) [72] as a special case,
- and shares similar properties with the Marshall and Olkin (1967) [61] model, in that it includes both continuous and discontinuous portions. Marshall and Olkin's bivariate exponential model has received a lot of attention because it allows for the simultaneous failure of several system components induced by a fatal shock.

We provide different inference procedures for the parameters and characterize location-scale families of multivariate gamma distributions. The marginal distributions are traditional location-scale gammas, and the joint distribution contains absolutely continuous classes as well as Marshall-Olkin (1967) [61] type distributions with a positive probability mass on a set of measure zero. MLEs are developed in the bivariate case.

The dissertation is organized as follows. In Chapter 2, we start with preliminary material and the fundamental concepts used in these new classes of survival and reliability models. In Chapter 3, we derive and characterize the generalized multivariate gamma distribution. The continuous case of the multivariate gamma is studied in Chapter 4. In Chapter 5, a discontinuous case of the multivariate gamma, in the form of a multivariate exponential, is given.

Chapter 2
Preliminaries

The mathematical concepts needed for this dissertation are collected in this chapter. We review the notions of Laplace transforms (see Feller (1971) [25], Abramowitz and Stegun (1972) [1], and Billingsley (1986) [6]), the Dirac delta (see Cohen-Tannoudji et al. (1977) [18] and Khuri (2004) [52]), and infinitely divisible and stable distributions (see Feller (1971) [25] and Hougaard (2000) [36]).

2.1 The Laplace Transform

The Laplace transform (the analogue of the moment generating function) is used in many areas of statistics, probability theory, and risk analysis. We are interested in the Laplace transform with real arguments. Characterization by means of the real Laplace transform does not presume the existence of moments.

Definition 2.1 If $X$ is a r.v. defined on $\mathbb{R}_+$ with cdf $F_X$ satisfying $P(X = 0) < 1$, then its Laplace-Stieltjes transform (LST) is the real-valued function defined in Abramowitz and Stegun (1972) [1] as
$$\mathcal{L}_X(s) = E e^{-sX} = \int_0^\infty e^{-sx}\, dF_X(x). \qquad (2.1)$$

Some properties associated with the LST:

- Existence: the integral in (2.1) is a Lebesgue-Stieltjes integral. For the positive-support distributions considered here, (2.1) always exists; in fact, $0 < \mathcal{L}_X(s) \le 1$.
- $\mathcal{L}_X(s)$ is infinitely differentiable, and $\frac{d^n \mathcal{L}_X}{ds^n}(s)$ exists for all $n \in \mathbb{N}$.
- For $m \in \mathbb{N}$, the $m$th moment of $X$ is given by $E X^m = (-1)^m \mathcal{L}_X^{(m)}(0)$.
- Additivity: the LST of a sum of independent r.v.'s is the product of the LSTs of the individual r.v.'s. For $X_1, \ldots, X_n$ independent r.v.'s, $X = \sum_{i=1}^n X_i$ has LST
$$\mathcal{L}_X(s) = E e^{-sX} = E \prod_{i=1}^n e^{-sX_i} = \prod_{i=1}^n E e^{-sX_i} = \prod_{i=1}^n \mathcal{L}_{X_i}(s).$$
- The LST can be defined for a $p$-variate distribution, say $X = (X_1, X_2, \ldots, X_p)'$, as $\mathcal{L}_X(s) = E e^{-(s_1 X_1 + \cdots + s_p X_p)}$, for $s = (s_1, s_2, \ldots, s_p)' \in \mathbb{R}_+^p$.
- Uniqueness: if $X_1$ and $X_2$ are two r.v.'s such that $\mathcal{L}_{X_1}(s) = \mathcal{L}_{X_2}(s)$, then $f_{X_1}(x) = f_{X_2}(x)$ for all $x$ except on a set of measure zero. Therefore, the LST completely characterizes the distribution.
The LST helps in computations involving linear combinations of r.v.'s from certain distributions.

Example 2.2 If $X \sim Ga(\nu, \lambda, \alpha)$ with pdf (1.2), then its LST is given by (1.11), since
$$\mathcal{L}_X(s) = \int_\nu^\infty e^{-sx} \frac{\lambda^\alpha}{\Gamma(\alpha)} (x - \nu)^{\alpha-1} e^{-\lambda(x-\nu)}\,dx = e^{-s\nu} \int_\nu^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} (x - \nu)^{\alpha-1} e^{-(\lambda+s)(x-\nu)}\,dx$$
$$= e^{-s\nu} \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-(\lambda+s)t}\,dt = e^{-s\nu} \frac{\lambda^\alpha}{(\lambda+s)^\alpha} \int_0^\infty \frac{(\lambda+s)^\alpha}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-(\lambda+s)t}\,dt = e^{-s\nu} \Big(\frac{\lambda}{\lambda+s}\Big)^\alpha.$$
The gamma distribution shifted by $\nu$, with unit mean for its gamma part (i.e., $\lambda = \alpha$), has LST
$$\mathcal{L}_X(t) = e^{-\nu t} \Big(\frac{1}{1 + t/\alpha}\Big)^\alpha \longrightarrow e^{-\nu t} e^{-t} = e^{-(\nu+1)t} \quad \text{as } \alpha \to \infty.$$

Example 2.3 Let $X_1, X_2, \ldots, X_n$, $n > 1$, be independent r.v.'s distributed as $Ga(\nu_i, \lambda, \alpha_i)$ for $1 \le i \le n$. It is well known that $X = X_1 + X_2 + \cdots + X_n$ has density $Ga(\sum_{i=1}^n \nu_i, \lambda, \sum_{i=1}^n \alpha_i)$. This is easily shown using the LST. The Laplace transform of the gamma distribution $X_i$, $i = 1, \ldots, n$, given in Example 2.2 is $\mathcal{L}_{X_i}(s) = e^{-s\nu_i} \big(\frac{\lambda}{\lambda+s}\big)^{\alpha_i}$. Using the properties of the LST, we have
$$\mathcal{L}_X(s) = \prod_{i=1}^n \mathcal{L}_{X_i}(s) = \prod_{i=1}^n e^{-s\nu_i} \Big(\frac{\lambda}{\lambda+s}\Big)^{\alpha_i} = e^{-s\sum_{i=1}^n \nu_i} \Big(\frac{\lambda}{\lambda+s}\Big)^{\sum_{i=1}^n \alpha_i} \;\Rightarrow\; X \sim Ga\Big(\sum_{i=1}^n \nu_i, \lambda, \sum_{i=1}^n \alpha_i\Big).$$
In the remaining cases, we consider situations where the $\lambda$'s are different. When two or more of the gamma distributions have the same parameter $\lambda$, we can add those r.v.'s to obtain another gamma distribution with the same parameter $\lambda$ but a different shape parameter.

Example 2.4 Suppose $X$ and $Y$ are independent, with $X$ discrete with pmf $p(x)$ and $Y$ a positive-support continuous distribution with pdf $f(y)$. Then
$$\mathcal{L}_{XY}(s) = E e^{-sXY} = \int_0^\infty \sum_x e^{-sxy} f(y) p(x)\,dy = \sum_x \Big(\int_0^\infty e^{-sxy} f(y)\,dy\Big) p(x) = \sum_x \mathcal{L}_Y(sx)\, p(x).$$
We later use Example 2.4 in the case where $X$ is a Bernoulli random variable with $P(X = 0) = p$ and $Y$ is a positive-support distribution. In that case, $\mathcal{L}_{XY}(s) = p + (1 - p)\mathcal{L}_Y(s)$.

We are also interested in sums of r.v.'s. For two continuous r.v.'s $X_1 \sim f_1$ and $X_2 \sim f_2$, their sum $X = X_1 + X_2$ has cdf obtained from the joint pdf as
$$F_X(x) = \int_{-\infty}^\infty \int_{-\infty}^{x - x_1} f(x_1, x_2)\,dx_2\,dx_1.$$
The convolution of two functions $f, g : \mathbb{R} \to \mathbb{R}$ is the function $(f * g)(x) = \int_{-\infty}^\infty f(t) g(x - t)\,dt$. If the r.v.'s are independent, the density of their sum is the convolution of their densities, and can be represented as below.

Theorem 2.5 Assume that $X_1$ and $X_2$ are independent r.v.'s defined on $\mathbb{R}_+$ with pmf/pdf $f_1$ and $f_2$, respectively. Then $X = X_1 + X_2$ has density
$$f_X(x) = (f_1 * f_2)(x) = \int_0^x f_1(t) f_2(x - t)\,dt.$$
Proof: Proved in Hunter and Nachtergaele (2001) [38].

However, much of this work can be avoided, since the LST of $X$ in Theorem 2.5 is given by $\mathcal{L}_X(t) = \mathcal{L}_{X_1}(t)\mathcal{L}_{X_2}(t)$ and may be recognized as the LST of a known distribution.

Example 2.6 Consider $X$ to be the sum of two iid exponential r.v.'s with parameter $\lambda$. Then $f_X(x) = \int_0^x f_1(t) f_2(x - t)\,dt = \int_0^x \lambda e^{-\lambda t}\, \lambda e^{-\lambda(x - t)}\,dt = \lambda^2 x e^{-\lambda x}$, which is a $Ga(0, \lambda, 2)$ density.

Theorem 2.7 For independent $X_1 \sim Ga(\nu_1, \lambda_1, \alpha_1)$ and $X_2 \sim Ga(\nu_2, \lambda_2, \alpha_2)$, the r.v. $X = X_1 + X_2$ has a gamma distribution $Ga(\nu_1 + \nu_2, \lambda, \alpha_1 + \alpha_2)$ iff $\lambda_1 = \lambda_2 = \lambda$.

Proof: The LST of $X$ is given by
$$\mathcal{L}_X(s) = e^{-(\nu_1 + \nu_2)s}\, \frac{\lambda_1^{\alpha_1} \lambda_2^{\alpha_2}}{(\lambda_1 + s)^{\alpha_1} (\lambda_2 + s)^{\alpha_2}}.$$
If $\lambda_1 \ne \lambda_2$, then the LST is not that of a gamma distribution. In the case $\lambda_1 = \lambda_2 = \lambda$, the LST and the density of the sum $X = X_1 + X_2$ become
$$\mathcal{L}_X(s) = e^{-s\nu} \Big(\frac{\lambda}{\lambda + s}\Big)^\alpha, \qquad f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} (x - \nu)^{\alpha - 1} e^{-\lambda(x - \nu)},$$
where $\alpha = \alpha_1 + \alpha_2$ and $\nu = \nu_1 + \nu_2$.

We generalize this result later in Theorem 3.3. The sum of independent gamma distributions with the same scale parameter is a gamma distribution with that scale parameter, with the location and shape parameters updated. In that sense, the gamma distribution satisfies an additive property. Mathai and Moschopoulos (1992) [64] use this property to generate their class of multivariate gamma distributions.
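Theorem 2.7 is easy to check by simulation. The sketch below (assuming NumPy and SciPy are available; the parameter values are illustrative, not from the dissertation) draws from two independent gammas with a common scale and compares the empirical distribution of their sum with $Ga(\nu_1 + \nu_2, \lambda, \alpha_1 + \alpha_2)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000
lam = 2.0                                   # common scale (rate) parameter
nu1, a1 = 1.0, 1.5                          # illustrative location/shape for X1
nu2, a2 = 0.5, 2.5                          # illustrative location/shape for X2

x1 = nu1 + rng.gamma(shape=a1, scale=1.0 / lam, size=n)
x2 = nu2 + rng.gamma(shape=a2, scale=1.0 / lam, size=n)
s = x1 + x2

# Theorem 2.7: X1 + X2 ~ Ga(nu1 + nu2, lam, a1 + a2) when the scales are equal
ks = stats.kstest(s, "gamma", args=(a1 + a2, nu1 + nu2, 1.0 / lam))
print(ks.pvalue)      # should not be small: consistent with Ga(1.5, 2, 4)
```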
2.2 The Dirac Delta Function

Another important collection of tools that we use are the properties of the Dirac distribution, sometimes referred to as the Dirac delta function. See Cohen-Tannoudji et al. (1977) [18] and Khuri (2004) [52].

Definition 2.8 The Dirac delta function at the point $c \in \mathbb{R}$ is a point mass distribution denoted $\delta_c$, and we say that a r.v. $X$ has the point mass distribution $\delta_c$ at $c$ if its pmf is given by
$$f(x \mid c) = \delta_c(x) = \delta(x - c) = 0 \ \text{ if } x \ne c, \quad \text{and} \quad \int_{-\infty}^\infty f(x \mid c)\,dx = 1. \qquad (2.2)$$

The integral notation in (2.2) is not mathematically justified: $\delta$ is not rigorously defined as a function, but as a distribution, and there have been several references on this object. Cohen-Tannoudji et al. (1977) [18] give a complete discussion. Despite its name, the Dirac delta function is not a function in the classical sense. One reason is that the functions $f(x) = \delta(x)$ and $g(x) = 0$ a.e. are equal almost everywhere, yet their (Lebesgue) integrals are different; see Folland (1999) [26] and Rudin (1976) [74]. Another reason is that it is singular. Instead, it is said to be a distribution, a generalized notion of function that can be used inside integrals. The mathematician Laurent Schwartz gave it a rigorous definition in 1947 as a linear functional on the space of test functions $\mathcal{D}$, the set of all real-valued infinitely differentiable functions with compact support on $\mathbb{R}$, such that for a given $f(x) \in \mathcal{D}$ the value of the functional is given as in Kallenberg (1986) [50] and Hunter and Nachtergaele (2001) [38]. Such linear functionals are called generalized functions or distributions. For this reason, the delta function is more appropriately called Dirac's delta distribution. Thus, the value of the Dirac delta function $\delta_x$ is defined by its action on a function $f(x) \in \mathcal{D}$ when used inside an integral, as in formula (2) of Khuri (2004) [52]. The theory of distributions in mathematics has been highly developed, and as a result the Dirac delta function is well established and accepted in mathematics as a generalized function or distribution. Note also that it has been modified from the original version defined by Dirac in 1920.

It is well known that the Heaviside step function is an antiderivative of the Dirac distribution. The Heaviside step function, also called the unit step function (see for example Abramowitz and Stegun (1972) [1]), is the discontinuous function defined as
$$H(x) = \int_{-\infty}^x \delta(t)\,dt = \begin{cases} 0, & \text{if } x \le 0, \\ 1, & \text{if } x > 0. \end{cases} \qquad (2.3)$$
The value of the Heaviside function at $0$ is sometimes taken to be $0$, or $\tfrac{1}{2}$ (most popular for symmetry purposes), or $1$; here we take it to be $0$. Both the Dirac and Heaviside functions have been used in a variety of fields of science and engineering; their use in statistics is relatively new. For another reference, see the paper by Pazman and Pronzato (1996) [73].

The Dirac delta function is a very useful tool for approximating tall, narrow spike functions (also called impulse functions), and the integral
$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = \delta[f] = f(0),$$
for any (test) function $f(x)$, is more a notational convenience than a true integral. It can be regarded as an "operator", or a linear functional on the space of (test) functions, which gives the value of the function at $0$.
More details are given in Kallenberg (1986) [50], Williamson (1962) [80], Au and Tam (1999) [3], and Shilov and Gurevich (1977) [75], for example. As a distribution, the Dirac delta function $\delta(x - s)$ is a pdf with mean, median and mode $s$, cdf $H(x - s)$, and variance and skewness $0$, satisfying the following:

- $\int_{-\infty}^\infty \delta(\alpha x)\,dx = \int_{-\infty}^\infty \delta(u)\,\frac{du}{|\alpha|} = \frac{1}{|\alpha|}$,
- $\delta(\alpha x) = \frac{\delta(x)}{|\alpha|}$,
- $\delta(x) = \lim_{a \to 0} \delta_a(x)$, where $\delta_a(x) = \frac{1}{a\sqrt{\pi}}\, e^{-x^2/a^2}$, as a limit of normal densities.

To end this review, we note the following results: for $c > 0$, $H(cx) = H(x)$; $H(x - a) = 1 - H(-x + a) = 1 - H(a - x)$; and $\int H(x - a)\,dx = (x - a)H(x - a)$.

The Dirac delta distribution can be thought of as the limiting case of a distribution whose density is concentrated at a single point. For a r.v. $X$ with Dirac density $\delta(x - c)$, $c \ge 0$, the LST is given by $\mathcal{L}_X(t) = e^{-ct}$. The moments of the Dirac delta function $\delta_c$ are given by $E X^k = c^k$ and $\mathrm{Var}(X) = 0$, and its characteristic function is $\varphi(t) = e^{itc}$.

The Dirac function is a very helpful tool in mathematical statistics, as it provides a unifying approach to the treatment of discrete and continuous distributions and their transformations. We give an example of each case below.

Example 2.9 (Khuri (2004) [52]) Suppose $X$ is a discrete r.v. that takes values $a_1, a_2, \ldots, a_n$ with corresponding probabilities $p_1, p_2, \ldots, p_n$, where $\sum_{i=1}^n p_i = 1$. Then the pmf $p(x)$ of $X$ can be represented as $p(x) = \sum_{i=1}^n p_i \delta(x - a_i)$. For example, if $X \sim B(n, p)$, where $B(n, p)$ is the binomial distribution, then $p(x) = \sum_{i=0}^n \binom{n}{i} p^i (1 - p)^{n - i} \delta(x - i)$, and the $k$th noncentral moment of $X$ is given by
$$\int_{-\infty}^\infty x^k p(x)\,dx = \sum_{i=1}^n p_i \int_{-\infty}^\infty x^k \delta(x - a_i)\,dx = \sum_{i=1}^n a_i^k p_i.$$

Example 2.10 (Khuri (2004) [52]) Let $X_1$ and $X_2$ be independent r.v.'s distributed as $X_1 \sim Ga(\lambda = \tfrac{1}{2}, \alpha)$ and $X_2 \sim Ga(\lambda = \tfrac{1}{2}, \beta)$. The joint distribution of $Y = \frac{X_1}{X_1 + X_2}$ and $Z = X_1 + X_2$ can be expressed as
$$f(y, z) = \int_{-\infty}^\infty \int_{-\infty}^\infty f(x_1, x_2)\, \delta\Big[\frac{x_1}{x_1 + x_2} - y\Big]\, \delta(x_1 + x_2 - z)\,dx_1\,dx_2$$
$$= \frac{1}{\Gamma(\alpha)\Gamma(\beta)2^{\alpha+\beta}} \int_0^\infty x_2^{\beta-1}\,dx_2 \int_0^\infty x_1^{\alpha-1} e^{-\frac{1}{2}(x_1 + x_2)}\, \delta\Big(\frac{x_1}{x_1 + x_2} - y\Big)\, \delta\big[x_1 - (z - x_2)\big]\,dx_1$$
$$= \frac{1}{\Gamma(\alpha)\Gamma(\beta)2^{\alpha+\beta}} \int_0^\infty x_2^{\beta-1} (z - x_2)^{\alpha-1} e^{-\frac{1}{2}z}\, \delta\Big[\frac{z - x_2}{z} - y\Big]\,dx_2 = \frac{1}{\Gamma(\alpha)\Gamma(\beta)2^{\alpha+\beta}}\, y^{\alpha-1} (1 - y)^{\beta-1} z^{\alpha+\beta-1} e^{-\frac{1}{2}z}.$$

Cumulants are an alternative way of summarizing the properties of r.v.'s, especially when the moments are not easily obtained. The moments of $X$ are not directly related to the moments of $aX + b$, for positive constants $a$ and $b$, and the moments of $X = X_1 + X_2$ have no simple relation to the moments of $X_1$ and $X_2$. The idea is then to use the cumulants.

Definition 2.11 The function $\psi(s) = \log \mathcal{L}_X(-s)$ is called the cumulant generating function of $X$. Expanding $\psi(s)$ in its power series, $\psi(s) = \sum_{i=1}^\infty \frac{\kappa_i}{i!} s^i$, gives coefficients $\kappa_i$, called the $i$th cumulants of the distribution of $X$.

The cumulant generating function relates the cumulants to the moments. It also allows us to characterize infinitely divisible LSTs, which is done in Section 2.3; see Feller (1971) [25].
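As a small symbolic check of Definition 2.11 (a sketch assuming SymPy is available; the symbol names mirror the gamma parameters used above), the cumulant generating function of $Ga(\nu, \lambda, \alpha)$ obtained from (1.11) is $\psi(s) = \nu s - \alpha \log(1 - s/\lambda)$, and differentiating it at zero recovers $\kappa_1 = \nu + \alpha/\lambda$ and $\kappa_2 = \alpha/\lambda^2$, in agreement with (1.6).

```python
import sympy as sp

s, nu, lam, alpha = sp.symbols("s nu lam alpha", positive=True)

# LST of Ga(nu, lam, alpha) from (1.11): L_X(t) = exp(-nu*t) * (lam/(lam+t))**alpha
L = sp.exp(-nu * s) * (lam / (lam + s)) ** alpha
psi = sp.log(L.subs(s, -s))            # cumulant generating function psi(s) = log L_X(-s)

# i-th cumulant = i-th derivative of psi at s = 0
kappa = [sp.simplify(sp.diff(psi, s, i).subs(s, 0)) for i in (1, 2, 3)]
print(kappa)   # [nu + alpha/lam, alpha/lam**2, 2*alpha/lam**3]
```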
2.3 Infinite Divisibility

There are important results arising from the concept of infinite divisibility, which has many applications in the theory of limit distributions for sums of independent r.v.'s. In general, it is difficult to determine whether a given distribution is infinitely divisible or not. We would like to determine what conditions are required for the pdf of a gamma distribution, or of a mixture of gammas, to be infinitely divisible; we also consider the exponential distribution. We first give some notation and a definition. Let the symbol $\stackrel{d}{=}$ denote equality in distribution, and $\stackrel{d}{\to}$ denote convergence in distribution.

Definition 2.12 Consider a random vector $X$. Its distribution is said to be infinitely divisible if for every $n \in \mathbb{N}$ there exist iid random vectors $X_{n1}, X_{n2}, \ldots, X_{nn}$ with $\sum_k X_{nk} \stackrel{d}{=} X$.

In other words, an infinitely divisible r.v. $X$ with pdf $f(x)$ can be represented as the sum of an arbitrary number of iid r.v.'s $X_1, X_2, \ldots, X_n$ with cdf $F_n$, that is, $X \stackrel{d}{=} X_1 + X_2 + \cdots + X_n$; hence the term infinitely divisible. Borrowing from Billingsley (1986) [6], pp. 383-384, the distribution $F$ of $X$ is then the $n$-fold convolution $F_n * F_n * \cdots * F_n$, where $F_n$ is the distribution function of the $X_i$, $1 \le i \le n$. We use this property of infinite divisibility to characterize the distribution of the unknown r.v. in a linear relationship such as (1.1). Two simple examples of infinitely divisible distributions are the Poisson distribution and the negative binomial distribution.

Definition 2.13 A function $\varphi$ on the interval $I = [0, \infty)$ is completely monotone if it possesses derivatives $\varphi^{(n)}$ of all orders which alternate in sign, i.e., if $(-1)^n \varphi^{(n)}(s) \ge 0$ for all $s$ in the interior of $I$ and $n = 0, 1, 2, \ldots$

Theorem 2.14 $\varphi$ is completely monotone iff $\varphi$ is the Laplace transform of some measure.

Proof: See Feller (1971) [25].

For two real-valued completely monotone functions $\varphi_1$ and $\varphi_2$, their product, and their composition when appropriately chosen, are also completely monotone. It is important to note that any completely monotone probability density function is infinitely divisible; see Feller (1971) [25]. Moreover, if $\varphi$ is completely monotone on $[0, \infty)$ and $\varphi(c) = 0$ for some $c > 0$, then $\varphi$ must be identically zero on $[0, \infty)$.

All Erlang-type distributions are infinitely divisible. The Erlang distribution defined in (1.5) is infinitely divisible in the sense that it can be represented as the sum of $n$ independent r.v.'s with a common distribution (see Feller (1971) [25], pages 176-177). In fact, $n$ need not be an integer, since the gamma can be expressed in the canonical form of an infinitely divisible distribution. Notice that the distribution of the sum of mutually independent gamma r.v.'s with different scale parameters is not gamma; rather, it can be described as a mixed gamma with a mixing shape parameter. Using ideas from Mathai and Moschopoulos (1991) [63] and Mathai and Saxena (1978) [65], we characterize the distribution of such sums in Chapter 4.

2.4 Stable Distributions

There are many results on limit theorems; see Kallenberg (2002) [51] and Billingsley (1986) [6]. We study the class of positive stable distributions. The results could also apply to distributions other than the gamma. Here we introduce the necessary definitions and recall limit laws for the distribution of sums of independent r.v.'s from Feller (1971) [25].

Definition 2.15 (Stable distribution, Feller (1971) [25]) Let $X_0, X_1, X_2, \ldots$ be an iid sequence of r.v.'s with common distribution $F$. The r.v. $X_0$ is stable, or the distribution $F$ is stable (in the broad sense), if for each $n$ there exist constants $c_n > 0$ and $\gamma_n$ such that $S_n \stackrel{d}{=} c_n X_0 + \gamma_n$, where $S_n = X_1 + X_2 + \cdots + X_n$, and $F$ is not concentrated at one point. We say $F$ is stable in the strict sense if the above holds with $\gamma_n = 0$.

Stable distributions are a rich class of probability distributions that allow one to generalize the Central Limit Theorem (CLT), replacing the assumption of finite variance with a less restrictive condition that allows infinite variance.
For data with heavy tails, stable distributions should fit better than a normal distribution. From Definition 2.15 above, the two r.v.'s $X_0$ and $S_n$ are of the same type, with $f_{S_n}(x) = \frac{1}{c_n} f_{X_0}\big(\frac{x - \gamma_n}{c_n}\big)$. Stable distributions can be characterized through the concept of domain of attraction.

Definition 2.16 (Domain of attraction, Feller (1971) [25]) The distribution of the independent sequence of r.v.'s $X_1, X_2, \ldots$ belongs to the domain of attraction of a r.v. $X_0$ with distribution $F$ if there exist norming or scaling constants $a_n > 0$ and $b_n$ such that
$$\frac{S_n}{a_n} - b_n = \frac{X_1 + X_2 + \cdots + X_n}{a_n} - b_n \stackrel{d}{\to} X_0 \quad \text{as } n \to \infty.$$

This definition differs from Definition 2.15 in that it requires convergence in distribution rather than equality in distribution. In fact, Billingsley (1986) [6], page 389, gives a more restrictive version of Definition 2.16, in which for each $n$ there exist constants $a_n > 0$ and $b_n$ such that $\frac{S_n}{a_n} - b_n$ has the same distribution as the iid r.v.'s $X_1, X_2, \ldots, X_n$, and calls such a distribution stable. Also, the distribution of $X_0$ is not necessarily the same as the distribution of the sequence. Finally, Definition 2.16 recaptures the CLT when we take a sequence of r.v.'s with mean $\mu$ and variance $\sigma^2$ and set $a_n = \sigma\sqrt{n}$ and $b_n = \sqrt{n}\,\mu/\sigma$, with $X_0$ being the standard normal distribution. The definition also allows similar limit theorems for distributions without a variance, that is, for stable distributions. The next result states the equivalence of Definition 2.15 and Definition 2.16.

Theorem 2.17 A r.v. $X_0$ with distribution $F$ possesses a domain of attraction iff it is stable.

Proof: Given in Feller (1971) [25].

Hence the non-degenerate limiting distribution of a sum of gamma distributions must be a member of a stable class. However, gamma distributions are not stable, although they are infinitely divisible. Stable distributions are infinitely divisible, but the converse does not hold: the Poisson distribution, for example, although infinitely divisible, is not stable (see Billingsley (1986) [6], Section 28, page 389). The class of infinitely divisible distributions is larger than the class of stable distributions. That leads us to explore other possibilities.

Consider now two independent gamma distributions $X_1 \sim Ga(\nu_1, \lambda_1, \alpha_1)$ and $X_2 \sim Ga(\nu_2, \lambda_2, \alpha_2)$. The graph of their joint distribution is given below in Figure 2.1.

Figure 2.1: Graph of a bivariate gamma distribution.

We characterize the multivariate gamma distribution in the next chapter.

Chapter 3
The Multivariate Gamma

In this chapter, we propose and characterize a quite general multivariate gamma distribution with specified three-parameter gamma marginals of Type III in Pearson's system. We introduce some notation and basic concepts of multivariate gamma distributions. Since the mean, variance and covariance structure are the primary summaries of a multivariate distribution, we give these properties. We also propose a series of parameter estimation techniques that can be adapted to special classes of distributions, such as those discussed in Chapter 4 (the continuous class) and Chapter 5 (the fatal shock model).

Multivariate gamma distributions belong to the class of multivariate lifetime distributions, since all the components of the random vector have positive support and their marginals follow typical lifetime distributions. Hougaard (2000) [36], among other authors, gives many applications of multivariate lifetime distributions.
As he mentioned, the fleld has gaps and is still growing both in theory and application. The bivariate exponential, one simple case of bivariate gamma, plays a fundamental role in survival and reliability analysis because of its adaptation to many practical situations. The complexity of the model, in its notation and estimation technique, increases with the number of components. Within this framework, Moran (1969) [69], among other authors, had also described a form of a bivariate gamma process and given estimation techniques of one parameter using the other variate as a control. It has been di?cult to work with until the development of proportional hazard models or frailty models. See Cox (1972) [20], Ghosh and Gelfand (1998) [30] and Hougaard (2000) [36] to mention a few. However, 39 we do not deal with the proportional hazard model here, since it is primarily used for the univariate survival case. In fact, Cox?s proportional hazard model, sometimes considered as nonparametric, is typically treated as multivariate in only the recurring events situations, such as monitoring times until repeated hospitalizations of the same individual, or repeated failures of a device in a system. Our model is truly multivariate in the sense that the associated parameters are mathematically independent and we can model components with difierent means and variances. 3.1 Properties and Characterization In the next deflnition, we formally deflne our multivariate gamma that was brie y described in (1.1). Deflnition 3.1 : Let X0;X1;:::;Xp gamma r.v.?s with shape parameters fii, scale para- meters ?i and location parameters ?i, i.e Xi ? Ga(?i;?i;fii) for i = 0;1;:::;p as given in (1.2). Let Z1;Z2;:::;Zp be mutually independent r.v.?s satisfying Xi = aiX0 + Zi for some ai > 0 and Zi independent of X0 for i = 1;2;:::;p. We deflne the joint distribution of the random vector X = (X1;X2;:::;Xp)0 as the multivariate gamma distribution for a flxed integer p. Although the density function is not explicitly expressed in Deflnition 3.1, we can study various properties of the multivariate gamma distribution by using the linear relationship given in (1.1), and the fact that Zi; i = 1;::;p; are independent of each other and of X0. Note that the distribution of Zi; i = 1;::;p; has not been specifled. As we will explain later, depending on the choice of the constants ai, the distributions of Zi take on many forms. In fact, in some cases, Zi is a gamma, and in other cases, Zi is a mixture of distributions. 40 The linear correlation between Xi and the latent term X0 is expressed as: ?(Xi;X0) = Cov(Xi;X0)pVar(X i) pVar(X 0) ; where Cov(Xi;X0) = E(X0Xi)?E(X0)E(Xi) is the covariance between X0 and Xi, and Var(X0) and Var(Xi) denote the variances of X0 and Xi; respectively. Note that if the Zi were degenerate, say Zi ? bi; or P(Zi = bi) = 1; bi 2 R+; i = 1;2;::;p; then the linear correlation, a measure of linear dependence, would be perfect. That is Zi ? bi ) Xi = aiX0 + bi, which gives ?(X0;Xi) = ?1; or more precisely ?(Xi;Xj) = sgn(aiaj). The degenerate case is the only case where j?j = 1: This linear relation (1.1) seems to be natural as pointed out and motivated in Carpenter et al. (2006) [12], where properties for the exponential model are given. Similarly, from this structural relationship, we derive the following values of the means, variances, and covariances: E(Xi) = fii? i +?i; (3.1) E(Zi) = E(Xi)?aiE(X0) = (fii? i ?aifi0? 
0 )+(?i ?ai?0); (3.2) Var(Xi) = fii?2 i ; (3.3) Var(Zi) = Var(Xi)?a2iVar(X0) = fii?2 i ?a2i fi0?2 0 > 0 () fiifi 0 > a i?i ?0 ?2 ; (3.4) Cov(Xi;Xj) = aiajVar(X0) = aiajfi0?2 0 () ?(Xi;Xj) = aiaj?i?j?2 0 fi0p fiifij; (3.5) Cov(Zi;Zj) = 0; (3.6) for i 6= j; i;j = 1;::;p: 41 It is clear from Deflnition 3.1 and Equation (3.5) that Xi and Xj; i 6= j; i;j = 1;::;p; are positively correlated. This is a restriction in our model, but not unusual in this fleld. See He and Lawless (2005) [33] for more examples of positive correlations in lifetime data. Also, note that (3.1), (3.3), (3.4) and (3.5) can be more concisely expressed in the mean vector of the mean of X: E(X) = fi 1 ?1 +?1; fi2 ?2 +?2;???; fip ?p +?p ?0 ; (3.7) the variance/covariance matrix: ? = 0 BB BB BB BB B@ fi1 ?21 a1a2 fi0 ?20 ??? a1ap fi0 ?20 a2a1fi0?2 0 fi2 ?22 ??? a2ap fi0 ?20 ... ... ... ... apa1fi0?2 0 ??? apap?1fi0?2 0 fip ?2p 1 CC CC CC CC CA = fi0?2 0 0 BB BB BB BB B@ ?20 ?21 fi1 fi0 a1a2 ??? a1ap a2a1 ?20?2 2 fi2 fi0 ??? a2ap ... ... ... ... apa1 ??? apap?1 ?20?2 p fip fi0 1 CC CC CC CC CA ; and the correlation matrix: ? = fi0?2 0 0 BB BB BB BB B@ ?20 fi0 a1a2?1?2p fi1fi2 ??? a1ap?1?pp fi1fip a2a1?2?1p fi1fi2 ?20 fi0 ??? a2ap?2?pp fi2fip ... ... ... ... apa1?p?1p fipfi1 ??? apap?1?p?p?1p fipfip?1 ?20 fi0 1 CC CC CC CC CA : For Xi ? Ga(?i;?i;fii) for i = 0;1;???;p, based on the LST given in (2.1) and using the binomial theorem, we have the following: dmLXi(s) dsm = ? fi mX k=0 m k ??dk(? i +s)?fi dsk ?dm?ke??is dsm?k ; for m 2N: Setting s = 0, we have that 42 E(Xmi ) = mX k=0 m k ? ?fi+ki ?m?ki (fii)k; where (fi)k = fi(fi+1)???(fi+k?1): The moments and cross-moments are given for m and l 2N; by: E(Xmi ) = E(aiX0 +Zi)m = mX r=0 m r ? ariEXr0EZm?ri = mX r=0 m r ? EZm?ri ari rX k=0 r k ? ?fi+ki ?r?ki (fii)k and E(Xmi Xlj) = E(aiX0 +Zi)m(ajX0 +Zj)l = E mX r=0 m r ? ariXr0Zm?ri lX s=0 l s ? asjXs0Zl?sj = mX r=0 lX s=0 m r ? l s ? ariasjEXr+s0 EZm?ri EZl?sj ; i 6= j; i;j = 1;2;:::;p: The LST has many helpful properties as discussed in Section 2.1. In particular, in this dissertation, we use it for many of the derivations of the distributions of the latent variables. The multivariate LST for any p-variate random vector X deflned in (1.1) is: LX(s) = Ee?s?X = Ee? Pp i=1 siXi = Ee? Pp i=1 si(aiX0+Zi) = E pY i=1 e?aisiX0e?siZi = Ee?X0 Pp i=1 aisi pY i=1 Ee?siZi = LX0 pX i=1 aisi ? pY i=1 LZi(si) = LX0 pX i=1 aisi ? pY i=1 LXi(si) LX0(aisi) ; 43 for s = (s1;s2;:::;sp)0; regardless of the underlying distributions as long as the LST exists. If X0 ? Ga(?0;?0;fi0); then the above becomes LX(s) = e??0 Pp i=1 aisi ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi): (3.8) Note that from the above, the LST of Zi is: LZi(s) = LXi(s)L X0(ais) ; for i = 1;2;::;p: (3.9) As long as LZi in Equation (3.9) is completely monotone as in Deflnition 2.13, the conditions of Deflnition 3.1 are satisfled, and the marginals are gamma. Equation (3.9) will be used extensively throughout this dissertation, but for now we present an important theorem that gives some properties on the scale and shift transformations of the multivariate gamma distribution given in Deflnition 3.1. Theorem 3.2 The class of multivariate gamma distributions based on Deflnition 3.1 is closed under scale transformation in the sense of products with diagonal matrices with pos- itive entries, shift transformations, and under flnite independent convolutions. Proof: Consider a multivariate gamma X = (X1;???;Xp)0, and let C = (ci)i=1;::;p be a diagonal matrix with positive entries. Set Y = CX. 
Then we have that LY(s) = Ee?s?Y = Ee? Pp i=1 siYi = Ee? Pp i=1 cisi(aiX0+Zi) = E pY i=1 e?aicisiX0e?cisiZi = Ee?X0 Pp i=1 aicisi pY i=1 Ee?cisiZi = LX0( pX i=1 aicisi) pY i=1 LZi(cisi) 44 = e??0 Pp i=1 aicisi ? 0 ?0 +Ppi=1 aicisi ?fi0 pY i=1 LXi(cisi) LX0(aicisi); for s = (s1;s2;:::;sp); which is of the same form as in (3.8). The mean EY = CEX and the variance is expressed as: Var(Y) = C0Var(X)C: Let d = (d1;d2;???;dp) and W deflned as: W = (W1;W2;???;Wp)0 := X + d = (X1 +d1;X2 +d2;???;Xp +dp). Then, we have the following: LW(s) = Ee?s?W = Ee? Pp i=1 siWi = Ee? Pp i=1 si(Xi+di) = Ee? Pp i=1 sidie? Pp i=1 si(aiX0+Zi) = Ee? Pp i=1 sidi pY i=1 e?aisiX0e?siZi = e? Pp i=1 sidiEe?X0 Pp i=1 aisi pY i=1 Ee?siZi = e? Pp i=1 sidiLX0( pX i=1 aisi) pY i=1 LZi(si) = e? Pp i=1 sidie??0 Pp i=1 aisi ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi) = e? Pp i=1(di+?0ai)si ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi); which is of the same form as in (3.8). So the multivariate gamma is closed under shift transformation of the form W = X+d. Consider now n independent p-variate gamma distributions Xi=(Xi1;Xi2;???;Xip)0 for i = 1;???;n; and set X = X1 +???+Xn= (X11 +???+Xn1;X12 +???+Xn2;???;X1p +???+Xnp)0. LX(s) = Ee?s0(X1+???+Xn) = Ee? Pp i=1 si(X1i+???+Xni) 45 = pY i=1 Ee?si(X1i+???+Xni) = LX1(s)???LXn(s); which has the form of n product of (3.8). Recall that the distribution of Zi has not been specifled, but its LST is given in (3.9). In the next theorem, we state a very interesting property, as it gives a necessary and su?cient condition for all the latent variables Zi to be of the gamma type distributions (1.2). Theorem 3.3 Suppose that for some ai > 0; i=1,...,p, Xi ? Ga(?i;fii) i=0,1,...,p, are related as in Deflnition 3.1 with Zi; i = 1;:::;p; then Zi ? Ga(?i;fii ?fi0) () ai = ?0? i and pi = P(Zi = 0) = 0: Proof: From the expression of ai = ?0? i and the result in (1.10), we have that aiX0 ? Ga(?i;fi0): Since Xi ? Ga(?i;fii); it follows from Example 2.3 (the additive property of the gamma distribution) that Z must be distributed as Zi ? Ga(?i;fii ?fi0): Conversely, assume that Zi ? Ga(?i;fii?fi0). Then from Example 2.2 and the independence between X0 and Zi, we have that: LXi(t) = LX0(ait)LZi(t) =) LX0(ait) = ? 0 ?0 +ait ?fi0 = ? i ?i +t ?fii ? i ?i +t ?fi0?fii : Hence ?0? 0 +ait = ?i? i +t ; and solving for ai gives ?0 +ait ?0 = ?i +t ?i =) 1+ ait ?0 = 1+ t ?i or ai = ?0 ?i : Mathai and Moschopoulos (1991) [63] among other authors made use of this particular coe?cientoflinearrelationshipbetweentwogammadistributions, anddevelopedproperties, 46 approximations, and inequalities based on that particular model. In this dissertation, we allow for more general representations. In Chapter 4, we examine further this speciflc form, and call it the continuous class of distributions. In all other cases where the conditions of Theorem 3.3 are not satisfled, we call that form the \fatal shock" class, because there is a positive probability of simultaneous or proportional occurrences. In the Section 3.2, we focus on the characterization of the distribution of Zi; i = 1;2;::;p in the fatal shock class. This last type of distributions is further studied in Chapter 5. All the Zi?s take on similar structures. To simplify the notations, without lost of generality, we drop the indices, and we rewrite (1.1) as: Y = aX +Z; for some a > 0; (3.10) where Y;X;a and Z in (3.10) play the role of Xi;X0;ai and Zi in (1.1) respectively. A simple computation shows that the cdf of Z is: P(Z ? z) = P(Y ?aX ? 
z) = P(Y = aX)+(1?p)P(X ? Y ?za ) = P(Y = aX)+(1?p) ? 1?P(X ? Y ?za ) ? = P(Y = aX)+(1?p) ? 1? Z 1 z FX(y ?za )fY (y)dy ? : Since Z 1 z FX(y ?za )fY (y)dy = FX(y ?za )FY (y) ?1 z ? Z 1 z 1 afX( y ?z a )FY (y)dy 47 = 1? Z 1 z 1 afX( y ?z a )FY (y)dy; the cdf can be further simplifled as P(Z ? z) = P(Y = aX)+ 1?pa Z 1 z fX(y ?za )FY (y)dy; where P(Y = aX) = P(Z = 0) is the probability of proportional occurrence. Note that if the conditions of Theorem 3.3 are not satisfled, then P(Y = aX) = P(Z = 0) > 0: In Section 3.2, we further characterize the distribution of Z: 3.2 The Fatal Shock Model In this section, we study the form of the distribution of Z obtained in (3.10) for a general a > 0 where a 6= ?0? i . Suppose that X and Y are distributed as X ? Ga(?;?;fi) and Ga(?0;fi0;?0), respec- tively. Assuming a linear relationship as in (3.10), and from Equation (3.9), where a is some positive constant, and Z is independent of X, the LST of Z becomes: LZ(s) = LY(s)L X(as) = e(???0)s ?0 ?0 +s ?fi0 ?+as ? ?fi : All distributions Z satisfying this relation (3.10) for all a 2 (0;1); are called members of the self decomposable r.v.?s. See also Gaver and Lewis (1980) [29]. Previously, we have studied the properties of the r.v. Z, derive its LST in (3.9), mean in (3.2), variance and covariance in (3.4) and (3.6), respectively. However, the form of the distribution of Z has not been given. In the next theorems, we give the required forms that 48 the distribution of Z must take on to give gamma marginals. We state the main theorems, and start with the Erlang case deflned in (3.10). Theorem 3.4 Erlang type distributions. Let X ? Ga(?;?;fi) and Y ? Ga(?0;?0;fi0) such that fi;fi0 2N;fi ? fi0 and as in (3.10), Z is independent of X. Then Z is the shifted sum of independent: ? fi r.v.?s, each being product of Bernoulli and exponential distribution with parameter ?0; ? and of fi0 ?fi gamma r.v.?s with scale ?0 and shape k = 1;::;fi0 ?fi: Proof: Using the Laplace transform technique, we have: LZ(s) = LY(s)L X(as) = e?(???0)s? 0fi0 ?fi (?+as)fi (?0 +s)fi0 = e?(???)s? 0fi0 ?fi (?+as)fi (?0 +s)fi ? 1 (?0 +s)fi0?fi ? By the theorem of expansion in partial fractions, with fi instead of fi0 ?fi, 1 (?0 +s)fi = fiX k=1 Ck (?0 +s)k where Ck are real numbers. Hence: LZ(s) = e?(???0)s ? 0fi0 ?fi ? p +(1 ?p)LY(s) ?fi fi0?fiX k=1 Ck ?0k ?0 ?0 +s ?k : where Ck; k = 1;2;:::;fi0 ?fi; are constant and p = a ??0: Based on the properties of the LST from Section 2.1, we have that ? p + (1 ? p)LY(s) ? is the product of Bernoulli and exponential r.v.?s as discussed in Example 2.4 and in Carpenter et al. (2006) [12]. So? p+(1?p)LY(s) ?fi is the product of fi independent of those terms. 49 Also, sincetheLSTofasumoffunctionsisthesumoftheirLST,then fi0?fiX k=1 Ck ?0k ?0 ?0 +s ?k is the LST of the sum of fi0 ?fi gamma distributions. The special case of the exponential where fi = fi0 = 1 has been studied in Carpenter et al. (2006) [12], and by Iyer et al. (2002) [42]. A similar result in the gamma context has been given in Carpenter and Diawara (2006) [11]. Theorem 3.5 Suppose X ? Ga(?;?;fi), with fi 2N; and Y ? Ga(?0;?0;fi0) as in (3.10), and fi ? fi0. Then Z is the sum of two r.v.?s: ? a Ga(???0;?0;fi0 ?fi) and ? another distribution which is the sum of fi independent r.v.?s each of which is a product of a Bernoulli and an exponential with parameter ?0. Proof: From the expression of LZ, LZ(s) = ??0 ? (?+as) (?0 +s) ?fi e(???0)s ? ?0 ?0 +s ?fi0?fi ; the result follows. 
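The exponential special case alpha = alpha' = 1 noted above is easy to verify by simulation. The following sketch (our own illustration, assuming Python with numpy/scipy and zero location parameters) draws Z as a mixture of a point mass at zero, with probability p = a*lambda'/lambda, and an exponential with rate lambda', and checks that aX + Z is again exponential with rate lambda':

# Sketch (our own check): exponential case of the fatal shock representation.
# X ~ Exp(rate lam), Z = Bernoulli(1-p) * Exp(lam_y) with p = a*lam_y/lam,
# and Y = a*X + Z should be Exp(rate lam_y).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, lam_y, a, size = 2.0, 1.0, 0.8, 200_000
p = a * lam_y / lam                     # probability of proportional occurrence
assert 0 < p < 1

x = rng.exponential(scale=1.0 / lam, size=size)
z = rng.binomial(1, 1 - p, size) * rng.exponential(scale=1.0 / lam_y, size=size)
y = a * x + z

print(stats.kstest(y, "expon", args=(0, 1.0 / lam_y)))

The Kolmogorov-Smirnov statistic should be small and its p-value large, since the mixture representation is exact rather than an approximation.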
Also, assuming Z = X2 ?aX1, and using the Taylor approximation of log(1+s) = 1X n=0 (?1)n+1 (n+1)!s n+1; the cumulants of Z are obtained as logLZ(s) = logLX2(s)?logLX1(as) = fi1 log(1? as? 1 )?fi2 log(1? s? 2 ) 50 = fi1 1X n=0 (?1)n+1 (n+1)!( as ?1) n+1 ?fi2 1X n=0 (?1)n+1 (n+1)!( s ?2) n+1 = 1X n=0 (?1)n+1 (n+1)!s n+1 ? fi1( a? 1 )n+1 ?fi2( 1? 2 )n+1 ? Another characterization of Z is obtained by considering several cases for the Ga(?;?;fi) and Ga(?0;?0;fi0). Corollary 3.6 Let fi = fi0 2N, then Z is a shifted sum of fi independent r.v.?s, each being a product of a Bernoulli with mean (1?p) and an exponential with parameter ?0, where the value of the shift is ???0. Proof: fi 2N. Then the LST of Z can be expressed as: LZ(s) = e(???0)s ??0 ? (?+as) (?0 +s) ?fi = e(???0)s ? (1?p)( ? 0 ?0 +s)+p ?fi : Theorem 3.7 Suppose X ? Ga(?;?;fi), and Y ? Ga(?0;?0;fi0) as in (3.10), with fi < fi0, and assume that [fi+1] ? fi0. Then Z is is the sum of ? [fi+1] independent r.v.?s, each being a product of a Bernoulli with mean (1?p) and an exponential with parameter ?0 ? Ga(???0;?0;fi0 ?[fi+1]) ? Ga(0;?=a;[fi+1]?fi) 51 Proof: LZ(s) = ??0 ? ?+as ?0 +s ?[fi+1] e(???0)s ? ?0 ?0 +s ?fi0?[fi+1]??+as ? ?fi?[fi+1] and notice that: ? ?+as ? ?fi?[fi+1] = ? ? ?+as ?[fi+1]?fi : In fact, instead of assuming [fi+1] ? fi0, it su?ces to assume that [fi+?] ? fi0 for some ? 2 [0;1]: Theorem 3.8 Suppose X ? Ga(?;?;fi), and Y ? Ga(?0;?0;fi0) as in (3.10), with fi = fi0 =2N, then Z does not have the LST of some measure. Proof: fi =2N The explicit formula in terms of the LST is LZ(s) = ??0 ? (?+as) (?0 +s) ?[fi] ? ? ?0 ?0 +s ?fi?[fi] ?e(???0)s ??+as ? ?fi?[fi] The question we have is how to interpret ??+as ? ?fi?[fi] : However, set `(s) = ??+as ? ?fl ; where fl = fi?[fi] 2 [0;1]: `(1)(s) = fla? ??+as ? ?fl?1 `(2)(s) = fl(fl ?1) a ? ?2??+as ? ?fl?2 ... `(n)(s) = fl(fl ?1)???(fl ?n+1) a ? ?n??+as ? ?fl?n ; 52 So `(0) = 1, and has derivatives that alternate in sign. However, by Theorem 2.14, from Feller (1971) [25], (?1)n`(s) ? 0; for all n = 0;1;2;::: ` is not completely monotone. We have that ` is not the Laplace transform of some probability distribution. Theorem 3.9 Let X ? Ga(?;?;fi) and Y ? Ga(?0;?0;fi0) such that fi 2N; fi0 2R+; fi0 ? fi and as in (3.10). Then Z has LST of some measure ifi fi = fi0. Proof: LZ(s) = LY(s)L X(as) = e(???0)s ?0 ?0 +s ?fi0 ?+as ? ?fi = e(???0)s ??0 ? (?+as) (?0 +s) ?fi??+as ? ?fi?fi0 : The expression ??+as ? ?fi?fi0 is completely monotone ifi fi ? fi0. Since fi0 ? fi must be satisfled also, then fi = fi0: We have seen that the distribution of Z takes on many difierent forms depending on the parameters that are given. In Theorem 3.3, we found the only case where the latent variable is of continuous gamma type as the marginals. This case is further studied in Chapter 4. The results from Theorem 3.4 up to Theorem 3.9 give the various forms that the distribution of Z can take. Theorem 3.4 and Theorem 3.5 are similar, as they express the distribution of Z as a sum of product of Bernoulli and exponential, and that of gamma r.v.?s. The special case, given in the Corollary 3.6, states that the Erlang case where fi = fi0 2 N; leads to just the product of fi Bernoulli and exponential. These results give alternative ways of characterizing the distribution of the latent variables, an opportunity to replace the unconditional expectation, density by the unconditional expectation, density. That allow to increase the e?ciency in the variance covariance matrix of parameters. In all those cases, 53 fi ? 
fi0 is a required condition for the model validation. The Theorem 3.7 shows that there must be an integer value between fi and fi0 to justify the gamma marginals for Y and X in the Equation (3.10). If there is no such integer value between fi and fi0, then Z does not have the LST of some measure. It is not be represented by a distributional form. Theorem 3.9 conflrms that flnding in the sense that it gives back the result obtained in Corollary 3.6. 3.3 The Multivariate Gamma The previous section overviews the possible forms of distribution that Z, based on Equation (3.10), can take. Our primary goal is to obtain the multivariate gamma form of X as in Deflnition 3.1. That means we need to study the distribution ofZ = (Z1;Z2;???;Zp) based on Equation (1.1) and Deflnition 3.1. In Chapter 4, we study the case where the latent variables Zi are all of gamma types as in the result obtained from Theorem 3.3. In that case the probability of proportional occurrence is pi = 0; for i = 1;2;::;p. Many authors have suggested that formulation. Mathai and Moschopoulos (1991) [63] use that form to give a version of multivariate gamma. Theygivepropertiesofsuchmodelintermsofmeans, varianceandcovarianceofparameters, approximations, and inequalities. In this dissertation, we have proposed a general model as in Deflnition 3.1 of a multivariate gamma, that simplifles to the one proposed by Mathai and Moschopoulos (1991) [63], and Mathai and Moschopoulos (1992) [64]. Nadarajah and Kotz (2005) [72] derive a linear combination of exponential and gamma r.v.?s. They assume that there is no probability of simultaneous occurrence between Xi and X0: However, as given in the famous paper by Marshall and Olkin (1969) [61], that is not meaningful in \shock model" conditions. Our model captures the shock model that Marshall and Olkin 54 (1967) [61] studied in the simple case of the multivariate exponential distribution. That lead us to elaborate on the multivariate exponential distribution, and the results are given in Chapter 5. When ai = ?0? i ; ;i = 1;2;::;p in Equation (1.1), then X reduces to multivariate gamma proposedbyMathaiandMoschopoulos(1991)[63]. Wegivetheformofthejointdistribution of Z in the next chapter, its interprtation and estimation procedures of its parameters. Iyer et al. (2002) [42] and Iyer et al. (2004) [43] dealt with the case of positive flxed ai?s in the one dimensional case assuming that X0 and Xi are exponential with parameters ?0 and ?i; ;i = 1;2;::;p; respectively. Carpenter et al. (2006) [12] showed that the density of Zi is then expressed as: fZi(z) = pi?(z)+(1?pi)fXi(z)I(z>0); (3.11) where pi = P(Zi = 0) = ai ?i?0; i = 1;2;::;p and ? is the Dirac delta function given in Equation (2.2). The result from Carpenter et al. (2006) [12] is an adaptation of Corollary 3.6, and the multivariate form of the distribution of Z is given in Chapter 5. Based on the multivariate form of Z; we derive the form of the multivariate distribution of X. We then avoid problems that many authors, say Fang and Zhang (1990) [24] and Walker and Stephens (1999) [79] for examples, run into when trying to characterize the multivariate gamma distribution. We have a close form expression, a feature that was di?cult to derive from Henderson and Shimakura (2003) [34]. The expression is not simple, but it is tractable in contrast to what Mathai and Saxena (1978) [65] have suggested in terms of inflnite series. 
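Before turning to the autoregressive extension, the continuous-class reduction just described, with a_i = lambda_0/lambda_i as in Theorem 3.3, can be simulated directly. A minimal sketch (our own, with zero location parameters, assuming Python with numpy) builds a trivariate version and compares the empirical covariances with Cov(X_i, X_j) = a_i a_j alpha_0/lambda_0^2 from (3.5):

# Sketch (our own illustration): continuous class with a_i = lambda_0/lambda_i.
# Z_i ~ Ga(shape alpha_i - alpha_0, rate lambda_i) gives gamma marginals
# X_i ~ Ga(shape alpha_i, rate lambda_i); off-diagonal covariances follow (3.5).
import numpy as np

rng = np.random.default_rng(3)
alpha0, lam0 = 1.5, 1.0
alphas, lams = np.array([2.5, 3.0, 4.0]), np.array([1.2, 0.7, 2.0])
a = lam0 / lams
size = 300_000

x0 = rng.gamma(alpha0, 1.0 / lam0, size)
z = rng.gamma(alphas - alpha0, 1.0 / lams, size=(size, 3))
x = a * x0[:, None] + z                       # columns are X_1, X_2, X_3

theory = np.outer(a, a) * alpha0 / lam0**2    # off-diagonal covariances
print(np.cov(x, rowvar=False).round(3))
print(theory.round(3))

Only the off-diagonal entries of the two printed matrices are expected to agree; the diagonal of the empirical matrix estimates Var(X_i) = alpha_i/lambda_i^2 instead.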
55 3.4 The Autoregressive Model In this section, we focus on another approach to obtain a model similar to the direct linkage. This is an important extension to the multivariate distribution with correlation structure following the flrst order autoregressive scheme as stated for further research in the conclusion of Minhajuddin et al. (2003) [68]. In some applications, the parameter of interest depends on time. Time series could then be applied to the phenomena which is described linearly as Xt = atXt?1 +Zt for t 2 N: Brockwell and Davis (1996) [10] describe a similar model with some flxed at = a; jatj < 1 and the Zt are iid normal with EZ2t = 2. That gives that E(XtjFt?1) = aXt?1 where Ft?1 is the set of measurable information up to time t?1: Let Xt denote the observation at time t. The sequence of r.v.?s (X1;X2;???) is known as atimeseries. AgraphofXt againstt = 1;2;???isatimeplot, andcanrevealtrends(upward, downward, variant) or seasonal variation. A model for a time series is described by the joint distributions of the r.v.?s (X1;X2;???;Xn), which is P(X1 ? x1;X2 ? x2;???;Xn ? xn) for every integer n = 1;2;::: However, in practice, only partial information (such as mean, variance) may be avail- able. Also, we cannot write the joint density as the product of the marginal densities since independence between the Xi?s is not a workable assumption in our case. The time series model appears in the class of self decomposable distributions. Deflnition 3.10 A r.v. X is called self decomposable if for every 0 < a < 1, there exists a r.v. Z = Z(a) independent of X such that X d= aX +Z: 56 Some desirable properties follow: ? Self decomposable r.v.?s are contained in the class of inflnitely divisible r.v?s. See Feller (1971) [25]. ? The time series version is Xt = aXt?1 +Zt where Zt are independent of each other of of Xs; s < t. ? If a ! 0, then Xt are independent. If a ! 1, then Xt are perfectly dependent. Joe (1997) [44] chapter 8, discusses some special cases. ? If X is exponential with mean 1, then Z is a mixture of a point mass with weight a and the exponential with mean 1. ? If X is an Erlang distribution Ga( ;1), 2 N, then Z is again a mixture Z =8 >< >: 0; with prob. a ; Erlang Ga(j;1); with prob. pj where pj; 1 ? j ? ; are prob. from binomial ( ;1?a). ? If X is Ga( ;1), 2R?N, then ?t = 8> < >: 0; with prob. a ; Ga(j,a); with prob. pj; where pj; j ? 1; are prob. from negative binomial ( ;a), pj = ?( +j)j!?( ) a (1?a)j: The class of linear relations is quite large, and may contain unobservable factors. Among the alternatives, we consider the compound autoregressive model given by X = (Xt)t?0: Since the beginning of time series analysis, one popular model assume that the variable Xt of the process is related to its past development and on a random disturbance sequence 57 Zn which is not related to the past. The process is called a Markov process if the densities of Xt are normally distributed. Deflnition 3.11 The process X is called a compound autoregressive process of order p, denoted CAR(p) if and only if the conditional distribution of Xt given Xt?1 admits the conditional Laplace form: E[e?sXtjXt?1] = e?a1Xt?1?a2Xt?2?????apXt?p+b where ai = ai(s) i = 1;:::;p and b = b(s) are functions of the admissible values s in R: Our goal is to represent the data based on our probability model (1.2) and on this family of model (1.1). Then it becomes possible to estimate parameters, check for goodness of flt, and use our understanding of the mechanism to generate adjustments. 
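The exponential item in the list above corresponds to the first-order autoregressive construction of Gaver and Lewis (1980) [29]. A small sketch (our own illustration, assuming Python with numpy) simulates X_t = a*X_{t-1} + Z_t with Z_t equal to zero with probability a and Exp(1) otherwise, so that every X_t is marginally exponential with mean one and the lag-one correlation equals a:

# Sketch (our own illustration): self-decomposable exponential AR(1) recursion.
# Z_t is a mixture of a point mass at 0 (weight a) and an Exp(1) variable, so
# the stationary marginal of X_t is Exp(1) and corr(X_t, X_{t-1}) = a.
import numpy as np

rng = np.random.default_rng(4)
a, n = 0.6, 50_000

x = np.empty(n)
x[0] = rng.exponential(1.0)                 # start in the stationary law
for t in range(1, n):
    z = 0.0 if rng.random() < a else rng.exponential(1.0)
    x[t] = a * x[t - 1] + z

print(x.mean(), x.var())                    # both should be close to 1
print(np.corrcoef(x[:-1], x[1:])[0, 1])     # lag-1 autocorrelation close to a

Starting the chain from an Exp(1) draw makes the simulated series exactly stationary.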
We assume that Xt = aXt?1 +Zt;t = 1;2;::: where Zt are independent, and also each Zt is independent of Xt0 for each t0 < t. If Z = fZtgt were normally distributed with mean 0 and variance 2, then the model would be an autoregressive model, denoted AR(1), and the Zt are commonly referred to as white noise. The process would then be stationary linear if jaj < 1: Here, we have that: Xt = aXt?1 +Zt = a(aXt?2 +Zt?1)+Zt = a2Xt?2 +aZt?1 +Zt = ... = atX0 +at?1Z1 +at?2Z2 +???+aZt?1 +Zt = atX0 + t?1X k=0 akZt?k Therefore ?t = EXt = atEX0 +Pt?1k=0 akEZt?k 58 and Var(Xt) = a2tVar(X0)+Pt?1k=0 a2kVar(Zt?k): Also, it follows that: Cov(Xt+k;Xt) = Cov(aXt+k?1 +Zt;Xt) = aCov(Xt+k?1;Xt) = a2Cov(Xt+k?2;Xt) = ??? = akCov(Xt;Xt) = akVar(Xt) The dependence decreases with the lag. That is, as k increases, (Xt+k;Xt) has less depen- dence. We would like the model to be stationary. The process X = (Xn)(n=0;1;2;???) and its k shift, kX = (Xn+k)n=0;1;2;???, share similar similar properties, and we write: X d= kX: We say that X is weakly stationary if it satisfles the following two properties: 1. the mean of Xt does not depend on t 2. for each k = 1;2;???; Cov(Xt+k;Xt) does not depend on t: This weakly stationarity property is achieved only if a = 1, and EZt = 0: Rather than considering only nonnegative values of t, one could consider a shifted sequence on Z: Assume that X0 = aX?1+Z0 and that X?1 = aX?2+Z?1 and so on... It follows that: Xt = atX0 + t?1X k=0 akZt?k = at+1X?1 + tX k=0 akZt?k = ... = 1X k=0 akZt?k: 59 Then Xt = 1X k=0 akZt?k: For the series X to be stationary, i.e its mean and covariance independent of t, the required condition is that of absolute summability of the coe?cients fakgk=0;1;2;:::: i.e. 1X j=0 jakj = 1X j=0 ak < 1: This is achieved if and only if jaj = a < 1: Let us denote the coe?cients ak as `k: Then Xt = P1j=0 `kZt?k: Moreover, if we can express Zt in terms of Xt?s, then we say that the process is in- vertible. This is achieved, and we can express Zt = 1X j=0 ?jXt?j; where ?0 = 1; ?1 = ?a; and ?j = 0 for j > 1: The compound autoregressive process deflned earlier in terms of the Laplace transform with autoregressive path dependence can be expressed as: E[e?sXtjXt?1] = e?asXt?1?sZt and the log of the conditional Laplace transform is a linear function of Xt?1: 3.5 Summary In this chapter, we have generated a general class of multivariate gamma distributions. We call it generalized gamma distribution, not because the marginals are the generalized gamma distribution as deflned in Kotz et al. (2000) [53], but because it generates two very distinct and broad subclasses of distributions: ? Thecontinuousclassmodel, wherethelatentvariableisofthesamegammafamily. We study that class in Chapter 4. This class has no discontinuities in the joint densities. 60 ? The fatal shock model, where all marginal distributions are gamma, but all the latent variables are gamma mixtures, and there are discontinuities due to the probability of simultaneous occurrence. In Chapter 5, we study that model under the exponential case, because of the wide use and applications of the exponential distributions. 61 Chapter 4 The Continuous Case Model This chapter studies the continuous class of the model given in Chapter 3. The latent variables are all continuous, and of speciflc same family gamma type distributions. 
Our goal is to flnd the joint multivariate density of the latent variables Zi?s deflned in (1.1), assuming that the conditions of Theorem 3.3 are satisfled, that is the Zi?s are independent gamma distributed for i = 1;2;::;p. Nadarajah and Kotz (2005) [72] describe the distribution of the linear combinations of exponential and gamma type r.v.?s. Our approach is difierent as the mariginal distributions are specifled, and the coe?cients ai in (1.1) are chosen such that the Zi?s are of the same family of gamma distributions, for i = 1;2;::;p. Methods of estimations of parameters associated with the latent variables are also proposed. 4.1 Deflnition of Model Mathai and Moschopoulos (1991) [63] considered a multivariate gamma model where the dependence structure is obtained by adding a factored common r.v. to every univariate marginal, exploiting the additive property. In their following paper [64], they consider an additive form of gamma distributions with common scale parameter. In Kotz et al. (2000) [53, page 468], a model in reliability applications is described as a system where a failing component is replaced by an identical one so that the process never stops until the total items in the system fail. The Deflnition 3.1 is in the spirit of the Mathai and Moschopoulos (1991) [63] mul- tivariate gamma with a exible dependence structure for non-symmetric and nonnegative 62 support. It provides a model flt for many situations. A slightly similar idea has been sug- gested by Hougaard (1986) [35], where the class of multivariate lifetime distributions had a dependence created by an unobserved quantity. We next recall the deflnitions and most important properties of the multivariate gamma distribution in the sense of Mathai and Moschopoulos (1991) [63]. The distribution of the sum with any shape and scale parame- ters is also analyzed. Note that X in Deflnition 3.1 does not necessarily follow the multivariate gamma unless we pick appropriate values for the ai; i = 1;2;???;p; as it is proved in Mathai and Moschopoulos (1991) [63] and (1992) [64]. If fi0 ! 0, then X0 ! 0 a:s:, and then the multivariate gamma consists of n indepen- dent gamma distributions. Carpenter et al. (2006) [12] marginally deflned the Xi?s as exponential, and showed that Zi is then a product of a point mass and an exponential distribution. Despite the theoretical universality, the difierence is in the concept. They start from a bivariate distribution, say X1 and X0, and derive the conditional distribution of the associated Z1, whereas in most general situations, one describes the joint distribution of X0 and X1 with Z1 known. Mathai and Moschoupolos (1992) [64] showed that for X0 = Z0;Z1;Z2;???;Zp; mutu- ally independent gamma r.v.?s with parameters such that X0 = Z0 ? Ga(?0;?0;fi0) and Zi ? Ga ?i ? fi0? 0 ?i;?i;fii ? fi0 ? for i = 1;???;p, then Xi satisfying Xi = aiX0 + Zi for ai = ?i?0, are also gamma distributed as Xi ? Ga(?i;?i;fii). This result is now in this dissertation an application of Theorem 3.3. We next give some properties and the form of the multivariate density function. 63 4.2 Properties Some important features, as in Section 4.1, follow from Equation (3.8) and Equation (3.9) of the Laplace transform LX: LX(s) = e??0 Pp i=1 aisi ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi): And then, it is su?cient to explain the nature of LXi(si)L X0(aisi) : We next present two important theorems of this chapter. Theorem 4.1 The joint pdf of the multivariate gamma X = (X1;X2;???;Xp) based on Deflnition 3.1, when Xi ? 
Ga(?i;?i;fii), i = 0;1;???;p is explicitly given in terms of hypergeometric series ( the Lauricella function of type B) in the special case where Zi;1 ? i ? p, are gamma type. Proof: We follow the ideas in Mathai and Saxena (1978) [65]. Assume that Zi ? Ga(?0i;?0i;fi0i) for i = 1;2;???;p: Using the idea from Mathai and Moschopoulos (1991) [63], the joint density of X0;Z1;???;Zp is given as: f(x0;z1;???;zp) = C(x0 ??0)fi0?1expf??0(x0 ??0)g ? pY i=1 (zi ??0i)fi0i?1expf??i(zi ??0i)g where C = ? fi0 0 ?(fi0) Qp i=1 ?0fi 0i i ?(fi0i) 64 Even in the simplifled case, integrating out x0 is possible only in very special cases as it is noted in Krzanowski and Marriot (1994) [54]. Now putting zi = xi ?aix0, we have that the joint density of (X0;X1;X2;???;Xp) is given by: g(x0;x1;???;xp) = C(x0 ??0)fi0?1expf??0(x0 ??0)g ? pY i=1 (xi ?aix0 ??0i)fi0i?1expf??0i(xi ?aix0 ??0i)g: Set u0 = x0 ??0 and ui +?0 = 1a i (xi ??0i) for 1 ? i ? p: Then the density of can be written as: g(u0;u1;???;up) = C pY i=1 afi0i?1i ufi0?10 expf??0u0)g ? pY i=1 (ui ?u0)fi0i?1expf??0iai(ui ?u0);g where 0 < u0 < minfu1;???;upg The joint density of (u1;???;up) is available by integrating out u0. Hence g(u1;???;up) = C1 Z minfu1;???;upg 0 ufi0?10 expf??0u0)g(u1 ?u0)fi01?1???(up ?u0)fi0p?1 ?expf?[?01a1(u1 ?u0)+???+?0pap(up ?u0)]gdu0 where C1 = C pY i=1 afi0i?1i : 65 It follows that the density depends on the difierent forms of the k! possible orderings of u1;u2;???;up. For example u1 < u2 < ??? < up: We have g(u1;???;up) = C1f pY i=1 ufi0i?1i g Z u1 0 ufi0?10 expf??0u0)g(1? u0u 1 )fi1?1???(1? u0u p )fip?1 ?expf?[?01a1u1(1? u0u 1 )+???+?0papup(1? u0u p )]gdu0: Set y = u0u 1 : Then the above is expressed as: g(u1;???;up) = C1f pY i=1 ufi0i?1i gufi01 Z 1 0 yfi0?1expf??0u1y)g = ?(1?y)fi01?1 ?(1? u1u 2 y)fi02?1???(1? u1u p y)fi0p?1 = ?expf?[?01a1u1(1?y)+?02a2u2(1? u1u 2 y)???+?0papup(1? u1u p y)]gdy: Expanding the exponentials in series form, we obtain: g(u1;???;up) = C1f pY i=1 ufi0i?1i gufi01 ? 1X r0 1X r1 ??? 1X rp (??0u1)r0 r0! (??01a1u1)r1 r1! ??? (??0papup)rp rp! ? Z 1 0 yfi0+r0?1(1?y)fi01+r1?1 ?(1? u1u 2 y)fi02+r2?1???(1? u1u p y)fi0p+rp?1dy = C1f pY i=1 ufi0i?1i gufi01 ? ?(fi0 +r0)?(fi 01 +r1) ?(fi0 +r0 +fi01 +r1) ? 1X r0 1X r1 ??? 1X rp (??0u1)r0 r0! (??01a1u1)r1 r1! ??? (??0papup)rp rp! ?FD(fi0 +r0;fi02 +r2;???;fi0p +rp;fi0 +r0 +fi01 +r1; u1u 2 ;???; u1u p ); where FD is sometimes referred to as the Lauricella function of n variables. 66 The theorem is proved when substituting back xi in terms of ui. Recall that Lauricella FD; or more precisely F(n)D ; is a function of n variables and n+2 parameters deflned by the power series as: F(n)D (a;b1;???;bn;c;x1;???;xn) = 1X m1=0 ??? 1X mn=0 (a)m1+???+m2(b1)m1 ???(bn)mn (c)m1+???+mnm1!???mn! x m1! 1 ???x mn n : We have a characterization in terms of an inflnite sum of distributions. The question of how to generate variates from this distribution is then raised. Also relevant is how to implement a flt for this distribution. To answer these questions, a further study is then needed. We want to obtain another representation although we can tell by now that there is no unique way of expressing the joint density of a set of correlated gamma marginals. Note from the above theorem, relations between means and variances of Xi and Zi can be obtained. 
In particular, ?0i = ?i ?ai?0 and fi 0i ?0i = fii ?i ?ai fi0 ?0; fi0i ?0i2 = fii ?2i ?a 2 i fi0 ?20; Hence ?0i = fii ?i ?ai fi0 ?0 fii ?2i ?a 2i fi0 ?20 and fi0i = ( fii ?i ?ai fi0 ?0) 2 fii ?2i ?a 2i fi0 ?20 : 67 However, in order to obtain a better parameter estimates, a more tractable version of the joint density is desired. The notion of frailty is used as a measure of general random efiect. Vaupel et al. (1979) [78] gave a description of frailty as a measure of susceptibility to all causes of death in heterogeneous populations, with applications assuming gamma distributions, based on the advantages of the gamma in the simple forms of its density in Equation (1.2) and its LST in Example 2.2. In the particular case where X1 ? Ga(?1 = 0;?1;fi1 = 1) and X2 ? Ga(?2 = 0;?2;fi2 = 12), the density of the sum X = X1 +X2 can be expressed as: fX(x) = p ?1?2e?(?1+?2)2 xI0[(?2 ??1)2 x]; where I0 is the Bessel function of flrst kind and zero order I0(t) = 1+ t 2 22 + t4 22 ?42 + t6 22 ?42 ?62 +???: The result of the convolution of n independent gammas, is given by Moschopoulos (1985) [70], and is as follows: fX(x) = Qn i=1 ? fii i ?(Pni=1 fii)x Pn?1 i=1 fii?1e??nx 1X k=0 bn(k)(Pn?1i=1 fii) k!(Pni=1 fii)k [(?n ??n?1)x] k where the coe?cients bi are obtained recursively by: bi = 8> < >: 1; i = 2; Pk j=0 bi?1(Pn?2p=1 fip)j(?k)j j!(Pi?1p=1 fip)j C j i ; i = 3;4;???;n, with Ci = ?i?2 ??i?1? i ??i?1 : Moschopoulos (1985) [70], and Furman and Landsman (2005) [28], give an interpretation of the distribution of X as a sum of gamma distributions the location being at the origin. 68 Theorem 4.2 The distribution of the sum X is a mixed gamma with mixing shape para- meter given by fi+K and scale parameter ?max = maxf?1;???;?pg where fi = Ppi=1 fii and K is a non negative integer r.v. with distribution pk = f(k) = Cdk; k = 0;1;2;::: C = Qpi=1 ( ?i?max)fii and d0 = 1; dk = 1k Pki=1Ppj=1 fij(1? ?j?max)idk?i i.e f(x) = P1k=0 pkfYk(x); Yk ? Ga(?max;fi+k) Proof: See Furman (2005) [28]. In the case of the sum of two gamma X1 ? Ga(?1;fi1) and X2 ? Ga(?2;fi2), the distribution of X = X1 +X2 is given as: f(x) = P1k=0 pkfYk(x); pk = Cdk; C = ? fi1 1 ? fi2 2 ?fi1+fi2max d0 = 1; d1 = fi1(1? ?1?max)+fi2(1? ?2?max) d2 = d12 [fi1(1? ?1?max)+fi2(1? ?2?max)]+ d02 [fi1(1? ?1?max)2 +fi2(1? ?2?max)2] dk = 1k Pki=1Ppj=1 fij(1? ?j?max)idk?i; for k ? 3: Yk ? Ga(?max;fi1 +fi2 +k): The form of the exact distribution Theorem 4.1, and in Theorem 4.2, in an inflnite series, is quite complicated in practical situations. Because the sum involves an inflnite number of distributions, there is a need in developing a mathematically tractable theory and result. We consider some results in the gamma cases, and explain the case of the exponential distribution. 4.3 Estimation Procedures Because of the complexity of the likelihood, no MLE can be done in the usual way in estimating model parameters. Alternatives for the MLE have been suggested by many 69 authors such as Cohen and Whitten (1986) [16] and Cohen and Whitten (1988) [17]. They used method called modifled moments estimators or modifled MLE. Our approach builds the probability criteria from the EM algorithm. It can be used to compute the MLE for incomplete data. Disregarding the missing data is usually not safe. After separating the data into missing and non missing, we start by taking a guess of parameters within a corresponding domain. The idea is to replace the missing value by its expectation or predicted score given the starting guessed parameter values. 
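Returning briefly to the series form of Theorem 4.2 before developing the estimation procedure: although the mixture involves infinitely many gamma components, its coefficients are cheap to compute recursively, so a truncated evaluation is straightforward. The sketch below (our own implementation of the recursion in Theorem 4.2, assuming Python with numpy/scipy; the truncation length is an arbitrary choice) evaluates the density of a sum of independent gammas with different rates and checks that the mixing weights sum to one:

# Sketch (our own): truncated evaluation of the mixed-gamma density of a sum
# of independent gammas (Theorem 4.2 / Moschopoulos (1985) [70]).
import numpy as np
from scipy import stats

def sum_gamma_pdf(x, alphas, lams, terms=80):
    lam_max = max(lams)                       # largest rate (smallest scale)
    rho = sum(alphas)
    C = np.prod([(l / lam_max) ** a for a, l in zip(alphas, lams)])
    gam = [sum(a * (1 - l / lam_max) ** i for a, l in zip(alphas, lams))
           for i in range(1, terms)]
    d = [1.0]
    for k in range(1, terms):                 # d_k = (1/k) sum_i gam_i d_{k-i}
        d.append(sum(gam[i - 1] * d[k - i] for i in range(1, k + 1)) / k)
    weights = C * np.array(d)                 # p_k = C * d_k
    pdf = sum(w * stats.gamma.pdf(x, rho + k, scale=1.0 / lam_max)
              for k, w in enumerate(weights))
    return pdf, weights.sum()

alphas, lams = [2.0, 1.5], [1.0, 3.0]
pdf, total = sum_gamma_pdf(np.linspace(0.5, 10.0, 5), alphas, lams)
print(pdf)
print(total)                                  # mixing weights should sum to ~1

Because the factors (1 - lambda_j/lambda_max) lie in [0, 1), the weights decay geometrically and a modest truncation already captures essentially all of the mass.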
We use an expected value of the likelihood function, computed using su?cient statistics, in turn computed from the data, and maximize the likelihood function to obtain new parameter estimates. From the famous paper by Dempster et al. (1977) [22], the basic EM theorem states that improving the expected log likelihood leads to an increase in the likelihood itself l( jX) = logL( jX). The data X = (Xobs;Xmis) has 2 parts: an observed part denoted as Xobs and a missing part we denote as Xmis and is assumed to follow a gamma distribution with unknown parameter , i.e. its pdf is expressed as in (1.2). The idea is to flll in the missing values and iterate. Each iteration is done in two steps: the expectation (or E step) and the maximization (or M step). At the expectation or E step, we flnd the conditional expectation of the log likelihood function for the complete data denoted here as X given the observed data say Xobs, i.e. Q( j (k)) = E ? logf(Xj )j (k);Xobs ? : The next step of the EM algorithm, the M-step, is to maximize the Q function; i.e. we flnd the parameter = k+1 maximizing Q( j (k)). The EM algorithm guarantees that the likelihood function is increased by increase of the function Q: 70 The EM algorithm has a generalization called generalized EM (GEM), where in the M step, one flnds only a value such that Q( j (k)) is strictly increasing in k: The density in our example is of multivariate type. Using ideas from Mathai and Moschopoulos (1991) [63], the likelihood function can be thought as a sum of p terms. One idea is to maximize each term separately. The data vectors Xj = (x1j;x2j;???;xpj) 2Rp for j = 1;???;n are observed and what are missing (unobserved) is the group of unitsX0 = (x01;x02;???;x0n). We wish to calculate the ML estimates of : There are pn+n = n(p+1) observations, and among them pn are observed and n are missing. Then setting = (?0;?0;fi0;?1;?1;fi1;???;?p;?p;fip)0; a vector of 3(p+1) parameters, the log likelihood is l( jX) = log nY j=1 g(x0j;x1j;???;xpjj ) = nX j=1 log[g(x0j;x1j;???;xpj)j ] = nlogC + nX j=1 log[(x0j ??0)fi0?1expf??0(x0j ??0)g ? pY i=1 (xij ?aix0j ??0i)fi0i?1expf??0i(xij ?aix0j ??0i)g] = nlogC + nX j=1 [(fi0 ?1)log(x0j ??0)??0(x0j ??0) +logf pY i=1 (xij ?aix0j ??0i)g+logf pY i=1 e??0i(xij?aix0j??0i)g] = nlogC +(fi0 ?1) nX j=1 log(x0j ??0)??0 nX j=1 (x0j ??0) + nX j=1 pX i=1 (fi0i ?1)log(xij ?aix0j ??0i)? nX j=1 pX i=1 ?0i(xij ?aix0j ??0i); 71 where C = pY i=0 ?fiii ?(fii) and logC = pX i=0 ? fii log(?i)?log(?(fii)) : The log likelihood is not linear in the data. The sample and arithmetic means are jointly su?cient and complete statistics for the parameters of the data. See Casella and Berger (1990) [13]. They are functions of: nX j=1 xij; and nX j=1 xijxi0j; i;i0 = 1;???;p: We wish to calculate E[x0jjx1j;x2j;???;xpj]; j = 1;???;n EX0j[log(x0j ??0)]; j = 1;2;???;n EX0[xij ?aix0j ??i]; i = 1;2;???;p; j = 1;2;???;n EX0[logfxij ?aix0j ??ig]; i = 1;2;???;p; j = 1;2;???;n Cov(x0jjx1j;x2j;???;xpj; ^?; ^?); j = 1;???;n Using the same idea as the Bayes theorem, which states that f(x0jx) = f(x0;x)f(x) = f(x0)f(xjx0)f(x) ; we have that the conditional density of x0j given x = (x1j;x2j;???;xpj)0 for j = 1;2;???;n; is given by: f(x0jjx1j;x2j;???;xpj) = (x0j ??0) fi0?1e??0(x0j??0)G(x0j;x) Rxmin j? 0 (x0j ??0) fi0?1e??0(x0j??0)G(x0j;x)dx0j where ? the denominator represents the joint distribution of the jth response vector, ? xminj = mini=1;2;???;pfxijai g, 72 ? 
G(x0j;x) = Qpi=1(xij ?aix0j ??0i)fi0i?1e??0i(xij?aix0j??0i): It is then not possible to just have the conditional density of the observed data. We cannot separate the component likelihood of the known parameter from the variance com- ponent parameter. This complexity of the conditional density of x0j given x1j;x2j;???;xpj for j = 1;2;???;n; (not available in closed or simple form), prevents us from using the EM algorithm. The problem is in the integration. To go around this problem, we could approximate the integral or we could replace x0j by a reasonable estimate. Based on the following relations, xij = aix0j +zij () x0j = xij ?zija i ; for i = 1;2;???;p; and j = 1;2;???;n: Considering the minimum of the x0j is a natural idea one could use, i.e. x0j = mini=1;::;p ?x ij ?zij ai : In other words, the x0j?s are estimable based on the nature of the structure. And hence the parameters ?0;?0;fi0 are then estimated. Now replace zij by E(zij) = fii? i ?aifi0? 0 +?i ?ai?0. Then x0j = mini=1;::;p ?x ij? fii ?i??i ai + fi0 ?0 +?0 ? : So E(x0j) = mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ? : Elog(x0j ??0) = log ? mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 ?? : Elog(xij ?aix0j ??i) = log ? xij ?ai mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ? ??i ? : 73 Hence, Q( j (k?1)) = E ? l( j (k?1);x) = nlogC +(fi0 ?1) nX j=1 Elog(x0j ??0)??0 nX j=1 E(x0j ??0) + pX i=1 nX j=1 (fii ?1)Elog(xij ?aix0j ??i)? pX i=1 nX j=1 ?iE(xij ?aix0j ??i) = nlogC +(fi0 ?1) nX j=1 log ? mini=1;::;p ?x ij ? fii?i ??i ai + fi0? 0 ? ??0 nX j=1 ? mini=1;::;p ?x ij ? fii?i ??i ai + fi0? 0 ? + pX i=1 nX j=1 (fii ?1)log ? xij ?ai( mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ?? ??i ? ? pX i=1 nX j=1 ?i ? xij ?ai mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ?? ??i ? ; where (k?1) = (fi(k?1)0 ;?(k?1)0 ;?(k?1)0 ;fi(k?1)1 ;?(k?1)1 ;?(k?1)1 ;???;fi(k?1)p ;?(k?1)p ;?(k?1)p )0: Then using maximum likelihood, flnd (k) = (?(k)0 ;?(k)0 ;fi(k)0 ;?(k)1 ;?(k)1 ;fi(k)1 ;???;?(k)p ;?(k)p ;fi(k)p )0 that maximizes Q( j (k?1)): So in here, the parameters are computed iteratively. The following steps are suggested to get the MLE?s of the parameters: ? step 1: update the unknown values x0j for each iteration along with the parameters of interest. ? step 2: flnd estimates for ?0;?0 and fi0: ? step 3: flnd estimates for ?i;?i and fii for 1 ? i ? p: 74 Continue the process at to convergence, when there is no much difierence gained in the next iteration. The vector of parameters = (?0;?0;fi0;?1;?1;fi1;???;?p;?p;fip)0 has two components: = ( 1; 2)0 where 1 = (?0;?0;fi0)0 and 2 = (?1;?1;fi1;???;?p;?p;fip)0 is estimated by MLE for 1 flxed. We continue the iterative process until some stopping criteria is reached. Although the process seems very similar to the EM algorithm, it is not based on the conditional distribution. It seems nevertheless very related to the EM algorithm. The estimation is performed by a generalization of the EM algorithm which Tanner (1996) [77] called the Expectation-Solution (ES) algorithm. For the ES algorithm, the M-step of the EM algorithm is replaced by a step requiring the solution of an estimating equation, which does not necessarily correspond to a maximization problem. The ES algorithm is a general algorithm for solving equations with incomplete data. These authors used this algorithm to flt a mixture model for independent and correlated data. The use of mixture of densities gives more exibility to the model the distribution. 
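A cheaper moment-based route, different from the imputation scheme above and offered here only as a sketch of our own (zero location parameters, coefficients a_i treated as known, Python with numpy assumed), recovers the latent parameters from two cross moments: by (3.5), Cov(X_i, X_j) = a_i a_j alpha_0/lambda_0^2 for i != j, and a direct expansion of the centered variables gives E[(X_i - mu_i)^2 (X_j - mu_j)] = 2 a_i^2 a_j alpha_0/lambda_0^3, so the two ratios identify lambda_0 and alpha_0:

# Sketch (our own moment-matching alternative, not the ES scheme described
# above): estimate the latent parameters (alpha_0, lambda_0) from the
# covariance and one third-order cross moment of the observed vector.
import numpy as np

rng = np.random.default_rng(5)
alpha0, lam0 = 2.0, 1.0
alphas, lams = np.array([3.0, 4.0]), np.array([1.5, 0.8])
a = lam0 / lams                          # continuous class of Theorem 3.3
n = 1_000_000

x0 = rng.gamma(alpha0, 1 / lam0, n)
z = rng.gamma(alphas - alpha0, 1 / lams, (n, 2))
x = a * x0[:, None] + z                  # observed data; x0 stays latent

c = x - x.mean(axis=0)                   # centered observations
k2 = (c[:, 0] * c[:, 1]).mean() / (a[0] * a[1])             # alpha_0/lambda_0^2
k3 = (c[:, 0] ** 2 * c[:, 1]).mean() / (a[0] ** 2 * a[1])   # 2*alpha_0/lambda_0^3
lam0_hat = 2 * k2 / k3
alpha0_hat = k2 * lam0_hat ** 2
print(alpha0_hat, lam0_hat)              # should be close to (2.0, 1.0)

Such moment estimates can also serve as starting values for the iterative procedures described above.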
With this approach, we have a method to calculate the variance-covariance matrix of the parameters. The Markov Chain Monte Carlo (MCMC) allows the simulation of many complex and multivariate types of random data. It also deals with calculating expectations, flnding conditional expectations for maximizing likelihood in missing data patterns, predicting the missing data based on a normal distribution. Although the assumption of normality is not always a negative aspect (specially when the missing information is not too large), this new estimation technique can provide better answers as the values x0 are not missing at random. To understand a statistical model, one idea would be to simulate many realizations from that model, and study it. For example, consider a r.v. X with pdf f(x) and a 75 function g such that E(g(X)) = Z X g(x)f(x)dx is di?cult to integrate. The integral can be approximated after n realizations, x1;x2;???;xn; of X by the following sum: 1 n nX i=1 g(xi): In our example, the joint density has the form f(x) = f(x0;x1;x2;???;xp), and we are interested in some features of the marginal density. Assume we can sample the p+1 many univariate conditional densities: f(x0jx1;x2;???;xp) f(x1jx0;x2;???;xp) f(x2jx0;x1;x3;???;xp) ... f(xpjx0;x1;x2;???;xp?1): Indeed f(xkjx0;x1;???;xk?1;xk+1;???;xp) = f(x0;x1;???;xp)f(x 0;x1;???;xk?1;xk+1;???;xp) = ? fik k ?(fik)(xk ?akx0) fik?1e??k(xk?aix0) which is a gamma distribution with location akx0 and parameters fi0k and ?0k denoted here as fik and ?k; respectively. So choose arbitrarily p initial values: x1 = x(0)1 ;x2 = x(0)2 ;???;xp = x(0)p : Then create ? x(1)0 by drawing from f(x0jx(0)1 ;???;x(0)p ) ? x(1)1 by drawing from f(x1jx(1)0 ;x(0)2 ;???;x(0)p ) 76 ? x(1)2 by drawing from f(x2jx(1)0 ;x(1)1 ;x(0)3 ;???;x(0)p ) ? ... ? x(1)p by drawing from f(xpjx(1)0 ;x(1)1 ;x(1)2 ;???;x(1)p?1) This is one Gibbs pass through the p+1 conditional densities that gives values (x(1)0 ;x(1)1 ;x(1)2 ;???;x(1)p ): Thus, we can take the last n of x0 values after many Gibbs passes and we set that: E(X0) = 1n m+nX i=m xi0: The Monte Carlo EM algorithm is an important sampling technique to generate random variates to construct Monte Carlo approximations. Assaidearlier, theidentiflcationofparametersisanimportanttopic. Wehaveproposed a novel procedure to identify the parameters which uses ideas of the EM algorithm, with a computation cost not so large. 77 Chapter 5 The Multivariate Exponential The multivariate exponential plays an important role in survival and reliability analysis. Marshall and Olkin (1967) [61], Joe (1997), Ghosh and Gelfand (1998) [30], and Hougaard (2000) [36], to mention a few authors, gave several examples to motivate this research problem. In this chapter, we focus on the multivariate exponential distribution. More speciflcally, weconsiderthep?variateX1;X2;???;andXp beflxedmarginallyasexponential r.v.?s with hazard rates ?1;?2;???; and ?p, respectively. Then by introducing two types of latent non-negative variables, X0 and Z1;Z2;???; and Zp, statistically independent between themselves and of X0, a linear relationship is formed between X0 and X1;X2;???;Xp as in (1.1). Note that this structure was introduced in Chapter 3, and is a special family within the generalized gamma distribution. We focus on the exponential class of distributions because of its importance in the literature. This class is large, and includes the continuous (examined in Chapter 4) and discontinuous cases. 
We also study the bivariate case approach where X0 is unobserved, and missing, and lay out the joint density function along with the joint survival function. We also developed estimators for the parameters associated with the model. Carpenter et al. (2006) [12] deflned a similar approach at the univariate level. They characterize through Laplace transforms, the distribution of the latent variable in the exponential case as mixture of a point mass at zero and an exponential with hazard rate ?i as shown in Corollary 3.6. Note that when Zi = 0; there is a positive probability that Xi is proportional to X0 with proportionality constant ai, i.e. P(Xi = aiX0) > 0, i = 1;:::;p. 78 We show that our model possesses the property of conditional independence in Def- inition 1.7 and Equation (1.13), given a random latent efiect. Hougaard (2000) [36] calls this random latent efiect frailty. Frailty models are common in describing the dependence. See Vaupel et al. (1979) [78], Hougaard (2000) [36], and Henderson and Shimakura (2003) [34]. Hougaard (1986) [35] and Hougaard (2000) [36] consider the random latent efiect as parameter, and we are treating it as latent variable. As described in Carpenter et al. (2005) [12], the joint distribution is not absolutely continuous. The multivariate survival data of the experiment gives multiple events and involves several members or components in a system. The multivariate lifetime distributions by Hougaard (2000) [36], has a dependence createdbyanunobservablequantity. Thisisatypicalscenarioweadoptherealso. Hougaard (1986) [35] proposed a continuous multivariate lifetime distribution where the marginal distributions are Weibull (continuous) and does not allow the property of simultaneous or proportional failures of individuals or components. 5.1 The Multivariate Exponential In this section, we give our version of the p?variate exponential distribution through Deflnition 5.1. We study its various properties, and characterize its density function. Deflnition 5.1 Let X0;X1;???;Xp be exponential r.v.?s as in (1.3) with scale parameters ?i; 0 ? i ? p: Let Zi; i = 1;::;p; be independent r.v.?s satisfying (1.1). We deflne the joint distribution of X = (X1;X2;???;Xp) as the p?variate exponential distribution. 79 From Equation (3.7), the mean of X is given as E(X) = 1 ?1; 1 ?2;???; 1 ?p ?0 ; (5.1) and the variance/covariance matrix is given as ? = 1?2 0 0 BB BB BB BB B@ ?20 ?21 a1a2 ??? a1ap a2a1 ?20?2 2 ??? a2ap ... ... ... ... apa1 ??? apap?1 ?20?2 p 1 CC CC CC CC CA : (5.2) From (5.2), the correlation matrix is given as ? = 1?2 0 0 BB BB BB BB B@ ?20 a1a2?1?2 ??? a1ap?1?p a2a1?2?1 ?20 ??? a2ap?2?p ... ... ... ... apa1?p?1 ??? apap?1?p?p?1 ?20 1 CC CC CC CC CA : Note that since we consider the exponential case, the shape parameters fii = 1 in (1.2) for i = 1;2;::;p. From Theorem 3.4 and Equation (1.1), the LST of Zi is LZi(s) = pi +(1 ?pi)LXi(s); with pi = ai ?i? 0 ; i = 1;2;::;p: That is Zi is a mixture of Bernoulli r.v. with probability pi and exponential r.v. with parameter ?i as in Example 2.4. 80 From Equation (3.11), the conditional survival function of Xi given X0, is obtained as S(tjx0) = P(Xi > tjx0) = P(aix0 +Zi > tjx0) = P(Zi > t?aix0jx0) = Z 1 t?aix0 fZi(z)dz = Z 1 t?aix0 ? pi?(z)+(1?pi)?ie??iz ? dz = pi Z 1 t?aix0 ?(z)dz +(1?pi) Z 1 t?aix0 e??izdz = pi(1?H(t?aix0))+(1?pi)e??i(t?aix0); i = 1;::;p; where H is the Heaviside function deflned in Equation (2.3). 
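The conditional survival function just obtained is easy to confirm by simulation. A small sketch (our own check, assuming Python with numpy) fixes x_0, draws Z_i from its point-mass/exponential mixture, and compares the empirical survival of X_i = a_i x_0 + Z_i with (1 - p_i) exp(-lambda_i (t - a_i x_0)) for t >= a_i x_0, where the point-mass term vanishes:

# Sketch (our own check) of the conditional survival S(t | x_0) derived above.
import numpy as np

rng = np.random.default_rng(6)
lam0, lam_i, a_i = 1.0, 1.5, 0.4
p_i = a_i * lam_i / lam0                      # P(Z_i = 0)
x0, size = 2.0, 200_000                       # condition on a fixed x_0

z = rng.binomial(1, 1 - p_i, size) * rng.exponential(1 / lam_i, size)
xi = a_i * x0 + z

for t in (a_i * x0, 1.5, 3.0):                # t >= a_i*x_0
    emp = (xi > t).mean()
    theo = (1 - p_i) * np.exp(-lam_i * (t - a_i * x0))
    print(t, emp, theo)

For t below a_i x_0 the conditional survival is trivially one, since X_i >= a_i x_0 with probability one given X_0 = x_0.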
From the conditional independence property and (1.13), the joint conditional survival function is S(x1;???;xpjx0) = pY i=1 ? pi(1?H(xi ?aix0))+(1?pi)e??i(xi?aix0) ? : (5.3) The distribution of the minimum lifetime distribution X(1) = min ?X 1 a1 ; X2 a2 ;???; Xp ap can be derived directly from (5.3) and from properties of the Heaviside function in Equation (2.3). It is given as P(X(1) > tjx0) = pY i=1 P X i ai > tjx0 ? = pY i=1 P(Xi > aitjx0) = pY i=1 ? pi(1?H(xi ?x0))+(1?pi)e??iai(xi?x0) ? : Note that, as we should expect, S(0;0;???;0) = 1 and S(1;1;???;1) = 0: 81 The Equation (5.3) above is quite interesting and could be deduced intuitively from the nature of the conditional distribution. It also captures the method of copula that many authors have suggested. According to that method, the joint survival function of X1;???;Xp is represented as: S(x1;x2;???;xp) = C(P(X1 > x1);P(X2 > x2);???;P(Xp > xp)) = C(SX1(x1);SX2(x2);???;SXp(xp)); where C(u1;u2;???;up) is a copula: a function mixing the univariate survival function u1;u2 and up. For example, Hougaard (2001) [37] proposed the following copula function C at the bivariate level: C(u1;u2) = exp[?f(?lnu1)fl +(?lnu2)flg1=fl] with fl ? 1 coe?cient of association. The case fl = 1 corresponds to the independence between u1 and u2. There are some problems associated with the form of the copula, as it can lead to misrepresentation of the association. The false estimation of fl could then lead to false estimates for the parameters of interest. Taking the derivative of the ith survival function in Equation (5.3), and using Equation (2.3), the conditional density is given by fXijX0(t) = pi?(t?aix0)+(1?pi)?ie??i(t?aix0); i = 1;::;p: (5.4) 82 and from (5.4), the conditional expectation is EXijX0(Xi) = Z 1 0 tfXijX0(t)dt = pi Z 1 0 t?(t?aix0)dt+(1?pi) Z 1 aix0 ?ite??i(t?aix0)dt = piaix0 +(1?pi)( 1? i +aix0) = aix0 +(1?pi) 1? i ; i = 1;::;p: Note that taking the expectation of the above with respect to X0 gives EXi = EX0EXijX0(Xi) = EX0[piaix0 +(1?pi)( 1? i +aix0)] = pi ai? 0 +(1?pi)( 1? i + ai? 0 ) = ai? 0 +(1?pi) 1? i = 1? i ; i = 1;::;p; since pi = ai ?i? 0 ; conflrming earlier results in Equation (5.1) and in Carpenter et al. (2006) [12]. Although the joint density of (X0;X1;X2;???;Xp)0 is easy to flnd (see Equation in proof of Theorem 1.8), the density of (X1;X2;???;Xp)0 is not. However, we can study the density of (X0;X1;X2;???;Xp)0 through the latent variables Z1;Z2;???;Zp; with relative ease. Using the independence of the Zi?s, i = 1;2;::;p, between each other and of X0, and Theorem 1.8, we have that: f(x0;x1;???;xp) = ?0e??0x0 pY i=1 ? pi?(xi ?aix0)?(1?pi)?ie??i(xi?aix0)I(xi>aix0) ? ; (5.5) 83 To demonstrate the complicated form of the above density, we give the explicit form in the case of three r.v.?s X1 = a1X0 +Z1, X2 = a2X0 +Z2, and X3 = a3X0 +Z3. 
Hence, Equation (5.5) becomes: f(x0;x1;x2;x3) = f(x0)f(x1 ?a1x0)f(x2 ?a2x0)f(x3 ?a3x0) = f(x0)[p1?(x1 ?a1x0)+(1?p1)fX1(x1 ?a1x0)I(x1>a1x0)] [p2?(x2 ?a2x0)+(1?p2)fX2(x2 ?a2x0)I(x2>a2x0)] [p3?(x3 ?a3x0)+(1?p3)fX3(x3 ?a3x0)I(x3>a3x0)] = f(x0)[p1p2p3?(x1 ?a1x0)?(x2 ?a2x0)?(x3 ?a3x0) +p1(1?p2)p3?(x1 ?a1x0)?(x3 ?a3x0)fX2(x2 ?a2x0) +p1p2(1?p3)?(x1 ?a1x0)?(x2 ?a2x0)fX3(x3 ?a3x0) +p1(1?p2)(1?p3)?(x1 ?a1x0)fX2(x2 ?a2x0)fX3(x3 ?a3x0) +(1?p1)p2p3?(x2 ?a2x0)?(x3 ?a3x0)fX1(x1 ?a1x0) +(1?p1)(1?p2)p3?(x3 ?a3x0)fX1(x1 ?a1x0)fX2(x2 ?a2x0) +(1?p1)p2(1?p3)?(x2 ?a2x0)fX1(x1 ?a1x0)fX3(x3 ?a3x0) +(1?p1)(1?p2)(1?p3)fX1(x1 ?a1x0)fX2(x2 ?a2x0)fX3(x3 ?a3x0)]: Coming back to the multivariate form obtained from Equation (5.5), there are 2p terms in the sum expressed from the power set of P; the set of subsets of f1;2;???;pg: Hence, f(x0;x1;???;xp) = pX jAj=k;A2P pA(1) ???pA(k)?(xA(1) ?aA(1)x0)????(xA(k) ?aA(k)x0) ?f(x0)(1?pAc(1))???(1?pAc(p?k)) 84 ?fAc(1)(xAc(1) ?aAc(1)x0)???fAc(p?k)(xAc(p?k) ?aAc(p?k)x0); where A is a subset of P, with k elements, 0 ? k ? p, denoted as A(1);A(2);???;A(k), and we then can deduce the subset Ac = f1;2;:::;pgnA; whose elements are denoted as Ac(1);Ac(2);???;Ac(p?k). Hence, f(x0;x1;???;xp) = pX jAj=k;A2P pA(1) ???pA(k)?(xA(1) ?aA(1)x0)????(xA(k) ?aA(k)x0) ?(1?pAc(1))???(1?pAc(p?k)) ??0?Ac(1) ????Ac(p?k)e??Ac(1)xAc(1) ???e??Ac(p?k)xAc(p?k) ?e?(?0?aAc(1)?Ac(1)?????aAc(p?k)?Ac(p?k))x0: Therefore the joint density of (X1;X2;???;Xp)0 the multivariate exponential is obtained by integrating the above expression with respect to x0, giving f(x1;???;xp) = p?1X jAj=k;A2P pA(1) ???pA(k) aA(1) ???aA(k) ?( xA(1) aA(1) ? xA(k) aA(k) )?( xA(k?1) aA(k?1) ? xA(k) aA(k) ) ?(1?pAc(1))???(1?pAc(p?k)) ??0?Ac(1) ????Ac(p?k)e??Ac(1)xAc(1) ???e??Ac(p?k)xAc(p?k) ?e ?(?0?aAc (1) ?Ac (1) ?????aAc (p?k) ?Ac (p?k) ) xA(1) aA(1) + (1?p1)???(1?pp) ?0?1????p? 0 ?a1?1 ?????ap?p ?e??1x1 ???e??pxp(1?e?(?0?a1?1?????ap?p)?); where ? = mini x i ai ? : 85 Similarly, to get the unconditional survival function, one would derive it from (5.3), and therefore compute S(x1;x2;???;xp) = Z 1 0 pY i=1 S(xijx0)fX0(x0)dx0 = ?0 Z 1 0 pY i=1 ? pi(1?H(xi ?aix0))+(1?pi)e??i(xi?aix0) ? e??0x0dx0: As we can see, we cannot interchange integration and product in this above expression, and then there are substantial number of cases, 2p precisely, to consider in order to flnd the survival function. We discuss the case of p = 2 in the next section and explain each of the components of the survival in that bivariate case. We also obtain estimates of the unknown parameters in that case. 5.2 The Bivariate Exponential Model and MLE In this section, we study the bivariate case of the multivariate exponential distribution of the previous section. The bivariate exponential distribution from Deflnition 5.1 can be expressed as: 8 >< >: X1 = a1X0 +Z1 ; X2 = a2X0 +Z2 . (5.6) Then, from Carpenter et al. (2006) [12], the joint density of (X0;X1) is given by f(x0;x1) = 8 >< >: p1fX0(x0)?(x1 = a1x0) (1?p1)fX0(x0)fX1(x1 ?a1x0)I(x1>a1x0) , 86 = 8 >< >: p1?0e??0x0; if x0 = x1a1 (1?p1)?0?1e??0x0e??1(x1?a1x0); if x0 < x1a1 ; where p1 = a1?1? 0 . Similarly based on the expression X2 = a2X0+Z2, we have for p2 = a2?2? 0 , f(x0;x2) = 8> < >: p2fX0(x0)?(x2 = a2x0) (1?p2)fX0(x0)fX2(x2 ?a2x0)I(x2>a2x0) , = 8> < >: p2?0e??0x0; if x0 = x2a2 (1?p2)?0?2e??0x0e??2(x2?a2x0); if x0 < x2a2: Hence, using the independence between X0;Z1;Z2 and Theorem 1.8, the joint density of (X0;X1;X2) related as in (5.6), is given from (5.5) by: f(x0;x1;x2) = ?0e??0x0 ? 
5.2 The Bivariate Exponential Model and MLE

In this section, we study the bivariate case of the multivariate exponential distribution of the previous section. The bivariate exponential distribution from Definition 5.1 can be expressed as
\[
\begin{cases} X_1 = a_1X_0+Z_1, \\ X_2 = a_2X_0+Z_2. \end{cases} \tag{5.6}
\]
Then, from Carpenter et al. (2006) [12], the joint density of $(X_0,X_1)$ is given by
\[
f(x_0,x_1) = \begin{cases} p_1f_{X_0}(x_0), & \text{if } x_1 = a_1x_0, \\ (1-p_1)f_{X_0}(x_0)f_{X_1}(x_1-a_1x_0), & \text{if } x_1 > a_1x_0, \end{cases}
\;=\; \begin{cases} p_1\lambda_0e^{-\lambda_0x_0}, & \text{if } x_0 = x_1/a_1, \\ (1-p_1)\lambda_0\lambda_1e^{-\lambda_0x_0}e^{-\lambda_1(x_1-a_1x_0)}, & \text{if } x_0 < x_1/a_1, \end{cases}
\]
where $p_1 = a_1\lambda_1/\lambda_0$. Similarly, based on the expression $X_2 = a_2X_0+Z_2$, we have for $p_2 = a_2\lambda_2/\lambda_0$,
\[
f(x_0,x_2) = \begin{cases} p_2\lambda_0e^{-\lambda_0x_0}, & \text{if } x_0 = x_2/a_2, \\ (1-p_2)\lambda_0\lambda_2e^{-\lambda_0x_0}e^{-\lambda_2(x_2-a_2x_0)}, & \text{if } x_0 < x_2/a_2. \end{cases}
\]
Hence, using the independence of $X_0$, $Z_1$ and $Z_2$ and Theorem 1.8, the joint density of $(X_0,X_1,X_2)$, related as in (5.6), is given from (5.5) by
\[
f(x_0,x_1,x_2) = \lambda_0e^{-\lambda_0x_0}\big[p_1\delta(x_1-a_1x_0)+(1-p_1)f_{X_1}(x_1-a_1x_0)I(x_1>a_1x_0)\big]\big[p_2\delta(x_2-a_2x_0)+(1-p_2)f_{X_2}(x_2-a_2x_0)I(x_2>a_2x_0)\big]
\]
\[
= p_1p_2\lambda_0e^{-\lambda_0x_0}\delta(x_1-a_1x_0)\delta(x_2-a_2x_0)
+p_1(1-p_2)\lambda_0e^{-\lambda_0x_0}f_{X_2}(x_2-a_2x_0)\delta(x_1-a_1x_0)
+(1-p_1)p_2\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)\delta(x_2-a_2x_0)
+(1-p_1)(1-p_2)\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)f_{X_2}(x_2-a_2x_0)I(x_1>a_1x_0,\,x_2>a_2x_0).
\]
The expression $f(x_0,x_1,x_2)$ is one way to obtain an estimate of $x_0$, or of the parameter associated with it, $\lambda_0$. Let us assume that $\nu = \min\big(\tfrac{x_1}{a_1},\tfrac{x_2}{a_2}\big) = \tfrac{x_1}{a_1}\leq\tfrac{x_2}{a_2}$. Also, set
\[
r_i^{(1)} = I(z_{1i}=0) = I(x_{1i}=a_1x_{0i}) = \begin{cases}1, & \text{if } z_{1i}=0,\\ 0, & \text{if } z_{1i}\neq 0,\end{cases}
\qquad
r_i^{(2)} = I(z_{2i}=0) = I(x_{2i}=a_2x_{0i}) = \begin{cases}1, & \text{if } z_{2i}=0,\\ 0, & \text{if } z_{2i}\neq 0.\end{cases}
\]
Then the full likelihood based on a random sample of size $n$ is the product of the $n$ contributed likelihoods:
\[
L(\lambda_0,\lambda_1,\lambda_2) = \prod_{i=1}^{n}
\big[p_1p_2\lambda_0e^{-\lambda_0x_{0i}}\big]^{r_i^{(1)}r_i^{(2)}}
\big[p_1(1-p_2)\lambda_0\lambda_2e^{-(\lambda_0-a_2\lambda_2)x_{0i}}e^{-\lambda_2x_{2i}}\big]^{r_i^{(1)}(1-r_i^{(2)})}
\big[(1-p_1)p_2\lambda_0\lambda_1e^{-(\lambda_0-a_1\lambda_1)x_{0i}}e^{-\lambda_1x_{1i}}\big]^{(1-r_i^{(1)})r_i^{(2)}}
\big[(1-p_1)(1-p_2)\lambda_0\lambda_1\lambda_2e^{-(\lambda_0-a_1\lambda_1-a_2\lambda_2)x_{0i}}e^{-\lambda_1x_{1i}}e^{-\lambda_2x_{2i}}\big]^{(1-r_i^{(1)})(1-r_i^{(2)})}.
\]
Substituting $p_1 = a_1\lambda_1/\lambda_0$ and $p_2 = a_2\lambda_2/\lambda_0$ and collecting terms,
\[
L(\lambda_0,\lambda_1,\lambda_2) = a_1^{\sum_i r_i^{(1)}}a_2^{\sum_i r_i^{(2)}}\Big(\frac{\lambda_1\lambda_2}{\lambda_0}\Big)^{n}
e^{-\lambda_0\sum_i x_{0i}r_i^{(1)}r_i^{(2)}}
(\lambda_0-a_2\lambda_2)^{\sum_i(1-r_i^{(2)})}e^{-(\lambda_0-a_2\lambda_2)\sum_i x_{0i}r_i^{(1)}(1-r_i^{(2)})}
(\lambda_0-a_1\lambda_1)^{\sum_i(1-r_i^{(1)})}e^{-(\lambda_0-a_1\lambda_1)\sum_i x_{0i}(1-r_i^{(1)})r_i^{(2)}}
e^{-\lambda_1\sum_i x_{1i}(1-r_i^{(1)})}e^{-\lambda_2\sum_i x_{2i}(1-r_i^{(2)})}
e^{-(\lambda_0-a_1\lambda_1-a_2\lambda_2)\sum_i x_{0i}(1-r_i^{(1)})(1-r_i^{(2)})}.
\]
Hence the log-likelihood is
\[
LL(\lambda_0,\lambda_1,\lambda_2) = \log(a_1)\sum_i r_i^{(1)}+\log(a_2)\sum_i r_i^{(2)}+n\log\Big(\frac{\lambda_1\lambda_2}{\lambda_0}\Big)
-\lambda_0\sum_i x_{0i}r_i^{(1)}r_i^{(2)}
+\log(\lambda_0-a_1\lambda_1)\sum_i(1-r_i^{(1)})+\log(\lambda_0-a_2\lambda_2)\sum_i(1-r_i^{(2)})
-(\lambda_0-a_2\lambda_2)\sum_i x_{0i}r_i^{(1)}(1-r_i^{(2)})-(\lambda_0-a_1\lambda_1)\sum_i x_{0i}(1-r_i^{(1)})r_i^{(2)}
-\lambda_1\sum_i x_{1i}(1-r_i^{(1)})-\lambda_2\sum_i x_{2i}(1-r_i^{(2)})
-(\lambda_0-a_1\lambda_1-a_2\lambda_2)\sum_i x_{0i}(1-r_i^{(1)})(1-r_i^{(2)}),
\]
and
\[
\frac{\partial LL}{\partial\lambda_0} = -\frac{n}{\lambda_0}+\frac{\sum_i(1-r_i^{(1)})}{\lambda_0-a_1\lambda_1}+\frac{\sum_i(1-r_i^{(2)})}{\lambda_0-a_2\lambda_2}-\sum_i x_{0i}.
\]
Similarly,
\[
\frac{\partial LL}{\partial\lambda_1} = \frac{n}{\lambda_1}-\frac{a_1\sum_i(1-r_i^{(1)})}{\lambda_0-a_1\lambda_1}+a_1\sum_i x_{0i}(1-r_i^{(1)})-\sum_i x_{1i}(1-r_i^{(1)}),
\]
so setting $\partial LL/\partial\lambda_1 = 0$ gives
\[
\frac{\sum_i(1-r_i^{(1)})}{\lambda_0-a_1\lambda_1} = \frac{n}{a_1\lambda_1}+\sum_i x_{0i}(1-r_i^{(1)})-\frac{\sum_i x_{1i}(1-r_i^{(1)})}{a_1},
\]
and
\[
\frac{\partial LL}{\partial\lambda_2} = \frac{n}{\lambda_2}-\frac{a_2\sum_i(1-r_i^{(2)})}{\lambda_0-a_2\lambda_2}+a_2\sum_i x_{0i}(1-r_i^{(2)})-\sum_i x_{2i}(1-r_i^{(2)}),
\]
so setting $\partial LL/\partial\lambda_2 = 0$ gives
\[
\frac{\sum_i(1-r_i^{(2)})}{\lambda_0-a_2\lambda_2} = \frac{n}{a_2\lambda_2}+\sum_i x_{0i}(1-r_i^{(2)})-\frac{\sum_i x_{2i}(1-r_i^{(2)})}{a_2}.
\]
Now, setting $\partial LL/\partial\lambda_0 = 0$ and substituting the two expressions above gives
\[
\frac{1}{\lambda_0} = \frac{1}{a_1\lambda_1}+\frac{1}{a_2\lambda_2}
-\frac{\sum_i\{x_{1i}(1-r_i^{(1)})-a_1x_{0i}(1-r_i^{(1)})\}}{a_1n}
-\frac{\sum_i\{x_{2i}(1-r_i^{(2)})-a_2x_{0i}(1-r_i^{(2)})\}}{a_2n}
-\frac{\sum_i x_{0i}}{n}
= \frac{1}{a_1}\Big(\frac{1}{\lambda_1}-\frac{\sum_i z_{1i}(1-r_i^{(1)})}{n}\Big)+\frac{1}{a_2}\Big(\frac{1}{\lambda_2}-\frac{\sum_i z_{2i}(1-r_i^{(2)})}{n}\Big)-\frac{\sum_i x_{0i}}{n},
\]
where $z_{1i} = x_{1i}-a_1x_{0i}$ and $z_{2i} = x_{2i}-a_2x_{0i}$. The above likelihood equations can be used to estimate $\lambda_0$, $\lambda_1$ and $\lambda_2$ if the $x_{0i}$'s, $1\leq i\leq n$, were known.
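If the complete data $(x_{0i},x_{1i},x_{2i})$ and the zero indicators $r_i^{(1)},r_i^{(2)}$ were observed, the complete-data log-likelihood above could also be maximized numerically rather than through the score equations. The sketch below assumes exactly that setting; the parameter names, the constraint handling, and the choice of a Nelder-Mead optimizer are our own illustrative choices, not the dissertation's.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the complete-data likelihood of (X0, X1, X2), assuming the latent x0's
# and the indicators r_k = I(z_k = 0) are observed.  Each observation contributes
# one of four case densities according to whether z1 and/or z2 equal zero.
a1, a2 = 0.25, 0.50   # taken as known constants (illustrative values)

def neg_loglik(theta, x0, x1, x2, r1, r2):
    lam0, lam1, lam2 = theta
    if min(theta) <= 0 or a1 * lam1 >= lam0 or a2 * lam2 >= lam0:
        return np.inf                       # keep 0 < p_i = a_i*lam_i/lam0 < 1
    p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0
    z1, z2 = x1 - a1 * x0, x2 - a2 * x0
    ll = np.log(lam0) - lam0 * x0 \
       + np.where(r1, np.log(p1), np.log(1 - p1) + np.log(lam1) - lam1 * z1) \
       + np.where(r2, np.log(p2), np.log(1 - p2) + np.log(lam2) - lam2 * z2)
    return -ll.sum()

# With arrays x0, x1, x2, r1, r2 from (complete) data, one could then run:
# fit = minimize(neg_loglik, x0=[1.0, 1.0, 1.0], args=(x0, x1, x2, r1, r2),
#                method="Nelder-Mead")
```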
We address this question by developing estimators of these latent terms, using results obtained previously in Section 4.3. It is worth noting that no approximation has been used here, in contrast with the technique used in Chapter 3 and Chapter 4.

To develop the unconditional MLE, we integrate out $x_0$ from the joint density $f(x_0,x_1,x_2)$, and we have
\[
f(x_1,x_2) = \int f(x_0,x_1,x_2)\,dx_0
= p_1p_2\int\lambda_0e^{-\lambda_0x_0}\delta(x_1-a_1x_0)\delta(x_2-a_2x_0)\,dx_0
+p_1(1-p_2)\int\lambda_0e^{-\lambda_0x_0}f_{X_2}(x_2-a_2x_0)I(x_2>a_2x_0)\delta(x_1-a_1x_0)\,dx_0
+(1-p_1)p_2\int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)I(x_1>a_1x_0)\delta(x_2-a_2x_0)\,dx_0
+(1-p_1)(1-p_2)\int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)f_{X_2}(x_2-a_2x_0)I\big(\tfrac{x_1}{a_1}>x_0,\tfrac{x_2}{a_2}>x_0\big)\,dx_0
\]
\[
= p_1p_2\,\mathrm{Part}A_1+p_1(1-p_2)\,\mathrm{Part}A_2+(1-p_1)p_2\,\mathrm{Part}A_3+(1-p_1)(1-p_2)\,\mathrm{Part}A_4,
\]
where
\[
\mathrm{Part}A_1 = \int\lambda_0e^{-\lambda_0x_0}\delta(x_1-a_1x_0)\delta(x_2-a_2x_0)\,dx_0
= \frac{\lambda_0}{a_1a_2}\int e^{-\lambda_0x_0}\delta\big(\tfrac{x_1}{a_1}-x_0\big)\delta\big(\tfrac{x_2}{a_2}-x_0\big)\,dx_0
= \frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_2}{a_2}}\delta\big(\tfrac{x_1}{a_1}-\tfrac{x_2}{a_2}\big)
= \frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_1}{a_1}}\delta\big(\tfrac{x_1}{a_1}-\tfrac{x_2}{a_2}\big),
\]
\[
\mathrm{Part}A_2 = \int\lambda_0e^{-\lambda_0x_0}f_{X_2}(x_2-a_2x_0)I(x_2>a_2x_0)\delta(x_1-a_1x_0)\,dx_0
= \frac{1}{a_1}\lambda_0e^{-\lambda_0\frac{x_1}{a_1}}f_{X_2}\big(x_2-a_2\tfrac{x_1}{a_1}\big)I\big(x_2>a_2\tfrac{x_1}{a_1}\big)
= \frac{1}{a_1}\lambda_0\lambda_2e^{-\frac{x_1}{a_1}(\lambda_0-a_2\lambda_2)}e^{-\lambda_2x_2}I\big(\tfrac{x_2}{a_2}>\tfrac{x_1}{a_1}\big)
= \frac{1}{a_1}\lambda_0\lambda_2e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_2}{a_2}>\tfrac{x_1}{a_1}\big),
\]
where $\nu = \min\big(\tfrac{x_1}{a_1},\tfrac{x_2}{a_2}\big)$ and $\lambda^* = \lambda_0-a_1\lambda_1-a_2\lambda_2$. Similarly,
\[
\mathrm{Part}A_3 = \int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)I(x_1>a_1x_0)\delta(x_2-a_2x_0)\,dx_0
= \frac{1}{a_2}\lambda_0\lambda_1e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_1}{a_1}>\tfrac{x_2}{a_2}\big),
\]
and
\[
\mathrm{Part}A_4 = \int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)f_{X_2}(x_2-a_2x_0)I(x_1>a_1x_0,\,x_2>a_2x_0)\,dx_0
= \int_0^{\nu}\lambda_0\lambda_1\lambda_2e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*x_0}\,dx_0
= \frac{\lambda_0\lambda_1\lambda_2}{\lambda^*}e^{-\lambda_1x_1}e^{-\lambda_2x_2}\big(1-e^{-\lambda^*\nu}\big).
\]
Hence the expression for the joint density becomes
\[
f(x_1,x_2) = p_1p_2\frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_2}{a_2}}\delta\big(\tfrac{x_1}{a_1}-\tfrac{x_2}{a_2}\big)
+p_1(1-p_2)\frac{1}{a_1}\lambda_0\lambda_2e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_2}{a_2}>\tfrac{x_1}{a_1}\big)
+(1-p_1)p_2\frac{1}{a_2}\lambda_0\lambda_1e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_1}{a_1}>\tfrac{x_2}{a_2}\big)
+(1-p_1)(1-p_2)\frac{\lambda_0\lambda_1\lambda_2}{\lambda^*}e^{-\lambda_1x_1}e^{-\lambda_2x_2}\big(1-e^{-\lambda^*\nu}\big),
\]
where $\nu = \min\big(\tfrac{x_1}{a_1},\tfrac{x_2}{a_2}\big)$ and $\lambda^* = \lambda_0-a_1\lambda_1-a_2\lambda_2$, a form similar to that of the previous section.
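As a numerical sanity check on the form of $f(x_1,x_2)$, the absolutely continuous part should integrate to $1-p_1p_2$, since the remaining mass $p_1p_2$ lies on the singular ray $x_1/a_1 = x_2/a_2$. The sketch below performs this check for one arbitrary, illustrative choice of parameters (it is not from the text, and the result is only accurate up to quadrature tolerance over the infinite domain).

```python
import numpy as np
from scipy.integrate import dblquad

# Check that the absolutely continuous part of f(x1, x2) integrates to 1 - p1*p2.
# Parameter values are illustrative and satisfy a_i*lam_i < lam0 and lam_star != 0.
lam0, lam1, lam2, a1, a2 = 1.0, 2.0, 4.0, 0.25, 0.10
p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0
lam_star = lam0 - a1 * lam1 - a2 * lam2

def f_ac(x1, x2):
    nu = min(x1 / a1, x2 / a2)
    base = np.exp(-lam1 * x1 - lam2 * x2)
    val = (1 - p1) * (1 - p2) * lam0 * lam1 * lam2 / lam_star * base * (1 - np.exp(-lam_star * nu))
    if x1 / a1 < x2 / a2:
        val += p1 * (1 - p2) * lam0 * lam2 / a1 * base * np.exp(-lam_star * x1 / a1)
    elif x2 / a2 < x1 / a1:
        val += (1 - p1) * p2 * lam0 * lam1 / a2 * base * np.exp(-lam_star * x2 / a2)
    return val

mass, _ = dblquad(f_ac, 0, np.inf, 0, np.inf)
print(mass, 1 - p1 * p2)   # the two numbers should agree up to integration error
```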
Based on a random sample of size $n$, denoted $(x_{11},x_{21}),(x_{12},x_{22}),\ldots,(x_{1n},x_{2n})$, define
\[
r_j^{(1)} = \begin{cases}1, & \text{if } x_{0j} = \frac{x_{1j}}{a_1}\leq\frac{x_{2j}}{a_2},\\ 0, & \text{if } \frac{x_{1j}}{a_1}>\frac{x_{2j}}{a_2},\end{cases}
\qquad
r_j^{(2)} = \begin{cases}1, & \text{if } x_{0j} = \frac{x_{2j}}{a_2}\leq\frac{x_{1j}}{a_1},\\ 0, & \text{if } \frac{x_{2j}}{a_2}>\frac{x_{1j}}{a_1}.\end{cases}
\]
Then
\[
L(\lambda_0,\lambda_1,\lambda_2) = \prod_{j=1}^{n}f(x_{1j},x_{2j})
= \prod_{j=1}^{n}\Big[p_1p_2\frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_{2j}}{a_2}}\Big]^{r_j^{(1)}r_j^{(2)}}
\Big[p_1(1-p_2)\frac{1}{a_1}\lambda_0\lambda_2e^{-\lambda_1x_{1j}}e^{-\lambda_2x_{2j}}e^{-\lambda^*\nu_j}\Big]^{r_j^{(1)}(1-r_j^{(2)})}
\Big[(1-p_1)p_2\frac{1}{a_2}\lambda_0\lambda_1e^{-\lambda_1x_{1j}}e^{-\lambda_2x_{2j}}e^{-\lambda^*\nu_j}\Big]^{(1-r_j^{(1)})r_j^{(2)}}
\Big[(1-p_1)(1-p_2)\frac{\lambda_0\lambda_1\lambda_2}{\lambda^*}e^{-\lambda_1x_{1j}}e^{-\lambda_2x_{2j}}\big(1-e^{-\lambda^*\nu_j}\big)\Big]^{(1-r_j^{(1)})(1-r_j^{(2)})}.
\]
In order to obtain estimators, the log-likelihood is
\[
\ell(\lambda_0,\lambda_1,\lambda_2) = \log L(\lambda_0,\lambda_1,\lambda_2)
= \sum_i r_i^{(1)}r_i^{(2)}\big[\log(p_1p_2)+\log\lambda_0-\log(a_1a_2)-\lambda_0\nu_i\big]
+r_i^{(1)}(1-r_i^{(2)})\big[\log p_1(1-p_2)+\log\lambda_0+\log\lambda_2-\log a_1-\lambda_1x_{1i}-\lambda_2x_{2i}-\lambda^*\nu_i\big]
+(1-r_i^{(1)})r_i^{(2)}\big[\log(1-p_1)p_2+\log\lambda_0+\log\lambda_1-\log a_2-\lambda_1x_{1i}-\lambda_2x_{2i}-\lambda^*\nu_i\big]
+(1-r_i^{(1)})(1-r_i^{(2)})\big[\log(1-p_1)(1-p_2)+\log\lambda_0+\log\lambda_1\lambda_2-\log\lambda^*-\lambda_1x_{1i}-\lambda_2x_{2i}+\log\big(1-e^{-\lambda^*\nu_i}\big)\big].
\]
Assume that $p_1$ and $p_2$ are fixed, known constants. Then
\[
\frac{\partial\ell}{\partial\lambda_0} = \sum_i r_i^{(1)}r_i^{(2)}\Big[\frac{1}{\lambda_0}-\nu_i\Big]
+r_i^{(1)}(1-r_i^{(2)})\Big[\frac{1}{\lambda_0}-\nu_i\Big]
+(1-r_i^{(1)})r_i^{(2)}\Big[\frac{1}{\lambda_0}-\nu_i\Big]
+(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda_0}-\frac{1}{\lambda^*}+\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \frac{n}{\lambda_0}-\sum_i\nu_i\big(r_i^{(1)}+r_i^{(2)}-r_i^{(1)}r_i^{(2)}\big)
-\sum_i(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big].
\]
Similarly,
\[
\frac{\partial\ell}{\partial\lambda_1} = \sum_i r_i^{(1)}(1-r_i^{(2)})\big[-x_{1i}+a_1\nu_i\big]
+(1-r_i^{(1)})r_i^{(2)}\Big[\frac{1}{\lambda_1}-x_{1i}+a_1\nu_i\Big]
+(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda_1}+\frac{a_1}{\lambda^*}-x_{1i}-\frac{a_1\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i\Big\{-x_{1i}\big(1-r_i^{(1)}r_i^{(2)}\big)+a_1\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]
+\frac{1-r_i^{(1)}}{\lambda_1}+a_1(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]\Big\}.
\]
So, setting $\partial\ell/\partial\lambda_1 = 0$ and $\partial\ell/\partial\lambda_0 = 0$ gives
\[
a_1\sum_i(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i x_{1i}\big(1-r_i^{(1)}r_i^{(2)}\big)-a_1\sum_i\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]-\frac{1}{\lambda_1}\sum_i(1-r_i^{(1)}),
\]
and hence
\[
\frac{n}{\lambda_0} = \frac{1}{a_1}\sum_i x_{1i}\big(1-r_i^{(1)}r_i^{(2)}\big)+\sum_i\nu_ir_i^{(1)}r_i^{(2)}-\frac{1}{a_1\lambda_1}\sum_i(1-r_i^{(1)}),
\quad\text{or}\quad
\frac{1}{\lambda_0} = \frac{1}{a_1}\frac{\sum_i x_{1i}(1-r_i^{(1)}r_i^{(2)})}{n}+\frac{\sum_i\nu_ir_i^{(1)}r_i^{(2)}}{n}-\frac{1}{a_1\lambda_1}\frac{\sum_i(1-r_i^{(1)})}{n}.
\]
A similar formula is obtained by taking $\partial\ell/\partial\lambda_2$ and setting it equal to zero:
\[
\frac{\partial\ell}{\partial\lambda_2} = \sum_i r_i^{(1)}(1-r_i^{(2)})\Big[\frac{1}{\lambda_2}-x_{2i}+a_2\nu_i\Big]
+(1-r_i^{(1)})r_i^{(2)}\big[-x_{2i}+a_2\nu_i\big]
+(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda_2}+\frac{a_2}{\lambda^*}-x_{2i}-\frac{a_2\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i\Big\{-x_{2i}\big(1-r_i^{(1)}r_i^{(2)}\big)+a_2\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]
+\frac{1-r_i^{(2)}}{\lambda_2}+a_2(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]\Big\}.
\]
So
\[
a_2\sum_i(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i x_{2i}\big(1-r_i^{(1)}r_i^{(2)}\big)-a_2\sum_i\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]-\frac{1}{\lambda_2}\sum_i(1-r_i^{(2)}),
\]
and hence
\[
\frac{1}{\lambda_0} = \frac{1}{a_2}\frac{\sum_i x_{2i}(1-r_i^{(1)}r_i^{(2)})}{n}+\frac{\sum_i\nu_ir_i^{(1)}r_i^{(2)}}{n}-\frac{1}{a_2\lambda_2}\frac{\sum_i(1-r_i^{(2)})}{n}.
\]
This equation gives an estimate of the parameter associated with the unknown latent variable $x_0$.

The distribution of the minimum is also of interest; hence the need to characterize it. For $X_{(1)}$ denoting the minimum of $\tfrac{X_1}{a_1}$ and $\tfrac{X_2}{a_2}$, we have, based on the independence of $X_1\mid x_0$ and $X_2\mid x_0$, that
\[
P(X_{(1)}>t\mid x_0) = P(X_1>a_1t\mid x_0)\,P(X_2>a_2t\mid x_0). \tag{5.7}
\]
Equation (5.7) follows from Theorem 1.8 and the fact that $X_1\mid x_0$ and $X_2\mid x_0$ are independent. But $X_1\mid x_0$ and $X_2\mid x_0$ are not identically distributed, so we cannot use the results on the minimal order statistic in, say, Dudewicz and Mishra (1988) [23]. Hence, we need to find $P(X_i>a_it\mid x_0)$ for $i=1,2$:
\[
P(X_i>a_it\mid x_0) = P(a_ix_0+Z_i>a_it\mid x_0) = P\big(Z_i>a_i(t-x_0)\mid x_0\big)
= \int_{a_i(t-x_0)}^{\infty}f_{Z_i}(z_i)\,dz_i
= \int_{a_i(t-x_0)}^{\infty}\big[p_i\delta(z_i)+(1-p_i)\lambda_ie^{-\lambda_iz_i}\big]\,dz_i
= p_i\big[1-H\big(a_i(t-x_0)\big)\big]+(1-p_i)e^{-a_i\lambda_i(t-x_0)}
= p_i\big[1-H(t-x_0)\big]+(1-p_i)e^{-a_i\lambda_i(t-x_0)}.
\]
From the above, we can deduce the conditional density of $\tfrac{X_i}{a_i}$ given $x_0$, which is
\[
f_{X_i/a_i\mid x_0}(t) = -\frac{d}{dt}P(X_i>a_it\mid x_0) = p_i\,\delta(t-x_0)+(1-p_i)a_i\lambda_ie^{-a_i\lambda_i(t-x_0)}.
\]
Our goal is to find $f_{X_{(1)}\mid X_0}(t) = -\frac{d}{dt}P(X_{(1)}>t\mid x_0)$. So, from Theorem 1.8, the conditional survival function of the minimum lifetime is
\[
P(X_{(1)}>t\mid x_0) = P(X_1>a_1t\mid x_0)\,P(X_2>a_2t\mid x_0)
= \big[p_1(1-H(t-x_0))+(1-p_1)e^{-a_1\lambda_1(t-x_0)}\big]\big[p_2(1-H(t-x_0))+(1-p_2)e^{-a_2\lambda_2(t-x_0)}\big]
\]
\[
= p_1p_2\big(1-H(t-x_0)\big)
+p_1(1-p_2)\big(1-H(t-x_0)\big)e^{-a_2\lambda_2(t-x_0)}
+(1-p_1)p_2\big(1-H(t-x_0)\big)e^{-a_1\lambda_1(t-x_0)}
+(1-p_1)(1-p_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}.
\]
Hence the conditional density of $X_{(1)}$ given $X_0$ is
\[
f_{X_{(1)}\mid x_0}(t) = -\frac{d}{dt}P(X_{(1)}>t\mid x_0)
= p_1p_2\,\delta(t-x_0)
+p_1(1-p_2)\big[\delta(t-x_0)e^{-a_2\lambda_2(t-x_0)}+\big(1-H(t-x_0)\big)a_2\lambda_2e^{-a_2\lambda_2(t-x_0)}\big]
+(1-p_1)p_2\big[\delta(t-x_0)e^{-a_1\lambda_1(t-x_0)}+\big(1-H(t-x_0)\big)a_1\lambda_1e^{-a_1\lambda_1(t-x_0)}\big]
+(1-p_1)(1-p_2)(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}
\]
\[
= p_1p_2\,\delta(t-x_0)
+p_1(1-p_2)\,\delta(t-x_0)e^{-a_2\lambda_2(t-x_0)}
+(1-p_1)p_2\,\delta(t-x_0)e^{-a_1\lambda_1(t-x_0)}
+(1-p_1)(1-p_2)(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)},
\]
where the terms involving $1-H(t-x_0)$ vanish on the support $t\geq x_0$ by the definition of the Heaviside function in Equation (2.3).
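The structure of $f_{X_{(1)}\mid x_0}$ can be checked directly by simulation: given $X_0 = x_0$, the minimum equals $x_0$ with probability $1-(1-p_1)(1-p_2)$, and otherwise it exceeds $x_0$ by an exponential amount with rate $a_1\lambda_1+a_2\lambda_2$. The following sketch, with illustrative parameter values of our own choosing, verifies both facts by Monte Carlo.

```python
import numpy as np

# Monte Carlo sketch of the conditional law of X_(1) given X_0 = x0.
# The derived density says: the minimum equals x0 with probability 1-(1-p1)(1-p2),
# and otherwise is x0 plus an Exp(a1*lam1 + a2*lam2) variable.
rng = np.random.default_rng(1)
lam0, lam1, lam2, a1, a2, x0 = 1.0, 2.0, 4.0, 0.25, 0.10, 0.7   # illustrative values
p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0

n = 400_000
z1 = np.where(rng.random(n) < p1, 0.0, rng.exponential(1 / lam1, n))
z2 = np.where(rng.random(n) < p2, 0.0, rng.exponential(1 / lam2, n))
m = np.minimum(x0 + z1 / a1, x0 + z2 / a2)           # X_(1) given X_0 = x0

print((m == x0).mean(), 1 - (1 - p1) * (1 - p2))     # mass of the atom at x0
cont = m[m > x0] - x0
print(cont.mean(), 1 / (a1 * lam1 + a2 * lam2))      # mean of the continuous part
```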
Note that $\int_{-\infty}^{\infty}f_{X_{(1)}\mid x_0}(t)\,dt = 1$. Also,
\[
E(X_{(1)}\mid x_0) = \int_{-\infty}^{\infty}t\,f_{X_{(1)}\mid x_0}(t)\,dt
= p_1p_2\int t\,\delta(t-x_0)\,dt
+p_1(1-p_2)\int t\,\delta(t-x_0)e^{-a_2\lambda_2(t-x_0)}\,dt
+(1-p_1)p_2\int t\,\delta(t-x_0)e^{-a_1\lambda_1(t-x_0)}\,dt
+(1-p_1)(1-p_2)\int_{x_0}^{\infty}t\,(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}\,dt
\]
\[
= p_1p_2x_0+p_1(1-p_2)x_0+(1-p_1)p_2x_0+(1-p_1)(1-p_2)\Big(\frac{1}{a_1\lambda_1+a_2\lambda_2}+x_0\Big)
= x_0+\frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}.
\]
So, $E(X_{(1)}\mid x_0) = x_0+\dfrac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}$. Hence,
\[
E(X_{(1)}) = E_{X_0}\big[E(X_{(1)}\mid X_0)\big] = E(X_0)+\frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}
= \frac{1}{\lambda_0}+\frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}.
\]
Another way to obtain $E(X_{(1)})$, using the densities, is as follows:
\[
f_{X_{(1)},X_0}(t,x_0) = f_{X_{(1)}\mid X_0}(t)f_{X_0}(x_0)
= p_1p_2\lambda_0e^{-\lambda_0x_0}\delta(t-x_0)
+p_1(1-p_2)\lambda_0e^{-\lambda_0x_0}e^{-a_2\lambda_2(t-x_0)}\delta(t-x_0)
+(1-p_1)p_2\lambda_0e^{-\lambda_0x_0}e^{-a_1\lambda_1(t-x_0)}\delta(t-x_0)
+(1-p_1)(1-p_2)\lambda_0(a_1\lambda_1+a_2\lambda_2)e^{-\lambda_0x_0}e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}H(t-x_0).
\]
Hence the form of the density of the minimum order statistic is
\[
f_{X_{(1)}}(t) = \int_{-\infty}^{\infty}f_{X_{(1)},X_0}(t,x_0)\,dx_0
= p_1p_2\lambda_0e^{-\lambda_0t}+p_1(1-p_2)\lambda_0e^{-\lambda_0t}+(1-p_1)p_2\lambda_0e^{-\lambda_0t}
+(1-p_1)(1-p_2)\lambda_0(a_1\lambda_1+a_2\lambda_2)\int_0^{t}e^{-\lambda_0x_0}e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}\,dx_0
\]
\[
= \lambda_0e^{-\lambda_0t}
+(1-p_1)(1-p_2)\Big[\frac{\lambda_0(a_1\lambda_1+a_2\lambda_2)}{\lambda^*}\big(1-e^{-\lambda^*t}\big)e^{-(a_1\lambda_1+a_2\lambda_2)t}-\lambda_0e^{-\lambda_0t}\Big]
= \lambda_0e^{-\lambda_0t}+(1-p_1)(1-p_2)\frac{\lambda_0}{\lambda^*}\Big[(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)t}-\lambda_0e^{-\lambda_0t}\Big].
\]
Hence
\[
E(X_{(1)}) = \frac{1}{\lambda_0}+(1-p_1)(1-p_2)\frac{\lambda_0}{\lambda^*}\Big(\frac{1}{a_1\lambda_1+a_2\lambda_2}-\frac{1}{\lambda_0}\Big),
\]
which agrees with the expression obtained above.
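A quick Monte Carlo check of the expression for $E(X_{(1)})$, again with arbitrary illustrative parameter values: since $(1-p_1)(1-p_2)\frac{\lambda_0}{\lambda^*}\big(\frac{1}{a_1\lambda_1+a_2\lambda_2}-\frac{1}{\lambda_0}\big) = \frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}$, the two forms of the mean coincide, and the simulated mean of $\min(X_1/a_1,X_2/a_2)$ should match them.

```python
import numpy as np

# Monte Carlo check of E[X_(1)] = 1/lam0 + (1-p1)(1-p2)/(a1*lam1 + a2*lam2).
# Parameter values are illustrative choices, not from the dissertation.
rng = np.random.default_rng(7)
lam0, lam1, lam2, a1, a2 = 1.0, 2.0, 4.0, 0.25, 0.10
p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0

n = 500_000
x0 = rng.exponential(1 / lam0, n)
z1 = np.where(rng.random(n) < p1, 0.0, rng.exponential(1 / lam1, n))
z2 = np.where(rng.random(n) < p2, 0.0, rng.exponential(1 / lam2, n))
xmin = np.minimum((a1 * x0 + z1) / a1, (a2 * x0 + z2) / a2)

print(xmin.mean())
print(1 / lam0 + (1 - p1) * (1 - p2) / (a1 * lam1 + a2 * lam2))
```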
Recall that
\[
P(X_i>x_i\mid x_0) = P(a_ix_0+Z_i>x_i\mid x_0) = P(Z_i>x_i-a_ix_0\mid x_0)
= \int_{x_i-a_ix_0}^{\infty}\big[p_i\delta(z)+(1-p_i)\lambda_ie^{-\lambda_iz}\big]\,dz
= p_i\big[1-H(x_i-a_ix_0)\big]+(1-p_i)e^{-\lambda_i(x_i-a_ix_0)}.
\]
Hence
\[
F_{X_i\mid x_0}(x_i) = P(X_i\leq x_i\mid x_0) = 1-P(X_i>x_i\mid x_0)
= p_iH(x_i-a_ix_0)+(1-p_i)\big[1-e^{-\lambda_i(x_i-a_ix_0)}\big],
\]
and
\[
F(x_1,x_2\mid x_0) = \Big[p_1H(x_1-a_1x_0)+(1-p_1)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)\Big]\Big[p_2H(x_2-a_2x_0)+(1-p_2)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big)\Big]
\]
\[
= p_1p_2H(x_1-a_1x_0)H(x_2-a_2x_0)
+p_1(1-p_2)H(x_1-a_1x_0)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big)
+(1-p_1)p_2H(x_2-a_2x_0)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)
+(1-p_1)(1-p_2)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big).
\]
Also,
\[
F(x_1,\infty\mid x_0) = \lim_{x_2\to\infty}F(x_1,x_2\mid x_0)
= p_1p_2H(x_1-a_1x_0)+p_1(1-p_2)H(x_1-a_1x_0)
+(1-p_1)p_2\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)+(1-p_1)(1-p_2)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)
= p_1H(x_1-a_1x_0)+(1-p_1)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big).
\]
Similarly,
\[
F(\infty,x_2\mid x_0) = p_2H(x_2-a_2x_0)+(1-p_2)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big).
\]
Hence
\[
S(x_1,x_2\mid x_0) = 1+p_1p_2H(x_1-a_1x_0)H(x_2-a_2x_0)
-p_1p_2H(x_1-a_1x_0)-p_1(1-p_2)H(x_1-a_1x_0)e^{-\lambda_2(x_2-a_2x_0)}
-p_1p_2H(x_2-a_2x_0)-(1-p_1)p_2H(x_2-a_2x_0)e^{-\lambda_1(x_1-a_1x_0)}
+(1-p_1)(1-p_2)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big)
-(1-p_1)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)-(1-p_2)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big).
\]
We have proposed a very general bivariate exponential class of distributions, which includes all the work considered in Chapter 3 and in Chapter 4. We have described the form of the joint distribution and survival functions. The distribution of the minimum has been characterized. Estimates of the parameters are given based on the likelihood equations. We have done all of this while retaining the form of the marginal exponential distributions and the fatal shock idea of Marshall and Olkin (1967) [61]. These results are not easily generalized to the $p$-variate gamma distribution case; one reason is that there is no need to approximate the likelihood function in the multivariate exponential case, in contrast with the results obtained in Chapter 4. In the next section, we examine a simulated example to illustrate our proposed model.

5.3 Simulation Example

In this section, we perform a simulation study of the multivariate exponential to examine the properties of various estimators of the parameter of the latent distribution, $\lambda_0$. We focus on $\lambda_0$ because it is an important part of the correlation structure, and all of the parameters associated with $X_1$ through $X_p$ can easily be estimated marginally, which is well documented in the literature. We examine the multivariate exponential given in (1.1) and (1.3) for various dimensions, from $p=2$ to $p=5$. Based on 10,000 replications, each of sample size 50, of $(X_1,X_2,\ldots,X_p)'$, we choose all $a_i$'s to be 1 and $\lambda_0$ to be 1, and, solving for $\lambda_i$ in terms of $\rho$, we have $\lambda_i = \rho/(a_i\lambda_0) = \rho$. Separate simulations are done for $\rho = 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90$ and $0.95$. Results for the bias and MSE are presented in Table 5.1 and Table 5.2, respectively.

Bias02 and Mse02 represent the bias and MSE of $\hat\lambda_0 = 1/\bar x_0$, where $\bar x_0 = \frac{1}{n}\sum_{i=1}^{n}x_{0i}$, if the latent unobservable values $x_{01},\ldots,x_{0n}$ were actually known. It is important to point out that $\hat\lambda_0$ is not observable. However, if these values were observable, then $\hat\lambda_0$ would be the maximum likelihood estimator of $\lambda_0$. Therefore, the performance of $\hat\lambda_0$ serves as a good benchmark against which to compare all of the other estimators described in this chapter, which are based on approximations. More precisely, if we denote by $x_{\min,p}$ the minimum of $x_1/a_1$ up to $x_p/a_p$ for $p=2,\ldots,5$, then $\mathrm{Bias02} = 1/\bar x_0-\lambda_0$ and $\mathrm{Bias1}p = 1/\bar x_{\min,p}-\lambda_0$, respectively, and Mse02 and Mse1$p$ are the corresponding mean squared errors. The columns $x_0$ and xmin report the averages of the latent $x_0$ values and of the corresponding minima; xdiff represents the difference between these estimates and the true values of $x_0$, and xdiffsq represents the mean squared difference.

rho  | Bias02    | Bias12   | Bias13   | Bias14   | Bias15   | x0      | xmin    | xdiff
0.05 | -0.011197 | -0.574   | -0.39937 | -0.28855 | -0.17101 | 0.98249 | 1.23238 | 0.24989
0.10 | 0.009905  | -0.421   | -0.23330 | -0.12258 | -0.06381 | 0.99339 | 1.08727 | 0.09389
0.20 | 0.010300  | -0.24054 | -0.08718 | -0.05341 | -0.01757 | 1.01166 | 1.03754 | 0.02588
0.30 | 0.038332  | -0.12648 | -0.01558 | -0.01427 | -0.00735 | 1.02107 | 1.02898 | 0.00791
0.40 | 0.032919  | -0.06926 | -0.01205 | 0.01867  | 0.00780  | 1.00860 | 1.01026 | 0.00166
0.50 | 0.014231  | -0.04644 | 0.02828  | 0.02922  | 0.01344  | 1.00713 | 1.00778 | 0.00065
0.60 | 0.002170  | -0.03026 | -0.01516 | 0.00538  | 0.00991  | 1.01039 | 1.01066 | 0.00027
0.70 | 0.017750  | 0.00292  | 0.02846  | 0.03072  | 0.03005  | 0.99122 | 0.99122 | 0.00001
0.80 | 0.003247  | -0.00332 | -0.01022 | 0.03713  | 0.00960  | 1.01068 | 1.01068 | 0.00000
0.90 | 0.0223014 | 0.02106  | 0.01377  | 0.00629  | 0.02209  | 0.99810 | 0.99810 | 0.00000
0.95 | -0.009249 | -0.00944 | 0.03343  | 0.00253  | 0.00857  | 1.00892 | 1.00892 | 0.00000

Table 5.1: Bias and estimation of $x_0$ for different correlations

From Table 5.1, as the number of variates increases, the bias decreases in magnitude. That is something we could expect, since the prediction of $x_0$ becomes more accurate with a higher number of variates; in fact, each variate gives partial information about $x_0$. Also, as $\rho$ increases, the estimate of $x_0$ becomes more efficient. It is also observable that the bias becomes satisfactory at higher correlations for the 2, 3, 4 and 5 variate cases.
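The dissertation's simulations were run in SAS; the sketch below is a reduced re-implementation in Python of the design just described (all $a_i = 1$, $\lambda_0 = 1$, $\lambda_i = \rho$). The seed, replication count, and the way the two estimators are coded are our own illustrative choices.

```python
import numpy as np

# Sketch of the simulation design described above: benchmark estimator 1/mean(x0)
# (latent x0 observed) versus the feasible estimator 1/mean(min_i x_i / a_i).
rng = np.random.default_rng(42)

def one_rep(rho, p_dim, n=50, lam0=1.0):
    x0 = rng.exponential(1 / lam0, n)
    z = np.where(rng.random((n, p_dim)) < rho, 0.0, rng.exponential(1 / rho, (n, p_dim)))
    x = x0[:, None] + z                        # a_i = 1 for every component
    lam0_hat_bench = 1 / x0.mean()             # benchmark: latent x0 observed
    lam0_hat_min = 1 / x.min(axis=1).mean()    # feasible: based on min_i x_i / a_i
    return lam0_hat_bench, lam0_hat_min

for rho in (0.05, 0.5, 0.95):
    reps = np.array([one_rep(rho, p_dim=5) for _ in range(2000)])
    print(rho, reps.mean(axis=0) - 1.0)        # bias of the two estimators
```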
We also present the mean squared errors of the estimates in Table 5.2.

rho  | Mse02    | Mse12   | Mse13   | Mse14    | Mse15    | xdiffsq
0.05 | 0.018739 | 0.33305 | 0.16586 | 0.090732 | 0.044181 | 0.45044
0.10 | 0.019985 | 0.18484 | 0.06545 | 0.030232 | 0.020013 | 0.12077
0.20 | 0.018835 | 0.07029 | 0.02061 | 0.016686 | 0.019382 | 0.02934
0.30 | 0.020357 | 0.02646 | 0.01864 | 0.021683 | 0.022462 | 0.00687
0.40 | 0.020278 | 0.02136 | 0.02126 | 0.020382 | 0.017916 | 0.00080
0.50 | 0.019973 | 0.01946 | 0.02365 | 0.027534 | 0.022398 | 0.00022
0.60 | 0.019373 | 0.01839 | 0.01619 | 0.016020 | 0.021058 | 0.00026
0.70 | 0.031892 | 0.02974 | 0.01811 | 0.029901 | 0.023984 | 0.00000
0.80 | 0.017505 | 0.01730 | 0.02042 | 0.029010 | 0.020515 | 0.00000
0.90 | 0.018066 | 0.01788 | 0.01857 | 0.019959 | 0.022949 | 0.00000
0.95 | 0.015202 | 0.01517 | 0.02429 | 0.027867 | 0.018026 | 0.00000

Table 5.2: MSE of the $x_0$ estimates for different correlations

These estimated values show the effectiveness of the proposed estimation techniques. As we see from Table 5.2, the differences appear to be consistently small, although high values of the correlation do appear to give lower MSEs. The algorithm for the proposed estimation is not difficult to implement, although it may be time consuming. We implemented it using the SAS program.

Chapter 6
Conclusion

In this dissertation, we defined and characterized a new multivariate generalized location-scale family of gamma distributions with potential applications in survival and reliability modeling. This family possesses three-parameter gamma marginals (in most cases), and it contains absolutely continuous classes as well as Marshall-Olkin-type distributions with a positive probability mass on a set of measure zero. Interestingly, the variables making up the multivariate vector were made linearly related, indirectly, through a collection of latent random variables, and the multivariate distribution is not necessarily restricted to those with gamma marginal distributions. Maximum likelihood estimators and estimators based on the EM algorithm were proposed for the unknown parameters and, in addition, methods were given to estimate the latent terms in the model. We have shown that this distribution is shift invariant, closed under finite independent convolutions, and closed under scale transformations. We have also shown that our model has as special cases the models proposed by Mathai and Moschopoulos (1991) [63], Iyer et al. (2002) [42], and Iyer et al. (2004) [43], and we have corrected some of their omissions.

The possible implications of this work are considerable, since it takes into account the non-i.i.d. properties of real data. Allowing the $a_i$'s and their structure to be unknown will add many more applications of the model. Further investigation of the shape parameters will allow the characterization of more distributions. The estimation techniques can be refined, and that will improve the statistical decision approach. Extending the model to include censoring is an attractive option for several applications.

Bibliography

[1] Abramowitz, M. and Stegun, I.A. (1972), Handbook of Mathematical Functions, Selected Government Publications, Chapter 6, New York, Dover.
[2] Arnold, B.C. (1967), A note on multivariate distributions with specified marginals, Journal of the American Statistical Association, Vol. 62, pp. 1460-1461.
[3] Au, Chi and Tam, Judy (1999), Transforming Variables Using the Dirac Generalized Function, The American Statistician, Vol. 53, 3, pp. 270-273.
[4] Balakrishnan, N. and Wang, J. (2000), Simple efficient estimation of the three-parameter gamma distribution, Journal of Statistical Planning and Inference, 85, pp. 115-126.
[5] Barlow, R.E. and Proschan, F.
(1965), Mathematical Theory of Reliability, John Wiley & Sons, Library of Congress.
[6] Billingsley, Patrick (1986), Probability and Measure, Wiley Series in Probability and Mathematical Statistics, 2nd edition.
[7] Borovkov, A.A. and Utev, S.A. (1984), On an inequality and a related characterization of the normal distribution, Theory of Prob. Appl., Vol. 28, pp. 219-228.
[8] Bowman, K.O. and Shenton, L.R. (1988), Properties of estimators of the gamma distribution, New York: Marcel Dekker, Taylor Series.
[9] Bowman, K.O. and Shenton, L.R. (2002), Problems with Maximum Likelihood Estimation and the 3-Parameter Gamma Distribution, J. Statist. Comput. Simul., Vol. 72(5), pp. 391-401.
[10] Brockwell, P.J. and Davis, R.A. (1996), Introduction to Time Series and Forecasting, Springer Texts in Statistics.
[11] Carpenter, M. and Diawara, N. (2006), A Multivariate Gamma Distribution and Its Characterizations, Accepted pending revision, J. Mathematical and Management Sciences.
[12] Carpenter, M., Diawara, N. and Han, Yi (2006), A New Class of Bivariate Weibull Survival Distributions, J. Mathematical and Management Sciences, Vol. 26 (1 & 2), pp. 164-184.
[13] Casella, G. and Berger, R.L. (1990), Statistical Inference, Duxbury Press, 2nd edition.
[14] Chen, L.H.Y. (1982), An inequality for the multivariate normal distribution, J. Multivariate Anal., Vol. 12, pp. 306-315.
[15] Chen, L.H.Y. (1987), Characterization of probability distributions by Poincare-type inequalities, Ann. Inst. H. Poincare Sect. B (N.S.), Vol. 23, pp. 91-110.
[16] Cohen, A.C. and Whitten, B.J. (1986), Modified moment estimation for the three-parameter gamma distribution, J. Quality Technol., 18, pp. 53-62.
[17] Cohen, A.C. and Whitten, B.J. (1988), Parameter Estimation in Reliability and Life Span Models, Dekker, New York.
[18] Cohen-Tannoudji, C., Diu, B., and Laloë, F. (1977), Quantum Mechanics, Vol. 2, John Wiley & Sons.
[19] Coles, S.G. and Tawn, J.A. (1994), Statistical Methods for Multivariate Extremes: An Application to Structural Design, Appl. Statist., Vol. 43, pp. 1-48.
[20] Cox, D.R. (1972), Regression Models and Life Tables, J. Royal Statist. Society, Series B, Vol. 34, pp. 187-220.
[21] Csorgo, S. and Welsh, A.H. (1989), Testing for Exponential and Marshall-Olkin Distributions, Journal of Statistical Planning and Inference, 23, pp. 287-300.
[22] Dempster, A., Laird, N. and Rubin, D. (1977), Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, Vol. 39 (1), pp. 1-38.
[23] Dudewicz, E.J. and Mishra, S. (1988), Modern Mathematical Statistics, Wiley Series in Probability and Mathematical Statistics.
[24] Fang, K. and Zhang, Y. (1990), Generalized Multivariate Analysis, Springer.
[25] Feller, W. (1971), An Introduction to Probability Theory and Its Applications, Second Edition, Wiley Series in Probability and Mathematical Statistics.
[26] Folland, G.B. (1999), Real Analysis: Modern Techniques and Their Applications, Wiley-Interscience Series.
[27] Friedl, H. and Kauermann, G. (2000), Standard errors for EM estimates in generalized linear models with random effects, Biometrics, Vol. 56, pp. 761-767.
[28] Furman, E. and Landsman, Z. (2005), Risk capital decomposition for a multivariate dependent gamma portfolio, Insurance: Mathematics and Economics, Vol. 37, issue 3, pp. 635-649.
[29] Gaver, D.P. and Lewis, P.A.W. (1980), First-order autoregressive gamma sequences and point processes, Adv. Appl. Prob., 12, pp. 727-745.
[30] Ghosh, S.K.
and Gelfand, A.E. (1998), Latent Waiting Time Models for Bivariate Event Times with Censoring, Sankhya, Vol. 60, Series A, pp. 31-47.
[31] Grove, D.A. and Coddington, P.D. (2005), Analytic Models of Probability Distributions for MPI Point-to-Point Communication Times on Distributed Memory Parallel Computers, Proc. 6th International Conference on Algorithms and Architectures for Parallel Processing, Melbourne, Australia.
[32] Hanagal, D.D. (1996), A multivariate Weibull distribution, Economic Quality Control, 11, pp. 193-200.
[33] He, W. and Lawless, J.F. (2005), Bivariate location-scale models for regression analysis, with applications to lifetime data, J. R. Statist. Soc. B, Vol. 67, part 1, pp. 63-78.
[34] Henderson, R. and Shimakura, S. (2003), A Serially Correlated Gamma Frailty Model for Longitudinal Count Data, Biometrika, 90, 2, pp. 355-366.
[35] Hougaard, Philip (1986), A class of multivariate failure time distributions, Biometrika, 73, 3, pp. 671-678.
[36] Hougaard, Philip (2000), Analysis of Multivariate Survival Data, Springer.
[37] Hougaard, Philip (2001), The normality assumption: Pro and Contra, Biometrika, 73, 3, pp. 671-678.
[38] Hunter, J.K. and Nachtergaele, B. (2001), Applied Analysis, River Edge, NJ, World Scientific.
[39] Hwang, T.Y. and Hu, C.Y. (1999), On a characterization of the gamma distribution: the independence of the sample mean and the sample coefficient of variation, Ann. Inst. Statist. Math., 51, pp. 749-753.
[40] Hwang, T.Y. and Hu, C.Y. (2000), On some characterizations of population distributions, Taiwanese Journal of Mathematics, Vol. 4, No. 3, pp. 427-437.
[41] Hwang, T.Y. and Huang, P.H. (2002), On new moment estimation of parameters of the gamma distribution using its characterization, Ann. Inst. Stat. Math., 54, pp. 840-847.
[42] Iyer, Srikanth K., Manjunath, D. and Manivasakan, R. (2002), Bivariate Exponential Distributions Using Linear Structures, Sankhya, Vol. 64, Series A, pp. 156-166.
[43] Iyer, Srikanth K. and Manjunath, D. (2004), Correlated Bivariate Sequences for Queueing and Reliability Applications, Communications in Statistics, Vol. 33, pp. 331-350.
[44] Joe, H. (1997), Multivariate Models and Dependence Concepts, Chapman & Hall.
[45] Johnson, N. and Kotz, S. (1970), Continuous Univariate Distributions-1, Wiley, New York.
[46] Johnson, N., Kotz, S. and Kemp, A. (1993), Univariate Discrete Distributions, 2nd edition, Wiley Series in Probability and Mathematical Statistics.
[47] Johnson, N., Kotz, S. and Balakrishnan, N. (1997), Discrete Multivariate Distributions, Wiley Series in Probability and Statistics.
[48] Johnson, R.A. and Wichern, D.W. (1998), Applied Multivariate Statistical Analysis, Prentice Hall, 4th edition.
[49] Jorgensen, B. (1987), Exponential Dispersion Models, J. R. Statist. Soc. B, Vol. 49, No. 2, pp. 127-162.
[50] Kallenberg, O. (1986), Random Measures, 4th Edition, Akademie-Verlag Berlin.
[51] Kallenberg, O. (2002), Foundations of Modern Probability, 2nd Edition, Springer.
[52] Khuri, Andre (2004), Applications of Dirac's Delta Function in Statistics, Int. J. Math. Educ. Sci. Technol., 35, 2, pp. 185-195.
[53] Kotz, S., Balakrishnan, N. and Johnson, N. (2000), Continuous Multivariate Distributions, Volume 1, Wiley Series in Probability and Statistics.
[54] Krzanowski, W.J. and Marriot, F.H.C. (1994), Multivariate Analysis: Part 1: Distributions, Ordination and Inference, Edward Arnold, Library of Congress.
[55] Lawrence, A.J.
(1978), Some Autoregressive Models for Point Processes, Colloquia Mathematica Societatis János Bolyai.
[56] Lawless, J.F. (2003), Statistical Models and Methods for Lifetime Data, 2nd edition, Wiley Series in Probability and Statistics.
[57] Lee, Larry (1979), Multivariate Distributions Having Weibull Properties, Journal of Multivariate Analysis, 9, pp. 267-277.
[58] Lehmann, E.L. (1986), Testing Statistical Hypotheses, 2nd edition, New York, Chapman & Hall.
[59] Lu, Jye-Chyi and Bhattacharyya, G.K. (1990), Some New Constructions of Bivariate Weibull Models, Ann. Inst. Statist. Math., 43(3), pp. 543-559.
[60] Mardia, K.V. (1970), Measures of multivariate skewness and kurtosis with applications, Biometrika, Vol. 57, pp. 519-530.
[61] Marshall, A.W. and Olkin, I. (1967), A Multivariate Exponential Distribution, J. Amer. Stat. Assoc., 63, pp. 30-44.
[62] Marshall, A.W. and Olkin, I. (1983), Domains of Attraction of Multivariate Extreme Value Distributions, The Annals of Probability, Vol. 11, pp. 168-177.
[63] Mathai, A.M. and Moschopoulos, P.G. (1991), On a Multivariate Gamma, Journal of Multivariate Analysis, 39, pp. 135-153.
[64] Mathai, A.M. and Moschopoulos, P.G. (1992), A Form of Multivariate Gamma Distribution, Annals of the Institute of Statistical Mathematics, 44, pp. 97-106.
[65] Mathai, A.M. and Saxena, R.K. (1978), The H-function with Applications in Statistics and Other Disciplines, Wiley, New York.
[66] McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models, Chapman and Hall: London.
[67] McLachlan, G.L. and Peel, D. (2000), Finite Mixture Models, New York, Wiley.
[68] Minhajuddin, A.T.M., Harris, I.R., and Schucany, W.R. (2003), Simulating Multivariate Distributions with Specific Correlations, J. Statist. Comp. Simul., Vol. 74, pp. 599-607.
[69] Moran, P.A.P. (1969), Statistical Inference with Bivariate Gamma Distributions, Biometrika, Vol. 56, No. 3, pp. 627-634.
[70] Moschopoulos, P.G. (1985), The Distribution of the Sum of Independent Gamma Random Variables, Annals of the Institute of Statistical Mathematics, Vol. 37, pp. 541-544.
[71] Moschopoulos, P.G. and Staniswalis, J.G. (1994), Estimation Given Conditionals from an Exponential Family, The American Statistician, Vol. 48, No. 4, pp. 271-275.
[72] Nadarajah, S. and Kotz, S. (2005), On the Linear Combinations of Exponential and Gamma Random Variables, Entropy, Vol. 7(2), pp. 161-171.
[73] Pazman, A. and Pronzato, L. (1996), A Dirac-function method for densities of nonlinear statistics and for marginal densities in nonlinear regression, Statistics and Probability Letters, Vol. 26, pp. 159-167.
[74] Rudin, W. (1976), Principles of Mathematical Analysis, McGraw-Hill.
[75] Shilov, G.E. and Gurevich, B.L. (1977), Integral, Measure, and Derivative: A Unified Approach, New York, Dover Publications.
[76] Sivazlian, B.D. (1981), On a Multivariate Extension of the Gamma and Beta Distributions, SIAM Journal on Applied Mathematics, Vol. 41, No. 2, pp. 205-209.
[77] Tanner, M.A. (1996), Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd edition, New York: Springer.
[78] Vaupel, J.W., Manton, K.G., and Stallard, E. (1979), The impact of heterogeneity in individual frailty on the dynamics of mortality, Demography, 16(3), pp. 439-454.
[79] Walker, S.G. and Stephens, D.A. (1999), A Multivariate Family of Distributions on $(0,\infty)^p$, Biometrika, Vol. 86, No. 3, pp. 703-709.
[80] Williamson, J.H.
(1962), Lebesgue Integration, New York, Holt, Rinehart and Winston, Athena Series.