New Classes of Multivariate Gamma Survival and Reliability Models

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Norou Dini Diawara

Certificate of Approval:

Asheber Abebe, Assistant Professor, Mathematics and Statistics
Mark Carpenter, Chair, Associate Professor, Mathematics and Statistics
Nedret Billor, Associate Professor, Mathematics and Statistics
Olav Kallenberg, Professor, Mathematics and Statistics
Jerzy Szulga, Professor, Mathematics and Statistics
Stephen L. McFarland, Dean, Graduate School

New Classes of Multivariate Gamma Survival and Reliability Models

Norou Dini Diawara

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 7, 2006

New Classes of Multivariate Gamma Survival and Reliability Models

Norou Diawara

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
August 7, 2006, Date of Graduation

Vita

Norou Dini Diawara, son of Moustapha Bassirou Diawara and Khady Diatou Dieng, was born in Ziguinchor, Sénégal on November 15, 1971. He attended public school in Dakar, Senegal. After graduating from Lamine Gueye's High School, he entered Cheikh Anta Diop University (U.C.A.D.) in September 1991, from which he received his B.S. in Mathematics and Physics in 1993. He started his graduate studies in the Department of Mathematics at U.C.A.D. With the help of his family and his uncle Alioune Diop, he went on to complete his Licence and Maîtrise at the University of Le Havre in Le Havre, France in 1996. He continued his studies at the University of South Alabama in Mobile, AL, USA, where he earned his M.S. in Mathematics and Statistics in 1999. After two years at the University of Georgia in Athens, GA, as a graduate teaching assistant, and a period working for the Canadian Consulate General in Atlanta, GA, he came to Auburn University in August 2002, where he completed his Ph.D. in Mathematics and Statistics in 2006.

Dissertation Abstract

New Classes of Multivariate Gamma Survival and Reliability Models

Norou Dini Diawara

Doctor of Philosophy, August 7, 2006
(M.S., University of South Alabama, Mobile, AL, 1999)
(M.S., Le Havre University, Le Havre, France, 1996)

121 Typed Pages

Directed by Mark Carpenter

Multivariate modeling and analysis based on the multivariate normal distribution is well established and widely used. However, when the marginal distributions have only positive support and are positively skewed, as in time-to-event models, the multivariate normal theory and the resulting approximations often fail. Accordingly, over the last fifty years, thousands of papers have been published suggesting ways of generating families of positive-support multivariate distributions, such as the gamma, Weibull and exponential. As evidenced by recent literature, this quest is still rigorously pursued today. In this dissertation, we provide a large and flexible class of multivariate gamma distributions that contains both absolutely continuous and discontinuous distributions on the positive hypercube. All of these models are applicable to reliability and survival modeling.
Acknowledgments

I would like to thank my advisor, Dr. Carpenter, for his academic guidance, generous advice, sharp comments and support during our many discussions. I am very appreciative of his encouragement and patience when things seemed fuzzy. Without his support, this thesis would not have been completed. It has been a privilege working with him.

I also wish to express my sincere gratitude to the members of my committee: Dr. Ash Abebe, Dr. Nedret Billor, Dr. Olav Kallenberg and Dr. Jerzy Szulga, for their time, patience and suggestions that led me to improve this work. I have truly enjoyed working with each one of them, and I really appreciate their full support. Special thanks to Dr. Olav Kallenberg for the knowledge I have gained through the seminars and the years of research we had started. I thank him for his continuous support since I came to Auburn University, his kindness, time, and collaboration. Thanks to the faculty and staff of our Department, and to my fellow students, for the great experience at Auburn.

I am deeply indebted to my family, who have always supported me, and especially to my parents and my siblings Papa Amadou, Amadou Belal, Ndeye Maguette, and Ndeye Fatou, for their love, guidance and vision. Special thanks to my uncle Alioune Diop and my aunt Ndiaya Mbengue for their kindness, support and encouragement. I am grateful to my wife, Oumy Ba, for her endless love, friendship, support and encouragement. I thank her for her patience during all the stages when we had to spend time apart, and for taking care of our baby Sidy Manoumbé on her own. She is an inspiration to me. I dedicate this dissertation to my beloved brother Sidy Manoumbé, and to all those who truly believed in me, encouraged me and supported me during this long path.

Style manual or journal used: Transactions of the American Mathematical Society (together with the style known as "auphd"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: the document preparation package TeX (specifically LaTeX) together with the departmental style file auphd.sty.

Table of Contents

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Research Question
  1.3 The Univariate Gamma Distribution
    1.3.1 Properties
    1.3.2 Estimation of Parameters
  1.4 Literature Review
  1.5 Marginal, Conditional and Joint Distributions
  1.6 Thesis Outline

2 Preliminaries
  2.1 The Laplace Transform
  2.2 The Dirac Delta Function
  2.3 Infinite Divisibility
  2.4 Stable Distributions

3 The Multivariate Gamma
  3.1 Properties and Characterization
  3.2 The Fatal Shock Model
  3.3 The Multivariate Gamma
  3.4 The Autoregressive Model
  3.5 Summary

4 The Continuous Case Model
  4.1 Definition of Model
  4.2 Properties
  4.3 Estimation Procedures

5 The Multivariate Exponential
  5.1 The Multivariate Exponential
  5.2 The Bivariate Exponential Model and MLE
  5.3 Simulation Example

6 Conclusion

Bibliography

List of Figures
  1.1 Graph of gamma pdf
  1.2 Graph of gamma cdf
  2.1 Graph of bivariate gamma distribution

Abbreviations

iid: independent and identically distributed
EM: expectation maximization
LST: Laplace-Stieltjes transform
MCMC: Markov chain Monte Carlo
MLE: maximum likelihood estimator
MSE: mean square error
cdf: cumulative distribution function
pdf: probability density function (or mass function in the discrete case)
pp.: pages
prob.: probability
et al.: and others
a.s.: almost surely
i.e.: that is
r.s.: random sample
r.v.: random variable
$a \wedge b$: minimum of $a$ and $b$
$a \vee b$: maximum of $a$ and $b$

Notations

$\mathbb{R}$: set of real numbers
$\mathbb{R}_+$: set of positive real numbers
$\mathbb{Z} = \{0, \pm 1, \ldots\}$: set of integers
$\mathbb{N} = \{1, 2, \ldots\}$: set of positive integers
$X$: random variable
$p$: dimension of the multivariate distribution
$[a]$: the integer part of a number $a$
$\mathcal{P}$: the power set
$|A|$: cardinality of a set $A$
$A(k)$: $k$th element of the set $A$
$A^c(k)$: $k$th element of the set $A^c$, the complement of $A$
$E(X)$, $\mathrm{Var}(X)$: expectation and variance of the r.v. $X$
$Ga(\nu, \lambda, \alpha)$: gamma distribution with location $\nu$, scale $\lambda$, and shape $\alpha$
$\Gamma(\alpha)$: gamma function with parameter $\alpha$
$\Theta$: parameter set of the full model
$\hat{\Theta}$: maximum likelihood estimate of the parameter set of the full model
$F_{X|Y}$ and $f_{X|Y}$: conditional cdf and pdf, respectively
$X \sim f(x)$: the random variable $X$ follows the density $f(x)$
$\mathcal{L}_X$: the Laplace transform of the random variable $X$
$\propto$: proportional to

Chapter 1
Introduction

1.1 Motivation

Multivariate survival or reliability analysis refers to the multivariate modeling of the different times to multiple events recorded on the same experimental unit. In such cases, when more than one time-to-event is observed for each individual under study, it is reasonable to assume that the event times are dependent. However, it is common practice to assume "working independence" (see Lawless (2003) [56], for example), where each event is analyzed separately while ignoring this dependence, effectively treating the events as independent. This seemingly pragmatic approach can lead to confusing interpretations and incorrect results. Carpenter et al. (2006) [12] showed that ignoring dependence between variables comes at a great cost in terms of bias, mean square error (MSE), and other objective criteria in the bivariate exponential case. This is quite intuitive, because ignoring dependence amounts to maximizing the wrong, or misspecified, likelihood function.
That is, the maximum likelihood estimators (MLEs) of the parameters based on the marginal likelihoods are not necessarily the same as the MLEs derived from the joint likelihood. Another problem with ignoring dependence is that such an analysis would require one to adjust for the multiple testing problem. In clinical trials, for example, the Food and Drug Administration routinely requires conservative adjustments for multiple endpoints, such as a primary endpoint of time-to-death due to a brain tumor and a secondary endpoint of time-to-recurrence of the tumor after surgery.

Because the marginal distributions have positive support and are often strongly positively skewed, treating the observations as multivariate normal can lead to incorrect results and/or convergence issues. For example, the multivariate mixture model is used in many applications for unsupervised classification. Hougaard (2001) [37] advises caution in using the multivariate normal when the underlying distribution does not conform to it. Also, these methods can be computationally intensive (see Borovkov and Utev (1984) [7], Chen (1982) [14], McLachlan and Peel (2000) [67], Johnson and Wichern (1998) [48]), even when the underlying latent populations are multivariate normal. In fact, McLachlan and Peel (2000) [67] devote a whole chapter to mixture models for failure time data.

Therefore, over the last fifty years there has been great interest in deriving and characterizing multivariate distributions with positive support and skewed marginals. A few of the many widely known references are Arnold (1967) [2], Barlow and Proschan (1965) [5], Fang (1990) [24], Furman (2005) [28], Ghosh and Gelfand (1998) [30], Hanagal (1996) [32], Hougaard (2000) [36], Hougaard (2001) [37], Joe (1997) [44], Kotz et al. (2000) [53], Lee (1979) [57], Lu and Bhattacharyya (1990) [59], Marshall and Olkin (1967) [61], Mathai and Moschopoulos (1991) [63], Mathai and Moschopoulos (1992) [64], Moran (1969) [69], Sivazlian (1981) [76], and Walker and Stephens (1999) [79].

In this dissertation, we provide a large and flexible class of multivariate gamma distributions that contains both absolutely continuous and discontinuous distributions on the positive hypercube. All of these models are applicable to reliability and survival modeling. Also, since this theoretical research grew out of a real-world application, we provide methods of parameter estimation for a variety of applications.

1.2 Research Question

Our focus is on generating multivariate survival and reliability models with specified marginal distributions of the same family, i.e., exponential, Weibull or gamma. One common use of these models, in survival analysis, is to examine the survival times of patients in medical or biological settings, based on factors such as treatment, age and education, in order to predict the time to death or the time to certain events. In the industrial engineering setting, the study of the time to failure of devices, machines or components in a system is referred to as reliability analysis. Although the applications are very different, these models can be used in both the survival and the reliability context.

Our approach to creating a dependence structure between multiple events is to indirectly associate the random variables (r.v.'s) linearly with latent, unobservable r.v.'s.
Specifically, let $p$ be some fixed integer, and let $X_0, X_1, \ldots, X_p$ be r.v.'s with specified distributions of the same family, say $f(x \mid \Theta)$, that are linked through the mutually independent r.v.'s $Z_1, Z_2, \ldots, Z_p$ in the following way:
$$X_i = a_i X_0 + Z_i, \qquad (1.1)$$
where the $a_i$'s are nonnegative constants and $Z_i$ is independent of $X_0$ for $i = 1, 2, \ldots, p$. Note that $X_0, Z_1, Z_2, \ldots, Z_p$ are latent, unobservable r.v.'s that generate the observable multivariate vector $X = (X_1, X_2, \ldots, X_p)'$.

Much of the research in this dissertation concerns the derivation and characterization of the classes of distributions of $Z_1, Z_2, \ldots, Z_p$ that produce the specified marginal distributions of $X_1, X_2, \ldots, X_p$, respectively. We derive and study estimators of the parameters under the resulting dependence structure. Interestingly, if we treat the latent variables $X_0, Z_1, \ldots, Z_p$ as missing observations, then methods based on conditional expectation and/or expectation-maximization (EM) algorithms have great potential for estimating the parameters associated with the latent population. We develop such approaches, as well as missing value substitution techniques, to improve our estimates of the parameters of the marginal distributions and of the parameters associated with the latent terms.

Since this dissertation focuses on a new multivariate gamma distribution, in the sections that follow we review the univariate gamma and the other multivariate gammas, in their various forms, found in the literature and in practice.

1.3 The Univariate Gamma Distribution

Positively skewed distributions on a positive support occur quite often in practical applications such as reliability and survival analysis. Lawless (2003) [56] gives examples of the most common models and their applications. The gamma family of distributions is right tailed, and is a natural choice for positive-valued distributions with heavy tails. It is widely used in reliability and survival analysis. For example, the gamma distribution is used in statistical studies of times between earthquakes, so estimation of its parameters is very important.

The gamma distribution belongs to the exponential family of distributions on the real line with respect to Lebesgue measure; see McCullagh and Nelder (1989) [66], for example. This family plays an important role in many areas of probability and statistics through the normal, exponential, gamma, Poisson, binomial and many other distributions. The exponential family is a large class that allows us to handle a wide range of distributions, and many of its properties are similar to those of the normal distribution. Lehmann (1986) [58] gives results for this class of distributions. McCullagh and Nelder (1989) [66] give examples of models where the gamma distribution is used in a regression structure under the generalized linear model (GLM). The GLM provides a unified theory for many models in practical statistics, including models for gamma responses and survival data.

We consider the univariate three-parameter gamma distribution $X$, denoted here as $Ga(\nu, \lambda, \alpha)$. If $\nu = 0$, we denote the two-parameter gamma distribution $X$ as $Ga(\lambda, \alpha)$.

Definition 1.1 The three-parameter gamma distribution (see Cohen and Whitten (1986) [16] and Billingsley (1986) [6] for examples) is defined by the density function $f_X(\cdot; \nu, \lambda, \alpha) : t \in (\nu, \infty) \mapsto f_X(t; \nu, \lambda, \alpha) \in \mathbb{R}_+$ of the three parameters $\nu \in \mathbb{R}$, $\lambda > 0$ and $\alpha > 0$, given as
$$f_X(t; \nu, \lambda, \alpha) = \frac{\lambda^\alpha}{\Gamma(\alpha)} (t - \nu)^{\alpha - 1} e^{-\lambda (t - \nu)} I_{[\nu, \infty)}(t), \qquad (1.2)$$
where $\nu \in \mathbb{R}$, $\lambda > 0$ and $\alpha > 0$ are the location, scale and shape parameters, respectively. Here $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$ is the gamma function.
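For a quick numerical check of (1.2), note that $Ga(\nu, \lambda, \alpha)$ corresponds, in the rate parametrization used here, to standard software routines with shape $\alpha$, location $\nu$ and scale $1/\lambda$. The sketch below is a minimal illustration assuming SciPy is available; the parameter values are arbitrary and not taken from the dissertation.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

# Illustrative parameter values (not from the dissertation): nu, lam, alpha
nu, lam, alpha = 1.0, 0.5, 3.0

def gamma3_pdf(t, nu, lam, alpha):
    """Three-parameter gamma density (1.2), written out directly."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    m = t >= nu
    out[m] = lam**alpha / gamma_fn(alpha) * (t[m] - nu)**(alpha - 1) * np.exp(-lam * (t[m] - nu))
    return out

t = np.linspace(nu, nu + 25, 6)
direct = gamma3_pdf(t, nu, lam, alpha)
# Same density via SciPy's parametrization: shape=alpha, loc=nu, scale=1/lam
via_scipy = gamma.pdf(t, a=alpha, loc=nu, scale=1.0 / lam)
print(np.allclose(direct, via_scipy))  # expect True
```

The mapping scale = $1/\lambda$ is the main point: the rate parametrization of (1.2) differs from the scale parametrization used by most software.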
The three-parameter gamma distribution is also known as the Type III distribution in Pearson's system of distributions. Rather than anchoring the distribution at the origin, it is sometimes more appropriate to shift it along the real axis, allowing the domain to be $[\nu, \infty)$, as is done in (1.2). The scale parameter $\lambda$ is also called the rate, or inverse scale under the inverse parametrization. The smaller the value of $\lambda$, the wider the spread, and vice versa; for the normal distribution, by contrast, the scale parameter is the standard deviation. When $0 < \alpha < 1$, the density has a vertical asymptote at $\nu$. When $\alpha > 1$, the density has a mode at $\frac{\alpha - 1}{\lambda} + \nu$ and a heavy right tail. See Figure 1.1 and Figure 1.2 below for the pdf and cdf, respectively.

The gamma function $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$, with $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$, is the normalizing constant of the distribution and satisfies the relation $\alpha \Gamma(\alpha) = \Gamma(\alpha + 1)$. The incomplete gamma function is defined as $I(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_0^t x^{\alpha-1} e^{-x}\,dx$, as in Lawless (2003) [56], and $\Gamma(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_t^\infty x^{\alpha-1} e^{-x}\,dx$ is called the complementary incomplete gamma function.

Several special cases of the gamma, such as the Erlang and the exponential, exist and are widely used. The gamma distribution of (1.2) with location at the origin and shape parameter equal to one is the exponential distribution. Most statistical packages, such as SAS and S-Plus, generate exponential variates. Its pdf is
$$f(t) = \lambda e^{-\lambda t}, \quad \text{where } \lambda > 0 \text{ and } t \ge 0. \qquad (1.3)$$
The displaced exponential is the exponential shifted by some value $\nu$, denoted $\exp(\nu, \lambda)$ and defined as
$$f(t) = \lambda e^{-\lambda (t - \nu)}, \quad \text{where } \lambda > 0 \text{ and } t \ge \nu. \qquad (1.4)$$

Figure 1.1: Graph of the gamma pdf ($\lambda = 1/2$; $\alpha = 1, 3, 5$).

The Erlang distribution is the particular case of the gamma distribution in which the shape parameter $\alpha$ in (1.2) is a positive integer, denoted here as $n$. Its pdf is
$$f(t) = \frac{\lambda^n}{\Gamma(n)} (t - \nu)^{n-1} e^{-\lambda (t - \nu)}, \quad t \ge \nu, \text{ where } \lambda > 0 \text{ and } n \in \mathbb{N}. \qquad (1.5)$$
The Erlang distribution is often used to model waiting times in queueing systems, particularly in telephone traffic engineering. It is the probability distribution of the waiting time until the $n$th arrival in a one-dimensional Poisson process with intensity $\lambda$; the case $n = 1$ reduces to the exponential distribution.

Figure 1.2: Graph of the gamma cdf ($\lambda = 1/2$; $\alpha = 1, 3, 5$).

The exponential distribution is the distribution of the time between successive occurrences of events in a Poisson process. Another interpretation, due to Barlow and Proschan (1965) [5], explains the Erlang distribution as the waiting time until failure of a device subjected to random shocks: the device fails when exactly $\alpha$ shocks, $\alpha \in \mathbb{N}$, have occurred randomly over time at a Poisson rate with parameter $\lambda$. That is, the Erlang is the convolution of $\alpha$ exponential distributions.
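The shock interpretation above is easy to reproduce by simulation. The following minimal sketch (assuming NumPy and SciPy are available; the rate $\lambda = 0.5$ and the number of shocks are illustrative choices, not values from the dissertation) sums independent exponential interarrival times and checks that the waiting time until the third shock behaves like a $Ga(0, \lambda, 3)$ variate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, n_shocks = 0.5, 3          # illustrative rate and number of shocks
n_sim = 100_000

# Waiting time until the n-th shock = sum of n iid exponential interarrival times
interarrivals = rng.exponential(scale=1.0 / lam, size=(n_sim, n_shocks))
waiting_times = interarrivals.sum(axis=1)

# Compare with the Erlang/gamma distribution Ga(0, lam, n_shocks)
ks = stats.kstest(waiting_times, "gamma", args=(n_shocks, 0.0, 1.0 / lam))
print(waiting_times.mean(), n_shocks / lam)   # both should be close to 6
print(ks.pvalue)                              # should not be small: consistent with Ga(0, 0.5, 3)
```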
1.3.1 Properties

The class of gamma distributions, which includes the exponential, the Erlang and the chi-square distributions, is widely used in applied and theoretical statistics. Unlike the normal distribution, the gamma distribution is asymmetric and cannot be characterized by its mean and variance (the first and second moments) alone. In addition to the mean and variance, this distribution is characterized by its third and fourth order moments, known as skewness and kurtosis, and by higher moments; see Grove and Coddington (2005) [31], for example. The expected value and variance of the gamma distribution in (1.2) are
$$E X = \frac{\alpha}{\lambda} + \nu \quad \text{and} \quad \mathrm{Var}(X) = \frac{\alpha}{\lambda^2}, \qquad (1.6)$$
respectively. The probability that the event of interest has not occurred by time $x$, the survival or reliability function, is given by
$$S_X(x) = 1 - F_X(x) = 1 - I\big(\alpha, \lambda(x - \nu)\big), \qquad (1.7)$$
where $I(\alpha, t) = \frac{1}{\Gamma(\alpha)} \int_0^t u^{\alpha-1} e^{-u}\,du$. The instantaneous rate of occurrence of the event, also called the hazard function or failure rate function, is obtained from (1.7) and is defined as
$$h(x) = \frac{f(x)}{S(x)} = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, \frac{(x - \nu)^{\alpha-1}}{1 - I\big(\alpha, \lambda(x - \nu)\big)}\, e^{-\lambda(x - \nu)}, \qquad (1.8)$$
and the cumulative hazard rate is
$$H(x) = \int_\nu^x h(u)\,du = -\log S(x). \qquad (1.9)$$

The hazard function (1.8) of the gamma density (1.2) is increasing from $0$ to $\lambda$ when $\alpha > 1$, decreasing from $\infty$ to $\lambda$ when $\alpha < 1$, and constant when $\alpha = 1$ (the exponential case). Many authors have recognized the importance of modeling and studying the hazard function (1.8) or the cumulative hazard rate (1.9). For example, Cox (1972) [20] used the hazard function to estimate the effects of different covariates on the times to failure of a system through his famous proportional hazards model. Balakrishnan and Wang (2000) [4] showed that the estimation problem for the three-parameter gamma has always been a challenging and interesting one, in particular when the shape parameter $\alpha$ is small (less than 2.5). As the shape $\alpha \to \infty$, the gamma distribution in (1.2) approaches normality.

Another property of the gamma pdf is that it is closed under multiplication by a positive scalar. More precisely, if $X \sim Ga(\nu, \lambda, \alpha)$, then
$$Y = cX \sim Ga\Big(c\nu, \frac{\lambda}{c}, \alpha\Big) \qquad (1.10)$$
for all $c > 0$, and
$$f_Y(y) = f_X\Big(\frac{y}{c}\Big)\,\frac{1}{c} = \frac{(\lambda/c)^\alpha}{\Gamma(\alpha)}\, (y - c\nu)^{\alpha-1}\, e^{-\frac{\lambda}{c}(y - c\nu)}.$$

The Laplace transform of (1.2) is given by
$$\mathcal{L}_X(t) = e^{-\nu t} \Big(\frac{\lambda}{\lambda + t}\Big)^\alpha. \qquad (1.11)$$
Some advantages of the gamma distribution lie in the simple forms of its density (1.2) and of its Laplace transform (1.11), further discussed in Example 2.2. Another important property is that the Erlang-type distribution $Ga(\nu, \lambda, m)$ can be thought of as the sum of $m$ independent displaced exponential distributions as in (1.4), with parameters $\nu_i$ and $\lambda$ such that $\sum_{i=1}^m \nu_i = \nu$. More precisely, if $X_i \sim \exp(\nu_i, \lambda)$, $i = 1, 2, \ldots, m$, then $X = \sum_{i=1}^m X_i \sim Ga\big(\sum_{i=1}^m \nu_i, \lambda, m\big)$. Dudewicz and Mishra (1988) [23] and Billingsley (1986) [6] showed that a sum of independent gamma r.v.'s with a common scale parameter $\lambda$ is again a gamma r.v., with the location and shape parameters added. The class of distributions $\{Ga(\nu, \lambda, m),\ m \in \mathbb{N}\}$ is then additive, since its sum operation is well defined. This is analogous to the informally defined multiplicative class of distributions that have a well-defined multiplication operation.

1.3.2 Estimation of Parameters

Since the gamma distribution has many applications and is widely used, obtaining reliable estimates of its parameters is a key issue, especially since the gamma can model a variety of hazard functions. There are several methods of estimation, such as graphical methods, the method of moments, the least squares method, and the maximum likelihood method. The estimates of the shape and scale are affected if the location has to be estimated by the method of moments; refer to Bowman and Shenton (2002) [9] for an example. The location parameter $\nu$, in most cases non-negative, is typically called the threshold parameter.
Here, it is interpreted as the minimum possible lifetime and can be estimated by the minimum order statistic, since the condition $x \ge \nu$ must always be satisfied, although this is an MLE only in the exponential case. Note that Balakrishnan and Wang (2000) [4] also considered a general linear combination of the order statistics, $\hat{\nu} = \sum_{i=1}^n c_i x_{(i)}$, where $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are the order statistics from a given sample of size $n$. However, as mentioned for example in Bowman and Shenton (1988) [8], there is a high degree of deviation of the estimates from the parent distribution if one uses the method of moments.

In exploratory data analysis, the sample mean and the sample standard deviation are two quantities often computed as measures of center and spread, respectively. In the normal case, the population mean and variance are mathematically independent, and the sample mean and variance are known to be statistically independent. In the non-normal case, however, the population mean and variance are often linked mathematically (see McCullagh and Nelder (1989) [66]), i.e., the higher the mean, the higher the standard deviation, and in fact the sample mean and sample variance are not independent. The coefficient of variation establishes a similar independence property for the gamma case. The population coefficient of variation of a distribution $X$ with mean $\mu_X$ and standard deviation $\sigma_X$ is defined as $CV = \sigma_X / \mu_X$. Estimates are obtained, for example, by Hwang and Huang (2002) [41], among other authors, who establish that the sample mean $\bar{X}$ and the sample variance $S_n^2$ are unbiased estimators of the quantities in (1.6), respectively (see Hwang and Hu (1999) [39] and Hwang and Hu (2000) [40]). We recall from Dudewicz and Mishra (1988) [23] and Casella and Berger (1990) [13] that the independence of the sample mean and sample standard deviation characterizes the normal distribution. Theorem 1.2 establishes a similar characterization for the gamma distribution through the sample coefficient of variation $CV_n = S_n / \bar{X}_n$.

Theorem 1.2 Suppose that $n \ge 3$. Then $X_1, X_2, \ldots, X_n$ are independent $Ga(\nu, \lambda, \alpha)$ r.v.'s iff $\bar{X}_n$ and $CV_n$ are independent.

Proof: Proved in Hwang and Hu (1999) [39].

Hwang and Hu (1999) [39] use this characterization to obtain estimates of the shape and scale parameters of the gamma distribution. They showed that these new estimators perform better than the MLE and the method of moments estimators for samples of size $n \le 25$.

Using (1.2), the likelihood function based on an independent, identically gamma distributed random sample $X_1, X_2, \ldots, X_n$ of size $n$ is given by
$$L(\nu, \lambda, \alpha) = \frac{\lambda^{n\alpha}}{\Gamma^n(\alpha)} \prod_{i=1}^n (x_i - \nu)^{\alpha - 1}\, e^{-\lambda \sum_{i=1}^n (x_i - \nu)}\, I_{(\nu, \infty)}(x_i).$$
The likelihood is viewed as a function of the parameters $\nu$, $\lambda$ and $\alpha$ conditioned on the data. To ease the computations, one often works with the logarithm of the likelihood; the log likelihood is given by
$$l(\nu, \lambda, \alpha) = n\alpha \log \lambda - n \log \Gamma(\alpha) + (\alpha - 1) \sum_{i=1}^n \log(x_i - \nu) - \lambda \sum_{i=1}^n (x_i - \nu).$$
Based on the strong law of large numbers or ergodic theorems, the maximum likelihood estimators have useful properties such as consistency and sufficiency. They achieve the Cramér-Rao minimum variance bound asymptotically; that is, they have the smallest variance among all possible estimators asymptotically as the sample size increases. In other words, MLEs are statistically efficient and give better and often more reliable estimates than the method of moments.
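To illustrate maximum likelihood fitting numerically, the sketch below (assuming SciPy is available; the synthetic data and parameter values are for illustration only) takes the location as known, $\nu = 0$, and maximizes the log likelihood $l(\nu, \lambda, \alpha)$ above over $(\lambda, \alpha)$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)
# Synthetic sample from the two-parameter gamma Ga(lambda=0.5, alpha=3), i.e. nu = 0
lam_true, alpha_true = 0.5, 3.0
x = rng.gamma(shape=alpha_true, scale=1.0 / lam_true, size=500)

def neg_loglik(theta, nu=0.0):
    # Negative of the log likelihood l(nu, lambda, alpha) given in the text
    log_lam, log_alpha = theta                 # optimize on the log scale so lam, alpha stay > 0
    lam, alpha = np.exp(log_lam), np.exp(log_alpha)
    n = x.size
    ll = (n * alpha * np.log(lam) - n * gammaln(alpha)
          + (alpha - 1.0) * np.log(x - nu).sum() - lam * (x - nu).sum())
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
lam_hat, alpha_hat = np.exp(res.x)
print(lam_hat, alpha_hat)   # should be close to 0.5 and 3 for this synthetic sample
```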
1.4 Literature Review

Several approaches and models for constructing a multivariate gamma distribution have been suggested and developed by many authors, such as Johnson and Kotz (1970) [45], Hougaard (2001) [37], Marshall and Olkin (1967) [61], Tanner (1996) [77], Moran (1969) [69], Mathai and Moschopoulos (1992) [64], and Walker and Stephens (1999) [79]. Defining a multivariate gamma distribution can be quite intuitive, but its properties are difficult to exhibit, which may explain why numerous versions of multivariate gamma distributions have been proposed in the literature. In particular, various forms of bivariate models have been described based on combinations of independent gamma distributions. Johnson et al. (1997) [53] give coverage of a useful variety.

One approach is to start with certain desirable statistical properties in the univariate case and build the multivariate extension through various mechanisms. We take one such approach in this dissertation. See, for example, Mathai and Moschopoulos (1991) [63], Gaver and Lewis (1980) [29], and Hanagal (1996) [32]. Another approach, which defines the multivariate gamma from the bivariate, is not as straightforward, since many bivariate gammas do not have a natural extension to multivariate dependence. For example, see Marshall and Olkin (1967) [61], Marshall and Olkin (1983) [62], and Sivazlian (1981) [76]. In fact, Marshall and Olkin (1967) [61] and Marshall and Olkin (1983) [62] proposed an approach that does not extend easily to the multivariate case. Sivazlian (1981) [76] gives a form of multivariate gamma with an independence structure.

Walker and Stephens (1999) [79] define a multivariate family of distributions on $\mathbb{R}_+^p$ in the context of survival analysis. It includes the Weibull, has the appealing properties that the marginal distributions are of the same family and that the relationship between components is flexible, leads to straightforward analysis, and is applicable to censored data. As they stated, however, the extension to the multivariate setting has proved quite problematic. Our goal is to solve that problem while maintaining these properties in our construction of the multivariate gamma distribution.

Another classical multivariate gamma, known as Cheriyan and Ramabhadran's gamma distribution, is given in Kotz et al. (2000) [53]. It is a multivariate additive distribution, a simple case of the one we define in (1.1), in which the nonnegative constants are taken to be unity. Results for this distribution are also noted in Mathai and Moschopoulos (1991) [63], who extend it further.

Henderson and Shimakura (2003) [34] presented a version of the multivariate gamma distribution with gamma marginal distributions and a specified correlation matrix, but without a closed form for the density. Moreover, the estimation of parameters and the asymptotic properties are not well studied in those cases. One needs to develop a multivariate gamma distribution that can be used in modeling processes, such as those defined by Marshall and Olkin (1983) [62] for the bivariate logistic model, or the bivariate exponential model of Mardia (1970) [60]. These methods and others are given in Coles and Tawn (1994) [19], where it is shown how statistical methods in the multivariate context may be applied to problems of data analysis, in particular for extremes.

As noted earlier, for the multivariate normal $X = (X_1, X_2, \ldots, X_p)'$ with $p \times p$ positive definite covariance matrix $\Sigma$
having $\mathrm{Var}(X_i) = \sigma_i^2$ ($1 \le i \le p$) as the diagonal elements of $\Sigma$, Chen (1987) [15], among others, obtained a characterization of $X$ as $U(X, \Sigma) = 1$, where
$$U(X, \Sigma) = \sup_{g \in C} \frac{\mathrm{Var}[g(X)]}{E[\nabla^t g(X)\, \Sigma\, \nabla g(X)]}.$$
Here $C$ is the class of measurable functions $g : \mathbb{R}^p \to \mathbb{R}$ such that $E[g(X)]^2 < \infty$ and $\mathrm{Var}[g(X)] > 0$, and $\nabla g(X)$ denotes the gradient. For the multivariate gamma, $U(X, \Sigma) > 1$.

While in the Gaussian case the correlation matrix and the marginal distributions completely specify the joint distribution, those two components (correlation matrix and marginal distributions) do not induce a unique joint distribution for correlated gamma r.v.'s. We cannot arbitrarily construct $X$ such that the marginal distributions are gamma, i.e., $X_i \sim Ga(0, \lambda_i, \alpha_i)$ for $i = 1, \ldots, n$, and $\mathrm{Corr}(X_i, X_j) = \rho_{ij} \in [-1, 1]$ for $i, j \in \{1, 2, \ldots, n\}$. The difficulty arises from the fact that the mean and variance/covariance cannot be modeled arbitrarily as in the normal case: in the gamma case, the $\{\rho_{ij}\}_{1 \le i, j \le n}$ are functions of the shape parameters. This is true for all multivariate gammas proposed in the literature to date.

In the spirit of the multivariate normal density, several versions of the multivariate gamma can be found. Fang and Zhang (1990) [24] describe a family of multivariate distributions: a $p$-dimensional r.v. $X = (X_1, X_2, \ldots, X_p)'$ has the symmetric Kotz type distribution (KTD), denoted $X \sim KTD_p(\mu, \Sigma_{p \times p}, \alpha, m, \beta)$, where

- $\mu$ is the location vector in $\mathbb{R}^p$,
- $\Sigma_{p \times p}$ is the positive definite covariance matrix,
- $\alpha$ is such that $\alpha > (2 - p)/2$,
- $m > 0$ and $\beta > 0$,

if it has a joint density of the form
$$f_X(x) = C_p |\Sigma|^{-1} \big[(x - \mu)^t \Sigma^{-1} (x - \mu)\big]^{\alpha - 1} \exp\big\{-m \big[(x - \mu)^t \Sigma^{-1} (x - \mu)\big]^{\beta}\big\},$$
where $C_p$ is the normalizing constant given by
$$C_p = \frac{\beta\, \Gamma(p/2)\, m^{p/(2\beta) + (\alpha - 1)/\beta}}{\pi^{p/2}\, \Gamma\big(p/(2\beta) + (\alpha - 1)/\beta\big)}.$$
When $\beta = 1$, $\alpha = 1$ and $m = \tfrac{1}{2}$, we obtain the multivariate normal distribution.

Following the approach of Fang and Zhang (1990) [24], and modifying the multivariate normal, we hope to obtain properties similar to those of the multivariate normal. We define the square root of a vector componentwise. We introduce a multivariate gamma of this type as follows:
$$f_X(x) = C_p |\Sigma|^{-1} \Big[\{m(x - \mu)\}^{(1/2)\,t}\, \Sigma^{-1}\, \{m(x - \mu)\}^{1/2}\Big]^{\alpha - 1} \exp\Big\{-\Big[\{m(x - \mu)\}^{(1/2)\,t}\, \Sigma^{-1}\, \{m(x - \mu)\}^{1/2}\Big]^{\beta}\Big\}.$$
Taking $\beta = 1$ and $m = (\sqrt{\alpha_1}, \ldots, \sqrt{\alpha_p})^t$, we have the form of the multivariate gamma distribution. When the $X_i$'s are independent, we obtain the desired form; the result does not hold when there is dependence. This multivariate gamma is an example of the multivariate dispersion models introduced by Jorgensen (1987) [49].

1.5 Marginal, Conditional and Joint Distributions

Here, we review the concepts of marginal, joint and conditional distributions that are referred to frequently in this dissertation. These concepts can be found in textbooks such as Feller (1971) [25], Dudewicz and Mishra (1988) [23], Folland (1999) [26], and Kallenberg (2002) [51]. We also define the concept of conditional independence.

Definition 1.3 Let $(\Omega, \mathcal{A}, P)$ be a probability space, i.e., $\Omega$ is a set, $\mathcal{A}$ is a $\sigma$-algebra (a family of subsets of $\Omega$), and $P$ is a mapping $P : \mathcal{A} \to [0, 1]$ such that (i) $P(\Omega) = 1$ and (ii) if $\{A_n, n \in \mathbb{N}\}$ are disjoint sets in $\mathcal{A}$, then $P(\bigcup_n A_n) = \sum_{n=1}^\infty P(A_n)$.

Definition 1.4 Let $\mathcal{B}(\mathbb{R})$ be the Borel $\sigma$-field generated on $\mathbb{R}$. A random variable (r.v.) $X$ is a measurable function on a probability space, i.e., $X$ is a mapping $X : \Omega \to \mathbb{R}$ such that for every $B \in \mathcal{B}(\mathbb{R})$, $X^{-1}B \in \mathcal{A}$.

Definition 1.5 The cumulative distribution function (cdf) of $X$ is the function $F_X$ (also denoted $F$ when there is no confusion), $F : \mathbb{R} \to [0, 1]$, defined as $F(x) = (P \circ X^{-1})((-\infty, x]) = P(X \le x)$, $x \in \mathbb{R}$.
The cdf is nondecreasing and right continuous, with limits $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

For a fixed integer $p \in \mathbb{N}$, the joint cumulative distribution function of $p$ r.v.'s, or $p$-variate vector, denoted $X = (X_1, \ldots, X_p)'$, is an extension of the above concept and is defined on a probability space $(\Omega, \mathcal{A}, P)$. When $\Omega = \mathbb{R}^p$, the joint cdf of $X$ is defined as $F : \mathbb{R}^p \to [0, 1]$ with
$$F(x_1, \ldots, x_p) = P(X_1 \le x_1, \ldots, X_p \le x_p). \qquad (1.12)$$
The joint cdf is right continuous and satisfies the following: $\lim_{x_i \to \infty} F(x_1, \ldots, x_i, \ldots, x_p) = F(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p)$, which is the joint cdf of the remaining $p - 1$ variates, and $\lim_{x_1, \ldots, x_p \to -\infty} F(x_1, \ldots, x_p) = 0$ and $\lim_{x_1, \ldots, x_p \to \infty} F(x_1, \ldots, x_p) = 1$. The marginal cdf of $X_k$, with $1 \le k \le p$, is defined as
$$F(x_k) = \lim_{x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_p \to \infty} F(x_1, \ldots, x_p).$$
The $p$ r.v.'s are said to be independent if (1.12) can be written as $F(x_1, \ldots, x_p) = F_{X_1}(x_1) \cdots F_{X_p}(x_p)$, and then $E(X_1 \cdots X_p) = E(X_1) \cdots E(X_p)$, provided that all the expectations exist. In addition, the r.v.'s $\{g_i(X_i), 1 \le i \le p\}$ are then independent. It is also known that uncorrelatedness can be deduced from independence, but uncorrelatedness does not always imply independence. Another useful concept is that of conditional distributions.

Definition 1.6 Suppose $(X_1, X_2, \ldots, X_p)'$ is a random vector. The conditional distribution of $(X_1, X_2, \ldots, X_k)'$, with $1 \le k \le p$, given $(X_{k+1}, \ldots, X_p)'$, is the ratio of the joint distribution of all the r.v.'s to the joint marginal distribution of the conditioning r.v.'s. It is given by
$$f_{X_1, X_2, \ldots, X_k \mid X_{k+1}, \ldots, X_p}(x_1, x_2, \ldots, x_k \mid x_{k+1}, \ldots, x_p) = \frac{f_{X_1, X_2, \ldots, X_p}(x_1, x_2, \ldots, x_p)}{f_{X_{k+1}, \ldots, X_p}(x_{k+1}, \ldots, x_p)},$$
provided that $f_{X_{k+1}, \ldots, X_p}(x_{k+1}, \ldots, x_p) > 0$, where $f$ may be a pdf or a pmf.

Definition 1.7 The $p$-variate vector $X = (X_1, X_2, \ldots, X_p)'$ is said to be conditionally independent given a variable, say $X_0$, if
$$f(x_1, x_2, \ldots, x_p \mid x_0) = f_1(x_1 \mid x_0) f_2(x_2 \mid x_0) \cdots f_p(x_p \mid x_0), \qquad (1.13)$$
where $f_i(\cdot \mid x_0)$ represents the conditional pdf of $X_i$, $i = 1, \ldots, p$, given $X_0$. Equation (1.13) is equivalent to $F(x_1, x_2, \ldots, x_p \mid x_0) = F_1(x_1 \mid x_0) F_2(x_2 \mid x_0) \cdots F_p(x_p \mid x_0)$, where $F_i(\cdot \mid x_0)$ represents the conditional cdf of $X_i$, $i = 1, \ldots, p$, given $X_0$.

Theorem 1.8 The joint density of $(X_0, X_1, \ldots, X_p)$ satisfying (1.1) is
$$f(x_0, x_1, x_2, \ldots, x_p) = f(x_0) \prod_{i=1}^p f_{Z_i}(x_i - a_i x_0),$$
and $(X_1, X_2, \ldots, X_p)'$ are conditionally independent given the latent variable $X_0$.

Proof: Using the independence of the $Z_i$'s, $i = 1, \ldots, p$, of each other and of $X_0$, the joint density of $X_0, Z_1, Z_2, \ldots, Z_p$ is $f(x_0, z_1, z_2, \ldots, z_p) = f(x_0) f(z_1) f(z_2) \cdots f(z_p)$. Hence, based on the linear transformation $X_i = a_i X_0 + Z_i \iff Z_i = X_i - a_i X_0$, for $i = 1, 2, \ldots, p$, we have
$$f(x_0, x_1, x_2, \ldots, x_p) = f(x_0) f(x_1 - a_1 x_0) \cdots f(x_p - a_p x_0) = f(x_0) \prod_{i=1}^p f_{Z_i}(x_i - a_i x_0).$$
From Definition 1.7, $(X_1, X_2, \ldots, X_p)'$ given $X_0$ are independent, since the conditional distribution of $(X_1, X_2, \ldots, X_p)'$ given $X_0$ is
$$f(x_1, x_2, \ldots, x_p \mid x_0) = \frac{f(x_0, x_1, x_2, \ldots, x_p)}{f(x_0)} = \prod_{i=1}^p f_{Z_i}(x_i - a_i x_0) = \prod_{i=1}^p f_{X_i}(x_i \mid x_0).$$

Note that, given $x_0$, $a_i x_0$ plays the role of a location parameter, so $x_i \ge a_i x_0$ for all $i = 1, 2, \ldots, p$. In Chapter 3, we see that when $x_0$ is unknown, a possible estimate for it is the minimum of the $x_i / a_i$'s, $i = 1, \ldots, p$.

Definition 1.9 The random vector $X = (X_1, X_2, \ldots, X_p)'$ is multivariate gamma if its components $X_i$, $i = 1, \ldots, p$, are marginally of the gamma family following the structure described in (1.1), and each subset of $X$ has a multivariate gamma form.

1.6 Thesis Outline

In this dissertation, we are concerned with the three-parameter gamma distribution.
The joint gamma distribution of $X = (X_1, X_2, \ldots, X_p)$ characterized here, based on (1.1):

- has Mathai and Moschopoulos (1992) [64] as a special case, with a more flexible linear structure,
- has Iyer et al. (2004) [43] as a special case, both in terms of structure and dimension,
- has Nadarajah and Kotz (2005) [72] as a special case,
- and shares similar properties with the Marshall and Olkin (1967) [61] model, in that it includes both continuous and discontinuous portions. Marshall and Olkin's bivariate exponential model has received a lot of attention because it allows for the simultaneous failure of several system components induced by a fatal shock.

We provide different inference procedures for the parameters and characterize location-scale families of multivariate gamma distributions. The marginal distributions are traditional location-scale gammas, and the joint distribution contains absolutely continuous classes as well as Marshall-Olkin (1967) [61] type distributions with a positive probability mass on a set of measure zero. MLEs are developed in the bivariate case.

The dissertation is organized as follows. In Chapter 2, we start with preliminary material and the fundamental concepts used in these new classes of survival and reliability models. In Chapter 3, we derive and characterize the generalized multivariate gamma distribution. The continuous case of the multivariate gamma is studied in Chapter 4. In Chapter 5, a discontinuous case of the multivariate gamma, in the form of a multivariate exponential, is given.

Chapter 2
Preliminaries

The mathematical concepts needed for this dissertation are collected in this chapter. We review the notions of Laplace transforms (see Feller (1971) [25], Abramowitz and Stegun (1972) [1], and Billingsley (1986) [6]), the Dirac delta (see Cohen-Tannoudji et al. (1977) [18] and Khuri (2004) [52]), and infinitely divisible and stable distributions (see Feller (1971) [25] and Hougaard (2000) [36]).

2.1 The Laplace Transform

The Laplace transform (the analogue of the moment generating function) is used in many areas of statistics, probability theory, and risk analysis. We are interested in the Laplace transform with real arguments. Characterization by means of the real Laplace transform does not presume the existence of moments.

Definition 2.1 If $X$ is a r.v. defined on $\mathbb{R}_+$ with cdf $F_X$ satisfying $P(X = 0) < 1$, then its Laplace-Stieltjes transform (LST) is the real-valued function defined in Abramowitz and Stegun (1972) [1] as
$$\mathcal{L}_X(s) = E e^{-sX} = \int_0^\infty e^{-sx}\, dF_X(x). \qquad (2.1)$$

Some properties associated with the LST:

- Existence: the integral in (2.1) is a Lebesgue-Stieltjes integral. For the positive-support distributions considered here, (2.1) always exists; in fact, $0 < \mathcal{L}_X(s) \le 1$.
- $\mathcal{L}_X(s)$ is infinitely differentiable, and $\frac{d^n \mathcal{L}_X}{ds^n}(s)$ exists for all $n \in \mathbb{N}$.
- For $m \in \mathbb{N}$, the $m$th moment of $X$ is given by $E X^m = (-1)^m \mathcal{L}_X^{(m)}(0)$.
- Additivity: the LST of a sum of independent r.v.'s is the product of the LSTs of the individual r.v.'s. For $X_1, \ldots, X_n$ independent r.v.'s, $X = \sum_{i=1}^n X_i$ has LST
$$\mathcal{L}_X(s) = E e^{-sX} = E \prod_{i=1}^n e^{-sX_i} = \prod_{i=1}^n E e^{-sX_i} = \prod_{i=1}^n \mathcal{L}_{X_i}(s).$$
- The LST can be defined for a $p$-variate distribution, say $X = (X_1, X_2, \ldots, X_p)'$, as $\mathcal{L}_X(s) = E e^{-(s_1 X_1 + \cdots + s_p X_p)}$, for $s = (s_1, s_2, \ldots, s_p)' \in \mathbb{R}_+^p$.
- Uniqueness: if $X_1$ and $X_2$ are two r.v.'s such that $\mathcal{L}_{X_1}(s) = \mathcal{L}_{X_2}(s)$, then $f_{X_1}(x) = f_{X_2}(x)$ for all $x$ except on a set of measure zero. Therefore, the LST completely characterizes the distribution.
The LST helps in computations involving linear combinations of r.v.'s from certain distributions.

Example 2.2 If $X \sim Ga(\nu, \lambda, \alpha)$ with pdf (1.2), then its LST is given by (1.11), since
$$\mathcal{L}_X(s) = \int_\nu^\infty e^{-sx} \frac{\lambda^\alpha}{\Gamma(\alpha)} (x - \nu)^{\alpha-1} e^{-\lambda(x-\nu)}\,dx = e^{-s\nu} \int_\nu^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)} (x - \nu)^{\alpha-1} e^{-(\lambda+s)(x-\nu)}\,dx$$
$$= e^{-s\nu} \int_0^\infty \frac{\lambda^\alpha}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-(\lambda+s)t}\,dt = e^{-s\nu} \frac{\lambda^\alpha}{(\lambda+s)^\alpha} \int_0^\infty \frac{(\lambda+s)^\alpha}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-(\lambda+s)t}\,dt = e^{-s\nu} \Big(\frac{\lambda}{\lambda+s}\Big)^\alpha.$$
The gamma distribution shifted by $\nu$, with unit mean for its gamma part (i.e., $\lambda = \alpha$), has LST
$$\mathcal{L}_X(t) = e^{-\nu t} \Big(\frac{1}{1 + t/\alpha}\Big)^\alpha \longrightarrow e^{-\nu t} e^{-t} = e^{-(\nu+1)t} \quad \text{as } \alpha \to \infty.$$

Example 2.3 Let $X_1, X_2, \ldots, X_n$, $n > 1$, be independent r.v.'s distributed as $Ga(\nu_i, \lambda, \alpha_i)$ for $1 \le i \le n$. It is well known that $X = X_1 + X_2 + \cdots + X_n$ has density $Ga(\sum_{i=1}^n \nu_i, \lambda, \sum_{i=1}^n \alpha_i)$. This is easily shown using the LST. The Laplace transform of the gamma distribution $X_i$, $i = 1, \ldots, n$, given in Example 2.2 is $\mathcal{L}_{X_i}(s) = e^{-s\nu_i} \big(\frac{\lambda}{\lambda+s}\big)^{\alpha_i}$. Using the properties of the LST, we have
$$\mathcal{L}_X(s) = \prod_{i=1}^n \mathcal{L}_{X_i}(s) = \prod_{i=1}^n e^{-s\nu_i} \Big(\frac{\lambda}{\lambda+s}\Big)^{\alpha_i} = e^{-s\sum_{i=1}^n \nu_i} \Big(\frac{\lambda}{\lambda+s}\Big)^{\sum_{i=1}^n \alpha_i} \;\Rightarrow\; X \sim Ga\Big(\sum_{i=1}^n \nu_i, \lambda, \sum_{i=1}^n \alpha_i\Big).$$
In the remaining cases, we consider situations where the $\lambda$'s are different. When two or more of the gamma distributions have the same parameter $\lambda$, we can add those r.v.'s to obtain another gamma distribution with the same parameter $\lambda$ but a different shape parameter.

Example 2.4 Suppose $X$ and $Y$ are independent, with $X$ discrete with pmf $p(x)$ and $Y$ a positive-support continuous distribution with pdf $f(y)$. Then
$$\mathcal{L}_{XY}(s) = E e^{-sXY} = \int_0^\infty \sum_x e^{-sxy} f(y) p(x)\,dy = \sum_x \Big(\int_0^\infty e^{-sxy} f(y)\,dy\Big) p(x) = \sum_x \mathcal{L}_Y(sx)\, p(x).$$
We later use Example 2.4 in the case where $X$ is a Bernoulli random variable with $P(X = 0) = p$ and $Y$ is a positive-support distribution. In that case, $\mathcal{L}_{XY}(s) = p + (1 - p)\mathcal{L}_Y(s)$.

We are also interested in sums of r.v.'s. For two continuous r.v.'s $X_1 \sim f_1$ and $X_2 \sim f_2$, their sum $X = X_1 + X_2$ has cdf obtained from the joint pdf as
$$F_X(x) = \int_{-\infty}^\infty \int_{-\infty}^{x - x_1} f(x_1, x_2)\,dx_2\,dx_1.$$
The convolution of two functions $f, g : \mathbb{R} \to \mathbb{R}$ is the function $(f * g)(x) = \int_{-\infty}^\infty f(t) g(x - t)\,dt$. If the r.v.'s are independent, the density of their sum is the convolution of their densities, and can be represented as below.

Theorem 2.5 Assume that $X_1$ and $X_2$ are independent r.v.'s defined on $\mathbb{R}_+$ with pmf/pdf $f_1$ and $f_2$, respectively. Then $X = X_1 + X_2$ has density
$$f_X(x) = (f_1 * f_2)(x) = \int_0^x f_1(t) f_2(x - t)\,dt.$$
Proof: Proved in Hunter and Nachtergaele (2001) [38].

However, much of this work can be avoided, since the LST of $X$ in Theorem 2.5 is given by $\mathcal{L}_X(t) = \mathcal{L}_{X_1}(t)\mathcal{L}_{X_2}(t)$ and may be recognized as the LST of a known distribution.

Example 2.6 Consider $X$ to be the sum of two iid exponential r.v.'s with parameter $\lambda$. Then $f_X(x) = \int_0^x f_1(t) f_2(x - t)\,dt = \int_0^x \lambda e^{-\lambda t}\, \lambda e^{-\lambda(x - t)}\,dt = \lambda^2 x e^{-\lambda x}$, which is a $Ga(0, \lambda, 2)$ density.

Theorem 2.7 For independent $X_1 \sim Ga(\nu_1, \lambda_1, \alpha_1)$ and $X_2 \sim Ga(\nu_2, \lambda_2, \alpha_2)$, the r.v. $X = X_1 + X_2$ has a gamma distribution $Ga(\nu_1 + \nu_2, \lambda, \alpha_1 + \alpha_2)$ iff $\lambda_1 = \lambda_2 = \lambda$.

Proof: The LST of $X$ is given by
$$\mathcal{L}_X(s) = e^{-(\nu_1 + \nu_2)s}\, \frac{\lambda_1^{\alpha_1} \lambda_2^{\alpha_2}}{(\lambda_1 + s)^{\alpha_1} (\lambda_2 + s)^{\alpha_2}}.$$
If $\lambda_1 \ne \lambda_2$, then the LST is not that of a gamma distribution. In the case $\lambda_1 = \lambda_2 = \lambda$, the LST and the density of the sum $X = X_1 + X_2$ become
$$\mathcal{L}_X(s) = e^{-s\nu} \Big(\frac{\lambda}{\lambda + s}\Big)^\alpha, \qquad f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} (x - \nu)^{\alpha - 1} e^{-\lambda(x - \nu)},$$
where $\alpha = \alpha_1 + \alpha_2$ and $\nu = \nu_1 + \nu_2$.

We generalize this result later in Theorem 3.3. The sum of independent gamma distributions with the same scale parameter is a gamma distribution with that scale parameter, with the location and shape parameters updated. In that sense, the gamma distribution satisfies an additive property. Mathai and Moschopoulos (1992) [64] use this property to generate their class of multivariate gamma distributions.
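Theorem 2.7 is easy to check by simulation. The sketch below (assuming NumPy and SciPy are available; the parameter values are illustrative, not from the dissertation) draws from two independent gammas with a common scale and compares the empirical distribution of their sum with $Ga(\nu_1 + \nu_2, \lambda, \alpha_1 + \alpha_2)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000
lam = 2.0                                   # common scale (rate) parameter
nu1, a1 = 1.0, 1.5                          # illustrative location/shape for X1
nu2, a2 = 0.5, 2.5                          # illustrative location/shape for X2

x1 = nu1 + rng.gamma(shape=a1, scale=1.0 / lam, size=n)
x2 = nu2 + rng.gamma(shape=a2, scale=1.0 / lam, size=n)
s = x1 + x2

# Theorem 2.7: X1 + X2 ~ Ga(nu1 + nu2, lam, a1 + a2) when the scales are equal
ks = stats.kstest(s, "gamma", args=(a1 + a2, nu1 + nu2, 1.0 / lam))
print(ks.pvalue)      # should not be small: consistent with Ga(1.5, 2, 4)
```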
2.2 The Dirac Delta Function

Another important collection of tools that we use are the properties of the Dirac distribution, sometimes referred to as the Dirac delta function. See Cohen-Tannoudji et al. (1977) [18] and Khuri (2004) [52].

Definition 2.8 The Dirac delta function at the point $c \in \mathbb{R}$ is a point mass distribution denoted $\delta_c$, and we say that a r.v. $X$ has the point mass distribution $\delta_c$ at $c$ if its pmf is given by
$$f(x \mid c) = \delta_c(x) = \delta(x - c) = 0 \ \text{ if } x \ne c, \quad \text{and} \quad \int_{-\infty}^\infty f(x \mid c)\,dx = 1. \qquad (2.2)$$

The integral notation in (2.2) is not mathematically justified: $\delta$ is not rigorously defined as a function, but as a distribution, and there have been several references on this object. Cohen-Tannoudji et al. (1977) [18] give a complete discussion. Despite its name, the Dirac delta function is not a function in the classical sense. One reason is that the functions $f(x) = \delta(x)$ and $g(x) = 0$ a.e. are equal almost everywhere, yet their (Lebesgue) integrals are different; see Folland (1999) [26] and Rudin (1976) [74]. Another reason is that it is singular. Instead, it is said to be a distribution, a generalized notion of function that can be used inside integrals. The mathematician Laurent Schwartz gave it a rigorous definition in 1947 as a linear functional on the space of test functions $\mathcal{D}$, the set of all real-valued infinitely differentiable functions with compact support on $\mathbb{R}$, such that for a given $f(x) \in \mathcal{D}$ the value of the functional is given as in Kallenberg (1986) [50] and Hunter and Nachtergaele (2001) [38]. Such linear functionals are called generalized functions or distributions. For this reason, the delta function is more appropriately called Dirac's delta distribution. Thus, the value of the Dirac delta function $\delta_x$ is defined by its action on a function $f(x) \in \mathcal{D}$ when used inside an integral, as in formula (2) of Khuri (2004) [52]. The theory of distributions in mathematics has been highly developed, and as a result the Dirac delta function is well established and accepted in mathematics as a generalized function or distribution. Note also that it has been modified from the original version defined by Dirac in 1920.

It is well known that the Heaviside step function is an antiderivative of the Dirac distribution. The Heaviside step function, also called the unit step function (see for example Abramowitz and Stegun (1972) [1]), is the discontinuous function defined as
$$H(x) = \int_{-\infty}^x \delta(t)\,dt = \begin{cases} 0, & \text{if } x \le 0, \\ 1, & \text{if } x > 0. \end{cases} \qquad (2.3)$$
The value of the Heaviside function at $0$ is sometimes taken to be $0$, or $\tfrac{1}{2}$ (most popular for symmetry purposes), or $1$; here we take it to be $0$. Both the Dirac and Heaviside functions have been used in a variety of fields of science and engineering; their use in statistics is relatively new. For another reference, see the paper by Pazman and Pronzato (1996) [73].

The Dirac delta function is a very useful tool for approximating tall, narrow spike functions (also called impulse functions), and the integral
$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = \delta[f] = f(0),$$
for any (test) function $f(x)$, is more a notational convenience than a true integral. It can be regarded as an "operator", or a linear functional on the space of (test) functions, which gives the value of the function at $0$.
More details are given in Kallenberg (1986) [50], Williamson (1962) [80], Au and Tam (1999) [3], and Shilov and Gurevich (1977) [75], for example. As a distribution, the Dirac delta function $\delta(x - s)$ is a pdf with mean, median and mode $s$, cdf $H(x - s)$, and variance and skewness $0$, satisfying the following:

- $\int_{-\infty}^\infty \delta(\alpha x)\,dx = \int_{-\infty}^\infty \delta(u)\,\frac{du}{|\alpha|} = \frac{1}{|\alpha|}$,
- $\delta(\alpha x) = \frac{\delta(x)}{|\alpha|}$,
- $\delta(x) = \lim_{a \to 0} \delta_a(x)$, where $\delta_a(x) = \frac{1}{a\sqrt{\pi}}\, e^{-x^2/a^2}$, as a limit of normal densities.

To end this review, we note the following results: for $c > 0$, $H(cx) = H(x)$; $H(x - a) = 1 - H(-x + a) = 1 - H(a - x)$; and $\int H(x - a)\,dx = (x - a)H(x - a)$.

The Dirac delta distribution can be thought of as the limiting case of a distribution whose density is concentrated at a single point. For a r.v. $X$ with Dirac density $\delta(x - c)$, $c \ge 0$, the LST is given by $\mathcal{L}_X(t) = e^{-ct}$. The moments of the Dirac delta function $\delta_c$ are given by $E X^k = c^k$ and $\mathrm{Var}(X) = 0$, and its characteristic function is $\varphi(t) = e^{itc}$.

The Dirac function is a very helpful tool in mathematical statistics, as it provides a unifying approach to the treatment of discrete and continuous distributions and their transformations. We give an example of each case below.

Example 2.9 (Khuri (2004) [52]) Suppose $X$ is a discrete r.v. that takes values $a_1, a_2, \ldots, a_n$ with corresponding probabilities $p_1, p_2, \ldots, p_n$, where $\sum_{i=1}^n p_i = 1$. Then the pmf $p(x)$ of $X$ can be represented as $p(x) = \sum_{i=1}^n p_i \delta(x - a_i)$. For example, if $X \sim B(n, p)$, where $B(n, p)$ is the binomial distribution, then $p(x) = \sum_{i=0}^n \binom{n}{i} p^i (1 - p)^{n - i} \delta(x - i)$, and the $k$th noncentral moment of $X$ is given by
$$\int_{-\infty}^\infty x^k p(x)\,dx = \sum_{i=1}^n p_i \int_{-\infty}^\infty x^k \delta(x - a_i)\,dx = \sum_{i=1}^n a_i^k p_i.$$

Example 2.10 (Khuri (2004) [52]) Let $X_1$ and $X_2$ be independent r.v.'s distributed as $X_1 \sim Ga(\lambda = \tfrac{1}{2}, \alpha)$ and $X_2 \sim Ga(\lambda = \tfrac{1}{2}, \beta)$. The joint distribution of $Y = \frac{X_1}{X_1 + X_2}$ and $Z = X_1 + X_2$ can be expressed as
$$f(y, z) = \int_{-\infty}^\infty \int_{-\infty}^\infty f(x_1, x_2)\, \delta\Big[\frac{x_1}{x_1 + x_2} - y\Big]\, \delta(x_1 + x_2 - z)\,dx_1\,dx_2$$
$$= \frac{1}{\Gamma(\alpha)\Gamma(\beta)2^{\alpha+\beta}} \int_0^\infty x_2^{\beta-1}\,dx_2 \int_0^\infty x_1^{\alpha-1} e^{-\frac{1}{2}(x_1 + x_2)}\, \delta\Big(\frac{x_1}{x_1 + x_2} - y\Big)\, \delta\big[x_1 - (z - x_2)\big]\,dx_1$$
$$= \frac{1}{\Gamma(\alpha)\Gamma(\beta)2^{\alpha+\beta}} \int_0^\infty x_2^{\beta-1} (z - x_2)^{\alpha-1} e^{-\frac{1}{2}z}\, \delta\Big[\frac{z - x_2}{z} - y\Big]\,dx_2 = \frac{1}{\Gamma(\alpha)\Gamma(\beta)2^{\alpha+\beta}}\, y^{\alpha-1} (1 - y)^{\beta-1} z^{\alpha+\beta-1} e^{-\frac{1}{2}z}.$$

Cumulants are an alternative way of summarizing the properties of r.v.'s, especially when the moments are not easily obtained. The moments of $X$ are not directly related to the moments of $aX + b$, for positive constants $a$ and $b$, and the moments of $X = X_1 + X_2$ have no simple relation to the moments of $X_1$ and $X_2$. The idea is then to use the cumulants.

Definition 2.11 The function $\psi(s) = \log \mathcal{L}_X(-s)$ is called the cumulant generating function of $X$. Expanding $\psi(s)$ in its power series, $\psi(s) = \sum_{i=1}^\infty \frac{\kappa_i}{i!} s^i$, gives coefficients $\kappa_i$, called the $i$th cumulants of the distribution of $X$.

The cumulant generating function relates the cumulants to the moments. It also allows us to characterize infinitely divisible LSTs, which is done in Section 2.3; see Feller (1971) [25].
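As a small symbolic check of Definition 2.11 (a sketch assuming SymPy is available; the symbol names mirror the gamma parameters used above), the cumulant generating function of $Ga(\nu, \lambda, \alpha)$ obtained from (1.11) is $\psi(s) = \nu s - \alpha \log(1 - s/\lambda)$, and differentiating it at zero recovers $\kappa_1 = \nu + \alpha/\lambda$ and $\kappa_2 = \alpha/\lambda^2$, in agreement with (1.6).

```python
import sympy as sp

s, nu, lam, alpha = sp.symbols("s nu lam alpha", positive=True)

# LST of Ga(nu, lam, alpha) from (1.11): L_X(t) = exp(-nu*t) * (lam/(lam+t))**alpha
L = sp.exp(-nu * s) * (lam / (lam + s)) ** alpha
psi = sp.log(L.subs(s, -s))            # cumulant generating function psi(s) = log L_X(-s)

# i-th cumulant = i-th derivative of psi at s = 0
kappa = [sp.simplify(sp.diff(psi, s, i).subs(s, 0)) for i in (1, 2, 3)]
print(kappa)   # [nu + alpha/lam, alpha/lam**2, 2*alpha/lam**3]
```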
2.3 Infinite Divisibility

There are important results arising from the concept of infinite divisibility, which has many applications in the theory of limit distributions for sums of independent r.v.'s. In general, it is difficult to determine whether a given distribution is infinitely divisible or not. We would like to determine what conditions are required for the pdf of a gamma distribution, or of a mixture of gammas, to be infinitely divisible; we also consider the exponential distribution. We first give some notation and a definition. Let the symbol $\stackrel{d}{=}$ denote equality in distribution, and $\stackrel{d}{\to}$ denote convergence in distribution.

Definition 2.12 Consider a random vector $X$. Its distribution is said to be infinitely divisible if for every $n \in \mathbb{N}$ there exist iid random vectors $X_{n1}, X_{n2}, \ldots, X_{nn}$ with $\sum_k X_{nk} \stackrel{d}{=} X$.

In other words, an infinitely divisible r.v. $X$ with pdf $f(x)$ can be represented as the sum of an arbitrary number of iid r.v.'s $X_1, X_2, \ldots, X_n$ with cdf $F_n$, that is, $X \stackrel{d}{=} X_1 + X_2 + \cdots + X_n$; hence the term infinitely divisible. Borrowing from Billingsley (1986) [6], pp. 383-384, the distribution $F$ of $X$ is then the $n$-fold convolution $F_n * F_n * \cdots * F_n$, where $F_n$ is the distribution function of the $X_i$, $1 \le i \le n$. We use this property of infinite divisibility to characterize the distribution of the unknown r.v. in a linear relationship such as (1.1). Two simple examples of infinitely divisible distributions are the Poisson distribution and the negative binomial distribution.

Definition 2.13 A function $\varphi$ on the interval $I = [0, \infty)$ is completely monotone if it possesses derivatives $\varphi^{(n)}$ of all orders which alternate in sign, i.e., if $(-1)^n \varphi^{(n)}(s) \ge 0$ for all $s$ in the interior of $I$ and $n = 0, 1, 2, \ldots$

Theorem 2.14 $\varphi$ is completely monotone iff $\varphi$ is the Laplace transform of some measure.

Proof: See Feller (1971) [25].

For two real-valued completely monotone functions $\varphi_1$ and $\varphi_2$, their product, and their composition when appropriately chosen, are also completely monotone. It is important to note that any completely monotone probability density function is infinitely divisible; see Feller (1971) [25]. Moreover, if $\varphi$ is completely monotone on $[0, \infty)$ and $\varphi(c) = 0$ for some $c > 0$, then $\varphi$ must be identically zero on $[0, \infty)$.

All Erlang-type distributions are infinitely divisible. The Erlang distribution defined in (1.5) is infinitely divisible in the sense that it can be represented as the sum of $n$ independent r.v.'s with a common distribution (see Feller (1971) [25], pages 176-177). In fact, $n$ need not be an integer, since the gamma can be expressed in the canonical form of an infinitely divisible distribution. Notice that the distribution of the sum of mutually independent gamma r.v.'s with different scale parameters is not gamma; rather, it can be described as a mixed gamma with a mixing shape parameter. Using ideas from Mathai and Moschopoulos (1991) [63] and Mathai and Saxena (1978) [65], we characterize the distribution of such sums in Chapter 4.

2.4 Stable Distributions

There are many results on limit theorems; see Kallenberg (2002) [51] and Billingsley (1986) [6]. We study the class of positive stable distributions. The results could also apply to distributions other than the gamma. Here we introduce the necessary definitions and recall limit laws for the distribution of sums of independent r.v.'s from Feller (1971) [25].

Definition 2.15 (Stable distribution, Feller (1971) [25]) Let $X_0, X_1, X_2, \ldots$ be an iid sequence of r.v.'s with common distribution $F$. The r.v. $X_0$ is stable, or the distribution $F$ is stable (in the broad sense), if for each $n$ there exist constants $c_n > 0$ and $\gamma_n$ such that $S_n \stackrel{d}{=} c_n X_0 + \gamma_n$, where $S_n = X_1 + X_2 + \cdots + X_n$, and $F$ is not concentrated at one point. We say $F$ is stable in the strict sense if the above holds with $\gamma_n = 0$.

Stable distributions are a rich class of probability distributions that allow one to generalize the Central Limit Theorem (CLT), replacing the assumption of finite variance with a less restrictive condition that allows infinite variance.
For data with heavy tails, stable distributions should fit better than a normal distribution. From Definition 2.15 above, the two r.v.'s $X_0$ and $S_n$ are of the same type, with $f_{S_n}(x) = \frac{1}{c_n} f_{X_0}\big(\frac{x - \gamma_n}{c_n}\big)$. Stable distributions can be characterized through the concept of domain of attraction.

Definition 2.16 (Domain of attraction, Feller (1971) [25]) The distribution of the independent sequence of r.v.'s $X_1, X_2, \ldots$ belongs to the domain of attraction of a r.v. $X_0$ with distribution $F$ if there exist norming or scaling constants $a_n > 0$ and $b_n$ such that
$$\frac{S_n}{a_n} - b_n = \frac{X_1 + X_2 + \cdots + X_n}{a_n} - b_n \stackrel{d}{\to} X_0 \quad \text{as } n \to \infty.$$

This definition differs from Definition 2.15 in that it requires convergence in distribution rather than equality in distribution. In fact, Billingsley (1986) [6], page 389, gives a more restrictive version of Definition 2.16, in which for each $n$ there exist constants $a_n > 0$ and $b_n$ such that $\frac{S_n}{a_n} - b_n$ has the same distribution as the iid r.v.'s $X_1, X_2, \ldots, X_n$, and calls such a distribution stable. Also, the distribution of $X_0$ is not necessarily the same as the distribution of the sequence. Finally, Definition 2.16 recaptures the CLT when we take a sequence of r.v.'s with mean $\mu$ and variance $\sigma^2$ and set $a_n = \sigma\sqrt{n}$ and $b_n = \sqrt{n}\,\mu/\sigma$, with $X_0$ being the standard normal distribution. The definition also allows similar limit theorems for distributions without a variance, that is, for stable distributions. The next result states the equivalence of Definition 2.15 and Definition 2.16.

Theorem 2.17 A r.v. $X_0$ with distribution $F$ possesses a domain of attraction iff it is stable.

Proof: Given in Feller (1971) [25].

Hence the non-degenerate limiting distribution of a sum of gamma distributions must be a member of a stable class. However, gamma distributions are not stable, although they are infinitely divisible. Stable distributions are infinitely divisible, but the converse does not hold: the Poisson distribution, for example, although infinitely divisible, is not stable (see Billingsley (1986) [6], Section 28, page 389). The class of infinitely divisible distributions is larger than the class of stable distributions. That leads us to explore other possibilities.

Consider now two independent gamma distributions $X_1 \sim Ga(\nu_1, \lambda_1, \alpha_1)$ and $X_2 \sim Ga(\nu_2, \lambda_2, \alpha_2)$. The graph of their joint distribution is given below in Figure 2.1.

Figure 2.1: Graph of a bivariate gamma distribution.

We characterize the multivariate gamma distribution in the next chapter.

Chapter 3
The Multivariate Gamma

In this chapter, we propose and characterize a quite general multivariate gamma distribution with specified three-parameter gamma marginals of Type III in Pearson's system. We introduce some notation and basic concepts of multivariate gamma distributions. Since the mean, variance and covariance structure are the primary summaries of a multivariate distribution, we give these properties. We also propose a series of parameter estimation techniques that can be adapted to special classes of distributions, such as those discussed in Chapter 4 (the continuous class) and Chapter 5 (the fatal shock model).

Multivariate gamma distributions belong to the class of multivariate lifetime distributions, since all the components of the random vector have positive support and their marginals follow typical lifetime distributions. Hougaard (2000) [36], among other authors, gives many applications of multivariate lifetime distributions.
As he mentioned, the fleld has gaps and is still growing both in theory and application. The bivariate exponential, one simple case of bivariate gamma, plays a fundamental role in survival and reliability analysis because of its adaptation to many practical situations. The complexity of the model, in its notation and estimation technique, increases with the number of components. Within this framework, Moran (1969) [69], among other authors, had also described a form of a bivariate gamma process and given estimation techniques of one parameter using the other variate as a control. It has been di?cult to work with until the development of proportional hazard models or frailty models. See Cox (1972) [20], Ghosh and Gelfand (1998) [30] and Hougaard (2000) [36] to mention a few. However, 39 we do not deal with the proportional hazard model here, since it is primarily used for the univariate survival case. In fact, Cox?s proportional hazard model, sometimes considered as nonparametric, is typically treated as multivariate in only the recurring events situations, such as monitoring times until repeated hospitalizations of the same individual, or repeated failures of a device in a system. Our model is truly multivariate in the sense that the associated parameters are mathematically independent and we can model components with difierent means and variances. 3.1 Properties and Characterization In the next deflnition, we formally deflne our multivariate gamma that was brie y described in (1.1). Deflnition 3.1 : Let X0;X1;:::;Xp gamma r.v.?s with shape parameters fii, scale para- meters ?i and location parameters ?i, i.e Xi ? Ga(?i;?i;fii) for i = 0;1;:::;p as given in (1.2). Let Z1;Z2;:::;Zp be mutually independent r.v.?s satisfying Xi = aiX0 + Zi for some ai > 0 and Zi independent of X0 for i = 1;2;:::;p. We deflne the joint distribution of the random vector X = (X1;X2;:::;Xp)0 as the multivariate gamma distribution for a flxed integer p. Although the density function is not explicitly expressed in Deflnition 3.1, we can study various properties of the multivariate gamma distribution by using the linear relationship given in (1.1), and the fact that Zi; i = 1;::;p; are independent of each other and of X0. Note that the distribution of Zi; i = 1;::;p; has not been specifled. As we will explain later, depending on the choice of the constants ai, the distributions of Zi take on many forms. In fact, in some cases, Zi is a gamma, and in other cases, Zi is a mixture of distributions. 40 The linear correlation between Xi and the latent term X0 is expressed as: ?(Xi;X0) = Cov(Xi;X0)pVar(X i) pVar(X 0) ; where Cov(Xi;X0) = E(X0Xi)?E(X0)E(Xi) is the covariance between X0 and Xi, and Var(X0) and Var(Xi) denote the variances of X0 and Xi; respectively. Note that if the Zi were degenerate, say Zi ? bi; or P(Zi = bi) = 1; bi 2 R+; i = 1;2;::;p; then the linear correlation, a measure of linear dependence, would be perfect. That is Zi ? bi ) Xi = aiX0 + bi, which gives ?(X0;Xi) = ?1; or more precisely ?(Xi;Xj) = sgn(aiaj). The degenerate case is the only case where j?j = 1: This linear relation (1.1) seems to be natural as pointed out and motivated in Carpenter et al. (2006) [12], where properties for the exponential model are given. Similarly, from this structural relationship, we derive the following values of the means, variances, and covariances: E(Xi) = fii? i +?i; (3.1) E(Zi) = E(Xi)?aiE(X0) = (fii? i ?aifi0? 
0 )+(?i ?ai?0); (3.2) Var(Xi) = fii?2 i ; (3.3) Var(Zi) = Var(Xi)?a2iVar(X0) = fii?2 i ?a2i fi0?2 0 > 0 () fiifi 0 > a i?i ?0 ?2 ; (3.4) Cov(Xi;Xj) = aiajVar(X0) = aiajfi0?2 0 () ?(Xi;Xj) = aiaj?i?j?2 0 fi0p fiifij; (3.5) Cov(Zi;Zj) = 0; (3.6) for i 6= j; i;j = 1;::;p: 41 It is clear from Deflnition 3.1 and Equation (3.5) that Xi and Xj; i 6= j; i;j = 1;::;p; are positively correlated. This is a restriction in our model, but not unusual in this fleld. See He and Lawless (2005) [33] for more examples of positive correlations in lifetime data. Also, note that (3.1), (3.3), (3.4) and (3.5) can be more concisely expressed in the mean vector of the mean of X: E(X) = fi 1 ?1 +?1; fi2 ?2 +?2;???; fip ?p +?p ?0 ; (3.7) the variance/covariance matrix: ? = 0 BB BB BB BB B@ fi1 ?21 a1a2 fi0 ?20 ??? a1ap fi0 ?20 a2a1fi0?2 0 fi2 ?22 ??? a2ap fi0 ?20 ... ... ... ... apa1fi0?2 0 ??? apap?1fi0?2 0 fip ?2p 1 CC CC CC CC CA = fi0?2 0 0 BB BB BB BB B@ ?20 ?21 fi1 fi0 a1a2 ??? a1ap a2a1 ?20?2 2 fi2 fi0 ??? a2ap ... ... ... ... apa1 ??? apap?1 ?20?2 p fip fi0 1 CC CC CC CC CA ; and the correlation matrix: ? = fi0?2 0 0 BB BB BB BB B@ ?20 fi0 a1a2?1?2p fi1fi2 ??? a1ap?1?pp fi1fip a2a1?2?1p fi1fi2 ?20 fi0 ??? a2ap?2?pp fi2fip ... ... ... ... apa1?p?1p fipfi1 ??? apap?1?p?p?1p fipfip?1 ?20 fi0 1 CC CC CC CC CA : For Xi ? Ga(?i;?i;fii) for i = 0;1;???;p, based on the LST given in (2.1) and using the binomial theorem, we have the following: dmLXi(s) dsm = ? fi mX k=0 m k ??dk(? i +s)?fi dsk ?dm?ke??is dsm?k ; for m 2N: Setting s = 0, we have that 42 E(Xmi ) = mX k=0 m k ? ?fi+ki ?m?ki (fii)k; where (fi)k = fi(fi+1)???(fi+k?1): The moments and cross-moments are given for m and l 2N; by: E(Xmi ) = E(aiX0 +Zi)m = mX r=0 m r ? ariEXr0EZm?ri = mX r=0 m r ? EZm?ri ari rX k=0 r k ? ?fi+ki ?r?ki (fii)k and E(Xmi Xlj) = E(aiX0 +Zi)m(ajX0 +Zj)l = E mX r=0 m r ? ariXr0Zm?ri lX s=0 l s ? asjXs0Zl?sj = mX r=0 lX s=0 m r ? l s ? ariasjEXr+s0 EZm?ri EZl?sj ; i 6= j; i;j = 1;2;:::;p: The LST has many helpful properties as discussed in Section 2.1. In particular, in this dissertation, we use it for many of the derivations of the distributions of the latent variables. The multivariate LST for any p-variate random vector X deflned in (1.1) is: LX(s) = Ee?s?X = Ee? Pp i=1 siXi = Ee? Pp i=1 si(aiX0+Zi) = E pY i=1 e?aisiX0e?siZi = Ee?X0 Pp i=1 aisi pY i=1 Ee?siZi = LX0 pX i=1 aisi ? pY i=1 LZi(si) = LX0 pX i=1 aisi ? pY i=1 LXi(si) LX0(aisi) ; 43 for s = (s1;s2;:::;sp)0; regardless of the underlying distributions as long as the LST exists. If X0 ? Ga(?0;?0;fi0); then the above becomes LX(s) = e??0 Pp i=1 aisi ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi): (3.8) Note that from the above, the LST of Zi is: LZi(s) = LXi(s)L X0(ais) ; for i = 1;2;::;p: (3.9) As long as LZi in Equation (3.9) is completely monotone as in Deflnition 2.13, the conditions of Deflnition 3.1 are satisfled, and the marginals are gamma. Equation (3.9) will be used extensively throughout this dissertation, but for now we present an important theorem that gives some properties on the scale and shift transformations of the multivariate gamma distribution given in Deflnition 3.1. Theorem 3.2 The class of multivariate gamma distributions based on Deflnition 3.1 is closed under scale transformation in the sense of products with diagonal matrices with pos- itive entries, shift transformations, and under flnite independent convolutions. Proof: Consider a multivariate gamma X = (X1;???;Xp)0, and let C = (ci)i=1;::;p be a diagonal matrix with positive entries. Set Y = CX. 
Then we have that LY(s) = Ee?s?Y = Ee? Pp i=1 siYi = Ee? Pp i=1 cisi(aiX0+Zi) = E pY i=1 e?aicisiX0e?cisiZi = Ee?X0 Pp i=1 aicisi pY i=1 Ee?cisiZi = LX0( pX i=1 aicisi) pY i=1 LZi(cisi) 44 = e??0 Pp i=1 aicisi ? 0 ?0 +Ppi=1 aicisi ?fi0 pY i=1 LXi(cisi) LX0(aicisi); for s = (s1;s2;:::;sp); which is of the same form as in (3.8). The mean EY = CEX and the variance is expressed as: Var(Y) = C0Var(X)C: Let d = (d1;d2;???;dp) and W deflned as: W = (W1;W2;???;Wp)0 := X + d = (X1 +d1;X2 +d2;???;Xp +dp). Then, we have the following: LW(s) = Ee?s?W = Ee? Pp i=1 siWi = Ee? Pp i=1 si(Xi+di) = Ee? Pp i=1 sidie? Pp i=1 si(aiX0+Zi) = Ee? Pp i=1 sidi pY i=1 e?aisiX0e?siZi = e? Pp i=1 sidiEe?X0 Pp i=1 aisi pY i=1 Ee?siZi = e? Pp i=1 sidiLX0( pX i=1 aisi) pY i=1 LZi(si) = e? Pp i=1 sidie??0 Pp i=1 aisi ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi) = e? Pp i=1(di+?0ai)si ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi); which is of the same form as in (3.8). So the multivariate gamma is closed under shift transformation of the form W = X+d. Consider now n independent p-variate gamma distributions Xi=(Xi1;Xi2;???;Xip)0 for i = 1;???;n; and set X = X1 +???+Xn= (X11 +???+Xn1;X12 +???+Xn2;???;X1p +???+Xnp)0. LX(s) = Ee?s0(X1+???+Xn) = Ee? Pp i=1 si(X1i+???+Xni) 45 = pY i=1 Ee?si(X1i+???+Xni) = LX1(s)???LXn(s); which has the form of n product of (3.8). Recall that the distribution of Zi has not been specifled, but its LST is given in (3.9). In the next theorem, we state a very interesting property, as it gives a necessary and su?cient condition for all the latent variables Zi to be of the gamma type distributions (1.2). Theorem 3.3 Suppose that for some ai > 0; i=1,...,p, Xi ? Ga(?i;fii) i=0,1,...,p, are related as in Deflnition 3.1 with Zi; i = 1;:::;p; then Zi ? Ga(?i;fii ?fi0) () ai = ?0? i and pi = P(Zi = 0) = 0: Proof: From the expression of ai = ?0? i and the result in (1.10), we have that aiX0 ? Ga(?i;fi0): Since Xi ? Ga(?i;fii); it follows from Example 2.3 (the additive property of the gamma distribution) that Z must be distributed as Zi ? Ga(?i;fii ?fi0): Conversely, assume that Zi ? Ga(?i;fii?fi0). Then from Example 2.2 and the independence between X0 and Zi, we have that: LXi(t) = LX0(ait)LZi(t) =) LX0(ait) = ? 0 ?0 +ait ?fi0 = ? i ?i +t ?fii ? i ?i +t ?fi0?fii : Hence ?0? 0 +ait = ?i? i +t ; and solving for ai gives ?0 +ait ?0 = ?i +t ?i =) 1+ ait ?0 = 1+ t ?i or ai = ?0 ?i : Mathai and Moschopoulos (1991) [63] among other authors made use of this particular coe?cientoflinearrelationshipbetweentwogammadistributions, anddevelopedproperties, 46 approximations, and inequalities based on that particular model. In this dissertation, we allow for more general representations. In Chapter 4, we examine further this speciflc form, and call it the continuous class of distributions. In all other cases where the conditions of Theorem 3.3 are not satisfled, we call that form the \fatal shock" class, because there is a positive probability of simultaneous or proportional occurrences. In the Section 3.2, we focus on the characterization of the distribution of Zi; i = 1;2;::;p in the fatal shock class. This last type of distributions is further studied in Chapter 5. All the Zi?s take on similar structures. To simplify the notations, without lost of generality, we drop the indices, and we rewrite (1.1) as: Y = aX +Z; for some a > 0; (3.10) where Y;X;a and Z in (3.10) play the role of Xi;X0;ai and Zi in (1.1) respectively. A simple computation shows that the cdf of Z is: P(Z ? z) = P(Y ?aX ? 
z) = P(Y = aX)+(1?p)P(X ? Y ?za ) = P(Y = aX)+(1?p) ? 1?P(X ? Y ?za ) ? = P(Y = aX)+(1?p) ? 1? Z 1 z FX(y ?za )fY (y)dy ? : Since Z 1 z FX(y ?za )fY (y)dy = FX(y ?za )FY (y) ?1 z ? Z 1 z 1 afX( y ?z a )FY (y)dy 47 = 1? Z 1 z 1 afX( y ?z a )FY (y)dy; the cdf can be further simplifled as P(Z ? z) = P(Y = aX)+ 1?pa Z 1 z fX(y ?za )FY (y)dy; where P(Y = aX) = P(Z = 0) is the probability of proportional occurrence. Note that if the conditions of Theorem 3.3 are not satisfled, then P(Y = aX) = P(Z = 0) > 0: In Section 3.2, we further characterize the distribution of Z: 3.2 The Fatal Shock Model In this section, we study the form of the distribution of Z obtained in (3.10) for a general a > 0 where a 6= ?0? i . Suppose that X and Y are distributed as X ? Ga(?;?;fi) and Ga(?0;fi0;?0), respec- tively. Assuming a linear relationship as in (3.10), and from Equation (3.9), where a is some positive constant, and Z is independent of X, the LST of Z becomes: LZ(s) = LY(s)L X(as) = e(???0)s ?0 ?0 +s ?fi0 ?+as ? ?fi : All distributions Z satisfying this relation (3.10) for all a 2 (0;1); are called members of the self decomposable r.v.?s. See also Gaver and Lewis (1980) [29]. Previously, we have studied the properties of the r.v. Z, derive its LST in (3.9), mean in (3.2), variance and covariance in (3.4) and (3.6), respectively. However, the form of the distribution of Z has not been given. In the next theorems, we give the required forms that 48 the distribution of Z must take on to give gamma marginals. We state the main theorems, and start with the Erlang case deflned in (3.10). Theorem 3.4 Erlang type distributions. Let X ? Ga(?;?;fi) and Y ? Ga(?0;?0;fi0) such that fi;fi0 2N;fi ? fi0 and as in (3.10), Z is independent of X. Then Z is the shifted sum of independent: ? fi r.v.?s, each being product of Bernoulli and exponential distribution with parameter ?0; ? and of fi0 ?fi gamma r.v.?s with scale ?0 and shape k = 1;::;fi0 ?fi: Proof: Using the Laplace transform technique, we have: LZ(s) = LY(s)L X(as) = e?(???0)s? 0fi0 ?fi (?+as)fi (?0 +s)fi0 = e?(???)s? 0fi0 ?fi (?+as)fi (?0 +s)fi ? 1 (?0 +s)fi0?fi ? By the theorem of expansion in partial fractions, with fi instead of fi0 ?fi, 1 (?0 +s)fi = fiX k=1 Ck (?0 +s)k where Ck are real numbers. Hence: LZ(s) = e?(???0)s ? 0fi0 ?fi ? p +(1 ?p)LY(s) ?fi fi0?fiX k=1 Ck ?0k ?0 ?0 +s ?k : where Ck; k = 1;2;:::;fi0 ?fi; are constant and p = a ??0: Based on the properties of the LST from Section 2.1, we have that ? p + (1 ? p)LY(s) ? is the product of Bernoulli and exponential r.v.?s as discussed in Example 2.4 and in Carpenter et al. (2006) [12]. So? p+(1?p)LY(s) ?fi is the product of fi independent of those terms. 49 Also, sincetheLSTofasumoffunctionsisthesumoftheirLST,then fi0?fiX k=1 Ck ?0k ?0 ?0 +s ?k is the LST of the sum of fi0 ?fi gamma distributions. The special case of the exponential where fi = fi0 = 1 has been studied in Carpenter et al. (2006) [12], and by Iyer et al. (2002) [42]. A similar result in the gamma context has been given in Carpenter and Diawara (2006) [11]. Theorem 3.5 Suppose X ? Ga(?;?;fi), with fi 2N; and Y ? Ga(?0;?0;fi0) as in (3.10), and fi ? fi0. Then Z is the sum of two r.v.?s: ? a Ga(???0;?0;fi0 ?fi) and ? another distribution which is the sum of fi independent r.v.?s each of which is a product of a Bernoulli and an exponential with parameter ?0. Proof: From the expression of LZ, LZ(s) = ??0 ? (?+as) (?0 +s) ?fi e(???0)s ? ?0 ?0 +s ?fi0?fi ; the result follows. 
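The exponential special case alpha = alpha' = 1 noted above is easy to verify by simulation. The following sketch (our own illustration, assuming Python with numpy/scipy and zero location parameters) draws Z as a mixture of a point mass at zero, with probability p = a*lambda'/lambda, and an exponential with rate lambda', and checks that aX + Z is again exponential with rate lambda':

# Sketch (our own check): exponential case of the fatal shock representation.
# X ~ Exp(rate lam), Z = Bernoulli(1-p) * Exp(lam_y) with p = a*lam_y/lam,
# and Y = a*X + Z should be Exp(rate lam_y).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, lam_y, a, size = 2.0, 1.0, 0.8, 200_000
p = a * lam_y / lam                     # probability of proportional occurrence
assert 0 < p < 1

x = rng.exponential(scale=1.0 / lam, size=size)
z = rng.binomial(1, 1 - p, size) * rng.exponential(scale=1.0 / lam_y, size=size)
y = a * x + z

print(stats.kstest(y, "expon", args=(0, 1.0 / lam_y)))

The Kolmogorov-Smirnov statistic should be small and its p-value large, since the mixture representation is exact rather than an approximation.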
Also, assuming Z = X2 ?aX1, and using the Taylor approximation of log(1+s) = 1X n=0 (?1)n+1 (n+1)!s n+1; the cumulants of Z are obtained as logLZ(s) = logLX2(s)?logLX1(as) = fi1 log(1? as? 1 )?fi2 log(1? s? 2 ) 50 = fi1 1X n=0 (?1)n+1 (n+1)!( as ?1) n+1 ?fi2 1X n=0 (?1)n+1 (n+1)!( s ?2) n+1 = 1X n=0 (?1)n+1 (n+1)!s n+1 ? fi1( a? 1 )n+1 ?fi2( 1? 2 )n+1 ? Another characterization of Z is obtained by considering several cases for the Ga(?;?;fi) and Ga(?0;?0;fi0). Corollary 3.6 Let fi = fi0 2N, then Z is a shifted sum of fi independent r.v.?s, each being a product of a Bernoulli with mean (1?p) and an exponential with parameter ?0, where the value of the shift is ???0. Proof: fi 2N. Then the LST of Z can be expressed as: LZ(s) = e(???0)s ??0 ? (?+as) (?0 +s) ?fi = e(???0)s ? (1?p)( ? 0 ?0 +s)+p ?fi : Theorem 3.7 Suppose X ? Ga(?;?;fi), and Y ? Ga(?0;?0;fi0) as in (3.10), with fi < fi0, and assume that [fi+1] ? fi0. Then Z is is the sum of ? [fi+1] independent r.v.?s, each being a product of a Bernoulli with mean (1?p) and an exponential with parameter ?0 ? Ga(???0;?0;fi0 ?[fi+1]) ? Ga(0;?=a;[fi+1]?fi) 51 Proof: LZ(s) = ??0 ? ?+as ?0 +s ?[fi+1] e(???0)s ? ?0 ?0 +s ?fi0?[fi+1]??+as ? ?fi?[fi+1] and notice that: ? ?+as ? ?fi?[fi+1] = ? ? ?+as ?[fi+1]?fi : In fact, instead of assuming [fi+1] ? fi0, it su?ces to assume that [fi+?] ? fi0 for some ? 2 [0;1]: Theorem 3.8 Suppose X ? Ga(?;?;fi), and Y ? Ga(?0;?0;fi0) as in (3.10), with fi = fi0 =2N, then Z does not have the LST of some measure. Proof: fi =2N The explicit formula in terms of the LST is LZ(s) = ??0 ? (?+as) (?0 +s) ?[fi] ? ? ?0 ?0 +s ?fi?[fi] ?e(???0)s ??+as ? ?fi?[fi] The question we have is how to interpret ??+as ? ?fi?[fi] : However, set `(s) = ??+as ? ?fl ; where fl = fi?[fi] 2 [0;1]: `(1)(s) = fla? ??+as ? ?fl?1 `(2)(s) = fl(fl ?1) a ? ?2??+as ? ?fl?2 ... `(n)(s) = fl(fl ?1)???(fl ?n+1) a ? ?n??+as ? ?fl?n ; 52 So `(0) = 1, and has derivatives that alternate in sign. However, by Theorem 2.14, from Feller (1971) [25], (?1)n`(s) ? 0; for all n = 0;1;2;::: ` is not completely monotone. We have that ` is not the Laplace transform of some probability distribution. Theorem 3.9 Let X ? Ga(?;?;fi) and Y ? Ga(?0;?0;fi0) such that fi 2N; fi0 2R+; fi0 ? fi and as in (3.10). Then Z has LST of some measure ifi fi = fi0. Proof: LZ(s) = LY(s)L X(as) = e(???0)s ?0 ?0 +s ?fi0 ?+as ? ?fi = e(???0)s ??0 ? (?+as) (?0 +s) ?fi??+as ? ?fi?fi0 : The expression ??+as ? ?fi?fi0 is completely monotone ifi fi ? fi0. Since fi0 ? fi must be satisfled also, then fi = fi0: We have seen that the distribution of Z takes on many difierent forms depending on the parameters that are given. In Theorem 3.3, we found the only case where the latent variable is of continuous gamma type as the marginals. This case is further studied in Chapter 4. The results from Theorem 3.4 up to Theorem 3.9 give the various forms that the distribution of Z can take. Theorem 3.4 and Theorem 3.5 are similar, as they express the distribution of Z as a sum of product of Bernoulli and exponential, and that of gamma r.v.?s. The special case, given in the Corollary 3.6, states that the Erlang case where fi = fi0 2 N; leads to just the product of fi Bernoulli and exponential. These results give alternative ways of characterizing the distribution of the latent variables, an opportunity to replace the unconditional expectation, density by the unconditional expectation, density. That allow to increase the e?ciency in the variance covariance matrix of parameters. In all those cases, 53 fi ? 
fi0 is a required condition for the model validation. The Theorem 3.7 shows that there must be an integer value between fi and fi0 to justify the gamma marginals for Y and X in the Equation (3.10). If there is no such integer value between fi and fi0, then Z does not have the LST of some measure. It is not be represented by a distributional form. Theorem 3.9 conflrms that flnding in the sense that it gives back the result obtained in Corollary 3.6. 3.3 The Multivariate Gamma The previous section overviews the possible forms of distribution that Z, based on Equation (3.10), can take. Our primary goal is to obtain the multivariate gamma form of X as in Deflnition 3.1. That means we need to study the distribution ofZ = (Z1;Z2;???;Zp) based on Equation (1.1) and Deflnition 3.1. In Chapter 4, we study the case where the latent variables Zi are all of gamma types as in the result obtained from Theorem 3.3. In that case the probability of proportional occurrence is pi = 0; for i = 1;2;::;p. Many authors have suggested that formulation. Mathai and Moschopoulos (1991) [63] use that form to give a version of multivariate gamma. Theygivepropertiesofsuchmodelintermsofmeans, varianceandcovarianceofparameters, approximations, and inequalities. In this dissertation, we have proposed a general model as in Deflnition 3.1 of a multivariate gamma, that simplifles to the one proposed by Mathai and Moschopoulos (1991) [63], and Mathai and Moschopoulos (1992) [64]. Nadarajah and Kotz (2005) [72] derive a linear combination of exponential and gamma r.v.?s. They assume that there is no probability of simultaneous occurrence between Xi and X0: However, as given in the famous paper by Marshall and Olkin (1969) [61], that is not meaningful in \shock model" conditions. Our model captures the shock model that Marshall and Olkin 54 (1967) [61] studied in the simple case of the multivariate exponential distribution. That lead us to elaborate on the multivariate exponential distribution, and the results are given in Chapter 5. When ai = ?0? i ; ;i = 1;2;::;p in Equation (1.1), then X reduces to multivariate gamma proposedbyMathaiandMoschopoulos(1991)[63]. Wegivetheformofthejointdistribution of Z in the next chapter, its interprtation and estimation procedures of its parameters. Iyer et al. (2002) [42] and Iyer et al. (2004) [43] dealt with the case of positive flxed ai?s in the one dimensional case assuming that X0 and Xi are exponential with parameters ?0 and ?i; ;i = 1;2;::;p; respectively. Carpenter et al. (2006) [12] showed that the density of Zi is then expressed as: fZi(z) = pi?(z)+(1?pi)fXi(z)I(z>0); (3.11) where pi = P(Zi = 0) = ai ?i?0; i = 1;2;::;p and ? is the Dirac delta function given in Equation (2.2). The result from Carpenter et al. (2006) [12] is an adaptation of Corollary 3.6, and the multivariate form of the distribution of Z is given in Chapter 5. Based on the multivariate form of Z; we derive the form of the multivariate distribution of X. We then avoid problems that many authors, say Fang and Zhang (1990) [24] and Walker and Stephens (1999) [79] for examples, run into when trying to characterize the multivariate gamma distribution. We have a close form expression, a feature that was di?cult to derive from Henderson and Shimakura (2003) [34]. The expression is not simple, but it is tractable in contrast to what Mathai and Saxena (1978) [65] have suggested in terms of inflnite series. 
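Before turning to the autoregressive extension, the continuous-class reduction just described, with a_i = lambda_0/lambda_i as in Theorem 3.3, can be simulated directly. A minimal sketch (our own, with zero location parameters, assuming Python with numpy) builds a trivariate version and compares the empirical covariances with Cov(X_i, X_j) = a_i a_j alpha_0/lambda_0^2 from (3.5):

# Sketch (our own illustration): continuous class with a_i = lambda_0/lambda_i.
# Z_i ~ Ga(shape alpha_i - alpha_0, rate lambda_i) gives gamma marginals
# X_i ~ Ga(shape alpha_i, rate lambda_i); off-diagonal covariances follow (3.5).
import numpy as np

rng = np.random.default_rng(3)
alpha0, lam0 = 1.5, 1.0
alphas, lams = np.array([2.5, 3.0, 4.0]), np.array([1.2, 0.7, 2.0])
a = lam0 / lams
size = 300_000

x0 = rng.gamma(alpha0, 1.0 / lam0, size)
z = rng.gamma(alphas - alpha0, 1.0 / lams, size=(size, 3))
x = a * x0[:, None] + z                       # columns are X_1, X_2, X_3

theory = np.outer(a, a) * alpha0 / lam0**2    # off-diagonal covariances
print(np.cov(x, rowvar=False).round(3))
print(theory.round(3))

Only the off-diagonal entries of the two printed matrices are expected to agree; the diagonal of the empirical matrix estimates Var(X_i) = alpha_i/lambda_i^2 instead.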
55 3.4 The Autoregressive Model In this section, we focus on another approach to obtain a model similar to the direct linkage. This is an important extension to the multivariate distribution with correlation structure following the flrst order autoregressive scheme as stated for further research in the conclusion of Minhajuddin et al. (2003) [68]. In some applications, the parameter of interest depends on time. Time series could then be applied to the phenomena which is described linearly as Xt = atXt?1 +Zt for t 2 N: Brockwell and Davis (1996) [10] describe a similar model with some flxed at = a; jatj < 1 and the Zt are iid normal with EZ2t = 2. That gives that E(XtjFt?1) = aXt?1 where Ft?1 is the set of measurable information up to time t?1: Let Xt denote the observation at time t. The sequence of r.v.?s (X1;X2;???) is known as atimeseries. AgraphofXt againstt = 1;2;???isatimeplot, andcanrevealtrends(upward, downward, variant) or seasonal variation. A model for a time series is described by the joint distributions of the r.v.?s (X1;X2;???;Xn), which is P(X1 ? x1;X2 ? x2;???;Xn ? xn) for every integer n = 1;2;::: However, in practice, only partial information (such as mean, variance) may be avail- able. Also, we cannot write the joint density as the product of the marginal densities since independence between the Xi?s is not a workable assumption in our case. The time series model appears in the class of self decomposable distributions. Deflnition 3.10 A r.v. X is called self decomposable if for every 0 < a < 1, there exists a r.v. Z = Z(a) independent of X such that X d= aX +Z: 56 Some desirable properties follow: ? Self decomposable r.v.?s are contained in the class of inflnitely divisible r.v?s. See Feller (1971) [25]. ? The time series version is Xt = aXt?1 +Zt where Zt are independent of each other of of Xs; s < t. ? If a ! 0, then Xt are independent. If a ! 1, then Xt are perfectly dependent. Joe (1997) [44] chapter 8, discusses some special cases. ? If X is exponential with mean 1, then Z is a mixture of a point mass with weight a and the exponential with mean 1. ? If X is an Erlang distribution Ga( ;1), 2 N, then Z is again a mixture Z =8 >< >: 0; with prob. a ; Erlang Ga(j;1); with prob. pj where pj; 1 ? j ? ; are prob. from binomial ( ;1?a). ? If X is Ga( ;1), 2R?N, then ?t = 8> < >: 0; with prob. a ; Ga(j,a); with prob. pj; where pj; j ? 1; are prob. from negative binomial ( ;a), pj = ?( +j)j!?( ) a (1?a)j: The class of linear relations is quite large, and may contain unobservable factors. Among the alternatives, we consider the compound autoregressive model given by X = (Xt)t?0: Since the beginning of time series analysis, one popular model assume that the variable Xt of the process is related to its past development and on a random disturbance sequence 57 Zn which is not related to the past. The process is called a Markov process if the densities of Xt are normally distributed. Deflnition 3.11 The process X is called a compound autoregressive process of order p, denoted CAR(p) if and only if the conditional distribution of Xt given Xt?1 admits the conditional Laplace form: E[e?sXtjXt?1] = e?a1Xt?1?a2Xt?2?????apXt?p+b where ai = ai(s) i = 1;:::;p and b = b(s) are functions of the admissible values s in R: Our goal is to represent the data based on our probability model (1.2) and on this family of model (1.1). Then it becomes possible to estimate parameters, check for goodness of flt, and use our understanding of the mechanism to generate adjustments. 
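The exponential item in the list above corresponds to the first-order autoregressive construction of Gaver and Lewis (1980) [29]. A small sketch (our own illustration, assuming Python with numpy) simulates X_t = a*X_{t-1} + Z_t with Z_t equal to zero with probability a and Exp(1) otherwise, so that every X_t is marginally exponential with mean one and the lag-one correlation equals a:

# Sketch (our own illustration): self-decomposable exponential AR(1) recursion.
# Z_t is a mixture of a point mass at 0 (weight a) and an Exp(1) variable, so
# the stationary marginal of X_t is Exp(1) and corr(X_t, X_{t-1}) = a.
import numpy as np

rng = np.random.default_rng(4)
a, n = 0.6, 50_000

x = np.empty(n)
x[0] = rng.exponential(1.0)                 # start in the stationary law
for t in range(1, n):
    z = 0.0 if rng.random() < a else rng.exponential(1.0)
    x[t] = a * x[t - 1] + z

print(x.mean(), x.var())                    # both should be close to 1
print(np.corrcoef(x[:-1], x[1:])[0, 1])     # lag-1 autocorrelation close to a

Starting the chain from an Exp(1) draw makes the simulated series exactly stationary.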
We assume that Xt = aXt?1 +Zt;t = 1;2;::: where Zt are independent, and also each Zt is independent of Xt0 for each t0 < t. If Z = fZtgt were normally distributed with mean 0 and variance 2, then the model would be an autoregressive model, denoted AR(1), and the Zt are commonly referred to as white noise. The process would then be stationary linear if jaj < 1: Here, we have that: Xt = aXt?1 +Zt = a(aXt?2 +Zt?1)+Zt = a2Xt?2 +aZt?1 +Zt = ... = atX0 +at?1Z1 +at?2Z2 +???+aZt?1 +Zt = atX0 + t?1X k=0 akZt?k Therefore ?t = EXt = atEX0 +Pt?1k=0 akEZt?k 58 and Var(Xt) = a2tVar(X0)+Pt?1k=0 a2kVar(Zt?k): Also, it follows that: Cov(Xt+k;Xt) = Cov(aXt+k?1 +Zt;Xt) = aCov(Xt+k?1;Xt) = a2Cov(Xt+k?2;Xt) = ??? = akCov(Xt;Xt) = akVar(Xt) The dependence decreases with the lag. That is, as k increases, (Xt+k;Xt) has less depen- dence. We would like the model to be stationary. The process X = (Xn)(n=0;1;2;???) and its k shift, kX = (Xn+k)n=0;1;2;???, share similar similar properties, and we write: X d= kX: We say that X is weakly stationary if it satisfles the following two properties: 1. the mean of Xt does not depend on t 2. for each k = 1;2;???; Cov(Xt+k;Xt) does not depend on t: This weakly stationarity property is achieved only if a = 1, and EZt = 0: Rather than considering only nonnegative values of t, one could consider a shifted sequence on Z: Assume that X0 = aX?1+Z0 and that X?1 = aX?2+Z?1 and so on... It follows that: Xt = atX0 + t?1X k=0 akZt?k = at+1X?1 + tX k=0 akZt?k = ... = 1X k=0 akZt?k: 59 Then Xt = 1X k=0 akZt?k: For the series X to be stationary, i.e its mean and covariance independent of t, the required condition is that of absolute summability of the coe?cients fakgk=0;1;2;:::: i.e. 1X j=0 jakj = 1X j=0 ak < 1: This is achieved if and only if jaj = a < 1: Let us denote the coe?cients ak as `k: Then Xt = P1j=0 `kZt?k: Moreover, if we can express Zt in terms of Xt?s, then we say that the process is in- vertible. This is achieved, and we can express Zt = 1X j=0 ?jXt?j; where ?0 = 1; ?1 = ?a; and ?j = 0 for j > 1: The compound autoregressive process deflned earlier in terms of the Laplace transform with autoregressive path dependence can be expressed as: E[e?sXtjXt?1] = e?asXt?1?sZt and the log of the conditional Laplace transform is a linear function of Xt?1: 3.5 Summary In this chapter, we have generated a general class of multivariate gamma distributions. We call it generalized gamma distribution, not because the marginals are the generalized gamma distribution as deflned in Kotz et al. (2000) [53], but because it generates two very distinct and broad subclasses of distributions: ? Thecontinuousclassmodel, wherethelatentvariableisofthesamegammafamily. We study that class in Chapter 4. This class has no discontinuities in the joint densities. 60 ? The fatal shock model, where all marginal distributions are gamma, but all the latent variables are gamma mixtures, and there are discontinuities due to the probability of simultaneous occurrence. In Chapter 5, we study that model under the exponential case, because of the wide use and applications of the exponential distributions. 61 Chapter 4 The Continuous Case Model This chapter studies the continuous class of the model given in Chapter 3. The latent variables are all continuous, and of speciflc same family gamma type distributions. 
Our goal is to flnd the joint multivariate density of the latent variables Zi?s deflned in (1.1), assuming that the conditions of Theorem 3.3 are satisfled, that is the Zi?s are independent gamma distributed for i = 1;2;::;p. Nadarajah and Kotz (2005) [72] describe the distribution of the linear combinations of exponential and gamma type r.v.?s. Our approach is difierent as the mariginal distributions are specifled, and the coe?cients ai in (1.1) are chosen such that the Zi?s are of the same family of gamma distributions, for i = 1;2;::;p. Methods of estimations of parameters associated with the latent variables are also proposed. 4.1 Deflnition of Model Mathai and Moschopoulos (1991) [63] considered a multivariate gamma model where the dependence structure is obtained by adding a factored common r.v. to every univariate marginal, exploiting the additive property. In their following paper [64], they consider an additive form of gamma distributions with common scale parameter. In Kotz et al. (2000) [53, page 468], a model in reliability applications is described as a system where a failing component is replaced by an identical one so that the process never stops until the total items in the system fail. The Deflnition 3.1 is in the spirit of the Mathai and Moschopoulos (1991) [63] mul- tivariate gamma with a exible dependence structure for non-symmetric and nonnegative 62 support. It provides a model flt for many situations. A slightly similar idea has been sug- gested by Hougaard (1986) [35], where the class of multivariate lifetime distributions had a dependence created by an unobserved quantity. We next recall the deflnitions and most important properties of the multivariate gamma distribution in the sense of Mathai and Moschopoulos (1991) [63]. The distribution of the sum with any shape and scale parame- ters is also analyzed. Note that X in Deflnition 3.1 does not necessarily follow the multivariate gamma unless we pick appropriate values for the ai; i = 1;2;???;p; as it is proved in Mathai and Moschopoulos (1991) [63] and (1992) [64]. If fi0 ! 0, then X0 ! 0 a:s:, and then the multivariate gamma consists of n indepen- dent gamma distributions. Carpenter et al. (2006) [12] marginally deflned the Xi?s as exponential, and showed that Zi is then a product of a point mass and an exponential distribution. Despite the theoretical universality, the difierence is in the concept. They start from a bivariate distribution, say X1 and X0, and derive the conditional distribution of the associated Z1, whereas in most general situations, one describes the joint distribution of X0 and X1 with Z1 known. Mathai and Moschoupolos (1992) [64] showed that for X0 = Z0;Z1;Z2;???;Zp; mutu- ally independent gamma r.v.?s with parameters such that X0 = Z0 ? Ga(?0;?0;fi0) and Zi ? Ga ?i ? fi0? 0 ?i;?i;fii ? fi0 ? for i = 1;???;p, then Xi satisfying Xi = aiX0 + Zi for ai = ?i?0, are also gamma distributed as Xi ? Ga(?i;?i;fii). This result is now in this dissertation an application of Theorem 3.3. We next give some properties and the form of the multivariate density function. 63 4.2 Properties Some important features, as in Section 4.1, follow from Equation (3.8) and Equation (3.9) of the Laplace transform LX: LX(s) = e??0 Pp i=1 aisi ? 0 ?0 +Ppi=1 aisi ?fi0 pY i=1 LXi(si) LX0(aisi): And then, it is su?cient to explain the nature of LXi(si)L X0(aisi) : We next present two important theorems of this chapter. Theorem 4.1 The joint pdf of the multivariate gamma X = (X1;X2;???;Xp) based on Deflnition 3.1, when Xi ? 
Ga(?i;?i;fii), i = 0;1;???;p is explicitly given in terms of hypergeometric series ( the Lauricella function of type B) in the special case where Zi;1 ? i ? p, are gamma type. Proof: We follow the ideas in Mathai and Saxena (1978) [65]. Assume that Zi ? Ga(?0i;?0i;fi0i) for i = 1;2;???;p: Using the idea from Mathai and Moschopoulos (1991) [63], the joint density of X0;Z1;???;Zp is given as: f(x0;z1;???;zp) = C(x0 ??0)fi0?1expf??0(x0 ??0)g ? pY i=1 (zi ??0i)fi0i?1expf??i(zi ??0i)g where C = ? fi0 0 ?(fi0) Qp i=1 ?0fi 0i i ?(fi0i) 64 Even in the simplifled case, integrating out x0 is possible only in very special cases as it is noted in Krzanowski and Marriot (1994) [54]. Now putting zi = xi ?aix0, we have that the joint density of (X0;X1;X2;???;Xp) is given by: g(x0;x1;???;xp) = C(x0 ??0)fi0?1expf??0(x0 ??0)g ? pY i=1 (xi ?aix0 ??0i)fi0i?1expf??0i(xi ?aix0 ??0i)g: Set u0 = x0 ??0 and ui +?0 = 1a i (xi ??0i) for 1 ? i ? p: Then the density of can be written as: g(u0;u1;???;up) = C pY i=1 afi0i?1i ufi0?10 expf??0u0)g ? pY i=1 (ui ?u0)fi0i?1expf??0iai(ui ?u0);g where 0 < u0 < minfu1;???;upg The joint density of (u1;???;up) is available by integrating out u0. Hence g(u1;???;up) = C1 Z minfu1;???;upg 0 ufi0?10 expf??0u0)g(u1 ?u0)fi01?1???(up ?u0)fi0p?1 ?expf?[?01a1(u1 ?u0)+???+?0pap(up ?u0)]gdu0 where C1 = C pY i=1 afi0i?1i : 65 It follows that the density depends on the difierent forms of the k! possible orderings of u1;u2;???;up. For example u1 < u2 < ??? < up: We have g(u1;???;up) = C1f pY i=1 ufi0i?1i g Z u1 0 ufi0?10 expf??0u0)g(1? u0u 1 )fi1?1???(1? u0u p )fip?1 ?expf?[?01a1u1(1? u0u 1 )+???+?0papup(1? u0u p )]gdu0: Set y = u0u 1 : Then the above is expressed as: g(u1;???;up) = C1f pY i=1 ufi0i?1i gufi01 Z 1 0 yfi0?1expf??0u1y)g = ?(1?y)fi01?1 ?(1? u1u 2 y)fi02?1???(1? u1u p y)fi0p?1 = ?expf?[?01a1u1(1?y)+?02a2u2(1? u1u 2 y)???+?0papup(1? u1u p y)]gdy: Expanding the exponentials in series form, we obtain: g(u1;???;up) = C1f pY i=1 ufi0i?1i gufi01 ? 1X r0 1X r1 ??? 1X rp (??0u1)r0 r0! (??01a1u1)r1 r1! ??? (??0papup)rp rp! ? Z 1 0 yfi0+r0?1(1?y)fi01+r1?1 ?(1? u1u 2 y)fi02+r2?1???(1? u1u p y)fi0p+rp?1dy = C1f pY i=1 ufi0i?1i gufi01 ? ?(fi0 +r0)?(fi 01 +r1) ?(fi0 +r0 +fi01 +r1) ? 1X r0 1X r1 ??? 1X rp (??0u1)r0 r0! (??01a1u1)r1 r1! ??? (??0papup)rp rp! ?FD(fi0 +r0;fi02 +r2;???;fi0p +rp;fi0 +r0 +fi01 +r1; u1u 2 ;???; u1u p ); where FD is sometimes referred to as the Lauricella function of n variables. 66 The theorem is proved when substituting back xi in terms of ui. Recall that Lauricella FD; or more precisely F(n)D ; is a function of n variables and n+2 parameters deflned by the power series as: F(n)D (a;b1;???;bn;c;x1;???;xn) = 1X m1=0 ??? 1X mn=0 (a)m1+???+m2(b1)m1 ???(bn)mn (c)m1+???+mnm1!???mn! x m1! 1 ???x mn n : We have a characterization in terms of an inflnite sum of distributions. The question of how to generate variates from this distribution is then raised. Also relevant is how to implement a flt for this distribution. To answer these questions, a further study is then needed. We want to obtain another representation although we can tell by now that there is no unique way of expressing the joint density of a set of correlated gamma marginals. Note from the above theorem, relations between means and variances of Xi and Zi can be obtained. 
In particular, ?0i = ?i ?ai?0 and fi 0i ?0i = fii ?i ?ai fi0 ?0; fi0i ?0i2 = fii ?2i ?a 2 i fi0 ?20; Hence ?0i = fii ?i ?ai fi0 ?0 fii ?2i ?a 2i fi0 ?20 and fi0i = ( fii ?i ?ai fi0 ?0) 2 fii ?2i ?a 2i fi0 ?20 : 67 However, in order to obtain a better parameter estimates, a more tractable version of the joint density is desired. The notion of frailty is used as a measure of general random efiect. Vaupel et al. (1979) [78] gave a description of frailty as a measure of susceptibility to all causes of death in heterogeneous populations, with applications assuming gamma distributions, based on the advantages of the gamma in the simple forms of its density in Equation (1.2) and its LST in Example 2.2. In the particular case where X1 ? Ga(?1 = 0;?1;fi1 = 1) and X2 ? Ga(?2 = 0;?2;fi2 = 12), the density of the sum X = X1 +X2 can be expressed as: fX(x) = p ?1?2e?(?1+?2)2 xI0[(?2 ??1)2 x]; where I0 is the Bessel function of flrst kind and zero order I0(t) = 1+ t 2 22 + t4 22 ?42 + t6 22 ?42 ?62 +???: The result of the convolution of n independent gammas, is given by Moschopoulos (1985) [70], and is as follows: fX(x) = Qn i=1 ? fii i ?(Pni=1 fii)x Pn?1 i=1 fii?1e??nx 1X k=0 bn(k)(Pn?1i=1 fii) k!(Pni=1 fii)k [(?n ??n?1)x] k where the coe?cients bi are obtained recursively by: bi = 8> < >: 1; i = 2; Pk j=0 bi?1(Pn?2p=1 fip)j(?k)j j!(Pi?1p=1 fip)j C j i ; i = 3;4;???;n, with Ci = ?i?2 ??i?1? i ??i?1 : Moschopoulos (1985) [70], and Furman and Landsman (2005) [28], give an interpretation of the distribution of X as a sum of gamma distributions the location being at the origin. 68 Theorem 4.2 The distribution of the sum X is a mixed gamma with mixing shape para- meter given by fi+K and scale parameter ?max = maxf?1;???;?pg where fi = Ppi=1 fii and K is a non negative integer r.v. with distribution pk = f(k) = Cdk; k = 0;1;2;::: C = Qpi=1 ( ?i?max)fii and d0 = 1; dk = 1k Pki=1Ppj=1 fij(1? ?j?max)idk?i i.e f(x) = P1k=0 pkfYk(x); Yk ? Ga(?max;fi+k) Proof: See Furman (2005) [28]. In the case of the sum of two gamma X1 ? Ga(?1;fi1) and X2 ? Ga(?2;fi2), the distribution of X = X1 +X2 is given as: f(x) = P1k=0 pkfYk(x); pk = Cdk; C = ? fi1 1 ? fi2 2 ?fi1+fi2max d0 = 1; d1 = fi1(1? ?1?max)+fi2(1? ?2?max) d2 = d12 [fi1(1? ?1?max)+fi2(1? ?2?max)]+ d02 [fi1(1? ?1?max)2 +fi2(1? ?2?max)2] dk = 1k Pki=1Ppj=1 fij(1? ?j?max)idk?i; for k ? 3: Yk ? Ga(?max;fi1 +fi2 +k): The form of the exact distribution Theorem 4.1, and in Theorem 4.2, in an inflnite series, is quite complicated in practical situations. Because the sum involves an inflnite number of distributions, there is a need in developing a mathematically tractable theory and result. We consider some results in the gamma cases, and explain the case of the exponential distribution. 4.3 Estimation Procedures Because of the complexity of the likelihood, no MLE can be done in the usual way in estimating model parameters. Alternatives for the MLE have been suggested by many 69 authors such as Cohen and Whitten (1986) [16] and Cohen and Whitten (1988) [17]. They used method called modifled moments estimators or modifled MLE. Our approach builds the probability criteria from the EM algorithm. It can be used to compute the MLE for incomplete data. Disregarding the missing data is usually not safe. After separating the data into missing and non missing, we start by taking a guess of parameters within a corresponding domain. The idea is to replace the missing value by its expectation or predicted score given the starting guessed parameter values. 
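Returning briefly to the series form of Theorem 4.2 before developing the estimation procedure: although the mixture involves infinitely many gamma components, its coefficients are cheap to compute recursively, so a truncated evaluation is straightforward. The sketch below (our own implementation of the recursion in Theorem 4.2, assuming Python with numpy/scipy; the truncation length is an arbitrary choice) evaluates the density of a sum of independent gammas with different rates and checks that the mixing weights sum to one:

# Sketch (our own): truncated evaluation of the mixed-gamma density of a sum
# of independent gammas (Theorem 4.2 / Moschopoulos (1985) [70]).
import numpy as np
from scipy import stats

def sum_gamma_pdf(x, alphas, lams, terms=80):
    lam_max = max(lams)                       # largest rate (smallest scale)
    rho = sum(alphas)
    C = np.prod([(l / lam_max) ** a for a, l in zip(alphas, lams)])
    gam = [sum(a * (1 - l / lam_max) ** i for a, l in zip(alphas, lams))
           for i in range(1, terms)]
    d = [1.0]
    for k in range(1, terms):                 # d_k = (1/k) sum_i gam_i d_{k-i}
        d.append(sum(gam[i - 1] * d[k - i] for i in range(1, k + 1)) / k)
    weights = C * np.array(d)                 # p_k = C * d_k
    pdf = sum(w * stats.gamma.pdf(x, rho + k, scale=1.0 / lam_max)
              for k, w in enumerate(weights))
    return pdf, weights.sum()

alphas, lams = [2.0, 1.5], [1.0, 3.0]
pdf, total = sum_gamma_pdf(np.linspace(0.5, 10.0, 5), alphas, lams)
print(pdf)
print(total)                                  # mixing weights should sum to ~1

Because the factors (1 - lambda_j/lambda_max) lie in [0, 1), the weights decay geometrically and a modest truncation already captures essentially all of the mass.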
We use an expected value of the likelihood function, computed using su?cient statistics, in turn computed from the data, and maximize the likelihood function to obtain new parameter estimates. From the famous paper by Dempster et al. (1977) [22], the basic EM theorem states that improving the expected log likelihood leads to an increase in the likelihood itself l( jX) = logL( jX). The data X = (Xobs;Xmis) has 2 parts: an observed part denoted as Xobs and a missing part we denote as Xmis and is assumed to follow a gamma distribution with unknown parameter , i.e. its pdf is expressed as in (1.2). The idea is to flll in the missing values and iterate. Each iteration is done in two steps: the expectation (or E step) and the maximization (or M step). At the expectation or E step, we flnd the conditional expectation of the log likelihood function for the complete data denoted here as X given the observed data say Xobs, i.e. Q( j (k)) = E ? logf(Xj )j (k);Xobs ? : The next step of the EM algorithm, the M-step, is to maximize the Q function; i.e. we flnd the parameter = k+1 maximizing Q( j (k)). The EM algorithm guarantees that the likelihood function is increased by increase of the function Q: 70 The EM algorithm has a generalization called generalized EM (GEM), where in the M step, one flnds only a value such that Q( j (k)) is strictly increasing in k: The density in our example is of multivariate type. Using ideas from Mathai and Moschopoulos (1991) [63], the likelihood function can be thought as a sum of p terms. One idea is to maximize each term separately. The data vectors Xj = (x1j;x2j;???;xpj) 2Rp for j = 1;???;n are observed and what are missing (unobserved) is the group of unitsX0 = (x01;x02;???;x0n). We wish to calculate the ML estimates of : There are pn+n = n(p+1) observations, and among them pn are observed and n are missing. Then setting = (?0;?0;fi0;?1;?1;fi1;???;?p;?p;fip)0; a vector of 3(p+1) parameters, the log likelihood is l( jX) = log nY j=1 g(x0j;x1j;???;xpjj ) = nX j=1 log[g(x0j;x1j;???;xpj)j ] = nlogC + nX j=1 log[(x0j ??0)fi0?1expf??0(x0j ??0)g ? pY i=1 (xij ?aix0j ??0i)fi0i?1expf??0i(xij ?aix0j ??0i)g] = nlogC + nX j=1 [(fi0 ?1)log(x0j ??0)??0(x0j ??0) +logf pY i=1 (xij ?aix0j ??0i)g+logf pY i=1 e??0i(xij?aix0j??0i)g] = nlogC +(fi0 ?1) nX j=1 log(x0j ??0)??0 nX j=1 (x0j ??0) + nX j=1 pX i=1 (fi0i ?1)log(xij ?aix0j ??0i)? nX j=1 pX i=1 ?0i(xij ?aix0j ??0i); 71 where C = pY i=0 ?fiii ?(fii) and logC = pX i=0 ? fii log(?i)?log(?(fii)) : The log likelihood is not linear in the data. The sample and arithmetic means are jointly su?cient and complete statistics for the parameters of the data. See Casella and Berger (1990) [13]. They are functions of: nX j=1 xij; and nX j=1 xijxi0j; i;i0 = 1;???;p: We wish to calculate E[x0jjx1j;x2j;???;xpj]; j = 1;???;n EX0j[log(x0j ??0)]; j = 1;2;???;n EX0[xij ?aix0j ??i]; i = 1;2;???;p; j = 1;2;???;n EX0[logfxij ?aix0j ??ig]; i = 1;2;???;p; j = 1;2;???;n Cov(x0jjx1j;x2j;???;xpj; ^?; ^?); j = 1;???;n Using the same idea as the Bayes theorem, which states that f(x0jx) = f(x0;x)f(x) = f(x0)f(xjx0)f(x) ; we have that the conditional density of x0j given x = (x1j;x2j;???;xpj)0 for j = 1;2;???;n; is given by: f(x0jjx1j;x2j;???;xpj) = (x0j ??0) fi0?1e??0(x0j??0)G(x0j;x) Rxmin j? 0 (x0j ??0) fi0?1e??0(x0j??0)G(x0j;x)dx0j where ? the denominator represents the joint distribution of the jth response vector, ? xminj = mini=1;2;???;pfxijai g, 72 ? 
G(x0j;x) = Qpi=1(xij ?aix0j ??0i)fi0i?1e??0i(xij?aix0j??0i): It is then not possible to just have the conditional density of the observed data. We cannot separate the component likelihood of the known parameter from the variance com- ponent parameter. This complexity of the conditional density of x0j given x1j;x2j;???;xpj for j = 1;2;???;n; (not available in closed or simple form), prevents us from using the EM algorithm. The problem is in the integration. To go around this problem, we could approximate the integral or we could replace x0j by a reasonable estimate. Based on the following relations, xij = aix0j +zij () x0j = xij ?zija i ; for i = 1;2;???;p; and j = 1;2;???;n: Considering the minimum of the x0j is a natural idea one could use, i.e. x0j = mini=1;::;p ?x ij ?zij ai : In other words, the x0j?s are estimable based on the nature of the structure. And hence the parameters ?0;?0;fi0 are then estimated. Now replace zij by E(zij) = fii? i ?aifi0? 0 +?i ?ai?0. Then x0j = mini=1;::;p ?x ij? fii ?i??i ai + fi0 ?0 +?0 ? : So E(x0j) = mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ? : Elog(x0j ??0) = log ? mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 ?? : Elog(xij ?aix0j ??i) = log ? xij ?ai mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ? ??i ? : 73 Hence, Q( j (k?1)) = E ? l( j (k?1);x) = nlogC +(fi0 ?1) nX j=1 Elog(x0j ??0)??0 nX j=1 E(x0j ??0) + pX i=1 nX j=1 (fii ?1)Elog(xij ?aix0j ??i)? pX i=1 nX j=1 ?iE(xij ?aix0j ??i) = nlogC +(fi0 ?1) nX j=1 log ? mini=1;::;p ?x ij ? fii?i ??i ai + fi0? 0 ? ??0 nX j=1 ? mini=1;::;p ?x ij ? fii?i ??i ai + fi0? 0 ? + pX i=1 nX j=1 (fii ?1)log ? xij ?ai( mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ?? ??i ? ? pX i=1 nX j=1 ?i ? xij ?ai mini=1;::;p ?x ij ? fii?i ??i ai + fi 0 ?0 +?0 ?? ??i ? ; where (k?1) = (fi(k?1)0 ;?(k?1)0 ;?(k?1)0 ;fi(k?1)1 ;?(k?1)1 ;?(k?1)1 ;???;fi(k?1)p ;?(k?1)p ;?(k?1)p )0: Then using maximum likelihood, flnd (k) = (?(k)0 ;?(k)0 ;fi(k)0 ;?(k)1 ;?(k)1 ;fi(k)1 ;???;?(k)p ;?(k)p ;fi(k)p )0 that maximizes Q( j (k?1)): So in here, the parameters are computed iteratively. The following steps are suggested to get the MLE?s of the parameters: ? step 1: update the unknown values x0j for each iteration along with the parameters of interest. ? step 2: flnd estimates for ?0;?0 and fi0: ? step 3: flnd estimates for ?i;?i and fii for 1 ? i ? p: 74 Continue the process at to convergence, when there is no much difierence gained in the next iteration. The vector of parameters = (?0;?0;fi0;?1;?1;fi1;???;?p;?p;fip)0 has two components: = ( 1; 2)0 where 1 = (?0;?0;fi0)0 and 2 = (?1;?1;fi1;???;?p;?p;fip)0 is estimated by MLE for 1 flxed. We continue the iterative process until some stopping criteria is reached. Although the process seems very similar to the EM algorithm, it is not based on the conditional distribution. It seems nevertheless very related to the EM algorithm. The estimation is performed by a generalization of the EM algorithm which Tanner (1996) [77] called the Expectation-Solution (ES) algorithm. For the ES algorithm, the M-step of the EM algorithm is replaced by a step requiring the solution of an estimating equation, which does not necessarily correspond to a maximization problem. The ES algorithm is a general algorithm for solving equations with incomplete data. These authors used this algorithm to flt a mixture model for independent and correlated data. The use of mixture of densities gives more exibility to the model the distribution. 
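A cheaper moment-based route, different from the imputation scheme above and offered here only as a sketch of our own (zero location parameters, coefficients a_i treated as known, Python with numpy assumed), recovers the latent parameters from two cross moments: by (3.5), Cov(X_i, X_j) = a_i a_j alpha_0/lambda_0^2 for i != j, and a direct expansion of the centered variables gives E[(X_i - mu_i)^2 (X_j - mu_j)] = 2 a_i^2 a_j alpha_0/lambda_0^3, so the two ratios identify lambda_0 and alpha_0:

# Sketch (our own moment-matching alternative, not the ES scheme described
# above): estimate the latent parameters (alpha_0, lambda_0) from the
# covariance and one third-order cross moment of the observed vector.
import numpy as np

rng = np.random.default_rng(5)
alpha0, lam0 = 2.0, 1.0
alphas, lams = np.array([3.0, 4.0]), np.array([1.5, 0.8])
a = lam0 / lams                          # continuous class of Theorem 3.3
n = 1_000_000

x0 = rng.gamma(alpha0, 1 / lam0, n)
z = rng.gamma(alphas - alpha0, 1 / lams, (n, 2))
x = a * x0[:, None] + z                  # observed data; x0 stays latent

c = x - x.mean(axis=0)                   # centered observations
k2 = (c[:, 0] * c[:, 1]).mean() / (a[0] * a[1])             # alpha_0/lambda_0^2
k3 = (c[:, 0] ** 2 * c[:, 1]).mean() / (a[0] ** 2 * a[1])   # 2*alpha_0/lambda_0^3
lam0_hat = 2 * k2 / k3
alpha0_hat = k2 * lam0_hat ** 2
print(alpha0_hat, lam0_hat)              # should be close to (2.0, 1.0)

Such moment estimates can also serve as starting values for the iterative procedures described above.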
With this approach, we have a method to calculate the variance-covariance matrix of the parameters. The Markov Chain Monte Carlo (MCMC) allows the simulation of many complex and multivariate types of random data. It also deals with calculating expectations, flnding conditional expectations for maximizing likelihood in missing data patterns, predicting the missing data based on a normal distribution. Although the assumption of normality is not always a negative aspect (specially when the missing information is not too large), this new estimation technique can provide better answers as the values x0 are not missing at random. To understand a statistical model, one idea would be to simulate many realizations from that model, and study it. For example, consider a r.v. X with pdf f(x) and a 75 function g such that E(g(X)) = Z X g(x)f(x)dx is di?cult to integrate. The integral can be approximated after n realizations, x1;x2;???;xn; of X by the following sum: 1 n nX i=1 g(xi): In our example, the joint density has the form f(x) = f(x0;x1;x2;???;xp), and we are interested in some features of the marginal density. Assume we can sample the p+1 many univariate conditional densities: f(x0jx1;x2;???;xp) f(x1jx0;x2;???;xp) f(x2jx0;x1;x3;???;xp) ... f(xpjx0;x1;x2;???;xp?1): Indeed f(xkjx0;x1;???;xk?1;xk+1;???;xp) = f(x0;x1;???;xp)f(x 0;x1;???;xk?1;xk+1;???;xp) = ? fik k ?(fik)(xk ?akx0) fik?1e??k(xk?aix0) which is a gamma distribution with location akx0 and parameters fi0k and ?0k denoted here as fik and ?k; respectively. So choose arbitrarily p initial values: x1 = x(0)1 ;x2 = x(0)2 ;???;xp = x(0)p : Then create ? x(1)0 by drawing from f(x0jx(0)1 ;???;x(0)p ) ? x(1)1 by drawing from f(x1jx(1)0 ;x(0)2 ;???;x(0)p ) 76 ? x(1)2 by drawing from f(x2jx(1)0 ;x(1)1 ;x(0)3 ;???;x(0)p ) ? ... ? x(1)p by drawing from f(xpjx(1)0 ;x(1)1 ;x(1)2 ;???;x(1)p?1) This is one Gibbs pass through the p+1 conditional densities that gives values (x(1)0 ;x(1)1 ;x(1)2 ;???;x(1)p ): Thus, we can take the last n of x0 values after many Gibbs passes and we set that: E(X0) = 1n m+nX i=m xi0: The Monte Carlo EM algorithm is an important sampling technique to generate random variates to construct Monte Carlo approximations. Assaidearlier, theidentiflcationofparametersisanimportanttopic. Wehaveproposed a novel procedure to identify the parameters which uses ideas of the EM algorithm, with a computation cost not so large. 77 Chapter 5 The Multivariate Exponential The multivariate exponential plays an important role in survival and reliability analysis. Marshall and Olkin (1967) [61], Joe (1997), Ghosh and Gelfand (1998) [30], and Hougaard (2000) [36], to mention a few authors, gave several examples to motivate this research problem. In this chapter, we focus on the multivariate exponential distribution. More speciflcally, weconsiderthep?variateX1;X2;???;andXp beflxedmarginallyasexponential r.v.?s with hazard rates ?1;?2;???; and ?p, respectively. Then by introducing two types of latent non-negative variables, X0 and Z1;Z2;???; and Zp, statistically independent between themselves and of X0, a linear relationship is formed between X0 and X1;X2;???;Xp as in (1.1). Note that this structure was introduced in Chapter 3, and is a special family within the generalized gamma distribution. We focus on the exponential class of distributions because of its importance in the literature. This class is large, and includes the continuous (examined in Chapter 4) and discontinuous cases. 
We also study the bivariate case approach where X0 is unobserved, and missing, and lay out the joint density function along with the joint survival function. We also developed estimators for the parameters associated with the model. Carpenter et al. (2006) [12] deflned a similar approach at the univariate level. They characterize through Laplace transforms, the distribution of the latent variable in the exponential case as mixture of a point mass at zero and an exponential with hazard rate ?i as shown in Corollary 3.6. Note that when Zi = 0; there is a positive probability that Xi is proportional to X0 with proportionality constant ai, i.e. P(Xi = aiX0) > 0, i = 1;:::;p. 78 We show that our model possesses the property of conditional independence in Def- inition 1.7 and Equation (1.13), given a random latent efiect. Hougaard (2000) [36] calls this random latent efiect frailty. Frailty models are common in describing the dependence. See Vaupel et al. (1979) [78], Hougaard (2000) [36], and Henderson and Shimakura (2003) [34]. Hougaard (1986) [35] and Hougaard (2000) [36] consider the random latent efiect as parameter, and we are treating it as latent variable. As described in Carpenter et al. (2005) [12], the joint distribution is not absolutely continuous. The multivariate survival data of the experiment gives multiple events and involves several members or components in a system. The multivariate lifetime distributions by Hougaard (2000) [36], has a dependence createdbyanunobservablequantity. Thisisatypicalscenarioweadoptherealso. Hougaard (1986) [35] proposed a continuous multivariate lifetime distribution where the marginal distributions are Weibull (continuous) and does not allow the property of simultaneous or proportional failures of individuals or components. 5.1 The Multivariate Exponential In this section, we give our version of the p?variate exponential distribution through Deflnition 5.1. We study its various properties, and characterize its density function. Deflnition 5.1 Let X0;X1;???;Xp be exponential r.v.?s as in (1.3) with scale parameters ?i; 0 ? i ? p: Let Zi; i = 1;::;p; be independent r.v.?s satisfying (1.1). We deflne the joint distribution of X = (X1;X2;???;Xp) as the p?variate exponential distribution. 79 From Equation (3.7), the mean of X is given as E(X) = 1 ?1; 1 ?2;???; 1 ?p ?0 ; (5.1) and the variance/covariance matrix is given as ? = 1?2 0 0 BB BB BB BB B@ ?20 ?21 a1a2 ??? a1ap a2a1 ?20?2 2 ??? a2ap ... ... ... ... apa1 ??? apap?1 ?20?2 p 1 CC CC CC CC CA : (5.2) From (5.2), the correlation matrix is given as ? = 1?2 0 0 BB BB BB BB B@ ?20 a1a2?1?2 ??? a1ap?1?p a2a1?2?1 ?20 ??? a2ap?2?p ... ... ... ... apa1?p?1 ??? apap?1?p?p?1 ?20 1 CC CC CC CC CA : Note that since we consider the exponential case, the shape parameters fii = 1 in (1.2) for i = 1;2;::;p. From Theorem 3.4 and Equation (1.1), the LST of Zi is LZi(s) = pi +(1 ?pi)LXi(s); with pi = ai ?i? 0 ; i = 1;2;::;p: That is Zi is a mixture of Bernoulli r.v. with probability pi and exponential r.v. with parameter ?i as in Example 2.4. 80 From Equation (3.11), the conditional survival function of Xi given X0, is obtained as S(tjx0) = P(Xi > tjx0) = P(aix0 +Zi > tjx0) = P(Zi > t?aix0jx0) = Z 1 t?aix0 fZi(z)dz = Z 1 t?aix0 ? pi?(z)+(1?pi)?ie??iz ? dz = pi Z 1 t?aix0 ?(z)dz +(1?pi) Z 1 t?aix0 e??izdz = pi(1?H(t?aix0))+(1?pi)e??i(t?aix0); i = 1;::;p; where H is the Heaviside function deflned in Equation (2.3). 
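The conditional survival function just obtained is easy to confirm by simulation. A small sketch (our own check, assuming Python with numpy) fixes x_0, draws Z_i from its point-mass/exponential mixture, and compares the empirical survival of X_i = a_i x_0 + Z_i with (1 - p_i) exp(-lambda_i (t - a_i x_0)) for t >= a_i x_0, where the point-mass term vanishes:

# Sketch (our own check) of the conditional survival S(t | x_0) derived above.
import numpy as np

rng = np.random.default_rng(6)
lam0, lam_i, a_i = 1.0, 1.5, 0.4
p_i = a_i * lam_i / lam0                      # P(Z_i = 0)
x0, size = 2.0, 200_000                       # condition on a fixed x_0

z = rng.binomial(1, 1 - p_i, size) * rng.exponential(1 / lam_i, size)
xi = a_i * x0 + z

for t in (a_i * x0, 1.5, 3.0):                # t >= a_i*x_0
    emp = (xi > t).mean()
    theo = (1 - p_i) * np.exp(-lam_i * (t - a_i * x0))
    print(t, emp, theo)

For t below a_i x_0 the conditional survival is trivially one, since X_i >= a_i x_0 with probability one given X_0 = x_0.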
From the conditional independence property and (1.13), the joint conditional survival function is S(x1;???;xpjx0) = pY i=1 ? pi(1?H(xi ?aix0))+(1?pi)e??i(xi?aix0) ? : (5.3) The distribution of the minimum lifetime distribution X(1) = min ?X 1 a1 ; X2 a2 ;???; Xp ap can be derived directly from (5.3) and from properties of the Heaviside function in Equation (2.3). It is given as P(X(1) > tjx0) = pY i=1 P X i ai > tjx0 ? = pY i=1 P(Xi > aitjx0) = pY i=1 ? pi(1?H(xi ?x0))+(1?pi)e??iai(xi?x0) ? : Note that, as we should expect, S(0;0;???;0) = 1 and S(1;1;???;1) = 0: 81 The Equation (5.3) above is quite interesting and could be deduced intuitively from the nature of the conditional distribution. It also captures the method of copula that many authors have suggested. According to that method, the joint survival function of X1;???;Xp is represented as: S(x1;x2;???;xp) = C(P(X1 > x1);P(X2 > x2);???;P(Xp > xp)) = C(SX1(x1);SX2(x2);???;SXp(xp)); where C(u1;u2;???;up) is a copula: a function mixing the univariate survival function u1;u2 and up. For example, Hougaard (2001) [37] proposed the following copula function C at the bivariate level: C(u1;u2) = exp[?f(?lnu1)fl +(?lnu2)flg1=fl] with fl ? 1 coe?cient of association. The case fl = 1 corresponds to the independence between u1 and u2. There are some problems associated with the form of the copula, as it can lead to misrepresentation of the association. The false estimation of fl could then lead to false estimates for the parameters of interest. Taking the derivative of the ith survival function in Equation (5.3), and using Equation (2.3), the conditional density is given by fXijX0(t) = pi?(t?aix0)+(1?pi)?ie??i(t?aix0); i = 1;::;p: (5.4) 82 and from (5.4), the conditional expectation is EXijX0(Xi) = Z 1 0 tfXijX0(t)dt = pi Z 1 0 t?(t?aix0)dt+(1?pi) Z 1 aix0 ?ite??i(t?aix0)dt = piaix0 +(1?pi)( 1? i +aix0) = aix0 +(1?pi) 1? i ; i = 1;::;p: Note that taking the expectation of the above with respect to X0 gives EXi = EX0EXijX0(Xi) = EX0[piaix0 +(1?pi)( 1? i +aix0)] = pi ai? 0 +(1?pi)( 1? i + ai? 0 ) = ai? 0 +(1?pi) 1? i = 1? i ; i = 1;::;p; since pi = ai ?i? 0 ; conflrming earlier results in Equation (5.1) and in Carpenter et al. (2006) [12]. Although the joint density of (X0;X1;X2;???;Xp)0 is easy to flnd (see Equation in proof of Theorem 1.8), the density of (X1;X2;???;Xp)0 is not. However, we can study the density of (X0;X1;X2;???;Xp)0 through the latent variables Z1;Z2;???;Zp; with relative ease. Using the independence of the Zi?s, i = 1;2;::;p, between each other and of X0, and Theorem 1.8, we have that: f(x0;x1;???;xp) = ?0e??0x0 pY i=1 ? pi?(xi ?aix0)?(1?pi)?ie??i(xi?aix0)I(xi>aix0) ? ; (5.5) 83 To demonstrate the complicated form of the above density, we give the explicit form in the case of three r.v.?s X1 = a1X0 +Z1, X2 = a2X0 +Z2, and X3 = a3X0 +Z3. 
Hence, Equation (5.5) becomes: f(x0;x1;x2;x3) = f(x0)f(x1 ?a1x0)f(x2 ?a2x0)f(x3 ?a3x0) = f(x0)[p1?(x1 ?a1x0)+(1?p1)fX1(x1 ?a1x0)I(x1>a1x0)] [p2?(x2 ?a2x0)+(1?p2)fX2(x2 ?a2x0)I(x2>a2x0)] [p3?(x3 ?a3x0)+(1?p3)fX3(x3 ?a3x0)I(x3>a3x0)] = f(x0)[p1p2p3?(x1 ?a1x0)?(x2 ?a2x0)?(x3 ?a3x0) +p1(1?p2)p3?(x1 ?a1x0)?(x3 ?a3x0)fX2(x2 ?a2x0) +p1p2(1?p3)?(x1 ?a1x0)?(x2 ?a2x0)fX3(x3 ?a3x0) +p1(1?p2)(1?p3)?(x1 ?a1x0)fX2(x2 ?a2x0)fX3(x3 ?a3x0) +(1?p1)p2p3?(x2 ?a2x0)?(x3 ?a3x0)fX1(x1 ?a1x0) +(1?p1)(1?p2)p3?(x3 ?a3x0)fX1(x1 ?a1x0)fX2(x2 ?a2x0) +(1?p1)p2(1?p3)?(x2 ?a2x0)fX1(x1 ?a1x0)fX3(x3 ?a3x0) +(1?p1)(1?p2)(1?p3)fX1(x1 ?a1x0)fX2(x2 ?a2x0)fX3(x3 ?a3x0)]: Coming back to the multivariate form obtained from Equation (5.5), there are 2p terms in the sum expressed from the power set of P; the set of subsets of f1;2;???;pg: Hence, f(x0;x1;???;xp) = pX jAj=k;A2P pA(1) ???pA(k)?(xA(1) ?aA(1)x0)????(xA(k) ?aA(k)x0) ?f(x0)(1?pAc(1))???(1?pAc(p?k)) 84 ?fAc(1)(xAc(1) ?aAc(1)x0)???fAc(p?k)(xAc(p?k) ?aAc(p?k)x0); where A is a subset of P, with k elements, 0 ? k ? p, denoted as A(1);A(2);???;A(k), and we then can deduce the subset Ac = f1;2;:::;pgnA; whose elements are denoted as Ac(1);Ac(2);???;Ac(p?k). Hence, f(x0;x1;???;xp) = pX jAj=k;A2P pA(1) ???pA(k)?(xA(1) ?aA(1)x0)????(xA(k) ?aA(k)x0) ?(1?pAc(1))???(1?pAc(p?k)) ??0?Ac(1) ????Ac(p?k)e??Ac(1)xAc(1) ???e??Ac(p?k)xAc(p?k) ?e?(?0?aAc(1)?Ac(1)?????aAc(p?k)?Ac(p?k))x0: Therefore the joint density of (X1;X2;???;Xp)0 the multivariate exponential is obtained by integrating the above expression with respect to x0, giving f(x1;???;xp) = p?1X jAj=k;A2P pA(1) ???pA(k) aA(1) ???aA(k) ?( xA(1) aA(1) ? xA(k) aA(k) )?( xA(k?1) aA(k?1) ? xA(k) aA(k) ) ?(1?pAc(1))???(1?pAc(p?k)) ??0?Ac(1) ????Ac(p?k)e??Ac(1)xAc(1) ???e??Ac(p?k)xAc(p?k) ?e ?(?0?aAc (1) ?Ac (1) ?????aAc (p?k) ?Ac (p?k) ) xA(1) aA(1) + (1?p1)???(1?pp) ?0?1????p? 0 ?a1?1 ?????ap?p ?e??1x1 ???e??pxp(1?e?(?0?a1?1?????ap?p)?); where ? = mini x i ai ? : 85 Similarly, to get the unconditional survival function, one would derive it from (5.3), and therefore compute S(x1;x2;???;xp) = Z 1 0 pY i=1 S(xijx0)fX0(x0)dx0 = ?0 Z 1 0 pY i=1 ? pi(1?H(xi ?aix0))+(1?pi)e??i(xi?aix0) ? e??0x0dx0: As we can see, we cannot interchange integration and product in this above expression, and then there are substantial number of cases, 2p precisely, to consider in order to flnd the survival function. We discuss the case of p = 2 in the next section and explain each of the components of the survival in that bivariate case. We also obtain estimates of the unknown parameters in that case. 5.2 The Bivariate Exponential Model and MLE In this section, we study the bivariate case of the multivariate exponential distribution of the previous section. The bivariate exponential distribution from Deflnition 5.1 can be expressed as: 8 >< >: X1 = a1X0 +Z1 ; X2 = a2X0 +Z2 . (5.6) Then, from Carpenter et al. (2006) [12], the joint density of (X0;X1) is given by f(x0;x1) = 8 >< >: p1fX0(x0)?(x1 = a1x0) (1?p1)fX0(x0)fX1(x1 ?a1x0)I(x1>a1x0) , 86 = 8 >< >: p1?0e??0x0; if x0 = x1a1 (1?p1)?0?1e??0x0e??1(x1?a1x0); if x0 < x1a1 ; where p1 = a1?1? 0 . Similarly based on the expression X2 = a2X0+Z2, we have for p2 = a2?2? 0 , f(x0;x2) = 8> < >: p2fX0(x0)?(x2 = a2x0) (1?p2)fX0(x0)fX2(x2 ?a2x0)I(x2>a2x0) , = 8> < >: p2?0e??0x0; if x0 = x2a2 (1?p2)?0?2e??0x0e??2(x2?a2x0); if x0 < x2a2: Hence, using the independence between X0;Z1;Z2 and Theorem 1.8, the joint density of (X0;X1;X2) related as in (5.6), is given from (5.5) by: f(x0;x1;x2) = ?0e??0x0 ? 
5.2 The Bivariate Exponential Model and MLE

In this section, we study the bivariate case of the multivariate exponential distribution of the previous section. The bivariate exponential distribution from Definition 5.1 can be expressed as
\[
\begin{cases} X_1 = a_1X_0+Z_1, \\ X_2 = a_2X_0+Z_2. \end{cases} \tag{5.6}
\]
Then, from Carpenter et al. (2006) [12], the joint density of $(X_0,X_1)$ is given by
\[
f(x_0,x_1) = \begin{cases} p_1f_{X_0}(x_0), & \text{if } x_1 = a_1x_0, \\ (1-p_1)f_{X_0}(x_0)f_{X_1}(x_1-a_1x_0), & \text{if } x_1 > a_1x_0, \end{cases}
\;=\; \begin{cases} p_1\lambda_0e^{-\lambda_0x_0}, & \text{if } x_0 = x_1/a_1, \\ (1-p_1)\lambda_0\lambda_1e^{-\lambda_0x_0}e^{-\lambda_1(x_1-a_1x_0)}, & \text{if } x_0 < x_1/a_1, \end{cases}
\]
where $p_1 = a_1\lambda_1/\lambda_0$. Similarly, based on the expression $X_2 = a_2X_0+Z_2$, we have for $p_2 = a_2\lambda_2/\lambda_0$,
\[
f(x_0,x_2) = \begin{cases} p_2\lambda_0e^{-\lambda_0x_0}, & \text{if } x_0 = x_2/a_2, \\ (1-p_2)\lambda_0\lambda_2e^{-\lambda_0x_0}e^{-\lambda_2(x_2-a_2x_0)}, & \text{if } x_0 < x_2/a_2. \end{cases}
\]
Hence, using the independence of $X_0$, $Z_1$ and $Z_2$ and Theorem 1.8, the joint density of $(X_0,X_1,X_2)$, related as in (5.6), is given from (5.5) by
\[
f(x_0,x_1,x_2) = \lambda_0e^{-\lambda_0x_0}\big[p_1\delta(x_1-a_1x_0)+(1-p_1)f_{X_1}(x_1-a_1x_0)I(x_1>a_1x_0)\big]\big[p_2\delta(x_2-a_2x_0)+(1-p_2)f_{X_2}(x_2-a_2x_0)I(x_2>a_2x_0)\big]
\]
\[
= p_1p_2\lambda_0e^{-\lambda_0x_0}\delta(x_1-a_1x_0)\delta(x_2-a_2x_0)
+p_1(1-p_2)\lambda_0e^{-\lambda_0x_0}f_{X_2}(x_2-a_2x_0)\delta(x_1-a_1x_0)
+(1-p_1)p_2\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)\delta(x_2-a_2x_0)
+(1-p_1)(1-p_2)\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)f_{X_2}(x_2-a_2x_0)I(x_1>a_1x_0,\,x_2>a_2x_0).
\]
The expression $f(x_0,x_1,x_2)$ is one way to obtain an estimate of $x_0$, or of the parameter associated with it, $\lambda_0$. Let us assume that $\nu = \min\big(\tfrac{x_1}{a_1},\tfrac{x_2}{a_2}\big) = \tfrac{x_1}{a_1}\leq\tfrac{x_2}{a_2}$. Also, set
\[
r_i^{(1)} = I(z_{1i}=0) = I(x_{1i}=a_1x_{0i}) = \begin{cases}1, & \text{if } z_{1i}=0,\\ 0, & \text{if } z_{1i}\neq 0,\end{cases}
\qquad
r_i^{(2)} = I(z_{2i}=0) = I(x_{2i}=a_2x_{0i}) = \begin{cases}1, & \text{if } z_{2i}=0,\\ 0, & \text{if } z_{2i}\neq 0.\end{cases}
\]
Then the full likelihood based on a random sample of size $n$ is the product of the $n$ contributed likelihoods:
\[
L(\lambda_0,\lambda_1,\lambda_2) = \prod_{i=1}^{n}
\big[p_1p_2\lambda_0e^{-\lambda_0x_{0i}}\big]^{r_i^{(1)}r_i^{(2)}}
\big[p_1(1-p_2)\lambda_0\lambda_2e^{-(\lambda_0-a_2\lambda_2)x_{0i}}e^{-\lambda_2x_{2i}}\big]^{r_i^{(1)}(1-r_i^{(2)})}
\big[(1-p_1)p_2\lambda_0\lambda_1e^{-(\lambda_0-a_1\lambda_1)x_{0i}}e^{-\lambda_1x_{1i}}\big]^{(1-r_i^{(1)})r_i^{(2)}}
\big[(1-p_1)(1-p_2)\lambda_0\lambda_1\lambda_2e^{-(\lambda_0-a_1\lambda_1-a_2\lambda_2)x_{0i}}e^{-\lambda_1x_{1i}}e^{-\lambda_2x_{2i}}\big]^{(1-r_i^{(1)})(1-r_i^{(2)})}.
\]
Substituting $p_1 = a_1\lambda_1/\lambda_0$ and $p_2 = a_2\lambda_2/\lambda_0$ and collecting terms,
\[
L(\lambda_0,\lambda_1,\lambda_2) = a_1^{\sum_i r_i^{(1)}}a_2^{\sum_i r_i^{(2)}}\Big(\frac{\lambda_1\lambda_2}{\lambda_0}\Big)^{n}
e^{-\lambda_0\sum_i x_{0i}r_i^{(1)}r_i^{(2)}}
(\lambda_0-a_2\lambda_2)^{\sum_i(1-r_i^{(2)})}e^{-(\lambda_0-a_2\lambda_2)\sum_i x_{0i}r_i^{(1)}(1-r_i^{(2)})}
(\lambda_0-a_1\lambda_1)^{\sum_i(1-r_i^{(1)})}e^{-(\lambda_0-a_1\lambda_1)\sum_i x_{0i}(1-r_i^{(1)})r_i^{(2)}}
e^{-\lambda_1\sum_i x_{1i}(1-r_i^{(1)})}e^{-\lambda_2\sum_i x_{2i}(1-r_i^{(2)})}
e^{-(\lambda_0-a_1\lambda_1-a_2\lambda_2)\sum_i x_{0i}(1-r_i^{(1)})(1-r_i^{(2)})}.
\]
Hence the log-likelihood is
\[
LL(\lambda_0,\lambda_1,\lambda_2) = \log(a_1)\sum_i r_i^{(1)}+\log(a_2)\sum_i r_i^{(2)}+n\log\Big(\frac{\lambda_1\lambda_2}{\lambda_0}\Big)
-\lambda_0\sum_i x_{0i}r_i^{(1)}r_i^{(2)}
+\log(\lambda_0-a_1\lambda_1)\sum_i(1-r_i^{(1)})+\log(\lambda_0-a_2\lambda_2)\sum_i(1-r_i^{(2)})
-(\lambda_0-a_2\lambda_2)\sum_i x_{0i}r_i^{(1)}(1-r_i^{(2)})-(\lambda_0-a_1\lambda_1)\sum_i x_{0i}(1-r_i^{(1)})r_i^{(2)}
-\lambda_1\sum_i x_{1i}(1-r_i^{(1)})-\lambda_2\sum_i x_{2i}(1-r_i^{(2)})
-(\lambda_0-a_1\lambda_1-a_2\lambda_2)\sum_i x_{0i}(1-r_i^{(1)})(1-r_i^{(2)}),
\]
and
\[
\frac{\partial LL}{\partial\lambda_0} = -\frac{n}{\lambda_0}+\frac{\sum_i(1-r_i^{(1)})}{\lambda_0-a_1\lambda_1}+\frac{\sum_i(1-r_i^{(2)})}{\lambda_0-a_2\lambda_2}-\sum_i x_{0i}.
\]
Similarly,
\[
\frac{\partial LL}{\partial\lambda_1} = \frac{n}{\lambda_1}-\frac{a_1\sum_i(1-r_i^{(1)})}{\lambda_0-a_1\lambda_1}+a_1\sum_i x_{0i}(1-r_i^{(1)})-\sum_i x_{1i}(1-r_i^{(1)}),
\]
so setting $\partial LL/\partial\lambda_1 = 0$ gives
\[
\frac{\sum_i(1-r_i^{(1)})}{\lambda_0-a_1\lambda_1} = \frac{n}{a_1\lambda_1}+\sum_i x_{0i}(1-r_i^{(1)})-\frac{\sum_i x_{1i}(1-r_i^{(1)})}{a_1},
\]
and
\[
\frac{\partial LL}{\partial\lambda_2} = \frac{n}{\lambda_2}-\frac{a_2\sum_i(1-r_i^{(2)})}{\lambda_0-a_2\lambda_2}+a_2\sum_i x_{0i}(1-r_i^{(2)})-\sum_i x_{2i}(1-r_i^{(2)}),
\]
so setting $\partial LL/\partial\lambda_2 = 0$ gives
\[
\frac{\sum_i(1-r_i^{(2)})}{\lambda_0-a_2\lambda_2} = \frac{n}{a_2\lambda_2}+\sum_i x_{0i}(1-r_i^{(2)})-\frac{\sum_i x_{2i}(1-r_i^{(2)})}{a_2}.
\]
Now, setting $\partial LL/\partial\lambda_0 = 0$ and substituting the two expressions above gives
\[
\frac{1}{\lambda_0} = \frac{1}{a_1\lambda_1}+\frac{1}{a_2\lambda_2}
-\frac{\sum_i\{x_{1i}(1-r_i^{(1)})-a_1x_{0i}(1-r_i^{(1)})\}}{a_1n}
-\frac{\sum_i\{x_{2i}(1-r_i^{(2)})-a_2x_{0i}(1-r_i^{(2)})\}}{a_2n}
-\frac{\sum_i x_{0i}}{n}
= \frac{1}{a_1}\Big(\frac{1}{\lambda_1}-\frac{\sum_i z_{1i}(1-r_i^{(1)})}{n}\Big)+\frac{1}{a_2}\Big(\frac{1}{\lambda_2}-\frac{\sum_i z_{2i}(1-r_i^{(2)})}{n}\Big)-\frac{\sum_i x_{0i}}{n},
\]
where $z_{1i} = x_{1i}-a_1x_{0i}$ and $z_{2i} = x_{2i}-a_2x_{0i}$. The above likelihood equations can be used to estimate $\lambda_0$, $\lambda_1$ and $\lambda_2$ if the $x_{0i}$'s, $1\leq i\leq n$, were known.
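If the complete data $(x_{0i},x_{1i},x_{2i})$ and the zero indicators $r_i^{(1)},r_i^{(2)}$ were observed, the complete-data log-likelihood above could also be maximized numerically rather than through the score equations. The sketch below assumes exactly that setting; the parameter names, the constraint handling, and the choice of a Nelder-Mead optimizer are our own illustrative choices, not the dissertation's.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the complete-data likelihood of (X0, X1, X2), assuming the latent x0's
# and the indicators r_k = I(z_k = 0) are observed.  Each observation contributes
# one of four case densities according to whether z1 and/or z2 equal zero.
a1, a2 = 0.25, 0.50   # taken as known constants (illustrative values)

def neg_loglik(theta, x0, x1, x2, r1, r2):
    lam0, lam1, lam2 = theta
    if min(theta) <= 0 or a1 * lam1 >= lam0 or a2 * lam2 >= lam0:
        return np.inf                       # keep 0 < p_i = a_i*lam_i/lam0 < 1
    p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0
    z1, z2 = x1 - a1 * x0, x2 - a2 * x0
    ll = np.log(lam0) - lam0 * x0 \
       + np.where(r1, np.log(p1), np.log(1 - p1) + np.log(lam1) - lam1 * z1) \
       + np.where(r2, np.log(p2), np.log(1 - p2) + np.log(lam2) - lam2 * z2)
    return -ll.sum()

# With arrays x0, x1, x2, r1, r2 from (complete) data, one could then run:
# fit = minimize(neg_loglik, x0=[1.0, 1.0, 1.0], args=(x0, x1, x2, r1, r2),
#                method="Nelder-Mead")
```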
We address this question by developing estimators of these latent terms, using results obtained previously in Section 4.3. It is worth noting that no approximation has been used here, in contrast with the technique used in Chapter 3 and Chapter 4.

To develop the unconditional MLE, we integrate out $x_0$ from the joint density $f(x_0,x_1,x_2)$, and we have
\[
f(x_1,x_2) = \int f(x_0,x_1,x_2)\,dx_0
= p_1p_2\int\lambda_0e^{-\lambda_0x_0}\delta(x_1-a_1x_0)\delta(x_2-a_2x_0)\,dx_0
+p_1(1-p_2)\int\lambda_0e^{-\lambda_0x_0}f_{X_2}(x_2-a_2x_0)I(x_2>a_2x_0)\delta(x_1-a_1x_0)\,dx_0
+(1-p_1)p_2\int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)I(x_1>a_1x_0)\delta(x_2-a_2x_0)\,dx_0
+(1-p_1)(1-p_2)\int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)f_{X_2}(x_2-a_2x_0)I\big(\tfrac{x_1}{a_1}>x_0,\tfrac{x_2}{a_2}>x_0\big)\,dx_0
\]
\[
= p_1p_2\,\mathrm{Part}A_1+p_1(1-p_2)\,\mathrm{Part}A_2+(1-p_1)p_2\,\mathrm{Part}A_3+(1-p_1)(1-p_2)\,\mathrm{Part}A_4,
\]
where
\[
\mathrm{Part}A_1 = \int\lambda_0e^{-\lambda_0x_0}\delta(x_1-a_1x_0)\delta(x_2-a_2x_0)\,dx_0
= \frac{\lambda_0}{a_1a_2}\int e^{-\lambda_0x_0}\delta\big(\tfrac{x_1}{a_1}-x_0\big)\delta\big(\tfrac{x_2}{a_2}-x_0\big)\,dx_0
= \frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_2}{a_2}}\delta\big(\tfrac{x_1}{a_1}-\tfrac{x_2}{a_2}\big)
= \frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_1}{a_1}}\delta\big(\tfrac{x_1}{a_1}-\tfrac{x_2}{a_2}\big),
\]
\[
\mathrm{Part}A_2 = \int\lambda_0e^{-\lambda_0x_0}f_{X_2}(x_2-a_2x_0)I(x_2>a_2x_0)\delta(x_1-a_1x_0)\,dx_0
= \frac{1}{a_1}\lambda_0e^{-\lambda_0\frac{x_1}{a_1}}f_{X_2}\big(x_2-a_2\tfrac{x_1}{a_1}\big)I\big(x_2>a_2\tfrac{x_1}{a_1}\big)
= \frac{1}{a_1}\lambda_0\lambda_2e^{-\frac{x_1}{a_1}(\lambda_0-a_2\lambda_2)}e^{-\lambda_2x_2}I\big(\tfrac{x_2}{a_2}>\tfrac{x_1}{a_1}\big)
= \frac{1}{a_1}\lambda_0\lambda_2e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_2}{a_2}>\tfrac{x_1}{a_1}\big),
\]
where $\nu = \min\big(\tfrac{x_1}{a_1},\tfrac{x_2}{a_2}\big)$ and $\lambda^* = \lambda_0-a_1\lambda_1-a_2\lambda_2$. Similarly,
\[
\mathrm{Part}A_3 = \int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)I(x_1>a_1x_0)\delta(x_2-a_2x_0)\,dx_0
= \frac{1}{a_2}\lambda_0\lambda_1e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_1}{a_1}>\tfrac{x_2}{a_2}\big),
\]
and
\[
\mathrm{Part}A_4 = \int\lambda_0e^{-\lambda_0x_0}f_{X_1}(x_1-a_1x_0)f_{X_2}(x_2-a_2x_0)I(x_1>a_1x_0,\,x_2>a_2x_0)\,dx_0
= \int_0^{\nu}\lambda_0\lambda_1\lambda_2e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*x_0}\,dx_0
= \frac{\lambda_0\lambda_1\lambda_2}{\lambda^*}e^{-\lambda_1x_1}e^{-\lambda_2x_2}\big(1-e^{-\lambda^*\nu}\big).
\]
Hence the expression for the joint density becomes
\[
f(x_1,x_2) = p_1p_2\frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_2}{a_2}}\delta\big(\tfrac{x_1}{a_1}-\tfrac{x_2}{a_2}\big)
+p_1(1-p_2)\frac{1}{a_1}\lambda_0\lambda_2e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_2}{a_2}>\tfrac{x_1}{a_1}\big)
+(1-p_1)p_2\frac{1}{a_2}\lambda_0\lambda_1e^{-\lambda_1x_1}e^{-\lambda_2x_2}e^{-\lambda^*\nu}I\big(\tfrac{x_1}{a_1}>\tfrac{x_2}{a_2}\big)
+(1-p_1)(1-p_2)\frac{\lambda_0\lambda_1\lambda_2}{\lambda^*}e^{-\lambda_1x_1}e^{-\lambda_2x_2}\big(1-e^{-\lambda^*\nu}\big),
\]
where $\nu = \min\big(\tfrac{x_1}{a_1},\tfrac{x_2}{a_2}\big)$ and $\lambda^* = \lambda_0-a_1\lambda_1-a_2\lambda_2$, a form similar to that of the previous section.
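As a numerical sanity check on the form of $f(x_1,x_2)$, the absolutely continuous part should integrate to $1-p_1p_2$, since the remaining mass $p_1p_2$ lies on the singular ray $x_1/a_1 = x_2/a_2$. The sketch below performs this check for one arbitrary, illustrative choice of parameters (it is not from the text, and the result is only accurate up to quadrature tolerance over the infinite domain).

```python
import numpy as np
from scipy.integrate import dblquad

# Check that the absolutely continuous part of f(x1, x2) integrates to 1 - p1*p2.
# Parameter values are illustrative and satisfy a_i*lam_i < lam0 and lam_star != 0.
lam0, lam1, lam2, a1, a2 = 1.0, 2.0, 4.0, 0.25, 0.10
p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0
lam_star = lam0 - a1 * lam1 - a2 * lam2

def f_ac(x1, x2):
    nu = min(x1 / a1, x2 / a2)
    base = np.exp(-lam1 * x1 - lam2 * x2)
    val = (1 - p1) * (1 - p2) * lam0 * lam1 * lam2 / lam_star * base * (1 - np.exp(-lam_star * nu))
    if x1 / a1 < x2 / a2:
        val += p1 * (1 - p2) * lam0 * lam2 / a1 * base * np.exp(-lam_star * x1 / a1)
    elif x2 / a2 < x1 / a1:
        val += (1 - p1) * p2 * lam0 * lam1 / a2 * base * np.exp(-lam_star * x2 / a2)
    return val

mass, _ = dblquad(f_ac, 0, np.inf, 0, np.inf)
print(mass, 1 - p1 * p2)   # the two numbers should agree up to integration error
```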
Based on a random sample of size $n$, denoted $(x_{11},x_{21}),(x_{12},x_{22}),\ldots,(x_{1n},x_{2n})$, define
\[
r_j^{(1)} = \begin{cases}1, & \text{if } x_{0j} = \frac{x_{1j}}{a_1}\leq\frac{x_{2j}}{a_2},\\ 0, & \text{if } \frac{x_{1j}}{a_1}>\frac{x_{2j}}{a_2},\end{cases}
\qquad
r_j^{(2)} = \begin{cases}1, & \text{if } x_{0j} = \frac{x_{2j}}{a_2}\leq\frac{x_{1j}}{a_1},\\ 0, & \text{if } \frac{x_{2j}}{a_2}>\frac{x_{1j}}{a_1}.\end{cases}
\]
Then
\[
L(\lambda_0,\lambda_1,\lambda_2) = \prod_{j=1}^{n}f(x_{1j},x_{2j})
= \prod_{j=1}^{n}\Big[p_1p_2\frac{\lambda_0}{a_1a_2}e^{-\lambda_0\frac{x_{2j}}{a_2}}\Big]^{r_j^{(1)}r_j^{(2)}}
\Big[p_1(1-p_2)\frac{1}{a_1}\lambda_0\lambda_2e^{-\lambda_1x_{1j}}e^{-\lambda_2x_{2j}}e^{-\lambda^*\nu_j}\Big]^{r_j^{(1)}(1-r_j^{(2)})}
\Big[(1-p_1)p_2\frac{1}{a_2}\lambda_0\lambda_1e^{-\lambda_1x_{1j}}e^{-\lambda_2x_{2j}}e^{-\lambda^*\nu_j}\Big]^{(1-r_j^{(1)})r_j^{(2)}}
\Big[(1-p_1)(1-p_2)\frac{\lambda_0\lambda_1\lambda_2}{\lambda^*}e^{-\lambda_1x_{1j}}e^{-\lambda_2x_{2j}}\big(1-e^{-\lambda^*\nu_j}\big)\Big]^{(1-r_j^{(1)})(1-r_j^{(2)})}.
\]
In order to obtain estimators, the log-likelihood is
\[
\ell(\lambda_0,\lambda_1,\lambda_2) = \log L(\lambda_0,\lambda_1,\lambda_2)
= \sum_i r_i^{(1)}r_i^{(2)}\big[\log(p_1p_2)+\log\lambda_0-\log(a_1a_2)-\lambda_0\nu_i\big]
+r_i^{(1)}(1-r_i^{(2)})\big[\log p_1(1-p_2)+\log\lambda_0+\log\lambda_2-\log a_1-\lambda_1x_{1i}-\lambda_2x_{2i}-\lambda^*\nu_i\big]
+(1-r_i^{(1)})r_i^{(2)}\big[\log(1-p_1)p_2+\log\lambda_0+\log\lambda_1-\log a_2-\lambda_1x_{1i}-\lambda_2x_{2i}-\lambda^*\nu_i\big]
+(1-r_i^{(1)})(1-r_i^{(2)})\big[\log(1-p_1)(1-p_2)+\log\lambda_0+\log\lambda_1\lambda_2-\log\lambda^*-\lambda_1x_{1i}-\lambda_2x_{2i}+\log\big(1-e^{-\lambda^*\nu_i}\big)\big].
\]
Assume that $p_1$ and $p_2$ are fixed, known constants. Then
\[
\frac{\partial\ell}{\partial\lambda_0} = \sum_i r_i^{(1)}r_i^{(2)}\Big[\frac{1}{\lambda_0}-\nu_i\Big]
+r_i^{(1)}(1-r_i^{(2)})\Big[\frac{1}{\lambda_0}-\nu_i\Big]
+(1-r_i^{(1)})r_i^{(2)}\Big[\frac{1}{\lambda_0}-\nu_i\Big]
+(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda_0}-\frac{1}{\lambda^*}+\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \frac{n}{\lambda_0}-\sum_i\nu_i\big(r_i^{(1)}+r_i^{(2)}-r_i^{(1)}r_i^{(2)}\big)
-\sum_i(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big].
\]
Similarly,
\[
\frac{\partial\ell}{\partial\lambda_1} = \sum_i r_i^{(1)}(1-r_i^{(2)})\big[-x_{1i}+a_1\nu_i\big]
+(1-r_i^{(1)})r_i^{(2)}\Big[\frac{1}{\lambda_1}-x_{1i}+a_1\nu_i\Big]
+(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda_1}+\frac{a_1}{\lambda^*}-x_{1i}-\frac{a_1\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i\Big\{-x_{1i}\big(1-r_i^{(1)}r_i^{(2)}\big)+a_1\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]
+\frac{1-r_i^{(1)}}{\lambda_1}+a_1(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]\Big\}.
\]
So, setting $\partial\ell/\partial\lambda_1 = 0$ and $\partial\ell/\partial\lambda_0 = 0$ gives
\[
a_1\sum_i(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i x_{1i}\big(1-r_i^{(1)}r_i^{(2)}\big)-a_1\sum_i\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]-\frac{1}{\lambda_1}\sum_i(1-r_i^{(1)}),
\]
and hence
\[
\frac{n}{\lambda_0} = \frac{1}{a_1}\sum_i x_{1i}\big(1-r_i^{(1)}r_i^{(2)}\big)+\sum_i\nu_ir_i^{(1)}r_i^{(2)}-\frac{1}{a_1\lambda_1}\sum_i(1-r_i^{(1)}),
\quad\text{or}\quad
\frac{1}{\lambda_0} = \frac{1}{a_1}\frac{\sum_i x_{1i}(1-r_i^{(1)}r_i^{(2)})}{n}+\frac{\sum_i\nu_ir_i^{(1)}r_i^{(2)}}{n}-\frac{1}{a_1\lambda_1}\frac{\sum_i(1-r_i^{(1)})}{n}.
\]
A similar formula is obtained by taking $\partial\ell/\partial\lambda_2$ and setting it equal to zero:
\[
\frac{\partial\ell}{\partial\lambda_2} = \sum_i r_i^{(1)}(1-r_i^{(2)})\Big[\frac{1}{\lambda_2}-x_{2i}+a_2\nu_i\Big]
+(1-r_i^{(1)})r_i^{(2)}\big[-x_{2i}+a_2\nu_i\big]
+(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda_2}+\frac{a_2}{\lambda^*}-x_{2i}-\frac{a_2\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i\Big\{-x_{2i}\big(1-r_i^{(1)}r_i^{(2)}\big)+a_2\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]
+\frac{1-r_i^{(2)}}{\lambda_2}+a_2(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]\Big\}.
\]
So
\[
a_2\sum_i(1-r_i^{(1)})(1-r_i^{(2)})\Big[\frac{1}{\lambda^*}-\frac{\nu_ie^{-\lambda^*\nu_i}}{1-e^{-\lambda^*\nu_i}}\Big]
= \sum_i x_{2i}\big(1-r_i^{(1)}r_i^{(2)}\big)-a_2\sum_i\nu_i\big[r_i^{(1)}(1-r_i^{(2)})+(1-r_i^{(1)})r_i^{(2)}\big]-\frac{1}{\lambda_2}\sum_i(1-r_i^{(2)}),
\]
and hence
\[
\frac{1}{\lambda_0} = \frac{1}{a_2}\frac{\sum_i x_{2i}(1-r_i^{(1)}r_i^{(2)})}{n}+\frac{\sum_i\nu_ir_i^{(1)}r_i^{(2)}}{n}-\frac{1}{a_2\lambda_2}\frac{\sum_i(1-r_i^{(2)})}{n}.
\]
This equation gives an estimate of the parameter associated with the unknown latent variable $x_0$.

The distribution of the minimum is also of interest; hence the need to characterize it. For $X_{(1)}$ denoting the minimum of $\tfrac{X_1}{a_1}$ and $\tfrac{X_2}{a_2}$, we have, based on the independence of $X_1\mid x_0$ and $X_2\mid x_0$, that
\[
P(X_{(1)}>t\mid x_0) = P(X_1>a_1t\mid x_0)\,P(X_2>a_2t\mid x_0). \tag{5.7}
\]
Equation (5.7) follows from Theorem 1.8 and the fact that $X_1\mid x_0$ and $X_2\mid x_0$ are independent. But $X_1\mid x_0$ and $X_2\mid x_0$ are not identically distributed, so we cannot use the results on the minimal order statistic in, say, Dudewicz and Mishra (1988) [23]. Hence, we need to find $P(X_i>a_it\mid x_0)$ for $i=1,2$:
\[
P(X_i>a_it\mid x_0) = P(a_ix_0+Z_i>a_it\mid x_0) = P\big(Z_i>a_i(t-x_0)\mid x_0\big)
= \int_{a_i(t-x_0)}^{\infty}f_{Z_i}(z_i)\,dz_i
= \int_{a_i(t-x_0)}^{\infty}\big[p_i\delta(z_i)+(1-p_i)\lambda_ie^{-\lambda_iz_i}\big]\,dz_i
= p_i\big[1-H\big(a_i(t-x_0)\big)\big]+(1-p_i)e^{-a_i\lambda_i(t-x_0)}
= p_i\big[1-H(t-x_0)\big]+(1-p_i)e^{-a_i\lambda_i(t-x_0)}.
\]
From the above, we can deduce the conditional density of $\tfrac{X_i}{a_i}$ given $x_0$, which is
\[
f_{X_i/a_i\mid x_0}(t) = -\frac{d}{dt}P(X_i>a_it\mid x_0) = p_i\,\delta(t-x_0)+(1-p_i)a_i\lambda_ie^{-a_i\lambda_i(t-x_0)}.
\]
Our goal is to find $f_{X_{(1)}\mid X_0}(t) = -\frac{d}{dt}P(X_{(1)}>t\mid x_0)$. So, from Theorem 1.8, the conditional survival function of the minimum lifetime is
\[
P(X_{(1)}>t\mid x_0) = P(X_1>a_1t\mid x_0)\,P(X_2>a_2t\mid x_0)
= \big[p_1(1-H(t-x_0))+(1-p_1)e^{-a_1\lambda_1(t-x_0)}\big]\big[p_2(1-H(t-x_0))+(1-p_2)e^{-a_2\lambda_2(t-x_0)}\big]
\]
\[
= p_1p_2\big(1-H(t-x_0)\big)
+p_1(1-p_2)\big(1-H(t-x_0)\big)e^{-a_2\lambda_2(t-x_0)}
+(1-p_1)p_2\big(1-H(t-x_0)\big)e^{-a_1\lambda_1(t-x_0)}
+(1-p_1)(1-p_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}.
\]
Hence the conditional density of $X_{(1)}$ given $X_0$ is
\[
f_{X_{(1)}\mid x_0}(t) = -\frac{d}{dt}P(X_{(1)}>t\mid x_0)
= p_1p_2\,\delta(t-x_0)
+p_1(1-p_2)\big[\delta(t-x_0)e^{-a_2\lambda_2(t-x_0)}+\big(1-H(t-x_0)\big)a_2\lambda_2e^{-a_2\lambda_2(t-x_0)}\big]
+(1-p_1)p_2\big[\delta(t-x_0)e^{-a_1\lambda_1(t-x_0)}+\big(1-H(t-x_0)\big)a_1\lambda_1e^{-a_1\lambda_1(t-x_0)}\big]
+(1-p_1)(1-p_2)(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}
\]
\[
= p_1p_2\,\delta(t-x_0)
+p_1(1-p_2)\,\delta(t-x_0)e^{-a_2\lambda_2(t-x_0)}
+(1-p_1)p_2\,\delta(t-x_0)e^{-a_1\lambda_1(t-x_0)}
+(1-p_1)(1-p_2)(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)},
\]
where the terms involving $1-H(t-x_0)$ vanish on the support $t\geq x_0$ by the definition of the Heaviside function in Equation (2.3).
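The structure of $f_{X_{(1)}\mid x_0}$ can be checked directly by simulation: given $X_0 = x_0$, the minimum equals $x_0$ with probability $1-(1-p_1)(1-p_2)$, and otherwise it exceeds $x_0$ by an exponential amount with rate $a_1\lambda_1+a_2\lambda_2$. The following sketch, with illustrative parameter values of our own choosing, verifies both facts by Monte Carlo.

```python
import numpy as np

# Monte Carlo sketch of the conditional law of X_(1) given X_0 = x0.
# The derived density says: the minimum equals x0 with probability 1-(1-p1)(1-p2),
# and otherwise is x0 plus an Exp(a1*lam1 + a2*lam2) variable.
rng = np.random.default_rng(1)
lam0, lam1, lam2, a1, a2, x0 = 1.0, 2.0, 4.0, 0.25, 0.10, 0.7   # illustrative values
p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0

n = 400_000
z1 = np.where(rng.random(n) < p1, 0.0, rng.exponential(1 / lam1, n))
z2 = np.where(rng.random(n) < p2, 0.0, rng.exponential(1 / lam2, n))
m = np.minimum(x0 + z1 / a1, x0 + z2 / a2)           # X_(1) given X_0 = x0

print((m == x0).mean(), 1 - (1 - p1) * (1 - p2))     # mass of the atom at x0
cont = m[m > x0] - x0
print(cont.mean(), 1 / (a1 * lam1 + a2 * lam2))      # mean of the continuous part
```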
Note that $\int_{-\infty}^{\infty}f_{X_{(1)}\mid x_0}(t)\,dt = 1$. Also,
\[
E(X_{(1)}\mid x_0) = \int_{-\infty}^{\infty}t\,f_{X_{(1)}\mid x_0}(t)\,dt
= p_1p_2\int t\,\delta(t-x_0)\,dt
+p_1(1-p_2)\int t\,\delta(t-x_0)e^{-a_2\lambda_2(t-x_0)}\,dt
+(1-p_1)p_2\int t\,\delta(t-x_0)e^{-a_1\lambda_1(t-x_0)}\,dt
+(1-p_1)(1-p_2)\int_{x_0}^{\infty}t\,(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}\,dt
\]
\[
= p_1p_2x_0+p_1(1-p_2)x_0+(1-p_1)p_2x_0+(1-p_1)(1-p_2)\Big(\frac{1}{a_1\lambda_1+a_2\lambda_2}+x_0\Big)
= x_0+\frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}.
\]
So, $E(X_{(1)}\mid x_0) = x_0+\dfrac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}$. Hence,
\[
E(X_{(1)}) = E_{X_0}\big[E(X_{(1)}\mid X_0)\big] = E(X_0)+\frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}
= \frac{1}{\lambda_0}+\frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}.
\]
Another way to obtain $E(X_{(1)})$, using the densities, is as follows:
\[
f_{X_{(1)},X_0}(t,x_0) = f_{X_{(1)}\mid X_0}(t)f_{X_0}(x_0)
= p_1p_2\lambda_0e^{-\lambda_0x_0}\delta(t-x_0)
+p_1(1-p_2)\lambda_0e^{-\lambda_0x_0}e^{-a_2\lambda_2(t-x_0)}\delta(t-x_0)
+(1-p_1)p_2\lambda_0e^{-\lambda_0x_0}e^{-a_1\lambda_1(t-x_0)}\delta(t-x_0)
+(1-p_1)(1-p_2)\lambda_0(a_1\lambda_1+a_2\lambda_2)e^{-\lambda_0x_0}e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}H(t-x_0).
\]
Hence the form of the density of the minimum order statistic is
\[
f_{X_{(1)}}(t) = \int_{-\infty}^{\infty}f_{X_{(1)},X_0}(t,x_0)\,dx_0
= p_1p_2\lambda_0e^{-\lambda_0t}+p_1(1-p_2)\lambda_0e^{-\lambda_0t}+(1-p_1)p_2\lambda_0e^{-\lambda_0t}
+(1-p_1)(1-p_2)\lambda_0(a_1\lambda_1+a_2\lambda_2)\int_0^{t}e^{-\lambda_0x_0}e^{-(a_1\lambda_1+a_2\lambda_2)(t-x_0)}\,dx_0
\]
\[
= \lambda_0e^{-\lambda_0t}
+(1-p_1)(1-p_2)\Big[\frac{\lambda_0(a_1\lambda_1+a_2\lambda_2)}{\lambda^*}\big(1-e^{-\lambda^*t}\big)e^{-(a_1\lambda_1+a_2\lambda_2)t}-\lambda_0e^{-\lambda_0t}\Big]
= \lambda_0e^{-\lambda_0t}+(1-p_1)(1-p_2)\frac{\lambda_0}{\lambda^*}\Big[(a_1\lambda_1+a_2\lambda_2)e^{-(a_1\lambda_1+a_2\lambda_2)t}-\lambda_0e^{-\lambda_0t}\Big].
\]
Hence
\[
E(X_{(1)}) = \frac{1}{\lambda_0}+(1-p_1)(1-p_2)\frac{\lambda_0}{\lambda^*}\Big(\frac{1}{a_1\lambda_1+a_2\lambda_2}-\frac{1}{\lambda_0}\Big),
\]
which agrees with the expression obtained above.
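A quick Monte Carlo check of the expression for $E(X_{(1)})$, again with arbitrary illustrative parameter values: since $(1-p_1)(1-p_2)\frac{\lambda_0}{\lambda^*}\big(\frac{1}{a_1\lambda_1+a_2\lambda_2}-\frac{1}{\lambda_0}\big) = \frac{(1-p_1)(1-p_2)}{a_1\lambda_1+a_2\lambda_2}$, the two forms of the mean coincide, and the simulated mean of $\min(X_1/a_1,X_2/a_2)$ should match them.

```python
import numpy as np

# Monte Carlo check of E[X_(1)] = 1/lam0 + (1-p1)(1-p2)/(a1*lam1 + a2*lam2).
# Parameter values are illustrative choices, not from the dissertation.
rng = np.random.default_rng(7)
lam0, lam1, lam2, a1, a2 = 1.0, 2.0, 4.0, 0.25, 0.10
p1, p2 = a1 * lam1 / lam0, a2 * lam2 / lam0

n = 500_000
x0 = rng.exponential(1 / lam0, n)
z1 = np.where(rng.random(n) < p1, 0.0, rng.exponential(1 / lam1, n))
z2 = np.where(rng.random(n) < p2, 0.0, rng.exponential(1 / lam2, n))
xmin = np.minimum((a1 * x0 + z1) / a1, (a2 * x0 + z2) / a2)

print(xmin.mean())
print(1 / lam0 + (1 - p1) * (1 - p2) / (a1 * lam1 + a2 * lam2))
```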
Recall that
\[
P(X_i>x_i\mid x_0) = P(a_ix_0+Z_i>x_i\mid x_0) = P(Z_i>x_i-a_ix_0\mid x_0)
= \int_{x_i-a_ix_0}^{\infty}\big[p_i\delta(z)+(1-p_i)\lambda_ie^{-\lambda_iz}\big]\,dz
= p_i\big[1-H(x_i-a_ix_0)\big]+(1-p_i)e^{-\lambda_i(x_i-a_ix_0)}.
\]
Hence
\[
F_{X_i\mid x_0}(x_i) = P(X_i\leq x_i\mid x_0) = 1-P(X_i>x_i\mid x_0)
= p_iH(x_i-a_ix_0)+(1-p_i)\big[1-e^{-\lambda_i(x_i-a_ix_0)}\big],
\]
and
\[
F(x_1,x_2\mid x_0) = \Big[p_1H(x_1-a_1x_0)+(1-p_1)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)\Big]\Big[p_2H(x_2-a_2x_0)+(1-p_2)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big)\Big]
\]
\[
= p_1p_2H(x_1-a_1x_0)H(x_2-a_2x_0)
+p_1(1-p_2)H(x_1-a_1x_0)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big)
+(1-p_1)p_2H(x_2-a_2x_0)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)
+(1-p_1)(1-p_2)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big).
\]
Also,
\[
F(x_1,\infty\mid x_0) = \lim_{x_2\to\infty}F(x_1,x_2\mid x_0)
= p_1p_2H(x_1-a_1x_0)+p_1(1-p_2)H(x_1-a_1x_0)
+(1-p_1)p_2\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)+(1-p_1)(1-p_2)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)
= p_1H(x_1-a_1x_0)+(1-p_1)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big).
\]
Similarly,
\[
F(\infty,x_2\mid x_0) = p_2H(x_2-a_2x_0)+(1-p_2)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big).
\]
Hence
\[
S(x_1,x_2\mid x_0) = 1+p_1p_2H(x_1-a_1x_0)H(x_2-a_2x_0)
-p_1p_2H(x_1-a_1x_0)-p_1(1-p_2)H(x_1-a_1x_0)e^{-\lambda_2(x_2-a_2x_0)}
-p_1p_2H(x_2-a_2x_0)-(1-p_1)p_2H(x_2-a_2x_0)e^{-\lambda_1(x_1-a_1x_0)}
+(1-p_1)(1-p_2)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big)
-(1-p_1)\big(1-e^{-\lambda_1(x_1-a_1x_0)}\big)-(1-p_2)\big(1-e^{-\lambda_2(x_2-a_2x_0)}\big).
\]
We have proposed a very general bivariate exponential class of distributions, which includes all the work considered in Chapter 3 and in Chapter 4. We have described the form of the joint distribution and survival functions. The distribution of the minimum has been characterized. Estimates of the parameters are given based on the likelihood equations. We have done all of this while retaining the form of the marginal exponential distributions and the fatal shock idea of Marshall and Olkin (1967) [61]. These results are not easily generalized to the $p$-variate gamma distribution case; one reason is that there is no need to approximate the likelihood function in the multivariate exponential case, in contrast with the results obtained in Chapter 4. In the next section, we examine a simulated example to illustrate our proposed model.

5.3 Simulation Example

In this section, we perform a simulation study of the multivariate exponential to examine the properties of various estimators of the parameter of the latent distribution, $\lambda_0$. We focus on $\lambda_0$ because it is an important part of the correlation structure, and all of the parameters associated with $X_1$ through $X_p$ can easily be estimated marginally, which is well documented in the literature. We examine the multivariate exponential given in (1.1) and (1.3) for various dimensions, from $p=2$ to $p=5$. Based on 10,000 replications, each of sample size 50, of $(X_1,X_2,\ldots,X_p)'$, we choose all $a_i$'s to be 1 and $\lambda_0$ to be 1, and, solving for $\lambda_i$ in terms of $\rho$, we have $\lambda_i = \rho/(a_i\lambda_0) = \rho$. Separate simulations are done for $\rho = 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90$ and $0.95$. Results for the bias and MSE are presented in Table 5.1 and Table 5.2, respectively.

Bias02 and Mse02 represent the bias and MSE of $\hat\lambda_0 = 1/\bar x_0$, where $\bar x_0 = \frac{1}{n}\sum_{i=1}^{n}x_{0i}$, if the latent unobservable values $x_{01},\ldots,x_{0n}$ were actually known. It is important to point out that $\hat\lambda_0$ is not observable. However, if these values were observable, then $\hat\lambda_0$ would be the maximum likelihood estimator of $\lambda_0$. Therefore, the performance of $\hat\lambda_0$ serves as a good benchmark against which to compare all of the other estimators described in this chapter, which are based on approximations. More precisely, if we denote by $x_{\min,p}$ the minimum of $x_1/a_1$ up to $x_p/a_p$ for $p=2,\ldots,5$, then $\mathrm{Bias02} = 1/\bar x_0-\lambda_0$ and $\mathrm{Bias1}p = 1/\bar x_{\min,p}-\lambda_0$, respectively, and Mse02 and Mse1$p$ are the corresponding mean squared errors. The columns $x_0$ and xmin report the averages of the latent $x_0$ values and of the corresponding minima; xdiff represents the difference between these estimates and the true values of $x_0$, and xdiffsq represents the mean squared difference.

rho  | Bias02    | Bias12   | Bias13   | Bias14   | Bias15   | x0      | xmin    | xdiff
0.05 | -0.011197 | -0.574   | -0.39937 | -0.28855 | -0.17101 | 0.98249 | 1.23238 | 0.24989
0.10 | 0.009905  | -0.421   | -0.23330 | -0.12258 | -0.06381 | 0.99339 | 1.08727 | 0.09389
0.20 | 0.010300  | -0.24054 | -0.08718 | -0.05341 | -0.01757 | 1.01166 | 1.03754 | 0.02588
0.30 | 0.038332  | -0.12648 | -0.01558 | -0.01427 | -0.00735 | 1.02107 | 1.02898 | 0.00791
0.40 | 0.032919  | -0.06926 | -0.01205 | 0.01867  | 0.00780  | 1.00860 | 1.01026 | 0.00166
0.50 | 0.014231  | -0.04644 | 0.02828  | 0.02922  | 0.01344  | 1.00713 | 1.00778 | 0.00065
0.60 | 0.002170  | -0.03026 | -0.01516 | 0.00538  | 0.00991  | 1.01039 | 1.01066 | 0.00027
0.70 | 0.017750  | 0.00292  | 0.02846  | 0.03072  | 0.03005  | 0.99122 | 0.99122 | 0.00001
0.80 | 0.003247  | -0.00332 | -0.01022 | 0.03713  | 0.00960  | 1.01068 | 1.01068 | 0.00000
0.90 | 0.0223014 | 0.02106  | 0.01377  | 0.00629  | 0.02209  | 0.99810 | 0.99810 | 0.00000
0.95 | -0.009249 | -0.00944 | 0.03343  | 0.00253  | 0.00857  | 1.00892 | 1.00892 | 0.00000

Table 5.1: Bias and estimation of $x_0$ for different correlations

From Table 5.1, as the number of variates increases, the bias decreases in magnitude. That is something we could expect, since the prediction of $x_0$ becomes more accurate with a higher number of variates; in fact, each variate gives partial information about $x_0$. Also, as $\rho$ increases, the estimate of $x_0$ becomes more efficient. It is also observable that the bias becomes satisfactory at higher correlations for the 2, 3, 4 and 5 variate cases.
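The dissertation's simulations were run in SAS; the sketch below is a reduced re-implementation in Python of the design just described (all $a_i = 1$, $\lambda_0 = 1$, $\lambda_i = \rho$). The seed, replication count, and the way the two estimators are coded are our own illustrative choices.

```python
import numpy as np

# Sketch of the simulation design described above: benchmark estimator 1/mean(x0)
# (latent x0 observed) versus the feasible estimator 1/mean(min_i x_i / a_i).
rng = np.random.default_rng(42)

def one_rep(rho, p_dim, n=50, lam0=1.0):
    x0 = rng.exponential(1 / lam0, n)
    z = np.where(rng.random((n, p_dim)) < rho, 0.0, rng.exponential(1 / rho, (n, p_dim)))
    x = x0[:, None] + z                        # a_i = 1 for every component
    lam0_hat_bench = 1 / x0.mean()             # benchmark: latent x0 observed
    lam0_hat_min = 1 / x.min(axis=1).mean()    # feasible: based on min_i x_i / a_i
    return lam0_hat_bench, lam0_hat_min

for rho in (0.05, 0.5, 0.95):
    reps = np.array([one_rep(rho, p_dim=5) for _ in range(2000)])
    print(rho, reps.mean(axis=0) - 1.0)        # bias of the two estimators
```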
We also present the mean squared errors of the estimates in Table 5.2.

rho  | Mse02    | Mse12   | Mse13   | Mse14    | Mse15    | xdiffsq
0.05 | 0.018739 | 0.33305 | 0.16586 | 0.090732 | 0.044181 | 0.45044
0.10 | 0.019985 | 0.18484 | 0.06545 | 0.030232 | 0.020013 | 0.12077
0.20 | 0.018835 | 0.07029 | 0.02061 | 0.016686 | 0.019382 | 0.02934
0.30 | 0.020357 | 0.02646 | 0.01864 | 0.021683 | 0.022462 | 0.00687
0.40 | 0.020278 | 0.02136 | 0.02126 | 0.020382 | 0.017916 | 0.00080
0.50 | 0.019973 | 0.01946 | 0.02365 | 0.027534 | 0.022398 | 0.00022
0.60 | 0.019373 | 0.01839 | 0.01619 | 0.016020 | 0.021058 | 0.00026
0.70 | 0.031892 | 0.02974 | 0.01811 | 0.029901 | 0.023984 | 0.00000
0.80 | 0.017505 | 0.01730 | 0.02042 | 0.029010 | 0.020515 | 0.00000
0.90 | 0.018066 | 0.01788 | 0.01857 | 0.019959 | 0.022949 | 0.00000
0.95 | 0.015202 | 0.01517 | 0.02429 | 0.027867 | 0.018026 | 0.00000

Table 5.2: MSE of the $x_0$ estimates for different correlations

These estimated values show the effectiveness of the proposed estimation techniques. As we see from Table 5.2, the differences appear to be consistently small, although high values of the correlation do appear to give lower MSEs. The algorithm for the proposed estimation is not difficult to implement, although it may be time consuming. We implemented it using the SAS program.

Chapter 6
Conclusion

In this dissertation, we defined and characterized a new multivariate generalized location-scale family of gamma distributions with potential applications in survival and reliability modeling. This family possesses three-parameter gamma marginals (in most cases), and it contains absolutely continuous classes as well as Marshall-Olkin-type distributions with a positive probability mass on a set of measure zero. Interestingly, the variables making up the multivariate vector were made linearly related, indirectly, through a collection of latent random variables, and the multivariate distribution is not necessarily restricted to those with gamma marginal distributions. Maximum likelihood estimators and estimators based on the EM algorithm were proposed for the unknown parameters and, in addition, methods were given to estimate the latent terms in the model. We have shown that this distribution is shift invariant, closed under finite independent convolutions, and closed under scale transformations. We have also shown that our model has as special cases the models proposed by Mathai and Moschopoulos (1991) [63], Iyer et al. (2002) [42], and Iyer et al. (2004) [43], and we have corrected some of their omissions.

The possible implications of this work are considerable, since it takes into account the non-i.i.d. properties of real data. Allowing the $a_i$'s and their structure to be unknown will add many more applications of the model. Further investigation of the shape parameters will allow the characterization of more distributions. The estimation techniques can be refined, and that will improve the statistical decision approach. Extending the model to include censoring is an attractive option for several applications.

Bibliography

[1] Abramowitz, M. and Stegun, I.A. (1972), Handbook of Mathematical Functions, Selected Government Publications, Chapter 6, New York, Dover.
[2] Arnold, B.C. (1967), A note on multivariate distributions with specified marginals, Journal of the American Statistical Association, Vol. 62, pp. 1460-1461.
[3] Au, Chi and Tam, Judy (1999), Transforming Variables Using the Dirac Generalized Function, The American Statistician, Vol. 53, 3, pp. 270-273.
[4] Balakrishnan, N. and Wang, J. (2000), Simple efficient estimation of the three-parameter gamma distribution, Journal of Statistical Planning and Inference, 85, pp. 115-126.
[5] Barlow, R.E. and Proschan, F.
(1965), Mathematical Theory of Reliability, John Wiley & Sons, Library of Congress.
[6] Billingsley, Patrick (1986), Probability and Measure, Wiley Series in Probability and Mathematical Statistics, 2nd edition.
[7] Borovkov, A.A. and Utev, S.A. (1984), On an inequality and a related characterization of the normal distribution, Theory of Prob. Appl., Vol. 28, pp. 219-228.
[8] Bowman, K.O. and Shenton, L.R. (1988), Properties of estimators of the gamma distribution, New York: Marcel Dekker, Taylor Series.
[9] Bowman, K.O. and Shenton, L.R. (2002), Problems with Maximum Likelihood Estimation and the 3-Parameter Gamma Distribution, J. Statist. Comput. Simul., Vol. 72(5), pp. 391-401.
[10] Brockwell, P.J. and Davis, R.A. (1996), Introduction to Time Series and Forecasting, Springer Texts in Statistics.
[11] Carpenter, M. and Diawara, N. (2006), A Multivariate Gamma Distribution and Its Characterizations, Accepted pending revision, J. Mathematical and Management Sciences.
[12] Carpenter, M., Diawara, N. and Han, Yi (2006), A New Class of Bivariate Weibull Survival Distributions, J. Mathematical and Management Sciences, Vol. 26 (1 & 2), pp. 164-184.
[13] Casella, G. and Berger, R.L. (1990), Statistical Inference, Duxbury Press, 2nd edition.
[14] Chen, L.H.Y. (1982), An inequality for the multivariate normal distribution, J. Multivariate Anal., Vol. 12, pp. 306-315.
[15] Chen, L.H.Y. (1987), Characterization of probability distributions by Poincare-type inequalities, Ann. Inst. H. Poincare Sect. B (N.S.), Vol. 23, pp. 91-110.
[16] Cohen, A.C. and Whitten, B.J. (1986), Modified moment estimation for the three-parameter gamma distribution, J. Quality Technol., 18, pp. 53-62.
[17] Cohen, A.C. and Whitten, B.J. (1988), Parameter Estimation in Reliability and Life Span Models, Dekker, New York.
[18] Cohen-Tannoudji, C., Diu, B., and Laloë, F. (1977), Quantum Mechanics, Vol. 2, John Wiley & Sons.
[19] Coles, S.G. and Tawn, J.A. (1994), Statistical Methods for Multivariate Extremes: An Application to Structural Design, Appl. Statist., Vol. 43, pp. 1-48.
[20] Cox, D.R. (1972), Regression Models and Life Tables, J. Royal Statist. Society, Series B, Vol. 34, pp. 187-220.
[21] Csorgo, S. and Welsh, A.H. (1989), Testing for Exponential and Marshall-Olkin Distributions, Journal of Statistical Planning and Inference, 23, pp. 287-300.
[22] Dempster, A., Laird, N. and Rubin, D. (1977), Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, Vol. 39 (1), pp. 1-38.
[23] Dudewicz, E.J. and Mishra, S. (1988), Modern Mathematical Statistics, Wiley Series in Probability and Mathematical Statistics.
[24] Fang, K. and Zhang, Y. (1990), Generalized Multivariate Analysis, Springer.
[25] Feller, W. (1971), An Introduction to Probability Theory and Its Applications, Second Edition, Wiley Series in Probability and Mathematical Statistics.
[26] Folland, G.B. (1999), Real Analysis: Modern Techniques and Their Applications, Wiley-Interscience Series.
[27] Friedl, H. and Kauermann, G. (2000), Standard errors for EM estimates in generalized linear models with random effects, Biometrics, Vol. 56, pp. 761-767.
[28] Furman, E. and Landsman, Z. (2005), Risk capital decomposition for a multivariate dependent gamma portfolio, Insurance: Mathematics and Economics, Vol. 37, issue 3, pp. 635-649.
[29] Gaver, D.P. and Lewis, P.A.W. (1980), First-order autoregressive gamma sequences and point processes, Adv. Appl. Prob., 12, pp. 727-745.
[30] Ghosh, S.K.
and Gelfand, A.E. (1998), Latent Waiting Time Models for Bivariate Event Times with Censoring, Sankhya, Vol. 60, Series A, pp. 31-47.
[31] Grove, D.A. and Coddington, P.D. (2005), Analytic Models of Probability Distributions for MPI Point-to-Point Communication Times on Distributed Memory Parallel Computers, Proc. 6th International Conference on Algorithms and Architectures for Parallel Processing, Melbourne, Australia.
[32] Hanagal, D.D. (1996), A multivariate Weibull distribution, Economic Quality Control, 11, pp. 193-200.
[33] He, W. and Lawless, J.F. (2005), Bivariate location-scale models for regression analysis, with applications to lifetime data, J. R. Statist. Soc. B, Vol. 67, part 1, pp. 63-78.
[34] Henderson, R. and Shimakura, S. (2003), A Serially Correlated Gamma Frailty Model for Longitudinal Count Data, Biometrika, 90, 2, pp. 355-366.
[35] Hougaard, Philip (1986), A class of multivariate failure time distributions, Biometrika, 73, 3, pp. 671-678.
[36] Hougaard, Philip (2000), Analysis of Multivariate Survival Data, Springer.
[37] Hougaard, Philip (2001), The normality assumption: Pro and Contra, Biometrika, 73, 3, pp. 671-678.
[38] Hunter, J.K. and Nachtergaele, B. (2001), Applied Analysis, River Edge, NJ, World Scientific.
[39] Hwang, T.Y. and Hu, C.Y. (1999), On a characterization of the gamma distribution: the independence of the sample mean and the sample coefficient of variation, Ann. Inst. Statist. Math., 51, pp. 749-753.
[40] Hwang, T.Y. and Hu, C.Y. (2000), On some characterizations of population distributions, Taiwanese Journal of Mathematics, Vol. 4, No. 3, pp. 427-437.
[41] Hwang, T.Y. and Huang, P.H. (2002), On new moment estimation of parameters of the gamma distribution using its characterization, Ann. Inst. Stat. Math., 54, pp. 840-847.
[42] Iyer, Srikanth K., Manjunath, D. and Manivasakan, R. (2002), Bivariate Exponential Distributions Using Linear Structures, Sankhya, Vol. 64, Series A, pp. 156-166.
[43] Iyer, Srikanth K. and Manjunath, D. (2004), Correlated Bivariate Sequences for Queueing and Reliability Applications, Communications in Statistics, Vol. 33, pp. 331-350.
[44] Joe, H. (1997), Multivariate Models and Dependence Concepts, Chapman & Hall.
[45] Johnson, N. and Kotz, S. (1970), Continuous Univariate Distributions-1, Wiley, New York.
[46] Johnson, N., Kotz, S. and Kemp, A. (1993), Univariate Discrete Distributions, 2nd edition, Wiley Series in Probability and Mathematical Statistics.
[47] Johnson, N., Kotz, S. and Balakrishnan, N. (1997), Discrete Multivariate Distributions, Wiley Series in Probability and Statistics.
[48] Johnson, R.A. and Wichern, D.W. (1998), Applied Multivariate Statistical Analysis, Prentice Hall, 4th edition.
[49] Jorgensen, B. (1987), Exponential Dispersion Models, J. R. Statist. Soc. B, Vol. 49, No. 2, pp. 127-162.
[50] Kallenberg, O. (1986), Random Measures, 4th Edition, Akademie-Verlag Berlin.
[51] Kallenberg, O. (2002), Foundations of Modern Probability, 2nd Edition, Springer.
[52] Khuri, Andre (2004), Applications of Dirac's Delta Function in Statistics, Int. J. Math. Educ. Sci. Technol., 35, 2, pp. 185-195.
[53] Kotz, S., Balakrishnan, N. and Johnson, N. (2000), Continuous Multivariate Distributions, Volume 1, Wiley Series in Probability and Statistics.
[54] Krzanowski, W.J. and Marriot, F.H.C. (1994), Multivariate Analysis: Part 1: Distributions, Ordination and Inference, Edward Arnold, Library of Congress.
[55] Lawrence, A.J.
(1978), Some Autoregressive Models for Point Processes, Colloquia Mathematica Societatis János Bolyai.
[56] Lawless, J.F. (2003), Statistical Models and Methods for Lifetime Data, 2nd edition, Wiley Series in Probability and Statistics.
[57] Lee, Larry (1979), Multivariate Distributions Having Weibull Properties, Journal of Multivariate Analysis, 9, pp. 267-277.
[58] Lehmann, E.L. (1986), Testing Statistical Hypotheses, 2nd edition, New York, Chapman & Hall.
[59] Lu, Jye-Chyi and Bhattacharyya, G.K. (1990), Some New Constructions of Bivariate Weibull Models, Ann. Inst. Statist. Math., 43(3), pp. 543-559.
[60] Mardia, K.V. (1970), Measures of multivariate skewness and kurtosis with applications, Biometrika, Vol. 57, pp. 519-530.
[61] Marshall, A.W. and Olkin, I. (1967), A Multivariate Exponential Distribution, J. Amer. Stat. Assoc., 63, pp. 30-44.
[62] Marshall, A.W. and Olkin, I. (1983), Domains of Attraction of Multivariate Extreme Value Distributions, The Annals of Probability, Vol. 11, pp. 168-177.
[63] Mathai, A.M. and Moschopoulos, P.G. (1991), On a Multivariate Gamma, Journal of Multivariate Analysis, 39, pp. 135-153.
[64] Mathai, A.M. and Moschopoulos, P.G. (1992), A Form of Multivariate Gamma Distribution, Annals of the Institute of Statistical Mathematics, 44, pp. 97-106.
[65] Mathai, A.M. and Saxena, R.K. (1978), The H-function with Applications in Statistics and Other Disciplines, Wiley, New York.
[66] McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models, Chapman and Hall: London.
[67] McLachlan, G.L. and Peel, D. (2000), Finite Mixture Models, New York, Wiley.
[68] Minhajuddin, A.T.M., Harris, I.R., and Schucany, W.R. (2003), Simulating Multivariate Distributions with Specific Correlations, J. Statist. Comp. Simul., Vol. 74, pp. 599-607.
[69] Moran, P.A.P. (1969), Statistical Inference with Bivariate Gamma Distributions, Biometrika, Vol. 56, No. 3, pp. 627-634.
[70] Moschopoulos, P.G. (1985), The Distribution of the Sum of Independent Gamma Random Variables, Annals of the Institute of Statistical Mathematics, Vol. 37, pp. 541-544.
[71] Moschopoulos, P.G. and Staniswalis, J.G. (1994), Estimation Given Conditionals from an Exponential Family, The American Statistician, Vol. 48, No. 4, pp. 271-275.
[72] Nadarajah, S. and Kotz, S. (2005), On the Linear Combinations of Exponential and Gamma Random Variables, Entropy, Vol. 7(2), pp. 161-171.
[73] Pazman, A. and Pronzato, L. (1996), A Dirac-function method for densities of nonlinear statistics and for marginal densities in nonlinear regression, Statistics and Probability Letters, Vol. 26, pp. 159-167.
[74] Rudin, W. (1976), Principles of Mathematical Analysis, McGraw-Hill.
[75] Shilov, G.E. and Gurevich, B.L. (1977), Integral, Measure, and Derivative: A Unified Approach, New York, Dover Publications.
[76] Sivazlian, B.D. (1981), On a Multivariate Extension of the Gamma and Beta Distributions, SIAM Journal on Applied Mathematics, Vol. 41, No. 2, pp. 205-209.
[77] Tanner, M.A. (1996), Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd edition, New York: Springer.
[78] Vaupel, J.W., Manton, K.G., and Stallard, E. (1979), The impact of heterogeneity in individual frailty on the dynamics of mortality, Demography, 16(3), pp. 439-454.
[79] Walker, S.G. and Stephens, D.A. (1999), A Multivariate Family of Distributions on $(0,\infty)^p$, Biometrika, Vol. 86, No. 3, pp. 703-709.
[80] Williamson, J.H.
(1962), Lebesgue Integration, New York, Holt, Rinehart and Winston, Athena Series.