Genuine Sequential Estimation Procedures
for Gamma Populations using Exact Evaluation Criteria
Except where reference is made to the work of others, the work described in this
dissertation is my own or was done in collaboration with my advisory committee. This
dissertation does not include proprietary or classified information.
Kevin Tolliver
Certificate of Approval:
Hyejin Shin
Assistant Professor
Mathematics and Statistics
Mark Carpenter, Chair
Associate Professor
Mathematics and Statistics
Peng Zeng
Assistant Professor
Mathematics and Statistics
George T. Flowers
Acting Dean
Graduate School
Genuine Sequential Estimation Procedures
for Gamma Populations using Exact Evaluation Criteria
Kevin Tolliver
A Dissertation
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Doctor of Philosophy
Auburn, Alabama
December 18, 2009
Genuine Sequential Estimation Procedures
for Gamma Populations using Exact Evaluation Criteria
Kevin Tolliver
Permission is granted to Auburn University to make copies of this dissertation at its
discretion, upon the request of individuals or institutions and at
their expense. The author reserves all publication rights.
Signature of Author
Date of Graduation
Vita
Kevin Paul Tolliver, son of Kevin Leonadis Tolliver and Cheryl Gatewood Tolliver,
was born in Washington, D.C., USA on April 17, 1985. He graduated from Edgewater
High School in Orlando, FL in 2002. In 2005, he received his Bachelor of Science degree in
Mathematics from Morehouse College in Atlanta, GA. He received his Doctor of Philosophy
degree in Statistics from Auburn University in 2009.
Dissertation Abstract
Genuine Sequential Estimation Procedures
for Gamma Populations using Exact Evaluation Criteria
Kevin Tolliver
Doctor of Philosophy, December 18, 2009
(B.S., Morehouse College, 2005)
80 Typed Pages
Directed by Mark Carpenter
In this dissertation, we develop genuine two-stage sequential procedures for bounded-risk and fixed-width confidence interval estimation for Gamma distributed populations, based on exact evaluation criteria. The term "genuine" refers to the fact that, in contrast to previous methods, the procedures proposed herein are based on the combined samples from both the first and second stages, rather than ignoring the data from the first-stage sample. Accordingly, the terminal sample size and the estimate are no longer independent, which complicates the theory development significantly. The term "exact" refers to the fact that the procedures are not evaluated on asymptotic or large-sample theory, as is common in the literature predating this dissertation; the derivations are based only on the properties of the underlying distribution, i.e., Gamma. The practical application of each procedure was also considered, and examples are given for both problems, i.e., bounded-risk and fixed-width.
Acknowledgments
I would like to thank my advisor, Dr. Carpenter, for his academic guidance. Under his direction I became not merely a mathematician but a mathematical statistician. I am very appreciative of his encouragement and belief in my abilities when the material was not always clear. Without his support, this dissertation would not have been completed. It has been a privilege working with him.
I also would like to extend my gratitude to the members of my committee, Dr. Peng Zeng and Dr. Hyejin Shin, for their time and for the suggestions that helped me improve this work.
Working with them in and out of the classroom has truly been a pleasure. In addition, I
would like to thank Dr. Robert Norton for taking time out of his schedule to assist with
the editing process.
I would also like to extend a special thanks to Kandace Noah for helping me understand how my research is applied in her field, and to Mary Mechler and Alicia Smith, who gave important insight during the writing of this dissertation.
I am extremely grateful to my family: my siblings, loving aunts, uncles, and grandmother. I give special thanks to my father, Kevin L. Tolliver, for being a positive role model in my life, and to my mother, Cheryl G. Tolliver, who always pushed me to better myself.
Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.
Table of Contents
List of Figures x
List of Tables xi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Sequential Analysis and Multistage Designs . . . . . . . . . . . . . . . . . . 4
1.4 The Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Special Cases of Gamma Distribution . . . . . . . . . . . . . . . . . 9
1.4.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.1 Bounded Risk Estimation . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.2 Fixed Width Interval Estimation . . . . . . . . . . . . . . . . . . . . 14
1.5.3 Modeling Times with Gamma . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Dissertation Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Bounded Risk Estimation 18
2.1 Shape Known and Scale Unknown . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Improving the Terminal Sample Size . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Performance Properties of New Estimation Procedure . . . . . . . . 26
2.3 Computer Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Shape Unknown and Scale Unknown . . . . . . . . . . . . . . . . . . . . . . 28
2.4.1 Robustness Considerations . . . . . . . . . . . . . . . . . . . . . . . 30
2.5 Determining Initial Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Example in Understanding Precipitation Rates in Regional Climate Models 32
2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3 Fixed Width Confidence Interval Estimation 37
3.1 Confidence Intervals for Gamma Distribution . . . . . . . . . . . . . . . 37
3.2 Two-Stage Fixed-Width Confidence Interval for Scale . . . . . . . . . . . . 39
3.3 Computer Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 Asymptotic Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Example in Air Force Aeronautical Maintenance . . . . . . . . . . . . . . . 45
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4 Conclusion 50
Bibliography 52
A Tables 56
B Figures 64
List of Figures
1.1 Gamma CDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2 Gamma PDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
B.1 Scatterplot of Alpha vs. Empirical Ratios . . . . . . . . . . . . . . . . . . . 65
B.2 Average Risk of Two Methods Compared to Risk Bound with Initial Sample
of 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
B.3 Average Risk of Two Methods Compared to Risk Bound with Initial Sample
of 30 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B.4 Average Improved Risk of Two Methods Compared to Risk Bound with Ini-
tial Sample of 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
B.5 Average Improved Risk of Two Methods Compared to Risk Bound with Ini-
tial Sample of 30 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
List of Tables
A.1 Shape Known, Scale Unknown . . . . . . . . . . . . . . . . . . . . . . . . . 56
A.2 Shape Known, Scale Unknown (Improved) . . . . . . . . . . . . . . . . . . . 57
A.3 Old Bound B as function of Shape and Initial Sample Size . . . . . . . . . . 58
A.4 New Bound B as function of Shape and Initial Sample Size . . . . . . . . . 58
A.5 Shape Unknown, Scale Unknown . . . . . . . . . . . . . . . . . . . . . . . . 59
A.6 Shape Unknown, Scale Unknown (Improved) . . . . . . . . . . . . . . . . . 60
A.7 Initial Sample Size Considerations Simulations . . . . . . . . . . . . . . . . 61
A.8 Confidence Interval Simulations of Terminal Sample Size . . . . . . . . . . 62
A.9 Coverage Percents as Risk Bound Varies . . . . . . . . . . . . . . . . . . . . 63
A.10 Coverage Percents as Initial Sample Size Varies . . . . . . . . . . . . . . . . 63
Chapter 1
Introduction
1.1 Motivation
Statistical modeling is a technique used in many different scientific fields. By summarizing observed results in a single expression, a statistical model helps researchers explain their current results; more importantly, observed outcomes can be used to make predictions about future ones. In scientific experimentation there are many factors that can contribute to a given outcome. A model simply expresses the outcome as a mathematical function of these factors. In order to make such predictions, the data in these models are assumed to be random with some underlying distribution in which one or more parameters are unknown, and parameter estimation is used to fit the models.
The Gamma distribution is often assumed as the underlying distribution when modeling right-skewed variables with positive support. Because of its flexibility, this distribution has a wealth of applications and is often used to model random times-to-event. Two scientific fields in which the Gamma distribution is most often used to model data are Survival Analysis and Reliability Analysis. In Survival Analysis, variables such as the lifespans of organisms, as well as the time until a treatment takes effect, can be modeled with the Gamma distribution. In Reliability Analysis, the lifespans of a system or of system components, as well as processes such as chemical corrosion, can be modeled with the Gamma distribution. The information gained from statistical models in these two fields is used in developing life insurance plans, pertinent drug information, warranty information, quality control information, etc. A parameter often studied in these fields is the Mean-Time-To-Failure (MTTF), which is very useful for systems used on a regular basis. A general queue also models times with a Gamma distribution; this is seen in various computer systems, call centers, and traffic-flow management systems. Articles that use the Gamma distribution to model times relating to queues include Choe and Shroff (1997), Amero and Bayarri (2007), Chu and Ke (1997), and Clarke (1957). Though modeling times is its most frequent use, the Gamma family of distributions can be assumed in any area where values have positive support. For example, it is used frequently in climatology to model both precipitation rates and precipitation intensity, as seen in Maureil et al. (2007) and Gutowski et al. (2008). In addition, it is seen in sensor imaging, as shown in Chatelain (2007) and Chatelain (2008). These statistical models rely on parameter estimation; it is therefore imperative for model fit that the estimates are accurate under some criterion, i.e., have low bias and low variation.
To obtain a parameter estimate with prescribed accuracy, Sequential Analysis is needed to determine how many observations are required. An accuracy measure, such as standard error, depends on the parameters of the unknown underlying distribution. In sequential analysis, the observations are not all sampled at once. In fact, an estimate with predetermined accuracy cannot be guaranteed with a sample size fixed prior to sampling. It is well known that an unbiased estimator becomes more accurate as the sample size increases. However, the sample size needed to ensure that the accuracy falls within the criterion is impossible to determine without any knowledge of the underlying distribution. Sequential problems such as assigned-accuracy problems deal more with the sample size than with the estimator itself. The final model estimates depend on information gained in prior sampling.
Historically, researchers have calculated measures of accuracy for sample design based on incorrect assumptions about the underlying distribution. For many years, the underlying distribution for the data was assumed to be Normal, even for time measurements, where the support is positive and the data are right-skewed. Sampling that assumes the data are Normal when they are not introduces the risk of not actually meeting the criteria. It could also lead to sampling more observations than are needed to meet the specified criterion. This is a very prevalent problem in statistics, since many experiments and surveys are restricted by budgetary constraints.
1.2 Research Question
Our focus is on developing a sampling procedure that ensures an accurate estimator, meaning one with low bias and low variation. The model assumes that the estimator is unbiased, so the concern is restricting the variation. This dissertation looks at two different problems involving predetermined accuracy: the first is to ensure that the risk falls within a bound, and the second is to ensure that the width of an interval estimator is within a bound. In doing so, we completely avoid large-sample theory: there is no asymptotic approximation of the underlying distribution, and consequently our procedures hold for any number of initial observations.
We develop the mathematical theory that ensures the risk is within a pre-specified bound under a genuine two-stage sampling scheme that assumes the data come from a Gamma population. The term "genuine" refers to the fact that the sequential procedure is based on the combined sample from both stages. It may seem fairly obvious that a genuine two-stage estimation procedure will yield better results than one that disregards one of the two samples, yet this has not always been implemented. Arriving at a two-stage procedure that ensures the risk is within a bound may come with a cost and could result in sampling more observations than previous sampling procedures require. We therefore also consider the more practical problem of sampling the fewest observations that achieve this goal, to avoid oversampling as described by Wald (1947). Using a relationship between risk and interval estimation, a genuine two-stage fixed-width interval estimation sampling scheme is produced that, to our knowledge, is new to this field.
1.3 Sequential Analysis and Multistage Designs
Sequential analysis is a statistical theory for data in which the final sample size is not known prior to the study. A sampling procedure in which the final sample size is known prior to sampling is called a fixed-sample-size procedure. Sequential sampling procedures are used over fixed-sample-size procedures for (1) ethical reasons, (2) conceivability reasons, and (3) economical reasons. For example, in a drug trial for reducing hypertension, if some of the m initial subjects develop side effects, or there is significant evidence that the true mean is low, then medical ethics forbid further sampling. In other instances, arriving at an alternative solution is inconceivable. An example of a conceivability reason is an industrial process: there is no known way of determining, with a fixed sample size, when the process will become out of control. There are also occasions when sequential analysis is economical, for example any sampling scheme in which each observation carries a cost: sequential analysis can reduce the number of observations, which will consequently reduce the cost of the experiment. Finding assigned-accuracy estimators for the parameters of a Gamma process or population involves all three. As noted before, the Gamma is often assumed in modeling times in clinical trials. There is no conceivable solution for determining the final sample size needed to achieve predetermined accuracy with a fixed sample size. And since one objective is to sample the fewest observations that achieve a certain goal, the problem has economical aspects. For all of these reasons, a sequential design needs to be implemented to achieve pre-assigned accuracy.
Sequential analysis consists of two components: (1) the stopping rule and (2) the decision rule. The stopping rule indicates whether sampling should be stopped after m observations or whether additional observations should be drawn. A stopping rule is characterized as a mechanism for deciding whether to continue or stop a process on the basis of the present position and past events, and it will almost always lead to a decision to stop at some time. The final resultant sample size N is called the terminal sample size. The decision rule specifies what actions are to be taken after sampling has stopped.
4
Definition 1.1 If m is a known predetermined sample size, a sample is said to be sequential if the terminal sample size N is not fixed, i.e.,

    P(N = m) < 1  for m, N ∈ ℕ.

The emphasis in this context is on having an estimator that falls below some predetermined accuracy level; the terminal sample size N is the final sample size that ensures this is the case. The terminal sample size depends on the earlier observed information X_1, X_2, ..., X_m, making it a random variable.
The optimal sample size n* is the number of observations that best achieves a researcher's goal. This can mean different things for different problems. In our context, the optimal sample size is the smallest number of observations that ensures our estimator is accurate under the predetermined criterion. The optimal sample size is fixed and depends on the unknown parameters of the underlying distribution. In an ideal situation the terminal sample size equals the optimal sample size. The terminal sample size is assessed by looking at the ratio of the expectations,

    E[N/n*].    (1.3.1)

The performance of the terminal sample size can be evaluated asymptotically:

    lim_{m→∞} E[N/n*] = 1    (1.3.2)

and

    lim_{m→∞} Var[N/n*] = 0.    (1.3.3)
In sequential analysis, there are two subfields: (1) purely sequential designs and (2) multistage designs. Earlier it was mentioned that the final sample size is not known prior to the start of sampling. However, this does not mean that each observation must be observed one at a time. In purely sequential designs, each observation is observed one at a time and an analysis is performed after each observation is drawn. In multistage designs, multiple observations are drawn at a time, in what is called a stage, and there is a cap on the total number of stages. The terminal sample size does not necessarily include the prior m observations; in some instances the m observations are used to determine the terminal sample size and are then disregarded in the analysis.
Definition 1.2 Let X_1 = {X_1, ..., X_{m_1}} be an initial sample. For a given decision rule, subsequent samples are X_i = {X_{m_{i−1}+1}, ..., X_{m_i}} or X_i = ∅, for 1 ≤ i ≤ k. The sample X = ∪_{i=1}^{k} X_i is a genuine k-stage sample.
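As a concrete sketch, a genuine two-stage sample (k = 2) simply pools the first-stage data with whatever the stopping rule dictates for the second stage. The stopping rule below is a deliberately arbitrary placeholder, not one of the procedures developed later in this dissertation; only the pooling structure matters here.

```python
import numpy as np

def genuine_two_stage_sample(rng, m, stopping_rule, draw):
    """Genuine two-stage sample (Definition 1.2 with k = 2): the terminal
    data set is the union of BOTH stages, not just the second one."""
    stage1 = draw(rng, m)                    # initial sample X_1, ..., X_m
    N = max(m, stopping_rule(stage1))        # terminal size from stage-one data
    stage2 = draw(rng, N - m)                # second stage (possibly empty)
    return np.concatenate([stage1, stage2])  # X = X_1 union X_2

# Illustration with a hypothetical stopping rule: continue sampling until
# n exceeds five times the first-stage sample variance.
rng = np.random.default_rng(0)
draw = lambda rng, n: rng.gamma(shape=2.0, scale=3.0, size=n)
rule = lambda x: int(np.ceil(5.0 * np.var(x, ddof=1)))
combined = genuine_two_stage_sample(rng, m=20, stopping_rule=rule, draw=draw)
print(len(combined))   # terminal sample size N >= m = 20
```

A non-genuine procedure would instead discard `stage1` and base the estimate on a fresh sample of size N.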
The advantages of purely sequential designs are that they yield better statistical results and have a reduced chance of oversampling, as described by Wald. However, depending on the design, multistage sampling can be more cost-efficient and more manageable.
For example, consider the problem of determining when an industrial process becomes out of control; data are read after each observation. In such cases, it is practical to use a purely sequential design: as the data are read, the process can immediately be identified as out of control, and there is no need to continue sampling. However, in a clinical trial it is not practical to treat one subject at a time; a multistage design is needed.
Multistage designs are currently used in a wide range of areas. Some multistage sampling schemes use a set of observations from the population as their initial sample, with subsequent samples drawn by analyzing subsets of that initial set. This is seen in the multistage method of the U.S. Census Bureau's Current Population Survey, given in Moore, McCabe, and Craig (2007). It is also seen in crop management, Finney (1984), as well as in multistage cluster analysis, Phillipi (2005). In the context of modeling times, multistage designs are often used in adaptive designs. In a broad overview of adaptive designs, several examples were given where statistical procedures were modified during the conduct of clinical trials; such adaptation serves not only to identify clinical benefits of the test treatment under investigation efficiently, but also to increase the probability of success of clinical development, Chow and Chang (2008). Most adaptive designs in clinical trials can be described as adaptive randomization and
group sequential designs, with the flexibility of stopping a trial early due to safety, futility, and/or efficacy, and of re-estimating the sample size at interim to achieve the desired statistical power. In an article on a unified theory of two-stage adaptive designs, mathematical theory is proposed for the adaptations in the literature. To summarize, the adaptations alter the sampling distribution, which means the assumed results may not hold, Liu, Proschan, and Pledger (2002). For two-stage adaptive tests in particular, changes in the sampling distribution can occur, and only recently has altering the p-value been considered. In their article, they arrive at a number of useful results on two-stage adaptive designs.
Using a large number of stages makes organization more complicated, and both administrative expenses and interest charges on the large investment increase. This is also the case with a number of other sequential sampling procedures. Because of this, the literature contains many two- and three-stage designs, e.g., Mukhopadhyay and Pepe (2006), Mukhopadhyay and Zacks (2006), Yao and Venkatraman (1998), Satagopan et al. (2002), Whittemore (1997), Jinn et al. (1987), Lorden (1983), and Mukhopadhyay (1995), and not as many four-, five-, and six-stage designs.
Squared error loss is a measure of an estimate's distance from the true parameter.

Definition 1.3 If A > 0 is a constant specified by the experimenter that penalizes deviations more or less as needed, the squared error loss of an estimator θ̂ of θ based on n observations is the squared distance between the parameter and its estimator:

    L_n(θ, θ̂) = A(θ̂ − θ)².
In practice, this measure is assessed by its expected value, called the risk. The risk gives an indication of the reliability of an estimator: high risk indicates the estimator is unreliable, while low risk indicates it is reliable. Increasing the sample size is one action taken to lower risk. One accuracy criterion in sequential analysis is the bounded risk estimator.
Definition 1.4 For a predetermined risk bound w, a bounded-risk estimator with terminal sample size N is one whose expected squared error loss satisfies

    R_N(θ, θ̂) = E(L_N) ≤ w.
It is well documented that the risk is a multiple of the variance plus the squared bias of the estimator. Consider the Normal distribution with known variance; in this instance, bounding the risk is an easy calculation. However, if the mean and variance are both unknown, this problem cannot be solved with a fixed sample size. When fitting statistical models via parameter estimation, the goal is to have accurate estimates of the mean parameter.
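For the Normal case with known variance mentioned above, the calculation can be made concrete: the sample mean is unbiased, so its risk is Aσ²/n, and the bound is met by a fixed sample size. A small Monte Carlo sketch (the values of A, σ², and w are arbitrary):

```python
import math
import numpy as np

# For Normal data with known variance sigma2, the sample mean is unbiased and
# its risk is R_n = A * sigma2 / n, so R_n <= w has the fixed-sample-size
# solution n* = ceil(A * sigma2 / w) -- no sequential design is required.
A, sigma2, w, mu = 1.0, 4.0, 0.25, 5.0
n_star = math.ceil(A * sigma2 / w)   # = 16 here

# Monte Carlo check that the risk at n* meets the bound.
rng = np.random.default_rng(0)
xbars = rng.normal(mu, math.sqrt(sigma2), size=(100_000, n_star)).mean(axis=1)
risk = A * np.mean((xbars - mu) ** 2)
print(n_star, round(risk, 3))        # risk is close to w = 0.25
```

When σ² is unknown this calculation is unavailable, which is precisely what motivates the sequential procedures that follow.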
Another accuracy criterion in sequential analysis is the fixed-width interval estimator.

Definition 1.5 For a predetermined width d, a (1 − α) fixed-width interval estimator of a real-valued parameter θ is any pair of functions L(X) and U(X), with L(X) < U(X), for which the inference L(X) < θ < U(X) is made. Writing C_X for the interval [L(X), U(X)], the width of the confidence interval satisfies U(X) − L(X) ≤ d, and P(θ ∈ C_X) ≥ 1 − α.

Fixed-width confidence intervals are a large part of sequential estimation. Interval estimation is regarded as more informative than point estimation, since P(θ̂ = θ) = 0 for any continuous distribution. An interval estimator is characterized by the width of the interval and the coverage probability, and there is a trade-off between them: a small coverage probability implies that the researcher has a larger chance of making an error, whereas a large width is uninformative.
1.4 The Gamma Distribution
The model assumption for the proposed sampling scheme is that the underlying distribution is Gamma; the focus is on estimating the mean parameter of the Gamma distribution. The Gamma distribution is a flexible right-skewed distribution with a variety of applications. It is often used in modeling times-to-event, as seen in the biological sciences, engineering, ecology, and probability. The density is
    f(x) = [1 / (Γ(α) β^α)] x^{α−1} e^{−x/β},  for x > 0,    (1.4.1)

where Γ(α) = ∫₀^∞ t^{α−1} e^{−t} dt and α, β > 0, with Γ(1/2) = √π and Γ(α + 1) = αΓ(α). Note, α and β are referred to as the shape and scale parameters, respectively. One property of the Gamma density is that it is closed under scalar multiplication: if X ~ Gamma(α, β), then

    Y = cX ~ Gamma(α, cβ).    (1.4.2)

The sum of k independent Gamma random variables with shape α and scale β is

    Σ_{i=1}^{k} X_i ~ Gamma(kα, β).    (1.4.3)

The moment generating function of this distribution is

    M_X(t) = (1 − βt)^{−α},  t < 1/β,    (1.4.4)

which makes the mean and variance

    E[X] = αβ  and  Var(X) = αβ².    (1.4.5)
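These moment formulas are easy to spot-check by simulation; the sketch below uses NumPy, whose gamma sampler shares the shape/scale parameterization of (1.4.1):

```python
import numpy as np

# Simulation check of E[X] = alpha*beta and Var(X) = alpha*beta^2
# (numpy's gamma sampler uses the same shape/scale parameterization).
alpha, beta = 3.0, 2.0
rng = np.random.default_rng(1)
x = rng.gamma(shape=alpha, scale=beta, size=1_000_000)
print(x.mean(), x.var())   # approximately 6.0 and 12.0
```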
1.4.1 Special Cases of Gamma Distribution
Some special cases of the Gamma distribution will be noted, as they are referenced throughout this dissertation. Suppose X is Gamma distributed with shape α and scale β. If the shape parameter is one, then X is Exponentially distributed with scale β; it is well documented that adding k independent Exponentially distributed variables yields a Gamma distribution with shape k and scale β, as noted in equation (1.4.3). If the shape parameter is an integer, then the variable is Erlang with shape α and scale β. If the scale is two, then X becomes a Chi-Square distribution with 2α degrees of freedom; it follows by (1.4.2) that 2X/β ~ Chi-Square(2α). Finally, if there are two independent Gamma distributed variables X ~ Gamma(α₁, β) and Y ~ Gamma(α₂, β), then X/(X + Y) ~ Beta(α₁, α₂).

Figure 1.1: Gamma CDF
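Two of these special cases can be verified numerically by comparing simulated moments with the known moments of the target distributions (the parameter values are arbitrary):

```python
import numpy as np

# Moment spot-checks of two of the special cases above.
rng = np.random.default_rng(2)
alpha, beta = 2.5, 4.0
x = rng.gamma(alpha, beta, size=500_000)

# (i) 2X/beta ~ Chi-Square(2*alpha): mean 2*alpha = 5, variance 4*alpha = 10.
z = 2.0 * x / beta
print(z.mean(), z.var())

# (ii) X/(X+Y) ~ Beta(a1, a2) for independent Gammas with a common scale:
a1, a2 = 2.0, 3.0
u = rng.gamma(a1, beta, size=500_000)
v = rng.gamma(a2, beta, size=500_000)
b = u / (u + v)
print(b.mean())   # Beta mean a1/(a1 + a2) = 0.4
```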
1.4.2 Estimation
As previously stated, the Gamma distribution is widely used in engineering, probability, ecology, and the biological sciences. The problem of finding reliable estimators for the mean dates back to the early 1950s. A number of different methods can be used to find estimators for this distribution: the method of moments, maximum likelihood, and least squares. In this dissertation, the maximum likelihood method is used to obtain the estimator for β, and the method of moments is used to obtain the estimator for α.
The maximum likelihood method is the most popular technique for deriving estimators. This technique has many ideal properties, including the fact that it yields best unbiased estimators. Using (1.4.1), the likelihood function for n independently and identically distributed variables becomes

    L(α, β) = [Γ(α) β^α]^{−n} ∏_{i=1}^{n} x_i^{α−1} e^{−(1/β) Σ_{i=1}^{n} x_i},  x_i > 0 for i = 1, ..., n.

To ease computation, the natural logarithm of the likelihood is taken; this can be done because the natural log is a monotone function, so the likelihood maintains its optimum values. If the shape α is known, the maximum likelihood estimator for β is easily obtained and is shown to be

    β̂ = X̄/α.    (1.4.5)

If the shape is unknown, no closed-form maximum likelihood estimator exists, and a numerical solution must be used to arrive at its maximum likelihood approximation. This is the reason the maximum likelihood approach is not used for the shape parameter.

Figure 1.2: Gamma PDF
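The closed form β̂ = X̄/α follows from setting the derivative of the log-likelihood with respect to β to zero: −nα/β + (Σxᵢ)/β² = 0. The sketch below confirms numerically that the formula agrees with a grid maximization of the log-likelihood (true parameter values are arbitrary):

```python
import numpy as np

# With alpha known, d/d(beta) of the log-likelihood gives
# -n*alpha/beta + (sum x_i)/beta^2 = 0, i.e. beta_hat = Xbar/alpha.
alpha_true, beta_true = 2.0, 5.0
rng = np.random.default_rng(3)
x = rng.gamma(alpha_true, beta_true, size=200_000)

beta_hat = x.mean() / alpha_true
print(beta_hat)   # approximately 5.0

# Grid check that beta_hat maximizes the log-likelihood
# (terms constant in beta are dropped).
def loglik(beta):
    return -x.size * alpha_true * np.log(beta) - x.sum() / beta

grid = np.linspace(4.0, 6.0, 401)
best = grid[np.argmax(loglik(grid))]
print(abs(best - beta_hat) < 0.01)   # grid optimum agrees with the formula
```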
The method of moments is another common method for estimating a parameter. It works by setting the kth theoretical moment equal to the average of the kth powers of the observations,

    Ê(X^k) = (1/n) Σ_{i=1}^{n} X_i^k,  k ∈ ℕ.

The shape parameter α is widely considered a nuisance parameter. Because a best unbiased estimator is unattainable, the method of moments estimator is used:

    α̂ = X̄²/S².    (1.4.6)

This is a slightly biased but asymptotically consistent estimate.
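A quick numerical illustration of the moment-matching step behind (1.4.6), with arbitrarily chosen true parameters:

```python
import numpy as np

# Since E[X] = a*b and Var(X) = a*b^2, the shape satisfies a = E[X]^2/Var(X);
# replacing the moments by sample versions gives alpha_hat = Xbar^2 / S^2.
alpha_true, beta_true = 3.0, 2.0
rng = np.random.default_rng(4)
x = rng.gamma(alpha_true, beta_true, size=200_000)
alpha_hat = x.mean() ** 2 / x.var(ddof=1)
print(alpha_hat)   # approximately 3.0 (slightly biased, consistent)
```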
1.5 Literature Review
Early elements of sequential analysis appear in the 17th and 18th centuries, when the mathematicians Huygens, Bernoulli, Montmort, De Moivre, Lagrange, and Laplace worked on the Gambler's Ruin problem (Ghosh and Sen 1991). This famous probability problem tries to determine at what point a gambler will completely deplete his funds. Dodge and Romig, in 1929, were the earliest to apply what is now known as sequential analysis to a statistical problem: they developed a double sampling test in which two samples were taken and the proportion of defective units was observed. Shewhart, in 1931, developed theory on when an industrial process becomes out of control. Wald, in 1947, produced a well-known book on sequential analysis that sparked interest from several authors worldwide.
1.5.1 Bounded Risk Estimation
A common sequential problem dealing with preassigned accuracy is bounded risk estimation. For populations with known variance there is a fixed-sample-size solution, and no sequential methods need to be implemented; the problem arises when nothing is known about the population. Stein (1945) proposed a two-stage bounded risk estimation procedure for Normal populations, which used the standard deviation from the initial sample to obtain the proper terminal sample size. Modeling times with a Normal underlying distribution will not yield ideal results, because times are often skewed to the right; a better distribution to assume when modeling times is the Exponential.
Birnbaum and Healy (1960) developed a two-stage bounded risk estimation procedure for Exponential processes. This sampling scheme assumed that the underlying distribution was Exponential and bounded the risk for the scale parameter using Chi-Square transformations. Their result can be summarized as follows: if there are m ≥ 3 initial observations and

    B_BH = A m² / [(m − 1)(m − 2)],

then

    N_BH = ⌈ B_BH X̄²_m / w ⌉    (1.5.1)

is the sample size required so that the risk of the estimator is within the bound w. However, this procedure is not a genuine two-stage sampling procedure: the solution is based only on the second sample. The initial sample is used to determine the sample size required to achieve bounded risk and is then excluded from the final estimate. This is done because taking observations from two different samples alters the sampling distribution. It is true that bounding the risk cannot be done with a fixed sample size; however, disregarding readily available information is wasteful. The concept was later improved by adding the observations from the initial sample to the second sample, making the procedure a genuine two-stage sampling procedure. Although this proved to perform well asymptotically as a bounded-risk estimator (Kubokawa 1989), no exact proof was provided for the result, and it is uncertain whether it holds for an arbitrary number of initial observations (m ≥ 3).
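The rule (1.5.1) is straightforward to compute from the initial sample. The sketch below is an illustrative implementation (the function name and example values are ours, not from Birnbaum and Healy):

```python
import math
import numpy as np

# Sketch of the Birnbaum-Healy rule (1.5.1) for an Exponential population:
# the m initial observations determine N_BH, and the estimate is then based
# on a FRESH sample of size N_BH -- the initial data are discarded, which is
# why this is not a genuine two-stage procedure.
def n_birnbaum_healy(stage1, A, w):
    m = len(stage1)
    if m < 3:
        raise ValueError("need m >= 3 initial observations")
    B = A * m**2 / ((m - 1) * (m - 2))
    return math.ceil(B * np.mean(stage1) ** 2 / w)

rng = np.random.default_rng(5)
stage1 = rng.exponential(scale=3.0, size=20)   # m = 20 initial observations
N = n_birnbaum_healy(stage1, A=1.0, w=0.5)
print(N)   # grows as the risk bound w shrinks
```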
Various works in sequential estimation of scale parameter of the Exponential distribu-
tion is done by Mukhopadhyay (1995), (2006), (2006a), (2006b), (2007), etc. Mukhopadhyay
and Pepe (2006) record an exact genuine two-stage sampling procedure. Their result which
13
holds for any initial sample size greater than three, can be summarized as follows: if there
are m 3 initial observations and
BMP = 2Am(m+ 1)(m 1)(m 2)
then
NMP =dBBH
X2m
w e (1:5:2)
is the sample size required so that the risk of the estimator is within the bound w. The consequence of this is that the expected value of the terminal sample size is more than twice the optimal sample size; on average, the researcher will sample more than twice the observations needed. This is referred to as a penalty for exact bounded-risk estimation. By exploring the distribution of the terminal sample size, a reduction of the terminal sample size can be found, making this exact procedure more practical, Zacks and Mukhopadhyay (2006).
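The penalty can be seen in a small Monte Carlo sketch of the rule in (1.5.2). This is not the dissertation's simulation code; the constants below are hypothetical, chosen so that the optimal sample size works out to 100.

```python
import math
import random

def mp_sample_size(xbar_m, m, A, w):
    """Mukhopadhyay-Pepe terminal sample size, as summarized in (1.5.2):
    B_MP = 2*A*m*(m+1)/((m-1)*(m-2)),  N_MP = ceil(B_MP * xbar_m^2 / w)."""
    B_MP = 2 * A * m * (m + 1) / ((m - 1) * (m - 2))
    return math.ceil(B_MP * xbar_m**2 / w)

# Monte Carlo look at the oversampling penalty for Exponential data
# (shape one): scale beta = 5, A = 2, and w chosen so that the optimal
# sample size is n* = A*beta^2/w = 100.
random.seed(1)
beta, A, m = 5.0, 2.0, 20
w = A * beta**2 / 100
sizes = [mp_sample_size(
             sum(random.expovariate(1 / beta) for _ in range(m)) / m,
             m, A, w)
         for _ in range(2000)]
avg_N = sum(sizes) / len(sizes)   # more than twice n* = 100
```

The average terminal size exceeds 2n* for this small pilot size m, in line with the penalty described above; as m grows, B_MP approaches 2A and the ratio settles near two.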
1.5.2 Fixed Width Interval Estimation
The next sequential problem is restricting the interval estimator's width. Interval estimation is one of the fundamental aspects of statistics. Presenting interval estimators is often preferred over measures of variation, such as risk, probably because confidence intervals can yield better interpretations. This is particularly important when the estimate does not have a Normal sampling distribution, Ramsey and Shafer (2002). Both measures give flexibility to the estimator, but interval estimators give results in terms of what is probable, i.e. the probability that the true parameter lies within a 1 − a confidence region is 1 − a.
Fixed-width confidence intervals combine relatively high coverage probabilities with relatively narrow interval widths. Traditionally, they are of the form
\[ C_{\bar{X}} = \{\theta : \bar{X}_N - d < \theta < \bar{X}_N + d\}, \qquad (1.5.3) \]
with the terminal sample size N. There is no fixed-sample-size solution to this when the variance of the distribution is unknown, because the width of the interval estimator depends on that variance. Stein (1949) solves the Normal fixed-width problem by proposing a two-stage procedure to bound the confidence interval for the mean when the variance σ² is not known. The terminal sample size of this procedure is given below,
\[ N = \max\left\{ m,\ \left\lceil \frac{b_{m-1,1-a/2}^{2}\,S_m^{2}}{d^{2}} \right\rceil \right\}, \qquad (1.5.4) \]
where b_{m−1,1−a/2} is the 1 − a/2 point of a t distribution with m − 1 degrees of freedom. This uses the fact that b_{m−1,1−a/2} will be larger than z_{1−a/2}. This procedure, however, is not asymptotically efficient: for a fixed initial sample size, the ratio between this procedure's expected sample size and the optimal sample size approaches (b_{m−1,1−a/2}/z_{1−a/2})². Ghurye (1958)
proposes a two-stage fixed-width confidence interval for a location parameter of a general density f(x), along the lines of Stein; it is not used for the mean. Chow and Robbins (1965) record a purely sequential interval estimator for the mean of a general density f(x). Their sequential result takes an initial sample of m observations, then chooses the first n for which the following is achieved:
\[ N = \min\left\{ n \ge m : n \ge d^{-2} z_{a/2}^{2}\,( S_n^{2} + n^{-1} ) \right\}. \qquad (1.5.5) \]
However, their procedure uses asymptotic theory. This is not a practical approach for our model estimates, because it draws observations one at a time and we are avoiding Normal approximation. A general method for determining fixed-width confidence intervals is given by Khan (1969). This method, like the previous methods, uses Normal theory; in it he discusses almost sure convergence, asymptotic consistency, and asymptotic efficiency. Research continues to be developed in this area. For example, Mukhopadhyay, Silva, and Waikar (2006) develop a two-stage sampling procedure which they compare to Stein's fixed-width interval approach (1949) and Chapman's fixed-width interval approach (1950).
As noted before, when estimating mean time it is better to model with the Exponential distribution. Govindarajulu (1995) developed a sequential estimator for the mean of an Exponentially distributed population. This result is more applicable to modeling times; it can be summarized as follows: if
\[ z_n = z\left[ 1 + n^{-1}(1+z^{2})/4 + o(n^{-1}) \right], \]
then
\[ N_G = \min\left\{ n \ge m : n \ge z_n^{2}\,\bar{X}_n^{2}/d^{2} \right\} \qquad (1.5.6) \]
yields an interval of the required width. This procedure again uses Normal approximation.
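The one-at-a-time character of the rule in (1.5.6) can be sketched as follows. This is a sketch under the reconstruction above; the data stream, pilot size, and width below are hypothetical.

```python
import random
from statistics import NormalDist

def govindarajulu_n(data_stream, m, d, a=0.05):
    """Purely sequential stopping rule sketched in (1.5.6): sample one
    observation at a time and stop at the first n >= m satisfying
    n >= z_n^2 * xbar_n^2 / d^2, where z_n = z*(1 + (1 + z^2)/(4n))
    carries the second-order correction to the Normal point z."""
    z = NormalDist().inv_cdf(1 - a / 2)
    xs = []
    for x in data_stream:
        xs.append(x)
        n = len(xs)
        if n < m:
            continue
        xbar = sum(xs) / n
        z_n = z * (1 + (1 + z**2) / (4 * n))
        if n >= z_n**2 * xbar**2 / d**2:
            return n, xbar

# Hypothetical run on Exponential(mean 1) data, pilot m = 10, d = 0.2.
random.seed(2)
stream = (random.expovariate(1.0) for _ in iter(int, 1))
n_stop, xbar_stop = govindarajulu_n(stream, m=10, d=0.2)
```

Since the mean and standard deviation of an Exponential coincide, the rule stops roughly when n exceeds z²X̄²/d², around a hundred observations here; the drawback criticized above is visible in the loop: every observation triggers a fresh decision.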
However, because of the shape of this distribution, no research has been found on restricting the width of the interval without using asymptotic approximation.
1.5.3 Modeling Times with Gamma
It should be noted that both of the prior subsections ended with sequential research in statistical modeling for Exponential populations. This is because there is little research in this area for Gamma populations. However, in the same instances where the Exponential model can be used, so can the Gamma. Specifically, random times, such as mean times to failure, are commonly assumed to be Exponential, where the longer one survives the smaller the probability of continued survival. This is not always the case. For example, infant life-expectancy charts show many deaths during the first few months after birth, so for a short period of time the life expectancy increases the longer the infant lives. It is more appropriate to model infant survival with a Gamma or Weibull distribution; both of these distributions are more general forms of the Exponential distribution. There are several works that discuss modeling MTTF as a
Gamma distributed variable: Coit and Jin (1999), Shapiro and Wardrop (1978), Barber and Jennison (2002), etc. These articles provide examples of when the Gamma should be used over the Exponential distribution, for example, when modeling failure times with a known number of failures in the presence of missing values. The time between one failure and the last record is Gamma with known shape. This happens often when data are recorded periodically and not after each failure, Coit and Jin (1999).
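The Coit and Jin mechanism can be checked with a short simulation: if individual inter-failure times are Exponential but only every k-th failure is recorded, each recorded gap is a sum of k Exponentials, i.e. Gamma with known shape k. The values of theta and k below are hypothetical.

```python
import random

# Inter-failure times Exponential with mean theta; only every k-th
# failure is recorded, so each recorded gap is a sum of k Exponentials,
# i.e. Gamma with known shape k and unknown scale theta.
random.seed(3)
theta, k = 4.0, 5
gaps = [sum(random.expovariate(1 / theta) for _ in range(k))
        for _ in range(20000)]
mean_gap = sum(gaps) / len(gaps)   # close to k * theta = 20
```

The empirical mean of the recorded gaps matches the Gamma mean kθ, which is why the shape can be treated as known while the scale remains the quantity to estimate.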
1.6 Dissertation Layout
In the second chapter, two-stage bounded-risk estimators are developed, and the performances of their sample sizes are evaluated through simulation. Numerical methods are used to reduce the terminal sample size, bringing it closer to the optimal sample size. In the third chapter, a fixed-width interval estimator is created, and an example is given to illustrate how it works and how it relates to queueing theory. The final chapter summarizes the results of this dissertation and discusses future research problems in this area.
Chapter 2
Bounded Risk Estimation
The goal of the bounded-risk problem is to sample the fewest observations such that the risk is just within a predetermined bound. Birnbaum-Healy (1960) developed a two-stage sampling procedure for the Exponential distribution; however, their method does not use the information obtained from the initial sample in the final estimate. Mukhopadhyay and Pepe (2006) develop a two-stage sampling procedure for the Exponential distribution that combines the initial sample with the second sample to derive the final estimate. In this chapter, we generalize Mukhopadhyay and Pepe's result to Gamma distributed populations. It should follow that when the shape is equal to one, our results are exactly those of Mukhopadhyay and Pepe.
Additionally, we introduce some notation and basic concepts of decision theory as they apply to the Gamma distribution. First, risk bounds are found when only the shape parameter is known. Second, risk bounds are found when both parameters are unknown. We also evaluate the performance of our bounds theoretically and through simulations and make possible improvements.
2.1 Shape Known and Scale Unknown
There are a number of reasons the shape-known case is studied: (1) there are particular instances where the shape parameter is either known or can be assumed known; (2) studying the α-known case allows us to see how robust the Exponential assumption is; (3) there are times when the shape parameter is not known but mathematical theory allows us to assume it is known; and (4) it lays the groundwork for the case where the shape parameter is unknown.
MTTF is often estimated as an Exponential random variable. This is merely a special case of the Gamma distribution in which the shape is known and equal to one; Mukhopadhyay discusses this in a number of articles (1995), (2006), (2006a), (2006b), (2007), etc. In many cases, however, MTTF is modeled with a Gamma distribution whose shape is known and not equal to one. Dopke (1992) and Coit and Jin (1999) discuss estimation of the MTTF as a Gamma random variable. The example given by Coit and Jin arises when the time between each failure is not recorded: if there are k failures in a span t, then the MTTF is Gamma distributed with known shape k and unknown scale. They elaborate on why each failure time is not always recorded, saying, "this is understandable because the elapsed time meter records time for the entire assembled item and not the individual components." This is similar to the idea of summing Poisson-process interarrival times. Other examples with known shape parallel the Normal distribution: just as there are instances where the Normal variance is assumed known, there are modeling situations where the shape parameter is assumed known. This can happen for a number of reasons: there is so much historical evidence that the shape is consistent, there exists some mathematical theory for the shape's value, or the actual shape is of little concern as long as it is within reason. For example, in Maurellies' (1999) precipitation models, the exact shape is of little importance; since the data are right skewed, it is important only to model the data with a low shape value, and precipitation intensity is simply modeled with α = 2. In each of these examples it is important to have reliable estimates.
This is not the only reason for exploring the shape-known, scale-unknown case. Studying the shape-known case also gives an idea of how robust the assumption of an Exponential distribution is. Mukhopadhyay and Pepe's (2006) result holds only for the Exponential distribution; if there is some uncertainty about whether the shape is one, their procedure is no longer valid. These aforementioned reasons provide justification for studying the shape-known case.
Our goal is to develop a reliable estimation sampling scheme for the case where only the scale is unknown. If the shape α is known and it is only desired to estimate the scale β, our goal is to find the fewest observations n that make the associated risk function less than or equal to a predetermined risk w > 0. Recall the mean of a Gamma distribution with parameters α and β is αβ. Hence the risk requirement is R_n(β, β̂) ≤ w. If the shape is known, the problem reduces to
\[ R_n = \frac{A\beta^{2}}{\alpha n} \le w. \qquad (2.1.2) \]
This implies n ≥ Aβ²/(αw). Let the optimal sample size be
\[ n^{*} = \left\lceil \frac{A\beta^{2}}{\alpha w} \right\rceil. \qquad (2.1.3) \]
The ceiling guarantees an integer value, which ensures that the risk is within our bound. Sampling more observations than n* is considered oversampling; sampling fewer observations than n* yields a risk above the bound and thus an unreliable estimate.
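Under the reconstruction of (2.1.3) used here, the optimal sample size is a one-line computation. The parameter values in the comment are hypothetical.

```python
import math

def optimal_n(A, alpha, beta, w):
    """Optimal sample size (2.1.3): n* = ceil(A * beta**2 / (alpha * w)).
    Attainable directly only if the scale beta were known."""
    return math.ceil(A * beta**2 / (alpha * w))

# With A = 2, shape alpha = 2, scale beta = 5, and bound w = 1,
# n* = ceil(2 * 25 / (2 * 1)) = 25.
```

The dependence on the unknown β in this formula is exactly what forces the two-stage procedure developed next.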
Notice n* depends on the unknown parameter β, so a sequential sampling procedure must be implemented to gain knowledge of this parameter. A pilot sample of m observations X₁, ..., X_m, i.i.d. Gamma(α, β) with m ≥ 3, will be taken. From this sample the maximum likelihood estimator of β, X̄_m/α, can be found; see (1.4.5) for its derivation. That estimate is used to determine the terminal sample size N. This quantity guarantees that we do not exceed the necessary number of observations for the statistical procedure by too much, as that might be costly or impractical, yet do not fall short of an appropriate sample size either. After observing the first m observations, our first stage, a decision is made whether the procedure can stop with the m observations or whether more need to be added, our second stage. This yields our two-stage procedure.
Theorem 2.1 If X₁, ..., X_m i.i.d. Gamma(α, β) initial observations are drawn (m ≥ 3) and B is chosen to be
\[ B = A\alpha^{2}\left( \frac{m}{\alpha} - \frac{2m^{2}\,\Gamma(m\alpha-1)}{\Gamma(m\alpha)} + \frac{(m^{3}\alpha + m^{2})\,\Gamma(m\alpha-2)}{\Gamma(m\alpha)} \right), \qquad (2.1.5) \]
and the terminal sample size is chosen to be
\[ N = \max\left\{ m,\ \left\lceil \frac{B\,\bar{X}_m^{2}}{\alpha^{3} w} \right\rceil \right\}, \qquad (2.1.4) \]
then, if N − m observations are drawn in the second stage, the risk over all N observations will be less than the predetermined risk bound w: R_N ≤ w.
Proof.
Conditioning on the first stage, we can re-express the risk over all N observations, R_N = A E(X̄_N/α − β)², as
\[ A\,E\!\left[ \frac{m^{2}}{N^{2}}\left( \frac{\bar{X}_m}{\alpha} - \beta \right)^{\!2} + \frac{\beta^{2}}{\alpha}\cdot\frac{N-m}{N^{2}} \right]. \]
Recall m ≤ N, so the ratio m/N ≤ 1, and N ≥ B X̄_m²/(α³w).
Now,
\[ A\,E\!\left[ \frac{m^{2}}{N^{2}}\left(\frac{\bar{X}_m}{\alpha}-\beta\right)^{\!2} \right] \le A\,E\!\left[ \frac{m}{N}\left(\frac{\bar{X}_m}{\alpha}-\beta\right)^{\!2} \right] \le \frac{m\alpha^{3} w}{B}\, A\,E\!\left[ \frac{1}{\alpha^{2}} - \frac{2\beta}{\alpha \bar{X}_m} + \frac{\beta^{2}}{\bar{X}_m^{2}} \right]. \]
Also,
\[ \frac{\beta^{2}}{\alpha}\, A\,E\!\left[ \frac{N-m}{N^{2}} \right] = \frac{\beta^{2}}{\alpha}\, A\,E\!\left[ \frac{1}{N}\left(1-\frac{m}{N}\right) \right] \le \frac{\beta^{2}}{\alpha}\, A\,E\!\left[ \frac{1}{N} \right] \le \frac{\alpha^{2} w}{B}\, A\,E\!\left[ \frac{\beta^{2}}{\bar{X}_m^{2}} \right]. \]
Thus, using the two inequalities above with the re-expression fact, we have
\[ R_N \le \frac{\alpha w}{B}\, A\,E\!\left[ m - \frac{2m\alpha\beta}{\bar{X}_m} + \frac{m\alpha^{2}\beta^{2}}{\bar{X}_m^{2}} + \frac{\alpha\beta^{2}}{\bar{X}_m^{2}} \right]. \]
Using the fact that \( \sum_{i=1}^{m} X_i \sim \text{Gamma}(m\alpha, \beta) \), it is easily seen that \( 2\beta^{-1}\sum_{i=1}^{m} X_i \) is distributed χ² with 2mα degrees of freedom. It can be verified that
\[ E\!\left[ \Big( 2\beta^{-1}\textstyle\sum_{i=1}^{m} X_i \Big)^{-k} \right] = \frac{\Gamma(m\alpha-k)}{2^{k}\,\Gamma(m\alpha)}, \qquad \text{so that} \qquad E\!\left[ \bar{X}_m^{-k} \right] = \frac{m^{k}\,\Gamma(m\alpha-k)}{\beta^{k}\,\Gamma(m\alpha)}. \]
We obtain the equation:
\[ R_N = A\,E\!\left( \frac{\bar{X}_N}{\alpha} - \beta \right)^{\!2} \le \frac{A\alpha^{2} w}{B}\left( \frac{m}{\alpha} - \frac{2m^{2}\,\Gamma(m\alpha-1)}{\Gamma(m\alpha)} + \frac{(m^{3}\alpha+m^{2})\,\Gamma(m\alpha-2)}{\Gamma(m\alpha)} \right). \]
So, to ensure the expected loss is less than our risk bound w, we set the right-hand side of the inequality equal to w, solve for B accordingly, and obtain equation (2.1.5). ∎
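The two-stage rule of Theorem 2.1 can be sketched in Python. This is a Monte Carlo sketch under the reconstruction of (2.1.4) and (2.1.5) used above, not the dissertation's R code; the parameter values are illustrative, echoing the settings of the simulation study later in the chapter.

```python
import math
import random
from math import exp, lgamma

def bound_coeff(A, alpha, m):
    """Bound coefficient B of (2.1.5), as reconstructed here:
    B = A*a^2 * ( m/a - 2m^2 G(ma-1)/G(ma) + (m^3*a + m^2) G(ma-2)/G(ma) ),
    with G the gamma function and a the known shape.
    (Requires m*alpha > 2 so the inverse second moment exists.)"""
    g1 = exp(lgamma(m * alpha - 1) - lgamma(m * alpha))
    g2 = exp(lgamma(m * alpha - 2) - lgamma(m * alpha))
    return A * alpha**2 * (m / alpha - 2 * m**2 * g1
                           + (m**3 * alpha + m**2) * g2)

def two_stage(A, alpha, beta, m, w, rng):
    """One run of the two-stage procedure of Theorem 2.1: pilot of size m,
    terminal size N from (2.1.4), realized loss A*(Xbar_N/alpha - beta)^2."""
    first = [rng.gammavariate(alpha, beta) for _ in range(m)]
    xbar_m = sum(first) / m
    N = max(m, math.ceil(bound_coeff(A, alpha, m) * xbar_m**2
                         / (alpha**3 * w)))
    total = sum(first) + sum(rng.gammavariate(alpha, beta)
                             for _ in range(N - m))
    return N, A * (total / N / alpha - beta) ** 2

rng = random.Random(4)
A, alpha, beta, m, w = 2.0, 2.0, 5.0, 20, 1.0
runs = [two_stage(A, alpha, beta, m, w, rng) for _ in range(4000)]
risk_hat = sum(loss for _, loss in runs) / len(runs)   # stays below w
```

As a sanity check on the reconstruction, at α = 1 the coefficient reduces numerically to Mukhopadhyay and Pepe's B_MP = 2Am(m+1)/((m−1)(m−2)); the estimated risk lands near w/2 and the average N near 2n*, matching the behavior reported in the simulation section.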
2.2 Improving the Terminal Sample Size
In the prior section, we found results that certainly achieve the goal of having the risk within the risk bound: an alternative to asymptotic sampling that ensures the risk stays within a bound. Remember that is only part of the goal; the goal is to sample the fewest observations that achieve bounded risk. It is therefore important to investigate the relationship between the terminal sample size and the optimal sample size. Exploring the relationship of N and n* is the first step in seeing whether N needs to be reduced. If the ratio E[N/n*] were near 1, no further work would be needed. However, given the fact that B decreases as α increases, we chose to further our research and develop a new bound coefficient as a function of the old B and α.
The risk function of our two-stage sampling procedure was investigated as a function of the scale, in order to approximate where the maximal risk occurs. Clearly, larger β values result in larger risks if N were to remain constant; however, larger β values also tend to result in larger values of N, which reduces the risk. So the maximal risk under this two-stage sampling procedure is not necessarily unbounded. In fact, in the simulations the maximal risks most commonly occurred for β between five and six. Once this was done, B was identified for each α, then empirically reduced so that the risk is just within the bound w. The new B is the ratio of the empirical B to the theoretical B given in (2.1.5). Figure B.1 shows a scatter plot of these ratios against α. For each value of α there is a corresponding ratio; for example, α = 1 corresponds to 0.560. Looking at the scatter plot, there appears to be a decaying exponential relationship between α and the appropriate ratio, leveling off around α = 20.
A regression is performed to exploit this relationship. This is done only to find the coefficient of log(α): for this problem, we are not trying to fit the curve, but to obtain a curve that sits gently above each of the points. In order to do this, the same regression is used but the slope is altered; as stated earlier, 0.565 suffices for all α > 1, and this is used to find the intercept. With this we find:
\[ B_{new} = \left[ -0.031\,\log(\alpha) + 0.597 \right] B, \quad \alpha < 20; \qquad B_{new} = 0.505\,B, \quad \alpha \ge 20. \qquad (2.3.4) \]
This gives smaller values for B; see Table A.4.
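The adjustment (2.3.4) is a one-line computation; the sketch below simply encodes the empirical rule as stated above.

```python
import math

def b_new(B, alpha):
    """Empirically adjusted bound coefficient of (2.3.4): shrink the
    theoretical B by a log-linear factor in alpha below 20, and by the
    constant factor 0.505 at alpha >= 20."""
    if alpha < 20:
        return (-0.031 * math.log(alpha) + 0.597) * B
    return 0.505 * B
```

At α = 1 the shrink factor is 0.597, sitting just above the observed ratio 0.560 as intended, and near α = 20 the log-linear factor is about −0.031 log 20 + 0.597 ≈ 0.504, so the rule is nearly continuous at the cutover.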
2.2.1 Performance Properties of New Estimation Procedure
A genuine two-stage procedure for sampling data to an assigned accuracy, assuming only a Gamma distribution, was found in Theorem 2.1. This section showed that that procedure would ultimately sample nearly twice as many observations as needed. A more practical way to select a smaller number of observations was also considered in this section. Unlike Section 2.1, a mathematically rigorous proof was not provided; however, sufficient analysis was performed to substantiate the belief that the risk will always be within the risk bound w and that the terminal sample size is reduced by nearly half. In fact, if α = 1, w → 0 as m → ∞, and E[N/n*] < ∞, then
\[ \lim E[N_{new}/n^{*}] = 1.13, \]
and if α ≥ 20, then
\[ \lim E[N_{new}/n^{*}] = 1.01. \]
This means that instead of sampling nearly twice as many observations, the improved two-stage sampling procedure samples between approximately 1.01 and 1.13 times the optimal number, depending on the value of α and the initial sample size. This is seen in Table A.4. Not only does this improved result give reliable estimators for β, but the terminal sample size nears the optimal sample size.
2.3 Computer Simulations
In the previous sections, the mathematical theory was provided to ensure the risk stays within the predetermined bound. To verify these results, a simulation study was conducted using R software. In the simulation, differing values for the optimal sample size n* were chosen: 25, 50, 100, 500. We fix β = 5, since the result does not depend on β, and vary α ∈ {0.5, 1, 2, 5, 10}. A is a constant; we choose A = 2, and 10,000 replications were used for each case. The quantity N̄ is an estimate of the expected value of N, and r̄ is an estimate of the risk with the original terminal sample size. This simulation was repeated with the improved terminal sample size N_new. Our desired result is to see r̄ fall beneath w, to see r̄_new fall just below w, and to see N̄_new stay above n*. Also, since this is a generalization of Mukhopadhyay's research, we would like to see the same results with α = 1; as such, our results for α = 1 and β = 5 should resemble Mukhopadhyay and Pepe's (2006) results. Figures B.2, B.3, B.4, and B.5 give visual representations of the estimated risks compared to their risk bounds as the shape varies over 0.5, 1.0, 2.0, 5.0, 10.0 with n* fixed at 25; Figures B.4 and B.5 display the improved results. Notice the estimated risks fall within the risk bound, and with the improved results the estimated risks are closer to the risk bound. Further detail is given in Table A.1 and Table A.2.
From Table A.1 and Table A.2, we note the following:
1) In Table A.1, for the case α = 1 and β = 5, our mean values for N and r are nearly identical to those given by Mukhopadhyay and Pepe, and the B given here is exactly their B. This is to be expected, because our result is a generalization of theirs.
2) In Table A.1, the mean value of N nears twice that of n*. Note N is a function of α and m, so as both variables get larger the mean of N gets closer to 2n*; this can be seen by looking across rows and down columns.
3) In Table A.1, the mean value of r is always nearly half of our predetermined risk w. This follows since the expected risk is inversely proportional to the number of observations drawn.
4) Naturally, as the initial sample size m increases, we obtain more information about the population with which to make our decision, and consequently we obtain better values for r and N.
5) From Tables A.1 and A.2, we observe that at no point in this simulation do the estimates of the risk or the improved risk exceed the risk bound w.
6) In Table A.2, we observe that the newer risks are much closer to the bound, and the newer sample sizes have been reduced on average by 43%.
7) In Table A.2, we observe that the average terminal sample size N_new is still larger than n*. It is my conjecture that N_new cannot be improved further: our average risks are just within the bounds, which is our goal, and reducing the sample size any further might cause the average risk to eclipse the risk bound.
2.4 Shape Unknown and Scale Unknown
In this section, we provide a solution to the question of how many observations should be selected when both parameters are unknown. When both parameters are unknown, finding bounds becomes more difficult. The goal remains the same: to find an appropriate sample size N that will make the associated risk function less than or equal to a predetermined risk w > 0. Recall if X ~ Gamma(α, β) then the mean is αβ and the estimator for the mean is X̄. The goal is to find N such that A E(X̄_N − αβ)² ≤ w.
For the fixed-width problem, the following properties are desired of the procedure:
(1) P(θ ∈ C_X̄) ≥ 1 − a for every d > 0.
(2) N is finite with probability 1 for every d > 0.
(3) N/n* → 1 as d → 0, in probability or a.s.
(4) E(N)/n* → 1 as d → 0.
(5) lim_{d→0} P(θ ∈ C_X̄) = 1 − a.
In the following section, we will show that under certain conditions these properties hold.
Theorem 3.1 For significance level a and predetermined width d, let X₁, ..., X_m i.i.d. Gamma(α, β) initial observations be drawn (m ≥ 3), let N be as defined in (2.1.4), and let g_q be the q-th quantile of the Gamma(nα, 1/(nα)) distribution. Define
\[ M = \min\left\{ n \ge N,\ n \in \mathbb{N} : \sqrt{\frac{Nw}{A}}\;\frac{1}{\alpha}\left[ g_{1-a/2} - g_{a/2} \right] \le d \right\}. \qquad (3.1.6) \]
Then, with
\[ C_{\bar{X}} = \left\{ \beta : \frac{\bar{X}_M}{\alpha} - \sqrt{\frac{Nw}{A}}\left( \frac{g_{1-a/2}}{\alpha} - 1 \right) < \beta < \frac{\bar{X}_M}{\alpha} - \sqrt{\frac{Nw}{A}}\left( \frac{g_{a/2}}{\alpha} - 1 \right) \right\}, \]
(1) P(β ∈ C_X̄) ≥ 1 − a, and
(2) the width of C_X̄ is at most d.
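Since g_q is a Gamma quantile, the stopping rule (3.1.6) can be sketched numerically. The sketch below is one possible reading of the reconstructed rule, with the Gamma(nα, 1/(nα)) quantile replaced by the Wilson-Hilferty approximation rather than an exact quantile routine; all inputs in the example call are hypothetical.

```python
import math
from statistics import NormalDist

def gamma_quantile(q, k):
    """Wilson-Hilferty approximation to the q-quantile of the unit-mean
    Gamma(k, 1/k) distribution; adequate for moderately large k."""
    z = NormalDist().inv_cdf(q)
    c = 1.0 / (9.0 * k)
    return (1.0 - c + z * math.sqrt(c)) ** 3

def terminal_width_size(N, alpha, w, A, d, a=0.05):
    """Smallest n >= N with sqrt(N*w/A) * (g_{1-a/2} - g_{a/2}) / alpha
    <= d, one reading of the reconstructed rule (3.1.6), where g_q is
    taken from the Gamma(n*alpha, 1/(n*alpha)) distribution."""
    n = N
    while True:
        spread = (gamma_quantile(1 - a / 2, n * alpha)
                  - gamma_quantile(a / 2, n * alpha))
        if math.sqrt(N * w / A) * spread / alpha <= d:
            return n
        n += 1

M_stop = terminal_width_size(N=50, alpha=2.0, w=1.0, A=2.0, d=0.5)
```

The quantile spread shrinks roughly like 1/√(nα), so the loop terminates; as the degrees of freedom grow, the Gamma quantiles approach the Normal ones, which is the asymptotic comparison made at the end of this chapter.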
Proof.
Theorem 2.1 ensures that Aβ²/(αN) ≤ w. The optimal sample size for the width problem is
\[ n^{*} = \min\left\{ n > 0 : \beta\,\alpha^{-1}\left( g_{1-a/2}(n) - g_{a/2}(n) \right) \le d \right\}. \]
The ratio of expectations E[M/n*] becomes
\[ E\left[ \frac{ \min\left\{ n > N : \sqrt{Nw/A}\;\alpha^{-1}\left( g_{1-a/2}(n) - g_{a/2}(n) \right) \le d \right\} }{ \min\left\{ n > 0 : \beta\,\alpha^{-1}\left( g_{1-a/2}(n) - g_{a/2}(n) \right) \le d \right\} } \right]. \]
Clearly, when n > N this reduces to
\[ E\left[ \frac{1}{\beta}\sqrt{\frac{Nw}{A}}\, \right]. \]
Thus,
\[ E[M/n^{*}] \le \sqrt{1 + (m\alpha)^{-1}}. \]
It is obvious that as d → 0, M → ∞. Since the terminal sample size increases, the random variable (G − 1) should be examined asymptotically. It was mentioned earlier that G ~ Gamma(Mα, 1/(Mα)), so the mean and variance of G are one and (Mα)⁻¹, respectively. According to the Central Limit Theorem,
\[ \sqrt{M\alpha}\,( G - 1 ) \Rightarrow Z \sim N(0,1). \]
Also, the random variable X̄_M/α ~ Gamma(Mα, β/(Mα)), with standard deviation σ(β̂) equal to β/√(Mα). Let n*₀ be the optimal sample size of the risk estimator. If m → ∞ and w → 0 such that E[N/n*₀] < ∞ and d → 0, then
\[ 1.01 < \lim E\!\left[ \sqrt{\frac{Nw}{A}}\, \right] < 1.13. \]
Under those conditions, the upper bound of the proposed interval estimator,
\[ \frac{\bar{X}_m}{\alpha} - \sqrt{\frac{Nw}{A}}\left( \frac{g_{a/2}}{\alpha} - 1 \right), \]
approximately becomes
\[ \frac{\bar{X}_m}{\alpha} - \sigma(\hat{\beta})\, z_{a/2}. \]
Because the Normal distribution is symmetric, this is the same as
\[ \frac{\bar{X}_m}{\alpha} + \sigma(\hat{\beta})\, z_{1-a/2}. \]
Similarly, the lower bound will be the same as the corresponding Normal lower bound. The Normal interval has all the optimal properties; thus, under those conditions,
asymptotically the proposed procedure will have all of the optimal properties. To summarize, the procedure was developed using exact methodology and will hold for any number of initial observations (m ≥ 3). A small initial sample size may come with a price of sampling more observations than needed. For large m such that m