Stochastic Differential Equations:
A Dynamical Systems Approach
Except where reference is made to the work of others, the work described in this
dissertation is my own or was done in collaboration with my advisory committee. This
dissertation does not include proprietary or classified information.
Blane Jackson Hollingsworth
Certificate of Approval:
Georg Hetzer
Professor
Mathematics and Statistics
Paul Schmidt, Chair
Professor
Mathematics and Statistics
Ming Liao
Professor
Mathematics and Statistics
Wenxian Shen
Professor
Mathematics and Statistics
Joe F. Pittman
Interim Dean
Graduate School
Stochastic Differential Equations:
A Dynamical Systems Approach
Blane Jackson Hollingsworth
A Dissertation
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Doctor of Philosophy
Auburn, Alabama
May 10, 2008
Stochastic Differential Equations:
A Dynamical Systems Approach
Blane Jackson Hollingsworth
Permission is granted to Auburn University to make copies of this dissertation at its
discretion, upon the request of individuals or institutions and at
their expense. The author reserves all publication rights.
Signature of Author
Date of Graduation
Vita
Blane Hollingsworth was born in Huntsville, Alabama in 1976. His parents are Dianne
and Sonny Hollingsworth. He attended the University of Alabama in Huntsville from 1994
to 2000, receiving both his B.S. and M.A. degrees in mathematics. In fall of 2000, he entered
the Ph.D. program at Auburn University.
Dissertation Abstract
Stochastic Differential Equations:
A Dynamical Systems Approach
Blane Jackson Hollingsworth
Doctor of Philosophy, May 10, 2008
(B.S., University of Alabama in Huntsville, 1998)
(M.A., University of Alabama in Huntsville, 2000)
121 Typed Pages
Directed by Paul Schmidt
The relatively new subject of stochastic differential equations has increasing importance in both theory and applications. The subject draws upon two main sources, probability/stochastic processes and differential equations/dynamical systems. There exists a significant "culture gap" between the corresponding research communities. The objective of the dissertation project is to present a concise yet mostly self-contained theory of stochastic differential equations from the differential equations/dynamical systems point of view, primarily incorporating semigroup theory and functional analysis techniques to study the solutions. Prerequisites from probability/stochastic processes are developed as needed. For continuous-time stochastic processes whose random variables are (Lebesgue) absolutely continuous, the Fokker-Planck equation is employed to study the evolution of the densities, with applications to predator-prey models with noisy coefficients.
Acknowledgments
No one deserves more thanks than Dr. Paul Schmidt for his patience and guidance
throughout this endeavor. Dr. Georg Hetzer, Dr. Ming Liao, and Dr. Wenxian Shen are
all deserving of thanks as well, as I have taken one or more important classes from each of
them and they are members of my committee. Also, I'd like to thank Dr. Olav Kallenberg,
who guided me through the chapters on stochastic differential equations in his book during
independent study. Finally, I would like to thank my parents for all their love and support.
Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used: The document preparation package TeX (specifically LaTeX), together with the departmental style file aums.sty.
Table of Contents
1 Introduction and Preliminaries
1.1 Stochastic Processes and Their Distributions
1.2 Semigroups of Linear Operators
1.3 Kernels and Semigroups of Kernels
1.4 Conditional Expectation, Martingales, and Markov Processes
1.5 Brownian Motion
2 Ito Integrals and Stochastic Differential Equations
2.1 The Ito Integral
2.2 Stochastic Differential Equations and their Solutions
2.3 Ito's Formula and Examples
3 Dynamical Systems and Stochastic Stability
3.1 "Stochastic Dynamical Systems"
3.2 Koopman and Frobenius-Perron Operators: The Deterministic Case
3.3 Koopman and Frobenius-Perron Operators: The Stochastic Case
3.4 Liapunov Stability
3.5 Markov Semigroup Stability
3.6 Long-time Behavior of a Stochastic Predator-prey Model
Bibliography
Chapter 1
Introduction and Preliminaries
1.1 Stochastic Processes and Their Distributions
Let $(\Omega, \mathcal{A}, P)$ be a probability space, $(S, \mathcal{B})$ a measurable space, and $X$ an $S$-valued random variable on $\Omega$, that is, a mapping from $\Omega$ into $S$ that is measurable with respect to the $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$. By the distribution of $X$, denoted by $P_X$, we mean the image of the probability measure $P$ under the mapping $X$, that is, the probability measure on $\mathcal{B}$ defined by $P_X(B) := P(X \in B) := P(X^{-1}(B))$ for $B \in \mathcal{B}$. (Here, as in the sequel, we take some liberties in our terminology. To be precise, we should of course refer to $X$ as an $(S, \mathcal{B})$-valued random variable on $(\Omega, \mathcal{A})$ and to $P_X$ as its $P$-distribution.)
Now let $T$ be a non-empty set and $X = (X_t)_{t \in T}$ a family of $S$-valued random variables on $\Omega$; we call $X$ a stochastic process on $\Omega$, with state space $S$ and index set $T$. Clearly, $X$ can be thought of as a mapping from $\Omega$ into the Cartesian product $S^T$, defined by $X(\omega) := X_\omega := (X_t(\omega))_{t \in T}$ for $\omega \in \Omega$. The image $X_\omega$ of a point $\omega \in \Omega$ is called the path of $\omega$; the set $S^T$, endowed with the product $\sigma$-algebra induced by $\mathcal{B}$, is called the path space of $X$. With slight abuse of notation, we denote the product $\sigma$-algebra of $S^T$ by $\mathcal{B}^T$. Since $\mathcal{B}^T$ is generated by the coordinate projections $\pi_t : S^T \to S$, defined by $\pi_t(x) := x_t$ for $x \in S^T$ and $t \in T$, and since $X_t = \pi_t \circ X$ for $t \in T$, measurability of $X$ with respect to the $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}^T$ is equivalent to the measurability of $X_t$ for every $t \in T$. In other words, $X$ is an $S^T$-valued random variable on $\Omega$. Its distribution $P_X$, a probability measure on $\mathcal{B}^T$, is called the joint distribution of the random variables $X_t$, $t \in T$.
It follows from a standard uniqueness theorem of measure theory that the probability measure $P_X$ is uniquely determined by the values
\[
P_X\Bigl(\bigcap_{t \in F} \pi_t^{-1}(B_t)\Bigr) = P\Bigl(\bigcap_{t \in F} X_t^{-1}(B_t)\Bigr) = P(X_t \in B_t \; \forall t \in F),
\]
where $F$ varies over the non-empty finite subsets of $T$ and $(B_t)_{t \in F}$ over the corresponding finite families of sets in $\mathcal{B}$. In particular, even if $T$ is infinite, the distribution of the family $(X_t)_{t \in T}$ is uniquely determined by the distributions of the "finite subfamilies" $(X_t)_{t \in F}$ with $\emptyset \neq F \subset T$ finite, that is, by the probability measures $Q_{t_1,\dots,t_n} := P_{(X_{t_1},\dots,X_{t_n})}$ with $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective (that is, $t_1,\dots,t_n$ are pairwise distinct); these are called the finite joint distributions of the random variables $X_t$, $t \in T$, or the finite-dimensional distributions of the process $X$.
Note that for each $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective, $Q_{t_1,\dots,t_n}$ is a probability measure on the product $\sigma$-algebra $\mathcal{B}^n$ of $S^n$ induced by $\mathcal{B}$. Clearly, if $B_1,\dots,B_n \in \mathcal{B}$ and $\sigma$ is a permutation of the set $\{1,\dots,n\}$, then
\[
Q_{t_1,\dots,t_n}\bigl(B_{\sigma^{-1}(1)} \times \dots \times B_{\sigma^{-1}(n)}\bigr) = P\bigl(X_{t_i} \in B_{\sigma^{-1}(i)} \; \forall i \in \{1,\dots,n\}\bigr)
= P\bigl(X_{t_{\sigma(j)}} \in B_j \; \forall j \in \{1,\dots,n\}\bigr) = Q_{t_{\sigma(1)},\dots,t_{\sigma(n)}}\bigl(B_1 \times \dots \times B_n\bigr).
\]
Also, if $n \geq 2$ and $B_n = S$, then
\[
Q_{t_1,\dots,t_n}\bigl(B_1 \times \dots \times B_n\bigr) = P\bigl(X_{t_i} \in B_i \; \forall i \in \{1,\dots,n\}\bigr)
= P\bigl(X_{t_i} \in B_i \; \forall i \in \{1,\dots,n-1\}\bigr) = Q_{t_1,\dots,t_{n-1}}\bigl(B_1 \times \dots \times B_{n-1}\bigr).
\]
Under certain restrictions on the state space $S$, a theorem due to Kolmogorov ensures, roughly speaking, that any family of probability measures $Q_{t_1,\dots,t_n}$ consistent with the above conditions is in fact the family of finite-dimensional distributions of a stochastic process on some probability space $(\Omega, \mathcal{A}, P)$; recall that $(S, \mathcal{B})$ is a Polish space if $\mathcal{B}$ is the Borel $\sigma$-algebra generated by a complete and separable metric topology on $S$. Then we have

Theorem 1.1 (Kolmogorov). Suppose $(S, \mathcal{B})$ is a Polish space, $T$ is a non-empty set, and for each $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective, $Q_{t_1,\dots,t_n}$ is a probability measure on $\mathcal{B}^n$ (the product $\sigma$-algebra of $S^n$, which in this case coincides with the Borel $\sigma$-algebra generated by the product topology of $S^n$). Further suppose that the following two conditions are satisfied for all $n \in \mathbb{N}$, $(t_1,\dots,t_n) \in T^n$ injective, and $B_1,\dots,B_n \in \mathcal{B}$:

(a) If $\sigma$ is a permutation of $\{1,\dots,n\}$, then
\[
Q_{t_1,\dots,t_n}\bigl(B_{\sigma^{-1}(1)} \times \dots \times B_{\sigma^{-1}(n)}\bigr) = Q_{t_{\sigma(1)},\dots,t_{\sigma(n)}}\bigl(B_1 \times \dots \times B_n\bigr).
\]

(b) If $n \geq 2$ and $B_n = S$, then
\[
Q_{t_1,\dots,t_n}\bigl(B_1 \times \dots \times B_n\bigr) = Q_{t_1,\dots,t_{n-1}}\bigl(B_1 \times \dots \times B_{n-1}\bigr).
\]

Then there exists a probability space $(\Omega, \mathcal{A}, P)$, along with a family $X = (X_t)_{t \in T}$ of $S$-valued random variables on $\Omega$, such that $Q_{t_1,\dots,t_n} = P_{(X_{t_1},\dots,X_{t_n})}$ for all $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective.
Note that while neither the probability space $(\Omega, \mathcal{A}, P)$ nor the process $X$ is uniquely determined, the distribution $P_X$ is. We refer to [3, Section 35] for a detailed exposition of these issues and a proof of Kolmogorov's theorem (see in particular Theorem 35.3 and Corollary 35.4 ibidem). For most purposes, the distribution of a stochastic process is much more important than the process itself.

Definition 1.1. Two processes are called equivalent if both have the same distribution. This implies, of course, that both have the same state space and index set, but the underlying probability spaces may be different.

Definition 1.2. Two processes $X = (X_t)_{t \in T}$ and $Y = (Y_t)_{t \in T}$ over the same probability space $(\Omega, \mathcal{A}, P)$, with the same state space and index set, are called modifications of each other if $P\bigl(\bigcup_{t \in T} (X_t \neq Y_t)\bigr) = 0$.

It is easily verified that any two processes that are modifications of each other have the same finite-dimensional distributions and are thus equivalent.
Now suppose that $X$ is a stochastic process over $(\Omega, \mathcal{A}, P)$, with state space $(S, \mathcal{B})$, index set $T$, and distribution $Q = P_X$. Then $X$ is equivalent to the process $\pi := (\pi_t)_{t \in T}$ on $(S^T, \mathcal{B}^T, Q)$. To see this, note that the coordinate projections $\pi_t$, for $t \in T$, are $S$-valued random variables on $S^T$ and that $\pi$, as a mapping from $S^T$ into $S^T$, is the identity map: $\pi(\omega) := \pi_\omega := (\pi_t(\omega))_{t \in T} = (\omega_t)_{t \in T} = \omega$ for all $\omega \in S^T$; hence, $Q_\pi = Q$.

Definition 1.3. The process $\pi$ as above is called the canonical process with distribution $Q$.

We can think of the canonical process as the standard representative of the equivalence class of stochastic processes with distribution $Q$. Using this terminology, the assertion of Theorem 1.1 may be stated as follows: there exists a unique probability measure $Q$ on $\mathcal{B}^T$ such that the given probability measures $Q_{t_1,\dots,t_n}$ coincide with the finite-dimensional distributions of the canonical process $\pi$ with distribution $Q$.
In the following we assume that the state space $(S, \mathcal{B})$ is Polish (as in Theorem 1.1) and that $X$ is a so-called continuous-time process, that is, the index set $T$ is $\mathbb{R}_+$. For each $\omega \in \Omega$, the path $X_\omega$ is then a curve in $S$, parametrized by $t \in \mathbb{R}_+$. If the curve $X_\omega$ is continuous for every (or $P$-almost every) $\omega \in \Omega$, we say that $X$ has continuous (or almost surely continuous) paths. If $X$ has almost surely continuous paths, an obvious and inconsequential modification of the underlying probability space will turn $X$ into a process with continuous paths. Also, any process with almost surely continuous paths admits a modification with continuous paths.

Saying that $X$ has continuous paths is equivalent to saying that $X$ maps $\Omega$ into the subspace $C := C(\mathbb{R}_+; S)$ of $S^{\mathbb{R}_+}$, that is, the subspace of continuous mappings from $\mathbb{R}_+$ into $S$. This space is, in general, not measurable as a subset of $S^{\mathbb{R}_+}$; in fact, $C \notin \mathcal{B}^{\mathbb{R}_+}$ unless $S$ is a singleton (see [3, Corollary 38.5]). However, $C$ is a Polish space under the topology of uniform convergence on compact subsets of $\mathbb{R}_+$, and the trace $\sigma$-algebra $C \cap \mathcal{B}^{\mathbb{R}_+} := \{C \cap B \mid B \in \mathcal{B}^{\mathbb{R}_+}\}$ coincides with the Borel $\sigma$-algebra generated by this topology. Also, $C$ inherits a topology from $S^{\mathbb{R}_+}$ (the product topology, which coincides with the topology of pointwise convergence on $\mathbb{R}_+$), and the trace $\sigma$-algebra $C \cap \mathcal{B}^{\mathbb{R}_+}$ coincides with the Borel $\sigma$-algebra generated by that topology as well (see [3, Theorem 38.6]).
Now suppose that $X$ is a continuous-time process with Polish state space $(S, \mathcal{B})$ and distribution $Q$ and that $X$ is equivalent to a process with continuous paths. Then $X$ is in fact equivalent to the process $\tilde{\pi} := (\pi_t|_C)_{t \in \mathbb{R}_+}$ on $(C, C \cap \mathcal{B}^{\mathbb{R}_+}, \tilde{Q})$, where $C := C(\mathbb{R}_+; S)$ and $\tilde{Q}$ is defined by $\tilde{Q}(C \cap B) := Q(B)$ for $B \in \mathcal{B}^{\mathbb{R}_+}$. That $\tilde{Q}$ is well defined as a probability measure on the trace $\sigma$-algebra $C \cap \mathcal{B}^{\mathbb{R}_+}$ follows from the (non-trivial) fact that $Q(B) = 1$ for all $B \in \mathcal{B}^{\mathbb{R}_+}$ with $B \supset C$ (in other words, $C$ has $Q$-outer measure 1). For the proof, we refer to [3, Sections 38-39], in particular Theorems 38.2-3 and Lemma 39.2 ibidem. To see that $X$ and $\tilde{\pi}$ are equivalent, observe that $\tilde{\pi}$, as a mapping from $C$ into $S^{\mathbb{R}_+}$, is simply the restriction of the identity map of $S^{\mathbb{R}_+}$ to $C$. Thus, $\tilde{Q}_{\tilde{\pi}}(B) = \tilde{Q}(\tilde{\pi}^{-1}(B)) = \tilde{Q}(C \cap B) = Q(B)$ for all $B \in \mathcal{B}^{\mathbb{R}_+}$; that is, $\tilde{Q}_{\tilde{\pi}} = Q$.

Definition 1.4. The process $\tilde{\pi}$ as above is called the $C$-canonical process with distribution $Q$.

Whenever an equivalence class of continuous-time processes with Polish state space contains a process with continuous paths, we think of the associated $C$-canonical process $\tilde{\pi}$ (rather than the canonical process $\pi$) as the standard representative of the equivalence class.
In the next two sections, we discuss semigroups, which will be used in the short term to prescribe a family of measures that satisfies Kolmogorov's theorem and hence allows us to construct Brownian motion.
1.2 Semigroups of Linear Operators
Let $X$ be a Banach space. A family $T := (T_t) = (T_t)_{t \in \mathbb{R}_+}$ of bounded linear operators $T_t : X \to X$ is called a semigroup of linear operators (or, more simply, a semigroup) if $T_0 = \mathrm{id}_X$ and $T_{t+s} = T_t T_s$ for all $t, s \in \mathbb{R}_+$. If $\lim_{t \to 0^+} \|x - T_t x\| = 0$ for all $x \in X$, then we say $T$ is strongly continuous. The infinitesimal generator (or, more simply, the generator) of a strongly continuous semigroup $T$ is the operator $A : D(A) \subset X \to X$ defined by
\[
Ax := \lim_{t \to 0^+} \frac{T_t x - x}{t}
\]
for all $x \in D(A)$, the set of $x \in X$ such that the limit exists. We say a semigroup $T$ is an $\omega$-contraction semigroup if, for some nonnegative constant $\omega$, $\|T_t\| \leq e^{\omega t}$ for all $t \geq 0$, where $\|T_t\|$ is the operator norm of $T_t$. We say $T$ is a contraction semigroup if $\omega = 0$.
Let $C_b(\mathbb{R}^n; \mathbb{R})$ denote the space of bounded, continuous functions mapping $\mathbb{R}^n$ into $\mathbb{R}$, and let $C_0(\mathbb{R}^n; \mathbb{R})$ denote the subset of those $f \in C_b(\mathbb{R}^n; \mathbb{R})$ such that $\lim_{|x| \to \infty} f(x) = 0$. Equip $C_0(\mathbb{R}^n; \mathbb{R})$ with the sup norm to make it a Banach space.

Definition 1.5. A contraction semigroup of linear operators $T$ on $C_0(\mathbb{R}^n; \mathbb{R})$ is called a Feller semigroup if

1. for every $t \geq 0$, $T_t$ maps $C_0(\mathbb{R}^n; \mathbb{R})$ into itself, and

2. $\lim_{t \to 0} T_t f(x) = f(x)$ for all $f \in C_0(\mathbb{R}^n; \mathbb{R})$ and $x \in \mathbb{R}^n$.

It can be shown ([9, Theorem 19.6]) that Feller semigroups are strongly continuous. Strong continuity is quite valuable due to the following theorem (see [17, Theorem 2.3.2]):
Theorem 1.2. Any strongly continuous semigroup $(G_t)$ with infinitesimal generator $A$ has the property that, for any $x \in D(A)$, $G_t x \in D(A)$ for all $t \in \mathbb{R}_+$, $t \mapsto G_t x$ is $C^1$, and
\[
\frac{d}{dt}(G_t x) = A G_t x = G_t A x.
\]

Put another way, $u : t \mapsto T_t x$ solves the initial value problem $\dot{u} = Au$, $u(0) = x$. So, formally, $u(t)$ should be of the form $e^{tA} x$, that is, $T_t x = e^{tA} x$ or $T_t = e^{tA}$. We would like to have a way to guarantee that a given operator $A$, generally unbounded, indeed will be the generator of a strongly continuous semigroup.
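The relation $\frac{d}{dt} T_t x = A T_t x$ can be checked concretely in finite dimensions, where every generator is bounded and $T_t = e^{tA}$. The following sketch verifies the semigroup property and the derivative identity numerically; the diagonal generator and test vectors are illustrative choices, not taken from the text.

```python
import math

# A minimal finite-dimensional sketch: take A = diag(-1, -2), a bounded
# operator on R^2, so that T_t = e^{tA} = diag(e^{-t}, e^{-2t}).
lams = (-1.0, -2.0)

def T(t, x):
    """Apply T_t = e^{tA} to x componentwise."""
    return tuple(math.exp(lam * t) * xi for lam, xi in zip(lams, x))

def A_op(x):
    """Apply the generator A to x."""
    return tuple(lam * xi for lam, xi in zip(lams, x))

x = (1.0, -1.0)

# Semigroup property T_{t+s} x = T_t (T_s x), and T_0 = id.
lhs = T(0.8, x)
rhs = T(0.3, T(0.5, x))
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
assert T(0.0, x) == x

# d/dt T_t x = A T_t x, checked by a central difference at t = 0.7.
h = 1e-6
num = tuple((a - b) / (2 * h) for a, b in zip(T(0.7 + h, x), T(0.7 - h, x)))
exact = A_op(T(0.7, x))
assert all(abs(a - b) < 1e-6 for a, b in zip(num, exact))
```

Since the eigenvalues of $A$ are negative, this $T$ is also a contraction semigroup, anticipating the Hille-Yosida theorem below.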
Let $X$ be a normed linear space and $A : D(A) \subset X \to X$ a linear operator. Consider the equation
\[
Ax = y.
\]
To guarantee the existence and uniqueness of a solution $x \in D(A)$, for every $y \in X$, and the continuous dependence of $x$ on $y$, the operator $A$ must be one-to-one and onto, with a bounded inverse $A^{-1}$. Assuming $X$ to be complete and $A$ to be closed, the latter is automatic, by the open-mapping theorem. More generally, consider the equation
\[
(A - \lambda I)x = y,
\]
where $I := \mathrm{id}_X$ and $\lambda \in \mathbb{C}$. Existence, uniqueness, and continuous dependence are guaranteed if $\lambda$ belongs to the resolvent set of $A$, as defined below.
Definition 1.6. For a Banach space $X$ and a closed linear operator $A : D(A) \subset X \to X$, define $\rho(A)$, the resolvent set of $A$, by $\rho(A) := \{\lambda \in \mathbb{C} \mid A - \lambda I \text{ is one-to-one and onto}\}$. Then define $R(\lambda; A)$, the resolvent of $A$, by $R(\lambda; A) := (A - \lambda I)^{-1}$.

Theorem 1.3 (Hille-Yosida). For a Banach space $X$, a closed, densely defined linear operator $A : D(A) \subset X \to X$ is the infinitesimal generator of a strongly continuous semigroup of contractions if and only if

1. $(0, \infty) \subset \rho(A)$, and

2. for each $\lambda > 0$, $\|R(\lambda; A)\| \leq \frac{1}{\lambda}$.

We refer to [17, pp. 51-56] for the proof.
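A minimal numerical sketch of the two Hille-Yosida conditions, assuming a diagonal operator $A$ with nonpositive entries (an illustrative choice): the resolvent norm bound $\|R(\lambda; A)\| \leq 1/\lambda$ holds for every $\lambda > 0$, and the generated semigroup $T_t = e^{tA}$ is indeed a contraction.

```python
import math

# Illustrative diagonal generator A = diag(a_1, a_2, a_3) with a_i <= 0.
# For lambda > 0, R(lambda; A) = (A - lambda I)^{-1} = diag(1/(a_i - lambda)),
# whose operator norm is max_i 1/(lambda - a_i) <= 1/lambda.
diag_A = (-0.5, -2.0, -7.0)

def resolvent_norm(lam):
    """Operator norm of R(lambda; A) for the diagonal A above."""
    return max(1.0 / (lam - a) for a in diag_A)

for lam in (0.1, 1.0, 10.0, 100.0):
    assert resolvent_norm(lam) <= 1.0 / lam

# The generated semigroup T_t = e^{tA} is a contraction: ||T_t|| <= 1.
def semigroup_norm(t):
    return max(math.exp(a * t) for a in diag_A)

for t in (0.0, 0.5, 3.0):
    assert semigroup_norm(t) <= 1.0
```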
In the next section, we present a discussion of semigroups of kernels with the construc-
tion of Brownian motion in mind.
1.3 Kernels and Semigroups of Kernels
Let $(\Omega_1, \mathcal{A}_1)$ and $(\Omega_2, \mathcal{A}_2)$ be given measurable spaces.

Definition 1.7. A function $k : \Omega_1 \times \mathcal{A}_2 \to \mathbb{R}_+$ with the properties

1. $k(\cdot, A_2)$ is $\mathcal{A}_1$-measurable for all $A_2 \in \mathcal{A}_2$,

2. $k(\omega, \cdot)$ is a (probability) measure on $\mathcal{A}_2$ for all $\omega \in \Omega_1$,

is called a (probability) kernel from $(\Omega_1, \mathcal{A}_1)$ to $(\Omega_2, \mathcal{A}_2)$.

We also call a probability kernel a Markov kernel, or say that the kernel is Markovian. Further, if $(\Omega_1, \mathcal{A}_1)$ equals $(\Omega_2, \mathcal{A}_2)$, we call $k$ a kernel on $(\Omega_1, \mathcal{A}_1)$, or simply a kernel on $\Omega_1$.
Let us establish some notation here: let $\mathcal{B}$ denote the Borel $\sigma$-algebra on $\mathbb{R}$, $\mathcal{B}_+$ the Borel $\sigma$-algebra on $\mathbb{R}_+$, and $\mathcal{B}^n$ the Borel $\sigma$-algebra on $\mathbb{R}^n$ for any $n \in \mathbb{N}$. Let $\lambda^n$ denote the Lebesgue measure on $(\mathbb{R}^n, \mathcal{B}^n)$; we may simply write $\lambda := \lambda^n$ when $n$ is understood. Given $x \in \mathbb{R}^n$, let $\delta_x$ denote the point mass at $x$, that is, the measure that satisfies $\delta_x(A) = 1$ if $x \in A$ and $\delta_x(A) = 0$ otherwise, for $A \in \mathcal{B}^n$.

Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $M(\Omega, \mathcal{A})$ denote the space of all $\mathbb{R}$-valued functions on $\Omega$ that are measurable with respect to $\mathcal{A}$ and $\mathcal{B}$. For $p \in [1, \infty)$, let $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$ denote the space of functions $f$ belonging to $M(\Omega, \mathcal{A})$ such that $|f|^p$ is integrable (with respect to $\mu$); write $\mathcal{L} := \mathcal{L}^1$. Let $\mathcal{L}^\infty(\Omega, \mathcal{A}, \mu)$ denote the space of functions $f$ belonging to $M(\Omega, \mathcal{A})$ such that the essential supremum of $|f|$ is finite. When the associated $\sigma$-algebra and measure are understood, we may abbreviate $\mathcal{L}^p(\Omega)$ and $\mathcal{L}^\infty(\Omega)$ for $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$ and $\mathcal{L}^\infty(\Omega, \mathcal{A}, \mu)$, respectively; we frequently understand $\mathbb{R}_+ \times \Omega$ to have $\sigma$-algebra $\mathcal{B}_+ \otimes \mathcal{A}$ and measure $\lambda_+ \otimes \mu$ (where $\lambda_+$ is Lebesgue measure on $(\mathbb{R}_+, \mathcal{B}_+)$).
Given $f$ and $g$ in $M(\Omega, \mathcal{A})$, we say that $f$ is equivalent to $g$ (with respect to $\mu$) if $\mu(f \neq g) = 0$. Note that if $f$ is integrable and equivalent to $g$, then $g$ is also integrable. Denote the spaces of equivalence classes of $M(\Omega, \mathcal{A})$ and $\mathcal{L}^p(\Omega, \mathcal{A}, \mu)$ by $M(\Omega, \mathcal{A}, \mu)$ and $L^p(\Omega, \mathcal{A}, \mu)$, respectively; we frequently "identify" an equivalence class with an arbitrary member. Also, if $f \in M(\Omega, \mathcal{A})$ and $f$ is nonnegative, we say $f \in M^+(\Omega, \mathcal{A})$; we give the analogous meaning to $\mathcal{L}^p_+$, $M^+$, and $L^p_+$.
Now, a kernel $k$ from $(\Omega_1, \mathcal{A}_1)$ to $(\Omega_2, \mathcal{A}_2)$ determines a mapping $K$ of $M^+(\Omega_2, \mathcal{A}_2)$ into $M^+(\Omega_1, \mathcal{A}_1)$, defined by
\[
(K f_2)(\omega_1) := \int f_2(\omega_2)\, k(\omega_1, d\omega_2),
\]
for $\omega_1 \in \Omega_1$ and $f_2 \in M^+(\Omega_2, \mathcal{A}_2)$. Let us refer to $K$ as the integral operator associated with $k$. Note that for any $A_2 \in \mathcal{A}_2$, $K 1_{A_2} = k(\cdot, A_2)$. In particular, $K 1_{\Omega_2} = 1_{\Omega_1}$ if and only if $k$ is Markovian.

Kernels may be composed in the following way: for $i = 1, 2$, let $k_i$ be a kernel from $(\Omega_i, \mathcal{A}_i)$ to $(\Omega_{i+1}, \mathcal{A}_{i+1})$. We may define the composition $k_1 k_2$ in terms of the composition of the associated integral operators $K_1$ and $K_2$:
\[
(k_1 k_2)(\cdot, A_3) := K_1 K_2 1_{A_3}.
\]
Then $k_1 k_2$ is a kernel from $(\Omega_1, \mathcal{A}_1)$ to $(\Omega_3, \mathcal{A}_3)$, and we have
\[
(k_1 k_2)(\omega_1, A_3) = \int k_1(\omega_1, d\omega_2)\, k_2(\omega_2, A_3),
\]
for all $\omega_1 \in \Omega_1$ and $A_3 \in \mathcal{A}_3$. Observe that if $k_1, k_2$ are Markovian then so is $k_1 k_2$. We need the composition of kernels to define semigroups of kernels.
Definition 1.8. If $(P_t)$ is a family of kernels on a measurable space $(S, \mathcal{B})$ and if $P_{s+t} = P_s P_t$ for all $s, t \geq 0$, then we say $(P_t) := (P_t)_{t \in \mathbb{R}_+}$ is a semigroup of kernels on $S$.

We remark that a semigroup of kernels satisfies $P_{s+t}(x, B) = \int P_s(x, dy)\, P_t(y, B)$ for $x \in S$, $B \in \mathcal{B}$, often called the Chapman-Kolmogorov property.
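The Chapman-Kolmogorov property can be checked numerically for a concrete semigroup. The sketch below uses the Gaussian (heat) kernels $P_t(x, dy) = p_t(x, y)\,dy$, anticipating the construction of Brownian motion; the grid parameters are illustrative.

```python
import math

# Gaussian transition densities p_t(x, y) = (2*pi*t)^(-1/2) exp(-(y-x)^2/(2t)).
def p(t, x, y):
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

def compose(s, t, x, z, lo=-30.0, hi=30.0, n=6000):
    """Midpoint-rule approximation of the integral p_s(x,y) p_t(y,z) dy."""
    h = (hi - lo) / n
    return sum(p(s, x, lo + (k + 0.5) * h) * p(t, lo + (k + 0.5) * h, z)
               for k in range(n)) * h

# Chapman-Kolmogorov: composing the s- and t-kernels gives the (s+t)-kernel.
s, t, x, z = 0.7, 1.3, 0.4, -1.1
assert abs(compose(s, t, x, z) - p(s + t, x, z)) < 1e-6
```

In probabilistic terms, this is the familiar fact that the sum of independent $N(0, s)$ and $N(0, t)$ random variables is $N(0, s + t)$.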
Definition 1.9. A semigroup of kernels $(P_t)$ is called normal if $P_0(x, \cdot) = \delta_x$ for all $x \in S$. We call $(P_t)$ Markovian if each kernel $P_t$ is Markovian.
Now, let $(P_t)$ be a semigroup of kernels on $(\mathbb{R}^n, \mathcal{B}^n)$.

Definition 1.10. $(P_t)$ is called translation-invariant if $P_t(x, B) = P_t(x + z, B + z)$ for all $x, z \in \mathbb{R}^n$, $t \geq 0$, and $B \in \mathcal{B}^n$.

It can be shown that translation-invariant semigroups of kernels must be normal (see [3, 29.7 and p. 311]). The following proposition demonstrates the importance of these semigroups.
Proposition 1.1. Given a translation-invariant $(P_t)$ as above,

1. define $T := (T_t)_{t \geq 0}$ by
\[
T_t f = \int f(y)\, P_t(\cdot, dy), \tag{1.1}
\]
for any $t \in \mathbb{R}_+$ and $f \in L^1(\mathbb{R}^n)$. Then $T$ is a contraction semigroup of linear operators on $L^1(\mathbb{R}^n)$;

2. define $(\mu_t) := (\mu_t)_{t \geq 0}$ by $\mu_t(B) := P_t(0, B)$ for all $B \in \mathcal{B}^n$. Then $(\mu_t)$ is a convolution semigroup of measures on $\mathcal{B}^n$, that is, for all $s, t \geq 0$ and all $B \in \mathcal{B}^n$, $(\mu_t)$ satisfies
\[
\mu_{s+t}(B) = \int \mu_s(dy)\, \mu_t(B - y) = (\mu_s * \mu_t)(B).
\]

Proof. The second claim is simple, so we only address the first claim (which in fact does not require translation invariance of $(P_t)$). It remains to note that $T$ is indeed a semigroup; $(P_t)$ satisfies the Chapman-Kolmogorov property, and $T_0$ is the identity mapping since $(P_t)$ is normal:
\[
T_0 f(x) = \int f(y)\, P_0(x, dy) = \int f(y)\, \delta_x(dy) = f(x).
\]
Note that $\|T_t f\| \leq \|f\|$ since $(P_t)$ is Markovian, so that $T$ is a contraction semigroup.
Conversely, given a convolution semigroup of measures $(\mu_t)$ on $\mathcal{B}^n$, if we define $P_t(x, B) := \mu_t(B - x)$ for all $t \geq 0$, $x \in \mathbb{R}^n$, $B \in \mathcal{B}^n$, then $(P_t)$ is a translation-invariant semigroup of kernels on $\mathbb{R}^n$ ([3, pp. 310-311]). Notice that $(P_t)$ is a translation-invariant Markov semigroup if and only if $(\mu_t)$ is a convolution semigroup of probability measures.
At this point, an intuitive interpretation of a translation-invariant Markov semigroup is helpful. Think of $P_t(x, B)$ as the probability that a randomly moving particle starting at $x$ at time $0$ is in the set $B$ at time $t$. The semigroup property then expresses a lack of memory: we need not know the history of the particle's movement; we only need to know where it is at time $t$ to obtain the probability that it is in some set at time $t + s$. Thinking of the particle as being "in $dy$" at time $t$, we can see from the Chapman-Kolmogorov property that $P_{t+s}(x, B) = \int P_t(x, dy)\, P_s(y, B)$; that is, the probability that a particle is in a set $B$ at time $t + s$ can be obtained from $P_t(x, dy)$ (which we think of as the "present") and $P_s(y, B)$ (the probability that a particle starting at $y$ ends up in $B$ at time $s$). This semigroup reasoning is similar to concepts in deterministic dynamical systems, which will be discussed later.
Armed with this intuitive understanding of translation-invariant Markovian semigroups of kernels, we realize the next step: translation-invariant Markovian semigroups of kernels lead to measures which satisfy the hypotheses of Kolmogorov's theorem and hence to the construction of stochastic processes (in particular, Brownian motion) which model random particle motion in a natural way. The idea is, if we take times $t_1 < t_2 < \dots < t_k$ and sets $B_1, B_2, \dots, B_k$ in $\mathcal{B}^n$, we may construct the iterated integral
\[
\int_{B_1} \int_{B_2} \dots \int_{B_k} P_{t_k - t_{k-1}}(x_{k-1}, dx_k)\, P_{t_{k-1} - t_{k-2}}(x_{k-2}, dx_{k-1}) \dots P_{t_1}(x_0, dx_1). \tag{1.2}
\]
For a particle starting at $x_0$, this integral models random particle motion without memory, in the sense that it gives the probability that at times $t_1, t_2, \dots, t_k$, the particle is found successively in $B_1, B_2, \dots, B_k$. We could even, by tacking on another integral in (1.2), allow the particle's initial location to be random: let $\mu$ be a probability measure on $\mathcal{B}^n$ that describes the distribution of the initial location of the particle. Then we would integrate over $\mathbb{R}^n$ with respect to $\mu$ over the variable $x_0$:
\[
\int_{\mathbb{R}^n} \int_{B_1} \dots \int_{B_k} P_{t_k - t_{k-1}}(x_{k-1}, dx_k) \dots P_{t_1}(x_0, dx_1)\, \mu(dx_0). \tag{1.3}
\]
Then it can be shown [3, 36.4] that, given $(P_t)$ and $\mu$ as above, for $x := (x_1, x_2, \dots, x_k)$ and for any $B \in \bigotimes_{i=1}^k \mathcal{B}_i$ (where $\mathcal{B}_i = \mathcal{B}^n$ for all $1 \leq i \leq k$), the measures $P_{t_1, t_2, \dots, t_k}$, defined by
\[
P_{t_1, t_2, \dots, t_k}(B) := \int_{\mathbb{R}^n} \int_{B_1} \dots \int_{B_k} 1_B(x)\, P_{t_k - t_{k-1}}(x_{k-1}, dx_k) \dots P_{t_1}(x_0, dx_1)\, \mu(dx_0), \tag{1.4}
\]
satisfy the hypotheses of Kolmogorov's theorem. The family of measures in (1.4) thus consists of the finite-dimensional distributions of some stochastic process with state space $\mathbb{R}^n$.
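The iterated integral (1.3) can be compared against a direct simulation in a concrete case. The sketch below takes the Gaussian transition kernels, initial distribution $\mu = \delta_0$, and illustrative times and sets; the quadrature value of the two-time probability is matched by a Monte Carlo simulation of the increments.

```python
import math
import random

# Gaussian transition densities for the kernels P_t(x, dy) = p_t(x, y) dy.
def p(t, x, y):
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

# Illustrative times t1 < t2 and sets B1 = [0, 1], B2 = [-1, 0].
t1, t2 = 0.5, 1.5
B1, B2 = (0.0, 1.0), (-1.0, 0.0)

# Iterated integral over B1 x B2 (midpoint rule), starting at x0 = 0.
n = 400
h1 = (B1[1] - B1[0]) / n
h2 = (B2[1] - B2[0]) / n
prob = 0.0
for i in range(n):
    x1 = B1[0] + (i + 0.5) * h1
    inner = sum(p(t2 - t1, x1, B2[0] + (j + 0.5) * h2) for j in range(n)) * h2
    prob += p(t1, 0.0, x1) * inner * h1

# Monte Carlo: X_{t1} and X_{t2} built from independent Gaussian increments.
random.seed(0)
N = 100_000
hits = 0
for _ in range(N):
    x_t1 = random.gauss(0.0, math.sqrt(t1))
    x_t2 = x_t1 + random.gauss(0.0, math.sqrt(t2 - t1))
    if B1[0] <= x_t1 <= B1[1] and B2[0] <= x_t2 <= B2[1]:
        hits += 1

assert abs(prob - hits / N) < 0.02
```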
The canonical process $X$ associated with this stochastic process has a distribution which depends only on $(P_t)$ and $\mu$, so let us denote this distribution by $P^\mu$. This means that
\[
P^\mu(X_{t_1} \in B_1, X_{t_2} \in B_2, \dots, X_{t_k} \in B_k) = \int_{\mathbb{R}^n} \int_{B_1} \int_{B_2} \dots \int_{B_k} P_{t_k - t_{k-1}}(x_{k-1}, dx_k) \dots P_{t_1}(x_0, dx_1)\, \mu(dx_0)
\]
holds for all $B_1, B_2, \dots, B_k$ in $\mathcal{B}^n$. Also, the $P^\mu$-distribution of $X_0$ is $\mu$, and so we may refer to $\mu$ as the initial distribution of the process.

Processes constructed as above enjoy some useful and intuitive properties.
Definition 1.11. A process $X$ with state space $(\mathbb{R}^n, \mathcal{B}^n)$ has stationary increments if there is a family of probability measures $(\mu_t)$ on $\mathcal{B}^n$ such that $\mu_{t-s} = P_{X_t - X_s}$; this means that the distribution of $X_t - X_s$ depends only on $t - s$.

Definition 1.12. A process $X$ with state space $(\mathbb{R}^n, \mathcal{B}^n)$ has independent increments if $X_{t_0}, X_{t_1} - X_{t_0}, \dots, X_{t_k} - X_{t_{k-1}}$ are independent for any $t_0, t_1, \dots, t_k \in \mathbb{R}_+$ with $t_0 < t_1 < \dots < t_k$, for any $k \geq 1$.
It can be shown ([3, 37.2]) that the canonical process derived from a translation-invariant Markov semigroup of kernels $(P_t)$ and initial distribution $\mu$ has stationary and independent increments.

In the next section, we will explain conditional expectation, martingales, and Markov processes, and then we will be able to prescribe a particular $(P_t)$ so that we can construct Brownian motion.
1.4 Conditional Expectation, Martingales, and Markov Processes
Let $(\Omega, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $f \in \mathcal{L}(\Omega, \mathcal{A}, \mu)$. Then for any $A \in \mathcal{A}$, we define
\[
\mu_f(A) := \int_A f\, d\mu.
\]
We say that $\mu_f$ is the signed measure that has density $f$ with respect to $\mu$; this implies the relation
\[
\int g\, d\mu_f = \int g f\, d\mu,
\]
for all $g \in \mathcal{L}(\Omega, \mathcal{A}, \mu_f)$. Further, note that $g \in \mathcal{L}(\Omega, \mathcal{A}, \mu_f)$ if and only if $gf \in \mathcal{L}(\Omega, \mathcal{A}, \mu)$. Finally, note that $\mu_f$ is a finite signed measure on $(\Omega, \mathcal{A})$ that is absolutely continuous with respect to $\mu$, that is, $\mu_f(A) = 0$ whenever $A \in \mathcal{A}$ and $\mu(A) = 0$.

Conversely, given any finite signed measure $\nu$ on $(\Omega, \mathcal{A})$ that is absolutely continuous with respect to $\mu$, there exists by the Radon-Nikodym theorem a function $f \in \mathcal{L}(\Omega, \mathcal{A}, \mu)$, unique up to modification on a $\mu$-null set, such that $\nu = \mu_f$. The equivalence class of all $f$ such that $\nu = \mu_f$ is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ and is denoted $\frac{d\nu}{d\mu}$. Note that if $f$ is any representative of $\frac{d\nu}{d\mu}$, we have $\int g\, d\nu = \int g f\, d\mu$ for all $g \in \mathcal{L}(\Omega, \mathcal{A}, \nu)$, or (formally) $d\nu = f\, d\mu$; we frequently "identify" $\frac{d\nu}{d\mu}$ with an arbitrary representative. This justifies the "differential" notation $\frac{d\nu}{d\mu}$ for Radon-Nikodym derivatives. We also have a number of rules for Radon-Nikodym derivatives that are reminiscent of the rules of differential calculus, for example, the chain rule: if $\nu_1$ is a finite signed measure on $(\Omega, \mathcal{A})$, if $\nu_2$ and $\nu_3$ are finite measures on $(\Omega, \mathcal{A})$, if $\nu_1$ is absolutely continuous with respect to $\nu_2$, and if $\nu_2$ is absolutely continuous with respect to $\nu_3$, then $\nu_1$ is absolutely continuous with respect to $\nu_3$, and
\[
\frac{d\nu_1}{d\nu_3} = \frac{d\nu_1}{d\nu_2}\, \frac{d\nu_2}{d\nu_3}.
\]
In particular, $\frac{d\nu_2}{d\nu_3}\, \frac{d\nu_3}{d\nu_2} = 1$ when $\nu_2$ and $\nu_3$ are absolutely continuous with respect to each other.
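On a finite set, Radon-Nikodym derivatives reduce to pointwise ratios of weights, which makes the chain rule easy to verify exactly. The measures below are illustrative choices with all weights positive, so every derivative exists.

```python
from fractions import Fraction as F

# Three illustrative measures on a three-point set, all mutually
# absolutely continuous (every point has positive weight).
omega = ("a", "b", "c")
nu1 = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}
nu2 = {"a": F(1, 3), "b": F(1, 3), "c": F(1, 3)}
nu3 = {"a": F(1, 6), "b": F(2, 6), "c": F(3, 6)}

def rn(num, den):
    """Radon-Nikodym derivative of num w.r.t. den: the ratio of weights."""
    return {w: num[w] / den[w] for w in omega}

d12, d23, d13 = rn(nu1, nu2), rn(nu2, nu3), rn(nu1, nu3)

# Chain rule: d(nu1)/d(nu3) = d(nu1)/d(nu2) * d(nu2)/d(nu3), pointwise.
assert all(d13[w] == d12[w] * d23[w] for w in omega)

# Integration rule: integral of g d(nu1) = integral of g * d(nu1)/d(nu3) d(nu3).
g = {"a": F(2), "b": F(-1), "c": F(5)}
lhs = sum(g[w] * nu1[w] for w in omega)
rhs = sum(g[w] * d13[w] * nu3[w] for w in omega)
assert lhs == rhs
```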
Now, let $(\Omega, \mathcal{A}, P)$ be a probability space, let $\xi \in \mathcal{L}(\Omega, \mathcal{A}, P)$, and let $\mathcal{F} \subset \mathcal{A}$ be a $\sigma$-algebra. Then $P_\xi$, the signed measure that has density $\xi$ with respect to $P$, restricts to a measure on $\mathcal{F}$, namely $P_\xi|_{\mathcal{F}}$, which is absolutely continuous with respect to $P|_{\mathcal{F}}$, the restriction of $P$ to $\mathcal{F}$. This leads to the definition of conditional expectation.

Definition 1.13. The conditional expectation of $\xi$ given $\mathcal{F}$, denoted $E^{\mathcal{F}}\xi$ or $E(\xi \mid \mathcal{F})$, is the Radon-Nikodym derivative of $P_\xi|_{\mathcal{F}}$ with respect to $P|_{\mathcal{F}}$.

Note that $E(\xi \mid \mathcal{F})$ is the unique member of $L(\Omega, \mathcal{F}, P|_{\mathcal{F}})$ such that
\[
\int_F E(\xi \mid \mathcal{F})\, dP = \int_F \xi\, dP,
\]
for all $F \in \mathcal{F}$.

Definition 1.14. The expected value of $\xi$, denoted by $E\xi$, is defined as
\[
E\xi := \int_\Omega \xi\, dP.
\]
The conditional expectation of $\xi$ given an event $A \in \mathcal{A}$ with $P(A) > 0$, denoted by $E(\xi \mid A)$, is defined as
\[
E(\xi \mid A) := \frac{1}{P(A)} \int_A \xi\, dP.
\]
It is helpful to consider examples. If $\mathcal{F}$ is the $\sigma$-algebra induced by $\xi$, or if $\xi$ has an $\mathcal{F}$-measurable version, then $E^{\mathcal{F}}\xi = \xi$ $P$-a.s.; in this case we have the "pull out" property $E^{\mathcal{F}}(\xi\eta) = \xi\, E^{\mathcal{F}}\eta$ $P$-a.s., for any $\eta \in \mathcal{L}(\Omega)$. If instead $\mathcal{F} = \{\emptyset, \Omega\}$, or if $\xi$ is independent of $\mathcal{F}$, then $E^{\mathcal{F}}\xi = E\xi$ $P$-a.s. Along these lines, for $A \in \mathcal{A}$ such that $0 < P(A) < 1$, if we take $\mathcal{F} = \{\emptyset, A, A^c, \Omega\}$, then $E^{\mathcal{F}}\xi = E(\xi \mid A) 1_A + E(\xi \mid A^c) 1_{A^c}$ $P$-a.s. Also, if $A \in \mathcal{F}$, $P(A) > 0$, and $A$ has no proper nonempty subset belonging to $\mathcal{F}$, then $E^{\mathcal{F}}\xi|_A = E(\xi \mid A)$ $P$-a.s.
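The third example above, $E^{\mathcal{F}}\xi = E(\xi \mid A)1_A + E(\xi \mid A^c)1_{A^c}$ for $\mathcal{F} = \{\emptyset, A, A^c, \Omega\}$, can be verified exactly on a finite sample space; the weights and the random variable below are illustrative.

```python
from fractions import Fraction as F

# Illustrative finite probability space: Omega = {1,...,6}, P uniform,
# A = {1, 2}, and the sigma-algebra F = {empty, A, A^c, Omega}.
omega = (1, 2, 3, 4, 5, 6)
P = {w: F(1, 6) for w in omega}
xi = {1: F(3), 2: F(5), 3: F(0), 4: F(2), 5: F(7), 6: F(1)}
A = {1, 2}
Ac = set(omega) - A

def cond_on(event):
    """E(xi | event) = (1/P(event)) * integral over the event of xi dP."""
    pe = sum(P[w] for w in event)
    return sum(xi[w] * P[w] for w in event) / pe

# E(xi | F) is constant on A and on A^c.
cond_exp = {w: cond_on(A) if w in A else cond_on(Ac) for w in omega}

# Defining property: integrals over every F in the sigma-algebra agree.
for event in (set(), A, Ac, set(omega)):
    lhs = sum(cond_exp[w] * P[w] for w in event)
    rhs = sum(xi[w] * P[w] for w in event)
    assert lhs == rhs

assert cond_on(A) == F(4)        # (3 + 5) / 2
assert cond_on(Ac) == F(5, 2)    # (0 + 2 + 7 + 1) / 4
```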
We use conditional expectation to define conditional probability; observe that $E 1_A = \int_\Omega 1_A\, dP = P(A)$.

Definition 1.15. The conditional probability given $\mathcal{F}$, denoted $P^{\mathcal{F}}$, is defined by
\[
P^{\mathcal{F}}(A) := E^{\mathcal{F}}(1_A),
\]
for all $A \in \mathcal{A}$.

Note that $P^{\mathcal{F}}$ is not a probability measure; rather, it maps members of $\mathcal{A}$ into $\mathbb{R}$-valued random variables on $\Omega$, with the property that $\int_F P^{\mathcal{F}}(A)\, dP = P(A \cap F)$, for all $F \in \mathcal{F}$.
We will use conditional expectation to define martingales, but first we need some definitions.

Definition 1.16. Given a measurable space $(\Omega, \mathcal{A})$, a family of $\sigma$-algebras $\mathcal{F} := \{\mathcal{F}_t\}_{t \geq 0}$ such that $\mathcal{F}_s \subset \mathcal{F}_t$ for $s \leq t$, with $\mathcal{F}_t \subset \mathcal{A}$ for all $t \geq 0$, is called a filtration of $\mathcal{A}$.

For simplicity we usually just call $\mathcal{F}$ a filtration. Now let $X$ be a continuous-time stochastic process on $(\Omega, \mathcal{A})$ with state space $(S, \mathcal{B})$.

Definition 1.17. We denote by $\sigma(X_s \mid s \leq t)$ the $\sigma$-algebra generated by $(X_s)_{s \leq t}$, that is, the smallest $\sigma$-algebra that contains $X_s^{-1}(B)$ for every $B \in \mathcal{B}$ and $s \leq t$.

We call $\mathcal{F}(X) := (\mathcal{F}_t(X))_{t \in \mathbb{R}_+}$ the filtration generated (or induced) by $X$, where $\mathcal{F}_t(X) = \sigma(X_s \mid s \leq t)$ for each $t$.

We say $X$ is adapted to a filtration $\mathcal{F}$ if $X_t$ is $\mathcal{F}_t$-measurable for all $t$.
Observe that $\mathcal{F}(X)$ is the smallest filtration to which $X$ is adapted. If $\mathcal{F}$ is understood, we may simply say that $X$ is adapted.

A filtration adds more sets (or at least, no fewer sets) as time increases; by increasing the size of a $\sigma$-algebra, we increase the potential for the process to take new values. For example, a measurable $\mathbb{R}$-valued function on $\Omega$ that takes only one value generates only the trivial $\sigma$-algebra $\{\emptyset, \Omega\}$. A measurable function taking two values, say, $f(\omega) = 1$ when $\omega \in A$ and $f(\omega) = 0$ when $\omega \in A^c$, generates the $\sigma$-algebra $\{\emptyset, A, A^c, \Omega\}$. Thus the growth of the filtration describes the "increase of randomness," and the size of $\mathcal{F}_t$ is indicative of the possible deviation of $X_t$ from its expected value.
Now let $X$ have state space $(\mathbb{R}, \mathcal{B})$.

Definition 1.18. We say $X$ is an integrable process, or simply that $X$ is integrable, if $X_t$ is an integrable random variable for each $t$.

Given a filtration $\mathcal{F}$, we say $X$ is a martingale with respect to $\mathcal{F}$ if $X$ is an integrable, adapted process that satisfies, $P$-a.s.,
\[
X_s = E(X_t \mid \mathcal{F}_s),
\]
for all $s \leq t$.
For an example of a martingale, fix $\xi \in \mathcal{L}(\Omega, \mathcal{A}, P)$ and a filtration $\mathcal{F}$. Define a continuous-time process $M$ by
\[
M_t = E(\xi \mid \mathcal{F}_t),
\]
for every $t$. Then $M$ is integrable and adapted to $\mathcal{F}$. For $s \leq t$, we have $\mathcal{F}_s \subset \mathcal{F}_t$, so $E^{\mathcal{F}_s} E^{\mathcal{F}_t} \xi = E^{\mathcal{F}_s} \xi$ $P$-a.s. Thus we have the $P$-a.s. relation
\[
M_s = E(\xi \mid \mathcal{F}_s) = E(E(\xi \mid \mathcal{F}_t) \mid \mathcal{F}_s) = E(M_t \mid \mathcal{F}_s).
\]
Thus, $M$ is a martingale. Intuitively, martingales are "fair games" in the sense that the expected value of one's "winnings" at a later time is exactly the value of the "winnings" at present.
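The martingale $M_t = E(\xi \mid \mathcal{F}_t)$ can be realized exactly on a finite coin-flip space, a discrete stand-in for the continuous-time setting above; the choice of $\xi$ is illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Omega = {-1, +1}^3 with the uniform measure, xi = sum of the three flips,
# and F_t the sigma-algebra generated by the first t flips.
omega = list(product((-1, 1), repeat=3))

def M(t, path):
    """E(xi | F_t) at path: average xi over all outcomes sharing path[:t]."""
    matching = [w for w in omega if w[:t] == path[:t]]
    return sum(F(sum(w)) for w in matching) / len(matching)

# For this xi, conditioning on the first two flips gives the partial sum.
for w in omega:
    assert M(2, w) == F(w[0] + w[1])

# Martingale property: E(M_t | F_s) = M_s for s <= t, verified exactly.
for s, t in ((0, 1), (0, 2), (1, 2), (1, 3), (2, 3)):
    for w in omega:
        matching = [v for v in omega if v[:s] == w[:s]]
        cond = sum(M(t, v) for v in matching) / len(matching)
        assert cond == M(s, w)
```

The inner loop is exactly the tower property $E^{\mathcal{F}_s} E^{\mathcal{F}_t} \xi = E^{\mathcal{F}_s} \xi$, computed by enumeration rather than by appeal to the Radon-Nikodym theorem.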
Next, let $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{G}$ be sub-$\sigma$-algebras of $\mathcal{A}$.

Definition 1.19. The $\sigma$-algebras $\mathcal{F}_1$ and $\mathcal{F}_2$ are called conditionally independent given $\mathcal{G}$, denoted $\mathcal{F}_1 \perp_{\mathcal{G}} \mathcal{F}_2$, if, a.s.,
\[
P^{\mathcal{G}}(F_1 \cap F_2) = P^{\mathcal{G}}(F_1)\, P^{\mathcal{G}}(F_2)
\]
for all $F_1 \in \mathcal{F}_1$, $F_2 \in \mathcal{F}_2$.

We now define Markov processes.

Definition 1.20. For $X$ a continuous-time process on $(\Omega, \mathcal{A})$ and a filtration $\mathcal{F}$ of $\mathcal{A}$, we call $X$ a Markov process if it is adapted to $\mathcal{F}$ and if, for all $s, t \in \mathbb{R}_+$ with $s \leq t$, $\mathcal{F}_s$ and $\sigma(X_t)$ are conditionally independent given $\sigma(X_s)$.

Intuitively, for Markov processes one may think of the past as independent of the future given the present, in the sense that knowing the state $X_s$ makes the future values $X_t$ independent of the "history" $\mathcal{F}_s$.
Markov processes are precisely those processes which are generated by translation-invariant Markovian semigroups of kernels (with respect to the induced filtration; the nontrivial proof can be found in [3, Theorem 42.3]). Since the semigroup property is essential both to dynamical systems and to the construction of a Markov process, one can interpret a Markov process as a randomized dynamical system. As we will see, Markov processes are of value in understanding the dynamics generated by solutions of stochastic differential equations (much like the dynamics of deterministic differential equations).
In the next section, we will motivate the need for Brownian motion and prescribe a special translation-invariant Markovian semigroup of kernels in order to construct it. We will further prove some useful properties of Brownian motion.
1.5 Brownian Motion
We will now prescribe the specific Markov semigroup of kernels (P_t) used to construct Brownian motion. We first motivate our selection of (P_t) by returning to our intuition of how particles undergo random motion. Consider the "drunken sailor" problem, in which a drunken sailor stands at the origin 0 and takes unit-length steps in random directions. After each step, he randomly steps in a different direction. The question is, "Where does he end up after n steps?"
The obvious answer is that we do not know; his position is described by a random variable. He is expected to be where he started, since he has the same chance of going left as right, or forward as backward. But the variance depends directly on the number of steps; he cannot stray far in a small number of steps, for example, so one expects a low variance in that case. So what is the distribution of this random variable?
The key is the Central Limit Theorem; one fairly simple version is in [9, Proposition 5.9], which says that for independent, identically distributed R^d-valued random variables ξ, ξ_1, ξ_2, ... with Eξ = 0 and Eξ^2 = 1, as n → ∞, n^{-1/2} Σ_{k≤n} ξ_k converges in distribution to a standard normally distributed random variable ξ, that is, a normally distributed random variable with mean 0 and variance 1. We may say for brevity that ξ is N(0,1). So, in the drunken sailor problem, the random variable describing where the sailor ends up after a large enough number of steps is approximately normal with mean 0 and variance n (see e.g. [3, pp. 221-226]).
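As an illustrative aside (not part of the formal development), this scaling is easy to check by simulation; the sketch below assumes a one-dimensional walk with ±1 steps:

```python
import random
import statistics

def endpoint(n_steps, rng):
    """Position of a one-dimensional walker after n unit steps in random directions."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))

rng = random.Random(0)
trials = 5_000
for n in (10, 100, 1000):
    samples = [endpoint(n, rng) for _ in range(trials)]
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples)
    # CLT prediction: mean near 0, variance near n
    print(f"n={n:5d}  mean={mean:+.2f}  var={var:7.1f}  (predicted var: {n})")
```

The empirical variance tracks n closely while the empirical mean stays near 0, as the Central Limit Theorem predicts.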
Now, one can think of n as time moving continuously rather than as a discrete number of steps; call it t now. So, imagine a continuous-time stochastic process X having initial distribution δ_x. This represents the initial location of a particle at x, known with probability 1, where the densities (assuming they exist) of X_t "flatten" into a Gaussian curve as time increases, getting successively flatter the more time passes.
Imagine now that the sailor is not drunk, but in a heavy crowd, so that he is pushed around in random directions. This is essentially the same problem, but it makes more sense in a physical interpretation: particles interact with other particles, bumping into other particles which in turn bump into other particles ad infinitum. This model of particle movement is called Brownian motion; it is a stochastic process in which, at each time t, the random variable B_t has distribution N(0, t).
This type of random interference can be thought of as perturbing a trajectory as well, not just a stationary object. For example, if a ball is thrown, one can model its path. But now suppose there is a lot of wind blowing in random directions; where does the ball go? To describe this, we incorporate a "noise term" into the differential equation. Quite sensibly, this term should somehow be based on Brownian motion, which changes the otherwise deterministic trajectory of the ball into a continuous-time stochastic process.
Recall that N(m, t), as a probability measure on (R, B), has (Lebesgue) density
g_{m,t}(x) := (2πt)^{-1/2} e^{-(x-m)^2/(2t)},
and observe that N(0,s) ∗ N(0,t) = N(0, s+t). Define the Brownian convolution semigroup of measures (μ_t) on R^d by setting μ_t equal to the d-fold product measure of N(0,t) on R for each t, that is, μ_t := ⊗_{i=1}^d N(0,t). Then we can define the translation-invariant Markov semigroup of kernels (P_t) by
P_t(x, A) := ∫_A (2πt)^{-d/2} e^{-|y-x|^2/(2t)} dy,
for all t ∈ R_+, x ∈ R^d, A ∈ B^d. After defining our initial condition μ := μ_0 := δ_x, we may construct a process B̃ as in Section 1.3. If we write B̃ = (B̃^1, B̃^2, ..., B̃^d), then B̃^i and B̃^j are independent for all i ≠ j (see [9, Lemma 3.10]).
We now prove that B̃ has a continuous version, using the following result from [3, Theorem 39.3].
Theorem 1.4 (Continuous Paths). For a continuous-time stochastic process X on (Ω, A) with state space (R^d, B^d), if for some positive constants a, b, and C the inequality
E(|X_t - X_s|^a) ≤ C|t - s|^{b+1}     (1.5)
holds for all s, t ∈ R_+, then X has a continuous version.
We use the following lemma to verify (1.5) for B̃.
Lemma 1.1. For B̃ as above,
E(|B̃_t - B̃_s|^4) = d(d+2)(t-s)^2.     (1.6)
Proof. This claim follows from the property that B̃_{t-s} is equal in distribution to (t-s)^{1/2} B̃_1 (called the scaling property) and the following recursion for N(0,1)-distributed R-valued random variables ξ on (Ω, A) (which is easy to prove using integration by parts; see [3, 4.20]):
∫_R x^{2n} g_{0,1}(x) dx = E(ξ^{2n}) = (2n-1) E(ξ^{2n-2}),     (1.7)
for any n ∈ N. Now, to prove (1.6), write (B̃_t - B̃_s)_i for the i-th component of B̃_t - B̃_s, 1 ≤ i ≤ d. Then
E(|B̃_t - B̃_s|^4) = E([(B̃_t - B̃_s)_1^2 + (B̃_t - B̃_s)_2^2 + ··· + (B̃_t - B̃_s)_d^2]^2),
and by stationarity and scaling, the above equals
(t-s)^2 E([Z_1^2 + Z_2^2 + ··· + Z_d^2]^2),
where Z := (Z_1, Z_2, ..., Z_d) is an R^d-valued N(0,1)-distributed random variable. By algebra and independence, the above equals
(t-s)^2 [ Σ_{i=1}^d E(Z_i^4) + Σ_{i<j} 2 E(Z_i^2) E(Z_j^2) ].     (1.8)
Now, E(Z_i^2) = 1 for all 1 ≤ i ≤ d, and by the recursion (1.7), E(Z_i^4) = 3 E(Z_i^2) = 3, so (1.8) becomes (3d + d(d-1))(t-s)^2, which is d(d+2)(t-s)^2, so (1.6) holds.
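As an aside, (1.6) can be sanity-checked by Monte Carlo, using only the fact that the increment B̃_t - B̃_s is N(0, (t-s)I_d)-distributed; a sketch (illustrative, not part of the proof):

```python
import random

def moment4(d, t_minus_s, n_samples, rng):
    """Estimate E|B_t - B_s|^4 by sampling the increment N(0, (t-s) I_d)."""
    sd = t_minus_s ** 0.5
    total = 0.0
    for _ in range(n_samples):
        sq = sum(rng.gauss(0.0, sd) ** 2 for _ in range(d))  # |increment|^2
        total += sq * sq
    return total / n_samples

rng = random.Random(0)
d, dt = 3, 0.5
est = moment4(d, dt, 50_000, rng)
print(round(est, 3), "vs prediction d(d+2)(t-s)^2 =", d * (d + 2) * dt ** 2)
```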
So, it is a simple corollary to select a = 4, b = 1, C = d(d+2) and thus satisfy (1.5), so we indeed have a continuous modification B of B̃.
Definition 1.21. B as above is defined to be a Brownian motion.
B is unique (up to equivalence to another C-canonical process); in another way, we may interpret Brownian motion as a probability measure P_{B_0} (called Wiener measure) on the path space (C(R_+, R^d), B(C(R_+, R^d))).
By construction, Brownian motion is a Markov process; it is easy to see that one-dimensional Brownian motion is also a martingale (with respect to the induced filtration), since, a.s.,
E(B_t | F_s) = E(B_t + B_s - B_s | F_s) = E(B_s | F_s) + E(B_t - B_s | F_s) = B_s + E(B_t - B_s) = B_s,
because B has independent increments, E(B_t - B_s) = E(B_t) - E(B_s) = 0, and E^F(X) = X a.s. when X is an F-measurable random variable. B has stationary increments as well, as we argued in the section on kernels.
Since μ_0 = δ_x, we sometimes write B^x instead of B to emphasize the starting point, and hence we sometimes refer to B^x as a Brownian motion starting at x; if not otherwise stated, we assume the Brownian motion starts at zero. Now we observe that the variance of B^x_t - B^x_s, for s ≤ t, is t - s. Indeed, since B^x_t - B^x_s is equal in distribution to B^x_{t-s} - x,
var(B^x_t - B^x_s) = var(B^x_{t-s} - x) = E[(B^x_{t-s} - x)^2] - (E(B^x_{t-s} - x))^2
= E[(B^x_{t-s})^2 - 2x B^x_{t-s} + x^2] - 0 = E[(B^x_{t-s})^2] - 2x^2 + x^2 = E[(B^x_{t-s})^2] - x^2 = t - s,
since B^x_t has mean x and variance t for any t.
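These increment properties lend themselves to a quick numerical illustration (not part of the text's development); the sketch below builds discretized paths from independent N(0, Δt) increments and estimates the mean and variance of B_t - B_s:

```python
import random
import statistics

def increment_samples(s, t, dt, n_paths, rng):
    """For each path, build B from independent N(0, dt) increments and record B_t - B_s."""
    n_s, n_t = round(s / dt), round(t / dt)
    out = []
    for _ in range(n_paths):
        b = b_s = 0.0
        for k in range(n_t):
            b += rng.gauss(0.0, dt ** 0.5)
            if k + 1 == n_s:
                b_s = b  # remember the path's value at time s
        out.append(b - b_s)
    return out

rng = random.Random(1)
s, t = 0.5, 2.0
samples = increment_samples(s, t, dt=0.01, n_paths=10_000, rng=rng)
mean_hat = statistics.fmean(samples)
var_hat = statistics.pvariance(samples)
print(round(mean_hat, 3), round(var_hat, 3))  # expect mean near 0, variance near t - s = 1.5
```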
In the next chapter, we will see how to integrate with respect to a Brownian motion; this will prove essential to the definition of a stochastic differential equation.
Chapter 2
Ito Integrals and Stochastic Differential Equations
2.1 The Ito Integral
Let (Ω, A, P) be a probability space and X a continuous-time, real-valued stochastic process on Ω. Assuming that the paths of X are differentiable, we can define the time-derivative Ẋ of X by
Ẋ(t, ω) := (d/dt) X_ω(t),
for t ∈ R_+ and ω ∈ Ω. It is easy to see that the mappings Ẋ_t = Ẋ(t, ·) are measurable for all t ∈ R_+, so that Ẋ is a stochastic process. Unfortunately, differentiability of the paths is a very restrictive assumption. For example, the paths of a one-dimensional Brownian motion B on Ω are continuous but nowhere differentiable [10, Theorem 2.9.18]. Thus, the time-derivative Ḃ of B, frequently referred to as "white noise," does not exist in the naïve sense. Nevertheless, "white noise" plays an important role in the theory of stochastic differential equations.
By way of motivation, consider a simple scalar ODE,
ẋ = r(x)x,
for a function x : R_+ → R, where r : R → R is a given, sufficiently smooth function. We can interpret x(t), for t ∈ R_+, as the density of a population at time t, in which case r(x) represents the per-capita growth rate of the population as a function of its density. The growth rate of any real population is subject to random fluctuations; to model these, we would like to add "white noise" to the function r. On a purely formal level, this idea leads to a "stochastic differential equation" of the form
Ẋ = (r(X) + W)X,     (2.1)
where W = Ḃ is the (formal) time-derivative of a Brownian motion B. The "solutions" of this "stochastic differential equation" should, of course, be continuous-time stochastic processes X rather than functions x : R_+ → R. Since already the "antiderivatives" of W (one-dimensional Brownian motions) have nowhere differentiable paths, we cannot hope to find stochastic processes X that satisfy (2.1) in the naïve sense, that is,
(d/dt) X_ω(t) = (r(X_ω(t)) + W_ω(t)) X_ω(t)     (2.2)
for all t ∈ R_+ and ω ∈ Ω, where (Ω, A, P) is the underlying probability space; instead, we have to develop a notion of "weak" or "generalized" solutions of (2.1). The first step, still on a purely formal level, is to rewrite (2.2) as an "integral equation,"
X_ω(t) = X_ω(0) + ∫_0^t (r(X_ω(s)) + W_ω(s)) X_ω(s) ds,     (2.3)
for t ∈ R_+ and ω ∈ Ω. The most problematic term in (2.3) is, of course, the one involving W (the formal time-derivative of B). This raises the question of how to make sense of integrals of the form ∫_0^t X_ω(s) W_ω(s) ds, where t ∈ R_+, W = Ḃ for some one-dimensional Brownian motion B on Ω, and X is a continuous-time, real-valued process on Ω. Note that, formally,
∫_0^t X_ω(s) W_ω(s) ds = ∫_0^t X_ω(s) Ḃ_ω(s) ds = ∫_0^t X_ω(s) dB_ω(s)
for all t ∈ R_+ and ω ∈ Ω. The integral on the right appears to be a Riemann-Stieltjes integral involving the real-valued functions X_ω and B_ω, but unfortunately, the paths of B are not of bounded variation on compact subintervals of R_+ [9, Corollary 13.10]. Thus, the integral does not exist, in general, in the classical Riemann-Stieltjes sense, no matter what assumptions we make about the process X. Nevertheless, it is possible to rigorously define the integral
(I_t X)(ω) = ∫_0^t X_ω(s) dB_ω(s),
for t ∈ R_+, ω ∈ Ω, and a reasonably large class of continuous-time, real-valued processes X on Ω, in such a way that I_t X is measurable for every t ∈ R_+. The process Y := (I_t X)_{t∈R_+} then qualifies as a weak or generalized antiderivative of WX (that is, a solution of the "stochastic differential equation" Ẏ = WX). In fact, there are several ways of doing this. Our definition will be based on the use of left Riemann-Stieltjes sums and leads to the so-called Ito integral. Other choices are possible; for example, the use of mid-point Riemann-Stieltjes sums leads to the so-called Stratonovich integral [10, p. 350].
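The difference between the two choices of evaluation point can be seen numerically. The sketch below (illustrative only) approximates ∫_0^T B dB on a single simulated path: the left sums converge to the Ito value (B_T^2 - T)/2, while the trapezoidal sums (averaging the endpoint values, which has the same limit as the mid-point choice) telescope to the Stratonovich value B_T^2/2:

```python
import random

def bm_path(T, n, rng):
    """Discretized Brownian path: n independent N(0, T/n) increments."""
    dt = T / n
    b = [0.0]
    for _ in range(n):
        b.append(b[-1] + rng.gauss(0.0, dt ** 0.5))
    return b

rng = random.Random(7)
T, n = 1.0, 100_000
b = bm_path(T, n, rng)

# Left (Ito-style) and trapezoidal (Stratonovich-style) Riemann-Stieltjes sums for ∫ B dB.
left = sum(b[j] * (b[j + 1] - b[j]) for j in range(n))
trap = sum(0.5 * (b[j] + b[j + 1]) * (b[j + 1] - b[j]) for j in range(n))

print("left:", round(left, 4), " target (B_T^2 - T)/2 =", round((b[-1] ** 2 - T) / 2, 4))
print("trap:", round(trap, 4), " target  B_T^2 / 2    =", round(b[-1] ** 2 / 2, 4))
```

The gap between the two sums is half the quadratic variation of the path, which tends to T/2 as the mesh shrinks.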
For all of the following, suppose that B is a one-dimensional Brownian motion on Ω and that X is a continuous-time, real-valued stochastic process on Ω. Also, suppose that X is adapted to the filtration F(B). As discussed in the section on conditional probabilities, this has the interpretation that the random variable X_t, for t ∈ R_+, is "no more random" than the Brownian motion B up to time t, certainly a reasonable assumption if we think of X as the solution of a "stochastic differential equation" whose randomness is produced by B. (For now, we ignore the effect on the solution X of a "random initial condition" X_0 = X⁰ a.s., where X⁰ is a given random variable on Ω.)
Now fix a, b ∈ R_+ with a < b. We wish to define the Ito integral
(I_{a,b}X)(ω) = ∫_a^b X_ω(t) dB_ω(t),
for ω ∈ Ω, under suitable additional assumptions on X. To that end, endow the interval [a,b] with the Borel σ-algebra B[a,b] and the Lebesgue-Borel measure λ[a,b]. The Cartesian product [a,b] × Ω is then naturally endowed with the product σ-algebra B[a,b] ⊗ A and the product measure λ[a,b] ⊗ P. Given any filtration F of A, let L^p_F([a,b] × Ω), for p ≥ 1, denote the set of all (equivalence classes of) F-adapted functions in L^p([a,b] × Ω) (that is, functions Y ∈ L^p([a,b] × Ω) such that Y_t = Y(t, ·) is F_t-measurable for every t ∈ [a,b]); we are most interested in L^2_F([a,b] × Ω).
Lemma 2.1. For any filtration F of A, L^2_F([a,b] × Ω) is a closed linear subspace of L^2([a,b] × Ω).
Proof. That L^2_F([a,b] × Ω) is closed is the only nonobvious part; to see this, let (Y_n)_{n∈N} be a sequence in L^2_F([a,b] × Ω), let Y ∈ L^2([a,b] × Ω), and let Y_n → Y in L^2([a,b] × Ω). Then there is a subsequence (Y_{k_n})_{n∈N} of (Y_n)_{n∈N} that converges to Y pointwise almost everywhere. Modifying the functions Y_{k_n} on a set of measure zero if necessary, we may assume that Y_{k_n}(t, ω) → Y(t, ω) for all t ∈ [a,b] and ω ∈ Ω. But then, for every t ∈ [a,b], Y(t, ·) is the pointwise limit of the F_t-measurable functions Y_{k_n}(t, ·), and is thus F_t-measurable; that is, Y ∈ L^2_F([a,b] × Ω).
Definition 2.1. We call a measurable function Y on [a,b] × Ω simple if it can be written in the form
Y(t, ω) = Σ_{j=1}^n Y_j(ω) 1_{[t_{j-1}, t_j)}(t),
for t ∈ [a,b] and ω ∈ Ω, where n is a positive integer, (t_j)_{j=0}^n is a partition of the interval [a,b], and (Y_j)_{j=1}^n is a sequence of measurable functions on Ω.
Given a filtration F of A, such a Y belongs to L^2_F([a,b] × Ω) if and only if Y_j is square-integrable and F_{t_{j-1}}-measurable for every j ∈ {1, ..., n}. The set of all simple functions in L^2_F([a,b] × Ω) can be shown to be dense in L^2_F([a,b] × Ω) [12, pp. 18-20].
It is obvious how to define the Ito integral I_{a,b}X if X ∈ L^2_F([a,b] × Ω) is simple; given a representation of the form X(t, ω) = Σ_{j=1}^n X_j(ω) 1_{[t_{j-1}, t_j)}(t), for t ∈ [a,b] and ω ∈ Ω, with n ∈ N, (t_j)_{j=0}^n a partition of [a,b], and (X_j)_{j=1}^n ∈ L^2(Ω)^n with X_j F_{t_{j-1}}-measurable for all j ∈ {1, 2, ..., n}, we let
(I_{a,b}X)(ω) := Σ_{j=1}^n X_j(ω) (B_ω(t_j) - B_ω(t_{j-1})),
for ω ∈ Ω. The sum on the right-hand side is independent of the representation of X and coincides with the left Riemann-Stieltjes sum of X_ω with respect to B_ω for any partition of [a,b] that is a refinement of the partition (t_j)_{j=0}^n.
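The formula for simple integrands is straightforward to transcribe; a minimal sketch for one fixed ω, with hypothetical partition data:

```python
def simple_ito_integral(x_vals, b_vals):
    """Ito integral of a simple process for one fixed ω:
    x_vals[j] is the value of X on [t_j, t_{j+1});
    b_vals[j] is B_ω(t_j) at the partition points t_0 < ... < t_n."""
    assert len(b_vals) == len(x_vals) + 1
    return sum(x * (b_vals[j + 1] - b_vals[j]) for j, x in enumerate(x_vals))

# Example: X ≡ 1 integrates to the total increment B_ω(t_n) - B_ω(t_0).
print(simple_ito_integral([1.0, 1.0, 1.0], [0.0, 0.5, -0.25, 0.75]))  # → 0.75
```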
Theorem 2.1 (Ito Isometry for Simple Functions). Let F = F(B) and let X ∈ L^2_F([a,b] × Ω) be simple. Then I_{a,b}X ∈ L^2(Ω), with
‖I_{a,b}X‖^2_{L^2(Ω)} = Σ_{j=1}^n (t_j - t_{j-1}) ‖X_j‖^2_{L^2(Ω)} = ‖X‖^2_{L^2([a,b]×Ω)}.
Proof. First note that
‖I_{a,b}X‖^2_{L^2(Ω)} = ∫_Ω ( Σ_{j=1}^n X_j (B_{t_j} - B_{t_{j-1}}) )^2 dP
= Σ_{i,j=1}^n ∫_Ω X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}}) dP
= Σ_{i,j=1}^n E( X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}}) ),
where E denotes expectation with respect to P. Next, realize that i ≠ j (say, without loss of generality, i < j) implies
E( X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}}) ) = 0.     (2.4)
This is because X_i, X_j, and B_{t_i} - B_{t_{i-1}} are F_{t_{j-1}}-measurable and because of independent increments (so that E^{F_{t_{j-1}}}(B_{t_j} - B_{t_{j-1}}) = E(B_{t_j} - B_{t_{j-1}})). Therefore, by the definition of conditional probability and use of the "pull out" property (see the section on conditional probability), we have
E( X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}}) )
= E( E^{F_{t_{j-1}}}( X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}}) ) )
= E( X_i X_j (B_{t_i} - B_{t_{i-1}}) E^{F_{t_{j-1}}}(B_{t_j} - B_{t_{j-1}}) )
= E( X_i X_j (B_{t_i} - B_{t_{i-1}}) E(B_{t_j} - B_{t_{j-1}}) )
= E( X_i X_j (B_{t_i} - B_{t_{i-1}}) ) E(B_{t_j} - B_{t_{j-1}}).
Since E(B_{t_j} - B_{t_{j-1}}) = 0, we have shown (2.4).
Next, when i = j,
E( X_i^2 (B_{t_i} - B_{t_{i-1}})^2 ) = E(X_i^2)(t_i - t_{i-1}),     (2.5)
since, by the same argument used to show (2.4),
E( X_i^2 (B_{t_i} - B_{t_{i-1}})^2 ) = E(X_i^2) E( (B_{t_i} - B_{t_{i-1}})^2 ),
and since B has stationary increments,
E( (B_{t_i} - B_{t_{i-1}})^2 ) = E( B_{t_i - t_{i-1}}^2 ) = var( B_{t_i - t_{i-1}} ) = t_i - t_{i-1}.
Finally, by combining (2.4) and (2.5), we see that
‖I_{a,b}X‖^2_{L^2(Ω)} = Σ_{i,j=1}^n E( X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}}) ) = Σ_{i=1}^n E(X_i^2)(t_i - t_{i-1}) = ‖X‖^2_{L^2([a,b]×Ω)}.
Therefore, I_{a,b} is a (linear) isometry from a dense (linear) subspace of L^2_F([a,b] × Ω) into L^2(Ω); as such, it has a unique extension to a linear isometry I_{a,b} : L^2_F([a,b] × Ω) → L^2(Ω). This defines the Ito integral I_{a,b}X for every X ∈ L^2_F([a,b] × Ω), and we have the Ito isometry,
‖I_{a,b}X‖_{L^2(Ω)} = ‖X‖_{L^2([a,b]×Ω)}.
We will use the symbol ∫_a^b X_t dB_t to denote the Ito integral I_{a,b}X. Stated rigorously:
Definition 2.2. Let F := F(B). For every X ∈ L^2_F([a,b] × Ω), the Ito integral ∫_a^b X_t dB_t exists and is defined by
∫_a^b X(t, ω) dB_t(ω) := lim_{n→∞} ∫_a^b Y_n(t, ω) dB_t(ω)
(convergence in L^2(Ω)), where (Y_n)_{n∈N} is any sequence of simple functions that approach X in L^2_F([a,b] × Ω).
Note that, by Fubini's theorem, ∫_a^b X_t^2 dt is an integrable function on Ω, with
∫_Ω ( ∫_a^b X_t^2 dt ) dP = ‖X‖^2_{L^2([a,b]×Ω)}.
This allows us to write the Ito isometry in the form
E( ( ∫_a^b X_t dB_t )^2 ) = E( ∫_a^b X_t^2 dt ).
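As an illustrative Monte Carlo check of this identity (not part of the construction), take X_t = B_t, which is F(B)-adapted and lies in L^2_F([0,T] × Ω); approximating the Ito integral by left sums on each simulated path, both sides come out near T^2/2:

```python
import random
import statistics

def left_sums(T, n, rng):
    """One path: left-sum approximations of ∫_0^T B dB and ∫_0^T B^2 dt."""
    dt = T / n
    b = ito = time_int = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, dt ** 0.5)
        ito += b * db            # integrand evaluated at the left endpoint
        time_int += b * b * dt
        b += db
    return ito, time_int

rng = random.Random(3)
T, n, paths = 1.0, 200, 20_000
results = [left_sums(T, n, rng) for _ in range(paths)]
lhs = statistics.fmean(i * i for i, _ in results)  # estimates E[(∫ B dB)^2]
rhs = statistics.fmean(t for _, t in results)      # estimates E[∫ B^2 dt] = T^2/2
print(round(lhs, 3), round(rhs, 3))  # both near 0.5
```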
Now, if X is a continuous-time, real-valued process on Ω such that X ∈ L^2_F([0,t] × Ω) for every t ∈ R_+, then I_t X = ∫_0^t X_s dB_s is defined for every t ∈ R_+, and IX := (I_t X)_{t∈R_+} is a stochastic process. It can be shown that IX is a martingale with respect to F and, as a consequence, has a modification with continuous paths (see [12, pp. 22-26] for more details). In the future, we will assume without saying that IX has continuous paths.
Definition 2.2 is enough to make sense of ∫_0^t X_s W_s ds = ∫_0^t X_s dB_s on the right-hand side of (2.3), provided that X ∈ L^2_F([0,t] × Ω). This condition needs to be part of the notion of a "solution" of equation (2.3). As discussed earlier, F = F(B)-adaptedness of X is a reasonable requirement as long as the "randomness" of X is "caused" solely by B.
Now, instead of solving just one scalar "stochastic differential equation," we would like to solve coupled systems of such equations. By way of motivation, consider the system
Ẋ^i = (r_i(X) + W^i) X^i,  1 ≤ i ≤ n,     (2.6)
for an R^n-valued process X = (X^1, X^2, ..., X^n), where r = (r_1, r_2, ..., r_n) : R^n → R^n is a sufficiently smooth vector field, B := (B^1, B^2, ..., B^n) is an n-dimensional Brownian motion, and W = (W^1, W^2, ..., W^n) = (Ḃ^1, Ḃ^2, ..., Ḃ^n). In integral form, equation (2.6) reads
X^i_t = X^i_0 + ∫_0^t r_i(X_s) X^i_s ds + ∫_0^t X^i_s dB^i_s,  1 ≤ i ≤ n.     (2.7)
Using Definition 2.2, the integral on the far right would make sense if we could assume that X^i ∈ L^2_{F(B^i)}([0,t] × Ω). Unfortunately, this is not a reasonable assumption: due to the coupling of the equations, X^i is affected by all components of B; thus X^i should be F(B)-adapted, but cannot be expected to be F(B^i)-adapted! Luckily, the assumption F = F(B) in Theorem 2.1 and Definition 2.2 (where B is a one-dimensional Brownian motion) can be relaxed: it is enough to assume that F is a filtration of A such that B is a martingale with respect to F.
Under this assumption (clearly satisfied if F = F(B)), the proof of the Ito isometry for simple functions (Theorem 2.1) still goes through (note that E^{F_{t_{j-1}}}(B^i_{t_j} - B^i_{t_{j-1}}) = 0 for 1 ≤ j ≤ n), and then so does the entire construction, culminating in Definition 2.2 (for more details, see [12, p. 24]).
Now, if B = (B^1, B^2, ..., B^n) is an n-dimensional Brownian motion, then each component B^i is a martingale with respect to F = F(B). This is true since B^i_t - B^i_{t-c} is independent of F_{t-c} for c ∈ (0, t), which means
E(B^i_t | F_{t-c}) = E(B^i_t - B^i_{t-c} + B^i_{t-c} | F_{t-c}) = 0 + B^i_{t-c}.     (2.8)
As a consequence, the integral on the far right of equation (2.7) is defined if X^i ∈ L^2_{F(B)}([0,t] × Ω) for 1 ≤ i ≤ n, as desired.
Note that, in vector notation, the system can be written as
Ẋ = U(X) + V(X)W,     (2.9)
where U(X) = (r_1(X)X^1, r_2(X)X^2, ..., r_n(X)X^n) and V(X) is the diagonal n × n matrix whose diagonal entries are X^1, ..., X^n. Of course, we would like to consider more general systems of the form (2.9), with arbitrary (sufficiently smooth) functions U : R^n → R^n and V : R^n → R^{n×n}. Also, we would like to allow for the possibility that only some of the equations are affected by white noise, say, the first d equations, where 1 ≤ d ≤ n. In this case,
W = (W^1, ..., W^d, 0, ..., 0) = (Ḃ^1, ..., Ḃ^d, 0, ..., 0),
where B is a d-dimensional Brownian motion. Only the first d columns of V are then relevant, and we may as well assume that V : R^n → R^{n×d}.
Under these assumptions, the integral version of equation (2.9) is
X_t = X_0 + ∫_0^t U(X_s) ds + ∫_0^t V(X_s) dB_s.     (2.10)
Of course, the integrals are understood "componentwise," that is,
∫_0^t U(X_s) ds = ( ∫_0^t U^i(X_s) ds )_{i=1}^n,
∫_0^t V(X_s) dB_s = ( ∫_0^t Σ_{j=1}^d V^{ij}(X_s) dB^j_s )_{i=1}^n = ( Σ_{j=1}^d ∫_0^t V^{ij}(X_s) dB^j_s )_{i=1}^n.
The second integral is well defined, provided that V^{ij}(X) ∈ L^2_F([0,t] × Ω) for all 1 ≤ i ≤ n, 1 ≤ j ≤ d, where F is a filtration of A such that each component of B is a martingale with respect to F. Note that if X is F-adapted and V is continuous, then V(X) is F-adapted. This motivates the following definition.
Definition 2.3. Let (Ω, A, P) be a probability space, n ∈ N, d ∈ {1, ..., n}. Let B be a d-dimensional Brownian motion on Ω and F a filtration of A such that each component of B is a martingale with respect to F. Let U be an R^n-valued process on Ω and V an R^{n×d}-valued process on Ω such that U^i ∈ L^2_F([0,t] × Ω) and V^{ij} ∈ L^2_F([0,t] × Ω) for all 1 ≤ i ≤ n, 1 ≤ j ≤ d, t ∈ R_+.
If X_0 is an R^n-valued random variable on Ω, the process X, defined by
X_t = X_0 + ∫_0^t U_s ds + ∫_0^t V_s dB_s,     (2.11)
for t ∈ R_+, is called a stochastic integral generated by (U, V). The set of all stochastic integrals generated by (U, V) is denoted by
∫ U_t dt + ∫ V_t dB_t;     (2.12)
with slight abuse of language, we call this the stochastic integral generated by (U, V).
Formally, the process X defined by (2.11) is an "antiderivative" of U + VḂ, that is, a solution of the "stochastic differential equation"
Ẋ = U + VḂ,     (2.13)
or, in differential notation,
dX_t = U_t dt + V_t dB_t.     (2.14)
In the same sense, the stochastic integral (2.12) is the set of all "antiderivatives" (the "indefinite integral") of U + VḂ, that is, the "general solution" of (2.13)/(2.14). The formal expression U_t dt + V_t dB_t is called the stochastic differential generated by (U, V).
We note that the assumptions on V in Definition 2.3 are needed to guarantee the existence of the second integral in (2.11). They also guarantee that the process (∫_0^t V_s dB_s)_{t∈R_+} is F-adapted and square-integrable in the sense that ∫_0^t V_s dB_s ∈ L^2(Ω) for all t ∈ R_+. The assumptions on U are stronger than necessary to guarantee the existence of the first integral in (2.11); in fact, U^i ∈ L^1([0,t] × Ω) for all 1 ≤ i ≤ n and t ∈ R_+ would be sufficient. However, the stronger assumptions on U guarantee that the process (∫_0^t U_s ds)_{t∈R_+} is F-adapted and square-integrable in the sense that ∫_0^t U_s ds ∈ L^2(Ω) for all t ∈ R_+. Indeed, we have the following lemma.
Lemma 2.2. Under the assumptions of Definition 2.3, the processes (∫_0^t U_s ds)_{t∈R_+} and (∫_0^t V_s dB_s)_{t∈R_+} are well-defined, F-adapted, and square-integrable in the sense that ∫_0^t U_s ds, ∫_0^t V_s dB_s ∈ L^2(Ω) for all t ∈ R_+.
Corollary 2.1. Assume the hypotheses of Definition 2.3 with F = F(B) and let X_0 ∈ L^2(Ω). Then the process X defined by (2.11) is F(B, X_0)-adapted and square-integrable in the sense that X_t ∈ L^2(Ω) for all t ∈ R_+.
Let us return to the integral equation (2.10), that is, the integral version of the "stochastic differential equation"
dX_t = U(X_t) dt + V(X_t) dB_t.     (2.15)
It is natural to seek a solution X of (2.15) subject to an initial condition of the form
X_0 = X⁰ a.s.,     (2.16)
where X⁰ is a given R^n-valued random variable on Ω. The integral equation corresponding to (2.15)/(2.16) reads
X_t = X⁰ + ∫_0^t U(X_s) ds + ∫_0^t V(X_s) dB_s.     (2.17)
Due to the random initial condition, a solution of (2.17) cannot be expected to be F(B)-adapted, but should be F(B, X⁰)-adapted. The same would then hold for U(X) and V(X). However, under this assumption, the Ito integral in (2.17) is defined only if each component of B is a martingale with respect to F(B, X⁰). This is the case if X⁰ and B are independent; this follows from an argument similar to (2.8) and is reasonable intuitively, as we expect that the randomness of the initial condition should have nothing to do with an arbitrarily given Brownian motion. Along these lines, note that if X were only F(B)-adapted, then X⁰ being independent of B would force X⁰ to be a.s. constant!
The above consideration motivates the following version of Corollary 2.1.
Corollary 2.2. Assume the hypotheses of Definition 2.3 with F := F(B, X⁰), where X⁰ ∈ L^2(Ω) is independent of B. Then the process X defined by (2.11) is F(B, X⁰)-adapted and square-integrable in the sense that X_t ∈ L^2(Ω) for all t ∈ R_+.
Now we move to the next section, where we formally define stochastic differential equations, define the solution of a stochastic differential equation, and discuss the existence and uniqueness of solutions.
2.2 Stochastic Differential Equations and their Solutions
As we discussed in the previous section, (2.13) or (2.14) is a stochastic analog of the deterministic antidifferentiation problem dx = f(t) dt or ẋ = f(t), where f : R_+ → R^n is a given, sufficiently regular function. To arrive at the stochastic analog of dx = f(t, x) dt or ẋ = f(t, x), where f : R_+ × R^n → R^n is a given, sufficiently regular function, we need to discuss the composition of stochastic processes.
Let (Ω, A), (S, S), and (S′, S′) be measurable spaces, let H be a continuous-time S-valued stochastic process on (Ω, A), and let G be a continuous-time S′-valued stochastic process on (S, S). We now (with slight abuse of notation) define the composition of G with H, denoted by G ∘ H.
Definition 2.4. For G and H as above, we define the composition G ∘ H to be the process defined by (G ∘ H)_t := G_t ∘ H_t, for all t ∈ R_+.
In this way, G ∘ H is a continuous-time S′-valued stochastic process on (Ω, A).
Now, if X is a stochastic integral, then X is a continuous-time R^n-valued stochastic process on (Ω, A, P). So, take U : R_+ × R^n → R^n to be measurable with respect to the second variable (so that U is a continuous-time R^n-valued stochastic process on (R^n, B^n)). Then U ∘ X is an R^n-valued process on Ω, and we have
(U ∘ X)(t, ω) = (U_t ∘ X_t)(ω) = U_t(X_t(ω)) = U_t(X(t, ω)) = U(t, X(t, ω)),
for all t ∈ R_+ and ω ∈ Ω.
Similarly, take V : R_+ × R^n → R^{n×d} to be measurable with respect to the second variable. Then V ∘ X is an R^{n×d}-valued process on Ω, and, at least formally, we can consider the stochastic differential equation
dX_t = U(t, X_t) dt + V(t, X_t) dB_t,     (2.18)
or the equivalent integral equation
X_t = X_0 + ∫_0^t U(s, X_s) ds + ∫_0^t V(s, X_s) dB_s.     (2.19)
This motivates the definition of a solution to a stochastic differential equation.
Definition 2.5. We say X is a solution to (2.18) if X is a continuous-time R^n-valued process on Ω such that (U ∘ X, V ∘ X) satisfies the hypotheses of Definition 2.3 and X satisfies (2.19) for all t ∈ R_+.
We remark that this definition makes sense due to Lemma 2.2. We will soon give particular conditions on U and V that guarantee that a unique solution to (2.18) exists. For now, assume that U and V are regular enough for (2.18) to make sense.
We can now rigorously impose an initial condition on (2.18), state the definition of the stochastic initial value problem, and define the notion of a solution.
Let X be a solution to (2.18) as above, and suppose we are given an R^n-valued random variable X⁰ = (X⁰_1, X⁰_2, ..., X⁰_n) such that X⁰_i ∈ L^2(Ω) for each 1 ≤ i ≤ n and such that X⁰ is independent of B. Recalling the argument preceding Corollary 2.2, if X_0 = X⁰ a.s., then we specify F to be F(X⁰, B), where F(X⁰, B) = {F_t(X⁰, B)}_{t∈R_+} and where F_t(X⁰, B) is the σ-algebra generated by X⁰ and {B_s | s ≤ t}, for every t ∈ R_+. Motivated by this and Corollary 2.2, the following definition is justified.
Definition 2.6. Given X⁰ ∈ L^2(Ω), independent of B, we call
dX_t = U(t, X_t) dt + V(t, X_t) dB_t,  X_0 = X⁰ a.s.,     (2.20)
a (strong) stochastic initial value problem, and we say X is a (strong) solution to (2.20) if X is a solution to dX_t = U(t, X_t) dt + V(t, X_t) dB_t in the sense of Definition 2.5 and satisfies X_0 = X⁰ a.s.
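Although numerics are not discussed in the text, problems of the form (2.20) are commonly approximated by the Euler-Maruyama scheme, which marches the integral equation forward in small time steps; the scalar sketch below applies it to the population model (2.1) with the hypothetical choice r(x) = 1 - x:

```python
import random

def euler_maruyama(U, V, x0, T, n, rng):
    """Approximate the scalar SDE dX = U(t, X) dt + V(t, X) dB, X_0 = x0."""
    dt = T / n
    x, path = x0, [x0]
    for k in range(n):
        dB = rng.gauss(0.0, dt ** 0.5)       # Brownian increment, N(0, dt)
        x = x + U(k * dt, x) * dt + V(k * dt, x) * dB
        path.append(x)
    return path

# Hypothetical choice r(x) = 1 - x, giving dX = (1 - X) X dt + X dB as in (2.1).
rng = random.Random(42)
path = euler_maruyama(lambda t, x: (1 - x) * x,  # drift U(t, x) = r(x) x
                      lambda t, x: x,            # noise coefficient V(t, x) = x
                      x0=0.1, T=5.0, n=5000, rng=rng)
print(len(path), round(path[-1], 3))
```

Under the growth and Lipschitz conditions of Theorem 2.2 below, such discretizations are known to converge to the strong solution as the step size shrinks.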
In Problem (2.20) the Brownian motion B is given in advance, and we seek a solution X. A "weak" version of (2.20) would require that, along with X, we find a Brownian motion B on a probability space (Ω, A, P) and a filtration F such that each component of B is a martingale with respect to F. Then, since (Ω, A, P) is not given, we cannot impose an initial condition as in (2.20), but we can impose an initial distribution μ. This leads to the following definition.
Definition 2.7. We call
dX_t = U(t, X_t) dt + V(t, X_t) dB_t,  P_{X_0} = μ,     (2.21)
a weak stochastic initial value problem, and we say (X, B, F) is a weak solution to (2.21) if B is an R^d-valued Brownian motion such that B^i is a martingale with respect to F for all 1 ≤ i ≤ d, and X is a solution to dX_t = U(t, X_t) dt + V(t, X_t) dB_t in the sense of Definition 2.5 and satisfies P_{X_0} = μ.
Clearly, a strong initial value problem induces a weak initial value problem (by replacing the given initial condition with its distribution and then removing the given probability space and Brownian motion); if that strong initial value problem has a solution, then clearly the induced weak initial value problem must also have a solution. Also, if a weak initial value problem has a solution, then there is at least one associated strong initial value problem (obtained by taking the Brownian motion B in the weak problem's solution as the given Brownian motion in the strong problem and constructing a random variable X⁰ over B's accompanying probability space such that X⁰ has distribution μ). Further, the existence of a weak solution X to a weak initial value problem does not necessarily imply that X is a strong solution to the induced strong initial value problem; this is believable simply because X may be adapted to some filtration F but not to F(X⁰, B).
From a modeling perspective, we really have weak problems (as no one can realistically present up front the specific representation of the white noise involved). Weak solutions are also useful because there are examples of weak initial value problems which have a weak solution but no strong solution (see [10, pp. 301-302]). We drop the adjective weak or strong when there is no ambiguity.
Along these lines, there are also strong and weak notions of uniqueness.
Definition 2.8. We say that the strong initial value problem (2.20) has the strong uniqueness property if any two solutions X and X̃ are modifications of each other.
For convenience we often say that X is a strongly unique solution, or that X is strongly unique. Strong uniqueness is often called pathwise uniqueness.
Definition 2.9. We say that the weak initial value problem (2.21) has the weak uniqueness property if any two solutions (X, B, F) and (X̃, B̃, F̃) are equivalent in the sense that P_X = P_{X̃}.
Again, for convenience, we often say that X is a weakly unique solution or that X is weakly unique, and since we may identify any weak solution with its (unique) distribution, we sometimes call P_X the weak solution. Weak uniqueness is often called uniqueness in distribution.
Analogously, we can have an initial value problem starting at any time s > 0.
Now we have the ingredients to present an existence and uniqueness theorem, which gives us at least one way to place conditions on U and V guaranteeing that the situation of Definition 2.5 is satisfied (one proof can be found in [12, Theorem 5.5]).
Theorem 2.2 (Existence/Uniqueness). Let (Ω, A, P) be a probability space, n ∈ N, d ∈ {1, ..., n}, B a d-dimensional Brownian motion on Ω, U : R_+ × R^n → R^n and V : R_+ × R^n → R^{n×d} measurable functions, X⁰ ∈ L^2(Ω), X⁰ independent of B, and F = F(B, X⁰). Assume that there exist positive constants C and D such that, for all t ∈ R_+ and x, y ∈ R^n,
|U(t,x)| + |V(t,x)| ≤ C(1 + |x|),     (2.22)
where |·| is the Euclidean norm, and
|U(t,x) - U(t,y)| + |V(t,x) - V(t,y)| ≤ D|x - y|.     (2.23)
Then the initial value problem (2.20) has a strongly unique strong solution X.
Before proving this important theorem, we remark that (2.22) is imposed to prevent X from exploding, i.e., to rule out a finite time T_0 such that P(lim_{t→T_0} |X_t(·)| = ∞) > 0 (see [9, Lemma 21.6]), while (2.23) ensures uniqueness. Compare this to the deterministic case, where an at most linear growth estimate ensures that solutions do not explode (see [1, Theorem 7.6]) and a Lipschitz condition guarantees uniqueness. The idea of the proof is similar to the deterministic case; let us only consider the scalar case.
Proof. First, we show uniqueness. Suppose two solutions $X$ and $Y$ exist, having initial values $X_0$ and $Y_0$, respectively. Then we can estimate $E|X_t - Y_t|^2$ for a fixed $t$ by using the inequality $(x+y+z)^2 \le 3(x^2+y^2+z^2)$, the Itô isometry, the Cauchy-Schwarz inequality, and (2.23), so that Gronwall's inequality applies. This yields an inequality of the form
\[
E|X_t - Y_t|^2 \le 3E|X_0 - Y_0|^2 e^{Kt},
\]
where $K$ is a constant depending only on $D$ and $T$. Assuming $X_0 = Y_0$ then implies that $P(|X_t - Y_t| = 0) = 1$ (recall $t$ is fixed). We can repeat this argument for all rational $t$ and then use that stochastic integrals are continuous in time to obtain $P(\sup_{t\in[0,T]}|X_t - Y_t| = 0) = 1$, which means $X$ and $Y$ are modifications of each other. This shows the strong uniqueness.

To show existence, first define the iterations $Y^{(0)}_t := X_0$ and
\[
Y^{(k+1)}_t := X_0 + \int_0^t U(s,Y^{(k)}_s)\,ds + \int_0^t V(s,Y^{(k)}_s)\,dB_s, \tag{2.24}
\]
for $k \in \mathbb Z_+$. We claim that (2.24) is well defined; first note that $Y^{(k)}_t$ is $\mathcal F_t(X_0,B)$-measurable for each $k \in \mathbb Z_+$ and for all $t \in [0,T]$. Next, we have by a calculation similar to that in the uniqueness proof, by (2.22), and by Fubini, that
\[
E|Y^{(1)}_t - Y^{(0)}_t|^2 \le 2C^2(t + t^2)(1 + E|X_0|^2) \le L_1 t, \tag{2.25}
\]
where $L_1$ depends only on $C$, $T$, and $E|X_0|^2$. Therefore (2.24) makes sense for $k = 1$. One can proceed by induction to show that (2.24) makes sense for all $k$; we can estimate
$E|Y^{(2)}_t - Y^{(1)}_t|^2$ similarly and use (2.23) to obtain an inequality of the form
\[
E|Y^{(2)}_t - Y^{(1)}_t|^2 \le 3(1+T)D^2\int_0^t E|Y^{(1)}_s - Y^{(0)}_s|^2\,ds \le 3(1+T)D^2\int_0^t L_1 s\,ds \le L_2\frac{t^2}{2}, \tag{2.26}
\]
where $L_2$ depends only on $C$, $D$, $T$, and $E|X_0|^2$. Iterating this, we can estimate $E|Y^{(k+1)}_t - Y^{(k)}_t|^2$ similarly:
\[
E|Y^{(k+1)}_t - Y^{(k)}_t|^2 \le 3(1+T)D^2\int_0^t E|Y^{(k)}_s - Y^{(k-1)}_s|^2\,ds \le L_{k+1}\frac{t^{k+1}}{(k+1)!}, \tag{2.27}
\]
where $L_{k+1}$ is a constant depending only on $T$, $C$, $D$, and $E|X_0|^2$; in fact, $L_{k+1} = L_1(3(1+T)D^2)^k$ for $k \in \mathbb Z_+$. Since $t \le T$, this inequality shows that $\{Y^{(k)}_t\}_{k\in\mathbb Z_+}$ is a Cauchy sequence in $L^2([0,T]\times\Omega)$, so for every $t \in [0,T]$, $\{Y^{(k)}_t\}_{k\in\mathbb Z_+}$ has an $\mathcal F_t(X_0,B)$-measurable limit $X_t$. In fact, this convergence is uniform; from (2.27), we apply the inequality (see [6, Theorem 2.8])
\[
P\Big(\sup_{[a,b]}\Big|\int_a^b f(s)\,dB_s\Big| > r\Big) \le \frac{1}{r^2}\,E\Big(\int_a^b f^2(s)\,ds\Big),
\]
yielding
\[
P\Big(\sup_{[0,T]}\big|Y^{(k+1)}_t - Y^{(k)}_t\big| > \frac{1}{k^2}\Big) \le L_{k+1}\frac{T^{k+1}}{(k+1)!}\,k^4.
\]
Since $\sum_{k=1}^\infty L_{k+1}\frac{T^{k+1}}{(k+1)!}k^4$ converges, the Borel-Cantelli Lemma (see e.g. [6, Theorem 1.1]) implies that, almost surely,
\[
\sup_{[0,T]}\big|Y^{(k+1)}_t - Y^{(k)}_t\big| \le \frac{1}{k^2}
\]
for all sufficiently large $k$. Therefore the convergence of $\{Y^{(k)}_t\}_{k\in\mathbb Z_+}$ to $X_t$ is almost surely uniform in $t$, which means
\[
X_t = X_0 + \lim_{k\to\infty}\Big(\int_0^t U(s,Y^{(k)}_s)\,ds + \int_0^t V(s,Y^{(k)}_s)\,dB_s\Big) = X_0 + \int_0^t U(s,X_s)\,ds + \int_0^t V(s,X_s)\,dB_s,
\]
so $X$ is indeed a solution.
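The factorial decay in (2.27) can be watched numerically. The sketch below is an illustration only (with the hypothetical coefficients $U(t,x) = -x$ and $V \equiv 1$): it runs the Picard iteration (2.24) on one fixed, discretized Brownian path, using left-endpoint sums for both integrals, and records the sup-distance between successive iterates.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 2000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)   # increments of one fixed Brownian path

x0 = 1.0
U = lambda y: -y                       # hypothetical drift U(t, x) = -x
V = lambda y: 1.0 + 0.0 * y            # hypothetical diffusion V = 1

def picard_step(Y):
    """One iteration of (2.24) with left-endpoint sums for both integrals."""
    Z = np.empty(n + 1)
    Z[0] = x0
    Z[1:] = x0 + np.cumsum(U(Y[:-1]) * dt + V(Y[:-1]) * dB)
    return Z

Y = np.full(n + 1, x0)                 # Y^(0) = X_0
gaps = []                              # sup_t |Y^(k+1)_t - Y^(k)_t|
for _ in range(10):
    Y_next = picard_step(Y)
    gaps.append(np.max(np.abs(Y_next - Y)))
    Y = Y_next
```

On a fixed path the discrete iteration contracts exactly as in the proof: the $k$-th gap is bounded by the first gap times $T^k/k!$, so ten iterations already drive it far below the initial one.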
Unless we say otherwise, we assume that (2.22) and (2.23) hold when we discuss stochastic differential equations.

As we discussed before, a strong initial value problem induces a weak initial value problem; it can be shown that if a strong initial value problem enjoys the strong uniqueness property, then the strong initial value problem and its induced weak initial value problem have the weak uniqueness property (see e.g. [10, pp. 306-310]).

We will soon discuss how deterministic dynamical systems generalize to the stochastic case, but before this, we reserve the next section for Itô's formula, which allows us to calculate specific examples of Itô integrals and hence solutions to stochastic differential equations.
2.3 Itô's Formula and Examples

Equipped with the Itô integral and stochastic differential equation concepts, we now focus on explicitly calculating Itô integrals and solutions to stochastic differential equations. The tool we need is Itô's formula, which is essentially a stochastic analog of the chain rule. First, we prove Itô's formula in one dimension, and then we present the $n$-dimensional version and study some examples. Let $X$ be the stochastic integral generated by $(U,V)$ (for $U,V$ satisfying the assumptions in Definition 2.3) and let $g \in C^2(\mathbb R_+\times\mathbb R)$. Then the process $(g(t,X_t))_{t\in\mathbb R_+}$ is also a 1-dimensional stochastic integral, and for all $t$,
\[
g(t,X_t) = g(0,X_0) + \int_0^t\Big(\frac{\partial g}{\partial s}(s,X_s) + U_s\frac{\partial g}{\partial x}(s,X_s) + \frac12 V_s^2\frac{\partial^2 g}{\partial x^2}(s,X_s)\Big)\,ds + \int_0^t V_s\frac{\partial g}{\partial x}(s,X_s)\,dB_s,
\]
which we call Itô's formula.
Notice the "extra" term $\int_0^t \frac12 V_s^2\frac{\partial^2 g}{\partial x^2}\,ds$; such a term is often called a "correction term." We can, in fact, recover the "natural" form of the chain rule by using the Stratonovich integral (which differs from the Itô integral in that it uses the midpoint instead of the left endpoint), but Stratonovich integrals "look into the future" and (among other things) do not enjoy the martingale property.

To see where this extra term comes from, let us examine a Taylor expansion of $g(t,X_t)$; it is enough to assume that $g$, $\frac{\partial g}{\partial t}$, $\frac{\partial g}{\partial x}$, and $\frac{\partial^2 g}{\partial x^2}$ are bounded, for if we can prove the formula in this case, then we can take sequences of bounded functions $g_n$, $\frac{\partial g_n}{\partial t}$, $\frac{\partial g_n}{\partial x}$, and $\frac{\partial^2 g_n}{\partial x^2}$ to uniformly approximate a $C^2$ function $g$ and $\frac{\partial g}{\partial t}$, $\frac{\partial g}{\partial x}$, and $\frac{\partial^2 g}{\partial x^2}$, respectively, on compact subsets of $\mathbb R_+\times\mathbb R$ (by Stone-Weierstrass), and then the uniform convergence allows the limit to carry through
the integral (for the stochastic term, use Itô's isometry). Recall that the norm of any partition $P$ of $[0,t]$ is defined to be $\|P\| = \max_{1\le i\le n}(t_i - t_{i-1})$, and let $P = (t_j)_{j=0}^n$ be a partition of $[0,t]$ with sufficiently small norm. Then, carrying out a Taylor expansion of $g(t,X_t)$,
\[
g(t,X_t) = g(0,X_0) + \sum_{j=1}^n\big(g(t_j,X_{t_j}) - g(t_{j-1},X_{t_{j-1}})\big)
\]
\[
= g(0,X_0) + \sum_{j=1}^n\Big(\frac{\partial g}{\partial t}(t_{j-1},X_{t_{j-1}})(t_j - t_{j-1}) + \frac{\partial g}{\partial x}(t_{j-1},X_{t_{j-1}})(X_{t_j} - X_{t_{j-1}})\Big)
\]
\[
+ \frac12\sum_{j=1}^n\Big(\frac{\partial^2 g}{\partial t^2}(t_{j-1},X_{t_{j-1}})(t_j - t_{j-1})^2 + 2\frac{\partial^2 g}{\partial x\,\partial t}(t_{j-1},X_{t_{j-1}})(X_{t_j} - X_{t_{j-1}})(t_j - t_{j-1}) + \frac{\partial^2 g}{\partial x^2}(t_{j-1},X_{t_{j-1}})(X_{t_j} - X_{t_{j-1}})^2\Big) + \sum_{j=1}^n R_j,
\]
where each term $R_j$ in the remainder $\sum_{j=1}^n R_j$ takes the form
\[
R_j = \sum_{\alpha_1+\alpha_2 \ge 3}\frac{\partial^{\alpha_1}_t\partial^{\alpha_2}_x\,g(t_{j-1},X_{t_{j-1}})}{\alpha_1!\,\alpha_2!}\,(t_j - t_{j-1})^{\alpha_1}(X_{t_j} - X_{t_{j-1}})^{\alpha_2}.
\]
Let us approximate each term, using that the norm of the partition is small. For the first-order terms, we see
\[
\sum_j\frac{\partial g}{\partial t}(t_{j-1},X_{t_{j-1}})(t_j - t_{j-1}) \approx \int_0^t\frac{\partial g}{\partial t}(s,X_s)\,ds
\]
and
\[
\sum_j\frac{\partial g}{\partial x}(t_{j-1},X_{t_{j-1}})(X_{t_j} - X_{t_{j-1}}) \approx \int_0^t\frac{\partial g}{\partial x}(s,X_s)\,dX_s,
\]
where
\[
\int_0^t\frac{\partial g}{\partial x}(s,X_s)\,dX_s = \int_0^t\frac{\partial g}{\partial x}(s,X_s)\,U_s\,ds + \int_0^t\frac{\partial g}{\partial x}(s,X_s)\,V_s\,dB_s.
\]
For the second-order terms, only the term $\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}(t_{j-1},X_{t_{j-1}})(X_{t_j} - X_{t_{j-1}})^2$ is not approximately zero. Using $X_{t_j} - X_{t_{j-1}} \approx U_{t_{j-1}}(t_j - t_{j-1}) + V_{t_{j-1}}(B_{t_j} - B_{t_{j-1}})$, we expand this term:
\[
\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}\Big(U_{t_{j-1}}^2(t_j - t_{j-1})^2 + V_{t_{j-1}}^2(B_{t_j} - B_{t_{j-1}})^2 + 2U_{t_{j-1}}V_{t_{j-1}}(t_j - t_{j-1})(B_{t_j} - B_{t_{j-1}})\Big). \tag{2.28}
\]
We claim that (2.28) has only one term that is not approximately zero, namely $\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}V_{t_{j-1}}^2(B_{t_j} - B_{t_{j-1}})^2$, and it satisfies
\[
\Big\|\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}V_{t_{j-1}}^2(B_{t_j} - B_{t_{j-1}})^2 - \int_0^t\frac{\partial^2 g}{\partial x^2}V_s^2\,ds\Big\|_{L^2} \approx 0. \tag{2.29}
\]
For details on how to prove (2.29), see [12, p. 32]; we present a similar and more transparent argument to clearly convey the essence of Itô's formula. To this end, we now show that
\[
\lim_{\|P\|\to0}\Big\|\sum_{j=1}^n\big[(B_{t_j} - B_{t_{j-1}})^2 - (t_j - t_{j-1})\big]\Big\|_{L^2} = 0. \tag{2.30}
\]
To see this, call $M_j = (B_{t_j} - B_{t_{j-1}})^2 - (t_j - t_{j-1})$. Then $EM_j = 0$ because $E(B_{t_j} - B_{t_{j-1}})^2 = t_j - t_{j-1}$. Further, $E(M_j)^2 = 2(t_j - t_{j-1})^2$ because
\[
E(M_j)^2 = E\big((B_{t_j} - B_{t_{j-1}})^4 - 2(B_{t_j} - B_{t_{j-1}})^2(t_j - t_{j-1}) + (t_j - t_{j-1})^2\big) = 3(t_j - t_{j-1})^2 - 2(t_j - t_{j-1})^2 + (t_j - t_{j-1})^2,
\]
by (1.5) (in the section on Brownian motion).

Now, since Brownian motion has independent increments, the $M_j$ are independent, which means
\[
E\Big(\sum_{j=1}^n M_j\Big)^2 = E\sum_{j=1}^n(M_j)^2 = \sum_{j=1}^n 2(t_j - t_{j-1})^2,
\]
which clearly goes to zero as the norm of the partition goes to zero.
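The convergence (2.30) is easy to observe by simulation. The following sketch (an illustration with an arbitrarily chosen seed and $t = 2$) computes $\sum_j(B_{t_j} - B_{t_{j-1}})^2$ over finer and finer uniform partitions of $[0,t]$.

```python
import numpy as np

rng = np.random.default_rng(1)
t = 2.0
errors = []
for n in (100, 10_000, 1_000_000):
    dB = rng.normal(0.0, np.sqrt(t / n), n)  # increments over a uniform partition of [0, t]
    errors.append(abs(np.sum(dB**2) - t))    # | sum of squared increments - t |
# E(sum_j M_j)^2 = 2 t^2 / n here, so the L^2 error decays like n^{-1/2}
```

For the finest partition the quadratic variation is already very close to $t$, in line with the variance computation above.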
So, we have taken the Taylor expansion of $g(t,X_t)$ and approximated each term when the norm of the partition is small; combining all of them yields Itô's formula, as claimed.

Using differential notation, we can express (2.30) as "$(dB_t)^2 = dt$". Also, we may write "$(dt)^2 = dt\,dB_t = 0$," since terms containing $(t_j - t_{j-1})^2$ or $(t_j - t_{j-1})(B_{t_j} - B_{t_{j-1}})$ go to zero when the norm of the partition goes to zero; thus we could write
\[
(dX_t)^2 = U^2(dt)^2 + 2UV\,dt\,dB_t + V^2(dB_t)^2 = V^2\,dt.
\]
We could then rewrite Itô's formula more conveniently:
\[
dg(t,X_t) = \frac{\partial g}{\partial t}(t,X_t)\,dt + \frac{\partial g}{\partial x}(t,X_t)\,dX_t + \frac12\frac{\partial^2 g}{\partial x^2}(t,X_t)(dX_t)^2. \tag{2.31}
\]
We may extend this result to $n$ dimensions: taking $X$ to be an $n$-dimensional stochastic integral and taking $g \in C^2(\mathbb R_+\times\mathbb R^n;\mathbb R^n)$, where $X = (X^i)_{i=1}^n$ and $g = (g^i)_{i=1}^n$, the process $(g(t,X_t))_{t\in\mathbb R_+}$ is an $n$-dimensional stochastic integral, and for all $t \in \mathbb R_+$,
\[
dg^k(t,X_t) = \frac{\partial g^k}{\partial t}(t,X_t)\,dt + \sum_{i=1}^n\frac{\partial g^k}{\partial x_i}(t,X_t)\,dX^i_t + \sum_{i,j=1}^n\frac12\frac{\partial^2 g^k}{\partial x_i\,\partial x_j}(t,X_t)\,dX^i_t\,dX^j_t, \tag{2.32}
\]
for $1 \le k \le n$ (understanding $dB^i_t\,dB^j_t = \delta_{ij}\,dt$ and $dt\,dB^i_t = 0$).

We remark that Itô's formula can be stated and proved in higher generality using martingale theory (as in [9, Theorem 17.18]), but we do not need such a level of abstraction in the specialized environment of stochastic differential equations.
Now for some examples; let $B$ be a real-valued Brownian motion on a probability space $(\Omega,\mathcal A,P)$. Also, let $X_0 = x$ a.s. always be the initial condition (for some $x \in \mathbb R$).

To start, let us observe an integration by parts formula: for $f \in C^1(\mathbb R_+)$,
\[
\int_0^t f(s)\,dB_s = f(t)B_t - \int_0^t B_s\,df(s), \tag{2.33}
\]
for all $t \in \mathbb R_+$. This follows from Itô's formula; we suspect that $\int_0^t f(s)\,dB_s$ should somehow yield an "$f(t)B_t$" sort of term, so we take $g(t,x) = f(t)x$. Then it is clear that $\frac{\partial g}{\partial t}(t,x) = f'(t)x$, $\frac{\partial g}{\partial x}(t,x) = f(t)$, and $\frac{\partial^2 g}{\partial x^2}(t,x) = 0$. Since $g(t,B_t) = f(t)B_t$, by Itô's formula,
\[
d(f(t)B_t) = f'(t)B_t\,dt + f(t)\,dB_t + 0,
\]
which easily yields (2.33). Note that $f$ only depends on time, and note that (by an argument similar to the one above)
\[
\int_0^t f(s)\,dX_s = f(t)X_t - f(0)X_0 - \int_0^t X_s\,df(s) \tag{2.34}
\]
holds for any 1-dimensional stochastic integral $X$ (for a more general result, see [9, Theorem 17.16]).
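The integration by parts formula (2.33) can be checked on a simulated path. The sketch below uses the hypothetical choice $f(s) = s^2$ and compares a left-endpoint Itô sum for $\int_0^t f(s)\,dB_s$ with $f(t)B_t - \int_0^t B_s\,df(s)$.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 200_000
s = np.linspace(0.0, t, n + 1)
dB = rng.normal(0.0, np.sqrt(t / n), n)
B = np.concatenate(([0.0], np.cumsum(dB)))   # B_0 = 0

f = s**2                                     # hypothetical choice f(s) = s^2
lhs = np.sum(f[:-1] * dB)                    # left-endpoint Ito sum for int f dB
rhs = f[-1] * B[-1] - np.sum(B[:-1] * np.diff(f))  # f(t) B_t - int B df
gap = abs(lhs - rhs)
```

Since $f$ is deterministic, no correction term appears and the two sides agree up to discretization error.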
Along these lines, consider
\[
dX_t = dB_t - X_t\,dt. \tag{2.35}
\]
Taking $g(t,x) = e^t x$ and using Itô's formula yields
\[
d(e^t X_t) = e^t\,dX_t + e^t X_t\,dt = e^t\,dB_t,
\]
and it is easy to see that the solution is
\[
X_t = e^{-t}x + \int_0^t e^{s-t}\,dB_s.
\]
Similarly,
\[
dX_t = \sigma\,dB_t - bX_t\,dt,
\]
where $\sigma$ and $b$ are constants, has solution
\[
X_t = e^{-bt}x + \sigma\int_0^t e^{b(s-t)}\,dB_s.
\]
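On a single discretized path, the closed form above can be compared with a direct integration of the equation. The sketch below (an illustration with hypothetical values $b = 1$, $\sigma = 0.5$) uses the Euler-Maruyama scheme, a standard discretization not discussed in the text, and evaluates the $dB_s$-integral in the closed form as a left-endpoint sum on the same path.

```python
import numpy as np

rng = np.random.default_rng(3)
b, sigma, x0 = 1.0, 0.5, 2.0            # hypothetical parameter values
T, n = 1.0, 50_000
dt = T / n
s = np.linspace(0.0, T, n + 1)
dB = rng.normal(0.0, np.sqrt(dt), n)

X = x0                                   # Euler-Maruyama for dX = -b X dt + sigma dB
for k in range(n):
    X += -b * X * dt + sigma * dB[k]

# closed form X_T = e^{-bT} x + sigma int_0^T e^{b(s-T)} dB_s, left-endpoint sum
X_exact = np.exp(-b * T) * x0 + sigma * np.sum(np.exp(b * (s[:-1] - T)) * dB)
gap = abs(X - X_exact)
```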
Next, we have a basic example of an Itô integral calculation:
\[
\int_0^t B_s\,dB_s = \frac12(B_t^2 - t). \tag{2.36}
\]
We expect the $\frac12 B_t^2$ term from deterministic calculus, but we inherit the extra $\frac12 t$ term from the stochastic case (often called a correction term). To use Itô's formula, take $g(t,x) = \frac12 x^2$ and $X_t = B_t$, which yields
\[
dg(t,B_t) = B_t\,dB_t + \frac12(dB_t)^2,
\]
which means
\[
\frac12 B_t^2 = \int_0^t B_s\,dB_s + \frac12\int_0^t ds,
\]
and this easily reduces to (2.36).

This example shows how the realization $(dB_t)^2 = dt$ helps calculate Itô integrals. Notice that the integral of a purely random process ends up having a deterministic part; for deeper insight along these lines, see [9, pp. 339-340].
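(2.36) can also be verified on a simulated path: the left-endpoint Itô sums $\sum_j B_{t_{j-1}}(B_{t_j} - B_{t_{j-1}})$ should approach $\frac12(B_t^2 - t)$, the discrepancy being exactly half the quadratic-variation error seen earlier. The sketch below is an illustration with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(4)
t, n = 1.0, 1_000_000
dB = rng.normal(0.0, np.sqrt(t / n), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

ito_sum = np.sum(B[:-1] * dB)               # left-endpoint sum for int_0^t B dB
gap = abs(ito_sum - 0.5 * (B[-1]**2 - t))   # compare with (2.36)
```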
Along these lines, let us solve
\[
dX_t = bX_t\,dt + \sigma X_t\,dB_t, \tag{2.37}
\]
where $\sigma$ and $b$ are positive constants. The noiseless case would produce the solution $X_t = e^{bt}x$, so we may expect something like this but with a correction term. Since (2.37) can be interpreted as
\[
\int_0^t\frac{dX_s}{X_s} = bt + \sigma B_t,
\]
we apply Itô's formula (using that $(dt)^2 = 0$):
\[
d(\ln X_t) = \frac{dX_t}{X_t} - \frac{(dX_t)^2}{2X_t^2} = \frac{dX_t}{X_t} - \frac{\sigma^2X_t^2\,dt}{2X_t^2} = \frac{dX_t}{X_t} - \frac12\sigma^2\,dt.
\]
Thus, solving for $\frac{dX_t}{X_t}$ and integrating, we get
\[
bt + \sigma B_t = \ln\Big(\frac{X_t}{x}\Big) + \frac12\sigma^2 t,
\]
or
\[
X_t = x\,e^{(b-\frac12\sigma^2)t + \sigma B_t}.
\]
Notice that we recover the proper solution for the noiseless case when $\sigma = 0$, and that we have solved the one-dimensional equation $\dot X = (r(X) + \sigma W)X$ in the case where $r(X)$ is constant.
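A quick Monte Carlo check of the solution formula: since $E\,e^{\sigma B_t} = e^{\sigma^2 t/2}$, the correction term $-\frac12\sigma^2 t$ is exactly what makes $E X_t = x\,e^{bt}$, the noiseless growth in mean. The sketch below (with hypothetical parameter values) samples $B_T$ directly and averages the closed form.

```python
import numpy as np

rng = np.random.default_rng(5)
b, sigma, x, T = 0.2, 0.3, 1.0, 1.0     # hypothetical parameter values
N = 400_000

BT = rng.normal(0.0, np.sqrt(T), N)     # samples of B_T
XT = x * np.exp((b - 0.5 * sigma**2) * T + sigma * BT)
mc_mean = XT.mean()                     # should be close to x e^{bT}
```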
For the next example we study the logistic equation
\[
dX_t = (aX_t - bX_t^2)\,dt + \sigma X_t\,dB_t, \tag{2.38}
\]
for $a$, $b$, and $\sigma$ positive constants. We remark that only $a$ is perturbed; if $b$ is perturbed, then the probability that solutions do not explode in finite time is zero (except, of course, for the trivial solution $x = 0$; see [2, p. 99]).

To solve (2.38), substitute $Y_t = X_t^{-1}$ to get
\[
dY_t = \big((\sigma^2 - a)Y_t + b\big)\,dt - \sigma Y_t\,dB_t,
\]
which is solved in a similar fashion to (2.37):
\[
Y_t = e^{-[(a-\frac12\sigma^2)t + \sigma B_t]}\Big[\frac1x + b\int_0^t e^{(a-\frac12\sigma^2)s + \sigma B_s}\,ds\Big].
\]
Transforming back to $X_t$ gives
\[
X_t = \frac{x\,e^{(a-\frac12\sigma^2)t + \sigma B_t}}{1 + xb\int_0^t e^{(a-\frac12\sigma^2)s + \sigma B_s}\,ds}.
\]
Observe that in the noiseless case one recovers the familiar elementary solution
\[
X_t = \frac{xK}{x + (K - x)e^{-at}},
\]
for $K = a/b$ and all $t \in \mathbb R_+$.
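Setting $\sigma = 0$ in the closed form above should recover the elementary logistic solution, and this can be checked numerically. The sketch below (hypothetical parameter values) evaluates the $ds$-integral of the $\sigma = 0$ formula by a midpoint rule and compares with $xK/(x + (K-x)e^{-at})$, $K = a/b$.

```python
import numpy as np

a, b, x, t = 1.5, 0.5, 0.2, 2.0         # hypothetical parameter values
K = a / b

# closed form with sigma = 0; the ds-integral via a midpoint rule
n = 1_000_000
edges = np.linspace(0.0, t, n + 1)
mid = 0.5 * (edges[:-1] + edges[1:])
integral = np.sum(np.exp(a * mid)) * (t / n)
X_formula = x * np.exp(a * t) / (1.0 + x * b * integral)

# elementary logistic solution with carrying capacity K = a/b
X_classic = x * K / (x + (K - x) * np.exp(-a * t))
gap = abs(X_formula - X_classic)
```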
Finally, let us move to a two-dimensional system. We will "reverse engineer" the classical system
\[
dx_1 = -x_2\,dt, \qquad dx_2 = x_1\,dt,
\]
with initial value $x(0) = (x_1(0),x_2(0)) = (1,0) \in \mathbb R^2$. It is well known that the solution is $x(t) = (\cos t,\sin t) = e^{it}$. Now pick $g(t,x) = e^{ix}$, so that, with $B$ again a real-valued Brownian motion,
\[
g(t,B) = e^{iB} = (\cos B,\sin B) =: (X^1,X^2).
\]
Then by Itô's formula, we see that
\[
dX^1_t = -\sin(B_t)\,dB_t - \frac12\cos(B_t)\,dt = -X^2_t\,dB_t - \frac12 X^1_t\,dt,
\]
\[
dX^2_t = \cos(B_t)\,dB_t - \frac12\sin(B_t)\,dt = X^1_t\,dB_t - \frac12 X^2_t\,dt
\]
is the stochastic system (with initial condition $X_0 = (1,0)$ a.s.) whose solution is a one-dimensional Brownian motion traveling around the unit circle.
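The system can be integrated directly and compared with $(\cos B_t,\sin B_t)$ on the same path. The sketch below (an illustration) uses the Euler-Maruyama scheme, not discussed in the text, and also checks that the numerical solution stays near the unit circle.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 1.0, 200_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)

X1, X2 = 1.0, 0.0                       # X_0 = (1, 0)
for db in dB:                           # Euler-Maruyama for the system above
    X1, X2 = (X1 - X2 * db - 0.5 * X1 * dt,
              X2 + X1 * db - 0.5 * X2 * dt)

B_T = dB.sum()
gap = np.hypot(X1 - np.cos(B_T), X2 - np.sin(B_T))
radius = np.hypot(X1, X2)               # should stay near the unit circle
```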
Chapter 3
Dynamical Systems and Stochastic Stability
3.1 "Stochastic Dynamical Systems"

In this section we present an overview of dynamical systems and their "stochastic" analogues. Let $X$ be a metric space and let $S : D(S) \subseteq \mathbb R\times X \to X$. For $t \in \mathbb R$, let $S_t := S(t,\cdot)$.

Definition 3.1. We say $S$ is a local dynamical system or local flow if $D(S)$ is open in $\mathbb R\times X$, $S$ is continuous, and
i) $S_0 = \mathrm{id}_X$, and
ii) $S_{t+s} = S_t\circ S_s$ for all $t,s \in \mathbb R$.
We say $S$ is a global flow if $S$ is a local flow with $D(S) = \mathbb R\times X$. We say $S$ is a local (global) semiflow if the above conditions hold with $\mathbb R$ replaced by $\mathbb R_+$.

Definition 3.2. We say that a set $A \subseteq X$ is positively invariant for a local semiflow $S$ if $S_t(A\cap D(S_t)) \subseteq A$ for all $t \in \mathbb R_+$. We say $x$ is an equilibrium point if, for every $t \in \mathbb R_+$, $x \in D(S_t)$ and $S_t(x) = x$.
Recall that autonomous ordinary differential equations generate flows: let $b \in C^1(\mathbb R^n;\mathbb R^n)$ and consider the system $\dot u = b(u)$. Given any initial value $x \in \mathbb R^n$ there exists a largest open interval of existence $I_x \subseteq \mathbb R$ containing $0$ such that the system $\dot u = b(u)$ has a unique solution $u^x \in C^1(I_x;\mathbb R^n)$ with $u^x(0) = x$. The system $\dot u = b(u)$ generates a local solution flow $S : D(S) \subseteq \mathbb R\times\mathbb R^n \to \mathbb R^n$ with $D(S) := \{(t,x) \in \mathbb R\times\mathbb R^n \mid t \in I_x\}$, where $S(t,x) := u^x(t)$ for all $(t,x) \in D(S)$; $D(S)$ is open and $S$ is continuous by [1, Theorem 8.3], $S$ is $C^1(D(S);\mathbb R^n)$ since $b \in C^1(\mathbb R^n;\mathbb R^n)$ by [1, Theorem 9.5], and the uniqueness of the solution guarantees the group property $S_{t+s} = S_t\circ S_s$. Similarly, we can obtain the local solution semiflow $S|_{D(S)\cap(\mathbb R_+\times\mathbb R^n)}$. Observe that $\frac{\partial}{\partial t}S(t,x) = b(S(t,x))$ for all $(t,x) \in D(S)$.
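The group property and the identity $\frac{\partial}{\partial t}S(t,x) = b(S(t,x))$ can be checked concretely. The sketch below uses the logistic equation $\dot u = u(1-u)$, whose flow is known in closed form (a hypothetical example, not one from the text).

```python
import math

def S(t, x):
    """Solution flow of the logistic equation u' = u(1 - u) (hypothetical example)."""
    return x / (x + (1.0 - x) * math.exp(-t))

x0, s, t = 0.3, 0.7, 1.1
group_gap = abs(S(t + s, x0) - S(t, S(s, x0)))    # S_{t+s} = S_t o S_s

h = 1e-6                                           # central difference for d/dt S(t, x0)
deriv = (S(t + h, x0) - S(t - h, x0)) / (2 * h)
b_of_S = S(t, x0) * (1.0 - S(t, x0))               # b(S(t, x0)) with b(u) = u(1 - u)
ode_gap = abs(deriv - b_of_S)
```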
Assume $\dot u = b(u)$ generates a global solution flow $S$. Then $S_t \in C^1(\mathbb R^n;\mathbb R^n)$ for all $t \in \mathbb R$, and $S_t$ is invertible with $S_t^{-1} = S_{-t} \in C^1(\mathbb R^n;\mathbb R^n)$. So, for each $t \in \mathbb R$, $S_t : \mathbb R^n \to \mathbb R^n$ is a $C^1$-diffeomorphism. This means that for any $A \in \mathcal B^n$ we have the change of variable formula
\[
\int_{S_t^{-1}(A)} f(x)\,dx = \int_A f(S_t^{-1}(y))\,|\det(D S_t^{-1})(y)|\,dy = \int_A f(S_{-t}(y))\,|\det(DS_{-t})(y)|\,dy, \tag{3.1}
\]
where $D(S_t^{-1})$ is the Jacobian matrix of $S_t^{-1}$ (whose determinant is nonzero). Thus, $\lambda(A) = 0$ implies $\lambda(S_t^{-1}(A)) = 0$ for all $t$; we shall soon see the importance of this property.

Now compare the above to the autonomous case of stochastic differential equations. For $(U,V) \in S^{n,d}_{\mathcal F,B}$, this amounts to making $U$ and $V$ constant in time; call $U_t := b$ for all $t \in \mathbb R_+$ and $V_t := \sigma$ for all $t \in \mathbb R_+$, where $b : \mathbb R^n \to \mathbb R^n$ is $(\mathcal B^n,\mathcal B^n)$-measurable and $\sigma : \mathbb R^n \to \mathbb R^{n\times d}$ is $(\mathcal B^n,\mathcal B^{n\times d})$-measurable. We remark that this may seem like a strange and sudden notation change, but it is quite common in the literature to use "$\sigma$, $b$" notation, and so we adhere to this convention now, reserving it specially for the autonomous case (even though many authors use $\sigma$ and $b$ for more general cases).
We shall now see how solutions in the autonomous case enjoy some of the "nice dynamical systems properties" that we hope for. Considering only degenerate initial distributions, we see that (2.20) takes the form
\[
dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad X_0 = x \text{ a.s.,} \tag{3.2}
\]
with $x \in \mathbb R^n$. To emphasize the initial condition in the solution's expression, call $X^{0,x}$ the strong solution to (3.2), where $B$ is given and $X^{0,x}$ is $\mathcal F(B)$-adapted. Let us study the induced weak problem
\[
dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad P_{X_0} = \delta_x. \tag{3.3}
\]
We will now provide two weak solutions to (3.3) that are "time shifts" of each other and use the weak uniqueness property to conclude that they must have the same distribution. Obviously, $(X^{0,x},B,\mathcal F(B))$ is a weak solution to (3.3).

Now consider the process $(X^{s,x}_{t+s})_{t\in\mathbb R_+}$, that is, the shifted version of $X^{0,x}$ as above, where $X^{s,x}_s = x$ a.s. We would like to shift back in time by $s$ to solve (3.3); observe that one cannot simply shift the Brownian motion $B$ in time without affecting the variance, so to "shift" $B$, we need to define a new Brownian motion with the appropriate distribution (this is why we must appeal to the weak problem!). So, define $\tilde B_t := B_{t+s} - B_s$ for all $t \ge 0$. Then $\tilde B = (\tilde B_t)_{t\in\mathbb R_+}$ is, by the stationary independent increments of $B$, again a Brownian motion starting at zero, and the filtration $\tilde{\mathcal F}(\tilde B)$ is defined by $\tilde{\mathcal F}_t(\tilde B) := \mathcal F_{t+s}(B)$ for all $t \in \mathbb R_+$. Now, by the definition of $X^{s,x}$ and shifting,
\[
X^{s,x}_{t+s} = x + \int_s^{t+s} b(X^{s,x}_r)\,dr + \int_s^{t+s}\sigma(X^{s,x}_r)\,dB_r = x + \int_0^t b(X^{s,x}_{r+s})\,dr + \int_0^t\sigma(X^{s,x}_{r+s})\,d\tilde B_r.
\]
This means $((X^{s,x}_{t+s})_{t\in\mathbb R_+},\tilde B,\tilde{\mathcal F}(\tilde B))$ is also a weak solution of (3.3). Thus, by the weak uniqueness,
\[
(X^{0,x}_t)_{t\in\mathbb R_+} \overset{d}{=} (X^{s,x}_{t+s})_{t\in\mathbb R_+}, \tag{3.4}
\]
which leads us to the following definition:

Definition 3.3. We say a process $X$ is time-homogeneous if $X$ satisfies (3.4).

We call this the diffusion case, often referring to the solution $X$ as a diffusion; we think of $x$ as a particle that would move with velocity $b$ ($b$ is sometimes called the "drift" coefficient) except that random collisions with other particles (say, the collisions occur with some kind of "intensity" $\sigma$, also called a "diffusion" coefficient) may cause interference. As we will see later, there is an intimate relationship between stochastic differential equations and second-order partial differential equations, which is one reason why the term "diffusion" is used.

Again, time-homogeneity has much to do with the "nice dynamical systems properties" we want; we may think of a diffusion as a "stochastic semiflow." In fact, in probability theory, a diffusion refers to a Markov process with continuous paths (with perhaps some extra properties); it therefore is not a surprise that a solution to a stochastic differential equation with initial values that are a.s. constant will be a Markov process (see [12, Theorem 7.2]).
So what if the initial condition is a nondegenerate random variable? Then we have a semiflow action on a set of probability measures. More precisely, recall that $C_b(\mathbb R^n;\mathbb R)$ is the set of bounded, continuous functions mapping $\mathbb R^n$ into $\mathbb R$, and call $M_{\mathcal B^n}$ the set of all finite Borel measures. Equip $C_b(\mathbb R^n;\mathbb R)$ with the sup norm, that is, the norm $\|f\| = \sup_x|f(x)|$, to make it a Banach space. Then $M_{\mathcal B^n}$ is a subset of $C_b^*(\mathbb R^n;\mathbb R)$, the dual space of $C_b(\mathbb R^n;\mathbb R)$, so if we equip $C_b^*(\mathbb R^n;\mathbb R)$ with the weak* topology, $M_{\mathcal B^n}$ inherits it. It can be shown that $M_{\mathcal B^n}$ is metrizable as a complete metric space (see [13, p. 371]); therefore a dynamical system could be defined over $M_{\mathcal B^n}$. In fact, we are most interested in the (Lebesgue) absolutely continuous measures; this allows us the luxury of using semigroup theory to "fluctuate functions" rather than "fluctuate measures."

Now define a family $U = (U_t)_{t\ge0}$ such that for each $t \in \mathbb R_+$, $U_t : M_{\mathcal B^n} \to M_{\mathcal B^n}$ is given by
\[
U_t\mu := \int P_t(x,\cdot)\,\mu(dx). \tag{3.5}
\]
Then $U$ is the dual semigroup to $T$, as $\langle f,U_t\mu\rangle = \langle T_tf,\mu\rangle$ for all $f \in C_0(\mathbb R^n;\mathbb R)$ and all $\mu \in C_0^*(\mathbb R^n;\mathbb R)$, where $\langle f,\mu\rangle_{C_0(\mathbb R),C_0^*(\mathbb R)} := \int f\,d\mu$. Note that $U$ is a semidynamical system on $M_{\mathcal B^n}$: $\mu \in M_{\mathcal B^n}$ implies $U_t\mu \in M_{\mathcal B^n}$ for every $t \in \mathbb R_+$. Also, for any $B \in \mathcal B^n$, we have
\[
U_0\mu(B) = \int P_0(x,B)\,\mu(dx) = \int 1_B(x)\,\mu(dx) = \mu(B)
\]
and
\[
U_{s+t}\mu(B) = \int P_{s+t}(x,B)\,\mu(dx) = \int\Big(\int P_t(y,B)\,P_s(x,dy)\Big)\mu(dx) = \int P_t(y,B)\int P_s(x,dy)\,\mu(dx) = \int P_t(y,B)\,U_s\mu(dy) = U_t(U_s\mu)(B).
\]
All we lack is the continuity, the proof of which can be found in [13, pp. 370-371]. Notice that if $\mu$ is a probability measure, then $U_t\mu$ is a probability measure for every $t \in \mathbb R_+$ since $T$ is a contraction semigroup.
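A finite-state analogue of (3.5) makes the semigroup identities transparent: with $P_t(x,\cdot)$ the rows of a matrix power $P^t$, the map $\mu \mapsto \mu P^t$ satisfies $U_0 = \mathrm{id}$ and $U_{s+t} = U_t\circ U_s$, and preserves total mass. The kernel below is a hypothetical example.

```python
import numpy as np

# hypothetical 3-state kernel: P_t(x, .) is row x of the matrix power P^t
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
mu = np.array([0.2, 0.5, 0.3])          # a probability measure on {0, 1, 2}

def U(t, mu):
    """Discrete analogue of (3.5): (U_t mu)(.) = sum_x P_t(x, .) mu(x)."""
    return mu @ np.linalg.matrix_power(P, t)

s, t = 2, 3
semigroup_gap = np.max(np.abs(U(s + t, mu) - U(t, U(s, mu))))
identity_gap = np.max(np.abs(U(0, mu) - mu))     # U_0 = id
mass = U(5, mu).sum()                            # total mass is preserved
```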
3.2 Koopman and Frobenius-Perron Operators:
The Deterministic Case

In this section, we define the Koopman and Frobenius-Perron operators, which are useful in understanding how deterministic cases for differential equations extend to stochastic ones. Primarily, we are interested in describing the distribution of a (continuous-time, $\mathbb R^n$-valued) solution process $X$ of a stochastic differential equation via semigroup theory; in the case where the distributions $P_{X_t}$ have densities for every $t$, one can represent the flow as a semigroup of linear operators on $L^1(\mathbb R^n)$ whose generator is a second-order differential operator on $\mathbb R^n$. This leads to the set-up of a partial differential equation called the Fokker-Planck equation, which describes the fluctuation of the densities of the distributions of $X$ (assuming that the random variable $X_t$ has a Lebesgue density for all $t \in \mathbb R_+$).

A brief outline of the procedure is as follows. First, we define the Koopman and Frobenius-Perron operators. We show they are adjoint and derive the infinitesimal generators for each in the case of a deterministic ordinary differential equation. We then make a stochastic generalization of these operators and mimic the deterministic case, employing stochastic calculus. Finally, we obtain the form of the infinitesimal generator of the semigroup describing the solution process $X$ and use its adjoint to derive the Fokker-Planck equation.
Given a measurable space $(X,\mathcal F)$ equipped with a signed measure $\mu$, let $(Y,\mathcal G,\nu)$ be a ($\sigma$-finite) measure space, and let $S : X \to Y$ be measurable.

Definition 3.4. We define the image measure of $\mu$ under $S$ by
\[
\mu_S(G) = \mu(S^{-1}(G)),
\]
for any $G \in \mathcal G$.

A useful characterization which follows from the definition is
\[
\int g\,d\mu_S = \int (g\circ S)\,d\mu, \tag{3.6}
\]
for $g$ nonnegative and measurable. Also, a measurable $g$ is $\mu_S$-integrable iff $g\circ S$ is $\mu$-integrable, in which case (3.6) holds.

Definition 3.5. $S$ is nonsingular if $\nu(G) = 0$ implies $\mu(S^{-1}(G)) = 0$ for every $G \in \mathcal G$.

So, if $S$ is nonsingular, then $\mu_S$ is absolutely continuous with respect to $\nu$ and thus has a Radon-Nikodym derivative $\frac{d\mu_S}{d\nu}$ (which is in $L^1(\nu) := L^1(Y,\mathcal G,\nu)$ iff $\mu_S$ is finite).

Now let $(X,\mathcal F,\bar\mu)$ be a $\sigma$-finite measure space, let $f \in L^1(\bar\mu) := L^1(X,\mathcal F,\bar\mu)$, and recall that $\bar\mu_f(A) := \int_A f\,d\bar\mu$ for $A \in \mathcal F$.

We now define the Frobenius-Perron operator $P$ (associated with $S$ as above) by applying the image measure construction to the signed measure $\bar\mu_f$; denote the image measure by $\bar\mu_f^S = (\bar\mu_f)_S$. Note that since $f$ is $\bar\mu$-integrable and
\[
\bar\mu_f^S(G) = \bar\mu_f(S^{-1}(G)) = \int_{S^{-1}(G)} f\,d\bar\mu
\]
holds for any $G \in \mathcal G$, $\bar\mu_f^S$ is a finite signed measure.

Definition 3.6. The operator $P : L^1(\bar\mu) \to L^1(\nu)$ defined by
\[
Pf = \frac{d(\bar\mu_f^S)}{d\nu}
\]
is called the Frobenius-Perron operator (associated to $S$).

We obtain this by taking $f \in L^1(\bar\mu)$, forming $\bar\mu_f$, associating to it $\bar\mu_f^S$, and using the nonsingularity of $S$ to take the Radon-Nikodym derivative with respect to $\nu$. Put another way, for $G \in \mathcal G$,
\[
\bar\mu_f^S(G) = \bar\mu_f(S^{-1}(G)) = \int_{S^{-1}(G)} f\,d\bar\mu = \int_G Pf\,d\nu.
\]
In fact, what happens in general is that $S$ is causing a change of measure, so one can think of $\mu \mapsto \mu_S$ as a mapping from $M_{\mathcal F}$ into $M_{\mathcal G}$, where $M_{\mathcal A}$ denotes the set of all finite signed measures on a $\sigma$-algebra $\mathcal A$. Define, for a given measure $\mu_0$ on $\mathcal A$,
\[
M^{\mu_0}_{\mathcal A} = \{\mu \in M_{\mathcal A} \mid \mu \ll \mu_0\},
\]
which is the set of all finite signed measures which are absolutely continuous with respect to $\mu_0$ over $\mathcal A$. By Radon-Nikodym, there is a one-to-one correspondence between elements of $M^{\mu_0}_{\mathcal A}$ and $L^1(\mu_0)$. Also, if $S$ is nonsingular, then $\mu \mapsto \mu_S$ is a mapping from $M^{\bar\mu}_{\mathcal F}$ into $M^{\nu}_{\mathcal G}$; by the above one-to-one correspondence, this mapping can be identified with the Frobenius-Perron operator.

Closely related to this concept is the Koopman operator.
Definition 3.7. The operator $U : L^\infty(\nu) \to L^\infty(\bar\mu)$ defined by $Ug = g\circ S$ is called the Koopman operator (associated to $S$).

Clearly, $U$ is nonnegative in the sense that $g \ge 0$ implies $Ug \ge 0$ for all $g \in L^\infty(\nu)$, and $U$ is a bounded linear operator with operator norm $1$. Further, $P$ is a nonnegative bounded linear operator with operator norm $1$. To see that $P$ is nonnegative, let $f \in L^1(\bar\mu)$ be nonnegative everywhere and suppose that $Pf$ is negative over a set $G \in \mathcal G$ of positive measure. This implies
\[
\int_{S^{-1}(G)} f\,d\bar\mu = \int_G Pf\,d\nu < 0,
\]
which contradicts the fact that $f$ is nonnegative. To see that the operator norm of $P$ is $1$, observe that for nonnegative $f \in L^1(\bar\mu)$,
\[
\|Pf\|_{L^1(\nu)} = \int_Y Pf\,d\nu = \int_X f\,d\bar\mu = \|f\|_{L^1(\bar\mu)}.
\]
This extends easily to the case of arbitrary $f \in L^1(\bar\mu)$, using the decomposition of $f$ into positive and negative parts.
Lemma 3.1. The Koopman operator is the adjoint of the Frobenius-Perron operator.

Proof. By (3.6), for all $f \in L^1(\bar\mu)$ and $g \in L^\infty(\nu)$,
\[
\langle Pf,g\rangle_{L^1(\nu),L^\infty(\nu)} = \int (Pf)g\,d\nu = \int\frac{d\bar\mu_f^S}{d\nu}\,g\,d\nu = \int g\,d\bar\mu_f^S = \int (g\circ S)\,d\bar\mu_f = \int (g\circ S)f\,d\bar\mu = \langle f,Ug\rangle_{L^1(\bar\mu),L^\infty(\bar\mu)}.
\]
Now, let $X = Y$ be a metric space, let $\mathcal F = \mathcal G = \mathcal B$, where $\mathcal B$ is the Borel $\sigma$-algebra on $X$, and let $\bar\mu$, $\nu$ be two $\sigma$-finite measures on $\mathcal B$. Now we take $S : \mathbb R_+\times X \to X$ to be a nonsingular semidynamical system, that is, a semidynamical system such that $S_t : X \to X$ is nonsingular for all $t$. Then we can define $P_tf := \frac{d\bar\mu_f^{S_t}}{d\nu}$ and $U_tg := g\circ S_t$, for each $t \in \mathbb R_+$.

More specifically, let $X = Y = \mathbb R^n$, let $\bar\mu = \nu = \lambda^n$, and let $\mathcal F = \mathcal G = \mathcal B^n$. As in the previous section, let $b \in C^1(\mathbb R^n;\mathbb R^n)$ and let $\dot y = b(y)$ generate a global solution flow $S$; recall that $S$ is nonsingular due to the change of variable formula (3.1). We now observe that $P := \{P_t\}_{t\ge0}$ and $U := \{U_t\}_{t\ge0}$ (both of which are associated to $S$) are in fact semigroups; let $t,s \ge 0$, and let $g \in L^\infty(\mathbb R^n)$. $U$ is a semigroup because
\[
U_{t+s}g = g\circ S_{t+s} = g\circ(S_t\circ S_s) = (U_tg)\circ S_s = U_s(U_tg) = U_t(U_sg),
\]
and clearly, $U_0g = g$. Also, $P$ is a semigroup, since for any $f \in L^1(\mathbb R^n)$ and $A \in \mathcal B^n$,
\[
\int_A P_{t+s}f = \int_{S_{t+s}^{-1}(A)} f = \int_{S_s^{-1}(S_t^{-1}(A))} f = \int_{S_t^{-1}(A)} P_sf = \int_A P_t(P_sf),
\]
so $P_{t+s}f = P_t(P_sf)$ (and clearly, $P_0f = f$). We have already observed that $U_t$ and $P_t$ have operator norm equal to one for each $t \ge 0$; therefore $P$ and $U$ are in fact contraction semigroups.
The next property we need is strong continuity. Call $C^1_c(\mathbb R^n)$ the space of functions in $C^1(\mathbb R^n)$ that have compact support.

Lemma 3.2. $P$ is a strongly continuous semigroup.

Proof. By (3.1), we have (a.e.)
\[
P_tf(x) = f(S_{-t}x)\,|\det((DS_{-t})(x))|
\]
for all $f \in C^1_c(\mathbb R^n)$, and
\[
\lim_{t\to0^+} f(S_{-t}x)\,|\det((DS_{-t})(x))| = f(x).
\]
We claim that the limit is uniform in $x$. To see this, define $K_t$ for $t \in [0,1]$ to be the support of $P_tf$, so $K_0$ is the support of $f$. Then $K_t = S_t(K_0)$ for all $t \in [0,1]$, since $|\det(DS_{-t})(x)|$ is never zero and
\[
P_tf(x) \ne 0 \iff f(S_{-t}x) \ne 0 \iff S_{-t}x \in K_0 \iff x \in S_t(K_0).
\]
But $K_0$ is compact and $S$ is continuous on $\mathbb R_+\times\mathbb R^n$, so $K := \bigcup_{t\in[0,1]} K_t = S([0,1]\times K_0)$ is compact. Therefore, $(t,x) \mapsto f(S_{-t}x)$ is uniformly continuous on $[0,1]\times K$, and so is $(t,x) \mapsto f(S_{-t}x)|\det((DS_{-t})(x))|$. Finally, realizing that if $h : [0,1]\times\mathbb R^n \to \mathbb R$ is uniformly continuous, then $h(t,\cdot)$ converges uniformly to $h(0,\cdot)$ as $t \to 0^+$ proves our claim.

Our claim implies
\[
\lim_{t\to0^+}\|P_tf - f\|_{L^1} = \int_{\mathbb R^n}\lim_{t\to0^+}|P_tf(x) - f(x)|\,dx = \int_K\lim_{t\to0^+}|P_tf(x) - f(x)|\,dx = 0.
\]
Since $P$ is uniformly bounded, by Banach-Steinhaus, $P_tf \to f$ in $L^1$ for any $f \in L^1(\mathbb R^n)$ (the closure of $C^1_c(\mathbb R^n)$ in the $L^1$-norm). This means $P$ is strongly continuous.
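For a concrete flow the operator $P_t$ can be written out and tested: for $\dot x = x$ on $\mathbb R$ (a hypothetical example), $S_t(x) = e^t x$, so $P_tf(x) = f(e^{-t}x)\,e^{-t}$. The sketch below, taking $f$ to be the standard normal density, checks that $P_t$ preserves the $L^1$-norm and that $\|P_tf - f\|_{L^1}$ is small for small $t$.

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 600_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)   # hypothetical density on R

def Pt(t):
    """P_t f(x) = f(e^{-t} x) e^{-t} for the flow S_t(x) = e^t x of xdot = x."""
    y = np.exp(-t) * x
    return np.exp(-y**2 / 2.0) / np.sqrt(2.0 * np.pi) * np.exp(-t)

mass_gap = abs(np.sum(Pt(1.0)) * dx - 1.0)       # ||P_t f||_{L^1} = ||f||_{L^1} = 1
l1_gap = np.sum(np.abs(Pt(0.001) - f)) * dx      # strong continuity as t -> 0+
```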
As a consequence, we know that $P$ has an infinitesimal generator $A_{FP}$ by Hille-Yosida. To identify $A_{FP}$, we will use the duality between the Frobenius-Perron and Koopman operators, first showing that $U$ has an infinitesimal generator $A_K$ and identifying it. To this end, let $t \in \mathbb R_+$, $x \in \mathbb R^n$, and suppose $f \in C^1_c(\mathbb R^n)$; then by the Mean Value Theorem, the definition of $U$, and the definition of a solution semiflow,
\[
\frac{U_tf(x) - f(x)}{t} = \frac{f(S(t,x)) - f(x)}{t} = \nabla f(S(ct,x))\cdot\frac{\partial}{\partial t}S(ct,x) = b(S(ct,x))\cdot\nabla f(S(ct,x)),
\]
for some $0 \le c \le 1$ (depending on $t$ and $x$). Then
\[
\lim_{t\to0^+}\frac{U_tf(x) - f(x)}{t} = \lim_{t\to0^+} b(S(ct,x))\cdot\nabla f(S(ct,x)) = b(x)\cdot\nabla f(x).
\]
By an argument similar to that in the proof of Lemma 3.2, the limit is uniform in $x$. Thus, for at least $f \in C^1_c(\mathbb R^n)$, $\frac{U_tf - f}{t}$ converges in $L^\infty$; in particular, $U_tf$ converges to $f$ in $L^\infty$ for all $f \in C^1_c(\mathbb R^n)$. Since $U$ is uniformly bounded, by the Banach-Steinhaus theorem, $U_tf \to f$ in $L^\infty$ for all $f \in C_0$ (the closure of $C^1_c(\mathbb R^n)$ in the $L^\infty$-norm). Further, $U_t(C_0) \subseteq C_0$, so $U$ restricts to a strongly continuous semigroup on $C_0$. So, by Hille-Yosida, $U|_{C_0}$ has an infinitesimal generator $A_K$; our calculation shows $C^1_c(\mathbb R^n) \subseteq D(A_K)$ and that, for all $f \in C^1_c(\mathbb R^n)$,
\[
A_Kf = b\cdot\nabla f. \tag{3.7}
\]
We now use the duality between the Frobenius-Perron operators and the Koopman operators to identify $A_{FP}$. Note that $P$ is not really the dual of $U|_{C_0}$; also, we would need reflexivity to ensure that a strongly continuous contraction semigroup $T$ has a strongly continuous dual contraction semigroup $T^*$, and that the dual of the generator of $T$ is really the generator of $T^*$ [17, Theorem 3.7.1]. However, let $g \in C^1_c(\mathbb R^n)$ and let $f \in D(A_{FP})$ be continuously differentiable. Then for any $t \ge 0$ we have
\[
\langle P_tf,g\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)} = \langle f,U_tg\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)},
\]
and we can subtract $\langle f,g\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)}$ from both sides and divide by $t$:
\[
\Big\langle\frac{P_tf - f}{t},g\Big\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)} = \Big\langle f,\frac{U_tg - g}{t}\Big\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)}.
\]
We know that the limit as $t \to 0^+$ exists on both sides; on the right-hand side, take this limit and use (3.7) and integration by parts:
\[
\langle f,A_Kg\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)} = \int f(A_Kg) = \int f\,[b\cdot\nabla g] = -\int g\,[\nabla\cdot(bf)]\,dx = \langle -\nabla\cdot(bf),g\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)}.
\]
The above calculation identifies $A_{FP}$, which we have already shown to exist. In fact, for all $f \in D(A_{FP})\cap C^1(\mathbb R^n)$,
\[
A_{FP}f = -\nabla\cdot(bf).
\]
Thus, we have proved
Theorem 3.1. The infinitesimal generator $A_K$ of the Koopman semigroup (restricted to $C_0$) is given by $A_Kf = b\cdot\nabla f$ for $f \in C^1_c(\mathbb R^n)$. The infinitesimal generator $A_{FP}$ of the Frobenius-Perron semigroup is given by $A_{FP}g = -\nabla\cdot(bg)$ for continuously differentiable $g \in D(A_{FP})$.

Consider what happens in the case where we have a deterministic differential equation with global solution flow $S = \{S_t\}_{t\ge0}$ and a "noisy" initial value, that is, an initial value that is a nondegenerate random variable, say, $X_0 = X^0$ a.s. Then we have the initial value problem
\[
dX_t = b(X_t)\,dt, \qquad X_0 = X^0 \text{ a.s.} \tag{3.8}
\]
Lemma 3.3. $X := (X_t)_{t\in\mathbb R_+}$ defined by $X_t := S_t\circ X^0$ for all $t \in \mathbb R_+$ solves (3.8).

Proof. The proof is easy. Obviously, the initial condition is satisfied, and further,
\[
\frac{\partial}{\partial t}X(t,\omega) = \frac{\partial}{\partial t}S(t,X^0(\omega)) = b(S(t,X^0(\omega))) = b(X_t(\omega)),
\]
for any $\omega \in \Omega$.

Remark: As a result of the lemma, we get the following useful equations:
\[
P_{X_t}(B) = P_{S_t\circ X^0}(B) = (P_{X^0})_{S_t}(B) = P_{X^0}(S_t^{-1}(B)),
\]
for all $t \in \mathbb R_+$ and $B \in \mathcal B^n$. Extending this to finite-dimensional distributions, we get
\[
P_{(X_{t_1},X_{t_2},\dots,X_{t_k})}(B_1\times B_2\times\dots\times B_k) = P_{X^0}\Big(\bigcap_{i=1}^k S_{t_i}^{-1}(B_i)\Big), \tag{3.9}
\]
for all $t_1,t_2,\dots,t_k \in \mathbb R_+$ and $B_1,\dots,B_k \in \mathcal B^n$.
The Frobenius-Perron semigroup $P$ gives us a new way of understanding (3.8) if $X^0$ has a density. Let $X^0$ have density $g$, so $P_{X^0} = \lambda^n_g$. Recall that, for any $t \in \mathbb R_+$, we denote the distribution of $X_t$ by $P_{X_t}$ (so $P_{X_0} = P_{X^0}$). Then $P_{X_t}$ also has a density (since $S_t$ is a diffeomorphism for each $t$), and $P_{X_t} = \lambda^n_{P_tg}$, since for any $A \in \mathcal B^n$,
\[
P_{X_t}(A) = P_{S_t\circ X^0}(A) = P_{X^0}(S_t^{-1}(A)) = \int_{S_t^{-1}(A)} g(x)\,dx = \int_A P_tg(x)\,dx.
\]
But the strong continuity of $P$ allows us to use Theorem 1.2; we may set up the Cauchy problem
\[
u_t = -\nabla\cdot(bu) = A_{FP}u, \qquad u(0,x) = g(x), \tag{3.10}
\]
where $g$ is the density of $X^0$. Solving (3.10) gives $u(t,\cdot) = P_tg(\cdot)$, the density of $P_{X_t}$, for any $t \in \mathbb R_+$. We call (3.10) the Liouville equation; one can interpret it physically as a conservation of mass equation (where $b$ is a velocity field for a fluid with density $u$).

In summary, we have the following lemma:

Lemma 3.4. If $X$ solves (3.8) and $P_{X^0} = \lambda^n_g$, then $P_{X_t} = \lambda^n_{P_tg} = \lambda^n_{u(t,\cdot)}$ for all $t \ge 0$, where $u$ is the solution of (3.10).
Probably the easiest example is the scalar "transport" equation $\dot x = b$, for a positive constant $b$, with initial condition $x(0) = x_0$, whose solution is $x(t) = S_t(x_0) := bt + x_0$. If there were an entire family of degenerate initial conditions (each occurring with some probability), they would all be subject to the "transporting" motion $x \mapsto x + bt$.

Now, add a noisy initial condition, so the transport equation becomes $dX_t = b\,dt$ with initial condition $X_0 = X^0$ having density $g$, where $b$ is a positive constant. Then we can use the Frobenius-Perron semigroup to study the fluctuation of $g$ via the equation $\frac{\partial}{\partial t}u(t,x) = A_{FP}u := -\nabla\cdot(bu)$ with initial condition $u(0,x) = g(x)$, which has solution $P_tg(x) = g(x - bt)$. This makes sense; imagine a process heavily concentrated at $x_0 \in \mathbb R$ initially, so that $g$ has a "spike" at $x_0$. Then the process should be heavily concentrated at $x_0 + bt$ after some time $t$, so the density $u(t,\cdot) = g(\cdot - bt)$ at this time should be "spiking" at $x_0 + bt$.
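The transport solution $P_tg(x) = g(x - bt)$ can be checked against the Liouville equation directly. The sketch below (with $g$ the standard normal density, a hypothetical choice) verifies $u_t = -b\,u_x$ pointwise by central finite differences.

```python
import numpy as np

b, t = 2.0, 0.7                                  # transport speed and a fixed time
x = np.linspace(-10.0, 10.0, 200_001)

def u(t, x):
    """u(t, x) = P_t g(x) = g(x - b t), with g the standard normal density."""
    y = x - b * t
    return np.exp(-y**2 / 2.0) / np.sqrt(2.0 * np.pi)

h = 1e-5
u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)      # central difference in t
u_x = (u(t, x + h) - u(t, x - h)) / (2 * h)      # central difference in x
residual = np.max(np.abs(u_t + b * u_x))         # Liouville: u_t = -b u_x
```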
For the weak version of (3.8), we would be given the initial condition P_{X_0} = μ instead of X_0 = X̂_0 a.s.; let us point out another way to determine P_{X_t}, for t ≥ 0, given P_{X_0} = μ, a way that works whether μ has a density or not.

Lemma 3.5. Suppose, for every x ∈ ℝⁿ, that ẏ = b(y), y(0) = x has a global forward-in-time solution Xˣ with distribution Pˣ := P_{Xˣ}. Given a probability measure μ on Bⁿ, define P^μ by

P^μ(A) := ∫_{ℝⁿ} Pˣ(A) μ(dx),

for any A ∈ B(C). Then a stochastic process X = (X_t)_{t∈ℝ₊} is a solution to the initial value problem

dX_t = b(X_t) dt,    (3.11)
P_{X_0} = μ,

iff P_X = P^μ.

Remark 1: P^μ is well-defined (ẏ = b(y) determines a trivial convolution semigroup of measures; see [9, Lemma 8.7] or the discussion on pp. 9-10 in the section "Kernels and Semigroups of Kernels").
Remark 2: Clearly, Xˣ_t = S(t, x) a.s. by Lemma 3.3, where S is the solution semiflow generated by ẏ = b(y), and so P_{Xˣ_t} = δ_{S(t,x)} for all t ∈ ℝ₊ and x ∈ ℝⁿ. Further, (Xˣ)_{x∈ℝⁿ} is a family of strong solutions to ẏ = b(y), parameterized by x ∈ ℝⁿ (through the initial condition y(0) = x), and each process Xˣ may live on a different probability space (Ωˣ, Fˣ, Pˣ); however, this does not matter, as all we care about is the distribution Pˣ.
Proof. Let X := (X_t)_{t≥0} be a stochastic process on a probability space (Ω, F, P). We want to show that X solves (3.11) iff P_X = P^μ. To this end, suppose that X solves (3.11); we must show that P_X = P^μ. Since X solves (3.11), X = (S_t ∘ X_0)_{t≥0} by Lemma 3.3 and P_{X_0} = μ. Now let k ∈ ℕ, let t_1, t_2, …, t_k ∈ ℝ₊, let B_1, B_2, …, B_k ∈ Bⁿ, and let π_{t_1,…,t_k} denote the mapping from C := C(ℝ₊, ℝⁿ) into (ℝⁿ)^k given by ω ↦ (ω(t_1), ω(t_2), …, ω(t_k)). Then, by definition of P^μ,

P^μ(π_{t_1,…,t_k}⁻¹(B_1 × B_2 × ⋯ × B_k))
= ∫_{ℝⁿ} P_{Xˣ}(π_{t_1,…,t_k}⁻¹(B_1 × B_2 × ⋯ × B_k)) μ(dx)
= ∫_{ℝⁿ} P_{(Xˣ_{t_1}, Xˣ_{t_2}, …, Xˣ_{t_k})}(B_1 × B_2 × ⋯ × B_k) μ(dx).

Now, since Xˣ solves (3.8) and since Xˣ_0 = x a.s., we may apply (3.9) twice to see that the above equals

∫ 1_{S_{t_1}⁻¹(B_1) ∩ S_{t_2}⁻¹(B_2) ∩ ⋯ ∩ S_{t_k}⁻¹(B_k)}(x) μ(dx)
= μ(S_{t_1}⁻¹(B_1) ∩ S_{t_2}⁻¹(B_2) ∩ ⋯ ∩ S_{t_k}⁻¹(B_k))
= P_{(X_{t_1}, X_{t_2}, …, X_{t_k})}(B_1 × B_2 × ⋯ × B_k),

and so P^μ = P_X.
For the other implication, suppose that P_X = P^μ. Then we must show that X is a solution to (3.11). To see this, observe first that X satisfies the initial condition, since P_{X_0}(B) = μ(B) for any B ∈ Bⁿ. It only remains to show that (X_t)_{t∈ℝ₊} and (S_t ∘ X_0)_{t∈ℝ₊} have the same joint distributions, which follows similarly to the above:

P_{(X_{t_1}, X_{t_2}, …, X_{t_k})}(B_1 × B_2 × ⋯ × B_k)
= P^μ(π_{t_1,…,t_k}⁻¹(B_1 × B_2 × ⋯ × B_k))
= μ(S_{t_1}⁻¹(B_1) ∩ S_{t_2}⁻¹(B_2) ∩ ⋯ ∩ S_{t_k}⁻¹(B_k))
= P_{(S_{t_1}∘X_0, S_{t_2}∘X_0, …, S_{t_k}∘X_0)}(B_1 × B_2 × ⋯ × B_k);

thus X is a solution and the lemma holds.
To interpret this result, fix x_1, x_2 ∈ ℝⁿ and α ∈ [0, 1], and set μ_α = αδ_{x_1} + (1 − α)δ_{x_2}. Let X^{x_1} and X^{x_2} denote the respective solutions to (3.8) with initial conditions X_0 = x_1 a.s. and X_0 = x_2 a.s. Then if X solves (3.11) with initial condition P_{X_0} = μ_α, X must have distribution P^{μ_α}, where

P^{μ_α} = αP_{X^{x_1}} + (1 − α)P_{X^{x_2}}.

So, if X is a strong solution to (3.8) with initial condition X_0 = X̂_0, where X̂_0 is a random variable equal to x_1 with probability α and to x_2 with probability 1 − α, then X is a modification of the process X̃, where X̃ = X^{x_1} with probability α and X̃ = X^{x_2} with probability 1 − α. In this way, one may interpret the action of the above lemma as a kind of "stochastic superposition" (not the usual "superposition principle," which says that a linear combination of solutions is again a solution, and which we cannot expect unless b is linear). More profoundly, this extends even to nonzero σ, which means it suffices to examine degenerate initial conditions for (weak) stochastic differential equations. We extend the above ideas to the stochastic case in the next section, emphasizing the use of the Frobenius-Perron semigroup.
3.3 Koopman and Frobenius-Perron Operators:
The Stochastic Case

We have studied (0, b) with a degenerate initial distribution, and also with a noisy initial condition given by the nondegenerate distribution of an initial random variable (with a density). We now want σ to be nonzero, so let us extend the notions of Koopman operator and Frobenius-Perron operator to the stochastic case and then derive extended versions of A_K and A_FP. As before, we derive A_K and exploit "duality" to obtain A_FP.

As in the previous section, it suffices to study the degenerate solutions by integrating Pˣ over x with respect to a given nondegenerate initial distribution μ; the proof involves deep probabilistic concepts (see [9, Theorem 21.10] and the preceding theorems ibidem), so we simply state the result.
Lemma 3.6. Suppose, for every x ∈ ℝⁿ, that dX_t = b(X_t) dt + σ(X_t) dB_t, X_0 = x a.s. has a global forward-in-time solution Xˣ = (Xˣ_t)_{t∈ℝ₊} with distribution Pˣ := P_{Xˣ}. Given a probability measure μ on Bⁿ, define P^μ by

P^μ(A) := ∫_{ℝⁿ} Pˣ(A) μ(dx),

for any A ∈ B(C). Then a stochastic process X = (X_t)_{t∈ℝ₊} is a solution to the initial value problem

dX_t = b(X_t) dt + σ(X_t) dB_t,    (3.12)
P_{X_0} = μ,

iff P_X = P^μ.
Under the assumptions of Lemma 3.6, let Xˣ := (Xˣ_t)_{t≥0} denote the solution process to (σ, b) with initial condition X_0 = x a.s., and let Pˣ be the distribution of Xˣ. Define Eˣ to be the expectation with respect to Pˣ. We can now define the stochastic Koopman operator; as before, we want something like "U_t f = f ∘ S_t" for a "stochastic solution semiflow" S (which acts on random variables; we cannot backsolve, since the noise in the flow causes a "diffusion"), and we also want U_t to map L^∞(ℝⁿ) into L^∞(ℝⁿ). But something must give; we are dealing with a space of ℝⁿ-valued random variables rather than with ℝⁿ itself. So let f ∈ L^∞(ℝⁿ); then, for fixed t ∈ ℝ₊ and fixed x ∈ ℝⁿ, f ∘ Xˣ_t is a bounded, Borel-measurable function that maps C(ℝ₊, ℝⁿ) into ℝ, and so Eˣ(f ∘ Xˣ_t) makes sense. This leads us to the following definition:

Definition 3.8. Let (Xˣ_t)_{t≥0, x∈ℝⁿ} be the family such that, for every fixed x, (Xˣ_t) is the canonical realization of the solution to (σ, b) with initial condition X_0 = x a.s. Define, for all t ∈ ℝ₊, the stochastic Koopman operator U_t : L^∞(ℝⁿ) → L^∞(ℝⁿ) by

U_t f(x) := Eˣ(f ∘ Xˣ_t),

for all f ∈ L^∞(ℝⁿ) and x ∈ ℝⁿ.
Let us comment on this definition. First, observe that the canonical realization is necessary: Pˣ (and hence Eˣ) lives over B(C). Next, note that the definition is consistent with the deterministic Koopman operator, since for σ = 0 we have X_t = S_t ∘ X_0 = S_t(x) when X_0 = x a.s., where S is the global solution flow generated by ẏ = b(y), so U_t f(x) reduces to f(S_t(x)). Next, it is obvious that U_t f ∈ L^∞(ℝⁿ) for all t ∈ ℝ₊ and f ∈ L^∞(ℝⁿ), and that U_t is nonexpansive for all t ∈ ℝ₊. In fact, U := {U_t}_{t≥0} restricts to a strongly continuous semigroup on C_0 (see [9, Theorem 21.11]), so it has an infinitesimal generator. We emulate the argument in the deterministic case to identify it; let n = 1 for notational ease (the argument for general n is similar). Let f ∈ C²_c(ℝ) and recall that if Xˣ solves (σ, b) with initial condition X_0 = x a.s., then Xˣ and f(Xˣ) are continuous semimartingales (which means we may apply Itô's formula). For the remainder of this argument, write X_t := Xˣ_t for convenience. Then X_t = x + ∫₀ᵗ b(X_s) ds + ∫₀ᵗ σ(X_s) dB_s, so applying Itô's formula to f(X_t) gives

f(X_t) = f(x) + ∫₀ᵗ [b(X_s)f′(X_s) + ½σ²(X_s)f″(X_s)] ds + ∫₀ᵗ σ(X_s)f′(X_s) dB_s.
Taking the expected value with respect to Pˣ of both sides, we get

Eˣf(X_t) = f(x) + Eˣ(∫₀ᵗ [b(X_s)f′(X_s) + ½σ²(X_s)f″(X_s)] ds) + Eˣ(∫₀ᵗ σ(X_s)f′(X_s) dB_s).    (3.13)

By basic properties of Itô integrals, Eˣ(∫₀ᵗ σ(X_s)f′(X_s) dB_s) = 0, so (3.13) becomes

Eˣf(X_t) = f(x) + Eˣ(∫₀ᵗ [b(X_s)f′(X_s) + ½σ²(X_s)f″(X_s)] ds).
Now, by definition of the infinitesimal generator,

Af(x) = lim_{t→0} (Eˣf(X_t) − f(x))/t
= lim_{t→0} (1/t) Eˣ(∫₀ᵗ [b(X_s)f′(X_s) + ½σ²(X_s)f″(X_s)] ds)
= Eˣ(b(X_0)f′(X_0) + ½σ²(X_0)f″(X_0))
= b(x)f′(x) + ½σ²(x)f″(x).
Thus we have the characterization of the infinitesimal generator of the stochastic Koopman semigroup:

A_K f = bf′ + ½σ²f″,    (3.14)

for any f ∈ C²_c(ℝ), or, in dimension n,

A_K f = Σ_{i=1}^{n} b_i ∂f/∂x_i + ½ Σ_{i,j=1}^{n} a_ij ∂²f/∂x_i∂x_j,    (3.15)

for any f ∈ C²_c(ℝⁿ), where (a_ij) = (Σ_{k=1}^{d} σ_ik σ_jk) and the Brownian motion is d-dimensional, 1 ≤ d ≤ n. Note that if the noise were zero, the generator would reduce to the deterministic one, as expected.
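The generator (3.14) can be checked numerically straight from its definition as lim_{t→0} (Eˣf(X_t) − f(x))/t. The sketch below uses illustrative choices b(x) = −x, σ ≡ 1, f(x) = x² (none from the text), for which A_K f(x) = −2x² + 1, and estimates the expectation by Monte Carlo over one small Euler-Maruyama step.

```python
import numpy as np

# Monte Carlo sanity check of (3.14) (a sketch, not from the text): with the
# illustrative choices b(x) = -x, sigma(x) = 1, f(x) = x^2, the generator is
# A_K f(x) = b(x) f'(x) + (1/2) sigma(x)^2 f''(x) = -2x^2 + 1, so A_K f(1) = -1,
# and (E^x f(X_t) - f(x)) / t should approach that value as t -> 0.
rng = np.random.default_rng(1)
b = lambda x: -x
sigma = lambda x: 1.0
f = lambda x: x**2
x, t, n = 1.0, 0.01, 1_000_000

# one Euler-Maruyama step starting from X_0 = x a.s.
xt = x + b(x) * t + sigma(x) * np.sqrt(t) * rng.standard_normal(n)
akf_estimate = (f(xt).mean() - f(x)) / t
print(akf_estimate)   # close to A_K f(1) = -1
```

The finite t and the Monte Carlo noise both introduce small errors, so the estimate agrees with −1 only to a couple of decimal places.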
Next, we obtain the infinitesimal generator of the Frobenius-Perron semigroup associated to the "stochastic solution semiflow" S induced by the solution of (σ, b), in the case that S is nonsingular. We must impose here that ∂b/∂x, ∂σ/∂x, and ∂²σ/∂x² exist and are bounded.
Naively, if something like "A_FP* = A_K" holds, then from (3.14) and integration by parts we would get

A_FP f = −(bf)′ + ½(σ²f)″,    (3.16)

or, in dimension n,

A_FP f = −Σ_{i=1}^{n} ∂(b_i f)/∂x_i + ½ Σ_{i,j=1}^{n} ∂²(a_ij f)/∂x_i∂x_j.    (3.17)

In fact, this is the case; for a "from scratch" proof, see [11, Theorem 11.6.1].
Now let X solve (σ, b) with initial condition X_0 = X̂_0 a.s., and suppose X̂_0 has density g. Then we can set up the problem

∂u/∂t = A_FP u,    (3.18)
u(0, x) = g(x),

where A_FP is as in (3.17) and the solution u(t, ·) of (3.18) is the density of X_t for every t; ∂u/∂t = A_FP u is called the Fokker-Planck equation. We are interested in finding a fundamental solution to (3.18); we digress slightly to give some necessary definitions and notation leading to a result that guarantees existence and uniqueness under certain technical conditions.
First, let us rewrite A_FP u in nondivergence form:

A_FP u = −Σ_{i=1}^{n} ∂(b_i(x)u)/∂x_i + ½ Σ_{i,j=1}^{n} ∂²(a_ij(x)u)/∂x_i∂x_j    (3.19)
= c̃(x)u + Σ_{i=1}^{n} b̃_i(x) ∂u/∂x_i + ½ Σ_{i,j=1}^{n} a_ij ∂²u/∂x_i∂x_j,

where

b̃_i(x) = −b_i(x) + Σ_{j=1}^{n} ∂a_ij(x)/∂x_j

and

c̃(x) = ½ Σ_{i,j=1}^{n} ∂²a_ij(x)/∂x_i∂x_j − Σ_{i=1}^{n} ∂b_i(x)/∂x_i.
Of course, the coefficients must be sufficiently smooth for the above to make sense; we also want them to satisfy growth conditions, namely, that there is a positive constant M such that

|a_ij(x)| ≤ M,  |b̃_i(x)| ≤ M(1 + |x|),  |c̃(x)| ≤ M(1 + |x|²).    (3.20)

We know a_ij = a_ji, so given any ξ = (ξ_1, ξ_2, …, ξ_n) ∈ ℝⁿ, we at least know

Σ_{i,j=1}^{n} a_ij ξ_i ξ_j = Σ_{k=1}^{d} (Σ_{i=1}^{n} σ_ik(x)ξ_i)² ≥ 0.

We would like strict inequality, so let us assume that the uniform parabolicity property holds, that is, that there is a constant ρ > 0 such that

Σ_{i,j=1}^{n} a_ij(x)ξ_i ξ_j ≥ ρ Σ_{i=1}^{n} ξ_i²,    (3.21)

for any x ∈ ℝⁿ and ξ ∈ ℝⁿ.

We condense the above into the following definition:

Definition 3.9. Given (3.18), we say a_ij and b_i are Cauchy-regular if they are C⁴ functions such that the corresponding a_ij, b̃_i, and c̃ of (3.19) satisfy (3.20) and (3.21).
Now we recall the definition of a classical solution.

Definition 3.10. Let f ∈ C(ℝⁿ). We say u : ℝ₊ × ℝⁿ → ℝ is a classical solution of (3.18) if

i) for all T > 0 there are positive constants c, α such that |u(t, x)| ≤ ce^{α|x|²} for all 0 < t ≤ T, x ∈ ℝⁿ,

ii) u_t, u_{x_i}, u_{x_i x_j} are continuous for all 1 ≤ i, j ≤ n and u satisfies

u_t = c̃(x)u + Σ_{i=1}^{n} b̃_i(x) ∂u/∂x_i + ½ Σ_{i,j=1}^{n} a_ij ∂²u/∂x_i∂x_j

for all t > 0 and x ∈ ℝⁿ, and

iii) lim_{t→0} u(t, x) = f(x).
We are now able to state the desired existence and uniqueness theorem:

Theorem 3.2. Given (3.18), let a_ij, b_i be Cauchy-regular and let f ∈ C(ℝⁿ) satisfy |f(x)| ≤ ce^{α|x|²} with positive constants c, α. Then there is a unique classical solution to (3.18), given by u(t, x) = ∫ Γ(t, x, y)f(y) dy, where the fundamental solution (or kernel) Γ(t, x, y) is defined for all t > 0, x, y ∈ ℝⁿ, is continuous and differentiable with respect to t, twice differentiable with respect to x_i for all 1 ≤ i ≤ n, and satisfies the equation

u_t = c̃(x)u + Σ_{i=1}^{n} b̃_i(x) ∂u/∂x_i + ½ Σ_{i,j=1}^{n} a_ij ∂²u/∂x_i∂x_j

as a function of t and x for every fixed y.
Our slight digression concludes with at least one condition under which a fundamental solution exists. Now, if we are able to find a fundamental solution Γ(t, x, y) to the Fokker-Planck equation, then given any initial condition u(0, x) = g(x), where g ∈ L¹(ℝⁿ), we can define a family of operators {P_t}_{t≥0} by

u(t, x) = P_t g(x) = ∫_{ℝⁿ} Γ(t, x, y)g(y) dy,    (3.22)

and u is often called a generalized solution in this case (of course, g has to be continuous in order for u to be a classical solution).

Definition 3.11. We call {P_t}_{t≥0} a stochastic semigroup if {P_t}_{t≥0} is a Markovian semigroup of linear operators on L¹(ℝⁿ) that is monotone (P_t f ≥ 0 whenever f ≥ 0, for all t ∈ ℝ₊) and norm-preserving (‖P_t f‖ = ‖f‖ whenever f ≥ 0, for all t ∈ ℝ₊).
The proof of the next theorem can be found in [11, pp. 369-370].

Theorem 3.3. {P_t}_{t≥0} as defined in (3.22) is a stochastic semigroup.

This theorem justifies the following definition:

Definition 3.12. We call P := {P_t}_{t≥0} as defined in (3.22) the stochastic Frobenius-Perron semigroup.
Let us now consider the simple example

dX_t = dB_t,    (3.23)
X_0 = X̂_0 a.s.,

where X̂_0 has density g. Then the solution is a Brownian motion started at X̂_0, and (3.18) becomes the heat equation

u_t = ½Δu,
u(0, x) = g(x),

which has solution

u(t, x) = (2πt)^{−d/2} ∫_{ℝ^d} e^{−|x−y|²/(2t)} g(y) dy,    (3.24)

for x ∈ ℝ^d, t > 0. Notice that the fundamental solution

(2πt)^{−d/2} e^{−|x−y|²/(2t)}

is the transition density of a Brownian motion, as we expect.
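In one dimension, (3.24) says that the law of X_t is the initial density convolved with the Gaussian heat kernel of variance t. A quick sanity check with an illustrative Gaussian initial density (my choice, not from the text), for which the convolution is again Gaussian:

```python
import numpy as np

# Sketch (one dimension, illustrative initial density): for dX_t = dB_t,
# (3.24) convolves the initial density with the N(0, t) heat kernel, so an
# initial law X_0 ~ N(0, 1) gives X_t ~ N(0, 1 + t).
rng = np.random.default_rng(2)
t, n = 2.0, 500_000
x0 = rng.normal(0.0, 1.0, n)               # density g = N(0, 1)
xt = x0 + rng.normal(0.0, np.sqrt(t), n)   # X_t = X_0 + B_t
print(round(xt.var(), 1))                  # variance 1 + t = 3.0
```

Unlike the transport example, the noise spreads the density out: the variance grows linearly in t.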
One way to think about what happens is this: for a noiseless stochastic differential equation with a degenerate initial condition, we have a point moving through space in time, governed by a flow (in essence, an ordinary differential equation). If the initial condition is nondegenerate with a density, we may understand how the family of points evolves as a density via the partial differential equation generated by the Frobenius-Perron operator.

Now, if a stochastic differential equation has a degenerate initial condition, we still have a point moving through space in time governed by a flow, but there is noise and we cannot actually tell where that point is; we are evolving random variables or measures. If the measures are absolutely continuous, we may instead evolve densities just as in the previous case, which means that "deterministic partial differential equations have the same complexity as stochastic differential equations with degenerate initial conditions."

Another interpretation of the latter case is that a point moves through space governed by a Brownian motion whose "expected flow" is described by b and whose "spread" or "intensity" is described by σ. For example, in (3.23), the flow is trivial, so we expect the point to stay where it started, but as time passes the noise may move it away. With this interpretation, we see that there is no difference between (σ, b) := (1, 0) with degenerate initial condition X_0 = x and (σ, b) := (0, b_L) with a nondegenerate initial condition having a Lebesgue density g, where b_L can be derived from the Liouville equation.
So how much more complicated is the "mixed" case, where neither σ nor b is zero? We can actually remove b from consideration; this result is called the "transformation of drift" formula (so-called because b is often referred to as the "drift" term), which in our situation can be stated as follows (see [5, p. 43]):
Given any x ∈ ℝⁿ, let Xˣ solve (σ, b) with initial condition Xˣ_0 = x a.s. Assume σ : ℝⁿ → ℝⁿ × ℝⁿ and that σ(y) has positive eigenvalues for every y. Further, let f : ℝⁿ → ℝⁿ and suppose Yˣ solves (σ, b + σf). Then P_{Xˣ_t} and P_{Yˣ_t} are absolutely continuous with respect to each other, and

dP_{Yˣ_t} = exp[∫₀ᵗ f(Xˣ_s) dB_s − ½ ∫₀ᵗ |f(Xˣ_s)|² ds] dP_{Xˣ_t}.    (3.25)

In particular, we could pick f such that σf = −b and obtain a relationship between (σ, b) and (σ, 0); we have already seen how (σ, 0) relates to a deterministic partial differential equation (as in the study of (3.23)). So, in theory, one can describe the dynamical systems aspects of (σ, b) in general by tracing back to (σ, 0) or (0, b) (although this may be quite unwieldy).
Now that we understand dynamical systems in a stochastic setting, we move to notions of stability in a stochastic setting, defining the various notions of "stochastic stability" as well as emulating Liapunov theory to demonstrate stability or instability of solutions to stochastic differential equations.
3.4 Liapunov Stability

We begin by recalling some notation and some basic notions of stability for deterministic dynamical systems.

As we discussed in section 3.1, "Stochastic Dynamical Systems," we let b ∈ C¹(ℝⁿ, ℝⁿ) and consider the system u̇ = b(u). Given any initial value x ∈ ℝⁿ, there exists a largest open interval of existence I_x ⊆ ℝ containing 0 such that the system u̇ = b(u) has a unique solution uˣ ∈ C¹(I_x, ℝⁿ) with uˣ(0) = x. The system u̇ = b(u) generates a local solution flow S : D(S) ⊆ ℝ × ℝⁿ → ℝⁿ with D(S) := {(t, x) ∈ ℝ × ℝⁿ | t ∈ I_x}, where S(t, x) := uˣ(t) for all (t, x) ∈ D(S); we know that D(S) is open, that S ∈ C¹(D(S), ℝⁿ), and that S satisfies the group property.

In what follows, we assume that S is a global solution flow.

Definition 3.13. We say x̄ is an equilibrium point of S if S(t, x̄) = x̄ for every t ∈ ℝ.

Observe that x̄ is an equilibrium point of S iff b(x̄) = 0.
Definition 3.14. An equilibrium point x̄ of S is called stable if for any ε > 0 there is δ(ε) > 0 such that whenever ‖x − x̄‖ < δ, it follows that ‖S(t, x) − x̄‖ < ε for all t ≥ 0. An equilibrium point that is not stable is called unstable. An equilibrium point is asymptotically stable if it is stable and, in addition, there is r > 0 such that lim_{t→∞} S(t, x) = x̄ for all x with ‖x − x̄‖ < r.
We now recall the principle of linearized stability, which in essence extracts information about the stability of the nonlinear system from the stability of the linearized system. More specifically, for an equilibrium point ū, we linearize b at ū so that our system becomes v̇ = Db(ū)v, where v = u − ū and Db(ū) is the Jacobian matrix. It can be shown [8, Theorem 9.5 and Theorem 9.7] that if Db(ū) has only eigenvalues with negative real parts, then ū is asymptotically stable, while if any eigenvalue has positive real part, then ū is unstable (for eigenvalues with real part 0, the linearized system is insufficient to determine stability).

Assuming that b(0) = 0, we are interested in the stability of the trivial solution u := 0; we use Liapunov theory in this situation.
Definition 3.15. We say a C¹ function V : D(V) ⊆ ℝⁿ → ℝ is positive definite if D(V) is open and contains the origin, V(0) = 0, and V(x) > 0 for all nonzero x. If −V is positive definite, we call V negative definite. Define the orbital derivative of V to be A_K V = (b·∇)V = Σ_{i=1}^{n} b_i ∂V/∂x_i. We call a positive definite V a (strict) Liapunov function if A_K V(x) ≤ 0 (respectively < 0) for all nonzero x.

The utility of Liapunov functions is illustrated in the following theorem, which is proven e.g. in [8, Theorem 9.12].

Theorem 3.4. If 0 is an equilibrium point of u̇ = b(u) and there exists a (strict) Liapunov function V, then 0 is (asymptotically) stable. Further, 0 is unstable if A_K V > 0.
Moving to the stochastic case, we generalize the concepts of stability, orbital derivative, and Liapunov function, as well as the principle of linearized stability. Stability and orbital derivative are fairly straightforward to generalize, and Liapunov functions are only a little trickier, but unfortunately, the principle of linearized stability is quite difficult to generalize. Recall that Xˣ denotes the solution to (σ, b) with degenerate initial condition X_0 = x a.s.; assume global solvability, that is, assume Xˣ exists for every x ∈ ℝⁿ. Throughout, assume that b(0) = 0 and σ(0) = 0, so that (σ, b) admits the trivial solution X = 0.
Definition 3.16. If for all ε > 0 we have

lim_{x→0} P(sup_{t≥0} |Xˣ_t| > ε) = 0,

then we say the trivial solution X = 0 is stable in probability.

In essence, this means that as x approaches 0, the probability that a path starting at x remains in any prescribed neighborhood of 0 tends to 1. This is quite similar to the deterministic notion of stability, except that closeness of Xˣ to 0 is now guaranteed only with probability close to 1.
Definition 3.17. If X = 0 is stable in probability and

lim_{x→0} P(lim_{t→∞} |Xˣ_t| = 0) = 1,

we say X = 0 is asymptotically stable in probability.

Basically, this means that as x approaches 0, the probability that a path starting at x eventually approaches 0 as time goes to infinity tends to 1.

Definition 3.18. Let (σ, b) admit the trivial solution X = 0. If X = 0 is stable in probability and, for every x,

P(lim_{t→∞} Xˣ_t = 0) = 1,

we say X = 0 is asymptotically stable in the large.

Asymptotic stability in the large is the strongest notion of stability, since the probability that any path (no matter where it starts) goes to 0 as time goes to infinity is 1.
If we are to generalize the Liapunov stability theory to the above concepts, we need to study the sign of a "stochastic orbital derivative"; to see what this should be, we do a little reverse engineering. Notice that the deterministic orbital derivative is given by the generator A_K of the deterministic Koopman semigroup, so, analogously, it makes sense to expect the stochastic orbital derivative to be given by the generator of the stochastic Koopman semigroup. This formally justifies the following definition.

Definition 3.19. For V ∈ C²(ℝⁿ), we define the stochastic orbital derivative of V to be

A_K V = Σ_{i=1}^{n} b_i ∂V/∂x_i + ½ Σ_{i,j=1}^{n} a_ij ∂²V/∂x_i∂x_j,

where, as before, A := (a_ij) = (Σ_{k=1}^{d} σ_ik σ_jk).

We remark that the notation A_K and the stochastic generalization of the orbital derivative are consistent; they reduce to the deterministic case when σ is 0.
Now we can generalize the Liapunov theory, which parallels the deterministic case quite closely; we remark up front that we present only a brief summary with some simplifying assumptions, working exclusively in the time-homogeneous case, and that there are plenty of weaker assumptions and technical details behind what follows (the reader is invited to check [7, Chapter 5] for more).

Definition 3.20. Let V : D(V) ⊆ ℝⁿ → ℝ, where D(V) is open and contains the origin, V(0) = 0, and V(x) > 0 for all nonzero x. Further, let V ∈ C²(D(V)∖{0}). We say V is a (strict) stochastic Liapunov function if A_K V(x) ≤ 0 (respectively < 0) for all nonzero x.
Theorem 3.5. If V is a stochastic Liapunov function, then X = 0 is stable in probability. Further, if the matrix A has positive eigenvalues, then X = 0 is stable in probability iff it is asymptotically stable in probability.

The proof of this theorem can be found in [7, pp. 164, 168].

Asymptotic stability in the large is almost "too nice" for practical purposes; still, there are several conditions sufficient to guarantee it. One unsurprising condition is that X = 0 is asymptotically stable in the large if X = 0 is stable in probability and recurrent to the domain |x| < ε for all ε > 0 (a process Y is recurrent to A if sup{t ≥ 0 : P(Y_t ∈ A) = 1} = ∞; otherwise it is transient). There are stricter conditions which can be imposed on V, of little interest to us; see [7, Theorem 4.4, Theorem 4.5] for those details.
As far as instability goes, things are usually a little trickier. Intuitively, systems that are stable without noise may become unstable with the addition of noise. Much less intuitively, an unstable system can be stabilized by the addition of noise! We shall soon see examples of both situations, but for now we state one sufficient condition for instability.

Theorem 3.6. Let V be a stochastic Liapunov function, except that D(V) need not contain zero; let lim_{x→0} V(x) = ∞, and set U_r = {x ∈ D(V) | |x| < r} for r > 0. If A has positive eigenvalues, then X = 0 is unstable in probability, and furthermore P(sup_{t>0} |Xˣ_t| < r) = 0 for all x ∈ U_r.

Contrast this with the deterministic case, and notice that A_K V does not change sign but V is now "inversely positive definite," which makes the above believable.
Let us now look at some examples; of course, there is little to do with the trivial solutions of the transport equation or the Langevin equation, so let us move to the next most complicated example.

Example 3.1.

Reconsider the one-dimensional equation dX_t = bX_t dt + σX_t dB_t, where b, σ are positive constants, with initial condition X_0 = x a.s. We have already solved this explicitly, and we know its solution is

X_t = x e^{(b − ½σ²)t + σB_t}.

When 2b < σ², the pathwise exponent b − ½σ² is negative, so (since B_t/t → 0 a.s.) the solution decays to 0 almost surely as time goes to infinity; we therefore expect the condition 2b < σ² to ensure that the zero solution X = 0 is stable. Let us use the Liapunov theory to verify this. Pick V(x) = |x|^{1−2b/σ²}; when 2b < σ² the exponent is positive, so V is positive definite, and V is twice continuously differentiable (except at 0), so we may examine A_K V for nonzero x:

A_K V(x) = bxV′(x) + ½σ²x²V″(x).

Writing γ := 1 − 2b/σ², we have V′(x) = γ|x|^{γ−1} sgn(x) and V″(x) = γ(γ−1)|x|^{γ−2}, so a bit of algebra gives

A_K V(x) = γ|x|^γ(b + ½σ²(γ − 1)) = γ|x|^γ(b − b) = 0.

Thus A_K V ≤ 0, so V is a stochastic Liapunov function and X = 0 is stable in probability; since the diffusion coefficient σ²x² is positive for x ≠ 0, Theorem 3.5 shows that X = 0 is in fact asymptotically stable in probability.
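The pathwise decay predicted by the explicit solution can be checked directly. The sketch below samples B_T exactly and estimates the Lyapunov exponent log|X_T|/T; the parameter values b = 0.5, σ = 2 are illustrative choices satisfying 2b < σ².

```python
import numpy as np

# Sketch: the explicit solution gives log|X_t| / t -> b - sigma^2/2 a.s.
# With the illustrative values b = 0.5, sigma = 2.0 we have 2b < sigma^2,
# and the exponent b - sigma^2/2 = -1.5 is negative: paths decay to 0
# even though the drift coefficient b is positive.
rng = np.random.default_rng(3)
b, sigma, x, T, n = 0.5, 2.0, 1.0, 50.0, 20_000
BT = rng.normal(0.0, np.sqrt(T), n)                  # B_T ~ N(0, T)
log_rate = (np.log(x) + (b - sigma**2 / 2) * T + sigma * BT) / T
print(round(log_rate.mean(), 1))                     # near b - sigma^2/2 = -1.5
```

Note the contrast with the mean: E(X_t) = x e^{bt} actually grows, while typical paths decay, which is the "physically unfeasible" tension discussed next.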
Computationally, this example is quite simple, and interpreting stability in this case as an "extinct population" is reasonable. However, the results may cause the reader difficulty when it comes to a physical interpretation. Notice that if there is no noise, we have ẋ = bx with b > 0; clearly this has an unstable trivial solution, so in this case, adding "enough" noise actually stabilizes the trivial solution. This does not jibe with our physical intuition, so for consistency's sake the condition 2b < σ² as above is deemed "physically unfeasible" (for a further discussion of this, see [7, pp. 173-176]).

Remark: The discussion in [7, pp. 173-176] will appeal to readers interested in a contrast of the Itô (left-endpoint) interpretation and the Stratonovich (midpoint) interpretation of the stochastic integral. It turns out that, under the Stratonovich interpretation, the sign of b alone determines the stability of the trivial solution.
Along these lines, if "not enough" noise is added, or really, "not enough physically feasible" noise is added, then the trivial solution should remain unstable; this is indeed the case when 2b > σ². To see this, select V(x) = −ln|x|. Then all the conditions of the instability test are satisfied, since for nonzero x,

A_K V = −b + ½σ² < 0.

Of course, if b is negative, the trivial solution is stable no matter what σ is.
It is intuitive to think that any stable system will become unstable with the addition of enough noise, but in fact this depends upon the dimension of the space. We can mimic the above argument in a fairly general setting: suppose we have a system of n equations whose trivial solution is stable, and add noise so that the system becomes

dX_t = b(X_t) dt + σ|X_t| dB_t,

for a constant σ > 0 and an n-dimensional Brownian motion B. Then, picking V(x) = −ln(|x|²), we see after several steps of calculus that

A_K V(x) = −2x·b(x)/|x|² − σ²(n − 2).

When n > 2, we can satisfy the hypotheses of Theorem 3.6 by picking σ large enough to destroy the stability of the trivial solution of the original system. Notice, however, that if n = 2 and we take b(x) := (b_1x_1, b_2x_2), where b_1, b_2 are negative constants, the asymptotic stability of the system cannot be destroyed by arbitrarily large noise: let σ be any constant. Then there is a sufficiently small positive constant a := a(σ) such that taking V(x) = |x|^a yields

A_K V(x) = a|x|^{a−2}(b_1x_1² + b_2x_2² + ½aσ²|x|²) < 0.

This means the trivial solution of the system is still asymptotically stable (in fact, asymptotically stable in the large).
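To see concretely why the two-dimensional claim is plausible, consider the isotropic special case b_1 = b_2 = b < 0 (an illustrative choice, not from the text). For dX = bX dt + σ|X| dB with independent Brownian components, an Itô computation on |X_t|² shows that it is a geometric Brownian motion with ln|X_t| = ln|X_0| + bt + σW_t for a scalar Brownian motion W, so the pathwise decay rate is b no matter how large σ is. A sketch sampling this representation:

```python
import numpy as np

# Sketch of the isotropic case b1 = b2 = b < 0 (an illustrative special case):
# for dX = b X dt + sigma |X| dB in R^2 with independent Brownian components,
# Ito's formula gives ln|X_t| = ln|X_0| + b t + sigma W_t for a scalar
# Brownian motion W, so the pathwise decay rate stays b for every sigma.
rng = np.random.default_rng(4)
b, T, n = -1.0, 50.0, 20_000
rates = []
for sigma in (0.1, 10.0):
    W = rng.normal(0.0, np.sqrt(T), n)        # W_T ~ N(0, T)
    rates.append(((b * T + sigma * W) / T).mean())
print([round(r, 1) for r in rates])           # near b = -1.0 for both noise levels
```

Increasing σ widens the spread of individual paths but leaves the decay rate untouched, exactly the n = 2 phenomenon above.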
Let us move to the situation where the trivial solution is stable, but not asymptotically stable. In this case, stability may be so delicate that even the slightest noise ruins it; this is exhibited in the next example.
Example 3.2.

Consider the system

dX¹ = X² dt + σ(X) dB¹_t,
dX² = −X¹ dt + σ(X) dB²_t,

where X = (X¹, X²) and B¹, B² are independent Brownian motions. In the deterministic case we have a stable equilibrium at zero that is not asymptotically stable. Pick V(x) = −ln(|x|²) for x = (x_1, x_2); as in the previous example, this satisfies all the requirements of the instability test, and we see

A_K V(x) = x_2 ∂V(x)/∂x_1 − x_1 ∂V(x)/∂x_2 + ½σ²(x)[∂²V(x)/∂x_1² + ∂²V(x)/∂x_2²].

A bit of calculation shows that A_K V(x) = 0; thus, whenever σ(x) is nonzero for nonzero x, the hypotheses of Theorem 3.6 hold, and we have instability for arbitrarily small positive noise.
So we have seen simple examples where

i) instability becomes stability with enough noise (although this is not "physically feasible"),

ii) stability is not affected by (arbitrarily large) noise, and

iii) stability is destroyed by (arbitrarily small) noise,

which shows the complicated and interesting nature of stochastic stability.
Now we briefly discuss the principle of linearized stability; with the above in mind, it should not be surprising that there are quite a lot of difficulties in extracting information about the full system from the linear approximation. So, what can we say about the full system if we know how its linearization acts? For one thing, the full system is stable if the linearized system has constant coefficients and is asymptotically stable. One needs further concepts, such as "exponential stability," to say more; interested readers may want to start with [7, Chapter 7]. From this point we abandon Liapunov theory in favor of the "density fluctuation" type of stability theory.
3.5 Markov Semigroup Stability

Of more practical importance to us is the use of Frobenius-Perron operators and Fokker-Planck equations in studying the stability of solutions to stochastic differential equations.

Let (X, A, μ) be a measure space, let P := {P_t}_{t≥0} be a stochastic semigroup, and call D := {f ∈ L¹(X) | ‖f‖ = 1, f ≥ 0} the set of densities.

Definition 3.21. We say f_* ∈ D is an invariant density for P (also called a stationary density) if P_t f_* = f_* for all t ≥ 0.

When P is obvious, we may just say that f_* is an invariant density.

Definition 3.22. We say P is asymptotically stable if P has a unique invariant density f_* and if, for all f ∈ D,

lim_{t→∞} ‖P_t f − f_*‖_{L¹(X)} = 0.

The analog of instability is called sweeping.

Definition 3.23. We say that P is sweeping with respect to a set A ∈ A if, for all f ∈ D,

lim_{t→∞} ∫_A P_t f(x) dx = 0.

Given a σ-algebra F ⊆ A, if P is sweeping with respect to every A ∈ F, then we say it is sweeping with respect to F.

When the context is clear, we usually just say that a semigroup is sweeping.

Of particular interest are stochastic semigroups that are kernel operators (when (X, A, μ) := (ℝⁿ, Bⁿ, λⁿ)).
Definition 3.24. We say P is a stochastic semigroup of kernel operators (on ℝⁿ) if for any x ∈ ℝⁿ, t ∈ ℝ₊, and f ∈ D,

P_t f(x) = ∫_{ℝⁿ} K(t, x, y)f(y) dy,

where K : ℝ₊ × ℝⁿ × ℝⁿ → ℝ₊ is a (stochastic) kernel, in the sense that

∫_{ℝⁿ} P_t f(x) dx = 1.

Stochastic semigroups of kernel operators correspond to semigroups of Frobenius-Perron operators associated to a Fokker-Planck equation having a fundamental solution; for the remainder of the section, let the hypotheses of Theorem 3.2 be satisfied (so a_ij and b_i are Cauchy-regular for (3.18)) and call P := {P_t}_{t≥0} the stochastic Frobenius-Perron semigroup associated to (3.18).
We emulate the Liapunov-type stability theory by again appealing to A_K.

Definition 3.25. Let V ∈ C²(ℝⁿ) be nonnegative with lim_{|x|→∞} V(x) = ∞, and let there exist constants γ, δ such that V(x), |∂V(x)/∂x_i|, and |∂²V(x)/∂x_i∂x_j| are all bounded by γe^{δ|x|}, for 1 ≤ i, j ≤ n. If, in addition, there exist positive constants α and β such that V satisfies

A_K V(x) ≤ −αV(x) + β,

then we call V Markovian-Liapunov (ML).

The next theorem is quite natural; a proof can be found in [11, Theorem 11.9.1].

Theorem 3.7. P (associated to (3.18)) is asymptotically stable if there exists an ML function V.
When P is asymptotically stable, we can determine the invariant density u_*; since u_* does not change in time, u_* is the unique density that satisfies the stationary special case of (3.18):

½ Σ_{i,j=1}^{n} ∂²(a_ij u_*)/∂x_i∂x_j − Σ_{i=1}^{n} ∂(b_i u_*)/∂x_i = 0.

Next we deal with the conditions under which P is sweeping; in this context it is understood that we are considering sweeping with respect to the family B_c of compact subsets of ℝⁿ. In other words, if for all A ∈ B_c and for all f ∈ D,

lim_{t→∞} ∫_A P_t f(x) dx = lim_{t→∞} ∫_A u(t, x) dx = 0,

then P is sweeping.
Definition 3.26. Let V ∈ C²(ℝⁿ) be positive and let there exist constants γ, δ such that V(x), |∂V(x)/∂x_i|, and |∂²V(x)/∂x_i∂x_j| are all bounded by γe^{δ|x|}. If, in addition, there exists a positive constant α such that V satisfies

A_K V(x) ≤ −αV(x),    (3.26)

then we call V a Bielecki function.

The proof of the next theorem can be found in [11, Theorem 11.11.1].

Theorem 3.8. P (associated to (3.18)) is sweeping if there exists a Bielecki function V.
Example 3.3.
One very simple example in one dimension is $(\sigma, -bx)$ with a random initial condition $X_0$ having density $f$, where $\sigma$ and $b$ are positive constants. We have already explicitly solved this; recall that the solution is
\[
X_t = e^{-bt}X_0 + \sigma\int_0^t e^{b(s-t)}\,dB_s.
\]
Trying to use Liapunov theory as before proves fruitless, as the trivial solution would require $\sigma := 0$. However, we can see that the expected value of this process at any time $t$ is $E(X_t) = e^{-bt}E(X_0)$, and that the variance is $V(X_t) = e^{-2bt}V(X_0) + \sigma^2\int_0^t e^{2b(s-t)}\,ds$; as time goes to infinity, $V(X_t)$ goes to $\sigma^2/(2b)$ and $E(X_t)$ goes to $0$. Thus we should see some kind of asymptotic stability, with a limiting density exhibiting the same kind of variance; a natural guess is a Gaussian density centered at zero with variance $\sigma^2/(2b)$.
Pick $V(x) = x^2$; observe that $V$ is ML, since
\[
A_K V(x) = \tfrac{1}{2}(\sigma^2)(2) + (-bx)(2x) \le -\alpha x^2 + \beta
\]
is satisfied when $\alpha := 2b$ and $\beta := \sigma^2$. Hence $P$ is asymptotically stable and the limiting density satisfies $A_{FP}u_* = 0$, or
\[
\tfrac{1}{2}\big(\sigma^2 u_*(x)\big)'' - \big({-bx}\,u_*(x)\big)' = 0,
\]
and this has solution
\[
u_*(x) = \sqrt{\frac{b}{\pi\sigma^2}}\; e^{-bx^2/\sigma^2}.
\]
Note that this is a normal density with expected value zero and variance $\sigma^2/(2b)$, which is consistent with our expectations.
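The heuristics above are easy to check numerically. The following sketch (not from the text; the parameter values $b = 1$, $\sigma = 0.5$ are arbitrary choices) runs an Euler-Maruyama discretization of $dX_t = -bX_t\,dt + \sigma\,dB_t$ from a non-stationary initial density and compares the empirical long-time mean and variance against the predicted limits $0$ and $\sigma^2/(2b)$:

```python
import numpy as np

# Euler-Maruyama sketch for dX_t = -b X_t dt + sigma dB_t.
# Parameters are arbitrary illustrative choices, not values from the text.
rng = np.random.default_rng(0)
b, sigma = 1.0, 0.5
dt, n_steps, n_paths = 0.01, 1_500, 20_000

x = rng.normal(0.0, 2.0, size=n_paths)      # some non-stationary initial density
for _ in range(n_steps):
    x += -b * x * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)

# Predicted limiting density: Gaussian with mean 0 and variance sigma^2 / (2b).
print(x.mean(), x.var(), sigma**2 / (2 * b))
```

By $t = 15$ the empirical variance agrees with $\sigma^2/(2b) = 0.125$ to within sampling and discretization error.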
Example 3.4.
To see how sweeping works, we study $dX_t = bX_t\,dt + \sigma\,dB_t$ with a random initial condition $X_0$ having density $f$, where $\sigma$ and $b$ are positive constants. Pick $V(x) = e^{-kx^2}$, for some positive constant $k$. To see if $V$ is a Bielecki function, we need to find a positive $\alpha$ such that
\[
\tfrac{1}{2}\sigma^2 e^{-kx^2}\big[(4k^2x^2) + (-2k)\big] + bx\,e^{-kx^2}(-2kx) \le -\alpha V(x).
\]
A bit of manipulation gives
\[
2k\Big((\sigma^2 k - b)x^2 - \frac{\sigma^2}{2}\Big) \le -\alpha,
\]
and we satisfy this if we take $k := b/\sigma^2$ and $\alpha := b$. Thus the semigroup is sweeping.
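What sweeping looks like can also be seen in simulation: the sketch below (assumed parameters and an arbitrary compact set, again via Euler-Maruyama) tracks the fraction of sample paths of $dX_t = bX_t\,dt + \sigma\,dB_t$ that remain in a fixed compact set; that fraction should decay to zero.

```python
import numpy as np

# Euler-Maruyama sketch for dX_t = b X_t dt + sigma dB_t (repelling drift).
# Parameters and the compact set [-5, 5] are arbitrary illustrative choices.
rng = np.random.default_rng(1)
b, sigma = 1.0, 0.5
dt, n_paths = 0.01, 20_000
x = rng.normal(0.0, 1.0, size=n_paths)

for step in range(1, 1201):
    x += b * x * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)
    if step % 400 == 0:
        frac = np.mean(np.abs(x) <= 5.0)    # mass still in the compact set
        print(f"t = {step * dt:4.1f}   P(|X_t| <= 5) ~ {frac:.4f}")
```

The printed fractions shrink toward zero, which is exactly the defining property of sweeping from compact sets.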
Roughly speaking, sweeping and asymptotic stability are the only possibilities; this is the so-called Foguel alternative [11, Theorem 11.12.1]:
Theorem 3.9. Let the hypotheses of Theorem 3.2 be satisfied, and let $P$ be the stochastic Frobenius-Perron semigroup associated to (3.18). Suppose all stationary nonnegative solutions to (3.18) take the form $c\,u_*(x)$, where $u_* > 0$ a.e. and $c$ is a nonnegative constant, and call
\[
I := \int_{\mathbb{R}^n} u_*(x)\,dx. \tag{3.27}
\]
If $I < \infty$, $P$ is asymptotically stable; if $I = \infty$, $P$ is sweeping.
This makes sense; some normalized version of $u_*$ would be the exact limiting density, provided $u_*$ had a finite integral.
We now give a template in one dimension for how to utilize the Foguel alternative. Consider
\[
dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t,
\]
where $a(x) = \sigma^2(x)$ and $b(x)$ are Cauchy-regular.
The stationary Fokker-Planck equation takes the form
\[
\tfrac{1}{2}\big(\sigma^2(x)u_*(x)\big)'' - \big(b(x)u_*(x)\big)' = 0,
\]
or, writing $z(x) = \sigma^2(x)u_*(x)$,
\[
\frac{dz}{dx} = \frac{2b(x)}{\sigma^2(x)}\,z + c_1,
\]
for $c_1$ a constant. Then, if $e^{\int_0^x B(y)\,dy}$ makes sense, where $B(y) := 2b(y)/\sigma^2(y)$, we get, for $c_2$ a constant,
\[
z(x) = e^{\int_0^x B(y)\,dy}\left(c_2 + c_1\int_0^x e^{-\int_0^y B(z)\,dz}\,dy\right).
\]
We only care about the a.e. positive stationary solutions for the application of the Foguel alternative, so it is enough to examine the sign of $c_2 + c_1\int_0^x e^{-\int_0^y B(z)\,dz}\,dy$ for almost every $x$.
If we assume that $xb(x) \le 0$ for all $|x| \ge r$, for $r$ a positive constant (so $[-r,r]$ is not repelling for trajectories of $\dot{x} = b(x)$), then (according to Maple) $\int_0^x e^{-\int_0^y B(z)\,dz}\,dy \to \pm\infty$ as $x \to \pm\infty$; this means $z$ cannot be positive for every $x$ unless $c_1 = 0$, and thus the stationary nonnegative solutions must take the form
\[
u_*(x) = \frac{c_2}{\sigma^2(x)}\,e^{\int_0^x B(y)\,dy}.
\]
We now need to check whether $\int_{\mathbb{R}} u_*(x)\,dx$ is finite or not, which is the same as checking whether
\[
I := \int_{-\infty}^{\infty} \frac{1}{\sigma^2(x)}\,e^{\int_0^x B(y)\,dy}\,dx \tag{3.28}
\]
is finite or not. If $I < \infty$, $P$ is asymptotically stable, and if $I = \infty$, $P$ is sweeping. We now summarize these results:
Corollary 3.1. Assume $a(x) = \sigma^2(x)$ and $b(x)$ are Cauchy-regular for $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$, and assume $xb(x) \le 0$ for all $|x| \ge r$, for $r$ a positive constant. Then if $I$ in (3.28) is finite, $P$ is asymptotically stable, and if $I$ in (3.28) is infinite, $P$ is sweeping.
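As a quick sanity check of the corollary, the sketch below evaluates (3.28) by quadrature for one hypothetical Cauchy-regular pair, $\sigma(x) \equiv 1$ and $b(x) = -x$ (my choice, not an example from the text), for which $B(y) = -2y$ and the integrand is $e^{-x^2}$:

```python
import numpy as np

# Hypothetical instance of Corollary 3.1: sigma(x) = 1, b(x) = -x,
# so B(y) = 2b(y)/sigma^2(y) = -2y and the integrand of (3.28) is exp(-x^2).
x = np.linspace(-10.0, 10.0, 200_001)
h = x[1] - x[0]
vals = np.exp(-x**2)                  # (1/sigma^2(x)) exp(int_0^x B(y) dy)
I = h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule

print(I)    # close to sqrt(pi): I is finite, so P is asymptotically stable
```

Here $I \approx \sqrt{\pi} < \infty$, so the corollary gives asymptotic stability, consistent with Example 3.3, of which this is the $b = 1$, $\sigma = 1$ case.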
Example 3.5.
Let $\sigma(x) := \sigma$ be a nonzero constant and let $b(x) = -\frac{Kx}{1+x^2}$, for $K \ge 0$ constant. Then
\[
\int_0^x B(y)\,dy = -\frac{1}{\sigma^2}\int_0^x \frac{2Ky}{1+y^2}\,dy = -\frac{K}{\sigma^2}\ln(1+x^2),
\]
and
\[
u_*(x) = C\,e^{-\frac{K}{\sigma^2}\ln(1+x^2)} = C\,(1+x^2)^{-K/\sigma^2}.
\]
We see $u_*$ is integrable iff $K/\sigma^2 > \frac{1}{2}$, which implies $P$ is asymptotically stable. Also, $0 \le K/\sigma^2 \le \frac{1}{2}$ implies $P$ is sweeping. In conclusion, the origin is attracting in the deterministic case, but in the stochastic case we can calculate the critical amount of noise needed to destroy the asymptotic stability.
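The critical exponent $K/\sigma^2 = \frac{1}{2}$ can also be seen numerically. The sketch below (the exponents $0.75$ and $0.40$ are arbitrary picks on either side of the threshold) computes truncated integrals of $(1+x^2)^{-K/\sigma^2}$ over growing windows; above the threshold the increments shrink, below it they keep growing:

```python
import numpy as np

# Truncated integrals of u_*(x) = (1 + x^2)^(-p), where p stands for K/sigma^2.
# The exponents 0.75 and 0.40 are illustrative picks on either side of p = 1/2.
def partial_integral(p, L, n=400_001):
    x = np.linspace(-L, L, n)
    h = x[1] - x[0]
    v = (1.0 + x**2) ** (-p)
    return h * (v.sum() - 0.5 * (v[0] + v[-1]))   # trapezoid rule

results = {}
for p in (0.75, 0.40):
    results[p] = [partial_integral(p, L) for L in (10.0, 100.0, 1000.0)]
    print(p, [round(v, 3) for v in results[p]])
```

For $p = 0.75$ the three values level off (the integral is finite: asymptotic stability), while for $p = 0.40$ they grow like $L^{0.2}$ (the integral diverges: sweeping).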
Example 3.6.
Let $b, \sigma$ be positive constants and reconsider the equation
\[
dX_t = -bX_t\,dt + \sigma X_t\,dB_t,
\]
with a random initial condition $X_0$ (so $b(x) := -bx$ and $\sigma(x) := \sigma x$).
We have already solved this explicitly and observed that, for any degenerate initial condition $X_0 = x$ a.s., the solution will go to zero as time goes to infinity. We also used a stochastic Liapunov function to deduce asymptotic stability. Note that we cannot apply the template; the necessary prerequisites for the template are not satisfied, since $a(x) = \sigma^2(x) = \sigma^2 x^2$ is not bounded by any constant $M$ and hence is not Cauchy-regular.
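Even though the template does not apply, the almost-sure decay is easy to illustrate from the explicit solution $X_t = X_0\exp\big({-(b + \sigma^2/2)t + \sigma B_t}\big)$; the sketch below (all parameters are arbitrary choices) samples $B_t \sim N(0,t)$ directly at a single large time:

```python
import numpy as np

# Exact sampling of X_t = X_0 exp(-(b + sigma^2/2) t + sigma B_t) at one time t.
# b, sigma, t and X_0 = 1 are arbitrary illustrative choices.
rng = np.random.default_rng(2)
b, sigma, t, n_paths = 1.0, 0.5, 30.0, 10_000
B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)   # B_t ~ N(0, t)
X_t = 1.0 * np.exp(-(b + 0.5 * sigma**2) * t + sigma * B_t)

print(X_t.max())    # even the largest of 10,000 sample paths is essentially zero
```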
3.6 Long-time Behavior of a Stochastic Predator-prey Model
This is a summary of "Long-time behaviour of a stochastic prey-predator model" by Rudnicki [16].
We consider the system
\[
dX_t = \sigma X_t\,dB_t + (\alpha X_t - \beta X_t Y_t - \mu X_t^2)\,dt, \tag{3.29}
\]
\[
dY_t = \rho Y_t\,dB_t + (-\gamma Y_t + \delta X_t Y_t - \nu Y_t^2)\,dt, \tag{3.30}
\]
which is a stochastic Lotka-Volterra predator-prey model. In [4], the existence of a solution to (3.29), (3.30) is proven. We interpret the (positive) constant coefficients in the following way: $\alpha$ is the growth rate of the prey in the absence of predators, $\beta$ is the "predation rate" that kills off the prey, and $\mu$ is inversely related to the "carrying capacity" of the prey, in that if the population grows too much, the environment cannot support further growth. We interpret $\gamma$ as the decay rate of the predator in the absence of prey and $\delta$ as the predation rate that causes predator growth. We may also think of $\nu$ as the "reciprocal carrying capacity" of the predator. Further, we interpret $\sigma, \rho$ as "noise terms" reflecting disease, weather fluctuations, and other effects that would interfere with an ideal model.
Suppose that $\sigma = \rho = 0$ in (3.29), (3.30), so that we are in the deterministic case. One can compute the equilibrium points: $(0,0)$, $(0, -\gamma/\nu)$, $(\alpha/\mu, 0)$, and $(\bar{x}, \bar{y})$, where
\[
\bar{x} = \frac{\alpha\nu + \beta\gamma}{\nu\mu + \beta\delta}, \qquad \bar{y} = \frac{\alpha\delta - \gamma\mu}{\nu\mu + \beta\delta}.
\]
We observe that $(0,0)$ is unstable, $(0,-\gamma/\nu)$ is biologically irrelevant, and $(\alpha/\mu, 0)$ yields 2 cases, namely, stability for $\gamma\mu > \alpha\delta$ and instability for $\gamma\mu < \alpha\delta$. Finally, $(\bar{x},\bar{y})$ yields 3 cases, namely, it lies in the fourth quadrant and is biologically irrelevant for $\gamma\mu > \alpha\delta$, lies in the first quadrant and is asymptotically stable for $\gamma\mu < \alpha\delta$, and lies on the $x$-axis for $\gamma\mu = \alpha\delta$.
So how does this relate to the stochastic case? Let us for now sacrifice technicality for intuition, and examine the terms
\[
c_1 = \alpha - \frac{\sigma^2}{2}, \qquad c_2 = \gamma + \frac{\rho^2}{2}.
\]
These are the "stochastic versions" of $\alpha$ and $\gamma$, respectively (which makes sense; very large fluctuations in disease, weather, etc., can significantly affect birth and death rates). Then conditions like "$\gamma\mu < (>)\;\alpha\delta$" become "$\mu c_2 < (>)\;\delta c_1$." We get something analogous in Rudnicki's Theorem 1, namely: if $c_1 < 0$, then the prey die, and so do the predators. If $c_1 > 0$ and $\mu c_2 > \delta c_1$, the predators' growth will be negative, and eventually the predators die out; if instead $\mu c_2 < \delta c_1$, then we obtain a "nice" result, in that the system reaches a desired level of stability. One can see how large noise in $c_1$ could reduce the prey's birth rate to below zero and hence cause extinction. Without this noise term or predators, the population would converge to a positive equilibrium, but with the noise term, "bad" environmental fluctuations cause extinction (even with no predators!). Similarly, the predators can die out if $\rho$ is too large, no matter how the prey act. The effect of incorporating the noise terms is in essence a decrease in the prey's birth rate and an increase in the predators' death rate. This is arguably a sensible refinement, as it is a little idealistic to think that very small populations will always survive; one must expect some role to be played by the unpredictability of nature.
So, equipped with the basic idea, we proceed to make the above more precise by formally stating Rudnicki's main theorem and outlining the strategy of the proof. First, transform (3.29), (3.30) by calling $X_t = e^{\xi_t}$ and $Y_t = e^{\eta_t}$, so we arrive at the main system
\[
d\xi_t = \sigma\,dB_t + \Big(\alpha - \frac{\sigma^2}{2} - \mu e^{\xi_t} - \beta e^{\eta_t}\Big)\,dt, \tag{3.31}
\]
\[
d\eta_t = \rho\,dB_t + \Big({-\gamma} - \frac{\rho^2}{2} + \delta e^{\xi_t} - \nu e^{\eta_t}\Big)\,dt. \tag{3.32}
\]
Let the solution process $(\xi_t, \eta_t)$ be such that the distribution of the initial value $(\xi_0, \eta_0)$ is absolutely continuous with density $v(x,y)$. Then $(\xi_t, \eta_t)$ has density $u(x,y,t)$, where $u$ satisfies the Fokker-Planck equation
\[
\frac{\partial u}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 u}{\partial x^2} + \sigma\rho\,\frac{\partial^2 u}{\partial x\,\partial y} + \frac{1}{2}\rho^2\frac{\partial^2 u}{\partial y^2} - \frac{\partial\big(f_1(x,y)u\big)}{\partial x} - \frac{\partial\big(f_2(x,y)u\big)}{\partial y}, \tag{3.33}
\]
where $f_1(x,y) = c_1 - \mu e^x - \beta e^y$, $f_2(x,y) = -c_2 + \delta e^x - \nu e^y$, and where $c_1 = \alpha - \frac{1}{2}\sigma^2$ and $c_2 = \gamma + \frac{1}{2}\rho^2 > 0$.
To verify this, it must be shown that the transition probability function for $(\xi_t, \eta_t)$, which we call $P(t,x,y,A)$, is absolutely continuous with respect to Lebesgue measure for each $(x,y)$ and $t > 0$. This means that the distribution of any solution is absolutely continuous and has density $u$ satisfying (3.33). This allows us to proceed by studying the "fluctuation of densities", using advanced techniques based on the section on Markov semigroup stability (see [14] and [15]). We now state the paper's main theorem (Theorem 1):
Let $(\xi_t, \eta_t)$ solve (3.31), (3.32). Then for all $t > 0$ the distribution of $(\xi_t, \eta_t)$ has a density $u(t,x,y)$ satisfying (3.33).
1) If $c_1 > 0$ and $\mu c_2 < \delta c_1$, then there is a unique density $u_*$ which is an asymptotically stable stationary solution of (3.33). This means that, no matter what the initial distribution of $(\xi_0, \eta_0)$ is, $(\xi_t, \eta_t)$ converges in distribution to a random variable with density $u_*$.
2) If $c_1 > 0$ and $\mu c_2 > \delta c_1$, then $\lim_{t\to\infty}\eta_t = -\infty$ a.s. and the distribution of $\xi_t$ converges weakly to the measure with density $f_*(x) = C\exp\big(2c_1 x/\sigma^2 - (2\mu/\sigma^2)e^x\big)$.
3) If $c_1 < 0$, then $\xi_t$ and $\eta_t$ go to $-\infty$ a.s. as $t$ goes to $\infty$.
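Case 3) can be illustrated by an Euler-Maruyama run of the transformed system (3.31), (3.32). All parameter values below are made-up choices arranged so that $c_1 = \alpha - \sigma^2/2 < 0$; both log-populations should then drift to $-\infty$:

```python
import numpy as np

# Euler-Maruyama sketch of (3.31), (3.32); all parameter values are invented,
# chosen so that c1 = alpha - sigma^2/2 < 0 (case 3 of the theorem).
rng = np.random.default_rng(3)
alpha, beta, mu = 0.5, 1.0, 1.0      # prey: growth, predation, crowding
gamma, delta, nu = 0.5, 1.0, 1.0     # predator: decay, predation gain, crowding
sigma, rho = 2.0, 1.0                # noise intensities, so c1 = 0.5 - 2.0 < 0
c1 = alpha - 0.5 * sigma**2

dt, n_steps, n_paths = 0.001, 20_000, 200
xi, eta = np.zeros(n_paths), np.zeros(n_paths)   # xi = log X, eta = log Y
for _ in range(n_steps):
    dB = np.sqrt(dt) * rng.normal(size=n_paths)  # one Brownian motion drives both
    dxi = sigma * dB + (c1 - mu * np.exp(xi) - beta * np.exp(eta)) * dt
    deta = rho * dB + (-gamma - 0.5 * rho**2 + delta * np.exp(xi) - nu * np.exp(eta)) * dt
    xi, eta = xi + dxi, eta + deta

frac_extinct = np.mean(xi < -10.0)   # fraction of prey paths with X_t < e^{-10}
print(c1, frac_extinct)
```

By $t = 20$ nearly every prey path satisfies $\xi_t < -10$, and the predator paths follow: the positive prey growth rate $\alpha$ is overwhelmed by the noise correction $\sigma^2/2$.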
We outline the proof of this theorem by lemmas, introducing notation as necessary. Call $P_t v(x,y) = u(t,x,y)$. Then $\{P_t\}$ is a Markov semigroup corresponding to (3.33) (write (3.33) as $\frac{\partial u}{\partial t} = \mathcal{A}u$; then $\mathcal{A}$ is the infinitesimal generator of $\{P_t\}$).
Lemma 1: $\{P_t\}_{t\ge 0}$ is an integral Markov semigroup with a continuous kernel $k$.
In fact, $k = k(t,x,y;x_0,y_0) \in C^\infty(\mathbb{R}_+\times\mathbb{R}^2\times\mathbb{R}^2)$ is the density of $P(t,x_0,y_0,\cdot)$, so that
\[
P_t v(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} k(t,x,y;\xi,\eta)\,v(\xi,\eta)\,d\xi\,d\eta \tag{3.34}
\]
is the integral representation of $\{P_t\}$. The Hörmander condition is verified to prove that a density exists.
We will need that $k$ is positive in order to apply some results of "Foguel alternative" type; the basic idea is to find some set that is an attractor and realize that $k$ is positive on that set (which is all that is needed). To this end, a method based on support theorems is introduced, and we get
Lemma 2: For each $(x_0,y_0) \in E$ and for almost every $(x,y) \in E$, there exists $T > 0$ such that $k(T,x,y;x_0,y_0) > 0$, where
i) $E = \mathbb{R}^2$ if $\sigma > \rho$ or $\beta\rho \ge \nu\sigma$,
ii) $E = E(M_0) = \{(x,y) : y < (\rho/\sigma)x + M_0\}$, where $M_0$ is the smallest number such that $(f_1,f_2)\cdot[-\rho,\sigma] \le 0$ for $(x,y) \notin E(M_0)$, if $\rho \ge \sigma$ and $\beta\rho < \nu\sigma$.
So, in the case of i) the invariant density $u_*$ is positive everywhere, while in the case of ii) we have a smaller support. If i) holds we can use the following result:
If an integral Markov semigroup has only one invariant density and that density is a.e. positive, then the semigroup is asymptotically stable. Also, if there is no invariant density, the semigroup is sweeping from compact sets (or simply "sweeping").
However, if ii) holds, the situation is more delicate, and we must ensure that, for any $f \in D$,
\[
\int_0^\infty P_t f\,dt > 0 \quad \text{a.e.} \tag{3.35}
\]
in order to conclude that the (integral Markov) semigroup is either asymptotically stable or sweeping (again, the Foguel alternative). In fact, in the case of ii) one can show
Lemma 3: In the situation of Lemma 2 ii),
\[
\lim_{t\to\infty}\iint_E P_t f(x,y)\,dx\,dy = 1. \tag{3.36}
\]
Now we have
Lemma 4: $\{P_t\}$ is either sweeping or asymptotically stable.
Of course, one would like to know which one is happening, so naturally the next result is
Lemma 5: If $c_1 > 0$ and $\mu c_2 < \delta c_1$, then $\{P_t\}$ is asymptotically stable.
The proof of this lemma relies upon the construction of a Khasminskii function, the existence of which precludes sweeping. This yields Theorem 1 i).
For Theorem 1 ii) and iii), recall that, for the equation $(\sigma, b)$ and its solution $X_t$, if we define
\[
s(x) = \int_0^x \exp\Big({-\int_0^y \frac{2b(r)}{\sigma^2(r)}\,dr}\Big)\,dy, \tag{3.37}
\]
then $s(-\infty) > -\infty$ and $s(\infty) = \infty$ imply $\lim_{t\to\infty} X_t = -\infty$. From this fact (and a bit of ergodic theory) it is simple to derive Lemmas 6 and 7, which are Theorem 1 iii) and ii), respectively.
Bibliography
[1] H. Amann, Ordinary Differential Equations, de Gruyter, Berlin & New York, 1990.
[2] L. Arnold, Random Dynamical Systems, Springer, Berlin & New York, 1998.
[3] H. Bauer, Probability Theory, de Gruyter, Berlin & New York, 1996.
[4] S. Chessa and H. Fujita Yashima, The stochastic equation of predator-prey population dynamics, Boll. Unione Mat. Ital. Sez. B Artic. Ric. Mat. 5 (2002), pp. 789-804.
[5] M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, Springer, New York, 1988.
[6] T. Gard, Introduction to Stochastic Differential Equations, Marcel Dekker Inc., New York, 1988.
[7] R. Z. Hasminskii, Stochastic Stability of Differential Equations, Alphen aan den Rijn, Netherlands, 1980.
[8] H. Koçak and J. Hale, Dynamics and Bifurcations, Springer-Verlag, New York, 1991.
[9] O. Kallenberg, Foundations of Modern Probability, Springer-Verlag, New York, 2002.
[10] I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus (Second Edition), Springer-Verlag, Berlin & New York, 1991.
[11] A. Lasota and M. Mackey, Chaos, Fractals and Noise, Springer-Verlag, New York, 1991.
[12] B. Øksendal, Stochastic Differential Equations (Second Edition), Springer-Verlag, Berlin & New York, 1989.
[13] S. Saperstone, Semidynamical Systems in Infinite Dimensional Spaces, Springer-Verlag, New York, 1981.
[14] K. Pichór and R. Rudnicki, Continuous Markov semigroups and stability of transport equations, J. Math. Anal. Appl. 249 (2000), pp. 668-685.
[15] R. Rudnicki, On asymptotic stability and sweeping for Markov operators, Bull. Polish Acad. Sci. Math. 43 (1995), pp. 245-262.
[16] R. Rudnicki, Long-time behaviour of a stochastic prey-predator model, Stoch. Process. Appl. 108 (2003), pp. 93-107.
[17] I. Vrabie, $C_0$-Semigroups and Applications, Elsevier, Boston, 2003.