Stochastic Differential Equations: A Dynamical Systems Approach

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Blane Jackson Hollingsworth

Certificate of Approval:

Georg Hetzer, Professor, Mathematics and Statistics
Paul Schmidt (Chair), Professor, Mathematics and Statistics
Ming Liao, Professor, Mathematics and Statistics
Wenxian Shen, Professor, Mathematics and Statistics
Joe F. Pittman, Interim Dean, Graduate School

Stochastic Differential Equations: A Dynamical Systems Approach

Blane Jackson Hollingsworth

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 10, 2008

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

Vita

Blane Hollingsworth was born in Huntsville, Alabama in 1976. His parents are Dianne and Sonny Hollingsworth. He attended the University of Alabama in Huntsville from 1994 to 2000, receiving both his B.S. and M.A. degrees in mathematics. In fall of 2000, he entered the Ph.D. program at Auburn University.

Dissertation Abstract

Stochastic Differential Equations: A Dynamical Systems Approach

Blane Jackson Hollingsworth

Doctor of Philosophy, May 10, 2008
(B.S., University of Alabama in Huntsville, 1998)
(M.A., University of Alabama in Huntsville, 2000)

121 Typed Pages

Directed by Paul Schmidt

The relatively new subject of stochastic differential equations has increasing importance in both theory and applications. The subject draws upon two main sources, probability/stochastic processes and differential equations/dynamical systems. There exists a significant "culture gap" between the corresponding research communities. The objective of the dissertation project is to present a concise yet mostly self-contained theory of stochastic differential equations from the differential equations/dynamical systems point of view, primarily incorporating semigroup theory and functional analysis techniques to study the solutions. Prerequisites from probability/stochastic processes are developed as needed. For continuous-time stochastic processes whose random variables are (Lebesgue) absolutely continuous, the Fokker-Planck equation is employed to study the evolution of the densities, with applications to predator-prey models with noisy coefficients.

Acknowledgments

No one deserves more thanks than Dr. Paul Schmidt for his patience and guidance throughout this endeavor. Dr. Georg Hetzer, Dr. Ming Liao, and Dr. Wenxian Shen are all deserving of thanks as well, as I have taken one or more important classes from each of them and they are members of my committee. Also, I'd like to thank Dr. Olav Kallenberg, who guided me through the chapters on stochastic differential equations in his book during independent study. Finally, I would like to thank my parents for all their love and support.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used: the document preparation package TeX (specifically LaTeX) together with the departmental style file aums.sty.

Table of Contents

1 Introduction and Preliminaries
  1.1 Stochastic Processes and Their Distributions
  1.2 Semigroups of Linear Operators
  1.3 Kernels and Semigroups of Kernels
  1.4 Conditional Expectation, Martingales, and Markov Processes
  1.5 Brownian Motion
2 Ito Integrals and Stochastic Differential Equations
  2.1 The Ito Integral
  2.2 Stochastic Differential Equations and their Solutions
  2.3 Ito's Formula and Examples
3 Dynamical Systems and Stochastic Stability
  3.1 "Stochastic Dynamical Systems"
  3.2 Koopman and Frobenius-Perron Operators: The Deterministic Case
  3.3 Koopman and Frobenius-Perron Operators: The Stochastic Case
  3.4 Liapunov Stability
  3.5 Markov Semigroup Stability
  3.6 Long-time Behavior of a Stochastic Predator-prey Model
Bibliography

Chapter 1. Introduction and Preliminaries

1.1 Stochastic Processes and Their Distributions

Let $(\Omega,\mathcal{A},P)$ be a probability space, $(S,\mathcal{B})$ a measurable space, and $X$ an $S$-valued random variable on $\Omega$, that is, a mapping from $\Omega$ into $S$ that is measurable with respect to the $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$. By the distribution of $X$, denoted by $P_X$, we mean the image of the probability measure $P$ under the mapping $X$, that is, the probability measure on $\mathcal{B}$ defined by $P_X(B) := P(X \in B) := P(X^{-1}(B))$ for $B \in \mathcal{B}$. (Here, as in the sequel, we take some liberties in our terminology. To be precise, we should of course refer to $X$ as an $(S,\mathcal{B})$-valued random variable on $(\Omega,\mathcal{A})$ and to $P_X$ as its $P$-distribution.)

Now let $T$ be a non-empty set and $X = (X_t)_{t\in T}$ a family of $S$-valued random variables on $\Omega$; we call $X$ a stochastic process on $\Omega$, with state space $S$ and index set $T$. Clearly, $X$ can be thought of as a mapping from $\Omega$ into the Cartesian product $S^T$, defined by $X(\omega) := X^\omega := (X_t(\omega))_{t\in T}$ for $\omega \in \Omega$. The image $X^\omega$ of a point $\omega \in \Omega$ is called the path of $\omega$; the set $S^T$, endowed with the product $\sigma$-algebra induced by $\mathcal{B}$, is called the path space of $X$. With slight abuse of notation, we denote the product $\sigma$-algebra of $S^T$ by $\mathcal{B}^T$. Since $\mathcal{B}^T$ is generated by the coordinate projections $\pi_t : S^T \to S$, defined by $\pi_t(x) := x_t$ for $x \in S^T$ and $t \in T$, and since $X_t = \pi_t \circ X$ for $t \in T$, measurability of $X$ with respect to the $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}^T$ is equivalent to the measurability of $X_t$ for every $t \in T$. In other words, $X$ is an $S^T$-valued random variable on $\Omega$. Its distribution $P_X$, a probability measure on $\mathcal{B}^T$, is called the joint distribution of the random variables $X_t$, $t \in T$.

It follows from a standard uniqueness theorem of measure theory that the probability measure $P_X$ is uniquely determined by the values
$$P_X\Bigl(\bigcap_{t\in F} \pi_t^{-1}(B_t)\Bigr) = P\Bigl(\bigcap_{t\in F} X_t^{-1}(B_t)\Bigr) = P(X_t \in B_t \ \forall\, t \in F),$$
where $F$ varies over the non-empty finite subsets of $T$ and $(B_t)_{t\in F}$ over the corresponding finite families of sets in $\mathcal{B}$. In particular, even if $T$ is infinite, the distribution of the family $(X_t)_{t\in T}$ is uniquely determined by the distributions of the "finite subfamilies" $(X_t)_{t\in F}$ with $\emptyset \neq F \subseteq T$ finite, that is, by the probability measures $Q_{t_1,\dots,t_n} := P_{(X_{t_1},\dots,X_{t_n})}$ with $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective (that is, $t_1,\dots,t_n$ are pairwise distinct); these are called the finite joint distributions of the random variables $X_t$, $t\in T$, or the finite-dimensional distributions of the process $X$. Note that for each $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective, $Q_{t_1,\dots,t_n}$ is a probability measure on the product $\sigma$-algebra $\mathcal{B}^n$ of $S^n$ induced by $\mathcal{B}$.

Clearly, if $B_1,\dots,B_n \in \mathcal{B}$ and $\sigma$ is a permutation of the set $\{1,\dots,n\}$, then
$$Q_{t_1,\dots,t_n}\bigl(B_{\sigma^{-1}(1)}\times\cdots\times B_{\sigma^{-1}(n)}\bigr) = P\bigl(X_{t_i} \in B_{\sigma^{-1}(i)}\ \forall\, i\in\{1,\dots,n\}\bigr) = P\bigl(X_{t_{\sigma(j)}} \in B_j\ \forall\, j\in\{1,\dots,n\}\bigr) = Q_{t_{\sigma(1)},\dots,t_{\sigma(n)}}\bigl(B_1\times\cdots\times B_n\bigr).$$
Also, if $n \ge 2$ and $B_n = S$, then
$$Q_{t_1,\dots,t_n}\bigl(B_1\times\cdots\times B_n\bigr) = P\bigl(X_{t_i} \in B_i\ \forall\, i\in\{1,\dots,n\}\bigr) = P\bigl(X_{t_i} \in B_i\ \forall\, i\in\{1,\dots,n-1\}\bigr) = Q_{t_1,\dots,t_{n-1}}\bigl(B_1\times\cdots\times B_{n-1}\bigr).$$

Under certain restrictions on the state space $S$, a theorem due to Kolmogorov ensures, roughly speaking, that any family of probability measures $Q_{t_1,\dots,t_n}$ consistent with the above conditions is in fact the family of finite-dimensional distributions of a stochastic process on some probability space $(\Omega,\mathcal{A},P)$. Recall that $(S,\mathcal{B})$ is a Polish space if $\mathcal{B}$ is the Borel $\sigma$-algebra generated by a complete and separable metric topology on $S$. Then we have

Theorem 1.1 (Kolmogorov). Suppose $(S,\mathcal{B})$ is a Polish space, $T$ is a non-empty set, and for each $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective, $Q_{t_1,\dots,t_n}$ is a probability measure on $\mathcal{B}^n$ (the product $\sigma$-algebra of $S^n$, which in this case coincides with the Borel $\sigma$-algebra generated by the product topology of $S^n$). Further suppose that the following two conditions are satisfied for all $n \in \mathbb{N}$, $(t_1,\dots,t_n) \in T^n$ injective, and $B_1,\dots,B_n \in \mathcal{B}$:

(a) If $\sigma$ is a permutation of $\{1,\dots,n\}$, then
$$Q_{t_1,\dots,t_n}\bigl(B_{\sigma^{-1}(1)}\times\cdots\times B_{\sigma^{-1}(n)}\bigr) = Q_{t_{\sigma(1)},\dots,t_{\sigma(n)}}\bigl(B_1\times\cdots\times B_n\bigr).$$

(b) If $n \ge 2$ and $B_n = S$, then
$$Q_{t_1,\dots,t_n}\bigl(B_1\times\cdots\times B_n\bigr) = Q_{t_1,\dots,t_{n-1}}\bigl(B_1\times\cdots\times B_{n-1}\bigr).$$

Then there exists a probability space $(\Omega,\mathcal{A},P)$, along with a family $X = (X_t)_{t\in T}$ of $S$-valued random variables on $\Omega$, such that $Q_{t_1,\dots,t_n} = P_{(X_{t_1},\dots,X_{t_n})}$ for all $n \in \mathbb{N}$ and $(t_1,\dots,t_n) \in T^n$ injective.

Note that while neither the probability space $(\Omega,\mathcal{A},P)$ nor the process $X$ is uniquely determined, the distribution $P_X$ is. We refer to [3, Section 35] for a detailed exposition of these issues and a proof of Kolmogorov's theorem (see in particular Theorem 35.3 and Corollary 35.4 ibidem). For most purposes, the distribution of a stochastic process is much more important than the process itself.

Definition 1.1. Two processes are called equivalent if both have the same distribution. This implies, of course, that both have the same state space and index set, but the underlying probability spaces may be different.

Definition 1.2. Two processes $X = (X_t)_{t\in T}$ and $Y = (Y_t)_{t\in T}$ over the same probability space $(\Omega,\mathcal{A},P)$, with the same state space and index set, are called modifications of each other if $P\bigl(\bigcup_{t\in T}\{X_t \neq Y_t\}\bigr) = 0$.

It is easily verified that any two processes that are modifications of each other have the same finite-dimensional distributions and are thus equivalent.
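Before continuing, a minimal numerical sketch of the consistency conditions (a) and (b) may be helpful (this is an illustration, not part of the original development; it assumes, purely for concreteness, the centered Gaussian family with covariance $\min(s,t)$, which Section 1.5 identifies as the finite-dimensional distributions of Brownian motion; the helper name Q is hypothetical):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def Q(ts, n_samples=200_000):
    """Sample the candidate finite-dimensional distribution
    Q_{t_1,...,t_k}: centered Gaussian, covariance min(t_i, t_j)."""
    ts = np.asarray(ts, dtype=float)
    cov = np.minimum.outer(ts, ts)
    return rng.multivariate_normal(np.zeros(len(ts)), cov, size=n_samples)

B1, B2 = (0.0, 1.0), (-0.5, 0.5)          # rectangles in R
inB = lambda x, B: (B[0] <= x) & (x <= B[1])

# Condition (b): Q_{t1,t2,t3}(B1 x B2 x S) should equal Q_{t1,t2}(B1 x B2).
s3, s2 = Q([0.5, 1.0, 2.0]), Q([0.5, 1.0])
p3 = np.mean(inB(s3[:, 0], B1) & inB(s3[:, 1], B2))  # third factor is all of S
p2 = np.mean(inB(s2[:, 0], B1) & inB(s2[:, 1], B2))
print("(b):", p3, "vs", p2)

# Condition (a): permuting the times and the rectangles accordingly.
sa = Q([1.0, 0.5])
pa = np.mean(inB(sa[:, 0], B2) & inB(sa[:, 1], B1))
print("(a):", p2, "vs", pa)
\end{verbatim}

Both printed pairs agree up to Monte Carlo error, as the consistency conditions require.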
Now suppose that $X$ is a stochastic process over $(\Omega,\mathcal{A},P)$, with state space $(S,\mathcal{B})$, index set $T$, and distribution $Q = P_X$. Then $X$ is equivalent to the process $\pi := (\pi_t)_{t\in T}$ on $(S^T,\mathcal{B}^T,Q)$. To see this, note that the coordinate projections $\pi_t$, for $t\in T$, are $S$-valued random variables on $S^T$ and that $\pi$, as a mapping from $S^T$ into $S^T$, is the identity map: $\pi(\omega) := \pi^\omega := (\pi_t(\omega))_{t\in T} = (\omega_t)_{t\in T} = \omega$ for all $\omega \in S^T$; hence, $Q_\pi = Q$.

Definition 1.3. The process $\pi$ as above is called the canonical process with distribution $Q$.

We can think of the canonical process as the standard representative of the equivalence class of stochastic processes with distribution $Q$. Using this terminology, the assertion of Theorem 1.1 may be stated as follows: there exists a unique probability measure $Q$ on $\mathcal{B}^T$ such that the given probability measures $Q_{t_1,\dots,t_n}$ coincide with the finite-dimensional distributions of the canonical process $\pi$ with distribution $Q$.

In the following we assume that the state space $(S,\mathcal{B})$ is Polish (as in Theorem 1.1) and that $X$ is a so-called continuous-time process, that is, the index set $T$ is $\mathbb{R}_+$. For each $\omega\in\Omega$, the path $X^\omega$ is then a curve in $S$, parametrized by $t\in\mathbb{R}_+$. If the curve $X^\omega$ is continuous for every (or $P$-almost every) $\omega\in\Omega$, we say that $X$ has continuous (or almost surely continuous) paths. If $X$ has almost surely continuous paths, an obvious and inconsequential modification of the underlying probability space will turn $X$ into a process with continuous paths. Also, any process with almost surely continuous paths admits a modification with continuous paths.

Saying that $X$ has continuous paths is equivalent to saying that $X$ maps $\Omega$ into the subspace $C := C(\mathbb{R}_+,S)$ of $S^{\mathbb{R}_+}$, that is, the subspace of continuous mappings from $\mathbb{R}_+$ into $S$. This space is, in general, not measurable as a subset of $S^{\mathbb{R}_+}$; in fact, $C \notin \mathcal{B}^{\mathbb{R}_+}$ unless $S$ is a singleton (see [3, Corollary 38.5]). However, $C$ is a Polish space under the topology of uniform convergence on compact subsets of $\mathbb{R}_+$, and the trace $\sigma$-algebra $C\cap\mathcal{B}^{\mathbb{R}_+} := \{C\cap B \mid B\in\mathcal{B}^{\mathbb{R}_+}\}$ coincides with the Borel $\sigma$-algebra generated by this topology. Also, $C$ inherits a topology from $S^{\mathbb{R}_+}$ (the product topology, which coincides with the topology of pointwise convergence on $\mathbb{R}_+$), and the trace $\sigma$-algebra $C\cap\mathcal{B}^{\mathbb{R}_+}$ coincides with the Borel $\sigma$-algebra generated by that topology as well (see [3, Theorem 38.6]).

Now suppose that $X$ is a continuous-time process with Polish state space $(S,\mathcal{B})$ and distribution $Q$ and that $X$ is equivalent to a process with continuous paths. Then $X$ is in fact equivalent to the process $\tilde\pi := (\pi_t|_C)_{t\in\mathbb{R}_+}$ on $(C, C\cap\mathcal{B}^{\mathbb{R}_+}, \tilde Q)$, where $C := C(\mathbb{R}_+,S)$ and $\tilde Q$ is defined by $\tilde Q(C\cap B) := Q(B)$ for $B\in\mathcal{B}^{\mathbb{R}_+}$. That $\tilde Q$ is well defined as a probability measure on the trace $\sigma$-algebra $C\cap\mathcal{B}^{\mathbb{R}_+}$ follows from the (non-trivial) fact that $Q(B) = 1$ for all $B\in\mathcal{B}^{\mathbb{R}_+}$ with $B \supseteq C$ (in other words, $C$ has $Q$-outer measure $1$). For the proof, we refer to [3, Sections 38-39], in particular Theorems 38.2-3 and Lemma 39.2 ibidem. To see that $X$ and $\tilde\pi$ are equivalent, observe that $\tilde\pi$, as a mapping from $C$ into $S^{\mathbb{R}_+}$, is simply the restriction of the identity map of $S^{\mathbb{R}_+}$ to $C$. Thus, $\tilde Q_{\tilde\pi}(B) = \tilde Q(\tilde\pi^{-1}(B)) = \tilde Q(C\cap B) = Q(B)$ for all $B\in\mathcal{B}^{\mathbb{R}_+}$; that is, $\tilde Q_{\tilde\pi} = Q$.

Definition 1.4. The process $\tilde\pi$ as above is called the $C$-canonical process with distribution $Q$.

Whenever an equivalence class of continuous-time processes with Polish state space contains a process with continuous paths, we think of the associated $C$-canonical process $\tilde\pi$ (rather than the canonical process $\pi$) as the standard representative of the equivalence class.
In the next two sections, we discuss semigroups, which will be used in the short term to prescribe a family of measures that satisfies Kolmogorov's theorem and hence allows us to construct Brownian motion.

1.2 Semigroups of Linear Operators

Let $X$ be a Banach space. A family $T := (T_t) = (T_t)_{t\in\mathbb{R}_+}$ of bounded linear operators $T_t : X\to X$ is called a semigroup of linear operators (or, more simply, a semigroup) if $T_0 = \mathrm{id}_X$ and $T_{t+s} = T_t T_s$ for all $t,s\in\mathbb{R}_+$. If $\lim_{t\to 0^+}\|x - T_t x\| = 0$ for all $x\in X$, then we say $T$ is strongly continuous. The infinitesimal generator (or, more simply, the generator) of a strongly continuous semigroup $T$ is the operator $A : D(A)\subseteq X\to X$ defined by
$$Ax := \lim_{t\to 0^+}\frac{T_t x - x}{t}$$
for all $x\in D(A)$, the set of $x\in X$ such that the limit exists. We say a semigroup $T$ is an $\omega$-contraction semigroup if, for some nonnegative constant $\omega$, $\|T_t\| \le e^{\omega t}$ for all $t\ge 0$, where $\|T_t\|$ is the operator norm of $T_t$. We say $T$ is a contraction semigroup if $\omega = 0$.

Call $C_b(\mathbb{R}^n,\mathbb{R})$ the space of bounded, continuous functions mapping $\mathbb{R}^n$ into $\mathbb{R}$, and call $C_0(\mathbb{R}^n,\mathbb{R})$ the subset of $C_b(\mathbb{R}^n,\mathbb{R})$ consisting of those $f$ such that $\lim_{|x|\to\infty} f(x) = 0$. Equip $C_0(\mathbb{R}^n,\mathbb{R})$ with the sup norm to make it a Banach space.

Definition 1.5. A contraction semigroup of linear operators $T$ on $C_0(\mathbb{R}^n,\mathbb{R})$ is called a Feller semigroup if
1. for every $t\ge 0$, $T_t$ maps $C_0(\mathbb{R}^n,\mathbb{R})$ into itself, and
2. $\lim_{t\to 0} T_t f(x) = f(x)$ for all $f\in C_0(\mathbb{R}^n,\mathbb{R})$ and $x\in\mathbb{R}^n$.

It can be shown ([9, Theorem 19.6]) that Feller semigroups are strongly continuous. Strong continuity is quite valuable due to the following theorem (see [17, Theorem 2.3.2]):

Theorem 1.2. Any strongly continuous semigroup $(G_t)$ with infinitesimal generator $A$ has the property that, for any $x\in D(A)$, $G_t x\in D(A)$ for all $t\in\mathbb{R}_+$, $t\mapsto G_t x$ is $C^1$, and
$$\frac{d}{dt}(G_t x) = A G_t x = G_t A x.$$

Put another way, $u : t\mapsto T_t x$ solves the initial value problem $\dot u = Au$, $u(0) = x$. So, formally, $u(t)$ should be of the form $e^{tA}x$, that is, $T_t x = e^{tA}x$, or $T_t = e^{tA}$. We would like to have a way to guarantee that a given operator $A$, generally unbounded, is indeed the generator of a strongly continuous semigroup.

Let $X$ be a normed linear space and $A : D(A)\subseteq X\to X$ a linear operator, and consider the equation
$$Ax = y.$$
To guarantee the existence and uniqueness of a solution $x\in D(A)$ for every $y\in X$, and the continuous dependence of $x$ on $y$, the operator $A$ must be one-to-one and onto, with a bounded inverse $A^{-1}$. Assuming $X$ to be complete and $A$ to be closed, the latter is automatic, by the open-mapping theorem. More generally, consider the equation
$$(A - \lambda I)x = y,$$
where $I := \mathrm{id}_X$ and $\lambda\in\mathbb{C}$. Existence, uniqueness, and continuous dependence are guaranteed if $\lambda$ belongs to the resolvent set of $A$, as defined below.

Definition 1.6. For a Banach space $X$ and a closed linear operator $A : D(A)\subseteq X\to X$, define $\rho(A)$, the resolvent set of $A$, by $\rho(A) := \{\lambda\in\mathbb{C} \mid A - \lambda I \text{ is one-to-one and onto}\}$. Then define $R(\lambda;A)$, the resolvent of $A$, by $R(\lambda;A) := (A - \lambda I)^{-1}$.

Theorem 1.3 (Hille-Yosida). For a Banach space $X$, a closed, densely defined linear operator $A : D(A)\subseteq X\to X$ is the infinitesimal generator of a strongly continuous semigroup of contractions if and only if
1. $(0,\infty)\subseteq\rho(A)$, and
2. for each $\lambda > 0$, $\|R(\lambda;A)\| \le \frac{1}{\lambda}$.

We refer to [17, pp. 51-56] for the proof.
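As a concrete illustration of these notions (a minimal numerical sketch, not part of the original text), consider the Gaussian convolution semigroup $T_t f = f * g_{0,t}$ on $C_0(\mathbb{R},\mathbb{R})$, which anticipates the Brownian semigroup of Section 1.5; the function name heat below is hypothetical, and the check is only up to quadrature and domain-truncation error:

\begin{verbatim}
import numpy as np

# Heat semigroup on a grid: (T_t f)(x) = integral of f(y) g_{0,t}(x - y) dy,
# where g_{0,t} is the centered Gaussian density with variance t.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def heat(f_vals, t):
    """Apply T_t to f (sampled on the grid x) by direct quadrature."""
    g = np.exp(-(x[:, None] - x[None, :])**2 / (2*t)) / np.sqrt(2*np.pi*t)
    return g @ f_vals * dx      # (T_t f)(x_i) ~ sum_j f(x_j) g(x_i - x_j) dx

f = np.exp(-x**2)               # a test function in C_0(R, R)

lhs = heat(f, 0.3 + 0.7)        # T_{s+t} f
rhs = heat(heat(f, 0.7), 0.3)   # T_s (T_t f)
print("semigroup defect:", np.max(np.abs(lhs - rhs)))   # ~ 0

# Contraction property: sup |T_t f| <= sup |f|.
print(np.max(np.abs(heat(f, 1.0))) <= np.max(np.abs(f)) + 1e-10)
\end{verbatim}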
In the next section, we present a discussion of semigroups of kernels with the construction of Brownian motion in mind.

1.3 Kernels and Semigroups of Kernels

Let $(\Omega_1,\mathcal{A}_1)$ and $(\Omega_2,\mathcal{A}_2)$ be given measurable spaces.

Definition 1.7. A function $k : \Omega_1\times\mathcal{A}_2\to\overline{\mathbb{R}}_+$ with the properties
1. $k(\cdot,A_2)$ is $\mathcal{A}_1$-measurable for all $A_2\in\mathcal{A}_2$,
2. $k(\omega,\cdot)$ is a (probability) measure on $\mathcal{A}_2$ for all $\omega\in\Omega_1$,
is called a (probability) kernel from $(\Omega_1,\mathcal{A}_1)$ to $(\Omega_2,\mathcal{A}_2)$.

We also call a probability kernel a Markov kernel, or say that the kernel is Markovian. Further, if $(\Omega_1,\mathcal{A}_1)$ equals $(\Omega_2,\mathcal{A}_2)$, we call $k$ a kernel on $(\Omega_1,\mathcal{A}_1)$, or simply a kernel on $\Omega_1$.

Let us establish some notation here: call $\mathcal{B}$ the Borel $\sigma$-algebra on $\mathbb{R}$, call $\mathcal{B}_+$ the Borel $\sigma$-algebra on $\mathbb{R}_+$, and call $\mathcal{B}^n$ the Borel $\sigma$-algebra on $\mathbb{R}^n$ for any $n\in\mathbb{N}$. Call $\lambda^n$ the Lebesgue measure on $(\mathbb{R}^n,\mathcal{B}^n)$; we may simply write $\lambda := \lambda^n$ when $n$ is understood. Given $x\in\mathbb{R}^n$, call $\delta_x$ the point mass at $x$, that is, the measure that satisfies $\delta_x(A) = 1$ if $x\in A$ and $\delta_x(A) = 0$ otherwise, for $A\in\mathcal{B}^n$.

Let $(\Omega,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and let $M(\Omega,\mathcal{A})$ denote the space of all $\mathbb{R}$-valued functions on $\Omega$ that are measurable with respect to $\mathcal{A}$ and $\mathcal{B}$. For $p\in[1,\infty)$, let $L^p(\Omega,\mathcal{A},\mu)$ denote the space of functions $f$ belonging to $M(\Omega,\mathcal{A})$ such that $|f|^p$ is integrable (with respect to $\mu$); call $L := L^1$. Let $L^\infty(\Omega,\mathcal{A},\mu)$ denote the space of functions $f$ belonging to $M(\Omega,\mathcal{A})$ such that the essential supremum of $|f|$ is finite. When the associated $\sigma$-algebra and measure are understood, we may abbreviate $L^p(\Omega)$ and $L^\infty(\Omega)$ for $L^p(\Omega,\mathcal{A},\mu)$ and $L^\infty(\Omega,\mathcal{A},\mu)$, respectively; we frequently understand $\mathbb{R}_+\times\Omega$ to have $\sigma$-algebra $\mathcal{B}_+\otimes\mathcal{A}$ and measure $\lambda_+\otimes\mu$ (where $\lambda_+$ is Lebesgue measure on $(\mathbb{R}_+,\mathcal{B}_+)$).

Given $f$ and $g$ in $M(\Omega,\mathcal{A})$, we say that $f$ is equivalent to $g$ (with respect to $\mu$) if $\mu(f\neq g) = 0$. Note that if $f$ is integrable and equivalent to $g$, then $g$ is also integrable. Denote the spaces of equivalence classes of $M(\Omega,\mathcal{A})$ and $L^p(\Omega,\mathcal{A},\mu)$ by $\mathcal{M}(\Omega,\mathcal{A},\mu)$ and $\mathcal{L}^p(\Omega,\mathcal{A},\mu)$, respectively; we frequently "identify" an equivalence class with an arbitrary member. Also, if $f\in M(\Omega,\mathcal{A})$ and $f$ is nonnegative, we say $f\in M_+(\Omega,\mathcal{A})$; we give the analogous meaning to $L^p_+$, $\mathcal{M}_+$, and $\mathcal{L}^p_+$.

Now, a kernel $k$ from $(\Omega_1,\mathcal{A}_1)$ to $(\Omega_2,\mathcal{A}_2)$ determines a mapping $K$ of $M_+(\Omega_2,\mathcal{A}_2)$ into $M_+(\Omega_1,\mathcal{A}_1)$, defined by
$$(Kf_2)(\omega_1) := \int f_2(\omega_2)\,k(\omega_1,d\omega_2)$$
for $\omega_1\in\Omega_1$ and $f_2\in M_+(\Omega_2,\mathcal{A}_2)$. Let us refer to $K$ as the integral operator associated with $k$. Note that for any $A_2\in\mathcal{A}_2$, $K1_{A_2} = k(\cdot,A_2)$. In particular, $K1_{\Omega_2} = 1_{\Omega_1}$ if and only if $k$ is Markovian.

Kernels may be composed in the following way: for $i = 1,2$, let $k_i$ be a kernel from $(\Omega_i,\mathcal{A}_i)$ to $(\Omega_{i+1},\mathcal{A}_{i+1})$. We may define the composition $k_1 k_2$ in terms of the composition of the associated integral operators $K_1$ and $K_2$:
$$(k_1 k_2)(\cdot,A_3) := K_1 K_2 1_{A_3}.$$
Then $k_1 k_2$ is a kernel from $(\Omega_1,\mathcal{A}_1)$ to $(\Omega_3,\mathcal{A}_3)$, and we have
$$(k_1 k_2)(\omega_1,A_3) = \int k_1(\omega_1,d\omega_2)\,k_2(\omega_2,A_3)$$
for all $\omega_1\in\Omega_1$ and $A_3\in\mathcal{A}_3$. Observe that if $k_1,k_2$ are Markovian, then so is $k_1 k_2$. We need the composition of kernels to define semigroups of kernels.

Definition 1.8. If $(P_t)$ is a family of kernels on a measurable space $(S,\mathcal{B})$ and if $P_{s+t} = P_s P_t$ for all $s,t\ge 0$, then we say $(P_t) := (P_t)_{t\in\mathbb{R}_+}$ is a semigroup of kernels on $S$.

We remark that a semigroup of kernels satisfies $P_{s+t}(x,B) = \int P_s(x,dy)\,P_t(y,B)$ for $x\in S$, $B\in\mathcal{B}$, often called the Chapman-Kolmogorov property.

Definition 1.9. A semigroup of kernels $(P_t)$ is called normal if $P_0(x,\cdot) = \delta_x$ for all $x\in S$. We call $(P_t)$ Markovian if each kernel $P_t$ is Markovian.

Now, let $(P_t)$ be a semigroup of kernels on $(\mathbb{R}^n,\mathcal{B}^n)$.

Definition 1.10. $(P_t)$ is called translation-invariant if $P_t(x,B) = P_t(x+z,B+z)$ for all $x,z\in\mathbb{R}^n$, $t\ge 0$, and $B\in\mathcal{B}^n$.

It can be shown that translation-invariant semigroups of kernels must be normal (see [3, 29.7 and p. 311]). The following proposition demonstrates the importance of these semigroups.

Proposition 1.1. Given a translation-invariant $(P_t)$ as above:

1. Define $T := (T_t)_{t\ge 0}$ by
$$T_t f = \int f(y)\,P_t(\cdot,dy), \qquad (1.1)$$
for any $t\in\mathbb{R}_+$ and $f\in L^\infty(\mathbb{R}^n)$.
Then $T$ is a contraction semigroup of linear operators on $L^\infty(\mathbb{R}^n)$.

2. Define $(\mu_t) := (\mu_t)_{t\ge 0}$ by $\mu_t(B) := P_t(0,B)$ for all $B\in\mathcal{B}^n$. Then $(\mu_t)$ is a convolution semigroup of measures on $\mathcal{B}^n$; that is, for all $s,t\ge 0$ and all $B\in\mathcal{B}^n$, $(\mu_t)$ satisfies
$$\mu_{s+t}(B) = \int \mu_s(dy)\,\mu_t(B-y) = (\mu_s * \mu_t)(B).$$

Proof. The second claim is simple, so we only address the first claim (which in fact does not require translation invariance of $(P_t)$). It remains to note that $T$ is indeed a semigroup; $(P_t)$ satisfies the Chapman-Kolmogorov property, and $T_0$ is the identity mapping since $(P_t)$ is normal:
$$T_0 f(x) = \int f(y)\,P_0(x,dy) = \int f(y)\,\delta_x(dy) = f(x).$$
Note that $\|T_t f\| \le \|f\|$ since $(P_t)$ is Markovian, so that $T$ is a contraction semigroup.

Conversely, given a convolution semigroup of measures $(\mu_t)$ on $\mathcal{B}^n$, if we define $P_t(x,B) := \mu_t(B-x)$ for all $t\ge 0$, $x\in\mathbb{R}^n$, $B\in\mathcal{B}^n$, then $(P_t)$ is a translation-invariant semigroup of kernels on $\mathbb{R}^n$ ([3, pp. 310-311]). Notice that $(P_t)$ is a translation-invariant Markov semigroup iff $(\mu_t)$ is a convolution semigroup of probability measures.

At this point, an intuitive interpretation of a translation-invariant Markov semigroup is helpful. Think of $P_t(x,B)$ as the probability that a randomly moving particle starting at $x$ at time $0$ is in the set $B$ at time $t$. The semigroup property means there is no memory, in the sense that we need not know the history of the particle's movement; rather, we only need to know where it is at time $t$ to obtain the probability that it is in some set at time $t+s$. Thinking of the particle as being "in $dy$" at time $t$, we can see from the Chapman-Kolmogorov property that $P_{t+s}(x,B) = \int P_t(x,dy)\,P_s(y,B)$: the probability that the particle is in a set $B$ at time $t+s$ can be obtained from $P_t(x,dy)$ (which we think of as the "present") and $P_s(y,B)$ (the probability that the particle starting at $y$ ends up in $B$ at time $s$). This semigroup reasoning is similar to concepts in deterministic dynamical systems, which will be discussed later.

Armed with this intuitive understanding of translation-invariant Markovian semigroups of kernels, we realize the next step: translation-invariant Markovian semigroups of kernels lead to measures which satisfy the hypotheses of Kolmogorov's theorem, and hence to the construction of stochastic processes (in particular, Brownian motion) which model random particle motion in a natural way. The idea is, if we take times $t_1 < t_2 < \cdots < t_k$ and sets $B_1,B_2,\dots,B_k$ in $\mathcal{B}^n$, we may construct the iterated integral
$$\int_{B_1}\int_{B_2}\cdots\int_{B_k} P_{t_k-t_{k-1}}(x_{k-1},dx_k)\,P_{t_{k-1}-t_{k-2}}(x_{k-2},dx_{k-1})\cdots P_{t_1}(x_0,dx_1). \qquad (1.2)$$
For a particle starting at $x_0$, this integral models random particle motion without memory, in the sense that it gives the probability that at times $t_1,t_2,\dots,t_k$ the particle is found successively in $B_1,B_2,\dots,B_k$. We could even, by tacking on another integral in (1.2), allow the particle's initial location to be random: let $\mu$ be a probability measure on $\mathcal{B}^n$ that describes the distribution of the initial location of the particle. Then we would integrate over $\mathbb{R}^n$ with respect to $\mu$ in the variable $x_0$:
$$\int_{\mathbb{R}^n}\int_{B_1}\cdots\int_{B_k} P_{t_k-t_{k-1}}(x_{k-1},dx_k)\cdots P_{t_1}(x_0,dx_1)\,\mu(dx_0). \qquad (1.3)$$
It can then be shown [3, 36.4] that, given $(P_t)$ and $\mu$ as above, and writing $x := (x_1,x_2,\dots,x_k)$, the measures $P_{t_1,t_2,\dots,t_k}$ on $\bigotimes_{i=1}^k \mathcal{B}_i$ (where $\mathcal{B}_i = \mathcal{B}^n$ for all $1\le i\le k$), defined by
$$P_{t_1,t_2,\dots,t_k}(B) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n}\cdots\int_{\mathbb{R}^n} 1_B(x)\,P_{t_k-t_{k-1}}(x_{k-1},dx_k)\cdots P_{t_1}(x_0,dx_1)\,\mu(dx_0), \qquad (1.4)$$
satisfy the hypotheses of Kolmogorov's theorem.
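A minimal Monte Carlo sketch of (1.2) and the Chapman-Kolmogorov property may make this concrete (an illustration only, assuming the Gaussian kernels $P_t(x,\cdot) = N(x,t)$ that will be prescribed in Section 1.5; the window sets are arbitrary choices):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
N = 500_000

def step(x, t):
    """One transition x -> P_t(x, .) for the Gaussian kernel N(x, t)."""
    return x + np.sqrt(t) * rng.standard_normal(x.shape)

# Iterated integral (1.2): particle starts at x0 = 0; times t1 < t2 and
# windows B1 = (-1, 1), B2 = (0, inf); estimate P(X_t1 in B1, X_t2 in B2).
t1, t2 = 0.5, 2.0
x1 = step(np.zeros(N), t1)              # distributed according to P_t1(0, .)
x2 = step(x1, t2 - t1)                  # then according to P_{t2-t1}(x1, .)
hit = (np.abs(x1) < 1.0) & (x2 > 0.0)
print("estimate of (1.2):", hit.mean())

# Chapman-Kolmogorov: P_t2(0, B2) = integral P_t1(0, dy) P_{t2-t1}(y, B2).
lhs = (step(np.zeros(N), t2) > 0).mean()
rhs = (x2 > 0).mean()
print(lhs, "~", rhs)                    # both ~ 1/2 here by symmetry
\end{verbatim}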
The family of measures in (1.4) are thus the finite-dimensional distributions of some stochastic process with state space $\mathbb{R}^n$. The canonical process $X$ associated with this stochastic process has a distribution which depends only on $(P_t)$ and $\mu$, so let us denote this distribution by $P^\mu$. This means that
$$P^\mu(X_{t_1}\in B_1, X_{t_2}\in B_2,\dots,X_{t_k}\in B_k) = \int_{\mathbb{R}^n}\int_{B_1}\int_{B_2}\cdots\int_{B_k} P_{t_k-t_{k-1}}(x_{k-1},dx_k)\cdots P_{t_1}(x_0,dx_1)\,\mu(dx_0)$$
holds for all $B_1,B_2,\dots,B_k$ in $\mathcal{B}^n$. Also, the $P^\mu$-distribution of $X_0$ is $\mu$, and so we may refer to $\mu$ as the initial distribution of the process.

Processes constructed as above enjoy some useful and intuitive properties.

Definition 1.11. A process $X$ with state space $(\mathbb{R}^n,\mathcal{B}^n)$ has stationary increments if there is a family of probability measures $(\mu_t)$ on $\mathcal{B}^n$ such that $\mu_{t-s} = P_{X_t - X_s}$; this means that the distribution of $X_t - X_s$ depends only on $t-s$.

Definition 1.12. A process $X$ with state space $(\mathbb{R}^n,\mathcal{B}^n)$ has independent increments if $X_{t_0}, X_{t_1}-X_{t_0},\dots,X_{t_k}-X_{t_{k-1}}$ are all independent for any $t_0,t_1,\dots,t_k\in\mathbb{R}_+$ with $t_0 < t_1 < \dots < t_k$, for any $k\ge 1$.

It can be shown ([3, 37.2]) that the canonical process derived from a translation-invariant Markov semigroup of kernels $(P_t)$ and initial distribution $\mu$ has stationary and independent increments.

In the next section, we will explain conditional probability, martingales, and Markov processes, and then we will be able to prescribe a particular $(P_t)$ so that we can construct Brownian motion.

1.4 Conditional Expectation, Martingales, and Markov Processes

Let $(\Omega,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and $f\in L(\Omega,\mathcal{A},\mu)$. Then for any $A\in\mathcal{A}$, we define
$$\mu_f(A) := \int_A f\,d\mu.$$
We say that $\mu_f$ is the signed measure that has density $f$ with respect to $\mu$; this implies the relation
$$\int g\,d\mu_f = \int gf\,d\mu$$
for all $g\in L(\Omega,\mathcal{A},\mu_f)$. Further, note that $g\in L(\Omega,\mathcal{A},\mu_f)$ iff $gf\in L(\Omega,\mathcal{A},\mu)$. Finally, note that $\mu_f$ is a finite signed measure on $(\Omega,\mathcal{A})$ that is absolutely continuous with respect to $\mu$, that is, $\mu_f(A) = 0$ whenever $A\in\mathcal{A}$ and $\mu(A) = 0$.

Conversely, given any finite signed measure $\nu$ on $(\Omega,\mathcal{A})$ that is absolutely continuous with respect to $\mu$, there exists by the Radon-Nikodym theorem a function $f\in L(\Omega,\mathcal{A},\mu)$, unique up to modification on a $\mu$-null set, such that $\nu = \mu_f$. The equivalence class of all $f$ such that $\nu = \mu_f$ is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ and is denoted $\frac{d\nu}{d\mu}$. Note that if $f$ is any representative of $\frac{d\nu}{d\mu}$, we have $\int g\,d\nu = \int gf\,d\mu$ for all $g\in L(\Omega,\mathcal{A},\nu)$, or (formally) $d\nu = f\,d\mu$; we frequently "identify" $\frac{d\nu}{d\mu}$ with an arbitrary representative. This justifies the "differential" notation $\frac{d\nu}{d\mu}$ for Radon-Nikodym derivatives.

We also have a number of rules for Radon-Nikodym derivatives that are reminiscent of the rules of differential calculus, for example, the chain rule: if $\nu_1$ is a finite signed measure on $(\Omega,\mathcal{A})$, if $\nu_2$ and $\nu_3$ are finite measures on $(\Omega,\mathcal{A})$, if $\nu_1$ is absolutely continuous with respect to $\nu_2$, and if $\nu_2$ is absolutely continuous with respect to $\nu_3$, then $\nu_1$ is absolutely continuous with respect to $\nu_3$, and
$$\frac{d\nu_1}{d\nu_3} = \frac{d\nu_1}{d\nu_2}\,\frac{d\nu_2}{d\nu_3}.$$
In particular, $\frac{d\nu_2}{d\nu_3}\,\frac{d\nu_3}{d\nu_2} = 1$ when $\nu_2$ and $\nu_3$ are mutually absolutely continuous.

Now, let $(\Omega,\mathcal{A},P)$ be a probability space, let $\xi\in L(\Omega,\mathcal{A},P)$, and let $\mathcal{F}\subseteq\mathcal{A}$ be a $\sigma$-algebra. Then $P_\xi$, the signed measure that has density $\xi$ with respect to $P$, restricts to a signed measure on $\mathcal{F}$, namely $P_\xi|_\mathcal{F}$, which is absolutely continuous with respect to $P|_\mathcal{F}$, the restriction of $P$ to $\mathcal{F}$. This leads to the definition of conditional expectation.

Definition 1.13. The conditional expectation of $\xi$ given $\mathcal{F}$, denoted $E^\mathcal{F}\xi$
or $E(\xi|\mathcal{F})$, is the Radon-Nikodym derivative of $P_\xi|_\mathcal{F}$ with respect to $P|_\mathcal{F}$.

Note that $E(\xi|\mathcal{F})$ is the unique member of $\mathcal{L}(\Omega,\mathcal{F},P|_\mathcal{F})$ such that
$$\int_F E(\xi|\mathcal{F})\,dP = \int_F \xi\,dP$$
for all $F\in\mathcal{F}$.

Definition 1.14. The expected value of $\xi$, denoted by $E\xi$, is defined as
$$E\xi := \int_\Omega \xi\,dP.$$
The conditional expectation of $\xi$ given an event $A\in\mathcal{A}$ with $P(A) > 0$, denoted by $E(\xi|A)$, is defined as
$$E(\xi|A) := \frac{1}{P(A)}\int_A \xi\,dP.$$

It is helpful to consider examples. If $\mathcal{F}$ is the $\sigma$-algebra induced by $\xi$, or if $\xi$ has an $\mathcal{F}$-measurable version, then $E^\mathcal{F}\xi = \xi$ $P$-a.s.; in this case we have the "pull out" property $E^\mathcal{F}(\xi\eta) = \xi\,E^\mathcal{F}\eta$ $P$-a.s., for any $\eta\in L(\Omega)$. If instead $\mathcal{F} = \{\emptyset,\Omega\}$, or if $\xi$ is independent of $\mathcal{F}$, then $E^\mathcal{F}\xi = E\xi$ $P$-a.s. Along these lines, for $A\in\mathcal{A}$ such that $0 < P(A) < 1$, if we take $\mathcal{F} = \{\emptyset,A,A^c,\Omega\}$, then $E^\mathcal{F}\xi = E(\xi|A)1_A + E(\xi|A^c)1_{A^c}$ $P$-a.s. Also, if $A\in\mathcal{F}$, $P(A) > 0$, and $A$ has no proper nonempty subset belonging to $\mathcal{F}$, then $E^\mathcal{F}\xi|_A = E(\xi|A)$ $P$-a.s.

We use conditional expectation to define conditional probability; observe that $E1_A = \int_\Omega 1_A\,dP = P(A)$.

Definition 1.15. The conditional probability given $\mathcal{F}$, denoted $P^\mathcal{F}$, is defined by
$$P^\mathcal{F}(A) := E^\mathcal{F}(1_A)$$
for all $A\in\mathcal{A}$.

Note that $P^\mathcal{F}$ is not a probability measure; rather, it maps members of $\mathcal{A}$ into $\mathbb{R}$-valued random variables on $\Omega$, with the property that $\int_F P^\mathcal{F}(A)\,dP = P(A\cap F)$ for all $F\in\mathcal{F}$.

We will use conditional expectation to define martingales, but first we need some definitions.

Definition 1.16. Given a measurable space $(\Omega,\mathcal{A})$, a family of $\sigma$-algebras $\mathbb{F} := \{\mathcal{F}_t\}_{t\ge 0}$ such that $\mathcal{F}_s\subseteq\mathcal{F}_t$ for $s\le t$, with $\mathcal{F}_t\subseteq\mathcal{A}$ for all $t\ge 0$, is called a filtration of $\mathcal{A}$. For simplicity we usually just call $\mathbb{F}$ a filtration.

Now let $X$ be a continuous-time stochastic process on $(\Omega,\mathcal{A})$ with state space $(S,\mathcal{B})$.

Definition 1.17. We call $\sigma(X_s \mid s\le t)$ the $\sigma$-algebra generated by $(X_s)_{s\le t}$, that is, the smallest $\sigma$-algebra that contains $X_s^{-1}(B)$ for every $B\in\mathcal{B}$ and $s\le t$.

We call $\mathbb{F}(X) := (\mathcal{F}_t(X))_{t\in\mathbb{R}_+}$ the filtration generated (or induced) by $X$, where $\mathcal{F}_t(X) = \sigma(X_s \mid s\le t)$ for each $t$. We say $X$ is adapted to a filtration $\mathbb{F}$ if $X_t$ is $\mathcal{F}_t$-measurable for all $t$. Observe that $\mathbb{F}(X)$ is the smallest filtration to which $X$ is adapted. If $\mathbb{F}$ is understood, we may simply say that $X$ is adapted.

We see that filtrations add more and more sets (or at least, no fewer sets) as time increases; by increasing the size of a $\sigma$-algebra, the potential for the process to take new values is increased. For example, a measurable $\mathbb{R}$-valued function on $\Omega$ that takes only one value generates only the trivial $\sigma$-algebra $\{\emptyset,\Omega\}$. A measurable function taking two values, say $f(\omega) = 1$ when $\omega\in A$ and $f(\omega) = 0$ when $\omega\in A^c$, generates the $\sigma$-algebra $\{\emptyset,A,A^c,\Omega\}$. Thus the growth of the filtration describes the "increase of randomness," and the size of $\mathcal{F}_t$ is indicative of the possible deviation of $X_t$ from its expected value.

Now let $X$ have state space $(\mathbb{R},\mathcal{B})$.

Definition 1.18. We say $X$ is an integrable process, or simply that $X$ is integrable, if $X_t$ is an integrable random variable for each $t$. Given a filtration $\mathbb{F}$, we say $X$ is a martingale with respect to $\mathbb{F}$ if $X$ is an integrable, adapted process that satisfies, $P$-a.s.,
$$X_s = E(X_t|\mathcal{F}_s)$$
for every $t$ and every $s\le t$.

For an example of a martingale, fix $\xi\in L(\Omega,\mathcal{A},P)$ and a filtration $\mathbb{F}$. Define a continuous-time process $M$ by
$$M_t = E(\xi|\mathcal{F}_t)$$
for every $t$. Then $M$ is integrable and adapted to $\mathbb{F}$. For $s\le t$, we have $\mathcal{F}_s\subseteq\mathcal{F}_t$, so $E^{\mathcal{F}_s}E^{\mathcal{F}_t}\xi = E^{\mathcal{F}_s}\xi$ $P$-a.s. Thus we have the $P$-a.s. relation
$$M_s = E(\xi|\mathcal{F}_s) = E(E(\xi|\mathcal{F}_t)|\mathcal{F}_s) = E(M_t|\mathcal{F}_s).$$
Thus, $M$ is a martingale.
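This example can be made completely concrete in a finite setting (a small sketch, not part of the original text; the choice of $\xi$ below is hypothetical): for three fair $\pm 1$ coin flips $X_1,X_2,X_3$ and $\xi = f(X_1,X_2,X_3)$, the conditional expectation $M_k = E(\xi|\mathcal{F}_k)$ is the average of $\xi$ over the unseen flips, and the tower property $E(M_{k+1}|\mathcal{F}_k) = M_k$ can be checked exactly:

\begin{verbatim}
from itertools import product

# All 8 equally likely outcomes of three fair +-1 coin flips.
outcomes = list(product([-1, 1], repeat=3))

def xi(w):
    """A hypothetical integrable random variable xi = f(X1, X2, X3)."""
    return w[0] + w[0]*w[1] + w[1]*w[2]

def M(k, prefix):
    """M_k = E(xi | F_k): average of xi over all outcomes extending
    the observed prefix (the first k flips)."""
    ext = [w for w in outcomes if w[:k] == prefix]
    return sum(xi(w) for w in ext) / len(ext)

# Martingale check: E(M_{k+1} | F_k) = M_k on every atom of F_k.
for k in range(3):
    for prefix in product([-1, 1], repeat=k):
        avg_next = sum(M(k+1, prefix + (x,)) for x in [-1, 1]) / 2
        assert abs(avg_next - M(k, prefix)) < 1e-12
print("M_k = E(xi | F_k) is a martingale on this example.")
\end{verbatim}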
Intuitively, martingales are "fair games," in the sense that the expected value of the "winnings" at a later time is exactly the value of the "winnings" at present.

Next, let $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{G}$ be sub-$\sigma$-algebras of $\mathcal{A}$.

Definition 1.19. The $\sigma$-algebras $\mathcal{F}_1$ and $\mathcal{F}_2$ are called conditionally independent given $\mathcal{G}$, denoted $\mathcal{F}_1 \perp_\mathcal{G} \mathcal{F}_2$, if, a.s., $P^\mathcal{G}(F_1\cap F_2) = P^\mathcal{G}(F_1)P^\mathcal{G}(F_2)$ for all $F_1\in\mathcal{F}_1$, $F_2\in\mathcal{F}_2$.

We now define Markov processes.

Definition 1.20. For $X$ a continuous-time process on $(\Omega,\mathcal{A})$ and a filtration $\mathbb{F}$ of $\mathcal{A}$, we call $X$ a Markov process if it is adapted to $\mathbb{F}$ and if for all $s,t\in\mathbb{R}_+$ with $s\le t$, $\mathcal{F}_s$ and $\sigma(X_t)$ are conditionally independent given $\sigma(X_s)$.

Intuitively, for Markov processes one may think that the past is independent of the future given the present, in the sense that knowing the state $X_s$ makes the future predictions of $X_t$ independent of the "history" $\mathcal{F}_s$.

Markov processes are precisely those processes which are generated by translation-invariant Markovian semigroups of kernels (with respect to the induced filtration; the non-trivial proof can be found in [3, Theorem 42.3]). Since the semigroup property is essential both to dynamical systems and to the construction of a Markov process, one can interpret a Markov process as a randomized dynamical system. As we will see, Markov processes are of value in understanding the dynamics generated by solutions of stochastic differential equations (much like the dynamics of deterministic differential equations).

In the next section, we will motivate the need for Brownian motion and prescribe a special translation-invariant Markovian semigroup of kernels in order to construct it. We will further prove some useful properties of Brownian motion.

1.5 Brownian Motion

We will now prescribe the specific Markov semigroup of kernels $(P_t)$ used to construct Brownian motion. We first motivate our selection of $(P_t)$ by returning to our intuition of how particles undergo random motion.

Consider the "drunken sailor" problem, where a drunken sailor stands at the origin $0$ and starts taking unit-length steps in random directions. After each step, he randomly steps in a different direction. The question is, "Where does he end up after $n$ steps?" The obvious answer is that we do not know; his position is described by a random variable. He is expected to be where he started, as he has the same chance of going left as right, or forward as backward. But the variance depends directly on the number of steps; he cannot stray far in a small number of steps, for example, so one could expect a low variance in this case. So what is the distribution of this random variable?

The key is the Central Limit Theorem; one fairly simple version is in [9, Proposition 5.9], which says that for independent, identically distributed $\mathbb{R}^d$-valued random variables $\xi,\xi_1,\xi_2,\dots$ with $E\xi = 0$ and $E|\xi|^2 = 1$, as $n\to\infty$, $n^{-1/2}\sum_{k\le n}\xi_k$ converges in distribution to a standard normally distributed random variable $\zeta$, that is, a normally distributed random variable with mean $0$ and variance $1$. We may say for brevity that $\zeta$ is $N(0,1)$. So, in the drunken sailor problem, the random variable describing where the sailor ends up after a large enough number of steps is approximately normal with mean $0$ and variance $n$ (see e.g. [3, pp. 221-226]).
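A brief simulation sketch of the drunken sailor (illustrative only; the rescaling by $\sqrt{n/2}$ reflects the assumption of planar unit steps, each coordinate of which has variance $1/2$):

\begin{verbatim}
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n_steps, n_walkers = 300, 20_000

# The drunken sailor: unit-length steps in uniformly random directions.
theta = rng.uniform(0.0, 2*np.pi, size=(n_walkers, n_steps))
x = np.cos(theta).sum(axis=1)       # x-coordinate after n steps
y = np.sin(theta).sum(axis=1)       # y-coordinate after n steps

print("mean position:", x.mean(), y.mean())              # ~ (0, 0)
print("E|S_n|^2:", (x**2 + y**2).mean(), "vs n =", n_steps)

# CLT: each coordinate, rescaled by its standard deviation sqrt(n/2),
# is approximately N(0, 1); compare a tail probability with the exact one.
z = x / sqrt(n_steps / 2)
print("P(z > 1):", (z > 1).mean(), "vs", 0.5*(1 - erf(1/sqrt(2))))
\end{verbatim}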
Now, one can think of $n$ as time moving continuously rather than as a discrete number of steps; call it $t$ now. So, imagine a continuous-time stochastic process $X$ having initial distribution $\delta_x$. This represents the initial location of a particle at $x$, known with probability $1$, where the densities of $X_t$ (assuming they exist) "flatten" into a Gaussian curve as time increases, successively getting "flatter" the more time passes.

Imagine now that the sailor is not drunk, but in a heavy crowd, so that he is being pushed around in random directions. This is essentially the same problem, but it makes more sense as a physical interpretation: particles are interacting with other particles, being bumped into other particles, which in turn bump into other particles, ad infinitum. This model of particle movement is called Brownian motion, and it is a stochastic process where, at time $t$, each random variable $B_t$ has distribution $N(0,t)$.

This type of random interference can be thought to perturb a trajectory as well, not just a stationary object. For example, if a ball is thrown, one can model its path. But now suppose there is a lot of wind blowing in random directions; where does the ball go? To describe this, we incorporate a "noise term" in the differential equation. Quite sensibly, this term should somehow be based on Brownian motion, which changes the otherwise deterministic trajectory of the ball into a continuous-time stochastic process.

Recall that $N(m,t)$ as a probability measure over $(\mathbb{R},\mathcal{B})$ has (Lebesgue) density
$$g_{m,t}(x) := \Bigl(\frac{1}{2\pi t}\Bigr)^{1/2} e^{-\frac{(x-m)^2}{2t}},$$
and observe that $N(0,s) * N(0,t) = N(0,s+t)$. Define the Brownian convolution semigroup of measures $(\mu_t)$ on $\mathbb{R}^d$ by setting $\mu_t$ equal to the $d$-fold product measure of $N(0,t)$ on $\mathbb{R}$ for each $t$, that is, $\mu_t := \bigotimes_{i=1}^d N(0,t)$. Then we can define the translation-invariant Markov semigroup of kernels $(P_t)$ by
$$P_t(x,A) := \int_A \Bigl(\frac{1}{2\pi t}\Bigr)^{d/2} e^{-\frac{|y-x|^2}{2t}}\,dy$$
for any $t > 0$ (with $P_0(x,\cdot) := \delta_x$), $x\in\mathbb{R}^d$, and $A\in\mathcal{B}^d$.
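A minimal sampling sketch (illustrative only): paths of the process built from these kernels are generated by successive independent Gaussian increments, which is exactly how the iterated integral (1.2) concatenates the one-step kernels; the covariance structure can be checked empirically.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n_paths, dt, n_steps = 50_000, 0.01, 200

# Each step draws from P_dt(x, .): a Gaussian increment N(0, dt) added
# to the current position; cumulative sums give paths on the grid k*dt.
incs = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
B = np.cumsum(incs, axis=1)

s_idx, t_idx = 49, 149                  # s = 0.5, t = 1.5
Bs, Bt = B[:, s_idx], B[:, t_idx]
print("Var(B_t):", Bt.var(), "vs t =", (t_idx + 1)*dt)
print("Cov(B_s, B_t):", np.cov(Bs, Bt)[0, 1],
      "vs min(s,t) =", (s_idx + 1)*dt)
# Independent increments: B_s and B_t - B_s are uncorrelated.
print("corr(B_s, B_t - B_s):", np.corrcoef(Bs, Bt - Bs)[0, 1])   # ~ 0
\end{verbatim}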
B as above is deflned to be a Brownian motion. B is unique (up to equivalence to another C-canonical process); in another way, we may interpret Brownian motion to be a probability measure PB0 (called Wiener measure) on the path space (C(R+;Rn);B(C(R+;Rn)). By construction, Brownian motion is a Markov process; it is easy to see that one- dimensional Brownian motion is also a martingale (with respect to the induced flltration) 26 since, a.s., E(BtjFs) = E(Bt +Bs ?BsjFs) = E(BsjFs)+E(Bt ?BsjFs) = Bs +E(Bt ?Bs) = Bs; since B has independent increments, E(Bt ?Bs) = E(Bt)?E(Bs) = 0, and EF(X) = X a.s. when X is an F-measurable random variable. This means B has stationary increments as well, as we argued in the section on kernels. Since ?0 = ?x, we sometimes write Bx instead of B to emphasize the starting point, and hence we sometimes refer to Bx as a Brownian motion starting at x; if otherwise not stated, we assume the Brownian motion starts at zero. Now we observe that the variance of Bxt ?Bxs := Bs ?Bt is t?s. This is because var(Bt ?Bs) = var(Bt?s ?x) = E[(Bt?s ?x)2]?E(Bt?s ?x)2 = E[B2t?s ?xBt?s +x2]?0 = E[B2t?s]?x2 +x2 = t?s; since Bt has variance t for any t. In the next chapter, we will see how to integrate with respect to a Brownian motion; this will prove essential to the deflnition of a stochastic difierential equation. 27 Chapter 2 Ito Integrals and Stochastic Differential Equations 2.1 The Ito Integral Let (?;A;P) be a probability space and X a continuous-time, real-valued stochastic process on ?. Assuming that the paths of X are difierentiable, we can deflne the time- derivative _X of X by _X(t;!) := d dtX !(t); for t 2R+ and ! 2 ?. It is easy to see that the mappings _Xt = _X(t;?) are measurable for all t 2R+, so that _X is a stochastic process. Unfortunately, difierentiability of the paths is a very restrictive assumption. For example, the paths of a one-dimensional Brownian motion B on ? are continuous but nowhere difierentiable [10, Theorem 2.9.18]. Thus, the time- derivative _B of B, frequently referred to as \white noise," does not exist in the na??ve sense. Nevertheless, \white noise" plays an important role in the theory of stochastic difierential equations. By way of motivation, consider a simple scalar ODE, _x = r(x)x for a function x : R+ ! R, where r : R ! R is a given, su?ciently smooth function. We can interpret x(t), for t 2R+, as the density of a population at time t, in which case r(x) represents the per-capita growth rate of the population as a function of its density. The growth rate of any real population is subject to random uctuations; to model these, we 28 would like to add \white noise" to the function r. On a purely formal level, this idea leads to a \stochastic difierential equation" of the form _X = ?r(X)+W?X; (2.1) where W = _B is the (formal) time-derivative of a Brownian motion B. The \solutions" of this \stochastic difierential equation" should, of course, be continuous-time stochastic processes X rather than functions x : R+ ! R. Since already the \antiderivatives" of W (one-dimensional Brownian motions) have nowhere difierentiable paths, we cannot hope to flnd stochastic processes X that satisfy (2.1) in the na??ve sense, that is, d dtX !(t) = ?r(X!(t))+W!(t)?X!(t) (2.2) for all t 2R+ and ! 2 ?, where (?;A;P) is the underlying probability space; instead, we have to develop a notion of \weak" or \generalized" solutions of (2.1). 
Chapter 2. Ito Integrals and Stochastic Differential Equations

2.1 The Ito Integral

Let $(\Omega,\mathcal{A},P)$ be a probability space and $X$ a continuous-time, real-valued stochastic process on $\Omega$. Assuming that the paths of $X$ are differentiable, we can define the time-derivative $\dot X$ of $X$ by
$$\dot X(t,\omega) := \frac{d}{dt}X^\omega(t)$$
for $t\in\mathbb{R}_+$ and $\omega\in\Omega$. It is easy to see that the mappings $\dot X_t = \dot X(t,\cdot)$ are measurable for all $t\in\mathbb{R}_+$, so that $\dot X$ is a stochastic process. Unfortunately, differentiability of the paths is a very restrictive assumption. For example, the paths of a one-dimensional Brownian motion $B$ on $\Omega$ are continuous but nowhere differentiable [10, Theorem 2.9.18]. Thus, the time-derivative $\dot B$ of $B$, frequently referred to as "white noise," does not exist in the naive sense. Nevertheless, "white noise" plays an important role in the theory of stochastic differential equations.

By way of motivation, consider a simple scalar ODE, $\dot x = r(x)x$, for a function $x : \mathbb{R}_+\to\mathbb{R}$, where $r : \mathbb{R}\to\mathbb{R}$ is a given, sufficiently smooth function. We can interpret $x(t)$, for $t\in\mathbb{R}_+$, as the density of a population at time $t$, in which case $r(x)$ represents the per-capita growth rate of the population as a function of its density. The growth rate of any real population is subject to random fluctuations; to model these, we would like to add "white noise" to the function $r$. On a purely formal level, this idea leads to a "stochastic differential equation" of the form
$$\dot X = \bigl(r(X) + W\bigr)X, \qquad (2.1)$$
where $W = \dot B$ is the (formal) time-derivative of a Brownian motion $B$. The "solutions" of this "stochastic differential equation" should, of course, be continuous-time stochastic processes $X$ rather than functions $x : \mathbb{R}_+\to\mathbb{R}$. Since already the "antiderivatives" of $W$ (one-dimensional Brownian motions) have nowhere differentiable paths, we cannot hope to find stochastic processes $X$ that satisfy (2.1) in the naive sense, that is,
$$\frac{d}{dt}X^\omega(t) = \bigl(r(X^\omega(t)) + W^\omega(t)\bigr)X^\omega(t) \qquad (2.2)$$
for all $t\in\mathbb{R}_+$ and $\omega\in\Omega$, where $(\Omega,\mathcal{A},P)$ is the underlying probability space; instead, we have to develop a notion of "weak" or "generalized" solutions of (2.1).

The first step, still on a purely formal level, is to rewrite (2.2) as an "integral equation,"
$$X^\omega(t) = X^\omega(0) + \int_0^t \bigl(r(X^\omega(s)) + W^\omega(s)\bigr)X^\omega(s)\,ds, \qquad (2.3)$$
for $t\in\mathbb{R}_+$ and $\omega\in\Omega$. The most problematic term in (2.3) is, of course, the one involving $W$ (the formal time-derivative of $B$). This raises the question of how to make sense of integrals of the form $\int_0^t X^\omega(s)W^\omega(s)\,ds$, where $t\in\mathbb{R}_+$, $W = \dot B$ for some one-dimensional Brownian motion $B$ on $\Omega$, and $X$ is a continuous-time, real-valued process on $\Omega$. Note that, formally,
$$\int_0^t X^\omega(s)W^\omega(s)\,ds = \int_0^t X^\omega(s)\dot B^\omega(s)\,ds = \int_0^t X^\omega(s)\,dB^\omega(s)$$
for all $t\in\mathbb{R}_+$ and $\omega\in\Omega$. The integral on the right appears to be a Riemann-Stieltjes integral involving the real-valued functions $X^\omega$ and $B^\omega$, but unfortunately, the paths of $B$ are not of bounded variation on compact subintervals of $\mathbb{R}_+$ [9, Corollary 13.10]. Thus, the integral does not exist, in general, in the classical Riemann-Stieltjes sense, no matter what assumptions we make about the process $X$.

Nevertheless, it is possible to rigorously define the integral
$$(I_t X)(\omega) = \int_0^t X^\omega(s)\,dB^\omega(s)$$
for $t\in\mathbb{R}_+$, $\omega\in\Omega$, and a reasonably large class of continuous-time, real-valued processes $X$ on $\Omega$, in such a way that $I_t X$ is measurable for every $t\in\mathbb{R}_+$. The process $Y := (I_t X)_{t\in\mathbb{R}_+}$ then qualifies as a weak or generalized antiderivative of $WX$ (that is, a solution of the "stochastic differential equation" $\dot Y = WX$). In fact, there are several ways of doing this. Our definition will be based on the use of left Riemann-Stieltjes sums and leads to the so-called Ito integral. Other choices are possible; for example, the use of mid-point Riemann-Stieltjes sums leads to the so-called Stratonovich integral [10, p. 350].

For all of the following, suppose that $B$ is a one-dimensional Brownian motion on $\Omega$ and that $X$ is a continuous-time, real-valued stochastic process on $\Omega$. Also, suppose that $X$ is adapted to the filtration $\mathbb{F}(B)$. As discussed in the section on conditional probabilities, this has the interpretation that the random variable $X_t$, for $t\in\mathbb{R}_+$, is "no more random" than the Brownian motion $B$ up to time $t$, certainly a reasonable assumption if we think of $X$ as the solution of a "stochastic differential equation" whose randomness is produced by $B$. (For now, we ignore the effect on the solution $X$ of a "random initial condition" $X_0 = X^0$ a.s., where $X^0$ is a given random variable on $\Omega$.)

Now fix $a,b\in\mathbb{R}_+$ with $a < b$. We wish to define the Ito integral
$$(I_{a,b}X)(\omega) = \int_a^b X^\omega(t)\,dB^\omega(t)$$
for $\omega\in\Omega$, under suitable additional assumptions on $X$. To that end, endow the interval $[a,b]$ with the Borel $\sigma$-algebra $\mathcal{B}_{[a,b]}$ and the Lebesgue-Borel measure $\lambda_{[a,b]}$. The Cartesian product $[a,b]\times\Omega$ is then naturally endowed with the product $\sigma$-algebra $\mathcal{B}_{[a,b]}\otimes\mathcal{A}$ and the product measure $\lambda_{[a,b]}\otimes P$. Given any filtration $\mathbb{F}$ of $\mathcal{A}$, let $L^p_\mathbb{F}([a,b]\times\Omega)$, for $p\ge 1$, denote the set of all (equivalence classes of) $\mathbb{F}$-adapted functions in $L^p([a,b]\times\Omega)$ (that is, functions $Y\in L^p([a,b]\times\Omega)$ such that $Y_t = Y(t,\cdot)$ is $\mathcal{F}_t$-measurable for every $t\in[a,b]$); we are most interested in $L^2_\mathbb{F}([a,b]\times\Omega)$.

Lemma 2.1. For any filtration $\mathbb{F}$ of $\mathcal{A}$, $L^2_\mathbb{F}([a,b]\times\Omega)$ is a closed linear subspace of $L^2([a,b]\times\Omega)$.

Proof. That $L^2_\mathbb{F}([a,b]\times\Omega)$ is closed is the only nonobvious part. To see this, let $(Y_n)_{n\in\mathbb{N}}$ be a sequence in $L^2_\mathbb{F}([a,b]\times\Omega)$, let $Y\in L^2([a,b]\times\Omega)$, and let $Y_n\to Y$ in $L^2([a,b]\times\Omega)$. Then there is a subsequence $(Y_{k_n})_{n\in\mathbb{N}}$ of $(Y_n)_{n\in\mathbb{N}}$ that converges to $Y$ pointwise almost everywhere. Modifying the functions $Y_{k_n}$ on a set of measure zero if necessary, we may assume that $Y_{k_n}(t,\omega)\to Y(t,\omega)$ for all $t\in[a,b]$ and $\omega\in\Omega$. But then, for every $t\in[a,b]$, $Y(t,\cdot)$ is the pointwise limit of the $\mathcal{F}_t$-measurable functions $Y_{k_n}(t,\cdot)$, and thus $\mathcal{F}_t$-measurable; that is, $Y\in L^2_\mathbb{F}([a,b]\times\Omega)$.
Definition 2.1. We call a measurable function $Y$ on $[a,b]\times\Omega$ simple if it can be written in the form
$$Y(t,\omega) = \sum_{j=1}^n Y_j(\omega)\,1_{[t_{j-1},t_j)}(t)$$
for $t\in[a,b]$ and $\omega\in\Omega$, where $n$ is a positive integer, $(t_j)_{j=0}^n$ is a partition of the interval $[a,b]$, and $(Y_j)_{j=1}^n$ is a sequence of measurable functions on $\Omega$.

Given a filtration $\mathbb{F}$ of $\mathcal{A}$, such a $Y$ will belong to $L^2_\mathbb{F}([a,b]\times\Omega)$ if and only if $Y_j$ is square-integrable and $\mathcal{F}_{t_{j-1}}$-measurable for every $j\in\{1,\dots,n\}$. The set of all simple functions in $L^2_\mathbb{F}([a,b]\times\Omega)$ can be shown to be dense in $L^2_\mathbb{F}([a,b]\times\Omega)$ [12, pp. 18-20].

It is obvious how to define the Ito integral $I_{a,b}X$ if $X\in L^2_\mathbb{F}([a,b]\times\Omega)$ is simple: given a representation of the form $X(t,\omega) = \sum_{j=1}^n X_j(\omega)1_{[t_{j-1},t_j)}(t)$, for $t\in[a,b]$ and $\omega\in\Omega$, with $n\in\mathbb{N}$, $(t_j)_{j=0}^n$ a partition of $[a,b]$, and $(X_j)_{j=1}^n\in L^2(\Omega)^n$ where $X_j$ is $\mathcal{F}_{t_{j-1}}$-measurable for all $j\in\{1,2,\dots,n\}$, we let
$$(I_{a,b}X)(\omega) := \sum_{j=1}^n X_j(\omega)\bigl(B^\omega(t_j) - B^\omega(t_{j-1})\bigr)$$
for $\omega\in\Omega$. The sum on the right-hand side is independent of the representation of $X$ and coincides with the left Riemann-Stieltjes sum of $X^\omega$ with respect to $B^\omega$ for any partition of $[a,b]$ that is a refinement of the partition $(t_j)_{j=0}^n$.

Theorem 2.1 (Ito Isometry for Simple Functions). Let $\mathbb{F} = \mathbb{F}(B)$ and let $X\in L^2_\mathbb{F}([a,b]\times\Omega)$ be simple. Then $I_{a,b}X\in L^2(\Omega)$ with
$$\|I_{a,b}X\|^2_{L^2(\Omega)} = \sum_{j=1}^n (t_j - t_{j-1})\|X_j\|^2_{L^2(\Omega)} = \|X\|^2_{L^2([a,b]\times\Omega)}.$$

Proof. First note that
$$\|I_{a,b}X\|^2_{L^2(\Omega)} = \int_\Omega\Bigl(\sum_{j=1}^n X_j(B_{t_j} - B_{t_{j-1}})\Bigr)^2 dP = \sum_{i,j=1}^n E\bigl[X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}})\bigr],$$
where $E$ denotes expectation with respect to $P$. Next, realize that $i\neq j$ (say, without loss of generality, $i < j$) implies
$$E\bigl[X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}})\bigr] = 0. \qquad (2.4)$$
This is because $X_i$, $X_j$, and $B_{t_i} - B_{t_{i-1}}$ are $\mathcal{F}_{t_{j-1}}$-measurable and because of independent increments (so that $E^{\mathcal{F}_{t_{j-1}}}(B_{t_j} - B_{t_{j-1}}) = E(B_{t_j} - B_{t_{j-1}})$). Therefore, by the definition of conditional expectation and the "pull out" property (see the section on conditional probability), we have
$$E\bigl[X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}})\bigr] = E\bigl[E^{\mathcal{F}_{t_{j-1}}}\bigl(X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}})\bigr)\bigr] = E\bigl[X_i X_j (B_{t_i} - B_{t_{i-1}})\,E^{\mathcal{F}_{t_{j-1}}}(B_{t_j} - B_{t_{j-1}})\bigr] = E\bigl[X_i X_j (B_{t_i} - B_{t_{i-1}})\bigr]\,E(B_{t_j} - B_{t_{j-1}}).$$
Since $E(B_{t_j} - B_{t_{j-1}}) = 0$, we have shown (2.4). Next, when $i = j$,
$$E\bigl[X_i X_j (B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}})\bigr] = E(X_i^2)(t_i - t_{i-1}), \qquad (2.5)$$
since by the same argument used to show (2.4),
$$E\bigl[X_i^2(B_{t_i} - B_{t_{i-1}})^2\bigr] = E(X_i^2)\,E\bigl[(B_{t_i} - B_{t_{i-1}})^2\bigr],$$
and since $B$ has stationary increments,
$$E\bigl[(B_{t_i} - B_{t_{i-1}})^2\bigr] = E(B_{t_i - t_{i-1}}^2) = \mathrm{var}(B_{t_i - t_{i-1}}) = t_i - t_{i-1}.$$
Finally, by combining (2.4) and (2.5), we see that
$$\|I_{a,b}X\|^2_{L^2(\Omega)} = \sum_{i,j=1}^n E\bigl[X_i X_j(B_{t_i} - B_{t_{i-1}})(B_{t_j} - B_{t_{j-1}})\bigr] = \sum_{i=1}^n E(X_i^2)(t_i - t_{i-1}) = \|X\|^2_{L^2([a,b]\times\Omega)}.$$

Therefore, $I_{a,b}$ is a (linear) isometry from a dense (linear) subspace of $L^2_\mathbb{F}([a,b]\times\Omega)$ into $L^2(\Omega)$; as such, it has a unique extension to a linear isometry $I_{a,b} : L^2_\mathbb{F}([a,b]\times\Omega)\to L^2(\Omega)$. This defines the Ito integral $I_{a,b}X$ for every $X\in L^2_\mathbb{F}([a,b]\times\Omega)$, and we have the Ito isometry,
$$\|I_{a,b}X\|_{L^2(\Omega)} = \|X\|_{L^2([a,b]\times\Omega)}.$$

We will use the symbol $\int_a^b X_t\,dB_t$ to denote the Ito integral $I_{a,b}X$. Stated rigorously:

Definition 2.2. Let $\mathbb{F} := \mathbb{F}(B)$. For every $X\in L^2_\mathbb{F}([a,b]\times\Omega)$, the Ito integral $\int_a^b X_t\,dB_t$ exists and is defined by
$$\int_a^b X(t,\omega)\,dB_t(\omega) := \lim_{n\to\infty}\int_a^b Y_n(t,\omega)\,dB_t(\omega)$$
(convergence in $L^2(\Omega)$), where $(Y_n)_{n\in\mathbb{N}}$ is any sequence of simple functions that approaches $X$ in $L^2_\mathbb{F}([a,b]\times\Omega)$.

Note that, due to Fubini's theorem, $\int_a^b X_t^2\,dt$ is an integrable function on $\Omega$, with $\int_\Omega\bigl(\int_a^b X_t^2\,dt\bigr)dP = \|X\|^2_{L^2([a,b]\times\Omega)}$. This allows us to write the Ito isometry in the form
$$E\Bigl(\int_a^b X_t\,dB_t\Bigr)^2 = E\Bigl(\int_a^b X_t^2\,dt\Bigr).$$
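A numerical sketch of the construction (illustrative only; the exact values quoted in the comments, $\int_0^1 B_t\,dB_t = (B_1^2-1)/2$ and the Stratonovich value $B_1^2/2$, anticipate Ito's formula in Section 2.3): for $X = B$ on $[0,1]$, left Riemann-Stieltjes sums converge to the Ito integral, mid-point sums converge to a different limit, and the sample second moment of the left sums matches the isometry value $E\int_0^1 B_t^2\,dt = 1/2$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
n_paths, n = 10_000, 256            # paths; partition size of [0, 1]
dt = 1.0 / n

# Brownian paths on the half-step grid 0, dt/2, dt, ..., 1, so that
# mid-point values are available exactly.
incs = np.sqrt(dt/2) * rng.standard_normal((n_paths, 2*n))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(incs, axis=1)], axis=1)

left = B[:, 0:2*n:2]                # B_{t_{j-1}}
mid = B[:, 1:2*n:2]                 # B at midpoints (t_{j-1} + t_j)/2
dB = B[:, 2::2] - B[:, 0:2*n:2]     # B_{t_j} - B_{t_{j-1}}
B1 = B[:, -1]

ito = (left * dB).sum(axis=1)       # left sums -> Ito integral
strat = (mid * dB).sum(axis=1)      # mid-point sums -> Stratonovich integral

print(np.mean((ito - (B1**2 - 1)/2)**2))   # -> 0 as n grows
print(np.mean((strat - B1**2/2)**2))       # -> 0 as n grows
print((ito**2).mean(), "vs isometry value 0.5")
\end{verbatim}

The two limits differ by $1/2$ on average, which is exactly the quadratic-variation effect that distinguishes the Ito and Stratonovich integrals mentioned above.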
Now, if $X$ is a continuous-time, real-valued process on $\Omega$ such that $X\in L^2_\mathbb{F}([0,t]\times\Omega)$ for every $t\in\mathbb{R}_+$, then $I_t X = \int_0^t X_s\,dB_s$ is defined for every $t\in\mathbb{R}_+$, and $IX := (I_t X)_{t\in\mathbb{R}_+}$ is a stochastic process. It can be shown that $IX$ is a martingale with respect to $\mathbb{F}$ and, as a consequence, has a modification with continuous paths (see [12, pp. 22-26] for more details). In the future, we will assume without saying that $IX$ has continuous paths.

Definition 2.2 is enough to make sense of $\int_0^t X_s W_s\,ds = \int_0^t X_s\,dB_s$ on the right-hand side of (2.3), provided that $X\in L^2_\mathbb{F}([0,t]\times\Omega)$. This condition needs to be part of the notion of a "solution" of equation (2.3). As discussed earlier, $\mathbb{F} = \mathbb{F}(B)$-adaptedness of $X$ is a reasonable requirement as long as the "randomness" of $X$ is "caused" solely by $B$.

Now, instead of solving just one scalar "stochastic differential equation," we would like to solve coupled systems of such equations. By way of motivation, consider the system
$$\dot X^i = \bigl(r^i(X) + W^i\bigr)X^i, \quad 1\le i\le n, \qquad (2.6)$$
for an $\mathbb{R}^n$-valued process $X = (X^1,X^2,\dots,X^n)$, where $r = (r^1,r^2,\dots,r^n) : \mathbb{R}^n\to\mathbb{R}^n$ is a sufficiently smooth vector field, $B := (B^1,B^2,\dots,B^n)$ is an $n$-dimensional Brownian motion, and $W = (W^1,W^2,\dots,W^n) = (\dot B^1,\dot B^2,\dots,\dot B^n)$. In integral form, equation (2.6) reads
$$X^i_t = X^i_0 + \int_0^t r^i(X_s)X^i_s\,ds + \int_0^t X^i_s\,dB^i_s, \quad 1\le i\le n. \qquad (2.7)$$
Using Definition 2.2, the integral on the far right would make sense if we could assume that $X^i\in L^2_{\mathbb{F}(B^i)}([0,t]\times\Omega)$. Unfortunately, this is not a reasonable assumption: due to the coupling of the equations, $X^i$ is affected by all components of $B$; thus $X^i$ should be $\mathbb{F}(B)$-adapted, but cannot be expected to be $\mathbb{F}(B^i)$-adapted!

Luckily, the assumption $\mathbb{F} = \mathbb{F}(B)$ in Theorem 2.1 and Definition 2.2 (where $B$ is a one-dimensional Brownian motion) can be relaxed: it is enough to assume that $\mathbb{F}$ is a filtration of $\mathcal{A}$ such that $B$ is a martingale with respect to $\mathbb{F}$. Under this assumption (clearly satisfied if $\mathbb{F} = \mathbb{F}(B)$), the proof of the Ito isometry for simple functions (Theorem 2.1) still goes through (note that $E^{\mathcal{F}_{t_{j-1}}}(B_{t_j} - B_{t_{j-1}}) = 0$ for $1\le j\le n$),
37 Under these assumptions, the integral version of equation (2.9) is Xt = X0 + Z t 0 U(Xs)ds+ Z t 0 V(Xs)dBs: (2.10) Of course, the integrals are understood \componentwise", that is, Z t 0 U(Xs)ds = Z t 0 Ui(Xs)ds ?n i=1 ; Z t 0 V(Xs)dBs = Z t 0 dX j=1 V ij(Xs)dBjs ?n i=1 = dX j=1 Z t 0 V ij(Xs)dBjs ?n i=1 : The second integral is well deflned, provided that V ij(X) 2 L2F([0;t]??) for all 1 ? i ? n, 1 ? j ? d, where F is a flltration of A such that each component of B is a martingale with respect to F. Note that if X is F-adapted and V is continuous, then V(X) is F-adapted. This motivates the following deflnition. Deflnition 2.3. Let (?;A;P) be a probability space, n 2 N, d 2 f1;??? ;ng. Let B be a d-dimensional Brownian motion on ?, F a flltration of A such that each component of B is a martingale with respect to F. Let U be an Rn-valued process on ?, V an Rn?d-valued process on ? such that Ui 2 L2F([0;t] ? ?) and V ij 2 L2F([0;t] ? ?) for all 1 ? i ? n, 1 ? j ? d, t 2R+. If X0 is an Rn-valued random variable on ?, the process X, deflned by Xt = X0 + Z t 0 Usds+ Z t 0 VsdBs; (2.11) 38 for t 2 R+, is called a stochastic integral generated by (U;V). The set of all stochastic integrals generated by (U;V) is denoted by Z Utdt+ Z VtdBt; (2.12) with slight abuse of language, we call this the stochastic integral generated by (U;V). Formally, the process X deflned by (2.11) is an \antiderivative" of U +V _B, that is, a solution of the \stochastic difierential equation" _X = U +V _B; (2.13) or, in difierential notation, dXt = Utdt+VtdBt: (2.14) In the same sense, the stochastic integral (2.12) is the set of all \antiderivatives" (the \indeflnite integral") of U + V _B, that is, the \general solution" of (2.13)/(2.14). The formal expression Utdt+VtdBt is called the stochastic difierential generated by (U;V). We note that the assumptions on V in Deflnition 2.3 are needed to guarantee the exis- tence of the second integral in (2.11). They also guarantee that the process ?Rt0 VsdBs?t2R + is F-adapted and square-integrable in the sense that Rt0 VsdBs 2 L2(?) for all t 2R+. The assumptions on U are stronger than necessary to guarantee the existence of the flrst inte- gral in (2.11); in fact, Ui 2 L1([0;t]??) for all 1 ? i ? n and t 2R+ would be su?cient. 39 However, the stronger assumptions on U guarantees that the process ?Rt0 Usds?t2R + is F- adapted and square-integrable in the sense that Rt0 Usds 2 L2(?) for all t 2R+. Indeed, we have the following lemma. Lemma 2.2. Under the assumptions of Deflnition 2.3, the processes ?Rt0 Usds?t2R + and ?Rt 0 VsdBs ? t2R+ are well-deflned, F-adapted and square-integrable in the sense that Rt 0 Usds, Rt 0 VsdBs 2 L 2(?) for all t 2R+. Corollary 2.1. Assume the hypotheses of Deflnition 2.3 with F = F(B) and let X0 2 L2(?). Then the process X deflned by (2.11) is F(B;X0)-adapted and square-integrable in the sense that Xt 2 L2(?) for all t 2R+. Let us return to the integral equation (2.10), that is, the integral version of the \stochas- tic difierential equation" dXt = U(Xt)dt+V(Xt)dBt: (2.15) It is natural to seek a solution X of (2.15) subject to an initial condition of the form X0 = X0; (2.16) where X0 is a given Rn-valued random variable on ?. The integral equation corresponding to (2.15/(2.16)) reads Xt = X0 + Z t 0 U(Xs)ds+ Z t 0 V(Xs)dBs (2.17) 40 Due to the random initial condition, a solution of (2.17) cannot be expected to be F(B)-adapted, but should be F(B;X0)-adapted. The same would then hold for U(X) and V(X). 
However, under this assumption, the Ito integral in (2.17) is defined only if each component of $B$ is a martingale with respect to $\mathbb{F}(B,X^0)$. This is the case if $X^0$ and $B$ are independent; this follows from an argument similar to (2.8) and is reasonable intuitively, as we expect that the randomness of the initial condition should have nothing to do with an arbitrarily given Brownian motion. Along these lines, note that if $X$ were only $\mathbb{F}(B)$-adapted, then $X^0 = X_0$ being independent of $B$ would force $X^0$ to be a.s. constant! The above considerations motivate the following version of Corollary 2.1.

Corollary 2.2. Assume the hypotheses of Definition 2.3 with $\mathbb{F} := \mathbb{F}(B,X^0)$, where $X^0\in L^2(\Omega)$ is independent of $B$. Then the process $X$ defined by (2.11) is $\mathbb{F}(B,X^0)$-adapted and square-integrable in the sense that $X_t\in L^2(\Omega)$ for all $t\in\mathbb{R}_+$.

Now we move to the next section, where we formally define stochastic differential equations, define the solution to a stochastic differential equation, and discuss the existence and uniqueness of solutions.

2.2 Stochastic Differential Equations and their Solutions

As we discussed in the previous section, (2.13) or (2.14) is a stochastic analog of the deterministic antidifferentiation problem $dx = f(t)\,dt$ or $\dot x = f(t)$, where $f:\mathbb{R}_+\to\mathbb{R}^n$ is a given, sufficiently regular function. To arrive at the stochastic analog of $dx = f(t,x)\,dt$ or $\dot x = f(t,x)$, where $f:\mathbb{R}_+\times\mathbb{R}^n\to\mathbb{R}^n$ is a given, sufficiently regular function, we need to discuss the composition of stochastic processes.

Let $(\Omega,\mathcal{A})$, $(S,\mathcal{S})$, and $(S',\mathcal{S}')$ be measurable spaces, let $H$ be a continuous-time $S$-valued stochastic process on $(\Omega,\mathcal{A})$, and let $G$ be a continuous-time $S'$-valued stochastic process on $(S,\mathcal{S})$. We now (with slight abuse of notation) define the composition of $G$ with $H$, denoted by $G\circ H$.

Definition 2.4. For $G$ and $H$ as above, we define the composition $G\circ H$ to be the process defined by $(G\circ H)_t := G_t\circ H_t$, for all $t\in\mathbb{R}_+$. In this way, $G\circ H$ is a continuous-time $S'$-valued stochastic process on $(\Omega,\mathcal{A})$.

Now, if $X$ is a stochastic integral, then $X$ is a continuous-time $\mathbb{R}^n$-valued stochastic process on $(\Omega,\mathcal{A},P)$. So, take $U:\mathbb{R}_+\times\mathbb{R}^n\to\mathbb{R}^n$ to be measurable with respect to the second variable (so $U$ is a continuous-time $\mathbb{R}^n$-valued stochastic process on $(\mathbb{R}^n,\mathcal{B}^n)$). Then $U\circ X$ is an $\mathbb{R}^n$-valued process on $\Omega$, and we have
$$(U\circ X)(t,\omega) = (U_t\circ X_t)(\omega) = U_t(X_t(\omega)) = U_t(X(t,\omega)) = U(t,X(t,\omega))$$
for all $t\in\mathbb{R}_+$ and $\omega\in\Omega$. Similarly, take $V:\mathbb{R}_+\times\mathbb{R}^n\to\mathbb{R}^{n\times d}$ to be measurable with respect to the second variable. Then $V\circ X$ is an $\mathbb{R}^{n\times d}$-valued process on $\Omega$, and, at least formally, we can consider the stochastic differential equation
$$dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t, \qquad (2.18)$$
or the equivalent integral equation
$$X_t = X_0 + \int_0^t U(s,X_s)\,ds + \int_0^t V(s,X_s)\,dB_s. \qquad (2.19)$$
This motivates the definition of a solution to a stochastic differential equation.

Definition 2.5. We say $X$ is a solution to (2.18) if $X$ is a continuous-time $\mathbb{R}^n$-valued process on $\Omega$ such that $(U\circ X, V\circ X)$ satisfies the hypotheses of Definition 2.3 and $X$ satisfies (2.19) for all $t\in\mathbb{R}_+$.

We remark that this definition makes sense due to Lemma 2.2. We will soon give particular conditions on $U$ and $V$ that guarantee that a unique solution to (2.18) exists. For now, assume that $U$ and $V$ are appropriate enough for (2.18) to make sense.

We can now rigorously impose an initial condition on (2.18), state the definition of the stochastic initial value problem, and define the notion of solution. Let $X$ be a solution to (2.18) as above, and suppose we are given an $\mathbb{R}^n$-valued random variable $X^0 = (X^0_1,X^0_2,\dots,X^0_n)$ such that $X^0_i\in L^2(\Omega)$ for each $1\le i\le n$ and such that $X^0$ is independent of $B$. Recalling the argument preceding Corollary 2.2, if $X_0 = X^0$ a.s., then we specify $\mathbb{F}$ to be $\mathbb{F}(X^0,B)$, where $\mathbb{F}(X^0,B) = \{\mathcal{F}_t(X^0,B)\}_{t\in\mathbb{R}_+}$ and where $\mathcal{F}_t(X^0,B)$ is the $\sigma$-algebra generated by $X^0$ and $\{B_s \mid s\le t\}$, for every $t\in\mathbb{R}_+$. Motivated by this and Corollary 2.2, the following definition is justified.

Definition 2.6. Given $X^0\in L^2(\Omega)$, independent of $B$, we call
$$dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t, \quad X_0 = X^0 \text{ a.s.}, \qquad (2.20)$$
a (strong) stochastic initial value problem, and we say $X$ is a (strong) solution to (2.20) if $X$ is a solution to $dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t$ in the sense of Definition 2.5 and satisfies $X_0 = X^0$ a.s.
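A minimal Euler-Maruyama sketch for problem (2.20) may be helpful at this point (the scheme is a standard discretization offered only as an illustration; it is not developed in this text, and the coefficient choices below, echoing the population model (2.1), are hypothetical):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)

def euler_maruyama(U, V, x0, T=1.0, n=1000):
    """Approximate one path of dX_t = U(t, X_t) dt + V(t, X_t) dB_t,
    X_0 = x0, on [0, T]: replace dt by a small step and dB_t by an
    N(0, dt) increment of a simulated Brownian motion."""
    dt = T / n
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n):
        dB = np.sqrt(dt) * rng.standard_normal()
        x = x + U(t, x)*dt + V(t, x)*dB
        t += dt
        path.append(x)
    return np.array(path)

# Hypothetical scalar coefficients: logistic-type drift with
# multiplicative noise, U(t, x) = r(x) x and V(t, x) = x, as in (2.1).
r = lambda x: 1.0 - x
path = euler_maruyama(lambda t, x: r(x)*x, lambda t, x: x, x0=0.1)
print("approximate X_T:", path[-1])
\end{verbatim}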
We can now rigorously impose an initial condition on (2.18), state the definition of the stochastic initial value problem, and define the notion of solution. Let $X$ be a solution to (2.18) as above, and suppose we are given an $\mathbb R^n$-valued random variable $X^0 = (X^0_1,X^0_2,\dots,X^0_n)$ such that $X^0_i\in L^2(\Omega)$ for each $1\le i\le n$ and such that $X^0$ is independent of $B$. Recalling the argument preceding Corollary 2.2, if $X_0 = X^0$ a.s., then we specify $\mathcal F$ to be $\mathcal F(X^0;B)$, where $\mathcal F(X^0;B) = \{\mathcal F_t(X^0;B)\}_{t\in\mathbb R_+}$ and where $\mathcal F_t(X^0;B)$ is the $\sigma$-algebra generated by $X^0$ and $\{B_s \mid s\le t\}$, for every $t\in\mathbb R_+$. Motivated by this and Corollary 2.2, the following definition is justified.

Definition 2.6. Given $X^0\in L^2(\Omega)$, independent of $B$, we call
\[ dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t, \qquad X_0 = X^0 \ \text{a.s.}, \tag{2.20} \]
a (strong) stochastic initial value problem, and we say $X$ is a (strong) solution to (2.20) if $X$ is a solution to $dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t$ in the sense of Definition 2.5 and satisfies $X_0 = X^0$ a.s.

In Problem (2.20) the Brownian motion $B$ is given in advance and we are seeking a solution $X$. A "weak" version of (2.20) would require that, along with $X$, we find a Brownian motion $B$ on a probability space $(\Omega,\mathcal A,P)$ and a filtration $\mathcal F$ such that each component of $B$ is a martingale with respect to $\mathcal F$. Then, since $(\Omega,\mathcal A,P)$ is not given, we cannot impose an initial condition as in (2.20), but we can impose an initial distribution $\mu$. This leads to the following definition.

Definition 2.7. We call
\[ dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t, \qquad P_{X_0} = \mu, \tag{2.21} \]
a weak stochastic initial value problem, and we say $(X,B,\mathcal F)$ is a weak solution to (2.21) if $B$ is an $\mathbb R^d$-valued Brownian motion such that $B^i$ is a martingale with respect to $\mathcal F$ for all $1\le i\le d$, and $X$ is a solution to $dX_t = U(t,X_t)\,dt + V(t,X_t)\,dB_t$ in the sense of Definition 2.5 and satisfies $P_{X_0} = \mu$.

Clearly, a strong initial value problem induces a weak initial value problem (by replacing the given initial condition with its distribution and then removing the given probability space and Brownian motion); if the strong initial value problem has a solution, then the induced weak initial value problem must also have a solution. Also, if a weak initial value problem has a solution, then there is at least one associated strong initial value problem (by taking the Brownian motion $B$ in the weak problem's solution as the given Brownian motion in the strong problem and constructing a random variable $X^0$ over $B$'s accompanying probability space such that $X^0$ has distribution $\mu$). Further, the existence of a weak solution $X$ to a weak initial value problem does not necessarily imply that $X$ is a strong solution to the induced strong initial value problem; this is believable simply because $X$ may be adapted to some filtration $\mathcal F$ but not to $\mathcal F(X^0;B)$.

From a modeling perspective, we really have weak problems (as no one can realistically present up front the specific representation of the white noise involved). Weak solutions are also useful because there are examples of weak initial value problems which have a weak solution but no strong solution (see [10, pp. 301-302]). We drop the adjective weak or strong when there is no ambiguity. Along these lines, there are also strong and weak notions of uniqueness.

Definition 2.8. We say that the strong initial value problem (2.20) has the strong uniqueness property if any two solutions $X$ and $\tilde X$ are modifications of each other. For convenience we often say that $X$ is a strongly unique solution, or that $X$ is strongly unique. Strong uniqueness is often called pathwise uniqueness.

Definition 2.9. We say that the weak initial value problem (2.21) has the weak uniqueness property if any two solutions $(X,B,\mathcal F)$ and $(\tilde X,\tilde B,\tilde{\mathcal F})$ are equivalent in the sense that $P_X = P_{\tilde X}$.
Again, for convenience, we often say that $X$ is a weakly unique solution or that $X$ is weakly unique, and since we may identify any weak solution with its (unique) distribution, we sometimes call $P_X$ the weak solution. Weak uniqueness is often called uniqueness in distribution. Analogously, we can pose an initial value problem starting at any time $s > 0$.

Now we have the ingredients to present an existence and uniqueness theorem, which gives us at least one way to place conditions on $U$ and $V$ guaranteeing that the situation of Definition 2.5 obtains (one proof can be found in [12, Theorem 5.5]):

Theorem 2.2 (Existence/Uniqueness). Let $(\Omega,\mathcal A,P)$ be a probability space, $n\in\mathbb N$, $d\in\{1,\dots,n\}$, $B$ a $d$-dimensional Brownian motion on $\Omega$, $U:\mathbb R_+\times\mathbb R^n\to\mathbb R^n$ and $V:\mathbb R_+\times\mathbb R^n\to\mathbb R^{n\times d}$ measurable functions, $X^0\in L^2(\Omega)$ independent of $B$, and $\mathcal F = \mathcal F(B;X^0)$. Assume that there exist positive constants $C$ and $D$ such that, for all $t\in\mathbb R_+$ and $x,y\in\mathbb R^n$,
\[ |U(t,x)| + |V(t,x)| \le C(1+|x|), \tag{2.22} \]
where $|\cdot|$ is the Euclidean norm, and
\[ |U(t,x)-U(t,y)| + |V(t,x)-V(t,y)| \le D|x-y|. \tag{2.23} \]
Then the initial value problem (2.20) has a strongly unique strong solution $X$.

Before proving this important theorem, we remark that (2.22) is imposed to prevent $X$ from exploding, i.e., to rule out the existence of a finite time $T_0$ such that $P(\lim_{t\to T_0}|X_t| = \infty) > 0$ (see [9, Lemma 21.6]), while (2.23) ensures uniqueness. Compare this to the deterministic case, where an at most linear growth estimate ensures that solutions do not explode (see [1, Theorem 7.6]) and a Lipschitz condition guarantees uniqueness. The idea of the proof is similar to the deterministic case; let us only consider the scalar case.

Proof. First, we show uniqueness. Suppose two solutions $X$ and $Y$ exist, having initial values $X^0$ and $Y^0$, respectively. Then we can estimate $E|X_t - Y_t|^2$ for a fixed $t$ by using the inequality $(x+y+z)^2 \le 3(x^2+y^2+z^2)$, the Ito isometry, the Cauchy-Schwarz inequality, and (2.23), so that Gronwall's inequality applies. This yields an inequality of the form
\[ E|X_t - Y_t|^2 \le 3\,E|X^0 - Y^0|^2\,e^{Kt}, \]
where $K$ is a constant depending only on $D$ and $T$. Assuming $X^0 = Y^0$ then implies that $P(|X_t - Y_t| = 0) = 1$ (recall $t$ is fixed). We can repeat this argument for all rational $t$ and then use that stochastic integrals are continuous in time to obtain $P\bigl(\sup_{t\in[0,T]}|X_t - Y_t| = 0\bigr) = 1$, which means $X$ and $Y$ are modifications of each other. This shows the strong uniqueness.

To show existence, first define the iterations $Y^{(0)}_t := X^0$ and
\[ Y^{(k+1)}_t := X^0 + \int_0^t U(s,Y^{(k)}_s)\,ds + \int_0^t V(s,Y^{(k)}_s)\,dB_s \tag{2.24} \]
for $k\in\mathbb Z_+$. We claim that (2.24) is well defined; first note that $Y^{(k)}_t$ is $\mathcal F_t(X^0;B)$-measurable for each $k\in\mathbb Z_+$ and all $t\in[0,T]$. Next we have, by a calculation similar to that in the uniqueness proof, by (2.22), and by Fubini, that
\[ E|Y^{(1)}_t - Y^{(0)}_t|^2 \le 2C^2(t+t^2)(1+E|X^0|^2) \le L_1 t, \tag{2.25} \]
where $L_1$ depends only on $C$, $T$, and $E|X^0|^2$. Therefore (2.24) makes sense for $k=1$. One can proceed by induction to show that (2.24) makes sense for all $k$; we can estimate $E|Y^{(2)}_t - Y^{(1)}_t|^2$ similarly and use (2.23) to obtain an inequality of the form
\[ E|Y^{(2)}_t - Y^{(1)}_t|^2 \le 3(1+T)D^2\int_0^t E|Y^{(1)}_s - Y^{(0)}_s|^2\,ds \le 3(1+T)D^2\int_0^t L_1 s\,ds \le L_2\,\frac{t^2}{2}, \tag{2.26} \]
where $L_2$ depends only on $C$, $D$, $T$, and $E|X^0|^2$. Iterating this, we can estimate $E|Y^{(k+1)}_t - Y^{(k)}_t|^2$ similarly:
\[ E|Y^{(k+1)}_t - Y^{(k)}_t|^2 \le 3(1+T)D^2\int_0^t E|Y^{(k)}_s - Y^{(k-1)}_s|^2\,ds \le \frac{L_{k+1}\,t^{k+1}}{(k+1)!}, \tag{2.27} \]
where $L_{k+1}$ is a constant depending only on $T$, $C$, $D$, and $E|X^0|^2$; in fact, $L_{k+1} = L_1\bigl(3(1+T)D^2\bigr)^k$ for $k\in\mathbb Z_+$.
Since $t\le T$, this inequality shows that $\{Y^{(k)}_t\}_{k\in\mathbb Z_+}$ is a Cauchy sequence in $L^2([0,T]\times\Omega)$, so for every $t\in[0,T]$, $\{Y^{(k)}_t\}_{k\in\mathbb Z_+}$ has an $\mathcal F_t(X^0;B)$-measurable limit $X_t$. In fact, this convergence is uniform; from (2.27), we apply the inequality (see [6, Theorem 2.8])
\[ P\Bigl(\sup_{a\le u\le b}\Bigl|\int_a^u f(s)\,dB_s\Bigr| > r\Bigr) \le \frac{1}{r^2}\,E\Bigl(\int_a^b f^2(s)\,ds\Bigr), \]
yielding
\[ P\Bigl(\sup_{[0,T]}|Y^{(k+1)}_t - Y^{(k)}_t| > \frac{1}{k^2}\Bigr) \le \frac{L_{k+1}\,T^{k+1}}{(k+1)!}\,k^4. \]
Since $\sum_{k=1}^\infty \frac{L_{k+1}T^{k+1}}{(k+1)!}k^4$ converges, the Borel-Cantelli lemma (see e.g. [6, Theorem 1.1]) implies that, almost surely, $\sup_{[0,T]}|Y^{(m+1)}_t - Y^{(m)}_t| > \frac{1}{m^2}$ for at most finitely many $m$. Therefore the convergence of $\{Y^{(k)}_t\}_{k\in\mathbb Z_+}$ to $X_t$ is uniform on $[0,T]$, almost surely, which means
\[ X_t = X^0 + \lim_{k\to\infty}\Bigl(\int_0^t U(s,Y^{(k)}_s)\,ds + \int_0^t V(s,Y^{(k)}_s)\,dB_s\Bigr) = X^0 + \int_0^t U(s,X_s)\,ds + \int_0^t V(s,X_s)\,dB_s, \]
so $X$ is indeed a solution.

Unless we say otherwise, we assume (2.22) and (2.23) hold when we discuss stochastic differential equations. As we discussed before, a strong initial value problem induces a weak initial value problem; it can be shown that if a strong initial value problem enjoys the strong uniqueness property, then the strong initial value problem and its induced weak initial value problem have the weak uniqueness property (see e.g. [10, pp. 306-310]).
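The iteration (2.24) used in the existence proof can also be run numerically. The following sketch (an addition, under illustrative assumptions) fixes one simulated Brownian path and applies the Picard map repeatedly for the coefficients $U(t,x) = bx$, $V(t,x) = \sigma x$, with both integrals approximated by left-endpoint sums; the sup-distance between successive iterates should shrink rapidly, mirroring the factorial decay in (2.27).

```python
# Sketch: Picard iteration (2.24) on one fixed Brownian path,
# for dX = b X dt + sigma X dB (coefficients chosen for illustration).
import numpy as np

rng = np.random.default_rng(1)
b, sigma, x0 = 0.5, 0.3, 1.0
T, N = 1.0, 2_000
dt = T / N
dB = rng.normal(0.0, np.sqrt(dt), N)

def next_iterate(Y):
    # Y^(k+1)_t = x0 + int_0^t b Y^(k)_s ds + int_0^t sigma Y^(k)_s dB_s
    Z = np.empty_like(Y)
    Z[0] = x0
    Z[1:] = x0 + np.cumsum(b * Y[:-1] * dt + sigma * Y[:-1] * dB)
    return Z

Y = np.full(N + 1, x0)            # Y^(0)_t = X_0
for k in range(8):
    Y_new = next_iterate(Y)
    print(k, np.max(np.abs(Y_new - Y)))   # sup-distance between iterates
    Y = Y_new
```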
We will soon discuss how deterministic dynamical systems generalize to the stochastic case, but before this, we reserve the next section for Ito's formula, which allows us to calculate specific examples of Ito integrals and hence solutions to stochastic differential equations.

2.3 Ito's Formula and Examples

Equipped with the concepts of Ito integrals and stochastic differential equations, we now focus on explicitly calculating Ito integrals and solutions to stochastic differential equations. The tool we need is Ito's formula, which is essentially a stochastic analog of the chain rule. First we prove Ito's formula in one dimension, and then we present the $n$-dimensional version and study some examples.

Let $X$ be the stochastic integral generated by $(U,V)$ (for $U,V$ satisfying the assumptions of Definition 2.3) and let $g\in C^2(\mathbb R_+\times\mathbb R)$. Then the process $(g(t,X_t))_{t\in\mathbb R_+}$ is also a one-dimensional stochastic integral, and for all $t$,
\[ g(t,X_t) = g(0,X_0) + \int_0^t\Bigl[\frac{\partial g}{\partial s}(s,X_s) + U_s\frac{\partial g}{\partial x}(s,X_s) + \frac12 V_s^2\frac{\partial^2 g}{\partial x^2}(s,X_s)\Bigr]\,ds + \int_0^t V_s\frac{\partial g}{\partial x}(s,X_s)\,dB_s, \]
which we call Ito's formula. Notice the "extra" term $\int_0^t \frac12 V_s^2\frac{\partial^2 g}{\partial x^2}\,ds$; such a term is often called a "correction term." We can, in fact, recover the "natural" form of the chain rule by using the Stratonovich integral (which differs from the Ito integral in that it uses the midpoint instead of the left endpoint), but Stratonovich integrals "look into the future" and (among other things) do not enjoy the martingale property.

To see where this extra term comes from, let us examine a Taylor expansion of $g(t,X_t)$. It is enough to assume that $g$, $\frac{\partial g}{\partial t}$, $\frac{\partial g}{\partial x}$, and $\frac{\partial^2 g}{\partial x^2}$ are bounded, for if we can prove the formula in this case, then we can take sequences of bounded functions $g_n$, $\frac{\partial g_n}{\partial t}$, $\frac{\partial g_n}{\partial x}$, $\frac{\partial^2 g_n}{\partial x^2}$ that uniformly approximate a $C^2$ function $g$ and its derivatives $\frac{\partial g}{\partial t}$, $\frac{\partial g}{\partial x}$, $\frac{\partial^2 g}{\partial x^2}$ on compact subsets of $\mathbb R_+\times\mathbb R$ (by Stone-Weierstrass), and the uniform convergence then allows the limit to pass through the integrals (for the stochastic term, use the Ito isometry). Recall that the norm of a partition $P$ of $[0,t]$ is defined to be $\|P\| = \max_{1\le i\le n}(t_i - t_{i-1})$, and let $P = (t_j)_{j=0}^n$ be a partition of $[0,t]$ with sufficiently small norm. Then, carrying out a Taylor expansion of $g(t,X_t)$,
\[ g(t,X_t) = g(0,X_0) + \sum_{j=1}^n\bigl[g(t_j,X_{t_j}) - g(t_{j-1},X_{t_{j-1}})\bigr] \]
\[ = g(0,X_0) + \sum_{j=1}^n\Bigl[\frac{\partial g}{\partial t}(t_{j-1},X_{t_{j-1}})(t_j-t_{j-1}) + \frac{\partial g}{\partial x}(t_{j-1},X_{t_{j-1}})(X_{t_j}-X_{t_{j-1}})\Bigr] \]
\[ + \frac12\sum_{j=1}^n\Bigl[\frac{\partial^2 g}{\partial t^2}(t_{j-1},X_{t_{j-1}})(t_j-t_{j-1})^2 + 2\frac{\partial^2 g}{\partial x\,\partial t}(t_{j-1},X_{t_{j-1}})(X_{t_j}-X_{t_{j-1}})(t_j-t_{j-1}) + \frac{\partial^2 g}{\partial x^2}(t_{j-1},X_{t_{j-1}})(X_{t_j}-X_{t_{j-1}})^2\Bigr] + \sum_{j=1}^n R_j, \]
where each term $R_j$ of the remainder $\sum_{j=1}^n R_j$ takes the form
\[ R_j = \sum_{\alpha_1+\alpha_2\ge 3}\frac{\partial^{\alpha_1}_t\partial^{\alpha_2}_x\,g(t_{j-1},X_{t_{j-1}})}{\alpha_1!\,\alpha_2!}\,(t_j-t_{j-1})^{\alpha_1}(X_{t_j}-X_{t_{j-1}})^{\alpha_2}. \]
Let us approximate each term, using that the norm of the partition is small. For the first-order terms, we see
\[ \sum_j \frac{\partial g}{\partial t}(t_{j-1},X_{t_{j-1}})(t_j-t_{j-1}) \approx \int_0^t \frac{\partial g}{\partial t}(s,X_s)\,ds \]
and
\[ \sum_j \frac{\partial g}{\partial x}(t_{j-1},X_{t_{j-1}})(X_{t_j}-X_{t_{j-1}}) \approx \int_0^t \frac{\partial g}{\partial x}(s,X_s)\,dX_s, \]
where
\[ \int_0^t \frac{\partial g}{\partial x}(s,X_s)\,dX_s = \int_0^t \frac{\partial g}{\partial x}(s,X_s)\,U_s\,ds + \int_0^t \frac{\partial g}{\partial x}(s,X_s)\,V_s\,dB_s. \]
For the second-order terms, only the term $\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}(t_{j-1},X_{t_{j-1}})(X_{t_j}-X_{t_{j-1}})^2$ is not approximately zero. Using $X_{t_j}-X_{t_{j-1}} \approx U_{t_{j-1}}(t_j-t_{j-1}) + V_{t_{j-1}}(B_{t_j}-B_{t_{j-1}})$, we expand this term:
\[ \sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}\Bigl[U_{t_{j-1}}^2(t_j-t_{j-1})^2 + V_{t_{j-1}}^2(B_{t_j}-B_{t_{j-1}})^2 + 2U_{t_{j-1}}V_{t_{j-1}}(t_j-t_{j-1})(B_{t_j}-B_{t_{j-1}})\Bigr]. \tag{2.28} \]
We claim that (2.28) has only one summand that is not approximately zero, namely $\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}V_{t_{j-1}}^2(B_{t_j}-B_{t_{j-1}})^2$, and it satisfies
\[ \Bigl\|\sum_{j=1}^n\frac{\partial^2 g}{\partial x^2}\,V_{t_{j-1}}^2(B_{t_j}-B_{t_{j-1}})^2 - \int_0^t\frac{\partial^2 g}{\partial x^2}\,V_s^2\,ds\Bigr\|_{L^2}\to 0. \tag{2.29} \]
For details on how to prove (2.29), see [12, p. 32]; we present a similar and more transparent argument to convey the essence of Ito's formula. To this end, we now show that
\[ \lim_{\|P\|\to 0}\Bigl\|\sum_{j=1}^n\bigl[(B_{t_j}-B_{t_{j-1}})^2 - (t_j-t_{j-1})\bigr]\Bigr\|_{L^2} = 0. \tag{2.30} \]
To see this, set $M_j := (B_{t_j}-B_{t_{j-1}})^2 - (t_j-t_{j-1})$. Then $EM_j = 0$, because $E(B_{t_j}-B_{t_{j-1}})^2 = t_j-t_{j-1}$. Further, $E(M_j)^2 = 2(t_j-t_{j-1})^2$, because
\[ E(M_j)^2 = E\bigl((B_{t_j}-B_{t_{j-1}})^4 - 2(B_{t_j}-B_{t_{j-1}})^2(t_j-t_{j-1}) + (t_j-t_{j-1})^2\bigr) = 3(t_j-t_{j-1})^2 - 2(t_j-t_{j-1})^2 + (t_j-t_{j-1})^2, \]
by (1.5) (in the section on Brownian motion). Now, since Brownian motion has independent increments, the $M_j$ are independent, which means
\[ E\Bigl(\sum_{j=1}^n M_j\Bigr)^2 = \sum_{j=1}^n E(M_j)^2 = \sum_{j=1}^n 2(t_j-t_{j-1})^2, \]
which clearly goes to zero as the norm of the partition goes to zero. So, we have taken the Taylor expansion of $g(t,X_t)$ and approximated each term when the norm of the partition is small; combining all of them yields Ito's formula, as claimed.

Using differential notation, we can express (2.30) as "$(dB_t)^2 = dt$". Also, we may write "$(dt)^2 = dt\,dB_t = 0$," since terms containing $(t_j-t_{j-1})^2$ or $(t_j-t_{j-1})(B_{t_j}-B_{t_{j-1}})$ go to zero when the norm of the partition goes to zero; thus we may write
\[ "(dX_t)^2 = U^2(dt)^2 + 2UV\,dt\,dB_t + V^2(dB_t)^2 = V^2\,dt." \]
We can then rewrite Ito's formula more conveniently:
\[ dg(t,X_t) = \frac{\partial g}{\partial t}(t,X_t)\,dt + \frac{\partial g}{\partial x}(t,X_t)\,dX_t + \frac12\frac{\partial^2 g}{\partial x^2}(t,X_t)\,(dX_t)^2. \tag{2.31} \]
We may extend this result to $n$ dimensions: taking $X$ to be an $n$-dimensional stochastic integral and $g\in C^2(\mathbb R_+\times\mathbb R^n;\mathbb R^n)$, where $X = (X^i)_{i=1}^n$ and $g = (g^k)_{k=1}^n$, the process $(g(t,X_t))_{t\in\mathbb R_+}$ is an $n$-dimensional stochastic integral, and for all $t\in\mathbb R_+$,
\[ dg^k(t,X_t) = \frac{\partial g^k}{\partial t}(t,X_t)\,dt + \sum_{i=1}^n\frac{\partial g^k}{\partial x_i}(t,X_t)\,dX^i_t + \frac12\sum_{i,j=1}^n\frac{\partial^2 g^k}{\partial x_i\,\partial x_j}(t,X_t)\,dX^i_t\,dX^j_t \tag{2.32} \]
for $1\le k\le n$ (understanding $dB^i_t\,dB^j_t = \delta_{ij}\,dt$ and $dt\,dB^i_t = 0$). We remark that Ito's formula can be stated and proved in greater generality using martingale theory (as in [9, Theorem 17.18]), but we do not need that level of abstraction in the specialized environment of stochastic differential equations.
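The key identity "$(dB_t)^2 = dt$" of (2.30) is easy to observe numerically. The following added sketch (NumPy and parameter values are assumptions) sums squared Brownian increments over $[0,1]$ for finer and finer partitions; the sums concentrate at $t = 1$.

```python
# Sketch: the quadratic variation of Brownian motion on [0, t] is t,
# i.e. sum of (dB)^2 over a partition tends to t as the mesh shrinks.
import numpy as np

rng = np.random.default_rng(2)
t = 1.0
for N in (10, 100, 1_000, 10_000, 100_000):
    dB = rng.normal(0.0, np.sqrt(t / N), N)   # N independent increments
    print(N, np.sum(dB**2))                    # tends to t = 1.0
```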
Now for some examples; let $B$ be a real-valued Brownian motion on a probability space $(\Omega,\mathcal A,P)$, and let $X_0 = x$ a.s. always be the initial condition (for some $x\in\mathbb R$). To start, let us observe an integration by parts formula: for $f\in C^1(\mathbb R_+)$,
\[ \int_0^t f(s)\,dB_s = f(t)B_t - \int_0^t B_s\,df(s) \tag{2.33} \]
for all $t\in\mathbb R_+$. This follows from Ito's formula; we expect that $\int_0^t f(s)\,dB_s$ should somehow yield an "$f(t)B_t$" sort of term, so we take $g(t,x) = f(t)x$. Then clearly $\frac{\partial g}{\partial t}(t,x) = f'(t)x$, $\frac{\partial g}{\partial x}(t,x) = f(t)$, and $\frac{\partial^2 g}{\partial x^2}(t,x) = 0$. Since $g(t,B_t) = f(t)B_t$, by Ito's formula,
\[ d(f(t)B_t) = f'(t)B_t\,dt + f(t)\,dB_t + 0, \]
which easily yields (2.33). Note that $f$ depends only on time, and note that (similarly to the above argument)
\[ \int_0^t f(s)\,dX_s = f(t)X_t - f(0)X_0 - \int_0^t X_s\,df(s) \tag{2.34} \]
holds for any one-dimensional stochastic integral $X$ (for a more general result, see [9, Theorem 17.16]). Along these lines, consider
\[ dX_t = \sigma\,dB_t - X_t\,dt. \tag{2.35} \]
Taking $g(t,x) = e^t x$ and using Ito's formula yields
\[ d(e^tX_t) = e^t\,dX_t + e^tX_t\,dt = \sigma e^t\,dB_t, \]
and it is easy to see that the solution is
\[ X_t = e^{-t}x + \sigma\int_0^t e^{s-t}\,dB_s. \]
Similarly,
\[ dX_t = \sigma\,dB_t - bX_t\,dt, \]
where $\sigma, b$ are constants, has solution
\[ X_t = e^{-bt}x + \sigma\int_0^t e^{b(s-t)}\,dB_s. \]
Next, we have a basic example of an Ito integral calculation:
\[ \int_0^t B_s\,dB_s = \frac12(B_t^2 - t). \tag{2.36} \]
We expect the $\frac12 B_t^2$ term from deterministic calculus, but we inherit the extra $\frac12 t$ term from the stochastic case (often called a correction term). To use Ito's formula, take $g(t,x) = \frac12 x^2$ and $X_t = B_t$, which yields
\[ dg(t,B_t) = B_t\,dB_t + \tfrac12(dB_t)^2, \]
which means
\[ \frac12 B_t^2 = \int_0^t B_s\,dB_s + \frac12\int_0^t ds, \]
and this easily reduces to (2.36). This example shows how the realization $(dB_t)^2 = dt$ helps calculate Ito integrals. Notice that the integral of a purely random process ends up having a deterministic part; for deeper insight along these lines, see [9, pp. 339-340].

Along these lines, let us solve
\[ dX_t = bX_t\,dt + \sigma X_t\,dB_t, \tag{2.37} \]
where $\sigma$ and $b$ are positive constants. The noiseless case would produce the solution $X_t = xe^{bt}$, so we may expect something like this but with a correction term. Since (2.37) can be interpreted as
\[ \int_0^t \frac{dX_s}{X_s} = bt + \sigma B_t, \]
we apply Ito's formula (using that $(dt)^2 = 0$):
\[ d(\ln X_t) = \frac{dX_t}{X_t} - \frac{(dX_t)^2}{2X_t^2} = \frac{dX_t}{X_t} - \frac{\sigma^2X_t^2\,dt}{2X_t^2} = \frac{dX_t}{X_t} - \frac12\sigma^2\,dt. \]
Thus, solving for $\frac{dX_t}{X_t}$ and integrating, we get
\[ bt + \sigma B_t = \ln\Bigl(\frac{X_t}{x}\Bigr) + \frac12\sigma^2 t, \]
or
\[ X_t = x\,e^{(b-\frac12\sigma^2)t + \sigma B_t}. \]
Notice that we recover the proper solution for the noiseless case when $\sigma := 0$, and that we have solved the one-dimensional equation $\dot X = (r(X) + \sigma W)X$ when $r(X)$ is constant.
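The closed-form solution of (2.37) can be checked against a direct discretization of the equation on the same Brownian path. The following added sketch does so; the parameter values are illustrative assumptions, and agreement is up to discretization error.

```python
# Sketch: compare the closed form X_t = x exp((b - sigma^2/2) t + sigma B_t)
# with an Euler-Maruyama discretization of dX = b X dt + sigma X dB.
import numpy as np

rng = np.random.default_rng(3)
b, sigma, x = 1.0, 0.4, 2.0
T, N = 1.0, 100_000
dt = T / N
dB = rng.normal(0.0, np.sqrt(dt), N)
B = np.cumsum(dB)

X = x
for k in range(N):
    X += b * X * dt + sigma * X * dB[k]       # Euler-Maruyama step

X_exact = x * np.exp((b - 0.5 * sigma**2) * T + sigma * B[-1])
print(X, X_exact)    # the two values agree up to discretization error
```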
For the next example we study the logistic equation
\[ dX_t = (aX_t - bX_t^2)\,dt + \sigma X_t\,dB_t \tag{2.38} \]
for $a$, $b$, and $\sigma$ positive constants. We remark that only $a$ is perturbed; if $b$ is perturbed, then the probability that solutions do not explode in finite time is zero (except, of course, for the trivial solution $x = 0$; see [2, p. 99]). To solve (2.38), substitute $Y_t = X_t^{-1}$ to get
\[ dY_t = \bigl((\sigma^2 - a)Y_t + b\bigr)\,dt - \sigma Y_t\,dB_t, \]
which is solved in a fashion similar to (2.37):
\[ Y_t = e^{-[(a-\frac12\sigma^2)t + \sigma B_t]}\Bigl[\frac1x + b\int_0^t e^{(a-\frac12\sigma^2)s + \sigma B_s}\,ds\Bigr]. \]
Transforming back to $X_t$ gives
\[ X_t = \frac{x\,e^{(a-\frac12\sigma^2)t + \sigma B_t}}{1 + xb\int_0^t e^{(a-\frac12\sigma^2)s + \sigma B_s}\,ds}. \]
Observe that in the noiseless case one recovers the familiar elementary solution
\[ X_t = \frac{xK}{x + (K-x)e^{-at}} \]
for $K = a/b$ and all $t\in\mathbb R_+$.

Finally, let us move to a two-dimensional system, now driven by a single real-valued Brownian motion $B$ on a probability space $(\Omega,\mathcal A,P)$. We will "reverse engineer" the classical system
\[ dx_1 = -x_2\,dt, \qquad dx_2 = x_1\,dt, \]
with initial value $x(0) = (x_1(0),x_2(0)) = (1,0)\in\mathbb R^2$. It is well known that the solution is $x(t) = (\cos t,\sin t) = e^{it}$. Now pick $g(t,x) = e^{ix}$, so that
\[ g(t,B) = e^{iB} = (\cos B, \sin B) =: (X^1,X^2). \]
Then by Ito's formula, we see that
\[ dX^1_t = -\sin(B_t)\,dB_t - \tfrac12\cos(B_t)\,dt = -X^2\,dB_t - \tfrac12X^1\,dt, \]
\[ dX^2_t = \cos(B_t)\,dB_t - \tfrac12\sin(B_t)\,dt = X^1\,dB_t - \tfrac12X^2\,dt \]
is the stochastic system (with initial condition $X_0 = (1,0)$ a.s.) whose solution is a one-dimensional Brownian motion traveling around the unit circle.

Chapter 3
Dynamical Systems and Stochastic Stability

3.1 "Stochastic Dynamical Systems"

In this section we present an overview of dynamical systems and their "stochastic" analogues. Let $X$ be a metric space and let $S: D(S)\subset\mathbb R\times X\to X$. For $t\in\mathbb R$, let $S_t := S(t,\cdot)$.

Definition 3.1. We say $S$ is a local dynamical system or local flow if $D(S)$ is open in $\mathbb R\times X$, $S$ is continuous, and i) $S_0 = \operatorname{id}_X$, and ii) $S_{t+s} = S_t\circ S_s$ for all $t,s\in\mathbb R$ (wherever both sides are defined). We say $S$ is a global flow if $S$ is a local flow with $D(S) = \mathbb R\times X$. We say $S$ is a local (global) semiflow if the above conditions hold with $\mathbb R$ replaced by $\mathbb R_+$.

Definition 3.2. We say that a set $A\subset X$ is positively invariant for a local semiflow $S$ if $S_t(A\cap D(S_t))\subset A$ for all $t\in\mathbb R_+$. We say $x$ is an equilibrium point if, for every $t\in\mathbb R_+$, $x\in D(S_t)$ and $S_t(x) = x$.

Recall that autonomous ordinary differential equations generate flows: let $b\in C^1(\mathbb R^n,\mathbb R^n)$ and consider the system $\dot u = b(u)$. Given any initial value $x\in\mathbb R^n$ there exists a largest open interval of existence $I_x\subset\mathbb R$ containing 0 such that the system $\dot u = b(u)$ has a unique solution $u_x\in C^1(I_x,\mathbb R^n)$ with $u_x(0) = x$. The system $\dot u = b(u)$ generates a local solution flow $S: D(S)\subset\mathbb R\times\mathbb R^n\to\mathbb R^n$ with $D(S) := \{(t,x)\in\mathbb R\times\mathbb R^n \mid t\in I_x\}$, where $S(t,x) := u_x(t)$ for all $(t,x)\in D(S)$; $D(S)$ is open and $S$ is continuous by [1, Theorem 8.3], $S$ is $C^1(D(S),\mathbb R^n)$ since $b\in C^1(\mathbb R^n,\mathbb R^n)$ by [1, Theorem 9.5], and the uniqueness of the solution guarantees the group property $S_{t+s} = S_tS_s$ for all $s,t\in\mathbb R$. Similarly, we can obtain the local solution semiflow $S|_{D(S)\cap(\mathbb R_+\times\mathbb R^n)}$. Observe that $\frac{\partial}{\partial t}S(t,x) = b(S(t,x))$ for all $(t,x)\in D(S)$.

Assume $\dot u = b(u)$ generates a global solution flow $S$. Then $S_t\in C^1(\mathbb R^n,\mathbb R^n)$ for all $t\in\mathbb R$, and $S_t$ is invertible with $S_t^{-1} = S_{-t}\in C^1(\mathbb R^n,\mathbb R^n)$. So, for each $t\in\mathbb R$, $S_t:\mathbb R^n\to\mathbb R^n$ is a $C^1$-diffeomorphism. This means that for any $A\in\mathcal B^n$ we have the change of variables formula
\[ \int_{S_t^{-1}(A)} f(x)\,dx = \int_A f(S_t^{-1}(y))\,\bigl|\det\bigl(D(S_t^{-1})(y)\bigr)\bigr|\,dy = \int_A f(S_{-t}(y))\,\bigl|\det(DS_{-t})(y)\bigr|\,dy, \tag{3.1} \]
where $D(S_t^{-1})$ is the (nonsingular) Jacobian matrix of $S_t^{-1}$. Thus, $\lambda(A) = 0$ implies $\lambda(S_t^{-1}(A)) = 0$ for all $t$; we shall soon see the importance of this property.

Analogously, consider the autonomous case of stochastic differential equations. For $(U,V)\in\mathcal S^{n,d}_{\mathcal F,B}$, this amounts to making $U$ and $V$ constant in time; call $U_t := b$ for all $t\in\mathbb R_+$ and $V_t := \sigma$ for all $t\in\mathbb R_+$, where $b:\mathbb R^n\to\mathbb R^n$ is $(\mathcal B^n,\mathcal B^n)$-measurable and $\sigma:\mathbb R^n\to\mathbb R^{n\times d}$ is $(\mathcal B^n,\mathcal B^{n\times d})$-measurable. We remark that this may seem like a strange and sudden notation change, but it is quite common in the literature to use "$\sigma$, $b$" notation, and so we adhere to this convention now, specially reserving it for the autonomous case (even though many authors use $\sigma$ and $b$ for more general cases).

We shall now see how solutions in the autonomous case enjoy some of the "nice dynamical systems properties" that we hope for. Considering only degenerate initial distributions, we see that (2.20) takes the form
\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad X_0 = x \ \text{a.s.}, \tag{3.2} \]
with $x\in\mathbb R^n$. To emphasize the initial condition in the solution's expression, call $X^{0,x}$ the strong solution to (3.2), where $B$ is given and $X^{0,x}$ is $\mathcal F(B)$-adapted. Let us study the induced weak problem
\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad P_{X_0} = \delta_x. \tag{3.3} \]
We will now provide two weak solutions to (3.3) that are "time shifts" of each other and use the weak uniqueness property to conclude that they must have the same distribution. Obviously, $(X^{0,x},B,\mathcal F(B))$ is a weak solution to (3.3).
Now consider the process $(X^{s,x}_{t+s})_{t\in\mathbb R_+}$, that is, the shifted version of $X^{0,x}$ as above, where $X^{s,x}_s = x$ a.s. We would like to shift back in time by $s$ to solve (3.3); observe that one cannot simply shift the Brownian motion $B$ in time without affecting the variance, so to "shift" $B$, we need to define a new Brownian motion with the appropriate distribution (this is why we must appeal to the weak problem!). So, define $\tilde B_t := B_{t+s} - B_s$ for all $t\ge 0$. Then $\tilde B = (\tilde B_t)_{t\in\mathbb R_+}$ is a Brownian motion starting at zero, by the stationary increments of $B$, and $\tilde{\mathcal F}(\tilde B)$ is defined by $\tilde{\mathcal F}_t(\tilde B) := \mathcal F_{t+s}(B)$ for all $t\in\mathbb R_+$. Now, by definition of $X^{s,x}$ and shifting,
\[ X^{s,x}_{t+s} = x + \int_s^{t+s} b(X^{s,x}_r)\,dr + \int_s^{t+s}\sigma(X^{s,x}_r)\,dB_r = x + \int_0^t b(X^{s,x}_{r+s})\,dr + \int_0^t \sigma(X^{s,x}_{r+s})\,d\tilde B_r. \]
This means $\bigl((X^{s,x}_{t+s})_{t\in\mathbb R_+},\tilde B,\tilde{\mathcal F}(\tilde B)\bigr)$ is also a weak solution of (3.3). Thus, by weak uniqueness,
\[ (X^{0,x}_t)_{t\in\mathbb R_+} \overset{d}{=} (X^{s,x}_{t+s})_{t\in\mathbb R_+}, \tag{3.4} \]
which leads us to the following definition:

Definition 3.3. We say a process $X$ is time-homogeneous if $X$ satisfies (3.4).

We call this the diffusion case, often referring to the solution $X$ as a diffusion; we think of $x$ as a particle that would move with velocity $b$ ($b$ is sometimes called the "drift" coefficient), except that random collisions with other particles (say, the collisions occur with some kind of "intensity" $\sigma$, also called a "diffusion" coefficient) may cause interference. As we will see later, there is an intimate relationship between stochastic differential equations and second-order partial differential equations, which is one reason why the term "diffusion" is used. Again, time-homogeneity has much to do with the "nice dynamical systems properties" we want; we may think of a diffusion as a "stochastic semiflow." In fact, in probability theory, a diffusion refers to a Markov process with continuous paths (with perhaps some extra properties); it is therefore not a surprise that a solution to a stochastic differential equation with an initial value that is a.s. constant is a Markov process (see [12, Theorem 7.2]).

So what if the initial condition is a nondegenerate random variable? Then we have a semiflow acting on a set of probability measures. More precisely, recall that $C_b(\mathbb R^n;\mathbb R)$ is the set of bounded, continuous functions mapping $\mathbb R^n$ into $\mathbb R$, and call $M_{\mathcal B^n}$ the set of all finite Borel measures. Equip $C_b(\mathbb R^n;\mathbb R)$ with the sup norm, that is, the norm $\|f\| = \sup_x|f(x)|$, to make it a Banach space. Then $M_{\mathcal B^n}$ is a subset of $C_b^*(\mathbb R^n;\mathbb R)$, the dual space of $C_b(\mathbb R^n;\mathbb R)$, so if we equip $C_b^*(\mathbb R^n;\mathbb R)$ with the weak* topology, $M_{\mathcal B^n}$ inherits it. It can be shown that $M_{\mathcal B^n}$ is metrizable as a complete metric space (see [13, p. 371]); therefore a dynamical system can be defined over $M_{\mathcal B^n}$. In fact, we are most interested in the (Lebesgue) absolutely continuous measures; this allows us the luxury of using semigroup theory to "fluctuate functions" rather than "fluctuate measures."

Now define a family $U = (U_t)_{t\ge0}$ such that for each $t\in\mathbb R_+$, $U_t: M_{\mathcal B^n}\to M_{\mathcal B^n}$ by
\[ U_t\mu := \int P_t(x,\cdot)\,\mu(dx). \tag{3.5} \]
Then $U$ is the dual semigroup to $T$, as $\langle f, U_t\mu\rangle = \langle T_tf,\mu\rangle$ for all $f\in C_0(\mathbb R^n;\mathbb R)$ and all $\mu\in C_0^*(\mathbb R^n;\mathbb R)$, where $\langle f,\mu\rangle_{C_0(\mathbb R),C_0^*(\mathbb R)} := \int f\,d\mu$. Note that $U$ is a semidynamical system on $M_{\mathcal B^n}$: $\mu\in M_{\mathcal B^n}$ implies $U_t\mu\in M_{\mathcal B^n}$ for every $t\in\mathbb R_+$. Also, for any $B\in\mathcal B^n$, we have
\[ U_0\mu(B) = \int P_0(x,B)\,\mu(dx) = \int 1_B(x)\,\mu(dx) = \mu(B) \]
and
\[ U_{s+t}\mu(B) = \int P_{s+t}(x,B)\,\mu(dx) = \int\Bigl(\int P_t(y,B)\,P_s(x,dy)\Bigr)\mu(dx) = \int P_t(y,B)\int P_s(x,dy)\,\mu(dx) = \int P_t(y,B)\,(U_s\mu)(dy) = U_t(U_s\mu)(B). \]
All we lack is continuity, the proof of which can be found in [13, pp. 370-371].
Notice that if $\mu$ is a probability measure, then $U_t\mu$ is a probability measure for every $t\in\mathbb R_+$, since $T$ is a contraction semigroup.

3.2 Koopman and Frobenius-Perron Operators: The Deterministic Case

In this section we define the Koopman and Frobenius-Perron operators, which are useful in understanding how deterministic results for differential equations extend to stochastic ones. Primarily, we are interested in describing the distribution of a (continuous-time $\mathbb R^n$-valued) solution process $X$ of a stochastic differential equation via semigroup theory; in the case where the distributions $P_{X_t}$ have densities for every $t$, one can represent the flow as a semigroup of linear operators on $L^1(\mathbb R^n)$ whose generator is a second-order differential operator on $\mathbb R^n$. This leads to the set-up of a partial differential equation called the Fokker-Planck equation, which describes the evolution of the densities of the distributions of $X$ (assuming that the random variable $X_t$ has a Lebesgue density for all $t\in\mathbb R_+$).

A brief outline of the procedure is as follows. First, we define the Koopman and Frobenius-Perron operators. We show they are adjoint and derive the infinitesimal generators of each in the case of a deterministic ordinary differential equation. We then make a stochastic generalization of these operators and mimic the deterministic case, employing stochastic calculus. Finally, we obtain the form of the infinitesimal generator of the semigroup describing the solution process $X$ and use its adjoint to derive the Fokker-Planck equation.

Given a measurable space $(X,\mathcal F)$ equipped with a signed measure $\nu$, let $(Y,\mathcal G,\nu^*)$ be a ($\sigma$-finite) measure space, and let $S: X\to Y$ be measurable.

Definition 3.4. We define the image measure of $\nu$ under $S$ by $\nu^S(G) = \nu(S^{-1}(G))$, for any $G\in\mathcal G$.

A useful characterization, which follows from the definition, is
\[ \int g\,d\nu^S = \int (g\circ S)\,d\nu \tag{3.6} \]
for $g$ nonnegative and measurable. Also, a measurable $g$ is $\nu^S$-integrable iff $g\circ S$ is $\nu$-integrable, in which case (3.6) holds.

Definition 3.5. $S$ is nonsingular if $\nu^*(G) = 0$ implies $\nu(S^{-1}(G)) = 0$ for every $G\in\mathcal G$.

So, if $S$ is nonsingular, then $\nu^S$ is absolutely continuous with respect to $\nu^*$ and thus has a Radon-Nikodym derivative $\frac{d\nu^S}{d\nu^*}$ (which is in $L^1(\nu^*) := L^1(Y,\mathcal G,\nu^*)$ iff $\nu^S$ is finite). Now let $(X,\mathcal F,\mu)$ be another $\sigma$-finite measure space, let $f\in L^1(\mu) := L^1(X,\mathcal F,\mu)$, and recall that $\mu_f(A) := \int_A f\,d\mu$ for $A\in\mathcal F$.

We now define the Frobenius-Perron operator $P$ (associated with $S$ as above) by applying the image-measure construction to the signed measure $\mu_f$; denote this by $\mu^S_f = (\mu_f)^S$. Note that since $f$ is $\mu$-integrable and
\[ \mu^S_f(G) = \mu_f(S^{-1}(G)) = \int_{S^{-1}(G)} f\,d\mu \]
holds for any $G\in\mathcal G$, $\mu^S_f$ is a finite signed measure.

Definition 3.6. The operator $P: L^1(\mu)\to L^1(\nu^*)$ defined by
\[ Pf = \frac{d(\mu^S_f)}{d\nu^*} \]
is called the Frobenius-Perron operator (associated to $S$).

We obtain this by taking $f\in L^1(\mu)$, creating $\mu_f$, associating to it $\mu^S_f$, and using the nonsingularity of $S$ to take the Radon-Nikodym derivative with respect to $\nu^*$. Put another way, for $G\in\mathcal G$,
\[ \mu^S_f(G) = \mu_f(S^{-1}(G)) = \int_{S^{-1}(G)} f\,d\mu = \int_G Pf\,d\nu^*. \]
In fact, what happens in general is that $S$ causes a change of measure, so one can think of $\nu\mapsto\nu^S$ as a mapping from $M_{\mathcal F}$ into $M_{\mathcal G}$, where $M_{\mathcal A}$ denotes the set of all finite signed measures on a $\sigma$-algebra $\mathcal A$. Define, for a given measure $\nu_0$ on $\mathcal A$,
\[ M^{\nu_0}_{\mathcal A} = \{\nu\in M_{\mathcal A} \mid \nu\ll\nu_0\}, \]
which is the set of all finite signed measures that are absolutely continuous with respect to $\nu_0$ over $\mathcal A$. By Radon-Nikodym, there is a one-to-one correspondence between elements of $M^{\nu_0}_{\mathcal A}$ and $L^1(\nu_0)$.
Also, if $S$ is nonsingular, then $\nu\mapsto\nu^S$ maps $M^{\mu}_{\mathcal F}$ into $M^{\nu^*}_{\mathcal G}$; by the above one-to-one correspondence, this mapping can be identified with the Frobenius-Perron operator. Closely related to this concept is the Koopman operator.

Definition 3.7. The operator $U: L^\infty(\nu^*)\to L^\infty(\mu)$ defined by $Ug = g\circ S$ is called the Koopman operator (associated to $S$).

Clearly, $U$ is nonnegative in the sense that $g\ge 0$ implies $Ug\ge 0$ for all $g\in L^\infty(\nu^*)$, and $U$ is a bounded linear operator with operator norm 1. Further, $P$ is a nonnegative bounded linear operator with operator norm 1. To see that $P$ is nonnegative, let $f\in L^1(\mu)$ be nonnegative everywhere and suppose that $Pf$ is negative on a set $G\in\mathcal G$ of positive measure. This implies
\[ \int_{S^{-1}(G)} f\,d\mu = \int_G Pf\,d\nu^* < 0, \]
which contradicts the nonnegativity of $f$. To see that the operator norm of $P$ is 1, observe that for nonnegative $f\in L^1(\mu)$,
\[ \|Pf\|_{L^1(\nu^*)} = \int_Y Pf\,d\nu^* = \int_X f\,d\mu = \|f\|_{L^1(\mu)}. \]
This extends easily to the case of arbitrary $f\in L^1(\mu)$, using the decomposition of $f$ into positive and negative parts.

Lemma 3.1. The Koopman operator is the adjoint of the Frobenius-Perron operator.

Proof. By (3.6), for all $f\in L^1(\mu)$ and $g\in L^\infty(\nu^*)$,
\[ \langle Pf,g\rangle_{L^1(\nu^*),L^\infty(\nu^*)} = \int (Pf)\,g\,d\nu^* = \int \frac{d\mu^S_f}{d\nu^*}\,g\,d\nu^* = \int g\,d\mu^S_f = \int (g\circ S)\,d\mu_f = \int (g\circ S)\,f\,d\mu = \langle f,Ug\rangle_{L^1(\mu),L^\infty(\mu)}. \]
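The duality of Lemma 3.1 is concrete enough to check by quadrature. The following added sketch (all choices are assumptions: the map $S(x) = 2x$ on $\mathbb R$ with Lebesgue measure, the test functions, NumPy, and the crude grid) builds $P$ from the change-of-variables form $Pf(y) = f(S^{-1}y)\,|det\,DS^{-1}(y)|$ and $U$ as composition, then compares $\langle Pf,g\rangle$ with $\langle f,Ug\rangle$.

```python
# Sketch: Frobenius-Perron and Koopman operators for S(x) = 2x on R,
# plus a numerical check of the duality <Pf, g> = <f, Ug>.
import numpy as np

S     = lambda x: 2.0 * x
S_inv = lambda y: y / 2.0

def P(f):                        # Pf(y) = f(S^{-1} y) |det D S^{-1}(y)|
    return lambda y: f(S_inv(y)) * 0.5

def U(g):                        # Ug = g o S
    return lambda x: g(S(x))

f = lambda x: np.exp(-x**2)            # an integrable (L^1) function
g = lambda x: 1.0 / (1.0 + x**2)       # a bounded (L^infinity) function

x = np.linspace(-20.0, 20.0, 400_001)  # crude quadrature grid
dx = x[1] - x[0]
print(np.sum(P(f)(x) * g(x)) * dx,     # <Pf, g>
      np.sum(f(x) * U(g)(x)) * dx)     # <f, Ug>; the two should match
```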
Now, let $X = Y$ be a metric space, let $\mathcal F = \mathcal G = \mathcal B$, where $\mathcal B$ is the Borel $\sigma$-algebra on $X$, and let $\mu$, $\mu^*$ be two $\sigma$-finite measures on $\mathcal B$. Now we take $S:\mathbb R_+\times X\to X$ to be a nonsingular semidynamical system, that is, a semidynamical system such that $S_t: X\to X$ is nonsingular for all $t$. Then we can define $P_tf := \frac{d\mu^{S_t}_f}{d\mu^*}$ and $U_tg := g\circ S_t$, for each $t\in\mathbb R_+$. More specifically, let $X = Y = \mathbb R^n$, let $\mu = \mu^* = \lambda^n$, and let $\mathcal F = \mathcal G = \mathcal B^n$. As in the previous section, let $b\in C^1(\mathbb R^n,\mathbb R^n)$ and let $\dot y = b(y)$ generate a global solution flow $S$; recall that $S$ is nonsingular due to the change of variables formula (3.1). We now observe that $P := \{P_t\}_{t\ge0}$ and $U := \{U_t\}_{t\ge0}$ (both associated to $S$) are in fact semigroups. Let $t,s\ge 0$ and let $g\in L^\infty(\mathbb R^n)$. $U$ is a semigroup because
\[ U_{t+s}(g) = g\circ S_{t+s} = g\circ(S_t\circ S_s) = (g\circ S_t)\circ S_s = U_s(U_t(g)), \]
and clearly $U_0(g) = g$. Also, $P$ is a semigroup, since for any $f\in L^1(\mathbb R^n)$ and $A\in\mathcal B^n$,
\[ \int_A P_{t+s}f = \int_{S_{t+s}^{-1}(A)} f = \int_{S_s^{-1}(S_t^{-1}(A))} f = \int_{S_t^{-1}(A)} P_sf = \int_A P_t(P_sf), \]
so $P_{t+s}f = P_t(P_sf)$ (and clearly $P_0f = f$). We have already observed that $U_t$ and $P_t$ have operator norm equal to one for each $t\ge 0$; therefore $P$ and $U$ are in fact contraction semigroups. The next property we need is strong continuity. Call $C^1_c(\mathbb R^n)$ the space of functions in $C^1(\mathbb R^n)$ that have compact support.

Lemma 3.2. $P$ is a strongly continuous semigroup.

Proof. By (3.1), we have (a.e.) $P_tf(x) = f(S_{-t}x)\,|\det((DS_{-t})(x))|$ for all $f\in C^1_c(\mathbb R^n)$, and
\[ \lim_{t\to 0^+} f(S_{-t}x)\,|\det((DS_{-t})(x))| = f(x). \]
We claim that the limit is uniform in $x$. To see this, define $K_t$ for $t\in[0,1]$ to be the support of $P_tf$, so $K_0$ is the support of $f$. Then $K_t = S_t(K_0)$ for all $t\in[0,1]$, since $|\det(DS_{-t})(x)|$ is never zero and
\[ P_tf(x)\neq 0 \iff f(S_{-t}x)\neq 0 \iff S_{-t}x\in K_0 \iff x\in S_t(K_0). \]
But $K_0$ is compact and $S$ is continuous, so $K := \bigcup_{t\in[0,1]}K_t = S([0,1]\times K_0)$ is compact. Therefore $(t,x)\mapsto f(S_{-t}x)$ is uniformly continuous on $[0,1]\times K$, and so is $(t,x)\mapsto f(S_{-t}x)\,|\det((DS_{-t})(x))|$. Finally, realizing that if $h:[0,1]\times\mathbb R^n\to\mathbb R$ is uniformly continuous, then $h(t,\cdot)$ converges uniformly to $h(0,\cdot)$ as $t\to 0^+$ proves our claim. Since all the functions involved vanish outside the compact set $K$, our claim implies
\[ \lim_{t\to 0^+}\|P_tf - f\|_{L^1} = \lim_{t\to 0^+}\int_K |P_tf(x)-f(x)|\,dx = 0. \]
Since the operators $P_t$ are uniformly bounded, the convergence extends by density to all $f\in L^1(\mathbb R^n)$ (the closure of $C^1_c(\mathbb R^n)$ in the $L^1$-norm). This means $P$ is strongly continuous.

As a consequence, we know that $P$ has an infinitesimal generator $A_{FP}$. To identify $A_{FP}$, we will use the duality between the Frobenius-Perron and Koopman operators, first showing that $U$ has an infinitesimal generator $A_K$ and identifying it. To this end, let $t\in\mathbb R_+$, $x\in\mathbb R^n$, and suppose $f\in C^1_c(\mathbb R^n)$; then by the Mean Value Theorem, the definition of $U$, and the definition of the solution semiflow,
\[ \frac{U_tf(x)-f(x)}{t} = \frac{f(S(t,x))-f(x)}{t} = \nabla f(S(ct,x))\cdot\frac{\partial}{\partial t}S(ct,x) = b(S(ct,x))\cdot\nabla f(S(ct,x)) \]
for some $0\le c\le 1$. Then
\[ \lim_{t\to 0^+}\frac{U_tf(x)-f(x)}{t} = \lim_{t\to 0^+} b(S(ct,x))\cdot\nabla f(S(ct,x)) = b(x)\cdot\nabla f(x). \]
By an argument similar to that in the proof of Lemma 3.2, the limit is uniform in $x$. Thus, at least for $f\in C^1_c(\mathbb R^n)$, $\frac{U_tf-f}{t}$ converges in $L^\infty$; in particular, $U_tf$ converges to $f$ in $L^\infty$ for all $f\in C^1_c(\mathbb R^n)$ (differentiability implies continuity). Since the operators $U_t$ are uniformly bounded, the convergence extends by density: $U_tf\to f$ in $L^\infty$ for all $f\in C_0$ (the closure of $C^1_c(\mathbb R^n)$ in the $L^\infty$-norm). Further, $U_t(C_0)\subset C_0$, so $U$ restricts to a strongly continuous semigroup on $C_0$. So $U|_{C_0}$ has an infinitesimal generator $A_K$; our calculation shows $C^1_c(\mathbb R^n)\subset D(A_K)$ and that, for all $f\in C^1_c(\mathbb R^n)$,
\[ A_Kf = b\cdot\nabla f. \tag{3.7} \]
We now use the duality between the Frobenius-Perron and Koopman operators to identify $A_{FP}$. Note that $P$ is not really the dual of $U|_{C_0}$; also, we would need reflexivity to ensure that a strongly continuous contraction semigroup $T$ has a strongly continuous dual contraction semigroup $T^*$, and that the dual of the generator of $T$ is really the generator of $T^*$ [17, Theorem 3.7.1]. However, let $g\in C^1_c(\mathbb R^n)$ and let $f\in D(A_{FP})$ be continuously differentiable. Then for any $t\ge 0$ we have
\[ \langle P_tf,g\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)} = \langle f,U_tg\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)}, \]
and we can subtract $\langle f,g\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)}$ from both sides and divide by $t$:
\[ \Bigl\langle\frac{P_tf-f}{t},g\Bigr\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)} = \Bigl\langle f,\frac{U_tg-g}{t}\Bigr\rangle_{L^1(\mathbb R^n),L^\infty(\mathbb R^n)}. \]
We know that the limit as $t\to 0^+$ exists on both sides; on the right-hand side, take this limit and use (3.7) and integration by parts:
\[ \langle f,A_Kg\rangle = \int f\,(A_Kg) = \int f\,[b\cdot\nabla g] = -\int g\,[\nabla\cdot(bf)]\,dx = \langle -\nabla\cdot(bf),g\rangle. \]
The above calculation identifies $A_{FP}$, which we have already shown to exist. In fact, for all $f\in D(A_{FP})\cap C^1(\mathbb R^n)$,
\[ A_{FP}f = -\nabla\cdot(bf). \]
Thus, we have proved:

Theorem 3.1. The infinitesimal generator $A_K$ of the Koopman semigroup (restricted to $C_0$) is given by $A_Kf = b\cdot\nabla f$ for $f\in C^1_c(\mathbb R^n)$. The infinitesimal generator $A_{FP}$ of the Frobenius-Perron semigroup is given by $A_{FP}g = -\nabla\cdot(bg)$ for continuously differentiable $g\in D(A_{FP})$.

Consider what happens in the case where we have a deterministic differential equation, with global solution flow $S = \{S_t\}_{t\ge0}$, and a "noisy" initial value, that is, an initial value that is a nondegenerate random variable, say $X_0 = X^0$ a.s. Then we have the initial value problem
\[ dX_t = b(X_t)\,dt, \qquad X_0 = X^0 \ \text{a.s.} \tag{3.8} \]

Lemma 3.3. $X := (X_t)_{t\in\mathbb R_+}$, defined by $X_t := S_t\circ X^0$ for all $t\in\mathbb R_+$, solves (3.8).

Proof. The proof is easy. Obviously, the initial condition is satisfied; further,
\[ \frac{\partial}{\partial t}X(t,\omega) = \frac{\partial}{\partial t}S(t,X^0(\omega)) = b(S(t,X^0(\omega))) = b(X_t(\omega)) \]
for any $\omega\in\Omega$.

Remark: As a result of the lemma, we get the following useful equations:
\[ P_{X_t}(B) = P_{S_t\circ X^0}(B) = (P_{X^0})^{S_t}(B) = P_{X^0}(S_t^{-1}(B)) \]
for all $t\in\mathbb R_+$ and $B\in\mathcal B^n$. Extending this to finite-dimensional distributions, we get
\[ P_{(X_{t_1},X_{t_2},\dots,X_{t_k})}(B_1\times B_2\times\cdots\times B_k) = P_{X^0}\Bigl(\bigcap_{i=1}^k S_{t_i}^{-1}(B_i)\Bigr) \tag{3.9} \]
for all $t_1,t_2,\dots,t_k\in\mathbb R_+$ and $B_1,\dots,B_k\in\mathcal B^n$.
The Frobenius-Perron semigroup $P$ gives us a new way of understanding (3.8) if $X^0$ has a density. Let $X^0$ have density $g$, so $P_{X^0} = \mu_g$. Recall that, for any $t\in\mathbb R_+$, we denote the distribution of $X_t$ by $P_{X_t}$ (so $P_{X_0} = P_{X^0}$). Then $P_{X_t}$ also has a density (since $S_t$ is a diffeomorphism for each $t$), and $P_{X_t} = \mu_{P_tg}$, since for any $A\in\mathcal B^n$,
\[ P_{X_t}(A) = P_{S_t\circ X^0}(A) = P_{X^0}(S_t^{-1}(A)) = \int_{S_t^{-1}(A)} g(x)\,dx = \int_A P_tg(x)\,dx. \]
But the strong continuity of $P$ allows us to use Theorem 1.2; we may set up the Cauchy problem
\[ u_t = -\nabla\cdot(bu) = A_{FP}u, \qquad u(0,x) = g(x), \tag{3.10} \]
where $g$ is the density of $X^0$. Solving (3.10) gives $u(t,\cdot) = P_tg(\cdot)$, the density of $P_{X_t}$, for any $t\in\mathbb R_+$. We call (3.10) the Liouville equation; one can interpret it physically as a conservation-of-mass equation (where $b$ is the velocity field of a fluid with density $u$). In summary, we have the following lemma:

Lemma 3.4. If $X$ solves (3.8) and $P_{X^0} = \mu_g$, then $P_{X_t} = \mu_{P_tg} = \mu_{u(t,\cdot)}$ for all $t\ge 0$, where $u$ is the solution of (3.10).

Probably the easiest example is the scalar "transport" equation $\dot x = b$, for a positive constant $b$, with initial condition $x(0) = x_0$, whose solution is $x(t) = S_t(x_0) := bt + x_0$. If there were an entire family of degenerate initial conditions (occurring with some probability), they would all be subject to the "transporting" motion $x\mapsto x + bt$. Now add a noisy initial condition, so that the transport equation becomes $dX_t = b\,dt$ with initial condition $X_0 = X^0$ having density $g$, where $b$ is a positive constant. Then we can use the Frobenius-Perron semigroup to study the evolution of $g$ via the equation $\frac{\partial}{\partial t}u(t,x) = A_{FP}u := -\nabla\cdot(bu)$ with initial condition $u(0,x) = g(x)$, which has solution $P_tg(x) = g(x - bt)$. This makes sense; imagine a process heavily concentrated at $x_0\in\mathbb R$ initially, so that $g$ has a "spike" at $x_0$. Then the process should be heavily concentrated at $x_0 + bt$ after time $t$, and indeed the evolved density $P_tg$ is "spiking" at $x_0 + bt$, since $P_tg(x_0 + bt) = g(x_0)$.
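The transport example is simple enough to verify by simulation. The following added sketch (the normal initial density, sample size, and NumPy are assumptions) compares a Monte Carlo histogram of the transported points against the shifted density $g(x - bt)$.

```python
# Sketch: for dX_t = b dt with random initial density g, the
# Frobenius-Perron semigroup gives P_t g(x) = g(x - bt).
import numpy as np

rng = np.random.default_rng(4)
b, t = 1.5, 2.0
X0 = rng.normal(0.0, 1.0, 500_000)   # initial density g = standard normal
Xt = X0 + b * t                      # transport flow: S_t(x) = x + bt

g = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
edges = np.linspace(-2 + b * t, 2 + b * t, 41)
hist, _ = np.histogram(Xt, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# Deviation between the empirical density and g(x - bt) is small:
print(np.max(np.abs(hist - g(centers - b * t))))
```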
For the weak version of (3.8), we would be given the initial condition $P_{X_0} = \mu$ instead of $X_0 = X^0$ a.s.; let us point out another way to determine $P_{X_t}$, for $t\ge 0$, given $P_{X_0} = \mu$, a way that works whether $\mu$ has a density or not.

Lemma 3.5. Suppose, for every $x\in\mathbb R^n$, that $\dot y = b(y)$, $y(0) = x$ has a global forward-in-time solution $X^x$ with distribution $P^x := P_{X^x}$. Given a probability measure $\mu$ on $\mathcal B^n$, define $P^\mu$ by
\[ P^\mu(A) := \int_{x\in\mathbb R^n} P^x(A)\,\mu(dx) \]
for any $A\in\mathcal B(C)$. Then a stochastic process $X = (X_t)_{t\in\mathbb R_+}$ is a solution to the initial value problem
\[ dX_t = b(X_t)\,dt, \qquad P_{X_0} = \mu, \tag{3.11} \]
iff $P_X = P^\mu$.

Remark 1: $P^\mu$ is well defined ($\dot y = b(y)$ determines a trivial convolution semigroup of measures; see [9, Lemma 8.7] or the discussion on pp. 9-10 in the section "Kernels and Semigroups of Kernels").

Remark 2: Clearly, $X^x_t = S(t,x)$ a.s. by Lemma 3.3, where $S$ is the solution semiflow generated by $\dot y = b(y)$, and so $P_{X^x_t} = \delta_{S(t,x)}$ for all $t\in\mathbb R_+$, $x\in\mathbb R^n$. Further, $(X^x)_{x\in\mathbb R^n}$ is a family of strong solutions to $\dot y = b(y)$, parameterized by $x\in\mathbb R^n$ (via the initial condition $y(0) = x$), such that each process $X^x$ may live on a different probability space $(\Omega^x,\mathcal F^x,P^x)$; however, this does not matter, as all we care about is the distribution $P^x$.

Proof. Let $X := (X_t)_{t\ge 0}$ be a stochastic process on a probability space $(\Omega,\mathcal F,P)$. We want to show that $X$ solves (3.11) iff $P_X = P^\mu$. To this end, suppose that $X$ solves (3.11); we must show that $P_X = P^\mu$. Since $X$ solves (3.11), $X = (S_t\circ X_0)_{t\ge 0}$ by Lemma 3.3 and $P_{X_0} = \mu$. Now let $k\in\mathbb N$, let $t_1,t_2,\dots,t_k\in\mathbb R_+$, let $B_1,B_2,\dots,B_k\in\mathcal B^n$, and call $\pi_{t_1,t_2,\dots,t_k}$ the mapping from $C := C(\mathbb R_+,\mathbb R^n)$ into $\mathbb R^{nk}$ given by $\omega\mapsto(\omega(t_1),\omega(t_2),\dots,\omega(t_k))$. Then, by definition of $P^\mu$,
\[ P^\mu\bigl(\pi^{-1}_{t_1,\dots,t_k}(B_1\times\cdots\times B_k)\bigr) = \int_{\mathbb R^n} P_{X^x}\bigl(\pi^{-1}_{t_1,\dots,t_k}(B_1\times\cdots\times B_k)\bigr)\,\mu(dx) = \int_{\mathbb R^n} P_{(X^x_{t_1},\dots,X^x_{t_k})}(B_1\times\cdots\times B_k)\,\mu(dx). \]
Now, since $X^x$ solves (3.8) and since $X^x_0 = x$ a.s., we may apply (3.9) twice to see that the above equals
\[ \int 1_{S_{t_1}^{-1}(B_1)\cap S_{t_2}^{-1}(B_2)\cap\cdots\cap S_{t_k}^{-1}(B_k)}(x)\,\mu(dx) = \mu\bigl(S_{t_1}^{-1}(B_1)\cap\cdots\cap S_{t_k}^{-1}(B_k)\bigr) = P_{(X_{t_1},\dots,X_{t_k})}(B_1\times\cdots\times B_k), \]
and so $P^\mu = P_X$.

For the other implication, suppose that $P_X = P^\mu$. Then we must show that $X$ is a solution to (3.11). To see this, observe that $X$ satisfies the initial condition, since $P_{X_0}(B) = \mu(B)$ for any $B\in\mathcal B^n$. It only remains to show that $(X_t)_{t\in\mathbb R_+}$ and $(S_t\circ X_0)_{t\in\mathbb R_+}$ have the same joint distributions, which follows similarly to the above:
\[ P_{(X_{t_1},\dots,X_{t_k})}(B_1\times\cdots\times B_k) = P^\mu\bigl(\pi^{-1}_{t_1,\dots,t_k}(B_1\times\cdots\times B_k)\bigr) = \mu\bigl(S_{t_1}^{-1}(B_1)\cap\cdots\cap S_{t_k}^{-1}(B_k)\bigr) = P_{(S_{t_1}\circ X_0,\dots,S_{t_k}\circ X_0)}(B_1\times\cdots\times B_k); \]
thus $X$ is a solution and the lemma holds.

To interpret this result, fix $x_1,x_2\in\mathbb R^n$ and $\alpha\in[0,1]$, and call $\mu_\alpha = \alpha\delta_{x_1} + (1-\alpha)\delta_{x_2}$. Let $X^{x_1}$ and $X^{x_2}$ denote the respective solutions to (3.8) with initial conditions $X_0 = x_1$ a.s. and $X_0 = x_2$ a.s. Then if $X$ solves (3.11) with initial condition $P_{X_0} = \mu_\alpha$, $X$ must have distribution $P^{\mu_\alpha}$, where
\[ P^{\mu_\alpha} = \alpha P_{X^{x_1}} + (1-\alpha)P_{X^{x_2}}. \]
So, if $X$ is a strong solution to (3.8) with initial condition $X_0 = X^0$, where $X^0$ is a random variable equal to $x_1$ with probability $\alpha$ and to $x_2$ with probability $1-\alpha$, then $X$ is a modification of the process $\tilde X$, where $\tilde X = X^{x_1}$ with probability $\alpha$ and $\tilde X = X^{x_2}$ with probability $1-\alpha$. In this way, one may interpret the content of the above lemma as a kind of "stochastic superposition" (not the usual "superposition principle," which says that a linear combination of solutions is also a solution, and which we cannot expect unless $b$ is linear). More profoundly, this extends even to nonzero $\sigma$, which means it suffices to examine degenerate initial conditions for (weak) stochastic differential equations. We extend the above ideas to the stochastic case in the next section, emphasizing the use of the Frobenius-Perron semigroup.

3.3 Koopman and Frobenius-Perron Operators: The Stochastic Case

We have studied $(0,b)$ with a degenerate initial distribution, and also with a noisy initial condition given by the nondegenerate distribution of an initial random variable (with a density). We now want $\sigma$ to be nonzero, so let us extend the notions of Koopman operator and Frobenius-Perron operator to the stochastic case and then derive extended versions of $A_K$ and $A_{FP}$. As before, we derive $A_K$ and exploit "duality" to obtain $A_{FP}$. As in the previous section, it suffices to study the degenerate solutions, by integrating $P^x$ over $x$ with respect to a given nondegenerate initial distribution $\mu$; the proof involves deep probabilistic concepts (see [9, Theorem 21.10] and the preceding theorems ibidem), so we simply state the result.

Lemma 3.6. Suppose, for every $x\in\mathbb R^n$, that $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$, $X_0 = x$ a.s. has a global forward-in-time solution $X^x = (X^x_t)_{t\in\mathbb R_+}$ with distribution $P^x := P_{X^x}$. Given a probability measure $\mu$ on $\mathcal B^n$, define $P^\mu$ by
\[ P^\mu(A) := \int_{x\in\mathbb R^n} P^x(A)\,\mu(dx) \]
for any $A\in\mathcal B(C)$. Then a stochastic process $X = (X_t)_{t\in\mathbb R_+}$ is a solution to the initial value problem
\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad P_{X_0} = \mu, \tag{3.12} \]
iff $P_X = P^\mu$.

Under the assumptions of Lemma 3.6, let $X^x := (X^x_t)_{t\ge 0}$ denote the solution process to $(\sigma,b)$ with initial condition $X_0 = x$ a.s., and let $P^x$ be the distribution of $X^x$. Define $E^x$ to be the expectation with respect to $P^x$.
We can now define the stochastic Koopman operator. As before, we want something like "$U_tf = f\circ S_t$" for a "stochastic solution semiflow" $S$ (which acts on random variables; we cannot solve backwards, since the noise in the flow causes a "diffusion"), and we also want $U_t$ to map $L^\infty(\mathbb R^n)$ into $L^\infty(\mathbb R^n)$. But something must give; we are dealing with a space of $\mathbb R^n$-valued random variables rather than with $\mathbb R^n$ itself. So, let $f\in L^\infty(\mathbb R^n)$; then for fixed $t\in\mathbb R_+$ and fixed $x\in\mathbb R^n$, $f\circ X^x_t$ is a bounded, Borel-measurable function mapping $C(\mathbb R_+,\mathbb R^n)$ into $\mathbb R$, and so $E^x(f\circ X^x_t)$ makes sense. This leads us to the following definition:

Definition 3.8. Let $f\in L^\infty(\mathbb R^n)$ and let $(X^x_t)_{t\ge0,\,x\in\mathbb R^n}$ be the family such that, for every fixed $x$, $(X^x_t)_{t\ge0}$ is the canonical realization of the solution to $(\sigma,b)$ with initial condition $X_0 = x$ a.s. Define, for all $t\in\mathbb R_+$, the stochastic Koopman operator $U_t: L^\infty(\mathbb R^n)\to L^\infty(\mathbb R^n)$ by
\[ U_tf(x) := E^x(f\circ X^x_t) \]
for all $f\in L^\infty(\mathbb R^n)$ and $x\in\mathbb R^n$.

Let us comment on this definition. First observe that the canonical realization is necessary: $P^x$ (and hence $E^x$) lives over $\mathcal B(C)$. Next, note that the definition is consistent with the deterministic Koopman operator, since $X_t = S_t\circ X_0 = S_t(x)$ when $X_0 = x$ a.s. for a global solution flow $S$ generated by $\dot y = b(y)$, so $U_tf(x)$ reduces to $f(S_t(x))$. Next, it is obvious that $U_tf\in L^\infty(\mathbb R^n)$ for all $t\in\mathbb R_+$ and $f\in L^\infty(\mathbb R^n)$, and that $U_t$ is nonexpansive for all $t\in\mathbb R_+$. In fact, $U := \{U_t\}_{t\ge0}$ restricts to a strongly continuous semigroup on $C_0$ (see [9, Theorem 21.11]), so it has an infinitesimal generator.

We emulate the argument in the deterministic case to identify it; let $n = 1$ for notational ease (the argument for general $n$ is similar). Let $f\in C^2_c(\mathbb R)$ and recall that if $X^x$ solves $(\sigma,b)$ with initial condition $X_0 = x$ a.s., then $X^x$ and $f(X^x)$ are continuous semimartingales (so we may apply Ito's formula). For the remainder of this argument, we write $X_t := X^x_t$ for convenience. Then $X_t = x + \int_0^t b(X_s)\,ds + \int_0^t \sigma(X_s)\,dB_s$, so we apply Ito's formula to $f(X_t)$:
\[ f(X_t) = f(x) + \int_0^t\bigl[b(X_s)f'(X_s) + \tfrac12\sigma^2(X_s)f''(X_s)\bigr]\,ds + \int_0^t \sigma(X_s)f'(X_s)\,dB_s. \]
Taking the expectation with respect to $P^x$ on both sides, we get
\[ E^xf(X_t) = f(x) + E^x\Bigl(\int_0^t\bigl[b(X_s)f'(X_s) + \tfrac12\sigma^2(X_s)f''(X_s)\bigr]\,ds\Bigr) + E^x\Bigl(\int_0^t \sigma(X_s)f'(X_s)\,dB_s\Bigr). \tag{3.13} \]
By basic properties of Ito integrals, $E^x\bigl(\int_0^t \sigma(X_s)f'(X_s)\,dB_s\bigr) = 0$, so (3.13) becomes
\[ E^xf(X_t) = f(x) + E^x\Bigl(\int_0^t\bigl[b(X_s)f'(X_s) + \tfrac12\sigma^2(X_s)f''(X_s)\bigr]\,ds\Bigr). \]
Now, by the definition of the infinitesimal generator,
\[ Af(x) = \lim_{t\to 0}\frac{E^xf(X_t)-f(x)}{t} = \lim_{t\to 0}\frac{E^x\bigl(\int_0^t[b(X_s)f'(X_s)+\frac12\sigma^2(X_s)f''(X_s)]\,ds\bigr)}{t} = E^x\bigl(b(X_0)f'(X_0) + \tfrac12\sigma^2(X_0)f''(X_0)\bigr) = b(x)f'(x) + \tfrac12\sigma^2(x)f''(x). \]
Thus we have the characterization of the infinitesimal generator of the stochastic Koopman semigroup:
\[ A_Kf = bf' + \tfrac12\sigma^2f'' \tag{3.14} \]
for any $f\in C^2_c(\mathbb R)$, or, in dimension $n$,
\[ A_Kf = \sum_{i=1}^n b_i\frac{\partial f}{\partial x_i} + \frac12\sum_{i,j=1}^n a_{ij}\frac{\partial^2 f}{\partial x_i\,\partial x_j} \tag{3.15} \]
for any $f\in C^2_c(\mathbb R^n)$, where $(a_{ij}) = \bigl(\sum_{k=1}^d \sigma_{ik}\sigma_{jk}\bigr)$ (the Brownian motion is $d$-dimensional, $1\le d\le n$). Note that if the noise were zero, the generator would reduce to the deterministic case, as expected.
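The generator formula (3.14) can be probed by Monte Carlo: for small $t$, the difference quotient $(E^xf(X^x_t) - f(x))/t$ should be close to $b(x)f'(x) + \frac12\sigma^2(x)f''(x)$. The following added sketch does this under illustrative assumptions ($b(x) = -x$, $\sigma(x) = 0.5$, $f(x) = x^2$, Euler time stepping, NumPy).

```python
# Sketch: Monte Carlo check of A_K f(x) = b f'(x) + (1/2) sigma^2 f''(x)
# via the small-time difference quotient (E^x f(X_t) - f(x)) / t.
import numpy as np

rng = np.random.default_rng(5)
b = lambda x: -x
sigma = lambda x: 0.5 * np.ones_like(x)
f = lambda x: x**2

x0, t, substeps, paths = 1.0, 0.01, 50, 500_000
dt = t / substeps
X = np.full(paths, x0)
for _ in range(substeps):                      # crude Euler simulation
    X += b(X) * dt + sigma(X) * rng.normal(0.0, np.sqrt(dt), paths)

mc = (np.mean(f(X)) - f(x0)) / t
exact = b(x0) * 2 * x0 + 0.5 * 0.5**2 * 2      # b f' + (1/2) sigma^2 f''
print(mc, exact)                               # both approximately -1.75
```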
Next, we obtain the infinitesimal generator of the Frobenius-Perron operator associated to the "stochastic solution semiflow" $S$ induced by the solution of $(\sigma,b)$, in the case that $S$ is nonsingular. We must impose here that $\frac{\partial b}{\partial x}$, $\frac{\partial\sigma}{\partial x}$, and $\frac{\partial^2\sigma}{\partial x^2}$ exist and are bounded. Naively, if something like "$A^*_{FP} = A_K$" holds, then from (3.14) and integration by parts we would get
\[ A_{FP}f = -(bf)' + \tfrac12(\sigma^2f)'', \tag{3.16} \]
or, in dimension $n$,
\[ A_{FP}f = -\sum_{i=1}^n \frac{\partial(b_if)}{\partial x_i} + \frac12\sum_{i,j=1}^n \frac{\partial^2(a_{ij}f)}{\partial x_i\,\partial x_j}. \tag{3.17} \]
In fact, this is the case; for a "from scratch" proof, see [11, Theorem 11.6.1].

Now, let $X$ solve $(\sigma,b)$ with initial condition $X_0 = X^0$ a.s., and suppose $X^0$ has density $g$. Then we can set up the problem
\[ \frac{\partial u}{\partial t} = A_{FP}u, \qquad u(0,x) = g(x), \tag{3.18} \]
where $A_{FP}$ is as in (3.17) and the solution $u(t,\cdot)$ of (3.18) is the density of $X_t$ for every $t$; the equation $\frac{\partial u}{\partial t} = A_{FP}u$ is called the Fokker-Planck equation. We are interested in finding a fundamental solution to (3.18); we digress slightly to give some necessary definitions and notation, leading to one result that guarantees existence/uniqueness under certain technical conditions. First, let us rewrite $A_{FP}u$ in nondivergence form:
\[ A_{FP}u = -\sum_{i=1}^n \frac{\partial(b_i(x)u)}{\partial x_i} + \frac12\sum_{i,j=1}^n \frac{\partial^2(a_{ij}(x)u)}{\partial x_i\,\partial x_j} = \tilde c(x)u + \sum_{i=1}^n \tilde b_i(x)\frac{\partial u}{\partial x_i} + \frac12\sum_{i,j=1}^n a_{ij}\frac{\partial^2 u}{\partial x_i\,\partial x_j}, \tag{3.19} \]
where
\[ \tilde b_i(x) = -b_i(x) + \sum_{j=1}^n \frac{\partial a_{ij}(x)}{\partial x_j} \quad\text{and}\quad \tilde c(x) = \frac12\sum_{i,j=1}^n \frac{\partial^2 a_{ij}(x)}{\partial x_i\,\partial x_j} - \sum_{i=1}^n \frac{\partial b_i(x)}{\partial x_i}. \]
Of course, the coefficients must be sufficiently smooth for the above to make sense; we also want them to satisfy growth conditions, namely, that there is a positive constant $M$ such that
\[ |a_{ij}(x)|\le M, \qquad |\tilde b_i(x)|\le M(1+|x|), \qquad |\tilde c(x)|\le M(1+|x|^2). \tag{3.20} \]
We know $a_{ij} = a_{ji}$, so given any $\xi = (\xi_1,\xi_2,\dots,\xi_n)\in\mathbb R^n$, we at least know $\sum_{i,j=1}^n a_{ij}\xi_i\xi_j = \sum_{k=1}^d\bigl(\sum_{i=1}^n \sigma_{ik}(x)\xi_i\bigr)^2 \ge 0$. We would like strict inequality, so let us assume that the uniform parabolicity property holds, that is, that there is a constant $\theta > 0$ such that
\[ \sum_{i,j=1}^n a_{ij}(x)\xi_i\xi_j \ge \theta\sum_{i=1}^n \xi_i^2 \tag{3.21} \]
for all $x\in\mathbb R^n$ and $\xi\in\mathbb R^n$. We condense the above into the following definition:

Definition 3.9. Given (3.18), we say $a_{ij}$ and $b_i$ are Cauchy-regular if they are $C^4$ functions such that the corresponding $a_{ij}$, $\tilde b_i$, and $\tilde c$ of (3.19) satisfy (3.21) and (3.20).

Now we recall the definition of a classical solution.

Definition 3.10. Let $f\in C(\mathbb R^n)$. We say $u:\mathbb R_+\times\mathbb R^n\to\mathbb R$ is a classical solution of (3.18) if i) for all $T>0$ there are positive constants $c,\alpha$ such that $|u(t,x)|\le ce^{\alpha|x|^2}$ for all $0<t\le T$, $x\in\mathbb R^n$; ii) $u_t$, $u_{x_i}$, $u_{x_ix_j}$ are continuous for all $1\le i,j\le n$ and $u$ satisfies
\[ u_t = \tilde c(x)u + \sum_{i=1}^n \tilde b_i(x)\frac{\partial u}{\partial x_i} + \frac12\sum_{i,j=1}^n a_{ij}\frac{\partial^2 u}{\partial x_i\,\partial x_j} \]
for all $t>0$ and $x\in\mathbb R^n$; and iii) $\lim_{t\to 0}u(t,x) = f(x)$.

We are now able to state the desired existence/uniqueness theorem:

Theorem 3.2. Given (3.18), let $a_{ij}$, $b_i$ be Cauchy-regular and let $f\in C(\mathbb R^n)$ satisfy $|f(x)|\le ce^{\alpha|x|^2}$ with positive constants $c,\alpha$. Then there is a unique classical solution to (3.18), given by $u(t,x) = \int \Gamma(t,x,y)f(y)\,dy$, where the fundamental solution (or kernel) $\Gamma(t,x,y)$ is defined for all $t>0$, $x,y\in\mathbb R^n$, is continuous and differentiable with respect to $t$, twice differentiable with respect to $x_i$ for all $1\le i\le n$, and satisfies the equation
\[ u_t = \tilde c(x)u + \sum_{i=1}^n \tilde b_i(x)\frac{\partial u}{\partial x_i} + \frac12\sum_{i,j=1}^n a_{ij}\frac{\partial^2 u}{\partial x_i\,\partial x_j} \]
as a function of $t$ and $x$ for every fixed $y$.

Our slight digression concludes with at least one condition under which a fundamental solution exists. Now, if we are able to find a fundamental solution $\Gamma(t,x,y)$ to the Fokker-Planck equation, then given any initial condition $u(0,x) = g(x)$, where $g\in L^1(\mathbb R^n)$, we can define a family of operators $\{P_t\}_{t\ge0}$ by
\[ u(t,x) = P_tg(x) = \int_{\mathbb R^n}\Gamma(t,x,y)g(y)\,dy, \tag{3.22} \]
and $u$ is often called a generalized solution in this case (of course, $g$ has to be continuous in order for $u$ to be a classical solution).

Definition 3.11. We call $\{P_t\}_{t\ge0}$ a stochastic semigroup if $\{P_t\}_{t\ge0}$ is a Markovian semigroup of linear operators on $L^1(\mathbb R^n)$ that is monotone ($P_tf\ge 0$ when $f\ge 0$, for all $t\in\mathbb R_+$) and norm-preserving ($\|P_tf\| = \|f\|$ when $f\ge 0$, for all $t\in\mathbb R_+$).
The proof of the next theorem can be found in [11, pp. 369-370].

Theorem 3.3. $\{P_t\}_{t\ge0}$ as defined in (3.22) is a stochastic semigroup.

This theorem justifies the following definition:

Definition 3.12. We call $P := \{P_t\}_{t\ge0}$ as defined in (3.22) the stochastic Frobenius-Perron semigroup.

Let us now consider the simple example
\[ dX_t = dB_t, \qquad X_0 = X^0 \ \text{a.s.}, \tag{3.23} \]
where $X^0$ has density $g$. Then the solution is a Brownian motion, and (3.18) becomes the heat equation
\[ u_t = \tfrac12\Delta u, \qquad u(0,x) = g(x), \]
which has solution
\[ u(t,x) = \Bigl(\frac{1}{2\pi t}\Bigr)^{d/2}\int_{\mathbb R^d} e^{-\frac{|x-y|^2}{2t}}\,g(y)\,dy \tag{3.24} \]
for $x\in\mathbb R^d$, $t>0$. Notice that the fundamental solution $\bigl(\frac{1}{2\pi t}\bigr)^{d/2}e^{-\frac{|x-y|^2}{2t}}$ is the transition density of a Brownian motion, as we expect.

One way to think about what happens is the following. For a noiseless stochastic differential equation with degenerate initial condition, we have a point moving through space in time, governed by a flow (in essence, an ordinary differential equation). If the initial condition is nondegenerate with a density, we may understand how the family of points evolves as a density, via the partial differential equation generated by the Frobenius-Perron operator. Now, if a stochastic differential equation has a degenerate initial condition, we still have a point moving through space in time governed by a flow, but there is noise and we cannot actually tell where that point is; we are evolving random variables or measures. If the measures are absolutely continuous, we may instead evolve densities just as in the previous case, which means that "deterministic partial differential equations have the same complexity as stochastic differential equations with degenerate initial conditions."

Another interpretation of the latter case is that a point moves through space governed by a Brownian motion whose "expected flow" is described by $b$ and whose "spread" or "intensity" is described by $\sigma$. For example, in (3.23), the flow is trivial, so we expect the point to stay where it started, but as time goes on the noise may move it away. With the above interpretation, we see that there is no difference between $(\sigma,b) := (1,0)$ with degenerate initial condition $X_0 = x$ and $(\sigma,b) := (0,b_L)$ with nondegenerate initial condition having a Lebesgue density $g$, where $b_L$ can be derived from the Liouville equation.

So how much more complicated is the "mixed" case, where neither $\sigma$ nor $b$ is zero? We can actually remove $b$ from consideration; this result is called the "transformation of drift" formula (so called because $b$ is often referred to as the "drift" term), which in our situation can be stated as follows (see [5, p. 43]). Given any $x\in\mathbb R^n$, let $X^x$ solve $(\sigma,b)$ with initial condition $X^x_0 = x$ a.s. Assume $\sigma:\mathbb R^n\to\mathbb R^n\times\mathbb R^n$ and that $\sigma(y)$ has positive eigenvalues for every $y$. Further, let $f:\mathbb R^n\to\mathbb R^n$ and suppose $Y^x$ solves $(\sigma, b+\sigma f)$. Then $P_{X^x_t}$ and $P_{Y^x_t}$ are absolutely continuous with respect to each other and
\[ dP_{Y^x_t} = \exp\Bigl[\int_0^t f(X^x_s)\,dB_s - \frac12\int_0^t |f(X^x_s)|^2\,ds\Bigr]\,dP_{X^x_t}. \tag{3.25} \]
In particular, we could pick $f$ such that $\sigma f = -b$ and obtain a relationship between $(\sigma,b)$ and $(\sigma,0)$; we have already seen how $(\sigma,0)$ relates to a deterministic partial differential equation (as in the study of (3.23)). So, in theory, one can describe the dynamical systems aspects of $(\sigma,b)$ in general by tracing back to $(\sigma,0)$ or $(0,b)$ (although this may be quite unwieldy).
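The heat-kernel representation (3.24) can also be checked against simulated paths. The following added sketch (a uniform initial density, a diffusion coefficient $\sigma$, and all parameters are assumptions) evolves the density by convolving $g$ with a Gaussian of variance $\sigma^2 t$ and compares this with a histogram of $X_t = X_0 + \sigma B_t$.

```python
# Sketch: for dX_t = sigma dB_t with initial density g, the Fokker-Planck
# solution is g convolved with a Gaussian of variance sigma^2 t.
import numpy as np

rng = np.random.default_rng(6)
sigma, t = 0.8, 1.0
X0 = rng.uniform(-1.0, 1.0, 500_000)           # g = uniform density on [-1, 1]
Xt = X0 + sigma * rng.normal(0.0, np.sqrt(t), X0.size)

def u(x, n=2001):
    # u(t, x) = int (2 pi sigma^2 t)^{-1/2} exp(-(x-y)^2/(2 sigma^2 t)) g(y) dy
    y = np.linspace(-1.0, 1.0, n)               # support of g, where g(y) = 1/2
    k = np.exp(-(x[:, None] - y)**2 / (2 * sigma**2 * t))
    return (k.sum(axis=1) * (y[1] - y[0]) * 0.5) / np.sqrt(2 * np.pi * sigma**2 * t)

edges = np.linspace(-3.0, 3.0, 31)
hist, _ = np.histogram(Xt, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - u(centers))))        # small Monte Carlo deviation
```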
Now that we understand dynamical systems in a stochastic setting, we move to notions of stability in a stochastic setting, defining the various notions of "stochastic stability" as well as emulating Liapunov theory to demonstrate stability or instability of solutions to stochastic differential equations.

3.4 Liapunov Stability

We begin by recalling some notation and basic notions of stability for deterministic dynamical systems. As we discussed in Section 3.1, "Stochastic Dynamical Systems," we let $b\in C^1(\mathbb R^n,\mathbb R^n)$ and consider the system $\dot u = b(u)$. Given any initial value $x\in\mathbb R^n$ there exists a largest open interval of existence $I_x\subset\mathbb R$ containing 0 such that the system $\dot u = b(u)$ has a unique solution $u_x\in C^1(I_x,\mathbb R^n)$ with $u_x(0) = x$. The system $\dot u = b(u)$ generates a local solution flow $S: D(S)\subset\mathbb R\times\mathbb R^n\to\mathbb R^n$ with $D(S) := \{(t,x)\in\mathbb R\times\mathbb R^n \mid t\in I_x\}$, where $S(t,x) := u_x(t)$ for all $(t,x)\in D(S)$; we know $D(S)$ is open, $S$ is $C^1(D(S),\mathbb R^n)$, and $S$ satisfies the group property. In what follows, we assume that $S$ is a global solution flow.

Definition 3.13. We say $\bar x$ is an equilibrium point of $S$ if $S(t,\bar x) = \bar x$ for every $t\in\mathbb R$.

Observe that $\bar x$ is an equilibrium point of $S$ iff $b(\bar x) = 0$.

Definition 3.14. An equilibrium point $\bar x$ of $S$ is called stable if for any $\varepsilon>0$ there is $\delta(\varepsilon)>0$ such that whenever $\|x-\bar x\|<\delta$, it follows that $\|S(t,x)-\bar x\|<\varepsilon$ for all $t\ge 0$. An equilibrium point that is not stable is called unstable. An equilibrium point of a system is asymptotically stable if it is stable and, in addition, there is $r>0$ such that $\lim_{t\to\infty}S(t,x) = \bar x$ for all $x$ with $\|x-\bar x\|<r$.

We now recall the principle of linearized stability, which in essence extracts information about the stability of the nonlinear system from the stability of the linearized system. More specifically, for an equilibrium point $\bar u$, we linearize $b$ at $\bar u$, so that our system becomes $\dot v = Db(\bar u)v$, where $v = u - \bar u$ and $Db(\bar u)$ is the Jacobian matrix. It can be shown [8, Theorem 9.5 and Theorem 9.7] that if $Db(\bar u)$ has only eigenvalues with negative real parts, then $\bar u$ is asymptotically stable, while if any eigenvalue has positive real part, then $\bar u$ is unstable (for eigenvalues with real part 0, the linearized system is insufficient to determine stability).

Assuming that $b(0) = 0$, we are interested in the stability of the trivial solution $u := 0$; we use Liapunov theory in this situation.

Definition 3.15. We say a $C^1$-function $V: D(V)\subset\mathbb R^n\to\mathbb R$ is positive definite if $D(V)$ is open and contains the origin, $V(0) = 0$, and $V(x)>0$ for all nonzero $x$. If $-V$ is positive definite, we call $V$ negative definite. Define the orbital derivative of $V$ to be $A_KV = (b\cdot\nabla)V = \sum_{i=1}^n b_i\frac{\partial V}{\partial x_i}$. We call a positive definite $V$ a (strict) Liapunov function if $A_KV(x)\le 0$ (respectively $<0$) for all nonzero $x$.

The utility of Liapunov functions is illustrated in the following theorem, which is proven, e.g., in [8, Theorem 9.12].

Theorem 3.4. If 0 is an equilibrium point of $\dot u = b(u)$ and there exists a (strict) Liapunov function $V$, then 0 is (asymptotically) stable. Further, 0 is unstable if $A_KV > 0$.

Moving to the stochastic case, we generalize the concepts of stability, orbital derivative, and Liapunov function, and the principle of linearized stability. Stability and the orbital derivative are fairly straightforward to generalize, and Liapunov functions are only a little trickier; unfortunately, the principle of linearized stability is quite difficult to generalize.
Recall that $X^x$ denotes the solution to $(\sigma,b)$ with degenerate initial condition $X_0 = x$ a.s.; assume global solvability, that is, assume $X^x$ exists for every $x\in\mathbb R^n$. Throughout, assume that $b(0) = 0$ and $\sigma(0) = 0$, so that $(\sigma,b)$ admits the trivial solution $X = 0$.

Definition 3.16. If for all $\varepsilon>0$ we have
\[ \lim_{x\to 0} P\bigl(\sup_{t\ge 0}|X^x_t| > \varepsilon\bigr) = 0, \]
then we say the trivial solution $X = 0$ is stable in probability.

In essence, this means that as $x$ goes to 0, the probability that a path starting at $x$ remains in an arbitrarily prescribed neighborhood of 0 tends to 1. This is quite similar to the deterministic version of stability, except that now, when $x$ is close to 0, the probability that $X^x$ stays close to zero is close to 1.

Definition 3.17. If $X = 0$ is stable in probability and
\[ \lim_{x\to 0} P\bigl(\lim_{t\to\infty}|X^x_t| = 0\bigr) = 1, \]
we say $X = 0$ is asymptotically stable in probability.

Basically, this means that as $x$ goes to 0, the probability that a path starting at $x$ eventually approaches 0 as time goes to infinity tends to 1.

Definition 3.18. Let $(\sigma,b)$ admit the trivial solution $X = 0$. If $X = 0$ is stable in probability and, for every $x$, $P(\lim_{t\to\infty}X^x_t = 0) = 1$, we say $X = 0$ is asymptotically stable in the large.

Asymptotic stability in the large is the most powerful notion of stability, since the probability that any path (no matter where it starts) goes to 0 as time goes to infinity is 1.

If we are to generalize the Liapunov stability theory to the above concepts, we need to study the sign of the "stochastic orbital derivative"; to see what the "stochastic orbital derivative" should be, we do a little reverse engineering. Notice that the deterministic orbital derivative takes the form of the generator of the deterministic Koopman semigroup $A_K$; analogously, it makes sense to think that the "stochastic orbital derivative" should take the form of the generator of the stochastic Koopman semigroup. This formally justifies the following definition.

Definition 3.19. For $V\in C^2(\mathbb R^n)$, we define the stochastic orbital derivative of $V$ to be
\[ A_KV = \sum_{i=1}^n b_i\frac{\partial V}{\partial x_i} + \frac12\sum_{i,j=1}^n a_{ij}\frac{\partial^2 V}{\partial x_i\,\partial x_j}, \]
where, as before, $A := (a_{ij}) = \bigl(\sum_{k=1}^d \sigma_{ik}\sigma_{jk}\bigr)$.

We remark that the notation "$A_K$," as well as the stochastic generalization of the orbital derivative, is consistent; both reduce to the deterministic case when $\sigma$ is 0. Now we can generalize the Liapunov theory, which parallels the deterministic case quite closely; we remark up front that we present only a brief summary with some simplifying conditions, operating only in the time-homogeneous case, and that there are plenty of weaker assumptions and technical details behind what follows (the reader is invited to consult [7, Chapter 5] for more).

Definition 3.20. Let $V: D(V)\subset\mathbb R^n\to\mathbb R$, where $D(V)$ is open and contains the origin, $V(0) = 0$, and $V(x)>0$ for all nonzero $x$. Further, let $V\in C^2(D(V)\setminus\{0\})$. We say $V$ is a (strict) stochastic Liapunov function if $A_KV(x)\le 0$ (respectively $<0$) for all nonzero $x$.

Theorem 3.5. If $V$ is a stochastic Liapunov function, then $X = 0$ is stable in probability. Further, if the matrix $A$ has positive eigenvalues, then $X = 0$ is stable in probability iff it is asymptotically stable in probability.

The proof of this theorem can be found in [7, pp. 164, 168]. Asymptotic stability in the large is almost "too nice" for practical purposes; still, there are several conditions that suffice to guarantee it. One unsurprising condition is that $X = 0$ is asymptotically stable in the large if $X = 0$ is stable in probability and recurrent to the domain $|x| < \varepsilon$ for all $\varepsilon > 0$
Now we can generalize the Liapunov theory, which parallels the deterministic case quite closely; we remark up front that we are presenting a brief summary with some simplifying conditions, that we are operating only in the time-homogeneous case, and that there are plenty of weaker assumptions and technical details behind what follows (the reader is invited to consult [7, Chapter 5] for more).

Definition 3.20. Let $V : D(V) \subset \mathbb{R}^n \to \mathbb{R}$, where $D(V)$ is open and contains the origin, $V(0) = 0$, and $V(x) > 0$ for all non-zero $x$. Further, let $V \in C^2(D(V)\setminus\{0\})$. We say $V$ is a (strict) stochastic Liapunov function if $A_K V(x) \leq (<)\ 0$ for all nonzero $x$.

Theorem 3.5. If $V$ is a stochastic Liapunov function, then $X = 0$ is stable in probability. Further, if the matrix $A$ has only positive eigenvalues, then $X = 0$ is stable in probability iff it is asymptotically stable in probability.

The proof of this theorem can be found in [7, pp. 164, 168]. Asymptotic stability in the large is almost "too nice" for practical purposes; still, there are several conditions that suffice to guarantee it. One unsurprising condition is that $X = 0$ is asymptotically stable in the large if $X = 0$ is stable in probability and recurrent to the domain $|x| < \varepsilon$ for all $\varepsilon > 0$ (a process $Y$ is recurrent to $A$ if $\sup\{t \geq 0 : P(Y_t \in A) = 1\} = \infty$; otherwise it is transient). There are stricter conditions that can be imposed on $V$ which are of little interest to us; see [7, Theorems 4.4 and 4.5] for those details.

As far as instability goes, things are usually a little trickier. Intuitively, systems that are stable without noise may become unstable with the addition of noise. Much less intuitively, an unstable system can be stabilized by the addition of noise! We shall soon see examples of these situations, but for now, we state one sufficient condition for instability.

Theorem 3.6. Let $V$ be a stochastic Liapunov function, with the exception that $D(V)$ need not contain zero; let $\lim_{x\to 0} V(x) = \infty$, and set $U_r = \{x \in D(V) \mid |x| < r\}$ for $r > 0$. If $A$ has only positive eigenvalues, then $X = 0$ is unstable in probability and, further, $P(\sup_{t>0}|X^x_t| < r) = 0$ for all $x \in U_r$.

Contrast this with the deterministic case, and notice that $A_K V$ does not change sign, but $V$ is now "inversely positive definite" (it blows up at the origin), which makes the above believable.

Let us now look at some examples; of course, there is little to say about the trivial solutions of the transport equation or the Langevin equation, so let us move to the next most complicated example.

Example 3.1. Reconsider the one-dimensional equation $dX_t = bX_t\,dt + \sigma X_t\,dB_t$, where $b, \sigma$ are positive constants, with initial condition $X_0 = x$ a.s. We have already solved this explicitly, and we know its solution is
\[ X_t = xe^{(b - \frac{1}{2}\sigma^2)t + \sigma B_t}. \]
When $2b < \sigma^2$, the exponent $b - \frac{1}{2}\sigma^2$ is negative and, since $B_t/t \to 0$ a.s., almost every path of the solution decays to $0$ as time goes to infinity, so we expect the condition $2b < \sigma^2$ to ensure that the zero solution $X = 0$ is stable; let us use the Liapunov theory to verify this. Pick $V(x) = |x|^{\kappa}$ with $0 < \kappa < 1 - \frac{2b}{\sigma^2}$; $V$ is positive definite and twice continuously differentiable (except at $0$), so we may examine $A_K V$ for nonzero $x$:
\[ A_K V(x) = bxV'(x) + \tfrac{1}{2}\sigma^2x^2V''(x) = \kappa\Bigl(b + \tfrac{1}{2}\sigma^2(\kappa - 1)\Bigr)|x|^{\kappa}. \]
Since $\kappa - 1 < -\frac{2b}{\sigma^2}$, the factor $b + \frac{1}{2}\sigma^2(\kappa-1)$ is negative, so $A_K V < 0$ when $2b < \sigma^2$. (The borderline choice $\kappa = 1 - \frac{2b}{\sigma^2}$ would give $A_K V \equiv 0$, which yields stability in probability but not the strict inequality.) Thus, $X = 0$ is asymptotically stable in probability.

Computationally, this example is quite simple, and interpreting stability in this case as an "extinct population" is reasonable. However, the results may cause the reader difficulty when it comes to a physical interpretation. Notice that if there is no noise, we have $\dot x = bx$, where $b > 0$; clearly this has an unstable trivial solution, so in this case, adding "enough" noise actually stabilizes the trivial solution. This does not jibe with our physical intuition, so for consistency's sake the condition $2b < \sigma^2$ above is deemed "physically unfeasible" (for a further discussion of this, see [7, pp. 173-176]).

Remark: The discussion in [7, pp. 173-176] will appeal to readers interested in a contrast of the Ito (left-endpoint) interpretation and the Stratonovich (midpoint) interpretation of the stochastic integral. It turns out that, under the Stratonovich interpretation, the sign of $b$ alone determines the stability of the trivial solution.

Along these lines, if "not enough" noise is added, or really, "not enough physically feasible" noise is added, then the trivial solution should remain unstable; this is indeed the case when $2b > \sigma^2$. To see this, select $V(x) = -\ln|x|$. Then all the conditions of the instability test are satisfied, since for non-zero $x$,
\[ A_K V = -b + \tfrac{1}{2}\sigma^2 \leq 0. \]
Of course, if $b$ is negative, the trivial solution is stable no matter what $\sigma$ is.
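The stabilization by noise seen in Example 3.1 is easy to observe by sampling the explicit solution $X_t = xe^{(b-\sigma^2/2)t+\sigma B_t}$. The following is a simulation sketch, assuming NumPy with illustrative parameter values: here $b = 1 > 0$, so the noiseless equation is unstable, yet $2b < \sigma^2$ drives essentially every sampled path to $0$.

    import numpy as np

    rng = np.random.default_rng(1)
    b, sigma, x0 = 1.0, 2.0, 1.0          # 2b = 2 < 4 = sigma^2, so paths should decay
    T, n, paths = 50.0, 5000, 100
    t = np.linspace(0.0, T, n + 1)
    # Sample Brownian paths and evaluate the explicit solution along them
    dB = rng.normal(scale=np.sqrt(T / n), size=(paths, n))
    B = np.hstack([np.zeros((paths, 1)), np.cumsum(dB, axis=1)])
    X = x0 * np.exp((b - 0.5 * sigma**2) * t + sigma * B)
    print("fraction of paths with X_T < 1e-3:", np.mean(X[:, -1] < 1e-3))  # close to 1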
It is intuitive to think that any stable system will become unstable with the addition of enough noise, but in fact this depends upon the dimension of the space. We can mimic the above argument in a fairly general setting: suppose we have a system of $n$ equations, each of which has a stable trivial solution, and add noise so that our system becomes
\[ dX_t = b(X_t)\,dt + \sigma|X_t|\,dB_t, \]
where $\sigma > 0$ is a constant and $B$ is an $n$-dimensional Brownian motion. Then, picking $V(x) = -\ln(|x|^2)$, we see after several steps of calculus that
\[ A_K V(x) = -\frac{2\,x\cdot b(x)}{|x|^2} - \sigma^2(n-2). \]
We satisfy the hypotheses of Theorem 3.6 when $n > 2$, as we can pick $\sigma$ large enough to destroy the stability of the trivial solution of the original system (since $b(0) = 0$ and $b$ is $C^1$, the first term is bounded on a punctured ball, so for $n > 2$ it can be dominated by choosing $\sigma$ large).

Notice that if $n = 2$ and we take $b^{(i)}(X_t) := b_iX_t$ for $i = 1,2$, where the $b_i$ are negative constants, the asymptotic stability of the system cannot be destroyed by arbitrarily large noise: let $\sigma$ be any constant. Then there is a sufficiently small positive constant $a := a(\sigma)$ such that taking $V(x) = |x|^a$ yields
\[ A_K V(x) = a|x|^{a-2}\Bigl(b_1x_1^2 + b_2x_2^2 + \frac{a\sigma^2|x|^2}{2}\Bigr) < 0 \]
(since $b_1x_1^2 + b_2x_2^2 \leq \max(b_1,b_2)|x|^2 < 0$, it suffices to take $a\sigma^2/2 < -\max(b_1,b_2)$). This means the trivial solution of the system is still asymptotically stable (in fact, asymptotically stable in the large).

Let us move to the situation where the trivial solution is stable but not asymptotically stable. In this case, stability may be so delicate that even the slightest noise ruins it; this is exhibited in the next example.

Example 3.2. Consider the system
\[ dX^1 = X^2\,dt + \sigma(X)\,dB^1_t, \qquad dX^2 = -X^1\,dt + \sigma(X)\,dB^2_t, \]
where $X = (X^1, X^2)$ and $B^1, B^2$ are independent Brownian motions. In the deterministic case, we have a stable equilibrium at zero that is not asymptotically stable. Pick $V(x) = -\ln(|x|^2)$ for $x = (x_1,x_2)$; similarly to the above example, this satisfies all the requirements of the instability test, and we see
\[ A_K V(x) = x_2\frac{\partial V(x)}{\partial x_1} - x_1\frac{\partial V(x)}{\partial x_2} + \frac{1}{2}\sigma^2(x)\Bigl[\frac{\partial^2 V(x)}{\partial x_1^2} + \frac{\partial^2 V(x)}{\partial x_2^2}\Bigr]. \]
With a bit of calculation we see that $A_K V(x) = 0$ (both the drift term and the Laplacian term vanish in dimension two); provided $\sigma(x)$ is nonzero for nonzero $x$, Theorem 3.6 applies, and we have instability for arbitrarily small positive noise.
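Example 3.2 can also be checked by simulation: when $\sigma(x) \equiv \sigma$ is constant, Ito's formula gives $d|X_t|^2 = 2\sigma^2\,dt$ plus a martingale term, so $E|X_t|^2$ grows linearly in $t$ no matter how small $\sigma$ is. Below is a minimal Euler-Maruyama sketch, assuming NumPy; the step size and parameters are illustrative, and the match with theory holds only up to discretization error.

    import numpy as np

    rng = np.random.default_rng(2)
    sigma, dt, steps, paths = 0.05, 1e-3, 20000, 200
    X = np.tile([1.0, 0.0], (paths, 1))               # all paths start on the unit circle
    for _ in range(steps):
        dB = rng.normal(scale=np.sqrt(dt), size=(paths, 2))
        drift = np.column_stack([X[:, 1], -X[:, 0]])  # the rotation field (x2, -x1)
        X = X + drift * dt + sigma * dB
    T = steps * dt
    print("mean |X_T|^2:", np.mean(np.sum(X**2, axis=1)))
    print("theory:      ", 1.0 + 2.0 * sigma**2 * T)  # E|X_T|^2 = |x0|^2 + 2 sigma^2 T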
So we have seen simple examples where i) instability becomes stability with enough noise (although this is not "physically feasible"), ii) stability is not affected by (arbitrarily large) noise, and iii) stability is destroyed by (arbitrarily small) noise, which shows the complicated and interesting nature of stochastic stability.

Now we briefly discuss the principle of linearized stability; with the above in mind, it should not be surprising that there are quite a few difficulties in extracting information about the full system from the linear approximation. So, what can we say about the full system if we know how its linearization acts? For one thing, the full system is stable if the linearized system has constant coefficients and is asymptotically stable. One needs further concepts, such as "exponential stability," to say more; interested readers may want to start with [7, Chapter 7]. From this point we abandon Liapunov theory in favor of the "density fluctuation" type of stability theory.

3.5 Markov Semigroup Stability

Of more practical importance to us is the use of Frobenius-Perron operators and the Fokker-Planck equation in dealing with stability of solutions to stochastic differential equations. Let $(X,\mathcal{A},\mu)$ be a measure space, let $P := \{P_t\}_{t\geq 0}$ be a stochastic semigroup, and call $D := \{f \in L^1(X) \mid \|f\| = 1,\ f \geq 0\}$ the set of densities.

Definition 3.21. We say $f_* \in D$ is an invariant density for $P$ (also called a stationary density) if $P_tf_* = f_*$ for all $t \geq 0$. When $P$ is obvious, we may just say that $f_*$ is an invariant density.

Definition 3.22. We say $P$ is asymptotically stable if $P$ has a unique invariant density $f_*$ and if, for all $f \in D$,
\[ \lim_{t\to\infty} \|P_tf - f_*\|_{L^1(X)} = 0. \]

The analog of instability is called sweeping.

Definition 3.23. We say that $P$ is sweeping with respect to a set $A \in \mathcal{A}$ if, for all $f \in D$,
\[ \lim_{t\to\infty} \int_A P_tf(x)\,dx = 0. \]
Given a $\sigma$-algebra $\mathcal{F} \subset \mathcal{A}$ of $X$, if $P$ is sweeping for all $A \in \mathcal{F}$, then we say it is sweeping with respect to $\mathcal{F}$. When the context is clear, we usually just say that a semigroup is sweeping.

Of particular interest are stochastic semigroups that are kernel operators (when $(X,\mathcal{A},\mu) := (\mathbb{R}^n,\mathcal{B}^n,\lambda^n)$).

Definition 3.24. We say $P$ is a stochastic semigroup of kernel operators (on $\mathbb{R}^n$) if for any $x \in \mathbb{R}^n$, $t \in \mathbb{R}_+$, and $f \in D$,
\[ P_tf(x) = \int_{\mathbb{R}^n} K(t,x,y)f(y)\,dy, \]
where $K = K(t,x,y) : \mathbb{R}_+\times\mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}_+$ is a (stochastic) kernel, in the sense that
\[ \int_{\mathbb{R}^n} P_tf(x)\,dx = 1. \]

Stochastic semigroups of kernel operators correspond to semigroups of Frobenius-Perron operators associated with a Fokker-Planck equation having a fundamental solution; for the remainder of the section, let the hypotheses of Theorem 3.2 be satisfied (so the $a_{ij}$ and $b_i$ are Cauchy-regular for (3.18)), and call $P := \{P_t\}_{t\geq 0}$ the stochastic Frobenius-Perron semigroup associated with (3.18). We emulate the Liapunov-type stability theory by again appealing to $A_K$.

Definition 3.25. Let $V \in C^2(\mathbb{R}^n)$ be nonnegative, let $\lim_{|x|\to\infty} V(x) = \infty$, and let there exist constants $M, \gamma$ such that $V(x)$, $\bigl|\frac{\partial V(x)}{\partial x_i}\bigr|$, and $\bigl|\frac{\partial^2 V(x)}{\partial x_i \partial x_j}\bigr|$ are all bounded by $Me^{\gamma|x|}$ for $1 \leq i,j \leq n$. If, in addition, there exist positive constants $\alpha$ and $\beta$ such that $V$ satisfies
\[ A_K V(x) \leq -\alpha V(x) + \beta, \]
then we call $V$ Markovian-Liapunov (ML).

The next theorem is quite natural; a proof can be found in [11, Theorem 11.9.1].

Theorem 3.7. $P$ (associated with (3.18)) is asymptotically stable if there exists an ML function $V$.

When $P$ is asymptotically stable, we can determine the invariant density $u_*$; since $u_*$ does not change in time, $u_*$ is the unique density that satisfies the stationary special case of (3.18):
\[ \frac{1}{2}\sum_{i,j=1}^{d} \frac{\partial^2}{\partial x_i \partial x_j}(a_{ij}u_*) - \sum_{i=1}^{d} \frac{\partial}{\partial x_i}(b_iu_*) = 0. \]

Next we deal with the conditions under which $P$ is sweeping; in this context it is understood that we consider sweeping from the family $\mathcal{B}_c$ of compact subsets of $\mathbb{R}^n$. In other words, if for all $A \in \mathcal{B}_c$ and all $f \in D$,
\[ \lim_{t\to\infty} \int_A P_tf(x)\,dx = \lim_{t\to\infty} \int_A u(t,x)\,dx = 0, \]
then $P$ is sweeping.

Definition 3.26. Let $V \in C^2(\mathbb{R}^n)$ be positive, and let there exist constants $M, \gamma$ such that $V(x)$, $\bigl|\frac{\partial V(x)}{\partial x_i}\bigr|$, and $\bigl|\frac{\partial^2 V(x)}{\partial x_i \partial x_j}\bigr|$ are all bounded by $Me^{\gamma|x|}$. If, in addition, there exists a positive constant $\alpha$ such that $V$ satisfies
\[ A_K V(x) \leq -\alpha V(x), \qquad (3.26) \]
then we call $V$ a Bielecki function.

The proof of the next theorem can be found in [11, Theorem 11.11.1].

Theorem 3.8. $P$ (associated with (3.18)) is sweeping if there exists a Bielecki function $V$.
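Both criteria reduce to a one-variable differential inequality in $V$ and can be checked symbolically. The following sketch assumes SymPy; the helper name and the test case (linear drift with additive noise, anticipating Example 3.3 below) are illustrative choices.

    import sympy as sp

    x = sp.symbols('x', real=True)
    b0, s0 = sp.symbols('b0 s0', positive=True)

    def AK(V, b, sigma):
        # stochastic orbital derivative b V' + (1/2) sigma^2 V'' in one dimension
        return sp.expand(b * sp.diff(V, x) + sp.Rational(1, 2) * sigma**2 * sp.diff(V, x, 2))

    # Drift -b0*x with additive noise s0 and V(x) = x^2:
    # A_K V = -2*b0*x^2 + s0^2 <= -alpha*V + beta with alpha = 2*b0, beta = s0^2,
    # so V is Markovian-Liapunov and Theorem 3.7 applies.
    print(AK(x**2, -b0 * x, s0))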
Example 3.3. One very simple example in one dimension is $(\sigma,-bx)$ with initial condition $X_0 = X^0$ a.s., where $X^0$ has density $f$ and $\sigma$ and $b$ are positive constants. We have already solved this explicitly; recall that the solution is
\[ X_t = e^{-bt}X^0 + \sigma\int_0^t e^{b(s-t)}\,dB_s. \]
Trying to use the Liapunov theory as before proves fruitless, since the equation admits the trivial solution only if $\sigma := 0$. However, we can see that the expected value of this process at any time $t$ is $E(X_t) = e^{-bt}E(X^0)$ and that the variance is $V(X_t) = e^{-2bt}V(X^0) + \sigma^2\int_0^t e^{2b(s-t)}\,ds$; as time goes to infinity, $V(X_t)$ tends to $\frac{\sigma^2}{2b}$ and $E(X_t)$ tends to $0$. Thus we should see some kind of asymptotic stability, with a limiting density exhibiting the same kind of variance; a natural guess is a Gaussian density centered at zero with variance $\frac{\sigma^2}{2b}$. Pick $V(x) = x^2$; observe that $V$ is ML, since
\[ A_K V(x) = \tfrac{1}{2}\sigma^2\cdot 2 + (-bx)(2x) = \sigma^2 - 2bx^2 \leq -\alpha x^2 + \beta \]
is satisfied with $\alpha := 2b$ and $\beta := \sigma^2$. Hence $P$ is asymptotically stable, and the limiting density satisfies $A_{FP}u_* = 0$, that is,
\[ \tfrac{1}{2}(\sigma^2 u_*(x))'' - (-bx\,u_*(x))' = 0, \]
and this has solution
\[ u_*(x) = \sqrt{\frac{b}{\pi\sigma^2}}\,e^{-bx^2/\sigma^2}. \]
Note that this is a normal density with expected value zero and variance $\frac{\sigma^2}{2b}$, which is consistent with our expectations.

Example 3.4. To see how sweeping works, we study $dX_t = bX_t\,dt + \sigma\,dB_t$ with initial condition $X_0 = X^0$ a.s., where $X^0$ has density $f$ and $\sigma$ and $b$ are positive constants. Pick $V(x) = e^{-kx^2}$ for some positive constant $k$. To see whether $V$ is a Bielecki function, we need to find a positive $\alpha$ such that
\[ \tfrac{1}{2}\sigma^2 e^{-kx^2}\bigl(4k^2x^2 - 2k\bigr) + bx\,e^{-kx^2}(-2kx) \leq -\alpha V(x). \]
A bit of manipulation gives
\[ 2k\Bigl((\sigma^2k - b)x^2 - \frac{\sigma^2}{2}\Bigr) \leq -\alpha, \]
and we satisfy this by taking $k := \frac{b}{\sigma^2}$ and $\alpha := b$. Thus the semigroup is sweeping.

Roughly speaking, sweeping and asymptotic stability are the only possibilities; this is the so-called Foguel alternative ([11, Theorem 11.12.1]):

Theorem 3.9. Let the hypotheses of Theorem 3.2 be satisfied, and let $P$ be the stochastic Frobenius-Perron semigroup associated with (3.18). Suppose all stationary nonnegative solutions of (3.18) take the form $cu_*(x)$, where $u_* > 0$ a.e. and $c$ is a nonnegative constant, and call
\[ I := \int_{\mathbb{R}^n} u_*(x)\,dx. \qquad (3.27) \]
If $I < \infty$, $P$ is asymptotically stable; if $I = \infty$, $P$ is sweeping.

This makes sense; some normalized version of $u_*$ would be the exact limiting density, provided $u_*$ has a finite integral. We now give a template in one dimension for how to utilize the Foguel alternative. Consider
\[ dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \]
where $a(x) = \sigma^2(x)$ and $b(x)$ are Cauchy-regular. The stationary Fokker-Planck equation takes the form
\[ \tfrac{1}{2}(\sigma^2(x)u_*(x))'' - (b(x)u_*(x))' = 0, \]
or, writing $z(x) = \sigma^2(x)u_*(x)$,
\[ \frac{dz}{dx} = \frac{2b(x)}{\sigma^2(x)}\,z + c_1, \]
for $c_1$ a constant. Then, if $e^{\int_0^x B(y)\,dy}$ makes sense, where $B(y) := \frac{2b(y)}{\sigma^2(y)}$, we get, for $c_2$ a constant,
\[ z(x) = e^{\int_0^x B(y)\,dy}\Bigl(c_2 + c_1\int_0^x e^{-\int_0^y B(s)\,ds}\,dy\Bigr). \]
We only care about the a.e. positive stationary solutions for the application of the Foguel alternative, so it is enough to examine the sign of $c_2 + c_1\int_0^x e^{-\int_0^y B(s)\,ds}\,dy$ for almost every $x$. If we assume that $xb(x) \leq 0$ for all $|x| \geq r$, for $r$ a positive constant (so that $[-r,r]$ is not repelling for trajectories of $\dot x = b(x)$), then (according to Maple) $\int_0^x e^{-\int_0^y B(s)\,ds}\,dy \to \pm\infty$ as $x \to \pm\infty$; this means $z$ cannot be positive for every $x$ unless $c_1 = 0$, and thus the stationary nonnegative solutions must take the form
\[ u_*(x) = \frac{c_2}{\sigma^2(x)}\,e^{\int_0^x B(y)\,dy}. \]
We now need to check whether $\int_{\mathbb{R}} u_*(x)\,dx$ is finite or not, which is the same as observing whether
\[ I := \int_{-\infty}^{\infty} \frac{1}{\sigma^2(x)}\,e^{\int_0^x B(y)\,dy}\,dx \qquad (3.28) \]
is finite or not. If $I < \infty$, $P$ is asymptotically stable, and if $I = \infty$, $P$ is sweeping. We now summarize these results:

Corollary 3.1. Assume $a(x) = \sigma^2(x)$ and $b(x)$ are Cauchy-regular for $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$, and assume $xb(x) \leq 0$ for all $|x| \geq r$, for $r$ a positive constant. Then if $I$ in (3.28) is finite, $P$ is asymptotically stable, and if $I$ in (3.28) is infinite, $P$ is sweeping.

Example 3.5. Let $\sigma(x) := \sigma$ be a nonzero constant and let $b(x) = -\frac{Kx}{1+x^2}$, for $K \geq 0$ constant. Then
\[ \int_0^x B(y)\,dy = -\frac{1}{\sigma^2}\int_0^x \frac{2Ky}{1+y^2}\,dy = -\frac{K}{\sigma^2}\ln(1+x^2), \]
and
\[ u_*(x) = Ce^{-\frac{K}{\sigma^2}\ln(1+x^2)} = C(1+x^2)^{-\gamma}, \quad\text{where } \gamma := \frac{K}{\sigma^2}. \]
We see that $u_*$ is integrable iff $\frac{K}{\sigma^2} > \frac{1}{2}$, which implies $P$ is asymptotically stable; and $0 \leq \frac{K}{\sigma^2} \leq \frac{1}{2}$ implies $P$ is sweeping. In conclusion, the origin is attracting in the deterministic case, but in the stochastic case we can calculate the critical amount of noise needed to destroy the asymptotic stability.
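The integrability threshold in Example 3.5 can be confirmed numerically: the tail of $u_*(x) = C(1+x^2)^{-K/\sigma^2}$ behaves like $|x|^{-2K/\sigma^2}$, which is integrable precisely when $\frac{K}{\sigma^2} > \frac{1}{2}$. A quick sketch, assuming NumPy and SciPy with illustrative parameter values:

    import numpy as np
    from scipy.integrate import quad

    def u_star(x, K, sigma):
        # stationary candidate from the template: (1 + x^2)^(-K/sigma^2), up to a constant
        return (1.0 + x * x) ** (-K / sigma**2)

    # K/sigma^2 = 2 > 1/2: the integral I of (3.28) is finite, so P is asymptotically stable
    I, _ = quad(u_star, -np.inf, np.inf, args=(2.0, 1.0))
    print("I for K/sigma^2 = 2:", I)
    # For K/sigma^2 <= 1/2 the tail is not integrable, so I is infinite and P is sweeping;
    # no quadrature is needed (or reliable) in that case.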
Example 3.6. Let $b, \sigma$ be positive constants and reconsider the equation
\[ dX_t = -bX_t\,dt + \sigma X_t\,dB_t, \]
with initial condition $X_0 = X^0$ a.s. (so $b(x) := -bx$ and $\sigma(x) := \sigma x$). We have already solved this explicitly and observed that, for any degenerate initial condition $X_0 = x$ a.s., the solution goes to zero as time goes to infinity. We also used a stochastic Liapunov function to deduce asymptotic stability. Note that we cannot apply the template here; the necessary prerequisites are not satisfied, since $a(x) = \sigma^2(x) = \sigma^2x^2$ is not bounded by any constant $M$ and hence is not Cauchy-regular.

3.6 Long-time Behavior of a Stochastic Predator-prey Model

This is a summary of "Long-time behaviour of a stochastic prey-predator model" by Rudnicki [16]. We consider the system
\[ dX_t = \sigma X_t\,dB_t + (\alpha X_t - \beta X_tY_t - \mu X_t^2)\,dt, \qquad (3.29) \]
\[ dY_t = \rho Y_t\,dB_t + (-\gamma Y_t + \delta X_tY_t - \nu Y_t^2)\,dt, \qquad (3.30) \]
which is a stochastic Lotka-Volterra predator-prey model. In [4], the existence of a solution to (3.29), (3.30) is proven. We interpret the (positive) constant coefficients in the following way: $\alpha$ is the growth rate of the prey in the absence of predators; $\beta$ is the "predation rate" that kills off the prey; and $\mu$ is inversely related to the "carrying capacity" of the prey, in that if the population grows too much, the environment cannot support further growth. We interpret $\gamma$ as the decay rate of the predator in the absence of prey and $\delta$ as the predation rate that causes predator growth. We may also think of $\nu$ as the "reciprocal carrying capacity" of the predator. Further, we interpret $\sigma, \rho$ as "noise terms" reflecting disease, weather fluctuations, and the like, which would interfere with an ideal model.

Suppose that $\sigma = \rho = 0$ in (3.29), (3.30), so that we are in the deterministic case. One can compute the equilibrium points: $(0,0)$, $(0,-\frac{\gamma}{\nu})$, $(\frac{\alpha}{\mu},0)$, and $(\bar x,\bar y)$, where
\[ \bar x = \frac{\alpha\nu + \beta\gamma}{\mu\nu + \beta\delta}, \qquad \bar y = \frac{\alpha\delta - \gamma\mu}{\mu\nu + \beta\delta}. \]
We observe that $(0,0)$ is unstable, $(0,-\frac{\gamma}{\nu})$ is biologically irrelevant, and $(\frac{\alpha}{\mu},0)$ yields two cases, namely, stability for $\gamma\mu > \alpha\delta$ and instability for $\gamma\mu < \alpha\delta$. Finally, $(\bar x,\bar y)$ yields three cases: it lies in the fourth quadrant and is biologically irrelevant for $\gamma\mu > \alpha\delta$, lies in the first quadrant and is asymptotically stable for $\gamma\mu < \alpha\delta$, and lies on the $x$-axis for $\gamma\mu = \alpha\delta$.

So how does this relate to the stochastic case? Let us for now sacrifice technicality for intuition and examine the terms
\[ c_1 = \alpha - \frac{\sigma^2}{2}, \qquad c_2 = \gamma + \frac{\rho^2}{2}. \]
These are the "stochastic versions" of $\alpha$ and $\gamma$, respectively (which makes sense: if there are very large fluctuations in disease, weather, and so on, they can significantly affect birth and death rates). Conditions like "$\gamma\mu < (>)\ \alpha\delta$" then become "$\mu c_2 < (>)\ \delta c_1$." We get something analogous in Rudnicki's Theorem 1: if $c_1 < 0$, then the prey die out, and so do the predators. If $c_1 > 0$ and $\mu c_2 > \delta c_1$, the predators' growth rate is negative, and eventually the predators die out; if $c_1 > 0$ and $\mu c_2 < \delta c_1$, then we obtain a "nice" result, in which the system reaches a desired level of stability. One can see how large noise in $c_1$ could reduce the prey's birth rate below zero and hence cause extinction. Without this noise term or predators, the population would converge to a positive equilibrium, but with the noise term, "bad" environmental fluctuations cause extinction (even with no predators!). Similarly, the predators can die out if $\rho$ is too large, no matter how the prey behave. The effect of incorporating the noise terms is in essence a decrease in the prey's birth rate and an increase in the predator's death rate. This is arguably a sensible refinement, as it is a little idealistic to think that very small populations will always survive; one must expect some role to be played by the unpredictability of nature.
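The trichotomy just described depends only on the sign of $c_1$ and on the comparison of $\mu c_2$ with $\delta c_1$. The small helper below, with illustrative names and parameter values, makes the bookkeeping explicit and anticipates the formal statement of Rudnicki's Theorem 1 below.

    def classify(alpha, gamma, mu, delta, sigma, rho):
        # noise-corrected growth/death rates from the text
        c1 = alpha - sigma**2 / 2.0
        c2 = gamma + rho**2 / 2.0
        if c1 < 0:
            return "prey and predators die out (case iii)"
        if mu * c2 > delta * c1:
            return "predators die out, prey distribution converges weakly (case ii)"
        return "unique asymptotically stable stationary density (case i)"

    # c1 = 1.875 > 0 and mu*c2 = 0.5625 < delta*c1 = 1.875, so this lands in case i);
    # raising sigma until sigma^2/2 > alpha flips the outcome to case iii)
    print(classify(alpha=2.0, gamma=1.0, mu=0.5, delta=1.0, sigma=0.5, rho=0.5))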
So, equipped with the basic idea, we proceed to make the above more precise by formally stating Rudnicki's main theorem and outlining the strategy of its proof. First, transform (3.29), (3.30) by setting $X_t = e^{\xi_t}$ and $Y_t = e^{\eta_t}$, so that we arrive at the main system
\[ d\xi_t = \sigma\,dB_t + \Bigl(\alpha - \frac{\sigma^2}{2} - \mu e^{\xi_t} - \beta e^{\eta_t}\Bigr)dt, \qquad (3.31) \]
\[ d\eta_t = \rho\,dB_t + \Bigl(-\gamma - \frac{\rho^2}{2} + \delta e^{\xi_t} - \nu e^{\eta_t}\Bigr)dt. \qquad (3.32) \]
Let the solution process $(\xi_t,\eta_t)$ be such that the distribution of the initial value $(\xi_0,\eta_0)$ is absolutely continuous with density $v(x,y)$. Then $(\xi_t,\eta_t)$ has density $u(x,y,t)$, where $u$ satisfies the Fokker-Planck equation
\[ \frac{\partial u}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 u}{\partial x^2} + \sigma\rho\,\frac{\partial^2 u}{\partial x\,\partial y} + \frac{1}{2}\rho^2\frac{\partial^2 u}{\partial y^2} - \frac{\partial(f_1(x,y)u)}{\partial x} - \frac{\partial(f_2(x,y)u)}{\partial y}, \qquad (3.33) \]
where $f_1(x,y) = c_1 - \mu e^x - \beta e^y$, $f_2(x,y) = -c_2 + \delta e^x - \nu e^y$, and where $c_1 = \alpha - \frac{1}{2}\sigma^2$, $c_2 = \gamma + \frac{1}{2}\rho^2 > 0$. To verify this, it must be shown that the transition probability function of $(\xi_t,\eta_t)$, which we call $P(t,x,y;A)$, is absolutely continuous with respect to Lebesgue measure for each $(x,y)$ and $t > 0$. This means that the distribution of any solution is absolutely continuous and has density $u$ satisfying (3.33). This allows us to proceed by studying the "fluctuation of densities," using advanced techniques based on the section on Markov semigroup stability (see [14] and [15]).

We now state the paper's main theorem (Theorem 1): Let $(\xi_t,\eta_t)$ solve (3.31), (3.32). Then for all $t > 0$ the distribution of $(\xi_t,\eta_t)$ has a density $u(t,x,y)$ satisfying (3.33).
1) If $c_1 > 0$ and $\mu c_2 < \delta c_1$, then there is a unique density $u_*$ which is an asymptotically stable stationary solution of (3.33). This means that, no matter what the initial distribution of $(\xi_0,\eta_0)$ is, $(\xi_t,\eta_t)$ converges in distribution to a random variable with density $u_*$.
2) If $c_1 > 0$ and $\mu c_2 > \delta c_1$, then $\lim_{t\to\infty}\eta_t = -\infty$ a.s., and the distribution of $\xi_t$ converges weakly to the measure with density $f_*(x) = C\exp\bigl(\frac{2c_1x}{\sigma^2} - \frac{2\mu}{\sigma^2}e^x\bigr)$.
3) If $c_1 < 0$, then $\xi_t$ and $\eta_t$ go to $-\infty$ a.s. as $t$ goes to $\infty$.

We outline the proof of this theorem by lemmas, introducing notation as necessary. Call $P_tv(x,y) = u(t,x,y)$. Then $\{P_t\}$ is a Markov semigroup corresponding to (3.33) (writing (3.33) as $\frac{\partial u}{\partial t} = Au$, $A$ is the infinitesimal generator of $\{P_t\}$).

Lemma 1: $\{P_t\}_{t\geq 0}$ is an integral Markov semigroup with a continuous kernel $k$. In fact, $k = k(t,x,y;x_0,y_0) \in C^\infty(\mathbb{R}_+\times\mathbb{R}^2\times\mathbb{R}^2)$ is the density of $P(t,x_0,y_0;\cdot)$, so that
\[ P_tv(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} k(t,x,y;\xi,\eta)\,v(\xi,\eta)\,d\xi\,d\eta \qquad (3.34) \]
is the integral representation of $\{P_t\}$. The Hormander condition is verified to prove that a density exists.

We will need $k$ to be positive in order to apply some "Foguel alternative type" results; the basic idea is to find some set that is an attractor and to show that $k$ is positive on that set (which is all that is needed). To this end, a method based on support theorems is introduced, and we get

Lemma 2: For each $(x_0,y_0) \in E$ and for almost every $(x,y) \in E$, there exists $T > 0$ such that $k(T,x,y;x_0,y_0) > 0$, where
i) $E = \mathbb{R}^2$ if $\sigma > \rho$ or $\beta\rho \geq \nu\sigma$;
ii) $E = E(M_0) = \{(x,y) \mid y < \frac{\rho}{\sigma}x + M_0\}$, where $M_0$ is the smallest number such that $(f_1,f_2)\cdot(-\rho,\sigma) \leq 0$ for $(x,y) \notin E(M_0)$, if $\sigma \leq \rho$ and $\beta\rho < \nu\sigma$.
So, in case i) the invariant density $u_*$ is positive everywhere, while in case ii) we have a smaller support.

In case i) we can use the following result: if an integral Markov semigroup has only one invariant density, and that density is a.e. positive, then the semigroup is asymptotically stable; also, if there is no invariant density, the semigroup is sweeping from compact sets (or simply "sweeping"). However, if ii) holds, the situation is more delicate, and we must ensure that, a.e., for every $f \in D$,
\[ \int_0^\infty P_tf\,dt > 0 \qquad (3.35) \]
in order to conclude that the (integral Markov) semigroup is either asymptotically stable or sweeping (also called the Foguel alternative). In fact, in the case of ii) one can show

Lemma 3: In the situation of Lemma 2 ii),
\[ \lim_{t\to\infty} \iint_E P_tf(x,y)\,dx\,dy = 1. \qquad (3.36) \]

Now we have

Lemma 4: $\{P_t\}$ is either sweeping or asymptotically stable.

Of course, one would like to know which of the two occurs, so naturally the next result is

Lemma 5: If $c_1 > 0$ and $\mu c_2 < \delta c_1$, then $\{P_t\}$ is asymptotically stable.

The proof of this lemma relies upon the construction of a Khasminskii function, the existence of which precludes sweeping. This yields Theorem 1 i). For Theorem 1 ii) and iii), recall that, for the equation $(\sigma,b)$ and its solution $X_t$, if we define
\[ s(x) = \int_0^x \exp\Bigl(-\int_0^y \frac{2b(r)}{\sigma^2(r)}\,dr\Bigr)dy, \qquad (3.37) \]
then $s(-\infty) > -\infty$ and $s(\infty) = \infty$ imply $\lim_{t\to\infty} X_t = -\infty$. From this fact (and a bit of ergodic theory) it is simple to derive Lemmas 6 and 7, which are Theorem 1 iii) and ii), respectively.
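Under the hypotheses of Theorem 1 i), the pair $(X_t,Y_t)$ should settle into statistical equilibrium rather than converge to a point. Below is a minimal Euler-Maruyama sketch of (3.29), (3.30), assuming NumPy; the parameters are illustrative and chosen so that $c_1 > 0$ and $\mu c_2 < \delta c_1$, and a single Brownian motion drives both equations, as in the model.

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, beta, gamma, delta, mu, nu = 2.0, 1.0, 1.0, 1.0, 0.5, 0.5
    sigma, rho = 0.5, 0.5        # c1 = 1.875 > 0 and mu*c2 = 0.5625 < delta*c1 = 1.875
    dt, steps = 1e-3, 100000
    X, Y = 1.0, 1.0
    for _ in range(steps):
        dB = rng.normal(scale=np.sqrt(dt))    # one Brownian increment shared by both equations
        X += (alpha * X - beta * X * Y - mu * X * X) * dt + sigma * X * dB
        Y += (-gamma * Y + delta * X * Y - nu * Y * Y) * dt + rho * Y * dB
    print("(X_T, Y_T) =", (round(X, 3), round(Y, 3)))
    # paths fluctuate about the deterministic equilibrium (1.6, 1.2) instead of settling down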
Bibliography

[1] H. Amann, Ordinary Differential Equations, de Gruyter, Berlin & New York, 1990.
[2] L. Arnold, Random Dynamical Systems, Springer, Berlin & New York, 1998.
[3] H. Bauer, Probability Theory, de Gruyter, Berlin & New York, 1996.
[4] S. Chessa and H. Fujita Yashima, The stochastic equation of predator-prey population dynamics, Boll. Unione Mat. Ital. Sez. B Artic. Ric. Mat. 5 (2002), 789-804.
[5] M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, Springer, New York, 1988.
[6] T. Gard, Introduction to Stochastic Differential Equations, Marcel Dekker, New York, 1988.
[7] R. Z. Hasminskii, Stochastic Stability of Differential Equations, Alphen aan den Rijn, Netherlands, 1980.
[8] H. Kocak and J. Hale, Dynamics and Bifurcations, Springer-Verlag, New York, 1991.
[9] O. Kallenberg, Foundations of Modern Probability, Springer-Verlag, New York, 2002.
[10] I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus (second edition), Springer-Verlag, Berlin & New York, 1991.
[11] A. Lasota and M. Mackey, Chaos, Fractals and Noise, Springer-Verlag, New York, 1991.
[12] B. Oksendal, Stochastic Differential Equations (second edition), Springer-Verlag, Berlin & New York, 1989.
[13] S. Saperstone, Semidynamical Systems in Infinite Dimensional Spaces, Springer-Verlag, New York, 1981.
[14] K. Pichor and R. Rudnicki, Continuous Markov semigroups and stability of transport equations, J. Math. Anal. Appl. 249 (2000), 668-685.
[15] R. Rudnicki, On asymptotic stability and sweeping for Markov operators, Bull. Polish Acad. Sci. Math. 43 (1995), 245-262.
[16] R. Rudnicki, Long-time behaviour of a stochastic prey-predator model, Stoch. Process. Appl. 108 (2003), 93-107.
[17] I. Vrabie, $C_0$-Semigroups and Applications, Elsevier, Boston, 2003.