|
Large deviations
Francis Comets
Université Paris Diderot, France
http://www.proba.jussieu.fr/ comets/
Large deviations is concerned with the study of rare events and of small probabilities. Let be independent identically distributed (i.i.d.) real random variables with expectation , and their empirical mean. The law of large numbers shows that, for any Borel not containing in its closure, as , but does not tell us how fast the probability vanishes. Large deviations state it is exponential in , and give us the rate of decay. Cramér's theorem states that,
as , for all interval . The rate function can be computed as the Legendre conjugate of the logarithmic moment generating function of ,
and is called the Cramér transform of the common law of the 's. The natural assumption is the finiteness of the moment generating function in a neighborhood of the origin, i.e., the property of exponential tails. The function is convex with .
In the Gaussian case , we find ;
In the Bernoulli case , we find the entropy function for , and otherwise.
To emphasize the importance of rare events, let us mention a consequence, the Erdös-Rényi law: consider an infinite sequence , of Bernoulli i.i.d. variables with parameter , and define the length of the longest consecutive run, contained within the first tosses, in which the fraction of 1's is at least ( ). Erdös and Rényi proved that, almost surely as ,
with the function from the Bernoulli case above. Though it may look paradoxical, large deviations are at the core of this event of full probability. This result is the basis of bioinformatics applications like sequence matching, and of statistical tests for sequence randomness.
The theory does not only apply to independent variables,
but allows for many variations, including weakly dependent variables in a general state space, Markov or Gaussian processes, large deviations from ergodic theorems, non-asymptotic bounds, asymptotic expansions (Edgeworth expansions), ....
Here is the formal definition. Given a Polish space (i.e., a separable complete metric space) , let be a sequence of Borel probability measures on , let be a positive sequence tending to infinity, and finally let be a lower continuous functional on which level sets are compact for all . We say that the sequence satisfies a large deviation principle with speed and rate , if for each measurable set 
where and denote respectively the closure and interior of The rate function can be obtained as
with the ball of center and radius . Large deviation theory allows for an abstract version of Laplace method for estimating integrals: Varadhan's lemma states that, for any continuous function with
(a bounded is fine), we have
and the sequence of probability measures concentrates on the set of maximizers.
Sanov's theorem and sampling with replacement: let be a probability measure on a set that we assume finite for simplicity, with for all . Let , an i.i.d. sequence with law , and the score vector of the -sample,
By the law of large numbers, a.s. From the multinomial distribution, one can check that, for all such that is a possible score vector for the -sample,
where is the relative entropy of with respect to . The large deviations theorem holds for the empirical distribution of a general -sample, with speed and rate given by the natural generalization of the above formula. This result, due to Sanov, has many consequences in information theory and statistical mechanics [2,5], and for exponential families in statistics. Applications in statistics also include point estimation (by giving the exponential rate of convergence of -estimators) and for hypothesis testing (Bahadur efficiency) [7], and concentration inequalities [2].
Consider now a . For simplicity we assume that it is irreducible with a finite state space . We denote by the transition matrix, and for any , . By Perron-Frobenius theorem, as , with the principal eigenvalue of the positive matrix . By the ergodic theorem, converges to the (unique) invariant law for . The law of the empirical distribution satisfes a large deviation principle with speed and rate given by
for any law on .
Consider next a . We assume it is irreducible on the finite state space , and denote by the transition rate from to ( for , ). Then, similarly to the time-discrete case, the law of the empirical distribution satisfes a large deviation principle with speed and rate given by
with the principal eigenvalue of the matrix . Now, assume in addition that the process is reversible with respect to a probability measure , i.e. for all . Then, is the invariant measure for the process and for all . Using the variational formula for eigenvalues of symmetric operators, Donsker and Varadhan found that the rate function takes a simple form in the reversible case,
that is the value of the Dirichlet form of the reversible process on the square root of the density of with respect to the invariant measure.
The Freidlin-Wentzell theory deals with diffusion processes with small noise,
The coefficients are uniformly lipshitz functions, and is a standard Brownian motion. The sequence can be viewed as as a small random perturbation of the ordinary differential equation (ODE)
Indeed, in the supremum norm on bounded time-intervals. Freidlin and Wentzell have shown that, on a finite time interval , the sequence with values in the path space obeys the LDP with speed and rate function
if is absolutely continuous with square-integrable derivative and ; otherwise. (To fit in the above formal definition, take a sequence , and for the law of .) Note that if and only if is a solution of the above ODE. To follow closely any other path during a finite time is an event of probability at leading order, and therefore is very rare for small .
A simple case is and with a smooth ; we view as the height of , and a key role is played by the local minima of .
With an overwhelming probability as , the picture will be as follows in a generic situation. The process will stay close to the solution of the ODE starting from , and will eventually come near the local minimum (say ) which attracts , and stay around for times of order . But, by ergodicity, it will leave the neighborhood of at some time, and, even more, it will visit all points: it is then important how these large deviations occur. Let be domain of attraction of , be its depth (i.e., the height difference between and the lowest point on the boundary of , that we call the lowest pass). Up to times of order , the process remains in the part of of relative height smaller than , and will occupy this region with density proportional to . At some random time of order , it will leave through the lowest path, and fall down towards a new local minimum, following roughly a path of the ODE. The piece of path just before leaving is the time-reversed of an ODE path. The ratio of to its expected value converges to an exponential law.
The Freidlin-Wentzell theory has applications in physics (metastability phenomena, analysis of rare events) and engineering (tracking loops, statistical analysis of signals, stabilization of systems and algorithms) [6,1,2,9].
Acknowledgement: This article is based on an article from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science. Heidelberg: Springer Science Business Media, LLC
- 1
- Azencott, R. Grandes déviations et applications. (French) Eighth Saint Flour Probability Summer School--1978, 1-176, Lecture Notes in Math. 774, Springer, Berlin, 1980
- 2
- Dembo, Amir; Zeitouni, Ofer: Large deviations techniques and applications. Springer, New York, 1998.
- 3
- Deuschel, J.-D., Stroock, D. Large deviations. Academic Press, Inc., Boston, MA, 1989
- 4
- Feng, J., Kurtz, T. Large deviations for . American Mathematical Society, Providence, RI, 2006
- 5
- den Hollander, Frank: Large deviations. American Mathematical Society, Providence, RI, 2000.
- 6
- Freidlin, M. I.; Wentzell, A. D.: Random perturbations of dynamical systems. Springer-Verlag, New York, 1998.
- 7
- Kester, A.: Some large deviation results in statistics. CWI Tract, 18. Centrum voor Wiskunde en Informatica, Amsterdam, 1985.
- 8
- Kipnis, C., Landim, C.: Scaling limits of interacting particle systems. Springer-Verlag, Berlin, 1999
- 9
- Olivieri, Enzo; Vares, Maria Eulália: Large deviations and metastability. Cambridge University Press, 2005.
- 10
- Varadhan, S. R. S.: Large deviations. Ann. Probab. 36 (2008), 397-419
|