Markov Chains
Arnoldo Frigessi 1
Bernd Heidergott 2
Markov chains are stochastic models which play an important role in many applications in areas as diverse as biology, finance, and industrial production. Roughly speaking, Markov chains are used for modeling how a system moves from one state to another in time. Transitions between state are random and governed by a conditional probability distribution which assigns a probability to the move into a new state, given the current state of the system. This dependence represents the memory of the system. A basic example of a Markov chain is the so-called on the non-negative integers, defined as follows. Let , for , be a sequence of random variables with initial value . Assume that and for some given value of . The sequence is an example of a Markov chain (for the definition see below). There are many aspects of the chain one can be interested in, including (i) whether returns to in a finite number of steps (this holds for ), (ii) the expected number of steps until the chain returns to (which is finite for ), and (iii) the limiting behavior of as .
A few examples help illustrate the important concepts. A useful model in modeling infectious diseases assumes that there are four possible states: Susceptible (S), Infected (I), Immune (A), Dead (R). Possible transitions are from S to I, S or R; from I to A or R; from A to A or R; from R to R only. The transitions probabilities, from S to I, S to R and the loop S to S, must sum to one and can depend on characteristics of the individuals modeled, like age, gender, life style, etc. All individuals start in S, and move at each time unit (say a day). Given observations of the sequence of visited states (called trajectory) for a sample of individuals, with their personal characteristics, one can estimate the transition probabilities, by , for example. This model assumes that the transition probability at time from one state A to state B only depends on the state A and not on the trajectory that leads to A. This might not be realistic, as for example, remaining in the diseased state I over many days could increase the probability of transition to R. It is possible to model a system with longer memory, and thus leave the simplest setting of a Markov Chain (though one can formulate such a model still as a Markov Chain over a more complex state space which includes the length of stay in the current state). A second example refers to finance. Here we follow the daily value in Euro of a stock. The state space is continuous, and one can model the transitions from state Euro to Euro with an appropriate Normal density with mean . The time series of the value of the stock might well show a longer memory, which one would typically model with some autoregressive terms, leading to more complex process again. As a further example, consider the set of all web pages on the Internet as the state space of a giant Markov chain, where the user clicks from one page to the next, according to a transition probability. A Markov chain has been used to model such a process. The transitions from the current web page to the next web page can be modeled as a mixture of two terms: with probability the user follows one of the links present in the current web page and among these uniformly; with probability the user chooses another web page at random among all other ones. Typically . The longterm fraction of visits of the Markov chain to each state can then be interpreted as a ranking of the web page, as done basically by the Google page ranking algorithm. Again, one could discuss how correct the assumption is that only the current web page determines the transition probability to the next one. Finally, Markov Chain Monte Carlo (MCMC) algorithms are Markov chains, where at each iteration, a new state is visited according to a transition probability that depends on the current state. These stochastic algorithms are used to sample from a distribution on the state space, which is the distribution of the chain in the limit, when enough iterations have been performed. The modeler has to critically validate Markov and homogeneity hypothesis before trusting results based on the Markov chain model, or chains with higher order of memory. In general a has the Markov property if the probability to enter a state in the future is independent of the states visited in the past given the current state.
In the literature the term Markov processes is used for Markov chains for both discrete- and continuous time cases, which is the setting of this note. Standard textbooks on Markov chains are [3,4,5,6]. In this paper we follow [2] and use the term 'Markov chain' for the discrete time case and the term 'Markov process' for the continuous time case. General references on Markov chains are [7,8,9,10,11].
Consider a sequence of random variables defined on a common underlying probability space with discrete state space , i.e., technically, is -measurable for . The defining property of a Markov chain is that the distribution of depends on the past only through the immediate predecessor , i.e., it holds that
where and all other are elements of the given state space . If does not depend on , the chain is called homogenous. Provided that is at most countable, the transition probabilities of a homogeneous Markov Chain are organised into a matrix , where is the probability of a transition from to . The matrix is called the one-step transition probability matrix of the Markov chain. For the random walk example above, the transition matrix is given by , , for , and otherwise zero. In order to fully define a Markov Chain it is necessary to assign an initial distribution . The marginal distribution at time can then be computed as
where denotes the element of the -th power of the transition matrix. Given an initial distribution and a transition matrix the distribution of the Markov chain is uniquely defined.
A Markov chain is said to be aperiodic if for each pair of states the greatest common divisor of the set of all such that is one. The random walk in our introductory example fails to be aperiodic as any path from starting in and returning there has a length that is a multiple of 2.
A distribution is called a stationary distribution of if
A key topic in Markov chain theory is the study of the limiting behavior of . has limiting distribution for initial distribution if
 |
(1) |
Any limiting distribution is a stationary distribution. A Markov chain is called ergodic if the limit in (1) is independent of the initial distribution. Consequently, an ergodic Markov chain has a unique limiting distribution and this limiting distribution is also a stationary distribution If fails to be aperiodic, then the limit in (1) may not exists but can be replaced by the Cesaro limit
which always exists for finite Markov chains.
A Markov chain is called irreducible if for any pair of states , there exists a path from to that can follow with positive probability. In other words, any state can be reached from any other state with positive probability. An irreducible Markov chain is called recurrent if the number of steps from a state to the first visit of a state , denoted by , is almost surely finite for all , and it is called positive recurrent if for at least one . Note that for the random walk is recurrent and for it is positive recurrent.
Any aperiodic, irreducible and positive recurrent Markov chain possesses a unique stationary distribution which is the unique probability vector solving (and which is also the unique limiting distribution). This ergodic theorem is one of the central results and it has been established in many variations and extensions, see the references. Efficient algorithms for computing are available, but fail for very large state-spaces.
An important topic is the estimation of the (one-step) transition probabilities. Consider a discrete time, homogeneous Markov chain with finite state space , observed at time points on the trajectory . We wish to estimate the transition probabilities by maximum likelihood. The likelihood is
where is the number of transitions from to in the observed trajectory. Ignoring the initial factor, the maximum likelihood estimator of is found to be equal to , where is the number of transitions out from state . Standard likelihood asymptotic applies, despite the data are dependent, as , which will happen if the chain is ergodic. The asymptotic variance of the maximum likelihood estimates can be approximated by var . The covariances are zero, except cov for . If the trajectory is short, the initial distribution should be considered. A possible model is to use the stationary distribution , which depend on the unknown transition probabilities. Hence numerical maximization is needed to obtain the maximum likelihood estimates. In certain medical applications, an alternative asymptotic regime can be of interest, when many ( ) short trajectories are observed, and . In this case the initial distribution cannot be neglected.
Let denote the (continuous time) Markov process on state space with transition matrix , i.e.,
Under some mild regularity conditions is holds that the generator matrix , defined as
exists for . The stationary distribution of a Markov process can be found as the unique probability that solves , see [1]. A generator matrix is called uniformizable with rate if . While any finite dimensional generator matrix is uniformizable a classical example of a Markov process on denumerable state space that fails to have this property is the M/M/ queue. Note that if is uniformizable with rate , then is uniformizable with rate for any . Let be uniformizable with rate and introduce the Markov chain as follows
![$\displaystyle [ P_\mu ]_{ i j } = \begin{cases}q_{ i j }/ \mu & i \not = j \\ 1+ q_{ i i }/ \mu & i = j , \end{cases}$ $\displaystyle [ P_\mu ]_{ i j } = \begin{cases}q_{ i j }/ \mu & i \not = j \\ 1+ q_{ i i }/ \mu & i = j , \end{cases}$](http://statprob.com/cache/objects/254/l2h/img126.png) |
(2) |
for , or, in shorthand notation,
then it holds that
 |
(3) |
Moreover, the stationary distribution of and coincide. The Markov chain with transition probability matrix is called the sampled chain. The relationship between and can be expressed as follows. Let denote a Poisson process with rate , then and are equal in distribution for all . From the above it becomes clear that the analysis of the stationary behavior of a (uniformizable) continuous time Markov chain reduces to that of a discrete time Markov chain.
- 1
- W. Anderson: Continuous-time Markov Chains, An Applications Oriented Approach, Springer, New York, 1991.
- 2
- M. Iosifescu: Finite Markov Processes and Their Applictions, Wiley, New York, 1980.
- 3
- M. Kijima: Markov Processes for Stochastic Modelling, Chapman & Hall, London, 1997.
- 4
- S. Meyn and R. Tweedie:
Markov Chains and Stochastic Stability, Springer, London, 1993.
- 5
- E. Nummelin:
General Irreducible Markov Chains and Non-negative Operators, Cambridge Univesity Press, Cambridge, 1984.
- 6
- D. Revuz:
Markov Chains Second Edition, Noth-Holland, Amsterdam, 1984.
- 7
- W. Feller: An Introduction to Probability Theory and its Applications. Vol. I. Third edition. Wiley, New York, 1968.
- 8
- W. Gilks, S. Richardson and D. Spiegeihalter (editors): Markov Chain Monte Carlo in Practice, Chapman & Hall, London, 1995.
- 9
- O. Haeggstroem: Finite Markov Chains and Algorithmic Applications, London Mathematical Society Student Texts (No. 52), 2002.
- 10
- J. Kemeny and J. Snell: Finite Markov Chains, originally published by Van Nostrand Publishing Company, 1960, Springer Verlag, 3rd printing, 1983.
- 11
- E. Seneta: Non-negative Matrices and Markov Chains, originally published by Allen & Unwin Ltd., London, 1973 Springer Series in Statistics, 2nd rev. ed., 2006.
Footnotes
-
1
- Department of Biostatistics, University of Oslo & Norwegian Computing Centre, Oslo, Norway
-
2
- Deptartment of Econometrics, Vrije Universiteit, Amsterdam, The Netherlands
|