Markov chains

Markov Chain Monte Carlo (MCMC)

Suppose we want to sample from a complex probability distribution that can be factorized as $p(x) = \frac{1}{Z} \prod_{c} \phi_c(x_c)$, where $Z = \sum_{x} \prod_{c} \phi_c(x_c)$ is the normalizing constant.

Recall the random walk algorithm that we discussed for deciding satisfiability (SAT), where we moved along the edges of the Boolean hypercube looking for a satisfying assignment by picking any violated clause and flipping the truth assignment of a randomly chosen variable in that clause, thereby satisfying that clause.
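As a reference point, here is a minimal Python sketch of that random walk (the clause encoding and the step limit are illustrative choices, not part of the original description):

```python
import random

def random_walk_sat(clauses, n, max_steps=10000):
    """Random walk for SAT. Each clause is a list of literals: the integer i
    stands for x_i and -i for NOT x_i; variables are numbered 1..n."""
    # Start at an arbitrary corner of the Boolean hypercube.
    assignment = {i: random.choice([False, True]) for i in range(1, n + 1)}
    for _ in range(max_steps):
        violated = [c for c in clauses
                    if not any(assignment[abs(l)] == (l > 0) for l in c)]
        if not violated:
            return assignment             # satisfying assignment found
        clause = random.choice(violated)  # pick any violated clause
        var = abs(random.choice(clause))  # choose a random variable in it
        assignment[var] = not assignment[var]  # flip it: move along one hypercube edge
    return None  # gave up after max_steps

# Example: (x1 OR NOT x2) AND (x2 OR x3)
print(random_walk_sat([[1, -2], [2, 3]], n=3))
```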

We will present a random walk Monte Carlo method which moves around the Boolean hypercube in a special way such that the random walk will visit points $x$ with a frequency proportional to the corresponding probability $p(x)$. In other words, we will spend more time in areas assigned higher probability by the model, and as such, we will draw more samples wherever $p(x)$ is high. To do so, we will construct a Markov chain whose limiting distribution is the desired target distribution $p$. More specifically, we want to design the transition matrix of the Markov chain so that simulating the Markov chain for a sufficiently large number of steps will give us samples from $p$. We will prove that, under some conditions, this property holds no matter how we initialize the Markov chain (regardless of which state we start in).
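As a preview, the overall procedure looks like the sketch below; `transition` is a placeholder for the transition rule we have yet to design, and the toy two-state chain is purely an assumed example:

```python
import random
from collections import Counter

def run_chain(transition, x0, num_steps, burn_in):
    """Simulate a Markov chain and return empirical visit frequencies.

    transition: a function that samples the next state given the current one.
    If the chain is designed correctly, states are visited (after burn-in)
    with frequency roughly proportional to the target distribution p(x).
    """
    x = x0
    counts = Counter()
    for t in range(num_steps):
        x = transition(x)
        if t >= burn_in:
            counts[x] += 1
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}

# Toy two-state chain: from state 0 move to 1 w.p. 0.5; from 1 move to 0 w.p. 0.25.
def toy_transition(x):
    if x == 0:
        return 1 if random.random() < 0.5 else 0
    return 0 if random.random() < 0.25 else 1

# Visit frequencies approach this chain's stationary distribution (1/3, 2/3).
print(run_chain(toy_transition, x0=0, num_steps=50000, burn_in=1000))
```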

Markov chains - introduction

Let the state space be the set of vertices of the Boolean hypercube, $\{0, 1\}^n$. Since the number of possible states is $2^n$, for large $n$, the state space cannot be enumerated completely, and the transition matrix cannot be represented explicitly. We will need to find a way to simulate from the Markov chain without representing the Markov chain explicitly.

The Markov chain is completely defined by the initial state distribution $p_0$ and the transition matrix $P$, where $P_{ij}$ is the probability of transitioning from state $i$ to state $j$, since $p_1 = p_0 P$ and $p_2 = p_1 P = p_0 P^2$ and more generally $p_t = p_0 P^t$ (distributions are row vectors). For example, the initial distribution can be uniform, $p_0(i) = 1/2^n$ for all $i$, or concentrated in a single state, $p_0(i) = 1$ for one state $i$ and $0$ elsewhere.
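For concreteness, here is a small numerical check of the relation $p_t = p_0 P^t$; the 2-state matrix below is a hypothetical example, not one from the text:

```python
import numpy as np

# Hypothetical 2-state transition matrix (row-stochastic: rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

p0 = np.array([1.0, 0.0])  # start deterministically in state 0

# p_t = p_0 P^t: repeatedly multiply the row vector by P.
p = p0
for _ in range(50):
    p = p @ P

print(p)                                   # distribution after 50 steps
print(p0 @ np.linalg.matrix_power(P, 50))  # the same, computed via P^50
```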

Our goal is to design the transition matrix $P$ so that, regardless of the initial state $x_0$, we will have $p_t$ converge to the target probability distribution $p$ after a sufficiently long time.

Let us first look at the eigenvectors of $P$: any vector $v$ satisfying $v P = \lambda v$ for some scalar $\lambda$. If we have a basis of eigenvectors $v_1, \dots, v_N$ with eigenvalues $\lambda_1, \dots, \lambda_N$, then we can write any initial probability distribution as $p_0 = \sum_i c_i v_i$ and easily determine $p_t = p_0 P^t = \sum_i c_i \lambda_i^t v_i$. Then after a large number of time steps $t$, $p_t$ will be dominated by the terms whose eigenvalues have the largest magnitude. In general, it is hard to say anything about the spectral properties of an arbitrary matrix. We know $P$ is a row-stochastic matrix: its entries are nonnegative and each row sums to one, $\sum_j P_{ij} = 1$. But we want to put further restrictions on $P$ to analyze the eigenvectors.
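As an illustration (reusing the hypothetical 2-state matrix from the previous snippet), we can inspect the spectrum numerically; since distributions are row vectors, the relevant eigenvectors are left eigenvectors, i.e., eigenvectors of $P^\top$:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Left eigenvectors of P are the (right) eigenvectors of P transposed.
eigvals, eigvecs = np.linalg.eig(P.T)
print(eigvals)  # one eigenvalue equals 1; the other has magnitude < 1

# The left eigenvector with eigenvalue 1, normalized to sum to 1,
# is the distribution the chain converges to.
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()
print(pi)  # approximately [2/3, 1/3]
```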

We will analyze Markov chains which are strongly connected (for any two states $i$ and $j$, there exists a path of positive-probability transitions from $i$ to $j$). This is a natural assumption since no matter what state we start in, we want to be able to reach the target distribution in the limit.

The key assumptions we will make are that the Markov chain is irreducible (all states communicate with each other: for all $i, j$, $(P^t)_{ij} > 0$ for some $t$, meaning you can go from any state to any other state for a large enough $t$) and that the Markov chain is aperiodic (there exists a $t$ such that for all states $i$, $(P^t)_{ii} > 0$).

$\pi$ is a stationary probability distribution if $\pi P = \pi$. In general, stationary probability distributions are not unique. The irreducibility condition guarantees that the stationary distribution is unique. For an example of what happens when we do not have irreducibility, consider the case of two nodes, each with only a self-loop, i.e., $P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$: every probability distribution is stationary.

$\pi$ is the limiting distribution if for every initial probability distribution $p_0$, $\lim_{t \to \infty} p_0 P^t = \pi$. The aperiodicity condition is necessary for the limit to exist. (Comment: This is a technical condition. In practice, we might not need aperiodicity.) For an example of what happens when we do not have aperiodicity, consider $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, which deterministically alternates between the two states, so $p_t$ oscillates and never converges.
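Both failure modes are easy to check numerically; the sketch below uses exactly the two counterexample matrices above:

```python
import numpy as np

# No irreducibility: two disconnected states with self-loops, P = identity.
P_reducible = np.eye(2)
print(np.array([0.3, 0.7]) @ P_reducible)  # [0.3, 0.7] is stationary
print(np.array([0.9, 0.1]) @ P_reducible)  # so is [0.9, 0.1]: not unique

# No aperiodicity: the chain alternates between the two states forever.
P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])
p = np.array([1.0, 0.0])
for _ in range(4):
    p = p @ P_periodic
    print(p)  # oscillates between [0, 1] and [1, 0]; no limit exists
```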

Markov chains - proof of convergence

We will prove that if the Markov chain is irreducible and aperiodic, then a stationary distribution exists, the stationary distribution is unique, and the Markov chain converges to the stationary distribution (cf. the Perron-Frobenius theorem).

If the Markov chain is irreducible and aperiodic, then the Markov chain is primitive ($\exists t$ such that $(P^t)_{ij} > 0$ for all $i, j$). $(P^t)_{ij}$ tells us the probability of going from state $i$ to state $j$ in exactly $t$ steps. To see this, note that if the Markov chain is irreducible, it means we can go from any node to any other node in a finite number of steps (possibly depending on the pair of nodes). We want a single $t$ independent of the pair of nodes. To achieve this, we can pad shorter paths with the positive-probability returns to the same state (self-loops) which are guaranteed to exist by the aperiodicity condition.
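A quick way to see primitivity on a small example is to look at powers of $P$; the 3-state matrix below is an assumed toy chain (irreducible, with a self-loop at state 0):

```python
import numpy as np

# 3-state chain: 0 -> {0, 1}, 1 -> 2, 2 -> 0. Irreducible, and the self-loop
# at state 0 makes it aperiodic.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

for t in range(1, 7):
    Pt = np.linalg.matrix_power(P, t)
    print(t, bool((Pt > 0).all()))  # True once every entry of P^t is positive
```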

Without loss of generality, let us consider the case where $t = 1$, i.e., $P$ itself has no zero entries. We will prove that $P^t \to W$ as $t \to \infty$ for a matrix $W$ where all of the rows of $W$ are identical.

The intuition is that the stochastic matrix $P$ is doing averaging. The biggest and smallest elements of the vector get closer to each other because of the weighted average (contraction). Eventually, the vector will converge to a constant vector (all entries equal).

Let $P$ be a transition probability matrix with no zero entries. Let $d > 0$ be the smallest entry of $P$.

Let $v$ be a vector with largest component $M_0$ and smallest component $m_0$. Similarly, define $M_1$ to be the largest component and $m_1$ to be the smallest component of $P v$.

We will show that $M_1 - m_1 \le (1 - 2d)(M_0 - m_0)$. To prove this bound, let us consider a “worst-case” vector which will give us the tightest bound. Each component of $P v$ is a weighted average of the components of $v$, with every weight at least $d$. Consider a vector $v$ which equals $M_0$ everywhere except at the spot corresponding to the entry with weight $d$, where it equals $m_0$. Thus, we see that the biggest $M_1$ can be is $(1 - d) M_0 + d\, m_0$. Similarly, the smallest $m_1$ can be is $(1 - d) m_0 + d\, M_0$. Subtracting the two inequalities gives the desired bound.
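A small numerical sanity check of this contraction, using an assumed positive-entry matrix and a random vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# A transition matrix with no zero entries (hypothetical example).
P = np.array([[0.60, 0.30, 0.10],
              [0.20, 0.50, 0.30],
              [0.25, 0.25, 0.50]])
d = P.min()  # smallest entry of P

v = rng.normal(size=3)
M0, m0 = v.max(), v.min()
Pv = P @ v
M1, m1 = Pv.max(), Pv.min()

print(M1 - m1, "<=", (1 - 2 * d) * (M0 - m0))
assert M1 - m1 <= (1 - 2 * d) * (M0 - m0) + 1e-12
```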

Let $v$ be an arbitrary vector. Let us study the sequences $M_0, M_1, M_2, \dots$ and $m_0, m_1, m_2, \dots$, where $M_t$ and $m_t$ are the largest and smallest components of $P^t v$.

Since the sequences are bounded and monotonic (each component of $P^{t+1} v$ is a weighted average of the components of $P^t v$, so $m_t \le m_{t+1} \le M_{t+1} \le M_t$), $M_t$ and $m_t$ must converge.

We also know that $M_t - m_t \le (1 - 2d)^t (M_0 - m_0)$, by applying the contraction bound above at every step.

If $d > 0$, then $0 \le 1 - 2d < 1$ (note $d \le 1/2$, since each row contains at least two entries summing to one), so $(1 - 2d)^t \to 0$ as $t \to \infty$, and so we have $M_t - m_t \to 0$, i.e., $\lim_t M_t = \lim_t m_t = c$ for some constant $c$. Therefore, $\lim_{t \to \infty} P^t v = c \mathbf{1}$, a vector where all of the entries are the same.
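Iterating this, the spread $M_t - m_t$ shrinks geometrically; a quick check with the same assumed matrix as before:

```python
import numpy as np

P = np.array([[0.60, 0.30, 0.10],
              [0.20, 0.50, 0.30],
              [0.25, 0.25, 0.50]])

v = np.array([1.0, 0.0, 0.0])  # an arbitrary starting vector
for t in range(1, 11):
    v = P @ v
    print(t, v.max() - v.min())  # M_t - m_t decays toward 0
```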

In particular, we can let $v = e_j$, the $j$-th standard basis vector. Then $\lim_{t \to \infty} P^t e_j$, which is the $j$-th column of $\lim_{t \to \infty} P^t$, is a vector where all of the entries are the same. This concludes the proof that $P^t \to W$ as $t \to \infty$ for a matrix $W$ where all of the rows of $W$ are identical. The interpretation of this limit is that regardless of the initial state we start in, the limiting probability distribution is the same. If $p_0$ is any probability distribution, then $\lim_{t \to \infty} p_0 P^t = p_0 W = \pi$, where $\pi$ is any row of $W$.

We will now show that $\pi$ is a stationary distribution of $P$. Let $W = \lim_{t \to \infty} P^t$. Since $P^{t+1} = P^t P$, by taking limits on both sides, we get $W = W P$. Row-wise, $\pi = \pi P$. We also see that $\pi$ is a left eigenvector of $P$ with an eigenvalue of 1.
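Putting the pieces together numerically (same assumed 3-state matrix as above): a high power of $P$ has nearly identical rows, and that common row $\pi$ satisfies $\pi P = \pi$:

```python
import numpy as np

P = np.array([[0.60, 0.30, 0.10],
              [0.20, 0.50, 0.30],
              [0.25, 0.25, 0.50]])

W = np.linalg.matrix_power(P, 100)
print(W)       # all rows are (numerically) identical

pi = W[0]      # any row of W
print(pi)      # the limiting / stationary distribution
print(pi @ P)  # equals pi: a left eigenvector of P with eigenvalue 1
```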

Markov chains - Volodymyr Kuleshov