Markov chains

A Markov chain is a sequence of states where each transition depends only on the current state — not on how you got there. This “memoryless” property makes them tractable, and they appear throughout physics, biology, economics, and computer science.

π(A) stationary

60.0%

π(B) stationary

40.0%

|λ₂| (mixing)

0.50

near 1 = slow mix

P(A→A) 0.80 P(B→B) 0.70 steps 300 runs 10

Dashed line: stationary π(A) = (1−P(B→B)) / (2−P(A→A)−P(B→B)). All runs start in state A and converge — faster when |λ₂| is small (high mixing).

What to notice

All runs start in state A (fraction = 1) and converge to the dashed stationary line.
The stationary distribution $π_{A} = \frac{1 - P ( B \to B )}{2 - P ( A \to A ) - P ( B \to B )}$ depends only on the transition probabilities, not on where you started.
|λ₂| near 1 (sticky transitions, like P(A→A) = 0.99): slow convergence — the chain stays near its starting state for a long time. |λ₂| near 0: fast mixing — just a few steps to forget the starting state.

The stationary distribution

For a 2-state chain, the stationary distribution satisfies the balance equations: the probability flux from A to B equals the flux from B to A. This gives:

π_{A} = \frac{1 - P ( B \to B )}{2 - P ( A \to A ) - P ( B \to B )}

For larger chains, the stationary distribution π is the left eigenvector of the transition matrix corresponding to eigenvalue 1. It always exists and is unique when the chain is irreducible (every state reachable from every other).

Mixing time

The speed of convergence is controlled by the second-largest eigenvalue $∣ λ_{2} ∣$ . The spectral gap 1 − |λ₂| determines how quickly the chain forgets its starting state. Small spectral gap = slow mixing = many steps needed to reach equilibrium. This is why PageRank (a Markov chain on the web graph) converges slowly on tightly clustered link structures.

Connection to the Law of Large Numbers

The fraction of time a Markov chain spends in state A converges to π(A) with probability 1 — an ergodic theorem analogous to the Law of Large Numbers for i.i.d. samples. Unlike i.i.d. averaging, the chain’s samples are correlated, but the time average still converges to the space average.