I’ve heard the term “Markov chain” a couple of times but never had a look at what it actually is. Probably because it sounded too impressive and complicated.
It turns out it’s not as bad as it sounds. In fact the whole idea is pretty intuitive and once you get a feeling of how things work it’s much easier to get your head around the mathematics.
Andrey Markov was a Russian mathematician who studied stochastic processes (a stochastic process is a random process), especially systems that follow a sequence of linked events. Markov found interesting results about discrete processes that form a chain.
Let’s start with an example before looking at the maths! Imagine that we are at a supermarket looking at the number of people queuing up at the till. The number of people in the queue is the state of our system. There can be 0 people in the queue, or 1, or 2, … or 10 … or 20 …
The system can change from one state to another depending on whether a person joins the queue, someone finishes checkout and leaves the queue, or nothing changes.
We can model this system as a chain of states (one per queue length) with transitions between them.
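To make this concrete, here is a minimal sketch of the queue in Python. The arrival and departure probabilities are made up for illustration:

```python
import random

# Hypothetical probabilities, just for illustration.
P_ARRIVE, P_LEAVE = 0.3, 0.3

def next_state(queue_length):
    # Move to the next state: someone joins, someone leaves,
    # or nothing changes.
    r = random.random()
    if r < P_ARRIVE:
        return queue_length + 1          # a person joins the queue
    if r < P_ARRIVE + P_LEAVE and queue_length > 0:
        return queue_length - 1          # a person finishes checkout
    return queue_length                  # nothing changed

state = 0
for _ in range(10):
    state = next_state(state)
    print(state)
```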
If we generalise this example we can say that a Markov chain is composed of a set of states \(S=\{s_1, s_2, \dots, s_m\}\) and at every time step the system can move from one state to another according to a transition model \(T\). Therefore a Markov chain is defined by:
- A set of states \(S=\{s_1, s_2, \dots, s_m\}\)
- An initial state \(s_0\)
- A transition model \(T(s, s')\)
A Markov chain obeys a very important property: the next state depends only on the current state. That means that the next state doesn’t depend on the past: it doesn’t depend on how many people were in the queue 2 or 3 or 10 steps before. The only thing that matters is the current number of people in the queue.
This is expressed by the transition model \(T(s, s')\), which gives the probability of moving to the next state \(s'\) given only the current state \(s\).
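Formally, if \(s_t\) is the state at time \(t\), the Markov property reads:

\[ P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t) \]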
This is a very important property: if you want to model your process as a Markov chain you need to define your state very carefully, as it should include everything you need to predict the next state.
Let’s take another example and see how we can model it as a Markov chain. I go to the gym a few times a week and I usually do 2 kinds of sessions:
- Cardio workout
- Strength workout
In this case our model has 3 states:
- Cardio workout
- Strength workout
- Rest day
How do we move from one state to another? Well if I did a strength session I’ll be pretty exhausted, so I am most likely to have a rest day the day after. If I had a rest day I can do a cardio or a strength workout, or be lazy and take another rest day … nothing is certain, it’s all a matter of probability, which gives us this kind of diagram:
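Here is a minimal sketch of these transitions in Python, using the same probabilities as the transition matrix we’ll build further down:

```python
import random

# Transition probabilities from each state to the next
# (same numbers as the matrix used later in this post).
transitions = {
    "rest":     {"rest": 0.4, "cardio": 0.3, "strength": 0.3},
    "cardio":   {"rest": 0.5, "cardio": 0.2, "strength": 0.3},
    "strength": {"rest": 0.7, "cardio": 0.2, "strength": 0.1},
}

def next_day(state):
    # Sample tomorrow's session according to today's probabilities.
    states = list(transitions[state].keys())
    probs = list(transitions[state].values())
    return random.choices(states, weights=probs)[0]

day = "rest"
for _ in range(7):
    day = next_day(day)
    print(day)
```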
Now that we know how to transition from one state to the other we can answer interesting questions like:
- Given that I had a rest day, what are the chances that I’ll train tomorrow?
- Given that I had a cardio workout today, what will I do in 2 days’ time (or 3 or 10 or 100)?
- Given that I had a strength workout today, what is the probability that I’ll do a strength workout in 2 days (or 10 or 100)?
Knowing the probabilities for the next day is straightforward, but computing the probabilities over several steps is more interesting as several paths are possible.
We can then sum up the probabilities of all the possible paths to answer the question: “Given that today is a rest day, what is the probability that I’ll rest again in 2 days time?”.
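For example, with the transition probabilities from the matrix below, there are 3 paths from a rest day to a rest day 2 days later (rest → rest → rest, rest → cardio → rest, and rest → strength → rest), giving:

\[ P(\text{rest in 2 days}) = 0.4 \times 0.4 + 0.3 \times 0.5 + 0.3 \times 0.7 = 0.52 \]

which matches the \(v_2\) value computed below.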
As you can see it gets tedious pretty quickly. In fact, to compute these probabilities we define the transition model as an \(m \times m\) matrix where \(T_{ij}\) is the probability of moving from state \(s_i\) to state \(s_j\). We can then compute the probabilities over \(n\) steps by computing \(T^n\).
If we do it with numpy it looks like this:
```python
import numpy as np

T = np.array([
    [0.4, 0.3, 0.3],
    [0.5, 0.2, 0.3],
    [0.7, 0.2, 0.1]
])

T_2 = np.linalg.matrix_power(T, 2)
T_3 = np.linalg.matrix_power(T, 3)
T_10 = np.linalg.matrix_power(T, 10)
T_50 = np.linalg.matrix_power(T, 50)
T_100 = np.linalg.matrix_power(T, 100)

# start in state "Rest"
v = np.array([[1.0, 0.0, 0.0]])

print("  v_1: " + str(np.dot(v, T)))
print("  v_2: " + str(np.dot(v, T_2)))
print("  v_3: " + str(np.dot(v, T_3)))
print(" v_10: " + str(np.dot(v, T_10)))
print(" v_50: " + str(np.dot(v, T_50)))
print("v_100: " + str(np.dot(v, T_100)))
```
and it produces the following output:
```
  v_1: [[ 0.4 0.3 0.3]]
  v_2: [[ 0.52 0.24 0.24]]
  v_3: [[ 0.496 0.252 0.252]]
 v_10: [[ 0.50000005 0.24999997 0.24999997]]
 v_50: [[ 0.5 0.25 0.25]]
v_100: [[ 0.5 0.25 0.25]]
```
In this case the probabilities converge to a steady state (only the probabilities converge, not the states, which keep changing randomly), but that’s not always the case: it depends on the structure of the chain.
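When the chain does converge, the steady state can also be computed directly instead of raising \(T\) to a large power: it is a probability vector \(\pi\) such that \(\pi T = \pi\), i.e. a left eigenvector of \(T\) with eigenvalue 1. A minimal sketch with numpy:

```python
import numpy as np

T = np.array([
    [0.4, 0.3, 0.3],
    [0.5, 0.2, 0.3],
    [0.7, 0.2, 0.1]
])

# A steady state pi satisfies pi T = pi, i.e. it is a left
# eigenvector of T associated with eigenvalue 1.
eigenvalues, eigenvectors = np.linalg.eig(T.T)
i = np.argmin(np.abs(eigenvalues - 1.0))  # eigenvalue closest to 1
pi = np.real(eigenvectors[:, i])
pi = pi / pi.sum()  # normalise so the probabilities sum to 1
print(pi)           # [0.5  0.25 0.25]
```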
In fact the structure of the chain reveals interesting properties. By looking only at the structure of the graph we can tell whether the probabilities will converge or not and whether the initial state matters or not.
To understand whether the initial state matters or not we need to define 2 kinds of states:
- Recurrent state
- Transient state
A recurrent state is a state from which, whatever transitions you make, there is always a path back to that state.
A transient state is a state that is not recurrent: there exists a path after which it is not possible to get back to that state.
Then if we group all the connected recurrent states into groups, we can say that the initial state doesn’t matter as long as there is only 1 recurrent group in the chain (no matter where you start from, you’ll always end up in a state within this group). If there is more than 1 recurrent group, the initial state matters.
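Here is a minimal sketch illustrating this with numpy: the chain below has 2 recurrent groups ({0, 1} and {2, 3}) with no transitions between them, so the starting state determines where you end up:

```python
import numpy as np

# Two separate recurrent groups: {0, 1} and {2, 3}.
# There is no transition from one group to the other.
T = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5]
])
T_100 = np.linalg.matrix_power(T, 100)

print(np.array([1.0, 0.0, 0.0, 0.0]) @ T_100)  # stays in {0, 1}
print(np.array([0.0, 0.0, 1.0, 0.0]) @ T_100)  # stays in {2, 3}
```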
The convergence property is a bit trickier to observe because the system may oscillate between 2 (or more) sets of states. In this case the system is said to be periodic. A system is periodic if there exist sets of states (not necessarily connected) such that the system always moves from one set to another (there is no way to stay within the same set).

It means that as long as a state has a transition to itself the chain is not periodic (because there exists a transition that stays within the same set of states).
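For illustration, here is the simplest periodic chain: 2 states with no self-loops that always swap. The probabilities oscillate forever instead of converging:

```python
import numpy as np

# A periodic chain: from state 0 you always go to state 1 and
# vice versa, so the sets {0} and {1} keep swapping.
T = np.array([
    [0.0, 1.0],
    [1.0, 0.0]
])
v = np.array([1.0, 0.0])
for n in range(1, 5):
    v = v @ T
    print("step", n, ":", v)  # alternates between [0. 1.] and [1. 0.]
```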
Hopefully this post gave you a good feeling of what a Markov chain is. I didn’t really get into the mathematics beyond this, but now that you have a good intuition of how it works the maths should be less painful.
MIT provides very detailed courses on Markov chains:
https://www.youtube.com/watch?v=IkbkEtOOC1Y
Markov chains are used in a wide variety of domains, basically everywhere you need planning: from computer science (buffers / queues) to HR to management …