On Time-varying Markov Systems
In this article I consider a discrete-time Markov system whose transition matrix is not constant over time. I then derive the forward and backward Chapman-Kolmogorov equations.
In Markovian systems, the transition matrix determines the behaviour of the system. The Chapman-Kolmogorov equation is important because it relates transition probability matrices, which need not be constant, across different time intervals, letting us move through the system both forwards and backwards in time and explore its behaviour fully.
Such systems are often encountered in real-world processes. For example, the transition probabilities of commuters on a train system depend on the time of day and the day of the week.
The Markov system has $N$ states. $\Phi(m, n)$ denotes the multi-time-step transition matrix between times $m$ and $n$, with elements $\phi_{ij}(m, n)$, and $P(n)$ denotes the single-time-step transition matrix at time $n$, with elements $p_{ij}(n)$.
Most of the content in this post can be found in Chapter 9 of Dynamic Probabilistic Systems, Volume I: Markov Models by Ronald A. Howard.
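To make things concrete, here is a minimal numerical sketch of such a system in Python. The three-state chain, the alternating "peak"/"off-peak" regimes, and all of the numbers are made up purely for illustration; only the shape of the setup (a one-step matrix that depends on $n$) matters.

```python
import numpy as np

N = 3  # number of states (purely illustrative)

def P(n):
    """Hypothetical one-step transition matrix P(n) at time n.

    The dynamics simply alternate between two made-up regimes
    ("peak" and "off-peak") so that the matrix varies with n.
    """
    peak = np.array([[0.7, 0.2, 0.1],
                     [0.3, 0.5, 0.2],
                     [0.1, 0.3, 0.6]])
    off_peak = np.array([[0.5, 0.4, 0.1],
                         [0.2, 0.6, 0.2],
                         [0.2, 0.2, 0.6]])
    return peak if n % 2 == 0 else off_peak

# Each row of P(n) is a probability distribution over the next state.
assert np.allclose(P(0).sum(axis=1), 1.0)
assert np.allclose(P(1).sum(axis=1), 1.0)
```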
Say $\mathrm{Pr}(S_{nj}~|~S_{mi})$ is the probability that the state of the system at time $n$ is $j$ given that it is in state $i$ at time $m$. We can write
$$\begin{aligned} \mathrm{Pr}(S_{nj}~|~S_{mi}) =& \sum^N_{k=1}\mathrm{Pr}(S_{nj}S_{rk}~|~S_{mi}), & m\leq r\leq n \end{aligned}$$
Since $m\leq r \leq n$, the above equation simply states that $\mathrm{Pr}(S_{nj}~|~S_{mi})$ is the marginal distribution of the joint distribution between $S_{nj}$ and $S_{rk}$, given $S_{mi}$, over all intermediate states $S_{rk}$.
Applying the chain rule to the above equation, we have
$$\begin{aligned} \mathrm{Pr}(S_{nj}~|~S_{mi}) =& \sum^N_{k=1}\mathrm{Pr}(S_{rk}~|~S_{mi})\mathrm{Pr}(S_{nj}~|~S_{rk}S_{mi}) &\\ =& \sum^N_{k=1}\mathrm{Pr}(S_{rk}~|~S_{mi})\mathrm{Pr}(S_{nj}~|~S_{rk}), &m\leq r\leq n \end{aligned}$$
Note that in the second row, we get the simplification of $\mathrm{Pr}(S_{nj}~|~S_{rk}S_{mi}) \rightarrow \mathrm{Pr}(S_{nj}~|~S_{rk})$ because the system is Markov and $m\leq r$.
Since $\phi_{ij}(m, n) = \mathrm{Pr}(S_{nj}~|~S_{mi})$, we can rewrite the above as the Chapman-Kolmogorov equation
$$\begin{aligned} \phi_{ij}(m, n) = & \sum^N_{k=1} \mathrm{Pr}(S_{rk}~|~S_{mi})\mathrm{Pr}(S_{nj}~|~S_{rk}) \\ =& \sum^N_{k=1}\phi_{ik}(m, r)\phi_{kj}(r, n). \end{aligned}$$
If we take a closer look at the final form of $\phi_{ij}(m, n)$, we can see that the multi-step transition probability from state $i$ to state $j$ between times $m$ and $n$ can be broken down into transition probabilities of two time intervals $(m, r)$ and $(r, n)$, where $m\leq r \leq n$, summing over all possible states.
One way of looking at the Chapman-Kolmogorov equation is to think of it as summing over all the paths the Markov process can take from state $i$ at time $m$ through time $r$ and then to state $j$ at time $n$ over all possible intermediate states that could occur at time $r$.
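To see this path-summing picture numerically, here is a small Monte Carlo sketch, reusing the hypothetical time-varying matrix from the earlier snippet. It estimates $\mathrm{Pr}(S_{nj}~|~S_{mi})$ directly by simulation and compares it with the sum over intermediate states, with each factor estimated from its own independent simulations; the helper `simulate` and the particular choices of $m$, $r$, $n$, $i$ and $j$ are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3

def P(n):
    # Same hypothetical time-varying one-step matrix as in the earlier sketch.
    peak = np.array([[0.7, 0.2, 0.1],
                     [0.3, 0.5, 0.2],
                     [0.1, 0.3, 0.6]])
    off_peak = np.array([[0.5, 0.4, 0.1],
                         [0.2, 0.6, 0.2],
                         [0.2, 0.2, 0.6]])
    return peak if n % 2 == 0 else off_peak

def simulate(start_state, start_time, end_time, num_paths):
    """Estimate the distribution over states at end_time, given that the
    chain is in start_state at start_time, by simulating the chain."""
    counts = np.zeros(N)
    for _ in range(num_paths):
        state = start_state
        for t in range(start_time, end_time):
            state = rng.choice(N, p=P(t)[state])
        counts[state] += 1
    return counts / num_paths

m, r, n = 0, 3, 6   # m <= r <= n
i, j = 0, 2         # start in state i at time m, ask about state j at time n
num_paths = 20_000

# Left-hand side: Pr(S_nj | S_mi), estimated by simulating straight from m to n.
lhs = simulate(i, m, n, num_paths)[j]

# Right-hand side: sum_k Pr(S_rk | S_mi) * Pr(S_nj | S_rk),
# with each factor estimated from its own independent simulations.
phi_mr = simulate(i, m, r, num_paths)             # row i of Phi(m, r)
phi_rn = np.array([simulate(k, r, n, num_paths)   # rows of Phi(r, n)
                   for k in range(N)])
rhs = sum(phi_mr[k] * phi_rn[k, j] for k in range(N))

print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```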
In matrix form, the Chapman-Kolmogorov equations are
$$\Phi(m,n) = \Phi(m,r)\Phi(r,n),$$
with the boundary condition of $\Phi(n, n)=I$.
At this point, the result of the Chapman-Kolmogorov equation might seem intuitive and even obvious. One might say, "of course the transition probability over some time interval can be constructed from the transition probabilities of intermediate time intervals!". But what is intuitive is not always correct, nor always easy to state mathematically. It is important to note that nowhere in the derivation did we assume that the transition probabilities are the same at any two points in time. It is also nice to see that such an intuitive result can be derived from basic principles.
Diving a little deeper, we can recursively divide any time interval into smaller sub-intervals. Suppose we choose $r=n-1$. This results in the forward Chapman-Kolmogorov equation
$$\Phi(m,n) = \Phi(m, n-1)\Phi(n-1, n),$$
where $\Phi(n-1, n)$ is simply the one-step transition probability matrix $P(n-1)$. This is the forward equation because the time interval $(m, n)$ is split at the end closer to $n$.
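A minimal sketch of the forward recursion, assuming the one-step matrices are available as a function `P(t)` like the one in the earlier snippet; it starts from the boundary condition $\Phi(m, m) = I$ and appends one step at a time.

```python
import numpy as np

def phi_forward(m, n, P, N):
    """Phi(m, n) via the forward recursion Phi(m, t+1) = Phi(m, t) P(t),
    starting from the boundary condition Phi(m, m) = I."""
    Phi = np.eye(N)
    for t in range(m, n):
        Phi = Phi @ P(t)   # extend the interval one step closer to n
    return Phi
```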
Similarly, choosing $r=m+1$ gives the backward Chapman-Kolmogorov equation
$$\Phi(m,n) = \Phi(m, m+1)\Phi(m+1, n),$$
where $\Phi(m, m+1)$ is simply the one-step transition probability matrix $P(m)$, and this time the interval is split at the end closer to $m$.
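And a matching sketch of the backward recursion, under the same assumptions; it starts from $\Phi(n, n) = I$ and prepends one step at a time.

```python
import numpy as np

def phi_backward(m, n, P, N):
    """Phi(m, n) via the backward recursion Phi(t, n) = P(t) Phi(t+1, n),
    starting from the boundary condition Phi(n, n) = I."""
    Phi = np.eye(N)
    for t in range(n - 1, m - 1, -1):
        Phi = P(t) @ Phi   # prepend the step taken at time t
    return Phi
```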
If we recursively apply either the forward or the backward equation (or a mix of the two), we get
$$\Phi(m, n) = \Phi(m, m+1)\Phi(m+1, m+2)\cdots \Phi(n-2, n-1)\Phi(n-1, n).$$
This tells us that the multi-step transition matrix is the matrix product of the one-step transition matrices over every sub-interval between $m$ and $n$; since each factor $\Phi(k, k+1)$ equals $P(k)$, this is $\Phi(m, n) = P(m)P(m+1)\cdots P(n-1)$.
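Reusing the hypothetical `P(n)` and the two recursions sketched above, we can check numerically that the forward recursion, the backward recursion and the explicit product all agree, and that $\Phi(m, n)$ splits at any intermediate time $r$.

```python
from functools import reduce
import numpy as np

m, r, n = 0, 3, 6
explicit = reduce(np.matmul, [P(t) for t in range(m, n)])

assert np.allclose(phi_forward(m, n, P, N), explicit)
assert np.allclose(phi_backward(m, n, P, N), explicit)
# Phi(m, n) also factors across any intermediate time r, as derived above.
assert np.allclose(phi_forward(m, r, P, N) @ phi_forward(r, n, P, N), explicit)
```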
Again, an intuitive result. "Of course it is!" says the skeptic. But the fact that the result is intuitive only makes it more beautiful.