# Markov Decision Process
We can start by thinking about a [Markov Chain](Markov-Chain.md), which consists of just two objects: a **state space** and a **transition operator** $T$, which encodes the probabilities $p(s_{t+1} \mid s_t)$. A Markov chain alone cannot specify a decision-making problem, however, because it has no notion of actions.
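The reason $T$ is called an operator is that it acts linearly on vectors of state probabilities. A minimal sketch of this, using a small hypothetical 3-state chain (the numbers are made up purely for illustration):

```python
import numpy as np

# Hypothetical 3-state Markov chain. T[i, j] = p(s_{t+1} = i | s_t = j),
# so each column of T is a probability distribution over next states.
T = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.3],
    [0.0, 0.1, 0.7],
])

# The transition operator acts linearly on state distributions:
# if mu_t is the vector of state probabilities at time t, then mu_{t+1} = T @ mu_t.
mu = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(5):
    mu = T @ mu
print(mu)                        # state distribution after 5 steps
```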
The inclusion of actions was a much later addition, introduced in the 1950s. The MDP adds a few more objects to the Markov chain: alongside the **state space** it now has an **action space**, so the graphical model contains both states and actions, and the transition probabilities are conditioned on both, $p(s_{t+1} \mid s_t, a_t)$. Note that $T$ is still an operator, but it is now a **tensor**, with one index each for the next state, the current state, and the action.
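A sketch of the tensor view, $T_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$, using a hypothetical random MDP (the shapes are the point here, not the numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
num_states, num_actions = 4, 2

# Hypothetical transition tensor: T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k).
# Filled randomly for illustration; probabilities over the next state must sum to 1.
T = rng.random((num_states, num_states, num_actions))
T /= T.sum(axis=0, keepdims=True)

def step(state: int, action: int) -> int:
    """Sample the next state from p(s' | s, a) encoded in the tensor."""
    return rng.choice(num_states, p=T[:, state, action])

s = 0
for t in range(3):
    a = rng.integers(num_actions)      # random action, since no policy is defined yet
    s = step(s, a)
    print(f"t={t}: a={a}, next s={s}")
```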

We also have a **reward function**: a mapping from the Cartesian product of the state and action spaces into the real numbers, $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$. This is what allows us to define an objective for reinforcement learning.
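One common way to write that objective, assuming a policy $\pi$ that selects the actions (not yet introduced in this note), is the expected sum of rewards along a trajectory:

$$
\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t} r(s_t, a_t) \right]
$$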

### Partially Observed MDP
We can now introduce the idea of a partially observed MDP (POMDP), which adds two more objects: an **observation space** and an **observation (emission) probability** $p(o_t \mid s_t)$. The agent receives observations $o_t$ rather than seeing the underlying state $s_t$ directly.
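A minimal sketch of the emission side, assuming a hypothetical 3-state, 2-observation setup (the matrix entries are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
num_obs = 2

# Hypothetical emission matrix: E[o, s] = p(o_t = o | s_t = s).
# Each column is a distribution over observations for a given hidden state.
E = np.array([
    [0.8, 0.4, 0.1],
    [0.2, 0.6, 0.9],
])

def observe(state: int) -> int:
    """Sample an observation from p(o | s); the agent sees o, never the state itself."""
    return rng.choice(num_obs, p=E[:, state])

print(observe(0))  # 0 with probability 0.8, 1 with probability 0.2
```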

---
Date: 20211207
Links to: [Reinforcement Learning (old)](Reinforcement%20Learning%20(old).md)
Tags:
References: