# Markov Decision Process
We can start by thinking about a [Markov Chain](Markov-Chain.md), which consists of just two objects: a **state space** and a **transition operator** $T$, which encodes the probabilities $p(s_{t+1} \mid s_t)$. A Markov chain alone cannot specify a decision-making problem, however, because it has no notion of actions.
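The reason $T$ is called an operator is that it acts linearly on vectors of state probabilities. A minimal sketch of this, using a small hypothetical 3-state chain (the numbers are made up purely for illustration):

```python
import numpy as np

# Hypothetical 3-state Markov chain. T[i, j] = p(s_{t+1} = i | s_t = j),
# so each column of T is a probability distribution over next states.
T = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.3],
    [0.0, 0.1, 0.7],
])

# The transition operator acts linearly on state distributions:
# if mu_t is the vector of state probabilities at time t, then mu_{t+1} = T @ mu_t.
mu = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(5):
    mu = T @ mu
print(mu)                        # state distribution after 5 steps
```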
The inclusion of actions was a much later addition, introduced in the 1950s. The MDP adds a few more objects to the Markov chain: alongside the **state space** it now has an **action space**, so the graphical model contains both states and actions, and the transition probabilities are conditioned on both, $p(s_{t+1} \mid s_t, a_t)$. Note that $T$ is still an operator, but it is now a **tensor**, with one index each for the next state, the current state, and the action.
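A sketch of the tensor view, $T_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$, using a hypothetical random MDP (the shapes are the point here, not the numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
num_states, num_actions = 4, 2

# Hypothetical transition tensor: T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k).
# Filled randomly for illustration; probabilities over the next state must sum to 1.
T = rng.random((num_states, num_states, num_actions))
T /= T.sum(axis=0, keepdims=True)

def step(state: int, action: int) -> int:
    """Sample the next state from p(s' | s, a) encoded in the tensor."""
    return rng.choice(num_states, p=T[:, state, action])

s = 0
for t in range(3):
    a = rng.integers(num_actions)      # random action, since no policy is defined yet
    s = step(s, a)
    print(f"t={t}: a={a}, next s={s}")
```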

We also have a **reward function**: a mapping from the Cartesian product of the state and action spaces into the real numbers, $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$. This is what allows us to define an objective for reinforcement learning.
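One common way to write that objective, assuming a policy $\pi$ that selects the actions (not yet introduced in this note), is the expected sum of rewards along a trajectory:

$$
\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t} r(s_t, a_t) \right]
$$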

### Partially Observed MDP
We can now introduce the idea of a partially observed MDP (POMDP), which adds two more objects: an **observation space** and an **observation (emission) probability** $p(o_t \mid s_t)$. The agent receives observations $o_t$ rather than seeing the underlying state $s_t$ directly.
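A minimal sketch of the emission side, assuming a hypothetical 3-state, 2-observation setup (the matrix entries are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
num_obs = 2

# Hypothetical emission matrix: E[o, s] = p(o_t = o | s_t = s).
# Each column is a distribution over observations for a given hidden state.
E = np.array([
    [0.8, 0.4, 0.1],
    [0.2, 0.6, 0.9],
])

def observe(state: int) -> int:
    """Sample an observation from p(o | s); the agent sees o, never the state itself."""
    return rng.choice(num_obs, p=E[:, state])

print(observe(0))  # 0 with probability 0.8, 1 with probability 0.2
```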

---
Date: 20211207
Links to: [Reinforcement Learning (old)](Reinforcement%20Learning%20(old).md)
Tags:
References: