# Reinforcement Learning Algorithms

There are many different reinforcement learning algorithms, but they all share the same high-level anatomy. They consist of three basic parts:

1. **Generate samples** $\rightarrow$ RL is, in some sense, learning through trial and error. The "trial" part means actually running your policy in the environment: the policy interacts with the Markov decision process and collects samples. Samples in this case are *trajectories*, generally those produced by the current policy.
2. **Fit a model** $\rightarrow$ This simply means estimating *something* about the current policy: how is it performing, and what kind of rewards is it attaining?
3. **Improve the policy** $\rightarrow$ This generally consists of changing the policy so that the better trajectories we sampled become more probable.

![RL-algorithms-basic-1|400](Screen%20Shot%202021-12-07%20at%208.05.09%20AM.png)

### Simple Example

![](Screen%20Shot%202021-12-07%20at%208.10.31%20AM.png)

### RL by backprop

![](Screen%20Shot%202021-12-07%20at%208.12.12%20AM.png)

![](Screen%20Shot%202021-12-07%20at%208.15.47%20AM.png)

---
Date: 20211207
Links to: [Reinforcement Learning (old)](Reinforcement%20Learning%20(old).md)
Tags:
References:
* []()
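
The three-step loop above can be sketched as a minimal REINFORCE-style iteration. Everything here is illustrative: the "environment" is a toy two-armed bandit (not from the source), and the model-fitting step is just an average-return baseline.

```python
import math
import random

random.seed(0)

# Toy "MDP": a 2-armed bandit. Arm 0 pays 0.2, arm 1 pays 1.0.
ARM_REWARDS = [0.2, 1.0]

def policy_probs(theta):
    """Softmax policy over the two arms."""
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def run(iterations=200, batch=32, lr=0.5):
    theta = [0.0, 0.0]
    for _ in range(iterations):
        probs = policy_probs(theta)
        # 1. Generate samples: run the policy, collect (action, reward) pairs.
        samples = []
        for _ in range(batch):
            a = 0 if random.random() < probs[0] else 1
            samples.append((a, ARM_REWARDS[a]))
        # 2. Fit a model: here, just estimate the average return (a baseline).
        baseline = sum(r for _, r in samples) / batch
        # 3. Improve the policy: REINFORCE gradient step, so actions with
        #    better-than-average return get higher probability.
        #    grad log pi(a) w.r.t. theta_i = 1[i == a] - probs[i] for softmax.
        grad = [0.0, 0.0]
        for a, r in samples:
            for i in range(2):
                indicator = 1.0 if i == a else 0.0
                grad[i] += (r - baseline) * (indicator - probs[i]) / batch
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return policy_probs(theta)

probs = run()
print(probs)  # the better arm (index 1) should dominate
```

The same three-part structure holds for far more elaborate algorithms; only the cost of each box changes (e.g. "fit a model" might mean fitting a value function or a dynamics model instead of a scalar baseline).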