# Reinforcement Learning

Reinforcement learning is mode seeking, as opposed to distribution matching ([listen here](https://overcast.fm/+aYlPxjgs0/1:21:55)).

### Quick Intuition
* A **policy** is a probability distribution over actions (usually conditioned on the current state) that the agent uses to select an **action**.
* To guide the policy in the direction we want, we can use **reward shaping**. Note that reward shaping can be [very challenging](https://youtu.be/JgvyzIkgxF0?t=742).

### Good Resources
* Learning with sparse rewards: [Reinforcement Learning with sparse rewards - YouTube](https://www.youtube.com/watch?v=0Ey02HT_1Ho)

---
Date: 20231217
Links to:
Tags:
References:
* [Deep Reinforcement Learning Doesn't Work Yet](https://www.alexirpan.com/2018/02/14/rl-hard.html)
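
A rough sketch of the Quick Intuition points, to make them concrete: a policy stored as a softmax distribution over three actions, and a gradient-ascent loop that pushes probability toward the highest-reward action (the "mode seeking" behavior). Everything here is an assumption for illustration and not from the linked material: the function names, the three-armed bandit, the reward values, and the choice to use the exact expected-reward gradient instead of sampled REINFORCE updates so the result is deterministic.

```python
import math
import random


def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """The policy in use: draw one action index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def expected_policy_gradient(rewards, steps=2000, lr=0.5):
    """Gradient ascent on expected reward J = sum_i p_i * r_i.

    For a softmax policy, dJ/dlogit_i = p_i * (r_i - J), so actions
    with above-average reward gain probability and the rest lose it.
    """
    logits = [0.0] * len(rewards)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        for i, (p, r) in enumerate(zip(probs, rewards)):
            logits[i] += lr * p * (r - expected)
    return softmax(logits)


# Hypothetical rewards for three actions; action 0 pays the most.
policy = expected_policy_gradient([1.0, 0.5, 0.1])
action = sample_action(policy, random.Random(0))
```

The learned `policy` ends up concentrated on the best-paying action instead of spreading probability in proportion to the rewards. The reward numbers are the only lever steering it, which hints at why getting rewards right matters so much.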