# Reinforcement Learning: Compared to Supervised and Unsupervised Learning

A TLDR of RL compared to SL could be:

> Reinforcement Learning *evaluates actions taken* rather than *instructing* with the *correct actions*.

In supervised learning, we can tell the agent the *exact* correct action for every state. This will hopefully generalize to states not seen in the training set. In RL, we generally have a much *sparser* reward signal and do not know the correct action for every state; instead, we receive rewards based on a series of states and actions.

Another way to characterize this is:

* **Supervised Learning**: The agent has to learn how to act from labeled examples provided by an external supervisor and generalize them to other situations.
* **Unsupervised Learning**: Concerns finding structure hidden in collections of unlabeled data.
* **Reinforcement Learning**: An interactive problem in which the agent must learn behavior through trial-and-error interactions with a dynamic environment. It must deal with the exploration-exploitation trade-off.

A concrete example is easily seen if we compare image classification with dynamic pricing. In image classification, we imagine that a photo of a dog is always a dog; our agent predicting that it is a dog does not change that. In other words, the underlying distribution we are trying to learn is static: it does not change over time. Our agent predicting "dog" also doesn't impact the training data that it sees. It simply sees a snapshot of training data from a certain historical period of time.

Now consider the situation of dynamic pricing. Our agent provides a price, and a customer then chooses whether to purchase. From this signal our agent learns a pricing model. Notice that in this case our agent's actions directly influence the training data/signal that it has to learn from: if it always decides to price an item between 5 and 10 dollars, it will have no data about how customers would respond outside that range. And this brings us to the [exploration-exploitation](Exploration-Exploitation.md) trade-off: in RL problems, agents must balance exploring the environment against exploiting what they already know (see the sketch at the end of this note).

This is all to say that what fundamentally makes RL different from SL and USL is:

> The agent's actions can affect the future state of the environment, and their effects cannot be fully predicted; hence the environment must be monitored frequently by the agent, who should react to any situation.

---
Date: 20211129
Links to: [Reinforcement Learning (old)](Reinforcement%20Learning%20(old).md)
Tags:
References:
* [Fantastic Course from UC Berkeley](https://www.youtube.com/watch?v=HUzyjOsd2PA&list=PL_iWQOsE6TfURIIhCrlt-wj9ByIVpbfGc&index=5)
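
As referenced above, here is a minimal sketch in Python of the dynamic pricing scenario, assuming a hypothetical demand curve (`customer_buys`) that the agent cannot inspect directly; the candidate prices, demand model, and parameters are all made up for illustration. It shows both points at once: the data the agent learns from is generated by its own actions, and ε-greedy action selection is one simple way to balance exploration and exploitation.

```python
import random

# Hypothetical demand model: the agent never sees this function; it only
# observes whether a customer buys at the price it actually offers.
def customer_buys(price: float) -> bool:
    purchase_prob = max(0.0, 1.0 - price / 20.0)  # higher price, fewer sales
    return random.random() < purchase_prob

candidate_prices = [5.0, 7.5, 10.0, 12.5, 15.0]    # the agent's action set
revenue_estimates = [0.0] * len(candidate_prices)  # running mean revenue per price
counts = [0] * len(candidate_prices)
epsilon = 0.1  # fraction of steps spent exploring

for step in range(10_000):
    if random.random() < epsilon:
        # Explore: try a random price to gather data about the full range.
        i = random.randrange(len(candidate_prices))
    else:
        # Exploit: offer the price that currently looks most profitable.
        i = max(range(len(candidate_prices)), key=lambda j: revenue_estimates[j])

    # The training signal depends on the action taken: the agent learns
    # nothing about how customers respond to prices it never offers.
    revenue = candidate_prices[i] if customer_buys(candidate_prices[i]) else 0.0
    counts[i] += 1
    revenue_estimates[i] += (revenue - revenue_estimates[i]) / counts[i]

best = max(range(len(candidate_prices)), key=lambda j: revenue_estimates[j])
print(f"best price found: ${candidate_prices[best]:.2f}")
```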
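Setting `epsilon = 0` in the sketch makes the failure mode concrete: a purely greedy agent locks onto whichever price happened to look profitable early and never collects data about the rest of the range, which is exactly the 5-to-10-dollar trap described above.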