Research Base - Nate's Notes

# Reinforcement Learning - Research Base

### Queue (Technical)

1. RLHF
    1. [https://openai.com/research/instruction-following](https://openai.com/research/instruction-following)
    2. [https://arxiv.org/pdf/2203.02155.pdf](https://arxiv.org/pdf/2203.02155.pdf)
    3. [https://huyenchip.com/2023/05/02/rlhf.html](https://huyenchip.com/2023/05/02/rlhf.html)
    4. [https://www.youtube.com/watch?v=2MBJOuVq380](https://www.youtube.com/watch?v=2MBJOuVq380)
2. RL
    1. [DeepMind x UCL | Deep Learning Lecture Series 2021 - YouTube](https://www.youtube.com/playlist?list=PLqYmG7hTraZDVH599EItlEWsUOsJbAodm)
    2. [DeepMind x UCL | Reinforcement Learning Course 2018 - YouTube](https://www.youtube.com/playlist?list=PLqYmG7hTraZBKeNJ-JE_eyJHZ7XgBoAyb)
    3. [AlphaGo - Mastering the game of Go with deep neural networks and tree search | RL Paper Explained - YouTube](https://www.youtube.com/watch?v=Z1BELqFQZVM)
    4. [DeepMind's AlphaGo Zero and AlphaZero | RL paper explained - YouTube](https://www.youtube.com/watch?v=0slFo1rV0EM)
    5. [MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained - YouTube](https://www.youtube.com/watch?v=mH7f7N7s79s)
    6. [From AlphaGo to MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model - YouTube](https://www.youtube.com/watch?v=lVMgxtm5L-U&t=146s)
    7. [Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained) - YouTube](https://www.youtube.com/watch?v=tjbEVY5XIk0)
    8. [Deep Q Learning is Simple with PyTorch | Full Tutorial 2020 - YouTube](https://www.youtube.com/watch?v=wc-FxNENg9U)
4. RL Implementation
    1. [Deep Q Learning is Simple with PyTorch | Full Tutorial 2020 - YouTube](https://www.youtube.com/watch?v=wc-FxNENg9U)
    2. [Machine Learning with Phil - YouTube](https://www.youtube.com/@MachineLearningwithPhil)
    3. [Modern Reinforcement Learning: Deep Q Learning in PyTorch | Udemy](https://www.udemy.com/course/deep-q-learning-from-paper-to-code/)
    4. [Modern Reinforcement Learning: Actor-Critic Algorithms | Udemy](https://www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/)
    5. [Curiosity Driven Deep Reinforcement Learning | Udemy](https://www.udemy.com/course/curiosity-driven-deep-reinforcement-learning/)
    6. [MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained - YouTube](https://www.youtube.com/watch?v=mH7f7N7s79s)
5. Evolution
    1. [Tweet / Twitter](https://twitter.com/carperai/status/1678516615879745536) / [OpenELM Paper & 0.9 Release | CarperAI](https://carper.ai/openelm-paper-0-9-release/) / [OpenELM/OpenELM_Paper.pdf at paper · CarperAI/OpenELM · GitHub](https://github.com/CarperAI/OpenELM/blob/paper/OpenELM_Paper.pdf)
    2. [First Explore, then Exploit](https://arxiv.org/pdf/2307.02276.pdf)
6. MuZero actually *learns an environment model*
    1. [MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model - YouTube](https://www.youtube.com/watch?v=We20YSAJZSE)
    2. [MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | RL Paper explained - YouTube](https://www.youtube.com/watch?v=mH7f7N7s79s)