Deep Reinforcement Learning Explained

Contents of this series in Towards Data Science

This is a relaxed introductory series with a practical approach. It covers the basic concepts of Reinforcement Learning and Deep Learning needed to get started in the area of Deep Reinforcement Learning.

Part 1: Introduction to Deep Reinforcement Learning

01: A gentle introduction to Deep Reinforcement Learning, Learning the basics of Reinforcement Learning (15/05/2020)

02: Formalization of a Reinforcement Learning Problem, Agent-Environment interaction in an MDP (22/05/2020)

03: Deep Learning Basics, Basic concepts for Beginners (27/05/2020)

04: Deep Learning with PyTorch, First contact with PyTorch for Beginners (01/06/2020)

05: PyTorch Performance Analysis with TensorBoard, How to run TensorBoard for PyTorch inside Colab (03/06/2020)

06: Solving an RL Problem Using Cross-Entropy Method, Agent Creation Using Deep Neural Networks (04/06/2020)

07: Cross-Entropy Method Performance Analysis, Implementation of the Cross-Entropy Training Loop (08/06/2020)

Part 2: Classical Methods for RL

08: The Bellman Equation, V-function, and Q-function Explained (11/06/2020)

09: The Value Iteration Algorithm, Estimation of Transitions and Rewards from the Agent’s experience (13/06/2020)

10: Value Iteration for V-function, V-function in Practice for the Frozen-Lake Environment (14/06/2020)

11: Value Iteration for Q-function, Frozen-Lake code for Q-function (15/06/2020)

12: Reviewing Essential Concepts, Mathematical Notation Updated (12/07/2020)

13: Monte Carlo Methods, Exploration-Exploitation Dilemma (22/07/2020)

14: MC Control Methods and Temporal-Difference Methods, Constant-alpha MC Control, Sarsa, Q-Learning (26/07/2020)

Part 3: Deep Q-Networks and Policy Methods

15: Deep Q-Network – I: OpenAI Gym and Wrappers (16/08/2020)

16: Deep Q-Network – II: Experience Replay & Target Network (16/08/2020)

17: Deep Q-Network – III: Performance & Use (16/08/2020)

18: Policy-based Methods, Hill Climbing algorithm (07/09/2020)

19: Policy-Gradient Methods, REINFORCE algorithm (10/09/2020)

20: Reinforcement Learning Frameworks, Solving the CartPole Environment Using RLlib on the Ray Framework (27/09/2020)



How did this series start?

I started writing this series during the lockdown in Barcelona. Honestly, writing these posts in my spare time helped me to #StayAtHome. Thank you for reading this publication during those days; it justifies the effort I made.

Disclaimer: These posts were written during the lockdown in Barcelona as a personal distraction and as a way of disseminating scientific knowledge, in case they could be of help to someone. They are not intended to be an academic reference document in the DRL area. Readers who need a more rigorous treatment will find an extensive list of academic resources and books in the last post of the series. The author is aware that these posts may contain some errors and that the English text would need revision if they were meant as an academic document. Although the author would like to improve the content in both quantity and quality, his professional commitments do not leave him the free time to do so. However, he commits to correcting any errors that readers report as soon as he can.

Our research in DRL

Our research group at UPC Barcelona Tech and Barcelona Supercomputing Center conducts research on this topic. Our latest paper in this area is “Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills”, presented at the 37th International Conference on Machine Learning (ICML 2020). The paper presents a novel paradigm for unsupervised skill discovery in Reinforcement Learning. It is the latest contribution of @vcampos7, one of our Ph.D. students, co-advised with @DocXavi. The paper is co-authored with @alexrtrott, @CaimingXiong, and @RichardSocher from Salesforce Research.

About BSC and UPC

The Barcelona Supercomputing Center (BSC) is a public research center located in Barcelona. It hosts MareNostrum, a 13.7-petaflops supercomputer that also includes clusters based on emerging technologies. In June 2017, it ranked 13th in the world.

The Polytechnic University of Catalonia (Universitat Politècnica de Catalunya), currently referred to as BarcelonaTech, and commonly known as UPC, is the largest engineering university in Catalonia, Spain. It also offers programs in other disciplines such as mathematics and architecture.

About Towards Data Science

Towards Data Science provides a platform for exchanging knowledge through Medium. I am very grateful to Towards Data Science for publishing my contributions, which allowed me to become one of the top writers in Artificial Intelligence on Medium.