Deep Reinforcement Learning Explained

Content of this series

Below the reader will find the updated index of the posts published in this series (I will keep it updated as I write new posts). I would like to finish this series towards the end of the year, if I have the necessary time and energy 😉

Part 1: Essential concepts in Reinforcement Learning and Deep Learning

01: A gentle introduction to Deep Reinforcement Learning (15/05/2020)

02: Formalization of a Reinforcement Learning Problem (22/05/2020)

03: Deep Learning Basics (27/05/2020)

04: Deep Learning with PyTorch (01/06/2020)

05: PyTorch Performance Analysis with TensorBoard (03/06/2020)

06: Solving an RL Problem Using Cross-Entropy Method (04/06/2020)

07: Cross-Entropy Method Performance Analysis (08/06/2020)

08: The Bellman Equation (11/06/2020)

09: The Value Iteration Algorithm (13/06/2020)

10: Value Iteration for V-function (14/06/2020)

11: Value Iteration for Q-function (15/06/2020)

Part 2: Implementation of Reinforcement Learning classical methods

12: Reviewing Essential Concepts from Part 1 (12/07/2020)

13: Monte Carlo Methods & Exploration-Exploitation Dilemma (22/07/2020)

14: MC Control Methods and Temporal-Difference Methods (26/07/2020)

15: Deep Q-Network – I: OpenAI Gym and Wrappers (16/08/2020)

16: Deep Q-Network – II: Experience Replay & Target Network (16/08/2020)

17: Deep Q-Network – III: Performance & Use (16/08/2020)

18: Policy-based Methods (Hill Climbing algorithm) (07/09/2020)

19: Policy-Gradient Methods (REINFORCE algorithm) (10/09/2020)

More posts soon!




This is an introductory series with a practical approach. It tries to cover the essential concepts in Reinforcement Learning and Deep Learning that are needed to get started in Deep Reinforcement Learning.

How did this series start?

I started to write this series during the lockdown in Barcelona. Honestly, writing these posts in my spare time helped me to #StayAtHome. Thank you for reading this publication in those days; it justifies the effort I made. Since the series has attracted readers’ interest, I will try to continue it as I find free time.

Our research in DRL

Our research group at UPC Barcelona Tech and Barcelona Supercomputing Center is doing research on this topic. Our latest paper in this area is “Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills”, presented at the 37th International Conference on Machine Learning (ICML 2020). The paper presents a novel paradigm for unsupervised skill discovery in Reinforcement Learning. It is the latest contribution of @vcampos7, one of our Ph.D. students, co-advised with @DocXavi. The paper is co-authored with @alexrtrott, @CaimingXiong, and @RichardSocher from Salesforce Research.

About BSC and UPC

The Barcelona Supercomputing Center (BSC) is a public research center located in Barcelona. It hosts MareNostrum, a 13.7 Petaflops supercomputer that also includes clusters of emerging technologies. In June 2017, it ranked 13th in the world.

The Polytechnic University of Catalonia (Universitat Politècnica de Catalunya), currently referred to as BarcelonaTech, and commonly known as UPC, is the largest engineering university in Catalonia, Spain. It also offers programs in other disciplines such as mathematics and architecture.

About Towards Data Science

Towards Data Science provides a platform to exchange knowledge through Medium. I thank Towards Data Science very much for publishing my contributions, which allowed me to become one of the top writers in Artificial Intelligence on Medium.