Deep Reinforcement Learning

Workshops and notebooks about deep reinforcement learning (DRL).

Batch Constrained Deep-Q Learning on the CartPole Environment Using Coach

Phil Winder, Oct 2020

Batch-constrained deep Q-learning (BCQ) provides experience in a different way. Rather than feeding the raw observations to the buffer-trained agent, BCQ trains another neural network to generate prospective actions using a conditional variational auto-encoder. This is a type of auto-encoder that allows you to generate observations from specific classes. This has the effect of constraining the policy by only generating actions that lead to states in the buffer. It also includes the ability to tune the model to generate random actions by adding noise to the actions, if desired.

Rainbow on Atari Using Coach

Phil Winder, Oct 2020

Following on from the previous experiment on the Cartpole environment, coach comes with a handy collection of presets for more recent algorithms. Namely, Rainbow, which is a smorgasbord of improvements to DQN. These presets use the various Atari environments, which are de facto performance comparison for value-based methods. So much so that I worry that algorithms are beginning to overfit these environments. This small tutorial shows you how to run these presets and generate the results.