Deep Reinforcement Learning

Batch Constrained Deep-Q Learning on the CartPole Environment Using Coach

Phil Winder, Oct 2020

Batch-constrained deep Q-learning (BCQ) provides experience in a different way. Rather than feeding the raw observations to the buffer-trained agent, BCQ trains another neural network, a conditional variational auto-encoder, to generate prospective actions. This is a type of auto-encoder that allows you to generate samples from specific classes. This has the effect of constraining the policy, because it only generates actions that lead to states in the buffer. If desired, you can also tune the model to generate more random actions by adding noise to the generated actions.
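To make the constraint concrete, here is a minimal sketch of batch-constrained action selection. It is not the Coach implementation: the trained conditional variational auto-encoder is replaced by a hypothetical stand-in (`make_buffer_generator`) that simply samples actions already seen in the buffer, and `q_value`, `select_action`, and every parameter name are assumptions made for illustration.

```python
import random


def make_buffer_generator(buffer):
    """Hypothetical stand-in for the trained generative model.

    Instead of a conditional VAE, it samples an action that was
    actually taken in the given state somewhere in the buffer.
    """
    actions_by_state = {}
    for state, action in buffer:
        actions_by_state.setdefault(state, []).append(action)

    def generate(state, rng):
        return rng.choice(actions_by_state[state])

    return generate


def select_action(state, generate, q_value, n_candidates=10, noise=0.0, seed=0):
    """Pick the highest-valued action among generated candidates.

    Candidates come only from the generator, so the policy stays close
    to the data. `noise` adds a uniform perturbation to each candidate,
    which loosens the constraint when it is greater than zero.
    """
    rng = random.Random(seed)
    candidates = [
        generate(state, rng) + rng.uniform(-noise, noise)
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda action: q_value(state, action))


# Toy data: three one-dimensional actions recorded in state "s0".
buffer = [("s0", 0.1), ("s0", 0.2), ("s0", 0.3)]
generate = make_buffer_generator(buffer)


def q_value(state, action):
    # Assumed toy value function that prefers actions near 0.3.
    return -(action - 0.3) ** 2


constrained = select_action("s0", generate, q_value)
perturbed = select_action("s0", generate, q_value, noise=0.05)
```

With `noise=0.0` the selected action is always one that appears in the buffer. Raising `noise` lets the agent explore a small neighbourhood around the buffered actions.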

Rainbow on Atari Using Coach

Phil Winder, Oct 2020

Following on from the previous experiment on the CartPole environment, Coach comes with a handy collection of presets for more recent algorithms, namely Rainbow, which is a smorgasbord of improvements to DQN. These presets use the various Atari environments, which are the de facto performance benchmark for value-based methods, so much so that I worry that algorithms are beginning to overfit these environments. This small tutorial shows you how to run these presets and generate the results.

DQN and Q-Learning on the CartPole Environment Using Coach

Phil Winder, Oct 2020

The CartPole environment is a popular, simple environment with a continuous state space and a discrete action space. Nervana Systems' Coach provides a simple interface to experiment with a variety of algorithms and environments. In this workshop you will use Coach to train an agent to balance a pole.
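As a rough illustration of what Q-learning does with a continuous state space and a discrete action space, the sketch below buckets an observation into a discrete key and applies the standard tabular Q-learning update. It does not use Coach. The helper names (`discretize`, `q_learning_update`), the bucket width, and the hyperparameter values are all assumptions chosen for the example.

```python
from collections import defaultdict


def discretize(observation, width=0.5):
    """Map a continuous observation to a hashable tuple of bucket indices."""
    return tuple(round(value / width) for value in observation)


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions=2, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update and return the new estimate.

    The target is the reward plus the discounted value of the best
    action in the next state (zero if the episode has ended).
    """
    best_next = 0.0 if done else max(
        q[(next_state, a)] for a in range(n_actions)
    )
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-table with a default value of 0.0 for unseen state-action pairs.
q_table = defaultdict(float)

# Two made-up CartPole-like observations (position, velocity, angle, angular velocity).
state = discretize([0.02, -0.31, 0.04, 0.62])
next_state = discretize([0.01, -0.12, 0.05, 0.33])

# One step: action 1 (push right), reward 1 for keeping the pole up.
q_learning_update(q_table, state, 1, 1.0, next_state, done=False)
```

DQN replaces the table with a neural network that takes the raw observation as input, which removes the need for the bucketing step.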