Value Methods

Value methods, like Q-learning, are vitally important to reinforcement learning. Learn more about them in these in-depth practical notebooks.

Eligibility Traces

Phil Winder, Oct 2020

Eligibility traces implement n-step methods on a sliding scale. They smoothly vary how far the return is projected, from a single step to the full remaining episode. They are implemented with traces that remember where the agent has been in the past and scale the updates to those states accordingly. They are intuitive, especially in a discrete setting.
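
To make the idea concrete, here is a minimal sketch of an accumulating-trace SARSA(λ) update for a tabular problem. The state/action counts and hyperparameters are illustrative assumptions, not values from the notebook:

```python
import numpy as np

# Illustrative sizes and hyperparameters, not from the notebook.
n_states, n_actions = 48, 4
alpha, gamma, lam = 0.1, 1.0, 0.9

Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)  # eligibility trace: how recently each (s, a) was visited

def sarsa_lambda_step(s, a, r, s_next, a_next, done):
    """One accumulating-trace SARSA(lambda) update over all state-action pairs."""
    td_error = r + (0.0 if done else gamma * Q[s_next, a_next]) - Q[s, a]
    E[s, a] += 1.0                  # mark the visited pair as eligible
    Q[...] += alpha * td_error * E  # credit every eligible pair at once
    E[...] *= gamma * lam           # decay traces so old visits fade
    if done:
        E[...] = 0.0                # reset traces at episode boundaries
```

Setting `lam=0` recovers one-step SARSA, while `lam=1` approaches a Monte Carlo update.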

N-Step Methods

Phil Winder, Oct 2020

Another fundamental technique is the use of n-step returns, rather than the single-step returns of the basic Q-learning or SARSA implementations. Rather than looking only one step into the future to estimate the return, you can look several steps ahead. This is implemented in a backwards fashion: the agent travels first, then updates the states it has visited. But it works really well.
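
As a rough illustration of that backwards pass, here is how an n-step return can be folded up from a stored trajectory. The function name, reward list, and bootstrap value are assumptions for the example:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Fold rewards back-to-front: G_t = r_t + gamma * G_{t+1}."""
    G = bootstrap_value  # value estimate of the state after the last reward
    for r in reversed(rewards):
        G = r + gamma * G
    return G

# Example: a 3-step return over rewards [1, 0, 1], bootstrapped from V = 0.5
print(n_step_return([1.0, 0.0, 1.0], bootstrap_value=0.5))
```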

Delayed Q-learning vs. Double Q-learning vs. Q-Learning

Phil Winder, Oct 2020

Delayed Q-learning and double Q-learning are two extensions to Q-learning that are used throughout RL, so it’s worth considering them in a simple form. Delayed Q-learning simply delays any estimate update until there is a statistically significant sample of observations. Slowing updates with an exponentially weighted moving average is a similar strategy. Double Q-learning maintains two Q-tables, in essence two value estimates, to reduce the maximization bias. This notebook builds upon the Q-learning and SARSA notebooks, so I recommend you read them first.
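
For a flavour of the double estimator, here is a hedged sketch of a tabular double Q-learning update; the table sizes and hyperparameters are illustrative, not taken from the notebook:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 48, 4  # illustrative sizes
alpha, gamma = 0.1, 1.0
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next, done):
    """Randomly pick which table to update; the other supplies the value,
    decoupling action selection from action evaluation to reduce bias."""
    A, B = (Q1, Q2) if rng.random() < 0.5 else (Q2, Q1)
    best = np.argmax(A[s_next])                              # A picks the action...
    target = r + (0.0 if done else gamma * B[s_next, best])  # ...B evaluates it
    A[s, a] += alpha * (target - A[s, a])
```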

Q-Learning vs. SARSA

Phil Winder, Oct 2020

Two fundamental RL algorithms, both remarkably useful, even today. One of the primary reasons for their popularity is their simplicity: by default they work only with discrete state and action spaces. Of course it is possible to extend them to continuous state/action spaces, but consider discretizing to keep things ridiculously simple. In this workshop I’m going to reproduce the cliffworld example in the book. In the future I will extend and expand on this so you can develop your own algorithms and environments.
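
For reference, the two update rules differ only in their bootstrap target. Here is a minimal tabular sketch (sizes and hyperparameters are illustrative assumptions):

```python
import numpy as np

n_states, n_actions = 48, 4  # illustrative sizes
alpha, gamma = 0.1, 1.0
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next, done):
    """Off-policy: bootstrap from the greedy action in the next state."""
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next, done):
    """On-policy: bootstrap from the action the policy actually takes."""
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

On the cliffworld example, this single difference is why Q-learning learns to walk the risky edge of the cliff while SARSA prefers a safer path away from it.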