Learn about the use of statistics in reinforcement learning through a collection of practical notebooks.

Kullback-Leibler Divergence

Oct 2020

Kullback-Leibler divergence is described as a measure of the “surprise” of a distribution given an expected distribution. When the two distributions are the same, the KL-divergence is zero; when they are dramatically different, it is large. It can also be read as the number of extra bits required to describe a new distribution given another: if the distributions are the same, no extra bits are required to identify the new distribution.
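To make the "extra bits" reading concrete, here is a minimal sketch for discrete distributions, using only the Python standard library. The function name `kl_divergence` and the coin distributions `fair` and `biased` are illustrative assumptions, not taken from the notebook itself.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Return KL(p || q) for two discrete distributions given as probability lists.

    With base=2 the result is measured in bits.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a term with p_i == 0 contributes nothing, by convention
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none: divergence is infinite
        total += p_i * math.log(p_i / q_i, base)
    return total


# Hypothetical example: a fair coin versus a heavily biased coin.
fair = [0.5, 0.5]
biased = [0.9, 0.1]

same = kl_divergence(fair, fair)         # identical distributions
different = kl_divergence(biased, fair)  # biased coin measured against the fair one

print(f"KL(fair || fair)   = {same:.3f} bits")
print(f"KL(biased || fair) = {different:.3f} bits")
```

Note that the measure is not symmetric: `kl_divergence(biased, fair)` and `kl_divergence(fair, biased)` generally give different values, so the order of the arguments matters.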

Importance Sampling

Phil Winder, Oct 2020

Importance sampling provides a way to estimate the mean of a distribution when you know its probabilities but cannot sample from it. This is useful in RL because you often have a policy from which you can generate transition probabilities, but which you can’t actually sample. For example, if you had an unsafe situation that you couldn’t repeat, you could use importance sampling to calculate the expected value without repeating the unsafe act.
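The idea can be sketched in a few lines: draw samples from a second distribution that you can sample from, then reweight each sample by the ratio of target probability to sampling probability. Everything named below (`importance_sampling_mean`, `values`, `target_probs`, `proposal_probs`) is a hypothetical illustration rather than code from the notebook, and the sketch assumes the proposal assigns non-zero probability to every outcome the target can produce.

```python
import random


def importance_sampling_mean(values, target_probs, proposal_probs, n_samples, rng):
    """Estimate the mean of `values` under the target distribution,
    using samples drawn only from the proposal distribution."""
    outcomes = list(range(len(values)))
    # Sample outcome indices from the proposal; the target is never sampled.
    draws = rng.choices(outcomes, weights=proposal_probs, k=n_samples)
    # Reweight each sampled value by the ratio target / proposal.
    weighted = [values[i] * target_probs[i] / proposal_probs[i] for i in draws]
    return sum(weighted) / n_samples


rng = random.Random(0)  # fixed seed so the sketch is repeatable

values = [0.0, 1.0, 10.0]            # hypothetical payoff of each outcome
target_probs = [0.7, 0.2, 0.1]       # known probabilities we cannot sample from
proposal_probs = [1 / 3, 1 / 3, 1 / 3]  # distribution we can sample from

estimate = importance_sampling_mean(values, target_probs, proposal_probs, 100_000, rng)
exact = sum(v * p for v, p in zip(values, target_probs))

print(f"importance sampling estimate: {estimate:.3f}")
print(f"exact mean under the target:  {exact:.3f}")
```

In an RL setting the proposal plays the role of the policy that actually generated the data, and the target is the policy you want to evaluate. The estimate becomes noisy when the two distributions differ a lot, because a few samples then carry very large weights.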