An Autonomous Remote Control Vehicle With Reinforcement Learning

A practical workshop demonstrating how to train a remote control vehicle with reinforcement learning.

Reinforcement learning is designed to solve tasks that require complex sequential decision making. Learning to control and drive an autonomous vehicle is one such problem. In this workshop I present a somewhat simplified version of the problem, using a simulation of a vehicle. You can use this simulation to train an agent to drive a car.

The coolest part of this experiment is the use of a variational auto-encoder (VAE) to build a model of the world from experimental data. To do that, you race the car manually to gather a large set of observations, then train the VAE on that data. You then take the activations of the hidden neurons in the middle of the VAE and use them as the state. This dramatically speeds up policy training, because the policy doesn’t need to learn a visual representation from scratch. This comes at the expense of training the VAE, of course, but at least you only have to do that once.
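The idea can be sketched in a few lines. This is a minimal illustration, not the project's actual encoder: the weights here are random stand-ins for a trained VAE encoder, and the image shape and latent size (32) are assumptions, chosen only to show how a camera frame collapses into a compact state vector.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 32                      # assumed latent size
IMAGE_SHAPE = (80, 160, 3)           # assumed cropped camera frame
# Random stand-in for trained encoder weights mapping pixels -> [mu | log_var].
W = rng.normal(0, 0.01, size=(int(np.prod(IMAGE_SHAPE)), 2 * LATENT_DIM))

def encode(image: np.ndarray) -> np.ndarray:
    """Map a camera frame to the VAE latent mean, used as the RL state."""
    flat = image.reshape(-1) / 255.0                 # normalise pixels
    stats = flat @ W                                 # -> [mu | log_var]
    mu = stats[:LATENT_DIM]
    # At policy time we use the mean rather than a sample, for a stable state.
    return mu

frame = rng.integers(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)
state = encode(frame)
print(state.shape)   # (32,)
```

The policy then observes a 32-dimensional vector instead of tens of thousands of pixels, which is what makes the fast training times below possible.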

Resulting policies are shown below for two of the environments.

(donkey-generated-roads-v0 and donkey-generated-track-v0)

Professional Help

If you’re doing something like this in your business, then please reach out. Our experts can provide no-nonsense help that I guarantee will save you time.


This is another quite complicated project that involves multiple codebases and custom installs. The premise is based upon the DonkeyCar, a simple autonomous vehicle with an RGB camera for observation. There is also an accompanying simulator built in the Unity framework.

To install the prerequisites:

  1. Download the Unity DonkeyCar simulation for your operating system from: gym-donkeycar
  2. Git clone this repository: git clone
  3. Create a Python 3.7 virtual environment and python install

I used stable-baselines 2 for the SAC implementation, which unfortunately means you have to use TensorFlow 1.5, which in turn means you have to use Python 3.7. I use poetry to install the dependencies.


With the provided hyperparameters the DonkeyCar should learn a reasonable policy within 5,000 environment steps (approximately 10 minutes on my MacBook). If you leave it for longer it should be able to solve the basic tracks.

  1. Start the donkey_sim binary and leave it at the main page.
  2. Choose a saved VAE model to match the track you want to train upon.
  3. python

See the --help for configuration options.

This process will continue until you press CTRL+C, at which point it will dump the model to a file. Training takes anywhere from five minutes to one hour on my 2014 MacBook Pro, depending on how good a policy you want.
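The overall shape of that process can be sketched as below. This is a toy skeleton, not the project's SAC trainer: the environment, the "policy", and its update rule are placeholders invented for illustration; only the structure (step loop, CTRL+C interrupt, pickle dump of the model) mirrors the behaviour described above.

```python
import pickle
import numpy as np

class DummyEnv:
    """Stand-in for the DonkeyCar gym environment (purely illustrative)."""
    def reset(self):
        return np.zeros(32)                       # VAE latent state
    def step(self, action):
        obs = np.random.randn(32)
        reward = 1.0 - abs(float(action))         # toy reward: small actions
        done = np.random.rand() < 0.01
        return obs, reward, done, {}

def train(env, total_steps=5_000, model_path="model.pkl"):
    policy = {"mean_action": 0.0}                 # placeholder "model"
    obs, steps = env.reset(), 0
    try:
        while steps < total_steps:
            action = policy["mean_action"] + np.random.randn() * 0.1
            obs, reward, done, _ = env.step(action)
            policy["mean_action"] += 0.01 * reward * action   # toy update
            if done:
                obs = env.reset()
            steps += 1
    except KeyboardInterrupt:
        pass                                      # CTRL+C: fall through to save
    with open(model_path, "wb") as f:             # dump the model to a file
        pickle.dump(policy, f)
    return steps

print(train(DummyEnv(), total_steps=200))
```

In the real run, the loop body is SAC acting on the VAE latent state, but the interrupt-then-save flow is the same.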


  • New Hyper-Parameters?: Alter the hyperparameters and run again.
  • New Tracks?: Pass a different VAE model (each environment needs a new VAE) and environment ID to the train function.

Variational Auto-Encoder

This project uses a VAE as the state representation mechanism. To train a new VAE, you first need to generate some training data by following these steps:

  1. Launch the donkey_sim application.
  2. Click on the Log dir button and set the log dir. (This is PATH_TO_IMAGES_DIR below)
  3. Launch a track.
  4. Click on the [METHOD] w Rec option.
  5. Watch the log count at the bottom left. You want about 10,000 observations with as much diversity as possible. Bonus marks if you get your kids to perform this step! :-D
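A quick way to check you have gathered enough data is to count the logged frames on disk. A small sketch, assuming the simulator writes one image file per observation into the log dir; the `.jpg` extension is an assumption, so check what your simulator actually writes.

```python
from pathlib import Path

def count_observations(log_dir: str, pattern: str = "*.jpg") -> int:
    """Count logged camera frames (the .jpg extension is an assumption)."""
    return sum(1 for _ in Path(log_dir).glob(pattern))

# Example: warn if we haven't gathered enough data yet.
n = count_observations(".", "*.jpg")
if n < 10_000:
    print(f"only {n} frames logged; keep driving!")
```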

Now that you have the training data, you can train the VAE. Run the (minimal) command:

  1. python --training_data_dir=${PATH_TO_IMAGES_DIR}

This will continue until you press CTRL+C, at which point the model will be saved to vae.pkl. Tensorboard logs will be saved in the log directory alongside example reconstructed images created during training. For more configuration, see the help.

This will need about 10,000 training steps to produce a half-decent representation, which takes about an hour on my 2014 MacBook Pro.
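What those training steps optimise is the standard VAE objective: reconstruction error plus a KL penalty pulling the latent posterior towards a standard-normal prior. A minimal NumPy version of the per-batch loss, assuming a Gaussian decoder (MSE reconstruction) and a diagonal-Gaussian posterior; the array shapes are illustrative, not the project's.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var, kl_weight=1.0):
    """Standard VAE objective: reconstruction error plus KL divergence
    between q(z|x) = N(mu, exp(log_var)) and the N(0, I) prior."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL for diagonal Gaussians against the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))
    return recon + kl_weight * kl

rng = np.random.default_rng(0)
x = rng.random((8, 100))            # a batch of 8 flattened "images"
mu = np.zeros((8, 32))
log_var = np.zeros((8, 32))
# A perfect reconstruction with a prior-matching posterior scores zero.
print(vae_loss(x, x.copy(), mu, log_var))   # 0.0
```

The example reconstructed images saved during training are simply the `x_recon` term made visible, which is why they are a good sanity check on progress.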


View performance metrics by running:

tensorboard --logdir ./${LOG_DIR}

And browse to http://localhost:6006

Visualising the VAE Reconstruction

During the VAE training you can see reconstructed images. But I also created a function to generate a video of what the agent sees during training.

python --environment_id=donkey-generated-track-v0 --vae_path=pretrained-models/vae/vae-donkey-generated-track-v0-32.pkl --model_path=pretrained-models/policy/ --monitoring_dir=monitoring
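The monitoring video pairs what the camera saw with what the VAE reconstructed. A sketch of the frame composition step, assuming the video simply places the two images side by side (the layout and frame shape here are assumptions, not the project's exact output):

```python
import numpy as np

def side_by_side(original: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Stack the raw camera frame next to the VAE reconstruction,
    producing one frame of the monitoring video (layout assumed)."""
    assert original.shape == reconstruction.shape
    return np.concatenate([original, reconstruction], axis=1)

frame = np.zeros((80, 160, 3), dtype=np.uint8)          # assumed frame size
recon = np.full((80, 160, 3), 127, dtype=np.uint8)
combined = side_by_side(frame, recon)
print(combined.shape)   # (80, 320, 3)
```

A stream of such combined frames, written out with any video library, gives a direct view of how faithfully the latent state captures the track.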


This was heavily inspired by several people. I was looking for an example that provided VAE-based state representation, like in David Foster’s excellent book Generative Deep Learning. Then I came across Antonin Raffin’s repo, which pretty much nailed a simple VAE example. That was the starting point for all this code, plus code from Antonin’s other project, stable-baselines. Thank you Antonin.

But this wouldn’t be nearly as cool if Tawn Kramer hadn’t created the DonkeyCar Unity simulation, so thank you to him too.