Frequently Asked Questions

This page presents a collection of excellent questions I have been asked. If you have a question about RL, the answer's probably here.

RL Usage

Is RL Better than ML?

“Why is reinforcement learning better than other forms of machine learning?”

“Better” depends on your application. Only testing, experimentation and evidence will show whether it’s better. But in general, any application that involves multi-step decision making could be improved by RL. ML typically makes one-shot decisions, which are unlikely to be optimal in the long run.

See page 5 in the book.

Examples of Using RL in Production Use Cases

“Do you know of anyone that is using RL in production?”

So, there’s a variety of things that I’ve heard. Some public, some not. Let me try and recall some:

  • Covariant AI demoed a super cool RL-driven pick and place robot.
  • I’ve spoken to engineers that have used RL to improve their recommendations.
  • I’ve spoken to leaders that have deployed RL as part of a continuous-learning strategy for their ML models.
  • I spoke to another leader that managed to reduce the size of the ML team running their core recommendations algorithm by using RL.

And there’s loads of use cases reported in the papers. But of course, whether you call that production or not depends on what they are doing. Many are pure research. But lots are for research on current production systems. For example, this one from the YouTube team

More on this in the book.

Of course if you know anyone that wants to develop production RL algorithms, let me know. :wink:

Replacing Teams of Data Scientists with RL

“Can you talk more about the story where RL replaced a team of Data Scientists?”

This is not another clickbait “we all won’t have jobs next year” story. :smile: No, I can’t, I’m afraid. It’s not public knowledge.

To summarise, consider:

a) a team of 10+ highly educated, very expensive smart people tweaking neural network architectures and running massive expensive experiments (for example). This is what large tech companies do to improve heavily used, data-intensive systems.

b) an RL algorithm, with a decent reward function, that trains itself over the long term to optimise the actual business metric that the business is keen on improving. RL can easily match, and with effort surpass, the performance of that team quite quickly.

To be clear, the engineering challenge doesn’t go away, it shifts. Now these people are curators. Guardians of the RL algorithm that is actually doing the number crunching. There’s still a lot of engineering work that goes into building a system like that, but it’s not pure data science any more.

I’m being intentionally vague and speculative here, but you can see it happening.

How is Deploying RL Different to Deploying ML?

“How are they different from “classical” ML and DL models? What are the typical tools for training and deploying?”

First, bear in mind that there isn’t much industrial experience of running RL in production, yet. It’s not like ML, where there are now years’ worth of experience to leverage. But I can speculate.

One of the key issues with RL is state. By definition the MDP loop is constantly evolving. New observations, new models, new actions. In particular, if you’re running an algorithm which is actively learning (most, but not all implementations), then the underlying state of the model (the trained parameters) is changing ALL the time.

One of the hallmarks of “modern” software is immutability and freedom from side effects. By definition, an actively learning RL algorithm is mutable and most definitely has side effects!

So over the next few years I predict that there is going to be industrial research (i.e. new frameworks/blog posts/presentations/etc.) into how to run mutable RL algorithms in a robust way. I imagine that under the hood there will be a strategy to do some kind of checkpointing to make it pseudo-immutable.
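As a rough illustration of that checkpointing idea (a sketch under stated assumptions, not an established pattern or any framework's API): the learner below keeps mutating its live parameters, while anything that serves actions only ever reads a frozen, versioned snapshot. The class `CheckpointedLearner` and its methods are hypothetical names invented for this example, and the update rule is a simple bandit-style running average.

```python
import copy
from types import MappingProxyType


class CheckpointedLearner:
    """Toy learner: live values mutate, serving reads frozen snapshots."""

    def __init__(self, n_actions, learning_rate=0.1):
        self.learning_rate = learning_rate
        # Mutable parameters that learning changes all the time.
        self._live = {action: 0.0 for action in range(n_actions)}
        # Append-only history of read-only snapshots.
        self._snapshots = []
        self.checkpoint()

    def update(self, action, reward):
        # Move the live estimate for this action towards the observed reward.
        self._live[action] += self.learning_rate * (reward - self._live[action])

    def checkpoint(self):
        # Freeze a read-only copy of the live parameters; return its version.
        frozen = MappingProxyType(copy.deepcopy(self._live))
        self._snapshots.append(frozen)
        return len(self._snapshots) - 1

    def act(self, version=-1):
        # Serve greedily from a snapshot, never from the live parameters.
        snapshot = self._snapshots[version]
        return max(snapshot, key=snapshot.get)

    def rollback(self, version):
        # Reset the live parameters to an earlier snapshot.
        self._live = dict(self._snapshots[version])
```

Serving from `act()` keeps behaviour reproducible between checkpoints, and `rollback` gives a way back if a newer snapshot misbehaves.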

On the training side, there’s loads. I can’t keep up. I did a review a long time ago and I’ve been meaning to update it. Take your pick.

On the deployment side, less so. Many of the frameworks above have some kind of serving mode, but I get the impression that most people have to roll their own serving infrastructure and tooling.

Simplifying RL Problems

“You noted that many industrial applications could be solved with something as simple as tabular Q-learning. I was wondering if you could elaborate on that with some examples?”

If you’re talking about “many problems can be solved with simple algorithms”, then yes, there are many problems with low hanging fruit, that can be solved with simple algorithms. This comes down to a trade off between business value and technical difficulty. If it’s valuable, and easy, then that’s the problem you should solve first. If it’s less valuable, but still very easy, then that might still be prioritised higher because it’s easy to solve.

Relating “easy” to RL, what I mean is that the state and action spaces are simple and the reward is obvious. Most of the time you can simplify the problem too. Bite off a smaller chunk of the problem. For example, imagine you were Amazon and you were trying to create RL algorithms to restock your warehouse. Yes, you could try and view the warehouse as the environment and the products as part of the state, but that would be a massively complex problem. Instead, why not say that a box is an environment, and the individual items in that box are the state? That’s much easier to solve.

See what I mean? We’d have to get into domains to be more specific than that.

RL for Auto-ML

“I am curious as to why RL techniques are not widely used as a means to improve on supervised learning problems”

Why? I guess it’s some complex combination of attention, media, ease of use, advice, reading and something with the word OpenAI or Google in the name. :smile:

I mean, it’s there, it’s possible. Maybe it’s just waiting for someone to wrap it or market it better than the last person? Hint hint, nudge nudge. If you have a spare 6 months on your hands. :slightly_smiling_face:

To be fair there are things out there already. For example I’ve used Optuna for hyperparameter optimisation, which has an RL solution in there. But they’re not selling the fact that it’s using RL. They’re selling the fact that it automatically does hyperparameter tuning for you.

Same with Kubeflow’s Katib. That has an RL mode too.

That’s the thing about engineering in general. People don’t care how the sausage is made. It’s the product that counts. And it’s why UI engineers take all the glory!

Simulations of Business Use Cases

“What would be an example of an environment with which one can experiment at home? I have neither a robotic hand at home nor trading partners willing to start bidding wars. The card/text/video games are covered in much detail in the books. It will be more interesting to play with something resembling a commercial use case.”

The world really is your oyster here. You can create your own in a domain that you want more experience in (that’s a great way to gain experience). Or you can search through the thousands of gyms other people have created.
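To give a flavour of what creating your own can look like, here is a minimal sketch of a toy stock-room environment. It loosely mirrors the `reset`/`step` convention of Gym-style environments but imports no RL library. The class name `RestockEnv`, the prices, costs and demand model are all made-up assumptions for illustration, not taken from any real business.

```python
import random


class RestockEnv:
    """Toy single-product stock room with a Gym-like reset/step interface."""

    def __init__(self, capacity=10, price=2.0, unit_cost=1.0,
                 holding_cost=0.1, max_demand=4, horizon=30, seed=None):
        self.capacity = capacity
        self.price = price
        self.unit_cost = unit_cost
        self.holding_cost = holding_cost
        self.max_demand = max_demand
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.stock = 0
        self.t = 0

    def reset(self):
        # Start each episode with an empty stock room.
        self.stock = 0
        self.t = 0
        return self.stock

    def step(self, order):
        # Action: units to order, clipped to the free shelf space.
        order = max(0, min(order, self.capacity - self.stock))
        self.stock += order
        # Random customer demand for this time step.
        demand = self.rng.randint(0, self.max_demand)
        sold = min(demand, self.stock)
        self.stock -= sold
        # Reward: revenue minus purchase cost minus cost of unsold stock.
        reward = (self.price * sold
                  - self.unit_cost * order
                  - self.holding_cost * self.stock)
        self.t += 1
        done = self.t >= self.horizon
        return self.stock, reward, done, {"demand": demand, "sold": sold}


def run_episode(env, policy):
    """Roll out one episode; `policy` maps the stock level to an order size."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, reward, done, _ = env.step(policy(state))
        total += reward
    return total
```

A fixed rule such as `run_episode(RestockEnv(seed=1), lambda stock: 2)` gives a baseline score that any learned policy should beat.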

For example:

RL In Industry

Which Use Cases Currently Solved by ML are Better Solved by RL?

“What are the most common use cases in industry where problems are framed as supervised learning (or ranking) problems, but you would reframe them as RL problems?”

Really great question and one that deserves a much more comprehensive and evidence-based answer.

But, if I had to try and fit it in a chat window….

I’d summarise the dilemma by reminding you of the Markov Decision Process (MDP - page 35 of the book).

If you have an environment that has state that can be mutated, if it can be observed, if you can alter the state through your agent’s actions, and if you have a business problem where it pays to move the environment into a certain state, then by definition you have an RL problem.

To the first part of your question, common use cases masquerading as supervised ML… Any recommendations task. I think that’s broad enough for you! I would suggest that the vast majority of cases where people use recommendations are optimising for the wrong thing. The goal is to help the user find things as easily as possible so that they value the functionality and keep coming back/buying unnecessary plastic stuff. A standard solution (I’m grossly simplifying here) would build a model, in a supervised manner, that maps user intent to products, quantified by click through rate or something.

That’s entirely the wrong metric. You could use RL and train over full customer lifecycles. You could train on raw profit. Or the amount of time individual users spend on the site. Or whatever is most applicable for your problem. So the action is the recommendation (lots of research available on this). The environment is the user and possibly the business/products. The observation is the product catalogue, user demographics, past history, information, the weather, etc. The reward is customer lifetime value or whatever.
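To make the contrast concrete, here is a small sketch of the quantity an RL agent optimises, the discounted return, set against a single-step metric such as an immediate click. The helper `discounted_return` is a hypothetical name, and the two reward sequences are made-up toy numbers, not measurements from any real system.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by how far in the future it arrives."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Made-up toy trajectories: reward is 1.0 on each visit that ends in a purchase.
clickbait = [1.0, 0.0, 0.0, 0.0]  # immediate win, then the user drifts away
helpful = [0.0, 1.0, 1.0, 1.0]    # nothing at first, but the user keeps returning

print("clickbait return:", discounted_return(clickbait))
print("helpful return:  ", discounted_return(helpful))
```

A model scored only on the first step prefers the first trajectory. Scored on the whole sequence, the second one wins, which is the argument for training over full customer lifecycles.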

Look up any of the RL recommendations papers for an academic argument as to why RL is better suited.

Is RL Mandatory in Certain Fields?

“Is reinforcement learning considered a crucial approach in robotics (or do you have an opinion on its use for this)?”

Crucial. Hmmm. Depends on how you define the word. I wouldn’t say it’s CRUCIAL, in capital letters, no. You can create perfectly adequate solutions using simple stuff like PID controllers and inverse kinematics.

The threshold is complexity. Once you need to do something remotely complex, like more complex than just “move to coordinates x,y”, or as soon as it involves a non-trivial number of interacting components, then yes, RL is probably necessary. But I think that’s missing the point slightly. The great thing about RL is the interface. The MDP. It’s a way of defining problems, not solutions. And it can be applied to any project, simple or complex. If the interface is the same then you can use the same processes, the same techniques to solve a wide variety of problems. It scales from simple to mind-bendingly complex. Very few ML techniques can say the same.

For example, if you worked for a robotics company and you sold a bomb-disposal robot and a floor-cleaning robot, you’d have to develop completely different architectures, systems, code, solutions, etc. But if you’re using RL, it’s the same. Define the environment, define what you’re trying to do, try lots of actions and learn which ones maximise the reward.

Who Should be Pushing RL Adoption?

“Given that this is such a technical domain, who should be pushing for RL adoption?”

Like most things in life, I suspect there’s no easy or right answer. I’m no expert in management, but I think POs or PMs should be steering product development, but decisions should be agreed/discussed as a team. Ideas, solutions, metrics, everything, have to be defined by “the team” because no one person can know everything and get everything right.

I have the same argument with people that have the word “architect” in their title. :wink:

Industries Affected by RL

“Which industries do you see being most affected by advancements in reinforcement learning?”

For reference, see pages 5-7 of the book.

“Industry” is a tricky word because it is broad and outdated. It’s similar to asking what industry could make use of software. Of course, all of them could.

There are opportunities everywhere.

With that said, it’s a valid question. So far, robotics seems to be the number 1 use case. Simply because it’s hard to derive control programs for complex tasks. It’s easier to learn them.

Pricing/bidding/recommendations/advertising/etc. are largely similar tasks and have also had a lot of press.

The finance industry are going to be big users. I’ve spoken to people already that are using it.

Healthcare and specifically personalised medicine is a perfect match, although the regulatory requirements are likely to prevent this from taking off.

The Tech industry can leverage it to much greater extents for automation. E.g. ML, auto-ML, neural architecture search, etc. Lots of mundane automation like Alexa, email control, etc.

And lots more… :smile:

RL in Healthcare

“What you said about healthcare is interesting, why would regulatory requirements prevent reinforcement learning from improving things there?”

Healthcare == people’s lives. So there are lots of rules and regulations to prevent accidents. This means there’s a very high barrier to entry. (I’m talking from a UK/EU perspective by the way :wink: - there may be fewer regulations in, say, the US.)

RL Tips and Tricks


“What are your top tips for debugging RL algos?”

Check out chapter 11 for more detail on this.

Here’s some random thoughts off the top of my head:

  1. Visualise what is going on (like any data-related task)
  2. If you are given the environment, start with the simplest algorithm and work up (e.g. random/CEM).
  3. If you have control over the environment/simulation, make that as simple as possible and solve that first. Then make the environment/simulation more complex.
  4. Split the tech. If you’re working with deep models, attempt to decouple the training of the deep NN from the RL. Not always optimal, but makes development much easier. For example, use autoencoders, train the autoencoder first and verify it works. Then pass the much lower-dimensional state into the RL algo. It will train much faster (possibly less optimally) and it will be easier to figure out issues.
  5. Split the problem. Try and halve the problem. Halve it again. Solve each quarter independently.
  6. Consider hierarchical policies (similar to 5). If you can manually design the hierarchy, even better for understanding/explainability. But you can automate that process too.
  7. Good old debugging techniques. Print statements are your friend.
  8. Assert expected array sizes
  9. Don’t overcomplicate the reward function.
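Tips 2 and 8 can be sketched together: run a uniformly random policy first to get a score to beat, and assert the expected sizes on every transition while doing so. This is a minimal illustration only. The function names and the tiny walk environment (`toy_reset`, `toy_step`) are hypothetical, and plain lists stand in for arrays so the example needs no extra libraries.

```python
import random
import statistics


def check_transition(state, action, reward, next_state, *, state_dim, n_actions):
    """Fail fast if a transition does not have the expected sizes and types."""
    assert len(state) == state_dim, f"state has {len(state)} dims"
    assert len(next_state) == state_dim, f"next_state has {len(next_state)} dims"
    assert 0 <= action < n_actions, f"action {action} out of range"
    assert isinstance(reward, (int, float)), f"reward is {type(reward).__name__}"


def random_baseline(reset, step, n_actions, state_dim, episodes=20, seed=0):
    """Average return of a uniformly random policy: the score to beat."""
    rng = random.Random(seed)
    returns = []
    for _ in range(episodes):
        state, done, total = reset(), False, 0.0
        while not done:
            action = rng.randrange(n_actions)
            next_state, reward, done = step(state, action)
            check_transition(state, action, reward, next_state,
                             state_dim=state_dim, n_actions=n_actions)
            total += reward
            state = next_state
        returns.append(total)
    return statistics.mean(returns)


def toy_reset():
    return [0, 0]  # [position, time step]


def toy_step(state, action):
    # Action 1 moves right, anything else moves left; position 3 is the goal.
    position, t = state
    position = max(0, position + (1 if action == 1 else -1))
    t += 1
    reached_goal = position == 3
    done = reached_goal or t >= 20  # time limit guarantees termination
    return [position, t], (1.0 if reached_goal else 0.0), done


print("random baseline:",
      random_baseline(toy_reset, toy_step, n_actions=2, state_dim=2))
```

If a learning algorithm cannot beat this number on the simple version of the problem, that points to a bug rather than a need for a bigger model.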

And more and more…

RL Algorithms

What is Tabular Q-Learning?

And if your question is actually “what is tabular Q-Learning”, then Q-learning is a simple RL algorithm and tabular means “use look-up tables to store the Q-values”.
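A minimal sketch of what that looks like in practice, assuming a made-up five-state corridor where the agent starts at the left end and is rewarded for reaching the right end. The environment, the function names and the hyperparameter values are illustrative assumptions. The update line is the standard Q-learning rule.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor; actions are 0=left, 1=right."""
    rng = random.Random(seed)
    # The look-up table: q[state][action] holds the current Q-value estimate.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            if action == 1:
                next_state = min(goal, state + 1)
            else:
                next_state = max(0, state - 1)
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Q-learning update: move towards reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Read the learned policy straight out of the table."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


q_table = q_learning()
print(greedy_policy(q_table)[:-1])  # learned action for each non-goal state
```

The whole "model" is the table `q`, which is why this only works when the states and actions are few enough to enumerate.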

“Apart from multi-armed bandits, what are the other RL techniques that are getting wide adoption in the industry?”

Good question, but it’s hard to obtain any real numbers on this. From my research/reading, most people tend to follow the media. If a particular algorithm gets media attention then it becomes popular in the frameworks, which then leads to adoption.

In general though, the tried and tested, simple models tend to remain the most popular. These range from basic Q-learning-based algorithms to policy gradient-based algorithms like SAC.

There’s no one-size-fits-all “best algo” though. As in ML, the “no free lunch” theorem applies. So you have to evaluate and experiment for your particular application.

Application of RL

Motion Capture and RL

“Could it be possible to improve performance of RL agent doing humanoid motions by virtual demonstrations of a person wearing a mocap suit?”

:100: yes. This is a perfect example where Behavior cloning/Imitation RL (see chapter 8 in the book) will be useful. In fact, this reminds me of a paper that I read a while ago… Here:

GIF, for example: 1) motion capture, 2) without IRL, 3) with IRL.
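In its simplest form, behaviour cloning is just supervised learning on the expert's (state, action) pairs. The sketch below is a toy version with discrete, hand-labelled states. Real mocap data would be continuous joint positions and would need a function approximator instead of a table. The function `clone_behaviour` and the state and action labels are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict


def clone_behaviour(demonstrations):
    """Fit a policy by majority vote over the expert's (state, action) pairs."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each demonstrated state, copy the expert's most frequent action.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Made-up demonstration data standing in for labelled mocap frames.
demos = [
    ("standing", "step_left"),
    ("standing", "step_left"),
    ("standing", "step_right"),
    ("left_forward", "step_right"),
]
policy = clone_behaviour(demos)
print(policy)
```

A cloned policy like this can then serve as the starting point for RL, rather than having the agent explore from scratch.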