Simplifying RL Problems and Solutions

Simplifying RL Problems

You noted that many industrial applications could be solved with something as simple as tabular Q-learning. I was wondering if you could elaborate on that with some examples?

If you’re asking whether many problems can be solved with simple algorithms, then yes: there is plenty of low-hanging fruit. It comes down to a trade-off between business value and technical difficulty. If a problem is valuable and easy, that’s the one you should solve first. If it’s less valuable but still very easy, it might still be prioritised highly because it’s so cheap to solve.

Relating “easy” to RL, what I mean is that the state and action spaces are small and the reward is obvious. Most of the time you can simplify the problem, too: bite off a smaller chunk of it. For example, imagine you were Amazon and you were trying to use RL to restock your warehouse. Yes, you could view the whole warehouse as the environment and every product as part of the state, but that would be a massively complex problem. Instead, why not treat a single box as the environment and the individual items in that box as the state? That’s a much easier problem to solve.
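To make that contrast concrete, here is a minimal sketch of how small the problem becomes once you scope it down to a single box. The box capacity, restock sizes, and item counts are all hypothetical numbers, not anything from the discussion above:

```python
# Hypothetical: one warehouse box that holds up to 5 of a single item.
# Scoped this way, the whole problem fits in a handful of states.
MAX_ITEMS = 5

# State: how many items are currently in the box (0..5) -> 6 states.
states = list(range(MAX_ITEMS + 1))

# Actions: restock 0, 1, or 2 items (hypothetical restock sizes).
actions = [0, 1, 2]

print(f"{len(states)} states x {len(actions)} actions = "
      f"{len(states) * len(actions)} state-action pairs")
# -> 6 states x 3 actions = 18 state-action pairs
```

Eighteen state-action pairs fit in a trivially small look-up table, whereas a whole-warehouse state space explodes combinatorially with the number of boxes and products.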

See what I mean? We’d have to dig into a particular domain to get more specific than that.

What Is Tabular Q-Learning?

And if your question is actually “what is tabular Q-learning?”: Q-learning is a simple, classic RL algorithm that learns an estimate of the value of taking each action in each state, and “tabular” means those Q-values are stored in a look-up table with one entry per state-action pair, rather than approximated by a function.
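As a sketch of what that means in practice, here is a minimal tabular Q-learning loop on a toy environment. The environment (a 5-position chain where moving right eventually pays +1), the hyperparameters, and the episode count are all illustrative assumptions, not anything from the discussion above:

```python
import random

# Toy environment (hypothetical): a chain of 5 positions. Action 0 moves
# left, action 1 moves right; reaching the rightmost position pays +1.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    """Return (next_state, reward, done) for the chain environment."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(Q, state):
    """Pick the highest-valued action, breaking ties randomly."""
    best = max(Q[state])
    return random.choice([a for a in range(N_ACTIONS) if Q[state][a] == best])

# The "table" in tabular Q-learning: one Q-value per (state, action) pair.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative hyperparameters

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the table, occasionally explore.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = greedy(Q, state)
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# The learned greedy policy for the non-terminal states should be
# "always move right" (action 1).
policy = [greedy(Q, s) for s in range(N_STATES - 1)]
print(policy)
```

The whole algorithm is a nested list of floats plus one update rule, which is why, when a problem is scoped tightly enough that its states and actions can be enumerated, something this simple is often all you need.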