Reinforcement Learning-based Autoscaling of Workflows in the Cloud: A Survey

Abstract: Reinforcement Learning (RL) has demonstrated great potential for automatically solving decision-making problems in complex, uncertain environments. RL offers a computational approach to learning through interaction with an environment that behaves stochastically, where agents take actions to maximize cumulative short-term and long-term rewards. Some of the most impressive results have been shown in Game Theory, where agents exhibited superhuman performance in games like Go or StarCraft II, which led to the gradual adoption of RL in many other domains, including Cloud Computing. RL therefore appears to be a promising approach for autoscaling in the Cloud, since it makes it possible to learn resource management policies that are transparent (no human intervention), dynamic (no static plans), and adaptable (constantly updated). These three aspects distinguish RL from other widely used autoscaling policies, which are defined in an ad-hoc way or computed statically, as in solutions based on meta-heuristics. In particular, workflow autoscaling exploits Cloud elasticity to optimize the execution of workflows according to given optimization criteria. This is a decision-making problem that requires deciding when and how to scale computational resources up or down, and how to assign them to the upcoming processing workload. Such actions have to be taken considering the optimization criteria in the Cloud, a dynamic and uncertain environment. Motivated by this, many works apply RL to the autoscaling problem in the Cloud. In this work, we exhaustively survey those proposals from major venues and uniformly compare them based on a set of proposed taxonomies. We also discuss open problems and outline prospects for future research in the area.