Multi-Agent Reinforcement Learning in a Realistic Limit Order Book Market Simulation

Abstract: Optimal order execution is widely studied by industry practitioners and academic researchers because it determines the profitability of investment decisions and high-level trading strategies, particularly those involving large volumes of orders. However, complex and unknown market dynamics pose significant challenges for the development and validation of optimal execution strategies. We propose a model-free approach that trains Reinforcement Learning (RL) agents in a realistic multi-agent market simulation environment. First, we configured a multi-agent historical order book simulation environment for execution tasks based on the Agent-Based Interactive Discrete Event Simulation (ABIDES) [arXiv:1904.12066]. Second, we formulated optimal execution as an RL problem in which an intelligent agent makes order execution and placement decisions based on market microstructure signals in a high-frequency trading (HFT) setting. Third, we developed and trained an RL execution agent using the Double Deep Q-Learning (DDQL) algorithm in the ABIDES environment. In some scenarios, our RL agent converges towards a Time-Weighted Average Price (TWAP) strategy. Finally, we evaluated the simulation by comparing the Limit Order Book (LOB) characteristics produced with our RL agent against those of the actual market.
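The TWAP baseline toward which the RL agent converges can be sketched as a simple scheduling rule: split a parent order evenly across equal time intervals. The function and parameter names below are illustrative assumptions, not taken from the paper.

```python
def twap_schedule(total_qty: int, n_slices: int) -> list[int]:
    """Hypothetical TWAP sketch: split `total_qty` shares across
    `n_slices` equal time intervals as evenly as possible."""
    base, rem = divmod(total_qty, n_slices)
    # Spread the integer remainder over the first `rem` slices so the
    # per-slice quantities sum exactly to total_qty.
    return [base + (1 if i < rem else 0) for i in range(n_slices)]
```

For example, a 1,000-share parent order over 7 intervals yields six slices of 143 shares and one of 142, executed at a constant rate regardless of market conditions; the RL agent, by contrast, may deviate from this uniform rate in response to microstructure signals.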