Reinforcement Learning to Optimize the Logistics Distribution Routes of Unmanned Aerial Vehicle

Abstract: Path planning methods for the unmanned aerial vehicle (UAV) in goods delivery have drawn great attention from industry and academics because of its flexibility which is suitable for many situations in the “Last Kilometer” between customer and delivery nodes. However, the complicated situation is still a problem for traditional combinatorial optimization methods. Based on the state-of-the-art Reinforcement Learning (RL), this paper proposed an improved method to achieve path planning for UAVs in complex surroundings: multiple no-fly zones. The improved approach leverages the attention mechanism and includes the embedding mechanism as the encoder and three different widths of beam search (i.e.,~1, 5, and 10) as the decoders. Policy gradients are utilized to train the RL model for obtaining the optimal strategies during inference. The results show the feasibility and efficiency of the model applying in this kind of complicated situation. Comparing the model with the results obtained by the optimization solver OR-tools, it improves the reliability of the distribution system and has a guiding significance for the broad application of UAVs.