Applied Sciences (Apr 2025)
A Deep Reinforcement Learning-Based Decision-Making Approach for Routing Problems
Abstract
In recent years, routing problems have attracted significant attention in the fields of operations research and computer science due to their fundamental importance in logistics and transportation. However, most existing learning-based methods employ simplistic context embeddings to represent the routing environment, which constrains their capacity to capture real-time visitation dynamics. To address this limitation, we propose a deep reinforcement learning-based decision-making framework (DRL-DM) built upon an encoder–decoder architecture. The encoder incorporates a batch normalization fronting mechanism and a gate-like threshold block to enhance the quality of node embeddings and improve convergence speed. The decoder constructs a dynamic-aware context embedding that integrates relational information among visited and unvisited nodes, along with the start and terminal locations, thereby enabling effective tracking of real-time state transitions and graph structure variations. Furthermore, the proposed approach exploits the intrinsic symmetry and circularity of routing solutions and adopts an actor–critic training paradigm with multiple parallel trajectories to improve exploration of the solution space. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate that DRL-DM consistently outperforms heuristic and learning-based baselines, achieving up to an 8.75% reduction in tour length. Moreover, the proposed method exhibits strong generalization capabilities, effectively scaling to larger problem instances and diverse node distributions, thereby highlighting its potential for solving complex, real-life routing tasks.
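The dynamic-aware context embedding described in the abstract can be illustrated with a minimal sketch. The mean-pooling of visited and unvisited node groups and the exact feature layout below are assumptions for illustration only, not the paper's specification:

```python
import numpy as np

def context_embedding(node_emb, visited_mask, start_idx, current_idx):
    """Build a dynamic-aware context vector from per-node embeddings.

    node_emb:     (n, d) array of node embeddings
    visited_mask: (n,) boolean array, True where a node has been visited
    start_idx:    index of the tour's start node
    current_idx:  index of the most recently visited (terminal) node
    """
    d = node_emb.shape[1]
    visited = node_emb[visited_mask]
    unvisited = node_emb[~visited_mask]
    # Summarize each group by mean pooling; use zeros when a group is empty.
    visited_avg = visited.mean(axis=0) if len(visited) else np.zeros(d)
    unvisited_avg = unvisited.mean(axis=0) if len(unvisited) else np.zeros(d)
    # Concatenate group summaries with the start and terminal node embeddings,
    # so the decoder sees both the aggregate state and the tour endpoints.
    return np.concatenate(
        [visited_avg, unvisited_avg, node_emb[start_idx], node_emb[current_idx]]
    )

rng = np.random.default_rng(0)
emb = rng.standard_normal((10, 16))   # 10 nodes, 16-dim embeddings
mask = np.zeros(10, dtype=bool)
mask[[0, 3, 7]] = True                # nodes 0, 3, 7 already visited
ctx = context_embedding(emb, mask, start_idx=0, current_idx=7)
print(ctx.shape)  # (64,)
```

As the tour progresses, the visited/unvisited summaries shift with each decoding step, which is the real-time state tracking the abstract refers to; in the actual model these aggregations would operate on learned attention-based embeddings rather than raw vectors.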
Keywords