Multi-Task Multi-Objective Evolutionary Search Based on Deep Reinforcement Learning for Multi-Objective Vehicle Routing Problems with Time Windows

Jianjun Deng; Junjie Wang; Xiaojun Wang; Yiqiao Cai; Peizhong Liu

doi:10.3390/sym16081030

Symmetry (Aug 2024)

Affiliations

Jianjun Deng: Chengdu Aeronautic Polytechnic, Chengdu 610100, China
Junjie Wang: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
Xiaojun Wang: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
Yiqiao Cai: College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
Peizhong Liu: College of Engineering, Huaqiao University, Quanzhou 362000, China

DOI: https://doi.org/10.3390/sym16081030
Journal volume & issue: Vol. 16, no. 8, p. 1030

Abstract

The vehicle routing problem with time windows (VRPTW) is a combinatorial optimization problem in supply chains and logistics that has been widely studied over the last decade. Recent research has explored deep reinforcement learning (DRL) as a promising approach to the VRPTW. However, the multi-objective VRPTW (MOVRPTW), which involves many conflicting objectives, remains a challenge for DRL. The MOVRPTW considers five conflicting objectives simultaneously: minimizing the number of vehicles required, the total travel distance, the travel time of the longest route, the total waiting time for early arrivals, and the total delay time for late arrivals. To tackle the MOVRPTW, this study introduces the MTMO/DRL-AT, a multi-task multi-objective evolutionary search algorithm that makes full use of both DRL and the multitasking mechanism. In the MTMO/DRL-AT, a two-objective MOVRPTW is constructed as an assisted task, whose objectives are to minimize the total travel distance and the travel time of the longest route. The main task and the assisted task are solved simultaneously in a multitasking scenario. Each task is decomposed into scalar optimization subproblems, which are then solved by an attention model trained using DRL. The outputs of these trained models serve as the initial solutions for the MTMO/DRL-AT. The proposed algorithm then applies knowledge transfer and multiple local search operators to further improve the quality of these promising solutions. Simulation results on real-world benchmarks show that the MTMO/DRL-AT outperforms several other algorithms in solving the MOVRPTW.
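To make the five objectives and the idea of a scalar subproblem concrete, the following is a minimal Python sketch. It is an illustration only, not the authors' implementation. Everything in it is an assumption: the function names `evaluate` and `weighted_sum`, the toy instance, Euclidean travel at unit speed, soft time windows (a vehicle waits when early and accrues delay when late), route time measured as the return time to the depot, and the weighted-sum scalarization, which the abstract does not specify.

```python
import math

def evaluate(routes, coords, windows, service, depot=0):
    """Return the five objective values for a set of routes (all minimized).

    Assumptions (not from the paper): travel time equals Euclidean distance,
    time windows are soft, and a route's time is its return time to the depot.
    """
    vehicles = 0
    total_dist = total_wait = total_delay = longest = 0.0
    for route in routes:
        if not route:          # an empty route uses no vehicle
            continue
        vehicles += 1
        t, prev = 0.0, depot
        for customer in route:
            d = math.dist(coords[prev], coords[customer])
            total_dist += d
            t += d
            early, late = windows[customer]
            if t < early:      # early arrival: wait until the window opens
                total_wait += early - t
                t = early
            elif t > late:     # late arrival: record the delay
                total_delay += t - late
            t += service[customer]
            prev = customer
        d = math.dist(coords[prev], coords[depot])   # return leg to the depot
        total_dist += d
        t += d
        longest = max(longest, t)
    return (vehicles, total_dist, longest, total_wait, total_delay)

def weighted_sum(objectives, weights):
    """Collapse an objective vector into one scalar subproblem value."""
    return sum(w * f for w, f in zip(weights, objectives))

# Hypothetical toy instance: depot 0 and three customers.
coords = {0: (0, 0), 1: (3, 4), 2: (6, 8), 3: (0, 5)}
windows = {1: (10, 20), 2: (0, 8), 3: (0, 100)}
service = {1: 1, 2: 1, 3: 1}
routes = [[1, 2], [3]]

main_task = evaluate(routes, coords, windows, service)
# Two-objective assisted task: total distance and longest-route time only.
assisted_task = (main_task[1], main_task[2])
```

Each weight vector passed to `weighted_sum` would define one scalar subproblem. In the paper these subproblems are solved by a trained attention model; the sketch only evaluates a given solution.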

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
