IEEE Access (Jan 2020)

Research on Adaptive Job Shop Scheduling Problems Based on Dueling Double DQN

  • Bao-An Han,
  • Jian-Jun Yang

DOI
https://doi.org/10.1109/ACCESS.2020.3029868
Journal volume & issue
Vol. 8
pp. 186474 – 186495

Abstract


Traditional approaches to job shop scheduling problems (JSSPs) are ill-suited to complex and changeable production environments because of their limited real-time responsiveness. Based on disjunctive graph dispatching, this work proposes a deep reinforcement learning (DRL) framework that combines the real-time responsiveness and flexibility of a deep convolutional neural network (CNN) with reinforcement learning (RL). It learns behavior strategies directly from the input manufacturing states and is therefore better suited to practical order-oriented manufacturing problems. In this framework, a scheduling process on a disjunctive graph is treated as a multi-stage sequential decision-making problem, and a deep CNN approximates the state-action value. The manufacturing states are encoded as multi-channel images and fed into the network. Various heuristic rules serve as the available actions. Using the dueling double Deep Q-network with prioritized replay (DDDQNPR), the RL agent continually interacts with the scheduling environment through trial and error to learn the best policy of combined actions for each decision step. Static computational experiments are performed on 85 JSSP instances from the well-known OR-Library. The results indicate that the proposed algorithm can obtain optimal solutions for small-scale problems and performs better than any single heuristic rule on large-scale problems, with performance comparable to genetic algorithms. To demonstrate the generalization and robustness of the algorithm, instances with random initial states are used as validation sets during training to select the model with the best generalization ability, and the trained policy is then tested on scheduling instances with different initial states. The results show that the agent is able to obtain better solutions adaptively.
Meanwhile, studies on dynamic instances with random processing times are also performed, and the experimental results indicate that our method can achieve comparable performance in dynamic environments in the short run.
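The abstract names three generic ingredients of DDDQNPR: a dueling value/advantage head, the double DQN target, and prioritized experience replay. The following is a minimal sketch of those three pieces in plain Python, assuming their standard textbook formulations (mean-subtracted dueling aggregation, online-selects/target-evaluates double DQN target, proportional prioritization with importance-sampling weights). It is not the authors' implementation. The CNN, the multi-channel image state encoding, and the concrete heuristic-rule action set are omitted. Actions are assumed to be plain indices into some list of dispatching rules. All names (`dueling_q`, `double_dqn_target`, `PrioritizedReplay`, `alpha`, `beta`) are hypothetical and chosen for illustration.

```python
import random


def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) (standard dueling aggregation).

    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').
    Each action index is assumed to stand for one heuristic dispatching rule.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net picks a', the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


class PrioritizedReplay:
    """Proportional prioritized replay with a ring buffer.

    Uses O(n) sampling for clarity; real implementations use a sum-tree.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha   # how strongly priorities skew sampling
        self.eps = eps       # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0         # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are seen soon.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update(self, idxs, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

For example, `dueling_q(2.0, [1.0, 0.0, -1.0])` returns `[3.0, 2.0, 1.0]`, and `double_dqn_target(1.0, False, 0.5, [0.5, 2.0, 1.0], [3.0, 2.0, 4.0])` returns `2.0`. The online network's argmax is action 1, which the target network values at 2.0. The point of the double DQN split is that selecting and evaluating the next action with different networks reduces the overestimation bias of the plain max operator.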

Keywords