Sortation Control Using Multi-Agent Deep Reinforcement Learning in <i>N</i>-Grid Sortation System

Ju-Bong Kim; Ho-Bin Choi; Gyu-Young Hwang; Kwihoon Kim; Yong-Geun Hong; Youn-Hee Han

doi:10.3390/s20123401

Sensors (Jun 2020)

Affiliations

Ju-Bong Kim: Department of Computer Science and Engineering, Korea University of Technology and Education, Cheonan 31253, Korea
Ho-Bin Choi: Department of Computer Science and Engineering, Korea University of Technology and Education, Cheonan 31253, Korea
Gyu-Young Hwang: Department of Computer Science and Engineering, Korea University of Technology and Education, Cheonan 31253, Korea
Kwihoon Kim: Department of Knowledge-Converged Super Brain Convergence Research, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea
Yong-Geun Hong: Department of Knowledge-Converged Super Brain Convergence Research, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea
Youn-Hee Han: Department of Computer Science and Engineering, Korea University of Technology and Education, Cheonan 31253, Korea

DOI: https://doi.org/10.3390/s20123401
Journal volume & issue: Vol. 20, no. 12, p. 3401

Abstract

Intralogistics is a technology that optimizes, integrates, automates, and manages the flow of goods within a logistics transportation and sortation center. As the demand for parcel transportation increases, many sortation systems have been developed. In general, the goal of a sortation system is to route (or sort) parcels correctly and quickly. We design an n-grid sortation system that can be flexibly deployed in intralogistics warehouses, and we develop a collaborative multi-agent reinforcement learning (RL) algorithm to control the behavior of the emitters and sorters in the system. We present two types of RL agents, emission agents and routing agents, which are trained to achieve the given sortation goals together. To verify the proposed system and algorithm, we implement them in a full-fledged cyber-physical system simulator and describe the RL agents’ learning performance. The learning results show that well-trained collaborative RL agents can optimize their performance effectively. In particular, the routing agents eventually learn to route parcels along their optimal paths, while the emission agents eventually learn to balance the inflow and outflow of parcels.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
