Dueling Network Architecture for GNN in the Deep Reinforcement Learning for the Automated ICT System Design

Tianchen Zhou; Yutaka Yakuwa; Natsuki Okamura; Hiroyuki Hochigai; Takayuki Kuroda; Ikuko Eguchi Yairi

doi:10.1109/ACCESS.2025.3534246

IEEE Access (Jan 2025)

Affiliations

Tianchen Zhou: Department of Information Science, Sophia University, Tokyo, Japan
Yutaka Yakuwa: NEC Corporation, Kawasaki, Japan
Natsuki Okamura: Department of Information Science, Sophia University, Tokyo, Japan
Hiroyuki Hochigai: Department of Information Science, Sophia University, Tokyo, Japan
Takayuki Kuroda: NEC Corporation, Kawasaki, Japan
Ikuko Eguchi Yairi: Department of Information Science, Sophia University, Tokyo, Japan

DOI: https://doi.org/10.1109/ACCESS.2025.3534246
Journal volume & issue: Vol. 13
pp. 21870 – 21879

Abstract

This paper presents an improved deep reinforcement learning (DRL) approach for end-to-end models that use a graph neural network (GNN). The proposed method improves end-to-end deep Q-learning with a GNN by decomposing the GNN-based Q-network into two sub-streams that separately estimate the global state value and the state-dependent action advantage. With this decomposition, the dueling GNN architecture can learn independently which states are valuable, using the graph-dependent global state value rather than relying on the effect of each action in each state. This provides a more accurate approximation of the Q-value. With a better Q-value approximation, the network can cope with a massive state space and sparse rewards and achieves significantly higher learning efficiency without any change to the underlying reinforcement learning algorithm. The proposed method was applied to an automated ICT system design model. That model suffers from prolonged learning times, primarily because it tends to overestimate particular configurations: rewards are scarce, yet the exploration space spans a vast number of possible combinations of ICT system components. The results show that the proposed architecture improves the learning efficiency of the DRL model, again without changes to the underlying reinforcement learning algorithm.
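
The two-stream decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single mean-aggregation message-passing layer, treats "act on node i" as action i, and combines the streams with the mean-subtracted rule Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a') from the original dueling DQN; the abstract does not state which aggregation the paper uses. All function names, parameter names, and the toy graph are hypothetical.

```python
import random

def mat_vec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def relu(vec):
    return [max(0.0, x) for x in vec]

def message_pass(features, adjacency, weights):
    """One round of mean-aggregation message passing (assumed generic GNN layer)."""
    updated = []
    for node, neighbours in enumerate(adjacency):
        group = [features[node]] + [features[n] for n in neighbours]
        mean = [sum(col) / len(group) for col in zip(*group)]
        updated.append(relu(mat_vec(weights, mean)))
    return updated

def dueling_q_values(features, adjacency, params):
    """Return (q_values, value, advantages) with one action per node (assumption)."""
    hidden = message_pass(features, adjacency, params["gnn"])
    # Value stream: pool node embeddings into one graph embedding, then score it.
    graph_embedding = [sum(col) / len(hidden) for col in zip(*hidden)]
    value = sum(w * x for w, x in zip(params["value"], graph_embedding))
    # Advantage stream: score each node embedding separately.
    advantages = [sum(w * x for w, x in zip(params["advantage"], h)) for h in hidden]
    mean_advantage = sum(advantages) / len(advantages)
    # Mean-subtracted aggregation (assumed, borrowed from the original dueling DQN).
    q_values = [value + a - mean_advantage for a in advantages]
    return q_values, value, advantages

# Hypothetical toy input: a 3-node path graph with random features and weights.
rng = random.Random(0)
dim = 4
params = {
    "gnn": [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)],
    "value": [rng.uniform(-1, 1) for _ in range(dim)],
    "advantage": [rng.uniform(-1, 1) for _ in range(dim)],
}
features = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
adjacency = [[1], [0, 2], [1]]  # neighbour lists for nodes 0, 1, 2

q_values, value, advantages = dueling_q_values(features, adjacency, params)
print("V(s):", value)
print("Q(s, a):", q_values)
```

Under this aggregation the Q-values average to the state value, so the value stream carries the graph-level estimate while the advantage stream only ranks actions within a state. Because only the Q-network head changes, the surrounding Q-learning update can stay as it is, which matches the abstract's claim that the underlying algorithm is unchanged.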

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
