IEEE Access (Jan 2024)
A Novel Multi-Objective Deep Q-Network: Addressing Immediate and Delayed Rewards in Multi-Objective Q-Learning
Abstract
Current multi-objective reinforcement learning (MORL) research often struggles to balance multiple objectives while maintaining the stability and performance of the learning algorithm, especially in complex environments. To address this, we propose a new multi-objective deep Q-network (MO-DQN) framework that integrates linear scalarization into MORL. The framework has two main features. First, we provide immediate feedback by incorporating linear scalarization into reward processing. Compared with more complex multi-objective optimization methods, this approach is easier to understand and implement, making it more convenient for practical applications. Linear scalarization also accelerates the learning process and enhances the algorithm's ability to adjust its strategy dynamically. Second, we develop the Linear Scalarized Multi-objective Deep Q-Network (LSMO-DQN) under different reward mechanisms, improving MO-DQN's ability to balance multiple objectives effectively. Immediate reward strategies accelerate learning and enable rapid adjustments, which benefits dynamic environments; delayed reward strategies, in contrast, capture the long-term impact of actions and promote strategic decision-making. Experiments in two different multi-objective environments show that the proposed method modestly outperforms competing techniques, indicating its robustness and adaptability. Specifically, LSMO-DQN achieves higher cumulative rewards and demonstrates improved stability across various reward structures. The findings suggest that integrating linear scalarization into reward processing not only enhances learning performance but also provides a more straightforward way to manage trade-offs in multi-objective settings. These results highlight the potential of LSMO-DQN to improve learning performance, particularly in scenarios with data imbalances.
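As a minimal illustrative sketch (not the authors' implementation), linear scalarization collapses a vector-valued reward into a single scalar via a weighted sum, assuming nonnegative preference weights that sum to one; the function name `scalarize` and the example weights are hypothetical.

```python
import numpy as np

def scalarize(reward_vec, weights):
    """Linear scalarization: combine a multi-objective reward vector
    into one scalar reward using a convex combination of objectives."""
    reward_vec = np.asarray(reward_vec, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weights are assumed nonnegative and normalized to sum to one.
    assert weights.shape == reward_vec.shape
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return float(weights @ reward_vec)

# Example with two objectives (e.g., a 0.7/0.3 preference between them):
r = [1.0, -0.5]
w = [0.7, 0.3]
print(scalarize(r, w))  # 0.7 * 1.0 + 0.3 * (-0.5) = 0.55
```

In an immediate-reward setting, such a scalar could be fed to a standard DQN update at every step, whereas a delayed-reward setting would scalarize the accumulated objective vector only when the delayed feedback arrives.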
Keywords