IEEE Access (Jan 2024)

Decoupling Patrolling Tasks for Water Quality Monitoring: A Multi-Agent Deep Reinforcement Learning Approach

  • Dame Seck Diop,
  • Samuel Yanes Luis,
  • Manuel Perales Esteve,
  • Sergio L. Toral Marin,
  • Daniel Gutierrez Reina

DOI
https://doi.org/10.1109/ACCESS.2024.3403790
Journal volume & issue
Vol. 12
pp. 75559 – 75576

Abstract


This study proposes the use of an Autonomous Surface Vehicle (ASV) fleet equipped with water quality sensors for efficient patrolling to monitor water resource pollution. The task is formulated as a Patrolling Problem, which consists of planning and executing efficient routes to continuously monitor a given area. When patrolling Lake Ypacaraí with ASVs, the scenario becomes a Partially Observable Markov Game (POMG) because pollution levels are unknown. Given the computational complexity of solving the POMG exactly, a Multi-Agent Deep Reinforcement Learning (MADRL) approach is adopted, with a common policy shared among homogeneous agents. A consensus algorithm assists in collision avoidance and coordination. The work introduces exploration and intensification phases to the patrolling problem. The Exploration Phase aims at homogeneous map coverage, while the Intensification Phase prioritizes highly polluted areas. The innovative introduction of a transition variable, $\nu$, efficiently controls the transition from exploration to intensification. Results demonstrate the superiority of the method, which outperforms a Single-Phase (trained on a single task) Deep Q-Network (DQN) by an average of 17% on the intensification task. The proposed multitask learning approach with parameter sharing, coupled with DQN training, outperforms Task-Specific DQN (two DQNs trained on separate tasks) by 6% in exploration and 13% in intensification. It also outperforms the heuristic-based Lawn Mower Path Planner (LMPP) and Random Wanderer Path Planner (RWPP) algorithms by 35% and 20% on average, respectively. Additionally, it outperforms a Particle Swarm Optimization-based Path Planner (PSOPP) by an average of 26%. The algorithm demonstrates adaptability in unforeseen scenarios, giving users flexibility in configuration.
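The abstract does not specify how the transition variable $\nu$ is scheduled or applied; a minimal sketch of one plausible reading, assuming a linear ramp over training steps and a convex blend of the two task rewards (the names `transition_nu` and `blended_reward` are hypothetical, not from the paper):

```python
def transition_nu(step: int, start: int, end: int) -> float:
    """Linear schedule for a transition variable nu in [0, 1]:
    0 before `start`, 1 after `end`, ramping linearly in between.
    This schedule is an assumption for illustration only."""
    if end <= start:
        raise ValueError("end must be greater than start")
    return min(1.0, max(0.0, (step - start) / (end - start)))

def blended_reward(r_exploration: float, r_intensification: float, nu: float) -> float:
    """Convex combination of the exploration and intensification rewards:
    pure exploration at nu=0, pure intensification at nu=1."""
    return (1.0 - nu) * r_exploration + nu * r_intensification
```

For example, at the midpoint of the ramp (`nu = 0.5`) both task rewards contribute equally, so the agent is trained on a mixture of homogeneous coverage and pollution-focused patrolling.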

Keywords