IEEE Open Journal of the Communications Society (Jan 2024)
Meta Reinforcement Learning for UAV-Assisted Energy Harvesting IoT Devices in Disaster-Affected Areas
Abstract
Over the past decade, Unmanned Aerial Vehicles (UAVs) have attracted significant attention due to their potential in emergency-response applications, including wireless power transfer (WPT) and data collection from Internet of Things (IoT) devices in disaster-affected areas. UAVs are more attractive than traditional techniques due to their maneuverability, flexibility, and low deployment costs. However, using UAVs for such critical tasks comes with challenges, including limited resources, energy constraints, and the need to complete missions within strict time frames. IoT devices in disaster areas have limited resources (e.g., computation, energy), so they depend on the UAVs' resources to accomplish vital missions. To address these resource problems in a disaster scenario, we propose a meta-reinforcement learning (RL)-based energy harvesting (EH) framework. Our system model considers a swarm of UAVs that navigate an area, providing wireless power and collecting data from IoT devices on the ground. The primary objective is to enhance the quality of service for strategic locations while allowing UAVs to dynamically join and leave the swarm (e.g., for recharging). In this context, we formulate the problem as a non-linear programming (NLP) optimization problem that maximizes the total energy harvested by the IoT devices and determines the optimal trajectory paths for the UAVs, subject to constraints on the maximum mission duration, the UAVs' maximum energy consumption, and the minimum data rate required for reliable transmission. Due to the complexity and combinatorial nature of the formulated problem and the difficulty of obtaining the optimal solution with conventional optimization techniques, we propose a lightweight meta-RL solution that solves the problem by learning the system dynamics. We conducted extensive simulations and compared our approach with two state-of-the-art models, one based on a traditional RL algorithm, namely a deep Q-network (DQN), and one based on Particle Swarm Optimization (PSO), as well as a greedy solution. Our simulation results show that the proposed meta-RL algorithm improves the total EH of the IoT devices by 25%, 32%, and 45% compared with the DQN, the PSO algorithm, and the greedy solution, respectively. The results also demonstrate that our proposed approach outperforms the competing solutions in efficiently covering strategic locations with a high satisfaction rate and high accuracy.
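To make the formulation summarized above concrete, the following is a minimal, illustrative sketch of such an NLP problem; the symbols (the harvested energy E_k of device k, the mission time T, the per-UAV energy consumption E_u, the device data rate R_k, and the UAV trajectories q_u(t)) are assumed placeholders rather than the paper's exact notation:

% Illustrative (assumed) form of the NLP formulation summarized in the abstract;
% the notation is a placeholder, not the paper's exact system model.
\begin{align}
  \max_{\{\mathbf{q}_u(t)\}} \quad & \sum_{k \in \mathcal{K}} E_k
    && \text{total energy harvested by the IoT devices} \\
  \text{s.t.} \quad & T \le T_{\max}
    && \text{maximum mission duration} \\
  & E_u \le E_{\max}^{\mathrm{UAV}}, \quad \forall u \in \mathcal{U}
    && \text{per-UAV energy budget} \\
  & R_k \ge R_{\min}, \quad \forall k \in \mathcal{K}
    && \text{minimum data rate for reliable transmission}
\end{align}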
Keywords