IEEE Access (Jan 2024)
Heavy-Tailed Reinforcement Learning With Penalized Robust Estimator
Abstract
We consider finite-horizon episodic reinforcement learning (RL) under heavy-tailed noise, where only the $p$-th moment of the noise is bounded for some $p \in (1, 2]$. In this setting, existing RL algorithms are limited by their requirement for prior knowledge of the bounded moment order of the noise distribution. This requirement hinders their practical application, since such prior information is rarely available in real-world scenarios. Our proposed method eliminates the need for this prior knowledge, enabling implementation in a wider range of scenarios. We introduce two RL algorithms, p-Heavy-UCRL and p-Heavy-Q-learning, designed for the model-based and model-free RL settings, respectively. Without prior knowledge of the moment order, these algorithms are robust to heavy-tailed noise and achieve nearly optimal regret bounds, up to logarithmic factors, with the same dependence on the dominating terms as existing algorithms. Finally, we show that our proposed algorithms perform empirically on par with existing algorithms in synthetic tabular scenarios.
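For intuition about the noise regime, the following minimal Python sketch (illustrative only, not part of the paper) simulates heavy-tailed noise whose $p$-th moment is finite for $p$ below the tail index but whose variance is infinite, and compares the naive empirical mean with a simple truncated-mean estimator; the truncation level and the assumed moment order are hypothetical choices for illustration, and the paper's penalized robust estimator differs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pareto noise with tail index alpha = 1.5: E|X|^p is finite only for
# p < 1.5, so the variance (p = 2) is infinite -- a canonical
# heavy-tailed setting covered by p in (1, 2].
alpha = 1.5
n = 10_000
samples = rng.pareto(alpha, size=n) + 1.0  # support [1, inf); mean = alpha/(alpha-1) = 3

# Naive empirical mean: consistent, but concentrates poorly under heavy tails.
naive_mean = samples.mean()

# Truncated-mean estimator: clip samples at a level growing with n.
# Both p and the threshold rule are assumptions made for this sketch.
p = 1.4                # hypothetical known moment order
tau = n ** (1.0 / p)   # hypothetical truncation level ~ n^{1/p}
trunc_mean = np.minimum(samples, tau).mean()

print(f"true mean            = {alpha / (alpha - 1):.3f}")
print(f"naive empirical mean = {naive_mean:.3f}")
print(f"truncated mean       = {trunc_mean:.3f} (tau = {tau:.1f})")
```

Estimators of this truncated type require the moment order $p$ to set the threshold, which is precisely the prior knowledge the proposed algorithms dispense with.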
Keywords