Measurement: Sensors (Aug 2024)
Design and application of deep reinforcement learning algorithms based on unbiased exploration strategies for value functions
Abstract
Deep Q-networks, which are representative of several classical techniques, have become one of the main branches of value function-based reinforcement learning. This paper addresses two issues that arise when solving for value functions in reinforcement learning: estimation bias and the estimation of the maximum expected action value. By treating the estimation of the maximum expected action value as a random selection estimation problem, the proposed approach tackles estimation bias from the standpoint of random selection, and a random choice estimation procedure forms the basis of the technique. First, a random choice estimator is proposed and its unbiasedness is established theoretically. Second, the estimator is applied to construct reinforcement learning methods. Two algorithms based on the random choice estimation technique are proposed: stochastic two-depth Q-networks and double Q-learning. The main parameters of the proposed algorithms are then investigated, and parameter formulas are derived for both deterministic and stochastic scenarios. Lastly, the stochastic two-depth Q-network is developed from the random choice estimation perspective. Simulation results on Grid World and Atari games indicate that the new approach can effectively remove bias in value function estimation, improve learning performance, and stabilise the learning process.
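The estimation bias mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's random choice estimator, which the abstract does not specify. It compares the usual single estimator (the maximum of the sample means) with the well-known double estimator, which selects an action on one half of the samples and evaluates it on the other half. The function names, the Gaussian reward noise, and all parameter values are assumptions made only for this illustration.

```python
import random
import statistics


def max_estimator(samples):
    """Single estimator: the largest sample mean across actions."""
    return max(statistics.fmean(s) for s in samples)


def double_estimator(samples):
    """Double estimator: select on one half of the data, evaluate on the other."""
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    means_a = [statistics.fmean(s) for s in half_a]
    # Index of the action that looks best on the first half.
    best = max(range(len(samples)), key=means_a.__getitem__)
    # Its value is estimated from the independent second half.
    return statistics.fmean(half_b[best])


def average_estimates(n_actions=10, n_samples=20, n_trials=2000, seed=0):
    """Average both estimators over many trials (hypothetical setup).

    Every action has true value 0, so the true maximum expected value is 0
    and any systematic deviation from 0 is estimation bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        samples = [
            [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_actions)
        ]
        single.append(max_estimator(samples))
        double.append(double_estimator(samples))
    return statistics.fmean(single), statistics.fmean(double)


if __name__ == "__main__":
    s, d = average_estimates()
    print(f"single estimator: {s:+.3f}, double estimator: {d:+.3f}")
```

In this setup the single estimator averages clearly above the true maximum of 0, while the double estimator stays close to it. When the actions have different true values, the double estimator is known to underestimate the maximum instead, which is the kind of residual bias an unbiased estimator would aim to remove.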