Discover Applied Sciences (Mar 2025)
An empirical study of the naïve REINFORCE algorithm for predictive maintenance
Abstract
Reinforcement Learning (RL) is a biologically inspired, autonomous machine learning method. RL algorithms can help generate optimal predictive maintenance (PdM) policies for complex industrial systems. However, these algorithms are extremely sensitive to hyperparameter tuning and network architecture, which is where automated RL (AutoRL) frameworks can offer a platform that encourages industrial practitioners to apply RL to their problems. AutoRL applied to PdM has yet to be studied. Aimed at practitioners unfamiliar with complex RL tuning, we undertake an empirical study of untuned RL algorithms for generating optimal tool replacement policies for milling machines. We compare a naïve implementation of REINFORCE against the policies produced by industry-grade implementations of three advanced algorithms: Deep Q-Network (DQN), Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO). Our broad goal was to study model performance under four scenarios: (1) simulated tool-wear data, (2) actual tool-wear data (benchmark IEEEDataPort PHM Society datasets), (3) a univariate state with added noise levels and a random chance of breakdown, and (4) a complex multivariate state. Across 15 environment variants, REINFORCE models demonstrated higher tool replacement precision, recall, and F1 (0.687/0.629/0.609) than A2C (0.449/0.480/0.442), DQN (0.418/0.504/0.374), and PPO (0.472/0.316/0.345), while also showing lower variability. Comparing the best auto-selected models over ten training rounds produced notably wider performance gaps, with REINFORCE reaching precision, recall, and F1 of 0.884/0.884/0.873 against the best A2C (0.520/0.859/0.639), DQN (0.651/0.937/0.740), and PPO (0.558/0.643/0.580) models. For REINFORCE, a basic hyperparameter sensitivity and interaction analysis is conducted to better understand its dynamics; results are presented for the learning rate, the discount factor $$\gamma$$, and the network activation function (ReLU and Tanh). Our study suggests that, in the untuned state, simpler algorithms such as REINFORCE perform reasonably well. For AutoRL frameworks, this research encourages new design approaches that automatically identify optimum algorithm-hyperparameter combinations.
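The authors' implementation is not reproduced here; the following is a minimal sketch of a naïve Monte-Carlo policy-gradient (REINFORCE) update in PyTorch, illustrating the hyperparameters the abstract refers to (learning rate, discount factor gamma, and activation function). The ToolWearEnv interface (reset()/step()) is a hypothetical stand-in for the paper's PdM environments, not the datasets or code used in the study.

```python
# Minimal REINFORCE sketch (PyTorch). Assumes a hypothetical env with
# reset() -> state and step(action) -> (state, reward, done).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP policy; the activation (ReLU vs. Tanh) is one of the studied hyperparameters."""
    def __init__(self, state_dim=1, n_actions=2, hidden=32, activation=nn.ReLU):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), activation(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        # Categorical distribution over actions (e.g., keep tool vs. replace tool).
        return torch.distributions.Categorical(logits=self.net(state))

def discounted_returns(rewards, gamma):
    """Compute G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def reinforce_episode(env, policy, optimizer, gamma=0.99):
    """One Monte-Carlo policy-gradient update: loss = -sum_t log pi(a_t|s_t) * G_t."""
    state, done = env.reset(), False
    log_probs, rewards = [], []
    while not done:
        dist = policy(torch.as_tensor(state, dtype=torch.float32))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(action.item())
        rewards.append(reward)
    returns = torch.tensor(discounted_returns(rewards, gamma), dtype=torch.float32)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return sum(rewards)
```

In this sketch, the learning rate (set when constructing the optimizer, e.g. torch.optim.Adam(policy.parameters(), lr=1e-3)), the discount factor gamma, and the network activation are exactly the knobs examined in the abstract's sensitivity and interaction analysis.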
Keywords