Engineering Access (Jul 2024)
A Mixture-of-Experts Approach to Production Capacity Planning for Diverse Demand Patterns via Deep Reinforcement Learning
Abstract
Ensemble learning is gaining traction in reinforcement learning (RL) because it can improve the performance, robustness, and capabilities of RL models. This paper addresses the challenge of production planning under fluctuating demand by proposing a novel Mixture of Experts Deep Reinforcement Learning (MoE-DRL) model. The model combines Proximal Policy Optimization (PPO), a policy-gradient reinforcement learning algorithm, with ensemble learning, a technique that combines multiple models: several expert PPO-DRL agents are combined through a gating model (MoE PPO-DRL). The gating model learns to select, for the demand pattern of each situation, the expert agent best suited to predict the production plan. The proposed model was trained and then evaluated against the results of a Mixed Integer Linear Programming (MILP) model and of the individual expert PPO agents. The MoE PPO-DRL model achieved a total average profit 25.9% higher than the average across all single-agent expert models. It also achieved an 11.02% optimality gap, less than half the 22.93% average gap of the single-agent expert models.
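To make the gating idea concrete, the following is a minimal sketch of hard expert selection, not the model described in this paper. Every name in it is hypothetical: the three expert functions are simple rules standing in for trained PPO agents, and the hand-written `gate_scores` function stands in for a learned gating network. It only illustrates the control flow: score the experts from the observed demand, pick the highest-scoring one, and let that expert produce the plan.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Demand = Sequence[float]  # recent demand window; assumed length >= 2 and positive mean

# Hypothetical stand-ins for trained expert PPO agents (assumption, not from the paper).
def steady_expert(d: Demand) -> float:
    return sum(d) / len(d)                         # produce the average demand

def trend_expert(d: Demand) -> float:
    return d[-1] + (d[-1] - d[0]) / (len(d) - 1)   # extrapolate the linear trend one step

def peak_expert(d: Demand) -> float:
    return max(d)                                  # cover the largest recent demand

EXPERTS: Dict[str, Callable[[Demand], float]] = {
    "steady": steady_expert,
    "trend": trend_expert,
    "peak": peak_expert,
}

def gate_scores(d: Demand) -> List[float]:
    """Hand-written scores, one per expert in EXPERTS order.

    A learned gating network would replace this function.
    """
    mean = sum(d) / len(d)
    trend = abs(d[-1] - d[0]) / mean               # relative drift over the window
    spread = (max(d) - min(d)) / mean              # relative range over the window
    return [0.2 - spread, trend, spread - trend]

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)                                # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_plan(d: Demand) -> Tuple[str, float, List[float]]:
    """Return (selected expert name, its production quantity, gate probabilities)."""
    probs = softmax(gate_scores(d))
    names = list(EXPERTS)
    best = max(range(len(names)), key=probs.__getitem__)  # hard selection (argmax)
    name = names[best]
    return name, EXPERTS[name](d), probs

if __name__ == "__main__":
    for window in ([100, 100, 100, 100], [80, 100, 120, 140], [100, 180, 90, 100]):
        name, plan, probs = moe_plan(window)
        print(window, "->", name, round(plan, 1), [round(p, 2) for p in probs])
```

The sketch uses hard selection (argmax over the gate probabilities), which matches the abstract's description of choosing a single best expert. A soft mixture that weights all expert outputs by the gate probabilities would be a different design.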