Intelligent Systems with Applications (Jun 2024)
Ensemble learning for impurity prediction in high-purity indium purified via vertical zone refining
Abstract
Raw-material complexity and multi-step purification make it difficult to establish universally applicable process parameters for producing high-purity metals. Machine learning has become a valuable tool in materials science: accurate prediction of target variables accelerates process optimization and reduces experimental cost and time. This study uses machine learning models to predict the residual impurity content of high-purity indium after vertical zone refining. A dataset of 82 experimental samples was used to tune the hyperparameters of XGBoost and LightGBM models through Bayesian optimization. Under leave-one-out cross-validation (LOOCV), the XGBoost and LightGBM models achieved mean absolute errors (MAEs) of 0.022 and 0.023, respectively. Because this performance was comparable to that of the previously established Ridge regression model (MAE = 0.024), three fusion techniques (mean, weighted, and stacking fusion) were explored to improve accuracy further. The weighted fusion model gave the best predictive performance, with an MAE of 0.020, a root mean squared error (RMSE) of 0.026, and a coefficient of determination (R² score) of 0.830. In addition, SHapley Additive exPlanations (SHAP) analysis of both the XGBoost and LightGBM models showed that lower initial arsenic (As) content is significantly correlated with lower total post-refining impurity levels. These results demonstrate that ensemble learning can accurately predict the residual impurity content of vertically zone-refined indium.
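The weighted fusion step and the three reported metrics can be illustrated with a minimal sketch. The abstract does not state how the fusion weights were chosen, so the inverse-MAE weighting below is an assumption, and the numbers are toy values rather than data from the study. The function names (`inverse_mae_weights`, `weighted_fusion`) are hypothetical, and the per-model predictions stand in for out-of-fold LOOCV predictions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def inverse_mae_weights(y_true, model_preds):
    """Assumed weighting scheme: weight each model by 1/MAE, normalized to sum to 1.

    Requires every model to have a non-zero MAE.
    """
    inv = {name: 1.0 / mae(y_true, preds) for name, preds in model_preds.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}

def weighted_fusion(model_preds, weights):
    """Weighted average of the per-model predictions, sample by sample."""
    n = len(next(iter(model_preds.values())))
    return [sum(weights[m] * model_preds[m][i] for m in model_preds) for i in range(n)]

# Toy values only (NOT from the study): true impurity content and
# stand-ins for each model's out-of-fold LOOCV predictions.
y_true = [0.10, 0.20, 0.30, 0.40]
model_preds = {
    "xgboost":  [0.12, 0.18, 0.33, 0.39],
    "lightgbm": [0.08, 0.23, 0.28, 0.42],
    "ridge":    [0.13, 0.21, 0.27, 0.43],
}

weights = inverse_mae_weights(y_true, model_preds)
fused = weighted_fusion(model_preds, weights)

for name, preds in model_preds.items():
    print(f"{name}: MAE={mae(y_true, preds):.4f}")
print(f"fused: MAE={mae(y_true, fused):.4f} "
      f"RMSE={rmse(y_true, fused):.4f} R2={r2(y_true, fused):.4f}")
```

Because the weights are non-negative and sum to 1, the fused MAE can never exceed the worst individual MAE. Note that deriving the weights from the same predictions used for evaluation, as this sketch does for brevity, gives an optimistic estimate; a more careful setup would fit the weights inside each cross-validation fold.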