Case Studies in Chemical and Environmental Engineering (Dec 2024)
A feature restoration for machine learning on anti-corrosion materials
Abstract
Materials informatics often struggles with small datasets. This study introduces Gaussian Mixture Model Virtual Sample Generation (GMM-VSG), an approach that generates virtual samples to enhance feature correlation. Applied to six small datasets and one large dataset of 218 N-heterocyclic compounds, GMM-VSG significantly improved predictive performance. For Random Forest, R² rose from 0.80 to 0.99 and RMSE dropped from 9.87 to 0.22. For Kernel Ridge, R² increased from 0.70 to 0.99 and RMSE decreased from 10.08 to 0.83. For KNN, R² improved from 0.74 to 0.90. ANN and MLPNN also showed notable improvements. GMM-VSG is thus crucial for advancing anti-corrosion materials research.
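The abstract does not spell out how the virtual samples are drawn, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a Gaussian mixture is fitted to the joint table of features and target with scikit-learn's `GaussianMixture`, and that new rows are then sampled from it. The function name `generate_virtual_samples`, the component count, the regularisation value, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def generate_virtual_samples(X, y, n_virtual=100, n_components=2, seed=0):
    """Fit a GMM to the joint (features, target) table and draw virtual rows.

    Hypothetical helper: the component count and regularisation are
    illustrative assumptions, not values taken from the paper.
    """
    joint = np.column_stack([X, y])  # shape (n_samples, n_features + 1)
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="full",
        reg_covar=1e-3,  # keeps covariances well conditioned on small data
        random_state=seed,
    )
    gmm.fit(joint)
    virtual, _ = gmm.sample(n_virtual)  # returns (rows, component labels)
    # Split the sampled rows back into features and target.
    return virtual[:, :-1], virtual[:, -1]


# Synthetic stand-in for a small dataset (not data from the study).
rng = np.random.default_rng(0)
X_real = rng.normal(size=(20, 3))
y_real = X_real @ np.array([5.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=20)

X_virt, y_virt = generate_virtual_samples(X_real, y_real, n_virtual=200)

# Augmented training set: real rows followed by virtual rows.
X_aug = np.vstack([X_real, X_virt])
y_aug = np.concatenate([y_real, y_virt])
```

In a setup like this, it is generally safer to compute R² and RMSE on held-out real samples only, so that the virtual rows influence training but not the reported scores.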