Evaluation of Odor Prediction Model Performance and Variable Importance according to Various Missing Imputation Methods

Do-Hyun Lee; Saem-Ee Woo; Min-Woong Jung; Tae-Young Heo

doi:10.3390/app12062826

Applied Sciences (Mar 2022)


Affiliations

Do-Hyun Lee: Department of Information & Statistics, Chungbuk National University, Cheongju 28644, Korea
Saem-Ee Woo: Animal Environment Division, National Institute of Animal Science, RDA, Iseo-myeon 55365, Korea
Min-Woong Jung: Animal Environment Division, National Institute of Animal Science, RDA, Iseo-myeon 55365, Korea
Tae-Young Heo: Department of Information & Statistics, Chungbuk National University, Cheongju 28644, Korea

DOI: https://doi.org/10.3390/app12062826
Journal volume & issue: Vol. 12, no. 6, p. 2826

Abstract


The aim of this study is to identify the most suitable model for predicting complex odors from odor substance data that contain few observations and many missing values. First, we compared removing incomplete data with imputing missing values, and imputation was found to be more effective. Then, to recommend a suitable model, we built a total of 126 models by combining missing-value imputation methods (single imputation, multiple imputation, K-nearest neighbor imputation), data preprocessing methods (standardization, principal component analysis, partial least squares), and predictive methods (multiple regression, machine learning, deep learning), and compared them using R² and mean absolute error (MAE). Finally, we investigated variable importance using the best prediction model. The best model combined multivariate imputation using Bayesian ridge for missing-value imputation, standardization for data preprocessing, and extremely randomized trees as the predictive method. Among the odor compounds, methyl mercaptan, acetic acid, and dimethyl sulfide were identified as the most important for predicting complex odors.
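The two comparison metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the model labels `model_a` and `model_b`, and all numeric values are hypothetical and invented for illustration. It only shows how R² and MAE might be computed and used to rank candidate models, using the Python standard library.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy values, invented for illustration only (not data from the study).
observed = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.5, 5.5, 7.0, 8.0],  # hypothetical candidate model
    "model_b": [4.0, 4.0, 8.0, 8.0],  # hypothetical candidate model
}

# Score every candidate with both metrics: (R², MAE).
scores = {
    name: (r2_score(observed, pred), mean_absolute_error(observed, pred))
    for name, pred in predictions.items()
}

# Rank by R² (higher is better); MAE (lower is better) is reported alongside.
best = max(scores, key=lambda name: scores[name][0])

for name, (r2, mae) in scores.items():
    print(f"{name}: R2={r2:.3f}, MAE={mae:.3f}")
print("best by R2:", best)
```

A higher R² and a lower MAE both indicate a better fit. In this toy case the two metrics agree on the ranking, but they need not in general, which is one reason to report both.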

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
