PLoS ONE (Jan 2020)
Predicting human odor perception represented by continuous values from mass spectra of essential oils resembling chemical mixtures.
Abstract
There have been recent advances in predicting odor characteristics from the molecular structure parameters of chemicals. Although molecular structure parameters are available for each individual chemical, they cannot be applied to chemical mixtures. This study presents a computational method for predicting human odor perception from the mass spectra of chemical mixtures such as essential oils. Furthermore, we propose a method for obtaining the similarity among odor descriptors even though the dataset contains only binary values. When a database lists a set of odor descriptors for each sample, only binary data are available, and the correlation between similar descriptors is lost. Prediction performance therefore degrades when the similarity among odor descriptors is not taken into account. Because the mass spectra dataset is high-dimensional, we use an autoencoder to learn a compressed representation of the mass spectra of essential oils in its bottleneck hidden layer. We then perform hierarchical clustering to create groups of odor descriptors with similar odor impressions, using both a matrix of continuous-valued correlation coefficients and natural language processing. This work explains how to overcome the binary-value problem and how to find the similarity among odor descriptors using machine learning with natural-language semantic representations of words. To address the imbalance between the positive and negative classes in both the correlation-coefficient-based and the word-similarity-based models, we use the Synthetic Minority Oversampling Technique (SMOTE). By forming odor descriptor groups, this model allows human odor perception to be predicted through computer simulation. Accordingly, this study demonstrates the feasibility of combining machine learning with natural language processing and SMOTE to predict odor descriptor groups from the mass spectra of essential oils.
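The SMOTE step mentioned above can be illustrated with a minimal pure-Python sketch. It assumes that the positive (minority-class) samples for one odor descriptor group are already available as numeric feature vectors, for example the autoencoder's bottleneck features; the function name `smote` and its parameters `k` and `seed` are hypothetical and are not taken from the study. The sketch shows only the core SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. A real pipeline would more likely rely on a library implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_synthetic SMOTE-style samples.

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducible output
    k = min(k, len(minority) - 1)      # cannot use more neighbours than exist

    # Precompute the k nearest minority neighbours of each sample (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        order = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(order[:k])

    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D "bottleneck features" for three positive samples (illustrative only).
    positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(positives, n_synthetic=4, k=2, seed=1):
        print(point)
```

When used to balance classes, `n_synthetic` would typically be set to the number of negative samples minus the number of positive samples, so that both classes end up the same size before the classifier is trained.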