Scientific Reports (Aug 2021)

Majority scoring with backward elimination in PLS for high dimensional spectrum data

  • Freeh N. Alenezi

DOI
https://doi.org/10.1038/s41598-021-96389-2
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 11

Abstract

Read online

Abstract Variable selection is crucial issue for high dimensional data modeling, where sample size is smaller compared to number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) is introduced for variable selection in high dimensional data. Filter measures are not greedy for optimal performance, hence we have proposed majority scoring with backward elimination in PLS (MSBE-PLS). In MSBE-PLS we have considered variable importance on projection (VIP) and selectivity ratio (SR). In each iteration of backward elimination in PLS variables are considered influential if they were selected by both filter indicator. The proposed method is implemented for corn’s and diesel’s content prediction. The corn contents include protein, oil, starch and moisture while diesel contents include boiling point at 50% recovery, cetane number, density, freezing temperature of the fuel, total aromatics, and viscosity. The proposed method outperforms in terms of RMSE when compared with reference methods. In addition to validating the spectrum models, data properties are also examined for explaining prediction behaviors. Moreover, MSBE-PLS select the moderate number of influential variables, hence it presents the parsimonious model for predicting contents based on spectrum data.