Applied Sciences (Jul 2020)

Evaluating Machine Learning Classification Using Sorted Missing Percentage Technique Based on Missing Data

  • Che-Yu Hung,
  • Bernard C. Jiang,
  • Chien-Chih Wang

DOI
https://doi.org/10.3390/app10144920
Journal volume & issue
Vol. 10, no. 14
p. 4920

Abstract

Missing data are common in industrial sensor readings owing to system updates and unequal radio-frequency periods. Existing methods that address missing data through imputation may not always be appropriate. This study presents a sorted missing percentages technique for filtering attributes when building machine learning classification models from sensor readings with missing data. Signal detection theory was employed to evaluate the distinguishing ability of the resulting models. To evaluate its performance, the proposed technique was applied to a publicly available air pressure system dataset, which was then used to build several classifiers. The experimental results indicated that the proposed technique allowed a logistic regression model to achieve the best accuracy score (99.56%) and a better distinguishing ability (response bias of 0.0013, adjusted response bias of 0.0044, and decision criterion of −1.8994) compared with the methods applied to the same dataset and reported in papers published between 2016 and March 2019 on binary classification, wherein attributes with more than 20% missing data were filtered out. The proposed technique is suitable for industrial sensor data analysis and can be applied to scenarios in which data are missing owing to unequal radio-frequency periods or a system being updated with new fields.
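
The attribute-filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of `None` to mark a missing reading, and the toy data are assumptions made for illustration. Only the idea of sorting attributes by their percentage of missing values and dropping those above a threshold (20% in the abstract) comes from the text.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: each record is a dict, and a missing sensor reading is stored as None.
Row = Dict[str, Optional[float]]


def sorted_missing_percentages(
    rows: Sequence[Row], columns: Sequence[str]
) -> List[Tuple[str, float]]:
    """Return (attribute, percent missing) pairs, least missing first."""
    n = len(rows)
    if n == 0:
        raise ValueError("rows must not be empty")
    percentages = []
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        percentages.append((col, 100.0 * missing / n))
    return sorted(percentages, key=lambda pair: pair[1])


def filter_attributes(
    rows: Sequence[Row], columns: Sequence[str], max_missing_pct: float = 20.0
) -> List[str]:
    """Keep attributes whose missing percentage does not exceed the threshold."""
    ranked = sorted_missing_percentages(rows, columns)
    return [col for col, pct in ranked if pct <= max_missing_pct]


# Hypothetical toy data: 5 readings, 3 attributes.
rows = [
    {"a": 1.0, "b": 0.5, "c": None},
    {"a": 2.0, "b": None, "c": None},
    {"a": 3.0, "b": 0.7, "c": 9.0},
    {"a": 4.0, "b": 0.8, "c": None},
    {"a": 5.0, "b": 0.9, "c": 8.0},
]
ranked = sorted_missing_percentages(rows, ["a", "b", "c"])
kept = filter_attributes(rows, ["a", "b", "c"], max_missing_pct=20.0)
print(ranked)
print(kept)
```

In this sketch an attribute with exactly 20% missing values is retained, matching the wording "more than 20%" in the abstract. The retained attributes would then be passed to a classifier such as logistic regression.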

Keywords