Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki (Aug 2020)
Prediction of Missing Values in Adult Data Set of UCI Machine Learning: A Case of Study
Abstract
Incomplete data can be a serious problem for organizations that rely on it to make decisions. In this article, we propose using Shannon entropy and information gain to predict and impute missing categorical values in any data set. A worked example shows how entropy is applied to quantify the level of uncertainty of each attribute value. The missing attributes in the Adult data set from the UCI Machine Learning Repository are also imputed with other imputation techniques in order to show the advantages of the proposed methodology.
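To make the two quantities named above concrete, the following is a minimal sketch and not the authors' implementation. It computes Shannon entropy and information gain for categorical attributes, then fills a missing value with the most frequent value among rows that share the most informative attribute. That imputation rule, the function names (`entropy`, `information_gain`, `impute`), and the toy rows are illustrative assumptions; only the column names are borrowed from the Adult data set.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(values).values())


def information_gain(rows, feature, target):
    """Reduction in the entropy of `target` after splitting `rows` on `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def impute(rows, target, features):
    """Fill missing (None) `target` values in place; return the feature used.

    Assumed rule (not taken from the paper): pick the feature with the highest
    information gain on the complete rows, then copy the most frequent target
    value among complete rows that share that feature's value.
    """
    known = [r for r in rows if r[target] is not None]
    best = max(features, key=lambda f: information_gain(known, f, target))
    fallback = Counter(r[target] for r in known).most_common(1)[0][0]
    by_value = defaultdict(Counter)
    for r in known:
        by_value[r[best]][r[target]] += 1
    for r in rows:
        if r[target] is None:
            counts = by_value.get(r[best])
            r[target] = counts.most_common(1)[0][0] if counts else fallback
    return best


# Hypothetical toy rows; the values are made up for illustration.
rows = [
    {"education": "Bachelors", "sex": "Male", "workclass": "Private"},
    {"education": "Bachelors", "sex": "Female", "workclass": "Private"},
    {"education": "Doctorate", "sex": "Male", "workclass": "State-gov"},
    {"education": "Doctorate", "sex": "Female", "workclass": "State-gov"},
    {"education": "Bachelors", "sex": "Male", "workclass": None},
]

best = impute(rows, target="workclass", features=["education", "sex"])
print(best, rows[-1]["workclass"])
```

In this toy sample, `education` separates the known `workclass` values perfectly while `sex` does not separate them at all, so the sketch selects `education` and copies the value shared by the other `Bachelors` rows.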
Keywords