MethodsX (Jan 2021)
Optimization methods for the imputation of missing values in Educational Institutions Data
Abstract
The imputation of missing values in the detail data of Educational Institutions is a difficult task. These data contain multivariate time series, which many existing imputation techniques cannot impute satisfactorily. Moreover, almost all the data of an Institution are interconnected: the number of graduates is not independent of the number of students, the expenditure is not independent of the staff, and so on. In other words, each imputed value has an impact on the whole set of data of the Institution, so imputation techniques for this specific case should be designed very carefully. We describe here the methods and the code of the imputation methodology developed to impute the various patterns of missing values that appear in such interconnected data. The first part of the proposed methodology, called "trend smoothing imputation", is designed to impute missing values in time series while respecting the trend and the other features of an Institution. The second part, called "donor imputation", is designed to impute larger chunks of missing data by using values taken from similar Institutions, again in order to respect the size and trend of the Institution being imputed.
• Trend smoothing imputation can handle missing subsequences in time series, and is given by a weighted combination of: (a) a weighted average of the other available values of the sequence, and (b) linear regression.
• Donor imputation can handle time series in which the full sequence is missing. It imputes the Recipient Institution using the values taken from a similar Institution, called the Donor, selected using optimization criteria.
• The values imputed by our techniques should respect the trend, the size and the ratios of each Institution.
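The two techniques summarized above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the mixing parameter `alpha`, the inverse-distance weights, the use of a single "size" value to compare Institutions, and the rescaling of the Donor's values by the size ratio are all assumptions made here for illustration. In particular, the abstract does not specify the weights or the optimization criteria used to select the Donor.

```python
import numpy as np


def trend_smoothing_impute(series, alpha=0.5):
    """Fill NaN gaps in a 1-D time series (illustrative sketch).

    Each missing value is set to
        alpha * (weighted average of the observed values)
        + (1 - alpha) * (linear-regression value at that position).
    ASSUMPTION: weights are inverse distances in time and alpha is a
    free parameter; neither choice comes from the source.
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    observed = ~np.isnan(y)
    if observed.sum() < 2:
        raise ValueError("need at least two observed values to fit a trend")

    # (b) least-squares straight line through the observed points
    slope, intercept = np.polyfit(t[observed], y[observed], deg=1)

    filled = y.copy()
    for i in t[~observed]:
        # (a) average of observed values, closer years weigh more
        weights = 1.0 / np.abs(t[observed] - i)
        weighted_avg = np.sum(weights * y[observed]) / np.sum(weights)
        trend_value = slope * i + intercept
        filled[i] = alpha * weighted_avg + (1 - alpha) * trend_value
    return filled


def donor_impute(recipient_size, candidate_sizes, candidate_sequences):
    """Impute a fully missing sequence from a similar Institution (sketch).

    ASSUMPTION: similarity is measured on one hypothetical "size" value
    per Institution, the Donor minimizes the absolute log size ratio, and
    the Donor's sequence is rescaled by the size ratio so that the imputed
    values keep the Donor's trend at the Recipient's size.
    """
    sizes = np.asarray(candidate_sizes, dtype=float)
    sequences = np.asarray(candidate_sequences, dtype=float)
    donor = int(np.argmin(np.abs(np.log(sizes / recipient_size))))
    scale = recipient_size / sizes[donor]
    return donor, sequences[donor] * scale


# Made-up numbers, for illustration only
students = [10.0, 12.0, np.nan, 16.0, 18.0]
filled = trend_smoothing_impute(students)

donor_index, imputed = donor_impute(
    recipient_size=100.0,
    candidate_sizes=[400.0, 110.0, 950.0],
    candidate_sequences=[[80.0, 90.0, 95.0],
                         [22.0, 33.0, 44.0],
                         [200.0, 190.0, 180.0]],
)
```

In this toy example the gap in `students` lies on an exact straight line, so both components of the trend smoothing agree on the filled value; with real data, `alpha` controls how strongly the imputed value follows the regression trend rather than the neighbouring observations.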