Optimization methods for the imputation of missing values in Educational Institutions Data

D. Aureli; R. Bruni; C. Daraio

MethodsX (Jan 2021)

Affiliations

D. Aureli: Dep. of Information Engineering, Electronics and Telecommunications, “Sapienza” University of Rome, Rome, Italy; Corresponding author.
R. Bruni: Dep. of Computer Control and Management Engineering, “Sapienza” University of Rome, Rome, Italy
C. Daraio: Dep. of Computer Control and Management Engineering, “Sapienza” University of Rome, Rome, Italy

Journal volume & issue: Vol. 8, p. 101208

Abstract

The imputation of missing values in the detail data of Educational Institutions is a difficult task. These data contain multivariate time series, which cannot be satisfactorily imputed by many existing imputation techniques. Moreover, almost all the data of an Institution are interconnected: the number of graduates is not independent of the number of students, the expenditure is not independent of the staff, etc. In other words, each imputed value has an impact on the whole set of data of the Institution. Therefore, imputation techniques for this specific case should be designed very carefully. We describe here the methods and the codes of the imputation methodology developed to impute the various patterns of missing values which appear in similar interconnected data. In particular, the first part of the proposed methodology, called "trend smoothing imputation", is designed to impute missing values in time series while respecting the trend and the other features of an Institution. The second part of the proposed methodology, called "donor imputation", is designed to impute larger chunks of missing data by using values taken from similar Institutions, again respecting their size and trend.

• Trend smoothing imputation can handle missing subsequences in time series, and is given by a weighted combination of: (a) a weighted average of the other available values of the sequence, and (b) linear regression.
• Donor imputation can handle a full sequence missing in a time series. It imputes the Recipient Institution using the values taken from a similar Institution, called the Donor, selected using optimization criteria.
• The values imputed by our techniques should respect the trend, the size and the ratios of each Institution.
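
As an illustration only (this is not the authors' published code), the two techniques summarized above can be sketched in plain Python. The blending weight `alpha`, the inverse-distance weights, the mean-normalized "shape" distance used to pick the Donor, the size rescaling, and all function and key names are assumptions made for this sketch; the abstract does not specify them.

```python
from typing import Dict, List, Optional, Sequence


def trend_smoothing_impute(series: Sequence[Optional[float]],
                           alpha: float = 0.5) -> List[float]:
    """Fill None entries with alpha * weighted average + (1 - alpha) * regression.

    alpha and the inverse-distance weighting are assumptions of this sketch.
    """
    known = [(t, float(v)) for t, v in enumerate(series) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed values to fit a trend")

    # Ordinary least-squares line fitted on the observed (time, value) pairs.
    n = len(known)
    mean_t = sum(t for t, _ in known) / n
    mean_v = sum(v for _, v in known) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in known)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in known)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t

    filled: List[float] = []
    for t, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        # (a) weighted average: observations closer in time weigh more.
        weights = [1.0 / abs(t - tk) for tk, _ in known]
        weighted_avg = sum(w * vk for w, (_, vk) in zip(weights, known)) / sum(weights)
        # (b) value predicted by the fitted regression line.
        regression = intercept + slope * t
        filled.append(alpha * weighted_avg + (1.0 - alpha) * regression)
    return filled


def _shape(seq: Sequence[float]) -> List[float]:
    """Divide a sequence by its mean so that only the trend shape remains."""
    mean = sum(seq) / len(seq)
    return [x / mean for x in seq]


def donor_impute(recipient_ref: Sequence[float],
                 donors: Sequence[Dict[str, Sequence[float]]]) -> List[float]:
    """Impute a fully missing sequence of the Recipient from the closest Donor.

    Each donor is a dict with a fully observed reference sequence under "ref"
    (e.g. students) and the sequence to borrow under "target" (e.g. graduates).
    The Donor minimizes the squared distance between trend shapes; its target
    values are then rescaled to the Recipient's size.
    """
    recipient_shape = _shape(recipient_ref)

    def distance(donor: Dict[str, Sequence[float]]) -> float:
        return sum((a - b) ** 2
                   for a, b in zip(recipient_shape, _shape(donor["ref"])))

    best = min(donors, key=distance)
    scale = sum(recipient_ref) / sum(best["ref"])  # size ratio Recipient/Donor
    return [scale * x for x in best["target"]]
```

For instance, `trend_smoothing_impute([10, None, 30])` fills the gap with 20.0, since both the weighted average and the fitted line give 20 at the missing position. Rescaling the Donor values by the size ratio is one simple way to keep the imputed sequence consistent with the Recipient's size and with the Donor's ratio between the two variables.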

Published in MethodsX

ISSN: 2215-0161 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Science
Website: http://www.journals.elsevier.com/methodsx/
