Cogent Engineering (Jan 2018)
Data clustering and imputing using a two-level multi-objective genetic algorithm (GA): A case study of maintenance cost data for tunnel fans
Abstract
This study develops a new two-level multi-objective genetic algorithm (GA) to optimise clustering in order to reduce and impute missing cost data for fans used in road tunnels by the Swedish Transport Administration (Trafikverket). Level 1 uses a multi-objective GA based on fuzzy c-means to cluster cost data objects according to three main indices. The first is cluster centre outliers; the second is the compactness and separation (vk) of the data points and cluster centres; the third is the intensity of data points belonging to the derived clusters. Our clustering model is validated using k-means clustering. Level 2 uses a multi-objective GA to impute the reduced missing cost data in volume using a valid data period. The optimal population has a low vk (0.1%) and a high intensity (99%). It has three cluster centres, and the highest data reduction is 27%. These three cluster centres have a suitable geometry, so the cost data can be partitioned into relevant contents to be redacted for imputing. Our model shows better clustering detection and evaluation than models using k-means. Missing data account for 57% of the labour cost object and 81% of the materials cost object. The second level shows highly correlated data (R-squared 0.99) after imputing. The study therefore concludes that a multi-objective GA can cluster and impute data to derive complete data for forecasting.
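The abstract does not give the formula for the compactness and separation index vk. As a rough illustration only, the sketch below assumes a Xie-Beni-style index (membership-weighted squared distances to the cluster centres, divided by the number of points times the smallest squared distance between centres) together with the standard fuzzy c-means membership rule. The function names, the fuzzifier m = 2 and the toy data are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def fcm_memberships(X, centres, m=2.0):
    """Standard fuzzy c-means memberships for fixed centres (each row sums to 1)."""
    # Euclidean distance from every point to every centre, shape (n_points, n_centres)
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against division by zero when a point sits on a centre
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def compactness_separation(X, centres, U, m=2.0):
    """Xie-Beni-style index (assumed form of vk): lower means compact, well-separated clusters."""
    # Squared distance from every point to every centre
    d2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    compactness = np.sum((U ** m) * d2) / X.shape[0]
    # Smallest squared distance between two distinct centres
    c2 = np.sum((centres[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(c2, np.inf)
    return compactness / c2.min()

# Hypothetical toy data: two tight groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centres = np.array([[0.0, 0.5], [10.0, 10.5]])
U = fcm_memberships(X, centres)
vk = compactness_separation(X, centres, U)
print(vk)
```

In a GA setting, a value of this kind could serve as one objective to minimise for each candidate set of cluster centres, alongside the outlier and intensity indices named in the abstract.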
Keywords