Современные информационные технологии и IT-образование (Sep 2021)
The Application of Data Transformations in the Calculation of a Composite Index of a System's Quality
Abstract
The paper examines the features of data used to calculate composite indexes of complex systems. Principal component analysis (PCA) gives an objective summary of a dataset but is sensitive to data quality. One of the main criticisms of using multivariate analysis to calculate the weights of composite indexes is the ambiguous socio-economic interpretation of negative weight coefficients. The paper shows that well-known statistical characteristics of the data, such as the coefficient of asymmetry (skewness), the coefficient of variation, and the presence or absence of a normal distribution, do not allow anomalous variables to be identified. A variable is considered anomalous if outliers in its upper range neutralize all other values of the indicator. Such data affect the calculated weights and can be identified using heatmaps. A logarithmic transformation of anomalous variables eliminates the peculiarities of their distribution. The paper proposes an analytical criterion for identifying anomalous data: the criterion evaluates the signal-to-noise ratios of variables in a fixed range that does not contain zero. The justification for using the logarithmic transformation when assessing the quality of weakly formalized systems is demonstrated using the author’s modification of PCA in a study of the quality of life of the population of Russia’s regions. The paper shows that applying a logarithmic correction to anomalous variables eliminates the negative weight coefficients and redistributes the weights in a way that has a more correct socio-economic interpretation.
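The effect described in the abstract can be illustrated with a minimal sketch. It is not the author's modified PCA, and it does not implement the proposed signal-to-noise criterion, neither of which is specified here. It assumes plain PCA on the correlation matrix, with the first-component loadings standing in for index weights. The data are synthetic, and the function names (`first_pc_loadings`, `make_synthetic`, `compare_loadings`) are hypothetical. One indicator is made heavy-tailed, so that a few upper-range values dominate it, and the loadings are compared before and after a logarithmic transformation of that indicator.

```python
import numpy as np


def first_pc_loadings(X):
    """Return the first principal component loadings of the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column
    # of eigvecs belongs to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(R)
    w = eigvecs[:, -1]
    # An eigenvector's sign is arbitrary; fix it so the loadings sum to >= 0.
    return w if w.sum() >= 0 else -w


def make_synthetic(n=200, seed=0):
    """Synthetic data (an assumption for illustration, not the paper's dataset).

    Four indicators share one latent factor. The last one is exponentiated,
    so a few large upper-range values dominate its variance.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    X = z[:, None] + rng.normal(scale=0.5, size=(n, 4))
    X[:, 3] = np.exp(2.0 * X[:, 3])  # strictly positive, heavy right tail
    return X


def compare_loadings(X, anomalous_col=3):
    """Loadings on raw data vs. data with one column log-transformed."""
    X_log = X.copy()
    # The logarithm requires strictly positive values in this column.
    X_log[:, anomalous_col] = np.log(X_log[:, anomalous_col])
    return first_pc_loadings(X), first_pc_loadings(X_log)


if __name__ == "__main__":
    w_raw, w_log = compare_loadings(make_synthetic())
    print("raw loadings:", np.round(w_raw, 3))
    print("log loadings:", np.round(w_log, 3))
```

In this toy setting the heavy-tailed indicator receives a visibly smaller loading than the other three until it is log-transformed, after which the four loadings become comparable. The sketch shows only this redistribution of weights. It does not reproduce the negative weight coefficients discussed in the paper.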
Keywords