Small Stochastic Data Compactification Concept Justified in the Entropy Basis

Viacheslav Kovtun; Elena Zaitseva; Vitaly Levashenko; Krzysztof Grochla; Oksana Kovtun

doi:10.3390/e25121567

Entropy (Nov 2023)

Affiliations

Viacheslav Kovtun: Internet of Things Group, Institute of Theoretical and Applied Informatics Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland
Elena Zaitseva: Department of Informatics, University of Žilina, 010 26 Žilina, Slovakia
Vitaly Levashenko: Department of Informatics, University of Žilina, 010 26 Žilina, Slovakia
Krzysztof Grochla: Internet of Things Group, Institute of Theoretical and Applied Informatics Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland
Oksana Kovtun: Department of the Theory and Practice of Translation, Faculty of Foreign Languages, Vasyl’ Stus Donetsk National University, 600-Richchya Str., 21, 21000 Vinnytsia, Ukraine

DOI: https://doi.org/10.3390/e25121567
Journal volume & issue: Vol. 25, no. 12, p. 1567

Abstract

Measurement is a typical way of gathering information about an investigated object, generalized by a finite set of characteristic parameters. The result of each iteration of the measurement is an instance of the class of the investigated object in the form of a set of values of characteristic parameters. An ordered set of instances forms a collection whose dimensionality for a real object is a factor that cannot be ignored. Managing the dimensionality of data collections, along with classification, regression, and clustering, is a fundamental problem of machine learning. Compactification is the approximation of the original data collection by an equivalent collection (with a reduced dimension of characteristic parameters) while controlling the accompanying loss of information capacity. Related to compactification is the procedure for verifying data completeness, which is characteristic of data reliability assessment. If the characteristic parameters of the initial data collection include stochastic ones, the compactification procedure becomes more complicated. To take this into account, this study proposes a model of a structured collection of stochastic data defined in terms of relative entropy. The compactification of such a data model is formalized by an iterative procedure aimed at maximizing the relative entropy of sequential implementation of direct and reverse projections of data collections, taking into account the estimates of the probability distribution densities of their attributes. A procedure for approximating the relative entropy function of compactification is also proposed to reduce the computational complexity of the latter. To qualitatively assess compactification, this study undertakes a formal analysis that uses the information capacity of the data collection and the absolute and relative shares of information lost due to compaction as its metrics. Taking into account the semantic connection between compactification and completeness, the proposed metric is also relevant for the task of assessing data reliability. Testing the proposed compactification procedure demonstrated both its stability and its efficiency in comparison with previously used analogues, such as the principal component analysis method and the random projection method.
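
The two baseline methods named in the abstract, principal component analysis and random projection, can each be written as a direct projection to a reduced dimension followed by a reverse projection back to the original one. The sketch below is only an illustration of that direct/reverse pattern and is not the authors' entropy-based procedure. The synthetic data, the function names, the reduced dimension `r = 3`, and the use of the relative squared reconstruction error as a stand-in for the relative share of information losses are all assumptions made for this example.

```python
import numpy as np


def pca_projection(X, r):
    """Return a d x r matrix whose columns are the top-r principal directions of X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:r].T


def random_projection(d, r, rng):
    """Return a d x r Gaussian random projection matrix."""
    return rng.standard_normal((d, r)) / np.sqrt(r)


def relative_loss(X, W):
    """Share of centered variance lost after projecting with W and mapping back."""
    Xc = X - X.mean(axis=0)
    Z = Xc @ W                       # direct projection: n x r
    X_back = Z @ np.linalg.pinv(W)   # reverse projection via pseudoinverse: n x d
    return np.sum((Xc - X_back) ** 2) / np.sum(Xc ** 2)


# Hypothetical data: 500 instances, 10 attributes, about 3 underlying factors plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3))
mixing = rng.standard_normal((3, 10))
X = latent @ mixing + 0.1 * rng.standard_normal((500, 10))

r = 3
pca_loss = relative_loss(X, pca_projection(X, r))
rp_loss = relative_loss(X, random_projection(X.shape[1], r, rng))

print(f"PCA relative loss:               {pca_loss:.4f}")
print(f"Random projection relative loss: {rp_loss:.4f}")
```

Because the reverse projection uses the pseudoinverse, each method reconstructs the data as its orthogonal projection onto an r-dimensional subspace, so the PCA loss cannot exceed the random projection loss under this particular measure. The procedure described in the abstract replaces this squared-error criterion with one based on relative entropy and estimated attribute densities, which this sketch does not attempt to reproduce.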

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
