Analyzing Quality Measurements for Dimensionality Reduction

Michael C. Thrun; Julian Märte; Quirin Stier

doi:10.3390/make5030056

Machine Learning and Knowledge Extraction (Aug 2023)

Affiliations

Michael C. Thrun: Mathematics and Computer Science, Philipps University Marburg, Hans-Meerwein-Strasse 6, 35043 Marburg, Germany
Julian Märte: Mathematics and Computer Science, Philipps University Marburg, Hans-Meerwein-Strasse 6, 35043 Marburg, Germany
Quirin Stier: IAP-GmbH Intelligent Analytics Projects, In Den Birken 10a, 29352 Adelheidsdorf, Germany

Journal volume & issue: Vol. 5, no. 3, pp. 1076–1118

Abstract

Dimensionality reduction methods can be used to project high-dimensional data into a low-dimensional space. If the output space is restricted to two dimensions, the result is a scatter plot whose goal is to present insightful visualizations of distance- and density-based structures. The topological invariance of dimension implies that the two-dimensional similarities in the scatter plot cannot necessarily represent high-dimensional distances. In practice, projections of several datasets with distance- and density-based structures lead to misleading interpretations of the underlying structures. These examples show that evaluating projections remains essential. Here, 19 unsupervised quality measurements (QMs) are grouped into semantic classes with the aid of graph theory. We use three representative benchmark datasets to show that QMs fail to evaluate the projections of straightforward structures when common methods such as Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), or t-distributed stochastic neighbor embedding (t-SNE) are applied. This work shows that unsupervised QMs are biased towards assumed underlying structures. Based on insights gained from graph theory, we propose a new quality measurement called the Gabriel Classification Error (GCE). This work demonstrates that GCE can make an unbiased evaluation of projections. The GCE is accessible within the R package DRquality, available on CRAN.
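
To give a rough sense of how a Gabriel graph can be used to check whether classes stay separated in a projection, here is a minimal Python sketch. It is an illustration only and is not the GCE as defined in the article or implemented in DRquality: the function names gabriel_edges and mixed_edge_fraction are hypothetical, and the score is a simplified, unweighted stand-in (the share of Gabriel-graph edges in the projected plane that join points with different class labels).

```python
import numpy as np


def gabriel_edges(points):
    """Return the Gabriel graph edges (i, j), i < j, of a point set.

    Two points are Gabriel neighbours if no third point lies strictly
    inside the disc whose diameter is the segment between them.
    Brute force, O(n^3); meant for small illustrative inputs only.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)  # squared pairwise distances
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # r is strictly inside the disc with diameter (i, j)
            # iff d2[i, r] + d2[j, r] < d2[i, j]
            blocked = d2[i] + d2[j] < d2[i, j]
            blocked[i] = False
            blocked[j] = False
            if not blocked.any():
                edges.append((i, j))
    return edges


def mixed_edge_fraction(points, labels):
    """Share of Gabriel edges joining points with different labels.

    Hypothetical, simplified score (NOT the article's GCE): 0 means
    every Gabriel neighbour pair shares a class, larger values mean
    more class mixing among neighbours in the projection.
    """
    labels = np.asarray(labels)
    edges = gabriel_edges(points)
    if not edges:
        return 0.0
    mixed = sum(1 for i, j in edges if labels[i] != labels[j])
    return mixed / len(edges)


# Four projected points on a line, two classes of two points each.
projected = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
classes = [0, 0, 1, 1]
score = mixed_edge_fraction(projected, classes)
```

In this toy input only consecutive points are Gabriel neighbours, so a single edge crosses the class boundary. A supervised score of this kind relies on a prior classification rather than on an assumed distance- or density-based structure, which is the property the abstract contrasts with the unsupervised QMs.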

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
