A Preprocessing Manifold Learning Strategy Based on t-Distributed Stochastic Neighbor Embedding

Sha Shi; Yefei Xu; Xiaoyang Xu; Xiaofan Mo; Jun Ding

doi:10.3390/e25071065

Entropy (Jul 2023)

Affiliations

Sha Shi: State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Yefei Xu: State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Xiaoyang Xu: State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Xiaofan Mo: National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, Beijing 100101, China
Jun Ding: Institute of Information Sensing, Xidian University, 2 South TaiBai Road, Xi’an 710071, China

DOI: https://doi.org/10.3390/e25071065
Journal volume & issue: Vol. 25, no. 7, p. 1065

Abstract

In machine learning and data analysis, dimensionality reduction and visualization of high-dimensional data can be accomplished by manifold learning with the t-distributed stochastic neighbor embedding (t-SNE) algorithm. We significantly improve this manifold learning scheme by introducing a preprocessing strategy for t-SNE. In the preprocessing, we first use Laplacian eigenmaps to reduce the dimensionality of the high-dimensional data, which aggregates each data cluster and markedly reduces the Kullback–Leibler divergence (KLD). The k-nearest-neighbor (KNN) algorithm is also used in the preprocessing to enhance visualization performance and to reduce computational and space complexity. We compare the performance of our strategy with that of standard t-SNE on the MNIST dataset. The experimental results show that our strategy separates different clusters more strongly and keeps data of the same kind much closer together. Moreover, the KLD is reduced by about 30% at the cost of only a 1–2% increase in runtime.
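The two-stage idea in the abstract (Laplacian eigenmaps on a k-nearest-neighbor graph, followed by t-SNE) can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: `SpectralEmbedding` stands in for the Laplacian eigenmaps step, the small `load_digits` dataset stands in for MNIST, and the helper name `embed_with_preprocessing` together with the values of `n_neighbors`, `n_intermediate`, and `perplexity` are hypothetical choices that do not come from the paper.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import SpectralEmbedding, TSNE


def embed_with_preprocessing(X, n_neighbors=15, n_intermediate=10, seed=0):
    """Hypothetical helper: Laplacian eigenmaps first, then t-SNE.

    All parameter values are illustrative assumptions, not the paper's.
    """
    # Stage 1: Laplacian eigenmaps built on a k-nearest-neighbor graph.
    laplacian_eigenmaps = SpectralEmbedding(
        n_components=n_intermediate,
        affinity="nearest_neighbors",
        n_neighbors=n_neighbors,
        random_state=seed,
    )
    X_reduced = laplacian_eigenmaps.fit_transform(X)

    # Stage 2: standard t-SNE applied to the reduced representation.
    tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=seed)
    Y = tsne.fit_transform(X_reduced)
    return Y, float(tsne.kl_divergence_)


# Small 8x8 digits subset as a lightweight stand-in for MNIST (assumption).
X = load_digits().data[:500]

# Preprocessed pipeline.
Y_pre, kl_pre = embed_with_preprocessing(X)

# Baseline: t-SNE applied directly to the raw features.
baseline = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
Y_base = baseline.fit_transform(X)
kl_base = float(baseline.kl_divergence_)

print("with preprocessing:", Y_pre.shape, "KL =", kl_pre)
print("baseline t-SNE:    ", Y_base.shape, "KL =", kl_base)
```

Note that scikit-learn's `kl_divergence_` is measured against the affinities of whatever input t-SNE receives, so the two printed values are not a reproduction of the paper's reported KLD comparison; they only show where such a measurement would be read off.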

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
