IEEE Access (Jan 2020)
A New Approach to Improve the Topological Stability in Non-Linear Dimensionality Reduction
Abstract
Dimensionality reduction in the machine learning field mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new non-linear techniques for dimensionality reduction based on the assumption that data laying on or near a complex low-dimensional manifold is embedded in the high-dimensional space. On the other side, new techniques for dimensionality reduction aim to identify and extracting the manifold from the high-dimensional space. Isomap is one of widely-used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). The Isomap chooses the k-nearest neighbors based on the distance only which causes bridges and topological instability. In this paper, we propose a new algorithm to find the nearest neighbors to optimize the number of short-circuit errors and thus improve the topological stability. We assume that any point on the manifold and its nearest neighbors form a vector subspace and the orthogonal to that subspace is orthogonal to all vectors spans the vector subspace. Therefore, in the proposed algorithm, the point on the manifold and its nearest neighbors (i.e. the selected number of nearest neighbors based on the required subspace’s dimension) are used to find the bases of the subspace and the orthogonal to that subspace which belongs to the orthogonal complementary subspace. Then, candidate neighbors points will be added to the nearest neighbors based on the distance and the angle between each candidate point and the orthogonal to the subspace. The new algorithm is tested using low dimensional (3D) synthesized datasets and high dimensional real datasets. The superior performance of the new algorithm in choosing the nearest neighbors is confirmed through visually inspecting its capability find to the correct $2D$ representation of the synthesized datasets at different $k$ . Moreover, In the case of the high dimensional real datasets, the new algorithm yields a lower residual variance than the standard Isomap. Finally, we investigated using the new algorithm to find the nearest neighbors for the Locally Linear Embedding (LLE) algorithm. We tested the LLE variant using synthesized datasets and high dimensional real datasets and the results were promising
Keywords