A novel autoencoder approach to feature extraction with linear separability for high-dimensional data

Jian Zheng; Hongchun Qu; Zhaoni Li; Lin Li; Xiaoming Tang; Fei Guo

doi:10.7717/peerj-cs.1061

PeerJ Computer Science (Aug 2022)

Affiliations

Jian Zheng: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Hongchun Qu: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Zhaoni Li: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Lin Li: College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Xiaoming Tang: College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China
Fei Guo: College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, China

DOI: https://doi.org/10.7717/peerj-cs.1061
Journal volume & issue: Vol. 8, p. e1061

Abstract

Feature extraction usually relies on sufficient information in the input data. However, data distributed in a high-dimensional space are too sparse to provide that information. High dimensionality also complicates the search for features scattered across subspaces. Feature extraction from high-dimensional data is therefore a difficult task. To address this issue, this article proposes a novel autoencoder method that uses a Mahalanobis distance metric with a rescaling transformation. The key idea is that applying this metric reduces the difference between the reconstructed distribution and the original distribution, which improves the autoencoder's ability to extract features. Results show that the proposed approach outperforms state-of-the-art methods in both the accuracy of feature extraction and the linear separability of the extracted features. We find that distance metric-based methods are better suited than feature selection-based methods to extracting linearly separable features from high-dimensional data. In a high-dimensional space, evaluating feature similarity is relatively easier than evaluating feature importance, so distance metric methods, which evaluate feature similarity, have an advantage over feature selection methods, which assess feature importance. Evaluating feature importance, however, is more computationally efficient than evaluating feature similarity.
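
The abstract does not give the exact loss used in the article, so the following is only a minimal sketch of how a Mahalanobis distance could score the gap between an input batch and its reconstruction. The function name `mahalanobis_reconstruction_error`, the `eps` ridge term, and the choice to estimate the covariance from the input batch are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def mahalanobis_reconstruction_error(x, x_hat, eps=1e-6):
    """Mean squared Mahalanobis distance between rows of x and x_hat.

    Illustrative only: the covariance is estimated from x (an assumption),
    and eps is a hypothetical ridge term that keeps it invertible.
    Expects x and x_hat of shape (n_samples, n_features), n_features >= 2.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
    diff = x - x_hat
    # Solve cov @ z = diff.T rather than forming an explicit inverse.
    z = np.linalg.solve(cov, diff.T)        # shape (n_features, n_samples)
    d2 = np.einsum("ij,ji->i", diff, z)     # per-sample squared distance
    return float(d2.mean())
```

Because the distance is normalised by the covariance, multiplying each feature by a nonzero constant in both `x` and `x_hat` leaves the score essentially unchanged (exactly so when `eps=0`), which is one reason such a metric may suit features measured on very different scales.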

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
