Deep Unsupervised Fusion Feature Learning Method for Mixed Attribute Data

HE Huixia, WU Sen, WEI Guiying, XIE Jiayao, GAO Xiaonan

doi:10.3778/j.issn.1673-9418.2305025

Jisuanji kexue yu tansuo (Jul 2024)

Affiliations

HE Huixia, WU Sen, WEI Guiying, XIE Jiayao, GAO Xiaonan:
1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
2. State Grid Energy Research Institute Co., Ltd., Beijing 102209, China

Journal volume & issue: Vol. 18, no. 7
pp. 1852 – 1864

Abstract

High-quality feature representation is key to accurate data mining. Existing feature learning methods struggle to extract both the associations between different attribute types and the underlying information contained in mixed-attribute data. To address this problem, a deep unsupervised fusion feature learning model for mixed-attribute data (DUFERM) is proposed. The model establishes a bimodal autoencoder framework that models categorical and numerical attributes along separate paths and uses a deep multimodal fusion strategy to strengthen the connection between the two attribute types. For categorical attributes, a discrete-feature autoencoder based on a weighted heterogeneous network is constructed to fully exploit their structural and semantic information; for numerical attributes, a continuous-feature autoencoder is constructed. The two independent autoencoders are then combined through a shared latent representation layer that forms a joint representation. Finally, the fused feature representation of the mixed-attribute data is obtained through unsupervised training that combines pre-training with joint training. Extensive experiments on 10 publicly available datasets show that DUFERM achieves better overall performance across all evaluation metrics than existing classical and recent mixed-attribute feature learning methods. It can fully extract the latent features within mixed-attribute data, produce high-quality fused feature representations, and improve the accuracy of downstream data mining tasks.
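The bimodal structure described in the abstract (a categorical path and a numerical path fused in one shared latent layer, trained on a combined reconstruction objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Assumptions: plain one-hot encoding stands in for the weighted-heterogeneous-network categorical encoder, whose construction is not reproduced here; the layer sizes, activations, equal loss weighting, and all names (`one_hot`, `softmax_blocks`, `BimodalAutoencoder`, etc.) are hypothetical; only the forward pass and loss are shown, not the pre-training and joint-training procedure.

```python
import numpy as np

# Hypothetical sketch: layer sizes, activations and helper names are illustrative
# assumptions, not the DUFERM authors' implementation.
rng = np.random.default_rng(0)


def one_hot(cat_data):
    """One-hot encode each categorical column.

    Simplification: stands in for the paper's weighted-heterogeneous-network
    encoding of categorical attributes, which is not reproduced here.
    Returns the encoded matrix and the number of distinct values per column.
    """
    blocks, sizes = [], []
    for col in cat_data.T:
        values = np.unique(col)
        blocks.append((col[:, None] == values[None, :]).astype(float))
        sizes.append(len(values))
    return np.hstack(blocks), sizes


def dense(n_in, n_out):
    """Small random weight matrix and zero bias for one fully connected layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def softmax_blocks(logits, sizes):
    """Apply a separate softmax to each categorical attribute's block of columns."""
    out, start = [], 0
    for s in sizes:
        z = logits[:, start:start + s]
        z = np.exp(z - z.max(axis=1, keepdims=True))  # subtract max for stability
        out.append(z / z.sum(axis=1, keepdims=True))
        start += s
    return np.hstack(out)


class BimodalAutoencoder:
    """Two input paths (categorical, numerical) fused in one shared latent layer."""

    def __init__(self, cat_dim, num_dim, sizes, hidden=8, latent=4):
        self.sizes, self.hidden = sizes, hidden
        self.enc_cat = dense(cat_dim, hidden)       # categorical (discrete) path
        self.enc_num = dense(num_dim, hidden)       # numerical (continuous) path
        self.joint = dense(2 * hidden, latent)      # shared joint representation
        self.dec_shared = dense(latent, 2 * hidden)
        self.dec_cat = dense(hidden, cat_dim)
        self.dec_num = dense(hidden, num_dim)

    def encode(self, x_cat, x_num):
        h_cat = relu(x_cat @ self.enc_cat[0] + self.enc_cat[1])
        h_num = relu(x_num @ self.enc_num[0] + self.enc_num[1])
        h = np.hstack([h_cat, h_num])                       # concatenate both paths
        return np.tanh(h @ self.joint[0] + self.joint[1])   # fused features

    def decode(self, z):
        h = relu(z @ self.dec_shared[0] + self.dec_shared[1])
        h_cat, h_num = h[:, :self.hidden], h[:, self.hidden:]
        p_cat = softmax_blocks(h_cat @ self.dec_cat[0] + self.dec_cat[1], self.sizes)
        x_num_hat = h_num @ self.dec_num[0] + self.dec_num[1]
        return p_cat, x_num_hat

    def loss(self, x_cat, x_num):
        """Cross-entropy on categorical blocks plus MSE on numerical columns."""
        z = self.encode(x_cat, x_num)
        p_cat, x_num_hat = self.decode(z)
        ce = -np.sum(x_cat * np.log(p_cat + 1e-12)) / len(x_cat)
        mse = np.mean((x_num - x_num_hat) ** 2)
        return ce + mse, z


# Toy mixed-attribute data: two categorical and two numerical columns.
cat = np.array([["red", "S"], ["blue", "M"], ["red", "L"]])
num = np.array([[1.0, 0.5], [2.0, 1.5], [0.0, 0.2]])
x_cat, sizes = one_hot(cat)
x_num = (num - num.mean(axis=0)) / num.std(axis=0)  # standardize numerical columns
model = BimodalAutoencoder(x_cat.shape[1], x_num.shape[1], sizes)
total, z = model.loss(x_cat, x_num)
print(z.shape)  # (3, 4)
```

In a full version, the pre-training and joint training mentioned in the abstract would minimize `total` with respect to all weights, typically with an automatic-differentiation framework; after training, `model.encode` would supply the fused features to downstream mining tasks. Using a separate softmax per categorical attribute keeps each attribute's reconstruction a proper probability distribution over its own values.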

Keywords: mixed attribute data; fusion feature learning; unsupervised; data mining

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org