Jisuanji kexue yu tansuo (Jul 2024)

Deep Unsupervised Fusion Feature Learning Method for Mixed Attribute Data

  • HE Huixia, WU Sen, WEI Guiying, XIE Jiayao, GAO Xiaonan

DOI
https://doi.org/10.3778/j.issn.1673-9418.2305025
Journal volume & issue
Vol. 18, no. 7
pp. 1852 – 1864

Abstract


High-quality feature representation is key to accurate data mining. Existing feature learning methods struggle to effectively capture both the associations between different attribute types and the true information contained in mixed-attribute data. To address this problem, a deep unsupervised fusion feature learning model for mixed-attribute data (DUFERM) is proposed. The model establishes a bimodal autoencoder framework that models categorical and numerical attributes along separate paths and uses a deep multimodal fusion strategy to strengthen the connection between the two attribute types. For categorical attributes, a discrete feature autoencoder based on a weighted heterogeneous network is constructed to fully exploit the structural and semantic information they contain. For numerical attributes, a continuous feature autoencoder is constructed. The two independent autoencoders are then combined in a common latent representation layer in the form of a joint representation. Finally, the fused feature representation of the mixed-attribute data is obtained through unsupervised training that combines pre-training with joint training. Extensive experiments on 10 publicly available datasets show that DUFERM outperforms existing classical and recent mixed-attribute feature learning methods in overall performance across all evaluation metrics. The model fully extracts the latent features within mixed-attribute data, yields high-quality fused feature representations, and improves the accuracy of downstream data mining tasks.
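The abstract does not give the model's equations, so what follows is only a minimal Python sketch of the general bimodal fusion structure it describes, not DUFERM itself. Several assumptions are made. The weighted heterogeneous network encoder for categorical attributes is replaced by plain one-hot encoding. The joint representation is formed by concatenating the two hidden codes and projecting them into one shared latent layer. The reconstruction loss is per-attribute cross-entropy plus mean squared error. All names (`BimodalAutoencoder`, `encode_record`, and so on) are hypothetical, the weights are random, and the pre-training and joint-training stages are omitted.

```python
# Hypothetical sketch of a bimodal autoencoder with a shared latent layer.
# NOT the DUFERM implementation: one-hot encoding stands in for the paper's
# weighted heterogeneous network encoder, and no training is performed.
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; in a real model these would be learned."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense layer without bias: returns w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def tanh_vec(x: Sequence[float]) -> Vector:
    return [math.tanh(v) for v in x]


def softmax(x: Sequence[float]) -> Vector:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def encode_record(cat_values, vocabs, num_values, means, stds) -> Tuple[Vector, Vector]:
    """One-hot encode categorical attributes and z-score numerical ones (simplified inputs)."""
    onehot: Vector = []
    for value, vocab in zip(cat_values, vocabs):
        onehot.extend(1.0 if v == value else 0.0 for v in vocab)
    num = [(x - m) / s for x, m, s in zip(num_values, means, stds)]
    return onehot, num


class BimodalAutoencoder:
    """One encoder/decoder path per attribute type, fused in a shared latent layer."""

    def __init__(self, cat_vocab_sizes: Sequence[int], num_dim: int,
                 hidden: int, latent: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.cat_vocab_sizes = list(cat_vocab_sizes)
        cat_dim = sum(self.cat_vocab_sizes)
        self.enc_cat = rand_matrix(hidden, cat_dim, rng)   # categorical path
        self.enc_num = rand_matrix(hidden, num_dim, rng)   # numerical path
        self.joint = rand_matrix(latent, 2 * hidden, rng)  # shared (joint) latent layer
        self.dec_cat = rand_matrix(cat_dim, latent, rng)
        self.dec_num = rand_matrix(num_dim, latent, rng)

    def encode(self, cat_onehot: Sequence[float], num: Sequence[float]) -> Vector:
        h_cat = tanh_vec(matvec(self.enc_cat, cat_onehot))
        h_num = tanh_vec(matvec(self.enc_num, num))
        # Assumed joint representation: concatenate both codes, then project to one latent vector.
        return tanh_vec(matvec(self.joint, h_cat + h_num))

    def loss(self, cat_onehot: Sequence[float], num: Sequence[float]) -> float:
        """Reconstruction loss: per-attribute cross-entropy plus mean squared error."""
        z = self.encode(cat_onehot, num)
        cat_logits = matvec(self.dec_cat, z)
        num_rec = matvec(self.dec_num, z)
        ce, start = 0.0, 0
        for size in self.cat_vocab_sizes:
            # Softmax over each categorical attribute's own slice of the logits.
            probs = softmax(cat_logits[start:start + size])
            target = cat_onehot[start:start + size]
            ce -= sum(t * math.log(p) for t, p in zip(target, probs))
            start += size
        mse = sum((r - x) ** 2 for r, x in zip(num_rec, num)) / max(len(num), 1)
        return ce + mse


vocabs = [["red", "green", "blue"], ["small", "large"]]
onehot, num = encode_record(["green", "large"], vocabs, [3.0, 10.0], [2.0, 8.0], [1.0, 4.0])
model = BimodalAutoencoder([len(v) for v in vocabs], num_dim=2, hidden=4, latent=3)
print(model.encode(onehot, num))  # a 3-dimensional fused code
print(model.loss(onehot, num))
```

Because both decoders read from the same latent vector, minimizing the combined loss would push that vector to carry information about both attribute types. This is the intuition behind fusing them in one joint layer rather than training two separate autoencoders.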

Keywords