Applied Mathematics and Nonlinear Sciences (Jan 2024)

Research on Key Technologies of Deep Learning Techniques in Unstructured Data Processing

  • Zhang Guorong,
  • Fu Chengli,
  • Zhou Huiqin

DOI
https://doi.org/10.2478/amns-2024-3175
Journal volume & issue
Vol. 9, no. 1

Abstract

The rise of the Internet has brought rapid growth in unstructured data recorded as text and audio. This study applies deep learning to unstructured data processing and proposes two key techniques for processing text data. First, a Transformer feature extractor is used to produce dynamic word vectors, and an MCNN neural network is combined with it to screen key information, yielding a text classification model based on the MCNN-Transformer. Second, text features extracted by the BERT model are fed into a VAE-GRU module and combined with a self-attention mechanism and the K-Means algorithm to construct a text clustering model based on VAE-GRU. In the experimental analysis, the MCNN-Transformer model achieves high accuracy and a Macro-F1 value exceeding 0.880, and it is superior to the other text classification models. The ACC and NMI results of the VAE-GRU model are both greater than 70% on the Stack Overflow and SearchSnippets datasets and greater than 48% on the Chinese dataset, and the model outperforms the three ablation models by 15.03% to 85.67%. The MCNN-Transformer and VAE-GRU models are therefore capable of competent classification and clustering of unstructured text data, which helps improve the efficiency with which the information in unstructured data is understood and used.
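
For readers unfamiliar with the Macro-F1 metric reported above, the following is a minimal sketch of how it is commonly computed: the F1 score is calculated separately for each class and then averaged with equal weight. The function name `macro_f1` and the toy labels are illustrative assumptions; the abstract does not describe the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not from the paper)."""
    # Score every class that appears in either the gold or the predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the gold class was different
            fn[t] += 1  # gold class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two classes (hypothetical data).
gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, Macro-F1 penalizes poor performance on rare classes more than plain accuracy does, which is one reason it is often reported alongside accuracy for text classification.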

Keywords