IEEE Access (Jan 2020)

Collaboratively Modeling and Embedding of Latent Topics for Short Texts

  • Zheng Liu,
  • Tingting Qin,
  • Ke-Jia Chen,
  • Yun Li

DOI
https://doi.org/10.1109/ACCESS.2020.2997973
Journal volume & issue
Vol. 8
pp. 99141–99153

Abstract

Deriving a good document representation is a critical challenge in many downstream NLP tasks, especially when documents are very short. Handling the sparsity and noise problems that confront short texts is difficult. Some approaches employ latent topic models, based on global word co-occurrence, to obtain a topic distribution as the representation. Others leverage word embeddings, which capture local conditional dependencies, and represent a document as the sum of its word vectors. Unlike existing works that explore the strategy of utilizing one to help the other, i.e., topic models for word embeddings or vice versa, we propose CME-DMM, a collaborative modeling and embedding framework for capturing coherent latent topics from short texts. CME-DMM incorporates topic and word embeddings through an attention mechanism and implants them into the latent topic model, which significantly improves the quality of the latent topics. Extensive experiments demonstrate that CME-DMM discovers more coherent topics than other popular methods, resulting in better performance in downstream NLP tasks such as classification. Besides the interpretable latent topics, the corresponding topic embeddings describe the meanings of the latent topics in the semantic space, and the attention vectors, a by-product of the learning process, identify the keywords in noisy short texts.
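
The core idea the abstract describes, attention-weighted pooling of word embeddings guided by a topic embedding, can be sketched in a few lines. The snippet below is a minimal illustration under assumed names and dimensions, not the authors' implementation of CME-DMM: a topic embedding scores each word vector, and the resulting attention distribution both pools the short text into a document vector and highlights likely keywords.

    # Hypothetical sketch of topic-guided attention over word embeddings.
    # All function names, shapes, and data here are illustrative assumptions.
    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))  # subtract max for numerical stability
        return e / e.sum()

    def attended_doc_vector(word_vecs, topic_vec):
        """Pool a short text as an attention-weighted sum of its word vectors.

        word_vecs: (num_words, dim) word embeddings for one short text.
        topic_vec: (dim,) embedding of one latent topic.
        Returns the attention distribution over words and the document vector.
        """
        scores = word_vecs @ topic_vec  # dot-product relevance of each word to the topic
        attn = softmax(scores)          # distribution over words; large weights mark keywords
        doc_vec = attn @ word_vecs      # attention-weighted sum replaces a plain average
        return attn, doc_vec

    # Toy usage: 4 words with 5-dimensional embeddings and one topic embedding.
    rng = np.random.default_rng(0)
    words = rng.normal(size=(4, 5))
    topic = rng.normal(size=5)
    attn, doc = attended_doc_vector(words, topic)
    print("attention over words:", np.round(attn, 3))
    print("document vector:", np.round(doc, 3))

Compared with unweighted summation, this kind of pooling lets noisy or off-topic words receive near-zero weight, which is consistent with the abstract's claim that the attention vectors identify keywords in noisy short texts.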

Keywords