Symmetry (Mar 2022)

A Generative Model for Topic Discovery and Polysemy Embeddings on Directed Attributed Networks

  • Bianfang Chai,
  • Xinyu Ji,
  • Jianglin Guo,
  • Lixiao Ma,
  • Yibo Zheng

DOI
https://doi.org/10.3390/sym14040703
Journal volume & issue
Vol. 14, no. 4
p. 703

Abstract


Combining topic discovery with topic-specific word embeddings is a popular and powerful method for text mining in small collections of documents. However, existing approaches model only the content of documents, which leads to the discovery of noisy topics. This paper proposes a generative model, the skip-gram topical word-embedding model (abbreviated as steoLC), on asymmetric document link networks, where nodes correspond to documents and links are directed references between documents. It simultaneously improves the performance of topic discovery and of polysemous word embeddings. Each skip-gram in a document is generated based on the topic distribution of the document and the two word embeddings in the skip-gram. Each directed link is generated based on the hidden topic distribution of its source document node. For a given document, the skip-grams and links share a common topic distribution. A parameter-estimation procedure is derived, and an algorithm is designed to learn the model parameters by combining the expectation-maximization (EM) algorithm with negative sampling. Experimental results show that our method generates more useful topic-specific word embeddings and more coherent latent topics than state-of-the-art models.
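As a rough illustration of the kind of factorization the abstract describes, the sketch below writes one plausible joint likelihood in which skip-grams and links of a document share its topic distribution; the notation (θ, K, the embeddings v and u, the sigmoid σ used with negative sampling, and the link parameter φ) is assumed for illustration and is not taken from the paper.

```latex
% Minimal sketch under assumed notation (not the paper's own formulation):
% theta_{dz}: topic distribution of document d;  K: number of topics;
% S_d: skip-grams (w_t, w_c) observed in document d;  E: set of directed links;
% v_{w,z}, u_{w,z}: topic-specific target/context embeddings of word w under topic z;
% sigma: sigmoid, as used with negative sampling;  phi_{z d'}: probability that topic z links to document d'.
\[
  p(\text{skip-grams}, \text{links})
  \;=\;
  \prod_{d}
  \Bigg[
    \prod_{(w_t, w_c) \in S_d}
      \sum_{z=1}^{K} \theta_{dz}\,
      \sigma\!\left(u_{w_c,z}^{\top} v_{w_t,z}\right)
  \Bigg]
  \prod_{(d, d') \in E}
    \sum_{z=1}^{K} \theta_{dz}\, \phi_{z d'}
\]
```

Under a formulation of this shape, an EM-style learner would alternate between computing posterior responsibilities over topics for each skip-gram and link (E-step) and updating the topic-specific embeddings and topic parameters with negative-sampling gradients (M-step), in the spirit of the algorithm described in the abstract.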

Keywords