IEEE Access (Jan 2021)

Topic-Document Inference With the Gumbel-Softmax Distribution

  • Amit Kumar,
  • Nazanin Esmaili,
  • Massimo Piccardi

DOI: https://doi.org/10.1109/ACCESS.2020.3046607
Journal volume & issue: Vol. 9, pp. 1313–1320

Abstract


Topic modeling is an important application of natural language processing (NLP) that automatically identifies the main topics of a given, typically large, collection of documents. In addition to identifying the collection's main topics, topic modeling infers which combination of topics each individual document addresses (the so-called topic-document inference), which can be useful for classifying and organizing the documents. However, the distributional assumptions for this inference are typically restricted to the Dirichlet family, which can limit the model's performance. For this reason, in this paper we propose modeling the topic-document inference with the Gumbel-Softmax distribution, a distribution recently introduced to enable differentiable sampling of categorical variables in deep networks. To build a performant system, the proposed approach integrates Gumbel-Softmax topic-document inference into a state-of-the-art topic model based on a deep variational autoencoder. Experimental results over two benchmark datasets show that the proposed approach outperforms the original deep variational autoencoder and other popular topic models in terms of test-set perplexity and two topic coherence measures.
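As a rough illustration of the sampling step the abstract refers to, the NumPy sketch below draws a relaxed (Gumbel-Softmax) topic-proportion vector from a set of logits. It is a generic sketch of the standard Gumbel-Softmax construction, not the authors' variational autoencoder; the function name, temperature value, and five-topic example are illustrative assumptions.

import numpy as np

def sample_gumbel_softmax(logits, temperature=0.5, rng=None):
    """Draw one Gumbel-Softmax (Concrete) sample from unnormalized logits.

    Lower temperatures push the sample toward a one-hot vector;
    higher temperatures make it closer to uniform.
    """
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise via inverse CDF: -log(-log(U)), U ~ Uniform(0, 1)
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    perturbed = (logits + gumbel_noise) / temperature
    # Numerically stable softmax of the temperature-scaled, perturbed logits
    exp = np.exp(perturbed - perturbed.max())
    return exp / exp.sum()

# Example: a relaxed topic-proportion vector for a hypothetical 5-topic model
theta = sample_gumbel_softmax(np.log(np.array([0.1, 0.4, 0.2, 0.2, 0.1])))
print(theta, theta.sum())  # sums to 1; low temperature gives a near-one-hot vector

Because the sample is a differentiable function of the logits, gradients can flow through it during training, which is what allows such a relaxation to replace a Dirichlet-based inference step inside a variational autoencoder.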

Keywords