Combination of Bayesian and Latent Semantic Analysis with Domain Specific Knowledge

Shen Lu; Richard S. Segall

Journal of Systemics, Cybernetics and Informatics (Jun 2016)

Journal volume & issue: Vol. 14, no. 3
pp. 43 – 50

Abstract

With the development of information technology, electronic publications have become popular. However, retrieving information from electronic publications is a challenge because of the large number of words, the synonymy problem, and the polysemy problem. In this paper, we introduce a new algorithm called Bayesian Latent Semantic Analysis (BLSA). We chose to model text based not on terms but on associations between words. In addition, the significance of interesting features was improved by expanding the number of similar terms with glossaries. Latent Semantic Analysis (LSA) was chosen to discover significant features. Bayesian posterior probability was used to discover segmentation boundaries. A Dirichlet distribution was chosen to represent the vector of topic distribution and to calculate the maximum probability of the topics. Experimental results showed that both Pk [8] and WindowDiff [27] decreased by 10% when using BLSA in comparison to Lexical Cohesion on the original data.

[8] Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. (1990). Indexing by latent semantic analysis, Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391-407.
[27] Pevzner, L. and Hearst, M.A. (2002). A critique and improvement of an evaluation metric for text segmentation, Computational Linguistics, vol. 28, no. 1, pp. 19-36.
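
The abstract reports results in terms of Pk and WindowDiff, two window-based error rates for text segmentation where lower values are better. The following is a minimal sketch of how such metrics are commonly computed, not the authors' implementation. It assumes a segmentation is encoded as a list of 0/1 flags (1 meaning a boundary follows that text unit) and that the window size `k` is supplied by the caller; the function names and this encoding are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def _check(ref: Sequence[int], hyp: Sequence[int], k: int) -> None:
    # Both segmentations must cover the same text units.
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have equal length")
    if not 1 <= k <= len(ref):
        raise ValueError("window size k must be between 1 and len(ref)")


def pk(ref: Sequence[int], hyp: Sequence[int], k: int) -> float:
    """Fraction of windows where ref and hyp disagree on whether
    the window contains any boundary at all."""
    _check(ref, hyp, k)
    windows = len(ref) - k + 1
    errors = 0
    for i in range(windows):
        ref_split = any(ref[i:i + k])
        hyp_split = any(hyp[i:i + k])
        errors += ref_split != hyp_split
    return errors / windows


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: int) -> float:
    """Fraction of windows where ref and hyp contain a different
    number of boundaries."""
    _check(ref, hyp, k)
    windows = len(ref) - k + 1
    errors = 0
    for i in range(windows):
        errors += sum(ref[i:i + k]) != sum(hyp[i:i + k])
    return errors / windows


if __name__ == "__main__":
    reference = [0, 1, 0, 0]   # hypothetical gold segmentation
    hypothesis = [1, 1, 0, 0]  # hypothetical output with one extra boundary
    print(pk(reference, hypothesis, k=2))
    print(window_diff(reference, hypothesis, k=2))
```

In this toy example the extra boundary goes unnoticed by Pk (score 0) because the affected window already contains a reference boundary, while WindowDiff counts it (score 1/3). This illustrates why the two metrics are often reported together.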

Published in Journal of Systemics, Cybernetics and Informatics

ISSN: 1690-4532 (Print); 1690-4524 (Online)
Publisher: International Institute of Informatics and Cybernetics
Country of publisher: United States
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Language and Literature: Philology. Linguistics: Communication. Mass media
Website: http://www.iiisci.org/journal/sci/Home.asp
