Probabilistic Latent Semantic Analysis Applied to Whole Bacterial Genomes Identifies Common Genomic Features

Rusakovica J.; Hallinan J.; Wipat A.; Zuliani P.

doi:10.1515/jib-2014-243

Journal of Integrative Bioinformatics (Jun 2014)

Affiliations

Rusakovica J.: School of Computing Science, and Centre for Synthetic Biology and Bioexploitation, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom of Great Britain and Northern Ireland
Hallinan J.: School of Computing Science, and Centre for Synthetic Biology and Bioexploitation, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom of Great Britain and Northern Ireland
Wipat A.: School of Computing Science, and Centre for Synthetic Biology and Bioexploitation, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom of Great Britain and Northern Ireland
Zuliani P.: School of Computing Science, and Centre for Synthetic Biology and Bioexploitation, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom of Great Britain and Northern Ireland

Journal volume & issue: Vol. 11, no. 2
pp. 93–105

Abstract

The spread of drug resistance amongst clinically important bacteria is a serious and growing problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and the subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. Whole-genome data are, however, very high-dimensional, and to overcome this problem a range of dimensionality-reduction techniques have been developed. One such family of approaches is latent-variable models [2], in which high-dimensional data are represented by a few hidden, or latent, variables that are not observed directly but are inferred from the observed variables in the model. Probabilistic Latent Semantic Analysis (PLSA), also known as Probabilistic Latent Semantic Indexing (PLSI), is an extension of Latent Semantic Analysis (LSA) [3]. PLSA is based on a mixture decomposition derived from a latent class model. As in LSA, the main objective of the algorithm is to represent high-dimensional co-occurrence information in a lower-dimensional form in order to discover the hidden semantic structure of the data, but here within a probabilistic framework.
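To make the mixture decomposition concrete, the sketch below fits the standard symmetric PLSA model, P(d, w) = sum over z of P(z) P(d|z) P(w|z), by expectation-maximisation (EM) on a matrix of co-occurrence counts n(d, w). The abstract does not say how genomes are turned into "words"; purely as an illustrative assumption, each genome is treated as a document and its overlapping k-mers as words. The helper names kmer_counts, plsa and log_likelihood, the toy sequences and all parameter values are hypothetical, and this is a minimal sketch, not the authors' implementation.

```python
import math
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA string (assumed 'word' definition)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def _normalise(values):
    """Scale non-negative numbers to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit the symmetric PLSA model P(d, w) = sum_z P(z) P(d|z) P(w|z) by EM.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z, p_d_given_z, p_w_given_z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation of the conditional distributions.
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [_normalise([rng.random() for _ in range(n_docs)]) for _ in range(n_topics)]
    p_w_z = [_normalise([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts n(d, w) * P(z | d, w), accumulated for the M-step.
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over latent classes, P(z | d, w).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                posterior = _normalise(joint)
                for z in range(n_topics):
                    r = n * posterior[z]
                    acc_z[z] += r
                    acc_d[z][d] += r
                    acc_w[z][w] += r
        # M-step: re-estimate each distribution from the expected counts.
        p_z = _normalise(acc_z)
        p_d_z = [_normalise(row) for row in acc_d]
        p_w_z = [_normalise(row) for row in acc_w]
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Sum over observed pairs of n(d, w) * log P(d, w) under the fitted model."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(len(p_z)))
                ll += n * math.log(p)
    return ll


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real genomes.
    genomes = ["ATGCGATACGCTTGA", "ATGCGATACGGTTGA", "TTTTAAAACCCCGGGG", "TTTAAAACCCGGGGTT"]
    profiles = [kmer_counts(g, 3) for g in genomes]
    vocab = sorted(set().union(*profiles))
    matrix = [[p[kmer] for kmer in vocab] for p in profiles]
    p_z, p_d_z, p_w_z = plsa(matrix, n_topics=2)
    for d in range(len(genomes)):
        # P(z | d) is proportional to P(z) P(d | z).
        p_z_d = _normalise([p_z[z] * p_d_z[z][d] for z in range(2)])
        print(d, [round(p, 3) for p in p_z_d])
    print("log-likelihood:", round(log_likelihood(matrix, p_z, p_d_z, p_w_z), 3))
```

Each genome is thereby summarised by a low-dimensional vector P(z|d) over the latent classes instead of its full k-mer profile. Because EM only guarantees a local maximum of the likelihood, one would typically run several random restarts (different seed values) and keep the best fit; the number of latent classes, n_topics, is likewise a modelling choice that the sketch does not determine.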

Published in Journal of Integrative Bioinformatics

ISSN: 1613-4516 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.degruyter.com/view/j/jib

About the journal