MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses

Ranko Gacesa; Jurica Zucko; Solveig K. Petursdottir; Elisabet Eik Gudmundsdottir; Olafur H. Fridjonsson; Janko Diminic; Paul F. Long; John Cullum; Daslav Hranueli; Gudmundur O. Hreggvidsson; Antonio Starcevic

doi:10.17113/ftb.55.02.17.4749

Food Technology and Biotechnology (Jan 2017)

Affiliations

Ranko Gacesa: SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Jurica Zucko: SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Solveig K. Petursdottir: Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Elisabet Eik Gudmundsdottir: Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Olafur H. Fridjonsson: Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Janko Diminic: SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Paul F. Long: Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, UK
John Cullum: Department of Genetics, University of Kaiserslautern, Postfach 3049, DE-67653 Kaiserslautern, Germany
Daslav Hranueli: SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
Gudmundur O. Hreggvidsson: Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
Antonio Starcevic: SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia

DOI: https://doi.org/10.17113/ftb.55.02.17.4749
Journal volume & issue: Vol. 55, no. 2, pp. 251–257

Abstract

The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries, and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel ‘functional’ assembly was developed, in which the reads are screened with the HMM profiles before assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes, and an example illustrates how combining HMM and BLAST analyses helps to identify interesting genes. A variety of proteome and metagenome databases have been generated by MEGGASENSE.
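
The ordering behind the ‘functional’ assembly (screen the reads first, assemble only the hits) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the MEGGASENSE implementation: every function name is hypothetical, a simple motif test stands in for screening with HMM profiles, and a greedy overlap merger stands in for a real assembler.

```python
from typing import Callable, List, Optional


def merge_overlap(a: str, b: str, min_overlap: int = 3) -> Optional[str]:
    """Append b to a if a suffix of a equals a prefix of b; otherwise None."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return None


def toy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Hypothetical stand-in for an assembler: greedy left-to-right merging."""
    contigs: List[str] = []
    for read in reads:
        for i, contig in enumerate(contigs):
            merged = merge_overlap(contig, read, min_overlap)
            if merged is not None:
                contigs[i] = merged
                break
        else:
            # No existing contig overlaps this read, so it starts a new one.
            contigs.append(read)
    return contigs


def functional_assembly(
    reads: List[str],
    is_hit: Callable[[str], bool],
    assemble: Callable[[List[str]], List[str]],
) -> List[str]:
    """Screen reads first, then assemble only the reads that passed."""
    selected = [read for read in reads if is_hit(read)]
    return assemble(selected)


# Assumption: a motif lookup replaces the HMM profile search of the real platform.
MOTIFS = ("GCC", "TTA")


def motif_hit(read: str) -> bool:
    return any(motif in read for motif in MOTIFS)


reads = ["ATGGCC", "GCCTTA", "TTTTTT", "TTAGGA"]
contigs = functional_assembly(reads, motif_hit, toy_assemble)
print(contigs)
```

In this toy run the read `TTTTTT` matches no motif and is discarded before assembly, so the assembler only sees reads already flagged as functionally relevant.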

Published in Food Technology and Biotechnology

ISSN: 1330-9862 (Print); 1334-2606 (Online)
Publisher: University of Zagreb Faculty of Food Technology and Biotechnology
Country of publisher: Croatia
LCC subjects: Technology: Chemical technology: Biotechnology; Technology: Chemical technology: Food processing and manufacture
Website: https://www.ftb.com.hr/
