Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows

Sztromwasser Paweł; Petersen Kjell; Puntervoll Pál

doi:10.1515/jib-2011-163

Journal of Integrative Bioinformatics (Jun 2011)

Affiliations

Sztromwasser Paweł: Department of Informatics, University of Bergen, http://www.uib.no/ii, Norway
Petersen Kjell: Computational Biology Unit, Uni Computing, Uni Research, http://www.computing.uni.no/, Norway
Puntervoll Pál: Computational Biology Unit, Uni Computing, Uni Research, http://www.computing.uni.no/, Norway

DOI: https://doi.org/10.1515/jib-2011-163
Journal volume & issue: Vol. 8, no. 2
pp. 95 – 114

Abstract

Biological databases and computational biology tools are provided by research groups around the world and made accessible on the Web. Combining these resources is common practice in bioinformatics, but integrating heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has commonly been addressed pragmatically, through tedious and error-prone scripting. Recently, however, a more reliable technique has been identified and proposed as the platform to tie together bioinformatics resources: Web Services. Over the last decade, Web Services have spread widely in bioinformatics and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data partitioning lowers the resource demands of services and increases their throughput, which in turn makes it possible to execute in silico experiments at genome scale using standard SOAP Web Services and workflows. As a proof of principle, we annotated an RNA-seq dataset using a plain BPEL workflow engine.
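
The data-partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and involves no SOAP or BPEL: the functions `partition`, `run_partitioned`, `fake_translate`, and `fake_annotate` are hypothetical names introduced here, and the two "services" are local stand-ins for remote operations. The sketch only shows the general pattern of splitting a large input into fixed-size partitions and passing each partition through the pipeline steps in turn, so that no step ever receives the whole dataset in one message.

```python
from typing import Callable, Iterable, Iterator, List


def partition(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, partition
        yield batch


def run_partitioned(
    records: Iterable[str],
    size: int,
    pipeline: List[Callable[[List[str]], List[str]]],
) -> Iterator[str]:
    """Send each partition through every pipeline step, yielding results as they arrive."""
    for batch in partition(records, size):
        for step in pipeline:
            batch = step(batch)  # in a real workflow this would be a service call
        yield from batch


# Hypothetical local stand-ins for remote service operations (assumptions).
def fake_translate(batch: List[str]) -> List[str]:
    return [seq.upper() for seq in batch]


def fake_annotate(batch: List[str]) -> List[str]:
    return [f"{seq}|len={len(seq)}" for seq in batch]


if __name__ == "__main__":
    sequences = ["acgt", "ttga", "ccg", "a", "gattaca"]
    for line in run_partitioned(sequences, 2, [fake_translate, fake_annotate]):
        print(line)
```

Because both functions are generators, only one partition is held in memory at a time, which is the property the abstract associates with lower resource demands on the services. The partition size here is an arbitrary illustrative value.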

Published in Journal of Integrative Bioinformatics

ISSN: 1613-4516 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.degruyter.com/view/j/jib
