Bioinformatics and Biology Insights (Nov 2019)

Advancing HIV Vaccine Research With Low-Cost High-Performance Computing Infrastructure: An Alternative Approach for Resource-Limited Settings

  • Batsirai M Mabvakure,
  • Raymond Rott,
  • Leslie Dobrowsky,
  • Peter Van Heusden,
  • Lynn Morris,
  • Cathrine Scheepers,
  • Penny L Moore

DOI
https://doi.org/10.1177/1177932219882347
Journal volume & issue
Vol. 13

Abstract

Read online

Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable by traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templates at a time. However, these large datasets can only be efficiently explored using bioinformatics analyses requiring huge data storage and computational resources adapted for high-performance processing. High-performance computing allows for efficient handling of large data and tasks that may require multi-threading and prolonged computational times, which is not feasible with ordinary computers. However, high-performance computing resources are costly and therefore not always readily available in low-income settings. We describe the establishment of an affordable high-performance computing bioinformatics cluster consisting of 3 nodes, constructed using ordinary desktop computers and open-source software including Linux Fedora, SLURM Workload Manager, and the Conda package manager. For the analysis of large antibody sequence datasets and for complex viral phylodynamic analyses, the cluster out-performed desktop computers. This has demonstrated that it is possible to construct high-performance computing capacity capable of analyzing large NGS data from relatively low-cost hardware and entirely free (open-source) software, even in resource-limited settings. Such a cluster design has broad utility beyond bioinformatics to other studies that require high-performance computing.