Genome Biology (Feb 2022)

Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software

  • Paul P. Gardner,
  • James M. Paterson,
  • Stephanie McGimpsey,
  • Fatemeh Ashari-Ghomi,
  • Sinan U. Umu,
  • Aleksandra Pawlik,
  • Alex Gavryushkin,
  • Michael A. Black

DOI
https://doi.org/10.1186/s13059-022-02625-x
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 13

Abstract

Background: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade accuracy for speed may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software.

Results: We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-the-road in terms of accuracy and speed trade-offs.

Conclusions: Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish, possibly due to author, editor and reviewer practices. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High-accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate.