SoftwareX (Jun 2022)

NSDPY: A python package to download DNA sequences from NCBI

  • Raphaël Hebert,
  • Emese Meglécz

Journal volume & issue
Vol. 18
p. 101038

Abstract

Downloading large batches of DNA sequences can be useful for creating custom databases containing, for example, sequences of a particular genomic region or of a group of organisms. These sequences can be found in NCBI databases and accessed via a web browser (GUI) or directly via the NCBI API. While the GUI is user-friendly, it lacks certain functionalities. At the other extreme, the API is flexible but requires coding knowledge. NSDPY is a Python package that combines flexibility and ease of use for downloading large numbers of DNA sequences. It offers several taxonomic and filtering options, such as batch downloading sequences for a list of taxa, downloading sequences together with their taxonomic lineage, or filtering CDS sequences for a specific gene. NSDPY is available on PyPI. It is written to minimize dependencies on other packages and to be run directly from the terminal with simple command lines, so that most users can use it without prior coding experience.
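
To illustrate the kind of coding that direct API access involves, the following is a minimal sketch of batch downloading through the NCBI E-utilities, using only the Python standard library. It is not NSDPY's implementation: the function names, the batch size, and the example search term are assumptions made for illustration. The sketch only builds request URLs and parses an ESearch reply, so it runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide"):
    """Build an ESearch URL that keeps the result set on the NCBI history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_esearch(xml_text):
    """Return (hit count, QueryKey, WebEnv) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return (
        int(root.findtext("Count")),
        root.findtext("QueryKey"),
        root.findtext("WebEnv"),
    )


def efetch_urls(count, query_key, webenv, db="nucleotide", batch_size=200):
    """Yield one EFetch URL per batch of FASTA records (batch size is an assumption)."""
    for start in range(0, count, batch_size):
        params = {
            "db": db,
            "query_key": query_key,
            "WebEnv": webenv,
            "rettype": "fasta",
            "retmode": "text",
            "retstart": start,
            "retmax": batch_size,
        }
        yield f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical search term, chosen only as an example.
    print(esearch_url("COX1[Gene] AND Aves[Organism]"))
```

In practice each URL would be requested in turn (for example with `urllib.request.urlopen`) while respecting NCBI's request-rate limits, and the returned FASTA text would be appended to a local file.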

Keywords