KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition

Mahdieh Labani; Amin Beheshti; Nigel H. Lovell; Hamid Alinejad-Rokny; Ali Afrasiabi

doi:10.3390/ijms232214418

International Journal of Molecular Sciences (Nov 2022)


Affiliations

Mahdieh Labani: Biomedical Machine Learning Lab, The Graduate School of Biomedical Engineering, University of New South Wales (UNSW), Sydney, NSW 2052, Australia
Amin Beheshti: Data Analytics Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
Nigel H. Lovell: The Graduate School of Biomedical Engineering (GSBmE), University of New South Wales (UNSW), Sydney, NSW 2052, Australia
Hamid Alinejad-Rokny: Biomedical Machine Learning Lab, The Graduate School of Biomedical Engineering, University of New South Wales (UNSW), Sydney, NSW 2052, Australia
Ali Afrasiabi: Biomedical Machine Learning Lab, The Graduate School of Biomedical Engineering, University of New South Wales (UNSW), Sydney, NSW 2052, Australia

DOI: https://doi.org/10.3390/ijms232214418
Journal volume & issue: Vol. 23, no. 22, p. 14418

Abstract

We developed KARAJ, a fast and flexible Linux command-line tool that automates the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. KARAJ takes as input a list of PMCIDs, publication URLs, or accession numbers of various types, and automates four tasks. First, it provides a summary list of the accessible datasets generated by or used in the corresponding scientific articles, enabling users to select appropriate datasets. Second, it calculates the total size of the files that users want to download and confirms that adequate space is available on the local disk. Third, it generates a metadata table containing sample information and the experimental design of the corresponding study. Finally, it enables users to download the supplementary data tables attached to publications. In addition, KARAJ provides a parallel downloading framework powered by Aspera Connect, which significantly reduces download time.
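
The disk-space check and the parallel download step described in the abstract can be illustrated with a minimal Python sketch. This is not KARAJ's code or interface: the function names (`total_size`, `has_enough_space`, `download_all`, `plan_and_download`), the 5% safety margin, and the caller-supplied `fetch` function are all assumptions made for illustration, and a standard-library thread pool stands in for the Aspera Connect transfer that the tool actually uses.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def total_size(file_sizes: Dict[str, int]) -> int:
    """Sum the byte sizes of all files selected for download."""
    return sum(file_sizes.values())


def has_enough_space(required_bytes: int, free_bytes: int, margin: float = 0.05) -> bool:
    """Return True if free space covers the request plus a safety margin.

    The 5% default margin is an illustrative assumption.
    """
    return free_bytes >= required_bytes * (1 + margin)


def download_all(names: List[str], fetch: Callable[[str], str], workers: int = 4) -> List[str]:
    """Call fetch(name) for every name in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, names))


def plan_and_download(
    file_sizes: Dict[str, int],
    fetch: Callable[[str], str],
    target_dir: str = ".",
    workers: int = 4,
) -> List[str]:
    """Check local disk space first, then download the files in parallel."""
    required = total_size(file_sizes)
    free = shutil.disk_usage(target_dir).free  # free bytes on the target volume
    if not has_enough_space(required, free):
        raise OSError(f"need {required} bytes, only {free} bytes free in {target_dir!r}")
    return download_all(list(file_sizes), fetch, workers)
```

Here `file_sizes` maps a hypothetical file identifier to its size in bytes, and `fetch` is whatever transfer routine the caller provides. Separating the size check from the transfer keeps the check testable without network access.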

Published in International Journal of Molecular Sciences

ISSN: 1661-6596 (Print); 1422-0067 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Biology (General); Science: Chemistry
Website: http://www.mdpi.com/journal/ijms
