Benchmarking RNA-Seq Aligners at Base-Level and Junction Base-Level Resolution Using the <i>Arabidopsis thaliana</i> Genome

Tallon Coxe; David J. Burks; Utkarsh Singh; Ron Mittler; Rajeev K. Azad

doi:10.3390/plants13050582

Plants (Feb 2024)

Affiliations

Tallon Coxe: Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA
David J. Burks: Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA
Utkarsh Singh: Texas Academy of Mathematics and Science, University of North Texas, Denton, TX 76203, USA
Ron Mittler: The Division of Plant Science and Technology, and Interdisciplinary Plant Group, College of Agriculture, Food and Natural Resources, Christopher S. Bond Life Sciences Center, University of Missouri, 1201 Rollins St., Columbia, MO 65201, USA
Rajeev K. Azad: Department of Biological Sciences and BioDiscovery Institute, College of Science, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA

DOI: https://doi.org/10.3390/plants13050582
Journal volume & issue: Vol. 13, no. 5, p. 582

Abstract

The ultimate goal in selecting RNA-Seq alignment software is accurate alignment with a robust algorithm capable of handling the various intricacies of read mapping. Most alignment tools are pre-tuned with human or prokaryotic data and may therefore be unsuitable for other organisms, such as plants. The rapidly growing plant RNA-Seq databases call for an assessment of alignment tools on curated plant data, which will aid the calibration of these tools for plant transcriptomic data. We therefore benchmarked RNA-Seq read alignment tools using simulated data derived from the model organism <i>Arabidopsis thaliana</i>. We assessed five currently available RNA-Seq alignment tools that are popular as judged by usage (citation count). By introducing annotated single nucleotide polymorphisms (SNPs) from The Arabidopsis Information Resource (TAIR), we recorded alignment accuracy at both base-level and junction base-level resolution for each tool. In addition to assessing the tools at their default settings, we recorded accuracies while varying numerous parameters, including the confidence threshold and the level of SNP introduction. At the base level, aligner performance was consistent across testing conditions; the junction base-level assessment, however, produced results that varied with the algorithm applied. At the read base level, STAR outperformed the other aligners, with overall accuracy exceeding 90% under different test conditions. At the junction base level, SubRead emerged as the most promising aligner, with overall accuracy above 80% under most test conditions.
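
The two resolutions named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not define the metrics, so the sketch assumes that base-level accuracy is the fraction of simulated read bases placed at their true genomic coordinate, and that junction base-level accuracy is the same fraction restricted to reads spanning a splice junction. The function names, the dictionary layout, and the toy coordinates are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical layout: read ID -> genomic coordinate of each base in the read.
# A coordinate of None means the aligner left that base unaligned.
Positions = Dict[str, List[Optional[int]]]


def base_level_accuracy(truth: Positions, aligned: Positions) -> float:
    """Fraction of simulated bases mapped to their true coordinate (assumed definition)."""
    total = 0
    correct = 0
    for read_id, true_positions in truth.items():
        predicted = aligned.get(read_id, [])
        for i, true_pos in enumerate(true_positions):
            # Bases missing from the aligner output count as incorrect.
            pred_pos = predicted[i] if i < len(predicted) else None
            total += 1
            if pred_pos is not None and pred_pos == true_pos:
                correct += 1
    return correct / total if total else 0.0


def junction_base_level_accuracy(
    truth: Positions, aligned: Positions, junction_reads: Iterable[str]
) -> float:
    """Base-level accuracy restricted to reads that span a splice junction (assumed definition)."""
    subset = {r: truth[r] for r in junction_reads if r in truth}
    return base_level_accuracy(subset, aligned)


# Toy data: r1 spans a junction (its last base lies past an intron); r2 does not.
truth = {"r1": [100, 101, 102, 203], "r2": [10, 11, 12, 13]}
# The hypothetical aligner misses the junction in r1 and extends into the intron.
aligned = {"r1": [100, 101, 102, 103], "r2": [10, 11, 12, 13]}

overall = base_level_accuracy(truth, aligned)
junction = junction_base_level_accuracy(truth, aligned, ["r1"])
print(f"base-level: {overall:.3f}, junction base-level: {junction:.3f}")
```

In this toy case a single misplaced base barely moves the overall figure but lowers the junction-restricted one noticeably, which is one way the two assessments can rank aligners differently.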

Published in Plants

ISSN: 2223-7747 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Botany
Website: http://www.mdpi.com/journal/plants
