Genome Biology (Jan 2024)
Accounting for diverse transposable element landscapes is key to developing and evaluating accurate de novo annotation strategies
Abstract
Transposable elements (TEs) are important drivers of genome evolution. Nonetheless, TE annotation remains a complex and challenging task. As more genomes from phylogenetically diverse species are published, a comprehensive pipeline for accurate annotation of diverse TEs is increasingly important. Recently, Ou et al. (Genome Biol. 20:275, 2019) developed a new comprehensive pipeline, Extensive De novo Transposable element Annotator (EDTA), and benchmarked its performance on the genomes of three species: maize, wheat, and fruit fly. Because TE landscapes can vary tremendously across species, we tested EDTA’s performance on four additional genomes with different TE landscapes: mouse, zebrafish, zebra finch, and chicken. Our analysis reveals that EDTA faces challenges with repeat classification in these genomes and underperforms overall relative to its benchmark dataset. Notably, EDTA consistently misclassifies non-LTR retrotransposons as DNA transposons, resulting in erroneous TE annotations for species with considerable repertoires of non-LTR retrotransposons. Overall, we set expectations for EDTA’s performance on genomes spanning additional diversity, urge caution when using EDTA on genomes whose TE repertoires diverge from those of the species on which it was initially benchmarked, and hope to motivate the development of methods that are robust to both the diversity of TEs and the diversity of TE landscapes observed across species.