LUSTR: a new customizable tool for calling genome-wide germline and somatic short tandem repeat variants
Jinfeng Lu,
Camilo Toro,
David R. Adams,
Undiagnosed Diseases Network,
Cristiane Araujo Martins Moreno,
Wan-Ping Lee,
Yuk Yee Leung,
Mathew B. Harms,
Badri Vardarajan,
Erin L. Heinzen
Affiliations
Jinfeng Lu
Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill
Camilo Toro
NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health
David R. Adams
NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health
Undiagnosed Diseases Network
NIH Undiagnosed Diseases Program, National Human Genome Research Institute (NHGRI), National Institutes of Health
Cristiane Araujo Martins Moreno
Neurology Department, Universidade de São Paulo
Wan-Ping Lee
Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
Yuk Yee Leung
Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
Mathew B. Harms
Department of Neurology, Division of Neuromuscular Medicine, Columbia University Irving Medical Center
Badri Vardarajan
The Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Gertrude H. Sergievsky Center, Department of Neurology, College of Physicians and Surgeons, Columbia University, The New York Presbyterian Hospital
Erin L. Heinzen
Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill
Abstract
Background
Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent to which STRs contribute to disease is likely underestimated because of the challenges of calling these variants in short-read next-generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully addresses all of the complexities associated with this variant class.
Results
Here we introduce LUSTR, which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, offering customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases.
Conclusions
LUSTR offers a powerful and user-friendly approach for identifying STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.