Frontiers in Genetics (Nov 2021)

Copy Number Variation Identification on 3,800 Alzheimer’s Disease Whole Genome Sequencing Data from the Alzheimer’s Disease Sequencing Project

  • Wan-Ping Lee,
  • Albert A. Tucci,
  • Mitchell Conery,
  • Yuk Yee Leung,
  • Amanda B. Kuzma,
  • Otto Valladares,
  • Yi-Fan Chou,
  • Wenbin Lu,
  • Li-San Wang,
  • Gerard D. Schellenberg,
  • Jung-Ying Tzeng

DOI
https://doi.org/10.3389/fgene.2021.752390
Journal volume & issue
Vol. 12

Abstract

Alzheimer’s Disease (AD) is a progressive neurologic disease and the most common form of dementia. While the causes of AD are not completely understood, genetics plays a key role in its etiology, and thus finding genetic factors holds the potential to uncover novel AD mechanisms. In this study, we focus on copy number variation (CNV) detection and burden analysis. Leveraging whole-genome sequencing (WGS) data released by the Alzheimer’s Disease Sequencing Project (ADSP), we developed a scalable bioinformatics pipeline to identify CNVs. This pipeline was applied to 1,737 AD cases and 2,063 cognitively normal controls. As a result, we observed 237,306 deletions and 42,767 duplications, with an average of 2,255 deletions and 1,820 duplications per subject. The burden tests show that Non-Hispanic-White cases have on average 16 more duplications than controls (p-value 2e-6), and that Hispanic cases have larger deletions than controls (p-value 6.8e-5).
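
The abstract reports burden-test results but does not describe the test itself. As a rough illustration of what a case-control CNV burden comparison can look like, the sketch below runs a two-sided permutation test on per-subject duplication counts. This is an assumption made for illustration and may differ from the method the authors used; the function name `permutation_burden_test`, its parameters, and the counts are all hypothetical, and the data are synthetic rather than ADSP data.

```python
import random


def permutation_burden_test(case_counts, control_counts, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean CNV count.

    Hypothetical helper for illustration; not the authors' burden test.
    """
    rng = random.Random(seed)
    n_cases = len(case_counts)
    n_controls = len(control_counts)
    observed = sum(case_counts) / n_cases - sum(control_counts) / n_controls

    pooled = list(case_counts) + list(control_counts)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign case/control labels by shuffling the pooled counts.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_cases]) / n_cases - sum(pooled[n_cases:]) / n_controls
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic per-subject duplication counts (made up, not ADSP data).
data_rng = random.Random(42)
controls = [data_rng.gauss(1800, 40) for _ in range(200)]
cases = [data_rng.gauss(1816, 40) for _ in range(200)]

diff, p = permutation_burden_test(cases, controls)
print(f"mean difference = {diff:.1f}, permutation p-value = {p:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no external libraries. An analysis of real cohort data would typically also adjust for covariates such as ancestry and sequencing batch, which this sketch omits.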

Keywords