On the association analysis of CNV data: a fast and robust family-based association method

Meiling Liu; Sanghoon Moon; Longfei Wang; Sulgi Kim; Yeon-Jung Kim; Mi Yeong Hwang; Young Jin Kim; Robert C. Elston; Bong-Jo Kim; Sungho Won

doi:10.1186/s12859-017-1622-z

BMC Bioinformatics (Apr 2017)

Affiliations

Meiling Liu: Department of Applied Statistics, Chung-Ang University
Sanghoon Moon: Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health
Longfei Wang: Interdisciplinary Program of Bioinformatics, Seoul National University
Sulgi Kim: Naver Labs, 235 Pangyoyeok-ro, Bundang-gu, Seongnam-si
Yeon-Jung Kim: Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health
Mi Yeong Hwang: Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health
Young Jin Kim: Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health
Robert C. Elston: Department of Epidemiology and Biostatistics, Case Western Reserve University
Bong-Jo Kim: Division of Structural and Functional Genomics, Center for Genome Science, National Institute of Health
Sungho Won: Interdisciplinary Program of Bioinformatics, Seoul National University

DOI: https://doi.org/10.1186/s12859-017-1622-z
Journal volume & issue: Vol. 18, no. 1, pp. 1–11

Abstract

Background: Copy number variation (CNV) is known to play an important role in the genetics of complex diseases, and several methods have been proposed to detect association of CNV with phenotypes of interest. Statistical methods for CNV association analysis fall into two strategies. In the first, the copy number is estimated by maximum likelihood and the association of the expected copy number with the phenotype is tested. In the second, the observed probe intensity measurements are used directly to detect association of CNV with the phenotypes of interest.

Results: For each strategy we provide a statistic that can be applied to extended families. The computational efficiency of the proposed methods enables genome-wide association analysis, and we show with simulation studies that the proposed methods outperform other existing approaches. In particular, we found that the first strategy is always more efficient than the second, whether or not the copy numbers for each individual are well identified. With the proposed methods, we performed genome-wide CNV association analyses of a hematological trait, hematocrit, on 521 Korean family samples.

Conclusions: We found that statistical analysis with the expected copy number is more powerful than analysis with the probe intensity measurements, regardless of how accurately the copy numbers are estimated.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
