Computational and Structural Biotechnology Journal (Jan 2022)

PeakCNV: A multi-feature ranking algorithm-based tool for genome-wide copy number variation-association study

  • Mahdieh Labani,
  • Ali Afrasiabi,
  • Amin Beheshti,
  • Nigel H. Lovell,
  • Hamid Alinejad-Rokny

Journal volume & issue
Vol. 20
pp. 4975 – 4983

Abstract


Copy Number Variation (CNV) refers to a type of structural genomic alteration in which a segment of a chromosome is duplicated or deleted. To date, many CNVs have been identified as causative genetic elements for several diseases and phenotypes. However, performing a CNV-based genome-wide association study is challenging because the length and occurrence of CNVs are inconsistent across the individuals under investigation. One of the most efficient strategies to address this issue is to build CNV regions (CNVRs), i.e., genomic regions in which CNVs overlap. However, this approach is susceptible to a high false positive rate because confounding CNVRs overlap and co-occur with true positive CNVRs. Here, we develop PeakCNV, which differentiates false-positive CNVRs from true positives by calculating a new metric, the independence ranking score (IR-score), via a feature ranking approach. We compared the performance of PeakCNV with other existing tools in two case studies: one using CNV genotype data for individuals with prostate cancer (194 cases and 2,392 healthy individuals) and the other for individuals with neurodevelopmental disorders (19,642 cases and 6,451 healthy individuals). Crucially, our benchmarking analyses on the prostate cancer cohort indicated that PeakCNV identifies fewer candidate risk CNVRs, with shorter lengths, than other tools. Importantly, these CNVRs cover a greater proportion of cases relative to healthy individuals than those identified by other tools. The accuracy of PeakCNV in identifying relevant candidate CNVRs was reproduced in the case study on neurodevelopmental disorders.
Using data from the FANTOM5 expression atlas and the Clinical Genomic Database, we show that the candidate CNVRs identified by PeakCNV for neurodevelopmental disorders overlap with more genes with brain-enriched expression, and with more genes associated with neurological conditions, than the candidate CNVRs identified by other tools. Taken together, PeakCNV outperformed existing CNV association study tools by identifying more biologically meaningful CNVRs relevant to the phenotype of interest. PeakCNV is publicly available for the analysis of CNV-associated diseases and is accessible from https://rdrr.io/github/mahdieh1/PeakCNV.
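The CNVR-building step and the case-versus-control coverage comparison mentioned above can be illustrated with a minimal Python sketch. It assumes the simplest CNVR definition (the union of CNV intervals that overlap on the same chromosome) and half-open [start, end) coordinates. The `CNV` and `CNVR` records and the `build_cnvrs` and `case_control_coverage` functions are hypothetical names for illustration only; they are not part of PeakCNV, and the sketch does not compute PeakCNV's IR-score.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CNV:
    """One called CNV in one individual (hypothetical record layout)."""
    sample: str
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    is_case: bool   # True for cases, False for healthy controls


@dataclass
class CNVR:
    """A CNV region: the union of overlapping CNVs on one chromosome."""
    chrom: str
    start: int
    end: int
    members: list = field(default_factory=list)


def build_cnvrs(cnvs):
    """Merge overlapping CNV intervals into CNVRs (simple union rule, assumed; not PeakCNV's method)."""
    regions = []
    # Sort by chromosome, then position, so overlapping CNVs are adjacent.
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        # Half-open intervals overlap when the next start lies before the current end.
        if last is not None and last.chrom == cnv.chrom and cnv.start < last.end:
            last.end = max(last.end, cnv.end)
            last.members.append(cnv)
        else:
            regions.append(CNVR(cnv.chrom, cnv.start, cnv.end, [cnv]))
    return regions


def case_control_coverage(region, n_cases, n_controls):
    """Fraction of all cases and of all controls carrying a CNV in this region."""
    # Count each individual once, even if they carry several CNVs in the region.
    cases = {c.sample for c in region.members if c.is_case}
    controls = {c.sample for c in region.members if not c.is_case}
    return len(cases) / n_cases, len(controls) / n_controls
```

Note that the union rule can chain many partially overlapping CNVs into one long region; more refined CNVR definitions exist, and a real association analysis would also apply a statistical test to the case and control counts.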

Keywords