Mathematics (Jun 2022)
WAVECNV: A New Approach for Detecting Copy Number Variation by Wavelet Clustering
Abstract
Copy number variation (CNV) detection based on second-generation sequencing technology is the basis of much gene research, but the read depth is affected by mapping errors, repeated reads, and GC bias. The existing methods have low sensitivity to variation regions with a short length and small variation range. Therefore, it is necessary to improve the sensitivity of algorithms to short-variation fragments. This study proposes a new CNV-detection method named WAVECNV to solve this issue. The algorithm uses wavelet clustering to process the read depth and determine the normal cluster and abnormal cluster according to the size of the cluster. Then, according to the distance between genome bins and normal clusters, the outlier of each genome bin is evaluated. Finally, a statistical model is established, and the p-value test is used for calling CNVs. Through this method, the information of the short variation region is retained. WAVECNV was tested and compared with peer methods in terms of simulated data and real cancer-sequencing data. The results show that the sensitivity of WAVECNV is better than the existing methods. It also has high precision in data with low purity and coverage. In real data experiments, WAVECNV can detect more cancer genes than existing methods. Therefore, this method can be regarded as a conventional method in the field of genomic mutation analysis of cancer samples.
Keywords