Informatics in Medicine Unlocked (Jan 2023)

SafeMut: UMI-aware variant simulator incorporating allele-fraction overdispersion in read editing

  • Xiaofei Zhao,
  • Jingyu Guo,
  • Sizhen Wang

Journal volume & issue
Vol. 41
p. 101307

Abstract

Next-generation sequencing (NGS) has been widely used for calling biological variants. The gold-standard methodology for assessing the ability of a computational method to call a specific variant is to perform NGS wet-lab experiments on samples known to harbor this variant. However, wet-lab experiments are both labor-intensive and time-consuming, and rare variants may not be present in a given population sample. Moreover, both issues are exacerbated in SafeSeqS, which enables liquid biopsy and minimal residual disease (MRD) detection with cell-free DNA by using unique molecular identifiers (UMIs) to detect and/or correct NGS errors. Hence, we developed SafeMut, the first UMI-aware NGS small-variant simulator, which also accounts for the overdispersion of allele fraction. We used the tumor-normal paired sequencing runs from the SEQC2 somatic reference sets and cell-free DNA data sets to assess the performance of BamSurgeon, VarBen, and SafeMut. We observed that the allele-fraction distribution of variants simulated by SafeMut, unlike that of variants simulated by BamSurgeon and VarBen, closely resembles the distribution generated by technical replicates of wet-lab experiments. SafeMut can therefore provide accurate simulation of small variants in NGS data, which helps in assessing the ability of computational methods to call these variants.

Keywords