BMC Bioinformatics (Dec 2021)

Prioritising positively selected variants in whole-genome sequencing data using FineMAV

  • Fadilla Wahyudi,
  • Farhang Aghakhanian,
  • Sadequr Rahman,
  • Yik-Ying Teo,
  • Michał Szpak,
  • Jasbir Dhaliwal,
  • Qasim Ayub

DOI
https://doi.org/10.1186/s12859-021-04506-9
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Background In population genomics, polymorphisms that are highly differentiated between geographically separated populations are often suggestive of Darwinian positive selection. Genomic scans have highlighted several such regions in African and non-African populations, but only a handful of these have functional data that clearly associates candidate variations driving the selection process. Fine-Mapping of Adaptive Variation (FineMAV) was developed to address this in a high-throughput manner using population based whole-genome sequences generated by the 1000 Genomes Project. It pinpoints positively selected genetic variants in sequencing data by prioritizing high frequency, population-specific and functional derived alleles. Results We developed a stand-alone software that implements the FineMAV statistic. To graphically visualise the FineMAV scores, it outputs the statistics as bigWig files, which is a common file format supported by many genome browsers. It is available as a command-line and graphical user interface. The software was tested by replicating the FineMAV scores obtained using 1000 Genomes Project African, European, East and South Asian populations and subsequently applied to whole-genome sequencing datasets from Singapore and China to highlight population specific variants that can be subsequently modelled. The software tool is publicly available at https://github.com/fadilla-wahyudi/finemav . Conclusions The software tool described here determines genome-wide FineMAV scores, using low or high-coverage whole-genome sequencing datasets, that can be used to prioritize a list of population specific, highly differentiated candidate variants for in vitro or in vivo functional screens. The tool displays these scores on the human genome browsers for easy visualisation, annotation and comparison between different genomic regions in worldwide human populations.

Keywords