Communications Biology (Dec 2024)

Population-specific reference panel improves imputation quality for genome-wide association studies conducted on the Japanese population

  • Jack Flanagan,
  • Xiaoxi Liu,
  • David Ortega-Reyes,
  • Kohei Tomizuka,
  • Nana Matoba,
  • Masato Akiyama,
  • Masaru Koido,
  • Kazuyoshi Ishigaki,
  • Kyota Ashikawa,
  • Sadaaki Takata,
  • MingYang Shi,
  • Tomomi Aoi,
  • Yukihide Momozawa,
  • Kaoru Ito,
  • Yoshinori Murakami,
  • Koichi Matsuda,
  • The Biobank Japan Project,
  • Yoichiro Kamatani,
  • Andrew P. Morris,
  • Momoko Horikoshi,
  • Chikashi Terao

DOI
https://doi.org/10.1038/s42003-024-07338-4
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 10

Abstract

To improve imputation quality for genome-wide association studies (GWAS) conducted on the Japanese population, we developed and evaluated four Japanese population-specific reference panels. These panels were constructed through the augmentation of the 1000 Genomes Project (1KG) panel using Japanese whole genome sequencing (WGS) data, with sample sizes ranging from 1 K to 7 K individuals enrolled through the Biobank Japan (BBJ) project, and sequencing depths ranging from 3× to 30×. Among these panels, an augmented reference panel comprising 7472 WGS samples of mixed depth (1KG+7K) exhibits the greatest improvement in imputation quality relative to the Trans-Omics for Precision Medicine (TOPMed) reference panel. Notably, we observe these improvements primarily for rare variants with a minor allele frequency (MAF) <5%. To demonstrate the benefits of improved imputation quality in association analyses of complex traits, we conducted GWAS for serum uric acid and total cholesterol levels following imputation up to the 1KG+7K panel. The analysis reveals several loci reaching genome-wide significance (P < 5 × 10⁻⁸) in the 1KG+7K imputation output yet remaining undetected when the same sample set is imputed up to the TOPMed reference panel. In summary, the 1KG+7K panel demonstrates significant advantages in the discovery of trait-associated loci, particularly those influenced by low-frequency association signals.