Scientific Reports (Sep 2024)

Identification of candidate causal variants and target genes at 41 breast cancer risk loci through differential allelic expression analysis

  • Joana M. Xavier,
  • Ramiro Magno,
  • Roslin Russell,
  • Bernardo P. de Almeida,
  • Ana Jacinta-Fernandes,
  • André Besouro-Duarte,
  • Mark Dunning,
  • Shamith Samarajiwa,
  • Martin O’Reilly,
  • António M. Maia,
  • Cátia L. Rocha,
  • Nordiana Rosli,
  • Bruce A. J. Ponder,
  • Ana-Teresa Maia

DOI
https://doi.org/10.1038/s41598-024-72163-y
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 19

Abstract

Understanding breast cancer genetic risk relies on identifying causal variants and candidate target genes in risk loci identified by genome-wide association studies (GWAS), which remains challenging. Since most loci fall in active gene regulatory regions, we developed a novel approach that pinpoints the variants with the greatest regulatory potential in the disease's tissue of origin. Through genome-wide differential allelic expression (DAE) analysis of microarray data from 64 normal breast tissue samples, we mapped the variants associated with DAE (daeQTLs). We then intersected these with GWAS data to reveal candidate risk regulatory variants and analysed their cis-acting regulatory potential. Finally, we validated our approach by extensive functional analysis of the 5q14.1 breast cancer risk locus. We observed widespread gene expression regulation by cis-acting variants in breast tissue, with 65% of expressed coding and noncoding genes displaying DAE (daeGenes). We identified over 54,000 daeQTLs for 6,761 (26%) daeGenes, including 385 daeGenes harbouring variants previously associated with breast cancer risk. We found 1,431 daeQTLs, mapping to 93 different loci, in strong linkage disequilibrium with risk-associated variants (risk-daeQTLs), suggesting a link between risk-causing variants and cis-regulation. There were 122 risk-daeQTLs with stronger cis-acting potential located in active regulatory regions with protein-binding evidence. These variants mapped to 41 risk loci, 29 of which had no previously reported target genes, and were candidates for regulating the expression levels of 65 genes. As validation, we identified and functionally characterised five candidate causal variants at the 5q14.1 risk locus targeting the ATG10 and ATP6AP1L genes, likely acting via modulation of alternative transcription and transcription factor binding.
Our study demonstrates the power of DAE analysis and daeQTL mapping to identify causal regulatory variants and target genes at breast cancer risk loci, including those with complex regulatory landscapes. It additionally provides a genome-wide resource of variants associated with DAE for future functional studies.

Keywords