Cell Reports (Sep 2023)

Integration of eQTL and machine learning to dissect causal genes with pleiotropic effects in genetic regulation networks of seed cotton yield

  • Ting Zhao,
  • Hongyu Wu,
  • Xutong Wang,
  • Yongyan Zhao,
  • Luyao Wang,
  • Jiaying Pan,
  • Huan Mei,
  • Jin Han,
  • Siyuan Wang,
  • Kening Lu,
  • Menglin Li,
  • Mengtao Gao,
  • Zeyi Cao,
  • Hailin Zhang,
  • Ke Wan,
  • Jie Li,
  • Lei Fang,
  • Tianzhen Zhang,
  • Xueying Guan

Journal volume & issue
Vol. 42, no. 9
p. 113111

Abstract


Summary: Dissecting the gene regulatory networks (GRNs) that complement genome-wide association study (GWAS) loci, and the crosstalk underlying multiple agronomic traits, remains a major challenge. In this study, we generate 558 transcriptional profiles of lint-bearing ovules at 1 day post-anthesis from a selected core cotton germplasm collection and identify 12,207 expression quantitative trait loci (eQTLs). Sixty-six known phenotypic GWAS loci colocalize with 1,090 eQTLs, forming 38 functional GRNs associated predominantly with seed yield. Among the eGenes, 34 exhibit pleiotropic effects. Combining the eQTLs within the seed yield GRNs significantly increases the proportion of narrow-sense heritability explained. The extreme gradient boosting (XGBoost) machine learning approach is applied to predict seed cotton yield phenotypes from gene expression. Top-ranking eGenes with pleiotropic effects on yield traits (NF-YB3, FLA2, and GRDP1) are validated, and their potential roles are supported by correlation analysis, domestication selection analysis, and transgenic plants.
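The eQTL mapping step summarized above comes down to testing, for each gene-SNP pair, whether genotype explains variation in expression. The summary does not state which model or software was used, so the following is only a minimal sketch assuming the simplest additive model: expression is regressed on genotype dosage (0, 1, or 2 copies of an allele) by ordinary least squares, with no covariates. Published eQTL pipelines usually also adjust for covariates such as population structure and correct for multiple testing. The function names `eqtl_test` and `scan_cis` and the 1 Mb cis window are hypothetical choices for illustration, not taken from the paper.

```python
import math
from statistics import mean


def eqtl_test(dosages, expression):
    """Fit expression = a + b * dosage by ordinary least squares for one
    SNP-gene pair and return (slope, t_statistic). Sketch only: additive
    model, no covariates (an assumption, not the paper's stated method)."""
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have equal length")
    n = len(dosages)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares; the slope's standard error uses n - 2 df.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    if rss == 0:
        # Perfect fit: infinite t unless there is no effect at all.
        return slope, (math.copysign(math.inf, slope) if slope != 0 else 0.0)
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def scan_cis(gene_pos, expression, snps, window=1_000_000):
    """Test every SNP within a hypothetical cis window of one gene.
    snps maps SNP name -> (position, dosages). Returns (name, slope, t)
    tuples sorted by |t|, strongest association first."""
    hits = []
    for name, (pos, dosages) in snps.items():
        if abs(pos - gene_pos) > window:
            continue  # outside the cis window: skipped in this cis-only sketch
        try:
            slope, t = eqtl_test(dosages, expression)
        except ValueError:
            continue  # skip SNPs the test rejects, e.g. monomorphic ones
        hits.append((name, slope, t))
    hits.sort(key=lambda h: abs(h[2]), reverse=True)
    return hits
```

For example, `scan_cis(5_000_000, expr, {"snp1": (5_200_000, [0, 0, 1, 1, 2, 2])})` tests only `snp1`, because it lies within 1 Mb of the gene. Turning these t statistics into eQTL calls would still require a significance threshold and a multiple-testing correction, which this sketch deliberately leaves out.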

Keywords