BMC Genomics (Oct 2024)

Transcriptome-based prediction for polygenic traits in rice using different gene subsets

  • Ryokei Tanaka,
  • Tsubasa Kawai,
  • Taiji Kawakatsu,
  • Nobuhiro Tanaka,
  • Matthew Shenton,
  • Shiori Yabe,
  • Yusaku Uga

DOI
https://doi.org/10.1186/s12864-024-10803-3
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 14

Abstract

Background
Transcriptome-based prediction of complex phenotypes is a relatively new statistical method that links genetic variation to phenotypic variation. The selection of large-effect genes based on a priori biological knowledge is beneficial for predicting oligogenic traits; however, such a simple gene selection method is not applicable to polygenic traits because causal genes or large-effect loci are often unknown. Here, we used several gene-level features and tested whether it was possible to select a gene subset that resulted in better predictive ability than using all genes for predicting a polygenic trait.

Results
Using the phenotypic values of shoot and root traits and transcript abundances in leaves and roots of 57 rice accessions, we evaluated the predictive abilities of the transcriptome-based prediction models. Leaf transcripts predicted shoot phenotypes, such as plant height, more accurately than root transcripts, whereas root transcripts predicted root phenotypes, such as crown root length, more accurately than leaf transcripts. Furthermore, we used the following three features to select gene subsets for training the prediction model: (1) tissue specificity of the transcripts, (2) ontology annotations, and (3) co-expression modules. Although models trained on a gene subset often resulted in lower predictive abilities than the model trained on all genes, some gene subsets showed improved predictive ability. For example, using genes expressed in roots but not in leaves, the predictive ability for crown root diameter was improved by more than 10% (R² = 0.59 when using all genes; R² = 0.66 when using 1,554 root-specifically expressed genes). Similarly, for root dry weight, using genes annotated as “gibberellic acid sensitivity” resulted in higher predictive ability than using all genes.

Conclusions
Our results highlight both the possibility and difficulty of selecting an appropriate gene subset to predict polygenic traits from transcript abundance, given the current biological knowledge and information. Further integration of multiple sources of information, as well as improvements in gene characterization, may enable the selection of an optimal gene set for the prediction of polygenic phenotypes.
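
The comparison described in the abstract (predictive ability with all genes versus a selected gene subset) can be illustrated with a minimal sketch. The abstract does not state which statistical model the authors used, so ridge regression with leave-one-out cross-validation is swapped in here purely as a stand-in, and predictive ability is taken as the squared correlation between observed and predicted values. The data are synthetic, and the function names, the penalty `lam`, the 500-gene matrix, and the 50-gene "subset" are all hypothetical; only the sample size of 57 is taken from the abstract.

```python
import numpy as np

def ridge_loo_predictions(X, y, lam=1.0):
    """Leave-one-out predictions from ridge regression (assumed stand-in model)."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        X_tr, y_tr = X[train], y[train]
        # Standardize each gene using training samples only.
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        sd[sd == 0] = 1.0
        Z_tr = (X_tr - mu) / sd
        z_te = (X[i] - mu) / sd
        y_mean = y_tr.mean()
        # Dual form of ridge: solve an (n-1) x (n-1) system instead of p x p,
        # which is cheaper when genes (p) far outnumber accessions (n).
        K = Z_tr @ Z_tr.T
        alpha = np.linalg.solve(K + lam * np.eye(n - 1), y_tr - y_mean)
        preds[i] = y_mean + z_te @ Z_tr.T @ alpha
    return preds

def predictive_ability(y, preds):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(y, preds)[0, 1] ** 2

# Synthetic data (hypothetical): 57 accessions, 500 genes,
# with only the first 50 genes contributing to the trait.
rng = np.random.default_rng(0)
n_accessions, n_genes, n_subset = 57, 500, 50
X = rng.normal(size=(n_accessions, n_genes))
effects = rng.normal(size=n_subset)
y = X[:, :n_subset] @ effects + rng.normal(size=n_accessions)

# Penalty values are arbitrary illustrative choices, not tuned.
preds_all = ridge_loo_predictions(X, y, lam=float(n_genes))
preds_subset = ridge_loo_predictions(X[:, :n_subset], y, lam=float(n_subset))

r2_all = predictive_ability(y, preds_all)
r2_subset = predictive_ability(y, preds_subset)
print(f"R^2, all genes:   {r2_all:.2f}")
print(f"R^2, gene subset: {r2_subset:.2f}")
```

In this toy setting the subset is known to contain every causal gene, which is exactly the knowledge that is missing for real polygenic traits; with real data the subset would come from features such as tissue specificity, ontology annotations, or co-expression modules, and the penalty would need to be tuned rather than fixed.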

Keywords