IEEE Access (Jan 2024)

Evaluation of Sparse Proximal Multi-Task Learning for Genome-Wide Prediction

  • Yuhua Fan,
  • Ilkka Launonen,
  • Mikko J. Sillanpää,
  • Patrik Waldmann

DOI
https://doi.org/10.1109/ACCESS.2024.3386093
Journal volume & issue
Vol. 12
pp. 51665–51675

Abstract

Multi-task learning (MTL) is a learning paradigm that aims to leverage information shared across related tasks to improve the generalization of models. Motivated by the success of proximal optimization algorithms and single-task regression models, this study explores sparse proximal multi-task learning (SPMTL) for genome-wide prediction (GWP). It investigates proximal gradient descent splitting algorithms with five non-smooth sparsity-inducing norm regularizers, including the novel $L_{2,\frac {1}{2}}$ norm, for GWP. Additionally, two popular methods based on Markov chain Monte Carlo (MCMC) are examined. To improve computational efficiency, a parallel Bayesian optimization strategy is employed for hyperparameter tuning. Evaluation is conducted on three real-world genomic datasets from mice, pigs, and wheat, associated with two, five, and four traits, respectively. Performance is assessed using the mean squared error (MSE) and the correlation coefficient between predicted and observed trait values in test sets. Experimental results reveal that the $L_{2,\frac {1}{2}}$ regularizer consistently achieves the best out-of-sample prediction across all datasets, demonstrating the effectiveness of SPMTL in leveraging shared information for improved GWP accuracy. Furthermore, the influence of the different regularizers on sparsity and other properties of the SPMTL model is also explored.
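The proximal splitting scheme described in the abstract can be sketched for the $L_{2,1}$ mixed norm, whose proximal operator has a simple closed form (the paper's novel $L_{2,\frac{1}{2}}$ penalty instead requires a half-thresholding operator). This is an illustrative sketch only, not the authors' implementation; the synthetic data, step-size choice, and function names are assumptions.

```python
import numpy as np

def prox_l21(W, t):
    """Row-wise group soft-thresholding: the proximal operator of t*||W||_{2,1}.
    Row j couples feature (marker) j's coefficients across all tasks, so a
    whole row is shrunk to zero at once -- the shared-sparsity mechanism
    that lets MTL borrow information across traits."""
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return np.maximum(1.0 - t / norms, 0.0) * W

def spmtl_l21(X, Y, lam=0.1, n_iter=1000):
    """Proximal gradient descent for multi-task least squares:
    minimize (1/2)||XW - Y||_F^2 + lam * ||W||_{2,1}."""
    p, k = X.shape[1], Y.shape[1]
    W = np.zeros((p, k))
    # Fixed step 1/L, where L = sigma_max(X)^2 is the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        W = prox_l21(W - step * (X.T @ (X @ W - Y)), step * lam)
    return W

# Hypothetical example: 3 traits (tasks) sharing the same 5 causal markers.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
W_true = np.zeros((30, 3))
W_true[:5] = rng.standard_normal((5, 3))
Y = X @ W_true + 0.1 * rng.standard_normal((100, 3))
W_hat = spmtl_l21(X, Y, lam=1.0)
```

Swapping in a different regularizer only changes `prox_l21`; the gradient step is identical, which is why the splitting framework accommodates all five penalties studied in the paper.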

Keywords