Computational and Structural Biotechnology Journal (Dec 2024)
Identification of genetic basis of brain imaging by group sparse multi-task learning leveraging summary statistics
Abstract
Brain imaging genetics is an evolving area of neuroscience that aims to identify genetic variations associated with neuroimaging measurements of interest. Traditional linear regression methods have shown success, but their reliance on individual-level imaging and genetic data limits their applicability. Here, we propose S-GsMTLR, a group sparse multi-task linear regression method designed to use summary statistics from genome-wide association studies (GWAS) of neuroimaging quantitative traits. S-GsMTLR operates directly on GWAS summary statistics, bypassing the requirement for raw imaging genetic data, and applies multivariate multi-task sparse learning to these univariate GWAS results. It retains the strengths of conventional sparse learning methods, including sophisticated modeling techniques and efficient feature selection. Additionally, we implemented a fast optimization strategy that reduces the computational burden of identifying genetic variants associated with phenotypes of interest across an entire chromosome. We first evaluated S-GsMTLR using summary statistics derived from the Alzheimer's Disease Neuroimaging Initiative. The results were encouraging: S-GsMTLR was comparable to conventional methods in both modeling and identification of risk loci. We then evaluated the method on two additional GWAS summary statistics datasets, one focused on white matter microstructures and the other on whole-brain imaging phenotypes, for which the original individual-level data were unavailable. The results not only highlighted S-GsMTLR's ability to pinpoint significant loci but also revealed intriguing structures within genetic variations and loci that had gone unnoticed by GWAS. These findings suggest that S-GsMTLR is a promising multivariate sparse learning method for brain imaging genetics. It eliminates the need for original individual-level imaging and genetic data while showing sound modeling and feature selection capabilities.
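To illustrate the general idea of fitting a sparse multi-task regression from summary statistics alone, the following is a minimal sketch and not the S-GsMTLR implementation. The abstract does not state the objective function, so the sketch assumes a plain row-wise L2,1 penalty (each SNP is shrunk jointly across all imaging traits) and a standard proximal gradient solver. It further assumes standardized genotypes and phenotypes, so that a linkage disequilibrium matrix `R` can stand in for X'X/n and the matrix of marginal GWAS effects `B` can stand in for X'Y/n. The names `R`, `B`, `lam`, `row_soft_threshold`, and `fit_l21_from_summary` are hypothetical.

```python
import numpy as np


def row_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||W[j, :]||_2 (row-wise group shrinkage)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Rows whose norm is below tau are set to zero; the others are shrunk.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale


def fit_l21_from_summary(R, B, lam, n_iter=500):
    """Minimise 0.5*tr(W'RW) - tr(W'B) + lam*||W||_{2,1} by proximal gradient.

    R : (p, p) LD correlation matrix, assumed to approximate X'X / n.
    B : (p, T) marginal effect estimates for T traits, assumed to approximate X'Y / n.
    Returns W : (p, T) joint effect estimates with row-wise sparsity.
    """
    # Step size 1/L, where L is the largest eigenvalue of R
    # (the Lipschitz constant of the gradient of the smooth part).
    step = 1.0 / np.linalg.eigvalsh(R)[-1]
    W = np.zeros_like(B, dtype=float)
    for _ in range(n_iter):
        grad = R @ W - B  # gradient of the smooth part; no individual-level data needed
        W = row_soft_threshold(W - step * grad, step * lam)
    return W
```

When `R` is the identity matrix (no linkage disequilibrium), this objective separates by SNP and the solution reduces to a single row-wise shrinkage of `B`, which makes a convenient sanity check. A SNP is selected when its row of `W` is nonzero, that is, when its joint signal across traits exceeds the penalty.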