iScience (Oct 2023)
An interpretable machine learning pipeline based on transcriptomics predicts phenotypes of lupus patients
Abstract
Machine learning (ML) has the potential to identify subsets of patients with distinct phenotypes from gene expression data. However, phenotype prediction using ML has often relied on identifying individually important genes without a systems biology context. To address this, we created an interpretable ML approach based on blood transcriptomics to predict phenotype in systemic lupus erythematosus (SLE), a heterogeneous autoimmune disease. We used a sequential grouped feature importance algorithm to assess how well gene sets known to be abnormal in SLE, including immune and metabolic pathways and cell types, predict disease activity and organ involvement. Gene sets related to interferon, tumor necrosis factor, the mitoribosome, and T cell activation were the best predictors of phenotype and achieved excellent performance. These results suggest potential relationships between the molecular pathways identified in each model and the clinical manifestations of SLE. This ML approach to phenotype prediction can be applied to other diseases and tissues.
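The core idea of scoring gene sets instead of single genes can be illustrated with grouped permutation feature importance, a simpler, non-sequential relative of the algorithm named above. This is a minimal sketch under stated assumptions, not the authors' pipeline: the toy data, the stand-in classifier, the accuracy metric, and the group names `ifn_set` and `noise_set` are all hypothetical. Each group's importance is the mean drop in accuracy when all of its columns are shuffled together across samples.

```python
import random
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_permutation_importance(
    predict: Callable[[Matrix], List[int]],
    X: Matrix,
    y: Sequence[int],
    groups: Dict[str, Sequence[int]],
    n_repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in accuracy when each feature group is permuted jointly."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances: Dict[str, float] = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = [row[:] for row in X]
            # One shared row permutation for every column in the group keeps
            # within-group correlations (e.g. co-expressed genes) intact.
            for i, j in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[j][c]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances[name] = sum(drops) / n_repeats
    return importances


# Hypothetical toy data: columns 0-1 form an informative "ifn_set",
# columns 2-3 form an uninformative "noise_set".
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [int(row[0] + row[1] > 0) for row in X]


def toy_model(rows: Matrix) -> List[int]:
    # Stand-in for a trained classifier; it only uses columns 0 and 1.
    return [int(r[0] + r[1] > 0) for r in rows]


imp = grouped_permutation_importance(
    toy_model, X, y, {"ifn_set": [0, 1], "noise_set": [2, 3]}
)
print(imp)
```

Because the toy model ignores columns 2 and 3, shuffling `noise_set` leaves every prediction unchanged and its importance is exactly zero, while shuffling `ifn_set` drives accuracy toward chance. Permuting a group jointly, rather than gene by gene, is what lets correlated genes in a pathway share credit instead of masking one another; the sequential procedure used in the study goes beyond this sketch, so its scores should not be expected to match.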