Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples

Stanislav Listopad; Christophe Magnan; Aliya Asghar; Andrew Stolz; John A. Tayek; Zhang-Xu Liu; Timothy R. Morgan; Trina M. Norden-Krichmar

JHEP Reports (Oct 2022)

Affiliations

Stanislav Listopad: Department of Computer Science, University of California, Irvine, CA 92697, USA
Christophe Magnan: Department of Computer Science, University of California, Irvine, CA 92697, USA
Aliya Asghar: Medicine and Research Services, VA Long Beach Healthcare System, Long Beach, CA 90822, USA
Andrew Stolz: Division of Gastrointestinal & Liver Diseases, Department of Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
John A. Tayek: Division of General Internal Medicine, Harbor-UCLA Medical Center, University of California Los Angeles, Torrance, CA 90509, USA
Zhang-Xu Liu: Division of Gastrointestinal & Liver Diseases, Department of Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
Timothy R. Morgan: Medicine and Research Services, VA Long Beach Healthcare System, Long Beach, CA 90822, USA
Trina M. Norden-Krichmar: Department of Computer Science, University of California, Irvine, CA 92697, USA; Department of Epidemiology and Biostatistics, University of California, Irvine, CA 92697, USA; Corresponding author. Address: Department of Epidemiology and Biostatistics, University of California, Irvine, CA 92697 USA; Tel.: 949-824-8802.

Journal volume & issue: Vol. 4, no. 10, p. 100560

Abstract

Background & Aims: Liver disease carries significant healthcare burden and frequently requires a combination of blood tests, imaging, and invasive liver biopsy to diagnose. Distinguishing between inflammatory liver diseases, which may have similar clinical presentations, is particularly challenging. In this study, we implemented a machine learning pipeline for the identification of diagnostic gene expression biomarkers across several alcohol-associated and non-alcohol-associated liver diseases, using either liver tissue or blood-based samples.

Methods: We collected peripheral blood mononuclear cells (PBMCs) and liver tissue samples from participants with alcohol-associated hepatitis (AH), alcohol-associated cirrhosis (AC), non-alcohol-associated fatty liver disease, chronic HCV infection, and healthy controls. We performed RNA sequencing (RNA-seq) on 137 PBMC samples and 67 liver tissue samples. Using gene expression data, we implemented a machine learning feature selection and classification pipeline to identify diagnostic biomarkers which distinguish between the liver disease groups. The liver tissue results were validated using a public independent RNA-seq dataset. The biomarkers were computationally validated for biological relevance using pathway analysis tools.

Results: Utilizing liver tissue RNA-seq data, we distinguished between AH, AC, and healthy conditions with overall accuracies of 90% in our dataset, and 82% in the independent dataset, with 33 genes. Distinguishing 4 liver conditions and healthy controls yielded 91% overall accuracy in our liver tissue dataset with 39 genes, and 75% overall accuracy in our PBMC dataset with 75 genes.

Conclusions: Our machine learning pipeline was effective at identifying a small set of diagnostic gene biomarkers and classifying several liver diseases using RNA-seq data from liver tissue and PBMCs. The methodologies implemented and genes identified in this study may facilitate future efforts toward a liquid biopsy diagnostic for liver diseases.

Lay summary: Distinguishing between inflammatory liver diseases without multiple tests can be challenging due to their clinically similar characteristics. To lay the groundwork for the development of a non-invasive blood-based diagnostic across a range of liver diseases, we compared samples from participants with alcohol-associated hepatitis, alcohol-associated cirrhosis, chronic hepatitis C infection, and non-alcohol-associated fatty liver disease. We used a machine learning computational approach to demonstrate that gene expression data generated from either liver tissue or blood samples can be used to discover a small set of gene biomarkers for effective diagnosis of these liver diseases.
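
The abstract describes a feature selection and classification pipeline but does not name the specific algorithms used. The following is only a minimal illustrative sketch of that general idea, not the authors' method: it ranks genes by a simple between-class to within-class variance ratio, keeps the top few, and classifies held-out samples with a nearest-centroid rule. All function names, the synthetic data, and the choice of scoring and classifier are assumptions made for illustration; no real RNA-seq data is used.

```python
import random
from statistics import mean


def f_like_score(values, labels):
    """Between-class over within-class sum of squares for one gene."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        m = mean(group)
        between += len(group) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in group)
    return between / (within + 1e-12)


def select_genes(X, y, k):
    """Return the indices of the k highest-scoring genes (hypothetical helper)."""
    n_genes = len(X[0])
    scores = [f_like_score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Mean expression of the selected genes for each class."""
    centroids = {}
    for c in set(y):
        rows = [row for row, lab in zip(X, y) if lab == c]
        centroids[c] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(row, centroids, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
    return min(centroids, key=dist)


def make_toy_data(rng, n_per_class=20, n_genes=50):
    """Synthetic expression matrix: gene i is elevated in class i (assumption)."""
    X, y = [], []
    for i, c in enumerate(["AH", "AC", "healthy"]):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            row[i] += 5.0
            X.append(row)
            y.append(c)
    return X, y


rng = random.Random(0)
X, y = make_toy_data(rng)

# Hold out 30% of samples; genes are selected on the training split only,
# so the held-out accuracy is not inflated by information leakage.
idx = list(range(len(X)))
rng.shuffle(idx)
split = int(0.7 * len(idx))
train, test = idx[:split], idx[split:]
X_train, y_train = [X[i] for i in train], [y[i] for i in train]
X_test, y_test = [X[i] for i in test], [y[i] for i in test]

genes = select_genes(X_train, y_train, k=3)
centroids = fit_centroids(X_train, y_train, genes)
preds = [predict(row, centroids, genes) for row in X_test]

# Overall accuracy: fraction of held-out samples assigned the correct class.
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("selected genes:", sorted(genes))
print("overall accuracy:", accuracy)
```

Selecting genes on the training split only mirrors the role an independent dataset plays in the study: performance is measured on samples that had no influence on which genes were chosen.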

Published in JHEP Reports

ISSN: 2589-5559 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the digestive system. Gastroenterology
Website: https://www.journals.elsevier.com/jhep-reports
