A Machine Learning Approach to Identifying Causal Monogenic Variants in Inflammatory Bowel Disease

Daniel J. Mulder; Sam Khalouei; Michael Li; Neil Warner; Claudia Gonzaga-Jauregui; Eric I. Benchimol; Peter C. Church; Thomas D. Walters; Arun K. Ramani; Anne M. Griffiths; Amanda Ricciuto; Aleixo M. Muise

Gastro Hep Advances (Jan 2022)

Affiliations

Daniel J. Mulder: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada; Departments of Pediatrics, Medicine and Biomedical and Molecular Sciences, Queen’s University, Kingston, Ontario, Canada; Daniel J. Mulder, MD, PhD, Queen’s University, 76 Stuart St, Kingston, Ontario K7L 2V7, Canada.
Sam Khalouei: Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
Michael Li: Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
Neil Warner: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada; SickKids Inflammatory Bowel Disease Centre and Cell Biology Program, Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Pediatrics and Biochemistry, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
Claudia Gonzaga-Jauregui: Regeneron Genetics Center, Regeneron Pharmaceuticals Inc, Tarrytown, New York
Eric I. Benchimol: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada
Peter C. Church: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada
Thomas D. Walters: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada
Arun K. Ramani: Centre for Computational Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
Anne M. Griffiths: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada
Amanda Ricciuto: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada
Aleixo M. Muise: Division of Gastroenterology, Hepatology and Nutrition, The Hospital for Sick Children, Toronto, Ontario, Canada; SickKids Inflammatory Bowel Disease Centre and Cell Biology Program, Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Pediatrics and Biochemistry, Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada; Correspondence: Aleixo M. Muise, MD, PhD, The Hospital for Sick Children, 555 University Ave, Toronto, Ontario M5G 1X8, Canada.

Journal volume & issue: Vol. 1, no. 2, pp. 171–179

Abstract

Background and Aims: Diagnosis of monogenic disease is increasingly important for patient care and personalizing therapy. However, the current process is nonstandardized, expensive, and time-consuming. There is currently no accepted strategy to help identify disease-causing variants in monogenic inflammatory bowel disease (IBD). The aim of the study is to develop a prioritization strategy for monogenic IBD variant discovery through detailed analysis of a whole-exome sequencing (WES) data set.

Methods: All consenting pediatric patients with IBD presenting to our tertiary care hospital during the study period were enrolled and underwent WES (n = 1005). Available family members also underwent WES. Variants were analyzed en masse using the GEMINI framework and were further annotated using data from dbNSFP, Combined Annotation Dependent Depletion (CADD), and gnomAD. Known disease-causing variants (n = 36) were used as positive controls. Machine learning algorithms were optimized and then compared to help identify the characteristics of monogenic IBD cases.

Results: Initial gene-level analysis identified 11 genes not previously linked to IBD that could potentially harbor IBD-causing variants. Machine learning algorithms identified 4 primary variant characteristics (CADD score, dbNSFP score, relationship with a known immunodeficiency gene, and alternate allele frequency), and optimal threshold values for each were determined to assist with identifying monogenic IBD variants. Based on these characteristics, an automated variant prioritization pipeline was then created that reduces the >100,000 variants per patient to a mean of 15 prioritized variants. This pipeline is available online for all to use.

Conclusion: Leveraging a large WES data set, we demonstrate a statistically rigorous strategy for prioritization of variants for monogenic IBD diagnosis.
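To make the four variant characteristics named in the Results concrete, the following is a minimal sketch of how a filter-then-rank step over annotated variants might look. It is not the authors' pipeline: the abstract does not report the threshold values, so `CADD_MIN`, `DBNSFP_MIN`, and `MAX_ALT_AF` are placeholder numbers chosen only for illustration, and the `Variant` fields, the `prioritize` function, and the choice to use immunodeficiency-gene membership as a ranking key rather than a hard filter are all assumptions.

```python
from dataclasses import dataclass

# Placeholder thresholds (assumptions); the abstract does not state the
# optimal values determined in the study.
CADD_MIN = 20.0      # minimum CADD score to keep a variant
DBNSFP_MIN = 0.5     # minimum dbNSFP-derived score to keep a variant
MAX_ALT_AF = 0.001   # maximum alternate allele frequency (e.g. from gnomAD)


@dataclass
class Variant:
    """Hypothetical record holding one annotated variant."""
    gene: str
    cadd_score: float
    dbnsfp_score: float
    alt_allele_freq: float


def prioritize(variants, immunodeficiency_genes):
    """Filter on the three numeric characteristics, then rank.

    Variants in a known immunodeficiency gene are listed first;
    ties are broken by descending CADD score.
    """
    kept = [
        v for v in variants
        if v.cadd_score >= CADD_MIN
        and v.dbnsfp_score >= DBNSFP_MIN
        and v.alt_allele_freq <= MAX_ALT_AF
    ]
    return sorted(
        kept,
        key=lambda v: (v.gene not in immunodeficiency_genes, -v.cadd_score),
    )
```

Calling `prioritize(patient_variants, known_genes)` with a per-patient list and a set of gene symbols returns the shortlist in review order. Treating gene membership as a sort key keeps rare, damaging variants in unlisted genes visible, which matters if new candidate genes are of interest.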

Published in Gastro Hep Advances

ISSN: 2772-5723 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Specialties of internal medicine: Diseases of the digestive system. Gastroenterology
Website: https://www.sciencedirect.com/journal/gastro-hep-advances
