Reducing False-Positive Results in Newborn Screening Using Machine Learning

Gang Peng; Yishuo Tang; Tina M. Cowan; Gregory M. Enns; Hongyu Zhao; Curt Scharfe

doi:10.3390/ijns6010016

International Journal of Neonatal Screening (Mar 2020)

Affiliations

Gang Peng: Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
Yishuo Tang: Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
Tina M. Cowan: Department of Pathology, Stanford University School of Medicine, Stanford, CA 94304, USA
Gregory M. Enns: Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
Hongyu Zhao: Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
Curt Scharfe: Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA

DOI: https://doi.org/10.3390/ijns6010016
Journal volume & issue: Vol. 6, no. 1, p. 16

Abstract

Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included 39 metabolic analytes detected by tandem mass spectrometry and clinical variables such as gestational age and birth weight. Analytical performance was evaluated for a cohort of 2777 screen positives reported by the California NBS program, which consisted of 235 confirmed cases and 2542 false positives for one of four disorders: glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD). Without changing the sensitivity to detect these disorders in screening, Random Forest-based analysis of all metabolites reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCADD by 2%. All primary disease markers and previously reported analytes such as methionine for MMA and OTCD were among the top-ranked analytes. Random Forest’s ability to classify GA-1 false positives was found to be similar to results obtained using Clinical Laboratory Integrated Reports (CLIR). We developed an online Random Forest tool for interpretive analysis of increasingly complex data from newborn screening.
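
The phrase "without changing the sensitivity" can be illustrated with a minimal sketch. It assumes a classifier has already assigned each screen positive a score between 0 and 1, and it picks the cutoff that keeps every confirmed case while counting how many false positives fall below it. The function name, the scores, and the labels are hypothetical and are not taken from the study; the abstract does not state how the authors chose their cutoff.

```python
from typing import Sequence, Tuple


def fp_reduction_at_full_sensitivity(
    scores: Sequence[float], is_case: Sequence[bool]
) -> Tuple[float, float]:
    """Return (cutoff, fraction of false positives removed).

    The cutoff is the lowest score seen among confirmed cases, so calling
    every sample at or above it "positive" retains all confirmed cases.
    """
    case_scores = [s for s, c in zip(scores, is_case) if c]
    fp_scores = [s for s, c in zip(scores, is_case) if not c]
    if not case_scores or not fp_scores:
        raise ValueError("need at least one confirmed case and one false positive")

    cutoff = min(case_scores)
    # False positives scoring strictly below the cutoff would no longer be flagged.
    removed = sum(1 for s in fp_scores if s < cutoff)
    return cutoff, removed / len(fp_scores)


# Hypothetical classifier scores for ten screen positives (illustration only).
scores = [0.92, 0.81, 0.64, 0.70, 0.35, 0.10, 0.05, 0.22, 0.66, 0.15]
is_case = [True, True, True, False, False, False, False, False, False, False]

cutoff, reduction = fp_reduction_at_full_sensitivity(scores, is_case)
print(f"cutoff={cutoff:.2f}, false positives removed={reduction:.0%}")
```

In this toy example two false positives score above the lowest confirmed case and would still be flagged, which is why the reduction is below 100%.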

Published in International Journal of Neonatal Screening

ISSN: 2409-515X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Pediatrics
Website: https://www.mdpi.com/journal/ijns
