Comparing a novel machine learning method to the Friedewald formula and Martin-Hopkins equation for low-density lipoprotein estimation.

Gurpreet Singh; Yasin Hussain; Zhuoran Xu; Evan Sholle; Kelly Michalak; Kristina Dolan; Benjamin C Lee; Alexander R van Rosendael; Zahra Fatima; Jessica M Peña; Peter W F Wilson; Antonio M Gotto; Leslee J Shaw; Lohendran Baskaran; Subhi J Al'Aref

doi:10.1371/journal.pone.0239934

PLoS ONE (Jan 2020)

DOI: https://doi.org/10.1371/journal.pone.0239934
Journal volume & issue: Vol. 15, no. 9
p. e0239934

Abstract

Background: Low-density lipoprotein cholesterol (LDL-C) is a target for cardiovascular prevention. Contemporary equations for LDL-C estimation have limited accuracy in certain scenarios (high triglycerides [TG], very low LDL-C).

Objectives: We derived a novel method for LDL-C estimation from the standard lipid profile using a machine learning (ML) approach utilizing random forests (the Weill Cornell model). We compared its correlation to direct LDL-C with the Friedewald and Martin-Hopkins equations for LDL-C estimation.

Methods: The study cohort comprised a convenience sample of standard lipid profile measurements (with the directly measured components of total cholesterol [TC], high-density lipoprotein cholesterol [HDL-C], and TG) as well as chemical-based direct LDL-C performed on the same day at the New York-Presbyterian Hospital/Weill Cornell Medicine (NYP-WCM). Subsequently, an ML algorithm was used to construct a model for LDL-C estimation. Results are reported on the held-out test set, with correlation coefficients and absolute residuals used to assess model performance.

Results: Between 2005 and 2019, there were 17,500 lipid profiles performed on 10,936 unique individuals (4,456 females; 40.8%) aged 1 to 103. Correlation coefficients between estimated and measured LDL-C values were 0.982 for the Weill Cornell model, compared to 0.950 for Friedewald and 0.962 for the Martin-Hopkins method. The Weill Cornell model was consistently better across subgroups stratified by LDL-C and TG values, including TG >500 and LDL-C [...].

Conclusions: An ML model was found to have a better correlation with direct LDL-C than either the Friedewald formula or Martin-Hopkins equation, including in the setting of elevated TG and very low LDL-C.
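To make the comparison in the abstract concrete, the following is a minimal sketch of one of the baseline estimators and the two performance measures named in the Methods (correlation coefficient and absolute residual). It assumes values in mg/dL, which the abstract does not state, and uses the standard form of the Friedewald formula, LDL-C = TC - HDL-C - TG/5. The function names and the four lipid profiles are hypothetical and invented for illustration; they are not the study's data, and this is not the Weill Cornell model or the Martin-Hopkins equation.

```python
from math import sqrt


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate of LDL-C, assuming all inputs are in mg/dL."""
    return tc - hdl - tg / 5.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_abs_residual(estimated, measured):
    """Mean of |estimated - measured| over paired values."""
    return sum(abs(e - m) for e, m in zip(estimated, measured)) / len(measured)


# Hypothetical rows: (TC, HDL-C, TG, direct LDL-C), all in mg/dL.
# These numbers are invented for illustration only.
profiles = [
    (200, 50, 150, 118),
    (180, 45, 100, 117),
    (240, 40, 300, 150),
    (150, 60, 80, 72),
]

estimated = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg, _ in profiles]
measured = [row[3] for row in profiles]

print("Friedewald estimates:", estimated)
print("Pearson r:", round(pearson_r(estimated, measured), 3))
print("Mean absolute residual:", mean_abs_residual(estimated, measured))
```

The same two measures could be computed for any other estimator, including a fitted random forest, by swapping in its predictions for `estimated`. Because the TG/5 term assumes a fixed ratio of triglycerides to very-low-density lipoprotein cholesterol, the Friedewald estimate is conventionally not relied on at high TG, which is one of the scenarios the abstract singles out.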

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/
