PLoS ONE (Jan 2020)

Comparing a novel machine learning method to the Friedewald formula and Martin-Hopkins equation for low-density lipoprotein estimation.

  • Gurpreet Singh,
  • Yasin Hussain,
  • Zhuoran Xu,
  • Evan Sholle,
  • Kelly Michalak,
  • Kristina Dolan,
  • Benjamin C Lee,
  • Alexander R van Rosendael,
  • Zahra Fatima,
  • Jessica M Peña,
  • Peter W F Wilson,
  • Antonio M Gotto,
  • Leslee J Shaw,
  • Lohendran Baskaran,
  • Subhi J Al'Aref

DOI
https://doi.org/10.1371/journal.pone.0239934
Journal volume & issue
Vol. 15, no. 9
p. e0239934

Abstract

Background
Low-density lipoprotein cholesterol (LDL-C) is a target for cardiovascular prevention. Contemporary equations for LDL-C estimation have limited accuracy in certain scenarios (high triglycerides [TG], very low LDL-C).

Objectives
We derived a novel method for LDL-C estimation from the standard lipid profile using a machine learning (ML) approach utilizing random forests (the Weill Cornell model). We compared its correlation to direct LDL-C with the Friedewald and Martin-Hopkins equations for LDL-C estimation.

Methods
The study cohort comprised a convenience sample of standard lipid profile measurements (with the directly measured components of total cholesterol [TC], high-density lipoprotein cholesterol [HDL-C], and TG) as well as chemical-based direct LDL-C performed on the same day at the New York-Presbyterian Hospital/Weill Cornell Medicine (NYP-WCM). Subsequently, an ML algorithm was used to construct a model for LDL-C estimation. Results are reported on the held-out test set, with correlation coefficients and absolute residuals used to assess model performance.

Results
Between 2005 and 2019, there were 17,500 lipid profiles performed on 10,936 unique individuals (4,456 females; 40.8%) aged 1 to 103. Correlation coefficients between estimated and measured LDL-C values were 0.982 for the Weill Cornell model, compared to 0.950 for Friedewald and 0.962 for the Martin-Hopkins method. The Weill Cornell model was consistently better across subgroups stratified by LDL-C and TG values, including TG >500 and LDL-C

Conclusions
An ML model was found to have a better correlation with direct LDL-C than either the Friedewald formula or Martin-Hopkins equation, including in the setting of elevated TG and very low LDL-C.
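
For orientation, the two comparator equations and the two evaluation metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code and not the Weill Cornell random forest model. The Friedewald formula is the standard TC − HDL-C − TG/5 in mg/dL. The Martin-Hopkins method replaces the fixed divisor 5 with a factor looked up from a published table of non-HDL-C and TG strata; that table is not reproduced here, so the factor is a caller-supplied parameter. The function names and the four lipid profiles are hypothetical, made up purely for illustration.

```python
from math import sqrt
from statistics import mean


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate in mg/dL: LDL-C = TC - HDL-C - TG/5."""
    return tc - hdl - tg / 5.0


def adjustable_factor_ldl(tc, hdl, tg, factor):
    """Martin-Hopkins-style estimate: LDL-C = TC - HDL-C - TG/factor.

    The real method looks `factor` up in a published strata table;
    here it is simply passed in by the caller (an assumption).
    """
    return tc - hdl - tg / factor


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_residual(estimated, measured):
    """Average of |estimated - measured| over paired values."""
    return mean(abs(e - m) for e, m in zip(estimated, measured))


# Hypothetical profiles: (TC, HDL-C, TG, direct LDL-C), all in mg/dL.
# These numbers are invented for the example and are not study data.
profiles = [
    (200, 50, 150, 118),
    (180, 45, 100, 116),
    (240, 40, 300, 150),
    (150, 60, 80, 72),
]

estimated = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg, _ in profiles]
measured = [direct for *_, direct in profiles]

r = pearson_r(estimated, measured)
mar = mean_absolute_residual(estimated, measured)
print(f"Friedewald vs direct LDL-C: r = {r:.3f}, mean |residual| = {mar:.2f} mg/dL")
```

The same two metric functions could be applied to the output of any estimator, including a trained regression model, which is how the abstract describes comparing the three methods on a held-out test set.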