BMC Bioinformatics (Nov 2024)

Robust double machine learning model with application to omics data

  • Xuqing Wang,
  • Yahang Liu,
  • Guoyou Qin,
  • Yongfu Yu

DOI
https://doi.org/10.1186/s12859-024-05975-4
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 19

Abstract

Read online

Abstract Background Recently, there has been a growing interest in combining causal inference with machine learning algorithms. Double machine learning model (DML), as an implementation of this combination, has received widespread attention for their expertise in estimating causal effects within high-dimensional complex data. However, the DML model is sensitive to the presence of outliers and heavy-tailed noise in the outcome variable. In this paper, we propose the robust double machine learning (RDML) model to achieve a robust estimation of causal effects when the distribution of the outcome is contaminated by outliers or exhibits symmetrically heavy-tailed characteristics. Results In the modelling of RDML model, we employed median machine learning algorithms to achieve robust predictions for the treatment and outcome variables. Subsequently, we established a median regression model for the prediction residuals. These two steps ensure robust causal effect estimation. Simulation study show that the RDML model is comparable to the existing DML model when the data follow normal distribution, while the RDML model has obvious superiority when the data follow mixed normal distribution and t-distribution, which is manifested by having a smaller RMSE. Meanwhile, we also apply the RDML model to the deoxyribonucleic acid methylation dataset from the Alzheimer’s disease (AD) neuroimaging initiative database with the aim of investigating the impact of Cerebrospinal Fluid Amyloid $$\upbeta$$ β 42 (CSF A $$\upbeta$$ β 42) on AD severity. Conclusion These findings illustrate that the RDML model is capable of robustly estimating causal effect, even when the outcome distribution is affected by outliers or displays symmetrically heavy-tailed properties.

Keywords