IEEE Access (Jan 2022)

Optimal Causal Decision Trees Ensemble for Improved Prediction and Causal Inference

  • Neelam Younas,
  • Amjad Ali,
  • Hafsa Hina,
  • Muhammad Hamraz,
  • Zardad Khan,
  • Saeed Aldahmani

DOI
https://doi.org/10.1109/ACCESS.2022.3146406
Journal volume & issue
Vol. 10
pp. 13000 – 13011

Abstract

Ensemble methods can be used to identify causal relationships in data, supporting better understanding and sounder decisions in processes that involve high risk. This paper explores the idea of a causal decision tree forest and proposes a regularized ensemble method that integrates optimal causal trees to improve prediction accuracy without compromising the accuracy of heterogeneous treatment effect estimates. The proposed method selects a subset of the most accurate causal trees from a sufficiently large pool, based on their out-of-sample error estimates. The selected trees are integrated to form an ensemble that is used for estimating heterogeneous treatment effects and predicting unseen data. As an example, the proposed method is applied to Pakistan’s income function data, consisting of 27,964 observations on the wages of workers aged 10 and above. The paper also gives a detailed simulation study in which datasets are generated under 5 different designs. The proposed method is assessed against ordinary least squares (OLS), least absolute shrinkage and selection operator (LASSO), Ridge, Causal Tree and the standard decision tree forest (i.e., the causal forest), using mean square error (MSE), root mean square error (RMSE), mean absolute deviation (MAD) and Pearson correlation ($r$) as performance metrics. The analyses reveal that the proposed method can be used effectively for estimating heterogeneous treatment effects and achieves better prediction performance than the other methods considered in the paper.
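The selection step described in the abstract (grow a large pool of trees, rank them by out-of-sample error, keep the best subset, and average their predictions) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses one-split regression stumps as a stand-in for causal trees, out-of-bag mean square error as the out-of-sample estimate, and a hypothetical `fraction` parameter for the share of trees retained. All function names and the synthetic data are assumptions made for illustration.

```python
import math
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump: returns (threshold, left_mean, right_mean).
    Assumption: a stump stands in for a full (causal) tree in this sketch."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the maximum so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: predict the overall mean
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def predict(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr

def grow_pool(xs, ys, n_trees, rng):
    """Grow stumps on bootstrap samples; record each one's out-of-bag MSE."""
    n = len(xs)
    pool = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(idx))       # observations not drawn
        if not oob:
            continue                                 # no out-of-sample data to score on
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        err = sum((ys[i] - predict(stump, xs[i])) ** 2 for i in oob) / len(oob)
        pool.append((err, stump))
    return pool

def select_best(pool, fraction):
    """Keep the given fraction of the pool with the lowest out-of-bag error."""
    keep = max(1, math.ceil(fraction * len(pool)))
    return sorted(pool, key=lambda p: p[0])[:keep]

def ensemble_predict(selected, x):
    """Average the predictions of the selected stumps."""
    return sum(predict(s, x) for _, s in selected) / len(selected)

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data (assumption): a step function at x = 0.5 plus Gaussian noise.
    xs = [rng.random() for _ in range(100)]
    ys = [(2.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.3) for x in xs]
    pool = grow_pool(xs, ys, n_trees=30, rng=rng)
    selected = select_best(pool, fraction=0.2)
    print(len(pool), len(selected), ensemble_predict(selected, 0.9))
```

A causal version would replace the leaf means with within-leaf differences between treated and control outcomes; how the paper scores trees out of sample in that setting is not stated in the abstract, so the out-of-bag MSE used here is only a placeholder.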

Keywords