Mathematics (Sep 2024)

A Novel Ensemble Method of Divide-and-Conquer Markov Boundary Discovery for Causal Feature Selection

  • Hao Li,
  • Jianjun Zhan,
  • Haosen Wang,
  • Zipeng Zhao

DOI
https://doi.org/10.3390/math12182927
Journal volume & issue
Vol. 12, no. 18
p. 2927

Abstract

Markov boundary discovery is highly effective at identifying features that are causally related to the target variable, which gives the selected features strong interpretability and robustness. Although numerous Markov boundary discovery methods exist for real-world applications, no single method is universally applicable to all datasets. To balance precision and recall, we therefore propose an ensemble framework of divide-and-conquer Markov boundary discovery algorithms based on a U-I selection strategy. We place three divide-and-conquer Markov boundary methods into the framework to obtain an ensemble algorithm, EDMB, which focuses on judging controversial parent–child variables to further balance precision and recall. By combining multiple algorithms, the ensemble algorithm can leverage their respective strengths and analyze the cause-and-effect relationships of the target variable more thoroughly, from multiple perspectives. It also improves robustness and reduces dependence on any single algorithm. In the experiments, we compare EDMB against four advanced Markov boundary discovery algorithms on nine benchmark Bayesian networks and three real-world datasets. EDMB ranks first in the overall ranking, which illustrates the superiority of the ensemble algorithm and the effectiveness of the adopted U-I selection strategy. The main contribution of this paper is an ensemble framework for divide-and-conquer Markov boundary discovery algorithms that balances precision and recall through the U-I selection strategy and judges controversial parent–child variables to enhance performance and robustness. The advantage of the U-I selection strategy, and its difference from existing methods, is its ability to independently obtain the maximum precision and recall of multiple algorithms within the ensemble framework. By assessing controversial parent–child variables, it further balances precision and recall, yielding results that are closer to the true Markov boundary.
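
The abstract does not spell out the U-I selection strategy, so the following is only a rough illustration under stated assumptions. It assumes that "U-I" refers to the union and the intersection of the parent–child sets returned by the base algorithms: the intersection holds variables every algorithm agrees on, and the union minus the intersection holds the "controversial" variables that are judged again. The function name `ui_select`, the `is_dependent` callback (a stand-in for a conditional independence test), and the toy variable names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, Set


def ui_select(
    candidate_sets: Iterable[Set[str]],
    is_dependent: Callable[[str], bool],
) -> Set[str]:
    """Combine parent-child sets from several base algorithms.

    Assumption (not confirmed by the abstract): variables found by every
    algorithm are kept outright, while variables found by only some of
    them are kept only if a further check says they depend on the target.
    """
    sets = [set(s) for s in candidate_sets]
    if not sets:
        return set()

    union = set().union(*sets)               # every proposed variable
    intersection = set.intersection(*sets)   # variables all algorithms agree on
    controversial = union - intersection     # variables the algorithms disagree on

    # Re-judge only the controversial variables with the supplied check.
    confirmed = {v for v in controversial if is_dependent(v)}
    return intersection | confirmed


if __name__ == "__main__":
    # Toy outputs of three hypothetical base algorithms.
    outputs = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}]

    # Hypothetical check: pretend only "C" passes the dependence test.
    result = ui_select(outputs, is_dependent=lambda v: v == "C")
    print(sorted(result))  # ['A', 'B', 'C']
```

In this reading, the union side captures everything any base algorithm proposes, the intersection side keeps only what all of them agree on, and the re-check of the disagreement set is where the two are traded off. A real implementation would replace the callback with a statistical conditional independence test on data.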

Keywords