IEEE Access (Jan 2024)

Overcoming Confounding Bias in Causal Discovery Using Minimum Redundancy and Maximum Relevancy Constraint

  • Havisha Nadendla,
  • Pujit Pavan Etha,
  • Pradeep Chowriappa

DOI
https://doi.org/10.1109/ACCESS.2024.3372369
Journal volume & issue
Vol. 12
pp. 33057 – 33068

Abstract


Causal discovery is the process of modeling cause-and-effect relationships among features. Unlike traditional model-based approaches, which rely on fitting data to a model, causal discovery methods determine the causal structure from the data. In clinical and electronic health record (EHR) data analysis, causal discovery is used to identify dependencies among features that are difficult to estimate using model-based approaches. The resulting structures are represented as Directed Acyclic Graphs (DAGs) consisting of nodes and arcs, where the direction of an arc indicates the influence of one feature on another. These dependencies are fundamental to discovering novel insights from data. However, causal discovery relies solely on conditional dependencies to establish relationships between features, which can lead to inaccurate inferences caused by confounding bias. Our contribution in this work is ‘Non-Confounding Causal Discovery’ (NCCD), a framework that aims to overcome confounding bias by leveraging maximum relevancy and minimum redundancy between features, using concepts from information theory. The framework uses threshold-conditioned values to decide which features in the graphical structure are connected to one another. We validated NCCD on three clinical trial benchmark datasets and compared the results against the established Naïve Bayes (NB) and Tree Augmented Naïve Bayes (TAN) algorithms. We observe a reduction in the complexity of the graph, evidenced by a decrease in the number of arcs. Notably, the graphs generated by NCCD were able to eliminate confounding dependencies while preserving the overall score of the network.
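
The abstract does not spell out how relevancy, redundancy, and the threshold are computed, so the following is only a minimal sketch of a generic minimum-redundancy, maximum-relevancy (mRMR) score for discrete features, not the NCCD algorithm itself. It assumes relevancy is the mutual information between a feature and the target, redundancy is the mean mutual information between a feature and the other features, and a feature is kept when relevancy minus redundancy reaches a threshold. The function names (`mutual_information`, `mrmr_scores`, `select_features`), the difference criterion, and the threshold value are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def mrmr_scores(features, target):
    """Score each feature as relevancy minus mean redundancy (assumed criterion).

    features: dict mapping a feature name to a list of discrete values.
    target:   list of discrete values, same length as each feature column.
    """
    scores = {}
    for name, column in features.items():
        relevancy = mutual_information(column, target)
        others = [
            mutual_information(column, other)
            for other_name, other in features.items()
            if other_name != name
        ]
        redundancy = sum(others) / len(others) if others else 0.0
        scores[name] = relevancy - redundancy
    return scores


def select_features(features, target, threshold):
    """Keep features whose score reaches the (hypothetical) threshold."""
    scores = mrmr_scores(features, target)
    return [name for name, score in scores.items() if score >= threshold]


# Toy data: "a" and "c" are identical copies that determine the target,
# while "b" is independent of both the target and the other features.
toy_target = [0, 0, 1, 1]
toy_features = {
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "c": [0, 0, 1, 1],
}
toy_scores = mrmr_scores(toy_features, toy_target)
toy_selected = select_features(toy_features, toy_target, threshold=0.25)
```

In this toy example the duplicated features "a" and "c" are each penalized for being redundant with the other, which illustrates how a redundancy term can lower the score of dependencies that merely repeat information already carried elsewhere.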

Keywords