A causal direction test for heterogeneous populations

Vahid Partovi Nia; Xinlin Li; Masoud Asgharian; Shoubo Hu; Yanhui Geng; Zhitang Chen

Machine Learning with Applications (Mar 2022)

Affiliations

Vahid Partovi Nia: Huawei Noah’s Ark Lab, Suite 201, 7101 Park Avenue, Montreal, Quebec H3N 1X9, Canada; Polytechnique Montreal, Department of Mathematics and Industrial Engineering, 2500 Chemin de Polytechnique, Montreal, QC H3T 1J4, Canada; Corresponding author at: Huawei Noah’s Ark Lab, Suite 201, 7101 Park Avenue, Montreal, Quebec H3N 1X9, Canada.
Xinlin Li: Huawei Noah’s Ark Lab, Suite 201, 7101 Park Avenue, Montreal, Quebec H3N 1X9, Canada
Masoud Asgharian: McGill University, Department of Mathematics and Statistics, 805 Rue Sherbrooke West, Montreal, QC H3A 2K6, Canada
Shoubo Hu: The Chinese University of Hong Kong, Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, Shatin N.T., Hong Kong
Yanhui Geng: Huawei Noah’s Ark Lab, Units 525-530, Core Building 2, Hong Kong Science Park, Shatin, Hong Kong
Zhitang Chen: Huawei Noah’s Ark Lab, Units 525-530, Core Building 2, Hong Kong Science Park, Shatin, Hong Kong

Journal volume & issue: Vol. 7, p. 100235

Abstract

A probabilistic expert system emulates the decision-making ability of a human expert through a directional graphical model. The first step in building such a system is to understand the data-generating mechanism. To this end, one may decompose a multivariate distribution into a product of several conditionals, moving black-box machine learning predictive models towards transparent cause-and-effect discovery. Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications. We show that when the homogeneity assumption is violated, causal models developed under that assumption can fail to identify the correct causal direction. We propose an adjustment to a commonly used causal direction test statistic based on a k-means type clustering algorithm, in which both the labels and the number of components are estimated from the collected data. Our simulation results show that the proposed adjustment significantly improves the performance of the causal direction test statistic for heterogeneous data. We study the large-sample behaviour of our proposed test statistic and demonstrate the application of the proposed method using real data.

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
