ML4S: Learning Causal Skeleton from Vicinal Graphs

SIGKDD 2022

Organized by ACM

Causal skeleton learning aims to identify the undirected graph of the underlying causal Bayesian network (BN) from observational data. It plays a pivotal role in causal discovery and in many downstream applications. Existing methods fall into three primary categories: constraint-based, score-based, and gradient-based. This paper is the first to advocate learning a causal skeleton in a supervision-based setting, where the algorithm learns from additional datasets associated with ground-truth BNs, complementary to the input observational data. Realizing a supervision-based method is non-trivial for two reasons: the high complexity of the problem itself, and the potential “domain shift” between training data (i.e., the additional datasets associated with ground-truth BNs) and test data (i.e., the observational data). First, skeleton learning is well known to have exponential worst-case complexity. Second, conventional supervised learning assumes that training and test data are independent and identically distributed (i.i.d.), an assumption that is hard to satisfy here because the causal mechanisms underlying the training and test data differ. Our proposed framework, ML4S, adopts order-based cascade classifiers and pruning strategies that cope with the high computational cost without sacrificing accuracy. To address the “domain shift” challenge, we generate training data from vicinal graphs w.r.t. the target BN; the datasets associated with these vicinal graphs share similar joint distributions with the observational data. We evaluate ML4S on a variety of datasets and observe that it substantially outperforms state-of-the-art methods, demonstrating the potential of the supervision-based skeleton learning paradigm.
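
To make the two central objects concrete, the following minimal Python sketch shows what a skeleton is (the DAG with edge directions dropped) and one way a "nearby" graph could be produced. It is an illustration only, not the ML4S procedure: the function names `skeleton` and `vicinal_dag`, the toy graph, and the edge-toggling perturbation are all assumptions introduced here, since the abstract does not specify how vicinal graphs are constructed.

```python
import random

def skeleton(dag_edges):
    """Return the undirected skeleton of a DAG as a set of unordered pairs."""
    return {frozenset((u, v)) for u, v in dag_edges}

def vicinal_dag(dag_edges, order, n_changes, rng):
    """Hypothetical perturbation (an assumption, not the paper's method):
    toggle n_changes distinct edges. Every edge points forward in the given
    topological order, so the result is guaranteed to stay acyclic."""
    pos = {node: i for i, node in enumerate(order)}
    edges = set(dag_edges)
    # All ordered pairs (u, v) with u before v in the topological order.
    candidates = [(u, v) for u in order for v in order if pos[u] < pos[v]]
    for u, v in rng.sample(candidates, n_changes):
        edges ^= {(u, v)}  # remove the edge if present, add it otherwise
    return edges

# Toy ground-truth BN structure; its edges follow the order A, B, C, D.
dag = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
order = ["A", "B", "C", "D"]

skel = skeleton(dag)
near = vicinal_dag(dag, order, n_changes=1, rng=random.Random(0))
diff = skeleton(near) ^ skel  # skeleton edges on which the two graphs disagree

print(sorted(tuple(sorted(e)) for e in skel))
print(len(diff))
```

In a supervision-based setting, graphs like `near` would come with known skeleton labels, so data sampled from them could serve as training examples whose distribution stays close to that of the target. How close the perturbed graphs should be, and how they relate to the unknown target BN, is exactly what the paper's vicinal-graph construction addresses and is not captured by this sketch.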