Causal Discovery in the Real World
Inferring causal structure from observational data is a fundamental challenge in science and evidence-based decision-making. Most existing methods for learning directed acyclic graphs (DAGs) assume that the true causal graph is identifiable from data. In practice this assumption rarely holds cleanly: modeling assumptions are violated, data is limited, and the space of plausible structures is large. Our recent work addresses these practical shortcomings from two directions.
CaPE: Causal Preference Elicitation addresses the fact that DAG estimation from observational data alone is often under-determined, because many graph structures are consistent with the same data. CaPE brings a domain expert into the loop through a Bayesian active learning framework that strategically queries the expert about edge relationships in the graph. A three-way likelihood models the expert's judgments about both edge presence and edge direction. The posterior over graphs is approximated with particles, and an expected information gain criterion selects the most informative query at each step. The result is faster convergence to the true causal structure and better recovery of causal effects under a limited query budget. CaPE was accepted at ICML 2026.
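To make the query-selection loop concrete, here is a minimal Python sketch of expected-information-gain query selection over a weighted particle approximation of the posterior over DAGs. It is not CaPE's implementation. The answer space (i -> j, j -> i, no edge), the symmetric noise model with a single error rate `eps`, the edge-set graph representation, and every function name (`answer_likelihood`, `expected_information_gain`, `best_query`, `update`) are hypothetical choices made for illustration. A practical particle method would also resample and rejuvenate particles; this sketch only reweights them.

```python
import math
from itertools import combinations

# Assumed answer space for a query about the pair (i, j).
ANSWERS = ("forward", "backward", "none")  # i -> j, j -> i, no edge


def relation(graph, i, j):
    """Relation between i and j in a DAG stored as a set of (parent, child) edges."""
    if (i, j) in graph:
        return "forward"
    if (j, i) in graph:
        return "backward"
    return "none"


def answer_likelihood(answer, true_rel, eps=0.1):
    """Hypothetical symmetric noise model: the expert is right with prob. 1 - eps
    and otherwise picks one of the two wrong answers uniformly at random."""
    return 1.0 - eps if answer == true_rel else eps / 2.0


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_information_gain(particles, weights, i, j, eps=0.1):
    """Mutual information between the expert's answer about (i, j) and the graph,
    under the weighted-particle approximation of the posterior."""
    rels = [relation(g, i, j) for g in particles]
    # Predictive distribution of the answer, marginalizing over particles.
    predictive = [
        sum(w * answer_likelihood(a, r, eps) for r, w in zip(rels, weights))
        for a in ANSWERS
    ]
    # Expected entropy of the answer given a particular graph.
    expected_conditional = sum(
        w * entropy([answer_likelihood(a, r, eps) for a in ANSWERS])
        for r, w in zip(rels, weights)
    )
    return entropy(predictive) - expected_conditional


def best_query(particles, weights, variables, eps=0.1):
    """Pick the unordered pair of variables with the highest expected information gain."""
    return max(
        combinations(variables, 2),
        key=lambda pair: expected_information_gain(particles, weights, *pair, eps=eps),
    )


def update(particles, weights, i, j, answer, eps=0.1):
    """Reweight particles by the likelihood of the observed answer, then normalize."""
    new = [w * answer_likelihood(answer, relation(g, i, j), eps)
           for g, w in zip(particles, weights)]
    total = sum(new)
    return [w / total for w in new]


# Example: three candidate graphs over A, B, C with unequal posterior weight.
particles = [
    {("A", "B"), ("B", "C")},  # A -> B -> C
    {("B", "A"), ("B", "C")},  # A <- B -> C
    {("A", "B"), ("C", "B")},  # A -> B <- C
]
weights = [0.4, 0.4, 0.2]
query = best_query(particles, weights, ["A", "B", "C"])
weights = update(particles, weights, *query, answer="forward")
```

In this example the pair (A, C) has zero expected gain because every particle agrees there is no edge, and (A, B) is preferred over (B, C) because the particles' 0.6/0.4 split on its direction is more uncertain than their 0.8/0.2 split on B and C. Because this toy noise model is identical for every particle, the conditional-entropy term is constant and the ranking reduces to picking the pair whose predicted answer is most uncertain; a richer expert likelihood would make that term vary.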
Arrow: A Foundation Model for Causal Discovery takes a complementary approach. Rather than requiring task-specific training or expert elicitation, Arrow is a transformer-based foundation model trained on synthetic datasets with diverse, known causal structures. At inference time it performs zero-shot causal discovery on new tabular datasets, with no fine-tuning required. Arrow uses DAG factorization and skeleton-order decomposition to predict graph structure, achieving performance comparable to or better than existing methods at a fraction of their computational cost.
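One standard identity helps explain why a skeleton-plus-order decomposition is convenient: any DAG can be written as an undirected skeleton together with a topological ordering, and orienting every skeleton edge from the earlier node to the later node always yields an acyclic graph. The Python sketch below illustrates only that identity, assuming graphs are stored as sets of (parent, child) edges. It does not show how Arrow parameterizes or predicts the skeleton and order with its transformer, and the function names are hypothetical.

```python
from collections import deque


def dag_from_skeleton_and_order(skeleton, order):
    """Orient each undirected skeleton edge {u, v} from the node that comes
    earlier in `order` to the one that comes later. The result is always
    acyclic, because every edge points forward in a fixed total order."""
    position = {node: k for k, node in enumerate(order)}
    dag = set()
    for edge in skeleton:
        u, v = tuple(edge)
        dag.add((u, v) if position[u] < position[v] else (v, u))
    return dag


def skeleton_and_order_from_dag(nodes, dag):
    """Inverse direction: drop edge directions to get the skeleton, and use
    Kahn's algorithm to recover one valid topological order."""
    skeleton = {frozenset(e) for e in dag}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in dag:
        indegree[child] += 1
        children[parent].append(child)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return skeleton, order


# Example: a collider X -> Y <- Z survives the round trip unchanged.
nodes = ["X", "Y", "Z"]
dag = {("X", "Y"), ("Z", "Y")}
skeleton, order = skeleton_and_order_from_dag(nodes, dag)
rebuilt = dag_from_skeleton_and_order(skeleton, order)
```

Any order consistent with the DAG reproduces the same graph, while an order that conflicts with it simply re-orients the skeleton edges (for example, the order Y, X, Z turns the collider into Y -> X and Y -> Z). Either way, acyclicity never has to be enforced as a separate constraint.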
Together, these works push causal discovery toward practical deployment: CaPE by making expert knowledge tractably useful, and Arrow by eliminating the computational barrier to applying strong causal priors on new problems.