Towards a Complete Characterization of Target Law Identification in Missing Data DAG Models

It is often said that the fundamental problem of causal inference is a missing data problem: comparisons of potential outcomes are difficult because only one response is observed for each unit. In this talk we consider the converse perspective: missing data problems can be viewed as causal inference problems, but in an important sense harder ones. Recovering the full data law from the observed law can be interpreted as identifying a joint distribution over counterfactual variables, corresponding to the values that would have been observed had measurement been possible. We study non-ignorable, or missing-not-at-random (MNAR), models by imposing structural restrictions on the full data law, which consists of a target distribution (restricted or unrestricted) together with a missingness mechanism that factorizes according to a directed acyclic graph. This graphical formulation allows ideas from causal identification to be applied, while also revealing gaps between causal and missing data identification. A key obstacle arises when missingness indicators are treated interventionally: sequences of interventions can induce and propagate selection bias, causing identification to fail even in settings where familiar causal tools suggest success. The talk describes how these phenomena arise, when they can be avoided, and when they fundamentally obstruct recovery of the target law. We discuss partial solutions, remaining failure modes, and what structure a complete graphical identification theory for missing data DAG models would need to capture.
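As a minimal illustration of what recovering the target law from the observed law means, the sketch below assumes the simplest missing data DAG: a binary variable X1 that may be missing (counterfactual X1^(1), indicator R1), an always-observed binary X2, and edges X2 -> X1 and X2 -> R1 but no edge X1 -> R1. Treating R1 as a treatment and "intervening" to set R1 = 1 gives the identifying formula p(x1^(1), x2) = p(x1 | x2, R1 = 1) p(x2). The numeric probabilities and the helper names `full_law`, `observed_law` and `identify_target` are hypothetical, chosen only for this example; this is not the speaker's method, just a standard textbook case.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy full-data law (exact fractions so equality checks are exact).
# Assumed DAG: X2 -> X1^(1), X2 -> R1, and no edge X1^(1) -> R1.
p_x2 = {0: F(3, 5), 1: F(2, 5)}                      # p(X2)
p_x1_given_x2 = {0: {0: F(4, 5), 1: F(1, 5)},        # p(X1^(1) | X2)
                 1: {0: F(1, 4), 1: F(3, 4)}}
p_r1_given_x2 = {0: F(9, 10), 1: F(1, 2)}            # p(R1 = 1 | X2)


def full_law():
    """Joint p(X1^(1), X2, R1) implied by the assumed DAG factorization."""
    law = {}
    for x1, x2, r1 in product((0, 1), repeat=3):
        pr = p_r1_given_x2[x2] if r1 == 1 else 1 - p_r1_given_x2[x2]
        law[(x1, x2, r1)] = p_x2[x2] * p_x1_given_x2[x2][x1] * pr
    return law


def observed_law(full):
    """Observed law: X1 equals X1^(1) when R1 = 1, otherwise it is recorded as '?'."""
    obs = {}
    for (x1, x2, r1), p in full.items():
        key = (x1 if r1 == 1 else "?", x2, r1)
        obs[key] = obs.get(key, 0) + p
    return obs


def identify_target(obs):
    """Recover p(X1^(1), X2) from the observed law alone via
    p(x1^(1), x2) = p(x1 | x2, R1 = 1) * p(x2)."""
    target = {}
    for x1, x2 in product((0, 1), repeat=2):
        p_x2_obs = sum(p for (_, b, _), p in obs.items() if b == x2)
        p_x2_r1 = sum(p for (_, b, r), p in obs.items() if b == x2 and r == 1)
        target[(x1, x2)] = obs[(x1, x2, 1)] / p_x2_r1 * p_x2_obs
    return target


obs = observed_law(full_law())
target = identify_target(obs)
truth = {(x1, x2): p_x2[x2] * p_x1_given_x2[x2][x1]
         for x1, x2 in product((0, 1), repeat=2)}
print(target == truth)  # True
```

The formula works here only because R1 is independent of X1^(1) given X2. Adding the edge X1 -> R1 (self-censoring) breaks this step, since p(x1 | x2, R1 = 1) no longer equals p(x1 | x2), and self-censoring is a standard example in which the target law is not identified without further assumptions.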
