Causal Structure Recovery under Partial Observability
We show that the standard estimate-then-discover pipeline for causal inference in partially observed dynamical systems is systematically biased when built on RTS-smoothed states. We prove that filtered state estimates preserve the exogeneity conditions required for Granger-style discovery, and show empirically that switching from the smoothed to the filtered representation recovers near-oracle structure at no extra cost.
The variables we care about are rarely the ones we observe.
In many real-world settings, such as power systems, sensor networks, or neural recordings, the variables whose causal relationships we want to understand are hidden. We see only noisy projections of them over time, and reconstruct the rest.
The standard pipeline has two steps. First, you recover the hidden trajectory with a Kalman filter or an RTS smoother. Then you run a causal discovery procedure on the recovered trajectory, usually a Granger test, partial correlation, or a VAR coefficient test.
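The first step of this pipeline can be sketched as follows. This is a minimal NumPy sketch of a textbook Kalman filter and RTS smoother, not the authors' implementation; the function names and the argument names (`A`, `C`, `Q`, `R`, `m0`, `P0`) are assumptions introduced here for illustration, and the model parameters are assumed to be known.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, m0, P0):
    """Forward pass for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t.

    Returns filtered means/covariances and one-step predicted means/covariances.
    All names are illustrative assumptions, not taken from the source.
    """
    T, n = y.shape[0], A.shape[0]
    m_pred = np.zeros((T, n)); P_pred = np.zeros((T, n, n))
    m_filt = np.zeros((T, n)); P_filt = np.zeros((T, n, n))
    m, P = m0, P0  # the prior acts as the prediction for t = 0
    for t in range(T):
        if t > 0:
            # Predict from the previous filtered estimate.
            m = A @ m_filt[t - 1]
            P = A @ P_filt[t - 1] @ A.T + Q
        m_pred[t], P_pred[t] = m, P
        # Update with the observation at time t only.
        S = C @ P @ C.T + R
        K = np.linalg.solve(S, C @ P).T  # K = P C^T S^{-1}
        m_filt[t] = m + K @ (y[t] - C @ m)
        P_filt[t] = (np.eye(n) - K @ C) @ P
    return m_filt, P_filt, m_pred, P_pred

def rts_smoother(A, m_filt, P_filt, m_pred, P_pred):
    """Backward pass: each smoothed mean absorbs information from later steps."""
    T = m_filt.shape[0]
    m_smooth = m_filt.copy()  # at the final step, smoothed == filtered
    for t in range(T - 2, -1, -1):
        J = np.linalg.solve(P_pred[t + 1], A @ P_filt[t]).T  # P_filt A^T P_pred^{-1}
        m_smooth[t] = m_filt[t] + J @ (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth
```

Note that the smoother consumes the filter's outputs, so any pipeline that computes the smoothed trajectory has already computed the filtered one.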
This paper argues that the first step is not neutral. The choice of state representation quietly determines whether the second step is valid at all.
Why smoothed states create endogeneity and filtered states don't.
The core claim is structural rather than empirical. RTS smoothing pulls information from future observations into the estimate at every timestep, so the smoothed estimate at time t is a function of the entire observation record, including observations made after t.
When this estimate is regressed on its own past to test for Granger causality, the regressor ends up correlated with the error term. The OLS exogeneity assumption fails, and the coefficient estimates are biased.
The filtered estimate depends only on the past, so exogeneity is preserved. The same downstream test, unchanged, returns near-oracle structure.
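The contrast between the two estimates can be written out in standard state-space notation. The symbols below are assumptions introduced here (they do not appear in the source): $x_t$ is the latent state, $y_{1:t}$ the observations up to time $t$, $T$ the length of the record, $A$ the transition matrix, and $P_{t|s}$ the covariance of the estimate of $x_t$ given $y_{1:s}$.

```latex
\begin{align}
\hat{x}_{t|t} &= \mathbb{E}\left[x_t \mid y_{1:t}\right]
  && \text{(filtered: past and present only)} \\
\hat{x}_{t|T} &= \mathbb{E}\left[x_t \mid y_{1:T}\right]
  && \text{(smoothed: full record)} \\
\hat{x}_{t|T} &= \hat{x}_{t|t} + J_t \left(\hat{x}_{t+1|T} - \hat{x}_{t+1|t}\right),
  \qquad J_t = P_{t|t} A^{\top} P_{t+1|t}^{-1}
\end{align}
```

The third line is the textbook RTS backward recursion. The correction term depends on $\hat{x}_{t+1|T}$, which is how observations after time $t$ enter the smoothed estimate at time $t$.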
The error lives at the pipeline level, not in the method itself. Changing the discovery algorithm won't fix it; changing what you feed it will.
A three-node system, four ways of looking at it.
We take a ground-truth causal graph over x, y, z with just two edges and pass it through a partially observed linear dynamical system. Switching the state representation changes which edges the Granger test finds.
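A system of this kind can be simulated in a few lines. The sketch below is illustrative only: the source does not say which two edges the graph contains, so the choice x → y and y → z, the coefficient values, the observation matrix `C`, and the noise levels are all hypothetical.

```python
import numpy as np

def simulate_lds(A, C, q_std, r_std, T, seed=0):
    """Simulate x_t = A x_{t-1} + w_t and y_t = C x_t + v_t with Gaussian noise."""
    rng = np.random.default_rng(seed)
    n, k = A.shape[0], C.shape[0]
    x = np.zeros((T, n))
    y = np.zeros((T, k))
    prev = np.zeros(n)
    for t in range(T):
        x[t] = A @ prev + q_std * rng.standard_normal(n)
        y[t] = C @ x[t] + r_std * rng.standard_normal(k)
        prev = x[t]
    return x, y

# Hypothetical two-edge graph: x -> y and y -> z.
# Convention: A[i, j] != 0 means variable j drives variable i at the next step.
A = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.4, 0.5]])

# Hypothetical partial observation: 2 sensors for 3 states, with y and z mixed.
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])

x_true, y_obs = simulate_lds(A, C, q_std=1.0, r_std=0.5, T=2000)
```

Here `x_true` plays the role of the "true" representation, while `y_obs` is all that a state estimator would get to see.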
The same test, four times.
We construct a linear-Gaussian state-space system with a known two-edge causal graph, apply a partial observation matrix, and feed each of the four representations (true states, filtered estimates, smoothed estimates, and innovations) into the same set of discovery methods: Granger F, VAR coefficient tests, partial correlation, and LASSO.
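As an illustration of one of these discovery methods, the sketch below computes a lag-1 conditional Granger F statistic for every ordered pair of variables in a trajectory. It is a simplified stand-in, assuming a single lag and an intercept, and is not the authors' test; `granger_f_matrix` and `_rss` are hypothetical names. The same function can be applied unchanged to any of the four representations.

```python
import numpy as np

def _rss(Z, y):
    """Residual sum of squares of the OLS fit of y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return float(resid @ resid)

def granger_f_matrix(X):
    """F[i, j] is the F statistic for 'j Granger-causes i' at lag 1.

    The full model regresses X[t, i] on an intercept and all of X[t-1];
    the restricted model drops X[t-1, j]. Diagonal entries are left as NaN.
    """
    T, n = X.shape
    targets = X[1:]
    Z = np.column_stack([np.ones(T - 1), X[:-1]])  # intercept + lagged states
    dof = (T - 1) - Z.shape[1]                     # residual degrees of freedom
    F = np.full((n, n), np.nan)
    for i in range(n):
        rss_full = _rss(Z, targets[:, i])
        for j in range(n):
            if i == j:
                continue
            Z_restricted = np.delete(Z, j + 1, axis=1)  # column 0 is the intercept
            rss_restricted = _rss(Z_restricted, targets[:, i])
            # One restriction, so the numerator has one degree of freedom.
            F[i, j] = (rss_restricted - rss_full) / (rss_full / dof)
    return F
```

An edge j → i would then be declared when `F[i, j]` exceeds the critical value of the F distribution with (1, `dof`) degrees of freedom at the chosen significance level.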
04.1 Conditions
- Varying observability — rank of the observation matrix from 1 to full.
- Varying sample size — 200 to 20,000 timesteps.
- Varying noise — process and observation noise independently.
- Varying system dimension — 3 to 12 latent variables.
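A sweep over these conditions could be enumerated as below. Only the endpoints (rank 1 to full, 200 to 20,000 timesteps, 3 to 12 latent variables) come from the text; the intermediate values, the noise levels, and the name `condition_grid` are assumptions made for illustration.

```python
from itertools import product

def condition_grid(dims=(3, 6, 12),
                   sample_sizes=(200, 2000, 20000),
                   process_noise=(0.1, 1.0),
                   obs_noise=(0.1, 1.0)):
    """Enumerate experimental conditions as a list of dicts.

    Observation rank runs from 1 to full for each latent dimension;
    process and observation noise are varied independently.
    """
    grid = []
    for n in dims:
        ranks = range(1, n + 1)
        for rank, T, q, r in product(ranks, sample_sizes, process_noise, obs_noise):
            grid.append({"dim": n, "obs_rank": rank, "T": T,
                         "q_std": q, "r_std": r})
    return grid

conditions = condition_grid()
```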
04.2 Result
Across every condition, the ranking is the same. True and filtered are near-identical. Smoothed produces systematic false edges. Innovations are causally inert.
| Representation | Signal | Bias | Discovered |
|---|---|---|---|
| True | preserved | none | oracle |
| Filtered | preserved | negligible | near-oracle |
| Smoothed | distorted | structural | false edges |
| Innovations | removed | n/a | no edges |
Critically, the smoothed-state false-positive rate increases with sample size. Finite-sample noise would shrink as more data arrives, so this rules it out as the cause. The false edges are a structural artefact of the representation.
Change one line, get a different answer.
If you run an estimate-then-discover pipeline on a partially observed system, replace the smoothed trajectory with the filtered one. The Kalman recursion already produces it.
05.1 Limits
- Analysis assumes linear Gaussian systems with correct model specification.
- Experiments use synthetic data. Evaluation on real-world data is the next stage of work.
- Recovery degrades under very low observability or high latent dimension.
- The nonlinear case remains open and is the direction we are moving toward.
05.2 Direction
- Integrated estimation-and-discovery formulations in which the two steps are jointly aware of each other.
- Extension to nonlinear state estimation — particle and ensemble filters.
- Evaluation on real datasets: power networks, neural population recordings, sensor arrays.