This repository contains the reference implementation and experiment scripts for DCD (Decomposition-Based Causal Discovery), a framework for recovering causal structure from non-stationary, seasonal time series. It also includes head-to-head baselines (DYNOTEARS, PCMCI+, CD-NOD) and the synthetic / real-world datasets used in the accompanying paper.
Causal discovery methods for time series tend to break down when the data are dominated by periodic or seasonal components. Strong autocorrelation from seasonality creates spurious "causal" edges, and the false discovery rate rises sharply as the lag horizon τ_max grows.
DCD fixes this with a simple two-step recipe:
- Decompose each variable with Seasonal-Trend (STL) decomposition and keep the residual component.
- Discover on the residuals with PCMCI+ (CMI-knn independence test).
This preserves the true lagged dependence structure while removing the cyclic components that confound standard algorithms, so DCD keeps a high true positive rate (TPR) and a low structural Hamming distance (SHD) well beyond τ_max = 4.
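To see why a shared seasonal cycle is a problem, here is a minimal standard-library sketch. It is not the DCD pipeline: subtracting per-phase means is a crude stand-in for taking the STL residual, plain correlation stands in for a conditional independence test, and the period, amplitude, and sample size are illustrative assumptions.

```python
import math
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def remove_phase_means(series, period):
    """Subtract the mean of each seasonal phase (crude stand-in for an STL residual)."""
    phase_means = [statistics.fmean(series[p::period]) for p in range(period)]
    return [v - phase_means[i % period] for i, v in enumerate(series)]


rng = random.Random(0)   # fixed seed so the run is repeatable
period, n = 12, 600      # illustrative values, not taken from the paper
season = [3.0 * math.sin(2 * math.pi * t / period) for t in range(n)]

# x and y share a seasonal cycle but have independent noise: no causal link.
x = [s + rng.gauss(0, 1) for s in season]
y = [s + rng.gauss(0, 1) for s in season]

raw_corr = pearson(x, y)
resid_corr = pearson(remove_phase_means(x, period), remove_phase_means(y, period))
print(f"raw: {raw_corr:.2f}  residual: {resid_corr:.2f}")
```

The raw series are strongly correlated even though neither drives the other, while the deseasonalized series are close to uncorrelated. A dependence-based discovery method run on the raw series would be inclined to draw an edge here.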
DCD/
├── src/
│   └── dcd/                          # installable Python package
│       ├── __init__.py
│       ├── core.py                   # DCD pipeline
│       ├── baselines/
│       │   └── cdnod_pcmci.py        # CD-NOD and PCMCI+ baselines on Arctic data
│       └── utils/
│           └── extract_code.py       # helper: notebook -> .py
├── experiments/                      # runnable experiment drivers
│   ├── ablation.py
│   ├── extensive_ablation.py
│   ├── plot_arctic.py
│   ├── plot_ehh1.py
│   ├── dynotears_arctic.py
│   └── run_dynotears_baseline.py
├── scripts/
│   └── regenerate_paper_tables.py    # rebuilds the summary tables in results/
├── notebooks/                        # annotated walkthroughs
│   ├── DCD.ipynb
│   ├── DYNOTEARS.ipynb
│   ├── CD_NOD_PCMCI+_on_SIE.ipynb
│   └── Synthetic_data_results.ipynb
├── datasets/
│   ├── Arctic_Monthly.csv            # real-world climate dataset
│   ├── ehh1.csv                      # real-world (EHH-1) dataset
│   ├── lag_2/                        # synthetic data, ground-truth max lag = 2
│   ├── lag_3/                        # ground-truth max lag = 3
│   └── lag_4/                        # ground-truth max lag = 4
├── replication/
│   └── isolation_test.py             # Table A3 isolation-test replication script
├── tetrad/                           # Tetrad session files for non-Python baselines
│   ├── Other_approaches.tet          # BOSS-LiNGAM, CPC, CFCI sessions
│   ├── Results_4var.tet              # saved results for 4-variable experiments
│   └── README.md
├── results/                          # CSV outputs from experiments
├── figures/                          # generated plots
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md
Each synthetic CSV is named {n_vars}.{lag}.{n_samples}.csv, e.g. 6.3.1000.csv is 6 variables, ground-truth lag 3, 1000 time steps.
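If you need the metadata programmatically, the naming scheme can be parsed as in the sketch below. `SyntheticSpec` and `parse_synthetic_name` are hypothetical helpers written for this README, not part of the `dcd` package, and the example path is illustrative.

```python
from pathlib import Path
from typing import NamedTuple


class SyntheticSpec(NamedTuple):
    """Metadata encoded in a synthetic dataset file name (hypothetical helper)."""
    n_vars: int
    lag: int
    n_samples: int


def parse_synthetic_name(path):
    """Parse '{n_vars}.{lag}.{n_samples}.csv' into a SyntheticSpec."""
    parts = Path(path).name.split(".")
    if len(parts) != 4 or parts[3] != "csv":
        raise ValueError(f"unexpected synthetic file name: {path!r}")
    # int() raises ValueError if any field is not a whole number.
    n_vars, lag, n_samples = (int(p) for p in parts[:3])
    return SyntheticSpec(n_vars, lag, n_samples)


spec = parse_synthetic_name("datasets/lag_3/6.3.1000.csv")
print(spec)
```

Names that do not match the pattern raise `ValueError`, so a stray file in a dataset folder fails loudly instead of being silently misread.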
Tested on Python 3.8 – 3.11.
    git clone https://github.com/hferdous/DCD.git
    cd DCD
    # Option A: just install dependencies
    pip install -r requirements.txt
    # Option B: editable install of the `dcd` package
    pip install -e .
The heaviest dependencies are tigramite, causalnex, and causal-learn. We recommend a fresh virtual environment.
All experiment drivers live under experiments/ and can be run directly.
They resolve dataset and result paths relative to the repo root, so you can
launch them from any working directory.
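One common way to get this behavior is to anchor paths on the script's own location rather than the current working directory. The sketch below shows that pattern; `repo_root_from` and `dataset_path` are hypothetical names, and the drivers in this repo may implement it differently.

```python
from pathlib import Path


def repo_root_from(script_path):
    """Return the repo root for a script that lives one level below it (e.g. experiments/)."""
    # parents[0] is the script's folder, parents[1] is the folder above it.
    return Path(script_path).resolve().parents[1]


def dataset_path(script_path, *parts):
    """Build a path under datasets/ that does not depend on the working directory."""
    return repo_root_from(script_path) / "datasets" / Path(*parts)


# Inside a driver you would pass __file__; a literal path is used here for illustration.
example = dataset_path("DCD/experiments/plot_arctic.py", "Arctic_Monthly.csv")
print(example)
```

Inside a real driver the call would be `dataset_path(__file__, "Arctic_Monthly.csv")`, which resolves to the same file no matter where the script is launched from.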
python experiments/extensive_ablation.py
Sweeps n_vars ∈ {4,6,8}, n_samples ∈ {500,1000,1500},
lag ∈ {2,4,6}, period ∈ {10,15,20,25,30,35} and writes
results/extensive_ablation_results.csv.
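For a sense of the size of this sweep, the sketch below enumerates the full Cartesian product of the four parameter lists. It assumes every combination is run once; the actual loop structure and any repeated seeds in extensive_ablation.py may differ.

```python
from itertools import product

# Parameter values as listed in this README.
GRID = {
    "n_vars": [4, 6, 8],
    "n_samples": [500, 1000, 1500],
    "lag": [2, 4, 6],
    "period": [10, 15, 20, 25, 30, 35],
}


def iter_configs(grid):
    """Yield one dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(GRID))
print(len(configs))  # 3 * 3 * 3 * 6 = 162 combinations
```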
python experiments/run_dynotears_baseline.py
Writes results/dynotears_baseline_results.csv.
    python experiments/plot_arctic.py       # → figures/arctic_dynotears_graph.png
    python experiments/plot_ehh1.py         # → figures/ehh1_dynotears_graph.png
    python experiments/dynotears_arctic.py  # prints edge list for Arctic_Monthly.csv
    python -m dcd.baselines.cdnod_pcmci --method pcmci
    python -m dcd.baselines.cdnod_pcmci --method cdnod
python scripts/regenerate_paper_tables.py
Writes algorithm_comparison.csv, statistics_by_sample_size.csv,
and statistics_by_i_value.csv to results/.
python replication/isolation_test.py
Runs the multi-scale isolation test (d=6, n=1000, 3 seeds) and writes
replication/isolation_comparison.csv comparing PCMCI+, DYNOTEARS, and DCD
on raw / STL-residual / multi-scale inputs.
Open the notebooks for step-by-step versions of each pipeline:
- notebooks/DCD.ipynb
- notebooks/DYNOTEARS.ipynb
- notebooks/CD_NOD_PCMCI+_on_SIE.ipynb
- notebooks/Synthetic_data_results.ipynb
As a library:
    import pandas as pd
    from dcd.core import load_dataset, decompose_all, run_pcmci_analysis
    df = load_dataset(pd.read_csv("your_series.csv"), time_col="time")
    components_df, periods = decompose_all(df, "time")
    results = run_pcmci_analysis(components_df, max_lag=4)
Or from the command line:
python -m dcd.core your_series.csv --time-col time --max-lag 4 --no-plot
- DCD maintains TPR ≈ 1.0 across lag depths up to 6, where baselines drop to ≤ 0.3.
- DYNOTEARS produces > 90% false-discovery edges on periodic systems at τ_max > 2.
- STL decomposition prior to independence-based discovery is necessary for reliable recovery in multivariate periodic time series of up to 1500 samples.
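For readers unfamiliar with the metrics quoted above, the sketch below computes TPR, false discovery rate, and SHD from two sets of lagged edges. It assumes edges are `(source, target, lag)` tuples and counts SHD as missing plus extra edges; the paper's evaluation code may define SHD differently (for example, by treating a reversed edge as a single error). `edge_metrics` is a hypothetical helper, and the edge sets are made up for illustration.

```python
def edge_metrics(true_edges, pred_edges):
    """Compare predicted lagged edges with ground truth.

    Edges are (source, target, lag) tuples. SHD here is simply
    missing edges + extra edges, which is an assumption of this sketch.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)   # correctly recovered edges
    fp = len(pred_edges - true_edges)   # predicted but not real
    fn = len(true_edges - pred_edges)   # real but missed
    tpr = tp / (tp + fn) if true_edges else float("nan")
    fdr = fp / (tp + fp) if pred_edges else 0.0
    return {"tpr": tpr, "fdr": fdr, "shd": fp + fn}


# Toy example: three true edges, four predicted edges, two of them correct.
truth = {(0, 1, 1), (1, 2, 2), (2, 0, 1)}
predicted = {(0, 1, 1), (1, 2, 2), (0, 2, 1), (1, 0, 3)}
metrics = edge_metrics(truth, predicted)
print(metrics)
```

In the toy example two of three true edges are recovered and two of four predicted edges are wrong, giving a TPR of 2/3, an FDR of 0.5, and an SHD of 3.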
The manuscript is available on arXiv: arXiv:2602.01433.
Released under the MIT License — see LICENSE.
If you find DCD useful in your research, please cite our paper:
    @article{ferdous2026dcd,
      title={DCD: Decomposition-based Causal Discovery from Autocorrelated and Non-Stationary Temporal Data},
      author={Ferdous, Muhammad Hasan and Gani, Md Osman},
      journal={arXiv preprint arXiv:2602.01433},
      year={2026}
    }