diff_diff.generate_ddd_data
- diff_diff.generate_ddd_data(n_per_cell=100, treatment_effect=2.0, group_effect=2.0, partition_effect=1.0, time_effect=0.5, noise_sd=1.0, add_covariates=False, seed=None)
Generate synthetic data for Triple Difference (DDD) analysis.
Creates data following the DGP:
Y = mu + G + P + T + G*P + G*T + P*T + tau*G*P*T + eps
where G=group, P=partition, and T=time are 0/1 indicators, mu is the intercept, tau is the treatment effect, and eps is idiosyncratic noise. The treatment effect (tau) applies only to observations in the treated group (G=1), the eligible partition (P=1), and the post-treatment period (T=1).
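The DGP above can be sketched in plain Python to make the cell structure concrete. This is a minimal illustration, not the library's implementation: the function name `simulate_ddd_rows`, the `intercept` argument, the interaction coefficients (`gp_effect`, `gt_effect`, `pt_effect`) and their zero defaults, and the sequential `unit_id` are all assumptions, since the documentation does not state them.

```python
import random
from itertools import product


def simulate_ddd_rows(n_per_cell=100, treatment_effect=2.0, group_effect=2.0,
                      partition_effect=1.0, time_effect=0.5, noise_sd=1.0,
                      seed=None, intercept=0.0, gp_effect=0.0, gt_effect=0.0,
                      pt_effect=0.0):
    """Hypothetical stand-in for the documented DGP (not diff_diff's code)."""
    rng = random.Random(seed)
    rows = []
    unit_id = 0
    # Eight cells: every combination of group, partition, and time.
    for g, p, t in product((0, 1), repeat=3):
        # tau switches on only in the G=1, P=1, T=1 cell.
        true_effect = treatment_effect if (g and p and t) else 0.0
        cell_mean = (intercept
                     + group_effect * g + partition_effect * p + time_effect * t
                     # Assumed interaction coefficients (not given in the docs).
                     + gp_effect * g * p + gt_effect * g * t + pt_effect * p * t
                     + true_effect)
        for _ in range(n_per_cell):
            rows.append({
                "outcome": cell_mean + rng.gauss(0.0, noise_sd),
                "group": g,
                "partition": p,
                "time": t,
                "unit_id": unit_id,  # assumed: a simple running counter
                "true_effect": true_effect,
            })
            unit_id += 1
    return rows


rows = simulate_ddd_rows(n_per_cell=5, treatment_effect=3.0, seed=42)
treated_post = [r for r in rows if r["true_effect"] != 0.0]
print(len(rows), len(treated_post))  # 40 5
```

Only one of the eight cells carries a nonzero `true_effect`, which is why the triple interaction identifies tau.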
- Parameters:
n_per_cell (int, default=100) – Number of observations per cell (8 cells total: 2x2x2).
treatment_effect (float, default=2.0) – True average treatment effect on the treated (G=1, P=1, T=1).
group_effect (float, default=2.0) – Main effect of being in the treated group.
partition_effect (float, default=1.0) – Main effect of being in the eligible partition.
time_effect (float, default=0.5) – Main effect of the post-treatment period.
noise_sd (float, default=1.0) – Standard deviation of idiosyncratic noise.
add_covariates (bool, default=False) – If True, adds age and education covariates that affect the outcome.
seed (int, optional) – Random seed for reproducibility.
- Returns:
Synthetic DDD data with columns:
- outcome: Outcome variable
- group: Group indicator (0=control, 1=treated)
- partition: Partition indicator (0=ineligible, 1=eligible)
- time: Time indicator (0=pre, 1=post)
- unit_id: Unique unit identifier
- true_effect: The true treatment effect for this observation
- age: Age covariate (if add_covariates=True)
- education: Education covariate (if add_covariates=True)
- Return type:
pd.DataFrame
Examples
Generate DDD data:
>>> from diff_diff import generate_ddd_data
>>> data = generate_ddd_data(n_per_cell=100, treatment_effect=3.0, seed=42)
>>> data.shape
(800, 6)
>>> data.groupby(['group', 'partition', 'time']).size()
group  partition  time
0      0          0       100
                  1       100
       1          0       100
                  1       100
1      0          0       100
                  1       100
       1          0       100
                  1       100
dtype: int64
Use with TripleDifference estimator:
>>> from diff_diff import TripleDifference
>>> ddd = TripleDifference()
>>> results = ddd.fit(data, outcome='outcome', group='group',
...                   partition='partition', time='time')
>>> abs(results.att - 3.0) < 1.0
True
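To see why a triple-difference estimate should land near the true effect, it may help to work through the arithmetic on noiseless cell means. The sketch below does not call `TripleDifference` and makes no claim about how that estimator is implemented; it only shows the textbook difference of two difference-in-differences. The helper names (`cell_mean`, `triple_difference`) and all coefficient values other than tau are hypothetical, chosen for illustration.

```python
from itertools import product


def cell_mean(g, p, t, tau=3.0):
    """Noiseless mean of one (group, partition, time) cell under the DGP."""
    # Hypothetical coefficients; only tau matters for the final result.
    mu, b_g, b_p, b_t = 0.0, 2.0, 1.0, 0.5
    b_gp, b_gt, b_pt = 0.5, 0.3, 0.2
    return (mu + b_g * g + b_p * p + b_t * t
            + b_gp * g * p + b_gt * g * t + b_pt * p * t
            + tau * g * p * t)


def triple_difference(means):
    """DiD (eligible vs. ineligible) in the treated group minus the same DiD
    in the control group. Keys of `means` are (group, partition, time)."""
    did_treated = ((means[1, 1, 1] - means[1, 1, 0])
                   - (means[1, 0, 1] - means[1, 0, 0]))
    did_control = ((means[0, 1, 1] - means[0, 1, 0])
                   - (means[0, 0, 1] - means[0, 0, 0]))
    return did_treated - did_control


means = {(g, p, t): cell_mean(g, p, t) for g, p, t in product((0, 1), repeat=3)}
ddd_estimate = triple_difference(means)
print(round(ddd_estimate, 6))  # 3.0
```

Every main effect and two-way interaction cancels in the subtraction, leaving only tau. With noisy data the same calculation on sample cell means gives an estimate that varies around tau, which is consistent with the loose tolerance (`< 1.0`) in the doctest above.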