diff_diff.generate_ddd_data

diff_diff.generate_ddd_data(n_per_cell=100, treatment_effect=2.0, group_effect=2.0, partition_effect=1.0, time_effect=0.5, noise_sd=1.0, add_covariates=False, seed=None)

Generate synthetic data for Triple Difference (DDD) analysis.

Creates data following the DGP: Y = mu + G + P + T + G*P + G*T + P*T + tau*G*P*T + eps

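where G=group, P=partition, and T=time are binary indicators, tau is treatment_effect, and eps is idiosyncratic noise with standard deviation noise_sd. The formula omits coefficients: the main-effect terms G, P, and T are scaled by group_effect, partition_effect, and time_effect respectively. The treatment effect (tau) applies only to observations in the treated group (G=1), the eligible partition (P=1), and the post-treatment period (T=1).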
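To make the DGP concrete, here is a minimal standard-library sketch of the same structure. It is not the library's implementation: the function names `simulate_ddd` and `ddd_from_cell_means`, the baseline `mu`, and the two-way interaction coefficients are all assumptions chosen for illustration, since the documentation does not state them. The sketch also shows why the triple difference of cell means isolates tau: the main effects and two-way interactions cancel.

```python
import random
from itertools import product
from statistics import mean


def simulate_ddd(n_per_cell=100, tau=2.0, group_effect=2.0,
                 partition_effect=1.0, time_effect=0.5,
                 noise_sd=1.0, seed=None):
    """Hypothetical re-creation of the stated DGP (not diff_diff's code)."""
    rng = random.Random(seed)
    mu = 10.0                    # assumed baseline level
    gp, gt, pt = 0.5, 0.3, 0.2   # assumed two-way interaction coefficients
    rows = []
    # Loop over the 8 cells of the 2x2x2 design.
    for g, p, t in product((0, 1), repeat=3):
        for _ in range(n_per_cell):
            y = (mu
                 + group_effect * g + partition_effect * p + time_effect * t
                 + gp * g * p + gt * g * t + pt * p * t
                 + tau * g * p * t           # applies only when G=P=T=1
                 + rng.gauss(0.0, noise_sd))
            rows.append({"group": g, "partition": p, "time": t, "outcome": y})
    return rows


def ddd_from_cell_means(rows):
    """Triple difference of the eight cell means."""
    m = {}
    for cell in product((0, 1), repeat=3):
        m[cell] = mean(r["outcome"] for r in rows
                       if (r["group"], r["partition"], r["time"]) == cell)

    def did(g):
        # Difference-in-differences across partition and time within group g.
        return (m[g, 1, 1] - m[g, 1, 0]) - (m[g, 0, 1] - m[g, 0, 0])

    return did(1) - did(0)
```

With `noise_sd=0.0` the triple difference of cell means equals `tau` up to floating-point error; with noise it is only approximately equal, and the gap shrinks as `n_per_cell` grows.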

Parameters:

n_per_cell (int, default=100) – Number of observations per cell (8 cells total: 2x2x2).
treatment_effect (float, default=2.0) – True average treatment effect on the treated (G=1, P=1, T=1).
group_effect (float, default=2.0) – Main effect of being in treated group.
partition_effect (float, default=1.0) – Main effect of being in eligible partition.
time_effect (float, default=0.5) – Main effect of post-treatment period.
noise_sd (float, default=1.0) – Standard deviation of idiosyncratic noise.
add_covariates (bool, default=False) – If True, adds age and education covariates that affect outcome.
seed (int, optional) – Random seed for reproducibility.

Returns:

Synthetic DDD data with columns:

- outcome: Outcome variable
- group: Group indicator (0=control, 1=treated)
- partition: Partition indicator (0=ineligible, 1=eligible)
- time: Time indicator (0=pre, 1=post)
- unit_id: Unique unit identifier
- true_effect: The true treatment effect for this observation
- age: Age covariate (if add_covariates=True)
- education: Education covariate (if add_covariates=True)

Return type:

pd.DataFrame

Examples

Generate DDD data:

>>> from diff_diff import generate_ddd_data
>>> data = generate_ddd_data(n_per_cell=100, treatment_effect=3.0, seed=42)
>>> data.shape
(800, 6)
>>> data.groupby(['group', 'partition', 'time']).size()
group  partition  time
0      0          0       100
                  1       100
       1          0       100
                  1       100
1      0          0       100
                  1       100
       1          0       100
                  1       100
dtype: int64

Use with TripleDifference estimator:

>>> from diff_diff import TripleDifference
>>> ddd = TripleDifference()
>>> results = ddd.fit(data, outcome='outcome', group='group',
...                   partition='partition', time='time')
>>> abs(results.att - 3.0) < 1.0
True