diff_diff.generate_event_study_data
- diff_diff.generate_event_study_data(n_units=300, n_pre=5, n_post=5, treatment_fraction=0.5, treatment_effect=5.0, unit_fe_sd=2.0, noise_sd=2.0, seed=None)
Generate synthetic data for event study analysis.
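Creates panel data in which all treated units begin treatment simultaneously at period n_pre. Periods are numbered from 0, so periods 0 through n_pre - 1 are pre-treatment. Useful for testing MultiPeriodDiD, pre-trends power analysis, and HonestDiD sensitivity analysis.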
- Parameters:
n_units (int, default=300) – Total number of units in the panel.
n_pre (int, default=5) – Number of pre-treatment periods.
n_post (int, default=5) – Number of post-treatment periods.
treatment_fraction (float, default=0.5) – Fraction of units that receive treatment.
treatment_effect (float, default=5.0) – True average treatment effect on the treated.
unit_fe_sd (float, default=2.0) – Standard deviation of unit fixed effects.
noise_sd (float, default=2.0) – Standard deviation of idiosyncratic noise.
seed (int, optional) – Random seed for reproducibility.
- Returns:
Synthetic event study data with columns:
  - unit: Unit identifier
  - period: Time period
  - treated: Binary unit-level treatment indicator
  - post: Binary post-treatment indicator
  - outcome: Outcome variable
  - event_time: Time relative to treatment (negative=pre, 0+=post)
  - true_effect: The true treatment effect for this observation
- Return type:
pd.DataFrame
Examples
Generate event study data:
>>> from diff_diff import generate_event_study_data
>>> data = generate_event_study_data(n_units=300, n_pre=5, n_post=5, seed=42)
>>> data['event_time'].unique()
array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])
Use with MultiPeriodDiD:
>>> from diff_diff import MultiPeriodDiD
>>> mp_did = MultiPeriodDiD()
>>> results = mp_did.fit(data, outcome='outcome', treatment='treated',
...                      time='period', post_periods=[5, 6, 7, 8, 9])
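Sketch of how the parameters could combine:

To make the role of each parameter concrete, here is a minimal standard-library sketch of a data-generating process consistent with the parameter descriptions above. It is an assumption, not diff_diff's implementation. The names sketch_event_study_rows and did_of_means are hypothetical. The additive outcome (unit fixed effect plus a constant effect plus Gaussian noise) and the rule that the first units are the treated ones are guesses, and the real generator may include terms not listed here.

```python
import random
from statistics import mean


def sketch_event_study_rows(n_units=300, n_pre=5, n_post=5,
                            treatment_fraction=0.5, treatment_effect=5.0,
                            unit_fe_sd=2.0, noise_sd=2.0, seed=None):
    """Hypothetical stand-in for the generator; not diff_diff's code."""
    rng = random.Random(seed)
    # Assumption: the first n_treated units form the treated group.
    n_treated = int(n_units * treatment_fraction)
    rows = []
    for unit in range(n_units):
        treated = int(unit < n_treated)
        unit_fe = rng.gauss(0.0, unit_fe_sd)  # one draw per unit
        for period in range(n_pre + n_post):
            post = int(period >= n_pre)  # treatment starts at period n_pre
            true_effect = treatment_effect * treated * post
            outcome = unit_fe + true_effect + rng.gauss(0.0, noise_sd)
            rows.append({
                "unit": unit, "period": period, "treated": treated,
                "post": post, "outcome": outcome,
                "event_time": period - n_pre, "true_effect": true_effect,
            })
    return rows


def did_of_means(rows):
    """2x2 difference-in-differences of group mean outcomes."""
    def cell(treated, post):
        return mean(r["outcome"] for r in rows
                    if r["treated"] == treated and r["post"] == post)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


rows = sketch_event_study_rows(seed=42)
estimate = did_of_means(rows)  # should land near treatment_effect
```

Under these assumptions the unit fixed effects cancel in the pre/post difference, so the difference-in-differences of means differs from treatment_effect only by averaged noise. The row dictionaries use the documented column names and can be passed to pd.DataFrame(rows) if a DataFrame is needed.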
Notes
The event_time column is relative to treatment:
  - Negative values: pre-treatment periods
  - 0: first post-treatment period
  - Positive values: subsequent post-treatment periods
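The relationship between period and event_time can be sketched without the library. This assumes periods are zero-indexed and that event_time equals period minus n_pre, which matches the example output and the post_periods list shown above but is not confirmed against the source code. The variable names are illustrative only.

```python
n_pre, n_post = 5, 5  # values from the example above

# Assumption: periods are zero-indexed, so treatment begins at period n_pre.
periods = list(range(n_pre + n_post))

# Assumption: event_time is the period's offset from the first treated period.
event_times = [p - n_pre for p in periods]
# event_times -> [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

# Periods with event_time >= 0 are the ones to pass as post_periods.
post_periods = [p for p, e in zip(periods, event_times) if e >= 0]
# post_periods -> [5, 6, 7, 8, 9]

pre_periods = [p for p, e in zip(periods, event_times) if e < 0]
# pre_periods -> [0, 1, 2, 3, 4]
```

Deriving post_periods this way keeps the list consistent with n_pre and n_post when either value changes.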