diff_diff.generate_did_data
- diff_diff.generate_did_data(n_units=100, n_periods=4, treatment_effect=5.0, treatment_fraction=0.5, treatment_period=2, unit_fe_sd=2.0, time_trend=0.5, noise_sd=1.0, seed=None)
Generate synthetic data for DiD analysis with known treatment effect.
Creates a balanced panel dataset (every unit is observed in every period) whose outcome combines unit fixed effects, a linear time trend, idiosyncratic noise, and a known treatment effect.
- Parameters:
n_units (int, default=100) – Number of units in the panel.
n_periods (int, default=4) – Number of time periods.
treatment_effect (float, default=5.0) – True average treatment effect on the treated.
treatment_fraction (float, default=0.5) – Fraction of units that receive treatment.
treatment_period (int, default=2) – First post-treatment period (0-indexed). Periods >= this are post.
unit_fe_sd (float, default=2.0) – Standard deviation of unit fixed effects.
time_trend (float, default=0.5) – Linear time trend coefficient.
noise_sd (float, default=1.0) – Standard deviation of idiosyncratic noise.
seed (int, optional) – Random seed for reproducibility.
- Returns:
Synthetic panel data with columns:
  - unit: Unit identifier
  - period: Time period
  - treated: Treatment indicator (0/1)
  - post: Post-treatment indicator (0/1)
  - outcome: Outcome variable
  - true_effect: The true treatment effect (for validation)
- Return type:
pd.DataFrame
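The parameters above suggest an additive data-generating process. The following is a minimal sketch of one such process, not the library's implementation: the function name `sketch_did_rows`, the additive form of the outcome, the rule that the first units are the treated ones, and the per-row definition of `true_effect` are all assumptions made for illustration. It uses only the standard library and returns a list of dicts rather than a pd.DataFrame.

```python
import random


def sketch_did_rows(n_units=100, n_periods=4, treatment_effect=5.0,
                    treatment_fraction=0.5, treatment_period=2,
                    unit_fe_sd=2.0, time_trend=0.5, noise_sd=1.0, seed=None):
    """Hypothetical stand-in for generate_did_data (illustration only)."""
    rng = random.Random(seed)
    # Assumption: the first n_treated units receive treatment.
    n_treated = int(n_units * treatment_fraction)
    rows = []
    for unit in range(n_units):
        treated = 1 if unit < n_treated else 0
        # One fixed effect per unit, constant across periods.
        unit_fe = rng.gauss(0.0, unit_fe_sd)
        for period in range(n_periods):
            # Periods >= treatment_period are post-treatment (0-indexed).
            post = 1 if period >= treatment_period else 0
            # Assumption: the effect applies only to treated units after treatment.
            effect = treatment_effect * treated * post
            noise = rng.gauss(0.0, noise_sd)
            rows.append({
                "unit": unit,
                "period": period,
                "treated": treated,
                "post": post,
                "outcome": unit_fe + time_trend * period + effect + noise,
                "true_effect": effect,
            })
    return rows


rows = sketch_did_rows(n_units=50, n_periods=4, treatment_effect=3.0, seed=42)
print(len(rows))  # 50 units x 4 periods = 200 rows
```

Because the panel is balanced, the row count is always `n_units * n_periods`, which is why the example below reports 200 rows for 50 units and 4 periods.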
Examples
Generate simple data for testing:
>>> from diff_diff import generate_did_data
>>> data = generate_did_data(n_units=50, n_periods=4, treatment_effect=3.0, seed=42)
>>> len(data)
200
>>> data.columns.tolist()
['unit', 'period', 'treated', 'post', 'outcome', 'true_effect']
Verify treatment effect recovery:
>>> from diff_diff import DifferenceInDifferences
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>> abs(results.att - 3.0) < 1.0  # Close to true effect
True
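To see why the estimate should land near the true effect, it can help to compute the classic 2x2 difference-in-differences by hand. The sketch below does not use diff_diff at all: the helper `did_2x2` and the toy noise-free panel are hypothetical, and the outcome is assumed to be the sum of a unit fixed effect, a linear trend, and the treatment effect. With no noise, the difference of group mean changes recovers the effect exactly.

```python
from statistics import mean


def did_2x2(rows):
    """(treated post - treated pre) - (control post - control pre)."""
    def cell(treated, post):
        # Mean outcome within one treatment-group / time-window cell.
        return mean(r["outcome"] for r in rows
                    if r["treated"] == treated and r["post"] == post)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


# Toy noise-free panel: 4 units, 4 periods, true effect 3.0 from period 2 on.
unit_fe = [1.0, -2.0, 0.5, 4.0]  # arbitrary illustrative fixed effects
rows = []
for unit, fe in enumerate(unit_fe):
    treated = 1 if unit < 2 else 0
    for period in range(4):
        post = 1 if period >= 2 else 0
        rows.append({
            "unit": unit,
            "period": period,
            "treated": treated,
            "post": post,
            "outcome": fe + 0.5 * period + 3.0 * treated * post,
        })

estimate = did_2x2(rows)
print(estimate)  # 3.0: fixed effects and the common trend cancel out
```

The unit fixed effects cancel within each group's pre/post difference, and the common time trend cancels between groups, leaving only the treatment effect. With noise added, the estimate is only approximately equal to the true effect, which is why the doctest above uses a tolerance.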