diff_diff.generate_panel_data
- diff_diff.generate_panel_data(n_units=100, n_periods=8, treatment_period=4, treatment_fraction=0.5, treatment_effect=5.0, parallel_trends=True, trend_violation=1.0, unit_fe_sd=2.0, noise_sd=0.5, seed=None)
Generate synthetic panel data for parallel trends testing.
Creates panel data with optional violation of parallel trends, useful for testing parallel trends diagnostics, placebo tests, and sensitivity analysis methods.
- Parameters:
n_units (int, default=100) – Total number of units in the panel.
n_periods (int, default=8) – Number of time periods.
treatment_period (int, default=4) – First post-treatment period (0-indexed).
treatment_fraction (float, default=0.5) – Fraction of units that receive treatment.
treatment_effect (float, default=5.0) – True average treatment effect on the treated.
parallel_trends (bool, default=True) – If True, treated and control groups have parallel pre-treatment trends. If False, treated group has a steeper pre-treatment trend.
trend_violation (float, default=1.0) – Size of the differential trend for treated group when parallel_trends=False. Treated units have trend = common_trend + trend_violation.
unit_fe_sd (float, default=2.0) – Standard deviation of unit fixed effects.
noise_sd (float, default=0.5) – Standard deviation of idiosyncratic noise.
seed (int, optional) – Random seed for reproducibility.
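To make the roles of these parameters concrete, here is a minimal sketch of a data-generating process consistent with the descriptions above. It is not the library's implementation: the function name sketch_panel, the common_trend argument and its value, the choice to treat the first units in index order, and the rule that true_effect equals treatment_effect only for treated units in post periods are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sketch_panel(n_units=6, n_periods=4, treatment_period=2,
                 treatment_fraction=0.5, treatment_effect=5.0,
                 parallel_trends=True, trend_violation=1.0,
                 unit_fe_sd=2.0, noise_sd=0.5, common_trend=0.5, seed=None):
    """Hypothetical re-creation of the documented process (illustration only)."""
    rng = np.random.default_rng(seed)
    n_treated = int(n_units * treatment_fraction)
    # One fixed effect per unit, constant over time.
    unit_fe = rng.normal(0.0, unit_fe_sd, size=n_units)

    rows = []
    for unit in range(n_units):
        # Assumption: the first n_treated units are the treated group.
        treated = int(unit < n_treated)
        # Treated units get an extra slope only when parallel trends is violated.
        extra = trend_violation if (treated and not parallel_trends) else 0.0
        slope = common_trend + extra
        for period in range(n_periods):
            post = int(period >= treatment_period)
            # Assumption: the effect applies to treated units in post periods only.
            effect = treatment_effect * treated * post
            outcome = (unit_fe[unit] + slope * period + effect
                       + rng.normal(0.0, noise_sd))
            rows.append({"unit": unit, "period": period, "treated": treated,
                         "post": post, "outcome": outcome,
                         "true_effect": effect})
    return pd.DataFrame(rows)


df = sketch_panel(seed=0)
print(df.head())
```

Setting unit_fe_sd=0.0 and noise_sd=0.0 in the sketch removes all randomness, which makes it easy to see how trend_violation changes the treated group's slope.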
- Returns:
Synthetic panel data with columns:
  - unit: Unit identifier
  - period: Time period
  - treated: Binary unit-level treatment indicator
  - post: Binary post-treatment indicator
  - outcome: Outcome variable
  - true_effect: The true treatment effect for this observation
- Return type:
pd.DataFrame
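The treated, post, and outcome columns are enough to compute a simple 2x2 difference-in-differences estimate by hand. The sketch below uses a small hand-built frame with the documented column names rather than actual library output, so the numbers are illustrative only.

```python
import pandas as pd

# Hand-built toy frame with the documented columns (not library output).
toy = pd.DataFrame({
    "unit":    [0, 0, 1, 1, 2, 2, 3, 3],
    "period":  [0, 1, 0, 1, 0, 1, 0, 1],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],
    "outcome": [1.0, 7.0, 3.0, 9.0, 2.0, 3.0, 4.0, 5.0],
})

# Mean outcome in each of the four (treated, post) cells.
means = toy.groupby(["treated", "post"])["outcome"].mean()

# Change over time for the treated group minus the change for controls.
did = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
print(did)  # 5.0
```

On generated data with parallel_trends=True, the same calculation should land close to treatment_effect, up to noise.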
Examples
Generate data with parallel trends:
>>> from diff_diff import generate_panel_data
>>> from diff_diff.utils import check_parallel_trends
>>> data_parallel = generate_panel_data(parallel_trends=True, seed=42)
>>> result = check_parallel_trends(data_parallel, outcome='outcome',
...                                time='period', treatment_group='treated',
...                                pre_periods=[0, 1, 2, 3])
>>> result['parallel_trends_plausible']
True
Generate data with trend violation:
>>> data_violation = generate_panel_data(parallel_trends=False, seed=42)
>>> result = check_parallel_trends(data_violation, outcome='outcome',
...                                time='period', treatment_group='treated',
...                                pre_periods=[0, 1, 2, 3])
>>> result['parallel_trends_plausible']
False
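For intuition about what a trend violation looks like in the data, one can compare the pre-period slopes of the two groups directly. The sketch below is an informal check on a hand-built frame. It does not reproduce what check_parallel_trends does internally, and the helper name pre_trend_gap is hypothetical.

```python
import numpy as np
import pandas as pd


def pre_trend_gap(df, pre_periods):
    """Treated-minus-control slope of group-mean outcomes over pre_periods."""
    pre = df[df["period"].isin(pre_periods)]
    means = pre.groupby(["treated", "period"])["outcome"].mean()
    slopes = {}
    for group in (0, 1):
        series = means.loc[group]  # indexed by period
        x = series.index.to_numpy(dtype=float)
        # Degree-1 fit; the first coefficient is the slope.
        slopes[group] = np.polyfit(x, series.to_numpy(), 1)[0]
    return slopes[1] - slopes[0]


# Toy data: controls rise 0.5 per period, treated units rise 1.5 per period.
periods = np.arange(4)
toy = pd.DataFrame({
    "period": np.tile(periods, 2),
    "treated": np.repeat([0, 1], 4),
    "outcome": np.concatenate([0.5 * periods, 1.5 * periods]),
})

gap = pre_trend_gap(toy, pre_periods=[0, 1, 2, 3])
print(round(gap, 6))  # 1.0
```

A gap near zero is consistent with parallel pre-treatment trends. In this toy example the gap equals the extra slope given to the treated group, which is the role trend_violation is described as playing.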