diff_diff.generate_panel_data

diff_diff.generate_panel_data(n_units=100, n_periods=8, treatment_period=4, treatment_fraction=0.5, treatment_effect=5.0, parallel_trends=True, trend_violation=1.0, unit_fe_sd=2.0, noise_sd=0.5, seed=None)

Generate synthetic panel data for parallel trends testing.

Creates panel data with optional violation of parallel trends, useful for testing parallel trends diagnostics, placebo tests, and sensitivity analysis methods.

Parameters:

n_units (int, default=100) – Total number of units in the panel.
n_periods (int, default=8) – Number of time periods.
treatment_period (int, default=4) – First post-treatment period (0-indexed).
treatment_fraction (float, default=0.5) – Fraction of units that receive treatment.
treatment_effect (float, default=5.0) – True average treatment effect on the treated.
parallel_trends (bool, default=True) – If True, the treated and control groups have parallel pre-treatment trends. If False, the treated group has a steeper pre-treatment trend (for a positive trend_violation).
trend_violation (float, default=1.0) – Size of the differential trend for the treated group when parallel_trends=False. Treated units then have trend = common_trend + trend_violation.
unit_fe_sd (float, default=2.0) – Standard deviation of unit fixed effects.
noise_sd (float, default=0.5) – Standard deviation of idiosyncratic noise.
seed (int, optional) – Random seed for reproducibility.
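
To make the roles of these parameters concrete, here is a minimal sketch of a data-generating process consistent with the descriptions above. It is not the library's implementation: the function name sketch_panel, the common_trend value, the additive form of the outcome, and the choice to treat the first units in order are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sketch_panel(n_units=100, n_periods=8, treatment_period=4,
                 treatment_fraction=0.5, treatment_effect=5.0,
                 parallel_trends=True, trend_violation=1.0,
                 unit_fe_sd=2.0, noise_sd=0.5, seed=None,
                 common_trend=0.5):
    """Hypothetical stand-in for generate_panel_data (illustration only).

    common_trend is an assumed parameter; the documentation does not
    state the value of the shared trend.
    """
    rng = np.random.default_rng(seed)
    # Assumption: the first n_treated units are the treated group.
    n_treated = int(n_units * treatment_fraction)
    rows = []
    for unit in range(n_units):
        treated = int(unit < n_treated)
        unit_fe = rng.normal(0.0, unit_fe_sd)  # unit fixed effect
        # Treated units get an extra slope only when trends are not parallel.
        slope = common_trend
        if treated and not parallel_trends:
            slope += trend_violation
        for period in range(n_periods):
            post = int(period >= treatment_period)
            # The effect applies only to treated units in post periods.
            effect = treatment_effect * treated * post
            noise = rng.normal(0.0, noise_sd)
            rows.append({
                "unit": unit,
                "period": period,
                "treated": treated,
                "post": post,
                "outcome": unit_fe + slope * period + effect + noise,
                "true_effect": effect,
            })
    return pd.DataFrame(rows)


# Small demonstration panel: 4 units observed over 3 periods.
demo = sketch_panel(n_units=4, n_periods=3, treatment_period=2, seed=0)
```

Under these assumptions, setting unit_fe_sd=0 and noise_sd=0 leaves only the trend and the treatment effect, which makes it easy to see how trend_violation changes the treated group's slope.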

Returns:

Synthetic panel data with columns:

- unit: Unit identifier
- period: Time period
- treated: Binary unit-level treatment indicator
- post: Binary post-treatment indicator
- outcome: Outcome variable
- true_effect: The true treatment effect for this observation

Return type:

pd.DataFrame
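
As an illustration of how the returned columns fit together, the sketch below computes a simple 2x2 difference-in-differences estimate from a frame with the same column names. The tiny frame is hand-built rather than produced by generate_panel_data, the helper name did_2x2 is hypothetical, and the assumption that true_effect is zero for untreated or pre-treatment rows is ours.

```python
import pandas as pd

# Hand-built frame using the documented column names (values are made up).
df = pd.DataFrame({
    "unit":        [0, 0, 1, 1],
    "period":      [0, 1, 0, 1],
    "treated":     [1, 1, 0, 0],
    "post":        [0, 1, 0, 1],
    "outcome":     [10.0, 16.0, 8.0, 9.0],
    "true_effect": [0.0, 5.0, 0.0, 0.0],  # assumed zero outside treated-post rows
})


def did_2x2(frame):
    """Difference in mean outcome changes: treated minus control."""
    means = frame.groupby(["treated", "post"])["outcome"].mean()
    treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
    control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
    return treated_change - control_change


estimate = did_2x2(df)

# Average true effect over treated, post-treatment rows, for comparison.
truth = df.loc[(df["treated"] == 1) & (df["post"] == 1), "true_effect"].mean()
```

Here the treated unit rises by 6 and the control unit by 1, so the estimate equals the true effect of 5 recorded in true_effect.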

Examples

Generate data with parallel trends:

>>> from diff_diff import generate_panel_data
>>> from diff_diff.utils import check_parallel_trends
>>> data_parallel = generate_panel_data(parallel_trends=True, seed=42)
>>> result = check_parallel_trends(data_parallel, outcome='outcome',
...                                time='period', treatment_group='treated',
...                                pre_periods=[0, 1, 2, 3])
>>> result['parallel_trends_plausible']
True

Generate data with trend violation:

>>> data_violation = generate_panel_data(parallel_trends=False, seed=42)
>>> result = check_parallel_trends(data_violation, outcome='outcome',
...                                time='period', treatment_group='treated',
...                                pre_periods=[0, 1, 2, 3])
>>> result['parallel_trends_plausible']
False
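
For intuition about what a pre-trend diagnostic looks at, the sketch below compares the slopes of group-mean outcomes over the pre-treatment periods. This is not how check_parallel_trends is implemented; the helper pre_trend_gap is hypothetical, and the demo frame is hand-built with the documented column names and noise-free values.

```python
import numpy as np
import pandas as pd


def pre_trend_gap(frame, pre_periods):
    """Slope of treated group means minus slope of control group means."""
    pre = frame[frame["period"].isin(pre_periods)]
    # Rows: period, columns: treated (0 or 1), values: mean outcome.
    means = pre.groupby(["treated", "period"])["outcome"].mean().unstack("treated")
    x = means.index.to_numpy(dtype=float)
    # np.polyfit with degree 1 returns [slope, intercept].
    slope_control = np.polyfit(x, means[0].to_numpy(), 1)[0]
    slope_treated = np.polyfit(x, means[1].to_numpy(), 1)[0]
    return slope_treated - slope_control


# Noise-free demo: control trend 0.5 per period, treated trend 1.5 per period.
periods = np.arange(4)
demo = pd.DataFrame({
    "period": np.tile(periods, 2),
    "treated": np.repeat([0, 1], 4),
    "outcome": np.concatenate([0.5 * periods, 2.0 + 1.5 * periods]),
})

gap = pre_trend_gap(demo, pre_periods=[0, 1, 2, 3])
```

A gap near zero is consistent with parallel pre-treatment trends, while a gap near trend_violation is what one would expect from data generated with parallel_trends=False. A real diagnostic would also account for sampling noise, which this sketch ignores.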