diff_diff.generate_did_data

diff_diff.generate_did_data(n_units=100, n_periods=4, treatment_effect=5.0, treatment_fraction=0.5, treatment_period=2, unit_fe_sd=2.0, time_trend=0.5, noise_sd=1.0, seed=None)

Generate synthetic data for DiD analysis with known treatment effect.

Creates a balanced panel (every unit observed in every period) with unit fixed effects, a linear time trend, idiosyncratic noise, and a known treatment effect, so that estimators can be checked against the true value.
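This page does not state the exact functional form. One additive model consistent with the listed features and parameters, offered as an assumption rather than the library's documented specification, is shown below, with γ = time_trend, τ = treatment_effect, σ_α = unit_fe_sd, σ_ε = noise_sd, D_i = treated, and Post_t = 1 when t >= treatment_period. In a balanced panel the same units appear before and after treatment, so the 2x2 difference-in-differences contrast cancels the unit effects α_i and the shared trend γt, leaving τ plus averaged noise:

```latex
Y_{it} = \alpha_i + \gamma\, t + \tau\, D_i\, \mathrm{Post}_t + \varepsilon_{it},
\qquad \alpha_i \sim \mathcal{N}(0, \sigma_\alpha^2),
\qquad \varepsilon_{it} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\bigl(\bar Y_{1,\mathrm{post}} - \bar Y_{1,\mathrm{pre}}\bigr)
- \bigl(\bar Y_{0,\mathrm{post}} - \bar Y_{0,\mathrm{pre}}\bigr)
= \tau
+ \bigl(\bar\varepsilon_{1,\mathrm{post}} - \bar\varepsilon_{1,\mathrm{pre}}\bigr)
- \bigl(\bar\varepsilon_{0,\mathrm{post}} - \bar\varepsilon_{0,\mathrm{pre}}\bigr)
```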

Parameters:
  • n_units (int, default=100) – Number of units in the panel.

  • n_periods (int, default=4) – Number of time periods.

  • treatment_effect (float, default=5.0) – True average treatment effect on the treated.

  • treatment_fraction (float, default=0.5) – Fraction of units that receive treatment.

  • treatment_period (int, default=2) – First post-treatment period (0-indexed). Periods >= this are post.

  • unit_fe_sd (float, default=2.0) – Standard deviation of unit fixed effects.

  • time_trend (float, default=0.5) – Linear time trend coefficient.

  • noise_sd (float, default=1.0) – Standard deviation of idiosyncratic noise.

  • seed (int, optional) – Random seed for reproducibility.
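A minimal NumPy/pandas sketch of how these parameters could map onto a balanced panel. It assumes an additive model (unit effect + time_trend * period + treatment_effect for treated units in post periods + Gaussian noise), assumes the first units are the treated ones, and assumes true_effect is nonzero only in treated post-period rows. `simulate_did_panel` is a hypothetical name, not part of diff_diff, and the library's actual construction may differ.

```python
import numpy as np
import pandas as pd


def simulate_did_panel(n_units=100, n_periods=4, treatment_effect=5.0,
                       treatment_fraction=0.5, treatment_period=2,
                       unit_fe_sd=2.0, time_trend=0.5, noise_sd=1.0, seed=None):
    """Hypothetical stand-in for generate_did_data under an assumed additive DGP."""
    rng = np.random.default_rng(seed)

    # Assumption: the first round(n_units * treatment_fraction) units are treated.
    n_treated = int(round(n_units * treatment_fraction))
    unit_treated = np.zeros(n_units, dtype=int)
    unit_treated[:n_treated] = 1

    # One fixed effect per unit, shared by all of that unit's periods.
    unit_fe = rng.normal(0.0, unit_fe_sd, size=n_units)

    # Balanced panel: every unit appears once in every period.
    unit = np.repeat(np.arange(n_units), n_periods)
    period = np.tile(np.arange(n_periods), n_units)
    treated = unit_treated[unit]
    post = (period >= treatment_period).astype(int)

    # Assumed additive outcome: unit effect + linear trend + effect + noise.
    effect = treatment_effect * treated * post
    outcome = (unit_fe[unit] + time_trend * period + effect
               + rng.normal(0.0, noise_sd, size=unit.size))

    return pd.DataFrame({
        "unit": unit,
        "period": period,
        "treated": treated,
        "post": post,
        "outcome": outcome,
        "true_effect": effect,  # assumption: zero outside treated post rows
    })


df = simulate_did_panel(n_units=50, n_periods=4, treatment_effect=3.0, seed=42)
print(len(df))  # 200 (50 units x 4 periods)
```

Because the generator is seeded, calling it twice with the same arguments returns identical frames, which mirrors the reproducibility role of the seed parameter.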

Returns:

Synthetic panel data with columns:

  • unit – Unit identifier.

  • period – Time period.

  • treated – Treatment indicator (0/1).

  • post – Post-treatment indicator (0/1).

  • outcome – Outcome variable.

  • true_effect – The true treatment effect (for validation).

Return type:

pd.DataFrame
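The treated, post, and outcome columns are enough to compute the classic 2x2 difference-in-differences by hand: the change in mean outcome for treated units minus the change for control units. The helper `did_2x2` below is hypothetical, not a diff_diff function; it is checked on a tiny noise-free panel whose true effect is 3.0.

```python
import pandas as pd


def did_2x2(df, outcome="outcome", treatment="treated", post="post"):
    """Classic 2x2 DiD computed from the four group means."""
    means = df.groupby([treatment, post])[outcome].mean()
    # Before/after change within each group...
    treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
    control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
    # ...then the difference between the two changes.
    return treated_change - control_change


# Tiny noise-free panel: 2 units x 4 periods, trend 0.5, true effect 3.0.
rows = []
for unit, (treated, fe) in enumerate([(0, 1.0), (1, -2.0)]):
    for period in range(4):
        post = int(period >= 2)
        rows.append({"unit": unit, "period": period, "treated": treated,
                     "post": post,
                     "outcome": fe + 0.5 * period + 3.0 * treated * post})
toy = pd.DataFrame(rows)

print(did_2x2(toy))  # 3.0
```

The unit effects (1.0 and -2.0) and the shared trend cancel in the contrast, so only the treatment effect remains.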

Examples

Generate simple data for testing:

>>> from diff_diff import generate_did_data
>>> data = generate_did_data(n_units=50, n_periods=4, treatment_effect=3.0, seed=42)
>>> len(data)
200
>>> data.columns.tolist()
['unit', 'period', 'treated', 'post', 'outcome', 'true_effect']

Verify treatment effect recovery:

>>> from diff_diff import DifferenceInDifferences
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>> abs(results.att - 3.0) < 1.0  # Close to true effect
True
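The check above allows a tolerance of 1.0 because the outcome contains random noise, so an estimate only approximates the true effect. This page does not say how DifferenceInDifferences computes att. A common formulation, sketched here with plain NumPy as an assumption rather than the library's code, regresses the outcome on an intercept, treated, post, and treated * post, and reads the effect off the interaction coefficient. `did_regression` is a hypothetical helper, and the simulated panel uses an assumed additive model.

```python
import numpy as np
import pandas as pd


def did_regression(df, outcome="outcome", treatment="treated", post="post"):
    """OLS of outcome on [1, treated, post, treated*post]; returns the interaction."""
    d = df[treatment].to_numpy(dtype=float)
    p = df[post].to_numpy(dtype=float)
    X = np.column_stack([np.ones_like(d), d, p, d * p])
    y = df[outcome].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3]  # coefficient on treated * post


# Noisy balanced panel under an assumed additive model (true effect 3.0).
rng = np.random.default_rng(0)
n_units, n_periods, tau = 200, 4, 3.0
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated = (unit < n_units // 2).astype(int)
post = (period >= 2).astype(int)
unit_fe = rng.normal(0.0, 2.0, size=n_units)
outcome = (unit_fe[unit] + 0.5 * period + tau * treated * post
           + rng.normal(0.0, 1.0, size=unit.size))
panel = pd.DataFrame({"treated": treated, "post": post, "outcome": outcome})

print(did_regression(panel))  # near 3.0; exact value depends on the seeded noise
```

Because the four-cell model is saturated, the interaction coefficient equals the 2x2 difference of group means exactly; the regression form is mainly useful when covariates or clustered standard errors are added.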