diff_diff.generate_factor_data

diff_diff.generate_factor_data(n_units=50, n_pre=10, n_post=5, n_treated=10, n_factors=2, treatment_effect=2.0, factor_strength=1.0, treated_loading_shift=0.5, unit_fe_sd=1.0, noise_sd=0.5, seed=None)

Generate synthetic panel data with interactive fixed effects (factor model).

Creates data following the data-generating process (DGP):

$$Y_{it} = \mu + \alpha_i + \beta_t + \Lambda_i' F_t + \tau D_{it} + \varepsilon_{it}$$

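where $\Lambda_i' F_t$ is the interactive fixed effects component (unit-specific factor loadings $\Lambda_i$ times time-varying latent factors $F_t$), $\alpha_i$ and $\beta_t$ are unit and time fixed effects, $D_{it}$ is the treatment indicator and $\tau$ is the treatment effect. The data are useful for testing TROP (Triply Robust Panel) and for comparing it with SyntheticDiD.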
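The DGP above can be sketched in plain Python as follows. This is an illustrative reconstruction, not the library's source: the helper name `simulate_factor_panel` is hypothetical, and the standard-normal draws for factors, loadings and time effects, $\mu = 0$, 0-based unit and period IDs, adding the loading shift to every loading of a treated unit, and `true_effect` being 0 for untreated observations are all assumptions.

```python
import random


def simulate_factor_panel(n_units=50, n_pre=10, n_post=5, n_treated=10,
                          n_factors=2, treatment_effect=2.0,
                          factor_strength=1.0, treated_loading_shift=0.5,
                          unit_fe_sd=1.0, noise_sd=0.5, seed=None):
    """Hypothetical sketch of the factor-model DGP (not the library code).

    Assumptions: mu = 0, N(0, 1) time effects, factors and base loadings,
    0-based unit/period IDs, shift added to every treated-unit loading.
    """
    rng = random.Random(seed)
    n_periods = n_pre + n_post
    mu = 0.0                                                      # assumed grand mean
    alpha = [rng.gauss(0.0, unit_fe_sd) for _ in range(n_units)]  # unit FE alpha_i
    beta = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]        # time FE beta_t
    factors = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)]
               for _ in range(n_periods)]                         # F_t
    loadings = []                                                 # Lambda_i
    for i in range(n_units):
        shift = treated_loading_shift if i < n_treated else 0.0
        loadings.append([rng.gauss(0.0, 1.0) + shift for _ in range(n_factors)])

    rows = []
    for i in range(n_units):
        treat = int(i < n_treated)               # first n_treated units are treated
        for t in range(n_periods):
            d = int(treat and t >= n_pre)        # D_it: treated unit, post period
            interactive = factor_strength * sum(
                lam * f for lam, f in zip(loadings[i], factors[t]))
            effect = treatment_effect * d        # tau * D_it
            y = (mu + alpha[i] + beta[t] + interactive + effect
                 + rng.gauss(0.0, noise_sd))     # eps_it
            rows.append({"unit": i, "period": t, "outcome": y,
                         "treated": d, "treat": treat, "true_effect": effect})
    return rows
```

With the defaults this yields 50 × 15 = 750 unit-period rows with six fields, matching the shape shown in the Examples.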

Parameters:
  • n_units (int, default=50) – Total number of units in the panel.

  • n_pre (int, default=10) – Number of pre-treatment periods.

  • n_post (int, default=5) – Number of post-treatment periods.

  • n_treated (int, default=10) – Number of treated units (assigned to the first n_treated unit IDs).

  • n_factors (int, default=2) – Number of latent factors in the interactive fixed effects.

  • treatment_effect (float, default=2.0) – True average treatment effect on the treated.

  • factor_strength (float, default=1.0) – Scaling factor for interactive fixed effects.

  • treated_loading_shift (float, default=0.5) – Shift in factor loadings for treated units (creates confounding).

  • unit_fe_sd (float, default=1.0) – Standard deviation of unit fixed effects.

  • noise_sd (float, default=0.5) – Standard deviation of idiosyncratic noise.

  • seed (int, optional) – Random seed for reproducibility.
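The size parameters fix the panel layout arithmetically. The sketch below computes it with a hypothetical helper `expected_layout`; it assumes 0-based, consecutive unit IDs and periods. That assumption is consistent with the Examples (a (750, 6) frame and post_periods=list(range(10, 15)) for the defaults) but is not otherwise stated here.

```python
def expected_layout(n_units=50, n_pre=10, n_post=5, n_treated=10):
    """Hypothetical helper: panel dimensions implied by the parameters.

    Assumes 0-based, consecutive IDs: periods 0..n_pre-1 are pre-treatment
    and n_pre..n_pre+n_post-1 are post-treatment.
    """
    n_periods = n_pre + n_post
    return {
        "n_rows": n_units * n_periods,            # one row per unit-period
        "post_periods": list(range(n_pre, n_periods)),
        "treated_units": list(range(n_treated)),  # first n_treated unit IDs
        "n_treated_obs": n_treated * n_post,      # rows with treated == 1
    }


layout = expected_layout()
print(layout["n_rows"])        # 750
print(layout["post_periods"])  # [10, 11, 12, 13, 14]
```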

Returns:

Synthetic factor model data with columns:

  • unit – Unit identifier.

  • period – Time period.

  • outcome – Outcome variable.

  • treated – Binary indicator (1 if treated at this observation).

  • treat – Binary unit-level ever-treated indicator.

  • true_effect – The true treatment effect for this observation.
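The column semantics imply a few invariants that can be checked on the returned frame. The sketch below uses a hypothetical `check_factor_panel` helper that takes the rows as a list of dicts (for example `data.to_dict("records")`); the balanced-panel check is an assumption inferred from the (750, 6) shape in the Examples.

```python
from collections import defaultdict


def check_factor_panel(rows):
    """Hypothetical sanity checks on the documented column semantics.

    `rows` is a list of dicts, e.g. data.to_dict("records") for the
    returned DataFrame. Raises ValueError on the first violation found.
    """
    seen = set()
    treat_by_unit = defaultdict(set)
    periods_by_unit = defaultdict(set)
    for r in rows:
        key = (r["unit"], r["period"])
        if key in seen:
            raise ValueError(f"duplicate unit-period {key}")
        seen.add(key)
        # 'treated' (per observation) can only be 1 for ever-treated units.
        if r["treated"] > r["treat"]:
            raise ValueError(f"treated == 1 but treat == 0 at {key}")
        treat_by_unit[r["unit"]].add(r["treat"])
        periods_by_unit[r["unit"]].add(r["period"])
    # 'treat' is unit-level, so it should be constant within each unit.
    if any(len(v) != 1 for v in treat_by_unit.values()):
        raise ValueError("treat varies within a unit")
    # Assumed balanced panel: every unit is observed in the same periods.
    if len({frozenset(p) for p in periods_by_unit.values()}) > 1:
        raise ValueError("unbalanced panel")
    return True
```

Applied to the Examples output, `check_factor_panel(data.to_dict("records"))` should return True if the columns behave as described.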

Return type:

pd.DataFrame

Examples

Generate data with factor structure:

>>> from diff_diff import generate_factor_data
>>> data = generate_factor_data(n_units=50, n_factors=2, seed=42)
>>> data.shape
(750, 6)

Use with TROP estimator:

>>> from diff_diff import TROP
>>> trop = TROP(n_bootstrap=50, seed=42)
>>> results = trop.fit(data, outcome='outcome', treatment='treated',
...                    unit='unit', time='period',
...                    post_periods=list(range(10, 15)))

Notes

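The treated units have systematically different factor loadings (shifted by treated_loading_shift). Because the factors F_t vary over time, the treated and control groups then follow different outcome paths even in the absence of treatment, which violates the parallel-trends assumption that standard DiD relies on. TROP is designed to account for this interactive fixed effects structure.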
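The confounding argument can be made concrete with a small worked example. Assuming the shift is added to every loading, the expected treated-minus-control gap in the factor component in period t is factor_strength × treated_loading_shift × (sum of the entries of F_t). Unit fixed effects only add a constant gap that DiD removes, and time effects cancel. The helper names and the toy factor path below are hypothetical.

```python
def factor_gap_path(factors, treated_loading_shift=0.5, factor_strength=1.0):
    """Expected treated-minus-control gap in Lambda_i'F_t for each period.

    Assumes the loading shift is added to every factor loading, so the
    expected loading difference is treated_loading_shift * (1, ..., 1).
    """
    return [factor_strength * treated_loading_shift * sum(f_t) for f_t in factors]


def did_bias(gaps, pre, post):
    """Two-period DiD contrast of the gaps: pure bias when tau = 0."""
    return gaps[post] - gaps[pre]


# Toy factor path F_t for three periods and two factors (made-up numbers).
factors = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]]
gaps = factor_gap_path(factors)
print(gaps)                  # [0.0, 0.5, 1.5]
print(did_bias(gaps, 0, 2))  # 1.5
```

If F_t were constant over time the gap would be constant and the DiD bias zero; it is the time variation in the factors, combined with the loading shift, that breaks parallel trends.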