diff_diff.generate_factor_data
- diff_diff.generate_factor_data(n_units=50, n_pre=10, n_post=5, n_treated=10, n_factors=2, treatment_effect=2.0, factor_strength=1.0, treated_loading_shift=0.5, unit_fe_sd=1.0, noise_sd=0.5, seed=None)
Generate synthetic panel data with interactive fixed effects (factor model).
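Creates data following the data-generating process (DGP):
Y_it = mu + alpha_i + beta_t + Lambda_i'F_t + tau*D_it + eps_it
where mu is an overall intercept, alpha_i are unit fixed effects (standard deviation unit_fe_sd), beta_t are time fixed effects, Lambda_i'F_t is the interactive fixed effects component (unit-specific loadings Lambda_i on n_factors latent factors F_t, scaled by factor_strength), D_it is the treatment indicator, tau is treatment_effect, and eps_it is idiosyncratic noise (standard deviation noise_sd). Useful for testing the TROP (Triply Robust Panel) estimator and comparing it with SyntheticDiD.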
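As a rough illustration of the DGP above, the NumPy sketch below builds the outcome matrix term by term. It is not the library's implementation: the function name simulate_factor_panel is hypothetical, mu is set to 0, the time effects, factors, and loadings are assumed to be standard-normal draws, and treated_loading_shift is assumed to be added to every loading component of the treated units.

```python
import numpy as np


def simulate_factor_panel(n_units=50, n_pre=10, n_post=5, n_treated=10,
                          n_factors=2, treatment_effect=2.0, factor_strength=1.0,
                          treated_loading_shift=0.5, unit_fe_sd=1.0, noise_sd=0.5,
                          seed=None):
    """Hypothetical sketch of Y_it = mu + alpha_i + beta_t + Lambda_i'F_t + tau*D_it + eps_it.

    Returns the (n_units, n_periods) outcome matrix Y, the treatment matrix D,
    and the unit-level ever-treated vector.
    """
    rng = np.random.default_rng(seed)
    n_periods = n_pre + n_post
    mu = 0.0                                                  # assumed intercept
    alpha = rng.normal(0.0, unit_fe_sd, size=n_units)         # unit fixed effects
    beta = rng.normal(0.0, 1.0, size=n_periods)               # time fixed effects (assumed N(0, 1))
    F = rng.normal(0.0, 1.0, size=(n_periods, n_factors))     # latent factors F_t (assumed N(0, 1))
    Lam = rng.normal(0.0, 1.0, size=(n_units, n_factors))     # loadings Lambda_i (assumed N(0, 1))
    Lam[:n_treated] += treated_loading_shift                  # treated units get shifted loadings

    treat = np.zeros(n_units, dtype=int)
    treat[:n_treated] = 1                                     # first n_treated units are treated
    post = (np.arange(n_periods) >= n_pre).astype(int)        # 1 in post-treatment periods
    D = np.outer(treat, post)                                 # D_it = treat_i * post_t

    interactive = factor_strength * (Lam @ F.T)               # Lambda_i'F_t for every (i, t)
    eps = rng.normal(0.0, noise_sd, size=(n_units, n_periods))
    Y = mu + alpha[:, None] + beta[None, :] + interactive + treatment_effect * D + eps
    return Y, D, treat
```

With the defaults, Y has one row per unit and one column per period (50 x 15), and D is 1 only in the 10 treated rows and the 5 post-treatment columns.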
- Parameters:
n_units (int, default=50) – Total number of units in the panel.
n_pre (int, default=10) – Number of pre-treatment periods.
n_post (int, default=5) – Number of post-treatment periods.
n_treated (int, default=10) – Number of treated units (assigned to the first n_treated unit IDs).
n_factors (int, default=2) – Number of latent factors in the interactive fixed effects.
treatment_effect (float, default=2.0) – True average treatment effect on the treated.
factor_strength (float, default=1.0) – Scaling factor applied to the interactive fixed effects component Lambda_i'F_t.
treated_loading_shift (float, default=0.5) – Shift in factor loadings for treated units (creates confounding).
unit_fe_sd (float, default=1.0) – Standard deviation of unit fixed effects.
noise_sd (float, default=0.5) – Standard deviation of idiosyncratic noise.
seed (int, optional) – Random seed for reproducibility.
- Returns:
Synthetic factor model data with columns:
  - unit: Unit identifier
  - period: Time period
  - outcome: Outcome variable
  - treated: Binary indicator (1 if treated at this observation)
  - treat: Binary unit-level ever-treated indicator
  - true_effect: The true treatment effect for this observation
- Return type:
pd.DataFrame
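The long (one row per unit-period) layout of the returned frame can be sketched with pandas. The helper to_long_panel below is hypothetical and rests on assumptions not stated in this reference: periods are numbered from 0, treated equals treat restricted to post-treatment periods, and true_effect equals treatment_effect where treated is 1 and 0 elsewhere. The library's actual dtypes and values may differ.

```python
import numpy as np
import pandas as pd


def to_long_panel(Y, treat, n_pre, treatment_effect):
    """Hypothetical helper: reshape an (n_units, n_periods) outcome matrix
    into the columns unit, period, outcome, treated, treat, true_effect."""
    n_units, n_periods = Y.shape
    unit = np.repeat(np.arange(n_units), n_periods)         # 0,0,...,0,1,1,...
    period = np.tile(np.arange(n_periods), n_units)         # 0..T-1 repeated per unit
    treat_col = np.repeat(np.asarray(treat), n_periods)     # ever-treated, constant within unit
    treated = treat_col * (period >= n_pre).astype(int)     # 1 only for treated units in post periods
    return pd.DataFrame({
        "unit": unit,
        "period": period,
        "outcome": np.asarray(Y).ravel(),                   # row-major order matches repeat/tile above
        "treated": treated,
        "treat": treat_col,
        "true_effect": treatment_effect * treated,          # assumption: effect only where treated == 1
    })
```

The key distinction is between treat (fixed per unit) and treated (switches on only after period n_pre - 1 for treated units).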
Examples
Generate data with factor structure:
>>> data = generate_factor_data(n_units=50, n_factors=2, seed=42)
>>> data.shape
(750, 6)
Use with TROP estimator:
>>> from diff_diff import TROP
>>> trop = TROP(n_bootstrap=50, seed=42)
>>> results = trop.fit(data, outcome='outcome', treatment='treated',
...                    unit='unit', time='period',
...                    post_periods=list(range(10, 15)))
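The numbers in these examples follow from the defaults: 50 units times 10 + 5 periods gives 750 rows, and list(range(10, 15)) names the five periods after the 10 pre-treatment periods. A small hypothetical helper keeps both in sync when n_pre or n_post change, assuming periods are labeled 0 through n_pre + n_post - 1 (as the example's post_periods suggests).

```python
def expected_panel(n_units=50, n_pre=10, n_post=5):
    """Hypothetical helper: expected row count and post-treatment period labels,
    assuming periods are numbered 0, 1, ..., n_pre + n_post - 1."""
    n_rows = n_units * (n_pre + n_post)               # one row per unit-period
    post_periods = list(range(n_pre, n_pre + n_post))  # periods after the pre-treatment window
    return n_rows, post_periods
```

For example, with n_pre=8 and n_post=4 the helper returns [8, 9, 10, 11], which would replace the hard-coded range in the fit call.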
Notes
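The treated units have systematically different factor loadings (shifted by treated_loading_shift). Because the factors F_t vary over time, the untreated outcomes of treated and control units generally follow different trends, so the parallel-trends assumption behind standard DiD fails. Standard DiD cannot address this confounding, but TROP can handle it.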
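A minimal noise-free NumPy sketch makes this confounding concrete. Under the DGP above, a 2x2 DiD on group-by-period means cancels alpha_i and beta_t but not the interactive term, so DiD - tau = factor_strength * (mean treated loadings - mean control loadings)'(mean post-period factors - mean pre-period factors). The function did_bias_demo is hypothetical: it sets noise_sd to 0 so the identity holds exactly and draws the fixed effects, factors, and loadings from assumed standard normals.

```python
import numpy as np


def did_bias_demo(n_units=50, n_pre=10, n_post=5, n_treated=10, n_factors=2,
                  tau=2.0, factor_strength=1.0, loading_shift=0.5, seed=0):
    """Hypothetical demo: 2x2 DiD estimate vs. the bias predicted by the factor term."""
    rng = np.random.default_rng(seed)
    T = n_pre + n_post
    alpha = rng.normal(size=n_units)                # unit effects (assumed N(0, 1))
    beta = rng.normal(size=T)                       # time effects (assumed N(0, 1))
    F = rng.normal(size=(T, n_factors))             # factors F_t
    Lam = rng.normal(size=(n_units, n_factors))     # loadings Lambda_i
    Lam[:n_treated] += loading_shift                # treated units' loadings are shifted
    D = np.zeros((n_units, T))
    D[:n_treated, n_pre:] = 1.0
    # Noise-free outcomes, so the bias identity below is exact.
    Y = alpha[:, None] + beta[None, :] + factor_strength * (Lam @ F.T) + tau * D

    tr, co = slice(0, n_treated), slice(n_treated, n_units)
    pre, post = slice(0, n_pre), slice(n_pre, T)
    # Canonical 2x2 DiD on group-by-period means.
    did = (Y[tr, post].mean() - Y[tr, pre].mean()) - (Y[co, post].mean() - Y[co, pre].mean())

    # alpha_i and beta_t cancel; the interactive term leaves this bias.
    dlam = Lam[tr].mean(axis=0) - Lam[co].mean(axis=0)
    dF = F[post].mean(axis=0) - F[pre].mean(axis=0)
    predicted_bias = factor_strength * (dlam @ dF)
    return did, predicted_bias
```

Because the treated loadings are shifted, the loading gap dlam is nonzero in expectation, so the DiD bias persists whenever the factors drift between the pre- and post-treatment windows; setting factor_strength to 0 removes the interactive term and DiD recovers tau exactly.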