diff_diff.SyntheticDiD
- class diff_diff.SyntheticDiD[source]
Bases: DifferenceInDifferences
Synthetic Difference-in-Differences (SDID) estimator.
Combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units’ pre-treatment trends.
This method is particularly useful when:
- You have few treated units (possibly just one)
- Parallel trends assumption may be questionable
- Control units are heterogeneous and need reweighting
- You want robustness to pre-treatment differences
- Parameters:
zeta_omega (float, optional) – Regularization for unit weights. If None (default), auto-computed from data as (N1 * T1)^(1/4) * noise_level, matching R’s synthdid.
zeta_lambda (float, optional) – Regularization for time weights. If None (default), auto-computed from data as 1e-6 * noise_level, matching R’s synthdid.
alpha (float, default=0.05) – Significance level for confidence intervals.
variance_method (str, default="placebo") – Method for variance estimation:
- "placebo": Placebo-based variance matching R’s synthdid::vcov(method="placebo"). Implements Algorithm 4 from Arkhangelsky et al. (2021). This is R’s default.
- "bootstrap": Bootstrap at unit level with fixed weights matching R’s synthdid::vcov(method="bootstrap").
n_bootstrap (int, default=200) – Number of replications for variance estimation. Used for both:
- Bootstrap: Number of bootstrap samples
- Placebo: Number of random permutations (matches R’s replications argument)
seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.
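The two default regularization formulas above can be evaluated by hand. The sketch below is illustrative only and is not the library's code: `default_regularization` is a hypothetical helper, it assumes N1 is the number of treated units and T1 the number of post-treatment periods (the convention used by R's synthdid), and it takes `noise_level` as an input rather than estimating it from data.

```python
def default_regularization(n_treated, n_post, noise_level):
    """Return (zeta_omega, zeta_lambda) from the documented default formulas.

    Assumptions (not confirmed by this page): n_treated is N1, n_post is T1,
    and noise_level has already been estimated by the caller.
    """
    if n_treated < 1 or n_post < 1:
        raise ValueError("need at least one treated unit and one post period")
    # Unit-weight regularization: (N1 * T1)^(1/4) * noise_level
    zeta_omega = (n_treated * n_post) ** 0.25 * noise_level
    # Time-weight regularization: 1e-6 * noise_level
    zeta_lambda = 1e-6 * noise_level
    return zeta_omega, zeta_lambda


# One treated unit observed for 16 post-treatment periods, noise level 2.0.
zeta_omega, zeta_lambda = default_regularization(n_treated=1, n_post=16, noise_level=2.0)
```

Values computed this way could be passed explicitly as `zeta_omega` and `zeta_lambda` to override the automatic choice.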
- results_
Estimation results after calling fit().
- Type:
Examples
Basic usage with panel data:
>>> import pandas as pd
>>> from diff_diff import SyntheticDiD
>>>
>>> # Panel data with units observed over multiple time periods
>>> # Treatment occurs at period 5 for treated units
>>> data = pd.DataFrame({
...     'unit': [...],      # Unit identifier
...     'period': [...],    # Time period
...     'outcome': [...],   # Outcome variable
...     'treated': [...]    # 1 if unit is ever treated, 0 otherwise
... })
>>>
>>> # Fit SDID model
>>> sdid = SyntheticDiD()
>>> results = sdid.fit(
...     data,
...     outcome='outcome',
...     treatment='treated',
...     unit='unit',
...     time='period',
...     post_periods=[5, 6, 7, 8]
... )
>>>
>>> # View results
>>> results.print_summary()
>>> print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
>>>
>>> # Examine unit weights
>>> weights_df = results.get_unit_weights_df()
>>> print(weights_df.head(10))
Notes
The SDID estimator (Arkhangelsky et al., 2021) computes:
τ̂ = (Ȳ_treated,post - Σ_t λ_t * Y_treated,t) - Σ_j ω_j * (Ȳ_j,post - Σ_t λ_t * Y_j,t)
Where:
- ω_j are unit weights (sum to 1, non-negative)
- λ_t are time weights (sum to 1, non-negative)
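To make the estimator formula concrete, here is a minimal pure-Python sketch that evaluates τ̂ for given weights. It is not the library's implementation: `weighted_did` and its arguments are hypothetical names, the weights are taken as inputs rather than estimated, and the treated series is assumed to be already averaged over treated units.

```python
def weighted_did(treated, controls, omega, lam, n_pre):
    """Evaluate the SDID contrast for fixed unit weights omega and time weights lam.

    treated:  outcomes of the treated unit (or treated average), one per period
    controls: list of control-unit outcome series, same length as treated
    omega:    one weight per control unit (non-negative, sums to 1)
    lam:      one weight per pre-treatment period (non-negative, sums to 1)
    n_pre:    number of pre-treatment periods; later periods are post-treatment
    """
    def contrast(series):
        # Post-period mean minus lambda-weighted pre-period outcomes.
        post = series[n_pre:]
        post_mean = sum(post) / len(post)
        pre_weighted = sum(l * y for l, y in zip(lam, series[:n_pre]))
        return post_mean - pre_weighted

    treated_term = contrast(treated)
    control_term = sum(w * contrast(c) for w, c in zip(omega, controls))
    return treated_term - control_term


# With uniform weights the contrast reduces to a standard DiD.
tau_hat = weighted_did(
    treated=[1.0, 2.0, 3.0, 7.0, 8.0],
    controls=[[0.0, 1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0, 6.0]],
    omega=[0.5, 0.5],
    lam=[1 / 3, 1 / 3, 1 / 3],
    n_pre=3,
)
```

In this toy panel both controls rise by 2.5 between the pre and post windows while the treated series rises by 5.5, so the contrast is about 3.0.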
Unit weights ω are chosen to match pre-treatment outcomes:
min ||Σ_j ω_j * Y_j,pre - Y_treated,pre||²
This interpolates between:
- Standard DiD (uniform weights): ω_j = 1/N_control
- Synthetic Control (exact matching): concentrated weights
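The pre-treatment matching problem can be illustrated with a small solver. The sketch below uses a plain Frank-Wolfe iteration with exact line search to minimise the squared distance over the simplex. It is an assumption-laden illustration rather than the library's solver: `simplex_least_squares` is a hypothetical name, and the zeta_omega regularization and any intercept term are left out for brevity.

```python
def simplex_least_squares(controls, target, n_iter=200):
    """Minimise ||sum_j w_j * controls[j] - target||^2 with w on the simplex.

    controls: list of control-unit pre-treatment series (equal lengths)
    target:   pre-treatment series of the treated unit (or treated average)
    """
    n_units, n_periods = len(controls), len(target)
    weights = [1.0 / n_units] * n_units  # start from uniform (DiD-style) weights
    for _ in range(n_iter):
        fitted = [sum(weights[j] * controls[j][t] for j in range(n_units))
                  for t in range(n_periods)]
        resid = [fitted[t] - target[t] for t in range(n_periods)]
        # Gradient of the squared error with respect to each unit weight.
        grad = [2.0 * sum(controls[j][t] * resid[t] for t in range(n_periods))
                for j in range(n_units)]
        # Frank-Wolfe step: move toward the simplex vertex with the smallest gradient.
        best = min(range(n_units), key=lambda j: grad[j])
        direction = [(1.0 if j == best else 0.0) - weights[j] for j in range(n_units)]
        moved = [sum(direction[j] * controls[j][t] for j in range(n_units))
                 for t in range(n_periods)]
        curvature = 2.0 * sum(m * m for m in moved)
        if curvature == 0.0:
            break
        slope = sum(g * d for g, d in zip(grad, direction))
        # Exact line search for a quadratic objective, clipped to stay on the simplex.
        step = min(1.0, max(0.0, -slope / curvature))
        if step == 0.0:
            break
        weights = [w + step * d for w, d in zip(weights, direction)]
    return weights


# Two flat controls at 0 and 2; a treated series at 0.5 sits a quarter of the way between.
omega = simplex_least_squares(controls=[[0.0, 0.0], [2.0, 2.0]], target=[0.5, 0.5])
```

Because every update is a convex combination of the current weights and a vertex, the weights stay non-negative and sum to 1 throughout.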
References
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088-4118.
- __init__(zeta_omega=None, zeta_lambda=None, alpha=0.05, variance_method='placebo', n_bootstrap=200, seed=None, lambda_reg=None, zeta=None)[source]
Methods
__init__([zeta_omega, zeta_lambda, alpha, ...])
fit(data, outcome, treatment, unit, time[, ...]) – Fit the Synthetic Difference-in-Differences model.
Get estimator parameters.
predict(data) – Predict outcomes using fitted model.
Print summary to stdout.
set_params(**params) – Set estimator parameters.
summary() – Get summary of estimation results.
- fit(data, outcome, treatment, unit, time, post_periods=None, covariates=None)[source]
Fit the Synthetic Difference-in-Differences model.
- Parameters:
data (pd.DataFrame) – Panel data with observations for multiple units over multiple time periods.
outcome (str) – Name of the outcome variable column.
treatment (str) – Name of the treatment group indicator column (0/1). Should be 1 for all observations of treated units (both pre and post treatment).
unit (str) – Name of the unit identifier column.
time (str) – Name of the time period column.
post_periods (list, optional) – List of time period values that are post-treatment. If None, uses the last half of periods.
covariates (list, optional) – List of covariate column names. Covariates are residualized out before computing the SDID estimator.
- Returns:
Object containing the ATT estimate, standard error, unit weights, and time weights.
- Return type:
- Raises:
ValueError – If required parameters are missing or data validation fails.
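Since the treatment column must be a unit-level indicator (1 on every row of a treated unit, including pre-treatment rows), a pre-flight check can catch the common mistake of coding it as a post-treatment dummy. The helper below is a hypothetical sketch operating on plain dictionaries; it is not the validation that fit() performs, and the default column names are only placeholders.

```python
def check_treatment_is_unit_level(rows, unit="unit", treatment="treated"):
    """Return {unit: indicator}; raise ValueError if the indicator is not 0/1
    or changes across the rows of a single unit."""
    seen = {}
    for row in rows:
        value = row[treatment]
        if value not in (0, 1):
            raise ValueError(f"treatment value {value!r} is not 0 or 1")
        # Remember the first value seen for this unit and compare later rows to it.
        previous = seen.setdefault(row[unit], value)
        if previous != value:
            raise ValueError(
                f"unit {row[unit]!r} has both 0 and 1 in the treatment column; "
                "the indicator should be constant within a unit"
            )
    return seen


rows = [
    {"unit": "a", "period": 1, "treated": 1},
    {"unit": "a", "period": 2, "treated": 1},
    {"unit": "b", "period": 1, "treated": 0},
    {"unit": "b", "period": 2, "treated": 0},
]
groups = check_treatment_is_unit_level(rows)
```

With a pandas DataFrame, the same rows could be obtained via `data.to_dict("records")` before running the check.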