diff_diff.SyntheticDiD
- class diff_diff.SyntheticDiD
Bases: DifferenceInDifferences
Synthetic Difference-in-Differences (SDID) estimator.
Combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units’ pre-treatment trends.
This method is particularly useful when:
You have few treated units (possibly just one)
Parallel trends assumption may be questionable
Control units are heterogeneous and need reweighting
You want robustness to pre-treatment differences
- Parameters:
zeta_omega (float, optional) – Regularization for unit weights. If None (default), auto-computed from data as (N1 * T1)^(1/4) * noise_level, matching R’s synthdid.
zeta_lambda (float, optional) – Regularization for time weights. If None (default), auto-computed from data as 1e-6 * noise_level, matching R’s synthdid.
alpha (float, default=0.05) – Significance level for confidence intervals.
variance_method (str, default="placebo") – Method for variance estimation:
- "placebo": Placebo-based variance matching R’s synthdid::vcov(method="placebo"). Implements Algorithm 4 from Arkhangelsky et al. (2021). Library default (R’s default is "bootstrap"; we default to placebo because it is unconditionally available on pweight-only survey designs and avoids the ~5–30× slowdown of the refit bootstrap). See REGISTRY.md §SyntheticDiD Note (default variance_method deviation from R) for rationale.
- "bootstrap": Paper-faithful pairs bootstrap: Arkhangelsky et al. (2021) Algorithm 2 step 2, also the behavior of R’s default synthdid::vcov(method="bootstrap") (which rebinds attr(estimate, "opts") with update.omega=TRUE, so the renormalized ω is only Frank-Wolfe initialization). Re-estimates ω̂_b and λ̂_b via two-pass sparsified Frank-Wolfe on each bootstrap draw. Survey support (PR #352): pweight-only fits use the constant per-control survey weight as rw; full-design fits (strata/PSU/FPC) use Rao-Wu rescaled weights per draw. Both compose with the weighted Frank-Wolfe kernel (min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²); the FW returns ω on the standard simplex, then ω_eff = rw·ω/Σ(rw·ω) is composed for the SDID estimator. See REGISTRY.md §SyntheticDiD Note (survey + bootstrap composition) for the argmin-set caveat.
- "jackknife": Jackknife variance matching R’s synthdid::vcov(method="jackknife"). Implements Algorithm 3 from Arkhangelsky et al. (2021). Deterministic (N_control + N_treated iterations), uses fixed weights (no re-estimation). The n_bootstrap parameter is ignored for this method.
n_bootstrap (int, default=200) – Number of replications for variance estimation. Used for:
- Bootstrap: Number of bootstrap samples
- Placebo: Number of random permutations (matches R’s replications argument)
Ignored when variance_method="jackknife".
seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.
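As an illustration of the default regularization described for zeta_omega and zeta_lambda, here is a minimal NumPy sketch of how the two values could be derived from an outcome matrix. It assumes, following R’s synthdid, that noise_level is the sample standard deviation of the first differences of control units’ pre-treatment outcomes. The helper name default_regularization and the array layout (control units in the first rows, pre-treatment periods in the first columns) are hypothetical and are not part of the diff_diff API.

```python
import numpy as np


def default_regularization(Y, n_control, n_pre):
    """Hypothetical helper (not part of diff_diff).

    Y has shape (N, T): control units occupy the first n_control rows,
    pre-treatment periods occupy the first n_pre columns.
    """
    Y = np.asarray(Y, dtype=float)
    n_treated = Y.shape[0] - n_control  # N1
    n_post = Y.shape[1] - n_pre         # T1

    # Assumption: noise_level is the sample standard deviation of the
    # period-to-period changes of control units before treatment.
    first_diffs = np.diff(Y[:n_control, :n_pre], axis=1)
    noise_level = first_diffs.std(ddof=1)

    zeta_omega = (n_treated * n_post) ** 0.25 * noise_level
    zeta_lambda = 1e-6 * noise_level
    return zeta_omega, zeta_lambda, noise_level


# Toy panel: 3 control units, 1 treated unit, 3 pre and 2 post periods
Y = np.array([
    [0.0, 1.0, 3.0, 4.0, 5.0],
    [0.0, 2.0, 2.0, 3.0, 4.0],
    [1.0, 1.0, 4.0, 5.0, 6.0],
    [2.0, 3.0, 5.0, 9.0, 10.0],
])
zeta_omega, zeta_lambda, noise_level = default_regularization(
    Y, n_control=3, n_pre=3
)
print(zeta_omega, zeta_lambda, noise_level)
```

Under these assumptions the time-weight penalty is about six orders of magnitude smaller than the unit-weight penalty, so the time weights are nearly unregularized while the unit weights are pulled toward a more even spread.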
- results_
Estimation results after calling fit().
- Type:
Examples
Basic usage with panel data:
>>> import pandas as pd
>>> from diff_diff import SyntheticDiD
>>>
>>> # Panel data with units observed over multiple time periods
>>> # Treatment occurs at period 5 for treated units
>>> data = pd.DataFrame({
...     'unit': [...],      # Unit identifier
...     'period': [...],    # Time period
...     'outcome': [...],   # Outcome variable
...     'treated': [...]    # 1 if unit is ever treated, 0 otherwise
... })
>>>
>>> # Fit SDID model
>>> sdid = SyntheticDiD()
>>> results = sdid.fit(
...     data,
...     outcome='outcome',
...     treatment='treated',
...     unit='unit',
...     time='period',
...     post_periods=[5, 6, 7, 8]
... )
>>>
>>> # View results
>>> results.print_summary()
>>> print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
>>>
>>> # Examine unit weights
>>> weights_df = results.get_unit_weights_df()
>>> print(weights_df.head(10))
Notes
The SDID estimator (Arkhangelsky et al., 2021) computes:
τ̂ = (Ȳ_treated,post - Σ_t λ_t * Y_treated,t)
    - Σ_j ω_j * (Ȳ_j,post - Σ_t λ_t * Y_j,t)
Where:
- ω_j are unit weights (sum to 1, non-negative)
- λ_t are time weights (sum to 1, non-negative)
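The weighted double difference above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the library’s implementation: the function name sdid_estimate and the array layout (one row per unit, pre-treatment periods first) are assumptions, and the weights are passed in rather than estimated.

```python
import numpy as np


def sdid_estimate(Y_control, Y_treated, omega, lam, n_pre):
    """Hypothetical helper: weighted double difference for given weights.

    Y_control: (N_control, T), Y_treated: (N_treated, T),
    omega: (N_control,) unit weights, lam: (n_pre,) time weights.
    """
    Y_control = np.asarray(Y_control, dtype=float)
    Y_treated = np.asarray(Y_treated, dtype=float)
    omega = np.asarray(omega, dtype=float)
    lam = np.asarray(lam, dtype=float)

    # Average the treated units into a single outcome series
    treated_mean = Y_treated.mean(axis=0)

    # Treated: post-period mean minus the lambda-weighted pre-period outcome
    treated_diff = treated_mean[n_pre:].mean() - lam @ treated_mean[:n_pre]

    # Each control j: post-period mean minus lambda-weighted pre-period outcome
    control_diff = Y_control[:, n_pre:].mean(axis=1) - Y_control[:, :n_pre] @ lam

    return treated_diff - omega @ control_diff


# Two controls, one treated unit, 2 pre and 2 post periods
Y_control = np.array([[1.0, 2.0, 3.0, 4.0],
                      [2.0, 3.0, 4.0, 5.0]])
Y_treated = np.array([[3.0, 4.0, 8.0, 9.0]])

# Uniform weights reproduce the standard DiD estimate
tau = sdid_estimate(Y_control, Y_treated,
                    omega=np.array([0.5, 0.5]),
                    lam=np.array([0.5, 0.5]),
                    n_pre=2)
print(tau)
```

With uniform ω and λ the function returns the ordinary difference-in-differences contrast; non-uniform weights change which control units and which pre-treatment periods form the comparison.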
Unit weights ω are chosen to match pre-treatment outcomes:
min ||Σ_j ω_j * Y_j,pre - Y_treated,pre||²
This interpolates between:
- Standard DiD (uniform weights): ω_j = 1/N_control
- Synthetic Control (exact matching): concentrated weights
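To make the unit-weight objective concrete, the following is a minimal sketch that minimizes the pre-treatment matching loss over the simplex with a plain Frank-Wolfe loop and exact line search. It is deliberately simplified and is not the library’s solver: it omits the ζ regularization, any intercept, and the two-pass sparsification mentioned elsewhere in this page. The function name frank_wolfe_unit_weights and the iteration count are assumptions made for illustration.

```python
import numpy as np


def frank_wolfe_unit_weights(Y_control_pre, y_treated_pre, n_iter=2000):
    """Hypothetical sketch: minimize ||sum_j w_j * Y_j,pre - Y_treated,pre||^2
    over non-negative weights that sum to 1."""
    A = np.asarray(Y_control_pre, dtype=float).T  # (T_pre, N_control)
    b = np.asarray(y_treated_pre, dtype=float)    # (T_pre,)
    n_control = A.shape[1]

    # Start from uniform weights, i.e. the standard DiD end of the spectrum
    omega = np.full(n_control, 1.0 / n_control)

    for _ in range(n_iter):
        residual = A @ omega - b
        gradient = 2.0 * (A.T @ residual)

        # Frank-Wolfe step: move toward the simplex vertex with the
        # smallest gradient coordinate
        vertex = int(np.argmin(gradient))
        direction = -omega
        direction[vertex] += 1.0  # e_vertex - omega

        # Exact line search for a quadratic objective, clipped to [0, 1]
        A_dir = A @ direction
        curvature = A_dir @ A_dir
        if curvature == 0.0:
            break
        step = np.clip(-(residual @ A_dir) / curvature, 0.0, 1.0)
        if step == 0.0:
            break
        omega = omega + step * direction

    return omega


# The treated pre-period path is the average of controls 1 and 2;
# control 3 sits far from the treated unit.
Y_control_pre = np.array([[1.0, 2.0, 3.0],
                          [3.0, 2.0, 1.0],
                          [10.0, 10.0, 10.0]])
y_treated_pre = np.array([2.0, 2.0, 2.0])

omega = frank_wolfe_unit_weights(Y_control_pre, y_treated_pre)
uniform = np.full(3, 1.0 / 3.0)
loss = np.sum((Y_control_pre.T @ omega - y_treated_pre) ** 2)
uniform_loss = np.sum((Y_control_pre.T @ uniform - y_treated_pre) ** 2)
print(np.round(omega, 3), loss, uniform_loss)
```

In this toy panel the weights move away from the uniform DiD starting point and concentrate on the two controls that reproduce the treated unit’s pre-treatment path, which is the Synthetic Control end of the interpolation. Plain Frank-Wolfe converges slowly when the solution lies on a face of the simplex, so the weight on the third control shrinks toward zero without reaching it exactly.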
Conley spatial-HAC rejection. SyntheticDiD does not support the Conley (1999) spatial-HAC analytical sandwich. Passing vcov_type="conley" or any non-None Conley keyword (conley_coords, conley_cutoff_km, conley_metric, conley_kernel) to __init__ or set_params raises TypeError. Rationale: SyntheticDiD’s variance is derived from bootstrap / jackknife / placebo resampling (Arkhangelsky et al. 2021 Algorithms 2–4), not the sandwich identity Conley plugs into. Adding Conley support would require either an analytical SDID sandwich path or a spatial-block bootstrap (Politis-Romano 1994 territory). Tracked as a follow-up in TODO.md.
References
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088-4118.
Methods
__init__([zeta_omega, zeta_lambda, alpha, ...])
fit(data, outcome, treatment, unit, time[, ...]) – Fit the Synthetic Difference-in-Differences model.
get_params() – Get estimator parameters.
predict(data) – Predict outcomes using fitted model.
print_summary() – Print summary to stdout.
set_params(**params) – Set estimator parameters.
summary() – Get summary of estimation results.
- __init__(zeta_omega=None, zeta_lambda=None, alpha=0.05, variance_method='placebo', n_bootstrap=200, seed=None, lambda_reg=None, zeta=None, vcov_type=None, conley_coords=None, conley_cutoff_km=None, conley_metric=None, conley_kernel=None, conley_lag_cutoff=None)
- Parameters:
zeta_omega (float | None)
zeta_lambda (float | None)
alpha (float)
variance_method (str)
n_bootstrap (int)
seed (int | None)
lambda_reg (float | None)
zeta (float | None)
vcov_type (str | None)
conley_cutoff_km (float | None)
conley_metric (str | None)
conley_kernel (str | None)
conley_lag_cutoff (int | None)
- classmethod __new__(*args, **kwargs)