diff_diff.SyntheticDiD

class diff_diff.SyntheticDiD

Bases: DifferenceInDifferences

Synthetic Difference-in-Differences (SDID) estimator.

Combines the strengths of Difference-in-Differences and Synthetic Control methods by reweighting control units so that they better match the treated units’ pre-treatment trends.

This method is particularly useful when:

  • You have few treated units (possibly just one)

  • The parallel trends assumption may be questionable

  • Control units are heterogeneous and need reweighting

  • You want robustness to pre-treatment differences

Parameters:
  • zeta_omega (float, optional) – Regularization for unit weights. If None (default), auto-computed from the data as (N1 * T1)^(1/4) * noise_level, matching R’s synthdid.

  • zeta_lambda (float, optional) – Regularization for time weights. If None (default), auto-computed from the data as 1e-6 * noise_level, matching R’s synthdid.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • variance_method (str, default="placebo") –

    Method for variance estimation:

    • "placebo": Placebo-based variance matching R’s synthdid::vcov(method="placebo"). Implements Algorithm 4 from Arkhangelsky et al. (2021). Library default (R’s default is "bootstrap"; we default to placebo because it is unconditionally available on pweight-only survey designs and avoids the ~5–30× slowdown of the refit bootstrap). See REGISTRY.md §SyntheticDiD Note (default variance_method deviation from R) for rationale.

    • "bootstrap": Paper-faithful pairs bootstrap: Arkhangelsky et al. (2021) Algorithm 2 step 2, also the behavior of R’s default synthdid::vcov(method="bootstrap") (which rebinds attr(estimate, "opts") with update.omega=TRUE, so the renormalized ω is only Frank-Wolfe initialization). Re-estimates ω̂_b and λ̂_b via two-pass sparsified Frank-Wolfe on each bootstrap draw. Survey support (PR #352): pweight-only fits use the constant per-control survey weight as rw; full-design fits (strata/PSU/FPC) use Rao-Wu rescaled weights per draw. Both compose with the weighted Frank-Wolfe kernel (min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²); the FW returns ω on the standard simplex, then ω_eff = rw·ω/Σ(rw·ω) is composed for the SDID estimator. See REGISTRY.md §SyntheticDiD Note (survey + bootstrap composition) for the argmin-set caveat.

    • "jackknife": Jackknife variance matching R’s synthdid::vcov(method="jackknife"). Implements Algorithm 3 from Arkhangelsky et al. (2021). Deterministic (N_control + N_treated iterations), uses fixed weights (no re-estimation). The n_bootstrap parameter is ignored for this method.

  • n_bootstrap (int, default=200) –

    Number of replications for variance estimation. Used for:

    • Bootstrap: number of bootstrap samples

    • Placebo: number of random permutations (matches R’s replications argument)

    Ignored when variance_method="jackknife".

  • seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.
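
The auto-computed defaults for zeta_omega and zeta_lambda can be sketched as follows. This is a minimal illustration, not the library's code. It assumes that noise_level is the sample standard deviation of first differences of control-unit outcomes over the pre-treatment periods (the definition used by R’s synthdid), that N1 is the number of treated units, and that T1 is the number of post-treatment periods. The helper name default_regularization is hypothetical.

```python
import statistics


def default_regularization(y_control_pre, n_treated, n_post):
    """Sketch of the auto-computed zeta_omega / zeta_lambda defaults.

    y_control_pre: one list of pre-treatment outcomes per control unit.
    n_treated, n_post: N1 and T1 in the formula (assumed meaning, see lead-in).
    """
    # First differences within each control unit's pre-treatment series.
    diffs = [
        row[t + 1] - row[t]
        for row in y_control_pre
        for t in range(len(row) - 1)
    ]
    # Assumption: noise level is the sample std. dev. of those differences.
    noise_level = statistics.stdev(diffs)
    zeta_omega = (n_treated * n_post) ** 0.25 * noise_level
    zeta_lambda = 1e-6 * noise_level
    return zeta_omega, zeta_lambda, noise_level


# Two control units observed over three pre-treatment periods.
y_control_pre = [
    [1.0, 2.0, 4.0],
    [0.0, 1.0, 1.0],
]
zeta_omega, zeta_lambda, noise_level = default_regularization(
    y_control_pre, n_treated=1, n_post=16
)
```

Passing explicit zeta_omega or zeta_lambda values to the constructor bypasses this data-driven choice.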

results_

Estimation results after calling fit().

Type:

SyntheticDiDResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage with panel data:

>>> import pandas as pd
>>> from diff_diff import SyntheticDiD
>>>
>>> # Panel data with units observed over multiple time periods
>>> # Treatment occurs at period 5 for treated units
>>> data = pd.DataFrame({
...     'unit': [...],      # Unit identifier
...     'period': [...],    # Time period
...     'outcome': [...],   # Outcome variable
...     'treated': [...]    # 1 if unit is ever treated, 0 otherwise
... })
>>>
>>> # Fit SDID model
>>> sdid = SyntheticDiD()
>>> results = sdid.fit(
...     data,
...     outcome='outcome',
...     treatment='treated',
...     unit='unit',
...     time='period',
...     post_periods=[5, 6, 7, 8]
... )
>>>
>>> # View results
>>> results.print_summary()
>>> print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
>>>
>>> # Examine unit weights
>>> weights_df = results.get_unit_weights_df()
>>> print(weights_df.head(10))
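
The column placeholders in the example above can be filled with a simulated long-format panel. The sketch below uses only the standard library. The unit counts, trend, effect size, and the make_panel helper are illustrative assumptions and are not part of diff_diff.

```python
import random


def make_panel(n_control=20, n_treated=3, n_periods=8, first_post=5,
               effect=2.0, seed=0):
    """Build long-format rows: one dict per (unit, period) observation."""
    rng = random.Random(seed)
    rows = []
    for unit in range(n_control + n_treated):
        ever_treated = int(unit >= n_control)  # last n_treated units are treated
        unit_level = rng.gauss(0.0, 1.0)       # unit-specific intercept
        for period in range(1, n_periods + 1):
            outcome = unit_level + 0.5 * period + rng.gauss(0.0, 0.1)
            # The effect applies only to treated units in post-treatment periods.
            if ever_treated and period >= first_post:
                outcome += effect
            rows.append({
                "unit": unit,
                "period": period,
                "outcome": outcome,
                "treated": ever_treated,
            })
    return rows


rows = make_panel()
post_periods = [5, 6, 7, 8]
```

Wrapping the result as pd.DataFrame(rows) gives a frame with the unit, period, outcome, and treated columns used in the fit() call above, with post_periods=[5, 6, 7, 8].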

Notes

The SDID estimator (Arkhangelsky et al., 2021) computes:

τ̂ = (Ȳ_treated,post - Σ_t λ_t * Y_treated,t)
    - Σ_j ω_j * (Ȳ_j,post - Σ_t λ_t * Y_j,t)

Where:

  • ω_j are unit weights (sum to 1, non-negative)

  • λ_t are time weights (sum to 1, non-negative)
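
To make the estimator concrete, the sketch below evaluates τ̂ for given weights on small nested lists. It is an illustration only: sdid_estimate is a hypothetical helper, the weights are supplied by hand rather than estimated, and Ȳ_treated is taken to be the simple average across treated units.

```python
def sdid_estimate(y_control, y_treated, n_pre, omega, lam):
    """Evaluate the SDID double difference for given unit and time weights.

    y_control, y_treated: one list of outcomes per unit, ordered by period,
    where the first n_pre entries are pre-treatment periods.
    omega: one weight per control unit; lam: one weight per pre-period.
    """
    def post_minus_weighted_pre(series):
        post = series[n_pre:]
        post_mean = sum(post) / len(post)
        weighted_pre = sum(w_t * y for w_t, y in zip(lam, series[:n_pre]))
        return post_mean - weighted_pre

    # Average the treated units into a single series, period by period.
    treated_mean = [sum(col) / len(col) for col in zip(*y_treated)]
    treated_change = post_minus_weighted_pre(treated_mean)
    control_change = sum(
        w_j * post_minus_weighted_pre(series)
        for w_j, series in zip(omega, y_control)
    )
    return treated_change - control_change


y_control = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
y_treated = [[1.5, 2.5, 6.5, 7.5]]
# Uniform unit and time weights, for simplicity.
tau = sdid_estimate(y_control, y_treated, n_pre=2,
                    omega=[0.5, 0.5], lam=[0.5, 0.5])
```

In this toy panel both controls rise by 2.0 from the pre-period average to the post-period average while the treated unit rises by 5.0, so the weighted double difference is the gap between the two.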

Unit weights ω are chosen to match pre-treatment outcomes:

min ||Σ_j ω_j * Y_j,pre - Y_treated,pre||²

This interpolates between:

  • Standard DiD (uniform weights): ω_j = 1/N_control

  • Synthetic Control (exact matching): concentrated weights
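
The unit-weight problem above is a least-squares fit over the simplex. The sketch below solves a simplified version with a plain Frank-Wolfe loop and exact line search. It is an illustration under stated assumptions (no ζ regularization, no intercept, no sparsification pass, a fixed iteration budget) and is not the solver the library uses. The names fit_unit_weights and pre_fit_error are hypothetical.

```python
def pre_fit_error(omega, y_control_pre, y_treated_pre):
    """Squared distance between the weighted control path and the treated path."""
    return sum(
        (sum(w * unit[t] for w, unit in zip(omega, y_control_pre)) - target) ** 2
        for t, target in enumerate(y_treated_pre)
    )


def fit_unit_weights(y_control_pre, y_treated_pre, n_iter=500):
    """Frank-Wolfe for min ||sum_j w_j * Y_j,pre - Y_treated,pre||^2 on the simplex."""
    n_units = len(y_control_pre)
    n_pre = len(y_treated_pre)
    omega = [1.0 / n_units] * n_units  # start from uniform (DiD-style) weights

    for _ in range(n_iter):
        # Current synthetic pre-treatment path and its residual.
        synth = [sum(omega[j] * y_control_pre[j][t] for j in range(n_units))
                 for t in range(n_pre)]
        resid = [synth[t] - y_treated_pre[t] for t in range(n_pre)]
        # Gradient with respect to each weight (up to a factor of 2).
        grad = [sum(resid[t] * y_control_pre[j][t] for t in range(n_pre))
                for j in range(n_units)]
        best = min(range(n_units), key=grad.__getitem__)
        # Change in the synthetic path when moving toward vertex `best`.
        direction = [y_control_pre[best][t] - synth[t] for t in range(n_pre)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = min(1.0, max(0.0, step))  # exact line search, clipped to [0, 1]
        # Convex combination keeps the weights non-negative and summing to 1.
        omega = [(1.0 - step) * w for w in omega]
        omega[best] += step
    return omega


y_control_pre = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [10.0, 0.0, -5.0]]
y_treated_pre = [2.0, 3.0, 4.0]  # midpoint of the first two controls
uniform = [1.0 / 3.0] * 3
start_error = pre_fit_error(uniform, y_control_pre, y_treated_pre)
omega = fit_unit_weights(y_control_pre, y_treated_pre)
final_error = pre_fit_error(omega, y_control_pre, y_treated_pre)
```

Each iteration moves toward a single control unit, so the weights stay on the simplex by construction, and the clipped line search never increases the pre-treatment fit error relative to the uniform starting point.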

Conley spatial-HAC rejection. SyntheticDiD does not support the Conley (1999) spatial-HAC analytical sandwich. Passing vcov_type="conley" or any non-None Conley keyword (conley_coords, conley_cutoff_km, conley_metric, conley_kernel) to __init__ or set_params raises TypeError. Rationale: SyntheticDiD’s variance is derived from bootstrap / jackknife / placebo resampling (Arkhangelsky et al. 2021 Algorithms 2–4), not the sandwich identity Conley plugs into. Adding Conley support would require either an analytical SDID sandwich path or a spatial-block bootstrap (Politis-Romano 1994 territory). Tracked as a follow-up in TODO.md.

References

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088-4118.

Methods

__init__([zeta_omega, zeta_lambda, alpha, ...])

fit(data, outcome, treatment, unit, time[, ...])

Fit the Synthetic Difference-in-Differences model.

get_params()

Get estimator parameters.

predict(data)

Predict outcomes using fitted model.

print_summary()

Print summary to stdout.

set_params(**params)

Set estimator parameters.

summary()

Get summary of estimation results.

__init__(zeta_omega=None, zeta_lambda=None, alpha=0.05, variance_method='placebo', n_bootstrap=200, seed=None, lambda_reg=None, zeta=None, vcov_type=None, conley_coords=None, conley_cutoff_km=None, conley_metric=None, conley_kernel=None, conley_lag_cutoff=None)
Parameters:
  • zeta_omega (float | None)

  • zeta_lambda (float | None)

  • alpha (float)

  • variance_method (str)

  • n_bootstrap (int)

  • seed (int | None)

  • lambda_reg (float | None)

  • zeta (float | None)

  • vcov_type (str | None)

  • conley_coords (Tuple[str, str] | None)

  • conley_cutoff_km (float | None)

  • conley_metric (str | None)

  • conley_kernel (str | None)

  • conley_lag_cutoff (int | None)

classmethod __new__(*args, **kwargs)