diff_diff.SyntheticDiD

class diff_diff.SyntheticDiD

Bases: DifferenceInDifferences

Synthetic Difference-in-Differences (SDID) estimator.

Combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units’ pre-treatment trends.

This method is particularly useful when:

- You have few treated units (possibly just one)
- Parallel trends assumption may be questionable
- Control units are heterogeneous and need reweighting
- You want robustness to pre-treatment differences

Parameters:

zeta_omega (float, optional) – Regularization for unit weights. If None (default), auto-computed from data as (N1 * T1)^(1/4) * noise_level matching R’s synthdid.
zeta_lambda (float, optional) – Regularization for time weights. If None (default), auto-computed from data as 1e-6 * noise_level matching R’s synthdid.
alpha (float, default=0.05) – Significance level for confidence intervals.
variance_method (str, default="placebo") – Method for variance estimation:
    - "placebo": Placebo-based variance matching R’s synthdid::vcov(method="placebo"). Implements Algorithm 4 from Arkhangelsky et al. (2021). This is R’s default.
    - "bootstrap": Bootstrap at unit level with fixed weights matching R’s synthdid::vcov(method="bootstrap").
n_bootstrap (int, default=200) – Number of replications for variance estimation. Used for both:
    - Bootstrap: Number of bootstrap samples
    - Placebo: Number of random permutations (matches R’s replications argument)
seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.
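
The default regularization formulas for zeta_omega and zeta_lambda can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: noise_level is taken to be the sample standard deviation of first differences of the control units' pre-treatment outcomes, and N1 and T1 are taken to be the number of treated units and post-treatment periods (both following R's synthdid conventions, which this page does not spell out). The helper names are hypothetical.

```python
import statistics


def noise_level(control_pre):
    """Sample SD of first differences of control units' pre-treatment outcomes.

    ASSUMPTION: follows the noise-level convention of R's synthdid;
    the parameter docs above do not define noise_level explicitly.
    """
    diffs = [
        row[t + 1] - row[t]
        for row in control_pre
        for t in range(len(row) - 1)
    ]
    return statistics.stdev(diffs)


def default_regularization(control_pre, n_treated, n_post):
    """Return (zeta_omega, zeta_lambda) from the formulas in the parameter docs.

    ASSUMPTION: N1 = n_treated and T1 = n_post.
    """
    sigma = noise_level(control_pre)
    zeta_omega = (n_treated * n_post) ** 0.25 * sigma
    zeta_lambda = 1e-6 * sigma
    return zeta_omega, zeta_lambda


# Toy input: two control units observed over three pre-treatment periods.
control_pre = [
    [1.0, 2.0, 4.0],
    [0.0, 1.0, 1.0],
]
zeta_omega, zeta_lambda = default_regularization(control_pre, n_treated=1, n_post=16)
```

Because both defaults scale with the noise level, rescaling the outcome (for example, dollars to thousands of dollars) rescales the penalties by the same factor.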

results_

Estimation results after calling fit().

Type: SyntheticDiDResults

is_fitted_

Whether the model has been fitted.

Type: bool

Examples

Basic usage with panel data:

>>> import pandas as pd
>>> from diff_diff import SyntheticDiD
>>>
>>> # Panel data with units observed over multiple time periods
>>> # Treatment occurs at period 5 for treated units
>>> data = pd.DataFrame({
...     'unit': [...],      # Unit identifier
...     'period': [...],    # Time period
...     'outcome': [...],   # Outcome variable
...     'treated': [...]    # 1 if unit is ever treated, 0 otherwise
... })
>>>
>>> # Fit SDID model
>>> sdid = SyntheticDiD()
>>> results = sdid.fit(
...     data,
...     outcome='outcome',
...     treatment='treated',
...     unit='unit',
...     time='period',
...     post_periods=[5, 6, 7, 8]
... )
>>>
>>> # View results
>>> results.print_summary()
>>> print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
>>>
>>> # Examine unit weights
>>> weights_df = results.get_unit_weights_df()
>>> print(weights_df.head(10))

Notes

The SDID estimator (Arkhangelsky et al., 2021) computes:

τ̂ = (Ȳ_treated,post - Σ_t λ_t * Y_treated,t)
    - Σ_j ω_j * (Ȳ_j,post - Σ_t λ_t * Y_j,t)

Where:

- ω_j are unit weights (sum to 1, non-negative)
- λ_t are time weights (sum to 1, non-negative)
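
As a minimal sketch of the formula above, assuming the weights ω and λ are already known, the point estimate is a weighted double difference. The function and variable names are illustrative only and are not part of diff_diff; the toy panel is made up for the example.

```python
def weighted_gap(series, n_pre, lam):
    """Post-period mean minus the lambda-weighted pre-period outcome."""
    post = series[n_pre:]
    post_mean = sum(post) / len(post)
    pre_weighted = sum(l * y for l, y in zip(lam, series[:n_pre]))
    return post_mean - pre_weighted


def sdid_estimate(treated_avg, controls, n_pre, omega, lam):
    """tau-hat = treated gap minus omega-weighted control gaps.

    treated_avg: per-period outcome averaged over treated units
    controls:    one outcome series per control unit
    omega, lam:  unit and time weights, assumed already fitted
    """
    treated_gap = weighted_gap(treated_avg, n_pre, lam)
    control_gap = sum(
        w * weighted_gap(row, n_pre, lam) for w, row in zip(omega, controls)
    )
    return treated_gap - control_gap


# Toy panel (illustrative data): 2 pre-periods, 2 post-periods.
treated_avg = [1.0, 2.0, 6.0, 7.0]
controls = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0, 5.0],
]

# Uniform weights reduce the formula to an ordinary DiD.
tau_uniform = sdid_estimate(treated_avg, controls, 2, [0.5, 0.5], [0.5, 0.5])

# Concentrated weights: only the first control and the last pre-period count.
tau_concentrated = sdid_estimate(treated_avg, controls, 2, [1.0, 0.0], [0.0, 1.0])
```

In this toy panel both controls follow the same linear trend, so the two weightings give the same estimate; with heterogeneous controls they would generally differ.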

Unit weights ω are chosen to match pre-treatment outcomes:

min ||Σ_j ω_j * Y_j,pre - Y_treated,pre||²

This interpolates between:

- Standard DiD (uniform weights): ω_j = 1/N_control
- Synthetic Control (exact matching): concentrated weights
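
The unit-weight objective above can be sketched with a Frank-Wolfe iteration over the simplex. This is a simplified illustration and is not presented as the library's solver: it minimizes only the least-squares objective as written in these notes, omitting the intercept and the zeta_omega ridge penalty of the full SDID problem. All names are hypothetical.

```python
def pre_fit_loss(controls_pre, treated_pre, omega):
    """Squared distance between the weighted controls and the treated series."""
    fitted = [
        sum(w * row[t] for w, row in zip(omega, controls_pre))
        for t in range(len(treated_pre))
    ]
    return sum((f - y) ** 2 for f, y in zip(fitted, treated_pre))


def fit_unit_weights(controls_pre, treated_pre, n_iter=1000):
    """Frank-Wolfe with exact line search on the simplex (no ridge, no intercept)."""
    n = len(controls_pre)
    t_len = len(treated_pre)
    omega = [1.0 / n] * n  # start from the uniform (DiD) weights
    for _ in range(n_iter):
        fitted = [
            sum(omega[j] * controls_pre[j][t] for j in range(n))
            for t in range(t_len)
        ]
        resid = [f - y for f, y in zip(fitted, treated_pre)]
        # Gradient with respect to omega_j, up to a factor of 2.
        grad = [
            sum(controls_pre[j][t] * resid[t] for t in range(t_len))
            for j in range(n)
        ]
        best = min(range(n), key=grad.__getitem__)
        # Moving toward vertex e_best changes the fitted series by this direction.
        direction = [controls_pre[best][t] - fitted[t] for t in range(t_len)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = min(1.0, max(0.0, step))  # stay inside the simplex
        if step == 0.0:
            break  # no descent direction left
        omega = [(1.0 - step) * w for w in omega]
        omega[best] += step
    return omega


# Toy pre-treatment data (illustrative): three controls, three pre-periods.
treated_pre = [1.0, 2.0, 3.0]
controls_pre = [
    [0.0, 1.0, 2.0],
    [2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0],
]
omega = fit_unit_weights(controls_pre, treated_pre)
uniform_loss = pre_fit_loss(controls_pre, treated_pre, [1.0 / 3] * 3)
fitted_loss = pre_fit_loss(controls_pre, treated_pre, omega)
```

Each step is a convex combination of the current weights and a single vertex, so the weights stay non-negative and sum to 1 without an explicit projection. Starting from uniform weights makes the path from the DiD end toward the synthetic-control end visible in the iterates.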

References

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088-4118.

__init__(zeta_omega=None, zeta_lambda=None, alpha=0.05, variance_method='placebo', n_bootstrap=200, seed=None, lambda_reg=None, zeta=None)

Parameters:

zeta_omega (float | None)
zeta_lambda (float | None)
alpha (float)
variance_method (str)
n_bootstrap (int)
seed (int | None)
lambda_reg (float | None)
zeta (float | None)

Methods

`__init__`([zeta_omega, zeta_lambda, alpha, ...])
`fit`(data, outcome, treatment, unit, time[, ...])	Fit the Synthetic Difference-in-Differences model.
`get_params`()	Get estimator parameters.
`predict`(data)	Predict outcomes using fitted model.
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters.
`summary`()	Get summary of estimation results.

fit(data, outcome, treatment, unit, time, post_periods=None, covariates=None)

Fit the Synthetic Difference-in-Differences model.

Parameters:

data (pd.DataFrame) – Panel data with observations for multiple units over multiple time periods.
outcome (str) – Name of the outcome variable column.
treatment (str) – Name of the treatment group indicator column (0/1). Should be 1 for all observations of treated units (both pre and post treatment).
unit (str) – Name of the unit identifier column.
time (str) – Name of the time period column.
post_periods (list, optional) – List of time period values that are post-treatment. If None, uses the last half of periods.
covariates (list, optional) – List of covariate column names. Covariates are residualized out of the outcome before the SDID estimator is computed.

Returns:

Object containing the ATT estimate, standard error, unit weights, and time weights.

Return type:

SyntheticDiDResults

Raises:

ValueError – If required parameters are missing or data validation fails.
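
Because the treatment column must be a group indicator (1 in every period for treated units), a common mistake is passing a flag that switches on only after treatment starts. The following pre-flight check is a hypothetical helper, not part of diff_diff, sketched over a list of row dictionaries; the column names match the example above but are otherwise assumptions.

```python
from collections import defaultdict


def units_with_varying_treatment(rows, unit="unit", treatment="treated"):
    """Return ids of units whose treatment flag is not constant across periods."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[unit]].add(row[treatment])
    return sorted(u for u, flags in seen.items() if len(flags) > 1)


# Illustrative rows: unit "a" carries a post-period flag instead of a group flag.
rows = [
    {"unit": "a", "period": 1, "treated": 0},
    {"unit": "a", "period": 2, "treated": 1},
    {"unit": "b", "period": 1, "treated": 1},
    {"unit": "b", "period": 2, "treated": 1},
    {"unit": "c", "period": 1, "treated": 0},
    {"unit": "c", "period": 2, "treated": 0},
]
bad_units = units_with_varying_treatment(rows)
```

With a pandas DataFrame, the same helper can be fed `data.to_dict("records")`; any unit it returns should have its indicator recoded before calling fit().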

get_params()

Get estimator parameters.

Return type: Dict[str, Any]

set_params(**params)

Set estimator parameters.

Return type: SyntheticDiD