Synthetic Control Method (SCM)
Classic synthetic control estimator for a single treated unit (Abadie, Diamond & Hainmueller 2010; originating in Abadie & Gardeazabal 2003).
The treated unit’s counterfactual is a convex combination of “donor” (never-treated)
units. Donor weights W*(V) solve a simplex-constrained, predictor-importance-weighted
least-squares fit of the treated unit’s pre-period predictors. The diagonal
predictor-importance matrix V is either chosen from the data (minimizing pre-period
outcome MSPE, v_method="nested"; out-of-sample cross-validation, v_method="cv"; or
closed-form inverse-variance, v_method="inverse_variance") or supplied by the user
(v_method="custom"). The
treatment-effect path is the gap \(\hat{\alpha}_{1t} = Y_{1t} - \sum_j w_j Y_{jt}\)
over the post periods.
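The gap formula can be checked by hand on a toy panel. The sketch below uses made-up numbers and hand-picked weights (nothing here is estimated, and none of the array names belong to diff_diff); it only illustrates how the synthetic path, the gap path, the mean post-period gap and the pre-period RMSPE relate to each other.

```python
import numpy as np

# Hypothetical toy data: 5 periods (rows) x 3 donors (columns).
Y_donors = np.array([
    [10.0, 12.0, 8.0],
    [11.0, 13.0, 9.0],
    [12.0, 14.0, 10.0],
    [13.0, 15.0, 11.0],
    [14.0, 16.0, 12.0],
])
Y_treated = np.array([10.5, 11.5, 12.5, 15.0, 16.5])

# Hand-picked donor weights on the unit simplex (non-negative, sum to 1).
w = np.array([0.5, 0.25, 0.25])
n_pre = 3  # the first three periods are pre-treatment

synthetic = Y_donors @ w                          # sum_j w_j * Y_jt for every period
gap = Y_treated - synthetic                       # gap path over all periods
att = gap[n_pre:].mean()                          # mean post-period gap
pre_rmspe = np.sqrt(np.mean(gap[:n_pre] ** 2))    # pre-period fit diagnostic

print(gap, att, pre_rmspe)
```

With these numbers the pre-period gaps are all 0.5 and the post-period gaps are 2.0 and 2.5, so the mean post-period gap is 2.25.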
When to use SCM:
- Exactly one treated unit with a long, well-fit pre-treatment period.
- A curated donor pool of comparable never-treated units.
- Aggregate / few-unit comparative case studies (states, regions, countries).
Inference: classic SCM has no analytical standard error. se, t_stat,
p_value and conf_int are always NaN; att (the mean post-period gap) is the
reported estimate. Significance comes from in-space placebo permutation inference via
in_space_placebo() (post/pre RMSPE-ratio statistic,
placebo_p_value = rank/(n_placebos+1)). This permutation p-value is a separate field
from the (NaN) p_value; is_significant stays bound to p_value.
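The permutation arithmetic is small enough to sketch directly. The helpers below are illustrative, not the library's code: `rmspe_ratio` follows the post/pre definition given here, and `placebo_p_value` assumes the rank counts the treated unit plus every placebo whose ratio is at least as large (the tie handling is an assumption).

```python
import numpy as np

def rmspe_ratio(gap, n_pre):
    """Post/pre RMSPE ratio sqrt(MSPE_post / MSPE_pre) for one unit's gap path."""
    gap = np.asarray(gap, dtype=float)
    mspe_pre = np.mean(gap[:n_pre] ** 2)
    mspe_post = np.mean(gap[n_pre:] ** 2)
    return float(np.sqrt(mspe_post / mspe_pre))

def placebo_p_value(treated_ratio, placebo_ratios):
    """rank / (n_placebos + 1), where rank 1 is the largest ratio."""
    placebo_ratios = np.asarray(placebo_ratios, dtype=float)
    # Assumption: ties count against the treated unit (conservative).
    rank = 1 + int(np.sum(placebo_ratios >= treated_ratio))
    return rank / (len(placebo_ratios) + 1)

# Hypothetical gap path for the treated unit and ratios for four placebos.
treated = rmspe_ratio([0.5, 0.5, 0.5, 2.0, 2.5], n_pre=3)
p = placebo_p_value(treated, [1.2, 0.8, 3.1, 0.9])
print(round(treated, 3), p)
```

With four placebos the smallest attainable p-value is 1/5, which is why a larger donor pool is needed before the permutation test can reach conventional significance levels.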
Robustness diagnostics (ADH 2015 §4, opt-in):
leave_one_out() drops each reportably-weighted (weight > 1e-6)
donor and re-fits (per-drop ATT / delta_att table — a large delta_att flags
single-donor dependence), and
in_time_placebo() reassigns the intervention to an
earlier pre-date and checks for a spurious gap before the true treatment date (the
backdating placebo; placebo_att should be ~0). Both re-run the validated solver and
leave the analytical inference fields NaN.
Distinct from SyntheticDiD (Arkhangelsky et al. 2021), which adds
time weights and ridge regularization; classic SCM uses donor weights only plus the
outer V search.
Reference: Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505. doi:10.1198/jasa.2009.ap08746
SyntheticControl
Main estimator class for classic synthetic control estimation.
- class diff_diff.SyntheticControl[source]
Bases: object
Classic Synthetic Control Method estimator (Abadie-Diamond-Hainmueller 2010).
- Parameters:
v_method ({"nested", "custom", "cv", "inverse_variance"}, default "nested") – How the predictor-importance matrix V is chosen.
"nested" selects V data-driven by minimizing the pre-period outcome MSPE of W*(V) (ADH 2010 §2.3).
"custom" uses the user-supplied custom_v and skips the outer search.
"cv" selects V by out-of-sample cross-validation (ADH 2015 §; Abadie 2021 Eq. 9): the pre-period is split at v_cv_t0 into a training and a validation window; V is chosen to minimize the validation-window outcome MSPE of the training-fit weights, then the final weights are re-estimated on the validation-window predictors.
"inverse_variance" uses the closed-form v_h = 1/Var(X_{h·}) (Abadie 2021 §3.2(a); variance over donors+treated), applied to the RAW predictors so the effective objective is the unit-variance-rescaled Σ_h diff_h²/Var_h (no search, deterministic). Note that this rescaling is what standardize="std" does, so the standardize setting does not compose with it (equivalent to uniform V on standardized predictors); applying 1/Var on already-standardized rows would double-rescale to Σ_h diff_h²/Var_h².
custom_v (array-like, optional) – Diagonal of V (length = number of predictors). Required iff v_method="custom"; must be None for every other v_method (nested/cv/inverse_variance). Must be finite and non-negative; trace-normalized internally.
optimizer_options (dict, optional) – Extra options merged into every scipy.optimize.minimize call in the outer V search (e.g. maxiter, xatol, fatol).
n_starts (int, default 4) – Number of starting points for the multistart outer V search.
inner_max_iter (int, default 10000) – Max iterations for the inner Frank-Wolfe simplex solve.
inner_min_decrease (float, default 1e-5) – Inner-solve convergence scale (matches the SDID/Frank-Wolfe precedent in prep.py). The Frank-Wolfe stop threshold is (inner_min_decrease * max(||b||, 1e-12))**2 where b is the V^½-scaled treated predictor vector; it is scale-aware, so convergence is meaningful at any data magnitude. 1e-5 reproduces R Synth’s donor weights to ~1e-4 on the Basque benchmark while still signalling convergence; tighter values (e.g. 1e-6) can exhaust inner_max_iter.
standardize ({"std", "none"}, default "std") – Predictor standardization. "std" divides each predictor row by its standard deviation across donors+treated (ddof=1), matching R Synth. "none" is a deviation from R (see REGISTRY).
alpha (float, default 0.05) – Significance level recorded for downstream (placebo) inference.
seed (int, optional) – Seed for the multistart random (Dirichlet) starting points.
v_cv_t0 (int, optional) – Training/validation split index for v_method="cv" only (positional into the pre-periods: training = pre[:v_cv_t0], validation = pre[v_cv_t0:]). Must leave at least 1 training and 1 validation pre-period. Default None → len(pre_periods) // 2 (Abadie 2021’s t0 = T0/2). Must be None unless v_method="cv".
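To make the inner solve concrete, here is a minimal Frank-Wolfe sketch for the simplex-constrained least-squares problem that inner_max_iter and inner_min_decrease control. It is an illustration under stated assumptions, not the library's solver: the function name, the uniform starting point, the exact line search, and the choice to compare the threshold against the per-iteration objective decrease are all assumptions.

```python
import numpy as np

def frank_wolfe_simplex(X0, x1, v, max_iter=10_000, min_decrease=1e-5):
    """Minimise ||V^(1/2) (x1 - X0 @ w)||^2 over the unit simplex (sketch).

    X0: (k, J) donor predictor matrix, x1: (k,) treated predictors,
    v: (k,) non-negative diagonal of V.
    """
    sqrt_v = np.sqrt(np.asarray(v, dtype=float))
    A = sqrt_v[:, None] * np.asarray(X0, dtype=float)   # V^(1/2) X0
    b = sqrt_v * np.asarray(x1, dtype=float)            # V^(1/2) x1
    n_donors = A.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)               # start at the simplex centre

    # Scale-aware stop threshold, as described for inner_min_decrease.
    tol = (min_decrease * max(np.linalg.norm(b), 1e-12)) ** 2
    prev = float(np.sum((A @ w - b) ** 2))

    for _ in range(max_iter):
        resid = A @ w - b
        grad = 2.0 * A.T @ resid
        j = int(np.argmin(grad))          # best simplex vertex for the linearised problem
        d = -w.copy()
        d[j] += 1.0                       # direction e_j - w
        Ad = A @ d
        denom = float(Ad @ Ad)
        if denom <= 0.0:                  # already at that vertex, no movement possible
            break
        step = float(np.clip(-(resid @ Ad) / denom, 0.0, 1.0))  # exact line search
        w = w + step * d
        cur = float(np.sum((A @ w - b) ** 2))
        if prev - cur < tol:              # assumed stopping rule: decrease below threshold
            break
        prev = cur
    return w

# Toy check: the treated unit equals donor 0, so all weight should land there.
w_hat = frank_wolfe_simplex(np.eye(3), np.array([1.0, 0.0, 0.0]), np.ones(3))
print(w_hat)
```

Because every update is a convex combination of the current weights and a simplex vertex, the iterates stay non-negative and sum to one without any projection step.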
Methods
| Method | Description |
|---|---|
| fit(data, outcome, treatment, unit, time, *) | Fit the classic synthetic control model. |
| get_params() | Get estimator parameters. |
| set_params(**params) | Set estimator parameters. |
- __init__(v_method='nested', custom_v=None, optimizer_options=None, n_starts=4, inner_max_iter=10000, inner_min_decrease=1e-05, standardize='std', alpha=0.05, seed=None, v_cv_t0=None)[source]
- set_params(**params)[source]
Set estimator parameters.
Applies updates transactionally: if _validate_config() rejects the post-update state, the instance is rolled back to its pre-call values so a raised ValueError leaves the object consistent.
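The rollback behaviour described for set_params can be pictured with a small stand-alone class. This is a hypothetical sketch of the pattern (snapshot, apply, validate, restore on failure); the class name and its two parameters are invented for illustration and the real estimator's internals may differ.

```python
class TransactionalParams:
    """Toy object whose set_params either fully applies or fully rolls back."""

    def __init__(self, alpha=0.05, n_starts=4):
        self.alpha = alpha
        self.n_starts = n_starts
        self._validate_config()

    def _validate_config(self):
        # Illustrative checks only.
        if not 0.0 < self.alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        if self.n_starts < 1:
            raise ValueError("n_starts must be at least 1")

    def set_params(self, **params):
        snapshot = dict(self.__dict__)          # pre-call state
        try:
            for name, value in params.items():
                if name not in snapshot:
                    raise ValueError(f"unknown parameter: {name}")
                setattr(self, name, value)
            self._validate_config()             # validate the combined post-update state
        except ValueError:
            self.__dict__.clear()
            self.__dict__.update(snapshot)      # roll back every change, valid or not
            raise
        return self

est = TransactionalParams()
try:
    est.set_params(n_starts=8, alpha=2.0)       # alpha is invalid
except ValueError:
    pass
print(est.n_starts, est.alpha)                  # both still at their pre-call values
```

The point of validating after all updates are applied is that a valid change (n_starts=8) is also undone when a sibling change in the same call fails.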
- fit(data, outcome, treatment, unit, time, *, post_periods=None, treated_unit=None, predictors=None, predictors_op='mean', predictor_window=None, special_predictors=None, pre_period_outcomes=None, donor_pool=None, survey_design=None)[source]
Fit the classic synthetic control model.
- Parameters:
data (pandas.DataFrame) – Balanced panel.
outcome, treatment, unit, time (str) – Column names. treatment is the ABSORBING treatment indicator (0/1): 1 for the treated unit in its treated periods, 0 otherwise.
post_periods (list, optional) – Explicit post-treatment period values. If None, inferred from the treated unit’s treatment column (the D==1 periods).
treated_unit (Any, optional) – Identifier of the treated unit. If None, inferred as the single ever-treated unit.
predictors (list of str, optional) – Columns averaged over predictor_window (using predictors_op) to form predictor rows.
predictors_op ({"mean", "sum"}, default "mean") – Aggregation operator for predictors (linear combinations only, per ADH 2010 §2.3).
predictor_window (list, optional) – Pre-periods over which predictors are averaged. Defaults to all pre periods. Must be a non-empty subset of the pre periods.
special_predictors (list of (var, periods, op), optional) – Per-variable special predictors, each averaged over its own periods with its own operator (mirrors R Synth special.predictors).
pre_period_outcomes ("all" or list, optional) – Use individual pre-period outcomes as predictor rows (“all” = every pre period). When no predictor arguments are given at all, defaults to all pre-period outcomes.
donor_pool (list, optional) – Explicit donor unit identifiers (must be never-treated). Defaults to all never-treated units.
survey_design (optional) – Not yet supported; raises NotImplementedError if provided.
- Return type:
SyntheticControlResults
Results container for synthetic control estimation.
- class diff_diff.SyntheticControlResults[source]
Bases: object
Results from a classic Synthetic Control Method (SCM) estimation.
Implements Abadie, Diamond & Hainmueller (2010), “Synthetic Control Methods for Comparative Case Studies.” A single treated unit’s counterfactual is the convex combination Σ_j w_j · Y_jt of donor units chosen to match the treated unit’s pre-period outcomes and predictors; the treatment effect path is the gap α̂_1t = Y_1t − Σ_j w_j · Y_jt over the post periods.
- att
Average post-period gap (the reported point estimate). The per-period gaps are in gap_path.
- Type:
float
- se
Always NaN: classic SCM has no analytical standard error (inference is permutation/placebo based; see Abadie-Diamond-Hainmueller 2010 §2.4).
- Type:
float
- t_stat, p_value
Always NaN (no analytical SE).
- Type:
float
- n_obs
Number of observations (treated + donor rows over all periods) used.
- Type:
int
- n_donors
Number of donor units in the (post-filter) donor pool.
- Type:
int
- n_pre_periods
Number of pre-treatment periods.
- Type:
int
- n_post_periods
Number of post-treatment periods.
- Type:
int
- donor_weights
Mapping {donor_unit_id: weight} on the unit simplex. Weights below the interpretability floor (1e-6) are dropped.
- v_weights
Mapping {predictor_label: v}: the diagonal predictor-importance matrix V, trace-normalized to sum to 1. On the degenerate single-donor path (one donor forces w=[1]) V is unidentified (every V yields the same synthetic), so v_weights is uniform for every v_method (including cv/inverse_variance), with a UserWarning emitted at fit time.
- predictor_balance
Predictor-balance table: for each predictor, the treated value, the synthetic value (donor-weighted), and the donor-pool mean. Under v_method="cv" the reported donor_weights come from the ADH-2015 step-4 refit on the validation-window re-aggregated predictors, so the treated/synthetic/donor_mean values are reported on that same validation-window basis (each spec re-aggregated over pre[v_cv_t0:]); the row’s predictor label remains the full spec identity, so it stays aligned with v_weights. For every other v_method the values are the full-pre-period predictor aggregates.
- Type:
DataFrame
- gap_path
Mapping {period: gap} for ALL periods (pre periods carry the fit residual used for pre_rmspe; post periods carry the effect path).
- pre_rmspe
Root mean squared prediction error over the pre-treatment periods (the primary fit diagnostic).
- Type:
float
- mspe_v
The outer-objective value of the selected V: the pre-period outcome MSPE of W*(V*) under v_method="nested", or the held-out validation-window outcome MSPE under v_method="cv" (the CV selection criterion). None when there is no outer search: the v_method="custom" and "inverse_variance" paths and the degenerate single-donor path. Not comparable across v_method values (different objective windows).
- Type:
float, optional
- treated_unit
The treated unit’s identifier.
- Type:
Any
- pre_periods, post_periods
Calendar-sorted pre / post period values.
- v_method
"nested" (data-driven V), "custom" (user-supplied V), "cv" (out-of-sample cross-validation V), or "inverse_variance" (closed-form 1/Var(X) V).
- Type:
str
- v_cv_t0
The training/validation split index actually used under v_method="cv" (the resolved value, equal to n_pre_periods // 2 when the constructor’s v_cv_t0 was None). None for every other v_method. Survives pickling.
- Type:
int, optional
- standardize
"std" (per-row SD scaling) or "none".
- Type:
str
- alpha
Significance level recorded for downstream (placebo) inference.
- Type:
float
- rmspe_ratio
The treated unit’s post/pre RMSPE ratio = sqrt(MSPE_post / MSPE_pre), the in-space placebo test statistic (ADH 2010 §2.4), computed at fit time.
- Type:
float
- placebo_p_value
In-space placebo permutation p-value (rank / (n_placebos + 1)), NaN until in_space_placebo() is run. SEPARATE from the (always-NaN) analytical p_value; is_significant stays bound to p_value.
- Type:
float
- n_placebos, n_failed
Donor placebos that entered the permutation reference set / were excluded for non-convergence. Both 0 until in_space_placebo() is run.
- Type:
int
- survey_metadata
Reserved; always None in this release.
- Type:
Any, optional
Significance for classic SCM comes from in_space_placebo() (opt-in in-space placebo permutation inference); get_placebo_df() returns the per-unit RMSPE-ratio table used for the rank.
Methods
| Method | Description |
|---|---|
| in_space_placebo([n_starts]) | In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010, Section 2.4). |
| get_placebo_df() | Get the in-space placebo distribution as a DataFrame (one row per unit). |
| leave_one_out([n_starts]) | Leave-one-out donor robustness (Abadie-Diamond-Hainmueller 2015, Section 4). |
| get_leave_one_out_df() | Get the leave-one-out donor-robustness table (see leave_one_out()). |
| get_leave_one_out_gaps() | Long-form leave-one-out gap paths, for the overlay ("spaghetti") plot. |
| in_time_placebo([placebo_periods, n_starts]) | In-time (backdating) placebo (Abadie-Diamond-Hainmueller 2015, Section 4). |
| get_in_time_placebo_df() | Get the in-time placebo table (see in_time_placebo()). |
| get_in_time_placebo_gaps() | Long-form in-time placebo gap paths, for the backdating overlay plot. |
| summary([alpha]) | Generate a formatted summary of the estimation results. |
| print_summary([alpha]) | Print the summary to stdout. |
| to_dict() | Convert scalar results to a dictionary. |
| to_dataframe() | Convert scalar results to a single-row pandas DataFrame. |
| get_gap_df() | Get the gap (effect) path as a DataFrame, in calendar order. |
| get_weights_df() | Get donor weights as a DataFrame, sorted by weight descending. |
- att: float
- se: float
- t_stat: float
- p_value: float
- n_obs: int
- n_donors: int
- n_pre_periods: int
- n_post_periods: int
- predictor_balance: DataFrame
- pre_rmspe: float
- treated_unit: Any
- v_method: str
- standardize: str
- alpha: float = 0.05
- placebo_p_value: float = nan
- rmspe_ratio: float = nan
- n_placebos: int = 0
- n_failed: int = 0
- __getstate__()[source]
Exclude panel-derived internal state from pickling.
_fit_snapshot retains the full treated+donor panel and _placebo_gaps the per-unit gap paths; both are panel-derived, a privacy/size hazard if the pickle is sent elsewhere. The scalar placebo fields (placebo_p_value, rmspe_ratio, n_placebos, n_failed) and the small _placebo_df aggregate table survive. An unpickled result keeps all public fields; a diagnostic call that needs the snapshot (in_space_placebo) then raises a ValueError directing the user to re-fit. Mirrors SyntheticDiDResults.
- property coef_var: float
Coefficient of variation: SE / abs(ATT). NaN here (SE is always NaN).
- property is_significant: bool
Always False — classic SCM produces no analytical p-value.
- property significance_stars: str
Significance stars based on p-value (empty here — p_value is NaN).
- summary(alpha=None)[source]
Generate a formatted summary of the estimation results.
- print_summary(alpha=None)[source]
Print the summary to stdout.
- Parameters:
alpha (float | None)
- Return type:
None
- to_dict()[source]
Convert scalar results to a dictionary.
- Returns:
Dictionary of the scalar estimation results (weights/balance/gaps are available via the get_*_df accessors).
- Return type:
Dict[str, Any]
- to_dataframe()[source]
Convert scalar results to a single-row pandas DataFrame.
- Return type:
- get_gap_df()[source]
Get the gap (effect) path as a DataFrame, in calendar order.
Rebuilt period-keyed from gap_path using the canonical pre_periods + post_periods order so the row order is independent of any dict-insertion order. Columns: period, gap, phase.
- Return type:
DataFrame
- get_weights_df()[source]
Get donor weights as a DataFrame, sorted by weight descending.
- Returns:
Columns: unit, weight.
- Return type:
DataFrame
- get_placebo_df()[source]
Get the in-space placebo distribution as a DataFrame (one row per unit).
This is a per-unit SUMMARY table (one row per unit), enough to reproduce the permutation rank and a ratio-distribution plot. It is NOT the per-period placebo gap paths needed for the classic “spaghetti” plot (those are retained internally on _placebo_gaps for the successful placebos).
Columns: unit, pre_mspe, post_mspe, rmspe_ratio, is_treated, status ("treated"/"placebo"/"failed").
The treated unit is always present as a single is_treated=True, status="treated" row (its ratio is the original J-donor fit). After a placebo run that produced a reference set (>= 2 donors AND a converged treated fit), the table has n_donors + 1 rows: every donor appears, including those whose refit did not converge (status="failed" with NaN metrics, excluded from the rank). In the degenerate / fail-closed cases (fewer than 2 donors, or a treated fit that did not converge) the placebo loop does not run, so only the treated row is returned.
Populated by in_space_placebo(); the summary table is retained on pickling, so it is still returned after a round-trip. Before any placebo run (including on an unpickled result that never ran one) only the treated row is returned.
- Return type:
DataFrame
- in_space_placebo(n_starts=None)[source]
In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010, Section 2.4).
Reassigns the treatment to each donor in turn, re-estimates a synthetic control for that pseudo-treated donor against the OTHER donors, and ranks the real treated unit’s post/pre RMSPE ratio among all units. Populates placebo_p_value, n_placebos and n_failed on this object (rmspe_ratio, the treated unit’s own ratio, is set at fit time) and returns the placebo distribution via get_placebo_df().
The real treated unit is excluded from every placebo’s donor pool: its post-period outcome is treatment-contaminated, so allowing a placebo to load weight on it would bias the placebo gap. The ranking set is therefore the J+1 units {treated} ∪ {J placebos}, with each placebo fit against the other J-1 donors (this matches the standard SCtools::generate.placebos construction). The post/pre RMSPE ratio normalizes by pre-treatment fit, which obviates the pre-fit-cutoff filtering of ADH Figures 5-7 (journal p. 502), so no pre-fit filter is offered: every converged placebo enters the rank.
The permutation placebo_p_value is intentionally distinct from p_value (which stays NaN, since classic SCM has no analytical SE) and from is_significant (which also stays bound to the NaN p_value).
A placebo is excluded from the reference set (counted in n_failed) when its fit is not a valid optimum: EITHER its inner Frank-Wolfe weight solve did not converge (a truncated W is unusable) OR its outer V search did not converge (an under-optimized V fits the pre-period worse, shrinking its RMSPE ratio and biasing the permutation p-value anti-conservatively). Each placebo refit inherits the original fit’s optimizer_options / n_starts, so valid inference requires settings adequate for the outer V search to converge: production defaults do; with cheap settings, raise n_starts here or re-fit with a larger optimizer_options['maxiter'] (otherwise placebos are dropped as failed). The treated unit’s own fit is held to the same standard: if its inner OR outer search did not converge, the whole run fails closed (see below).
- Parameters:
n_starts (int, optional) – Override the multistart count for each placebo’s outer V search (nested/cv). Default None inherits the original fit’s n_starts. The placebo loop is the cost driver (one outer V search per donor); lower it for a faster, coarser scan.
- Returns:
The placebo distribution (see get_placebo_df()).
- Raises:
ValueError – If the fit snapshot is unavailable (e.g. this result was unpickled).
- leave_one_out(n_starts=None)[source]
Leave-one-out donor robustness (Abadie-Diamond-Hainmueller 2015, Section 4).
Drops each reportably-weighted donor, one at a time, and re-fits the treated unit’s synthetic control against the remaining donor pool. The per-drop ATTs reveal whether the estimated effect is driven by any single donor (ADH 2015 overlay the leave-one-out counterfactual trajectories for this purpose; get_leave_one_out_gaps() returns those paths). This is a thin re-run of the validated SCM solver and has no analytical standard error; se/t_stat/p_value/conf_int and is_significant are unaffected (still bound to the NaN analytical p_value).
The drop set is exactly the donors in donor_weights, i.e. those above the 1e-6 interpretability floor (synthetic_control._MIN_REPORT_WEIGHT). A donor with negligible weight 0 < w ≤ 1e-6 is excluded (its removal moves the ATT by ~the weight, so its delta_att would be ~0, an uninformative row), keeping the LOO table aligned with the reported support; a zero-weight donor’s removal leaves the synthetic unchanged. (This 1e-6 approximation of “positive weight” is documented in REGISTRY §SyntheticControl.) A donor that carries ALL the weight is still dropped (the others absorb its mass on re-fit); its large delta_att is exactly the single-donor-dependence signal this diagnostic exists to surface, NOT a failure.
- Parameters:
n_starts (int, optional) – Override the multistart count for each leave-one-out refit’s outer V search (nested/cv). Default None inherits the original fit’s n_starts.
- Returns:
One status="baseline" row (the full fit, delta_att=0) followed by one row per dropped donor (status="loo", or "failed" with NaN metrics when its refit did not converge), sorted by |delta_att| descending (failed rows last). Columns: dropped_unit, att, pre_rmspe, post_rmspe, rmspe_ratio, delta_att (att_loo - full_att), status.
- Raises:
ValueError – If the fit snapshot is unavailable (e.g. this result was unpickled).
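One way to read the leave-one-out table is to flag donors whose removal moves the ATT by more than some fraction of the full-fit estimate. The helper below is a hypothetical post-processing sketch: it relies only on the documented columns (dropped_unit, att, delta_att, status), while the function name, the 25% threshold and the toy table (including the None placeholder in the baseline row) are assumptions.

```python
import pandas as pd

def flag_influential_donors(loo_df, rel_threshold=0.25):
    """Return donors whose |delta_att| exceeds rel_threshold * |baseline ATT|."""
    baseline_att = loo_df.loc[loo_df["status"] == "baseline", "att"].iloc[0]
    drops = loo_df[loo_df["status"] == "loo"].copy()   # skip baseline and failed rows
    drops["rel_change"] = drops["delta_att"].abs() / abs(baseline_att)
    return drops.loc[drops["rel_change"] > rel_threshold, "dropped_unit"].tolist()

# Hand-built table in the documented layout (values are made up).
toy_loo = pd.DataFrame({
    "dropped_unit": [None, "A", "B"],
    "att": [2.0, 0.8, 1.9],
    "delta_att": [0.0, -1.2, -0.1],
    "status": ["baseline", "loo", "loo"],
})
flagged = flag_influential_donors(toy_loo)
print(flagged)
```

Here dropping donor A changes the ATT by 60% of the baseline, so A is flagged, while B (5%) is not. The threshold is a judgment call, not a test.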
- get_leave_one_out_df()[source]
Get the leave-one-out donor-robustness table (see leave_one_out()). Survives pickling. Raises if leave_one_out() has not been run.
- get_leave_one_out_gaps()[source]
Long-form leave-one-out gap paths, for the overlay (“spaghetti”) plot.
One row per (dropped donor, period) for every converged leave-one-out refit. Columns: dropped_unit, period, gap, phase ("pre"/"post"), mirroring get_gap_df(). These per-period paths are panel-derived and are NOT retained after pickling.
- Raises:
ValueError – If leave_one_out() has not been run, or if the gap paths were dropped on pickling (re-fit and re-run to recompute them).
- in_time_placebo(placebo_periods=None, n_starts=None)[source]
In-time (backdating) placebo (Abadie-Diamond-Hainmueller 2015, Section 4).
Reassigns the intervention to an earlier pre-treatment date t_f and re-fits the synthetic control using ONLY pre-t_f information, then measures the “effect” over the held-out window [t_f, T0). A credible synthetic control should show no spurious gap there (ADH 2015 Figure 4, German reunification backdated to 1975). This is a thin re-run of the validated SCM solver and has no analytical standard error; se/t_stat/p_value/conf_int and is_significant are unaffected.
Windowing convention (TRUNCATE). The placebo fit uses only periods strictly before t_f: pre-period-outcome predictors become the pre-t_f outcomes, and covariate / special predictor windows are intersected with the pre-t_f window. A predictor window lying ENTIRELY in the held-out region [t_f, T0) is dropped (surfaced in n_dropped_specs + an aggregated warning). For outcome-predictor fits this equals the literal “lag the predictors” re-run of a manual Synth::synth (R has no in-time-placebo function); see docs/methodology/REGISTRY.md for the recognized deviation note.
- Parameters:
placebo_periods (period value or list of period values, optional) – The pseudo-intervention date(s), each a member of pre_periods. Default None sweeps every feasible interior pre-date (at least 2 pre-fake periods to fit + at least 1 post-fake period to measure the gap). A date that is a true post-treatment period, or not a pre-period at all, raises ValueError; a valid pre-date that is dimensionally infeasible (too few pre-fake periods, or all predictors dropped) yields a status="infeasible" row (no raise).
n_starts (int, optional) – Override the multistart count for each placebo refit’s outer V search (nested/cv). Default None inherits the original fit’s n_starts.
- Returns:
One row per placebo date. Columns: placebo_period, placebo_att (mean gap over the held-out window; should be ~0 if no real pre-period effect), pre_fit_rmspe, rmspe_ratio (post-fake/pre-fake), n_pre_fake, n_post_fake, n_dropped_specs, status ("ran"/"infeasible"/"failed").
- Raises:
ValueError – If the fit snapshot is unavailable (e.g. this result was unpickled), or an explicit placebo_periods entry is a post-treatment period / not a pre-period.
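The default sweep's feasibility rule (at least 2 pre-fake periods and at least 1 post-fake period) can be illustrated with a few lines of plain Python. This is one reading of that rule, not the library's code: the function name and the assumption that the held-out window [t_f, T0) includes t_f itself and runs to the end of the pre-periods are mine.

```python
def feasible_placebo_dates(pre_periods, min_pre_fake=2, min_post_fake=1):
    """Pre-dates t_f that leave enough periods on both sides of the fake cut-off."""
    pre = sorted(pre_periods)
    dates = []
    for i, t_f in enumerate(pre):
        n_pre_fake = i                 # periods strictly before t_f
        n_post_fake = len(pre) - i     # held-out window [t_f, T0)
        if n_pre_fake >= min_pre_fake and n_post_fake >= min_post_fake:
            dates.append(t_f)
    return dates

print(feasible_placebo_dates([1960, 1961, 1962, 1963, 1964]))
```

Under this reading the first two pre-periods can never serve as placebo dates, because fewer than two periods would remain to fit the backdated synthetic control.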
- get_in_time_placebo_df()[source]
Get the in-time placebo table (see in_time_placebo()). Survives pickling. Raises if in_time_placebo() has not been run.
- get_in_time_placebo_gaps()[source]
Long-form in-time placebo gap paths, for the backdating overlay plot.
One row per (placebo date, period) for every converged in-time refit. Columns: placebo_period, period, gap, phase ("pre_fake" for periods before the placebo date, "post_fake" for the held-out window from it on). These per-period paths are panel-derived and are NOT retained after pickling.
- Raises:
ValueError – If in_time_placebo() has not been run, or if the gap paths were dropped on pickling (re-fit and re-run to recompute them).
- __init__(att, se, t_stat, p_value, conf_int, n_obs, n_donors, n_pre_periods, n_post_periods, donor_weights, v_weights, predictor_balance, gap_path, pre_rmspe, treated_unit, pre_periods, post_periods, v_method, standardize, alpha=0.05, mspe_v=None, v_cv_t0=None, survey_metadata=None, placebo_p_value=nan, rmspe_ratio=nan, n_placebos=0, n_failed=0)
- Parameters:
att (float)
se (float)
t_stat (float)
p_value (float)
n_obs (int)
n_donors (int)
n_pre_periods (int)
n_post_periods (int)
predictor_balance (DataFrame)
pre_rmspe (float)
treated_unit (Any)
v_method (str)
standardize (str)
alpha (float)
mspe_v (float | None)
v_cv_t0 (int | None)
survey_metadata (Any | None)
placebo_p_value (float)
rmspe_ratio (float)
n_placebos (int)
n_failed (int)
- Return type:
None
Convenience Function
- diff_diff.synthetic_control(data, outcome, treatment, unit, time, **kwargs)[source]
Convenience function for classic synthetic control estimation.
Constructor-only keyword arguments (v_method, one of "nested" / "custom" / "cv" / "inverse_variance"; custom_v, v_cv_t0, n_starts, standardize, alpha, seed, optimizer_options, inner_max_iter, inner_min_decrease) and fit keyword arguments (post_periods, treated_unit, predictors, special_predictors, …) may both be passed via **kwargs.
Examples
>>> from diff_diff import synthetic_control
>>> res = synthetic_control(data, "y", "treated", "unit", "year",
...                         predictors=["x1", "x2"])
>>> print(f"ATT: {res.att:.3f}, pre-RMSPE: {res.pre_rmspe:.3f}")
Predictors and V selection
Predictor rows of X1 (treated) / X0 (donors) are built, in this canonical row
order (the ordering matches R Synth::dataprep), from:
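| Argument | Meaning |
|---|---|
| predictors | Columns averaged over a pre-period window (default: all pre periods). |
| special_predictors | Per-variable special predictors, each averaged over its own periods with its own operator. |
| pre_period_outcomes | Individual pre-period outcomes as predictor rows ("all" = every pre period). |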
v_method="nested" selects the diagonal predictor-importance matrix V by minimizing
the pre-period outcome MSPE of W*(V) over a multistart Nelder-Mead search with a
derivative-free Powell polish. v_method="cv" selects V by out-of-sample
cross-validation (Abadie-Diamond-Hainmueller 2015; Abadie 2021): the pre-period is split
at v_cv_t0 (default len(pre)//2, i.e. t0 = T0/2) into a training and a validation
window; V is chosen to minimize the validation-window outcome MSPE of the training-fit
weights, then the final weights are re-estimated on the validation-window predictors. Each
predictor is re-aggregated over each window (a separate dataprep per window, as
ADH 2015’s CV does), so it must span both windows — the default per-period outcome lags
(single-period) are rejected; pass spanning covariate / multi-period special_predictors
(see docs/methodology/REGISTRY.md §SyntheticControl).
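The positional split behind v_method="cv" is easy to state in code. The helper below is a hypothetical sketch of the documented rule (training = pre[:v_cv_t0], validation = pre[v_cv_t0:], default len(pre) // 2, at least one period on each side); the function name and error message are not part of diff_diff.

```python
def split_pre_periods(pre_periods, v_cv_t0=None):
    """Split pre-periods into (training, validation) windows by position."""
    pre = list(pre_periods)
    if v_cv_t0 is None:
        v_cv_t0 = len(pre) // 2                  # default t0 = T0 / 2
    if not 1 <= v_cv_t0 <= len(pre) - 1:
        raise ValueError("v_cv_t0 must leave at least 1 training and 1 validation period")
    return pre[:v_cv_t0], pre[v_cv_t0:]

train, valid = split_pre_periods(range(1960, 1970))
print(train, valid)
```

With ten pre-periods the default split puts 1960 to 1964 in the training window and 1965 to 1969 in the validation window, which shows why a single-period predictor cannot be re-aggregated over both windows.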
v_method="inverse_variance" uses the closed-form v_h = 1/Var(X_h) (variance over
donors+treated; no search), applied to the raw predictors — it intentionally bypasses
standardize (inverse-variance weighting is the unit-variance rescaling). v_method="custom" takes a user-supplied custom_v
(one entry per predictor row, trace-normalized) and skips the outer search. v_cv_t0
must be None unless v_method="cv".
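The claim that inverse-variance weighting on raw predictors is the same objective as uniform V on standardized predictors can be verified numerically. The sketch below uses random toy data and arbitrary uniform donor weights; the ddof=1 variance for the inverse-variance path is an assumption carried over from the standardization convention, and trace-normalizing V would only rescale both objectives by a constant.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical predictor matrix: 3 predictor rows x 6 units, on very different scales.
X = rng.normal(size=(3, 6)) * np.array([[1.0], [10.0], [100.0]])
x1, X0 = X[:, 0], X[:, 1:]           # column 0 = treated, columns 1..5 = donors
w = np.full(5, 0.2)                  # any weights on the simplex work for this check

# Inverse-variance V applied to RAW predictors: sum_h diff_h^2 / Var_h.
var = X.var(axis=1, ddof=1)          # variance over donors + treated
diff = x1 - X0 @ w
obj_inverse_variance = np.sum(diff ** 2 / var)

# Uniform V applied to predictors standardized by their row SD.
sd = X.std(axis=1, ddof=1)
diff_std = x1 / sd - (X0 / sd[:, None]) @ w
obj_uniform_on_std = np.sum(diff_std ** 2)

# Applying 1/Var on top of standardized rows would double-rescale to diff_h^2 / Var_h^2.
obj_double = np.sum(diff_std ** 2 / var)

print(obj_inverse_variance, obj_uniform_on_std, obj_double)
```

The first two objectives agree to floating-point precision, while the third differs, which is the double-rescaling that bypassing standardize avoids.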
Note
The predictor standardization (per-row SD over donors+treated, ddof=1) and the
optimizer are pinned from the R Synth source — they are not specified in
Abadie-Diamond-Hainmueller (2010). The outer objective uses all pre periods rather than
R’s time.optimize.ssr window, so the nested V differs from R by an
efficiency-only choice. Predictor/outcome aggregation also fails closed on any
non-finite cell, whereas R dataprep uses na.rm=TRUE — restrict
predictor_window / special_predictors periods to where a variable is observed.
Predictor rows support only equal-weight linear combinations (mean, sum,
per-period lags); ADH (2010) §2.3’s general weighted form Σ_s k_s Y_is with
arbitrary k_s (and non-linear ops such as median) is not accepted in this
release. See docs/methodology/REGISTRY.md §SyntheticControl for all deviation labels.
Example Usage
Basic usage with covariate and special predictors:
from diff_diff import SyntheticControl

scm = SyntheticControl(v_method="nested", seed=0)
results = scm.fit(
    data,
    outcome="gdpcap",
    treatment="treated",  # absorbing 0/1 indicator
    unit="region",
    time="year",
    predictors=["invest", "school.high"],
    # Set predictor_window explicitly when a covariate is only observed on a
    # subset of the pre periods: the default averages over ALL pre periods and
    # fails closed if any selected cell is non-finite.
    predictor_window=[1964, 1965, 1966, 1967, 1968, 1969],
    special_predictors=[("gdpcap", [1960, 1965, 1969], "mean")],
)
results.print_summary()

# Effect path and donor weights
gap_df = results.get_gap_df()          # period, gap, phase
weights_df = results.get_weights_df()  # unit, weight (descending)
Quick estimation with the convenience function:
from diff_diff import synthetic_control

results = synthetic_control(
    data, outcome="gdpcap", treatment="treated",
    unit="region", time="year",
)
print(f"ATT: {results.att:.3f}, pre-RMSPE: {results.pre_rmspe:.3f}")
In-space placebo permutation inference (opt-in; refits one synthetic control per donor):
placebo_df = results.in_space_placebo()  # reassigns treatment to each donor
print(f"placebo p-value: {results.placebo_p_value:.3f} "
      f"(n_placebos={results.n_placebos})")  # p = rank/(n_placebos+1)
print(placebo_df)  # per-unit RMSPE-ratio table used for the permutation rank
Supplying a fixed predictor-importance matrix (skips the outer V search):
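import numpy as np
from diff_diff import SyntheticControl

# custom_v needs one entry per predictor row; a single covariate gives one row.
n_predictors = 1
scm = SyntheticControl(v_method="custom", custom_v=np.ones(n_predictors))
results = scm.fit(data, outcome="gdpcap", treatment="treated",
                  unit="region", time="year", predictors=["invest"])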
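The constraints on custom_v (finite, non-negative, trace-normalized internally) can be mirrored in a small pre-check before constructing the estimator. The function below is a hypothetical helper, not a diff_diff API; it only illustrates what trace normalization does to a user-supplied diagonal.

```python
import numpy as np

def normalize_v(custom_v):
    """Validate a diagonal of V and rescale it to sum to 1 (trace normalization)."""
    v = np.asarray(custom_v, dtype=float)
    if v.ndim != 1:
        raise ValueError("custom_v must be one-dimensional (the diagonal of V)")
    if not np.all(np.isfinite(v)) or np.any(v < 0):
        raise ValueError("custom_v must be finite and non-negative")
    total = v.sum()
    if total <= 0:
        raise ValueError("custom_v must have at least one positive entry")
    return v / total

print(normalize_v([2.0, 1.0, 1.0]))
```

Only the relative sizes of the entries matter: [2, 1, 1] and [0.5, 0.25, 0.25] describe the same predictor importance.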
Comparison with Synthetic DiD
| Feature | SyntheticControl | SyntheticDiD |
|---|---|---|
| Unit (donor) weights | Simplex, predictor-importance weighted | Simplex, ridge-regularized |
| Time weights | None (level matching) | Simplex (double difference) |
| Predictor-importance | Nested / cv / inverse-variance / custom diagonal | No analog |
| Inference | Placebo permutation (no analytical SE) | Bootstrap / jackknife / placebo |
Use SCM for a single treated unit with a long pre-period and a curated donor pool; use SDID when you have several treated units and parallel trends is plausible.