diff_diff.SunAbraham
- class diff_diff.SunAbraham
Bases: object
Sun-Abraham (2021) interaction-weighted estimator for staggered DiD.
This estimator provides event-study coefficients using a saturated TWFE regression with cohort × relative-time interactions, following the methodology in Sun & Abraham (2021).
The estimation procedure follows three steps:
1. Run a saturated TWFE regression with cohort × relative-time dummies.
2. Compute cohort shares (weights) at each relative time.
3. Aggregate cohort-specific effects using interaction weights.
This avoids the negative weighting problem of standard TWFE and provides consistent event-study estimates under treatment effect heterogeneity.
- Parameters:
control_group (str, default="never_treated") – Which units to use as controls:
  - "never_treated": Use only never-treated units (recommended)
  - "not_yet_treated": Use never-treated and not-yet-treated units
anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, standard errors are clustered at the unit level by default, unless vcov_type is explicitly set to "hc2" or "classical", in which case the unit auto-cluster is dropped (both are one-way families, and the linalg validator rejects them with cluster_ids). Use vcov_type="hc1" (default) or vcov_type="hc2_bm" for cluster-robust inference; the latter routes to CR2 Bell-McCaffrey at the cluster level.
n_bootstrap (int, default=0) – Number of bootstrap iterations for inference. If 0, uses analytical cluster-robust standard errors.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):
  - "warn": Issue a warning and drop linearly dependent columns (default)
  - "error": Raise ValueError
  - "silent": Drop columns silently without warning
vcov_type ({"classical", "hc1", "hc2", "hc2_bm", "conley"}, default "hc1") –
Variance-covariance family for analytical inference. Defaults to "hc1" (preserves prior behavior bit-equally; SA historically hard-coded HC1).
"conley" (Conley 1999 spatial-HAC) threads the conley_* params through the within-transform saturated regression (conley_lag_cutoff=0 = within-period spatial only; conley_lag_cutoff>0 adds the within-unit Bartlett serial term; note that conley_time/conley_unit are always supplied, so this is the panel-aware path, not pooled cross-sectional); the unit auto-cluster is dropped (an explicit cluster= enables the spatial+cluster product kernel) and survey_design=/weights/n_bootstrap>0 are rejected.
  - "classical": homoskedastic OLS standard errors. One-way only (the linalg validator rejects classical + cluster_ids); the unit auto-cluster is dropped when classical is explicitly opted into.
  - "hc1": Eicker-Huber-White HC1 finite-sample correction (default; cluster-robust when cluster= is set or the unit auto-cluster fires).
  - "hc2": Eicker-Huber-White HC2 leverage correction. One-way only; the linalg validator rejects combining hc2 with clusters. The unit auto-cluster is dropped when hc2 is explicitly opted into.
  - "hc2_bm": HC2 + Bell-McCaffrey CR2 Satterthwaite DOF for cluster-robust inference. Routes to CR2-BM at the cluster level; preserves the auto-cluster default.
When vcov_type ∈ {"classical", "hc2", "hc2_bm"}, the saturated regression switches from the within-transform path to a full-dummy [intercept + interactions + covariates + unit_dummies + time_dummies] build. For hc2 and hc2_bm, the Frisch-Waugh-Lovell theorem preserves coefficients but NOT the hat matrix, so HC2 leverage and BM Satterthwaite DOF must be computed on the full FE projection. classical also routes through full-dummy so that the (n-k) finite-sample correction in s² × (X'X)^{-1} matches R's lm() interpretation. Empirically matches lm(...) + sandwich::vcovHC(type="HC2") and clubSandwich::vcovCR(..., type="CR2") at atol=1e-10. "hc1" keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix); empirically close to fixest::sunab(cluster=~unit). See REGISTRY.md for the documented HC1 finite-sample-correction deviation.
Survey designs (survey_design=) are rejected for vcov_type ∈ {"classical", "hc2", "hc2_bm"} because the survey-design Taylor Series Linearization (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use vcov_type="hc1" (default) for survey designs.
conley (Conley 1999 spatial-HAC) is threaded through the within-transform saturated regression (pass conley_coords/conley_cutoff_km/conley_lag_cutoff); survey_design=/weights/n_bootstrap>0 are rejected. See the vcov_type parameter docs above.
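The remark that Frisch-Waugh-Lovell preserves coefficients but not the hat matrix can be checked numerically. The following is a minimal NumPy sketch, not diff_diff's implementation: it assumes a single regressor, one-way unit fixed effects (rather than the two-way case described above), and a simulated balanced panel. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 4, 5
unit = np.repeat(np.arange(n_units), n_periods)   # unit id for each row
x = rng.normal(size=n_units * n_periods)
y = 2.0 * x + unit + rng.normal(size=x.size)      # slope 2, unit effects, noise

# Full-dummy path: the regressor plus one dummy column per unit.
D = (unit[:, None] == np.arange(n_units)).astype(float)
X_full = np.column_stack([x, D])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0][0]
h_full = np.diag(X_full @ np.linalg.pinv(X_full))  # leverage from the full projection


def demean(v):
    """Subtract unit means (the within transform) on a balanced panel."""
    means = np.bincount(unit, weights=v) / n_periods
    return v - means[unit]


# Within-transform path: demean both sides, then regress without dummies.
x_w, y_w = demean(x), demean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)
h_within = x_w**2 / (x_w @ x_w)                    # leverage from the demeaned design only

print("slopes agree:  ", np.isclose(beta_full, beta_within))
print("leverage agree:", np.allclose(h_full, h_within))
```

In this balanced one-way setup the two leverage vectors differ by exactly 1/n_periods, the contribution of the unit-dummy projection. HC2-style corrections divide by (1 - leverage), which is why they would need the full-dummy leverage.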
- results_
Estimation results after calling fit().
- Type:
Examples
Basic usage:
>>> import pandas as pd
>>> from diff_diff import SunAbraham
>>>
>>> # Panel data with staggered treatment
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated
... })
>>>
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>> results.print_summary()
With covariates:
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
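The examples above elide the panel contents. As a hedged illustration of the long-format layout they appear to expect (one row per unit and period, with first_treat constant within a unit and 0 for never-treated units), a small simulated panel could be built as follows. The cohort assignment, effect size, and noise level are arbitrary assumptions chosen for the sketch.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical cohort assignment: unit id -> first treated period (0 = never treated).
first_treat_by_unit = {0: 0, 1: 0, 2: 3, 3: 3, 4: 5, 5: 5}
periods = range(1, 7)

records = []
for unit, g in first_treat_by_unit.items():
    for t in periods:
        treated = g > 0 and t >= g  # True once the unit's cohort has started treatment
        records.append({
            "unit": unit,
            "time": t,
            "first_treat": g,
            # unit effect + time trend + constant effect of 2.0 + small noise
            "outcome": 1.0 * unit + 0.5 * t + 2.0 * treated + rng.normal(scale=0.1),
        })

data = pd.DataFrame(records)
print(data.head())
```

A frame shaped like this could then be passed to fit() with outcome='outcome', unit='unit', time='time', and first_treat='first_treat', as in the basic usage example.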
Notes
The Sun-Abraham estimator uses a saturated regression approach:
Y_it = α_i + λ_t + Σ_g Σ_e [δ_{g,e} × 1(G_i=g) × D_{it}^e] + X’γ + ε_it
where:
- α_i = unit fixed effects
- λ_t = time fixed effects
- G_i = unit i's treatment cohort (first treatment period)
- D_{it}^e = indicator for being e periods from treatment
- δ_{g,e} = cohort-specific effect (CATT) at relative time e
The event-study coefficients are then computed as:
β_e = Σ_g w_{g,e} × δ_{g,e}
where w_{g,e} is the share of cohort g in the treated population at relative time e (interaction weights).
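The two equations above can be traced end to end on a noise-free toy panel. The sketch below is an illustration under stated assumptions, not the library's code: it builds the saturated design by hand with NumPy, omits e = -1 as the reference period (an assumption; the reference choice is not stated here), ignores covariates, and takes w_{g,e} as the share of treated units in cohort g among the cohorts observed at relative time e. Cohort sizes and effect sizes are made up.

```python
import numpy as np

# Toy balanced panel; every number here is an illustrative assumption.
periods = list(range(1, 7))
cohort_of_unit = [0, 0, 3, 3, 5, 5, 5]  # first-treatment period; 0 = never treated
slope = {3: 1.0, 5: 3.0}                # true effect of cohort g at e >= 0: slope[g] * (e + 1)

rows = []
for i, g in enumerate(cohort_of_unit):
    for t in periods:
        e = t - g if g > 0 else None    # relative time; undefined for never-treated units
        effect = slope[g] * (e + 1) if g > 0 and e >= 0 else 0.0
        rows.append((i, t, g, e, 0.5 * i + 0.2 * t + effect))  # alpha_i + lambda_t + effect

# Saturated design: unit dummies, time dummies (t = 1 dropped), and one
# indicator per (cohort, relative time) pair, omitting e = -1 as the reference.
pairs = sorted({(g, e) for _, _, g, e, _ in rows if g > 0 and e != -1})
n_units, n_time = len(cohort_of_unit), len(periods) - 1
X = np.zeros((len(rows), n_units + n_time + len(pairs)))
y = np.array([row[4] for row in rows])
for r, (i, t, g, e, _) in enumerate(rows):
    X[r, i] = 1.0
    if t > 1:
        X[r, n_units + t - 2] = 1.0
    if g > 0 and e != -1:
        X[r, n_units + n_time + pairs.index((g, e))] = 1.0

coef = np.linalg.lstsq(X, y, rcond=None)[0]
delta = dict(zip(pairs, coef[n_units + n_time:]))  # delta[(g, e)]: cohort-specific effects

# Interaction weights: share of each cohort among treated units observed at e.
size = {g: cohort_of_unit.count(g) for g in slope}
beta = {}
for e in sorted({e for _, e in pairs}):
    observed = [g for g in size if (g, e) in delta]
    total = sum(size[g] for g in observed)
    beta[e] = sum(size[g] / total * delta[(g, e)] for g in observed)

print({e: round(b, 6) for e, b in beta.items()})
```

With two units in cohort 3 and three in cohort 5, the weights at e = 0 are 0.4 and 0.6, so β_0 = 0.4 × 1 + 0.6 × 3 = 2.2. At e = 2 only cohort 3 is observed, so β_2 equals that cohort's effect of 3.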
Compared to Callaway-Sant'Anna:
- SA uses saturated regression; CS uses 2x2 DiD comparisons
- SA can be more efficient when the model is correctly specified
- Both are consistent under heterogeneous treatment effects
- Running both provides a useful robustness check
References
Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.
Methods
__init__([control_group, anticipation, ...])
fit(data, outcome, unit, time, first_treat) – Fit the Sun-Abraham estimator using saturated regression.
get_params() – Get estimator parameters (sklearn-compatible).
print_summary() – Print summary to stdout.
set_params(**params) – Set estimator parameters (sklearn-compatible).
summary() – Get summary of estimation results.
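For readers unfamiliar with the scikit-learn parameter convention that get_params() and set_params() refer to, the pattern can be sketched with a hypothetical stand-in class. ToyEstimator below is not part of diff_diff. Whether SunAbraham validates names or returns self in the same way is an assumption based on the scikit-learn convention, not something stated on this page.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in illustrating the sklearn-style parameter protocol."""

    def __init__(self, alpha=0.05, n_bootstrap=0, seed=None):
        # By convention, __init__ only stores its arguments under the same names.
        self.alpha = alpha
        self.n_bootstrap = n_bootstrap
        self.seed = seed

    def get_params(self):
        # Read back every constructor argument by name.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


est = ToyEstimator().set_params(alpha=0.10, seed=123)
params = est.get_params()
print(params)
```

Because get_params() returns plain constructor arguments, a configured estimator can be copied with ToyEstimator(**est.get_params()).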
- __init__(control_group='never_treated', anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, seed=None, rank_deficient_action='warn', vcov_type='hc1', conley_coords=None, conley_cutoff_km=None, conley_metric='haversine', conley_kernel='bartlett', conley_lag_cutoff=None)
- Parameters:
- classmethod __new__(*args, **kwargs)