diff_diff.SunAbraham

class diff_diff.SunAbraham

Bases: object

Sun-Abraham (2021) interaction-weighted estimator for staggered DiD.

This estimator provides event-study coefficients using a saturated TWFE regression with cohort × relative-time interactions, following the methodology in Sun & Abraham (2021).

The estimation procedure follows three steps:

1. Run a saturated TWFE regression with cohort × relative-time dummies
2. Compute cohort shares (weights) at each relative time
3. Aggregate cohort-specific effects using interaction weights

This avoids the negative weighting problem of standard TWFE and provides consistent event-study estimates under treatment effect heterogeneity.

Parameters:

control_group (str, default="never_treated") – Which units to use as controls:
- "never_treated": Use only never-treated units (recommended)
- "not_yet_treated": Use never-treated and not-yet-treated units
anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default, unless vcov_type is explicitly set to "hc2" or "classical". In that case the unit auto-cluster is dropped, because both are one-way families and the linalg validator rejects them when cluster_ids are supplied. Use vcov_type="hc1" (default) or vcov_type="hc2_bm" for cluster-robust inference; the latter routes to CR2 Bell-McCaffrey at the cluster level.
n_bootstrap (int, default=0) – Number of bootstrap iterations for inference. If 0, uses analytical cluster-robust standard errors.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):
- "warn": Issue a warning and drop linearly dependent columns (default)
- "error": Raise ValueError
- "silent": Drop columns silently without warning
vcov_type ({"classical", "hc1", "hc2", "hc2_bm", "conley"}, default "hc1") – Variance-covariance family for analytical inference. The default "hc1" preserves prior behavior bit-for-bit (SA historically hard-coded HC1).
- "classical": homoskedastic OLS standard errors. One-way only (the linalg validator rejects classical + cluster_ids); the unit auto-cluster is dropped when classical is explicitly opted into.
- "hc1": Eicker-Huber-White HC1 finite-sample correction (default; cluster-robust when cluster= is set or the unit auto-cluster fires).
- "hc2": Eicker-Huber-White HC2 leverage correction. One-way only; the linalg validator rejects combining hc2 with clusters. The unit auto-cluster is dropped when hc2 is explicitly opted into.
- "hc2_bm": HC2 + Bell-McCaffrey CR2 Satterthwaite DOF for cluster-robust inference. Routes to CR2-BM at the cluster level; preserves the auto-cluster default.
- "conley": Conley (1999) spatial-HAC. Threads the conley_* params through the within-transform saturated regression. conley_lag_cutoff=0 gives the within-period spatial term only; conley_lag_cutoff>0 adds the within-unit Bartlett serial term. Because conley_time / conley_unit are always supplied, this is the panel-aware path, not the pooled cross-sectional one. The unit auto-cluster is dropped (an explicit cluster= enables the spatial+cluster product kernel), and survey_design= / weights / n_bootstrap>0 are rejected.
When vcov_type ∈ {"classical","hc2","hc2_bm"}, the saturated regression switches from the within-transform path to a full-dummy [intercept + interactions + covariates + unit_dummies + time_dummies] build. For hc2 and hc2_bm, the Frisch-Waugh-Lovell theorem preserves coefficients but NOT the hat matrix, so HC2 leverage and BM Satterthwaite DOF must be computed on the full FE projection. classical also routes through full-dummy so the (n-k) finite-sample correction in s² × (X'X)^{-1} matches R’s lm() interpretation. Empirically matches lm(...) + sandwich::vcovHC(type="HC2") and clubSandwich::vcovCR(..., type="CR2") at atol=1e-10.

"hc1" keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix); empirically close to fixest::sunab(cluster=~unit). See REGISTRY.md for the documented HC1 finite-sample-correction deviation.

Survey designs (survey_design=) are rejected for vcov_type ∈ {"classical","hc2","hc2_bm"} because the survey-design Taylor Series Linearization (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use vcov_type="hc1" (default) for survey designs.

conley (Conley-1999 spatial-HAC) is threaded through the within-transform saturated regression (pass conley_coords / conley_cutoff_km / conley_lag_cutoff); survey_design= / weights / n_bootstrap>0 are rejected. See the vcov_type parameter docs above.
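
The vcov_type notes above say that the Frisch-Waugh-Lovell theorem preserves coefficients but not the hat matrix. The following is a minimal NumPy sketch of that point, not the library's implementation: it uses made-up data and unit fixed effects only (no time effects or interactions), and every name in it is illustrative.

```python
import numpy as np

# Illustrative balanced panel: 5 units observed for 4 periods (made-up data).
rng = np.random.default_rng(0)
n_units, n_periods = 5, 4
unit = np.repeat(np.arange(n_units), n_periods)
x = rng.normal(size=n_units * n_periods)
y = 2.0 * x + unit + rng.normal(size=x.size)

# Full-dummy design: the regressor plus one dummy per unit.
D = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
X_full = np.column_stack([x, D])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0][0]
h_full = np.diag(X_full @ np.linalg.pinv(X_full))  # leverage, full projection


def demean(v):
    """Subtract unit means (the within transform)."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]


# Within-transform regression on the demeaned variables.
x_w, y_w = demean(x), demean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)
h_within = x_w**2 / (x_w @ x_w)  # leverage from the demeaned design only

print(np.isclose(beta_full, beta_within))  # True: same slope coefficient
print(np.allclose(h_full, h_within))       # False: leverages differ
```

In this balanced example the two leverage vectors differ by exactly 1 / n_periods, the contribution of the unit-dummy projection that the within transform discards. An HC2-type correction built from h_within would therefore understate leverage.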

results_

Estimation results after calling fit().

Type: SunAbrahamResults

is_fitted_

Whether the model has been fitted.

Type: bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import SunAbraham
>>>
>>> # Panel data with staggered treatment
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated
... })
>>>
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>> results.print_summary()

With covariates:

>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])

Notes

The Sun-Abraham estimator uses a saturated regression approach:

Y_it = α_i + λ_t + Σ_g Σ_e [δ_{g,e} × 1(G_i=g) × D_{it}^e] + X’γ + ε_it

where:
- α_i = unit fixed effects
- λ_t = time fixed effects
- G_i = unit i’s treatment cohort (first treatment period)
- D_{it}^e = indicator for being e periods from treatment
- δ_{g,e} = cohort-specific effect (CATT) at relative time e
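
As a rough illustration of the interaction terms 1(G_i=g) × D_{it}^e, the sketch below builds one indicator column per (cohort, relative time) pair with pandas. It is not the library's internal code. The toy panel, the column naming scheme, and the choice of e = -1 as the omitted reference period are all assumptions made for the example.

```python
import pandas as pd

# Toy panel (made-up): units 1 and 2 are treated in periods 2 and 3,
# unit 3 is never treated (first_treat == 0).
panel = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "first_treat": [2, 2, 2, 3, 3, 3, 0, 0, 0],
})

treated = panel["first_treat"] > 0
# Relative time e = t - G_i; left missing for never-treated units.
panel["rel_time"] = (panel["time"] - panel["first_treat"]).where(treated)

# One column per (cohort g, relative time e); e = -1 is skipped as the
# reference period (an assumption of this sketch).
cols = {}
for (g, e), _ in panel[treated].groupby(["first_treat", "rel_time"]):
    if e == -1:
        continue
    indicator = (panel["first_treat"] == g) & (panel["rel_time"] == e)
    cols[f"g{g}_e{int(e)}"] = indicator.astype(int)

interactions = pd.DataFrame(cols)
print(list(interactions.columns))
```

Never-treated rows are zero in every column, so they identify the fixed effects without contributing to any δ_{g,e}.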

The event-study coefficients are then computed as:

β_e = Σ_g w_{g,e} × δ_{g,e}

where w_{g,e} is the share of cohort g in the treated population at relative time e (interaction weights).
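
The aggregation formula can be checked by hand on a small example. The sketch below uses hypothetical cohort-specific estimates and cohort sizes (not output from this library) to compute the weights w_{g,e} and the resulting β_e.

```python
import pandas as pd

# Hypothetical cohort-specific estimates δ_{g,e} and cohort sizes.
catt = pd.DataFrame({
    "cohort":   [2, 3, 2, 3],
    "rel_time": [0, 0, 1, 1],
    "delta":    [1.0, 2.0, 1.5, 3.0],
    "n_units":  [30, 10, 30, 10],
})

# w_{g,e}: cohort g's share among treated units observed at relative time e.
totals = catt.groupby("rel_time")["n_units"].transform("sum")
catt["weight"] = catt["n_units"] / totals

# β_e = Σ_g w_{g,e} × δ_{g,e}
beta = (catt["weight"] * catt["delta"]).groupby(catt["rel_time"]).sum()
print(beta)
```

With shares of 0.75 and 0.25, this gives β_0 = 0.75 × 1.0 + 0.25 × 2.0 = 1.25 and β_1 = 0.75 × 1.5 + 0.25 × 3.0 = 1.875. The weights are non-negative and sum to one at each relative time.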

Compared to Callaway-Sant’Anna:
- SA uses saturated regression; CS uses 2x2 DiD comparisons
- SA can be more efficient when the model is correctly specified
- Both are consistent under heterogeneous treatment effects
- Running both provides a useful robustness check

References

Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.

Methods

`__init__`([control_group, anticipation, ...])
`fit`(data, outcome, unit, time, first_treat)	Fit the Sun-Abraham estimator using saturated regression.
`get_params`()	Get estimator parameters (sklearn-compatible).
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
`summary`()	Get summary of estimation results.

__init__(control_group='never_treated', anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, seed=None, rank_deficient_action='warn', vcov_type='hc1', conley_coords=None, conley_cutoff_km=None, conley_metric='haversine', conley_kernel='bartlett', conley_lag_cutoff=None)

Parameters:

control_group (str)
anticipation (int)
alpha (float)
cluster (str | None)
n_bootstrap (int)
seed (int | None)
rank_deficient_action (str)
vcov_type (str)
conley_coords (Tuple[str, str] | None)
conley_cutoff_km (float | None)
conley_metric (str)
conley_kernel (str)
conley_lag_cutoff (int | None)

classmethod __new__(*args, **kwargs)