diff_diff.CallawaySantAnna

class diff_diff.CallawaySantAnna

Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin

Callaway-Sant’Anna (2021) estimator for staggered Difference-in-Differences.

This estimator handles DiD designs with variation in treatment timing (staggered adoption) and heterogeneous treatment effects. It avoids the bias of traditional two-way fixed effects (TWFE) estimators by:

- Computing group-time average treatment effects ATT(g,t) for each cohort g (units first treated in period g) and time t.
- Aggregating these to summary measures (overall ATT, event study, etc.) using appropriate weights.
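
The aggregation step can be illustrated with a minimal sketch. It assumes cohort-size weights and groups ATT(g,t) by event time e = t - g, in the spirit of the paper's dynamic aggregation. The function name `event_study` and both input dictionaries are hypothetical and are not part of this library's API; the library's own weighting may differ in detail.

```python
from collections import defaultdict


def event_study(att_gt, cohort_sizes):
    """Aggregate ATT(g, t) to event time e = t - g using cohort-size weights.

    att_gt:       dict mapping (g, t) -> ATT estimate (hypothetical input)
    cohort_sizes: dict mapping g -> number of units first treated in g
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for (g, t), att in att_gt.items():
        e = t - g  # periods since first treatment
        weighted_sum[e] += cohort_sizes[g] * att
        weight_total[e] += cohort_sizes[g]
    return {e: weighted_sum[e] / weight_total[e] for e in sorted(weighted_sum)}


# Two cohorts: 30 units first treated in period 2, 10 units in period 3.
att_gt = {(2, 2): 1.0, (2, 3): 2.0, (3, 3): 3.0}
cohort_sizes = {2: 30, 3: 10}
theta = event_study(att_gt, cohort_sizes)
```

Here `theta[0]` pools both cohorts at their first treated period, while `theta[1]` uses only cohort 2, the only cohort observed one period after treatment.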

Parameters:

control_group (str, default="never_treated") –
Which units to use as controls:
- “never_treated”: Use only never-treated units (recommended)
- “not_yet_treated”: Use never-treated and not-yet-treated units
anticipation (int, default=0) – Number of periods before treatment where effects may occur. Set to > 0 if treatment effects can begin before the official treatment date.
estimation_method (str, default="dr") –
Estimation method:
- “dr”: Doubly robust (recommended)
- “ipw”: Inverse probability weighting
- “reg”: Outcome regression
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. When set, the influence-function aggregator clusters at the named level via a synthesized SurveyDesign(psu=cluster_col) threaded through the existing PSU-meat machinery (_compute_stratified_psu_meat) and PSU-level multiplier bootstrap. When None (default), the aggregator uses per-unit IF variance (Williams 2000 form). When survey_design=SurveyDesign(psu=...) is also provided, the explicit PSU takes precedence; a UserWarning fires if the bare cluster= partition differs from the explicit PSU partition.
vcov_type (str, default="hc1") – Variance family. CallawaySantAnna accepts {"hc1"} only — hc1 means per-unit IF variance when cluster=None and CR1 Liang-Zeger on the IF when cluster=X is set. The analytical-sandwich families (classical, hc2, hc2_bm) and spatial-HAC (conley) are rejected at __init__ because CS’s per-(g,t) doubly-robust / IPW / outcome-regression structure has no single design matrix to compute hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF on. See REGISTRY.md “IF-based variance estimators vs analytical-sandwich estimators” for the structural taxonomy.
n_bootstrap (int, default=0) –
Number of bootstrap iterations for inference. If 0, uses analytical standard errors. Recommended: 999 or more for reliable inference.

Note

Memory usage: bootstrap multiplier weights are generated and consumed one draw-block at a time (see diff_diff.bootstrap_chunking), so the full (n_bootstrap, n_units) weight matrix is never materialized. The live weight intermediate is bounded by roughly max(~256 MB, 8 * n_units) bytes, independent of n_bootstrap (a block holds at least one full draw row). Only the small bootstrap output arrays, (n_bootstrap, n_group_time) and (n_bootstrap,) per aggregation, stay fully in memory. Stratified survey designs are the current exception: the full PSU-weight matrix is built up front, but PSUs are few.
bootstrap_weights (str, default="rademacher") –
Type of weights for multiplier bootstrap:
- “rademacher”: +1/-1 with equal probability (standard choice)
- “mammen”: Two-point distribution (asymptotically valid, matches skewness)
- “webb”: Six-point distribution (recommended when n_clusters < 20)
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") –
Action when design matrix is rank-deficient (linearly dependent columns):
- “warn”: Issue warning and drop linearly dependent columns (default)
- “error”: Raise ValueError
- “silent”: Drop columns silently without warning
base_period (str, default="varying") –
Method for selecting the base (reference) period for computing ATT(g,t). Base periods are selected positionally (by the nearest observed period in the sorted panel), matching R did::att_gt – so on gapped (non-consecutive) grids the base is the nearest observed period, not literal t-1 / g-1. The pre/post split is on the current period vs the cohort (t < g -> pre), independent of anticipation; anticipation only shifts the post/universal base. Options:
- “varying”: pre-treatment (t < g) uses the immediately preceding observed period as base; post-treatment uses the last observed pre-treatment period (largest observed p with p + anticipation < g).
- “universal”: always uses that last observed pre-treatment period as base.
On consecutive grids these reduce to t-1 / g-1-anticipation. Both produce identical post-treatment effects. Matches R’s did::att_gt() on gapped panels (base selection, estimable ATT/SE cells, the "universal" zero reference cells, and all aggregations). See _select_base_period().
cband (bool, default=True) – Whether to compute simultaneous confidence bands (sup-t) for event study aggregation. Requires n_bootstrap > 0. When True, results include cband_crit_value and per-event-time cband_conf_int entries controlling family-wise error rate.
pscore_trim (float, default=0.01) – Trimming bound for propensity scores. Scores are clipped to [pscore_trim, 1 - pscore_trim] before weight computation in IPW and DR estimation. Must be in (0, 0.5).
panel (bool, default=True) – Whether the data is a balanced/unbalanced panel (units observed across multiple time periods). Set to False for stationary repeated cross-sections where each observation has a unique unit ID and units do not repeat across periods. Requires that the cross-sectional samples are drawn from the same population in each period (stationarity). Uses cross-sectional DRDID (Sant’Anna & Zhao 2020, Section 4) with per-observation influence functions.
allow_unbalanced_panel (bool, default=False) – When True and the input panel is unbalanced (some units are not observed in every period), route the pooled observations through the repeated-cross-section levels estimator (matching R did::att_gt(allow_unbalanced_panel=TRUE) / DRDID::reg_did_rc) instead of within-cell panel differencing, and cluster the influence function by unit for the standard error. Inert on a balanced panel (results are byte-identical to the default). When False (default) an unbalanced panel is handled by within-cell differencing and a UserWarning is emitted. ATT matches R bit-for-bit; the SE matches up to the documented CR1 sqrt(G/(G-1)) finite-sample factor. survey_design= combined with this flag raises NotImplementedError.
epv_threshold (float, default=10) – Events Per Variable threshold for propensity score logit. When the ratio of minority-class observations to predictor variables (excluding intercept) falls below this value, a warning is emitted (or ValueError raised if rank_deficient_action="error"). Based on Peduzzi et al. (1996). Only applies to IPW and DR estimation methods. Use diagnose_propensity() for a pre-estimation check across all cohorts.
pscore_fallback (str, default="error") –
Action when propensity score estimation fails entirely (LinAlgError or ValueError from IRLS):
- “error”: Raise the exception (default). Ensures the user is aware of estimation failures.
- “unconditional”: Fall back to unconditional propensity with a warning. For IPW, this drops all covariates. For DR, the propensity model becomes unconditional but outcome regression still uses covariates.
When rank_deficient_action="error", errors are always re-raised regardless of this setting.
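
The three bootstrap_weights options above correspond to well-known multiplier distributions, each with mean 0 and variance 1. The sketch below writes out their textbook definitions using only the standard library. The helper names `weight_support`, `draw_weights`, and `moments` are hypothetical, and the sketch does not claim to reproduce how this library generates its weights internally.

```python
import math
import random


def weight_support(kind):
    """Return (values, probabilities) for one multiplier-weight distribution."""
    if kind == "rademacher":
        return (-1.0, 1.0), (0.5, 0.5)
    if kind == "mammen":
        s5 = math.sqrt(5.0)
        values = (-(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0)
        probs = ((s5 + 1.0) / (2.0 * s5), (s5 - 1.0) / (2.0 * s5))
        return values, probs
    if kind == "webb":
        a, b = math.sqrt(1.5), math.sqrt(0.5)
        return (-a, -1.0, -b, b, 1.0, a), (1.0 / 6.0,) * 6
    raise ValueError(f"unknown weight type: {kind!r}")


def draw_weights(kind, n, seed=None):
    """Draw n i.i.d. multiplier weights (one per unit or cluster)."""
    values, probs = weight_support(kind)
    rng = random.Random(seed)
    return rng.choices(values, weights=probs, k=n)


def moments(kind):
    """Exact mean and second moment implied by the support and probabilities."""
    values, probs = weight_support(kind)
    mean = sum(v * p for v, p in zip(values, probs))
    second = sum(v * v * p for v, p in zip(values, probs))
    return mean, second


webb_draws = draw_weights("webb", 5, seed=0)
```

With few clusters, a two-point distribution can only produce a small number of distinct bootstrap draws; the six-point Webb support enlarges that set, which is the usual motivation for preferring it in that case.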

results_

Estimation results after calling fit().

Type: CallawaySantAnnaResults

is_fitted_

Whether the model has been fitted.

Type: bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import CallawaySantAnna
>>>
>>> # Panel data with staggered treatment
>>> # 'first_treat' = period when unit was first treated (0 if never treated)
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated, else first treatment period
... })
>>>
>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>>
>>> results.print_summary()

With event study aggregation:

>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  aggregate='event_study')
>>>
>>> # Plot event study
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

With covariate adjustment (conditional parallel trends):

>>> # When parallel trends only holds conditional on covariates
>>> cs = CallawaySantAnna(estimation_method='dr')  # doubly robust
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
>>>
>>> # DR is recommended: consistent if either outcome model
>>> # or propensity model is correctly specified

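The examples above leave the data columns elided. As a hedged illustration of the expected long format (one row per unit and period, with first_treat equal to 0 for never-treated units), the sketch below builds a small synthetic staggered panel using only the standard library. The function `make_staggered_panel`, the cohort choices, and the outcome model are all hypothetical and chosen purely for illustration.

```python
import random


def make_staggered_panel(n_units=6, periods=(1, 2, 3, 4), seed=0):
    """Build long-format rows with columns unit, time, outcome, first_treat."""
    rng = random.Random(seed)
    cohorts = [0, 3, 4]  # 0 = never treated; 3 and 4 = first treated period
    rows = []
    for unit in range(n_units):
        first_treat = cohorts[unit % len(cohorts)]
        for t in periods:
            treated = first_treat != 0 and t >= first_treat
            # Common linear trend plus a constant effect of 2.0 once treated.
            outcome = 0.5 * t + (2.0 if treated else 0.0) + rng.gauss(0.0, 1.0)
            rows.append(
                {"unit": unit, "time": t, "outcome": outcome, "first_treat": first_treat}
            )
    return rows


rows = make_staggered_panel()
```

Passing `rows` to `pd.DataFrame(rows)` gives a frame with the same column names used in the basic usage example.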
Notes

The key innovation of Callaway & Sant’Anna (2021) is the disaggregated approach: instead of estimating a single treatment effect, they estimate ATT(g,t) for each cohort-time pair. This avoids the “forbidden comparison” problem where already-treated units act as controls.

The ATT(g,t) is identified under parallel trends conditional on covariates. Suppressing the conditioning on covariates X for readability, the assumption is:

E[Y(0)_t - Y(0)_{g-1} | G=g] = E[Y(0)_t - Y(0)_{g-1} | C=1]

where G=g indicates treatment cohort g and C=1 indicates control units. This uses g-1 as the base period, which applies to post-treatment periods (t >= g). With base_period="varying" (default), pre-treatment periods use the immediately preceding observed period as the base, giving the consecutive comparisons useful in parallel trends diagnostics. Base periods are selected positionally (nearest observed period), matching R did::att_gt on gapped grids (see _select_base_period).
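
For intuition, the identification result implies that a post-treatment ATT(g,t) can be estimated as a difference of mean long differences. The sketch below is a simplified, unconditional version under stated assumptions: never-treated controls, no covariates, no anticipation, and a consecutive time grid so that the base period is literally g-1. The function `att_gt_2x2` and its inputs are hypothetical; it is not the doubly robust estimator this class uses by default.

```python
from statistics import mean


def att_gt_2x2(y, cohort, g, t):
    """Unconditional ATT(g, t) for t >= g with never-treated controls.

    y:      dict mapping unit -> {period: outcome}
    cohort: dict mapping unit -> first treatment period (0 = never treated)
    """
    base = g - 1  # base period on a consecutive grid, no anticipation
    treated_diff = [y[u][t] - y[u][base] for u in y if cohort[u] == g]
    control_diff = [y[u][t] - y[u][base] for u in y if cohort[u] == 0]
    return mean(treated_diff) - mean(control_diff)


# Units a and b are first treated in period 2; c and d are never treated.
y = {
    "a": {1: 1.0, 2: 4.0},
    "b": {1: 2.0, 2: 5.0},
    "c": {1: 0.0, 2: 1.0},
    "d": {1: 1.0, 2: 2.0},
}
cohort = {"a": 2, "b": 2, "c": 0, "d": 0}
att_22 = att_gt_2x2(y, cohort, g=2, t=2)
```

Treated units change by 3.0 on average and controls by 1.0, so the sketch attributes the remaining 2.0 to treatment.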

References

Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

Methods

`__init__`([control_group, anticipation, ...])
`diagnose_propensity`(df, outcome, unit, time, ...)	Check Events Per Variable (EPV) across all cohorts without estimation.
`fit`(data, outcome, unit, time, first_treat)	Fit the Callaway-Sant'Anna estimator.
`get_params`()	Get estimator parameters (sklearn-compatible).
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
`summary`()	Get summary of estimation results.

Attributes

`n_bootstrap`
`bootstrap_weights`
`alpha`
`seed`
`anticipation`
`base_period`

__init__(control_group='never_treated', anticipation=0, estimation_method='dr', alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights=None, seed=None, rank_deficient_action='warn', base_period='varying', cband=True, pscore_trim=0.01, panel=True, allow_unbalanced_panel=False, epv_threshold=10, pscore_fallback='error', vcov_type='hc1')

Parameters:

control_group (str)
anticipation (int)
estimation_method (str)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str | None)
seed (int | None)
rank_deficient_action (str)
base_period (str)
cband (bool)
pscore_trim (float)
panel (bool)
allow_unbalanced_panel (bool)
epv_threshold (float)
pscore_fallback (str)
vcov_type (str)

classmethod __new__(*args, **kwargs)