diff_diff.CallawaySantAnna

class diff_diff.CallawaySantAnna[source]

Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin

Callaway-Sant’Anna (2021) estimator for staggered Difference-in-Differences.

This estimator handles DiD designs with variation in treatment timing (staggered adoption) and heterogeneous treatment effects. It avoids the bias of traditional two-way fixed effects (TWFE) estimators by:

1. Computing group-time average treatment effects ATT(g,t) for each cohort g (units first treated in period g) and time t.
2. Aggregating these to summary measures (overall ATT, event study, etc.) using appropriate weights.
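
To make the first step concrete, a single ATT(g,t) without covariates reduces to a two-group, two-period difference in means. The sketch below is illustrative only and is not the library's implementation: the helper att_gt and the toy panel are hypothetical, it uses only never-treated units as controls, and it ignores covariates and inference.

```python
import pandas as pd


def att_gt(df, g, t, base):
    """Unadjusted 2x2 DiD for cohort g at time t (hypothetical helper).

    Assumes columns unit, time, outcome, first_treat (0 = never treated).
    """
    wide = df.pivot(index="unit", columns="time", values="outcome")
    cohort = df.groupby("unit")["first_treat"].first()
    change = wide[t] - wide[base]          # outcome change from base period to t
    treated = change[cohort == g].mean()   # average change in cohort g
    control = change[cohort == 0].mean()   # average change among never-treated
    return treated - control


# Toy panel: units 1-2 first treated in period 2, units 3-4 never treated.
outcomes = {1: [1.0, 4.0, 5.0], 2: [2.0, 5.0, 6.0],
            3: [1.0, 2.0, 3.0], 4: [3.0, 4.0, 5.0]}
first = {1: 2, 2: 2, 3: 0, 4: 0}
rows = [
    {"unit": u, "time": period, "outcome": y, "first_treat": first[u]}
    for u, ys in outcomes.items()
    for period, y in enumerate(ys, start=1)
]
panel = pd.DataFrame(rows)

att_22 = att_gt(panel, g=2, t=2, base=1)  # effect in the first treated period
att_23 = att_gt(panel, g=2, t=3, base=1)  # effect one period later
```

Because each comparison uses only cohort g and never-treated units, already-treated units never serve as controls.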

Parameters:

control_group (str, default="never_treated") – Which units to use as controls:
- “never_treated”: Use only never-treated units (recommended)
- “not_yet_treated”: Use never-treated and not-yet-treated units
anticipation (int, default=0) – Number of periods before treatment where effects may occur. Set to > 0 if treatment effects can begin before the official treatment date.
estimation_method (str, default="dr") – Estimation method:
- “dr”: Doubly robust (recommended)
- “ipw”: Inverse probability weighting
- “reg”: Outcome regression
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. Defaults to unit-level clustering.
n_bootstrap (int, default=0) – Number of bootstrap iterations for inference. If 0, uses analytical standard errors. Recommended: 999 or more for reliable inference.

Note

Memory usage: the bootstrap stores all weights in memory as an (n_bootstrap, n_units) float64 array. For large datasets, this can be significant:
- 1K bootstrap × 10K units = ~80 MB
- 10K bootstrap × 100K units = ~8 GB
Consider reducing n_bootstrap if memory is constrained.
bootstrap_weights (str, default="rademacher") – Type of weights for multiplier bootstrap:
- “rademacher”: +1/-1 with equal probability (standard choice)
- “mammen”: Two-point distribution (asymptotically valid, matches skewness)
- “webb”: Six-point distribution (recommended when n_clusters < 20)
bootstrap_weight_type (str, optional) –

Deprecated since version 1.0.1: Use bootstrap_weights instead. Will be removed in v3.0.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient (linearly dependent columns):
- “warn”: Issue warning and drop linearly dependent columns (default)
- “error”: Raise ValueError
- “silent”: Drop columns silently without warning
base_period (str, default="varying") – Method for selecting the base (reference) period for computing ATT(g,t). Options:
- “varying”: For pre-treatment periods (t < g - anticipation), use t-1 as base (consecutive comparisons). For post-treatment periods, use g-1-anticipation. Requires t-1 to exist in the data.
- “universal”: Always use g-1-anticipation as base period.
Both produce identical post-treatment effects. Matches R’s did::att_gt() base_period parameter.
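
The three bootstrap_weights options and the memory figures quoted for n_bootstrap can be reproduced with a few lines of NumPy. This is a sketch using the textbook definitions of the Rademacher, Mammen, and Webb distributions. The helpers draw_weights and weight_matrix_bytes are hypothetical names, and the library's own sampling code may differ.

```python
import numpy as np


def draw_weights(kind, size, rng):
    """Draw mean-zero, unit-variance multiplier weights (hypothetical helper)."""
    if kind == "rademacher":
        # +1 or -1 with equal probability
        return rng.choice([-1.0, 1.0], size=size)
    if kind == "mammen":
        # Two-point distribution with mean 0, variance 1, third moment 1
        s5 = np.sqrt(5.0)
        values = [-(s5 - 1) / 2, (s5 + 1) / 2]
        probs = [(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)]
        return rng.choice(values, size=size, p=probs)
    if kind == "webb":
        # Six equally likely points: +/- sqrt(3/2), +/- 1, +/- sqrt(1/2)
        values = np.sqrt([1.5, 1.0, 0.5])
        return rng.choice(np.concatenate([-values, values]), size=size)
    raise ValueError(f"unknown weight type: {kind}")


def weight_matrix_bytes(n_bootstrap, n_units):
    """Size in bytes of an (n_bootstrap, n_units) float64 array."""
    return n_bootstrap * n_units * np.dtype(np.float64).itemsize


rng = np.random.default_rng(0)
w = draw_weights("webb", size=(999, 50), rng=rng)

small = weight_matrix_bytes(1_000, 10_000)     # 80,000,000 bytes, about 80 MB
large = weight_matrix_bytes(10_000, 100_000)   # 8,000,000,000 bytes, about 8 GB
```

Running the size calculation before choosing n_bootstrap is a cheap way to check whether the weight array will fit in memory.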

results_

Estimation results after calling fit().

Type: CallawaySantAnnaResults

is_fitted_

Whether the model has been fitted.

Type: bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import CallawaySantAnna
>>>
>>> # Panel data with staggered treatment
>>> # 'first_treat' = period when unit was first treated (0 if never treated)
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated, else first treatment period
... })
>>>
>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>>
>>> results.print_summary()
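
For experimentation, the elided columns can be filled with a small synthetic panel in the same long format (one row per unit and period). The sketch below assumes the column names from the example above. The cohort layout, effect size, noise level, and seed are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_units, n_periods = 6, 5

# Hypothetical cohorts: two units first treated in period 3,
# two in period 4, and two never treated (coded 0).
first_treat_by_unit = np.array([3, 3, 4, 4, 0, 0])

units = np.repeat(np.arange(n_units), n_periods)
times = np.tile(np.arange(1, n_periods + 1), n_units)
first_treat = np.repeat(first_treat_by_unit, n_periods)

# A unit is treated from its first treatment period onward.
treated_now = (first_treat > 0) & (times >= first_treat)

# Common linear trend, a constant effect of 2.0 once treated, small noise.
outcome = 0.5 * times + 2.0 * treated_now + rng.normal(scale=0.1, size=units.size)

data = pd.DataFrame({
    "unit": units,
    "time": times,
    "outcome": outcome,
    "first_treat": first_treat,
})
```

Each unit carries the same first_treat value on every row, which is the layout the fit() parameters describe.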

With event study aggregation:

>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  aggregate='event_study')
>>>
>>> # Plot event study
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

With covariate adjustment (conditional parallel trends):

>>> # When parallel trends only holds conditional on covariates
>>> cs = CallawaySantAnna(estimation_method='dr')  # doubly robust
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
>>>
>>> # DR is recommended: consistent if either outcome model
>>> # or propensity model is correctly specified

Notes

The key innovation of Callaway & Sant’Anna (2021) is the disaggregated approach: instead of estimating a single treatment effect, they estimate ATT(g,t) for each cohort-time pair. This avoids the “forbidden comparison” problem where already-treated units act as controls.

The ATT(g,t) is identified under parallel trends conditional on covariates:

E[Y(0)_t - Y(0)_{g-1} | G=g] = E[Y(0)_t - Y(0)_{g-1} | C=1]

where G=g indicates treatment cohort g and C=1 indicates control units. The condition uses g-1 as the base period, which applies to post-treatment periods (t >= g). With base_period=“varying” (the default), pre-treatment periods instead use t-1 as the base, giving consecutive comparisons that are useful as parallel trends diagnostics.
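
The base-period rule can be written as a small function. This is a sketch of the rule as documented for the base_period parameter, assuming integer, consecutively numbered periods. The function name base_period_for is hypothetical and not part of the library.

```python
def base_period_for(g, t, anticipation=0, base_period="varying"):
    """Return the reference period used for ATT(g, t) (hypothetical helper)."""
    if base_period not in ("varying", "universal"):
        raise ValueError(f"unknown base_period: {base_period}")
    if base_period == "varying" and t < g - anticipation:
        # Pre-treatment: compare consecutive periods.
        return t - 1
    # Post-treatment, or "universal": last period before (anticipated) treatment.
    return g - 1 - anticipation


pre_varying = base_period_for(g=5, t=3)                             # 2
pre_universal = base_period_for(g=5, t=3, base_period="universal")  # 4
post_any = base_period_for(g=5, t=7)                                # 4
```

The two options differ only for pre-treatment periods, which is why both yield the same post-treatment effects.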

References

Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

__init__(control_group='never_treated', anticipation=0, estimation_method='dr', alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights=None, bootstrap_weight_type=None, seed=None, rank_deficient_action='warn', base_period='varying')[source]

Parameters:

control_group (str)
anticipation (int)
estimation_method (str)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str | None)
bootstrap_weight_type (str | None)
seed (int | None)
rank_deficient_action (str)
base_period (str)

Methods

`__init__`([control_group, anticipation, ...])
`fit`(data, outcome, unit, time, first_treat)	Fit the Callaway-Sant'Anna estimator.
`get_params`()	Get estimator parameters (sklearn-compatible).
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
`summary`()	Get summary of estimation results.

Attributes

`n_bootstrap`
`bootstrap_weight_type`
`alpha`
`seed`
`anticipation`
`base_period`

fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None)[source]

Fit the Callaway-Sant’Anna estimator.

Parameters:

data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.
covariates (list, optional) – List of covariate column names for conditional parallel trends.
aggregate (str, optional) – How to aggregate group-time effects:
- None: Only compute ATT(g,t) (default)
- “simple”: Simple weighted average (overall ATT)
- “event_study”: Aggregate by relative time (event study)
- “group”: Aggregate by treatment cohort
- “all”: Compute all aggregations
balance_e (int, optional) – For event study, balance the panel at relative time e. Ensures all groups contribute to each relative period.

Returns:

Object containing all estimation results.

Return type:

CallawaySantAnnaResults

Raises:

ValueError – If required columns are missing or data validation fails.
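
The aggregate options collapse a table of ATT(g,t) estimates in different directions. The sketch below illustrates the idea on made-up numbers, assuming cohort-size weights for the simple and event-study summaries and an unweighted average over periods within each cohort. The weights the library actually applies may differ, and the column names here are hypothetical.

```python
import pandas as pd

# Hypothetical ATT(g,t) estimates with cohort sizes (values made up).
att = pd.DataFrame({
    "group":   [2, 2, 3],
    "time":    [2, 3, 3],
    "att":     [1.0, 2.0, 4.0],
    "n_group": [10, 10, 30],
})
att["e"] = att["time"] - att["group"]   # relative time since first treatment
post = att[att["e"] >= 0]               # keep post-treatment cells only


def wavg(frame):
    """Cohort-size-weighted average of ATT(g,t)."""
    return (frame["att"] * frame["n_group"]).sum() / frame["n_group"].sum()


# "event_study": one effect per relative period e
event_study = {int(e): wavg(f) for e, f in post.groupby("e")}
# "group": one effect per cohort, averaging over its post-treatment periods
group_effects = {int(g): f["att"].mean() for g, f in post.groupby("group")}
# "simple": a single overall weighted average
simple = wavg(post)
```

In this toy table, relative period 0 mixes both cohorts while relative period 1 contains only cohort 2. The balance_e option addresses this kind of composition change.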

get_params()[source]

Get estimator parameters (sklearn-compatible).

Return type: Dict[str, Any]

set_params(**params)[source]

Set estimator parameters (sklearn-compatible).

Return type: CallawaySantAnna
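
The sklearn-style convention behind get_params() and set_params() is that parameters round-trip through a plain dict and that set_params() returns the estimator itself, which allows chaining. The sketch below shows that pattern on a hypothetical stand-in class. ToyEstimator is not part of diff_diff and only mirrors two of the documented constructor arguments.

```python
class ToyEstimator:
    """Hypothetical stand-in illustrating the get_params/set_params pattern."""

    def __init__(self, alpha=0.05, n_bootstrap=0):
        self.alpha = alpha
        self.n_bootstrap = n_bootstrap

    def get_params(self):
        # Return constructor arguments as a plain dict.
        return {"alpha": self.alpha, "n_bootstrap": self.n_bootstrap}

    def set_params(self, **params):
        # Reject unknown names, update known ones, and return self for chaining.
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self


est = ToyEstimator().set_params(n_bootstrap=999)
clone = ToyEstimator(**est.get_params())  # rebuild an identical configuration
```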

summary()[source]

Get summary of estimation results.

Return type: str

print_summary()[source]

Print summary to stdout.

Return type: None