Staggered Adoption

Estimators for staggered DiD designs where treatment is adopted at different times.

This module provides two main estimators for staggered adoption settings:

  1. Callaway-Sant’Anna (2021): Aggregates group-time 2x2 DiD comparisons

  2. Sun-Abraham (2021): Interaction-weighted regression approach

Running both provides a useful robustness check—when they agree, results are more credible.

CallawaySantAnna

Callaway & Sant’Anna (2021) estimator for heterogeneous treatment timing.

class diff_diff.CallawaySantAnna

Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin

Callaway-Sant’Anna (2021) estimator for staggered Difference-in-Differences.

This estimator handles DiD designs with variation in treatment timing (staggered adoption) and heterogeneous treatment effects. It avoids the bias of traditional two-way fixed effects (TWFE) estimators by:

  1. Computing group-time average treatment effects ATT(g,t) for each cohort g (units first treated in period g) and time t.

  2. Aggregating these to summary measures (overall ATT, event study, etc.) using appropriate weights.
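Here is a minimal sketch of the unconditional 2x2 comparison behind step 1 for a single post-treatment cell ATT(g,t). It is written in plain Python rather than with diff_diff. The nested-dict panel layout and the att_gt helper are hypothetical. The sketch ignores covariates and uses g-1-anticipation as the base period. It also assumes that “not_yet_treated” means never-treated units plus units whose first_treat exceeds t + anticipation, which is an assumption about the control_group option rather than a documented rule.

```python
from statistics import mean


def att_gt(panel, g, t, control_group="never_treated", anticipation=0):
    """Unconditional 2x2 DiD estimate of a post-treatment ATT(g, t).

    panel maps unit id -> {"first_treat": int, "y": {period: outcome}},
    with first_treat == 0 for never-treated units (hypothetical layout).
    """
    if t < g:
        raise ValueError("this sketch only handles post-treatment cells (t >= g)")
    base = g - 1 - anticipation  # base period for post-treatment cells
    treated = [u for u, d in panel.items() if d["first_treat"] == g]
    if control_group == "never_treated":
        controls = [u for u, d in panel.items() if d["first_treat"] == 0]
    elif control_group == "not_yet_treated":
        # Assumed rule: never-treated units plus units not yet treated
        # (allowing for anticipation) by period t.
        controls = [u for u, d in panel.items()
                    if d["first_treat"] == 0 or d["first_treat"] > t + anticipation]
    else:
        raise ValueError(f"unknown control_group: {control_group!r}")

    def avg_change(units):
        # Mean outcome change from the base period to period t.
        return mean(panel[u]["y"][t] - panel[u]["y"][base] for u in units)

    return avg_change(treated) - avg_change(controls)
```

Switching control_group changes only the comparison set. The treated cohort and the base period stay the same.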

Parameters:
  • control_group (str, default="never_treated") – Which units to use as controls:

    • “never_treated”: Use only never-treated units (recommended)

    • “not_yet_treated”: Use never-treated and not-yet-treated units

  • anticipation (int, default=0) – Number of periods before treatment where effects may occur. Set to > 0 if treatment effects can begin before the official treatment date.

  • estimation_method (str, default="dr") – Estimation method:

    • “dr”: Doubly robust (recommended)

    • “ipw”: Inverse probability weighting

    • “reg”: Outcome regression

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • cluster (str, optional) – Column name for cluster-robust standard errors. Defaults to unit-level clustering.

  • n_bootstrap (int, default=0) –

    Number of bootstrap iterations for inference. If 0, uses analytical standard errors. Recommended: 999 or more for reliable inference.

    Note

    Memory usage: the bootstrap stores all weights in memory as a (n_bootstrap, n_units) float64 array, which can be significant for large datasets:

    • 1K bootstrap × 10K units ≈ 80 MB

    • 10K bootstrap × 100K units ≈ 8 GB

    Consider reducing n_bootstrap if memory is constrained.

  • bootstrap_weights (str, default="rademacher") – Type of weights for the multiplier bootstrap:

    • “rademacher”: +1/-1 with equal probability (standard choice)

    • “mammen”: Two-point distribution with mean 0, variance 1 and third moment 1, so it also preserves skewness (asymptotically valid)

    • “webb”: Six-point distribution (recommended when there are fewer than 20 clusters)

  • bootstrap_weight_type (str, optional) –

    Deprecated since version 1.0.1: Use bootstrap_weights instead. Will be removed in v3.0.

  • seed (int, optional) – Random seed for reproducibility.

  • rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):

    • “warn”: Issue a warning and drop linearly dependent columns (default)

    • “error”: Raise ValueError

    • “silent”: Drop columns silently without warning

  • base_period (str, default="varying") –

    Method for selecting the base (reference) period for computing ATT(g,t). Options:

    • “varying”: For pre-treatment periods (t < g - anticipation), use t-1 as the base (consecutive comparisons). For post-treatment periods, use g-1-anticipation. Requires period t-1 to exist in the data.

    • “universal”: Always use g-1-anticipation as the base period.

    Both options produce identical post-treatment effects. This matches the base_period argument of R’s did::att_gt().
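The three bootstrap_weights options correspond to standard multiplier distributions, each with mean 0 and variance 1. The sketch below draws them with Python's random module. draw_weights is a hypothetical helper for illustration, not diff_diff's internal sampler.

```python
import math
import random

_SQRT5 = math.sqrt(5.0)
# Mammen two-point distribution: mean 0, variance 1, third moment 1.
_MAMMEN_LO = -(_SQRT5 - 1) / 2
_MAMMEN_HI = (_SQRT5 + 1) / 2
_MAMMEN_P_LO = (_SQRT5 + 1) / (2 * _SQRT5)
# Webb six-point distribution: +/-sqrt(3/2), +/-1, +/-sqrt(1/2), each w.p. 1/6.
_WEBB = [s * v for s in (-1.0, 1.0) for v in (math.sqrt(1.5), 1.0, math.sqrt(0.5))]


def draw_weights(kind, n, rng):
    """Draw n multiplier-bootstrap weights (hypothetical helper)."""
    if kind == "rademacher":
        return [rng.choice((-1.0, 1.0)) for _ in range(n)]
    if kind == "mammen":
        return [_MAMMEN_LO if rng.random() < _MAMMEN_P_LO else _MAMMEN_HI
                for _ in range(n)]
    if kind == "webb":
        return [rng.choice(_WEBB) for _ in range(n)]
    raise ValueError(f"unknown bootstrap_weights: {kind!r}")
```

Webb's six support points give more distinct bootstrap draws than the two-point schemes when there are only a few clusters. This is the motivation behind the n_clusters < 20 recommendation.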

results_

Estimation results after calling fit().

Type:

CallawaySantAnnaResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import CallawaySantAnna
>>>
>>> # Panel data with staggered treatment
>>> # 'first_treat' = period when unit was first treated (0 if never treated)
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated, else first treatment period
... })
>>>
>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>>
>>> results.print_summary()

With event study aggregation:

>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  aggregate='event_study')
>>>
>>> # Plot event study
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

With covariate adjustment (conditional parallel trends):

>>> # When parallel trends only holds conditional on covariates
>>> cs = CallawaySantAnna(estimation_method='dr')  # doubly robust
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
>>>
>>> # DR is recommended: consistent if either outcome model
>>> # or propensity model is correctly specified
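The examples above leave the data as placeholders. One way to try them is to simulate a small balanced panel in the expected long format. simulate_staggered_panel and its defaults are hypothetical. The data-generating process (unit and time effects plus a constant treatment effect) is chosen only for illustration.

```python
import numpy as np
import pandas as pd


def simulate_staggered_panel(n_units=60, n_periods=8, cohorts=(4, 6),
                             effect=2.0, seed=0):
    """Simulate a balanced staggered-adoption panel (hypothetical DGP).

    first_treat is 0 for never-treated units, otherwise the first treated period.
    """
    rng = np.random.default_rng(seed)
    # Cycle units through never-treated (0) and each cohort so all appear.
    options = np.array([0, *cohorts])
    first_treat = options[np.arange(n_units) % len(options)]
    unit_effect = rng.normal(0.0, 1.0, size=n_units)
    time_effect = np.linspace(0.0, 1.0, n_periods)
    rows = []
    for i in range(n_units):
        g = int(first_treat[i])
        for t in range(1, n_periods + 1):
            treated = g > 0 and t >= g
            y = (unit_effect[i] + time_effect[t - 1]
                 + effect * float(treated) + rng.normal(0.0, 0.1))
            rows.append({"unit": i, "time": t, "outcome": float(y),
                         "first_treat": g})
    return pd.DataFrame(rows)
```

The resulting DataFrame can be passed as data to the fit() calls above with outcome='outcome', unit='unit', time='time' and first_treat='first_treat'. The covariate example would additionally need age and income columns.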

Notes

The key innovation of Callaway & Sant’Anna (2021) is the disaggregated approach: instead of estimating a single treatment effect, they estimate ATT(g,t) for each cohort-time pair. This avoids the “forbidden comparison” problem where already-treated units act as controls.

The ATT(g,t) is identified under parallel trends conditional on covariates:

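E[Y_t(0) - Y_{g-1}(0) | X, G=g] = E[Y_t(0) - Y_{g-1}(0) | X, C=1]

where Y_t(0) is the untreated potential outcome in period t, X denotes the covariates, G=g indicates membership in treatment cohort g, and C=1 indicates control units. This condition uses g-1 as the base period (g-1-anticipation when anticipation > 0) and applies to post-treatment cells (t >= g). With base_period=“varying” (the default), pre-treatment cells instead use t-1 as the base. These consecutive comparisons are useful for parallel trends diagnostics.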
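The base-period rule can be written as a small function. This makes it easy to see which two periods each ATT(g,t) compares. base_period is a hypothetical helper that simply restates the base_period parameter description. In “universal” mode, the cell t = g-1-anticipation compares the base period with itself.

```python
def base_period(g, t, anticipation=0, mode="varying"):
    """Base (reference) period for ATT(g, t) under the two base_period rules."""
    post_base = g - 1 - anticipation
    if mode == "universal":
        return post_base
    if mode == "varying":
        # Pre-treatment cells compare consecutive periods (t-1 vs t).
        return t - 1 if t < g - anticipation else post_base
    raise ValueError(f"unknown base_period: {mode!r}")
```

For example, with g = 5 and no anticipation, “varying” compares period 2 with period 1, while “universal” compares period 2 with period 4.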

References

Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

Methods

fit(data, outcome, unit, time, first_treat)

Fit the Callaway-Sant'Anna estimator.

get_params()

Get estimator parameters (sklearn-compatible).

set_params(**params)

Set estimator parameters (sklearn-compatible).

__init__(control_group='never_treated', anticipation=0, estimation_method='dr', alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights=None, bootstrap_weight_type=None, seed=None, rank_deficient_action='warn', base_period='varying')
Parameters:
  • control_group (str)

  • anticipation (int)

  • estimation_method (str)

  • alpha (float)

  • cluster (str | None)

  • n_bootstrap (int)

  • bootstrap_weights (str | None)

  • bootstrap_weight_type (str | None)

  • seed (int | None)

  • rank_deficient_action (str)

  • base_period (str)

anticipation: int
alpha: float
n_bootstrap: int
bootstrap_weight_type: str
seed: int | None
base_period: str
results_: CallawaySantAnnaResults | None
fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None)

Fit the Callaway-Sant’Anna estimator.

Parameters:
  • data (pd.DataFrame) – Panel data with unit and time identifiers.

  • outcome (str) – Name of outcome variable column.

  • unit (str) – Name of unit identifier column.

  • time (str) – Name of time period column.

  • first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.

  • covariates (list, optional) – List of covariate column names for conditional parallel trends.

  • aggregate (str, optional) – How to aggregate group-time effects:

    • None: Only compute ATT(g,t) (default)

    • “simple”: Simple weighted average (overall ATT)

    • “event_study”: Aggregate by relative time (event study)

    • “group”: Aggregate by treatment cohort

    • “all”: Compute all aggregations

  • balance_e (int, optional) – For event study, balance the panel at relative time e. Ensures all groups contribute to each relative period.

Returns:

Object containing all estimation results.

Return type:

CallawaySantAnnaResults

Raises:

ValueError – If required columns are missing or data validation fails.
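If the data only contain a 0/1 treatment indicator, the first_treat column can be derived for each unit as the first period in which treatment is switched on, or 0 if the unit is never treated. The following is a minimal pandas sketch. The treated column name and the add_first_treat helper are hypothetical, and the sketch assumes treatment is absorbing (it never switches off).

```python
import pandas as pd


def add_first_treat(df, unit="unit", time="time", treated="treated"):
    """Return a copy of df with a first_treat column (0 = never treated).

    Assumes `treated` is a 0/1 indicator and treatment never switches off.
    """
    # First period with treated == 1, per unit.
    first = df.loc[df[treated] == 1].groupby(unit)[time].min()
    out = df.copy()
    # Units absent from `first` were never treated and get 0.
    out["first_treat"] = out[unit].map(first).fillna(0).astype(int)
    return out
```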

get_params()

Get estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Return type:

CallawaySantAnna

summary()

Get summary of estimation results.

Return type:

str

print_summary()

Print summary to stdout.

Return type:

None

CallawaySantAnnaResults

Results container for Callaway-Sant’Anna estimation.

class diff_diff.CallawaySantAnnaResults

Bases: object

Results from Callaway-Sant’Anna (2021) staggered DiD estimation.

This class stores group-time average treatment effects ATT(g,t) and provides methods for aggregation into summary measures.
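The aggregation options combine ATT(g,t) cells using different weights. Below is a simplified sketch in the spirit of Callaway & Sant’Anna (2021):

• “simple” averages all post-treatment cells, weighting by cohort size.

• “group” averages each cohort's post-treatment cells equally.

• “event_study” averages cells that share the same relative time e = t - g, weighting by cohort size.

The aggregate helper and the use of raw unit counts as weights are assumptions. diff_diff's exact weights, its balance_e handling and its standard errors are not reproduced here.

```python
from collections import defaultdict


def aggregate(att, cohort_size):
    """Hypothetical aggregation of {(g, t): ATT(g, t)} given {g: n_units}.

    Returns (simple, group, event_study).
    """
    post = {(g, t): v for (g, t), v in att.items() if t >= g}
    # "simple": post-treatment cells weighted by cohort size.
    total = sum(cohort_size[g] for g, _ in post)
    simple = sum(cohort_size[g] * v for (g, _), v in post.items()) / total
    # "group": equal-weight average of each cohort's post-treatment cells.
    by_group = defaultdict(list)
    for (g, _), v in post.items():
        by_group[g].append(v)
    group = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    # "event_study": cells with the same e = t - g, weighted by cohort size.
    num, den = defaultdict(float), defaultdict(float)
    for (g, t), v in att.items():
        num[t - g] += cohort_size[g] * v
        den[t - g] += cohort_size[g]
    event_study = {e: num[e] / den[e] for e in sorted(num)}
    return simple, group, event_study
```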

group_time_effects

Dictionary mapping (group, time) tuples to effect dictionaries.

Type:

dict

overall_att

Overall average treatment effect (weighted average of ATT(g,t)).

Type:

float

overall_se

Standard error of overall ATT.

Type:

float

overall_p_value

P-value for overall ATT.

Type:

float

overall_conf_int

Confidence interval for overall ATT.

Type:

tuple

groups

List of treatment cohorts (first treatment periods).

Type:

list

time_periods

List of all time periods.

Type:

list

n_obs

Total number of observations.

Type:

int

n_treated_units

Number of ever-treated units.

Type:

int

n_control_units

Number of never-treated units.

Type:

int

event_study_effects

Effects aggregated by relative time (event study).

Type:

dict, optional

group_effects

Effects aggregated by treatment cohort.

Type:

dict, optional

Methods

summary([alpha])

Generate formatted summary of estimation results.

to_dataframe([level])

Convert results to DataFrame.

group_time_effects: Dict[Tuple[Any, Any], Dict[str, Any]]
overall_att: float
overall_se: float
overall_t_stat: float
overall_p_value: float
overall_conf_int: Tuple[float, float]
groups: List[Any]
time_periods: List[Any]
n_obs: int
n_treated_units: int
n_control_units: int
alpha: float = 0.05
control_group: str = 'never_treated'
base_period: str = 'varying'
event_study_effects: Dict[int, Dict[str, Any]] | None = None
group_effects: Dict[Any, Dict[str, Any]] | None = None
influence_functions: np.ndarray | None = None
bootstrap_results: CSBootstrapResults | None = None
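Storing per-unit influence function values, as the influence_functions attribute suggests, makes a multiplier bootstrap possible without re-estimating the model. Each draw reweights those stored values. The sketch below applies this idea to a single scalar estimate with Rademacher weights. The function name, the assumption that the estimation error is approximately mean(psi), and the use of the standard deviation of the draws as the SE are all assumptions, not a description of diff_diff's implementation. The sketch also shows where the (n_bootstrap, n_units) float64 weight matrix in the memory note comes from.

```python
import numpy as np


def multiplier_bootstrap_se(psi, n_bootstrap=999, seed=None):
    """Multiplier-bootstrap SE for a scalar estimate (hypothetical helper).

    Assumes estimate - truth is approximately mean(psi), where psi holds one
    influence-function value per unit. Returns (se, bytes_used_by_weights).
    """
    psi = np.asarray(psi, dtype=float)
    rng = np.random.default_rng(seed)
    n = psi.shape[0]
    # One Rademacher weight per unit and draw: an (n_bootstrap, n) float64 array.
    weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n))
    # Each row gives one bootstrap perturbation of the estimate.
    draws = weights @ psi / n
    return float(draws.std(ddof=1)), weights.nbytes
```

The byte count is n_bootstrap × n_units × 8. For example, 1,000 draws for 10,000 units take about 80 MB, which matches the memory note.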
__repr__()

Concise string representation.

Return type:

str

summary(alpha=None)

Generate formatted summary of estimation results.

Parameters:

alpha (float, optional) – Significance level. Defaults to alpha used in estimation.

Returns:

Formatted summary.

Return type:

str

print_summary(alpha=None)

Print summary to stdout.

Parameters:

alpha (float | None)

Return type:

None

to_dataframe(level='group_time')

Convert results to DataFrame.

Parameters:

level (str, default="group_time") – Level of aggregation: “group_time”, “event_study”, or “group”.

Returns:

Results as DataFrame.

Return type:

pd.DataFrame

property is_significant: bool

Check if overall ATT is significant.

property significance_stars: str

Significance stars for overall ATT.

__init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, groups, time_periods, n_obs, n_treated_units, n_control_units, alpha=0.05, control_group='never_treated', base_period='varying', event_study_effects=None, group_effects=None, influence_functions=None, bootstrap_results=None)
Return type:

None

GroupTimeEffect

Container for individual group-time ATT(g,t) effects.

class diff_diff.GroupTimeEffect

Bases: object

Treatment effect for a specific group-time combination.

group

The treatment cohort (first treatment period).

Type:

any

time

The time period.

Type:

any

effect

The ATT(g,t) estimate.

Type:

float

se

Standard error.

Type:

float

n_treated

Number of treated observations.

Type:

int

n_control

Number of control observations.

Type:

int

group: Any
time: Any
effect: float
se: float
t_stat: float
p_value: float
conf_int: Tuple[float, float]
n_treated: int
n_control: int
property is_significant: bool

Check if effect is significant at 0.05 level.

property significance_stars: str

Return significance stars based on p-value.
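The is_significant and significance_stars properties reduce to simple p-value thresholds. The sketch below uses the common R-style cutoffs. The exact star cutoffs that diff_diff uses are an assumption here. The 0.05 cutoff for is_significant follows the property description above.

```python
def is_significant(p_value, alpha=0.05):
    """True when the p-value is below the significance level (0.05 by default)."""
    return p_value < alpha


def significance_stars(p_value):
    """Stars from p-value cutoffs; these cutoffs are an assumed convention."""
    for cutoff, stars in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p_value < cutoff:
            return stars
    return ""
```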

__init__(group, time, effect, se, t_stat, p_value, conf_int, n_treated, n_control)
Return type:

None

SunAbraham

Sun & Abraham (2021) interaction-weighted estimator for staggered DiD.

This estimator provides event-study coefficients using a saturated regression with cohort-by-relative-time interactions. It uses interaction-weighting to aggregate cohort-specific effects into event study estimates.

Key differences from Callaway-Sant’Anna:

  • Uses a regression-based approach rather than 2x2 DiD comparisons

  • Weights cohort-specific effects by each cohort’s share of the treated population

  • Can be more efficient when treatment effects are homogeneous

  • Running both provides a useful robustness check

Reference: Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.

class diff_diff.SunAbraham

Bases: object

Sun-Abraham (2021) interaction-weighted estimator for staggered DiD.

This estimator provides event-study coefficients using a saturated TWFE regression with cohort × relative-time interactions, following the methodology in Sun & Abraham (2021).

The estimation procedure follows three steps:

  1. Run a saturated TWFE regression with cohort × relative-time dummies.

  2. Compute cohort shares (weights) at each relative time.

  3. Aggregate cohort-specific effects using interaction weights.

This avoids the negative weighting problem of standard TWFE and provides consistent event-study estimates under treatment effect heterogeneity.

Parameters:
  • control_group (str, default="never_treated") – Which units to use as controls:

    • “never_treated”: Use only never-treated units (recommended)

    • “not_yet_treated”: Use never-treated and not-yet-treated units

  • anticipation (int, default=0) – Number of periods before treatment where effects may occur.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default.

  • n_bootstrap (int, default=0) – Number of bootstrap iterations for inference. If 0, uses analytical cluster-robust standard errors.

  • seed (int, optional) – Random seed for reproducibility.

  • rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):

    • “warn”: Issue a warning and drop linearly dependent columns (default)

    • “error”: Raise ValueError

    • “silent”: Drop columns silently without warning
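The three rank_deficient_action options can be illustrated with a greedy check that keeps a column only if it raises the rank of the columns kept so far. This sketch rests on that assumption. diff_diff may detect dependence differently, for example with a pivoted QR decomposition. drop_dependent_columns is a hypothetical helper.

```python
import warnings

import numpy as np


def drop_dependent_columns(X, action="warn"):
    """Drop columns of X that are linear combinations of earlier columns.

    action mirrors rank_deficient_action: "warn", "error" or "silent".
    Returns (X_reduced, kept_column_indices). Hypothetical helper.
    """
    if action not in ("warn", "error", "silent"):
        raise ValueError(f"unknown rank_deficient_action: {action!r}")
    X = np.asarray(X, dtype=float)
    kept = []
    for j in range(X.shape[1]):
        # Keep column j only if it increases the rank of the kept set.
        if np.linalg.matrix_rank(X[:, kept + [j]]) == len(kept) + 1:
            kept.append(j)
    dropped = [j for j in range(X.shape[1]) if j not in kept]
    if dropped:
        msg = f"design matrix is rank-deficient; dropping columns {dropped}"
        if action == "error":
            raise ValueError(msg)
        if action == "warn":
            warnings.warn(msg)
    return X[:, kept], kept
```

A greedy pass is easy to read but needs one rank computation per column, so it is only suitable for small design matrices.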

results_

Estimation results after calling fit().

Type:

SunAbrahamResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import SunAbraham
>>>
>>> # Panel data with staggered treatment
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated
... })
>>>
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>> results.print_summary()

With covariates:

>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])

Notes

The Sun-Abraham estimator uses a saturated regression approach:

Y_{it} = α_i + λ_t + Σ_g Σ_e [δ_{g,e} × 1(G_i=g) × D_{it}^e] + X_{it}'γ + ε_{it}

where:

  • α_i = unit fixed effects

  • λ_t = time fixed effects

  • G_i = unit i’s treatment cohort (first treatment period)

  • D_{it}^e = indicator that unit i is e periods from treatment in period t

  • δ_{g,e} = cohort-specific effect (CATT) at relative time e

  • X_{it} = covariates (if any), with coefficients γ
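Below is a minimal NumPy sketch of the saturated regression above, without covariates. It builds unit dummies, time dummies (first period dropped) and one dummy per (cohort, relative time) cell, then estimates them by least squares with never-treated units as the only controls. The function name and the choice of e = -1 as the omitted reference period are assumptions for illustration. The sketch ignores anticipation, uses dense dummies (so it only suits small panels) and reports point estimates only.

```python
import numpy as np


def saturated_catt(unit, time, y, first_treat, ref=-1):
    """Estimate cohort-specific effects delta[(g, e)] by saturated TWFE OLS.

    Never-treated units (first_treat == 0) are the only controls, and relative
    time `ref` is omitted for every cohort. Hypothetical helper.
    """
    unit, time = np.asarray(unit), np.asarray(time)
    y, first_treat = np.asarray(y, dtype=float), np.asarray(first_treat)
    units, periods = np.unique(unit), np.unique(time)
    cols = [(unit == u).astype(float) for u in units]          # alpha_i
    cols += [(time == t).astype(float) for t in periods[1:]]   # lambda_t
    keys = []
    for g in np.unique(first_treat[first_treat > 0]):
        for e in np.unique(time - g):
            cell = (first_treat == g) & (time - g == e)  # 1(G_i = g) * D_it^e
            if e != ref and cell.any():
                keys.append((int(g), int(e)))
                cols.append(cell.astype(float))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    n_fe = len(units) + len(periods) - 1  # fixed-effect coefficients come first
    return dict(zip(keys, coef[n_fe:]))
```

Because every cohort gets its own relative-time dummies, each δ_{g,e} is compared only with never-treated units and with the cohort's own reference period. This is what keeps other cohorts' effects from contaminating it.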

The event-study coefficients are then computed as:

β_e = Σ_g w_{g,e} × δ_{g,e}

where w_{g,e} is the share of cohort g among the treated cohorts observed at relative time e (the interaction weights).
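The interaction-weighting step is a weighted average at each relative time. The sketch below assumes cohort shares are measured by unit counts, which matches observation shares in a balanced panel. The weights it returns have the same shape as the cohort_weights attribute. interaction_weighted is a hypothetical helper, and standard errors are omitted.

```python
def interaction_weighted(delta, cohort_size):
    """Aggregate delta[(g, e)] into beta_e = sum_g w_{g,e} * delta_{g,e}.

    w_{g,e} is cohort g's share (by unit count) among cohorts with an
    estimate at relative time e. Returns {e: (beta_e, {g: w_{g,e}})}.
    """
    out = {}
    for e in sorted({e for _, e in delta}):
        cohorts = [g for g, e2 in delta if e2 == e]
        total = sum(cohort_size[g] for g in cohorts)
        weights = {g: cohort_size[g] / total for g in cohorts}
        beta = sum(weights[g] * delta[(g, e)] for g in cohorts)
        out[e] = (beta, weights)
    return out
```

For example, if cohort 3 has 1 unit and cohort 5 has 3 units and both are observed at e = 0, their weights are 0.25 and 0.75.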

Compared to Callaway-Sant’Anna:

  • SA uses a saturated regression; CS uses 2x2 DiD comparisons.

  • SA can be more efficient when the model is correctly specified.

  • Both are consistent under heterogeneous treatment effects.

  • Running both provides a useful robustness check.

References

Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.

Methods

fit(data, outcome, unit, time, first_treat)

Fit the Sun-Abraham estimator using saturated regression.

get_params()

Get estimator parameters (sklearn-compatible).

set_params(**params)

Set estimator parameters (sklearn-compatible).

summary()

Get summary of estimation results.

print_summary()

Print summary to stdout.

__init__(control_group='never_treated', anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, seed=None, rank_deficient_action='warn')
Parameters:
  • control_group (str)

  • anticipation (int)

  • alpha (float)

  • cluster (str | None)

  • n_bootstrap (int)

  • seed (int | None)

  • rank_deficient_action (str)

results_: SunAbrahamResults | None
fit(data, outcome, unit, time, first_treat, covariates=None)

Fit the Sun-Abraham estimator using saturated regression.

Parameters:
  • data (pd.DataFrame) – Panel data with unit and time identifiers.

  • outcome (str) – Name of outcome variable column.

  • unit (str) – Name of unit identifier column.

  • time (str) – Name of time period column.

  • first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.

  • covariates (list, optional) – List of covariate column names to include in regression.

Returns:

Object containing all estimation results.

Return type:

SunAbrahamResults

Raises:

ValueError – If required columns are missing or data validation fails.

get_params()

Get estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Return type:

SunAbraham

summary()

Get summary of estimation results.

Return type:

str

print_summary()

Print summary to stdout.

Return type:

None

SunAbrahamResults

Results container for Sun-Abraham estimation.

class diff_diff.SunAbrahamResults

Bases: object

Results from Sun-Abraham (2021) interaction-weighted estimation.

event_study_effects

Dictionary mapping relative time to effect dictionaries with keys: ‘effect’, ‘se’, ‘t_stat’, ‘p_value’, ‘conf_int’, ‘n_groups’.

Type:

dict
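The documented way to tabulate event_study_effects is to_dataframe(level="event_study"). The plain-Python sketch below only illustrates the dictionary layout described above, using the keys listed there. It adds an informal pre-trend check that flags pre-treatment relative times whose confidence interval excludes zero. Both helpers are hypothetical, and a flagged period is a prompt for closer inspection, not a formal test.

```python
def event_study_rows(effects):
    """Flatten {e: {"effect", "se", "p_value", "conf_int", ...}} into rows
    (e, effect, se, ci_lower, ci_upper, p_value) sorted by relative time."""
    rows = []
    for e in sorted(effects):
        d = effects[e]
        lower, upper = d["conf_int"]
        rows.append((e, d["effect"], d["se"], lower, upper, d["p_value"]))
    return rows


def pre_periods_excluding_zero(effects):
    """Pre-treatment relative times whose confidence interval excludes zero."""
    return [e for e, _, _, lower, upper, _ in event_study_rows(effects)
            if e < 0 and not (lower <= 0.0 <= upper)]
```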

overall_att

Overall average treatment effect (weighted average of post-treatment effects).

Type:

float

overall_se

Standard error of overall ATT.

Type:

float

overall_t_stat

T-statistic for overall ATT.

Type:

float

overall_p_value

P-value for overall ATT.

Type:

float

overall_conf_int

Confidence interval for overall ATT.

Type:

tuple

cohort_weights

Dictionary mapping relative time to cohort weight dictionaries.

Type:

dict

groups

List of treatment cohorts (first treatment periods).

Type:

list

time_periods

List of all time periods.

Type:

list

n_obs

Total number of observations.

Type:

int

n_treated_units

Number of ever-treated units.

Type:

int

n_control_units

Number of never-treated units.

Type:

int

alpha

Significance level used for confidence intervals.

Type:

float

control_group

Type of control group used.

Type:

str

Methods

summary([alpha])

Generate formatted summary of estimation results.

print_summary([alpha])

Print summary to stdout.

to_dataframe([level])

Convert results to DataFrame.

event_study_effects: Dict[int, Dict[str, Any]]
overall_att: float
overall_se: float
overall_t_stat: float
overall_p_value: float
overall_conf_int: Tuple[float, float]
cohort_weights: Dict[int, Dict[Any, float]]
groups: List[Any]
time_periods: List[Any]
n_obs: int
n_treated_units: int
n_control_units: int
alpha: float = 0.05
control_group: str = 'never_treated'
bootstrap_results: SABootstrapResults | None = None
cohort_effects: Dict[Tuple[Any, int], Dict[str, Any]] | None = None
__repr__()

Concise string representation.

Return type:

str

summary(alpha=None)

Generate formatted summary of estimation results.

Parameters:

alpha (float, optional) – Significance level. Defaults to alpha used in estimation.

Returns:

Formatted summary.

Return type:

str

print_summary(alpha=None)

Print summary to stdout.

Parameters:

alpha (float | None)

Return type:

None

to_dataframe(level='event_study')

Convert results to DataFrame.

Parameters:

level (str, default="event_study") – Level of aggregation: “event_study” or “cohort”.

Returns:

Results as DataFrame.

Return type:

pd.DataFrame

property is_significant: bool

Check if overall ATT is significant.

property significance_stars: str

Significance stars for overall ATT.

__init__(event_study_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, cohort_weights, groups, time_periods, n_obs, n_treated_units, n_control_units, alpha=0.05, control_group='never_treated', bootstrap_results=None, cohort_effects=None)
Return type:

None

SABootstrapResults

Bootstrap inference results for Sun-Abraham estimation.

class diff_diff.SABootstrapResults

Bases: object

Results from Sun-Abraham bootstrap inference.

n_bootstrap

Number of bootstrap iterations.

Type:

int

weight_type

Type of bootstrap used (always “pairs” for pairs bootstrap).

Type:

str
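A pairs (cluster) bootstrap resamples whole units with replacement and re-runs the estimator on each resample. This keeps each unit's time series intact. The sketch below is generic: pairs_bootstrap_se and its estimator callback are hypothetical. In practice, the estimator would refit the saturated regression on the resampled panel.

```python
import random
from statistics import stdev


def pairs_bootstrap_se(clusters, estimator, n_bootstrap=499, seed=None):
    """Pairs (cluster) bootstrap standard error (hypothetical helper).

    clusters: list whose elements hold all observations of one unit.
    estimator: function mapping a list of clusters to a scalar estimate.
    Returns (se, draws).
    """
    rng = random.Random(seed)
    n = len(clusters)
    draws = []
    for _ in range(n_bootstrap):
        # Resample whole units with replacement, keeping each unit's rows together.
        sample = [clusters[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))
    return stdev(draws), draws
```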

alpha

Significance level used for confidence intervals.

Type:

float

overall_att_se

Bootstrap standard error for overall ATT.

Type:

float

overall_att_ci

Bootstrap confidence interval for overall ATT.

Type:

Tuple[float, float]

overall_att_p_value

Bootstrap p-value for overall ATT.

Type:

float

event_study_ses

Bootstrap SEs for event study effects.

Type:

Dict[int, float]

event_study_cis

Bootstrap CIs for event study effects.

Type:

Dict[int, Tuple[float, float]]

event_study_p_values

Bootstrap p-values for event study effects.

Type:

Dict[int, float]

bootstrap_distribution

Full bootstrap distribution of overall ATT.

Type:

Optional[np.ndarray]
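Given the stored bootstrap_distribution, the summary quantities can be recomputed. The sketch below takes the SE as the standard deviation of the draws and the CI as a percentile interval. For the p-value, it uses one common convention for H0: effect = 0: recenter the draws at the estimate and count how often they are at least as extreme as the estimate. These choices are assumptions, and diff_diff's own CI and p-value methods may differ. summarize_bootstrap is a hypothetical helper.

```python
import math
from statistics import stdev


def summarize_bootstrap(estimate, draws, alpha=0.05):
    """Bootstrap SE, percentile CI and two-sided p-value (assumed conventions)."""
    s = sorted(draws)
    b = len(s)
    # Percentile interval from order statistics.
    lower = s[math.floor(alpha / 2 * (b - 1))]
    upper = s[math.ceil((1 - alpha / 2) * (b - 1))]
    # Recentered draws approximate the null distribution of the estimate.
    extreme = sum(abs(d - estimate) >= abs(estimate) for d in draws)
    p_value = (1 + extreme) / (1 + b)
    return stdev(draws), (lower, upper), p_value
```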

n_bootstrap: int
weight_type: str
alpha: float
overall_att_se: float
overall_att_ci: Tuple[float, float]
overall_att_p_value: float
event_study_ses: Dict[int, float]
event_study_cis: Dict[int, Tuple[float, float]]
event_study_p_values: Dict[int, float]
bootstrap_distribution: ndarray | None = None
__init__(n_bootstrap, weight_type, alpha, overall_att_se, overall_att_ci, overall_att_p_value, event_study_ses, event_study_cis, event_study_p_values, bootstrap_distribution=None)
Return type:

None