Staggered Adoption

Estimators for staggered DiD designs where treatment is adopted at different times.

This module provides three estimators for staggered adoption settings:

  1. Callaway-Sant’Anna (2021): Aggregates group-time 2x2 DiD comparisons

  2. Sun-Abraham (2021): Interaction-weighted regression approach

  3. Ortiz-Villavicencio & Sant’Anna (2025): Staggered triple-difference (DDD) with group-time ATT

Running Callaway-Sant’Anna (CS) and Sun-Abraham (SA) together provides a useful robustness check: when they agree, results are more credible.

CallawaySantAnna

Callaway & Sant’Anna (2021) estimator for heterogeneous treatment timing.

class diff_diff.CallawaySantAnna

Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin

Callaway-Sant’Anna (2021) estimator for staggered Difference-in-Differences.

This estimator handles DiD designs with variation in treatment timing (staggered adoption) and heterogeneous treatment effects. It avoids the bias of traditional two-way fixed effects (TWFE) estimators by:

  1. Computing group-time average treatment effects ATT(g,t) for each cohort g (units first treated in period g) and time t.

  2. Aggregating these to summary measures (overall ATT, event study, etc.) using appropriate weights.
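As a rough illustration of step 2, the sketch below turns a dictionary of ATT(g,t) estimates into per-cohort ("group") and overall ("simple") summaries, weighting each post-treatment cell by its cohort's size. It assumes no anticipation (post-treatment means t ≥ g) and plain unit-count weights. The helper `aggregate_att` is hypothetical; the library's own aggregation and its standard errors may differ in detail.

```python
from collections import defaultdict


def aggregate_att(att_gt, cohort_sizes):
    """Aggregate ATT(g,t) into group and simple summaries (hypothetical helper).

    att_gt:       {(g, t): estimate}, may include pre-treatment cells (t < g)
    cohort_sizes: {g: number of units first treated in period g}
    Assumes no anticipation, so post-treatment cells are those with t >= g.
    """
    post = {(g, t): est for (g, t), est in att_gt.items() if t >= g}

    # "group": unweighted mean of each cohort's post-treatment effects
    by_group = defaultdict(list)
    for (g, _t), est in post.items():
        by_group[g].append(est)
    group_att = {g: sum(v) / len(v) for g, v in by_group.items()}

    # "simple": every post-treatment cell weighted by its cohort's size
    total_weight = sum(cohort_sizes[g] for (g, _t) in post)
    simple_att = sum(cohort_sizes[g] * est for (g, _t), est in post.items()) / total_weight
    return group_att, simple_att
```

Because each cell carries its cohort's weight, larger cohorts and cohorts observed for more post-treatment periods contribute more to the simple average.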

Parameters:
  • control_group (str, default="never_treated") – Which units to use as controls:

    • “never_treated”: Use only never-treated units (recommended)

    • “not_yet_treated”: Use never-treated and not-yet-treated units

  • anticipation (int, default=0) – Number of periods before treatment where effects may occur. Set to > 0 if treatment effects can begin before the official treatment date.

  • estimation_method (str, default="dr") – Estimation method:

    • “dr”: Doubly robust (recommended)

    • “ipw”: Inverse probability weighting

    • “reg”: Outcome regression

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • cluster (str, optional) – Column name for cluster-robust standard errors. Defaults to unit-level clustering.

  • n_bootstrap (int, default=0) –

    Number of bootstrap iterations for inference. If 0, uses analytical standard errors. Recommended: 999 or more for reliable inference.

    Note

    Memory usage: the bootstrap stores all weights in memory as a (n_bootstrap, n_units) float64 array, which can be significant for large datasets:

    • 1K bootstrap × 10K units ≈ 80 MB

    • 10K bootstrap × 100K units ≈ 8 GB

    Consider reducing n_bootstrap if memory is constrained.

  • bootstrap_weights (str, default="rademacher") – Type of weights for the multiplier bootstrap:

    • “rademacher”: +1/-1 with equal probability (standard choice)

    • “mammen”: Two-point distribution (asymptotically valid, matches skewness)

    • “webb”: Six-point distribution (recommended when n_clusters < 20)

  • seed (int, optional) – Random seed for reproducibility.

  • rank_deficient_action (str, default="warn") –

    Action when design matrix is rank-deficient (linearly dependent columns):

    • “warn”: Issue warning and drop linearly dependent columns (default)

    • “error”: Raise ValueError

    • “silent”: Drop columns silently without warning

  • base_period (str, default="varying") –

    Method for selecting the base (reference) period for computing ATT(g,t). Options:

    • “varying”: For pre-treatment periods (t < g - anticipation), use t-1 as the base (consecutive comparisons). For post-treatment periods, use g-1-anticipation. Requires t-1 to exist in the data.

    • “universal”: Always use g-1-anticipation as the base period.

    Both options produce identical post-treatment effects; they differ only in the pre-treatment comparisons. Mirrors the base_period argument of R’s did::att_gt().

  • cband (bool, default=True) – Whether to compute simultaneous confidence bands (sup-t) for event study aggregation. Requires n_bootstrap > 0. When True, results include cband_crit_value and per-event-time cband_conf_int entries controlling family-wise error rate.

  • pscore_trim (float, default=0.01) – Trimming bound for propensity scores. Scores are clipped to [pscore_trim, 1 - pscore_trim] before weight computation in IPW and DR estimation. Must be in (0, 0.5).

  • panel (bool, default=True) – Whether the data are a panel (balanced or unbalanced) in which units are observed across multiple time periods. Set to False for stationary repeated cross-sections where each observation has a unique unit ID and units do not repeat across periods. Requires that the cross-sectional samples are drawn from the same population in each period (stationarity). Uses cross-sectional DRDID (Sant’Anna & Zhao 2020, Section 4) with per-observation influence functions.

  • epv_threshold (float, default=10) – Events Per Variable threshold for propensity score logit. When the ratio of minority-class observations to predictor variables (excluding intercept) falls below this value, a warning is emitted (or ValueError raised if rank_deficient_action="error"). Based on Peduzzi et al. (1996). Only applies to IPW and DR estimation methods. Use diagnose_propensity() for a pre-estimation check across all cohorts.

  • pscore_fallback (str, default="error") –

    Action when propensity score estimation fails entirely (LinAlgError or ValueError from IRLS):

    • “error”: Raise the exception (default). Ensures the user is aware of estimation failures.

    • “unconditional”: Fall back to an unconditional propensity score with a warning. For IPW, this drops all covariates. For DR, the propensity model becomes unconditional but the outcome regression still uses covariates.

    When rank_deficient_action="error", errors are always re-raised regardless of this setting.
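The multiplier weight distributions listed under bootstrap_weights, and the memory figures in the note on n_bootstrap, can be sketched with NumPy. This illustrates the standard Rademacher, Mammen, and Webb distributions (all mean 0, variance 1); it is not the library's internal generator, and both helper names are hypothetical.

```python
import numpy as np


def multiplier_weights(n_bootstrap, n_units, kind="rademacher", seed=None):
    """Draw an (n_bootstrap, n_units) float64 matrix of mean-0, variance-1 weights."""
    rng = np.random.default_rng(seed)
    if kind == "rademacher":
        support = np.array([-1.0, 1.0])
        probs = np.array([0.5, 0.5])
    elif kind == "mammen":
        s5 = np.sqrt(5.0)
        support = np.array([-(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0])
        probs = np.array([(s5 + 1.0) / (2.0 * s5), (s5 - 1.0) / (2.0 * s5)])
    elif kind == "webb":
        # six points: +/-sqrt(3/2), +/-1, +/-sqrt(1/2), each with probability 1/6
        half = np.sqrt(np.array([1.5, 1.0, 0.5]))
        support = np.concatenate([-half, half[::-1]])
        probs = np.full(6, 1.0 / 6.0)
    else:
        raise ValueError(f"unknown weight type: {kind!r}")
    return rng.choice(support, size=(n_bootstrap, n_units), p=probs)


def bootstrap_memory_bytes(n_bootstrap, n_units):
    """Bytes needed to hold the full weight matrix as float64 (8 bytes per entry)."""
    return n_bootstrap * n_units * np.dtype(np.float64).itemsize
```

For example, bootstrap_memory_bytes(1_000, 10_000) gives 80,000,000 bytes (about 80 MB), matching the note.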

results_

Estimation results after calling fit().

Type:

CallawaySantAnnaResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import CallawaySantAnna
>>>
>>> # Panel data with staggered treatment
>>> # 'first_treat' = period when unit was first treated (0 if never treated)
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated, else first treatment period
... })
>>>
>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>>
>>> results.print_summary()
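The `[...]` placeholders above stand for your own data. To experiment with the expected long format (one row per unit-period, first_treat equal to 0 for never-treated units), a small synthetic panel can be generated as below. `make_staggered_panel` is a hypothetical helper, and its data-generating process (unit effects, a common linear trend, a constant effect of 2.0, small noise) is an arbitrary assumption chosen for illustration.

```python
import numpy as np
import pandas as pd


def make_staggered_panel(n_units=60, n_periods=8, cohorts=(4, 6), effect=2.0, seed=0):
    """Balanced long-format panel with staggered adoption (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each unit is never treated (0) or belongs to one of the adoption cohorts
    first_treat = rng.choice(np.array([0, *cohorts]), size=n_units)
    unit_fe = rng.normal(size=n_units)
    rows = []
    for i in range(n_units):
        for t in range(1, n_periods + 1):
            treated = first_treat[i] > 0 and t >= first_treat[i]
            y = unit_fe[i] + 0.5 * t + effect * float(treated) + rng.normal(scale=0.1)
            rows.append((i, t, y, int(first_treat[i])))
    return pd.DataFrame(rows, columns=["unit", "time", "outcome", "first_treat"])
```

The resulting frame can be passed to CallawaySantAnna().fit(df, outcome='outcome', unit='unit', time='time', first_treat='first_treat') as in the example above.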

With event study aggregation:

>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  aggregate='event_study')
>>>
>>> # Plot event study
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

With covariate adjustment (conditional parallel trends):

>>> # When parallel trends only holds conditional on covariates
>>> cs = CallawaySantAnna(estimation_method='dr')  # doubly robust
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
>>>
>>> # DR is recommended: consistent if either outcome model
>>> # or propensity model is correctly specified
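Both IPW and DR rely on estimated propensity scores, which pscore_trim clips to [pscore_trim, 1 - pscore_trim] before weights are computed. A minimal sketch of that step, assuming the scores are already estimated and using the common odds weights p/(1-p) for control units, normalized to sum to one. The helper name is hypothetical, and the library's exact weight formulas may differ.

```python
import numpy as np


def trimmed_control_weights(pscores, is_control, trim=0.01):
    """Clip propensity scores and build normalized odds weights for controls.

    Hypothetical helper: scores outside [trim, 1 - trim] are clipped, then each
    control unit gets weight p / (1 - p); treated units get weight 0.
    """
    if not 0.0 < trim < 0.5:
        raise ValueError("trim must lie in (0, 0.5)")
    p = np.clip(np.asarray(pscores, dtype=float), trim, 1.0 - trim)
    odds = p / (1.0 - p)
    weights = np.where(np.asarray(is_control, dtype=bool), odds, 0.0)
    return p, weights / weights.sum()
```

Without the clip, a control score of 0.999 would produce an odds weight of 999 rather than 99, letting a single control unit dominate.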

Notes

The key innovation of Callaway & Sant’Anna (2021) is the disaggregated approach: instead of estimating a single treatment effect, they estimate ATT(g,t) for each cohort-time pair. This avoids the “forbidden comparison” problem where already-treated units act as controls.
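Without covariates, each ATT(g,t) reduces to a 2x2 DiD: the average change in outcomes for cohort g between the base period and t, minus the same change for the comparison group. A pandas sketch on a balanced panel, using never-treated units (first_treat == 0) as controls. `att_gt_2x2` is a hypothetical helper, not the library's estimator, and it omits covariates, weights, and inference.

```python
import pandas as pd


def att_gt_2x2(df, g, t, base, unit="unit", time="time",
               outcome="outcome", first_treat="first_treat"):
    """Unconditional 2x2 DiD for cohort g at time t against never-treated units."""
    wide = df.pivot(index=unit, columns=time, values=outcome)  # units x periods
    cohort = df.groupby(unit)[first_treat].first()             # one cohort per unit
    change = wide[t] - wide[base]                               # Y_t - Y_base per unit
    treated_change = change[cohort == g].mean()
    control_change = change[cohort == 0].mean()
    return treated_change - control_change
```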

The ATT(g,t) is identified under parallel trends conditional on covariates:

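$$E[Y_t(0) - Y_{g-1}(0) \mid X, G = g] = E[Y_t(0) - Y_{g-1}(0) \mid X, C = 1]$$

where G = g indicates membership in treatment cohort g, C = 1 indicates control units, and X denotes the covariates. This uses g-1 as the base period, which applies to post-treatment periods (t ≥ g); with anticipation > 0 the base shifts to g-1-anticipation. With base_period="varying" (the default), pre-treatment periods instead use t-1 as the base, giving consecutive comparisons that are useful for parallel trends diagnostics.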
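The base-period rule can be written down directly. A small sketch of the selection logic as described for the base_period parameter; `base_period_for` is a hypothetical helper and does not check whether the chosen period exists in the data.

```python
def base_period_for(g, t, anticipation=0, base_period="varying"):
    """Return the comparison (base) period used for ATT(g, t)."""
    if base_period not in ("varying", "universal"):
        raise ValueError("base_period must be 'varying' or 'universal'")
    reference = g - 1 - anticipation
    if base_period == "varying" and t < g - anticipation:
        # pre-treatment under "varying": compare against the previous period
        return t - 1
    # post-treatment under either rule, and every period under "universal"
    return reference
```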

References

Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

Methods

fit(data, outcome, unit, time, first_treat)

Fit the Callaway-Sant'Anna estimator.

get_params()

Get estimator parameters (sklearn-compatible).

set_params(**params)

Set estimator parameters (sklearn-compatible).

__init__(control_group='never_treated', anticipation=0, estimation_method='dr', alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights=None, seed=None, rank_deficient_action='warn', base_period='varying', cband=True, pscore_trim=0.01, panel=True, epv_threshold=10, pscore_fallback='error')
Parameters:
  • control_group (str)

  • anticipation (int)

  • estimation_method (str)

  • alpha (float)

  • cluster (str | None)

  • n_bootstrap (int)

  • bootstrap_weights (str | None)

  • seed (int | None)

  • rank_deficient_action (str)

  • base_period (str)

  • cband (bool)

  • pscore_trim (float)

  • panel (bool)

  • epv_threshold (float)

  • pscore_fallback (str)

anticipation: int
alpha: float
n_bootstrap: int
bootstrap_weights: str
seed: int | None
base_period: str
results_: CallawaySantAnnaResults | None
diagnose_propensity(df, outcome, unit, time, first_treat, covariates=None)

Check Events Per Variable (EPV) across all cohorts without estimation.

Examines the data to identify cohorts where propensity score logit may be unreliable due to too few events per covariate. Based on Peduzzi et al. (1996).

This is a raw-count heuristic: it uses total cohort/control unit counts without filtering for missing outcomes, zero survey weights, or period-specific validity. The actual fit-time EPV (stored in results.epv_diagnostics) may be lower because fit() operates on the valid base/post outcome pair and the positive-weight effective sample. Use this method as a quick pre-check; rely on results.epv_diagnostics for authoritative per-cell EPV.

Parameters:
  • df (DataFrame) – Same arguments as fit().

  • outcome (str) – Same arguments as fit().

  • unit (str) – Same arguments as fit().

  • time (str) – Same arguments as fit().

  • first_treat (str) – Same arguments as fit().

  • covariates (List[str] | None) – Same arguments as fit().

Returns:

Per-cohort EPV diagnostics with columns: group, n_treated, n_control, n_covariates, n_params, epv, status.

Return type:

pd.DataFrame
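The EPV heuristic behind diagnose_propensity() and epv_threshold is simple arithmetic: the minority-class count (the smaller of the treated and control counts) divided by the number of covariates, excluding the intercept. A sketch of that ratio; the "low"/"ok" labels are illustrative and not necessarily the values used in the status column, and the no-covariate case is an assumption of this sketch.

```python
def events_per_variable(n_treated, n_control, n_covariates):
    """Minority-class count per covariate (intercept excluded)."""
    if n_covariates <= 0:
        # assumption here: with no covariates the logit has nothing to overfit
        return float("inf")
    return min(n_treated, n_control) / n_covariates


def epv_flag(n_treated, n_control, n_covariates, threshold=10.0):
    """Return the EPV and an illustrative label relative to the threshold."""
    epv = events_per_variable(n_treated, n_control, n_covariates)
    return epv, ("low" if epv < threshold else "ok")
```

For example, a cohort of 12 treated units against 500 controls with 3 covariates has EPV 4.0, well below the default threshold of 10.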

fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None, survey_design=None)

Fit the Callaway-Sant’Anna estimator.

Parameters:
  • data (pd.DataFrame) – Panel data with unit and time identifiers. For repeated cross-sections (panel=False), each observation should have a unique unit ID — units do not repeat across periods.

  • outcome (str) – Name of outcome variable column.

  • unit (str) – Name of unit identifier column.

  • time (str) – Name of time period column.

  • first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.

  • covariates (list, optional) – List of covariate column names for conditional parallel trends.

  • aggregate (str, optional) – How to aggregate group-time effects:

    • None: Only compute ATT(g,t) (default)

    • “simple”: Simple weighted average (overall ATT)

    • “event_study”: Aggregate by relative time (event study)

    • “group”: Aggregate by treatment cohort

    • “all”: Compute all aggregations

  • balance_e (int, optional) – For event study, balance the panel at relative time e. Ensures all groups contribute to each relative period.

  • survey_design (SurveyDesign, optional) – Survey design specification. Supports pweight with strata/PSU/FPC. Aggregated SEs (overall, event study, group) use design-based variance via compute_survey_if_variance(). All estimation methods (reg, ipw, dr) support covariates + survey. For repeated cross-sections (panel=False), survey weights are per-observation (no unit-level collapse).

Returns:

Object containing all estimation results.

Return type:

CallawaySantAnnaResults

Raises:

ValueError – If required columns are missing or data validation fails.

get_params()

Get estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Return type:

CallawaySantAnna

summary()

Get summary of estimation results.

Return type:

str

print_summary()

Print summary to stdout.

Return type:

None

CallawaySantAnnaResults

Results container for Callaway-Sant’Anna estimation.

class diff_diff.CallawaySantAnnaResults

Bases: object

Results from Callaway-Sant’Anna (2021) staggered DiD estimation.

This class stores group-time average treatment effects ATT(g,t) and provides methods for aggregation into summary measures.

group_time_effects

Dictionary mapping (group, time) tuples to effect dictionaries.

Type:

dict

overall_att

Overall average treatment effect (weighted average of ATT(g,t)).

Type:

float

overall_se

Standard error of overall ATT.

Type:

float

overall_p_value

P-value for overall ATT.

Type:

float

overall_conf_int

Confidence interval for overall ATT.

Type:

tuple

groups

List of treatment cohorts (first treatment periods).

Type:

list

time_periods

List of all time periods.

Type:

list

n_obs

Total number of observations.

Type:

int

n_treated_units

Number of ever-treated units.

Type:

int

n_control_units

Number of never-treated units (excludes not-yet-treated dynamic controls).

Type:

int

event_study_effects

Effects aggregated by relative time (event study).

Type:

dict, optional

group_effects

Effects aggregated by treatment cohort.

Type:

dict, optional

pscore_trim

Propensity score trimming bound used during estimation.

Type:

float

Methods

summary([alpha])

Generate formatted summary of estimation results.

to_dataframe([level])

Convert results to DataFrame.

group_time_effects: Dict[Tuple[Any, Any], Dict[str, Any]]
overall_att: float
overall_se: float
overall_t_stat: float
overall_p_value: float
overall_conf_int: Tuple[float, float]
groups: List[Any]
time_periods: List[Any]
n_obs: int
n_treated_units: int
n_control_units: int
alpha: float = 0.05
control_group: str = 'never_treated'
base_period: str = 'varying'
anticipation: int = 0
panel: bool = True
event_study_effects: Dict[int, Dict[str, Any]] | None = None
group_effects: Dict[Any, Dict[str, Any]] | None = None
influence_functions: np.ndarray | None = None
event_study_vcov: np.ndarray | None = None
event_study_vcov_index: list | None = None
bootstrap_results: CSBootstrapResults | None = None
cband_crit_value: float | None = None
pscore_trim: float = 0.01
survey_metadata: Any | None = None
epv_diagnostics: Dict[Tuple[Any, Any], Dict[str, Any]] | None = None
epv_threshold: float = 10
pscore_fallback: str = 'error'
property att: float
property se: float
property conf_int: Tuple[float, float]
property p_value: float
property t_stat: float
__repr__()

Concise string representation.

Return type:

str

property coef_var: float

Coefficient of variation: SE / abs(overall ATT). NaN when the ATT is 0 or the SE is non-finite.

Type:

float
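The coef_var rule as stated can be reproduced in a few lines; this is a sketch of the documented definition, not the property's source.

```python
import math


def coef_var(att, se):
    """SE / |ATT|; NaN when ATT is 0 or the SE is non-finite."""
    if att == 0 or not math.isfinite(se):
        return float("nan")
    return se / abs(att)
```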

summary(alpha=None)

Generate formatted summary of estimation results.

Parameters:

alpha (float, optional) – Significance level. Defaults to alpha used in estimation.

Returns:

Formatted summary.

Return type:

str

epv_summary(show_all=False)

Return per-cell (group-time) EPV diagnostics as a DataFrame.

Parameters:

show_all (bool, default False) – If False, only show cells with low EPV. If True, show all cells.

Returns:

Columns: group, time, epv, n_events, n_params, is_low.

Return type:

pd.DataFrame

print_summary(alpha=None)

Print summary to stdout.

Parameters:

alpha (float | None)

Return type:

None

to_dataframe(level='group_time')

Convert results to DataFrame.

Parameters:

level (str, default="group_time") – Level of aggregation: “group_time”, “event_study”, or “group”.

Returns:

Results as DataFrame.

Return type:

pd.DataFrame

property is_significant: bool

Check if overall ATT is significant.

property significance_stars: str

Significance stars for overall ATT.

__init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, groups, time_periods, n_obs, n_treated_units, n_control_units, alpha=0.05, control_group='never_treated', base_period='varying', anticipation=0, panel=True, event_study_effects=None, group_effects=None, influence_functions=None, event_study_vcov=None, event_study_vcov_index=None, bootstrap_results=None, cband_crit_value=None, pscore_trim=0.01, survey_metadata=None, epv_diagnostics=None, epv_threshold=10, pscore_fallback='error')
Return type:

None

GroupTimeEffect

Container for individual group-time ATT(g,t) effects.

class diff_diff.GroupTimeEffect

Bases: object

Treatment effect for a specific group-time combination.

group

The treatment cohort (first treatment period).

Type:

any

time

The time period.

Type:

any

effect

The ATT(g,t) estimate.

Type:

float

se

Standard error.

Type:

float

n_treated

Number of treated observations.

Type:

int

n_control

Number of control observations.

Type:

int

group: Any
time: Any
effect: float
se: float
t_stat: float
p_value: float
conf_int: Tuple[float, float]
n_treated: int
n_control: int
property is_significant: bool

Check if effect is significant at 0.05 level.

property significance_stars: str

Return significance stars based on p-value.

__init__(group, time, effect, se, t_stat, p_value, conf_int, n_treated, n_control)
Return type:

None
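GroupTimeEffect stores t_stat, p_value, and conf_int alongside effect and se. Under a normal approximation (an assumption for this sketch; bootstrap or small-sample inference would give different numbers), they relate as follows, using only the standard library. `normal_inference` is a hypothetical helper.

```python
from statistics import NormalDist


def normal_inference(effect, se, alpha=0.05):
    """t-statistic, two-sided p-value, (1 - alpha) CI, and significance flag."""
    std_normal = NormalDist()
    t_stat = effect / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    conf_int = (effect - crit * se, effect + crit * se)
    return t_stat, p_value, conf_int, p_value < alpha
```

For example, an effect of 2.0 with SE 1.0 gives t = 2, p ≈ 0.0455, and a 95% interval of roughly (0.04, 3.96).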

SunAbraham

Sun & Abraham (2021) interaction-weighted estimator for staggered DiD.

This estimator provides event-study coefficients using a saturated regression with cohort-by-relative-time interactions. It uses interaction-weighting to aggregate cohort-specific effects into event study estimates.

Key differences from Callaway-Sant’Anna:

  • Uses regression-based approach rather than 2x2 DiD comparisons

  • Weights cohort-specific effects by share of each cohort in treated population

  • Can be more efficient when treatment effects are homogeneous

  • Running both provides a useful robustness check

Reference: Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.

class diff_diff.SunAbraham

Bases: object

Sun-Abraham (2021) interaction-weighted estimator for staggered DiD.

This estimator provides event-study coefficients using a saturated TWFE regression with cohort × relative-time interactions, following the methodology in Sun & Abraham (2021).

The estimation procedure follows three steps:

  1. Run a saturated TWFE regression with cohort × relative-time dummies.

  2. Compute cohort shares (weights) at each relative time.

  3. Aggregate cohort-specific effects using interaction weights.

This avoids the negative weighting problem of standard TWFE and provides consistent event-study estimates under treatment effect heterogeneity.

Parameters:
  • control_group (str, default="never_treated") – Which units to use as controls:

    • “never_treated”: Use only never-treated units (recommended)

    • “not_yet_treated”: Use never-treated and not-yet-treated units

  • anticipation (int, default=0) – Number of periods before treatment where effects may occur.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • cluster (str, optional) – Column name for cluster-robust standard errors. If None, standard errors are clustered at the unit level by default, unless vcov_type is explicitly set to "hc2" or "classical". In that case the unit auto-cluster is dropped, because both are one-way families and the linalg validator rejects them when cluster_ids are supplied. Use vcov_type="hc1" (default) or vcov_type="hc2_bm" for cluster-robust inference; the latter routes to CR2 Bell-McCaffrey at the cluster level.

  • n_bootstrap (int, default=0) – Number of bootstrap iterations for inference. If 0, uses analytical cluster-robust standard errors.

  • seed (int, optional) – Random seed for reproducibility.

  • rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient (linearly dependent columns):

    • “warn”: Issue warning and drop linearly dependent columns (default)

    • “error”: Raise ValueError

    • “silent”: Drop columns silently without warning

  • vcov_type ({"classical", "hc1", "hc2", "hc2_bm"}, default "hc1") –

    Variance-covariance family for analytical inference. Defaults to "hc1", which preserves prior behavior bit-for-bit (SA historically hard-coded HC1).

    • "classical": homoskedastic OLS standard errors. One-way only (linalg validator rejects classical + cluster_ids); the unit auto-cluster is dropped when classical is explicitly opted into.

    • "hc1": Eicker-Huber-White HC1 finite-sample correction (default; cluster-robust when cluster= is set or the unit auto-cluster fires).

    • "hc2": Eicker-Huber-White HC2 leverage correction. One-way only; the linalg validator rejects combining hc2 with clusters. The unit auto-cluster is dropped when hc2 is explicitly opted into.

    • "hc2_bm": HC2 + Bell-McCaffrey CR2 Satterthwaite DOF for cluster-robust inference. Routes to CR2-BM at the cluster level; preserves the auto-cluster default.

    When vcov_type ∈ {"classical", "hc2", "hc2_bm"}, the saturated regression switches from the within-transform path to a full-dummy [intercept + interactions + covariates + unit_dummies + time_dummies] build. For hc2 and hc2_bm, the Frisch-Waugh-Lovell theorem preserves coefficients but NOT the hat matrix, so HC2 leverage and the BM Satterthwaite DOF must be computed on the full FE projection. classical also routes through the full-dummy build so that the (n-k) finite-sample correction in σ̂² (X'X)^{-1} matches R’s lm() interpretation. Empirically matches lm(...) + sandwich::vcovHC(type="HC2") and clubSandwich::vcovCR(..., type="CR2") at atol=1e-10.

    "hc1" keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix); empirically close to fixest::sunab(cluster=~unit). See REGISTRY.md for the documented HC1 finite-sample-correction deviation.

    Survey designs (survey_design=) are rejected for vcov_type ∈ {"classical", "hc2", "hc2_bm"} because the survey-design Taylor Series Linearization (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use vcov_type="hc1" (default) for survey designs.

    conley spatial-HAC is not yet wired up for SunAbraham; see TODO.md.
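The distinction drawn above between HC1 and HC2 comes down to leverage: HC1 rescales squared residuals by n/(n-k), while HC2 divides each squared residual by 1 - h_ii, where h_ii is the diagonal of the hat matrix. Because h_ii depends on every column of the design, including the fixed-effect dummies, it cannot be read off the within-transformed regression. A one-way (unclustered) NumPy sketch of the three families; `ols_vcov` is a hypothetical helper, not the library's linalg code, and CR2/Bell-McCaffrey is not shown.

```python
import numpy as np


def ols_vcov(X, y, kind="hc1"):
    """OLS coefficients and their classical, HC1, or HC2 covariance matrix."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    if kind == "classical":
        sigma2 = resid @ resid / (n - k)  # (n - k) finite-sample correction
        return beta, sigma2 * xtx_inv
    if kind == "hc1":
        u2 = resid**2 * n / (n - k)
    elif kind == "hc2":
        # leverage h_ii = diag(X (X'X)^{-1} X'); needs the full design matrix
        leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
        u2 = resid**2 / (1.0 - leverage)
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    meat = X.T @ (X * u2[:, None])
    return beta, xtx_inv @ meat @ xtx_inv  # sandwich: bread @ meat @ bread
```

With an intercept-only design every h_ii equals 1/n, so all three families collapse to the familiar s²/n variance of a sample mean.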

results_

Estimation results after calling fit().

Type:

SunAbrahamResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage:

>>> import pandas as pd
>>> from diff_diff import SunAbraham
>>>
>>> # Panel data with staggered treatment
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated
... })
>>>
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>> results.print_summary()

With covariates:

>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])

Notes

The Sun-Abraham estimator uses a saturated regression approach:

$$Y_{it} = \alpha_i + \lambda_t + \sum_g \sum_e \delta_{g,e} \, \mathbf{1}(G_i = g) \, D_{it}^e + X_{it}'\gamma + \varepsilon_{it}$$

where:

  • α_i = unit fixed effects

  • λ_t = time fixed effects

  • G_i = unit i’s treatment cohort (first treatment period)

  • D_{it}^e = indicator for being e periods from treatment

  • δ_{g,e} = cohort-specific effect (CATT) at relative time e

  • X_{it}'γ = covariate adjustment (present when covariates are supplied)

The event-study coefficients are then computed as:

$$\beta_e = \sum_g w_{g,e} \, \delta_{g,e}$$

where w_{g,e} is the share of cohort g in the treated population at relative time e (interaction weights).
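The interaction-weighting step can be sketched in plain Python: for each relative time e, the weights are shares among the cohorts that have an estimated δ_{g,e}, and β_e is the weighted sum. Using each cohort's unit count as its share is an assumption of this sketch, and `interaction_weighted` is a hypothetical helper. Its weight dictionary has the same relative time → cohort → weight nesting as the cohort_weights field of SunAbrahamResults.

```python
def interaction_weighted(cohort_effects, cohort_sizes):
    """Aggregate cohort-specific effects delta[(g, e)] into event-study beta[e].

    cohort_effects: {(g, e): delta_ge}
    cohort_sizes:   {g: number of units in cohort g}
    """
    by_e = {}
    for (g, e), delta in cohort_effects.items():
        by_e.setdefault(e, []).append((g, delta))

    beta, weights = {}, {}
    for e, items in sorted(by_e.items()):
        # shares are computed only among cohorts observed at relative time e
        total = sum(cohort_sizes[g] for g, _ in items)
        w = {g: cohort_sizes[g] / total for g, _ in items}
        weights[e] = w
        beta[e] = sum(w[g] * delta for g, delta in items)
    return beta, weights
```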

Compared to Callaway-Sant’Anna:

  • SA uses a saturated regression; CS uses 2x2 DiD comparisons

  • SA can be more efficient when the model is correctly specified

  • Both are consistent under heterogeneous treatment effects

  • Running both provides a useful robustness check

References

Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.

Methods

fit(data, outcome, unit, time, first_treat)

Fit the Sun-Abraham estimator using saturated regression.

get_params()

Get estimator parameters (sklearn-compatible).

set_params(**params)

Set estimator parameters (sklearn-compatible).

summary()

Get summary of estimation results.

print_summary()

Print summary to stdout.

__init__(control_group='never_treated', anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, seed=None, rank_deficient_action='warn', vcov_type='hc1')
Parameters:
  • control_group (str)

  • anticipation (int)

  • alpha (float)

  • cluster (str | None)

  • n_bootstrap (int)

  • seed (int | None)

  • rank_deficient_action (str)

  • vcov_type (str)

results_: SunAbrahamResults | None
fit(data, outcome, unit, time, first_treat, covariates=None, survey_design=None)

Fit the Sun-Abraham estimator using saturated regression.

Parameters:
  • data (pd.DataFrame) – Panel data with unit and time identifiers.

  • outcome (str) – Name of outcome variable column.

  • unit (str) – Name of unit identifier column.

  • time (str) – Name of time period column.

  • first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.

  • covariates (list, optional) – List of covariate column names to include in regression.

  • survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. Supports weighted estimation and Taylor series linearization variance with strata, PSU, and FPC.

Returns:

Object containing all estimation results.

Return type:

SunAbrahamResults

Raises:

ValueError – If required columns are missing or data validation fails.

get_params()

Get estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Return type:

SunAbraham

summary()

Get summary of estimation results.

Return type:

str

print_summary()

Print summary to stdout.

Return type:

None

SunAbrahamResults

Results container for Sun-Abraham estimation.

class diff_diff.SunAbrahamResults

Bases: object

Results from Sun-Abraham (2021) interaction-weighted estimation.

event_study_effects

Dictionary mapping relative time to effect dictionaries with keys: ‘effect’, ‘se’, ‘t_stat’, ‘p_value’, ‘conf_int’, ‘n_groups’.

Type:

dict

overall_att

Overall average treatment effect (weighted average of post-treatment effects).

Type:

float

overall_se

Standard error of overall ATT.

Type:

float

overall_t_stat

T-statistic for overall ATT.

Type:

float

overall_p_value

P-value for overall ATT.

Type:

float

overall_conf_int

Confidence interval for overall ATT.

Type:

tuple

cohort_weights

Dictionary mapping relative time to cohort weight dictionaries.

Type:

dict

groups

List of treatment cohorts (first treatment periods).

Type:

list

time_periods

List of all time periods.

Type:

list

n_obs

Total number of observations.

Type:

int

n_treated_units

Number of ever-treated units.

Type:

int

n_control_units

Number of never-treated units.

Type:

int

alpha

Significance level used for confidence intervals.

Type:

float

control_group

Type of control group used.

Type:

str

vcov_type

Variance-covariance family from the fit-time configuration (classical, hc1, hc2, or hc2_bm). Note: when a survey_design= is supplied, the survey-design Taylor Series Linearization (or replicate-weight refit) variance overrides this analytical family — the field still records the configured value but survey_metadata indicates the survey path was active. Likewise, on bootstrap fits (n_bootstrap > 0) the SE comes from the pairs bootstrap (or Rao-Wu rescaled bootstrap under stratified / PSU survey designs), not the analytical family.

Type:

str

Methods

summary([alpha])

Generate formatted summary of estimation results.

print_summary([alpha])

Print summary to stdout.

to_dataframe([level])

Convert results to DataFrame.

event_study_effects: Dict[int, Dict[str, Any]]
overall_att: float
overall_se: float
overall_t_stat: float
overall_p_value: float
overall_conf_int: Tuple[float, float]
cohort_weights: Dict[int, Dict[Any, float]]
groups: List[Any]
time_periods: List[Any]
n_obs: int
n_treated_units: int
n_control_units: int
alpha: float = 0.05
control_group: str = 'never_treated'
vcov_type: str = 'hc1'
anticipation: int = 0
bootstrap_results: SABootstrapResults | None = None
cohort_effects: Dict[Tuple[Any, int], Dict[str, Any]] | None = None
survey_metadata: Any | None = None
event_study_vcov: ndarray | None = None
event_study_vcov_index: list | None = None
property att: float
property se: float
property conf_int: Tuple[float, float]
property p_value: float
property t_stat: float
__repr__()

Concise string representation.

Return type:

str

property coef_var: float

Coefficient of variation: SE / abs(overall ATT). NaN when the ATT is 0 or the SE is non-finite.

Type:

float

summary(alpha=None)

Generate formatted summary of estimation results.

Parameters:

alpha (float, optional) – Significance level. Defaults to alpha used in estimation.

Returns:

Formatted summary.

Return type:

str

print_summary(alpha=None)

Print summary to stdout.

Parameters:

alpha (float | None)

Return type:

None

to_dataframe(level='event_study')

Convert results to DataFrame.

Parameters:

level (str, default="event_study") – Level of aggregation: “event_study” or “cohort”.

Returns:

Results as DataFrame.

Return type:

pd.DataFrame

property is_significant: bool

Check if overall ATT is significant.

property significance_stars: str

Significance stars for overall ATT.

__init__(event_study_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, cohort_weights, groups, time_periods, n_obs, n_treated_units, n_control_units, alpha=0.05, control_group='never_treated', vcov_type='hc1', anticipation=0, bootstrap_results=None, cohort_effects=None, survey_metadata=None, event_study_vcov=None, event_study_vcov_index=None)
Return type:

None

SABootstrapResults

Bootstrap inference results for Sun-Abraham estimation.

class diff_diff.SABootstrapResults

Bases: object

Results from Sun-Abraham bootstrap inference.

n_bootstrap

Number of bootstrap iterations.

Type:

int

weight_type

Type of bootstrap used (always “pairs”).

Type:

str

alpha

Significance level used for confidence intervals.

Type:

float

overall_att_se

Bootstrap standard error for overall ATT.

Type:

float

overall_att_ci

Bootstrap confidence interval for overall ATT.

Type:

Tuple[float, float]

overall_att_p_value

Bootstrap p-value for overall ATT.

Type:

float

event_study_ses

Bootstrap SEs for event study effects.

Type:

Dict[int, float]

event_study_cis

Bootstrap CIs for event study effects.

Type:

Dict[int, Tuple[float, float]]

event_study_p_values

Bootstrap p-values for event study effects.

Type:

Dict[int, float]

bootstrap_distribution

Full bootstrap distribution of overall ATT.

Type:

Optional[np.ndarray]

n_bootstrap: int
weight_type: str
alpha: float
overall_att_se: float
overall_att_ci: Tuple[float, float]
overall_att_p_value: float
event_study_ses: Dict[int, float]
event_study_cis: Dict[int, Tuple[float, float]]
event_study_p_values: Dict[int, float]
bootstrap_distribution: ndarray | None = None
__init__(n_bootstrap, weight_type, alpha, overall_att_se, overall_att_ci, overall_att_p_value, event_study_ses, event_study_cis, event_study_p_values, bootstrap_distribution=None)
Return type:

None
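SABootstrapResults reports a pairs bootstrap: resample whole observations with replacement, re-estimate, and summarize the spread of the replicates. A generic sketch assuming one row per resampling unit and a user-supplied statistic, with a percentile interval as one common choice. `pairs_bootstrap` is hypothetical; the library's resampling unit and its p-value computation are not reproduced here.

```python
import numpy as np


def pairs_bootstrap(rows, statistic, n_bootstrap=999, alpha=0.05, seed=None):
    """Resample rows with replacement and recompute `statistic` on each draw.

    rows: array of shape (n_units, ...), one row per resampling unit (assumed).
    Returns the bootstrap SE, a percentile CI, and the replicate draws.
    """
    rng = np.random.default_rng(seed)
    n = rows.shape[0]
    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # sample units with replacement
        draws[b] = statistic(rows[idx])
    se = draws.std(ddof=1)
    lower, upper = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0])
    return se, (lower, upper), draws
```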

StaggeredTripleDifference

Ortiz-Villavicencio & Sant’Anna (2025) staggered triple-difference (DDD) estimator with group-time ATT identification under heterogeneous treatment timing.

class diff_diff.StaggeredTripleDifference[source]

Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin

Staggered Triple Difference (DDD) estimator.

Computes group-time average treatment effects ATT(g,t) for settings with staggered adoption and a binary eligibility dimension, using the three-DiD decomposition of Ortiz-Villavicencio & Sant’Anna (2025).
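Without covariates, the DDD for an enabling group g and a comparison group c equals two DiDs plus, minus a third. Each of the three contrasts the treated cell (S = g, Q = 1) with one other cell. The minimal sketch below checks that identity on cell means. The data layout and function names are hypothetical, and the estimator presumably applies the chosen estimation_method (dr, ipw, or reg) to each DiD rather than differencing raw means.

```python
from typing import Dict, Tuple

# Hypothetical layout: (group label, eligibility Q) -> (mean outcome in base period, mean outcome in period t).
Cells = Dict[Tuple[str, int], Tuple[float, float]]


def change(cells: Cells, group: str, q: int) -> float:
    """Mean outcome change for one (group, eligibility) cell."""
    base, current = cells[(group, q)]
    return current - base


def did(cells: Cells, a: Tuple[str, int], b: Tuple[str, int]) -> float:
    """2x2 DiD of cell a against comparison cell b."""
    return change(cells, *a) - change(cells, *b)


def ddd_three_did(cells: Cells, g: str, c: str) -> float:
    """DDD for the treated cell (g, Q=1) written as three DiDs that all share it."""
    treated = (g, 1)
    return did(cells, treated, (g, 0)) + did(cells, treated, (c, 1)) - did(cells, treated, (c, 0))


def ddd_direct(cells: Cells, g: str, c: str) -> float:
    """Textbook DDD: eligible-minus-ineligible change in g, minus the same contrast in c."""
    return (change(cells, g, 1) - change(cells, g, 0)) - (change(cells, c, 1) - change(cells, c, 0))
```

Expanding the three DiDs shows the treated cell's change enters twice with a plus sign and once with a minus sign, which leaves exactly the textbook DDD.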

Multiple comparison groups are combined via GMM-optimal (inverse-variance) weighting. Event study, group, and overall aggregations are supported.
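With several valid comparison groups for the same cell (g, t), each yields its own ATT(g,t) estimate. If those estimates are treated as uncorrelated, GMM-optimal weighting reduces to weighting each by the inverse of its variance; the full GMM version instead uses the inverse of their joint covariance matrix. The hypothetical helper below covers only that uncorrelated (diagonal) case.

```python
from typing import List, Sequence, Tuple


def inverse_variance_combine(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, List[float]]:
    """Combine uncorrelated estimates; return (combined estimate, its variance, weights)."""
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]  # more precise comparisons get more weight
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, 1.0 / total, weights
```

The combined variance, 1 over the summed precisions, is never larger than the smallest input variance, which is the point of pooling comparison groups.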

Parameters:
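  • estimation_method (str, default="dr") – Estimation method: “dr” (doubly robust), “ipw” (inverse probability weighting), or “reg” (regression adjustment).

  • control_group (str, default="notyettreated") – Comparison group used when forming the group-time DiDs.

  • alpha (float, default=0.05) – Significance level for confidence intervals.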

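  • anticipation (int, default=0) – Number of periods before treatment in which effects may already occur.

  • base_period (str, default="varying") – Base period selection: “varying” (pre-treatment periods are compared with the immediately preceding period; post-treatment periods with period g-1-anticipation) or “universal” (every period is compared with period g-1-anticipation).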

  • n_bootstrap (int, default=0) – Number of multiplier bootstrap repetitions. 0 disables bootstrap.

  • bootstrap_weights (str, default="rademacher") – Bootstrap weight distribution: “rademacher”, “mammen”, or “webb”.

  • seed (int or None, default=None) – Random seed for reproducibility.

  • cband (bool, default=True) – Whether to compute simultaneous confidence bands.

  • pscore_trim (float, default=0.01) – Propensity score trimming bound.

  • cluster (str or None, default=None) – Column name for cluster-robust standard errors.

  • rank_deficient_action (str, default="warn") – Action for rank-deficient design matrices: “warn”, “error”, “silent”.

  • epv_threshold (float, default=10) – Minimum events per variable for propensity score logistic regression. A warning is emitted when EPV falls below this threshold.

  • pscore_fallback (str, default="error") – Action when propensity score estimation fails: “error” (raise) or “unconditional” (fall back to unconditional propensity).
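The two base_period options differ only in which period each ATT(g,t) is compared against. A minimal sketch of the rule, assuming consecutive integer time periods (the helper name is hypothetical):

```python
def base_period_for(g: int, t: int, anticipation: int = 0, base_period: str = "varying") -> int:
    """Base (comparison) period for ATT(g, t); assumes consecutive integer periods."""
    reference = g - 1 - anticipation  # last period before (anticipated) treatment
    if base_period == "universal":
        return reference
    if base_period == "varying":
        # Pre-treatment cells compare t with t - 1; later cells use the fixed reference.
        return t - 1 if t < g - anticipation else reference
    raise ValueError(f"unknown base_period: {base_period!r}")
```

Under “universal”, the cell with t equal to g-1-anticipation compares the reference period with itself, so by construction of this rule its effect is zero.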

References

Ortiz-Villavicencio, M. & Sant’Anna, P.H.C. (2025). “Better Understanding Triple Differences Estimators.” arXiv:2505.09942.

__init__(estimation_method='dr', control_group='notyettreated', alpha=0.05, anticipation=0, base_period='varying', n_bootstrap=0, bootstrap_weights='rademacher', seed=None, cband=True, pscore_trim=0.01, cluster=None, rank_deficient_action='warn', epv_threshold=10, pscore_fallback='error')
Parameters:
  • estimation_method (str)

  • control_group (str)

  • alpha (float)

  • anticipation (int)

  • base_period (str)

  • n_bootstrap (int)

  • bootstrap_weights (str)

  • seed (int | None)

  • cband (bool)

  • pscore_trim (float)

  • cluster (str | None)

  • rank_deficient_action (str)

  • epv_threshold (float)

  • pscore_fallback (str)

results_: StaggeredTripleDiffResults | None
get_params()

Get estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Return type:

StaggeredTripleDifference
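The scikit-learn convention behind get_params and set_params is that constructor arguments are stored as same-named attributes and set_params returns the estimator itself. The toy class below is hypothetical and only mirrors that convention; StaggeredTripleDifference's actual implementation may differ. It shows how an estimator can be copied and re-configured, for example to rerun with a bootstrap.

```python
import inspect
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical estimator following the sklearn get_params/set_params convention."""

    def __init__(self, alpha: float = 0.05, n_bootstrap: int = 0, seed=None):
        self.alpha = alpha
        self.n_bootstrap = n_bootstrap
        self.seed = seed

    def get_params(self) -> Dict[str, Any]:
        # Constructor argument names double as attribute names.
        names = [p for p in inspect.signature(type(self).__init__).parameters if p != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params: Any) -> "ToyEstimator":
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter: {key!r}")
            setattr(self, key, value)
        return self  # returning self allows method chaining


base = ToyEstimator()
# Fresh copy with the same settings, then switch on a bootstrap without touching `base`.
variant = type(base)(**base.get_params()).set_params(n_bootstrap=999, seed=42)
```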

fit(data, outcome, unit, time, first_treat, eligibility, covariates=None, aggregate=None, balance_e=None, survey_design=None)

Fit the staggered triple difference estimator.

Parameters:
  • data (pd.DataFrame) – Panel data.

  • outcome (str) – Outcome variable column name.

  • unit (str) – Unit identifier column name.

  • time (str) – Time period column name.

  • first_treat (str) – Column with the enabling period for each unit’s group. Use 0 or np.inf for never-enabled units.

  • eligibility (str) – Binary eligibility indicator column (0/1, time-invariant).

  • covariates (list of str, optional) – Covariate column names.

  • aggregate (str, optional) – Aggregation method: “event_study”, “group”, “simple”, or “all”.

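  • balance_e (int, optional) – Event time to balance the event study on: only cohorts observed for at least balance_e periods after treatment are used, so the cohort composition stays fixed across event times up to balance_e.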

  • survey_design (SurveyDesign, optional) – Survey design specification for complex survey data. When provided, uses survey weights for estimation (weighted Riesz representers, weighted logit, weighted OLS) and design-based variance for aggregated SEs (overall, event study, group) via Taylor Series Linearization or replicate weights. Requires weight_type='pweight'.

Return type:

StaggeredTripleDiffResults
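The “event_study” and “simple” aggregations average ATT(g,t) cells with cohort weights. The sketch below assumes the Callaway & Sant’Anna-style convention of weighting each cohort by its size, treats cells with t >= g as post-treatment, and applies balance_e by dropping cohorts observed for fewer than balance_e periods after adoption. Function names are hypothetical, and the estimator's exact weights (for example under survey weights or anticipation) may differ.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

ATTgt = Dict[Tuple[int, int], float]


def event_study(att_gt: ATTgt, cohort_size: Dict[int, int], balance_e: Optional[int] = None) -> Dict[int, float]:
    """Cohort-size-weighted average of ATT(g, t) at each event time e = t - g."""
    last_period = max(t for _, t in att_gt)
    keep = set(cohort_size)
    if balance_e is not None:
        # Keep only cohorts observed for at least balance_e periods after adoption.
        keep = {g for g in keep if last_period - g >= balance_e}
    num: Dict[int, float] = defaultdict(float)
    den: Dict[int, float] = defaultdict(float)
    for (g, t), att in att_gt.items():
        if g in keep:
            num[t - g] += cohort_size[g] * att
            den[t - g] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


def simple_att(att_gt: ATTgt, cohort_size: Dict[int, int]) -> float:
    """Cohort-size-weighted average over post-adoption cells (t >= g)."""
    post = [(g, t) for (g, t) in att_gt if t >= g]
    total = sum(cohort_size[g] for g, _ in post)
    return sum(cohort_size[g] * att_gt[(g, t)] for g, t in post) / total
```

Because the simple aggregate weights every post-treatment cell, cohorts observed for more post-treatment periods contribute more cells to it than later cohorts do.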

StaggeredTripleDiffResults

Results container for StaggeredTripleDifference estimation.

class diff_diff.StaggeredTripleDiffResults

Bases: object

Results from Staggered Triple Difference (DDD) estimation.

Implements the Ortiz-Villavicencio & Sant’Anna (2025) estimator for staggered adoption settings with an eligibility dimension.

group_time_effects

Dictionary mapping (group, time) tuples to effect dictionaries.

Type:

dict

overall_att

Overall average treatment effect (weighted average of ATT(g,t)).

Type:

float

overall_se

Standard error of overall ATT.

Type:

float

overall_t_stat

T-statistic for overall ATT.

Type:

float

overall_p_value

P-value for overall ATT.

Type:

float

overall_conf_int

Confidence interval for overall ATT.

Type:

tuple

groups

List of enabling cohorts (first treatment periods).

Type:

list

time_periods

List of all time periods.

Type:

list

n_obs

Total number of observations.

Type:

int

n_treated_units

Number of treated units: units with a finite, nonzero enabling period S that are also eligible (Q = 1).

Type:

int

n_control_units

Number of units not in the treated group.

Type:

int

n_never_enabled

Number of never-enabled units (enabling period S coded as 0 or np.inf).

Type:

int

n_eligible

Number of eligible units (Q = 1).

Type:

int

n_ineligible

Number of ineligible units (Q = 0).

Type:

int
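The unit counts above depend only on each unit's enabling period S (the first_treat column) and eligibility Q, both of which must be time-invariant. The hypothetical helper below collapses long-format panel rows to one (S, Q) pair per unit, rejects values that change over time, and reproduces the counts using the documented coding (S = 0 or infinity means never enabled). It is a sketch, not the library's validation code.

```python
import math
from typing import Any, Dict, Iterable, Mapping, Tuple


def is_never_enabled(s: float) -> bool:
    """Never-enabled units are coded with first_treat equal to 0 or infinity."""
    return s == 0 or math.isinf(s)


def unit_counts(
    rows: Iterable[Mapping[str, Any]],
    unit: str = "unit",
    first_treat: str = "first_treat",
    eligibility: str = "eligibility",
) -> Dict[str, int]:
    """Collapse panel rows to one (S, Q) per unit, check time invariance, and count units."""
    per_unit: Dict[Any, Tuple[float, int]] = {}
    for row in rows:
        s, q = row[first_treat], row[eligibility]
        if q not in (0, 1):
            raise ValueError(f"eligibility must be 0 or 1, got {q!r}")
        if per_unit.setdefault(row[unit], (s, q)) != (s, q):
            raise ValueError(f"first_treat or eligibility varies over time for unit {row[unit]!r}")
    treated = sum(1 for s, q in per_unit.values() if q == 1 and not is_never_enabled(s))
    return {
        "n_treated_units": treated,
        "n_control_units": len(per_unit) - treated,
        "n_never_enabled": sum(1 for s, _ in per_unit.values() if is_never_enabled(s)),
        "n_eligible": sum(1 for _, q in per_unit.values() if q == 1),
        "n_ineligible": sum(1 for _, q in per_unit.values() if q == 0),
    }
```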

group_time_effects: Dict[Tuple[Any, Any], Dict[str, Any]]
overall_att: float
overall_se: float
overall_t_stat: float
overall_p_value: float
overall_conf_int: Tuple[float, float]
groups: List[Any]
time_periods: List[Any]
n_obs: int
n_treated_units: int
n_control_units: int
n_never_enabled: int
n_eligible: int
n_ineligible: int
alpha: float = 0.05
control_group: str = 'notyettreated'
base_period: str = 'varying'
anticipation: int = 0
estimation_method: str = 'dr'
event_study_effects: Dict[int, Dict[str, Any]] | None = None
group_effects: Dict[Any, Dict[str, Any]] | None = None
influence_functions: np.ndarray | None = None
bootstrap_results: CSBootstrapResults | None = None
cband_crit_value: float | None = None
pscore_trim: float = 0.01
survey_metadata: Any | None = None
comparison_group_counts: Dict[Tuple, int] | None = None
gmm_weights: Dict[Tuple, Dict] | None = None
epv_diagnostics: Dict[Tuple[Any, Any], Dict[str, Any]] | None = None
epv_threshold: float = 10
pscore_fallback: str = 'error'
property att: float
property se: float
property conf_int: Tuple[float, float]
property p_value: float
property t_stat: float
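The att, se, conf_int, p_value, and t_stat properties presumably expose the corresponding overall_* fields. Under a normal approximation, which is an assumption here (bootstrap inference would replace the analytic p-value and interval), the derived quantities follow from the ATT and its SE. The hypothetical helper below uses the standard library's statistics.NormalDist.

```python
import math
from statistics import NormalDist
from typing import Tuple


def normal_inference(att: float, se: float, alpha: float = 0.05) -> Tuple[float, float, Tuple[float, float]]:
    """t-statistic, two-sided p-value, and (1 - alpha) CI under a normal approximation."""
    if not (math.isfinite(se) and se > 0):
        nan = float("nan")
        return nan, nan, (nan, nan)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    t_stat = att / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return t_stat, p_value, (att - z * se, att + z * se)
```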
__repr__()

Concise string representation.

Return type:

str

property coef_var: float

Coefficient of variation: SE / abs(overall ATT). NaN when the overall ATT is 0 or the SE is non-finite.

Type:

float
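The documented coef_var rule translates directly into code. This standalone function is a hypothetical re-implementation for illustration, not the library's property:

```python
import math


def coef_var(att: float, se: float) -> float:
    """SE / |ATT|, or NaN when ATT is 0 or SE is non-finite (inf or NaN)."""
    if att == 0 or not math.isfinite(se):
        return float("nan")
    return se / abs(att)
```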

summary(alpha=None)

Generate formatted summary of estimation results.

Parameters:

alpha (float, optional) – Significance level. Defaults to alpha used in estimation.

Returns:

Formatted summary.

Return type:

str

print_summary(alpha=None)

Print summary to stdout.

Parameters:

alpha (float | None)

Return type:

None

epv_summary(show_all=False)

Return EPV diagnostics for each (group, time) cell as a DataFrame.

Parameters:

show_all (bool, default False) – If False, only show cells with low EPV. If True, show all cells.

Returns:

Columns: group, time, epv, n_events, n_params, is_low.

Return type:

pd.DataFrame
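EPV (events per variable) is a rule-of-thumb check on whether a logistic regression has enough data per estimated coefficient. The sketch below assumes n_events is the size of the smaller class of the binary indicator in a cell's propensity-score sample and n_params counts the logit coefficients; diff_diff's exact definitions may differ. The default threshold of 10 matches epv_threshold.

```python
from typing import Any, Dict, Sequence


def epv_diagnostic(treated: Sequence[int], n_params: int, threshold: float = 10.0) -> Dict[str, Any]:
    """Events per variable for a logit of a binary indicator on n_params coefficients."""
    n_ones = sum(1 for d in treated if d == 1)
    n_events = min(n_ones, len(treated) - n_ones)  # size of the rarer outcome class
    epv = n_events / n_params if n_params > 0 else float("inf")
    return {"epv": epv, "n_events": n_events, "n_params": n_params, "is_low": epv < threshold}
```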

to_dataframe(level='group_time')

Convert results to DataFrame.

Parameters:

level (str, default="group_time") – Level of aggregation: “group_time”, “event_study”, or “group”.

Returns:

Results as DataFrame.

Return type:

pd.DataFrame
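At the “group_time” level, the conversion has to flatten group_time_effects, whose keys are (group, time) tuples and whose values are per-cell dictionaries. The hypothetical helper below shows one way to do that with the standard library; the inner keys "effect" and "se" are assumed for illustration only.

```python
from typing import Any, Dict, List, Tuple


def group_time_records(effects: Dict[Tuple[Any, Any], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One flat record per (group, time) cell, sorted by group and then time."""
    records = []
    for (group, time), stats in sorted(effects.items()):
        records.append({"group": group, "time": time, **stats})
    return records
```

Passing the resulting list of dicts to pandas.DataFrame would produce one row per group-time cell.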

to_dict()

Convert results to dictionary.

Return type:

Dict[str, Any]

property is_significant: bool

Check if overall ATT is significant.

property significance_stars: str

Significance stars for overall ATT.

__init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, groups, time_periods, n_obs, n_treated_units, n_control_units, n_never_enabled, n_eligible, n_ineligible, alpha=0.05, control_group='notyettreated', base_period='varying', anticipation=0, estimation_method='dr', event_study_effects=None, group_effects=None, influence_functions=None, bootstrap_results=None, cband_crit_value=None, pscore_trim=0.01, survey_metadata=None, comparison_group_counts=None, gmm_weights=None, epv_diagnostics=None, epv_threshold=10, pscore_fallback='error')
Return type:

None