Staggered Adoption
Estimators for staggered DiD designs where treatment is adopted at different times.
This module provides three estimators for staggered adoption settings:
Callaway-Sant’Anna (2021): Aggregates group-time 2x2 DiD comparisons
Sun-Abraham (2021): Interaction-weighted regression approach
Ortiz-Villavicencio & Sant’Anna (2025): Staggered triple-difference (DDD) with group-time ATT
Running Callaway-Sant’Anna (CS) and Sun-Abraham (SA) together provides a useful robustness check: when they agree, results are more credible.
CallawaySantAnna
Callaway & Sant’Anna (2021) estimator for heterogeneous treatment timing.
- class diff_diff.CallawaySantAnna[source]
Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin
Callaway-Sant’Anna (2021) estimator for staggered Difference-in-Differences.
This estimator handles DiD designs with variation in treatment timing (staggered adoption) and heterogeneous treatment effects. It avoids the bias of traditional two-way fixed effects (TWFE) estimators by:
Computing group-time average treatment effects ATT(g,t) for each cohort g (units first treated in period g) and time t.
Aggregating these to summary measures (overall ATT, event study, etc.) using appropriate weights.
- Parameters:
control_group (str, default="never_treated") – Which units to use as controls:
“never_treated”: Use only never-treated units (recommended)
“not_yet_treated”: Use never-treated and not-yet-treated units
anticipation (int, default=0) – Number of periods before treatment where effects may occur. Set to > 0 if treatment effects can begin before the official treatment date.
estimation_method (str, default="dr") – Estimation method:
“dr”: Doubly robust (recommended)
“ipw”: Inverse probability weighting
“reg”: Outcome regression
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. Defaults to unit-level clustering.
n_bootstrap (int, default=0) –
Number of bootstrap iterations for inference. If 0, uses analytical standard errors. Recommended: 999 or more for reliable inference.
Note
Memory usage: The bootstrap stores all weights in memory as a (n_bootstrap, n_units) float64 array. For large datasets, this can be significant:
1K bootstrap × 10K units = ~80 MB
10K bootstrap × 100K units = ~8 GB
Consider reducing n_bootstrap if memory is constrained.
bootstrap_weights (str, default="rademacher") – Type of weights for multiplier bootstrap:
“rademacher”: +1/-1 with equal probability (standard choice)
“mammen”: Two-point distribution (asymptotically valid, matches skewness)
“webb”: Six-point distribution (recommended when n_clusters < 20)
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") –
Action when design matrix is rank-deficient (linearly dependent columns):
“warn”: Issue warning and drop linearly dependent columns (default)
“error”: Raise ValueError
“silent”: Drop columns silently without warning
base_period (str, default="varying") –
Method for selecting the base (reference) period for computing ATT(g,t). Options:
“varying”: For pre-treatment periods (t < g - anticipation), use t-1 as base (consecutive comparisons). For post-treatment, use g-1-anticipation. Requires t-1 to exist in data.
“universal”: Always use g-1-anticipation as base period.
Both produce identical post-treatment effects. Matches R’s did::att_gt() base_period parameter.
cband (bool, default=True) – Whether to compute simultaneous confidence bands (sup-t) for event study aggregation. Requires n_bootstrap > 0. When True, results include cband_crit_value and per-event-time cband_conf_int entries controlling family-wise error rate.
pscore_trim (float, default=0.01) – Trimming bound for propensity scores. Scores are clipped to [pscore_trim, 1 - pscore_trim] before weight computation in IPW and DR estimation. Must be in (0, 0.5).
panel (bool, default=True) – Whether the data is a balanced/unbalanced panel (units observed across multiple time periods). Set to False for stationary repeated cross-sections where each observation has a unique unit ID and units do not repeat across periods. Requires that the cross-sectional samples are drawn from the same population in each period (stationarity). Uses cross-sectional DRDID (Sant’Anna & Zhao 2020, Section 4) with per-observation influence functions.
epv_threshold (float, default=10) – Events Per Variable threshold for propensity score logit. When the ratio of minority-class observations to predictor variables (excluding intercept) falls below this value, a warning is emitted (or ValueError raised if rank_deficient_action="error"). Based on Peduzzi et al. (1996). Only applies to IPW and DR estimation methods. Use diagnose_propensity() for a pre-estimation check across all cohorts.
pscore_fallback (str, default="error") –
Action when propensity score estimation fails entirely (LinAlgError or ValueError from IRLS):
“error”: Raise the exception (default). Ensures the user is aware of estimation failures.
“unconditional”: Fall back to unconditional propensity with a warning. For IPW, this drops all covariates. For DR, the propensity model becomes unconditional but outcome regression still uses covariates.
When rank_deficient_action="error", errors are always re-raised regardless of this setting.
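The three multiplier-bootstrap weight families named under bootstrap_weights, and the memory arithmetic in the n_bootstrap note, can be sketched with the standard library alone. The values below are the textbook definitions of the Rademacher, Mammen, and Webb distributions; whether diff_diff draws them in exactly this way is an assumption, and the helper names (moment, draw_weights, weight_matrix_bytes) are hypothetical.

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# Each distribution is a list of (value, probability) pairs.
WEIGHT_DISTRIBUTIONS = {
    # Rademacher: +1 or -1 with equal probability.
    "rademacher": [(1.0, 0.5), (-1.0, 0.5)],
    # Mammen two-point distribution: mean 0, variance 1, third moment 1.
    "mammen": [
        (-(SQRT5 - 1.0) / 2.0, (SQRT5 + 1.0) / (2.0 * SQRT5)),
        ((SQRT5 + 1.0) / 2.0, (SQRT5 - 1.0) / (2.0 * SQRT5)),
    ],
    # Webb six-point distribution: each value has probability 1/6.
    "webb": [
        (sign * math.sqrt(v), 1.0 / 6.0)
        for v in (1.5, 1.0, 0.5)
        for sign in (-1.0, 1.0)
    ],
}


def moment(dist, k):
    """k-th raw moment of a discrete distribution."""
    return sum(p * v ** k for v, p in dist)


def draw_weights(kind, n_units, rng):
    """Draw one row of multiplier weights, one weight per unit."""
    values, probs = zip(*WEIGHT_DISTRIBUTIONS[kind])
    return rng.choices(values, weights=probs, k=n_units)


def weight_matrix_bytes(n_bootstrap, n_units, itemsize=8):
    """Size of an (n_bootstrap, n_units) float64 array in bytes."""
    return n_bootstrap * n_units * itemsize


rng = random.Random(0)
sample = draw_weights("webb", 5, rng)
print(weight_matrix_bytes(1_000, 10_000) / 1e6, "MB")
```

All three families have mean 0 and variance 1, which is what makes them valid multipliers; they differ in higher moments and in the number of distinct values they can take.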
- results_
Estimation results after calling fit().
- Type:
- is_fitted_
Whether the model has been fitted.
- Type:
Examples
Basic usage:
>>> import pandas as pd
>>> from diff_diff import CallawaySantAnna
>>>
>>> # Panel data with staggered treatment
>>> # 'first_treat' = period when unit was first treated (0 if never treated)
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated, else first treatment period
... })
>>>
>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>>
>>> results.print_summary()
With event study aggregation:
>>> cs = CallawaySantAnna()
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  aggregate='event_study')
>>>
>>> # Plot event study
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)
With covariate adjustment (conditional parallel trends):
>>> # When parallel trends only holds conditional on covariates
>>> cs = CallawaySantAnna(estimation_method='dr')  # doubly robust
>>> results = cs.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
>>>
>>> # DR is recommended: consistent if either outcome model
>>> # or propensity model is correctly specified
Notes
The key innovation of Callaway & Sant’Anna (2021) is the disaggregated approach: instead of estimating a single treatment effect, they estimate ATT(g,t) for each cohort-time pair. This avoids the “forbidden comparison” problem where already-treated units act as controls.
The ATT(g,t) is identified under parallel trends, which may be assumed to hold only conditional on covariates. Written without covariates, the assumption is:
E[Y(0)_t - Y(0)_{g-1} | G=g] = E[Y(0)_t - Y(0)_{g-1} | C=1]
where G=g indicates treatment cohort g and C=1 indicates control units. This uses g-1 as the base period, which applies to post-treatment (t >= g). With base_period="varying" (default), pre-treatment comparisons use t-1 as the base period, giving consecutive comparisons that are useful in parallel trends diagnostics.
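As an illustration of the identification argument, the following is a minimal sketch of a single unconditional ATT(g,t) computed as a 2x2 comparison against never-treated units, together with the base-period rule described for base_period. It is not the library's implementation: it ignores covariates, weights, and inference, and the names base_period_for and att_gt, as well as the toy data, are hypothetical.

```python
from statistics import mean


def base_period_for(g, t, anticipation=0, mode="varying"):
    """Reference period for ATT(g, t) under the 'varying' or 'universal' rule."""
    if mode == "universal" or t >= g - anticipation:
        return g - 1 - anticipation
    return t - 1  # pre-treatment period: consecutive comparison


def att_gt(outcomes, first_treat, g, t, anticipation=0, mode="varying"):
    """Unconditional 2x2 DiD for cohort g at time t with never-treated controls.

    outcomes:    dict unit -> {period: outcome}
    first_treat: dict unit -> first treated period (0 means never treated)
    """
    b = base_period_for(g, t, anticipation, mode)
    treated = [u for u, ft in first_treat.items() if ft == g]
    controls = [u for u, ft in first_treat.items() if ft == 0]
    change_treated = mean(outcomes[u][t] - outcomes[u][b] for u in treated)
    change_control = mean(outcomes[u][t] - outcomes[u][b] for u in controls)
    return change_treated - change_control


# Toy panel: common trend of +1 per period, effect of +2 once cohort 3 is treated.
first_treat = {"a": 3, "b": 3, "c": 0, "d": 0}
levels = {"a": 10.0, "b": 12.0, "c": 5.0, "d": 7.0}
outcomes = {
    u: {
        t: levels[u] + t + (2.0 if first_treat[u] and t >= first_treat[u] else 0.0)
        for t in range(1, 5)
    }
    for u in first_treat
}

post_effect = att_gt(outcomes, first_treat, g=3, t=4)  # base period is g-1 = 2
pre_effect = att_gt(outcomes, first_treat, g=3, t=2)   # base period is t-1 = 1
print(post_effect, pre_effect)
```

Because the toy data satisfy parallel trends exactly, the post-treatment comparison recovers the built-in effect and the pre-treatment comparison is zero.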
References
Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.
Methods
fit(data, outcome, unit, time, first_treat) – Fit the Callaway-Sant'Anna estimator.
get_params() – Get estimator parameters (sklearn-compatible).
set_params(**params) – Set estimator parameters (sklearn-compatible).
- __init__(control_group='never_treated', anticipation=0, estimation_method='dr', alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights=None, seed=None, rank_deficient_action='warn', base_period='varying', cband=True, pscore_trim=0.01, panel=True, epv_threshold=10, pscore_fallback='error')[source]
- Parameters:
control_group (str)
anticipation (int)
estimation_method (str)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str | None)
seed (int | None)
rank_deficient_action (str)
base_period (str)
cband (bool)
pscore_trim (float)
panel (bool)
epv_threshold (float)
pscore_fallback (str)
- anticipation: int
- alpha: float
- n_bootstrap: int
- bootstrap_weights: str
- base_period: str
- results_: CallawaySantAnnaResults | None
- diagnose_propensity(df, outcome, unit, time, first_treat, covariates=None)[source]
Check Events Per Variable (EPV) across all cohorts without estimation.
Examines the data to identify cohorts where propensity score logit may be unreliable due to too few events per covariate. Based on Peduzzi et al. (1996).
This is a raw-count heuristic: it uses total cohort/control unit counts without filtering for missing outcomes, zero survey weights, or period-specific validity. The actual fit-time EPV (stored in results.epv_diagnostics) may be lower because fit() operates on the valid base/post outcome pair and the positive-weight effective sample. Use this method as a quick pre-check; rely on results.epv_diagnostics for authoritative per-cell EPV.
- Parameters:
- Returns:
Per-cohort EPV diagnostics with columns: group, n_treated, n_control, n_covariates, n_params, epv, status.
- Return type:
pd.DataFrame
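The raw-count heuristic can be sketched as follows, taking the definition given for epv_threshold (minority-class observations divided by the number of predictors, excluding the intercept). This is a stand-alone illustration, not the body of diagnose_propensity(); the function names and the "ok"/"low" labels are hypothetical.

```python
def events_per_variable(n_treated, n_control, n_covariates):
    """EPV = size of the smaller class / number of predictors (no intercept)."""
    if n_covariates == 0:
        return float("inf")  # no predictors, so no EPV constraint
    return min(n_treated, n_control) / n_covariates


def epv_status(epv, threshold=10.0):
    """Flag cohorts whose EPV falls below the threshold."""
    return "ok" if epv >= threshold else "low"


# One row per cohort: (group, n_treated, n_control), all with 5 covariates.
cohorts = [(2005, 30, 500), (2007, 200, 500)]
report = [
    {"group": g, "epv": events_per_variable(nt, nc, 5)}
    for g, nt, nc in cohorts
]
for row in report:
    row["status"] = epv_status(row["epv"])
print(report)
```

With five covariates, the 30-unit cohort has EPV 6 and would be flagged at the default threshold of 10, while the 200-unit cohort has EPV 40.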
- fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None, survey_design=None)[source]
Fit the Callaway-Sant’Anna estimator.
- Parameters:
data (pd.DataFrame) – Panel data with unit and time identifiers. For repeated cross-sections (panel=False), each observation should have a unique unit ID; units do not repeat across periods.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.
covariates (list, optional) – List of covariate column names for conditional parallel trends.
aggregate (str, optional) – How to aggregate group-time effects:
None: Only compute ATT(g,t) (default)
“simple”: Simple weighted average (overall ATT)
“event_study”: Aggregate by relative time (event study)
“group”: Aggregate by treatment cohort
“all”: Compute all aggregations
balance_e (int, optional) – For event study, balance the panel at relative time e. Ensures all groups contribute to each relative period.
survey_design (SurveyDesign, optional) – Survey design specification. Supports pweight with strata/PSU/FPC. Aggregated SEs (overall, event study, group) use design-based variance via compute_survey_if_variance(). All estimation methods (reg, ipw, dr) support covariates + survey. For repeated cross-sections (panel=False), survey weights are per-observation (no unit-level collapse).
- Returns:
Object containing all estimation results.
- Return type:
- Raises:
ValueError – If required columns are missing or data validation fails.
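To make the aggregate options concrete, here is a minimal sketch of how a dictionary of ATT(g,t) estimates could be collapsed into the "simple", "event_study", and "group" summaries. It assumes cohort-size weights, in the spirit of Callaway & Sant'Anna (2021); the exact weights used by fit() may differ, and all function names and numbers below are hypothetical.

```python
from collections import defaultdict


def aggregate_simple(att, group_size):
    """Cohort-size-weighted average over post-treatment cells (t >= g)."""
    post = {(g, t): v for (g, t), v in att.items() if t >= g}
    total = sum(group_size[g] for (g, _t) in post)
    return sum(group_size[g] * v for (g, _t), v in post.items()) / total


def aggregate_event_study(att, group_size):
    """Cohort-size-weighted average at each relative time e = t - g."""
    num, den = defaultdict(float), defaultdict(float)
    for (g, t), v in att.items():
        e = t - g
        num[e] += group_size[g] * v
        den[e] += group_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


def aggregate_group(att):
    """Average post-treatment effect within each cohort."""
    by_group = defaultdict(list)
    for (g, t), v in att.items():
        if t >= g:
            by_group[g].append(v)
    return {g: sum(v) / len(v) for g, v in by_group.items()}


# Hypothetical group-time effects for cohorts 2 and 3.
att = {(2, 2): 1.0, (2, 3): 2.0, (3, 2): 0.0, (3, 3): 3.0}
group_size = {2: 10, 3: 30}

simple = aggregate_simple(att, group_size)
event_study = aggregate_event_study(att, group_size)
by_group = aggregate_group(att)
print(simple, event_study, by_group)
```

The pre-treatment cell (3, 2) enters only the event study, at relative time -1, where it serves as a placebo check rather than as part of the overall ATT.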
- set_params(**params)[source]
Set estimator parameters (sklearn-compatible).
- Return type:
- print_summary()[source]
Print summary to stdout.
- Return type:
None
CallawaySantAnnaResults
Results container for Callaway-Sant’Anna estimation.
- class diff_diff.CallawaySantAnnaResults[source]
Bases: object
Results from Callaway-Sant’Anna (2021) staggered DiD estimation.
This class stores group-time average treatment effects ATT(g,t) and provides methods for aggregation into summary measures.
- group_time_effects
Dictionary mapping (group, time) tuples to effect dictionaries.
- Type:
- overall_att
Overall average treatment effect (weighted average of ATT(g,t)).
- Type:
- overall_se
Standard error of overall ATT.
- Type:
- overall_p_value
P-value for overall ATT.
- Type:
- overall_conf_int
Confidence interval for overall ATT.
- Type:
- groups
List of treatment cohorts (first treatment periods).
- Type:
- time_periods
List of all time periods.
- Type:
- n_obs
Total number of observations.
- Type:
- n_treated_units
Number of ever-treated units.
- Type:
- n_control_units
Number of never-treated units (excludes not-yet-treated dynamic controls).
- Type:
- event_study_effects
Effects aggregated by relative time (event study).
- Type:
dict, optional
- group_effects
Effects aggregated by treatment cohort.
- Type:
dict, optional
- pscore_trim
Propensity score trimming bound used during estimation.
- Type:
Methods
summary([alpha]) – Generate formatted summary of estimation results.
to_dataframe([level]) – Convert results to DataFrame.
- overall_att: float
- overall_se: float
- overall_t_stat: float
- overall_p_value: float
- n_obs: int
- n_treated_units: int
- n_control_units: int
- alpha: float = 0.05
- control_group: str = 'never_treated'
- base_period: str = 'varying'
- anticipation: int = 0
- panel: bool = True
- influence_functions: np.ndarray | None = None
- event_study_vcov: np.ndarray | None = None
- bootstrap_results: CSBootstrapResults | None = None
- pscore_trim: float = 0.01
- epv_threshold: float = 10
- pscore_fallback: str = 'error'
- property att: float
- property se: float
- property p_value: float
- property t_stat: float
- property coef_var: float
Coefficient of variation: SE / abs(overall ATT). NaN when ATT is 0 or SE non-finite.
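The rule stated for coef_var can be written out as a short stand-alone function. This is a sketch of the documented behaviour, not the property's source code; the function name is reused here only for readability.

```python
import math


def coef_var(att, se):
    """SE / |ATT|, or NaN when ATT is 0 or SE is not finite."""
    if att == 0 or not math.isfinite(se):
        return float("nan")
    return se / abs(att)


print(coef_var(-2.0, 0.5))
print(coef_var(0.0, 0.5))
```

Taking the absolute value keeps the ratio positive for negative effects, so it can be compared across estimates of either sign.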
- summary(alpha=None)[source]
Generate formatted summary of estimation results.
- epv_summary(show_all=False)[source]
Return per-cohort EPV diagnostics as a DataFrame.
- Parameters:
show_all (bool, default False) – If False, only show cells with low EPV. If True, show all cells.
- Returns:
Columns: group, time, epv, n_events, n_params, is_low.
- Return type:
pd.DataFrame
- print_summary(alpha=None)[source]
Print summary to stdout.
- Parameters:
alpha (float | None)
- Return type:
None
- to_dataframe(level='group_time')[source]
Convert results to DataFrame.
- Parameters:
level (str, default="group_time") – Level of aggregation: “group_time”, “event_study”, or “group”.
- Returns:
Results as DataFrame.
- Return type:
pd.DataFrame
- property is_significant: bool
Check if overall ATT is significant.
- property significance_stars: str
Significance stars for overall ATT.
- __init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, groups, time_periods, n_obs, n_treated_units, n_control_units, alpha=0.05, control_group='never_treated', base_period='varying', anticipation=0, panel=True, event_study_effects=None, group_effects=None, influence_functions=None, event_study_vcov=None, event_study_vcov_index=None, bootstrap_results=None, cband_crit_value=None, pscore_trim=0.01, survey_metadata=None, epv_diagnostics=None, epv_threshold=10, pscore_fallback='error')
- Parameters:
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
n_obs (int)
n_treated_units (int)
n_control_units (int)
alpha (float)
control_group (str)
base_period (str)
anticipation (int)
panel (bool)
influence_functions (np.ndarray | None)
event_study_vcov (np.ndarray | None)
event_study_vcov_index (list | None)
bootstrap_results (CSBootstrapResults | None)
cband_crit_value (float | None)
pscore_trim (float)
survey_metadata (Any | None)
epv_diagnostics (Dict[Tuple[Any, Any], Dict[str, Any]] | None)
epv_threshold (float)
pscore_fallback (str)
- Return type:
None
GroupTimeEffect
Container for individual group-time ATT(g,t) effects.
- class diff_diff.GroupTimeEffect[source]
Bases: object
Treatment effect for a specific group-time combination.
- group
The treatment cohort (first treatment period).
- Type:
any
- time
The time period.
- Type:
any
- effect
The ATT(g,t) estimate.
- Type:
- se
Standard error.
- Type:
- n_treated
Number of treated observations.
- Type:
- n_control
Number of control observations.
- Type:
- group: Any
- time: Any
- effect: float
- se: float
- t_stat: float
- p_value: float
- n_treated: int
- n_control: int
- property is_significant: bool
Check if effect is significant at 0.05 level.
- property significance_stars: str
Return significance stars based on p-value.
SunAbraham
Sun & Abraham (2021) interaction-weighted estimator for staggered DiD.
This estimator provides event-study coefficients using a saturated regression with cohort-by-relative-time interactions. It uses interaction-weighting to aggregate cohort-specific effects into event study estimates.
Key differences from Callaway-Sant’Anna:
Uses regression-based approach rather than 2x2 DiD comparisons
Weights cohort-specific effects by share of each cohort in treated population
Can be more efficient when treatment effects are homogeneous
Running both provides a useful robustness check
Reference: Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.
- class diff_diff.SunAbraham[source]
Bases: object
Sun-Abraham (2021) interaction-weighted estimator for staggered DiD.
This estimator provides event-study coefficients using a saturated TWFE regression with cohort × relative-time interactions, following the methodology in Sun & Abraham (2021).
The estimation procedure follows three steps:
1. Run a saturated TWFE regression with cohort × relative-time dummies
2. Compute cohort shares (weights) at each relative time
3. Aggregate cohort-specific effects using interaction weights
This avoids the negative weighting problem of standard TWFE and provides consistent event-study estimates under treatment effect heterogeneity.
- Parameters:
control_group (str, default="never_treated") – Which units to use as controls:
“never_treated”: Use only never-treated units (recommended)
“not_yet_treated”: Use never-treated and not-yet-treated units
anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default, unless vcov_type is explicitly set to "hc2" or "classical", in which case the unit auto-cluster is dropped (both are one-way families and the linalg validator rejects them with cluster_ids). Use vcov_type="hc1" (default) or vcov_type="hc2_bm" for cluster-robust inference; the latter routes to CR2 Bell-McCaffrey at the cluster level.
n_bootstrap (int, default=0) – Number of bootstrap iterations for inference. If 0, uses analytical cluster-robust standard errors.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient (linearly dependent columns):
“warn”: Issue warning and drop linearly dependent columns (default)
“error”: Raise ValueError
“silent”: Drop columns silently without warning
vcov_type ({"classical", "hc1", "hc2", "hc2_bm"}, default "hc1") –
Variance-covariance family for analytical inference. Defaults to "hc1" (preserves prior behavior bit-equally; SA historically hard-coded HC1).
"classical": homoskedastic OLS standard errors. One-way only (linalg validator rejects classical + cluster_ids); the unit auto-cluster is dropped when classical is explicitly opted into.
"hc1": Eicker-Huber-White HC1 finite-sample correction (default; cluster-robust when cluster= is set or the unit auto-cluster fires).
"hc2": Eicker-Huber-White HC2 leverage correction. One-way only; the linalg validator rejects combining hc2 with clusters. The unit auto-cluster is dropped when hc2 is explicitly opted into.
"hc2_bm": HC2 + Bell-McCaffrey CR2 Satterthwaite DOF for cluster-robust inference. Routes to CR2-BM at the cluster level; preserves the auto-cluster default.
When vcov_type ∈ {"classical","hc2","hc2_bm"}, the saturated regression switches from the within-transform path to a full-dummy [intercept + interactions + covariates + unit_dummies + time_dummies] build. For hc2 and hc2_bm, the Frisch-Waugh-Lovell theorem preserves coefficients but NOT the hat matrix, so HC2 leverage and BM Satterthwaite DOF must be computed on the full FE projection. classical also routes through full-dummy so the (n-k) finite-sample correction in s² × (X'X)^{-1} matches R’s lm() interpretation. Empirically matches lm(...) + sandwich::vcovHC(type="HC2") and clubSandwich::vcovCR(..., type="CR2") at atol=1e-10. "hc1" keeps the within-transform path (cluster-robust HC1 does not depend on the hat matrix); empirically close to fixest::sunab(cluster=~unit). See REGISTRY.md for the documented HC1 finite-sample-correction deviation.
Survey designs (survey_design=) are rejected for vcov_type ∈ {"classical","hc2","hc2_bm"} because the survey-design Taylor Series Linearization (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use vcov_type="hc1" (default) for survey designs.
conley spatial-HAC is not yet wired up for SunAbraham; see TODO.md.
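The reason HC2 needs the hat matrix while HC1 does not can be seen in a one-regressor example. The sketch below uses the textbook heteroskedasticity-robust formulas for the slope of a simple OLS regression (no clustering, no fixed effects); it is an illustration of the two corrections, not the SunAbraham code path, and the function name and data are hypothetical.

```python
def simple_ols_robust_variances(x, y):
    """Slope variance of y = a + b*x under HC1 and HC2 (one-way, no clusters)."""
    n, k = len(x), 2  # k counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Leverage (diagonal of the hat matrix) for a regression with intercept.
    hat = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # HC1: scale the squared residuals by a global n / (n - k) factor.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    hc1 = n / (n - k) * meat / sxx ** 2

    # HC2: scale each squared residual by its own 1 / (1 - h_i).
    meat_hc2 = sum(
        (xi - x_bar) ** 2 * e ** 2 / (1.0 - h) for xi, e, h in zip(x, resid, hat)
    )
    hc2 = meat_hc2 / sxx ** 2
    return {"slope": slope, "hat": hat, "hc1": hc1, "hc2": hc2}


x = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]  # the last point has high leverage
y = [0.1, 1.2, 1.9, 3.2, 3.9, 10.5]
fit = simple_ols_robust_variances(x, y)
print(fit["slope"], fit["hc1"], fit["hc2"])
```

HC1 depends only on n and k, so it survives a within-transformation unchanged; HC2 depends on each observation's leverage, which is why the text above requires the full-dummy design for hc2 and hc2_bm.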
- results_
Estimation results after calling fit().
- Type:
- is_fitted_
Whether the model has been fitted.
- Type:
Examples
Basic usage:
>>> import pandas as pd
>>> from diff_diff import SunAbraham
>>>
>>> # Panel data with staggered treatment
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated
... })
>>>
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat')
>>> results.print_summary()
With covariates:
>>> sa = SunAbraham()
>>> results = sa.fit(data, outcome='outcome', unit='unit',
...                  time='time', first_treat='first_treat',
...                  covariates=['age', 'income'])
Notes
The Sun-Abraham estimator uses a saturated regression approach:
Y_it = α_i + λ_t + Σ_g Σ_e [δ_{g,e} × 1(G_i=g) × D_{it}^e] + X’γ + ε_it
where:
α_i = unit fixed effects
λ_t = time fixed effects
G_i = unit i’s treatment cohort (first treatment period)
D_{it}^e = indicator for being e periods from treatment
δ_{g,e} = cohort-specific effect (CATT) at relative time e
The event-study coefficients are then computed as:
β_e = Σ_g w_{g,e} × δ_{g,e}
where w_{g,e} is the share of cohort g in the treated population at relative time e (interaction weights).
Compared to Callaway-Sant’Anna:
SA uses saturated regression; CS uses 2x2 DiD comparisons
SA can be more efficient when model is correctly specified
Both are consistent under heterogeneous treatment effects
Running both provides a useful robustness check
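The aggregation step β_e = Σ_g w_{g,e} × δ_{g,e} from the Notes can be sketched on its own, taking the cohort-specific coefficients δ_{g,e} as given. The weights here are cohort shares among the cohorts observed at relative time e, as described above; the function names and the numbers are hypothetical, and the regression that would produce δ_{g,e} is not shown.

```python
def interaction_weights(cohort_sizes, cohorts_at_e):
    """Share of each cohort among the treated cohorts observed at relative time e."""
    total = sum(cohort_sizes[g] for g in cohorts_at_e)
    return {g: cohort_sizes[g] / total for g in cohorts_at_e}


def event_study_coefficient(delta, cohort_sizes, e):
    """beta_e = sum over g of w_{g,e} * delta_{g,e}."""
    cohorts = [g for (g, rel) in delta if rel == e]
    weights = interaction_weights(cohort_sizes, cohorts)
    return sum(weights[g] * delta[(g, e)] for g in cohorts)


# Hypothetical cohort-specific effects keyed by (cohort, relative time).
delta = {(2005, 0): 1.0, (2007, 0): 3.0, (2005, 1): 2.0}
cohort_sizes = {2005: 25, 2007: 75}

beta_0 = event_study_coefficient(delta, cohort_sizes, e=0)
beta_1 = event_study_coefficient(delta, cohort_sizes, e=1)
print(beta_0, beta_1)
```

At e=1 only the 2005 cohort is observed, so its weight is 1. The weights are non-negative shares that sum to one at every relative time, which is how the estimator avoids the negative weighting of standard TWFE.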
References
Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175-199.
Methods
fit(data, outcome, unit, time, first_treat) – Fit the Sun-Abraham estimator using saturated regression.
get_params() – Get estimator parameters (sklearn-compatible).
set_params(**params) – Set estimator parameters (sklearn-compatible).
summary() – Get summary of estimation results.
print_summary() – Print summary to stdout.
- __init__(control_group='never_treated', anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, seed=None, rank_deficient_action='warn', vcov_type='hc1')[source]
- results_: SunAbrahamResults | None
- fit(data, outcome, unit, time, first_treat, covariates=None, survey_design=None)[source]
Fit the Sun-Abraham estimator using saturated regression.
- Parameters:
data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.
covariates (list, optional) – List of covariate column names to include in regression.
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. Supports weighted estimation and Taylor series linearization variance with strata, PSU, and FPC.
- Returns:
Object containing all estimation results.
- Return type:
- Raises:
ValueError – If required columns are missing or data validation fails.
- set_params(**params)[source]
Set estimator parameters (sklearn-compatible).
- Return type:
- print_summary()[source]
Print summary to stdout.
- Return type:
None
SunAbrahamResults
Results container for Sun-Abraham estimation.
- class diff_diff.SunAbrahamResults[source]
Bases: object
Results from Sun-Abraham (2021) interaction-weighted estimation.
- event_study_effects
Dictionary mapping relative time to effect dictionaries with keys: ‘effect’, ‘se’, ‘t_stat’, ‘p_value’, ‘conf_int’, ‘n_groups’.
- Type:
- overall_att
Overall average treatment effect (weighted average of post-treatment effects).
- Type:
- overall_se
Standard error of overall ATT.
- Type:
- overall_t_stat
T-statistic for overall ATT.
- Type:
- overall_p_value
P-value for overall ATT.
- Type:
- overall_conf_int
Confidence interval for overall ATT.
- Type:
- cohort_weights
Dictionary mapping relative time to cohort weight dictionaries.
- Type:
- groups
List of treatment cohorts (first treatment periods).
- Type:
- time_periods
List of all time periods.
- Type:
- n_obs
Total number of observations.
- Type:
- n_treated_units
Number of ever-treated units.
- Type:
- n_control_units
Number of never-treated units.
- Type:
- alpha
Significance level used for confidence intervals.
- Type:
- control_group
Type of control group used.
- Type:
- vcov_type
Variance-covariance family from the fit-time configuration (classical, hc1, hc2, or hc2_bm). Note: when a survey_design= is supplied, the survey-design Taylor Series Linearization (or replicate-weight refit) variance overrides this analytical family; the field still records the configured value but survey_metadata indicates the survey path was active. Likewise, on bootstrap fits (n_bootstrap > 0) the SE comes from the pairs bootstrap (or Rao-Wu rescaled bootstrap under stratified / PSU survey designs), not the analytical family.
- Type:
Methods
summary([alpha]) – Generate formatted summary of estimation results.
print_summary([alpha]) – Print summary to stdout.
to_dataframe([level]) – Convert results to DataFrame.
- overall_att: float
- overall_se: float
- overall_t_stat: float
- overall_p_value: float
- n_obs: int
- n_treated_units: int
- n_control_units: int
- alpha: float = 0.05
- control_group: str = 'never_treated'
- vcov_type: str = 'hc1'
- anticipation: int = 0
- bootstrap_results: SABootstrapResults | None = None
- property att: float
- property se: float
- property p_value: float
- property t_stat: float
- property coef_var: float
Coefficient of variation: SE / abs(overall ATT). NaN when ATT is 0 or SE non-finite.
- summary(alpha=None)[source]
Generate formatted summary of estimation results.
- print_summary(alpha=None)[source]
Print summary to stdout.
- Parameters:
alpha (float | None)
- Return type:
None
- to_dataframe(level='event_study')[source]
Convert results to DataFrame.
- Parameters:
level (str, default="event_study") – Level of aggregation: “event_study” or “cohort”.
- Returns:
Results as DataFrame.
- Return type:
pd.DataFrame
- property is_significant: bool
Check if overall ATT is significant.
- property significance_stars: str
Significance stars for overall ATT.
- __init__(event_study_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, cohort_weights, groups, time_periods, n_obs, n_treated_units, n_control_units, alpha=0.05, control_group='never_treated', vcov_type='hc1', anticipation=0, bootstrap_results=None, cohort_effects=None, survey_metadata=None, event_study_vcov=None, event_study_vcov_index=None)
- Parameters:
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
n_obs (int)
n_treated_units (int)
n_control_units (int)
alpha (float)
control_group (str)
vcov_type (str)
anticipation (int)
bootstrap_results (SABootstrapResults | None)
cohort_effects (Dict[Tuple[Any, int], Dict[str, Any]] | None)
survey_metadata (Any | None)
event_study_vcov (ndarray | None)
event_study_vcov_index (list | None)
- Return type:
None
SABootstrapResults
Bootstrap inference results for Sun-Abraham estimation.
- class diff_diff.SABootstrapResults[source]
Bases: object
Results from Sun-Abraham bootstrap inference.
- n_bootstrap
Number of bootstrap iterations.
- Type:
- weight_type
Type of bootstrap used (always “pairs” for pairs bootstrap).
- Type:
- alpha
Significance level used for confidence intervals.
- Type:
- overall_att_se
Bootstrap standard error for overall ATT.
- Type:
- overall_att_p_value
Bootstrap p-value for overall ATT.
- Type:
- bootstrap_distribution
Full bootstrap distribution of overall ATT.
- Type:
Optional[np.ndarray]
- n_bootstrap: int
- weight_type: str
- alpha: float
- overall_att_se: float
- overall_att_p_value: float
- __init__(n_bootstrap, weight_type, alpha, overall_att_se, overall_att_ci, overall_att_p_value, event_study_ses, event_study_cis, event_study_p_values, bootstrap_distribution=None)
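For orientation, a pairs bootstrap resamples whole units with replacement and recomputes the statistic on each resample; the bootstrap standard error is the standard deviation of those recomputed values. The sketch below shows only that resampling loop. In a real fit the statistic would be a full re-estimation on the resampled panel; here a simple mean of hypothetical unit-level effects stands in for it, and pairs_bootstrap_se is a hypothetical helper, not part of diff_diff.

```python
import random
from statistics import mean, stdev


def pairs_bootstrap_se(unit_ids, statistic, n_bootstrap=200, seed=0):
    """Bootstrap SE of statistic(resampled_unit_ids), resampling units with replacement."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_bootstrap):
        resampled = rng.choices(unit_ids, k=len(unit_ids))
        draws.append(statistic(resampled))
    return stdev(draws), draws


# Stand-in for an estimator: the mean of hypothetical unit-level effects.
unit_effect = {f"u{i}": 1.0 + 0.1 * i for i in range(20)}


def mean_effect(ids):
    return mean(unit_effect[u] for u in ids)


se, draws = pairs_bootstrap_se(list(unit_effect), mean_effect, n_bootstrap=200, seed=42)
print(round(se, 4), len(draws))
```

Resampling units rather than individual rows keeps each unit's full time series together, which preserves within-unit correlation in the resampled data.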
StaggeredTripleDifference
Ortiz-Villavicencio & Sant’Anna (2025) staggered triple-difference (DDD) estimator with group-time ATT identification under heterogeneous treatment timing.
- class diff_diff.StaggeredTripleDifference[source]
Bases: CallawaySantAnnaBootstrapMixin, CallawaySantAnnaAggregationMixin
Staggered Triple Difference (DDD) estimator.
Computes group-time average treatment effects ATT(g,t) for settings with staggered adoption and a binary eligibility dimension, using the three-DiD decomposition of Ortiz-Villavicencio & Sant’Anna (2025).
Multiple comparison groups are combined via GMM-optimal (inverse-variance) weighting. Event study, group, and overall aggregations are supported.
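The idea of inverse-variance weighting can be illustrated in its simplest form, where the estimates from different comparison groups are treated as uncorrelated. That independence is an assumption made only for this sketch: estimates built from overlapping data are generally correlated, and a GMM-optimal combination would then use their full covariance matrix. The function name and numbers are hypothetical.

```python
def inverse_variance_combine(estimates, variances):
    """Combine uncorrelated estimates with weights proportional to 1 / variance."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    combined = sum(w * est for w, est in zip(weights, estimates))
    combined_variance = 1.0 / total  # valid only under the no-correlation assumption
    return combined, combined_variance, weights


# Two hypothetical ATT(g,t) estimates from different comparison groups.
combined, combined_variance, weights = inverse_variance_combine(
    estimates=[1.0, 3.0], variances=[1.0, 3.0]
)
print(combined, combined_variance, weights)
```

The more precise estimate receives the larger weight, and the combined variance is smaller than either input variance.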
- Parameters:
estimation_method (str, default="dr") – Estimation method: “dr” (doubly robust), “ipw” (inverse probability weighting), or “reg” (regression adjustment).
alpha (float, default=0.05) – Significance level.
anticipation (int, default=0) – Number of anticipation periods.
base_period (str, default="varying") – Base period selection: “varying” (consecutive comparisons) or “universal” (always vs g-1-anticipation).
n_bootstrap (int, default=0) – Number of multiplier bootstrap repetitions. 0 disables bootstrap.
bootstrap_weights (str, default="rademacher") – Bootstrap weight distribution: “rademacher”, “mammen”, or “webb”.
seed (int or None, default=None) – Random seed for reproducibility.
cband (bool, default=True) – Whether to compute simultaneous confidence bands.
pscore_trim (float, default=0.01) – Propensity score trimming bound.
cluster (str or None, default=None) – Column name for cluster-robust standard errors.
rank_deficient_action (str, default="warn") – Action for rank-deficient design matrices: “warn”, “error”, “silent”.
epv_threshold (float, default=10) – Minimum events per variable for propensity score logistic regression. A warning is emitted when EPV falls below this threshold.
pscore_fallback (str, default="error") – Action when propensity score estimation fails: “error” (raise) or “unconditional” (fall back to unconditional propensity).
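The effect of pscore_trim can be sketched as a clip applied before weights are formed. The odds weight p / (1 - p) used below is the usual inverse-probability weight given to control units when targeting the ATT; how the library builds its weights internally is not shown in this reference, so treat the helper names and the weight formula as illustrative assumptions.

```python
def clip_pscore(p, trim=0.01):
    """Clip a propensity score to [trim, 1 - trim]; trim must lie in (0, 0.5)."""
    if not 0.0 < trim < 0.5:
        raise ValueError("trim must be in (0, 0.5)")
    return min(max(p, trim), 1.0 - trim)


def control_odds_weight(p, trim=0.01):
    """Odds weight p / (1 - p) for a control unit, after clipping."""
    p = clip_pscore(p, trim)
    return p / (1.0 - p)


for score in (0.0005, 0.3, 0.9999, 1.0):
    print(score, clip_pscore(score), round(control_odds_weight(score), 4))
```

Without clipping, a score of exactly 1 would produce a division by zero and scores near 1 would produce extreme weights; with the default bound of 0.01 the largest possible odds weight is 99.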
References
Ortiz-Villavicencio, M. & Sant’Anna, P.H.C. (2025). “Better Understanding Triple Differences Estimators.” arXiv:2505.09942.
- __init__(estimation_method='dr', control_group='notyettreated', alpha=0.05, anticipation=0, base_period='varying', n_bootstrap=0, bootstrap_weights='rademacher', seed=None, cband=True, pscore_trim=0.01, cluster=None, rank_deficient_action='warn', epv_threshold=10, pscore_fallback='error')[source]
- results_: StaggeredTripleDiffResults | None
- set_params(**params)[source]
Set estimator parameters (sklearn-compatible).
- Return type:
- fit(data, outcome, unit, time, first_treat, eligibility, covariates=None, aggregate=None, balance_e=None, survey_design=None)[source]
Fit the staggered triple difference estimator.
- Parameters:
data (pd.DataFrame) – Panel data.
outcome (str) – Outcome variable column name.
unit (str) – Unit identifier column name.
time (str) – Time period column name.
first_treat (str) – Column with the enabling period for each unit’s group. Use 0 or np.inf for never-enabled units.
eligibility (str) – Binary eligibility indicator column (0/1, time-invariant).
covariates (list of str, optional) – Covariate column names.
aggregate (str, optional) – Aggregation method: “event_study”, “group”, “simple”, or “all”.
balance_e (int, optional) – Event time to balance on for event study.
survey_design (SurveyDesign, optional) – Survey design specification for complex survey data. When provided, uses survey weights for estimation (weighted Riesz representers, weighted logit, weighted OLS) and design-based variance for aggregated SEs (overall, event study, group) via Taylor Series Linearization or replicate weights. Requires weight_type='pweight'.
- Return type:
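The column requirements above can be illustrated with a toy long-format panel. The sketch below builds one as plain row dicts and checks that eligibility is binary and that both eligibility and the enabling period are constant within a unit. The column names ("y", "unit", "time", "first_treat", "eligible") and both helpers are hypothetical, and treating first_treat as time-invariant is an assumption based on its description.

```python
import math


def make_panel():
    """Build a toy long-format panel as a list of row dicts."""
    # (unit id, enabling period S, eligibility Q); 0 or inf marks never-enabled.
    units = [(1, 3, 1), (2, 3, 0), (3, math.inf, 1), (4, 0, 0)]
    rows = []
    for unit_id, s, q in units:
        for t in range(1, 5):
            enabled = 0 < s < math.inf and t >= s
            # Only enabled AND eligible units receive the made-up effect of 2.
            y = float(t) + (2.0 if enabled and q == 1 else 0.0)
            rows.append({"unit": unit_id, "time": t, "y": y,
                         "first_treat": s, "eligible": q})
    return rows


def check_panel(rows, unit="unit", first_treat="first_treat",
                eligibility="eligible"):
    """Validate the columns on a list of row dicts (illustrative only)."""
    seen = {}
    for row in rows:
        q = row[eligibility]
        if q not in (0, 1):
            raise ValueError(f"eligibility must be 0/1, got {q!r}")
        pair = (row[first_treat], q)
        # Each unit must keep the same (first_treat, eligibility) in every row.
        if seen.setdefault(row[unit], pair) != pair:
            raise ValueError(f"unit {row[unit]!r}: values vary over time")
    return seen
```

Records like these could be wrapped in a pd.DataFrame and passed as data, with the column names supplied through outcome, unit, time, first_treat, and eligibility.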
StaggeredTripleDiffResults#
Results container for StaggeredTripleDifference estimation.
- class diff_diff.StaggeredTripleDiffResults[source]
Bases: object
Results from Staggered Triple Difference (DDD) estimation.
Implements the Ortiz-Villavicencio & Sant’Anna (2025) estimator for staggered adoption settings with an eligibility dimension.
- group_time_effects
Dictionary mapping (group, time) tuples to effect dictionaries.
- Type:
- overall_att
Overall average treatment effect (weighted average of ATT(g,t)).
- Type: float
- overall_se
Standard error of overall ATT.
- Type: float
- overall_t_stat
T-statistic for overall ATT.
- Type: float
- overall_p_value
P-value for overall ATT.
- Type: float
- overall_conf_int
Confidence interval for overall ATT.
- Type:
- groups
List of enabling cohorts (first treatment periods).
- Type:
- time_periods
List of all time periods.
- Type:
- n_obs
Total number of observations.
- Type: int
- n_treated_units
Number of treated units (S < inf AND Q = 1).
- Type: int
- n_control_units
Number of units not in treated group.
- Type: int
- n_never_enabled
Number of never-enabled units (S = inf or 0).
- Type: int
- n_eligible
Number of eligible units (Q = 1).
- Type: int
- n_ineligible
Number of ineligible units (Q = 0).
- Type: int
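The unit counts above follow directly from each unit's enabling period S and eligibility Q. A minimal sketch of the definitions as stated, where `count_units` is a hypothetical helper and the input mapping is an assumed representation:

```python
import math


def count_units(units):
    """Count units by role; `units` maps unit id -> (S, Q)."""
    def never_enabled(s):
        # S = inf or 0 marks a never-enabled unit.
        return s == 0 or math.isinf(s)

    n_total = len(units)
    # Treated: enabled at some finite period AND eligible.
    n_treated = sum(1 for s, q in units.values()
                    if not never_enabled(s) and q == 1)
    n_eligible = sum(1 for _, q in units.values() if q == 1)
    return {
        "n_treated_units": n_treated,
        "n_control_units": n_total - n_treated,
        "n_never_enabled": sum(1 for s, _ in units.values()
                               if never_enabled(s)),
        "n_eligible": n_eligible,
        "n_ineligible": n_total - n_eligible,
    }


counts = count_units({1: (3, 1), 2: (3, 0), 3: (math.inf, 1),
                      4: (0, 0), 5: (4, 1)})
print(counts)
```

Note that a unit can be eligible yet untreated (never enabled), or enabled yet untreated (ineligible); both kinds fall under n_control_units.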
- overall_att: float
- overall_se: float
- overall_t_stat: float
- overall_p_value: float
- n_obs: int
- n_treated_units: int
- n_control_units: int
- n_never_enabled: int
- n_eligible: int
- n_ineligible: int
- alpha: float = 0.05
- control_group: str = 'notyettreated'
- base_period: str = 'varying'
- anticipation: int = 0
- estimation_method: str = 'dr'
- influence_functions: np.ndarray | None = None
- bootstrap_results: CSBootstrapResults | None = None
- pscore_trim: float = 0.01
- epv_threshold: float = 10
- pscore_fallback: str = 'error'
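As a rough guide to how overall_att, overall_se, and alpha relate to the t-statistic, p-value, and confidence interval, the sketch below applies a standard Wald calculation. It assumes a normal reference distribution and a positive, finite standard error. The library may use a different reference distribution or bootstrap quantiles, and `wald_inference` is a hypothetical helper.

```python
from statistics import NormalDist


def wald_inference(att, se, alpha=0.05):
    """Return (t_stat, p_value, conf_int) under a normal approximation."""
    if not se > 0:
        raise ValueError("se must be positive")
    z = NormalDist()  # standard normal
    t_stat = att / se
    # Two-sided p-value.
    p_value = 2.0 * (1.0 - z.cdf(abs(t_stat)))
    # Critical value for a (1 - alpha) two-sided interval.
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    return t_stat, p_value, (att - crit * se, att + crit * se)


t_stat, p_value, conf_int = wald_inference(att=2.0, se=1.0, alpha=0.05)
print(t_stat, p_value, conf_int)
```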
- property att: float
- property se: float
- property p_value: float
- property t_stat: float
- property coef_var: float
Coefficient of variation: SE / abs(overall ATT). NaN when ATT is 0 or SE is non-finite.
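The coefficient of variation described above can be written out in a few lines. This is a sketch of the stated rule rather than the library's code, and the free function `coef_var` is a hypothetical stand-in for the property.

```python
import math


def coef_var(att, se):
    """SE / abs(ATT); NaN when ATT is 0 or SE is non-finite."""
    if att == 0 or not math.isfinite(se):
        return math.nan
    return se / abs(att)


print(coef_var(2.0, 0.5))
print(coef_var(-4.0, 1.0))
print(coef_var(0.0, 0.5))
```

Taking the absolute value keeps the ratio non-negative for negative effects, so it can be read as relative precision regardless of sign.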
- summary(alpha=None)[source]
Generate formatted summary of estimation results.
- print_summary(alpha=None)[source]
Print summary to stdout.
- Parameters:
alpha (float | None)
- Return type:
None
- epv_summary(show_all=False)[source]
Return per-cohort EPV diagnostics as a DataFrame.
- Parameters:
show_all (bool, default False) – If False, only show cells with low EPV. If True, show all cells.
- Returns:
Columns: group, time, epv, n_events, n_params, is_low.
- Return type:
pd.DataFrame
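The diagnostics table can be pictured as one row per (group, time) cell, with EPV equal to events divided by parameters and a flag when it falls below epv_threshold. The sketch below mirrors the documented columns using plain dicts. How the library counts events and parameters is not specified here, so both are taken as given inputs, and `epv_table` is a hypothetical helper.

```python
def epv_table(cells, threshold=10, show_all=False):
    """Build EPV rows from {(group, time): (n_events, n_params)}."""
    rows = []
    for (group, time), (n_events, n_params) in sorted(cells.items()):
        # Events per variable; treated as infinite when there are no parameters.
        epv = n_events / n_params if n_params > 0 else float("inf")
        rows.append({
            "group": group,
            "time": time,
            "epv": epv,
            "n_events": n_events,
            "n_params": n_params,
            "is_low": epv < threshold,
        })
    # By default only the problematic cells are reported.
    return rows if show_all else [r for r in rows if r["is_low"]]


cells = {(3, 3): (40, 4), (3, 4): (12, 4), (4, 4): (100, 5)}
low_only = epv_table(cells)
everything = epv_table(cells, show_all=True)
print(low_only)
```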
- to_dataframe(level='group_time')[source]
Convert results to DataFrame.
- Parameters:
level (str, default="group_time") – Level of aggregation: “group_time”, “event_study”, or “group”.
- Returns:
Results as DataFrame.
- Return type:
pd.DataFrame
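To show how the "group_time" and "event_study" levels relate, the sketch below flattens a (group, time) dictionary into rows and then averages effects by event time e = t - g. The "effect" key, the added event_time column, and the unweighted average are all assumptions made for illustration. The library's actual column names and aggregation weights may differ.

```python
def flatten_group_time(group_time_effects):
    """Turn {(g, t): {...}} into a list of row dicts (illustrative)."""
    rows = []
    for (g, t), cell in sorted(group_time_effects.items()):
        row = {"group": g, "time": t, "event_time": t - g}
        row.update(cell)  # copy the per-cell statistics into the row
        rows.append(row)
    return rows


def simple_event_study(group_time_effects, key="effect"):
    """Unweighted mean of cell effects at each event time e = t - g."""
    by_event_time = {}
    for (g, t), cell in group_time_effects.items():
        by_event_time.setdefault(t - g, []).append(cell[key])
    return {e: sum(v) / len(v) for e, v in sorted(by_event_time.items())}


effects = {(3, 3): {"effect": 1.0}, (3, 4): {"effect": 2.0},
           (4, 4): {"effect": 3.0}}
rows = flatten_group_time(effects)
event_study = simple_event_study(effects)
print(rows)
print(event_study)
```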
- property is_significant: bool
Check if overall ATT is significant.
- property significance_stars: str
Significance stars for overall ATT.
- __init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, groups, time_periods, n_obs, n_treated_units, n_control_units, n_never_enabled, n_eligible, n_ineligible, alpha=0.05, control_group='notyettreated', base_period='varying', anticipation=0, estimation_method='dr', event_study_effects=None, group_effects=None, influence_functions=None, bootstrap_results=None, cband_crit_value=None, pscore_trim=0.01, survey_metadata=None, comparison_group_counts=None, gmm_weights=None, epv_diagnostics=None, epv_threshold=10, pscore_fallback='error')
- Parameters:
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
n_obs (int)
n_treated_units (int)
n_control_units (int)
n_never_enabled (int)
n_eligible (int)
n_ineligible (int)
alpha (float)
control_group (str)
base_period (str)
anticipation (int)
estimation_method (str)
influence_functions (np.ndarray | None)
bootstrap_results (CSBootstrapResults | None)
cband_crit_value (float | None)
pscore_trim (float)
survey_metadata (Any | None)
epv_diagnostics (Dict[Tuple[Any, Any], Dict[str, Any]] | None)
epv_threshold (float)
pscore_fallback (str)
- Return type:
None