Two-Stage DiD (Gardner 2022)#
Two-stage residualization estimator for staggered Difference-in-Differences.
This module implements the methodology from Gardner (2022), “Two-stage differences in differences”. The method:
Estimates unit + time fixed effects on untreated observations only
Residualizes ALL outcomes using the estimated fixed effects
Regresses residualized outcomes on treatment indicators (Stage 2)
Inference uses the GMM sandwich variance estimator from Butts & Gardner (2022) that correctly accounts for first-stage estimation uncertainty. Point estimates are identical to ImputationDiD (Borusyak et al. 2024); the key difference is the variance estimator (GMM sandwich vs. conservative).
When to use TwoStageDiD:
Staggered adoption settings where you want efficient point estimates with variance that accounts for first-stage estimation uncertainty
When you prefer the GMM sandwich variance over the conservative variance used by ImputationDiD — the sandwich estimator can yield tighter confidence intervals when first-stage uncertainty is small
As a robustness check alongside CallawaySantAnna and ImputationDiD: if all estimators agree, results are robust; if they disagree, investigate treatment effect heterogeneity
When you need an event study that is free of TWFE contamination bias
Reference: Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943. Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. R Journal, 14(1), 162-173.
TwoStageDiD#
Main estimator class for two-stage DiD estimation.
- class diff_diff.TwoStageDiD[source]
Bases:
TwoStageDiDBootstrapMixinGardner (2022) two-stage Difference-in-Differences estimator.
This estimator addresses TWFE bias under heterogeneous treatment effects by: 1. Estimating unit + time FEs on untreated observations only 2. Residualizing ALL outcomes using estimated FEs 3. Regressing residualized outcomes on treatment indicators
Point estimates are identical to ImputationDiD (Borusyak et al. 2024). The key difference is the variance estimator: TwoStageDiD uses a GMM sandwich variance that accounts for first-stage estimation uncertainty, while ImputationDiD uses the conservative variance from Theorem 3.
- Parameters:
anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default.
n_bootstrap (int, default=0) – Number of bootstrap iterations. If 0, uses analytical GMM sandwich inference.
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient: - “warn”: Issue warning and drop linearly dependent columns - “error”: Raise ValueError - “silent”: Drop columns silently
horizon_max (int, optional) – Maximum event-study horizon. If set, event study effects are only computed for abs(h) <= horizon_max.
pretrends (bool, default=False) – If True, event study includes pre-treatment horizons for visual pre-trends assessment. Pre-period effects should be ~0 under parallel trends. Only affects event_study aggregation; overall ATT and group aggregation are unchanged.
- results_
Estimation results after calling fit().
- Type:
- is_fitted_
Whether the model has been fitted.
- Type:
Examples
Basic usage:
>>> from diff_diff import TwoStageDiD, generate_staggered_data >>> data = generate_staggered_data(n_units=200, seed=42) >>> est = TwoStageDiD() >>> results = est.fit(data, outcome='outcome', unit='unit', ... time='period', first_treat='first_treat') >>> results.print_summary()
With event study:
>>> est = TwoStageDiD() >>> results = est.fit(data, outcome='outcome', unit='unit', ... time='period', first_treat='first_treat', ... aggregate='event_study') >>> from diff_diff import plot_event_study >>> plot_event_study(results)
Notes
The two-stage estimator uses ALL untreated observations (never-treated + not-yet-treated periods of eventually-treated units) to estimate the counterfactual model.
References
- Gardner, J. (2022). Two-stage differences in differences.
arXiv:2207.05943.
- Butts, K. & Gardner, J. (2022). did2s: Two-Stage
Difference-in-Differences. R Journal, 14(1), 162-173.
Methods
fit(data, outcome, unit, time, first_treat)Fit the two-stage DiD estimator.
get_params()Get estimator parameters (sklearn-compatible).
set_params(**params)Set estimator parameters (sklearn-compatible).
- __init__(anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', horizon_max=None, pretrends=False)[source]
- alpha: float
- n_bootstrap: int
- bootstrap_weights: str
- results_: TwoStageDiDResults | None
- fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None, survey_design=None)[source]
Fit the two-stage DiD estimator.
- Parameters:
data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.
covariates (list of str, optional) – List of covariate column names.
aggregate (str, optional) – Aggregation mode: None/”simple” (overall ATT only), “event_study”, “group”, or “all”.
balance_e (int, optional) – When computing event study, restrict to cohorts observed at all relative times in [-balance_e, max_h].
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. Supports pweight only (aweight/fweight raise ValueError). Supports strata, PSU, and FPC for design-based GMM sandwich variance. Strata enters survey df for t-distribution inference. Both analytical (n_bootstrap=0) and bootstrap inference are supported.
- Returns:
Object containing all estimation results.
- Return type:
- Raises:
ValueError – If required columns are missing or data validation fails.
- set_params(**params)[source]
Set estimator parameters (sklearn-compatible).
- Return type:
- print_summary()[source]
Print summary to stdout.
- Return type:
None
TwoStageDiDResults#
Results container for two-stage DiD estimation.
- class diff_diff.TwoStageDiDResults[source]
Bases:
objectResults from Gardner (2022) two-stage DiD estimation.
- treatment_effects
Per-observation treatment effects with columns: unit, time, tau_hat, weight. tau_hat is the residualized outcome y_tilde for treated observations; weight is 1/n_treated.
- Type:
pd.DataFrame
- overall_att
Overall average treatment effect on the treated.
- Type:
- overall_se
Standard error of overall ATT (GMM sandwich).
- Type:
- overall_t_stat
T-statistic for overall ATT.
- Type:
- overall_p_value
P-value for overall ATT.
- Type:
- overall_conf_int
Confidence interval for overall ATT.
- Type:
- event_study_effects
Dictionary mapping relative time h to effect dict with keys: ‘effect’, ‘se’, ‘t_stat’, ‘p_value’, ‘conf_int’, ‘n_obs’.
- Type:
dict, optional
- group_effects
Dictionary mapping cohort g to effect dict.
- Type:
dict, optional
- groups
List of treatment cohorts.
- Type:
- time_periods
List of all time periods.
- Type:
- n_obs
Total number of observations.
- Type:
- n_treated_obs
Number of treated observations.
- Type:
- n_untreated_obs
Number of untreated observations.
- Type:
- n_treated_units
Number of ever-treated units.
- Type:
- n_control_units
Number of units contributing to untreated observations.
- Type:
- alpha
Significance level used.
- Type:
- bootstrap_results
Bootstrap inference results.
- Type:
TwoStageBootstrapResults, optional
Methods
summary([alpha])Generate formatted summary of estimation results.
print_summary([alpha])Print summary to stdout.
to_dataframe([level])Convert results to DataFrame.
- treatment_effects: DataFrame
- overall_att: float
- overall_se: float
- overall_t_stat: float
- overall_p_value: float
- n_obs: int
- n_treated_obs: int
- n_untreated_obs: int
- n_treated_units: int
- n_control_units: int
- alpha: float = 0.05
- anticipation: int = 0
- bootstrap_results: TwoStageBootstrapResults | None = None
- property att: float
- property se: float
- property p_value: float
- property t_stat: float
- property coef_var: float
SE / abs(overall ATT). NaN when ATT is 0 or SE non-finite.
- Type:
Coefficient of variation
- summary(alpha=None)[source]
Generate formatted summary of estimation results.
- print_summary(alpha=None)[source]
Print summary to stdout.
- Parameters:
alpha (float | None)
- Return type:
None
- to_dataframe(level='event_study')[source]
Convert results to DataFrame.
- Parameters:
level (str, default="event_study") – Level of aggregation: - “event_study”: Event study effects by relative time - “group”: Group (cohort) effects - “observation”: Per-observation treatment effects
- Returns:
Results as DataFrame.
- Return type:
pd.DataFrame
- property is_significant: bool
Check if overall ATT is significant.
- property significance_stars: str
Significance stars for overall ATT.
- __init__(treatment_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, event_study_effects, group_effects, groups, time_periods, n_obs, n_treated_obs, n_untreated_obs, n_treated_units, n_control_units, alpha=0.05, anticipation=0, bootstrap_results=None, survey_metadata=None)
- Parameters:
treatment_effects (DataFrame)
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
n_obs (int)
n_treated_obs (int)
n_untreated_obs (int)
n_treated_units (int)
n_control_units (int)
alpha (float)
anticipation (int)
bootstrap_results (TwoStageBootstrapResults | None)
survey_metadata (Any | None)
- Return type:
None
TwoStageBootstrapResults#
Bootstrap inference results.
- class diff_diff.TwoStageBootstrapResults[source]
Bases:
objectResults from TwoStageDiD bootstrap inference.
Bootstrap uses multiplier bootstrap on the GMM influence function, consistent with other library estimators. The R did2s package uses block bootstrap by default; multiplier bootstrap is asymptotically equivalent.
- n_bootstrap
Number of bootstrap iterations.
- Type:
- weight_type
Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.
- Type:
- alpha
Significance level used for confidence intervals.
- Type:
- overall_att_se
Bootstrap standard error for overall ATT.
- Type:
- overall_att_ci
Bootstrap confidence interval for overall ATT.
- Type:
- overall_att_p_value
Bootstrap p-value for overall ATT.
- Type:
- event_study_ses
Bootstrap SEs for event study effects.
- Type:
dict, optional
- event_study_cis
Bootstrap CIs for event study effects.
- Type:
dict, optional
- event_study_p_values
Bootstrap p-values for event study effects.
- Type:
dict, optional
- group_ses
Bootstrap SEs for group effects.
- Type:
dict, optional
- group_cis
Bootstrap CIs for group effects.
- Type:
dict, optional
- group_p_values
Bootstrap p-values for group effects.
- Type:
dict, optional
- bootstrap_distribution
Full bootstrap distribution of overall ATT.
- Type:
np.ndarray, optional
- n_bootstrap: int
- weight_type: str
- alpha: float
- overall_att_se: float
- overall_att_p_value: float
- __init__(n_bootstrap, weight_type, alpha, overall_att_se, overall_att_ci, overall_att_p_value, event_study_ses=None, event_study_cis=None, event_study_p_values=None, group_ses=None, group_cis=None, group_p_values=None, bootstrap_distribution=None)
Convenience Function#
- diff_diff.two_stage_did(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None, survey_design=None, **kwargs)[source]#
Convenience function for two-stage DiD estimation.
This is a shortcut for creating a TwoStageDiD estimator and calling fit().
- Parameters:
data (pd.DataFrame) – Panel data.
outcome (str) – Outcome variable column name.
unit (str) – Unit identifier column name.
time (str) – Time period column name.
first_treat (str) – Column indicating first treatment period (0 for never-treated).
covariates (list of str, optional) – Covariate column names.
aggregate (str, optional) – Aggregation mode: None, “simple”, “event_study”, “group”, “all”.
balance_e (int, optional) – Balance event study to cohorts observed at all relative times.
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. Supports pweight only (aweight/fweight raise ValueError). Supports strata, PSU, and FPC for design-based GMM sandwich variance. Strata enters survey df for t-distribution inference. Both analytical (n_bootstrap=0) and bootstrap inference are supported.
**kwargs – Additional keyword arguments passed to TwoStageDiD constructor.
- Returns:
Estimation results.
- Return type:
Examples
>>> from diff_diff import two_stage_did, generate_staggered_data >>> data = generate_staggered_data(seed=42) >>> results = two_stage_did(data, 'outcome', 'unit', 'period', ... 'first_treat', aggregate='event_study') >>> results.print_summary()
Example Usage#
Basic usage:
from diff_diff import TwoStageDiD, generate_staggered_data
data = generate_staggered_data(n_units=200, seed=42)
est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit',
time='period', first_treat='first_treat')
results.print_summary()
Event study with visualization:
from diff_diff import TwoStageDiD, plot_event_study
est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit',
time='period', first_treat='first_treat',
aggregate='event_study')
plot_event_study(results)
Comparison with other estimators:
from diff_diff import TwoStageDiD, CallawaySantAnna, ImputationDiD
# All three should agree under homogeneous effects
ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
time='period', first_treat='first_treat')
cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',
time='period', first_treat='first_treat')
imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',
time='period', first_treat='first_treat')
print(f"Two-Stage ATT: {ts.overall_att:.3f} (SE: {ts.overall_se:.3f})")
print(f"CS ATT: {cs.overall_att:.3f} (SE: {cs.overall_se:.3f})")
print(f"Imputation ATT:{imp.overall_att:.3f} (SE: {imp.overall_se:.3f})")
Estimator Comparison#
Feature |
TwoStageDiD |
CallawaySantAnna |
ImputationDiD |
|---|---|---|---|
Point estimates |
Identical to ImputationDiD |
Group-time ATT(g,t) |
Identical to TwoStageDiD |
Variance estimator |
GMM sandwich (accounts for first-stage uncertainty) |
Analytical IF/WIF or multiplier bootstrap |
Conservative (Theorem 3) |
Control group |
Never-treated + not-yet-treated |
Never-treated or not-yet-treated |
Never-treated + not-yet-treated |
Efficiency |
High (uses all untreated obs) |
Lower (2x2 comparisons) |
High (uses all untreated obs) |
Heterogeneous effects |
Consistent under homogeneity |
Robust to heterogeneity |
Consistent under homogeneity |
Covariates |
Supported |
Supported (outcome regression or IPW) |
Supported |
Bootstrap |
Multiplier bootstrap on GMM influence function |
Multiplier bootstrap (IF/WIF) |
Multiplier bootstrap |