Two-Stage DiD (Gardner 2022)#

Two-stage residualization estimator for staggered Difference-in-Differences.

This module implements the methodology from Gardner (2022), “Two-stage differences in differences”. The method:

Estimates unit + time fixed effects on untreated observations only
Residualizes ALL outcomes using the estimated fixed effects
Regresses residualized outcomes on treatment indicators (Stage 2)

Inference uses the GMM sandwich variance estimator from Butts & Gardner (2022) that correctly accounts for first-stage estimation uncertainty. Point estimates are identical to ImputationDiD (Borusyak et al. 2024); the key difference is the variance estimator (GMM sandwich vs. conservative).

When to use TwoStageDiD:

Staggered adoption settings where you want efficient point estimates with variance that accounts for first-stage estimation uncertainty
When you prefer the GMM sandwich variance over the conservative variance used by ImputationDiD — the sandwich estimator can yield tighter confidence intervals when first-stage uncertainty is small
As a robustness check alongside CallawaySantAnna and ImputationDiD: if all estimators agree, results are robust; if they disagree, investigate treatment effect heterogeneity
When you need an event study that is free of TWFE contamination bias

Reference: Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943. Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. R Journal, 14(1), 162-173.

TwoStageDiD#

Main estimator class for two-stage DiD estimation.

class diff_diff.TwoStageDiD[source]

Bases: TwoStageDiDBootstrapMixin

Gardner (2022) two-stage Difference-in-Differences estimator.

This estimator addresses TWFE bias under heterogeneous treatment effects by: 1. Estimating unit + time FEs on untreated observations only 2. Residualizing ALL outcomes using estimated FEs 3. Regressing residualized outcomes on treatment indicators

Point estimates are identical to ImputationDiD (Borusyak et al. 2024). The key difference is the variance estimator: TwoStageDiD uses a GMM sandwich variance that accounts for first-stage estimation uncertainty, while ImputationDiD uses the conservative variance from Theorem 3.

Parameters:

anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default.
n_bootstrap (int, default=0) – Number of bootstrap iterations. If 0, uses analytical GMM sandwich inference.
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient: - “warn”: Issue warning and drop linearly dependent columns - “error”: Raise ValueError - “silent”: Drop columns silently
horizon_max (int, optional) – Maximum event-study horizon. If set, event study effects are only computed for abs(h) <= horizon_max.
pretrends (bool, default=False) – If True, event study includes pre-treatment horizons for visual pre-trends assessment. Pre-period effects should be ~0 under parallel trends. Only affects event_study aggregation; overall ATT and group aggregation are unchanged.

results_

Estimation results after calling fit().

Type:: TwoStageDiDResults

is_fitted_

Whether the model has been fitted.

Type:: bool

Examples

Basic usage:

>>> from diff_diff import TwoStageDiD, generate_staggered_data
>>> data = generate_staggered_data(n_units=200, seed=42)
>>> est = TwoStageDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='period', first_treat='first_treat')
>>> results.print_summary()

With event study:

>>> est = TwoStageDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='period', first_treat='first_treat',
...                   aggregate='event_study')
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

Notes

The two-stage estimator uses ALL untreated observations (never-treated + not-yet-treated periods of eventually-treated units) to estimate the counterfactual model.

References

Gardner, J. (2022). Two-stage differences in differences.: arXiv:2207.05943.
Butts, K. & Gardner, J. (2022). did2s: Two-Stage: Difference-in-Differences. R Journal, 14(1), 162-173.

Methods

`fit`(data, outcome, unit, time, first_treat)	Fit the two-stage DiD estimator.
`get_params`()	Get estimator parameters (sklearn-compatible).
`set_params`(**params)	Set estimator parameters (sklearn-compatible).

__init__(anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', horizon_max=None, pretrends=False)[source]

Parameters:

anticipation (int)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
rank_deficient_action (str)
horizon_max (int | None)
pretrends (bool)

alpha: float

n_bootstrap: int

bootstrap_weights: str

seed: int | None

horizon_max: int | None

results_: TwoStageDiDResults | None

fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None, survey_design=None)[source]

Fit the two-stage DiD estimator.

Parameters:

data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.
covariates (list of str, optional) – List of covariate column names.
aggregate (str, optional) – Aggregation mode: None/”simple” (overall ATT only), “event_study”, “group”, or “all”.
balance_e (int, optional) – When computing event study, restrict to cohorts observed at all relative times in [-balance_e, max_h].
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. Supports pweight only (aweight/fweight raise ValueError). Supports strata, PSU, and FPC for design-based GMM sandwich variance. Strata enters survey df for t-distribution inference. Both analytical (n_bootstrap=0) and bootstrap inference are supported.

Returns:

Object containing all estimation results.

Return type:

TwoStageDiDResults

Raises:

ValueError – If required columns are missing or data validation fails.

get_params()[source]

Get estimator parameters (sklearn-compatible).

Return type:: Dict[str, Any]

set_params(**params)[source]

Set estimator parameters (sklearn-compatible).

Return type:: TwoStageDiD

summary()[source]

Get summary of estimation results.

Return type:: str

print_summary()[source]

Print summary to stdout.

Return type:: None

TwoStageDiDResults#

Results container for two-stage DiD estimation.

class diff_diff.TwoStageDiDResults[source]

Bases: object

Results from Gardner (2022) two-stage DiD estimation.

treatment_effects

Per-observation treatment effects with columns: unit, time, tau_hat, weight. tau_hat is the residualized outcome y_tilde for treated observations; weight is 1/n_treated.

Type:: pd.DataFrame

overall_att

Overall average treatment effect on the treated.

Type:: float

overall_se

Standard error of overall ATT (GMM sandwich).

Type:: float

overall_t_stat

T-statistic for overall ATT.

Type:: float

overall_p_value

P-value for overall ATT.

Type:: float

overall_conf_int

Confidence interval for overall ATT.

Type:: tuple

event_study_effects

Dictionary mapping relative time h to effect dict with keys: ‘effect’, ‘se’, ‘t_stat’, ‘p_value’, ‘conf_int’, ‘n_obs’.

Type:: dict, optional

group_effects

Dictionary mapping cohort g to effect dict.

Type:: dict, optional

groups

List of treatment cohorts.

Type:: list

time_periods

List of all time periods.

Type:: list

n_obs

Total number of observations.

Type:: int

n_treated_obs

Number of treated observations.

Type:: int

n_untreated_obs

Number of untreated observations.

Type:: int

n_treated_units

Number of ever-treated units.

Type:: int

n_control_units

Number of units contributing to untreated observations.

Type:: int

alpha

Significance level used.

Type:: float

bootstrap_results

Bootstrap inference results.

Type:: TwoStageBootstrapResults, optional

Methods

`summary`([alpha])	Generate formatted summary of estimation results.
`print_summary`([alpha])	Print summary to stdout.
`to_dataframe`([level])	Convert results to DataFrame.

treatment_effects: DataFrame

overall_att: float

overall_se: float

overall_t_stat: float

overall_p_value: float

overall_conf_int: Tuple[float, float]

event_study_effects: Dict[int, Dict[str, Any]] | None

group_effects: Dict[Any, Dict[str, Any]] | None

groups: List[Any]

time_periods: List[Any]

n_obs: int

n_treated_obs: int

n_untreated_obs: int

n_treated_units: int

n_control_units: int

alpha: float = 0.05

anticipation: int = 0

bootstrap_results: TwoStageBootstrapResults | None = None

survey_metadata: Any | None = None

property att: float

property se: float

property conf_int: Tuple[float, float]

property p_value: float

property t_stat: float

__repr__()[source]

Concise string representation.

Return type:: str

property coef_var: float

SE / abs(overall ATT). NaN when ATT is 0 or SE non-finite.

Type:: Coefficient of variation

summary(alpha=None)[source]

Generate formatted summary of estimation results.

Parameters:: alpha (float, optional) – Significance level. Defaults to alpha used in estimation.
Returns:: Formatted summary.
Return type:: str

print_summary(alpha=None)[source]

Print summary to stdout.

Parameters:: alpha (float | None)
Return type:: None

to_dataframe(level='event_study')[source]

Convert results to DataFrame.

Parameters:: level (str, default="event_study") – Level of aggregation: - “event_study”: Event study effects by relative time - “group”: Group (cohort) effects - “observation”: Per-observation treatment effects
Returns:: Results as DataFrame.
Return type:: pd.DataFrame

property is_significant: bool: Check if overall ATT is significant.

property significance_stars: str: Significance stars for overall ATT.

__init__(treatment_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, event_study_effects, group_effects, groups, time_periods, n_obs, n_treated_obs, n_untreated_obs, n_treated_units, n_control_units, alpha=0.05, anticipation=0, bootstrap_results=None, survey_metadata=None)

Parameters:

treatment_effects (DataFrame)
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
overall_conf_int (Tuple[float, float])
event_study_effects (Dict[int, Dict[str, Any]] | None)
group_effects (Dict[Any, Dict[str, Any]] | None)
groups (List[Any])
time_periods (List[Any])
n_obs (int)
n_treated_obs (int)
n_untreated_obs (int)
n_treated_units (int)
n_control_units (int)
alpha (float)
anticipation (int)
bootstrap_results (TwoStageBootstrapResults | None)
survey_metadata (Any | None)

Return type:

None

TwoStageBootstrapResults#

Bootstrap inference results.

class diff_diff.TwoStageBootstrapResults[source]

Bases: object

Results from TwoStageDiD bootstrap inference.

Bootstrap uses multiplier bootstrap on the GMM influence function, consistent with other library estimators. The R did2s package uses block bootstrap by default; multiplier bootstrap is asymptotically equivalent.

n_bootstrap

Number of bootstrap iterations.

Type:: int

weight_type

Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.

Type:: str

alpha

Significance level used for confidence intervals.

Type:: float

overall_att_se

Bootstrap standard error for overall ATT.

Type:: float

overall_att_ci

Bootstrap confidence interval for overall ATT.

Type:: tuple

overall_att_p_value

Bootstrap p-value for overall ATT.

Type:: float

event_study_ses

Bootstrap SEs for event study effects.

Type:: dict, optional

event_study_cis

Bootstrap CIs for event study effects.

Type:: dict, optional

event_study_p_values

Bootstrap p-values for event study effects.

Type:: dict, optional

group_ses

Bootstrap SEs for group effects.

Type:: dict, optional

group_cis

Bootstrap CIs for group effects.

Type:: dict, optional

group_p_values

Bootstrap p-values for group effects.

Type:: dict, optional

bootstrap_distribution

Full bootstrap distribution of overall ATT.

Type:: np.ndarray, optional

n_bootstrap: int

weight_type: str

alpha: float

overall_att_se: float

overall_att_ci: Tuple[float, float]

overall_att_p_value: float

event_study_ses: Dict[int, float] | None = None

event_study_cis: Dict[int, Tuple[float, float]] | None = None

event_study_p_values: Dict[int, float] | None = None

group_ses: Dict[Any, float] | None = None

group_cis: Dict[Any, Tuple[float, float]] | None = None

group_p_values: Dict[Any, float] | None = None

bootstrap_distribution: ndarray | None = None

__init__(n_bootstrap, weight_type, alpha, overall_att_se, overall_att_ci, overall_att_p_value, event_study_ses=None, event_study_cis=None, event_study_p_values=None, group_ses=None, group_cis=None, group_p_values=None, bootstrap_distribution=None)

Parameters:

n_bootstrap (int)
weight_type (str)
alpha (float)
overall_att_se (float)
overall_att_ci (Tuple[float, float])
overall_att_p_value (float)
event_study_ses (Dict[int, float] | None)
event_study_cis (Dict[int, Tuple[float, float]] | None)
event_study_p_values (Dict[int, float] | None)
group_ses (Dict[Any, float] | None)
group_cis (Dict[Any, Tuple[float, float]] | None)
group_p_values (Dict[Any, float] | None)
bootstrap_distribution (ndarray | None)

Return type:

None

Convenience Function#

diff_diff.two_stage_did(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None, survey_design=None, **kwargs)[source]#

Convenience function for two-stage DiD estimation.

This is a shortcut for creating a TwoStageDiD estimator and calling fit().

Parameters:

data (pd.DataFrame) – Panel data.
outcome (str) – Outcome variable column name.
unit (str) – Unit identifier column name.
time (str) – Time period column name.
first_treat (str) – Column indicating first treatment period (0 for never-treated).
covariates (list of str, optional) – Covariate column names.
aggregate (str, optional) – Aggregation mode: None, “simple”, “event_study”, “group”, “all”.
balance_e (int, optional) – Balance event study to cohorts observed at all relative times.
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. Supports pweight only (aweight/fweight raise ValueError). Supports strata, PSU, and FPC for design-based GMM sandwich variance. Strata enters survey df for t-distribution inference. Both analytical (n_bootstrap=0) and bootstrap inference are supported.
**kwargs – Additional keyword arguments passed to TwoStageDiD constructor.

Returns:

Estimation results.

Return type:

TwoStageDiDResults

Examples

>>> from diff_diff import two_stage_did, generate_staggered_data
>>> data = generate_staggered_data(seed=42)
>>> results = two_stage_did(data, 'outcome', 'unit', 'period',
...                         'first_treat', aggregate='event_study')
>>> results.print_summary()

Example Usage#

Basic usage:

from diff_diff import TwoStageDiD, generate_staggered_data

data = generate_staggered_data(n_units=200, seed=42)
est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat')
results.print_summary()

Event study with visualization:

from diff_diff import TwoStageDiD, plot_event_study

est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')
plot_event_study(results)

Comparison with other estimators:

from diff_diff import TwoStageDiD, CallawaySantAnna, ImputationDiD

# All three should agree under homogeneous effects
ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                       time='period', first_treat='first_treat')
cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',
                            time='period', first_treat='first_treat')
imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',
                          time='period', first_treat='first_treat')

print(f"Two-Stage ATT: {ts.overall_att:.3f} (SE: {ts.overall_se:.3f})")
print(f"CS ATT:        {cs.overall_att:.3f} (SE: {cs.overall_se:.3f})")
print(f"Imputation ATT:{imp.overall_att:.3f} (SE: {imp.overall_se:.3f})")

Estimator Comparison#

TwoStageDiD vs. CallawaySantAnna vs. ImputationDiD#
Feature	TwoStageDiD	CallawaySantAnna	ImputationDiD
Point estimates	Identical to ImputationDiD	Group-time ATT(g,t)	Identical to TwoStageDiD
Variance estimator	GMM sandwich (accounts for first-stage uncertainty)	Analytical IF/WIF or multiplier bootstrap	Conservative (Theorem 3)
Control group	Never-treated + not-yet-treated	Never-treated or not-yet-treated	Never-treated + not-yet-treated
Efficiency	High (uses all untreated obs)	Lower (2x2 comparisons)	High (uses all untreated obs)
Heterogeneous effects	Consistent under homogeneity	Robust to heterogeneity	Consistent under homogeneity
Covariates	Supported	Supported (outcome regression or IPW)	Supported
Bootstrap	Multiplier bootstrap on GMM influence function	Multiplier bootstrap (IF/WIF)	Multiplier bootstrap