Two-Stage DiD (Gardner 2022)

This tutorial demonstrates the TwoStageDiD estimator, which implements the two-stage difference-in-differences method from Gardner (2022), “Two-stage differences in differences”, with inference from Butts & Gardner (2022), “did2s: Two-Stage Difference-in-Differences”.

When to use TwoStageDiD:

  • Staggered adoption settings where you want a GMM sandwich variance estimator that accounts for first-stage estimation uncertainty

  • When you want per-observation treatment effects (the treatment_effects DataFrame) for granular analysis

  • As a robustness check alongside ImputationDiD: the point estimates are identical, so comparing the two sets of standard errors shows whether conclusions depend on the choice of variance estimator

[ ]:
import numpy as np
import warnings
# Silence warnings to keep the tutorial output compact
warnings.filterwarnings('ignore')

from diff_diff import (
    TwoStageDiD, ImputationDiD, CallawaySantAnna,
    generate_staggered_data, plot_event_study
)

# For nicer plots (optional)
try:
    import matplotlib.pyplot as plt
    try:
        plt.style.use('seaborn-v0_8-whitegrid')
    except OSError:
        # This style name needs matplotlib >= 3.6; keep the default style otherwise
        pass
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")

Basic Usage

The two-stage estimator follows a simple algorithm:

  1. Estimate unit and time fixed effects using only untreated observations (all periods of never-treated units plus the pre-treatment periods of not-yet-treated units)

  2. Residualize all outcomes using those estimated FEs

  3. Regress residualized outcomes on treatment indicators to obtain the ATT

This avoids the bias that conventional TWFE regression can suffer under staggered adoption: the fixed effects are estimated only on clean (untreated) observations, so treated outcomes cannot contaminate the counterfactual.
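The three steps can be traced by hand on a tiny noiseless panel. The sketch below is a dependency-free illustration, not the library's implementation: the toy data, the convention that first_treat = 0 marks a never-treated unit, the fixed-effect fit by alternating means, and the no-intercept second stage are all assumptions made for this example.

```python
# Toy noiseless panel: y = unit effect + period effect + 2.0 * treated.
# All numbers are illustrative; first_treat = 0 means "never treated" here.
unit_fe = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
time_fe = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}
first_treat = {0: 0, 1: 0, 2: 3, 3: 4}
TRUE_EFFECT = 2.0

rows = []
for i in unit_fe:
    for t in time_fe:
        treated = first_treat[i] > 0 and t >= first_treat[i]
        y = unit_fe[i] + time_fe[t] + TRUE_EFFECT * float(treated)
        rows.append({"unit": i, "period": t, "treated": treated, "y": y})


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Step 1: fit unit and period effects on untreated rows only, alternating
# between unit means and period means until the fit settles.
untreated = [r for r in rows if not r["treated"]]
alpha = {i: 0.0 for i in unit_fe}
gamma = {t: 0.0 for t in time_fe}
for _ in range(500):
    for i in alpha:
        alpha[i] = mean(r["y"] - gamma[r["period"]]
                        for r in untreated if r["unit"] == i)
    for t in gamma:
        gamma[t] = mean(r["y"] - alpha[r["unit"]]
                        for r in untreated if r["period"] == t)

# Step 2: residualize every row with the estimated effects.
for r in rows:
    r["resid"] = r["y"] - alpha[r["unit"]] - gamma[r["period"]]

# Step 3: no-intercept regression of the residual on the treatment dummy,
# which reduces to the mean residual over treated rows.
d = [float(r["treated"]) for r in rows]
att = sum(di * r["resid"] for di, r in zip(d, rows)) / sum(di * di for di in d)
print(f"Recovered ATT: {att:.6f}")
```

Because the toy data contain no noise, the recovered ATT matches the effect built into the data up to numerical tolerance. With real data the first stage would be an ordinary fixed-effects regression and the estimate would carry sampling error.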

[ ]:
# Generate staggered adoption data with known treatment effect
data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)

# Fit the two-stage estimator
est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')
results.print_summary()

Event Study

Event study aggregation estimates treatment effects at each relative time horizon, enabling visualization of dynamic effects and informal pre-trend assessment.
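As a rough illustration of what a relative-time horizon means, the sketch below groups hypothetical per-observation effects by period minus first treatment period and averages within each horizon. The numbers are made up, and treating each horizon effect as a simple average is an assumption of this sketch; the estimator's own handling of pre-periods and the reference period may differ.

```python
from collections import defaultdict

# Hypothetical per-observation effects for treated rows (illustrative values).
observations = [
    {"unit": 1, "period": 3, "first_treat": 3, "tau_hat": 1.0},
    {"unit": 1, "period": 4, "first_treat": 3, "tau_hat": 2.0},
    {"unit": 1, "period": 5, "first_treat": 3, "tau_hat": 3.0},
    {"unit": 2, "period": 4, "first_treat": 4, "tau_hat": 2.0},
    {"unit": 2, "period": 5, "first_treat": 4, "tau_hat": 2.0},
]

# Relative time (horizon) = calendar period minus first treatment period.
by_horizon = defaultdict(list)
for obs in observations:
    horizon = obs["period"] - obs["first_treat"]
    by_horizon[horizon].append(obs["tau_hat"])

# Average the effects that share a horizon.
event_study = {h: sum(v) / len(v) for h, v in sorted(by_horizon.items())}
for h, effect in event_study.items():
    print(f"horizon {h}: {effect:.2f}")
```

Later horizons are identified by fewer cohorts (here only unit 1 reaches horizon 2), which is one reason their estimates tend to be noisier.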

[ ]:
# Fit with event study aggregation
est = TwoStageDiD()
results_es = est.fit(data, outcome='outcome', unit='unit', time='period',
                     first_treat='first_treat', aggregate='event_study')

# Plot event study
if HAS_MATPLOTLIB:
    plot_event_study(results_es, title='Two-Stage DiD Event Study')
else:
    print("Install matplotlib to see visualizations: pip install matplotlib")

[ ]:
# View event study effects as a table
results_es.to_dataframe(level='event_study')

Per-Observation Treatment Effects

Both TwoStageDiD and ImputationDiD provide a treatment_effects DataFrame containing one row per treated observation with:

  • tau_hat: the residualized outcome (actual outcome minus estimated counterfactual)

  • The unit and time columns (using the original column names from the input data, e.g., unit and period)

  • rel_time: relative time since treatment

  • weight: aggregation weight, equal to 1/n_valid for observations with finite tau_hat (where n_valid is the number of such observations) and 0 for rows where tau_hat is NaN (e.g., rank-deficient cases)

This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes.
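To make the weight column concrete, here is a small sketch that rebuilds the 1/n_valid weights on hypothetical rows and uses them for an overall average and per-unit contributions. Plain dictionaries stand in for DataFrame rows, and the values are invented for illustration; only the column names come from the description above.

```python
import math
from collections import defaultdict

# Hypothetical rows mimicking the treatment_effects columns.
rows = [
    {"unit": 1, "period": 3, "rel_time": 0, "tau_hat": 1.5},
    {"unit": 1, "period": 4, "rel_time": 1, "tau_hat": 2.5},
    {"unit": 2, "period": 4, "rel_time": 0, "tau_hat": float("nan")},
    {"unit": 3, "period": 4, "rel_time": 0, "tau_hat": 2.0},
]

# Weight is 1/n_valid for finite effects and 0 for NaN rows.
n_valid = sum(1 for r in rows if math.isfinite(r["tau_hat"]))
for r in rows:
    r["weight"] = 1.0 / n_valid if math.isfinite(r["tau_hat"]) else 0.0

# Weighted sum over valid rows gives the overall average effect.
overall_att = sum(r["weight"] * r["tau_hat"] for r in rows if r["weight"] > 0)

# Per-unit contributions show which units drive the aggregate.
contribution = defaultdict(float)
for r in rows:
    if r["weight"] > 0:
        contribution[r["unit"]] += r["weight"] * r["tau_hat"]

print(f"overall ATT: {overall_att:.3f}")
print({u: round(c, 3) for u, c in contribution.items()})
```

Skipping zero-weight rows explicitly matters: multiplying a NaN effect by a zero weight still yields NaN, which would otherwise propagate into the sum.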

[ ]:
# Per-observation treatment effects (available from the basic fit)
te = results.treatment_effects
print(f"Shape: {te.shape}")
print(f"Columns: {list(te.columns)}")
print()
te.head(10)

Comparison with Other Estimators

TwoStageDiD and ImputationDiD produce identical point estimates because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).
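The point-estimate identity can be seen from the second-stage regression itself. As a sketch, assume the second stage regresses the residualized outcome on a single treatment indicator D_it with no intercept. The symbols for the first-stage unit and time effects and for the number of treated observations N_1 are notation introduced here, not taken from the library.

```latex
\hat{\tau}
  = \frac{\sum_{i,t} D_{it}\,\bigl(y_{it} - \hat{\alpha}_i - \hat{\gamma}_t\bigr)}
         {\sum_{i,t} D_{it}^{2}}
  = \frac{1}{N_1} \sum_{(i,t):\, D_{it}=1} \bigl(y_{it} - \hat{y}_{it}(0)\bigr),
\qquad
\hat{y}_{it}(0) = \hat{\alpha}_i + \hat{\gamma}_t,
\quad
N_1 = \sum_{i,t} D_{it}.
```

The second equality uses the fact that D_it is binary, so its square equals itself. The right-hand side is the equally weighted average of observed minus imputed untreated outcomes over treated observations, which is the form an imputation estimator takes when it uses the same fixed effects.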

CallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when n_bootstrap > 0.

Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.

[ ]:
# Fit all three estimators on the same data
ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                       time='period', first_treat='first_treat')
imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',
                          time='period', first_treat='first_treat')
cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',
                            time='period', first_treat='first_treat')

print("Estimator Comparison (True effect = 2.0)")
print("=" * 55)
print(f"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}")
print("-" * 55)

for name, r in [("TwoStageDiD", ts), ("ImputationDiD", imp), ("CallawaySantAnna", cs)]:
    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]
    print(f"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}")

Group Aggregation

Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period).

[ ]:
# Fit with group aggregation
results_grp = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                                 time='period', first_treat='first_treat',
                                 aggregate='group')
results_grp.to_dataframe(level='group')

Advanced Features

Anticipation

If treatment effects begin before the official treatment date (e.g., firms change behavior in anticipation of a policy), use the anticipation parameter to shift the treatment onset back.
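The effect of shifting the onset back can be sketched as a rule for which observations still count as untreated in the first stage. The helper below is hypothetical and only illustrates the idea; it assumes first_treat = 0 marks a never-treated unit and is not the library's internal logic.

```python
def is_clean(period, first_treat, anticipation=0):
    """Return True if the observation may enter the first-stage sample.

    Hypothetical helper: first_treat == 0 is taken to mean never treated.
    """
    if first_treat == 0:
        return True
    # With anticipation, the last `anticipation` pre-periods are no longer clean.
    return period < first_treat - anticipation


# A unit first treated in period 5, observed over periods 1 to 7.
periods = range(1, 8)
clean_default = [t for t in periods if is_clean(t, first_treat=5)]
clean_antic = [t for t in periods if is_clean(t, first_treat=5, anticipation=1)]

print("clean periods, no anticipation:      ", clean_default)
print("clean periods, 1-period anticipation:", clean_antic)
```

Allowing for anticipation shrinks the untreated sample used to estimate the fixed effects, so it trades some precision for protection against early behavioral responses.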

[ ]:
# Compare ATT with and without anticipation
est_antic = TwoStageDiD(anticipation=1)
results_antic = est_antic.fit(data, outcome='outcome', unit='unit',
                               time='period', first_treat='first_treat')
print(f"ATT (no anticipation):       {results.overall_att:.3f}")
print(f"ATT (1-period anticipation): {results_antic.overall_att:.3f}")

GMM Sandwich vs Conservative Variance

The key methodological distinction between TwoStageDiD and ImputationDiD is the variance estimator:

  • ImputationDiD’s conservative variance (Borusyak et al. 2024, Theorem 3) is valid under heterogeneous treatment effects but may produce wider confidence intervals than necessary

  • TwoStageDiD’s GMM sandwich accounts for first-stage estimation uncertainty via an influence function correction term

  • In practice they usually agree closely; large divergence signals potential specification concerns

  • Bootstrap inference is also available by setting n_bootstrap to a positive number of replications (e.g., n_bootstrap=199)

[ ]:
# Horizon-by-horizon SE comparison
ts_es = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                           time='period', first_treat='first_treat',
                           aggregate='event_study')
imp_es = ImputationDiD().fit(data, outcome='outcome', unit='unit',
                              time='period', first_treat='first_treat',
                              aggregate='event_study')

print("Horizon-by-Horizon Comparison: GMM Sandwich vs Conservative Variance")
print("=" * 70)
print(f"{'Horizon':>8} {'Effect':>10} {'GMM SE':>10} {'Cons. SE':>10} {'Ratio':>8}")
print("-" * 70)

for h in sorted(ts_es.event_study_effects.keys()):
    ts_eff = ts_es.event_study_effects[h]
    imp_eff = imp_es.event_study_effects[h]
    # A horizon with no observations is the reference period: print a placeholder row
    if ts_eff.get('n_obs', 0) == 0:
        print(f"{h:>8} {'[ref]':>10} {'---':>10} {'---':>10} {'---':>8}")
        continue
    effect = ts_eff['effect']
    gmm_se = ts_eff['se']
    cons_se = imp_eff['se']
    ratio = gmm_se / cons_se if cons_se > 0 else np.nan
    print(f"{h:>8} {effect:>10.4f} {gmm_se:>10.4f} {cons_se:>10.4f} {ratio:>8.3f}")

Summary

| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |
|---|---|---|---|
| Approach | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |
| Point estimates | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |
| Variance | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |
| Per-obs effects | Yes (treatment_effects) | Yes (treatment_effects) | No |
| Pre-trend test | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |
| Best for | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |

References:

  • Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943.

  • Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. R Journal, 14(1), 162-173.