Interactive notebook
This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.
Two-Stage DiD (Gardner 2022)#
This tutorial demonstrates the TwoStageDiD estimator, which implements the two-stage difference-in-differences method from Gardner (2022), “Two-stage differences in differences”, with inference from Butts & Gardner (2022), “did2s: Two-Stage Difference-in-Differences”.
When to use TwoStageDiD:
Staggered adoption settings where you want GMM sandwich variance that accounts for first-stage estimation uncertainty
When you want per-observation treatment effects (treatment_effects DataFrame) for granular analysis
As a robustness check alongside ImputationDiD: identical point estimates with different inference confirm results are not an artifact of variance estimator choice
[ ]:
import numpy as np
import warnings
warnings.filterwarnings('ignore')
from diff_diff import (
    TwoStageDiD, ImputationDiD, CallawaySantAnna,
    generate_staggered_data, plot_event_study
)
# For nicer plots (optional)
try:
    import matplotlib.pyplot as plt
    plt.style.use('seaborn-v0_8-whitegrid')
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")
Basic Usage#
The two-stage estimator follows a simple algorithm:
Estimate unit and time fixed effects using only untreated observations (never-treated + not-yet-treated periods)
Residualize all outcomes using those estimated FEs
Regress residualized outcomes on treatment indicators to obtain the ATT
This avoids TWFE bias because the fixed effect model is estimated only on clean (untreated) data, preventing treated outcomes from contaminating the counterfactual.
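The three steps above can be sketched by hand with NumPy. This is a minimal illustration on a hypothetical simulated panel, not the TwoStageDiD implementation: the panel layout, cohort assignment, and all variable names are assumptions made for the example, and it produces a point estimate only (no standard errors).

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, true_att = 60, 8, 2.0

# Hypothetical balanced panel in long format: one row per (unit, period)
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)

# Staggered adoption (assumed): a third never treated (inf), the rest start in period 3 or 5
cohort = np.array([np.inf, 3.0, 5.0])[np.arange(n_units) % 3]
treated = period >= cohort[unit]  # treatment indicator D_it

# Outcome = unit FE + time FE + constant effect on treated rows + noise
unit_fe = rng.normal(size=n_units)[unit]
y = unit_fe + 0.3 * period + true_att * treated + rng.normal(scale=0.5, size=unit.size)

# Dummy-variable design for unit and time fixed effects
# (first period dropped so the columns are not collinear)
X = np.hstack([np.eye(n_units)[unit], np.eye(n_periods)[period][:, 1:]])

# Stage 1: estimate the fixed effects on untreated rows only
coef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)

# Residualize ALL outcomes with the stage-1 fixed effects
y_resid = y - X @ coef

# Stage 2: regress residualized outcomes on the treatment indicator (no intercept)
d = treated.astype(float)
att = (d @ y_resid) / (d @ d)

print(f"Hand-rolled two-stage ATT: {att:.3f} (true effect = {true_att})")
```

With a single treatment indicator and no intercept, the second-stage coefficient is simply the mean residualized outcome over treated rows, which is the same quantity an imputation estimator averages.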
[ ]:
# Generate staggered adoption data with known treatment effect
data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)
# Fit the two-stage estimator
est = TwoStageDiD()
results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')
results.print_summary()
Event Study#
Event study aggregation estimates treatment effects at each relative time horizon, enabling visualization of dynamic effects and informal pre-trend assessment.
[ ]:
# Fit with event study aggregation
est = TwoStageDiD()
results_es = est.fit(data, outcome='outcome', unit='unit', time='period',
                     first_treat='first_treat', aggregate='event_study')
# Plot event study
if HAS_MATPLOTLIB:
    plot_event_study(results_es, title='Two-Stage DiD Event Study')
else:
    print("Install matplotlib to see visualizations: pip install matplotlib")
[ ]:
# View event study effects as a table
results_es.to_dataframe(level='event_study')
Per-Observation Treatment Effects#
Both TwoStageDiD and ImputationDiD provide a treatment_effects DataFrame containing one row per treated observation with:
tau_hat: the residualized outcome (actual outcome minus estimated counterfactual)
The unit and time columns (using the original column names from the input data, e.g., unit and period)
rel_time: relative time since treatment
weight: aggregation weight, equal to 1/n_valid for observations with finite tau_hat and 0 for NaN rows (e.g., rank-deficient cases)
This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes.
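As a rough sketch of what such custom aggregation might look like, the example below builds a small hypothetical stand-in for the treatment_effects DataFrame (the values are made up, and the column layout simply follows the description above) and aggregates it in three ways with pandas. It assumes the weights behave as described, so that the weighted sum of tau_hat reproduces the simple mean over valid rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for results.treatment_effects; values are illustrative only
te_demo = pd.DataFrame({
    "unit": [1, 1, 2, 2, 3, 3],
    "period": [4, 5, 4, 5, 5, 6],
    "rel_time": [0, 1, 0, 1, 0, 1],
    "tau_hat": [1.8, 2.4, 2.1, np.nan, 1.9, 2.3],
})

# Weights as described above: 1/n_valid for finite tau_hat, 0 for NaN rows
valid = te_demo["tau_hat"].notna()
te_demo["weight"] = np.where(valid, 1.0 / valid.sum(), 0.0)

# Overall effect as a weighted sum (NaN rows contribute nothing)
overall = (te_demo["weight"] * te_demo["tau_hat"].fillna(0.0)).sum()

# Custom aggregation 1: average effect at each relative-time horizon
by_horizon = te_demo.groupby("rel_time")["tau_hat"].mean()

# Custom aggregation 2: which units contribute the largest average effects
by_unit = te_demo.groupby("unit")["tau_hat"].mean().sort_values(ascending=False)

print(f"Overall: {overall:.3f}")
print(by_horizon)
print(by_unit)
```

The same groupby pattern applies to the real DataFrame once the unit and time column names are swapped for those in your input data.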
[ ]:
# Per-observation treatment effects (available from the basic fit)
te = results.treatment_effects
print(f"Shape: {te.shape}")
print(f"Columns: {list(te.columns)}")
print()
te.head(10)
Comparison with Other Estimators#
TwoStageDiD and ImputationDiD produce identical point estimates because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).
CallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when n_bootstrap > 0.
Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.
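To make the group-time building block concrete, here is a minimal sketch of an unconditional ATT(g,t) computed by hand: the change in outcomes from the last pre-treatment period to period t for cohort g, minus the same change for never-treated units. It deliberately omits covariates, IPW, and doubly robust adjustments, and the choice of never-treated controls and of g-1 as the base period is an assumption of this example rather than a statement about the CallawaySantAnna defaults. The simulated panel and the helper name att_gt are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods, true_att = 90, 8, 2.0
periods = np.arange(n_periods)

# Hypothetical cohorts: never treated (inf), or first treated in period 3 or 5
first_treat_demo = np.array([np.inf, 3.0, 5.0])[np.arange(n_units) % 3]
D = periods[None, :] >= first_treat_demo[:, None]

# Wide panel: rows are units, columns are periods
Y = (rng.normal(size=(n_units, 1))          # unit fixed effects
     + 0.3 * periods[None, :]               # common time trend
     + true_att * D                         # constant treatment effect
     + rng.normal(scale=0.5, size=(n_units, n_periods)))

def att_gt(Y, first_treat, g, t):
    """Unconditional 2x2 DiD for cohort g at period t, never-treated controls."""
    base = g - 1                             # last period before treatment
    in_g = first_treat == g
    never = np.isinf(first_treat)
    change_treated = (Y[in_g, t] - Y[in_g, base]).mean()
    change_control = (Y[never, t] - Y[never, base]).mean()
    return change_treated - change_control

# One estimate per (cohort, post-treatment period) pair
estimates = {(g, t): att_gt(Y, first_treat_demo, g, t)
             for g in (3, 5) for t in range(g, n_periods)}

# A simple unweighted average; real aggregations typically weight by cohort size
simple_avg = np.mean(list(estimates.values()))
print(f"Average of {len(estimates)} ATT(g,t) estimates: {simple_avg:.3f}")
```

Because each ATT(g,t) uses only one cohort and one pair of periods, the aggregate depends on how the pieces are weighted, which is one reason its point estimate can differ from the fixed-effects-based estimators.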
[ ]:
# Fit all three estimators on the same data
ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                       time='period', first_treat='first_treat')
imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',
                          time='period', first_treat='first_treat')
cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',
                            time='period', first_treat='first_treat')
print("Estimator Comparison (True effect = 2.0)")
print("=" * 55)
print(f"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}")
print("-" * 55)
for name, r in [("TwoStageDiD", ts), ("ImputationDiD", imp), ("CallawaySantAnna", cs)]:
    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]
    print(f"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}")
Group Aggregation#
Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period).
[ ]:
# Fit with group aggregation
results_grp = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                                time='period', first_treat='first_treat',
                                aggregate='group')
results_grp.to_dataframe(level='group')
Advanced Features#
Anticipation#
If treatment effects begin before the official treatment date (e.g., firms change behavior in anticipation of a policy), use the anticipation parameter to shift the treatment onset back.
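Conceptually, shifting the onset back means that periods just before the official treatment date are no longer treated as clean when the fixed effects are estimated. The toy example below illustrates this for a single unit; it assumes that anticipation=k marks every period from first_treat - k onward as treated, which is one natural reading of the description above rather than a confirmed detail of the implementation.

```python
import numpy as np

period = np.arange(1, 9)   # periods 1..8 for a single hypothetical unit
first_treat_unit = 5       # official treatment date
anticipation = 1           # effects may start one period early

# Without anticipation: treated from the official date onward
treated_default = period >= first_treat_unit

# With anticipation: the effective onset moves back by `anticipation` periods
treated_shifted = period >= first_treat_unit - anticipation

# Periods still available for estimating the untreated fixed effects model
untreated_default = period[~treated_default]
untreated_shifted = period[~treated_shifted]

print("Clean periods (no anticipation):", untreated_default.tolist())
print("Clean periods (anticipation=1): ", untreated_shifted.tolist())
```

The trade-off is that each extra anticipation period removes one pre-treatment observation per treated unit from the first stage.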
[ ]:
# Compare ATT with and without anticipation
est_antic = TwoStageDiD(anticipation=1)
results_antic = est_antic.fit(data, outcome='outcome', unit='unit',
                              time='period', first_treat='first_treat')
print(f"ATT (no anticipation): {results.overall_att:.3f}")
print(f"ATT (1-period anticipation): {results_antic.overall_att:.3f}")
GMM Sandwich vs Conservative Variance#
The key methodological distinction between TwoStageDiD and ImputationDiD is the variance estimator:
ImputationDiD’s conservative variance (Theorem 3) is valid under heterogeneous treatment effects but may produce wider confidence intervals than necessary
TwoStageDiD’s GMM sandwich accounts for first-stage estimation uncertainty via an influence function correction term
In practice they usually agree closely; large divergence signals potential specification concerns
Bootstrap inference is also available via n_bootstrap=199
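One way to see why first-stage uncertainty matters is to re-run both stages on resampled data, so that the fixed effects are re-estimated in every replication. The sketch below does this with a generic unit-level (cluster) bootstrap on a hypothetical simulated panel. It is not the GMM sandwich, and it is not necessarily the scheme that n_bootstrap uses in the library; the helper two_stage_att and the data-generating process are assumptions made for illustration.

```python
import numpy as np

def two_stage_att(Y, first_treat):
    """Two-stage point estimate from a wide panel.

    Y: (n_units, n_periods) outcomes; first_treat: per-unit onset (np.inf = never).
    """
    n_units, n_periods = Y.shape
    unit = np.repeat(np.arange(n_units), n_periods)
    period = np.tile(np.arange(n_periods), n_units)
    y = Y.ravel()                                   # unit-major, matches unit/period
    treated = period >= first_treat[unit]
    X = np.hstack([np.eye(n_units)[unit], np.eye(n_periods)[period][:, 1:]])
    # Stage 1 on untreated rows, then residualize everything
    coef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)
    resid = y - X @ coef
    # Stage 2 with a single treatment dummy = mean residual over treated rows
    return resid[treated].mean()

rng = np.random.default_rng(2)
n_units, n_periods, true_att = 60, 8, 2.0
periods = np.arange(n_periods)
first_treat_demo = np.array([np.inf, 3.0, 5.0])[np.arange(n_units) % 3]
D = periods[None, :] >= first_treat_demo[:, None]
Y = (rng.normal(size=(n_units, 1)) + 0.3 * periods[None, :]
     + true_att * D + rng.normal(scale=0.5, size=(n_units, n_periods)))

att_hat = two_stage_att(Y, first_treat_demo)

# Resample whole units with replacement and repeat BOTH stages each time
n_boot = 199
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n_units, size=n_units)
    boot[b] = two_stage_att(Y[idx], first_treat_demo[idx])

se_boot = boot.std(ddof=1)
print(f"ATT = {att_hat:.3f}, bootstrap SE = {se_boot:.3f}")
```

Treating the residualized outcomes as fixed data and bootstrapping only the second stage would skip the re-estimation step and ignore exactly the uncertainty that the sandwich correction is designed to capture.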
[ ]:
# Horizon-by-horizon SE comparison
ts_es = TwoStageDiD().fit(data, outcome='outcome', unit='unit',
                          time='period', first_treat='first_treat',
                          aggregate='event_study')
imp_es = ImputationDiD().fit(data, outcome='outcome', unit='unit',
                             time='period', first_treat='first_treat',
                             aggregate='event_study')
print("Horizon-by-Horizon Comparison: GMM Sandwich vs Conservative Variance")
print("=" * 70)
print(f"{'Horizon':>8} {'Effect':>10} {'GMM SE':>10} {'Cons. SE':>10} {'Ratio':>8}")
print("-" * 70)
for h in sorted(ts_es.event_study_effects.keys()):
    ts_eff = ts_es.event_study_effects[h]
    imp_eff = imp_es.event_study_effects[h]
    # Reference period: no observations, so print a placeholder row
    if ts_eff.get('n_obs', 0) == 0:
        print(f"{h:>8} {'[ref]':>10} {'---':>10} {'---':>10} {'---':>8}")
        continue
    effect = ts_eff['effect']
    gmm_se = ts_eff['se']
    cons_se = imp_eff['se']
    ratio = gmm_se / cons_se if cons_se > 0 else np.nan
    print(f"{h:>8} {effect:>10.4f} {gmm_se:>10.4f} {cons_se:>10.4f} {ratio:>8.3f}")
Summary#
| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |
|---|---|---|---|
| Approach | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |
| Point estimates | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |
| Variance | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |
| Per-obs effects | Yes (treatment_effects) | Yes (treatment_effects) | No |
| Pre-trend test | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |
| Best for | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |
References:
Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943.
Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. The R Journal, 14(3), 162-173.