Interactive notebook
This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.
Wooldridge Extended Two-Way Fixed Effects (ETWFE)#
This tutorial demonstrates the WooldridgeDiD estimator (alias: ETWFE), which implements Wooldridge’s (2021, 2023) Extended Two-Way Fixed Effects approach — the basis of the Stata jwdid package.
What ETWFE does: Estimates cohort×time Average Treatment Effects (ATT(g,t)) via a single saturated regression that interacts treatment indicators with cohort×time cells. Unlike standard TWFE, it correctly handles heterogeneous treatment effects across cohorts and time periods. The key insight is to include all cohort×time interaction terms simultaneously, with unit and time fixed effects absorbed via within-transformation.
Key features:
Follows the Stata
jwdidspecification (OLS and nonlinear paths; see Methodology Registry for documented SE/aggregation deviations)Supports linear (OLS), Poisson, and logit link functions
Nonlinear ATTs use the Average Structural Function (ASF): E[f(η₁)] − E[f(η₀)]
Delta-method standard errors for all aggregations
Cluster-robust sandwich variance
Topics covered:
Basic OLS estimation
Cohort×time cell estimates ATT(g,t)
Aggregation: event-study, group, simple
Poisson QMLE for count / non-negative outcomes
Logit for binary outcomes
Comparison with Callaway-Sant’Anna
Parameter reference and guidance
Prerequisites:Tutorial 02(Staggered DiD).
See also:Tutorial 15for Efficient DiD,Tutorial 11for Imputation DiD.
[ ]:
import numpy as np
import pandas as pd
from diff_diff import WooldridgeDiD, CallawaySantAnna, generate_staggered_data
try:
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')
HAS_MATPLOTLIB = True
except ImportError:
HAS_MATPLOTLIB = False
print("matplotlib not installed - visualization examples will be skipped")
Data Setup#
We use generate_staggered_data() to create a balanced panel with 3 treatment cohorts, a never-treated group, and a known ATT of 2.0. This makes it easy to verify estimation accuracy.
We also demonstrate with the mpdta dataset (Callaway & Sant’Anna 2021), which contains county-level log employment data with staggered minimum-wage adoption — the canonical benchmark for staggered DiD methods.
[ ]:
# Simulated data
data = generate_staggered_data(
n_units=300, n_periods=10, treatment_effect=2.0,
dynamic_effects=False, seed=42
)
print(f"Shape: {data.shape}")
print(f"Cohorts: {sorted(data['first_treat'].unique())}")
print(f"Periods: {sorted(data['period'].unique())}")
print()
data.head()
Basic OLS Estimation#
The default method='ols' fits a single regression with:
Treatment interaction dummies (one per treatment cohort x post-treatment period cell)
Unit fixed effects (absorbed via within-transformation)
Time fixed effects (absorbed via within-transformation)
With control_group='not_yet_treated' (default), pre-treatment observations from treated units sit in the regression baseline alongside not-yet-treated controls. With control_group='never_treated', pre-treatment interaction indicators are added so only never-treated units define the counterfactual baseline, and pre-treatment coefficients serve as placebo checks.
[ ]:
m = WooldridgeDiD() # default: method='ols'
r = m.fit(data, outcome='outcome', unit='unit', time='period', cohort='first_treat')
# Compute aggregations
r.aggregate('event').aggregate('group').aggregate('simple')
print(r.summary('simple'))
Cohort×Time Cell Estimates ATT(g,t)#
The raw building blocks are ATT(g,t) — the treatment effect for cohort g at calendar time t. These are stored in r.group_time_effects and correspond to Stata’s regression output table (first_treat#year#c.__tr__).
Post-treatment cells have t >= g; pre-treatment cells (t < g) serve as placebo checks.
[ ]:
print("Post-treatment ATT(g,t) cells")
print("{:>8} {:>8} | {:>10} {:>10} {:>7} {:>7}".format(
"cohort", "year", "Coef.", "Std.Err.", "t", "P>|t|"))
print("-" * 60)
for (g, t), v in sorted(r.group_time_effects.items()):
if t < g:
continue
row = "{:>8} {:>8} | {:>10.4f} {:>10.4f} {:>7.2f} {:>7.3f}".format(
int(g), int(t), v['att'], v['se'], v['t_stat'], v['p_value']
)
print(row)
[ ]:
# Also show pre-treatment placebo cells
print("Pre-treatment placebo ATT(g,t) cells (should be ~0 under parallel trends)")
print("{:>8} {:>8} | {:>10} {:>10} {:>7} {:>7}".format(
"cohort", "year", "Coef.", "Std.Err.", "t", "P>|t|"))
print("-" * 60)
for (g, t), v in sorted(r.group_time_effects.items()):
if t >= g:
continue
row = "{:>8} {:>8} | {:>10.4f} {:>10.4f} {:>7.2f} {:>7.3f}".format(
int(g), int(t), v['att'], v['se'], v['t_stat'], v['p_value']
)
print(row)
Aggregation Methods#
ETWFE supports four aggregation types, matching Stata’s estat post-estimation commands:
Python |
Stata |
Description |
|---|---|---|
|
|
By relative time k = t − g |
|
|
By treatment cohort g |
|
|
By calendar time t |
|
|
Overall weighted average ATT |
Standard errors use the delta method, propagating uncertainty from the cell-level ATT covariance matrix.
[ ]:
# Event-study aggregation: ATT by relative time k = t - g
print(r.summary('event'))
[ ]:
# Group aggregation: ATT averaged across post-treatment periods for each cohort
print(r.summary('group'))
[ ]:
# Simple ATT: overall weighted average
print(r.summary('simple'))
[ ]:
# Event study plot
if HAS_MATPLOTLIB:
es = r.event_study_effects
ks = sorted(es.keys())
atts = [es[k]['att'] for k in ks]
lo = [es[k]['conf_int'][0] for k in ks]
hi = [es[k]['conf_int'][1] for k in ks]
fig, ax = plt.subplots(figsize=(9, 5))
ax.errorbar(ks, atts, yerr=[np.array(atts) - np.array(lo), np.array(hi) - np.array(atts)],
fmt='o-', capsize=4, color='steelblue', label='ETWFE (OLS)')
ax.axhline(0, color='black', linestyle='--', linewidth=0.8)
ax.axvline(-0.5, color='red', linestyle=':', linewidth=0.8, label='Treatment onset')
ax.set_xlabel('Relative period (k = t − g)')
ax.set_ylabel('ATT')
ax.set_title('ETWFE Event Study')
ax.legend()
plt.tight_layout()
plt.show()
else:
print("Install matplotlib to see the event study plot: pip install matplotlib")
Poisson QMLE for Count / Non-Negative Outcomes#
method='poisson' fits a Poisson QMLE regression. This is valid for any non-negative continuous outcome, not just count data — the Poisson log-likelihood produces consistent estimates whenever the conditional mean is correctly specified as exp(Xβ).
The ATT is computed as the Average Structural Function (ASF) difference:
where η₁ = Xβ (with treatment) and η₀ = Xβ − δ (counterfactual without treatment).
This matches Stata’s jwdid y, method(poisson).
[ ]:
# Simulate a non-negative outcome (e.g., employment level)
data_pois = data.copy()
data_pois['emp'] = np.exp(data_pois['outcome'] / 4 + 3) # positive outcome
m_pois = WooldridgeDiD(method='poisson')
r_pois = m_pois.fit(data_pois, outcome='emp', unit='unit', time='period', cohort='first_treat')
r_pois.aggregate('event').aggregate('group').aggregate('simple')
print(r_pois.summary('simple'))
[ ]:
# Cohort×time cells (post-treatment, Poisson)
print("Poisson ATT(g,t) — post-treatment cells")
print("{:>8} {:>8} | {:>10} {:>10} {:>7} {:>7}".format(
"cohort", "year", "ATT", "Std.Err.", "t", "P>|t|"))
print("-" * 60)
for (g, t), v in sorted(r_pois.group_time_effects.items()):
if t < g:
continue
print("{:>8} {:>8} | {:>10.4f} {:>10.4f} {:>7.2f} {:>7.3f}".format(
int(g), int(t), v['att'], v['se'], v['t_stat'], v['p_value']
))
[ ]:
print(r_pois.summary('event'))
print(r_pois.summary('group'))
Logit for Binary Outcomes#
method='logit' fits a logit model and computes ATT as the ASF probability difference:
where Λ(·) is the logistic function. Standard errors use the delta method.
This matches Stata’s jwdid y, method(logit).
[ ]:
# Create a binary outcome
data_logit = data.copy()
median_val = data_logit.loc[data_logit['period'] == data_logit['period'].min(), 'outcome'].median()
data_logit['hi_outcome'] = (data_logit['outcome'] > median_val).astype(int)
print(f"Binary outcome mean: {data_logit['hi_outcome'].mean():.3f}")
m_logit = WooldridgeDiD(method='logit')
r_logit = m_logit.fit(data_logit, outcome='hi_outcome', unit='unit', time='period', cohort='first_treat')
r_logit.aggregate('event').aggregate('group').aggregate('simple')
print(r_logit.summary('simple'))
[ ]:
print(r_logit.summary('group'))
mpdta: Real-World Example#
The mpdta dataset (Callaway & Sant’Anna 2021) contains county-level log employment (lemp) data with staggered minimum-wage adoption (first_treat = year of treatment, 0 = never treated). It is the canonical benchmark for staggered DiD methods.
This follows Stata’s jwdid lemp, ivar(countyreal) tvar(year) gvar(first_treat) specification. See the Methodology Registry for documented SE/aggregation deviations.
[ ]:
from diff_diff import load_mpdta
mpdta = load_mpdta()
print(f"mpdta loaded: {mpdta.shape}")
print(f"Cohorts: {sorted(mpdta['first_treat'].unique())}")
[ ]:
# OLS — matches: jwdid lemp, ivar(countyreal) tvar(year) gvar(first_treat)
m_ols = WooldridgeDiD(method='ols')
r_ols = m_ols.fit(mpdta, outcome='lemp', unit='countyreal', time='year', cohort='first_treat')
r_ols.aggregate('event').aggregate('group').aggregate('simple')
print(r_ols.summary('event'))
[ ]:
# cohort x time ATT cells (post-treatment)
# Matches Stata: first_treat#year#c.__tr__ output table
print("ATT(g,t) — post-treatment cells (matches Stata jwdid output)")
print("{:>6} {:>6} | {:>9} {:>9} {:>7} {:>7}".format(
"cohort", "year", "Coef.", "Std.Err.", "t", "P>|t|"))
print("-" * 55)
for (g, t), v in sorted(r_ols.group_time_effects.items()):
if t < g:
continue
print("{:>6} {:>6} | {:>9.4f} {:>9.4f} {:>7.2f} {:>7.3f}".format(
g, t, v['att'], v['se'], v['t_stat'], v['p_value']))
[ ]:
# Poisson — matches: gen emp=exp(lemp) / jwdid emp, method(poisson)
mpdta['emp'] = np.exp(mpdta['lemp'])
m_pois2 = WooldridgeDiD(method='poisson')
r_pois2 = m_pois2.fit(mpdta, outcome='emp', unit='countyreal', time='year', cohort='first_treat')
r_pois2.aggregate('event').aggregate('group').aggregate('simple')
print(r_pois2.summary('event'))
print(r_pois2.summary('group'))
print(r_pois2.summary('simple'))
Comparison with Callaway-Sant’Anna#
ETWFE and Callaway-Sant’Anna are both valid for staggered designs. Under homogeneous treatment effects and additive parallel trends, they should produce similar ATT(g,t) point estimates. Key differences:
Aspect |
WooldridgeDiD (ETWFE) |
CallawaySantAnna |
|---|---|---|
Approach |
Single saturated regression |
Separate 2×2 DiD per cell |
Nonlinear outcomes |
Yes (Poisson, Logit) |
No |
Covariates |
Via regression (linear index) |
OR, IPW, DR |
SE for aggregations |
Delta method |
Multiplier bootstrap |
Stata equivalent |
|
|
[ ]:
# Compare overall ATT: ETWFE vs Callaway-Sant'Anna
cs = CallawaySantAnna()
r_cs = cs.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')
m_etwfe = WooldridgeDiD(method='ols')
r_etwfe = m_etwfe.fit(data, outcome='outcome', unit='unit', time='period', cohort='first_treat')
r_etwfe.aggregate('event').aggregate('simple')
print("Overall ATT Comparison (true effect = 2.0)")
print("=" * 60)
print("{:<25} {:>10} {:>10} {:>12}".format("Estimator", "ATT", "SE", "95% CI"))
print("-" * 60)
for name, est_r in [("WooldridgeDiD (ETWFE)", r_etwfe), ("CallawaySantAnna", r_cs)]:
ci = est_r.overall_conf_int
print("{:<25} {:>10.4f} {:>10.4f} [{:.3f}, {:.3f}]".format(
name, est_r.overall_att, est_r.overall_se, ci[0], ci[1]
))
[ ]:
# Event-study comparison
r_cs_es = CallawaySantAnna().fit(
data, outcome='outcome', unit='unit', time='period',
first_treat='first_treat', aggregate='event_study'
)
if HAS_MATPLOTLIB:
es_etwfe = r_etwfe.event_study_effects
es_cs = {int(row['relative_period']): row
for _, row in r_cs_es.to_dataframe(level='event_study').iterrows()}
ks = sorted(es_etwfe.keys())
fig, ax = plt.subplots(figsize=(10, 5))
offset = 0.1
atts_e = [es_etwfe[k]['att'] for k in ks]
lo_e = [es_etwfe[k]['conf_int'][0] for k in ks]
hi_e = [es_etwfe[k]['conf_int'][1] for k in ks]
ax.errorbar([k - offset for k in ks], atts_e,
yerr=[np.array(atts_e) - np.array(lo_e), np.array(hi_e) - np.array(atts_e)],
fmt='o-', capsize=4, color='steelblue', label='ETWFE')
ks_cs = sorted(es_cs.keys())
atts_cs = [es_cs[k]['effect'] for k in ks_cs]
lo_cs = [es_cs[k]['conf_int_lower'] for k in ks_cs]
hi_cs = [es_cs[k]['conf_int_upper'] for k in ks_cs]
ax.errorbar([k + offset for k in ks_cs], atts_cs,
yerr=[np.array(atts_cs) - np.array(lo_cs), np.array(hi_cs) - np.array(atts_cs)],
fmt='s--', capsize=4, color='darkorange', label='Callaway-Sant\'Anna')
ax.axhline(0, color='black', linestyle='--', linewidth=0.8)
ax.axvline(-0.5, color='red', linestyle=':', linewidth=0.8)
ax.set_xlabel('Relative period (k = t − g)')
ax.set_ylabel('ATT')
ax.set_title('Event Study: ETWFE vs Callaway-Sant\'Anna')
ax.legend()
plt.tight_layout()
plt.show()
else:
print("Install matplotlib to see the comparison plot: pip install matplotlib")
Summary#
Key takeaways:
ETWFE via a single regression: all ATT(g,t) cells estimated jointly, not separately — computationally efficient and internally consistent
OLS path follows the Stata
jwdidspecification: unit + time FEs (absorbed via within-transformation), treatment interaction dummiesNonlinear paths (Poisson, Logit) use the ASF formula: E[f(η₁)] − E[f(η₀)] — the only valid ATT definition for nonlinear models
Four aggregations mirror Stata’s
estatcommands: event, group, calendar, simpleDelta-method SEs for all aggregations, including nonlinear paths
When to prefer ETWFE: nonlinear outcomes, or when a single-regression framework is preferred
When to prefer CS/ImputationDiD: covariate adjustment via IPW/DR, or multiplier bootstrap inference
Parameter reference:
Parameter |
Default |
Description |
|---|---|---|
|
|
|
|
|
|
|
|
Anticipation periods before treatment |
|
|
Significance level |
|
|
Column for clustering (default: unit variable) |
References:
Wooldridge, J. M. (2021). Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators. SSRN 3906345.
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31–C66.
Friosavila, F. (2021).
jwdid: Stata module for ETWFE. SSC s459114.
See also:Tutorial 02for Callaway-Sant’Anna,Tutorial 15for Efficient DiD.