Interactive notebook

This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.

Wooldridge Extended Two-Way Fixed Effects (ETWFE)#

This tutorial demonstrates the WooldridgeDiD estimator (alias: ETWFE), which implements Wooldridge’s (2021, 2023) Extended Two-Way Fixed Effects approach — the basis of the Stata jwdid package.

What ETWFE does: Estimates cohort×time Average Treatment Effects (ATT(g,t)) via a single saturated regression that interacts treatment indicators with cohort×time cells. Unlike standard TWFE, it correctly handles heterogeneous treatment effects across cohorts and time periods. The key insight is to include all cohort×time interaction terms simultaneously, with unit and time fixed effects absorbed via within-transformation.

Key features:

Follows the Stata jwdid specification (OLS and nonlinear paths; see Methodology Registry for documented SE/aggregation deviations)
Supports linear (OLS), Poisson, and logit link functions
Nonlinear ATTs use the Average Structural Function (ASF): E[f(η₁)] − E[f(η₀)]
Delta-method standard errors for all aggregations
Cluster-robust sandwich variance

Topics covered:

Basic OLS estimation
Cohort×time cell estimates ATT(g,t)
Aggregation: event-study, group, simple
Poisson QMLE for count / non-negative outcomes
Logit for binary outcomes
Comparison with Callaway-Sant’Anna
Parameter reference and guidance

Prerequisites:Tutorial 02(Staggered DiD).

See also:Tutorial 15for Efficient DiD,Tutorial 11for Imputation DiD.

[ ]:

import numpy as np
import pandas as pd

from diff_diff import WooldridgeDiD, CallawaySantAnna, generate_staggered_data

try:
    import matplotlib.pyplot as plt
    plt.style.use('seaborn-v0_8-whitegrid')
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")

Data Setup#

We use generate_staggered_data() to create a balanced panel with 3 treatment cohorts, a never-treated group, and a known ATT of 2.0. This makes it easy to verify estimation accuracy.

We also demonstrate with the mpdta dataset (Callaway & Sant’Anna 2021), which contains county-level log employment data with staggered minimum-wage adoption — the canonical benchmark for staggered DiD methods.

[ ]:

# Simulated data
data = generate_staggered_data(
    n_units=300, n_periods=10, treatment_effect=2.0,
    dynamic_effects=False, seed=42
)

print(f"Shape: {data.shape}")
print(f"Cohorts: {sorted(data['first_treat'].unique())}")
print(f"Periods: {sorted(data['period'].unique())}")
print()
data.head()

Basic OLS Estimation#

The default method='ols' fits a single regression with:

Treatment interaction dummies (one per treatment cohort x post-treatment period cell)
Unit fixed effects (absorbed via within-transformation)
Time fixed effects (absorbed via within-transformation)

With control_group='not_yet_treated' (default), pre-treatment observations from treated units sit in the regression baseline alongside not-yet-treated controls. With control_group='never_treated', pre-treatment interaction indicators are added so only never-treated units define the counterfactual baseline, and pre-treatment coefficients serve as placebo checks.

[ ]:

m = WooldridgeDiD()  # default: method='ols'
r = m.fit(data, outcome='outcome', unit='unit', time='period', cohort='first_treat')

# Compute aggregations
r.aggregate('event').aggregate('group').aggregate('simple')

print(r.summary('simple'))

Cohort×Time Cell Estimates ATT(g,t)#

The raw building blocks are ATT(g,t) — the treatment effect for cohort g at calendar time t. These are stored in r.group_time_effects and correspond to Stata’s regression output table (first_treat#year#c.__tr__).

Post-treatment cells have t >= g; pre-treatment cells (t < g) serve as placebo checks.

[ ]:

print("Post-treatment ATT(g,t) cells")
print("{:>8} {:>8} | {:>10} {:>10} {:>7} {:>7}".format(
    "cohort", "year", "Coef.", "Std.Err.", "t", "P>|t|"))
print("-" * 60)

for (g, t), v in sorted(r.group_time_effects.items()):
    if t < g:
        continue
    row = "{:>8} {:>8} | {:>10.4f} {:>10.4f} {:>7.2f} {:>7.3f}".format(
        int(g), int(t), v['att'], v['se'], v['t_stat'], v['p_value']
    )
    print(row)

[ ]:

# Also show pre-treatment placebo cells
print("Pre-treatment placebo ATT(g,t) cells (should be ~0 under parallel trends)")
print("{:>8} {:>8} | {:>10} {:>10} {:>7} {:>7}".format(
    "cohort", "year", "Coef.", "Std.Err.", "t", "P>|t|"))
print("-" * 60)

for (g, t), v in sorted(r.group_time_effects.items()):
    if t >= g:
        continue
    row = "{:>8} {:>8} | {:>10.4f} {:>10.4f} {:>7.2f} {:>7.3f}".format(
        int(g), int(t), v['att'], v['se'], v['t_stat'], v['p_value']
    )
    print(row)

Aggregation Methods#

ETWFE supports four aggregation types, matching Stata’s estat post-estimation commands:

Python	Stata	Description
`aggregate('event')`	`estat event`	By relative time k = t − g
`aggregate('group')`	`estat group`	By treatment cohort g
`aggregate('calendar')`	`estat calendar`	By calendar time t
`aggregate('simple')`	`estat simple`	Overall weighted average ATT

Standard errors use the delta method, propagating uncertainty from the cell-level ATT covariance matrix.

[ ]:

# Event-study aggregation: ATT by relative time k = t - g
print(r.summary('event'))

[ ]:

# Group aggregation: ATT averaged across post-treatment periods for each cohort
print(r.summary('group'))

[ ]:

# Simple ATT: overall weighted average
print(r.summary('simple'))

[ ]:

# Event study plot
if HAS_MATPLOTLIB:
    es = r.event_study_effects
    ks   = sorted(es.keys())
    atts = [es[k]['att'] for k in ks]
    lo   = [es[k]['conf_int'][0] for k in ks]
    hi   = [es[k]['conf_int'][1] for k in ks]

    fig, ax = plt.subplots(figsize=(9, 5))
    ax.errorbar(ks, atts, yerr=[np.array(atts) - np.array(lo), np.array(hi) - np.array(atts)],
                fmt='o-', capsize=4, color='steelblue', label='ETWFE (OLS)')
    ax.axhline(0, color='black', linestyle='--', linewidth=0.8)
    ax.axvline(-0.5, color='red', linestyle=':', linewidth=0.8, label='Treatment onset')
    ax.set_xlabel('Relative period (k = t − g)')
    ax.set_ylabel('ATT')
    ax.set_title('ETWFE Event Study')
    ax.legend()
    plt.tight_layout()
    plt.show()
else:
    print("Install matplotlib to see the event study plot: pip install matplotlib")

Poisson QMLE for Count / Non-Negative Outcomes#

method='poisson' fits a Poisson QMLE regression. This is valid for any non-negative continuous outcome, not just count data — the Poisson log-likelihood produces consistent estimates whenever the conditional mean is correctly specified as exp(Xβ).

The ATT is computed as the Average Structural Function (ASF) difference:

\[\text{ATT}(g,t) = \frac{1}{N_{g,t}} \sum_{i \in g,t} \left[\exp(\eta_{i,1}) - \exp(\eta_{i,0})\right]\]

where η₁ = Xβ (with treatment) and η₀ = Xβ − δ (counterfactual without treatment).

This matches Stata’s jwdid y, method(poisson).

[ ]:

# Simulate a non-negative outcome (e.g., employment level)
data_pois = data.copy()
data_pois['emp'] = np.exp(data_pois['outcome'] / 4 + 3)  # positive outcome

m_pois = WooldridgeDiD(method='poisson')
r_pois = m_pois.fit(data_pois, outcome='emp', unit='unit', time='period', cohort='first_treat')
r_pois.aggregate('event').aggregate('group').aggregate('simple')

print(r_pois.summary('simple'))

[ ]:

# Cohort×time cells (post-treatment, Poisson)
print("Poisson ATT(g,t) — post-treatment cells")
print("{:>8} {:>8} | {:>10} {:>10} {:>7} {:>7}".format(
    "cohort", "year", "ATT", "Std.Err.", "t", "P>|t|"))
print("-" * 60)

for (g, t), v in sorted(r_pois.group_time_effects.items()):
    if t < g:
        continue
    print("{:>8} {:>8} | {:>10.4f} {:>10.4f} {:>7.2f} {:>7.3f}".format(
        int(g), int(t), v['att'], v['se'], v['t_stat'], v['p_value']
    ))

[ ]:

print(r_pois.summary('event'))
print(r_pois.summary('group'))

Logit for Binary Outcomes#

method='logit' fits a logit model and computes ATT as the ASF probability difference:

\[\text{ATT}(g,t) = \frac{1}{N_{g,t}} \sum_{i \in g,t} \left[\Lambda(\eta_{i,1}) - \Lambda(\eta_{i,0})\right]\]

where Λ(·) is the logistic function. Standard errors use the delta method.

This matches Stata’s jwdid y, method(logit).

[ ]:

# Create a binary outcome
data_logit = data.copy()
median_val = data_logit.loc[data_logit['period'] == data_logit['period'].min(), 'outcome'].median()
data_logit['hi_outcome'] = (data_logit['outcome'] > median_val).astype(int)

print(f"Binary outcome mean: {data_logit['hi_outcome'].mean():.3f}")

m_logit = WooldridgeDiD(method='logit')
r_logit = m_logit.fit(data_logit, outcome='hi_outcome', unit='unit', time='period', cohort='first_treat')
r_logit.aggregate('event').aggregate('group').aggregate('simple')

print(r_logit.summary('simple'))

[ ]:

print(r_logit.summary('group'))

mpdta: Real-World Example#

The mpdta dataset (Callaway & Sant’Anna 2021) contains county-level log employment (lemp) data with staggered minimum-wage adoption (first_treat = year of treatment, 0 = never treated). It is the canonical benchmark for staggered DiD methods.

This follows Stata’s jwdid lemp, ivar(countyreal) tvar(year) gvar(first_treat) specification. See the Methodology Registry for documented SE/aggregation deviations.

[ ]:

from diff_diff import load_mpdta

mpdta = load_mpdta()
print(f"mpdta loaded: {mpdta.shape}")
print(f"Cohorts: {sorted(mpdta['first_treat'].unique())}")

[ ]:

# OLS — matches: jwdid lemp, ivar(countyreal) tvar(year) gvar(first_treat)
m_ols = WooldridgeDiD(method='ols')
r_ols = m_ols.fit(mpdta, outcome='lemp', unit='countyreal', time='year', cohort='first_treat')
r_ols.aggregate('event').aggregate('group').aggregate('simple')
print(r_ols.summary('event'))

[ ]:

# cohort x time ATT cells (post-treatment)
# Matches Stata: first_treat#year#c.__tr__ output table
print("ATT(g,t) — post-treatment cells (matches Stata jwdid output)")
print("{:>6} {:>6} | {:>9} {:>9} {:>7} {:>7}".format(
    "cohort", "year", "Coef.", "Std.Err.", "t", "P>|t|"))
print("-" * 55)
for (g, t), v in sorted(r_ols.group_time_effects.items()):
    if t < g:
        continue
    print("{:>6} {:>6} | {:>9.4f} {:>9.4f} {:>7.2f} {:>7.3f}".format(
        g, t, v['att'], v['se'], v['t_stat'], v['p_value']))

[ ]:

# Poisson — matches: gen emp=exp(lemp) / jwdid emp, method(poisson)
mpdta['emp'] = np.exp(mpdta['lemp'])

m_pois2 = WooldridgeDiD(method='poisson')
r_pois2 = m_pois2.fit(mpdta, outcome='emp', unit='countyreal', time='year', cohort='first_treat')
r_pois2.aggregate('event').aggregate('group').aggregate('simple')

print(r_pois2.summary('event'))
print(r_pois2.summary('group'))
print(r_pois2.summary('simple'))

Comparison with Callaway-Sant’Anna#

ETWFE and Callaway-Sant’Anna are both valid for staggered designs. Under homogeneous treatment effects and additive parallel trends, they should produce similar ATT(g,t) point estimates. Key differences:

Aspect	WooldridgeDiD (ETWFE)	CallawaySantAnna
Approach	Single saturated regression	Separate 2×2 DiD per cell
Nonlinear outcomes	Yes (Poisson, Logit)	No
Covariates	Via regression (linear index)	OR, IPW, DR
SE for aggregations	Delta method	Multiplier bootstrap
Stata equivalent	`jwdid`	`csdid`

[ ]:

# Compare overall ATT: ETWFE vs Callaway-Sant'Anna
cs = CallawaySantAnna()
r_cs = cs.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')

m_etwfe = WooldridgeDiD(method='ols')
r_etwfe = m_etwfe.fit(data, outcome='outcome', unit='unit', time='period', cohort='first_treat')
r_etwfe.aggregate('event').aggregate('simple')

print("Overall ATT Comparison (true effect = 2.0)")
print("=" * 60)
print("{:<25} {:>10} {:>10} {:>12}".format("Estimator", "ATT", "SE", "95% CI"))
print("-" * 60)

for name, est_r in [("WooldridgeDiD (ETWFE)", r_etwfe), ("CallawaySantAnna", r_cs)]:
    ci = est_r.overall_conf_int
    print("{:<25} {:>10.4f} {:>10.4f} [{:.3f}, {:.3f}]".format(
        name, est_r.overall_att, est_r.overall_se, ci[0], ci[1]
    ))

[ ]:

# Event-study comparison
r_cs_es = CallawaySantAnna().fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='event_study'
)

if HAS_MATPLOTLIB:
    es_etwfe = r_etwfe.event_study_effects
    es_cs    = {int(row['relative_period']): row
                for _, row in r_cs_es.to_dataframe(level='event_study').iterrows()}

    ks = sorted(es_etwfe.keys())

    fig, ax = plt.subplots(figsize=(10, 5))
    offset = 0.1

    atts_e = [es_etwfe[k]['att'] for k in ks]
    lo_e   = [es_etwfe[k]['conf_int'][0] for k in ks]
    hi_e   = [es_etwfe[k]['conf_int'][1] for k in ks]
    ax.errorbar([k - offset for k in ks], atts_e,
                yerr=[np.array(atts_e) - np.array(lo_e), np.array(hi_e) - np.array(atts_e)],
                fmt='o-', capsize=4, color='steelblue', label='ETWFE')

    ks_cs  = sorted(es_cs.keys())
    atts_cs = [es_cs[k]['effect'] for k in ks_cs]
    lo_cs   = [es_cs[k]['conf_int_lower'] for k in ks_cs]
    hi_cs   = [es_cs[k]['conf_int_upper'] for k in ks_cs]
    ax.errorbar([k + offset for k in ks_cs], atts_cs,
                yerr=[np.array(atts_cs) - np.array(lo_cs), np.array(hi_cs) - np.array(atts_cs)],
                fmt='s--', capsize=4, color='darkorange', label='Callaway-Sant\'Anna')

    ax.axhline(0, color='black', linestyle='--', linewidth=0.8)
    ax.axvline(-0.5, color='red', linestyle=':', linewidth=0.8)
    ax.set_xlabel('Relative period (k = t − g)')
    ax.set_ylabel('ATT')
    ax.set_title('Event Study: ETWFE vs Callaway-Sant\'Anna')
    ax.legend()
    plt.tight_layout()
    plt.show()
else:
    print("Install matplotlib to see the comparison plot: pip install matplotlib")

Summary#

Key takeaways:

ETWFE via a single regression: all ATT(g,t) cells estimated jointly, not separately — computationally efficient and internally consistent
OLS path follows the Stata jwdid specification: unit + time FEs (absorbed via within-transformation), treatment interaction dummies
Nonlinear paths (Poisson, Logit) use the ASF formula: E[f(η₁)] − E[f(η₀)] — the only valid ATT definition for nonlinear models
Four aggregations mirror Stata’s estat commands: event, group, calendar, simple
Delta-method SEs for all aggregations, including nonlinear paths
When to prefer ETWFE: nonlinear outcomes, or when a single-regression framework is preferred
When to prefer CS/ImputationDiD: covariate adjustment via IPW/DR, or multiplier bootstrap inference

Parameter reference:

Parameter	Default	Description
`method`	`'ols'`	`'ols'`, `'poisson'`, or `'logit'`
`control_group`	`'not_yet_treated'`	`'not_yet_treated'` or `'never_treated'`
`anticipation`	`0`	Anticipation periods before treatment
`alpha`	`0.05`	Significance level
`cluster`	`None`	Column for clustering (default: unit variable)

References:

Wooldridge, J. M. (2021). Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators. SSRN 3906345.
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31–C66.
Friosavila, F. (2021). jwdid: Stata module for ETWFE. SSC s459114.

See also:Tutorial 02for Callaway-Sant’Anna,Tutorial 15for Efficient DiD.