Wooldridge Extended Two-Way Fixed Effects (ETWFE)

Extended Two-Way Fixed Effects estimator of Wooldridge (2021, 2023), following the specification of the Stata jwdid package (Friosavila 2021). Deviations in standard errors and aggregation are documented in the Methodology Registry.

This module implements ETWFE via a single saturated regression that:

  1. Estimates ATT(g,t) for each cohort×time treatment cell simultaneously

  2. Supports linear (OLS), Poisson QMLE, and logit link functions

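  3. Uses average structural function (ASF) ATTs for nonlinear models, E[f(η₁)] − E[f(η₀)], where f is the inverse link function and η₁ and η₀ are the linear index with and without the treatment effect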

  4. Computes delta-method SEs for all aggregations (event, group, calendar, simple)

  5. Follows the Stata jwdid specification for both the OLS and nonlinear estimators (see the Methodology Registry for documented SE and aggregation deviations)
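The ASF-based ATT in item 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not diff_diff code: inverse_link and asf_att are hypothetical helpers, the links are assumed to be canonical (identity for OLS, log for Poisson, logit for logit), and η₀ is assumed to be the fitted index of a cell's treated observations with that cell's treatment coefficient τ set to zero, so that η₁ = η₀ + τ.

```python
import math


def inverse_link(eta: float, method: str) -> float:
    """Conditional mean implied by a linear index under an assumed canonical link."""
    if method == "poisson":
        return math.exp(eta)                 # log link
    if method == "logit":
        return 1.0 / (1.0 + math.exp(-eta))  # logistic CDF
    return eta                               # identity link (OLS)


def asf_att(eta0: list, tau: float, method: str) -> float:
    """ATT for one cell as E[f(eta1)] - E[f(eta0)], with eta1 = eta0 + tau.

    eta0: fitted linear index of the cell's treated observations with the
    cell's treatment coefficient switched off; tau: that coefficient.
    """
    mean_f1 = sum(inverse_link(e + tau, method) for e in eta0) / len(eta0)
    mean_f0 = sum(inverse_link(e, method) for e in eta0) / len(eta0)
    return mean_f1 - mean_f0


eta0 = [0.2, -0.5, 1.1]  # hypothetical fitted indices for one cell
for method in ("ols", "poisson", "logit"):
    print(method, round(asf_att(eta0, 0.3, method), 4))
```

With the identity link the ASF contrast collapses to τ itself, so OLS needs no extra step; for Poisson and logit the ATT also depends on where the treated observations sit on the index.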

When to use WooldridgeDiD:

  • Staggered adoption design with heterogeneous treatment timing

  • Nonlinear outcomes (binary, count, non-negative continuous)

  • You want a single-regression approach matching Stata’s jwdid

  • You need event-study, group, calendar, or simple ATT aggregations

References:

  • Wooldridge, J. M. (2021). Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators. SSRN 3906345.

  • Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31–C66.

  • Friosavila, F. (2021). jwdid: Stata module for ETWFE. SSC s459114.

WooldridgeDiD

Main estimator class for Wooldridge ETWFE.

class diff_diff.WooldridgeDiD

Bases: object

Extended Two-Way Fixed Effects (ETWFE) DiD estimator.

Implements the Wooldridge (2021) saturated cohort×time regression and Wooldridge (2023) nonlinear extensions (logit, Poisson). Produces all four jwdid_estat aggregation types: simple, group, calendar, event.
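The saturated cohort×time regression can be illustrated on a tiny, noise-free panel. The sketch below uses hypothetical data and names and plain NumPy rather than diff_diff: it fits OLS with unit and time dummies plus one dummy per post-treatment cell. Because the outcome has no noise and trends are exactly parallel, each cell coefficient reproduces that cell's ATT(g,t), with never-treated and not-yet-treated observations serving as the implicit comparison.

```python
import itertools

import numpy as np

# Hypothetical balanced panel: 6 units, periods 1..5, cohorts 3, 4 and 0 (never treated).
cohort_of_unit = {0: 0, 1: 0, 2: 3, 3: 3, 4: 4, 5: 4}
periods = [1, 2, 3, 4, 5]
true_att = {(3, 3): 1.0, (3, 4): 1.5, (3, 5): 2.0, (4, 4): 0.5, (4, 5): 0.8}

# One treatment dummy per (g, t) with g > 0 and t >= g (no anticipation).
cells = sorted(true_att)

rows, y = [], []
for unit, t in itertools.product(cohort_of_unit, periods):
    g = cohort_of_unit[unit]
    unit_fe = [1.0 if unit == u else 0.0 for u in cohort_of_unit]
    time_fe = [1.0 if t == s else 0.0 for s in periods[1:]]  # first period dropped
    treat = [1.0 if (g, t) == cell else 0.0 for cell in cells]
    rows.append(unit_fe + time_fe + treat)
    # Noise-free outcome: unit effect + common linear trend + cell-specific ATT.
    y.append(0.5 * unit + 0.3 * t + true_att.get((g, t), 0.0))

X = np.array(rows)
beta, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
att_hat = dict(zip(cells, beta[-len(cells):]))  # last columns are the cell dummies
for cell, est in att_hat.items():
    print(cell, round(float(est), 6))
```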

Parameters:
  • method ({"ols", "logit", "poisson"}) – Estimation method: “ols” for continuous outcomes; “logit” for binary or fractional outcomes; “poisson” (Poisson QMLE) for counts and other non-negative outcomes.

  • control_group ({"not_yet_treated", "never_treated"}) – Which units serve as the comparison group. “not_yet_treated” (jwdid default) uses all untreated observations at each time period; “never_treated” uses only units never treated throughout the sample.

  • anticipation (int) – Number of periods before treatment onset to include as treatment cells (anticipation effects). 0 means no anticipation.

  • demean_covariates (bool) – If True (jwdid default), xtvar covariates are demeaned within each cohort×period cell before entering the regression. Set to False to replicate jwdid’s xasis option.

  • alpha (float) – Significance level for confidence intervals.

  • cluster (str or None) – Column name to use for cluster-robust SEs. Defaults to the unit identifier passed to fit().

  • n_bootstrap (int) – Number of bootstrap replications. 0 disables bootstrap.

  • bootstrap_weights ({"rademacher", "webb", "mammen"}) – Bootstrap weight distribution.

  • seed (int or None) – Random seed for reproducibility.

  • rank_deficient_action ({"warn", "error", "silent"}) – How to handle rank-deficient design matrices.
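How control_group and anticipation decide which observations act as comparisons can be sketched with a small classifier. cell_role is a hypothetical helper that follows the parameter descriptions above, with cohort 0 standing in for never treated (NaN cohorts would be handled the same way); it illustrates the logic only and is not taken from diff_diff.

```python
def cell_role(g: int, t: int, control_group: str = "not_yet_treated",
              anticipation: int = 0) -> str:
    """Classify one (cohort g, period t) observation.

    g is the first treatment period, with 0 meaning never treated.
    Returns 'treated', 'control', or 'not_control'.
    """
    never_treated = g == 0
    if not never_treated and t >= g - anticipation:
        # Inside a treatment cell, including any anticipation periods.
        return "treated"
    if never_treated or control_group == "not_yet_treated":
        # Untreated at t and admissible under the chosen comparison group.
        return "control"
    # never_treated: pre-treatment rows of eventually treated units are not comparisons.
    return "not_control"


for g in (0, 4):
    roles = [cell_role(g, t, "never_treated", anticipation=1) for t in range(1, 6)]
    print(f"cohort {g}: {roles}")
```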

Methods

fit(data, outcome, unit, time, cohort[, ...])

Fit the ETWFE model.

get_params()

Return estimator parameters (sklearn-compatible).

set_params(**params)

Set estimator parameters (sklearn-compatible).

__init__(method='ols', control_group='not_yet_treated', anticipation=0, demean_covariates=True, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn')
Parameters:
  • method (str)

  • control_group (str)

  • anticipation (int)

  • demean_covariates (bool)

  • alpha (float)

  • cluster (str | None)

  • n_bootstrap (int)

  • bootstrap_weights (str)

  • seed (int | None)

  • rank_deficient_action (str)

Return type:

None

property results_: WooldridgeDiDResults
get_params()

Return estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible). Returns self.

Parameters:

params (Any)

Return type:

WooldridgeDiD
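The sklearn-style contract of get_params and set_params is what makes copying and tweaking an estimator cheap. The sketch uses a hypothetical ToyEstimator with only two of the documented parameters so it runs without diff_diff; the name check inside set_params is a choice made for the toy, not documented behavior of WooldridgeDiD.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in that mimics the sklearn-style parameter contract."""

    def __init__(self, method: str = "ols", alpha: float = 0.05) -> None:
        self.method = method
        self.alpha = alpha

    def get_params(self) -> Dict[str, Any]:
        # Constructor arguments, so ToyEstimator(**est.get_params()) rebuilds it.
        return {"method": self.method, "alpha": self.alpha}

    def set_params(self, **params: Any) -> "ToyEstimator":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter: {name}")
            setattr(self, name, value)
        return self  # returning self enables chaining


base = ToyEstimator(method="poisson")
variant = ToyEstimator(**base.get_params()).set_params(alpha=0.10)
print(base.get_params())
print(variant.get_params())
```

Assuming get_params returns the constructor arguments, as the sklearn-compatible label suggests, the same pattern would read WooldridgeDiD(**m.get_params()).set_params(control_group='never_treated').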

fit(data, outcome, unit, time, cohort, exovar=None, xtvar=None, xgvar=None, survey_design=None)

Fit the ETWFE model. See class docstring for parameter details.

Parameters:
  • data (DataFrame) – Panel data in long format.

  • outcome – Outcome column name.

  • unit – Unit identifier column.

  • time – Time period column.

  • cohort – Column holding each unit's first treatment period (0 or NaN = never treated).

  • exovar – Time-invariant covariates, added without interaction or demeaning.

  • xtvar – Time-varying covariates; demeaned within cohort×period cells when demean_covariates=True.

  • xgvar – Covariates interacted with each cohort indicator.

  • survey_design (SurveyDesign, optional) – Survey design specification for complex survey data. Supports stratified, clustered, and weighted designs via Taylor Series Linearization (TSL). Replicate-weight designs raise NotImplementedError.

Return type:

WooldridgeDiDResults
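Within-cell demeaning of xtvar covariates, as described for demean_covariates=True, amounts to subtracting each cohort×period mean from the covariate. demean_within_cells is a hypothetical, unweighted helper over a list of dicts; it ignores survey weights and any other refinements the library may apply.

```python
from collections import defaultdict


def demean_within_cells(rows, var, cohort="cohort", time="time"):
    """Return row[var] minus its (cohort, time) cell mean, in input order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row[cohort], row[time])
        sums[key] += row[var]
        counts[key] += 1
    return [row[var] - sums[(row[cohort], row[time])] / counts[(row[cohort], row[time])]
            for row in rows]


rows = [
    {"cohort": 3, "time": 1, "x": 2.0},
    {"cohort": 3, "time": 1, "x": 4.0},
    {"cohort": 0, "time": 1, "x": 10.0},
]
print(demean_within_cells(rows, "x"))  # [-1.0, 1.0, 0.0]
```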

WooldridgeDiDResults

Results container returned by WooldridgeDiD.fit().

class diff_diff.wooldridge_results.WooldridgeDiDResults

Bases: object

Results from WooldridgeDiD.fit().

The core output is group_time_effects, a dict keyed by (cohort_g, time_t) that holds per-cell ATT estimates and inference. Call .aggregate(type) to compute any of the four jwdid_estat aggregation types.

Methods

aggregate(type)

Compute and store one of the four jwdid_estat aggregation types.

summary([aggregation])

Format a summary table and return it as a string.

group_time_effects: Dict[Tuple[Any, Any], Dict[str, Any]]

key=(g,t), value={att, se, t_stat, p_value, conf_int}

overall_att: float
overall_se: float
overall_t_stat: float
overall_p_value: float
overall_conf_int: Tuple[float, float]
group_effects: Dict[Any, Dict] | None = None
calendar_effects: Dict[Any, Dict] | None = None
event_study_effects: Dict[int, Dict] | None = None
method: str = 'ols'
control_group: str = 'not_yet_treated'
groups: List[Any]
time_periods: List[Any]
n_obs: int = 0
n_treated_units: int = 0
n_control_units: int = 0
alpha: float = 0.05
anticipation: int = 0
survey_metadata: Any | None = None
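Each entry of group_time_effects carries att and se together with the derived t_stat, p_value, and conf_int. The sketch shows how those three fields relate to att, se, and alpha, assuming a standard normal reference distribution; diff_diff may use a t or survey-based reference distribution instead (for example when survey_design is supplied), so cell_inference is a hypothetical illustration only.

```python
from statistics import NormalDist


def cell_inference(att: float, se: float, alpha: float = 0.05) -> dict:
    """Wald t-stat, two-sided p-value and (1 - alpha) CI under a normal approximation."""
    z = NormalDist()  # standard normal
    t_stat = att / se
    p_value = 2.0 * (1.0 - z.cdf(abs(t_stat)))
    crit = z.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return {"att": att, "se": se, "t_stat": t_stat, "p_value": p_value,
            "conf_int": (att - crit * se, att + crit * se)}


res = cell_inference(0.10, 0.05)
print(res)
```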
aggregate(type)

Compute and store one of the four jwdid_estat aggregation types.

Parameters:
  • type ("simple" | "group" | "calendar" | "event") – Aggregation type to compute.

Returns:

self, allowing method chaining.

Return type:

WooldridgeDiDResults
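Because each aggregation is a weighted average of cell estimates, its delta-method SE reduces to a quadratic form when the weights are treated as fixed: the gradient of w'θ with respect to θ is w, so Var(w'θ) = w'Vw. linear_aggregate is a hypothetical helper illustrating that calculation. Estimated weights (such as cohort shares) would add further terms, and for logit or Poisson the covariance V of the cell ATTs would itself come from a delta-method step on the regression coefficients.

```python
import math


def linear_aggregate(att, weights, vcov):
    """Weighted ATT and its delta-method SE for a linear aggregation w'theta.

    att: cell ATTs theta; weights: non-negative cell weights (normalized here);
    vcov: covariance matrix of theta as a list of lists.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalize so the weights sum to one
    est = sum(wi * a for wi, a in zip(w, att))
    # Gradient of w'theta is w, so the delta-method variance is w' V w.
    var = sum(w[i] * vcov[i][j] * w[j]
              for i in range(len(w)) for j in range(len(w)))
    return est, math.sqrt(var)


att = [1.0, 2.0]
vcov = [[0.04, 0.01],
        [0.01, 0.09]]
est, se = linear_aggregate(att, [1, 1], vcov)
print(est, se)
```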

summary(aggregation='simple')

Format a summary table and return it as a string.

Parameters:

aggregation ("simple" | "group" | "calendar" | "event") – Which aggregation to display; defaults to "simple".

Return type:

str

to_dataframe(aggregation='event')

Export aggregated effects to a DataFrame.

Parameters:

aggregation ("simple" | "group" | "calendar" | "event" | "gt") – Use “gt” to export raw group-time effects.

Return type:

DataFrame

plot_event_study(**kwargs)

Draw an event-study plot, calling aggregate('event') first if needed.

Return type:

None

property att: float
property se: float
__init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, group_effects=None, calendar_effects=None, event_study_effects=None, method='ols', control_group='not_yet_treated', groups=<factory>, time_periods=<factory>, n_obs=0, n_treated_units=0, n_control_units=0, alpha=0.05, anticipation=0, survey_metadata=None, _gt_weights=<factory>, _gt_vcov=None, _gt_keys=<factory>, _df_survey=None)
Return type:

None

property conf_int: Tuple[float, float]
property p_value: float
property t_stat: float

Example Usage

Basic OLS (follows Stata jwdid y, ivar(unit) tvar(time) gvar(cohort)):

import pandas as pd
from diff_diff import WooldridgeDiD

df = pd.read_stata("mpdta.dta")
df['first_treat'] = df['first_treat'].astype(int)

m = WooldridgeDiD()
r = m.fit(df, outcome='lemp', unit='countyreal', time='year', cohort='first_treat')

r.aggregate('event').aggregate('group').aggregate('simple')
print(r.summary('event'))
print(r.summary('group'))
print(r.summary('simple'))

View cohort×time cell estimates (post-treatment):

for (g, t), v in sorted(r.group_time_effects.items()):
    if t >= g:
        print(f"g={g} t={t}  ATT={v['att']:.4f}  SE={v['se']:.4f}")

Poisson QMLE for non-negative outcomes (follows Stata jwdid emp, method(poisson)):

import numpy as np
df['emp'] = np.exp(df['lemp'])

m_pois = WooldridgeDiD(method='poisson')
r_pois = m_pois.fit(df, outcome='emp', unit='countyreal',
                    time='year', cohort='first_treat')
r_pois.aggregate('event').aggregate('group').aggregate('simple')
print(r_pois.summary('simple'))

Logit for binary outcomes (follows Stata jwdid y, method(logit)):

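# mpdta has no binary outcome; build an illustrative one (above-median lemp)
df['hi_emp'] = (df['lemp'] > df['lemp'].median()).astype(int)

m_logit = WooldridgeDiD(method='logit')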
r_logit = m_logit.fit(df, outcome='hi_emp', unit='countyreal',
                      time='year', cohort='first_treat')
r_logit.aggregate('group').aggregate('simple')
print(r_logit.summary('group'))

Aggregation Methods

Call .aggregate(type) before .summary(type):

Type        Description                                            Stata equivalent
----------  -----------------------------------------------------  ----------------
'event'     ATT by relative time k = t − g                         estat event
'group'     ATT averaged across post-treatment periods per cohort  estat group
'calendar'  ATT averaged across cohorts per calendar period        estat calendar
'simple'    Overall weighted average ATT                           estat simple
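The four aggregation types can be sketched as weighted averages over post-treatment cells, grouped by k = t − g, by cohort, by calendar period, or pooled. aggregate_cells is a hypothetical helper, and weighting each cell by its number of treated observations is an assumption made for illustration; the weights diff_diff actually uses are not spelled out on this page.

```python
from collections import defaultdict


def aggregate_cells(gt_att, gt_weight, how):
    """Weighted average of ATT(g, t) cells under one aggregation type.

    'event' groups by k = t - g (keeping any pre-treatment cells present),
    'group' by cohort g, 'calendar' by period t, and 'simple' pools all
    post-treatment cells into one number.
    """
    key_of = {
        "event": lambda g, t: t - g,
        "group": lambda g, t: g,
        "calendar": lambda g, t: t,
        "simple": lambda g, t: "overall",
    }[how]
    num, den = defaultdict(float), defaultdict(float)
    for (g, t), att in gt_att.items():
        if how != "event" and t < g:
            continue  # only post-treatment cells enter these averages
        key = key_of(g, t)
        num[key] += gt_weight[(g, t)] * att
        den[key] += gt_weight[(g, t)]
    return {key: num[key] / den[key] for key in sorted(num)}


# Hypothetical cell estimates and weights (treated observations per cell).
gt_att = {(3, 3): 1.0, (3, 4): 1.5, (4, 4): 0.5}
gt_weight = {(3, 3): 2, (3, 4): 2, (4, 4): 2}
results = {how: aggregate_cells(gt_att, gt_weight, how)
           for how in ("event", "group", "calendar", "simple")}
print(results)
```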

Comparison with Other Staggered Estimators

Feature              WooldridgeDiD (ETWFE)          CallawaySantAnna           ImputationDiD
-------------------  -----------------------------  -------------------------  ------------------------
Approach             Single saturated regression    Separate 2×2 DiD per cell  Impute Y(0) via FE model
Nonlinear outcomes   Yes (Poisson, Logit)           No                         No
Covariates           Via regression (linear index)  OR, IPW, DR                Supported
SE for aggregations  Delta method                   Multiplier bootstrap       Multiplier bootstrap
Stata equivalent     jwdid                          csdid                      did_imputation