Wooldridge Extended Two-Way Fixed Effects (ETWFE)#

Extended Two-Way Fixed Effects estimator from Wooldridge (2025, 2023), based on the Stata jwdid package specification (Friosavila 2021), with documented SE/aggregation deviations noted in the Methodology Registry.

This module implements ETWFE via a single saturated regression that:

Estimates ATT(g,t) for each cohort×time treatment cell simultaneously
Supports linear (OLS), Poisson QMLE, and logit link functions
Uses ASF-based ATT for nonlinear models: E[f(η₁)] − E[f(η₀)]
Computes delta-method SEs for all aggregations (event, group, calendar, simple)
Supports paper W2025 cohort-share aggregation via aggregate(weights="cohort_share") (Eqs. 7.4 + 7.6; default is cell-count matching Stata jwdid_estat)
Supports paper W2025 Section 8 heterogeneous cohort trends via cohort_trends=True (OLS path only; auto-routes to full-dummy mode; requires control_group="not_yet_treated", the default, and survey_design=None). The never_treated and survey paths are fail-closed with NotImplementedError: under never_treated, the all-(g, t) placebo cell dummies are collinear with the trend columns, so the trend specification is unidentified, and the survey-TSL composition is unvalidated (see Methodology Registry for the full contract)
Follows the Stata jwdid specification for OLS defaults and nonlinear paths (see Methodology Registry for documented SE/aggregation deviations)
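
The ASF-based ATT listed above, E[f(η₁)] − E[f(η₀)], can be illustrated with a minimal sketch. It assumes the untreated linear index η₀ is available for each treated observation and that treatment shifts the index by a single cell coefficient; the function names and numbers are hypothetical, and this is not the library's internal code.

```python
import math

def inverse_link(eta, method):
    """Mean function f(eta) for each supported method."""
    if method == "ols":
        return eta
    if method == "poisson":
        return math.exp(eta)
    if method == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    raise ValueError(f"unknown method: {method}")

def asf_att(eta0, delta, method):
    """Average of f(eta0 + delta) - f(eta0) over treated observations."""
    diffs = [inverse_link(e + delta, method) - inverse_link(e, method)
             for e in eta0]
    return sum(diffs) / len(diffs)

eta0 = [0.2, 0.5, 1.0]   # hypothetical untreated indices
delta = 0.3              # hypothetical treatment-cell coefficient

att_ols = asf_att(eta0, delta, "ols")
att_pois = asf_att(eta0, delta, "poisson")
att_logit = asf_att(eta0, delta, "logit")
```

Under the identity link the ATT equals the coefficient itself; under the Poisson and logit links it depends on where each observation sits on the index scale, which is why the average is taken over treated observations.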

When to use WooldridgeDiD:

Staggered adoption design with heterogeneous treatment timing
Nonlinear outcomes (binary, count, non-negative continuous)
You want a single-regression approach matching Stata’s jwdid
You need event-study, group, calendar, or simple ATT aggregations
You need paper W2025 cohort-share aggregation weights as an alternative to the default cell-count weighting
You need heterogeneous cohort-specific linear trends when parallel trends is violated (paper W2025 Section 8)

References:

Wooldridge, J. M. (2025). Two-way fixed effects, the two-way Mundlak regression, and difference-in-differences estimators. Empirical Economics, 69(5), 2545-2587. DOI 10.1007/s00181-025-02807-z.
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31-C66.
Friosavila, F. (2021). jwdid: Stata module for ETWFE. SSC s459114.

WooldridgeDiD#

Main estimator class for Wooldridge ETWFE.

class diff_diff.WooldridgeDiD[source]

Bases: object

Extended Two-Way Fixed Effects (ETWFE) DiD estimator.

Implements the Wooldridge (2025) saturated cohort×time regression (Empirical Economics 69(5), 2545-2587; DOI 10.1007/s00181-025-02807-z) and Wooldridge (2023) nonlinear extensions (logit, Poisson). Produces all four jwdid_estat aggregation types: simple, group, calendar, event. Opt-in surfaces include paper W2025 Section 7 cohort-share aggregation (aggregate(weights="cohort_share"), Eqs. 7.4 + 7.6) and paper W2025 Section 8 heterogeneous cohort-specific linear trends (cohort_trends=True, Eq. 8.1; OLS path only).

Parameters:

method ({"ols", "logit", "poisson"}) – Estimation method. "ols" is the linear baseline, valid for any response (Wooldridge 2023) and the usual choice for continuous outcomes; "logit" is for binary or fractional outcomes; "poisson" is for count data. When method="ols" is used on a binary ({0, 1}) or non-negative integer-count outcome, a UserWarning notes that a matching nonlinear model (logit / Poisson) is often the more appropriate specification: it imposes parallel trends on the link scale rather than in levels, and Wooldridge’s (2023) simulations show the linear model to be both biased and less precise for such outcomes when the nonlinear mean holds. The nonlinear model rests on a different identifying assumption than linear OLS, so it is a recommended comparison, not an automatic switch; suppress the warning via warnings.filterwarnings.
control_group ({"not_yet_treated", "never_treated"}) – Which units serve as the comparison group. "not_yet_treated" (jwdid default) uses all untreated observations at each time period; "never_treated" uses only units never treated throughout the sample.
anticipation (int) – Number of periods before treatment onset to include as treatment cells (anticipation effects). 0 means no anticipation.
demean_covariates (bool) – If True (jwdid default), xtvar covariates are demeaned within each cohort×period cell before entering the regression. Set to False to replicate jwdid’s xasis option.
alpha (float) – Significance level for confidence intervals.
cluster (str or None) – Column name to use for cluster-robust SEs. Defaults to the unit identifier passed to fit().
n_bootstrap (int) – Number of bootstrap replications. 0 disables bootstrap.
bootstrap_weights ({"rademacher", "webb", "mammen"}) – Bootstrap weight distribution.
seed (int or None) – Random seed for reproducibility.
rank_deficient_action ({"warn", "error", "silent"}) – How to handle rank-deficient design matrices.
vcov_type ({"classical", "hc1", "hc2", "hc2_bm", "conley"}, default "hc1") –
Variance-covariance family for the analytical sandwich, OLS path only. hc1 (default) preserves the prior bit-equal CR1 Liang-Zeger cluster-robust behavior via the within-transform path. hc2_bm auto-routes to a full-dummy saturated design (intercept + treatment cells + unit dummies + time dummies) — FWL preserves cohort coefficients but NOT the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection (matches clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt). classical / hc2 are supported via the same full-dummy route AND an auto-drop of the unit auto-cluster (one-way families don’t compose with cluster_ids per the linalg validator). Explicit cluster="X" + one-way vcov_type raises at the validator. "conley" (Conley 1999 spatial-HAC) threads the conley_* params through solve_ols on the within-transform design (conley_lag_cutoff=0 = within-period spatial only; >0 adds within-unit Bartlett serial — the panel-aware path, not pooled cross-sectional, since conley_time / conley_unit are always supplied); the unit auto-cluster is dropped (an explicit cluster= enables the spatial+cluster product kernel) and survey_design= / weights / n_bootstrap>0 are rejected. Conley is OLS-path-only; it routes through the full-dummy design when cohort_trends=True (same as the other full-dummy families), and its vcov flows through aggregate("group"|"calendar"|"event").

Combining method in {"logit","poisson"} with vcov_type != "hc1" is rejected at __init__: the GLM QMLE sandwich path uses pseudo-residuals, and CR2-BM composition with QMLE on canonical-link pseudo-residuals needs derivation + R parity (tracked in TODO.md). Survey designs combined with vcov_type != "hc1" raise NotImplementedError at fit() because the survey TSL / replicate-refit variance overrides the analytical sandwich.
cohort_trends (bool, default False) – When True, adds linear dg_i · t cohort-specific trend interactions to the design matrix per paper W2025 Section 8 / Eq. 8.1. Under a heterogeneous-trends DGP this recovers τ even when parallel trends fails (paper Section 8.3). OLS-path only: cohort_trends=True + method ∈ {"logit","poisson"} raises NotImplementedError at __init__. Auto-routes to the full-dummy design regardless of vcov_type (matching the absorb→fixed_effects auto-route). Each treated cohort must have ≥ 2 observed pre-periods in the analysis sample for dg_i · t to be separately identified from cohort + time FE; fit() raises ValueError otherwise. On all-eventually-treated panels the last cohort’s trend column is dropped per paper Section 5.4. cohort_trends=True + survey_design raises NotImplementedError at fit() (deferred follow-up). cohort_trends=True + control_group="never_treated" also raises NotImplementedError at fit() because the OLS + never_treated branch emits ALL (g, t) placebo cell dummies (paper Section 4.4 placebo coverage); the appended dg_i · t trend columns are linearly spanned by the per-cohort sum of those cell dummies, so the Section 8 trend specification is unidentified on this branch. Use control_group="not_yet_treated" (the default) for the cohort_trends surface.
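
The pre-period requirement for cohort_trends=True can be checked before fitting. The sketch below is a hypothetical helper, not part of diff_diff; it assumes a balanced panel, ignores anticipation, and counts the observed periods strictly before each cohort's onset.

```python
def cohorts_lacking_pre_periods(time_periods, cohorts, min_pre=2):
    """Return treated cohorts with fewer than `min_pre` periods before onset."""
    short = []
    for g in sorted(cohorts):
        n_pre = sum(1 for t in time_periods if t < g)
        if n_pre < min_pre:
            short.append(g)
    return short

periods = [2003, 2004, 2005, 2006, 2007]   # hypothetical sample periods
treated_cohorts = [2004, 2006, 2007]       # hypothetical first-treatment years

too_short = cohorts_lacking_pre_periods(periods, treated_cohorts)
```

Here cohort 2004 has a single pre-period (2003), so a cohort-specific slope could not be separated from the cohort and time effects for that cohort.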

Methods

`fit`(data, outcome, unit, time, cohort[, ...])	Fit the ETWFE model.
`get_params`()	Return estimator parameters (sklearn-compatible).
`set_params`(**params)	Set estimator parameters (sklearn-compatible).

__init__(method='ols', control_group='not_yet_treated', anticipation=0, demean_covariates=True, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', vcov_type='hc1', cohort_trends=False, conley_coords=None, conley_cutoff_km=None, conley_metric='haversine', conley_kernel='bartlett', conley_lag_cutoff=None)[source]

Parameters:

method (str)
control_group (str)
anticipation (int)
demean_covariates (bool)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
rank_deficient_action (str)
vcov_type (str)
cohort_trends (bool)
conley_coords (Tuple[str, str] | None)
conley_cutoff_km (float | None)
conley_metric (str)
conley_kernel (str)
conley_lag_cutoff (int | None)

Return type:

None

property results_: WooldridgeDiDResults

get_params()[source]

Return estimator parameters (sklearn-compatible).

Return type: Dict[str, Any]

set_params(**params)[source]

Set estimator parameters (sklearn-compatible). Returns self.

Atomic: if validation rejects the incoming combination (unknown parameter, invalid value, or the method × vcov_type interaction guard fires), self is unchanged so a caller that catches ValueError / NotImplementedError can keep using the estimator with its previous configuration. Mirrors the DifferenceInDifferences.set_params pattern at estimators.py:995-1023.

Parameters: params (Any)
Return type: WooldridgeDiD
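
The atomic behaviour described above follows a validate-then-commit pattern. A minimal sketch, using a hypothetical stand-in class rather than the library's implementation (the exception type chosen for the method × vcov_type guard is an assumption):

```python
class AtomicParams:
    """Hypothetical stand-in showing validate-then-commit parameter updates."""

    _valid_methods = {"ols", "logit", "poisson"}

    def __init__(self, method="ols", vcov_type="hc1"):
        self.method = method
        self.vcov_type = vcov_type
        self._validate(self.get_params())

    def get_params(self):
        return {"method": self.method, "vcov_type": self.vcov_type}

    @classmethod
    def _validate(cls, params):
        if params["method"] not in cls._valid_methods:
            raise ValueError(f"invalid method: {params['method']!r}")
        # Assumed exception type for the interaction guard.
        if params["method"] != "ols" and params["vcov_type"] != "hc1":
            raise NotImplementedError("nonlinear methods require vcov_type='hc1'")

    def set_params(self, **params):
        current = self.get_params()
        unknown = set(params) - set(current)
        if unknown:
            raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
        candidate = {**current, **params}
        self._validate(candidate)          # raises before anything is mutated
        for name, value in candidate.items():
            setattr(self, name, value)
        return self

est = AtomicParams(method="ols", vcov_type="hc2_bm")
try:
    est.set_params(method="logit")         # conflicts with vcov_type="hc2_bm"
    rejected = False
except NotImplementedError:
    rejected = True
```

Because validation runs on a merged copy, a rejected update leaves the estimator in its previous configuration and it remains usable.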

fit(data, outcome, unit, time, cohort, exovar=None, xtvar=None, xgvar=None, survey_design=None)[source]

Fit the ETWFE model. See class docstring for parameter details.

Parameters:

data (DataFrame with panel data (long format))
outcome (outcome column name)
unit (unit identifier column)
time (time period column)
cohort (first treatment period (0 or NaN = never treated))
exovar (time-invariant covariates added without interaction/demeaning)
xtvar (time-varying covariates; demeaned within cohort×period cells when demean_covariates=True)
xgvar (covariates interacted with each cohort indicator)
survey_design (SurveyDesign, optional) – Survey design specification for complex survey data. Supports stratified, clustered, and weighted designs via Taylor Series Linearization (TSL). Replicate-weight designs raise NotImplementedError.

Return type:

WooldridgeDiDResults

WooldridgeDiDResults#

Results container returned by WooldridgeDiD.fit().

cohort_trend_coefs (populated under cohort_trends=True, OLS path only): Dict[g → δ_g] keyed by treated cohort. The reported slopes are relative to the baseline trend absorbed by the design — the never-treated cohort’s trend (when a never-treated cohort exists) OR the last cohort’s trend (when no never-treated cohort exists, per paper W2025 Section 5.4’s all-eventually-treated drop rule). On all-treated panels the last cohort is intentionally absent from the dict; its slope is the baseline (zero in deviation form). See docs/methodology/REGISTRY.md → ## WooldridgeDiD (ETWFE) → “Heterogeneous cohort trends” for the full normalization contract.

class diff_diff.wooldridge_results.WooldridgeDiDResults[source]

Bases: object

Results from WooldridgeDiD.fit().

Core output is group_time_effects: a dict keyed by (cohort_g, time_t) with per-cell ATT estimates and inference. Call .aggregate(type, weights=...) to compute any of the four jwdid_estat aggregation types under either the default cell-count weighting (weights="cell", matches Stata jwdid_estat) or the paper W2025 opt-in cohort-share weighting (weights="cohort_share", Eqs. 7.4 / 7.6; restricted to type ∈ {"simple", "event"}). cohort_trend_coefs carries Section 8 / Eq. 8.1 estimated δ_g slopes when the fit was produced under WooldridgeDiD(cohort_trends=True). aggregation_weights is keyed by aggregation type and records the active weighting scheme that wrote to each cached surface (surfaced in summary() / to_dataframe() / __repr__).
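
For a linear aggregation with weights treated as fixed, the standard error follows from the per-cell covariance matrix as sqrt(w′Vw). The sketch below illustrates only that linear case with hypothetical numbers; it is not the library's code, and nonlinear paths additionally involve the Jacobian of the mean function.

```python
import math

def aggregate_with_se(att, vcov, weights):
    """Weighted average of cell ATTs and its SE, holding weights fixed."""
    total = sum(weights)
    w = [x / total for x in weights]
    estimate = sum(wi * ai for wi, ai in zip(w, att))
    n = len(w)
    variance = sum(w[i] * vcov[i][j] * w[j] for i in range(n) for j in range(n))
    return estimate, math.sqrt(variance)

att = [0.10, 0.20]                 # hypothetical per-cell ATTs
vcov = [[0.0004, 0.0001],          # hypothetical covariance of those ATTs
        [0.0001, 0.0009]]

estimate, se = aggregate_with_se(att, vcov, weights=[30, 10])
```

The off-diagonal term matters: cells estimated in one regression are correlated, so summing per-cell variances alone would misstate the aggregate SE.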

Methods

`aggregate`(type[, weights])	Compute and store one of the four jwdid_estat aggregation types.
`summary`([aggregation])	Print formatted summary table.

group_time_effects: Dict[Tuple[Any, Any], Dict[str, Any]]: key=(g,t), value={att, se, t_stat, p_value, conf_int}

overall_att: float

overall_se: float

overall_t_stat: float

overall_p_value: float

overall_conf_int: Tuple[float, float]

group_effects: Dict[Any, Dict] | None = None

calendar_effects: Dict[Any, Dict] | None = None

event_study_effects: Dict[int, Dict] | None = None

method: str = 'ols'

control_group: str = 'not_yet_treated'

groups: List[Any]

time_periods: List[Any]

n_obs: int = 0

n_treated_units: int = 0

n_control_units: int = 0

alpha: float = 0.05

anticipation: int = 0

survey_metadata: Any | None = None

vcov_type: str = 'hc1'

cluster_name: str | None = None

n_clusters: int | None = None

conley_lag_cutoff: int | None = None

cohort_trend_coefs: Dict[Any, float]

cohort_trends: bool = False

aggregation_weights: Dict[str, str]

aggregate(type, weights='cell')[source]

Compute and store one of the four jwdid_estat aggregation types.

Parameters:

type ("simple" | "group" | "calendar" | "event")
weights ("cell" | "cohort_share", default "cell") – Aggregation weighting scheme. "cell" (default) uses the cell-count weights n_{g,t} (observation counts per cell) and matches Stata jwdid_estat. "cohort_share" uses paper W2025 Eq. 7.4 ω̂_g = N_g / Σ_{g'} N_{g'} M_{g'} for type="simple" and Eq. 7.6 ω̂_{ge} = N_g / Σ_{g': g'+e ≤ T} N_{g'} for type="event". Both formulas reduce to N_g-proportional per-cell weights with the appropriate normalization. The two schemes coincide on balanced panels with uniform within-cohort cell counts (paper Section 7.5). The cohort-share scheme is supported only for type="simple" and type="event"; the paper provides no explicit cohort-share formula for "group" or "calendar" aggregations and the library raises ValueError to preserve a fail-closed contract.

Returns self for chaining.

Return type:

WooldridgeDiDResults
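
The difference between the two weighting schemes for type="simple" can be seen in a small worked example. This is a sketch with made-up numbers, not library code; it assumes M_g in Eq. 7.4 denotes the number of post-treatment cells of cohort g, so that each post-treatment cell of cohort g receives a weight proportional to N_g.

```python
def simple_att(cell_att, raw_weight):
    """Weighted average of cell ATTs after normalizing the raw weights."""
    total = sum(raw_weight[key] for key in cell_att)
    return sum(cell_att[key] * raw_weight[key] / total for key in cell_att)

# Hypothetical post-treatment cells keyed by (cohort g, period t).
cell_att = {(2004, 2004): 0.10, (2004, 2005): 0.30, (2005, 2005): 0.50}
cell_counts = {(2004, 2004): 40, (2004, 2005): 20, (2005, 2005): 40}  # n_{g,t}
cohort_sizes = {2004: 40, 2005: 60}                                   # N_g

cell_weighted = simple_att(cell_att, cell_counts)
share_weighted = simple_att(
    cell_att, {(g, t): cohort_sizes[g] for (g, t) in cell_att}
)
```

With these unbalanced counts the cell-count average is 0.30, while the cohort-share average is 46/140 ≈ 0.329. If every cell of cohort g contained all N_g units, the two results would coincide.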

Notes

When vcov_type == "hc2_bm", aggregated inference (t_stat / p_value / conf_int) uses Bell-McCaffrey Satterthwaite contrast-specific DOFs rather than the survey/None default. The BM DOFs are computed lazily from _bm_artifacts via _compute_cr2_bm_contrast_dof and fail-closed (NaN inference) when the helper raises or returns NaN — per feedback_bm_contrast_dof_fail_closed. The contrast column is rebuilt under the active weights scheme so the BM DOF reflects the actual weighting used by ATT + SE.

summary(aggregation='simple')[source]

Print formatted summary table.

Parameters: aggregation (which aggregation to display: "simple", "group", "calendar", or "event")
Return type: str

__init__(group_time_effects, overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, group_effects=None, calendar_effects=None, event_study_effects=None, method='ols', control_group='not_yet_treated', groups=<factory>, time_periods=<factory>, n_obs=0, n_treated_units=0, n_control_units=0, alpha=0.05, anticipation=0, survey_metadata=None, vcov_type='hc1', cluster_name=None, n_clusters=None, conley_lag_cutoff=None, cohort_trend_coefs=<factory>, _bootstrap_used=False, cohort_trends=False, aggregation_weights=<factory>, _gt_weights=<factory>, _n_g_per_cohort=<factory>, _gt_vcov=None, _gt_keys=<factory>, _df_survey=None, _bm_per_cell_dof=<factory>, _bm_artifacts=None, _df_one_way=None)

Parameters:

group_time_effects (Dict[Tuple[Any, Any], Dict[str, Any]])
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
overall_conf_int (Tuple[float, float])
group_effects (Dict[Any, Dict] | None)
calendar_effects (Dict[Any, Dict] | None)
event_study_effects (Dict[int, Dict] | None)
method (str)
control_group (str)
groups (List[Any])
time_periods (List[Any])
n_obs (int)
n_treated_units (int)
n_control_units (int)
alpha (float)
anticipation (int)
survey_metadata (Any | None)
vcov_type (str)
cluster_name (str | None)
n_clusters (int | None)
conley_lag_cutoff (int | None)
cohort_trend_coefs (Dict[Any, float])
_bootstrap_used (bool)
cohort_trends (bool)
aggregation_weights (Dict[str, str])
_gt_weights (Dict[Tuple[Any, Any], int])
_n_g_per_cohort (Dict[Any, int])
_gt_vcov (ndarray | None)
_gt_keys (List[Tuple[Any, Any]])
_df_survey (int | None)
_bm_per_cell_dof (Dict[Tuple[Any, Any], float])
_bm_artifacts (Tuple[ndarray, ndarray, ndarray, Dict[Tuple[Any, Any], int]] | None)
_df_one_way (float | None)

Return type:

None

to_dataframe(aggregation='event')[source]

Export aggregated effects to a DataFrame.

Parameters: aggregation ("simple" | "group" | "calendar" | "event" | "gt") – Use "gt" to export raw group-time effects.
Return type: DataFrame

plot_event_study(weights='cell', **kwargs)[source]

Event study plot. Always calls aggregate('event', weights=weights).

Parameters:

weights ("cell" | "cohort_share", default "cell") – Aggregation weighting scheme threaded into the underlying aggregate("event", ...) call. "cohort_share" produces paper W2025 Eq. 7.6 cohort-share-by-exposure weights (post-treatment k >= 0 only); inference fields are fail-closed to NaN per the Section 7.5 conditional-on-shares contract documented in REGISTRY, and the plot suppresses error bars / CI bands to honor the fail-closed contract (the conditional-on-shares SE would build a misleading normal-theory CI in the plotter).
**kwargs – Forwarded to diff_diff.visualization.plot_event_study.

Return type:

None

Notes

The wrapper unconditionally re-aggregates the event study under the requested weights scheme. This avoids the stale-cache hazard where a prior plot_event_study(weights="cohort_share") call would leave the cached event_study_effects restricted to k >= 0 (per the Eq. 7.6 scope), and a subsequent plot_event_study() (default weights="cell") call would silently reuse the cohort-share-keyed cache instead of restoring the full event range including pre-period placebo leads.
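
The stale-cache hazard described in these notes can be reproduced with a toy model. The class below is a hypothetical stand-in for a results object, with unweighted means in place of the real weighting; it only shows why recomputing on every call restores the pre-period leads.

```python
class EventCacheSketch:
    """Hypothetical stand-in for a results object with a cached event study."""

    def __init__(self, cell_att):
        self.cell_att = cell_att             # {(g, t): att}
        self.event_study_effects = None

    def aggregate_event(self, weights="cell"):
        by_k = {}
        for (g, t), att in self.cell_att.items():
            k = t - g
            if weights == "cohort_share" and k < 0:
                continue                     # Eq. 7.6 scope: post-treatment only
            by_k.setdefault(k, []).append(att)
        # Unweighted means keep the sketch short; real weights are omitted.
        self.event_study_effects = {k: sum(v) / len(v) for k, v in by_k.items()}
        return self

    def event_keys_for_plot(self, weights="cell"):
        self.aggregate_event(weights)        # always recompute, never reuse
        return sorted(self.event_study_effects)

res = EventCacheSketch({(2005, 2004): 0.01, (2005, 2005): 0.20, (2005, 2006): 0.30})
share_keys = res.event_keys_for_plot(weights="cohort_share")
cell_keys = res.event_keys_for_plot()
```

After the cohort-share call the cache holds only k = 0 and k = 1; the second call re-aggregates and the placebo lead k = −1 reappears.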

property att: float

property se: float

property conf_int: Tuple[float, float]

property p_value: float

property t_stat: float

Example Usage#

Basic OLS (follows Stata jwdid y, ivar(unit) tvar(time) gvar(cohort)):

import pandas as pd
from diff_diff import WooldridgeDiD

df = pd.read_stata("mpdta.dta")
df['first_treat'] = df['first_treat'].astype(int)

m = WooldridgeDiD()
r = m.fit(df, outcome='lemp', unit='countyreal', time='year', cohort='first_treat')

r.aggregate('event').aggregate('group').aggregate('simple')
print(r.summary('event'))
print(r.summary('group'))
print(r.summary('simple'))

Note

When method="ols" is applied to a binary ({0, 1}) or non-negative integer-count outcome, fit() emits a UserWarning noting that a matching nonlinear model (method="logit" / method="poisson") is often the more appropriate specification for such outcomes: it imposes parallel trends on the link/index scale rather than in levels (Wooldridge 2023 notes level-PT is only valid for continuous/unbounded outcomes), and in that paper’s simulations the linear model is both biased and less precise where the nonlinear mean holds. The nonlinear model rests on a different identifying assumption than linear OLS, so treat it as a recommended comparison, not an automatic switch. OLS remains a valid QMLE for any response (Wooldridge 2023); suppress the hint via warnings.filterwarnings. The check is heuristic: bounded discrete (binomial-style) outcomes with a known upper bound are not separately detected from unbounded counts.
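
Suppressing the hint uses the standard warnings machinery. The sketch below uses a hypothetical stub in place of fit(), with an invented message text, so the filter is applied by category rather than by message.

```python
import warnings

def fit_stub():
    """Hypothetical stand-in for a fit() call that emits the OLS hint."""
    warnings.warn("consider a nonlinear model for this outcome", UserWarning)
    return "fitted"

# Without a filter the warning is delivered (recorded here for inspection).
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit_stub()
n_unfiltered = len(caught)

# With an "ignore" filter scoped to this block, the warning is dropped.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.filterwarnings("ignore", category=UserWarning)
    result = fit_stub()
n_filtered = len(caught)
```

Filtering by category silences every UserWarning raised inside the block, so keeping the context manager narrowly scoped around the fit call avoids hiding unrelated warnings.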

View cohort×time cell estimates (post-treatment):

for (g, t), v in sorted(r.group_time_effects.items()):
    if t >= g:
        print(f"g={g} t={t}  ATT={v['att']:.4f}  SE={v['se']:.4f}")

Poisson QMLE for non-negative outcomes (follows Stata jwdid emp, method(poisson)):

import numpy as np
df['emp'] = np.exp(df['lemp'])

m_pois = WooldridgeDiD(method='poisson')
r_pois = m_pois.fit(df, outcome='emp', unit='countyreal',
                    time='year', cohort='first_treat')
r_pois.aggregate('event').aggregate('group').aggregate('simple')
print(r_pois.summary('simple'))

Logit for binary outcomes (follows Stata jwdid y, method(logit)):

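# Illustrative binary outcome (an assumption; hi_emp is not a column of the raw data)
df['hi_emp'] = (df['lemp'] > df['lemp'].median()).astype(int)

m_logit = WooldridgeDiD(method='logit')
r_logit = m_logit.fit(df, outcome='hi_emp', unit='countyreal',
                      time='year', cohort='first_treat')
r_logit.aggregate('group').aggregate('simple')
print(r_logit.summary('group'))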

Aggregation Methods#

Call .aggregate(type, weights=...) before .summary(type):

Type	Description	Stata equivalent
`'event'`	ATT by relative time k = t − g	`estat event`
`'group'`	ATT averaged across post-treatment periods per cohort	`estat group`
`'calendar'`	ATT averaged across cohorts per calendar period	`estat calendar`
`'simple'`	Overall weighted average ATT	`estat simple`

Weighting schemes (weights="cell" default, weights="cohort_share" opt-in):

weights="cell" (default): cell-count n_{g,t} weighting; matches Stata jwdid_estat. Supported for all four aggregation types.
weights="cohort_share": paper W2025 Eq. 7.4 (simple) and Eq. 7.6 (event, restricted to k >= 0) cohort-share weighting. Supported only for type="simple" and type="event"; raises ValueError for type ∈ {"group","calendar"} (the paper gives no closed form). Inference fields (t-stat / p-value / conf-int) are fail-closed to NaN with a UserWarning documenting the conditional-on-shares limitation (paper W2025 Section 7.5). Raises when survey_design is not None (design-consistent cohort totals are pending follow-up).

Comparison with Other Staggered Estimators#

Feature	WooldridgeDiD (ETWFE)	CallawaySantAnna	ImputationDiD
Approach	Single saturated regression	Separate 2×2 DiD per cell	Impute Y(0) via FE model
Nonlinear outcomes	Yes (Poisson, Logit)	No	No
Covariates	Via regression (linear index)	OR, IPW, DR	Supported
SE for aggregations	Delta method	Multiplier bootstrap	Multiplier bootstrap
Stata equivalent	`jwdid`	`csdid`	`did_imputation`