diff_diff.WooldridgeDiD

class diff_diff.WooldridgeDiD

Bases: object

Extended Two-Way Fixed Effects (ETWFE) DiD estimator.

Implements the Wooldridge (2025) saturated cohort×time regression (Empirical Economics 69(5), 2545-2587; DOI 10.1007/s00181-025-02807-z; cited below as W2025) and the Wooldridge (2023) nonlinear extensions (logit, Poisson). Produces all four jwdid_estat aggregation types: simple, group, calendar, and event. Two opt-in features follow W2025: Section 7 cohort-share aggregation (aggregate(weights="cohort_share"), Eqs. 7.4 and 7.6) and Section 8 heterogeneous cohort-specific linear trends (cohort_trends=True, Eq. 8.1; OLS path only).

Parameters:

method ({"ols", "logit", "poisson"}) – Estimation method. "ols" is the linear baseline, valid for any response (Wooldridge 2023) and the usual choice for continuous outcomes; "logit" is for binary or fractional outcomes; "poisson" is for count data. When method="ols" is used on a binary ({0, 1}) or non-negative integer-count outcome, a UserWarning notes that the matching nonlinear model (logit or Poisson) is often the more appropriate specification: it imposes parallel trends on the link scale rather than in levels, and Wooldridge’s (2023) simulations show the linear model to be both biased and less precise for such outcomes when the nonlinear mean holds. Because the nonlinear model rests on a different identifying assumption than linear OLS, it is a recommended comparison, not an automatic switch. Suppress the warning via warnings.filterwarnings.
control_group ({"not_yet_treated", "never_treated"}) – Which units serve as the comparison group. "not_yet_treated" (jwdid default) uses all untreated observations at each time period; "never_treated" uses only units that are never treated during the sample.
anticipation (int) – Number of periods before treatment onset to include as treatment cells (anticipation effects). 0 means no anticipation.
demean_covariates (bool) – If True (jwdid default), xtvar covariates are demeaned within each cohort×period cell before entering the regression. Set to False to replicate jwdid’s xasis option.
alpha (float) – Significance level for confidence intervals.
cluster (str or None) – Column name to use for cluster-robust SEs. Defaults to the unit identifier passed to fit().
n_bootstrap (int) – Number of bootstrap replications. 0 disables bootstrap.
bootstrap_weights ({"rademacher", "webb", "mammen"}) – Bootstrap weight distribution.
seed (int or None) – Random seed for reproducibility.
rank_deficient_action ({"warn", "error", "silent"}) – How to handle rank-deficient design matrices.
vcov_type ({"classical", "hc1", "hc2", "hc2_bm", "conley"}, default "hc1") –
Variance-covariance family for the analytical sandwich; OLS path only.

"hc1" (default) preserves the prior bit-equal CR1 Liang-Zeger cluster-robust behavior via the within-transform path.

"hc2_bm" auto-routes to a full-dummy saturated design (intercept + treatment cells + unit dummies + time dummies). FWL preserves the cohort coefficients but not the hat matrix, so HC2 leverage and the Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection (matches clubSandwich::vcovCR(lm(...), type="CR2") + coef_test()$df_Satt).

"classical" and "hc2" use the same full-dummy route and also drop the unit auto-cluster, because one-way families do not compose with cluster_ids per the linalg validator. Passing an explicit cluster="X" together with a one-way vcov_type raises at the validator.

"conley" (Conley 1999 spatial HAC) threads the conley_* parameters through solve_ols on the within-transform design. conley_lag_cutoff=0 gives within-period spatial correlation only; a value > 0 adds the within-unit Bartlett serial term. This is the panel-aware path, not the pooled cross-sectional one, because conley_time / conley_unit are always supplied. The unit auto-cluster is dropped (an explicit cluster= enables the spatial+cluster product kernel), and survey_design= / weights / n_bootstrap>0 are rejected. Conley is OLS-path only; it routes through the full-dummy design when cohort_trends=True (the same as the other full-dummy families), and its vcov flows through aggregate("group"|"calendar"|"event").

method in {"logit", "poisson"} combined with vcov_type != "hc1" is rejected at __init__: the GLM QMLE sandwich path uses pseudo-residuals, and composing CR2-BM with QMLE on canonical-link pseudo-residuals still needs a derivation and R parity (tracked in TODO.md). Survey designs combined with vcov_type != "hc1" raise NotImplementedError at fit() because the survey TSL / replicate-refit variance overrides the analytical sandwich.
cohort_trends (bool, default False) – When True, adds linear dg_i · t cohort-specific trend interactions to the design matrix per paper W2025 Section 8 / Eq. 8.1. Under a heterogeneous-trends DGP this recovers τ even when parallel trends fails (paper Section 8.3). Auto-routes to the full-dummy design regardless of vcov_type (matching the absorb→fixed_effects auto-route). On all-eventually-treated panels the last cohort’s trend column is dropped per paper Section 5.4.

Each treated cohort must have at least 2 observed pre-periods in the analysis sample for dg_i · t to be separately identified from the cohort and time fixed effects; fit() raises ValueError otherwise.

Unsupported combinations: cohort_trends=True with method in {"logit", "poisson"} raises NotImplementedError at __init__ (the option is OLS-path only). cohort_trends=True with survey_design raises NotImplementedError at fit() (deferred follow-up). cohort_trends=True with control_group="never_treated" also raises NotImplementedError at fit(): the OLS + never_treated branch emits all (g, t) placebo cell dummies (paper Section 4.4 placebo coverage), and the appended dg_i · t trend columns are linearly spanned by those cell dummies within each cohort, so the Section 8 trend specification is unidentified on this branch. Use control_group="not_yet_treated" (the default) with cohort_trends.
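
The compatibility rules described for vcov_type and cohort_trends above can be collected into a small pre-flight check. The sketch below is not part of diff_diff: preflight_conflicts and its has_survey_design flag are hypothetical names, and the function only returns descriptions of the documented conflicts instead of reproducing the library's exceptions.

```python
from typing import List, Optional

NONLINEAR_METHODS = {"logit", "poisson"}
ONE_WAY_VCOV = {"classical", "hc2"}


def preflight_conflicts(
    method: str = "ols",
    control_group: str = "not_yet_treated",
    vcov_type: str = "hc1",
    cohort_trends: bool = False,
    cluster: Optional[str] = None,
    n_bootstrap: int = 0,
    has_survey_design: bool = False,  # hypothetical: True if fit() would receive survey_design
) -> List[str]:
    """Hypothetical helper: list the documented option conflicts, if any."""
    problems: List[str] = []
    nonlinear = method in NONLINEAR_METHODS
    if nonlinear and vcov_type != "hc1":
        problems.append("logit/poisson support only vcov_type='hc1'")
    if nonlinear and cohort_trends:
        problems.append("cohort_trends=True is OLS-path only")
    if cohort_trends and control_group == "never_treated":
        problems.append("cohort_trends=True is unidentified with control_group='never_treated'")
    if cohort_trends and has_survey_design:
        problems.append("cohort_trends=True with survey_design is not implemented")
    if has_survey_design and vcov_type != "hc1":
        problems.append("survey designs support only vcov_type='hc1'")
    if cluster is not None and vcov_type in ONE_WAY_VCOV:
        problems.append("explicit cluster= does not compose with a one-way vcov_type")
    if vcov_type == "conley" and n_bootstrap > 0:
        problems.append("vcov_type='conley' rejects n_bootstrap > 0")
    return problems


# Example: a Poisson fit that also requests CR2 Bell-McCaffrey standard errors.
for message in preflight_conflicts(method="poisson", vcov_type="hc2_bm"):
    print(message)
```

Returning a list instead of raising keeps the sketch neutral about exception types, since the text above names NotImplementedError and ValueError only for some of these cases.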

Methods

`__init__`([method, control_group, ...])
`fit`(data, outcome, unit, time, cohort[, ...])	Fit the ETWFE model.
`get_params`()	Return estimator parameters (sklearn-compatible).
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
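
The get_params / set_params pair follows the scikit-learn estimator convention. As a rough illustration of that convention only, the sketch below uses a hypothetical stand-in class (ToyEstimator, with three of the constructor arguments listed in this page); it is not the WooldridgeDiD implementation, and details such as set_params returning self are assumed from the scikit-learn convention.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in for illustration; not diff_diff.WooldridgeDiD."""

    _param_names = ("method", "control_group", "alpha")

    def __init__(self, method: str = "ols",
                 control_group: str = "not_yet_treated",
                 alpha: float = 0.05) -> None:
        self.method = method
        self.control_group = control_group
        self.alpha = alpha

    def get_params(self) -> Dict[str, Any]:
        # Expose the constructor arguments as a plain dict.
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **params: Any) -> "ToyEstimator":
        # Validate every name first so a bad call leaves the object unchanged.
        unknown = [name for name in params if name not in self._param_names]
        if unknown:
            raise ValueError(f"Unknown parameter(s): {unknown}")
        for name, value in params.items():
            setattr(self, name, value)
        return self  # returning self allows chaining


base = ToyEstimator(alpha=0.10)
# Copy the configuration into a new object, then change one setting on the copy.
variant = ToyEstimator(**base.get_params()).set_params(control_group="never_treated")
```

The round trip through get_params is what makes it easy to run the same specification under both comparison groups without mutating the original object.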

Attributes

results_

__init__(method='ols', control_group='not_yet_treated', anticipation=0, demean_covariates=True, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', vcov_type='hc1', cohort_trends=False, conley_coords=None, conley_cutoff_km=None, conley_metric='haversine', conley_kernel='bartlett', conley_lag_cutoff=None)

Parameters:

method (str)
control_group (str)
anticipation (int)
demean_covariates (bool)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
rank_deficient_action (str)
vcov_type (str)
cohort_trends (bool)
conley_coords (Tuple[str, str] | None)
conley_cutoff_km (float | None)
conley_metric (str)
conley_kernel (str)
conley_lag_cutoff (int | None)
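
The three bootstrap_weights options named above are standard multiplier distributions with mean 0 and variance 1. The sketch below writes out their textbook supports and draws one weight per cluster. Whether diff_diff uses exactly these normalizations, and whether it draws weights per cluster, is an assumption here; WEIGHT_SUPPORT, draw_weights, and moments are illustrative names, not library functions.

```python
import math
import random
from typing import Dict, List, Tuple

SQRT5 = math.sqrt(5.0)

# Textbook (value, probability) supports for each multiplier distribution.
WEIGHT_SUPPORT: Dict[str, List[Tuple[float, float]]] = {
    # Rademacher: +1 or -1 with equal probability.
    "rademacher": [(-1.0, 0.5), (1.0, 0.5)],
    # Mammen two-point distribution (also has third moment equal to 1).
    "mammen": [
        (-(SQRT5 - 1.0) / 2.0, (SQRT5 + 1.0) / (2.0 * SQRT5)),
        ((SQRT5 + 1.0) / 2.0, (SQRT5 - 1.0) / (2.0 * SQRT5)),
    ],
    # Webb six-point distribution: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2).
    "webb": [
        (sign * math.sqrt(k / 2.0), 1.0 / 6.0)
        for sign in (-1.0, 1.0)
        for k in (1, 2, 3)
    ],
}


def draw_weights(kind: str, n_clusters: int, rng: random.Random) -> List[float]:
    """Draw one multiplier weight per cluster from the chosen distribution."""
    values, probs = zip(*WEIGHT_SUPPORT[kind])
    return rng.choices(values, weights=probs, k=n_clusters)


def moments(kind: str) -> Tuple[float, float]:
    """Exact mean and variance implied by the support table."""
    mean = sum(v * p for v, p in WEIGHT_SUPPORT[kind])
    var = sum((v - mean) ** 2 * p for v, p in WEIGHT_SUPPORT[kind])
    return mean, var


rng = random.Random(0)  # fixed seed, analogous to the seed parameter
weights = draw_weights("webb", n_clusters=8, rng=rng)
```

The six-point Webb distribution is usually preferred when the number of clusters is small, because Rademacher weights then generate only a handful of distinct bootstrap samples.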

Return type:

None
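
The control_group and anticipation arguments in the signature above determine which observations act as comparisons. A minimal sketch of that rule, as the parameter descriptions state it, follows. The function names and the use of None to encode a never-treated cohort are assumptions made for illustration; the library's own encoding and internals may differ.

```python
from typing import Dict, List, Optional, Tuple


def is_treatment_cell(cohort: Optional[int], period: int, anticipation: int = 0) -> bool:
    """True if (cohort, period) counts as treated, including anticipation periods.

    cohort is the first treated period; None marks a never-treated unit (assumed encoding).
    """
    return cohort is not None and period >= cohort - anticipation


def in_comparison_group(
    cohort: Optional[int],
    period: int,
    control_group: str = "not_yet_treated",
    anticipation: int = 0,
) -> bool:
    """True if an observation may serve as a comparison under the chosen rule."""
    if control_group == "never_treated":
        # Only units that are never treated during the sample.
        return cohort is None
    if control_group == "not_yet_treated":
        # Every observation that is not (yet) in a treatment cell.
        return not is_treatment_cell(cohort, period, anticipation)
    raise ValueError(f"Unknown control_group: {control_group!r}")


# Toy panel: unit A is treated from period 3, B from period 5, C never.
cohorts: Dict[str, Optional[int]] = {"A": 3, "B": 5, "C": None}
periods = range(1, 6)

comparison_rows: List[Tuple[str, int]] = [
    (unit, t)
    for unit, g in cohorts.items()
    for t in periods
    if in_comparison_group(g, t, control_group="not_yet_treated")
]
```

With anticipation=1, unit B's period 4 moves out of the comparison set, which is why a positive anticipation value shrinks the not-yet-treated pool.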

classmethod __new__(*args, **kwargs)