diff_diff.EfficientDiD

class diff_diff.EfficientDiD

Bases: EfficientDiDBootstrapMixin

Efficient DiD estimator (Chen, Sant’Anna & Xie 2025).

Without covariates, the estimator achieves the semiparametric efficiency bound for ATT(g,t) through a closed-form estimator built from within-group sample means and covariances.
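For intuition on how within-group means and covariances can give a closed-form, overidentified estimate, the sketch below forms one DiD estimate of ATT(g,t) per pre-treatment baseline period and combines them with minimum-variance (GLS) weights. It is a simplified sketch under stated assumptions, not the library's implementation: `combine_baselines` is a hypothetical helper, outcomes are a balanced wide matrix, the never-treated group is the only comparison group, and the weights come from the covariance of the per-baseline estimates rather than from the paper's efficient influence function.

```python
import numpy as np

def combine_baselines(y_treat, y_ctrl, t_col, base_cols):
    """Hypothetical helper: GLS combination of per-baseline DiD estimates.

    y_treat, y_ctrl: (n_units, n_periods) wide outcome matrices for the
    treated cohort and the never-treated comparison group (assumed layout).
    t_col: column index of the post-treatment period t.
    base_cols: column indices of the pre-treatment baseline periods.
    """
    # Long differences Y_t - Y_s, one column per baseline s
    d_treat = y_treat[:, [t_col]] - y_treat[:, base_cols]
    d_ctrl = y_ctrl[:, [t_col]] - y_ctrl[:, base_cols]
    # One DiD estimate per baseline: difference in mean long differences
    theta = d_treat.mean(axis=0) - d_ctrl.mean(axis=0)
    # Covariance of those estimates from within-group sample covariances;
    # the two groups are independent samples, so their contributions add
    omega = np.atleast_2d(
        np.cov(d_treat, rowvar=False) / len(d_treat)
        + np.cov(d_ctrl, rowvar=False) / len(d_ctrl)
    )
    # Minimum-variance weights: Omega^{-1} 1 / (1' Omega^{-1} 1)
    x = np.linalg.solve(omega, np.ones(len(base_cols)))
    weights = x / x.sum()
    return float(weights @ theta), weights
```

Because the weights sum to one, shifting every treated unit's period-t outcome by a constant shifts the combined estimate by exactly that constant.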

With covariates, the estimator follows a doubly robust (DR) path: sieve-based propensity score ratios (Eqs. 4.1-4.2), sieve outcome regressions (polynomial basis with AIC/BIC order selection), sieve-estimated inverse propensities (algorithm step 4), and a kernel-smoothed conditional Omega*(X) with per-unit efficient weights (Eq. 3.12). The DR property ensures consistency if either the outcome regression or the sieve propensity ratio is correctly specified. Because every nuisance is estimated with a sieve or kernel smoother (the paper’s flexible-nuisance specification), the covariate path attains the semiparametric efficiency bound under the paper’s regularity conditions (see REGISTRY.md).

Parameters:

pt_assumption (str, default "all") – Parallel trends variant: "all" (overidentified; uses all pre-treatment periods and comparison groups) or "post" (just-identified; single baseline, equivalent to CallawaySantAnna (CS)).
alpha (float, default 0.05) – Significance level.
cluster (str or None) – Column name for cluster-robust SEs. When set, analytical SEs use the Liang-Zeger clustered sandwich estimator on the efficient influence function (EIF) values. With n_bootstrap > 0, bootstrap weights are generated at the cluster level (all units in a cluster share the same weight).
vcov_type (str, default "hc1") – Variance-estimator family. Permanently restricted to {"hc1"}, matching the influence-function-based variance of Chen, Sant’Anna & Xie (2025); the analytical-sandwich families {classical, hc2, hc2_bm} and conley are rejected at __init__ and set_params. See REGISTRY.md for the methodology rationale: there is no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite degrees of freedom could be defined. Use cluster=<col> for Liang-Zeger CR1 on the cluster-aggregated EIF, or survey_design= for Taylor series linearization on the combined IF.
control_group (str, default "never_treated") – Which units serve as the comparison group: "never_treated" requires a never-treated cohort (raises if none exist); "last_cohort" reclassifies the latest treatment cohort as pseudo-never-treated and drops periods at t >= last_g - anticipation so the pseudo-control’s pre-treatment window excludes anticipation-contaminated periods. Distinct from CallawaySantAnna’s "not_yet_treated" — see REGISTRY.md for details.
n_bootstrap (int, default 0) – Number of multiplier bootstrap iterations (0 = analytical only).
bootstrap_weights (str, default "rademacher") – Bootstrap weight distribution.
seed (int or None) – Random seed for reproducibility.
anticipation (int, default 0) – Number of anticipation periods: the effective treatment boundary for cohort g moves earlier, to g - anticipation. When combined with control_group="last_cohort", it also trims the pseudo-control period set at t >= last_g - anticipation (see REGISTRY.md).
sieve_k_max (int or None) – Maximum polynomial degree for the covariate-path sieves; the propensity-ratio, inverse-propensity, and outcome-regression fits all use it. None = auto: floor(n_pos^{1/5}), where n_pos is the size of each group’s positive-weight support (the raw group size when unweighted). This is a growing sieve with no fixed ceiling, bounded only by n_basis < n_pos; zero-weight survey rows do not affect order selection. Only used with covariates. Setting sieve_k_max=1 forces every covariate-path sieve (the outcome regression and both propensity sieves) to degree 1. This recovers the pre-sieve linear-OLS outcome regression, but because it also constrains the propensity sieves to degree 1, it does not reproduce the exact pre-sieve estimator.
sieve_criterion (str, default "bic") – Information criterion ("aic" or "bic") for the order selection of all covariate-path sieves (propensity ratio, inverse propensity, and outcome regression).
ratio_clip (float, default 20.0) – Clip sieve propensity ratios to [1/ratio_clip, ratio_clip].
kernel_bandwidth (float or None) – Bandwidth for the Gaussian kernel in the conditional Omega* estimation. None = Silverman’s rule of thumb (automatic).
omega_ridge (float, default OMEGA_RIDGE_DEFAULT (1e-6)) – Relative ridge for the Omega* inversion behind the efficient weights. Instead of inverting the numerically singular Omega* produced by PT-All’s telescoping overidentified moments, the estimator solves (Omega* + omega_ridge * max(trace/H, 0) * I) x = 1. This stabilizes per-cell ATT(g,t) against floating-point-level changes in inputs or BLAS (1-ulp stability of ~1e-9, versus ~1e-4 for the legacy pseudoinverse) without changing overall-ATT bias, RMSE, or coverage (see REGISTRY.md). omega_ridge=0 restores the exact legacy inv/pinv code path bit-for-bit, including its per-cell condition-number warnings and its slow O(n^2 H^2) conditional-Omega* loops, so expect the legacy runtime as well.
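For the just-identified pt_assumption="post" case, a minimal sketch of a single-baseline ATT(g,t) with a never-treated comparison group and no covariates looks like this. The helper `att_gt_post`, the wide outcome matrix, and the np.inf encoding for never-treated units are assumptions for illustration; the estimator's internal data handling may differ.

```python
import numpy as np

def att_gt_post(y, first_treat, periods, g, t, anticipation=0):
    """Hypothetical helper: single-baseline ATT(g, t), no covariates.

    y: (n_units, n_periods) outcomes, columns ordered as `periods`.
    first_treat: (n_units,) first treated period; np.inf marks never-treated
    units (an encoding assumed for this sketch).
    """
    periods = list(periods)
    base = periods.index(g - 1 - anticipation)  # last clean pre-period
    col = periods.index(t)
    change = y[:, col] - y[:, base]             # Y_t - Y_base for each unit
    treated = first_treat == g
    control = np.isinf(first_treat)
    # Mean change of cohort g minus mean change of never-treated units
    return float(change[treated].mean() - change[control].mean())
```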
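The cluster option computes a Liang-Zeger style standard error from per-unit influence-function values. A minimal sketch, assuming psi is scaled so that the estimate minus its target is approximately the mean of psi; the G/(G - 1) factor is the common CR1 small-sample adjustment and is an assumption here, as the library's exact normalization may differ. `clustered_if_se` is a hypothetical name.

```python
import numpy as np

def clustered_if_se(psi, clusters):
    """Hypothetical helper: CR1-style clustered SE from influence-function values."""
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    # Map each cluster label to an integer id 0..G-1
    _, idx = np.unique(clusters, return_inverse=True)
    # Sum influence-function values within each cluster
    cluster_sums = np.bincount(idx, weights=psi)
    G = cluster_sums.size
    adj = G / (G - 1)  # CR1 small-sample factor (assumed)
    return float(np.sqrt(adj * np.sum(cluster_sums ** 2)) / n)
```

Units whose influence-function values cancel inside a cluster contribute nothing, which is why clustering can shrink as well as inflate the SE.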
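With n_bootstrap > 0 and the default bootstrap_weights="rademacher", a multiplier bootstrap perturbs the influence-function values with random +1/-1 weights, drawn once per cluster when cluster is set. The sketch below assumes the same psi scaling as above; `multiplier_bootstrap_se` is a hypothetical helper, not the library's API.

```python
import numpy as np

def multiplier_bootstrap_se(psi, n_bootstrap=999, clusters=None, seed=None):
    """Hypothetical helper: Rademacher multiplier-bootstrap SE."""
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    if clusters is None:
        idx = np.arange(n)  # each unit is its own cluster
    else:
        _, idx = np.unique(clusters, return_inverse=True)
    n_draw = idx.max() + 1
    # Rademacher weights: +1 or -1 with equal probability, one per cluster
    v = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_draw))
    # Broadcast each cluster's weight to its units, then perturb the mean
    draws = (v[:, idx] * psi).sum(axis=1) / n
    return float(draws.std(ddof=1))
```

Without clustering, the bootstrap SE should be close to sqrt(sum(psi**2)) / n, since Rademacher weights have mean zero and unit variance.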
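The period bookkeeping behind anticipation and control_group="last_cohort" can be sketched as follows, assuming the semantics described above: cohort g's effective treatment boundary is g - anticipation, and under last_cohort the latest cohort is relabeled as never-treated while periods t >= last_g - anticipation are dropped. Both helper names and the np.inf never-treated encoding are hypothetical.

```python
import numpy as np

def split_periods(periods, g, anticipation=0):
    """Hypothetical helper: pre-treatment vs. effective post periods for cohort g."""
    boundary = g - anticipation  # anticipation moves the boundary earlier
    pre = [t for t in periods if t < boundary]
    post = [t for t in periods if t >= boundary]
    return pre, post

def last_cohort_setup(first_treat, periods, anticipation=0):
    """Hypothetical helper: relabel the last cohort as pseudo-never-treated."""
    first_treat = np.asarray(first_treat, dtype=float)
    # Latest finite cohort (requires at least one treated cohort)
    last_g = first_treat[np.isfinite(first_treat)].max()
    relabeled = np.where(first_treat == last_g, np.inf, first_treat)
    # Keep only periods before the pseudo-control's anticipation-adjusted boundary
    kept = [t for t in periods if t < last_g - anticipation]
    return relabeled, kept
```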
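The sieve_k_max=None rule and sieve_criterion order selection can be sketched for a single covariate as below. Assumptions: the sieve uses a total-degree polynomial basis with intercept, so d covariates at degree k give C(d + k, k) terms, and the fit is unweighted OLS with AIC = n log(RSS/n) + 2p and BIC = n log(RSS/n) + p log n. The library's basis, weighting, and criterion constants may differ; both helpers are hypothetical.

```python
import math
import numpy as np

def auto_sieve_order(n_pos, n_covariates):
    """Hypothetical helper: floor(n_pos ** (1/5)), capped so n_basis < n_pos."""
    k = max(1, math.floor(n_pos ** (1 / 5)))
    # Total-degree polynomial basis incl. intercept has C(d + k, k) terms (assumed)
    while k > 1 and math.comb(n_covariates + k, k) >= n_pos:
        k -= 1
    return k

def select_order(x, y, k_max, criterion="bic"):
    """Hypothetical helper: pick a 1-D polynomial degree in 1..k_max by AIC or BIC."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    best_k, best_ic = 1, np.inf
    for k in range(1, k_max + 1):
        X = np.vander(x, k + 1, increasing=True)  # columns 1, x, ..., x**k
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        p = k + 1
        penalty = p * np.log(n) if criterion == "bic" else 2 * p
        ic = n * np.log(rss / n) + penalty
        if ic < best_ic:
            best_k, best_ic = k, ic
    return best_k
```

Here n_pos would be the number of rows with strictly positive weight in the group, which is why zero-weight survey rows cannot change the selected order.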
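For kernel_bandwidth=None, a common form of Silverman's rule of thumb is h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5). The sketch pairs it with a Gaussian-kernel (Nadaraya-Watson) estimate of a conditional covariance matrix at a point x0, which is the kind of smoothing used for Omega*(X). Assumptions: a single scalar covariate, this particular Silverman variant, and the hypothetical helper `kernel_conditional_cov`; the library may use a multivariate kernel or a different constant.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb for a 1-D Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    sigma = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(sigma, (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-1 / 5)

def kernel_conditional_cov(x, z, x0, h):
    """Hypothetical helper: Nadaraya-Watson estimate of Cov(Z | X = x0).

    x: (n,) scalar covariate; z: (n, H) per-unit moment contributions.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    w = w / w.sum()
    centered = z - w @ z                      # subtract the local weighted mean
    return (centered * w[:, None]).T @ centered
```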
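The omega_ridge solve can be sketched directly from the formula above. Normalizing x so the weights sum to one follows the standard efficient-weighting form Omega^{-1} 1 / (1' Omega^{-1} 1) and is an assumption here; `efficient_weights` is a hypothetical helper. With omega_ridge=0 this sketch simply calls np.linalg.solve and raises on an exactly singular Omega*, unlike the library's legacy inv/pinv path.

```python
import numpy as np

def efficient_weights(omega, omega_ridge=1e-6):
    """Hypothetical helper: ridge-stabilized weights from (Omega + r I) x = 1."""
    omega = np.asarray(omega, dtype=float)
    H = omega.shape[0]
    # Relative ridge: scale by the average diagonal entry, floored at zero
    scale = omega_ridge * max(np.trace(omega) / H, 0.0)
    x = np.linalg.solve(omega + scale * np.eye(H), np.ones(H))
    return x / x.sum()  # normalize to sum to one (assumed)
```

On a well-conditioned diagonal Omega* the ridge barely moves the weights, while on a singular Omega* such as a matrix of ones the solve stays finite.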

Examples

>>> from diff_diff import EfficientDiD
>>> # data: long-format panel DataFrame with columns y, id, t, first_treat
>>> edid = EfficientDiD(pt_assumption="all")
>>> results = edid.fit(data, outcome="y", unit="id", time="t",
...                    first_treat="first_treat", aggregate="all")
>>> results.print_summary()

Methods

`__init__`([pt_assumption, alpha, cluster, ...])
`fit`(data, outcome, unit, time, first_treat)	Fit the Efficient DiD estimator.
`get_params`()	Get estimator parameters (sklearn-compatible).
`hausman_pretest`(data, outcome, unit, time, ...)	Hausman pretest for PT-All vs PT-Post (Theorem A.1).
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
`summary`()	Get summary of estimation results.
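hausman_pretest compares PT-All against PT-Post. A generic Hausman contrast takes the form H = d' (V_post - V_all)^+ d with d = theta_post - theta_all, compared against a chi-square distribution whose degrees of freedom equal the rank of the variance difference. Whether Theorem A.1 uses exactly this statistic, and over which ATT(g,t) cells, is an assumption here; `hausman_stat` and `chi2_sf_df1` are hypothetical helpers, and the closed-form p-value below covers only the one-degree-of-freedom case.

```python
import math
import numpy as np

def hausman_stat(theta_all, theta_post, v_all, v_post):
    """Hypothetical helper: generic Hausman statistic and its degrees of freedom."""
    diff = np.atleast_1d(np.asarray(theta_post, float) - np.asarray(theta_all, float))
    v_diff = np.atleast_2d(np.asarray(v_post, float) - np.asarray(v_all, float))
    # Pseudoinverse guards against a rank-deficient variance difference
    stat = float(diff @ np.linalg.pinv(v_diff) @ diff)
    df = int(np.linalg.matrix_rank(v_diff))
    return stat, df

def chi2_sf_df1(x):
    """P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))
```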

Attributes

`n_bootstrap`
`bootstrap_weights`
`alpha`
`seed`
`anticipation`

__init__(pt_assumption='all', alpha=0.05, cluster=None, vcov_type='hc1', control_group='never_treated', n_bootstrap=0, bootstrap_weights='rademacher', seed=None, anticipation=0, sieve_k_max=None, sieve_criterion='bic', ratio_clip=20.0, kernel_bandwidth=None, omega_ridge=1e-06)

Parameters:

pt_assumption (str)
alpha (float)
cluster (str | None)
vcov_type (str)
control_group (str)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
anticipation (int)
sieve_k_max (int | None)
sieve_criterion (str)
ratio_clip (float)
kernel_bandwidth (float | None)
omega_ridge (float)

classmethod __new__(*args, **kwargs)