diff_diff.EfficientDiD

class diff_diff.EfficientDiD

Bases: EfficientDiDBootstrapMixin

Efficient DiD estimator (Chen, Sant’Anna & Xie 2025).

Without covariates, achieves the semiparametric efficiency bound for ATT(g,t) using a closed-form estimator based on within-group sample means and covariances.
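To make the role of the covariances concrete, here is a minimal sketch of the generic minimum-variance combination of two estimates of the same quantity, with weights w = inv(Omega) 1 / (1' inv(Omega) 1). It is an illustration under stated assumptions, not the library's code: `min_variance_weights` is a hypothetical helper, the 2x2 covariance matrix and the two estimates are made-up inputs, and the estimator's actual closed form is the one given in the paper.

```python
def min_variance_weights(omega):
    """Weights inv(omega) @ 1 / (1' inv(omega) 1) for a 2x2 covariance matrix.

    Hypothetical helper for illustration; not part of diff_diff.
    """
    (a, b), (c, d) = omega
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # inv(omega) = [[d, -b], [-c, a]] / det, so its row sums are:
    row_sums = [(d - b) / det, (a - c) / det]
    total = sum(row_sums)
    return [r / total for r in row_sums]


# Two uncorrelated estimates with variances 1 and 4 (made-up numbers):
# the noisier estimate receives the smaller weight.
weights = min_variance_weights([[1.0, 0.0], [0.0, 4.0]])
estimates = [2.0, 3.0]
combined = sum(w * e for w, e in zip(weights, estimates))
```

With uncorrelated inputs the weights reduce to inverse-variance weights, which is why the second estimate contributes only a fifth of the combined value here.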

With covariates, uses a doubly robust (DR) path: sieve-based propensity score ratios (Eq 4.1-4.2), sieve outcome regressions (polynomial basis, AIC/BIC order selection), sieve-estimated inverse propensities (algorithm step 4), and kernel-smoothed conditional Omega*(X) with per-unit efficient weights (Eq 3.12). The DR property ensures consistency if either the outcome regression or the sieve propensity ratio is correctly specified. Because all nuisances are sieves or kernel smoothers (the paper’s flexible-nuisance specification), the covariate path attains the semiparametric efficiency bound under the paper’s regularity conditions (see REGISTRY.md).

Parameters:
  • pt_assumption (str, default "all") – Parallel trends variant: "all" (overidentified, uses all pre-treatment periods and comparison groups) or "post" (just-identified, single baseline, equivalent to the Callaway-Sant’Anna (CS) estimator).

  • alpha (float, default 0.05) – Significance level.

  • cluster (str or None) – Column name for cluster-robust SEs. When set, analytical SEs use the Liang-Zeger clustered sandwich estimator on EIF values. With n_bootstrap > 0, bootstrap weights are generated at the cluster level (all units in a cluster share the same weight).

  • vcov_type (str, default "hc1") – Variance-estimator family. Permanently restricted to {"hc1"}, following the IF-based variance of Chen, Sant’Anna & Xie (2025): the analytical-sandwich families {classical, hc2, hc2_bm} and conley are rejected at __init__ / set_params. See REGISTRY.md for the methodology rationale (there is no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite DOF can be defined). Use cluster=<col> for Liang-Zeger CR1 on cluster-aggregated EIF; use survey_design= for Taylor Series Linearization on the combined IF.

  • control_group (str, default "never_treated") – Which units serve as the comparison group. "never_treated" requires a never-treated cohort (raises an error if none exists). "last_cohort" reclassifies the latest treatment cohort as pseudo-never-treated and drops periods at t >= last_g - anticipation, so that the pseudo-control’s pre-treatment window excludes anticipation-contaminated periods. Distinct from CallawaySantAnna’s "not_yet_treated"; see REGISTRY.md for details.

  • n_bootstrap (int, default 0) – Number of multiplier bootstrap iterations (0 = analytical only).

  • bootstrap_weights (str, default "rademacher") – Bootstrap weight distribution.

  • seed (int or None) – Random seed for reproducibility.

  • anticipation (int, default 0) – Number of anticipation periods (shifts the effective treatment boundary forward by this amount). When combined with control_group="last_cohort", also trims the pseudo-control period set at t >= last_g - anticipation (see REGISTRY.md).

  • sieve_k_max (int or None) – Maximum polynomial degree for the covariate-path sieves; the propensity-ratio, inverse-propensity, and outcome-regression fits all use it. Only used with covariates. None = auto: floor(n_pos^{1/5}), where n_pos is each group’s positive-weight support (the raw group size when unweighted). This is a growing sieve with no fixed ceiling, bounded only by n_basis < n_pos; zero-weight survey rows do not affect order selection. sieve_k_max=1 forces every covariate-path sieve (outcome regression and both propensity sieves) to degree 1. This recovers the pre-sieve linear-OLS outcome regression but also constrains the propensity sieves to degree 1, so it does not reproduce the exact pre-sieve estimator.

  • sieve_criterion (str, default "bic") – Information criterion ("aic" or "bic") for the order selection of all covariate-path sieves (propensity ratio, inverse propensity, and outcome regression).

  • ratio_clip (float, default 20.0) – Clip sieve propensity ratios to [1/ratio_clip, ratio_clip].

  • kernel_bandwidth (float or None) – Bandwidth for Gaussian kernel in conditional Omega* estimation. None = Silverman’s rule-of-thumb (automatic).
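The default rules described for sieve_k_max, ratio_clip, and kernel_bandwidth can be sketched in plain Python as follows. These are illustrative stand-ins, not the library's implementation: the helper names are hypothetical, the n_basis < n_pos bound is not modeled, and the bandwidth uses one common univariate form of Silverman's rule (0.9 * min(sd, IQR/1.34) * n^(-1/5)), which may differ from the variant the estimator applies.

```python
import statistics


def auto_sieve_degree(n_pos):
    """Largest integer k with k**5 <= n_pos, i.e. floor(n_pos ** (1/5)).

    Integer arithmetic avoids floating-point error at exact fifth powers.
    """
    k = 0
    while (k + 1) ** 5 <= n_pos:
        k += 1
    return k


def clip_ratio(ratio, ratio_clip=20.0):
    """Clip a propensity ratio to the interval [1/ratio_clip, ratio_clip]."""
    return min(max(ratio, 1.0 / ratio_clip), ratio_clip)


def silverman_bandwidth(x):
    """Assumed rule-of-thumb form: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(x)
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


degrees = [auto_sieve_degree(n) for n in (31, 32, 243, 1000)]  # → [1, 2, 3, 3]
clipped = [clip_ratio(r) for r in (0.001, 1.5, 75.0)]          # → [0.05, 1.5, 20.0]
h = silverman_bandwidth([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

The degree grows very slowly with the group size: a group needs at least 1024 positive-weight units before the automatic rule reaches degree 4.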

Examples

>>> from diff_diff import EfficientDiD
>>> edid = EfficientDiD(pt_assumption="all")
>>> results = edid.fit(data, outcome="y", unit="id", time="t",
...                    first_treat="first_treat", aggregate="all")
>>> results.print_summary()
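The cluster and n_bootstrap options both operate on efficient influence function (EIF) values. The sketch below shows, in plain Python, what a clustered standard error and a cluster-level Rademacher multiplier bootstrap can look like on such values. It is a hedged illustration only: the helper names are hypothetical, the EIF values and cluster labels are made up, the G/(G-1) factor is an assumed form of the CR1 correction, and the library's own scaling and centering may differ.

```python
import math
import random
import statistics
from collections import defaultdict


def cluster_sums(eif, clusters):
    """Sum EIF values within each cluster."""
    sums = defaultdict(float)
    for value, label in zip(eif, clusters):
        sums[label] += value
    return dict(sums)


def clustered_se(eif, clusters):
    """Liang-Zeger style SE on cluster-summed EIF values, with a G/(G-1) factor."""
    n = len(eif)
    sums = cluster_sums(eif, clusters)
    g = len(sums)
    meat = sum(s * s for s in sums.values())
    return math.sqrt(g / (g - 1) * meat) / n


def cluster_multiplier_bootstrap_se(eif, clusters, n_bootstrap=999, seed=None):
    """Rademacher multiplier bootstrap drawing one +/-1 weight per cluster."""
    rng = random.Random(seed)
    n = len(eif)
    sums = cluster_sums(eif, clusters)
    draws = []
    for _ in range(n_bootstrap):
        # All units in a cluster share one weight, so only cluster sums matter.
        draws.append(sum(rng.choice((-1.0, 1.0)) * s for s in sums.values()) / n)
    return statistics.pstdev(draws)


# Made-up EIF values for eight units in four clusters.
eif = [1.0, 1.0, -1.0, -1.0, 0.5, -0.5, 2.0, -1.0]
clusters = ["a", "a", "b", "b", "c", "c", "d", "d"]
analytic = clustered_se(eif, clusters)
boot = cluster_multiplier_bootstrap_se(eif, clusters, n_bootstrap=200, seed=0)
```

Passing the same seed reproduces the same bootstrap draws, which mirrors the purpose of the estimator's seed parameter.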

Methods

__init__([pt_assumption, alpha, cluster, ...])

fit(data, outcome, unit, time, first_treat)

Fit the Efficient DiD estimator.

get_params()

Get estimator parameters (sklearn-compatible).

hausman_pretest(data, outcome, unit, time, ...)

Hausman pretest for PT-All vs PT-Post (Theorem A.1).

print_summary()

Print summary to stdout.

set_params(**params)

Set estimator parameters (sklearn-compatible).

summary()

Get summary of estimation results.

Attributes

n_bootstrap

bootstrap_weights

alpha

seed

anticipation

__init__(pt_assumption='all', alpha=0.05, cluster=None, vcov_type='hc1', control_group='never_treated', n_bootstrap=0, bootstrap_weights='rademacher', seed=None, anticipation=0, sieve_k_max=None, sieve_criterion='bic', ratio_clip=20.0, kernel_bandwidth=None)
Parameters:
  • pt_assumption (str)

  • alpha (float)

  • cluster (str | None)

  • vcov_type (str)

  • control_group (str)

  • n_bootstrap (int)

  • bootstrap_weights (str)

  • seed (int | None)

  • anticipation (int)

  • sieve_k_max (int | None)

  • sieve_criterion (str)

  • ratio_clip (float)

  • kernel_bandwidth (float | None)

classmethod __new__(*args, **kwargs)