diff_diff.EfficientDiD
- class diff_diff.EfficientDiD
Bases: EfficientDiDBootstrapMixin
Efficient DiD estimator (Chen, Sant’Anna & Xie 2025).
Without covariates, achieves the semiparametric efficiency bound for ATT(g,t) using a closed-form estimator based on within-group sample means and covariances.
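Under PT-All, ATT(g,t) is overidentified: several baseline periods and comparison groups each give a valid DiD estimate of the same quantity, and efficiency comes from weighting them by their covariance. The intuition can be sketched with the textbook minimum-variance linear combination: for K unbiased estimates with covariance matrix Omega, the weights w = Omega^{-1} 1 / (1' Omega^{-1} 1) minimize variance. This is a simplified illustration that assumes Omega is known; it is not the closed-form estimator of Chen, Sant’Anna & Xie (2025), and the helper name `efficient_combination` is hypothetical.

```python
import numpy as np

def efficient_combination(estimates, omega):
    """Minimum-variance linear combination of K estimates of one parameter.

    estimates: shape (K,) unbiased estimates of the same ATT(g,t).
    omega: shape (K, K) covariance matrix of those estimates (assumed known).
    Returns (combined estimate, its variance, weights).
    """
    estimates = np.asarray(estimates, dtype=float)
    omega = np.asarray(omega, dtype=float)
    ones = np.ones(len(estimates))
    # Solve omega @ z = 1 rather than forming the inverse explicitly.
    z = np.linalg.solve(omega, ones)
    weights = z / (ones @ z)        # weights sum to one
    combined = weights @ estimates
    variance = 1.0 / (ones @ z)     # equals weights @ omega @ weights
    return combined, variance, weights
```

For example, estimates 1.0 and 2.0 with Omega = diag(1, 3) receive weights 0.75 and 0.25, giving a combined estimate of 1.25 with variance 0.75, smaller than the variance of either input.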
With covariates, uses a doubly robust path: sieve-based propensity score ratios (Eq 4.1-4.2), sieve outcome regressions (polynomial basis, AIC/BIC order selection), sieve-estimated inverse propensities (algorithm step 4), and kernel-smoothed conditional Omega*(X) with per-unit efficient weights (Eq 3.12). The DR property ensures consistency if either the outcome regression or the sieve propensity ratio is correctly specified; because all nuisances are sieves / kernel smoothers (the paper’s flexible-nuisance specification), the covariate path attains the semiparametric efficiency bound under the paper’s regularity conditions (see REGISTRY.md).
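The order-selection step for a polynomial sieve can be sketched for a single scalar covariate: fit OLS on the basis 1, x, ..., x^k for k = 1, ..., k_max and keep the degree with the smallest information criterion. The default cap mirrors the automatic rule documented for sieve_k_max, floor(n^{1/5}) (unweighted, so n_pos = n), together with the requirement n_basis < n. The function names, the Gaussian-likelihood form of AIC/BIC, and the one-covariate basis are assumptions of this sketch; the library's multivariate basis and exact criterion may differ.

```python
import numpy as np

def integer_fifth_root(n):
    """Largest integer k with k**5 <= n (exact, avoids float rounding)."""
    k = int(round(n ** 0.2))
    while k ** 5 > n:
        k -= 1
    while (k + 1) ** 5 <= n:
        k += 1
    return k

def select_sieve_degree(x, y, k_max=None, criterion="bic"):
    """Pick a polynomial degree for E[y | x] by AIC or BIC (one covariate)."""
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    if k_max is None:
        k_max = max(1, integer_fifth_root(n))   # auto cap: floor(n^(1/5))
    penalty = np.log(n) if criterion == "bic" else 2.0
    best = None
    for k in range(1, k_max + 1):
        n_basis = k + 1                         # columns 1, x, ..., x^k
        if n_basis >= n:                        # keep n_basis < n
            break
        basis = np.vander(x, N=n_basis, increasing=True)
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        rss = float(np.sum((y - basis @ coef) ** 2))
        score = n * np.log(rss / n) + penalty * n_basis
        if best is None or score < best[0]:
            best = (score, k, coef)
    return best[1], best[2]
```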
- Parameters:
pt_assumption (str, default "all") – Parallel trends variant: "all" (overidentified; uses all pre-treatment periods and comparison groups) or "post" (just-identified; single baseline, equivalent to Callaway & Sant’Anna (CS)).
alpha (float, default 0.05) – Significance level.
cluster (str or None) – Column name for cluster-robust SEs. When set, analytical SEs use the Liang-Zeger clustered sandwich estimator on EIF values. With n_bootstrap > 0, bootstrap weights are generated at the cluster level (all units in a cluster share the same weight).
vcov_type (str, default "hc1") – Variance-estimator family. Permanently restricted to {"hc1"}, following the influence-function-based variance of Chen, Sant’Anna & Xie (2025); the analytical-sandwich families {"classical", "hc2", "hc2_bm"} and "conley" are rejected at __init__ and set_params. There is no single design matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite degrees of freedom could be defined; see REGISTRY.md for the methodology rationale. Use cluster=<col> for Liang-Zeger CR1 on the cluster-aggregated EIF, or survey_design= for Taylor Series Linearization on the combined IF.
control_group (str, default "never_treated") – Which units serve as the comparison group: "never_treated" requires a never-treated cohort (raises if none exists); "last_cohort" reclassifies the latest treatment cohort as pseudo-never-treated and drops periods at t >= last_g - anticipation, so the pseudo-control's pre-treatment window excludes anticipation-contaminated periods. This is distinct from CallawaySantAnna's "not_yet_treated"; see REGISTRY.md for details.
n_bootstrap (int, default 0) – Number of multiplier bootstrap iterations (0 = analytical SEs only).
bootstrap_weights (str, default "rademacher") – Bootstrap weight distribution.
seed (int or None) – Random seed for reproducibility.
anticipation (int, default 0) – Number of anticipation periods (shifts the effective treatment boundary forward by this amount). When combined with control_group="last_cohort", also drops periods at t >= last_g - anticipation from the pseudo-control period set (see REGISTRY.md).
sieve_k_max (int or None) – Maximum polynomial degree for the covariate-path sieves; the propensity-ratio, inverse-propensity, and outcome-regression fits all use it. None = automatic: floor(n_pos^{1/5}), where n_pos is the size of each group's positive-weight support (the raw group size when unweighted). This is a growing sieve with no fixed ceiling, bounded only by n_basis < n_pos; zero-weight survey rows do not affect order selection. Only used with covariates. Setting sieve_k_max=1 forces every covariate-path sieve (the outcome regression and both propensity sieves) to degree 1: this recovers the pre-sieve linear-OLS outcome regression but also constrains the propensity sieves to degree 1, so it does not reproduce the exact pre-sieve estimator.
sieve_criterion (str, default "bic") – Information criterion ("aic" or "bic") used for order selection in all covariate-path sieves (propensity ratio, inverse propensity, and outcome regression).
ratio_clip (float, default 20.0) – Clip sieve propensity ratios to [1/ratio_clip, ratio_clip].
kernel_bandwidth (float or None) – Bandwidth of the Gaussian kernel in the conditional Omega* estimation. None = Silverman's rule of thumb (automatic).
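The just-identified pt_assumption="post" variant uses a single baseline. Assuming, as in Callaway & Sant’Anna, that the baseline is period g - 1 - anticipation and that comparisons are against never-treated units, ATT(g,t) reduces to a 2x2 difference in mean outcome changes. The sketch below assumes a balanced long panel with the column names from the Examples section and never-treated units coded as first_treat == 0; the helper name is hypothetical and the library's handling of pre-treatment cells may differ.

```python
import pandas as pd

def att_gt_single_baseline(data, g, t, anticipation=0, outcome="y", unit="id",
                           time="t", first_treat="first_treat", never_treated=0):
    """2x2 DiD for ATT(g, t) with one baseline period and never-treated controls."""
    base = g - 1 - anticipation                    # assumed single baseline period
    wide = data.pivot(index=unit, columns=time, values=outcome)
    cohort = data.groupby(unit)[first_treat].first()
    change = wide[t] - wide[base]                  # per-unit outcome change
    treated_change = change[cohort == g].mean()
    control_change = change[cohort == never_treated].mean()
    return float(treated_change - control_change)
```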
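The cluster and n_bootstrap options both operate on per-unit efficient influence function (EIF) values. A minimal sketch of the two ideas follows, assuming one EIF value psi_i per unit for a single ATT, scaled so that the estimator's sampling error is approximately the sample mean of psi: a Liang-Zeger CR1 standard error built from cluster sums with a G/(G-1) small-sample factor, and a multiplier bootstrap that draws one Rademacher weight per cluster so all units in a cluster share it. The function names and this scaling are assumptions; the library's own normalization may differ.

```python
import numpy as np

def cluster_sums(psi, clusters):
    """Sum influence-function values within each cluster."""
    labels, inverse = np.unique(np.asarray(clusters), return_inverse=True)
    return np.bincount(inverse.ravel(), weights=psi, minlength=len(labels))

def cr1_se(psi, clusters):
    """Liang-Zeger CR1 standard error of an estimator whose error is mean(psi)."""
    psi = np.asarray(psi, dtype=float)
    n = len(psi)
    # EIF values have mean zero in theory; center them in-sample.
    sums = cluster_sums(psi - psi.mean(), clusters)
    g = len(sums)
    variance = (g / (g - 1)) * np.sum(sums ** 2) / n ** 2   # CR1 factor G/(G-1)
    return float(np.sqrt(variance))

def cluster_multiplier_bootstrap_se(psi, clusters, n_bootstrap=999, seed=None):
    """Multiplier-bootstrap SE with one Rademacher weight per cluster."""
    psi = np.asarray(psi, dtype=float)
    n = len(psi)
    sums = cluster_sums(psi - psi.mean(), clusters)
    rng = np.random.default_rng(seed)
    # Every unit in a cluster shares its cluster's +1/-1 weight, so the
    # perturbed sum over units equals the weighted sum of cluster totals.
    weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, len(sums)))
    draws = weights @ sums / n
    return float(draws.std(ddof=1))
```

With singleton clusters (each unit its own cluster), the CR1 formula reduces to the familiar s / sqrt(n) standard error of a mean.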
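The control_group="last_cohort" rule described above can be sketched on a long panel with pandas: units in the latest treatment cohort are relabeled as never-treated, and every period with t >= last_g - anticipation is dropped. The column names follow the Examples section; coding never-treated units as first_treat == 0 and the helper name are assumptions, and the sketch covers only the relabeling and trimming, not any other bookkeeping the library performs.

```python
import pandas as pd

NEVER_TREATED = 0  # assumption: how never-treated units are coded in first_treat

def apply_last_cohort_control(data, time="t", first_treat="first_treat", anticipation=0):
    """Relabel the latest cohort as pseudo-never-treated and trim late periods."""
    treated = data.loc[data[first_treat] != NEVER_TREATED, first_treat]
    last_g = treated.max()
    out = data.copy()
    # The latest cohort becomes the comparison group.
    out.loc[out[first_treat] == last_g, first_treat] = NEVER_TREATED
    # Drop periods in which the pseudo-control may already respond to treatment.
    out = out[out[time] < last_g - anticipation]
    return out.reset_index(drop=True), last_g
```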
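When kernel_bandwidth is None, a Silverman-style rule of thumb picks the bandwidth. The sketch below uses the common form h = 0.9 * min(sd, IQR/1.34) * n^(-1/5) for a scalar covariate and then forms a Gaussian-kernel-weighted covariance of per-unit vectors at a point x0, which is the basic shape of a kernel-smoothed conditional covariance such as Omega*(X). The exact Silverman variant, the handling of several covariates, and which vectors the library smooths are assumptions here; the function names are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb for a 1-D Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(sd, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def kernel_conditional_cov(x, v, x0, bandwidth=None):
    """Gaussian-kernel-weighted covariance of the rows of v, localized at x = x0."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)          # shape (n, K): one K-vector per unit
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    w = w / w.sum()
    mean = w @ v                            # local weighted mean, shape (K,)
    centered = v - mean
    return (centered * w[:, None]).T @ centered   # shape (K, K)
```

With a very large bandwidth the weights become uniform and the result approaches the ordinary (divide-by-n) sample covariance; smaller bandwidths localize the estimate around x0.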
Examples
>>> from diff_diff import EfficientDiD
>>> # data: long-format panel DataFrame with columns y, id, t and first_treat
>>> edid = EfficientDiD(pt_assumption="all")
>>> results = edid.fit(data, outcome="y", unit="id", time="t",
...                    first_treat="first_treat", aggregate="all")
>>> results.print_summary()
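The example assumes a DataFrame named data in long format, one row per unit and period. A hypothetical simulated panel with the same column names can be built as below. Coding never-treated units as first_treat = 0, the cohort dates, and the effect size are assumptions chosen for illustration; the never-treated convention in particular should be confirmed against the library's own before fitting.

```python
import numpy as np
import pandas as pd

def simulate_staggered_panel(n_units=300, n_periods=8, effect=2.0, seed=0):
    """Simulated balanced panel with staggered adoption (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Cohorts: 0 = never treated (assumed coding), else first treated period.
    cohorts = rng.choice([0, 4, 6], size=n_units)
    unit_fe = rng.normal(size=n_units)
    rows = []
    for i in range(n_units):
        for t in range(1, n_periods + 1):
            treated = cohorts[i] != 0 and t >= cohorts[i]
            y = unit_fe[i] + 0.5 * t + effect * treated + rng.normal(scale=0.5)
            rows.append({"id": i, "t": t, "y": y, "first_treat": int(cohorts[i])})
    return pd.DataFrame(rows)

data = simulate_staggered_panel()
```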
Methods
__init__([pt_assumption, alpha, cluster, ...])
fit(data, outcome, unit, time, first_treat) – Fit the Efficient DiD estimator.
get_params() – Get estimator parameters (sklearn-compatible).
hausman_pretest(data, outcome, unit, time, ...) – Hausman pretest for PT-All vs PT-Post (Theorem A.1).
print_summary() – Print summary to stdout.
set_params(**params) – Set estimator parameters (sklearn-compatible).
summary() – Get summary of estimation results.
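hausman_pretest compares PT-All against PT-Post. The generic Hausman logic can be sketched as follows, assuming vectors of ATT estimates under each assumption and their covariance matrices, with PT-All efficient under the null so that Var(theta_post - theta_all) = V_post - V_all. A pseudo-inverse guards against a singular difference matrix, and H is compared with a chi-square distribution whose degrees of freedom equal the rank of that difference (for instance via scipy.stats.chi2.sf(h, df)). This is the textbook statistic with a hypothetical function name, not necessarily the exact construction of Theorem A.1.

```python
import numpy as np

def hausman_statistic(theta_post, theta_all, v_post, v_all):
    """H = d' (V_post - V_all)^+ d with d = theta_post - theta_all."""
    d = np.asarray(theta_post, dtype=float) - np.asarray(theta_all, dtype=float)
    v_diff = np.asarray(v_post, dtype=float) - np.asarray(v_all, dtype=float)
    v_pinv = np.linalg.pinv(v_diff)          # robust to a singular difference
    h = float(d @ v_pinv @ d)
    df = int(np.linalg.matrix_rank(v_diff))  # degrees of freedom for chi-square
    return h, df
```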
Attributes
n_bootstrap
bootstrap_weights
alpha
seed
anticipation
- __init__(pt_assumption='all', alpha=0.05, cluster=None, vcov_type='hc1', control_group='never_treated', n_bootstrap=0, bootstrap_weights='rademacher', seed=None, anticipation=0, sieve_k_max=None, sieve_criterion='bic', ratio_clip=20.0, kernel_bandwidth=None)
- classmethod __new__(*args, **kwargs)