Heterogeneous Adoption Difference-in-Differences

Estimator for designs where no unit remains untreated at the post period. Every unit g is exposed to treatment at the same single date but adoption intensity (dose) varies across units; there is no genuinely untreated control group to anchor a standard DiD contrast.

This module implements the methodology from de Chaisemartin, Ciccia, D’Haultfœuille & Knau (2026), “Difference-in-Differences Estimators When No Unit Remains Untreated” (arXiv:2405.04465v6), which:

  1. Targets WAS or WAS_{d̲} depending on design path: Design 1’ (the QUG / Quasi-Untreated-Group case with d̲ = 0) identifies the Weighted Average Slope (WAS, paper Equation 2); Design 1 (no QUG, d̲ > 0) identifies WAS_{d̲} under Assumption 6, or only its sign under Assumption 5 (neither additional assumption is testable via pre-trends). The shipped result classes expose target_parameter == "WAS" versus "WAS_d_lower" so callers can key on the resolved estimand.

  2. Estimates the target via local-linear regression at the dose support boundary, with three concrete fit paths: continuous_at_zero for Design 1’, and continuous_near_d_lower or mass_point for Design 1 (auto-detected from the dose distribution).

  3. Provides bias-corrected confidence intervals ported from the nprobust machinery for the continuous-dose paths, and a structural-residual 2SLS sandwich for the mass-point path.

  4. Extends to multi-period event-study settings (paper Appendix B.2), restricting staggered-timing panels to the last-treatment cohort plus never-treated units (which serve as comparisons), with pointwise per-horizon CIs.

Note

When to use HAD. Use HeterogeneousAdoptionDiD when your panel has no untreated unit at the post period (e.g. universal-rollout policies, industry-wide tariff changes) but treatment intensity varies across units. For panels with a never-treated control group and continuous treatment, use ContinuousDiD instead. For binary reversible treatments, use ChaisemartinDHaultfoeuille.

Note

Inference contract. Per-horizon CIs are always pointwise. There are three SE regimes selected by call site:

  • Unweighted - continuous paths use the CCT-2014 weighted-robust SE from the in-house lprobust port; the mass-point path uses a structural-residual 2SLS sandwich. No cross-horizon covariance.

  • weights=np.ndarray shortcut (deprecated) - continuous paths reuse the CCT-2014 SE; the mass-point path uses an analytical weighted 2SLS sandwich (classical / hc1, or CR1 when cluster= is supplied; hc2 / hc2_bm raise NotImplementedError pending a 2SLS-specific leverage derivation). The combination cluster= + aggregate="event_study" + cband=True is rejected outright, regardless of vcov_type, per the cluster-combination deviation below. Yields variance_formula="pweight" (continuous) / "pweight_2sls" (mass-point).

  • survey_design=SurveyDesign(weights="col", …) (canonical; accepts strata / PSU / FPC) - both paths compose Binder (1983) Taylor-series linearization with df_survey threaded into safe_inference. Yields variance_formula="survey_binder_tsl" (continuous) / "survey_binder_tsl_2sls" (mass-point).

The two weighted paths currently produce different SE families on this estimator (CCT-2014 / 2SLS pweight sandwich vs. Binder-TSL). The deprecated weights= and survey= aliases will be removed in the next minor release, at which point all weighted fits will be unified onto a single SE contract under survey_design=. (Tracked in TODO.md; the deprecation warning emitted by HeterogeneousAdoptionDiD.fit spells out the migration for each call site.)

On the array-in HAD pretest helpers (stute_test, yatchew_hr_test, stute_joint_pretest), the pweight-only shortcut is survey_design=make_pweight_design(weights); data-in surfaces use survey_design=SurveyDesign(weights="col_name", ...) against data instead. qug_test is the exception: the QUG step has no survey-aware migration target (Phase 4.5 C0 decision; see methodology REGISTRY), so it permanently raises NotImplementedError on any of survey_design= / survey= / weights=. The composite workflow did_had_pretest_workflow handles this by skipping QUG under survey/weighted dispatch and emitting a UserWarning.

A simultaneous confidence band (sup-t) is available only on the weighted event-study path via cband=True. Joint cross-horizon analytical covariance is not computed in this release; tracked in TODO.md.

Mass-point vcov_type="classical" deviation. The mass-point survey_design=SurveyDesign(...) paths (static and event-study) and the deprecated weights= + aggregate="event_study" + cband=True path reject vcov_type="classical" with NotImplementedError. The per-unit 2SLS influence function returned by the mass-point fit is HC1-scaled so that compute_survey_if_variance and the sup-t bootstrap target V_HC1 consistently; mixing it with a classical analytical SE would silently report a V_HC1-targeted variance under a classical label. Use vcov_type="hc1" or set robust=True explicitly (the constructor default robust=False maps to vcov_type="classical", which triggers the guard); a classical-aligned IF derivation is queued for a follow-up PR.

Mass-point cluster-combination deviation. On design="mass_point", two clustered weighted paths are rejected outright regardless of vcov_type:

  • survey_design=SurveyDesign(...) + cluster= (static and event-study): the survey path composes Binder-TSL variance, which would silently override the CR1 cluster-robust sandwich. Workarounds: cluster= alone (unweighted CR1), or weights= + cluster= (weighted-CR1 pweight sandwich), or survey_design= alone (Binder-TSL). Combined cluster-robust + survey inference is queued for a follow-up PR.

  • Deprecated weights= shortcut + cluster= + aggregate="event_study" + cband=True: the sup-t bootstrap normalizes HC1-scale perturbations by the CR1 analytical SE, mixing variance families. Workarounds: pass cband=False (keeps weighted-CR1 per-horizon), or drop cluster= (keeps weighted-HC1 sup-t).

Tip

For an end-to-end walkthrough of the survey-aware HAD workflow on a BRFSS-shape stratified household-survey panel - including the now-supported SurveyDesign(strata=...) path through the Stute pretest family (lifted in PR #432, 2026-05) - see Tutorial 22: Survey-Weighted HAD.

HeterogeneousAdoptionDiD

class diff_diff.HeterogeneousAdoptionDiD

Bases: object

Heterogeneous Adoption Difference-in-Differences estimator.

Implements the de Chaisemartin, Ciccia, D’Haultfoeuille, and Knau (2026) Weighted-Average-Slope (WAS) estimator with three design-dispatch paths: Design 1’ (continuous-at-zero), Design 1 continuous-near-d_lower, and Design 1 mass-point (2SLS sample-average per paper Section 3.2.4). Two aggregation modes:

  • aggregate="overall" (Phase 2a, default) returns a single-period HeterogeneousAdoptionDiDResults on a two-period panel.

  • aggregate="event_study" (Phase 2b, paper Appendix B.2) returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates on a multi-period panel, using a uniform F-1 anchor and pointwise CIs per horizon. When first_treat_col is supplied, staggered-timing panels auto-filter to the last-treatment cohort plus never-treated units (paper Appendix B.2 prescription).

Parameters:
  • design ({"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"}) –

    Design-dispatch strategy. Defaults to "auto" which resolves via the REGISTRY auto-detect rule on the fitted dose data (see _detect_design()).

    Explicit overrides are checked against the paper’s regime-partition contract (Section 3.2) at fit time:

    • "continuous_at_zero" (Design 1’): paper requires the support infimum d_lower = 0. Phase 1c’s _validate_had_inputs rejects mass-point samples passed to this path.

    • "continuous_near_d_lower" (Design 1, continuous density near d_lower): requires d_lower > 0 and a non-mass-point sample (modal fraction at d.min() must be <= 2%). d_lower must equal float(d.min()) within float tolerance; non-support-infimum thresholds are off-support and raise.

    • "mass_point" (Design 1 mass-point): requires d_lower > 0 AND a mass-point sample (modal fraction at d.min() must be > 2%). d_lower must equal float(d.min()) within float tolerance. Forcing this design on a d_lower = 0 sample or on a continuous (non-mass-point) sample raises; in either case 2SLS identifies a different estimand than the paper’s Design 1 mass-point WAS.

    Mismatched overrides raise ValueError pointing at the correct design rather than silently identifying a different estimand.

  • d_lower (float or None) – Support infimum d_lower. None means use 0.0 on the Design 1’ path and float(d.min()) on the other two paths. On Design 1 paths (continuous_near_d_lower and mass_point), an explicit d_lower must equal float(d.min()) within float tolerance AND must be strictly positive; zero-valued or mismatched thresholds raise.

  • kernel ({"epanechnikov", "triangular", "uniform"}) – Forwarded to bias_corrected_local_linear() on the continuous paths. Ignored on the mass-point path.

  • alpha (float) – Significance level; CIs have nominal coverage 1 - alpha (0.05 gives a 95% CI).

  • vcov_type ({"classical", "hc1"} or None) – Mass-point-path only. When None, the effective family falls back to the robust flag: robust=True -> "hc1", robust=False -> "classical" (the default construction). Explicit "hc2" and "hc2_bm" raise NotImplementedError pending a 2SLS-specific leverage derivation. Ignored on the continuous paths (which use the CCT-2014 robust SE from Phase 1c); passing a non-default vcov_type on a continuous path emits a UserWarning per fit call.

  • robust (bool) – Backward-compat alias used only when vcov_type is None: True -> "hc1", False -> "classical". Explicit vcov_type takes precedence (e.g., vcov_type="classical", robust=True runs classical). Only the mass-point path consumes these; continuous paths ignore both with a warning.

  • cluster (str or None) – Column name for cluster-robust SE on the mass-point path (CR1). Ignored with a UserWarning on the continuous paths in Phase 2a (nonparametric cluster support exists in Phase 1c but is exposed separately via bias_corrected_local_linear; the estimator-level knob is queued for a follow-up PR).
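The explicit-override rules for design partition the fitted doses by two quantities: whether d.min() is zero, and whether the share of units sitting exactly at d.min() exceeds 2%. The sketch below encodes only that documented partition. classify_design and MASS_POINT_THRESHOLD are hypothetical names, not library API; the rule the library actually applies for design="auto" lives in _detect_design() and may treat edge cases differently. In particular, a sample with d.min() = 0 and a mass point at zero fits none of the three paths under the contract above, so this sketch simply refuses it.

```python
import numpy as np

# Cutoff quoted in the design docs: a "mass-point sample" has more than
# 2% of units sitting exactly at d.min(). (Illustrative constant.)
MASS_POINT_THRESHOLD = 0.02


def classify_design(dose_post: np.ndarray) -> str:
    """Hypothetical helper mirroring the documented regime partition.

    Not the library's _detect_design(); a sketch of the contract only.
    """
    d = np.asarray(dose_post, dtype=float)
    d_min = float(d.min())
    # Share of units at the support infimum (within float tolerance).
    modal_fraction = float(np.mean(np.isclose(d, d_min)))
    is_mass_point = modal_fraction > MASS_POINT_THRESHOLD
    if np.isclose(d_min, 0.0):
        if is_mass_point:
            # continuous_at_zero rejects mass-point samples and mass_point
            # rejects d_lower = 0, so this sketch refuses the cell.
            raise ValueError("d_lower = 0 with a mass point at zero")
        return "continuous_at_zero"
    return "mass_point" if is_mass_point else "continuous_near_d_lower"
```

On the two-period example further down (one unit at exactly zero, the rest uniform on [0, 1]), the modal fraction is 1/500, so this sketch also lands on "continuous_at_zero".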

Notes

Non-testable assumptions (paper Section 3.1.2). Point identification of WAS_{d_lower} on the Design 1 family (continuous_near_d_lower and mass_point) requires Assumption 6 in addition to parallel trends; sign identification requires Assumption 5. Neither is testable via pre-trends:

  • Assumption 5 (sign identification): the boundary slope-ratio lim_{d ↓ d_lower} E(TE_2 | D_2 <= d) / WAS < E(D_2) / d_lower relates the conditional expectation near the boundary to the overall WAS; it cannot be inferred from pre-period outcome trajectories alone.

  • Assumption 6 (point identification): the counterfactual-mean alignment lim_{d ↓ d_lower} E[Y_2(d_lower) - Y_2(0) | D_2 <= d] = E[Y_2(d_lower) - Y_2(0)] is a statement about an unobserved counterfactual at the support infimum.

The fit() method emits a UserWarning whenever resolved_design is on the Design 1 family (continuous_near_d_lower or mass_point) so users are not silently led to interpret point estimates as full point identification. The available pre-tests verify ADJACENT identifying conditions:

  • diff_diff.qug_test(): Theorem 4 / Design 1’ support-infimum null d_lower = 0 (adjacent evidence on the d_lower = 0 clause of Assumption 4 only, NOT a test of the full Assumption 4 statement which also covers boundary-density positivity, conditional-mean smoothness, conditional-variance regularity, and bandwidth conditions).

  • diff_diff.stute_test() / diff_diff.yatchew_hr_test(): Assumption 8 linearity of E[ΔY | D_2] in D_2 (residuals from dy ~ 1 + d).

  • diff_diff.joint_pretrends_test(): Assumption 7 mean-independence pre-trends across multi-period placebos (intercept-only residual form via null_form="mean_independence"; the raw stute_test / yatchew_hr_test helpers do NOT cover Assumption 7 on their own).

None of these test Assumptions 5 or 6 directly. The Assumption 5/6 non-testability caveat is surfaced by the Design 1 fit-time UserWarning and by T21 (HAD pretest workflow tutorial) prose, NOT by the composite workflow verdict string (which only flags the Assumption 7 step-2 gap on the two-period aggregate="overall" path).

Diagnostics coverage. HeterogeneousAdoptionDiDResults.bandwidth_diagnostics and .bias_corrected_fit are populated only on the continuous paths; both are None on the mass-point path (which is parametric and has no bandwidth). Conversely, .n_mass_point and .n_above_d_lower are populated only on the mass-point path.

Clone idempotence. self.design stores the RAW user input (e.g., "auto"); the resolved mode is stored on the result object at fit time. This mirrors Phase 1a’s _vcov_type_arg pattern and keeps get_params() / sklearn.clone() round-trips exact.

Examples

Construct a two-period HAD panel by hand. Phase 2a requires exactly two periods with D_{g,1} = 0 for every unit.

>>> import numpy as np
>>> import pandas as pd
>>> from diff_diff import HeterogeneousAdoptionDiD
>>> rng = np.random.default_rng(42)
>>> G = 500
>>> dose_post = rng.uniform(0.0, 1.0, G)
>>> dose_post[0] = 0.0  # at least one zero-dose unit for Design 1'
>>> delta_y = 0.3 * dose_post + 0.1 * rng.standard_normal(G)
>>> data = pd.DataFrame({
...     "unit": np.repeat(np.arange(G), 2),
...     "period": np.tile([1, 2], G),
...     "dose": np.column_stack([np.zeros(G), dose_post]).ravel(),
...     "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(),
... })
>>> est = HeterogeneousAdoptionDiD(design="auto")
>>> result = est.fit(
...     data, outcome_col="outcome", dose_col="dose",
...     time_col="period", unit_col="unit",
... )
>>> result.design
'continuous_at_zero'

__init__(design='auto', d_lower=None, kernel='epanechnikov', alpha=0.05, vcov_type=None, robust=False, cluster=None, n_bootstrap=999, seed=None)
Parameters:
Return type:

None

get_params(deep=True)

Return the raw constructor parameters (sklearn-compatible).

Matches the sklearn.base.BaseEstimator.get_params() signature. Preserves the user’s original inputs - in particular, design returns "auto" when the user set it to "auto" (even after fit), so sklearn.base.clone(est) round-trips exactly.

Parameters:

deep (bool, default=True) – Accepted for sklearn-contract compatibility. This estimator has no nested sub-estimator parameters, so deep=False and deep=True return the same dict.

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters and return self (sklearn-compatible).

Only keys returned by get_params() are accepted. Passing any other attribute name (including method names like fit) raises ValueError so the estimator cannot be silently corrupted by a mistyped or attacker-supplied key.

Mutation is ATOMIC: validation runs on a proposed merged parameter dict before any attribute is overwritten. A failing call (invalid key, or an otherwise valid key whose value violates the constructor constraints) leaves self unchanged and safe to reuse.

Parameters:

params (Any)

Return type:

HeterogeneousAdoptionDiD
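The atomic-mutation contract of set_params() comes down to one ordering rule: build the merged parameter dict, validate it as a whole, and only then assign attributes. The sketch below shows that pattern on a hypothetical ToyEstimator with just two of the constructor parameters (kernel, alpha); the validation rules are illustrative assumptions, not the estimator's actual checks.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in illustrating an atomic set_params contract."""

    _KERNELS = {"epanechnikov", "triangular", "uniform"}

    def __init__(self, kernel: str = "epanechnikov", alpha: float = 0.05) -> None:
        self._validate({"kernel": kernel, "alpha": alpha})
        self.kernel = kernel
        self.alpha = alpha

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # No nested sub-estimators, so `deep` does not change the result.
        return {"kernel": self.kernel, "alpha": self.alpha}

    @classmethod
    def _validate(cls, params: Dict[str, Any]) -> None:
        if params["kernel"] not in cls._KERNELS:
            raise ValueError(f"unknown kernel {params['kernel']!r}")
        if not 0.0 < params["alpha"] < 1.0:
            raise ValueError("alpha must lie in (0, 1)")

    def set_params(self, **params: Any) -> "ToyEstimator":
        current = self.get_params()
        unknown = sorted(set(params) - set(current))
        if unknown:
            # Rejects typos and method names such as "fit".
            raise ValueError(f"invalid parameter(s): {unknown}")
        proposed = {**current, **params}
        # Validate the merged dict BEFORE touching any attribute.
        self._validate(proposed)
        for key, value in proposed.items():
            setattr(self, key, value)
        return self
```

A call such as set_params(alpha=0.2, kernel="gaussian") fails inside _validate, so alpha keeps its previous value even though 0.2 on its own would have been valid.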

fit(data, outcome_col, dose_col, time_col, unit_col, first_treat_col=None, aggregate='overall', survey=None, weights=None, cband=True, *, survey_design=None, trends_lin=False)

Fit the HAD estimator.

aggregate="overall" (default) fits on a two-period panel and returns a HeterogeneousAdoptionDiDResults with the single-period WAS estimate. aggregate="event_study" fits on a multi-period panel (T > 2) and returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates using a uniform F-1 anchor (paper Appendix B.2).

Both the overall and event-study paths are panel-only: the paper (Section 2) defines HAD on panel or repeated-cross-section data, but this implementation requires a balanced panel with a unit identifier so that unit-level first differences ΔY_{g,t} = Y_{g,t} - Y_{g,t_anchor} can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator. Repeated-cross-section support is queued for a follow-up PR (tracked in TODO.md); it requires a separate identification path based on pre/post cell means rather than unit-level differences.
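Why repeated cross-sections fail the panel requirement can be seen by reshaping to one row per unit: unit-level differences ΔY_{g,t} = Y_{g,t} - Y_{g,t_anchor} need every unit observed in every period. The hypothetical helper below (unit_first_differences is not a library function) is a pandas sketch of that balanced-panel check and the resulting differences. Disjoint unit IDs across periods leave missing cells after the pivot and are rejected.

```python
import pandas as pd


def unit_first_differences(data: pd.DataFrame, outcome_col: str,
                           time_col: str, unit_col: str, anchor) -> pd.DataFrame:
    """Hypothetical helper: per-unit Y_{g,t} - Y_{g,anchor} on a balanced panel."""
    if data.duplicated([unit_col, time_col]).any():
        raise ValueError("duplicate (unit, period) rows")
    # One row per unit, one column per period.
    wide = data.pivot(index=unit_col, columns=time_col, values=outcome_col)
    if wide.isna().to_numpy().any():
        # Repeated cross-sections (disjoint unit IDs across periods) land here.
        raise ValueError("unbalanced panel: some units are missing periods")
    # Subtract the anchor column row by row, then drop the all-zero anchor.
    return wide.sub(wide[anchor], axis=0).drop(columns=anchor)
```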

Parameters:
  • data (pd.DataFrame)

  • outcome_col (str) – Outcome column name.

  • dose_col (str) – Dose (treatment-intensity) column name.

  • time_col (str) – Time-period column name.

  • unit_col (str) – Unit-identifier column name.

  • first_treat_col (str or None) –

    Optional first-treatment column (the period at which each unit first receives treatment; 0 for never-treated). For common-adoption panels the column is optional; when omitted, the event-study path infers the first-treatment period F from the dose invariant. Staggered-timing contract (HAD Appendix B.2):

    • first_treat_col supplied + multiple cohorts detected: auto-filter to the last-treatment cohort + never-treated units with a UserWarning naming kept / dropped counts.

    • first_treat_col omitted + multiple distinct first-positive-dose cohorts inferred from the dose path: the estimator FAIL-CLOSES with ValueError directing the user to either pass first_treat_col (activates the auto-filter) or use ChaisemartinDHaultfoeuille (did_multiplegt_dyn) for full staggered support. See REGISTRY § “Library extension: Staggered-timing fail-closed” for the rationale on raising vs. warning.

  • aggregate ({"overall", "event_study"}) – "overall" (default): returns a single-period HeterogeneousAdoptionDiDResults (Phase 2a). Requires exactly two time periods. "event_study" (Phase 2b): returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates on the multi-period panel (paper Appendix B.2). Requires more than two time periods. Pointwise CIs per horizon; joint cross-horizon covariance is deferred to a follow-up PR. Staggered-timing panels: see the first_treat_col contract above (auto-filter to last cohort + never-treated with UserWarning when supplied; fail-closed ValueError when omitted on a staggered panel).

  • survey_design (SurveyDesign or None, keyword-only) – Survey design (sampling weights + optional strata / PSU / FPC) for design-based inference. Supported on ALL design × aggregate combinations after Phase 4.5 B: the continuous paths (continuous_at_zero, continuous_near_d_lower) and the mass_point design, each on both aggregate="overall" and aggregate="event_study". Continuous paths compose the SE via compute_survey_if_variance() (Binder 1983 TSL); weights propagate pointwise into the lprobust kernel. The mass-point path builds the per-unit 2SLS IF on the HC1 scale and aggregates it with Binder TSL, so it requires vcov_type='hc1' or robust=True (the classical default raises NotImplementedError on the survey path). Event-study fits with cband=True add a multiplier-bootstrap simultaneous confidence band. Only weight_type="pweight" is supported (aweight / fweight raise NotImplementedError). Survey design columns (strata / PSU / FPC) must be constant within unit (sampling-unit-level assignment); within-unit variation raises ValueError. Replicate-weight designs raise NotImplementedError. Mutually exclusive with the deprecated survey= and weights= aliases. See docs/methodology/REGISTRY.md § HeterogeneousAdoptionDiD, “Note (HAD survey-design API consolidation)”, for the full dispatch matrix.

  • survey (SurveyDesign or None) – DEPRECATED alias of survey_design=. Remains positional-or-keyword for one minor cycle to preserve pre-PR call shapes; will be removed in the next minor release. Prefer survey_design=.

  • weights (np.ndarray or None) – DEPRECATED alias for the per-row pweight shortcut. Remains positional-or-keyword for one minor cycle. Prefer adding the weights as a column on data and passing survey_design=SurveyDesign(weights='col_name') instead. Will be removed in the next minor release. Currently preserved as the analytical-HC1-sandwich shortcut (continuous: CCT-2014 weighted-robust; mass-point: pweight 2SLS sandwich) with the per-row → per-unit aggregation invariant intact. Mutually exclusive with survey_design= and survey=.

  • cband (bool, default True) – Phase 4.5 B: controls the multiplier-bootstrap simultaneous confidence band on the weighted event-study path. When True (default) and aggregate="event_study" AND any of survey_design= / survey= / weights= is supplied, the fit populates cband_low / cband_high / cband_crit_value / cband_method / cband_n_bootstrap on the result. When False those fields stay None. No effect on aggregate="overall" or on unweighted event-study fits. n_bootstrap and seed (constructor params) control replicate count and RNG; defaults are 999 / None.

  • trends_lin (bool, default False, keyword-only) – When True, applies paper Eq 17 linear-trend detrending to per-event-time outcome evolutions. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE). Per-group slope is estimated as Y[g, F-1] - Y[g, F-2]; each event-time e evolution is replaced by dy_dict[e] - (e+1) × slope (uniform formula that absorbs both effect-side detrending and placebo-side anchor swap). Requires aggregate="event_study" AND F >= 3 (panel must include both F-1 and F-2); raises NotImplementedError on aggregate="overall" and ValueError on F < 3. The “consumed” placebo at event time e=-2 is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (survey_design / survey / weights); raises NotImplementedError if combined. Default False preserves bit-exact backcompat with all pre-PR fits.
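The trends_lin formula can be checked on a single unit's outcome path. The sketch below (detrend_evolutions is a hypothetical helper; its dict output plays the role of the documented dy_dict) computes slope = Y[F-1] - Y[F-2], forms each evolution relative to F-1, and subtracts (e+1) × slope. It also makes the "consumed" placebo visible: at e = -2 the evolution is Y[F-2] - Y[F-1] = -slope, and subtracting (-1) × slope adds it back, so that horizon is identically zero for every unit, which is why it is dropped.

```python
from typing import Dict

import numpy as np


def detrend_evolutions(y: np.ndarray, periods: np.ndarray, F: int) -> Dict[int, float]:
    """Hypothetical helper: Eq 17-style detrending for ONE unit's outcome path."""
    pos = {int(t): i for i, t in enumerate(periods)}
    if (F - 1) not in pos or (F - 2) not in pos:
        raise ValueError("trends_lin needs both F-1 and F-2 in the panel")
    y_anchor = y[pos[F - 1]]
    slope = y_anchor - y[pos[F - 2]]  # Y[g, F-1] - Y[g, F-2]
    out: Dict[int, float] = {}
    for t in periods:
        e = int(t) - F
        if e == -1:
            continue  # anchor horizon: evolution is identically zero
        dy = y[pos[int(t)]] - y_anchor  # evolution relative to F-1
        out[e] = float(dy - (e + 1) * slope)
    return out
```

On a purely linear outcome path every detrended evolution is zero; adding a constant post-period effect of 1.0 makes the detrended values at e >= 0 equal to 1.0.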

Returns:

  • HeterogeneousAdoptionDiDResults – When aggregate="overall" (the default; two-period only): single-period WAS estimate plus shared metadata.

  • HeterogeneousAdoptionDiDEventStudyResults – When aggregate="event_study" (multi-period panel; on staggered panels with first_treat_col supplied, auto-filters to the last cohort plus never-treated): per-event-time WAS estimates with per-horizon arrays.

Return type:

HeterogeneousAdoptionDiDResults | HeterogeneousAdoptionDiDEventStudyResults
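The staggered-timing filter described under first_treat_col keeps the last-treatment cohort and the never-treated units (first_treat == 0). A minimal pandas sketch of that row selection, assuming first_treat_col is constant within unit; last_cohort_filter is a hypothetical name, and unlike the estimator it emits no UserWarning and does not handle the fail-closed case where first_treat_col is omitted.

```python
import pandas as pd


def last_cohort_filter(data: pd.DataFrame, unit_col: str,
                       first_treat_col: str) -> pd.DataFrame:
    """Hypothetical helper: keep the last-treatment cohort plus never-treated units."""
    # One first-treatment value per unit; 0 marks never-treated.
    first_treat = data.groupby(unit_col)[first_treat_col].first()
    cohorts = sorted(int(c) for c in first_treat.unique() if c > 0)
    if len(cohorts) <= 1:
        return data  # common adoption: nothing to filter
    last = cohorts[-1]
    keep = first_treat.index[(first_treat == last) | (first_treat == 0)]
    return data[data[unit_col].isin(keep)]
```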

HeterogeneousAdoptionDiDResults

Single-period results container for HeterogeneousAdoptionDiD estimation.

class diff_diff.HeterogeneousAdoptionDiDResults

Bases: object

Estimator output for HeterogeneousAdoptionDiD.

NaN-safe inference: the three downstream fields t_stat, p_value, and conf_int are routed through diff_diff.utils.safe_inference(), which returns NaN on all three whenever se is non-finite, zero, or negative. att and se themselves are RAW estimator outputs from the chosen fit path and are NOT gated by safe_inference:

  • On the degenerate fit configurations (constant outcome on the continuous paths, all-units-at-d_lower / no-dose-variation on the mass-point path), the fit path explicitly returns (att=nan, se=nan), which combined with the safe-inference gate yields all five fields NaN together.

  • On the degenerate CR1 cluster configuration (mass-point path with a single cluster), _fit_mass_point_2sls returns (att=beta_hat, se=nan) - att stays finite because the Wald-IV ratio is well defined, but the cluster-robust SE is not, so se is NaN and the downstream triple becomes NaN via the safe-inference gate.

So the guaranteed NaN coupling is on the downstream triple (t_stat, p_value, conf_int), not on att. The assert_nan_inference fixture in tests/conftest.py checks the downstream triple against the gate contract and does not assume att is NaN.
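The gate contract above can be mimicked in a few lines: the downstream triple is computed from (att, se) only when se is finite and strictly positive, and is all-NaN otherwise, while att passes through untouched. gated_inference is a hypothetical helper, not the signature of diff_diff.utils.safe_inference(); it uses a Normal reference distribution only, whereas the survey paths thread df_survey into t-based inference, which this sketch does not reproduce.

```python
import math
from statistics import NormalDist
from typing import Tuple


def gated_inference(att: float, se: float, alpha: float = 0.05
                    ) -> Tuple[float, float, Tuple[float, float]]:
    """Hypothetical helper mimicking the NaN gate described above (Normal only)."""
    nan = float("nan")
    if not math.isfinite(se) or se <= 0.0:
        # Non-finite, zero, or negative SE: the whole triple becomes NaN.
        return nan, nan, (nan, nan)
    t_stat = att / se
    std_normal = NormalDist()
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_stat)))
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return t_stat, p_value, (att - crit * se, att + crit * se)
```

The single-cluster CR1 case maps to gated_inference(beta_hat, float("nan")): att stays finite on the caller's side while all three inference fields come back NaN.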

att

Point estimate of the WAS parameter on the beta-scale.

  • Design 1’ (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator): att = (mean(ΔY) - tau_bc) / D_bar where tau_bc is the bias-corrected local-linear estimate of lim_{d ↓ 0} E[ΔY | D_2 <= d] and D_bar = (1/G) * sum(D_{g,2}).

  • Design 1 continuous-near-d_lower (paper Theorem 3 / Equation 11, WAS_{d_lower} under Assumption 6): att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower) where tau_bc is the bias-corrected local-linear estimate of lim_{d ↓ d_lower} E[ΔY | D_2 <= d].

  • Mass-point (paper Section 3.2.4): the Wald-IV / 2SLS coefficient directly - (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0}).

Type:

float
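Given the boundary estimate tau_bc, the three att formulas reduce to simple array arithmetic on the unit-level first differences ΔY and post-period doses D_2. The sketch below (was_point_estimate is a hypothetical helper) takes tau_bc as an input rather than estimating it, since the bias-corrected local-linear boundary fit is not reproduced here; the mass-point branch is the plain Wald-IV ratio with instrument Z = 1{D > d_lower}.

```python
from typing import Optional

import numpy as np


def was_point_estimate(dy, d, design: str, tau_bc: Optional[float] = None,
                       d_lower: float = 0.0) -> float:
    """Hypothetical helper: beta-scale att from the per-design formulas above.

    tau_bc is taken as given; estimating it requires the bias-corrected
    local-linear boundary fit, which is not reproduced here.
    """
    dy = np.asarray(dy, dtype=float)
    d = np.asarray(d, dtype=float)
    if design in ("continuous_at_zero", "continuous_near_d_lower"):
        if tau_bc is None:
            raise ValueError("continuous designs need the boundary estimate tau_bc")
        # Design 1' divides by D_bar; Design 1 divides by mean(D_2 - d_lower).
        shift = 0.0 if design == "continuous_at_zero" else d_lower
        return float((dy.mean() - tau_bc) / (d - shift).mean())
    if design == "mass_point":
        z = d > d_lower  # instrument Z = 1{D > d_lower}
        num = dy[z].mean() - dy[~z].mean()
        den = d[z].mean() - d[~z].mean()
        return float(num / den)  # Wald-IV / 2SLS ratio
    raise ValueError(f"unknown design {design!r}")
```

For a noiseless linear response ΔY = 0.3 · D and the true boundary limit as tau_bc, each branch recovers the slope 0.3.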

se

Standard error on the beta-scale. For continuous designs:

  • Unweighted or weights=<array>: CCT-2014 weighted-robust SE from Phase 1c divided by |den| (den = raw or weighted denominator depending on fit path).

  • survey_design=SurveyDesign(...) (or the deprecated survey= alias): Binder (1983) Taylor-series linearization of the per-unit IF (bias-corrected scale, aligned with tau_bc) routed through compute_survey_if_variance() for PSU-aggregated, FPC/strata-adjusted variance, divided by |den|.

In both cases the variance contribution of mean(ΔY) is of smaller order than that of the nonparametric boundary estimate in large samples, so it is omitted from the leading-order formula. For the mass-point design, se is the 2SLS structural-residual sandwich SE.

Type:

float

t_stat, p_value, conf_int

Routed through safe_inference; all three are NaN when se is non-finite, zero, or negative.

Type:

inference fields

alpha

Significance level used at fit time; CIs have nominal coverage 1 - alpha (0.05 gives a 95% CI).

Type:

float

design

Resolved design mode: "continuous_at_zero", "continuous_near_d_lower", or "mass_point". "auto" is resolved to one of the three concrete modes before storing.

Type:

str

target_parameter

Estimand label: "WAS" for Design 1’, "WAS_d_lower" for the other two. Pins the estimand semantically even when two designs share the same divisor.

Type:

str

d_lower

Support infimum d_lower. 0.0 for Design 1’; float(d.min()) for the other two.

Type:

float

dose_mean

D_bar = (1/G) * sum(D_{g,2}).

Type:

float

n_obs

Number of units contributing to the estimator (post panel aggregation to unit-level first differences).

Type:

int

n_treated

Number of units with D_{g,2} > d_lower.

Type:

int

n_control

Number of units at or below d_lower (the “not-treated” subset).

Type:

int

n_mass_point

Mass-point path only: number of units with D_{g,2} == d_lower. None on continuous paths.

Type:

int or None

n_above_d_lower

Mass-point path only: number of units with D_{g,2} > d_lower. None on continuous paths.

Type:

int or None

inference_method

"analytical_nonparametric" (continuous designs) or "analytical_2sls" (mass-point).

Type:

str

vcov_type

Effective variance-covariance family used. None on continuous paths (they use the CCT-2014 robust SE from Phase 1c, not the library’s vcov_type enum). Mass-point: "classical" or "hc1" when cluster is not supplied, and "cr1" whenever cluster is supplied (cluster-robust CR1 is computed regardless of the requested vcov_type because classical/hc1 + cluster collapses to the same CR1 sandwich). Downstream consumers reading result.to_dict() can inspect this field directly to determine the effective SE family.

Type:

str or None
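The resolution order behind the effective vcov_type on the mass-point path is: explicit vcov_type wins, otherwise robust picks "hc1" or "classical", hc2 / hc2_bm are refused, and any supplied cluster column collapses the result to "cr1". effective_vcov_family is a hypothetical helper encoding that documented order; raising ValueError for unrecognized strings is an assumption of this sketch.

```python
from typing import Optional

_SUPPORTED = ("classical", "hc1")


def effective_vcov_family(vcov_type: Optional[str], robust: bool,
                          cluster: Optional[str]) -> str:
    """Hypothetical helper: effective mass-point SE family as documented."""
    if vcov_type is None:
        # Backward-compat alias: robust only matters when vcov_type is None.
        vcov_type = "hc1" if robust else "classical"
    if vcov_type in ("hc2", "hc2_bm"):
        raise NotImplementedError("2SLS-specific leverage not derived yet")
    if vcov_type not in _SUPPORTED:
        # Assumption of this sketch: unknown strings are rejected.
        raise ValueError(f"unknown vcov_type {vcov_type!r}")
    # A supplied cluster column collapses classical/hc1 onto CR1.
    return "cr1" if cluster is not None else vcov_type
```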

cluster_name

Column name of the cluster variable on the mass-point path when cluster-robust SE is requested. None otherwise.

Type:

str or None

survey_metadata

Repo-standard survey metadata dataclass from diff_diff.survey.SurveyMetadata. None when fit() was called without survey_design= / survey= / weights=; populated on the continuous-dose weighted paths via diff_diff.survey.compute_survey_metadata(). Exposes weight_type, effective_n, design_effect, sum_weights, n_strata, n_psu, weight_range, and df_survey for downstream reporting consumers (BusinessReport, DiagnosticReport) that read these fields via attribute access. HAD-specific inference-method info (pweight vs Binder-TSL) is carried on inference_method and variance_formula.

Type:

SurveyMetadata or None

bandwidth_diagnostics

Full Phase 1b MSE-DPI selector output on the continuous paths (when bandwidths were auto-selected). None on the mass-point path (parametric, no bandwidth).

Type:

BandwidthResult or None

bias_corrected_fit

Full Phase 1c bias-corrected local-linear fit on the continuous paths. None on the mass-point path.

Type:

BiasCorrectedFit or None

att: float
se: float
t_stat: float
p_value: float
conf_int: Tuple[float, float]
alpha: float
design: str
target_parameter: str
d_lower: float
dose_mean: float
n_obs: int
n_treated: int
n_control: int
n_mass_point: int | None
n_above_d_lower: int | None
inference_method: str
vcov_type: str | None
cluster_name: str | None
survey_metadata: SurveyMetadata | None
bandwidth_diagnostics: BandwidthResult | None
bias_corrected_fit: BiasCorrectedFit | None
variance_formula: str | None = None

HAD-specific label for the SE formula on weighted fits, populated on BOTH continuous and mass-point designs (Phase 4.5 A / B): "pweight" (continuous, weighted-robust CCT 2014 under the weights= shortcut), "survey_binder_tsl" (continuous, Binder 1983 TSL with PSU/strata/FPC under survey_design=SurveyDesign(...)), "pweight_2sls" (mass-point + weights=; label applied uniformly across vcov families — classical / HC1 / CR1 — on the weighted 2SLS path, with the actual sandwich resolved via vcov_type), or "survey_binder_tsl_2sls" (mass-point, Binder 1983 TSL under survey_design=). None on unweighted fits. Orthogonal to survey_metadata which is the repo-standard diff_diff.survey.SurveyMetadata shared with downstream report/diagnostic consumers (no HAD-specific leakage).

effective_dose_mean: float | None = None

Weighted denominator used by the beta-scale rescaling, populated on weighted fits across all designs: sum(w_g · D_g) / sum(w_g) on continuous_at_zero, sum(w_g · (D_g - d_lower)) / sum(w_g) on continuous_near_d_lower, and the weighted Wald-IV dose gap mean(D | Z=1, w) - mean(D | Z=0, w) on mass_point (where Z = 1{D > d_lower}). With uniform weights, the continuous-design values reduce bit-exactly to dose_mean and mean(D - d_lower), respectively. None when fit() was called without survey_design= / survey= / weights= (use dose_mean there). The field exists because dose_mean is the raw sample mean of the dose column; under weighted fits the estimator’s actual denominator is the weighted form above, and users reconstructing the β-scale value by hand need the weighted one.
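For readers reconstructing the β-scale value by hand, the three weighted denominators can be written directly with NumPy. effective_dose_mean is a hypothetical helper named after the field; it assumes one post-period dose and one pweight per unit (the per-row to per-unit weight aggregation the estimator performs is not reproduced).

```python
import numpy as np


def effective_dose_mean(d, w, design: str, d_lower: float = 0.0) -> float:
    """Hypothetical helper: weighted beta-scale denominator per design."""
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    if design == "continuous_at_zero":
        return float(np.sum(w * d) / np.sum(w))
    if design == "continuous_near_d_lower":
        return float(np.sum(w * (d - d_lower)) / np.sum(w))
    if design == "mass_point":
        z = d > d_lower
        # Weighted Wald-IV dose gap: mean(D | Z=1, w) - mean(D | Z=0, w).
        return float(np.average(d[z], weights=w[z]) - np.average(d[~z], weights=w[~z]))
    raise ValueError(f"unknown design {design!r}")
```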

summary()

Formatted summary table.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return results as a dict of scalars + weighted-path surfaces.

Always-present keys mirror the dataclass fields: att, se, t_stat, p_value, conf_int_lower / conf_int_upper, alpha, design, target_parameter, d_lower, dose_mean, n_obs / n_treated / n_control / n_mass_point / n_above_d_lower, inference_method, vcov_type, cluster_name.

Weighted-path keys (None on unweighted fits):

  • survey_metadata: repo-standard diff_diff.survey.SurveyMetadata dataclass (object, not dict) carrying weight_type / effective_n / design_effect / sum_weights / weight_range + n_strata / n_psu / df_survey (latter three None on the weights= shortcut).

  • variance_formula: HAD-specific SE label, populated on BOTH continuous and mass-point designs (Phase 4.5 A / B): "pweight" (continuous, weighted-robust CCT 2014 under weights=), "survey_binder_tsl" (continuous, Binder 1983 TSL under survey_design=), "pweight_2sls" (mass-point + weights=; label applied uniformly across vcov families — classical / HC1 / CR1 — with the sandwich resolved via vcov_type), or "survey_binder_tsl_2sls" (mass-point, Binder 1983 TSL under survey_design=). See the field docstring above for the full contract.

  • effective_dose_mean: weighted denominator used by the beta-scale rescaling - weighted mean(D) on continuous_at_zero, weighted mean(D - d_lower) on continuous_near_d_lower, or the weighted Wald-IV dose gap mean(D | Z=1, w) - mean(D | Z=0, w) on mass_point.

Return type:

Dict[str, Any]

to_dataframe()

Return a one-row DataFrame of the result dict.

Return type:

DataFrame

__init__(att, se, t_stat, p_value, conf_int, alpha, design, target_parameter, d_lower, dose_mean, n_obs, n_treated, n_control, n_mass_point, n_above_d_lower, inference_method, vcov_type, cluster_name, survey_metadata, bandwidth_diagnostics, bias_corrected_fit, variance_formula=None, effective_dose_mean=None)
Parameters:
Return type:

None

HeterogeneousAdoptionDiDEventStudyResults

Multi-period event-study results container for the Appendix B.2 extension.

class diff_diff.HeterogeneousAdoptionDiDEventStudyResults

Bases: object

Event-study results for HeterogeneousAdoptionDiD (Phase 2b).

Per-horizon arrays align with event_times by index; all per-horizon arrays have shape (n_horizons,). The anchor horizon e = -1 (i.e., t = F - 1) is NOT included because Y_{g, F-1} - Y_{g, F-1} = 0 trivially and the WAS is not identified there.

Per-horizon inference fields (t_stat, p_value, conf_int_low, conf_int_high) are NaN-coupled to the per-horizon se via diff_diff.utils.safe_inference(); att and se themselves are raw estimator outputs from the chosen design path on each horizon’s first differences.

Design resolution is SHARED across horizons: the design, d_lower, target_parameter, and inference_method are single scalars determined once from the post-period dose distribution D_{g, F} (paper Appendix B.2 convention — the dose regressor is invariant across event-time horizons).
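The horizon construction can be illustrated with a small NumPy sketch: each retained horizon differences the outcome against the anchor period F - 1, and e = -1 is dropped because that difference is identically zero. The helper name and the assumption of integer, consecutive period labels are ours, not the library's; per the description above, the estimator then runs the resolved design path on each column, with the shared post-period dose as regressor.

```python
import numpy as np


def horizon_first_differences(y, periods, F):
    """Per-horizon first differences Delta Y_t = Y_{g,t} - Y_{g,F-1}.

    y: (G, T) outcome matrix whose columns follow `periods`.
    Assumes integer, consecutive period labels so that F - 1 is the anchor.
    """
    y = np.asarray(y, dtype=float)
    periods = np.asarray(periods)
    anchor_col = int(np.flatnonzero(periods == F - 1)[0])
    event_times = periods - F                    # e = t - F
    keep = event_times != -1                     # drop the trivial anchor e = -1
    dy = y[:, keep] - y[:, [anchor_col]]         # broadcast the anchor column
    return event_times[keep], dy


e, dy = horizon_first_differences([[1, 2, 4, 7], [0, 0, 1, 3]], [1, 2, 3, 4], F=3)
# e holds [-2, 0, 1]: one pre-period placebo and two post-period horizons.
```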

event_times

Integer event-time labels e = t - F, sorted ascending. Excludes e = -1 (the anchor). Post-period horizons have e >= 0; pre-period placebos have e <= -2.

Type:

np.ndarray, shape (n_horizons,)

att

Per-horizon WAS point estimate on the beta-scale (see HeterogeneousAdoptionDiDResults.att for the per-design formula, applied to ΔY_t = Y_{g,t} - Y_{g,F-1}).

Type:

np.ndarray, shape (n_horizons,)

se

Per-horizon standard error on the beta-scale. Three regimes:

  • Unweighted: per-horizon INDEPENDENT analytical sandwich (continuous: CCT-2014 weighted-robust divided by |den|; mass-point: structural-residual 2SLS sandwich via _fit_mass_point_2sls). No cross-horizon covariance.

  • weights= shortcut: continuous paths still use the CCT-2014 weighted-robust SE from lprobust (bc_fit.se_robust / |den|); mass-point uses the analytical weighted 2SLS pweight sandwich (HC1 / classical / CR1 depending on vcov_type + cluster=). No Binder-TSL composition on this path; inference is Normal (df=None).

  • survey=: each horizon composes Binder (1983) Taylor-series linearization via compute_survey_if_variance() on the per-unit β̂-scale IF (continuous and mass-point both route through the same helper). df_survey threads into safe_inference for t-inference.

Pointwise CIs are always populated; a simultaneous confidence band is available only on the weighted path via cband_* below. Joint cross-horizon analytical covariance is not computed in this release (tracked in TODO.md).

Type:

np.ndarray, shape (n_horizons,)

t_stat, p_value

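Per-horizon t-statistic and p-value (together with the CI endpoints, the per-horizon inference triple). NaN wherever se is non-finite or <= 0, per the safe_inference() gate.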

Type:

np.ndarray, shape (n_horizons,)

conf_int_low, conf_int_high

Per-horizon CI endpoints at level alpha.

Type:

np.ndarray, shape (n_horizons,)
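The NaN coupling between se and the inference fields can be sketched as follows. This is not diff_diff.utils.safe_inference(). It assumes two-sided Normal inference (the df=None case) and applies the gate documented for the cband_* fields: any horizon whose se is non-finite or non-positive gets NaN for t_stat, p_value, and both CI endpoints, while att and se are left untouched.

```python
import math
from statistics import NormalDist

import numpy as np


def pointwise_inference_sketch(att, se, alpha=0.05):
    """NaN-coupled Normal inference per horizon (illustrative only)."""
    att = np.asarray(att, dtype=float)
    se = np.asarray(se, dtype=float)
    ok = np.isfinite(se) & (se > 0)              # gate: finite, positive se
    t_stat = np.full(att.shape, np.nan)
    p_value = np.full(att.shape, np.nan)
    ci_low = np.full(att.shape, np.nan)
    ci_high = np.full(att.shape, np.nan)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 at alpha = 0.05
    t_stat[ok] = att[ok] / se[ok]
    # Two-sided Normal p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p_value[ok] = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stat[ok]]
    ci_low[ok] = att[ok] - z * se[ok]
    ci_high[ok] = att[ok] + z * se[ok]
    return t_stat, p_value, ci_low, ci_high
```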

n_obs_per_horizon

Per-horizon sample size (units contributing at that event time). In Phase 2b this equals n_units for every horizon because the validator rejects NaN in outcome / dose / unit columns upstream; tracked as a field for future flexibility (e.g., per-period missingness).

Type:

np.ndarray, shape (n_horizons,)

alpha

Significance level used at fit time (0.05 gives a 95% CI).

Type:

float

design

Resolved design mode, shared across horizons: "continuous_at_zero", "continuous_near_d_lower", or "mass_point".

Type:

str

target_parameter

Estimand label: "WAS" for Design 1’ (continuous_at_zero), "WAS_d_lower" for the other two.

Type:

str

d_lower

Support infimum used for all horizons. 0.0 for Design 1’; float(d.min()) otherwise.

Type:

float

dose_mean

D_bar = (1/G) * sum(D_{g,F}) computed on the fit sample (after the staggered last-cohort filter, if applied).

Type:

float

F

First-treatment period label (arbitrary dtype — int, str, datetime). Identified by the multi-period dose invariant from the fitted data.

Type:

object

n_units

Number of unique units contributing to the fit. After staggered auto-filter: last-cohort units PLUS never-treated (first_treat = 0) units retained as the untreated-group comparison per paper Appendix B.2. Only earlier-treated cohorts are dropped.

Type:

int

inference_method

"analytical_nonparametric" (continuous designs) or "analytical_2sls" (mass-point). Shared across horizons.

Type:

str

vcov_type

Effective variance-covariance family used on the mass-point path ("classical", "hc1", or "cr1" when cluster supplied). None on the continuous paths (they use CCT-2014 robust SE).

Type:

str or None

cluster_name

Column name of the cluster variable when cluster-robust SE is requested. None otherwise.

Type:

str or None

survey_metadata

Repo-standard survey metadata dataclass from diff_diff.survey.SurveyMetadata. None when fit() was called without survey= or weights=; populated on the weighted event-study path (Phase 4.5 B). See HeterogeneousAdoptionDiDResults.survey_metadata for the attribute contract.

Type:

SurveyMetadata or None

variance_formula

Per-horizon variance family (applied uniformly across horizons). "pweight" / "pweight_2sls" on the weights= shortcut, "survey_binder_tsl" / "survey_binder_tsl_2sls" on the survey= path. None on unweighted fits.

Type:

str or None

effective_dose_mean

Weighted denominator used by the β̂-scale rescaling (continuous paths: weighted sample mean of d or d - d_lower; mass-point: weighted Wald-IV dose gap). None on unweighted fits.

Type:

float or None

cband_low, cband_high

Simultaneous confidence-band endpoints constructed by the multiplier-bootstrap sup-t procedure. None on unweighted fits and when fit(..., cband=False) is passed. Horizons with se <= 0 or non-finite se are NaN (matches the pointwise inference gate from safe_inference).

Type:

np.ndarray or None, shape (n_horizons,)

cband_crit_value

Sup-t multiplier-bootstrap critical value at level 1 - alpha. Under a trivial resolved design (no strata / PSU / FPC) with a single horizon (H=1), it reduces to Φ⁻¹(1 - alpha/2), i.e. about 1.96 at alpha = 0.05, up to Monte Carlo error; under stratified designs the helper applies PSU aggregation, stratum demeaning, and a sqrt(n_h / (n_h - 1)) small-sample correction so the bootstrap variance matches the analytical Binder-TSL target term-for-term.

Type:

float or None

cband_method

"multiplier_bootstrap" on the weighted event-study path with cband=True, else None.

Type:

str or None

cband_n_bootstrap

Number of multiplier-bootstrap replicates used to compute the sup-t critical value.

Type:

int or None
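A minimal sketch of a sup-t critical value under stated assumptions: a trivial design (no strata, PSU, or FPC), per-unit influence values scaled so that the estimate minus its target is approximately their column sum, and Gaussian multipliers. The multiplier distribution and the helper name are assumptions of this sketch. The stratified PSU aggregation and small-sample correction described for cband_crit_value are not reproduced.

```python
import numpy as np


def sup_t_critical_value(influence, alpha=0.05, n_bootstrap=999, seed=None):
    """Sup-t critical value from a Gaussian multiplier bootstrap (sketch).

    influence: (G, H) per-unit influence values, one column per horizon.
    Returns (critical value, per-horizon analytical se).
    """
    influence = np.asarray(influence, dtype=float)
    se = np.sqrt(np.sum(influence ** 2, axis=0))    # analytical se per horizon
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_bootstrap, influence.shape[0]))
    draws = xi @ influence                          # (B, H) perturbed deviations
    sup_t = np.max(np.abs(draws) / se, axis=1)      # max |t| across horizons
    return float(np.quantile(sup_t, 1 - alpha)), se
```

The band is then att - crit * se and att + crit * se. With H=1 each standardized draw is exactly standard Normal under Gaussian multipliers, so the critical value sits near 1.96 at alpha = 0.05. With several horizons it grows so that the band covers all of them jointly.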

bandwidth_diagnostics

Per-horizon bandwidth diagnostics on the continuous paths; None on the mass-point path. When non-None, aligned with event_times by index.

Type:

list[BandwidthResult] or None

bias_corrected_fit

Per-horizon bias-corrected fit on the continuous paths; None on the mass-point path. When non-None, aligned with event_times by index.

Type:

list[BiasCorrectedFit] or None

filter_info

Populated when the staggered-timing last-cohort auto-filter fires. Keys: "F_last" (kept cohort label), "n_kept" (units retained), "n_dropped" (units dropped), "dropped_cohorts" (list of dropped cohort labels). None when no filter was applied.

Type:

dict or None
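The last-cohort filter and its filter_info dict can be sketched in plain Python. The function below is hypothetical and assumes numeric cohort labels with first_treat = 0 marking never-treated units. It keeps the latest cohort plus the never-treated units and returns None for the info dict when there is nothing to drop.

```python
def last_cohort_filter(first_treat):
    """Keep the last-treated cohort plus never-treated units (first_treat == 0).

    first_treat: dict mapping unit id -> first-treatment period (0 = never).
    Returns (kept unit ids, filter_info dict or None).
    """
    cohorts = sorted({c for c in first_treat.values() if c != 0})
    if len(cohorts) <= 1:                        # not staggered: no filter fires
        return list(first_treat), None
    f_last = cohorts[-1]
    kept = [u for u, c in first_treat.items() if c in (0, f_last)]
    dropped = [u for u, c in first_treat.items() if c not in (0, f_last)]
    info = {
        "F_last": f_last,
        "n_kept": len(kept),
        "n_dropped": len(dropped),
        "dropped_cohorts": [c for c in cohorts if c != f_last],
    }
    return kept, info
```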

event_times: ndarray
att: ndarray
se: ndarray
t_stat: ndarray
p_value: ndarray
conf_int_low: ndarray
conf_int_high: ndarray
n_obs_per_horizon: ndarray
alpha: float
design: str
target_parameter: str
d_lower: float
dose_mean: float
F: Any
n_units: int
inference_method: str
vcov_type: str | None
cluster_name: str | None
survey_metadata: SurveyMetadata | None
bandwidth_diagnostics: List[BandwidthResult | None] | None
bias_corrected_fit: List[BiasCorrectedFit | None] | None
filter_info: Dict[str, Any] | None
variance_formula: str | None = None

Per-horizon variance family label (applied uniformly across all horizons in the fit). One of "pweight" / "pweight_2sls" (when a per-row weight array was supplied, including via the deprecated weights= alias; continuous / mass-point), "survey_binder_tsl" / "survey_binder_tsl_2sls" (when a SurveyDesign was supplied via survey_design= or the deprecated survey= alias), or None on unweighted fits. Mirrors the static-path variance_formula field.

effective_dose_mean: float | None = None

Weighted denominator used by the β̂-scale rescaling. For continuous designs: weighted sum(w · d)/sum(w) (continuous_at_zero) or sum(w · (d - d_lower))/sum(w) (continuous_near_d_lower). For mass-point: weighted Wald-IV dose gap. None on unweighted fits.

cband_low: ndarray | None = None

Simultaneous confidence-band lower endpoints, shape (n_horizons,). None on unweighted fits and when cband=False on the weighted event-study path. Derived from the multiplier-bootstrap sup-t critical value: cband_low[e] = att[e] - cband_crit_value * se[e].

cband_high: ndarray | None = None

Simultaneous confidence-band upper endpoints, shape (n_horizons,). See cband_low.

cband_crit_value: float | None = None

Sup-t multiplier-bootstrap critical value at level 1 - alpha. Reduces to Φ⁻¹(1 - alpha/2) (about 1.96 at alpha = 0.05) at H=1 up to Monte Carlo error. None on unweighted fits and when cband=False.

cband_method: str | None = None

"multiplier_bootstrap" on the weighted event-study path with cband=True, else None.

cband_n_bootstrap: int | None = None

Number of multiplier-bootstrap replicates used to compute the sup-t critical value. None on unweighted fits and when cband=False.

summary()

Formatted per-horizon summary table.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return results as a dict with per-horizon arrays and scalars.

Per-horizon arrays are converted to Python lists via ndarray.tolist() (which unwraps NumPy scalar elements to native int / float); scalar fields are coerced to native Python types via _json_safe_scalar where relevant (NumPy scalars -> .item(), pandas Timestamp -> ISO string, Timedelta -> ISO string). The returned dict is JSON-serializable directly via json.dumps.

Return type:

Dict[str, Any]
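The serialization rule can be sketched with a hypothetical json_safe_scalar helper; the library's private _json_safe_scalar may differ in detail. NumPy scalars unwrap via .item(), and datetime-like values (including pandas Timestamp, which subclasses datetime.datetime) become ISO strings via isoformat(). Timedelta handling is omitted here.

```python
import datetime as dt
import json

import numpy as np


def json_safe_scalar(x):
    """Unwrap NumPy scalars and ISO-format datetime objects (sketch)."""
    if isinstance(x, np.generic):
        return x.item()                 # e.g. np.int64 -> int, np.float32 -> float
    if isinstance(x, (dt.datetime, dt.date)):
        return x.isoformat()            # datetime-like -> ISO 8601 string
    return x


record = {
    "event_times": np.array([-2, 0, 1]).tolist(),   # arrays -> lists of natives
    "att": np.array([0.1, 0.5, 0.7]).tolist(),
    "F": json_safe_scalar(np.int64(3)),
    "alpha": json_safe_scalar(np.float64(0.05)),
}
payload = json.dumps(record)                        # serializes without a custom encoder
```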

to_dataframe()

Return a tidy per-horizon DataFrame.

Columns: event_time, att, se, t_stat, p_value, conf_int_low, conf_int_high, n_obs. One row per event-time horizon. On the weighted event-study path with cband=True, also includes cband_low and cband_high columns.

Return type:

DataFrame

__init__(event_times, att, se, t_stat, p_value, conf_int_low, conf_int_high, n_obs_per_horizon, alpha, design, target_parameter, d_lower, dose_mean, F, n_units, inference_method, vcov_type, cluster_name, survey_metadata, bandwidth_diagnostics, bias_corrected_fit, filter_info, variance_formula=None, effective_dose_mean=None, cband_low=None, cband_high=None, cband_crit_value=None, cband_method=None, cband_n_bootstrap=None)
Return type:

None

HAD Pretests

Diagnostic pretests for the HAD identification assumptions from de Chaisemartin et al. (2026). The composite orchestrator did_had_pretest_workflow() is a diagnostic battery only - it does NOT pick the HAD design path (continuous_at_zero / continuous_near_d_lower / mass_point); that is auto-detected inside HeterogeneousAdoptionDiD.fit() from the dose support. The workflow has two explicit modes selected by the caller via the aggregate= kwarg: aggregate="overall" (default, two-period first-differenced sample) runs single-period tests; aggregate="event_study" (multi-period panel with three or more periods) runs joint multi-period tests. Both modes return a unified HADPretestReport.

diff_diff.did_had_pretest_workflow(data, outcome_col, dose_col, time_col, unit_col, first_treat_col=None, alpha=0.05, n_bootstrap=999, seed=None, *, aggregate='overall', survey_design=None, survey=None, weights=None, trends_lin=False)

Run the HAD pre-test workflow (paper Section 4.2-4.3).

Two dispatch modes via aggregate:

aggregate="overall" (default, two-period panel): runs paper steps 1 (qug_test()) and 3 (stute_test() + yatchew_hr_test()). Step 2 (Assumption 7 pre-trends) is NOT implemented on this path because a single-pre-period panel cannot support the joint Stute variant; the returned verdict flags the Assumption 7 gap explicitly so callers do not receive a misleading “TWFE safe” signal. For multi-period panels, pass aggregate="event_study" to close the step-2 gap.

aggregate="event_study" (multi-period panel, >= 3 periods): runs QUG + joint pre-trends Stute + joint homogeneity-linearity Stute, covering paper Section 4 steps 1-3 together. The step-3 Yatchew-HR alternative (a single-horizon swap-in for Stute) is subsumed by joint Stute on this path - the paper does not derive a joint Yatchew variant, so users who need Yatchew robustness under multi-period data should call yatchew_hr_test() on each (base, post) pair manually. (Paper step 4 is the decision itself - “use TWFE if none of the tests rejects” - not a separate test, so it has no code path here. Mirrors the framing in the module-level docstring at line 54 and _compose_verdict_event_study at line 2735.)

Eq 17 / Eq 18 linear-trend detrending (paper Section 5.2 Pierce-Schott application) is now SHIPPED on the event-study path via the trends_lin keyword-only parameter (PR #392 / Phase 4 R-parity). When trends_lin=True, this workflow forwards the flag to both joint_pretrends_test() and joint_homogeneity_test(); the consumed placebo at base_period - 1 is auto-dropped from step 2 and the workflow skips step 2 (pretrends_joint=None) if no earlier placebo survives. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE). Mutually exclusive with aggregate="overall" (raises NotImplementedError).

Parameters:
  • data (pd.DataFrame) – HAD panel. For aggregate="overall": balanced two-period panel with pre-period dose = 0 for every unit. For aggregate="event_study": balanced multi-period panel with >= 3 periods, an ordered time dtype (numeric, datetime, or ordered categorical), and the pre-period D=0 invariant across all pre-periods.

  • outcome_col (str)

  • dose_col (str)

  • time_col (str)

  • unit_col (str)

  • first_treat_col (str or None, default None) – Optional first-treatment-period column. Required on the aggregate="event_study" path when the panel is staggered (multi-cohort); the panel validator auto-filters to the last cohort and emits UserWarning. The overall path uses this for cross-validation only.

  • alpha (float, default 0.05)

  • n_bootstrap (int, default 999) – Replication count for the single-horizon Stute (overall) or joint Stute (event_study).

  • seed (int or None, default None) – Seed forwarded to the Stute bootstrap. QUG / Yatchew are deterministic.

  • aggregate (str, keyword-only, default "overall") – Dispatch mode. Invalid values raise ValueError.

  • survey_design (SurveyDesign or None, keyword-only, default None) – Survey design for design-based pretest inference. Linearity-family pretests use PSU-level Mammen multiplier bootstrap (Stute family) and weighted OLS + weighted variance components (Yatchew). The QUG step is skipped under survey with a UserWarning (permanent deferral per Phase 4.5 C0). Replicate-weight designs raise NotImplementedError. Mutually exclusive with the deprecated survey= and weights= aliases.

  • survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release; prefer survey_design=.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias for the per-row pweight shortcut. Prefer adding the weights as a column on data and passing survey_design=SurveyDesign(weights='col_name') instead. Will be removed in the next minor release. Currently routed through a synthetic trivial ResolvedSurveyDesign so the same kernel handles both paths.

  • trends_lin (bool, default False, keyword-only) – Forwards into joint_pretrends_test() and joint_homogeneity_test() on the event-study dispatch path. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE). Requires aggregate="event_study"; raises NotImplementedError on aggregate="overall" (the overall path’s qug + stute + yatchew block has no joint-pretest surface). Mutually exclusive with survey weighting at the joint-pretest layer; the joint wrappers raise NotImplementedError if combined. Effective step-2 rule under trends_lin: the consumed placebo at base_period - 1 is dropped before step 2 is dispatched; if no earlier placebo survives the drop (e.g., a minimal 4-period panel with F=3 where base_period=2 and the only earlier placebo at t=1 is the consumed one), step 2 is skipped (pretrends_joint=None) and the workflow proceeds to step 3 (homogeneity). Default False preserves bit-exact backcompat.
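The effective step-2 rule under trends_lin can be sketched as a placebo-selection helper. It is hypothetical and assumes integer, consecutive periods with base_period = F - 1, matching the 4-period, F=3 example in the parameter description. An empty result corresponds to step 2 being skipped (pretrends_joint=None).

```python
def surviving_placebos(periods, F, trends_lin=False):
    """Placebo periods left for step 2 (sketch, integer periods assumed).

    The base period is F - 1; placebos are the earlier pre-periods. Under
    trends_lin the placebo at base_period - 1 is consumed by detrending.
    """
    base_period = F - 1
    placebos = [t for t in sorted(periods) if t < base_period]
    if trends_lin:
        placebos = [t for t in placebos if t != base_period - 1]
    return placebos  # empty list -> step 2 skipped


# Minimal 4-period panel with F=3: the only placebo (t=1) is consumed.
print(surviving_placebos([1, 2, 3, 4], F=3, trends_lin=True))
```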

Returns:

On the overall path: stute and yatchew populated, pretrends_joint / homogeneity_joint are None. On the event-study path: pretrends_joint (None if no earlier pre-period) and homogeneity_joint populated, stute / yatchew are None. aggregate is recorded on the report for serialization dispatch. On the survey/weights path, qug is None (Phase 4.5 C0 deferral); other components populated as on the unweighted path.

Return type:

HADPretestReport

Raises:
  • ValueError – On invalid aggregate; if more than one of survey_design, survey, weights is supplied (3-way mutex; survey= and weights= are deprecated aliases of survey_design=); or any downstream front-door failure (panel balance, dtype, dose invariant).

  • NotImplementedError – If survey.replicate_weights is not None (replicate-weight pretests deferred to a parallel follow-up after Phase 4.5 C).

Notes

Scope (what this composite workflow does NOT cover). The component pretests target the Theorem 4 / Design 1’ support-infimum null (QUG: d_lower = 0; this is adjacent evidence on the d_lower = 0 clause of Assumption 4 only and does not validate boundary density, conditional-mean smoothness, or variance regularity), Assumption 7 (joint Stute pre-trends: mean-independence of placebo first-differences from dose), and Assumption 8 (Yatchew / joint homogeneity: linearity of treatment effects in dose). The workflow does NOT and CANNOT test Assumptions 5 and 6 from de Chaisemartin et al. (2026) Section 3.1.2, which are required for sign / point identification of WAS_{d_lower} on the Design 1 family (d_lower > 0). Assumptions 5/6 are non-testable via pre-trends. The composite verdict string does NOT mention Assumptions 5 or 6; it only flags the Assumption 7 step-2 gap on the two-period aggregate="overall" path. The Assumption 5/6 caveat is surfaced separately by (a) the HeterogeneousAdoptionDiD.fit() fit-time UserWarning (which fires whenever the resolved design is in the Design 1 family, i.e. continuous_near_d_lower or mass_point) and (b) T21 (HAD pretest workflow tutorial) tutorial prose.

Survey/weighted data (Phase 4.5 C): under survey= or weights=, the workflow:

  1. Skips QUG with a UserWarning and sets qug=None on the report. QUG-under-survey is permanently deferred per Phase 4.5 C0; extreme-order-statistic tests are not smooth functionals of the empirical CDF and have no off-the-shelf survey-aware analog. See qug_test() Notes for the full methodology rationale.

  2. Runs the linearity family with the survey-aware mechanism (PSU-level Mammen multiplier bootstrap for Stute / joint variants; weighted OLS + weighted variance components for Yatchew) routed via the existing kernels.

  3. Verdict carries a "linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0" suffix to remind callers that admissibility is conditional on the linearity family alone.

  4. all_pass drops the QUG-conclusiveness gate (one fewer precondition). The linearity-conditional rule splits by aggregate:

    • aggregate="overall" survey: True iff at least one of Stute/Yatchew is conclusive AND no conclusive test rejects (paper Section 4 step-3 “Stute OR Yatchew” wording).

    • aggregate="event_study" survey: True iff pretrends_joint is non-None and conclusive, homogeneity_joint is conclusive, AND neither rejects. Both joint variants must be conclusive on the event-study path (same step-2 + step-3 closure as the unweighted aggregate, just without the QUG step).

Sister pretests are unchanged on the workflow path; direct callers can also pass weights= / survey= to stute_test(), yatchew_hr_test(), etc. (Phase 4.5 C extends each helper’s signature). Per-unit constant-within-unit invariant on weights / strata / psu / fpc is enforced by the workflow via diff_diff.had._aggregate_unit_weights() / diff_diff.had._aggregate_unit_resolved_survey().

References

de Chaisemartin et al. (2026), Section 4.2-4.3, Theorem 4, Appendix D, Theorem 7.

class diff_diff.HADPretestReport

Bases: object

Composite output of did_had_pretest_workflow().

Two dispatch shapes, distinguished by aggregate:

aggregate="overall" (default, two-period panel): bundles paper steps 1 (QUG) and 3 (linearity via Stute + Yatchew-HR) on a two-period first-differenced sample. Step 2 (Assumption 7 pre-trends) is NOT implemented on this path and is explicitly flagged in the verdict; callers must run pre-trends separately.

aggregate="event_study" (multi-period panel, >= 3 periods): bundles QUG + joint pre-trends Stute + joint homogeneity-linearity Stute. The joint Stute variants close the paper step-2 gap; the event-study verdict does NOT emit the “paper step 2 deferred” caveat. Step 3 adjudication uses joint Stute only - no joint Yatchew variant exists because the paper does not derive one; users who need Yatchew robustness under multi-period data can run yatchew_hr_test() on each (base, post) pair manually.

qug

Populated by default; None only when the workflow runs under survey= / weights= (Phase 4.5 C path), where the QUG step is permanently skipped per Phase 4.5 C0 (extreme-value theory under complex sampling not a settled toolkit; see qug_test()).

Type:

QUGTestResults or None

stute

Populated when aggregate == "overall"; None when aggregate == "event_study".

Type:

StuteTestResults or None

yatchew

Populated when aggregate == "overall"; None when aggregate == "event_study".

Type:

YatchewTestResults or None

pretrends_joint

Populated when aggregate == "event_study" and at least one earlier pre-period exists; None on the overall path or when only the immediate base pre-period is available.

Type:

StuteJointResult or None

homogeneity_joint

Populated when aggregate == "event_study"; None on the overall path.

Type:

StuteJointResult or None

all_pass

On the unweighted overall path: same Phase 3 semantics - True iff QUG is conclusive AND at least one of Stute/Yatchew is conclusive AND no conclusive test rejects. On the unweighted event-study path: True iff np.isfinite(qug.p_value), pretrends_joint is not None and np.isfinite(pretrends_joint.p_value), np.isfinite(homogeneity_joint.p_value), AND none of the three rejects. On the survey/weights path (Phase 4.5 C) the QUG-conclusiveness gate is dropped (qug=None per C0 deferral); the linearity-conditional rule splits by aggregate:

  • aggregate="overall" survey: True iff at least one of Stute/Yatchew is conclusive AND no conclusive test rejects.

  • aggregate="event_study" survey: True iff pretrends_joint is non-None and conclusive, homogeneity_joint is conclusive, AND neither rejects. (Both joint variants must be conclusive on the event-study path - same step-2 + step-3 closure as the unweighted aggregate, just without the QUG step.)

Mirrors Phase 3’s bool(np.isfinite(p_value)) convention - no .conclusive() helper on any result dataclass.

Type:

bool
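The all_pass rule can be restated as a small predicate over p-values. This is a hypothetical restatement, not library code. It takes bare p-values (None for an absent component, NaN for an inconclusive one) and assumes a conclusive test rejects when its p-value is below alpha, whereas the library reads each result's own reject field.

```python
import math


def _conclusive(p):
    """A component is conclusive when it exists and has a finite p-value."""
    return p is not None and math.isfinite(p)


def _rejects(p, alpha):
    # Assumption of this sketch: a conclusive test rejects when p < alpha.
    return _conclusive(p) and p < alpha


def all_pass_sketch(aggregate, alpha=0.05, *, qug=None, stute=None,
                    yatchew=None, pretrends=None, homogeneity=None,
                    survey=False):
    """Restates the documented all_pass rule; arguments are p-values."""
    qug_tests = [] if survey else [qug]          # QUG gate dropped under survey
    if aggregate == "overall":
        tests = qug_tests + [stute, yatchew]
        gate = _conclusive(stute) or _conclusive(yatchew)
        gate = gate and all(_conclusive(p) for p in qug_tests)
    elif aggregate == "event_study":
        tests = qug_tests + [pretrends, homogeneity]
        gate = all(_conclusive(p) for p in tests)
    else:
        raise ValueError(f"invalid aggregate: {aggregate!r}")
    return gate and not any(_rejects(p, alpha) for p in tests)
```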

verdict

Human-readable classification. Paper rule applies symmetrically: TWFE is admissible only if NONE of the implemented tests rejects. Conclusive rejections are the primary verdict; unresolved steps append as "; additional steps unresolved: ..." rather than replacing the rejection.

Type:

str

alpha
Type:

float

n_obs

Unit count. For overall: units after two-period first-difference aggregation. For event_study: units after balanced-panel validation and (if applicable) last-cohort auto-filter.

Type:

int

aggregate

"overall" or "event_study". Determines which component fields are populated and which branch of serialization methods to render.

Type:

str

qug: QUGTestResults | None
stute: StuteTestResults | None
yatchew: YatchewTestResults | None
all_pass: bool
verdict: str
alpha: float
n_obs: int
pretrends_joint: StuteJointResult | None = None
homogeneity_joint: StuteJointResult | None = None
aggregate: str = 'overall'
summary()

Formatted summary of all tests and the verdict.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return a JSON-safe nested dict of the full report.

On aggregate="overall", the output schema is bit-exact with Phase 3 ({qug, stute, yatchew, all_pass, verdict, alpha, n_obs}) - no new keys, no aggregate field. On aggregate="event_study", the output carries aggregate, pretrends_joint, homogeneity_joint and omits the None-valued stute / yatchew keys entirely.

Return type:

Dict[str, Any]

to_dataframe()

Return a tidy 3-row DataFrame (one row per implemented test).

Columns (stable across aggregates): [test, statistic_name, statistic_value, p_value, reject, alpha, n_obs]. Row identifiers vary by aggregate:

  • aggregate="overall": rows are qug, stute, yatchew_hr (Phase 3 schema, unchanged).

  • aggregate="event_study": rows are qug, pretrends_joint, homogeneity_joint.

Rows for None-valued components (e.g. pretrends_joint when no earlier pre-period exists) are emitted with NaN statistic values and reject=False to preserve the 3-row shape.

Return type:

DataFrame

__init__(qug, stute, yatchew, all_pass, verdict, alpha, n_obs, pretrends_joint=None, homogeneity_joint=None, aggregate='overall')
Return type:

None

Single-period tests (aggregate="overall")

diff_diff.qug_test(d, alpha=0.05, *, survey_design=None, survey=None, weights=None)

Run the QUG null test for the support infimum (paper Theorem 4).

Tests H_0: d_lower = 0 using the order-statistic ratio T = D_{(1)} / (D_{(2)} - D_{(1)}), rejecting when T > 1/alpha - 1. Under the null, the asymptotic limit law of T is the ratio of two independent Exp(1) variables with CDF F(t) = t / (1 + t), so the one-sided p-value is 1 / (1 + T).

Zero-dose observations are filtered out (the test targets the infimum of the treated support). A UserWarning is emitted naming the exclusion count. When fewer than two positive doses remain, the test returns all-NaN inference with reject=False.
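A compact NumPy sketch of the statistic and decision rule just described follows. The helper name is hypothetical, and unlike qug_test() it emits no warnings for excluded zero doses or ties; it only mirrors the documented NaN outcomes.

```python
import numpy as np


def qug_sketch(d, alpha=0.05):
    """QUG order-statistic test of H0: d_lower = 0 (illustrative sketch)."""
    d = np.asarray(d, dtype=float)
    pos = np.sort(d[d > 0])                      # zero doses are excluded
    if pos.size < 2 or pos[1] == pos[0]:         # too few doses, or a tie
        return float("nan"), float("nan"), False
    t = pos[0] / (pos[1] - pos[0])               # T = D_(1) / (D_(2) - D_(1))
    p_value = 1.0 / (1.0 + t)                    # 1 - F(T) with F(t) = t / (1 + t)
    return float(t), float(p_value), bool(t > 1.0 / alpha - 1.0)
```

The rejection rule T > 1/alpha - 1 is the same event as p_value < alpha, since 1/(1 + T) < alpha exactly when T > 1/alpha - 1. For doses [0, 2.0, 2.5, 4.0], T = 2.0 / 0.5 = 4 and the p-value is 0.2, so H_0 is not rejected at alpha = 0.05.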

Parameters:
  • d (np.ndarray, shape (G,)) – Post-period dose vector. Must be 1D numeric and contain no NaN.

  • alpha (float, default 0.05) – One-sided significance level. Must satisfy 0 < alpha < 1.

  • survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Permanently rejected with NotImplementedError (Phase 4.5 C0 decision gate). Surface-symmetric kwarg with the rest of the HAD family — accepted in the signature so all 8 HAD entry points share the canonical kwarg name, but qug_test has no survey-aware migration target. See Notes – Survey/weighted data.

  • survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Surface-symmetric only; any non-None value still raises NotImplementedError — the deprecation is about kwarg-name consolidation, NOT a migration path (there is no survey-aware QUG). Will be removed in the next minor release.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of survey_design= for the per-row pweight shortcut on the rest of the HAD array-in family. On qug_test, surface-symmetric only; any non-None value still raises NotImplementedError — there is no migration path (make_pweight_design(arr) is NOT a valid QUG migration target). Will be removed in the next minor release.

Returns:

Result dataclass with t_stat, p_value, reject, and sample metadata.

Return type:

QUGTestResults

Raises:
  • ValueError – If d is not 1D numeric or contains NaN, or if alpha is not in (0, 1), or if more than one of survey_design/survey/weights is non-None (mutex).

  • NotImplementedError – If any of survey_design, survey, weights is non-None. See Notes – Survey/weighted data.

Notes

Scope (what this test does NOT cover). qug_test tests the Theorem 4 / Design 1’ support-infimum null H_0: d_lower = 0. It does not validate the full Assumption 4 (Assumption 4 also requires positive boundary density, twice-differentiable conditional-mean, bounded continuous conditional-variance, and bandwidth regularity — QUG is adjacent evidence on the d_lower = 0 clause only). It does NOT and CANNOT test Assumptions 5 and 6 from the same paper (Section 3.1.2), which are required for sign identification (A5) and point identification (A6) of WAS_{d_lower} on the Design 1 family (d_lower > 0). Assumptions 5 and 6 are statements about conditional expectations near the support boundary and about counterfactual-mean alignment respectively; they are non-testable via pre-trends. See HeterogeneousAdoptionDiD class docstring Notes for the full statement and T21 (HAD pretest workflow tutorial) for the verdict-language convention that surfaces this gap.

Tie-break: when D_{(1)} == D_{(2)} the statistic is undefined. The test returns t_stat=NaN, p_value=NaN, reject=False with a UserWarning rather than raising.

Survey/weighted data: QUG is permanently deferred under survey-weighted or pweight inputs (Phase 4.5 C0 decision gate, 2026-04). The test statistic uses extreme order statistics (D_{(1)}, D_{(2)}), which are NOT smooth functionals of the empirical CDF; standard survey machinery (Binder TSL linearization, multiplier bootstrap, Rao-Wu rescaled bootstrap) does not yield a calibrated test, and under cluster sampling the Exp(1)/Exp(1) limit law’s independence assumption breaks. The extreme-value-theory-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests; no off-the-shelf survey-aware QUG exists. Phase 4.5 C ships survey-aware Stute via did_had_pretest_workflow() (which skips the QUG step under survey/weights and runs the linearity family with a PSU-level Mammen multiplier bootstrap for Stute and weighted OLS + pweight-sandwich variance components for Yatchew). See docs/methodology/REGISTRY.md § “QUG Null Test” for the full methodology note.

References

de Chaisemartin, Ciccia, D’Haultfoeuille, Knau (2026, arXiv:2405.04465v6), Theorem 4 and Section 4.2.

diff_diff.stute_test(d, dy, alpha=0.05, n_bootstrap=999, seed=None, *, survey_design=None, survey=None, weights=None)

Run the Stute Cramér-von Mises linearity test (paper Appendix D).

Tests H_0: E[ΔY | D_2] is linear in D_2 (paper Assumption 8). The test statistic is the sorted-residual cusum CvM

S = (1 / G^2) * sum_{g=1}^G (sum_{h=1}^g eps_(h))^2

where eps_(h) is the h-th OLS residual after sorting by d. The p-value is the bootstrap tail probability (1 + sum(S_b >= S)) / (B + 1) under the Mammen (1993) two-point wild bootstrap; each bootstrap iteration refits OLS on dy_b = a_hat + b_hat * d + eps * eta with multiplier weights eta.

Parameters:
  • d (np.ndarray, shape (G,)) – Post-period dose vector D_2; values must be nonnegative.

  • dy (np.ndarray, shape (G,)) – First-difference outcome vector ΔY, aligned with d unit by unit.

  • alpha (float, default 0.05) – Significance level. Must satisfy 0 < alpha < 1.

  • n_bootstrap (int, default 999) – Number of Mammen wild bootstrap replications. Must be >= 99 (below which the discretised p-value grid is too coarse).

  • seed (int or None, default None) – Seed for np.random.default_rng. Pass an integer for reproducible results.

  • survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Already-resolved survey design (per-unit). Array-in helpers accept ResolvedSurveyDesign ONLY; passing a SurveyDesign raises TypeError with migration guidance. For the pweight-only shortcut, use survey_design=make_pweight_design(arr). Triggers the survey-aware Stute calibration: PSU-level Mammen multipliers via diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch(), broadcast to per-unit residual perturbation, with weighted CvM recompute. Replicate-weight designs raise NotImplementedError.

  • survey (ResolvedSurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of survey_design=make_pweight_design(arr). Will be removed in the next minor release.

Return type:

StuteTestResults

Raises:
  • ValueError – If d or dy is not 1-D numeric or contains NaN; if d and dy have unequal lengths; if any d value is negative (paper Section 2 HAD support restriction); if alpha is outside (0, 1); or if n_bootstrap < 99. Also raised if more than one of survey_design, survey, weights is supplied (the three are mutually exclusive because survey= and weights= are deprecated aliases of survey_design=).

  • TypeError – If survey_design=SurveyDesign(...) (or the deprecated survey=SurveyDesign(...) alias) is passed; array-in helpers accept ResolvedSurveyDesign only. Use survey_design=make_pweight_design(arr) for pweight-only or pre-resolve via SurveyDesign(...).resolve(data).

  • NotImplementedError – If the supplied survey design carries replicate weights (replicate_weights is not None). Replicate-weight pretests are deferred to a follow-up after Phase 4.5 C: the multiplier-bootstrap composition used here does not cover the per-replicate weight-ratio rescaling required by the OLS-on-residuals refit step.

Notes

Scope (what this test does NOT cover). stute_test targets paper Assumption 8 (linearity of E[ΔY | D_2] in D_2): the raw helper always fits dy ~ 1 + d and tests the linearity null. On its own it does NOT target the Assumption 7 mean-independence pre-trends null; for that null (residuals from the intercept-only fit dy ~ 1), use joint_pretrends_test(), which routes null_form="mean_independence" into the joint CvM core. It does NOT and CANNOT test Assumptions 5 and 6 from de Chaisemartin et al. (2026) Section 3.1.2, which are required for sign / point identification of WAS_{d_lower} in the Design 1 family (d_lower > 0). Assumptions 5 and 6 are not testable via pre-trends (they are boundary-conditional expectation and counterfactual-mean alignment statements); they are surfaced by the Design 1 fit-time UserWarning and by the T21 tutorial prose, NOT by the workflow verdict string. See the HeterogeneousAdoptionDiD class docstring Notes for the full statement.

Sample-size gate: below G = 10 the CvM statistic is not well calibrated, so the function emits a UserWarning and returns all-NaN inference rather than raising.

Large-G warning: at G > 100_000 the per-iteration refit dominates runtime; the function emits a UserWarning pointing users to yatchew_hr_test(). Memory usage remains O(G) regardless (no G x G matrix).

Survey/weighted data (Phase 4.5 C): when survey_design (or one of the deprecated survey= / weights= aliases) is supplied, the OLS baseline becomes weighted OLS (_fit_weighted_ols_intercept_slope()), the bootstrap multipliers become PSU-level Mammen draws (broadcast to per-observation perturbations), and the test statistic uses _cvm_statistic_weighted(). Keeping weights, strata, PSU, and FPC constant within each unit is the CALLER’s responsibility; the workflow (did_had_pretest_workflow()) enforces this invariant via _aggregate_unit_weights() / _aggregate_unit_resolved_survey() from had.py. At w = ones(G), the weighted helpers reduce bit-exactly to the unweighted versions, but bootstrap p-values still differ by Monte Carlo noise because the batched generate_survey_multiplier_weights_batch consumes the RNG differently from the per-iteration _generate_mammen_weights. To check trivial-pweight parity, use the distribution-equivalence reduction test (large B), NOT numerical equivalence.

References

Stute, W. (1997). Nonparametric model checks for regression. Annals of Statistics 25, 613-641.

Mammen, E. (1993). Bootstrap and wild bootstrap for high-dimensional linear models. Annals of Statistics 21, 255-285.

de Chaisemartin et al. (2026), Appendix D.

diff_diff.yatchew_hr_test(d, dy, alpha=0.05, *, null='linearity', survey_design=None, survey=None, weights=None)

Run the Yatchew heteroskedasticity-robust specification test.

Tests one of two nulls (selected via null=) using the variance-ratio statistic

T_hr = sqrt(G) * (sigma2_lin - sigma2_diff) / sigma2_W

where

sigma2_lin = (1/G) * sum(eps^2)  # residuals under chosen null
sigma2_diff = (1/(2G)) * sum((dy_{(g)} - dy_{(g-1)})^2)  # Yatchew differencing
sigma4_W = (1/(G-1)) * sum(eps_{(g)}^2 * eps_{(g-1)}^2)  # adjacent residual products
sigma2_W = sqrt(sigma4_W)

and the subscript (g) denotes the g-th observation after sorting by d (sums over adjacent pairs run over g = 2, ..., G). Under null="linearity" (default, paper Assumption 8 / Theorem 7), eps are residuals from the OLS fit dy = a + b*d + eps. Under null="mean_independence", eps = dy - mean(dy) (intercept-only OLS), mirroring R YatchewTest::yatchew_test(order=0). The sigma2_diff and sigma2_W formulas are identical in the two modes; only the residual definition changes. Rejection uses the one-sided standard-normal critical value z_{1-alpha}.
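A minimal NumPy sketch of the unweighted statistic under either null, assuming untied doses and G >= 3. The helper name is hypothetical; it omits the tie rejection, exact-linear short-circuit, and degenerate-scale handling described in the Notes, and it uses the standard library's statistics.NormalDist for the one-sided critical value and p-value.

```python
from statistics import NormalDist

import numpy as np


def yatchew_sketch(d, dy, alpha=0.05, null="linearity"):
    """Hypothetical unweighted T_hr: returns (t_stat_hr, p_value, reject)."""
    d = np.asarray(d, dtype=float)
    dy = np.asarray(dy, dtype=float)
    G = d.size
    if null == "linearity":
        X = np.column_stack([np.ones(G), d])
        coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
        eps = dy - X @ coef                        # residuals from dy ~ 1 + d
    elif null == "mean_independence":
        eps = dy - dy.mean()                       # residuals from dy ~ 1
    else:
        raise ValueError(f"unknown null: {null!r}")

    order = np.argsort(d, kind="stable")           # (g) = g-th unit after sorting by d
    dy_s, eps_s = dy[order], eps[order]
    sigma2_lin = np.mean(eps ** 2)
    sigma2_diff = np.sum(np.diff(dy_s) ** 2) / (2 * G)
    sigma4_w = np.sum(eps_s[1:] ** 2 * eps_s[:-1] ** 2) / (G - 1)
    t_hr = np.sqrt(G) * (sigma2_lin - sigma2_diff) / np.sqrt(sigma4_w)

    std_normal = NormalDist()
    critical_value = std_normal.inv_cdf(1 - alpha)   # one-sided z_{1-alpha}
    p_value = 1 - std_normal.cdf(float(t_hr))        # 1 - Phi(T_hr)
    return float(t_hr), p_value, bool(t_hr >= critical_value)
```

Under linearity, sigma2_lin and sigma2_diff both estimate the noise variance, so their difference is centered near zero; a misspecified fit inflates sigma2_lin while the sort-by-d differencing stays close to the noise variance, pushing T_hr into the upper tail.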

Parameters:
  • d (np.ndarray, shape (G,)) – Dose vector, one entry per unit.

  • dy (np.ndarray, shape (G,)) – First-difference outcome vector ΔY, aligned with d.

  • alpha (float, default 0.05) – One-sided significance level.

  • null ({"linearity", "mean_independence"}, keyword-only, default "linearity") –

    Which null hypothesis the test targets:

    • "linearity" (default): H_0: E[dY | D] is linear in D (paper Assumption 8, Theorem 7). Residuals come from OLS dy = a + b*d + eps. Results are bit-identical to calls made before the null= argument existed.

    • "mean_independence": H_0: E[dY | D] = E[dY] (mean independence of dY from D). Residuals come from intercept-only OLS dy = a + eps, so eps = dy - mean(dy). Mirrors R YatchewTest::yatchew_test(order=0). Used by the R-parity test on placebo Yatchew rows (Credible-Answers/did_had runs order=0 on placebos to test pre-trends as a non-parametric mean-independence assertion).

    d is required under both modes (the sort-by-d differencing step is null-agnostic).

  • survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Already-resolved survey design (per-unit). Array-in helpers accept ResolvedSurveyDesign ONLY; passing a SurveyDesign raises TypeError. For pweight-only, use survey_design=make_pweight_design(arr). When supplied, the OLS baseline becomes weighted OLS and all three variance components become their pweight-sandwich analogs. PSU clustering is NOT propagated through the variance-ratio statistic (would require deriving a survey-aware variance-of-variance estimator; out of scope per Phase 4.5 C). Replicate-weight designs raise NotImplementedError.

  • survey (ResolvedSurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of survey_design=make_pweight_design(arr). Will be removed in the next minor release.

Return type:

YatchewTestResults

Raises:
  • ValueError – If d or dy is not 1-D numeric or contains NaN; if d and dy have unequal lengths; if any d value is negative (paper Section 2 HAD support restriction); or if alpha is outside (0, 1). Also raised if more than one of survey_design, survey, weights is supplied (the three are mutually exclusive because survey= and weights= are deprecated aliases of survey_design=), or if any weight is non-positive.

  • TypeError – If survey_design=SurveyDesign(...) (or the deprecated survey=SurveyDesign(...) alias) is passed; array-in helpers accept ResolvedSurveyDesign only. Use survey_design=make_pweight_design(arr) for pweight-only or pre-resolve via SurveyDesign(...).resolve(data).

  • NotImplementedError – If the supplied survey design carries replicate weights (replicate_weights is not None); support is a deferred follow-up.

Notes

Scope (what this test does NOT cover). yatchew_hr_test targets paper Assumption 8 (linearity of E[ΔY | D_2] in D_2) under null="linearity" (default); null="mean_independence" swaps the residual definition to intercept-only dy ~ 1 for R parity with YatchewTest::yatchew_test(order=0) on pre-trend placebos. It does NOT and CANNOT test Assumptions 5 and 6 from de Chaisemartin et al. (2026) Section 3.1.2, which are required for sign / point identification of WAS_{d_lower} on the Design 1 family (d_lower > 0). Assumptions 5/6 are non-testable via pre-trends; they are surfaced by the Design 1 fit-time UserWarning and by T21 tutorial prose, NOT by the workflow verdict string. See HeterogeneousAdoptionDiD class docstring Notes for the full statement.

Sample-size gate: below G = 3 the difference-variance estimator is undefined, so the function emits a UserWarning and returns NaN rather than raising.

Dose ties: rejected with a UserWarning and an all-NaN result. The difference-based variance estimator sigma2_diff and the heteroskedasticity-robust scale sigma4_W are built from adjacent differences and adjacent products, respectively, of quantities sorted by d. Under tied doses the within-tie row ordering is arbitrary (a stable sort falls back to input order), so the statistic depends on row order rather than only on the data. Callers with tied doses (mass-point designs, discretised dose registers) should use stute_test() instead: its tie-safe Cramer-von Mises statistic collapses each tie block to the post-tie cumulative sum and is provably invariant to within-tie permutations.
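A sketch of why a Cramer-von Mises cusum can be made tie-safe, assuming that "collapsing tie blocks to the post-tie cumulative sum" means every observation in a tie block takes the cumulative residual sum through the end of its block (equivalently, evaluating sum_h eps_h * 1{d_h <= d_g} at each observed dose). Both helpers are hypothetical and unweighted.

```python
import numpy as np


def cvm_naive(d, eps):
    """Plain sorted cusum: depends on row order inside tie blocks."""
    cusum = np.cumsum(eps[np.argsort(d, kind="stable")])
    return float(np.sum(cusum ** 2)) / len(d) ** 2


def cvm_tie_safe(d, eps):
    """Every position in a tie block takes the cusum at the block's end."""
    order = np.argsort(d, kind="stable")
    d_sorted = d[order]
    cusum = np.cumsum(eps[order])
    # Index of the last element of each observation's tie block (sorted order).
    block_end = np.searchsorted(d_sorted, d_sorted, side="right") - 1
    return float(np.sum(cusum[block_end] ** 2)) / len(d) ** 2


d = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 3.0])
eps = np.array([0.5, -1.0, 0.2, 0.3, -0.4, 0.4])
perm = np.array([2, 0, 1, 4, 3, 5])   # reorders rows only within tie blocks

naive_changes = cvm_naive(d, eps) != cvm_naive(d[perm], eps[perm])
tie_safe_same = np.isclose(cvm_tie_safe(d, eps), cvm_tie_safe(d[perm], eps[perm]))
```

The block-end cusum is a sum over the whole block, so any permutation inside the block leaves it unchanged. Adjacent differences in sigma2_diff have no analogous block-level summary, which is why the Yatchew statistic rejects ties instead.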

Exact-linear short-circuit: when the OLS residual sum of squares is numerically zero relative to the centered total sum of squares (sum(eps^2) <= 1e-24 * sum((dy - dybar)^2), i.e. 1 - R^2 is essentially 0), the test short-circuits to t_stat_hr=-inf, p_value=1.0, reject=False. In that case Assumption 8 holds exactly: sigma2_lin and sigma2_W are zero while sigma2_diff is typically positive, so the formal statistic is -inf, which never exceeds the one-sided critical value, and the correct decision is fail-to-reject. The shortcut is translation-invariant because the comparison is against the centered TSS (not the raw sum(dy^2)).

Degenerate sigma4_W = 0 with non-zero residuals: when the adjacent-residual-product sum vanishes AFTER the exact-linear shortcut is bypassed (e.g. residuals alternate zero/non-zero after sorting), the formal statistic is +inf or -inf depending on the sign of the numerator sigma2_lin - sigma2_diff. The function returns the sign-aware limit (p=0, reject=True for positive numerator; p=1, reject=False for negative; NaN for zero) with a UserWarning, rather than unconditionally mapping this to p=1 (which would flip a legitimate rejection).
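The two edge-case rules can be summarised as a small decision function. This is a sketch under stated assumptions: the helper and argument names are hypothetical, and it only returns the degenerate outcomes described in the exact-linear and degenerate-sigma4_W notes, leaving the regular statistic to the caller.

```python
import math

EXACT_LINEAR_RTOL = 1e-24   # relative tolerance quoted in the exact-linear note


def yatchew_degenerate_branches(ssr, centered_tss, numerator, sigma4_w):
    """Return (t_stat_hr, p_value, reject) for a degenerate case, else None.

    ssr          -- sum(eps^2) under the chosen null
    centered_tss -- sum((dy - mean(dy))^2)
    numerator    -- sigma2_lin - sigma2_diff
    sigma4_w     -- adjacent-residual-product term under the square root
    """
    if ssr <= EXACT_LINEAR_RTOL * centered_tss:
        return -math.inf, 1.0, False           # exact linearity: fail to reject
    if sigma4_w == 0.0:
        if numerator > 0:
            return math.inf, 0.0, True         # positive / 0 -> +inf
        if numerator < 0:
            return -math.inf, 1.0, False       # negative / 0 -> -inf
        return math.nan, math.nan, False       # 0 / 0 is undefined
    return None                                # regular path: compute T_hr
```

Checking the exact-linear case first matters: with all-zero residuals both the numerator's sigma2_lin term and sigma4_w vanish, and the sign-aware branch alone would not see the exact fit.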

Survey/weighted data (Phase 4.5 C): when survey_design (or one of the deprecated survey= / weights= aliases) is supplied, all three variance components use their pweight-sandwich analogs:

  • sigma2_lin = sum(w * eps^2) / sum(w) (weighted OLS residual variance).

  • sigma2_diff = sum(w_avg * (dy_g - dy_{g-1})^2) / (2 * sum(w)) where w_avg_g = (w_g + w_{g-1}) / 2 and the divisor uses sum(w) (not sum(w_avg)) so the formula reduces bit-exactly to the unweighted (1/(2G)) divisor at w = ones(G).

  • sigma4_W = sum(w_avg * eps_g^2 * eps_{g-1}^2) / sum(w_avg) with arithmetic-mean pair weights; reduces to the unweighted (1/(G-1)) divisor at w = ones(G).

  • T_hr = sqrt(sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W.

The pair-weight convention follows Krieger-Pfeffermann (1997, §3) for design-consistent inference on smooth functionals; PSU clustering is NOT propagated through the variance-ratio statistic. Strictly positive weights are required (the adjacent-pair formula for sigma4_W has sum(w_avg) in the denominator). Keeping weights, strata, PSU, and FPC constant within each unit is the CALLER’s responsibility.
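The weighted formulas listed above can be checked numerically. The sketch below is a hypothetical helper, not diff_diff's internals: it computes weighted OLS by rescaling rows with sqrt(w), applies the four formulas as written, and ignores PSU structure (matching the note that clustering is not propagated). At w = ones(G) it should reproduce the unweighted divisors G, 2G, and G - 1.

```python
import numpy as np


def weighted_yatchew_components(d, dy, w):
    """Return (sigma2_lin, sigma2_diff, sigma4_W, T_hr) with pweights w."""
    d, dy, w = (np.asarray(a, dtype=float) for a in (d, dy, w))
    if np.any(w <= 0):
        raise ValueError("weights must be strictly positive")
    # Weighted OLS of dy on [1, d] via sqrt(w) row rescaling.
    X = np.column_stack([np.ones_like(d), d])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], dy * sw, rcond=None)
    eps = dy - X @ coef

    order = np.argsort(d, kind="stable")
    dy_s, eps_s, w_s = dy[order], eps[order], w[order]
    w_avg = (w_s[1:] + w_s[:-1]) / 2           # arithmetic-mean pair weights

    sigma2_lin = np.sum(w * eps ** 2) / np.sum(w)
    sigma2_diff = np.sum(w_avg * np.diff(dy_s) ** 2) / (2 * np.sum(w))
    sigma4_w = np.sum(w_avg * eps_s[1:] ** 2 * eps_s[:-1] ** 2) / np.sum(w_avg)
    t_hr = np.sqrt(np.sum(w)) * (sigma2_lin - sigma2_diff) / np.sqrt(sigma4_w)
    return sigma2_lin, sigma2_diff, sigma4_w, t_hr
```

Dividing sigma2_diff by 2 * sum(w) rather than 2 * sum(w_avg) is the choice that makes the unit-weight case collapse to the paper-literal 1/(2G) divisor.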

References

Yatchew, A. (1997). An elementary estimator of the partial linear model. Economics Letters 57, 135-143.

de Chaisemartin et al. (2026), Theorem 7 / Equation 29.

Krieger, A., Pfeffermann, D. (1997). Testing of distribution functions from complex sample surveys. Journal of Official Statistics 13(2), 123-142.

class diff_diff.QUGTestResults

Bases: object

Result of qug_test() (paper Theorem 4).

The QUG test rejects H_0: d_lower = 0 when the order-statistic ratio T = D_{(1)} / (D_{(2)} - D_{(1)}) exceeds 1/alpha - 1. Under the null, the asymptotic limit law of T is the ratio of two independent Exp(1) random variables, with CDF F(t) = t / (1 + t), so p_value = 1 / (1 + T).
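Because 1/(1 + T) < alpha exactly when T > 1/alpha - 1, rejecting on the order-statistic ratio and rejecting on p_value < alpha are the same decision. The hypothetical helper below sketches the computation (zero doses dropped, NaN on fewer than two positive doses or a tie between the two smallest); it is not the library's qug_test().

```python
import numpy as np


def qug_sketch(d, alpha=0.05):
    """Return (t_stat, p_value, reject, critical_value) for H_0: d_lower = 0."""
    d = np.asarray(d, dtype=float)
    positive = np.sort(d[d > 0])                   # zero doses are excluded
    critical_value = 1 / alpha - 1
    if positive.size < 2 or positive[1] == positive[0]:
        return np.nan, np.nan, False, critical_value
    t_stat = positive[0] / (positive[1] - positive[0])   # D_(1) / (D_(2) - D_(1))
    p_value = 1 / (1 + t_stat)                     # 1 - F(T) with F(t) = t / (1 + t)
    return float(t_stat), float(p_value), bool(t_stat > critical_value), critical_value
```

For i.i.d. uniform doses, D_(1) and D_(2) - D_(1) are exchangeable spacings, so T = U/(1 - U) with U uniform on (0, 1) and the null rejection rate equals alpha exactly; simulating that case is a cheap sanity check.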

t_stat

D_{(1)} / (D_{(2)} - D_{(1)}). NaN when fewer than 2 non-zero observations remain or when the two smallest doses tie.

Type:

float

p_value

1 / (1 + t_stat) under the null. NaN when t_stat is NaN.

Type:

float

reject

True iff t_stat > critical_value. False on NaN statistic.

Type:

bool

alpha

Significance level used.

Type:

float

critical_value

1 / alpha - 1. Populated even when the statistic is NaN so downstream readers can inspect the decision threshold.

Type:

float

n_obs

Number of observations after filtering to d > 0.

Type:

int

n_excluded_zero

Number of zero-dose observations excluded from the sample.

Type:

int

d_order_1

Smallest positive dose D_{(1)}. NaN when n_obs < 2.

Type:

float

d_order_2

Second-smallest positive dose D_{(2)}. NaN when n_obs < 2.

Type:

float

t_stat: float
p_value: float
reject: bool
alpha: float
critical_value: float
n_obs: int
n_excluded_zero: int
d_order_1: float
d_order_2: float
summary()

Formatted summary table.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return results as a JSON-safe dict.

Return type:

Dict[str, Any]

to_dataframe()

Return a one-row DataFrame of the result dict.

Return type:

DataFrame

__init__(t_stat, p_value, reject, alpha, critical_value, n_obs, n_excluded_zero, d_order_1, d_order_2)
Return type:

None

class diff_diff.StuteTestResults

Bases: object

Result of stute_test() (paper Appendix D).

The Stute test rejects the null that E[ΔY | D_2] is linear in D_2 (paper Assumption 8) when the sorted-residual CvM statistic S = (1/G^2) Σ (Σ_{h=1}^g eps_{(h)})^2 exceeds the Mammen wild bootstrap 1 - alpha quantile.

cvm_stat

CvM statistic. NaN when G < 10 (below the threshold the statistic is not well-calibrated).

Type:

float

p_value

Bootstrap p-value (1 + sum(S_b >= S)) / (B + 1). NaN when the statistic is NaN.

Type:

float

reject

True iff p_value <= alpha. False on NaN.

Type:

bool

alpha

Significance level used.

Type:

float

n_bootstrap

Number of Mammen wild bootstrap replications.

Type:

int

n_obs

Number of observations.

Type:

int

seed

Seed passed to np.random.default_rng. None when unseeded.

Type:

int or None

cvm_stat: float
p_value: float
reject: bool
alpha: float
n_bootstrap: int
n_obs: int
seed: int | None
summary()

Formatted summary table.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return results as a JSON-safe dict.

Return type:

Dict[str, Any]

to_dataframe()

Return a one-row DataFrame of the result dict.

Return type:

DataFrame

__init__(cvm_stat, p_value, reject, alpha, n_bootstrap, n_obs, seed)
Return type:

None

class diff_diff.YatchewTestResults

Bases: object

Result of yatchew_hr_test() (paper Theorem 7 / Equation 29).

Heteroskedasticity-robust specification test using Yatchew’s difference-based variance estimator. Two nulls are supported via the null= argument on yatchew_hr_test() and reflected on the null_form attribute below: "linearity" (default; paper Theorem 7, the same null as stute_test(), residuals from OLS dy ~ 1 + d) and "mean_independence" (R-parity extension mirroring R YatchewTest::yatchew_test(order=0), residuals from intercept-only OLS dy ~ 1). The test statistic T_hr = sqrt(G) * (sigma2_lin - sigma2_diff) / sigma2_W is asymptotically N(0, 1) under H_0 in both modes; rejection uses the one-sided standard-normal critical value. Only the residual definition (and therefore sigma2_lin) differs between modes — the sigma2_diff / sigma2_W / sort-by-d machinery is shared.

t_stat_hr

Test statistic T_hr from paper Equation 29. NaN when G < 3.

Type:

float

p_value

1 - Phi(T_hr). NaN when the statistic is NaN.

Type:

float

reject

True iff T_hr >= critical_value. False on NaN.

Type:

bool

alpha

Significance level used.

Type:

float

critical_value

One-sided standard-normal critical value z_{1 - alpha}.

Type:

float

sigma2_lin

Residual variance under the chosen null. Under null_form="linearity": residual variance from OLS of dy on d. Under null_form="mean_independence": (1/G) * sum((dy - mean(dy))^2), the population variance of dy.

Type:

float

sigma2_diff

Yatchew differencing variance (1 / (2G)) * sum((dy_{(g)} - dy_{(g-1)})^2) - divisor is 2G (paper-literal), NOT 2(G-1).

Type:

float

sigma2_W

Heteroskedasticity-robust scale sqrt((1 / (G-1)) * sum(eps_{(g)}^2 * eps_{(g-1)}^2)).

Type:

float

n_obs

Number of observations.

Type:

int

null_form

"linearity" (default; H_0: E[dY|D] is linear in D, residuals from OLS dy ~ 1 + d) or "mean_independence" (H_0: E[dY|D] = E[dY], residuals from intercept-only OLS dy ~ 1). Mirrors the order argument of R YatchewTest::yatchew_test (order=1 corresponds to "linearity"; order=0 to "mean_independence").

Type:

str

t_stat_hr: float
p_value: float
reject: bool
alpha: float
critical_value: float
sigma2_lin: float
sigma2_diff: float
sigma2_W: float
n_obs: int
null_form: str = 'linearity'
summary()

Formatted summary table.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return results as a JSON-safe dict.

Return type:

Dict[str, Any]

to_dataframe()

Return a one-row DataFrame of the result dict.

Return type:

DataFrame

__init__(t_stat_hr, p_value, reject, alpha, critical_value, sigma2_lin, sigma2_diff, sigma2_W, n_obs, null_form='linearity')
Return type:

None

Joint multi-period tests (aggregate="event_study")

diff_diff.stute_joint_pretest(residuals_by_horizon, fitted_by_horizon, doses, design_matrix, *, alpha=0.05, n_bootstrap=999, seed=None, null_form='custom', survey_design=None, survey=None, weights=None)

Joint Cramer-von Mises pretest across multiple horizons.

Generalizes stute_test() to K horizons with the joint statistic S_joint = sum_k S_k, where S_k is the single-horizon CvM on residuals eps_{g,k}. Inference is via a Mammen wild bootstrap that draws one multiplier eta_g per unit and shares it across all horizons, preserving the unit-level dependence of the vector-valued empirical process.

Note: sum-of-CvMs aggregation follows the standard joint specification-test construction (Delgado 1993; Escanciano 2006). The paper does not prescribe an aggregation; sum-of-CvMs balances power across diffuse vs concentrated alternatives and bootstraps cleanly with the shared-eta structure.

Bootstrap uses the literal per-iteration OLS refit form (paper Appendix D) for consistency with Phase 3’s stute_test(). Because the design matrix is identical in every iteration, XtX_inv_Xt is precomputed once, so each refit costs O(Gp) per horizon per bootstrap draw and the overall loop is dominated by _cvm_statistic() across the K horizons.
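A minimal sketch of the joint construction, assuming unweighted data and untied doses. The argument names mirror stute_joint_pretest() for readability, but the helper itself is hypothetical: it has no survey path, no NaN / small-G / constant-dose / exact-linear branches, and no tie collapsing. The key detail is that eta is drawn once per unit per replicate and reused for every horizon.

```python
import numpy as np


def joint_stute_sketch(residuals_by_horizon, fitted_by_horizon, doses,
                       design_matrix, n_bootstrap=999, seed=None):
    """Return (S_joint, p_value) for a hypothetical unweighted joint CvM."""
    rng = np.random.default_rng(seed)
    doses = np.asarray(doses, dtype=float)
    G = doses.size
    labels = list(residuals_by_horizon)
    order = np.argsort(doses, kind="stable")       # dose is shared across horizons
    hat = np.linalg.pinv(design_matrix)            # XtX_inv_Xt, computed once
    phi = (1 + np.sqrt(5)) / 2

    def cvm(eps):
        return float(np.sum(np.cumsum(eps[order]) ** 2)) / G ** 2

    s_joint = sum(cvm(residuals_by_horizon[k]) for k in labels)

    exceed = 0
    for _ in range(n_bootstrap):
        # One Mammen multiplier per unit, reused by every horizon in this draw.
        eta = np.where(rng.random(G) < phi / np.sqrt(5), 1 - phi, phi)
        s_b = 0.0
        for k in labels:
            dy_b = fitted_by_horizon[k] + residuals_by_horizon[k] * eta
            eps_b = dy_b - design_matrix @ (hat @ dy_b)   # per-horizon OLS refit
            s_b += cvm(eps_b)
        exceed += int(s_b >= s_joint)
    return s_joint, (1 + exceed) / (n_bootstrap + 1)
```

Drawing independent multipliers per horizon would treat the K residual vectors of the same unit as unrelated, which understates the dependence the joint statistic inherits from shared units.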

Parameters:
  • residuals_by_horizon (dict[str, np.ndarray]) – {label: eps_g} per horizon. All values must have identical length G and be unit-ordered consistently with doses.

  • fitted_by_horizon (dict[str, np.ndarray]) – {label: fitted_g} per horizon. Required to reconstruct bootstrap outcomes dy*_{g,k} = fitted_{g,k} + eps_{g,k} * eta_g under the null.

  • doses (np.ndarray, shape (G,)) – Dose per unit. Shared across horizons (HAD contract: dose is time-invariant per unit). Must be finite and non-negative.

  • design_matrix (np.ndarray, shape (G, p)) – Regression design used in the per-horizon bootstrap refit. Mean-independence: [1] (intercept only). Linearity: [1, doses]. The matrix is identical across horizons.

  • alpha (see stute_test().)

  • n_bootstrap (see stute_test().)

  • seed (see stute_test().)

  • null_form (str) – Diagnostic label recorded on the result ("mean_independence" | "linearity" | "custom"). The wrappers joint_pretrends_test() and joint_homogeneity_test() set this automatically.

  • survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Already-resolved per-unit survey design (Phase 4.5 C). Array-in helpers accept ResolvedSurveyDesign ONLY; passing a SurveyDesign raises TypeError. For pweight-only, use survey_design=make_pweight_design(arr). When supplied, the bootstrap is a PSU-level Mammen multiplier bootstrap whose multiplier matrix is shared across horizons within each replicate, preserving both the unit-level dependence of the vector-valued empirical process and PSU clustering. Replicate-weight designs raise NotImplementedError; non-pweight weight types are rejected. Variance-unidentified designs (df_survey <= 0) return NaN with a UserWarning instead of calibrating against an all-zero multiplier matrix.

  • survey (ResolvedSurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of survey_design=make_pweight_design(arr). Will be removed in the next minor release.

Returns:

On the common path, a populated result with bootstrap-based p_value and cvm_stat_joint. On the small-sample branch (G < _MIN_G_STUTE), constant-dose branch (np.ptp(doses) <= 0), or any-NaN branch in the input residuals / fitted arrays, returns an all-NaN result (with reject=False and the full per_horizon_stats dict keyed by the validated horizon labels) and emits a UserWarning for the first two branches. Mirrors the single-horizon stute_test() contract so event-study workflows on small or staggered-filtered panels surface an inconclusive report rather than crashing.

Return type:

StuteJointResult

Raises:

ValueError – On empty input, key-mismatch, stringified-label collisions between distinct raw keys, shape-mismatch, doses containing negative values, n_bootstrap < _MIN_N_BOOTSTRAP, or invalid alpha. G < _MIN_G_STUTE does NOT raise; see Returns.

diff_diff.joint_pretrends_test(data, outcome_col, dose_col, time_col, unit_col, pre_periods, base_period, first_treat_col=None, *, alpha=0.05, n_bootstrap=999, seed=None, survey_design=None, survey=None, weights=None, trends_lin=False)

Joint Stute pre-trends test (paper Section 4.2 step 2).

Data-in wrapper around stute_joint_pretest() for the mean-independence null E[Y_{g,t} - Y_{g,base} | D_{g,treat}] = mu_t across multiple pre-period placebos. For each t in pre_periods, residuals are the deviations of Y_{g,t} - Y_{g,base} from their cross-unit mean (an intercept-only OLS fit); the joint CvM then tests whether this conditional mean depends on D.

Use this wrapper to close the paper’s step-2 pre-trends gap that did_had_pretest_workflow() otherwise flags. On a panel with at least one earlier pre-period, the aggregate="event_study" dispatch calls this wrapper internally.
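The residual construction behind this wrapper can be sketched with pandas. This is an illustration under stated assumptions, not the wrapper's code: the helper name and its treat_period argument (the period whose dose D_{g,treat} conditions the placebos) are hypothetical, and it skips panel validation, staggered-cohort filtering, survey weights, and trends_lin. Labels are str(t) to mirror horizon_labels.

```python
import numpy as np
import pandas as pd


def pretrends_inputs_sketch(data, outcome_col, dose_col, time_col, unit_col,
                            pre_periods, base_period, treat_period):
    """Build (residuals, fitted, doses, design_matrix) for the mean-independence null."""
    wide = data.pivot(index=unit_col, columns=time_col, values=outcome_col)
    doses = (
        data.loc[data[time_col] == treat_period]
        .set_index(unit_col)[dose_col]
        .reindex(wide.index)
        .to_numpy(dtype=float)
    )
    residuals, fitted = {}, {}
    for t in pre_periods:
        evolution = (wide[t] - wide[base_period]).to_numpy(dtype=float)
        mu_t = evolution.mean()                    # intercept-only OLS fit
        fitted[str(t)] = np.full_like(evolution, mu_t)
        residuals[str(t)] = evolution - mu_t
    design_matrix = np.ones((len(wide), 1))        # [1]: mean-independence design
    return residuals, fitted, doses, design_matrix
```

The outputs line up with the residuals_by_horizon, fitted_by_horizon, doses, and design_matrix arguments of stute_joint_pretest().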

Parameters:
  • data (pd.DataFrame)

  • outcome_col (str)

  • dose_col (str)

  • time_col (str)

  • unit_col (str)

  • pre_periods (list) – Non-empty list of pre-period labels (all < base_period, all with D = 0 across every unit). Empty list raises; the workflow dispatch handles the “no earlier pre-period” case by setting pretrends_joint=None rather than calling this wrapper.

  • base_period (period label) – The reference period. Must not be in pre_periods. Must also satisfy D = 0 across every unit, the same HAD invariant imposed on the pre-periods (the base is itself a pre-period in the four-step workflow).

  • first_treat_col (str or None) – Forwarded to the underlying panel validator; cohort handling follows the HAD contract (a staggered panel is auto-filtered to the last-treatment cohort with a warning; a single cohort proceeds unchanged).

  • alpha (as in stute_test().)

  • n_bootstrap (as in stute_test().)

  • seed (as in stute_test().)

  • survey_design (SurveyDesign or None, keyword-only, default None) – Survey design (Phase 4.5 C). Resolved on the filtered panel; replicate-weight designs raise NotImplementedError; weight_type must be "pweight". Forwarded to stute_joint_pretest() as a per-unit ResolvedSurveyDesign. Mutually exclusive with the deprecated survey= and weights= aliases.

  • survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias for the per-row pweight shortcut. Prefer survey_design=SurveyDesign(weights='col_name') against your dataframe instead. Will be removed in the next minor release.

  • trends_lin (bool, default False, keyword-only) – When True, applies paper Eq 17 / Eq 18 linear-trend detrending: per-group slope estimated as Y[g, base] - Y[g, base - 1] and subtracted from each pre-period horizon’s outcome evolution as (t - base) × slope. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE) on its joint Stute pre-trends surface (paper Section 5.2 Pierce-Schott application). Requires base_period to equal the last validated pre-period (t_pre_list[-1], i.e. the canonical F-1 anchor). Direct callers passing a non-terminal base get a ValueError — Eq 17 / R both anchor at F-1 and any other anchor would compute a different slope and detrending. The previous validated pre-period (t_pre_list[-2], F-2) must also be present so the slope is identified. The “consumed” placebo at F-2 is dropped from pre_periods explicitly (its detrended residual is mechanically zero by construction); a UserWarning fires when the filter triggers. If pre_periods becomes empty after the drop, raises ValueError (no testable placebo horizons remain). Mutually exclusive with survey weighting (survey_design / survey / weights); raises NotImplementedError if combined. Default False preserves bit-exact backcompat.
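Assuming integer-coded periods (so t - base counts periods), the Eq 17 / Eq 18 detrending reduces to one line per horizon. The hypothetical helper below also shows why the F-2 placebo is consumed: at t = base - 1 the detrended value is (Y_{F-2} - Y_{F-1}) + (Y_{F-1} - Y_{F-2}) = 0 for every unit.

```python
import numpy as np


def detrended_evolution(y_t, y_base, y_base_minus_1, t, base):
    """(Y_t - Y_base) - (t - base) * slope, with slope = Y_base - Y_{base-1}.

    Assumes integer-coded periods so that t - base counts periods.
    """
    y_t = np.asarray(y_t, dtype=float)
    y_base = np.asarray(y_base, dtype=float)
    slope = y_base - np.asarray(y_base_minus_1, dtype=float)   # per-group slope
    return (y_t - y_base) - (t - base) * slope
```

A unit whose outcome follows an exact linear trend a + b * t yields zero at every horizon, so only departures from each group's own linear path remain for the joint CvM to test.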

Return type:

StuteJointResult with null_form = "mean_independence".

diff_diff.joint_homogeneity_test(data, outcome_col, dose_col, time_col, unit_col, post_periods, base_period, first_treat_col=None, *, alpha=0.05, n_bootstrap=999, seed=None, survey_design=None, survey=None, weights=None, trends_lin=False)

Joint Stute homogeneity-linearity test (paper Section 4.3 joint).

Data-in wrapper around stute_joint_pretest() for the linearity null E[Y_{g,t} - Y_{g,base} | D_{g,t}] = beta_{0,t} + beta_{fe,t} * D_{g,t} across multiple post-period horizons. For each t in post_periods, residuals are from an OLS regression of Y_{g,t} - Y_{g,base} on [1, D_g]; the joint CvM tests whether the conditional mean is nonlinear in D in any horizon.
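The per-horizon linearity inputs can be sketched as an OLS of each outcome evolution on [1, D_g], with the same design matrix for every horizon. The helper and its evolutions_by_horizon argument are hypothetical; it assumes the evolutions Y_{g,t} - Y_{g,base} have already been computed and aligned with doses.

```python
import numpy as np


def linearity_inputs_sketch(evolutions_by_horizon, doses):
    """Per-horizon OLS of (Y_t - Y_base) on [1, D]; returns residuals, fitted, X."""
    doses = np.asarray(doses, dtype=float)
    X = np.column_stack([np.ones_like(doses), doses])   # same design every horizon
    hat = np.linalg.pinv(X)
    residuals, fitted = {}, {}
    for label, evolution in evolutions_by_horizon.items():
        evolution = np.asarray(evolution, dtype=float)
        beta = hat @ evolution                          # (beta_{0,t}, beta_{fe,t})
        fitted[label] = X @ beta
        residuals[label] = evolution - fitted[label]
    return residuals, fitted, X
```

The fitted values are what the bootstrap needs to rebuild dy*_{g,k} = fitted_{g,k} + eps_{g,k} * eta_g under the linearity null.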

Used by did_had_pretest_workflow() with aggregate="event_study" as the step-3 test (no joint Yatchew variant exists - the paper does not derive one; users who need Yatchew-style adjacent-difference robustness can call yatchew_hr_test() on each (base, post) pair manually).

Parameters:
  • data (pd.DataFrame)

  • outcome_col (str)

  • dose_col (str)

  • time_col (str)

  • unit_col (str)

  • post_periods (list) – Non-empty list of post-period labels (all strictly > base_period by chronological order; each with D > 0 for some unit, i.e. at least one treated unit per horizon).

  • base_period (period label) – The reference period (last pre-period in the event-study convention). Must not be in post_periods.

  • first_treat_col (str or None) – Forwarded to the underlying panel validator.

  • alpha (as in stute_test().)

  • n_bootstrap (as in stute_test().)

  • seed (as in stute_test().)

  • survey_design (SurveyDesign or None, keyword-only, default None) – Survey design (Phase 4.5 C). Same contract as joint_pretrends_test(). Mutually exclusive with the deprecated survey= and weights= aliases.

  • survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.

  • weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias for the per-row pweight shortcut. Prefer survey_design=SurveyDesign(weights='col_name') against your dataframe instead. Will be removed in the next minor release.

  • trends_lin (bool, default False, keyword-only) – When True, applies paper page-32 linear-trend detrending: a per-group slope estimated as Y[g, base] - Y[g, base - 1] is applied to each post-period horizon’s outcome evolution as (t - base) × slope (forward extrapolation into the post-periods). Same slope estimator as joint_pretrends_test(). Mirrors R DIDHAD::did_had(..., trends_lin=TRUE) on its joint homogeneity surface (paper Section 4.3, Pierce-Schott p=0.40 anchor). Requires base_period to equal the last validated pre-period (t_pre_list[-1], the canonical F-1 anchor) AND F-2 to be present in the panel so the slope is identified. Direct callers passing a non-terminal base get a ValueError, because Eq 17 and R both anchor at F-1. Mutually exclusive with survey weighting; raises NotImplementedError if combined. Default False preserves bit-exact backcompat.

Return type:

StuteJointResult with null_form = "linearity".

class diff_diff.StuteJointResult

Bases: object

Result of stute_joint_pretest() (joint Cramer-von Mises across horizons).

Aggregates the per-horizon Stute (1997) CvM statistic into a joint specification test: S_joint = sum_k S_k, where S_k is the single-horizon CvM on residuals eps_{g,k}. Inference is via Mammen (1993) wild bootstrap with a shared multiplier eta_g across horizons per unit (Delgado-Manteiga 2001; Hlavka-Huskova 2020) to preserve the unit-level dependence structure of the vector-valued empirical process.

Two nulls are supported via the thin wrappers joint_pretrends_test() (mean-independence: E[Y_t - Y_base | D] = mu_t, design matrix [1]) and joint_homogeneity_test() (linearity: E[Y_t - Y_base | D_t] = beta_{0,t} + beta_{fe,t} * D, design matrix [1, D]). Both wrappers accept a trends_lin: bool = False keyword-only flag (PR #392): when True, applies paper Eq 17 / Eq 18 linear-trend detrending before the joint CvM using per-group slope Y[g, F-1] - Y[g, F-2].

cvm_stat_joint

Joint statistic S_joint = sum_k S_k. NaN on NaN-propagation.

Type:

float

p_value

Bootstrap p-value (1 + sum(S*_b >= S_joint)) / (B + 1). NaN when the statistic is NaN. 1.0 when the per-horizon exact-linear short-circuit fires (all horizons machine-exact linear).

Type:

float

reject

True iff p_value <= alpha. Always False on NaN.

Type:

bool

alpha

Significance level.

Type:

float

horizon_labels

Horizon identifiers as str(t) for each period. String identity only - NOT a chronological ordering key. Callers who need chronological order should preserve the original period values alongside (a downstream plotter sorting labels lexicographically will misorder e.g. ["2003-Q10", "2003-Q2", ...]).

Type:

list of str
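A small illustration of the ordering pitfall, using hypothetical integer period values: sorting the string labels puts "10" before "2", while keeping the raw period values alongside restores chronological order.

```python
# Hypothetical integer period values; horizon_labels stores str(t).
post_periods = [2, 10, 3]
horizon_labels = [str(t) for t in post_periods]

lexicographic = sorted(horizon_labels)                  # ['10', '2', '3']
label_to_period = {str(t): t for t in post_periods}     # keep the raw values alongside
chronological = sorted(horizon_labels, key=label_to_period.__getitem__)   # ['2', '3', '10']
```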

per_horizon_stats

{label: S_k} diagnostic. Per-horizon p-values are NOT exposed (decomposing the joint bootstrap into K independent loops is a K-fold memory/time cost; deferred). Callers who need per-horizon p-values can call stute_test() separately on each (period, residual) pair.

On NaN-propagation (any horizon has NaN input), this dict is preserved with {label: np.nan for label in horizon_labels}, NOT an empty dict, NOT a partial dict: the keys carry diagnostic value (which horizons were attempted), the NaN values signal non-propagation.

Type:

dict[str, float]

n_bootstrap

Number of Mammen wild bootstrap replications B.

Type:

int

n_obs

Number of units G.

Type:

int

n_horizons

Number of horizons K (the length of horizon_labels).

Type:

int

seed

Seed passed to np.random.default_rng. None when unseeded.

Type:

int or None

null_form

"mean_independence" (from joint_pretrends_test()) or "linearity" (from joint_homogeneity_test()). "custom" when called directly via stute_joint_pretest() without a wrapper.

Type:

str

exact_linear_short_circuited

True when every horizon’s residual SSR to centered TSS ratio is below _EXACT_LINEAR_RELATIVE_TOL; bootstrap is skipped and p_value = 1.0. The per-horizon check ensures a single degenerate horizon does not collapse the joint test when other horizons have nontrivial residuals.

Type:

bool

cvm_stat_joint: float
p_value: float
reject: bool
alpha: float
horizon_labels: list
per_horizon_stats: Dict[str, float]
n_bootstrap: int
n_obs: int
n_horizons: int
seed: int | None
null_form: str
exact_linear_short_circuited: bool
summary()

Formatted summary table.

Return type:

str

print_summary()

Print the summary to stdout.

Return type:

None

to_dict()

Return results as a JSON-safe dict.

Return type:

Dict[str, Any]

to_dataframe()

Return a one-row DataFrame of the top-level result fields.

Return type:

DataFrame

__init__(cvm_stat_joint, p_value, reject, alpha, horizon_labels, per_horizon_stats, n_bootstrap, n_obs, n_horizons, seed, null_form, exact_linear_short_circuited)
Return type:

None