Heterogeneous Adoption Difference-in-Differences
Estimator for designs where no unit remains untreated at the post period. Every unit g is exposed to treatment at the same single date but adoption intensity (dose) varies across units; there is no genuinely untreated control group to anchor a standard DiD contrast.
This module implements the methodology from de Chaisemartin, Ciccia, D’Haultfœuille & Knau (2026), “Difference-in-Differences Estimators When No Unit Remains Untreated” (arXiv:2405.04465v6), which:
- Targets WAS or WAS_{d̲} depending on design path: Design 1’ (the QUG / Quasi-Untreated-Group case with d̲ = 0) identifies the Weighted Average Slope (WAS, paper Equation 2); Design 1 (no QUG, d̲ > 0) identifies WAS_{d̲} under Assumption 6, or sign identification only under Assumption 5 (neither additional assumption is testable via pre-trends). The shipped result classes expose target_parameter == "WAS" versus "WAS_d_lower" so callers can key on the resolved estimand.
- Estimates the target via local-linear regression at the dose support boundary, with three concrete fit paths: continuous_at_zero for Design 1’, and continuous_near_d_lower or mass_point for Design 1 (auto-detected from the dose distribution).
- Provides bias-corrected confidence intervals ported from the nprobust machinery for the continuous-dose paths, and a structural-residual 2SLS sandwich for the mass-point path.
- Extends to multi-period event-study settings (paper Appendix B.2), restricting staggered-timing panels to the last-treatment cohort (which retains never-treated units as comparisons) with pointwise per-horizon CIs.
Note
When to use HAD. Use HeterogeneousAdoptionDiD when your panel has
no untreated unit at the post period (e.g. universal-rollout policies,
industry-wide tariff changes) but treatment intensity varies across
units. For panels with a never-treated control group and continuous
treatment, use ContinuousDiD instead. For binary
reversible treatments, use ChaisemartinDHaultfoeuille.
Note
Inference contract. Per-horizon CIs are always pointwise. There are three SE regimes selected by call site:
- Unweighted - continuous paths use the CCT-2014 weighted-robust SE from the in-house lprobust port; the mass-point path uses a structural-residual 2SLS sandwich. No cross-horizon covariance.
- weights=np.ndarray shortcut (deprecated) - continuous paths reuse the CCT-2014 SE; the mass-point path uses an analytical weighted 2SLS sandwich (classical / hc1; CR1 when cluster= is supplied, except that cluster= + aggregate="event_study" + cband=True is rejected outright regardless of vcov_type per the cluster-combination deviation below; hc2 / hc2_bm raise NotImplementedError pending a 2SLS-specific leverage derivation). Yields variance_formula="pweight" / "pweight_2sls".
- survey_design=SurveyDesign(weights="col", ...) (canonical; accepts strata / PSU / FPC) - both paths compose Binder (1983) Taylor-series linearization with df_survey threaded into safe_inference. Yields variance_formula="survey_binder_tsl" / "survey_binder_tsl_2sls".
The two weighted paths currently produce different SE families on this
estimator (CCT-2014 / 2SLS pweight-sandwich vs Binder-TSL); the
deprecated weights= and survey= aliases will be removed in the
next minor release, at which point the long-term unification onto a
single SE contract under survey_design= lands. (Tracked in
TODO.md; the deprecation warning emitted by HeterogeneousAdoptionDiD.fit
spells the migration out per call site.) On array-in HAD pretest
helpers (stute_test, yatchew_hr_test, stute_joint_pretest)
the pweight-only shortcut is
survey_design=make_pweight_design(weights); data-in surfaces use
survey_design=SurveyDesign(weights="col_name", ...) against
data instead. qug_test is the exception: the QUG step has no
survey-aware migration target (Phase 4.5 C0 decision; see methodology
REGISTRY) and permanently raises NotImplementedError on any of
survey_design= / survey= / weights=. The composite
workflow did_had_pretest_workflow handles this by skipping QUG
under survey/weighted dispatch and emitting a UserWarning.
A simultaneous confidence band (sup-t) is available only on the
weighted event-study path via cband=True. Joint cross-horizon
analytical covariance is not computed in this release; tracked in
TODO.md.
Mass-point vcov_type="classical" deviation. The mass-point
survey_design=SurveyDesign(...) paths (static and event-study) and
the deprecated weights= + aggregate="event_study" +
cband=True path reject vcov_type="classical" with
NotImplementedError. The per-unit 2SLS influence function returned
by the mass-point fit is HC1-scaled so that
compute_survey_if_variance and the sup-t bootstrap target
V_HC1 consistently; mixing it with a classical analytical SE
would silently report a V_HC1-targeted variance under a
classical label. Use vcov_type="hc1" or set robust=True
explicitly (the constructor default robust=False maps to
vcov_type="classical", which triggers the guard); a
classical-aligned IF derivation is queued for a follow-up PR.
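The fallback from robust to vcov_type and the guard described above can be sketched in a few lines. This is an illustration only, not the library's code: resolve_vcov_type and check_mass_point_survey_vcov are hypothetical helper names. Only the mapping (robust=True -> "hc1", robust=False -> "classical") and the NotImplementedError on the classical survey path come from the text.

```python
from typing import Optional


def resolve_vcov_type(vcov_type: Optional[str], robust: bool) -> str:
    """Hypothetical helper: an explicit vcov_type wins, else fall back to robust."""
    if vcov_type is not None:
        return vcov_type
    return "hc1" if robust else "classical"


def check_mass_point_survey_vcov(vcov_type: Optional[str], robust: bool) -> str:
    """Hypothetical guard mirroring the documented rejection of 'classical'."""
    effective = resolve_vcov_type(vcov_type, robust)
    if effective == "classical":
        # The per-unit IF is HC1-scaled, so a classical label would be misleading.
        raise NotImplementedError(
            "mass-point survey path: use vcov_type='hc1' or robust=True"
        )
    return effective
```

With the constructor defaults (vcov_type=None, robust=False) the sketch resolves to "classical" and raises, which is why the note recommends setting one of the two knobs explicitly.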
Mass-point cluster-combination deviation. On
design="mass_point", two clustered weighted paths are rejected
outright regardless of vcov_type:
- survey_design=SurveyDesign(...) + cluster= (static and event-study): the survey path composes Binder-TSL variance, which would silently override the CR1 cluster-robust sandwich. Workarounds: cluster= alone (unweighted CR1), or weights= + cluster= (weighted-CR1 pweight sandwich), or survey_design= alone (Binder-TSL). Combined cluster-robust + survey inference is queued for a follow-up PR.
- Deprecated weights= shortcut + cluster= + aggregate="event_study" + cband=True: the sup-t bootstrap normalizes HC1-scale perturbations by the CR1 analytical SE, mixing variance families. Workarounds: pass cband=False (keeps weighted-CR1 per-horizon), or drop cluster= (keeps weighted-HC1 sup-t).
Tip
For an end-to-end walkthrough of the survey-aware HAD workflow on a
BRFSS-shape stratified household-survey panel - including the now-
supported SurveyDesign(strata=...) path through the Stute pretest
family (lifted in PR #432, 2026-05) - see
Tutorial 22: Survey-Weighted HAD.
HeterogeneousAdoptionDiD
- class diff_diff.HeterogeneousAdoptionDiD
Bases: object
Heterogeneous Adoption Difference-in-Differences estimator.
Implements de Chaisemartin, Ciccia, D’Haultfoeuille, and Knau (2026) Weighted-Average-Slope (WAS) estimator with three design-dispatch paths: Design 1’ (continuous-at-zero), Design 1 continuous-near-d_lower, and Design 1 mass-point (2SLS sample-average per paper Section 3.2.4). Two aggregation modes:
- aggregate="overall" (Phase 2a, default) returns a single-period HeterogeneousAdoptionDiDResults on a two-period panel.
- aggregate="event_study" (Phase 2b, paper Appendix B.2) returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates on a multi-period panel, using a uniform F-1 anchor and pointwise CIs per horizon. Staggered-timing panels auto-filter to the last-treatment cohort plus never-treated units (paper Appendix B.2 prescription).
- Parameters:
design ({"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"}) –
Design-dispatch strategy. Defaults to "auto", which resolves via the REGISTRY auto-detect rule on the fitted dose data (see _detect_design()).
Explicit overrides are checked against the paper’s regime-partition contract (Section 3.2) at fit time:
- "continuous_at_zero" (Design 1’): paper requires the support infimum d_lower = 0. Phase 1c’s _validate_had_inputs rejects mass-point samples passed to this path.
- "continuous_near_d_lower" (Design 1, continuous density near d_lower): requires d_lower > 0 and a non-mass-point sample (modal fraction at d.min() must be <= 2%). d_lower must equal float(d.min()) within float tolerance; non-support-infimum thresholds are off-support and raise.
- "mass_point" (Design 1 mass-point): requires d_lower > 0 AND a mass-point sample (modal fraction at d.min() must be > 2%). d_lower must equal float(d.min()) within float tolerance. Forcing this design on a d_lower = 0 sample or on a continuous (non-mass-point) sample raises; in either case 2SLS identifies a different estimand than the paper’s Design 1 mass-point WAS.
Mismatched overrides raise ValueError pointing at the correct design rather than silently identifying a different estimand.
d_lower (float or None) – Support infimum d_lower. None means use 0.0 on the Design 1’ path and float(d.min()) on the other two paths. On Design 1 paths (continuous_near_d_lower and mass_point), an explicit d_lower must equal float(d.min()) within float tolerance AND must be strictly positive; zero-valued or mismatched thresholds raise.
kernel ({"epanechnikov", "triangular", "uniform"}) – Forwarded to bias_corrected_local_linear() on the continuous paths. Ignored on the mass-point path.
alpha (float) – CI level (0.05 for 95% CI).
vcov_type ({"classical", "hc1"} or None) – Mass-point-path only. When None, the effective family falls back to the robust flag: robust=True -> "hc1", robust=False -> "classical" (the default construction). Explicit "hc2" and "hc2_bm" raise NotImplementedError pending a 2SLS-specific leverage derivation. Ignored on the continuous paths (which use the CCT-2014 robust SE from Phase 1c); passing a non-default vcov_type on a continuous path emits a UserWarning per fit call.
robust (bool) – Backward-compat alias used only when vcov_type is None: True -> "hc1", False -> "classical". Explicit vcov_type takes precedence (e.g., vcov_type="classical", robust=True runs classical). Only the mass-point path consumes these; continuous paths ignore both with a warning.
cluster (str or None) – Column name for cluster-robust SE on the mass-point path (CR1). Ignored with a UserWarning on the continuous paths in Phase 2a (nonparametric cluster support exists on Phase 1c but is exposed separately via bias_corrected_local_linear; the estimator-level knob is queued for a follow-up PR).
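To make the regime partition concrete, here is a minimal sketch of a rule consistent with the constraints listed for the design parameter. It is not the library's _detect_design(): the function name sketch_detect_design is hypothetical, the zero-infimum check coming first is an assumption, and float tolerance and input validation are left out. Only the 2% modal-fraction cutoff and the d_lower = 0 versus d_lower > 0 split come from the text.

```python
from typing import Sequence

MASS_POINT_THRESHOLD = 0.02  # the 2% modal-fraction cutoff quoted above


def sketch_detect_design(dose: Sequence[float]) -> str:
    """Illustrative stand-in for the auto-detect rule; not the library's code."""
    d_min = min(dose)
    # ASSUMPTION: a support infimum of exactly zero selects Design 1'.
    if d_min == 0.0:
        return "continuous_at_zero"
    # Share of units sitting exactly at the support infimum d.min().
    modal_fraction = sum(1 for d in dose if d == d_min) / len(dose)
    if modal_fraction > MASS_POINT_THRESHOLD:
        return "mass_point"
    return "continuous_near_d_lower"
```

For example, a sample with 5 of 100 units at the minimum dose 0.1 has a modal fraction of 5% and falls on the mass-point path, while a sample with a single unit at 0.1 (1%) falls on continuous_near_d_lower.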
Notes
Diagnostics coverage. HeterogeneousAdoptionDiDResults.bandwidth_diagnostics and .bias_corrected_fit are populated only on the continuous paths; both are None on the mass-point path (which is parametric and has no bandwidth). Conversely, .n_mass_point and .n_above_d_lower are populated only on the mass-point path.
Clone idempotence. self.design stores the RAW user input (e.g., "auto"); the resolved mode is stored on the result object at fit time. This mirrors Phase 1a’s _vcov_type_arg pattern and keeps get_params() / sklearn.clone() round-trips exact.
Examples
Construct a two-period HAD panel by hand. Phase 2a requires exactly two periods with D_{g,1} = 0 for every unit.
>>> import numpy as np
>>> import pandas as pd
>>> from diff_diff import HeterogeneousAdoptionDiD
>>> rng = np.random.default_rng(42)
>>> G = 500
>>> dose_post = rng.uniform(0.0, 1.0, G)
>>> dose_post[0] = 0.0  # at least one zero-dose unit for Design 1'
>>> delta_y = 0.3 * dose_post + 0.1 * rng.standard_normal(G)
>>> data = pd.DataFrame({
...     "unit": np.repeat(np.arange(G), 2),
...     "period": np.tile([1, 2], G),
...     "dose": np.column_stack([np.zeros(G), dose_post]).ravel(),
...     "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(),
... })
>>> est = HeterogeneousAdoptionDiD(design="auto")
>>> result = est.fit(
...     data, outcome_col="outcome", dose_col="dose",
...     time_col="period", unit_col="unit",
... )
>>> result.design
'continuous_at_zero'
- __init__(design='auto', d_lower=None, kernel='epanechnikov', alpha=0.05, vcov_type=None, robust=False, cluster=None, n_bootstrap=999, seed=None)
- get_params(deep=True)
Return the raw constructor parameters (sklearn-compatible).
Matches the sklearn.base.BaseEstimator.get_params() signature. Preserves the user’s original inputs - in particular, design returns "auto" when the user set it to "auto" (even after fit), so sklearn.base.clone(est) round-trips exactly.
- set_params(**params)
Set estimator parameters and return self (sklearn-compatible).
Only keys returned by get_params() are accepted. Passing any other attribute name (including method names like fit) raises ValueError so the estimator cannot be silently corrupted by a mistyped or attacker-supplied key.
Mutation is ATOMIC: validation runs on a proposed merged parameter dict before any attribute is overwritten. A failing call (invalid key, or an otherwise valid key whose value violates the constructor constraints) leaves self unchanged and safe to reuse.
- Parameters:
params (Any)
- Return type:
- fit(data, outcome_col, dose_col, time_col, unit_col, first_treat_col=None, aggregate='overall', survey=None, weights=None, cband=True, *, survey_design=None, trends_lin=False)
Fit the HAD estimator.
aggregate="overall" (default) fits on a two-period panel and returns a HeterogeneousAdoptionDiDResults with the single-period WAS estimate. aggregate="event_study" fits on a multi-period panel (T > 2) and returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates using a uniform F-1 anchor (paper Appendix B.2).
Both the overall and event-study paths are panel-only: the paper (Section 2) defines HAD on panel or repeated-cross-section data, but this implementation requires a balanced panel with a unit identifier so that unit-level first differences ΔY_{g,t} = Y_{g,t} - Y_{g,t_anchor} can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator. Repeated-cross-section support is queued for a follow-up PR (tracked in TODO.md); it requires a separate identification path based on pre/post cell means rather than unit-level differences.
- Parameters:
data (pd.DataFrame)
outcome_col (str) – Column names.
dose_col (str) – Column names.
time_col (str) – Column names.
unit_col (str) – Column names.
first_treat_col (str or None) – Optional first-treatment column (the period at which each unit first receives treatment; 0 for never-treated). Required on the event-study path when the panel has more than two distinct first-treat values (staggered timing): the estimator auto-filters to the last-treatment cohort with a UserWarning per paper Appendix B.2 prescription. For common-adoption panels the column is optional; when omitted, the event-study path infers the first-treatment period F from the dose invariant.
aggregate ({"overall", "event_study"}) –
- "overall" (default): returns a single-period HeterogeneousAdoptionDiDResults (Phase 2a). Requires exactly two time periods.
- "event_study" (Phase 2b): returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates on the multi-period panel (paper Appendix B.2). Requires more than two time periods. Pointwise CIs per horizon; joint cross-horizon covariance is deferred to a follow-up PR. Staggered-timing panels are auto-filtered to the last-treatment cohort with a UserWarning.
survey_design (SurveyDesign or None, keyword-only) – Survey design (sampling weights + optional strata / PSU / FPC) for design-based inference. Supported on ALL design × aggregate combinations after Phase 4.5 B: continuous paths (continuous_at_zero, continuous_near_d_lower) on both aggregate="overall" and aggregate="event_study", AND the mass_point design on both aggregates. Continuous paths compose the SE via compute_survey_if_variance() (Binder 1983 TSL); weights propagate pointwise into the lprobust kernel. Mass-point composes the per-unit 2SLS IF on the HC1-scale and aggregates it via Binder-TSL; this requires vcov_type='hc1' (the classical default raises NotImplementedError on the survey path). Event-study fits with cband=True add a multiplier-bootstrap simultaneous confidence band. Only weight_type="pweight" is supported (aweight / fweight raise NotImplementedError). Survey design columns (strata / PSU / FPC) must be constant within unit (sampling-unit-level assignment); within-unit variance raises ValueError. Replicate-weight designs raise NotImplementedError. Mutually exclusive with the deprecated survey= and weights= aliases. See docs/methodology/REGISTRY.md § HeterogeneousAdoptionDiD, “Note (HAD survey-design API consolidation)”, for the full dispatch matrix.
survey (SurveyDesign or None) – DEPRECATED alias of survey_design=. Remains positional-or-keyword for one minor cycle to preserve pre-PR call shapes; will be removed in the next minor release. Prefer survey_design=.
weights (np.ndarray or None) – DEPRECATED alias for the per-row pweight shortcut. Remains positional-or-keyword for one minor cycle. Prefer adding the weights as a column on data and passing survey_design=SurveyDesign(weights='col_name') instead. Will be removed in the next minor release. Currently preserved as the analytical-HC1-sandwich shortcut (continuous: CCT-2014 weighted-robust; mass-point: pweight 2SLS sandwich) with the per-row → per-unit aggregation invariant intact. Mutually exclusive with survey_design= and survey=.
cband (bool, default True) – Phase 4.5 B: controls the multiplier-bootstrap simultaneous confidence band on the weighted event-study path. When True (default) and aggregate="event_study" AND any of survey_design= / survey= / weights= is supplied, the fit populates cband_low / cband_high / cband_crit_value / cband_method / cband_n_bootstrap on the result. When False those fields stay None. No effect on aggregate="overall" or on unweighted event-study. n_bootstrap and seed (constructor params) control replicate count and RNG; defaults are 999 / None.
trends_lin (bool, default False, keyword-only) – When True, applies paper Eq 17 linear-trend detrending to per-event-time outcome evolutions. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE). Per-group slope is estimated as Y[g, F-1] - Y[g, F-2]; each event-time e evolution is replaced by dy_dict[e] - (e+1) × slope (uniform formula that absorbs both effect-side detrending and placebo-side anchor swap). Requires aggregate="event_study" AND F >= 3 (panel must include both F-1 and F-2); raises NotImplementedError on aggregate="overall" and ValueError on F < 3. The “consumed” placebo at event time e=-2 is auto-dropped (R reduces max placebo lag by 1 with the same effect). Mutually exclusive with survey weighting (survey_design / survey / weights); raises NotImplementedError if combined. Default False preserves bit-exact backcompat with all pre-PR fits.
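The trends_lin formula can be checked by hand on a single unit. The sketch below is not the library's implementation: detrend_unit is a hypothetical name, and it assumes consecutive integer period labels so that e = t - F. The slope Y[F-1] - Y[F-2], the dy[e] - (e+1) × slope formula, and the dropped e = -2 placebo follow the parameter description.

```python
from typing import Dict


def detrend_unit(y_by_period: Dict[int, float], F: int) -> Dict[int, float]:
    """Hypothetical helper: detrended first differences for one unit.

    y_by_period maps an integer period label to the unit's outcome.
    """
    if (F - 1) not in y_by_period or (F - 2) not in y_by_period:
        raise ValueError("need both F-1 and F-2 to estimate the slope")
    anchor = y_by_period[F - 1]
    slope = anchor - y_by_period[F - 2]  # per-unit linear pre-trend
    detrended = {}
    for t, y in sorted(y_by_period.items()):
        e = t - F
        if e in (-1, -2):
            # e = -1 is the anchor; e = -2 is consumed by the slope estimate
            continue
        detrended[e] = (y - anchor) - (e + 1) * slope
    return detrended
```

On a unit with a pure linear trend plus a constant jump at F, every post-period value equals the jump and the surviving placebos are zero. At e = -2 the formula gives exactly zero by construction, which is why that horizon carries no information and is dropped.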
- Returns:
HeterogeneousAdoptionDiDResults – When aggregate="overall" (the default; two-period only): single-period WAS estimate plus shared metadata.
HeterogeneousAdoptionDiDEventStudyResults – When aggregate="event_study" (multi-period panel; on staggered panels auto-filters to the last cohort plus never-treated): per-event-time WAS estimates with per-horizon arrays.
- Return type:
HeterogeneousAdoptionDiDResults | HeterogeneousAdoptionDiDEventStudyResults
HeterogeneousAdoptionDiDResults
Single-period results container for HeterogeneousAdoptionDiD estimation.
- class diff_diff.HeterogeneousAdoptionDiDResults
Bases: object
Estimator output for HeterogeneousAdoptionDiD.
NaN-safe inference: the three downstream fields t_stat, p_value, and conf_int are routed through diff_diff.utils.safe_inference(), which returns NaN on all three whenever se is non-finite, zero, or negative. att and se themselves are RAW estimator outputs from the chosen fit path and are NOT gated by safe_inference:
- On the degenerate fit configurations (constant outcome on the continuous paths, all-units-at-d_lower / no-dose-variation on the mass-point path), the fit path explicitly returns (att=nan, se=nan), which combined with the safe-inference gate yields all five fields NaN together.
- On the degenerate CR1 cluster configuration (mass-point path with a single cluster), _fit_mass_point_2sls returns (att=beta_hat, se=nan) - att stays finite because the Wald-IV ratio is well defined, but the cluster-robust SE is not, so se is NaN and the downstream triple becomes NaN via the safe-inference gate.
So the guaranteed NaN coupling is on the downstream triple (t_stat, p_value, conf_int), not on att. The assert_nan_inference fixture in tests/conftest.py checks the downstream triple against the gate contract and does not assume att is NaN.
- att
Point estimate of the WAS parameter on the beta-scale.
- Design 1’ (paper Theorem 1 / Equation 3 identification; Equation 7 sample estimator): att = (mean(ΔY) - tau_bc) / D_bar where tau_bc is the bias-corrected local-linear estimate of lim_{d ↓ 0} E[ΔY | D_2 <= d] and D_bar = (1/G) * sum(D_{g,2}).
- Design 1 continuous-near-d_lower (paper Theorem 3 / Equation 11, WAS_{d_lower} under Assumption 6): att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower) where tau_bc is the bias-corrected local-linear estimate of lim_{d ↓ d_lower} E[ΔY | D_2 <= d].
- Mass-point (paper Section 3.2.4): the Wald-IV / 2SLS coefficient directly - (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0}).
- Type:
float
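As an illustration of the mass-point formula for att, the following sketch computes the Wald-IV ratio from unit-level first differences. It is not _fit_mass_point_2sls: the name wald_iv_mass_point is hypothetical, no standard error is computed, and returning NaN when one group is empty is a simplification of the documented no-dose-variation case. The instrument Z = 1{D > d_lower} follows the definition used elsewhere on this page.

```python
import math
from statistics import fmean
from typing import Sequence


def wald_iv_mass_point(
    delta_y: Sequence[float], dose: Sequence[float], d_lower: float
) -> float:
    """Hypothetical helper: (Ybar_1 - Ybar_0) / (Dbar_1 - Dbar_0) with Z = 1{D > d_lower}."""
    above = [(y, d) for y, d in zip(delta_y, dose) if d > d_lower]
    at_lower = [(y, d) for y, d in zip(delta_y, dose) if d <= d_lower]
    if not above or not at_lower:
        # No dose variation across the two groups: the ratio is undefined.
        return math.nan
    y_gap = fmean(y for y, _ in above) - fmean(y for y, _ in at_lower)
    d_gap = fmean(d for _, d in above) - fmean(d for _, d in at_lower)
    return y_gap / d_gap
```

With doses [1, 1, 1, 2, 3, 4], d_lower = 1 and ΔY = 2·D + 1, the outcome gap is 4 and the dose gap is 2, so the ratio recovers the slope 2.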
- se
Standard error on the beta-scale. For continuous designs:
- Unweighted or weights=<array>: CCT-2014 weighted-robust SE from Phase 1c divided by |den| (den = raw or weighted denominator depending on fit path).
- survey=SurveyDesign(...): Binder (1983) Taylor-series linearization of the per-unit IF (bias-corrected scale, aligned with tau_bc) routed through compute_survey_if_variance() for PSU-aggregated, FPC/strata-adjusted variance, divided by |den|.
In both cases the higher-order variance from mean(ΔY) is dominated by the nonparametric boundary estimate in large samples and is not included in the leading-order formula. For mass-point, the 2SLS structural-residual sandwich SE.
- Type:
float
- t_stat, p_value, conf_int
Routed through safe_inference; NaN when SE is non-finite, zero, or negative.
- Type:
inference fields
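The NaN gate on the downstream triple can be pictured with a small stand-in. This is a sketch under stated assumptions, not diff_diff.utils.safe_inference(): the name sketch_safe_inference and its signature are hypothetical, and it uses Normal critical values only, whereas the survey paths thread df_survey into the real helper.

```python
import math
from statistics import NormalDist
from typing import Tuple


def sketch_safe_inference(
    att: float, se: float, alpha: float = 0.05
) -> Tuple[float, float, Tuple[float, float]]:
    """Hypothetical gate: (t_stat, p_value, conf_int), all NaN for an unusable SE."""
    if not math.isfinite(se) or se <= 0:
        nan = math.nan
        return nan, nan, (nan, nan)
    t_stat = att / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return t_stat, p_value, (att - crit * se, att + crit * se)
```

Note that att is passed through untouched: a finite att with a NaN se yields a NaN triple, matching the single-cluster CR1 case described for this class.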
- alpha
CI level used at fit time (0.05 for a 95% CI).
- Type:
float
- design
Resolved design mode: "continuous_at_zero", "continuous_near_d_lower", or "mass_point". "auto" is resolved to one of the three concrete modes before storing.
- Type:
str
- target_parameter
Estimand label: "WAS" for Design 1’, "WAS_d_lower" for the other two. Pins the estimand semantically even when two designs share the same divisor.
- Type:
str
- d_lower
Support infimum d_lower. 0.0 for Design 1’; float(d.min()) for the other two.
- Type:
float
- dose_mean
D_bar = (1/G) * sum(D_{g,2}).
- Type:
float
- n_obs
Number of units contributing to the estimator (post panel aggregation to unit-level first differences).
- Type:
int
- n_treated
Number of units with D_{g,2} > d_lower.
- Type:
int
- n_control
Number of units at or below d_lower (the “not-treated” subset).
- Type:
int
- n_mass_point
Mass-point path only: number of units with D_{g,2} == d_lower. None on continuous paths.
- Type:
int or None
- n_above_d_lower
Mass-point path only: number of units with D_{g,2} > d_lower. None on continuous paths.
- Type:
int or None
- inference_method
"analytical_nonparametric" (continuous designs) or "analytical_2sls" (mass-point).
- Type:
str
- vcov_type
Effective variance-covariance family used. None on continuous paths (they use the CCT-2014 robust SE from Phase 1c, not the library’s vcov_type enum). Mass-point: "classical" or "hc1" when cluster is not supplied, and "cr1" whenever cluster is supplied (cluster-robust CR1 is computed regardless of the requested vcov_type because classical/hc1 + cluster collapses to the same CR1 sandwich). Downstream consumers reading result.to_dict() can inspect this field directly to determine the effective SE family.
- Type:
str or None
- cluster_name
Column name of the cluster variable on the mass-point path when cluster-robust SE is requested. None otherwise.
- Type:
str or None
- survey_metadata
Repo-standard survey metadata dataclass from diff_diff.survey.SurveyMetadata. None when fit() was called without survey= or weights=; populated on the continuous-dose weighted paths via diff_diff.survey.compute_survey_metadata(). Exposes weight_type, effective_n, design_effect, sum_weights, n_strata, n_psu, weight_range, and df_survey for downstream reporting consumers (BusinessReport, DiagnosticReport) that read these fields via attribute access. HAD-specific inference-method info (pweight vs Binder-TSL) is carried on inference_method and variance_formula.
- Type:
SurveyMetadata or None
- bandwidth_diagnostics
Full Phase 1b MSE-DPI selector output on the continuous paths (when bandwidths were auto-selected). None on the mass-point path (parametric, no bandwidth).
- Type:
BandwidthResult or None
- bias_corrected_fit
Full Phase 1c bias-corrected local-linear fit on the continuous paths. None on the mass-point path.
- Type:
BiasCorrectedFit or None
- att: float
- se: float
- t_stat: float
- p_value: float
- alpha: float
- design: str
- target_parameter: str
- d_lower: float
- dose_mean: float
- n_obs: int
- n_treated: int
- n_control: int
- inference_method: str
- survey_metadata: SurveyMetadata | None
- bandwidth_diagnostics: BandwidthResult | None
- bias_corrected_fit: BiasCorrectedFit | None
- variance_formula: str | None = None
HAD-specific label for the SE formula on weighted fits, populated on BOTH continuous and mass-point designs (Phase 4.5 A / B):
- "pweight" (continuous, weighted-robust CCT 2014 under the weights= shortcut)
- "survey_binder_tsl" (continuous, Binder 1983 TSL with PSU/strata/FPC under survey_design=SurveyDesign(...))
- "pweight_2sls" (mass-point + weights=; label applied uniformly across the classical / HC1 / CR1 vcov families on the weighted 2SLS path, with the actual sandwich resolved via vcov_type)
- "survey_binder_tsl_2sls" (mass-point, Binder 1983 TSL under survey_design=)
None on unweighted fits. Orthogonal to survey_metadata, which is the repo-standard diff_diff.survey.SurveyMetadata shared with downstream report/diagnostic consumers (no HAD-specific leakage).
- effective_dose_mean: float | None = None
Weighted denominator used by the beta-scale rescaling, populated on weighted fits across all designs: sum(w_g · D_g) / sum(w_g) on continuous_at_zero, sum(w_g · (D_g - d_lower)) / sum(w_g) on continuous_near_d_lower, and the weighted Wald-IV dose gap mean(D | Z=1, w) - mean(D | Z=0, w) on mass_point (where Z = 1{D > d_lower}). On the continuous designs reduces bit-exactly to dose_mean / mean(D - d_lower) when weights are uniform or absent. None when fit() was called without survey_design= / survey= / weights= (use dose_mean there). Exists because dose_mean is the raw sample mean of the dose column; under weighted fits the estimator’s actual denominator is the weighted form above, and users reconstructing the β-scale value by hand need the weighted one.
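For users reconstructing the β-scale value by hand, the three weighted denominators can be written out directly. The sketch below is illustrative only: weighted_mean and sketch_effective_dose_mean are hypothetical names, inputs are unit-level doses and weights, and no validation is performed.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """sum(w * v) / sum(w)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def sketch_effective_dose_mean(
    design: str,
    dose: Sequence[float],
    weights: Sequence[float],
    d_lower: float = 0.0,
) -> float:
    """Hypothetical helper: weighted denominator for each resolved design."""
    if design == "continuous_at_zero":
        return weighted_mean(dose, weights)
    if design == "continuous_near_d_lower":
        return weighted_mean([d - d_lower for d in dose], weights)
    if design == "mass_point":
        # Z = 1{D > d_lower}; weighted dose gap between the two groups.
        above = [(d, w) for d, w in zip(dose, weights) if d > d_lower]
        at_lower = [(d, w) for d, w in zip(dose, weights) if d <= d_lower]
        mean_above = weighted_mean([d for d, _ in above], [w for _, w in above])
        mean_lower = weighted_mean([d for d, _ in at_lower], [w for _, w in at_lower])
        return mean_above - mean_lower
    raise ValueError(f"unknown design: {design!r}")
```

With uniform weights the two continuous branches collapse to the plain sample means, consistent with the reduction described above.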
- print_summary()
Print the summary to stdout.
- Return type:
None
- to_dict()
Return results as a dict of scalars + weighted-path surfaces.
Always-present keys mirror the dataclass fields: att, se, t_stat, p_value, conf_int_lower / conf_int_upper, alpha, design, target_parameter, d_lower, dose_mean, n_obs / n_treated / n_control / n_mass_point / n_above_d_lower, inference_method, vcov_type, cluster_name.
Weighted-path keys (None on unweighted fits):
- survey_metadata: repo-standard diff_diff.survey.SurveyMetadata dataclass (object, not dict) carrying weight_type / effective_n / design_effect / sum_weights / weight_range + n_strata / n_psu / df_survey (latter three None on the weights= shortcut).
- variance_formula: HAD-specific SE label, populated on BOTH continuous and mass-point designs (Phase 4.5 A / B): "pweight" (continuous, weighted-robust CCT 2014 under weights=), "survey_binder_tsl" (continuous, Binder 1983 TSL under survey_design=), "pweight_2sls" (mass-point + weights=; label applied uniformly across the classical / HC1 / CR1 vcov families, with the sandwich resolved via vcov_type), or "survey_binder_tsl_2sls" (mass-point, Binder 1983 TSL under survey_design=). See the field docstring above for the full contract.
- effective_dose_mean: weighted denominator used by the beta-scale rescaling - weighted mean(D) on continuous_at_zero, weighted mean(D - d_lower) on continuous_near_d_lower, or the weighted Wald-IV dose gap mean(D | Z=1, w) - mean(D | Z=0, w) on mass_point.
- __init__(att, se, t_stat, p_value, conf_int, alpha, design, target_parameter, d_lower, dose_mean, n_obs, n_treated, n_control, n_mass_point, n_above_d_lower, inference_method, vcov_type, cluster_name, survey_metadata, bandwidth_diagnostics, bias_corrected_fit, variance_formula=None, effective_dose_mean=None)
- Parameters:
att (float)
se (float)
t_stat (float)
p_value (float)
alpha (float)
design (str)
target_parameter (str)
d_lower (float)
dose_mean (float)
n_obs (int)
n_treated (int)
n_control (int)
n_mass_point (int | None)
n_above_d_lower (int | None)
inference_method (str)
vcov_type (str | None)
cluster_name (str | None)
survey_metadata (SurveyMetadata | None)
bandwidth_diagnostics (BandwidthResult | None)
bias_corrected_fit (BiasCorrectedFit | None)
variance_formula (str | None)
effective_dose_mean (float | None)
- Return type:
None
HeterogeneousAdoptionDiDEventStudyResults
Multi-period event-study results container for the Appendix B.2 extension.
- class diff_diff.HeterogeneousAdoptionDiDEventStudyResults
Bases: object
Event-study results for HeterogeneousAdoptionDiD (Phase 2b).
Per-horizon arrays align with event_times by index; all per-horizon arrays have shape (n_horizons,). The anchor horizon e = -1 (i.e., t = F - 1) is NOT included because Y_{g, F-1} - Y_{g, F-1} = 0 trivially and the WAS is not identified there.
Per-horizon inference fields (t_stat, p_value, conf_int_low, conf_int_high) are NaN-coupled to the per-horizon se via diff_diff.utils.safe_inference(); att and se themselves are raw estimator outputs from the chosen design path on each horizon’s first differences.
Design resolution is SHARED across horizons: the design, d_lower, target_parameter, and inference_method are single scalars determined once from the post-period dose distribution D_{g, F} (paper Appendix B.2 convention: the dose regressor is invariant across event-time horizons).
- event_times
Integer event-time labels e = t - F, sorted ascending. Excludes e = -1 (the anchor). Post-period horizons have e >= 0; pre-period placebos have e <= -2.
- Type:
np.ndarray, shape (n_horizons,)
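A small sketch shows how the event-time labels line up with panel periods. It is not the library's code: sketch_event_times is a hypothetical name, and it assumes that for non-integer period labels the event time is the rank offset from F in sorted order, which the text does not state.

```python
from typing import Hashable, List, Sequence


def sketch_event_times(periods: Sequence[Hashable], F: Hashable) -> List[int]:
    """Hypothetical helper: sorted event times e, with the anchor e = -1 dropped."""
    ordered = sorted(set(periods))
    f_pos = ordered.index(F)  # raises ValueError if F is not a panel period
    # ASSUMPTION: e is the rank offset from F, which equals t - F for
    # consecutive integer period labels.
    return [i - f_pos for i in range(len(ordered)) if i - f_pos != -1]
```

For periods 2018 to 2022 with F = 2020 this gives [-2, 0, 1, 2]: one pre-period placebo, the anchor 2019 removed, and three post-period horizons.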
- att
Per-horizon WAS point estimate on the beta-scale (see HeterogeneousAdoptionDiDResults.att for the per-design formula, applied to ΔY_t = Y_{g,t} - Y_{g,F-1}).
- Type:
np.ndarray, shape (n_horizons,)
- se
Per-horizon standard error on the beta-scale. Three regimes:
- Unweighted: per-horizon INDEPENDENT analytical sandwich (continuous: CCT-2014 weighted-robust divided by |den|; mass-point: structural-residual 2SLS sandwich via _fit_mass_point_2sls). No cross-horizon covariance.
- weights= shortcut: continuous paths still use the CCT-2014 weighted-robust SE from lprobust (bc_fit.se_robust / |den|); mass-point uses the analytical weighted 2SLS pweight sandwich (HC1 / classical / CR1 depending on vcov_type + cluster=). No Binder-TSL composition on this path; inference is Normal (df=None).
- survey=: each horizon composes Binder (1983) Taylor-series linearization via compute_survey_if_variance() on the per-unit β̂-scale IF (continuous + mass-point both route through the same helper). df_survey threads into safe_inference for t-inference.
Pointwise CIs are always populated; a simultaneous confidence band is available only on the weighted path via cband_* below. Joint cross-horizon analytical covariance is not computed in this release (tracked in TODO.md).
- Type:
np.ndarray, shape (n_horizons,)
- t_stat, p_value
Per-horizon inference triple element.
- Type:
np.ndarray, shape (n_horizons,)
- conf_int_low, conf_int_high
Per-horizon CI endpoints at level alpha.
- Type:
np.ndarray, shape (n_horizons,)
- n_obs_per_horizon
Per-horizon sample size (units contributing at that event time). In Phase 2b this equals n_units for every horizon because the validator rejects NaN in outcome / dose / unit columns upstream; tracked as a field for future flexibility (e.g., per-period missingness).
- Type:
np.ndarray, shape (n_horizons,)
- alpha
CI level used at fit time (0.05 for a 95% CI).
- Type:
- design
Resolved design mode, shared across horizons: "continuous_at_zero", "continuous_near_d_lower", or "mass_point".
- Type:
- target_parameter
Estimand label: "WAS" for Design 1’ (continuous_at_zero), "WAS_d_lower" for the other two.
- Type:
- d_lower
Support infimum used for all horizons. 0.0 for Design 1’; float(d.min()) otherwise.
- Type:
- dose_mean
D_bar = (1/G) * sum(D_{g,F}) computed on the fit sample (after the staggered last-cohort filter, if applied).
- Type:
- F
First-treatment period label (arbitrary dtype: int, str, datetime). Identified by the multi-period dose invariant from the fitted data.
- Type:
- n_units
Number of unique units contributing to the fit. After staggered auto-filter: last-cohort units PLUS never-treated (first_treat = 0) units retained as the untreated-group comparison per paper Appendix B.2. Only earlier-treated cohorts are dropped.
- Type:
- inference_method
"analytical_nonparametric" (continuous designs) or "analytical_2sls" (mass-point). Shared across horizons.
- Type:
- vcov_type
Effective variance-covariance family used on the mass-point path ("classical", "hc1", or "cr1" when cluster supplied). None on the continuous paths (they use CCT-2014 robust SE).
- Type:
str or None
- cluster_name
Column name of the cluster variable when cluster-robust SE is requested. None otherwise.
- Type:
str or None
- survey_metadata
Repo-standard survey metadata dataclass from
diff_diff.survey.SurveyMetadata.Nonewhenfit()was called withoutsurvey=orweights=; populated on the weighted event-study path (Phase 4.5 B). SeeHeterogeneousAdoptionDiDResults.survey_metadatafor the attribute contract.- Type:
SurveyMetadata or None
- variance_formula
Per-horizon variance family (applied uniformly across horizons).
"pweight"/"pweight_2sls"on theweights=shortcut,"survey_binder_tsl"/"survey_binder_tsl_2sls"on thesurvey=path.Noneon unweighted fits.- Type:
str or None
- effective_dose_mean
Weighted denominator used by the β̂-scale rescaling (continuous paths: weighted sample mean of
dord - d_lower; mass-point: weighted Wald-IV dose gap).Noneon unweighted fits.- Type:
float or None
- cband_low, cband_high
Simultaneous confidence-band endpoints constructed by the multiplier-bootstrap sup-t procedure.
Noneon unweighted fits and whenfit(..., cband=False)is passed. Horizons withse <= 0or non-finiteseare NaN (matches the pointwise inference gate fromsafe_inference).- Type:
np.ndarray or None, shape (n_horizons,)
- cband_crit_value
Sup-t multiplier-bootstrap critical value at level
1 - alpha. Under a trivial resolved design (no strata / PSU / FPC) at H=1, it reduces to Φ⁻¹(1 − alpha/2) ≈ 1.96 up to Monte Carlo error; under stratified designs the helper applies PSU-aggregation + stratum-demeaning + sqrt(n_h / (n_h - 1)) small-sample correction so the bootstrap variance matches the analytical Binder-TSL target term-for-term.
- Type:
float or None
- cband_method
"multiplier_bootstrap"on the weighted event-study path withcband=True, elseNone.- Type:
str or None
- cband_n_bootstrap
Number of multiplier-bootstrap replicates used to compute the sup-t critical value.
- Type:
int or None
- bandwidth_diagnostics
Per-horizon bandwidth diagnostics on the continuous paths;
Noneon the mass-point path. When non-None, aligned withevent_timesby index.- Type:
list[BandwidthResult] or None
- bias_corrected_fit
Per-horizon bias-corrected fit on the continuous paths;
Noneon the mass-point path. When non-None, aligned withevent_timesby index.- Type:
list[BiasCorrectedFit] or None
- filter_info
Populated when the staggered-timing last-cohort auto-filter fires. Keys:
"F_last"(kept cohort label),"n_kept"(units retained),"n_dropped"(units dropped),"dropped_cohorts"(list of dropped cohort labels).Nonewhen no filter was applied.- Type:
dict or None
- event_times: ndarray
- att: ndarray
- se: ndarray
- t_stat: ndarray
- p_value: ndarray
- conf_int_low: ndarray
- conf_int_high: ndarray
- n_obs_per_horizon: ndarray
- alpha: float
- design: str
- target_parameter: str
- d_lower: float
- dose_mean: float
- F: Any
- n_units: int
- inference_method: str
- survey_metadata: SurveyMetadata | None
- bandwidth_diagnostics: List[BandwidthResult | None] | None
- bias_corrected_fit: List[BiasCorrectedFit | None] | None
- variance_formula: str | None = None
Per-horizon variance family label (applied uniformly across all horizons in the fit). One of
"pweight"/"pweight_2sls"(when a per-row weight array was supplied, including via the deprecatedweights=alias; continuous / mass-point),"survey_binder_tsl"/"survey_binder_tsl_2sls"(when a SurveyDesign was supplied viasurvey_design=or the deprecatedsurvey=alias), orNoneon unweighted fits. Mirrors the static-pathvariance_formulafield.
- effective_dose_mean: float | None = None
Weighted denominator used by the β̂-scale rescaling. For continuous designs: weighted
sum(w · d)/sum(w)(continuous_at_zero) orsum(w · (d − d_lower))/sum(w)(continuous_near_d_lower). For mass-point: weighted Wald-IV dose gap.Noneon unweighted fits.
- cband_low: ndarray | None = None
Simultaneous confidence-band lower endpoints, shape
(n_horizons,).Noneon unweighted fits and whencband=Falseon the weighted event-study path. Derived from multiplier-bootstrap sup-t critical value:cband_low[e] = att[e] − cband_crit_value * se[e].
- cband_high: ndarray | None = None
Simultaneous confidence-band upper endpoints, shape
(n_horizons,). Seecband_low.
- cband_crit_value: float | None = None
Sup-t multiplier-bootstrap critical value at level
1 - alpha. Reduces toΦ⁻¹(1 − alpha/2) ≈ 1.96atH=1up to Monte Carlo error.Noneon unweighted fits and whencband=False.
- cband_method: str | None = None
"multiplier_bootstrap"on the weighted event-study path withcband=True, elseNone.
- cband_n_bootstrap: int | None = None
Number of multiplier-bootstrap replicates used to compute the sup-t critical value.
Noneon unweighted fits and whencband=False.
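A minimal sketch of how the band fields relate to one another, assuming the upper endpoint mirrors the documented lower-endpoint formula (att[e] + cband_crit_value * se[e]). The helper name simultaneous_band and the critical value 2.39 are hypothetical, chosen for illustration only; the sup-t critical value itself comes from the multiplier bootstrap, which is not reproduced here.

```python
import math
from statistics import NormalDist


def simultaneous_band(att, se, crit_value):
    """Build (low, high) endpoints as att[e] -/+ crit_value * se[e].

    Horizons whose se is non-finite or <= 0 get NaN endpoints, following
    the gate described for cband_low / cband_high.
    """
    low, high = [], []
    for a, s in zip(att, se):
        if not math.isfinite(s) or s <= 0:
            low.append(math.nan)
            high.append(math.nan)
        else:
            low.append(a - crit_value * s)
            high.append(a + crit_value * s)
    return low, high


alpha = 0.05
# Pointwise normal critical value: the documented H=1 limit of the sup-t value.
z_pointwise = NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical inputs for a 3-horizon fit (illustrative numbers, not library output).
crit = 2.39
att = [0.10, 0.25, 0.40]
se = [0.05, 0.00, 0.08]
low, high = simultaneous_band(att, se, crit)
```

Because the sup-t critical value is at least as large as the pointwise one whenever more than one horizon is covered, the simultaneous band is never narrower than the pointwise CI at the same alpha.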
- print_summary()[source]
Print the summary to stdout.
- Return type:
None
- to_dict()[source]
Return results as a dict with per-horizon arrays and scalars.
Per-horizon arrays are converted to Python lists via
ndarray.tolist()(which unwraps NumPy scalar elements to nativeint/float); scalar fields are coerced to native Python types via_json_safe_scalarwhere relevant (NumPy scalars ->.item(), pandasTimestamp-> ISO string,Timedelta-> ISO string). The returned dict is JSON-serializable directly viajson.dumps.
- to_dataframe()[source]
Return a tidy per-horizon DataFrame.
Columns:
event_time, att, se, t_stat, p_value, conf_int_low, conf_int_high, n_obs. One row per event-time horizon. On the weighted event-study path withcband=True, also includescband_lowandcband_highcolumns.- Return type:
- __init__(event_times, att, se, t_stat, p_value, conf_int_low, conf_int_high, n_obs_per_horizon, alpha, design, target_parameter, d_lower, dose_mean, F, n_units, inference_method, vcov_type, cluster_name, survey_metadata, bandwidth_diagnostics, bias_corrected_fit, filter_info, variance_formula=None, effective_dose_mean=None, cband_low=None, cband_high=None, cband_crit_value=None, cband_method=None, cband_n_bootstrap=None)
- Parameters:
event_times (ndarray)
att (ndarray)
se (ndarray)
t_stat (ndarray)
p_value (ndarray)
conf_int_low (ndarray)
conf_int_high (ndarray)
n_obs_per_horizon (ndarray)
alpha (float)
design (str)
target_parameter (str)
d_lower (float)
dose_mean (float)
F (Any)
n_units (int)
inference_method (str)
vcov_type (str | None)
cluster_name (str | None)
survey_metadata (SurveyMetadata | None)
bandwidth_diagnostics (List[BandwidthResult | None] | None)
bias_corrected_fit (List[BiasCorrectedFit | None] | None)
variance_formula (str | None)
effective_dose_mean (float | None)
cband_low (ndarray | None)
cband_high (ndarray | None)
cband_crit_value (float | None)
cband_method (str | None)
cband_n_bootstrap (int | None)
- Return type:
None
HAD Pretests#
Diagnostic pretests for the HAD identification assumptions from de Chaisemartin
et al. (2026). The composite orchestrator
did_had_pretest_workflow() is a diagnostic battery only - it
does NOT pick the HAD design path (continuous_at_zero /
continuous_near_d_lower / mass_point); that is auto-detected inside
HeterogeneousAdoptionDiD.fit() from the dose support. The workflow has
two explicit modes selected by the caller via the aggregate= kwarg:
aggregate="overall" (default, two-period first-differenced sample) runs
single-period tests; aggregate="event_study" (multi-period panel with
three or more periods) runs joint multi-period tests. Both modes return a
unified HADPretestReport.
- diff_diff.did_had_pretest_workflow(data, outcome_col, dose_col, time_col, unit_col, first_treat_col=None, alpha=0.05, n_bootstrap=999, seed=None, *, aggregate='overall', survey_design=None, survey=None, weights=None, trends_lin=False)[source]#
Run the HAD pre-test workflow (paper Section 4.2-4.3).
Two dispatch modes via aggregate:
aggregate="overall" (default, two-period panel): runs paper steps 1 (qug_test()) and 3 (stute_test() + yatchew_hr_test()). Step 2 (Assumption 7 pre-trends) is NOT implemented on this path because a single-pre-period panel cannot support the joint Stute variant; the returned verdict flags the Assumption 7 gap explicitly so callers do not receive a misleading “TWFE safe” signal. For multi-period panels, pass aggregate="event_study" to close the step-2 gap.
aggregate="event_study" (multi-period panel, >= 3 periods): runs QUG + joint pre-trends Stute + joint homogeneity-linearity Stute, covering paper Section 4 steps 1-3 together. The step-3 Yatchew-HR alternative (a single-horizon swap-in for Stute) is subsumed by joint Stute on this path - the paper does not derive a joint Yatchew variant, so users who need Yatchew robustness under multi-period data should call yatchew_hr_test() on each (base, post) pair manually. (Paper step 4 is the decision itself - “use TWFE if none of the tests rejects” - not a separate test, so it has no code path here. Mirrors the framing in the module-level docstring at line 54 and _compose_verdict_event_study at line 2735.)
Eq 17 / Eq 18 linear-trend detrending (paper Section 5.2 Pierce-Schott application) is now SHIPPED on the event-study path via the trends_lin keyword-only parameter (PR #392 / Phase 4 R-parity). When trends_lin=True, this workflow forwards the flag to both joint_pretrends_test() and joint_homogeneity_test(); the consumed placebo at base_period - 1 is auto-dropped from step 2 and the workflow skips step 2 (pretrends_joint=None) if no earlier placebo survives. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE). Mutually exclusive with aggregate="overall" (raises NotImplementedError).
- Parameters:
data (pd.DataFrame) – HAD panel. For
aggregate="overall": balanced two-period panel with pre-period dose = 0 for every unit. Foraggregate="event_study": balanced multi-period panel with >= 3 periods, an ordered time dtype (numeric, datetime, or ordered categorical), and the pre-period D=0 invariant across all pre-periods.outcome_col (str)
dose_col (str)
time_col (str)
unit_col (str)
first_treat_col (str or None, default None) – Optional first-treatment-period column. Required on the
aggregate="event_study"path when the panel is staggered (multi-cohort); the panel validator auto-filters to the last cohort and emitsUserWarning. The overall path uses this for cross-validation only.alpha (float, default 0.05)
n_bootstrap (int, default 999) – Replication count for the single-horizon Stute (overall) or joint Stute (event_study).
seed (int or None, default None) – Seed forwarded to the Stute bootstrap. QUG / Yatchew are deterministic.
aggregate (str, keyword-only, default
"overall") – Dispatch mode. Invalid values raiseValueError.survey_design (SurveyDesign or None, keyword-only, default None) – Survey design for design-based pretest inference. Linearity-family pretests use PSU-level Mammen multiplier bootstrap (Stute family) and weighted OLS + weighted variance components (Yatchew). The QUG step is skipped under survey with a
UserWarning(permanent deferral per Phase 4.5 C0). Replicate-weight designs raiseNotImplementedError. Mutually exclusive with the deprecatedsurvey=andweights=aliases.survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of
survey_design=. Will be removed in the next minor release; prefersurvey_design=.weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias for the per-row pweight shortcut. Prefer adding the weights as a column on
dataand passingsurvey_design=SurveyDesign(weights='col_name')instead. Will be removed in the next minor release. Currently routed through a synthetic trivialResolvedSurveyDesignso the same kernel handles both paths.trends_lin (bool, default False, keyword-only) – Forwards into
joint_pretrends_test()andjoint_homogeneity_test()on the event-study dispatch path. Mirrors RDIDHAD::did_had(..., trends_lin=TRUE). Requiresaggregate="event_study"; raisesNotImplementedErroronaggregate="overall"(the overall path’s qug + stute + yatchew block has no joint-pretest surface). Mutually exclusive with survey weighting at the joint-pretest layer; the joint wrappers raiseNotImplementedErrorif combined. Effective step-2 rule under trends_lin: the consumed placebo atbase_period - 1is dropped before step 2 is dispatched; if no earlier placebo survives the drop (e.g., a minimal 4-period panel withF=3wherebase_period=2and the only earlier placebo att=1is the consumed one), step 2 is skipped (pretrends_joint=None) and the workflow proceeds to step 3 (homogeneity). DefaultFalsepreserves bit-exact backcompat.
- Returns:
On the overall path:
stuteandyatchewpopulated,pretrends_joint/homogeneity_jointareNone. On the event-study path:pretrends_joint(Noneif no earlier pre-period) andhomogeneity_jointpopulated,stute/yatchewareNone.aggregateis recorded on the report for serialization dispatch. On the survey/weights path,qugisNone(Phase 4.5 C0 deferral); other components populated as on the unweighted path.- Return type:
- Raises:
ValueError – On invalid
aggregate; if more than one ofsurvey_design,survey,weightsis supplied (3-way mutex;survey=andweights=are deprecated aliases ofsurvey_design=); or any downstream front-door failure (panel balance, dtype, dose invariant).NotImplementedError – If
survey.replicate_weights is not None(replicate-weight pretests deferred to a parallel follow-up after Phase 4.5 C).
Notes
Survey/weighted data (Phase 4.5 C): under
survey=orweights=, the workflow:Skips QUG with a
UserWarningand setsqug=Noneon the report. QUG-under-survey is permanently deferred per Phase 4.5 C0; extreme-order-statistic tests are not smooth functionals of the empirical CDF and have no off-the-shelf survey-aware analog. Seequg_test()Notes for the full methodology rationale.Runs the linearity family with the survey-aware mechanism (PSU-level Mammen multiplier bootstrap for Stute / joint variants; weighted OLS + weighted variance components for Yatchew) routed via the existing kernels.
Verdict carries a "linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0" suffix to remind callers that admissibility is conditional on the linearity family alone.
`all_pass` drops the QUG-conclusiveness gate (one less precondition). The linearity-conditional rule splits by aggregate:
aggregate="overall" survey: True iff at least one of Stute/Yatchew is conclusive AND no conclusive test rejects (paper Section 4 step-3 “Stute OR Yatchew” wording).
aggregate="event_study" survey: True iff pretrends_joint is non-None and conclusive, homogeneity_joint is conclusive, AND neither rejects. Both joint variants must be conclusive on the event-study path (same step-2 + step-3 closure as the unweighted aggregate, just without the QUG step).
Sister pretests are unchanged on the workflow path; direct callers can also pass
weights=/survey=tostute_test(),yatchew_hr_test(), etc. (Phase 4.5 C extends each helper’s signature). Per-unit constant-within-unit invariant on weights / strata / psu / fpc is enforced by the workflow viadiff_diff.had._aggregate_unit_weights()/diff_diff.had._aggregate_unit_resolved_survey().References
de Chaisemartin et al. (2026), Section 4.2-4.3, Theorem 4, Appendix D, Theorem 7.
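The effective step-2 rule under trends_lin can be sketched as a small period filter. This is an illustration only: it assumes consecutive integer period labels and base_period = F - 1 (consistent with the F=3, base_period=2 example above), and the helper name step2_placebos is hypothetical rather than part of the library.

```python
def step2_placebos(periods, F, trends_lin=False):
    """Return the pre-periods that remain available as step-2 placebos.

    Assumption (illustrative): integer, consecutive period labels with
    base_period = F - 1. Placebos are the pre-periods strictly before the
    base period; under trends_lin the period base_period - 1 is consumed
    by the linear-trend detrending and is dropped.
    """
    base_period = F - 1
    placebos = [t for t in sorted(periods) if t < base_period]
    if trends_lin:
        consumed = base_period - 1
        placebos = [t for t in placebos if t != consumed]
    return placebos


# Minimal 4-period panel with F=3: the only earlier placebo (t=1) is consumed,
# so step 2 would be skipped (pretrends_joint=None).
minimal = step2_placebos([1, 2, 3, 4], F=3, trends_lin=True)

# One more pre-period leaves a surviving placebo at t=1.
longer = step2_placebos([1, 2, 3, 4, 5], F=4, trends_lin=True)

# Without trends_lin, nothing is consumed.
plain = step2_placebos([1, 2, 3, 4], F=3)
```

An empty list corresponds to the documented case where the workflow skips step 2 and proceeds directly to the homogeneity test.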
- class diff_diff.HADPretestReport[source]
Bases:
objectComposite output of
did_had_pretest_workflow().Two dispatch shapes, distinguished by
aggregate:aggregate="overall"(default, two-period panel): bundles paper steps 1 (QUG) and 3 (linearity via Stute + Yatchew-HR) on a two-period first-differenced sample. Step 2 (Assumption 7 pre-trends) is NOT implemented on this path and is explicitly flagged in the verdict; callers must run pre-trends separately.aggregate="event_study"(multi-period panel, >= 3 periods): bundles QUG + joint pre-trends Stute + joint homogeneity-linearity Stute. The joint Stute variants close the paper step-2 gap; the event-study verdict does NOT emit the “paper step 2 deferred” caveat. Step 3 adjudication uses joint Stute only - no joint Yatchew variant exists because the paper does not derive one; users who need Yatchew robustness under multi-period data can runyatchew_hr_test()on each (base, post) pair manually.- qug
Populated by default;
Noneonly when the workflow runs undersurvey=/weights=(Phase 4.5 C path), where the QUG step is permanently skipped per Phase 4.5 C0 (extreme-value theory under complex sampling not a settled toolkit; seequg_test()).- Type:
QUGTestResults or None
- stute
Populated when
aggregate == "overall";Nonewhenaggregate == "event_study".- Type:
StuteTestResults or None
- yatchew
Populated when
aggregate == "overall";Nonewhenaggregate == "event_study".- Type:
YatchewTestResults or None
- pretrends_joint
Populated when
aggregate == "event_study"and at least one earlier pre-period exists;Noneon the overall path or when only the immediate base pre-period is available.- Type:
StuteJointResult or None
- homogeneity_joint
Populated when
aggregate == "event_study";Noneon the overall path.- Type:
StuteJointResult or None
- all_pass
On the unweighted overall path: same Phase 3 semantics - True iff QUG is conclusive AND at least one of Stute/Yatchew is conclusive AND no conclusive test rejects. On the unweighted event-study path: True iff
np.isfinite(qug.p_value),pretrends_joint is not None and np.isfinite(pretrends_joint.p_value),np.isfinite(homogeneity_joint.p_value), AND none of the three rejects. On the survey/weights path (Phase 4.5 C) the QUG-conclusiveness gate is dropped (qug=Noneper C0 deferral); the linearity-conditional rule splits by aggregate:aggregate="overall"survey: True iff at least one of Stute/Yatchew is conclusive AND no conclusive test rejects.aggregate="event_study"survey: True iffpretrends_jointis non-None and conclusive,homogeneity_jointis conclusive, AND neither rejects. (Both joint variants must be conclusive on the event-study path - same step-2 + step-3 closure as the unweighted aggregate, just without the QUG step.)
Mirrors Phase 3’s
bool(np.isfinite(p_value))convention - no.conclusive()helper on any result dataclass.- Type:
- verdict
Human-readable classification. Paper rule applies symmetrically: TWFE is admissible only if NONE of the implemented tests rejects. Conclusive rejections are the primary verdict; unresolved steps append as
"; additional steps unresolved: ..."rather than replacing the rejection.- Type:
- alpha
- Type:
- n_obs
Unit count. For overall: units after two-period first-difference aggregation. For event_study: units after balanced-panel validation and (if applicable) last-cohort auto-filter.
- Type:
- aggregate
"overall"or"event_study". Determines which component fields are populated and which branch of serialization methods to render.- Type:
- qug: QUGTestResults | None
- stute: StuteTestResults | None
- yatchew: YatchewTestResults | None
- all_pass: bool
- verdict: str
- alpha: float
- n_obs: int
- pretrends_joint: StuteJointResult | None = None
- homogeneity_joint: StuteJointResult | None = None
- aggregate: str = 'overall'
- print_summary()[source]
Print the summary to stdout.
- Return type:
None
- to_dict()[source]
Return a JSON-safe nested dict of the full report.
On
aggregate="overall", the output schema is bit-exact with Phase 3 ({qug, stute, yatchew, all_pass, verdict, alpha, n_obs}) - no new keys, no aggregate field. Onaggregate="event_study", the output carriesaggregate,pretrends_joint,homogeneity_jointand omits theNone-valuedstute/yatchewkeys entirely.
- to_dataframe()[source]
Return a tidy 3-row DataFrame (one row per implemented test).
Columns (stable across aggregates):
[test, statistic_name, statistic_value, p_value, reject, alpha, n_obs]. Row identifiers vary by aggregate:aggregate="overall": rows arequg,stute,yatchew_hr(Phase 3 schema, unchanged).aggregate="event_study": rows arequg,pretrends_joint,homogeneity_joint.
Rows for
None-valued components (e.g.pretrends_jointwhen no earlier pre-period exists) are emitted with NaN statistic values andreject=Falseto preserve the 3-row shape.- Return type:
- __init__(qug, stute, yatchew, all_pass, verdict, alpha, n_obs, pretrends_joint=None, homogeneity_joint=None, aggregate='overall')
- Parameters:
qug (QUGTestResults | None)
stute (StuteTestResults | None)
yatchew (YatchewTestResults | None)
all_pass (bool)
verdict (str)
alpha (float)
n_obs (int)
pretrends_joint (StuteJointResult | None)
homogeneity_joint (StuteJointResult | None)
aggregate (str)
- Return type:
None
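The unweighted all_pass rules described above can be sketched as plain boolean logic. This is a simplified stand-in, not the library's code: each test is represented as a hypothetical (p_value, reject) tuple rather than a result dataclass, and conclusiveness follows the documented finite-p-value convention.

```python
import math


def conclusive(p_value):
    """A test is conclusive when its p-value is finite."""
    return p_value is not None and math.isfinite(p_value)


def all_pass_overall(qug, stute, yatchew):
    """Overall path: QUG conclusive, at least one of Stute/Yatchew
    conclusive, and no conclusive test rejects."""
    tests = [qug, stute, yatchew]
    qug_ok = conclusive(qug[0])
    linearity_ok = conclusive(stute[0]) or conclusive(yatchew[0])
    no_reject = not any(rej for p, rej in tests if conclusive(p))
    return qug_ok and linearity_ok and no_reject


def all_pass_event_study(qug, pretrends_joint, homogeneity_joint):
    """Event-study path: all three tests conclusive and none rejects.
    A missing pretrends_joint (None) can never pass."""
    if pretrends_joint is None:
        return False
    tests = [qug, pretrends_joint, homogeneity_joint]
    return all(conclusive(p) for p, _ in tests) and not any(r for _, r in tests)


# Yatchew inconclusive (NaN) but Stute conclusive and nothing rejects.
ok = all_pass_overall((0.40, False), (0.30, False), (math.nan, False))
# A conclusive Stute rejection blocks the pass.
blocked = all_pass_overall((0.40, False), (0.01, True), (0.50, False))
# No earlier pre-period: step 2 is unresolved, so the event-study gate fails.
unresolved = all_pass_event_study((0.40, False), None, (0.60, False))
```

On the survey/weights path the same logic would apply with the QUG term removed, since qug is None there.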
Single-period tests (aggregate="overall")#
- diff_diff.qug_test(d, alpha=0.05, *, survey_design=None, survey=None, weights=None)[source]#
Run the QUG null test for the support infimum (paper Theorem 4).
Tests H_0: d_lower = 0 using the order-statistic ratio T = D_{(1)} / (D_{(2)} - D_{(1)}), rejecting when T > 1/alpha - 1. Under the null, the asymptotic limit law of T is the ratio of two independent Exp(1) variables with CDF F(t) = t / (1 + t), so the one-sided p-value is 1 / (1 + T).
Zero-dose observations are filtered out (the test targets the infimum of the treated support). A UserWarning is emitted naming the exclusion count. When fewer than two positive doses remain, the test returns all-NaN inference with reject=False.
- Parameters:
d (np.ndarray, shape (G,)) – Post-period dose vector. Must be 1D numeric and contain no NaN.
alpha (float, default 0.05) – One-sided significance level. Must satisfy
0 < alpha < 1.survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Permanently rejected with
NotImplementedError(Phase 4.5 C0 decision gate). Surface-symmetric kwarg with the rest of the HAD family — accepted in the signature so all 8 HAD entry points share the canonical kwarg name, butqug_testhas no survey-aware migration target. See Notes – Survey/weighted data.survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of
survey_design=. Surface-symmetric only; any non-Nonevalue still raisesNotImplementedError— the deprecation is about kwarg-name consolidation, NOT a migration path (there is no survey-aware QUG). Will be removed in the next minor release.weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of
survey_design=for the per-row pweight shortcut on the rest of the HAD array-in family. Onqug_test, surface-symmetric only; any non-Nonevalue still raisesNotImplementedError— there is no migration path (make_pweight_design(arr)is NOT a valid QUG migration target). Will be removed in the next minor release.
- Returns:
Result dataclass with
t_stat,p_value,reject, and sample metadata.- Return type:
- Raises:
ValueError – If
dis not 1D numeric or contains NaN, or ifalphais not in(0, 1), or if more than one ofsurvey_design/survey/weightsis non-None (mutex).NotImplementedError – If any of
survey_design,survey,weightsis non-None. See Notes – Survey/weighted data.
Notes
Tie-break: when D_{(1)} == D_{(2)} the statistic is undefined. The test returns t_stat=NaN, p_value=NaN, reject=False with a UserWarning rather than raising.
Survey/weighted data: QUG is permanently deferred under survey-weighted or pweight inputs (Phase 4.5 C0 decision gate, 2026-04). The test statistic uses extreme order statistics (D_{(1)}, D_{(2)}), which are NOT smooth functionals of the empirical CDF: standard survey machinery (Binder TSL linearization, multiplier bootstrap, Rao-Wu rescaled bootstrap) does not yield a calibrated test, and under cluster sampling the Exp(1)/Exp(1) limit law’s independence assumption breaks. The extreme-value-theory-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests; no off-the-shelf survey-aware QUG exists. Phase 4.5 C ships survey-aware Stute via did_had_pretest_workflow() (which skips the QUG step under survey/weights and runs the linearity family with a PSU-level Mammen multiplier bootstrap for Stute and weighted OLS + pweight-sandwich variance components for Yatchew). See docs/methodology/REGISTRY.md § “QUG Null Test” for the full methodology note.
References
de Chaisemartin, Ciccia, D’Haultfoeuille, Knau (2026, arXiv:2405.04465v6), Theorem 4 and Section 4.2.
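The QUG statistic and decision rule documented above fit in a few lines. The following is a minimal sketch of the formulas only (no warnings, no input validation); the name qug_sketch is hypothetical and the function is not the library's qug_test().

```python
import math


def qug_sketch(doses, alpha=0.05):
    """Return (t_stat, p_value, reject) for H_0: d_lower = 0.

    Zero doses are dropped; with fewer than two positive doses, or a tie
    between the two smallest, the result is all-NaN with reject=False.
    """
    positive = sorted(x for x in doses if x > 0)
    if len(positive) < 2 or positive[0] == positive[1]:
        return math.nan, math.nan, False
    d1, d2 = positive[0], positive[1]
    t_stat = d1 / (d2 - d1)               # T = D_(1) / (D_(2) - D_(1))
    p_value = 1.0 / (1.0 + t_stat)        # 1 - F(T) with F(t) = t / (1 + t)
    reject = t_stat > 1.0 / alpha - 1.0   # equivalent to p_value < alpha
    return t_stat, p_value, reject


# Smallest dose close to zero relative to the gap: no evidence against d_lower = 0.
near_zero = qug_sketch([0.0, 0.02, 0.5, 0.9, 1.3])

# Smallest dose far from zero relative to the gap: the null is rejected.
far_from_zero = qug_sketch([2.0, 2.05, 3.0])

# Tied minimum: statistic undefined.
tied = qug_sketch([1.0, 1.0, 2.0])
```

The rejection threshold 1/alpha - 1 (19 at alpha = 0.05) is just the point where the p-value 1 / (1 + T) crosses alpha.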
- diff_diff.stute_test(d, dy, alpha=0.05, n_bootstrap=999, seed=None, *, survey_design=None, survey=None, weights=None)[source]#
Run the Stute Cramer-von Mises linearity test (paper Appendix D).
Tests H_0: E[ΔY | D_2] is linear in D_2 (paper Assumption 8). The test statistic is the sorted-residual cusum CvM
S = (1 / G^2) * sum_{g=1}^G (sum_{h=1}^g eps_(h))^2
where eps_(h) is the h-th OLS residual after sorting by d. The p-value is the bootstrap tail probability (1 + sum(S_b >= S)) / (B + 1) under the Mammen (1993) two-point wild bootstrap; each bootstrap iteration refits OLS on dy_b = a_hat + b_hat * d + eps * eta with multiplier weights eta.
- Parameters:
d (np.ndarray, shape (G,)) – Dose vector.
dy (np.ndarray, shape (G,)) – First-difference outcome vector.
alpha (float, default 0.05) – Significance level. Must satisfy
0 < alpha < 1.n_bootstrap (int, default 999) – Number of Mammen wild bootstrap replications. Must be
>= 99(below which the discretised p-value grid is too coarse).seed (int or None, default None) – Seed for
np.random.default_rng. Pass an integer for reproducible results.survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Already-resolved survey design (per-unit). Array-in helpers accept
ResolvedSurveyDesignONLY; passing aSurveyDesignraisesTypeErrorwith migration guidance. For the pweight-only shortcut, usesurvey_design=make_pweight_design(arr). Triggers the survey-aware Stute calibration: PSU-level Mammen multipliers viadiff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch(), broadcast to per-unit residual perturbation, with weighted CvM recompute. Replicate-weight designs raiseNotImplementedError.survey (ResolvedSurveyDesign or None, keyword-only, default None) – DEPRECATED alias of
survey_design=. Will be removed in the next minor release.weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of
survey_design=make_pweight_design(arr). Will be removed in the next minor release.
- Return type:
- Raises:
ValueError – If
d/dyare not 1D numeric, contain NaN, have unequal lengths, if anydvalue is negative (paper Section 2 HAD support restriction), ifalphais outside(0, 1), or ifn_bootstrap < 99. Also raised if more than one ofsurvey_design,survey,weightsis supplied (3-way mutex;survey=andweights=are deprecated aliases ofsurvey_design=).TypeError – If
survey_design=SurveyDesign(...)(or the deprecatedsurvey=SurveyDesign(...)alias) is passed; array-in helpers acceptResolvedSurveyDesignonly. Usesurvey_design=make_pweight_design(arr)for pweight-only or pre-resolve viaSurveyDesign(...).resolve(data).NotImplementedError – If
survey.replicate_weights is not None. Replicate-weight pretests are a parallel follow-up after Phase 4.5 C; the per-replicate weight-ratio rescaling for the OLS-on-residuals refit step is not covered by the multiplier-bootstrap composition used here.
Notes
Sample-size gate: below G = 10 the CvM statistic is not well-calibrated. In that case the function emits UserWarning and returns all-NaN inference rather than raising.
Large-G warning: at G > 100_000 the per-iteration refit dominates runtime; the function emits a UserWarning pointing users to yatchew_hr_test(). Memory usage remains O(G) regardless (no G x G matrix).
Survey/weighted data (Phase 4.5 C): when weights or survey is supplied, the OLS baseline becomes weighted OLS (_fit_weighted_ols_intercept_slope()), the bootstrap multipliers become PSU-level Mammen draws (broadcast to per-obs perturbation), and the test statistic uses _cvm_statistic_weighted(). Per-unit constant-within-unit invariant on weights/strata/psu/fpc is the CALLER’s responsibility; the workflow (did_had_pretest_workflow()) enforces it via _aggregate_unit_weights() / _aggregate_unit_resolved_survey() from had.py. At w = ones(G), weighted helpers reduce bit-exactly to the unweighted versions but bootstrap p-values diverge by Monte-Carlo noise (different RNG consumption between batched generate_survey_multiplier_weights_batch and per-iteration _generate_mammen_weights); use the distribution-equivalence reduction test (large B) for trivial-pweight parity, NOT numerical equivalence.
References
Stute, W. (1997). Nonparametric model checks for regression. Annals of Statistics 25, 613-641. Mammen, E. (1993). Bootstrap and wild bootstrap for high-dimensional linear models. Annals of Statistics 21, 255-285. de Chaisemartin et al. (2026), Appendix D.
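For orientation, the unweighted statistic and bootstrap p-value described above can be sketched in pure Python. This is a simplified illustration under stated assumptions: the name stute_sketch and its helpers are hypothetical, there is no tie handling, no sample-size gate, and no survey weighting, and the random stream will not match the library's NumPy-based generator.

```python
import math
import random


def _ols(d, y):
    """Intercept and slope of the OLS fit y = a + b * d."""
    n = len(d)
    dbar, ybar = sum(d) / n, sum(y) / n
    sxx = sum((x - dbar) ** 2 for x in d)
    b = sum((x - dbar) * (v - ybar) for x, v in zip(d, y)) / sxx
    return ybar - b * dbar, b


def _cvm(d, resid):
    """S = (1 / G^2) * sum_g (cumulative sum of residuals sorted by d)^2."""
    order = sorted(range(len(d)), key=d.__getitem__)
    cusum, total = 0.0, 0.0
    for i in order:
        cusum += resid[i]
        total += cusum ** 2
    return total / len(d) ** 2


def stute_sketch(d, dy, n_bootstrap=199, seed=0):
    """Return (S, p_value) with a Mammen two-point wild bootstrap."""
    rng = random.Random(seed)
    a, b = _ols(d, dy)
    fitted = [a + b * x for x in d]
    eps = [v - f for v, f in zip(dy, fitted)]
    s_obs = _cvm(d, eps)
    # Mammen (1993) two-point law: mean 0, variance 1.
    lo, hi = (1 - math.sqrt(5)) / 2, (1 + math.sqrt(5)) / 2
    p_lo = (math.sqrt(5) + 1) / (2 * math.sqrt(5))
    exceed = 0
    for _ in range(n_bootstrap):
        eta = [lo if rng.random() < p_lo else hi for _unit in d]
        dy_b = [f + e * w for f, e, w in zip(fitted, eps, eta)]
        a_b, b_b = _ols(d, dy_b)  # refit OLS on the bootstrap outcome
        eps_b = [v - (a_b + b_b * x) for v, x in zip(dy_b, d)]
        if _cvm(d, eps_b) >= s_obs:
            exceed += 1
    return s_obs, (1 + exceed) / (n_bootstrap + 1)


# Noiseless quadratic response on an even grid: linearity is clearly violated.
doses = [i / 199 for i in range(200)]
s_quad, p_quad = stute_sketch(doses, [x ** 2 for x in doses])

# Exactly linear response: residuals vanish up to floating-point error.
s_lin, _ = stute_sketch(doses, [1.0 + 2.0 * x for x in doses], n_bootstrap=99)
```

Sorting by dose before accumulating residuals is what makes the statistic sensitive to systematic curvature: a misspecified linear fit leaves long runs of same-signed residuals, which inflate the cumulative sums.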
- diff_diff.yatchew_hr_test(d, dy, alpha=0.05, *, null='linearity', survey_design=None, survey=None, weights=None)[source]#
Run the Yatchew heteroskedasticity-robust specification test.
Tests one of two nulls (selected via null=) using the variance-ratio statistic
T_hr = sqrt(G) * (sigma2_lin - sigma2_diff) / sigma2_W
where
sigma2_lin = (1/G) * sum(eps^2)  # residuals under chosen null
sigma2_diff = (1/(2G)) * sum((dy_{(g)} - dy_{(g-1)})^2)  # Yatchew differencing
sigma2_W = sqrt((1/(G-1)) * sum(eps_{(g)}^2 * eps_{(g-1)}^2))
and _{(g)} denotes sort by d. Under null="linearity" (default, paper Assumption 8 / Theorem 7) eps are residuals from OLS dy = a + b*d + eps. Under null="mean_independence" eps = dy - mean(dy) (intercept-only OLS), mirroring R YatchewTest::yatchew_test(order=0). The sigma2_diff and sigma2_W formulas are identical between the two modes - the only delta is the residual definition. Rejection uses the one-sided standard-normal critical value z_{1-alpha}.
- Parameters:
d (np.ndarray, shape (G,)) – Dose vector.
dy (np.ndarray, shape (G,)) – First-difference outcome vector.
alpha (float, default 0.05) – One-sided significance level.
null ({"linearity", "mean_independence"}, keyword-only, default "linearity") –
Which null hypothesis the test targets:
"linearity" (default): H_0: E[dY | D] is linear in D (paper Assumption 8, Theorem 7). Residuals come from OLS dy = a + b*d + eps. Bit-exact backcompat with pre-PR calls.
"mean_independence": H_0: E[dY | D] = E[dY] (mean independence of dY from D). Residuals come from intercept-only OLS dy = a + eps, so eps = dy - mean(dy). Mirrors R YatchewTest::yatchew_test(order=0). Used by the R-parity test on placebo Yatchew rows (Credible-Answers/did_had runs order=0 on placebos to test pre-trends as a non-parametric mean-independence assertion).
dis required under both modes (the sort-by-ddifferencing step is null-agnostic).survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Already-resolved survey design (per-unit). Array-in helpers accept
ResolvedSurveyDesignONLY; passing aSurveyDesignraisesTypeError. For pweight-only, usesurvey_design=make_pweight_design(arr). When supplied, the OLS baseline becomes weighted OLS and all three variance components become their pweight-sandwich analogs. PSU clustering is NOT propagated through the variance-ratio statistic (would require deriving a survey-aware variance-of-variance estimator; out of scope per Phase 4.5 C). Replicate-weight designs raiseNotImplementedError.survey (ResolvedSurveyDesign or None, keyword-only, default None) – DEPRECATED alias of
survey_design=. Will be removed in the next minor release.weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of
survey_design=make_pweight_design(arr). Will be removed in the next minor release.
- Return type:
- Raises:
ValueError – If d / dy are not 1D numeric, contain NaN, have unequal lengths, if any d value is negative (paper Section 2 HAD support restriction), or if alpha is outside (0, 1). Also raised if more than one of survey_design, survey, weights is supplied (3-way mutex; survey= and weights= are deprecated aliases of survey_design=), or if any weight is non-positive.
TypeError – If survey_design=SurveyDesign(...) (or the deprecated survey=SurveyDesign(...) alias) is passed; array-in helpers accept ResolvedSurveyDesign only. Use survey_design=make_pweight_design(arr) for pweight-only or pre-resolve via SurveyDesign(...).resolve(data).
NotImplementedError – If survey.replicate_weights is not None (deferred follow-up).
Notes
Sample-size gate: below G = 3 the difference-variance estimator is undefined; the function emits UserWarning and returns NaN rather than raising.
Dose ties: REJECTED with UserWarning + all-NaN result. The difference-based variance estimator sigma2_diff and the heteroskedasticity-robust scale sigma4_W both use adjacent differences of quantities sorted by d; under tied doses the within-tie row ordering is arbitrary (stable sort falls back to input order) so the statistic becomes order-dependent rather than data-dependent. Callers with tied doses (mass-point designs, discretised dose registers) should use stute_test() instead: its tie-safe Cramer-von Mises statistic collapses tie blocks to the post-tie cumulative sum and is provably order-invariant under within-tie permutations.
Exact-linear short-circuit: when the OLS residual sum-of-squares is below IEEE precision relative to the centered total sum of squares (sum(eps^2) <= 1e-24 * sum((dy - dybar)^2), i.e. essentially 1 - R^2 == 0), the test short-circuits to t_stat_hr=-inf, p_value=1.0, reject=False: Assumption 8 holds exactly, the formal statistic is -inf under the one-sided critical value, and the correct decision is fail-to-reject. This shortcut is translation-invariant because the comparison is against centered TSS (not raw sum(dy^2)).
Degenerate sigma4_W = 0 with non-zero residuals: when the adjacent-residual-product sum vanishes AFTER the exact-linear shortcut is bypassed (e.g. residuals alternate zero/non-zero after sorting), the formal statistic is +inf or -inf depending on the sign of the numerator sigma2_lin - sigma2_diff. The function returns the sign-aware limit (p=0, reject=True for positive numerator; p=1, reject=False for negative; NaN for zero) with a UserWarning, rather than unconditionally mapping this to p=1 (which would flip a legitimate rejection).
Survey/weighted data (Phase 4.5 C): when survey_design (or a deprecated survey / weights alias) is supplied, all three variance components use their pweight-sandwich analogs:
sigma2_lin = sum(w * eps^2) / sum(w) (weighted OLS residual variance).
sigma2_diff = sum(w_avg * (dy_g - dy_{g-1})^2) / (2 * sum(w)) where w_avg_g = (w_g + w_{g-1}) / 2 and the divisor uses sum(w) (not sum(w_avg)) so the formula reduces bit-exactly to the unweighted (1/(2G)) divisor at w = ones(G).
sigma4_W = sum(w_avg * eps_g^2 * eps_{g-1}^2) / sum(w_avg) with arithmetic-mean pair weights; reduces to the unweighted (1/(G-1)) divisor at w = ones(G).
T_hr = sqrt(sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W, with sigma2_W = sqrt(sigma4_W).
The pair-weight convention follows Krieger-Pfeffermann (1997, §3) for design-consistent inference on smooth functionals; PSU clustering is NOT propagated through the variance-ratio statistic. Strictly positive weights are required (the adjacent-difference formula has sum(w_avg) in the denominator). Per-unit constant-within-unit invariant on weights/strata/psu/fpc is the CALLER’s responsibility.
References
Yatchew, A. (1997). An elementary estimator of the partial linear model. Economics Letters 57, 135-143.
de Chaisemartin et al. (2026), Theorem 7 / Equation 29.
Krieger, A., Pfeffermann, D. (1997). Testing of distribution functions from complex sample surveys. Journal of Official Statistics 13(2), 123-142.
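The weighted formulas in the Notes can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the stated formulas for the "linearity" null only, not the library's implementation: the function name yatchew_hr_weighted and its return keys are hypothetical, and it omits the tie rejection, the sample-size gate, and the degenerate-case handling described above.

```python
import math
from statistics import NormalDist

import numpy as np


def yatchew_hr_weighted(d, dy, w=None):
    """Hypothetical sketch of T_hr under the linearity null (no guards)."""
    d = np.asarray(d, dtype=float)
    dy = np.asarray(dy, dtype=float)
    w = np.ones_like(d) if w is None else np.asarray(w, dtype=float)

    # The differencing step operates on rows sorted by dose.
    order = np.argsort(d, kind="stable")
    d, dy, w = d[order], dy[order], w[order]

    # Weighted OLS of dy on [1, d] via sqrt(w)-scaled least squares.
    X = np.column_stack([np.ones_like(d), d])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], dy * sw, rcond=None)
    eps = dy - X @ beta

    # Arithmetic-mean pair weights for adjacent (g-1, g) rows.
    w_avg = 0.5 * (w[1:] + w[:-1])

    sigma2_lin = np.sum(w * eps**2) / np.sum(w)
    sigma2_diff = np.sum(w_avg * np.diff(dy) ** 2) / (2.0 * np.sum(w))
    sigma4_W = np.sum(w_avg * eps[1:] ** 2 * eps[:-1] ** 2) / np.sum(w_avg)
    sigma2_W = math.sqrt(sigma4_W)

    t_hr = math.sqrt(np.sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W
    p_value = 1.0 - NormalDist().cdf(t_hr)  # one-sided, 1 - Phi(T_hr)
    return {
        "t_stat_hr": float(t_hr),
        "p_value": float(p_value),
        "sigma2_lin": float(sigma2_lin),
        "sigma2_diff": float(sigma2_diff),
        "sigma2_W": float(sigma2_W),
    }
```

With w = ones(G) the pair weights are all 1, so sum(w) = G and sum(w_avg) = G - 1, which recovers the unweighted 1/(2G) and 1/(G-1) divisors.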
- class diff_diff.QUGTestResults[source]
Bases: object
Result of qug_test() (paper Theorem 4).
The QUG test rejects H_0: d_lower = 0 when the order-statistic ratio T = D_{(1)} / (D_{(2)} - D_{(1)}) exceeds 1/alpha - 1. Under the null, the asymptotic limit law of T is the ratio of two independent Exp(1) random variables, with CDF F(t) = t / (1 + t), so p_value = 1 / (1 + T).
- t_stat
D_{(1)} / (D_{(2)} - D_{(1)}). NaN when fewer than 2 non-zero observations remain or when the two smallest doses tie.
- Type:
float
- p_value
1 / (1 + t_stat) under the null. NaN when t_stat is NaN.
- Type:
float
- reject
True iff t_stat > critical_value. False on NaN statistic.
- Type:
bool
- alpha
Significance level used.
- Type:
float
- critical_value
1 / alpha - 1. Populated even when the statistic is NaN so downstream readers can inspect the decision threshold.
- Type:
float
- n_obs
Number of observations after filtering to d > 0.
- Type:
int
- n_excluded_zero
Number of zero-dose observations excluded from the sample.
- Type:
int
- d_order_1
Smallest positive dose D_{(1)}. NaN when n_obs < 2.
- Type:
float
- d_order_2
Second-smallest positive dose D_{(2)}. NaN when n_obs < 2.
- Type:
float
- t_stat: float
- p_value: float
- reject: bool
- alpha: float
- critical_value: float
- n_obs: int
- n_excluded_zero: int
- d_order_1: float
- d_order_2: float
- print_summary()[source]
Print the summary to stdout.
- Return type:
None
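The decision rule documented on QUGTestResults is short enough to write out directly. The sketch below is an assumption-laden illustration, not the body of qug_test(): the helper name qug_statistic is hypothetical, and it returns a plain dict rather than the result class.

```python
import numpy as np


def qug_statistic(doses, alpha=0.05):
    """Hypothetical sketch of the QUG order-statistic ratio test."""
    d = np.asarray(doses, dtype=float)
    positive = np.sort(d[d > 0])          # zero doses are excluded
    n_obs = int(positive.size)
    critical_value = 1.0 / alpha - 1.0    # populated even if T is NaN

    if n_obs < 2 or positive[0] == positive[1]:
        # Too few positive doses, or the two smallest doses tie.
        t_stat = p_value = float("nan")
        reject = False
    else:
        d1, d2 = positive[0], positive[1]
        t_stat = float(d1 / (d2 - d1))
        p_value = 1.0 / (1.0 + t_stat)    # 1 - F(T) with F(t) = t / (1 + t)
        reject = bool(t_stat > critical_value)

    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "reject": reject,
        "critical_value": critical_value,
        "n_obs": n_obs,
        "n_excluded_zero": int(np.sum(d == 0)),
    }
```

Rejecting when T > 1/alpha - 1 is the same decision as rejecting when p_value < alpha, since p_value = 1 / (1 + T) is strictly decreasing in T.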
- class diff_diff.StuteTestResults[source]
Bases: object
Result of stute_test() (paper Appendix D).
The Stute test rejects the null that E[ΔY | D_2] is linear in D_2 (paper Assumption 8) when the sorted-residual CvM statistic S = (1/G^2) Σ (Σ_{h=1}^g eps_{(h)})^2 exceeds the Mammen wild bootstrap 1 - alpha quantile.
- cvm_stat
CvM statistic. NaN when G < 10 (below the threshold the statistic is not well-calibrated).
- Type:
float
- p_value
Bootstrap p-value (1 + sum(S_b >= S)) / (B + 1). NaN when the statistic is NaN.
- Type:
float
- reject
True iff p_value <= alpha. False on NaN.
- Type:
bool
- alpha
Significance level used.
- Type:
float
- n_bootstrap
Number of Mammen wild bootstrap replications.
- Type:
int
- n_obs
Number of observations.
- Type:
int
- seed
Seed passed to np.random.default_rng. None when unseeded.
- Type:
int or None
- cvm_stat: float
- p_value: float
- reject: bool
- alpha: float
- n_bootstrap: int
- n_obs: int
- print_summary()[source]
Print the summary to stdout.
- Return type:
None
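As a rough illustration of the quantities on StuteTestResults, the sketch below computes the sorted-residual CvM statistic and a Mammen wild-bootstrap p-value for the linearity null. All function names are hypothetical, the two-point multiplier distribution is the standard Mammen (1993) choice, and the tie-block collapsing and small-sample gate of stute_test() are deliberately left out.

```python
import numpy as np


def cvm_statistic(eps, d):
    """S = (1/G^2) * sum_g (sum_{h<=g} eps_(h))^2, residuals sorted by d."""
    eps = np.asarray(eps, dtype=float)
    order = np.argsort(np.asarray(d, dtype=float), kind="stable")
    partial_sums = np.cumsum(eps[order])
    return float(np.sum(partial_sums**2) / eps.size**2)


def mammen_multipliers(rng, size):
    """Two-point Mammen weights with mean 0 and variance 1."""
    s5 = np.sqrt(5.0)
    low, high = -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0
    p_low = (s5 + 1.0) / (2.0 * s5)
    return np.where(rng.random(size) < p_low, low, high)


def stute_sketch(d, dy, n_bootstrap=999, seed=None):
    """Hypothetical single-horizon Stute test (no tie handling)."""
    d = np.asarray(d, dtype=float)
    dy = np.asarray(dy, dtype=float)
    X = np.column_stack([np.ones_like(d), d])
    proj = np.linalg.pinv(X)              # reused for every refit
    fitted = X @ (proj @ dy)
    eps = dy - fitted
    s_obs = cvm_statistic(eps, d)

    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_bootstrap):
        # Rebuild outcomes under the null, then refit and recompute S.
        dy_star = fitted + eps * mammen_multipliers(rng, d.size)
        eps_star = dy_star - X @ (proj @ dy_star)
        exceed += int(cvm_statistic(eps_star, d) >= s_obs)

    p_value = (1 + exceed) / (n_bootstrap + 1)
    return s_obs, p_value
```

The (1 + count) / (B + 1) form keeps the p-value strictly positive, so a finite number of bootstrap draws never reports an exact zero.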
- class diff_diff.YatchewTestResults[source]
Bases: object
Result of yatchew_hr_test() (paper Theorem 7 / Equation 29).
Heteroskedasticity-robust specification test using Yatchew’s difference-based variance estimator. Two nulls are supported via the null= argument on yatchew_hr_test() and reflected on the null_form attribute below: "linearity" (default; paper Theorem 7, the same null as stute_test(), residuals from OLS dy ~ 1 + d) and "mean_independence" (R-parity extension mirroring R YatchewTest::yatchew_test(order=0), residuals from intercept-only OLS dy ~ 1). The test statistic T_hr = sqrt(G) * (sigma2_lin - sigma2_diff) / sigma2_W is asymptotically N(0, 1) under H_0 in both modes; rejection uses the one-sided standard-normal critical value. Only the residual definition (and therefore sigma2_lin) differs between modes; the sigma2_diff / sigma2_W / sort-by-d machinery is shared.
- t_stat_hr
Test statistic T_hr from paper Equation 29. NaN when G < 3.
- Type:
float
- p_value
1 - Phi(T_hr). NaN when the statistic is NaN.
- Type:
float
- reject
True iff T_hr >= critical_value. False on NaN.
- Type:
bool
- alpha
Significance level used.
- Type:
float
- critical_value
One-sided standard-normal critical value z_{1 - alpha}.
- Type:
float
- sigma2_lin
Residual variance under the chosen null. Under null_form="linearity": residual variance from OLS of dy on d. Under null_form="mean_independence": (1/G) * sum((dy - mean(dy))^2), the population variance of dy.
- Type:
float
- sigma2_diff
Yatchew differencing variance (1 / (2G)) * sum((dy_{(g)} - dy_{(g-1)})^2); the divisor is 2G (paper-literal), NOT 2(G-1).
- Type:
float
- sigma2_W
Heteroskedasticity-robust scale sqrt((1 / (G-1)) * sum(eps_{(g)}^2 * eps_{(g-1)}^2)).
- Type:
float
- n_obs
Number of observations.
- Type:
int
- null_form
"linearity" (default; H_0: E[dY|D] is linear in D, residuals from OLS dy ~ 1 + d) or "mean_independence" (H_0: E[dY|D] = E[dY], residuals from intercept-only OLS dy ~ 1). Mirrors R YatchewTest::yatchew_test’s order argument (order=1 ↔ "linearity"; order=0 ↔ "mean_independence").
- Type:
str
- t_stat_hr: float
- p_value: float
- reject: bool
- alpha: float
- critical_value: float
- sigma2_lin: float
- sigma2_diff: float
- sigma2_W: float
- n_obs: int
- null_form: str = 'linearity'
- print_summary()[source]
Print the summary to stdout.
- Return type:
None
- __init__(t_stat_hr, p_value, reject, alpha, critical_value, sigma2_lin, sigma2_diff, sigma2_W, n_obs, null_form='linearity')
Joint multi-period tests (aggregate="event_study")#
- diff_diff.stute_joint_pretest(residuals_by_horizon, fitted_by_horizon, doses, design_matrix, *, alpha=0.05, n_bootstrap=999, seed=None, null_form='custom', survey_design=None, survey=None, weights=None)[source]#
Joint Cramer-von Mises pretest across multiple horizons.
Generalizes stute_test() to K horizons with the joint statistic S_joint = sum_k S_k, where S_k is the single-horizon CvM on residuals eps_{g,k}. Inference is via Mammen wild bootstrap with a shared multiplier eta_g across horizons per unit to preserve the vector-valued empirical process’s unit-level dependence.
Note: sum-of-CvMs aggregation follows the standard joint specification-test construction (Delgado 1993; Escanciano 2006). The paper does not prescribe an aggregation; sum-of-CvMs balances power across diffuse vs concentrated alternatives and bootstraps cleanly with the shared-eta structure.
Bootstrap uses the literal per-iteration OLS refit form (paper Appendix D) for consistency with Phase 3’s stute_test(). XtX_inv_Xt is precomputed once (same design matrix each iteration), so the refit cost is O(Gp) per bootstrap draw and the overall loop is dominated by _cvm_statistic() across K horizons.
- Parameters:
residuals_by_horizon (dict[str, np.ndarray]) – {label: eps_g} per horizon. All values must have identical length G and be unit-ordered consistently with doses.
fitted_by_horizon (dict[str, np.ndarray]) – {label: fitted_g} per horizon. Required to reconstruct bootstrap outcomes dy*_{g,k} = fitted_{g,k} + eps_{g,k} * eta_g under the null.
doses (np.ndarray, shape (G,)) – Dose per unit. Shared across horizons (HAD contract: dose is time-invariant per unit). Must be finite and non-negative.
design_matrix (np.ndarray, shape (G, p)) – Regression design used in the per-horizon bootstrap refit. Mean-independence: [1] (intercept only). Linearity: [1, doses]. The matrix is identical across horizons.
alpha (see stute_test().)
n_bootstrap (see stute_test().)
seed (see stute_test().)
null_form (str) – Diagnostic label recorded on the result ("mean_independence" | "linearity" | "custom"). The wrappers joint_pretrends_test() and joint_homogeneity_test() set this automatically.
survey_design (ResolvedSurveyDesign or None, keyword-only, default None) – Already-resolved per-unit survey design (Phase 4.5 C). Array-in helpers accept ResolvedSurveyDesign ONLY; passing a SurveyDesign raises TypeError. For pweight-only, use survey_design=make_pweight_design(arr). When supplied, the bootstrap is a PSU-level Mammen multiplier bootstrap with the multiplier matrix shared across horizons within each replicate (preserves both vector-valued empirical-process unit-level dependence + PSU clustering). Replicate-weight designs raise NotImplementedError; non-pweight weight types are rejected. Variance-unidentified designs (df_survey <= 0) return NaN with a UserWarning instead of calibrating against an all-zero multiplier matrix.
survey (ResolvedSurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.
weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias of survey_design=make_pweight_design(arr). Will be removed in the next minor release.
- Returns:
On the common path, a populated result with bootstrap-based p_value and cvm_stat_joint. On the small-sample branch (G < _MIN_G_STUTE), constant-dose branch (np.ptp(doses) <= 0), or any-NaN branch in the input residuals / fitted arrays, returns an all-NaN result (with reject=False and the full per_horizon_stats dict keyed by the validated horizon labels) and emits a UserWarning for the first two branches. Mirrors the single-horizon stute_test() contract so event-study workflows on small or staggered-filtered panels surface an inconclusive report rather than crashing.
- Return type:
StuteJointResult
- Raises:
ValueError – On empty input, key-mismatch, stringified-label collisions between distinct raw keys, shape-mismatch, doses containing negative values, n_bootstrap < _MIN_N_BOOTSTRAP, or invalid alpha. G < _MIN_G_STUTE does NOT raise; see Returns.
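The shared-multiplier idea described for stute_joint_pretest() can be illustrated with a small unweighted sketch. It is not the library's code: joint_stute_sketch and _cvm are hypothetical names, the multipliers are the standard two-point Mammen weights, and the validation, small-sample, and survey branches documented above are omitted.

```python
import numpy as np


def _cvm(eps, order):
    """Single-horizon CvM on residuals sorted by dose."""
    return float(np.sum(np.cumsum(eps[order]) ** 2) / eps.size**2)


def joint_stute_sketch(residuals_by_horizon, fitted_by_horizon, doses,
                       design_matrix, n_bootstrap=999, seed=None):
    """Hypothetical joint CvM with one eta_g per unit shared across horizons."""
    doses = np.asarray(doses, dtype=float)
    order = np.argsort(doses, kind="stable")
    X = np.asarray(design_matrix, dtype=float)
    proj = np.linalg.pinv(X)              # precomputed once, reused per draw
    labels = list(residuals_by_horizon)
    resid = {k: np.asarray(residuals_by_horizon[k], dtype=float) for k in labels}
    fitted = {k: np.asarray(fitted_by_horizon[k], dtype=float) for k in labels}

    per_horizon = {k: _cvm(resid[k], order) for k in labels}
    s_joint = sum(per_horizon.values())   # S_joint = sum_k S_k

    rng = np.random.default_rng(seed)
    s5 = np.sqrt(5.0)
    exceed = 0
    for _ in range(n_bootstrap):
        # Draw ONE multiplier per unit and reuse it for every horizon.
        eta = np.where(rng.random(doses.size) < (s5 + 1) / (2 * s5),
                       -(s5 - 1) / 2, (s5 + 1) / 2)
        s_star = 0.0
        for k in labels:
            dy_star = fitted[k] + resid[k] * eta
            eps_star = dy_star - X @ (proj @ dy_star)
            s_star += _cvm(eps_star, order)
        exceed += int(s_star >= s_joint)

    return s_joint, (1 + exceed) / (n_bootstrap + 1), per_horizon
```

Drawing eta once per replicate, outside the horizon loop, is what keeps each unit's residuals moving together across horizons; drawing it inside the loop would treat the horizons as independent.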
- diff_diff.joint_pretrends_test(data, outcome_col, dose_col, time_col, unit_col, pre_periods, base_period, first_treat_col=None, *, alpha=0.05, n_bootstrap=999, seed=None, survey_design=None, survey=None, weights=None, trends_lin=False)[source]#
Joint Stute pre-trends test (paper Section 4.2 step 2).
Data-in wrapper around stute_joint_pretest() for the mean-independence null E[Y_{g,t} - Y_{g,base} | D_{g,treat}] = mu_t across multiple pre-period placebos. For each t in pre_periods, residuals are the deviations of Y_{g,t} - Y_{g,base} from their cross-unit mean (an intercept-only OLS fit); the joint CvM tests whether the conditional mean depends on D.
Use this wrapper to close the paper’s step-2 pre-trends gap that did_had_pretest_workflow() otherwise flags. On a panel with at least one earlier pre-period, the aggregate="event_study" dispatch calls this wrapper internally.
- Parameters:
data (pd.DataFrame)
outcome_col (str)
dose_col (str)
time_col (str)
unit_col (str)
pre_periods (list) – Non-empty list of pre-period labels (all < base_period, all with D = 0 across every unit). Empty list raises; the workflow dispatch handles the “no earlier pre-period” case by setting pretrends_joint=None rather than calling this wrapper.
base_period (period label) – The reference period. Must not be in pre_periods. Must also satisfy D = 0 across every unit (reciprocal of the pre-period HAD invariant: base is itself a pre-period in the four-step workflow).
first_treat_col (str or None) – Forwarded to the underlying panel validator; matched cohort handling follows the HAD contract (staggered auto-filter warns and proceeds on last cohort; solo cohort proceeds).
alpha (as in stute_test().)
n_bootstrap (as in stute_test().)
seed (as in stute_test().)
survey_design (SurveyDesign or None, keyword-only, default None) – Survey design (Phase 4.5 C). Resolved on the filtered panel; replicate-weight designs raise NotImplementedError; weight_type must be "pweight". Forwarded to stute_joint_pretest() as a per-unit ResolvedSurveyDesign. Mutually exclusive with the deprecated survey= and weights= aliases.
survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.
weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias for the per-row pweight shortcut. Prefer survey_design=SurveyDesign(weights='col_name') against your dataframe instead. Will be removed in the next minor release.
trends_lin (bool, default False, keyword-only) – When True, applies paper Eq 17 / Eq 18 linear-trend detrending: per-group slope estimated as Y[g, base] - Y[g, base - 1] and subtracted from each pre-period horizon’s outcome evolution as (t - base) × slope. Mirrors R DIDHAD::did_had(..., trends_lin=TRUE) on its joint Stute pre-trends surface (paper Section 5.2 Pierce-Schott application). Requires base_period to equal the last validated pre-period (t_pre_list[-1], i.e. the canonical F-1 anchor). Direct callers passing a non-terminal base get a ValueError: Eq 17 / R both anchor at F-1 and any other anchor would compute a different slope and detrending. The previous validated pre-period (t_pre_list[-2], F-2) must also be present so the slope is identified. The “consumed” placebo at F-2 is dropped from pre_periods explicitly (its detrended residual is mechanically zero by construction); a UserWarning fires when the filter triggers. If pre_periods becomes empty after the drop, raises ValueError (no testable placebo horizons remain). Mutually exclusive with survey weighting (survey_design / survey / weights); raises NotImplementedError if combined. Default False preserves bit-exact backcompat.
- Return type:
StuteJointResult with null_form = "mean_independence".
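The trends_lin detrending rule is easy to check by hand on a toy panel. The array below is made-up illustration data (units in rows, four consecutive pre-periods in columns, with the last column playing the role of the F-1 base), and detrended_evolution is a hypothetical helper, not a library function.

```python
import numpy as np

# Hypothetical wide panel: rows are units g, columns are consecutive periods.
periods = np.array([1, 2, 3, 4])          # base = 4 (F-1), so F-2 = 3
Y = np.array([
    [1.0, 2.0, 3.5, 5.0],
    [0.0, 0.5, 0.5, 1.5],
])
base_idx = 3

# Per-unit slope: Y[g, base] - Y[g, base - 1].
slope = Y[:, base_idx] - Y[:, base_idx - 1]


def detrended_evolution(t_idx):
    """(Y[g, t] - Y[g, base]) minus the linear trend (t - base) * slope."""
    raw = Y[:, t_idx] - Y[:, base_idx]
    return raw - (periods[t_idx] - periods[base_idx]) * slope


at_f_minus_2 = detrended_evolution(2)     # the "consumed" placebo
at_f_minus_3 = detrended_evolution(1)     # a horizon that remains testable
```

At F-2 the raw evolution equals minus the slope and (t - base) equals -1, so the two terms cancel for every unit. That is why the F-2 placebo carries no information once trends_lin=True and is dropped from pre_periods.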
- diff_diff.joint_homogeneity_test(data, outcome_col, dose_col, time_col, unit_col, post_periods, base_period, first_treat_col=None, *, alpha=0.05, n_bootstrap=999, seed=None, survey_design=None, survey=None, weights=None, trends_lin=False)[source]#
Joint Stute homogeneity-linearity test (paper Section 4.3 joint).
Data-in wrapper around stute_joint_pretest() for the linearity null E[Y_{g,t} - Y_{g,base} | D_{g,t}] = beta_{0,t} + beta_{fe,t} * D_{g,t} across multiple post-period horizons. For each t in post_periods, residuals are from an OLS regression of Y_{g,t} - Y_{g,base} on [1, D_g]; the joint CvM tests whether the conditional mean is nonlinear in D in any horizon.
Used by did_had_pretest_workflow() with aggregate="event_study" as the step-3 test (no joint Yatchew variant exists because the paper does not derive one; users who need Yatchew-style adjacent-difference robustness can call yatchew_hr_test() on each (base, post) pair manually).
- Parameters:
data (pd.DataFrame)
outcome_col (str)
dose_col (str)
time_col (str)
unit_col (str)
post_periods (list) – Non-empty list of post-period labels (all strictly > base_period by chronological order; each with D > 0 for some unit, i.e. at least one treated unit per horizon).
base_period (period label) – The reference period (last pre-period in the event-study convention). Must not be in post_periods.
first_treat_col (str or None) – Forwarded to the underlying panel validator.
alpha (as in stute_test().)
n_bootstrap (as in stute_test().)
seed (as in stute_test().)
survey_design (SurveyDesign or None, keyword-only, default None) – Survey design (Phase 4.5 C). Same contract as joint_pretrends_test(). Mutually exclusive with the deprecated survey= and weights= aliases.
survey (SurveyDesign or None, keyword-only, default None) – DEPRECATED alias of survey_design=. Will be removed in the next minor release.
weights (np.ndarray or None, keyword-only, default None) – DEPRECATED alias for the per-row pweight shortcut. Prefer survey_design=SurveyDesign(weights='col_name') against your dataframe instead. Will be removed in the next minor release.
trends_lin (bool, default False, keyword-only) – When True, applies paper page-32 linear-trend detrending: per-group slope estimated as Y[g, base] - Y[g, base - 1] and applied to each post-period horizon’s outcome evolution as (t - base) × slope (forward extrapolation into post). Same slope estimator as joint_pretrends_test(). Mirrors R DIDHAD::did_had(..., trends_lin=TRUE) on its joint homogeneity surface (paper Section 4.3, Pierce-Schott p=0.40 anchor). Requires base_period to equal the last validated pre-period (t_pre_list[-1], the canonical F-1 anchor) AND F-2 to be present in the panel so the slope is identified. Direct callers passing a non-terminal base get a ValueError: Eq 17 / R both anchor at F-1. Mutually exclusive with survey weighting; raises NotImplementedError if combined. Default False preserves bit-exact backcompat.
- Return type:
StuteJointResult with null_form = "linearity".
- class diff_diff.StuteJointResult[source]
Bases: object
Result of stute_joint_pretest() (joint Cramer-von Mises across horizons).
Aggregates the per-horizon Stute (1997) CvM statistic into a joint specification test: S_joint = sum_k S_k, where S_k is the single-horizon CvM on residuals eps_{g,k}. Inference is via Mammen (1993) wild bootstrap with a shared multiplier eta_g across horizons per unit (Delgado-Manteiga 2001; Hlavka-Huskova 2020) to preserve the unit-level dependence structure of the vector-valued empirical process.
Two nulls are supported via the thin wrappers joint_pretrends_test() (mean-independence: E[Y_t - Y_base | D] = mu_t, design matrix [1]) and joint_homogeneity_test() (linearity: E[Y_t - Y_base | D_t] = beta_{0,t} + beta_{fe,t} * D, design matrix [1, D]). Both wrappers accept a trends_lin: bool = False keyword-only flag (PR #392): when True, applies paper Eq 17 / Eq 18 linear-trend detrending before the joint CvM using per-group slope Y[g, F-1] - Y[g, F-2].
- cvm_stat_joint
Joint statistic S_joint = sum_k S_k. NaN on NaN-propagation.
- Type:
float
- p_value
Bootstrap p-value (1 + sum(S*_b >= S_joint)) / (B + 1). NaN when the statistic is NaN. 1.0 when the per-horizon exact-linear short-circuit fires (all horizons machine-exact linear).
- Type:
float
- reject
True iff p_value <= alpha. Always False on NaN.
- Type:
bool
- alpha
Significance level.
- Type:
float
- horizon_labels
Horizon identifiers as str(t) for each period. String identity only, NOT a chronological ordering key. Callers who need chronological order should preserve the original period values alongside (a downstream plotter sorting labels lexicographically will misorder e.g. ["2003-Q10", "2003-Q2", ...]).
- per_horizon_stats
{label: S_k} diagnostic. Per-horizon p-values are NOT exposed (decomposing the joint bootstrap into K independent loops is a K-fold memory/time cost; deferred). Callers who need per-horizon p-values can call stute_test() separately on each (period, residual) pair.
On NaN-propagation (any horizon has NaN input), this dict is preserved with {label: np.nan for label in horizon_labels}, NOT an empty dict, NOT a partial dict: the keys carry diagnostic value (which horizons were attempted), and the NaN values signal that no statistic was computed.
- n_bootstrap
- Type:
int
- n_obs
Number of units G.
- Type:
int
- n_horizons
- Type:
int
- seed
- Type:
int or None
- null_form
"mean_independence" (from joint_pretrends_test()) or "linearity" (from joint_homogeneity_test()). "custom" when called directly via stute_joint_pretest() without a wrapper.
- Type:
str
- exact_linear_short_circuited
True when every horizon’s residual SSR to centered TSS ratio is below _EXACT_LINEAR_RELATIVE_TOL; bootstrap is skipped and p_value = 1.0. The per-horizon check ensures a single degenerate horizon does not collapse the joint test when other horizons have nontrivial residuals.
- Type:
bool
- cvm_stat_joint: float
- p_value: float
- reject: bool
- alpha: float
- horizon_labels: list
- n_bootstrap: int
- n_obs: int
- n_horizons: int
- null_form: str
- exact_linear_short_circuited: bool
- print_summary()[source]
Print the summary to stdout.
- Return type:
None
- to_dataframe()[source]
Return a one-row DataFrame of the top-level result fields.
- Return type:
- __init__(cvm_stat_joint, p_value, reject, alpha, horizon_labels, per_horizon_stats, n_bootstrap, n_obs, n_horizons, seed, null_form, exact_linear_short_circuited)
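The warning on horizon_labels about lexicographic ordering can be reproduced with plain Python. The (year, quarter-index) tuples and the label format below are hypothetical stand-ins for whatever period values a caller's panel uses; the point is only that the original period values, not the stringified labels, should drive the sort.

```python
# Hypothetical period values kept alongside their stringified labels.
periods = [(2003, 2), (2003, 10), (2003, 3)]
labels = [f"{year}-Q{idx}" for year, idx in periods]

# Sorting the strings compares character by character, so "Q10" < "Q2".
lexicographic = sorted(labels)

# Sorting on the original period values preserves chronological order.
chronological = [label for _, label in sorted(zip(periods, labels))]
```

A caller plotting per_horizon_stats would iterate over chronological (or an equivalent list built from the panel's own period column) rather than over sorted(labels).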