Interactive notebook

This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.

Tutorial 22: Survey-Weighted HAD - The BRFSS-Shape Rollout#

The HAD series so far ran on simple iid panels. T20 (docs/tutorials/20_had_brand_campaign.ipynb) walked the headline workflow; T21 (docs/tutorials/21_had_pretest_workflow.ipynb) layered in the pretest diagnostics. Realistic federal-survey designs (BRFSS / CPS / NHANES shape - stratified household samples with PSU clustering, post-stratification weights, and a finite population correction) became demonstrable end-to-end through the HAD workflow only on 2026-05-14, when the gate on SurveyDesign(strata=...) was lifted across the Stute pretest family. T22 shows the full design-aware workflow on a stylized BRFSS-shape state rollout: 60 states organized as 5 strata x 6 PSUs/stratum x 2 states/PSU, with post-stratification raking weights, an iid PSU x period shock to inject real cluster correlation, and a single national health-campaign intensity rolled out at week 5.

Prerequisites. T16 (docs/tutorials/16_survey_did.ipynb) for the SurveyDesign API and survey-DiD background; T20 for the HAD headline fit; T21 for the pretest workflow.

Sections:

  1. Survey design adds two problems for HAD

  2. The panel: BRFSS-shape state rollout

  3. Naive vs survey-aware headline fit

  4. Why the SE inflation is modest for HAD

  5. Event-study under the survey design

  6. Pretest workflow under the survey design

  7. Communicating the design-based verdict to leadership

  8. Extensions + summary checklist

[ ]:
import warnings

import numpy as np
import pandas as pd

from diff_diff import (
    HAD,
    SurveyDesign,
    did_had_pretest_workflow,
    generate_continuous_did_data,
)

try:
    import matplotlib.pyplot as plt
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False

# Locked DGP parameters (mirrored in tests/test_t22_had_survey_design_drift.py).
MAIN_SEED = 87
N_UNITS = 60
N_PERIODS = 8
COHORT_PERIOD = 5
TRUE_SLOPE = 100.0
BASELINE_OUTCOME = 35.0
DOSE_LOW = 5.0
DOSE_HIGH = 50.0

# Survey-design layer.
N_STRATA = 5
PSU_PER_STRATUM = 6
STATES_PER_PSU = 2
WEIGHT_CV_TARGET = 0.30
FPC_PER_STRATUM = 30
PSU_PERIOD_SHOCK_SD = 1.5
SD_SEED = 87

# Pretest workflow.
WORKFLOW_SEED = 22
N_BOOTSTRAP = 999

1. Survey design adds two problems for HAD#

When the panel comes out of a complex survey instead of a simple random sample, HAD’s headline numbers acquire two new failure modes:

  • Cluster correlation. PSUs (households-within-tract, individuals-within-PSU, states-within-region) share unobserved shocks. Treating them as iid under-states the SE; a 95% pointwise CI written off the naive variance under-covers the truth.

  • Unequal sampling weights. When the probability of being sampled is not flat across the population, the unweighted point estimate is the sample-quantity, not the population-quantity it usually has to represent for policy.

The survey-design fix is the standard library composition: stratified PSU-sandwich variance via Binder-TSL (Binder 1983; Lumley 2004) with a finite-population correction, weights as pweight, and PSU as the clustering unit. T16 walks the API end-to-end on a binary-cohort DiD; T22 specializes it for HAD. One headline difference to flag up front: HAD’s WAS-d_lower estimand reads variance from a local-linear neighborhood at the boundary rather than from a regression coefficient on the full panel - so survey-aware SE inflation manifests differently than the rule-of-thumb you would expect from CallawaySantAnna or LinearRegression. We come back to this in section 4.

The Phase 4.5 C0 deferral is the second user-facing caveat: the HAD QUG step (paper Section 4 step 1; tests H_0: d_lower = 0 via the extreme order statistic of the dose) is permanently deferred under survey/weights. Extreme order statistics are not Hadamard- differentiable functionals of the empirical CDF, so no off-the-shelf survey-adjusted EVT fits the slot. The pretest workflow surfaces this deferral in the verdict text and on report.qug (set to None); the linearity-conditional remainder of the verdict still runs. Section 6 walks the verdict text in detail.

2. The panel: BRFSS-shape state rollout#

The DGP shape is identical to T20 (60 states x 8 weeks; rollout at w=5; dose ~ Uniform[$5K, $50K] of supplemental campaign spend per state; true ATT linear in dose with slope=100 screening uptake per $1K spend; outcome shifted by a baseline of 35 per 1,000 adults). The new piece is the survey layer: a permutation-based assignment of states to a 5 x 6 stratum x PSU grid (5 census-region x urbanicity strata, 6 PSUs/stratum, 2 states/PSU = 60), per-state post- stratification raking weights with CV ~ 0.30, and a PSU x period iid shock injected into the outcome so cluster correlation survives DiD first-differencing.

That last point is load-bearing. A time-invariant per-PSU random intercept cancels exactly under DiD: the (post - pre) average shifts each state by a constant, and the slope on dose is unaffected. generate_survey_did_data documents this directly (prep_dgp.py:1517-1523): “the time-invariant PSU RE cancels in the treatment-vs-control time-difference and the cluster-robust / survey SE would be smaller than naive OLS SE.” The fix is a PSU x period interaction shock: an iid draw per (psu, period) cell. T16’s reference helper carries the same composition (psu_re_sd=2.0 paired with psu_period_factor=1.0).

Helper kept in-notebook so practitioners can copy it into their own workflow:

[ ]:
def attach_brfss_survey_columns(
    panel,
    *,
    seed,
    n_strata=N_STRATA,
    psu_per_stratum=PSU_PER_STRATUM,
    states_per_psu=STATES_PER_PSU,
    weight_cv=WEIGHT_CV_TARGET,
    fpc_per_stratum=FPC_PER_STRATUM,
    psu_period_shock_sd=PSU_PERIOD_SHOCK_SD,
):
    """Attach a BRFSS-shape survey design to a HAD panel.

    Adds: stratum (int), psu_id (int), weight (float), fpc (float).
    Also injects a PSU x period shock into the outcome so cluster
    correlation survives DiD differencing. Constant-within-state on
    the four design columns.
    """
    rng = np.random.default_rng(seed)
    state_ids = np.sort(panel["state_id"].unique())
    n_states = len(state_ids)
    n_psu = n_strata * psu_per_stratum
    if n_states != n_psu * states_per_psu:
        raise ValueError(
            f"state count {n_states} must equal n_strata*psu_per_stratum*"
            f"states_per_psu = {n_psu * states_per_psu}"
        )
    perm = rng.permutation(n_states)
    psu_block = np.repeat(np.arange(n_psu), states_per_psu)
    psu_of_state = psu_block[np.argsort(perm)]
    stratum_of_state = psu_of_state // psu_per_stratum
    base_per_stratum = np.array([0.8, 0.9, 1.0, 1.1, 1.3])
    base_w = base_per_stratum[stratum_of_state]
    sigma = np.sqrt(np.log(1 + weight_cv ** 2))
    pert = rng.lognormal(mean=-0.5 * sigma ** 2, sigma=sigma, size=n_states)
    w_per_state = base_w * pert
    state_lookup = pd.DataFrame({
        "state_id": state_ids,
        "stratum": stratum_of_state.astype(np.int64),
        "psu_id": psu_of_state.astype(np.int64),
        "weight": w_per_state,
        "fpc": float(fpc_per_stratum),
    })
    out = panel.merge(state_lookup, on="state_id", how="left")
    n_periods_local = int(panel["week"].max() - panel["week"].min() + 1)
    psu_period_shocks = rng.normal(0.0, psu_period_shock_sd, size=(n_psu, n_periods_local))
    week_min = int(panel["week"].min())
    shock_lookup = pd.DataFrame([
        {"psu_id": int(p), "week": int(w + week_min), "psu_period_shock": float(psu_period_shocks[p, w])}
        for p in range(n_psu)
        for w in range(n_periods_local)
    ])
    out = out.merge(shock_lookup, on=["psu_id", "week"], how="left")
    out["screening_uptake"] = out["screening_uptake"] + out["psu_period_shock"]
    return out.drop(columns=["psu_period_shock"])

[ ]:
raw = generate_continuous_did_data(
    n_units=N_UNITS,
    n_periods=N_PERIODS,
    cohort_periods=[COHORT_PERIOD],
    never_treated_frac=0.0,
    dose_distribution="uniform",
    dose_params={"low": DOSE_LOW, "high": DOSE_HIGH},
    att_function="linear",
    att_intercept=0.0,
    att_slope=TRUE_SLOPE,
    unit_fe_sd=8.0,
    time_trend=0.5,
    noise_sd=2.0,
    seed=MAIN_SEED,
)
panel = raw.copy()
panel.loc[panel["period"] < panel["first_treat"], "dose"] = 0.0
panel = panel.rename(columns={
    "unit": "state_id",
    "period": "week",
    "outcome": "screening_uptake",
    "dose": "spend_k",
})
panel["screening_uptake"] = panel["screening_uptake"] + BASELINE_OUTCOME

panel = attach_brfss_survey_columns(panel, seed=SD_SEED)

per_state = panel.groupby("state_id").agg(
    weight=("weight", "first"),
    stratum=("stratum", "first"),
    psu_id=("psu_id", "first"),
).reset_index()
print(f"states                   = {panel['state_id'].nunique()}")
print(f"strata                   = {panel['stratum'].nunique()} ({sorted(panel['stratum'].unique())})")
print(f"PSUs                     = {panel['psu_id'].nunique()}")
print(f"weight CV across states  = {per_state['weight'].std() / per_state['weight'].mean():.3f}")
print(f"weight range             = [{per_state['weight'].min():.3f}, {per_state['weight'].max():.3f}]")
print(f"FPC (per row, constant)  = {panel['fpc'].iloc[0]:.0f}")
panel.head()

[ ]:
sd = SurveyDesign(
    weights="weight",
    strata="stratum",
    psu="psu_id",
    fpc="fpc",
)
print(sd)

3. Naive vs survey-aware headline fit#

T20’s headline path collapses to two periods (pre-mean vs post-mean per state) and fits HAD with design="auto" — the heuristic lands on continuous_near_d_lower (Design 1), with the target estimand WAS_d_lower. T22 fits the same configuration twice: once naive (no survey_design argument), once survey-aware. Both fits share the same local-linear machinery at d_lower; the survey path additionally consumes the weights in the local-linear tau_bc boundary fit, in the weighted ΔY mean, and in the weighted denominator E_w[D - d_lower]. On this DGP the weight CV (~0.30) and dose distribution do not co-vary strongly enough to shift the boundary slope materially, so the two ATTs land close. The SE and CI differ because the survey path folds PSU clustering and FPC into the variance via Binder TSL.

[ ]:
p2 = panel.copy()
p2["period"] = (p2["week"] >= COHORT_PERIOD).astype(int) + 1
panel_2p = p2.groupby(["state_id", "period"], as_index=False).agg(
    screening_uptake=("screening_uptake", "mean"),
    spend_k=("spend_k", "mean"),
    stratum=("stratum", "first"),
    psu_id=("psu_id", "first"),
    weight=("weight", "first"),
    fpc=("fpc", "first"),
)
panel_2p.head()

[ ]:
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r".*pweight.*", category=UserWarning
    )
    # NB: we deliberately do NOT filter the
    # `continuous_near_d_lower ... Assumption 5 or 6` warning here
    # so it fires once on the headline fit (the §3 interpretation
    # cell discusses it in prose). Subsequent fit cells suppress it
    # as redundant.
    naive = HAD(design="auto").fit(
        panel_2p,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="period",
        unit_col="state_id",
    )
    survey = HAD(design="auto").fit(
        panel_2p,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="period",
        unit_col="state_id",
        survey_design=sd,
    )
print(f"design auto-detected: naive={naive.design}, survey={survey.design}")
print(f"target parameter   : naive={naive.target_parameter}, survey={survey.target_parameter}")

[ ]:
def _row(name, r):
    lo, hi = r.conf_int
    return {
        "fit": name,
        "ATT": round(float(r.att), 3),
        "SE": round(float(r.se), 3),
        "CI low": round(float(lo), 3),
        "CI high": round(float(hi), 3),
        "CI width": round(float(hi - lo), 3),
        "covers truth (slope=100)": "YES" if lo <= TRUE_SLOPE <= hi else "NO",
    }

comparison = pd.DataFrame([_row("naive", naive), _row("survey", survey)])
print(comparison.to_string(index=False))
print()
print(f"se_ratio (survey / naive) = {survey.se / naive.se:.3f}")

Reading the table. Both fits land on the same identification path (continuous_near_d_lower), report point estimates very close to the true slope of 100, and produce CIs that cover the truth. The survey SE is larger than the naive SE (the design machinery picks up the cluster correlation from the PSU x period shock). The inflation is modest - around 1.10x - which is smaller than the inflation a similar weight_cv and PSU shock would produce on, say, a CallawaySantAnna ATT. That’s a feature of HAD’s local-linear estimand, not a bug. We unpack it in the next section.

A non-testable identifying assumption. Design 1 requires Assumption 6 for point identification of WAS_d_lower (or Assumption 5 for sign identification only) — both are about local linearity of the dose-response near d_lower and are not testable from data. The §6 linearity diagnostics (Stute, Yatchew, joint pretrends/homogeneity) are necessary but not sufficient. Assumption 6 itself is justified from domain knowledge (is the marginal effect of the next $1K of supplemental spend roughly constant in the $5K-$50K range?). The library fires a UserWarning on every Design 1 fit; the headline cell above lets it surface, subsequent cells filter it as redundant. This is the load-bearing methodology caveat alongside the QUG-under-survey deferral (§6).

4. Why the SE inflation is modest for HAD#

The intuition. WAS_d_lower is the average slope above d_lower, but its leading-order variance reads off a local-linear boundary fit at d_lower — and that fit only weights units near the boundary. With dose ~ Uniform[5, 50] and 60 states, only a handful of states sit close to d_lower ~ 5, and those are the units whose influence functions dominate Var(WAS_d_lower). The PSU-level cluster correlation can amplify the variance only as much as those few units are correlated with PSU-mates. With 2 states/PSU and only a small share of states near the boundary, the within-PSU correlation has a small lever to act on.

Formal definition. WAS_{d̲} = (E[ΔY] - lim_{d↓d̲} E[ΔY | D_2 d]) / E[D_2 - d̲] (REGISTRY § HeterogeneousAdoptionDiD; had.py:21-31). The estimator uses a local-linear boundary fit at d_lower to estimate the lim_{d↓d̲} E[ΔY | D_2 d] term — the only component requiring nonparametric identification.

Contrast with the event-study path: each event-time horizon is a separate local-linear fit on that horizon’s first differences ΔY_{g,t} = Y_{g,t} - Y_{g,F-1} against the common regressor D_{g,F} (paper Appendix B.2). The pointwise per-horizon SE is computed from each horizon’s own residual variance — not aggregated across post-period observations. So the per-horizon survey-vs-naive ratio is an empirical property of how PSU correlation interacts with each horizon’s ΔY distribution on this seeded DGP, not a method-contract guarantee. We compute it below to inspect:

[ ]:
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r".*pweight.*", category=UserWarning
    )
    warnings.filterwarnings(
        "ignore",
        message=r".*continuous_near_d_lower.*Assumption.*",
        category=UserWarning,
    )
    naive_es = HAD(design="auto").fit(
        panel,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="week",
        unit_col="state_id",
        first_treat_col="first_treat",
        aggregate="event_study",
    )
    survey_es_for_ratio = HAD(design="auto").fit(
        panel,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="week",
        unit_col="state_id",
        first_treat_col="first_treat",
        aggregate="event_study",
        survey_design=sd,
    )

ratios = pd.DataFrame({
    "event_time": list(naive_es.event_times),
    "naive_se": [round(float(s), 3) for s in naive_es.se],
    "survey_se": [round(float(s), 3) for s in survey_es_for_ratio.se],
    "se_ratio": [round(float(s_s / s_n), 3) for s_n, s_s in zip(naive_es.se, survey_es_for_ratio.se)],
})
print(ratios.to_string(index=False))
print()
print(f"overall SE ratio (from section 3) = {survey.se / naive.se:.3f}")

The per-horizon table reflects the IF-spread argument. Per paper Appendix B.2 the event-study uses D_{g,F} (the period-F dose) as the SAME dose regressor for every horizon - pre-period placebos and post-period horizons alike. Per-horizon SEs differ because the outcome side ΔY_{g,t} = Y_{g,t} - Y_{g,F-1} does. Pre-period placebos have small ΔY (no treatment signal — just within-pre noise between t and F-1) so the local-linear at d_lower fits low residual variance and reads small SEs. Post-period horizons have ΔY that scales with the treatment effect (slope * D_{g,F} plus noise), so the local-linear residual variance is larger and the SE is larger. The survey machinery folds PSU clustering into both surfaces and produces a moderate per-horizon inflation on top. The takeaway for the practitioner: for HAD specifically, do not eyeball SE inflation against your DEFF expectation from regression-coefficient examples. Read the design-aware SE off the fit object and report it; the magnitude of the survey-vs-naive ratio is determined by the IF-weighting of your particular estimand.

5. Event-study under the survey design#

Refit with aggregate="event_study" and cband=True to get per-horizon ATT estimates plus a sup-t confidence band that adjusts for the multiple-horizon comparison. The cband is computed via a multiplier bootstrap that aggregates the per-PSU IF tensor under the survey design.

[ ]:
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r".*pweight.*", category=UserWarning
    )
    warnings.filterwarnings(
        "ignore",
        message=r".*continuous_near_d_lower.*Assumption.*",
        category=UserWarning,
    )
    es = HAD(design="auto").fit(
        panel,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="week",
        unit_col="state_id",
        first_treat_col="first_treat",
        aggregate="event_study",
        survey_design=sd,
        cband=True,
    )

es_table = pd.DataFrame({
    "event_time": list(es.event_times),
    "att": [round(float(a), 2) for a in es.att],
    "se": [round(float(s), 3) for s in es.se],
    "ci_low": [round(float(c), 2) for c in es.conf_int_low],
    "ci_high": [round(float(c), 2) for c in es.conf_int_high],
    "cband_low": [round(float(c), 2) for c in es.cband_low],
    "cband_high": [round(float(c), 2) for c in es.cband_high],
})
print(es_table.to_string(index=False))
print()
print(f"cband critical value = {es.cband_crit_value:.4f} ({es.cband_method})")

[ ]:
if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(8, 4.2))
    et = np.asarray(es.event_times, dtype=float)
    ax.axhline(0.0, color="black", linewidth=0.5)
    ax.axhline(TRUE_SLOPE, color="gray", linewidth=0.6, linestyle=":", label="true slope = 100")
    ax.fill_between(et, np.asarray(es.cband_low), np.asarray(es.cband_high), alpha=0.15, label="sup-t cband (survey)")
    # Use the estimator's stored pointwise CI endpoints (built from t
    # critical values with df_survey under SurveyDesign — NOT hard-coded
    # 1.96 * se, which would silently understate uncertainty under
    # survey-aware inference). matplotlib errorbar takes asymmetric
    # yerr as a (2, n) array of [lower_distances, upper_distances].
    att = np.asarray(es.att)
    yerr = np.vstack([att - np.asarray(es.conf_int_low), np.asarray(es.conf_int_high) - att])
    ax.errorbar(et, att, yerr=yerr, fmt="o", color="#1f77b4", capsize=3, label="point + pointwise CI (survey-aware t)")
    ax.axvline(-0.5, color="red", linewidth=0.6, linestyle="--")
    ax.set_xlabel("event time (weeks since campaign launch)")
    ax.set_ylabel("per-$1K WAS-d_lower")
    ax.set_title("HAD event-study under SurveyDesign(weights, strata, psu, fpc)")
    ax.legend(loc="center right", fontsize=8)
    fig.tight_layout()
    plt.show()

Reading the event-study. Pre-launch horizons (e in {-4, -3, -2}) cover zero - no pre-trends - and the post-launch horizons (e in {0, 1, 2, 3}) sit on the true per-$1K slope of 100 with both pointwise and sup-t coverage. The sup-t band is a few percent wider than the pointwise interval; that gap is the multiple- horizon multiplicity correction. Per-horizon SEs are noticeably larger post-launch than pre because ΔY_{g,t} post-launch carries the treatment-effect contribution (cross-unit variation in slope * D_{g,F}), driving larger local-linear residual variance than the pre-period placebos see (placebos use the same D_{g,F} regressor, but ΔY_{g,t} is noise-only — see section 4).

6. Pretest workflow under the survey design#

did_had_pretest_workflow runs the three-step diagnostic battery (QUG; Stute or joint Stute; Yatchew) and composes a single verdict text. Under survey_design, the QUG step is permanently deferred (Phase 4.5 C0); the workflow returns report.qug=None and inserts a suffix on the verdict string flagging the deferral. The Stute and Yatchew steps run normally, with the Stute multiplier bootstrap operating under the now-supported stratified-clustered wild construction (lifted by PR #432).

Overall path (two-period collapse):

[ ]:
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r".*pweight.*", category=UserWarning
    )
    warnings.filterwarnings(
        "ignore",
        message=r".*continuous_near_d_lower.*Assumption.*",
        category=UserWarning,
    )
    overall_report = did_had_pretest_workflow(
        panel_2p,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="period",
        unit_col="state_id",
        survey_design=sd,
        aggregate="overall",
        n_bootstrap=N_BOOTSTRAP,
        seed=WORKFLOW_SEED,
    )
print("verdict :", overall_report.verdict)
print("all_pass:", overall_report.all_pass)
print("qug     :", overall_report.qug)
print()
print(overall_report.summary())

The verdict ends in (linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0) - that suffix is the caveat the workflow owes the practitioner under survey: read the linearity verdict knowing the boundary-presence diagnostic was not run. report.qug is None. The Stute and Yatchew rows in the formatted summary populate as usual.

Event-study path (full panel + joint diagnostics):

[ ]:
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", message=r".*pweight.*", category=UserWarning
    )
    warnings.filterwarnings(
        "ignore",
        message=r".*continuous_near_d_lower.*Assumption.*",
        category=UserWarning,
    )
    es_report = did_had_pretest_workflow(
        panel,
        outcome_col="screening_uptake",
        dose_col="spend_k",
        time_col="week",
        unit_col="state_id",
        first_treat_col="first_treat",
        survey_design=sd,
        aggregate="event_study",
        n_bootstrap=N_BOOTSTRAP,
        seed=WORKFLOW_SEED,
    )
print("verdict       :", es_report.verdict)
print("all_pass      :", es_report.all_pass)
print()
print("pretrends_joint :", es_report.pretrends_joint)
print("homogeneity_joint:", es_report.homogeneity_joint)
print()
print(es_report.summary())

The event-study verdict ends in the same C0 suffix; the formatted summary() block additionally renders a (QUG step skipped - permanently deferred under survey/weights per Phase 4.5 C0) note in place of the normal QUG row. pretrends_joint runs across the three pre-launch horizons (1, 2, 3) and homogeneity_joint runs across the four post-launch horizons (5, 6, 7, 8); both fail-to- reject under the linear-DGP null.

The Stute family on the survey path is a documented synthesis of ingredients from several papers, not a direct port: cluster-level multipliers (Cameron-Gelbach-Miller 2008), wild-bootstrap centering (Davidson-Flachaire 2008), within-stratum demeaning + Bessel rescale (Wu 1986; Liu 1988; Kreiss-Lahiri 2012), and the cluster-wild consistency of nonlinear functionals (Djogbenou-MacKinnon-Nielsen 2019), composed for the Hlavka-Huskova 2020 Stute residual bootstrap. No single paper covers the exact composition for HAD under stratified PSU sampling; the library’s apply_stratum_centering helper (bootstrap_utils.py:657-786) is what makes the composition turn-key. See docs/methodology/REGISTRY.md § HAD - Stute stratified survey-bootstrap calibration for the derivation.

7. Communicating the design-based verdict to leadership#

For the executive. The campaign moved screening uptake by about 100 per 1,000 adults per $1K of supplemental spend, with a 95% confidence interval that comfortably excludes zero. The survey-design analysis - which accounts for the way states were sampled within strata and clustered into PSUs - lands on the same bottom line as the simpler analysis, but with a slightly wider confidence interval. Both intervals cover the methodological ground truth (slope = 100). Pre-launch placebo periods showed no drift; the per-week post-launch effect is consistent across the rollout.

For the methodologist. The HAD pretest workflow runs two diagnostic passes — overall (Stute CvM + Yatchew-HR closed-form) on the two-period collapse, and event-study (joint pre-trends + joint homogeneity, both joint-Stute) on the full panel. Both passes fail-to-reject on this DGP. Both verdicts end in (linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0) — the load-bearing C0 caveat. On the event-study path report.yatchew and report.stute are None; those single-horizon tests are overall-only.

The design-based SE on the headline fit is ~10% larger than the naive SE — smaller than the inflation a CallawaySantAnna or LinearRegression coefficient would see at this PSU correlation, because HAD uses a local-linear boundary fit at d_lower to estimate the boundary-limit term in the WAS_d_lower formula (variance is dominated by the few states near the boundary; see §4).

8. Extensions + summary checklist#

Extensions to keep on the radar:

  • ``lonely_psu=’adjust’`` + singleton strata still raises NotImplementedError for the Stute family on the survey path (had.py:2139, had_pretests.py:1927). The pseudo-stratum centering transform that would make this case work is a separate derivation from PR #432; same gap as the HAD sup-t deviation documented at REGISTRY:2382. Stay with lonely_psu='remove' or 'certainty' for now.

  • Replicate-weight designs (BRR / Fay / JK1 / JKn / SDR) are defensively rejected at every survey-aware HAD entry point. The Rao-Wu / JKn bootstrap composition is a separate slot of work. T22 stays inside the FPC / Mammen-multiplier path; full replicate- weight HAD is a follow-up tutorial.

  • QUG under survey / weights is permanently deferred (Phase 4.5 C0). The smooth functionals route through the Stute family per Krieger-Pfeffermann (1997); QUG’s extreme order statistic does not. There is no intended migration path.

  • Trends-linear refit under the survey design is also currently rejected by HeterogeneousAdoptionDiD.fit; the trends_lin + survey_design composition is open as a separate slot.

What you can ship to leadership:

  • Single design-aware ATT with a survey-design CI (section 3, 5).

  • Verdict text that flags the C0 deferral verbatim - do not summarize it away (section 6).

  • Per-horizon ATT and sup-t cband when you want a rollout chart (section 5).

  • A side note on the modest SE inflation magnitude (section 4) so the audience does not over-correct against the simpler analysis.

  • The workflow’s report.summary() block as the methodologist-facing artifact; report.verdict as the executive-facing artifact.