{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bd4f9026",
   "metadata": {},
   "source": [
    "# Tutorial 22: Survey-Weighted HAD - The BRFSS-Shape Rollout\n",
    "\n",
    "The HAD series so far ran on simple iid panels. T20\n",
    "(`docs/tutorials/20_had_brand_campaign.ipynb`) walked the headline workflow;\n",
    "T21 (`docs/tutorials/21_had_pretest_workflow.ipynb`) layered in the pretest\n",
    "diagnostics. Realistic federal-survey designs (BRFSS / CPS / NHANES shape -\n",
    "stratified household samples with PSU clustering, post-stratification\n",
    "weights, and a finite population correction) became demonstrable end-to-end\n",
    "through the HAD workflow only on 2026-05-14, when the gate on\n",
    "`SurveyDesign(strata=...)` was lifted across the Stute pretest family. T22\n",
    "shows the full design-aware workflow on a stylized BRFSS-shape state\n",
    "rollout: 60 states organized as 5 strata x 6 PSUs/stratum x 2 states/PSU,\n",
    "with post-stratification raking weights, an iid PSU x period shock to\n",
    "inject real cluster correlation, and a single national health-campaign\n",
    "intensity rolled out at week 5.\n",
    "\n",
    "**Prerequisites.** T16 (`docs/tutorials/16_survey_did.ipynb`) for the\n",
    "`SurveyDesign` API and survey-DiD background; T20 for the HAD headline\n",
    "fit; T21 for the pretest workflow.\n",
    "\n",
    "**Sections:**\n",
    "\n",
    "1. Survey design adds two problems for HAD\n",
    "2. The panel: BRFSS-shape state rollout\n",
    "3. Naive vs survey-aware headline fit\n",
    "4. Why the SE inflation is modest for HAD\n",
    "5. Event-study under the survey design\n",
    "6. Pretest workflow under the survey design\n",
    "7. Communicating the design-based verdict to leadership\n",
    "8. Extensions + summary checklist\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a525e43",
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "from diff_diff import (\n",
    "    HAD,\n",
    "    SurveyDesign,\n",
    "    did_had_pretest_workflow,\n",
    "    generate_continuous_did_data,\n",
    ")\n",
    "\n",
    "try:\n",
    "    import matplotlib.pyplot as plt\n",
    "    HAS_MATPLOTLIB = True\n",
    "except ImportError:\n",
    "    HAS_MATPLOTLIB = False\n",
    "\n",
    "# Locked DGP parameters (mirrored in tests/test_t22_had_survey_design_drift.py).\n",
    "MAIN_SEED = 87\n",
    "N_UNITS = 60\n",
    "N_PERIODS = 8\n",
    "COHORT_PERIOD = 5\n",
    "TRUE_SLOPE = 100.0\n",
    "BASELINE_OUTCOME = 35.0\n",
    "DOSE_LOW = 5.0\n",
    "DOSE_HIGH = 50.0\n",
    "\n",
    "# Survey-design layer.\n",
    "N_STRATA = 5\n",
    "PSU_PER_STRATUM = 6\n",
    "STATES_PER_PSU = 2\n",
    "WEIGHT_CV_TARGET = 0.30\n",
    "FPC_PER_STRATUM = 30\n",
    "PSU_PERIOD_SHOCK_SD = 1.5\n",
    "SD_SEED = 87\n",
    "\n",
    "# Pretest workflow.\n",
    "WORKFLOW_SEED = 22\n",
    "N_BOOTSTRAP = 999\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "239a35f5",
   "metadata": {},
   "source": [
    "## 1. Survey design adds two problems for HAD\n",
    "\n",
    "When the panel comes out of a complex survey instead of a simple random\n",
    "sample, HAD's headline numbers acquire two new failure modes:\n",
    "\n",
    "- **Cluster correlation.** PSUs (households-within-tract,\n",
    "  individuals-within-PSU, states-within-region) share unobserved shocks.\n",
    "  Treating them as iid under-states the SE; a 95% pointwise CI written\n",
    "  off the naive variance under-covers the truth.\n",
    "- **Unequal sampling weights.** When the probability of being sampled is\n",
    "  not flat across the population, the unweighted point estimate is the\n",
    "  sample-quantity, not the population-quantity it usually has to\n",
    "  represent for policy.\n",
    "\n",
    "The survey-design fix is the standard library composition: stratified\n",
    "PSU-sandwich variance via Binder-TSL (Binder 1983; Lumley 2004) with a\n",
    "finite-population correction, weights as `pweight`, and PSU as the\n",
    "clustering unit. T16 walks the API end-to-end on a binary-cohort DiD;\n",
    "T22 specializes it for HAD. One headline difference to flag up front:\n",
    "HAD's WAS-d_lower estimand reads variance from a **local-linear\n",
    "neighborhood at the boundary** rather than from a regression coefficient\n",
    "on the full panel - so survey-aware SE inflation manifests differently\n",
    "than the rule-of-thumb you would expect from CallawaySantAnna or\n",
    "LinearRegression. We come back to this in section 4.\n",
    "\n",
    "The Phase 4.5 C0 deferral is the second user-facing caveat: the HAD\n",
    "QUG step (paper Section 4 step 1; tests `H_0: d_lower = 0` via the\n",
    "extreme order statistic of the dose) is **permanently deferred under\n",
    "survey/weights**. Extreme order statistics are not Hadamard-\n",
    "differentiable functionals of the empirical CDF, so no off-the-shelf\n",
    "survey-adjusted EVT fits the slot. The pretest workflow surfaces this\n",
    "deferral in the verdict text and on `report.qug` (set to `None`); the\n",
    "linearity-conditional remainder of the verdict still runs. Section 6\n",
    "walks the verdict text in detail.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f74f76d9",
   "metadata": {},
   "source": [
    "## 2. The panel: BRFSS-shape state rollout\n",
    "\n",
    "The DGP shape is identical to T20 (60 states x 8 weeks; rollout at\n",
    "w=5; dose ~ Uniform[$5K, $50K] of supplemental campaign spend per\n",
    "state; true ATT linear in dose with slope=100 screening uptake per\n",
    "$1K spend; outcome shifted by a baseline of 35 per 1,000 adults). The\n",
    "new piece is the survey layer: a permutation-based assignment of\n",
    "states to a 5 x 6 stratum x PSU grid (5 census-region x urbanicity\n",
    "strata, 6 PSUs/stratum, 2 states/PSU = 60), per-state post-\n",
    "stratification raking weights with CV ~ 0.30, and a PSU x period iid\n",
    "shock injected into the outcome so cluster correlation **survives DiD\n",
    "first-differencing**.\n",
    "\n",
    "That last point is load-bearing. A time-invariant per-PSU random\n",
    "intercept cancels exactly under DiD: the (post - pre) average shifts\n",
    "each state by a constant, and the slope on dose is unaffected.\n",
    "`generate_survey_did_data` documents this directly\n",
    "(`prep_dgp.py:1517-1523`): \"the time-invariant PSU RE cancels in the\n",
    "treatment-vs-control time-difference and the cluster-robust / survey\n",
    "SE would be smaller than naive OLS SE.\" The fix is a **PSU x period\n",
    "interaction shock**: an iid draw per (psu, period) cell. T16's\n",
    "reference helper carries the same composition (`psu_re_sd=2.0` paired\n",
    "with `psu_period_factor=1.0`).\n",
    "\n",
    "Helper kept in-notebook so practitioners can copy it into their own\n",
    "workflow:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3ad9844",
   "metadata": {},
   "outputs": [],
   "source": [
    "def attach_brfss_survey_columns(\n",
    "    panel,\n",
    "    *,\n",
    "    seed,\n",
    "    n_strata=N_STRATA,\n",
    "    psu_per_stratum=PSU_PER_STRATUM,\n",
    "    states_per_psu=STATES_PER_PSU,\n",
    "    weight_cv=WEIGHT_CV_TARGET,\n",
    "    fpc_per_stratum=FPC_PER_STRATUM,\n",
    "    psu_period_shock_sd=PSU_PERIOD_SHOCK_SD,\n",
    "):\n",
    "    \"\"\"Attach a BRFSS-shape survey design to a HAD panel.\n",
    "\n",
    "    Adds: stratum (int), psu_id (int), weight (float), fpc (float).\n",
    "    Also injects a PSU x period shock into the outcome so cluster\n",
    "    correlation survives DiD differencing. Constant-within-state on\n",
    "    the four design columns.\n",
    "    \"\"\"\n",
    "    rng = np.random.default_rng(seed)\n",
    "    state_ids = np.sort(panel[\"state_id\"].unique())\n",
    "    n_states = len(state_ids)\n",
    "    n_psu = n_strata * psu_per_stratum\n",
    "    if n_states != n_psu * states_per_psu:\n",
    "        raise ValueError(\n",
    "            f\"state count {n_states} must equal n_strata*psu_per_stratum*\"\n",
    "            f\"states_per_psu = {n_psu * states_per_psu}\"\n",
    "        )\n",
    "    perm = rng.permutation(n_states)\n",
    "    psu_block = np.repeat(np.arange(n_psu), states_per_psu)\n",
    "    psu_of_state = psu_block[np.argsort(perm)]\n",
    "    stratum_of_state = psu_of_state // psu_per_stratum\n",
    "    base_per_stratum = np.array([0.8, 0.9, 1.0, 1.1, 1.3])\n",
    "    base_w = base_per_stratum[stratum_of_state]\n",
    "    sigma = np.sqrt(np.log(1 + weight_cv ** 2))\n",
    "    pert = rng.lognormal(mean=-0.5 * sigma ** 2, sigma=sigma, size=n_states)\n",
    "    w_per_state = base_w * pert\n",
    "    state_lookup = pd.DataFrame({\n",
    "        \"state_id\": state_ids,\n",
    "        \"stratum\": stratum_of_state.astype(np.int64),\n",
    "        \"psu_id\": psu_of_state.astype(np.int64),\n",
    "        \"weight\": w_per_state,\n",
    "        \"fpc\": float(fpc_per_stratum),\n",
    "    })\n",
    "    out = panel.merge(state_lookup, on=\"state_id\", how=\"left\")\n",
    "    n_periods_local = int(panel[\"week\"].max() - panel[\"week\"].min() + 1)\n",
    "    psu_period_shocks = rng.normal(0.0, psu_period_shock_sd, size=(n_psu, n_periods_local))\n",
    "    week_min = int(panel[\"week\"].min())\n",
    "    shock_lookup = pd.DataFrame([\n",
    "        {\"psu_id\": int(p), \"week\": int(w + week_min), \"psu_period_shock\": float(psu_period_shocks[p, w])}\n",
    "        for p in range(n_psu)\n",
    "        for w in range(n_periods_local)\n",
    "    ])\n",
    "    out = out.merge(shock_lookup, on=[\"psu_id\", \"week\"], how=\"left\")\n",
    "    out[\"screening_uptake\"] = out[\"screening_uptake\"] + out[\"psu_period_shock\"]\n",
    "    return out.drop(columns=[\"psu_period_shock\"])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "00f75a66",
   "metadata": {},
   "outputs": [],
   "source": [
    "raw = generate_continuous_did_data(\n",
    "    n_units=N_UNITS,\n",
    "    n_periods=N_PERIODS,\n",
    "    cohort_periods=[COHORT_PERIOD],\n",
    "    never_treated_frac=0.0,\n",
    "    dose_distribution=\"uniform\",\n",
    "    dose_params={\"low\": DOSE_LOW, \"high\": DOSE_HIGH},\n",
    "    att_function=\"linear\",\n",
    "    att_intercept=0.0,\n",
    "    att_slope=TRUE_SLOPE,\n",
    "    unit_fe_sd=8.0,\n",
    "    time_trend=0.5,\n",
    "    noise_sd=2.0,\n",
    "    seed=MAIN_SEED,\n",
    ")\n",
    "panel = raw.copy()\n",
    "panel.loc[panel[\"period\"] < panel[\"first_treat\"], \"dose\"] = 0.0\n",
    "panel = panel.rename(columns={\n",
    "    \"unit\": \"state_id\",\n",
    "    \"period\": \"week\",\n",
    "    \"outcome\": \"screening_uptake\",\n",
    "    \"dose\": \"spend_k\",\n",
    "})\n",
    "panel[\"screening_uptake\"] = panel[\"screening_uptake\"] + BASELINE_OUTCOME\n",
    "\n",
    "panel = attach_brfss_survey_columns(panel, seed=SD_SEED)\n",
    "\n",
    "per_state = panel.groupby(\"state_id\").agg(\n",
    "    weight=(\"weight\", \"first\"),\n",
    "    stratum=(\"stratum\", \"first\"),\n",
    "    psu_id=(\"psu_id\", \"first\"),\n",
    ").reset_index()\n",
    "print(f\"states                   = {panel['state_id'].nunique()}\")\n",
    "print(f\"strata                   = {panel['stratum'].nunique()} ({sorted(panel['stratum'].unique())})\")\n",
    "print(f\"PSUs                     = {panel['psu_id'].nunique()}\")\n",
    "print(f\"weight CV across states  = {per_state['weight'].std() / per_state['weight'].mean():.3f}\")\n",
    "print(f\"weight range             = [{per_state['weight'].min():.3f}, {per_state['weight'].max():.3f}]\")\n",
    "print(f\"FPC (per row, constant)  = {panel['fpc'].iloc[0]:.0f}\")\n",
    "panel.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2056e977",
   "metadata": {},
   "outputs": [],
   "source": [
    "sd = SurveyDesign(\n",
    "    weights=\"weight\",\n",
    "    strata=\"stratum\",\n",
    "    psu=\"psu_id\",\n",
    "    fpc=\"fpc\",\n",
    ")\n",
    "print(sd)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "326875a6",
   "metadata": {},
   "source": [
    "## 3. Naive vs survey-aware headline fit\n",
    "\n",
    "T20's headline path collapses to two periods (pre-mean vs post-mean\n",
    "per state) and fits HAD with `design=\"auto\"` — the heuristic lands\n",
    "on `continuous_near_d_lower` (Design 1), with the target estimand\n",
    "`WAS_d_lower`. T22 fits the same configuration twice: once naive\n",
    "(no `survey_design` argument), once survey-aware. Both fits share\n",
    "the same local-linear machinery at d_lower; the survey path\n",
    "additionally consumes the weights in the local-linear `tau_bc`\n",
    "boundary fit, in the weighted ΔY mean, and in the weighted\n",
    "denominator `E_w[D - d_lower]`. On this DGP the weight CV (~0.30)\n",
    "and dose distribution do not co-vary strongly enough to shift the\n",
    "boundary slope materially, so the two ATTs land close. The SE and\n",
    "CI differ because the survey path folds PSU clustering and FPC\n",
    "into the variance via Binder TSL.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3d14a66",
   "metadata": {},
   "outputs": [],
   "source": [
    "p2 = panel.copy()\n",
    "p2[\"period\"] = (p2[\"week\"] >= COHORT_PERIOD).astype(int) + 1\n",
    "panel_2p = p2.groupby([\"state_id\", \"period\"], as_index=False).agg(\n",
    "    screening_uptake=(\"screening_uptake\", \"mean\"),\n",
    "    spend_k=(\"spend_k\", \"mean\"),\n",
    "    stratum=(\"stratum\", \"first\"),\n",
    "    psu_id=(\"psu_id\", \"first\"),\n",
    "    weight=(\"weight\", \"first\"),\n",
    "    fpc=(\"fpc\", \"first\"),\n",
    ")\n",
    "panel_2p.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e875f23",
   "metadata": {},
   "outputs": [],
   "source": [
    "with warnings.catch_warnings():\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\", message=r\".*pweight.*\", category=UserWarning\n",
    "    )\n",
    "    # NB: we deliberately do NOT filter the\n",
    "    # `continuous_near_d_lower ... Assumption 5 or 6` warning here\n",
    "    # so it fires once on the headline fit (the §3 interpretation\n",
    "    # cell discusses it in prose). Subsequent fit cells suppress it\n",
    "    # as redundant.\n",
    "    naive = HAD(design=\"auto\").fit(\n",
    "        panel_2p,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"period\",\n",
    "        unit_col=\"state_id\",\n",
    "    )\n",
    "    survey = HAD(design=\"auto\").fit(\n",
    "        panel_2p,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"period\",\n",
    "        unit_col=\"state_id\",\n",
    "        survey_design=sd,\n",
    "    )\n",
    "print(f\"design auto-detected: naive={naive.design}, survey={survey.design}\")\n",
    "print(f\"target parameter   : naive={naive.target_parameter}, survey={survey.target_parameter}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c036649",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _row(name, r):\n",
    "    lo, hi = r.conf_int\n",
    "    return {\n",
    "        \"fit\": name,\n",
    "        \"ATT\": round(float(r.att), 3),\n",
    "        \"SE\": round(float(r.se), 3),\n",
    "        \"CI low\": round(float(lo), 3),\n",
    "        \"CI high\": round(float(hi), 3),\n",
    "        \"CI width\": round(float(hi - lo), 3),\n",
    "        \"covers truth (slope=100)\": \"YES\" if lo <= TRUE_SLOPE <= hi else \"NO\",\n",
    "    }\n",
    "\n",
    "comparison = pd.DataFrame([_row(\"naive\", naive), _row(\"survey\", survey)])\n",
    "print(comparison.to_string(index=False))\n",
    "print()\n",
    "print(f\"se_ratio (survey / naive) = {survey.se / naive.se:.3f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a3a5aa6",
   "metadata": {},
   "source": [
    "**Reading the table.** Both fits land on the same identification\n",
    "path (`continuous_near_d_lower`), report point estimates very close to\n",
    "the true slope of 100, and produce CIs that cover the truth. The\n",
    "survey SE is **larger** than the naive SE (the design machinery picks\n",
    "up the cluster correlation from the PSU x period shock). The\n",
    "inflation is modest - around 1.10x - which is smaller than the\n",
    "inflation a similar `weight_cv` and PSU shock would produce on, say,\n",
    "a CallawaySantAnna ATT. That's a feature of HAD's local-linear\n",
    "estimand, not a bug. We unpack it in the next section.\n",
    "\n",
    "\n",
    "**A non-testable identifying assumption.** Design 1 requires\n",
    "**Assumption 6** for point identification of `WAS_d_lower` (or\n",
    "**Assumption 5** for sign identification only) — both are about\n",
    "local linearity of the dose-response near `d_lower` and are **not\n",
    "testable from data**. The §6 linearity diagnostics (Stute,\n",
    "Yatchew, joint pretrends/homogeneity) are necessary but not\n",
    "sufficient. Assumption 6 itself is justified from domain\n",
    "knowledge (is the marginal effect of the next $1K of supplemental\n",
    "spend roughly constant in the $5K-$50K range?). The library fires\n",
    "a `UserWarning` on every Design 1 fit; the headline cell above\n",
    "lets it surface, subsequent cells filter it as redundant. This is\n",
    "the load-bearing methodology caveat alongside the QUG-under-survey\n",
    "deferral (§6)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "637dd580",
   "metadata": {},
   "source": [
    "## 4. Why the SE inflation is modest for HAD\n",
    "\n",
    "**The intuition.** `WAS_d_lower` is the average slope above d_lower,\n",
    "but its leading-order variance reads off a local-linear boundary\n",
    "fit at `d_lower` — and that fit only weights units near the\n",
    "boundary. With dose ~ Uniform[5, 50] and 60 states, only a handful\n",
    "of states sit close to d_lower ~ 5, and those are the units whose\n",
    "influence functions dominate `Var(WAS_d_lower)`. The PSU-level\n",
    "cluster correlation can amplify the variance only as much as those\n",
    "few units are correlated with PSU-mates. With 2 states/PSU and\n",
    "only a small share of states near the boundary, the within-PSU\n",
    "correlation has a small lever to act on.\n",
    "\n",
    "**Formal definition.** `WAS_{d̲} = (E[ΔY] - lim_{d↓d̲} E[ΔY | D_2\n",
    "≤ d]) / E[D_2 - d̲]` (REGISTRY § HeterogeneousAdoptionDiD;\n",
    "`had.py:21-31`). The estimator uses a local-linear boundary fit at\n",
    "`d_lower` to estimate the `lim_{d↓d̲} E[ΔY | D_2 ≤ d]` term — the\n",
    "only component requiring nonparametric identification.\n",
    "\n",
    "Contrast with the event-study path: each event-time horizon is a\n",
    "**separate** local-linear fit on that horizon's first differences\n",
    "`ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` against the common regressor\n",
    "`D_{g,F}` (paper Appendix B.2). The pointwise per-horizon SE is\n",
    "computed from each horizon's own residual variance — not aggregated\n",
    "across post-period observations. So the per-horizon survey-vs-naive\n",
    "ratio is an empirical property of how PSU correlation interacts with\n",
    "each horizon's `ΔY` distribution on this seeded DGP, not a\n",
    "method-contract guarantee. We compute it below to inspect:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4b6ec75",
   "metadata": {},
   "outputs": [],
   "source": [
    "with warnings.catch_warnings():\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\", message=r\".*pweight.*\", category=UserWarning\n",
    "    )\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\",\n",
    "        message=r\".*continuous_near_d_lower.*Assumption.*\",\n",
    "        category=UserWarning,\n",
    "    )\n",
    "    naive_es = HAD(design=\"auto\").fit(\n",
    "        panel,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"week\",\n",
    "        unit_col=\"state_id\",\n",
    "        first_treat_col=\"first_treat\",\n",
    "        aggregate=\"event_study\",\n",
    "    )\n",
    "    survey_es_for_ratio = HAD(design=\"auto\").fit(\n",
    "        panel,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"week\",\n",
    "        unit_col=\"state_id\",\n",
    "        first_treat_col=\"first_treat\",\n",
    "        aggregate=\"event_study\",\n",
    "        survey_design=sd,\n",
    "    )\n",
    "\n",
    "ratios = pd.DataFrame({\n",
    "    \"event_time\": list(naive_es.event_times),\n",
    "    \"naive_se\": [round(float(s), 3) for s in naive_es.se],\n",
    "    \"survey_se\": [round(float(s), 3) for s in survey_es_for_ratio.se],\n",
    "    \"se_ratio\": [round(float(s_s / s_n), 3) for s_n, s_s in zip(naive_es.se, survey_es_for_ratio.se)],\n",
    "})\n",
    "print(ratios.to_string(index=False))\n",
    "print()\n",
    "print(f\"overall SE ratio (from section 3) = {survey.se / naive.se:.3f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1200aaee",
   "metadata": {},
   "source": [
    "The per-horizon table reflects the IF-spread argument. Per paper\n",
    "Appendix B.2 the event-study uses `D_{g,F}` (the period-F dose) as\n",
    "the SAME dose regressor for every horizon - pre-period placebos and\n",
    "post-period horizons alike. Per-horizon SEs differ because the\n",
    "outcome side `ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` does. Pre-period\n",
    "placebos have small `ΔY` (no treatment signal — just within-pre noise\n",
    "between t and F-1) so the local-linear at d_lower fits low residual\n",
    "variance and reads small SEs. Post-period horizons have `ΔY` that\n",
    "scales with the treatment effect (slope * `D_{g,F}` plus noise), so\n",
    "the local-linear residual variance is larger and the SE is larger.\n",
    "The survey machinery folds PSU clustering into both surfaces and\n",
    "produces a moderate per-horizon inflation on top. The takeaway for the practitioner: **for HAD specifically, do\n",
    "not eyeball SE inflation against your DEFF expectation from\n",
    "regression-coefficient examples**. Read the design-aware SE off the\n",
    "fit object and report it; the magnitude of the survey-vs-naive ratio\n",
    "is determined by the IF-weighting of your particular estimand.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc9b05ca",
   "metadata": {},
   "source": [
    "## 5. Event-study under the survey design\n",
    "\n",
    "Refit with `aggregate=\"event_study\"` and `cband=True` to get\n",
    "per-horizon ATT estimates plus a sup-t confidence band that adjusts\n",
    "for the multiple-horizon comparison. The cband is computed via a\n",
    "multiplier bootstrap that aggregates the per-PSU IF tensor under\n",
    "the survey design.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c35c6e78",
   "metadata": {},
   "outputs": [],
   "source": [
    "with warnings.catch_warnings():\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\", message=r\".*pweight.*\", category=UserWarning\n",
    "    )\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\",\n",
    "        message=r\".*continuous_near_d_lower.*Assumption.*\",\n",
    "        category=UserWarning,\n",
    "    )\n",
    "    es = HAD(design=\"auto\").fit(\n",
    "        panel,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"week\",\n",
    "        unit_col=\"state_id\",\n",
    "        first_treat_col=\"first_treat\",\n",
    "        aggregate=\"event_study\",\n",
    "        survey_design=sd,\n",
    "        cband=True,\n",
    "    )\n",
    "\n",
    "es_table = pd.DataFrame({\n",
    "    \"event_time\": list(es.event_times),\n",
    "    \"att\": [round(float(a), 2) for a in es.att],\n",
    "    \"se\": [round(float(s), 3) for s in es.se],\n",
    "    \"ci_low\": [round(float(c), 2) for c in es.conf_int_low],\n",
    "    \"ci_high\": [round(float(c), 2) for c in es.conf_int_high],\n",
    "    \"cband_low\": [round(float(c), 2) for c in es.cband_low],\n",
    "    \"cband_high\": [round(float(c), 2) for c in es.cband_high],\n",
    "})\n",
    "print(es_table.to_string(index=False))\n",
    "print()\n",
    "print(f\"cband critical value = {es.cband_crit_value:.4f} ({es.cband_method})\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "450c47b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "if HAS_MATPLOTLIB:\n",
    "    fig, ax = plt.subplots(figsize=(8, 4.2))\n",
    "    et = np.asarray(es.event_times, dtype=float)\n",
    "    ax.axhline(0.0, color=\"black\", linewidth=0.5)\n",
    "    ax.axhline(TRUE_SLOPE, color=\"gray\", linewidth=0.6, linestyle=\":\", label=\"true slope = 100\")\n",
    "    ax.fill_between(et, np.asarray(es.cband_low), np.asarray(es.cband_high), alpha=0.15, label=\"sup-t cband (survey)\")\n",
    "    # Use the estimator's stored pointwise CI endpoints (built from t\n",
    "    # critical values with df_survey under SurveyDesign — NOT hard-coded\n",
    "    # 1.96 * se, which would silently understate uncertainty under\n",
    "    # survey-aware inference). matplotlib errorbar takes asymmetric\n",
    "    # yerr as a (2, n) array of [lower_distances, upper_distances].\n",
    "    att = np.asarray(es.att)\n",
    "    yerr = np.vstack([att - np.asarray(es.conf_int_low), np.asarray(es.conf_int_high) - att])\n",
    "    ax.errorbar(et, att, yerr=yerr, fmt=\"o\", color=\"#1f77b4\", capsize=3, label=\"point + pointwise CI (survey-aware t)\")\n",
    "    ax.axvline(-0.5, color=\"red\", linewidth=0.6, linestyle=\"--\")\n",
    "    ax.set_xlabel(\"event time (weeks since campaign launch)\")\n",
    "    ax.set_ylabel(\"per-$1K WAS-d_lower\")\n",
    "    ax.set_title(\"HAD event-study under SurveyDesign(weights, strata, psu, fpc)\")\n",
    "    ax.legend(loc=\"center right\", fontsize=8)\n",
    "    fig.tight_layout()\n",
    "    plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7eff930",
   "metadata": {},
   "source": [
    "**Reading the event-study.** Pre-launch horizons (e in\n",
    "{-4, -3, -2}) cover zero - no pre-trends - and the post-launch\n",
    "horizons (e in {0, 1, 2, 3}) sit on the true per-$1K slope of 100\n",
    "with both pointwise and sup-t coverage. The sup-t band is a few\n",
    "percent wider than the pointwise interval; that gap is the multiple-\n",
    "horizon multiplicity correction. Per-horizon SEs are noticeably\n",
    "larger post-launch than pre because `ΔY_{g,t}` post-launch carries\n",
    "the treatment-effect contribution (cross-unit variation in\n",
    "`slope * D_{g,F}`), driving larger local-linear residual variance\n",
    "than the pre-period placebos see (placebos use the same `D_{g,F}`\n",
    "regressor, but `ΔY_{g,t}` is noise-only — see section 4).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95252100",
   "metadata": {},
   "source": [
    "## 6. Pretest workflow under the survey design\n",
    "\n",
    "`did_had_pretest_workflow` runs the three-step diagnostic battery\n",
    "(QUG; Stute or joint Stute; Yatchew) and composes a single verdict\n",
    "text. Under `survey_design`, the QUG step is **permanently deferred**\n",
    "(Phase 4.5 C0); the workflow returns `report.qug=None` and inserts a\n",
    "suffix on the verdict string flagging the deferral. The Stute and\n",
    "Yatchew steps run normally, with the Stute multiplier bootstrap\n",
    "operating under the now-supported stratified-clustered wild\n",
    "construction (lifted by PR #432).\n",
    "\n",
    "**Overall path** (two-period collapse):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1dc9e20e",
   "metadata": {},
   "outputs": [],
   "source": [
    "with warnings.catch_warnings():\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\", message=r\".*pweight.*\", category=UserWarning\n",
    "    )\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\",\n",
    "        message=r\".*continuous_near_d_lower.*Assumption.*\",\n",
    "        category=UserWarning,\n",
    "    )\n",
    "    overall_report = did_had_pretest_workflow(\n",
    "        panel_2p,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"period\",\n",
    "        unit_col=\"state_id\",\n",
    "        survey_design=sd,\n",
    "        aggregate=\"overall\",\n",
    "        n_bootstrap=N_BOOTSTRAP,\n",
    "        seed=WORKFLOW_SEED,\n",
    "    )\n",
    "print(\"verdict :\", overall_report.verdict)\n",
    "print(\"all_pass:\", overall_report.all_pass)\n",
    "print(\"qug     :\", overall_report.qug)\n",
    "print()\n",
    "print(overall_report.summary())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbcaf022",
   "metadata": {},
   "source": [
    "The verdict ends in\n",
    "`(linearity-conditional verdict; QUG-under-survey deferred per Phase\n",
    "4.5 C0)` - that suffix is **the** caveat the workflow owes the\n",
    "practitioner under survey: read the linearity verdict knowing the\n",
    "boundary-presence diagnostic was not run. `report.qug` is `None`.\n",
    "The Stute and Yatchew rows in the formatted summary populate as\n",
    "usual.\n",
    "\n",
    "**Event-study path** (full panel + joint diagnostics):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94b0abb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "with warnings.catch_warnings():\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\", message=r\".*pweight.*\", category=UserWarning\n",
    "    )\n",
    "    warnings.filterwarnings(\n",
    "        \"ignore\",\n",
    "        message=r\".*continuous_near_d_lower.*Assumption.*\",\n",
    "        category=UserWarning,\n",
    "    )\n",
    "    es_report = did_had_pretest_workflow(\n",
    "        panel,\n",
    "        outcome_col=\"screening_uptake\",\n",
    "        dose_col=\"spend_k\",\n",
    "        time_col=\"week\",\n",
    "        unit_col=\"state_id\",\n",
    "        first_treat_col=\"first_treat\",\n",
    "        survey_design=sd,\n",
    "        aggregate=\"event_study\",\n",
    "        n_bootstrap=N_BOOTSTRAP,\n",
    "        seed=WORKFLOW_SEED,\n",
    "    )\n",
    "print(\"verdict       :\", es_report.verdict)\n",
    "print(\"all_pass      :\", es_report.all_pass)\n",
    "print()\n",
    "print(\"pretrends_joint :\", es_report.pretrends_joint)\n",
    "print(\"homogeneity_joint:\", es_report.homogeneity_joint)\n",
    "print()\n",
    "print(es_report.summary())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "211fab22",
   "metadata": {},
   "source": [
    "The event-study `verdict` ends in the same C0 suffix; the\n",
    "formatted `summary()` block additionally renders a `(QUG step skipped\n",
    "- permanently deferred under survey/weights per Phase 4.5 C0)` note\n",
    "in place of the normal QUG row. `pretrends_joint` runs across the\n",
    "three pre-launch horizons (1, 2, 3) and `homogeneity_joint` runs\n",
    "across the four post-launch horizons (5, 6, 7, 8); both fail-to-\n",
    "reject under the linear-DGP null.\n",
    "\n",
    "The Stute family on the survey path is a **documented synthesis** of\n",
    "ingredients from several papers, not a direct port: cluster-level\n",
    "multipliers (Cameron-Gelbach-Miller 2008), wild-bootstrap centering\n",
    "(Davidson-Flachaire 2008), within-stratum demeaning + Bessel rescale\n",
    "(Wu 1986; Liu 1988; Kreiss-Lahiri 2012), and the cluster-wild\n",
    "consistency of nonlinear functionals (Djogbenou-MacKinnon-Nielsen\n",
    "2019), composed for the Hlavka-Huskova 2020 Stute residual\n",
    "bootstrap. No single paper covers the exact composition for HAD\n",
    "under stratified PSU sampling; the library's `apply_stratum_centering`\n",
    "helper (`bootstrap_utils.py:657-786`) is what makes the composition\n",
    "turn-key. See `docs/methodology/REGISTRY.md` § HAD - Stute\n",
    "stratified survey-bootstrap calibration for the derivation.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a40a113e",
   "metadata": {},
   "source": [
    "## 7. Communicating the design-based verdict to leadership\n",
    "\n",
    "> **For the executive.** The campaign moved screening uptake by\n",
    "> **about 100 per 1,000 adults per $1K of supplemental spend**, with\n",
    "> a 95% confidence interval that comfortably excludes zero. The\n",
    "> survey-design analysis - which accounts for the way states were\n",
    "> sampled within strata and clustered into PSUs - lands on the same\n",
    "> bottom line as the simpler analysis, but with a slightly wider\n",
    "> confidence interval. Both intervals cover the methodological\n",
    "> ground truth (slope = 100). Pre-launch placebo periods showed no\n",
    "> drift; the per-week post-launch effect is consistent across the\n",
    "> rollout.\n",
    "\n",
    "> **For the methodologist.** The HAD pretest workflow runs two\n",
    "> diagnostic passes — overall (`Stute` CvM + `Yatchew-HR`\n",
    "> closed-form) on the two-period collapse, and event-study\n",
    "> (`joint pre-trends` + `joint homogeneity`, both joint-Stute)\n",
    "> on the full panel. Both passes fail-to-reject on this DGP. Both\n",
    "> verdicts end in `(linearity-conditional verdict; QUG-under-survey\n",
    "> deferred per Phase 4.5 C0)` — the load-bearing C0 caveat. On the\n",
    "> event-study path `report.yatchew` and `report.stute` are `None`;\n",
    "> those single-horizon tests are overall-only.\n",
    ">\n",
    "> The design-based SE on the headline fit is ~10% larger than the\n",
    "> naive SE — smaller than the inflation a CallawaySantAnna or\n",
    "> LinearRegression coefficient would see at this PSU correlation,\n",
    "> because HAD uses a local-linear boundary fit at `d_lower` to\n",
    "> estimate the boundary-limit term in the `WAS_d_lower` formula\n",
    "> (variance is dominated by the few states near the boundary; see\n",
    "> §4).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d79ad79a",
   "metadata": {},
   "source": [
    "## 8. Extensions + summary checklist\n",
    "\n",
    "**Extensions to keep on the radar:**\n",
    "\n",
    "- **`lonely_psu='adjust'` + singleton strata** still raises\n",
    "  `NotImplementedError` for the Stute family on the survey path\n",
    "  (`had.py:2139`, `had_pretests.py:1927`). The pseudo-stratum\n",
    "  centering transform that would make this case work is a separate\n",
    "  derivation from PR #432; same gap as the HAD sup-t deviation\n",
    "  documented at REGISTRY:2382. Stay with `lonely_psu='remove'` or\n",
    "  `'certainty'` for now.\n",
    "- **Replicate-weight designs** (BRR / Fay / JK1 / JKn / SDR) are\n",
    "  defensively rejected at every survey-aware HAD entry point. The\n",
    "  Rao-Wu / JKn bootstrap composition is a separate slot of work.\n",
    "  T22 stays inside the FPC / Mammen-multiplier path; full replicate-\n",
    "  weight HAD is a follow-up tutorial.\n",
    "- **QUG under survey / weights is permanently deferred** (Phase 4.5\n",
    "  C0). The smooth functionals route through the Stute family per\n",
    "  Krieger-Pfeffermann (1997); QUG's extreme order statistic does\n",
    "  not. There is no intended migration path.\n",
    "- **Trends-linear refit** under the survey design is also currently\n",
    "  rejected by `HeterogeneousAdoptionDiD.fit`; the trends_lin +\n",
    "  survey_design composition is open as a separate slot.\n",
    "\n",
    "**What you can ship to leadership:**\n",
    "\n",
    "- Single design-aware ATT with a survey-design CI (section 3, 5).\n",
    "- Verdict text that flags the C0 deferral verbatim - do not summarize\n",
    "  it away (section 6).\n",
    "- Per-horizon ATT and sup-t cband when you want a rollout chart\n",
    "  (section 5).\n",
    "- A side note on the modest SE inflation magnitude (section 4) so the\n",
    "  audience does not over-correct against the simpler analysis.\n",
    "- The workflow's `report.summary()` block as the methodologist-facing\n",
    "  artifact; `report.verdict` as the executive-facing artifact.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}