Tutorial 21: HAD Pre-test Workflow - Running the Pre-test Diagnostics on the Brand Campaign Panel

Tutorial 20 fit HeterogeneousAdoptionDiD (HAD) on a regional brand-campaign panel and reported a per-dollar lift, with a brief visual placebo check at the end. We deliberately deferred the formal pre-test workflow to this tutorial, with a forward pointer in T20’s “Extensions” section.

This tutorial picks up where T20 left off. We re-run the brand campaign on a panel close in shape to T20’s, then walk through HAD’s composite pre-test workflow, did_had_pretest_workflow, and read its diagnostics against Section 4.2 of de Chaisemartin, Ciccia, D’Haultfoeuille, & Knau (2026). We start with the two-period (aggregate="overall") workflow, observe that it does not run the parallel pre-trends step, and then upgrade to the multi-period (aggregate="event_study") workflow, which adds the joint Stute pre-trends and joint homogeneity diagnostics. None of the diagnostics in this tutorial reject; we walk through what that does and does not let us conclude. A side panel compares the two null= modes of the Yatchew-HR test, including the recently shipped null="mean_independence" mode (R-parity with YatchewTest::yatchew_test(order=0)).

1. The Pre-test Battery

de Chaisemartin et al. (2026) Section 4.2 lays out a four-step pre-test workflow for HAD identification:

  1. Step 1 - QUG support-infimum test (paper Theorem 4): is the support of the dose distribution consistent with d_lower = 0 (Design 1’, continuous_at_zero, target = WAS)? Or is the support strictly above zero (Design 1, continuous_near_d_lower, target = WAS_d_lower)? The two designs identify different estimands; getting this right matters.

  2. Step 2 - Parallel pre-trends (paper Assumption 7): does the differenced outcome behave the same way across dose groups in the pre-treatment periods? Same identifying logic as classic DiD.

  3. Step 3 - Linearity / homogeneity (paper Assumption 8): is E[dY | D] linear in D, so that the WAS reading reflects the average per-dose marginal effect rather than masking heterogeneity bias?

  4. Step 4 - Decision rule: if Steps 1-3 all fail to reject, TWFE may be used to estimate the treatment effect (paper Section 4.3).

The library bundles the testable steps into one entry point: did_had_pretest_workflow. It dispatches to a two-period implementation (steps 1 + 3 only - step 2 needs at least two pre-periods) or a multi-period implementation (steps 1 + 2 + 3 jointly). The Yatchew-HR test from Step 3 is also exposed standalone with two null modes; we exercise both in the side panel.
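To make Step 3 concrete, here is a minimal, generic sketch of a Stute-type Cramer-von Mises linearity check with a Mammen wild bootstrap, written in plain NumPy. It is not the library's stute_test implementation: the helper names (stute_cvm, mammen_weights, wild_bootstrap_pvalue), the 1/G^2 normalization, the bootstrap p-value convention, and the toy data (300 simulated units) are all assumptions made for illustration.

```python
import numpy as np


def stute_cvm(d, dy):
    """CvM statistic on OLS residuals cumulated in dose order (illustrative)."""
    X = np.column_stack([np.ones_like(d), d])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    fitted = X @ beta
    resid = dy - fitted
    cusum = np.cumsum(resid[np.argsort(d)])
    return float(np.sum(cusum**2) / len(d) ** 2), fitted, resid


def mammen_weights(rng, n):
    """Two-point Mammen multipliers: mean 0, variance 1."""
    s5 = np.sqrt(5.0)
    low, high = -(s5 - 1) / 2, (s5 + 1) / 2
    return np.where(rng.random(n) < (s5 + 1) / (2 * s5), low, high)


def wild_bootstrap_pvalue(d, dy, n_bootstrap=199, seed=0):
    """Refit the linear model on perturbed outcomes and compare CvM statistics."""
    rng = np.random.default_rng(seed)
    stat, fitted, resid = stute_cvm(d, dy)
    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        dy_star = fitted + resid * mammen_weights(rng, len(d))
        draws[b] = stute_cvm(d, dy_star)[0]
    p_value = (1 + np.sum(draws >= stat)) / (n_bootstrap + 1)
    return stat, float(p_value)


rng = np.random.default_rng(21)
dose = rng.uniform(0.01, 50.0, size=300)
noise = rng.normal(0.0, 2.0, size=300)
stat_lin, p_lin = wild_bootstrap_pvalue(dose, 100.0 * dose + noise)
stat_quad, p_quad = wild_bootstrap_pvalue(dose, 100.0 * dose + 2.0 * dose**2 + noise)
print(f"linear DGP:    CvM = {stat_lin:.4f}, p = {p_lin:.3f}")
print(f"quadratic DGP: CvM = {stat_quad:.4f}, p = {p_quad:.3f}")
```

The quadratic response leaves systematically signed residuals after the linear fit, so the cumulated residuals drift far from zero and the statistic lands in the extreme tail of its bootstrap distribution. Under the linear response the residuals are pure noise and the p-value carries no such signal.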

Non-testable identification caveat (separate from the four-step workflow). Identification of the WAS estimand under Design 1’ (continuous_at_zero, target = WAS) requires Assumption 3 (uniform continuity of d -> Y_2(d) at zero, holds if the dose-response is Lipschitz; not testable). The Design 1 paths (continuous_near_d_lower / mass_point, target = WAS_d_lower) instead need Assumption 5 (sign identification) or Assumption 6 (WAS_d_lower point identification) - that is the caveat T20’s tutorial flagged because T20’s panel was Design 1. T21’s panel resolves to Design 1’ (see Section 2 + Section 3), so the relevant non-testable caveat here is Assumption 3, NOT Assumptions 5/6. The library reflects this: it emits a UserWarning about Assumption 5/6 on Design 1 fits and does not emit it on continuous_at_zero (Design 1’) fits.

2. The Panel

We use a panel close in shape to T20’s brand campaign (60 DMAs over 8 weeks, regional add-on spend on top of a national TV blast at week 5, true per-$1K lift = 100 weekly visits). The one difference: regional spend in this tutorial is drawn from Uniform[$0.01K, $50K] instead of T20’s Uniform[$5K, $50K]. The true support of the dose distribution is therefore strictly positive (down to about $10), but very near zero - some markets barely participated in the regional add-on. Two independent things follow from that small D_(1). (a) The QUG test in Step 1 will fail to reject H0: d_lower = 0, which means the data are statistically consistent with the continuous_at_zero (Design 1’) identification path even though the true simulation lower bound is positive. (b) Independently, HAD’s design="auto" detection also lands on continuous_at_zero here. It uses a separate min/median heuristic, NOT the QUG p-value: continuous_at_zero fires when d.min() < 0.01 * median(|d|), and D_(1) / median(D) is below 0.01 on this panel. Both checks point to the same identification path on this panel, but they are independent rules; the workflow’s _detect_design does not consume the pre-test outcomes. The point of this tutorial is not to assert that the data are Design 1’ from the DGP up; the point is to read what the workflow concludes from the data and what it leaves open.
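The min/median rule quoted above is simple enough to sketch on its own. The helper below, looks_continuous_at_zero, is a hypothetical stand-in written for this illustration and not the library's _detect_design; the evenly spaced dose vectors are made-up stand-ins for the T21 and T20 dose ranges.

```python
import numpy as np


def looks_continuous_at_zero(dose, ratio=0.01):
    """Hypothetical helper: the min/median rule quoted in the text, not library code."""
    dose = np.asarray(dose, dtype=float)
    return bool(dose.min() < ratio * np.median(np.abs(dose)))


# Evenly spaced stand-ins for the two dose ranges (no randomness, illustration only).
t21_like = np.linspace(0.18, 49.0, 60)  # smallest dose close to zero, as on this panel
t20_like = np.linspace(5.0, 50.0, 60)   # smallest dose well above zero, as in T20

print(looks_continuous_at_zero(t21_like))  # minimum is below 1% of the median
print(looks_continuous_at_zero(t20_like))  # minimum is far above 1% of the median
```

Note that the rule looks only at the dose vector. No p-value enters it, which is why it can agree or disagree with QUG on a given panel.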

[1]:
import numpy as np
import pandas as pd

from diff_diff import generate_continuous_did_data

MAIN_SEED = 87
N_UNITS = 60
N_PERIODS = 8
COHORT_PERIOD = 5
TRUE_SLOPE = 100.0
BASELINE_VISITS = 5000.0
DOSE_LOW = 0.01
DOSE_HIGH = 50.0

raw = generate_continuous_did_data(
    n_units=N_UNITS,
    n_periods=N_PERIODS,
    cohort_periods=[COHORT_PERIOD],
    never_treated_frac=0.0,
    dose_distribution="uniform",
    dose_params={"low": DOSE_LOW, "high": DOSE_HIGH},
    att_function="linear",
    att_intercept=0.0,
    att_slope=TRUE_SLOPE,
    unit_fe_sd=8.0,
    time_trend=0.5,
    noise_sd=2.0,
    seed=MAIN_SEED,
)
panel = raw.copy()
panel.loc[panel["period"] < panel["first_treat"], "dose"] = 0.0
panel = panel.rename(
    columns={
        "unit": "dma_id",
        "period": "week",
        "outcome": "weekly_visits",
        "dose": "regional_spend_k",
    }
)
panel["weekly_visits"] = panel["weekly_visits"] + BASELINE_VISITS

post = panel[panel["week"] >= COHORT_PERIOD]
print(f"Panel: {panel['dma_id'].nunique()} DMAs x {panel['week'].nunique()} weeks")
print(
    f"Regional spend (post-launch): "
    f"${post['regional_spend_k'].min():.2f}K - "
    f"${post['regional_spend_k'].max():.2f}K"
)
print(f"True per-$1K lift (locked at seed): {TRUE_SLOPE} weekly visits")

Panel: 60 DMAs x 8 weeks
Regional spend (post-launch): $0.18K - $49.00K
True per-$1K lift (locked at seed): 100.0 weekly visits

3. Step 1: The Overall Workflow (Two-Period Path)

T20’s headline used a two-period collapse of the panel - average pre-launch outcome per DMA against average post-launch outcome per DMA. That’s also the natural input shape for HAD’s two-period (aggregate="overall") pre-test workflow, which runs paper Step 1 (QUG) + paper Step 3 (linearity, via Stute and Yatchew-HR). Step 2 (parallel pre-trends) is not implemented on this path - a single pre-period structurally can’t support a pre-trends test - and the workflow’s verdict says so explicitly.

We collapse to two periods (pre = avg over weeks 1-4, post = avg over weeks 5-8), then call the workflow.

[2]:
from diff_diff import did_had_pretest_workflow

p = panel.copy()
p["period"] = (p["week"] >= COHORT_PERIOD).astype(int) + 1  # 1=pre, 2=post
two_period = p.groupby(["dma_id", "period"], as_index=False).agg(
    weekly_visits=("weekly_visits", "mean"),
    regional_spend_k=("regional_spend_k", "mean"),
)
# Workflow invariant: pre-period dose = 0 for every unit.
two_period.loc[two_period["period"] == 1, "regional_spend_k"] = 0.0
# first_treat in the collapsed coordinates: 2 (the post-period) for every DMA.
two_period["first_treat"] = 2

overall_report = did_had_pretest_workflow(
    data=two_period,
    outcome_col="weekly_visits",
    dose_col="regional_spend_k",
    time_col="period",
    unit_col="dma_id",
    first_treat_col="first_treat",
    alpha=0.05,
    n_bootstrap=999,
    seed=21,
    aggregate="overall",
)

print(overall_report.verdict)
print(f"\nall_pass = {overall_report.all_pass}")
print(f"aggregate = {overall_report.aggregate!r}")
print(f"pretrends_joint populated? {overall_report.pretrends_joint is not None}")
print(f"homogeneity_joint populated? {overall_report.homogeneity_joint is not None}")

QUG and linearity diagnostics fail-to-reject; Assumption 7 pre-trends test NOT run (paper step 2 deferred to Phase 3 follow-up)

all_pass = True
aggregate = 'overall'
pretrends_joint populated? False
homogeneity_joint populated? False

Reading the overall verdict. Three things to note.

  • Step 1 (QUG) fails to reject: the test statistic T = D_(1) / (D_(2) - D_(1)) ~ 3.86 lands well below its critical value (1/alpha - 1 = 19 at alpha = 0.05); the data are statistically consistent with d_lower = 0. (Failing to reject is non-rejection, not proof - the true support could still be slightly above zero in finite samples; here it is, by construction of the DGP. QUG’s outcome supports interpreting the data as Design 1’, but the QUG test is independent of HAD’s design="auto" selector - which uses the min/median heuristic described in Section 2 to reach the same continuous_at_zero decision on this panel.)

  • Step 3 (linearity) fails to reject on both Stute (CvM) and Yatchew-HR. The diagnostics do not flag heterogeneity bias on the dose dimension, so reading the WAS as an average per-dose marginal effect is supported by these tests (subject to finite-sample power).

  • Step 2 (Assumption 7 pre-trends) is not run on this path. The verdict says so verbatim: "Assumption 7 pre-trends test NOT run (paper step 2 deferred to Phase 3 follow-up)". With a single pre-period (the avg over weeks 1-4), there is nothing to compare against - we need at least two pre-periods to run a parallel-trends test on the dose dimension. The structural fields back this up: pretrends_joint and homogeneity_joint on the report are both None (the joint-Stute output containers don’t get populated on the two-period path).
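The QUG arithmetic in the first bullet can be reproduced by hand from the two smallest doses reported in this section's QUG summary (D_(1) = 0.1806, D_(2) = 0.2274). The closed-form p-value 1 / (1 + T) is an assumption on our part; it is consistent with the printed p-value of 0.2059 but is not taken from the library source.

```python
d1, d2 = 0.1806, 0.2274  # D_(1), D_(2) as printed (rounded to 4 decimals)
alpha = 0.05

t_stat = d1 / (d2 - d1)
critical_value = 1.0 / alpha - 1.0
p_value = 1.0 / (1.0 + t_stat)  # assumed closed form; consistent with the printed 0.2059

print(f"T = {t_stat:.2f}, critical value = {critical_value:.0f}, p = {p_value:.3f}")
print("reject H0" if t_stat > critical_value else "fail to reject H0")
```

Small differences from the printed T = 3.8562 come from using the rounded order statistics.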

A note on all_pass = True here: the workflow’s all_pass flag aggregates only the steps that actually ran on this dispatch path. On the overall path that is QUG + linearity (Stute / Yatchew); Step 2’s deferral is not folded into all_pass. So all_pass = True on the overall path means “of the two steps that ran, neither rejected” - it does not mean Assumption 7 has been cleared. The upgrade to event-study below makes this concrete by actually running Step 2.

Let’s look at each individual test result.

[3]:
overall_report.qug.print_summary()
print()
overall_report.stute.print_summary()
print()
overall_report.yatchew.print_summary()

================================================================
                QUG null test (H_0: d_lower = 0)
================================================================
Statistic T:                                 3.8562
p-value:                                     0.2059
Critical value (1/alpha-1):                 19.0000
Reject H_0:                                   False
alpha:                                       0.0500
Observations:                                    60
Excluded (d == 0):                                0
D_(1):                                       0.1806
D_(2):                                       0.2274
================================================================

================================================================
         Stute CvM linearity test (H_0: linear E[dY|D])
================================================================
CvM statistic:                               0.0735
Bootstrap p-value:                           0.6860
Reject H_0:                                   False
alpha:                                       0.0500
Bootstrap replications:                         999
Observations:                                    60
Seed:                                            21
================================================================

================================================================
        Yatchew-HR linearity test (H_0: linear E[dY|D])
================================================================
T_hr statistic:                         -34759.3017
p-value:                                     1.0000
Critical value (1-sided z):                  1.6449
Reject H_0:                                   False
alpha:                                       0.0500
sigma^2_lin (OLS):                           1.6177
sigma^2_diff (Yatchew):                   6250.2569
sigma^2_W (HR scale):                        1.3925
Observations:                                    60
================================================================

A note on the Yatchew row. The T_hr statistic is very large and negative (~-35,000), which looks alarming but is a scale artifact, not pathology. Under the Yatchew construction, sigma2_diff = (1 / (2G)) * sum((dy_{(g)} - dy_{(g-1)})^2) is computed on dy sorted by dose D. With doses spread over Uniform[$0.01K, $50K] and a true per-$1K slope of 100 (locked by the DGP), adjacent-by-dose units have dy values that differ by roughly 100 * (D_{(g)} - D_{(g-1)}) plus noise. Those squared gaps add up to a large sigma2_diff (about 6,250 here) by virtue of the dose scale, while the OLS residual variance sigma2_lin (about 1.6) reflects only noise around the linear fit. The formula T_hr = sqrt(G) * (sigma2_lin - sigma2_diff) / sigma2_W then goes massively negative, the p-value rounds to 1.0, and we comfortably fail to reject linearity. The side panel later in the notebook constructs a different Yatchew input (within-pre-period first-differences, where the adjacent-by-dose dy gaps are not driven by the post-treatment slope) and produces a T_hr near zero - a useful sanity check that the test behaves the way it should when the dose dimension genuinely contributes nothing to the variance of dy.
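A small simulation makes the scale effect visible. The sketch below computes the two variance components on made-up data with and without a steep dose slope, then plugs the printed components into the T_hr formula quoted above. The helper yatchew_variances is hypothetical (not the library's code), the 1/G divisor for sigma2_lin and the upper-tail normal p-value are assumptions, and sigma2_W is taken from the printed summary rather than recomputed.

```python
import math
import numpy as np


def yatchew_variances(d, dy):
    """Return (sigma2_lin, sigma2_diff) for illustration; not the library's code."""
    d, dy = np.asarray(d, float), np.asarray(dy, float)
    X = np.column_stack([np.ones_like(d), d])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    sigma2_lin = float(np.mean((dy - X @ beta) ** 2))
    dy_sorted = dy[np.argsort(d)]  # order outcomes by dose before differencing
    sigma2_diff = float(np.sum(np.diff(dy_sorted) ** 2) / (2 * len(d)))
    return sigma2_lin, sigma2_diff


rng = np.random.default_rng(0)
dose = rng.uniform(0.01, 50.0, size=60)
noise = rng.normal(0.0, 1.3, size=60)

steep = yatchew_variances(dose, 100.0 * dose + noise)  # post-treatment-like dy
flat = yatchew_variances(dose, noise)                  # placebo-like dy
print("slope 100: sigma2_lin = %.2f, sigma2_diff = %.2f" % steep)
print("slope 0:   sigma2_lin = %.2f, sigma2_diff = %.2f" % flat)

# Plug the printed components into the T_hr formula from the text.
G, s2_lin, s2_diff, s2_w = 60, 1.6177, 6250.2569, 1.3925
t_hr = math.sqrt(G) * (s2_lin - s2_diff) / s2_w
p_value = 0.5 * math.erfc(t_hr / math.sqrt(2.0))  # one-sided upper-tail normal p-value
print(f"T_hr = {t_hr:.0f}, p = {p_value:.4f}")
```

With the slope switched off, the two variance components are of the same order, which is the regime the side panel exercises.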

4. Step 2: Upgrade to the Event-Study Workflow

The two-period workflow ran Steps 1 and 3 but did not run Step 2 (parallel pre-trends). Our panel actually has 8 weeks - that is enough pre-periods to add the joint Stute pre-trends diagnostic (paper Section 4.2 step 2 + Hlavka-Huskova 2020 / Delgado-Manteiga 2001 dependence-preserving Mammen multiplier bootstrap).

We pass the full multi-period panel to did_had_pretest_workflow(aggregate="event_study", ...). The dispatch runs all three testable steps in one call:

  • Step 1: QUG re-runs on the dose distribution at the treatment period F (deterministic; same numbers as the overall path).

  • Step 2: joint_pretrends_test - mean-independence joint Stute over the pre-period horizons (E[Y_t - Y_base | D] = mu_t for each t < F).

  • Step 3: joint_homogeneity_test - linearity joint Stute over the post-period horizons (E[Y_t - Y_base | D_t] = beta_{0,t} + beta_{fe,t} * D for each t >= F).

Step 3’s “Yatchew-HR” arm has no joint variant in the paper (the differencing-based variance estimator doesn’t have a derived multi-horizon extension), so the event-study path runs only joint Stute for linearity. Practitioners who want Yatchew-HR robustness on multi-period data can call the standalone yatchew_hr_test on each (base, post) pair manually.
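The per-horizon inputs described above (Y_t - Y_base for each pre- and post-period) can be sketched in plain pandas. The toy panel, its column names, and the choice of the last pre-treatment week as base are assumptions for illustration; the workflow builds its own inputs internally.

```python
import numpy as np
import pandas as pd

# Toy long-format panel: 3 units x 8 weeks, treatment starting at week 5 (made-up numbers).
weeks = np.arange(1, 9)
doses = {"A": 2.0, "B": 10.0, "C": 30.0}
rows = [
    {"unit": u, "week": w, "y": 5000.0 + 0.5 * w + (100.0 * dose if w >= 5 else 0.0)}
    for u, dose in doses.items()
    for w in weeks
]
toy = pd.DataFrame(rows)

first_treat, base = 5, 4  # base period = last pre-treatment week
wide = toy.pivot(index="unit", columns="week", values="y")
diffs = wide.sub(wide[base], axis=0).drop(columns=base)  # Y_t - Y_base per unit

pre_horizons = diffs.loc[:, diffs.columns < first_treat]    # weeks 1, 2, 3
post_horizons = diffs.loc[:, diffs.columns >= first_treat]  # weeks 5, 6, 7, 8
print(pre_horizons.round(2))
print(post_horizons.round(2))
```

In this toy panel the pre-period differences are identical across units (no dependence on dose), while the post-period differences scale with dose. Each column of post_horizons, paired with the dose vector, has the same shape as the d / dy arrays passed to yatchew_hr_test in the side panel.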

[4]:
es_report = did_had_pretest_workflow(
    data=panel,
    outcome_col="weekly_visits",
    dose_col="regional_spend_k",
    time_col="week",
    unit_col="dma_id",
    first_treat_col="first_treat",
    alpha=0.05,
    n_bootstrap=999,
    seed=21,
    aggregate="event_study",
)

print(es_report.verdict)
print(f"\nall_pass = {es_report.all_pass}")
print(f"aggregate = {es_report.aggregate!r}")
print(f"pretrends_joint populated? {es_report.pretrends_joint is not None}")
print(f"homogeneity_joint populated? {es_report.homogeneity_joint is not None}")

QUG, joint pre-trends, and joint linearity diagnostics fail-to-reject (TWFE admissible under Section 4 assumptions)

all_pass = True
aggregate = 'event_study'
pretrends_joint populated? True
homogeneity_joint populated? True

Reading the event-study verdict. Now the verdict reads "QUG, joint pre-trends, and joint linearity diagnostics fail-to-reject (TWFE admissible under Section 4 assumptions)". The "deferred" caveat from the overall path is gone because the joint pre-trends and joint homogeneity diagnostics now ran. The structural fields confirm: pretrends_joint and homogeneity_joint are both populated.

A note on the verdict’s “TWFE admissible” language. This is the workflow’s classifier output when none of the three testable diagnostics rejects at the configured alpha = 0.05 (paper Step 4 decision rule). That is non-rejection evidence under the diagnostics’ finite-sample power and specification, not proof that the identifying assumptions hold. The non-testable Design 1’ identification caveat (Assumption 3 / boundary regularity at zero, see Section 1) sits alongside this and is not covered by any of the three diagnostics.

The joint pre-trends test runs over n_horizons = 3 (pre-periods 1, 2, 3, with week 4 reserved as the base period). The joint homogeneity test runs over n_horizons = 4 (post-periods 5, 6, 7, 8). Let’s inspect the per-horizon detail.

[5]:
es_report.qug.print_summary()
print()
es_report.pretrends_joint.print_summary()
print()
es_report.homogeneity_joint.print_summary()

================================================================
                QUG null test (H_0: d_lower = 0)
================================================================
Statistic T:                                 3.8562
p-value:                                     0.2059
Critical value (1/alpha-1):                 19.0000
Reject H_0:                                   False
alpha:                                       0.0500
Observations:                                    60
Excluded (d == 0):                                0
D_(1):                                       0.1806
D_(2):                                       0.2274
================================================================

================================================================
     Joint Stute CvM test (mean-independence (pre-trends))
================================================================
Joint CvM statistic:                         7.1627
Bootstrap p-value:                           0.0720
Reject H_0:                                   False
alpha:                                       0.0500
Bootstrap replications:                         999
Horizons:                                         3
Observations:                                    60
Seed:                                            21
Exact-linear short-circuit:                   False
----------------------------------------------------------------
Per-horizon statistics:
  1                                  1.6112
  2                                  2.9262
  3                                  2.6253
================================================================

================================================================
      Joint Stute CvM test (linearity (post-homogeneity))
================================================================
Joint CvM statistic:                         1.3562
Bootstrap p-value:                           0.7630
Reject H_0:                                   False
alpha:                                       0.0500
Bootstrap replications:                         999
Horizons:                                         4
Observations:                                    60
Seed:                                            21
Exact-linear short-circuit:                   False
----------------------------------------------------------------
Per-horizon statistics:
  5                                  0.4218
  6                                  0.2186
  7                                  0.4928
  8                                  0.2230
================================================================

The pre-trends p-value (~0.07) sits close to the conventional alpha = 0.05 threshold. The test does not reject at alpha = 0.05, but this is not a clearly-far-from-rejection regime. In a real analysis it would warrant a closer look at the per-horizon CvM contributions (visible in per_horizon_stats) and possibly a Pierce-Schott-style linear-trend detrending via trends_lin=True (an extension we do not demonstrate here; see did_had_pretest_workflow’s docstring).

The joint homogeneity p-value (~0.76) is comfortably far from rejection. The diagnostic does not flag heterogeneity bias on the dose dimension across the four post-launch horizons.
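A quick way to read the per-horizon rows is as shares of the joint statistic. In the summaries printed above, each joint CvM statistic equals the sum of its per-horizon statistics; whether the library defines the joint statistic that way in general is an assumption we do not verify here. The numbers below are copied from the printed output.

```python
pretrends = {1: 1.6112, 2: 2.9262, 3: 2.6253}               # per-horizon CvM, pre-periods
homogeneity = {5: 0.4218, 6: 0.2186, 7: 0.4928, 8: 0.2230}  # per-horizon CvM, post-periods

for label, per_horizon in [("pre-trends", pretrends), ("homogeneity", homogeneity)]:
    joint = sum(per_horizon.values())
    shares = {h: round(v / joint, 2) for h, v in per_horizon.items()}
    print(f"{label}: joint = {joint:.4f}, shares = {shares}")
```

On the pre-trends side no single horizon dominates, so the near-threshold p-value is not traceable to one week alone.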

With QUG (Step 1), joint pre-trends (Step 2), and joint linearity (Step 3), the workflow has now run all three testable steps, and none rejects at alpha = 0.05. By paper Step 4 (the decision rule), TWFE may then be used. That is the workflow’s strongest non-rejection evidence; it is not proof that the identifying assumptions hold. The non-testable Design 1’ identification caveat (Assumption 3 / boundary regularity at zero) remains and is argued from domain knowledge.

5. Side Panel: Yatchew-HR Null Modes

The Yatchew-HR test exposes two null= modes (the second was added in 2026-04 for parity with the R YatchewTest package).

  • null="linearity" (default; paper Theorem 7): tests H0: E[dY | D] is linear in D. Residuals come from OLS dy ~ 1 + d. This is what did_had_pretest_workflow calls under the hood.

  • null="mean_independence" (added 2026-04-26 in PR #397, Phase 4 R-parity): tests the stricter H0: E[dY | D] = E[dY], i.e. dY is mean-independent of D. Residuals come from intercept-only OLS dy ~ 1. Mirrors R YatchewTest::yatchew_test(order=0).

The mean-independence mode is typically used on placebo (pre-treatment) data to test parallel pre-trends as a non-parametric mean-independence assertion. Below we construct an illustrative input - the within-pre-period first-difference dy = Y[week=4] - Y[week=3] paired with each DMA’s actual post-period dose - and run both modes side by side. Both should fail to reject on this clean linear DGP; the contrast is in the residual structure.

[6]:
from diff_diff import yatchew_hr_test

panel_sorted = panel.sort_values(["dma_id", "week"]).reset_index(drop=True)
pre = panel_sorted[panel_sorted["week"].isin([3, 4])]
pre_pivot = pre.pivot(index="dma_id", columns="week", values="weekly_visits")
dy = (pre_pivot[4] - pre_pivot[3]).to_numpy(dtype=np.float64)
post_dose = (
    panel_sorted[panel_sorted["week"] == 5]
    .set_index("dma_id")
    .sort_index()["regional_spend_k"]
    .to_numpy(dtype=np.float64)
)

res_lin = yatchew_hr_test(d=post_dose, dy=dy, alpha=0.05, null="linearity")
res_mi = yatchew_hr_test(d=post_dose, dy=dy, alpha=0.05, null="mean_independence")

print(res_lin.summary())
print()
print(res_mi.summary())

================================================================
        Yatchew-HR linearity test (H_0: linear E[dY|D])
================================================================
T_hr statistic:                              0.0207
p-value:                                     0.4917
Critical value (1-sided z):                  1.6449
Reject H_0:                                   False
alpha:                                       0.0500
sigma^2_lin (OLS):                           6.5340
sigma^2_diff (Yatchew):                      6.5170
sigma^2_W (HR scale):                        6.3639
Observations:                                    60
================================================================

================================================================
    Yatchew-HR mean-independence test (H_0: E[dY|D] = E[dY])
================================================================
T_hr statistic:                              0.5536
p-value:                                     0.2899
Critical value (1-sided z):                  1.6449
Reject H_0:                                   False
alpha:                                       0.0500
sigma^2_lin (OLS):                           7.0076
sigma^2_diff (Yatchew):                      6.5170
sigma^2_W (HR scale):                        6.8638
Observations:                                    60
================================================================

Reading the side-panel comparison.

  • The linearity mode fits dy ~ 1 + d and computes residual variance sigma2_lin from those residuals. Under a clean linear DGP the residuals are small (close to noise variance), the gap sigma2_lin - sigma2_diff is near zero, and T_hr lands close to zero with a p-value far above alpha.

  • The mean_independence mode fits intercept-only dy ~ 1 and computes sigma2_lin as the population variance of dy. That residual variance is never smaller than under linearity, and is strictly larger whenever the sample slope of dy on d is nonzero (the linear fit can absorb any apparent slope between dy and d - real or sample noise - shrinking the residual variance, while intercept-only cannot). The gap sigma2_lin - sigma2_diff is then larger and T_hr is larger - same asymptotic distribution, stricter null, more easily rejected when the alternative is true.

On clean linear placebo data both modes fail to reject - exactly what we want. On data where dY actually responds to D in the pre-period (parallel pre-trends fail), null="mean_independence" is more sensitive than null="linearity" because linearity is a weaker null: a pre-trend that is linear in dose satisfies the linearity null, so the linearity mode has no power against it, but it violates the mean-independence null.
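The ordering of the two residual variances is a property of nested OLS fits and can be checked on any data. The sketch below uses made-up placebo-like data and a 1/G divisor (an assumption), then converts the two T_hr values printed in the side panel into one-sided normal p-values.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
d = rng.uniform(0.01, 50.0, size=60)
dy = rng.normal(0.5, 2.5, size=60)  # placebo-like: no true dependence on dose

# Residual variance under each null (1/G divisor assumed here).
X = np.column_stack([np.ones_like(d), d])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
var_linear = float(np.mean((dy - X @ beta) ** 2))      # null="linearity": dy ~ 1 + d
var_intercept = float(np.mean((dy - dy.mean()) ** 2))  # null="mean_independence": dy ~ 1
print(f"linear fit: {var_linear:.4f}  intercept-only: {var_intercept:.4f}")


def upper_tail_p(t):
    """One-sided standard-normal p-value."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# T_hr values printed in the side panel above.
print(f"linearity:         p = {upper_tail_p(0.0207):.4f}")
print(f"mean_independence: p = {upper_tail_p(0.5536):.4f}")
```

Because sigma2_diff is the same in both modes, the whole difference between the two T_hr values comes from the residual-variance term and the scale in the denominator.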

When to choose which: use null="linearity" to defend the linearity / homogeneity identification assumption (paper Step 3, Assumption 8). Use null="mean_independence" on placebo (pre-treatment) data when you want a non-parametric mean-independence assertion. The null="mean_independence" mode corresponds to what R YatchewTest::yatchew_test(order=0) runs for placebo pre-trend tests.

6. Communicating the Diagnostics to Leadership

Pre-test results travel awkwardly to non-technical audiences. The template below structures the diagnostics around what each test does and does not rule out - mirroring the headline-and-evidence pattern from T20 Section 5.

The HAD pre-test diagnostics on the brand-campaign panel do not flag a violation of the testable identifying assumptions.

  • Step 1 (QUG support-infimum, paper Theorem 4): the test does not reject H0: d_lower = 0 (p approximately 0.21). The data are statistically consistent with a dose distribution starting at zero. Independently of QUG, HAD’s design="auto" selector applies a min/median heuristic to the post-period dose vector and lands on the continuous_at_zero design (target WAS) on this panel; QUG and the design selector are separate rules that point to the same identification path here. Failing to reject the QUG null is not proof that the true support is exactly at zero, and the design selector’s choice is operational, not statistical.

  • Step 2 (parallel pre-trends, Assumption 7): the joint Stute pre-trends test does not reject (joint p approximately 0.07 across the three pre-period horizons). The p-value is close to alpha = 0.05, so the non-rejection here is not by a wide margin - in a high-stakes deployment we would inspect the per-horizon contributions (per_horizon_stats) and consider Pierce-Schott-style linear-trend detrending.

  • Step 3 (linearity, Assumption 8): joint Stute homogeneity does not reject (joint p approximately 0.76 across the four post-launch horizons). The diagnostic does not flag heterogeneity bias on the dose dimension under the test’s specification.

Non-testable from data (Design 1’ identification, paper Assumption 3 / boundary regularity at zero): uniform continuity of the dose-response d -> Y_2(d) at zero. Argued from domain knowledge - is there reason to believe outcomes are continuous in spend at the lower-dose boundary, with no extensive-margin discontinuity at $0? In our case yes, by DGP construction. (Note: this is the Design 1’ caveat. T20’s panel was Design 1, where the corresponding non-testable caveats are Assumptions 5/6 - the library actually emits a UserWarning surfacing those on Design 1 fits but stays silent on Design 1’ fits like ours.)

Bottom line: the workflow’s three testable diagnostics do not flag a violation, so by paper Step 4 (decision rule) TWFE may be used. Carrying the headline per-$1K lift forward should be paired with the standard caveats: finite-sample power of the diagnostics, the test specifications themselves, and the non-testable Design 1’ caveat (Assumption 3 / boundary regularity at zero). None of these are settled by non-rejection of the pre-tests.

7. Extensions

This tutorial covered the composite pre-test workflow on a single panel where QUG fail-to-reject and HAD’s design="auto" heuristic both pointed independently to the continuous_at_zero (Design 1’) identification path. A few directions we did not exercise here:

  • Survey-weighted / population-weighted inference - HAD’s pre-test workflow accepts survey_design= (or the deprecated survey= / weights= aliases) for design-based inference. The QUG step is permanently deferred under survey weighting (extreme-value theory under complex sampling is not a settled toolkit); the linearity family runs with PSU-level Mammen multiplier bootstrap (Stute and joint variants) and weighted OLS + weighted variance components (Yatchew). See Tutorial 22: Survey-Weighted HAD for an end-to-end walkthrough on a BRFSS-shape household-survey panel including the now-supported SurveyDesign(strata=...) pretest workflow.

  • trends_lin=True (Pierce-Schott Eq 17 / 18 detrending) - mirrors R DIDHAD::did_had(..., trends_lin=TRUE). Forwards into both joint pre-trends and joint homogeneity wrappers; consumes the placebo at base_period - 1 and skips Step 2 if no earlier placebo survives the drop. Useful when you suspect linear time trends correlated with dose but want to keep the joint-Stute machinery.

  • Standalone constituent tests - the building blocks are exposed for direct calling: the single-horizon tests qug_test, stute_test, and yatchew_hr_test (used in this tutorial’s side panel), and the joint variants stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test.

See the HeterogeneousAdoptionDiD API reference (../api/had.html) and the HAD pre-tests reference (../api/had.html#pre-tests) for the full parameter lists.

8. Summary Checklist

  • HAD’s pre-test workflow did_had_pretest_workflow bundles paper Section 4.2 Steps 1 (QUG support infimum), 2 (joint Stute pre-trends - event-study path only), and 3 (Stute / Yatchew-HR linearity, joint variant on event-study path).

  • The two-period (aggregate="overall") path runs Steps 1 + 3 only - it cannot run Step 2 because a single pre-period structurally has nothing to test against. The verdict says so verbatim: “Assumption 7 pre-trends test NOT run”.

  • Upgrade to the multi-period (aggregate="event_study") path to add the joint Stute pre-trends and joint homogeneity diagnostics. The verdict then reads “TWFE admissible under Section 4 assumptions” when none of the three testable diagnostics rejects - that is non-rejection evidence under finite-sample power and test specification, not proof.

  • Paper Step 4 is the decision rule (if Steps 1-3 don’t reject, use TWFE), not a non-testable assumption. The non-testable identification caveat is design-path-specific: Assumption 3 (boundary regularity at zero) for continuous_at_zero (Design 1’, T21), or Assumptions 5/6 for the Design 1 paths (continuous_near_d_lower / mass_point, T20).

  • The Yatchew-HR test exposes two null modes: null="linearity" (paper Theorem 7, default; what the workflow calls under the hood) and null="mean_independence" (Phase 4 R-parity with R YatchewTest::yatchew_test(order=0), useful on placebo pre-period data).

  • QUG fail-to-reject means the data are statistically consistent with d_lower = 0; it does not prove the true support starts at zero. The QUG test and HAD’s design="auto" selector are independent rules: QUG is a statistical test on H0: d_lower = 0; design="auto" calls _detect_design() which uses a min/median heuristic on the dose vector. Both pointed to continuous_at_zero on this panel; finite-sample uncertainty in either decision is a remaining caveat.

  • Bootstrap p-values are RNG-dependent. The drift test for this notebook lives in tests/test_t21_had_pretest_workflow_drift.py and uses tolerance bands per backend (Rust vs pure-Python).