Spillover-Aware DiD (Butts 2021)#

Ring-indicator spillover-aware Difference-in-Differences estimator.

This module implements the methodology from Butts, K. (2023, originally 2021), “Difference-in-Differences with Spatial Spillovers” (arXiv:2105.03737v3). The estimator augments two-stage Gardner (2022) DiD with ring-indicator covariates that identify, alongside the direct effect on treated units (tau_total), per-ring spillover effects on near-control units (delta_j).

The standard DiD estimator is biased for tau_total by -tau_spill(0) (paper Proposition 2.1 / Equations 1-2); the ring-augmented regression removes this bias. The “far-away” cutoff d_bar defines the boundary beyond which spillovers vanish (Assumption 5).

When to use SpilloverDiD:

Spatial-policy DiD settings where treatment may spill over onto nearby control units (border-RD, place-based policy, neighborhood effects, geographic policy boundaries)
When the canonical DiD coefficient is suspected to be attenuated or inflated by spillover bias and a far-away control group exists
As a robustness check on TwoStageDiD / ImputationDiD when the treatment has plausibly local geographic externalities

Reference: Butts, K. (2023). Difference-in-Differences with Spatial Spillovers. arXiv:2105.03737v3. Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943.

SpilloverDiD#

Main estimator class.

class diff_diff.SpilloverDiD[source]

Bases: object

Ring-indicator spillover-aware DiD (Butts 2021).

Standalone estimator implementing two-stage Gardner (2022) methodology with ring-indicator covariates that identify the direct effect on treated units (tau_total) alongside per-ring spillover effects on near-control units (delta_j). Supports both panel non-staggered timing and Section 5 staggered timing in a single fit() entry point — non-staggered is the special case where all treated units share an onset time.

Parameters:

rings (list of float) – Sorted distance breakpoints with at least 2 elements. K = len(rings) - 1 rings are constructed.
d_bar (float, optional) – Far-away cutoff (Butts Assumption 5). Defaults to max(rings); if explicitly set, must equal max(rings). Wave B MVP does not support a d_bar strictly larger than the outermost ring edge (a “dead zone” where units satisfy rings[-1] < d_i <= d_bar but are in neither a ring nor the far-away group has no clean methodological interpretation). To use a smaller spillover bandwidth, shrink the outermost ring edge instead.
vcov_type (str, default="hc1") – Variance estimator. Set to "conley" and supply conley_coords/conley_cutoff_km/conley_lag_cutoff to enable Conley spatial-HAC at stage 2 (recommended per paper Section 3.1).
conley_coords (tuple of (str, str), optional) – (lat_col, lon_col) column names. Used for ring construction AND for the Conley vcov spatial kernel.
conley_metric (str or callable, default="haversine") – Distance metric used for both ring construction and the Conley spatial kernel. See diff_diff.conley for callable contract.
conley_cutoff_km (float, optional) – Conley spatial-HAC bandwidth. Required when vcov_type="conley".
conley_lag_cutoff (int, optional) – Within-unit Bartlett max lag. Required when vcov_type="conley". Use 0 to suppress the serial-component sandwich.
cluster (str, optional) – Column name for cluster-robust variance, or the combined Conley cluster product kernel when paired with vcov_type="conley".
alpha (float, default=0.05) – Significance level for confidence intervals.
anticipation (int, default=0) – Number of pre-treatment periods where effects may occur. Treatment and ring-membership clocks both shift by -anticipation so the stage-1 untreated-and-unexposed subsample correctly excludes anticipation rows.
event_study (bool, default=False) – If True, emit per-event-time × ring coefficients (Butts Table 2 staggered specification). The result’s spillover_effects DataFrame uses a MultiIndex over (ring, event_time).
horizon_max (int, optional) – Maximum absolute event-study horizon. Used only when event_study=True. Event-times outside [-horizon_max, +horizon_max] are binned into endpoint pools (k <= -H aggregated into a single pre-bin coefficient; k >= +H into a single post-bin coefficient), so no observations are dropped. This intentionally diverges from diff_diff.two_stage.TwoStageDiD, which filters rows with |K| > horizon_max out of the stage-2 sample. The endpoint-pool semantic honors the library’s no-silent-data-drop policy (feedback_no_silent_failures). When None, the helper auto-detects the bin set from observed K values. If ref_period = -1 - anticipation falls outside [-horizon_max, +horizon_max] the fit raises ValueError.
rank_deficient_action ({"warn", "error", "silent"}, default="warn") – Action when the stage-2 design is rank-deficient.

results_

Populated after fit() completes.

Type:: SpilloverDiDResults

is_fitted_

Type:: bool

Notes

The implementation uses two-stage Gardner methodology with the time-varying S_it = S_i * 1{t >= t_treat} form (paper page 12, just above Equation 5). Reading the literal unit-static (1 - D_it) * S_i from Equation 5 yields a rank-deficient design under TWFE; Section 5’s Table 2 makes the time-varying form explicit. The diff-diff implementation matches the paper’s identification argument once the S_it notation is read correctly.

For non-staggered timing, Gardner identity → stage-2 point estimates equal a single-stage TWFE with the time-varying spillover regressor.

Methods

`fit`(data, *, outcome, unit, time[, ...])	Fit the two-stage Gardner DiD with ring-indicator covariates.
`get_params`()
`set_params`(**params)

__init__(*, rings, d_bar=None, vcov_type='hc1', conley_coords=None, conley_metric='haversine', conley_cutoff_km=None, conley_lag_cutoff=None, cluster=None, alpha=0.05, anticipation=0, event_study=False, horizon_max=None, rank_deficient_action='warn')[source]

Parameters:

rings (List[float])
d_bar (float | None)
vcov_type (str)
conley_coords (Tuple[str, str] | None)
conley_metric (Literal['haversine', 'euclidean'] | ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], ~numpy.ndarray])
conley_cutoff_km (float | None)
conley_lag_cutoff (int | None)
cluster (str | None)
alpha (float)
anticipation (int)
event_study (bool)
horizon_max (int | None)
rank_deficient_action (str)

get_params()[source]

Return type:: Dict[str, Any]

set_params(**params)[source]

Parameters:: params (Any)
Return type:: SpilloverDiD

fit(data, *, outcome, unit, time, treatment=None, first_treat=None, covariates=None, survey_design=None)[source]

Fit the two-stage Gardner DiD with ring-indicator covariates.

Methodology (Butts 2021 Section 5 + Gardner 2022):

Compute per-row spillover indicators from conley_coords.
Build stage-1 subsample Omega_0 = {D_it=0 AND S_it=0} (untreated AND unexposed) — Butts’ clean control group.
Stage 1: fit Y_it = mu_i + lambda_t + u on Omega_0.
Residualize: Y_tilde = Y - mu_hat - lambda_hat for ALL rows.
Stage 2: regress Y_tilde on [D_it, (1-D_it)*Ring_{it,j}] via solve_ols(), threading the configured vcov_type.
Wrap as SpilloverDiDResults.

Notes

Stage-2 variance applies the Wave D Gardner (2022) GMM first-stage uncertainty correction across all supported vcov_type paths ("hc1", "conley", "cluster" via cluster=<col>). The unified IF outer-product formula is psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i} with meat = Psi' K Psi where K is path-dependent (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + Conley (1999); no reference software combines all three. vcov_type="classical" raises NotImplementedError because the Wave D synthesis has not been derived for the homoskedastic meat structure sigma_hat^2 * (X_10' X_10); use "hc1" for heteroskedasticity-robust SE with the GMM correction.

Parameters:

data (DataFrame)
outcome (str)
unit (str)
time (str)
treatment (str | None)
first_treat (str | None)
covariates (List[str] | None)
survey_design (object)

Return type:

SpilloverDiDResults

SpilloverDiDResults#

Results container with per-ring spillover-effect table.

class diff_diff.SpilloverDiDResults[source]

Bases: DiDResults

Results from a ring-indicator spillover-aware DiD estimation (Butts 2021).

Extends DiDResults with per-ring spillover effect estimates and diagnostic counts for the identifying control population.

att

Total effect on the treated, tau_total (inherited from DiDResults).

Type:: float

spillover_effects

Per-ring spillover-on-control estimates. Columns: ["coef", "se", "t_stat", "p_value", "ci_low", "ci_high"]. Index: ring label like "[0, 50)"; for event-study output the index is a MultiIndex over (ring_label, event_time).

Type:: pd.DataFrame, optional

ring_breakpoints

Sorted distance breakpoints used to construct the rings.

Type:: list of float

d_bar

Far-away cutoff (Butts Assumption 5). Equals max(ring_breakpoints) unless explicitly overridden.

Type:: float

n_units_ever_in_ring

Counts of UNIQUE units that appear in each ring AT LEAST ONCE across observed periods. Under staggered timing the ring membership is time-varying, so a unit can be counted in multiple rings (one per period it visits). Under non-staggered timing the rings are unit-static, so this is a clean partition (each unit appears in exactly one ring or the far-away group). Includes treated units in Ring_1 by geometric construction even though the (1 - D_it) factor zeros their stage-2 contribution.

Type:: dict[str, int]

n_far_away_obs

Number of observations with D_it = 0 AND d_it > d_bar; these observations identify the counterfactual trend (Butts Assumption 5(ii)).

Type:: int

is_staggered

True if multiple distinct treatment-onset times were detected.

Type:: bool

event_study

True if per-event-time ring coefficients were emitted (i.e., self.event_study=True was set on the estimator).

Type:: bool

stage1_n_obs

Number of observations in the stage-1 untreated-and-unexposed subsample (Omega_0_butts).

Type:: int

Methods

`summary`([alpha])	Extended summary with ATT row, per-event-time direct block, and per-(ring, event-time) spillover block.
`to_dict`()	Override to serialize `spillover_effects` DataFrame as list-of-dicts.

spillover_effects: DataFrame | None = None

ring_breakpoints: List[float] | None = None

d_bar: float | None = None

n_units_ever_in_ring: Dict[str, int] | None = None

n_far_away_obs: int | None = None

is_staggered: bool | None = None

event_study: bool | None = None

stage1_n_obs: int | None = None

anticipation: int | None = None

att_dynamic: DataFrame | None = None

event_study_effects: Dict[int, Dict[str, Any]] | None = None

horizon_max: int | None = None

reference_period: int | None = None

survey_metadata: Any | None = None

n_psu: int | None = None

n_strata: int | None = None

summary(alpha=None)[source]

Extended summary with ATT row, per-event-time direct block, and per-(ring, event-time) spillover block.

Parameters:: alpha (float | None)
Return type:: str

to_dict()[source]

Override to serialize spillover_effects DataFrame as list-of-dicts.

Return type:: Dict[str, Any]

__init__(att, se, t_stat, p_value, conf_int, n_obs, n_treated, n_control, alpha=0.05, coefficients=None, vcov=None, residuals=None, fitted_values=None, r_squared=None, inference_method='analytical', n_bootstrap=None, n_clusters=None, bootstrap_distribution=None, survey_metadata=None, vcov_type=None, cluster_name=None, conley_lag_cutoff=None, spillover_effects=None, ring_breakpoints=None, d_bar=None, n_units_ever_in_ring=None, n_far_away_obs=None, is_staggered=None, event_study=None, stage1_n_obs=None, anticipation=None, att_dynamic=None, event_study_effects=None, horizon_max=None, reference_period=None, n_psu=None, n_strata=None)

Parameters:

att (float)
se (float)
t_stat (float)
p_value (float)
conf_int (Tuple[float, float])
n_obs (int)
n_treated (int)
n_control (int)
alpha (float)
coefficients (Dict[str, float] | None)
vcov (ndarray | None)
residuals (ndarray | None)
fitted_values (ndarray | None)
r_squared (float | None)
inference_method (str)
n_bootstrap (int | None)
n_clusters (int | None)
bootstrap_distribution (ndarray | None)
survey_metadata (Any | None)
vcov_type (str | None)
cluster_name (str | None)
conley_lag_cutoff (int | None)
spillover_effects (DataFrame | None)
ring_breakpoints (List[float] | None)
d_bar (float | None)
n_units_ever_in_ring (Dict[str, int] | None)
n_far_away_obs (int | None)
is_staggered (bool | None)
event_study (bool | None)
stage1_n_obs (int | None)
anticipation (int | None)
att_dynamic (DataFrame | None)
event_study_effects (Dict[int, Dict[str, Any]] | None)
horizon_max (int | None)
reference_period (int | None)
n_psu (int | None)
n_strata (int | None)

Return type:

None

Example Usage#

Non-staggered 2-period panel with synthetic spillovers:

from diff_diff import SpilloverDiD

est = SpilloverDiD(
    rings=[0, 50, 100, 200],          # 3 rings: [0,50), [50,100), [100,200]
    conley_coords=("lat", "lon"),
    vcov_type="conley",
    conley_cutoff_km=200.0,
    conley_lag_cutoff=0,
)
result = est.fit(
    data=df,
    outcome="y",
    unit="unit",
    time="time",
    treatment="D",                    # binary; auto-converts to first_treat
)
print(result.summary())
# Total effect on treated (Butts tau_total):
print(f"tau_total = {result.att:.4f}")
# Per-ring spillover-on-control:
print(result.spillover_effects)

Staggered timing (Gardner-convention first_treat column):

result = est.fit(
    data=df,
    outcome="y",
    unit="unit",
    time="time",
    first_treat="first_treat",        # 0 / inf = never-treated
)

Identification spec#

The stage-2 regressor for ring j is the time-varying form

\[(1 - D_{it}) \cdot \mathrm{Ring}_{it,j}\]

where Ring_{it,j} is the indicator that unit i’s nearest currently- treated unit is in distance bin j at time t, and D_it is the treatment indicator (1 if unit i is treated by time t). For non-staggered timing, all treated units share one onset and Ring is unit-static post-treatment / zero pre-treatment. For staggered timing, Ring_{it,j} is unit-time-varying (a unit’s nearest treated neighbor changes as more cohorts enter treatment).

Stage 1 fits unit + time fixed effects on Butts’ subsample Omega_0 = {D_it = 0 AND S_it = 0} (untreated AND unexposed) — the clean far-away control group. Reading the literal unit-static (1 - D_it) * S_i from paper Equation 5 yields a rank-deficient design under TWFE; the diff-diff implementation uses the time-varying form (paper page 12’s S_it notation, made explicit in Section 5 Table 2).

Estimator Comparison#

SpilloverDiD vs. TwoStageDiD vs. TwoWayFixedEffects#
Feature	SpilloverDiD	TwoStageDiD	TwoWayFixedEffects
Spillover handling	Identifies per-ring delta_j	None (assumes SUTVA)	None (assumes SUTVA)
Methodology	Two-stage Gardner + ring covariates	Two-stage Gardner	Single-stage TWFE
Staggered timing	Yes	Yes	Biased under heterogeneity
Stage-1 subsample	Butts: `D=0 AND S=0` (untreated AND unexposed)	`D=0` (untreated)	N/A (single stage)
Conley spatial-HAC SE	Yes (Wave D GMM-corrected sandwich)	Not yet supported	Yes
Cluster-robust SE	Yes (HC1 + CR1, Wave D GMM-corrected sandwich)	Yes (GMM sandwich + clusters)	Yes

Restrictions and follow-ups#

The current implementation has the following documented restrictions and planned follow-up enhancements:

Gardner GMM first-stage correction at stage 2 — SHIPPED in Wave D. Stage-2 variance now applies the influence-function-based correction for stage-1 FE estimation uncertainty across all three vcov_type paths (HC1, Conley, cluster) on both event_study=False AND event_study=True. The IF formula is psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i} with gamma_hat = (X_10' X_10)^{-1} (X_1' X_2); the meat is Psi' K Psi where K is the path-dependent kernel matrix (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) Section 3.1 + Gardner (2022) Section 4 + Conley (1999); no reference software combines all three ingredients. Point estimates unchanged; SE values shift upward by 1-few percent depending on first-stage residual variance.
Event-study mode — event_study=True is SHIPPED in Wave C. The per-event-time × ring decomposition (Butts Section 5 / Table 2) emits per-event-time direct effects tau_k and per-(ring, event-time) spillover effects delta_jk as a att_dynamic DataFrame plus MultiIndex spillover_effects. The event_study_effects: Dict[int, Dict] alias mirrors TwoStageDiD’s schema for plot_event_study consumption (the plotter prefers the new reference_period attribute over the legacy n_obs==0 heuristic). DiagnosticReport routing for SpilloverDiDResults is queued as a follow-up. Reference period -1 - anticipation (TwoStageDiD parity). horizon_max bins event-times into endpoint pools (no row drop — divergence from TwoStageDiD’s filtering semantic, intentional per feedback_no_silent_failures). horizon_max must be >=1 or None under event_study=True; horizon_max=0 is rejected (the single bin k=0 leaves no event-time pair to anchor the reference period — for a single aggregate effect, use event_study=False instead). Scalar att becomes a share-weighted average of post-treatment tau_k with SE from linear-combination inference on the post-treatment vcov block. Per-event-time SEs apply the Wave D Gardner GMM first-stage uncertainty correction (see the “Gardner GMM first-stage correction” entry above). Wave E.1 amendment: when survey_design= is supplied, the per-horizon shares are survey-weight totals rather than raw observation counts (share_k = sum_i w_i * 1{K_direct_i = k AND treated_i = 1}); the same vector enters both the att point estimate and Var(att) = w' V_subset w. Without this, the lincom would mix unweighted aggregation shares with weighted WLS horizon coefficients and target the wrong estimand. See the registry “Variance (Wave E.1)” subsection for the full survey-aggregation contract.
Survey-design integration (Wave E.1 — HC1 / CR1 via Binder TSL). SHIPPED in Wave E.1. survey_design= is now supported on vcov_type ∈ {"hc1"} and CR1 (cluster=<col>) paths.

Note

Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper’s Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD’s ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD’s per-obs IF psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i} is aggregated to PSU level, then passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. Degrees of freedom for the t-distribution lookup use ResolvedSurveyDesign.df_survey (4-way branch: PSU+strata → n_PSU - n_strata; PSU only → n_PSU - 1; strata only → n_obs - n_strata; neither → n_obs - 1). No reference software combines all ingredients.

Restrictions:
- Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises NotImplementedError; per Gerber (2026) Appendix A, the IF-reweighting shortcut does not apply because gamma_hat is weight-sensitive — follow-up requires per-replicate full re-fit.
- Singleton-stratum lonely_psu="remove" with all strata singletons saturates df_survey = 0 and the meat helper NaN-fails; SE / t-stat / p-value / CI all NaN-propagate (no silent fallback to HC1).
- cluster=<col> + survey_design.psu with different groupings emits a UserWarning and uses PSU as the cluster (mirrors TwoStageDiD._resolve_effective_cluster). When the two groupings are identical, no warning fires; PSU still takes precedence (inference is unchanged either way).
- SurveyDesign.subpopulation() with FE-undefined zero-weight padding rows: Wave E.1 physically removes such rows from the survey design via finite_mask rather than retaining them as zero-score domain padding. n_psu / df_survey / Binder centering reflect the reduced fit sample rather than the full domain design. See the REGISTRY note for details and the Wave E.3 follow-up tracked in TODO.md.
Survey-design integration (Wave E.2 — Conley × survey via stratified-Conley sandwich on PSU totals). SHIPPED in Wave E.2. vcov_type="conley" + survey_design= is now supported via a per-stratum Conley sandwich applied to PSU-aggregated Wave D Gardner GMM influence functions.

Note

Wave E.2 composes Conley (1999) spatial-HAC with Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD’s ring-indicator stage-2 design. The composition is panel-aware — it preserves the library’s existing conley_lag_cutoff = 0 semantic (“within-period spatial only, exclude cross-period pairs”) by looping over periods and aggregating Psi to PSU totals WITHIN each period (not over the whole panel). For each period t, S_psu_t[g] = sum_{i in PSU g, time t} psi_i; per-PSU centroids are panel-constant (mean of per-observation conley_coords); for each stratum the within-stratum sandwich is M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / cutoff) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)', where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it at the fit-call site; the survey helper’s kernel parameter can also take "uniform", but exposing that on the SpilloverDiD constructor is a separate follow-up). Cross-stratum kernel weights are exactly zero by sampling design (strata are exact independence partitions). Total meat is sum_t sum_h M_h_t. Cross-period spatial pairs are excluded by construction. No reference software combines all three ingredients on a two-stage influence function.

Reduction semantics:
- Per-period sum invariant: sum_t of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals. Pinned at tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant (pure unit test on the orchestrator + helper composition).
- Single stratum (H = 1, FPC = inf): reduces to sum_t plain Conley sandwich on per-period PSU totals (NOT on time-collapsed PSU totals — the per-period loop preserves lag_cutoff = 0 semantics).
- All PSUs singleton + lonely_psu="remove": df_survey = 0 and the stratified-Conley meat NaN-fails (matches Wave E.1 saturation behaviour, with UserWarning template “Wave E.2 stratified-Conley sandwich: df_survey = 0…”).
Restrictions:
- Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises NotImplementedError (inherits Wave E.1 gate; per-replicate refit is separate follow-up scope).
- cluster=<col> + survey_design.psu + vcov_type="conley": cluster=<col> is coerced to PSU per Wave E.1’s warn-and-use-PSU pattern; the Conley cluster product kernel becomes a no-op after PSU aggregation.
- The LinearRegression-side vcov_type="conley" + survey_design= gate at diff_diff/linalg.py is a separate Bertanha-Imbens 2014 weighted-Conley roadmap (not Wave E).
- DiagnosticReport routing for SpilloverDiDResults(vcov_type="conley", survey_design=) is queued for a follow-up (the _APPLICABILITY / _PT_METHOD wiring must register the new combination first).
Survey-design integration (Wave E.2 follow-up — ``conley_lag_cutoff > 0`` panel-block composition). SHIPPED in Wave E.2 follow-up. vcov_type="conley" + conley_lag_cutoff > 0 + survey_design= is now supported by adding a within-PSU serial Bartlett HAC term to the Wave E.2 spatial sandwich, provided the survey design has an effective PSU (either explicit survey_design.psu or a cluster=<col> argument that gets injected as the effective PSU per Wave E.1’s _inject_cluster_as_psu routing). No-effective-PSU survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise NotImplementedError at SpilloverDiD.fit post-resolution — see the Restrictions list below.

Note

Wave E.2 follow-up composes Wave E.2’s panel-aware stratified-Conley spatial sandwich with within-PSU serial Bartlett HAC over time (Newey-West 1987 form, kernel weights 1 - |t-s|/(L+1) for |t-s| <= L, t != s). The composition is meat = meat_spatial + meat_serial with disjoint index sets, exactly matching the no-survey panel-block decomposition at diff_diff.conley._compute_conley_meat (Conley 1999 + Newey-West 1987 separable form, NOT Driscoll-Kraay 2D-HAC). For each stratum h: meat_serial_h = FPC_h_panel * sum_{g in stratum h} sum_{|t-s| <= L, t != s, both periods present for PSU g} K_serial(|t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]' where S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t is per-period within-stratum centered (Binder TSL form — matches the spatial helper’s centering exactly), and |t-s| uses panel-wide dense time codes (matches conley.py:940 documented R deviation that mirrors R conleyreg). Serial Bartlett kernel is hardcoded regardless of conley_kernel (mirrors conley.py:951-965; conley_kernel governs spatial kernel only). FPC for serial uses panel-wide n_h_panel = |unique PSUs in stratum h across active sample|, NOT per-period n_h_t (the serial sum is a PANEL-level construct; standalone Newey-West composition on stratified clusters, deliberately NOT by analogy to the cross-sectional Wave E.2 spatial FPC convention). No reference software combines panel-block Conley + Binder TSL + Gardner GMM correction on a two-stage influence function.

Reduction semantics:
- conley_lag_cutoff = 0 or None: bit-identical to shipped Wave E.2 ATT and scalar SE (orchestrator skips the serial helper invocation when L = 0 so meat_serial does not contribute; the test_a2 mock-spy independently asserts the helper isn’t invoked).
- conley_time is None or T = 1: serial helper short-circuits to zero meat (no cross-period pairs possible).
- Single stratum (H = 1, FPC = inf) with L > 0: serial reduces to Newey-West Bartlett HAC on per-period within-stratum-CENTERED per-PSU score sequences (NOT raw scores — Binder TSL centering is retained at H=1 because the survey path always subtracts the per-period stratum mean before the kernel application; the panel-wide G/(G-1) survey factor replaces FPC).
- Bandwidth → 0 with L > 0: spatial reduces to per-period within-stratum HC sandwich; serial term unchanged (separable form).
- All PSUs singleton + lonely_psu="remove" with L > 0: meat NaN-fails on the saturation diagnostic (template “Wave E.2 stratified-Conley sandwich” covers both spatial-only and panel-block cases).
Centering asymmetry vs no-survey reference:
- The no-survey panel-block path at conley.py:949-965 uses RAW scores for the serial term (no centering) because it assumes E[scores] = 0 under correct specification. The survey-weighted Binder TSL form estimates the within-stratum mean and centers explicitly (textbook stratified-cluster sandwich); raw scores in the survey case would inflate variance by twice the squared per-period stratum mean and would NOT reduce to the cross-sectional Wave E.2 form at lag = 0.
Restrictions:
- Requires an effective PSU — either explicit survey_design.psu OR cluster=<col> injected as the effective PSU per Wave E.1’s _inject_cluster_as_psu. No-effective-PSU survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise NotImplementedError post-resolution at SpilloverDiD.fit because the pseudo-PSU = obs-index fallback would silently zero the serial sum (each pseudo-PSU appears in exactly one period). Tracked as a follow-up in TODO.md.
- Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises NotImplementedError (inherits Wave E.1 gate).
- DiagnosticReport routing for the panel-block case is queued for the same Wave F follow-up as the Wave E.2 cross-sectional case.
Count-of-treated-in-ring — only the “nearest-treated ring” specification is implemented. The “count” form re-introduces functional-form dependence (paper Section 3.2 end) and is queued.
Data-driven d_bar selection — Butts (2021b) / Butts (2023) JUE Insight propose cross-validation under stronger parallel-trends assumptions. Not in this PR.
HC2 / HC2_BM (Bell-McCaffrey / CR2) variance — current stage-2 inference uses a generic residual-df (n - effective rank) for t-distribution lookups. vcov_type="hc2" / "hc2_bm" require per-coefficient BM / CR2 DOF and raise NotImplementedError. Routing stage 2 through LinearRegression (which supplies the per-coefficient DOF metadata) is queued.
`vcov_type=”classical”` (Wave D restriction) — raises NotImplementedError. The Wave D Gardner GMM first-stage uncertainty correction has not been derived for the classical homoskedastic variance (different meat structure sigma_hat^2 * (X_10' X_10) vs the Wave D IF outer product Psi' Psi). Use vcov_type="hc1" for heteroskedasticity-robust SE with the GMM correction, or combine with cluster=<col> for CR1 with the GMM correction; both apply the Wave D synthesis (Butts §3.1 + Gardner §4 + Conley 1999) unconditionally.
Balanced panel required — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. Validator rejects unbalanced inputs.
Omega_0 row-level identification (period strict, unit warn-drop, plus connectivity) — Every period must have at least one Omega_0 row (else time FE for that period is structurally unidentified; hard ValueError). Units lacking Omega_0 rows (e.g. baseline-treated units with D_it = 1 at every observed t) are warned-and-dropped: their unit FE is NaN and the downstream finite-mask path excludes them from stage 2. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; K > 1 components raise ValueError because the FE solver would return only component-specific constants and residualization would silently mix them across components. Mirrors TwoStageDiD’s always-treated unit handling on the warn-drop axis.
One row per (unit, time) cell required — duplicate cells silently re-weight both stage-1 FE estimation and stage-2 OLS. Validator rejects duplicate cells; aggregate to unique cells before fitting.
rings[0] must equal 0 — the partition must cover treated locations (d_it = 0 belongs to Ring 1). Validator rejects rings that start at a nonzero inner edge to prevent a silent exposed-but-unmodeled population in 0 <= d_it < rings[0].
conley_coords must be constant within unit — ring construction uses a single (lat, lon) per unit (units are stationary point locations). Validator rejects coordinates that vary across rows of the same unit; aggregate to one location per unit (e.g. df.groupby(unit)[["lat","lon"]].first()) before fitting if your raw data has moving coordinates. This requirement applies on every fit path, not just vcov_type="conley", because the rings themselves are always coordinate-derived.