Source code for diff_diff.chaisemartin_dhaultfoeuille

"""
de Chaisemartin-D'Haultfoeuille (dCDH) estimator for reversible-treatment DiD.

The dCDH estimator is the only modern DiD estimator in the diff-diff library
that handles **non-absorbing (reversible) treatments** — treatment can switch
on AND off over time. All other staggered estimators in the library
(``CallawaySantAnna``, ``SunAbraham``, ``ImputationDiD``, ``TwoStageDiD``,
``EfficientDiD``, ``WooldridgeDiD``) assume treatment is absorbing.

Phase 1 shipped the contemporaneous-switch case ``DID_M`` (= ``DID_1`` at
horizon ``l = 1`` of the dynamic companion paper). Later phases added
dynamic horizons and covariates on the *same* class; see ``ROADMAP.md``
for the full progression. Forward-compatibility parameters in
:meth:`ChaisemartinDHaultfoeuille.fit` that are not yet implemented raise
``NotImplementedError`` with phase pointers.

References
----------
- de Chaisemartin, C. & D'Haultfoeuille, X. (2020). Two-Way Fixed Effects
  Estimators with Heterogeneous Treatment Effects. *American Economic
  Review*, 110(9), 2964-2996.
- de Chaisemartin, C. & D'Haultfoeuille, X. (2022, revised 2023).
  Difference-in-Differences Estimators of Intertemporal Treatment Effects.
  NBER Working Paper 29873. Web Appendix Section 3.7.3 contains the
  cohort-recentered plug-in variance formula implemented here.
"""

import warnings
from typing import Any, Dict, List, Optional, Sequence, Set, Tuple

import numpy as np
import pandas as pd

from diff_diff.chaisemartin_dhaultfoeuille_bootstrap import (
    ChaisemartinDHaultfoeuilleBootstrapMixin,
)
from diff_diff.chaisemartin_dhaultfoeuille_results import (
    ChaisemartinDHaultfoeuilleResults,
    DCDHBootstrapResults,
)
from diff_diff.linalg import solve_ols
from diff_diff.utils import safe_inference

__all__ = [
    "ChaisemartinDHaultfoeuille",
    "chaisemartin_dhaultfoeuille",
    "twowayfeweights",
    "TWFEWeightsResult",
]


# =============================================================================
# Public result container for the standalone TWFE diagnostic helper
# =============================================================================


class TWFEWeightsResult:
    """
    Lightweight container for the standalone ``twowayfeweights`` helper.

    Returned by :func:`twowayfeweights`. Mirrors the per-cell decomposition
    information that the dCDH estimator stores on its results object when
    ``twfe_diagnostic=True``, but available as a standalone function for
    users who only want the diagnostic without fitting the full estimator.
    """

    __slots__ = ("weights", "fraction_negative", "sigma_fe", "beta_fe")

    def __init__(
        self,
        weights: pd.DataFrame,
        fraction_negative: float,
        sigma_fe: float,
        beta_fe: float,
    ) -> None:
        self.weights = weights
        self.fraction_negative = fraction_negative
        self.sigma_fe = sigma_fe
        self.beta_fe = beta_fe

    def __repr__(self) -> str:
        return (
            f"TWFEWeightsResult(beta_fe={self.beta_fe:.4f}, "
            f"fraction_negative={self.fraction_negative:.4f}, "
            f"sigma_fe={self.sigma_fe:.4f}, n_cells={len(self.weights)})"
        )


# =============================================================================
# Shared validation + cell aggregation helper
# =============================================================================


def _validate_and_aggregate_to_cells(
    data: pd.DataFrame,
    outcome: str,
    group: str,
    time: str,
    treatment: str,
    weights: Optional[np.ndarray] = None,
) -> pd.DataFrame:
    """
    Validate input data and aggregate to ``(g, t)`` cells per the dCDH contract.

    Used by both :meth:`ChaisemartinDHaultfoeuille.fit` and
    :func:`twowayfeweights` so the validation rules and aggregation behavior
    are identical across the two public entry points. The contract (matching
    ``REGISTRY.md`` ``## ChaisemartinDHaultfoeuille``):

    1. **Required columns** ``outcome``, ``group``, ``time``, ``treatment``
       must all be present in ``data`` (raises ``ValueError`` listing any
       missing).
    2. **Treatment** must coerce to numeric and contain no ``NaN`` (raises
       ``ValueError``; silent dropping would change cell counts without
       informing the user).
    3. **Outcome** must coerce to numeric and contain no ``NaN`` (same
       reasoning).
    4. **Treatment** must be numeric. Both binary ``{0, 1}`` and non-binary
       (ordinal or continuous) treatment are supported. Non-binary treatment
       requires ``L_max >= 1`` in ``fit()`` because the per-period DID path
       uses binary joiner/leaver categorization.
    5. **Cell aggregation** via ``groupby([group, time]).agg(...)`` producing
       ``y_gt`` (cell mean of ``outcome``), ``d_gt`` (cell mean of
       ``treatment``), and ``n_gt`` (count of original observations in the
       cell).
    6. **Within-cell-varying treatment** (any cell where ``d_min != d_max``)
       raises ``ValueError``. Treatment must be constant within each
       ``(group, time)`` cell; fuzzy DiD is deferred to a separate dCdH 2018
       paper. Pre-aggregate your data to constant cell-level treatment before
       calling ``fit()`` or ``twowayfeweights()``.

    When ``weights`` is provided (survey pweights), cell means use weighted
    averages: ``y_gt = sum(w_i * y_i) / sum(w_i)``. An additional column
    ``w_gt`` (total weight per cell) is included in the output for downstream
    IF expansion.

    Returns the aggregated cell DataFrame with columns
    ``[group, time, y_gt, d_gt, n_gt]`` (plus ``w_gt`` when weighted), sorted
    by ``[group, time]`` with a fresh index.

    Raises
    ------
    ValueError
        On missing columns; NaN values in any of the ``group``, ``time``,
        ``treatment``, or ``outcome`` columns (``group`` and ``time`` are
        rejected pre-``groupby`` because ``groupby`` silently drops NaN keys,
        which would change the estimation sample without warning);
        non-numeric treatment / outcome that cannot be coerced via
        ``pd.to_numeric``; or within-cell-varying treatment (any
        ``(group, time)`` cell where ``d_min != d_max``, since fuzzy DiD is
        out of scope and deferred to a separate dCdH 2018 paper).

        Integer-coded non-binary treatment (the ``by_path`` /
        ``paths_of_interest`` requirement) is enforced separately at
        ``fit()`` time, not here at aggregation time; this helper accepts
        continuous ``d_gt`` cell means and lets ``fit()`` decide whether the
        integer-only contract applies.

        Under the survey-weighted path (``weights`` is not ``None``),
        zero-weight rows are pre-filtered before any NaN / coercion /
        within-cell validation per the ``SurveyDesign.subpopulation()``
        out-of-sample contract; invalid values in zero-weight rows therefore
        do NOT raise. NaN / coercion / within-cell checks still apply to all
        positive-weight rows.
    """
    # 1. Required columns
    missing = [c for c in (outcome, group, time, treatment) if c not in data.columns]
    if missing:
        raise ValueError(
            f"ChaisemartinDHaultfoeuille / twowayfeweights: column(s) {missing!r} "
            f"not found in data. Required columns: outcome, group, time, treatment."
        )

    df = data.copy()

    # 1a. SurveyDesign.subpopulation() contract: zero-weight rows are
    # out-of-sample. Pre-filter them *before* any NaN/coercion validation
    # so that invalid values in excluded rows do not abort the fit.
    if weights is not None:
        weights_arr = np.asarray(weights, dtype=np.float64)
        pos_mask = weights_arr > 0
        if not pos_mask.all():
            df = df.loc[pos_mask].reset_index(drop=True)
            weights = weights_arr[pos_mask]

    # 1b. Group and time NaN checks (before groupby, which silently drops NaN keys)
    n_nan_group = int(df[group].isna().sum())
    if n_nan_group > 0:
        raise ValueError(
            f"Group column {group!r} contains {n_nan_group} NaN value(s). "
            "groupby silently drops NaN keys, which would change the "
            "estimation sample without warning. Drop or impute NaN group "
            "values before calling fit() or twowayfeweights()."
        )
    n_nan_time = int(df[time].isna().sum())
    if n_nan_time > 0:
        raise ValueError(
            f"Time column {time!r} contains {n_nan_time} NaN value(s). "
            "groupby silently drops NaN keys, which would change the "
            "estimation sample without warning. Drop or impute NaN time "
            "values before calling fit() or twowayfeweights()."
        )

    # 2. Treatment numeric coercion + NaN check
    try:
        df[treatment] = pd.to_numeric(df[treatment])
    except (ValueError, TypeError) as exc:
        raise ValueError(
            f"Could not coerce treatment column {treatment!r} to numeric: {exc}"
        ) from exc
    n_nan_treat = int(df[treatment].isna().sum())
    if n_nan_treat > 0:
        raise ValueError(
            f"Treatment column {treatment!r} contains {n_nan_treat} NaN value(s). "
            "ChaisemartinDHaultfoeuille requires non-missing treatment indicators "
            "on every observation; impute or drop NaN treatment rows before fitting "
            "so the dropped count is explicit."
        )

    # 3. Outcome numeric coercion + NaN check
    try:
        df[outcome] = pd.to_numeric(df[outcome])
    except (ValueError, TypeError) as exc:
        raise ValueError(f"Could not coerce outcome column {outcome!r} to numeric: {exc}") from exc
    n_nan_outcome = int(df[outcome].isna().sum())
    if n_nan_outcome > 0:
        raise ValueError(
            f"Outcome column {outcome!r} contains {n_nan_outcome} NaN value(s). "
            "Drop or impute missing outcomes before calling fit() so the "
            "exclusion is explicit (silently averaging over present values "
            "would distort per-cell means)."
        )

    # 4. Treatment must be numeric (binary or non-binary both accepted)
    # No longer enforces {0, 1} - non-binary and continuous treatment supported.

    # 5. Cell aggregation (compute min/max for within-cell check)
    if weights is not None:
        # Survey-weighted cell aggregation (zero-weight rows already
        # filtered upstream at step 1a).
        # y_gt = sum(w_i * y_i) / sum(w_i) within each (g, t) cell.
        # Treatment is constant within cells (checked below), so weighted
        # and unweighted means are identical for d_gt.
        df["_w_"] = weights
        df["_wy_"] = weights * df[outcome].values
        g_obj = df.groupby([group, time], as_index=False)
        cell = g_obj.agg(
            _wy_sum=("_wy_", "sum"),
            w_gt=("_w_", "sum"),
            d_gt=(treatment, "mean"),
            d_min=(treatment, "min"),
            d_max=(treatment, "max"),
            n_gt=(treatment, "count"),
        )
        cell["y_gt"] = cell["_wy_sum"] / cell["w_gt"]
        cell = cell.drop(columns=["_wy_sum"])
        # Zero-weight cells: drop entirely so downstream validators
        # (ragged-panel, baseline requirement) don't see them.
        zero_w_mask = cell["w_gt"] <= 0
        if zero_w_mask.any():
            cell = cell[~zero_w_mask].reset_index(drop=True)
        df.drop(columns=["_w_", "_wy_"], inplace=True)
    else:
        cell = df.groupby([group, time], as_index=False).agg(
            y_gt=(outcome, "mean"),
            d_gt=(treatment, "mean"),
            d_min=(treatment, "min"),
            d_max=(treatment, "max"),
            n_gt=(treatment, "count"),
        )

    # 6. Within-cell-varying treatment rejection.
    # All observations in a cell must have the same treatment value
    # (for both binary and non-binary treatment). Detect by checking
    # that cell min equals cell max.
    non_constant_mask = cell["d_min"] != cell["d_max"]
    if non_constant_mask.any():
        n_non_constant = int(non_constant_mask.sum())
        example_cells = cell.loc[non_constant_mask, [group, time, "d_gt", "d_min", "d_max"]].head(5)
        raise ValueError(
            f"Within-cell-varying treatment detected in {n_non_constant} "
            f"(group, time) cell(s). dCDH requires treatment to be "
            f"constant within each (group, time) cell. Cells where "
            f"d_min != d_max indicate that some units have different "
            f"treatment values. Pre-aggregate your data to constant "
            f"cell-level treatment before calling fit() or "
            f"twowayfeweights(). Fuzzy DiD is deferred to a separate "
            f"dCDH paper (see ROADMAP.md out-of-scope). Affected cells "
            f"(first 5):\n{example_cells}"
        )

    # Drop the min/max columns; keep d_gt as float (no int cast - supports
    # ordinal and continuous treatment). w_gt retained when weighted.
    drop_cols = ["d_min", "d_max"]
    cell = cell.drop(columns=drop_cols)

    # Sort to ensure deterministic order in downstream operations
    cell = cell.sort_values([group, time]).reset_index(drop=True)
    return cell


def _validate_paths_of_interest(
    paths_of_interest: Any,
) -> List[Tuple[int, ...]]:
    """Validate and canonicalize ``paths_of_interest`` to ``List[Tuple[int, ...]]``.

    Rejects non-sequence inputs, empty lists, non-tuple/list path entries,
    empty path entries, non-int elements (including ``bool`` and
    ``np.bool_``), and entries with mixed lengths. Numpy integer types
    (``np.integer``) are accepted and canonicalized to Python ``int`` so the
    resulting tuples are usable as dict keys interchangeably with paths
    emitted by ``_enumerate_treatment_paths`` (which casts via
    ``int(round(float(v)))``).
    """
    if not isinstance(paths_of_interest, (list, tuple)):
        raise ValueError(
            f"paths_of_interest must be a list/tuple of int tuples, "
            f"got {type(paths_of_interest).__name__}."
        )
    if len(paths_of_interest) == 0:
        raise ValueError("paths_of_interest must be non-empty.")
    canonical: List[Tuple[int, ...]] = []
    for i, p in enumerate(paths_of_interest):
        if not isinstance(p, (list, tuple)):
            raise ValueError(
                f"paths_of_interest[{i}] must be a tuple/list of ints, "
                f"got {type(p).__name__}."
            )
        if len(p) == 0:
            raise ValueError(f"paths_of_interest[{i}] must be non-empty.")
        canonical_path: List[int] = []
        for j, v in enumerate(p):
            if isinstance(v, (bool, np.bool_)) or not isinstance(v, (int, np.integer)):
                raise ValueError(
                    f"paths_of_interest[{i}][{j}] must be an int, got "
                    f"{v!r} of type {type(v).__name__}."
                )
            canonical_path.append(int(v))
        canonical.append(tuple(canonical_path))
    lens = {len(p) for p in canonical}
    if len(lens) > 1:
        raise ValueError(
            f"paths_of_interest entries must all have the same length "
            f"(L_max+1); got mixed lengths {sorted(lens)}."
        )
    return canonical


# =============================================================================
# Main estimator class
# =============================================================================


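The `(g, t)` cell contract documented in `_validate_and_aggregate_to_cells` above can be tried on a toy frame. The block below is a minimal standalone sketch, not the library helper: the function name `aggregate_cells_sketch` and the toy column names are made up for illustration, it always emits `w_gt` (the real helper only does so when weighted), and it skips the NaN and coercion checks.

```python
import numpy as np
import pandas as pd


def aggregate_cells_sketch(df, outcome, group, time, treatment, weights=None):
    """Hypothetical re-statement of the cell contract (illustration only)."""
    work = df[[group, time, outcome, treatment]].copy()
    w = np.ones(len(work)) if weights is None else np.asarray(weights, dtype=float)

    # Zero-weight rows are treated as out-of-sample and dropped first.
    keep = w > 0
    work = work.loc[keep].reset_index(drop=True)
    work["_w"] = w[keep]
    work["_wy"] = work["_w"] * work[outcome]

    cell = work.groupby([group, time], as_index=False).agg(
        wy_sum=("_wy", "sum"),
        w_gt=("_w", "sum"),
        d_gt=(treatment, "mean"),
        d_min=(treatment, "min"),
        d_max=(treatment, "max"),
        n_gt=(treatment, "count"),
    )

    # Treatment must be constant inside every cell (min equals max).
    if (cell["d_min"] != cell["d_max"]).any():
        raise ValueError("treatment varies within a (group, time) cell")

    # Weighted cell mean: sum(w_i * y_i) / sum(w_i).
    cell["y_gt"] = cell["wy_sum"] / cell["w_gt"]
    cell = cell.drop(columns=["wy_sum", "d_min", "d_max"])
    return cell.sort_values([group, time]).reset_index(drop=True)


toy = pd.DataFrame(
    {
        "g": [1, 1, 1, 2, 2],
        "t": [0, 0, 1, 0, 1],
        "y": [1.0, 3.0, 5.0, 2.0, 4.0],
        "d": [0, 0, 1, 0, 0],
    }
)
cells = aggregate_cells_sketch(toy, "y", "g", "t", "d", weights=[1.0, 3.0, 1.0, 1.0, 1.0])
```

In the toy frame the first cell `(g=1, t=0)` holds two rows with weights 1 and 3, so its weighted mean is `(1*1 + 3*3) / 4 = 2.5` with `n_gt = 2` and `w_gt = 4`.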
class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
    """
    de Chaisemartin-D'Haultfoeuille (dCDH) estimator.

    The only modern DiD estimator in the library that handles **reversible
    (non-absorbing) treatments** - treatment may switch on AND off over time.
    Computes the contemporaneous-switch DiD ``DID_M`` from the AER 2020 paper
    (equivalently ``DID_1`` at horizon ``l = 1`` of the dynamic companion
    paper, NBER WP 29873) plus the full multi-horizon event study ``DID_l``
    for ``l = 1..L_max`` via the ``L_max`` parameter on :meth:`fit`.

    Supported:

    - Headline ``DID_M`` plus multi-horizon ``DID_l`` event study
    - Joiners-only ``DID_+`` and leavers-only ``DID_-`` decompositions
    - Single-lag placebo ``DID_M^pl`` and dynamic placebos ``DID^{pl}_l``
      (computed automatically by default; gate via ``placebo=False``)
    - Analytical SE via the cohort-recentered plug-in formula from Web
      Appendix Section 3.7.3; multiplier bootstrap clustered at the group
      level by default via ``n_bootstrap``; under ``survey_design`` with
      strictly-coarser PSUs the bootstrap automatically upgrades to
      PSU-level Hall-Mammen wild clustering (see REGISTRY.md
      ``ChaisemartinDHaultfoeuille`` Note on survey + bootstrap)
    - Normalized estimator ``DID^n_l``, cost-benefit aggregate ``delta``,
      and sup-t simultaneous confidence bands
    - Residualization-style covariate adjustment (``DID^X``) via
      ``controls=``, group-specific linear trends (``DID^{fd}``) via
      ``trends_linear=True``, state-set-specific trends via
      ``trends_nonparam=``, heterogeneity testing, non-binary treatment,
      HonestDiD sensitivity integration on placebos via ``honest_did=True``
    - Per-path event-study disaggregation via ``by_path=k`` (top-k most
      common observed treatment paths within the window
      ``[F_g-1, F_g-1+L_max]``; requires ``drop_larger_lower=False``;
      supports binary or integer-coded discrete treatment) or via
      ``paths_of_interest=[(...), ...]`` for an explicit user-specified
      path subset (Python-only API; mutex with ``by_path=k``)
    - Survey support via ``survey_design=``: pweight with strata/PSU/FPC via
      Taylor Series Linearization (analytical) or replicate-weight variance
      (BRR/Fay/JK1/JKn/SDR)
    - TWFE decomposition diagnostic from Theorem 1 of AER 2020

    Only ``aggregate`` on :meth:`fit` still raises ``NotImplementedError``.

    Parameters
    ----------
    alpha : float, default=0.05
        Significance level for confidence intervals.
    cluster : str, optional, default=None
        Must be ``None`` (the default). User-specified clustering via this
        kwarg is not supported; passing any non-``None`` value raises
        ``NotImplementedError`` at construction time (and the same gate fires
        from ``set_params``). The effective clustering depends on how you
        call ``fit()``:

        - **Default (no survey_design)**: clustered at the group level via
          the cohort-recentered influence-function plug-in (analytical SEs)
          and the multiplier bootstrap.
        - **Under ``survey_design`` with auto-inject or explicit
          ``psu=group``**: PSU coincides with the group and the group-level
          and PSU-level paths are bit-identical.
        - **Under ``survey_design`` with strictly-coarser PSUs**: the
          multiplier bootstrap automatically upgrades to PSU-level
          Hall-Mammen wild clustering.

        So dCDH does NOT always cluster at the group level; see REGISTRY.md
        ``ChaisemartinDHaultfoeuille`` Notes on cluster contract and survey +
        bootstrap for the full matrix. Custom user-specified clustering at a
        coarser or finer level than the group is a planned extension.
    n_bootstrap : int, default=0
        Number of multiplier-bootstrap iterations. ``0`` (default) uses only
        the analytical SE. Set to ``999`` or higher for stable bootstrap
        inference.
    bootstrap_weights : str, default="rademacher"
        Type of multiplier-bootstrap weights: ``"rademacher"``, ``"mammen"``,
        or ``"webb"``. Ignored unless ``n_bootstrap > 0``.
    seed : int, optional
        Random seed for the multiplier bootstrap.
    placebo : bool, default=True
        If ``True`` (default), automatically compute the single-lag placebo
        ``DID_M^pl`` (AER 2020 placebo specification) on the same data. Set
        to ``False`` to skip the placebo computation for speed; the results
        object will still expose ``placebo_*`` fields, but with NaN values
        and ``placebo_available=False``.
    twfe_diagnostic : bool, default=True
        If ``True`` (default), compute the TWFE decomposition diagnostic from
        Theorem 1 of AER 2020: per-``(g, t)`` weights, fraction of treated
        cells with negative weights, and ``sigma_fe`` (the smallest
        cell-effect standard deviation that could flip the sign of the plain
        TWFE coefficient).

        The diagnostic answers "what would the plain TWFE estimator say on
        the data you passed in?", so it runs on the **FULL pre-filter cell
        sample** (the same input as the standalone :func:`twowayfeweights`
        function), NOT on the post-filter estimation sample used by
        ``DID_M``. When the ragged-panel filter or ``drop_larger_lower``
        drops groups, the fitted ``results.twfe_*`` values describe a LARGER
        sample (pre-filter) than ``results.overall_att`` and a
        ``UserWarning`` is emitted to make the divergence explicit. See
        REGISTRY.md ``ChaisemartinDHaultfoeuille``
        ``Note (TWFE diagnostic sample contract)`` for the full rationale.
    drop_larger_lower : bool, default=True
        If ``True`` (default, matches R ``DIDmultiplegtDYN``), drops groups
        whose treatment switches more than once (multi-switch groups) before
        estimation. This is required for the analytical variance formula to
        be consistent with the AER 2020 Theorem 3 point estimate: both
        formulas operate on the same post-drop dataset. Setting to ``False``
        is supported for diagnostic comparison but produces an inconsistent
        estimator-variance pairing for multi-switch groups; a warning is
        emitted.
    by_path : int, optional, default=None
        If set to a positive integer ``k``, disaggregate the per-horizon
        event study by the observed treatment trajectory in the window
        ``[F_g - 1, F_g, ..., F_g - 1 + L_max]``, reporting ATT + SE +
        inference for the ``k`` most common observed paths (ties broken
        lexicographically on the path tuple). If ``k`` exceeds the number of
        observed paths, all paths are returned and a ``UserWarning`` is
        emitted. ``None`` (the default) disables the disaggregation.

        Requires ``drop_larger_lower=False`` (multi-switch groups are the
        object of interest) and ``L_max >= 1`` (the path window depends on
        ``L_max``).

        Compatible with non-binary integer-coded treatment (D in Z); path
        tuples become integer-state tuples like ``(0, 2, 2, 2)``. D values
        must be integer-valued (``D == round(D)``); a ``ValueError`` is
        raised at fit-time on continuous D.

        Compatible with ``survey_design`` for analytical Binder TSL SE and
        replicate-weight bootstrap; per-path SE routes through the
        cell-period allocator, with non-path switcher-side contributions
        skipped (control contributions remain unchanged, matching the
        joiners/leavers IF convention). ``n_bootstrap > 0`` (multiplier
        bootstrap) under ``survey_design`` is not yet supported and raises
        ``NotImplementedError``. Top-k path ranking under ``survey_design``
        remains group-cardinality-based (unweighted), not
        population-weight-based: survey weights do not affect which paths
        are selected as "top-k".

        Compatible with ``heterogeneity="<col>"``: the per-path heterogeneity
        coefficient is computed by re-running the Lemma 7 regression on each
        path-restricted switcher subsample. Cohort dummies absorb baseline
        (no R-divergence warning needed). Surfaces on
        ``results.path_heterogeneity_effects`` keyed
        ``{path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}}`` and on
        ``to_dataframe(level="by_path")`` via ``het_*`` columns. Mirrors R
        ``did_multiplegt_dyn(..., by_path, predict_het)`` per-by_level.
        Composes with ``survey_design`` (analytical Binder TSL +
        replicate-weight) via the existing cell-period IF allocator path.

        Incompatible with ``design2`` and ``honest_did`` (each combination
        raises ``NotImplementedError`` in the current release).

        Mutually exclusive with ``paths_of_interest``: use ``by_path=k`` for
        top-k automatic ranking by frequency, or
        ``paths_of_interest=[(...), ...]`` for an explicit user-specified
        path list. Setting both raises ``ValueError``.

        Compatible with ``controls`` (DID^X residualization) -- the
        per-baseline OLS residualization runs once on first-differenced ``Y``
        BEFORE path enumeration, so per-path point estimates, bootstrap SE,
        per-path placebos, and per-path sup-t bands all consume the
        residualized ``Y_mat`` automatically (Frisch-Waugh-Lovell).
        Per-period effects remain unadjusted, consistent with the existing
        ``controls`` + per-period DID contract.

        **Deviation from R on multi-baseline switcher panels:** R
        ``did_multiplegt_dyn(..., by_path, controls)`` re-runs the
        per-baseline residualization on each path's restricted subsample
        (path's switchers + same-baseline not-yet-treated controls), so its
        residualization coefficients vary per path when switchers have
        different baseline values. Our global-residualization architecture
        coincides with R on single-baseline panels (every switcher shares
        the same ``D_{g,1}``) and per-path point estimates match exactly on
        the one-observation-per-``(g, t)`` regime; on
        multi-observation-per-cell panels the existing DID^X cell-weighting
        deviation from R applies (see ``docs/methodology/REGISTRY.md``
        "Note (Phase 3 DID^X covariate adjustment)"; independent of the
        by_path lift). On multi-baseline switcher panels, point estimates
        can diverge; a ``UserWarning`` is emitted at fit-time when this
        configuration is detected. SE inherits the cross-path cohort-sharing
        deviation from R documented for ``path_effects``.

        Compatible with ``trends_linear`` (DID^{fd} group-specific linear
        trends) -- first-differencing replaces ``Y`` with
        ``Z = Y_t - Y_{t-1}`` once globally before path enumeration, so
        per-path raw second-differences DID^{fd}_{path, l} surface on
        ``path_effects[path]["horizons"][l]`` automatically. Per-path
        cumulated level effects
        ``delta_{path, l} = sum_{l'=1..l} DID^{fd}_{path, l'}`` are surfaced
        on the new ``results.path_cumulated_event_study[path][l]`` field
        (mirroring the global ``linear_trends_effects`` cumulation; inner
        dict keyed by horizon directly, no ``"horizons"`` wrapper). SE on the
        cumulated layer is the conservative upper bound (sum of per-horizon
        component SEs, NaN-consistent), matching the global
        ``linear_trends_effects`` SE convention.

        Path enumeration runs on the post-first-differenced ``N_mat_fd``:
        switchers with ``F_g==2`` fail the window-eligibility check and are
        dropped from path enumeration entirely, so a path whose switchers
        all have ``F_g < 3`` is silently absent from ``path_effects`` (the
        existing global ``F_g < 3`` warning still fires).

        Per-path R parity matches R
        ``did_multiplegt_dyn(..., by_path, trends_lin)`` on per-path
        cumulated point estimates under single-baseline panels with
        sufficient pre-window depth (``F_g >= 4`` for every selected-path
        switcher). R re-runs the per-path full pipeline on each path's
        restricted subsample; same multi-baseline divergence pattern as
        ``controls`` (a ``UserWarning`` fires when switcher baselines take
        multiple values).

        **F_g=3 boundary-case divergence:** ``F_g=3`` switchers have only 1
        valid pre-window Z value after first-differencing and the
        ``time==1`` filter, which causes Python's global-then-disaggregate
        architecture to diverge from R's per-path full-pipeline call (30%+
        on point estimates observed empirically). A separate ``UserWarning``
        fires at fit-time when the panel includes any ``F_g=3`` switchers
        and ``by_path + trends_linear`` is set, so practitioners hitting
        this boundary regime see the divergence flag explicitly.

        **Placebo under trends_linear returns RAW per-horizon values, not
        cumulated** -- there is no per-path placebo cumulation surface
        (verified empirically against R via the existing
        ``joiners_only_trends_lin`` parity scenario).

        Compatible with ``trends_nonparam`` (state-set trends) -- the set
        membership column is validated and stored once globally
        (time-invariance, NaN rejection, partition coarseness checks
        unchanged); per-path analytical SE, bootstrap SE, per-path placebos,
        and per-path sup-t bands all inherit the set-restricted control pool
        automatically through the ``set_ids`` parameter threaded through the
        per-path IF helpers. Per-path R parity matches R
        ``did_multiplegt_dyn(..., by_path, trends_nonparam)`` on per-path
        point estimates under single-baseline panels.

        Compatible with ``n_bootstrap > 0`` -- the top-k paths are
        enumerated once on the observed data (paths held fixed across
        bootstrap draws, matching R
        ``did_multiplegt_dyn(..., by_path, bootstrap=B)``) and bootstrap SE /
        percentile CI / percentile p-value are written to
        ``path_effects[path]["horizons"][l]`` in place of the analytical
        fields. See REGISTRY.md for the full bootstrap contract.

        Compatible with ``placebo=True`` -- when both are active, per-path
        backward-horizon placebos ``DID^{pl}_{path, l}`` for
        ``l = 1..L_max`` are surfaced on
        ``results.path_placebo_event_study[path][-l]`` (negative-int keys
        mirroring ``placebo_event_study``). The same per-path SE convention
        is applied backward (joiners/leavers IF precedent; cohort-recentered
        plug-in with path-specific divisor); the cross-path cohort-sharing
        deviation from R is inherited from the analytical event-study path.

        With ``n_bootstrap > 0``, per-path joint sup-t simultaneous
        confidence bands are also computed across horizons ``1..L_max``
        within each path. A path-specific critical value ``c_p``
        (constructed from a fresh shared-weights multiplier-bootstrap draw
        per path) is surfaced at top level as
        ``results.path_sup_t_bands[path] = {"crit_value", "alpha",
        "n_bootstrap", "method", "n_valid_horizons"}``, applied per-horizon
        as ``cband_conf_int`` on ``path_effects[path]["horizons"][l]``, and
        rendered as ``cband_lower`` / ``cband_upper`` columns on
        ``results.to_dataframe(level="by_path")`` (mirroring the OVERALL
        ``level="event_study"`` schema). Bands cover joint inference WITHIN
        a single path across horizons; they do NOT provide simultaneous
        coverage across paths. Python-only library extension; R
        ``did_multiplegt_dyn`` provides no joint bands at any surface. See
        REGISTRY.md ``Note (Phase 3 by_path per-path joint sup-t bands)``.

        SE convention: per-path IF parallels the joiners / leavers
        construction: the switcher-side contribution is zeroed for groups
        not in the selected path, and the cohort structure and control pool
        are unchanged. Plug-in SE uses the path-specific divisor
        ``N_l_path`` (count of path switchers eligible at horizon ``l``),
        matching how ``joiners_se`` / ``leavers_se`` use their respective
        counts as divisors. See REGISTRY.md ``ChaisemartinDHaultfoeuille``
        ``Note`` on ``by_path`` for the full contract.

        Results are exposed on ``results.path_effects`` as a dict keyed by
        the path tuple, with nested ``"horizons"`` dicts per horizon ``l``.
        Also available via ``results.to_dataframe(level="by_path")``.
    paths_of_interest : list of tuple of int, optional, default=None
        Explicit user-specified treatment paths to disaggregate by, as an
        alternative to ``by_path=k``'s top-k automatic ranking. Each path
        tuple must have length ``L_max + 1`` and represents the treatment
        trajectory in the window ``[F_g - 1, F_g, ..., F_g - 1 + L_max]``,
        e.g. ``[(0, 1, 1, 1), (0, 1, 0, 0)]`` for two paths under
        ``L_max=3``. Mutually exclusive with ``by_path``; setting both
        raises ``ValueError``.

        Validation:

        - Each path element must be an ``int`` (``bool`` and ``np.bool_``
          rejected; ``np.integer`` accepted and canonicalized to Python
          ``int``).
        - All paths must have the same length (uniformity validated at
          ``__init__``; length match against ``L_max + 1`` validated at
          fit-time).
        - Empty list raises ``ValueError``.
        - Duplicate paths are deduplicated with a ``UserWarning``.
        - A path with zero observed groups in the panel emits a
          ``UserWarning`` and is omitted from ``path_effects``.

        Compatible with non-binary integer treatment (paths can contain
        integer states like ``(0, 2, 2)``). Compatible with all downstream
        surfaces inherited by ``by_path``: bootstrap, per-path placebos,
        per-path joint sup-t bands, ``controls``, ``trends_linear``,
        ``trends_nonparam``, ``survey_design`` (analytical Binder TSL +
        replicate-weight; multiplier bootstrap under survey remains gated,
        same as ``by_path=k``), and ``heterogeneity`` (per-path
        heterogeneity coefficient surfaces on
        ``results.path_heterogeneity_effects``). Mechanical extension to
        path enumeration; no methodology change.

        **Order semantics**: paths appear in ``results.path_effects`` in the
        user-specified order, modulo deduplication and unobserved-path
        filtering.

        **Python-only API extension; no R equivalent.** R's
        ``did_multiplegt_dyn(..., by_path=k)`` only accepts a positive int
        (top-k) or ``-1`` (all paths); there is no list-based path selection
        in R.

        Results expose the same surfaces as ``by_path``:
        ``results.path_effects`` (dict keyed by path tuple),
        ``results.path_placebo_event_study``, ``results.path_sup_t_bands``,
        ``results.path_cumulated_event_study`` (under ``trends_linear``),
        and the ``level="by_path"`` DataFrame.
    rank_deficient_action : str, default="warn"
        Action when the TWFE decomposition diagnostic OLS encounters a
        rank-deficient design matrix: ``"warn"``, ``"error"``, or
        ``"silent"``. Only used when ``twfe_diagnostic=True``.

    Attributes
    ----------
    results_ : ChaisemartinDHaultfoeuilleResults
        Estimation results after calling :meth:`fit`.
    is_fitted_ : bool
        Whether the model has been fitted.

    Notes
    -----
    The analytical CI is **conservative** under Assumption 8 (independent
    groups) of the dynamic companion paper, and exact only under iid
    sampling. This is documented as a deliberate deviation from "default
    nominal coverage" in ``REGISTRY.md``.

    Examples
    --------
    Basic single-switch panel:

    >>> from diff_diff import ChaisemartinDHaultfoeuille
    >>> from diff_diff.prep_dgp import generate_reversible_did_data
    >>> data = generate_reversible_did_data(n_groups=80, n_periods=6, seed=42)
    >>> est = ChaisemartinDHaultfoeuille()
    >>> results = est.fit(
    ...     data, outcome="outcome", group="group",
    ...     time="period", treatment="treatment",
    ... )
    >>> abs(results.overall_att - 2.0) < 1.0  # close to the true effect
    True
    """

    def __init__(
        self,
        alpha: float = 0.05,
        cluster: Optional[str] = None,
        n_bootstrap: int = 0,
        bootstrap_weights: str = "rademacher",
        seed: Optional[int] = None,
        placebo: bool = True,
        twfe_diagnostic: bool = True,
        drop_larger_lower: bool = True,
        by_path: Optional[int] = None,
        paths_of_interest: Optional[Sequence[Sequence[int]]] = None,
        rank_deficient_action: str = "warn",
    ) -> None:
        # Parameter validation
        if rank_deficient_action not in ("warn", "error", "silent"):
            raise ValueError(
                f"rank_deficient_action must be 'warn', 'error', or 'silent', "
                f"got '{rank_deficient_action}'"
            )
        if bootstrap_weights not in ("rademacher", "mammen", "webb"):
            raise ValueError(
                f"bootstrap_weights must be 'rademacher', 'mammen', or 'webb', "
                f"got '{bootstrap_weights}'"
            )
        if not 0.0 < alpha < 1.0:
            raise ValueError(f"alpha must be in (0, 1), got {alpha}")
        if n_bootstrap < 0:
            raise ValueError(f"n_bootstrap must be non-negative, got {n_bootstrap}")
        if by_path is not None:
            if isinstance(by_path, bool) or not isinstance(by_path, int):
                raise ValueError(
                    f"by_path must be None or a positive int, got "
                    f"{by_path!r} of type {type(by_path).__name__}."
                )
            if by_path <= 0:
                raise ValueError(
                    f"by_path must be a positive int (top-k most common paths), "
                    f"got {by_path}. Use by_path=None to disable, or "
                    f"paths_of_interest for explicit path selection."
                )
        if paths_of_interest is not None:
            paths_of_interest = _validate_paths_of_interest(paths_of_interest)
        if by_path is not None and paths_of_interest is not None:
            raise ValueError(
                "by_path and paths_of_interest are mutually exclusive. "
                "Use by_path=k for top-k automatic ranking, OR "
                "paths_of_interest=[(...), ...] for explicit user-"
                "specified paths. Set one and leave the other as None."
            )
        if cluster is not None:
            raise NotImplementedError(
                f"cluster={cluster!r}: user-specified clustering is not "
                f"supported in ChaisemartinDHaultfoeuille. dCDH clusters at "
                f"the group level by default via the cohort-recentered "
                f"influence-function plug-in (analytical SEs) and the "
                f"multiplier bootstrap. Under survey_design with strictly-"
                f"coarser PSUs, bootstrap clustering automatically upgrades "
                f"to PSU-level Hall-Mammen wild. To use the default path, "
                f"pass cluster=None (the "
                f"default). Custom clustering is reserved for a future "
                f"phase. See REGISTRY.md ChaisemartinDHaultfoeuille section "
                f"for the full contract."
            )

        self.alpha = alpha
        self.cluster = cluster
        self.n_bootstrap = n_bootstrap
        self.bootstrap_weights = bootstrap_weights
        self.seed = seed
        self.placebo = placebo
        self.twfe_diagnostic = twfe_diagnostic
        self.drop_larger_lower = drop_larger_lower
        self.by_path = by_path
        self.paths_of_interest = paths_of_interest
        self.rank_deficient_action = rank_deficient_action

        self.is_fitted_ = False
        self.results_: Optional[ChaisemartinDHaultfoeuilleResults] = None
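# Illustrative sketch, not part of the library: the ``by_path`` check in
# ``__init__`` tests ``isinstance(by_path, bool)`` before
# ``isinstance(by_path, int)`` because ``bool`` is a subclass of ``int`` in
# Python, so ``True`` would otherwise slip through as ``1``. The helper names
# ``check_top_k`` and ``rejects`` are hypothetical and exist only for this
# demonstration.

```python
from typing import Any, Optional


def check_top_k(by_path: Optional[int]) -> Optional[int]:
    """Hypothetical stand-alone version of the by_path validation."""
    if by_path is None:
        return None
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(by_path, bool) or not isinstance(by_path, int):
        raise ValueError(
            f"by_path must be None or a positive int, got {by_path!r} "
            f"of type {type(by_path).__name__}."
        )
    if by_path <= 0:
        raise ValueError(f"by_path must be a positive int, got {by_path}.")
    return by_path


def rejects(value: Any) -> bool:
    """Return True when check_top_k raises ValueError for ``value``."""
    try:
        check_top_k(value)
    except ValueError:
        return True
    return False
```

# Under these assumptions ``check_top_k(3)`` passes through unchanged, while
# ``True``, ``0`` and ``2.0`` are all rejected with ``ValueError``.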
    # ------------------------------------------------------------------
    # sklearn-style parameter introspection
    # ------------------------------------------------------------------
    def get_params(self) -> Dict[str, Any]:
        """Return all ``__init__`` parameters as a dictionary."""
        return {
            "alpha": self.alpha,
            "cluster": self.cluster,
            "n_bootstrap": self.n_bootstrap,
            "bootstrap_weights": self.bootstrap_weights,
            "seed": self.seed,
            "placebo": self.placebo,
            "twfe_diagnostic": self.twfe_diagnostic,
            "drop_larger_lower": self.drop_larger_lower,
            "by_path": self.by_path,
            "paths_of_interest": self.paths_of_interest,
            "rank_deficient_action": self.rank_deficient_action,
        }
    def set_params(self, **params: Any) -> "ChaisemartinDHaultfoeuille":
        """
        Set estimator parameters (sklearn-compatible).

        **Transactional**: validation runs after the candidate mutations,
        and if any rule fails the estimator state is rolled back to its
        pre-call values before the exception is re-raised. Callers can
        therefore retry with corrected params on the same instance
        without repairing inconsistent intermediate state.
        """
        # Reject unknown keys before mutating anything.
        for key in params:
            if not hasattr(self, key):
                raise ValueError(f"Unknown parameter: {key}")
        # Snapshot current values for the keys we are about to set so
        # we can roll back on validation failure (transactional semantics).
        snapshot = {key: getattr(self, key) for key in params}
        try:
            for key, value in params.items():
                setattr(self, key, value)
            self._validate_invariants()
        except Exception:
            for key, value in snapshot.items():
                setattr(self, key, value)
            raise
        return self
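# Illustrative sketch, not part of the library: the snapshot / mutate /
# validate / roll-back pattern used by ``set_params`` can be reproduced on a
# hypothetical two-parameter class. ``ToyEstimator`` and its two validation
# rules are assumptions made for this example only; the real estimator
# validates many more invariants.

```python
from typing import Any


class ToyEstimator:
    """Hypothetical estimator used only to demonstrate the rollback."""

    def __init__(self, alpha: float = 0.05, n_bootstrap: int = 0) -> None:
        self.alpha = alpha
        self.n_bootstrap = n_bootstrap
        self._validate_invariants()

    def _validate_invariants(self) -> None:
        if not 0.0 < self.alpha < 1.0:
            raise ValueError(f"alpha must be in (0, 1), got {self.alpha}")
        if self.n_bootstrap < 0:
            raise ValueError(
                f"n_bootstrap must be non-negative, got {self.n_bootstrap}"
            )

    def set_params(self, **params: Any) -> "ToyEstimator":
        # Reject unknown keys before touching any state.
        for key in params:
            if not hasattr(self, key):
                raise ValueError(f"Unknown parameter: {key}")
        # Remember the old values of exactly the keys being changed.
        snapshot = {key: getattr(self, key) for key in params}
        try:
            for key, value in params.items():
                setattr(self, key, value)
            self._validate_invariants()
        except Exception:
            # Restore the pre-call state, then let the error propagate.
            for key, value in snapshot.items():
                setattr(self, key, value)
            raise
        return self


est = ToyEstimator()
try:
    # alpha is valid, n_bootstrap is not: the whole call must be undone.
    est.set_params(alpha=0.10, n_bootstrap=-5)
except ValueError:
    pass
state_after_failure = (est.alpha, est.n_bootstrap)

# Retrying on the same instance with corrected values succeeds.
est.set_params(alpha=0.10, n_bootstrap=200)
state_after_retry = (est.alpha, est.n_bootstrap)
```

# In this sketch the failed call leaves both attributes at their defaults,
# including ``alpha``, which was valid on its own but was set in the same call.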
    def _validate_invariants(self) -> None:
        """Run the post-mutation validation rules. Mirrors `__init__`."""
        # Re-run __init__ validation rules so the post-set state is valid.
        if self.rank_deficient_action not in ("warn", "error", "silent"):
            raise ValueError(
                f"rank_deficient_action must be 'warn', 'error', or 'silent', "
                f"got '{self.rank_deficient_action}'"
            )
        if self.bootstrap_weights not in ("rademacher", "mammen", "webb"):
            raise ValueError(
                f"bootstrap_weights must be 'rademacher', 'mammen', or 'webb', "
                f"got '{self.bootstrap_weights}'"
            )
        if not 0.0 < self.alpha < 1.0:
            raise ValueError(f"alpha must be in (0, 1), got {self.alpha}")
        if self.n_bootstrap < 0:
            raise ValueError(f"n_bootstrap must be non-negative, got {self.n_bootstrap}")
        if self.by_path is not None:
            if isinstance(self.by_path, bool) or not isinstance(self.by_path, int):
                raise ValueError(
                    f"by_path must be None or a positive int, got "
                    f"{self.by_path!r} of type {type(self.by_path).__name__}."
                )
            if self.by_path <= 0:
                raise ValueError(
                    f"by_path must be a positive int (top-k most common paths), "
                    f"got {self.by_path}. Use by_path=None to disable, or "
                    f"paths_of_interest for explicit path selection."
                )
        if self.paths_of_interest is not None:
            self.paths_of_interest = _validate_paths_of_interest(self.paths_of_interest)
        if self.by_path is not None and self.paths_of_interest is not None:
            raise ValueError(
                "by_path and paths_of_interest are mutually exclusive. "
                "Use by_path=k for top-k automatic ranking, OR "
                "paths_of_interest=[(...), ...] for explicit user-"
                "specified paths. Set one and leave the other as None."
            )
        if self.cluster is not None:
            raise NotImplementedError(
                f"cluster={self.cluster!r}: user-specified clustering is "
                f"not supported in ChaisemartinDHaultfoeuille. dCDH clusters "
                f"at the group level by default; under survey_design with "
                f"strictly-coarser PSUs the bootstrap automatically upgrades "
                f"to PSU-level Hall-Mammen wild clustering. Pass cluster=None "
                f"(the default) to use this path. User-specified custom "
                f"clustering is reserved for a future phase. See REGISTRY.md "
                f"ChaisemartinDHaultfoeuille section for the full contract."
            )

    # ------------------------------------------------------------------
    # fit
    # ------------------------------------------------------------------
[docs] def fit( self, data: pd.DataFrame, outcome: str, group: str, time: str, treatment: str, # ---------- forward-compat parameters ---------- aggregate: Optional[str] = None, L_max: Optional[int] = None, controls: Optional[List[str]] = None, trends_linear: Optional[bool] = None, trends_nonparam: Optional[Any] = None, honest_did: bool = False, # ---------- Phase 3 extensions ---------- heterogeneity: Optional[str] = None, design2: bool = False, # ---------- deferred (separate effort) ---------- survey_design: Any = None, ) -> ChaisemartinDHaultfoeuilleResults: """ Fit the dCDH estimator on individual-level panel data. Parameters ---------- data : pd.DataFrame Individual-level panel. Must contain columns for ``outcome``, ``group``, ``time``, and ``treatment``. The estimator internally aggregates to ``(group, time)`` cells. outcome : str Outcome variable column name. group : str Group identifier column name. Treatment must be constant within each ``(group, time)`` cell after aggregation; ``ValueError`` is raised if any cell has fractional treatment after grouping (within-cell-varying treatment indicates a fuzzy design not supported in Phase 1). time : str Time period column name. Must be sortable. treatment : str Per-observation treatment column. Must be numeric and constant within each ``(group, time)`` cell. Both binary ``{0, 1}`` and non-binary (ordinal or continuous) treatment are supported. Non-binary treatment requires ``L_max >= 1``. aggregate : str, optional **Reserved for Phase 3.** Must be ``None``; any other value raises ``NotImplementedError``. L_max : int, optional Maximum event-study horizon. When set, computes ``DID_l`` for ``l = 1, ..., L_max`` using the per-group building block from Equation 3 of the dynamic companion paper. When ``None`` (default), only the ``l = 1`` contemporaneous- switch estimator ``DID_M`` is computed (Phase 1 behavior). Must be a positive integer not exceeding the number of post-baseline periods in the panel. 
controls : list of str, optional Column names for covariate adjustment via residualization-style ``DID^X`` (Web Appendix Section 1.2). Requires ``L_max >= 1``. One ``theta_hat`` per baseline treatment value, estimated by OLS on not-yet-treated observations. NOT doubly-robust. trends_linear : bool, optional If ``True``, estimate group-specific linear trends via ``DID^{fd}`` (Web Appendix Section 1.3, Lemma 6). Requires ``L_max >= 1`` and at least 3 time periods. trends_nonparam : str, optional Column name for state-set membership. Restricts the control pool to groups in the same set (Web Appendix Section 1.4). Requires ``L_max >= 1`` and time-invariant values per group. honest_did : bool, default=False Run HonestDiD sensitivity analysis (Rambachan & Roth 2023) on the placebo + event study surface. Requires ``L_max >= 1``. Default: relative magnitudes (DeltaRM, Mbar=1.0), targeting the equal-weight average over all post-treatment horizons (``l_vec=None``). Results stored on ``results.honest_did_results``; ``None`` with a warning if the solver fails. For custom parameters (e.g., targeting the on-impact effect only via ``l_vec``), call ``compute_honest_did(results, ...)`` post-hoc instead. heterogeneity : str, optional Column name for a time-invariant covariate to test for heterogeneous effects (Web Appendix Section 1.5, Lemma 7). Partial implementation: post-treatment regressions only (no placebo regressions or joint null test). Cannot be combined with ``controls``, ``trends_linear``, or ``trends_nonparam``. Requires ``L_max >= 1``. Under ``by_path`` / ``paths_of_interest``, per-path heterogeneity coefficients also surface on ``results.path_heterogeneity_effects`` and on ``to_dataframe(level="by_path")`` via ``het_*`` columns. design2 : bool, default=False If ``True``, identify and report switch-in/switch-out (Design-2) groups. Convenience wrapper (descriptive summary, not full paper re-estimation). Requires ``drop_larger_lower=False`` to retain 2-switch groups. 
        survey_design : SurveyDesign, optional
            Survey design specification for design-based inference.
            Supports ``weight_type='pweight'`` with two variance paths:
            (1) Taylor Series Linearization using strata / PSU / FPC
            (analytical) via the **cell-period IF allocator** that
            attributes per-``(g, t)``-cell mass and aggregates through
            Binder (1983), and (2) replicate-weight variance using
            BRR / Fay / JK1 / JKn / SDR methods (analytical,
            closed-form). Survey weights produce weighted cell means for
            the point estimate.

            Under a survey design without an explicit ``psu``, ``fit()``
            auto-injects ``psu=<group_col>`` as a safe default (the group
            is the effective sampling unit). **Strata and PSU may vary
            across cells of a group** but must be constant within each
            ``(g, t)`` cell (trivially true in one-obs-per-cell panels;
            enforced otherwise with ``ValueError``). Three cases arise
            under the auto-injected ``psu=<group_col>``, of which the
            first two are supported: (1) strata constant within group
            (any ``nest`` flag works); (2) strata vary within group
            **and** ``nest=True``, in which case the resolver re-labels
            the synthesized ``psu`` uniquely within strata; (3) strata
            vary within group **and** ``nest=False``, which is rejected
            up front with a targeted ``ValueError``; pass
            ``SurveyDesign(..., nest=True)`` or an explicit ``psu=<col>``
            with globally-unique labels instead.

            When ``n_bootstrap > 0`` and a survey design is supplied, the
            multiplier bootstrap operates at the PSU level (Hall-Mammen
            wild PSU bootstrap); under the default auto-inject this
            collapses to a group-level clustered bootstrap. Under
            within-group-varying PSU the bootstrap uses a cell-level wild
            PSU allocator: a group contributing cells to multiple PSUs
            receives independent multiplier draws per PSU (see the
            Survey + bootstrap contract Note in REGISTRY.md).
            **Scope note (terminal missingness under any
            cell-period-allocator path):** on panels where a
            terminally-missing group is in a cohort whose other groups
            still contribute at the missing period, every survey variance
            path that uses the cell-period allocator raises a targeted
            ``ValueError``: Binder TSL with within-group-varying PSU,
            Rao-Wu replicate-weight ATT (which always uses the cell
            allocator), and the cell-level wild PSU bootstrap.
            Cohort-recentering leaks centered IF mass onto cells with no
            positive-weight obs, which the cell-period allocator cannot
            allocate to any observation or PSU. Pre-process the panel
            (drop late-exit groups or trim to a balanced sub-panel), or,
            for Binder TSL only, use an explicit ``psu=<group_col>`` so
            the analytical path routes through the legacy group-level
            allocator. Replicate ATT and within-group-varying-PSU
            bootstrap have no such allocator fallback.

            **Combining replicate weights with ``n_bootstrap > 0`` raises
            ``NotImplementedError``** (replicate variance is closed-form;
            bootstrap would double-count variance). See REGISTRY.md
            ``ChaisemartinDHaultfoeuille`` Notes for the full contract.

        Returns
        -------
        ChaisemartinDHaultfoeuilleResults

        Raises
        ------
        ValueError
            If required columns are missing, treatment is not numeric or
            not constant within a ``(group, time)`` cell (non-binary
            treatment additionally requires ``L_max >= 1``), or the panel
            has too few groups / periods.
        NotImplementedError
            If a reserved parameter (e.g. ``aggregate``) is set to a
            non-default value, or an unsupported combination is requested
            (e.g. ``by_path`` / ``paths_of_interest`` with ``design2`` or
            ``honest_did``, or replicate weights with
            ``n_bootstrap > 0``).
""" # ------------------------------------------------------------------ # Step 1: Column validation # ------------------------------------------------------------------ required_cols = [outcome, group, time, treatment] missing = [c for c in required_cols if c not in data.columns] if missing: raise ValueError(f"Missing columns: {missing}") # ------------------------------------------------------------------ # Step 2: Forward-compat gates # ------------------------------------------------------------------ _check_forward_compat_gates( aggregate=aggregate, L_max=L_max, controls=controls, trends_linear=trends_linear, trends_nonparam=trends_nonparam, honest_did=honest_did, ) # ------------------------------------------------------------------ # Step 3: Survey resolution # ------------------------------------------------------------------ from diff_diff.survey import _resolve_survey_for_fit resolved_survey, survey_weights, _, survey_metadata = _resolve_survey_for_fit( survey_design, data, "analytical" ) # dCDH contract: the group is the effective sampling unit for # the TSL IF expansion psi_i = U[g] * (w_i / W_g). When the user # passed a SurveyDesign without an explicit PSU, # compute_survey_if_variance() would fall back to per-observation # PSUs — which contradicts the per-group structure the IF # expansion assumes and inflates df_survey. Auto-inject # `psu=<group>` and re-resolve so downstream variance, df_survey, # and HonestDiD critical values match the documented contract. # Strata / FPC / weight_type / nest are preserved. # Skipped for replicate-weight designs — they're rejected below. if ( resolved_survey is not None and resolved_survey.psu is None and ( resolved_survey.replicate_weights is None or resolved_survey.replicate_weights.shape[1] == 0 ) ): # Pre-auto-inject contract check: the auto-inject path # synthesizes ``psu=<group>`` and preserves the user's # ``nest`` flag. 
Under ``nest=False`` (the default), the # survey resolver requires globally-unique PSU labels when # strata are present; if strata varies within group, the # synthesized PSU column reuses group labels across strata # and trips the cross-stratum PSU uniqueness check at # resolution time. Under ``nest=True`` the resolver # re-labels ``(stratum, psu)`` uniquely within strata # (``diff_diff/survey.py:299-302``), so varying strata is # fine — let the auto-inject proceed. Only the # ``nest=False`` + varying-strata + omitted-psu triple # warrants an up-front targeted error. if resolved_survey.strata is not None and not getattr(survey_design, "nest", False): _strata_varies_pre, _ = _strata_psu_vary_within_group( resolved_survey, data, group, survey_weights, ) if _strata_varies_pre: raise ValueError( "ChaisemartinDHaultfoeuille survey support: " "strata that vary across cells of the same " "group require either an explicit " "`psu=<col>` (any column whose labels are " "globally unique within strata) or the " "original `SurveyDesign(..., nest=True)` " "flag so the auto-injected `psu=<group>` is " "re-labeled uniquely within strata by the " "resolver. The default `nest=False` auto-" "inject path reuses group labels across " "strata and trips the cross-stratum PSU " "uniqueness check in survey resolution. " "Either (a) set strata constant within each " "group, (b) pass `SurveyDesign(..., " "nest=True)`, or (c) pass an explicit " "`psu=<col>` with globally-unique labels." ) from diff_diff.survey import SurveyDesign as _SurveyDesign # Build a synthesized PSU column on a private copy of data # so the caller's DataFrame is untouched. Valid group values # flow through as their own PSU label; NaN/invalid group # values on zero-weight rows (SurveyDesign.subpopulation() # excluded rows) are replaced with a single shared dummy # label so the PSU resolver accepts them. 
Zero-weight rows # contribute psi_i = 0 to the variance; keeping them in the # resolved design preserves the full-design df_survey # contract (n_psu / n_strata reflect the full sample, not # the positive-weight subset). psu_col_name = "__dcdh_eff_psu__" synth_data = data.copy() synth_psu = synth_data[group].copy() try: invalid_mask = synth_psu.isna().to_numpy() except (AttributeError, TypeError): invalid_mask = np.zeros(len(synth_psu), dtype=bool) if invalid_mask.any(): synth_psu = synth_psu.astype(object) synth_psu.loc[invalid_mask] = "__dcdh_excluded_null_psu__" synth_data[psu_col_name] = synth_psu eff_design = _SurveyDesign( weights=survey_design.weights, strata=survey_design.strata, psu=psu_col_name, fpc=getattr(survey_design, "fpc", None), weight_type=getattr(survey_design, "weight_type", "pweight"), nest=getattr(survey_design, "nest", False), lonely_psu=getattr(survey_design, "lonely_psu", "remove"), ) resolved_survey, survey_weights, _, survey_metadata = _resolve_survey_for_fit( eff_design, synth_data, "analytical" ) if resolved_survey is not None: if resolved_survey.weight_type != "pweight": raise ValueError( f"ChaisemartinDHaultfoeuille survey support requires " f"weight_type='pweight', got '{resolved_survey.weight_type}'. " f"The survey IF variance math assumes probability weights." ) # Replicate-weight designs (BRR/Fay/JK1/JKn/SDR) are supported # for analytical variance via compute_replicate_if_variance() # at each IF site (see _survey_se_from_group_if). The combination # of replicate weights and n_bootstrap > 0 is rejected inside # the bootstrap entry block (replicate variance is closed-form; # bootstrap would double-count variance). Matches library # precedent: efficient_did.py:989, staggered.py:1869, # two_stage.py:251-253. # Cell-period IF allocator contract: strata and PSU must be # constant within each (g, t) cell, a strict relaxation of # the previous within-group constancy rule. 
Both the # analytical TSL path and the PSU-level wild bootstrap now # honor within-group-varying PSU via the cell-period # allocator (the bootstrap dispatcher routes PSU-within- # group-constant regimes through the legacy group-level # path for bit-identity with prior releases). _validate_cell_constant_strata_psu( resolved_survey, data, group, time, survey_weights, ) # Design-2 precondition: requires drop_larger_lower=False if design2 and self.drop_larger_lower: raise ValueError( "design2=True requires drop_larger_lower=False because " "Design-2 groups have exactly 2 treatment changes (join " "then leave), which are dropped by the default " "drop_larger_lower=True filter. Construct the estimator " "with ChaisemartinDHaultfoeuille(drop_larger_lower=False)." ) # ------------------------------------------------------------------ # by_path / paths_of_interest preconditions and Phase 3 # compatibility gates # ------------------------------------------------------------------ if self.by_path is not None or self.paths_of_interest is not None: if self.drop_larger_lower: raise ValueError( "by_path / paths_of_interest requires " "drop_larger_lower=False because multi-switch groups " "are the object of interest for per-path " "disaggregation, but the default " "drop_larger_lower=True filter removes them. " "Construct the estimator with " "ChaisemartinDHaultfoeuille(drop_larger_lower=False, " "by_path=k) or " "ChaisemartinDHaultfoeuille(" "drop_larger_lower=False, " "paths_of_interest=[(...), ...])." ) if L_max is None: raise ValueError( "by_path / paths_of_interest requires L_max >= 1. " "The path window spans [F_g - 1, F_g - 1 + L_max] " "and therefore depends on the event-study horizon. " "Set L_max when calling fit()." 
) if self.paths_of_interest is not None: expected_len = L_max + 1 for p in self.paths_of_interest: if len(p) != expected_len: raise ValueError( f"paths_of_interest entries must have " f"length L_max+1={expected_len} (window " f"[F_g-1, ..., F_g-1+L_max]); got path " f"{p!r} of length {len(p)}." ) if design2: raise NotImplementedError( "by_path / paths_of_interest combined with design2 " "is deferred to a future release." ) if honest_did: raise NotImplementedError( "by_path / paths_of_interest combined with honest_did " "(HonestDiD sensitivity analysis) is deferred to a " "future release." ) if survey_design is not None and self.n_bootstrap > 0: raise NotImplementedError( "by_path / paths_of_interest combined with both " "survey_design and n_bootstrap>0 (multiplier " "bootstrap) is not yet supported (the survey-aware " "perturbation pivot for path-restricted IFs has " "not been derived). Use n_bootstrap=0 for " "analytical Binder TSL SE under survey_design, or " "use replicate weights " "(SurveyDesign(..., replicate_weights=...)) for " "design-based bootstrap variance." ) # ------------------------------------------------------------------ # Step 4-5: Validate input + aggregate to (g, t) cells via the # shared helper used by both fit() and twowayfeweights(). The # helper enforces NaN/binary/within-cell-rounding rules from # REGISTRY.md and returns a sorted cell DataFrame with columns # [group, time, y_gt, d_gt, n_gt]. # ------------------------------------------------------------------ cell = _validate_and_aggregate_to_cells( data=data, outcome=outcome, group=group, time=time, treatment=treatment, weights=survey_weights, ) # Retain observation-level survey info for IF expansion (Step 3 # of survey integration: group-level IF → observation-level psi). # `time_ids` is per-row; `periods` (the sorted column index used # by the pivoted U_per_period matrices) is populated below once # it's known. 
Together they enable cell-level IF expansion # psi_i = U[g_i, t_i] * (w_i / W_{g_i, t_i}) under the cell- # period allocator (REGISTRY.md survey IF expansion contract). _obs_survey_info = None if resolved_survey is not None: _obs_survey_info = { "group_ids": data[group].values, "time_ids": data[time].values, "weights": survey_weights, "resolved": resolved_survey, "periods": None, } # Replicate-weight variance tracker: each _compute_se call under # a replicate design returns n_valid (number of replicate columns # that produced a finite estimate). The effective df_survey is # min(n_valid) - 1 across all IF sites — matches the precedent in # `diff_diff/efficient_did.py:1133-1135` and # `diff_diff/triple_diff.py:676-686`. Under TSL (analytical), # _compute_se returns None for n_valid and df_survey falls through # to resolved_survey.df_survey (= n_psu - n_strata). _replicate_n_valid_list: List[int] = [] # ------------------------------------------------------------------ # Step 4b: Covariate aggregation (DID^X, Web Appendix Section 1.2) # ------------------------------------------------------------------ if controls is not None: if not controls: raise ValueError( "controls must be a non-empty list of column names, " "got an empty list. Pass controls=None to disable " "covariate adjustment." ) if L_max is None: raise ValueError( "Covariate adjustment (DID^X) requires L_max >= 1. The " "per-period DID path does not support covariate " "residualization. Set L_max to use the per-group " "DID_{g,l} path with covariate adjustment." ) missing_controls = [c for c in controls if c not in data.columns] if missing_controls: raise ValueError( f"Control column(s) {missing_controls!r} not found in " f"data. Available columns: {list(data.columns)}" ) # SurveyDesign.subpopulation() contract: zero-weight rows are # out-of-sample. 
Scope BOTH validation and aggregation to the # positive-weight subset so excluded rows with missing/invalid # covariates do not abort the fit and cell aggregation aligns # with the effective sample used by _validate_and_aggregate_to_cells. if survey_weights is not None: pos_mask_ctrl = np.asarray(survey_weights) > 0 data_eff = data.loc[pos_mask_ctrl] survey_weights_eff = np.asarray(survey_weights)[pos_mask_ctrl] else: data_eff = data survey_weights_eff = None data_controls = data_eff[controls].copy() for c in controls: try: data_controls[c] = pd.to_numeric(data_controls[c]) except (ValueError, TypeError) as exc: raise ValueError( f"Could not coerce control column {c!r} to numeric: {exc}" ) from exc n_nan = int(data_controls[c].isna().sum()) if n_nan > 0: raise ValueError( f"Control column {c!r} contains {n_nan} NaN value(s). " "Drop or impute missing covariates before fitting." ) n_inf = int(np.isinf(data_controls[c].to_numpy()).sum()) if n_inf > 0: raise ValueError( f"Control column {c!r} contains {n_inf} Inf value(s). " "Remove or replace non-finite covariates before fitting." ) # Aggregate covariates to cell means (same groupby as treatment/outcome). # Build x_agg_input from the same effective-sample frame so rows # align with data_controls. 
x_agg_input = data_eff[[group, time]].copy() x_agg_input[controls] = data_controls[controls].values if survey_weights_eff is not None: # Survey-weighted covariate cell means: sum(w*x)/sum(w) x_agg_input["_w_"] = survey_weights_eff for c in controls: x_agg_input[f"_wx_{c}"] = survey_weights_eff * x_agg_input[c].values wx_cols = [f"_wx_{c}" for c in controls] g_agg = x_agg_input.groupby([group, time], as_index=False).agg( {**{wc: "sum" for wc in wx_cols}, "_w_": "sum"} ) for c in controls: w_safe = g_agg["_w_"].replace(0, 1) g_agg[c] = g_agg[f"_wx_{c}"] / w_safe x_cell_agg = g_agg[[group, time] + controls] else: x_cell_agg = x_agg_input.groupby([group, time], as_index=False)[controls].mean() cell = cell.merge(x_cell_agg, on=[group, time], how="left") # ------------------------------------------------------------------ # Step 5a: Compute the TWFE diagnostic on the FULL pre-filter cell # dataset, so the diagnostic reflects the data the user # actually passed in. This MUST run BEFORE Step 5b (the # ragged-panel filter) so that the fitted diagnostic and # the standalone twowayfeweights() function produce # identical results on ragged panels — both operate on # the same _validate_and_aggregate_to_cells() output. # ------------------------------------------------------------------ twfe_diagnostic_payload = None # TWFE diagnostic assumes binary treatment (d_arr == 1 for # treated mask). Skip for non-binary data with a warning. is_binary_pre = set(cell["d_gt"].unique()).issubset({0.0, 1.0, 0, 1}) if self.twfe_diagnostic and not is_binary_pre: warnings.warn( "TWFE diagnostic (twfe_diagnostic=True) is not supported for " "non-binary treatment. The diagnostic assumes binary {0, 1} " "treatment. 
Skipping TWFE diagnostic for this fit.", UserWarning, stacklevel=2, ) elif self.twfe_diagnostic: try: twfe_diagnostic_payload = _compute_twfe_diagnostic( cell=cell, group_col=group, time_col=time, rank_deficient_action=self.rank_deficient_action, ) except Exception as exc: # noqa: BLE001 # Honor rank_deficient_action="error": if the user # explicitly requested strict failure on rank-deficient # designs, re-raise instead of downgrading to a warning. # Only genuinely non-fatal failures (e.g., numerical # issues unrelated to rank deficiency) should be # swallowed as warnings. if self.rank_deficient_action == "error" and isinstance(exc, ValueError): raise warnings.warn( f"TWFE decomposition diagnostic failed: {exc}. " "Skipping diagnostic; main estimation continues.", UserWarning, stacklevel=2, ) twfe_diagnostic_payload = None # ------------------------------------------------------------------ # Step 5b: Ragged panel validation # # The cohort/variance path treats D_{g,1} as the canonical # baseline and walks adjacent observed periods to detect first # switches. Ragged panels with missing baseline rows or interior # gaps would either crash the cohort enumeration (NaN -> int # cast) or silently misclassify cohorts. Two-tier handling: # # (a) Reject groups missing the FIRST GLOBAL period (the # baseline) with a clear ValueError listing offenders. # (b) Drop groups with INTERIOR GAPS (missing intermediate # periods between their first and last observed period) # with an explicit UserWarning. # ------------------------------------------------------------------ all_periods_pre_drop = sorted(cell[time].unique().tolist()) if len(all_periods_pre_drop) < 2: raise ValueError( f"ChaisemartinDHaultfoeuille requires at least 2 distinct time " f"periods in the panel, got {len(all_periods_pre_drop)}." 
) first_global_period = all_periods_pre_drop[0] # (a) Reject groups missing the first global period groups_with_baseline = set(cell.loc[cell[time] == first_global_period, group].tolist()) all_groups_pre_validation = set(cell[group].unique().tolist()) groups_missing_baseline = sorted(all_groups_pre_validation - groups_with_baseline) if groups_missing_baseline: raise ValueError( f"ChaisemartinDHaultfoeuille requires every group to have an " f"observation at the first global period " f"(period={first_global_period!r}). " f"{len(groups_missing_baseline)} group(s) are missing this baseline. " f"Examples: {groups_missing_baseline[:5]}" + ( f" (and {len(groups_missing_baseline) - 5} more)" if len(groups_missing_baseline) > 5 else "" ) + ". Drop these groups or back-fill the baseline before fitting " "so the exclusion is explicit." ) # (b) Drop groups with interior gaps period_index = {p: i for i, p in enumerate(all_periods_pre_drop)} groups_with_interior_gaps: List[Any] = [] for g_id, sub in cell.groupby(group): g_periods = sub[time].tolist() g_min_idx = period_index[min(g_periods)] g_max_idx = period_index[max(g_periods)] expected_count = g_max_idx - g_min_idx + 1 if len(g_periods) != expected_count: groups_with_interior_gaps.append(g_id) n_groups_dropped_interior_gap = len(groups_with_interior_gaps) if groups_with_interior_gaps: warnings.warn( f"Dropping {len(groups_with_interior_gaps)} group(s) with interior " f"period gaps (missing observations between their first and last " f"observed period). Examples: {groups_with_interior_gaps[:5]}" + ( f" (and {len(groups_with_interior_gaps) - 5} more)" if len(groups_with_interior_gaps) > 5 else "" ) + ". 
dCDH requires consecutive observed periods for the " "cohort/variance path; back-fill or interpolate the missing " "periods if you want these groups in the estimation.", UserWarning, stacklevel=2, ) cell = cell[~cell[group].isin(groups_with_interior_gaps)].reset_index(drop=True) if cell.empty: raise ValueError( "After dropping groups with interior period gaps, no groups " "remain. Provide a balanced panel or back-fill missing periods." ) all_periods_pre_drop = sorted(cell[time].unique().tolist()) if len(all_periods_pre_drop) < 2: raise ValueError( f"ChaisemartinDHaultfoeuille requires at least 2 periods, " f"got {len(all_periods_pre_drop)}" ) # ------------------------------------------------------------------ # Step 6: Drop A5-violating (multi-switch) cells per drop_larger_lower # ------------------------------------------------------------------ n_groups_dropped_crossers = 0 if self.drop_larger_lower: cell, n_groups_dropped_crossers = _drop_crossing_cells( cell=cell, group_col=group, d_col="d_gt" ) else: warnings.warn( "drop_larger_lower=False: the analytical variance formula will " "be inconsistent with the point estimate for any multi-switch " "groups present in the data, producing a biased SE. Use only " "for diagnostic comparison against R or when you are confident " "no multi-switch groups exist.", UserWarning, stacklevel=2, ) # ------------------------------------------------------------------ # Step 6b: TWFE diagnostic sample-contract notice # # The fitted twfe_* values (if the diagnostic succeeded in # Step 5a) were computed on the FULL pre-filter cell sample, # matching the standalone twowayfeweights() output. Steps 5b # and 6 may have dropped groups since then. When they did, the # fitted diagnostic and the dCDH point estimate describe # DIFFERENT samples, so we surface that divergence as a # UserWarning per the REGISTRY contract Note. 
Users see the # warning at fit time and can decide whether to pre-process # their data before re-fitting (or accept the documented # divergence). # # The warning fires whenever the user requested the diagnostic # AND filters dropped groups, even if _compute_twfe_diagnostic # itself failed (rank-deficient fallback) and # twfe_diagnostic_payload is None. The warning text uses "(if # the diagnostic succeeded)" to remain accurate in both cases. # ------------------------------------------------------------------ if self.twfe_diagnostic and (n_groups_dropped_interior_gap + n_groups_dropped_crossers) > 0: warnings.warn( f"TWFE diagnostic sample-contract notice: the dCDH point " f"estimate, results.groups, and inference fields use a " f"POST-FILTER sample after Step 5b dropped " f"{n_groups_dropped_interior_gap} interior-gap group(s) " f"and Step 6 dropped {n_groups_dropped_crossers} multi-" f"switch group(s). The fitted results.twfe_* values (if " f"the diagnostic succeeded) were computed on the FULL " f"pre-filter cell sample, so they describe a LARGER " f"sample (pre-filter) than overall_att. The standalone " f"twowayfeweights() function also uses the pre-filter " f"sample. This is the documented Phase 1 contract — see " f"REGISTRY.md ChaisemartinDHaultfoeuille `Note (TWFE " f"diagnostic sample contract)` for the rationale. To " f"reproduce the dCDH estimation sample for an external " f"TWFE comparison, pre-process your data to drop the " f"{n_groups_dropped_interior_gap + n_groups_dropped_crossers} " f"flagged groups before re-fitting.", UserWarning, stacklevel=2, ) # ------------------------------------------------------------------ # Step 7: Singleton-baseline identification (footnote 15 of dynamic paper) # ------------------------------------------------------------------ # The singleton-baseline filter identifies groups whose baseline # treatment value D_{g,1} is unique in the panel. 
Per footnote 15 # of the dynamic paper, these have no baseline-matched cohort peer # and contribute zero variance under the cohort framework. # # IMPORTANT: under Python's documented period-based stable-control # interpretation, a singleton-baseline group can STILL be a valid # stable_0 / stable_1 control for the point estimate, even though # it has no cohort peer. The filter is therefore applied at the # variance stage only — the cell DataFrame retains these groups # so they can serve as stable controls. # Use the validated first global period as the canonical baseline. # Step 5b guarantees every group has an observation at this period, # so we can read it directly without a groupby.first() that could # otherwise return a later observed period for late-entry groups. baselines_per_group = cell.loc[cell[time] == first_global_period, [group, "d_gt"]].rename( columns={"d_gt": "_baseline"} ) baseline_counts = baselines_per_group["_baseline"].value_counts() singleton_baseline_values = baseline_counts[baseline_counts < 2].index.tolist() singleton_baseline_groups: List[Any] = ( baselines_per_group.loc[ baselines_per_group["_baseline"].isin(singleton_baseline_values), group ].tolist() if singleton_baseline_values else [] ) n_groups_dropped_singleton_baseline = len(singleton_baseline_groups) if n_groups_dropped_singleton_baseline > 0: warnings.warn( f"Singleton-baseline filter (footnote 15 of dynamic paper): " f"{n_groups_dropped_singleton_baseline} group(s) excluded from " f"the cohort-recentered VARIANCE computation only — they remain " f"in the point-estimate sample as period-based stable controls. " f"Examples: {singleton_baseline_groups[:5]}" + ( f" (and {n_groups_dropped_singleton_baseline - 5} more)" if n_groups_dropped_singleton_baseline > 5 else "" ), UserWarning, stacklevel=2, ) if cell.empty or cell[group].nunique() == 0: raise ValueError( "After dropping multi-switch cells (drop_larger_lower=True), no " "groups remain. The dataset cannot support dCDH estimation. 
" "Check the input panel for diversity in treatment patterns." ) # Determine the post-filter group set, period set, and per-group state all_groups = sorted(cell[group].unique().tolist()) all_periods = sorted(cell[time].unique().tolist()) n_obs_post = int(cell["n_gt"].sum()) # ------------------------------------------------------------------ # L_max validation (Phase 2): must be a positive integer not # exceeding the number of post-baseline periods. Validated here # (after period detection) rather than in _check_forward_compat_gates # (which runs before data is processed). # ------------------------------------------------------------------ if L_max is not None: if not isinstance(L_max, int) or L_max < 1: raise ValueError(f"L_max must be a positive integer or None, got {L_max!r}.") n_post_baseline = len(all_periods) - 1 if L_max > n_post_baseline: raise ValueError( f"L_max={L_max} exceeds available post-baseline periods " f"({n_post_baseline}). Maximum L_max for this panel " f"is {n_post_baseline}." ) if honest_did and L_max is None: raise ValueError( "honest_did=True requires L_max >= 1 for multi-horizon placebos. " "Set L_max to compute DID^{pl}_l placebos that HonestDiD uses as " "pre-period coefficients." ) if honest_did and not self.placebo: raise ValueError( "honest_did=True requires placebo computation. The estimator was " "constructed with placebo=False. Use " "ChaisemartinDHaultfoeuille(placebo=True) (the default)." 
) # Pivot to (group x time) matrices for vectorized computations d_pivot = cell.pivot(index=group, columns=time, values="d_gt").reindex( index=all_groups, columns=all_periods ) y_pivot = cell.pivot(index=group, columns=time, values="y_gt").reindex( index=all_groups, columns=all_periods ) n_pivot = ( cell.pivot(index=group, columns=time, values="n_gt") .reindex(index=all_groups, columns=all_periods) .fillna(0) .astype(int) ) D_mat = d_pivot.to_numpy() Y_mat = y_pivot.to_numpy() N_mat = n_pivot.to_numpy() # Finalize survey obs-info with the pivot's column index so that # cell-level IF expansion can map per-row time values to matrix # column indices in _survey_se_from_group_if. if _obs_survey_info is not None: _obs_survey_info["periods"] = np.asarray(all_periods) # ------------------------------------------------------------------ # Step 7b: Covariate residualization (DID^X) # # When controls are specified, residualize Y_mat by partialling # out covariate effects per baseline treatment group. This # transforms Y_mat so the per-group multi-horizon DID path # (event_study_effects, overall_att, joiners/leavers, by_path # surfaces, placebos, sup-t bands) automatically produces # covariate-adjusted estimates. The per-period DID path # (per_period_effects) intentionally remains on raw outcomes — # it uses binary joiner/leaver categorization and is not part # of the DID^X contract per REGISTRY.md "Note (Phase 3 DID^X # covariate adjustment)". See Web Appendix Section 1.2. 
# ------------------------------------------------------------------ covariate_diagnostics: Optional[Dict[str, Any]] = None _switch_metadata_computed = False if controls is not None: # Pivot covariates to (n_groups, n_periods, n_covariates) X_pivots = [] for c in controls: x_piv = cell.pivot(index=group, columns=time, values=c).reindex( index=all_groups, columns=all_periods ) X_pivots.append(x_piv.to_numpy()) X_cell = np.stack(X_pivots, axis=2) # Need switch metadata for residualization (baselines, F_g) baselines, first_switch_idx_arr, switch_direction_arr, T_g_arr = ( _compute_group_switch_metadata(D_mat, N_mat) ) _switch_metadata_computed = True # by_path + controls residualization-sample deviation from R. # R's `did_multiplegt_dyn(..., by_path, controls)` calls # `did_multiplegt_main()` once per path with `df_main` filtered # to: rows of the path's switchers OR rows where # `yet_to_switch=1 AND baseline matches the path's baseline` # (R/R/did_multiplegt_dyn.R lines 401-405). Inside the per-path # `did_multiplegt_main()` call, the per-baseline first-stage # residualization regression uses `(g, t)` cells where g's # treatment hasn't changed yet at t. Critically, R's path- # restricted subset INCLUDES the pre-switch rows of OTHER-path # switchers via the `yet_to_switch=1 AND baseline matches` # clause, so the first-stage SAMPLE that R uses for path B # equals: pre-switch rows of all switchers with matching # baseline + all rows of never-switchers with matching # baseline. This is BIT-IDENTICAL to the first-stage sample # we use under our global residualization — first-stage # coefficients (and therefore residualized outcomes) coincide, # and per-path point estimates match R exactly **under single- # baseline switcher panels** (every switcher has the same # `D_{g,1}`, regardless of how `F_g` varies across paths or # within a path). 
Empirical confirmation: the # `multi_path_reversible_by_path_controls` R-parity scenario # has 4 paths with switcher `F_g` values spanning [0..6] under # `D_{g,1}=0` for every switcher, and Python matches R to # rtol ~1e-11 across all `(path, horizon)` cells. # # On MULTI-baseline switcher panels the per-baseline regression # coefficients diverge per path under R (R's per-path subset # for path B drops switchers whose baseline differs from B's # baseline), so point estimates can diverge between Python and # R — warn the user explicitly. The check filters to switcher # groups only (never-switchers do not contribute to "switcher # baseline" multiplicity even if they appear at multiple # `D_{g,1}` values across the never-treated / always-treated # control mix). SE inheritance (cross-path cohort-sharing) is # documented separately in REGISTRY.md. if self.by_path is not None or self.paths_of_interest is not None: _switcher_mask = first_switch_idx_arr >= 0 if _switcher_mask.any(): _switcher_baselines = baselines[_switcher_mask] if np.unique(_switcher_baselines).size > 1: warnings.warn( "by_path / paths_of_interest + controls: " "switcher baselines D_{g,1} take multiple values " "in this panel. Python residualizes once on the " "full panel before path enumeration; R " "`did_multiplegt_dyn(..., by_path, controls)` " "re-runs residualization per path on the " "path-restricted subsample, so per-path point " "estimates can diverge between Python and R on " "this panel. 
See `docs/methodology/REGISTRY.md` " "(`Note (Phase 3 by_path ...)` -> Per-path " "covariate residualization) for the full " "deviation contract.", UserWarning, stacklevel=2, ) Y_mat_residualized, covariate_diagnostics, _failed_baselines = ( _compute_covariate_residualization( Y_mat=Y_mat, X_cell=X_cell, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx_arr, rank_deficient_action=self.rank_deficient_action, ) ) # Zero out N_mat for failed-stratum groups so the downstream # eligibility checks (N_mat[g, idx] > 0) naturally exclude # them from all DID/IF/placebo computation. if _failed_baselines: for g_idx in range(len(baselines)): if float(baselines[g_idx]) in _failed_baselines: N_mat[g_idx, :] = 0 # Keep raw Y_mat for the per-period DID path (which does not # support covariate residualization - it uses binary joiner/leaver # categorization). The residualized matrix is used only by the # per-group multi-horizon path (L_max >= 1). Y_mat_raw = Y_mat Y_mat = Y_mat_residualized # ------------------------------------------------------------------ # Step 7c: First-differencing for linear trends (DID^{fd}) # # When trends_linear=True, replace Y_mat with Z_mat (first- # differenced outcomes) so that DID_{g,l}(Z) = DID^{fd}_{g,l}. # N_mat is also adjusted: N_mat_fd marks which Z values are valid. # IMPORTANT: _compute_group_switch_metadata uses the ORIGINAL # N_mat (treatment path metadata), not N_mat_fd. # ------------------------------------------------------------------ _is_trends_linear = trends_linear is True linear_trends_effects: Optional[Dict[int, Dict[str, Any]]] = None # N_mat_orig preserves observation counts for switch-metadata and # cohort-identification code that must NOT see the first-differenced # N_mat_fd. When trends_linear=False, N_mat_orig == N_mat. N_mat_orig = N_mat if _is_trends_linear: if L_max is None: raise ValueError( "Group-specific linear trends (DID^{fd}) requires " "L_max >= 1. 
Set L_max to use the per-group " "DID_{g,l} path with trend adjustment." ) if len(all_periods) < 3: raise ValueError( "Group-specific linear trends (DID^{fd}) requires " "at least 3 time periods (F_g >= 3 in the paper). " f"Got {len(all_periods)} period(s)." ) # Compute switch metadata on original N_mat if not done yet if not _switch_metadata_computed: baselines, first_switch_idx_arr, switch_direction_arr, T_g_arr = ( _compute_group_switch_metadata(D_mat, N_mat) ) _switch_metadata_computed = True # Count and warn about excluded groups (F_g < 3 -> f_g < 2) n_excluded_fd = int(((first_switch_idx_arr >= 0) & (first_switch_idx_arr < 2)).sum()) if n_excluded_fd > 0: warnings.warn( f"DID^{{fd}} (trends_linear=True): {n_excluded_fd} " f"switching group(s) have F_g < 3 (fewer than 2 " f"pre-switch periods) and are excluded from the " f"trend-adjusted estimation.", UserWarning, stacklevel=2, ) # Multi-baseline switcher panel detection (`by_path + trends_linear`): # mirror the analogous warning fired by `by_path + controls` at # `:1565-1584`. Python first-differences once globally before # path enumeration; R `did_multiplegt_dyn(..., by_path, trends_lin)` # re-runs the full pipeline (including first-differencing) per # path's restricted subsample. On single-baseline switcher # panels the two architectures coincide; on multi-baseline # switcher panels (some switchers have `D_{g,1}=0`, others # `D_{g,1}=1`) per-path point estimates can diverge — warn the # user explicitly so they don't silently consume estimates # that disagree with R. The check filters to switcher groups # only (never-switchers / always-treated controls don't # contribute to switcher baseline multiplicity). 
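# Illustrative sketch (not library code): the F_g < 3 exclusion counted
# above, and the F_g = 3 boundary case flagged further down, both reduce
# to masks on the first-switch index. The helper name and the convention
# that a negative index marks a never-switcher are assumptions made for
# this example; the convention is inferred from the `>= 0` switcher mask.

```python
import numpy as np


def count_fd_boundary_groups(first_switch_idx):
    """Count switchers excluded from, or at the boundary of, DID^{fd}.

    Hypothetical helper. Assumes a 0-based first-switch column index per
    group, with a negative value for groups that never switch.
    """
    idx = np.asarray(first_switch_idx)
    is_switcher = idx >= 0
    # F_g < 3 in the paper's 1-based notation is index < 2 here.
    n_excluded = int((is_switcher & (idx < 2)).sum())
    # F_g == 3 (index 2): exactly two pre-switch periods.
    n_boundary = int((idx == 2).sum())
    return n_excluded, n_boundary


# Made-up panel: one never-switcher, two early switchers,
# one boundary switcher, one late switcher.
n_excluded, n_boundary = count_fd_boundary_groups([-1, 1, 1, 2, 4])
```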
if self.by_path is not None or self.paths_of_interest is not None: _switcher_mask_tl = first_switch_idx_arr >= 0 if _switcher_mask_tl.any(): _switcher_baselines_tl = baselines[_switcher_mask_tl] if np.unique(_switcher_baselines_tl).size > 1: warnings.warn( "by_path / paths_of_interest + trends_linear: " "switcher baselines D_{g,1} take multiple values " "in this panel. Python first-differences once on " "the full panel before path enumeration; R " "`did_multiplegt_dyn(..., by_path, trends_lin)` " "re-runs the full pipeline (including " "first-differencing) on each path's restricted " "subsample, so per-path point estimates can " "diverge between Python and R on this panel. " "See `docs/methodology/REGISTRY.md` " "(`Note (Phase 3 by_path ...)` -> Per-path " "linear-trends DID^{fd}) for the full " "deviation contract.", UserWarning, stacklevel=2, ) # F_g=3 boundary-case divergence (`by_path + trends_linear`). # `F_g=3` switchers have exactly 2 pre-switch periods, # which after trends_linear's first-difference and # `time != 1` filter leaves only 1 valid pre-window Z # value. R re-runs the full pipeline on each path's # restricted subsample (path's switchers + same-baseline # yet-to-treat controls), and this single-pre-period # regime triggers different control-eligibility # treatment in R's per-path call than Python's global- # then-disaggregate architecture. Empirically observed # 30-165% rel diff on path 1 of the parity fixture's # earlier `F_g=3` variant; the shipped parity scenario # uses `F_g >= 4` exclusively. Fire a targeted warning # whenever the panel contains any `F_g=3` switchers AND # `by_path` is requested, so practitioners hitting this # boundary regime see the divergence flag explicitly. _f_g_three_count = int((first_switch_idx_arr == 2).sum()) if _f_g_three_count > 0: warnings.warn( f"by_path / paths_of_interest + trends_linear: " f"{_f_g_three_count} switching group(s) have " f"F_g=3 (exactly 2 " f"pre-switch periods). 
After first-differencing "
                        f"and the `time != 1` filter, these groups have "
                        f"only 1 valid pre-window Z value, which "
                        f"triggers a documented boundary-case "
                        f"divergence between Python's global-then-"
                        f"disaggregate architecture and R's per-path "
                        f"full-pipeline call. Per-path point estimates "
                        f"on paths whose switchers include F_g=3 can "
                        f"diverge from `did_multiplegt_dyn(..., "
                        f"by_path, trends_lin)` by 30% or more. "
                        f"See `docs/methodology/REGISTRY.md` "
                        f"(`Note (Phase 3 by_path ...)` -> Per-path "
                        f"linear-trends DID^{{fd}}) for the full "
                        f"deviation contract.",
                        UserWarning,
                        stacklevel=2,
                    )

            N_mat_orig = N_mat.copy()
            Y_mat, N_mat = _compute_first_differenced_matrix(Y_mat, N_mat)

        # ------------------------------------------------------------------
        # Step 7d: State-set trends validation (trends_nonparam)
        #
        # When trends_nonparam is set (a column name), restrict the
        # control pool for each switcher to groups in the same set.
        # ------------------------------------------------------------------
        set_ids_arr: Optional[np.ndarray] = None
        if trends_nonparam is not None:
            if L_max is None:
                raise ValueError(
                    "State-set-specific trends (trends_nonparam) requires "
                    "L_max >= 1. Set L_max to use the per-group "
                    "DID_{g,l} path with state-set trends."
                )
            set_col = str(trends_nonparam)
            if set_col not in data.columns:
                raise ValueError(
                    f"trends_nonparam column {set_col!r} not found in "
                    f"data. Available columns: {list(data.columns)}"
                )
            # SurveyDesign.subpopulation() contract: scope NaN and
            # time-invariance validation to positive-weight rows so
            # excluded obs with missing set IDs do not abort the fit.
            if survey_weights is not None:
                pos_mask_tnp = np.asarray(survey_weights) > 0
                data_tnp = data.loc[pos_mask_tnp]
            else:
                data_tnp = data
            # Reject NaN/missing set assignments (effective sample only)
            n_na_set = int(data_tnp[set_col].isna().sum())
            if n_na_set > 0:
                raise ValueError(
                    f"trends_nonparam column {set_col!r} contains "
                    f"{n_na_set} NaN/missing value(s). 
All groups must " f"have a valid set assignment." ) # Aggregate set membership per group (must be time-invariant) set_per_group = data_tnp.groupby(group)[set_col].nunique() time_varying = set_per_group[set_per_group > 1] if len(time_varying) > 0: raise ValueError( f"trends_nonparam column {set_col!r} must be " f"time-invariant within each group. " f"{len(time_varying)} group(s) have varying values. " f"Examples: {time_varying.index.tolist()[:5]}" ) # Set partition must be coarser than group (multiple groups # per set). A group-level partition creates singleton sets # with no within-set controls available. set_map_check = data_tnp.groupby(group)[set_col].first() n_sets = set_map_check.nunique() n_groups_total = len(set_map_check) if n_sets >= n_groups_total: raise ValueError( f"trends_nonparam column {set_col!r} defines " f"{n_sets} distinct sets for {n_groups_total} " f"groups. The set partition must be coarser than " f"group (multiple groups per set) to provide " f"within-set controls." ) # Extract set membership per group aligned with all_groups set_map = data_tnp.groupby(group)[set_col].first() set_ids_arr = np.array([set_map.loc[g] for g in all_groups], dtype=object) # ------------------------------------------------------------------ # Step 8-9: Switching-cell counts and per-period DIDs (Theorem 3) # with explicit A11 zero-retention pseudocode # ------------------------------------------------------------------ ( per_period_effects, a11_warnings, did_plus_t_arr, did_minus_t_arr, n_10_t_arr, n_01_t_arr, n_00_t_arr, n_11_t_arr, a11_plus_zeroed_arr, a11_minus_zeroed_arr, ) = _compute_per_period_dids( D_mat=D_mat, # Use raw (unadjusted) outcomes for per-period DID. Covariate # residualization applies only to the per-group multi-horizon # path (L_max >= 1). The per-period path uses binary # joiner/leaver categorization and is not part of the DID^X # contract (Web Appendix Section 1.2). 
            # trends_linear also transforms Y_mat, so the per-period path
            # falls back to the raw pivoted outcomes in that case as well.
            Y_mat=(
                Y_mat_raw
                if controls is not None
                else (y_pivot.to_numpy() if _is_trends_linear else Y_mat)
            ),
            N_mat=N_mat_orig,
            periods=all_periods,
        )
        if a11_warnings:
            warnings.warn(
                f"Assumption 11 (existence of stable controls) violated in "
                f"{len(a11_warnings)} period(s); the affected DID_+/DID_- values "
                f"are zeroed but their switcher counts are retained in the N_S "
                f"denominator (matching paper convention). Affected: "
                f"{', '.join(a11_warnings[:3])}"
                + (f" (and {len(a11_warnings) - 3} more)" if len(a11_warnings) > 3 else ""),
                UserWarning,
                stacklevel=2,
            )

        # ------------------------------------------------------------------
        # Step 10: Aggregate DID_M = sum_t (n_10_t * did_plus_t + n_01_t * did_minus_t) / N_S
        # ------------------------------------------------------------------
        N_S = int(n_10_t_arr.sum() + n_01_t_arr.sum())

        # For non-binary treatment, the per-period DID path may find N_S=0
        # because it uses binary joiner/leaver categorization. When L_max
        # is set, the multi-horizon path (which handles non-binary correctly
        # via per-group DID_{g,l}) will compute the effects. Only raise if
        # L_max is also None (i.e., no fallback path).
        is_binary = set(np.unique(D_mat[~np.isnan(D_mat)])).issubset({0.0, 1.0})
        if not is_binary and L_max is None:
            raise ValueError(
                "Non-binary treatment requires L_max >= 1. The per-period DID "
                "path uses binary joiner/leaver categorization; set L_max to "
                "use the per-group DID_{g,l} building block which handles "
                "non-binary treatment."
            )
        if (self.by_path is not None or self.paths_of_interest is not None) and not is_binary:
            finite_D = D_mat[~np.isnan(D_mat)]
            if finite_D.size > 0 and not np.all(finite_D == np.round(finite_D)):
                bad_examples = np.unique(finite_D[finite_D != np.round(finite_D)])[:3]
                raise ValueError(
                    f"by_path / paths_of_interest with non-binary "
                    f"treatment requires integer-coded treatment values "
                    f"(D in Z). 
Found non-integer values: " f"{bad_examples.tolist()!r}. Round/discretize D " f"before fitting, or set by_path=None and " f"paths_of_interest=None with continuous treatment." ) if N_S == 0 and (L_max is None or is_binary): raise ValueError( "No switching cells found in the data after filtering: every " "group has constant treatment for the entire panel. dCDH " "requires at least one (g, t) cell where the group's treatment " "differs from the previous period." ) if N_S > 0: overall_att = float((n_10_t_arr @ did_plus_t_arr + n_01_t_arr @ did_minus_t_arr) / N_S) else: # Non-binary treatment with L_max: per-period DID is not # applicable. The multi-horizon path will provide overall_att # via the cost-benefit delta. overall_att = float("nan") # ------------------------------------------------------------------ # Step 11: Joiners and leavers views # ------------------------------------------------------------------ joiner_total = int(n_10_t_arr.sum()) leaver_total = int(n_01_t_arr.sum()) joiners_available = joiner_total > 0 leavers_available = leaver_total > 0 if joiners_available: joiners_att = float((n_10_t_arr @ did_plus_t_arr) / joiner_total) else: joiners_att = float("nan") if leavers_available: leavers_att = float((n_01_t_arr @ did_minus_t_arr) / leaver_total) else: leavers_att = float("nan") # Joiner / leaver sample-size metadata. # n_*_cells: total switching cells across all periods (sum of per-period # cell counts; each (g, t) joiner/leaver cell counted once). # n_*_obs: actual observation count (sum of n_gt over the same cells), # which differs from cells when individual-level inputs have # multiple original observations per (g, t). 
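# Illustrative sketch (not library code): Steps 10-11 and the cells-vs-
# observations distinction described above, on toy inputs. Both helper
# names are hypothetical. The sketch assumes binary treatment and that a
# switching cell counts only when the group is observed in both adjacent
# periods; the library's per-period helper may apply further rules.

```python
import numpy as np


def aggregate_did_m(n_10, did_plus, n_01, did_minus):
    """Return (overall, joiners, leavers) switcher-weighted DID averages."""
    n_10 = np.asarray(n_10, dtype=float)
    n_01 = np.asarray(n_01, dtype=float)
    did_plus = np.asarray(did_plus, dtype=float)
    did_minus = np.asarray(did_minus, dtype=float)
    joiner_total = n_10.sum()
    leaver_total = n_01.sum()
    n_s = joiner_total + leaver_total
    nan = float("nan")
    # Each view is NaN when it has no switching cells to average over.
    overall = float((n_10 @ did_plus + n_01 @ did_minus) / n_s) if n_s > 0 else nan
    joiners = float((n_10 @ did_plus) / joiner_total) if joiner_total > 0 else nan
    leavers = float((n_01 @ did_minus) / leaver_total) if leaver_total > 0 else nan
    return overall, joiners, leavers


def count_switch_cells_and_obs(D, N):
    """Return (joiner_cells, joiner_obs, leaver_cells, leaver_obs)."""
    D = np.asarray(D, dtype=float)
    N = np.asarray(N)
    d_prev, d_curr = D[:, :-1], D[:, 1:]
    n_curr = N[:, 1:]
    present = (N[:, :-1] > 0) & (n_curr > 0)
    joiner = (d_prev == 0) & (d_curr == 1) & present
    leaver = (d_prev == 1) & (d_curr == 0) & present
    # Cells count (g, t) pairs; obs sums the raw rows behind those pairs.
    return (
        int(joiner.sum()),
        int(n_curr[joiner].sum()),
        int(leaver.sum()),
        int(n_curr[leaver].sum()),
    )


# Toy per-period inputs over three switching periods.
overall, joiners, leavers = aggregate_did_m(
    n_10=[2, 0, 1], did_plus=[1.0, 0.0, 4.0], n_01=[0, 1, 0], did_minus=[0.0, -1.0, 0.0]
)

# Toy 3-group, 3-period panel: group 0 joins at t=1, group 1 leaves at t=2.
counts = count_switch_cells_and_obs(
    D=[[0, 1, 1], [1, 1, 0], [0, 0, 0]],
    N=[[3, 2, 1], [1, 1, 4], [2, 2, 2]],
)
```

# In the toy panel each switching cell is counted once, while the
# observation counts differ because the cells hold 2 and 4 raw rows.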
n_joiner_cells = int(n_10_t_arr.sum()) n_leaver_cells = int(n_01_t_arr.sum()) n_joiner_obs = 0 n_leaver_obs = 0 for t_idx in range(1, len(all_periods)): d_curr = D_mat[:, t_idx] d_prev = D_mat[:, t_idx - 1] n_curr = N_mat[:, t_idx] n_prev = N_mat[:, t_idx - 1] present = (n_curr > 0) & (n_prev > 0) joiner_mask_t = (d_prev == 0) & (d_curr == 1) & present leaver_mask_t = (d_prev == 1) & (d_curr == 0) & present n_joiner_obs += int(n_curr[joiner_mask_t].sum()) n_leaver_obs += int(n_curr[leaver_mask_t].sum()) # ------------------------------------------------------------------ # Step 12: Placebo (DID_M^pl) # ------------------------------------------------------------------ placebo_available = False placebo_effect = float("nan") if self.placebo: if len(all_periods) < 3: warnings.warn( f"Placebo DID_M^pl requires at least 3 time " f"periods; the post-filter panel has only {len(all_periods)}. " "Skipping the placebo computation. Pass placebo=False to " "suppress this warning, or use a panel with T >= 3.", UserWarning, stacklevel=2, ) else: placebo_payload = _compute_placebo( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, periods=all_periods ) if placebo_payload is None: warnings.warn( "Placebo DID_M^pl could not be computed: no qualifying " "switching cells with the required 3-period stable " "history exist after filtering. The placebo fields on " "the results object are NaN with placebo_available=False.", UserWarning, stacklevel=2, ) else: placebo_effect, placebo_available, placebo_a11_warnings = placebo_payload # Surface placebo A11 violations via a consolidated warning # mirroring the main DID path's contract. The affected # per-period placebo contributions are zeroed in the # numerator with their switcher counts retained in the # placebo N_S^pl denominator (placebo zero-retention). 
                    if placebo_a11_warnings:
                        warnings.warn(
                            f"Placebo (DID_M^pl) Assumption 11 violations in "
                            f"{len(placebo_a11_warnings)} period(s); the affected "
                            f"placebo contributions are zeroed but their switcher "
                            f"counts are retained in the placebo N_S denominator "
                            f"(matching the paper's placebo convention). Affected: "
                            + ", ".join(placebo_a11_warnings[:3])
                            + (
                                f" (and {len(placebo_a11_warnings) - 3} more)"
                                if len(placebo_a11_warnings) > 3
                                else ""
                            ),
                            UserWarning,
                            stacklevel=2,
                        )

        # ------------------------------------------------------------------
        # Step 12b: Per-group switch metadata (shared by Phase 1 IF and
        #           Phase 2 multi-horizon). May already be computed by
        #           Step 7b (covariate residualization).
        # ------------------------------------------------------------------
        if not _switch_metadata_computed:
            baselines, first_switch_idx_arr, switch_direction_arr, T_g_arr = (
                _compute_group_switch_metadata(D_mat, N_mat_orig)
            )

        # ------------------------------------------------------------------
        # Step 12c: Multi-horizon per-group computation (L_max >= 1)
        # ------------------------------------------------------------------
        multi_horizon_dids: Optional[Dict[int, Dict[str, Any]]] = None
        multi_horizon_if: Optional[Dict[int, Tuple[np.ndarray, np.ndarray]]] = None
        multi_horizon_se: Optional[Dict[int, float]] = None
        multi_horizon_inference: Optional[Dict[int, Dict[str, Any]]] = None

        if L_max is not None and L_max >= 1:
            multi_horizon_dids = _compute_multi_horizon_dids(
                D_mat=D_mat,
                Y_mat=Y_mat,
                N_mat=N_mat,
                baselines=baselines,
                first_switch_idx=first_switch_idx_arr,
                switch_direction=switch_direction_arr,
                T_g=T_g_arr,
                L_max=L_max,
                set_ids=set_ids_arr,
            )
            # Surface A11 warnings from multi-horizon computation
            mh_a11 = multi_horizon_dids.pop("_a11_warnings", None)
            if mh_a11:
                warnings.warn(
                    f"Multi-horizon control-availability violations in "
                    f"{len(mh_a11)} (group, horizon) pair(s): affected "
                    f"groups are excluded from N_l (no observed baseline-"
                    f"matched controls at the outcome period). 
Examples: " + ", ".join(mh_a11[:3]) + (f" (and {len(mh_a11) - 3} more)" if len(mh_a11) > 3 else ""), UserWarning, stacklevel=2, ) # Guard: if no eligible switchers at horizon 1 (e.g., all # groups have constant treatment), raise ValueError. if 1 in multi_horizon_dids and multi_horizon_dids[1]["N_l"] == 0: raise ValueError( "No switching groups found at horizon 1 after filtering. " "dCDH requires at least one group whose treatment changes " "from the baseline period." ) multi_horizon_if = _compute_per_group_if_multi_horizon( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx_arr, switch_direction=switch_direction_arr, T_g=T_g_arr, L_max=L_max, set_ids=set_ids_arr, compute_per_period=_obs_survey_info is not None, ) # Per-horizon analytical SE via cohort recentering. # Reuse the singleton-baseline exclusion from Step 7 and # build cohort IDs per horizon. singleton_baseline_set = set(singleton_baseline_groups) eligible_mask_var = np.array( [g not in singleton_baseline_set for g in all_groups], dtype=bool ) # Lift eligible_groups_var once for downstream by_path / # paths_of_interest call sites; mirrors the inline # `_elig_groups_l` construction at the global per-horizon # path so both surfaces share the same variance-eligibility # ordering. Used to align per-group IF entries with the # cell-period allocator's `eligible_groups` argument. eligible_groups_var: List[Any] = [ all_groups[g] for g in range(len(all_groups)) if eligible_mask_var[g] ] multi_horizon_se = {} multi_horizon_inference = {} # Cache the cohort-recentered per-cell IF tensors built # inside this loop so the bootstrap block below can reuse # them without recomputing cohort-recentering. Keyed by # horizon; value shape (n_eligible_var, n_periods). Populated # only when a survey design is active (else the per-period # tensor is None and the bootstrap has no cell-level path # to take). 
_mh_pp_cache: Dict[int, np.ndarray] = {} # Compute inference for ALL horizons 1..L_max (including l=1) # so the event_study_effects dict uses a consistent estimand # (per-group DID_{g,l}) across all horizons. for l_h in range(1, L_max + 1): U_l, U_pp_l = multi_horizon_if[l_h] # Cohort IDs for this horizon: (D_{g,1}, F_g, S_g) triples # are the same as Phase 1 (cohort identity depends on first # switch, not on the horizon). Filter to eligible. cohort_keys_l = [ ( float(baselines[g]), int(first_switch_idx_arr[g]), int(switch_direction_arr[g]), ) for g in range(len(all_groups)) ] unique_c: Dict[Tuple[float, int, int], int] = {} cid_l = np.zeros(len(all_groups), dtype=int) for g in range(len(all_groups)): if not eligible_mask_var[g]: cid_l[g] = -1 continue key = cohort_keys_l[g] if key not in unique_c: unique_c[key] = len(unique_c) cid_l[g] = unique_c[key] # Use the full variance-eligible group set (singleton- # baseline exclusion only). Do NOT intersect with # did_eligible — never-switchers and later-switching # controls can have non-zero IF mass via their control # roles, and dropping them understates the SE. U_l_elig = U_l[eligible_mask_var] cid_elig = cid_l[eligible_mask_var] U_centered_l = _cohort_recenter(U_l_elig, cid_elig) # Only build the cell-level attribution when the IF # helper actually produced a per-period tensor (i.e., # a survey design is present). Otherwise the plug-in # path consumes U_centered_l only. 
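# Illustrative sketch (not library code): the cohort-ID construction and
# recentering used in the loop above. `_cohort_recenter` itself is not
# shown in this listing, so the sketch ASSUMES recentering means
# subtracting each cohort's mean influence-function value from its
# members. Both helper names are hypothetical.

```python
import numpy as np


def build_cohort_ids(keys, eligible):
    """Map hashable cohort keys to dense integer IDs; ineligible groups get -1."""
    lookup = {}
    ids = np.full(len(keys), -1, dtype=int)
    for g, (key, ok) in enumerate(zip(keys, eligible)):
        if not ok:
            continue
        # First occurrence of a key receives the next unused ID.
        ids[g] = lookup.setdefault(key, len(lookup))
    return ids


def recenter_by_cohort(U, cohort_ids):
    """Subtract each cohort's mean from its members (assumed definition)."""
    U = np.asarray(U, dtype=float)
    cohort_ids = np.asarray(cohort_ids, dtype=int)
    sums = np.bincount(cohort_ids, weights=U)
    counts = np.bincount(cohort_ids)
    # IDs are dense and non-negative here, so every count is positive.
    return U - (sums / counts)[cohort_ids]


# Toy (baseline, first-switch index, direction) triples for four groups;
# the last group stands in for a singleton-baseline exclusion.
keys = [(0.0, 2, 1), (0.0, 2, 1), (0.0, 3, 1), (1.0, 2, -1)]
eligible = [True, True, True, False]
U = np.array([1.0, 3.0, 10.0, 7.0])

ids = build_cohort_ids(keys, eligible)
mask = ids >= 0
centered = recenter_by_cohort(U[mask], ids[mask])
```

# A cohort with a single member recenters to exactly zero, so under this
# assumed definition it adds nothing to the plug-in variance.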
if U_pp_l is not None: U_centered_pp_l: Optional[np.ndarray] = _cohort_recenter_per_period( U_pp_l[eligible_mask_var], cid_elig ) _mh_pp_cache[l_h] = U_centered_pp_l else: U_centered_pp_l = None N_l_h = multi_horizon_dids[l_h]["N_l"] _elig_groups_l = [ all_groups[g] for g in range(len(all_groups)) if eligible_mask_var[g] ] se_l, n_valid_l = _compute_se( U_centered=U_centered_l, divisor=N_l_h, obs_survey_info=_obs_survey_info, eligible_groups=_elig_groups_l, U_centered_per_period=U_centered_pp_l, ) if n_valid_l is not None: _replicate_n_valid_list.append(n_valid_l) multi_horizon_se[l_h] = se_l did_l_val = multi_horizon_dids[l_h]["did_l"] _df_s = _effective_df_survey(resolved_survey, _replicate_n_valid_list) t_l, p_l, ci_l = safe_inference( did_l_val, se_l, alpha=self.alpha, df=_inference_df(_df_s, resolved_survey) ) multi_horizon_inference[l_h] = { "effect": did_l_val, "se": se_l, "t_stat": t_l, "p_value": p_l, "conf_int": ci_l, "n_obs": N_l_h, } # Emit <50% switcher warning for far horizons if multi_horizon_dids.get(1, {}).get("N_l", 0) > 0: N_1_ref = multi_horizon_dids[1]["N_l"] thin_horizons = [ l_h for l_h in range(2, L_max + 1) if multi_horizon_dids[l_h]["N_l"] < 0.5 * N_1_ref and multi_horizon_dids[l_h]["N_l"] > 0 ] if thin_horizons: warnings.warn( f"Fewer than 50% of l=1 switchers contribute at " f"horizon(s) {thin_horizons}. Far-horizon estimates " f"may be noisy. 
The paper recommends not reporting " f"horizons where fewer than ~50% of switchers " f"contribute (Favara-Imbs application, footnote 14).", UserWarning, stacklevel=2, ) # by_path disaggregation by observed treatment trajectory path_effects: Optional[Dict[Tuple[int, ...], Dict[str, Any]]] = None path_placebos: Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]] = None path_cumulated_event_study: Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]] = ( None ) if ( (self.by_path is not None or self.paths_of_interest is not None) and L_max is not None and L_max >= 1 and multi_horizon_dids is not None ): _df_s_bp = _effective_df_survey(resolved_survey, _replicate_n_valid_list) path_effects = _compute_path_effects( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx_arr, switch_direction=switch_direction_arr, T_g=T_g_arr, L_max=L_max, by_path=self.by_path, paths_of_interest=self.paths_of_interest, eligible_mask_var=eligible_mask_var, multi_horizon_dids=multi_horizon_dids, all_groups=all_groups, alpha=self.alpha, df_inference=_inference_df(_df_s_bp, resolved_survey), set_ids=set_ids_arr, obs_survey_info=_obs_survey_info, eligible_groups=eligible_groups_var, replicate_n_valid_list=_replicate_n_valid_list, ) # NOTE: per-path cumulated layer is computed AFTER the # bootstrap propagation block below (search for # `path_cumulated_event_study =`) so it reads the final # post-bootstrap per-horizon SEs rather than the analytical # ones that path_effects was just populated with. This # mirrors the global `linear_trends_effects` cumulation # which also runs after the event_study bootstrap propagation. 
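# Illustrative sketch (not library code): the far-horizon rule behind the
# "fewer than 50% of l=1 switchers" warning emitted earlier in Step 12c.
# The helper name and the dict-of-counts input are assumptions made for
# this example.

```python
def thin_horizons(n_by_horizon, share=0.5):
    """Return horizons >= 2 whose switcher count is positive but below
    `share` times the horizon-1 count."""
    n_ref = n_by_horizon.get(1, 0)
    if n_ref <= 0:
        return []
    return [
        h
        for h in sorted(n_by_horizon)
        if h >= 2 and 0 < n_by_horizon[h] < share * n_ref
    ]


# Made-up switcher counts per horizon; horizon 4 has no switchers at all
# and is therefore not flagged as thin.
flagged = thin_horizons({1: 10, 2: 6, 3: 4, 4: 0})
```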
# Phase 2: placebos, normalized effects, cost-benefit delta multi_horizon_placebos: Optional[Dict[int, Dict[str, Any]]] = None placebo_horizon_if: Optional[Dict[int, Tuple[np.ndarray, np.ndarray]]] = None placebo_horizon_se: Optional[Dict[int, float]] = None placebo_horizon_inference: Optional[Dict[int, Dict[str, Any]]] = None normalized_effects_dict: Optional[Dict[int, Dict[str, Any]]] = None cost_benefit_result: Optional[Dict[str, Any]] = None if L_max is not None and L_max >= 1 and multi_horizon_dids is not None: # Dynamic placebos DID^{pl}_l if self.placebo: multi_horizon_placebos = _compute_multi_horizon_placebos( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx_arr, switch_direction=switch_direction_arr, T_g=T_g_arr, L_max=L_max, set_ids=set_ids_arr, ) # Surface placebo A11 warnings pl_a11 = multi_horizon_placebos.pop("_a11_warnings", None) if pl_a11: warnings.warn( f"Multi-horizon placebo control-availability " f"violations in {len(pl_a11)} (group, lag) pair(s): " f"affected groups are excluded from N^{{pl}}_l " f"(no observed controls). Examples: " + ", ".join(pl_a11[:3]) + (f" (and {len(pl_a11) - 3} more)" if len(pl_a11) > 3 else ""), UserWarning, stacklevel=2, ) # Placebo IF computation + analytical SE if multi_horizon_placebos is not None: placebo_horizon_if = _compute_per_group_if_placebo_horizon( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx_arr, switch_direction=switch_direction_arr, T_g=T_g_arr, L_max=L_max, compute_per_period=_obs_survey_info is not None, set_ids=set_ids_arr, ) # Per-placebo-horizon analytical SE via cohort recentering # (same pattern as positive-horizon SE at Step 12c). 
                placebo_horizon_se: Dict[int, float] = {}
                placebo_horizon_inference: Dict[int, Dict[str, Any]] = {}
                singleton_baseline_set_pl = set(singleton_baseline_groups)
                eligible_mask_pl = np.array(
                    [g not in singleton_baseline_set_pl for g in all_groups],
                    dtype=bool,
                )
                # Cache per-placebo-horizon cohort-recentered per-cell
                # IF tensors for the bootstrap block below (same
                # pattern as _mh_pp_cache for positive horizons).
                _pl_pp_cache: Dict[int, np.ndarray] = {}
                for lag_l in range(1, L_max + 1):
                    pl_data = multi_horizon_placebos.get(lag_l)
                    if pl_data is None or pl_data["N_pl_l"] == 0:
                        placebo_horizon_se[lag_l] = float("nan")
                        continue
                    U_pl, U_pp_pl = placebo_horizon_if[lag_l]
                    # Cohort IDs (same as positive horizons)
                    cohort_keys_pl = [
                        (
                            float(baselines[g]),
                            int(first_switch_idx_arr[g]),
                            int(switch_direction_arr[g]),
                        )
                        for g in range(len(all_groups))
                    ]
                    unique_cpl: Dict[Tuple[float, int, int], int] = {}
                    cid_pl = np.zeros(len(all_groups), dtype=int)
                    for g in range(len(all_groups)):
                        if not eligible_mask_pl[g]:
                            cid_pl[g] = -1
                            continue
                        key = cohort_keys_pl[g]
                        if key not in unique_cpl:
                            unique_cpl[key] = len(unique_cpl)
                        cid_pl[g] = unique_cpl[key]
                    U_pl_elig = U_pl[eligible_mask_pl]
                    cid_elig_pl = cid_pl[eligible_mask_pl]
                    U_centered_pl_l = _cohort_recenter(U_pl_elig, cid_elig_pl)
                    if U_pp_pl is not None:
                        U_centered_pp_pl_l: Optional[np.ndarray] = _cohort_recenter_per_period(
                            U_pp_pl[eligible_mask_pl], cid_elig_pl
                        )
                        _pl_pp_cache[lag_l] = U_centered_pp_pl_l
                    else:
                        U_centered_pp_pl_l = None
                    _elig_groups_pl = [
                        all_groups[g] for g in range(len(all_groups)) if eligible_mask_pl[g]
                    ]
                    se_pl_l, n_valid_pl_l = _compute_se(
                        U_centered=U_centered_pl_l,
                        divisor=pl_data["N_pl_l"],
                        obs_survey_info=_obs_survey_info,
                        eligible_groups=_elig_groups_pl,
                        U_centered_per_period=U_centered_pp_pl_l,
                    )
                    if n_valid_pl_l is not None:
                        _replicate_n_valid_list.append(n_valid_pl_l)
                    placebo_horizon_se[lag_l] = se_pl_l
                    pl_val = pl_data["placebo_l"]
                    _df_s = _effective_df_survey(resolved_survey, _replicate_n_valid_list)
                    t_pl_l, p_pl_l, ci_pl_l = safe_inference(
                        pl_val,
                        se_pl_l,
                        alpha=self.alpha,
                        df=_inference_df(_df_s, resolved_survey),
                    )
                    placebo_horizon_inference[lag_l] = {
                        "effect": pl_val,
                        "se": se_pl_l,
                        "t_stat": t_pl_l,
                        "p_value": p_pl_l,
                        "conf_int": ci_pl_l,
                        "n_obs": pl_data["N_pl_l"],
                    }

            # Per-path backward-horizon placebos under by_path. Sibling
            # of the per-path event-study computation above; keyed by
            # path tuple -> negative-int lag (-l for lag l) to match
            # `placebo_event_study`'s convention. Inherits the cross-
            # path cohort-sharing SE deviation from R documented for
            # `path_effects` (full-panel cohort-centered plug-in vs
            # R's per-path re-run).
            if (
                (self.by_path is not None or self.paths_of_interest is not None)
                and self.placebo
                and multi_horizon_placebos is not None
            ):
                _df_s_bp_pl = _effective_df_survey(resolved_survey, _replicate_n_valid_list)
                path_placebos = _compute_path_placebos(
                    D_mat=D_mat,
                    Y_mat=Y_mat,
                    N_mat=N_mat,
                    baselines=baselines,
                    first_switch_idx=first_switch_idx_arr,
                    switch_direction=switch_direction_arr,
                    T_g=T_g_arr,
                    L_max=L_max,
                    by_path=self.by_path,
                    paths_of_interest=self.paths_of_interest,
                    eligible_mask_var=eligible_mask_var,
                    multi_horizon_placebos=multi_horizon_placebos,
                    alpha=self.alpha,
                    df_inference=_inference_df(_df_s_bp_pl, resolved_survey),
                    set_ids=set_ids_arr,
                    obs_survey_info=_obs_survey_info,
                    eligible_groups=eligible_groups_var,
                    replicate_n_valid_list=_replicate_n_valid_list,
                )
                # Per-path inference for replicate-weight designs is
                # refreshed in the final R2 P1b block below (alongside
                # global event-study / placebo / heterogeneity surfaces),
                # so it reflects the FINAL `_replicate_n_valid_list` after
                # heterogeneity / overall / joiners / leavers IF sites
                # have appended their own `n_valid` values. Computing it
                # here would only see per-path appends and miss any later
                # df shrinkage from those subsequent IF sites.

            # Normalized effects DID^n_l (suppressed under trends_linear
            # because event_study_effects holds second-differences DID^{fd}_l,
            # not level effects - normalizing second-differences is wrong)
            if not _is_trends_linear:
                normalized_effects_dict = _compute_normalized_effects(
                    multi_horizon_dids=multi_horizon_dids,
                    D_mat=D_mat,
                    baselines=baselines,
                    first_switch_idx=first_switch_idx_arr,
                    L_max=L_max,
                )

            # Cost-benefit delta (only meaningful when L_max >= 2)
            if L_max >= 2:
                cost_benefit_result = _compute_cost_benefit_delta(
                    multi_horizon_dids=multi_horizon_dids,
                    D_mat=D_mat,
                    baselines=baselines,
                    first_switch_idx=first_switch_idx_arr,
                    switch_direction=switch_direction_arr,
                    L_max=L_max,
                )
                if cost_benefit_result.get("has_leavers", False):
                    warnings.warn(
                        "Assumption 7 (D_{g,t} >= D_{g,1}) is violated: leavers "
                        "present. The cost-benefit delta is computed on the full "
                        "sample (both joiners and leavers); delta_joiners and "
                        "delta_leavers are available separately on "
                        "results.cost_benefit_delta.",
                        UserWarning,
                        stacklevel=2,
                    )

        # ------------------------------------------------------------------
        # Step 13-16: Cohort identification, influence-function vectors,
        # cohort-recentered plug-in variance
        # ------------------------------------------------------------------
        (
            U_centered_overall,
            n_groups_for_overall_var,
            n_cohorts,
            n_groups_dropped_never_switching,
            U_centered_joiners,
            U_centered_leavers,
            _eligible_group_ids,
            U_centered_pp_overall,
            U_centered_pp_joiners,
            U_centered_pp_leavers,
        ) = _compute_cohort_recentered_inputs(
            D_mat=D_mat,
            # Phase 1 IF uses per-period structure: use raw outcomes
            # when controls or trends_linear transform Y_mat.
            Y_mat=(
                Y_mat_raw
                if controls is not None
                else (y_pivot.to_numpy() if _is_trends_linear else Y_mat)
            ),
            N_mat=N_mat_orig,
            n_10_t_arr=n_10_t_arr,
            n_00_t_arr=n_00_t_arr,
            n_01_t_arr=n_01_t_arr,
            n_11_t_arr=n_11_t_arr,
            a11_plus_zeroed_arr=a11_plus_zeroed_arr,
            a11_minus_zeroed_arr=a11_minus_zeroed_arr,
            all_groups=all_groups,
            singleton_baseline_groups=singleton_baseline_groups,
            compute_per_period=_obs_survey_info is not None,
        )

        # Analytical SE for DID_M (survey-aware when survey_design provided)
        overall_se, n_valid_overall = _compute_se(
            U_centered=U_centered_overall,
            divisor=N_S,
            obs_survey_info=_obs_survey_info,
            eligible_groups=_eligible_group_ids,
            U_centered_per_period=U_centered_pp_overall,
        )
        if n_valid_overall is not None:
            _replicate_n_valid_list.append(n_valid_overall)
        # Detect the degenerate-cohort case: every variance-eligible group
        # forms its own (D_{g,1}, F_g, S_g) cohort, so the centered
        # influence function is identically zero and `_plugin_se` returns
        # NaN. Surface this as a UserWarning so users see the variance is
        # unidentified rather than silently mistaking NaN for "missing
        # data" or 0.0 for infinite precision. The bootstrap path inherits
        # the same degeneracy on this panel because it multiplies the
        # same all-zero centered IF by random weights.
        if np.isnan(overall_se) and n_groups_for_overall_var > 0 and N_S > 0:
            warnings.warn(
                f"Cohort-recentered analytical variance is unidentified: "
                f"every variance-eligible group forms its own "
                f"(D_{{g,1}}, F_g, S_g) cohort "
                f"({n_groups_for_overall_var} groups across {n_cohorts} "
                f"cohorts), so the centered influence function vector is "
                f"identically zero. The DID_M point estimate is still "
                f"valid; SE / t_stat / p_value / conf_int are NaN-"
                f"consistent. To get a non-degenerate analytical SE, "
                f"include more groups so cohorts have peers (real-world "
                f"panels typically have G >> K). The bootstrap path "
                f"inherits the same degeneracy on this data.",
                UserWarning,
                stacklevel=2,
            )
        _df_survey = _effective_df_survey(resolved_survey, _replicate_n_valid_list)
        overall_t, overall_p, overall_ci = safe_inference(
            overall_att,
            overall_se,
            alpha=self.alpha,
            df=_inference_df(_df_survey, resolved_survey),
        )

        # Joiners SE (uses joiner-only centered IF; conservative bound)
        if joiners_available:
            joiners_se, n_valid_joiners = _compute_se(
                U_centered=U_centered_joiners,
                divisor=joiner_total,
                obs_survey_info=_obs_survey_info,
                eligible_groups=_eligible_group_ids,
                U_centered_per_period=U_centered_pp_joiners,
            )
            if n_valid_joiners is not None:
                _replicate_n_valid_list.append(n_valid_joiners)
            _df_survey = _effective_df_survey(resolved_survey, _replicate_n_valid_list)
            joiners_t, joiners_p, joiners_ci = safe_inference(
                joiners_att,
                joiners_se,
                alpha=self.alpha,
                df=_inference_df(_df_survey, resolved_survey),
            )
        else:
            joiners_se, joiners_t, joiners_p, joiners_ci = (
                float("nan"),
                float("nan"),
                float("nan"),
                (float("nan"), float("nan")),
            )

        # Leavers SE
        if leavers_available:
            leavers_se, n_valid_leavers = _compute_se(
                U_centered=U_centered_leavers,
                divisor=leaver_total,
                obs_survey_info=_obs_survey_info,
                eligible_groups=_eligible_group_ids,
                U_centered_per_period=U_centered_pp_leavers,
            )
            if n_valid_leavers is not None:
                _replicate_n_valid_list.append(n_valid_leavers)
            _df_survey = _effective_df_survey(resolved_survey, _replicate_n_valid_list)
            leavers_t, leavers_p, leavers_ci = safe_inference(
                leavers_att,
                leavers_se,
                alpha=self.alpha,
                df=_inference_df(_df_survey, resolved_survey),
            )
        else:
            leavers_se, leavers_t, leavers_p, leavers_ci = (
                float("nan"),
                float("nan"),
                float("nan"),
                (float("nan"), float("nan")),
            )

        # Phase 1 per-period placebo (L_max=None): SE is NaN because the
        # per-period DID_M^pl aggregation path does not have an IF
        # derivation. Multi-horizon placebos (L_max >= 1) use the per-group
        # placebo IF computed above and have valid SE.
        placebo_se = float("nan")
        placebo_t = float("nan")
        placebo_p = float("nan")
        placebo_ci: Tuple[float, float] = (float("nan"), float("nan"))
        if placebo_available and L_max is None:
            warnings.warn(
                "Single-period placebo SE (L_max=None) is NaN. The "
                "per-period DID_M^pl aggregation path does not have an "
                "influence-function derivation. Use L_max >= 1 for "
                "multi-horizon placebos with valid SE. The placebo "
                "point estimate (results.placebo_effect) is still "
                "meaningful.",
                UserWarning,
                stacklevel=2,
            )

        # ------------------------------------------------------------------
        # Step 18: Build per-period decomposition with explicit n_*_t fields
        # ------------------------------------------------------------------
        n_treated_obs_post = int(N_mat[D_mat == 1].sum())

        # ------------------------------------------------------------------
        # Step 19: Bootstrap if requested
        # ------------------------------------------------------------------
        bootstrap_results: Optional[DCDHBootstrapResults] = None
        if self.n_bootstrap > 0:
            if resolved_survey is not None:
                # Replicate-weight designs have their own closed-form
                # variance (Rao-Wu / BRR / Fay / JK1 / JKn / SDR via
                # compute_replicate_if_variance). Combining replicate
                # variance with a multiplier bootstrap would double-count
                # variance. Match library precedent:
                # efficient_did.py:989, staggered.py:1869,
                # two_stage.py:251-253.
                if (
                    resolved_survey.replicate_weights is not None
                    and resolved_survey.replicate_weights.shape[1] > 0
                ):
                    raise NotImplementedError(
                        "dCDH survey support rejects the combination of "
                        "replicate weights and n_bootstrap > 0. Replicate-"
                        "weight variance is closed-form "
                        "(compute_replicate_if_variance); use n_bootstrap=0. "
                        "For a bootstrap-based SE, use strata/PSU/FPC design "
                        "instead of replicate weights."
                    )
                # Warning fires only when PSU is **strictly coarser
                # than group** on an otherwise within-group-constant
                # design (multiple eligible groups share a PSU label
                # but no group spans more than one PSU). Two regimes
                # are explicitly excluded:
                #   - PSU=group (auto-inject default): identity-map
                #     fast path -- no warning needed.
                #   - Within-group-varying PSU: the cell-level
                #     allocator honors the per-cell PSU structure;
                #     "n_psu < n_groups" is expected whenever cells
                #     of a group share a PSU with cells of another
                #     group, which does not indicate coarser-than-group
                #     clustering in the Hall-Mammen sense.
                # Count unique PSUs across ALL positive-weight obs of
                # eligible groups AND detect within-group-varying
                # PSU; suppress the warning in that regime.
                psu_arr_warn = getattr(resolved_survey, "psu", None)
                if psu_arr_warn is None or _obs_survey_info is None:
                    # No PSU info -- can't compare to group count.
                    n_psu_eff_warn, n_groups_eff_warn = -1, -1
                    psu_varies_within_warn = False
                else:
                    obs_gids_warn = np.asarray(_obs_survey_info["group_ids"])
                    obs_ws_warn = np.asarray(_obs_survey_info["weights"], dtype=np.float64)
                    pos_mask_warn = obs_ws_warn > 0
                    psu_codes_warn = np.asarray(psu_arr_warn)
                    eligible_gid_set = set(_eligible_group_ids)
                    elig_obs_mask_warn = pos_mask_warn & np.array(
                        [g in eligible_gid_set for g in obs_gids_warn],
                        dtype=bool,
                    )
                    if elig_obs_mask_warn.any():
                        elig_psu_labels_arr = psu_codes_warn[elig_obs_mask_warn]
                        n_psu_eff_warn = int(len(np.unique(elig_psu_labels_arr)))
                        n_groups_eff_warn = len(_eligible_group_ids)
                        # Detect within-group-varying PSU on the
                        # eligible subset so we can suppress the
                        # "strictly coarser PSU" warning there.
                        psu_varies_within_warn = bool(
                            pd.DataFrame(
                                {
                                    "g": obs_gids_warn[elig_obs_mask_warn],
                                    "p": elig_psu_labels_arr,
                                }
                            )
                            .groupby("g")["p"]
                            .nunique()
                            .gt(1)
                            .any()
                        )
                    else:
                        n_psu_eff_warn, n_groups_eff_warn = -1, -1
                        psu_varies_within_warn = False
                if 0 <= n_psu_eff_warn < n_groups_eff_warn and not psu_varies_within_warn:
                    warnings.warn(
                        f"Bootstrap with survey_design uses Hall-Mammen "
                        f"wild multiplier weights at the PSU level "
                        f"(n_psu={n_psu_eff_warn} PSUs across "
                        f"n_groups={n_groups_eff_warn} groups). For "
                        f"designs with substantially unequal PSU sizes, "
                        f"the wild bootstrap may under-cover relative to "
                        f"analytical TSL inference; consider "
                        f"n_bootstrap=0 for the TSL variance. If n_psu "
                        f"is close to n_groups, lonely-PSU removal "
                        f"(lonely_psu='remove') may be collapsing "
                        f"singletons.",
                        UserWarning,
                        stacklevel=2,
                    )
            joiners_inputs = (
                (U_centered_joiners, joiner_total, joiners_att, U_centered_pp_joiners)
                if joiners_available
                else None
            )
            leavers_inputs = (
                (U_centered_leavers, leaver_total, leavers_att, U_centered_pp_leavers)
                if leavers_available
                else None
            )
            # Phase 1 placebo bootstrap: the Phase 1 per-period placebo
            # DID_M^pl still uses NaN SE (no IF derivation for the
            # per-period aggregation). The multi-horizon placebo bootstrap
            # below handles Phase 2+ placebos when placebo_horizon_if is
            # available.
            placebo_inputs = None

            # Phase 2: build placebo-horizon bootstrap inputs from the
            # cohort-centered placebo IF vectors.
            pl_boot_inputs = None
            if (
                placebo_horizon_if is not None
                and multi_horizon_placebos is not None
                and L_max is not None
                and L_max >= 1
            ):
                singleton_baseline_set_pl_b = set(singleton_baseline_groups)
                eligible_mask_pl_b = np.array(
                    [g not in singleton_baseline_set_pl_b for g in all_groups],
                    dtype=bool,
                )
                pl_boot_inputs = {}
                for lag_l in range(1, L_max + 1):
                    pl_data = multi_horizon_placebos.get(lag_l)
                    if pl_data is None or pl_data["N_pl_l"] == 0:
                        continue
                    U_pl_full, _ = placebo_horizon_if[lag_l]
                    U_pl_elig_b = U_pl_full[eligible_mask_pl_b]
                    cohort_keys_pl_b = [
                        (
                            float(baselines[g]),
                            int(first_switch_idx_arr[g]),
                            int(switch_direction_arr[g]),
                        )
                        for g in range(len(all_groups))
                    ]
                    unique_cpl_b: Dict[Tuple[float, int, int], int] = {}
                    cid_pl_b = np.zeros(len(all_groups), dtype=int)
                    for g in range(len(all_groups)):
                        if not eligible_mask_pl_b[g]:
                            cid_pl_b[g] = -1
                            continue
                        key = cohort_keys_pl_b[g]
                        if key not in unique_cpl_b:
                            unique_cpl_b[key] = len(unique_cpl_b)
                        cid_pl_b[g] = unique_cpl_b[key]
                    cid_elig_pl_b = cid_pl_b[eligible_mask_pl_b]
                    U_centered_pl_b = _cohort_recenter(U_pl_elig_b, cid_elig_pl_b)
                    pl_boot_inputs[lag_l] = (
                        U_centered_pl_b,
                        pl_data["N_pl_l"],
                        pl_data["placebo_l"],
                        _pl_pp_cache.get(lag_l),
                    )

            # Phase 2: build multi-horizon bootstrap inputs from the
            # cohort-centered IF vectors computed in Step 12c.
            mh_boot_inputs = None
            if (
                multi_horizon_if is not None
                and multi_horizon_dids is not None
                and multi_horizon_se is not None
                and L_max is not None
                and L_max >= 1
            ):
                singleton_baseline_set_b = set(singleton_baseline_groups)
                eligible_mask_b = np.array(
                    [g not in singleton_baseline_set_b for g in all_groups], dtype=bool
                )
                mh_boot_inputs = {}
                # Include ALL horizons 1..L_max so the sup-t critical
                # value is calibrated over the same set that receives
                # cband_conf_int. For l=1, use the per-group IF (not
                # the Phase 1 per-period IF) so the bootstrap matches
                # the event_study_effects[1] estimand.
                for l_h in range(1, L_max + 1):
                    h_data = multi_horizon_dids.get(l_h)
                    if h_data is None or h_data["N_l"] == 0:
                        continue
                    U_l_full, _ = multi_horizon_if[l_h]
                    # Full variance-eligible group set (matching
                    # analytical SE path: singleton-baseline only)
                    U_l_elig = U_l_full[eligible_mask_b]
                    # Use the same cohort IDs as the analytical SE path
                    cohort_keys_b = [
                        (
                            float(baselines[g]),
                            int(first_switch_idx_arr[g]),
                            int(switch_direction_arr[g]),
                        )
                        for g in range(len(all_groups))
                    ]
                    unique_cb: Dict[Tuple[float, int, int], int] = {}
                    cid_b = np.zeros(len(all_groups), dtype=int)
                    for g in range(len(all_groups)):
                        if not eligible_mask_b[g]:
                            cid_b[g] = -1
                            continue
                        key = cohort_keys_b[g]
                        if key not in unique_cb:
                            unique_cb[key] = len(unique_cb)
                        cid_b[g] = unique_cb[key]
                    cid_elig = cid_b[eligible_mask_b]
                    U_centered_h = _cohort_recenter(U_l_elig, cid_elig)
                    mh_boot_inputs[l_h] = (
                        U_centered_h,
                        h_data["N_l"],
                        h_data["did_l"],
                        _mh_pp_cache.get(l_h),
                    )

            # Under a survey design with PSU information, build both
            # (a) a group-level `group_id_to_psu_code` dict (one PSU
            # code per eligible group) and (b) a per-cell PSU tensor
            # `psu_codes_per_cell` of shape (n_eligible, n_periods).
            # The bootstrap mixin's dispatcher inspects (b) to decide
            # whether PSU is within-group-constant: when constant, it
            # runs the legacy group-level bootstrap via (a) for
            # bit-identity with pre-release behavior; when varying,
            # it switches to the cell-level wild PSU bootstrap that
            # draws one multiplier per (g, t)'s PSU. Under auto-inject
            # `psu=group` each group has a unique PSU code and every
            # cell of a group shares it -- the dispatcher routes to
            # the legacy path and the identity-map fast path
            # reproduces the pre-PSU behavior bit-for-bit. See
            # REGISTRY.md ChaisemartinDHaultfoeuille Survey +
            # bootstrap contract Note.
            group_id_to_psu_code_bootstrap: Optional[Dict[Any, int]] = None
            eligible_group_ids_bootstrap: Optional[np.ndarray] = None
            psu_codes_per_cell_bootstrap: Optional[np.ndarray] = None
            if (
                resolved_survey is not None
                and getattr(resolved_survey, "psu", None) is not None
                and _obs_survey_info is not None
            ):
                obs_psu_codes = np.asarray(resolved_survey.psu)
                obs_gids_boot = np.asarray(_obs_survey_info["group_ids"])
                obs_tids_boot = np.asarray(_obs_survey_info["time_ids"])
                obs_weights_boot = np.asarray(_obs_survey_info["weights"], dtype=np.float64)
                pos_mask_boot = obs_weights_boot > 0
                gid_to_idx = {gid: i for i, gid in enumerate(_eligible_group_ids)}
                tid_to_idx = {t: i for i, t in enumerate(all_periods)}
                n_elig_boot = len(_eligible_group_ids)
                n_per_boot = len(all_periods)
                g_idx_arr = np.array(
                    [gid_to_idx.get(g, -1) for g in obs_gids_boot],
                    dtype=np.int64,
                )
                t_idx_arr = np.array(
                    [tid_to_idx.get(t, -1) for t in obs_tids_boot],
                    dtype=np.int64,
                )
                # Factor PSU labels to dense int codes over the
                # **eligible-subset** positive-weight observations only
                # (not the full positive-weight population). Restricting
                # to eligible obs ensures the resulting dense codes
                # range ONLY over PSUs actually used by variance-
                # eligible groups, so downstream n_psu = max(code) + 1
                # is exact: no gaps from singleton-baseline-excluded
                # groups that would silently trigger the identity
                # fast path in `_generate_psu_or_group_weights`.
                elig_obs_mask = pos_mask_boot & (g_idx_arr >= 0) & (t_idx_arr >= 0)
                elig_psu_labels = obs_psu_codes[elig_obs_mask]
                dense_per_row: Optional[np.ndarray] = None
                if elig_psu_labels.size > 0:
                    _, elig_dense_codes = np.unique(
                        elig_psu_labels,
                        return_inverse=True,
                    )
                    elig_dense_codes = np.asarray(elig_dense_codes, dtype=np.int64)
                    dense_per_row = np.full(
                        len(obs_psu_codes),
                        -1,
                        dtype=np.int64,
                    )
                    dense_per_row[elig_obs_mask] = elig_dense_codes
                # Per-cell PSU tensor: (n_eligible, n_periods), -1 sentinel
                # for ineligible / zero-weight cells. Populated
                # unconditionally when `dense_per_row` exists -- a row
                # that ends up all-sentinel (eligible group with no
                # positive-weight obs) is masked out at unroll time,
                # not by discarding the entire tensor. See also the
                # dispatcher's `_psu_varies_within_group` helper which
                # ignores sentinel entries row-wise.
                if dense_per_row is not None:
                    psu_codes_per_cell = np.full(
                        (n_elig_boot, n_per_boot),
                        -1,
                        dtype=np.int64,
                    )
                    psu_codes_per_cell[
                        g_idx_arr[elig_obs_mask],
                        t_idx_arr[elig_obs_mask],
                    ] = dense_per_row[elig_obs_mask]
                    psu_codes_per_cell_bootstrap = psu_codes_per_cell
                    # Group-level dict: one PSU code per eligible
                    # group. For rows that are all-sentinel (eligible
                    # group has no positive-weight obs), assign code
                    # `0` as a harmless placeholder -- the group's IF
                    # mass is zero, so the bootstrap multiplier it
                    # receives is irrelevant on either the legacy or
                    # the cell-level path. Always populate the dict
                    # so the legacy group-level path keeps clustering
                    # correctly when psu_varies=False even if some
                    # eligible groups happen to have no positive-
                    # weight obs.
                    group_psu_labels: List[int] = []
                    for i in range(n_elig_boot):
                        row = psu_codes_per_cell[i]
                        valid = row[row >= 0]
                        if valid.size == 0:
                            group_psu_labels.append(0)
                        else:
                            group_psu_labels.append(int(valid[0]))
                    group_id_to_psu_code_bootstrap = {
                        gid: code for gid, code in zip(_eligible_group_ids, group_psu_labels)
                    }
                    eligible_group_ids_bootstrap = np.asarray(_eligible_group_ids)

            # Collect per-(path, horizon) bootstrap inputs when by_path is
            # active. Uses the sibling helper to walk the same enumeration
            # / per-path IF / cohort-recentering pipeline that
            # `_compute_path_effects` uses (kept separate per the review
            # architectural preference -- see `_collect_path_bootstrap_inputs`).
            path_bootstrap_inputs = None
            if (
                (self.by_path is not None or self.paths_of_interest is not None)
                and L_max is not None
                and L_max >= 1
                and multi_horizon_dids is not None
                and path_effects is not None
                and len(path_effects) > 0
            ):
                path_bootstrap_inputs = _collect_path_bootstrap_inputs(
                    D_mat=D_mat,
                    Y_mat=Y_mat,
                    N_mat=N_mat,
                    baselines=baselines,
                    first_switch_idx=first_switch_idx_arr,
                    switch_direction=switch_direction_arr,
                    T_g=T_g_arr,
                    L_max=L_max,
                    by_path=self.by_path,
                    paths_of_interest=self.paths_of_interest,
                    eligible_mask_var=eligible_mask_var,
                    multi_horizon_dids=multi_horizon_dids,
                    path_effects=path_effects,
                    set_ids=set_ids_arr,
                )

            # Sibling collector for per-path backward placebos. Mirrors
            # the path_bootstrap_inputs gating: only invoke when by_path
            # + placebo are both active, multi_horizon_placebos is
            # populated, and analytical path_placebos returned a non-
            # empty dict.
            path_placebo_bootstrap_inputs = None
            if (
                (self.by_path is not None or self.paths_of_interest is not None)
                and self.placebo
                and L_max is not None
                and L_max >= 1
                and multi_horizon_placebos is not None
                and path_placebos is not None
                and len(path_placebos) > 0
            ):
                path_placebo_bootstrap_inputs = _collect_path_placebo_bootstrap_inputs(
                    D_mat=D_mat,
                    Y_mat=Y_mat,
                    N_mat=N_mat,
                    baselines=baselines,
                    first_switch_idx=first_switch_idx_arr,
                    switch_direction=switch_direction_arr,
                    T_g=T_g_arr,
                    L_max=L_max,
                    by_path=self.by_path,
                    paths_of_interest=self.paths_of_interest,
                    eligible_mask_var=eligible_mask_var,
                    multi_horizon_placebos=multi_horizon_placebos,
                    path_placebos=path_placebos,
                    set_ids=set_ids_arr,
                )

            br = self._compute_dcdh_bootstrap(
                n_groups_for_overall=n_groups_for_overall_var,
                u_centered_overall=U_centered_overall,
                divisor_overall=N_S,
                original_overall=overall_att,
                joiners_inputs=joiners_inputs,
                leavers_inputs=leavers_inputs,
                placebo_inputs=placebo_inputs,
                multi_horizon_inputs=mh_boot_inputs,
                placebo_horizon_inputs=pl_boot_inputs,
                path_bootstrap_inputs=path_bootstrap_inputs,
                path_placebo_bootstrap_inputs=path_placebo_bootstrap_inputs,
                group_id_to_psu_code=group_id_to_psu_code_bootstrap,
                eligible_group_ids=eligible_group_ids_bootstrap,
                u_per_period_overall=U_centered_pp_overall,
                psu_codes_per_cell=psu_codes_per_cell_bootstrap,
            )
            bootstrap_results = br

            # Replace the analytical SE with the bootstrap SE for the
            # targets that have valid bootstrap output, AND propagate
            # the bootstrap percentile p-value and CI directly to the
            # top-level fields. The t-stat is computed from the SE via
            # safe_inference()[0] so the project anti-pattern rule
            # (never compute t_stat = effect / se inline) stays
            # satisfied -- bootstrap does not define an alternative
            # t-stat semantic for percentile bootstrap, so the
            # SE-based t-stat is the natural choice.
            #
            # Library precedent: imputation.py:790-805,
            # two_stage.py:778-787, and efficient_did.py:1009-1013 all
            # propagate bootstrap p/CI to the public surface while
            # keeping a SE-derived t-stat. Round 10 brings dCDH in line
            # with that pattern (the prior code silently recomputed
            # normal-theory p/CI from the bootstrap SE, which made the
            # public inference surface a hybrid).
            #
            # See REGISTRY.md ChaisemartinDHaultfoeuille `Note
            # (bootstrap inference surface)` and the regression test
            # ``test_bootstrap_p_value_and_ci_propagated_to_top_level``.

            # Bootstrap contract: once the caller opts into n_bootstrap > 0,
            # bootstrap SE / percentile CI / percentile p-value replace the
            # analytical values. When the bootstrap SE comes back non-finite
            # (e.g., n_bootstrap too small, degenerate bootstrap distribution,
            # zero-IF target), the full inference tuple goes to NaN rather
            # than silently falling back to analytical -- mixing bootstrap-
            # contract and analytical-contract semantics within one result
            # object would be a public-surface inconsistency. Same treatment
            # applies to the event_study_effects propagation below and the
            # path_effects propagation further down.
            if np.isfinite(br.overall_se):
                overall_se = br.overall_se
                overall_p = br.overall_p_value if br.overall_p_value is not None else np.nan
                overall_ci = br.overall_ci if br.overall_ci is not None else (np.nan, np.nan)
                overall_t = safe_inference(
                    overall_att,
                    overall_se,
                    alpha=self.alpha,
                    df=_inference_df(_df_survey, resolved_survey),
                )[0]
            else:
                overall_se = np.nan
                overall_p = np.nan
                overall_ci = (np.nan, np.nan)
                overall_t = np.nan
            if joiners_available:
                if br.joiners_se is not None and np.isfinite(br.joiners_se):
                    joiners_se = br.joiners_se
                    joiners_p = br.joiners_p_value if br.joiners_p_value is not None else np.nan
                    joiners_ci = br.joiners_ci if br.joiners_ci is not None else (np.nan, np.nan)
                    joiners_t = safe_inference(
                        joiners_att,
                        joiners_se,
                        alpha=self.alpha,
                        df=_inference_df(_df_survey, resolved_survey),
                    )[0]
                else:
                    joiners_se = np.nan
                    joiners_p = np.nan
                    joiners_ci = (np.nan, np.nan)
                    joiners_t = np.nan
            if leavers_available:
                if br.leavers_se is not None and np.isfinite(br.leavers_se):
                    leavers_se = br.leavers_se
                    leavers_p = br.leavers_p_value if br.leavers_p_value is not None else np.nan
                    leavers_ci = br.leavers_ci if br.leavers_ci is not None else (np.nan, np.nan)
                    leavers_t = safe_inference(
                        leavers_att,
                        leavers_se,
                        alpha=self.alpha,
                        df=_inference_df(_df_survey, resolved_survey),
                    )[0]
                else:
                    leavers_se = np.nan
                    leavers_p = np.nan
                    leavers_ci = (np.nan, np.nan)
                    leavers_t = np.nan

        # ------------------------------------------------------------------
        # Step 20: Build the results dataclass
        # ------------------------------------------------------------------
        # event_study_effects: when L_max is None, l=1 mirrors Phase 1
        # DID_M (per-period path). When L_max >= 1, ALL horizons including
        # l=1 use the per-group DID_{g,l} path for a consistent estimand.
        if multi_horizon_inference is not None and 1 in multi_horizon_inference:
            # Per-group mode: use per-group path for all horizons.
            # When L_max >= 1, the per-group DID_{g,1} is the correct
            # estimand for overall_att (not the binary-only per-period
            # DID_M). This handles both pure non-binary (N_S=0) and
            # mixed binary/non-binary panels (N_S > 0 but incomplete).
            l1_inf = multi_horizon_inference[1]
            overall_att = l1_inf["effect"]
            overall_se = l1_inf["se"]
            overall_t = l1_inf["t_stat"]
            overall_p = l1_inf["p_value"]
            overall_ci = l1_inf["conf_int"]
            event_study_effects: Dict[int, Dict[str, Any]] = dict(multi_horizon_inference)
        else:
            # Phase 1 mode (L_max=None): l=1 from per-period path
            event_study_effects = {
                1: {
                    "effect": overall_att,
                    "se": overall_se,
                    "t_stat": overall_t,
                    "p_value": overall_p,
                    "conf_int": overall_ci,
                    "n_obs": N_S,
                }
            }

        # Phase 2: propagate bootstrap results to event_study_effects.
        # Same bootstrap-contract rule as the overall/joiners/leavers block
        # above and the path_effects block below: non-finite bootstrap SE
        # writes NaN to the full inference tuple rather than falling back
        # to analytical.
        if bootstrap_results is not None and bootstrap_results.event_study_ses:
            for l_h in bootstrap_results.event_study_ses:
                if l_h in event_study_effects:
                    bs_se = bootstrap_results.event_study_ses.get(l_h)
                    bs_ci = (
                        bootstrap_results.event_study_cis.get(l_h)
                        if bootstrap_results.event_study_cis
                        else None
                    )
                    bs_p = (
                        bootstrap_results.event_study_p_values.get(l_h)
                        if bootstrap_results.event_study_p_values
                        else None
                    )
                    eff = event_study_effects[l_h]["effect"]
                    if bs_se is not None and np.isfinite(bs_se):
                        event_study_effects[l_h]["se"] = bs_se
                        event_study_effects[l_h]["p_value"] = bs_p if bs_p is not None else np.nan
                        event_study_effects[l_h]["conf_int"] = (
                            bs_ci if bs_ci is not None else (np.nan, np.nan)
                        )
                        event_study_effects[l_h]["t_stat"] = safe_inference(
                            eff, bs_se, alpha=self.alpha, df=None
                        )[0]
                    else:
                        event_study_effects[l_h]["se"] = np.nan
                        event_study_effects[l_h]["p_value"] = np.nan
                        event_study_effects[l_h]["conf_int"] = (np.nan, np.nan)
                        event_study_effects[l_h]["t_stat"] = np.nan

            # Add sup-t bands to event_study_effects entries
            if bootstrap_results.cband_crit_value is not None:
                crit = bootstrap_results.cband_crit_value
                for l_h in event_study_effects:
                    se = event_study_effects[l_h]["se"]
                    eff = event_study_effects[l_h]["effect"]
                    if np.isfinite(se) and se > 0:
                        event_study_effects[l_h]["cband_conf_int"] = (
                            eff - crit * se,
                            eff + crit * se,
                        )

        # Phase 3: propagate bootstrap results to path_effects (by_path).
        # Mirrors the event_study propagation above: replace the analytical
        # SE / p-value / CI with the bootstrap percentile statistics
        # (Round-10 library convention -- `br.path_p_values` and
        # `br.path_cis` are already percentile-based via
        # `compute_effect_bootstrap_stats`), and re-derive the t-stat
        # from the bootstrap SE via `safe_inference` per the anti-pattern
        # rule. Point estimates (`effect`, `n_obs`, `n_groups`,
        # `frequency_rank`) are unchanged from the analytical path.
        if (
            bootstrap_results is not None
            and bootstrap_results.path_ses
            and path_effects is not None
        ):
            for path_key, horizon_ses in bootstrap_results.path_ses.items():
                if path_key not in path_effects:
                    continue
                for l_h, bs_se in horizon_ses.items():
                    if l_h not in path_effects[path_key]["horizons"]:
                        continue
                    bs_ci = (
                        bootstrap_results.path_cis.get(path_key, {}).get(l_h)
                        if bootstrap_results.path_cis
                        else None
                    )
                    bs_p = (
                        bootstrap_results.path_p_values.get(path_key, {}).get(l_h)
                        if bootstrap_results.path_p_values
                        else None
                    )
                    # Bootstrap replaces analytical inference for
                    # this (path, horizon) regardless of outcome. If
                    # the bootstrap SE is non-finite (e.g., n_bootstrap
                    # too small, degenerate bootstrap distribution, or
                    # zero-IF path inherited from the analytical
                    # degenerate-cohort branch), the full inference
                    # tuple goes to NaN -- we must NOT fall back to
                    # analytical inference here, since the caller
                    # explicitly chose the bootstrap path by setting
                    # n_bootstrap > 0. Falling back would silently mix
                    # bootstrap-contract semantics with analytical-
                    # contract semantics within the same result
                    # object.
                    eff_p = path_effects[path_key]["horizons"][l_h]["effect"]
                    if bs_se is not None and np.isfinite(bs_se):
                        path_effects[path_key]["horizons"][l_h]["se"] = bs_se
                        path_effects[path_key]["horizons"][l_h]["p_value"] = (
                            bs_p if bs_p is not None else np.nan
                        )
                        path_effects[path_key]["horizons"][l_h]["conf_int"] = (
                            bs_ci if bs_ci is not None else (np.nan, np.nan)
                        )
                        path_effects[path_key]["horizons"][l_h]["t_stat"] = safe_inference(
                            eff_p, bs_se, alpha=self.alpha, df=None
                        )[0]
                    else:
                        path_effects[path_key]["horizons"][l_h]["se"] = np.nan
                        path_effects[path_key]["horizons"][l_h]["p_value"] = np.nan
                        path_effects[path_key]["horizons"][l_h]["conf_int"] = (
                            np.nan,
                            np.nan,
                        )
                        path_effects[path_key]["horizons"][l_h]["t_stat"] = np.nan

        # Per-path cumulated layer (under trends_linear). Computed AFTER
        # the bootstrap propagation block above so the cumulated SE / t /
        # p / CI are derived from the FINAL post-bootstrap per-horizon
        # path SEs rather than the analytical ones path_effects was
        # initially populated with at fit-time. Mirrors the global
        # `linear_trends_effects` placement at `:3405-3454` which also
        # runs after the event_study bootstrap propagation. Honors the
        # library-wide NaN-on-invalid bootstrap contract: any non-finite
        # component SE in the running-sum upper bound yields a NaN
        # cumulated SE / t / p / CI, regardless of whether the source
        # was an analytical singularity or a non-finite bootstrap draw.
        if (
            (self.by_path is not None or self.paths_of_interest is not None)
            and _is_trends_linear
            and L_max is not None
            and L_max >= 1
            and multi_horizon_dids is not None
            and path_effects is not None
            and len(path_effects) > 0
        ):
            _df_s_bp_cum = _effective_df_survey(resolved_survey, _replicate_n_valid_list)
            path_cumulated_event_study = _compute_path_cumulated_event_study(
                D_mat=D_mat,
                N_mat=N_mat,
                first_switch_idx=first_switch_idx_arr,
                switch_direction=switch_direction_arr,
                L_max=L_max,
                by_path=self.by_path,
                paths_of_interest=self.paths_of_interest,
                multi_horizon_dids=multi_horizon_dids,
                path_effects=path_effects,
                alpha=self.alpha,
                df_inference=_inference_df(_df_s_bp_cum, resolved_survey),
            )

        # Phase 3: propagate bootstrap results to per-path placebos
        # (by_path + placebo). Sibling of the path_effects propagation
        # block above. Library-wide NaN-on-invalid bootstrap contract:
        # non-finite bootstrap SE writes NaN to the full inference
        # tuple rather than falling back to the analytical SE -- the
        # caller opted into bootstrap by setting n_bootstrap > 0, and
        # mixing analytical + bootstrap semantics inside one result
        # object is a public-surface inconsistency.
        if (
            bootstrap_results is not None
            and bootstrap_results.path_placebo_ses
            and path_placebos is not None
        ):
            for path_key, lag_ses in bootstrap_results.path_placebo_ses.items():
                if path_key not in path_placebos:
                    continue
                for lag_l, bs_se_pl in lag_ses.items():
                    neg_key = -lag_l
                    if neg_key not in path_placebos[path_key]:
                        continue
                    bs_ci_pl = (
                        bootstrap_results.path_placebo_cis.get(path_key, {}).get(lag_l)
                        if bootstrap_results.path_placebo_cis
                        else None
                    )
                    bs_p_pl = (
                        bootstrap_results.path_placebo_p_values.get(path_key, {}).get(lag_l)
                        if bootstrap_results.path_placebo_p_values
                        else None
                    )
                    eff_pl = path_placebos[path_key][neg_key]["effect"]
                    if bs_se_pl is not None and np.isfinite(bs_se_pl):
                        path_placebos[path_key][neg_key]["se"] = bs_se_pl
                        path_placebos[path_key][neg_key]["p_value"] = (
                            bs_p_pl if bs_p_pl is not None else np.nan
                        )
                        path_placebos[path_key][neg_key]["conf_int"] = (
                            bs_ci_pl if bs_ci_pl is not None else (np.nan, np.nan)
                        )
                        path_placebos[path_key][neg_key]["t_stat"] = safe_inference(
                            eff_pl, bs_se_pl, alpha=self.alpha, df=None
                        )[0]
                    else:
                        path_placebos[path_key][neg_key]["se"] = np.nan
                        path_placebos[path_key][neg_key]["p_value"] = np.nan
                        path_placebos[path_key][neg_key]["conf_int"] = (np.nan, np.nan)
                        path_placebos[path_key][neg_key]["t_stat"] = np.nan

        # Phase 3: propagate per-path sup-t critical values to per-
        # horizon `cband_conf_int` entries on path_effects (by_path +
        # n_bootstrap > 0). Sibling of the OVERALL event-study cband
        # propagation at `:2865-2875`. For each path with a finite
        # crit, write `cband_conf_int = (eff - c_p*se, eff + c_p*se)`
        # into each horizon's dict whose bootstrap-replaced SE is
        # finite > 0. Mirror the OVERALL absent-key pattern: non-finite
        # SE horizons simply don't get the `cband_conf_int` key.
if ( bootstrap_results is not None and bootstrap_results.path_cband_crit_values is not None and path_effects is not None ): for path_key, crit in bootstrap_results.path_cband_crit_values.items(): if path_key not in path_effects: continue if not np.isfinite(crit): continue for l_h, h_dict in path_effects[path_key]["horizons"].items(): se = h_dict.get("se", np.nan) eff = h_dict.get("effect", np.nan) if np.isfinite(se) and se > 0: h_dict["cband_conf_int"] = ( eff - crit * se, eff + crit * se, ) # When L_max >= 1 and the per-group path is active, sync # overall_* from event_study_effects[1] AFTER bootstrap propagation # so that bootstrap SE/p/CI flow to the top-level surface. if L_max is not None and L_max >= 1 and 1 in event_study_effects: es1 = event_study_effects[1] overall_att = es1["effect"] overall_se = es1["se"] overall_t = es1["t_stat"] overall_p = es1["p_value"] overall_ci = es1["conf_int"] # Sync nested bootstrap_results.overall_* to DID_1 only when # L_max == 1. When L_max >= 2, the cost-benefit delta overrides # overall_* later, so bootstrap_results.overall_* should stay # on the scalar DID_M bootstrap (or be overridden by delta logic). if ( L_max == 1 and bootstrap_results is not None and bootstrap_results.event_study_ses and 1 in bootstrap_results.event_study_ses ): bootstrap_results.overall_se = bootstrap_results.event_study_ses[1] bootstrap_results.overall_ci = ( bootstrap_results.event_study_cis[1] if bootstrap_results.event_study_cis and 1 in bootstrap_results.event_study_cis else (np.nan, np.nan) ) # Clear the DID_M distribution - it doesn't match the # DID_1 summary statistics. The per-horizon bootstrap # stats are accessible via event_study_ses/cis/p_values. 
bootstrap_results.bootstrap_distribution = None bootstrap_results.overall_p_value = ( bootstrap_results.event_study_p_values[1] if bootstrap_results.event_study_p_values and 1 in bootstrap_results.event_study_p_values else np.nan ) # Phase 2: override overall_att with cost-benefit delta when L_max > 1 effective_overall_att = overall_att effective_overall_se = overall_se effective_overall_t = overall_t effective_overall_p = overall_p effective_overall_ci = overall_ci if cost_benefit_result is not None and L_max is not None and L_max >= 2: delta_val = cost_benefit_result["delta"] if not np.isfinite(delta_val): # Delta is non-estimable (e.g., no eligible switchers at # any horizon). Set all overall_* to NaN rather than # silently falling back to the Phase 1 DID_M values, # since the results surface labels them as delta. effective_overall_att = float("nan") effective_overall_se = float("nan") effective_overall_t = float("nan") effective_overall_p = float("nan") effective_overall_ci = (float("nan"), float("nan")) else: effective_overall_att = delta_val # Cost-benefit delta SE: compute from per-horizon bootstrap # distributions if available (delta = sum w_l * DID_l, so # delta_b = sum w_l * DID_l_b for each bootstrap rep). # Delta-method SE: Var(delta) = sum w_l^2 * Var(DID_l) # (treating horizons as independent, conservative under # Assumption 8). Works on both analytical and bootstrap # SEs since event_study_effects[l]["se"] holds whichever # was propagated. # Require ALL positively-weighted horizons to have finite # SE. If any has NaN, delta SE is NaN (NaN-consistent # inference contract: no partial aggregation). 
weights = cost_benefit_result.get("weights", {}) var_delta = 0.0 all_finite = True for l_w, w_l in weights.items(): if w_l <= 0: continue se_l = event_study_effects.get(l_w, {}).get("se", float("nan")) if not np.isfinite(se_l): all_finite = False break var_delta += (w_l * se_l) ** 2 delta_se = ( float(np.sqrt(var_delta)) if all_finite and var_delta > 0 else float("nan") ) if np.isfinite(delta_se): effective_overall_se = delta_se effective_overall_t, effective_overall_p, effective_overall_ci = safe_inference( delta_val, delta_se, alpha=self.alpha, df=_inference_df(_df_survey, resolved_survey), ) else: effective_overall_se = float("nan") effective_overall_t = float("nan") effective_overall_p = float("nan") effective_overall_ci = (float("nan"), float("nan")) # Phase 2: build placebo_event_study with negative keys. # Use analytical SE from placebo IF (computed above), with # bootstrap override when available. placebo_event_study_dict: Optional[Dict[int, Dict[str, Any]]] = None if multi_horizon_placebos is not None: placebo_event_study_dict = {} for lag_l, pl_data in multi_horizon_placebos.items(): if pl_data["N_pl_l"] > 0: # Pull analytical SE from placebo IF computation if placebo_horizon_inference is not None and lag_l in placebo_horizon_inference: inf = placebo_horizon_inference[lag_l] placebo_event_study_dict[-lag_l] = { "effect": inf["effect"], "se": inf["se"], "t_stat": inf["t_stat"], "p_value": inf["p_value"], "conf_int": inf["conf_int"], "n_obs": inf["n_obs"], } else: # Fallback: NaN SE (Phase 1 path or missing IF) pl_se = float("nan") pl_t, pl_p, pl_ci = safe_inference( pl_data["placebo_l"], pl_se, alpha=self.alpha, df=_inference_df(_df_survey, resolved_survey), ) placebo_event_study_dict[-lag_l] = { "effect": pl_data["placebo_l"], "se": pl_se, "t_stat": pl_t, "p_value": pl_p, "conf_int": pl_ci, "n_obs": pl_data["N_pl_l"], } else: placebo_event_study_dict[-lag_l] = { "effect": float("nan"), "se": float("nan"), "t_stat": float("nan"), "p_value": float("nan"), 
"conf_int": (float("nan"), float("nan")), "n_obs": 0, } # Propagate bootstrap results to placebo_event_study (must run # after placebo_event_study_dict is assembled above). if ( bootstrap_results is not None and bootstrap_results.placebo_horizon_ses and placebo_event_study_dict is not None ): for lag_l in bootstrap_results.placebo_horizon_ses: neg_key = -lag_l if neg_key in placebo_event_study_dict: bs_se = bootstrap_results.placebo_horizon_ses.get(lag_l) bs_ci = ( bootstrap_results.placebo_horizon_cis.get(lag_l) if bootstrap_results.placebo_horizon_cis else None ) bs_p = ( bootstrap_results.placebo_horizon_p_values.get(lag_l) if bootstrap_results.placebo_horizon_p_values else None ) # Same bootstrap-contract rule as overall / joiners / # leavers / event_study_effects / path_effects above: # once the caller opts into n_bootstrap > 0, the # bootstrap output replaces analytical inference on # this surface regardless of outcome. Non-finite # bootstrap SE writes NaN to the full inference tuple # rather than silently leaving analytical values in # place — that would mix bootstrap-contract and # analytical-contract semantics in the same rendered # output (dynamic placebo rows appear in # `results.to_dataframe(level="event_study")` alongside # positive-horizon entries). 
eff = placebo_event_study_dict[neg_key]["effect"] if bs_se is not None and np.isfinite(bs_se): placebo_event_study_dict[neg_key]["se"] = bs_se placebo_event_study_dict[neg_key]["p_value"] = ( bs_p if bs_p is not None else np.nan ) placebo_event_study_dict[neg_key]["conf_int"] = ( bs_ci if bs_ci is not None else (np.nan, np.nan) ) placebo_event_study_dict[neg_key]["t_stat"] = safe_inference( eff, bs_se, alpha=self.alpha, df=_inference_df(_df_survey, resolved_survey), )[0] else: placebo_event_study_dict[neg_key]["se"] = np.nan placebo_event_study_dict[neg_key]["p_value"] = np.nan placebo_event_study_dict[neg_key]["conf_int"] = ( np.nan, np.nan, ) placebo_event_study_dict[neg_key]["t_stat"] = np.nan # Phase 2: build normalized_effects with SE normalized_effects_out: Optional[Dict[int, Dict[str, Any]]] = None if normalized_effects_dict is not None and multi_horizon_se is not None: normalized_effects_out = {} for l_h, n_data in normalized_effects_dict.items(): denom = n_data["denominator"] eff = n_data["effect"] # SE via delta method: SE(DID^n_l) = SE(DID_l) / delta^D_l se_did_l = multi_horizon_se.get(l_h, float("nan")) se_norm = se_did_l / denom if np.isfinite(denom) and denom > 0 else float("nan") t_n, p_n, ci_n = safe_inference( eff, se_norm, alpha=self.alpha, df=_inference_df(_df_survey, resolved_survey) ) normalized_effects_out[l_h] = { "effect": eff, "se": se_norm, "t_stat": t_n, "p_value": p_n, "conf_int": ci_n, "denominator": denom, } # ------------------------------------------------------------------ # DID^{fd} cumulation: recover level effects from second-differences # # DID^{fd}_l identifies delta_{g,l} - delta_{g,l-1} (Lemma 6). # Cumulate per-group: for each group eligible at horizon l, # sum DID^{fd}_{g,l'} for l'=1..l, then average over that # eligible set. This matches R's did_multiplegt_dyn which # cumulates per-group then aggregates (NOT sum-of-aggregates, # which mixes different eligible populations). 
        # ------------------------------------------------------------------
        if _is_trends_linear and multi_horizon_dids is not None:
            cumulated = {}
            n_groups_total = D_mat.shape[0]
            # Accumulate per-group running sum of DID^{fd}_{g,l'}
            running_per_group = np.zeros(n_groups_total)
            for l_h in range(1, (L_max or 0) + 1):
                if l_h not in multi_horizon_dids:
                    continue
                mh = multi_horizon_dids[l_h]
                did_g_l = mh["did_g_l"]  # (n_groups,) per-group DID
                eligible = mh["eligible_mask"]  # (n_groups,) bool
                N_l = mh["N_l"]
                if N_l == 0:
                    continue
                # Add this horizon's per-group DID to running sum
                # (NaN for ineligible groups; use 0 for accumulation)
                increment = np.where(np.isfinite(did_g_l), did_g_l, 0.0)
                running_per_group += increment

                # Average the cumulated sum over groups eligible at THIS horizon
                # Weight by S_g (switch direction) and divide by N_l
                S_arr = switch_direction_arr.astype(float)
                cum_effect = float(np.sum(S_arr[eligible] * running_per_group[eligible]) / N_l)

                # SE: conservative upper bound (sum of per-horizon SEs).
                # NaN-consistency: if ANY component SE up to horizon l is
                # non-finite, the cumulated SE is NaN (not 0.0).
                if event_study_effects is not None:
                    component_ses = [
                        event_study_effects.get(ll, {}).get("se", np.nan)
                        for ll in range(1, l_h + 1)
                    ]
                    if all(np.isfinite(s) for s in component_ses):
                        running_se_ub = sum(component_ses)
                    else:
                        running_se_ub = float("nan")
                else:
                    running_se_ub = float("nan")

                cum_t, cum_p, cum_ci = safe_inference(
                    cum_effect,
                    running_se_ub,
                    alpha=self.alpha,
                    df=_inference_df(_df_survey, resolved_survey),
                )
                cumulated[l_h] = {
                    "effect": cum_effect,
                    "se": running_se_ub,
                    "t_stat": cum_t,
                    "p_value": cum_p,
                    "conf_int": cum_ci,
                }
            linear_trends_effects = cumulated if cumulated else None

        # When trends_linear=True and L_max>=2, suppress cost_benefit_delta
        # and NaN out the overall_* surface.
        # R's did_multiplegt_dyn with
        # trends_lin=TRUE does not compute an aggregate "average total
        # effect" - users should access cumulated level effects via
        # results.linear_trends_effects[l] instead.
        if _is_trends_linear and L_max is not None and L_max >= 2:
            cost_benefit_result = None
            effective_overall_att = float("nan")
            effective_overall_se = float("nan")
            effective_overall_t = float("nan")
            effective_overall_p = float("nan")
            effective_overall_ci = (float("nan"), float("nan"))

        # ------------------------------------------------------------------
        # Heterogeneity testing (Web Appendix Section 1.5, Lemma 7)
        # ------------------------------------------------------------------
        heterogeneity_effects: Optional[Dict[int, Dict[str, Any]]] = None
        if heterogeneity is not None:
            if L_max is None:
                raise ValueError(
                    "heterogeneity testing requires L_max >= 1. Set L_max "
                    "to use the per-group DID_{g,l} path."
                )
            het_col = str(heterogeneity)
            if het_col not in data.columns:
                raise ValueError(f"heterogeneity column {het_col!r} not found in data.")
            # R's predict_het disallows controls; our partial implementation
            # follows this restriction to avoid inconsistent behavior.
            if controls is not None:
                raise ValueError(
                    "heterogeneity cannot be combined with controls. "
                    "R's did_multiplegt_dyn disallows predict_het with "
                    "controls; remove one of the two options."
                )
            if _is_trends_linear:
                raise ValueError(
                    "heterogeneity cannot be combined with trends_linear. "
                    "The heterogeneity test operates on level outcome "
                    "changes but trends_linear uses second-differenced "
                    "outcomes; the results would be inconsistent."
                )
            if trends_nonparam is not None:
                raise ValueError(
                    "heterogeneity cannot be combined with trends_nonparam. "
                    "The heterogeneity test does not thread state-set "
                    "control-pool restrictions; the results would be "
                    "inconsistent with the fitted estimator."
                )

            # Extract per-group covariate (must be time-invariant).
            # SurveyDesign.subpopulation() contract: scope time-invariance
            # check to positive-weight rows so excluded obs with NaN/varying
            # het values do not abort the fit.
            if survey_weights is not None:
                pos_mask_het = np.asarray(survey_weights) > 0
                data_het = data.loc[pos_mask_het]
            else:
                data_het = data
            het_per_group = data_het.groupby(group)[het_col].nunique()
            het_varying = het_per_group[het_per_group > 1]
            if len(het_varying) > 0:
                raise ValueError(
                    f"heterogeneity column {het_col!r} must be "
                    f"time-invariant within each group. "
                    f"{len(het_varying)} group(s) have varying values."
                )
            het_map = data_het.groupby(group)[het_col].first()
            X_het = np.array([float(het_map.loc[g]) for g in all_groups])

            # Use original Y_mat (not first-differenced) for heterogeneity
            # test, since it operates on level differences Y[out] - Y[ref].
            # When trends_linear, the DID^{fd} second-differences are in
            # event_study_effects but the het test uses level outcomes.
            Y_het = Y_mat if not _is_trends_linear else y_pivot.to_numpy()
            N_het = N_mat_orig

            heterogeneity_effects = _compute_heterogeneity_test(
                Y_mat=Y_het,
                N_mat=N_het,
                baselines=baselines,
                first_switch_idx=first_switch_idx_arr,
                switch_direction=switch_direction_arr,
                T_g=T_g_arr,
                X_het=X_het,
                L_max=L_max,
                alpha=self.alpha,
                rank_deficient_action=self.rank_deficient_action,
                group_ids_order=np.array(all_groups),
                obs_survey_info=_obs_survey_info,
                replicate_n_valid_list=_replicate_n_valid_list,
            )

        # Per-path heterogeneity (mirrors R `did_multiplegt_dyn(...,
        # by_path, predict_het)` per-by_level dispatch). Empty-state
        # contract: None when not requested (no `heterogeneity` kwarg
        # or no `by_path`/`paths_of_interest` selector); `{}` when
        # requested but no path is observed (mirrors `path_effects`).
        path_heterogeneity_effects: Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]] = (
            None
        )
        if heterogeneity is not None and (
            self.by_path is not None or self.paths_of_interest is not None
        ):
            path_heterogeneity_effects = _compute_path_heterogeneity_test(
                Y_mat=Y_het,
                N_mat=N_het,
                baselines=baselines,
                first_switch_idx=first_switch_idx_arr,
                switch_direction=switch_direction_arr,
                T_g=T_g_arr,
                X_het=X_het,
                L_max=L_max,
                by_path=self.by_path,
                paths_of_interest=self.paths_of_interest,
                D_mat=D_mat,
                alpha=self.alpha,
                rank_deficient_action=self.rank_deficient_action,
                group_ids_order=np.array(all_groups),
                obs_survey_info=_obs_survey_info,
                replicate_n_valid_list=_replicate_n_valid_list,
            )

        twfe_weights_df = None
        twfe_fraction_negative = None
        twfe_sigma_fe = None
        twfe_beta_fe = None
        if twfe_diagnostic_payload is not None:
            twfe_weights_df = twfe_diagnostic_payload.weights
            twfe_fraction_negative = twfe_diagnostic_payload.fraction_negative
            twfe_sigma_fe = twfe_diagnostic_payload.sigma_fe
            twfe_beta_fe = twfe_diagnostic_payload.beta_fe

        # When L_max >= 1, the overall estimand is per-group DID_1
        # (not per-period DID_M). The joiner/leaver decomposition is a
        # per-period DID_M concept and can differ from DID_1 on mixed
        # panels, so it's suppressed for all L_max >= 1 cases. N_S and
        # n_treated_obs are updated from the per-group path.
effective_N_S = N_S effective_n_treated = n_treated_obs_post effective_joiners_available = joiners_available effective_leavers_available = leavers_available if ( L_max is not None and L_max >= 1 and multi_horizon_dids is not None and 1 in multi_horizon_dids ): # Use horizon-1 eligible switcher count as the effective N_S effective_N_S = multi_horizon_dids[1]["N_l"] if not is_binary: # For non-binary: count all observations where treatment # differs from baseline effective_n_treated = ( int(N_mat[D_mat != D_mat[:, 0:1]].sum()) if D_mat.shape[1] > 1 else 0 ) if not is_binary: # Suppress binary-only Phase 1 artifacts on non-binary # panels: per_period_effects and single-period placebo # are DID_M concepts that don't apply to non-binary data. per_period_effects = {} placebo_effect = float("nan") placebo_se = float("nan") placebo_t = float("nan") placebo_p = float("nan") placebo_ci = (float("nan"), float("nan")) placebo_available = False # Suppress joiner/leaver decomposition for all L_max >= 1 # (the decomposition is a per-period DID_M concept, not # applicable to the per-group DID_1 estimand) effective_joiners_available = False effective_leavers_available = False # R2 P1b: Finalize replicate-df propagation across public # surfaces. When heterogeneity is active, its n_valid is # appended to `_replicate_n_valid_list` AFTER the main # surfaces (overall / joiners / leavers / event study / # placebo horizon) have already computed their t/p/CI fields # with an intermediate `_df_survey`. If heterogeneity reports # the smallest n_valid, the main-surface inference would be # anti-conservative relative to `survey_metadata.df_survey` # and HonestDiD. Re-run safe_inference with the FINAL # effective df so every surface agrees. 
_final_eff_df = _effective_df_survey(resolved_survey, _replicate_n_valid_list) if _replicate_n_valid_list: _final_inf_df = _inference_df(_final_eff_df, resolved_survey) # Recompute `effective_overall_*` directly — that's what # ships in results at line ~2776+. `effective_overall_att/se` # may differ from the raw `overall_att/se` under the delta # (cost-benefit) path at L_max >= 2; recomputing from # `effective_*` ensures both paths get the final df. if np.isfinite(effective_overall_se): effective_overall_t, effective_overall_p, effective_overall_ci = safe_inference( effective_overall_att, effective_overall_se, alpha=self.alpha, df=_final_inf_df, ) # Keep `overall_*` in sync for any downstream code that # reads them directly (e.g., placebo_event_study_dict # construction flows from overall_*). overall_t, overall_p, overall_ci = safe_inference( overall_att, overall_se, alpha=self.alpha, df=_final_inf_df, ) if joiners_available: joiners_t, joiners_p, joiners_ci = safe_inference( joiners_att, joiners_se, alpha=self.alpha, df=_final_inf_df, ) if leavers_available: leavers_t, leavers_p, leavers_ci = safe_inference( leavers_att, leavers_se, alpha=self.alpha, df=_final_inf_df, ) if multi_horizon_inference is not None: for _lag_r2, _info_r2 in list(multi_horizon_inference.items()): _t_r2, _p_r2, _ci_r2 = safe_inference( _info_r2["effect"], _info_r2["se"], alpha=self.alpha, df=_final_inf_df, ) _info_r2["t_stat"] = _t_r2 _info_r2["p_value"] = _p_r2 _info_r2["conf_int"] = _ci_r2 if placebo_horizon_inference is not None: for _lag_r2, _info_r2 in list(placebo_horizon_inference.items()): _t_r2, _p_r2, _ci_r2 = safe_inference( _info_r2["effect"], _info_r2["se"], alpha=self.alpha, df=_final_inf_df, ) _info_r2["t_stat"] = _t_r2 _info_r2["p_value"] = _p_r2 _info_r2["conf_int"] = _ci_r2 # `placebo_event_study_dict` holds VALUE copies # (not shared references) of the inner dicts, so # mutating `placebo_horizon_inference[lag]` above # does NOT propagate to the public surface. 
Update # the negative-key mirror explicitly so the # recomputed t/p/CI ship in results. if ( placebo_event_study_dict is not None and -_lag_r2 in placebo_event_study_dict ): placebo_event_study_dict[-_lag_r2]["t_stat"] = _t_r2 placebo_event_study_dict[-_lag_r2]["p_value"] = _p_r2 placebo_event_study_dict[-_lag_r2]["conf_int"] = _ci_r2 if heterogeneity_effects: for _lag_r2, _info_r2 in list(heterogeneity_effects.items()): if np.isfinite(_info_r2["se"]): _t_r2, _p_r2, _ci_r2 = safe_inference( _info_r2["beta"], _info_r2["se"], alpha=self.alpha, df=_final_inf_df, ) _info_r2["t_stat"] = _t_r2 _info_r2["p_value"] = _p_r2 _info_r2["conf_int"] = _ci_r2 # Per-path heterogeneity (Wave 5 #11): per-(path, l) entries # snapshot df_inference at compute-time. Refresh with final df # so t/p/CI match `survey_metadata.df_survey`. Schema differs # from per-path event-study (`{path: {l: ...}}` vs # `{path: {"horizons": {l: ...}}}`), so inline loop here # rather than reusing `_refresh_path_inference`. if path_heterogeneity_effects: for _path_r2, _horizons_r2 in list(path_heterogeneity_effects.items()): for _l_r2, _info_r2 in list(_horizons_r2.items()): if np.isfinite(_info_r2["se"]): _t_r2, _p_r2, _ci_r2 = safe_inference( _info_r2["beta"], _info_r2["se"], alpha=self.alpha, df=_final_inf_df, ) _info_r2["t_stat"] = _t_r2 _info_r2["p_value"] = _p_r2 _info_r2["conf_int"] = _ci_r2 # Normalized effects: another public surface built with the # pre-heterogeneity `_df_survey`. Recompute inference with # the final df so t/p/CI match the other surfaces (and the # NaN contract when the final df becomes undefined). 
if normalized_effects_out is not None: for _lag_r2, _info_r2 in list(normalized_effects_out.items()): _t_r2, _p_r2, _ci_r2 = safe_inference( _info_r2["effect"], _info_r2["se"], alpha=self.alpha, df=_final_inf_df, ) _info_r2["t_stat"] = _t_r2 _info_r2["p_value"] = _p_r2 _info_r2["conf_int"] = _ci_r2 # Per-path event-study and placebo surfaces: their helpers # snapshotted df_inference BEFORE appending their own n_valid # contributions, and the global event-study / placebo / # heterogeneity / overall / joiners / leavers IF sites # appended their n_valid AFTER per-path runs. Refresh per-path # inference with the final df so it agrees with the global # surfaces and `survey_metadata.df_survey`. _refresh_path_inference( path_effects=path_effects, path_placebos=path_placebos, alpha=self.alpha, df_final=_final_inf_df, ) # Persist the final effective df_survey into survey_metadata so # downstream consumers — HonestDiD bounds (honest_did.py:973 # reads results.survey_metadata.df_survey), exported metadata, # and users — all see the same df that the recomputed # inference above used. SurveyMetadata is a mutable @dataclass # (diff_diff/survey.py:681), so direct attribute assignment is # safe. 
if survey_metadata is not None: survey_metadata.df_survey = _final_eff_df results = ChaisemartinDHaultfoeuilleResults( overall_att=effective_overall_att, overall_se=effective_overall_se, overall_t_stat=effective_overall_t, overall_p_value=effective_overall_p, overall_conf_int=effective_overall_ci, joiners_att=joiners_att if effective_joiners_available else float("nan"), joiners_se=joiners_se if effective_joiners_available else float("nan"), joiners_t_stat=joiners_t if effective_joiners_available else float("nan"), joiners_p_value=joiners_p if effective_joiners_available else float("nan"), joiners_conf_int=( joiners_ci if effective_joiners_available else (float("nan"), float("nan")) ), n_joiner_cells=n_joiner_cells if effective_joiners_available else 0, n_joiner_obs=n_joiner_obs if effective_joiners_available else 0, joiners_available=effective_joiners_available, leavers_att=leavers_att if effective_leavers_available else float("nan"), leavers_se=leavers_se if effective_leavers_available else float("nan"), leavers_t_stat=leavers_t if effective_leavers_available else float("nan"), leavers_p_value=leavers_p if effective_leavers_available else float("nan"), leavers_conf_int=( leavers_ci if effective_leavers_available else (float("nan"), float("nan")) ), n_leaver_cells=n_leaver_cells if effective_leavers_available else 0, n_leaver_obs=n_leaver_obs if effective_leavers_available else 0, leavers_available=effective_leavers_available, placebo_effect=placebo_effect, placebo_se=placebo_se, placebo_t_stat=placebo_t, placebo_p_value=placebo_p, placebo_conf_int=placebo_ci, placebo_available=placebo_available, per_period_effects=per_period_effects, groups=all_groups, time_periods=all_periods, n_obs=n_obs_post, n_treated_obs=effective_n_treated, n_switcher_cells=effective_N_S, n_cohorts=n_cohorts, n_groups_dropped_crossers=n_groups_dropped_crossers, n_groups_dropped_singleton_baseline=n_groups_dropped_singleton_baseline, 
n_groups_dropped_never_switching=n_groups_dropped_never_switching, event_study_effects=event_study_effects, L_max=L_max, placebo_event_study=placebo_event_study_dict, twfe_weights=twfe_weights_df, twfe_fraction_negative=twfe_fraction_negative, twfe_sigma_fe=twfe_sigma_fe, twfe_beta_fe=twfe_beta_fe, alpha=self.alpha, normalized_effects=normalized_effects_out, cost_benefit_delta=cost_benefit_result, sup_t_bands=( { "crit_value": bootstrap_results.cband_crit_value, "alpha": self.alpha, "n_bootstrap": self.n_bootstrap, "method": "multiplier_bootstrap", } if bootstrap_results is not None and bootstrap_results.cband_crit_value is not None else None ), bootstrap_results=bootstrap_results, covariate_residuals=( _build_covariate_diagnostics_df(covariate_diagnostics, controls) if covariate_diagnostics is not None else None ), linear_trends_effects=linear_trends_effects, trends_linear=_is_trends_linear, heterogeneity_effects=heterogeneity_effects, path_heterogeneity_effects=path_heterogeneity_effects, design2_effects=( _compute_design2_effects( D_mat=D_mat, # Design-2 always uses raw level outcomes (not residualized, # not first-differenced). Use y_pivot as the canonical raw source. Y_mat=y_pivot.to_numpy(), N_mat=N_mat_orig, baselines=baselines, first_switch_idx=first_switch_idx_arr, switch_direction=switch_direction_arr, T_g=T_g_arr, L_max=L_max if L_max is not None else 1, ) if design2 else None ), path_effects=path_effects, path_placebo_event_study=path_placebos, path_cumulated_event_study=path_cumulated_event_study, path_sup_t_bands=( # When by_path + n_bootstrap > 0 is active, surface a # dict (possibly empty) — preserving the documented # `None` (not requested) vs `{}` (requested but empty) # contract that mirrors `path_effects` / `path_placebo_ # event_study` empty-state behavior. The empty case # arises in two ways: # 1. 
`path_effects == {}` — no observed path has a # complete window; the per-path bootstrap collector # is skipped upstream and `path_cband_crit_values` # stays `None`. We materialize `{}` here. # 2. Bootstrap ran but no path passed both gates # (>=2 valid horizons AND a strict majority — more # than 50% — of finite sup-t draws); # `path_cband_crit_values == {}` — passes through. { path_key: { "crit_value": crit, "alpha": self.alpha, "n_bootstrap": self.n_bootstrap, "method": "multiplier_bootstrap", "n_valid_horizons": ( bootstrap_results.path_cband_n_valid_horizons.get(path_key, 0) if bootstrap_results is not None and bootstrap_results.path_cband_n_valid_horizons is not None else 0 ), } for path_key, crit in ( bootstrap_results.path_cband_crit_values if bootstrap_results is not None and bootstrap_results.path_cband_crit_values is not None else {} ).items() if np.isfinite(crit) } if ( (self.by_path is not None or self.paths_of_interest is not None) and self.n_bootstrap > 0 ) else None ), survey_metadata=survey_metadata, _estimator_ref=self, ) # ------------------------------------------------------------------ # HonestDiD integration (when honest_did=True) # ------------------------------------------------------------------ if honest_did and results.placebo_event_study: try: from diff_diff.honest_did import compute_honest_did results.honest_did_results = compute_honest_did( results, method="relative_magnitude", M=1.0, alpha=self.alpha, ) except (ValueError, np.linalg.LinAlgError) as exc: warnings.warn( f"HonestDiD computation failed ({type(exc).__name__}): " f"{exc}. results.honest_did_results will be None. " f"You can retry with compute_honest_did(results, ...) " f"using different parameters.", UserWarning, stacklevel=2, ) results.honest_did_results = None self.results_ = results self.is_fitted_ = True return results
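

# -----------------------------------------------------------------------------
# Illustrative sketch (not part of the library API). The cost-benefit delta
# block in ``fit()`` above aggregates per-horizon SEs as
# Var(delta) = sum_l (w_l * SE_l)^2 and yields NaN when any positively
# weighted horizon lacks a finite SE. The standalone function below restates
# that rule on plain dicts so it can be checked by hand. The name
# ``_sketch_delta_se`` and its two-dict signature are assumptions made for
# this example only; ``fit()`` computes the same quantity inline.
# -----------------------------------------------------------------------------
import math
from typing import Dict


def _sketch_delta_se(weights: Dict[int, float], horizon_ses: Dict[int, float]) -> float:
    """Return sqrt(sum_l (w_l * se_l)^2), or NaN if a needed SE is non-finite."""
    var_delta = 0.0
    for l_w, w_l in weights.items():
        if w_l <= 0:
            # Horizons with non-positive weight do not enter the sum.
            continue
        se_l = horizon_ses.get(l_w, float("nan"))
        if not math.isfinite(se_l):
            # NaN-consistent contract: no partial aggregation.
            return float("nan")
        var_delta += (w_l * se_l) ** 2
    return math.sqrt(var_delta) if var_delta > 0 else float("nan")


# Usage with hypothetical numbers: weights {1: 0.5, 2: 0.5} and SEs
# {1: 0.6, 2: 0.8} give sqrt(0.3**2 + 0.4**2) = 0.5. Replacing either SE
# with NaN, or omitting it, gives NaN.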
# =============================================================================
# Module-level helpers
# =============================================================================


def _check_forward_compat_gates(
    aggregate: Optional[str],
    L_max: Optional[int],
    controls: Optional[List[str]],
    trends_linear: Optional[bool],
    trends_nonparam: Any,
    honest_did: bool,
) -> None:
    """Raise ``NotImplementedError`` for any non-default Phase 3 parameter.

    Phase 2 parameters (``L_max``) are validated inline in ``fit()`` after
    period detection. The ``aggregate`` parameter is still reserved for
    Phase 3.
    """
    if aggregate is not None:
        raise NotImplementedError(
            f"aggregate={aggregate!r} is reserved for Phase 3 of dCDH. "
            "Multi-horizon event study effects are computed automatically "
            "when L_max is set. See ROADMAP.md Phase 3."
        )
    # L_max is validated inline in fit() after period detection (needs
    # the period count). Not gated here.
    # controls gate lifted - DID^X covariate residualization implemented.
    # Validation (L_max >= 1 required) is in fit() after L_max detection.
    # trends_linear gate lifted - DID^{fd} linear trends implemented.
    # Validation (L_max >= 1, n_periods >= 3 required) is in fit().
    # trends_nonparam gate lifted - state-set trends implemented.
    # Validation (L_max >= 1, column exists, time-invariant) is in fit().
    # honest_did gate lifted - integration implemented.
    # Validation (L_max >= 1 required) is in fit() after L_max detection.


def _drop_crossing_cells(
    cell: pd.DataFrame, group_col: str, d_col: str
) -> Tuple[pd.DataFrame, int]:
    """
    Drop groups with more than one treatment-change period.

    The dCDH estimator uses the **first treatment change** (``F_g``) as
    the cohort marker for both the per-group building block ``DID_{g,l}``
    and the variance computation.
    Groups with a second treatment change at a later period would confound
    the multi-horizon estimates because ``DID_{g,l}`` attributes the full
    outcome change from ``F_g-1`` to ``F_g-1+l`` to the first switch, while
    the second switch also contributes to that outcome change.

    For binary treatment, >1 change means a reversal (e.g., 0->1->0).
    For non-binary, >1 change includes both reversals (0->2->1) and
    monotone multi-step paths (0->1->2). Both are dropped because the
    dCDH framework requires a single treatment-change event per group.
    A single jump of any magnitude (e.g., 0->3->3->3) has exactly 1
    change period and is kept.

    Parameters
    ----------
    cell : pd.DataFrame
        Cell-level dataset with columns for ``group_col`` and ``d_col``.
        Must be sorted by group and time.
    group_col : str
        Group column name.
    d_col : str
        Treatment column name.

    Returns
    -------
    filtered : pd.DataFrame
        Subset of ``cell`` with all multi-switch groups removed.
    n_dropped : int
        Number of groups dropped.
    """
    # Count the number of periods with non-zero treatment changes per
    # group. A group with > 1 such period has changed treatment more
    # than once (multi-switch). This generalizes correctly to non-binary
    # treatment: a single jump 0->3 has 1 non-zero diff, while 0->1->0
    # has 2 non-zero diffs.
    diffs = cell.groupby(group_col)[d_col].diff().fillna(0)
    n_changes = (diffs != 0).groupby(cell[group_col]).sum()
    multi_switch_groups = n_changes[n_changes > 1].index.tolist()
    n_dropped = len(multi_switch_groups)
    if n_dropped > 0:
        warnings.warn(
            f"drop_larger_lower=True dropped {n_dropped} multi-switch group(s) "
            f"matching R DIDmultiplegtDYN behavior. Examples: "
            f"{multi_switch_groups[:5]}"
            + (f" (and {n_dropped - 5} more)" if n_dropped > 5 else ""),
            UserWarning,
            stacklevel=3,
        )
        cell = cell[~cell[group_col].isin(multi_switch_groups)].reset_index(drop=True)
    return cell, n_dropped


def _compute_per_period_dids(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    periods: List[Any],
) -> Tuple[
    Dict[Any, Dict[str, Any]],
    List[str],
    np.ndarray,
    np.ndarray,
    np.ndarray,
    np.ndarray,
    np.ndarray,
    np.ndarray,
    np.ndarray,
    np.ndarray,
]:
    """
    Compute per-period DID_+,t and DID_-,t with explicit A11 zero-retention.

    Returns
    -------
    per_period_effects : dict
        Keyed by period; values are full per-period dicts including the
        ``did_*_t_a11_zeroed`` flags.
    a11_warnings : list of str
        One string per period that triggered an A11 violation.
    did_plus_t_arr : np.ndarray
        DID_+,t values aligned to ``periods[1:]``.
    did_minus_t_arr : np.ndarray
        DID_-,t values aligned to ``periods[1:]``.
    n_10_t_arr : np.ndarray
        Joiner cell counts aligned to ``periods[1:]``.
    n_01_t_arr : np.ndarray
        Leaver cell counts aligned to ``periods[1:]``.
    n_00_t_arr : np.ndarray
        Stable-untreated cell counts aligned to ``periods[1:]``.
    n_11_t_arr : np.ndarray
        Stable-treated cell counts aligned to ``periods[1:]``.
    a11_plus_zeroed_arr : np.ndarray
        Boolean flags marking periods where DID_+,t was zeroed by the A11
        convention (joiners present but no stable_0 controls).
    a11_minus_zeroed_arr : np.ndarray
        Mirror for DID_-,t.
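
    Examples
    --------
    A minimal sketch on a toy panel (all values are made up for
    illustration): group 0 joins treatment in 2001 while groups 1 and 2
    stay untreated, so ``DID_+,t`` is the joiner's outcome change minus
    the mean change of the two stable-untreated groups.

    >>> import numpy as np
    >>> D = np.array([[0, 1], [0, 0], [0, 0]])
    >>> Y = np.array([[1.0, 4.0], [1.0, 2.0], [3.0, 4.0]])
    >>> N = np.ones((3, 2), dtype=int)
    >>> out = _compute_per_period_dids(D, Y, N, periods=[2000, 2001])
    >>> per_period, a11_warnings = out[0], out[1]
    >>> per_period[2001]["did_plus_t"]
    2.0
    >>> (per_period[2001]["n_10_t"], per_period[2001]["n_00_t"])
    (1, 2)
    >>> a11_warnings
    []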
""" n_periods = len(periods) per_period_effects: Dict[Any, Dict[str, Any]] = {} a11_warnings: List[str] = [] did_plus_t_list: List[float] = [] did_minus_t_list: List[float] = [] n_10_t_list: List[int] = [] n_01_t_list: List[int] = [] n_00_t_list: List[int] = [] n_11_t_list: List[int] = [] a11_plus_zeroed_list: List[bool] = [] a11_minus_zeroed_list: List[bool] = [] for t_idx in range(1, n_periods): d_curr = D_mat[:, t_idx] d_prev = D_mat[:, t_idx - 1] y_curr = Y_mat[:, t_idx] y_prev = Y_mat[:, t_idx - 1] n_curr = N_mat[:, t_idx] # Cell-presence guard: a (g, t) cell only counts if BOTH t and t-1 # were observed for that group (n_gt > 0 and n_{g,t-1} > 0). n_prev = N_mat[:, t_idx - 1] present = (n_curr > 0) & (n_prev > 0) joiner_mask = (d_prev == 0) & (d_curr == 1) & present stable0_mask = (d_prev == 0) & (d_curr == 0) & present leaver_mask = (d_prev == 1) & (d_curr == 0) & present stable1_mask = (d_prev == 1) & (d_curr == 1) & present # AER 2020 Theorem 3 N_{a,b,t} weights are CELL counts, not # within-cell observation sums. Each (g, t) cell contributes once # regardless of how many original observations fed into the # y_gt cell mean. See REGISTRY.md ChaisemartinDHaultfoeuille # estimator equations. 
        n_10 = int(joiner_mask.sum())
        n_00 = int(stable0_mask.sum())
        n_01 = int(leaver_mask.sum())
        n_11 = int(stable1_mask.sum())

        # --- DID_+,t (joiners side) ---
        did_plus_t_a11_zeroed = False
        if n_10 == 0:
            did_plus_t = 0.0
        elif n_00 == 0:
            # A11 violation: joiners exist but no stable_0 controls
            did_plus_t = 0.0
            did_plus_t_a11_zeroed = True
            a11_warnings.append(f"period {periods[t_idx]}: joiners present, no stable_0")
        else:
            # Unweighted means over cells (each cell contributes equally)
            joiner_avg = float((y_curr[joiner_mask] - y_prev[joiner_mask]).mean())
            stable0_avg = float((y_curr[stable0_mask] - y_prev[stable0_mask]).mean())
            did_plus_t = joiner_avg - stable0_avg

        # --- DID_-,t (leavers side) ---
        did_minus_t_a11_zeroed = False
        if n_01 == 0:
            did_minus_t = 0.0
        elif n_11 == 0:
            did_minus_t = 0.0
            did_minus_t_a11_zeroed = True
            a11_warnings.append(f"period {periods[t_idx]}: leavers present, no stable_1")
        else:
            stable1_avg = float((y_curr[stable1_mask] - y_prev[stable1_mask]).mean())
            leaver_avg = float((y_curr[leaver_mask] - y_prev[leaver_mask]).mean())
            did_minus_t = stable1_avg - leaver_avg

        per_period_effects[periods[t_idx]] = {
            "did_plus_t": did_plus_t,
            "did_minus_t": did_minus_t,
            "n_10_t": n_10,
            "n_01_t": n_01,
            "n_00_t": n_00,
            "n_11_t": n_11,
            "did_plus_t_a11_zeroed": did_plus_t_a11_zeroed,
            "did_minus_t_a11_zeroed": did_minus_t_a11_zeroed,
        }
        did_plus_t_list.append(did_plus_t)
        did_minus_t_list.append(did_minus_t)
        n_10_t_list.append(n_10)
        n_01_t_list.append(n_01)
        n_00_t_list.append(n_00)
        n_11_t_list.append(n_11)
        a11_plus_zeroed_list.append(did_plus_t_a11_zeroed)
        a11_minus_zeroed_list.append(did_minus_t_a11_zeroed)

    return (
        per_period_effects,
        a11_warnings,
        np.array(did_plus_t_list, dtype=float),
        np.array(did_minus_t_list, dtype=float),
        np.array(n_10_t_list, dtype=int),
        np.array(n_01_t_list, dtype=int),
        np.array(n_00_t_list, dtype=int),
        np.array(n_11_t_list, dtype=int),
        np.array(a11_plus_zeroed_list, dtype=bool),
        np.array(a11_minus_zeroed_list, dtype=bool),
    )


def _compute_placebo(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    periods: List[Any],
) -> Optional[Tuple[float, bool, List[str]]]:
    """
    Compute the single-lag placebo DID_M^pl (AER 2020 placebo specification).

    Same logic as DID_M but evaluated on the pre-event difference
    ``Y_{g, t-1} - Y_{g, t-2}`` for cells with three-period histories.
    Requires ``T >= 3``.

    Mirrors the main path's A11 zero-retention machinery: when placebo
    joiners exist but no 3-period stable_0 controls do (or symmetrically
    for leavers/stable_1), the affected per-period contribution is set to
    zero AND a warning string is appended to ``placebo_a11_warnings``. The
    caller is responsible for surfacing the consolidated warning. The
    zero-retention preserves the period's switcher count in the placebo
    ``N_S^pl`` denominator, biasing the placebo toward zero in the
    offending direction (matching the paper's placebo convention).

    Returns
    -------
    None if ``T < 3`` or no qualifying cells. Otherwise a tuple
    ``(placebo_effect, True, placebo_a11_warnings)`` where
    ``placebo_a11_warnings`` is a list of one string per period that
    triggered an A11 violation in the placebo numerator.
    """
    n_periods = len(periods)
    if n_periods < 3:
        return None

    placebo_plus_per_t: List[float] = []
    placebo_minus_per_t: List[float] = []
    n_10_per_t: List[int] = []
    n_01_per_t: List[int] = []
    placebo_a11_warnings: List[str] = []

    for t_idx in range(2, n_periods):
        d_curr = D_mat[:, t_idx]
        d_prev = D_mat[:, t_idx - 1]
        d_pre_prev = D_mat[:, t_idx - 2]
        y_prev = Y_mat[:, t_idx - 1]
        y_pre_prev = Y_mat[:, t_idx - 2]

        # Cell-presence guard: a (g, t) cell only counts if all three
        # consecutive periods (t-2, t-1, t) were observed for the group.
        present = (N_mat[:, t_idx] > 0) & (N_mat[:, t_idx - 1] > 0) & (N_mat[:, t_idx - 2] > 0)

        # Joiners that have a 3-period history with stable D=0 in t-2 and t-1
        joiner_mask = (d_pre_prev == 0) & (d_prev == 0) & (d_curr == 1) & present
        # Stable_0 controls with stable D=0 in t-2 and t-1
        stable0_mask = (d_pre_prev == 0) & (d_prev == 0) & (d_curr == 0) & present
        # Mirror for leavers/stable_1 (3-period stable treatment then leave)
        leaver_mask = (d_pre_prev == 1) & (d_prev == 1) & (d_curr == 0) & present
        stable1_mask = (d_pre_prev == 1) & (d_prev == 1) & (d_curr == 1) & present

        # Placebo weights are CELL counts (matching Theorem 3 convention)
        n_10 = int(joiner_mask.sum())
        n_00 = int(stable0_mask.sum())
        n_01 = int(leaver_mask.sum())
        n_11 = int(stable1_mask.sum())

        # Joiners side: distinguish "no joiners" (natural zero) from
        # "joiners but no stable_0" (A11 violation, flagged + warned)
        if n_10 == 0:
            placebo_plus_t = 0.0
        elif n_00 == 0:
            placebo_plus_t = 0.0
            placebo_a11_warnings.append(
                f"period {periods[t_idx]}: placebo joiners present, no stable_0"
            )
        else:
            joiner_avg = float((y_prev[joiner_mask] - y_pre_prev[joiner_mask]).mean())
            stable0_avg = float((y_prev[stable0_mask] - y_pre_prev[stable0_mask]).mean())
            placebo_plus_t = joiner_avg - stable0_avg

        # Leavers side: symmetric A11 distinction
        if n_01 == 0:
            placebo_minus_t = 0.0
        elif n_11 == 0:
            placebo_minus_t = 0.0
            placebo_a11_warnings.append(
                f"period {periods[t_idx]}: placebo leavers present, no stable_1"
            )
        else:
            stable1_avg = float((y_prev[stable1_mask] - y_pre_prev[stable1_mask]).mean())
            leaver_avg = float((y_prev[leaver_mask] - y_pre_prev[leaver_mask]).mean())
            placebo_minus_t = stable1_avg - leaver_avg

        placebo_plus_per_t.append(placebo_plus_t)
        placebo_minus_per_t.append(placebo_minus_t)
        n_10_per_t.append(n_10)
        n_01_per_t.append(n_01)

    n_10_arr = np.array(n_10_per_t, dtype=int)
    n_01_arr = np.array(n_01_per_t, dtype=int)
    N_S_pl = int(n_10_arr.sum() + n_01_arr.sum())
    if N_S_pl == 0:
        return None

    placebo_effect = float(
        (n_10_arr @ np.array(placebo_plus_per_t) + n_01_arr @ np.array(placebo_minus_per_t))
        / N_S_pl
    )
    return placebo_effect, True, placebo_a11_warnings


# ======================================================================
# Phase 3: Covariate residualization helpers
# ======================================================================


def _compute_covariate_residualization(
    Y_mat: np.ndarray,
    X_cell: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    rank_deficient_action: str = "warn",
) -> Tuple[np.ndarray, Dict[str, Any], set]:
    """Residualize outcomes by partialling out covariates per baseline treatment.

    Implements ``DID^X`` from Web Appendix Section 1.2 of de Chaisemartin &
    D'Haultfoeuille (2024). For each baseline treatment value *d*, estimates
    ``theta_hat_d`` via OLS of first-differenced outcomes on first-differenced
    covariates with time FEs, restricted to not-yet-treated observations.
    Then residualizes at levels: ``Y_tilde[g,t] = Y[g,t] - X[g,t] @ theta_hat_d``.

    The level-residualization is equivalent to difference-residualization by
    the Frisch-Waugh-Lovell theorem, so all downstream DID computations (which
    use ``Y[g, out] - Y[g, ref]``) automatically produce the correct
    covariate-adjusted estimates.

    Parameters
    ----------
    Y_mat : np.ndarray, shape (n_groups, n_periods)
        Cell-level outcome means.
    X_cell : np.ndarray, shape (n_groups, n_periods, n_covariates)
        Cell-level covariate means.
    N_mat : np.ndarray, shape (n_groups, n_periods)
        Observation counts per cell (>0 if observed).
    baselines : np.ndarray, shape (n_groups,)
        ``D_{g,1}`` baseline treatment values (float).
    first_switch_idx : np.ndarray, shape (n_groups,)
        Column index of first treatment change (-1 if never-switching).
    rank_deficient_action : str, default "warn"
        Forwarded to ``solve_ols`` for the first-stage regression.

    Returns
    -------
    Y_residualized : np.ndarray, shape (n_groups, n_periods)
        Outcome matrix with covariate effects removed.
    diagnostics : dict
        Keyed by baseline value (float).
        Each entry has ``theta_hat`` (covariate coefficients), ``n_obs``
        (OLS sample size), and ``r_squared`` (first-stage R-squared).
    failed_baselines : set
        Baseline values (float) for which ``theta_hat`` could not be
        estimated because there were no not-yet-treated observations or
        fewer observations than regressors. Outcomes of groups with these
        baselines are set to NaN in ``Y_residualized``.
    """
    from diff_diff.linalg import solve_ols

    n_groups, n_periods = Y_mat.shape
    n_covariates = X_cell.shape[2]
    Y_resid = Y_mat.copy()
    diagnostics: Dict[str, Any] = {}
    failed_baselines: set = set()

    # Pre-compute observation validity masks for first-differencing.
    # both_observed[g, t] = True iff N_mat[g, t] > 0 AND N_mat[g, t-1] > 0
    both_observed = np.zeros((n_groups, n_periods), dtype=bool)
    both_observed[:, 1:] = (N_mat[:, 1:] > 0) & (N_mat[:, :-1] > 0)

    # not_yet_switched[g, t] = True iff group g has not switched by period t
    # (first_switch_idx[g] == -1 means never-switcher -> always True)
    t_indices = np.arange(n_periods)[np.newaxis, :]  # (1, n_periods)
    f_g_col = first_switch_idx[:, np.newaxis]  # (n_groups, 1)
    not_yet_switched = (f_g_col == -1) | (f_g_col > t_indices)

    for d_val in np.unique(baselines):
        d_mask = baselines == d_val  # (n_groups,)

        # Valid OLS observations: baseline matches, not-yet-treated, both
        # periods observed, t >= 1 (first-differencing needs t and t-1).
        valid = d_mask[:, np.newaxis] & not_yet_switched & both_observed
        valid_g, valid_t = np.where(valid)
        n_obs = len(valid_g)

        if n_obs == 0:
            diagnostics[float(d_val)] = {
                "theta_hat": np.full(n_covariates, np.nan),
                "n_obs": 0,
                "r_squared": np.nan,
            }
            # NaN out outcomes for failed strata so they're excluded
            # from downstream DID computation (don't mix raw + adjusted).
            group_indices = np.where(d_mask)[0]
            Y_resid[group_indices, :] = np.nan
            failed_baselines.add(float(d_val))
            warnings.warn(
                f"No not-yet-treated observations for baseline treatment "
                f"d={d_val}. Cannot estimate covariate slope theta_hat. "
                f"Groups with this baseline are excluded from the "
                f"covariate-adjusted estimation.",
                UserWarning,
                stacklevel=3,
            )
            continue

        # First-differenced outcomes and covariates
        dY = Y_mat[valid_g, valid_t] - Y_mat[valid_g, valid_t - 1]  # (n_obs,)
        dX = X_cell[valid_g, valid_t] - X_cell[valid_g, valid_t - 1]  # (n_obs, K)

        # Check for non-finite values (NaN from missing covariates/outcomes)
        finite_mask = np.isfinite(dY) & np.all(np.isfinite(dX), axis=1)
        if not finite_mask.all():
            dY = dY[finite_mask]
            dX = dX[finite_mask]
            n_obs = len(dY)
            if n_obs == 0:
                diagnostics[float(d_val)] = {
                    "theta_hat": np.full(n_covariates, np.nan),
                    "n_obs": 0,
                    "r_squared": np.nan,
                }
                continue
            valid_t_finite = valid_t[finite_mask]
        else:
            valid_t_finite = valid_t

        # Build design: [intercept, dX, time_dummies (reference dropped)]
        # The intercept is required when dropping one time dummy as
        # reference category; without it the omitted period's FE is
        # forced to zero, biasing theta_hat.
        intercept = np.ones((n_obs, 1))
        unique_t = np.unique(valid_t_finite)
        n_time_fe = len(unique_t) - 1
        if n_time_fe > 0:
            time_dummies = np.zeros((n_obs, n_time_fe))
            for i, t_val in enumerate(unique_t[1:]):
                time_dummies[:, i] = (valid_t_finite == t_val).astype(float)
            design = np.hstack([intercept, dX, time_dummies])
        else:
            design = np.hstack([intercept, dX])

        # Small-sample guard: skip if fewer obs than parameters
        n_params = design.shape[1]
        if n_obs < n_params:
            diagnostics[float(d_val)] = {
                "theta_hat": np.full(n_covariates, np.nan),
                "n_obs": n_obs,
                "r_squared": np.nan,
            }
            # NaN out outcomes for failed strata (don't mix raw + adjusted)
            group_indices_fail = np.where(d_mask)[0]
            Y_resid[group_indices_fail, :] = np.nan
            failed_baselines.add(float(d_val))
            warnings.warn(
                f"DID^X: baseline d={d_val} has {n_obs} not-yet-treated "
                f"observations but {n_params} regressors. Groups with "
                f"this baseline are excluded from covariate-adjusted "
                f"estimation.",
                UserWarning,
                stacklevel=3,
            )
            continue

        # OLS: dY = [1, dX, time_FE] @ beta + epsilon
        coefs, residuals, _vcov = solve_ols(
            design,
            dY,
            return_vcov=True,
            rank_deficient_action=rank_deficient_action,
        )
        # Extract covariate coefficients (indices 1..n_covariates;
        # index 0 is the intercept)
        theta_hat = coefs[1 : 1 + n_covariates]

        # R-squared of first-stage regression
        ss_res = float(np.sum(residuals**2))
        ss_tot = float(np.sum((dY - dY.mean()) ** 2))
        r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else np.nan

        diagnostics[float(d_val)] = {
            "theta_hat": theta_hat.copy(),
            "n_obs": n_obs,
            "r_squared": r_squared,
        }

        # Guard: if some control coefficients are NaN (rank-deficient
        # OLS dropped collinear controls), residualize with only the
        # finite subset. Replace NaN coefficients with 0 so einsum
        # only uses the identified controls.
        nan_mask = ~np.isfinite(theta_hat)
        if nan_mask.any():
            n_dropped = int(nan_mask.sum())
            warnings.warn(
                f"DID^X: rank-deficient first-stage OLS for baseline "
                f"d={d_val} dropped {n_dropped} collinear control(s). "
                f"Residualization uses the {n_covariates - n_dropped} "
                f"identified control(s).",
                UserWarning,
                stacklevel=3,
            )
            theta_hat = np.where(np.isfinite(theta_hat), theta_hat, 0.0)

        # Residualize Y at levels for all groups with this baseline.
        # Vectorized level residualization: Y_tilde[g, t] = Y[g, t] - X[g, t] @ theta_hat
        group_indices = np.where(d_mask)[0]
        if len(group_indices) > 0:
            # X_sub: (n_d_groups, n_periods, n_covariates), theta: (n_covariates,)
            X_sub = X_cell[group_indices]  # (n_d, T, K)
            adjustment = np.einsum("gtk,k->gt", X_sub, theta_hat)  # (n_d, T)
            # Mask: only adjust cells that are observed and have finite covariates
            valid = (N_mat[group_indices] > 0) & np.all(np.isfinite(X_sub), axis=2)
            Y_resid[group_indices] = np.where(
                valid, Y_mat[group_indices] - adjustment, Y_mat[group_indices]
            )

    return Y_resid, diagnostics, failed_baselines


def _compute_first_differenced_matrix(
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
) -> Tuple[np.ndarray, np.ndarray]:
    """First-difference the outcome matrix for ``DID^{fd}`` estimation.

    Transforms ``Y_mat`` into first-differences for the group-specific linear
    trends estimator (Web Appendix Section 1.3, Lemma 6). When passed to
    ``_compute_multi_horizon_dids()`` and the IF function, the standard
    ``DID_{g,l}`` formula on ``Z_mat`` produces ``DID^{fd}_{g,l}`` exactly.

    The ``F_g >= 3`` constraint (paper, 1-indexed) maps to
    ``first_switch_idx >= 2`` (0-indexed). This is enforced automatically:
    ``N_mat_fd[:, 0] = 0`` causes groups with ``first_switch_idx = 1`` to
    fail the ``N_mat > 0`` eligibility check at their reference period.

    Parameters
    ----------
    Y_mat : np.ndarray, shape (n_groups, n_periods)
        Cell-level outcome means (possibly already residualized).
    N_mat : np.ndarray, shape (n_groups, n_periods)
        Observation counts per cell.

    Returns
    -------
    Z_mat : np.ndarray, shape (n_groups, n_periods)
        First-differenced outcomes. ``Z[:, 0] = NaN``,
        ``Z[:, t] = Y[:, t] - Y[:, t-1]`` for ``t >= 1``.
    N_mat_fd : np.ndarray, shape (n_groups, n_periods)
        Adjusted observation counts. ``N_fd[:, 0] = 0``,
        ``N_fd[:, t] = min(N[:, t], N[:, t-1])`` for ``t >= 1``.
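
    Examples
    --------
    A minimal illustration on a toy panel with two groups and three
    periods (the input values are made up for this example). Group 1 is
    unobserved in the middle period, so both of its differenced cells
    get a zero count.

    >>> import numpy as np
    >>> Y = np.array([[1.0, 3.0, 6.0], [2.0, 2.0, 5.0]])
    >>> N = np.array([[4, 2, 3], [1, 0, 5]])
    >>> Z, N_fd = _compute_first_differenced_matrix(Y, N)
    >>> bool(np.isnan(Z[:, 0]).all())
    True
    >>> Z[:, 1:].tolist()
    [[2.0, 3.0], [0.0, 3.0]]
    >>> N_fd.tolist()
    [[0, 2, 2], [0, 0, 0]]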
""" n_groups, n_periods = Y_mat.shape Z_mat = np.full((n_groups, n_periods), np.nan) Z_mat[:, 1:] = Y_mat[:, 1:] - Y_mat[:, :-1] N_mat_fd = np.zeros_like(N_mat) N_mat_fd[:, 1:] = np.minimum(N_mat[:, 1:], N_mat[:, :-1]) return Z_mat, N_mat_fd def _compute_heterogeneity_test( Y_mat: np.ndarray, N_mat: np.ndarray, baselines: np.ndarray, first_switch_idx: np.ndarray, switch_direction: np.ndarray, T_g: np.ndarray, X_het: np.ndarray, L_max: int, alpha: float = 0.05, rank_deficient_action: str = "warn", group_ids_order: Optional[np.ndarray] = None, obs_survey_info: Optional[Dict[str, Any]] = None, replicate_n_valid_list: Optional[List[int]] = None, path_groups: Optional[Set[int]] = None, ) -> Dict[int, Dict[str, Any]]: """Test for heterogeneous treatment effects (Web Appendix Section 1.5). Regresses ``S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})`` on ``X_g`` plus cohort indicator dummies ``(D_{g,1}, F_g, S_g)``. Under Assumption 15 (Lemma 7), the coefficient on ``X_g`` is an unbiased estimator of the variance-weighted average of effect differences. Standard OLS inference is valid - no need to account for DID estimation error. Parameters ---------- Y_mat : np.ndarray, shape (n_groups, n_periods) N_mat : np.ndarray, shape (n_groups, n_periods) baselines, first_switch_idx, switch_direction, T_g : np.ndarray X_het : np.ndarray, shape (n_groups,) Time-invariant covariate to test for heterogeneity. L_max : int alpha : float group_ids_order : np.ndarray, optional Canonical post-filter group id list aligned to Y_mat row order. Required when ``obs_survey_info`` is supplied. obs_survey_info : dict, optional Observation-level survey info with keys ``group_ids`` (raw per-row group labels), ``time_ids`` (raw per-row period labels), ``weights`` (per-row survey weights), ``resolved`` (ResolvedSurveyDesign), and ``periods`` (sorted canonical period array matching ``Y_mat``'s column order). 
        When provided, the regression uses WLS with per-group weights
        ``W_g = sum of obs survey weights in group g``. The group-level
        WLS coefficient IF is ``ψ_g = inv(X'WX)[1,:] @ x_g * W_g * r_g``.
        Two observation-level expansions of ``ψ_g`` coexist on this
        path, split by variance helper so each path uses the allocator
        that preserves byte-identity for its aggregation rule:

        * **Binder TSL** (``compute_survey_if_variance``): the
          cell-period single-cell allocator —
          ``ψ_i = ψ_g * (w_i / W_{g, out_idx})`` for obs in
          ``(g, out_idx)``, zero elsewhere. Under PSU=group, the per-obs
          distribution differs from the legacy
          ``ψ_i = ψ_g * (w_i / W_g)`` but PSU-level aggregates telescope
          to the same ``ψ_g``, so Binder variance is byte-identical to
          the pre-cell-period release. Under within-group-varying PSU,
          mass lands in the post-period PSU of the transition (DID_l
          post-period convention).
        * **Rao-Wu replicate** (``compute_replicate_if_variance``): the
          legacy group-level allocator ``ψ_i = ψ_g * (w_i / W_g)``.
          Replicate variance computes ``θ_r = sum_i ratio_ir * ψ_i`` at
          observation level, so moving ψ_g mass onto the post-period
          cell would silently change the replicate SE whenever a
          replicate column's ratios vary within a group (which the
          library allows — e.g., per-row BRR/Fay/SDR matrices). Keeping
          the legacy allocator on this branch preserves byte-identity of
          replicate SE across every previously-supported fit. Replicate
          + within-group-varying PSU is unreachable by construction
          (``SurveyDesign`` rejects ``replicate_weights`` combined with
          explicit ``strata/psu/fpc``).

        The effective df for t-critical values follows the site-level
        ``min(df_s, n_valid_het - 1)`` rule and the helper mutates
        ``replicate_n_valid_list`` so the final
        ``_effective_df_survey(...)`` sees this site's n_valid.
    replicate_n_valid_list : list[int], optional
        Shared accumulator for replicate-weight ``n_valid`` counts across
        IF sites.
        When provided and a replicate design is in use, this
        function appends its own ``n_valid_het`` before computing the
        local effective df — so both the local inference fields and the
        final ``survey_metadata.df_survey`` (set by ``fit()``) reflect
        this site's contribution.
    path_groups : set of int, optional
        When provided, only groups whose row index in ``Y_mat`` is in
        this set enter the regression.

    Returns
    -------
    dict
        ``{l: {beta, se, t_stat, p_value, conf_int, n_obs}}`` per horizon.
    """
    from diff_diff.linalg import solve_ols
    from diff_diff.utils import safe_inference

    n_groups, n_periods = Y_mat.shape
    results: Dict[int, Dict[str, Any]] = {}

    # Survey setup (once, before horizon loop). When inactive, df_s=None and
    # the existing plain-OLS path runs unchanged.
    use_survey = obs_survey_info is not None and group_ids_order is not None
    if use_survey:
        from diff_diff.survey import (
            compute_replicate_if_variance,
            compute_survey_if_variance,
        )

        obs_gids_raw = np.asarray(obs_survey_info["group_ids"])
        obs_w_raw = np.asarray(obs_survey_info["weights"], dtype=np.float64)
        resolved = obs_survey_info["resolved"]
        # df_s starts from the shared effective df (base-capped by
        # design df). Under replicate designs the helper handles the
        # min(resolved.df_survey, min(n_valid) - 1) reduction and
        # preserves None when the base df is undefined (QR-rank ≤ 1).
        # Using the helper here — rather than re-deriving locally —
        # keeps heterogeneity's df consistent with the main dCDH
        # surfaces (R2 P1a). `list(... or [])` avoids accidental
        # mutation of the caller's shared tracker at this site; the
        # explicit append happens inside the horizon loop below.
        df_s = _effective_df_survey(resolved, list(replicate_n_valid_list or []))
        # Contract: only obs whose group is in the canonical post-filter
        # list contribute. Groups dropped upstream (Step 5b interior gaps,
        # Step 6 multi-switch) appear in obs_gids_raw but must be
        # zero-weighted in the IF expansion.
        gid_list = (
            group_ids_order.tolist()
            if hasattr(group_ids_order, "tolist")
            else list(group_ids_order)
        )
        gid_set = set(gid_list)
        valid = np.array([g in gid_set for g in obs_gids_raw])
        # Per-group total weight aligned to Y_mat row order
        W_g_all = np.zeros(n_groups, dtype=np.float64)
        for i, gid in enumerate(gid_list):
            mask_g = (obs_gids_raw == gid) & valid
            W_g_all[i] = obs_w_raw[mask_g].sum()
    else:
        df_s = None

    for l_h in range(1, L_max + 1):
        # Eligible switchers at this horizon (same logic as multi-horizon DID)
        eligible = []
        dep_var = []
        x_vals = []
        cohort_keys = []

        for g in range(n_groups):
            if path_groups is not None and g not in path_groups:
                continue
            f_g = first_switch_idx[g]
            if f_g < 0:
                continue  # never-switcher
            ref_idx = f_g - 1
            out_idx = f_g - 1 + l_h
            if out_idx >= n_periods:
                continue
            if ref_idx < 0:
                continue
            if N_mat[g, ref_idx] <= 0 or N_mat[g, out_idx] <= 0:
                continue
            if T_g[g] < out_idx:
                continue

            S_g = float(switch_direction[g])
            y_diff = Y_mat[g, out_idx] - Y_mat[g, ref_idx]
            eligible.append(g)
            dep_var.append(S_g * y_diff)
            x_vals.append(X_het[g])
            cohort_keys.append((float(baselines[g]), int(f_g), int(switch_direction[g])))

        n_obs = len(eligible)
        if n_obs < 3:
            results[l_h] = {
                "beta": float("nan"),
                "se": float("nan"),
                "t_stat": float("nan"),
                "p_value": float("nan"),
                "conf_int": (float("nan"), float("nan")),
                "n_obs": n_obs,
            }
            continue

        dep_arr = np.array(dep_var)
        x_arr = np.array(x_vals).reshape(-1, 1)

        # Design: [intercept, X_g, cohort_dummies (reference dropped)]
        # The intercept is required when dropping one cohort dummy as
        # reference; without it the omitted cohort's mean is forced to
        # zero, which biases beta^{het}_l.
        intercept = np.ones((n_obs, 1))
        unique_cohorts = sorted(set(cohort_keys))
        n_cohort_dummies = len(unique_cohorts) - 1
        if n_cohort_dummies > 0:
            cohort_map = {c: i for i, c in enumerate(unique_cohorts)}
            cohort_idx = np.array([cohort_map[c] for c in cohort_keys])
            cohort_dummies = np.zeros((n_obs, len(unique_cohorts)))
            cohort_dummies[np.arange(n_obs), cohort_idx] = 1.0
            # Drop first cohort as reference
            cohort_dummies = cohort_dummies[:, 1:]
            design = np.hstack([intercept, x_arr, cohort_dummies])
        else:
            design = np.hstack([intercept, x_arr])

        # Guard: need more observations than parameters
        n_params = design.shape[1]
        if n_obs <= n_params:
            results[l_h] = {
                "beta": float("nan"),
                "se": float("nan"),
                "t_stat": float("nan"),
                "p_value": float("nan"),
                "conf_int": (float("nan"), float("nan")),
                "n_obs": n_obs,
            }
            continue

        if not use_survey:
            # Plain OLS path (unchanged): standard inference per Lemma 7.
            coefs, _residuals, vcov = solve_ols(
                design,
                dep_arr,
                return_vcov=True,
                rank_deficient_action=rank_deficient_action,
            )
            beta_het = float(coefs[1])
            se_het = float("nan")
            if vcov is not None and np.isfinite(vcov[1, 1]) and vcov[1, 1] > 0:
                se_het = float(np.sqrt(vcov[1, 1]))
            t_stat, p_val, ci = safe_inference(beta_het, se_het, alpha=alpha, df=None)
        else:
            # Survey-aware path: WLS with per-group weights + TSL IF variance.
            W_elig = W_g_all[eligible]
            # solve_ols handles sqrt-weight scaling natively when
            # weight_type='pweight' (linalg.py). Skip vcov — we compute
            # design-based variance ourselves below.
            coefs, _residuals, _vcov_ignored = solve_ols(
                design,
                dep_arr,
                weights=W_elig,
                weight_type="pweight",
                return_vcov=False,
                rank_deficient_action=rank_deficient_action,
            )
            # Rank-deficiency short-circuit: if any coef is NaN, return NaN
            # inference. Mixing solve_ols's R-style drop with a pinv-derived
            # IF would describe different estimands.
            if not np.all(np.isfinite(coefs)):
                results[l_h] = {
                    "beta": float("nan"),
                    "se": float("nan"),
                    "t_stat": float("nan"),
                    "p_value": float("nan"),
                    "conf_int": (float("nan"), float("nan")),
                    "n_obs": n_obs,
                }
                continue
            beta_het = float(coefs[1])
            # Original-scale residuals, recomputed explicitly for the IF
            # computation below (solve_ols applies sqrt-weight scaling
            # internally and back-transforms the residuals it returns).
            r_g = dep_arr - design @ coefs

            # Group-level IF for β_X: ψ_g[X] = inv(X'WX)[1,:] @ x_g * W_g * r_g.
            # Under full rank (gated above), pinv == inv. Wrap matmuls in
            # errstate: macOS Accelerate BLAS can emit spurious divide/overflow
            # warnings on sparse-cohort designs even though the result is finite.
            with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
                XtWX = design.T @ (W_elig[:, None] * design)
                XtWX_inv = np.linalg.pinv(XtWX)
                psi_g = (XtWX_inv[1, :] @ design.T) * W_elig * r_g  # (n_eligible,)

            # Allocator dispatch. Two observation-level expansions of
            # ψ_g coexist on this path, split by variance helper:
            #
            # * Binder TSL (compute_survey_if_variance): cell-period
            #   single-cell allocator —
            #     ψ_i = ψ_g * (w_i / W_{g, out_idx})
            #   for obs in (g, out_idx), zero elsewhere. Under
            #   PSU=group, per-obs distribution differs from the
            #   legacy ψ_i = ψ_g * (w_i / W_g) but PSU-level
            #   aggregates telescope to ψ_g, so Binder variance is
            #   byte-identical. Under within-group-varying PSU, mass
            #   lands in the post-period PSU of the transition, which
            #   is what Binder needs. DID_l single-cell convention —
            #   see REGISTRY.md ChaisemartinDHaultfoeuille survey IF
            #   expansion Note.
            #
            # * Rao-Wu replicate (compute_replicate_if_variance):
            #   legacy group-level allocator —
            #     ψ_i = ψ_g * (w_i / W_g)
            #   for obs in group g.
            #   Replicate variance computes
            #   θ_r = sum_i ratio_ir * ψ_i at observation level, so
            #   moving ψ_g onto the post-period cell only would
            #   silently change the replicate SE whenever a
            #   replicate column's ratios vary within group (e.g.,
            #   the per-row replicate matrices this library
            #   accepts). The group-level allocator preserves
            #   byte-identity for all replicate usages under
            #   PSU=group. The replicate + within-group-varying
            #   PSU case is not reachable (SurveyDesign rejects
            #   replicate_weights combined with explicit psu).
            if getattr(resolved, "uses_replicate_variance", False):
                psi_obs = np.zeros(len(obs_w_raw), dtype=np.float64)
                for e_idx, g_idx in enumerate(eligible):
                    gid = gid_list[g_idx]
                    mask_g = (obs_gids_raw == gid) & valid
                    w_sum_g = obs_w_raw[mask_g].sum()
                    if w_sum_g > 0:
                        psi_obs[mask_g] = psi_g[e_idx] * (obs_w_raw[mask_g] / w_sum_g)
                var_s, n_valid_het = compute_replicate_if_variance(psi_obs, resolved)
                if replicate_n_valid_list is not None:
                    replicate_n_valid_list.append(n_valid_het)
                # Reduce df_s to reflect this horizon's n_valid.
                # R2 P1a: when the shared base-capped df_s is None
                # (undefined base df, e.g., QR-rank ≤ 1), the
                # heterogeneity df MUST stay None — per-site n_valid
                # cannot rescue a rank-deficient design. The
                # _inference_df wrapper at the safe_inference call
                # below coerces None to 0 under replicate, forcing
                # NaN inference.
                if df_s is None:
                    df_s_local = None
                else:
                    df_s_local = min(int(df_s), int(n_valid_het) - 1)
            else:
                obs_tids = np.asarray(obs_survey_info["time_ids"])
                periods_arr = np.asarray(obs_survey_info["periods"])
                psi_obs = np.zeros(len(obs_w_raw), dtype=np.float64)
                for e_idx, g_idx in enumerate(eligible):
                    gid = gid_list[g_idx]
                    out_idx = first_switch_idx[g_idx] - 1 + l_h
                    t_val_out = periods_arr[out_idx]
                    mask_cell = (obs_gids_raw == gid) & (obs_tids == t_val_out) & valid
                    w_cell = obs_w_raw[mask_cell].sum()
                    if w_cell > 0:
                        psi_obs[mask_cell] = psi_g[e_idx] * (obs_w_raw[mask_cell] / w_cell)
                var_s = compute_survey_if_variance(psi_obs, resolved)
                df_s_local = df_s
            se_het = float(np.sqrt(var_s)) if np.isfinite(var_s) and var_s > 0 else float("nan")
            t_stat, p_val, ci = safe_inference(
                beta_het,
                se_het,
                alpha=alpha,
                df=_inference_df(df_s_local, resolved),
            )

        results[l_h] = {
            "beta": beta_het,
            "se": se_het,
            "t_stat": t_stat,
            "p_value": p_val,
            "conf_int": ci,
            "n_obs": n_obs,
        }

    return results


def _compute_design2_effects(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    T_g: np.ndarray,
    L_max: int,
) -> Optional[Dict[str, Any]]:
    """Compute Design-2 switch-in/switch-out effects (Web Appendix Section 1.6).

    Identifies groups with exactly 2 treatment changes (join then leave),
    computes the exit period E_g, and provides delta^+ (post-join) and
    delta^- (post-leave) summaries.

    This is a convenience wrapper that reports descriptive statistics about
    the switch-in and switch-out subpopulations rather than a full
    re-estimation (which would require specialized control pools as described
    in the paper). See REGISTRY.md for documentation.

    Returns None if no join-then-leave groups exist.
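
    Examples
    --------
    A minimal sketch on a toy panel (all values are made up for
    illustration): group 0 joins in period 1 and leaves in period 3,
    group 1 never switches. The reported means are raw outcome changes
    at the join and exit periods, not control-adjusted effects.

    >>> import numpy as np
    >>> D = np.array([[0, 1, 1, 0], [0, 0, 0, 0]])
    >>> Y = np.array([[1.0, 3.0, 4.0, 2.0], [1.0, 1.0, 1.0, 1.0]])
    >>> N = np.ones((2, 4), dtype=int)
    >>> res = _compute_design2_effects(
    ...     D, Y, N,
    ...     baselines=np.array([0.0, 0.0]),
    ...     first_switch_idx=np.array([1, -1]),
    ...     switch_direction=np.array([1, 0]),
    ...     T_g=np.array([3, 3]),
    ...     L_max=1,
    ... )
    >>> res["n_design2_groups"]
    1
    >>> (res["switch_in"]["mean_effect"], res["switch_out"]["mean_effect"])
    (2.0, -2.0)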
""" n_groups, n_periods = D_mat.shape # Identify join-then-leave groups: exactly 2 treatment changes where # the first is a join (D increases) and the second is a leave (D decreases) design2_groups = [] exit_periods = [] for g in range(n_groups): changes = [] for t in range(1, n_periods): if N_mat[g, t] <= 0 or N_mat[g, t - 1] <= 0: continue if D_mat[g, t] != D_mat[g, t - 1]: direction = 1 if D_mat[g, t] > D_mat[g, t - 1] else -1 changes.append((t, direction)) if len(changes) == 2 and changes[0][1] == 1 and changes[1][1] == -1: design2_groups.append(g) exit_periods.append(changes[1][0]) if len(design2_groups) == 0: return None # Compute summary statistics for the switch-in/switch-out subpopulation switch_in_effects = [] switch_out_effects = [] for i, g in enumerate(design2_groups): f_g = first_switch_idx[g] e_g = exit_periods[i] ref_idx = f_g - 1 # Switch-in: Y[g, f_g] - Y[g, f_g-1] (effect of joining) if ref_idx >= 0 and N_mat[g, f_g] > 0 and N_mat[g, ref_idx] > 0: switch_in = float(Y_mat[g, f_g] - Y_mat[g, ref_idx]) switch_in_effects.append(switch_in) # Switch-out: Y[g, e_g] - Y[g, e_g-1] (effect of leaving) if e_g - 1 >= 0 and N_mat[g, e_g] > 0 and N_mat[g, e_g - 1] > 0: switch_out = float(Y_mat[g, e_g] - Y_mat[g, e_g - 1]) switch_out_effects.append(switch_out) result: Dict[str, Any] = { "n_design2_groups": len(design2_groups), "switch_in": { "n_groups": len(switch_in_effects), "mean_effect": float(np.mean(switch_in_effects)) if switch_in_effects else np.nan, }, "switch_out": { "n_groups": len(switch_out_effects), "mean_effect": float(np.mean(switch_out_effects)) if switch_out_effects else np.nan, }, } return result def _build_covariate_diagnostics_df( diagnostics: Dict[str, Any], control_names: List[str], ) -> pd.DataFrame: """Build a tidy DataFrame from the per-baseline residualization diagnostics.""" rows = [] for d_val, diag in sorted(diagnostics.items()): theta = diag["theta_hat"] for k, name in enumerate(control_names): rows.append( { "baseline_treatment": 
d_val, "covariate": name, "theta_hat": float(theta[k]) if np.isfinite(theta[k]) else np.nan, "n_obs": diag["n_obs"], "r_squared": diag["r_squared"], } ) return pd.DataFrame(rows) # ====================================================================== # Phase 2: Multi-horizon helpers # ====================================================================== def _compute_group_switch_metadata( D_mat: np.ndarray, N_mat: np.ndarray, ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]: """ Compute per-group switch metadata from the pivoted panel matrices. For each group g, identifies the baseline treatment ``D_{g,1}``, the first-switch period index ``F_g`` (or -1 if never-switching), and the switch direction ``S_g`` (+1 joiner, -1 leaver, 0 never-switching). Also computes ``T_g`` - the last period index at which there is still a baseline-matched control that hasn't switched (needed for horizon eligibility). This helper is shared by Phase 1 (cohort-recentered IF in ``_compute_cohort_recentered_inputs``) and Phase 2 (multi-horizon ``DID_{g,l}`` computation). Parameters ---------- D_mat : np.ndarray of shape (n_groups, n_periods) Pivoted treatment matrix (cell-level, binary in Phase 1). N_mat : np.ndarray of shape (n_groups, n_periods) Pivoted observation-count matrix. Zero means group g is missing at period t. Returns ------- baselines : np.ndarray of shape (n_groups,), dtype int ``D_{g,1}`` for each group (treatment at the first global period). first_switch_idx : np.ndarray of shape (n_groups,), dtype int Period index of g's first treatment change (-1 if never-switching). This is ``F_g`` in the paper's notation, expressed as a column index into D_mat (0-based). switch_direction : np.ndarray of shape (n_groups,), dtype int ``S_g``: +1 if treatment increases at first switch (joiner), -1 if decreases (leaver), 0 if never-switching. 
    T_g : np.ndarray of shape (n_groups,), dtype int
        For each group, the last period index at which a baseline-matched
        not-yet-switched control still exists. Groups whose baseline value
        has no other group that switches later get ``T_g = -1`` (they have
        no valid control at any horizon). This is used for horizon
        eligibility: ``DID_{g,l}`` is computable iff
        ``first_switch_idx[g] - 1 + l <= T_g[g]``.

    Raises
    ------
    ValueError
        If any group is missing the first global period in N_mat (this
        should have been caught by fit() Step 5b validation).
    """
    n_groups, n_periods = D_mat.shape

    # Defensive: fit() Step 5b rejects groups missing the baseline.
    if N_mat.size > 0 and (N_mat[:, 0] <= 0).any():
        raise ValueError(
            "_compute_group_switch_metadata: at least one group is missing "
            "the first global period in N_mat. fit() Step 5b should have "
            "rejected this."
        )

    baselines = D_mat[:, 0].astype(float)
    first_switch_idx = np.full(n_groups, -1, dtype=int)
    switch_direction = np.zeros(n_groups, dtype=int)

    for g in range(n_groups):
        for t in range(1, n_periods):
            if N_mat[g, t] <= 0 or N_mat[g, t - 1] <= 0:
                continue
            if D_mat[g, t] != D_mat[g, t - 1]:
                first_switch_idx[g] = t
                switch_direction[g] = 1 if D_mat[g, t] > D_mat[g, t - 1] else -1
                break

    # T_g: for each group g, the last period at which there is still a
    # baseline-matched group whose treatment has NOT changed. This is
    # max_{g': D_{g',1} = D_{g,1}} (F_{g'} - 1), i.e., the period just
    # before the latest-switching control in g's baseline cohort.
    # Never-switching groups (F = -1) have F-1 = T (last period), so
    # they extend T_g to the panel end for their baseline cohort.
    unique_baselines = np.unique(baselines)
    max_control_period = {}  # baseline -> max period index with a valid control
    for d in unique_baselines:
        baseline_mask = baselines == d
        # For each group with this baseline, the last period at which it
        # can still serve as a not-yet-switched control is F_g - 1
        # (or n_periods - 1 if never-switching).
        f_vals = first_switch_idx[baseline_mask]
        control_last = np.where(f_vals == -1, n_periods - 1, f_vals - 1)
        max_control_period[float(d)] = int(control_last.max()) if control_last.size > 0 else -1

    T_g = np.array(
        [max_control_period.get(float(baselines[g]), -1) for g in range(n_groups)],
        dtype=int,
    )

    return baselines, first_switch_idx, switch_direction, T_g


def _compute_multi_horizon_dids(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    T_g: np.ndarray,
    L_max: int,
    set_ids: Optional[np.ndarray] = None,
) -> Dict[int, Dict[str, Any]]:
    """
    Compute the per-group building block ``DID_{g,l}`` and its aggregate
    ``DID_l`` for horizons ``l = 1, ..., L_max``.

    Implements Equation 3 and Equation 5 of the dynamic companion paper
    (NBER WP 29873). For each switching group g eligible at horizon l::

        DID_{g,l} = Y_{g, F_g-1+l} - Y_{g, F_g-1}
                    - mean_{g' in controls} (Y_{g', F_g-1+l} - Y_{g', F_g-1})

    where the control set is ``{g': D_{g',1} = D_{g,1}, F_{g'} > F_g-1+l}``.
    The aggregate is ``DID_l = (1/N_l) * sum S_g * DID_{g,l}`` over
    eligible groups.

    Parameters
    ----------
    D_mat, Y_mat, N_mat : np.ndarray of shape (n_groups, n_periods)
    baselines, first_switch_idx, switch_direction, T_g : np.ndarray
        From ``_compute_group_switch_metadata()``.
    L_max : int
        Maximum horizon to compute.

    Returns
    -------
    dict mapping horizon l -> {
        "did_l": float,  # aggregate DID_l (NaN if N_l=0)
        "N_l": int,  # count of eligible switching groups
        "did_g_l": np.ndarray,  # per-group DID_{g,l} (NaN for non-eligible)
        "eligible_mask": np.ndarray,  # boolean shape (n_groups,)
        "switcher_fraction": float,  # N_l / N_1 (NaN if N_1=0)
    }
    """
    n_groups, n_periods = D_mat.shape
    is_switcher = first_switch_idx >= 0

    # Pre-compute per-baseline lookup of (group_indices, first_switch_indices)
    # for efficient control-pool identification.
    unique_baselines = np.unique(baselines)
    baseline_groups: Dict[float, np.ndarray] = {}
    baseline_f: Dict[float, np.ndarray] = {}
    for d in unique_baselines:
        mask = baselines == d
        baseline_groups[float(d)] = np.where(mask)[0]
        baseline_f[float(d)] = first_switch_idx[mask]

    results: Dict[int, Dict[str, Any]] = {}
    a11_multi_warnings: List[str] = []
    N_1 = 0  # will be set at l=1 for switcher_fraction

    for l in range(1, L_max + 1):  # noqa: E741
        did_g_l = np.full(n_groups, np.nan)

        # Eligibility: switching group with F_g - 1 + l observable.
        # F_g is stored as a column index (0-based), so the outcome
        # period is first_switch_idx[g] - 1 + l. This must be a valid
        # column AND the group must be observed there (N_mat > 0).
        # Also, T_g[g] must be >= first_switch_idx[g] - 1 + l (controls
        # available at the outcome period).
        eligible = np.zeros(n_groups, dtype=bool)
        for g in range(n_groups):
            if not is_switcher[g]:
                continue
            f_g = first_switch_idx[g]
            ref_idx = f_g - 1  # period just before first switch
            out_idx = f_g - 1 + l  # outcome period for horizon l
            if ref_idx < 0 or out_idx >= n_periods:
                continue
            if N_mat[g, ref_idx] <= 0 or N_mat[g, out_idx] <= 0:
                continue
            if T_g[g] < out_idx:
                continue  # no baseline-matched control available
            eligible[g] = True

        N_l = int(eligible.sum())
        if l == 1:
            N_1 = N_l

        if N_l == 0:
            results[l] = {
                "did_l": float("nan"),
                "N_l": 0,
                "did_g_l": did_g_l,
                "eligible_mask": eligible,
                "switcher_fraction": float("nan"),
            }
            continue

        # Compute DID_{g,l} for each eligible group.
        for g in np.where(eligible)[0]:
            f_g = first_switch_idx[g]
            ref_idx = f_g - 1
            out_idx = f_g - 1 + l
            d_base = float(baselines[g])

            # Switcher's outcome change
            switcher_change = Y_mat[g, out_idx] - Y_mat[g, ref_idx]

            # Control pool: same baseline, not yet switched by out_idx.
            # F_{g'} > out_idx (hasn't switched yet) OR F_{g'} = -1
            # (never switches). Both must be observed at ref_idx and
            # out_idx.
            ctrl_indices = baseline_groups[d_base]
            ctrl_f = baseline_f[d_base]
            ctrl_mask = (
                ((ctrl_f > out_idx) | (ctrl_f == -1))
                & (N_mat[ctrl_indices, ref_idx] > 0)
                & (N_mat[ctrl_indices, out_idx] > 0)
            )
            # State-set trends: restrict controls to same set as switcher
            if set_ids is not None:
                ctrl_mask &= set_ids[ctrl_indices] == set_ids[g]
            ctrl_pool = ctrl_indices[ctrl_mask]

            if ctrl_pool.size == 0:
                # No observed controls at this horizon (may be terminal
                # missingness, not a true A11 violation). Exclude the
                # group from N_l rather than zero-retaining, so the
                # missing-data case doesn't bias DID_l toward zero.
                eligible[g] = False
                a11_multi_warnings.append(
                    f"horizon {l}, group_idx {g}: "
                    f"no baseline-matched controls at outcome period"
                )
                continue

            ctrl_changes = Y_mat[ctrl_pool, out_idx] - Y_mat[ctrl_pool, ref_idx]
            ctrl_avg = float(ctrl_changes.mean())
            did_g_l[g] = switcher_change - ctrl_avg

        # Recompute N_l after control-pool exclusions
        N_l = int(eligible.sum())
        if l == 1:
            N_1 = N_l
        if N_l == 0:
            results[l] = {
                "did_l": float("nan"),
                "N_l": 0,
                "did_g_l": did_g_l,
                "eligible_mask": eligible,
                "switcher_fraction": float("nan"),
            }
            continue

        # Aggregate: DID_l = (1/N_l) * sum S_g * DID_{g,l}
        S_eligible = switch_direction[eligible].astype(float)
        did_g_eligible = did_g_l[eligible]
        did_l = float((S_eligible * did_g_eligible).sum() / N_l)

        results[l] = {
            "did_l": did_l,
            "N_l": N_l,
            "did_g_l": did_g_l,
            "eligible_mask": eligible,
            "switcher_fraction": N_l / N_1 if N_1 > 0 else float("nan"),
        }

    # Attach A11 warnings to the results for the caller to surface
    if a11_multi_warnings:
        results["_a11_warnings"] = a11_multi_warnings  # type: ignore[assignment]

    return results


def _compute_per_group_if_multi_horizon(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    T_g: np.ndarray,
    L_max: int,
    set_ids: Optional[np.ndarray] = None,
    compute_per_period: bool = True,
    switcher_subset_mask: Optional[np.ndarray] = None,
) -> Dict[int, Tuple[np.ndarray, Optional[np.ndarray]]]:
    """
    Compute per-group influence function ``U^G_{g,l}`` for ``l = 1..L_max``.

    Each group g contributes to ``DID_l`` in two capacities:

    1. **As a switcher** (if g is eligible at horizon l): contributes
       ``S_g * (Y_{g, F_g-1+l} - Y_{g, F_g-1})`` to the numerator.

    2. **As a control** (if g serves as a not-yet-switched control for
       some other switcher g'): contributes
       ``-S_{g'} * (1/N^{g'}_{out}) * (Y_{g, out} - Y_{g, ref})``
       where ref/out are the reference/outcome periods of the switcher
       g' (not of the control g).

    The result satisfies ``sum(U_l) == N_l * DID_l``, which is verified
    as a sanity check.

    Parameters
    ----------
    D_mat, Y_mat, N_mat : np.ndarray of shape (n_groups, n_periods)
    baselines, first_switch_idx, switch_direction, T_g : np.ndarray
        From ``_compute_group_switch_metadata()``.
    L_max : int
    switcher_subset_mask : np.ndarray of bool, shape (n_groups,), optional
        When supplied, restricts the switcher iteration to groups where
        the mask is ``True``. Groups outside the subset contribute as
        controls only (their switcher-side contribution is skipped). The
        control pool is unchanged — this mirrors how the joiners-only /
        leavers-only IF is constructed (see
        ``_compute_cohort_recentered_inputs``). Used by ``by_path`` to
        zero out switcher contributions for groups not in the selected
        path. Default ``None`` preserves the legacy behavior of iterating
        over all switchers.

    Returns
    -------
    dict mapping horizon l -> (U_g_l, U_per_period_l) tuple
        - ``U_g_l``: np.ndarray of shape (n_groups,). NOT cohort-centered;
          the caller applies ``_cohort_recenter()`` before computing SE.
        - ``U_per_period_l``: np.ndarray of shape (n_groups, n_periods),
          or ``None`` when ``compute_per_period=False``.
          Per-``(g, t)``-cell contributions attributed to the outcome
          period ``t = F_g - 1 + l`` (same column for switcher and its
          controls, since they share the outcome period). Satisfies
          ``U_per_period_l.sum(axis=1) == U_g_l``. Consumed by the
          cell-period IF allocator for within-group-varying PSU designs.
    """
    n_groups, n_periods = D_mat.shape
    is_switcher = first_switch_idx >= 0

    # Pre-compute per-baseline group indices for control-pool lookup.
    unique_baselines = np.unique(baselines)
    baseline_groups: Dict[float, np.ndarray] = {}
    baseline_f: Dict[float, np.ndarray] = {}
    for d in unique_baselines:
        mask = baselines == d
        baseline_groups[float(d)] = np.where(mask)[0]
        baseline_f[float(d)] = first_switch_idx[mask]

    results: Dict[int, Tuple[np.ndarray, Optional[np.ndarray]]] = {}

    for l in range(1, L_max + 1):  # noqa: E741
        U_l = np.zeros(n_groups, dtype=float)
        U_per_period_l: Optional[np.ndarray] = (
            np.zeros((n_groups, n_periods), dtype=float) if compute_per_period else None
        )

        for g in range(n_groups):
            if not is_switcher[g]:
                continue
            if switcher_subset_mask is not None and not switcher_subset_mask[g]:
                continue
            f_g = first_switch_idx[g]
            ref_idx = f_g - 1
            out_idx = f_g - 1 + l
            if ref_idx < 0 or out_idx >= n_periods:
                continue
            if N_mat[g, ref_idx] <= 0 or N_mat[g, out_idx] <= 0:
                continue
            if T_g[g] < out_idx:
                continue

            d_base = float(baselines[g])
            S_g = float(switch_direction[g])

            # Control pool for this switcher at this horizon
            ctrl_indices = baseline_groups[d_base]
            ctrl_f = baseline_f[d_base]
            ctrl_mask = (
                ((ctrl_f > out_idx) | (ctrl_f == -1))
                & (N_mat[ctrl_indices, ref_idx] > 0)
                & (N_mat[ctrl_indices, out_idx] > 0)
            )
            # State-set trends: restrict controls to same set as switcher
            if set_ids is not None:
                ctrl_mask &= set_ids[ctrl_indices] == set_ids[g]
            ctrl_pool = ctrl_indices[ctrl_mask]
            n_ctrl = ctrl_pool.size

            if n_ctrl == 0:
                # No controls: this switcher contributes nothing to U_l.
                # _compute_multi_horizon_dids drops such groups from N_l
                # (no zero-retention), so sum(U_l) == N_l * DID_l holds.
                continue

            # Switcher contribution: +S_g * (Y_{g, out} - Y_{g, ref}).
            # Per-cell attribution convention: assign the whole contrast
            # to the outcome cell (g, out_idx). See REGISTRY.md's Note
            # on survey IF expansion for the rationale behind this
            # convention (library choice, not a derived result).
            switcher_change = Y_mat[g, out_idx] - Y_mat[g, ref_idx]
            U_l[g] += S_g * switcher_change
            if U_per_period_l is not None:
                U_per_period_l[g, out_idx] += S_g * switcher_change

            # Control contributions: each control g' in the pool gets
            # -S_g * (1/n_ctrl) * (Y_{g', out} - Y_{g', ref}). Same
            # post-period attribution as the switcher side.
            ctrl_changes = Y_mat[ctrl_pool, out_idx] - Y_mat[ctrl_pool, ref_idx]
            ctrl_contrib = (S_g / n_ctrl) * ctrl_changes
            U_l[ctrl_pool] -= ctrl_contrib
            if U_per_period_l is not None:
                U_per_period_l[ctrl_pool, out_idx] -= ctrl_contrib

        results[l] = (U_l, U_per_period_l)

    return results


def _enumerate_treatment_paths(
    D_mat: np.ndarray,
    first_switch_idx: np.ndarray,
    N_mat: np.ndarray,
    L_max: int,
    by_path: Optional[int],
    paths_of_interest: Optional[List[Tuple[int, ...]]] = None,
) -> Tuple[
    List[Tuple[int, ...]],
    Dict[Tuple[int, ...], np.ndarray],
    Dict[Tuple[int, ...], int],
]:
    """
    Enumerate observed treatment paths and select either the top-``by_path``
    most common or the user-specified ``paths_of_interest`` subset.

    For each switcher group ``g``, the path is the treatment tuple
    ``(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})`` — length
    ``L_max + 1``, matching R's ``did_multiplegt_dyn`` window convention.
    Groups whose window falls outside the panel or whose window contains
    unobserved cells are skipped (they contribute to no path bucket).

    Paths are ranked by group frequency; ties are broken lexicographically
    on the path tuple for deterministic ordering. If ``by_path`` exceeds
    the number of observed paths, all observed paths are returned with a
    ``UserWarning``.

    When ``paths_of_interest`` is provided, the user-specified subset is
    used instead of the top-k ranking. Duplicate paths emit a
    ``UserWarning`` and are deduplicated; paths not observed in the panel
    emit a ``UserWarning`` and are omitted from the result.

    Parameters
    ----------
    D_mat : np.ndarray of shape (n_groups, n_periods)
        Treatment matrix.
        Binary (0/1) or integer-coded discrete treatment
        (D in Z); upstream validation enforces D == round(D) when
        ``not is_binary``.
    first_switch_idx : np.ndarray of shape (n_groups,)
        Index of first switch per group; ``-1`` for never-switching groups.
    N_mat : np.ndarray of shape (n_groups, n_periods)
        Cell-count matrix (zero where the cell is unobserved).
    L_max : int
        Event-study horizon; window length is ``L_max + 1``.
    by_path : int or None
        Number of most-common paths to select. Mutually exclusive with
        ``paths_of_interest``; exactly one of the two is non-None when
        the per-path branch fires.
    paths_of_interest : list of tuple[int, ...] or None, default None
        User-specified path subset. When provided, ``by_path`` is ignored.

    Returns
    -------
    selected_paths : list of tuple[int, ...]
        Selected path tuples. Under ``by_path``, ordered by descending
        frequency. Under ``paths_of_interest``, in user-specified order
        modulo deduplication and unobserved-path filtering.
    path_to_group_mask : dict[tuple[int, ...], np.ndarray of bool]
        Per-path boolean mask over all ``n_groups`` identifying switchers
        that follow that path.
    path_to_count : dict[tuple[int, ...], int]
        Count of switcher groups per selected path.
    """
    n_groups, n_periods = D_mat.shape
    path_of_group: Dict[int, Tuple[int, ...]] = {}
    for g in range(n_groups):
        f_g = int(first_switch_idx[g])
        if f_g < 0:
            continue
        start = f_g - 1
        stop = start + L_max + 1
        if start < 0 or stop > n_periods:
            continue
        window_counts = N_mat[g, start:stop]
        if np.any(window_counts <= 0):
            continue
        window = D_mat[g, start:stop]
        if np.any(np.isnan(window)):
            continue
        path_of_group[g] = tuple(int(round(float(v))) for v in window)

    path_counts: Dict[Tuple[int, ...], int] = {}
    for path in path_of_group.values():
        path_counts[path] = path_counts.get(path, 0) + 1

    if paths_of_interest is not None:
        # Canonicalization (Tuple[int, ...] with Python int) happens in
        # __init__ / set_params via _validate_paths_of_interest, so
        # duplicates such as `[(np.int64(0), 1, 1, 1), (0, 1, 1, 1)]`
        # collapse to the same tuple here and the seen-set check fires.
        observed_paths_set = set(path_counts.keys())
        seen: set = set()
        selected_paths: List[Tuple[int, ...]] = []
        for p in paths_of_interest:
            if p in seen:
                warnings.warn(
                    f"paths_of_interest contains duplicate path {p!r}; "
                    f"deduplicating.",
                    UserWarning,
                    stacklevel=2,
                )
                continue
            seen.add(p)
            if p not in observed_paths_set:
                warnings.warn(
                    f"paths_of_interest path {p!r} has zero observed "
                    f"groups in the panel; this path will be omitted "
                    f"from path_effects.",
                    UserWarning,
                    stacklevel=2,
                )
                continue
            selected_paths.append(p)
    else:
        observed_paths = sorted(path_counts.keys(), key=lambda p: (-path_counts[p], p))
        n_observed = len(observed_paths)
        if by_path is None:
            # Defensive: caller always passes one of the two selectors.
            selected_paths = []
        elif by_path >= n_observed:
            if by_path > n_observed and n_observed > 0:
                warnings.warn(
                    f"by_path={by_path} exceeds the number of observed "
                    f"paths ({n_observed}). Returning all observed paths.",
                    UserWarning,
                    stacklevel=2,
                )
            selected_paths = observed_paths
        else:
            selected_paths = observed_paths[:by_path]

    path_to_group_mask: Dict[Tuple[int, ...], np.ndarray] = {}
    for path in selected_paths:
        mask = np.zeros(n_groups, dtype=bool)
        for g_idx, g_path in path_of_group.items():
            if g_path == path:
                mask[g_idx] = True
        path_to_group_mask[path] = mask

    path_to_count = {p: path_counts[p] for p in selected_paths}
    return selected_paths, path_to_group_mask, path_to_count


def _compute_path_effects(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    T_g: np.ndarray,
    L_max: int,
    by_path: Optional[int],
    eligible_mask_var: np.ndarray,
    multi_horizon_dids: Dict[int, Dict[str, Any]],
    all_groups: List[Any],
    alpha: float,
    df_inference: Optional[int] = None,
    set_ids: Optional[np.ndarray] = None,
    paths_of_interest: Optional[List[Tuple[int, ...]]] = None,
    obs_survey_info: Optional[Dict[str, Any]] = None,
    eligible_groups: Optional[List[Any]] = None,
    replicate_n_valid_list: Optional[List[int]] = None,
) -> Optional[Dict[Tuple[int, ...], Dict[str, Any]]]:
    """
    Compute per-path event-study effects using the joiners/leavers IF pattern.

    For each of the top-``by_path`` observed treatment paths, compute:

    - ``DID_{path, l}`` for ``l = 1..L_max`` as the within-path mean of
      ``DID_{g, l}`` — equals ``sum(U_l_path) / N_l_path`` where
      ``U_l_path`` is the per-group IF with switcher contributions zeroed
      for groups not in the path (control contributions and cohort
      structure unchanged).
    - SE depends on ``obs_survey_info``:

      * Non-survey (``obs_survey_info is None``): plug-in SE via
        ``_plugin_se(U_centered_path, divisor=N_l_path)`` after
        cohort-recentering with the ORIGINAL cohort structure. This
        mirrors how joiners_se / leavers_se use their respective counts
        as the divisor and preserve the full cohort structure.
      * Survey (``obs_survey_info is not None``): the path-restricted
        per-period IF is built and routed through
        ``_survey_se_from_group_if`` (analytical Binder TSL with the
        cell-period allocator; replicate-weight designs use the same
        cell allocator unconditionally). Under replicate weights every
        per-(path, l) fit appends ``n_valid`` to
        ``replicate_n_valid_list`` so the final ``df_survey`` reflects
        all per-path fits; the post-call ``_refresh_path_inference``
        re-runs ``safe_inference`` on every populated entry so the
        stored ``t_stat`` / ``p_value`` / ``conf_int`` use the final
        ``df_survey`` rather than the compute-time snapshot.

    Returns an empty dict ``{}`` when ``by_path`` was requested but no
    switcher group has a complete ``[F_g - 1, F_g - 1 + L_max]`` window
    (all switchers too late in the panel or with unobserved cells). The
    empty dict is distinguished from ``None`` at the result layer:
    ``None`` means the feature was not requested, ``{}`` means requested
    but empty. A ``UserWarning`` is emitted so the caller sees that
    ``by_path`` was a no-op on this panel.
    """
    from diff_diff.utils import safe_inference

    selected_paths, path_to_group_mask, path_to_count = _enumerate_treatment_paths(
        D_mat=D_mat,
        first_switch_idx=first_switch_idx,
        N_mat=N_mat,
        L_max=L_max,
        by_path=by_path,
        paths_of_interest=paths_of_interest,
    )

    if not selected_paths:
        if paths_of_interest is not None:
            # Every requested path was unobserved (each emitted its own
            # per-path "zero observed groups" warning inside the
            # enumerator). Distinguish from the by_path=k case where
            # the panel itself has no complete-window path.
            warnings.warn(
                "paths_of_interest was requested but every "
                "user-specified path was either unobserved in the "
                "panel or had a window outside the L_max+1 "
                "convention (per-path 'zero observed groups' "
                "UserWarnings already issued). results.path_effects "
                "is populated as an empty dict to signal 'requested "
                "but empty'.",
                UserWarning,
                stacklevel=2,
            )
        else:
            warnings.warn(
                f"by_path={by_path} was requested but no observed "
                f"treatment path has a complete window [F_g-1, "
                f"F_g-1+L_max={L_max}] within the panel. "
                f"results.path_effects is populated as an empty dict "
                f"to signal 'requested but empty'. Extend the panel "
                f"so switchers have L_max+1 consecutive observed "
                f"cells starting at F_g-1, or reduce L_max.",
                UserWarning,
                stacklevel=2,
            )
        return {}

    # Cohort ids for the variance-eligible set (same construction as the
    # per-horizon SE path at the primary fit() site: (D_{g,1}, F_g, S_g)).
    n_groups = D_mat.shape[0]
    cohort_keys = [
        (
            float(baselines[g]),
            int(first_switch_idx[g]),
            int(switch_direction[g]),
        )
        for g in range(n_groups)
    ]
    unique_c: Dict[Tuple[float, int, int], int] = {}
    cid = np.zeros(n_groups, dtype=int)
    for g in range(n_groups):
        if not eligible_mask_var[g]:
            cid[g] = -1
            continue
        key = cohort_keys[g]
        if key not in unique_c:
            unique_c[key] = len(unique_c)
        cid[g] = unique_c[key]
    cohort_id_eligible = cid[eligible_mask_var]

    path_effects: Dict[Tuple[int, ...], Dict[str, Any]] = {}
    # `frequency_rank` is the within-selected-paths rank by descending
    # group count (lex tiebreak on the path tuple). Decoupled from the
    # iteration order over `selected_paths` so that under
    # `paths_of_interest` (user-specified order) the rank still
    # reflects true frequency. Under `by_path=k`, `selected_paths` is
    # already sorted by descending frequency so ranks coincide with
    # iteration order.
    rank_sorted_paths = sorted(
        selected_paths,
        key=lambda p: (-path_to_count[p], p),
    )
    path_to_freq_rank = {p: i + 1 for i, p in enumerate(rank_sorted_paths)}

    for path in selected_paths:
        switcher_mask = path_to_group_mask[path]
        n_path_groups = int(switcher_mask.sum())

        per_path_if = _compute_per_group_if_multi_horizon(
            D_mat=D_mat,
            Y_mat=Y_mat,
            N_mat=N_mat,
            baselines=baselines,
            first_switch_idx=first_switch_idx,
            switch_direction=switch_direction,
            T_g=T_g,
            L_max=L_max,
            set_ids=set_ids,
            compute_per_period=(obs_survey_info is not None),
            switcher_subset_mask=switcher_mask,
        )

        horizons: Dict[int, Dict[str, Any]] = {}
        for l_h in range(1, L_max + 1):
            U_l_path, U_pp_l_path = per_path_if[l_h]

            # N_l_path: path-restricted count of eligible switchers at
            # horizon l. Mirror _compute_multi_horizon_dids' eligibility
            # (the did_g_l array is NaN for non-eligible switchers).
            did_g_l = multi_horizon_dids[l_h].get("did_g_l")
            if did_g_l is None:
                n_l_path = 0
            else:
                n_l_path = int(np.sum(switcher_mask & ~np.isnan(did_g_l)))

            if n_l_path == 0:
                horizons[l_h] = {
                    "effect": float("nan"),
                    "se": float("nan"),
                    "t_stat": float("nan"),
                    "p_value": float("nan"),
                    "conf_int": (float("nan"), float("nan")),
                    "n_obs": 0,
                }
                continue

            U_l_path_elig = U_l_path[eligible_mask_var]

            # Point estimate: within-path mean DID
            effect_path = float(U_l_path.sum() / n_l_path)

            # SE: cohort-recenter with ORIGINAL cohort structure. Under
            # survey, route through _survey_se_from_group_if (cell-period
            # allocator). Otherwise plug-in with path-specific divisor
            # (joiners/leavers pattern).
            U_centered_path = _cohort_recenter(U_l_path_elig, cohort_id_eligible)
            if obs_survey_info is None:
                se_path = _plugin_se(U_centered=U_centered_path, divisor=n_l_path)
            else:
                assert U_pp_l_path is not None
                assert eligible_groups is not None
                U_pp_l_path_elig = U_pp_l_path[eligible_mask_var]
                U_centered_pp_path = _cohort_recenter_per_period(
                    U_pp_l_path_elig, cohort_id_eligible
                )
                U_scaled = U_centered_path / n_l_path
                U_pp_scaled = U_centered_pp_path / n_l_path
                se_path, n_valid_replicates = _survey_se_from_group_if(
                    U_centered=U_scaled,
                    eligible_groups=eligible_groups,
                    obs_survey_info=obs_survey_info,
                    U_centered_per_period=U_pp_scaled,
                )
                if n_valid_replicates is not None and replicate_n_valid_list is not None:
                    replicate_n_valid_list.append(n_valid_replicates)

            # Path-scoped degenerate-cohort warning. Mirrors the
            # overall-path surface (`Cohort-recentered analytical variance
            # is unidentified...` at the main fit() site): when the centered
            # path IF is identically zero, the variance is unidentified
            # for this (path, horizon) despite n_l_path > 0. Common when
            # a low-frequency path's switchers all fall in singleton
            # (D_{g,1}, F_g, S_g) cohorts after the path subset.
            if np.isnan(se_path) and U_centered_path.size > 0 and n_l_path > 0:
                warnings.warn(
                    f"Cohort-recentered analytical variance is "
                    f"unidentified for path={path} at horizon l={l_h}: "
                    f"the path-subset centered influence function is "
                    f"identically zero (every variance-eligible path "
                    f"switcher forms its own (D_{{g,1}}, F_g, S_g) "
                    f"cohort, or the path has a single contributing "
                    f"group). DID_{{path,l}} point estimate is still "
                    f"valid; SE / t_stat / p_value / conf_int are "
                    f"NaN-consistent. Rare paths with few contributing "
                    f"groups routinely hit this case — include more "
                    f"groups following this trajectory for a non-"
                    f"degenerate analytical SE.",
                    UserWarning,
                    stacklevel=2,
                )

            t_p, p_p, ci_p = safe_inference(effect_path, se_path, alpha=alpha, df=df_inference)
            horizons[l_h] = {
                "effect": effect_path,
                "se": se_path,
                "t_stat": t_p,
                "p_value": p_p,
                "conf_int": ci_p,
                "n_obs": n_l_path,
            }

        path_effects[path] = {
            "n_groups": n_path_groups,
            "frequency_rank": path_to_freq_rank[path],
            "horizons": horizons,
        }

    return path_effects


def _compute_path_cumulated_event_study(
    D_mat: np.ndarray,
    N_mat: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    L_max: int,
    by_path: Optional[int],
    multi_horizon_dids: Dict[int, Dict[str, Any]],
    path_effects: Dict[Tuple[int, ...], Dict[str, Any]],
    alpha: float,
    df_inference: Optional[int] = None,
    paths_of_interest: Optional[List[Tuple[int, ...]]] = None,
) -> Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]:
    """
    Per-path cumulated level effects under ``trends_linear=True``.

    Mirrors the global ``linear_trends_effects`` cumulation
    (``chaisemartin_dhaultfoeuille.py:3340-3398``): for each enumerated
    path, accumulate per-group running sums of ``DID^{fd}_{g, l'}`` over
    ``l' = 1..l``, then average over the path's switchers eligible at
    horizon ``l``. SE is the conservative upper bound (sum of per-horizon
    component SEs from ``path_effects[path]["horizons"][l']["se"]``,
    NaN-consistent: if any component SE is non-finite the cumulated SE
    is NaN). Inference (t-stat, p-value, CI) via ``safe_inference``.

    Returns ``{path: {horizon: {effect, se, t_stat, p_value, conf_int,
    n_obs}}}`` directly (no ``horizons`` wrapper), aligned with the
    ``path_placebo_event_study`` and global ``linear_trends_effects``
    shapes. The outer keys match ``path_effects.keys()``; horizons that
    have NaN cumulated values still appear (for to_dataframe alignment).

    R parity: matches R ``did_multiplegt_dyn(..., by_path, trends_lin)``
    which returns cumulated ``Effect_l`` per path under single-baseline
    panels (validated against ``joiners_only_trends_lin`` empirically).
    """
    from diff_diff.utils import safe_inference

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        selected_paths, path_to_group_mask, _ = _enumerate_treatment_paths(
            D_mat=D_mat,
            first_switch_idx=first_switch_idx,
            N_mat=N_mat,
            L_max=L_max,
            by_path=by_path,
            paths_of_interest=paths_of_interest,
        )

    n_groups_total = D_mat.shape[0]
    S_arr = switch_direction.astype(float)

    path_cumulated: Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]] = {}
    for path in selected_paths:
        if path not in path_effects:
            continue
        switcher_mask = path_to_group_mask[path]
        # Per-group running sum of DID^{fd}_{g, l'} for path switchers
        running_per_group = np.zeros(n_groups_total)
        path_horizons_anal = path_effects[path].get("horizons", {})
        cumulated_path: Dict[int, Dict[str, Any]] = {}

        for l_h in range(1, L_max + 1):
            if l_h not in multi_horizon_dids:
                continue
            mh = multi_horizon_dids[l_h]
            did_g_l = mh.get("did_g_l")
            eligible_global = mh.get("eligible_mask")
            if did_g_l is None or eligible_global is None:
                cumulated_path[l_h] = {
                    "effect": float("nan"),
                    "se": float("nan"),
                    "t_stat": float("nan"),
                    "p_value": float("nan"),
                    "conf_int": (float("nan"), float("nan")),
                    "n_obs": 0,
                }
                continue

            # Add this horizon's per-group DID to running sum (NaN -> 0
            # for accumulation, matching the global cumulation pattern)
            increment = np.where(np.isfinite(did_g_l), did_g_l, 0.0)
            running_per_group += increment

            # Path-restricted eligible mask at horizon l
            eligible_path = switcher_mask & eligible_global
            n_l_path = int(eligible_path.sum())
            if n_l_path == 0:
                cumulated_path[l_h] = {
                    "effect": float("nan"),
                    "se": float("nan"),
                    "t_stat": float("nan"),
                    "p_value": float("nan"),
                    "conf_int": (float("nan"), float("nan")),
                    "n_obs": 0,
                }
                continue

            cum_effect = float(
                np.sum(S_arr[eligible_path] * running_per_group[eligible_path]) / n_l_path
            )

            # Conservative SE upper bound: sum of per-horizon component
            # SEs from path_effects (matches global formula at :3402-3413).
            # NaN-consistency: any non-finite component SE -> cumulated NaN.
            component_ses = [
                path_horizons_anal.get(ll, {}).get("se", float("nan"))
                for ll in range(1, l_h + 1)
            ]
            if all(np.isfinite(s) for s in component_ses):
                cum_se = float(sum(component_ses))
            else:
                cum_se = float("nan")

            cum_t, cum_p, cum_ci = safe_inference(
                cum_effect,
                cum_se,
                alpha=alpha,
                df=df_inference,
            )
            cumulated_path[l_h] = {
                "effect": cum_effect,
                "se": cum_se,
                "t_stat": cum_t,
                "p_value": cum_p,
                "conf_int": cum_ci,
                "n_obs": n_l_path,
            }

        path_cumulated[path] = cumulated_path

    return path_cumulated


def _compute_path_placebos(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    T_g: np.ndarray,
    L_max: int,
    by_path: Optional[int],
    eligible_mask_var: np.ndarray,
    multi_horizon_placebos: Dict[int, Dict[str, Any]],
    alpha: float,
    df_inference: Optional[int] = None,
    set_ids: Optional[np.ndarray] = None,
    paths_of_interest: Optional[List[Tuple[int, ...]]] = None,
    obs_survey_info: Optional[Dict[str, Any]] = None,
    eligible_groups: Optional[List[Any]] = None,
    replicate_n_valid_list: Optional[List[int]] = None,
) -> Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]]:
    """
    Compute per-path backward-horizon placebos ``DID^{pl}_{path, l}``.

    Sibling of ``_compute_path_effects``: walks the same path enumeration
    and cohort-id pipeline but loops over backward horizons (lag
    ``l = 1..L_max``) using ``_compute_per_group_if_placebo_horizon`` with
    the new ``switcher_subset_mask`` parameter to zero out switcher
    contributions for groups not in the selected path.

    SE depends on ``obs_survey_info`` exactly like ``_compute_path_effects``:

    * Non-survey: cohort-recentered plug-in with path-specific divisor
      ``N^{pl}_{l, path}`` (joiners/leavers IF precedent applied backward).
* Survey: the path-restricted per-period IF is routed through ``_survey_se_from_group_if`` (analytical Binder TSL cell-period allocator; replicate-weight designs use the cell allocator unconditionally). Under replicate weights, every per-(path, lag) fit appends ``n_valid`` to ``replicate_n_valid_list``, and the shared post-call ``_refresh_path_inference`` re-runs ``safe_inference`` on every populated entry so the stored inference fields use the final ``df_survey``. Inner-dict keys are **negative** ints (-l for lag l) to match the overall ``placebo_event_study`` convention, so a unified ``{**path_effects[p]["horizons"], **path_placebo_event_study[p]}`` view is well-formed. Returns ``{path: {-l: {effect, se, t_stat, p_value, conf_int, n_obs}}}`` directly (no ``n_groups`` / ``frequency_rank`` wrapper — those are already on ``path_effects[path]``; the rendering layer sorts by that rank). Returns ``{}`` when ``by_path`` was requested but no path has a complete window (mirrors ``_compute_path_effects``); the empty dict is the "requested but empty" sentinel distinct from ``None``. Inherits the cross-path cohort-sharing SE deviation from R that PR #360 documented for ``_compute_path_effects`` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within numerical tolerance on single-path cohort panels; diverges on cohort-mixed panels. See ``Note (Phase 3 by_path ...)`` in ``docs/methodology/REGISTRY.md``. The ``_enumerate_treatment_paths`` call here is wrapped in ``warnings.catch_warnings`` to suppress the overflow ``UserWarning`` duplicate — the analytical event-study pass (``_compute_path_effects``) has already surfaced that warning to the caller. 
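The unified view mentioned above can be illustrated with toy dictionaries. The numbers below are invented for illustration only; the sketch shows just the key layout (positive forward horizons, negative placebo lags), not real estimator output.

```python
# Toy values only: forward horizons use positive keys, placebo lags negative keys.
forward_horizons = {1: {"effect": 0.40}, 2: {"effect": 0.55}}
placebo_lags = {-1: {"effect": 0.02}, -2: {"effect": -0.01}}

# The key sets are sign-disjoint, so the merge cannot overwrite an entry.
unified = {**forward_horizons, **placebo_lags}

# Sorting the keys gives event-time order: placebo lags first, then effects.
event_time_order = sorted(unified)
```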
""" from diff_diff.utils import safe_inference with warnings.catch_warnings(): warnings.simplefilter("ignore", UserWarning) selected_paths, path_to_group_mask, _ = _enumerate_treatment_paths( D_mat=D_mat, first_switch_idx=first_switch_idx, N_mat=N_mat, L_max=L_max, by_path=by_path, paths_of_interest=paths_of_interest, ) if not selected_paths: return {} n_groups = D_mat.shape[0] cohort_keys = [ ( float(baselines[g]), int(first_switch_idx[g]), int(switch_direction[g]), ) for g in range(n_groups) ] unique_c: Dict[Tuple[float, int, int], int] = {} cid = np.zeros(n_groups, dtype=int) for g in range(n_groups): if not eligible_mask_var[g]: cid[g] = -1 continue key = cohort_keys[g] if key not in unique_c: unique_c[key] = len(unique_c) cid[g] = unique_c[key] cohort_id_eligible = cid[eligible_mask_var] path_placebos: Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]] = {} for path in selected_paths: switcher_mask = path_to_group_mask[path] per_path_pl_if = _compute_per_group_if_placebo_horizon( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx, switch_direction=switch_direction, T_g=T_g, L_max=L_max, set_ids=set_ids, compute_per_period=(obs_survey_info is not None), switcher_subset_mask=switcher_mask, ) horizons: Dict[int, Dict[str, Any]] = {} for lag_l in range(1, L_max + 1): U_pl_l_path, U_pp_pl_l_path = per_path_pl_if[lag_l] pl_data = multi_horizon_placebos.get(lag_l) if pl_data is None: n_pl_l_path = 0 else: eligible_mask_pl = pl_data.get("eligible_mask") if eligible_mask_pl is None: n_pl_l_path = 0 else: n_pl_l_path = int(np.sum(switcher_mask & eligible_mask_pl)) if n_pl_l_path == 0: horizons[-lag_l] = { "effect": float("nan"), "se": float("nan"), "t_stat": float("nan"), "p_value": float("nan"), "conf_int": (float("nan"), float("nan")), "n_obs": 0, } continue U_pl_l_path_elig = U_pl_l_path[eligible_mask_var] effect_pl_path = float(U_pl_l_path.sum() / n_pl_l_path) U_centered_pl_path = _cohort_recenter(U_pl_l_path_elig, 
cohort_id_eligible) if obs_survey_info is None: se_pl_path = _plugin_se(U_centered=U_centered_pl_path, divisor=n_pl_l_path) else: assert U_pp_pl_l_path is not None assert eligible_groups is not None U_pp_pl_l_path_elig = U_pp_pl_l_path[eligible_mask_var] U_centered_pp_pl_path = _cohort_recenter_per_period( U_pp_pl_l_path_elig, cohort_id_eligible ) U_pl_scaled = U_centered_pl_path / n_pl_l_path U_pp_pl_scaled = U_centered_pp_pl_path / n_pl_l_path se_pl_path, n_valid_pl_replicates = _survey_se_from_group_if( U_centered=U_pl_scaled, eligible_groups=eligible_groups, obs_survey_info=obs_survey_info, U_centered_per_period=U_pp_pl_scaled, ) if n_valid_pl_replicates is not None and replicate_n_valid_list is not None: replicate_n_valid_list.append(n_valid_pl_replicates) if np.isnan(se_pl_path) and U_centered_pl_path.size > 0 and n_pl_l_path > 0: warnings.warn( f"Cohort-recentered analytical variance is " f"unidentified for path={path} at placebo lag " f"l={lag_l}: the path-subset centered placebo " f"influence function is identically zero (every " f"variance-eligible path switcher forms its own " f"(D_{{g,1}}, F_g, S_g) cohort, or the path has a " f"single contributing group). DID^{{pl}}_{{path,l}} " f"point estimate is still valid; SE / t_stat / " f"p_value / conf_int are NaN-consistent. 
Rare paths " f"with few contributing groups routinely hit this " f"case at placebo horizons.", UserWarning, stacklevel=2, ) t_pl, p_pl, ci_pl = safe_inference( effect_pl_path, se_pl_path, alpha=alpha, df=df_inference ) horizons[-lag_l] = { "effect": effect_pl_path, "se": se_pl_path, "t_stat": t_pl, "p_value": p_pl, "conf_int": ci_pl, "n_obs": n_pl_l_path, } path_placebos[path] = horizons return path_placebos def _compute_path_heterogeneity_test( Y_mat: np.ndarray, N_mat: np.ndarray, baselines: np.ndarray, first_switch_idx: np.ndarray, switch_direction: np.ndarray, T_g: np.ndarray, X_het: np.ndarray, L_max: int, by_path: Optional[int], paths_of_interest: Optional[List[Tuple[int, ...]]], D_mat: np.ndarray, alpha: float = 0.05, rank_deficient_action: str = "warn", group_ids_order: Optional[np.ndarray] = None, obs_survey_info: Optional[Dict[str, Any]] = None, replicate_n_valid_list: Optional[List[int]] = None, ) -> Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]]: """Per-path heterogeneity test (Web Appendix Section 1.5, Lemma 7). For each selected path ``p``, runs ``_compute_heterogeneity_test`` on the path-restricted switcher subsample. Cohort dummies absorb baseline by construction, so the path-restricted regression is methodologically well-posed even when path switchers span multiple baselines. Mirrors R ``did_multiplegt_dyn(..., by_path, predict_het)`` semantics: the R per-path dispatcher re-runs ``did_multiplegt_main(..., predict_het=...)`` on each path-restricted subsample, which is exactly what this helper does in Python. The ``_enumerate_treatment_paths`` call here re-derives the path enumeration (already computed elsewhere in fit() for ``path_effects``). The call is wrapped in ``warnings.catch_warnings()`` to suppress duplicate unobserved-path / by_path-exceeds-observed warnings; the upstream ``_compute_path_effects`` call already surfaced them. 
Returns ------- Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]] ``{path: {l: {beta, se, t_stat, p_value, conf_int, n_obs}}}``. Returns ``{}`` if ``selected_paths`` is empty. Return type is ``Optional[...]`` for caller-contract symmetry; the helper itself never produces ``None`` (the empty-state distinction ``None`` not requested vs ``{}`` requested but empty lives at the caller site in ``fit()``). """ with warnings.catch_warnings(): warnings.simplefilter("ignore", UserWarning) selected_paths, path_to_group_mask, _ = _enumerate_treatment_paths( D_mat=D_mat, first_switch_idx=first_switch_idx, N_mat=N_mat, L_max=L_max, by_path=by_path, paths_of_interest=paths_of_interest, ) out: Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]] = {} for path in selected_paths: mask = path_to_group_mask[path] path_groups: Set[int] = {int(g) for g in np.flatnonzero(mask)} out[path] = _compute_heterogeneity_test( Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx, switch_direction=switch_direction, T_g=T_g, X_het=X_het, L_max=L_max, alpha=alpha, rank_deficient_action=rank_deficient_action, group_ids_order=group_ids_order, obs_survey_info=obs_survey_info, replicate_n_valid_list=replicate_n_valid_list, path_groups=path_groups, ) return out def _collect_path_bootstrap_inputs( D_mat: np.ndarray, Y_mat: np.ndarray, N_mat: np.ndarray, baselines: np.ndarray, first_switch_idx: np.ndarray, switch_direction: np.ndarray, T_g: np.ndarray, L_max: int, by_path: Optional[int], eligible_mask_var: np.ndarray, multi_horizon_dids: Dict[int, Dict[str, Any]], path_effects: Dict[Tuple[int, ...], Dict[str, Any]], set_ids: Optional[np.ndarray] = None, paths_of_interest: Optional[List[Tuple[int, ...]]] = None, ) -> Dict[Tuple[int, ...], Dict[int, Tuple[np.ndarray, int, float, None]]]: """ Collect per-(path, horizon) inputs for the bootstrap mixin. 
Walks the same path enumeration / per-path IF / cohort-recentering pipeline that ``_compute_path_effects`` uses, but returns the intermediate ``(U_centered_path, n_l_path, effect_path)`` triples needed by ``_compute_dcdh_bootstrap``. Lives as a sibling of ``_compute_path_effects`` (not extending it) to keep the already-large analytical helper focused and avoid a polymorphic return shape. The bootstrap-only recomputation is O(n_paths * n_groups * L_max), which is small compared to the bootstrap draw loop itself. The point estimate per ``(path, horizon)`` is read from ``path_effects`` to stay bit-identical with the analytical pass; the bootstrap distribution gets centered on this value by ``_bootstrap_one_target`` downstream. Returns a nested dict ``{path: {horizon: (U_centered, n, effect, None)}}``; the 4th slot is always ``None`` because the multiplier-bootstrap path under ``survey_design + by_path`` is gated out at fit-time (``n_bootstrap > 0`` + ``survey_design`` + per-path selectors raises ``NotImplementedError``); analytical and replicate-weight survey SE under per-path selectors flows through ``_compute_path_effects`` directly and does not reach this helper. ``_enumerate_treatment_paths`` is called again here (the analytical pass already called it inside ``_compute_path_effects``). The enumeration result is deterministic given identical arguments, so the selected paths and group masks will match bit-for-bit. The surrounding ``warnings.catch_warnings`` suppresses the overflow ``UserWarning`` from the re-enumeration — the analytical pass has already surfaced that warning to the caller, and re-emitting it from the bootstrap helper would be a spurious duplicate. 
""" with warnings.catch_warnings(): warnings.simplefilter("ignore", UserWarning) selected_paths, path_to_group_mask, _ = _enumerate_treatment_paths( D_mat=D_mat, first_switch_idx=first_switch_idx, N_mat=N_mat, L_max=L_max, by_path=by_path, paths_of_interest=paths_of_interest, ) n_groups = D_mat.shape[0] cohort_keys = [ ( float(baselines[g]), int(first_switch_idx[g]), int(switch_direction[g]), ) for g in range(n_groups) ] unique_c: Dict[Tuple[float, int, int], int] = {} cid = np.zeros(n_groups, dtype=int) for g in range(n_groups): if not eligible_mask_var[g]: cid[g] = -1 continue key = cohort_keys[g] if key not in unique_c: unique_c[key] = len(unique_c) cid[g] = unique_c[key] cohort_id_eligible = cid[eligible_mask_var] path_bootstrap_inputs: Dict[Tuple[int, ...], Dict[int, Tuple[np.ndarray, int, float, None]]] = ( {} ) for path in selected_paths: switcher_mask = path_to_group_mask[path] per_path_if = _compute_per_group_if_multi_horizon( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx, switch_direction=switch_direction, T_g=T_g, L_max=L_max, set_ids=set_ids, compute_per_period=False, switcher_subset_mask=switcher_mask, ) horizon_inputs: Dict[int, Tuple[np.ndarray, int, float, None]] = {} path_analytical = path_effects.get(path) if path_analytical is None: continue for l_h in range(1, L_max + 1): U_l_path, _ = per_path_if[l_h] did_g_l = multi_horizon_dids[l_h].get("did_g_l") if did_g_l is None: continue n_l_path = int(np.sum(switcher_mask & ~np.isnan(did_g_l))) if n_l_path == 0: continue U_l_path_elig = U_l_path[eligible_mask_var] U_centered_path = _cohort_recenter(U_l_path_elig, cohort_id_eligible) effect_path = float(path_analytical["horizons"][l_h]["effect"]) horizon_inputs[l_h] = (U_centered_path, n_l_path, effect_path, None) if horizon_inputs: path_bootstrap_inputs[path] = horizon_inputs return path_bootstrap_inputs def _collect_path_placebo_bootstrap_inputs( D_mat: np.ndarray, Y_mat: np.ndarray, N_mat: np.ndarray, 
baselines: np.ndarray, first_switch_idx: np.ndarray, switch_direction: np.ndarray, T_g: np.ndarray, L_max: int, by_path: Optional[int], eligible_mask_var: np.ndarray, multi_horizon_placebos: Dict[int, Dict[str, Any]], path_placebos: Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]], set_ids: Optional[np.ndarray] = None, paths_of_interest: Optional[List[Tuple[int, ...]]] = None, ) -> Dict[Tuple[int, ...], Dict[int, Tuple[np.ndarray, int, float, None]]]: """ Collect per-(path, lag) inputs for the placebo bootstrap mixin dispatch. Sibling of ``_collect_path_bootstrap_inputs``. Walks the same path enumeration / per-path placebo IF / cohort-recentering pipeline that ``_compute_path_placebos`` uses, but returns the ``(U_centered_path, n_pl_l_path, effect_pl_path)`` triples needed by ``_compute_dcdh_bootstrap``'s per-`(path, lag_l)` placebo dispatch block. Returned dict keys lag by **positive** int (l = 1..L_max), matching the inner-key convention of ``placebo_horizon_inputs`` already consumed by the bootstrap mixin. The propagation block in ``fit()`` translates back to negative-keyed ``path_placebo_event_study[path][-lag_l]`` post-bootstrap. The point estimate per ``(path, lag_l)`` is read from ``path_placebos[path][-lag_l]["effect"]`` (note: no ``["horizons"]`` wrapper -- ``_compute_path_placebos`` returns the negative-keyed inner dict directly, unlike ``_compute_path_effects`` which wraps its horizons under a ``["horizons"]`` key) to stay bit-identical with the analytical pass; the bootstrap distribution gets centered on this value by ``_bootstrap_one_target`` downstream. The ``warnings.catch_warnings`` block suppresses the re-enumeration overflow ``UserWarning``; the analytical event-study pass (``_compute_path_effects``) already surfaced that warning. 
""" with warnings.catch_warnings(): warnings.simplefilter("ignore", UserWarning) selected_paths, path_to_group_mask, _ = _enumerate_treatment_paths( D_mat=D_mat, first_switch_idx=first_switch_idx, N_mat=N_mat, L_max=L_max, by_path=by_path, paths_of_interest=paths_of_interest, ) n_groups = D_mat.shape[0] cohort_keys = [ ( float(baselines[g]), int(first_switch_idx[g]), int(switch_direction[g]), ) for g in range(n_groups) ] unique_c: Dict[Tuple[float, int, int], int] = {} cid = np.zeros(n_groups, dtype=int) for g in range(n_groups): if not eligible_mask_var[g]: cid[g] = -1 continue key = cohort_keys[g] if key not in unique_c: unique_c[key] = len(unique_c) cid[g] = unique_c[key] cohort_id_eligible = cid[eligible_mask_var] path_placebo_bootstrap_inputs: Dict[ Tuple[int, ...], Dict[int, Tuple[np.ndarray, int, float, None]] ] = {} for path in selected_paths: switcher_mask = path_to_group_mask[path] per_path_pl_if = _compute_per_group_if_placebo_horizon( D_mat=D_mat, Y_mat=Y_mat, N_mat=N_mat, baselines=baselines, first_switch_idx=first_switch_idx, switch_direction=switch_direction, T_g=T_g, L_max=L_max, set_ids=set_ids, compute_per_period=False, switcher_subset_mask=switcher_mask, ) horizon_inputs: Dict[int, Tuple[np.ndarray, int, float, None]] = {} path_analytical = path_placebos.get(path) if path_analytical is None: continue for lag_l in range(1, L_max + 1): U_pl_l_path, _ = per_path_pl_if[lag_l] pl_data = multi_horizon_placebos.get(lag_l) if pl_data is None: continue eligible_mask_pl = pl_data.get("eligible_mask") if eligible_mask_pl is None: continue n_pl_l_path = int(np.sum(switcher_mask & eligible_mask_pl)) if n_pl_l_path == 0: continue U_pl_l_path_elig = U_pl_l_path[eligible_mask_var] U_centered_pl_path = _cohort_recenter(U_pl_l_path_elig, cohort_id_eligible) effect_pl_path = float(path_analytical[-lag_l]["effect"]) horizon_inputs[lag_l] = ( U_centered_pl_path, n_pl_l_path, effect_pl_path, None, ) if horizon_inputs: path_placebo_bootstrap_inputs[path] = 
horizon_inputs return path_placebo_bootstrap_inputs def _compute_per_group_if_placebo_horizon( D_mat: np.ndarray, Y_mat: np.ndarray, N_mat: np.ndarray, baselines: np.ndarray, first_switch_idx: np.ndarray, switch_direction: np.ndarray, T_g: np.ndarray, L_max: int, set_ids: Optional[np.ndarray] = None, compute_per_period: bool = True, switcher_subset_mask: Optional[np.ndarray] = None, ) -> Dict[int, Tuple[np.ndarray, Optional[np.ndarray]]]: """ Compute per-group influence function for placebo horizons. Mirrors ``_compute_per_group_if_multi_horizon`` but for backward horizons, matching ``_compute_multi_horizon_placebos`` eligibility and control-pool logic exactly. For placebo lag ``l``, switcher ``g``'s contribution uses the backward outcome change ``Y_{g, F_g-1-l} - Y_{g, F_g-1}`` (paper convention: pre-period minus reference). Controls are identified by the **positive**-horizon cutoff ``F_{g'} > F_g - 1 + l`` AND observation at ``ref_idx``, ``backward_idx``, AND ``forward_idx`` (the terminal-missingness guard from Phase 2 Round 9). Parameters ---------- switcher_subset_mask : np.ndarray of bool, shape (n_groups,), optional When supplied, restricts the switcher iteration to groups where the mask is ``True``. Groups outside the subset contribute as controls only (their switcher-side contribution is skipped). The control pool is unchanged. Mirrors the same parameter on ``_compute_per_group_if_multi_horizon`` and is used by ``by_path`` placebos to zero out switcher contributions for groups not in the selected path. Default ``None`` preserves the legacy behavior of iterating over all switchers. Returns ------- dict mapping lag l (positive int) -> (U_pl_l, U_per_period_pl_l) tuple - ``U_pl_l``: np.ndarray of shape (n_groups,). NOT cohort- centered; the caller applies ``_cohort_recenter()``. - ``U_per_period_pl_l``: np.ndarray of shape (n_groups, n_periods). 
      Per-``(g, t)`` contributions attributed to ``t = backward_idx``
      (the pre-period observation that supports the contrast).
      Satisfies ``U_per_period_pl_l.sum(axis=1) == U_pl_l``.
    """
    n_groups, n_periods = D_mat.shape
    is_switcher = first_switch_idx >= 0

    unique_baselines = np.unique(baselines)
    baseline_groups: Dict[float, np.ndarray] = {}
    baseline_f: Dict[float, np.ndarray] = {}
    for d in unique_baselines:
        mask = baselines == d
        baseline_groups[float(d)] = np.where(mask)[0]
        baseline_f[float(d)] = first_switch_idx[mask]

    results: Dict[int, Tuple[np.ndarray, Optional[np.ndarray]]] = {}

    for l in range(1, L_max + 1):  # noqa: E741
        U_pl = np.zeros(n_groups, dtype=float)
        U_per_period_pl: Optional[np.ndarray] = (
            np.zeros((n_groups, n_periods), dtype=float) if compute_per_period else None
        )

        for g in range(n_groups):
            if not is_switcher[g]:
                continue
            if switcher_subset_mask is not None and not switcher_subset_mask[g]:
                continue

            f_g = first_switch_idx[g]
            ref_idx = f_g - 1
            backward_idx = ref_idx - l
            forward_idx = ref_idx + l

            # Dual eligibility (matches _compute_multi_horizon_placebos)
            if backward_idx < 0 or forward_idx >= n_periods:
                continue
            if N_mat[g, ref_idx] <= 0 or N_mat[g, backward_idx] <= 0:
                continue
            if T_g[g] < forward_idx:
                continue

            d_base = float(baselines[g])
            S_g = float(switch_direction[g])

            # Control pool: same baseline, not switched by forward_idx,
            # observed at ref, backward, AND forward (terminal-missingness
            # guard). Matches _compute_multi_horizon_placebos exactly.
            ctrl_indices = baseline_groups[d_base]
            ctrl_f = baseline_f[d_base]
            ctrl_mask = (
                ((ctrl_f > forward_idx) | (ctrl_f == -1))
                & (N_mat[ctrl_indices, ref_idx] > 0)
                & (N_mat[ctrl_indices, backward_idx] > 0)
                & (N_mat[ctrl_indices, forward_idx] > 0)
            )
            # State-set trends: restrict controls to same set
            if set_ids is not None:
                ctrl_mask &= set_ids[ctrl_indices] == set_ids[g]
            ctrl_pool = ctrl_indices[ctrl_mask]
            n_ctrl = ctrl_pool.size

            if n_ctrl == 0:
                continue

            # Switcher contribution: paper convention backward - ref.
            # Attribute the whole contrast to the backward cell
            # (mirrors the multi-horizon / DID_M post-period
            # attribution convention).
            switcher_change = Y_mat[g, backward_idx] - Y_mat[g, ref_idx]
            U_pl[g] += S_g * switcher_change
            if U_per_period_pl is not None:
                U_per_period_pl[g, backward_idx] += S_g * switcher_change

            # Control contributions
            ctrl_changes = Y_mat[ctrl_pool, backward_idx] - Y_mat[ctrl_pool, ref_idx]
            ctrl_contrib = (S_g / n_ctrl) * ctrl_changes
            U_pl[ctrl_pool] -= ctrl_contrib
            if U_per_period_pl is not None:
                U_per_period_pl[ctrl_pool, backward_idx] -= ctrl_contrib

        results[l] = (U_pl, U_per_period_pl)

    return results


def _compute_multi_horizon_placebos(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    T_g: np.ndarray,
    L_max: int,
    set_ids: Optional[np.ndarray] = None,
) -> Dict[int, Dict[str, Any]]:
    """
    Compute dynamic placebo estimators ``DID^{pl}_l`` for ``l = 1..L_pl_max``.

    Mirrors ``_compute_multi_horizon_dids`` but looks BACKWARD from each
    group's reference period (Web Appendix Section 1.1, Lemma 5).

    **Dual eligibility condition:** a group g is eligible for placebo lag l iff:
    - ``F_g - 1 - l >= 0`` (enough pre-treatment history), AND
    - ``F_g - 1 + l <= T_g`` (positive-horizon control pool exists)

    The control set uses the *positive*-horizon cutoff:
    ``{g': D_{g',1} = D_{g,1}, F_{g'} > F_g - 1 + l}``.
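As a rough sketch, the dual eligibility condition above reduces to three index checks. The function name ``placebo_eligible`` and its scalar arguments are illustrative assumptions (0-based period indices, with ``f_g`` the first-switch index and ``t_g`` the last usable period); the observation-count checks on ``N_mat`` that the estimator also applies are left out here.

```python
def placebo_eligible(f_g: int, lag: int, t_g: int, n_periods: int) -> bool:
    # Hypothetical helper; f_g is the 0-based index of the first switch period.
    ref_idx = f_g - 1              # last period before the switch
    backward_idx = ref_idx - lag   # pre-period outcome used by the placebo
    forward_idx = ref_idx + lag    # mirrored horizon that defines the control pool
    # Both mirrored periods must exist inside the panel.
    if backward_idx < 0 or forward_idx >= n_periods:
        return False
    # The forward period must not lie beyond the group's last usable period.
    return forward_idx <= t_g
```

For a group that first switches at index 3 in a six-period panel, lag 1 is eligible, while a group switching at index 1 has no pre-period left for any lag.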

    Returns
    -------
    dict mapping lag l (positive int) -> {
        "placebo_l": float,
        "N_pl_l": int,
        "eligible_mask": np.ndarray,
    }
    """
    n_groups, n_periods = D_mat.shape
    is_switcher = first_switch_idx >= 0

    unique_baselines = np.unique(baselines)
    baseline_groups: Dict[float, np.ndarray] = {}
    baseline_f: Dict[float, np.ndarray] = {}
    for d in unique_baselines:
        mask = baselines == d
        baseline_groups[float(d)] = np.where(mask)[0]
        baseline_f[float(d)] = first_switch_idx[mask]

    results: Dict[int, Dict[str, Any]] = {}
    a11_placebo_warnings: List[str] = []

    for l in range(1, L_max + 1):  # noqa: E741
        eligible = np.zeros(n_groups, dtype=bool)
        pl_g_l = np.full(n_groups, np.nan)

        for g in range(n_groups):
            if not is_switcher[g]:
                continue
            f_g = first_switch_idx[g]
            ref_idx = f_g - 1
            backward_idx = ref_idx - l  # the pre-treatment outcome period
            forward_idx = ref_idx + l  # for control-pool eligibility

            # Dual eligibility: backward must be in range, forward must
            # have controls available
            if backward_idx < 0 or forward_idx >= n_periods:
                continue
            if N_mat[g, ref_idx] <= 0 or N_mat[g, backward_idx] <= 0:
                continue
            if T_g[g] < forward_idx:
                continue

            eligible[g] = True

        N_pl_l = int(eligible.sum())
        if N_pl_l == 0:
            results[l] = {
                "placebo_l": float("nan"),
                "N_pl_l": 0,
                "eligible_mask": eligible,
            }
            continue

        for g in np.where(eligible)[0]:
            f_g = first_switch_idx[g]
            ref_idx = f_g - 1
            backward_idx = ref_idx - l
            forward_idx = ref_idx + l
            d_base = float(baselines[g])

            # Switcher's backward outcome change: pre-period minus reference
            # (paper convention: Y_{F_g-1-l} - Y_{F_g-1})
            switcher_change = Y_mat[g, backward_idx] - Y_mat[g, ref_idx]

            # Control pool: same baseline, not switched by forward_idx,
            # AND observed at all three relevant periods (ref, backward,
            # AND forward - the last ensures terminally missing controls
            # don't leak into the placebo computation).
            ctrl_indices = baseline_groups[d_base]
            ctrl_f = baseline_f[d_base]
            ctrl_mask = (
                ((ctrl_f > forward_idx) | (ctrl_f == -1))
                & (N_mat[ctrl_indices, ref_idx] > 0)
                & (N_mat[ctrl_indices, backward_idx] > 0)
                & (N_mat[ctrl_indices, forward_idx] > 0)
            )
            # State-set trends: restrict controls to same set
            if set_ids is not None:
                ctrl_mask &= set_ids[ctrl_indices] == set_ids[g]
            ctrl_pool = ctrl_indices[ctrl_mask]

            if ctrl_pool.size == 0:
                eligible[g] = False
                a11_placebo_warnings.append(f"placebo lag {l}, group_idx {g}: no controls")
                continue

            ctrl_changes = Y_mat[ctrl_pool, backward_idx] - Y_mat[ctrl_pool, ref_idx]
            ctrl_avg = float(ctrl_changes.mean())
            pl_g_l[g] = switcher_change - ctrl_avg

        # Recompute N_pl_l after control-pool exclusions
        N_pl_l = int(eligible.sum())
        if N_pl_l == 0:
            results[l] = {
                "placebo_l": float("nan"),
                "N_pl_l": 0,
                "eligible_mask": eligible,
            }
            continue

        S_eligible = switch_direction[eligible].astype(float)
        pl_g_eligible = pl_g_l[eligible]
        placebo_l = float((S_eligible * pl_g_eligible).sum() / N_pl_l)

        results[l] = {
            "placebo_l": placebo_l,
            "N_pl_l": N_pl_l,
            "eligible_mask": eligible,
        }

    if a11_placebo_warnings:
        results["_a11_warnings"] = a11_placebo_warnings  # type: ignore[assignment]

    return results


def _compute_normalized_effects(
    multi_horizon_dids: Dict[int, Dict[str, Any]],
    D_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    L_max: int,
) -> Dict[int, Dict[str, Any]]:
    """
    Compute normalized event-study effects ``DID^n_l = DID_l / delta^D_l``.

    Uses the general formula (Eq 15) that works for both binary and
    non-binary treatment (future-proofing for Phase 3). For binary
    treatment: ``delta^D_{g,l} = l`` (joiners) or ``-l`` (leavers), so
    ``|delta^D_{g,l}| = l`` and ``DID^n_l = DID_l / l``.
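For intuition, the binary-treatment case above can be sketched in plain Python. Both helper names are illustrative assumptions, and the sketch takes the baseline ``D_{g,1}`` to be the first entry of a group's treatment row, whereas the estimator receives baselines as a separate array.

```python
from typing import Sequence


def incremental_dose(d_row: Sequence[float], f_g: int, horizon: int) -> float:
    # Hypothetical helper: sum of (D_{g, F_g + k} - D_{g,1}) over k = 0..horizon-1,
    # skipping columns that fall past the end of the panel.
    baseline = d_row[0]
    return sum(d_row[f_g + k] - baseline for k in range(horizon) if f_g + k < len(d_row))


def normalized_effect(did_l: float, doses: Sequence[float]) -> float:
    # Divide DID_l by the average absolute incremental dose across eligible groups.
    denom = sum(abs(d) for d in doses) / len(doses)
    return did_l / denom if denom > 0 else float("nan")
```

A joiner that stays treated accumulates a dose of ``l`` at horizon ``l`` and a leaver accumulates ``-l``, so with only such groups the denominator is ``l``.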

    Returns
    -------
    dict mapping l -> {effect, denominator}
    """
    n_groups = D_mat.shape[0]
    results: Dict[int, Dict[str, Any]] = {}

    for l in range(1, L_max + 1):  # noqa: E741
        h = multi_horizon_dids.get(l)
        if h is None or h["N_l"] == 0:
            results[l] = {"effect": float("nan"), "denominator": float("nan")}
            continue

        eligible = h["eligible_mask"]
        N_l = h["N_l"]
        did_l = h["did_l"]

        # Per-group incremental dose: delta^D_{g,l} = sum_{k=0}^{l-1} (D_{g,F_g+k} - D_{g,1})
        # General formula, works for non-binary treatment.
        delta_D_g = np.zeros(n_groups)
        for g in np.where(eligible)[0]:
            f_g = first_switch_idx[g]
            d_base = baselines[g]
            dose_sum = 0.0
            for k in range(l):
                col = f_g + k
                if col < D_mat.shape[1]:
                    dose_sum += D_mat[g, col] - d_base
            delta_D_g[g] = dose_sum

        # Aggregate dose denominator
        delta_D_l = float(np.abs(delta_D_g[eligible]).sum() / N_l)

        if delta_D_l <= 0:
            results[l] = {"effect": float("nan"), "denominator": 0.0}
            continue

        results[l] = {
            "effect": did_l / delta_D_l,
            "denominator": delta_D_l,
        }

    return results


def _compute_cost_benefit_delta(
    multi_horizon_dids: Dict[int, Dict[str, Any]],
    D_mat: np.ndarray,
    baselines: np.ndarray,
    first_switch_idx: np.ndarray,
    switch_direction: np.ndarray,
    L_max: int,
) -> Dict[str, Any]:
    """
    Compute the cost-benefit aggregate ``delta`` from Section 3.3, Lemma 4.

    ``delta = sum_l w_l * DID_l`` where
    ``w_l = N_l / sum_{g,l'} |D_{g,F_g-1+l'} - D_{g,1}|``.

    When leavers are present (Assumption 7 violated), also computes
    ``delta_joiners`` and ``delta_leavers`` separately.

    Returns
    -------
    dict with keys: delta, weights, has_leavers, delta_joiners, delta_leavers
    """
    # Per-horizon dose via Lemma 4: w_l uses the PER-PERIOD dose
    # D_{g,F_g-1+l} - D_{g,1} (NOT the cumulative delta^D_{g,l}).
    # For binary joiners this is 1 per (g,l) pair, so w_l = N_l / sum N_l'.
    total_dose = 0.0
    per_horizon_dose: Dict[int, float] = {}
    for l in range(1, L_max + 1):  # noqa: E741
        h = multi_horizon_dids.get(l)
        if h is None or h["N_l"] == 0:
            per_horizon_dose[l] = 0.0
            continue
        eligible = h["eligible_mask"]
        dose_l = 0.0
        for g in np.where(eligible)[0]:
            f_g = first_switch_idx[g]
            col = f_g - 1 + l
            if col < D_mat.shape[1]:
                dose_l += abs(float(D_mat[g, col] - baselines[g]))
        per_horizon_dose[l] = dose_l
        total_dose += dose_l

    if total_dose <= 0:
        return {
            "delta": float("nan"),
            "weights": {},
            "has_leavers": False,
            "delta_joiners": float("nan"),
            "delta_leavers": float("nan"),
        }

    # Horizon weights: w_l = N_l / total_dose (but using dose, not N_l)
    # Per Lemma 4: w_l = N_l * E[|delta^D_{g,l}|] / total_dose
    # which simplifies to per_horizon_dose[l] / total_dose
    weights: Dict[int, float] = {}
    delta = 0.0
    for l in range(1, L_max + 1):  # noqa: E741
        h = multi_horizon_dids.get(l)
        if h is None or h["N_l"] == 0:
            weights[l] = 0.0
            continue
        w_l = per_horizon_dose[l] / total_dose
        weights[l] = w_l
        delta += w_l * h["did_l"]

    # Check for leavers (Assumption 7 violation)
    has_leavers = bool(np.any(switch_direction < 0))

    delta_joiners = float("nan")
    delta_leavers = float("nan")
    if has_leavers:
        # Compute delta separately for joiners and leavers
        for direction, attr_name in [(1, "joiners"), (-1, "leavers")]:
            dir_dose = 0.0
            dir_horizon_dose: Dict[int, float] = {}
            for l in range(1, L_max + 1):  # noqa: E741
                h = multi_horizon_dids.get(l)
                if h is None or h["N_l"] == 0:
                    dir_horizon_dose[l] = 0.0
                    continue
                eligible = h["eligible_mask"]
                dose_l = 0.0
                for g in np.where(eligible)[0]:
                    if switch_direction[g] != direction:
                        continue
                    f_g = first_switch_idx[g]
                    col = f_g - 1 + l
                    if col < D_mat.shape[1]:
                        dose_l += abs(float(D_mat[g, col] - baselines[g]))
                dir_horizon_dose[l] = dose_l
                dir_dose += dose_l

            if dir_dose > 0:
                dir_delta = 0.0
                for l in range(1, L_max + 1):  # noqa: E741
                    h = multi_horizon_dids.get(l)
                    if h is None or h["N_l"] == 0:
                        continue
                    eligible = h["eligible_mask"]
                    # Per-direction DID_l
                    dir_eligible = eligible & (switch_direction == direction)
                    n_dir = int(dir_eligible.sum())
                    if n_dir == 0:
                        continue
                    did_g_l = h["did_g_l"]
                    S = switch_direction[dir_eligible].astype(float)
                    did_l_dir = float((S * did_g_l[dir_eligible]).sum() / n_dir)
                    w_dir = dir_horizon_dose[l] / dir_dose
                    dir_delta += w_dir * did_l_dir

                if attr_name == "joiners":
                    delta_joiners = dir_delta
                else:
                    delta_leavers = dir_delta

    return {
        "delta": delta,
        "weights": weights,
        "has_leavers": has_leavers,
        "delta_joiners": delta_joiners,
        "delta_leavers": delta_leavers,
    }


def _compute_full_per_group_contributions(
    D_mat: np.ndarray,
    Y_mat: np.ndarray,
    N_mat: np.ndarray,
    n_10_t_arr: np.ndarray,
    n_00_t_arr: np.ndarray,
    n_01_t_arr: np.ndarray,
    n_11_t_arr: np.ndarray,
    a11_plus_zeroed_arr: np.ndarray,
    a11_minus_zeroed_arr: np.ndarray,
    side: str = "overall",
    compute_per_period: bool = True,
) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """
    Compute the per-group influence function ``U^G_g`` for ``DID_M``,
    ``DID_+``, or ``DID_-`` by summing role-weighted outcome differences
    across all periods (full ``Lambda^G_{g,l=1}`` from Section 3.7.2 of
    the dynamic companion paper, evaluated at horizon ``l = 1``).

    Decomposition (for ``side='overall'``)::

        N_S * DID_M = sum_t [
            sum_{g in joiners(t)} (Y_{g,t} - Y_{g,t-1})
            - (n_10_t / n_00_t) * sum_{g in stable_0(t)} (Y_{g,t} - Y_{g,t-1})
            + (n_01_t / n_11_t) * sum_{g in stable_1(t)} (Y_{g,t} - Y_{g,t-1})
            - sum_{g in leavers(t)} (Y_{g,t} - Y_{g,t-1})
        ]

    Each ``(g, t)`` cell contributes to ``U^G_g`` once per period, with
    the role weight determined by its ``(D_{g,t-1}, D_{g,t})`` transition.
    A switching group typically contributes from MULTIPLE periods (its
    own switch period + every period where it serves as a stable
    control); a never-switching group contributes only via its
    stable-control roles (which can be non-zero when it serves as a
    control for other cohorts' switches).
Periods where ``DID_+,t`` or ``DID_-,t`` were zeroed under the A11 convention contribute zero on the affected side, matching the point estimate. Parameters ---------- D_mat, Y_mat, N_mat : np.ndarray of shape (n_groups, n_periods) Pivoted treatment, outcome, and observation-count matrices. n_10_t_arr, n_00_t_arr, n_01_t_arr, n_11_t_arr : np.ndarray Per-period CELL counts aligned to ``periods[1:]``. a11_plus_zeroed_arr, a11_minus_zeroed_arr : np.ndarray of bool Per-period A11-zeroing flags aligned to ``periods[1:]``. side : {"overall", "joiners", "leavers"} Which contribution to compute: - ``"overall"``: returns ``U^G_g`` such that ``U.sum() == N_S * DID_M`` - ``"joiners"``: returns ``U^G_g`` such that ``U.sum() == joiner_total * DID_+`` (only the joiners + stable_0 terms) - ``"leavers"``: returns ``U^G_g`` such that ``U.sum() == leaver_total * DID_-`` (only the leavers + stable_1 terms, with the leavers side's sign convention) Returns ------- U : np.ndarray of shape (n_groups,) Per-group contributions. NOT cohort-centered; the caller is responsible for centering before computing the SE. U_per_period : np.ndarray of shape (n_groups, n_periods) or ``None`` Per-``(g, t)``-cell contributions, attributed to the post- period ``t`` of each transition pair. Satisfies ``U_per_period.sum(axis=1) == U`` exactly. NOT cohort-centered. Used by the cell-period IF allocator in ``_survey_se_from_group_if`` so that PSU/strata that vary across the cells of g can be honored by design-based variance. ``None`` when ``compute_per_period=False`` (the caller has no survey design, so the allocator is not needed and building the dense O(n_groups * n_periods) tensor would be wasted work). 
""" if side not in ("overall", "joiners", "leavers"): raise ValueError(f"side must be one of overall/joiners/leavers, got {side!r}") n_groups, n_periods = D_mat.shape U = np.zeros(n_groups, dtype=float) U_per_period: Optional[np.ndarray] = ( np.zeros((n_groups, n_periods), dtype=float) if compute_per_period else None ) include_joiners_side = side in ("overall", "joiners") include_leavers_side = side in ("overall", "leavers") # Per-cell attribution convention (not a derivation from the # observation-level survey linearization — see REGISTRY.md # ``ChaisemartinDHaultfoeuille`` Note on survey IF expansion): # attribute each (Y_curr - Y_prev) transition as a single # difference to its post-period cell (g, t_idx). Preserves the # row-sum identity U_per_period.sum(axis=1) == U and therefore # the group-sum invariance that makes the cell expansion # byte-identical to the pre-allocator convention under PSU=group. for t_idx in range(1, n_periods): d_curr = D_mat[:, t_idx] d_prev = D_mat[:, t_idx - 1] y_diff = Y_mat[:, t_idx] - Y_mat[:, t_idx - 1] n_curr = N_mat[:, t_idx] n_prev = N_mat[:, t_idx - 1] present = (n_curr > 0) & (n_prev > 0) joiner_mask = (d_prev == 0) & (d_curr == 1) & present stable0_mask = (d_prev == 0) & (d_curr == 0) & present leaver_mask = (d_prev == 1) & (d_curr == 0) & present stable1_mask = (d_prev == 1) & (d_curr == 1) & present n_10_t = int(n_10_t_arr[t_idx - 1]) n_00_t = int(n_00_t_arr[t_idx - 1]) n_01_t = int(n_01_t_arr[t_idx - 1]) n_11_t = int(n_11_t_arr[t_idx - 1]) # Joiners side (+y_diff for joiners; -(n_10/n_00)*y_diff for stable_0) if ( include_joiners_side and not bool(a11_plus_zeroed_arr[t_idx - 1]) and n_10_t > 0 and n_00_t > 0 ): U[joiner_mask] += y_diff[joiner_mask] U[stable0_mask] -= (n_10_t / n_00_t) * y_diff[stable0_mask] if U_per_period is not None: U_per_period[joiner_mask, t_idx] += y_diff[joiner_mask] U_per_period[stable0_mask, t_idx] -= (n_10_t / n_00_t) * y_diff[stable0_mask] # Leavers side (-y_diff for leavers; 
+(n_01/n_11)*y_diff for stable_1) if ( include_leavers_side and not bool(a11_minus_zeroed_arr[t_idx - 1]) and n_01_t > 0 and n_11_t > 0 ): U[leaver_mask] -= y_diff[leaver_mask] U[stable1_mask] += (n_01_t / n_11_t) * y_diff[stable1_mask] if U_per_period is not None: U_per_period[leaver_mask, t_idx] -= y_diff[leaver_mask] U_per_period[stable1_mask, t_idx] += (n_01_t / n_11_t) * y_diff[stable1_mask] return U, U_per_period def _cohort_recenter( U: np.ndarray, cohort_ids: np.ndarray, ) -> np.ndarray: """ Subtract cohort-conditional means from U. For each cohort id, computes ``U_bar_k = mean(U[cohort==k])`` and returns ``U - U_bar_{cohort(g)}``. This is the per-group cohort- recentering step from Web Appendix Section 3.7.3 of the dynamic companion paper. Critical: subtracts the cohort mean, NOT a single grand mean — using a grand mean silently produces a smaller, incorrect variance. """ U_centered = U.astype(float).copy() if U.size == 0: return U_centered unique_cohorts = np.unique(cohort_ids) for k in unique_cohorts: in_cohort = cohort_ids == k if in_cohort.any(): U_centered[in_cohort] = U[in_cohort] - U[in_cohort].mean() return U_centered def _cohort_recenter_per_period( U_per_period: np.ndarray, cohort_ids: np.ndarray, ) -> np.ndarray: """ Column-wise cohort recentering of a per-(g, t) IF attribution. For each period column t, subtract the cohort-conditional mean of that column: ``U_centered[g, t] = U[g, t] - mean_{g' in cohort(g)} U[g', t]``. 
The row sum identity is preserved: sum_t U_centered[g, t] = sum_t U[g, t] - sum_t cohort_mean[cohort(g), t] = U[g] - cohort_mean_of_U[cohort(g)] = U_centered[g] Hence the cell-level Binder TSL aggregation telescopes to the same group-level sum produced by ``_cohort_recenter`` on ``U``, giving byte-identical variance under within-group-constant PSU (old validator's accepted input set) and a per-cell attribution under within-group-varying PSU that follows the library's post-period convention (see the REGISTRY.md Note on survey IF expansion — a formal derivation from the observation-level survey linearization is still open, tracked in ``TODO.md``). """ U_centered = U_per_period.astype(float).copy() if U_per_period.size == 0: return U_centered unique_cohorts = np.unique(cohort_ids) for k in unique_cohorts: in_cohort = cohort_ids == k if in_cohort.any(): # Per-period mean across groups in this cohort col_means = U_per_period[in_cohort].mean(axis=0) U_centered[in_cohort] = U_per_period[in_cohort] - col_means[np.newaxis, :] return U_centered def _compute_cohort_recentered_inputs( D_mat: np.ndarray, Y_mat: np.ndarray, N_mat: np.ndarray, n_10_t_arr: np.ndarray, n_00_t_arr: np.ndarray, n_01_t_arr: np.ndarray, n_11_t_arr: np.ndarray, a11_plus_zeroed_arr: np.ndarray, a11_minus_zeroed_arr: np.ndarray, all_groups: List[Any], singleton_baseline_groups: List[Any], compute_per_period: bool = True, ) -> Tuple[ np.ndarray, # U_centered_overall (n_eligible,) int, # n_groups_for_overall int, # n_cohorts int, # n_groups_dropped_never_switching np.ndarray, # U_centered_joiners np.ndarray, # U_centered_leavers List[Any], # eligible_group_ids Optional[np.ndarray], # U_centered_per_period_overall (n_eligible, n_periods) or None Optional[np.ndarray], # U_centered_per_period_joiners or None Optional[np.ndarray], # U_centered_per_period_leavers or None ]: """ Compute the cohort-centered influence-function vectors for variance. 
Implements the full ``Lambda^G_{g,l=1}`` weight vector from Section 3.7.2 of the dynamic companion paper (NBER WP 29873) at horizon ``l = 1``: each group's per-period role weights (joiner, stable_0, leaver, stable_1) sum to a per-group ``U^G_g`` value that, summed across groups, recovers ``N_S * DID_M``. Cohorts are defined by the triple ``(D_{g,1}, F_g, S_g)`` where ``F_g`` is the first switch period and ``S_g`` is the switch direction (+1 joiner, -1 leaver, 0 never-switching). Never- switching groups form their own cohorts indexed by baseline only. Per footnote 15 of the dynamic paper (passed in via ``singleton_baseline_groups``), groups whose baseline ``D_{g,1}`` value is unique in the post-drop panel have no cohort peer and are excluded from the variance computation only. They remain in the point-estimate sample as period-based stable controls (this matches Python's documented period-vs-cohort stable-control interpretation; the cell DataFrame entering ``_compute_per_period_dids`` retains them). Returns ------- U_centered_overall : np.ndarray Cohort-centered IF vector for ``DID_M`` over the variance- eligible groups (post-singleton-filter). n_groups_for_overall : int ``U_centered_overall.size`` for sanity-checking by the caller. n_cohorts : int Distinct cohorts in the variance-eligible group set. n_groups_dropped_never_switching : int Count of never-switching groups for results metadata. (They ARE included in the variance computation under the full IF formula because they can have non-zero contributions when serving as stable controls; this count is reported for backwards compatibility with the existing results dataclass field but no longer represents an actual exclusion.) U_centered_joiners : np.ndarray Cohort-centered IF vector for ``DID_+`` (joiners-only side). U_centered_leavers : np.ndarray Cohort-centered IF vector for ``DID_-`` (leavers-only side). 
""" n_groups, n_periods = D_mat.shape if n_groups == 0: empty_pp: Optional[np.ndarray] = ( np.zeros((0, 0), dtype=float) if compute_per_period else None ) return ( np.array([], dtype=float), 0, 0, 0, np.array([], dtype=float), np.array([], dtype=float), [], empty_pp, empty_pp, empty_pp, ) # Per-group switch metadata via the shared helper (factored out in # Phase 2 so both the cohort-recentered IF path and the multi- # horizon DID_{g,l} path share the same computation). baselines, first_switch_idx, switch_direction, _T_g = _compute_group_switch_metadata( D_mat, N_mat ) n_groups_dropped_never_switching = int((switch_direction == 0).sum()) # Variance-eligibility mask: include all groups EXCEPT singleton- # baseline groups (footnote 15) which have no cohort peer. singleton_baseline_set = set(singleton_baseline_groups) eligible_mask = np.array([g not in singleton_baseline_set for g in all_groups], dtype=bool) # Cohort identification: (D_{g,1}, F_g, S_g) triples for the # variance-eligible group set. Never-switching groups (S_g = 0) # have F_g = -1 and form cohorts indexed by baseline alone. cohort_keys = [ (float(baselines[g]), int(first_switch_idx[g]), int(switch_direction[g])) for g in range(n_groups) ] unique_cohorts: Dict[Tuple[float, int, int], int] = {} cohort_id = np.zeros(n_groups, dtype=int) for g in range(n_groups): if not eligible_mask[g]: cohort_id[g] = -1 continue key = cohort_keys[g] if key not in unique_cohorts: unique_cohorts[key] = len(unique_cohorts) cohort_id[g] = unique_cohorts[key] n_cohorts = len(unique_cohorts) # Compute the full IF vectors + per-period attributions via the # helper. Skip the O(n_groups * n_periods) per-period tensor # allocation when the caller won't use it (no survey design). 
    U_overall_full, U_per_period_overall_full = _compute_full_per_group_contributions(
        D_mat=D_mat,
        Y_mat=Y_mat,
        N_mat=N_mat,
        n_10_t_arr=n_10_t_arr,
        n_00_t_arr=n_00_t_arr,
        n_01_t_arr=n_01_t_arr,
        n_11_t_arr=n_11_t_arr,
        a11_plus_zeroed_arr=a11_plus_zeroed_arr,
        a11_minus_zeroed_arr=a11_minus_zeroed_arr,
        side="overall",
        compute_per_period=compute_per_period,
    )
    U_joiners_full, U_per_period_joiners_full = _compute_full_per_group_contributions(
        D_mat=D_mat,
        Y_mat=Y_mat,
        N_mat=N_mat,
        n_10_t_arr=n_10_t_arr,
        n_00_t_arr=n_00_t_arr,
        n_01_t_arr=n_01_t_arr,
        n_11_t_arr=n_11_t_arr,
        a11_plus_zeroed_arr=a11_plus_zeroed_arr,
        a11_minus_zeroed_arr=a11_minus_zeroed_arr,
        side="joiners",
        compute_per_period=compute_per_period,
    )
    U_leavers_full, U_per_period_leavers_full = _compute_full_per_group_contributions(
        D_mat=D_mat,
        Y_mat=Y_mat,
        N_mat=N_mat,
        n_10_t_arr=n_10_t_arr,
        n_00_t_arr=n_00_t_arr,
        n_01_t_arr=n_01_t_arr,
        n_11_t_arr=n_11_t_arr,
        a11_plus_zeroed_arr=a11_plus_zeroed_arr,
        a11_minus_zeroed_arr=a11_minus_zeroed_arr,
        side="leavers",
        compute_per_period=compute_per_period,
    )

    # Restrict to variance-eligible groups (drop singleton-baseline groups)
    U_overall = U_overall_full[eligible_mask]
    U_joiners = U_joiners_full[eligible_mask]
    U_leavers = U_leavers_full[eligible_mask]
    cohort_id_eligible = cohort_id[eligible_mask]

    # Cohort-recenter each IF vector (group level for plug-in path)
    U_centered_overall = _cohort_recenter(U_overall, cohort_id_eligible)
    U_centered_joiners = _cohort_recenter(U_joiners, cohort_id_eligible)
    U_centered_leavers = _cohort_recenter(U_leavers, cohort_id_eligible)

    # Per-period cohort recentering for the cell-period survey allocator.
    # Column-wise centering preserves sum_t U_centered_pp[g, t] =
    # U_centered[g], so Binder TSL PSU-level aggregation telescopes to
    # the same group-level sum under within-group-constant PSU (byte-
    # identical to the old allocator). Under within-group-varying PSU
    # the per-cell attribution follows the library's post-period
    # convention — a formal derivation from the observation-level
    # survey linearization is an open question tracked in TODO.md.
    # Skip entirely when the 2D tensor was not computed.
    if U_per_period_overall_full is not None:
        U_centered_pp_overall: Optional[np.ndarray] = _cohort_recenter_per_period(
            U_per_period_overall_full[eligible_mask], cohort_id_eligible
        )
    else:
        U_centered_pp_overall = None
    if U_per_period_joiners_full is not None:
        U_centered_pp_joiners: Optional[np.ndarray] = _cohort_recenter_per_period(
            U_per_period_joiners_full[eligible_mask], cohort_id_eligible
        )
    else:
        U_centered_pp_joiners = None
    if U_per_period_leavers_full is not None:
        U_centered_pp_leavers: Optional[np.ndarray] = _cohort_recenter_per_period(
            U_per_period_leavers_full[eligible_mask], cohort_id_eligible
        )
    else:
        U_centered_pp_leavers = None

    # Eligible group IDs for survey IF expansion
    eligible_group_ids = [all_groups[g] for g in range(n_groups) if eligible_mask[g]]

    return (
        U_centered_overall,
        U_centered_overall.size,
        n_cohorts,
        n_groups_dropped_never_switching,
        U_centered_joiners,
        U_centered_leavers,
        eligible_group_ids,
        U_centered_pp_overall,
        U_centered_pp_joiners,
        U_centered_pp_leavers,
    )


def _plugin_se(U_centered: np.ndarray, divisor: int) -> float:
    """
    Compute the cohort-recentered plug-in standard error.

    Implements ``SE = sqrt(sum_g U_centered[g]^2 / N_l) / sqrt(N_l)``,
    which is the simplified form of Section 3.7.3's plug-in formula
    after the cohort recentering has been applied to ``U_centered``.
    The plain ``(1/N_l) * sum_g U_centered^2 / N_l`` form gives the
    variance; we take its square root for the SE.

    Returns ``NaN`` in three degenerate cases:

    1. ``U_centered`` is empty (no variance-eligible groups).
    2. ``divisor <= 0`` (no switching cells in N_S).
    3. ``sum(U_centered**2) <= 0`` — every cohort is a singleton, so
       cohort recentering produces an identically-zero centered IF
       vector and the variance is unidentified. The caller should
       detect this case (NaN return + non-empty input) and emit a
       user-facing warning explaining the degenerate-cohort condition.
       Returning ``NaN`` rather than ``0.0`` prevents a zero SE from
       silently implying infinite precision.
    """
    n = U_centered.size
    if n == 0 or divisor <= 0:
        return float("nan")
    sum_sq = float((U_centered**2).sum())
    if sum_sq <= 0:
        # Degenerate-cohort case: every cohort is a singleton, so
        # cohort recentering produces all zeros. The variance is
        # unidentified — return NaN rather than 0.0 so downstream
        # inference is NaN-consistent and the caller surfaces a
        # warning. See the **Note** in REGISTRY.md
        # ChaisemartinDHaultfoeuille.
        return float("nan")
    sigma_hat_sq = sum_sq / divisor
    if not np.isfinite(sigma_hat_sq) or sigma_hat_sq < 0:
        return float("nan")
    return float(np.sqrt(sigma_hat_sq) / np.sqrt(divisor))


def _strata_psu_vary_within_group(
    resolved: Any,
    data: pd.DataFrame,
    group_col: str,
    survey_weights: Optional[np.ndarray],
) -> Tuple[bool, bool]:
    """Return (strata_varies_within_group, psu_varies_within_group).

    Diagnostic helper. No longer used to gate any out-of-scope
    combination in ``fit()`` — the analytical TSL path, the
    heterogeneity WLS path, and the PSU-level wild multiplier bootstrap
    all support within-group-varying PSU/strata via the cell-period
    allocator. The helper is retained for callers that need to branch
    on the regime (e.g., documentation, diagnostic warnings).

    Zero-weight rows are excluded (subpopulation contract).
    """
    if resolved is None:
        return False, False
    pos_mask = np.asarray(survey_weights) > 0
    if not pos_mask.any():
        return False, False
    g_eff = np.asarray(data[group_col].values)[pos_mask]
    strata_varies = False
    psu_varies = False
    if resolved.strata is not None:
        s_eff = np.asarray(resolved.strata)[pos_mask]
        strata_varies = bool(
            pd.DataFrame({"g": g_eff, "s": s_eff}).groupby("g")["s"].nunique().gt(1).any()
        )
    if resolved.psu is not None:
        p_eff = np.asarray(resolved.psu)[pos_mask]
        psu_varies = bool(
            pd.DataFrame({"g": g_eff, "p": p_eff}).groupby("g")["p"].nunique().gt(1).any()
        )
    return strata_varies, psu_varies


def _validate_cell_constant_strata_psu(
    resolved: Any,
    data: pd.DataFrame,
    group_col: str,
    time_col: str,
    survey_weights: Optional[np.ndarray],
) -> None:
    """Reject survey designs where strata or PSU vary within a (g, t) cell.

    Under the cell-period IF allocator, ``psi_i`` attributes each
    observation's mass to its ``(g, t)`` cell scaled by proportional
    weight share:
    ``psi_i = U_centered_per_period[g, t] * (w_i / W_{g, t})``. Binder
    TSL aggregates at PSU level, so strata / PSU labels within a cell
    must be unambiguous. In one-obs-per-cell panels (the canonical dCDH
    structure) this check is trivially satisfied; in multi-obs-per-cell
    panels, a cell split across strata or PSUs is rejected.

    This is a strict relaxation of the old within-group constancy rule
    shipped before the cell-period allocator (REGISTRY.md
    ``ChaisemartinDHaultfoeuille`` Note on survey IF expansion).
    Zero-weight rows are excluded (subpopulation contract).
    """
    if resolved is None:
        return
    pos_mask = np.asarray(survey_weights) > 0
    if not pos_mask.any():
        return
    g_eff = np.asarray(data[group_col].values)[pos_mask]
    t_eff = np.asarray(data[time_col].values)[pos_mask]

    if resolved.strata is not None:
        s_eff = np.asarray(resolved.strata)[pos_mask]
        df_cell = pd.DataFrame({"g": g_eff, "t": t_eff, "s": s_eff})
        varying = df_cell.groupby(["g", "t"])["s"].nunique()
        bad = varying[varying > 1]
        if len(bad) > 0:
            raise ValueError(
                f"ChaisemartinDHaultfoeuille survey support requires "
                f"strata to be constant within each (group, time) cell "
                f"under the cell-period IF allocator, but {len(bad)} "
                f"cell(s) have multiple strata (examples: "
                f"{bad.index.tolist()[:5]}). The cell allocator treats "
                f"the (g, t) cell as the effective sampling unit for "
                f"stratified Binder variance; within-cell stratum "
                f"variation is ambiguous. The allocator DOES support "
                f"strata that vary across cells of the same group "
                f"(relaxation over the previous within-group constancy "
                f"rule shipped in earlier releases)."
            )

    if resolved.psu is not None:
        p_eff = np.asarray(resolved.psu)[pos_mask]
        df_cell = pd.DataFrame({"g": g_eff, "t": t_eff, "p": p_eff})
        varying = df_cell.groupby(["g", "t"])["p"].nunique()
        bad = varying[varying > 1]
        if len(bad) > 0:
            raise ValueError(
                f"ChaisemartinDHaultfoeuille survey support requires "
                f"PSU to be constant within each (group, time) cell "
                f"under the cell-period IF allocator, but {len(bad)} "
                f"cell(s) have multiple PSUs (examples: "
                f"{bad.index.tolist()[:5]}). The cell allocator treats "
                f"the (g, t) cell as the effective sampling unit; "
                f"within-cell PSU variation is ambiguous. The allocator "
                f"DOES support PSU that varies across cells of the same "
                f"group (relaxation over the previous within-group "
                f"constancy rule shipped in earlier releases)."
            )


def _refresh_path_inference(
    path_effects: Optional[Dict[Tuple[int, ...], Dict[str, Any]]],
    path_placebos: Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]],
    alpha: float,
    df_final: Optional[int],
) -> None:
    """Refresh per-path inference fields (t_stat, p_value, conf_int) with
    the final ``df_final`` so they agree with the global surfaces and
    ``results.survey_metadata.df_survey`` after all replicate-weight
    ``n_valid`` appends complete.

    Per-path event-study and placebo helpers compute inference using a
    snapshot of ``_replicate_n_valid_list`` taken at fit-time BEFORE
    they append their own ``n_valid`` contributions. The final R2 P1b
    block in ``fit()`` already refreshes the global surfaces (overall /
    joiners / leavers / multi_horizon_inference /
    placebo_horizon_inference / heterogeneity / normalized) with the
    final df; this helper is its per-path counterpart, called from the
    same final block.

    No-op under TSL (analytical) or non-survey fits — they skip
    replicate-n_valid bookkeeping entirely. Mutates dicts in place.
    """
    from diff_diff.utils import safe_inference

    def _refresh_entry(entry: Dict[str, Any]) -> None:
        eff = entry.get("effect")
        se = entry.get("se")
        if eff is None or se is None:
            return
        if not np.isfinite(se):
            return
        t_new, p_new, ci_new = safe_inference(eff, se, alpha=alpha, df=df_final)
        entry["t_stat"] = t_new
        entry["p_value"] = p_new
        entry["conf_int"] = ci_new

    if path_effects is not None:
        for path_data in path_effects.values():
            horizons = path_data.get("horizons", {})
            for entry in horizons.values():
                _refresh_entry(entry)
    if path_placebos is not None:
        for path_horizons in path_placebos.values():
            for entry in path_horizons.values():
                _refresh_entry(entry)


def _inference_df(
    effective_df: Optional[int],
    resolved_survey: Any,
) -> Optional[int]:
    """Coerce an effective-df value for use in ``safe_inference(df=...)``.

    ``_effective_df_survey()`` returns ``None`` when the replicate
    design's base df is undefined (QR-rank ≤ 1) or when the reduced df
    would be < 1. ``safe_inference`` treats ``df=None`` as the standard
    normal (z-inference) and only returns NaN for ``df <= 0``. Under a
    replicate design, "undefined effective df" MUST map to NaN
    inference per REGISTRY.md — NOT to z-inference.

    Returns:

    - ``effective_df`` when defined (t-inference with that df).
    - ``0`` when ``effective_df is None`` AND ``resolved_survey`` uses
      replicate variance — forces ``safe_inference`` to return NaN
      t/p/CI via its ``df <= 0`` branch.
    - ``None`` otherwise (no survey, or TSL design where the resolver
      always returns an int df — z-inference is never reachable here).
    """
    if effective_df is not None:
        return effective_df
    if resolved_survey is None:
        return None
    if getattr(resolved_survey, "uses_replicate_variance", False):
        return 0
    return None


def _effective_df_survey(
    resolved_survey: Any,
    replicate_n_valid_list: List[int],
) -> Optional[int]:
    """Compute the effective ``df_survey`` for t-critical values.

    Under replicate-weight designs, each IF site returns an ``n_valid``
    count (number of replicate columns that produced a finite
    estimate). This helper **reduces** — never overwrites — the
    design-level ``resolved_survey.df_survey`` (which for replicate
    designs is ``QR-rank - 1`` per R's ``survey::degf()`` convention;
    see ``diff_diff/survey.py:590``). Returns
    ``min(resolved_survey.df_survey, min(n_valid) - 1)`` when both are
    defined. Matches the precedent in
    ``diff_diff/efficient_did.py:1133-1135`` and
    ``diff_diff/triple_diff.py:676-686`` (both reduce, not replace).

    Under TSL (analytical) or non-survey fits,
    ``replicate_n_valid_list`` is empty and this falls through to the
    design-level df. Returns ``None`` when either the base df is
    undefined (rank ≤ 1) or the reduced df would be < 1 — preserving
    the NaN-inference contract from ``safe_inference``.
    """
    if resolved_survey is None:
        return None
    base_df = resolved_survey.df_survey
    if not replicate_n_valid_list:
        return base_df
    reduced_df = int(min(replicate_n_valid_list)) - 1
    if base_df is None or reduced_df < 1:
        return None
    return min(int(base_df), reduced_df)


def _compute_se(
    U_centered: np.ndarray,
    divisor: int,
    obs_survey_info: Optional[dict],
    eligible_groups: Optional[list] = None,
    U_centered_per_period: Optional[np.ndarray] = None,
) -> Tuple[float, Optional[int]]:
    """Dispatch to plug-in SE or survey-design-aware SE.

    When ``obs_survey_info`` is ``None``, falls back to the simple
    plug-in formula. Otherwise, expands group-level IFs and delegates
    to TSL variance (or replicate-weight variance when the resolved
    design carries replicate weights).

    ``U_centered_per_period`` is the cell-period attribution of
    ``U_centered``. Each row ``g`` satisfies
    ``U_centered_per_period[g, :].sum() == U_centered[g]``. When
    supplied, the survey-aware path expands influence to observations
    using the cell-level allocator
    ``psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})``;
    when ``None`` the path falls back to the group-level allocator
    ``psi_i = U_centered[g_i] * (w_i / W_{g_i})``. Byte-identical under
    within-group-constant PSU since both allocators telescope to the
    same PSU-level sum (REGISTRY.md survey IF expansion Note).

    Returns ``(se, n_valid_replicates)``. ``n_valid_replicates`` is
    ``None`` under the plug-in / TSL paths, and the number of valid
    replicate columns under the replicate path.
    """
    if obs_survey_info is None:
        return _plugin_se(U_centered=U_centered, divisor=divisor), None
    if eligible_groups is None:
        return _plugin_se(U_centered=U_centered, divisor=divisor), None
    if divisor <= 0:
        return float("nan"), None
    # dCDH IFs are numerator-scale (U.sum() == N_S * DID_M).
    # compute_survey_if_variance() expects estimator-scale psi.
    # Scale by 1/divisor to normalize before survey expansion.
    U_scaled = U_centered / divisor
    U_pp_scaled = U_centered_per_period / divisor if U_centered_per_period is not None else None
    return _survey_se_from_group_if(
        U_centered=U_scaled,
        eligible_groups=eligible_groups,
        obs_survey_info=obs_survey_info,
        U_centered_per_period=U_pp_scaled,
    )


def _survey_se_from_group_if(
    U_centered: np.ndarray,
    eligible_groups: list,
    obs_survey_info: dict,
    U_centered_per_period: Optional[np.ndarray] = None,
) -> Tuple[float, Optional[int]]:
    """Compute survey-design-aware SE from per-group / per-cell centered IFs.

    Expands influence-function mass to observation level using the
    **cell-period allocator**
    ``psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i})``
    when ``U_centered_per_period`` is supplied, or the legacy group-
    level allocator ``psi_i = U_centered[g_i] * (w_i / W_{g_i})``
    otherwise. Both allocators are equivalent at the PSU-level sum
    under within-group-constant PSU (per-cell sums telescope to
    ``U_centered[g]``), so Binder TSL variance is byte-identical on
    inputs the old within-group-constant validator accepted. The cell-
    period allocator additionally supports PSU/strata that vary across
    cells of g — the lift the allocator buys for Stage 2.

    Dispatches to ``compute_survey_if_variance()`` (TSL) or
    ``compute_replicate_if_variance()`` (BRR/Fay/JK1/JKn/SDR) based on
    the resolved design. See REGISTRY.md ``ChaisemartinDHaultfoeuille``
    Note on survey IF expansion for the contract.

    Parameters
    ----------
    U_centered : np.ndarray
        Cohort-recentered per-group IF values (one per eligible group).
        Used for degenerate-cohort detection and fallback expansion.
    eligible_groups : list
        Group IDs corresponding to entries of ``U_centered`` /
        ``U_centered_per_period`` (variance-eligible groups after
        singleton-baseline exclusion).
    obs_survey_info : dict
        Observation-level survey data retained from ``fit()`` setup:
        ``{"group_ids", "time_ids", "weights", "resolved", "periods"}``.
        ``time_ids`` and ``periods`` are required for the cell-level
        allocator; when either is absent, the group-level allocator is
        used.
    U_centered_per_period : np.ndarray of shape (n_eligible, n_periods), optional
        Cohort-recentered per-``(g, t)``-cell IF attributions, with
        ``U_centered_per_period.sum(axis=1) == U_centered``. When
        supplied, the cell allocator is used.

    Returns
    -------
    Tuple[float, Optional[int]]
        ``(se, n_valid_replicates)``. ``se`` is the survey-design-aware
        SE, or NaN if degenerate. ``n_valid_replicates`` is the number
        of valid replicate columns used for variance under the
        replicate path, or ``None`` under the TSL (analytical) path.
    """
    from diff_diff.survey import (
        compute_replicate_if_variance,
        compute_survey_if_variance,
    )

    # Degenerate-cohort contract (mirror _plugin_se): when the centered
    # IF is empty or every cohort is a singleton (→ recentered IF is
    # identically zero), the variance is unidentified. Return NaN
    # rather than sqrt(0)=0 so the fit-time warning fires and
    # inference stays NaN-consistent across every surface routed
    # through _compute_se (overall, joiners/leavers, multi-horizon
    # ATT, placebos, normalized/cumulated, heterogeneity).
    if U_centered.size == 0:
        return float("nan"), None
    if float((U_centered**2).sum()) <= 0:
        return float("nan"), None

    group_ids = obs_survey_info["group_ids"]
    weights = obs_survey_info["weights"]
    resolved = obs_survey_info["resolved"]
    time_ids = obs_survey_info.get("time_ids")
    periods = obs_survey_info.get("periods")

    # Zero-weight rows are out-of-sample (SurveyDesign.subpopulation()).
    # Skip them before the group-ID factorization so NaN / non-comparable
    # group IDs on excluded rows cannot crash np.unique. psi stays full-
    # length with zeros in excluded positions so the alignment with
    # resolved.strata / resolved.psu inside compute_survey_if_variance
    # is preserved.
    weights_arr = np.asarray(weights, dtype=np.float64)
    pos_mask = weights_arr > 0
    n_obs = len(weights_arr)
    psi = np.zeros(n_obs, dtype=np.float64)
    if not pos_mask.any():
        return float("nan"), None

    gids_eff = np.asarray(group_ids)[pos_mask]
    w_eff = weights_arr[pos_mask]

    use_cell_allocator = (
        U_centered_per_period is not None
        and time_ids is not None
        and periods is not None
        and U_centered_per_period.size > 0
    )

    # Binder TSL fallback: when the cell allocator would apply but
    # PSU is within-group-constant (PSU=group, strictly-coarser PSU
    # within-group-constant, or the auto-inject default), the cell
    # and group allocators are equivalent at PSU-level aggregation
    # via the row-sum identity `sum_{c in g} u_cell[c] == u_centered[g]`.
    # Prefer the legacy group-level path in that regime: it sidesteps
    # the sentinel-mass guard (below) that would otherwise fire
    # spuriously on terminally-missing panels whose PSU structure
    # does not require cell-level resolution. Matches the bootstrap
    # dispatcher's routing rule (`_compute_dcdh_bootstrap` +
    # `_psu_varies_within_group`).
    #
    # **Replicate-weight carve-out:** replicate designs do not carry
    # `resolved.psu`, but the Class A replicate contract shipped in
    # PR #323 applies `compute_replicate_if_variance` to the cell-
    # allocator `psi_obs` — per-row-varying replicate ratios are
    # allocator-sensitive, so forcing replicate fits onto the legacy
    # group-level allocator would silently change their SE. The
    # fallback therefore skips replicate fits; the sentinel-mass
    # guard still fires on mass leakage when it applies. The Class
    # A replicate allocator contract is asserted by
    # `tests/test_survey_dcdh_replicate_psu.py::TestReplicateClassA::
    # test_att_cell_allocator_with_varying_replicate_ratios`.
    if use_cell_allocator and not getattr(resolved, "uses_replicate_variance", False):
        psu_arr = getattr(resolved, "psu", None)
        if psu_arr is None:
            use_cell_allocator = False
        else:
            psu_eff = np.asarray(psu_arr)[pos_mask]
            eligible_set = set(eligible_groups)
            elig_row_mask = np.array([g in eligible_set for g in gids_eff], dtype=bool)
            if elig_row_mask.any():
                psu_varies_within = bool(
                    pd.DataFrame(
                        {
                            "g": gids_eff[elig_row_mask],
                            "p": psu_eff[elig_row_mask],
                        }
                    )
                    .groupby("g")["p"]
                    .nunique()
                    .gt(1)
                    .any()
                )
                if not psu_varies_within:
                    use_cell_allocator = False
            else:
                use_cell_allocator = False

    if use_cell_allocator:
        tids_eff = np.asarray(time_ids)[pos_mask]

        # Map row's group to an index in eligible_groups (−1 when the
        # group is ineligible — singleton-baseline exclusion drops it).
        eligible_idx_by_gid = {gid: i for i, gid in enumerate(eligible_groups)}
        elig_idx_eff = np.array(
            [eligible_idx_by_gid.get(gid, -1) for gid in gids_eff],
            dtype=np.int64,
        )

        # Map row's time to a column index in U_centered_per_period.
        # get_indexer returns −1 for values absent from `periods`
        # (should be rare in practice — positive-weight rows almost
        # always survive cell aggregation; if one slips through we
        # treat it like an ineligible row rather than crash).
        col_idx_eff = np.asarray(pd.Index(periods).get_indexer(tids_eff), dtype=np.int64)

        valid_cell = (elig_idx_eff >= 0) & (col_idx_eff >= 0)

        n_eligible = len(eligible_groups)
        n_periods_pp = U_centered_per_period.shape[1]

        # Per-(g, t) weight totals W_{g,t} over the effective sample.
        W_cell = np.zeros((n_eligible, n_periods_pp), dtype=np.float64)
        np.add.at(
            W_cell,
            (elig_idx_eff[valid_cell], col_idx_eff[valid_cell]),
            w_eff[valid_cell],
        )

        # Sentinel-mass guard (mirror of `_unroll_target_to_cells` on
        # the bootstrap path). Under terminal missingness,
        # `_cohort_recenter_per_period` subtracts cohort column means
        # across the full period grid, so a group with no observation
        # at period t can acquire non-zero centered mass at that cell.
        # The cell-level expansion `psi_i = U[g,t] * (w_i / W_{g,t})`
        # has no observation to attach that mass to (W_{g,t} = 0), so
        # silently dropping it would understate the SE. Raise a
        # targeted ValueError instead (consistent with the cell-level
        # bootstrap's `_unroll_target_to_cells` guard).
        missing_cell_mask = W_cell == 0
        if missing_cell_mask.any():
            leaked = U_centered_per_period[missing_cell_mask]
            if leaked.size > 0 and bool(np.any(np.abs(leaked) > 1e-12)):
                # Branch the wording on replicate-vs-TSL so replicate
                # ATT users debugging this error are not misdirected
                # to "within-group-varying PSU" (replicate designs
                # have no PSU structure). Core mechanics and the
                # pre-processing workaround are shared.
                is_replicate = bool(getattr(resolved, "uses_replicate_variance", False))
                path_name = (
                    "Rao-Wu replicate-weight ATT variance"
                    if is_replicate
                    else "Analytical Binder TSL survey SE"
                )
                trigger_detail = (
                    "terminal missingness on a replicate-weight "
                    "design (replicate ATT unconditionally uses the "
                    "cell-period allocator per the Class A contract; "
                    "PSU structure is not involved)"
                    if is_replicate
                    else "terminal missingness combined with within-"
                    "group-varying PSU (the Binder TSL cell-period "
                    "allocator cannot allocate leaked mass to any "
                    "observation)"
                )
                raise ValueError(
                    f"{path_name} cannot be computed on this panel: "
                    "cohort-recentered IF mass landed on (g, t) cells "
                    "with no positive-weight observations "
                    f"(W_{{g, t}} = 0). This typically occurs with "
                    f"{trigger_detail}: _cohort_recenter_per_period "
                    "subtracts cohort column means across the full "
                    "period grid, so a group with no observation at "
                    "period t acquires non-zero centered mass there, "
                    "which the cell-period allocator cannot allocate "
                    "to any observation. Pre-process the panel to "
                    "remove terminal missingness (drop late-exit "
                    "groups or trim to a balanced sub-panel) before "
                    "fitting."
                )
        # Lookup U_centered_per_period and W_cell per row.
        u_obs_cell = np.zeros(w_eff.shape[0], dtype=np.float64)
        u_obs_cell[valid_cell] = U_centered_per_period[
            elig_idx_eff[valid_cell], col_idx_eff[valid_cell]
        ]
        w_cell_at_row = np.zeros(w_eff.shape[0], dtype=np.float64)
        w_cell_at_row[valid_cell] = W_cell[elig_idx_eff[valid_cell], col_idx_eff[valid_cell]]
        safe_w = np.where(w_cell_at_row > 0, w_cell_at_row, 1.0)
        psi[pos_mask] = u_obs_cell * (w_eff / safe_w)
    else:
        # Legacy group-level allocator (no per-period attribution
        # provided, or time/period info unavailable). Preserved for
        # defensive fallback and for unit tests that exercise the
        # legacy allocator. No current caller in fit() uses this
        # branch: ATT / joiners / leavers / placebos all thread
        # U_centered_per_period, and heterogeneity (as of PR 3)
        # constructs its own cell-period psi_obs and calls
        # compute_survey_if_variance directly.
        group_to_u = {gid: U_centered[idx] for idx, gid in enumerate(eligible_groups)}
        u_obs_eff = np.array([group_to_u.get(gid, 0.0) for gid in gids_eff])
        unique_gids, inverse = np.unique(gids_eff, return_inverse=True)
        w_totals_per_group = np.bincount(inverse, weights=w_eff)
        w_obs_total_eff = w_totals_per_group[inverse]
        safe_w = np.where(w_obs_total_eff > 0, w_obs_total_eff, 1.0)
        psi[pos_mask] = u_obs_eff * (w_eff / safe_w)

    # Dispatch: replicate variance (BRR/Fay/JK1/JKn/SDR) vs TSL.
    # Mirrors the inline branch in TripleDifference:1206-1238 and
    # EfficientDiD:1119-1142. Under replicate, returns (variance, n_valid);
    # the caller propagates min(n_valid) to df_survey.
    if getattr(resolved, "uses_replicate_variance", False):
        variance, n_valid = compute_replicate_if_variance(psi, resolved)
        if not np.isfinite(variance) or variance < 0:
            return float("nan"), n_valid
        return float(np.sqrt(variance)), n_valid
    variance = compute_survey_if_variance(psi, resolved)
    if not np.isfinite(variance) or variance < 0:
        return float("nan"), None
    return float(np.sqrt(variance)), None


def _build_group_time_design(
    cell: pd.DataFrame,
    group_col: str,
    time_col: str,
) -> Tuple[np.ndarray, List[str]]:
    """
    Build a dense (intercept + group dummies + time dummies) design matrix.

    Used by the TWFE decomposition diagnostic. The first group and first
    period are dropped as the reference categories. Returns the matrix and
    a list of column names.
    """
    if cell.empty:
        raise ValueError(
            "Cannot compute TWFE diagnostic on an empty cell DataFrame. "
            "Provide a panel with at least 2 groups and 2 time periods."
        )
    groups = sorted(cell[group_col].unique().tolist())
    times = sorted(cell[time_col].unique().tolist())
    n = len(cell)
    n_groups = len(groups)
    n_times = len(times)
    if n_groups < 2 or n_times < 2:
        raise ValueError(
            f"TWFE diagnostic requires at least 2 groups and 2 time periods, "
            f"got {n_groups} group(s) and {n_times} period(s)."
        )

    # Columns: [intercept, group_1, ..., group_{G-1}, time_1, ..., time_{T-1}]
    n_cols = 1 + (n_groups - 1) + (n_times - 1)
    X = np.zeros((n, n_cols), dtype=float)
    X[:, 0] = 1.0  # intercept

    group_to_col = {g: 1 + i for i, g in enumerate(groups[1:])}
    time_to_col = {t: 1 + (n_groups - 1) + i for i, t in enumerate(times[1:])}

    group_arr = cell[group_col].to_numpy()
    time_arr = cell[time_col].to_numpy()
    for i in range(n):
        g = group_arr[i]
        t = time_arr[i]
        if g in group_to_col:
            X[i, group_to_col[g]] = 1.0
        if t in time_to_col:
            X[i, time_to_col[t]] = 1.0

    column_names = (
        ["intercept"]
        + [f"group[{g}]" for g in groups[1:]]
        + [f"time[{t}]" for t in times[1:]]
    )
    return X, column_names


def _compute_twfe_diagnostic(
    cell: pd.DataFrame,
    group_col: str,
    time_col: str,
    rank_deficient_action: str,
) -> TWFEWeightsResult:
    """
    Compute the per-cell TWFE decomposition diagnostic from Theorem 1 of AER 2020.

    Steps:
    1. Regress ``d_gt`` on group + time fixed effects via :func:`solve_ols`.
    2. Compute residuals ``eps_{g, t}`` from the regression.
    3. Compute per-cell **contribution weights** (the Theorem 1
       decomposition object):
       ``cw_{g,t} = N_{g,t} * eps_{g,t} / sum_{treated} N * eps``
       These are exported in the ``weight`` column of the ``weights``
       DataFrame on the returned ``TWFEWeightsResult``.
    4. Count negative contribution weights among treated cells.
    5. Compute the plain TWFE coefficient as a separate regression of
       ``y_gt`` on the same FE plus the treatment indicator.
    6. Compute ``sigma_fe`` from the Corollary 1 **paper weights**
       (a distinct object from the contribution weights):
       ``w_paper = eps / sum_treated(s * eps)``
       where ``s = N_{g,t} / N_1`` are observation shares. The paper
       weight is centered at 1 under observation-share weighting. Then:
       ``sigma_fe = |beta_fe| / sqrt(sum_treated(s * (w_paper - 1)^2))``
       which is the smallest standard deviation of cell-level treatment
       effects that could flip the sign of the plain TWFE estimator.
    """
    X, _ = _build_group_time_design(cell, group_col, time_col)
    d_arr = cell["d_gt"].to_numpy().astype(float)
    # Cell weight for Theorem 1: under survey_design, survey-weighted
    # cell totals (w_gt) replace raw cell counts (n_gt) so the FE
    # regressions, normalization denominator, and Corollary 1 shares
    # match the observation-level pweighted TWFE estimand. Without
    # survey_design (w_gt column absent), fall back to n_gt; the
    # non-survey path is unchanged.
    if "w_gt" in cell.columns:
        cell_weight = cell["w_gt"].to_numpy().astype(float)
    else:
        cell_weight = cell["n_gt"].to_numpy().astype(float)
    y_arr = cell["y_gt"].to_numpy().astype(float)

    # Step 1-2: regress d on FE
    coef_d, residuals_d, _ = solve_ols(
        X,
        d_arr,
        return_vcov=False,
        rank_deficient_action=rank_deficient_action,
        weights=cell_weight,
    )
    eps = residuals_d

    # Step 3: per-cell weights, normalized by the sum over treated cells
    treated_mask = d_arr == 1
    denom = float((cell_weight[treated_mask] * eps[treated_mask]).sum())
    if denom == 0:
        # Cannot normalize: the design has zero treated mass after FE absorption.
        # Warn so the user knows the diagnostic returned NaN values rather than
        # silently substituting them.
        warnings.warn(
            "TWFE decomposition diagnostic could not normalize per-cell "
            "weights: the sum of N_{g,t} * residual over treated cells is "
            "zero. This typically means the design matrix has perfect "
            "collinearity between treatment and the group/period fixed "
            "effects. Returning NaN for fraction_negative, sigma_fe, and "
            "beta_fe.",
            UserWarning,
            stacklevel=3,
        )
        weights_df = cell[[group_col, time_col]].copy()
        weights_df["weight"] = 0.0
        return TWFEWeightsResult(
            weights=weights_df,
            fraction_negative=float("nan"),
            sigma_fe=float("nan"),
            beta_fe=float("nan"),
        )

    contribution_weights = (cell_weight * eps) / denom
    weights_df = cell[[group_col, time_col]].copy()
    weights_df["weight"] = contribution_weights

    # Step 4: share of treated cells with a negative contribution weight
    fraction_negative = float((contribution_weights[treated_mask] < 0).sum() / treated_mask.sum())

    # Step 5: plain TWFE regression of y on (FE + d_gt)
    X_with_d = np.column_stack([X, d_arr.reshape(-1, 1)])
    coef_fe, _, _ = solve_ols(
        X_with_d,
        y_arr,
        return_vcov=False,
        rank_deficient_action=rank_deficient_action,
        weights=cell_weight,
    )
    beta_fe = float(coef_fe[-1])

    # Step 6: sigma_fe per Corollary 1 of AER 2020
    #
    # The paper defines w_{g,t} = eps_{g,t} / E_treated[eps], which
    # is DIFFERENT from the contribution weights w_gt exported in the
    # weights DataFrame (contribution_weight = s * w_paper). The paper
    # weight has the property that sum(s * w_paper) = 1 (centered at
    # 1 under observation-share weighting). sigma_fe uses the paper
    # weight:
    #
    #   w_paper = eps / sum_treated(s * eps)
    #   sigma(w) = sqrt(sum_treated(s * (w_paper - 1)^2))
    #   sigma_fe = |beta_fe| / sigma(w)
    #
    # where s_{g,t} = N_{g,t} / N_1 are observation shares (under
    # survey_design, cell_weight is w_gt so shares are effective-
    # weight shares; non-survey path is byte-identical).
    eps_treated = eps[treated_mask]
    w_treated_arr = cell_weight[treated_mask]
    w1 = float(w_treated_arr.sum())  # total treated weight (N_1 or W_1)
    if w1 > 0:
        shares = w_treated_arr / w1  # s_{g,t} = w_{g,t} / w_1
        denom_paper = float((shares * eps_treated).sum())
        if abs(denom_paper) > 0:
            w_paper = eps_treated / denom_paper  # paper's w_{g,t}
            # Weighted variance around 1 (the weighted mean of w_paper is 1 by construction)
            var_w = float((shares * (w_paper - 1.0) ** 2).sum())
        else:
            var_w = 0.0
    else:
        var_w = 0.0

    if var_w > 0 and np.isfinite(beta_fe):
        sigma_fe = float(abs(beta_fe) / np.sqrt(var_w))
    else:
        sigma_fe = float("nan")

    return TWFEWeightsResult(
        weights=weights_df,
        fraction_negative=fraction_negative,
        sigma_fe=sigma_fe,
        beta_fe=beta_fe,
    )


# =============================================================================
# Convenience functions
# =============================================================================
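The six steps of `_compute_twfe_diagnostic` can be traced by hand on a tiny panel. The following is a minimal sketch in plain NumPy, not the library's code path: it assumes unit cell sizes, so an unweighted `np.linalg.lstsq` stands in for the weighted `solve_ols` call, and the panel, the `tau` effects, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy panel: 2 groups x 4 periods, one observation per cell.
groups = np.repeat([0, 1], 4)
times = np.tile([0, 1, 2, 3], 2)
d = np.array([0, 1, 1, 1, 0, 0, 0, 1], dtype=float)    # group 0 switches on early
tau = np.array([0, 1, 1, 4, 0, 0, 0, 1], dtype=float)  # assumed cell-level effects
y = 0.5 * groups + 0.2 * times + tau * d
n = np.ones_like(d)                                     # cell sizes N_{g,t}

# FE design: intercept, group dummy, time dummies (first categories dropped).
X = np.column_stack(
    [np.ones_like(d), (groups == 1).astype(float)]
    + [(times == t).astype(float) for t in (1, 2, 3)]
)

# Steps 1-2: residualize treatment on the fixed effects.
coef_d, *_ = np.linalg.lstsq(X, d, rcond=None)
eps = d - X @ coef_d

# Steps 3-4: contribution weights and the share of negative ones.
treated = d == 1
cw = n * eps / (n[treated] * eps[treated]).sum()
fraction_negative = float((cw[treated] < 0).mean())

# Step 5: plain TWFE coefficient on d.
coef_fe, *_ = np.linalg.lstsq(np.column_stack([X, d]), y, rcond=None)
beta_fe = float(coef_fe[-1])

# Step 6: paper weights, their share-weighted variance around 1, and sigma_fe.
shares = n[treated] / n[treated].sum()
w_paper = eps[treated] / (shares * eps[treated]).sum()
var_w = float((shares * (w_paper - 1.0) ** 2).sum())
sigma_fe = abs(beta_fe) / np.sqrt(var_w)
```

In this toy panel every assumed cell-level effect is positive, yet one of the four treated cells (the early switcher in the last period) receives a negative contribution weight, and the TWFE coefficient comes out at about -0.5.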
def chaisemartin_dhaultfoeuille(
    data: pd.DataFrame,
    outcome: str,
    group: str,
    time: str,
    treatment: str,
    **kwargs: Any,
) -> ChaisemartinDHaultfoeuilleResults:
    """
    One-shot convenience wrapper around :class:`ChaisemartinDHaultfoeuille`.

    Equivalent to::

        ChaisemartinDHaultfoeuille(**init_kwargs).fit(
            data, outcome=..., group=..., time=..., treatment=...,
            **fit_kwargs,
        )

    Keyword arguments named in the ``__init__`` signature go to the
    constructor; all others are forwarded to ``fit``. Useful for one-line
    use in scripts.

    Parameters
    ----------
    data : pd.DataFrame
    outcome, group, time, treatment : str
    **kwargs : Any
        Forwarded to ``ChaisemartinDHaultfoeuille.__init__`` or ``.fit()``
        based on parameter name.

    Returns
    -------
    ChaisemartinDHaultfoeuilleResults
    """
    import inspect

    # Named parameters of __init__ (excluding self, *args, **kwargs).
    init_keys = {
        name
        for name, p in inspect.signature(ChaisemartinDHaultfoeuille.__init__).parameters.items()
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD) and name != "self"
    }
    init_kwargs = {k: v for k, v in kwargs.items() if k in init_keys}
    fit_kwargs = {k: v for k, v in kwargs.items() if k not in init_keys}
    est = ChaisemartinDHaultfoeuille(**init_kwargs)
    return est.fit(
        data,
        outcome=outcome,
        group=group,
        time=time,
        treatment=treatment,
        **fit_kwargs,
    )
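The keyword routing in the wrapper above can be sketched in isolation. This is a minimal illustration, not the library's API: `ToyEstimator`, its parameters `alpha` and `n_bootstrap`, and the helper `split_kwargs` are hypothetical names chosen for the example.

```python
import inspect
from typing import Any, Dict, Tuple


class ToyEstimator:
    """Hypothetical stand-in for an estimator class (not part of diff_diff)."""

    def __init__(self, alpha: float = 0.05, n_bootstrap: int = 0) -> None:
        self.alpha = alpha
        self.n_bootstrap = n_bootstrap

    def fit(self, data: Any, *, outcome: str, **fit_kwargs: Any) -> Dict[str, Any]:
        # Echo back what was received so the routing is visible.
        return {"outcome": outcome, "alpha": self.alpha, "fit_kwargs": fit_kwargs}


def split_kwargs(cls: type, kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Route kwargs named in cls.__init__ to the constructor, the rest to fit."""
    init_keys = {
        name
        for name, p in inspect.signature(cls.__init__).parameters.items()
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD) and name != "self"
    }
    init_kwargs = {k: v for k, v in kwargs.items() if k in init_keys}
    fit_kwargs = {k: v for k, v in kwargs.items() if k not in init_keys}
    return init_kwargs, fit_kwargs


init_kwargs, fit_kwargs = split_kwargs(ToyEstimator, {"alpha": 0.1, "controls": ["x1"]})
result = ToyEstimator(**init_kwargs).fit(None, outcome="y", **fit_kwargs)
```

One consequence of this design is that any key not found in the constructor signature, including a misspelled constructor argument, is passed on to `fit`, so whether it is rejected depends on the `fit` signature.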
def twowayfeweights(
    data: pd.DataFrame,
    outcome: str,
    group: str,
    time: str,
    treatment: str,
    rank_deficient_action: str = "warn",
    survey_design: Any = None,
) -> TWFEWeightsResult:
    """
    Standalone TWFE decomposition diagnostic.

    Computes the per-cell weights, fraction negative, and ``sigma_fe`` from
    Theorem 1 of de Chaisemartin & D'Haultfoeuille (2020), without fitting
    the full dCDH estimator. Mirrors the standalone Stata
    ``twowayfeweights`` package.

    Parameters
    ----------
    data : pd.DataFrame
        Individual-level panel.
    outcome : str
    group : str
    time : str
    treatment : str
    rank_deficient_action : str, default="warn"
        Action when the FE design matrix is rank-deficient.
    survey_design : SurveyDesign, optional
        If provided, cell aggregation uses survey-weighted cell means
        (matching ``fit(..., survey_design=sd).twfe_*``). Required to
        preserve fit-vs-helper parity under survey-backed inputs.
        Only ``weight_type='pweight'`` is supported; other types raise
        ValueError. Replicate-weight designs (BRR/Fay/JK1/JKn/SDR) are
        accepted: the TWFE diagnostic has no SE field on
        ``TWFEWeightsResult``, so replicate weights only affect the
        cell aggregation path (aggregated numbers are identical to
        ``fit(..., survey_design=sd).twfe_*`` under the same input).

    Returns
    -------
    TWFEWeightsResult
        Object with attributes ``weights`` (DataFrame),
        ``fraction_negative`` (float), ``sigma_fe`` (float), and
        ``beta_fe`` (float).
    """
    # Survey resolution (optional): mirrors the fit() path so that the
    # standalone helper produces identical numbers to fit(..., survey_design=sd).
    survey_weights = None
    if survey_design is not None:
        from diff_diff.survey import _resolve_survey_for_fit

        resolved, survey_weights, _, _ = _resolve_survey_for_fit(survey_design, data, "analytical")
        if resolved is not None and resolved.weight_type != "pweight":
            raise ValueError(
                f"twowayfeweights() survey support requires "
                f"weight_type='pweight', got '{resolved.weight_type}'. "
                f"The TWFE diagnostic under survey uses survey-weighted cell "
                f"means; other weight types are not supported."
            )
        # Replicate-weight designs are accepted. TWFE diagnostic has no
        # SE field on TWFEWeightsResult, so the replicate weights only
        # participate through the full-sample weight (resolved.weights),
        # which drives survey-weighted cell aggregation in
        # _validate_and_aggregate_to_cells. Diagnostic numbers
        # (beta_fe, sigma_fe, fraction_negative) match
        # fit(..., survey_design=sd).twfe_* under replicate input.

    # Validation + cell aggregation via the same helper used by
    # ChaisemartinDHaultfoeuille.fit(): enforces NaN/binary/within-cell
    # rules from REGISTRY.md so the standalone diagnostic does not
    # silently mishandle malformed input.
    cell = _validate_and_aggregate_to_cells(
        data=data,
        outcome=outcome,
        group=group,
        time=time,
        treatment=treatment,
        weights=survey_weights,
    )
    # TWFE diagnostic assumes binary treatment (d_arr == 1 for treated mask).
    if not set(cell["d_gt"].unique()).issubset({0.0, 1.0, 0, 1}):
        raise ValueError(
            "twowayfeweights() requires binary treatment {0, 1}. "
            "Non-binary treatment is supported by fit() with L_max >= 1 "
            "but the TWFE diagnostic (Theorem 1 of AER 2020) assumes "
            "binary treatment."
        )
    return _compute_twfe_diagnostic(
        cell=cell,
        group_col=group,
        time_col=time,
        rank_deficient_action=rank_deficient_action,
    )