diff_diff.HeterogeneousAdoptionDiD

class diff_diff.HeterogeneousAdoptionDiD

Bases: object

Heterogeneous Adoption Difference-in-Differences estimator.

Implements the Weighted-Average-Slope (WAS) estimator of de Chaisemartin, Ciccia, D’Haultfoeuille, and Knau (2026) with three design-dispatch paths: Design 1’ (continuous-at-zero), Design 1 continuous-near-d_lower, and Design 1 mass-point (2SLS sample-average per paper Section 3.2.4). Two aggregation modes are available:

aggregate="overall" (Phase 2a, default) returns a single-period HeterogeneousAdoptionDiDResults on a two-period panel.
aggregate="event_study" (Phase 2b, paper Appendix B.2) returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates on a multi-period panel, using a uniform F-1 anchor and pointwise CIs per horizon. Staggered-timing panels auto-filter to the last-treatment cohort plus never-treated units (paper Appendix B.2 prescription).
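The staggered-timing filter described above can be pictured with a small standalone sketch. It is only an illustration of the "last-treatment cohort plus never-treated" rule, not the library's code: the function name and the input layout (a dict from unit id to first treatment period, with None for never-treated units) are assumptions made for this example.

```python
from typing import Dict, Optional, Set


def last_cohort_plus_never_treated(
    first_treated: Dict[str, Optional[int]],
) -> Set[str]:
    """Keep units in the latest-treated cohort plus never-treated units.

    `first_treated` maps unit id -> first treatment period, or None for a
    never-treated unit. This layout is hypothetical, chosen for illustration.
    """
    treated_periods = [f for f in first_treated.values() if f is not None]
    if not treated_periods:
        raise ValueError("no treated units in the panel")
    last_f = max(treated_periods)  # first-treatment period of the last cohort
    return {u for u, f in first_treated.items() if f is None or f == last_f}


cohorts = {"a": 3, "b": 5, "c": 5, "d": None, "e": 4}
kept = last_cohort_plus_never_treated(cohorts)
print(sorted(kept))  # ['b', 'c', 'd']
```

Units "a" and "e" are dropped because they belong to earlier cohorts; the remaining panel has a single treatment date.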

Parameters:

design ({"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"}) –
Design-dispatch strategy. Defaults to "auto" which resolves via the REGISTRY auto-detect rule on the fitted dose data (see _detect_design()).

Explicit overrides are checked against the paper’s regime-partition contract (Section 3.2) at fit time:
- "continuous_at_zero" (Design 1’): paper requires the support infimum d_lower = 0. Phase 1c’s _validate_had_inputs rejects mass-point samples passed to this path.
- "continuous_near_d_lower" (Design 1, continuous density near d_lower): requires d_lower > 0 and a non-mass-point sample (modal fraction at d.min() must be <= 2%). d_lower must equal float(d.min()) within float tolerance; non-support-infimum thresholds are off-support and raise.
- "mass_point" (Design 1 mass-point): requires d_lower > 0 AND a mass-point sample (modal fraction at d.min() must be > 2%). d_lower must equal float(d.min()) within float tolerance. Forcing this design on a d_lower = 0 sample or on a continuous (non-mass-point) sample raises; in either case 2SLS identifies a different estimand than the paper’s Design 1 mass-point WAS.
Mismatched overrides raise ValueError pointing at the correct design rather than silently identifying a different estimand.
d_lower (float or None) – Support infimum d_lower. None means use 0.0 on the Design 1’ path and float(d.min()) on the other two paths. On Design 1 paths (continuous_near_d_lower and mass_point), an explicit d_lower must equal float(d.min()) within float tolerance AND must be strictly positive; zero-valued or mismatched thresholds raise.
kernel ({"epanechnikov", "triangular", "uniform"}) – Forwarded to bias_corrected_local_linear() on the continuous paths. Ignored on the mass-point path.
alpha (float) – Significance level for confidence intervals (e.g., 0.05 for a 95% CI).
vcov_type ({"classical", "hc1"} or None) – Mass-point-path only. When None, the effective family falls back to the robust flag: robust=True -> "hc1", robust=False -> "classical" (the default construction). Explicit "hc2" and "hc2_bm" raise NotImplementedError pending a 2SLS-specific leverage derivation. Ignored on the continuous paths (which use the CCT-2014 robust SE from Phase 1c); passing a non-default vcov_type on a continuous path emits a UserWarning per fit call.
robust (bool) – Backward-compat alias used only when vcov_type is None: True -> "hc1", False -> "classical". Explicit vcov_type takes precedence (e.g., vcov_type="classical", robust=True runs classical). Only the mass-point path consumes these; continuous paths ignore both with a warning.
cluster (str or None) – Column name for cluster-robust SEs. The cluster variable must be constant within unit. On the mass-point path this is the 2SLS CR1 sandwich. On the continuous paths (continuous_at_zero / continuous_near_d_lower, Phase 2a) the cluster IDs are threaded into bias_corrected_local_linear so that se_robust is the cluster-robust CCT-2014 nonparametric SE (on the β̂ scale, se = se_robust / |den|). A bare cluster= gives unweighted cluster-robust inference. Combining cluster= with survey_design= raises NotImplementedError because the Binder-TSL survey variance would override the cluster-robust SE; for weighted clustering, use survey_design=SurveyDesign(weights='<weight_col>', psu='<cluster_col>') instead. On the event-study path (aggregate="event_study", Phase 2b), cluster= provides cluster-robust per-horizon pointwise CIs (both designs) and a cluster-robust simultaneous sup-t band (cband=True, which fires even on unweighted fits); cluster= combined with survey_design= is rejected there too.
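The regime-partition contract for design and the vcov_type / robust precedence rule can be restated as a short standalone sketch. This is not the library's _detect_design() or its validation code: the function names, the tolerance, and the ValueError for unrecognized vcov_type values are assumptions for illustration. Only the 2% modal-fraction cutoff, the d_lower conditions, and the fallback order come from the parameter descriptions above.

```python
from typing import Optional, Sequence

MASS_POINT_CUTOFF = 0.02  # the 2% modal-fraction threshold quoted in the text


def allowed_design(doses: Sequence[float], tol: float = 1e-12) -> Optional[str]:
    """Return the design the stated contract permits, or None if none fits."""
    d_lower = min(doses)
    frac_at_min = sum(abs(d - d_lower) <= tol for d in doses) / len(doses)
    is_mass_point = frac_at_min > MASS_POINT_CUTOFF
    if abs(d_lower) <= tol:
        # Design 1': d_lower = 0; mass-point samples are rejected on this path.
        return None if is_mass_point else "continuous_at_zero"
    if d_lower > 0:
        return "mass_point" if is_mass_point else "continuous_near_d_lower"
    return None  # negative doses are outside the documented contract


def resolve_vcov_type(vcov_type: Optional[str], robust: bool) -> str:
    """An explicit vcov_type wins; otherwise fall back to the robust flag."""
    if vcov_type is None:
        return "hc1" if robust else "classical"
    if vcov_type in ("hc2", "hc2_bm"):
        raise NotImplementedError("pending a 2SLS-specific leverage derivation")
    if vcov_type not in ("classical", "hc1"):
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")  # assumed behavior
    return vcov_type


at_zero = [0.0] + [i / 100 for i in range(1, 100)]        # 1% of units at d.min()
near = [0.5 + i / 100 for i in range(100)]                # 1% at d.min() = 0.5
massed = [0.5] * 10 + [0.5 + i / 100 for i in range(1, 91)]  # 10% at d.min()

print(allowed_design(at_zero))               # continuous_at_zero
print(allowed_design(near))                  # continuous_near_d_lower
print(allowed_design(massed))                # mass_point
print(resolve_vcov_type(None, True))         # hc1
print(resolve_vcov_type("classical", True))  # classical
```

The last call shows the documented precedence: an explicit vcov_type="classical" runs classical even when robust=True.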

Notes

Non-testable assumptions (paper Section 3.1.2). Point identification of WAS_{d_lower} on the Design 1 family (continuous_near_d_lower and mass_point) requires Assumption 6 in addition to parallel trends; sign identification requires Assumption 5. Neither is testable via pre-trends:

Assumption 5 (sign identification): the boundary slope-ratio lim_{d down d_lower} E(TE_2 | D_2 <= d) / WAS < E(D_2) / d_lower relates the conditional expectation near the boundary to the overall WAS; it cannot be inferred from pre-period outcome trajectories alone.
Assumption 6 (point identification): the counterfactual-mean alignment lim_{d down d_lower} E[Y_2(d_lower) - Y_2(0) | D_2 <= d] = E[Y_2(d_lower) - Y_2(0)] is a statement about an unobserved counterfactual at the support infimum.
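For readability, the two conditions above can be typeset as follows. This is a direct transcription of the plain-text statements, reading "d down d_lower" as a limit from the right and writing d_lower as $d_{\text{lower}}$; the paper's own notation may differ.

```latex
\begin{align*}
\text{Assumption 5:}\quad
  & \lim_{d \downarrow d_{\text{lower}}}
    \frac{E\left(TE_2 \mid D_2 \le d\right)}{\mathrm{WAS}}
    < \frac{E(D_2)}{d_{\text{lower}}}, \\
\text{Assumption 6:}\quad
  & \lim_{d \downarrow d_{\text{lower}}}
    E\left[Y_2(d_{\text{lower}}) - Y_2(0) \mid D_2 \le d\right]
    = E\left[Y_2(d_{\text{lower}}) - Y_2(0)\right].
\end{align*}
```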

The fit() method emits a UserWarning whenever resolved_design is on the Design 1 family (continuous_near_d_lower or mass_point), so that users do not silently treat the point estimates as fully point-identified. The available pre-tests verify ADJACENT identifying conditions:

diff_diff.qug_test(): Theorem 4 / Design 1’ support-infimum null d_lower = 0 (adjacent evidence on the d_lower = 0 clause of Assumption 4 only, NOT a test of the full Assumption 4 statement which also covers boundary-density positivity, conditional-mean smoothness, conditional-variance regularity, and bandwidth conditions).
diff_diff.stute_test() / diff_diff.yatchew_hr_test(): Assumption 8 linearity of E[ΔY | D_2] in D_2 (residuals from dy ~ 1 + d).
diff_diff.joint_pretrends_test(): Assumption 7 mean-independence pre-trends across multi-period placebos (intercept-only residual form via null_form="mean_independence"; the raw stute_test / yatchew_hr_test helpers do NOT cover Assumption 7 on their own).
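The two residual forms named in the list above differ only in the fitted null model. The sketch below computes both with closed-form OLS in pure Python. It does not reproduce the Stute or Yatchew test statistics or any diff_diff internals; the function names are hypothetical and the data are made up for illustration.

```python
from statistics import fmean
from typing import List, Sequence


def linear_residuals(dy: Sequence[float], d: Sequence[float]) -> List[float]:
    """Residuals from OLS of dy on an intercept and d (dy ~ 1 + d)."""
    d_bar, dy_bar = fmean(d), fmean(dy)
    sxx = sum((x - d_bar) ** 2 for x in d)
    if sxx == 0:
        raise ValueError("dose has no variation; the slope is not identified")
    slope = sum((x - d_bar) * (y - dy_bar) for x, y in zip(d, dy)) / sxx
    intercept = dy_bar - slope * d_bar
    return [y - intercept - slope * x for x, y in zip(d, dy)]


def intercept_only_residuals(dy: Sequence[float]) -> List[float]:
    """Residuals from the intercept-only fit (mean-independence null form)."""
    dy_bar = fmean(dy)
    return [y - dy_bar for y in dy]


dose = [0.0, 0.5, 1.0]
delta_y = [0.1, 0.25, 0.4]  # exactly linear in dose: 0.1 + 0.3 * dose

lin_res = linear_residuals(delta_y, dose)
mean_res = intercept_only_residuals(delta_y)
print(max(abs(r) for r in lin_res) < 1e-12)  # True
print(abs(sum(mean_res)) < 1e-12)            # True
```

Under exact linearity the dy ~ 1 + d residuals vanish, while the intercept-only residuals still vary with dose, which is why the two forms speak to different assumptions.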

None of these test Assumptions 5 or 6 directly. The Assumption 5/6 non-testability caveat is surfaced by the Design 1 fit-time UserWarning and by T21 (HAD pretest workflow tutorial) prose, NOT by the composite workflow verdict string (which only flags the Assumption 7 step-2 gap on the two-period aggregate="overall" path).

Diagnostics coverage. HeterogeneousAdoptionDiDResults.bandwidth_diagnostics and .bias_corrected_fit are populated only on the continuous paths; both are None on the mass-point path (which is parametric and has no bandwidth). Conversely, .n_mass_point and .n_above_d_lower are populated only on the mass-point path.
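Because some diagnostics are None depending on the resolved design, downstream code may want to guard attribute access. The helper below is a hypothetical convenience, not part of diff_diff. It relies only on the attribute names listed above, assumes unpopulated attributes are None, and is demonstrated on a stand-in object rather than a real results instance.

```python
from types import SimpleNamespace
from typing import Any, Dict

DIAGNOSTIC_NAMES = (
    "bandwidth_diagnostics",  # continuous paths only
    "bias_corrected_fit",     # continuous paths only
    "n_mass_point",           # mass-point path only
    "n_above_d_lower",        # mass-point path only
)


def available_diagnostics(result: Any) -> Dict[str, Any]:
    """Return only the diagnostics that are populated (not None) on `result`."""
    found = {}
    for name in DIAGNOSTIC_NAMES:
        value = getattr(result, name, None)
        if value is not None:
            found[name] = value
    return found


# Stand-in for a mass-point result; the counts are made up for illustration.
mass_point_like = SimpleNamespace(
    design="mass_point",
    bandwidth_diagnostics=None,
    bias_corrected_fit=None,
    n_mass_point=40,
    n_above_d_lower=460,
)
print(sorted(available_diagnostics(mass_point_like)))
# ['n_above_d_lower', 'n_mass_point']
```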

Clone idempotence. self.design stores the RAW user input (e.g., "auto"); the resolved mode is stored on the result object at fit time. This mirrors Phase 1a’s _vcov_type_arg pattern and keeps get_params() / sklearn.clone() round-trips exact.
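The pattern can be illustrated with a toy estimator that keeps the raw constructor argument untouched and writes the resolved value only to the result. The classes and the resolution rule below are placeholders invented for this sketch; they are not the library's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ToyResult:
    design: str  # the resolved design lives on the result, not the estimator


class ToyEstimator:
    def __init__(self, design: str = "auto") -> None:
        self.design = design  # raw user input; fit() never overwrites it

    def get_params(self) -> Dict[str, str]:
        return {"design": self.design}

    def fit(self, doses: Sequence[float]) -> ToyResult:
        resolved = self.design
        if resolved == "auto":
            # Placeholder rule for the sketch, not the real auto-detect logic.
            resolved = (
                "continuous_at_zero" if min(doses) == 0
                else "continuous_near_d_lower"
            )
        return ToyResult(design=resolved)


est = ToyEstimator(design="auto")
result = est.fit([0.0, 0.2, 0.7])
clone = ToyEstimator(**est.get_params())  # what a clone round-trip amounts to

print(result.design)       # continuous_at_zero
print(clone.get_params())  # {'design': 'auto'}
```

Had fit() overwritten self.design with the resolved value, the clone would be pinned to "continuous_at_zero" and could behave differently on new data.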

Examples

Construct a two-period HAD panel by hand. Phase 2a requires exactly two periods with D_{g,1} = 0 for every unit.

>>> import numpy as np
>>> import pandas as pd
>>> from diff_diff import HeterogeneousAdoptionDiD
>>> rng = np.random.default_rng(42)
>>> G = 500
>>> dose_post = rng.uniform(0.0, 1.0, G)
>>> dose_post[0] = 0.0  # at least one zero-dose unit for Design 1'
>>> delta_y = 0.3 * dose_post + 0.1 * rng.standard_normal(G)
>>> data = pd.DataFrame({
...     "unit": np.repeat(np.arange(G), 2),
...     "period": np.tile([1, 2], G),
...     "dose": np.column_stack([np.zeros(G), dose_post]).ravel(),
...     "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(),
... })
>>> est = HeterogeneousAdoptionDiD(design="auto")
>>> result = est.fit(
...     data, outcome_col="outcome", dose_col="dose",
...     time_col="period", unit_col="unit",
... )
>>> result.design
'continuous_at_zero'

Methods

`__init__`([design, d_lower, kernel, alpha, ...])
`fit`(data, outcome_col, dose_col, time_col, ...)	Fit the HAD estimator.
`get_params`([deep])	Return the raw constructor parameters (sklearn-compatible).
`set_params`(**params)	Set estimator parameters and return self (sklearn-compatible).

__init__(design='auto', d_lower=None, kernel='epanechnikov', alpha=0.05, vcov_type=None, robust=False, cluster=None, n_bootstrap=999, seed=None)

Parameters:

design (str)
d_lower (float | None)
kernel (str)
alpha (float)
vcov_type (str | None)
robust (bool)
cluster (str | None)
n_bootstrap (int)
seed (int | None)

Return type:

None

classmethod __new__(*args, **kwargs)