diff_diff.HeterogeneousAdoptionDiD
- class diff_diff.HeterogeneousAdoptionDiD
Bases: object
Heterogeneous Adoption Difference-in-Differences estimator.
Implements the Weighted-Average-Slope (WAS) estimator of de Chaisemartin, Ciccia, D’Haultfoeuille, and Knau (2026) with three design-dispatch paths: Design 1’ (continuous-at-zero), Design 1 continuous-near-d_lower, and Design 1 mass-point (2SLS sample-average per paper Section 3.2.4). Two aggregation modes are available:
- aggregate="overall" (Phase 2a, default) returns a single-period HeterogeneousAdoptionDiDResults on a two-period panel.
- aggregate="event_study" (Phase 2b, paper Appendix B.2) returns a HeterogeneousAdoptionDiDEventStudyResults with per-event-time WAS estimates on a multi-period panel, using a uniform F-1 anchor and pointwise CIs per horizon. Staggered-timing panels are auto-filtered to the last-treatment cohort plus never-treated units (paper Appendix B.2 prescription).
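The staggered-timing filter mentioned above can be pictured with a small standalone sketch. It is not the library's implementation: the helper names (first_treated_period, last_cohort_filter) are hypothetical, the panel is a plain dict rather than a DataFrame, and it assumes a unit's cohort is the first period in which its dose is positive.

```python
from typing import Dict, List, Optional


def first_treated_period(doses_by_period: Dict[int, float]) -> Optional[int]:
    """Return the first period with a positive dose, or None if never treated."""
    treated = [t for t, d in sorted(doses_by_period.items()) if d > 0]
    return treated[0] if treated else None


def last_cohort_filter(panel: Dict[str, Dict[int, float]]) -> List[str]:
    """Keep units in the last-treatment cohort plus never-treated units."""
    onset = {unit: first_treated_period(p) for unit, p in panel.items()}
    treated_onsets = [t for t in onset.values() if t is not None]
    if not treated_onsets:
        return sorted(panel)  # no unit is ever treated, so nothing is dropped
    last = max(treated_onsets)
    return sorted(u for u, t in onset.items() if t is None or t == last)


# Hypothetical three-period panel: unit -> {period: dose}.
panel = {
    "a": {1: 0.0, 2: 0.4, 3: 0.4},  # adopts in period 2 (earlier cohort)
    "b": {1: 0.0, 2: 0.0, 3: 0.7},  # adopts in period 3 (last cohort)
    "c": {1: 0.0, 2: 0.0, 3: 0.2},  # adopts in period 3 (last cohort)
    "d": {1: 0.0, 2: 0.0, 3: 0.0},  # never treated
}
kept = last_cohort_filter(panel)
print(kept)
```

Under these assumptions the earlier cohort (unit "a") is dropped, so every remaining treated unit shares one adoption date.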
- Parameters:
design ({"auto", "continuous_at_zero", "continuous_near_d_lower", "mass_point"}) – Design-dispatch strategy. Defaults to "auto", which resolves via the REGISTRY auto-detect rule on the fitted dose data (see _detect_design()).
Explicit overrides are checked against the paper’s regime-partition contract (Section 3.2) at fit time:
- "continuous_at_zero" (Design 1’): paper requires the support infimum d_lower = 0. Phase 1c’s _validate_had_inputs rejects mass-point samples passed to this path.
- "continuous_near_d_lower" (Design 1, continuous density near d_lower): requires d_lower > 0 and a non-mass-point sample (modal fraction at d.min() must be <= 2%). d_lower must equal float(d.min()) within float tolerance; non-support-infimum thresholds are off-support and raise.
- "mass_point" (Design 1 mass-point): requires d_lower > 0 AND a mass-point sample (modal fraction at d.min() must be > 2%). d_lower must equal float(d.min()) within float tolerance. Forcing this design on a d_lower = 0 sample or on a continuous (non-mass-point) sample raises; in either case 2SLS identifies a different estimand than the paper’s Design 1 mass-point WAS.
Mismatched overrides raise ValueError pointing at the correct design rather than silently identifying a different estimand.

d_lower (float or None) – Support infimum d_lower. None means use 0.0 on the Design 1’ path and float(d.min()) on the other two paths. On Design 1 paths (continuous_near_d_lower and mass_point), an explicit d_lower must equal float(d.min()) within float tolerance AND must be strictly positive; zero-valued or mismatched thresholds raise.

kernel ({"epanechnikov", "triangular", "uniform"}) – Forwarded to bias_corrected_local_linear() on the continuous paths. Ignored on the mass-point path.

alpha (float) – CI level (0.05 for 95% CI).

vcov_type ({"classical", "hc1"} or None) – Mass-point-path only. When None, the effective family falls back to the robust flag: robust=True -> "hc1", robust=False -> "classical" (the default construction). Explicit "hc2" and "hc2_bm" raise NotImplementedError pending a 2SLS-specific leverage derivation. Ignored on the continuous paths (which use the CCT-2014 robust SE from Phase 1c); passing a non-default vcov_type on a continuous path emits a UserWarning per fit call.

robust (bool) – Backward-compat alias used only when vcov_type is None: True -> "hc1", False -> "classical". Explicit vcov_type takes precedence (e.g., vcov_type="classical", robust=True runs classical). Only the mass-point path consumes these; continuous paths ignore both with a warning.

cluster (str or None) – Column name for cluster-robust SE on the mass-point path (CR1). Ignored with a UserWarning on the continuous paths in Phase 2a (nonparametric cluster support exists on Phase 1c but is exposed separately via bias_corrected_local_linear; the estimator-level knob is queued for a follow-up PR).
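The regime-partition contract on design and d_lower can be restated as a small standalone sketch. It encodes only the requirements listed above (support infimum at zero versus positive, and the 2% modal-fraction cutoff at d.min()); it is not the library's _detect_design() or _validate_had_inputs, whose exact rules this page does not spell out. The function names, the treatment of negative doses, and the tolerance values are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence

MASS_POINT_CUTOFF = 0.02  # the 2% modal-fraction threshold quoted above


def modal_fraction_at_min(doses: Sequence[float]) -> float:
    """Share of units whose dose equals the sample minimum."""
    d_min = min(doses)
    return sum(1 for d in doses if d == d_min) / len(doses)


def compatible_design(doses: Sequence[float]) -> str:
    """Name the regime whose stated requirements the dose sample meets."""
    d_min = float(min(doses))
    is_mass_point = modal_fraction_at_min(doses) > MASS_POINT_CUTOFF
    if d_min < 0.0:
        # Assumption of this sketch: negative doses fit none of the regimes.
        raise ValueError("negative doses are outside the regimes sketched here")
    if d_min == 0.0:
        if is_mass_point:
            raise ValueError("mass-point sample at d_lower = 0 is rejected")
        return "continuous_at_zero"
    return "mass_point" if is_mass_point else "continuous_near_d_lower"


def check_override(design: str, doses: Sequence[float],
                   d_lower: Optional[float] = None) -> str:
    """Validate an explicit design/d_lower pair; return the resolved design."""
    expected = compatible_design(doses)
    if design not in ("auto", expected):
        raise ValueError(f"design={design!r} does not match this sample; "
                         f"use design={expected!r}")
    if d_lower is not None and expected != "continuous_at_zero":
        # Tolerances are placeholders, not the library's values.
        if not math.isclose(d_lower, float(min(doses)),
                            rel_tol=1e-9, abs_tol=1e-12):
            raise ValueError("d_lower must equal the sample minimum dose")
    return expected


near_zero = [i / 100 for i in range(100)]                   # minimum 0.0, no lump
shifted = [0.5 + i / 100 for i in range(100)]               # minimum 0.5, no lump
lumpy = [0.5] * 10 + [0.5 + i / 100 for i in range(1, 91)]  # 10% of units at 0.5

for name, sample in [("near_zero", near_zero), ("shifted", shifted), ("lumpy", lumpy)]:
    print(name, check_override("auto", sample))
```

In this sketch, forcing design="mass_point" on the shifted sample raises a ValueError that names "continuous_near_d_lower", mirroring the documented behaviour of pointing at the correct design.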
Notes
Non-testable assumptions (paper Section 3.1.2). Point identification of WAS_{d_lower} on the Design 1 family (continuous_near_d_lower and mass_point) requires Assumption 6 in addition to parallel trends; sign identification requires Assumption 5. Neither is testable via pre-trends:
- Assumption 5 (sign identification): the boundary slope-ratio lim_{d down d_lower} E(TE_2 | D_2 <= d) / WAS < E(D_2) / d_lower relates the conditional expectation near the boundary to the overall WAS; it cannot be inferred from pre-period outcome trajectories alone.
- Assumption 6 (point identification): the counterfactual-mean alignment lim_{d down d_lower} E[Y_2(d_lower) - Y_2(0) | D_2 <= d] = E[Y_2(d_lower) - Y_2(0)] is a statement about an unobserved counterfactual at the support infimum.

The fit() method emits a UserWarning whenever resolved_design is on the Design 1 family (continuous_near_d_lower or mass_point) so users are not silently led to interpret point estimates as full point identification. The available pre-tests verify ADJACENT identifying conditions:
- diff_diff.qug_test(): Theorem 4 / Design 1’ support-infimum null d_lower = 0 (adjacent evidence on the d_lower = 0 clause of Assumption 4 only, NOT a test of the full Assumption 4 statement which also covers boundary-density positivity, conditional-mean smoothness, conditional-variance regularity, and bandwidth conditions).
- diff_diff.stute_test() / diff_diff.yatchew_hr_test(): Assumption 8 linearity of E[ΔY | D_2] in D_2 (residuals from dy ~ 1 + d).
- diff_diff.joint_pretrends_test(): Assumption 7 mean-independence pre-trends across multi-period placebos (intercept-only residual form via null_form="mean_independence"; the raw stute_test / yatchew_hr_test helpers do NOT cover Assumption 7 on their own).

None of these test Assumptions 5 or 6 directly. The Assumption 5/6 non-testability caveat is surfaced by the Design 1 fit-time UserWarning and by T21 (HAD pretest workflow tutorial) prose, NOT by the composite workflow verdict string (which only flags the Assumption 7 step-2 gap on the two-period aggregate="overall" path).

Diagnostics coverage. HeterogeneousAdoptionDiDResults.bandwidth_diagnostics and .bias_corrected_fit are populated only on the continuous paths; both are None on the mass-point path (which is parametric and has no bandwidth). Conversely, .n_mass_point and .n_above_d_lower are populated only on the mass-point path.

Clone idempotence. self.design stores the RAW user input (e.g., "auto"); the resolved mode is stored on the result object at fit time. This mirrors Phase 1a’s _vcov_type_arg pattern and keeps get_params() / sklearn.clone() round-trips exact.

Examples
Construct a two-period HAD panel by hand. Phase 2a requires exactly two periods with D_{g,1} = 0 for every unit.

>>> import numpy as np
>>> import pandas as pd
>>> from diff_diff import HeterogeneousAdoptionDiD
>>> rng = np.random.default_rng(42)
>>> G = 500
>>> dose_post = rng.uniform(0.0, 1.0, G)
>>> dose_post[0] = 0.0  # at least one zero-dose unit for Design 1'
>>> delta_y = 0.3 * dose_post + 0.1 * rng.standard_normal(G)
>>> data = pd.DataFrame({
...     "unit": np.repeat(np.arange(G), 2),
...     "period": np.tile([1, 2], G),
...     "dose": np.column_stack([np.zeros(G), dose_post]).ravel(),
...     "outcome": np.column_stack([np.zeros(G), delta_y]).ravel(),
... })
>>> est = HeterogeneousAdoptionDiD(design="auto")
>>> result = est.fit(
...     data, outcome_col="outcome", dose_col="dose",
...     time_col="period", unit_col="unit",
... )
>>> result.design
'continuous_at_zero'
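The clone-idempotence point from the Notes can be illustrated without the library. The sketch below uses a hypothetical ToyEstimator (not HeterogeneousAdoptionDiD) whose fit() keeps the resolved design on the returned result and never writes it back to self; the resolution rule inside fit() is a placeholder. Rebuilding the estimator from get_params(), in the spirit of sklearn.clone(), then reproduces the original constructor arguments exactly.

```python
from typing import Dict, Optional, Sequence


class ToyEstimator:
    """Hypothetical stand-in used only to show the raw-parameter pattern."""

    def __init__(self, design: str = "auto", d_lower: Optional[float] = None):
        # Store constructor arguments untouched so get_params() echoes them.
        self.design = design
        self.d_lower = d_lower

    def get_params(self, deep: bool = True) -> Dict[str, object]:
        return {"design": self.design, "d_lower": self.d_lower}

    def fit(self, doses: Sequence[float]) -> Dict[str, str]:
        # Resolve into a local and attach it to the result, never to self.
        resolved = self.design
        if resolved == "auto":
            # Placeholder rule for the sketch, not the library's auto-detect.
            resolved = ("continuous_at_zero" if min(doses) == 0
                        else "continuous_near_d_lower")
        return {"design": resolved}


est = ToyEstimator(design="auto")
before = est.get_params()
result = est.fit([0.0, 0.2, 0.9])
rebuilt = ToyEstimator(**est.get_params())  # what a clone-style rebuild does

print(result["design"], est.design, rebuilt.get_params() == before)
```

Had fit() overwritten self.design with the resolved value, the rebuilt estimator would carry an explicit override instead of "auto" and could behave differently on a new sample.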
Methods
__init__([design, d_lower, kernel, alpha, ...])
fit(data, outcome_col, dose_col, time_col, ...) – Fit the HAD estimator.
get_params([deep]) – Return the raw constructor parameters (sklearn-compatible).
set_params(**params) – Set estimator parameters and return self (sklearn-compatible).
- __init__(design='auto', d_lower=None, kernel='epanechnikov', alpha=0.05, vcov_type=None, robust=False, cluster=None, n_bootstrap=999, seed=None)
- classmethod __new__(*args, **kwargs)
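With the __init__ defaults vcov_type=None and robust=False, the documented fallback on the mass-point path resolves to "classical". The sketch below restates that precedence rule as a standalone function. The name effective_vcov_family is hypothetical, the ValueError for unrecognised values is an assumption, and the function covers the mass-point path only (the continuous paths ignore both arguments).

```python
from typing import Optional


def effective_vcov_family(vcov_type: Optional[str], robust: bool) -> str:
    """Resolve the variance family on the mass-point path, per the rule above."""
    if vcov_type is None:
        # Backward-compat alias: robust is consulted only when vcov_type is None.
        return "hc1" if robust else "classical"
    if vcov_type in ("hc2", "hc2_bm"):
        raise NotImplementedError(
            f"{vcov_type!r} awaits a 2SLS-specific leverage derivation")
    if vcov_type in ("classical", "hc1"):
        return vcov_type  # an explicit vcov_type takes precedence over robust
    # Assumption of this sketch: any other value is rejected.
    raise ValueError(f"unknown vcov_type: {vcov_type!r}")


for vt, rb in [(None, False), (None, True), ("classical", True), ("hc1", False)]:
    print(vt, rb, "->", effective_vcov_family(vt, rb))
```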