diff_diff.DifferenceInDifferences
- class diff_diff.DifferenceInDifferences
Bases: object
Difference-in-Differences estimator with an sklearn-like interface.
Estimates the Average Treatment effect on the Treated (ATT) using the canonical 2x2 DiD design or panel data with two-way fixed effects.
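Two-way fixed effects on a balanced panel can be illustrated without the library. Subtract unit and period means, then regress the demeaned outcome on the demeaned treatment indicator; by Frisch-Waugh-Lovell this recovers the TWFE coefficient. The following is a minimal sketch, assuming a balanced panel and simulated data. The column names `unit`, `period`, `d`, and `y` are hypothetical, and diff_diff's own fitting code may work differently.

```python
import numpy as np
import pandas as pd

# Simulated balanced panel (hypothetical data): 4 units x 4 periods.
# Units 0 and 1 are treated from period 2 onward; units 2 and 3 never are.
rng = np.random.default_rng(0)
rows = []
for unit in range(4):
    for period in range(4):
        d = int(unit < 2 and period >= 2)
        # outcome = unit effect + period effect + 3.0 * treatment + small noise
        y = 10.0 * unit + 2.0 * period + 3.0 * d + rng.normal(scale=0.1)
        rows.append({"unit": unit, "period": period, "d": d, "y": y})
panel = pd.DataFrame(rows)


def two_way_demean(df, col):
    """y_it - mean_i - mean_t + grand mean: the exact within transform for a balanced panel."""
    unit_mean = df.groupby("unit")[col].transform("mean")
    period_mean = df.groupby("period")[col].transform("mean")
    return df[col] - unit_mean - period_mean + df[col].mean()


y_dm = two_way_demean(panel, "y")
d_dm = two_way_demean(panel, "d")
# By Frisch-Waugh-Lovell, the TWFE coefficient on d equals this simple slope.
att_twfe = float((d_dm * y_dm).sum() / (d_dm ** 2).sum())
print(f"TWFE estimate: {att_twfe:.3f}")  # close to the simulated effect of 3.0
```

The double-demeaning shortcut is exact only for balanced panels. Unbalanced panels need an iterative or dummy-variable approach.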
- Parameters:
formula (str, optional) – R-style formula for the model (e.g., "outcome ~ treated * post"), passed to fit(). If provided, it overrides the column-name parameters (outcome, treatment, time).
robust (bool, default=True) – Legacy alias for vcov_type. robust=True maps to vcov_type="hc1"; robust=False maps to vcov_type="classical". An explicit vcov_type overrides robust unless the pair is contradictory (e.g. robust=False, vcov_type="hc2" raises).
cluster (str, optional) – Column name for cluster-robust standard errors. The estimator depends on vcov_type: with "hc1" it dispatches to CR1 (Liang-Zeger); with "hc2_bm" it dispatches to CR2 Bell-McCaffrey (Pustejovsky-Tipton 2018 symmetric square root + Satterthwaite DOF).
vcov_type ({"classical", "hc1", "hc2", "hc2_bm", "conley"}, optional) – Variance-covariance family. When omitted, it is derived from the robust alias.
  - "classical": non-robust OLS SEs, sigma_hat^2 * (X'X)^{-1}.
  - "hc1": heteroskedasticity-robust HC1 with an n/(n-k) adjustment (library default). With cluster=, uses CR1 (Liang-Zeger).
  - "hc2": leverage-corrected meat (one-way only). Raises an error with cluster=; use "hc2_bm" for clustered Bell-McCaffrey.
  - "hc2_bm": one-way HC2 + Imbens-Kolesar (2016) Satterthwaite DOF; with cluster=, Pustejovsky-Tipton (2018) CR2 cluster-robust. MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") is supported and uses a cluster-aware Bell-McCaffrey contrast DOF for the post-period-average ATT (see _compute_cr2_bm_contrast_dof in linalg.py and the REGISTRY.md note). Weighted CR2-BM (survey_design= paths) is a separate gate.
  - "conley": Conley (1999) spatial-HAC sandwich. Pass conley_coords=(lat_col, lon_col), conley_cutoff_km=<float>, and conley_lag_cutoff=<int> on the constructor; pass unit=<col> as a fit-time keyword argument to fit() (NOT to __init__; it is unused unless Conley is set and is not part of get_params()/set_params()). The block-decomposed panel sandwich (matching R conleyreg with lag_cutoff > 0) sums within-period spatial pairs plus within-unit Bartlett serial pairs (lag=0 excluded). An explicit cluster=<col> enables the combined spatial + cluster product kernel; survey_design= and inference='wild_bootstrap' both raise NotImplementedError.
alpha (float, default=0.05) – Significance level for confidence intervals.
inference (str, default="analytical") – Inference method: “analytical” for standard asymptotic inference, or “wild_bootstrap” for the wild cluster bootstrap (recommended when the number of clusters is small, <50).
n_bootstrap (int, default=999) – Number of bootstrap replications when inference=”wild_bootstrap”.
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher” (standard), “webb” (recommended for <10 clusters), or “mammen” (skewness correction).
seed (int, optional) – Random seed for reproducibility when using bootstrap inference. If None (default), results will vary between runs.
rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):
  - "warn": issue a warning and drop linearly dependent columns (default).
  - "error": raise ValueError.
  - "silent": drop columns silently without warning.
conley_coords, conley_cutoff_km, conley_metric (default="haversine"), conley_kernel (default="bartlett"), conley_lag_cutoff – Conley (1999) spatial-HAC variance configuration. Pass conley_coords=(lat_col, lon_col), conley_cutoff_km=<float>, and conley_lag_cutoff=<int> on the constructor. The unit identifier is passed as a fit-time argument to fit(...) (NOT to __init__); it is unused unless vcov_type="conley" and is therefore not part of get_params()/set_params(), which return constructor-arg dicts. The block-decomposed panel sandwich (matching R conleyreg with lag_cutoff > 0) sums within-period spatial pairs plus within-unit Bartlett serial pairs (lag=0 excluded to avoid double-counting). An explicit cluster=<col> combined with Conley enables the combined spatial + cluster product kernel; the cluster must be constant within each unit across periods (validator-enforced). DiD has no auto-cluster, so clustering is fully opt-in on the Conley path: without cluster=, pure Conley spatial HAC applies. survey_design= + Conley and inference='wild_bootstrap' + Conley both raise NotImplementedError.
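The variance families listed under vcov_type differ only in how the "meat" of the OLS sandwich is built. The sketch below illustrates them under stated assumptions; it is not diff_diff's code. The helper names `ols_vcov`, `haversine_km`, and `conley_vcov` are hypothetical. The CR1 small-sample factor G/(G-1) * (n-1)/(n-k) is the common Stata-style choice and is assumed here. The Conley part covers only a single cross-section with Bartlett distance weights, so the within-unit serial terms of the panel sandwich are omitted.

```python
import numpy as np


def ols_vcov(X, y, kind="hc1", clusters=None):
    """Sketch of OLS sandwich variances: classical, HC1, HC2, or CR1 (hc1 + clusters)."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    if clusters is not None and kind != "hc1":
        raise ValueError("this sketch only clusters with kind='hc1' (CR1)")
    if kind == "classical":
        return beta, (resid @ resid / (n - k)) * bread  # sigma_hat^2 * (X'X)^{-1}
    if kind == "hc1" and clusters is not None:
        groups = np.unique(clusters)
        # One score vector X_g' e_g per cluster; meat = sum_g s_g s_g'.
        scores = np.array([X[clusters == g].T @ resid[clusters == g] for g in groups])
        G = len(groups)
        factor = G / (G - 1) * (n - 1) / (n - k)  # assumed Stata-style CR1 factor
        return beta, factor * bread @ (scores.T @ scores) @ bread
    if kind == "hc1":
        meat = (X * resid[:, None] ** 2).T @ X
        return beta, n / (n - k) * bread @ meat @ bread
    if kind == "hc2":
        # Leverage h_ii = x_i' (X'X)^{-1} x_i; HC2 rescales e_i^2 by 1 / (1 - h_ii).
        h = np.einsum("ij,jk,ik->i", X, bread, X)
        meat = (X * (resid ** 2 / (1 - h))[:, None]).T @ X
        return beta, bread @ meat @ bread
    raise ValueError(f"kind not covered by this sketch: {kind}")


def haversine_km(lat, lon, radius_km=6371.0):
    """Pairwise great-circle distances in km for coordinates in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def conley_vcov(X, resid, lat, lon, cutoff_km):
    """Single-period Conley sandwich with Bartlett weights K_ij = max(0, 1 - d_ij / cutoff)."""
    K = np.clip(1.0 - haversine_km(lat, lon) / cutoff_km, 0.0, None)
    scores = X * resid[:, None]  # row i holds x_i * e_i
    bread = np.linalg.inv(X.T @ X)
    return bread @ (scores.T @ K @ scores) @ bread


# 2x2 example: columns are [1, D, T, D*T]; units are hypothetical (one pre + one post row each).
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
post = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([10, 11, 15, 18, 9, 10, 12, 13], dtype=float)
X = np.column_stack([np.ones(8), treated, post, treated * post])
units = np.array([0, 1, 0, 1, 2, 3, 2, 3])

for kind in ("classical", "hc1", "hc2"):
    beta, V = ols_vcov(X, y, kind)
    print(kind, round(float(beta[3]), 3), round(float(np.sqrt(V[3, 3])), 3))
beta, V_cr1 = ols_vcov(X, y, "hc1", clusters=units)
```

With a cutoff smaller than every pairwise distance, the Bartlett weight matrix is the identity and the Conley sandwich collapses to the uncorrected HC0 form. This makes a handy sanity check.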
- results_
Estimation results after calling fit().
- Type:
Examples
Basic usage with a DataFrame:
>>> import pandas as pd
>>> from diff_diff import DifferenceInDifferences
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'post': [0, 0, 1, 1, 0, 0, 1, 1]
... })
>>>
>>> # Fit the model
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>>
>>> # View results
>>> print(results.att)  # ATT estimate
>>> results.print_summary()  # Full summary table
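For this sample, the ATT can be checked by hand without the library. Take the difference of differences in the four cell means, then confirm that it matches the interaction coefficient from a plain OLS fit of outcome on [1, treated, post, treated*post]. This is a minimal sketch using only pandas and NumPy. It does not call diff_diff, so it says nothing about how the library reports its result.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "outcome": [10, 11, 15, 18, 9, 10, 12, 13],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "post": [0, 0, 1, 1, 0, 0, 1, 1],
})

# Difference of differences in cell means, indexed by (treated, post).
means = data.groupby(["treated", "post"])["outcome"].mean()
att_means = (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])

# Same quantity as the interaction coefficient in a saturated OLS regression.
X = np.column_stack([
    np.ones(len(data)), data["treated"], data["post"], data["treated"] * data["post"],
])
beta, *_ = np.linalg.lstsq(X, data["outcome"].to_numpy(dtype=float), rcond=None)
att_ols = beta[3]
# Both equal 3.0 (up to floating-point error): (16.5 - 10.5) - (12.5 - 9.5).
```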
Using formula interface:
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, formula='outcome ~ treated * post')
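In R-style formulas, `a * b` is shorthand for `a + b + a:b`. That is why `outcome ~ treated * post` yields both main effects plus the treated-by-post interaction whose coefficient is the ATT. The helper `expand_formula` below is hypothetical. It only illustrates the expansion and handles just two-factor products, whereas diff_diff's actual formula parser may support more or less syntax.

```python
def expand_formula(formula):
    """Expand an R-style 'y ~ a * b + c' string into (outcome, regressor terms).

    Hypothetical helper: only two-factor `a * b` products are supported.
    """
    lhs, rhs = formula.split("~")
    terms = []
    for term in rhs.split("+"):
        factors = [f.strip() for f in term.split("*")]
        if len(factors) > 2:
            raise ValueError("this sketch only expands two-factor products")
        for f in factors:
            if f not in terms:
                terms.append(f)  # main effects, without duplicates
        if len(factors) == 2:
            terms.append(":".join(factors))  # interaction term, e.g. "treated:post"
    return lhs.strip(), terms


print(expand_formula("outcome ~ treated * post"))
# ('outcome', ['treated', 'post', 'treated:post'])
```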
Notes
The ATT is computed using the standard DiD formula:
ATT = (E[Y|D=1,T=1] - E[Y|D=1,T=0]) - (E[Y|D=0,T=1] - E[Y|D=0,T=0])
Or equivalently via OLS regression:
Y = α + β₁*D + β₂*T + β₃*(D×T) + ε
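where D is the treatment-group indicator, T is the post-period indicator, and the interaction coefficient β₃ is the ATT.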
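The regression above is fit on a design matrix with columns [1, D, T, D×T]. If extra covariates make some columns linearly dependent, the rank_deficient_action constructor parameter decides what happens. Below is a minimal sketch of the three documented choices. It assumes a greedy column-by-column rank check, and the helper `drop_collinear` is hypothetical; the library may detect dependence differently (for example via a pivoted QR), so the column it drops could differ.

```python
import warnings

import numpy as np


def drop_collinear(X, names, action="warn"):
    """Keep columns that raise the rank; handle the rest per `action` (hypothetical helper)."""
    if action not in ("warn", "error", "silent"):
        raise ValueError(f"unknown action: {action!r}")
    kept, dropped = [], []
    for j in range(X.shape[1]):
        trial = kept + [j]
        if np.linalg.matrix_rank(X[:, trial]) == len(trial):
            kept.append(j)  # column j is independent of the columns kept so far
        else:
            dropped.append(names[j])
    if dropped:
        msg = f"Design matrix is rank-deficient; dropping {dropped}"
        if action == "error":
            raise ValueError(msg)
        if action == "warn":
            warnings.warn(msg)
    return X[:, kept], [names[j] for j in kept]


treated = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
post = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
# "control" = 1 - treated is collinear with the intercept and "treated".
X = np.column_stack([np.ones(8), treated, post, treated * post, 1 - treated])
names = ["const", "treated", "post", "treated:post", "control"]
X_ok, kept_names = drop_collinear(X, names, action="silent")
```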
Methods
__init__([robust, cluster, vcov_type, ...])
fit(data[, outcome, treatment, time, ...]) – Fit the Difference-in-Differences model.
get_params() – Get estimator parameters (sklearn-compatible).
predict(data) – Predict outcomes using the fitted model.
print_summary() – Print summary to stdout.
set_params(**params) – Set estimator parameters (sklearn-compatible).
summary() – Get summary of estimation results.
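get_params() and set_params() follow the scikit-learn convention that an estimator's parameters are exactly its constructor arguments. This is why a fit-time argument such as unit never shows up in them. The `ToyEstimator` below is a hypothetical stand-in that illustrates the convention with `inspect.signature`. It is not the DifferenceInDifferences implementation, and it carries only three of its parameters.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in showing the sklearn get_params/set_params convention."""

    def __init__(self, robust=True, cluster=None, alpha=0.05):
        self.robust = robust
        self.cluster = cluster
        self.alpha = alpha

    def get_params(self):
        # Parameters are read from the constructor signature, so fit-time kwargs never appear.
        names = [p for p in inspect.signature(type(self).__init__).parameters if p != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter: {key}")
            setattr(self, key, value)
        return self  # returning self allows chaining, as in sklearn

    def fit(self, data, unit=None):
        self.unit_ = unit  # fit-time argument stored only as a fitted attribute
        return self


est = ToyEstimator().set_params(alpha=0.1)
print(est.get_params())  # {'robust': True, 'cluster': None, 'alpha': 0.1}
```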
- __init__(robust=True, cluster=None, vcov_type=None, alpha=0.05, inference='analytical', n_bootstrap=999, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', conley_coords=None, conley_cutoff_km=None, conley_metric='haversine', conley_kernel='bartlett', conley_lag_cutoff=None)
- classmethod __new__(*args, **kwargs)
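The inference='wild_bootstrap', n_bootstrap, bootstrap_weights, and seed constructor arguments control a wild cluster bootstrap. Each draw multiplies every cluster's residuals by one random weight with mean 0 and variance 1, then re-estimates the model. The sketch below shows the three documented weight distributions and a simplified bootstrap of one coefficient. The helpers `draw_weights` and `wild_cluster_bootstrap` are hypothetical. This is the unrestricted variant that returns a bootstrap standard error; diff_diff may instead impose the null and bootstrap t-statistics, so its p-values and intervals need not match.

```python
import numpy as np


def draw_weights(kind, size, rng):
    """Cluster-level wild bootstrap multipliers with mean 0 and variance 1."""
    if kind == "rademacher":
        return rng.choice(np.array([-1.0, 1.0]), size=size)
    if kind == "webb":
        # Webb six-point distribution: +/-sqrt(1/2), +/-1, +/-sqrt(3/2), each w.p. 1/6.
        half = [np.sqrt(0.5), 1.0, np.sqrt(1.5)]
        return rng.choice(np.array([-v for v in half] + half), size=size)
    if kind == "mammen":
        # Two-point distribution with third moment 1 (skewness correction).
        s5 = np.sqrt(5.0)
        low, high = -(s5 - 1) / 2, (s5 + 1) / 2
        return np.where(rng.random(size) < (s5 + 1) / (2 * s5), low, high)
    raise ValueError(f"unknown weight type: {kind!r}")


def wild_cluster_bootstrap(X, y, clusters, coef, n_bootstrap=999, kind="rademacher", seed=None):
    """Unrestricted wild cluster bootstrap of one OLS coefficient (simplified sketch)."""
    rng = np.random.default_rng(seed)
    solve = np.linalg.solve(X.T @ X, X.T)  # (X'X)^{-1} X'
    beta = solve @ y
    fitted, resid = X @ beta, y - X @ beta
    _, idx = np.unique(clusters, return_inverse=True)
    draws = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        w = draw_weights(kind, idx.max() + 1, rng)[idx]  # same multiplier within a cluster
        draws[b] = (solve @ (fitted + w * resid))[coef]
    return beta[coef], draws.std(ddof=1)


treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
post = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([10, 11, 15, 18, 9, 10, 12, 13], dtype=float)
X = np.column_stack([np.ones(8), treated, post, treated * post])
units = np.array([0, 1, 0, 1, 2, 3, 2, 3])  # hypothetical unit ids
att, boot_se = wild_cluster_bootstrap(X, y, units, coef=3, kind="webb", seed=42)
```

With only four clusters, as in this toy example, Rademacher weights can produce just 2^4 distinct draws. That limited support is the motivation for the six-point Webb weights when clusters are very few.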