diff_diff.DifferenceInDifferences

class diff_diff.DifferenceInDifferences

Bases: object

Difference-in-Differences estimator with sklearn-like interface.

Estimates the Average Treatment effect on the Treated (ATT) using the canonical 2x2 DiD design or panel data with two-way fixed effects.

Parameters:
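  • formula (str, optional) – R-style formula for the model (e.g., "outcome ~ treated * post"). If provided, overrides the column-name arguments (outcome, treatment, time). The __init__ signature on this page does not list formula; the example passes it to fit().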

  • robust (bool, default=True) – Legacy alias for vcov_type. robust=True maps to vcov_type="hc1"; robust=False maps to vcov_type="classical". Explicit vcov_type overrides robust unless the pair is contradictory (e.g. robust=False, vcov_type="hc2" raises).

  • cluster (str, optional) – Column name for cluster-robust standard errors. Combined with vcov_type: with "hc1" dispatches to CR1 (Liang-Zeger); with "hc2_bm" dispatches to CR2 Bell-McCaffrey (Pustejovsky-Tipton 2018 symmetric-sqrt + Satterthwaite DOF).

  • vcov_type ({"classical", "hc1", "hc2", "hc2_bm", "conley"}, optional) –

    Variance-covariance family. If omitted, it is derived from robust: robust=True gives "hc1" and robust=False gives "classical".

    • "classical": non-robust OLS SEs, sigma_hat^2 * (X'X)^{-1}.

    • "hc1": heteroskedasticity-robust HC1 with n/(n-k) adjustment (library default). With cluster=, uses CR1 (Liang-Zeger).

    • "hc2": leverage-corrected meat (one-way only). Errors with cluster=; use "hc2_bm" for clustered Bell-McCaffrey.

    • "hc2_bm": one-way HC2 + Imbens-Kolesar (2016) Satterthwaite DOF; with cluster=, Pustejovsky-Tipton (2018) CR2 cluster-robust. MultiPeriodDiD(cluster=..., vcov_type="hc2_bm") is supported and uses a cluster-aware Bell-McCaffrey contrast DOF for the post-period-average ATT (see _compute_cr2_bm_contrast_dof in linalg.py and the REGISTRY.md note). Weighted CR2-BM (survey_design= paths) is a separate gate.

    • "conley": Conley 1999 spatial-HAC sandwich. Pass conley_coords=(lat_col, lon_col), conley_cutoff_km=<float>, and conley_lag_cutoff=<int> on the constructor; pass unit=<col> as a fit-time kwarg to fit() (NOT on __init__; unused unless Conley is set; not part of get_params() / set_params()). The block-decomposed panel sandwich (matches R conleyreg with lag_cutoff > 0) sums within-period spatial pairs plus within-unit Bartlett serial pairs (lag=0 excluded). Explicit cluster=<col> enables the combined spatial + cluster product kernel; survey_design= and inference='wild_bootstrap' both raise NotImplementedError.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • inference (str, default="analytical") – Inference method: "analytical" for standard asymptotic inference, or "wild_bootstrap" for the wild cluster bootstrap (recommended when the number of clusters is small, i.e. fewer than 50).

  • n_bootstrap (int, default=999) – Number of bootstrap replications when inference="wild_bootstrap".

  • bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: "rademacher" (standard), "webb" (recommended for fewer than 10 clusters), or "mammen" (skewness correction).

  • seed (int, optional) – Random seed for reproducibility when using bootstrap inference. If None (default), results will vary between runs.

  • rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):

    • "warn": issue a warning and drop linearly dependent columns (default).

    • "error": raise ValueError.

    • "silent": drop columns silently, without a warning.

  • conley_coords (Tuple[str, str], optional) – Conley (1999) spatial-HAC variance configuration, shared by all conley_* parameters. Pass conley_coords=(lat_col, lon_col), conley_cutoff_km=<float>, and conley_lag_cutoff=<int> on the constructor. The unit identifier is passed as a fit-time argument to fit(...), NOT to __init__; it is unused unless vcov_type="conley" and is therefore not part of get_params() / set_params() (which return constructor-arg dicts). The block-decomposed panel sandwich (matching R conleyreg with lag_cutoff > 0) sums within-period spatial pairs plus within-unit Bartlett serial pairs (lag=0 excluded to avoid double-counting). Explicit cluster=<col> with Conley enables the combined spatial + cluster product kernel; the cluster must be constant within each unit across periods (validator-enforced). DiD has no auto-cluster, so cluster is fully opt-in on the Conley path: absent cluster=, pure Conley spatial HAC applies. survey_design= with Conley and inference='wild_bootstrap' with Conley both raise NotImplementedError.

  • conley_cutoff_km (float, optional) – Spatial cutoff in kilometers for the Conley (1999) spatial-HAC variance. See conley_coords for the full configuration notes.

  • conley_metric (str, default="haversine") – Distance metric for the Conley spatial-HAC variance. See conley_coords for the full configuration notes.

  • conley_kernel (str, default="bartlett") – Kernel for the Conley spatial-HAC variance. See conley_coords for the full configuration notes.

  • conley_lag_cutoff (int, optional) – Lag cutoff for the within-unit serial pairs of the Conley panel sandwich. See conley_coords for the full configuration notes.
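The interaction between robust and vcov_type described above can be sketched in a few lines. This is an illustration only, not the library's code: resolve_vcov_type is a hypothetical helper, the ValueError exception type is an assumption, and so is the rule that robust=False conflicts with every non-classical family (the documentation names only the robust=False, vcov_type="hc2" pair explicitly).

```python
from typing import Optional

# Families listed in the vcov_type parameter documentation.
VALID_VCOV_TYPES = {"classical", "hc1", "hc2", "hc2_bm", "conley"}


def resolve_vcov_type(robust: bool = True, vcov_type: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented robust / vcov_type rules."""
    if vcov_type is None:
        # Legacy alias: robust=True -> "hc1", robust=False -> "classical".
        return "hc1" if robust else "classical"
    if vcov_type not in VALID_VCOV_TYPES:
        raise ValueError(f"unknown vcov_type: {vcov_type!r}")
    # Assumption: robust=False plus any non-classical family is contradictory.
    # The documentation only names (robust=False, vcov_type="hc2").
    if not robust and vcov_type != "classical":
        raise ValueError(
            f"robust=False contradicts vcov_type={vcov_type!r}"
        )
    # Otherwise the explicit vcov_type overrides the robust alias.
    return vcov_type


print(resolve_vcov_type())                        # default robust=True
print(resolve_vcov_type(robust=False))            # legacy non-robust alias
print(resolve_vcov_type(vcov_type="hc2_bm"))      # explicit value wins
```

Under these assumptions the three calls resolve to "hc1", "classical", and "hc2_bm" respectively, while resolve_vcov_type(robust=False, vcov_type="hc2") raises.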

results_

Estimation results after calling fit().

Type: DiDResults

is_fitted_

Whether the model has been fitted.

Type: bool

Examples

Basic usage with a DataFrame:

>>> import pandas as pd
>>> from diff_diff import DifferenceInDifferences
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'post': [0, 0, 1, 1, 0, 0, 1, 1]
... })
>>>
>>> # Fit the model
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>>
>>> # View results
>>> print(results.att)  # ATT estimate
>>> results.print_summary()  # Full summary table

Using the formula interface:

>>> did = DifferenceInDifferences()
>>> results = did.fit(data, formula='outcome ~ treated * post')

Notes

The ATT is computed using the standard DiD formula:

ATT = (E[Y|D=1,T=1] - E[Y|D=1,T=0]) - (E[Y|D=0,T=1] - E[Y|D=0,T=0])

Or equivalently via OLS regression:

Y = α + β₁*D + β₂*T + β₃*(D×T) + ε

where D is the treatment indicator, T is the post-period indicator, and β₃ is the ATT.
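As a worked check of the two formulas, the sketch below evaluates both by hand on the eight-row sample data from the Examples section. It uses NumPy only and does not call the estimator; cell_mean is a hypothetical helper, and the code says nothing about how the library computes its estimate internally.

```python
import numpy as np

# Sample data from the Examples section.
outcome = np.array([10, 11, 15, 18, 9, 10, 12, 13], dtype=float)
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
post = np.array([0, 0, 1, 1, 0, 0, 1, 1])


def cell_mean(d: int, t: int) -> float:
    """Mean outcome in the (D=d, T=t) cell of the 2x2 design."""
    return float(outcome[(treated == d) & (post == t)].mean())


# Difference of differences in cell means:
# (16.5 - 10.5) - (12.5 - 9.5) = 6.0 - 3.0 = 3.0
att_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same quantity as the interaction coefficient in Y = a + b1*D + b2*T + b3*(D*T) + e.
X = np.column_stack([np.ones_like(outcome), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
att_ols = float(beta[3])

print(att_means)
print(np.isclose(att_ols, att_means))
```

Because the regression is saturated in D and T, the interaction coefficient reproduces the difference of cell means up to floating-point error.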

Methods

__init__([robust, cluster, vcov_type, ...])

fit(data[, outcome, treatment, time, ...]) – Fit the Difference-in-Differences model.

get_params() – Get estimator parameters (sklearn-compatible).

predict(data) – Predict outcomes using fitted model.

print_summary() – Print summary to stdout.

set_params(**params) – Set estimator parameters (sklearn-compatible).

summary() – Get summary of estimation results.

__init__(robust=True, cluster=None, vcov_type=None, alpha=0.05, inference='analytical', n_bootstrap=999, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', conley_coords=None, conley_cutoff_km=None, conley_metric='haversine', conley_kernel='bartlett', conley_lag_cutoff=None)
Parameters:
  • robust (bool)

  • cluster (str | None)

  • vcov_type (str | None)

  • alpha (float)

  • inference (str)

  • n_bootstrap (int)

  • bootstrap_weights (str)

  • seed (int | None)

  • rank_deficient_action (str)

  • conley_coords (Tuple[str, str] | None)

  • conley_cutoff_km (float | None)

  • conley_metric (str)

  • conley_kernel (str)

  • conley_lag_cutoff (int | None)
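To give a feel for the conley_metric='haversine' and conley_kernel='bartlett' defaults in the signature above, here is a minimal sketch of the textbook definitions: great-circle distance between two (lat, lon) points and a Bartlett (triangular) weight that decays linearly to zero at the cutoff. The function names and the Earth-radius constant are assumptions made for illustration; this page does not show how the library implements either piece.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def bartlett_weight(distance_km: float, cutoff_km: float) -> float:
    """Triangular kernel: 1 at distance 0, linearly down to 0 at the cutoff."""
    return max(0.0, 1.0 - distance_km / cutoff_km)


# One degree of longitude on the equator is roughly 111 km.
d = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(d, 1), round(bartlett_weight(d, cutoff_km=200.0), 3))
```

Pairs of observations farther apart than the cutoff receive zero weight, which is why the choice of conley_cutoff_km determines how many spatial pairs enter the variance estimate.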

classmethod __new__(*args, **kwargs)