diff_diff.ImputationDiD

class diff_diff.ImputationDiD

Bases: ImputationDiDBootstrapMixin

Borusyak-Jaravel-Spiess (2024) imputation DiD estimator.

This is the efficient estimator for staggered Difference-in-Differences under parallel trends. Under homogeneous treatment effects, its confidence intervals are roughly 50% shorter than those of Callaway-Sant’Anna and 2-3.5x shorter than those of Sun-Abraham.

The estimation procedure:

  1. Run OLS on untreated observations to estimate unit + time fixed effects.

  2. Impute counterfactual Y(0) for treated observations.

  3. Aggregate imputed treatment effects with researcher-chosen weights.
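The three steps above can be sketched on a toy panel. This is a minimal illustration in plain Python, not the library's implementation: the panel, the fixed-effect values, and the true effect of 2.0 are all made up, and step 1 uses alternating group means (backfitting) as a stand-in for a matrix OLS solve, which reaches the same two-way fixed-effects fit on this noise-free data.

```python
# Toy noise-free panel: Y(0) = unit effect + time effect, true effect = 2.0.
# All names and numbers here are illustrative, not taken from diff_diff.
unit_fe = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
time_fe = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5, 4: 2.0}
first_treat = {0: 2, 1: 3, 2: 0, 3: 0}  # 0 marks never-treated units
TRUE_EFFECT = 2.0

rows = []
for i, a in unit_fe.items():
    for t, b in time_fe.items():
        treated = first_treat[i] > 0 and t >= first_treat[i]
        rows.append((i, t, a + b + TRUE_EFFECT * treated, treated))

# Step 1: fit unit and time effects on untreated observations only.
# Alternating group means (backfitting) converge to the two-way OLS fit.
untreated = [(i, t, y) for i, t, y, d in rows if not d]
alpha = {i: 0.0 for i in unit_fe}
beta = {t: 0.0 for t in time_fe}
for _ in range(500):
    for i in alpha:
        resid = [y - beta[t] for j, t, y in untreated if j == i]
        alpha[i] = sum(resid) / len(resid)
    for t in beta:
        resid = [y - alpha[i] for i, s, y in untreated if s == t]
        beta[t] = sum(resid) / len(resid)

# Step 2: impute Y(0) for treated observations and take the difference.
tau_hat = {(i, t): y - (alpha[i] + beta[t]) for i, t, y, d in rows if d}

# Step 3: aggregate; equal weights give the overall ATT.
att = sum(tau_hat.values()) / len(tau_hat)
print(round(att, 6))
```

Because the toy outcomes contain no noise, the imputed effects recover the assumed true effect up to numerical tolerance. Swapping the equal weights in step 3 for other weights yields other estimands, such as per-horizon or per-cohort averages.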

Inference uses the conservative clustered variance estimator from Theorem 3 of the paper.

Parameters:
  • anticipation (int, default=0) – Number of periods before treatment where effects may occur.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • cluster (str, optional) – Column name for cluster-robust standard errors. If None, standard errors are clustered at the unit level.

  • n_bootstrap (int, default=0) – Number of bootstrap iterations. If 0, uses analytical inference (conservative variance from Theorem 3).

  • bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.

  • seed (int, optional) – Random seed for reproducibility.

  • rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient:
      - “warn”: Issue warning and drop linearly dependent columns
      - “error”: Raise ValueError
      - “silent”: Drop columns silently

  • horizon_max (int, optional) – Maximum event-study horizon. If set, event study effects are only computed for |h| <= horizon_max.

  • aux_partition (str, default="cohort_horizon") – Controls the auxiliary model partition for Theorem 3 variance:
      - “cohort_horizon”: Groups by cohort x relative time (tightest SEs)
      - “cohort”: Groups by cohort only (more conservative)
      - “horizon”: Groups by relative time only (more conservative)
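The three bootstrap_weights options named above are standard multiplier distributions. The sketch below writes out their textbook definitions in plain Python and checks that each has mean 0 and variance 1. The helper names (`draw_cluster_weights`, `moment`) are hypothetical, and how diff_diff draws or applies the weights internally is not shown here.

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# Textbook (values, probabilities) for each multiplier distribution.
WEIGHT_DISTRIBUTIONS = {
    "rademacher": ([-1.0, 1.0], [0.5, 0.5]),
    "mammen": (
        [-(SQRT5 - 1) / 2, (SQRT5 + 1) / 2],
        [(SQRT5 + 1) / (2 * SQRT5), (SQRT5 - 1) / (2 * SQRT5)],
    ),
    "webb": (
        [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
         math.sqrt(0.5), 1.0, math.sqrt(1.5)],
        [1 / 6] * 6,
    ),
}


def draw_cluster_weights(kind, n_clusters, seed=None):
    """Draw one multiplier weight per cluster (hypothetical helper)."""
    values, probs = WEIGHT_DISTRIBUTIONS[kind]
    rng = random.Random(seed)
    return rng.choices(values, weights=probs, k=n_clusters)


def moment(kind, power):
    """Exact E[w**power] computed from the support and probabilities."""
    values, probs = WEIGHT_DISTRIBUTIONS[kind]
    return sum(p * v ** power for v, p in zip(values, probs))


# Each distribution should have first moment 0 and second moment 1.
for kind in WEIGHT_DISTRIBUTIONS:
    print(kind, round(moment(kind, 1), 12), round(moment(kind, 2), 12))
```

Rademacher weights take only two values, so with few clusters the number of distinct bootstrap draws is small; the six-point Webb distribution is commonly suggested for that situation.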

results_

Estimation results after calling fit().

Type:

ImputationDiDResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage:

>>> from diff_diff import ImputationDiD, generate_staggered_data
>>> data = generate_staggered_data(n_units=200, seed=42)
>>> est = ImputationDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='time', first_treat='first_treat')
>>> results.print_summary()

With event study:

>>> est = ImputationDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='time', first_treat='first_treat',
...                   aggregate='event_study')
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

Notes

The imputation estimator uses ALL untreated observations (never-treated + not-yet-treated periods of eventually-treated units) to estimate the counterfactual model. There is no control_group parameter because this is fundamental to the method’s efficiency.
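To make the control sample concrete, the sketch below marks which observations count as untreated: every period of a never-treated unit, plus the pre-treatment periods of eventually-treated units. The helper `is_untreated` is hypothetical, and the handling of `anticipation` (dropping that many periods before first_treat from the control sample) is an assumption about the convention, not a description of the library's internals.

```python
import math


def is_untreated(time, first_treat, anticipation=0):
    """Return True if the observation can serve as a control (hypothetical helper).

    Never-treated units are coded with first_treat equal to 0 or infinity.
    With anticipation=k, the k periods before first_treat are assumed to be
    excluded from the control sample as well.
    """
    if first_treat == 0 or math.isinf(first_treat):
        return True
    return time < first_treat - anticipation


# (unit, time, first_treat) triples for a three-unit, four-period panel.
panel = [(u, t, g) for u, g in [("a", 3), ("b", 0), ("c", math.inf)]
         for t in range(1, 5)]

controls = [(u, t) for u, t, g in panel if is_untreated(t, g)]
print(len(controls), "of", len(panel), "observations are untreated")
```

Unit "a" contributes its two pre-treatment periods even though it is eventually treated, which is the extra information a never-treated-only comparison would discard.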

References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies, 91(6), 3253-3285.

__init__(anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', horizon_max=None, aux_partition='cohort_horizon')
Parameters:
  • anticipation (int)

  • alpha (float)

  • cluster (str | None)

  • n_bootstrap (int)

  • bootstrap_weights (str)

  • seed (int | None)

  • rank_deficient_action (str)

  • horizon_max (int | None)

  • aux_partition (str)

Methods

__init__([anticipation, alpha, cluster, ...])

fit(data, outcome, unit, time, first_treat[, ...])

Fit the imputation DiD estimator.

get_params()

Get estimator parameters (sklearn-compatible).

print_summary()

Print summary to stdout.

set_params(**params)

Set estimator parameters (sklearn-compatible).

summary()

Get summary of estimation results.

fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None)

Fit the imputation DiD estimator.

Parameters:
  • data (pd.DataFrame) – Panel data with unit and time identifiers.

  • outcome (str) – Name of outcome variable column.

  • unit (str) – Name of unit identifier column.

  • time (str) – Name of time period column.

  • first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.

  • covariates (list of str, optional) – List of covariate column names.

  • aggregate (str, optional) – Aggregation mode: None/“simple” (overall ATT only), “event_study”, “group”, or “all”.

  • balance_e (int, optional) – When computing event study, restrict to cohorts observed at all relative times in [-balance_e, max_h].

Returns:

Object containing all estimation results.

Return type:

ImputationDiDResults

Raises:

ValueError – If required columns are missing or data validation fails.
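As a rough picture of what an event-study aggregation does, the sketch below averages observation-level effects by relative time h = time - first_treat and applies a |h| <= horizon_max window. The helper `event_study`, the equal weighting within each horizon, and the input numbers are assumptions for illustration; it covers post-treatment horizons only and does not reproduce the library's aggregation or inference.

```python
from collections import defaultdict


def event_study(effects, first_treat, horizon_max=None):
    """Average observation-level effects by relative time h = time - first_treat.

    `effects` maps (unit, time) -> estimated effect for treated observations.
    Hypothetical helper; equal weights within each horizon are an assumption.
    """
    by_horizon = defaultdict(list)
    for (unit, time), tau in effects.items():
        h = time - first_treat[unit]
        if horizon_max is not None and abs(h) > horizon_max:
            continue  # outside the requested event-study window
        by_horizon[h].append(tau)
    return {h: sum(v) / len(v) for h, v in sorted(by_horizon.items())}


# Made-up effects for two cohorts first treated in periods 2 and 3.
first_treat = {"a": 2, "b": 3}
effects = {
    ("a", 2): 1.0, ("a", 3): 2.0, ("a", 4): 3.0,
    ("b", 3): 3.0, ("b", 4): 4.0,
}
print(event_study(effects, first_treat, horizon_max=1))
```

Horizon 2 is observed only for cohort "a", so without a window its average would rest on a single cohort; restricting horizons, or balancing the set of cohorts across horizons, avoids mixing composition changes into the event-study path.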

get_params()

Get estimator parameters (sklearn-compatible).

Return type:

Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Return type:

ImputationDiD
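The scikit-learn parameter convention referred to above can be sketched with a stand-in class: get_params() returns the constructor arguments as a dict, and set_params() updates them and returns the estimator itself, so calls can be chained. `TinyEstimator` is hypothetical and only mirrors three of the documented parameter names; the error raised for an unknown parameter is an assumption.

```python
import inspect


class TinyEstimator:
    """Hypothetical stand-in illustrating the scikit-learn parameter convention."""

    def __init__(self, alpha=0.05, n_bootstrap=0, seed=None):
        self.alpha = alpha
        self.n_bootstrap = n_bootstrap
        self.seed = seed

    def get_params(self):
        # Report every constructor argument with its current value.
        names = inspect.signature(type(self).__init__).parameters
        return {name: getattr(self, name) for name in names if name != "self"}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self  # returning self allows chained calls


est = TinyEstimator().set_params(alpha=0.10, n_bootstrap=199)
print(est.get_params())
```

Returning the estimator from set_params() matches the documented return type and makes it easy to reconfigure an existing instance before refitting.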

summary()

Get summary of estimation results.

Return type:

str

print_summary()

Print summary to stdout.

Return type:

None