diff_diff.ImputationDiD

class diff_diff.ImputationDiD[source]

Bases: ImputationDiDBootstrapMixin

Borusyak-Jaravel-Spiess (2024) imputation DiD estimator.

This is the efficient estimator for staggered Difference-in-Differences under parallel trends. It produces shorter confidence intervals than Callaway-Sant’Anna (~50% shorter) and Sun-Abraham (2-3.5x shorter) under homogeneous treatment effects.

The estimation procedure: 1. Run OLS on untreated observations to estimate unit + time fixed effects 2. Impute counterfactual Y(0) for treated observations 3. Aggregate imputed treatment effects with researcher-chosen weights

Inference uses the conservative clustered variance estimator from Theorem 3 of the paper.

Parameters:

anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default.
n_bootstrap (int, default=0) – Number of bootstrap iterations. If 0, uses analytical inference (conservative variance from Theorem 3).
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when design matrix is rank-deficient: - “warn”: Issue warning and drop linearly dependent columns - “error”: Raise ValueError - “silent”: Drop columns silently
horizon_max (int, optional) – Maximum event-study horizon. If set, event study effects are only computed for |h| <= horizon_max.
aux_partition (str, default="cohort_horizon") – Controls the auxiliary model partition for Theorem 3 variance: - “cohort_horizon”: Groups by cohort x relative time (tightest SEs) - “cohort”: Groups by cohort only (more conservative) - “horizon”: Groups by relative time only (more conservative)

results_

Estimation results after calling fit().

Type:: ImputationDiDResults

is_fitted_

Whether the model has been fitted.

Type:: bool

Examples

Basic usage:

>>> from diff_diff import ImputationDiD, generate_staggered_data
>>> data = generate_staggered_data(n_units=200, seed=42)
>>> est = ImputationDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='time', first_treat='first_treat')
>>> results.print_summary()

With event study:

>>> est = ImputationDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='time', first_treat='first_treat',
...                   aggregate='event_study')
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)

Notes

The imputation estimator uses ALL untreated observations (never-treated + not-yet-treated periods of eventually-treated units) to estimate the counterfactual model. There is no control_group parameter because this is fundamental to the method’s efficiency.

References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies, 91(6), 3253-3285.

__init__(anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', horizon_max=None, aux_partition='cohort_horizon')[source]

Parameters:

anticipation (int)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
rank_deficient_action (str)
horizon_max (int | None)
aux_partition (str)

Methods

`__init__`([anticipation, alpha, cluster, ...])
`fit`(data, outcome, unit, time, first_treat)	Fit the imputation DiD estimator.
`get_params`()	Get estimator parameters (sklearn-compatible).
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
`summary`()	Get summary of estimation results.

__init__(anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', horizon_max=None, aux_partition='cohort_horizon')[source]

Parameters:

anticipation (int)
alpha (float)
cluster (str | None)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
rank_deficient_action (str)
horizon_max (int | None)
aux_partition (str)

fit(data, outcome, unit, time, first_treat, covariates=None, aggregate=None, balance_e=None)[source]

Fit the imputation DiD estimator.

Parameters:

data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 (or np.inf) for never-treated units.
covariates (list of str, optional) – List of covariate column names.
aggregate (str, optional) – Aggregation mode: None/”simple” (overall ATT only), “event_study”, “group”, or “all”.
balance_e (int, optional) – When computing event study, restrict to cohorts observed at all relative times in [-balance_e, max_h].

Returns:

Object containing all estimation results.

Return type:

ImputationDiDResults

Raises:

ValueError – If required columns are missing or data validation fails.

get_params()[source]

Get estimator parameters (sklearn-compatible).

Return type:: Dict[str, Any]

set_params(**params)[source]

Set estimator parameters (sklearn-compatible).

Return type:: ImputationDiD

summary()[source]

Get summary of estimation results.

Return type:: str

print_summary()[source]

Print summary to stdout.

Return type:: None