diff_diff.DifferenceInDifferences

class diff_diff.DifferenceInDifferences

Bases: object

Difference-in-Differences estimator with a scikit-learn-style interface.

Estimates the average treatment effect on the treated (ATT) using either the canonical 2x2 DiD design or panel data with two-way fixed effects.

Parameters:

formula (str, optional) – R-style formula for the model (e.g., “outcome ~ treated * post”), passed to fit() rather than to the constructor. If provided, it overrides the outcome, treatment, and time column-name parameters.
robust (bool, default=True) – Whether to use heteroskedasticity-robust standard errors (HC1).
cluster (str, optional) – Name of the column identifying clusters, used for cluster-robust standard errors.
alpha (float, default=0.05) – Significance level; confidence intervals have coverage 1 - alpha (95% by default).
inference (str, default="analytical") – Inference method: “analytical” for standard asymptotic inference, or “wild_bootstrap” for the wild cluster bootstrap (recommended when the number of clusters is small, e.g. fewer than 50).
n_bootstrap (int, default=999) – Number of bootstrap replications when inference="wild_bootstrap".
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher” (standard), “webb” (recommended with fewer than 10 clusters), or “mammen” (skewness correction).
seed (int, optional) – Random seed for reproducibility when using bootstrap inference. If None (default), results will vary between runs.
rank_deficient_action (str, default="warn") – Action to take when the design matrix is rank-deficient (has linearly dependent columns):
  - “warn”: issue a warning and drop the linearly dependent columns (default)
  - “error”: raise ValueError
  - “silent”: drop the columns without a warning
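The three bootstrap_weights options name standard weight distributions for the wild cluster bootstrap, in which every residual in a cluster is multiplied by the same random draw. The sketch below only illustrates those distributions in plain Python; the helper names (weight_distribution, draw_cluster_weights, moments) are hypothetical and this is not diff_diff's implementation. Each distribution has mean 0 and variance 1; Mammen's two-point version additionally has third moment 1, which is what the skewness correction refers to.

```python
import math
import random


# Hypothetical helper (not part of diff_diff): support points and
# probabilities of the weight distributions named by bootstrap_weights.
def weight_distribution(kind):
    if kind == "rademacher":
        # +1 or -1 with equal probability
        return [-1.0, 1.0], [0.5, 0.5]
    if kind == "webb":
        # six points, each with probability 1/6
        pts = [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
               math.sqrt(0.5), 1.0, math.sqrt(1.5)]
        return pts, [1 / 6] * 6
    if kind == "mammen":
        # two-point distribution with mean 0, variance 1, third moment 1
        s5 = math.sqrt(5)
        pts = [-(s5 - 1) / 2, (s5 + 1) / 2]
        p_low = (s5 + 1) / (2 * s5)
        return pts, [p_low, 1 - p_low]
    raise ValueError(f"unknown weight type: {kind!r}")


def draw_cluster_weights(kind, n_clusters, rng):
    """Draw one weight per cluster; every observation in a cluster shares it."""
    pts, probs = weight_distribution(kind)
    return rng.choices(pts, weights=probs, k=n_clusters)


def moments(kind):
    """Mean and variance of a weight distribution (expected: 0 and 1)."""
    pts, probs = weight_distribution(kind)
    mean = sum(p * x for x, p in zip(pts, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(pts, probs))
    return mean, var


rng = random.Random(0)  # fixed seed, analogous in spirit to the seed parameter
print(draw_cluster_weights("webb", 8, rng))
for kind in ("rademacher", "webb", "mammen"):
    print(kind, moments(kind))
```

With very few clusters, Rademacher weights can produce only 2^G distinct sign patterns for G clusters, which is one motivation for the six-point Webb distribution.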

Attributes:

results_ (DiDResults) – Estimation results, available after calling fit().
is_fitted_ (bool) – Whether the model has been fitted.

Examples

Basic usage with a DataFrame:

>>> import pandas as pd
>>> from diff_diff import DifferenceInDifferences
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'post': [0, 0, 1, 1, 0, 0, 1, 1]
... })
>>>
>>> # Fit the model
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>>
>>> # View results
>>> print(results.att)  # ATT estimate
>>> results.print_summary()  # Full summary table
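As a hand check of what results.att should report for this toy data, the four cell means can be combined with the DiD formula given in the Notes. This is plain Python rather than a call into diff_diff; assuming no covariates or fixed effects, the library's estimate should agree with it up to floating-point rounding.

```python
# Toy data from the example above, as plain Python lists.
outcome = [10, 11, 15, 18, 9, 10, 12, 13]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
post = [0, 0, 1, 1, 0, 0, 1, 1]


def cell_mean(d, t):
    """Mean outcome for treatment group d in period t."""
    ys = [y for y, di, ti in zip(outcome, treated, post) if di == d and ti == t]
    return sum(ys) / len(ys)


# (change for the treated group) minus (change for the control group)
att = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(att)  # prints 3.0
```

The treated group rises by 6 (10.5 to 16.5) while the control group rises by 3 (9.5 to 12.5), so 3 of the treated group's increase is attributed to treatment.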

Using formula interface:

>>> did = DifferenceInDifferences()
>>> results = did.fit(data, formula='outcome ~ treated * post')
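In standard R-style formula notation, `a * b` is shorthand for both main effects plus their interaction, `a + b + a:b`, so “outcome ~ treated * post” describes the same regressors as the regression in the Notes. The toy parser below is hypothetical and far narrower than a real formula parser (it understands only `+` and a single two-way `*`); it illustrates the expansion, not how diff_diff parses formulas.

```python
def expand_formula(formula):
    """Split 'y ~ a * b' into the response and the regressors it implies.

    Handles only '+' and two-way '*'; a * b expands to a + b + a:b.
    """
    lhs, rhs = (s.strip() for s in formula.split("~"))
    terms = []
    for part in (p.strip() for p in rhs.split("+")):
        if "*" in part:
            a, b = (s.strip() for s in part.split("*"))
            terms += [a, b, f"{a}:{b}"]  # main effects, then interaction
        else:
            terms.append(part)
    return lhs, terms


print(expand_formula("outcome ~ treated * post"))
# prints ('outcome', ['treated', 'post', 'treated:post'])
```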

Notes

The ATT is computed using the standard DiD formula:

ATT = (E[Y|D=1,T=1] - E[Y|D=1,T=0]) - (E[Y|D=0,T=1] - E[Y|D=0,T=0])

Or equivalently via OLS regression:

Y = α + β₁*D + β₂*T + β₃*(D×T) + ε

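where D is the treatment-group indicator, T is the post-period indicator, and β₃, the coefficient on the interaction D×T, is the ATT.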
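To see the equivalence numerically, the sketch below fits this regression to the toy data from the Examples section by solving the normal equations in plain Python (no diff_diff, NumPy, or pandas). Because the model is saturated in the four group-by-period cells, β₃ reproduces the difference in differences of cell means, 3.0, while α, β₁, and β₂ recover the control pre-period mean (9.5), the pre-period gap between groups (1.0), and the control group's change over time (3.0). The hand-rolled solver is for illustration only and is not how the library fits the model.

```python
# Same toy data as in the Examples section.
y = [10, 11, 15, 18, 9, 10, 12, 13]
D = [1, 1, 1, 1, 0, 0, 0, 0]
T = [0, 0, 1, 1, 0, 0, 1, 1]

# Design matrix rows: [1, D, T, D*T]
X = [[1.0, d, t, d * t] for d, t in zip(D, T)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Normal equations: (X'X) beta = X'y
k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
alpha, beta1, beta2, beta3 = solve(XtX, Xty)
print(round(beta3, 10))  # prints 3.0
```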

__init__(robust=True, cluster=None, alpha=0.05, inference='analytical', n_bootstrap=999, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn')

Parameters:

robust (bool)
cluster (str | None)
alpha (float)
inference (str)
n_bootstrap (int)
bootstrap_weights (str)
seed (int | None)
rank_deficient_action (str)

Methods

`__init__`([robust, cluster, alpha, ...])	Initialize the estimator.
`fit`(data[, outcome, treatment, time, ...])	Fit the Difference-in-Differences model.
`get_params`()	Get estimator parameters (sklearn-compatible).
`predict`(data)	Predict outcomes using fitted model.
`print_summary`()	Print summary to stdout.
`set_params`(**params)	Set estimator parameters (sklearn-compatible).
`summary`()	Get summary of estimation results.

fit(data, outcome=None, treatment=None, time=None, formula=None, covariates=None, fixed_effects=None, absorb=None)

Fit the Difference-in-Differences model.

Parameters:

data (pd.DataFrame) – DataFrame containing the outcome, treatment, and time variables.
outcome (str) – Name of the outcome variable column.
treatment (str) – Name of the treatment group indicator column (0/1).
time (str) – Name of the post-treatment period indicator column (0/1).
formula (str, optional) – R-style formula (e.g., “outcome ~ treated * post”). If provided, it overrides the outcome, treatment, and time parameters.
covariates (list, optional) – List of covariate column names to include as linear controls.
fixed_effects (list, optional) – List of categorical column names to include as fixed effects. Creates a dummy variable for each category, dropping the first level as the reference category. Use for low-dimensional fixed effects (e.g., industry, region).
absorb (list, optional) – List of categorical column names for high-dimensional fixed effects. Uses within-transformation (demeaning) instead of dummy variables. More efficient for large numbers of categories (e.g., firm, individual).
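The two options remove group-level variation in different ways. In the plain-Python sketch below, the hypothetical `dummy_columns` builds one indicator per category and drops the first level, as described for fixed_effects (which level counts as “first” is assumed here to be the first in sorted order), while the hypothetical `within_transform` subtracts group means, the demeaning that absorb describes. By the Frisch-Waugh-Lovell theorem, demeaning the outcome and every regressor by group and then running OLS gives the same slope coefficients as including the group dummies. The sketch handles a single grouping variable only; with several absorbed dimensions in unbalanced data, a single pass of demeaning is generally not exact. It does not claim to mirror diff_diff's implementation.

```python
from collections import defaultdict


def dummy_columns(categories):
    """One 0/1 column per level, dropping the first sorted level as the reference."""
    levels = sorted(set(categories))
    return {lvl: [1 if c == lvl else 0 for c in categories] for lvl in levels[1:]}


def within_transform(values, groups):
    """Subtract each group's mean from its observations (one-way demeaning)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


region = ["north", "south", "west", "north"]
print(dummy_columns(region))
# prints {'south': [0, 1, 0, 0], 'west': [0, 0, 1, 0]}

firm_id = ["a", "a", "b", "b"]
sales = [10.0, 15.0, 9.0, 12.0]
print(within_transform(sales, firm_id))
# prints [-2.5, 2.5, -1.5, 1.5]
```

The dummy approach adds one column per level, which becomes costly with thousands of firms or individuals; demeaning keeps the number of columns fixed, which is why it suits high-dimensional effects.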

Returns:

Object containing estimation results.

Return type:

DiDResults

Raises:

ValueError – If required parameters are missing or data validation fails.

Examples

Using fixed effects (dummy variables):

>>> did.fit(data, outcome='sales', treatment='treated', time='post',
...         fixed_effects=['state', 'industry'])

Using absorbed fixed effects (within-transformation):

>>> did.fit(data, outcome='sales', treatment='treated', time='post',
...         absorb=['firm_id'])

predict(data)

Predict outcomes using the fitted model.

Parameters: data (pd.DataFrame) – DataFrame with the same structure (columns) as the training data.
Returns: Predicted values.
Return type: np.ndarray
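For the basic specification without covariates or fixed effects, a prediction is the fitted regression from the Notes evaluated at each row's D and T. The function below is a hypothetical sketch of that calculation, not the library's predict; how predict handles covariates, fixed effects, or absorbed effects is not described here. For the saturated 2x2 model, the fitted values equal the four cell means.

```python
def predict_2x2(coefs, treated, post):
    """Fitted values from Y = alpha + b1*D + b2*T + b3*(D*T)."""
    alpha, b1, b2, b3 = coefs
    return [alpha + b1 * d + b2 * t + b3 * d * t for d, t in zip(treated, post)]


# Coefficients the saturated model gives for the toy data in the class
# Examples: alpha = 9.5, beta1 = 1.0, beta2 = 3.0, beta3 = 3.0.
coefs = (9.5, 1.0, 3.0, 3.0)
print(predict_2x2(coefs, treated=[1, 1, 0, 0], post=[0, 1, 0, 1]))
# prints [10.5, 16.5, 9.5, 12.5]
```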

get_params()

Get estimator parameters (sklearn-compatible).

Returns: Estimator parameters.
Return type: Dict[str, Any]

set_params(**params)

Set estimator parameters (sklearn-compatible).

Parameters: **params – Estimator parameters.
Return type: self
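get_params() and set_params() follow the scikit-learn convention: the parameters are the constructor arguments, and set_params() returns the estimator itself so that calls can be chained. The toy class below sketches that pattern using two of this class's parameter names; it is hypothetical and does not show how diff_diff stores or validates its parameters.

```python
class ToyEstimator:
    """Hypothetical estimator following the scikit-learn parameter convention."""

    def __init__(self, robust=True, alpha=0.05):
        self.robust = robust
        self.alpha = alpha

    def get_params(self):
        # Constructor arguments, keyed by name
        return {"robust": self.robust, "alpha": self.alpha}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter: {name}")
            setattr(self, name, value)
        return self  # returning self allows chained calls


est = ToyEstimator().set_params(alpha=0.10)
print(est.get_params())
# prints {'robust': True, 'alpha': 0.1}
```

Since the documented return type of set_params is self, the same chaining should work with this class, e.g. calling fit() directly on the result of DifferenceInDifferences().set_params(alpha=0.10).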

summary()

Get summary of estimation results.

Returns: Formatted summary.
Return type: str

print_summary()

Print summary to stdout.

Return type: None