diff_diff.DifferenceInDifferences

class diff_diff.DifferenceInDifferences[source]

Bases: object

Difference-in-Differences estimator with sklearn-like interface.

Estimates the Average Treatment Effect on the Treated (ATT) using either the canonical 2x2 DiD design or panel data with two-way fixed effects.

Parameters:
  • formula (str, optional) – R-style formula for the model (e.g., "outcome ~ treated * post"). Accepted by fit(), not by the constructor. If provided, overrides the column name parameters.

  • robust (bool, default=True) – Whether to use heteroskedasticity-robust standard errors (HC1).

  • cluster (str, optional) – Column name for cluster-robust standard errors.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • inference (str, default="analytical") – Inference method: "analytical" for standard asymptotic inference, or "wild_bootstrap" for wild cluster bootstrap (recommended when the number of clusters is small, i.e. fewer than 50).

  • n_bootstrap (int, default=999) – Number of bootstrap replications when inference="wild_bootstrap".

  • bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: "rademacher" (standard), "webb" (recommended for fewer than 10 clusters), or "mammen" (skewness correction).

  • seed (int, optional) – Random seed for reproducibility when using bootstrap inference. If None (default), results will vary between runs.

  • rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):

      - "warn": Issue a warning and drop linearly dependent columns (default)
      - "error": Raise ValueError
      - "silent": Drop columns silently without warning
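
The three bootstrap_weights options name standard wild-bootstrap weight distributions. The sketch below writes out their textbook definitions in plain Python to show what distinguishes them. It is not the library's implementation, and WEIGHT_SCHEMES, draw_weights, and moment are hypothetical names introduced only for this illustration.

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# Textbook definitions: (support points, probabilities) for each scheme.
# All three have mean 0 and variance 1.
WEIGHT_SCHEMES = {
    # Two-point distribution: -1 or +1 with equal probability.
    "rademacher": ([-1.0, 1.0], [0.5, 0.5]),
    # Mammen's two-point distribution, which also has third moment 1.
    "mammen": (
        [-(SQRT5 - 1) / 2, (SQRT5 + 1) / 2],
        [(SQRT5 + 1) / (2 * SQRT5), (SQRT5 - 1) / (2 * SQRT5)],
    ),
    # Webb's six-point distribution, each point with probability 1/6.
    "webb": (
        [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
         math.sqrt(0.5), 1.0, math.sqrt(1.5)],
        [1 / 6] * 6,
    ),
}


def draw_weights(kind, n_clusters, seed=None):
    """Draw one weight per cluster (hypothetical helper, for illustration)."""
    points, probs = WEIGHT_SCHEMES[kind]
    rng = random.Random(seed)
    return rng.choices(points, weights=probs, k=n_clusters)


def moment(kind, order):
    """Exact raw moment of the chosen weight distribution."""
    points, probs = WEIGHT_SCHEMES[kind]
    return sum(p * x ** order for x, p in zip(points, probs))


for kind in WEIGHT_SCHEMES:
    print(kind, round(moment(kind, 1), 12), round(moment(kind, 2), 12))
```

With G clusters, Rademacher weights can produce at most 2^G distinct sign patterns, so the bootstrap distribution becomes coarse when G is very small. The six-point Webb distribution allows 6^G patterns, which is the usual motivation for preferring it with only a handful of clusters.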

results_

Estimation results after calling fit().

Type:

DiDResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage with a DataFrame:

>>> import pandas as pd
>>> from diff_diff import DifferenceInDifferences
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'post': [0, 0, 1, 1, 0, 0, 1, 1]
... })
>>>
>>> # Fit the model
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>>
>>> # View results
>>> print(results.att)  # ATT estimate
>>> results.print_summary()  # Full summary table

Using formula interface:

>>> did = DifferenceInDifferences()
>>> results = did.fit(data, formula='outcome ~ treated * post')

Notes

The ATT is computed using the standard DiD formula:

ATT = (E[Y|D=1,T=1] - E[Y|D=1,T=0]) - (E[Y|D=0,T=1] - E[Y|D=0,T=0])

Or equivalently via OLS regression:

Y = α + β₁*D + β₂*T + β₃*(D×T) + ε

Where β₃ is the ATT.
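
The equivalence of the two formulas can be checked by hand on the sample data from the Examples section. The following is a minimal sketch that uses only NumPy and does not call diff_diff. The helper cell_mean is a name introduced here for illustration.

```python
import numpy as np

# Sample data from the Examples section above.
outcome = np.array([10, 11, 15, 18, 9, 10, 12, 13], dtype=float)
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
post = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)


def cell_mean(d, t):
    """Mean outcome in the cell with treatment status d and period t."""
    return outcome[(treated == d) & (post == t)].mean()


# Difference of differences in the four cell means.
att_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS on an intercept, D, T, and the interaction D*T.
X = np.column_stack([np.ones_like(outcome), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
att_ols = beta[3]  # coefficient on the interaction term

print(att_means, att_ols)
```

Here the treated group rises from a mean of 10.5 to 16.5 and the control group from 9.5 to 12.5, so both calculations give an ATT of 3 (up to floating-point rounding in the OLS solution).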

__init__(robust=True, cluster=None, alpha=0.05, inference='analytical', n_bootstrap=999, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn')[source]
Parameters:
  • robust (bool)

  • cluster (str | None)

  • alpha (float)

  • inference (str)

  • n_bootstrap (int)

  • bootstrap_weights (str)

  • seed (int | None)

  • rank_deficient_action (str)

Methods

__init__([robust, cluster, alpha, ...])

fit(data[, outcome, treatment, time, ...])

Fit the Difference-in-Differences model.

get_params()

Get estimator parameters (sklearn-compatible).

predict(data)

Predict outcomes using fitted model.

print_summary()

Print summary to stdout.

set_params(**params)

Set estimator parameters (sklearn-compatible).

summary()

Get summary of estimation results.

fit(data, outcome=None, treatment=None, time=None, formula=None, covariates=None, fixed_effects=None, absorb=None)[source]

Fit the Difference-in-Differences model.

Parameters:
  • data (pd.DataFrame) – DataFrame containing the outcome, treatment, and time variables.

  • outcome (str) – Name of the outcome variable column.

  • treatment (str) – Name of the treatment group indicator column (0/1).

  • time (str) – Name of the post-treatment period indicator column (0/1).

  • formula (str, optional) – R-style formula (e.g., "outcome ~ treated * post"). If provided, overrides outcome, treatment, and time parameters.

  • covariates (list, optional) – List of covariate column names to include as linear controls.

  • fixed_effects (list, optional) – List of categorical column names to include as fixed effects. Creates dummy variables for each category (drops first level). Use for low-dimensional fixed effects (e.g., industry, region).

  • absorb (list, optional) – List of categorical column names for high-dimensional fixed effects. Uses within-transformation (demeaning) instead of dummy variables. More efficient for large numbers of categories (e.g., firm, individual).

Returns:

Object containing estimation results.

Return type:

DiDResults

Raises:

ValueError – If required parameters are missing or data validation fails.

Examples

Using fixed effects (dummy variables):

>>> did.fit(data, outcome='sales', treatment='treated', time='post',
...         fixed_effects=['state', 'industry'])

Using absorbed fixed effects (within-transformation):

>>> did.fit(data, outcome='sales', treatment='treated', time='post',
...         absorb=['firm_id'])
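
The descriptions of fixed_effects (dummy variables) and absorb (within-transformation) name two routes to the same slope estimates. The sketch below illustrates that with NumPy on synthetic data. It is not the library's implementation, and the variable names and the demean helper are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_obs = 5, 8
firm = np.repeat(np.arange(n_firms), n_obs)

# Synthetic regressor and outcome with a firm-specific level shift.
x = rng.normal(size=firm.size) + firm
y = 2.0 * x + 3.0 * firm + rng.normal(scale=0.1, size=firm.size)

# (a) Dummy-variable approach: intercept plus one dummy per firm,
#     dropping the first level to avoid collinearity with the intercept.
dummies = (firm[:, None] == np.arange(1, n_firms)).astype(float)
X_dummy = np.column_stack([np.ones(firm.size), x, dummies])
beta_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0]


# (b) Within-transformation: subtract each firm's mean, then regress.
def demean(values, groups):
    """Subtract the group mean from every observation."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / counts)[groups]


x_within = demean(x, firm)
y_within = demean(y, firm)
beta_within = (x_within @ y_within) / (x_within @ x_within)

# The slope on x agrees between the two approaches.
print(beta_dummy[1], beta_within)
```

The dummy approach builds one column per category, while demeaning keeps the design matrix at its original width. That is why the within-transformation scales better when there are many categories, such as firms or individuals.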

predict(data)[source]

Predict outcomes using fitted model.

Parameters:

data (pd.DataFrame) – DataFrame with the same structure as the training data.

Returns:

Predicted values.

Return type:

np.ndarray

get_params()[source]

Get estimator parameters (sklearn-compatible).

Returns:

Estimator parameters.

Return type:

Dict[str, Any]

set_params(**params)[source]

Set estimator parameters (sklearn-compatible).

Parameters:

**params – Estimator parameters.

Return type:

self
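
Because set_params returns self, calls can be chained in the usual sklearn style. The sketch below shows that convention with a hypothetical stand-in class, TinyEstimator, which is not diff_diff code. Raising ValueError for an unknown parameter name mirrors sklearn's behaviour and is an assumption about this library.

```python
class TinyEstimator:
    """Hypothetical stand-in that follows the sklearn parameter convention."""

    def __init__(self, robust=True, alpha=0.05):
        self.robust = robust
        self.alpha = alpha

    def get_params(self):
        # Return constructor parameters as a plain dict.
        return {"robust": self.robust, "alpha": self.alpha}

    def set_params(self, **params):
        # Update known parameters in place and reject unknown names.
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chaining


est = TinyEstimator().set_params(alpha=0.10)
print(est.get_params())
```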

summary()[source]

Get summary of estimation results.

Returns:

Formatted summary.

Return type:

str

print_summary()[source]

Print summary to stdout.

Return type:

None