diff_diff.DifferenceInDifferences
- class diff_diff.DifferenceInDifferences
Bases: object
Difference-in-Differences estimator with sklearn-like interface.
Estimates the Average Treatment effect on the Treated (ATT) using the canonical 2x2 DiD design or panel data with two-way fixed effects.
- Parameters:
formula (str, optional) – R-style formula for the model (e.g., "outcome ~ treated * post"). If provided, overrides column name parameters.
robust (bool, default=True) – Whether to use heteroskedasticity-robust standard errors (HC1).
cluster (str, optional) – Column name for cluster-robust standard errors.
alpha (float, default=0.05) – Significance level for confidence intervals.
inference (str, default="analytical") – Inference method: "analytical" for standard asymptotic inference, or "wild_bootstrap" for wild cluster bootstrap (recommended when the number of clusters is small, i.e. fewer than 50).
n_bootstrap (int, default=999) – Number of bootstrap replications when inference="wild_bootstrap".
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: "rademacher" (standard), "webb" (recommended for fewer than 10 clusters), or "mammen" (skewness correction).
seed (int, optional) – Random seed for reproducibility when using bootstrap inference. If None (default), results will vary between runs.
rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient (linearly dependent columns):
- "warn": Issue a warning and drop linearly dependent columns (default)
- "error": Raise ValueError
- "silent": Drop columns silently without warning
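The three bootstrap_weights options name standard mean-zero, unit-variance distributions from the wild bootstrap literature. The sketch below draws one weight per cluster from each of them using the textbook definitions. It is an illustration only: the helper names weight_support and draw_bootstrap_weights are hypothetical, and this is not taken from the library's own implementation.

```python
import numpy as np


def weight_support(kind):
    """Return (values, probabilities) for a wild bootstrap weight distribution.

    Textbook definitions; not taken from the diff_diff source code.
    """
    if kind == "rademacher":
        # Two points, -1 and +1, each with probability 1/2.
        return np.array([-1.0, 1.0]), np.array([0.5, 0.5])
    if kind == "webb":
        # Six points: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2), each with probability 1/6.
        points = np.sqrt([0.5, 1.0, 1.5])
        return np.concatenate([-points, points]), np.full(6, 1.0 / 6.0)
    if kind == "mammen":
        # Two asymmetric points chosen so that mean = 0, variance = 1, third moment = 1.
        s5 = np.sqrt(5.0)
        values = np.array([-(s5 - 1) / 2, (s5 + 1) / 2])
        probs = np.array([(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)])
        return values, probs
    raise ValueError(f"unknown weight type: {kind!r}")


def draw_bootstrap_weights(kind, n_clusters, rng):
    """Draw one weight per cluster (hypothetical helper for illustration)."""
    values, probs = weight_support(kind)
    return rng.choice(values, size=n_clusters, p=probs)


rng = np.random.default_rng(42)  # fixed seed so repeated runs give the same draws
for kind in ("rademacher", "webb", "mammen"):
    weights = draw_bootstrap_weights(kind, n_clusters=8, rng=rng)
    print(kind, np.round(weights, 3))
```

With very few clusters, Rademacher weights can produce only 2^G distinct sign patterns for G clusters, which is one reason a six-point distribution such as Webb's is often suggested in that setting.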
- results_
Estimation results after calling fit().
- Type:
Examples
Basic usage with a DataFrame:
>>> import pandas as pd
>>> from diff_diff import DifferenceInDifferences
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'post': [0, 0, 1, 1, 0, 0, 1, 1]
... })
>>>
>>> # Fit the model
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, outcome='outcome', treatment='treated', time='post')
>>>
>>> # View results
>>> print(results.att)  # ATT estimate
>>> results.print_summary()  # Full summary table
Using formula interface:
>>> did = DifferenceInDifferences()
>>> results = did.fit(data, formula='outcome ~ treated * post')
Notes
The ATT is computed using the standard DiD formula:
ATT = (E[Y|D=1,T=1] - E[Y|D=1,T=0]) - (E[Y|D=0,T=1] - E[Y|D=0,T=0])
Or equivalently via OLS regression:
Y = α + β₁*D + β₂*T + β₃*(D×T) + ε
Where β₃ is the ATT.
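The equivalence between the four-means formula and the interaction coefficient can be checked by hand. The sketch below uses only pandas and NumPy (not the diff_diff estimator itself) on the sample data from the Examples section; the variable names att_means and att_ols are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the basic usage example.
data = pd.DataFrame({
    "outcome": [10, 11, 15, 18, 9, 10, 12, 13],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "post": [0, 0, 1, 1, 0, 0, 1, 1],
})

# Route 1: difference of differences in the four cell means E[Y | D, T].
means = data.groupby(["treated", "post"])["outcome"].mean()
att_means = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (
    means.loc[(0, 1)] - means.loc[(0, 0)]
)

# Route 2: OLS of Y on an intercept, D, T and the interaction D*T.
D = data["treated"].to_numpy(dtype=float)
T = data["post"].to_numpy(dtype=float)
X = np.column_stack([np.ones(len(data)), D, T, D * T])
y = data["outcome"].to_numpy(dtype=float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
att_ols = beta[3]  # coefficient on D*T

print(att_means, att_ols)
```

For this sample the treated group moves from a mean of 10.5 to 16.5 and the control group from 9.5 to 12.5, so both routes give an ATT of 3 (up to floating-point rounding in the OLS solve).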
- __init__(robust=True, cluster=None, alpha=0.05, inference='analytical', n_bootstrap=999, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn')
Methods
__init__([robust, cluster, alpha, ...])
fit(data[, outcome, treatment, time, ...]) – Fit the Difference-in-Differences model.
get_params() – Get estimator parameters (sklearn-compatible).
predict(data) – Predict outcomes using fitted model.
print_summary() – Print summary to stdout.
set_params(**params) – Set estimator parameters (sklearn-compatible).
summary() – Get summary of estimation results.
- fit(data, outcome=None, treatment=None, time=None, formula=None, covariates=None, fixed_effects=None, absorb=None)
Fit the Difference-in-Differences model.
- Parameters:
data (pd.DataFrame) – DataFrame containing the outcome, treatment, and time variables.
outcome (str) – Name of the outcome variable column.
treatment (str) – Name of the treatment group indicator column (0/1).
time (str) – Name of the post-treatment period indicator column (0/1).
formula (str, optional) – R-style formula (e.g., "outcome ~ treated * post"). If provided, overrides outcome, treatment, and time parameters.
covariates (list, optional) – List of covariate column names to include as linear controls.
fixed_effects (list, optional) – List of categorical column names to include as fixed effects. Creates dummy variables for each category (drops first level). Use for low-dimensional fixed effects (e.g., industry, region).
absorb (list, optional) – List of categorical column names for high-dimensional fixed effects. Uses within-transformation (demeaning) instead of dummy variables. More efficient for large numbers of categories (e.g., firm, individual).
- Returns:
Object containing estimation results.
- Return type:
- Raises:
ValueError – If required parameters are missing or data validation fails.
Examples
Using fixed effects (dummy variables):
>>> did.fit(data, outcome='sales', treatment='treated', time='post',
...         fixed_effects=['state', 'industry'])
Using absorbed fixed effects (within-transformation):
>>> did.fit(data, outcome='sales', treatment='treated', time='post',
...         absorb=['firm_id'])
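The difference between fixed_effects (dummy variables) and absorb (within-transformation) is computational rather than statistical: for a single categorical factor, both give the same slope on the remaining regressors. The sketch below checks this on simulated data with plain NumPy and pandas. The data, column names and variable names are made up for illustration, and how the library handles several absorbed factors at once is not shown here.

```python
import numpy as np
import pandas as pd

# Simulated panel: 20 firms, 6 periods, true slope of 2.0 on x.
rng = np.random.default_rng(0)
n_firms, n_periods = 20, 6
firm = np.repeat(np.arange(n_firms), n_periods)
x = rng.normal(size=firm.size)
firm_effect = rng.normal(size=n_firms)[firm]
y = 2.0 * x + firm_effect + rng.normal(scale=0.1, size=firm.size)
df = pd.DataFrame({"firm_id": firm, "x": x, "y": y})

# Approach 1: one dummy per firm (first level dropped) plus an intercept.
dummies = pd.get_dummies(df["firm_id"], drop_first=True).to_numpy(dtype=float)
X_dummy = np.column_stack([np.ones(len(df)), df["x"].to_numpy(), dummies])
coefs, *_ = np.linalg.lstsq(X_dummy, df["y"].to_numpy(), rcond=None)
beta_dummy = coefs[1]  # slope on x

# Approach 2: subtract firm-level means from y and x, then regress without dummies.
x_within = (df["x"] - df.groupby("firm_id")["x"].transform("mean")).to_numpy()
y_within = (df["y"] - df.groupby("firm_id")["y"].transform("mean")).to_numpy()
beta_within = (x_within @ y_within) / (x_within @ x_within)

print(beta_dummy, beta_within)
```

The dummy approach builds a design matrix with one column per category, while the demeaned regression keeps the matrix at its original width, which is why absorbing is preferable when the factor has thousands of levels.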
- predict(data)
Predict outcomes using fitted model.
- Parameters:
data (pd.DataFrame) – DataFrame with same structure as training data.
- Returns:
Predicted values.
- Return type:
np.ndarray
- get_params()
Get estimator parameters (sklearn-compatible).
- Returns:
Estimator parameters.
- Return type:
Dict[str, Any]