Utilities
Statistical utilities for parallel trends testing, robust standard errors, and bootstrap inference.
Parallel Trends Testing
check_parallel_trends
Test for parallel trends using pre-treatment data.
- diff_diff.check_parallel_trends(data, outcome, time, treatment_group, pre_periods=None)[source]
Perform a simple check for parallel trends assumption.
This computes the trend (slope) in the outcome variable for both treatment and control groups during pre-treatment periods.
- Parameters:
- Returns:
Dictionary with trend statistics and test results.
- Return type:
Example
from diff_diff import check_parallel_trends
result = check_parallel_trends(
data,
outcome='y',
unit='unit_id',
time='period',
treated='treated',
pre_periods=4
)
print(f"F-statistic: {result['f_stat']:.3f}")
print(f"p-value: {result['p_value']:.3f}")
if result['p_value'] > 0.05:
print("Cannot reject parallel trends")
else:
print("Evidence against parallel trends")
check_parallel_trends_robust
Robust parallel trends test using Wasserstein distance with permutation-based inference.
- diff_diff.check_parallel_trends_robust(data, outcome, time, treatment_group, unit=None, pre_periods=None, n_permutations=1000, seed=None, wasserstein_threshold=0.2)[source]
Perform robust parallel trends testing using distributional comparisons.
Uses the Wasserstein (Earth Mover’s) distance to compare the full distribution of outcome changes between treated and control groups, with permutation-based inference.
- Parameters:
data (pd.DataFrame) – Panel data with repeated observations over time.
outcome (str) – Name of outcome variable column.
time (str) – Name of time period column.
treatment_group (str) – Name of treatment group indicator column (0/1).
unit (str, optional) – Name of unit identifier column. If provided, computes unit-level changes. Otherwise uses observation-level data.
pre_periods (list, optional) – List of pre-treatment time periods. If None, uses first half of periods.
n_permutations (int, default=1000) – Number of permutations for computing p-value.
seed (int, optional) – Random seed for reproducibility.
wasserstein_threshold (float, default=0.2) – Threshold for normalized Wasserstein distance. Values below this threshold (combined with p > 0.05) suggest parallel trends are plausible.
- Returns:
Dictionary containing: - wasserstein_distance: Wasserstein distance between group distributions - wasserstein_p_value: Permutation-based p-value - ks_statistic: Kolmogorov-Smirnov test statistic - ks_p_value: KS test p-value - mean_difference: Difference in mean changes - variance_ratio: Ratio of variances in changes - treated_changes: Array of outcome changes for treated - control_changes: Array of outcome changes for control - parallel_trends_plausible: Boolean assessment
- Return type:
Examples
>>> results = check_parallel_trends_robust( ... data, outcome='sales', time='year', ... treatment_group='treated', unit='firm_id' ... ) >>> print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}") >>> print(f"P-value: {results['wasserstein_p_value']:.4f}")
Notes
The Wasserstein distance (Earth Mover’s Distance) measures the minimum “cost” of transforming one distribution into another. Unlike simple mean comparisons, it captures differences in the entire distribution shape, making it more robust to non-normal data and heterogeneous effects.
A small Wasserstein distance and high p-value suggest the distributions of pre-treatment changes are similar, supporting the parallel trends assumption.
equivalence_test_trends
Equivalence test for parallel trends (TOST procedure).
- diff_diff.equivalence_test_trends(data, outcome, time, treatment_group, unit=None, pre_periods=None, equivalence_margin=None)[source]
Perform equivalence testing (TOST) for parallel trends.
Tests whether the difference in trends is practically equivalent to zero using Two One-Sided Tests (TOST) procedure.
- Parameters:
data (pd.DataFrame) – Panel data.
outcome (str) – Name of outcome variable column.
time (str) – Name of time period column.
treatment_group (str) – Name of treatment group indicator column.
unit (str, optional) – Name of unit identifier column.
pre_periods (list, optional) – List of pre-treatment time periods.
equivalence_margin (float, optional) – The margin for equivalence (delta). If None, uses 0.5 * pooled SD of outcome changes as a default.
- Returns:
Dictionary containing: - mean_difference: Difference in mean changes - equivalence_margin: The margin used - lower_p_value: P-value for lower bound test - upper_p_value: P-value for upper bound test - tost_p_value: Maximum of the two p-values - equivalent: Boolean indicating equivalence at alpha=0.05
- Return type:
Example
from diff_diff import equivalence_test_trends
# Test if pre-trends are equivalent to zero within bounds
result = equivalence_test_trends(
data,
outcome='y',
unit='unit_id',
time='period',
treated='treated',
equivalence_bound=0.5 # Effect size bound
)
if result['equivalent']:
print("Pre-trends are practically equivalent to zero")
Wild Cluster Bootstrap
wild_bootstrap_se
Compute wild cluster bootstrap standard errors.
- diff_diff.wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index, n_bootstrap=999, weight_type='rademacher', null_hypothesis=0.0, alpha=0.05, seed=None, return_distribution=False)[source]
Compute wild cluster bootstrap standard errors and p-values.
Implements the Wild Cluster Residual (WCR) bootstrap procedure from Cameron, Gelbach, and Miller (2008). Uses the restricted residuals approach (imposing H0: coefficient = null_hypothesis) for more accurate p-value computation.
- Parameters:
X (np.ndarray) – Design matrix of shape (n, k).
y (np.ndarray) – Outcome vector of shape (n,).
residuals (np.ndarray) – OLS residuals from unrestricted regression, shape (n,).
cluster_ids (np.ndarray) – Cluster identifiers of shape (n,).
coefficient_index (int) – Index of the coefficient for which to compute bootstrap inference. For DiD, this is typically 3 (the treatment*post interaction term).
n_bootstrap (int, default=999) – Number of bootstrap replications. Odd numbers are recommended for exact p-value computation.
weight_type (str, default="rademacher") – Type of bootstrap weights: - “rademacher”: +1 or -1 with equal probability (standard choice) - “webb”: 6-point distribution (recommended for <10 clusters) - “mammen”: Two-point distribution with skewness correction
null_hypothesis (float, default=0.0) – Value of the null hypothesis for p-value computation.
alpha (float, default=0.05) – Significance level for confidence interval.
seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.
return_distribution (bool, default=False) – If True, include full bootstrap distribution in results.
- Returns:
Dataclass containing bootstrap SE, p-value, confidence interval, and other inference results.
- Return type:
- Raises:
ValueError – If weight_type is not recognized or if there are fewer than 2 clusters.
- Warns:
UserWarning – If the number of clusters is less than 5, as bootstrap inference may be unreliable.
Examples
>>> from diff_diff.utils import wild_bootstrap_se >>> results = wild_bootstrap_se( ... X, y, residuals, cluster_ids, ... coefficient_index=3, # ATT coefficient ... n_bootstrap=999, ... weight_type="rademacher", ... seed=42 ... ) >>> print(f"Bootstrap SE: {results.se:.4f}") >>> print(f"Bootstrap p-value: {results.p_value:.4f}")
References
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90(3), 414-427.
MacKinnon, J. G., & Webb, M. D. (2018). The wild bootstrap for few (treated) clusters. The Econometrics Journal, 21(2), 114-135.
Example
from diff_diff import DifferenceInDifferences, wild_bootstrap_se
# Fit model
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treated='treated', post='post')
# Bootstrap standard errors
boot_results = wild_bootstrap_se(
data,
outcome='y',
treated='treated',
post='post',
cluster='unit_id',
n_bootstrap=999,
weight_type='rademacher'
)
print(f"Bootstrap SE: {boot_results.se:.3f}")
print(f"Bootstrap 95% CI: [{boot_results.ci[0]:.3f}, {boot_results.ci[1]:.3f}]")
WildBootstrapResults
Container for wild bootstrap results.
- class diff_diff.WildBootstrapResults[source]
Bases:
objectResults from wild cluster bootstrap inference.
- bootstrap_distribution
Full bootstrap distribution of coefficients (if requested).
- Type:
np.ndarray, optional
References
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90(3), 414-427.
- __init__(se, p_value, t_stat_original, ci_lower, ci_upper, n_clusters, n_bootstrap, weight_type, alpha=0.05, bootstrap_distribution=None)
Weight Types
The wild bootstrap supports several weight distributions:
'rademacher': ±1 with equal probability (default, good general choice)'mammen': Two-point distribution matching higher moments'webb': Six-point distribution, better for few clusters
# Using different weight types
boot_rad = wild_bootstrap_se(data, ..., weight_type='rademacher')
boot_webb = wild_bootstrap_se(data, ..., weight_type='webb')
boot_mammen = wild_bootstrap_se(data, ..., weight_type='mammen')
Recommendation
Use
'rademacher'(default) for most casesUse
'webb'when you have fewer than 10 clustersThe
n_bootstrapshould typically be at least 999 for reliable inference