Utilities#
Statistical utilities for parallel trends testing, robust standard errors, and bootstrap inference.
Parallel Trends Testing#
check_parallel_trends#
Test for parallel trends using pre-treatment data.
- diff_diff.check_parallel_trends(data, outcome, time, treatment_group, pre_periods=None)[source]#
Perform a simple check for parallel trends assumption.
This computes the trend (slope) in the outcome variable for both treatment and control groups during pre-treatment periods.
- Parameters:
- Returns:
Dictionary with trend statistics and test results.
- Return type:
Example#
from diff_diff import check_parallel_trends
result = check_parallel_trends(
data,
outcome='y',
time='period',
treatment_group='treated',
pre_periods=[0, 1, 2, 3]
)
print(f"t-statistic: {result['t_statistic']:.3f}")
print(f"p-value: {result['p_value']:.3f}")
if result['p_value'] > 0.05:
print("Cannot reject parallel trends")
else:
print("Evidence against parallel trends")
check_parallel_trends_robust#
Robust parallel trends test using Wasserstein distance with permutation-based inference.
- diff_diff.check_parallel_trends_robust(data, outcome, time, treatment_group, unit=None, pre_periods=None, n_permutations=1000, seed=None, wasserstein_threshold=0.2)[source]#
Perform robust parallel trends testing using distributional comparisons.
Uses the Wasserstein (Earth Mover’s) distance to compare the full distribution of outcome changes between treated and control groups, with permutation-based inference.
- Parameters:
data (pd.DataFrame) – Panel data with repeated observations over time.
outcome (str) – Name of outcome variable column.
time (str) – Name of time period column.
treatment_group (str) – Name of treatment group indicator column (0/1).
unit (str, optional) – Name of unit identifier column. If provided, computes unit-level changes. Otherwise uses observation-level data.
pre_periods (list, optional) – List of pre-treatment time periods. If None, uses first half of periods.
n_permutations (int, default=1000) – Number of permutations for computing p-value.
seed (int, optional) – Random seed for reproducibility.
wasserstein_threshold (float, default=0.2) – Threshold for normalized Wasserstein distance. Values below this threshold (combined with p > 0.05) suggest parallel trends are plausible.
- Returns:
Dictionary containing: - wasserstein_distance: Wasserstein distance between group distributions - wasserstein_p_value: Permutation-based p-value - ks_statistic: Kolmogorov-Smirnov test statistic - ks_p_value: KS test p-value - mean_difference: Difference in mean changes - variance_ratio: Ratio of variances in changes - treated_changes: Array of outcome changes for treated - control_changes: Array of outcome changes for control - parallel_trends_plausible: Boolean assessment
- Return type:
Examples
>>> results = check_parallel_trends_robust( ... data, outcome='sales', time='year', ... treatment_group='treated', unit='firm_id' ... ) >>> print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}") >>> print(f"P-value: {results['wasserstein_p_value']:.4f}")
Notes
The Wasserstein distance (Earth Mover’s Distance) measures the minimum “cost” of transforming one distribution into another. Unlike simple mean comparisons, it captures differences in the entire distribution shape, making it more robust to non-normal data and heterogeneous effects.
A small Wasserstein distance and high p-value suggest the distributions of pre-treatment changes are similar, supporting the parallel trends assumption.
equivalence_test_trends#
Equivalence test for parallel trends (TOST procedure).
- diff_diff.equivalence_test_trends(data, outcome, time, treatment_group, unit=None, pre_periods=None, equivalence_margin=None)[source]#
Perform equivalence testing (TOST) for parallel trends.
Tests whether the difference in trends is practically equivalent to zero using Two One-Sided Tests (TOST) procedure.
- Parameters:
data (pd.DataFrame) – Panel data.
outcome (str) – Name of outcome variable column.
time (str) – Name of time period column.
treatment_group (str) – Name of treatment group indicator column.
unit (str, optional) – Name of unit identifier column.
pre_periods (list, optional) – List of pre-treatment time periods.
equivalence_margin (float, optional) – The margin for equivalence (delta). If None, uses 0.5 * pooled SD of outcome changes as a default.
- Returns:
Dictionary containing: - mean_difference: Difference in mean changes - equivalence_margin: The margin used - lower_p_value: P-value for lower bound test - upper_p_value: P-value for upper bound test - tost_p_value: Maximum of the two p-values - equivalent: Boolean indicating equivalence at alpha=0.05
- Return type:
Example#
from diff_diff import equivalence_test_trends
# Test if pre-trends are equivalent to zero within bounds
result = equivalence_test_trends(
data,
outcome='y',
time='period',
treatment_group='treated',
equivalence_margin=0.5 # Effect size bound
)
if result['equivalent']:
print("Pre-trends are practically equivalent to zero")
Wild Cluster Bootstrap#
wild_bootstrap_se#
Compute wild cluster bootstrap standard errors.
- diff_diff.wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index, n_bootstrap=999, weight_type='rademacher', null_hypothesis=0.0, alpha=0.05, seed=None, return_distribution=False)[source]#
Compute wild cluster bootstrap standard errors and p-values.
Implements the Wild Cluster Residual (WCR) bootstrap procedure from Cameron, Gelbach, and Miller (2008). Uses the restricted residuals approach (imposing H0: coefficient = null_hypothesis) for more accurate p-value computation.
- Parameters:
X (np.ndarray) – Design matrix of shape (n, k).
y (np.ndarray) – Outcome vector of shape (n,).
residuals (np.ndarray) – OLS residuals from unrestricted regression, shape (n,).
cluster_ids (np.ndarray) – Cluster identifiers of shape (n,).
coefficient_index (int) – Index of the coefficient for which to compute bootstrap inference. For DiD, this is typically 3 (the treatment*post interaction term).
n_bootstrap (int, default=999) – Number of bootstrap replications. Odd numbers are recommended for exact p-value computation.
weight_type (str, default="rademacher") – Type of bootstrap weights: - “rademacher”: +1 or -1 with equal probability (standard choice) - “webb”: 6-point distribution (recommended for <10 clusters) - “mammen”: Two-point distribution with skewness correction
null_hypothesis (float, default=0.0) – Value of the null hypothesis for p-value computation.
alpha (float, default=0.05) – Significance level for confidence interval.
seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.
return_distribution (bool, default=False) – If True, include full bootstrap distribution in results.
- Returns:
Dataclass containing bootstrap SE, p-value, confidence interval, and other inference results.
- Return type:
- Raises:
ValueError – If weight_type is not recognized or if there are fewer than 2 clusters.
- Warns:
UserWarning – If the number of clusters is less than 5, as bootstrap inference may be unreliable.
Examples
>>> from diff_diff.utils import wild_bootstrap_se >>> results = wild_bootstrap_se( ... X, y, residuals, cluster_ids, ... coefficient_index=3, # ATT coefficient ... n_bootstrap=999, ... weight_type="rademacher", ... seed=42 ... ) >>> print(f"Bootstrap SE: {results.se:.4f}") >>> print(f"Bootstrap p-value: {results.p_value:.4f}")
References
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90(3), 414-427.
MacKinnon, J. G., & Webb, M. D. (2018). The wild bootstrap for few (treated) clusters. The Econometrics Journal, 21(2), 114-135.
Example#
from diff_diff import DifferenceInDifferences, generate_did_data
panel = generate_did_data(n_units=200, n_periods=10, treatment_effect=2.0)
# Use wild bootstrap via the estimator's inference parameter (recommended)
did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=999,
cluster='unit')
results = did.fit(panel, outcome='outcome', treatment='treated',
time='post')
print(f"Bootstrap SE: {results.se:.3f}")
print(f"Bootstrap 95% CI: [{results.conf_int[0]:.3f}, {results.conf_int[1]:.3f}]")
Note
wild_bootstrap_se() is a low-level function that operates on numpy arrays
(X, y, residuals, cluster_ids). For most users, the estimator-level
inference='wild_bootstrap' parameter shown above is more convenient.
WildBootstrapResults#
Container for wild bootstrap results.
- class diff_diff.WildBootstrapResults[source]
Bases:
objectResults from wild cluster bootstrap inference.
- se
Bootstrap standard error of the coefficient.
- Type:
- p_value
Bootstrap p-value (two-sided).
- Type:
- t_stat_original
Original t-statistic from the data.
- Type:
- ci_lower
Lower bound of the confidence interval.
- Type:
- ci_upper
Upper bound of the confidence interval.
- Type:
- n_clusters
Number of clusters in the data.
- Type:
- n_bootstrap
Number of bootstrap replications.
- Type:
- weight_type
Type of bootstrap weights used (“rademacher”, “webb”, or “mammen”).
- Type:
- alpha
Significance level used for confidence interval.
- Type:
- bootstrap_distribution
Full bootstrap distribution of coefficients (if requested).
- Type:
np.ndarray, optional
References
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90(3), 414-427.
- se: float
- p_value: float
- t_stat_original: float
- ci_lower: float
- ci_upper: float
- n_clusters: int
- n_bootstrap: int
- weight_type: str
- alpha: float = 0.05
- print_summary()[source]
Print formatted summary to stdout.
- Return type:
None
- __init__(se, p_value, t_stat_original, ci_lower, ci_upper, n_clusters, n_bootstrap, weight_type, alpha=0.05, bootstrap_distribution=None)
Weight Types#
The wild bootstrap supports several weight distributions:
'rademacher': ±1 with equal probability (default, good general choice)'mammen': Two-point distribution matching higher moments'webb': Six-point distribution, better for few clusters
# Using different weight types (low-level array API)
# wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index, ...)
boot_rad = wild_bootstrap_se(X, y, resid, clusters, 0, weight_type='rademacher')
boot_webb = wild_bootstrap_se(X, y, resid, clusters, 0, weight_type='webb')
boot_mammen = wild_bootstrap_se(X, y, resid, clusters, 0, weight_type='mammen')
Recommendation#
Use
'rademacher'(default) for most casesUse
'webb'when you have fewer than 10 clustersThe
n_bootstrapshould typically be at least 999 for reliable inference