Utilities

Statistical utilities for parallel trends testing, robust standard errors, and bootstrap inference.

Wild Cluster Bootstrap

wild_bootstrap_se

Compute wild cluster bootstrap standard errors.

diff_diff.wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index, n_bootstrap=999, weight_type='rademacher', null_hypothesis=0.0, alpha=0.05, seed=None, return_distribution=False)

Compute wild cluster bootstrap standard errors and p-values.

Implements the Wild Cluster Restricted (WCR) bootstrap procedure from Cameron, Gelbach, and Miller (2008). It uses the restricted residuals approach (imposing H0: coefficient = null_hypothesis when generating bootstrap samples), which gives more accurate p-values than bootstrapping the unrestricted residuals.
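To make the procedure concrete, here is a minimal NumPy sketch of a wild cluster restricted bootstrap test using Rademacher weights. It is not diff_diff's internal implementation: the helper names `cluster_robust_se` and `wcr_bootstrap_pvalue` are hypothetical, and the sketch assumes CR1 cluster-robust standard errors and a symmetric two-sided p-value. The steps are: fit the unrestricted model, refit with the tested coefficient fixed at the null value, then repeatedly flip the restricted residuals cluster by cluster, refit, and compare the bootstrap t-statistics with the original one.

```python
import numpy as np


def cluster_robust_se(X, resid, cluster_ids, k):
    """CR1 cluster-robust standard error of coefficient k (sandwich estimator)."""
    n, p = X.shape
    clusters = np.unique(cluster_ids)
    n_clusters = len(clusters)
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((p, p))
    for g in clusters:
        mask = cluster_ids == g
        score = X[mask].T @ resid[mask]  # cluster score vector, shape (p,)
        meat += np.outer(score, score)
    # CR1 small-sample correction (an assumption of this sketch)
    scale = n_clusters / (n_clusters - 1) * (n - 1) / (n - p)
    vcov = scale * bread @ meat @ bread
    return np.sqrt(vcov[k, k])


def wcr_bootstrap_pvalue(X, y, cluster_ids, k, null=0.0, n_boot=999, seed=None):
    """Hypothetical WCR bootstrap with Rademacher weights.

    Returns the original t-statistic and a symmetric two-sided p-value.
    """
    rng = np.random.default_rng(seed)

    # 1. Unrestricted OLS and the original t-statistic for H0: beta_k = null
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    t_orig = (beta_hat[k] - null) / cluster_robust_se(X, resid, cluster_ids, k)

    # 2. Restricted OLS that imposes beta_k = null
    X_r = np.delete(X, k, axis=1)
    gamma, *_ = np.linalg.lstsq(X_r, y - null * X[:, k], rcond=None)
    fitted_r = X_r @ gamma + null * X[:, k]
    resid_r = y - fitted_r

    # Map each observation to the position of its cluster
    clusters, obs_cluster = np.unique(cluster_ids, return_inverse=True)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # 3. One Rademacher weight per cluster, shared by all its observations
        v = rng.choice([-1.0, 1.0], size=len(clusters))
        y_star = fitted_r + resid_r * v[obs_cluster]
        # 4. Re-estimate the unrestricted model on the bootstrap sample
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        resid_star = y_star - X @ beta_star
        t_boot[b] = (beta_star[k] - null) / cluster_robust_se(X, resid_star, cluster_ids, k)

    # 5. Share of bootstrap |t*| at least as large as the observed |t|
    p_value = np.mean(np.abs(t_boot) >= np.abs(t_orig))
    return t_orig, p_value
```

Because every bootstrap sample is generated with the coefficient fixed at its null value, the bootstrap t-statistics approximate the null distribution directly, which is the motivation for the restricted approach.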

Parameters:
  • X (np.ndarray) – Design matrix of shape (n, k).

  • y (np.ndarray) – Outcome vector of shape (n,).

  • residuals (np.ndarray) – OLS residuals from unrestricted regression, shape (n,).

  • cluster_ids (np.ndarray) – Cluster identifiers of shape (n,).

  • coefficient_index (int) – Index of the coefficient for which to compute bootstrap inference. For DiD, this is typically 3 (the treatment*post interaction term, when the columns of X are ordered intercept, treatment, post, treatment*post).

  • n_bootstrap (int, default=999) – Number of bootstrap replications. Odd values such as 999, for which (n_bootstrap + 1) × alpha is an integer at conventional significance levels, are recommended for exact p-value computation.

  • weight_type (str, default="rademacher") – Type of bootstrap weights:
      - “rademacher”: +1 or -1 with equal probability (standard choice)
      - “webb”: 6-point distribution (recommended for fewer than 10 clusters)
      - “mammen”: two-point distribution with skewness correction

  • null_hypothesis (float, default=0.0) – Value of the null hypothesis for p-value computation.

  • alpha (float, default=0.05) – Significance level for confidence interval.

  • seed (int, optional) – Random seed for reproducibility. If None (default), results will vary between runs.

  • return_distribution (bool, default=False) – If True, include full bootstrap distribution in results.

Returns:

Dataclass containing bootstrap SE, p-value, confidence interval, and other inference results.

Return type:

WildBootstrapResults

Raises:

ValueError – If weight_type is not recognized or if there are fewer than 2 clusters.

Warns:

UserWarning – If there are fewer than 5 clusters, as bootstrap inference may be unreliable.

Examples

>>> from diff_diff.utils import wild_bootstrap_se
>>> results = wild_bootstrap_se(
...     X, y, residuals, cluster_ids,
...     coefficient_index=3,  # ATT coefficient
...     n_bootstrap=999,
...     weight_type="rademacher",
...     seed=42
... )
>>> print(f"Bootstrap SE: {results.se:.4f}")
>>> print(f"Bootstrap p-value: {results.p_value:.4f}")
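The example above assumes that X, y, residuals, and cluster_ids already exist. One way to build them for a two-group, two-period design is sketched below with a hypothetical helper, `build_did_arrays`, which is not part of diff_diff. The column order (intercept, treated, post, treated*post) is an assumption chosen so that the DiD coefficient sits at index 3, matching the coefficient_index=3 convention above; the residuals come from an ordinary unrestricted OLS fit.

```python
import numpy as np


def build_did_arrays(y, unit_ids, treated, post):
    """Hypothetical helper: assemble the arrays for a 2x2 DiD regression.

    Column order of X: intercept, treated, post, treated*post, so the
    DiD (ATT) coefficient is at index 3.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    # Unrestricted OLS fit; residuals are y minus fitted values
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Cluster on the unit identifier
    cluster_ids = np.asarray(unit_ids)
    return X, y, residuals, cluster_ids
```

The returned tuple can be unpacked and passed directly to the call shown above, for example `wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index=3, seed=42)`.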

References

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90(3), 414-427.

MacKinnon, J. G., & Webb, M. D. (2018). The wild bootstrap for few (treated) clusters. The Econometrics Journal, 21(2), 114-135.

Example

from diff_diff import DifferenceInDifferences, generate_did_data

panel = generate_did_data(n_units=200, n_periods=10, treatment_effect=2.0)

# Use wild bootstrap via the estimator's inference parameter (recommended)
did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=999,
                               cluster='unit')
results = did.fit(panel, outcome='outcome', treatment='treated',
                  time='post')

print(f"Bootstrap SE: {results.se:.3f}")
print(f"Bootstrap 95% CI: [{results.conf_int[0]:.3f}, {results.conf_int[1]:.3f}]")

Note

wild_bootstrap_se() is a low-level function that operates on numpy arrays (X, y, residuals, cluster_ids). For most users, the estimator-level inference='wild_bootstrap' parameter shown above is more convenient.

WildBootstrapResults

Container for wild bootstrap results.

class diff_diff.WildBootstrapResults

Bases: object

Results from wild cluster bootstrap inference.

se

Bootstrap standard error of the coefficient.

Type:

float

p_value

Bootstrap p-value (two-sided).

Type:

float

t_stat_original

Original t-statistic from the data.

Type:

float

ci_lower

Lower bound of the confidence interval.

Type:

float

ci_upper

Upper bound of the confidence interval.

Type:

float

n_clusters

Number of clusters in the data.

Type:

int

n_bootstrap

Number of bootstrap replications.

Type:

int

weight_type

Type of bootstrap weights used (“rademacher”, “webb”, or “mammen”).

Type:

str

alpha

Significance level used for confidence interval.

Type:

float

bootstrap_distribution

Full bootstrap distribution of coefficients. Populated only when return_distribution=True; otherwise None.

Type:

np.ndarray, optional

References

Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-Based Improvements for Inference with Clustered Errors. The Review of Economics and Statistics, 90(3), 414-427.

se: float
p_value: float
t_stat_original: float
ci_lower: float
ci_upper: float
n_clusters: int
n_bootstrap: int
weight_type: str
alpha: float = 0.05
bootstrap_distribution: ndarray | None = None

summary()

Generate formatted summary of bootstrap results.

Return type:

str

print_summary()

Print formatted summary to stdout.

Return type:

None

__init__(se, p_value, t_stat_original, ci_lower, ci_upper, n_clusters, n_bootstrap, weight_type, alpha=0.05, bootstrap_distribution=None)

Return type:

None

Weight Types

The wild bootstrap supports several weight distributions:

  • 'rademacher': ±1 with equal probability (default, good general choice)

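  • 'mammen': Two-point distribution with mean 0, variance 1, and third moment 1 (a skewness correction)

  • 'webb': Six-point distribution (±√(3/2), ±1, ±√(1/2), each with probability 1/6), better suited to few clusters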
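The three distributions can be written down explicitly. The sketch below uses the definitions commonly given in the wild bootstrap literature; it is not necessarily how diff_diff draws its weights internally, and the names `WEIGHT_DISTRIBUTIONS`, `draw_cluster_weights`, and `moment` are hypothetical. All three have mean 0 and variance 1; Mammen's additionally has third moment 1.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

# Support points and probabilities for each weight distribution
WEIGHT_DISTRIBUTIONS = {
    "rademacher": (np.array([-1.0, 1.0]), np.array([0.5, 0.5])),
    "mammen": (
        np.array([-(SQRT5 - 1) / 2, (SQRT5 + 1) / 2]),
        np.array([(SQRT5 + 1) / (2 * SQRT5), (SQRT5 - 1) / (2 * SQRT5)]),
    ),
    "webb": (
        np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5), np.sqrt(0.5), 1.0, np.sqrt(1.5)]),
        np.full(6, 1 / 6),
    ),
}


def draw_cluster_weights(weight_type, n_clusters, rng):
    """Draw one weight per cluster from the named distribution."""
    support, probs = WEIGHT_DISTRIBUTIONS[weight_type]
    return rng.choice(support, size=n_clusters, p=probs)


def moment(weight_type, order):
    """Exact raw moment E[v**order] of the named weight distribution."""
    support, probs = WEIGHT_DISTRIBUTIONS[weight_type]
    return float(np.sum(probs * support**order))


rng = np.random.default_rng(42)
for name in WEIGHT_DISTRIBUTIONS:
    print(name, draw_cluster_weights(name, 5, rng))
```

Each cluster receives a single draw that multiplies all of its residuals, which is what preserves the within-cluster correlation structure in the bootstrap samples.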

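# Using different weight types (low-level array API).
# X, y, residuals, and cluster_ids are NumPy arrays as described above;
# coefficient_index=3 is the treatment*post term in a standard 2x2 DiD design.
from diff_diff.utils import wild_bootstrap_se

boot_rad = wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index=3, weight_type='rademacher', seed=42)
boot_webb = wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index=3, weight_type='webb', seed=42)
boot_mammen = wild_bootstrap_se(X, y, residuals, cluster_ids, coefficient_index=3, weight_type='mammen', seed=42)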

Recommendation

  • Use 'rademacher' (default) for most cases

  • Use 'webb' when you have fewer than 10 clusters

  • Set n_bootstrap to at least 999 for reliable inference
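Both recommendations can be checked with simple arithmetic. A bootstrap p-value estimated from B replications carries binomial simulation error sqrt(p(1 - p) / B): at p = 0.05 this is about 0.022 with 99 draws but about 0.007 with 999. A common rule in the bootstrap literature is also to pick B so that alpha × (B + 1) is an integer, which 999 satisfies for alpha = 0.05. Finally, with G clusters Rademacher weights can produce only 2^G distinct weight vectors (32 when G = 5), whereas a six-point distribution gives 6^G, which is one reason Webb weights are preferred with few clusters. The helper names below are hypothetical.

```python
import math


def mc_pvalue_se(p, n_bootstrap):
    """Monte Carlo standard error of a bootstrap p-value from n_bootstrap draws."""
    return math.sqrt(p * (1 - p) / n_bootstrap)


def alpha_on_grid(n_bootstrap, alpha):
    """True when alpha * (n_bootstrap + 1) is (numerically) an integer."""
    x = alpha * (n_bootstrap + 1)
    return math.isclose(x, round(x))


def distinct_weight_vectors(n_clusters, n_points):
    """Number of distinct cluster-level weight vectors: n_points ** n_clusters."""
    return n_points ** n_clusters


# Simulation error at p = 0.05 and whether alpha = 0.05 falls on the p-value grid
for b in (99, 999, 9999):
    print(b, round(mc_pvalue_se(0.05, b), 4), alpha_on_grid(b, 0.05))

# Distinct weight vectors: Rademacher (2 points) vs. Webb (6 points)
for g in (5, 9):
    print(g, distinct_weight_vectors(g, 2), distinct_weight_vectors(g, 6))
```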