diff_diff.check_parallel_trends_robust
- diff_diff.check_parallel_trends_robust(data, outcome, time, treatment_group, unit=None, pre_periods=None, n_permutations=1000, seed=None, wasserstein_threshold=0.2)
Perform robust parallel trends testing using distributional comparisons.
Uses the Wasserstein (Earth Mover’s) distance to compare the full distribution of outcome changes between treated and control groups, with permutation-based inference.
- Parameters:
data (pd.DataFrame) – Panel data with repeated observations over time.
outcome (str) – Name of outcome variable column.
time (str) – Name of time period column.
treatment_group (str) – Name of treatment group indicator column (0/1).
unit (str, optional) – Name of unit identifier column. If provided, computes unit-level changes. Otherwise uses observation-level data.
pre_periods (list, optional) – List of pre-treatment time periods. If None, uses first half of periods.
n_permutations (int, default=1000) – Number of permutations for computing p-value.
seed (int, optional) – Random seed for reproducibility.
wasserstein_threshold (float, default=0.2) – Threshold for normalized Wasserstein distance. Values below this threshold (combined with p > 0.05) suggest parallel trends are plausible.
- Returns:
Dictionary containing:
- wasserstein_distance: Wasserstein distance between group distributions
- wasserstein_p_value: Permutation-based p-value
- ks_statistic: Kolmogorov-Smirnov test statistic
- ks_p_value: KS test p-value
- mean_difference: Difference in mean changes
- variance_ratio: Ratio of variances in changes
- treated_changes: Array of outcome changes for treated
- control_changes: Array of outcome changes for control
- parallel_trends_plausible: Boolean assessment
- Return type:
dict
Examples
>>> from diff_diff import check_parallel_trends_robust
>>> results = check_parallel_trends_robust(
...     data, outcome='sales', time='year',
...     treatment_group='treated', unit='firm_id'
... )
>>> print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
>>> print(f"P-value: {results['wasserstein_p_value']:.4f}")
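To make the unit-level changes and the secondary diagnostics in the returned dictionary more concrete, the following is a rough sketch computed directly with pandas and SciPy on a synthetic panel. It is not the library's implementation: the helper `pre_period_changes`, the synthetic data, and the use of first differences within each unit are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pre_period_changes(data, outcome, time, treatment_group, unit, pre_periods):
    """Hypothetical helper: first differences of the outcome within each unit,
    restricted to the pre-treatment periods and split by treatment group."""
    pre = data[data[time].isin(pre_periods)].sort_values([unit, time]).copy()
    pre["change"] = pre.groupby(unit)[outcome].diff()
    pre = pre.dropna(subset=["change"])  # the first period of each unit has no change
    treated = pre.loc[pre[treatment_group] == 1, "change"].to_numpy()
    control = pre.loc[pre[treatment_group] == 0, "change"].to_numpy()
    return treated, control


# Synthetic panel (assumed data): 40 firms, 4 pre-treatment years, a common trend,
# and a level difference between groups that should not affect the changes.
rng = np.random.default_rng(0)
rows = []
for firm in range(40):
    is_treated = int(firm < 20)
    for year in range(2015, 2019):
        sales = 10 + 2 * is_treated + 0.5 * (year - 2015) + rng.normal(scale=0.3)
        rows.append({"firm_id": firm, "year": year,
                     "treated": is_treated, "sales": sales})
panel = pd.DataFrame(rows)

treated_changes, control_changes = pre_period_changes(
    panel, "sales", "year", "treated", "firm_id", [2015, 2016, 2017, 2018]
)

# Quantities analogous to ks_statistic, ks_p_value, mean_difference, variance_ratio
ks = stats.ks_2samp(treated_changes, control_changes)
mean_difference = treated_changes.mean() - control_changes.mean()
variance_ratio = treated_changes.var(ddof=1) / control_changes.var(ddof=1)

print(f"KS statistic: {ks.statistic:.4f} (p = {ks.pvalue:.4f})")
print(f"Mean difference: {mean_difference:.4f}")
print(f"Variance ratio: {variance_ratio:.4f}")
```

Because differencing removes the constant level gap between groups, the two arrays of changes here are drawn from the same distribution by construction.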
Notes
The Wasserstein distance (Earth Mover’s Distance) measures the minimum “cost” of transforming one distribution into another. Unlike a simple comparison of means, it responds to differences anywhere in the distribution (location, spread, or shape), so it can flag diverging pre-treatment changes even when the group means coincide. This makes it informative for non-normal data and for heterogeneous effects across units.
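A small sketch of this point, using `scipy.stats.wasserstein_distance` on two simulated samples that share a mean but differ in spread. The sample sizes and scales are arbitrary choices for illustration and are not taken from the library.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)

# Two samples of "changes" with the same mean (0) but different spread.
narrow = rng.normal(loc=0.0, scale=0.5, size=5000)
wide = rng.normal(loc=0.0, scale=2.0, size=5000)

mean_gap = abs(narrow.mean() - wide.mean())   # close to zero
w_dist = wasserstein_distance(narrow, wide)   # clearly positive

print(f"Absolute mean difference: {mean_gap:.4f}")
print(f"Wasserstein distance:     {w_dist:.4f}")
```

A mean comparison alone would treat these two groups as nearly identical, while the Wasserstein distance reflects the difference in spread.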
A small Wasserstein distance and high p-value suggest the distributions of pre-treatment changes are similar, supporting the parallel trends assumption.