Interactive notebook
This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.
Testing Parallel Trends and DiD Diagnostics#
The parallel trends assumption is the key identifying assumption for Difference-in-Differences. It states that in the absence of treatment, treated and control groups would have followed the same trend.
This notebook covers:
Visual inspection of parallel trends
Statistical tests for parallel trends
Equivalence testing (TOST)
Distributional comparison (Wasserstein)
Placebo tests and diagnostics
Sensitivity analysis
[ ]:
import numpy as np
import pandas as pd
from diff_diff import DifferenceInDifferences, MultiPeriodDiD
from diff_diff.utils import (
check_parallel_trends,
check_parallel_trends_robust,
equivalence_test_trends
)
from diff_diff.diagnostics import (
run_placebo_test,
placebo_timing_test,
placebo_group_test,
permutation_test,
run_all_placebo_tests
)
# For plots
try:
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-whitegrid')
HAS_MATPLOTLIB = True
except ImportError:
HAS_MATPLOTLIB = False
print("matplotlib not installed - visualization examples will be skipped")
1. Create Example Data#
We’ll create two datasets:
One where parallel trends holds
One where parallel trends is violated
[ ]:
# Generate panel data using the library function
from diff_diff import generate_panel_data
# Generate data with parallel trends
df_parallel = generate_panel_data(
n_units=100,
n_periods=8,
treatment_period=4,
treatment_fraction=0.5,
treatment_effect=5.0,
parallel_trends=True, # Parallel trends holds
unit_fe_sd=2.0,
noise_sd=0.5,
seed=42
)
# Generate data with non-parallel trends (violation)
df_nonparallel = generate_panel_data(
n_units=100,
n_periods=8,
treatment_period=4,
treatment_fraction=0.5,
treatment_effect=5.0,
parallel_trends=False, # Treated has steeper trend
trend_violation=1.0, # Differential trend = 1.0 per period
unit_fe_sd=2.0,
noise_sd=0.5,
seed=42
)
print("Generated two datasets:")
print(f" - df_parallel: Parallel trends holds")
print(f" - df_nonparallel: Parallel trends violated")
2. Visual Inspection#
The first step is always to plot the data. Look for:
Similar slopes in pre-treatment periods
Divergence only after treatment begins
[ ]:
def plot_trends(df, title, ax):
"""Plot mean outcomes by group over time."""
means = df.groupby(['period', 'treated'])['outcome'].mean().unstack()
treatment_time = df[df['post'] == 1]['period'].min()
ax.plot(means.index, means[0], 'o-', label='Control', color='blue')
ax.plot(means.index, means[1], 's-', label='Treated', color='red')
ax.axvline(x=treatment_time - 0.5, color='gray', linestyle='--',
label='Treatment')
ax.set_xlabel('Period')
ax.set_ylabel('Mean Outcome')
ax.set_title(title)
ax.legend()
if HAS_MATPLOTLIB:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_trends(df_parallel, 'Parallel Trends Holds', axes[0])
plot_trends(df_nonparallel, 'Parallel Trends Violated', axes[1])
plt.tight_layout()
plt.show()
3. Simple Parallel Trends Test#
The check_parallel_trends() function computes and compares the pre-treatment trends.
[ ]:
# Test for parallel trends (parallel case)
results_pt_parallel = check_parallel_trends(
df_parallel,
outcome='outcome',
time='period',
treatment_group='treated',
pre_periods=[0, 1, 2, 3] # Pre-treatment periods
)
print("Parallel Trends Test (parallel case):")
print("=" * 50)
print(f"Treated trend: {results_pt_parallel['treated_trend']:.4f} "
f"(SE: {results_pt_parallel['treated_trend_se']:.4f})")
print(f"Control trend: {results_pt_parallel['control_trend']:.4f} "
f"(SE: {results_pt_parallel['control_trend_se']:.4f})")
print(f"Difference: {results_pt_parallel['trend_difference']:.4f} "
f"(SE: {results_pt_parallel['trend_difference_se']:.4f})")
print(f"t-statistic: {results_pt_parallel['t_statistic']:.4f}")
print(f"p-value: {results_pt_parallel['p_value']:.4f}")
print(f"\nParallel trends plausible: {results_pt_parallel['parallel_trends_plausible']}")
[ ]:
# Test for parallel trends (non-parallel case)
results_pt_nonparallel = check_parallel_trends(
df_nonparallel,
outcome='outcome',
time='period',
treatment_group='treated',
pre_periods=[0, 1, 2, 3]
)
print("\nParallel Trends Test (non-parallel case):")
print("=" * 50)
print(f"Treated trend: {results_pt_nonparallel['treated_trend']:.4f}")
print(f"Control trend: {results_pt_nonparallel['control_trend']:.4f}")
print(f"Difference: {results_pt_nonparallel['trend_difference']:.4f}")
print(f"p-value: {results_pt_nonparallel['p_value']:.4f}")
print(f"\nParallel trends plausible: {results_pt_nonparallel['parallel_trends_plausible']}")
4. Robust Parallel Trends Test (Wasserstein)#
The check_parallel_trends_robust() function uses the Wasserstein (Earth Mover’s) distance to compare the full distribution of outcome changes, not just means.
[ ]:
# Robust test (parallel case)
results_robust_parallel = check_parallel_trends_robust(
df_parallel,
outcome='outcome',
time='period',
treatment_group='treated',
unit='unit',
pre_periods=[0, 1, 2, 3],
n_permutations=999,
seed=42
)
print("Robust Parallel Trends Test (parallel case):")
print("=" * 50)
print(f"Wasserstein distance: {results_robust_parallel['wasserstein_distance']:.4f}")
print(f"Wasserstein (normalized): {results_robust_parallel['wasserstein_normalized']:.4f}")
print(f"Wasserstein p-value: {results_robust_parallel['wasserstein_p_value']:.4f}")
print(f"KS statistic: {results_robust_parallel['ks_statistic']:.4f}")
print(f"KS p-value: {results_robust_parallel['ks_p_value']:.4f}")
print(f"Mean difference: {results_robust_parallel['mean_difference']:.4f}")
print(f"Variance ratio: {results_robust_parallel['variance_ratio']:.4f}")
print(f"\nParallel trends plausible: {results_robust_parallel['parallel_trends_plausible']}")
[ ]:
# Robust test (non-parallel case)
results_robust_nonparallel = check_parallel_trends_robust(
df_nonparallel,
outcome='outcome',
time='period',
treatment_group='treated',
unit='unit',
pre_periods=[0, 1, 2, 3],
n_permutations=999,
seed=42
)
print("\nRobust Parallel Trends Test (non-parallel case):")
print("=" * 50)
print(f"Wasserstein distance: {results_robust_nonparallel['wasserstein_distance']:.4f}")
print(f"Wasserstein p-value: {results_robust_nonparallel['wasserstein_p_value']:.4f}")
print(f"\nParallel trends plausible: {results_robust_nonparallel['parallel_trends_plausible']}")
[ ]:
if HAS_MATPLOTLIB:
# Visualize the distribution of outcome changes
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for i, (results, title) in enumerate([
(results_robust_parallel, 'Parallel Trends'),
(results_robust_nonparallel, 'Non-Parallel Trends')
]):
ax = axes[i]
ax.hist(results['treated_changes'], bins=20, alpha=0.5,
label='Treated', color='red')
ax.hist(results['control_changes'], bins=20, alpha=0.5,
label='Control', color='blue')
ax.set_xlabel('Outcome Change')
ax.set_ylabel('Frequency')
ax.set_title(f'{title}\n(Wasserstein p={results["wasserstein_p_value"]:.3f})')
ax.legend()
plt.tight_layout()
plt.show()
5. Equivalence Testing (TOST)#
Standard hypothesis testing has low power to detect parallel trends. A better approach is equivalence testing using the Two One-Sided Tests (TOST) procedure.
Instead of asking “Can we reject that trends are different?”, we ask: “Can we confirm that trend differences are smaller than some practically meaningful threshold?”
[ ]:
# Equivalence test (parallel case)
results_equiv_parallel = equivalence_test_trends(
df_parallel,
outcome='outcome',
time='period',
treatment_group='treated',
unit='unit',
pre_periods=[0, 1, 2, 3],
equivalence_margin=0.5 # Differences < 0.5 are "equivalent"
)
print("Equivalence Test (parallel case):")
print("=" * 50)
print(f"Mean difference: {results_equiv_parallel['mean_difference']:.4f}")
print(f"SE: {results_equiv_parallel['se_difference']:.4f}")
print(f"Equivalence margin: +/- {results_equiv_parallel['equivalence_margin']:.4f}")
print(f"TOST p-value: {results_equiv_parallel['tost_p_value']:.4f}")
print(f"\nTrends are equivalent (at alpha=0.05): {results_equiv_parallel['equivalent']}")
[ ]:
# Equivalence test (non-parallel case)
results_equiv_nonparallel = equivalence_test_trends(
df_nonparallel,
outcome='outcome',
time='period',
treatment_group='treated',
unit='unit',
pre_periods=[0, 1, 2, 3],
equivalence_margin=0.5
)
print("\nEquivalence Test (non-parallel case):")
print("=" * 50)
print(f"Mean difference: {results_equiv_nonparallel['mean_difference']:.4f}")
print(f"TOST p-value: {results_equiv_nonparallel['tost_p_value']:.4f}")
print(f"\nTrends are equivalent: {results_equiv_nonparallel['equivalent']}")
6. Placebo Tests#
Placebo tests check whether we would detect “effects” where none should exist. Types of placebo tests:
Timing placebo: Pretend treatment happened earlier
Group placebo: Estimate DiD on never-treated units only
Permutation test: Randomly reassign treatment and see if effect persists
[ ]:
# First, fit the main model
did = DifferenceInDifferences()
main_results = did.fit(
df_parallel,
outcome='outcome',
treatment='treated',
time='post'
)
print("Main DiD Results:")
print(f"ATT: {main_results.att:.4f} (SE: {main_results.se:.4f})")
print(f"p-value: {main_results.p_value:.4f}")
[ ]:
# Placebo timing test
# Estimate DiD with a fake treatment time in pre-period
placebo_timing = placebo_timing_test(
df_parallel,
outcome='outcome',
treatment='treated',
time='period',
fake_treatment_period=2, # Pretend treatment at period 2
post_periods=[4, 5, 6, 7] # Actual post-treatment periods to exclude
)
print("\nPlacebo Timing Test:")
print("=" * 50)
print(f"Placebo ATT: {placebo_timing.placebo_effect:.4f}")
print(f"SE: {placebo_timing.se:.4f}")
print(f"p-value: {placebo_timing.p_value:.4f}")
print(f"\nPass (effect not significant): {not placebo_timing.is_significant}")
[ ]:
# Placebo group test
# Estimate DiD using only never-treated units (some randomly designated as "fake treated")
# First, identify control units (never-treated)
control_units = df_parallel[df_parallel['treated'] == 0]['unit'].unique()
# Randomly select half of control units as "fake treated"
np.random.seed(42)
fake_treated = np.random.choice(control_units, size=len(control_units)//2, replace=False).tolist()
placebo_group = placebo_group_test(
df_parallel,
outcome='outcome',
time='period',
unit='unit',
fake_treated_units=fake_treated,
post_periods=[4, 5, 6, 7] # Periods to use as post-treatment
)
print("\nPlacebo Group Test:")
print("=" * 50)
print(f"Placebo ATT: {placebo_group.placebo_effect:.4f}")
print(f"SE: {placebo_group.se:.4f}")
print(f"p-value: {placebo_group.p_value:.4f}")
print(f"\nPass (effect not significant): {not placebo_group.is_significant}")
[ ]:
# Permutation test
perm_results = permutation_test(
df_parallel,
outcome='outcome',
treatment='treated',
time='post',
unit='unit',
n_permutations=999,
seed=42
)
print("\nPermutation Test:")
print("=" * 50)
print(f"Observed ATT: {perm_results.placebo_effect:.4f}")
print(f"Permutation p-value: {perm_results.p_value:.4f}")
print(f"Number of permutations: {len(perm_results.permutation_distribution)}")
[ ]:
if HAS_MATPLOTLIB:
# Visualize permutation distribution
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(perm_results.permutation_distribution, bins=30, alpha=0.7,
edgecolor='black', label='Permuted effects')
ax.axvline(x=perm_results.placebo_effect, color='red', linewidth=2,
linestyle='--', label=f'Observed = {perm_results.placebo_effect:.2f}')
ax.axvline(x=0, color='gray', linewidth=1, linestyle=':')
ax.set_xlabel('Effect')
ax.set_ylabel('Frequency')
ax.set_title(f'Permutation Test Distribution\n(p-value = {perm_results.p_value:.3f})')
ax.legend()
plt.tight_layout()
plt.show()
7. Comprehensive Diagnostics#
Run all placebo tests at once with run_all_placebo_tests().
[ ]:
# Run comprehensive diagnostics
all_tests = run_all_placebo_tests(
df_parallel,
outcome='outcome',
treatment='treated',
time='period',
unit='unit',
pre_periods=[0, 1, 2, 3], # Pre-treatment periods
post_periods=[4, 5, 6, 7], # Post-treatment periods
n_permutations=499,
seed=42
)
print("Comprehensive Placebo Test Results:")
print("=" * 60)
print(f"{'Test':<25} {'Effect':>10} {'p-value':>10} {'Pass':>10}")
print("-" * 60)
for test_name, result in all_tests.items():
if isinstance(result, dict) and 'error' in result:
print(f"{test_name:<25} {'ERROR':>10} {'-':>10} {result['error'][:20]}")
else:
passed = not result.is_significant # Pass if NOT significant
print(f"{test_name:<25} {result.placebo_effect:>10.4f} {result.p_value:>10.4f} {str(passed):>10}")
8. Event Study as a Parallel Trends Check#
An event study shows period-by-period effects. Pre-treatment coefficients should be close to zero if parallel trends holds.
[ ]:
# Event study
mp_did = MultiPeriodDiD()
event_results = mp_did.fit(
df_parallel,
outcome='outcome',
treatment='treated',
time='period',
post_periods=[4, 5, 6, 7],
reference_period=3 # Use period 3 as reference
)
print(event_results.summary())
[ ]:
from diff_diff.visualization import plot_event_study
if HAS_MATPLOTLIB:
fig, ax = plt.subplots(figsize=(10, 6))
plot_event_study(
results=event_results,
ax=ax,
title='Event Study: Check Pre-trends',
xlabel='Period',
ylabel='Effect'
)
plt.tight_layout()
plt.show()
9. What to Do If Parallel Trends Fails?#
If parallel trends is violated, consider:
Add covariates that might explain differential trends
Use Synthetic DiD which is more robust to trend differences
Use bounds/sensitivity analysis (Rambachan-Roth)
Consider alternative designs (RDD, IV, etc.)
[ ]:
# Example: Compare standard DiD vs Synthetic DiD on non-parallel data
from diff_diff import SyntheticDiD
# Standard DiD (biased when trends differ)
did_np = DifferenceInDifferences()
results_did_np = did_np.fit(
df_nonparallel,
outcome='outcome',
treatment='treated',
time='post'
)
# Synthetic DiD (may be less biased)
sdid = SyntheticDiD(n_bootstrap=99, seed=42)
results_sdid = sdid.fit(
df_nonparallel,
outcome='outcome',
treatment='treated',
unit='unit',
time='period',
post_periods=[4, 5, 6, 7]
)
print("Comparison on Non-Parallel Trends Data")
print("=" * 50)
print(f"True ATT: 5.0")
print(f"")
print(f"Standard DiD:")
print(f" ATT: {results_did_np.att:.4f} (Bias: {results_did_np.att - 5.0:.4f})")
print(f"")
print(f"Synthetic DiD:")
print(f" ATT: {results_sdid.att:.4f} (Bias: {results_sdid.att - 5.0:.4f})")
Summary#
Key takeaways for parallel trends testing:
Always visualize the data first
Simple tests (
check_parallel_trends):Compare pre-treatment slopes
Easy to interpret but limited
Robust tests (
check_parallel_trends_robust):Compare full distributions with Wasserstein distance
More powerful for detecting violations
Equivalence testing (
equivalence_test_trends):Tests whether differences are practically small
Better than “failing to reject” parallel trends
Placebo tests:
Timing: Fake treatment in pre-period
Group: DiD on never-treated only
Permutation: Randomize treatment assignment
Event studies show pre-treatment coefficients should be ~0
If parallel trends fails, consider:
Adding covariates
Synthetic DiD
Sensitivity analysis
Alternative identification strategies