This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.

Honest DiD: Sensitivity Analysis for Parallel Trends

The parallel trends assumption is crucial for the validity of difference-in-differences (DiD), but it cannot be tested directly, because it concerns how the treated group would have evolved without treatment. Honest DiD (Rambachan & Roth 2023) provides a framework for:

Relaxing the parallel trends assumption
Computing bounds on treatment effects under potential violations
Constructing robust confidence intervals that remain valid for any violation of parallel trends within a specified class
Computing “breakdown values” that show how large a violation must be before a conclusion no longer holds

This notebook covers:

Motivation: Why standard event studies can be misleading
Basic usage with HonestDiD
Interpreting bounds and breakdown values
Sensitivity analysis over a grid of M values
Visualization
Advanced: Smoothness restrictions

[ ]:

import numpy as np
import pandas as pd
from diff_diff import MultiPeriodDiD
from diff_diff.honest_did import (
    HonestDiD,
    compute_honest_did,
    DeltaSD,
    DeltaRM,
)

# For plots
try:
    import matplotlib.pyplot as plt
    plt.style.use('seaborn-v0_8-whitegrid')
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")

1. Motivation: The Problem with Pre-trend Testing

Researchers often test for parallel trends by checking whether the pre-treatment event-study coefficients are statistically different from zero. This approach has serious problems:

Low power: With typical sample sizes, we may fail to detect real violations
Pre-test bias: Conditioning on passing a pre-trends test can bias point estimates and distort the coverage of confidence intervals
Post-treatment violations: Even if pre-trends look flat, parallel trends can still fail after treatment, and no pre-period test can rule this out
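To see why low power matters, here is a minimal sketch (plain Python, not part of diff_diff) of the power of a two-sided 5% z-test on a single pre-period coefficient. It assumes the estimate is normally distributed around the true violation with a known standard error; the helper names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pretest_power(true_violation: float, se: float, z_crit: float = 1.96) -> float:
    """Probability that a two-sided z-test rejects H0: coefficient = 0
    when the pre-period coefficient is N(true_violation, se^2)."""
    shift = true_violation / se
    # Reject when |estimate / se| > z_crit: upper tail plus lower tail
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


for ratio in [0.0, 0.5, 1.0, 2.0, 3.0]:
    power = pretest_power(ratio, se=1.0)
    print(f"violation = {ratio:.1f} x SE -> power = {power:.3f}")
```

Under these assumptions, a true pre-period violation as large as one standard error is flagged only about 17% of the time, so "no significant pre-trend" is weak evidence that parallel trends holds.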

Honest DiD addresses these issues by:

Not requiring parallel trends to hold exactly
Allowing for bounded violations related to observed pre-trends
Providing valid inference under these weaker assumptions

2. Generate Example Data

We’ll create panel data with:

A true treatment effect of 5.0
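A linear differential trend for the treated group that starts before treatment and continues after it (a parallel trends violation that makes the results interesting)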

[ ]:

def generate_did_data(n_units=200, n_periods=10, true_att=5.0,
                      pre_trend_violation=0.3, seed=42):
    """
    Generate panel data with potential parallel trends violations.

    Parameters
    ----------
    pre_trend_violation : float
        Magnitude of differential pre-trend between treated and control.
        0 = perfect parallel trends, larger = more violation.
    """
    np.random.seed(seed)
    treatment_time = n_periods // 2

    data = []
    for unit in range(n_units):
        is_treated = unit < n_units // 2
        unit_effect = np.random.normal(0, 2)

        for period in range(n_periods):
            # Common time trend
            time_effect = period * 1.0

            # Add differential pre-trend for treated (parallel trends violation)
            if is_treated:
                time_effect += pre_trend_violation * (period - treatment_time)

            y = 10.0 + unit_effect + time_effect

            # Treatment effect
            post = period >= treatment_time
            if is_treated and post:
                y += true_att

            y += np.random.normal(0, 1)

            data.append({
                'unit': unit,
                'period': period,
                'treated': int(is_treated),
                'post': int(post),
                'outcome': y
            })

    return pd.DataFrame(data)

# Generate data with mild pre-trend violation
df = generate_did_data(pre_trend_violation=0.2)
print(f"Generated {len(df)} observations")
print(f"Treatment time: period {df.loc[df['post'] == 1, 'period'].min()}")
print("True ATT: 5.0")
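Because generate_did_data applies the differential trend in every period, not only before treatment, the post-treatment event-study coefficients are biased away from 5.0 even in large samples. The sketch below computes the population gaps implied by the data-generating process. It assumes the event study is normalized to the last pre-treatment period (period 4); check which reference period MultiPeriodDiD actually uses, since a different normalization shifts every coefficient. population_event_study is a hypothetical helper, not part of diff_diff.

```python
import numpy as np


def population_event_study(n_periods=10, true_att=5.0, pre_trend_violation=0.2,
                           ref_period=None):
    """Expected treated-minus-control gap per period, net of a reference period.

    Mirrors generate_did_data: the treated group gets an extra trend
    pre_trend_violation * (period - treatment_time) in EVERY period, plus
    true_att from treatment_time onward. Time-invariant unit effects and the
    common time trend cancel, and the noise has mean zero.
    """
    treatment_time = n_periods // 2
    if ref_period is None:
        ref_period = treatment_time - 1  # assumption: last pre-treatment period
    periods = np.arange(n_periods)
    gap = (pre_trend_violation * (periods - treatment_time)
           + true_att * (periods >= treatment_time))
    rel = np.round(gap - gap[ref_period], 10)  # round away floating-point noise
    return dict(zip(periods.tolist(), rel.tolist()))


coefs = population_event_study()
for period, beta in coefs.items():
    print(f"period {period}: {beta:+.2f}")
```

Under this normalization the pre-period coefficients rise by 0.2 per period (from -0.80 to -0.20) and the post-period coefficients run from 5.20 to 6.00 rather than 5.0, so a standard event study overstates the effect. A violation that continues after treatment is the case Honest DiD is designed to handle.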

3. Fit Standard Event Study

First, let’s estimate a standard event study using MultiPeriodDiD.

[ ]:

# Fit event study
mp_did = MultiPeriodDiD()
event_results = mp_did.fit(
    df,
    outcome='outcome',
    treatment='treated',
    time='period',
    post_periods=[5, 6, 7, 8, 9]
)

print(event_results.summary())

[ ]:

from diff_diff.visualization import plot_event_study

if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 6))
    plot_event_study(
        event_results,
        ax=ax,
        title='Standard Event Study',
        show=False
    )
    plt.tight_layout()
    plt.show()

4. Basic Honest DiD: Relative Magnitudes

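The relative magnitudes approach bounds post-treatment violations of parallel trends by \(\bar{M}\) times the largest violation observed before treatment. In simplified form:

\[|\delta_{post}| \leq \bar{M} \times \max(|\delta_{pre}|)\]

Where:

\(\delta_t\) is the violation of parallel trends at time \(t\)
\(\bar{M} = 1\) means post-treatment violations can be as bad as the worst pre-treatment violation
\(\bar{M} = 0\) is equivalent to assuming parallel trends holds exactly

Formally, Rambachan & Roth (2023) impose the restriction on period-to-period changes in the violation: \(|\delta_{t+1} - \delta_t| \leq \bar{M} \cdot \max_{s<0} |\delta_{s+1} - \delta_s|\) for all \(t \geq 0\), where period 0 is the last pre-treatment (reference) period and \(\delta_0 = 0\). In this form, \(\bar{M}\) limits how fast the violation can change after treatment relative to how fast it changed before.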
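To make the restriction concrete, the sketch below checks whether a hypothetical vector of violations \(\delta\) belongs to the relative-magnitudes set in the form used by Rambachan & Roth (2023): every post-treatment period-to-period change in \(\delta\) must be at most \(\bar{M}\) times the largest pre-treatment period-to-period change. It assumes event time is indexed so that period 0 is the reference (last pre-treatment) period with \(\delta_0 = 0\); satisfies_delta_rm is an illustrative helper, not part of diff_diff.

```python
import numpy as np


def satisfies_delta_rm(delta_pre, delta_post, M_bar, tol=1e-12):
    """Check the relative-magnitudes restriction (Rambachan & Roth 2023 form).

    delta_pre  : violations in pre-periods -T, ..., -1 (oldest first)
    delta_post : violations in post-periods 1, ..., T_bar
    The reference period 0 is normalized so that delta_0 = 0.
    """
    pre_path = np.append(np.asarray(delta_pre, dtype=float), 0.0)       # ..., delta_-1, delta_0
    post_path = np.insert(np.asarray(delta_post, dtype=float), 0, 0.0)  # delta_0, delta_1, ...
    max_pre_change = np.max(np.abs(np.diff(pre_path)))
    max_post_change = np.max(np.abs(np.diff(post_path)))
    # Every post-period change must be at most M_bar times the largest pre-period change
    return bool(max_post_change <= M_bar * max_pre_change + tol)


delta_pre = [-0.6, -0.4, -0.2]  # violation grows by 0.2 per period before treatment
print(satisfies_delta_rm(delta_pre, [0.2, 0.4], M_bar=1.0))  # True: changes of 0.2 <= 1.0 * 0.2
print(satisfies_delta_rm(delta_pre, [0.2, 0.4], M_bar=0.5))  # False: 0.2 > 0.5 * 0.2
print(satisfies_delta_rm(delta_pre, [0.3, 0.3], M_bar=1.0))  # False: first change 0.3 > 0.2
```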

[ ]:

# Create HonestDiD estimator
honest = HonestDiD(
    method='relative_magnitude',
    M=1.0,  # Post-treatment violations up to 1x max pre-treatment violation
    alpha=0.05
)

# Compute bounds
honest_results = honest.fit(event_results)

print(honest_results.summary())

Interpreting the Results

The output shows:

Original Estimate: The point estimate assuming parallel trends (standard DiD)
Identified Set: The range of treatment effects consistent with the data and our assumptions about violations. Wider with larger M.
Robust CI: A confidence interval that covers the true effect with probability at least 95% (in large samples) for every violation pattern allowed by the restriction, so it remains valid whichever value in the identified set is the true one.
Effect robust to violations: Whether the robust CI excludes zero. If yes, the effect is significant even under potential violations.
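For intuition about the identified set, here is a minimal sketch for the first post-treatment period only. It uses the relative-magnitudes restriction in its period-to-period form (Rambachan & Roth 2023) and ignores sampling uncertainty, treating the event-study coefficients as known population values. The pre-period coefficients then equal the pre-period violations (reference period normalized to zero), so the largest pre-period change is observable, and the first post-period violation can be at most \(\bar{M}\) times that change. The function and the coefficients are hypothetical.

```python
import numpy as np


def rm_identified_set_first_post(beta_pre, beta_post_1, M_bar):
    """Identified set for the first post-period effect under relative magnitudes.

    beta_pre    : pre-period coefficients, oldest first (reference period excluded)
    beta_post_1 : coefficient for the first post-treatment period
    No sampling noise: coefficients are treated as population values.
    """
    pre_path = np.append(np.asarray(beta_pre, dtype=float), 0.0)  # reference period = 0
    max_pre_change = np.max(np.abs(np.diff(pre_path)))
    half_width = M_bar * max_pre_change  # largest allowed |delta_1 - delta_0|
    return beta_post_1 - half_width, beta_post_1 + half_width


# Hypothetical coefficients: pre-trend of 0.2 per period, first post coefficient 5.2
lb, ub = rm_identified_set_first_post([-0.8, -0.6, -0.4, -0.2], 5.2, M_bar=1.0)
print(f"Identified set: [{lb:.2f}, {ub:.2f}]")  # [5.00, 5.40]
print("Excludes zero:", not (lb <= 0.0 <= ub))
```

The robust CI reported by HonestDiD also accounts for sampling uncertainty in the coefficients, so it is typically wider than this identified set.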

[ ]:

# Key results
print(f"Original estimate: {honest_results.original_estimate:.4f}")
print(f"Identified set: [{honest_results.lb:.4f}, {honest_results.ub:.4f}]")
print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]")
print(f"CI width: {honest_results.ci_width:.4f}")
print()
print(f"Effect robust to M={honest_results.M} violations: {honest_results.is_significant}")

5. Sensitivity Analysis

A key feature of Honest DiD is sensitivity analysis: examining how the results change as we allow larger violations. This helps answer: “How much would parallel trends need to be violated to overturn our conclusions?”
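A sensitivity table repeats the bound calculation over a grid of \(\bar{M}\) values. The sketch below does this for a toy identified set (first post-period only, relative magnitudes in period-to-period form, no sampling uncertainty), tabulates it with pandas, and reports the first \(\bar{M}\) at which the set contains zero. All helper names and coefficients are hypothetical.

```python
import numpy as np
import pandas as pd


def rm_identified_set_first_post(beta_pre, beta_post_1, M_bar):
    """Toy identified set for the first post-period effect (no sampling noise)."""
    pre_path = np.append(np.asarray(beta_pre, dtype=float), 0.0)  # reference period = 0
    half_width = M_bar * np.max(np.abs(np.diff(pre_path)))
    return beta_post_1 - half_width, beta_post_1 + half_width


def sweep_identified_sets(beta_pre, beta_post_1, M_grid):
    """Tabulate the toy identified set over a grid of M_bar values."""
    rows = []
    for M_bar in M_grid:
        lb, ub = rm_identified_set_first_post(beta_pre, beta_post_1, M_bar)
        rows.append({"M": M_bar, "lb": lb, "ub": ub,
                     "contains_zero": bool(lb <= 0.0 <= ub)})
    return pd.DataFrame(rows)


# Hypothetical coefficients: pre-trend of 0.25 per period, first post coefficient 0.5
table = sweep_identified_sets([-0.75, -0.5, -0.25], 0.5, [0, 0.5, 1.0, 1.5, 2.0, 3.0])
print(table)

hits = table.loc[table["contains_zero"], "M"]
print("Identified set first contains zero at M =", hits.iloc[0] if len(hits) else None)
```

Here the identified set first contains zero at \(\bar{M} = 2\). A breakdown value based on the robust CI would typically be smaller, because the CI is wider than the identified set.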

[ ]:

# Run sensitivity analysis over a grid of M values
sensitivity = honest.sensitivity_analysis(
    event_results,
    M_grid=[0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]
)

print(sensitivity.summary())

[ ]:

# Key takeaway: the breakdown value
print(f"Breakdown value: {sensitivity.breakdown_M}")
print("")
if sensitivity.breakdown_M is not None:
    print(f"The result is robust to violations up to M = {sensitivity.breakdown_M:.2f}.")
    print("That is, post-treatment trend violations could be up to")
    print(f"{sensitivity.breakdown_M:.2f}x the worst pre-treatment violation")
    print("and the robust CI would still exclude zero.")
else:
    print("No breakdown found: the effect is significant at every M in the grid.")

[ ]:

# Visualize the sensitivity analysis
if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 6))
    sensitivity.plot(ax=ax, show=False)
    plt.tight_layout()
    plt.show()

Reading the Sensitivity Plot

X-axis (M): How much we allow post-treatment violations relative to pre-treatment violations
Shaded region: The identified set (range of possible treatment effects)
Blue lines: Robust confidence interval
Red dashed line: Breakdown value (where CI first includes zero)
Black line: Original estimate (under parallel trends)

As M increases:

The identified set widens, because more violation patterns are allowed
Eventually, the CI includes zero (we can no longer rule out no effect)

6. Different Restriction Parameters

Let’s compare results for different values of M:

[ ]:

# Compare different M values
M_values = [0, 0.5, 1.0, 2.0]

print(f"{'M':<8} {'CI Lower':>12} {'CI Upper':>12} {'Significant':>12}")
print("-" * 48)

for M in M_values:
    result = honest.fit(event_results, M=M)
    sig = "Yes" if result.is_significant else "No"
    print(f"{M:<8.2f} {result.ci_lb:>12.4f} {result.ci_ub:>12.4f} {sig:>12}")

7. Breakdown Value

The breakdown value is the smallest M at which the robust CI includes zero. The larger it is, the larger the violation of parallel trends our conclusion can withstand.
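Because the robust CI widens as M grows, the breakdown value can be located by bisection on M, assuming the check "CI includes zero" is monotone in M. The sketch below is a generic search written for illustration and not necessarily how HonestDiD.breakdown_value is implemented; the toy CI (its estimate, standard error, and how fast its half-width grows with M) is made up.

```python
def breakdown_by_bisection(ci_includes_zero, M_low=0.0, M_high=10.0, tol=0.01):
    """Smallest M (to within tol) at which ci_includes_zero(M) becomes True.

    Assumes monotonicity: False below the breakdown value, True at and above it.
    Returns None if the CI still excludes zero at M_high.
    """
    if ci_includes_zero(M_low):
        return M_low
    if not ci_includes_zero(M_high):
        return None
    while M_high - M_low > tol:
        mid = 0.5 * (M_low + M_high)
        if ci_includes_zero(mid):
            M_high = mid  # breakdown is at or below mid
        else:
            M_low = mid   # breakdown is above mid
    return M_high


def toy_ci_includes_zero(M, estimate=0.5, se=0.1, slope=0.25, z=1.96):
    """Hypothetical robust CI whose half-width grows linearly in M."""
    lower = estimate - slope * M - z * se
    upper = estimate + slope * M + z * se
    return lower <= 0.0 <= upper


print(f"Breakdown M ~ {breakdown_by_bisection(toy_ci_includes_zero):.3f}")
```

For this toy CI the exact breakdown is \((0.5 - 1.96 \times 0.1) / 0.25 = 1.216\), and the search returns a value within tol of it, from above.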

[ ]:

# Compute breakdown value directly
breakdown = honest.breakdown_value(event_results, tol=0.01)

if breakdown is not None:
    print(f"Breakdown value: M = {breakdown:.3f}")
    print("")
    print("Interpretation:")
    print(f"  - For M < {breakdown:.2f}: Effect is statistically significant")
    print(f"  - For M >= {breakdown:.2f}: Cannot rule out zero effect")
    print("")
    print("Is this robust enough?")
    if breakdown >= 1.0:
        print("  Yes! Result holds even if post-treatment violations")
        print("  are as bad as observed pre-treatment violations.")
    else:
        print("  Somewhat. Result requires post-treatment violations")
        print("  to be smaller than pre-treatment violations.")
else:
    print("No breakdown found: effect is significant over the range searched.")

8. Smoothness Restrictions

An alternative approach restricts the second differences of the trend violations:

\[|\delta_{t+1} - 2\delta_t + \delta_{t-1}| \leq M\]

This says violations must change smoothly over time:

\(M = 0\): Violations must follow a linear trend (linear extrapolation of pre-trends)
\(M > 0\): Allows some non-linearity in how violations evolve
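A minimal NumPy sketch of the two quantities behind the smoothness restriction: the largest absolute second difference along a path of violations, and the post-treatment path implied when M = 0 (every second difference is zero, so the last pre-treatment slope is carried forward). It assumes the pre-treatment path ends at the reference period with \(\delta_0 = 0\); the helper names are hypothetical, not part of diff_diff.

```python
import numpy as np


def max_second_difference(delta):
    """Largest |delta_{t+1} - 2*delta_t + delta_{t-1}| along a path of violations."""
    return float(np.max(np.abs(np.diff(np.asarray(delta, dtype=float), n=2))))


def linear_extrapolation(delta_pre, n_post):
    """Post-period violations implied by M = 0.

    delta_pre ends at the reference period (delta_0 = 0); the slope between
    the last two pre-period values is carried forward.
    """
    delta_pre = np.asarray(delta_pre, dtype=float)
    slope = delta_pre[-1] - delta_pre[-2]
    return delta_pre[-1] + slope * np.arange(1, n_post + 1)


path_pre = [-0.6, -0.4, -0.2, 0.0]       # linear pre-trend, reference period last
post = linear_extrapolation(path_pre, 3)  # continues the line: 0.2, 0.4, 0.6
print(post)
print(max_second_difference(np.concatenate([path_pre, post])))  # ~0 up to rounding
print(max_second_difference([0.0, 0.0, 0.5]))                   # a kink of size 0.5
```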

[ ]:

# Smoothness restriction
honest_smooth = HonestDiD(
    method='smoothness',
    M=0.5,  # Allow some curvature
    alpha=0.05
)

smooth_results = honest_smooth.fit(event_results)
print(smooth_results.summary())

[ ]:

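# Compare smoothness vs relative magnitudes. Note that M means different things:
# a unitless multiplier for relative magnitudes, but a bound (in outcome units)
# on second differences for smoothness, so this is not a like-for-like comparison.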
print("Comparison of Methods (M=1.0)")
print("=" * 60)

rm_result = HonestDiD(method='relative_magnitude', M=1.0).fit(event_results)
sd_result = HonestDiD(method='smoothness', M=1.0).fit(event_results)

print(f"{'Method':<25} {'CI Lower':>12} {'CI Upper':>12} {'Width':>10}")
print("-" * 60)
print(f"{'Relative Magnitudes':<25} {rm_result.ci_lb:>12.4f} {rm_result.ci_ub:>12.4f} {rm_result.ci_width:>10.4f}")
print(f"{'Smoothness':<25} {sd_result.ci_lb:>12.4f} {sd_result.ci_ub:>12.4f} {sd_result.ci_width:>10.4f}")

9. Using the Convenience Function

For quick analysis, use compute_honest_did():

[ ]:

# One-liner for a robust confidence interval
quick_result = compute_honest_did(
    event_results,
    method='relative_magnitude',
    M=1.0
)

print(f"Robust CI (M=1.0): [{quick_result.ci_lb:.3f}, {quick_result.ci_ub:.3f}]")

10. Converting Results to DataFrames

Results can be exported for further analysis:

[ ]:

# Single result to DataFrame
print("Single result:")
print(honest_results.to_dataframe())

[ ]:

# Sensitivity analysis to DataFrame
print("\nSensitivity analysis:")
sensitivity.to_dataframe()

Summary

Key Takeaways:

Honest DiD provides robust inference without assuming parallel trends holds exactly
Relative magnitudes (M̄) bounds post-treatment violations by a multiple of observed pre-treatment violations
- M̄=0: Standard parallel trends
- M̄=1: Violations as bad as worst pre-period
- M̄>1: Even larger violations allowed
Smoothness (M) bounds the curvature of violations over time
- M=0: Linear extrapolation of pre-trends
- M>0: Allows non-linear changes
Breakdown value: the smallest M̄ (or M) at which the robust CI includes zero; larger values mean a more robust conclusion
Best practices:
- Report results for multiple M values
- Include the sensitivity plot in publications
- Discuss what violation magnitudes are plausible in your setting
- Use the breakdown value to assess robustness

Related Tutorials:

04_parallel_trends.ipynb - Standard parallel trends testing
06_power_analysis.ipynb - Power analysis for study design
07_pretrends_power.ipynb - Pre-trends power analysis (Roth 2022) - assess what violations your pre-trends test could have detected

Reference:

Rambachan, A., & Roth, J. (2023). A More Credible Approach to Parallel Trends. The Review of Economic Studies, 90(5), 2555-2591. https://doi.org/10.1093/restud/rdad018