Real-World Data Examples#

This notebook demonstrates diff-diff using real-world datasets from classic econometric studies. We’ll cover:

  1. Card & Krueger (1994) - Classic 2x2 DiD: Effect of minimum wage on employment

  2. Castle Doctrine Laws - Staggered adoption: Effect of self-defense laws on homicide rates

  3. Unilateral Divorce Laws - Staggered adoption: Effect of unilateral divorce laws on divorce rates

These examples show how to apply DiD methods to real policy questions and replicate findings from influential studies.

[ ]:
import numpy as np
import pandas as pd

from diff_diff import (
    DifferenceInDifferences,
    TwoWayFixedEffects,
    CallawaySantAnna,
    SunAbraham,
    bacon_decompose,
)
from diff_diff.datasets import (
    load_card_krueger,
    load_castle_doctrine,
    load_divorce_laws,
    list_datasets,
)
from diff_diff.visualization import plot_event_study, plot_bacon, plot_group_effects

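# For plots
try:
    import matplotlib.pyplot as plt
    # The 'seaborn-v0_8-*' style names only exist in matplotlib >= 3.6,
    # so apply the style only when it is available
    if 'seaborn-v0_8-whitegrid' in plt.style.available:
        plt.style.use('seaborn-v0_8-whitegrid')
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")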
[ ]:
# List available datasets
print("Available real-world datasets in diff-diff:")
print("=" * 60)
for name, desc in list_datasets().items():
    print(f"  {name}: {desc}")

1. Card & Krueger (1994): Minimum Wage and Employment#

Background#

On April 1, 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour, while neighboring Pennsylvania kept its minimum wage at $4.25. Card and Krueger conducted a survey of fast-food restaurants in both states before and after the wage increase.

Research question: Does raising the minimum wage reduce employment?

Design: Classic 2x2 DiD

  • Treatment group: New Jersey restaurants

  • Control group: Pennsylvania restaurants

  • Pre-period: February 1992 (before wage increase)

  • Post-period: November 1992 (after wage increase)

Key finding: No significant negative effect on employment; the point estimate was actually positive (about +2.8 FTE employees per store).

[ ]:
# Load the Card-Krueger dataset
ck = load_card_krueger()

print(f"Dataset shape: {ck.shape}")
print("\nStores by state:")
print(ck.groupby('state').size())
print("\nFirst few rows:")
ck.head()
[ ]:
# Summary statistics by state
print("Summary Statistics by State")
print("=" * 60)

summary = ck.groupby('state').agg({
    'emp_pre': ['mean', 'std'],
    'emp_post': ['mean', 'std'],
    'emp_change': ['mean', 'std'],
    'wage_pre': 'mean',
    'wage_post': 'mean',
}).round(2)

summary.columns = ['Emp Pre (mean)', 'Emp Pre (sd)',
                   'Emp Post (mean)', 'Emp Post (sd)',
                   'Emp Change (mean)', 'Emp Change (sd)',
                   'Wage Pre', 'Wage Post']
summary

Preparing Data for DiD#

The data is in “wide” format (one row per store, with separate columns for pre- and post-period employment). The DiD estimator needs “long” format (one row per store per period), so we reshape it first.

[ ]:
# Reshape to long format
ck_long = ck.melt(
    id_vars=['store_id', 'state', 'chain', 'treated'],
    value_vars=['emp_pre', 'emp_post'],
    var_name='period',
    value_name='employment'
)

# Create post indicator
ck_long['post'] = (ck_long['period'] == 'emp_post').astype(int)

# Drop missing employment values
ck_long = ck_long.dropna(subset=['employment'])

print(f"Long format shape: {ck_long.shape}")
print("\nSample distribution:")
print(ck_long.groupby(['state', 'post']).size().unstack())
ck_long.head()

DiD Estimation#

[ ]:
# Basic DiD estimation
did = DifferenceInDifferences(robust=True)

results = did.fit(
    ck_long,
    outcome='employment',
    treatment='treated',
    time='post'
)

print("Card & Krueger DiD Results")
print("=" * 60)
print(results.summary())
[ ]:
# Manual calculation to verify
print("\nManual DiD Calculation:")
print("-" * 40)

nj_pre = ck_long[(ck_long['state'] == 'NJ') & (ck_long['post'] == 0)]['employment'].mean()
nj_post = ck_long[(ck_long['state'] == 'NJ') & (ck_long['post'] == 1)]['employment'].mean()
pa_pre = ck_long[(ck_long['state'] == 'PA') & (ck_long['post'] == 0)]['employment'].mean()
pa_post = ck_long[(ck_long['state'] == 'PA') & (ck_long['post'] == 1)]['employment'].mean()

print(f"NJ (pre):  {nj_pre:.2f}")
print(f"NJ (post): {nj_post:.2f}")
print(f"NJ change: {nj_post - nj_pre:.2f}")
print()
print(f"PA (pre):  {pa_pre:.2f}")
print(f"PA (post): {pa_post:.2f}")
print(f"PA change: {pa_post - pa_pre:.2f}")
print()
print(f"DiD estimate: {(nj_post - nj_pre) - (pa_post - pa_pre):.2f}")
[ ]:
# With chain fixed effects for better precision
did_fe = DifferenceInDifferences(robust=True)

results_fe = did_fe.fit(
    ck_long,
    outcome='employment',
    treatment='treated',
    time='post',
    fixed_effects=['chain']
)

print("DiD with Chain Fixed Effects")
print("=" * 60)
print(results_fe.summary())
print("\nNote: Adding chain FE controls for systematic differences across chains.")
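
A 2x2 DiD computed from four cell means is numerically identical to the interaction coefficient in an OLS regression of the outcome on a treatment dummy, a post dummy, and their product. The sketch below checks this on a tiny made-up dataset using only NumPy. The numbers are invented for illustration and are not the Card-Krueger data, and this is not a description of how `DifferenceInDifferences` is implemented internally.

```python
import numpy as np

# Made-up outcome values for illustration only (not the Card-Krueger data)
treated = np.array([1, 1, 1, 1, 0, 0, 0, 0])
post = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([20.0, 22.0, 21.0, 23.0, 24.0, 26.0, 22.0, 24.0])


def cell_mean(t, p):
    """Mean outcome for the (treated == t, post == p) cell."""
    return y[(treated == t) & (post == p)].mean()


# DiD from the four cell means
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Saturated OLS: y = b0 + b1*treated + b2*post + b3*(treated*post)
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_ols = beta[3]  # interaction coefficient

print(f"DiD from means: {did_means:.2f}")
print(f"DiD from OLS:   {did_ols:.2f}")
```

The regression form is what makes it natural to add covariates or fixed effects, as in the chain fixed-effects cell.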

Interpretation#

The DiD estimate suggests that New Jersey’s minimum wage increase did not lead to a decrease in employment. If anything, the point estimate is slightly positive, though not statistically significant.

This result challenged the traditional economic view that minimum wage increases necessarily reduce employment, and sparked extensive debate and follow-up research.

[ ]:
# Visualization: Employment trends
if HAS_MATPLOTLIB:
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Mean employment by state and period
    means = ck_long.groupby(['state', 'post'])['employment'].mean().unstack()
    means.columns = ['Feb 1992', 'Nov 1992']

    ax = axes[0]
    x = [0, 1]
    ax.plot(x, means.loc['NJ'], 'o-', label='NJ (Treated)', color='#2ecc71', linewidth=2, markersize=8)
    ax.plot(x, means.loc['PA'], 's--', label='PA (Control)', color='#3498db', linewidth=2, markersize=8)
    ax.axvline(x=0.5, color='red', linestyle=':', alpha=0.5, label='Min wage increase')
    ax.set_xticks([0, 1])
    ax.set_xticklabels(['Feb 1992\n(Pre)', 'Nov 1992\n(Post)'])
    ax.set_ylabel('Mean FTE Employment')
    ax.set_title('Employment Trends: NJ vs PA')
    ax.legend()
    ax.grid(True, alpha=0.3)

    # Distribution of employment changes
    ax = axes[1]
    nj_changes = ck[ck['state'] == 'NJ']['emp_change'].dropna()
    pa_changes = ck[ck['state'] == 'PA']['emp_change'].dropna()
    ax.hist(nj_changes, bins=20, alpha=0.6, label='NJ', color='#2ecc71')
    ax.hist(pa_changes, bins=20, alpha=0.6, label='PA', color='#3498db')
    ax.axvline(nj_changes.mean(), color='#27ae60', linestyle='--', linewidth=2)
    ax.axvline(pa_changes.mean(), color='#2980b9', linestyle='--', linewidth=2)
    ax.set_xlabel('Employment Change (FTE)')
    ax.set_ylabel('Frequency')
    ax.set_title('Distribution of Employment Changes')
    ax.legend()

    plt.tight_layout()
    plt.show()

2. Castle Doctrine Laws: Staggered Adoption#

Background#

Castle Doctrine (or “Stand Your Ground”) laws expand self-defense rights by removing the duty to retreat before using deadly force. These laws were adopted by different U.S. states at different times, creating a staggered adoption design.

Research question: Do Castle Doctrine laws affect homicide rates?

Design: Staggered DiD

  • Treatment: Adoption of Castle Doctrine law

  • Cohorts: States adopting in 2005, 2006, 2007, 2008, 2009

  • Control: States that never adopted during the study period

Key finding: Cheng & Hoekstra (2013) found an approximately 8% increase in homicide rates following adoption.

[ ]:
# Load the Castle Doctrine dataset
castle = load_castle_doctrine()

print(f"Dataset shape: {castle.shape}")
print(f"Years: {castle['year'].min()} to {castle['year'].max()}")
print(f"States: {castle['state'].nunique()}")
castle.head()
[ ]:
# Treatment timing
cohort_summary = castle.drop_duplicates('state')[['state', 'first_treat']].sort_values('first_treat')

print("Treatment Cohorts")
print("=" * 40)
cohort_counts = cohort_summary.groupby('first_treat').size()
for cohort, n in cohort_counts.items():
    if cohort == 0:
        print(f"Never treated: {n} states")
    else:
        print(f"Adopted in {cohort}: {n} states")

print(f"\nTotal: {len(cohort_summary)} states")

Why Standard TWFE Fails Here#

With staggered adoption and heterogeneous treatment effects, traditional TWFE can give biased estimates because it implicitly uses already-treated units as controls for later adopters. The Goodman-Bacon decomposition makes these comparisons visible.
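
To see the mechanism in isolation, here is a minimal noise-free sketch with two units and three periods, where the treatment effect grows by one unit per period after adoption. All values are invented for illustration. The TWFE coefficient is computed by double-demeaning, which is equivalent to including unit and time dummies in a balanced panel; this is a hand-rolled calculation, not the `TwoWayFixedEffects` class.

```python
import numpy as np

# Toy balanced panel: rows are units, columns are periods 0..2 (no noise).
# Unit 0 adopts in period 1 (early), unit 1 adopts in period 2 (late).
first_treat = np.array([1, 2])
periods = np.arange(3)

# D[i, t] = 1 once unit i is treated
D = (periods[None, :] >= first_treat[:, None]).astype(float)

# True effect grows with time since adoption: 1 in the first treated period, then 2, ...
tau = (periods[None, :] - first_treat[:, None] + 1) * D

# Outcome = unit effect + time effect + treatment effect
unit_fe = np.array([5.0, 3.0])[:, None]
time_fe = np.array([0.0, 0.5, 1.0])[None, :]
y = unit_fe + time_fe + tau


def double_demean(a):
    """Remove unit and time means (the within transformation for a balanced panel)."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


D_tilde = double_demean(D)
y_tilde = double_demean(y)

twfe_att = (D_tilde * y_tilde).sum() / (D_tilde ** 2).sum()
true_att = tau[D == 1].mean()

print(f"TWFE coefficient:         {twfe_att:.3f}")
print(f"True average effect (ATT): {true_att:.3f}")
```

In this toy panel the TWFE coefficient is 0.5 while the true average effect on treated cells is about 1.33. The gap arises because the early adopter, whose effect is still growing, serves as the comparison unit for the late adopter.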

[ ]:
# TWFE estimation (potentially biased)
twfe = TwoWayFixedEffects()

# Need to create numeric state IDs for TWFE
castle['state_id'] = castle['state'].astype('category').cat.codes

results_twfe = twfe.fit(
    castle,
    outcome='homicide_rate',
    treatment='treated',
    unit='state_id',
    time='year'
)

print("TWFE Results (potentially biased)")
print("=" * 60)
print(f"ATT: {results_twfe.att:.4f}")
print(f"SE:  {results_twfe.se:.4f}")
print("\nNote: TWFE may be biased with staggered adoption.")
[ ]:
# Goodman-Bacon decomposition reveals the problem
bacon_results = bacon_decompose(
    castle,
    outcome='homicide_rate',
    unit='state',
    time='year',
    first_treat='first_treat'
)

bacon_results.print_summary()
[ ]:
# Visualize the decomposition
if HAS_MATPLOTLIB:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    plot_bacon(bacon_results, ax=axes[0], plot_type='scatter', show=False)
    plot_bacon(bacon_results, ax=axes[1], plot_type='bar', show=False)

    plt.tight_layout()
    plt.show()

# The weight summary does not need matplotlib, so report it unconditionally
forbidden_weight = bacon_results.total_weight_later_vs_earlier
print(f"\n{forbidden_weight:.1%} of TWFE weight comes from 'forbidden comparisons'")

Callaway-Sant’Anna Estimator#

The CS estimator handles staggered adoption by:

  1. Computing group-time effects ATT(g,t) for each cohort g and time period t

  2. Using only never-treated or not-yet-treated units as controls, never already-treated units

  3. Aggregating the ATT(g,t) into summary measures (overall, by cohort, and by event time)
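
The steps above can be sketched by hand for the simplest case: no covariates, never-treated controls, and a noise-free toy panel with invented values. Each ATT(g,t) compares the cohort's outcome change since period g-1 with the same change among never-treated units. This is an illustration of the idea only; it is not the `CallawaySantAnna` implementation, which also handles inference and other options.

```python
import numpy as np

# Toy balanced panel, no noise: rows are units, columns are periods 0..2.
# first_treat == 0 marks the never-treated unit (the convention used in this notebook's data).
first_treat = np.array([1, 2, 0])
n_periods = 3
periods = np.arange(n_periods)

# Treatment indicator and a true effect that grows by 1 each period after adoption
D = ((first_treat[:, None] > 0) & (periods[None, :] >= first_treat[:, None])).astype(float)
tau = (periods[None, :] - first_treat[:, None] + 1) * D

unit_fe = np.array([5.0, 3.0, 4.0])[:, None]
time_fe = np.array([0.0, 0.5, 1.0])[None, :]
y = unit_fe + time_fe + tau

never = first_treat == 0


def att_gt(g, t):
    """Outcome change since period g-1 for cohort g, minus the same change for never-treated units."""
    base = g - 1
    cohort = first_treat == g
    return float((y[cohort, t] - y[cohort, base]).mean()
                 - (y[never, t] - y[never, base]).mean())


# Step 1 and 2: post-treatment group-time effects (t >= g) with never-treated controls
att = {(g, t): att_gt(g, t) for g in (1, 2) for t in range(n_periods) if t >= g}

# Step 3: event-study aggregation by event time e = t - g
# (equal weights here because each cohort contains a single unit)
by_event_time = {}
for (g, t), value in att.items():
    by_event_time.setdefault(t - g, []).append(value)
event_study = {e: float(np.mean(v)) for e, v in sorted(by_event_time.items())}

print("ATT(g,t):", att)
print("Event study:", event_study)
```

Because the toy data has no noise, each ATT(g,t) recovers the true effect for that cell exactly, which the TWFE coefficient does not do under the same effect pattern.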

[ ]:
# Callaway-Sant'Anna estimation
cs = CallawaySantAnna(
    control_group='never_treated',
    n_bootstrap=199,
    seed=42
)

results_cs = cs.fit(
    castle,
    outcome='homicide_rate',
    unit='state',
    time='year',
    first_treat='first_treat',
    aggregate='all'  # Compute all aggregations (simple, event_study, group)
)

print(results_cs.summary())
[ ]:
# Aggregated Results
print("Aggregated Results")
print("=" * 60)

# Overall ATT (simple aggregation is computed automatically)
print(f"\nOverall ATT: {results_cs.overall_att:.4f} (SE: {results_cs.overall_se:.4f})")
print(f"95% CI: [{results_cs.overall_conf_int[0]:.4f}, {results_cs.overall_conf_int[1]:.4f}]")

# By cohort (group_effects is populated when aggregate='group' or 'all')
print("\nEffects by Adoption Cohort:")
for cohort in sorted(results_cs.group_effects.keys()):
    eff = results_cs.group_effects[cohort]
    print(f"  Cohort {cohort}: {eff['effect']:>7.4f} (SE: {eff['se']:.4f})")
[ ]:
# Event study aggregation (event_study_effects is populated when aggregate='event_study' or 'all')
print("Event Study Results (Effect by Years Since Adoption)")
print("=" * 60)
print(f"{'Event Time':>12} {'ATT':>10} {'SE':>10} {'95% CI':>25}")
print("-" * 60)

for e in sorted(results_cs.event_study_effects.keys()):
    eff = results_cs.event_study_effects[e]
    ci = eff['conf_int']
    sig = '*' if eff['p_value'] < 0.05 else ''
    print(f"{e:>12} {eff['effect']:>10.4f} {eff['se']:>10.4f} [{ci[0]:>8.4f}, {ci[1]:>8.4f}] {sig}")
[ ]:
# Event study visualization
if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 6))
    plot_event_study(
        results=results_cs,
        ax=ax,
        title='Castle Doctrine Laws: Effect on Homicide Rates',
        xlabel='Years Since Law Adoption',
        ylabel='Effect on Homicide Rate (per 100k)'
    )
    plt.tight_layout()
    plt.show()

Robustness Check: Sun-Abraham Estimator#

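Running both CS and Sun-Abraham provides a useful robustness check: the two estimators target the same cohort-specific effects but estimate them differently (Sun-Abraham uses an interaction-weighted regression), so agreement between them suggests the result is not an artifact of one method.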

[ ]:
# Sun-Abraham estimation
sa = SunAbraham(control_group='never_treated')

results_sa = sa.fit(
    castle,
    outcome='homicide_rate',
    unit='state',
    time='year',
    first_treat='first_treat'
)

results_sa.print_summary()
[ ]:
# Compare CS, SA, and TWFE
cs_name = "Callaway-Sant'Anna"
sa_name = "Sun-Abraham"
twfe_name = "TWFE (potentially biased)"

print("Robustness Check: CS vs Sun-Abraham vs TWFE")
print("=" * 60)
print(f"{'Estimator':<25} {'Overall ATT':>15} {'SE':>10}")
print("-" * 60)
print(f"{cs_name:<25} {results_cs.overall_att:>15.4f} {results_cs.overall_se:>10.4f}")
print(f"{sa_name:<25} {results_sa.overall_att:>15.4f} {results_sa.overall_se:>10.4f}")
print(f"{twfe_name:<25} {results_twfe.att:>15.4f} {results_twfe.se:>10.4f}")

3. Unilateral Divorce Laws: Long Panel with Staggered Adoption#

Background#

Unilateral (no-fault) divorce laws allow one spouse to obtain a divorce without the other’s consent. These laws were adopted at different times across U.S. states, primarily between 1969 and 1985.

Research question: How did unilateral divorce laws affect divorce rates?

Design: Staggered DiD with long panel

  • Treatment: Adoption of unilateral divorce law

  • Time period: 1968-1988

  • Cohorts: States adopting in different years

Key finding: Wolfers (2006) found an initial spike in divorce rates that faded over time.

[ ]:
# Load divorce laws dataset
divorce = load_divorce_laws()

print(f"Dataset shape: {divorce.shape}")
print(f"Years: {divorce['year'].min()} to {divorce['year'].max()}")
print(f"States: {divorce['state'].nunique()}")
divorce.head()
[ ]:
# Treatment timing distribution
cohort_summary = divorce.drop_duplicates('state')[['state', 'first_treat']].sort_values('first_treat')

print("Adoption Timeline")
print("=" * 50)

cohort_counts = cohort_summary[cohort_summary['first_treat'] > 0].groupby('first_treat').size()
never_treated = (cohort_summary['first_treat'] == 0).sum()

for year, n in cohort_counts.items():
    print(f"{year}: {n} state(s)")
print(f"\nNever adopted: {never_treated} states")
[ ]:
# Callaway-Sant'Anna estimation
cs_divorce = CallawaySantAnna(
    control_group='never_treated',
    n_bootstrap=199,
    seed=42
)

results_divorce = cs_divorce.fit(
    divorce,
    outcome='divorce_rate',
    unit='state',
    time='year',
    first_treat='first_treat',
    aggregate='all'  # Compute all aggregations (simple, event_study, group)
)

print(results_divorce.summary())
[ ]:
# Event study results (event_study_effects is populated when aggregate='event_study' or 'all')
print("Event Study: Effect of Unilateral Divorce on Divorce Rates")
print("=" * 65)
print(f"{'Years Since':>12} {'Effect':>10} {'SE':>10} {'Significant':>12}")
print("-" * 65)

for e in sorted(results_divorce.event_study_effects.keys()):
    eff = results_divorce.event_study_effects[e]
    sig = 'Yes' if eff['p_value'] < 0.05 else 'No'
    print(f"{e:>12} {eff['effect']:>10.4f} {eff['se']:>10.4f} {sig:>12}")
[ ]:
# Event study visualization
if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(12, 6))
    plot_event_study(
        results=results_divorce,
        ax=ax,
        title='Unilateral Divorce Laws: Effect on Divorce Rates',
        xlabel='Years Since Law Adoption',
        ylabel='Effect on Divorce Rate (per 1,000)'
    )
    plt.tight_layout()
    plt.show()

Dynamic Effects Pattern#

Notice the pattern in the event study:

  1. Pre-treatment: Effects near zero (consistent with parallel trends, though not proof of it)

  2. Short-run: Spike in divorce rates immediately after adoption

  3. Medium-run: Effects diminish over time

  4. Long-run: Effects may return close to zero

This “spike and fade” pattern was documented by Wolfers (2006) and suggests that unilateral divorce largely brought forward divorces that would have happened anyway (a “harvesting effect”).
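
One way to make the pattern above concrete is to average the event-study point estimates within windows of event time. The sketch below assumes a dictionary shaped like `event_study_effects` in this notebook (event time mapped to a dict with an `'effect'` key). The numbers, the `summarize_windows` helper, and the window boundaries are all hypothetical choices for illustration, and the averages ignore standard errors.

```python
# Hypothetical event-study output with invented numbers, shaped like
# results.event_study_effects: {event_time: {"effect": float, ...}}
toy_effects = {
    -3: {"effect": 0.02}, -2: {"effect": -0.02}, -1: {"effect": 0.00},
    0: {"effect": 0.30}, 1: {"effect": 0.40}, 2: {"effect": 0.50},
    3: {"effect": 0.30}, 4: {"effect": 0.20}, 5: {"effect": 0.10},
    6: {"effect": 0.05}, 7: {"effect": 0.00}, 8: {"effect": -0.05},
}

# Window boundaries are an assumption; pick ones that suit the panel length
windows = {
    "pre-treatment": (-3, -1),
    "short-run": (0, 2),
    "medium-run": (3, 5),
    "long-run": (6, 8),
}


def summarize_windows(effects, windows):
    """Average the point estimates whose event time falls in each inclusive (lo, hi) window."""
    summary = {}
    for label, (lo, hi) in windows.items():
        values = [v["effect"] for e, v in effects.items() if lo <= e <= hi]
        summary[label] = sum(values) / len(values) if values else float("nan")
    return summary


summary = summarize_windows(toy_effects, windows)
for label, mean_effect in summary.items():
    print(f"{label:>14}: {mean_effect:>7.3f}")
```

With real results, passing `results_divorce.event_study_effects` in place of `toy_effects` should work as long as the keys are integer event times, as the loops in this notebook suggest.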

[ ]:
# Effects by cohort (group_effects is populated when aggregate='group' or 'all')
print("Effects by Adoption Cohort")
print("=" * 50)

for cohort in sorted(results_divorce.group_effects.keys()):
    eff = results_divorce.group_effects[cohort]
    sig = '*' if eff['p_value'] < 0.05 else ''
    print(f"Cohort {cohort}: {eff['effect']:>7.4f} (SE: {eff['se']:.4f}) {sig}")

Summary#

Key Takeaways#

  1. Card-Krueger (1994)

    • Classic 2x2 DiD design

    • Simple before/after, treatment/control comparison

    • Key insight: Minimum wage increases don’t necessarily reduce employment

  2. Castle Doctrine Laws

    • Staggered adoption across states

    • TWFE can be biased; use CS or Sun-Abraham

    • Bacon decomposition reveals the problem with TWFE

    • Finding: Laws associated with increased homicide rates

  3. Unilateral Divorce Laws

    • Long panel with many cohorts

    • Dynamic treatment effects (spike and fade)

    • Event study reveals time-varying patterns

When to Use Which Estimator#

| Design | Recommended Estimator |
|---|---|
| Classic 2x2 | DifferenceInDifferences |
| Panel with 2 periods | DifferenceInDifferences or TwoWayFixedEffects |
| Staggered adoption | CallawaySantAnna or SunAbraham |
| Heterogeneous timing | Always use CallawaySantAnna / SunAbraham |
| Few never-treated | CallawaySantAnna(control_group='not_yet_treated') |

References#

  • Card, D., & Krueger, A. B. (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, 84(4), 772-793.

  • Cheng, C., & Hoekstra, M. (2013). Does Strengthening Self-Defense Law Deter Crime or Escalate Violence? Evidence from Expansions to Castle Doctrine. Journal of Human Resources, 48(3), 821-854.

  • Stevenson, B., & Wolfers, J. (2006). Bargaining in the Shadow of the Law: Divorce Laws and Family Distress. Quarterly Journal of Economics, 121(1), 267-288.

  • Wolfers, J. (2006). Did Unilateral Divorce Laws Raise Divorce Rates? A Reconciliation and New Results. American Economic Review, 96(5), 1802-1820.

  • Callaway, B., & Sant’Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230.

  • Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277.