Interactive notebook

This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.

Efficient DiD (Chen, Sant’Anna & Xie 2025)#

This tutorial demonstrates the EfficientDiD estimator, which implements the semiparametrically efficient ATT estimator from Chen, Sant’Anna & Xie (2025).

What EDiD does: Standard staggered DiD estimators like Callaway-Sant’Anna use one comparison per target ATT(g,t). When parallel trends holds across all pre-treatment periods (PT-All), this leaves valid information unused. EDiD optimally weights across all valid comparison groups and baselines to achieve the semiparametric efficiency bound, that is, the smallest asymptotic variance attainable under that assumption, which translates into the tightest asymptotically valid confidence intervals.

When to use EDiD:

  • Staggered adoption design where you want maximum statistical efficiency

  • You believe parallel trends holds across all pre-treatment periods (PT-All)

  • You want tighter confidence intervals than Callaway-Sant’Anna

  • You need a formal efficiency benchmark for comparing estimators

Topics covered:

  1. Basic usage and overall ATT

  2. Group-time effects

  3. PT-All vs PT-Post assumptions

  4. Demonstrating efficiency gains over Callaway-Sant’Anna

  5. Event study aggregation and visualization

  6. Group-level aggregation

  7. Bootstrap inference and weight distributions

  8. Diagnostics: efficient weights and condition numbers

  9. Anticipation periods

  10. Three-way comparison: EDiD vs CS vs ImputationDiD

Prerequisites: Tutorial 02 (Staggered DiD) and Tutorial 04 (Parallel Trends).

See also: Tutorial 11 for Imputation DiD, Tutorial 13 for Stacked DiD.

[ ]:
import numpy as np
import pandas as pd

from diff_diff import (
    EfficientDiD, CallawaySantAnna, ImputationDiD,
    generate_staggered_data,
)

# For nicer plots (optional)
try:
    import matplotlib.pyplot as plt
    plt.style.use('seaborn-v0_8-whitegrid')
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")

What Makes EDiD Different?#

Consider a staggered adoption design with cohorts treated at periods 3, 5, and 7, plus a never-treated group. To estimate ATT(g=5, t=6), Callaway-Sant’Anna uses a single 2x2 comparison:

Compare the outcome change from period 4 to 6 for cohort 5 versus the never-treated group.

But under PT-All (parallel trends across all pre-treatment periods), there are additional valid comparisons. Cohort 7 is also untreated at period 6, so it can serve as a comparison group too. And periods 2 and 3 can serve as additional valid baselines beyond CS’s default period 4. (Period 1 is excluded — it is the fixed \(Y_1\) reference used in every comparison’s differencing, so using it as a baseline adds no information.)
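
The enumeration described above can be sketched in a few lines of plain Python. This is only an illustration of the counting argument in the text: `enumerate_comparisons` is a hypothetical helper, not part of diff_diff, and the library's internal rules for which pairs are valid may differ.

```python
import math


def enumerate_comparisons(g, t, cohorts, first_period=1):
    """List (comparison group, baseline) pairs as described in the text.

    Hypothetical helper for illustration only; math.inf stands for the
    never-treated group.
    """
    # Comparison groups: never-treated, plus cohorts still untreated at t.
    groups = [c for c in cohorts if c > t]
    # Baselines: pre-treatment periods of cohort g, excluding the fixed
    # reference period (period 1 in the tutorial's example).
    baselines = list(range(first_period + 1, g))
    return [(c, s) for c in groups for s in baselines]


cohorts = [3, 5, 7, math.inf]
pairs = enumerate_comparisons(g=5, t=6, cohorts=cohorts)
for group, baseline in pairs:
    label = "never-treated" if group == math.inf else f"cohort {int(group)}"
    print(f"{label:>13} with baseline period {baseline}")
```

For ATT(g=5, t=6) this lists two comparison groups (cohort 7 and never-treated) crossed with three baselines (periods 2, 3, and 4), whereas the single CS comparison corresponds to the never-treated group with baseline period 4.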

Each of these comparisons provides an unbiased estimate of ATT(g=5, t=6), but with different variances. EDiD finds the optimal linear combination — the one that minimizes variance — by computing the inverse covariance matrix of these “generated outcomes” (the paper calls this \(\Omega^*\)).

The result: under PT-Post, EDiD reproduces the CS post-treatment ATT(g,t); under PT-All, it delivers tighter standard errors because it exploits the overidentification.

Key equation (for the curious): The efficient weight vector is \(w^* = \frac{\mathbf{1}' \Omega^{*-1}}{\mathbf{1}' \Omega^{*-1} \mathbf{1}}\), where \(\Omega^*\) is the covariance matrix of the generated outcomes across all valid (comparison group, baseline) pairs. This is the classic GLS optimal weighting. See REGISTRY.md or the paper for full derivations.
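
As a numerical illustration of the weight formula, the sketch below applies it to a small made-up covariance matrix using NumPy. The matrix entries are invented for the example and the code is not taken from the library; it only shows that the weights sum to one and that the combined variance is no larger than the best single comparison.

```python
import numpy as np

# Toy covariance matrix for three unbiased estimates of the same ATT(g,t).
# The numbers are made up for illustration.
omega = np.array([
    [1.0, 0.3, 0.2],
    [0.3, 2.0, 0.5],
    [0.2, 0.5, 1.5],
])
ones = np.ones(omega.shape[0])

# Solve Omega x = 1 rather than forming the inverse explicitly.
omega_inv_ones = np.linalg.solve(omega, ones)
weights = omega_inv_ones / (ones @ omega_inv_ones)

# Variance of the weighted combination: w' Omega w = 1 / (1' Omega^{-1} 1).
efficient_var = weights @ omega @ weights

print("weights:", np.round(weights, 4))
print("sum of weights:", round(float(weights.sum()), 6))
print("efficient variance:", round(float(efficient_var), 4))
print("smallest single-comparison variance:", omega.diagonal().min())
```

Because putting all weight on a single comparison is one feasible choice, the minimized variance can never exceed the smallest diagonal entry of the covariance matrix.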

Data Setup#

We use generate_staggered_data() to create a balanced panel with 3 treatment cohorts, a never-treated group, and a known treatment effect of 2.0.

[ ]:
data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0,
                               dynamic_effects=False, seed=42)

print(f"Shape: {data.shape}")
print(f"Cohorts: {sorted(data['first_treat'].unique())}")
print(f"Periods: {sorted(data['period'].unique())}")
print("Units per cohort:")
print(data.groupby('first_treat')['unit'].nunique().to_string())
print()
data.head(10)

Basic Estimation#

The EfficientDiD API follows the same pattern as other staggered estimators: create the estimator, call fit(), and inspect results. The key parameter is pt_assumption — we start with "all" (the default) which uses all valid pre-treatment periods for tighter inference.

[ ]:
edid = EfficientDiD(pt_assumption="all")
results = edid.fit(data, outcome='outcome', unit='unit', time='period',
                   first_treat='first_treat', aggregate='all')
results.print_summary()

Group-Time Effects#

Like Callaway-Sant’Anna, EDiD estimates ATT(g,t) for each (cohort, time period) pair. These are the building blocks for all aggregations. Use to_dataframe(level='group_time') to access them.

[ ]:
gt_df = results.to_dataframe(level='group_time')
gt_df

PT-All vs PT-Post#

EDiD supports two parallel trends assumptions:

  • PT-All (pt_assumption="all"): Parallel trends holds across all pre-treatment periods. The model is overidentified — more valid comparisons exist than needed — and EDiD exploits this for tighter SEs.

  • PT-Post (pt_assumption="post"): Parallel trends holds only from g-1 onward (the weaker, standard assumption). EDiD uses a single baseline (g-1) per cohort, matching CallawaySantAnna(control_group='never_treated') for post-treatment ATT(g,t). Pre-treatment diagnostics may differ from CS’s default base_period='varying'.

PT-All is the default because it delivers efficiency gains when the assumption holds. Use PT-Post if you’re concerned about violations in early pre-treatment periods.

[ ]:
# Fit under both assumptions
results_all = EfficientDiD(pt_assumption="all").fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='all')

results_post = EfficientDiD(pt_assumption="post").fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='all')

# Compare with Callaway-Sant'Anna
results_cs = CallawaySantAnna().fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat')

print("PT-All vs PT-Post vs Callaway-Sant'Anna")
print("=" * 65)
print(f"{'Estimator':<25} {'ATT':>10} {'SE':>10} {'CI Width':>12}")
print("-" * 65)
for name, r in [("EDiD (PT-All)", results_all),
                ("EDiD (PT-Post)", results_post),
                ("CallawaySantAnna", results_cs)]:
    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]
    print(f"{name:<25} {r.overall_att:>10.4f} {r.overall_se:>10.4f} {ci_width:>12.4f}")
print()
print("PT-Post and CS produce identical post-treatment ATTs.")

Demonstrating Efficiency Gains#

The efficiency gain from PT-All is not a one-off coincidence; it shows up across repeated draws from the same DGP. Here we run a small Monte Carlo (10 simulated datasets) and compare the reported SEs of EDiD (PT-All) with those of Callaway-Sant’Anna.

[ ]:
n_seeds = 10
se_edid_list = []
se_cs_list = []

for seed in range(n_seeds):
    sim_data = generate_staggered_data(n_units=200, n_periods=8,
                                       treatment_effect=2.0,
                                       dynamic_effects=False, seed=seed)
    r_edid = EfficientDiD(pt_assumption="all").fit(
        sim_data, outcome='outcome', unit='unit', time='period',
        first_treat='first_treat')
    r_cs = CallawaySantAnna().fit(
        sim_data, outcome='outcome', unit='unit', time='period',
        first_treat='first_treat')
    se_edid_list.append(r_edid.overall_se)
    se_cs_list.append(r_cs.overall_se)

se_edid = np.array(se_edid_list)
se_cs = np.array(se_cs_list)

print("Efficiency Comparison: EDiD (PT-All) vs CallawaySantAnna")
print("=" * 55)
print(f"{'Metric':<30} {'EDiD':>10} {'CS':>10}")
print("-" * 55)
print(f"{'Mean SE':<30} {se_edid.mean():>10.4f} {se_cs.mean():>10.4f}")
print(f"{'Median SE':<30} {np.median(se_edid):>10.4f} {np.median(se_cs):>10.4f}")
print(f"{'Mean SE ratio (EDiD/CS)':<30} {(se_edid / se_cs).mean():>10.4f}")
print()
print(f"EDiD SEs are on average {(1 - (se_edid / se_cs).mean()) * 100:.1f}% "
      f"smaller than CS SEs across {n_seeds} simulations.")

Event Study Aggregation#

Event study effects aggregate ATT(g,t) by relative time \(e = t - g\), averaging across cohorts at each horizon. This shows how treatment effects evolve over time since adoption. Pre-treatment coefficients (\(e < 0\)) serve as a diagnostic for parallel trends.

[ ]:
edid_es = EfficientDiD(pt_assumption="all")
results_es = edid_es.fit(data, outcome='outcome', unit='unit', time='period',
                         first_treat='first_treat', aggregate='event_study')

es_df = results_es.to_dataframe(level='event_study')
es_df

[ ]:
if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.errorbar(es_df['relative_period'], es_df['effect'],
                yerr=[es_df['effect'] - es_df['conf_int_lower'],
                      es_df['conf_int_upper'] - es_df['effect']],
                fmt='o-', capsize=4, color='steelblue', label='EDiD (PT-All)')
    ax.axhline(y=0, color='black', linestyle='--', linewidth=0.8)
    ax.axvline(x=-0.5, color='red', linestyle=':', linewidth=0.8, label='Treatment onset')
    ax.set_xlabel('Relative Period (e = t - g)')
    ax.set_ylabel('Effect')
    ax.set_title('Efficient DiD Event Study')
    ax.legend()
    plt.tight_layout()
    plt.show()
else:
    print("Install matplotlib to see visualizations: pip install matplotlib")
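
The aggregation by relative time used in this section can be sketched without the library. The ATT(g,t) values below are made up, and the equal-weight average across cohorts is an assumption of this sketch; the library may weight cohorts differently (for example by cohort size).

```python
from collections import defaultdict

# Made-up ATT(g,t) values keyed by (cohort g, period t).
att_gt = {
    (3, 3): 1.9, (3, 4): 2.1, (3, 5): 2.0,
    (5, 4): 0.1, (5, 5): 2.2, (5, 6): 1.8,
}

by_horizon = defaultdict(list)
for (g, t), att in att_gt.items():
    by_horizon[t - g].append(att)  # relative time e = t - g

# Equal-weight average across cohorts at each horizon (an assumption here).
event_study = {e: sum(v) / len(v) for e, v in sorted(by_horizon.items())}
for e, effect in event_study.items():
    print(f"e={e:+d}: {effect:.3f}")
```

The entry at e = -1 comes only from cohort 5 and plays the role of a pre-treatment diagnostic; horizons 0 and 1 average over both cohorts.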

Group-Level Aggregation#

Group aggregation averages post-treatment effects within each cohort, showing how the treatment effect varies by adoption timing.

[ ]:
grp_df = results.to_dataframe(level='group')
grp_df
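
A minimal sketch of the same idea outside the library: average the post-treatment ATT(g,t) within each cohort. The values are invented, and the equal weighting over post-treatment periods is an assumption of this sketch rather than a statement about how the library aggregates.

```python
# Made-up ATT(g,t) values keyed by (cohort g, period t).
att_gt = {
    (3, 3): 1.9, (3, 4): 2.1, (3, 5): 2.0,
    (5, 4): 0.1, (5, 5): 2.4, (5, 6): 2.0,
}

group_effects = {}
for g in sorted({g for g, _ in att_gt}):
    # Keep only post-treatment cells (t >= g) for this cohort.
    post = [att for (gg, t), att in att_gt.items() if gg == g and t >= g]
    group_effects[g] = sum(post) / len(post)

for g, effect in group_effects.items():
    print(f"cohort {g}: {effect:.3f}")
```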

Bootstrap Inference#

EDiD supports multiplier bootstrap for inference. The bootstrap perturbs the influence function values with random weights to obtain bootstrap distributions of all parameters.

Three weight distributions are available:

  • Rademacher (default): \(\pm 1\) with equal probability — standard choice, works well in most settings

  • Mammen: Two-point distribution that matches third moments

  • Webb: Six-point distribution with wider support
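
To make the three weight distributions concrete, the sketch below draws each one with NumPy and uses it in a bare-bones multiplier bootstrap of a sample mean. The support points and probabilities are the textbook definitions of these distributions; `draw_weights` is a hypothetical helper, the influence function values are simulated stand-ins, and none of this is the library's implementation.

```python
import numpy as np


def draw_weights(kind, size, rng):
    """Draw mean-zero, unit-variance multiplier weights (illustrative helper)."""
    if kind == "rademacher":
        return rng.choice([-1.0, 1.0], size=size)
    if kind == "mammen":
        s5 = np.sqrt(5.0)
        values = [-(s5 - 1) / 2, (s5 + 1) / 2]
        probs = [(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)]
        return rng.choice(values, size=size, p=probs)
    if kind == "webb":
        pts = np.sqrt([0.5, 1.0, 1.5])
        return rng.choice(np.concatenate([-pts, pts]), size=size)
    raise ValueError(f"unknown weight distribution: {kind}")


rng = np.random.default_rng(0)
n, n_boot = 500, 999
psi = rng.normal(size=n)   # stand-in influence function values
psi -= psi.mean()          # influence functions are centered

analytical_se = psi.std() / np.sqrt(n)
boot_se = {}
for kind in ["rademacher", "mammen", "webb"]:
    w = draw_weights(kind, (n_boot, n), rng)
    boot_draws = (w * psi).mean(axis=1)  # one perturbed estimate per draw
    boot_se[kind] = boot_draws.std()
    print(f"{kind:<12} bootstrap SE={boot_se[kind]:.4f}  "
          f"analytical SE={analytical_se:.4f}")
```

All three distributions have mean zero and variance one, which is why each bootstrap SE should land close to the analytical one in this toy setting.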

[ ]:
# Analytical vs bootstrap inference
results_boot = EfficientDiD(pt_assumption="all", n_bootstrap=499, seed=42).fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='all')

print("Analytical vs Bootstrap Inference")
print("=" * 70)
print(f"{'Method':<20} {'ATT':>10} {'SE':>10} {'CI Lower':>12} {'CI Upper':>12}")
print("-" * 70)
print(f"{'Analytical':<20} {results.overall_att:>10.4f} {results.overall_se:>10.4f} "
      f"{results.overall_conf_int[0]:>12.4f} {results.overall_conf_int[1]:>12.4f}")
print(f"{'Bootstrap (499)':<20} {results_boot.overall_att:>10.4f} {results_boot.overall_se:>10.4f} "
      f"{results_boot.overall_conf_int[0]:>12.4f} {results_boot.overall_conf_int[1]:>12.4f}")
print()

# Compare weight distributions
print("Bootstrap Weight Distributions")
print("-" * 45)
for wt in ['rademacher', 'mammen', 'webb']:
    r = EfficientDiD(pt_assumption="all", n_bootstrap=499,
                     bootstrap_weights=wt, seed=42).fit(
        data, outcome='outcome', unit='unit', time='period',
        first_treat='first_treat')
    print(f"{wt:<15} SE={r.overall_se:.4f}  "
          f"CI=[{r.overall_conf_int[0]:.4f}, {r.overall_conf_int[1]:.4f}]")

Diagnostics: Efficient Weights and Condition Numbers#

EDiD exposes two diagnostic quantities:

  • efficient_weights: The optimal weight vector for each (g, t) target. These weights sum to 1 and show how much each (comparison group, baseline) pair contributes to the estimate.

  • omega_condition_numbers: The condition number of the \(\Omega^*\) covariance matrix for each target. High condition numbers (> 100) indicate near-singular matrices where the weight estimates may be unstable.

[ ]:
if results.efficient_weights:
    print("Efficient Weights by (g, t)")
    print("=" * 55)
    for (g, t), w in sorted(results.efficient_weights.items()):
        print(f"  (g={int(g)}, t={int(t)}): {len(w)} weights, sum={w.sum():.4f}")

print()

if results.omega_condition_numbers:
    print("Omega* Condition Numbers")
    print("=" * 55)
    for (g, t), cond in sorted(results.omega_condition_numbers.items()):
        flag = "  << HIGH" if cond > 100 else ""
        print(f"  (g={int(g)}, t={int(t)}): {cond:.2f}{flag}")
    print()
    print("Condition numbers measure matrix stability. Values > 100 may")
    print("indicate near-singular covariance and less reliable weights.")
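
To see why a high condition number is a warning sign, the sketch below compares a well-conditioned and a nearly singular 2x2 covariance matrix with NumPy. The matrices are made up and the helpers `toy_omega` and `gls_weights` are hypothetical; the point is only that near-collinear comparisons push the optimal weights far outside the [0, 1] range.

```python
import numpy as np


def gls_weights(omega):
    """Weights proportional to Omega^{-1} 1, normalized to sum to one."""
    x = np.linalg.solve(omega, np.ones(omega.shape[0]))
    return x / x.sum()


def toy_omega(rho):
    """2x2 covariance with variances 1.0 and 1.5 and correlation rho."""
    cov = rho * np.sqrt(1.0 * 1.5)
    return np.array([[1.0, cov], [cov, 1.5]])


for rho in (0.2, 0.99):
    omega = toy_omega(rho)
    cond = np.linalg.cond(omega)
    w = gls_weights(omega)
    flag = "  << HIGH" if cond > 100 else ""
    print(f"rho={rho:<5} cond={cond:8.2f}{flag}  weights={np.round(w, 3)}")
```

With a correlation of 0.99 the weights are large and of opposite sign, so small estimation errors in the covariance matrix translate into large changes in the weights.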

Anticipation#

If treatment effects begin before the official treatment date (e.g., firms adjust behavior in anticipation of a policy), use anticipation=k to move the effective treatment boundary k periods earlier. Periods with relative time e >= -k are then treated as post-treatment.

[ ]:
r_no_antic = EfficientDiD(pt_assumption="all").fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='all')

r_antic = EfficientDiD(pt_assumption="all", anticipation=1).fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='all')

print("Anticipation Comparison")
print("=" * 55)
print(f"{'Setting':<30} {'ATT':>10} {'SE':>10}")
print("-" * 55)
print(f"{'No anticipation':<30} {r_no_antic.overall_att:>10.4f} {r_no_antic.overall_se:>10.4f}")
print(f"{'1-period anticipation':<30} {r_antic.overall_att:>10.4f} {r_antic.overall_se:>10.4f}")
print()
print("Anticipation shifts the effective treatment boundary forward,")
print("reclassifying the period before treatment as post-treatment.")
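
The reclassification rule stated above (relative time e >= -k counts as post-treatment) can be written out directly. `is_post_treatment` is a hypothetical helper for illustration and is not part of the library.

```python
def is_post_treatment(t, g, anticipation=0):
    """True when relative time e = t - g satisfies e >= -anticipation."""
    return (t - g) >= -anticipation


g = 5  # cohort first treated in period 5
for k in (0, 1):
    post = [t for t in range(1, 11) if is_post_treatment(t, g, anticipation=k)]
    print(f"anticipation={k}: post-treatment periods {post}")
```

With anticipation=1, period 4 moves from the pre-treatment to the post-treatment side for cohort 5, so it can no longer serve as a clean baseline.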

Comparison: EDiD vs Callaway-Sant’Anna vs Imputation DiD#

These three estimators address TWFE bias in staggered settings via different approaches:

  • EfficientDiD: Optimal EIF-based weighting across all valid comparisons

  • CallawaySantAnna: Separate 2x2 DiD regressions, then aggregate

  • ImputationDiD: Impute Y(0) via a fixed effects model, compute unit-level effects

Under the DGP used here (homogeneous effects, PT holds everywhere), all three should produce similar point estimates. The key difference is in standard errors: EDiD (PT-All) should be the tightest.

[ ]:
edid_r = EfficientDiD(pt_assumption="all").fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='all')
cs_r = CallawaySantAnna().fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat')
imp_r = ImputationDiD().fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat')

print("Estimator Comparison (True effect = 2.0)")
print("=" * 70)
print(f"{'Estimator':<25} {'ATT':>10} {'SE':>10} {'p-value':>10} {'CI Width':>12}")
print("-" * 70)
for name, r in [("EfficientDiD (PT-All)", edid_r),
                ("CallawaySantAnna", cs_r),
                ("ImputationDiD", imp_r)]:
    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]
    print(f"{name:<25} {r.overall_att:>10.4f} {r.overall_se:>10.4f} "
          f"{r.overall_p_value:>10.4f} {ci_width:>12.4f}")

[ ]:
# Side-by-side event study comparison
edid_es_r = EfficientDiD(pt_assumption="all").fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='event_study')
cs_es_r = CallawaySantAnna().fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='event_study')
imp_es_r = ImputationDiD().fit(
    data, outcome='outcome', unit='unit', time='period',
    first_treat='first_treat', aggregate='event_study')

edid_es_df = edid_es_r.to_dataframe(level='event_study')
cs_es_df = cs_es_r.to_dataframe(level='event_study')
imp_es_df = imp_es_r.to_dataframe(level='event_study')

if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 6))
    offset = 0.15

    ax.errorbar(edid_es_df['relative_period'] - offset, edid_es_df['effect'],
                yerr=[edid_es_df['effect'] - edid_es_df['conf_int_lower'],
                      edid_es_df['conf_int_upper'] - edid_es_df['effect']],
                fmt='o-', capsize=3, color='steelblue', label='EfficientDiD (PT-All)')
    ax.errorbar(cs_es_df['relative_period'], cs_es_df['effect'],
                yerr=[cs_es_df['effect'] - cs_es_df['conf_int_lower'],
                      cs_es_df['conf_int_upper'] - cs_es_df['effect']],
                fmt='s-', capsize=3, color='darkorange', label='CallawaySantAnna')
    ax.errorbar(imp_es_df['relative_period'] + offset, imp_es_df['effect'],
                yerr=[imp_es_df['effect'] - imp_es_df['conf_int_lower'],
                      imp_es_df['conf_int_upper'] - imp_es_df['effect']],
                fmt='^-', capsize=3, color='forestgreen', label='ImputationDiD')

    ax.axhline(y=0, color='black', linestyle='--', linewidth=0.8)
    ax.axvline(x=-0.5, color='red', linestyle=':', linewidth=0.8)
    ax.set_xlabel('Relative Period (e = t - g)')
    ax.set_ylabel('Effect')
    ax.set_title('Event Study Comparison: EDiD vs CS vs ImputationDiD')
    ax.legend()
    plt.tight_layout()
    plt.show()
else:
    print("Install matplotlib to see visualizations: pip install matplotlib")

Summary#

Key takeaways:

  1. EDiD achieves the semiparametric efficiency bound for ATT estimation in staggered designs

  2. Under PT-All, EDiD exploits overidentification for tighter SEs than CS

  3. Under PT-Post, EDiD matches CS for post-treatment ATT(g,t); pre-treatment diagnostics use a fixed baseline and may differ from CS’s default varying baseline

  4. The efficiency gain comes from optimally weighting across all valid (comparison group, baseline) pairs

  5. Event study and group aggregations work just like CS

  6. Multiplier bootstrap provides robust inference with Rademacher, Mammen, or Webb weights

  7. Condition numbers flag potentially unstable weight matrices

  8. Setting anticipation=k moves the effective treatment boundary k periods earlier, so effects that begin before the official treatment date are not counted as pre-treatment

  9. Phase 1 of the implementation does not support covariates; Phase 2 will add covariate support

  10. When in doubt, run both EDiD and CS — if ATTs agree, report EDiD for tighter CIs

Parameter reference:

| Parameter | Default | Description |
|---|---|---|
| pt_assumption | "all" | "all" (overidentified) or "post" (just-identified, matches CS post-treatment ATT) |
| alpha | 0.05 | Significance level |
| n_bootstrap | 0 | Number of bootstrap iterations (0 = analytical only) |
| bootstrap_weights | "rademacher" | Bootstrap weight distribution: "rademacher", "mammen", "webb" |
| seed | None | Random seed for reproducibility |
| anticipation | 0 | Anticipation periods |

Reference: Chen, X., Sant’Anna, P. H. C., & Xie, H. (2025). Efficient Difference-in-Differences and Event Study Estimators.

See also: Choosing an Estimator for guidance on when to use EDiD vs other estimators.