Choosing an Estimator
This guide helps you select the right estimator for your research design.
Decision Flowchart
Start here and follow the questions:
1. Is treatment staggered? (Different units treated at different times)
   No → Go to question 2
   Yes → Use CallawaySantAnna
2. Do you have panel data? (Multiple observations per unit over time)
   No → Use DifferenceInDifferences (basic 2x2)
   Yes → Go to question 3
3. Do you need period-specific effects? (Event study design)
   No → Use TwoWayFixedEffects
   Yes → Use MultiPeriodDiD
4. Is your treated group small? (Few treated units, many controls)
   Consider SyntheticDiD for better pre-treatment fit
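The questions above can be encoded as a small function. This is a hypothetical helper written for illustration, not part of diff_diff; the function name and its arguments are assumptions, and only the returned estimator names come from this guide.

```python
def suggest_estimator(staggered: bool, panel: bool, period_effects: bool,
                      few_treated: bool = False) -> str:
    """Hypothetical helper that mirrors the flowchart (not a diff_diff API)."""
    if staggered:                      # question 1
        choice = "CallawaySantAnna"
    elif not panel:                    # question 2
        choice = "DifferenceInDifferences"
    elif period_effects:               # question 3
        choice = "MultiPeriodDiD"
    else:
        choice = "TwoWayFixedEffects"
    if few_treated:                    # question 4 is a side consideration
        choice += " (also consider SyntheticDiD)"
    return choice


print(suggest_estimator(staggered=False, panel=True, period_effects=True))
```

Treating the small-treated-group question as an add-on rather than a branch is a design choice of this sketch, since the flowchart only says to "consider" SyntheticDiD.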
Quick Reference
| Estimator | Best For | Key Assumption | Output |
|---|---|---|---|
| DifferenceInDifferences | Simple 2x2 designs, cross-sectional comparisons | Parallel trends (2 periods) | Single ATT |
| TwoWayFixedEffects | Panel data, simultaneous treatment | Parallel trends (all periods) | Single ATT with unit/time FE |
| MultiPeriodDiD | Event studies, dynamic effects | Parallel trends (pre-periods) | Period-specific effects |
| CallawaySantAnna | Staggered adoption, heterogeneous timing | Conditional parallel trends | Group-time ATT(g,t), aggregations |
| SyntheticDiD | Few treated units, many controls | Synthetic parallel trends | ATT with unit/time weights |
Detailed Guidance
Basic 2x2 DiD
Use DifferenceInDifferences when:
You have a simple before/after, treatment/control design
Treatment occurs simultaneously for all treated units
You want a single average treatment effect
from diff_diff import DifferenceInDifferences
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treated='treated', post='post')
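To make the quantity concrete, the 2x2 estimate can be computed by hand as the change in the treated group minus the change in the control group. The sketch below uses made-up toy numbers and plain Python; it illustrates the arithmetic only and is not how diff_diff computes the estimate or its standard errors.

```python
from statistics import mean

# Hypothetical toy data: (treated, post, y)
rows = [
    (0, 0, 10.0), (0, 0, 12.0),   # control, before
    (0, 1, 13.0), (0, 1, 15.0),   # control, after
    (1, 0, 11.0), (1, 0, 13.0),   # treated, before
    (1, 1, 18.0), (1, 1, 20.0),   # treated, after
]


def cell_mean(treated: int, post: int) -> float:
    """Mean outcome in one of the four treated-by-post cells."""
    return mean(y for t, p, y in rows if t == treated and p == post)


change_treated = cell_mean(1, 1) - cell_mean(1, 0)   # 19 - 12
change_control = cell_mean(0, 1) - cell_mean(0, 0)   # 14 - 11
att_2x2 = change_treated - change_control

print(att_2x2)
```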
Two-Way Fixed Effects
Use TwoWayFixedEffects when:
You have panel data with multiple time periods
Treatment timing is the same for all treated units
You want to control for unit and time fixed effects
You don’t need to see period-by-period effects
Warning
TWFE can be biased with staggered treatment timing. Already-treated units
act as controls for newly-treated units, which can cause negative weighting.
Use CallawaySantAnna for staggered designs.
from diff_diff import TwoWayFixedEffects
twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treated='treated',
                   unit='unit_id', time='period')
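As a rough illustration of what controlling for unit and time fixed effects means, the sketch below applies the two-way within transformation to a balanced toy panel and regresses the demeaned outcome on the demeaned treatment indicator. All data and names are hypothetical, the shortcut is only valid for balanced panels, and it is not a description of the library's internals.

```python
from statistics import mean

# Hypothetical balanced panel: unit effects, a linear time effect, effect = 2
units = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}   # unit fixed effects
treated_units = {"c", "d"}
periods = [1, 2, 3, 4]
treatment_start = 3
true_effect = 2.0

d, y = {}, {}
for u, alpha in units.items():
    for t in periods:
        d[u, t] = 1.0 if (u in treated_units and t >= treatment_start) else 0.0
        y[u, t] = alpha + 0.5 * t + true_effect * d[u, t]


def two_way_demean(x: dict) -> dict:
    """Subtract unit and time means, add back the grand mean (balanced panel)."""
    grand = mean(x.values())
    unit_mean = {u: mean(x[u, t] for t in periods) for u in units}
    time_mean = {t: mean(x[u, t] for u in units) for t in periods}
    return {(u, t): x[u, t] - unit_mean[u] - time_mean[t] + grand for (u, t) in x}


d_tilde = two_way_demean(d)
y_tilde = two_way_demean(y)

# OLS slope of demeaned outcome on demeaned treatment
beta = sum(d_tilde[k] * y_tilde[k] for k in d_tilde) / sum(v * v for v in d_tilde.values())
print(round(beta, 6))
```

Because treatment starts at the same time for every treated unit here, the slope recovers the constant effect built into the toy data.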
Multi-Period Event Study
Use MultiPeriodDiD when:
You want a full event-study with pre and post treatment effects
You need pre-period coefficients to assess parallel trends
You want to visualize treatment effect dynamics over time
All treated units receive treatment at the same time (simultaneous adoption)
from diff_diff import MultiPeriodDiD, plot_event_study
event = MultiPeriodDiD(reference_period=-1)
results = event.fit(data, outcome='y', treated='treated',
                    time='period', unit='unit_id', treatment_start=5)
# Visualize
plot_event_study(results)
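The period-specific effects in an event study can be read as the treated-control gap in each period, measured relative to the gap in a reference period. The sketch below works this out on hypothetical group means. It assumes the reference period is expressed in event time (-1 meaning the last period before treatment), which is an interpretation of the example above rather than something this guide states.

```python
treatment_start = 5
reference_event_time = -1   # assumed to mean "one period before treatment"

# Hypothetical group means by calendar period
treated_means = {2: 5.0, 3: 6.0, 4: 7.0, 5: 10.0, 6: 12.0}
control_means = {2: 4.0, 3: 5.0, 4: 6.0, 5: 7.0, 6: 8.0}

# Treated-minus-control gap in each period
gap = {t: treated_means[t] - control_means[t] for t in treated_means}

# Effect at each event time = gap minus the gap in the reference period
ref_period = treatment_start + reference_event_time
effects = {t - treatment_start: gap[t] - gap[ref_period] for t in sorted(gap)}

for event_time, effect in effects.items():
    print(event_time, effect)
```

Pre-period values near zero are what one hopes to see when assessing parallel trends; the post-period values trace the dynamics.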
Callaway-Sant’Anna
Use CallawaySantAnna when:
Treatment is adopted at different times (staggered rollout)
You want valid treatment effect estimates with heterogeneous timing
You need group-time specific effects ATT(g,t)
This is the recommended estimator for most applied work with staggered adoption.
from diff_diff import CallawaySantAnna
cs = CallawaySantAnna(
    control_group='never_treated',  # or 'not_yet_treated'
    estimation_method='dr'          # doubly robust (recommended)
)
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat',
                 covariates=['x1', 'x2'])
# Get aggregated effects
print(f"Overall ATT: {results.att:.3f}")
# Event study aggregation
event_study = results.aggregate('event_time')
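For intuition about ATT(g,t), the sketch below computes group-time effects by hand in the simplest case: no covariates, a never-treated comparison group, and the period just before treatment as the baseline. The cohort means are hypothetical, and the doubly robust estimation shown above does more than this; the equal-weight event-time average is also a simplification that assumes equally sized cohorts.

```python
# Hypothetical mean outcomes by cohort (first treated period; None = never treated)
means = {
    3: {1: 1.0, 2: 2.0, 3: 5.0, 4: 7.0},
    4: {1: 2.0, 2: 3.0, 3: 4.0, 4: 8.0},
    None: {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0},
}
last_period = 4


def att_gt(g: int, t: int) -> float:
    """Change for cohort g since period g-1, minus the never-treated change."""
    base = g - 1
    return (means[g][t] - means[g][base]) - (means[None][t] - means[None][base])


att = {(g, t): att_gt(g, t) for g in (3, 4) for t in range(g, last_period + 1)}

# Average by event time (t - g), assuming equally sized cohorts
by_event_time = {}
for (g, t), value in att.items():
    by_event_time.setdefault(t - g, []).append(value)
event_study = {e: sum(v) / len(v) for e, v in sorted(by_event_time.items())}

print(att)
print(event_study)
```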
Synthetic DiD
Use SyntheticDiD when:
You have few treated units but many control units
Pre-treatment fit between treated and control is poor
You want to construct a weighted synthetic control
from diff_diff import SyntheticDiD
sdid = SyntheticDiD()
results = sdid.fit(data, outcome='y', unit='unit_id',
                   time='period', treated='treated',
                   treatment_start=5)
# View the unit weights
print(results.unit_weights)
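To see why unit weights can improve pre-treatment fit, the sketch below compares an equal-weighted control average with a weighted one on hypothetical pre-period data. The weights are picked by hand for illustration; the estimator chooses them by optimization (and also uses time weights), which this sketch does not attempt.

```python
import math

# Hypothetical pre-treatment outcomes
treated_pre = [10.0, 11.0, 12.0]
controls_pre = {
    "a": [8.0, 9.0, 10.0],
    "b": [12.0, 13.0, 14.0],
    "c": [20.0, 18.0, 16.0],   # moves in the opposite direction
}


def weighted_control(weights: dict) -> list:
    """Weighted average of control outcomes in each pre-period."""
    n_periods = len(treated_pre)
    return [sum(weights[u] * controls_pre[u][i] for u in controls_pre)
            for i in range(n_periods)]


def rmse(a: list, b: list) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


equal = {u: 1 / len(controls_pre) for u in controls_pre}
chosen = {"a": 0.5, "b": 0.5, "c": 0.0}   # hand-picked, not estimated

print(rmse(treated_pre, weighted_control(equal)))
print(rmse(treated_pre, weighted_control(chosen)))
```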
Common Pitfalls
Using TWFE with staggered adoption
TWFE estimates a weighted average of all 2x2 comparisons, including “forbidden” comparisons where already-treated units serve as controls. This can bias the estimate severely, and some treatment effects can even enter the average with negative weights.
Solution: Use CallawaySantAnna for staggered designs.
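A small numeric sketch shows why the forbidden comparison is a problem. It builds hypothetical outcomes in which parallel trends holds exactly and the treatment effect grows with exposure, then compares a late-treated cohort against a never-treated group and against an already-treated cohort. All numbers are made up for illustration.

```python
def untreated(t: int) -> float:
    """Common untreated path, so parallel trends holds by construction."""
    return 10.0 + t


def outcome(first_treat, t: int) -> float:
    """Outcome with an effect of 2 per period of exposure (None = never treated)."""
    if first_treat is None or t < first_treat:
        return untreated(t)
    exposure = t - first_treat + 1
    return untreated(t) + 2.0 * exposure


def did(treated_g, control_g, pre: int, post: int) -> float:
    change_treated = outcome(treated_g, post) - outcome(treated_g, pre)
    change_control = outcome(control_g, post) - outcome(control_g, pre)
    return change_treated - change_control


# Late cohort (treated in period 3): its true effect in period 3 is 2.0
clean = did(3, None, pre=2, post=3)    # never-treated units as controls
forbidden = did(3, 2, pre=2, post=3)   # cohort treated in period 2 as controls

print(clean, forbidden)
```

The already-treated cohort is still experiencing a growing effect, so its change absorbs the late cohort's effect and the comparison understates it.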
Ignoring treatment effect heterogeneity
If treatment effects vary by cohort (when units are treated) or over time (dynamic effects), aggregated estimators may be misleading.
Solution: Use CallawaySantAnna and examine ATT(g,t) and event study plots.
Failing to test parallel trends
The parallel trends assumption is untestable in the post-period but can be assessed using pre-treatment data.
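As a rough illustration of assessing pre-treatment data, the sketch below compares period-to-period changes in the treated and control group means before treatment. The numbers are hypothetical, and this is an informal eyeball check rather than a statistical test or a stand-in for the library's diagnostics.

```python
# Hypothetical pre-treatment group means by period
treated_pre = {1: 5.0, 2: 6.0, 3: 7.1}
control_pre = {1: 3.0, 2: 4.0, 3: 5.0}

periods = sorted(treated_pre)

# Difference in period-to-period changes between the two groups
trend_gaps = [
    (treated_pre[b] - treated_pre[a]) - (control_pre[b] - control_pre[a])
    for a, b in zip(periods, periods[1:])
]
max_gap = max(abs(g) for g in trend_gaps)

print([round(g, 3) for g in trend_gaps])
print(round(max_gap, 3))
```

Gaps close to zero are consistent with parallel pre-trends, though they do not prove the assumption holds after treatment.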
Solution: Use check_parallel_trends() and HonestDiD for sensitivity analysis.
Inappropriate clustering
Standard errors should typically be clustered at the level of treatment assignment (often the unit level).
Solution: Always specify cluster_col for panel data.
Standard Error Methods
Different estimators compute standard errors differently. Understanding these differences helps interpret results and choose appropriate inference.
| Estimator | Default SE Method | Details |
|---|---|---|
| DifferenceInDifferences | HC1 (heteroskedasticity-robust) | Uses White’s robust SEs by default. Specify … |
| TwoWayFixedEffects | Cluster-robust (unit level) | Always clusters at unit level after within-transformation. Specify … |
| MultiPeriodDiD | HC1 (heteroskedasticity-robust) | Same as basic DiD. Cluster-robust available via … |
| CallawaySantAnna | Analytical (simple difference) | Uses simple variance of group-time means. Use … |
| SyntheticDiD | Bootstrap or placebo-based | Default uses bootstrap resampling. Set … |
Recommendations by sample size:
Large samples (N > 1000, clusters > 50): Default analytical SEs are reliable
Medium samples (clusters 30-50): Cluster-robust SEs recommended
Small samples (clusters < 30): Use wild cluster bootstrap (inference='wild_bootstrap')
Very few clusters (< 10): Use Webb 6-point distribution (weight_type='webb')
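The reason a 6-point distribution helps with very few clusters is combinatorial: with two-point (Rademacher) weights there are only 2^G distinct sign patterns across G clusters, while six points give 6^G. The sketch below defines both weight sets in plain Python, checks that each has mean 0 and variance 1, and draws one weight per cluster. It illustrates the weights only and is not the library's bootstrap implementation; the helper names are hypothetical.

```python
import math
import random

RADEMACHER = (-1.0, 1.0)
# Webb six-point weights: +/- sqrt(3/2), +/- 1, +/- sqrt(1/2), each with probability 1/6
WEBB = tuple(s * math.sqrt(v) for s in (-1.0, 1.0) for v in (1.5, 1.0, 0.5))


def moments(points):
    """Mean and variance of a uniform distribution over the given points."""
    m = sum(points) / len(points)
    var = sum((p - m) ** 2 for p in points) / len(points)
    return m, var


def n_distinct_draws(points, n_clusters: int) -> int:
    """Number of distinct weight vectors available across clusters."""
    return len(points) ** n_clusters


n_clusters = 6
rng = random.Random(0)
weights = [rng.choice(WEBB) for _ in range(n_clusters)]   # one weight per cluster

print(moments(RADEMACHER), moments(WEBB))
print(n_distinct_draws(RADEMACHER, n_clusters), n_distinct_draws(WEBB, n_clusters))
```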
Common pitfall: Forgetting to cluster when units are observed multiple times. For panel data, always cluster at the unit level unless you have a strong reason not to.
from diff_diff import DifferenceInDifferences

# Good: Cluster at unit level for panel data
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treated='treated',
                  post='post', cluster_col='unit_id')

# Better for few clusters: Wild bootstrap
did = DifferenceInDifferences(inference='wild_bootstrap')
results = did.fit(data, outcome='y', treated='treated',
                  post='post', cluster_col='state')
When in Doubt
If you’re unsure which estimator to use:
Start with CallawaySantAnna - It’s valid even for non-staggered designs and provides the most flexible output (group-time effects, aggregations)
Check for heterogeneity - Plot event studies to see if effects vary
Run sensitivity analysis - Use HonestDiD to assess robustness
Compare estimators - If results differ substantially across estimators, investigate why (often reveals violations of assumptions)