Choosing an Estimator

This guide helps you select the right estimator for your research design.

Decision Flowchart

Start here and follow the questions:

1. Is treatment staggered? (Different units treated at different times)
- No → Go to question 2
- Yes → Use CallawaySantAnna
2. Do you have panel data? (Multiple observations per unit over time)
- No → Use DifferenceInDifferences (basic 2x2)
- Yes → Go to question 3
3. Do you need period-specific effects? (Event study design)
- No → Use TwoWayFixedEffects
- Yes → Use MultiPeriodDiD
4. Is your treated group small? (Few treated units, many controls)
- Consider SyntheticDiD for better pre-treatment fit
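
The questions above can be read as a small decision function. The sketch below is only an illustration of the flowchart logic: the function name `choose_estimator` and its arguments are hypothetical and not part of the library, and the last question is treated as advisory because the flowchart phrases it as "consider".

```python
def choose_estimator(staggered, panel, period_specific=False, few_treated=False):
    """Walk the flowchart questions and return (primary, alternatives)."""
    if staggered:                 # staggered treatment timing
        primary = "CallawaySantAnna"
    elif not panel:               # no panel data: basic 2x2
        primary = "DifferenceInDifferences"
    elif period_specific:         # panel data and an event-study design
        primary = "MultiPeriodDiD"
    else:                         # panel data, one overall effect
        primary = "TwoWayFixedEffects"
    # A small treated group is advisory: it suggests also trying SyntheticDiD.
    alternatives = ["SyntheticDiD"] if few_treated else []
    return primary, alternatives


print(choose_estimator(staggered=False, panel=True, period_specific=True))
# -> ('MultiPeriodDiD', [])
```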

Quick Reference

Estimator	Best For	Key Assumption	Output
`DifferenceInDifferences`	Simple 2x2 designs, cross-sectional comparisons	Parallel trends (2 periods)	Single ATT
`TwoWayFixedEffects`	Panel data, simultaneous treatment	Parallel trends (all periods)	Single ATT with unit/time FE
`MultiPeriodDiD`	Event studies, dynamic effects	Parallel trends (pre-periods)	Period-specific effects
`CallawaySantAnna`	Staggered adoption, heterogeneous timing	Conditional parallel trends	Group-time ATT(g,t), aggregations
`SyntheticDiD`	Few treated units, many controls	Synthetic parallel trends	ATT with unit/time weights

Detailed Guidance

Basic 2x2 DiD

Use DifferenceInDifferences when:

- You have a simple before/after, treatment/control design
- Treatment occurs simultaneously for all treated units
- You want a single average treatment effect

from diff_diff import DifferenceInDifferences

did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treated='treated', post='post')
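
To make the quantity being estimated concrete, the 2x2 estimate is the change in the treated group's mean outcome minus the change in the control group's mean outcome. The sketch below computes it by hand on made-up rows; it does not use the library, and the helper names `cell_mean` and `did_2x2` are purely illustrative.

```python
from statistics import mean

# Toy rows: (outcome, treated, post). Values are made up for illustration.
rows = [
    (10.0, 0, 0), (11.0, 0, 0),   # control, pre
    (12.0, 0, 1), (13.0, 0, 1),   # control, post
    (10.5, 1, 0), (11.5, 1, 0),   # treated, pre
    (15.5, 1, 1), (16.5, 1, 1),   # treated, post
]

def cell_mean(rows, treated, post):
    """Mean outcome in one of the four treated-by-post cells."""
    return mean(y for y, d, p in rows if d == treated and p == post)

def did_2x2(rows):
    """(treated post - treated pre) - (control post - control pre)."""
    treated_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    control_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return treated_change - control_change

att = did_2x2(rows)
print(f"ATT: {att:.3f}")   # -> ATT: 3.000
```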

Two-Way Fixed Effects

Use TwoWayFixedEffects when:

- You have panel data with multiple time periods
- Treatment timing is the same for all treated units
- You want to control for unit and time fixed effects
- You don’t need to see period-by-period effects

Warning

TWFE can be biased with staggered treatment timing when treatment effects vary across cohorts or over time. Already-treated units then act as controls for newly treated units, which can put negative weights on some treatment effects. Use CallawaySantAnna for staggered designs.

from diff_diff import TwoWayFixedEffects

twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treated='treated',
                   unit='unit_id', time='period')
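
The warning above can be illustrated without any estimator. The sketch below builds a noise-free toy world with two cohorts whose treatment effect grows by a fixed amount each period, then forms the single 2x2 comparison that uses the already-treated cohort as the control group. The `outcome` function, the adoption periods, and the growth rate are all invented for the illustration.

```python
def outcome(t, first_treat, growth=2.0):
    """Noise-free outcome: common trend t plus an effect that grows after adoption."""
    periods_treated = t - first_treat + 1
    effect = growth * periods_treated if periods_treated > 0 else 0.0
    return t + effect

EARLY, LATE = 2, 4   # adoption periods of the two cohorts

# 2x2 comparison around the late cohort's adoption (period 3 -> 4),
# using the already-treated early cohort as the "control" group.
late_change = outcome(4, LATE) - outcome(3, LATE)
early_change = outcome(4, EARLY) - outcome(3, EARLY)
forbidden_did = late_change - early_change

# True effect on the late cohort in period 4 (outcome minus the common trend).
true_effect = outcome(4, LATE) - 4

print(forbidden_did, true_effect)   # -> 0.0 2.0
```

The early cohort's effect is still growing when the late cohort adopts, so its outcome change absorbs the late cohort's effect and the comparison reports zero. With constant effects the same comparison would recover the true value.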

Multi-Period Event Study

Use MultiPeriodDiD when:

- You want a full event study with pre- and post-treatment effects
- You need pre-period coefficients to assess parallel trends
- You want to visualize treatment effect dynamics over time
- All treated units receive treatment at the same time (simultaneous adoption)

from diff_diff import MultiPeriodDiD, plot_event_study

event = MultiPeriodDiD(reference_period=-1)
results = event.fit(data, outcome='y', treated='treated',
                    time='period', unit='unit_id', treatment_start=5)

# Visualize
plot_event_study(results)
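
As a rough picture of what the period-specific effects represent, the sketch below computes them by hand on a small balanced toy panel: for each period it takes the treated-minus-control gap in mean outcomes and subtracts the same gap in the reference period. This is an assumption-laden illustration (balanced panel, no covariates, made-up data) and not the library's implementation. Pre-period values close to zero are what one hopes to see when assessing parallel trends.

```python
from statistics import mean

TREATMENT_START = 5
REFERENCE = -1   # event time used as the baseline

# Toy panel: unit -> (treated flag, outcomes for periods 3..7). Made-up numbers.
periods = [3, 4, 5, 6, 7]
panel = {
    "a": (1, [1.0, 2.0, 5.0, 7.0, 8.0]),
    "b": (1, [2.0, 3.0, 6.0, 8.0, 9.0]),
    "c": (0, [1.0, 2.0, 3.0, 4.0, 5.0]),
    "d": (0, [3.0, 4.0, 5.0, 6.0, 7.0]),
}

def gap(period_index):
    """Treated-minus-control difference in mean outcomes for one period."""
    treated = mean(ys[period_index] for d, ys in panel.values() if d == 1)
    control = mean(ys[period_index] for d, ys in panel.values() if d == 0)
    return treated - control

# Effect at each event time = gap in that period minus gap in the reference period.
ref_index = periods.index(TREATMENT_START + REFERENCE)
effects = {
    period - TREATMENT_START: gap(i) - gap(ref_index)
    for i, period in enumerate(periods)
}
for event_time, effect in sorted(effects.items()):
    print(f"event time {event_time:+d}: {effect:.2f}")
```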

Callaway-Sant’Anna

Use CallawaySantAnna when:

- Treatment is adopted at different times (staggered rollout)
- You want valid treatment effect estimates with heterogeneous timing
- You need group-time specific effects ATT(g,t)

This is the recommended estimator for most applied work with staggered adoption.

from diff_diff import CallawaySantAnna

cs = CallawaySantAnna(
    control_group='never_treated',  # or 'not_yet_treated'
    estimation_method='dr'  # doubly robust (recommended)
)
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat',
                 covariates=['x1', 'x2'])

# Get aggregated effects
print(f"Overall ATT: {results.att:.3f}")

# Event study aggregation
event_study = results.aggregate('event_time')
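
To show what a single group-time effect means, the sketch below computes an unconditional ATT(g,t) by hand: the outcome change from period g - 1 to period t for cohort g, minus the same change for never-treated units. It ignores covariates and the doubly robust step, and it encodes never-treated units with `first_treat = 0`, which is an assumption of this toy example rather than a documented convention of the library.

```python
from statistics import mean

# Toy panel: unit -> (first_treat, {period: outcome}); 0 marks never-treated here.
panel = {
    "u1": (3, {1: 1.0, 2: 2.0, 3: 5.0, 4: 7.0}),
    "u2": (3, {1: 2.0, 2: 3.0, 3: 6.0, 4: 8.0}),
    "u3": (4, {1: 0.0, 2: 1.0, 3: 2.0, 4: 4.0}),
    "u4": (0, {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}),
    "u5": (0, {1: 3.0, 2: 4.0, 3: 5.0, 4: 6.0}),
}

def att_gt(panel, g, t):
    """Unconditional ATT(g,t) with never-treated controls and base period g - 1."""
    base = g - 1
    cohort = [ys[t] - ys[base] for first, ys in panel.values() if first == g]
    never = [ys[t] - ys[base] for first, ys in panel.values() if first == 0]
    return mean(cohort) - mean(never)

for g, t in [(3, 3), (3, 4), (4, 4)]:
    print(f"ATT(g={g}, t={t}) = {att_gt(panel, g, t):.2f}")
```

Each cohort is compared only with units that are never treated, so no already-treated unit ever serves as a control.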

Synthetic DiD

Use SyntheticDiD when:

- You have few treated units but many control units
- Pre-treatment fit between treated and control is poor
- You want to construct a weighted synthetic control

from diff_diff import SyntheticDiD

sdid = SyntheticDiD()
results = sdid.fit(data, outcome='y', unit='unit_id',
                   time='period', treated='treated',
                   treatment_start=5)

# View the unit weights
print(results.unit_weights)
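
Once unit and time weights are available, the synthetic DiD estimate can be read as a weighted double difference. The sketch below uses hypothetical, hand-picked weights (a real fit would estimate them from pre-treatment data) and made-up outcomes; the names `unit_weights`, `time_weights`, and `weighted_change` belong to this example only.

```python
# Hypothetical weights; a real fit would estimate them from pre-treatment data.
unit_weights = {"c1": 0.7, "c2": 0.3}   # control-unit weights, sum to 1
time_weights = {1: 0.25, 2: 0.75}       # pre-period weights, sum to 1
POST = [3]                              # post-treatment periods

outcomes = {
    "t1": {1: 4.0, 2: 5.0, 3: 9.0},     # treated unit
    "c1": {1: 3.0, 2: 4.0, 3: 5.0},     # control units
    "c2": {1: 6.0, 2: 7.0, 3: 8.0},
}
treated_units = ["t1"]

def weighted_change(series):
    """Post-period mean minus the time-weighted pre-period mean for one unit."""
    post = sum(series[t] for t in POST) / len(POST)
    pre = sum(w * series[t] for t, w in time_weights.items())
    return post - pre

# Treated units are averaged equally; control units use the unit weights.
treated_change = sum(weighted_change(outcomes[u]) for u in treated_units) / len(treated_units)
control_change = sum(w * weighted_change(outcomes[u]) for u, w in unit_weights.items())
att = treated_change - control_change
print(f"ATT: {att:.3f}")   # -> ATT: 3.000
```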

Common Pitfalls

Using TWFE with staggered adoption

TWFE estimates a weighted average of all 2x2 comparisons, including “forbidden” comparisons where already-treated units serve as controls. When treatment effects vary across cohorts or over time, this can bias the estimate severely and can even put negative weights on some treatment effects.

Solution: Use CallawaySantAnna for staggered designs.

Ignoring treatment effect heterogeneity

If treatment effects vary by cohort (when units are treated) or over time (dynamic effects), aggregated estimators may be misleading.

Solution: Use CallawaySantAnna and examine ATT(g,t) and event study plots.

Failing to test parallel trends

The parallel trends assumption is untestable in the post-period but can be assessed using pre-treatment data.

Solution: Use check_parallel_trends() and HonestDiD for sensitivity analysis.

Inappropriate clustering

Standard errors should typically be clustered at the level of treatment assignment (often the unit level).

Solution: Always specify cluster_col for panel data.

Standard Error Methods

Different estimators compute standard errors differently. Understanding these differences helps interpret results and choose appropriate inference.

Estimator	Default SE Method	Details
`DifferenceInDifferences`	HC1 (heteroskedasticity-robust)	Uses White’s robust SEs by default. Specify `cluster_col` for cluster-robust SEs. Use `inference='wild_bootstrap'` for few clusters (<30).
`TwoWayFixedEffects`	Cluster-robust (unit level)	Always clusters at unit level after within-transformation. Specify `cluster_col` to override. Use `inference='wild_bootstrap'` for few clusters.
`MultiPeriodDiD`	HC1 (heteroskedasticity-robust)	Same as basic DiD. Cluster-robust available via `cluster_col`. Wild bootstrap not yet supported for multi-coefficient inference.
`CallawaySantAnna`	Analytical (simple difference)	Uses simple variance of group-time means. Use `bootstrap()` method for multiplier bootstrap inference with proper SEs, CIs, and p-values.
`SyntheticDiD`	Bootstrap or placebo-based	Default uses bootstrap resampling. Set `n_bootstrap=0` for placebo-based inference using pre-treatment residuals.

Recommendations by sample size:

- Large samples (N > 1000, clusters > 50): Default analytical SEs are reliable
- Medium samples (clusters 30-50): Cluster-robust SEs recommended
- Small samples (clusters < 30): Use wild cluster bootstrap (inference='wild_bootstrap')
- Very few clusters (< 10): Use Webb 6-point distribution (weight_type='webb')
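
The thresholds above can be written as a small helper, shown here together with the Webb six-point weights themselves. This is a sketch rather than library code: `recommend_inference` and `draw_webb_weights` are hypothetical names, and the helper looks only at the number of clusters, not at the total sample size.

```python
import math
import random

# Webb six-point weights: each value is drawn with probability 1/6,
# which gives the weights mean 0 and variance 1.
WEBB_POINTS = [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
               math.sqrt(0.5), 1.0, math.sqrt(1.5)]

def recommend_inference(n_clusters):
    """Map a cluster count to the rule of thumb in the list above."""
    if n_clusters < 10:
        return "wild bootstrap with Webb weights"
    if n_clusters < 30:
        return "wild cluster bootstrap"
    if n_clusters <= 50:
        return "cluster-robust SEs"
    return "default analytical SEs"

def draw_webb_weights(n_clusters, seed=0):
    """Draw one bootstrap weight per cluster from the six-point distribution."""
    rng = random.Random(seed)
    return [rng.choice(WEBB_POINTS) for _ in range(n_clusters)]

print(recommend_inference(8))    # -> wild bootstrap with Webb weights
print(draw_webb_weights(8))
```

With very few clusters, two-point weights produce only a small number of distinct bootstrap samples. Six points give more distinct draws, which is the usual motivation for the Webb distribution.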

Common pitfall: Forgetting to cluster when units are observed multiple times. For panel data, always cluster at the unit level unless you have a strong reason not to.

from diff_diff import DifferenceInDifferences

# Good: Cluster at unit level for panel data
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treated='treated',
                  post='post', cluster_col='unit_id')

# Better for few clusters: Wild bootstrap
did = DifferenceInDifferences(inference='wild_bootstrap')
results = did.fit(data, outcome='y', treated='treated',
                  post='post', cluster_col='state')

When in Doubt

If you’re unsure which estimator to use:

1. Start with CallawaySantAnna - It’s valid even for non-staggered designs and provides the most flexible output (group-time effects, aggregations)
2. Check for heterogeneity - Plot event studies to see if effects vary
3. Run sensitivity analysis - Use HonestDiD to assess robustness
4. Compare estimators - If results differ substantially across estimators, investigate why (the differences often reveal violated assumptions)