.. meta::
   :description: Guide to choosing the right Difference-in-Differences estimator. Covers basic DiD, TWFE, staggered adoption methods (Callaway-Sant'Anna, Sun-Abraham), Synthetic DiD, and more.
   :keywords: which DiD estimator, staggered DiD estimator, difference-in-differences method selection, TWFE alternatives

Choosing an Estimator
=====================

This guide helps you select the right estimator for your research design.

Decision Flowchart
------------------

Start here and follow the questions:

0. **Is this a triple-difference (DDD) design?** (Two criteria for treatment: e.g., policy adoption AND group eligibility)

   - **No** → Go to question 1
   - **Yes, simultaneous treatment (2×2×2)** → Use :class:`~diff_diff.TripleDifference`
   - **Yes, with staggered timing** → Use :class:`~diff_diff.StaggeredTripleDifference`

1. **Is treatment continuous?** (Units receive different doses or intensities)

   - **No** → Go to question 2
   - **Yes** → Use :class:`~diff_diff.ContinuousDiD`

2. **Can treatment switch on AND off?** (Reversible / non-absorbing treatment — e.g., marketing campaigns, seasonal promotions, on/off policy cycles)

   - **No (treatment is absorbing — once treated, stays treated)** → Go to question 3
   - **Yes** → Use :class:`~diff_diff.ChaisemartinDHaultfoeuille` — the only library estimator that handles non-absorbing treatments

3. **Is treatment staggered?** (Different units treated at different times)

   - **No** → Go to question 4
   - **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All)
   - **Yes, and you suspect homogeneous effects** → Use :class:`~diff_diff.ImputationDiD` or :class:`~diff_diff.TwoStageDiD` for tighter CIs
   - **Yes, with nonlinear outcome (binary/count)** → Use :class:`~diff_diff.WooldridgeDiD` with ``method='logit'`` or ``method='poisson'``
   - **Want to diagnose TWFE bias?** → Use :class:`~diff_diff.BaconDecomposition` first

4. **Do you have panel data?** (Multiple observations per unit over time)

   - **No** → Use :class:`~diff_diff.DifferenceInDifferences` (basic 2x2)
   - **Yes** → Go to question 5

5. **Do you need period-specific effects?** (Event study design)

   - **No** → Use :class:`~diff_diff.TwoWayFixedEffects`
   - **Yes** → Use :class:`~diff_diff.MultiPeriodDiD`

6. **Is your treated group small?** (Few treated units, many controls; ask this in addition to questions 4 and 5)

   - **Yes** → Consider :class:`~diff_diff.SyntheticDiD` for better pre-treatment fit
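
The questions above can be read as a chain of early returns. The sketch below
is one hypothetical way to encode questions 0 to 5 in plain Python.
``choose_estimator`` and its keyword arguments are illustrative names, not part
of ``diff_diff``, and the function returns only the primary recommendation for
each branch.

.. code-block:: python

   def choose_estimator(*, ddd=False, continuous=False, reversible=False,
                        staggered=False, panel=True, event_study=False):
       """Walk questions 0-5 of the flowchart and return an estimator name."""
       if ddd:                      # question 0
           return "StaggeredTripleDifference" if staggered else "TripleDifference"
       if continuous:               # question 1
           return "ContinuousDiD"
       if reversible:               # question 2
           return "ChaisemartinDHaultfoeuille"
       if staggered:                # question 3
           return "CallawaySantAnna"
       if not panel:                # question 4
           return "DifferenceInDifferences"
       # question 5
       return "MultiPeriodDiD" if event_study else "TwoWayFixedEffects"


   print(choose_estimator(staggered=True))
   print(choose_estimator(panel=False))

The ordering matters: a reversible treatment is checked before staggering
because the staggered estimators assume treatment is absorbing.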

Quick Reference
---------------

.. list-table::
   :header-rows: 1
   :widths: 20 30 25 25

   * - Estimator
     - Best For
     - Key Assumption
     - Output
   * - ``DifferenceInDifferences``
     - Simple 2x2 designs, cross-sectional comparisons
     - Parallel trends (2 periods)
     - Single ATT
   * - ``TwoWayFixedEffects``
     - Panel data, simultaneous treatment
     - Parallel trends (all periods)
     - Single ATT with unit/time FE
   * - ``MultiPeriodDiD``
     - Event studies, dynamic effects
     - Parallel trends (pre-periods)
     - Period-specific effects
   * - ``CallawaySantAnna``
     - Staggered adoption, heterogeneous timing
     - Conditional parallel trends
     - Group-time ATT(g,t), aggregations
   * - ``ChaisemartinDHaultfoeuille``
     - Reversible / non-absorbing treatments (only library option)
     - Parallel trends + A5 (no crossing) + A11 (stable controls)
     - DID_l event study (L_max), normalized DID^n_l, cost-benefit delta, placebos, sup-t bands, TWFE diagnostic
   * - ``SyntheticDiD``
     - Few treated units, many controls
     - Synthetic parallel trends
     - ATT with unit/time weights
   * - ``EfficientDiD``
     - Staggered adoption with optimal efficiency
     - PT-All (overidentified) or PT-Post
     - Group-time ATT(g,t), aggregations
   * - ``ContinuousDiD``
     - Continuous dose / treatment intensity
     - Strong Parallel Trends (SPT) for dose-response; PT for binarized ATT
     - ATT\ :sup:`loc` (PT); ATT(d), ACRT(d) (SPT)
   * - ``HeterogeneousAdoptionDiD``
     - Universal rollout, dose varies, no untreated unit
     - dCDH 2026 Assumptions (Design 1' QUG case or Design 1 with A6/A5)
     - WAS or WAS\ :sub:`d_lower` per resolved estimand; event-study Appendix B.2
   * - ``SunAbraham``
     - Staggered adoption, interaction-weighted
     - Conditional parallel trends
     - Cohort-specific ATTs, event study
   * - ``ImputationDiD``
     - Staggered, homogeneous effects
     - Unit + time FE structure
     - Imputed treatment effects, event study
   * - ``TwoStageDiD``
     - Staggered adoption, efficient
     - Unit + time FE structure
     - Single ATT or event study
   * - ``StackedDiD``
     - Staggered, sub-experiment approach
     - Parallel trends per cohort
     - Trimmed aggregate ATT
   * - ``TROP``
     - Factor confounding suspected
     - Factor model + weights
     - ATT with triple robustness
   * - ``TripleDifference``
     - Two eligibility criteria (DDD)
     - Parallel trends for both dimensions
     - DDD ATT (regression, IPW, or DR)
   * - ``StaggeredTripleDifference``
     - Staggered DDD with treatment timing
     - Conditional parallel trends (DDD)
     - Group-time ATT(g,t), aggregations
   * - ``WooldridgeDiD``
     - Nonlinear outcomes or saturated OLS
     - Conditional parallel trends
     - OLS: direct coefficients; logit/Poisson: ASF-based ATT
   * - ``BaconDecomposition``
     - TWFE diagnostic
     - (diagnostic tool)
     - 2x2 decomposition weights

Detailed Guidance
-----------------

Basic 2x2 DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.DifferenceInDifferences` when:

- You have a simple before/after, treatment/control design
- Treatment occurs simultaneously for all treated units
- You want a single average treatment effect

.. code-block:: python

   from diff_diff import DifferenceInDifferences

   did = DifferenceInDifferences()
   results = did.fit(data, outcome='y', treatment='treated', time='post')
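
For intuition, the single ATT from a 2x2 design is the treated group's
before/after change minus the control group's change. A minimal sketch on
made-up numbers, using only the standard library (the data and the
``cell_mean`` helper are illustrative assumptions, not library code):

.. code-block:: python

   from statistics import mean

   # Hypothetical toy data: (treated, post, outcome)
   rows = [
       (0, 0, 10.0), (0, 0, 12.0),   # control, before
       (0, 1, 13.0), (0, 1, 15.0),   # control, after
       (1, 0, 20.0), (1, 0, 22.0),   # treated, before
       (1, 1, 28.0), (1, 1, 30.0),   # treated, after
   ]

   def cell_mean(treated, post):
       """Average outcome in one of the four treatment-by-period cells."""
       return mean(y for d, p, y in rows if d == treated and p == post)

   treated_change = cell_mean(1, 1) - cell_mean(1, 0)   # 29 - 21 = 8
   control_change = cell_mean(0, 1) - cell_mean(0, 0)   # 14 - 11 = 3
   att = treated_change - control_change
   print(att)  # 5.0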

Two-Way Fixed Effects
~~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.TwoWayFixedEffects` when:

- You have panel data with multiple time periods
- Treatment timing is the same for all treated units
- You want to control for unit and time fixed effects
- You don't need to see period-by-period effects

.. warning::

   TWFE can be biased with staggered treatment timing. Already-treated units
   act as controls for newly-treated units, so some treatment effects can enter
   the estimate with negative weights.
   Use :class:`~diff_diff.CallawaySantAnna` for staggered designs.

.. code-block:: python

   from diff_diff import TwoWayFixedEffects

   twfe = TwoWayFixedEffects()
   results = twfe.fit(data, outcome='y', treatment='treated',
                      unit='unit_id', time='period')

Multi-Period Event Study
~~~~~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.MultiPeriodDiD` when:

- You want a full event-study with pre and post treatment effects
- You need pre-period coefficients to assess parallel trends
- You want to visualize treatment effect dynamics over time
- All treated units receive treatment at the same time (simultaneous adoption)

.. code-block:: python

   from diff_diff import MultiPeriodDiD, plot_event_study

   event = MultiPeriodDiD()
   results = event.fit(data, outcome='y', treatment='treated',
                       time='period', unit='unit_id', reference_period=2)

   # Visualize
   plot_event_study(results)
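
With simultaneous adoption, each event-study coefficient can be thought of as
a 2x2 comparison between period ``t`` and the reference period. The sketch
below computes these by hand from hypothetical group means; it illustrates the
idea only and is not how the estimator is implemented.

.. code-block:: python

   # Hypothetical group-mean outcomes by period. Treatment starts in period 3,
   # and period 2 is the reference period.
   means = {
       "treated": {0: 5.0, 1: 6.0, 2: 7.0, 3: 10.0, 4: 12.0},
       "control": {0: 3.0, 1: 4.0, 2: 5.0, 3: 6.0, 4: 7.0},
   }

   def event_study(means, reference_period):
       """Period-specific effects relative to the reference period."""
       effects = {}
       for t in means["treated"]:
           if t == reference_period:
               continue  # normalized to zero by construction
           d_treated = means["treated"][t] - means["treated"][reference_period]
           d_control = means["control"][t] - means["control"][reference_period]
           effects[t] = d_treated - d_control
       return effects

   effects = event_study(means, reference_period=2)
   print(effects)  # {0: 0.0, 1: 0.0, 3: 2.0, 4: 3.0}

Pre-period values near zero (periods 0 and 1 here) are what you hope to see
when assessing parallel trends.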

Callaway-Sant'Anna
~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.CallawaySantAnna` when:

- Treatment is adopted at different times (staggered rollout)
- You want valid treatment effect estimates with heterogeneous timing
- You need group-time specific effects ATT(g,t)

This is the recommended estimator for most applied work with staggered adoption.

.. code-block:: python

   from diff_diff import CallawaySantAnna

   cs = CallawaySantAnna(
       control_group='never_treated',  # or 'not_yet_treated'
       estimation_method='dr'  # doubly robust (recommended)
   )
   results = cs.fit(data, outcome='y', unit='unit_id',
                    time='period', first_treat='first_treat',
                    covariates=['x1', 'x2'])

   # Overall ATT
   print(f"Overall ATT: {results.overall_att:.3f}")

   # Event study aggregation
   es = cs.fit(data, outcome='y', unit='unit_id',
               time='period', first_treat='first_treat',
               covariates=['x1', 'x2'], aggregate='event_study')
   event_study_df = es.to_dataframe('event_study')
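
To make ATT(g,t) concrete, the sketch below computes it by hand for a tiny
hypothetical panel. It assumes no covariates, never-treated controls, and the
period just before adoption (``g - 1``) as the base period; these are
simplifying assumptions of the example, not a description of the library's
internals.

.. code-block:: python

   from statistics import mean

   # unit -> (first treatment period or None if never treated, outcomes for periods 0..4)
   panel = {
       "a": (2, [1.0, 2.0, 5.0, 6.0, 7.0]),
       "b": (2, [2.0, 3.0, 6.0, 7.0, 8.0]),
       "c": (3, [0.0, 1.0, 2.0, 4.0, 5.0]),
       "d": (None, [1.0, 2.0, 3.0, 4.0, 5.0]),
       "e": (None, [3.0, 4.0, 5.0, 6.0, 7.0]),
   }

   def att_gt(panel, g, t):
       """Change from period g-1 to t for cohort g, minus the same change for never-treated units."""
       base = g - 1
       cohort = [y[t] - y[base] for first, y in panel.values() if first == g]
       never = [y[t] - y[base] for first, y in panel.values() if first is None]
       return mean(cohort) - mean(never)

   for g, t in [(2, 2), (2, 3), (3, 3), (3, 4)]:
       print(f"ATT({g},{t}) = {att_gt(panel, g, t):.1f}")

Each cohort is only ever compared with units that are untreated in both
periods, which is what avoids the problematic TWFE comparisons.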

Reversible (Non-Absorbing) Treatment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.ChaisemartinDHaultfoeuille` (alias :class:`~diff_diff.DCDH`) when:

- Treatment can switch on **and** off over time (e.g., marketing campaigns,
  seasonal promotions, on/off policy cycles)
- You need separate joiners (``DID_+``) and leavers (``DID_-``) views, plus
  the aggregate ``DID_M``
- You want a built-in placebo and a TWFE decomposition diagnostic computed
  on the data you pass in (pre-filter) for direct comparison against
  ``DID_M``
- You want a multi-horizon event study (pass ``L_max`` to ``fit()``) with
  normalized effects, cost-benefit aggregation, dynamic placebos, and
  sup-t simultaneous confidence bands

This is **the only library estimator that handles non-absorbing treatments**.
All other staggered estimators
(:class:`~diff_diff.CallawaySantAnna`, :class:`~diff_diff.SunAbraham`,
:class:`~diff_diff.ImputationDiD`, :class:`~diff_diff.TwoStageDiD`,
:class:`~diff_diff.EfficientDiD`, :class:`~diff_diff.WooldridgeDiD`) assume
treatment is absorbing: once treated, a unit stays treated.

The estimator ships ``DID_M`` (= ``DID_1``) from de Chaisemartin & D'Haultfœuille
(2020), the full multi-horizon event study ``DID_l`` for ``l = 1..L_max``
from the dynamic companion paper (NBER WP 29873), residualization-style
covariate adjustment (``controls``), group-specific linear trends
(``trends_linear``), state-set-specific trends (``trends_nonparam``),
heterogeneity testing, non-binary treatment, HonestDiD sensitivity
integration on placebos, and survey support via Taylor-series linearization.

.. code-block:: python

   from diff_diff import ChaisemartinDHaultfoeuille
   from diff_diff.prep import generate_reversible_did_data

   data = generate_reversible_did_data(n_groups=80, n_periods=6, seed=42)

   est = ChaisemartinDHaultfoeuille()
   results = est.fit(
       data,
       outcome="outcome",
       group="group",
       time="period",
       treatment="treatment",
   )
   results.print_summary()

   print(f"DID_M (overall): {results.overall_att:.3f}")
   print(f"DID_+ (joiners): {results.joiners_att:.3f}")
   print(f"DID_- (leavers): {results.leavers_att:.3f}")
   print(f"Placebo:         {results.placebo_effect:.3f}")

.. note::

   By default, the estimator drops groups whose treatment switches more
   than once before estimation (``drop_larger_lower=True``, matching the R
   ``DIDmultiplegtDYN`` reference). This is required for the analytical
   variance formula to be consistent with the point estimate. Each drop
   emits an explicit warning.

.. note::

   Single-period placebo ``DID_M^pl`` (``L_max=None``) has ``NaN`` SE -
   the per-period aggregation path has no influence-function derivation,
   so inference fields stay ``NaN`` even when ``n_bootstrap > 0``. The
   point estimate is meaningful for visual pre-trends inspection.
   Multi-horizon dynamic placebos ``DID^{pl}_l`` (``L_max >= 1``) have
   valid analytical SE and bootstrap SE via the placebo IF. See
   ``docs/methodology/REGISTRY.md`` for the full contract.

.. note::

   ``ChaisemartinDHaultfoeuille`` supports ``survey_design`` with pweight
   and strata/PSU/FPC via Taylor Series Linearization. Replicate weights
   are not yet supported.
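
The joiners and leavers views can be illustrated by hand for a single pair of
consecutive periods: joiners are compared with groups that stay untreated, and
leavers with groups that stay treated. The sketch below uses hypothetical,
equally sized groups and plain Python; it illustrates the comparison logic
only, not the estimator's implementation or its variance.

.. code-block:: python

   from statistics import mean

   # group -> ((D at t-1, D at t), (Y at t-1, Y at t))
   groups = {
       "g1": ((0, 1), (10.0, 14.0)),  # joiner
       "g2": ((0, 1), (12.0, 16.0)),  # joiner
       "g3": ((0, 0), (8.0, 9.0)),    # stable untreated
       "g4": ((0, 0), (9.0, 10.0)),   # stable untreated
       "g5": ((1, 0), (20.0, 18.0)),  # leaver
       "g6": ((1, 1), (21.0, 22.0)),  # stable treated
   }

   def mean_change(d_prev, d_now):
       """Average outcome change among groups with treatment path (d_prev, d_now)."""
       return mean(y1 - y0 for (dp, dn), (y0, y1) in groups.values()
                   if (dp, dn) == (d_prev, d_now))

   def count(d_prev, d_now):
       """Number of groups with treatment path (d_prev, d_now)."""
       return sum(1 for path, _ in groups.values() if path == (d_prev, d_now))

   did_plus = mean_change(0, 1) - mean_change(0, 0)    # joiners vs. stable untreated
   did_minus = mean_change(1, 1) - mean_change(1, 0)   # stable treated vs. leavers
   n_join, n_leave = count(0, 1), count(1, 0)
   did_m = (n_join * did_plus + n_leave * did_minus) / (n_join + n_leave)
   print(did_plus, did_minus, did_m)  # 3.0 3.0 3.0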

Synthetic DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.SyntheticDiD` when:

- You have few treated units but many control units
- Pre-treatment fit between treated and control is poor
- You want to construct a weighted synthetic control

.. code-block:: python

   from diff_diff import SyntheticDiD, generate_did_data

   # SyntheticDiD requires block treatment (constant within units)
   block_data = generate_did_data(n_units=40, n_periods=10, treatment_effect=2.0)
   sdid = SyntheticDiD()
   results = sdid.fit(block_data, outcome='outcome', unit='unit',
                      time='period', treatment='treated')

   # View the unit weights
   print(results.unit_weights)
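
Once unit weights ω and time weights λ are known, the synthetic DiD estimate
in a block design is a weighted double difference. The sketch below takes the
weights as given (the estimator chooses them by optimization, which is not
shown) and uses hypothetical data; all names are illustrative.

.. code-block:: python

   from statistics import mean

   n_pre = 3  # three pre-treatment periods followed by one post-treatment period
   controls = {"c1": [1.0, 2.0, 3.0, 4.0], "c2": [5.0, 5.0, 6.0, 7.0]}
   treated = {"t1": [3.0, 3.5, 4.5, 8.0]}
   omega = {"c1": 0.5, "c2": 0.5}   # assumed unit weights (sum to 1)
   lam = [0.0, 0.5, 0.5]            # assumed pre-period time weights (sum to 1)

   def weighted_change(y):
       """Post-period mean minus the lambda-weighted pre-period average."""
       pre = sum(l * v for l, v in zip(lam, y[:n_pre]))
       return mean(y[n_pre:]) - pre

   treated_change = mean(weighted_change(y) for y in treated.values())
   control_change = sum(omega[i] * weighted_change(y) for i, y in controls.items())
   tau = treated_change - control_change
   print(tau)  # 2.5

Setting ``omega`` and ``lam`` to uniform weights would reduce this to an
ordinary DiD on group means.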

Continuous Treatment
~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.ContinuousDiD` when:

- Treatment varies in **intensity or dose** (e.g., subsidy amount, hours of training)
- You want to estimate how effects change with treatment dose
- You need the full dose-response curve, not just a single average effect
- You have staggered adoption where units receive different treatment levels

.. note::

   Dose-response curves ATT(d) and ACRT(d) require **Strong Parallel Trends (SPT)**.
   Under standard PT only the binarized ATT\ :sup:`loc` is identified.
   Data must include an untreated group (D = 0), a balanced panel, and
   time-invariant dose (each unit's dose is fixed across periods).

.. code-block:: python

   from diff_diff import ContinuousDiD, generate_continuous_did_data

   data = generate_continuous_did_data(n_units=200, seed=42)

   est = ContinuousDiD(n_bootstrap=199, seed=42)
   results = est.fit(data, outcome='outcome', unit='unit',
                     time='period', first_treat='first_treat',
                     dose='dose', aggregate='dose')

   # Overall effect and dose-response curve
   print(f"Overall ATT: {results.overall_att:.3f}")
   att_curve = results.dose_response_att.to_dataframe()

Universal Rollout / No Untreated Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.HeterogeneousAdoptionDiD` when:

- **Every unit is treated at the post period** (universal-rollout policy,
  industry-wide tariff change, simultaneous launch into all markets)
- Treatment **intensity (dose) varies across units**, but no genuinely
  untreated control group exists to anchor a standard DiD contrast
- :class:`~diff_diff.ContinuousDiD` is unavailable because its untreated-group
  requirement (``D = 0``) is violated

The estimator implements de Chaisemartin, Ciccia, D'Haultfoeuille and Knau
(2026, arXiv:2405.04465v6) and resolves to one of two estimands depending on
the dose support:

- **Design 1' (QUG case, ``d_lower = 0``)** identifies the **Weighted Average
  Slope (WAS)** under the Quasi-Untreated-Group assumption (units with the
  smallest dose serve as the comparison anchor). The shipped result class
  exposes ``target_parameter == "WAS"``.
- **Design 1 (no QUG, ``d_lower > 0``)** identifies ``WAS_{d_lower}`` under
  Assumption 6, or sign identification only under Assumption 5; neither
  additional assumption is testable via pre-trends. Result class exposes
  ``target_parameter == "WAS_d_lower"``.

The dose-distribution path is auto-detected. Run
:func:`~diff_diff.did_had_pretest_workflow` to vet the identifying assumptions
before estimation; see :doc:`api/had` for the full API and SE-regime contract.

.. code-block:: python

   import numpy as np
   import pandas as pd
   from diff_diff import HeterogeneousAdoptionDiD, did_had_pretest_workflow

   # Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
   rng = np.random.default_rng(42)
   G, F, T = 200, 4, 5
   doses = rng.beta(0.5, 1.0, size=G)
   rows = []
   for g in range(G):
       for t in range(1, T + 1):
           y = (rng.normal()
                + (doses[g] + doses[g] ** 2) * (t >= F)
                + rng.normal(0, 0.5))
           d = doses[g] if t >= F else 0.0
           rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
   had_data = pd.DataFrame(rows)

   pretests = did_had_pretest_workflow(had_data, outcome_col='y', unit_col='unit',
                                       time_col='period', dose_col='dose',
                                       aggregate='event_study')

   est = HeterogeneousAdoptionDiD()
   results = est.fit(had_data, outcome_col='y', unit_col='unit',
                     time_col='period', dose_col='dose',
                     aggregate='event_study')

   # Event-study results: per-horizon WAS at each event time
   for e, att in zip(results.event_times, results.att):
       print(f"  e={e}: {att:.3f}")

Efficient DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.EfficientDiD` when:

- You have staggered adoption and want **maximum statistical efficiency** on the no-covariate path
- You believe parallel trends holds across all pre-treatment periods (PT-All)
- You want tighter confidence intervals than Callaway-Sant'Anna
- You need a formal efficiency benchmark for comparing estimators

.. note::

   EfficientDiD supports covariate adjustment via a doubly-robust path with
   all nuisances estimated nonparametrically: sieve-based propensity score
   ratios, a sieve outcome regression (polynomial basis, AIC/BIC order
   selection), and a kernel-smoothed conditional covariance. The DR property
   gives consistency if either the outcome regression or the PS is correctly
   specified, and the covariate path attains the semiparametric efficiency
   bound asymptotically under the paper's regularity conditions (a growing
   sieve; degree 1 reproduces a linear working model, and ``sieve_k_max=1``
   forces all covariate-path sieves to degree 1). Pass column
   names to the ``covariates`` parameter on ``fit()``. See
   ``docs/methodology/REGISTRY.md`` for the full contract.

.. code-block:: python

   from diff_diff import EfficientDiD

   edid = EfficientDiD(pt_assumption="all")  # or "post" for post-treatment CS match
   results = edid.fit(data, outcome='y', unit='unit_id',
                      time='period', first_treat='first_treat',
                      aggregate='all')
   results.print_summary()

Sun-Abraham
~~~~~~~~~~~

Use :class:`~diff_diff.SunAbraham` when:

- You have staggered adoption and want an interaction-weighted event study
- You want to decompose effects by cohort and relative time
- You need a regression-based complement to Callaway-Sant'Anna

Sun & Abraham (2021) uses a saturated TWFE regression with cohort x relative-time
interactions, then aggregates cohort-specific effects using interaction weights.

.. code-block:: python

   from diff_diff import SunAbraham

   sa = SunAbraham(control_group='never_treated')
   results = sa.fit(data, outcome='y', unit='unit_id',
                    time='period', first_treat='first_treat')
   results.print_summary()

.. note::

   Running both Sun-Abraham and Callaway-Sant'Anna provides a useful robustness
   check. Both are consistent under heterogeneous treatment effects.
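
The aggregation step can be sketched in a few lines: the event-study effect at
relative time ``e`` is the average of cohort-specific effects, weighted by each
cohort's share among the cohorts observed at ``e``. The inputs below are
hypothetical numbers standing in for the regression output.

.. code-block:: python

   # Hypothetical cohort-specific effects keyed by (cohort, relative_time),
   # plus the number of units in each cohort.
   catt = {(2, 0): 1.0, (2, 1): 2.0, (3, 0): 3.0}
   cohort_size = {2: 30, 3: 10}

   def iw_effect(e):
       """Share-weighted average of cohort effects at relative time e."""
       cohorts = [g for (g, rel) in catt if rel == e]
       total = sum(cohort_size[g] for g in cohorts)
       return sum(cohort_size[g] / total * catt[(g, e)] for g in cohorts)

   print(iw_effect(0))  # 1.5  (cohorts 2 and 3, weights 0.75 and 0.25)
   print(iw_effect(1))  # 2.0  (only cohort 2 is observed at e = 1)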

Imputation DiD
~~~~~~~~~~~~~~

Use :class:`~diff_diff.ImputationDiD` when:

- You have staggered adoption with homogeneous treatment effects
- You want shorter confidence intervals than Callaway-Sant'Anna (roughly 50% shorter when effects are homogeneous)
- You need imputed counterfactual outcomes for treated observations

Borusyak, Jaravel & Spiess (2024) estimate unit + time FE on untreated observations,
impute counterfactual Y(0) for treated observations, then aggregate.

.. code-block:: python

   from diff_diff import ImputationDiD

   imp = ImputationDiD()
   results = imp.fit(data, outcome='y', unit='unit_id',
                     time='period', first_treat='first_treat',
                     aggregate='event_study')
   results.print_summary()

.. note::

   Under homogeneous effects, ImputationDiD is semiparametrically efficient.
   If you suspect heterogeneous effects across cohorts, prefer Callaway-Sant'Anna.

Two-Stage DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.TwoStageDiD` when:

- You want the same point estimates as ImputationDiD with a different variance estimator
- You prefer the GMM sandwich variance that accounts for first-stage uncertainty
- You want a single ATT or an event study from a two-stage procedure

Gardner (2022) estimates FE on untreated obs (stage 1), residualizes all outcomes,
then regresses residuals on treatment indicators (stage 2).

.. code-block:: python

   from diff_diff import TwoStageDiD

   ts = TwoStageDiD()
   results = ts.fit(data, outcome='y', unit='unit_id',
                    time='period', first_treat='first_treat',
                    aggregate='event_study')
   results.print_summary()

.. note::

   Point estimates are identical to ImputationDiD; the key difference is the
   variance estimator (GMM sandwich vs. conservative clustered).

Stacked DiD
~~~~~~~~~~~

Use :class:`~diff_diff.StackedDiD` when:

- You have staggered adoption and want a sub-experiment approach
- You want to avoid forbidden comparisons in TWFE by construction
- You need corrective Q-weights for unbiased stacked estimation

Wing, Freedman & Hollingsworth (2024) create one sub-experiment per adoption cohort
with clean controls and apply Q-weights to reweight the stacked regression.

.. code-block:: python

   from diff_diff import StackedDiD

   stk = StackedDiD(kappa_pre=2, kappa_post=3)
   results = stk.fit(data, outcome='y', unit='unit_id',
                     time='period', first_treat='first_treat',
                     aggregate='event_study')
   results.print_summary()

.. note::

   The trimmed aggregate ATT may exclude early or late cohorts whose event
   windows do not fit in the data. Check ``results.trimmed_groups``.
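
The trimming and clean-control logic can be sketched as follows. This is a
hypothetical helper in plain Python: it assumes a cohort is kept only when its
full event window lies inside the observed periods, and that clean controls are
units still untreated at the end of that window. The names are illustrative and
the library's exact rules may differ.

.. code-block:: python

   first_treat = {"u1": 3, "u2": 3, "u3": 5, "u4": 8, "u5": None}  # None = never treated
   periods = range(1, 10)  # observed periods 1..9

   def build_sub_experiments(first_treat, periods, kappa_pre, kappa_post):
       """Return one sub-experiment per usable cohort, plus the trimmed cohorts."""
       t_min, t_max = min(periods), max(periods)
       stacks, trimmed = {}, []
       for a in sorted({g for g in first_treat.values() if g is not None}):
           lo, hi = a - kappa_pre, a + kappa_post
           if lo < t_min or hi > t_max:
               trimmed.append(a)  # event window does not fit in the data
               continue
           stacks[a] = {
               "window": (lo, hi),
               "treated": [u for u, g in first_treat.items() if g == a],
               # clean controls: never treated, or first treated after the window ends
               "controls": [u for u, g in first_treat.items() if g is None or g > hi],
           }
       return stacks, trimmed

   stacks, trimmed = build_sub_experiments(first_treat, periods, kappa_pre=2, kappa_post=3)
   print(stacks[3]["controls"])  # ['u4', 'u5']
   print(stacks[5]["controls"])  # ['u5']
   print(trimmed)                # [8]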

TROP
~~~~

Use :class:`~diff_diff.TROP` when:

- You suspect interactive fixed effects (factor confounding)
- Standard parallel trends may not hold due to unobserved factors
- You want triple robustness: factor model + unit weights + time weights

Athey, Imbens, Qu & Viviano (2025) combine nuclear norm regularization,
exponential unit distance weights, and time decay weights with LOOCV tuning.

.. code-block:: python

   from diff_diff import TROP

   trop = TROP(n_bootstrap=200)
   results = trop.fit(data, outcome='y', treatment='treated',
                      unit='unit_id', time='period')
   results.print_summary()

.. note::

   TROP is computationally intensive. Use ``method='global'`` for faster
   estimation at the cost of some flexibility vs. ``method='local'``.

Bacon Decomposition
~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.BaconDecomposition` when:

- You want to **diagnose** whether TWFE is biased in your staggered setting
- You need to see which 2x2 comparisons drive the TWFE estimate
- You want to check whether later-vs-earlier or already-treated-as-control comparisons carry substantial weight

Goodman-Bacon (2021) decomposes the TWFE estimate into a weighted average of
all 2x2 DiD comparisons, reporting each comparison's estimate and its weight.

.. code-block:: python

   from diff_diff import BaconDecomposition, plot_bacon

   bacon = BaconDecomposition()
   results = bacon.fit(data, outcome='y', unit='unit_id',
                       time='period', first_treat='first_treat')
   results.print_summary()

   # Visualize the decomposition
   plot_bacon(results)

.. note::

   This is a diagnostic tool, not an estimator. If the decomposition reveals
   problematic weights, switch to Callaway-Sant'Anna or another robust estimator.
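
To see why comparisons that use already-treated units as controls are a
problem, consider a noiseless hypothetical example: an early cohort whose
effect grows over time serves as the "control" for a late cohort whose true
effect is +1. All names and numbers below are illustrative.

.. code-block:: python

   def outcome(first_treat, t, effect_path):
       """Noiseless outcome with a zero baseline: only the treatment effect moves it."""
       k = t - first_treat  # periods since adoption
       return effect_path(k) if k >= 0 else 0.0

   def early(t):
       # Adopts in period 1; effect grows by 2 each period (2, 4, 6, ...)
       return outcome(1, t, lambda k: 2.0 * (k + 1))

   def late(t):
       # Adopts in period 3; constant true effect of +1
       return outcome(3, t, lambda k: 1.0)

   pre, post = 2, 3  # window around the late cohort's adoption
   forbidden_did = (late(post) - late(pre)) - (early(post) - early(pre))
   print(forbidden_did)  # -1.0, even though the true effect is +1.0

The early cohort's outcome is still rising because its effect is growing, so
subtracting its change flips the sign of the late cohort's estimate.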

Common Pitfalls
---------------

1. **Using TWFE with staggered adoption**

   TWFE estimates a weighted average of all 2x2 comparisons, including
   "forbidden" comparisons where already-treated units serve as controls.
   Some treatment effects can then receive negative weights, which can bias
   the estimate severely and even flip its sign.

   *Solution*: Use CallawaySantAnna for staggered designs.

2. **Ignoring treatment effect heterogeneity**

   If treatment effects vary by cohort (when units are treated) or over time
   (dynamic effects), aggregated estimators may be misleading.

   *Solution*: Use CallawaySantAnna and examine ATT(g,t) and event study plots.

3. **Failing to test parallel trends**

   The parallel trends assumption is untestable in the post-period but can
   be assessed using pre-treatment data.

   *Solution*: Use :func:`~diff_diff.check_parallel_trends` and
   :class:`~diff_diff.HonestDiD` for sensitivity analysis.

4. **Inappropriate clustering**

   Standard errors should typically be clustered at the level of treatment
   assignment (often the unit level).

   *Solution*: Always specify ``cluster`` for panel data.

Standard Error Methods
----------------------

Different estimators compute standard errors differently. Understanding these
differences helps interpret results and choose appropriate inference.

.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Estimator
     - Default SE Method
     - Details
   * - ``DifferenceInDifferences``
     - HC1 (heteroskedasticity-robust)
     - Uses White's robust SEs by default. Specify ``cluster`` for cluster-robust SEs. Use ``inference='wild_bootstrap'`` for few clusters (<30).
   * - ``TwoWayFixedEffects``
     - Cluster-robust (unit level)
     - Clusters at unit level by default after within-transformation. Specify ``cluster`` to override. Use ``inference='wild_bootstrap'`` for few clusters.
   * - ``MultiPeriodDiD``
     - HC1 (heteroskedasticity-robust)
     - Same as basic DiD. Cluster-robust available via ``cluster``. Wild bootstrap not yet supported for multi-coefficient inference.
   * - ``CallawaySantAnna``
     - Analytical (influence function)
     - Uses influence-function SEs with WIF adjustment by default. Set ``n_bootstrap=999`` for multiplier bootstrap inference (weight types: ``rademacher``, ``mammen``, ``webb``).
   * - ``SyntheticDiD``
     - Placebo, paper-faithful refit bootstrap, or jackknife
     - Default uses placebo-based variance (``variance_method="placebo"``). Set ``variance_method="bootstrap"`` for paper-faithful Algorithm 2 bootstrap (re-estimates ω and λ via Frank-Wolfe per draw; ~5–30× slower than placebo, panel-size dependent). Both methods use ``n_bootstrap`` replications (default 200). ``variance_method="jackknife"`` is also available.
   * - ``ContinuousDiD``
     - Analytical (influence function)
     - Uses influence-function-based SEs by default. Use ``n_bootstrap=199`` (or higher) for multiplier bootstrap inference with proper CIs.
   * - ``HeterogeneousAdoptionDiD``
     - Path-dependent (CCT-2014 / 2SLS / Binder TSL)
     - Three SE regimes per :doc:`api/had`. **Unweighted**: continuous-dose paths use the CCT-2014 weighted-robust SE from the in-house ``lprobust`` port; mass-point uses a 2SLS sandwich. **Deprecated ``weights=`` shortcut**: continuous reuses CCT-2014; mass-point uses analytical weighted 2SLS (``classical`` / ``hc1``; CR1 when ``cluster=`` is supplied, except mass-point + ``cluster=`` + ``aggregate="event_study"`` + ``cband=True`` is rejected outright - see :doc:`api/had` for the cluster-combination deviation note); yields ``variance_formula="pweight"`` / ``"pweight_2sls"``. **``survey_design=SurveyDesign(weights="col", ...)``**: both paths compose Binder (1983) Taylor-series linearization (``"survey_binder_tsl"`` / ``"survey_binder_tsl_2sls"``); mass-point + ``survey_design=`` + ``cluster=`` is also rejected outright (combined survey + cluster inference is deferred). The two weighted families differ on this estimator until the next-minor unification lands. Per-horizon CIs are pointwise; sup-t bands available only on the weighted event-study path via ``cband=True``.
   * - ``SunAbraham``
     - Cluster-robust (unit level)
     - Clusters at unit level by default. Specify ``cluster`` to override. Use ``n_bootstrap`` for pairs bootstrap inference.
   * - ``ImputationDiD``
     - Conservative clustered (Theorem 3)
     - Uses conservative clustered variance from Borusyak et al. Theorem 3, clustered at unit level. Use ``n_bootstrap`` for multiplier bootstrap.
   * - ``TwoStageDiD``
     - GMM sandwich (clustered)
     - Uses GMM sandwich variance accounting for first-stage estimation uncertainty, clustered at unit level. Use ``n_bootstrap`` for multiplier bootstrap.
   * - ``StackedDiD``
     - Cluster-robust (unit level)
     - Clusters at unit level by default. Set ``cluster='unit_subexp'`` for (unit, sub-experiment) clustering.
   * - ``TripleDifference``
     - Influence function (robust)
     - Uses influence-function-based SEs (inherently heteroskedasticity-robust). Specify ``cluster`` for cluster-robust SEs.
   * - ``TROP``
     - Bootstrap (n_bootstrap=200)
     - Uses unit-level block bootstrap for variance estimation. Bootstrap is always required (minimum n_bootstrap=2).
   * - ``EfficientDiD``
     - Analytical (EIF-based)
     - Uses efficient influence function SE = sqrt(mean(EIF^2) / n). Use ``n_bootstrap`` for multiplier bootstrap.
   * - ``BaconDecomposition``
     - N/A (diagnostic)
     - Diagnostic tool only; does not produce standard errors.

**Recommendations by sample size:**

- **Large samples (N > 1000, clusters > 50)**: Default analytical SEs are reliable
- **Medium samples (clusters 30-50)**: Cluster-robust SEs recommended
- **Small samples (clusters < 30)**: Use wild cluster bootstrap (``inference='wild_bootstrap'``)
- **Very few clusters (< 10)**: Use Webb 6-point distribution (``weight_type='webb'``)

**Common pitfall:** Forgetting to cluster when units are observed multiple times.
For panel data, always cluster at the unit level unless you have a strong reason not to.

.. code-block:: python

   from diff_diff import DifferenceInDifferences, generate_did_data

   panel = generate_did_data(n_units=200, n_periods=10, treatment_effect=2.0)

   # Good: Cluster at unit level for panel data
   did = DifferenceInDifferences(cluster='unit')
   results = did.fit(panel, outcome='outcome', treatment='treated',
                     time='post')

   # Better for few clusters: Wild bootstrap
   did = DifferenceInDifferences(inference='wild_bootstrap', cluster='unit')
   results = did.fit(panel, outcome='outcome', treatment='treated',
                     time='post')
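
For background on the Webb recommendation: with very few clusters, two-point
Rademacher weights can produce only ``2**G`` distinct bootstrap draws for ``G``
clusters, while the Webb six-point distribution allows ``6**G``. The sketch
below writes out the six points and checks that they have mean 0 and variance
1. ``draw_webb_weights`` is a hypothetical helper for illustration, not a
``diff_diff`` function.

.. code-block:: python

   import math
   import random

   # Webb six-point distribution: each value is drawn with probability 1/6
   WEBB_POINTS = [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
                  math.sqrt(0.5), 1.0, math.sqrt(1.5)]

   def draw_webb_weights(n_clusters, rng):
       """Draw one bootstrap weight per cluster."""
       return [rng.choice(WEBB_POINTS) for _ in range(n_clusters)]

   mean_w = sum(WEBB_POINTS) / len(WEBB_POINTS)
   var_w = sum(w * w for w in WEBB_POINTS) / len(WEBB_POINTS)
   print(round(mean_w, 12), round(var_w, 12))

   weights = draw_webb_weights(8, random.Random(0))
   print(len(weights), 2 ** 8, 6 ** 8)  # 8 clusters: 256 vs. 1679616 possible draws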

When in Doubt
-------------

If you're unsure which estimator to use:

1. **Start with CallawaySantAnna** - It's valid even for non-staggered designs
   and provides the most flexible output (group-time effects, aggregations)

2. **Check for heterogeneity** - Plot event studies to see if effects vary

3. **Run sensitivity analysis** - Use HonestDiD to assess robustness

4. **Compare estimators** - If results differ substantially across estimators,
   investigate why (often reveals violations of assumptions)

5. **Using survey data?** - Pass a ``SurveyDesign`` to ``fit()`` for design-based
   variance estimation. See the :ref:`survey-design-support` section below for
   the compatibility matrix, and the `survey tutorial <https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb>`_
   for a full walkthrough.

.. _survey-design-support:

Survey Design Support
---------------------

All estimators accept an optional ``survey_design`` parameter in ``fit()``.
Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
estimation. The depth of support varies by estimator:

.. note::

   If your data starts as **individual-level survey microdata** (e.g., BRFSS,
   ACS, CPS, NHANES respondent records), use :func:`~diff_diff.aggregate_survey`
   as a preprocessing step. It pools microdata into geographic-period cells and
   returns a pre-configured :class:`~diff_diff.SurveyDesign`. By default, the
   returned design uses ``weight_type="pweight"`` (unit-constant population
   weights), which is compatible with **all** survey-capable
   estimators in the matrix below. Pass ``second_stage_weights="aweight"`` for
   precision weights (inverse variance) if you prefer efficiency-weighted
   estimates - this mode is limited to estimators marked **Full**.
   See :doc:`api/prep` for the API reference.

.. list-table::
   :header-rows: 1
   :widths: 25 12 18 18 18

   * - Estimator
     - Weights
     - Strata/PSU/FPC
     - Replicate Weights
     - Survey Bootstrap
   * - ``DifferenceInDifferences``
     - Full
     - Full
     - Full
     - --
   * - ``TwoWayFixedEffects``
     - Full
     - Full
     - Full
     - --
   * - ``MultiPeriodDiD``
     - Full
     - Full
     - Full
     - --
   * - ``CallawaySantAnna``
     - pweight only
     - Full
     - Full
     - Multiplier at PSU
   * - ``ChaisemartinDHaultfoeuille``
     - pweight only
     - Full (TSL)
     - --
     - Group-level (warning)
   * - ``TripleDifference``
     - pweight only
     - Full
     - Full (analytical)
     - --
   * - ``StaggeredTripleDifference``
     - pweight only
     - Full
     - Full
     - Multiplier at PSU
   * - ``SunAbraham``
     - Full
     - Full
     - Full
     - Rao-Wu rescaled
   * - ``StackedDiD``
     - pweight only
     - Full (pweight only)
     - Full
     - --
   * - ``ImputationDiD``
     - pweight only
     - Full
     - Full (analytical)
     - Multiplier at PSU
   * - ``TwoStageDiD``
     - pweight only
     - Full
     - Full (analytical)
     - Multiplier at PSU
   * - ``ContinuousDiD``
     - Full
     - Full
     - Full (analytical)
     - Multiplier at PSU
   * - ``HeterogeneousAdoptionDiD``
     - pweight only
     - Full (Binder TSL)
     - --
     - Multiplier (event-study, ``cband=True`` only)
   * - ``EfficientDiD``
     - Full
     - Full
     - Full (analytical)
     - Multiplier at PSU
   * - ``SyntheticDiD``
     - pweight only
     - Via bootstrap
     - --
     - Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only)
   * - ``TROP``
     - pweight only
     - Via bootstrap
     - --
     - Rao-Wu rescaled
   * - ``WooldridgeDiD``
     - Full (pweight only)
     - Full (analytical)
     - --
     - --
   * - ``BaconDecomposition``
     - Diagnostic
     - Diagnostic
     - --
     - --

**Legend:**

- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
- **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics)
- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``TROP`` uses bootstrap by default. ``SyntheticDiD`` supports strata/PSU/FPC on ``variance_method='bootstrap'`` via a hybrid pairs-bootstrap + Rao-Wu rescaling composition (see the ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD); ``placebo`` and ``jackknife`` remain pweight-only.
- **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error
- **Diagnostic**: Weighted descriptive statistics only (no inference)
- **--**: Not supported

.. note::

   ``SyntheticDiD`` supports survey designs on ``variance_method='bootstrap'``
   — both pweight-only and full strata/PSU/FPC — via a hybrid pairs-bootstrap
   composed with per-draw Rao-Wu rescaled weights fed into a weighted
   Frank-Wolfe re-estimation of ω and λ. See the
   ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD
   for the objective form and argmin-set caveat.

   ``variance_method='placebo'`` and ``variance_method='jackknife'`` remain
   pweight-only — composing placebo permutations / leave-one-out with
   Rao-Wu rescaling under the weighted objective is a separate derivation
   (tracked in ``TODO.md``).

For the full walkthrough with code examples, see the
`survey tutorial <https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb>`_.
For deferred work and remaining limitations, see ``docs/survey-roadmap.md``.