Choosing an Estimator
=====================

This guide helps you select the right estimator for your research design.

Decision Flowchart
------------------

Start here and follow the questions:

1. **Is treatment staggered?** (Different units treated at different times)

   - **No** → Go to question 2
   - **Yes** → Use :class:`~diff_diff.CallawaySantAnna`

2. **Do you have panel data?** (Multiple observations per unit over time)

   - **No** → Use :class:`~diff_diff.DifferenceInDifferences` (basic 2x2)
   - **Yes** → Go to question 3

3. **Do you need period-specific effects?** (Event study design)

   - **No** → Use :class:`~diff_diff.TwoWayFixedEffects`
   - **Yes** → Use :class:`~diff_diff.MultiPeriodDiD`

4. **Is your treated group small?** (Few treated units, many controls)

   - **Yes** → Also consider :class:`~diff_diff.SyntheticDiD`, which reweights
     control units to improve pre-treatment fit
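
The questions above can be sketched as a small helper. This is only an
illustration of the decision logic: ``suggest_estimator`` and its argument
names are hypothetical and are not part of ``diff_diff``. Question 4 is treated
here as a supplementary suggestion rather than a replacement for the primary
choice, which is an assumption about how the flowchart is meant to be read.

.. code-block:: python

   def suggest_estimator(staggered, panel, period_effects, few_treated=False):
       """Return (primary estimator name, list of alternatives to consider)."""
       if staggered:
           # Question 1: units are treated at different times
           primary = "CallawaySantAnna"
       elif not panel:
           # Question 2: no repeated observations per unit
           primary = "DifferenceInDifferences"
       elif period_effects:
           # Question 3: period-specific (event study) effects are needed
           primary = "MultiPeriodDiD"
       else:
           primary = "TwoWayFixedEffects"

       # Question 4: few treated units, many controls
       also_consider = ["SyntheticDiD"] if few_treated else []
       return primary, also_consider


   choice = suggest_estimator(staggered=False, panel=True,
                              period_effects=True, few_treated=True)
   print(choice)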

Quick Reference
---------------

.. list-table::
   :header-rows: 1
   :widths: 20 30 25 25

   * - Estimator
     - Best For
     - Key Assumption
     - Output
   * - ``DifferenceInDifferences``
     - Simple 2x2 designs, repeated cross-sections
     - Parallel trends (2 periods)
     - Single ATT
   * - ``TwoWayFixedEffects``
     - Panel data, simultaneous treatment
     - Parallel trends (all periods)
     - Single ATT with unit/time FE
   * - ``MultiPeriodDiD``
     - Event studies, dynamic effects
     - Parallel trends (pre-periods)
     - Period-specific effects
   * - ``CallawaySantAnna``
     - Staggered adoption, heterogeneous timing
     - Conditional parallel trends
     - Group-time ATT(g,t), aggregations
   * - ``SyntheticDiD``
     - Few treated units, many controls
     - Synthetic parallel trends
     - ATT with unit/time weights

Detailed Guidance
-----------------

Basic 2x2 DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.DifferenceInDifferences` when:

- You have a simple before/after, treatment/control design
- Treatment occurs simultaneously for all treated units
- You want a single average treatment effect

.. code-block:: python

   from diff_diff import DifferenceInDifferences

   did = DifferenceInDifferences()
   results = did.fit(data, outcome='y', treated='treated', post='post')
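
To make the 2x2 estimate concrete, the sketch below computes it by hand from
four cell means. The toy data are made up for illustration, and the code uses
only the standard library; it does not show how ``DifferenceInDifferences``
is implemented internally.

.. code-block:: python

   from statistics import mean

   # Hypothetical toy data: (treated, post, outcome) rows
   rows = [
       (0, 0, 10.0), (0, 0, 12.0),
       (0, 1, 13.0), (0, 1, 15.0),
       (1, 0, 20.0), (1, 0, 22.0),
       (1, 1, 28.0), (1, 1, 30.0),
   ]


   def cell_mean(treated, post):
       """Mean outcome in one of the four treated-by-post cells."""
       return mean(y for d, p, y in rows if d == treated and p == post)


   # Change over time within each group
   treated_change = cell_mean(1, 1) - cell_mean(1, 0)
   control_change = cell_mean(0, 1) - cell_mean(0, 0)

   # The DiD estimate is the difference between the two changes
   att = treated_change - control_change
   print(att)

Under parallel trends, the control group's change stands in for what the
treated group's change would have been without treatment.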

Two-Way Fixed Effects
~~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.TwoWayFixedEffects` when:

- You have panel data with multiple time periods
- Treatment timing is the same for all treated units
- You want to control for unit and time fixed effects
- You don't need to see period-by-period effects

.. warning::

   TWFE can be biased with staggered treatment timing. Already-treated units
   act as controls for newly-treated units, which can put negative weights on
   some treatment effects when effects differ across cohorts or over time.
   Use :class:`~diff_diff.CallawaySantAnna` for staggered designs.

.. code-block:: python

   from diff_diff import TwoWayFixedEffects

   twfe = TwoWayFixedEffects()
   results = twfe.fit(data, outcome='y', treated='treated',
                      unit='unit_id', time='period')
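
The warning above can be illustrated with a noise-free toy example. The cohorts,
effect sizes, and function names below are invented for illustration only. A
late cohort with a true effect of 1 is compared first with never-treated units
and then with an early cohort whose effect is still growing.

.. code-block:: python

   def outcome(cohort, t):
       """Noise-free outcome for a unit first treated in `cohort` (None = never)."""
       common_trend = 1.0 * t
       if cohort is None or t < cohort:
           return common_trend
       if cohort == 2:
           # Early cohort: effect grows by 3 per period after treatment
           return common_trend + 3.0 * (t - cohort + 1)
       # Late cohort: constant effect of 1 once treated
       return common_trend + 1.0


   def did(treated_cohort, control_cohort, before, after):
       """2x2 DiD of one cohort against another between two periods."""
       treated_change = outcome(treated_cohort, after) - outcome(treated_cohort, before)
       control_change = outcome(control_cohort, after) - outcome(control_cohort, before)
       return treated_change - control_change


   # Late cohort (treated in period 4) against never-treated units
   clean = did(4, None, before=3, after=4)

   # Same cohort against the already-treated early cohort
   forbidden = did(4, 2, before=3, after=4)

   print(clean, forbidden)

The clean comparison recovers the true effect of 1. The comparison against the
already-treated cohort returns -2, because the early cohort's growing effect is
subtracted as if it were a trend.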

Multi-Period Event Study
~~~~~~~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.MultiPeriodDiD` when:

- You want a full event study with pre- and post-treatment effects
- You need pre-period coefficients to assess parallel trends
- You want to visualize treatment effect dynamics over time
- All treated units receive treatment at the same time (simultaneous adoption)

.. code-block:: python

   from diff_diff import MultiPeriodDiD, plot_event_study

   event = MultiPeriodDiD(reference_period=-1)
   results = event.fit(data, outcome='y', treated='treated',
                       time='period', unit='unit_id', treatment_start=5)

   # Visualize
   plot_event_study(results)
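
As a rough guide to what the period-specific effects mean, the sketch below
computes them by hand for a balanced design with simultaneous adoption and no
covariates. Each effect is the treated-minus-control gap in a period, measured
relative to the gap in the reference period. The group means are made up, and
the code is not taken from ``MultiPeriodDiD``.

.. code-block:: python

   treatment_start = 5
   reference_event_time = -1  # last period before treatment

   # Hypothetical group means: period -> (treated mean, control mean)
   means = {
       2: (10.0, 8.0),
       3: (11.0, 9.0),
       4: (12.0, 10.0),
       5: (15.0, 11.0),
       6: (17.0, 12.0),
   }


   def gap(period):
       """Treated-minus-control difference in mean outcomes for one period."""
       treated_mean, control_mean = means[period]
       return treated_mean - control_mean


   reference_gap = gap(treatment_start + reference_event_time)

   # Effect at each event time = gap in that period minus the reference gap
   event_study = {
       period - treatment_start: gap(period) - reference_gap
       for period in sorted(means)
   }
   print(event_study)

Pre-period values near zero are consistent with parallel trends. Here they are
exactly zero because the toy data were built that way.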

Callaway-Sant'Anna
~~~~~~~~~~~~~~~~~~

Use :class:`~diff_diff.CallawaySantAnna` when:

- Treatment is adopted at different times (staggered rollout)
- You want valid treatment effect estimates with heterogeneous timing
- You need group-time specific effects ATT(g,t)

This is the recommended estimator for most applied work with staggered adoption.

.. code-block:: python

   from diff_diff import CallawaySantAnna

   cs = CallawaySantAnna(
       control_group='never_treated',  # or 'not_yet_treated'
       estimation_method='dr'  # doubly robust (recommended)
   )
   results = cs.fit(data, outcome='y', unit='unit_id',
                    time='period', first_treat='first_treat',
                    covariates=['x1', 'x2'])

   # Get aggregated effects
   print(f"Overall ATT: {results.att:.3f}")

   # Event study aggregation
   event_study = results.aggregate('event_time')

Synthetic DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.SyntheticDiD` when:

- You have few treated units but many control units
- Pre-treatment outcomes of the treated and unweighted control units track each other poorly
- You want to construct a weighted synthetic control

.. code-block:: python

   from diff_diff import SyntheticDiD

   sdid = SyntheticDiD()
   results = sdid.fit(data, outcome='y', unit='unit_id',
                      time='period', treated='treated',
                      treatment_start=5)

   # View the unit weights
   print(results.unit_weights)
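
To show how unit and time weights enter the estimate, the sketch below computes
a weighted double difference with weights fixed by hand. The data and weights
are invented, and the step that chooses the weights is deliberately left out;
how ``SyntheticDiD`` selects them is not covered here.

.. code-block:: python

   n_pre = 3  # the first three columns are pre-treatment periods

   # Hypothetical outcomes per period (3 pre, 2 post)
   controls = [
       [1.0, 2.0, 3.0, 4.0, 5.0],
       [2.0, 2.0, 2.0, 2.0, 2.0],
       [5.0, 4.0, 3.0, 2.0, 1.0],
   ]
   treated_mean = [3.0, 4.0, 5.0, 9.0, 10.0]

   # Assumed weights: each set is non-negative and sums to one
   unit_weights = [0.5, 0.5, 0.0]   # one per control unit
   time_weights = [0.0, 0.5, 0.5]   # one per pre-treatment period


   def weighted_change(series, time_weights, n_pre):
       """Post-period mean minus the time-weighted pre-period mean."""
       pre, post = series[:n_pre], series[n_pre:]
       weighted_pre = sum(w * y for w, y in zip(time_weights, pre))
       return sum(post) / len(post) - weighted_pre


   treated_change = weighted_change(treated_mean, time_weights, n_pre)
   control_change = sum(
       w * weighted_change(series, time_weights, n_pre)
       for w, series in zip(unit_weights, controls)
   )

   att = treated_change - control_change
   print(att)

A control unit with zero weight, like the third one here, does not contribute
to the comparison at all.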

Common Pitfalls
---------------

1. **Using TWFE with staggered adoption**

   TWFE estimates a weighted average of all 2x2 comparisons, including
   "forbidden" comparisons where already-treated units serve as controls.
   This can bias the estimate severely and can even place negative weights on
   some treatment effects, so the estimate may have the wrong sign.

   *Solution*: Use CallawaySantAnna for staggered designs.

2. **Ignoring treatment effect heterogeneity**

   If treatment effects vary by cohort (when units are treated) or over time
   (dynamic effects), aggregated estimators may be misleading.

   *Solution*: Use CallawaySantAnna and examine ATT(g,t) and event study plots.

3. **Failing to test parallel trends**

   The parallel trends assumption is untestable in the post-period but can
   be assessed using pre-treatment data.

   *Solution*: Use :func:`~diff_diff.check_parallel_trends` and
   :class:`~diff_diff.HonestDiD` for sensitivity analysis.

4. **Inappropriate clustering**

   Standard errors should typically be clustered at the level of treatment
   assignment (often the unit level).

   *Solution*: Specify ``cluster_col`` for panel data unless the estimator
   already clusters at the right level by default.

Standard Error Methods
----------------------

Different estimators compute standard errors differently. Understanding these
differences helps interpret results and choose appropriate inference.

.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Estimator
     - Default SE Method
     - Details
   * - ``DifferenceInDifferences``
     - HC1 (heteroskedasticity-robust)
     - Uses White's robust SEs by default. Specify ``cluster_col`` for cluster-robust SEs. Use ``inference='wild_bootstrap'`` for few clusters (<30).
   * - ``TwoWayFixedEffects``
     - Cluster-robust (unit level)
     - Clusters at the unit level by default after the within-transformation. Specify ``cluster_col`` to cluster at a different level. Use ``inference='wild_bootstrap'`` for few clusters.
   * - ``MultiPeriodDiD``
     - HC1 (heteroskedasticity-robust)
     - Same as basic DiD. Cluster-robust available via ``cluster_col``. Wild bootstrap not yet supported for multi-coefficient inference.
   * - ``CallawaySantAnna``
     - Analytical (simple difference)
     - Uses simple variance of group-time means. Use ``bootstrap()`` method for multiplier bootstrap inference with proper SEs, CIs, and p-values.
   * - ``SyntheticDiD``
     - Bootstrap or placebo-based
     - Default uses bootstrap resampling. Set ``n_bootstrap=0`` for placebo-based inference using pre-treatment residuals.

**Recommendations by sample size:**

- **Large samples (N > 1000, clusters > 50)**: Default analytical SEs are reliable
- **Medium samples (clusters 30-50)**: Cluster-robust SEs recommended
- **Small samples (clusters < 30)**: Use wild cluster bootstrap (``inference='wild_bootstrap'``)
- **Very few clusters (< 10)**: Use the wild cluster bootstrap with Webb six-point weights (``weight_type='webb'``)
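
One reason the Webb distribution helps with very few clusters is that
two-point (Rademacher) weights allow only 2 to the power of G distinct draws
for G clusters, while six-point weights allow far more. The sketch below draws
one weight per cluster and counts the possible draws. It uses only the standard
library, and the names are illustrative rather than part of ``diff_diff``.

.. code-block:: python

   import math
   import random

   # Webb six-point weights: equal probability, mean zero, variance one
   WEBB_POINTS = [
       -math.sqrt(1.5), -1.0, -math.sqrt(0.5),
       math.sqrt(0.5), 1.0, math.sqrt(1.5),
   ]
   RADEMACHER_POINTS = [-1.0, 1.0]


   def draw_cluster_weights(n_clusters, points, rng):
       """Draw one bootstrap weight per cluster, shared by all rows in it."""
       return [rng.choice(points) for _ in range(n_clusters)]


   n_clusters = 8
   rng = random.Random(0)
   weights = draw_cluster_weights(n_clusters, WEBB_POINTS, rng)

   # Number of distinct weight vectors each scheme can produce
   n_rademacher = len(RADEMACHER_POINTS) ** n_clusters
   n_webb = len(WEBB_POINTS) ** n_clusters
   print(n_rademacher, n_webb)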

**Common pitfall:** Forgetting to cluster when units are observed multiple times.
For panel data, always cluster at the unit level unless you have a strong reason not to.

.. code-block:: python

   # Good: Cluster at unit level for panel data
   did = DifferenceInDifferences()
   results = did.fit(data, outcome='y', treated='treated',
                     post='post', cluster_col='unit_id')

   # Better for few clusters: Wild bootstrap
   did = DifferenceInDifferences(inference='wild_bootstrap')
   results = did.fit(data, outcome='y', treated='treated',
                     post='post', cluster_col='state')
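
To see why clustering matters, the sketch below computes heteroskedasticity-robust
and cluster-robust standard errors by hand with NumPy on simulated panel data
with a persistent unit-level shock. It omits small-sample corrections (so it
is closer to HC0 than to HC1), the function name is hypothetical, and it is not
the library's implementation.

.. code-block:: python

   import numpy as np


   def ols_sandwich_se(X, y, clusters=None):
       """OLS sandwich standard errors, without small-sample corrections."""
       X = np.asarray(X, dtype=float)
       y = np.asarray(y, dtype=float)
       bread = np.linalg.inv(X.T @ X)
       beta = bread @ X.T @ y
       resid = y - X @ beta
       scores = X * resid[:, None]  # one score row per observation
       if clusters is not None:
           clusters = np.asarray(clusters)
           # Sum scores within each cluster before forming the outer product
           scores = np.vstack([scores[clusters == c].sum(axis=0)
                               for c in np.unique(clusters)])
       meat = scores.T @ scores
       return np.sqrt(np.diag(bread @ meat @ bread))


   # Simulated panel: 20 units, 5 periods, errors correlated within unit
   rng = np.random.default_rng(0)
   n_units, n_periods = 20, 5
   unit = np.repeat(np.arange(n_units), n_periods)
   treated = (unit < 10).astype(float)
   unit_shock = rng.normal(size=n_units)[unit]
   y = 1.0 + 2.0 * treated + unit_shock + 0.1 * rng.normal(size=unit.size)
   X = np.column_stack([np.ones_like(treated), treated])

   robust_se = ols_sandwich_se(X, y)
   cluster_se = ols_sandwich_se(X, y, clusters=unit)
   print(robust_se[1], cluster_se[1])

Because the shock repeats across periods within a unit, the unclustered
standard error treats correlated rows as independent information and comes out
smaller than the clustered one.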

When in Doubt
-------------

If you're unsure which estimator to use:

1. **Start with CallawaySantAnna** - It's valid even for non-staggered designs
   and provides the most flexible output (group-time effects, aggregations)

2. **Check for heterogeneity** - Plot event studies to see if effects vary

3. **Run sensitivity analysis** - Use HonestDiD to assess robustness

4. **Compare estimators** - If results differ substantially across estimators,
   investigate why (often reveals violations of assumptions)