.. meta::
   :description: Compare diff-diff with R packages for DiD analysis. Migration guide from R did, fixest, synthdid, and HonestDiD to Python with side-by-side code examples.
   :keywords: R did package python alternative, fixest python, synthdid python, R to python DiD, econometrics R vs python

R Comparison
============

This guide compares diff-diff with popular R packages for DiD analysis, helping
users familiar with R transition to Python.

Overview
--------

.. list-table::
   :header-rows: 1
   :widths: 25 25 25 25

   * - Feature
     - diff-diff (Python)
     - did (R)
     - Other R
   * - Basic DiD
     - ✅ ``DifferenceInDifferences``
     - ✅ ``att_gt``
     - ✅ ``fixest::feols``
   * - Staggered DiD
     - ✅ ``CallawaySantAnna``
     - ✅ ``att_gt``
     - ✅ ``did2s``, ``DRDID``
   * - Covariate adjustment
     - ✅ DR, IPW, Reg
     - ✅ DR, IPW, Reg
     - ✅ Varies
   * - Honest DiD
     - ✅ ``HonestDiD``
     - ❌
     - ✅ ``HonestDiD`` package
   * - Synthetic DiD
     - ✅ ``SyntheticDiD``
     - ❌
     - ✅ ``synthdid`` package
   * - Wild bootstrap
     - ✅ ``wild_bootstrap_se``
     - ❌
     - ✅ ``fwildclusterboot``

Package Correspondence
----------------------

R ``did`` Package → diff-diff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The R ``did`` package by Callaway and Sant'Anna is the authors' reference
implementation of their staggered DiD estimator. The examples below translate
common operations:

**Basic estimation:**

.. code-block:: r

   # R (did package)
   library(did)
   out <- att_gt(
     yname = "Y",
     tname = "period",
     idname = "id",
     gname = "G",
     data = data
   )

.. code-block:: python

   # Python (diff-diff)
   from diff_diff import CallawaySantAnna

   cs = CallawaySantAnna()
   results = cs.fit(
       data,
       outcome='Y',
       time='period',
       unit='id',
       first_treat='G'
   )

**With covariates (doubly robust):**

.. code-block:: r

   # R
   out <- att_gt(
     yname = "Y", tname = "period",
     idname = "id", gname = "G",
     xformla = ~ X1 + X2,
     est_method = "dr",
     data = data
   )

.. code-block:: python

   # Python
   cs = CallawaySantAnna(estimation_method='dr')
   results = cs.fit(
       data,
       outcome='Y',
       time='period',
       unit='id',
       first_treat='G',
       covariates=['X1', 'X2']
   )

**Aggregations:**

.. code-block:: r

   # R
   agg_simple <- aggte(out, type = "simple")
   agg_dynamic <- aggte(out, type = "dynamic")
   agg_group <- aggte(out, type = "group")

.. code-block:: python

   # Python (unlike R's aggte(), aggregation is requested at fit time)
   results = cs.fit(data, outcome='Y', time='period', unit='id',
                    first_treat='G', aggregate='all')
   overall_att = results.overall_att  # Simple aggregation
   event_study = results.event_study_effects  # Dynamic
   by_group = results.group_effects  # By cohort
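
To make the three aggregation types concrete, the sketch below recomputes them
by hand from a small table of group-time effects. It follows the weighting
usually described for ``aggte`` (cells weighted by cohort size) and is not the
internal code of either package. The ``att_gt`` dictionary, the
``group_sizes`` mapping, and all numbers are hypothetical.

.. code-block:: python

   from collections import defaultdict

   # Hypothetical group-time effects ATT(g, t): key = (cohort g, period t).
   att_gt = {
       (3, 2): 0.5,  # pre-treatment cell (t < g), only used in the event study
       (3, 3): 1.0, (3, 4): 2.0, (3, 5): 3.0,
       (4, 4): 2.0, (4, 5): 4.0,
   }
   group_sizes = {3: 60, 4: 40}  # hypothetical number of units per cohort

   def weighted_mean(values_and_weights):
       """Weighted average of (value, weight) pairs."""
       total_weight = sum(w for _, w in values_and_weights)
       return sum(v * w for v, w in values_and_weights) / total_weight

   # Keep only post-treatment cells (t >= g).
   post_cells = {(g, t): v for (g, t), v in att_gt.items() if t >= g}

   # "simple": one number, every post-treatment cell weighted by cohort size.
   simple_att = weighted_mean(
       [(v, group_sizes[g]) for (g, _), v in post_cells.items()]
   )

   # "dynamic": one number per event time e = t - g.
   by_event_time = defaultdict(list)
   for (g, t), v in att_gt.items():
       by_event_time[t - g].append((v, group_sizes[g]))
   event_study = {e: weighted_mean(c) for e, c in sorted(by_event_time.items())}

   # "group": one number per cohort, averaging its post-treatment cells.
   by_cohort = defaultdict(list)
   for (g, _), v in post_cells.items():
       by_cohort[g].append(v)
   group_effects = {g: sum(vals) / len(vals) for g, vals in by_cohort.items()}

   print(simple_att, event_study, group_effects)

Larger cohorts pull the simple and dynamic summaries toward their own effects,
which is why those summaries can differ from an unweighted mean of the cells.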

R ``HonestDiD`` Package → diff-diff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The HonestDiD package implements Rambachan & Roth (2023) sensitivity analysis:

**Relative magnitudes (ΔRM):**

.. code-block:: r

   # R
   library(HonestDiD)
   delta_rm_results <- createSensitivityResults_relativeMagnitudes(
     betahat = beta_hat,
     sigma = sigma,
     numPrePeriods = 4,
     numPostPeriods = 3,
     Mbarvec = seq(0, 2, by = 0.5)
   )

.. code-block:: python

   # Python
   from diff_diff import HonestDiD

   honest = HonestDiD(method='relative_magnitude', M=1.0)
   results = honest.fit(event_study_results)

   # Sensitivity analysis over M grid
   sensitivity = honest.sensitivity_analysis(
       event_study_results,
       M_grid=[0, 0.5, 1.0, 1.5, 2.0]
   )
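
The intuition behind the relative-magnitudes restriction can be sketched
without either package. In the Rambachan and Roth notation (reference period
normalised to 0), ΔRM bounds each post-treatment change in the trend
violation by ``M`` times the largest pre-treatment change. For the first
post-treatment period this yields the interval computed below. The sketch is
a deliberate simplification: it uses point estimates only and ignores
sampling uncertainty, so it is narrower than the confidence sets the real
packages report. The function name ``rm_bounds`` and all numbers are
hypothetical.

.. code-block:: python

   # Hypothetical event-study point estimates, reference period set to 0.
   pre_coefs = [0.10, -0.05, 0.08, 0.02]  # periods -4, -3, -2, -1
   first_post_coef = 1.20                  # period +1

   def rm_bounds(pre_coefs, post_coef, m_bar):
       """Plug-in bounds on the first post-period effect under Delta-RM.

       Ignores estimation error: a teaching sketch, not a confidence set.
       """
       # Append the reference period (normalised to 0) so that the last
       # pre-treatment change, from period -1 to period 0, is included.
       path = list(pre_coefs) + [0.0]
       max_pre_change = max(abs(b - a) for a, b in zip(path, path[1:]))
       half_width = m_bar * max_pre_change
       return post_coef - half_width, post_coef + half_width

   for m_bar in [0, 0.5, 1.0, 1.5, 2.0]:
       lower, upper = rm_bounds(pre_coefs, first_post_coef, m_bar)
       print(f"M = {m_bar:>3}: [{lower:.3f}, {upper:.3f}]")

With ``M = 0`` the interval collapses to the point estimate, which is the
exact-parallel-trends case; larger ``M`` values widen it in proportion to the
largest pre-treatment change.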

**Smoothness restrictions (ΔSD):**

.. code-block:: r

   # R
   delta_sd_results <- createSensitivityResults(
     betahat = beta_hat,
     sigma = sigma,
     numPrePeriods = 4,
     numPostPeriods = 3,
     Mvec = seq(0, 0.1, by = 0.02)
   )

.. code-block:: python

   # Python
   from diff_diff import HonestDiD

   honest = HonestDiD(method='smoothness', M=0.05)
   results = honest.fit(event_study_results)

R ``synthdid`` Package → diff-diff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The synthdid package implements Arkhangelsky et al. (2021):

.. code-block:: r

   # R
   library(synthdid)
   setup <- panel.matrices(data, unit = "unit", time = "time",
                           outcome = "Y", treatment = "treatment")
   tau.hat <- synthdid_estimate(setup$Y, setup$N0, setup$T0)

.. code-block:: python

   # Python
   from diff_diff import SyntheticDiD

   # SyntheticDiD requires a time-invariant ever-treated indicator
   data['ever_treated'] = data.groupby('unit')['treatment'].transform('max')

   # Derive post-treatment periods from treatment timing
   post_periods = sorted(data.loc[data['treatment'] == 1, 'time'].unique())

   sdid = SyntheticDiD()
   results = sdid.fit(
       data,
       outcome='Y',
       unit='unit',
       time='time',
       treatment='ever_treated',
       post_periods=post_periods
   )
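
The two calling conventions carry the same information. R's ``setup$N0`` and
``setup$T0`` are the number of control units and the number of pre-treatment
periods, while the Python call above receives an ever-treated flag and the
list of post-treatment periods. The plain-Python sketch below derives both
views from one long-format panel. It assumes simultaneous adoption (all
treated units start in the same period); the ``panel`` data and variable
names are hypothetical.

.. code-block:: python

   # Hypothetical long-format panel: (unit, time, treatment).
   # Units C and D are treated from period 3 onward; A and B never are.
   panel = [
       (unit, time, int(unit in {"C", "D"} and time >= 3))
       for unit in "ABCD"
       for time in range(1, 5)
   ]

   # Time-invariant ever-treated flag per unit (max of treatment over time).
   ever_treated = {}
   for unit, time, treatment in panel:
       ever_treated[unit] = max(ever_treated.get(unit, 0), treatment)

   # Post-treatment periods: any period in which some unit is treated.
   post_periods = sorted({time for _, time, treatment in panel if treatment == 1})
   all_periods = sorted({time for _, time, _ in panel})

   # R-style summary counts.
   n0 = sum(1 for flag in ever_treated.values() if flag == 0)  # control units
   t0 = sum(1 for t in all_periods if t < post_periods[0])     # pre-periods

   print(ever_treated, post_periods, n0, t0)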

Heterogeneous Adoption (HAD)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When every unit is treated in the post period (universal-rollout policies,
industry-wide regime changes) but treatment intensity varies across units,
the standard R workhorses (``did``, ``fixest``, ``synthdid``,
``DIDmultiplegtDYN``) do not apply, because they assume that an untreated
comparison group exists. The dedicated R package ``DIDHAD`` (de Chaisemartin
et al., August 2025) covers the QUG case (Design 1', ``d_lower = 0``) of
de Chaisemartin, Ciccia, D'Haultfoeuille and Knau (2026,
arXiv:2405.04465v6).

``diff-diff`` ships :class:`~diff_diff.HeterogeneousAdoptionDiD`, which
implements the same paper and adds two capabilities beyond the QUG-focused R
package:

- Design 1 (no QUG, ``d_lower > 0``), which targets ``WAS_{d_lower}`` under
  Assumption 6 or sign-only under Assumption 5.
- Survey-design integration via Binder (1983) Taylor-series linearization
  (sampling weights plus optional strata / PSU / FPC).

The diagnostic battery :func:`~diff_diff.did_had_pretest_workflow` surfaces
violations of the HAD identification assumptions. The design path is
auto-detected separately by :meth:`HeterogeneousAdoptionDiD.fit` from the
dose support.

.. code-block:: python

   import numpy as np
   import pandas as pd
   from diff_diff import HeterogeneousAdoptionDiD

   # Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
   rng = np.random.default_rng(42)
   G, F, T = 200, 4, 5
   doses = rng.beta(0.5, 1.0, size=G)
   rows = []
   for g in range(G):
       for t in range(1, T + 1):
           y = (rng.normal()
                + (doses[g] + doses[g] ** 2) * (t >= F)
                + rng.normal(0, 0.5))
           d = doses[g] if t >= F else 0.0
           rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
   had_data = pd.DataFrame(rows)

   est = HeterogeneousAdoptionDiD()
   results = est.fit(had_data, outcome_col='y', unit_col='unit',
                     time_col='period', dose_col='dose',
                     aggregate='event_study')

Key Differences
---------------

Design Philosophy
~~~~~~~~~~~~~~~~~

- **diff-diff**: scikit-learn-style API with a ``fit()`` method that returns rich result objects
- **R packages**: Function-based, returning lists or S3/S4 objects

Inference
~~~~~~~~~

- **diff-diff**: Analytical SEs by default, wild bootstrap available
- **R did**: Multiplier bootstrap by default
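
The two approaches target the same quantity. As a rough illustration, the
sketch below compares an analytical standard error with a multiplier
bootstrap built from the same influence-function values. It is a generic
textbook version, not the code of either package: the simulated ``psi``
values, the Rademacher (±1) multipliers, and the number of draws are all
assumptions of this example.

.. code-block:: python

   import math
   import random
   import statistics

   rng = random.Random(0)
   n = 500

   # Hypothetical influence-function values of an estimator, centred at zero.
   psi = [rng.gauss(0.0, 2.0) for _ in range(n)]
   psi_mean = sum(psi) / n
   psi = [p - psi_mean for p in psi]

   # Analytical SE: square root of sum(psi_i^2) / n^2.
   analytical_se = math.sqrt(sum(p * p for p in psi)) / n

   def multiplier_bootstrap_se(psi, n_boot, rng):
       """SE from perturbing each psi_i with an independent +/-1 multiplier."""
       n_obs = len(psi)
       draws = [
           sum(p * rng.choice((-1.0, 1.0)) for p in psi) / n_obs
           for _ in range(n_boot)
       ]
       return statistics.pstdev(draws)

   bootstrap_se = multiplier_bootstrap_se(psi, n_boot=999, rng=rng)
   print(f"analytical: {analytical_se:.4f}  bootstrap: {bootstrap_se:.4f}")

The two numbers should agree up to simulation noise. The practical advantage
of the bootstrap is that the same draws can be reused across many group-time
cells to form simultaneous confidence bands.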

Fixed Effects
~~~~~~~~~~~~~

- **diff-diff**: ``absorb`` parameter for high-dimensional FE (within transformation)
- **R fixest**: ``feols`` with ``|`` notation for absorbed FE
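
Both interfaces rely on the same idea: subtracting group means removes the
fixed effects without estimating one dummy per group. The plain-Python sketch
below shows the within transformation for a single regressor and a single
absorbed factor. It illustrates the technique only and is not how ``absorb``
or ``feols`` are implemented; the data and function name are hypothetical.

.. code-block:: python

   from collections import defaultdict

   # Hypothetical rows (unit, x, y) generated as y = 2 * x + unit effect,
   # with unit effects of 10 for "a" and 50 for "b".
   rows = [
       ("a", 1.0, 12.0), ("a", 2.0, 14.0), ("a", 3.0, 16.0),
       ("b", 1.0, 52.0), ("b", 3.0, 56.0), ("b", 5.0, 60.0),
   ]

   def demean_by_unit(rows):
       """Subtract each unit's mean of x and y (the within transformation)."""
       totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum_x, sum_y, count
       for unit, x, y in rows:
           totals[unit][0] += x
           totals[unit][1] += y
           totals[unit][2] += 1
       return [
           (x - totals[u][0] / totals[u][2], y - totals[u][1] / totals[u][2])
           for u, x, y in rows
       ]

   demeaned = demean_by_unit(rows)

   # OLS slope on the demeaned data (no intercept is needed after demeaning).
   slope = sum(x * y for x, y in demeaned) / sum(x * x for x, _ in demeaned)
   print(slope)

A pooled regression on the raw data would be distorted by the large gap
between the two units; after demeaning, only within-unit variation remains
and the slope used to generate the data is recovered.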

Output Format
~~~~~~~~~~~~~

diff-diff results have convenience methods:

.. code-block:: python

   results.summary()       # Print formatted table
   results.to_dict()       # Dictionary representation
   results.to_dataframe()  # pandas DataFrame

Feature Comparison Table
------------------------

.. list-table::
   :header-rows: 1
   :widths: 40 15 15 15 15

   * - Feature
     - diff-diff
     - R did
     - R HonestDiD
     - R synthdid
   * - Basic 2x2 DiD
     - ✅
     - ✅
     - ❌
     - ❌
   * - TWFE
     - ✅
     - ❌
     - ❌
     - ❌
   * - Staggered DiD (CS)
     - ✅
     - ✅
     - ❌
     - ❌
   * - Covariate adjustment
     - ✅
     - ✅
     - ❌
     - ❌
   * - Doubly robust
     - ✅
     - ✅
     - ❌
     - ❌
   * - Group-time effects
     - ✅
     - ✅
     - ❌
     - ❌
   * - Event study
     - ✅
     - ✅
     - ✅
     - ❌
   * - Synthetic DiD
     - ✅
     - ❌
     - ❌
     - ✅
   * - Honest DiD (ΔRM)
     - ✅
     - ❌
     - ✅
     - ❌
   * - Honest DiD (ΔSD)
     - ✅
     - ❌
     - ✅
     - ❌
   * - Wild bootstrap
     - ✅
     - ❌
     - ❌
     - ❌
   * - Cluster-robust SE
     - ✅
     - ✅
     - ❌
     - ✅
   * - Placebo tests
     - ✅
     - ❌
     - ❌
     - ✅
   * - Parallel trends tests
     - ✅
     - ✅
     - ❌
     - ❌
   * - Bacon decomposition
     - ✅
     - ❌
     - ❌
     - ❌
   * - Sun-Abraham
     - ✅
     - ❌
     - ❌
     - ❌
   * - Imputation DiD
     - ✅
     - ❌
     - ❌
     - ❌
   * - Two-Stage DiD (did2s)
     - ✅
     - ❌
     - ❌
     - ❌
   * - Stacked DiD
     - ✅
     - ❌
     - ❌
     - ❌
   * - Continuous DiD
     - ✅
     - ✅
     - ❌
     - ❌
   * - Triple Difference (DDD)
     - ✅
     - ❌
     - ❌
     - ❌
   * - TROP
     - ✅
     - ❌
     - ❌
     - ❌
   * - Efficient DiD
     - ✅
     - ❌
     - ❌
     - ❌
   * - Heterogeneous adoption (HAD)
     - ✅
     - ❌
     - ❌
     - ❌

.. note::

   R equivalents for estimators not covered by the ``did``, ``HonestDiD``, or
   ``synthdid`` packages: Sun-Abraham is available via ``fixest::sunab()``;
   Imputation DiD via the ``didimputation`` package; Two-Stage DiD via the
   ``did2s`` package; Bacon Decomposition via the ``bacondecomp`` package;
   Stacked DiD requires manual implementation or the ``stackedev`` package;
   Continuous DiD is available via the ``did`` package continuous extension;
   Triple Difference requires manual implementation in R.
   TROP and Efficient DiD have no direct R equivalents.
   HeterogeneousAdoptionDiD (dCDH 2026) overlaps with the dedicated R
   package ``DIDHAD`` (de Chaisemartin et al., 2025), which covers the
   QUG case (Design 1'); diff-diff additionally covers Design 1 (no QUG,
   ``WAS_{d_lower}``) and survey-design integration via Binder TSL.

Migration Tips
--------------

1. **Column names**: diff-diff takes column names as strings, as the R
   packages do (for example, ``yname = "Y"`` becomes ``outcome='Y'``)

2. **Formula interface**: diff-diff supports R-style formulas for basic DiD:
   ``formula='y ~ treated * post'``

3. **Results access**: Use ``.att``, ``.se``, ``.ci`` instead of ``$att``, ``$se``

4. **Visualization**: ``plot_event_study()`` produces matplotlib figures similar
   to ``ggdid()`` output

5. **Missing data**: diff-diff requires complete data; use ``balance_panel()``
   or ``dropna()`` first

6. **Heterogeneous Adoption (HAD)**: If you need capabilities that the R
   ``DIDHAD`` package does not cover (Design 1 with no QUG and
   ``WAS_{d_lower}``, or survey-design integration), use
   :class:`~diff_diff.HeterogeneousAdoptionDiD`. See the
   `Heterogeneous Adoption (HAD)`_ section above for the migration pattern.
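
As a sanity check when porting a basic model such as the formula in tip 2,
recall that in a saturated 2x2 design the coefficient on the
``treated * post`` interaction equals the difference-in-differences of the
four cell means. The sketch below computes that quantity by hand so it can be
compared against the output of either package. The observations and the
``cell_mean`` helper are hypothetical.

.. code-block:: python

   from statistics import mean

   # Hypothetical observations: (treated, post, y).
   observations = [
       (0, 0, 10.0), (0, 0, 12.0),  # control, before
       (0, 1, 13.0), (0, 1, 15.0),  # control, after
       (1, 0, 20.0), (1, 0, 22.0),  # treated, before
       (1, 1, 28.0), (1, 1, 30.0),  # treated, after
   ]

   def cell_mean(observations, treated, post):
       """Mean outcome in one of the four treated-by-post cells."""
       return mean(y for t, p, y in observations if t == treated and p == post)

   treated_change = cell_mean(observations, 1, 1) - cell_mean(observations, 1, 0)
   control_change = cell_mean(observations, 0, 1) - cell_mean(observations, 0, 0)

   # Difference-in-differences: treated change net of the control change.
   did_estimate = treated_change - control_change
   print(did_estimate)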