Choosing an Estimator#
This guide helps you select the right estimator for your research design.
Decision Flowchart#
Start here and follow the questions:
Is this a triple-difference (DDD) design? (Two criteria for treatment: e.g., policy adoption AND group eligibility)
No → Go to question 1
Yes, simultaneous treatment (2×2×2) → Use TripleDifference
Yes, with staggered timing → Use StaggeredTripleDifference
1. Is treatment continuous? (Units receive different doses or intensities)
No → Go to question 2
Yes → Use ContinuousDiD
2. Can treatment switch on AND off? (Reversible / non-absorbing treatment, e.g., marketing campaigns, seasonal promotions, on/off policy cycles)
No (treatment is absorbing: once treated, stays treated) → Go to question 3
Yes → Use ChaisemartinDHaultfoeuille, the only library estimator that handles non-absorbing treatments
3. Is treatment staggered? (Different units treated at different times)
No → Go to question 4
Yes → Use CallawaySantAnna (or EfficientDiD for tighter SEs under PT-All)
Yes, and you suspect homogeneous effects → Use ImputationDiD or TwoStageDiD for tighter CIs
Yes, with nonlinear outcome (binary/count) → Use WooldridgeDiD with method='logit' or method='poisson'
Want to diagnose TWFE bias? → Use BaconDecomposition first
4. Do you have panel data? (Multiple observations per unit over time)
No → Use DifferenceInDifferences (basic 2x2)
Yes → Go to question 5
5. Do you need period-specific effects? (Event study design)
No → Use TwoWayFixedEffects
Yes → Use MultiPeriodDiD
6. Is your treated group small? (Few treated units, many controls)
Consider SyntheticDiD for better pre-treatment fit
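The questions above can also be read as a chain of early returns. The sketch below encodes the main branches as a plain function. Note that choose_estimator and its argument names are hypothetical helpers written for this guide, not part of the library; the returned strings are simply the class names used above.

```python
def choose_estimator(
    ddd=False,
    continuous=False,
    reversible=False,
    staggered=False,
    panel=True,
    event_study=False,
):
    """Return the first estimator the flowchart points to (hypothetical helper)."""
    if ddd:
        return "StaggeredTripleDifference" if staggered else "TripleDifference"
    if continuous:
        return "ContinuousDiD"
    if reversible:
        return "ChaisemartinDHaultfoeuille"
    if staggered:
        return "CallawaySantAnna"
    if not panel:
        return "DifferenceInDifferences"
    return "MultiPeriodDiD" if event_study else "TwoWayFixedEffects"


print(choose_estimator(staggered=True))   # CallawaySantAnna
print(choose_estimator(panel=False))      # DifferenceInDifferences
```

The function returns only the default recommendation for each branch. The alternatives listed under the staggered question and the SyntheticDiD suggestion for small treated groups remain judgment calls.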
Quick Reference#
| Estimator | Best For | Key Assumption | Output |
|---|---|---|---|
| DifferenceInDifferences | Simple 2x2 designs, cross-sectional comparisons | Parallel trends (2 periods) | Single ATT |
| TwoWayFixedEffects | Panel data, simultaneous treatment | Parallel trends (all periods) | Single ATT with unit/time FE |
| MultiPeriodDiD | Event studies, dynamic effects | Parallel trends (pre-periods) | Period-specific effects |
| CallawaySantAnna | Staggered adoption, heterogeneous timing | Conditional parallel trends | Group-time ATT(g,t), aggregations |
| ChaisemartinDHaultfoeuille | Reversible / non-absorbing treatments (only library option) | Parallel trends + A5 (no crossing) + A11 (stable controls) | DID_l event study (L_max), normalized DID^n_l, cost-benefit delta, placebos, sup-t bands, TWFE diagnostic |
| SyntheticDiD | Few treated units, many controls | Synthetic parallel trends | ATT with unit/time weights |
| EfficientDiD | Staggered adoption with optimal efficiency | PT-All (overidentified) or PT-Post | Group-time ATT(g,t), aggregations |
| ContinuousDiD | Continuous dose / treatment intensity | Strong Parallel Trends (SPT) for dose-response; PT for binarized ATT | ATTloc (PT); ATT(d), ACRT(d) (SPT) |
| HeterogeneousAdoptionDiD | Universal rollout, dose varies, no untreated unit | dCDH 2026 Assumptions (Design 1’ QUG case or Design 1 with A6/A5) | WAS or WAS_{d_lower} per resolved estimand; event-study Appendix B.2 |
| SunAbraham | Staggered adoption, interaction-weighted | Conditional parallel trends | Cohort-specific ATTs, event study |
| ImputationDiD | Staggered, homogeneous effects | Unit + time FE structure | Imputed treatment effects, event study |
| TwoStageDiD | Staggered adoption, efficient | Unit + time FE structure | Single ATT or event study |
| StackedDiD | Staggered, sub-experiment approach | Parallel trends per cohort | Trimmed aggregate ATT |
| TROP | Factor confounding suspected | Factor model + weights | ATT with triple robustness |
| TripleDifference | Two eligibility criteria (DDD) | Parallel trends for both dimensions | DDD ATT (regression, IPW, or DR) |
| StaggeredTripleDifference | Staggered DDD with treatment timing | Conditional parallel trends (DDD) | Group-time ATT(g,t), aggregations |
| WooldridgeDiD | Nonlinear outcomes or saturated OLS | Conditional parallel trends | OLS: direct coefficients; logit/Poisson: ASF-based ATT |
| BaconDecomposition | TWFE diagnostic | (diagnostic tool) | 2x2 decomposition weights |
Detailed Guidance#
Basic 2x2 DiD#
Use DifferenceInDifferences when:
You have a simple before/after, treatment/control design
Treatment occurs simultaneously for all treated units
You want a single average treatment effect
from diff_diff import DifferenceInDifferences
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treatment='treated', time='post')
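To make the estimand concrete, the 2x2 ATT can be computed by hand from four cell means. The following is a minimal sketch on made-up numbers using only the standard library; it shows the arithmetic behind a basic DiD, not the library's implementation.

```python
from statistics import mean

# Toy data: (treated, post, outcome). Values are made up for illustration.
rows = [
    (0, 0, 10.0), (0, 0, 12.0),
    (0, 1, 13.0), (0, 1, 15.0),
    (1, 0, 20.0), (1, 0, 22.0),
    (1, 1, 27.0), (1, 1, 29.0),
]

def cell_mean(treated, post):
    """Average outcome in one of the four treatment-by-period cells."""
    return mean(y for d, p, y in rows if d == treated and p == post)

treated_change = cell_mean(1, 1) - cell_mean(1, 0)   # 28 - 21 = 7
control_change = cell_mean(0, 1) - cell_mean(0, 0)   # 14 - 11 = 3
att = treated_change - control_change
print(att)  # 4.0
```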
Two-Way Fixed Effects#
Use TwoWayFixedEffects when:
You have panel data with multiple time periods
Treatment timing is the same for all treated units
You want to control for unit and time fixed effects
You don’t need to see period-by-period effects
Warning
TWFE can be biased with staggered treatment timing. Already-treated units
act as controls for newly-treated units, which can cause negative weighting.
Use CallawaySantAnna for staggered designs.
from diff_diff import TwoWayFixedEffects
twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treatment='treated',
unit='unit_id', time='period')
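The warning about already-treated controls can be illustrated with a noise-free toy example. The sketch below assumes two cohorts with made-up outcomes and computes the single 2x2 comparison in which the early cohort acts as the control for the late cohort. It illustrates the mechanism only and does not reproduce the full TWFE weighting.

```python
# Noise-free toy outcomes by period (the untreated outcome is 0 everywhere).
# Early cohort: treated from t=1, with an effect that grows by 2 each period.
# Late cohort: treated at t=3 with a true effect of 1.
early = {0: 0.0, 1: 2.0, 2: 4.0, 3: 6.0}
late = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}

pre, post = 2, 3
late_change = late[post] - late[pre]      # 1.0
early_change = early[post] - early[pre]   # 2.0: a growing effect, not a trend
forbidden_did = late_change - early_change
true_effect = 1.0
print(forbidden_did, true_effect)  # -1.0 1.0
```

The comparison has the wrong sign because the control group's outcome is still moving in response to its own treatment.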
Multi-Period Event Study#
Use MultiPeriodDiD when:
You want a full event-study with pre and post treatment effects
You need pre-period coefficients to assess parallel trends
You want to visualize treatment effect dynamics over time
All treated units receive treatment at the same time (simultaneous adoption)
from diff_diff import MultiPeriodDiD, plot_event_study
event = MultiPeriodDiD()
results = event.fit(data, outcome='y', treatment='treated',
time='period', unit='unit_id', reference_period=2)
# Visualize
plot_event_study(results)
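Conceptually, each event-study coefficient is a 2x2 DiD that compares a given period with the reference period. The sketch below computes these by hand from made-up group means, assuming a balanced panel without covariates. It is meant to show what the coefficients measure, not how MultiPeriodDiD estimates them.

```python
# Group means by period (made-up numbers); treatment starts at period 3.
treated = {1: 5.0, 2: 6.0, 3: 9.0, 4: 11.0}
control = {1: 3.0, 2: 4.0, 3: 5.0, 4: 6.0}
reference_period = 2

effects = {}
for t in sorted(treated):
    if t == reference_period:
        continue  # the reference period is normalized to zero
    effects[t] = (treated[t] - treated[reference_period]) - (
        control[t] - control[reference_period]
    )
print(effects)  # {1: 0.0, 3: 2.0, 4: 3.0}
```

A pre-period value near zero (period 1 here) is what the parallel-trends assessment looks for.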
Callaway-Sant’Anna#
Use CallawaySantAnna when:
Treatment is adopted at different times (staggered rollout)
You want valid treatment effect estimates with heterogeneous timing
You need group-time specific effects ATT(g,t)
This is the recommended estimator for most applied work with staggered adoption.
from diff_diff import CallawaySantAnna
cs = CallawaySantAnna(
    control_group='never_treated',  # or 'not_yet_treated'
    estimation_method='dr'          # doubly robust (recommended)
)
results = cs.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
covariates=['x1', 'x2'])
# Overall ATT
print(f"Overall ATT: {results.overall_att:.3f}")
# Event study aggregation
es = cs.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
covariates=['x1', 'x2'], aggregate='event_study')
event_study_df = es.to_dataframe('event_study')
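For intuition, a group-time effect ATT(g,t) without covariates is a long difference from the last pre-treatment period, taken for cohort g and for the comparison group. The sketch below assumes a tiny made-up panel with never-treated controls. The att_gt helper is hypothetical and omits the doubly robust adjustment and inference that the estimator provides.

```python
from statistics import mean

# Toy balanced panel: unit -> (first_treat, {period: outcome}); 0 = never treated.
panel = {
    "a": (3, {1: 1.0, 2: 2.0, 3: 6.0, 4: 8.0}),
    "b": (3, {1: 2.0, 2: 3.0, 3: 7.0, 4: 9.0}),
    "c": (0, {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}),
    "d": (0, {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}),
}

def att_gt(g, t):
    """Change since base period g-1 for cohort g, minus the never-treated change."""
    base = g - 1
    cohort = mean(y[t] - y[base] for ft, y in panel.values() if ft == g)
    never = mean(y[t] - y[base] for ft, y in panel.values() if ft == 0)
    return cohort - never

print(att_gt(3, 3), att_gt(3, 4))  # 3.0 4.0
```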
Reversible (Non-Absorbing) Treatment#
Use ChaisemartinDHaultfoeuille (alias DCDH) when:
Treatment can switch on and off over time (e.g., marketing campaigns, seasonal promotions, on/off policy cycles)
You need separate joiners (DID_+) and leavers (DID_-) views, plus the aggregate DID_M
You want a built-in placebo and a TWFE decomposition diagnostic computed on the data you pass in (pre-filter) for direct comparison against DID_M
You want a multi-horizon event study (pass L_max to fit()) with normalized effects, cost-benefit aggregation, dynamic placebos, and sup-t simultaneous confidence bands
This is the only library estimator that handles non-absorbing treatments.
All other staggered estimators
(CallawaySantAnna, SunAbraham,
ImputationDiD, TwoStageDiD,
EfficientDiD, WooldridgeDiD) assume
treatment is absorbing: once treated, a unit stays treated.
The estimator ships DID_M (= DID_1) from de Chaisemartin & D’Haultfœuille
(2020), the full multi-horizon event study DID_l for l = 1..L_max
from the dynamic companion paper (NBER WP 29873), residualization-style
covariate adjustment (controls), group-specific linear trends
(trends_linear), state-set-specific trends (trends_nonparam),
heterogeneity testing, non-binary treatment, HonestDiD sensitivity
integration on placebos, and survey support via Taylor-series linearization.
from diff_diff import ChaisemartinDHaultfoeuille
from diff_diff.prep import generate_reversible_did_data
data = generate_reversible_did_data(n_groups=80, n_periods=6, seed=42)
est = ChaisemartinDHaultfoeuille()
results = est.fit(
    data,
    outcome="outcome",
    group="group",
    time="period",
    treatment="treatment",
)
results.print_summary()
print(f"DID_M (overall): {results.overall_att:.3f}")
print(f"DID_+ (joiners): {results.joiners_att:.3f}")
print(f"DID_- (leavers): {results.leavers_att:.3f}")
print(f"Placebo: {results.placebo_effect:.3f}")
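To see what the joiners and leavers quantities compare, the sketch below computes them by hand for a single pair of consecutive periods. It assumes equally sized groups and made-up outcomes. The names are illustrative, and the library's aggregation across periods, filtering, and variance estimation are not shown.

```python
from statistics import mean

# One pair of consecutive periods; each row is (d_prev, d_now, y_prev, y_now).
groups = [
    (0, 1, 1.0, 4.0), (0, 1, 2.0, 5.0),   # joiners
    (0, 0, 1.0, 2.0), (0, 0, 3.0, 4.0),   # stable untreated
    (1, 0, 6.0, 5.0),                     # leaver
    (1, 1, 5.0, 6.0), (1, 1, 7.0, 8.0),   # stable treated
]

def mean_change(d_prev, d_now):
    """Average outcome change among groups with the given treatment path."""
    return mean(y1 - y0 for a, b, y0, y1 in groups if (a, b) == (d_prev, d_now))

did_plus = mean_change(0, 1) - mean_change(0, 0)    # joiners vs stable untreated
did_minus = mean_change(1, 1) - mean_change(1, 0)   # stable treated vs leavers
n_join = sum(1 for a, b, *_ in groups if (a, b) == (0, 1))
n_leave = sum(1 for a, b, *_ in groups if (a, b) == (1, 0))
did_m = (n_join * did_plus + n_leave * did_minus) / (n_join + n_leave)
print(did_plus, did_minus, did_m)  # 2.0 2.0 2.0
```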
Note
By default, the estimator drops groups whose treatment switches more
than once before estimation (drop_larger_lower=True, matching the R
DIDmultiplegtDYN reference). This is required for the analytical
variance formula to be consistent with the point estimate. Each drop
emits an explicit warning.
Note
Single-period placebo DID_M^pl (L_max=None) has NaN SE -
the per-period aggregation path has no influence-function derivation,
so inference fields stay NaN even when n_bootstrap > 0. The
point estimate is meaningful for visual pre-trends inspection.
Multi-horizon dynamic placebos DID^{pl}_l (L_max >= 1) have
valid analytical SE and bootstrap SE via the placebo IF. See
docs/methodology/REGISTRY.md for the full contract.
Note
ChaisemartinDHaultfoeuille supports survey_design with pweight
and strata/PSU/FPC via Taylor Series Linearization. Replicate weights
are not yet supported.
Synthetic DiD#
Use SyntheticDiD when:
You have few treated units but many control units
Pre-treatment fit between treated and control is poor
You want to construct a weighted synthetic control
from diff_diff import SyntheticDiD, generate_did_data
# SyntheticDiD requires block treatment (constant within units)
block_data = generate_did_data(n_units=40, n_periods=10, treatment_effect=2.0)
sdid = SyntheticDiD()
results = sdid.fit(block_data, outcome='outcome', unit='unit',
time='period', treatment='treated')
# View the unit weights
print(results.unit_weights)
Continuous Treatment#
Use ContinuousDiD when:
Treatment varies in intensity or dose (e.g., subsidy amount, hours of training)
You want to estimate how effects change with treatment dose
You need the full dose-response curve, not just a single average effect
Staggered adoption where units receive different treatment levels
Note
Dose-response curves ATT(d) and ACRT(d) require Strong Parallel Trends (SPT). Under standard PT only the binarized ATTloc is identified. Data must include an untreated group (D = 0), a balanced panel, and time-invariant dose (each unit’s dose is fixed across periods).
from diff_diff import ContinuousDiD, generate_continuous_did_data
data = generate_continuous_did_data(n_units=200, seed=42)
est = ContinuousDiD(n_bootstrap=199, seed=42)
results = est.fit(data, outcome='outcome', unit='unit',
time='period', first_treat='first_treat',
dose='dose', aggregate='dose')
# Overall effect and dose-response curve
print(f"Overall ATT: {results.overall_att:.3f}")
att_curve = results.dose_response_att.to_dataframe()
Universal Rollout / No Untreated Control#
Use HeterogeneousAdoptionDiD when:
Every unit is treated at the post period (universal-rollout policy, industry-wide tariff change, simultaneous launch into all markets)
Treatment intensity (dose) varies across units, but no genuinely untreated control group exists to anchor a standard DiD contrast
ContinuousDiD is unavailable because its untreated-group requirement (D = 0) is violated
The estimator implements de Chaisemartin, Ciccia, D’Haultfoeuille and Knau (2026, arXiv:2405.04465v6) and resolves to one of two estimands depending on the dose support:
Design 1’ (QUG case, d_lower = 0) identifies the Weighted Average Slope (WAS) under the Quasi-Untreated-Group assumption (units with the smallest dose serve as the comparison anchor). The shipped result class exposes target_parameter == "WAS".
Design 1 (no QUG, d_lower > 0) identifies WAS_{d_lower} under Assumption 6, or sign identification only under Assumption 5; neither additional assumption is testable via pre-trends. The result class exposes target_parameter == "WAS_d_lower".
The dose-distribution path is auto-detected. Run
did_had_pretest_workflow() to vet the identifying assumptions
before estimation; see Heterogeneous Adoption Difference-in-Differences for the full API and SE-regime contract.
import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD, did_had_pretest_workflow
# Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
rng = np.random.default_rng(42)
G, F, T = 200, 4, 5
doses = rng.beta(0.5, 1.0, size=G)
rows = []
for g in range(G):
    for t in range(1, T + 1):
        y = (rng.normal()
             + (doses[g] + doses[g] ** 2) * (t >= F)
             + rng.normal(0, 0.5))
        d = doses[g] if t >= F else 0.0
        rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
had_data = pd.DataFrame(rows)
pretests = did_had_pretest_workflow(had_data, outcome_col='y', unit_col='unit',
time_col='period', dose_col='dose',
aggregate='event_study')
est = HeterogeneousAdoptionDiD()
results = est.fit(had_data, outcome_col='y', unit_col='unit',
time_col='period', dose_col='dose',
aggregate='event_study')
# Event-study results: per-horizon WAS at each event time
for e, att in zip(results.event_times, results.att):
    print(f" e={e}: {att:.3f}")
Efficient DiD#
Use EfficientDiD when:
You have staggered adoption and want maximum statistical efficiency on the no-covariate path
You believe parallel trends holds across all pre-treatment periods (PT-All)
You want tighter confidence intervals than Callaway-Sant’Anna
You need a formal efficiency benchmark for comparing estimators
Note
EfficientDiD supports covariate adjustment via a doubly-robust path:
sieve-based propensity score ratios combined with a linear OLS outcome
regression. The DR property gives consistency if either the OR or the
PS is correctly specified, but the linear OLS outcome regression does
not generically attain the semiparametric efficiency bound unless the
conditional mean is linear in the covariates. The unqualified efficiency
claim applies to the no-covariate path only. Pass column names to the
covariates parameter on fit(). See
docs/methodology/REGISTRY.md for the full contract.
from diff_diff import EfficientDiD
edid = EfficientDiD(pt_assumption="all") # or "post" for post-treatment CS match
results = edid.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
aggregate='all')
results.print_summary()
Sun-Abraham#
Use SunAbraham when:
You have staggered adoption and want an interaction-weighted event study
You want to decompose effects by cohort and relative time
You need a regression-based complement to Callaway-Sant’Anna
Sun & Abraham (2021) uses a saturated TWFE regression with cohort x relative-time interactions, then aggregates cohort-specific effects using interaction weights.
from diff_diff import SunAbraham
sa = SunAbraham(control_group='never_treated')
results = sa.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
results.print_summary()
Note
Running both Sun-Abraham and Callaway-Sant’Anna provides a useful robustness check. Both are consistent under heterogeneous treatment effects.
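The interaction-weighted aggregation step can be sketched in a few lines. The example assumes made-up cohort-specific effects at a single relative time and weights them by cohort share. The variable names are illustrative, and the regression that produces the cohort effects is not shown.

```python
# Cohort-specific effects at one relative time (made-up) and cohort sizes.
catt = {2004: 1.0, 2006: 2.0, 2008: 4.0}
n_units = {2004: 50, 2006: 30, 2008: 20}

total = sum(n_units.values())
weights = {g: n / total for g, n in n_units.items()}   # shares sum to 1
iw_effect = sum(weights[g] * catt[g] for g in catt)
print(round(iw_effect, 6))  # 1.9
```

Because the weights are cohort shares, they are non-negative and sum to one.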
Imputation DiD#
Use ImputationDiD when:
You have staggered adoption with homogeneous treatment effects
You want shorter confidence intervals than Callaway-Sant’Anna (~50% shorter)
You need imputed counterfactual outcomes for treated observations
Borusyak, Jaravel & Spiess (2024) estimate unit + time FE on untreated observations, impute counterfactual Y(0) for treated observations, then aggregate.
from diff_diff import ImputationDiD
imp = ImputationDiD()
results = imp.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
aggregate='event_study')
results.print_summary()
Note
Under homogeneous effects, ImputationDiD is semiparametrically efficient. If you suspect heterogeneous effects across cohorts, prefer Callaway-Sant’Anna.
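The three imputation steps can be traced on a noise-free toy panel. The sketch below assumes made-up unit and time effects and a constant treatment effect of 2.0, and fits the fixed effects by alternating means as a simple stand-in for the OLS fit. It illustrates the procedure only and does not reproduce ImputationDiD's estimation or its variance.

```python
# Y(0) = unit effect + time effect; treated observations add an effect of 2.0.
unit_fe = {"a": 1.0, "b": 2.0, "c": 3.0}
time_fe = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}
first_treat = {"a": 3, "b": 4, "c": None}   # None = never treated

def treated(i, t):
    return first_treat[i] is not None and t >= first_treat[i]

y = {(i, t): unit_fe[i] + time_fe[t] + (2.0 if treated(i, t) else 0.0)
     for i in unit_fe for t in time_fe}
untreated = [k for k in y if not treated(*k)]

# Step 1: fit unit and time effects on untreated observations only.
a = {i: 0.0 for i in unit_fe}
b = {t: 0.0 for t in time_fe}
for _ in range(100):
    for i in a:
        obs = [k for k in untreated if k[0] == i]
        a[i] = sum(y[k] - b[k[1]] for k in obs) / len(obs)
    for t in b:
        obs = [k for k in untreated if k[1] == t]
        b[t] = sum(y[k] - a[k[0]] for k in obs) / len(obs)

# Step 2: impute Y(0) for treated observations. Step 3: average the gaps.
gaps = [y[k] - (a[k[0]] + b[k[1]]) for k in y if treated(*k)]
att = sum(gaps) / len(gaps)
print(round(att, 6))  # 2.0
```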
Two-Stage DiD#
Use TwoStageDiD when:
You want the same point estimates as ImputationDiD with a different variance estimator
You prefer the GMM sandwich variance that accounts for first-stage uncertainty
You want a single ATT or an event study from a two-stage procedure
Gardner (2022) estimates FE on untreated obs (stage 1), residualizes all outcomes, then regresses residuals on treatment indicators (stage 2).
from diff_diff import TwoStageDiD
ts = TwoStageDiD()
results = ts.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
aggregate='event_study')
results.print_summary()
Note
Point estimates are identical to ImputationDiD; the key difference is the variance estimator (GMM sandwich vs. conservative clustered).
Stacked DiD#
Use StackedDiD when:
You have staggered adoption and want a sub-experiment approach
You want to avoid forbidden comparisons in TWFE by construction
You need corrective Q-weights for unbiased stacked estimation
Wing, Freedman & Hollingsworth (2024) create one sub-experiment per adoption cohort with clean controls and apply Q-weights to reweight the stacked regression.
from diff_diff import StackedDiD
stk = StackedDiD(kappa_pre=2, kappa_post=3)
results = stk.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
aggregate='event_study')
results.print_summary()
Note
The trimmed aggregate ATT may exclude early or late cohorts whose event
windows do not fit in the data. Check results.trimmed_groups.
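The sub-experiment construction and the trimming it implies can be sketched as follows. The example assumes made-up adoption dates, a data range of periods 1 to 10, and the convention that clean controls are units never treated or first treated after the event window closes. The exact window conventions in StackedDiD may differ.

```python
# Adoption period per unit (None = never treated); data cover periods 1..10.
first_treat = {"u1": 3, "u2": 5, "u3": 5, "u4": 9, "u5": None, "u6": None}
t_min, t_max = 1, 10
kappa_pre, kappa_post = 2, 3

sub_experiments, trimmed = {}, []
for a in sorted({g for g in first_treat.values() if g is not None}):
    lo, hi = a - kappa_pre, a + kappa_post
    if lo < t_min or hi > t_max:
        trimmed.append(a)   # event window does not fit in the data
        continue
    sub_experiments[a] = {
        "window": (lo, hi),
        "treated": sorted(u for u, g in first_treat.items() if g == a),
        # Clean controls: never treated, or first treated after the window ends.
        "controls": sorted(u for u, g in first_treat.items()
                           if g is None or g > hi),
    }

for a, spec in sub_experiments.items():
    print(a, spec)
print("trimmed cohorts:", trimmed)  # trimmed cohorts: [9]
```

Cohort 9 is dropped because its post-window would run past period 10, which is the kind of exclusion the note above describes.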
TROP#
Use TROP when:
You suspect interactive fixed effects (factor confounding)
Standard parallel trends may not hold due to unobserved factors
You want triple robustness: factor model + unit weights + time weights
Athey, Imbens, Qu & Viviano (2025) combine nuclear norm regularization, exponential unit distance weights, and time decay weights with LOOCV tuning.
from diff_diff import TROP
trop = TROP(n_bootstrap=200)
results = trop.fit(data, outcome='y', treatment='treated',
unit='unit_id', time='period')
results.print_summary()
Note
TROP is computationally intensive. Use method='global' for faster
estimation at the cost of some flexibility vs. method='local'.
Bacon Decomposition#
Use BaconDecomposition when:
You want to diagnose whether TWFE is biased in your staggered setting
You need to see which 2x2 comparisons drive the TWFE estimate
You want to check whether later-vs-earlier or already-treated-as-control comparisons carry substantial weight
Goodman-Bacon (2021) decomposes the TWFE estimate into a weighted average of all 2x2 DiD comparisons and their weights.
from diff_diff import BaconDecomposition, plot_bacon
bacon = BaconDecomposition()
results = bacon.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
results.print_summary()
# Visualize the decomposition
plot_bacon(results)
Note
This is a diagnostic tool, not an estimator. If the decomposition reveals problematic weights, switch to Callaway-Sant’Anna or another robust estimator.
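The comparison types that the decomposition distinguishes can be enumerated from the cohort timing alone. The sketch below assumes two adoption cohorts plus a never-treated group, with made-up dates. It lists the 2x2 comparisons and labels them, but does not compute the decomposition weights.

```python
from itertools import combinations

# Adoption periods by cohort; None marks the never-treated group.
cohorts = [3, 6, None]

comparisons = []
timed = sorted(g for g in cohorts if g is not None)
if None in cohorts:
    for g in timed:
        comparisons.append((g, "never", "treated vs never-treated"))
for early, late in combinations(timed, 2):
    # Before `late` adopts, the later cohort is a not-yet-treated control.
    comparisons.append((early, late, "earlier vs later (clean)"))
    # After `late` adopts, the earlier cohort is an already-treated control.
    comparisons.append((late, early, "later vs earlier (forbidden)"))

for treated_cohort, control, kind in comparisons:
    print(treated_cohort, control, kind)
```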
Common Pitfalls#
Using TWFE with staggered adoption
TWFE estimates a weighted average of all 2x2 comparisons, including “forbidden” comparisons in which already-treated units serve as controls. When effects vary across cohorts or over time, this can bias the estimate severely and can even put negative weights on some treatment effects.
Solution: Use CallawaySantAnna for staggered designs.
Ignoring treatment effect heterogeneity
If treatment effects vary by cohort (when units are treated) or over time (dynamic effects), aggregated estimators may be misleading.
Solution: Use CallawaySantAnna and examine ATT(g,t) and event study plots.
Failing to test parallel trends
The parallel trends assumption is untestable in the post-period but can be assessed using pre-treatment data.
Solution: Use check_parallel_trends() and HonestDiD for sensitivity analysis.
Inappropriate clustering
Standard errors should typically be clustered at the level of treatment assignment (often the unit level).
Solution: Always specify cluster for panel data.
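As a rough illustration of assessing parallel trends on pre-treatment data, the sketch below compares the linear pre-period slopes of the two groups using made-up group means. The slope helper is hypothetical, no standard error is computed, and this is not a description of how check_parallel_trends() works internally.

```python
# Pre-treatment group means by period (made-up numbers).
treated_pre = {1: 2.0, 2: 3.0, 3: 4.0}
control_pre = {1: 1.0, 2: 1.5, 3: 2.0}

def slope(series):
    """Ordinary least-squares slope of the outcome on the period index."""
    ts = list(series)
    t_bar = sum(ts) / len(ts)
    y_bar = sum(series.values()) / len(ts)
    num = sum((t - t_bar) * (series[t] - y_bar) for t in ts)
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den

trend_gap = slope(treated_pre) - slope(control_pre)
print(trend_gap)  # 0.5
```

A gap that is clearly different from zero, as here, suggests the groups were already diverging before treatment.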
Standard Error Methods#
Different estimators compute standard errors differently. Understanding these differences helps interpret results and choose appropriate inference.
| Estimator | Default SE Method | Details |
|---|---|---|
| DifferenceInDifferences | HC1 (heteroskedasticity-robust) | Uses White’s robust SEs by default. Specify |
| TwoWayFixedEffects | Cluster-robust (unit level) | Always clusters at unit level after within-transformation. Specify |
| MultiPeriodDiD | HC1 (heteroskedasticity-robust) | Same as basic DiD. Cluster-robust available via |
| CallawaySantAnna | Analytical (influence function) | Uses influence-function SEs with WIF adjustment by default. Set |
| SyntheticDiD | Placebo, paper-faithful refit bootstrap, or jackknife | Default uses placebo-based variance ( |
| ContinuousDiD | Analytical (influence function) | Uses influence-function-based SEs by default. Use |
| HeterogeneousAdoptionDiD | Path-dependent (CCT-2014 / 2SLS / Binder TSL) | Three SE regimes per Heterogeneous Adoption Difference-in-Differences. Unweighted: continuous-dose paths use the CCT-2014 weighted-robust SE from the in-house |
| SunAbraham | Cluster-robust (unit level) | Clusters at unit level by default. Specify |
| ImputationDiD | Conservative clustered (Theorem 3) | Uses conservative clustered variance from Borusyak et al. Theorem 3, clustered at unit level. Use |
| TwoStageDiD | GMM sandwich (clustered) | Uses GMM sandwich variance accounting for first-stage estimation uncertainty, clustered at unit level. Use |
| StackedDiD | Cluster-robust (unit level) | Clusters at unit level by default. Set |
| TripleDifference | Influence function (robust) | Uses influence-function-based SEs (inherently heteroskedasticity-robust). Specify |
| TROP | Bootstrap (n_bootstrap=200) | Uses unit-level block bootstrap for variance estimation. Bootstrap is always required (minimum n_bootstrap=2). |
| EfficientDiD | Analytical (EIF-based) | Uses efficient influence function SE = sqrt(mean(EIF^2) / n). Use |
| BaconDecomposition | N/A (diagnostic) | Diagnostic tool only; does not produce standard errors. |
Recommendations by sample size:
Large samples (N > 1000, clusters > 50): Default analytical SEs are reliable
Medium samples (clusters 30-50): Cluster-robust SEs recommended
Small samples (clusters < 30): Use wild cluster bootstrap (inference='wild_bootstrap')
Very few clusters (< 10): Use Webb 6-point distribution (weight_type='webb')
Common pitfall: Forgetting to cluster when units are observed multiple times. For panel data, always cluster at the unit level unless you have a strong reason not to.
from diff_diff import DifferenceInDifferences, generate_did_data
panel = generate_did_data(n_units=200, n_periods=10, treatment_effect=2.0)
# Good: Cluster at unit level for panel data
did = DifferenceInDifferences(cluster='unit')
results = did.fit(panel, outcome='outcome', treatment='treated',
time='post')
# Better for few clusters: Wild bootstrap
did = DifferenceInDifferences(inference='wild_bootstrap', cluster='unit')
results = did.fit(panel, outcome='outcome', treatment='treated',
time='post')
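The reason the Webb distribution helps with very few clusters is combinatorial: two-point (Rademacher) weights allow only 2^G distinct draws with G clusters, while six-point weights allow 6^G. The sketch below, using only the standard library, lists the six Webb values and draws one weight per cluster. draw_cluster_weights is a hypothetical helper, not the library's bootstrap routine.

```python
import random
from math import sqrt

# Webb's six-point distribution: each value has probability 1/6.
WEBB = [-sqrt(1.5), -1.0, -sqrt(0.5), sqrt(0.5), 1.0, sqrt(1.5)]

def draw_cluster_weights(cluster_ids, rng):
    """One bootstrap draw: a single Webb weight per cluster, shared by its rows."""
    return {c: rng.choice(WEBB) for c in sorted(set(cluster_ids))}

n_clusters = 6
print(2 ** n_clusters, 6 ** n_clusters)  # 64 46656

# The weights have mean 0 and variance 1 (up to floating-point rounding).
mean_w = sum(WEBB) / len(WEBB)
var_w = sum(w * w for w in WEBB) / len(WEBB)

rng = random.Random(0)
weights = draw_cluster_weights(["a", "a", "b", "c"], rng)
print(sorted(weights))  # ['a', 'b', 'c']
```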
When in Doubt#
If you’re unsure which estimator to use:
Start with CallawaySantAnna - It’s valid even for non-staggered designs and provides the most flexible output (group-time effects, aggregations)
Check for heterogeneity - Plot event studies to see if effects vary
Run sensitivity analysis - Use HonestDiD to assess robustness
Compare estimators - If results differ substantially across estimators, investigate why (often reveals violations of assumptions)
Using survey data? - Pass a SurveyDesign to fit() for design-based variance estimation. See the Survey Design Support section below for the compatibility matrix, and the survey tutorial for a full walkthrough.
Survey Design Support#
All estimators accept an optional survey_design parameter in fit().
Pass a SurveyDesign object to get design-based variance
estimation. The depth of support varies by estimator:
Note
If your data starts as individual-level survey microdata (e.g., BRFSS,
ACS, CPS, NHANES respondent records), use aggregate_survey()
as a preprocessing step. It pools microdata into geographic-period cells and
returns a pre-configured SurveyDesign. By default, the
returned design uses weight_type="pweight" (unit-constant population
weights), which is compatible with all survey-capable
estimators in the matrix below. Pass second_stage_weights="aweight" for
precision weights (inverse variance) if you prefer efficiency-weighted
estimates - this mode is limited to estimators marked Full.
See Data Preparation for the API reference.
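The pooling step described in the note amounts to a weighted mean within each geographic-period cell. The sketch below assumes made-up respondent records and uses only the standard library. It shows the point-estimate arithmetic only; the SurveyDesign that aggregate_survey() returns and any variance calculation are not reproduced.

```python
from collections import defaultdict

# Respondent records: (state, year, outcome, sampling weight); made-up values.
micro = [
    ("CA", 2019, 1.0, 100.0), ("CA", 2019, 0.0, 300.0),
    ("CA", 2020, 1.0, 200.0), ("CA", 2020, 1.0, 200.0),
    ("TX", 2019, 0.0, 150.0), ("TX", 2019, 1.0, 50.0),
]

num = defaultdict(float)   # sum of weight * outcome per cell
den = defaultdict(float)   # sum of weights per cell
for state, year, y, w in micro:
    num[(state, year)] += w * y
    den[(state, year)] += w

cells = {cell: num[cell] / den[cell] for cell in num}
print(cells)
# {('CA', 2019): 0.25, ('CA', 2020): 1.0, ('TX', 2019): 0.25}
```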
| Estimator | Weights | Strata/PSU/FPC | Replicate Weights | Survey Bootstrap |
|---|---|---|---|---|
| | Full | Full | Full | – |
| | Full | Full | Full | – |
| | Full | Full | Full | – |
| | pweight only | Full | Full | Multiplier at PSU |
| | pweight only | Full (TSL) | – | Group-level (warning) |
| | pweight only | Full | Full (analytical) | – |
| | pweight only | Full | Full | Multiplier at PSU |
| | Full | Full | Full | Rao-Wu rescaled |
| | pweight only | Full (pweight only) | Full | – |
| | pweight only | Full | Full (analytical) | Multiplier at PSU |
| | pweight only | Full | Full (analytical) | Multiplier at PSU |
| | Full | Full | Full (analytical) | Multiplier at PSU |
| | pweight only | Full (Binder TSL) | – | Multiplier (event-study, |
| | Full | Full | Full (analytical) | Multiplier at PSU |
| | pweight only | Via bootstrap | – | Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only) |
| | pweight only | Via bootstrap | – | Rao-Wu rescaled |
| | Full (pweight only) | Full (analytical) | – | – |
| | Diagnostic | Diagnostic | – | – |
Legend:
Full: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
Full (pweight only): Full TSL with strata/PSU/FPC, but only pweight accepted (fweight/aweight rejected because composition changes weight semantics)
Via bootstrap: Strata/PSU/FPC supported only with bootstrap variance. TROP uses bootstrap by default. SyntheticDiD supports strata/PSU/FPC on variance_method='bootstrap' via a hybrid pairs-bootstrap + Rao-Wu rescaling composition (see the Note (survey + bootstrap composition) in REGISTRY.md §SyntheticDiD); placebo and jackknife remain pweight-only.
pweight only (Weights column): Only pweight accepted; fweight/aweight raise an error
Diagnostic: Weighted descriptive statistics only (no inference)
–: Not supported
Note
SyntheticDiD supports survey designs on variance_method='bootstrap'
— both pweight-only and full strata/PSU/FPC — via a hybrid pairs-bootstrap
composed with per-draw Rao-Wu rescaled weights fed into a weighted
Frank-Wolfe re-estimation of ω and λ. See the
Note (survey + bootstrap composition) in REGISTRY.md §SyntheticDiD
for the objective form and argmin-set caveat.
variance_method='placebo' and variance_method='jackknife' remain
pweight-only — composing placebo permutations / leave-one-out with
Rao-Wu rescaling under the weighted objective is a separate derivation
(tracked in TODO.md).
For the full walkthrough with code examples, see the
survey tutorial.
For deferred work and remaining limitations, see docs/survey-roadmap.md.