Troubleshooting#
This guide covers common issues and their solutions when using diff-diff.
Data Issues#
“No treated observations found”#
Problem: The estimator raises an error that no treated units were found.
Causes:
Treatment column contains wrong values (e.g., strings instead of 0/1)
Treatment column has all zeros
Column name is misspelled
Solutions:
# Check your treatment column
print(data['treated'].value_counts())
# Ensure binary 0/1 values
data['treated'] = (data['group'] == 'treatment').astype(int)
# Or use make_treatment_indicator
from diff_diff import make_treatment_indicator
data = make_treatment_indicator(data, 'group', treated_values='treatment')
“Panel is unbalanced”#
Problem: TwoWayFixedEffects or CallawaySantAnna fails with an unbalanced panel.
Causes:
Some units are missing observations for certain time periods
Units have different numbers of observations
Solutions:
from diff_diff import balance_panel
# Balance the panel (keeps only units with all periods)
balanced = balance_panel(data, unit_column='unit_id', time_column='period')
print(f"Dropped {len(data) - len(balanced)} observations")
# Alternative: check balance first
from diff_diff import validate_did_data
issues = validate_did_data(data, outcome='y', treatment='treated',
time='period', unit='unit_id')
print(issues)
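Before dropping anything, it can help to see which units cause the imbalance. The following is a minimal sketch in plain pandas that does not call any diff-diff API; the toy frame and the column names unit_id and period are assumptions for illustration.

```python
import pandas as pd

# Toy panel (hypothetical data): unit 3 is missing period 2.
data = pd.DataFrame({
    "unit_id": [1, 1, 1, 2, 2, 2, 3, 3],
    "period":  [0, 1, 2, 0, 1, 2, 0, 1],
})

all_periods = set(data["period"].unique())

# Count distinct periods per unit and keep the units that fall short.
periods_per_unit = data.groupby("unit_id")["period"].nunique()
incomplete = periods_per_unit[periods_per_unit < len(all_periods)]

# For each incomplete unit, list the periods it lacks.
missing = {
    int(unit): sorted(
        int(p)
        for p in all_periods - set(data.loc[data["unit_id"] == unit, "period"])
    )
    for unit in incomplete.index
}
print(missing)
```

If only a handful of units are incomplete, dropping them is usually harmless; if most are, it is worth checking how the data were merged before balancing.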
Estimation Errors#
“Singular matrix” or “Matrix is singular”#
Problem: Linear algebra error during estimation.
Causes:
Perfect collinearity in covariates
Too few observations relative to parameters
Fixed effects that absorb all variation
Solutions:
# Check for collinearity
import numpy as np
X = data[['x1', 'x2', 'x3']].values
print(f"Matrix rank: {np.linalg.matrix_rank(X)} vs {X.shape[1]} columns")
# Remove redundant covariates
# Or use fewer fixed effects
# For SyntheticDiD, increase regularization
sdid = SyntheticDiD(zeta_omega=1e-4) # increase unit weight regularization
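The rank check above says that some column is redundant but not which one. A rough way to locate it, sketched here with NumPy only, is to add columns one at a time and flag any column that does not raise the rank. The covariate names and the simulated data are assumptions, and the result depends on column order: with exact collinearity, the later column in the list is the one flagged.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = 2.0 * x1 - x2  # exact linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

kept, redundant = [], []
for j, name in enumerate(names):
    candidate = X[:, kept + [j]]
    # A column is informative only if adding it raises the rank by one.
    if np.linalg.matrix_rank(candidate) == len(kept) + 1:
        kept.append(j)
    else:
        redundant.append(name)

print("Redundant columns:", redundant)
```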
“Bootstrap iterations failed” warning#
Problem: SyntheticDiD warns that many bootstrap iterations failed.
Causes:
Small sample size leads to singular matrices in resamples
Insufficient pre-treatment periods for weight computation
Near-singular weight matrices
Solutions:
# Increase regularization
sdid = SyntheticDiD(zeta_omega=1e-4, zeta_lambda=1e-4, n_bootstrap=500)
# Or use placebo-based inference instead
sdid = SyntheticDiD(variance_method="placebo") # Uses placebo inference
# Ensure sufficient pre-treatment periods (recommend >= 4)
Standard Error Issues#
“Standard errors seem too small/large”#
Problem: SEs don’t match expectations or other software.
Causes:
Wrong clustering level
Not accounting for serial correlation
Different SE formulas (HC0 vs HC1 vs cluster)
Solutions:
# For panel data, always cluster at unit level
did = DifferenceInDifferences(cluster='unit_id')
results = did.fit(data, outcome='y', treatment='treated', time='post')
# Compare SE methods
did_robust = DifferenceInDifferences()
did_cluster = DifferenceInDifferences(cluster='unit_id')
did_wild = DifferenceInDifferences(inference='wild_bootstrap', cluster='unit_id')
r1 = did_robust.fit(data, outcome='y', treatment='treated', time='post')
r2 = did_cluster.fit(data, outcome='y', treatment='treated', time='post')
r3 = did_wild.fit(data, outcome='y', treatment='treated', time='post')
print(f"Robust SE: {r1.se:.4f}")
print(f"Cluster SE: {r2.se:.4f}")
print(f"Wild bootstrap SE: {r3.se:.4f}")
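To see why the formulas disagree, it may help to compute them by hand on a small OLS problem. The sketch below uses NumPy only and is not diff-diff's implementation; the simulated data and the small-sample factor applied to the cluster-robust variance (one common convention) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 40, 5
unit = np.repeat(np.arange(n_units), n_periods)
x = rng.normal(size=n_units)[unit] + 0.3 * rng.normal(size=unit.size)
# A unit-level shock induces within-unit error correlation.
e = rng.normal(size=n_units)[unit] + rng.normal(size=unit.size)
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones_like(x), x])
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# HC0: sandwich with squared residuals; HC1 rescales it by n / (n - k).
meat_hc0 = (X * resid[:, None] ** 2).T @ X
V_hc0 = XtX_inv @ meat_hc0 @ XtX_inv
V_hc1 = V_hc0 * n / (n - k)

# Cluster-robust: sum the score vectors within each unit first.
scores = X * resid[:, None]
cluster_scores = np.zeros((n_units, k))
np.add.at(cluster_scores, unit, scores)
adj = (n_units / (n_units - 1)) * ((n - 1) / (n - k))
V_cl = adj * (XtX_inv @ (cluster_scores.T @ cluster_scores) @ XtX_inv)

se_hc0, se_hc1, se_cl = (float(np.sqrt(V[1, 1])) for V in (V_hc0, V_hc1, V_cl))
print(f"HC0: {se_hc0:.4f}  HC1: {se_hc1:.4f}  cluster: {se_cl:.4f}")
```

HC1 differs from HC0 only by the degrees-of-freedom factor, so the two are close in large samples. The cluster-robust SE can differ from both by a wide margin when the regressor and the errors are correlated within units, which is the usual situation in panel data.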
“Wild bootstrap takes too long”#
Problem: Bootstrap inference is slow.
Solutions:
# Reduce number of bootstrap iterations (default is 999)
did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=499)
# Note: Fewer iterations = less precise p-values
# 499 is minimum recommended for publication
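The loss of precision can be roughly quantified. Conditional on the data, a bootstrap p-value computed from B independent draws is a sample proportion, so its Monte Carlo standard error is approximately sqrt(p * (1 - p) / B). The helper below is a hypothetical illustration using only the standard library; it is not part of diff-diff.

```python
import math


def pvalue_mc_se(p: float, n_bootstrap: int) -> float:
    """Approximate Monte Carlo standard error of a bootstrap p-value."""
    return math.sqrt(p * (1.0 - p) / n_bootstrap)


for b in (99, 499, 999):
    print(f"n_bootstrap={b:4d}: MC s.e. at p=0.05 is about {pvalue_mc_se(0.05, b):.4f}")
```

At p = 0.05 this gives roughly 0.022, 0.010, and 0.007 for 99, 499, and 999 iterations, so a p-value near the 5% threshold is hard to classify reliably with only 99 draws.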
Staggered Adoption Issues#
“No never-treated units found”#
Problem: CallawaySantAnna fails when using control_group='never_treated'.
Causes:
All units are eventually treated
first_treat column has no never-treated indicator (typically 0 or inf)
Solutions:
# Check first_treat distribution
print(data['first_treat'].value_counts())
# Option 1: Use not-yet-treated as controls
cs = CallawaySantAnna(control_group='not_yet_treated')
# Option 2: Mark never-treated units correctly
# Never-treated should have first_treat = 0 or np.inf
data.loc[data['ever_treated'] == 0, 'first_treat'] = 0
“Group-time effects have large standard errors”#
Problem: ATT(g,t) estimates are imprecise.
Causes:
Small cohort sizes
Few comparison periods
High variance in outcomes
Solutions:
# Check cohort sizes
print(data.groupby('first_treat')['unit_id'].nunique())
# Use bootstrap for better inference
cs = CallawaySantAnna(n_bootstrap=999)
results = cs.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
aggregate='event_study')
# Access aggregated results
print(results.overall_att) # Overall ATT
print(results.event_study_effects) # Event study effects
Visualization Issues#
“Event study plot looks wrong”#
Problem: Plot has unexpected gaps, wrong reference period, or missing periods.
Solutions:
from diff_diff import plot_event_study
# Check your results first
print(results.period_effects) # or results.event_study_effects
# Specify reference period explicitly
plot_event_study(results, reference_period=-1)
# For CallawaySantAnna, fit with aggregate='event_study'
results = cs.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat',
aggregate='event_study')
plot_event_study(results)
“Plot doesn’t show in Jupyter”#
Problem: Matplotlib figure doesn’t display.
Solutions:
import matplotlib.pyplot as plt
# Option 1: Use plt.show()
ax = plot_event_study(results)
plt.show()
# Option 2: Use inline magic (Jupyter)
%matplotlib inline
# Option 3: Display the parent figure explicitly
ax = plot_event_study(results)
ax.figure # As the last expression in a Jupyter cell, this renders the figure
Performance Issues#
“Estimation is slow”#
Problem: Fitting takes a long time.
Causes:
Large dataset with many fixed effects
Bootstrap inference with many iterations
CallawaySantAnna with many cohorts and time periods
Solutions:
# TWFE already handles unit + time FE via within-transformation
twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treatment='treated',
unit='unit_id', time='period')
# Reduce bootstrap iterations for initial exploration
did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=99)
# For CallawaySantAnna, start without bootstrap
cs = CallawaySantAnna()
results = cs.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
# Use n_bootstrap for final results
cs_boot = CallawaySantAnna(n_bootstrap=999)
results = cs_boot.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
Rust Backend Issues#
“Rust backend is not available”#
Problem: ImportError when using DIFF_DIFF_BACKEND=rust or attempting to
use Rust-accelerated operations.
Causes:
Rust backend was not compiled during installation
The maturin build step was skipped or failed
Platform does not have a pre-built wheel available
Solutions:
# Check if Rust backend is available
from diff_diff import HAS_RUST_BACKEND
print(f"Rust backend available: {HAS_RUST_BACKEND}")
# Force pure Python mode (no Rust required)
import os
os.environ['DIFF_DIFF_BACKEND'] = 'python'
# Rebuild with Rust backend (run these commands in a shell, not in Python)
pip install -e ".[dev]"
maturin develop --release
# On macOS with Apple Accelerate
maturin develop --release --features accelerate
TROP Issues#
“All tuning parameter combinations failed”#
Problem: TROP raises an error that all tuning parameter combinations failed during leave-one-out cross-validation (LOOCV).
Causes:
Insufficient pre-treatment periods (minimum 2; recommend 4+ for stability)
Near-constant outcomes that leave no variation to fit
Data is too sparse for the requested lambda grids
Solutions:
from diff_diff import TROP
# Widen the lambda grids to give the optimizer more room
trop = TROP(
lambda_time_grid=[0.0, 0.5, 1.0, 2.0, 5.0],
lambda_unit_grid=[0.0, 0.5, 1.0, 2.0, 5.0],
lambda_nn_grid=[0.0, 0.1, 1.0, 10.0],
)
# TROP requires at least 2 pre-treatment periods (4+ recommended)
pre_periods = data.loc[data['post'] == 0, 'period'].nunique()
print(f"Pre-treatment periods: {pre_periods}") # Must be >= 2; stability improves with >= 4
# If TROP cannot find valid parameters, try CallawaySantAnna as a fallback
from diff_diff import CallawaySantAnna
cs = CallawaySantAnna()
results = cs.fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
“LOOCV fits failed / numerical instability”#
Problem: Partial LOOCV failures during TROP tuning, or warnings about numerical instability in cross-validation fits.
Causes:
Poor data quality (missing values, outliers)
Regularization parameters too small for the data scale
Solutions:
# Check data quality
print(data[['y', 'treatment', 'post']].describe())
print(f"Missing values:\n{data.isnull().sum()}")
# Increase regularization to improve numerical stability
trop = TROP(
lambda_nn_grid=[0.1, 1.0, 10.0, 100.0], # Larger minimum lambda
)
“Few bootstrap iterations succeeded”#
Problem: TROP warns that only N of M bootstrap iterations completed successfully, leading to imprecise standard errors.
Causes:
Small sample sizes cause singular matrices in bootstrap resamples
Complex model specification amplifies resampling instability
Solutions:
# Increase total bootstrap iterations to get enough successes
trop = TROP(n_bootstrap=999)
# Simplify the model to reduce bootstrap failures
trop = TROP(method='global', n_bootstrap=999)
Continuous DiD Issues#
“Dose appears discrete”#
Problem: ContinuousDiD warns that the dose variable appears to contain
only integer or discrete values.
Causes:
Treatment is truly binary (0/1) and should use standard DiD
Dose variable is coded as integers but represents a continuous measure
Solutions:
# Check dose distribution
print(data['dose'].value_counts())
# If treatment is truly binary, use standard DiD instead
from diff_diff import DifferenceInDifferences
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treatment='treatment', time='post')
# If dose is continuous but stored as int, convert
data['dose'] = data['dose'].astype(float)
“No post-treatment cells available for aggregation”#
Problem: No (g, t) cells are available after filtering, so aggregation cannot produce an ATT estimate.
Causes:
first_treat is miscoded (e.g., all zeros or all the same value)
No post-treatment periods exist in the data for treated cohorts
Filtering removed all valid cells
Solutions:
# Check first_treat coding
print(data['first_treat'].value_counts())
# Verify that post-treatment periods exist for treated units
treated = data[data['first_treat'] > 0]
for g, group in treated.groupby('first_treat'):
    post_obs = group[group['period'] >= g]
    print(f"Cohort {g}: {len(post_obs)} post-treatment observations")
HeterogeneousAdoptionDiD (HAD) Issues#
“Resolved estimand is not what I expected (WAS vs WAS_d_lower)”#
Problem: HeterogeneousAdoptionDiD resolves target_parameter to
"WAS_d_lower" when you expected "WAS" (or vice versa).
Cause: HAD auto-detects the design path from the unit-level
post-treatment dose D_{g,F} (the dose at the first treated period
F, one value per unit), NOT from the full panel dose column. The
panel column carries structural pre-period zeros (HAD requires
D_{g,t} = 0 for t < F), so had_data['dose'].min() is always
zero on a valid HAD panel and tells you nothing about the resolved
design. _detect_design then resolves on D_{g,F} and picks Design
1’ (continuous_at_zero, targets WAS) when EITHER
D_{g,F}.min() == 0 exactly OR D_{g,F}.min() is a small positive
value below 0.01 * median(|D_{g,F}|) (the small-share-of-treated
escape clause). Otherwise the estimator routes to Design 1, with a
further check for mass-point structure (modal fraction at D_{g,F}.min()
exceeding 2% routes to mass_point; otherwise
continuous_near_d_lower); both Design 1 paths target WAS_{d_lower}.
Solutions:
import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD
# Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
rng = np.random.default_rng(42)
G, F, T = 200, 4, 5
doses = rng.beta(0.5, 1.0, size=G)
rows = []
for g in range(G):
    for t in range(1, T + 1):
        y = (rng.normal()
             + (doses[g] + doses[g] ** 2) * (t >= F)
             + rng.normal(0, 0.5))
        d = doses[g] if t >= F else 0.0
        rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
had_data = pd.DataFrame(rows)
# Inspect the support the detector actually uses: per-unit dose at the
# first treated period F. Pre-period zeros on the panel column are
# structural and ignored by `_detect_design()`.
d_at_F = had_data.loc[had_data['period'] == F].set_index('unit')['dose']
print(d_at_F.describe())
d_min = float(d_at_F.min())
d_thr = 0.01 * float(np.median(np.abs(d_at_F)))
print(f"D_{{g,F}}.min() = {d_min:.6g}; "
f"0.01 * median(|D_{{g,F}}|) = {d_thr:.6g}; "
f"D_{{g,F}}.min() < threshold => Design 1' (WAS)")
# Check the resolved estimand after fitting
est = HeterogeneousAdoptionDiD()
results = est.fit(had_data, outcome_col='y', unit_col='unit',
time_col='period', dose_col='dose',
aggregate='event_study')
print(f"Resolved: {results.target_parameter}")
# If you intend Design 1' but `D_{g,F}.min()` exceeds the threshold,
# verify the dose-variable encoding (e.g. log-transformed doses where
# 0 was mapped to a small positive value larger than 1% of the median).
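The detection rule described in the cause above can be restated as a short decision function. This is a hypothetical sketch of the rule as worded in this section, not diff-diff's _detect_design; the function name is invented, and the exact-equality test used to measure the mass at the minimum is an assumption.

```python
import numpy as np


def sketch_detect_design(d_at_F):
    """Hypothetical restatement of the rule in the text; not library code."""
    d = np.asarray(d_at_F, dtype=float)
    d_min = d.min()
    threshold = 0.01 * np.median(np.abs(d))
    # Design 1': minimum dose is zero, or a small positive value under 1% of the median.
    if d_min == 0 or (0 < d_min < threshold):
        return "continuous_at_zero"  # targets WAS
    # Design 1: split on the share of units sitting exactly at the minimum dose.
    frac_at_min = np.mean(d == d_min)
    if frac_at_min > 0.02:
        return "mass_point"  # targets WAS_d_lower
    return "continuous_near_d_lower"  # targets WAS_d_lower


zero_min = sketch_detect_design([0.0, 0.4, 0.9, 1.3])
heavy_mass = sketch_detect_design([0.5] * 30 + list(np.linspace(0.6, 2.0, 70)))
spread_out = sketch_detect_design(np.linspace(0.5, 2.0, 100))
print(zero_min, heavy_mass, spread_out)
```

Running a function like this on the per-unit doses at F (d_at_F above) is a quick way to predict which estimand the fit will resolve to before calling the estimator.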
“Mass-point design selected”#
Problem: HAD reports that the mass_point design was selected
instead of continuous_at_zero or continuous_near_d_lower.
Cause: mass_point is a distinct Design 1 estimator path from the
dCDH 2026 paper (Section 3.2.4), not a fallback from the continuous
local-linear fits. _detect_design() resolves to mass_point when the
modal fraction at d.min() exceeds 2%, signalling a heavy point mass at
the dose-support boundary. On this path both the point estimate and the SE
differ from the continuous paths: the estimator uses the Wald-IV
sample-average ratio with binary instrument Z_g = 1{D_{g,2} > d_lower},
that is, (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0}), and inference
uses the structural-residual 2SLS sandwich (the local-linear / CCT-2014
SE path is not used here).
Solutions:
import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD
# Build a HAD panel with a heavy boundary mass at d_lower so the
# modal fraction at d.min() exceeds 2% and `_detect_design` resolves
# to `mass_point`.
rng = np.random.default_rng(42)
G, F, T = 200, 4, 5
d_lower = 0.5
mass_frac = 0.3
doses = np.where(
rng.uniform(size=G) < mass_frac,
d_lower,
rng.uniform(d_lower + 0.1, 2.0, size=G),
)
rows = []
for g in range(G):
    for t in range(1, T + 1):
        y = (rng.normal()
             + doses[g] * (t >= F)
             + rng.normal(0, 0.5))
        d = doses[g] if t >= F else 0.0
        rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
had_data = pd.DataFrame(rows)
est = HeterogeneousAdoptionDiD()
results = est.fit(had_data, outcome_col='y', unit_col='unit',
time_col='period', dose_col='dose',
aggregate='event_study')
# Inspect the resolved design
print(f"Design: {results.design}") # 'mass_point' here
# The mass-point Wald-IV estimator + structural-residual 2SLS
# sandwich is the canonical Section 3.2.4 path for designs with a
# heavy boundary point mass; accept the resolution unless you can
# re-bin the dose variable so the modal fraction at d.min() drops
# below 2% (then the detector picks continuous_near_d_lower).
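The Wald-IV ratio named in the cause above is simple to compute by hand, which can be useful for sanity-checking a mass-point fit. The sketch below uses NumPy only and is not the library's code path. The simulated data are hypothetical, and dy stands in for whatever per-unit outcome the estimator averages (taken here to be the pre-to-post change, which is an assumption).

```python
import numpy as np

rng = np.random.default_rng(7)
G, d_lower = 500, 0.5
# About 30% of units sit exactly at d_lower; the rest are spread above it.
at_mass = rng.uniform(size=G) < 0.3
dose = np.where(at_mass, d_lower, rng.uniform(d_lower + 0.1, 2.0, size=G))
# Toy per-unit outcome change with a constant slope of 1.0 in dose.
dy = 1.0 * dose + rng.normal(0, 0.5, size=G)

z = dose > d_lower  # binary instrument Z_g
wald_iv = (dy[z].mean() - dy[~z].mean()) / (dose[z].mean() - dose[~z].mean())
print(f"Wald-IV ratio: {wald_iv:.3f}")
```

Because the toy outcome change is linear in dose with slope 1.0, the ratio should land near 1.0 up to sampling noise.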
“NotImplementedError on survey + mass-point + vcov_type=’classical’”#
Problem: Calling HeterogeneousAdoptionDiD.fit(..., vcov_type="classical")
under survey_design=SurveyDesign(...) (or under the deprecated survey=
alias) raises NotImplementedError on the mass-point path. The same
NotImplementedError fires on the deprecated weights= shortcut +
aggregate="event_study" + cband=True.
Cause: The per-unit 2SLS influence function returned by the mass-point fit
is HC1-scaled so that compute_survey_if_variance and the sup-t bootstrap
target V_HC1 consistently. Mixing it with a classical analytical SE would
silently report a V_HC1-targeted variance under a classical label.
Solutions:
# The constructor default `robust=False` maps to `vcov_type='classical'`
# and triggers the guard on the mass-point survey path - so plain
# `HeterogeneousAdoptionDiD()` is NOT a workaround. Pick one of:
est = HeterogeneousAdoptionDiD(vcov_type='hc1')
# Or equivalently:
est = HeterogeneousAdoptionDiD(robust=True) # maps to vcov_type='hc1'
A classical-aligned IF derivation is queued for a follow-up release; until
then, vcov_type='hc1' (or the equivalent robust=True) is the
recommended path for survey + mass-point fits. See Heterogeneous Adoption Difference-in-Differences for the
full SE-regime contract.
“Panel-only event-study restriction”#
Problem: HeterogeneousAdoptionDiD.fit(..., aggregate="event_study")
raises on a staggered panel.
Cause: The Appendix B.2 event-study extension requires either a
common-adoption panel (single first-treat period; first_treat_col is
then optional and the period is inferred from the dose invariant) or a
staggered panel with first_treat_col provided so the estimator can
auto-filter to the last-treatment cohort plus never-treated units (with
a UserWarning). The fit raises only when the panel is staggered
and first_treat_col is missing.
Solutions:
import numpy as np
import pandas as pd
# Build a staggered HAD panel for this example: 120 units, three
# cohorts (30 never-treated + 30 treated at period 5 + 60 treated at
# period 8). Dose is zero pre-treatment per unit and a constant
# positive value post-treatment, so the first_treat / dose-path
# consistency validator passes. The 60-unit last cohort gives the
# boundary local-linear estimator enough distinct dose values to fit.
np.random.seed(42)
n_units, n_periods = 120, 10
first_treat_per_unit = np.array([0] * 30 + [5] * 30 + [8] * 60)
dose_per_unit = np.where(
first_treat_per_unit > 0, np.random.uniform(0.5, 2.0, n_units), 0.0
)
rows = []
for u in range(n_units):
    ft = first_treat_per_unit[u]
    for t in range(n_periods):
        d_ut = dose_per_unit[u] if (ft > 0 and t >= ft) else 0.0
        y_ut = (d_ut > 0) * dose_per_unit[u] * 0.5 + np.random.normal()
        rows.append((u, t, d_ut, ft, y_ut))
data = pd.DataFrame(rows, columns=["unit", "period", "dose", "first_treat", "y"])
# Primary remedy: pass `first_treat_col` so the estimator auto-filters
# to the last-treatment cohort + never-treated and emits a UserWarning.
est = HeterogeneousAdoptionDiD()
results = est.fit(data, outcome_col='y', unit_col='unit',
time_col='period', dose_col='dose',
first_treat_col='first_treat',
aggregate='event_study')
# Equivalent: subset to the last-treatment cohort + never-treated
# before fitting (skips the UserWarning).
last_cohort = data['first_treat'].max()
subset = data[(data['first_treat'] == last_cohort) |
(data['first_treat'] == 0)]
results = est.fit(subset, outcome_col='y', unit_col='unit',
time_col='period', dose_col='dose',
aggregate='event_study')
Imputation / Two-Stage DiD Issues#
“Non-constant first_treat values”#
Problem: ImputationDiD or TwoStageDiD issues a warning because
first_treat varies within units. The estimator coerces to a single value
per unit (using the first observed value) and proceeds, but results may be
unreliable.
Causes:
Units switch treatment status back and forth
Data merge errors created inconsistent first_treat values
Solutions:
# Check for non-constant first_treat within units
varying = data.groupby('unit_id')['first_treat'].nunique()
bad_units = varying[varying > 1].index
print(f"Units with varying first_treat: {len(bad_units)}")
# Fix: ensure first_treat is constant per unit (absorbing state)
first_treat_map = data.groupby('unit_id')['first_treat'].first()
data['first_treat'] = data['unit_id'].map(first_treat_map)
“Units treated in all observed periods”#
Problem: All observed periods for some units are post-treatment, so no pre-treatment outcomes exist to construct counterfactuals.
Causes:
Always-treated units entered the panel already treated
Observation window starts after treatment onset for some cohorts
Solutions:
# Identify always-treated units (treated at or before first observed period)
# Exclude never-treated (first_treat == 0) which are the control group
unit_ft = data.groupby('unit_id')['first_treat'].first()
min_period = data['period'].min()
always_treated = unit_ft[(unit_ft > 0) & (unit_ft <= min_period)]
print(f"Always-treated units: {len(always_treated)}")
# Drop always-treated units (keep never-treated controls)
data = data[~data['unit_id'].isin(always_treated.index)]
“Horizons not identified without never-treated units”#
Problem: Certain event study horizons return NaN because they require never-treated units for identification (Proposition 5 in Borusyak et al.).
Causes:
No never-treated units in the data
Specific long-horizon estimates need a comparison group that spans those periods
Solutions:
# Check for never-treated units
never_treated = data.groupby('unit_id')['first_treat'].first()
print(f"Never-treated units: {(never_treated == 0).sum()}")
# Option 1: Include never-treated units in your sample
# Option 2: Accept NaN for unidentified horizons
results = ImputationDiD().fit(data, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
# NaN horizons are expected when never-treated units are absent
Bacon Decomposition Issues#
“Unbalanced panel detected”#
Problem: BaconDecomposition issues a warning because the panel is
unbalanced. Bacon decomposition assumes balanced panels and results may be
inaccurate with missing observations.
Causes:
Some units are missing observations for certain time periods
Units entered or exited the panel at different times
Solutions:
from diff_diff import balance_panel, BaconDecomposition
# Balance the panel first
balanced = balance_panel(data, unit_column='unit_id', time_column='period')
print(f"Dropped {len(data) - len(balanced)} observations to balance panel")
# Then run decomposition
bacon = BaconDecomposition()
results = bacon.fit(balanced, outcome='y', unit='unit_id',
time='period', first_treat='first_treat')
Getting Help#
If you encounter issues not covered here:
Check the API documentation for parameter details
Run validation with validate_did_data() to catch data issues
Start simple with basic DiD before adding complexity
Compare with known results using generate_did_data()
# Generate test data with known effect
from diff_diff import generate_did_data, DifferenceInDifferences
data = generate_did_data(n_units=100, n_periods=10, treatment_effect=2.0)
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(f"True effect: 2.0, Estimated: {results.att:.3f}")
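As an independent cross-check on any estimator output, the basic 2x2 DiD can also be computed by hand from four cell means. The sketch below builds its own toy panel with pandas and NumPy rather than relying on generate_did_data(); the column names and the data-generating process are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
n_units, n_periods, true_effect = 200, 10, 2.0
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n_units), n_periods),
    "period": np.tile(np.arange(n_periods), n_units),
})
df["treated"] = (df["unit"] < n_units // 2).astype(int)
df["post"] = (df["period"] >= n_periods // 2).astype(int)
df["outcome"] = (
    1.0 * df["treated"]
    + 0.5 * df["post"]
    + true_effect * df["treated"] * df["post"]
    + rng.normal(size=len(df))
)

# Mean outcome in each (treated, post) cell.
means = df.groupby(["treated", "post"])["outcome"].mean()
# Change for the treated group minus change for the control group.
did_by_hand = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (
    means.loc[(0, 1)] - means.loc[(0, 0)]
)
print(f"True effect: {true_effect}, DiD from cell means: {did_by_hand:.3f}")
```

If the hand-computed value and the estimator's ATT disagree noticeably on the same 2x2 data, the usual suspects are the treatment or post indicators rather than the estimator itself.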
For bugs or feature requests, please open an issue on GitHub.