Troubleshooting#

This guide covers common issues and their solutions when using diff-diff.

Data Issues#

“No treated observations found”#

Problem: The estimator raises an error that no treated units were found.

Causes:

  1. Treatment column contains wrong values (e.g., strings instead of 0/1)

  2. Treatment column has all zeros

  3. Column name is misspelled

Solutions:

# Check your treatment column
print(data['treated'].value_counts())

# Ensure binary 0/1 values
data['treated'] = (data['group'] == 'treatment').astype(int)

# Or use make_treatment_indicator
from diff_diff import make_treatment_indicator
data = make_treatment_indicator(data, 'group', treated_values='treatment')
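
When the treatment column mixes strings, booleans, and numbers, it can help to normalize it explicitly before fitting. The sketch below uses only pandas. The helper name `coerce_treatment` and its list of labels counted as treated are assumptions made for illustration; they are not part of diff-diff.

```python
import pandas as pd

def coerce_treatment(series, treated_labels=("1", "true", "treatment", "treated", "yes")):
    """Map a messy treatment column to integer 0/1 (hypothetical helper)."""
    # Compare on the lower-cased, stripped string form so that
    # 1, "1", True and "Treatment " are all recognized as treated.
    as_text = series.astype(str).str.strip().str.lower()
    return as_text.isin(treated_labels).astype(int)

raw = pd.DataFrame({"group": ["treatment", "control", "Treatment ", "1", "0", True]})
raw["treated"] = coerce_treatment(raw["group"])
print(raw["treated"].value_counts().sort_index())
```

Values outside the label list (for example the float `1.0`, which stringifies as `"1.0"`) map to 0, so inspect `value_counts()` on the raw column first and extend the labels as needed.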

“Panel is unbalanced”#

Problem: TwoWayFixedEffects or CallawaySantAnna fails on an unbalanced panel.

Causes:

  1. Some units are missing observations for certain time periods

  2. Units have different numbers of observations

Solutions:

from diff_diff import balance_panel

# Balance the panel (keeps only units with all periods)
balanced = balance_panel(data, unit_column='unit_id', time_column='period')
print(f"Dropped {len(data) - len(balanced)} observations")

# Alternative: check balance first
from diff_diff import validate_did_data
issues = validate_did_data(data, outcome='y', treatment='treated',
                            time='period', unit='unit_id')
print(issues)
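
Before dropping anything, it is often useful to see which units are incomplete and which periods they lack. The following pandas-only sketch does that. `find_unbalanced_units` is a hypothetical helper, and the column names are the ones assumed throughout this guide.

```python
import pandas as pd

def find_unbalanced_units(df, unit="unit_id", time="period"):
    """Return, for each incomplete unit, the sorted list of periods it is missing."""
    all_periods = set(df[time].unique())
    observed = df.groupby(unit)[time].apply(set)
    missing = observed.apply(lambda seen: sorted(all_periods - seen))
    # Keep only units that are missing at least one period.
    return missing[missing.map(len) > 0]

panel = pd.DataFrame({
    "unit_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "period":  [1, 2, 3, 1, 3, 1, 2, 3],
})
incomplete = find_unbalanced_units(panel)
print(incomplete)
```

Here unit 2 is reported as missing period 2; units 1 and 3 are complete and do not appear.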

Estimation Errors#

“Singular matrix” or “Matrix is singular”#

Problem: Linear algebra error during estimation.

Causes:

  1. Perfect collinearity in covariates

  2. Too few observations relative to parameters

  3. Fixed effects that absorb all variation

Solutions:

# Check for collinearity
import numpy as np
X = data[['x1', 'x2', 'x3']].values
print(f"Matrix rank: {np.linalg.matrix_rank(X)} vs {X.shape[1]} columns")

# Remove redundant covariates
# Or use fewer fixed effects

# For SyntheticDiD, increase regularization
from diff_diff import SyntheticDiD
sdid = SyntheticDiD(zeta_omega=1e-4)  # increase unit weight regularization
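
The rank check above tells you that a redundancy exists, but not which covariate causes it. One way to locate it is a greedy scan that adds columns one at a time and flags any column that fails to raise the rank. This is a NumPy-only sketch: `redundant_columns` is a hypothetical helper, and which column gets flagged depends on column order.

```python
import numpy as np

def redundant_columns(X, names, tol=None):
    """Flag columns that do not increase the rank of the columns kept so far."""
    kept, dropped = [], []
    for j, name in enumerate(names):
        candidate = X[:, kept + [j]]
        if np.linalg.matrix_rank(candidate, tol=tol) > len(kept):
            kept.append(j)        # column adds new information
        else:
            dropped.append(name)  # column is a linear combination of kept ones
    return dropped

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = 2.0 * x1 - x2  # exact linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])
flagged = redundant_columns(X, ["x1", "x2", "x3"])
print(f"Redundant covariates: {flagged}")
```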

“Bootstrap iterations failed” warning#

Problem: SyntheticDiD warns that many bootstrap iterations failed.

Causes:

  1. Small sample size leads to singular matrices in resamples

  2. Insufficient pre-treatment periods for weight computation

  3. Near-singular weight matrices

Solutions:

# Increase regularization
sdid = SyntheticDiD(zeta_omega=1e-4, zeta_lambda=1e-4, n_bootstrap=500)

# Or use placebo-based inference instead
sdid = SyntheticDiD(variance_method="placebo")  # Uses placebo inference

# Ensure sufficient pre-treatment periods (recommend >= 4)

Standard Error Issues#

“Standard errors seem too small/large”#

Problem: SEs don’t match expectations or other software.

Causes:

  1. Wrong clustering level

  2. Not accounting for serial correlation

  3. Different SE formulas (HC0 vs HC1 vs cluster)

Solutions:

# For panel data, always cluster at unit level
from diff_diff import DifferenceInDifferences
did = DifferenceInDifferences(cluster='unit_id')
results = did.fit(data, outcome='y', treatment='treated', time='post')

# Compare SE methods
did_robust = DifferenceInDifferences()
did_cluster = DifferenceInDifferences(cluster='unit_id')
did_wild = DifferenceInDifferences(inference='wild_bootstrap', cluster='unit_id')

r1 = did_robust.fit(data, outcome='y', treatment='treated', time='post')
r2 = did_cluster.fit(data, outcome='y', treatment='treated', time='post')
r3 = did_wild.fit(data, outcome='y', treatment='treated', time='post')

print(f"Robust SE: {r1.se:.4f}")
print(f"Cluster SE: {r2.se:.4f}")
print(f"Wild bootstrap SE: {r3.se:.4f}")
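
To see why the clustering level matters, the NumPy sketch below computes textbook HC1 and one-way cluster-robust sandwich standard errors by hand on simulated data with a shared within-unit error component. It is an illustration under stated assumptions, not diff-diff's implementation: the small-sample corrections are common choices and may differ from what the library applies, and the regression is a plain treated-vs-control comparison rather than a full DiD.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 50, 10
unit = np.repeat(np.arange(n_units), n_periods)
treated = (unit < n_units // 2).astype(float)   # treatment is fixed within unit
unit_shock = rng.normal(size=n_units)[unit]     # error component shared within unit
y = 1.0 + 2.0 * treated + unit_shock + rng.normal(scale=0.5, size=unit.size)

X = np.column_stack([np.ones_like(treated), treated])
n, k = X.shape
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)

# HC1: treats every observation as independent.
meat_hc1 = (X * resid[:, None] ** 2).T @ X
V_hc1 = n / (n - k) * bread @ meat_hc1 @ bread

# Cluster-robust: sum the scores within each unit before the outer product.
scores = X * resid[:, None]
cluster_scores = np.zeros((n_units, k))
np.add.at(cluster_scores, unit, scores)
adj = n_units / (n_units - 1) * (n - 1) / (n - k)
V_cluster = adj * bread @ (cluster_scores.T @ cluster_scores) @ bread

se_hc1 = float(np.sqrt(V_hc1[1, 1]))
se_cluster = float(np.sqrt(V_cluster[1, 1]))
print(f"HC1 SE: {se_hc1:.4f}  cluster SE: {se_cluster:.4f}")
```

Because the errors are strongly correlated within unit and treatment does not vary within unit, the clustered standard error comes out noticeably larger than the HC1 one. That gap is the usual reason unclustered SEs look "too small" on panel data.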

“Wild bootstrap takes too long”#

Problem: Bootstrap inference is slow.

Solutions:

# Reduce number of bootstrap iterations (default is 999)
did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=499)

# Note: Fewer iterations = less precise p-values
# 499 is minimum recommended for publication
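
As a rough guide to the precision trade-off, a bootstrap p-value computed as a proportion of draws has approximately binomial Monte Carlo error. The sketch below evaluates that approximation for a few iteration counts. It is a back-of-the-envelope calculation, not something diff-diff reports, and `bootstrap_pvalue_mc_se` is a hypothetical helper.

```python
import math

def bootstrap_pvalue_mc_se(p, n_bootstrap):
    """Approximate Monte Carlo standard error of a p-value estimated from n_bootstrap draws."""
    return math.sqrt(p * (1.0 - p) / n_bootstrap)

for b in (99, 499, 999):
    se = bootstrap_pvalue_mc_se(0.05, b)
    print(f"n_bootstrap={b:4d}: MC standard error near p=0.05 is about {se:.4f}")
```

Near the 5% threshold, 99 draws leave simulation noise of roughly two percentage points, which is fine for exploration but too coarse for borderline results.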

Staggered Adoption Issues#

“No never-treated units found”#

Problem: CallawaySantAnna fails when using control_group='never_treated'.

Causes:

  1. All units are eventually treated

  2. first_treat column has no never-treated indicator (typically 0 or inf)

Solutions:

# Check first_treat distribution
print(data['first_treat'].value_counts())

# Option 1: Use not-yet-treated as controls
from diff_diff import CallawaySantAnna
cs = CallawaySantAnna(control_group='not_yet_treated')

# Option 2: Mark never-treated units correctly
# Never-treated should have first_treat = 0 or np.inf
data.loc[data['ever_treated'] == 0, 'first_treat'] = 0
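
A quick way to confirm the coding is to count units, not rows, whose first_treat matches one of the never-treated conventions named above (0 or inf). This is a pandas/NumPy sketch; `count_never_treated` is a hypothetical helper, and it assumes never-treated units are not coded as NaN.

```python
import numpy as np
import pandas as pd

def count_never_treated(df, unit="unit_id", first_treat="first_treat"):
    """Count units whose first_treat is 0 or +inf."""
    per_unit = df.groupby(unit)[first_treat].first()
    return int(((per_unit == 0) | np.isinf(per_unit)).sum())

demo = pd.DataFrame({
    "unit_id":     [1, 1, 2, 2, 3, 3, 4, 4],
    "first_treat": [3, 3, 0, 0, np.inf, np.inf, 5, 5],
})
n_never = count_never_treated(demo)
print(f"Never-treated units: {n_never}")
```

If the count is zero on your data, either recode the never-treated units or switch to not-yet-treated controls.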

“Group-time effects have large standard errors”#

Problem: ATT(g,t) estimates are imprecise.

Causes:

  1. Small cohort sizes

  2. Few comparison periods

  3. High variance in outcomes

Solutions:

# Check cohort sizes
print(data.groupby('first_treat')['unit_id'].nunique())

# Use bootstrap for better inference
cs = CallawaySantAnna(n_bootstrap=999)
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat',
                 aggregate='event_study')

# Access aggregated results
print(results.overall_att)  # Overall ATT
print(results.event_study_effects)  # Event study effects

Visualization Issues#

“Event study plot looks wrong”#

Problem: Plot has unexpected gaps, wrong reference period, or missing periods.

Solutions:

from diff_diff import plot_event_study

# Check your results first
print(results.period_effects)  # or results.event_study_effects

# Specify reference period explicitly
plot_event_study(results, reference_period=-1)

# For CallawaySantAnna, fit with aggregate='event_study'
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat',
                 aggregate='event_study')
plot_event_study(results)

“Plot doesn’t show in Jupyter”#

Problem: Matplotlib figure doesn’t display.

Solutions:

import matplotlib.pyplot as plt

# Option 1: Use plt.show()
ax = plot_event_study(results)
plt.show()

# Option 2: Use inline magic (Jupyter)
%matplotlib inline

# Option 3: Display the parent figure explicitly
ax = plot_event_study(results)
ax.figure  # As the last expression in a cell, this renders the figure in Jupyter

Performance Issues#

“Estimation is slow”#

Problem: Fitting takes a long time.

Causes:

  1. Large dataset with many fixed effects

  2. Bootstrap inference with many iterations

  3. CallawaySantAnna with many cohorts and time periods

Solutions:

# TWFE already handles unit + time FE via within-transformation
from diff_diff import TwoWayFixedEffects
twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treatment='treated',
                   unit='unit_id', time='period')

# Reduce bootstrap iterations for initial exploration
did = DifferenceInDifferences(inference='wild_bootstrap', n_bootstrap=99)

# For CallawaySantAnna, start without bootstrap
cs = CallawaySantAnna()
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat')
# Use n_bootstrap for final results
cs_boot = CallawaySantAnna(n_bootstrap=999)
results = cs_boot.fit(data, outcome='y', unit='unit_id',
                      time='period', first_treat='first_treat')

Rust Backend Issues#

“Rust backend is not available”#

Problem: ImportError when using DIFF_DIFF_BACKEND=rust or attempting to use Rust-accelerated operations.

Causes:

  1. Rust backend was not compiled during installation

  2. The maturin build step was skipped or failed

  3. Platform does not have a pre-built wheel available

Solutions:

# Check if Rust backend is available
from diff_diff import HAS_RUST_BACKEND
print(f"Rust backend available: {HAS_RUST_BACKEND}")

# Force pure Python mode (no Rust required)
import os
os.environ['DIFF_DIFF_BACKEND'] = 'python'

# Rebuild with Rust backend (shell commands, not Python)
pip install -e ".[dev]"
maturin develop --release

# On macOS with Apple Accelerate
maturin develop --release --features accelerate

TROP Issues#

“All tuning parameter combinations failed”#

Problem: TROP raises an error that all tuning parameter combinations failed during leave-one-out cross-validation (LOOCV).

Causes:

  1. Insufficient pre-treatment periods (minimum 2; recommend 4+ for stability)

  2. Near-constant outcomes that leave no variation to fit

  3. Data is too sparse for the requested lambda grids

Solutions:

from diff_diff import TROP

# Widen the lambda grids to give the optimizer more room
trop = TROP(
    lambda_time_grid=[0.0, 0.5, 1.0, 2.0, 5.0],
    lambda_unit_grid=[0.0, 0.5, 1.0, 2.0, 5.0],
    lambda_nn_grid=[0.0, 0.1, 1.0, 10.0],
)

# TROP requires at least 2 pre-treatment periods (4+ recommended)
pre_periods = data.loc[data['post'] == 0, 'period'].nunique()
print(f"Pre-treatment periods: {pre_periods}")  # Must be >= 2; stability improves with >= 4

# If TROP cannot find valid parameters, try CallawaySantAnna as a fallback
from diff_diff import CallawaySantAnna
cs = CallawaySantAnna()
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat')

“LOOCV fits failed / numerical instability”#

Problem: Partial LOOCV failures during TROP tuning, or warnings about numerical instability in cross-validation fits.

Causes:

  1. Poor data quality (missing values, outliers)

  2. Regularization parameters too small for the data scale

Solutions:

# Check data quality
print(data[['y', 'treatment', 'post']].describe())
print(f"Missing values:\n{data.isnull().sum()}")

# Increase regularization to improve numerical stability
trop = TROP(
    lambda_nn_grid=[0.1, 1.0, 10.0, 100.0],  # Larger minimum lambda
)

“Few bootstrap iterations succeeded”#

Problem: TROP warns that only N of M bootstrap iterations completed successfully, leading to imprecise standard errors.

Causes:

  1. Small sample sizes cause singular matrices in bootstrap resamples

  2. Complex model specification amplifies resampling instability

Solutions:

# Increase total bootstrap iterations to get enough successes
trop = TROP(n_bootstrap=999)

# Simplify the model to reduce bootstrap failures
trop = TROP(method='global', n_bootstrap=999)

Continuous DiD Issues#

“Dose appears discrete”#

Problem: ContinuousDiD warns that the dose variable appears to contain only integer or discrete values.

Causes:

  1. Treatment is truly binary (0/1) and should use standard DiD

  2. Dose variable is coded as integers but represents a continuous measure

Solutions:

# Check dose distribution
print(data['dose'].value_counts())

# If treatment is truly binary, use standard DiD instead
from diff_diff import DifferenceInDifferences
did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treatment='treatment', time='post')

# If dose is continuous but stored as int, convert
data['dose'] = data['dose'].astype(float)

“No post-treatment cells available for aggregation”#

Problem: No (g, t) cells are available after filtering, so aggregation cannot produce an ATT estimate.

Causes:

  1. first_treat is miscoded (e.g., all zeros or all the same value)

  2. No post-treatment periods exist in the data for treated cohorts

  3. Filtering removed all valid cells

Solutions:

# Check first_treat coding
print(data['first_treat'].value_counts())

# Verify that post-treatment periods exist for treated units
treated = data[data['first_treat'] > 0]
for g, group in treated.groupby('first_treat'):
    post_obs = group[group['period'] >= g]
    print(f"Cohort {g}: {len(post_obs)} post-treatment observations")

HeterogeneousAdoptionDiD (HAD) Issues#

“Resolved estimand is not what I expected (WAS vs WAS_d_lower)”#

Problem: HeterogeneousAdoptionDiD resolves target_parameter to "WAS_d_lower" when you expected "WAS" (or vice versa).

Cause: HAD auto-detects the design path from the unit-level post-treatment dose D_{g,F} (the dose at the first treated period F, one value per unit), not from the full panel dose column. The panel column carries structural pre-period zeros (HAD requires D_{g,t} = 0 for t < F), so had_data['dose'].min() is always zero on a valid HAD panel and tells you nothing about the resolved design. _detect_design() then resolves on D_{g,F}:

  1. Design 1’ (continuous_at_zero, targets WAS) when either D_{g,F}.min() == 0 exactly or D_{g,F}.min() is a small positive value below 0.01 * median(|D_{g,F}|) (the small-share-of-treated escape clause).

  2. Design 1, mass_point path (targets WAS_{d_lower}) otherwise, when the modal fraction at D_{g,F}.min() exceeds 2%.

  3. Design 1, continuous_near_d_lower path (targets WAS_{d_lower}) in all remaining cases.

Solutions:

import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD

# Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
rng = np.random.default_rng(42)
G, F, T = 200, 4, 5
doses = rng.beta(0.5, 1.0, size=G)
rows = []
for g in range(G):
    for t in range(1, T + 1):
        y = (rng.normal()
             + (doses[g] + doses[g] ** 2) * (t >= F)
             + rng.normal(0, 0.5))
        d = doses[g] if t >= F else 0.0
        rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
had_data = pd.DataFrame(rows)

# Inspect the support the detector actually uses: per-unit dose at the
# first treated period F. Pre-period zeros on the panel column are
# structural and ignored by `_detect_design()`.
d_at_F = had_data.loc[had_data['period'] == F].set_index('unit')['dose']
print(d_at_F.describe())
d_min = float(d_at_F.min())
d_thr = 0.01 * float(np.median(np.abs(d_at_F)))
print(f"D_{{g,F}}.min() = {d_min:.6g}; "
      f"0.01 * median(|D_{{g,F}}|) = {d_thr:.6g}; "
      f"D_{{g,F}}.min() < threshold => Design 1' (WAS)")

# Check the resolved estimand after fitting
est = HeterogeneousAdoptionDiD()
results = est.fit(had_data, outcome_col='y', unit_col='unit',
                  time_col='period', dose_col='dose',
                  aggregate='event_study')
print(f"Resolved: {results.target_parameter}")

# If you intend Design 1' but `D_{g,F}.min()` exceeds the threshold,
# verify the dose-variable encoding (e.g. log-transformed doses where
# 0 was mapped to a small positive value larger than 1% of the median).
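
The resolution rule described in this section can be restated as a small standalone function, which is handy for checking a dose vector by hand before fitting. This is an illustrative re-statement only, not the library's `_detect_design`: the function name `sketch_detect_design`, the exact-equality test for the modal fraction, and the handling of edge cases are assumptions.

```python
import numpy as np

def sketch_detect_design(d_at_F, rel_threshold=0.01, mass_threshold=0.02):
    """Return (design, target_parameter) following the rule described in the text."""
    d = np.asarray(d_at_F, dtype=float)
    d_min = d.min()
    # Design 1': minimum is exactly zero, or tiny relative to the median dose.
    if d_min == 0 or d_min < rel_threshold * np.median(np.abs(d)):
        return "continuous_at_zero", "WAS"
    # Design 1: check for a point mass at the lower boundary of the support.
    modal_fraction = np.mean(d == d_min)  # share of units sitting exactly at d.min()
    if modal_fraction > mass_threshold:
        return "mass_point", "WAS_d_lower"
    return "continuous_near_d_lower", "WAS_d_lower"

case_zero = sketch_detect_design([0.0, 0.4, 0.9])
case_mass = sketch_detect_design([0.5] * 30 + list(np.linspace(0.6, 2.0, 70)))
case_near = sketch_detect_design(np.linspace(0.5, 2.0, 100))
print(case_zero, case_mass, case_near, sep="\n")
```

Pass the per-unit dose at the first treated period (one value per unit), not the full panel column, for the reason given in the Cause paragraph.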

“Mass-point design selected”#

Problem: HAD reports that the mass_point design was selected instead of continuous_at_zero or continuous_near_d_lower.

Cause: mass_point is a distinct Design 1 estimator path from the dCDH 2026 paper (Section 3.2.4), not a fallback from the continuous local-linear fits. _detect_design() resolves to mass_point when the modal fraction at d.min() exceeds 2%, signalling a heavy point mass at the dose-support boundary. On this path both the point estimate and the SE differ from the continuous paths. The point estimate is the Wald-IV sample-average ratio (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0}), where Z_g = 1{D_{g,2} > d_lower} is a binary instrument. Inference uses the structural-residual 2SLS sandwich (the local-linear / CCT-2014 SE path is not used here).

Solutions:

import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD

# Build a HAD panel with a heavy boundary mass at d_lower so the
# modal fraction at d.min() exceeds 2% and `_detect_design` resolves
# to `mass_point`.
rng = np.random.default_rng(42)
G, F, T = 200, 4, 5
d_lower = 0.5
mass_frac = 0.3
doses = np.where(
    rng.uniform(size=G) < mass_frac,
    d_lower,
    rng.uniform(d_lower + 0.1, 2.0, size=G),
)
rows = []
for g in range(G):
    for t in range(1, T + 1):
        y = (rng.normal()
             + doses[g] * (t >= F)
             + rng.normal(0, 0.5))
        d = doses[g] if t >= F else 0.0
        rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
had_data = pd.DataFrame(rows)

est = HeterogeneousAdoptionDiD()
results = est.fit(had_data, outcome_col='y', unit_col='unit',
                  time_col='period', dose_col='dose',
                  aggregate='event_study')

# Inspect the resolved design
print(f"Design: {results.design}")  # 'mass_point' here

# The mass-point Wald-IV estimator + structural-residual 2SLS
# sandwich is the canonical Section 3.2.4 path for designs with a
# heavy boundary point mass; accept the resolution unless you can
# re-bin the dose variable so the modal fraction at d.min() drops
# below 2% (then the detector picks continuous_near_d_lower).
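
To make the Wald-IV ratio concrete, the sketch below computes (Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0}) on a toy unit-level dataset. It covers the point estimate only, not the 2SLS sandwich SE. `wald_iv_ratio` is a hypothetical helper, and treating `y` as each unit's outcome measure is an assumption made for illustration, not a description of the library's internals.

```python
import numpy as np

def wald_iv_ratio(y, d, d_lower):
    """Wald ratio with binary instrument Z = 1{d > d_lower}."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    z = d > d_lower
    if z.all() or not z.any():
        raise ValueError("Both instrument groups (Z=0 and Z=1) must be non-empty.")
    return (y[z].mean() - y[~z].mean()) / (d[z].mean() - d[~z].mean())

# Two units at the boundary dose (Z=0) and two above it (Z=1).
d = [0.5, 0.5, 1.0, 1.5]
y = [1.0, 1.0, 2.0, 3.0]
estimate = wald_iv_ratio(y, d, d_lower=0.5)
print(estimate)  # 2.0
```

The numerator is 2.5 - 1.0 = 1.5 and the denominator is 1.25 - 0.5 = 0.75, giving 2.0.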

“NotImplementedError on survey + mass-point + vcov_type='classical'”#

Problem: Calling HeterogeneousAdoptionDiD.fit(..., vcov_type="classical") under survey_design=SurveyDesign(...) (or under the deprecated survey= alias) raises NotImplementedError on the mass-point path. The same NotImplementedError fires on the deprecated weights= shortcut + aggregate="event_study" + cband=True.

Cause: The per-unit 2SLS influence function returned by the mass-point fit is HC1-scaled so that compute_survey_if_variance and the sup-t bootstrap target V_HC1 consistently. Mixing it with a classical analytical SE would silently report a V_HC1-targeted variance under a classical label.

Solutions:

# The constructor default `robust=False` maps to `vcov_type='classical'`
# and triggers the guard on the mass-point survey path - so plain
# `HeterogeneousAdoptionDiD()` is NOT a workaround. Pick one of:
est = HeterogeneousAdoptionDiD(vcov_type='hc1')
# Or equivalently:
est = HeterogeneousAdoptionDiD(robust=True)  # maps to vcov_type='hc1'

A classical-aligned IF derivation is queued for a follow-up release; until then, vcov_type='hc1' (or the equivalent robust=True) is the recommended path for survey + mass-point fits. See Heterogeneous Adoption Difference-in-Differences for the full SE-regime contract.

“Panel-only event-study restriction”#

Problem: HeterogeneousAdoptionDiD.fit(..., aggregate="event_study") raises on a staggered panel.

Cause: The Appendix B.2 event-study extension requires either a common-adoption panel (single first-treat period; first_treat_col is then optional and the period is inferred from the dose invariant) or a staggered panel with first_treat_col provided so the estimator can auto-filter to the last-treatment cohort plus never-treated units (with a UserWarning). The fit raises only when the panel is staggered and first_treat_col is missing.

Solutions:

import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD

# Build a staggered HAD panel for this example: 120 units, three
# cohorts (30 never-treated + 30 treated at period 5 + 60 treated at
# period 8). Dose is zero pre-treatment per unit and a constant
# positive value post-treatment, so the first_treat / dose-path
# consistency validator passes. The 60-unit last cohort gives the
# boundary local-linear estimator enough distinct dose values to fit.
np.random.seed(42)
n_units, n_periods = 120, 10
first_treat_per_unit = np.array([0] * 30 + [5] * 30 + [8] * 60)
dose_per_unit = np.where(
    first_treat_per_unit > 0, np.random.uniform(0.5, 2.0, n_units), 0.0
)
rows = []
for u in range(n_units):
    ft = first_treat_per_unit[u]
    for t in range(n_periods):
        d_ut = dose_per_unit[u] if (ft > 0 and t >= ft) else 0.0
        y_ut = (d_ut > 0) * dose_per_unit[u] * 0.5 + np.random.normal()
        rows.append((u, t, d_ut, ft, y_ut))
data = pd.DataFrame(rows, columns=["unit", "period", "dose", "first_treat", "y"])

# Primary remedy: pass `first_treat_col` so the estimator auto-filters
# to the last-treatment cohort + never-treated and emits a UserWarning.
est = HeterogeneousAdoptionDiD()
results = est.fit(data, outcome_col='y', unit_col='unit',
                  time_col='period', dose_col='dose',
                  first_treat_col='first_treat',
                  aggregate='event_study')

# Equivalent: subset to the last-treatment cohort + never-treated
# before fitting (skips the UserWarning).
last_cohort = data['first_treat'].max()
subset = data[(data['first_treat'] == last_cohort) |
              (data['first_treat'] == 0)]
results = est.fit(subset, outcome_col='y', unit_col='unit',
                  time_col='period', dose_col='dose',
                  aggregate='event_study')
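
If you are unsure whether a panel counts as staggered, listing the distinct adoption periods among treated units settles it. This is a pandas-only sketch; `adoption_cohorts` is a hypothetical helper, and it assumes never-treated units are coded with first_treat = 0, as in the example above.

```python
import pandas as pd

def adoption_cohorts(df, unit="unit", first_treat="first_treat"):
    """Return the sorted first-treatment periods among treated units (0 = never-treated)."""
    per_unit = df.groupby(unit)[first_treat].first()
    return sorted(per_unit[per_unit > 0].unique().tolist())

demo = pd.DataFrame({
    "unit":        [0, 0, 1, 1, 2, 2],
    "first_treat": [0, 0, 5, 5, 8, 8],
})
cohorts = adoption_cohorts(demo)
print(f"Cohorts: {cohorts}; staggered: {len(cohorts) > 1}")
```

More than one cohort means the panel is staggered, so the event-study fit needs first_treat_col or a pre-filtered subset.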

Imputation / Two-Stage DiD Issues#

“Non-constant first_treat values”#

Problem: ImputationDiD or TwoStageDiD issues a warning because first_treat varies within units. The estimator coerces to a single value per unit (using the first observed value) and proceeds, but results may be unreliable.

Causes:

  1. Units switch treatment status back and forth

  2. Data merge errors created inconsistent first_treat values

Solutions:

# Check for non-constant first_treat within units
varying = data.groupby('unit_id')['first_treat'].nunique()
bad_units = varying[varying > 1].index
print(f"Units with varying first_treat: {len(bad_units)}")

# Fix: ensure first_treat is constant per unit (absorbing state)
first_treat_map = data.groupby('unit_id')['first_treat'].first()
data['first_treat'] = data['unit_id'].map(first_treat_map)

“Units treated in all observed periods”#

Problem: All observed periods for some units are post-treatment, so no pre-treatment outcomes exist to construct counterfactuals.

Causes:

  1. Always-treated units entered the panel already treated

  2. Observation window starts after treatment onset for some cohorts

Solutions:

# Identify always-treated units (treated at or before first observed period)
# Exclude never-treated (first_treat == 0) which are the control group
unit_ft = data.groupby('unit_id')['first_treat'].first()
min_period = data['period'].min()
always_treated = unit_ft[(unit_ft > 0) & (unit_ft <= min_period)]
print(f"Always-treated units: {len(always_treated)}")

# Drop always-treated units (keep never-treated controls)
data = data[~data['unit_id'].isin(always_treated.index)]

“Horizons not identified without never-treated units”#

Problem: Certain event study horizons return NaN because they require never-treated units for identification (Proposition 5 in Borusyak et al.).

Causes:

  1. No never-treated units in the data

  2. Specific long-horizon estimates need a comparison group that spans those periods

Solutions:

# Check for never-treated units
never_treated = data.groupby('unit_id')['first_treat'].first()
print(f"Never-treated units: {(never_treated == 0).sum()}")

# Option 1: Include never-treated units in your sample
# Option 2: Accept NaN for unidentified horizons
from diff_diff import ImputationDiD
results = ImputationDiD().fit(data, outcome='y', unit='unit_id',
                              time='period', first_treat='first_treat')
# NaN horizons are expected when never-treated units are absent

Bacon Decomposition Issues#

“Unbalanced panel detected”#

Problem: BaconDecomposition issues a warning because the panel is unbalanced. Bacon decomposition assumes balanced panels and results may be inaccurate with missing observations.

Causes:

  1. Some units are missing observations for certain time periods

  2. Units entered or exited the panel at different times

Solutions:

from diff_diff import balance_panel, BaconDecomposition

# Balance the panel first
balanced = balance_panel(data, unit_column='unit_id', time_column='period')
print(f"Dropped {len(data) - len(balanced)} observations to balance panel")

# Then run decomposition
bacon = BaconDecomposition()
results = bacon.fit(balanced, outcome='y', unit='unit_id',
                    time='period', first_treat='first_treat')

Getting Help#

If you encounter issues not covered here:

  1. Check the API documentation for parameter details

  2. Run validation with validate_did_data() to catch data issues

  3. Start simple with basic DiD before adding complexity

  4. Compare with known results using generate_did_data()

# Generate test data with known effect
from diff_diff import generate_did_data, DifferenceInDifferences

data = generate_did_data(n_units=100, n_periods=10, treatment_effect=2.0)
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(f"True effect: 2.0, Estimated: {results.att:.3f}")

For bugs or feature requests, please open an issue on GitHub.