Bacon Decomposition (Goodman-Bacon 2021)
Diagnostic decomposition of Two-Way Fixed Effects (TWFE) estimators for staggered treatment designs.
This module implements the Goodman-Bacon (2021) decomposition, which reveals that a TWFE estimate with variation in treatment timing is a weighted average of all possible 2x2 Difference-in-Differences comparisons. The decomposition exposes the implicit comparisons that drive the TWFE estimate – including potentially problematic “forbidden comparisons” where already-treated units serve as controls – and quantifies their relative importance.
When to use BaconDecomposition:
You have a staggered adoption design and want to diagnose whether the TWFE estimate is driven by clean or problematic comparisons
You need to assess the severity of heterogeneous treatment effect bias in existing TWFE results
You want to understand why TWFE and robust estimators (e.g., Callaway-Sant’Anna) produce different estimates
You are deciding whether a simple TWFE specification is adequate or whether a robust staggered estimator is needed
Reference: Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277.
BaconDecomposition
Main estimator class for the Goodman-Bacon decomposition.
- class diff_diff.BaconDecomposition
Bases: object
Goodman-Bacon (2021) decomposition of the Two-Way Fixed Effects estimator.
This class decomposes a TWFE estimate into a weighted average of all possible 2x2 DiD comparisons, revealing the implicit comparisons that drive the TWFE estimate and their relative importance.
The decomposition identifies three types of comparisons:
Treated vs Never-treated: Uses never-treated units as controls. These are “clean” comparisons without bias concerns.
Earlier vs Later treated: Units treated earlier are compared to units treated later, using the later group as controls before they are treated. These are valid comparisons.
Later vs Earlier treated: Units treated later are compared to units treated earlier, using the earlier group as controls AFTER they are already treated. These are “forbidden comparisons” that can introduce bias when treatment effects vary over time.
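The three comparison types above can be enumerated mechanically from the set of adoption dates. The following is a minimal sketch, not the library's implementation: `classify` and `enumerate_comparisons` are hypothetical helpers, and the cohort dates are made up. Only the type labels and the `"never_treated"` control label come from this page.

```python
from itertools import permutations

NEVER = "never_treated"  # control label used for treated-vs-never comparisons


def classify(treated, control):
    """Return the comparison type for a (treated cohort, control group) pair."""
    if control == NEVER:
        return "treated_vs_never"
    # Treated cohort adopts first: the later cohort is a not-yet-treated control.
    if treated < control:
        return "earlier_vs_later"
    # Treated cohort adopts second: the control is already treated (forbidden).
    return "later_vs_earlier"


def enumerate_comparisons(timing_groups, has_never_treated):
    """List every 2x2 comparison implied by a set of adoption dates."""
    cohorts = sorted(timing_groups)
    pairs = [(g, NEVER) for g in cohorts] if has_never_treated else []
    pairs += list(permutations(cohorts, 2))  # ordered pairs of distinct cohorts
    return [(t, c, classify(t, c)) for t, c in pairs]


comparisons = enumerate_comparisons([4, 6, 8], has_never_treated=True)
for treated, control, kind in comparisons:
    print(treated, control, kind)
```

With three timing cohorts and a never-treated group this yields nine comparisons, three of each type.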
- Parameters:
weights (str, default="exact") –
Weight calculation method:
"exact" (default): Variance-based weights from Goodman-Bacon (2021) Theorem 1, Eqs. 7-9 and 10e-g. Produces the paper-faithful decomposition where the weighted sum matches the TWFE estimate to machine precision. Use for publication-quality work and the standard methodology contract.
"approximate": Fast simplified formula using group shares and treatment variance, with post-hoc sum-to-1 normalization. Opt in for speed-sensitive diagnostic loops where the relative weight structure is sufficient. Approximate-mode results may differ numerically from R bacondecomp::bacon().
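The "post-hoc sum-to-1 normalization" mentioned for the approximate mode can be illustrated in isolation. This sketch shows only the rescaling step. `normalize_weights` is a hypothetical helper, the raw values are made up, and the formula the library uses to produce the raw weights is not reproduced here.

```python
def normalize_weights(raw_weights):
    """Rescale non-negative raw weights so that they sum to one."""
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("raw weights must have a positive sum")
    return [w / total for w in raw_weights]


raw = [0.30, 0.10, 0.05]          # made-up unnormalized weights
weights = normalize_weights(raw)
print(weights, sum(weights))
```

Rescaling preserves the ratios between weights, which is why the relative weight structure remains usable for diagnostics even when the absolute values differ from the exact mode.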
- weights
The weight calculation method.
- Type:
str
- results_
Decomposition results after calling fit().
- Type:
BaconDecompositionResults | None
- is_fitted_
Whether the model has been fitted.
- Type:
bool
Examples
Basic usage:
>>> import pandas as pd
>>> from diff_diff import BaconDecomposition
>>>
>>> # Panel data with staggered treatment
>>> data = pd.DataFrame({
...     'unit': [...],
...     'time': [...],
...     'outcome': [...],
...     'first_treat': [...]  # 0 for never-treated
... })
>>>
>>> bacon = BaconDecomposition()
>>> results = bacon.fit(data, outcome='outcome', unit='unit',
...                     time='time', first_treat='first_treat')
>>> results.print_summary()
Visualizing the decomposition:
>>> from diff_diff import plot_bacon
>>> plot_bacon(results)
Notes
The key insight from Goodman-Bacon (2021) is that TWFE with staggered treatment timing implicitly makes comparisons using already-treated units as controls. When treatment effects are dynamic (changing over time since treatment), these “forbidden comparisons” can bias the TWFE estimate, potentially even reversing its sign.
The decomposition helps diagnose this issue by showing:
- How much weight is on each type of comparison
- Whether forbidden comparisons contribute significantly to the estimate
- How the 2x2 estimates vary across comparison types
If forbidden comparisons have substantial weight and different estimates than clean comparisons, consider using robust estimators like Callaway-Sant’Anna that avoid these problematic comparisons.
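A small noise-free example makes the sign-reversal risk concrete. The sketch below assumes two hypothetical cohorts (first treated in periods 3 and 5), zero untreated outcomes, and a treatment effect that grows by one unit per period since adoption. All names and numbers are illustrative and none of this is library code.

```python
def effect(event_time):
    """Treatment effect that grows by one unit per period since adoption."""
    return float(event_time) if event_time >= 1 else 0.0


def outcome(t, g):
    """Noise-free outcome at calendar time t for a cohort first treated at g."""
    return effect(t - g + 1)


def mean(values):
    return sum(values) / len(values)


def did_2x2(treated_g, control_g, pre, post):
    """Plain 2x2 DiD: change in the treated cohort minus change in the control."""
    treated_change = (mean([outcome(t, treated_g) for t in post])
                      - mean([outcome(t, treated_g) for t in pre]))
    control_change = (mean([outcome(t, control_g) for t in post])
                      - mean([outcome(t, control_g) for t in pre]))
    return treated_change - control_change


EARLY, LATE = 3, 5  # hypothetical adoption periods

# Earlier vs later: the late cohort is still untreated in periods 1-4.
clean = did_2x2(EARLY, LATE, pre=[1, 2], post=[3, 4])

# Later vs earlier: the early cohort is already treated in periods 3-6.
forbidden = did_2x2(LATE, EARLY, pre=[3, 4], post=[5, 6])

# True average effect on the late cohort over its treated periods.
true_late_att = mean([outcome(t, LATE) for t in [5, 6]])

print(clean, forbidden, true_late_att)
```

In this setup the clean comparison recovers 1.5, while the forbidden comparison returns -0.5 even though the late cohort's true average effect is 1.5: the already-treated control's growing effect is subtracted as if it were a trend.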
References
Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277.
See also
CallawaySantAnna: Robust estimator for staggered DiD
TwoWayFixedEffects: The TWFE estimator being decomposed
Methods
fit(data, outcome, unit, time, first_treat): Perform the Goodman-Bacon decomposition.
get_params(): Get estimator parameters (sklearn-compatible).
set_params(**params): Set estimator parameters (sklearn-compatible).
- __init__(weights='exact')
Initialize BaconDecomposition.
- Parameters:
weights (str, default="exact") –
Weight calculation method:
"exact" (default): Variance-based weights from Goodman-Bacon (2021) Theorem 1 (paper-faithful Eqs. 7-9 and 10e-g).
"approximate": Fast simplified formula. Opt in for speed.
- results_: BaconDecompositionResults | None
- is_fitted_: bool
- fit(data, outcome, unit, time, first_treat, survey_design=None)
Perform the Goodman-Bacon decomposition.
- Parameters:
data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. The values 0 and np.inf are reserved as never-treated sentinels (not configurable today); a real treatment cohort with first_treat == 0 would be folded into U and should instead be re-labeled to a non-sentinel value before fitting. Units whose first_treat is at or before the first observable period (first_treat <= min(time), excluding the never-treated sentinels 0 and np.inf) are automatically remapped to the U (untreated) bucket per Goodman-Bacon (2021) footnote 11, with a UserWarning.
Library boundary extension: the paper uses the strict inequality t_i < 1 (units treated before the first observable period); the library uses the inclusive first_treat <= min(time) rule, additionally folding units treated at the first observable period (first_treat == min(time)) into U because such units have no untreated cell in-panel. See REGISTRY’s **Deviation (first-period boundary extension on always-treated remap)** block for the full contract.
Detection uses ordered-time logic on the time axis so panels whose time column contains negative or zero-crossing labels (e.g. event-time time ∈ [-2,..,3]) are handled correctly; the 0 sentinel restriction applies only to first_treat, not to time. The user’s original first_treat column on data is preserved unchanged; remapping happens in an internal column. The count of remapped units is exposed on the result as BaconDecompositionResults.n_always_treated_remapped.
survey_design (SurveyDesign, optional) – Survey design specification for weighted estimation. When provided, all means and group shares use survey weights. The decomposition remains diagnostic (no survey vcov needed).
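The sentinel and remapping rules for first_treat can be sketched in a few lines. This is an illustration of the documented rule only: `remap_always_treated` is a hypothetical helper, the "U" label and the unit names are assumptions, and the library's internal logic may differ in detail.

```python
import math


def remap_always_treated(first_treat_by_unit, time_values):
    """Assign each unit to U or to its timing cohort, following the documented rule."""
    first_period = min(time_values)
    groups = {}
    n_remapped = 0
    for unit, g in first_treat_by_unit.items():
        if g == 0 or math.isinf(g):
            groups[unit] = "U"      # reserved never-treated sentinels
        elif g <= first_period:
            groups[unit] = "U"      # no untreated cell in the panel: remapped
            n_remapped += 1
        else:
            groups[unit] = g        # ordinary timing cohort
    return groups, n_remapped


times = [-2, -1, 0, 1, 2, 3]        # event-time labels that cross zero
first_treat = {"a": -3, "b": -2, "c": 0, "d": math.inf, "e": 1}
groups, n_remapped = remap_always_treated(first_treat, times)
print(groups, n_remapped)
```

Units "a" and "b" are remapped because they are treated at or before the first period, "c" and "d" are never-treated by sentinel and are not counted as remapped, and "e" remains a regular cohort. The input mapping is left unmodified, in the same spirit as the preserved first_treat column.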
- Returns:
Object containing decomposition results.
- Return type:
BaconDecompositionResults
- Raises:
ValueError – If required columns are missing or data validation fails.
- set_params(**params)
Set estimator parameters (sklearn-compatible).
- Return type:
- print_summary()
Print summary to stdout.
- Return type:
None
BaconDecompositionResults
Results container for the Bacon decomposition.
- class diff_diff.bacon.BaconDecompositionResults
Bases: object
Results from Goodman-Bacon decomposition of TWFE.
This decomposition shows that the TWFE estimate equals a weighted average of all possible 2x2 DiD comparisons between timing groups.
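The weighted-average identity can be checked by hand from any list of comparisons. The sketch below uses a stand-in dataclass with made-up numbers rather than the library's Comparison2x2, and it assumes that the gap between the TWFE coefficient and the weighted sum is the quantity of interest when judging decomposition accuracy.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Stand-in for one 2x2 comparison; only the fields used here."""
    comparison_type: str
    estimate: float
    weight: float


def weighted_sum(comparisons):
    """Sum of weight * estimate over all 2x2 comparisons."""
    return sum(c.weight * c.estimate for c in comparisons)


comparisons = [
    Comparison("treated_vs_never", 2.0, 0.5),
    Comparison("earlier_vs_later", 1.0, 0.25),
    Comparison("later_vs_earlier", -1.0, 0.25),
]
twfe_estimate = 1.0  # made-up TWFE coefficient for illustration

total_weight = sum(c.weight for c in comparisons)
gap = abs(twfe_estimate - weighted_sum(comparisons))
print(total_weight, weighted_sum(comparisons), gap)
```

When the weights sum to one and the gap is near zero, the list of comparisons fully accounts for the TWFE coefficient.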
- twfe_estimate
The overall TWFE coefficient (should equal weighted sum of 2x2 estimates).
- Type:
float
- comparisons
List of all 2x2 comparisons with their estimates and weights.
- Type:
List[Comparison2x2]
- total_weight_treated_vs_never
Total weight on treated vs never-treated comparisons.
- Type:
float
- total_weight_earlier_vs_later
Total weight on earlier vs later treated comparisons.
- Type:
float
- total_weight_later_vs_earlier
Total weight on later vs earlier treated comparisons (forbidden).
- Type:
float
- weighted_avg_treated_vs_never
Weighted average effect from treated vs never-treated comparisons.
- Type:
float | None
- weighted_avg_earlier_vs_later
Weighted average effect from earlier vs later comparisons.
- Type:
float | None
- weighted_avg_later_vs_earlier
Weighted average effect from later vs earlier comparisons.
- Type:
float | None
- n_timing_groups
Number of distinct treatment timing groups.
- Type:
int
- n_never_treated
Number of never-treated units.
- Type:
int
- n_always_treated_remapped
Number of units whose first_treat was at or before the first observable period (first_treat <= min(time), excluding the never-treated sentinels 0 and np.inf) and which were automatically remapped to the U (untreated) bucket per Goodman-Bacon (2021) footnote 11. Detection uses ordered-time logic so negative or zero-crossing period labels work correctly. Zero on inputs where the user only used the first_treat ∈ {0, np.inf} sentinels. The user’s original first_treat column is preserved unchanged on the input data frame; remapping happens in an internal column.
- Type:
int
- timing_groups
List of treatment timing cohorts.
- Type:
List[Any]
Methods
summary(): Generate a formatted summary of the decomposition.
print_summary(): Print the summary to stdout.
to_dataframe(): Convert comparisons to a DataFrame.
weight_by_type(): Get total weight by comparison type.
effect_by_type(): Get weighted average effect by comparison type.
- twfe_estimate: float
- comparisons: List[Comparison2x2]
- total_weight_treated_vs_never: float
- total_weight_earlier_vs_later: float
- total_weight_later_vs_earlier: float
- n_timing_groups: int
- n_never_treated: int
- n_obs: int = 0
- decomposition_error: float = 0.0
- n_always_treated_remapped: int = 0
- summary()
Generate a formatted summary of the decomposition.
- Returns:
Formatted summary table.
- Return type:
- print_summary()
Print the summary to stdout.
- Return type:
None
- to_dataframe()
Convert comparisons to a DataFrame.
- Returns:
DataFrame with one row per 2x2 comparison.
- Return type:
pd.DataFrame
- weight_by_type()
Get total weight by comparison type.
- effect_by_type()
Get weighted average effect by comparison type.
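The per-type aggregates can be reproduced from the comparison list itself. The following sketch is an assumption about what such aggregation looks like, not the library's code: `weight_totals` and `weighted_avg_effects` are hypothetical helpers operating on made-up (type, estimate, weight) tuples. A type with no weight gets None, consistent with the `float | None` typing of the weighted_avg_* fields.

```python
TYPES = ("treated_vs_never", "earlier_vs_later", "later_vs_earlier")


def weight_totals(rows):
    """Total weight per comparison type."""
    totals = {t: 0.0 for t in TYPES}
    for kind, _estimate, weight in rows:
        totals[kind] += weight
    return totals


def weighted_avg_effects(rows):
    """Weight-averaged 2x2 estimate per type, or None when a type has no weight."""
    totals = weight_totals(rows)
    sums = {t: 0.0 for t in TYPES}
    for kind, estimate, weight in rows:
        sums[kind] += weight * estimate
    return {t: (sums[t] / totals[t] if totals[t] > 0 else None) for t in TYPES}


rows = [
    ("treated_vs_never", 2.0, 0.5),
    ("treated_vs_never", 4.0, 0.25),
    ("earlier_vs_later", 1.0, 0.25),
]
totals = weight_totals(rows)
effects = weighted_avg_effects(rows)
print(totals)
print(effects)
```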
- __init__(twfe_estimate, comparisons, total_weight_treated_vs_never, total_weight_earlier_vs_later, total_weight_later_vs_earlier, weighted_avg_treated_vs_never, weighted_avg_earlier_vs_later, weighted_avg_later_vs_earlier, n_timing_groups, n_never_treated, timing_groups, n_obs=0, decomposition_error=0.0, n_always_treated_remapped=0, survey_metadata=None)
- Parameters:
twfe_estimate (float)
comparisons (List[Comparison2x2])
total_weight_treated_vs_never (float)
total_weight_earlier_vs_later (float)
total_weight_later_vs_earlier (float)
weighted_avg_treated_vs_never (float | None)
weighted_avg_earlier_vs_later (float | None)
weighted_avg_later_vs_earlier (float | None)
n_timing_groups (int)
n_never_treated (int)
n_obs (int)
decomposition_error (float)
n_always_treated_remapped (int)
survey_metadata (Any | None)
- Return type:
None
Comparison2x2
Container for an individual 2x2 DiD comparison within the decomposition.
- class diff_diff.bacon.Comparison2x2
Bases: object
A single 2x2 DiD comparison in the Bacon decomposition.
- treated_group
The timing group used as “treated” in this comparison.
- Type:
Any
- control_group
The timing group used as “control” in this comparison. For comparison_type="treated_vs_never", this is the literal string "never_treated", which refers to the post-remap U bucket (the paper’s U per Goodman-Bacon 2021 footnote 11). On inputs with no remapped always-treated units this is exactly the true never-treated set; with remapping it is the broader U bucket. Check BaconDecompositionResults.n_never_treated and n_always_treated_remapped for the precise composition.
- Type:
Any
- comparison_type
Type of comparison: “treated_vs_never”, “earlier_vs_later”, or “later_vs_earlier”.
- Type:
str
- estimate
The 2x2 DiD estimate for this comparison.
- Type:
float
- weight
The weight assigned to this comparison in the TWFE average.
- Type:
float
- n_treated
Number of treated observations in this comparison.
- Type:
int
- n_control
Number of control observations in this comparison.
- Type:
int
- treated_group: Any
- control_group: Any
- comparison_type: str
- estimate: float
- weight: float
- n_treated: int
- n_control: int
Convenience Function
- diff_diff.bacon_decompose(data, outcome, unit, time, first_treat, weights='exact', survey_design=None)
Convenience function for Goodman-Bacon decomposition.
Decomposes a TWFE estimate into weighted 2x2 DiD comparisons, showing which comparisons drive the estimate and whether problematic “forbidden comparisons” are involved.
- Parameters:
data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. The values 0 and np.inf are reserved as never-treated sentinels; a real treatment cohort with first_treat == 0 would be folded into U and should be re-labeled to a non-sentinel value before fitting. Units whose first_treat is at or before the first observable period (first_treat <= min(time), excluding the sentinels) are automatically remapped to the U (untreated) bucket per Goodman-Bacon (2021) footnote 11, with a UserWarning. See BaconDecomposition.fit() for the full contract and BaconDecompositionResults.n_always_treated_remapped for the count. The user’s original first_treat column is preserved unchanged.
weights (str, default="exact") –
Weight calculation method:
"exact" (default): Variance-based weights from Goodman-Bacon (2021) Theorem 1, Eqs. 7-9 and 10e-g. Paper-faithful.
"approximate": Fast simplified formula. Opt in for speed-sensitive diagnostic loops; numerical output may differ from R bacondecomp::bacon().
survey_design (SurveyDesign, optional) –
Survey design specification for weighted estimation. When provided, cell means, group shares, and within-transform use survey weights. The decomposition remains diagnostic (no survey vcov needed).
Default-flip caveat (PR-B, 2026-05-16): the new weights="exact" default routes through _validate_unit_constant_survey, which rejects survey designs whose weights / strata / PSU / FPC columns vary within a unit across periods (the exact path collapses to per-unit aggregation via groupby().first()). Users whose survey design has time-varying within-unit columns must either (a) collapse the columns to be unit-constant or (b) pass explicit weights="approximate" to retain the legacy observation-level weighted-means path.
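Before relying on the exact path with a survey design, it can help to check whether a survey column is constant within each unit. The sketch below is a standalone illustration using plain dictionaries: `units_with_varying_column` is a hypothetical helper, the column names are made up, and it does not reproduce the library's own validation.

```python
def units_with_varying_column(rows, unit_key, column):
    """Return the sorted units whose `column` takes more than one value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[unit_key], set()).add(row[column])
    return sorted(unit for unit, values in seen.items() if len(values) > 1)


panel = [
    {"state": "A", "year": 2000, "w": 1.5},
    {"state": "A", "year": 2001, "w": 1.5},
    {"state": "B", "year": 2000, "w": 2.0},
    {"state": "B", "year": 2001, "w": 2.5},  # weight changes within unit B
]
offenders = units_with_varying_column(panel, "state", "w")
print(offenders)
```

An empty result suggests the column is unit-constant; any listed unit would need its values collapsed, or the analysis would need the approximate path described above.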
- Returns:
Object containing decomposition results with:
twfe_estimate: The overall TWFE coefficient
comparisons: List of all 2x2 comparisons with estimates and weights
Weight totals by comparison type
Methods for visualization and export
- Return type:
BaconDecompositionResults
Examples
>>> from diff_diff import bacon_decompose
>>>
>>> # Default: paper-faithful Goodman-Bacon (2021) Theorem 1 weights
>>> # (weights="exact"); matches R bacondecomp::bacon() at atol=1e-6 on
>>> # the aggregate (TWFE coefficient + weights-sum) across all panels,
>>> # and on the per-component breakdown when there are no
>>> # always-treated / first-period-treated cohorts (i.e. all
>>> # non-sentinel first_treat values are strictly greater than
>>> # min(time)). For panels with always-treated units, the
>>> # per-component breakdown diverges by convention (Python remaps
>>> # to U per paper footnote 11; R emits `Later vs Always Treated`);
>>> # see REGISTRY note on R parity convention divergence. Validated
>>> # via tests/test_methodology_bacon.py::TestBaconParityR.
>>> results = bacon_decompose(
...     data=panel_df,
...     outcome='earnings',
...     unit='state',
...     time='year',
...     first_treat='treatment_year'
... )
>>>
>>> # Opt-in: simplified-variance fast path for diagnostic loops
>>> # (numerical output may differ from R; sum-to-1 still holds).
>>> results_approx = bacon_decompose(
...     data=panel_df,
...     outcome='earnings',
...     unit='state',
...     time='year',
...     first_treat='treatment_year',
...     weights='approximate'
... )
>>>
>>> # View summary
>>> results.print_summary()
>>>
>>> # Check weight on forbidden comparisons
>>> print(f"Forbidden weight: {results.total_weight_later_vs_earlier:.1%}")
>>>
>>> # Convert to DataFrame for analysis
>>> df = results.to_dataframe()
See also
BaconDecomposition: Class-based interface with more options
plot_bacon: Visualize the decomposition
CallawaySantAnna: Robust estimator that avoids forbidden comparisons
Example Usage
Basic usage:
from diff_diff import BaconDecomposition, generate_staggered_data
data = generate_staggered_data(n_units=200, n_periods=12,
                               cohort_periods=[4, 6, 8], seed=42)
bacon = BaconDecomposition()
results = bacon.fit(data, outcome='outcome', unit='unit',
                    time='period', first_treat='first_treat')
results.print_summary()
Visualizing with plot_bacon:
from diff_diff import plot_bacon
# Scatter plot of 2x2 estimates vs weights, colored by comparison type
ax = plot_bacon(results)
ax.figure.show()
Interpreting the decomposition:
# Convert to DataFrame for detailed inspection
df = results.to_dataframe()
print(df[['treated_group', 'control_group', 'comparison_type',
          'estimate', 'weight']])
# Check weight breakdown by comparison type
weights = results.weight_by_type()
print(f"Treated vs Never-treated: {weights['treated_vs_never']:.1%}")
print(f"Earlier vs Later: {weights['earlier_vs_later']:.1%}")
print(f"Later vs Earlier: {weights['later_vs_earlier']:.1%}")
# Compare weighted average effects across comparison types
effects = results.effect_by_type()
for comp_type, effect in effects.items():
    # Types with no comparisons report None, so skip them
    if effect is not None:
        print(f"  {comp_type}: {effect:.4f}")
Using exact weights for publication-quality results:
bacon = BaconDecomposition(weights='exact')
results = bacon.fit(data, outcome='outcome', unit='unit',
                    time='period', first_treat='first_treat')
# Verify the weighted sum closely matches the TWFE estimate
print(f"TWFE estimate: {results.twfe_estimate:.4f}")
print(f"Decomposition error: {results.decomposition_error:.6f}")
When Is TWFE Reliable?
The Bacon decomposition helps answer whether a standard TWFE regression is adequate for a particular dataset. As a rule of thumb:
TWFE is likely reliable when the weight on “later vs earlier” (forbidden) comparisons is small, or when 2x2 estimates are similar across all comparison types. This suggests treatment effect heterogeneity is not meaningfully biasing the TWFE estimate.
TWFE may be unreliable when forbidden comparisons carry substantial weight and their estimates differ markedly from the clean comparisons. In this case, consider using a robust staggered estimator such as CallawaySantAnna, SunAbraham, or StackedDiD.
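The rule of thumb above can be turned into a rough screening function. This is a sketch under stated assumptions: `twfe_flag` is a hypothetical helper, and the thresholds (10% forbidden weight, an effect gap of 0.5) are arbitrary placeholders that should be chosen for the outcome scale at hand. The inputs are dictionaries keyed by comparison type, in the shape the usage examples read from weight_by_type() and effect_by_type().

```python
def twfe_flag(weights, effects, max_forbidden_weight=0.10, max_effect_gap=0.5):
    """Rough screen: is the TWFE estimate plausibly driven by clean comparisons?"""
    forbidden_weight = weights.get("later_vs_earlier", 0.0)
    if forbidden_weight <= max_forbidden_weight:
        return "likely reliable"            # forbidden comparisons barely matter

    forbidden_effect = effects.get("later_vs_earlier")
    clean = [effects.get(k) for k in ("treated_vs_never", "earlier_vs_later")]
    clean = [e for e in clean if e is not None]
    if forbidden_effect is None or not clean:
        return "inconclusive"               # nothing to compare against

    # Largest distance between the forbidden average and any clean average.
    gap = max(abs(forbidden_effect - e) for e in clean)
    if gap <= max_effect_gap:
        return "likely reliable"            # estimates agree across types
    return "consider robust estimator"


similar = twfe_flag(
    {"later_vs_earlier": 0.4},
    {"treated_vs_never": 2.0, "earlier_vs_later": 1.9, "later_vs_earlier": 1.8},
)
divergent = twfe_flag(
    {"later_vs_earlier": 0.4},
    {"treated_vs_never": 2.0, "earlier_vs_later": 1.9, "later_vs_earlier": -0.5},
)
print(similar, divergent)
```

A flag of this kind is only a prompt for closer inspection of the individual 2x2 estimates, not a substitute for it.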