Interactive notebook
This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.
Staggered Difference-in-Differences
This notebook demonstrates how to handle staggered treatment adoption using modern DiD estimators. In staggered DiD settings:
Different units get treated at different times
Traditional TWFE can give biased estimates due to “forbidden comparisons”
Modern estimators compute group-time specific effects and aggregate them properly
We’ll cover:
Understanding staggered adoption
The problem with TWFE (and Goodman-Bacon decomposition)
The Callaway-Sant’Anna estimator
Group-time effects ATT(g,t)
Aggregating effects (simple, group, event-study)
Bootstrap inference for valid standard errors
Visualization
Pre-treatment effects and parallel trends testing
Different control group options
Handling anticipation effects
Adding covariates
Comparing with MultiPeriodDiD
Sun-Abraham interaction-weighted estimator
Comparing CS and SA as a robustness check
[ ]:
import numpy as np
import pandas as pd
from diff_diff import CallawaySantAnna, SunAbraham, MultiPeriodDiD
from diff_diff.visualization import plot_event_study, plot_group_effects
# For nicer plots (optional)
try:
    import matplotlib.pyplot as plt
    plt.style.use('seaborn-v0_8-whitegrid')
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed - visualization examples will be skipped")
1. Understanding Staggered Adoption
In a staggered adoption design, units adopt treatment at different times. We call the period when a unit first receives treatment its cohort or group.
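To make the cohort definition concrete, here is a minimal pure-Python sketch that derives each unit's first treatment period from long-format rows of (unit, period, treated). The toy rows and the helper name `first_treatment_period` are illustrative assumptions and are not part of diff_diff; the convention that 0 marks a never-treated unit mirrors the `first_treat` column used in this tutorial.

```python
def first_treatment_period(rows):
    """Map each unit to its first treated period (0 if never treated)."""
    cohorts = {}
    for unit, period, treated in rows:
        cohorts.setdefault(unit, 0)
        if treated and (cohorts[unit] == 0 or period < cohorts[unit]):
            cohorts[unit] = period
    return cohorts


# Toy panel over periods 1..7: unit "a" adopts in period 3,
# unit "b" adopts in period 5, unit "c" never adopts.
rows = [
    (unit, t, int(start != 0 and t >= start))
    for unit, start in [("a", 3), ("b", 5), ("c", 0)]
    for t in range(1, 8)
]

cohorts = first_treatment_period(rows)
print(cohorts)  # {'a': 3, 'b': 5, 'c': 0}
```

Because 0 is reserved for never-treated units, this sketch assumes periods are numbered from 1.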
[ ]:
# Generate staggered adoption data using the library function
from diff_diff import generate_staggered_data
# Generate data with 100 units, 8 periods, two treatment cohorts (periods 3 and 5),
# and 40% never-treated
df = generate_staggered_data(
n_units=100,
n_periods=8,
cohort_periods=[3, 5], # Treatment cohorts at periods 3 and 5
never_treated_frac=0.4,
treatment_effect=2.0,
dynamic_effects=True,
effect_growth=0.5, # Effect grows 0.5 per period
unit_fe_sd=2.0,
noise_sd=0.5,
seed=42
)
# The DGP returns 'first_treat' column: 0 = never-treated, >0 = first treatment period
print(f"Dataset: {len(df)} observations, {df['unit'].nunique()} units, {df['period'].nunique()} periods")
df.head(10)
[ ]:
# Examine treatment timing
cohort_summary = df.groupby('unit').agg({'first_treat': 'first', 'treated': 'sum'}).reset_index()
print("Treatment cohorts:")
print(cohort_summary.groupby('first_treat').size())
print("\nTreatment adoption over time:")
print(df.groupby('period')['treated'].mean().round(3))
2. The Problem with TWFE in Staggered Settings
Traditional Two-Way Fixed Effects (TWFE) can give biased estimates because:
It uses already-treated units as controls for newly-treated units
With heterogeneous treatment effects, this leads to “negative weighting”
Let’s see what TWFE would give us:
[ ]:
from diff_diff import TwoWayFixedEffects
# TWFE estimation (potentially biased with heterogeneous effects)
twfe = TwoWayFixedEffects()
results_twfe = twfe.fit(
df,
outcome="outcome",
treatment="treated",
unit="unit",
time="period"
)
print("TWFE Estimate (potentially biased):")
print(f"ATT: {results_twfe.att:.4f}")
Understanding Why TWFE Fails: Goodman-Bacon Decomposition
The Goodman-Bacon (2021) decomposition reveals exactly why TWFE can be biased. It shows that the TWFE estimate is a weighted average of all possible 2x2 DiD comparisons, including problematic “forbidden comparisons” where already-treated units are used as controls.
There are three types of comparisons:
Treated vs Never-treated (green): Clean comparisons using never-treated units
Earlier vs Later treated (blue): Uses later-treated as controls before they’re treated
Later vs Earlier treated (red): Uses already-treated as controls — the “forbidden comparisons”
When treatment effects are heterogeneous (as in our data where effects grow over time), the forbidden comparisons can bias the TWFE estimate.
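A small noise-free calculation shows how a forbidden comparison goes wrong. The sketch below assumes a hypothetical outcome model (unit effect plus a common linear trend plus an effect of 2.0 that grows by 0.5 per period after adoption); the functions `effect`, `outcome`, and `did_2x2` are illustrative and not part of diff_diff. It estimates the period-5 effect for the cohort treated at period 5, once against a never-treated unit and once against a unit already treated since period 3.

```python
def effect(first_treat, t, base=2.0, growth=0.5):
    """Treatment effect at period t: base + growth * periods since adoption."""
    if first_treat == 0 or t < first_treat:
        return 0.0
    return base + growth * (t - first_treat)


def outcome(unit_fe, first_treat, t, trend=1.0):
    """Noise-free outcome: unit effect + common trend + treatment effect."""
    return unit_fe + trend * t + effect(first_treat, t)


def did_2x2(treated, control, pre, post):
    """Classic 2x2 DiD between two (unit_fe, first_treat) units."""
    d_treated = outcome(*treated, post) - outcome(*treated, pre)
    d_control = outcome(*control, post) - outcome(*control, pre)
    return d_treated - d_control


early, late, never = (1.0, 3), (2.0, 5), (0.5, 0)

clean = did_2x2(late, never, pre=4, post=5)      # never-treated control
forbidden = did_2x2(late, early, pre=4, post=5)  # already-treated control
print(clean, forbidden)  # 2.0 1.5
```

The true effect at adoption is 2.0. The already-treated control's own effect grows by 0.5 between periods 4 and 5, and that growth is subtracted from the estimate, so the forbidden comparison returns 1.5.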
[ ]:
from diff_diff import bacon_decompose, plot_bacon
# Perform the Goodman-Bacon decomposition
bacon_results = bacon_decompose(
df,
outcome='outcome',
unit='unit',
time='period',
first_treat='first_treat' # 0 means never-treated
)
# View the decomposition summary
bacon_results.print_summary()
[ ]:
# Visualize the decomposition
if HAS_MATPLOTLIB:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    # Scatter plot: shows each 2x2 comparison
    plot_bacon(bacon_results, ax=axes[0], plot_type='scatter', show=False)
    # Bar chart: shows total weight by comparison type
    plot_bacon(bacon_results, ax=axes[1], plot_type='bar', show=False)
    plt.tight_layout()
    plt.show()
# Interpret the results
forbidden_weight = bacon_results.total_weight_later_vs_earlier
print(f"\n⚠️ {forbidden_weight:.1%} of the TWFE weight comes from 'forbidden comparisons'")
print(" where already-treated units are used as controls.")
print("\n→ This explains why TWFE can be biased. Use Callaway-Sant'Anna instead!")
3. Callaway-Sant’Anna Estimator
The CS estimator avoids these problems by:
Computing separate effects for each (group, time) pair: ATT(g,t)
Only using not-yet-treated or never-treated units as controls
Properly aggregating these effects
[ ]:
# Callaway-Sant'Anna estimation
cs = CallawaySantAnna(
control_group="never_treated", # Use never-treated as controls
anticipation=0 # No anticipation effects
)
results_cs = cs.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat", # Column with first treatment period (0 = never treated)
aggregate="all" # Compute all aggregations (simple, event_study, group)
)
print(results_cs.summary())
4. Group-Time Effects ATT(g,t)
The CS estimator computes separate effects for each combination of:
g: Treatment cohort (when the group was first treated)
t: Calendar time period
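Without covariates, a post-treatment ATT(g,t) with never-treated controls is a 2x2 DiD on group means: the change for cohort g between its last pre-treatment period g-1 and period t, minus the same change for the never-treated group. The sketch below works this out on hypothetical noise-free group means; `level`, `mean_outcome`, and `att_gt` are illustrative names, and this is not the library's implementation.

```python
# Hypothetical group means. Groups are labelled by first treatment
# period (0 = never treated); every group shares a trend of 1.0/period.
level = {0: 8.0, 3: 10.0, 5: 12.0}


def mean_outcome(g, t):
    """Group mean at period t, with an effect of 2.0 + 0.5 per period treated."""
    effect = 2.0 + 0.5 * (t - g) if g != 0 and t >= g else 0.0
    return level[g] + 1.0 * t + effect


def att_gt(g, t, control=0):
    """2x2 DiD of cohort g against the control group, base period g-1."""
    base = g - 1
    change_treated = mean_outcome(g, t) - mean_outcome(g, base)
    change_control = mean_outcome(control, t) - mean_outcome(control, base)
    return change_treated - change_control


att = {(g, t): att_gt(g, t) for g in (3, 5) for t in range(g, 8)}
for (g, t), value in att.items():
    print(f"ATT({g},{t}) = {value:.2f}")
```

In this toy setup each ATT(g,t) recovers the effect built into the means, for example 4.0 for ATT(3,7) and 2.0 for ATT(5,5).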
[ ]:
# View all group-time effects
print("Group-Time Effects ATT(g,t):")
print("=" * 60)
for (g, t), data in results_cs.group_time_effects.items():
    sig = "*" if data['p_value'] < 0.05 else ""
    print(f"ATT({g},{t}): {data['effect']:>7.4f} "
          f"(SE: {data['se']:.4f}, p: {data['p_value']:.3f}) {sig}")
[ ]:
# Convert to DataFrame for easier analysis
gt_df = results_cs.to_dataframe()
print("\nGroup-time effects as DataFrame:")
gt_df
5. Aggregating Effects
We often want to summarize the group-time effects into a single number or event-study style estimates.
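The three aggregations can be sketched by hand on a few post-treatment ATT(g,t) values. The numbers and cohort sizes below are hypothetical, and the weights (cohort size for the simple and event-study averages, equal weights across periods within a cohort) are an assumption in the spirit of Callaway & Sant'Anna (2021); the library's exact weighting may differ.

```python
from collections import defaultdict

# Hypothetical post-treatment ATT(g,t) values and cohort sizes.
att = {(3, 3): 2.0, (3, 4): 2.5, (3, 5): 3.0,
       (5, 5): 2.0, (5, 6): 2.5}
n_units = {3: 40, 5: 20}


def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs."""
    total = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total


# Simple: one number, each (g,t) cell weighted by its cohort's size.
simple = weighted_mean([(a, n_units[g]) for (g, t), a in att.items()])

by_group = defaultdict(list)
by_event = defaultdict(list)
for (g, t), a in att.items():
    by_group[g].append((a, 1.0))             # equal weight across periods
    by_event[t - g].append((a, n_units[g]))  # cohort-size weight

group = {g: weighted_mean(p) for g, p in by_group.items()}
event_study = {e: weighted_mean(p) for e, p in sorted(by_event.items())}

print(simple)       # 2.4375
print(group)        # {3: 2.5, 5: 2.25}
print(event_study)  # {0: 2.0, 1: 2.5, 2: 3.0}
```

Event time 2 is informed by cohort 3 alone, because cohort 5 has no ATT at that horizon in this toy example. That changing composition is worth keeping in mind when reading long event-study horizons.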
[ ]:
# Simple aggregation: weighted average across all post-treatment (g,t)
# This is computed automatically and stored in overall_att/overall_se
print("Simple Aggregation (Overall ATT):")
print(f"ATT: {results_cs.overall_att:.4f}")
print(f"SE: {results_cs.overall_se:.4f}")
print(f"95% CI: [{results_cs.overall_conf_int[0]:.4f}, {results_cs.overall_conf_int[1]:.4f}]")
[ ]:
# Group aggregation: average effect by cohort
# Requires aggregate="group" or "all" in fit()
print("\nGroup Aggregation (ATT by cohort):")
for cohort, effects in results_cs.group_effects.items():
    print(f"Cohort {cohort}: ATT = {effects['effect']:.4f} (SE: {effects['se']:.4f})")
[ ]:
# Event-study aggregation: average effect by time relative to treatment
# Requires aggregate="event_study" or "all" in fit()
print("\nEvent-Study Aggregation (ATT by event time):")
print(f"{'Event Time':>12} {'ATT':>10} {'SE':>10} {'95% CI':>25}")
print("-" * 60)
for event_time in sorted(results_cs.event_study_effects.keys()):
    effects = results_cs.event_study_effects[event_time]
    ci = effects['conf_int']
    print(f"{event_time:>12} {effects['effect']:>10.4f} {effects['se']:>10.4f} "
          f"[{ci[0]:>8.4f}, {ci[1]:>8.4f}]")
6. Bootstrap Inference
With few clusters or when analytical standard errors may be unreliable, the multiplier bootstrap provides valid inference. This implements the approach from Callaway & Sant’Anna (2021), perturbing unit-level influence functions.
Why use bootstrap?
Analytical SEs may understate uncertainty with few clusters
Bootstrap confidence intervals can be more reliable than analytical ones in small samples, although their justification is still asymptotic
P-values are computed from the bootstrap distribution
Weight types:
'rademacher' - Default, ±1 with p=0.5, good for most cases
'mammen' - Two-point distribution, matches first 3 moments
'webb' - Six-point distribution, recommended for very few clusters (<10)
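To show what the three weight types are, the sketch below writes out each multiplier distribution as (value, probability) pairs and uses them in a bare-bones multiplier bootstrap for the standard error of a mean. It illustrates the general technique under stated assumptions and is not the library's implementation: the influence-function values are simulated, and `moment` and `multiplier_bootstrap_se` are hypothetical helper names.

```python
import math
import random
import statistics

SQRT5 = math.sqrt(5)

# (value, probability) pairs; each distribution has mean 0 and variance 1.
WEIGHT_DISTRIBUTIONS = {
    "rademacher": [(-1.0, 0.5), (1.0, 0.5)],
    "mammen": [
        (-(SQRT5 - 1) / 2, (SQRT5 + 1) / (2 * SQRT5)),
        ((SQRT5 + 1) / 2, (SQRT5 - 1) / (2 * SQRT5)),
    ],
    "webb": [(s * math.sqrt(k / 2), 1 / 6) for s in (-1, 1) for k in (1, 2, 3)],
}


def moment(name, k):
    """Exact k-th raw moment of a weight distribution."""
    return sum(p * v ** k for v, p in WEIGHT_DISTRIBUTIONS[name])


def multiplier_bootstrap_se(influence, name="rademacher", n_boot=499, seed=42):
    """SE of a mean-type estimator from perturbed unit-level influence values."""
    rng = random.Random(seed)
    values, probs = zip(*WEIGHT_DISTRIBUTIONS[name])
    n = len(influence)
    draws = []
    for _ in range(n_boot):
        w = rng.choices(values, weights=probs, k=n)  # one weight per unit
        draws.append(sum(wi * psi for wi, psi in zip(w, influence)) / n)
    return statistics.pstdev(draws)


# Simulated, centred influence values for 200 hypothetical units.
rng = random.Random(0)
influence = [rng.gauss(0, 1) for _ in range(200)]
mean_psi = sum(influence) / len(influence)
influence = [psi - mean_psi for psi in influence]

analytical_se = math.sqrt(sum(psi ** 2 for psi in influence)) / len(influence)
for name in WEIGHT_DISTRIBUTIONS:
    boot_se = multiplier_bootstrap_se(influence, name)
    print(f"{name:>10}: bootstrap SE {boot_se:.4f} vs analytical {analytical_se:.4f}")
```

Each unit receives a single weight per draw, so any dependence within a unit is preserved. With this many units the three weight types should give similar standard errors; the choice matters more when there are few clusters.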
[ ]:
# Callaway-Sant'Anna with bootstrap inference
cs_boot = CallawaySantAnna(
control_group="never_treated",
n_bootstrap=499, # Number of bootstrap iterations
bootstrap_weights='rademacher', # or 'mammen', 'webb'
seed=42 # For reproducibility
)
results_boot = cs_boot.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat", # Column with first treatment period
aggregate="event_study" # Compute event study aggregation
)
# Access bootstrap results
print("Bootstrap Inference Results:")
print("=" * 60)
print(f"\nOverall ATT: {results_boot.overall_att:.4f}")
print(f"Bootstrap SE: {results_boot.bootstrap_results.overall_att_se:.4f}")
print(f"Bootstrap 95% CI: [{results_boot.bootstrap_results.overall_att_ci[0]:.4f}, "
f"{results_boot.bootstrap_results.overall_att_ci[1]:.4f}]")
print(f"Bootstrap p-value: {results_boot.bootstrap_results.overall_att_p_value:.4f}")
[ ]:
# Event study with bootstrap confidence intervals
print("\nEvent Study with Bootstrap Inference:")
print(f"{'Event Time':>12} {'ATT':>10} {'Boot SE':>10} {'Boot 95% CI':>25} {'p-value':>10}")
print("-" * 70)
event_ses = results_boot.bootstrap_results.event_study_ses
event_cis = results_boot.bootstrap_results.event_study_cis
event_pvals = results_boot.bootstrap_results.event_study_p_values
for event_time in sorted(event_ses.keys()):
    att = results_boot.event_study_effects[event_time]['effect']
    se = event_ses[event_time]
    ci = event_cis[event_time]
    pval = event_pvals[event_time]
    sig = "*" if pval < 0.05 else ""
    print(f"{event_time:>12} {att:>10.4f} {se:>10.4f} [{ci[0]:>8.4f}, {ci[1]:>8.4f}] {pval:>10.4f} {sig}")
7. Visualization
Event-study plots are the standard way to visualize DiD results with multiple periods.
[ ]:
if HAS_MATPLOTLIB:
    # Event study plot
    fig, ax = plt.subplots(figsize=(10, 6))
    plot_event_study(
        results=results_cs,
        ax=ax,
        title="Event Study: Effect of Treatment Over Time",
        xlabel="Periods Since Treatment",
        ylabel="ATT"
    )
    plt.tight_layout()
    plt.show()
else:
    print("Install matplotlib to see visualizations: pip install matplotlib")
[ ]:
if HAS_MATPLOTLIB:
    # Plot effects by cohort
    fig, ax = plt.subplots(figsize=(10, 6))
    plot_group_effects(
        results=results_cs,
        ax=ax,
        title="Treatment Effects by Cohort"
    )
    plt.tight_layout()
    plt.show()
8. Pre-Treatment Effects and Parallel Trends Testing
The Callaway-Sant’Anna estimator can compute pre-treatment effects ATT(g,t) for periods before treatment. These should be near zero if parallel trends holds.
The base_period parameter controls how the reference period is selected:
"varying" (default): For pre-treatment periods, compares t to t-1 (consecutive comparisons)
"universal": Always compares to g-1 (or g-anticipation-1 when anticipation > 0)
Both produce identical post-treatment effects; they differ only for pre-treatment diagnostics.
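The difference between the two base periods is easiest to see on group means that violate parallel trends slightly. The sketch below assumes a hypothetical cohort treated at period 5 whose outcome drifts 0.1 per period faster than the never-treated group; `mean_outcome`, `did`, and `att_gt` are illustrative names, and the reference-period rule follows the description above rather than the library's source.

```python
def mean_outcome(g, t):
    """Hypothetical group mean: cohort 5 drifts 0.1/period faster than controls."""
    drift = 0.1 * t if g == 5 else 0.0
    effect = 2.0 if g == 5 and t >= 5 else 0.0
    return 1.0 * t + drift + effect


def did(g, t, base, control=0):
    """2x2 DiD on group means between period t and a base period."""
    change_treated = mean_outcome(g, t) - mean_outcome(g, base)
    change_control = mean_outcome(control, t) - mean_outcome(control, base)
    return change_treated - change_control


def att_gt(g, t, base_period="varying", anticipation=0):
    """ATT(g,t) with the reference period implied by base_period."""
    reference = g - anticipation - 1
    if base_period == "varying" and t < g - anticipation:
        reference = t - 1  # consecutive comparison in pre-periods
    return did(g, t, reference)


for t in range(2, 7):
    varying = att_gt(5, t, "varying")
    universal = att_gt(5, t, "universal")
    print(f"t={t}  varying={varying:+.2f}  universal={universal:+.2f}")
```

In this toy example the varying base reports the per-period drift (+0.10) in every pre-period, while the universal base reports the cumulative gap relative to period 4 (-0.20 at t=2, -0.10 at t=3, and 0 at the reference period itself). From period 5 onward the two columns coincide.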
[ ]:
# CallawaySantAnna with explicit base_period for pre-treatment effects
cs_pretrends = CallawaySantAnna(
control_group="never_treated",
base_period="varying" # Default: consecutive comparisons for pre-periods
)
results_pretrends = cs_pretrends.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat",
aggregate="event_study"
)
# The base_period is recorded in results
print(f"Base period method: {results_pretrends.base_period}")
[ ]:
# Examine pre-treatment effects (event time < 0)
print("Pre-Treatment Effects (Parallel Trends Diagnostic):")
print("=" * 65)
print(f"{'Event Time':>12} {'ATT':>10} {'SE':>10} {'95% CI':>25} {'Test'}")
print("-" * 65)
pre_period_effects = []
for event_time in sorted(results_pretrends.event_study_effects.keys()):
    if event_time < 0:
        effects = results_pretrends.event_study_effects[event_time]
        ci = effects['conf_int']
        includes_zero = ci[0] <= 0 <= ci[1]
        marker = "Pass" if includes_zero else "Fail"
        pre_period_effects.append(effects['effect'])
        print(f"{event_time:>12} {effects['effect']:>10.4f} {effects['se']:>10.4f} "
              f"[{ci[0]:>8.4f}, {ci[1]:>8.4f}] {marker}")
if pre_period_effects:
    print(f"\n-> All pre-treatment effects should be close to zero")
    print(f"   Mean pre-treatment effect: {np.mean(pre_period_effects):.4f}")
else:
    print("No pre-treatment effects computed (insufficient pre-periods)")
Comparing Base Period Methods
Let’s compare the two base period methods to understand their difference:
[ ]:
# Compare varying vs universal base period
cs_universal = CallawaySantAnna(
control_group="never_treated",
base_period="universal" # Always use g-1 as base (g-anticipation-1 if anticipation > 0)
)
results_universal = cs_universal.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat",
aggregate="event_study"
)
print("Pre-Treatment Effects: Varying vs Universal Base Period")
print("=" * 70)
print(f"{'Event Time':>12} {'Varying':>12} {'Universal':>12} {'Difference':>12}")
print("-" * 70)
for event_time in sorted(results_pretrends.event_study_effects.keys()):
    if event_time < 0:
        varying_eff = results_pretrends.event_study_effects[event_time]['effect']
        universal_eff = results_universal.event_study_effects.get(event_time, {}).get('effect', np.nan)
        diff = varying_eff - universal_eff if not np.isnan(universal_eff) else np.nan
        print(f"{event_time:>12} {varying_eff:>12.4f} {universal_eff:>12.4f} {diff:>12.4f}")
print("\nNote: 'Varying' uses consecutive period comparisons (t vs t-1)")
print(" 'Universal' compares all periods to g-1 (g-anticipation-1 if anticipation > 0)")
Interpreting Pre-Treatment Effects
What we’re testing:
Pre-treatment ATT(g,t) should be approximately zero if parallel trends holds
Significant non-zero pre-treatment effects suggest potential parallel trends violations
Key insights:
Visual inspection in the event study plot shows pre-period coefficients
Formal tests: 95% CIs that include zero are consistent with parallel trends
Important caveat: A “passing” test doesn’t prove parallel trends—the test may lack power
When concerned about pre-trends:
Add covariates for precision (Section 11)
Use control_group="not_yet_treated" for more data (Section 9)
Apply Honest DiD sensitivity analysis to bound effects under violations (Tutorial 05)
Assess pre-trends test power using Tutorial 07
For comprehensive parallel trends testing: Tutorial 04
For pre-trends power analysis (Roth 2022): Tutorial 07
9. Different Control Group Options
The CS estimator supports different control group specifications:
"never_treated": Only use units that are never treated
"not_yet_treated": Use units that haven’t been treated yet at time t
[ ]:
# Using not-yet-treated as control
cs_nyt = CallawaySantAnna(
control_group="not_yet_treated"
)
results_nyt = cs_nyt.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat"
)
# Compare using overall_att/overall_se attributes
print("Comparison of control group specifications:")
print(f"{'Control Group':<20} {'ATT':>10} {'SE':>10}")
print("-" * 40)
print(f"{'Never-treated':<20} {results_cs.overall_att:>10.4f} {results_cs.overall_se:>10.4f}")
print(f"{'Not-yet-treated':<20} {results_nyt.overall_att:>10.4f} {results_nyt.overall_se:>10.4f}")
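The two control-group options compared above differ only in which units are eligible as controls for a given cohort and period. The sketch below spells out one plausible selection rule on a toy `first_treat` mapping; `control_units` is a hypothetical helper, and the rule (a unit counts as not-yet-treated at t if its first treatment falls after t plus any anticipation window) is an assumption about the general approach, not the library's exact logic.

```python
def control_units(first_treat, g, t, control_group="never_treated", anticipation=0):
    """Return ids of units eligible as controls for cohort g at period t."""
    eligible = []
    for unit, ft in first_treat.items():
        never = ft == 0
        not_yet = ft != g and ft > t + anticipation
        if never or (control_group == "not_yet_treated" and not_yet):
            eligible.append(unit)
    return sorted(eligible)


# Toy cohorts: first treatment period per unit, 0 = never treated.
first_treat = {"a": 3, "b": 3, "c": 5, "d": 5, "e": 0, "f": 0}

print(control_units(first_treat, g=3, t=3))                     # ['e', 'f']
print(control_units(first_treat, g=3, t=3,
                    control_group="not_yet_treated"))           # ['c', 'd', 'e', 'f']
print(control_units(first_treat, g=3, t=5,
                    control_group="not_yet_treated"))           # ['e', 'f']
```

The not-yet-treated pool shrinks over time: cohort 5 helps identify ATT(3,3) and ATT(3,4) but drops out once it is treated in period 5, after which only never-treated units remain.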
10. Handling Anticipation Effects
If units start changing behavior before official treatment (anticipation), you can specify the anticipation period.
[ ]:
# Allow for 1 period of anticipation
cs_antic = CallawaySantAnna(
control_group="never_treated",
anticipation=1 # Treatment effects may start 1 period early
)
results_antic = cs_antic.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat"
)
print(f"With anticipation=1: ATT = {results_antic.overall_att:.4f}")
11. Adding Covariates
You can include covariates to improve precision through outcome regression or propensity score methods.
[ ]:
# Add covariates to data (seeded so the example is reproducible)
rng = np.random.default_rng(42)
df['size'] = rng.normal(100, 20, len(df))
df['age'] = rng.normal(10, 3, len(df))
# Fit with covariates
cs_cov = CallawaySantAnna(
control_group="never_treated"
)
results_cov = cs_cov.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat",
covariates=["size", "age"]
)
print(f"With covariates: ATT = {results_cov.overall_att:.4f} (SE: {results_cov.overall_se:.4f})")
12. Comparing with MultiPeriodDiD
For comparison, here’s how you would use MultiPeriodDiD which estimates period-specific effects.
Important: MultiPeriodDiD assumes simultaneous treatment timing (all treated units get treated at the same time). For staggered adoption, always use CallawaySantAnna or SunAbraham instead.
To demonstrate MultiPeriodDiD properly, we’ll create a simple dataset where all treated units receive treatment at the same time.
[ ]:
# Create a simple dataset with simultaneous treatment timing
# This is the appropriate data structure for MultiPeriodDiD
from diff_diff import generate_did_data
# Generate data with simultaneous treatment at period 4
mp_data = generate_did_data(
n_units=100,
n_periods=8,
treatment_period=4, # All treated units get treatment at period 4
treatment_fraction=0.5,
treatment_effect=2.5,
seed=42
)
print(f"MultiPeriodDiD dataset: {len(mp_data)} obs")
print(f"Treatment starts at period 4 for all treated units")
mp_did = MultiPeriodDiD()
results_mp = mp_did.fit(
mp_data,
outcome="outcome",
treatment="treated",
time="period",
post_periods=[4, 5, 6, 7]
)
print(results_mp.summary())
[ ]:
# Period-specific effects from MultiPeriodDiD
print("\nPeriod-specific effects:")
for period, pe in results_mp.period_effects.items():
    print(f"Period {period}: {pe.effect:.4f} (SE: {pe.se:.4f})")
13. Sun-Abraham Interaction-Weighted Estimator
The Sun-Abraham (2021) estimator provides an alternative approach to staggered DiD. While Callaway-Sant’Anna aggregates 2x2 DiD comparisons, Sun-Abraham uses an interaction-weighted regression approach:
Run a saturated regression with cohort × relative-time indicators
Weight cohort-specific effects by each cohort’s share of treated observations at each relative time
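The weighting step above can be sketched on its own, taking the cohort-by-relative-time coefficients as given. The coefficient values and cohort sizes below are hypothetical, `interaction_weighted` is an illustrative helper rather than a diff_diff function, and the sketch assumes a balanced panel so that a cohort's share of units equals its share of treated observations at each relative time.

```python
# Hypothetical cohort-specific effects by relative time e, as a saturated
# regression might return them, plus the number of units in each cohort.
cohort_size = {3: 40, 5: 20}
cohort_effects = {
    3: {0: 2.0, 1: 2.5, 2: 3.0, 3: 3.5},
    5: {0: 1.7, 1: 2.2},
}


def interaction_weighted(cohort_effects, cohort_size):
    """Average cohort effects at each relative time using cohort shares."""
    rel_times = sorted({e for eff in cohort_effects.values() for e in eff})
    weights, estimates = {}, {}
    for e in rel_times:
        present = [g for g in cohort_effects if e in cohort_effects[g]]
        total = sum(cohort_size[g] for g in present)
        weights[e] = {g: cohort_size[g] / total for g in present}
        estimates[e] = sum(weights[e][g] * cohort_effects[g][e] for g in present)
    return weights, estimates


weights, estimates = interaction_weighted(cohort_effects, cohort_size)
for e in estimates:
    print(f"e={e}: weights={weights[e]}, estimate={estimates[e]:.3f}")
```

At e=0 and e=1 the larger cohort carries two thirds of the weight; at e=2 and e=3 only cohort 3 is observed, so the estimate equals its coefficient.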
Key differences from CS:
Regression-based vs. 2x2 DiD aggregation
Different weighting scheme
More efficient under homogeneous effects
Consistent under heterogeneous effects (like CS)
When to use both: Running both CS and SA provides a useful robustness check. When they agree, results are more credible.
[ ]:
# Sun-Abraham estimation
sa = SunAbraham(
control_group="never_treated", # Use never-treated as controls
anticipation=0 # No anticipation effects
)
results_sa = sa.fit(
df,
outcome="outcome",
unit="unit",
time="period",
first_treat="first_treat" # Column with first treatment period (0 = never treated)
)
# View summary
results_sa.print_summary()
[ ]:
# Event study effects by relative time
print("Sun-Abraham Event Study Effects:")
print(f"{'Rel. Time':>12} {'Effect':>10} {'SE':>10} {'p-value':>10}")
print("-" * 45)
for rel_time in sorted(results_sa.event_study_effects.keys()):
    eff = results_sa.event_study_effects[rel_time]
    sig = "*" if eff['p_value'] < 0.05 else ""
    print(f"{rel_time:>12} {eff['effect']:>10.4f} {eff['se']:>10.4f} {eff['p_value']:>10.4f} {sig}")
# Cohort weights show how each cohort contributes to event-study estimates
print("\n\nCohort Weights by Relative Time:")
for rel_time in sorted(results_sa.cohort_weights.keys()):
    weights = results_sa.cohort_weights[rel_time]
    print(f"e={rel_time}: {weights}")
14. Comparing CS and SA as a Robustness Check
Running both estimators provides a useful robustness check. When they agree, results are more credible.
Understanding Pre-Period Differences
You may notice that post-treatment effects align closely between CS and SA, but pre-treatment effects can differ in magnitude and significance. This is expected methodological behavior, not a bug.
Why the difference?
Callaway-Sant’Anna with base_period="varying" (default):
Pre-treatment effects use consecutive period comparisons (period t vs period t-1)
Each pre-period coefficient represents a one-period change
These smaller incremental changes often yield lower t-statistics
Sun-Abraham:
Uses a fixed reference period (e=-1 when anticipation=0, or e=-1-anticipation otherwise)
All coefficients are deviations from this single reference
Pre-period coefficients show cumulative difference from the reference
To make CS pre-periods more comparable to SA, use base_period="universal":
cs_universal = CallawaySantAnna(base_period="universal")
This makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates.
[ ]:
# Compare overall ATT from both estimators
cs_label = "Callaway-Sant'Anna (varying)"
print("Robustness Check: CS vs SA")
print("=" * 60)
print(f"{'Estimator':<30} {'Overall ATT':>12} {'SE':>10}")
print("-" * 60)
print(f"{cs_label:<30} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}")
print(f"{'Sun-Abraham':<30} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}")
# Also fit CS with universal base period for comparison
cs_universal = CallawaySantAnna(control_group="never_treated", base_period="universal")
results_cs_univ = cs_universal.fit(
df, outcome="outcome", unit="unit",
time="period", first_treat="first_treat",
aggregate="event_study"
)
# Compare event study effects
print("\n\nEvent Study Comparison:")
print("Note: Pre-periods differ due to base period methodology (see explanation above)")
print(f"{'Rel. Time':>10} {'CS (vary)':>12} {'CS (univ)':>12} {'SA':>10} {'Note':>20}")
print("-" * 70)
for rel_time in sorted(results_sa.event_study_effects.keys()):
    sa_eff = results_sa.event_study_effects[rel_time]['effect']
    cs_vary = results_cs.event_study_effects.get(rel_time, {}).get('effect', np.nan)
    cs_univ = results_cs_univ.event_study_effects.get(rel_time, {}).get('effect', np.nan)
    note = "pre (differs)" if rel_time < 0 else "post (matches)"
    print(f"{rel_time:>10} {cs_vary:>12.4f} {cs_univ:>12.4f} {sa_eff:>10.4f} {note:>20}")
print("\nPost-treatment effects should be similar across all methods")
print("Pre-treatment differences are expected due to base period methodology")
Summary
Key takeaways:
TWFE can be biased with staggered adoption and heterogeneous effects
Goodman-Bacon decomposition reveals why TWFE fails by showing:
The implicit 2x2 comparisons and their weights
How much weight falls on “forbidden comparisons” (already-treated as controls)
Callaway-Sant’Anna properly handles staggered adoption by:
Computing group-time specific effects ATT(g,t)
Only using valid comparison groups
Properly aggregating effects
Sun-Abraham provides an alternative approach using:
Interaction-weighted regression with cohort x relative-time indicators
Different weighting scheme than CS
More efficient under homogeneous effects
Run both CS and SA as a robustness check—when they agree, results are more credible
Aggregation options:
"simple": Overall ATT
"group": ATT by cohort
"event_study": ATT by event time (for event-study plots)
Bootstrap inference provides valid standard errors and confidence intervals:
Use n_bootstrap parameter to enable multiplier bootstrap
Choose weight type: 'rademacher', 'mammen', or 'webb'
Bootstrap results include SEs, CIs, and p-values for all aggregations
Pre-treatment effects provide parallel trends diagnostics:
Use base_period="varying" for consecutive period comparisons
Pre-treatment ATT(g,t) should be near zero
95% CIs that include zero are consistent with parallel trends
See Tutorial 07 for pre-trends power analysis (Roth 2022)
Control group choices affect efficiency and assumptions:
"never_treated": Stronger parallel trends assumption
"not_yet_treated": Weaker assumption, uses more data
CS vs SA pre-period differences are expected:
Post-treatment effects should be similar (robustness check)
Pre-treatment effects differ due to base period methodology
CS (varying): consecutive comparisons → one-period changes
SA: fixed reference (e=-1-anticipation) → cumulative deviations
Use base_period="universal" in CS for comparable pre-periods
For more details, see:
Callaway, B., & Sant’Anna, P. H. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics.
Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics.
Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics.