Stacked Difference-in-Differences
Stacked DiD estimator for staggered adoption designs with corrective Q-weights.
This module implements the methodology of Wing, Freedman & Hollingsworth (2024), which addresses bias in naive stacked DiD regressions in three steps:
1. Constructing sub-experiments: one per adoption cohort, each with clean controls.
2. Applying corrective Q-weights: these ensure that treatment and control group trends are weighted consistently across sub-experiments.
3. Running a weighted event-study regression: WLS with Q-weights identifies the "trimmed aggregate ATT".
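To make the first step concrete, the following is a minimal pure-Python sketch of how a stack of sub-experiments might be assembled. It is an illustration, not the library's implementation: `build_stack` is a hypothetical helper, never-treated units are assumed to be coded as `math.inf`, controls follow the "not_yet_treated" rule documented below, and the two trimming checks (fully observed event window, non-empty treated and control groups) are an assumed reading of the inclusion criteria this page calls IC1/IC2.

```python
import math


def build_stack(rows, kappa_pre, kappa_post):
    """Stack one sub-experiment per adoption cohort (illustrative only).

    rows: list of dicts with keys 'unit', 'period', 'first_treat', 'outcome'.
    Never-treated units are assumed to have first_treat == math.inf.
    """
    periods = sorted({r["period"] for r in rows})
    t_min, t_max = periods[0], periods[-1]
    cohorts = sorted({r["first_treat"] for r in rows
                      if math.isfinite(r["first_treat"])})

    stacked = []
    for a in cohorts:
        lo, hi = a - kappa_pre, a + kappa_post
        # Assumed trimming rule: drop cohorts whose window is not fully observed.
        if lo < t_min or hi > t_max:
            continue
        sub = []
        for r in rows:
            if not lo <= r["period"] <= hi:
                continue
            treated = r["first_treat"] == a
            # "not_yet_treated" rule: first treated after the window closes.
            clean = r["first_treat"] > a + kappa_post
            if treated or clean:
                sub.append({**r,
                            "_sub_exp": a,
                            "_event_time": r["period"] - a,
                            "_D_sa": int(treated)})
        # Assumed trimming rule: keep only sub-experiments with both groups.
        has_treated = any(x["_D_sa"] == 1 for x in sub)
        has_control = any(x["_D_sa"] == 0 for x in sub)
        if has_treated and has_control:
            stacked.extend(sub)
    return stacked
```

Note that a unit can appear in several sub-experiments: as a control for an earlier cohort and as a treated unit in its own.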
When to use Stacked DiD:
- You have a staggered adoption design with multiple treatment cohorts.
- You want an intuitive, sub-experiment-based approach (as opposed to aggregation methods).
- You want compositional balance: the treatment group composition is fixed across event times.
- You need direct access to the stacked dataset for custom analysis.
Reference: Wing, C., Freedman, S. M., & Hollingsworth, A. (2024). Stacked Difference-in-Differences. NBER Working Paper 32054. http://www.nber.org/papers/w32054
StackedDiD
Main estimator class for Stacked Difference-in-Differences.
- class diff_diff.StackedDiD
Bases: object
Stacked Difference-in-Differences estimator.
Implements Wing, Freedman & Hollingsworth (2024). Builds a stacked dataset of sub-experiments (one per adoption cohort), applies corrective Q-weights to address implicit weighting bias in naive stacked regressions, and runs a weighted event-study regression.
- Parameters:
kappa_pre (int, default=1) – Number of pre-treatment event-time periods in the event window. The event window spans [-kappa_pre, …, kappa_post].
kappa_post (int, default=1) – Number of post-treatment event-time periods.
weighting (str, default="aggregate") – Target estimand weighting scheme, per Table 1 of the paper:
  - "aggregate": Equal weight per adoption event (trimmed aggregate ATT)
  - "population": Weight by population size of treated cohort
  - "sample_share": Weight by sample share of each sub-experiment
clean_control (str, default="not_yet_treated") – How to define clean controls, per Appendix A of the paper:
  - "not_yet_treated": Units with A_s > a + kappa_post
  - "strict": Units with A_s > a + kappa_post + kappa_pre
  - "never_treated": Only units with A_s = infinity
cluster (str, default="unit") – Clustering level for standard errors:
  - "unit": Cluster on the original unit identifier
  - "unit_subexp": Cluster on (unit, sub_experiment) pairs
alpha (float, default=0.05) – Significance level for confidence intervals.
anticipation (int, default=0) – Number of anticipation periods. Consistent with ImputationDiD, TwoStageDiD, and SunAbraham. When anticipation > 0:
  - The reference period shifts from e=-1 to e=-1-anticipation
  - Post-treatment includes the anticipation periods (e >= -anticipation)
  - The event window expands by anticipation pre-periods
rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient:
  - "warn": Issue a warning and drop linearly dependent columns
  - "error": Raise ValueError
  - "silent": Drop columns silently
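The clean_control rules and the effect of anticipation on the event window can be written out as two small functions. This is a sketch of the rules as documented above, not code taken from the library: `is_clean_control` and `event_window` are hypothetical names, never-treated units are assumed to be coded as `math.inf`, and the window is assumed to grow on the pre-treatment side only.

```python
import math


def is_clean_control(first_treat, a, kappa_pre, kappa_post,
                     rule="not_yet_treated"):
    """Is a unit first treated at `first_treat` (A_s) a clean control
    for the sub-experiment with adoption time `a`?"""
    if rule == "not_yet_treated":
        return first_treat > a + kappa_post
    if rule == "strict":
        return first_treat > a + kappa_post + kappa_pre
    if rule == "never_treated":
        return math.isinf(first_treat)
    raise ValueError(f"unknown clean_control rule: {rule!r}")


def event_window(kappa_pre, kappa_post, anticipation=0):
    """Return (event_times, reference_period, post_periods)."""
    # Assumption: the window gains `anticipation` extra pre-periods.
    start = -(kappa_pre + anticipation)
    event_times = list(range(start, kappa_post + 1))
    reference = -1 - anticipation
    post = [e for e in event_times if e >= -anticipation]
    return event_times, reference, post
```

For example, with kappa_pre=2, kappa_post=2 and anticipation=1, the reference period moves to e=-2 and e=-1 is counted as post-treatment.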
- results_
Estimation results after calling fit().
- Type:
StackedDiDResults
- is_fitted_
Whether the model has been fitted.
- Type:
bool
Examples
Basic usage:
    >>> from diff_diff import StackedDiD, generate_staggered_data
    >>> data = generate_staggered_data(n_units=200, seed=42)
    >>> est = StackedDiD(kappa_pre=2, kappa_post=2)
    >>> results = est.fit(data, outcome='outcome', unit='unit',
    ...                   time='period', first_treat='first_treat')
    >>> results.print_summary()
With event study:
    >>> results = est.fit(data, outcome='outcome', unit='unit',
    ...                   time='period', first_treat='first_treat',
    ...                   aggregate='event_study')
    >>> from diff_diff import plot_event_study
    >>> plot_event_study(results)
Notes
The stacked estimator addresses TWFE bias by:
1. Creating one sub-experiment per adoption cohort with clean controls.
2. Applying Q-weights to reweight the stacked regression.
3. Running a single event-study WLS regression on the weighted stack.
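The reweighting in step 2 can be illustrated for the "aggregate" scheme. The sketch below assumes the form of the corrective weight given in Wing, Freedman & Hollingsworth (2024) as I read it: treated observations keep a weight of 1, and controls in sub-experiment a receive the treated share of a divided by the control share of a. The helper name `aggregate_control_q_weights` is hypothetical, and the library's internal computation may differ.

```python
def aggregate_control_q_weights(counts):
    """Control-group Q-weights per sub-experiment (illustrative only).

    counts: {sub_exp: (n_treated, n_control)} for the stacked data.
    Treated observations are assumed to keep a weight of 1.
    """
    total_treated = sum(n_t for n_t, _ in counts.values())
    total_control = sum(n_c for _, n_c in counts.values())
    q = {}
    for a, (n_t, n_c) in counts.items():
        treated_share = n_t / total_treated
        control_share = n_c / total_control
        # Upweight controls where they are under-represented relative to
        # the treated group, and downweight them where over-represented.
        q[a] = treated_share / control_share
    return q
```

After reweighting, each sub-experiment's share of the (weighted) control group matches its share of the treated group, which is the imbalance a naive stacked regression leaves uncorrected.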
References
- Wing, C., Freedman, S. M., & Hollingsworth, A. (2024). Stacked Difference-in-Differences. NBER Working Paper 32054.
Methods
fit(data, outcome, unit, time, first_treat) – Fit the stacked DiD estimator.
get_params() – Get estimator parameters (sklearn-compatible).
set_params(**params) – Set estimator parameters (sklearn-compatible).
- __init__(kappa_pre=1, kappa_post=1, weighting='aggregate', clean_control='not_yet_treated', cluster='unit', alpha=0.05, anticipation=0, rank_deficient_action='warn')
- fit(data, outcome, unit, time, first_treat, aggregate=None, population=None, survey_design=None)
Fit the stacked DiD estimator.
- Parameters:
data (pd.DataFrame) – Panel data with unit and time identifiers.
outcome (str) – Name of outcome variable column.
unit (str) – Name of unit identifier column.
time (str) – Name of time period column.
first_treat (str) – Name of column indicating when unit was first treated. Use 0 or np.inf for never-treated units.
aggregate (str, optional) – Aggregation mode: None or "simple" (overall ATT only), or "event_study". Group aggregation is not supported because the pooled stacked regression cannot produce cohort-specific effects. Use CallawaySantAnna or ImputationDiD for cohort-level estimates.
population (str, optional) – Column name for population weights. Required only when weighting="population".
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference. When provided, uses Taylor Series Linearization for variance estimation and applies sampling weights to the regression.
- Returns:
Object containing all estimation results.
- Return type:
StackedDiDResults
- Raises:
ValueError – If required columns are missing or data validation fails.
- set_params(**params)
Set estimator parameters (sklearn-compatible).
- Parameters:
params (Any)
- Return type:
- print_summary()
Print summary to stdout.
- Return type:
None
StackedDiDResults
Results container for Stacked DiD estimation.
- class diff_diff.StackedDiDResults
Bases: object
Results from Stacked DiD estimation (Wing, Freedman & Hollingsworth 2024).
- overall_att
Overall average treatment effect on the treated (average of post-treatment event-study coefficients).
- Type:
float
- overall_se
Standard error of the overall ATT (delta method on the VCV).
- Type:
float
- overall_t_stat
T-statistic for the overall ATT.
- Type:
float
- overall_p_value
P-value for the overall ATT.
- Type:
float
- overall_conf_int
Confidence interval for the overall ATT.
- event_study_effects
Dictionary mapping event time h to an effect dict with keys 'effect', 'se', 't_stat', 'p_value', 'conf_int', 'n_obs'.
- Type:
dict, optional
- group_effects
Dictionary mapping cohort g to an effect dict.
- Type:
dict, optional
- stacked_data
Full stacked dataset with _sub_exp, _event_time, _D_sa, and _Q_weight columns. Accessible for custom analysis.
- Type:
pd.DataFrame
- groups
Adoption events retained in the trimmed set (Omega_kappa).
- trimmed_groups
Adoption events excluded by IC1/IC2.
- time_periods
All time periods in the original data.
- n_obs
Number of observations in the original data.
- Type:
int
- n_stacked_obs
Number of observations in the stacked dataset.
- Type:
int
- n_sub_experiments
Number of sub-experiments in the stack.
- Type:
int
- n_treated_units
Distinct treated units across the trimmed set.
- Type:
int
- n_control_units
Distinct control units across the trimmed set.
- Type:
int
- kappa_pre
Pre-treatment event-time window size.
- Type:
int
- kappa_post
Post-treatment event-time window size.
- Type:
int
- weighting
Weighting scheme used.
- Type:
str
- clean_control
Clean control definition used.
- Type:
str
- alpha
Significance level used.
- Type:
float
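The relationship between overall_att, overall_se, and the event-study coefficients can be sketched as follows. Because the overall ATT is described as an average of post-treatment coefficients, the delta method reduces to the variance of a linear combination, w'Vw, with equal weights on the post-treatment entries. The function name and the argument layout are assumptions for illustration; the library's internal representation of the coefficient vector and VCV may differ.

```python
import math


def average_post_effect(coefs, vcov, post_idx):
    """Average the post-treatment coefficients and compute the SE.

    coefs: list of event-study coefficients.
    vcov: their variance-covariance matrix as a list of lists.
    post_idx: positions of the post-treatment coefficients in `coefs`.
    """
    post = set(post_idx)
    n = len(coefs)
    # Equal weight on each post-treatment coefficient, zero elsewhere.
    w = [1.0 / len(post) if i in post else 0.0 for i in range(n)]
    att = sum(w[i] * coefs[i] for i in range(n))
    # Variance of a linear combination: w' V w.
    var = sum(w[i] * vcov[i][j] * w[j] for i in range(n) for j in range(n))
    return att, math.sqrt(var)
```

The covariance terms matter here: ignoring the off-diagonal entries of the VCV would misstate the standard error whenever the post-treatment coefficients are correlated.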
Methods
summary([alpha]) – Generate formatted summary of estimation results.
print_summary([alpha]) – Print summary to stdout.
to_dataframe([level]) – Convert results to DataFrame.
- overall_att: float
- overall_se: float
- overall_t_stat: float
- overall_p_value: float
- stacked_data: DataFrame
- n_obs: int = 0
- n_stacked_obs: int = 0
- n_sub_experiments: int = 0
- n_treated_units: int = 0
- n_control_units: int = 0
- kappa_pre: int = 1
- kappa_post: int = 1
- weighting: str = 'aggregate'
- clean_control: str = 'not_yet_treated'
- alpha: float = 0.05
- anticipation: int = 0
- property att: float
- property se: float
- property p_value: float
- property t_stat: float
- property coef_var: float
Coefficient of variation: SE / abs(overall ATT). NaN when the ATT is 0 or the SE is non-finite.
- summary(alpha=None)
Generate formatted summary of estimation results.
- print_summary(alpha=None)
Print summary to stdout.
- Parameters:
alpha (float | None)
- Return type:
None
- to_dataframe(level='event_study')
Convert results to DataFrame.
- Parameters:
level (str, default="event_study") – Level of aggregation:
  - "event_study": Event study effects by relative time
  - "group": Group (cohort) effects
- Returns:
Results as DataFrame.
- Return type:
pd.DataFrame
- property is_significant: bool
Check if overall ATT is significant.
- property significance_stars: str
Significance stars for overall ATT.
- __init__(overall_att, overall_se, overall_t_stat, overall_p_value, overall_conf_int, event_study_effects, group_effects, stacked_data, groups=<factory>, trimmed_groups=<factory>, time_periods=<factory>, n_obs=0, n_stacked_obs=0, n_sub_experiments=0, n_treated_units=0, n_control_units=0, kappa_pre=1, kappa_post=1, weighting='aggregate', clean_control='not_yet_treated', alpha=0.05, anticipation=0, survey_metadata=None)
- Parameters:
overall_att (float)
overall_se (float)
overall_t_stat (float)
overall_p_value (float)
stacked_data (DataFrame)
n_obs (int)
n_stacked_obs (int)
n_sub_experiments (int)
n_treated_units (int)
n_control_units (int)
kappa_pre (int)
kappa_post (int)
weighting (str)
clean_control (str)
alpha (float)
anticipation (int)
survey_metadata (Any | None)
- Return type:
None
Convenience Function
- diff_diff.stacked_did(data, outcome, unit, time, first_treat, kappa_pre=1, kappa_post=1, aggregate=None, population=None, survey_design=None, **kwargs)
Convenience function for stacked DiD estimation.
This is a shortcut for creating a StackedDiD estimator and calling fit().
- Parameters:
data (pd.DataFrame) – Panel data.
outcome (str) – Outcome variable column name.
unit (str) – Unit identifier column name.
time (str) – Time period column name.
first_treat (str) – Column indicating first treatment period (0 or inf for never-treated).
kappa_pre (int, default=1) – Pre-treatment event-time periods.
kappa_post (int, default=1) – Post-treatment event-time periods.
aggregate (str, optional) – Aggregation mode: None, “simple”, or “event_study”.
population (str, optional) – Population column for weighting="population".
survey_design (SurveyDesign, optional) – Survey design specification for design-based inference.
**kwargs – Additional keyword arguments passed to StackedDiD constructor.
- Returns:
Estimation results.
- Return type:
StackedDiDResults
Examples
    >>> from diff_diff import stacked_did, generate_staggered_data
    >>> data = generate_staggered_data(seed=42)
    >>> results = stacked_did(data, 'outcome', 'unit', 'period',
    ...                       'first_treat', kappa_pre=2, kappa_post=2,
    ...                       aggregate='event_study')
    >>> results.print_summary()
Example Usage
Basic usage:
    from diff_diff import StackedDiD, generate_staggered_data

    data = generate_staggered_data(n_units=200, n_periods=12,
                                   cohort_periods=[4, 6, 8], seed=42)
    est = StackedDiD(kappa_pre=2, kappa_post=2)
    results = est.fit(data, outcome='outcome', unit='unit',
                      time='period', first_treat='first_treat',
                      aggregate='event_study')
    results.print_summary()
Accessing the stacked dataset:
    # The stacked data is available for custom analysis
    stacked = results.stacked_data
    print(stacked[['unit', 'period', '_sub_exp', '_event_time', '_D_sa', '_Q_weight']].head())
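As one example of such custom analysis, the sketch below computes Q-weighted treated-minus-control outcome gaps by event time, measured relative to a reference period. It is a descriptive check under stated assumptions, not a substitute for the estimator's regression: the function names are hypothetical, _D_sa is assumed to be a 0/1 treatment indicator, and the input is a list of row dicts (with pandas, `results.stacked_data.to_dict("records")` would produce that shape).

```python
from collections import defaultdict


def weighted_means(records, outcome="outcome"):
    """Q-weighted mean outcome for each (event time, treatment status) cell."""
    sums = defaultdict(float)
    weights = defaultdict(float)
    for r in records:
        key = (r["_event_time"], r["_D_sa"])
        sums[key] += r["_Q_weight"] * r[outcome]
        weights[key] += r["_Q_weight"]
    return {key: sums[key] / weights[key] for key in sums}


def event_study_gaps(records, outcome="outcome", reference=-1):
    """Treated-minus-control gap per event time, relative to `reference`."""
    means = weighted_means(records, outcome)
    gaps = {e: means[(e, 1)] - means[(e, 0)]
            for (e, d) in means if d == 1 and (e, 0) in means}
    return {e: gap - gaps[reference] for e, gap in gaps.items()}
```

By construction the gap at the reference period is zero, so the remaining values read like unadjusted event-study points.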
Different weighting schemes:
    # Population-weighted ATT (requires a population column in `data`,
    # here named 'pop_size')
    est = StackedDiD(kappa_pre=2, kappa_post=2, weighting='population')
    results = est.fit(data, outcome='outcome', unit='unit',
                      time='period', first_treat='first_treat',
                      population='pop_size')

    # Sample-share weighted ATT
    est = StackedDiD(kappa_pre=2, kappa_post=2, weighting='sample_share')
    results = est.fit(data, outcome='outcome', unit='unit',
                      time='period', first_treat='first_treat')
Comparison with Other Staggered Estimators
| Feature | Stacked DiD | Callaway-Sant’Anna |
|---|---|---|
| Approach | Pooled WLS on stacked sub-experiments | Separate group-time regressions |
| Compositional balance | Enforced by IC1/IC2 trimming | Via balanced event study aggregation |
| Target parameter | Trimmed aggregate ATT | Weighted average of ATT(g,t) |
| Custom analysis | Full stacked dataset accessible | Group-time effects accessible |
| Covariates | Not yet supported | Supported (OR, IPW, DR) |