diff_diff.TwoStageDiD
- class diff_diff.TwoStageDiD
Bases: TwoStageDiDBootstrapMixin
Gardner (2022) two-stage Difference-in-Differences estimator.
This estimator addresses TWFE bias under heterogeneous treatment effects by:
1. Estimating unit + time FEs on untreated observations only.
2. Residualizing ALL outcomes using the estimated FEs.
3. Regressing the residualized outcomes on treatment indicators.
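The three steps can be sketched in plain numpy for the overall ATT. This is a minimal illustration under stated assumptions, not the diff_diff source. The helper name `two_stage_att` is hypothetical. Unit and time must be integer codes starting at 0. Never-treated units get `first_treat = np.inf` in the toy data, which is an assumed encoding. The second stage regresses on a single 0/1 treatment column, so its coefficient is simply the mean residual over treated observations.

```python
import numpy as np


def two_stage_att(y, unit, time, treated):
    """Hypothetical sketch of the two-stage point estimate for a single ATT."""
    y = np.asarray(y, dtype=float)
    unit = np.asarray(unit)
    time = np.asarray(time)
    d = np.asarray(treated, dtype=float)

    # Fixed-effects design: one dummy per unit plus time dummies (first period dropped).
    U = np.eye(unit.max() + 1)[unit]
    T = np.eye(time.max() + 1)[time][:, 1:]
    X = np.hstack([U, T])

    # Step 1: estimate unit + time FEs on untreated observations only.
    untreated = d == 0
    beta, *_ = np.linalg.lstsq(X[untreated], y[untreated], rcond=None)

    # Step 2: residualize ALL outcomes with the estimated FEs.
    resid = y - X @ beta

    # Step 3: regress residuals on the treatment indicator (no intercept).
    att, *_ = np.linalg.lstsq(d[:, None], resid, rcond=None)
    return float(att[0])


# Noiseless toy panel: 3 units x 5 periods; unit 0 never treated (np.inf),
# unit 1 treated from period 2, unit 2 treated from period 3; true effect = 2.
unit = np.repeat(np.arange(3), 5)
time = np.tile(np.arange(5), 3)
first_treat = np.array([np.inf, 2.0, 3.0])
treated = (time >= first_treat[unit]).astype(float)
y = 1.0 * unit + 0.5 * time + 2.0 * treated

att = two_stage_att(y, unit, time, treated)
print(round(att, 6))  # prints 2.0
```

Because the toy outcome has no noise and every unit and period has untreated observations, the first stage recovers the fixed effects exactly, and the treated residuals equal the true effect.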
Point estimates are identical to ImputationDiD (Borusyak et al. 2024). The key difference is the variance estimator: TwoStageDiD uses a GMM sandwich variance that accounts for first-stage estimation uncertainty, while ImputationDiD uses the conservative variance from Theorem 3.
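The first-stage correction in the GMM sandwich can be illustrated with the cluster formula published for the did2s package (Butts & Gardner, 2022). The per-cluster second-stage score is adjusted by the first-stage score mapped through X2'X1 (X10'X10)^-1, where X10 is the fixed-effects design zeroed on treated rows. This sketch assumes a single ATT column and applies no finite-sample (CR1) scaling. The function name `two_stage_gmm_se` is hypothetical, and whether diff_diff computes its variance exactly this way is an assumption.

```python
import numpy as np


def two_stage_gmm_se(y, unit, time, treated, cluster):
    """Hypothetical sketch: two-stage ATT with a did2s-style cluster GMM sandwich SE."""
    y = np.asarray(y, dtype=float)
    unit = np.asarray(unit)
    time = np.asarray(time)
    cluster = np.asarray(cluster)
    d = np.asarray(treated, dtype=float)

    # First-stage design (unit dummies + time dummies, first period dropped).
    X1 = np.hstack([np.eye(unit.max() + 1)[unit], np.eye(time.max() + 1)[time][:, 1:]])
    X10 = X1 * (1.0 - d)[:, None]  # zeroed on treated rows -> untreated-only first stage
    X2 = d[:, None]                # second-stage design: one ATT column

    # Stage 1: OLS on untreated rows (pinv guards against rank deficiency).
    X10tX10_inv = np.linalg.pinv(X10.T @ X10)
    gamma = X10tX10_inv @ (X10.T @ y)
    eps1 = y - X1 @ gamma          # residualized outcome for ALL rows

    # Stage 2: regress residualized outcome on treatment.
    X2tX2_inv = np.linalg.inv(X2.T @ X2)
    beta = X2tX2_inv @ (X2.T @ eps1)
    eps2 = eps1 - X2 @ beta

    # Meat: second-stage score minus first-stage score mapped into stage 2.
    A = X2.T @ X1 @ X10tX10_inv
    meat = np.zeros((X2.shape[1], X2.shape[1]))
    for g in np.unique(cluster):
        m = cluster == g
        w = X2[m].T @ eps2[m] - A @ (X10[m].T @ eps1[m])
        meat += np.outer(w, w)

    vcov = X2tX2_inv @ meat @ X2tX2_inv
    return float(beta[0]), float(np.sqrt(vcov[0, 0]))


# Simulated staggered panel (np.inf = never treated, an assumed encoding).
rng = np.random.default_rng(0)
n_units, n_periods = 40, 6
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
first_treat = rng.choice([2.0, 3.0, 4.0, np.inf], size=n_units)
first_treat[:10] = np.inf  # guarantee untreated observations in every period
treated = (time >= first_treat[unit]).astype(float)
y = rng.normal(size=n_units)[unit] + 0.3 * time + 1.5 * treated + rng.normal(size=unit.size)

att, se = two_stage_gmm_se(y, unit, time, treated, cluster=unit)
print(f"ATT = {att:.3f}, SE = {se:.3f}")
```

Dropping the `A @ (...)` term gives a naive cluster SE that treats the residualized outcome as data. The correction term is what carries the first-stage estimation uncertainty into the second-stage variance.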
- Parameters:
anticipation (int, default=0) – Number of periods before treatment where effects may occur.
alpha (float, default=0.05) – Significance level for confidence intervals.
cluster (str, optional) – Column name for cluster-robust standard errors. If None, clusters at the unit level by default.
n_bootstrap (int, default=0) – Number of bootstrap iterations. If 0, uses analytical GMM sandwich inference.
bootstrap_weights (str, default="rademacher") – Type of bootstrap weights: “rademacher”, “mammen”, or “webb”.
seed (int, optional) – Random seed for reproducibility.
rank_deficient_action (str, default="warn") – Action when the design matrix is rank-deficient:
  - “warn”: issue a warning and drop linearly dependent columns
  - “error”: raise ValueError
  - “silent”: drop the columns silently
horizon_max (int, optional) – Maximum event-study horizon. If set, event study effects are only computed for abs(h) <= horizon_max.
pretrends (bool, default=False) – If True, the event study includes pre-treatment horizons for visual assessment of pre-trends. Under parallel trends, pre-period effects should be approximately zero. This only affects the event_study aggregation; the overall ATT and the group aggregation are unchanged.
vcov_type (str, default="hc1") – Variance estimator family. Permanently restricted to {"hc1"}, the Gardner (2022) two-stage GMM cluster sandwich. The analytical-sandwich families {"classical", "hc2", "hc2_bm"} and "conley" are rejected at __init__/fit(). The GMM-corrected meat folds first-stage estimation uncertainty into the score, so there is no single hat matrix on which hat-matrix leverage or Bell-McCaffrey Satterthwaite degrees of freedom could be defined. Use cluster=<col> to select the cluster level. With cluster=None (the default), clustering is at the unit level, and the summary shows the unit-cluster CR1 label.
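The three rank_deficient_action modes can be illustrated with a greedy column check. A column is kept only if it increases the numerical rank of the columns already kept. This is a swapped-in technique for illustration: how diff_diff actually detects dependent columns is not stated here. The helper `drop_dependent_columns` is hypothetical.

```python
import warnings

import numpy as np


def drop_dependent_columns(X, action="warn"):
    """Hypothetical sketch: drop linearly dependent columns per a warn/error/silent policy."""
    if action not in {"warn", "error", "silent"}:
        raise ValueError(f"unknown action: {action!r}")
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        candidate = keep + [j]
        # Keep column j only if it adds a new direction to the kept columns.
        if np.linalg.matrix_rank(X[:, candidate]) == len(candidate):
            keep.append(j)
    dropped = [j for j in range(X.shape[1]) if j not in keep]
    if dropped:
        msg = f"Design matrix is rank-deficient; dropping columns {dropped}"
        if action == "error":
            raise ValueError(msg)
        if action == "warn":
            warnings.warn(msg)
        # action == "silent": drop without notice
    return X[:, keep], dropped


# Third column = first + second, so it is linearly dependent.
X_demo = np.column_stack([np.ones(5), np.arange(5.0), 1.0 + np.arange(5.0)])
kept, dropped = drop_dependent_columns(X_demo, action="silent")
print(dropped)  # prints [2]
```

The greedy scan keeps earlier columns in preference to later ones, so the order of columns in the design determines which member of a dependent set is dropped.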
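The three bootstrap_weights options name standard mean-zero, unit-variance multiplier distributions. In a cluster multiplier bootstrap, one weight is typically drawn per cluster and multiplies that cluster's score contribution. The sketch below shows only the weight draws. The function name `draw_bootstrap_weights` is hypothetical, and the assumption is that diff_diff uses these textbook definitions.

```python
import numpy as np


def draw_bootstrap_weights(kind, n_clusters, rng):
    """Hypothetical sketch: one multiplier weight per cluster (mean 0, variance 1)."""
    if kind == "rademacher":
        # +1 or -1, each with probability 1/2.
        return rng.choice([-1.0, 1.0], size=n_clusters)
    if kind == "mammen":
        # Two-point distribution with mean 0, variance 1 (and third moment 1).
        s5 = np.sqrt(5.0)
        values = np.array([-(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0])
        p_low = (s5 + 1.0) / (2.0 * s5)
        return rng.choice(values, size=n_clusters, p=[p_low, 1.0 - p_low])
    if kind == "webb":
        # Six-point distribution, each value with probability 1/6.
        values = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                           np.sqrt(0.5), 1.0, np.sqrt(1.5)])
        return rng.choice(values, size=n_clusters)
    raise ValueError(f"unknown weight type: {kind!r}")


rng = np.random.default_rng(0)
w = draw_bootstrap_weights("webb", 5, rng)
print(w.shape)  # prints (5,)
```

Webb weights are often recommended when there are few clusters, because two-point distributions can only generate a small number of distinct bootstrap samples.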
- results_
Estimation results after calling fit().
- Type:
Examples
Basic usage:
>>> from diff_diff import TwoStageDiD, generate_staggered_data
>>> data = generate_staggered_data(n_units=200, seed=42)
>>> est = TwoStageDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='period', first_treat='first_treat')
>>> results.print_summary()
With event study:
>>> est = TwoStageDiD()
>>> results = est.fit(data, outcome='outcome', unit='unit',
...                   time='period', first_treat='first_treat',
...                   aggregate='event_study')
>>> from diff_diff import plot_event_study
>>> plot_event_study(results)
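An event-study second stage regresses residualized outcomes on mutually exclusive relative-period dummies h = t - first_treat. Without an intercept, each coefficient is just the mean residual in its horizon bin. The sketch below shows how horizon_max and pretrends could select those bins. The helper `event_study_means` and the np.inf encoding for never-treated units are hypothetical. It illustrates horizon selection only and is not diff_diff's aggregation code.

```python
import numpy as np


def event_study_means(resid, time, first_treat, horizon_max=None, pretrends=False):
    """Hypothetical sketch: mean residual per relative period h = t - first_treat."""
    resid = np.asarray(resid, dtype=float)
    h = np.asarray(time, dtype=float) - np.asarray(first_treat, dtype=float)
    keep = np.isfinite(h)          # never-treated (first_treat = inf) give h = -inf
    if not pretrends:
        keep &= h >= 0             # post-treatment horizons only
    if horizon_max is not None:
        keep &= np.abs(h) <= horizon_max
    # Mutually exclusive dummies without intercept -> coefficient = bin mean.
    return {int(k): float(resid[keep & (h == k)].mean()) for k in np.unique(h[keep])}


# Unit A treated at t = 3 (periods 1..4); unit B never treated (periods 1..2).
resid = np.array([0.1, -0.1, 2.0, 2.2, 0.0, 0.5])
time = np.array([1, 2, 3, 4, 1, 2])
first_treat = np.array([3, 3, 3, 3, np.inf, np.inf])

print(event_study_means(resid, time, first_treat))  # prints {0: 2.0, 1: 2.2}
```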
Notes
The two-stage estimator uses ALL untreated observations (never-treated + not-yet-treated periods of eventually-treated units) to estimate the counterfactual model.
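The first-stage sample described above can be expressed as a boolean mask. It also shows how the anticipation parameter plausibly shifts the cutoff, so that periods within `anticipation` periods of treatment are excluded. The helper `untreated_mask` is hypothetical. Coding never-treated units as `first_treat = np.inf` is an assumption, and diff_diff may use a different code such as 0.

```python
import numpy as np


def untreated_mask(time, first_treat, anticipation=0):
    """Hypothetical sketch: True for observations usable in the first stage.

    Assumes never-treated units carry first_treat = np.inf, so they are always kept.
    Eventually-treated units contribute only periods t < first_treat - anticipation.
    """
    time = np.asarray(time, dtype=float)
    first_treat = np.asarray(first_treat, dtype=float)
    return time < first_treat - anticipation


time = np.array([1, 2, 3, 4])
print(untreated_mask(time, np.full(4, 3.0)).tolist())  # prints [True, True, False, False]
```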
References
- Gardner, J. (2022). Two-stage differences in differences. arXiv:2207.05943.
- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. R Journal, 14(1), 162-173.
Methods
__init__([anticipation, alpha, cluster, ...])
fit(data, outcome, unit, time, first_treat) – Fit the two-stage DiD estimator.
get_params() – Get estimator parameters (sklearn-compatible).
print_summary() – Print summary to stdout.
set_params(**params) – Set estimator parameters (sklearn-compatible).
summary() – Get summary of estimation results.
Attributes
n_bootstrap
bootstrap_weights
alpha
seed
horizon_max
- __init__(anticipation=0, alpha=0.05, cluster=None, n_bootstrap=0, bootstrap_weights='rademacher', seed=None, rank_deficient_action='warn', horizon_max=None, pretrends=False, vcov_type='hc1')
- classmethod __new__(*args, **kwargs)