References#
This library implements methods from the following scholarly works.
Difference-in-Differences#
Ashenfelter, O., & Card, D. (1985). “Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs.” The Review of Economics and Statistics, 67(4), 648-660. https://doi.org/10.2307/1924810
Card, D., & Krueger, A. B. (1994). “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” The American Economic Review, 84(4), 772-793. https://www.jstor.org/stable/2118030
Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press. Chapter 5: Differences-in-Differences.
Two-Way Fixed Effects#
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
Imai, K., & Kim, I. S. (2021). “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis, 29(3), 405-415. https://doi.org/10.1017/pan.2020.33
Wooldridge ETWFE#
Wooldridge, J. M. (2025). “Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators.” Empirical Economics, 69(5), 2545-2587. (Published version of NBER Working Paper 29154.)
Primary source for the saturated OLS ETWFE design implemented in our
WooldridgeDiDclass.Wooldridge, J. M. (2023). “Simple Approaches to Nonlinear Difference-in-Differences with Panel Data.” The Econometrics Journal, 26(3), C31-C66. https://doi.org/10.1093/ectj/utad016
Secondary source for the logit/Poisson QMLE (ASF-based ATT) extensions in
WooldridgeDiD.
Robust Standard Errors#
White, H. (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica, 48(4), 817-838. https://doi.org/10.2307/1912934
MacKinnon, J. G., & White, H. (1985). “Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties.” Journal of Econometrics, 29(3), 305-325. https://doi.org/10.1016/0304-4076(85)90158-7
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). “Robust Inference With Multiway Clustering.” Journal of Business & Economic Statistics, 29(2), 238-249. https://doi.org/10.1198/jbes.2010.07136
Wild Cluster Bootstrap#
Wu, C. F. J. (1986). “Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis.” The Annals of Statistics, 14(4), 1261-1295. https://doi.org/10.1214/aos/1176350142
Source of the
sqrt(n_h/(n_h-1))Bessel small-sample correction applied to within-stratum bootstrap multipliers indiff_diff.bootstrap_utils.apply_stratum_centering().Liu, R. Y. (1988). “Bootstrap Procedures Under Some Non-I.I.D. Models.” The Annals of Statistics, 16(4), 1696-1708. https://doi.org/10.1214/aos/1176351062
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). “Bootstrap-Based Improvements for Inference with Clustered Errors.” The Review of Economics and Statistics, 90(3), 414-427. https://doi.org/10.1162/rest.90.3.414
Davidson, R., & Flachaire, E. (2008). “The Wild Bootstrap, Tamed at Last.” Journal of Econometrics, 146(1), 162-169. https://doi.org/10.1016/j.jeconom.2008.08.003
Wild bootstrap for heteroskedastic regression. Cited as an ingredient — the within-cluster mean-zero requirement on the multipliers (the canonical
η_gcentering that makes the wild bootstrap recover heteroskedasticity-robust inference) is the analog applied within each stratum indiff_diff.bootstrap_utils.apply_stratum_centering(). The paper does NOT cover stratified-survey designs directly; the stratification extension is the library synthesis (see REGISTRY § “Note (Stute stratified survey-bootstrap calibration)”).Kreiss, J.-P., & Lahiri, S. N. (2012). “Bootstrap Methods for Time Series.” In Time Series Analysis: Methods and Applications (Vol. 30, pp. 3-26). Elsevier. https://doi.org/10.1016/B978-0-444-53858-1.00001-6
Bootstrap methods for time series. Cited as a methodological touchstone for the family of block-bootstrap / within-block centering arguments that underpins the wild-cluster-bootstrap extension to stratified PSU sampling. The exact composition (within-stratum demean +
sqrt(n_h/(n_h-1))Bessel rescale on PSU multipliers applied before the per-obs broadcast in a wild-residual refit-in-loop bootstrap for a nonlinear empirical-process functional like the Stute CvM) is NOT in any single paper — it is a library synthesis of these ingredients.Webb, M. D. (2014). “Reworking Wild Bootstrap Based Inference for Clustered Errors.” Queen’s Economics Department Working Paper No. 1315. https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf
MacKinnon, J. G., & Webb, M. D. (2018). “The Wild Bootstrap for Few (Treated) Clusters.” The Econometrics Journal, 21(2), 114-135. https://doi.org/10.1111/ectj.12107
Djogbenou, A. A., MacKinnon, J. G., & Nielsen, M. Ø. (2019). “Asymptotic Theory and Wild Bootstrap Inference with Clustered Errors.” Journal of Econometrics, 212(2), 393-412. https://doi.org/10.1016/j.jeconom.2019.04.035
Theorem 2 establishes empirical-process consistency of the cluster wild bootstrap for nonlinear smooth-functional consumers of per-obs residuals — the theoretical anchor for the stratified Stute CvM survey-bootstrap in
diff_diff.had_pretests.stute_test()/stute_joint_pretest()(Phase 4.5 C strata extension).Hlávka, Z., & Hušková, M. (2020). “Multivariate Tests of Independence and the Wild Bootstrap.” Computational Statistics & Data Analysis, 152, 107048. https://doi.org/10.1016/j.csda.2020.107048
Vector-valued wild bootstrap consistency for empirical-process functionals (§3 condition on shared per-replicate multipliers preserving cross-component dependence). The theoretical anchor for the multi-horizon joint Stute survey-bootstrap in
diff_diff.had_pretests.stute_joint_pretest()(the samepsu_mults[b, :]row is shared across horizons within each replicate, preserving cross-horizon empirical-process dependence).
Nonparametric Bias-Corrected Inference#
Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica, 82(6), 2295-2326. https://doi.org/10.3982/ECTA11757
Source of the bias-combined design matrix used by the in-house
lprobustport that backsHeterogeneousAdoptionDiDPhase 1c (continuous-dose paths) for the bias-corrected weighted-robust SE.Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2018). “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference.” Journal of the American Statistical Association, 113(522), 767-779. https://doi.org/10.1080/01621459.2017.1285776
Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2019). “nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference.” Journal of Statistical Software, 91(8), 1-33. https://doi.org/10.18637/jss.v091.i08
CCF (2018, 2019) is the underlying
nprobustmachinery (MSE-optimal bandwidth selection and robust bias-corrected CIs) thatHeterogeneousAdoptionDiDports in-house for the continuous-dose paths.
Survey-Design Inference (Taylor-Series Linearization)#
Binder, D. A. (1983). “On the Variances of Asymptotically Normal Estimators from Complex Surveys.” International Statistical Review, 51(3), 279-292. https://doi.org/10.2307/1402588
Foundational TSL (Taylor-Series Linearization) variance derivation used across diff-diff’s survey-aware estimators (
compute_survey_if_varianceand the per-estimator influence-function compositions, including the dCDH and HeterogeneousAdoptionDiDsurvey_design=paths).Gerber, I. (2026). “Design-Based Variance Estimation for Modern Heterogeneity-Robust Difference-in-Differences Estimators.” arXiv:2605.04124 (stat.ME). https://arxiv.org/abs/2605.04124
Proposition 1 shows that the influence-function representations of 15 modern DiD estimators (including TwoStageDiD, explicitly derived in the Appendix) satisfy Binder’s (1983) smoothness conditions, so standard stratified-cluster Taylor linearization produces design-consistent SEs. SpilloverDiD’s Wave E.1 survey-design integration composes this result with the Wave D Gardner GMM first-stage uncertainty correction; see
docs/methodology/REGISTRY.mdSpilloverDiD section “Variance (Wave E.1)”.
Placebo Tests and DiD Diagnostics#
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). “How Much Should We Trust Differences-in-Differences Estimates?” The Quarterly Journal of Economics, 119(1), 249-275. https://doi.org/10.1162/003355304772839588
Synthetic Control Method#
Abadie, A., & Gardeazabal, J. (2003). “The Economic Costs of Conflict: A Case Study of the Basque Country.” The American Economic Review, 93(1), 113-132. https://doi.org/10.1257/000282803321455188
Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association, 105(490), 493-505. https://doi.org/10.1198/jasa.2009.ap08746
Abadie, A., Diamond, A., & Hainmueller, J. (2015). “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science, 59(2), 495-510. https://doi.org/10.1111/ajps.12116
Synthetic Difference-in-Differences#
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review, 111(12), 4088-4118. https://doi.org/10.1257/aer.20190159
Triply Robust Panel (TROP)#
Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025). “Triply Robust Panel Estimators.” Working Paper. https://arxiv.org/abs/2508.21536
This paper introduces the TROP estimator which combines three robustness components:
Factor model adjustment: Low-rank factor structure via SVD removes unobserved confounders
Unit weights: Synthetic control style weighting for optimal comparison
Time weights: SDID style time weighting for informative pre-periods
TROP is particularly useful when there are unobserved time-varying confounders with a factor structure that affect different units differently over time.
Triple Difference (DDD)#
Ortiz-Villavicencio, M., & Sant’Anna, P. H. C. (2025). “Better Understanding Triple Differences Estimators.” Working Paper. https://arxiv.org/abs/2505.09942
This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The
TripleDifferenceclass implements their regression adjustment, inverse probability weighting, and doubly robust estimators.Gruber, J. (1994). “The Incidence of Mandated Maternity Benefits.” American Economic Review, 84(3), 622-641. https://www.jstor.org/stable/2118071
Classic paper introducing the Triple Difference design for policy evaluation.
Olden, A., & Møen, J. (2022). “The Triple Difference Estimator.” The Econometrics Journal, 25(3), 531-553. https://doi.org/10.1093/ectj/utac010
Parallel Trends and Pre-Trend Testing#
Roth, J. (2022). “Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.” American Economic Review: Insights, 4(3), 305-322. https://doi.org/10.1257/aeri.20210236
Lakens, D. (2017). “Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses.” Social Psychological and Personality Science, 8(4), 355-362. https://doi.org/10.1177/1948550617697177
Honest DiD / Sensitivity Analysis#
The HonestDiD module implements sensitivity analysis methods for relaxing the parallel trends assumption.
Rambachan, A., & Roth, J. (2023). “A More Credible Approach to Parallel Trends.” The Review of Economic Studies, 90(5), 2555-2591. https://doi.org/10.1093/restud/rdad018
This paper introduces the “Honest DiD” framework implemented in our
HonestDiDclass:Relative Magnitudes (ΔRM): Bounds post-treatment violations by a multiple of observed pre-treatment violations
Smoothness (ΔSD): Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends
Breakdown Analysis: Finding the smallest violation magnitude that would overturn conclusions
Robust Confidence Intervals: Valid inference under partial identification
Roth, J., & Sant’Anna, P. H. C. (2023). “When Is Parallel Trends Sensitive to Functional Form?” Econometrica, 91(2), 737-747. https://doi.org/10.3982/ECTA19402
Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
Multi-Period and Staggered Adoption#
Borusyak, K., Jaravel, X., & Spiess, J. (2024). “Revisiting Event-Study Designs: Robust and Efficient Estimation.” Review of Economic Studies, 91(6), 3253-3285. https://doi.org/10.1093/restud/rdae007
This paper introduces the imputation estimator implemented in our
ImputationDiDclass:Efficient imputation: OLS on untreated observations, impute counterfactuals, aggregate
Conservative variance: Theorem 3 clustered variance estimator with auxiliary model
Pre-trend test: Independent of treatment effect estimation (Proposition 9)
Efficiency gains: ~50% shorter CIs than Callaway-Sant’Anna under homogeneous effects
Callaway, B., & Sant’Anna, P. H. C. (2021). “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, 225(2), 200-230. https://doi.org/10.1016/j.jeconom.2020.12.001
Sant’Anna, P. H. C., & Zhao, J. (2020). “Doubly Robust Difference-in-Differences Estimators.” Journal of Econometrics, 219(1), 101-122. https://doi.org/10.1016/j.jeconom.2020.06.003
Sun, L., & Abraham, S. (2021). “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics, 225(2), 175-199. https://doi.org/10.1016/j.jeconom.2020.09.006
Gardner, J. (2022). “Two-stage differences in differences.” arXiv preprint arXiv:2207.05943. https://arxiv.org/abs/2207.05943
Butts, K., & Gardner, J. (2022). “did2s: Two-Stage Difference-in-Differences.” The R Journal, 14(1), 162-173. https://doi.org/10.32614/RJ-2022-048
Butts, K. (2023). “Difference-in-Differences with Spatial Spillovers.” arXiv:2105.03737v3 (originally posted 2021). https://arxiv.org/abs/2105.03737
Identifies the ring-indicator estimator implemented in our
SpilloverDiDclass. Section 2-3 covers non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff =d_bar.Conley, T. G. (1999). “GMM Estimation with Cross Sectional Dependence.” Journal of Econometrics, 92(1), 1-45. https://doi.org/10.1016/S0304-4076(98)00084-0
Primary source for the Conley spatial-HAC variance estimator. Equations 5-9 derive the spatial-kernel cross-product meat. Our
diff_diff/conley.pyimplements the practitioner specializations (Bartlett / uniform kernels with haversine / euclidean metrics) cited in ourSpilloverDiDWave A/D Conley path and composed with Binder TSL + Gardner GMM in Wave E.2 (_compute_stratified_conley_meatatdiff_diff/two_stage.py).de Chaisemartin, C., & D’Haultfœuille, X. (2020). “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169
de Chaisemartin, C., & D’Haultfœuille, X. (2022, revised 2024). “Difference-in-Differences Estimators of Intertemporal Treatment Effects.” NBER Working Paper 29873. https://www.nber.org/papers/w29873
Dynamic companion to the 2020 paper. Web Appendix Section 3.7.3 contains the cohort-recentered plug-in variance formula implemented in our
ChaisemartinDHaultfoeuilleclass.Goodman-Bacon, A. (2021). “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics, 225(2), 254-277. https://doi.org/10.1016/j.jeconom.2021.03.014
Newey, W. K., & West, K. D. (1987). “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica, 55(3), 703-708. https://doi.org/10.2307/1913610
Primary source for the Bartlett serial-HAC kernel weights 1 - |t-s|/(L+1) used in time-series and panel HAC variance estimators. Our
diff_diff/conley.pypanel-block path at L949-965 hardcodes this kernel for the within-unit serial component (mirrors Rconleyreg::time_dist); SpilloverDiD’s Wave E.2 follow-up composes the same kernel weights with Binder/Gerber TSL + Wave D Gardner GMM correction for the panel-block survey case (_compute_stratified_serial_bartlett_meatatdiff_diff/two_stage.py; see REGISTRY section “Variance (Wave E.2 follow-up)”).Wing, C., Freedman, S. M., & Hollingsworth, A. (2024). “Stacked Difference-in-Differences.” NBER Working Paper 32054. https://www.nber.org/papers/w32054
Chen, X., Sant’Anna, P. H. C., & Xie, H. (2025). “Efficient Difference-in-Differences and Event Study Estimators.” Working Paper.
Primary source for the optimal-weighting / PT-All / PT-Post efficient DiD implemented in our
EfficientDiDclass.Baker, A., Callaway, B., Cunningham, S., Goodman-Bacon, A., & Sant’Anna, P. H. C. (2025). “Difference-in-Differences Designs: A Practitioner’s Guide.” arXiv preprint arXiv:2503.13323. https://arxiv.org/abs/2503.13323
Source for the 8-step practitioner workflow surfaced via
diff_diff.get_llm_guide("practitioner")and the README## Practitioner Workflowsection. Seedocs/methodology/REGISTRY.mdfor the diff-diff renumbering and per-step deviations.
Continuous Treatment DiD#
Callaway, B., Goodman-Bacon, A., & Sant’Anna, P. H. C. (2024). “Difference-in-Differences with a Continuous Treatment.” NBER Working Paper 32117. https://www.nber.org/papers/w32117
Primary source for ATT(d), ACRT, dose-response curves, and B-spline flexibility implemented in our
ContinuousDiDclass.
Heterogeneous Adoption (No-Untreated Designs)#
de Chaisemartin, C., Ciccia, D., D’Haultfœuille, X., & Knau, F. (2026). “Difference-in-Differences Estimators When No Unit Remains Untreated.” arXiv preprint arXiv:2405.04465v6. https://arxiv.org/abs/2405.04465
Primary source for the Weighted Average Slope (WAS) estimator and its multi-period event-study extension implemented in our
HeterogeneousAdoptionDiDclass. Targets panels where no unit remains untreated at the post period and treatment doseD_{g,2}is nonnegative, using local-linear regression at the dose support boundary - both Design 1’ (the QUG case withd̲ = 0) and Design 1 (no QUG withd̲ > 0) are supported.
Power Analysis#
Bloom, H. S. (1995). “Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs.” Evaluation Review, 19(5), 547-556. https://doi.org/10.1177/0193841X9501900504
Burlig, F., Preonas, L., & Woerman, M. (2020). “Panel Data and Experimental Design.” Journal of Development Economics, 144, 102458. https://doi.org/10.1016/j.jdeveco.2020.102458
Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.
Djimeu, E. W., & Houndolo, D.-G. (2016). “Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination.” Journal of Development Effectiveness, 8(4), 508-527. https://doi.org/10.1080/19439342.2016.1244555
General Causal Inference#
Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/