References#

This library implements methods from the following scholarly works.

Difference-in-Differences#

  • Ashenfelter, O., & Card, D. (1985). “Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs.” The Review of Economics and Statistics, 67(4), 648-660. https://doi.org/10.2307/1924810

  • Card, D., & Krueger, A. B. (1994). “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” The American Economic Review, 84(4), 772-793. https://www.jstor.org/stable/2118030

  • Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press. Chapter 5: Differences-in-Differences.

Two-Way Fixed Effects#

  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.

  • Imai, K., & Kim, I. S. (2021). “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis, 29(3), 405-415. https://doi.org/10.1017/pan.2020.33

Wooldridge ETWFE#

  • Wooldridge, J. M. (2025). “Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators.” Empirical Economics, 69(5), 2545-2587. (Published version of NBER Working Paper 29154.)

    Primary source for the saturated OLS ETWFE design implemented in our WooldridgeDiD class.

  • Wooldridge, J. M. (2023). “Simple Approaches to Nonlinear Difference-in-Differences with Panel Data.” The Econometrics Journal, 26(3), C31-C66. https://doi.org/10.1093/ectj/utad016

    Secondary source for the logit/Poisson QMLE (ASF-based ATT) extensions in WooldridgeDiD.

Robust Standard Errors#

  • White, H. (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica, 48(4), 817-838. https://doi.org/10.2307/1912934

  • MacKinnon, J. G., & White, H. (1985). “Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties.” Journal of Econometrics, 29(3), 305-325. https://doi.org/10.1016/0304-4076(85)90158-7

  • Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). “Robust Inference With Multiway Clustering.” Journal of Business & Economic Statistics, 29(2), 238-249. https://doi.org/10.1198/jbes.2010.07136

Wild Cluster Bootstrap#

  • Wu, C. F. J. (1986). “Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis.” The Annals of Statistics, 14(4), 1261-1295. https://doi.org/10.1214/aos/1176350142

    Source of the sqrt(n_h/(n_h-1)) Bessel small-sample correction applied to within-stratum bootstrap multipliers in diff_diff.bootstrap_utils.apply_stratum_centering().

  • Liu, R. Y. (1988). “Bootstrap Procedures Under Some Non-I.I.D. Models.” The Annals of Statistics, 16(4), 1696-1708. https://doi.org/10.1214/aos/1176351062

  • Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). “Bootstrap-Based Improvements for Inference with Clustered Errors.” The Review of Economics and Statistics, 90(3), 414-427. https://doi.org/10.1162/rest.90.3.414

  • Davidson, R., & Flachaire, E. (2008). “The Wild Bootstrap, Tamed at Last.” Journal of Econometrics, 146(1), 162-169. https://doi.org/10.1016/j.jeconom.2008.08.003

    Wild bootstrap for heteroskedastic regression. Cited as an ingredient: the within-cluster mean-zero requirement on the multipliers (the canonical η_g centering that lets the wild bootstrap recover heteroskedasticity-robust inference) is the condition that diff_diff.bootstrap_utils.apply_stratum_centering() enforces within each stratum. The paper does NOT cover stratified-survey designs directly; the stratification extension is a library synthesis (see REGISTRY § “Note (Stute stratified survey-bootstrap calibration)”).

  • Kreiss, J.-P., & Lahiri, S. N. (2012). “Bootstrap Methods for Time Series.” In Time Series Analysis: Methods and Applications (Vol. 30, pp. 3-26). Elsevier. https://doi.org/10.1016/B978-0-444-53858-1.00001-6

    Bootstrap methods for time series. Cited as a methodological touchstone for the family of block-bootstrap / within-block centering arguments that underpins the wild-cluster-bootstrap extension to stratified PSU sampling. The exact composition (within-stratum demean + sqrt(n_h/(n_h-1)) Bessel rescale on PSU multipliers applied before the per-obs broadcast in a wild-residual refit-in-loop bootstrap for a nonlinear empirical-process functional like the Stute CvM) is NOT in any single paper — it is a library synthesis of these ingredients.

  • Webb, M. D. (2014). “Reworking Wild Bootstrap Based Inference for Clustered Errors.” Queen’s Economics Department Working Paper No. 1315. https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf

  • MacKinnon, J. G., & Webb, M. D. (2018). “The Wild Bootstrap for Few (Treated) Clusters.” The Econometrics Journal, 21(2), 114-135. https://doi.org/10.1111/ectj.12107

  • Djogbenou, A. A., MacKinnon, J. G., & Nielsen, M. Ø. (2019). “Asymptotic Theory and Wild Bootstrap Inference with Clustered Errors.” Journal of Econometrics, 212(2), 393-412. https://doi.org/10.1016/j.jeconom.2019.04.035

    Theorem 2 establishes empirical-process consistency of the cluster wild bootstrap for nonlinear smooth-functional consumers of per-obs residuals — the theoretical anchor for the stratified Stute CvM survey-bootstrap in diff_diff.had_pretests.stute_test() / stute_joint_pretest() (Phase 4.5 C strata extension).

  • Hlávka, Z., & Hušková, M. (2020). “Multivariate Tests of Independence and the Wild Bootstrap.” Computational Statistics & Data Analysis, 152, 107048. https://doi.org/10.1016/j.csda.2020.107048

    Vector-valued wild bootstrap consistency for empirical-process functionals (§3 condition on shared per-replicate multipliers preserving cross-component dependence). The theoretical anchor for the multi-horizon joint Stute survey-bootstrap in diff_diff.had_pretests.stute_joint_pretest() (the same psu_mults[b, :] row is shared across horizons within each replicate, preserving cross-horizon empirical-process dependence).
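The within-stratum centering and Bessel rescale described in the notes above can be sketched as follows. This is a minimal illustration, not the library's apply_stratum_centering(): the function name, argument layout, Rademacher draws, and the choice to reject singleton strata are all assumptions made for the example.

```python
import numpy as np

def center_and_rescale_by_stratum(psu_mults, strata):
    """Demean PSU multipliers within each stratum, then scale by sqrt(n_h/(n_h-1)).

    psu_mults: (B, G) array, one row of PSU-level multipliers per replicate.
    strata:    (G,) array of stratum labels, one per PSU.
    Hypothetical helper for illustration only.
    """
    psu_mults = np.asarray(psu_mults, dtype=float)
    strata = np.asarray(strata)
    out = np.empty_like(psu_mults)
    for h in np.unique(strata):
        cols = np.flatnonzero(strata == h)
        n_h = cols.size
        if n_h < 2:
            # Assumption of this sketch: singleton strata are not handled.
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        block = psu_mults[:, cols]
        centered = block - block.mean(axis=1, keepdims=True)  # within-stratum demean
        out[:, cols] = centered * np.sqrt(n_h / (n_h - 1))    # Bessel rescale
    return out

rng = np.random.default_rng(0)
strata = np.array([0, 0, 0, 1, 1, 1, 1])                  # stratum of each PSU
psu_of_obs = np.array([0, 0, 1, 2, 3, 3, 4, 5, 6, 6])     # PSU of each observation
raw = rng.choice([-1.0, 1.0], size=(200, strata.size))    # Rademacher draws
psu_mults = center_and_rescale_by_stratum(raw, strata)

# Centering happens at the PSU level, before the per-observation broadcast.
obs_mults = psu_mults[:, psu_of_obs]
```

Reusing the same row psu_mults[b, :] for every horizon within replicate b is what keeps the cross-horizon dependence intact in the joint test described above.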

Nonparametric Bias-Corrected Inference#

  • Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica, 82(6), 2295-2326. https://doi.org/10.3982/ECTA11757

    Source of the bias-combined design matrix used by the in-house lprobust port that backs HeterogeneousAdoptionDiD Phase 1c (continuous-dose paths) for the bias-corrected weighted-robust SE.

  • Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2018). “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference.” Journal of the American Statistical Association, 113(522), 767-779. https://doi.org/10.1080/01621459.2017.1285776

  • Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2019). “nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference.” Journal of Statistical Software, 91(8), 1-33. https://doi.org/10.18637/jss.v091.i08

    CCF (2018, 2019) is the underlying nprobust machinery (MSE-optimal bandwidth selection and robust bias-corrected CIs) that HeterogeneousAdoptionDiD ports in-house for the continuous-dose paths.

Survey-Design Inference (Taylor-Series Linearization)#

  • Binder, D. A. (1983). “On the Variances of Asymptotically Normal Estimators from Complex Surveys.” International Statistical Review, 51(3), 279-292. https://doi.org/10.2307/1402588

    Foundational TSL (Taylor-Series Linearization) variance derivation used across diff-diff’s survey-aware estimators (compute_survey_if_variance and the per-estimator influence-function compositions, including the dCDH and HeterogeneousAdoptionDiD survey_design= paths).

  • Gerber, I. (2026). “Design-Based Variance Estimation for Modern Heterogeneity-Robust Difference-in-Differences Estimators.” arXiv:2605.04124 (stat.ME). https://arxiv.org/abs/2605.04124

    Proposition 1 shows that the influence-function representations of 15 modern DiD estimators (including TwoStageDiD, explicitly derived in the Appendix) satisfy Binder’s (1983) smoothness conditions, so standard stratified-cluster Taylor linearization produces design-consistent SEs. SpilloverDiD’s Wave E.1 survey-design integration composes this result with the Wave D Gardner GMM first-stage uncertainty correction; see docs/methodology/REGISTRY.md SpilloverDiD section “Variance (Wave E.1)”.

Placebo Tests and DiD Diagnostics#

  • Bertrand, M., Duflo, E., & Mullainathan, S. (2004). “How Much Should We Trust Differences-in-Differences Estimates?” The Quarterly Journal of Economics, 119(1), 249-275. https://doi.org/10.1162/003355304772839588

Synthetic Control Method#

  • Abadie, A., & Gardeazabal, J. (2003). “The Economic Costs of Conflict: A Case Study of the Basque Country.” The American Economic Review, 93(1), 113-132. https://doi.org/10.1257/000282803321455188

  • Abadie, A., Diamond, A., & Hainmueller, J. (2010). “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association, 105(490), 493-505. https://doi.org/10.1198/jasa.2009.ap08746

  • Abadie, A., Diamond, A., & Hainmueller, J. (2015). “Comparative Politics and the Synthetic Control Method.” American Journal of Political Science, 59(2), 495-510. https://doi.org/10.1111/ajps.12116

Synthetic Difference-in-Differences#

  • Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). “Synthetic Difference-in-Differences.” American Economic Review, 111(12), 4088-4118. https://doi.org/10.1257/aer.20190159

Triply Robust Panel (TROP)#

  • Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025). “Triply Robust Panel Estimators.” Working Paper. https://arxiv.org/abs/2508.21536

    This paper introduces the TROP estimator, which combines three robustness components:

    • Factor model adjustment: Low-rank factor structure via SVD removes unobserved confounders

    • Unit weights: Synthetic control style weighting for optimal comparison

    • Time weights: SDID style time weighting for informative pre-periods

    TROP is particularly useful when there are unobserved time-varying confounders with a factor structure that affect different units differently over time.

Triple Difference (DDD)#

  • Ortiz-Villavicencio, M., & Sant’Anna, P. H. C. (2025). “Better Understanding Triple Differences Estimators.” Working Paper. https://arxiv.org/abs/2505.09942

    This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The TripleDifference class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.

  • Gruber, J. (1994). “The Incidence of Mandated Maternity Benefits.” American Economic Review, 84(3), 622-641. https://www.jstor.org/stable/2118071

    Classic paper introducing the Triple Difference design for policy evaluation.

  • Olden, A., & Møen, J. (2022). “The Triple Difference Estimator.” The Econometrics Journal, 25(3), 531-553. https://doi.org/10.1093/ectj/utac010

Honest DiD / Sensitivity Analysis#

The HonestDiD module implements sensitivity analysis methods for relaxing the parallel trends assumption.

  • Rambachan, A., & Roth, J. (2023). “A More Credible Approach to Parallel Trends.” The Review of Economic Studies, 90(5), 2555-2591. https://doi.org/10.1093/restud/rdad018

    This paper introduces the “Honest DiD” framework implemented in our HonestDiD class:

    • Relative Magnitudes (ΔRM): Bounds post-treatment violations by a multiple of observed pre-treatment violations

    • Smoothness (ΔSD): Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends

    • Breakdown Analysis: Finding the smallest violation magnitude that would overturn conclusions

    • Robust Confidence Intervals: Valid inference under partial identification
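As a rough illustration of the two restriction classes listed above, the sketch below checks whether a candidate path of trend violations lies in ΔRM or ΔSD. It assumes the reference period sits between the pre- and post-treatment coefficients and is normalized to zero; the function names and the small numerical tolerance are hypothetical and are not part of the HonestDiD class.

```python
import numpy as np

TOL = 1e-12  # tolerance for floating-point comparisons (assumption of this sketch)

def _full_path(delta_pre, delta_post):
    # Violations ordered in time, with the reference period fixed at zero.
    return np.concatenate([np.asarray(delta_pre, dtype=float), [0.0],
                           np.asarray(delta_post, dtype=float)])

def satisfies_relative_magnitudes(delta_pre, delta_post, m_bar):
    """Each post-treatment step is at most m_bar times the largest pre-treatment step."""
    steps = np.abs(np.diff(_full_path(delta_pre, delta_post)))
    n_pre = len(delta_pre)
    max_pre_step = steps[:n_pre].max()
    return bool(np.all(steps[n_pre:] <= m_bar * max_pre_step + TOL))

def satisfies_smoothness(delta_pre, delta_post, m):
    """Every second difference of the violation path is bounded by m in absolute value."""
    second_diffs = np.abs(np.diff(_full_path(delta_pre, delta_post), n=2))
    return bool(np.all(second_diffs <= m + TOL))

linear_pre, linear_post = [-0.2, -0.1], [0.1, 0.2]   # exact linear pre-trend continues
kinked_pre, kinked_post = [-0.2, -0.1], [0.3, 0.6]   # slope triples after treatment

print(satisfies_smoothness(linear_pre, linear_post, m=0.0))
print(satisfies_relative_magnitudes(kinked_pre, kinked_post, m_bar=1.0))
print(satisfies_relative_magnitudes(kinked_pre, kinked_post, m_bar=3.0))
```

With m = 0 the smoothness class admits only linear extrapolation of the pre-trend, which is why the first path passes and the kinked path needs a larger bound.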

  • Roth, J., & Sant’Anna, P. H. C. (2023). “When Is Parallel Trends Sensitive to Functional Form?” Econometrica, 91(2), 737-747. https://doi.org/10.3982/ECTA19402

    Discusses when parallel trends assumptions are sensitive to functional form, which is relevant to judging when smoothness restrictions are appropriate.

Multi-Period and Staggered Adoption#

  • Borusyak, K., Jaravel, X., & Spiess, J. (2024). “Revisiting Event-Study Designs: Robust and Efficient Estimation.” The Review of Economic Studies, 91(6), 3253-3285. https://doi.org/10.1093/restud/rdae007

    This paper introduces the imputation estimator implemented in our ImputationDiD class:

    • Efficient imputation: OLS on untreated observations, impute counterfactuals, aggregate

    • Conservative variance: Theorem 3 clustered variance estimator with auxiliary model

    • Pre-trend test: Independent of treatment effect estimation (Proposition 9)

    • Efficiency gains: ~50% shorter CIs than Callaway-Sant’Anna under homogeneous effects
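The three imputation steps listed above (fit on untreated observations, impute counterfactuals, aggregate) can be sketched with plain NumPy. This is an illustrative point estimate only, assuming a two-way fixed-effects model with no covariates; imputation_att is a hypothetical name, not the ImputationDiD API, and the Theorem 3 variance estimator is not attempted here.

```python
import numpy as np

def imputation_att(y, unit, time, treated):
    """Average of Y - Y_hat(0) over treated observations (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    units, u_idx = np.unique(np.asarray(unit), return_inverse=True)
    times, t_idx = np.unique(np.asarray(time), return_inverse=True)

    # Unit dummies plus time dummies (first period dropped to avoid collinearity).
    X = np.zeros((y.size, units.size + times.size - 1))
    X[np.arange(y.size), u_idx] = 1.0
    later = np.flatnonzero(t_idx > 0)
    X[later, units.size + t_idx[later] - 1] = 1.0

    # Step 1: OLS on untreated observations only.
    # Note: lstsq returns a minimum-norm fit if an effect is not identified;
    # this sketch does not check for that.
    coef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)
    # Step 2: impute untreated potential outcomes for treated observations.
    y0_hat = X[treated] @ coef
    # Step 3: aggregate the observation-level effects (simple average here).
    return float(np.mean(y[treated] - y0_hat))

# Noise-free toy panel: 4 units, 5 periods, staggered adoption, true effect of 2.
unit = np.repeat(np.arange(4), 5)
time = np.tile(np.arange(5), 4)
first_treat = np.array([2.0, 3.0, np.inf, np.inf])   # units 2 and 3 are never treated
treated = time >= first_treat[unit]
y = 1.5 * unit + 0.5 * time + 2.0 * treated

att = imputation_att(y, unit, time, treated)
print(round(att, 6))
```

Because the toy outcomes contain no noise and satisfy the two-way model exactly, the imputed counterfactuals are exact and the estimate recovers the true effect.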

  • Callaway, B., & Sant’Anna, P. H. C. (2021). “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, 225(2), 200-230. https://doi.org/10.1016/j.jeconom.2020.12.001

  • Sant’Anna, P. H. C., & Zhao, J. (2020). “Doubly Robust Difference-in-Differences Estimators.” Journal of Econometrics, 219(1), 101-122. https://doi.org/10.1016/j.jeconom.2020.06.003

  • Sun, L., & Abraham, S. (2021). “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics, 225(2), 175-199. https://doi.org/10.1016/j.jeconom.2020.09.006

  • Gardner, J. (2022). “Two-stage differences in differences.” arXiv preprint arXiv:2207.05943. https://arxiv.org/abs/2207.05943

  • Butts, K., & Gardner, J. (2022). “did2s: Two-Stage Difference-in-Differences.” The R Journal, 14(1), 162-173. https://doi.org/10.32614/RJ-2022-048

  • Butts, K. (2023). “Difference-in-Differences with Spatial Spillovers.” arXiv:2105.03737v3 (originally posted 2021). https://arxiv.org/abs/2105.03737

    Identifies the ring-indicator estimator implemented in our SpilloverDiD class. Sections 2-3 cover non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = d_bar.

  • Conley, T. G. (1999). “GMM Estimation with Cross Sectional Dependence.” Journal of Econometrics, 92(1), 1-45. https://doi.org/10.1016/S0304-4076(98)00084-0

    Primary source for the Conley spatial-HAC variance estimator. Equations 5-9 derive the spatial-kernel cross-product meat. Our diff_diff/conley.py implements the practitioner specializations (Bartlett / uniform kernels with haversine / euclidean metrics) cited in our SpilloverDiD Wave A/D Conley path and composed with Binder TSL + Gardner GMM in Wave E.2 (_compute_stratified_conley_meat at diff_diff/two_stage.py).

  • de Chaisemartin, C., & D’Haultfœuille, X. (2020). “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169

  • de Chaisemartin, C., & D’Haultfœuille, X. (2022, revised 2024). “Difference-in-Differences Estimators of Intertemporal Treatment Effects.” NBER Working Paper 29873. https://www.nber.org/papers/w29873

    Dynamic companion to the 2020 paper. Web Appendix Section 3.7.3 contains the cohort-recentered plug-in variance formula implemented in our ChaisemartinDHaultfoeuille class.

  • Goodman-Bacon, A. (2021). “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics, 225(2), 254-277. https://doi.org/10.1016/j.jeconom.2021.03.014

  • Newey, W. K., & West, K. D. (1987). “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica, 55(3), 703-708. https://doi.org/10.2307/1913610

    Primary source for the Bartlett serial-HAC kernel weights 1 - |t-s|/(L+1) used in time-series and panel HAC variance estimators. Our diff_diff/conley.py panel-block path at L949-965 hardcodes this kernel for the within-unit serial component (mirrors R conleyreg::time_dist); SpilloverDiD’s Wave E.2 follow-up composes the same kernel weights with Binder/Gerber TSL + Wave D Gardner GMM correction for the panel-block survey case (_compute_stratified_serial_bartlett_meat at diff_diff/two_stage.py; see REGISTRY section “Variance (Wave E.2 follow-up)”).
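The Bartlett weights 1 - |t-s|/(L+1) quoted above can be computed directly. The sketch below assumes integer time indices, so that clipping at zero coincides with truncating at lag L; bartlett_weights and the toy score matrix are hypothetical and are not taken from diff_diff/conley.py.

```python
import numpy as np

def bartlett_weights(times, max_lag):
    """Pairwise weights max(0, 1 - |t - s| / (max_lag + 1)) for one unit's periods."""
    times = np.asarray(times, dtype=float)
    lag = np.abs(times[:, None] - times[None, :])
    return np.clip(1.0 - lag / (max_lag + 1.0), 0.0, None)

# One unit observed in periods 0..3 with L = 2.
W = bartlett_weights(np.arange(4), max_lag=2)
print(W[0])  # weights between period 0 and periods 0, 1, 2, 3

# Within-unit serial component: sum over (t, s) of w_ts * s_t s_s'.
rng = np.random.default_rng(1)
scores = rng.normal(size=(4, 2))   # toy (T, k) score matrix for a single unit
meat_unit = scores.T @ W @ scores  # (k, k) contribution to the HAC meat
```

Summing such per-unit contributions gives the serial part of a panel HAC meat; any spatial or survey-design weighting would be layered on separately.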

  • Wing, C., Freedman, S. M., & Hollingsworth, A. (2024). “Stacked Difference-in-Differences.” NBER Working Paper 32054. https://www.nber.org/papers/w32054

  • Chen, X., Sant’Anna, P. H. C., & Xie, H. (2025). “Efficient Difference-in-Differences and Event Study Estimators.” Working Paper.

    Primary source for the optimal-weighting / PT-All / PT-Post efficient DiD implemented in our EfficientDiD class.

  • Baker, A., Callaway, B., Cunningham, S., Goodman-Bacon, A., & Sant’Anna, P. H. C. (2025). “Difference-in-Differences Designs: A Practitioner’s Guide.” arXiv preprint arXiv:2503.13323. https://arxiv.org/abs/2503.13323

    Source for the 8-step practitioner workflow surfaced via diff_diff.get_llm_guide("practitioner") and the “Practitioner Workflow” section of the README. See docs/methodology/REGISTRY.md for the diff-diff renumbering and per-step deviations.

Continuous Treatment DiD#

  • Callaway, B., Goodman-Bacon, A., & Sant’Anna, P. H. C. (2024). “Difference-in-Differences with a Continuous Treatment.” NBER Working Paper 32117. https://www.nber.org/papers/w32117

    Primary source for ATT(d), ACRT, dose-response curves, and B-spline flexibility implemented in our ContinuousDiD class.

Heterogeneous Adoption (No-Untreated Designs)#

  • de Chaisemartin, C., Ciccia, D., D’Haultfœuille, X., & Knau, F. (2026). “Difference-in-Differences Estimators When No Unit Remains Untreated.” arXiv preprint arXiv:2405.04465v6. https://arxiv.org/abs/2405.04465

    Primary source for the Weighted Average Slope (WAS) estimator and its multi-period event-study extension implemented in our HeterogeneousAdoptionDiD class. Targets panels where no unit remains untreated at the post period and treatment dose D_{g,2} is nonnegative, using local-linear regression at the dose support boundary. Both Design 1’ (the QUG case, where the lower boundary of the dose support is 0) and Design 1 (no QUG, where that boundary is > 0) are supported.

Power Analysis#

  • Bloom, H. S. (1995). “Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs.” Evaluation Review, 19(5), 547-556. https://doi.org/10.1177/0193841X9501900504

  • Burlig, F., Preonas, L., & Woerman, M. (2020). “Panel Data and Experimental Design.” Journal of Development Economics, 144, 102458. https://doi.org/10.1016/j.jdeveco.2020.102458

    Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.

  • Djimeu, E. W., & Houndolo, D.-G. (2016). “Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination.” Journal of Development Effectiveness, 8(4), 508-527. https://doi.org/10.1080/19439342.2016.1244555

General Causal Inference#

  • Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

  • Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/