{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Imputation DiD (Borusyak, Jaravel & Spiess 2024)\n", "\n", "This tutorial demonstrates the `ImputationDiD` estimator, which implements the efficient imputation approach from Borusyak, Jaravel & Spiess (2024), \"Revisiting Event-Study Designs: Robust and Efficient Estimation\", *Review of Economic Studies*.\n", "\n", "**When to use ImputationDiD:**\n", "- Staggered adoption settings where treatment effects are plausibly **homogeneous** across cohorts and time; in that case it can produce roughly 50% shorter CIs than Callaway-Sant'Anna\n", "- When you want to use **all untreated observations** (never-treated + not-yet-treated) for maximum efficiency\n", "- As a complement to Callaway-Sant'Anna or Sun-Abraham: if all three agree, the results are robust to the choice of estimator; if they disagree, investigate heterogeneity" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import warnings\n", "# Silence warnings to keep the tutorial output short\n", "warnings.filterwarnings('ignore')\n", "\n", "from diff_diff import (\n", "    ImputationDiD, CallawaySantAnna, SunAbraham,\n", "    generate_staggered_data, plot_event_study\n", ")\n", "\n", "# For nicer plots (optional)\n", "try:\n", "    import matplotlib.pyplot as plt\n", "    plt.style.use('seaborn-v0_8-whitegrid')\n", "    HAS_MATPLOTLIB = True\n", "except ImportError:\n", "    HAS_MATPLOTLIB = False\n", "    print(\"matplotlib not installed - visualization examples will be skipped\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Usage\n", "\n", "The imputation estimator follows a simple three-step process:\n", "1. Estimate unit and time fixed effects using only untreated observations\n", "2. Impute counterfactual Y(0) for treated observations\n", "3. 
Aggregate imputed treatment effects with researcher-chosen weights" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate staggered adoption data with known treatment effect\n", "data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)\n", "\n", "# Fit the imputation estimator\n", "est = ImputationDiD()\n", "results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')\n", "results.print_summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": "## Event Study with Pre-Trend Diagnostics\n\nEvent study aggregation estimates treatment effects at each relative time horizon. Setting `pretrends=True` adds **pre-period coefficients** (negative horizons) to the event study, enabling a diagnostic check of the parallel trends assumption.\n\nUnder parallel trends, pre-period coefficients should cluster around zero — indicating no differential trends before treatment. The reference period (h = -1) is normalized to zero by construction." 
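}, { "cell_type": "markdown", "metadata": {}, "source": "The three steps from Basic Usage, and the per-horizon averaging behind the event study, can be sketched by hand. The cell below is a minimal NumPy illustration on a hypothetical noise-free toy panel (every variable name in it is an assumption made for illustration). It is not how `ImputationDiD` is implemented internally, and it omits standard errors and pre-period coefficients entirely. Because the toy outcome has no noise and a constant effect, the imputed effects should recover `true_effect` at every horizon up to floating-point error." }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "import numpy as np\n\n# Hypothetical noise-free toy panel (not the tutorial's data): 6 units, 6 periods.\nn_units, n_periods, true_effect = 6, 6, 2.0\nunit = np.repeat(np.arange(n_units), n_periods)\nperiod = np.tile(np.arange(n_periods), n_units)\n# First treatment period per unit; np.inf marks never-treated units.\nfirst_treat = np.array([3.0, 3.0, 4.0, 4.0, np.inf, np.inf])[unit]\ntreated = period >= first_treat\ny = np.linspace(0.0, 1.0, n_units)[unit] + 0.5 * period + true_effect * treated\n\n# Step 1: fit unit and time fixed effects on untreated observations only.\n# Design matrix: one dummy per unit plus period dummies (period 0 dropped).\nX = np.hstack([\n    (unit[:, None] == np.arange(n_units)).astype(float),\n    (period[:, None] == np.arange(1, n_periods)).astype(float),\n])\ncoef, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)\n\n# Step 2: impute the untreated potential outcome Y(0) for treated observations.\ny0_hat = X[treated] @ coef\n\n# Step 3: observation-level effects, then aggregate with equal weights.\ntau = y[treated] - y0_hat\noverall_att = tau.mean()\n\n# Event-study aggregation: average tau by horizon h = period - first_treat.\nhorizon = (period - first_treat)[treated].astype(int)\nevent_study = {int(h): float(tau[horizon == h].mean()) for h in np.unique(horizon)}\nprint(f'Overall ATT: {overall_att:.3f}')\nprint(event_study)"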
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Fit with event study aggregation and pre-period coefficients\nest = ImputationDiD(pretrends=True)\nresults_es = est.fit(data, outcome='outcome', unit='unit', time='period',\n                     first_treat='first_treat', aggregate='event_study')\n\n# Plot event study; the pre-period region is automatically shaded\nif HAS_MATPLOTLIB:\n    plot_event_study(results_es, title='Imputation DiD Event Study (with Pre-Trends)')\nelse:\n    print(\"Install matplotlib to see visualizations: pip install matplotlib\")" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# View event study effects as a table\n", "results_es.to_dataframe(level='event_study')" ] }, { "cell_type": "markdown", "metadata": {}, "source": "## Formal Pre-Trend Test\n\nThe event study plot above gives a **visual** diagnostic: do pre-period coefficients look close to zero? For a **statistical** check, `pretrend_test()` runs a Wald F-test on whether all pre-treatment leads are jointly zero (Equation 9 in the paper). This complements the plot: the eye spots patterns, while the F-test quantifies how strongly the data reject the null of no pre-trends. A large p-value is consistent with parallel trends but does not prove them.\n\nNote: `pretrend_test()` does not require `pretrends=True`. It runs its own internal lead regression on untreated observations, independent of the treatment effect estimator (Proposition 9). This avoids the pre-testing problem identified by Roth (2022)."
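}, { "cell_type": "markdown", "metadata": {}, "source": "To make the joint test concrete, the cell below sketches a Wald F-test on lead dummies by hand. It is a simplified illustration under stated assumptions, not the internals of `pretrend_test()`: the toy panel and all variable names are hypothetical, the covariance matrix is homoskedastic rather than cluster-robust, and SciPy is assumed to be installed. Parallel trends hold by construction in the simulated data, so the test should usually fail to reject." }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "import numpy as np\nfrom scipy import stats\n\nrng = np.random.default_rng(0)\n\n# Hypothetical toy panel in which parallel trends hold by construction.\nn_units, n_periods, n_leads = 40, 8, 2\nunit = np.repeat(np.arange(n_units), n_periods)\nperiod = np.tile(np.arange(n_periods), n_units)\n# Half the units are treated from period 5; the rest are never treated.\nfirst_treat = np.where(np.arange(n_units) < n_units // 2, 5.0, np.inf)[unit]\ny = 0.1 * unit + 0.5 * period + rng.normal(size=unit.size)\n\n# Keep untreated observations only; treated outcomes never enter the test.\nkeep = period < first_treat\nu, t, g, yy = unit[keep], period[keep], first_treat[keep], y[keep]\n\n# Lead dummies: 1 when an eventually-treated unit is k periods before treatment.\nleads = np.column_stack([(g - t == k).astype(float) for k in range(1, n_leads + 1)])\nfixed_effects = np.hstack([\n    (u[:, None] == np.arange(n_units)).astype(float),\n    (t[:, None] == np.arange(1, n_periods)).astype(float),\n])\nX = np.hstack([leads, fixed_effects])\n\n# OLS with a homoskedastic covariance matrix (a simplifying assumption).\nXtX_inv = np.linalg.inv(X.T @ X)\nbeta = XtX_inv @ X.T @ yy\nresid = yy - X @ beta\ndof = X.shape[0] - X.shape[1]\nV = XtX_inv * (resid @ resid / dof)\n\n# Wald F-test that all lead coefficients are jointly zero.\nb, Vb = beta[:n_leads], V[:n_leads, :n_leads]\nf_stat = float(b @ np.linalg.solve(Vb, b) / n_leads)\np_value = float(stats.f.sf(f_stat, n_leads, dof))\nprint(f'F = {f_stat:.3f}, p = {p_value:.3f}')"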
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run pre-trend test\n", "pt = results.pretrend_test(n_leads=3)\n", "print(f\"F-statistic: {pt['f_stat']:.3f}\")\n", "print(f\"P-value: {pt['p_value']:.4f}\")\n", "print(f\"Leads tested: {pt['n_leads']}\")\n", "print(f\"\\nConclusion: {'Fail to reject' if pt['p_value'] > 0.05 else 'Reject'} the null of no pre-trends at the 5% level\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison with Other Estimators\n", "\n", "Under homogeneous treatment effects, ImputationDiD, Callaway-Sant'Anna, and Sun-Abraham should produce similar point estimates. The key difference is efficiency: in that setting ImputationDiD should produce shorter confidence intervals." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fit all three estimators on the same data\n", "imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n", "                          time='period', first_treat='first_treat')\n", "cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',\n", "                            time='period', first_treat='first_treat')\n", "sa = SunAbraham().fit(data, outcome='outcome', unit='unit',\n", "                      time='period', first_treat='first_treat')\n", "\n", "print(\"Estimator Comparison (True effect = 2.0)\")\n", "print(\"=\" * 55)\n", "print(f\"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}\")\n", "print(\"-\" * 55)\n", "\n", "# Report point estimate, standard error, and CI width for each estimator\n", "for name, r in [(\"ImputationDiD\", imp), (\"CallawaySantAnna\", cs), (\"SunAbraham\", sa)]:\n", "    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]\n", "    print(f\"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Group Aggregation\n", "\n", "Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period)."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fit with group aggregation\n", "results_grp = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n", " time='period', first_treat='first_treat',\n", " aggregate='group')\n", "results_grp.to_dataframe(level='group')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Features\n", "\n", "### Anticipation\n", "\n", "If treatment effects begin before the official treatment date, use the `anticipation` parameter to account for this." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Account for 1 period of anticipation\n", "est_antic = ImputationDiD(anticipation=1)\n", "results_antic = est_antic.fit(data, outcome='outcome', unit='unit',\n", " time='period', first_treat='first_treat')\n", "print(f\"ATT (no anticipation): {results.overall_att:.3f}\")\n", "print(f\"ATT (1-period anticipation): {results_antic.overall_att:.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Auxiliary Model Partition\n", "\n", "The `aux_partition` parameter controls the auxiliary model partition for the conservative variance estimator (Theorem 3). Finer partitions give tighter SEs but may overfit with few observations per group." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare different partition choices\n", "for partition in ['cohort_horizon', 'cohort', 'horizon']:\n", "    r = ImputationDiD(aux_partition=partition).fit(\n", "        data, outcome='outcome', unit='unit',\n", "        time='period', first_treat='first_treat')\n", "    print(f\"aux_partition='{partition}': ATT={r.overall_att:.3f}, SE={r.overall_se:.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "| Feature | ImputationDiD | CallawaySantAnna | SunAbraham |\n", "|---------|--------------|------------------|------------|\n", "| **Approach** | Impute Y(0) via FE model | Group-time ATT(g,t) | Saturated regression |\n", "| **Efficiency** | Most efficient under homogeneity | Less efficient | Least efficient |\n", "| **Robustness** | Requires homogeneity for efficiency | Fully robust to heterogeneity | Robust to heterogeneity |\n", "| **Control group** | All untreated (always) | Never-treated or not-yet-treated | Never-treated |\n", "| **Best for** | Homogeneous effects, maximum power | Heterogeneous effects, flexible | Robustness check |\n", "\n", "**Reference:** Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. *Review of Economic Studies*, 91(6), 3253-3285." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 4 }