{ "cells": [ { "cell_type": "markdown", "id": "t19-cell-001", "metadata": {}, "source": [ "# Tutorial 19: dCDH for Marketing Pulse Campaigns\n", "\n", "A practitioner walkthrough for measuring lift from promotional campaigns that turn on AND off across markets at staggered times. The tutorial uses the `ChaisemartinDHaultfoeuille` estimator (alias `DCDH`) - diff-diff's only estimator built for reversible (non-absorbing) treatment, where every other modern staggered estimator in the library assumes treatment is absorbing." ] }, { "cell_type": "markdown", "id": "t19-cell-002", "metadata": {}, "source": "## 1. The Marketing Pulse Problem\n\nYour team runs paid-promo pulses across 60 markets. Some markets ran the promo at the start of the quarter and turned it off as the campaign budget rolled to the next geo (leavers); others started untreated and switched the promo on at some point during the quarter (joiners). Leadership wants the average lift on weekly checkout sessions while the promo was on.\n\n**Why dCDH.** This panel has *reversible* (non-absorbing) treatment in the dCDH sense: across the panel, the promo turns on in some markets and off in others - both directions appear in the same dataset. Every other modern staggered-DiD estimator in diff-diff (Callaway-Sant'Anna, Sun-Abraham, Wooldridge ETWFE, ImputationDiD, TwoStageDiD, EfficientDiD) assumes treatment is absorbing: once treated, always treated. They simply don't apply to a panel that contains leavers. dCDH does, following [de Chaisemartin & D'Haultfoeuille (2020)](https://www.aeaweb.org/articles?id=10.1257/aer.20181169) and the [dynamic companion paper](https://www.nber.org/papers/w29873).\n\n**Scope of this tutorial.** Each market in our panel switches *at most once* during the quarter (the dCDH paper's Assumption 5, which the default analytical SE path requires). 
Under that restriction a market is one of four types: a stable-untreated unit, a joiner that turns the promo on exactly once, a leaver that turns it off exactly once, or a stable-treated unit (the simulated panel below contains only joiners and leavers). dCDH does support multi-switch within-market paths (e.g., on-off-on cycles) via `drop_larger_lower=False` plus `by_path=k` for per-path effects, but that is outside this tutorial's scope - see the extensions section at the end. Implementation details and any documented deviations from R's `did_multiplegt_dyn` reference live in [`docs/methodology/REGISTRY.md`](https://github.com/igerber/diff-diff/blob/main/docs/methodology/REGISTRY.md)." }, { "cell_type": "code", "id": "t19-cell-003", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import warnings\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from diff_diff import DCDH, generate_reversible_did_data\n", "\n", "plt.style.use(\"seaborn-v0_8-whitegrid\")" ] }, { "cell_type": "markdown", "id": "t19-cell-004", "metadata": {}, "source": "## 2. The Data\n\nWe'll simulate a panel that mirrors a marketing pulse campaign:\n\n- **60 markets**, each observed for **8 weeks**\n- Some markets started the quarter with the promo on and switched it off (leavers); others started untreated and switched the promo on (joiners). Each market switches exactly once during the panel, satisfying the A5 single-switch contract under which the analytical SE is derived (see [`docs/methodology/REGISTRY.md`](https://github.com/igerber/diff-diff/blob/main/docs/methodology/REGISTRY.md)).\n- Outcome: weekly checkout sessions per market, baseline ~110\n- True treatment effect: **+12 sessions per market-week** when the promo is on, with mild cell-level heterogeneity around that average." 
}, { "cell_type": "code", "id": "t19-cell-005", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "raw = generate_reversible_did_data(\n", " n_groups=60,\n", " n_periods=8,\n", " pattern=\"single_switch\",\n", " initial_treat_frac=0.4,\n", " treatment_effect=12.0,\n", " heterogeneous_effects=True,\n", " effect_sd=1.5,\n", " group_fe_sd=8.0,\n", " time_trend=0.5,\n", " noise_sd=2.0,\n", " seed=46, # locked via _scratch/dcdh_tutorial/ seed-search\n", ")\n", "df = raw.rename(\n", " columns={\n", " \"group\": \"market_id\",\n", " \"period\": \"week\",\n", " \"treatment\": \"promo_on\",\n", " \"outcome\": \"sessions\",\n", " }\n", ")\n", "df[\"sessions\"] = df[\"sessions\"] + 100.0 # shift to a realistic baseline\n", "\n", "print(f\"Panel shape: {df.shape}\")\n", "print(f\"Markets: {df['market_id'].nunique()}\")\n", "print(f\"Weeks: {sorted(df['week'].unique())}\")\n", "print(f\"Sessions range: [{df['sessions'].min():.0f}, {df['sessions'].max():.0f}]\")" ] }, { "cell_type": "code", "id": "t19-cell-006", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Switcher-type counts. With pattern=\"single_switch\" every market\n", "# switches exactly once, so we have only joiners (0 → 1) and\n", "# leavers (1 → 0); no never-treated or always-treated markets by\n", "# construction.\n", "df.groupby(\"switcher_type\").size()" ] }, { "cell_type": "code", "id": "t19-cell-007", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Mean sessions over time, split by which direction the market\n", "# switched. 
Joiners (blue) ramp up after they turn the promo on;\n", "# leavers (red) drop off after they turn it off.\n", "# Sort by week first so .first() really is each market's initial status.\n", "first_treat = df.sort_values(\"week\").groupby(\"market_id\")[\"promo_on\"].first()\n", "category = df[\"market_id\"].map(\n", "    lambda m: \"starts off, switches on (joiner)\" if first_treat[m] == 0 else \"starts on, switches off (leaver)\"\n", ")\n", "df_plot = df.assign(category=category)\n", "\n", "fig, ax = plt.subplots(figsize=(9, 5))\n", "for label, color in [\n", "    (\"starts off, switches on (joiner)\", \"#1f77b4\"),\n", "    (\"starts on, switches off (leaver)\", \"#d62728\"),\n", "]:\n", "    weekly = df_plot[df_plot[\"category\"] == label].groupby(\"week\")[\"sessions\"].mean()\n", "    ax.plot(weekly.index, weekly.values, label=label, color=color, marker=\"o\", linewidth=2)\n", "ax.set_xlabel(\"Week\")\n", "ax.set_ylabel(\"Mean weekly sessions\")\n", "ax.set_title(\"Marketing pulses on/off across markets - outcomes by switcher type\")\n", "ax.legend(loc=\"upper left\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "t19-cell-008", "metadata": {}, "source": [ "## 3. Fitting dCDH\n", "\n", "`DID_M` is the headline dCDH estimator. It combines two pieces, each computed from pairs of consecutive periods:\n", "\n", "- **DID_+** (joiners): markets switching `0 → 1` between consecutive periods, compared to *contemporaneously untreated* control cells.\n", "- **DID_-** (leavers): markets switching `1 → 0`, compared to *contemporaneously treated* control cells.\n", "\n", "Both pieces draw their controls only from cells whose treatment status was stable across the two periods being compared - so a cell that is itself switching never serves as a control for another switcher. The library reports DID_+ and DID_- alongside DID_M, which in the paper's definition averages them weighted by the number of switching cells, so you can see whether the two halves agree." ] }, { "cell_type": "markdown", "id": "t19-cell-009", "metadata": {}, "source": [ "**Where do the controls come from?** dCDH's controls are *contemporaneously stable cells*, not a permanently-untreated comparison group. 
A market that's untreated at week 3 and week 4 contributes a stable-untreated cell at week 4 - even if that same market eventually turns the promo on at week 5 and keeps it on through week 8. Symmetrically, a market that's been running the promo since week 1 and is still running it at week 4 contributes a stable-treated cell at week 4. This is what lets dCDH work on panels with **no permanent never-treated markets at all** - our panel has zero never-treated and zero always-treated units, only 60 switchers.\n\nThe technical condition - de Chaisemartin & D'Haultfoeuille's Assumption 11 - is **per-period**: at every period when a switcher exists, at least one stable cell of the relevant type also exists. The library checks A11 at fit time period-by-period and emits a `UserWarning` (zeroing the offending period's contribution by paper convention) if any switching period lacks stable controls. A11 is *not* automatic on single-switch panels - the test suite has a single-switch panel where joiners exist at a period with zero stable-untreated controls (`tests/test_chaisemartin_dhaultfoeuille.py::TestA11Handling::test_a11_violation_zero_in_numerator_retain_in_denominator`).\n\nOn the seed and DGP we use here, the fit happens not to trigger an A11 warning, so we're in the clean regime. On your own data, check the warning output before trusting the headline." 
] }, { "cell_type": "code", "id": "t19-cell-010", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "model = DCDH(twfe_diagnostic=False, placebo=False, seed=42)\n", "results = model.fit(\n", "    df,\n", "    outcome=\"sessions\",\n", "    group=\"market_id\",\n", "    time=\"week\",\n", "    treatment=\"promo_on\",\n", ")\n", "print(results.summary())" ] }, { "cell_type": "markdown", "id": "t19-cell-011", "metadata": {}, "source": [ "**Reading the headline.** dCDH estimates the lift at **about 12.1 sessions per market-week** while the promo was on (95% CI: 11.3 to 12.8), recovering the true effect of 12.0 within sampling uncertainty. The CI half-width is about 0.7 sessions, roughly 6% of the point estimate, and the estimate itself corresponds to a lift of about 11% on a baseline of ~110 weekly sessions.\n", "\n", "(We passed `placebo=False` on this fit because Phase 1's single-lag placebo SE is `NaN` by design - the per-period aggregation path doesn't have an analytical influence-function derivation. We get valid placebo CIs from the multi-horizon path in Section 4 below, which has a proper influence function.)" ] }, { "cell_type": "code", "id": "t19-cell-012", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Joiners vs leavers breakdown\n", "jl = results.to_dataframe(level=\"joiners_leavers\")\n", "jl.round(3)" ] }, { "cell_type": "markdown", "id": "t19-cell-013", "metadata": {}, "source": [ "**Reading joiners vs leavers.** Both halves should produce a positive lift in a healthy marketing-pulse design - turning the promo on increases sessions, and turning it off decreases them. Here DID_+ ≈ 12.1 (38 joiner cells) and DID_- ≈ 11.9 (22 leaver cells): both substantially positive, both within sampling uncertainty of each other and of the true effect of 12. If they had disagreed in sign or by a large margin (say, one at 5 and the other at 20), that would be a heterogeneity signal worth investigating before reporting one number to leadership." 
] }, { "cell_type": "markdown", "id": "t19-cell-014", "metadata": {}, "source": [ "## 4. Multi-Horizon Event Study with Bootstrap\n", "\n", "DID_M collapses the dynamic effect to one number - the average lift across all switching cells. Setting `L_max=L` instead computes `DID_l` for each horizon `l = 1..L` after each switch, plus `DID^pl_l` placebos at horizons `l = -L..-1`. This tells you whether the on-impact lift is sustained or fades, and whether the pre-treatment placebos sit on zero.\n", "\n", "With `L_max=2` we get two post-switch horizons and two placebo horizons. The multiplier bootstrap (`n_bootstrap=199`, matching the library's `ci_params.bootstrap` convention) gives valid CIs at every horizon, including the placebo horizons." ] }, { "cell_type": "markdown", "id": "t19-cell-015", "metadata": {}, "source": [ "**About the warning you're about to see.** The fit below will emit a single `UserWarning` saying *Assumption 7 (D_{g,t} >= D_{g,1}) is violated: leavers present*. This is **expected for any reversible panel** and we don't suppress it - it's the library being explicit about a methodology choice on a separate estimand:\n", "\n", "- **Assumption 7** is a monotonic-treatment-progression assumption used by the optional **cost-benefit delta** computation (a secondary aggregate the library reports for absorbing-treatment panels). On reversible panels the assumption fails by construction - leavers' treatment goes *down*, not up.\n", "- The library's response is to compute the cost-benefit delta on the full sample anyway and warn that the interpretation isn't clean. The headline `DID_M`, the joiners/leavers split, and the event-study horizons are **unaffected** by this warning - they use a different aggregation that doesn't rest on A7.\n", "\n", "So the warning is informational, points at a result we won't use in this tutorial, and is the price of admission for a reversible design. We surface it; we don't silence it." 
] }, { "cell_type": "code", "id": "t19-cell-016", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Narrow filter: silence the spurious numpy RuntimeWarnings about\n", "# \"<...> encountered in matmul\" that fire only on macOS NumPy\n", "# builds linked against Apple's Accelerate BLAS framework.\n", "# Accelerate sets FP error flags during matmul on certain shapes/\n", "# values; the computation is correct (Linux / OpenBLAS users don't\n", "# see these warnings at all). See numpy issue #26669. The filter\n", "# is scoped to the matmul message pattern only - any unrelated\n", "# RuntimeWarning from the fit will still surface, and the\n", "# Assumption 7 UserWarning below is NOT suppressed (that's the\n", "# methodology warning we explained above).\n", "with warnings.catch_warnings():\n", " warnings.filterwarnings(\n", " \"ignore\",\n", " message=r\".*encountered in matmul\",\n", " category=RuntimeWarning,\n", " )\n", " model_es = DCDH(\n", " twfe_diagnostic=False, placebo=True, n_bootstrap=199, seed=42\n", " )\n", " results_es = model_es.fit(\n", " df,\n", " outcome=\"sessions\",\n", " group=\"market_id\",\n", " time=\"week\",\n", " treatment=\"promo_on\",\n", " L_max=2,\n", " )\n", "\n", "es_df = results_es.to_dataframe(level=\"event_study\")\n", "es_df.round(3)" ] }, { "cell_type": "code", "id": "t19-cell-017", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Event-study errorbar plot with bootstrap CIs.\n", "es_plot = es_df[es_df[\"horizon\"] != 0] # drop reference row\n", "is_pre = es_plot[\"horizon\"] < 0\n", "\n", "fig, ax = plt.subplots(figsize=(9, 5))\n", "ax.errorbar(\n", " es_plot.loc[is_pre, \"horizon\"],\n", " es_plot.loc[is_pre, \"effect\"],\n", " yerr=[\n", " es_plot.loc[is_pre, \"effect\"] - es_plot.loc[is_pre, \"conf_int_lower\"],\n", " es_plot.loc[is_pre, \"conf_int_upper\"] - es_plot.loc[is_pre, \"effect\"],\n", " ],\n", " fmt=\"o\", color=\"#888888\", capsize=4, label=\"Pre-treatment placebos\",\n", 
")\n", "ax.errorbar(\n", " es_plot.loc[~is_pre, \"horizon\"],\n", " es_plot.loc[~is_pre, \"effect\"],\n", " yerr=[\n", " es_plot.loc[~is_pre, \"effect\"] - es_plot.loc[~is_pre, \"conf_int_lower\"],\n", " es_plot.loc[~is_pre, \"conf_int_upper\"] - es_plot.loc[~is_pre, \"effect\"],\n", " ],\n", " fmt=\"o\", color=\"#1f77b4\", capsize=4, label=\"Post-treatment effects\",\n", ")\n", "ax.axhline(0, color=\"black\", linewidth=0.7, linestyle=\"--\")\n", "ax.axvline(0, color=\"black\", linewidth=0.7, linestyle=\"--\")\n", "ax.axhline(12.0, color=\"green\", linewidth=0.8, linestyle=\":\", label=\"true effect = 12.0\")\n", "ax.set_xlabel(\"Weeks since promo switched\")\n", "ax.set_ylabel(\"Effect on weekly sessions\")\n", "ax.set_title(\"dCDH event study (L_max=2, multiplier bootstrap)\")\n", "ax.legend(loc=\"lower right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "t19-cell-018", "metadata": {}, "source": [ "**Reading the event study.**\n", "\n", "- **Both placebo horizons** (l = -2 and l = -1) sit on zero with confidence intervals comfortably covering it. Pre-trends look parallel - we have no evidence that something other than the promo was driving session growth in the cells we're using as controls.\n", "- **On-impact effect** at l = 1 is about **+12.4 sessions** with a 95% bootstrap CI of roughly [11.4, 13.3], covering the true effect of 12.\n", "- **Sustained effect** at l = 2 is **+12.6 sessions** with CI [11.5, 13.6]. The lift didn't fade in the second week post-switch.\n", "\n", "Bootstrap CIs reflect the cohort-recentered influence-function variance with the finite-sample stability the multiplier bootstrap provides. Both horizons agree closely with each other AND with the headline `DID_M` from Section 3 - a built-in consistency check across the per-period and per-group aggregation paths." ] }, { "cell_type": "markdown", "id": "t19-cell-019", "metadata": {}, "source": [ "## 5. 
Communicating Results to Leadership\n", "\n", "A stakeholder-ready summary of the analysis above:\n", "\n", "> **Headline.** The pulse campaign lifted weekly checkout sessions by approximately **12 sessions per market per week** while the promo was on (95% CI: 11.3 to 12.8). On a baseline of about 110 weekly sessions per market, that's roughly an **11% lift**. *[Source: `results.overall_att` from Section 3.]*\n", ">\n", "> **Sample size and design.** 60 markets observed for 8 weeks (480 market-weeks). Of those, 38 markets started untreated and switched the promo on at some point during the quarter (joiners), and 22 markets started with the promo on and switched it off (leavers). Method: dCDH (de Chaisemartin & D'Haultfoeuille 2020) - diff-diff's only estimator built for treatment that can switch on AND off in the same panel. *[Source: switcher counts and panel shape from Section 2.]*\n", ">\n", "> **Validity evidence.** Two checks supported the result. (a) The joiners-vs-leavers split agreed: joiners produced a +12.1 lift, leavers a +11.9 lift, well within sampling uncertainty of each other and of the headline. (b) The multi-horizon placebos at l = -2 and l = -1 both sat on zero with bootstrap CIs comfortably covering it - parallel pre-trends look credible. *[Sources: joiners/leavers from Section 3, multi-horizon placebos from Section 4.]*\n", ">\n", "> **What \"+12 sessions per market per week\" means in business terms.** Across 60 markets and the weeks each one had the promo on, that's the per-market-week lift attributable to the campaign. Translate to your own revenue-per-session to compare against campaign spend, then use the per-market lift estimate to project what scaling the promo to additional markets would deliver.\n", ">\n", "> **Practical significance caveat.** The 11% lift is statistically significant (bootstrap p < 0.01 at both post-treatment horizons), and the on-impact effect persists at the second horizon - the pulse worked while it was on. 
Whether 11% justifies the campaign cost is a business judgment, not a statistical one. *[Sources: dynamic horizons from Section 4.]*" ] }, { "cell_type": "markdown", "id": "t19-cell-020", "metadata": {}, "source": [ "Adapt this template for your own campaign by swapping in your numbers from `results.summary()`, your own market and switcher counts, your own validity diagnostics, and your own business translation. The pattern - **headline → sample size and design → validity evidence → business interpretation → practical significance** - is the part to keep." ] }, { "cell_type": "markdown", "id": "t19-cell-021", "metadata": {}, "source": "## 6. Extensions and Where to Go Next\n\nThis tutorial covered the core dCDH workflow on a reversible panel: `DID_M` with the joiners/leavers split, plus the `L_max` multi-horizon event study with multiplier bootstrap. The library also supports several extensions we did not demonstrate here:\n\n- **Per-trajectory disaggregation** (`by_path=k`): when joiners and leavers each follow a few common treatment paths (e.g., on-off-on vs on-on-off), `by_path=k` reports the event study separately for the top-k most common observed paths. 
Useful for pulse campaigns where the schedule varies across markets.\n- **Group-specific linear trends** (`trends_linear=True`): allows each market to have its own pre-treatment slope, absorbing differential trends.\n- **State-set-specific trends** (`trends_nonparam=...`): allows non-parametric trends shared within state-set strata.\n- **HonestDiD sensitivity analysis** (`honest_did=True`): Rambachan-Roth (2023) bounds on the post-treatment effects under controlled parallel-trends violations, computed on the placebo event-study surface.\n- **Survey-design support** (`survey_design=...`): Taylor-series linearization with sampling weights, strata, PSU, and FPC.\n\nSee [`docs/api/chaisemartin_dhaultfoeuille.rst`](../api/chaisemartin_dhaultfoeuille.html) for the full parameter reference and [`docs/methodology/REGISTRY.md`](https://github.com/igerber/diff-diff/blob/main/docs/methodology/REGISTRY.md) for the methodology contract on each surface." }, { "cell_type": "markdown", "id": "t19-cell-022", "metadata": {}, "source": [ "**Related tutorials.**\n", "\n", "- [Tutorial 1: Basic DiD](01_basic_did.ipynb) - the 2x2 building block dCDH generalizes.\n", "- [Tutorial 2: Staggered DiD](02_staggered_did.ipynb) - Callaway-Sant'Anna for absorbing staggered adoption (when treatment doesn't turn off).\n", "- [Tutorial 5: HonestDiD](05_honest_did.ipynb) - sensitivity to parallel-trends violations on event studies; works on dCDH's placebo surface via `honest_did=True`.\n", "- [Tutorial 17: Brand Awareness Survey](17_brand_awareness_survey.ipynb) - reach for this if you have survey data with sampling weights / strata / PSU instead of a panel.\n", "- [Tutorial 18: Geo-Experiment Analysis](18_geo_experiments.ipynb) - reach for this if you have a single-launch pilot in a small number of test markets." ] }, { "cell_type": "markdown", "id": "t19-cell-023", "metadata": {}, "source": [ "**Summary: when to reach for dCDH.**\n", "\n", "1. 
Use dCDH when treatment is **reversible** - the panel has switchers in both directions (joiners and leavers) in the same data.\n", "2. Read joiners (`DID_+`) and leavers (`DID_-`) separately. Disagreement between the two halves is heterogeneity worth investigating before averaging into one number for stakeholders.\n", "3. Use `L_max` + multiplier bootstrap to expose the dynamic structure of the effect - is the lift on-impact only, sustained, or fading? - and to get valid placebo CIs that the Phase 1 single-lag placebo can't provide.\n", "4. Defer to follow-up tutorials for `by_path`, `trends_linear`/`trends_nonparam`, HonestDiD on dCDH's placebo surface, and the survey-design integration. Each is a single constructor or `fit()` kwarg away." ] } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }