{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 8: Triple Difference (DDD) Estimation\n", "\n", "This tutorial covers the Triple Difference (DDD) estimator, which extends standard Difference-in-Differences to settings where treatment requires satisfying two criteria.\n", "\n", "## When to Use Triple Difference\n", "\n", "Triple Difference is appropriate when:\n", "\n", "1. **Treatment requires two criteria**: Units must satisfy BOTH conditions to be treated:\n", "   - Belonging to a **treated group** (e.g., states that enacted a policy)\n", "   - Being in an **eligible partition** (e.g., women, low-income individuals)\n", "\n", "2. **You want to relax parallel trends**: DDD tolerates both group-specific and partition-specific deviations from parallel trends, as long as the *differential* trend (the eligible-minus-ineligible gap) would have evolved the same way in treated and control groups.\n", "\n", "### Classic Example: Maternity Benefits\n", "\n", "Gruber (1994) studied state mandates requiring employers to provide maternity benefits:\n", "- **Group**: States that enacted mandates vs. states that didn't\n", "- **Partition**: Women of childbearing age vs. other workers\n", "- **Outcome**: Wages\n", "\n", "Only women of childbearing age in mandate states were \"treated\": the policy affected their labor costs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "from diff_diff import TripleDifference, triple_difference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating Synthetic DDD Data\n", "\n", "Let's create synthetic data that mimics a DDD setting. 
We'll simulate a policy that:\n", "- Was enacted in some states (`group=1`) but not others (`group=0`)\n", "- Affects only eligible individuals (`partition=1`) but not others (`partition=0`)\n", "- Has a true treatment effect of 2.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate DDD data using the library function\n", "from diff_diff import generate_ddd_data\n", "\n", "# Generate synthetic DDD data that mimics a policy setting:\n", "# - Enacted in some states (group=1) but not others (group=0)\n", "# - Affects only eligible individuals (partition=1) but not others (partition=0)\n", "# - Has a true treatment effect of 2.0\n", "data = generate_ddd_data(\n", "    n_per_cell=200,\n", "    treatment_effect=2.0,\n", "    group_effect=5.0,       # Main effect of being in treated group\n", "    partition_effect=3.0,   # Main effect of being in eligible partition\n", "    time_effect=2.0,        # Main effect of post-treatment period\n", "    noise_sd=3.0,\n", "    add_covariates=True,    # Include age and education covariates\n", "    seed=42\n", ")\n", "\n", "print(f\"Dataset shape: {data.shape}\")\n", "print(\"\\nSample composition:\")\n", "print(data.groupby(['group', 'partition', 'time']).size().unstack(fill_value=0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic DDD Estimation\n", "\n", "Let's estimate the treatment effect using the `TripleDifference` class:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create and fit the DDD estimator\n", "ddd = TripleDifference(estimation_method='dr')  # doubly robust (recommended)\n", "\n", "results = ddd.fit(\n", "    data,\n", "    outcome='outcome',\n", "    group='group',\n", "    partition='partition',\n", "    time='time'\n", ")\n", "\n", "# Print results\n", "results.print_summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding the DDD Estimand\n", "\n", "The Triple Difference can be written as:\n", "\n", 
"```\n", "DDD = [Y(G=1,P=1,T=1) - Y(G=1,P=1,T=0)]   # Change for treated, eligible\n", "    - [Y(G=1,P=0,T=1) - Y(G=1,P=0,T=0)]   # Change for treated, ineligible\n", "    - [Y(G=0,P=1,T=1) - Y(G=0,P=1,T=0)]   # Change for control, eligible\n", "    + [Y(G=0,P=0,T=1) - Y(G=0,P=0,T=0)]   # Change for control, ineligible\n", "```\n", "\n", "Let's verify this manually:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute cell means\n", "cell_means = data.groupby(['group', 'partition', 'time'])['outcome'].mean().unstack()\n", "print(\"Cell Means:\")\n", "print(cell_means)\n", "print()\n", "\n", "# Manual DDD calculation\n", "y_111 = data[(data['group']==1) & (data['partition']==1) & (data['time']==1)]['outcome'].mean()\n", "y_110 = data[(data['group']==1) & (data['partition']==1) & (data['time']==0)]['outcome'].mean()\n", "y_101 = data[(data['group']==1) & (data['partition']==0) & (data['time']==1)]['outcome'].mean()\n", "y_100 = data[(data['group']==1) & (data['partition']==0) & (data['time']==0)]['outcome'].mean()\n", "y_011 = data[(data['group']==0) & (data['partition']==1) & (data['time']==1)]['outcome'].mean()\n", "y_010 = data[(data['group']==0) & (data['partition']==1) & (data['time']==0)]['outcome'].mean()\n", "y_001 = data[(data['group']==0) & (data['partition']==0) & (data['time']==1)]['outcome'].mean()\n", "y_000 = data[(data['group']==0) & (data['partition']==0) & (data['time']==0)]['outcome'].mean()\n", "\n", "manual_ddd = (y_111 - y_110) - (y_101 - y_100) - (y_011 - y_010) + (y_001 - y_000)\n", "print(f\"Manual DDD calculation: {manual_ddd:.4f}\")\n", "print(f\"Estimator DDD result: {results.att:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimation Methods\n", "\n", "The `TripleDifference` class supports three estimation methods:\n", "\n", "1. **Regression Adjustment (`reg`)**: Uses outcome regression with full interactions\n", "2. 
**Inverse Probability Weighting (`ipw`)**: Uses propensity scores to reweight observations\n", "3. **Doubly Robust (`dr`)**: Combines both methods for robustness\n", "\n", "The doubly robust estimator is recommended as it's consistent if *either* the outcome model or the propensity score model is correctly specified." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare estimation methods\n", "methods = ['reg', 'ipw', 'dr']\n", "results_comparison = {}\n", "\n", "for method in methods:\n", "    est = TripleDifference(estimation_method=method)\n", "    res = est.fit(\n", "        data,\n", "        outcome='outcome',\n", "        group='group',\n", "        partition='partition',\n", "        time='time'\n", "    )\n", "    results_comparison[method] = res\n", "    print(f\"{method.upper():4s}: ATT = {res.att:7.4f} (SE = {res.se:.4f}, p = {res.p_value:.4f})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding Covariates\n", "\n", "A key insight from Ortiz-Villavicencio & Sant'Anna (2025) is that naive DDD implementations are **invalid when covariates are needed for identification**. 
The `TripleDifference` class properly incorporates covariates:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Estimate with covariates\n", "ddd_with_cov = TripleDifference(estimation_method='dr')\n", "\n", "results_cov = ddd_with_cov.fit(\n", "    data,\n", "    outcome='outcome',\n", "    group='group',\n", "    partition='partition',\n", "    time='time',\n", "    covariates=['age', 'education']\n", ")\n", "\n", "results_cov.print_summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convenience Function\n", "\n", "For quick estimation, you can use the `triple_difference()` convenience function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# One-liner estimation\n", "quick_results = triple_difference(\n", "    data,\n", "    outcome='outcome',\n", "    group='group',\n", "    partition='partition',\n", "    time='time',\n", "    covariates=['age', 'education'],\n", "    estimation_method='dr'\n", ")\n", "\n", "print(f\"ATT: {quick_results.att:.4f} (95% CI: [{quick_results.conf_int[0]:.4f}, {quick_results.conf_int[1]:.4f}])\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing Cell Means\n", "\n", "It's often helpful to visualize the data structure to understand the DDD:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot cell means over time\n", "fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n", "\n", "cell_means = data.groupby(['group', 'partition', 'time'])['outcome'].mean().reset_index()\n", "\n", "# Plot by group: one panel per group, one line per partition\n", "for g, group_name in [(0, 'Control States'), (1, 'Treated States')]:\n", "    ax = axes[g]\n", "    for p, (style, label) in enumerate([('--', 'Ineligible'), ('-', 'Eligible')]):\n", "        subset = cell_means[(cell_means['group']==g) & (cell_means['partition']==p)]\n", "        ax.plot(subset['time'], subset['outcome'], style, marker='o',\n", "                linewidth=2, markersize=8, label=label)\n", "\n", 
"    ax.set_xlabel('Time Period (0=Pre, 1=Post)', fontsize=12)\n", "    ax.set_ylabel('Mean Outcome', fontsize=12)\n", "    ax.set_title(group_name, fontsize=14)\n", "    ax.set_xticks([0, 1])\n", "    ax.set_xticklabels(['Pre', 'Post'])\n", "    ax.legend()\n", "    ax.grid(True, alpha=0.3)\n", "\n", "plt.suptitle('DDD Structure: Comparing Trends Across Groups and Partitions', fontsize=14, y=1.02)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The DDD Parallel Trends Assumption\n", "\n", "The key identifying assumption for DDD is:\n", "\n", "> **In the absence of treatment**, the *differential* trend between eligible and ineligible units would have been the same across treated and control groups.\n", "\n", "This is weaker than requiring two separate DiD parallel trends assumptions. Even if:\n", "- Eligible units have different trends than ineligible units\n", "- Treated states have different trends than control states\n", "\n", "...the DDD is valid as long as the eligible-minus-ineligible gap in trends is the same in treated and control states.\n", "\n", "### When DDD Helps\n", "\n", "DDD is particularly useful when you suspect:\n", "1. Group-specific shocks (e.g., economic conditions in treatment states)\n", "2. Partition-specific shocks (e.g., trends affecting the eligible population everywhere)\n", "\n", "As long as these biases are additive, DDD differences them out." 
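, "\n",
"\n",
"To see why additivity matters, here is a minimal sketch on noiseless cell means using plain NumPy. The values of `effect`, `group_shock`, and `partition_shock` are made up for illustration and are not produced by `generate_ddd_data`. A simple DiD among eligible units absorbs the group-specific shock, while the DDD recovers the true effect:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"effect = 2.0           # true treatment effect (hypothetical value)\n",
"group_shock = 1.5      # post-period shock hitting all units in treated states\n",
"partition_shock = 0.7  # post-period shock hitting eligible units everywhere\n",
"\n",
"# means[g, p, t]: noiseless cell means indexed by group, partition, time\n",
"means = np.zeros((2, 2, 2))\n",
"for g in (0, 1):\n",
"    for p in (0, 1):\n",
"        baseline = 5.0 * g + 3.0 * p\n",
"        trend = 2.0 + group_shock * g + partition_shock * p  # additive shocks\n",
"        means[g, p, 0] = baseline\n",
"        means[g, p, 1] = baseline + trend + effect * g * p\n",
"\n",
"change = means[:, :, 1] - means[:, :, 0]  # post-minus-pre change per (g, p) cell\n",
"\n",
"did_eligible = change[1, 1] - change[0, 1]          # DiD using eligible units only\n",
"ddd = did_eligible - (change[1, 0] - change[0, 0])  # subtract the ineligible DiD\n",
"\n",
"print(f'DiD (eligible only): {did_eligible:.2f}')  # true effect plus group_shock\n",
"print(f'DDD:                 {ddd:.2f}')           # true effect\n",
"```\n",
"\n",
"If a shock were instead specific to eligible units in treated states (an interaction rather than a sum), it would load onto the DDD estimate just like the treatment effect."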
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accessing Results\n", "\n", "The `TripleDifferenceResults` object provides easy access to all estimation details:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Access individual results\n", "print(f\"ATT estimate: {results.att:.4f}\")\n", "print(f\"Standard error: {results.se:.4f}\")\n", "print(f\"t-statistic: {results.t_stat:.4f}\")\n", "print(f\"p-value: {results.p_value:.4f}\")\n", "print(f\"95% CI: ({results.conf_int[0]:.4f}, {results.conf_int[1]:.4f})\")\n", "print(f\"\\nStatistically significant at 5% level: {results.is_significant}\")\n", "print(f\"Significance stars: {results.significance_stars}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convert to DataFrame for further analysis\n", "results_df = results.to_dataframe()\n", "print(results_df.T)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# View cell means\n", "print(\"Cell Means from Estimation:\")\n", "for cell, mean in results.group_means.items():\n", "    print(f\"  {cell}: {mean:.4f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "Key takeaways:\n", "\n", "1. **Use DDD when treatment requires two criteria** (group membership AND partition eligibility)\n", "\n", "2. **DDD relaxes parallel trends** by allowing group-specific and partition-specific violations\n", "\n", "3. **Use doubly robust estimation** (`estimation_method='dr'`) for robustness to model misspecification\n", "\n", "4. **Properly handle covariates**: the `TripleDifference` class correctly incorporates them, unlike naive implementations\n", "\n", "## References\n", "\n", "- Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025). Better Understanding Triple Differences Estimators. *arXiv:2505.09942*.\n", "\n", "- Gruber, J. (1994). The incidence of mandated maternity benefits. 
*American Economic Review*, 84(3), 622-641.\n", "\n", "- Olden, A., & Møen, J. (2022). The triple difference estimator. *The Econometrics Journal*, 25(3), 531-553." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 4 }