{ "cells": [ { "cell_type": "markdown", "id": "cell-0", "metadata": {}, "source": "# Power Analysis for Difference-in-Differences\n\nThis notebook demonstrates how to use the power analysis tools in `diff-diff` for study design. We'll cover:\n\n1. Computing minimum detectable effects (MDE)\n2. Calculating required sample sizes\n3. Estimating statistical power\n4. Creating power curves for visualization\n5. Panel data considerations (ICC, multiple periods)\n6. Simulation-based power analysis for complex designs\n7. Power analysis for any estimator (staggered, synthetic DiD, triple difference)\n8. Finding MDE via simulation (bisection search)\n9. Finding required sample size via simulation (bisection search)\n10. Custom data generators\n11. Convenience functions\n12. Practical recommendations" }, { "cell_type": "code", "execution_count": null, "id": "cell-1", "metadata": {}, "outputs": [], "source": "%matplotlib inline\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nfrom diff_diff import (\n PowerAnalysis,\n DifferenceInDifferences,\n CallawaySantAnna,\n SyntheticDiD,\n TripleDifference,\n simulate_power,\n simulate_mde,\n simulate_sample_size,\n compute_mde,\n compute_power,\n compute_sample_size,\n plot_power_curve,\n)" }, { "cell_type": "markdown", "id": "cell-2", "metadata": {}, "source": [ "## 1. The Power Analysis Problem\n", "\n", "Before running a DiD study, researchers need to answer key design questions:\n", "\n", "- **\"How many units do I need?\"** (sample size calculation)\n", "- **\"What's the smallest effect I can detect?\"** (minimum detectable effect)\n", "- **\"What's my chance of finding a significant result?\"** (power calculation)\n", "\n", "The `PowerAnalysis` class provides analytical formulas for these calculations in basic DiD designs." ] }, { "cell_type": "code", "execution_count": null, "id": "cell-3", "metadata": {}, "outputs": [], "source": [ "# Create a PowerAnalysis object with standard settings\n", "# alpha = 0.05 (5% significance level)\n", "# power = 0.80 (80% power - conventional target)\n", "pa = PowerAnalysis(alpha=0.05, power=0.80)\n", "\n", "print(f\"Significance level: {pa.alpha}\")\n", "print(f\"Target power: {pa.target_power}\")\n", "print(f\"Alternative hypothesis: {pa.alternative}\")" ] }, { "cell_type": "markdown", "id": "cell-4", "metadata": {}, "source": [ "## 2. Minimum Detectable Effect (MDE)\n", "\n", "The MDE is the smallest treatment effect you can detect with your specified power and sample size. It depends on:\n", "\n", "- Sample sizes (n_treated, n_control)\n", "- Residual variance (sigma)\n", "- Significance level (alpha)\n", "- Target power" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-5", "metadata": {}, "outputs": [], "source": [ "# Calculate MDE for a basic 2x2 DiD design\n", "# Assume: 50 treated units, 50 control units, outcome SD of 10\n", "result = pa.mde(\n", " n_treated=50,\n", " n_control=50,\n", " sigma=10.0 # Residual standard deviation\n", ")\n", "\n", "print(result.summary())" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-6", "metadata": {}, "outputs": [], "source": [ "# Access individual results\n", "print(f\"Minimum Detectable Effect: {result.mde:.2f}\")\n", "print(f\"This is {result.mde / 10.0:.2f} standard deviations\")\n", "print(f\"\")\n", "print(f\"With 50 treated and 50 control units, and sigma=10,\")\n", "print(f\"you can detect an effect of {result.mde:.2f} or larger\")\n", "print(f\"with {result.power:.0%} power at alpha={result.alpha:.2f}.\")" ] }, { "cell_type": "markdown", "id": "cell-7", "metadata": {}, "source": [ "### How MDE Changes with Sample Size" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-8", "metadata": {}, "outputs": [], "source": [ "# Compare MDE across different sample sizes\n", "sample_sizes = [25, 50, 100, 200, 500]\n", "\n", "print(f\"{'N per group':>12} {'Total N':>10} {'MDE':>10} {'MDE/SD':>10}\")\n", "print(\"-\" * 45)\n", "\n", "for n in sample_sizes:\n", " result = pa.mde(n_treated=n, n_control=n, sigma=10.0)\n", " print(f\"{n:>12} {2*n:>10} {result.mde:>10.2f} {result.mde/10.0:>10.2f}\")" ] }, { "cell_type": "markdown", "id": "cell-9", "metadata": {}, "source": [ "## 3. Required Sample Size\n", "\n", "Given a target effect size to detect, calculate the required sample size." ] }, { "cell_type": "code", "execution_count": null, "id": "cell-10", "metadata": {}, "outputs": [], "source": [ "# How many units do we need to detect an effect of 5 units?\n", "result = pa.sample_size(\n", " effect_size=5.0, # Effect we want to detect\n", " sigma=10.0 # Outcome standard deviation\n", ")\n", "\n", "print(result.summary())" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-11", "metadata": {}, "outputs": [], "source": [ "# Compare sample sizes needed for different effect sizes\n", "effect_sizes = [2.0, 3.0, 5.0, 7.0, 10.0]\n", "\n", "print(f\"{'Effect Size':>12} {'Effect/SD':>10} {'Required N':>12} {'N per group':>12}\")\n", "print(\"-\" * 50)\n", "\n", "for effect in effect_sizes:\n", " result = pa.sample_size(effect_size=effect, sigma=10.0)\n", " print(f\"{effect:>12.1f} {effect/10.0:>10.2f} {result.required_n:>12} {result.n_treated:>12}\")" ] }, { "cell_type": "markdown", "id": "cell-12", "metadata": {}, "source": [ "## 4. Power Calculation\n", "\n", "Given a specific effect size and sample size, calculate the statistical power." ] }, { "cell_type": "code", "execution_count": null, "id": "cell-13", "metadata": {}, "outputs": [], "source": [ "# What's our power to detect an effect of 4 with 75 units per group?\n", "pa_calc = PowerAnalysis(alpha=0.05) # Don't specify target power for calculation\n", "\n", "result = pa_calc.power(\n", " effect_size=4.0,\n", " n_treated=75,\n", " n_control=75,\n", " sigma=10.0\n", ")\n", "\n", "print(f\"Power to detect effect of 4.0: {result.power:.1%}\")\n", "print(f\"This is {'adequate' if result.power >= 0.80 else 'below the conventional 80% threshold'}\")" ] }, { "cell_type": "markdown", "id": "cell-14", "metadata": {}, "source": [ "## 5. Power Curves\n", "\n", "Power curves show how power changes with effect size (or sample size). They're essential for understanding the trade-offs in study design." ] }, { "cell_type": "code", "execution_count": null, "id": "cell-15", "metadata": {}, "outputs": [], "source": [ "# Generate power curve data\n", "curve_df = pa.power_curve(\n", " n_treated=50,\n", " n_control=50,\n", " sigma=10.0\n", ")\n", "\n", "# Get MDE for reference\n", "mde_result = pa.mde(n_treated=50, n_control=50, sigma=10.0)\n", "\n", "print(\"First few rows of power curve:\")\n", "print(curve_df.head(10))" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-16", "metadata": {}, "outputs": [], "source": "# Plot the power curve\nplot_power_curve(\n curve_df,\n mde=mde_result.mde,\n target_power=0.80,\n title=\"Power Curve: 50 Treated, 50 Control, SD=10\",\n xlabel=\"Treatment Effect Size\",\n figsize=(10, 6)\n)\nplt.show()" }, { "cell_type": "markdown", "id": "cell-17", "metadata": {}, "source": [ "### Power vs Sample Size" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-18", "metadata": {}, "outputs": [], "source": "# How does power change with sample size for a fixed effect?\nsample_curve = pa.sample_size_curve(\n effect_size=5.0,\n sigma=10.0\n)\n\n# Plot\nplot_power_curve(\n sample_curve,\n target_power=0.80,\n show_mde_line=False,\n title=\"Power vs Sample Size (Effect=5, SD=10)\",\n figsize=(10, 6)\n)\nplt.show()" }, { "cell_type": "markdown", "id": "cell-19", "metadata": {}, "source": [ "## 6. Panel Data Considerations\n", "\n", "For panel DiD with multiple time periods, power depends on:\n", "- Number of pre/post periods\n", "- Intra-cluster correlation (ICC) - the correlation of outcomes within units over time\n", "\n", "More periods generally improve precision, but high ICC reduces the effective sample size." ] }, { "cell_type": "code", "execution_count": null, "id": "cell-20", "metadata": {}, "outputs": [], "source": [ "# Compare MDE with different numbers of periods\n", "print(\"Effect of number of periods on MDE (50 treated, 50 control, sigma=10):\")\n", "print(f\"{'Periods (pre+post)':>20} {'MDE':>10} {'Improvement':>15}\")\n", "print(\"-\" * 50)\n", "\n", "baseline_mde = None\n", "for n_pre, n_post in [(1, 1), (2, 2), (3, 3), (5, 5)]:\n", " result = pa.mde(\n", " n_treated=50, n_control=50, sigma=10.0,\n", " n_pre=n_pre, n_post=n_post\n", " )\n", " if baseline_mde is None:\n", " baseline_mde = result.mde\n", " improvement = \"-\"\n", " else:\n", " improvement = f\"{(1 - result.mde/baseline_mde)*100:.1f}%\"\n", " print(f\"{n_pre + n_post:>20} {result.mde:>10.2f} {improvement:>15}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-21", "metadata": {}, "outputs": [], "source": [ "# Effect of intra-cluster correlation (ICC)\n", "print(\"Effect of ICC on MDE (50 treated, 50 control, 6 periods, sigma=10):\")\n", "print(f\"{'ICC':>10} {'MDE':>10} {'Design Effect':>15}\")\n", "print(\"-\" * 40)\n", "\n", "for rho in [0.0, 0.1, 0.3, 0.5, 0.7]:\n", " result = pa.mde(\n", " n_treated=50, n_control=50, sigma=10.0,\n", " n_pre=3, n_post=3, rho=rho\n", " )\n", " # Design effect: how much ICC inflates variance\n", " T = 6\n", " design_effect = 1 + (T - 1) * rho\n", " print(f\"{rho:>10.1f} {result.mde:>10.2f} {design_effect:>15.2f}\")" ] }, { "cell_type": "markdown", "id": "cell-22", "metadata": {}, "source": [ "## 7. Simulation-Based Power Analysis\n", "\n", "For complex designs (staggered adoption, synthetic DiD, etc.), analytical formulas may not exist. Use Monte Carlo simulation instead.\n", "\n", "The `simulate_power` function:\n", "1. Generates synthetic data with known treatment effect\n", "2. Fits your estimator\n", "3. Records whether the effect is statistically significant\n", "4. Repeats many times\n", "5. Reports the proportion of significant results (= power)" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-23", "metadata": {}, "outputs": [], "source": [ "# Simulation-based power analysis\n", "did = DifferenceInDifferences()\n", "\n", "results = simulate_power(\n", " estimator=did,\n", " n_units=100,\n", " n_periods=4,\n", " treatment_effect=5.0,\n", " treatment_fraction=0.5,\n", " sigma=5.0, # Noise level\n", " n_simulations=200, # More simulations = more precise power estimate\n", " seed=42,\n", " progress=False\n", ")\n", "\n", "print(results.summary())" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-24", "metadata": {}, "outputs": [], "source": [ "# Key metrics from simulation\n", "print(\"Simulation Results:\")\n", "print(f\" Power (rejection rate): {results.power:.1%}\")\n", "print(f\" 95% CI for power: [{results.power_ci[0]:.1%}, {results.power_ci[1]:.1%}]\")\n", "print(f\"\")\n", "print(\"Estimator Performance:\")\n", "print(f\" True effect: {results.true_effect:.2f}\")\n", "print(f\" Mean estimate: {results.mean_estimate:.2f}\")\n", "print(f\" Bias: {results.bias:.4f}\")\n", "print(f\" RMSE: {results.rmse:.4f}\")\n", "print(f\" Coverage: {results.coverage:.1%}\")" ] }, { "cell_type": "markdown", "id": "cell-25", "metadata": {}, "source": [ "### Power Curve via Simulation" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-26", "metadata": {}, "outputs": [], "source": [ "# Simulate power for multiple effect sizes\n", "results_multi = simulate_power(\n", " estimator=did,\n", " n_units=100,\n", " n_periods=4,\n", " effect_sizes=[1.0, 2.0, 3.0, 5.0, 7.0, 10.0], # Multiple effects\n", " sigma=5.0,\n", " n_simulations=100, # Fewer per effect size for speed\n", " seed=42,\n", " progress=False\n", ")\n", "\n", "# Get power curve data\n", "power_curve_sim = results_multi.power_curve_df()\n", "print(power_curve_sim)" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-27", "metadata": {}, "outputs": [], "source": "# Plot simulation-based power curve\nplot_power_curve(\n power_curve_sim,\n target_power=0.80,\n title=\"Simulation-Based Power Curve (100 units, 4 periods, SD=5)\",\n figsize=(10, 6)\n)\nplt.show()" }, { "cell_type": "markdown", "id": "cjpvh2ze7lh", "source": "## 8. Power Analysis for Any Estimator\n\nThe simulation-based approach works with **all 12 supported estimators** — not just basic DiD. An internal registry automatically selects the appropriate data-generating process (DGP) and fit signature for each registered estimator. Just swap in the estimator object and everything else is handled. See the support table below for the full list, and Section 11 for using custom DGPs with unsupported estimators.\n\n### Staggered Adoption Estimators", "metadata": {} }, { "cell_type": "code", "id": "la7ps3nxufq", "source": "# Power analysis with Callaway-Sant'Anna — the registry auto-selects\n# generate_staggered_data as the DGP and the correct fit kwargs\ncs = CallawaySantAnna()\n\ncs_results = simulate_power(\n estimator=cs,\n n_units=100,\n n_periods=6,\n treatment_effect=5.0,\n treatment_fraction=0.5,\n treatment_period=3,\n sigma=5.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(cs_results.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "q61cjchqjrd", "source": "### Factor Model Estimators (Synthetic DiD)\n\nFor `SyntheticDiD` with the default placebo variance method, the DGP must generate **more control than treated units** (`treatment_fraction < 0.5`). The registry uses `generate_factor_data` automatically.", "metadata": {} }, { "cell_type": "code", "id": "x068rpe24gf", "source": "import warnings\n\n# Synthetic DiD — note treatment_fraction=0.3 (placebo variance requires\n# more control units than treated units)\nsdid = SyntheticDiD()\n\n# Suppress pre-treatment fit warnings from individual simulation draws;\n# the factor-model DGP can trigger this on some random seeds without\n# affecting the overall power estimate.\nwith warnings.catch_warnings():\n warnings.filterwarnings(\"ignore\", message=\"Pre-treatment fit is poor\")\n sdid_results = simulate_power(\n estimator=sdid,\n n_units=60,\n n_periods=6,\n treatment_effect=5.0,\n treatment_fraction=0.3,\n treatment_period=3,\n sigma=3.0,\n n_simulations=100,\n seed=42,\n progress=False,\n )\n\nprint(sdid_results.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "6qpu05hi18s", "source": "### Triple Difference\n\n`TripleDifference` uses a fixed 2×2×2 factorial design (group × partition × time). Sample sizes are **rounded via `n_per_cell = max(2, n_units // 8)`**, so the minimum effective N is 16 (2 units per cell × 8 cells). The `effective_n_units` field in results tracks any rounding. Note that `simulate_sample_size()` uses a higher search floor of 64 from the registry.", "metadata": {} }, { "cell_type": "code", "id": "91uackfwqp", "source": "# Triple Difference — n_units snaps to multiples of 8\nddd = TripleDifference()\n\nddd_results = simulate_power(\n estimator=ddd,\n n_units=64,\n treatment_effect=3.0,\n sigma=2.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(ddd_results.summary())\nif ddd_results.effective_n_units is not None:\n print(f\"\\nEffective N (after grid rounding): {ddd_results.effective_n_units}\")", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "6kb8ovmue4m", "source": "### Supported Estimators\n\nThe following 12 estimators are supported by the simulation power analysis registry. Each is automatically paired with the correct data-generating process:\n\n| DGP Family | Estimators | Min N |\n|---|---|---|\n| **Basic DiD** (`generate_did_data`) | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | 20 |\n| **Staggered** (`generate_staggered_data`) | CallawaySantAnna, SunAbraham, ImputationDiD, TwoStageDiD, StackedDiD, EfficientDiD | 40 |\n| **Factor Model** (`generate_factor_data`) | TROP, SyntheticDiD | 30 |\n| **Triple Difference** (`generate_ddd_data`) | TripleDifference | 16* |\n\n\\* DDD effective N rounds to `max(2, n_units // 8) * 8` with minimum 16. `simulate_sample_size()` uses a higher search floor of 64.\n\n> **Note:** `ContinuousDiD` is not in the registry because continuous/dose-response treatments require a different DGP structure. `BaconDecomposition` and `HonestDiD` are diagnostic/sensitivity tools rather than treatment effect estimators. For unsupported estimators, you can pass a custom `data_generator` and `result_extractor` (see Section 11).", "metadata": {} }, { "cell_type": "markdown", "id": "c4erqwll1af", "source": "### Power Curve for a Staggered Estimator", "metadata": {} }, { "cell_type": "code", "id": "ox3uab7h5bj", "source": "# Power curve across effect sizes for Callaway-Sant'Anna\ncs_curve = simulate_power(\n estimator=CallawaySantAnna(),\n n_units=100,\n n_periods=6,\n effect_sizes=[1.0, 2.0, 3.0, 5.0, 7.0],\n treatment_period=3,\n sigma=5.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nplot_power_curve(\n cs_curve.power_curve_df(),\n target_power=0.80,\n title=\"CS Power Curve (100 units, 6 periods, SD=5)\",\n figsize=(10, 6),\n)\nplt.show()", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "kqw6y4du5u", "source": "## 9. Finding MDE via Simulation\n\nThe analytical `PowerAnalysis.mde()` works for basic DiD, but for complex estimators there is no closed-form formula. `simulate_mde()` uses **bisection search** to find the minimum detectable effect: it repeatedly calls `simulate_power()` at different effect sizes, narrowing the bracket until it finds the smallest effect that achieves the target power.", "metadata": {} }, { "cell_type": "code", "id": "p9a03aycu2", "source": "# Find MDE for basic DiD via simulation\nmde_result = simulate_mde(\n DifferenceInDifferences(),\n n_units=100,\n n_periods=4,\n sigma=5.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(mde_result.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "1mgrw7qmish", "source": "### Inspecting the Search Path\n\nThe `search_path` attribute records the effect size and power at each bisection step, which is useful for diagnosing convergence:", "metadata": {} }, { "cell_type": "code", "id": "3p2tmxivi2g", "source": "# search_path is a List[Dict] — convert to DataFrame for display\nsearch_df = pd.DataFrame(mde_result.search_path)\nprint(search_df.to_string(index=False))", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "o4se5ofngjh", "source": "### MDE for a Staggered Estimator\n\nThe same function works with any registered estimator:", "metadata": {} }, { "cell_type": "code", "id": "gwou7kv0ht9", "source": "# MDE for Callaway-Sant'Anna\ncs_mde = simulate_mde(\n CallawaySantAnna(),\n n_units=100,\n n_periods=6,\n treatment_period=3,\n sigma=5.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(cs_mde.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "ljyojxybdhc", "source": "**Key parameters for `simulate_mde()`:**\n- `effect_range=(lo, hi)` — custom search bracket (auto-detected if omitted)\n- `tol` — convergence tolerance on power (default 0.02)\n- `max_steps` — maximum bisection steps (default 15)\n- `n_simulations` — simulations per step (use 500+ for production analyses)", "metadata": {} }, { "cell_type": "markdown", "id": "qrfhvtizg58", "source": "## 10. Finding Required Sample Size via Simulation\n\n`simulate_sample_size()` uses bisection search over `n_units` to find the smallest sample size that achieves the target power for a given effect size.", "metadata": {} }, { "cell_type": "code", "id": "8o018e2dnl6", "source": "# Find required sample size for basic DiD\nn_result = simulate_sample_size(\n DifferenceInDifferences(),\n treatment_effect=5.0,\n sigma=5.0,\n n_simulations=100,\n seed=42,\n progress=False,\n)\n\nprint(n_result.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "gwtzwm3oedl", "source": "### Inspecting the Search Path", "metadata": {} }, { "cell_type": "code", "id": "ubg3hqe9ypa", "source": "# View the bisection steps\nn_search_df = pd.DataFrame(n_result.search_path)\nprint(n_search_df.to_string(index=False))", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "dblnwe076kf", "source": "### Comparing Analytical and Simulation Results\n\nFor basic DiD, we can compare the simulation result against the analytical formula. With only 100 simulations per bisection step there will be Monte Carlo noise, so we expect **approximate** — not exact — agreement:", "metadata": {} }, { "cell_type": "code", "id": "bgpin5xmsud", "source": "# Analytical sample size\nanalytical = pa.sample_size(effect_size=5.0, sigma=5.0)\n\nprint(f\"{'Method':<25} {'Required N':>12}\")\nprint(\"-\" * 40)\nprint(f\"{'Analytical:':<25} {analytical.required_n:>12}\")\nprint(f\"{'Simulation:':<25} {n_result.required_n:>12}\")\nprint(f\"\\nSimulation power at N: {n_result.power_at_n:.1%}\")", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "rnte1h09hra", "source": "**Key parameters for `simulate_sample_size()`:**\n- `n_range=(lo, hi)` — custom search bracket for sample size (auto-detected if omitted)\n- `max_steps` — maximum bisection steps (default 15)\n- `n_simulations` — simulations per step (use 500+ for production analyses)", "metadata": {} }, { "cell_type": "markdown", "id": "usp5iwyacop", "source": "## 11. Custom Data Generators\n\nThe default DGPs cover common designs, but you can customize them in two ways:\n1. **Tweak the default DGP** with `data_generator_kwargs` (e.g., add multiple treatment cohorts)\n2. **Supply a fully custom DGP** with `data_generator`\n\n### Tweaking the Default DGP\n\nPass additional keyword arguments to the registry's DGP via `data_generator_kwargs`. For example, the default staggered DGP generates a single treatment cohort — here we create a multi-cohort design:\n\n> **Note:** Some keys are *protected* and cannot be overridden via `data_generator_kwargs` because they are controlled by the simulation function itself: `treatment_effect`, `noise_sd`, `n_units`, `n_periods`, `treatment_fraction`, `treatment_period`, `n_pre`, `n_post`.", "metadata": {} }, { "cell_type": "code", "id": "igft0epkiic", "source": "# Multi-cohort staggered design: treatment starts at periods 2 and 4\ncs_multi = simulate_power(\n estimator=CallawaySantAnna(),\n n_units=120,\n n_periods=6,\n treatment_effect=5.0,\n sigma=5.0,\n n_simulations=100,\n seed=42,\n progress=False,\n data_generator_kwargs={\n \"cohort_periods\": [2, 4],\n \"never_treated_frac\": 0.3,\n },\n)\n\nprint(cs_multi.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "hznafqrhzoq", "source": "### Fully Custom Data Generator\n\nFor designs not covered by the built-in DGPs, supply your own `data_generator` function. It receives the standard simulation parameters and must return a DataFrame. You may also need a custom `result_extractor` if your estimator returns non-standard results, and `estimator_kwargs` to pass the right column names to `fit()`.", "metadata": {} }, { "cell_type": "code", "id": "v06p7ubbj9p", "source": "def my_dgp(n_units, n_periods, treatment_effect, treatment_fraction,\n treatment_period, noise_sd, seed=None):\n \"\"\"Custom DGP with heterogeneous unit effects.\"\"\"\n rng = np.random.default_rng(seed)\n n_treat = int(n_units * treatment_fraction)\n\n rows = []\n for i in range(n_units):\n unit_fe = rng.normal(0, 3) # heterogeneous unit effect\n treated_unit = i < n_treat\n for t in range(n_periods):\n post = int(t >= treatment_period)\n effect = treatment_effect * post if treated_unit else 0.0\n y = unit_fe + 2.0 * t + effect + rng.normal(0, noise_sd)\n rows.append({\n \"unit\": i, \"period\": t, \"outcome\": y,\n \"ever_treated\": int(treated_unit), \"post\": post,\n })\n return pd.DataFrame(rows)\n\n# Use the custom DGP with simulate_power\ncustom_results = simulate_power(\n estimator=DifferenceInDifferences(),\n n_units=80,\n n_periods=4,\n treatment_effect=4.0,\n sigma=3.0,\n n_simulations=100,\n seed=42,\n progress=False,\n data_generator=my_dgp,\n estimator_kwargs={\"outcome\": \"outcome\", \"treatment\": \"ever_treated\", \"time\": \"post\"},\n)\n\nprint(custom_results.summary())", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "id": "cell-28", "metadata": {}, "source": "## 12. Convenience Functions\n\nFor quick calculations, use the convenience functions:" }, { "cell_type": "code", "execution_count": null, "id": "cell-29", "metadata": {}, "outputs": [], "source": [ "# Quick MDE calculation\n", "mde = compute_mde(n_treated=100, n_control=100, sigma=10.0, power=0.80)\n", "print(f\"MDE: {mde:.2f}\")\n", "\n", "# Quick power calculation\n", "power = compute_power(effect_size=5.0, n_treated=100, n_control=100, sigma=10.0)\n", "print(f\"Power: {power:.1%}\")\n", "\n", "# Quick sample size calculation\n", "n = compute_sample_size(effect_size=5.0, sigma=10.0, power=0.80)\n", "print(f\"Required N: {n}\")" ] }, { "cell_type": "markdown", "id": "cell-30", "metadata": {}, "source": "## 13. Practical Recommendations\n\n### Estimating Sigma (Residual SD)\n\nThe residual standard deviation is crucial for power calculations. Options:\n\n1. **Pilot data**: Fit a model on historical data and get residual SD\n2. **Literature**: Find similar studies and use their reported SDs\n3. **Domain knowledge**: Expert judgment about outcome variability\n4. **Sensitivity analysis**: Calculate power for a range of sigma values" }, { "cell_type": "code", "execution_count": null, "id": "cell-31", "metadata": {}, "outputs": [], "source": [ "# Sensitivity to sigma\n", "print(\"MDE sensitivity to residual SD (50 treated, 50 control):\")\n", "print(f\"{'Sigma':>10} {'MDE':>10} {'Effect/SD':>12}\")\n", "print(\"-\" * 35)\n", "\n", "for sigma in [5.0, 7.5, 10.0, 12.5, 15.0]:\n", " result = pa.mde(n_treated=50, n_control=50, sigma=sigma)\n", " print(f\"{sigma:>10.1f} {result.mde:>10.2f} {result.mde/sigma:>12.3f}\")" ] }, { "cell_type": "markdown", "id": "cell-32", "metadata": {}, "source": [ "### Choosing a Target Power\n", "\n", "- **80% power** is conventional but arbitrary\n", "- **90% power** is more conservative and recommended for important studies\n", "- Consider the cost of Type II errors (missing a real effect) vs Type I errors" ] }, { "cell_type": "code", "execution_count": null, "id": "cell-33", "metadata": {}, "outputs": [], "source": [ "# Compare required sample sizes for different power levels\n", "print(\"Required sample size by target power (effect=5, sigma=10):\")\n", "print(f\"{'Power':>10} {'Required N':>15}\")\n", "print(\"-\" * 30)\n", "\n", "for power in [0.70, 0.80, 0.90, 0.95]:\n", " pa_custom = PowerAnalysis(power=power)\n", " result = pa_custom.sample_size(effect_size=5.0, sigma=10.0)\n", " print(f\"{power:>10.0%} {result.required_n:>15}\")" ] }, { "cell_type": "markdown", "id": "eub4cew045u", "source": "### Analytical vs. Simulation: When to Use Each\n\n| Approach | Best for | Advantages |\n|---|---|---|\n| **Analytical** (`PowerAnalysis`) | Basic 2×2 DiD, panel DiD | Fast, exact, closed-form |\n| **Simulation** (`simulate_power/mde/sample_size`) | Staggered, SDID, TROP, DDD, custom designs | Works with any estimator, reports bias/RMSE/coverage |\n\n**Rule of thumb:** Start with analytical power analysis for basic designs. Move to simulation when using specialized estimators or non-standard DGPs.", "metadata": {} }, { "cell_type": "markdown", "id": "cell-34", "metadata": {}, "source": "## Summary\n\nKey takeaways for DiD power analysis:\n\n1. **Always do a power analysis** before running a study\n2. **MDE decreases** with sample size, more periods, and lower variance\n3. **ICC matters** for panel data — high autocorrelation reduces effective sample size\n4. **Use simulation** for complex designs (staggered, synthetic DiD, triple difference)\n5. **12 estimators are supported** out of the box via the auto-registry — just swap in the estimator\n6. **`simulate_mde()` and `simulate_sample_size()`** extend MDE and sample size calculations to any estimator via bisection search\n7. **Custom DGPs** let you model non-standard designs with `data_generator` and `data_generator_kwargs`\n8. **Be realistic about sigma** — err on the side of larger values\n9. **Consider your smallest meaningful effect** — don't just target statistical significance\n\nFor more on DiD estimation, see the other tutorials:\n- `01_basic_did.ipynb` — Basic DiD estimation\n- `02_staggered_did.ipynb` — Staggered adoption designs\n- `03_synthetic_did.ipynb` — Synthetic DiD\n- `04_parallel_trends.ipynb` — Testing assumptions\n- `05_honest_did.ipynb` — Sensitivity analysis\n- `07_pretrends_power.ipynb` — Pre-trends power analysis (Roth 2022)" } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }