Interactive notebook

This tutorial is a Jupyter notebook. You can view it on GitHub or download it to run locally.

Measuring Campaign Impact on Brand Awareness with Survey Data#

Your company launched a brand awareness campaign in certain markets. The marketing team conducted brand tracking surveys across all markets before and after the campaign, using a stratified sampling design with demographic weighting. Each wave surveyed 200 respondents — some in campaign markets, some in control markets.

Marketing leadership wants to know:

Did aided awareness actually increase among respondents in campaign markets?
Did consideration move?
How confident should we be in these numbers?

This tutorial shows how to answer these questions using Difference-in-Differences (DiD) with proper survey design corrections. DiD compares the change among campaign-exposed respondents to the change among control respondents — if awareness went up 8 points in campaign markets but only 2 in control markets, the incremental lift is 6 points.
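
The arithmetic in that example can be written out as a minimal sketch. The function name `did_lift` and the four group means are illustrative assumptions, not part of diff-diff; only the 8-point and 2-point changes come from the example above.

```python
def did_lift(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on four group means (percentage points)."""
    treated_change = treated_post - treated_pre   # change in campaign markets
    control_change = control_post - control_pre   # change in control markets
    return treated_change - control_change


# Illustrative means: campaign markets rise 8 points, control markets rise 2
lift = did_lift(treated_pre=45.0, treated_post=53.0,
                control_pre=44.0, control_post=46.0)
print(f"Incremental lift: {lift:.1f} pp")
```

The control group's change stands in for what would have happened in campaign markets without the campaign, which is why it is subtracted rather than ignored.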

The complication: your survey data has a complex sampling design — stratified by region, with unequal selection probabilities and geographic clustering. Ignoring this can make you overconfident in your results.

What you’ll learn:

Analyzing brand tracking survey data with DiD
Why survey design (weights, strata, clusters) changes your answer
Measuring multiple brand funnel metrics
Checking whether the result is trustworthy
Extending to staggered campaign rollouts
Communicating results to stakeholders

1. Setup#

[ ]:

import warnings

import numpy as np
import pandas as pd
from diff_diff import (
    DifferenceInDifferences,
    SurveyDesign,
    check_parallel_trends,
)
from diff_diff.prep import generate_survey_did_data
from diff_diff.practitioner import practitioner_next_steps

# Suppress numerical artifacts from survey variance computation with
# extreme weights. These are benign matmul edge cases, not methodology
# issues — results are unaffected. All other warnings come through.
warnings.filterwarnings("ignore", category=RuntimeWarning, module="diff_diff.survey")

try:
    import matplotlib.pyplot as plt

    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_MATPLOTLIB = True
except ImportError:
    HAS_MATPLOTLIB = False
    print("matplotlib not installed — plots will be skipped.")

2. Data Preparation#

We’ll generate synthetic brand tracking data that mirrors a real survey: 200 respondents across 8 waves, sampled from 5 geographic regions with cluster sampling and demographic weighting. The campaign launches at wave 5 for respondents in certain markets.

[ ]:

# Generate survey data with known treatment effect (~5 percentage points)
raw = generate_survey_did_data(
    n_units=200,
    n_periods=8,
    cohort_periods=[5],       # Campaign launches at wave 5
    never_treated_frac=0.6,   # ~60% of respondents are in control markets
    treatment_effect=5.0,     # True lift: 5 percentage points
    n_strata=5,               # 5 geographic regions
    psu_per_stratum=4,        # 4 sampling clusters per region
    weight_variation="high",  # Substantial demographic weighting
    informative_sampling=True,
    return_true_population_att=True,
    seed=46,
)

# Create the binary indicators that DiD needs
raw["campaign_respondent"] = (raw["first_treat"] > 0).astype(int)
raw["post_campaign"] = (raw["period"] >= 5).astype(int)

# Rename columns to business terms
data = raw.rename(columns={
    "unit": "respondent_id",
    "period": "wave",
    "outcome": "awareness",
    "stratum": "region",
    "psu": "cluster",
    "weight": "survey_weight",
    "first_treat": "campaign_start_wave",
    "treated": "campaign_active",
})

# Shift awareness to a realistic brand metric level (~45% baseline)
data["awareness"] = data["awareness"] + 45

# Create additional brand funnel metrics
# Effects attenuate down the funnel: awareness > consideration > purchase intent
rng = np.random.default_rng(seed=99)
data["consideration"] = 25 + (data["awareness"] - 45) * 0.6 + rng.normal(0, 1.0, len(data))
data["purchase_intent"] = 12 + (data["awareness"] - 45) * 0.3 + rng.normal(0, 0.8, len(data))

print(f"Dataset: {data.shape[0]} observations, {data['respondent_id'].nunique()} respondents, {data['wave'].nunique()} waves")
print(f"Campaign respondents: {data.groupby('respondent_id')['campaign_respondent'].first().sum()}")
print(f"Control respondents: {(~data.groupby('respondent_id')['campaign_respondent'].first().astype(bool)).sum()}")

[ ]:

# Unweighted average brand metrics by group and period (descriptive only)
summary = data.groupby(["campaign_respondent", "post_campaign"]).agg(
    awareness=("awareness", "mean"),
    consideration=("consideration", "mean"),
    purchase_intent=("purchase_intent", "mean"),
).round(1)
summary.index = summary.index.set_names(["Campaign Respondent", "Post Campaign"])
summary

3. Visual Inspection#

Before running any analysis, plot awareness over time for campaign vs. control markets. The key question: were the two groups trending similarly before the campaign launched?

[ ]:

trends = data.groupby(["wave", "campaign_respondent"])["awareness"].mean().unstack()
trends.columns = ["Control", "Campaign"]

if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 5))
    trends.plot(ax=ax, marker="o", linewidth=2)
    ax.axvline(x=4.5, color="gray", linestyle="--", alpha=0.7, label="Campaign Launch")
    ax.set_xlabel("Wave")
    ax.set_ylabel("Aided Awareness (%)")
    ax.set_title("Brand Awareness Over Time")
    ax.legend()
    plt.tight_layout()
    plt.show()
else:
    print(trends.to_string())

Before the campaign launched (waves 1-4), awareness was trending similarly in both groups. After launch (waves 5-8), campaign markets pulled ahead. This is exactly the pattern DiD is designed to measure.

4. Naive DiD (Ignoring Survey Design)#

First, run a standard DiD analysis that treats every survey response equally.

[ ]:

did_naive = DifferenceInDifferences()
results_naive = did_naive.fit(
    data,
    outcome="awareness",
    treatment="campaign_respondent",
    time="post_campaign",
)
print(results_naive)
print(f"\nThe campaign increased awareness by {results_naive.att:.1f} percentage points")
print(f"95% CI: ({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})")

This looks like a strong, precise result. But it treats every survey response as equally informative and ignores the sampling structure. Let’s see what happens when we account for the survey design.

5. Survey-Aware DiD#

Brand tracking surveys rarely use simple random sampling. Respondents are sampled in geographic clusters with demographic quotas and weighting. The SurveyDesign object tells diff-diff how the survey was conducted.
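
To see why weights matter for the estimate itself, here is a minimal sketch comparing an unweighted mean with a weighted mean. The responses, the weights, and the helper `weighted_mean` are hypothetical and independent of diff-diff; the point is only that oversampled respondents should count for less.

```python
def weighted_mean(values, weights):
    """Weighted mean: each response counts in proportion to its survey weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical responses: 1 = aware of the brand, 0 = not aware
aware = [1, 1, 1, 0, 0, 0]
# The first three respondents come from an oversampled group (weight 0.5);
# the last three come from an undersampled group (weight 1.5)
weights = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5]

unweighted = sum(aware) / len(aware)
weighted = weighted_mean(aware, weights)
print(f"Unweighted awareness: {unweighted:.0%}")
print(f"Weighted awareness:   {weighted:.0%}")
```

In this toy case the oversampled group happens to be more aware of the brand, so the unweighted figure overstates awareness in the population the sample represents.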

[ ]:

sd = SurveyDesign(
    weights="survey_weight",  # Accounts for demographic oversampling
    strata="region",          # Sample was drawn separately within each region
    psu="cluster",            # Respondents sampled in geographic clusters
    fpc="fpc",                # Finite population correction
)

did_survey = DifferenceInDifferences()
results_survey = did_survey.fit(
    data,
    outcome="awareness",
    treatment="campaign_respondent",
    time="post_campaign",
    survey_design=sd,
)

print(results_survey)
print(f"\nThe campaign increased awareness by {results_survey.att:.1f} percentage points")
print(f"95% CI: ({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})")

What Changed?#

Let’s compare the naive and survey-aware results side by side.

[ ]:

se_ratio = results_survey.se / results_naive.se

comparison = pd.DataFrame({
    "Naive": [
        f"{results_naive.att:.2f}",
        f"{results_naive.se:.3f}",
        f"({results_naive.conf_int[0]:.1f}, {results_naive.conf_int[1]:.1f})",
        f"{results_naive.p_value:.4f}",
    ],
    "Survey-Aware": [
        f"{results_survey.att:.2f}",
        f"{results_survey.se:.3f}",
        f"({results_survey.conf_int[0]:.1f}, {results_survey.conf_int[1]:.1f})",
        f"{results_survey.p_value:.4f}",
    ],
}, index=["Lift (pp)", "Std Error", "95% CI", "p-value"])

print(comparison.to_string())
print(f"\nSE inflation ratio: {se_ratio:.2f}x")
print(f"Survey-aware standard errors are {(se_ratio - 1) * 100:.0f}% larger than naive.")
print(f"\nThe lift estimate is similar, but the naive analysis makes you think")
print(f"you know it more precisely than you actually do.")

The standard errors more than doubled. Respondents within the same geographic cluster tend to answer similarly, so each response carries less independent information than the raw sample size suggests. The naive analysis was overconfident.
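
A rough way to reason about this loss of information is the Kish design effect for clustering, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intra-cluster correlation. The sketch below is a back-of-the-envelope approximation, not how diff-diff computes its variance: the value of rho is a hypothetical assumption chosen so the arithmetic lands near a 2x inflation, and the calculation ignores weights, stratification, and the repeated waves.

```python
import math


def cluster_design_effect(avg_cluster_size, icc):
    """Kish approximation for cluster sampling: deff = 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


n_respondents = 200   # respondents per wave in this tutorial
n_clusters = 20       # 5 regions x 4 clusters per region
icc = 0.35            # hypothetical intra-cluster correlation

avg_cluster_size = n_respondents / n_clusters
deff = cluster_design_effect(avg_cluster_size, icc)
n_effective = n_respondents / deff    # equivalent simple-random-sample size
se_inflation = math.sqrt(deff)        # factor by which standard errors grow

print(f"Design effect: {deff:.2f}")
print(f"Effective sample size: {n_effective:.0f} of {n_respondents}")
print(f"SE inflation: {se_inflation:.2f}x")
```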

In this case, both analyses agree the campaign worked — but the survey-aware confidence interval is much wider. In a closer call, ignoring the survey design could lead you to claim a significant result when the evidence is actually inconclusive.
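
The "closer call" scenario can be illustrated with a small sketch. All numbers here are hypothetical, and the interval uses a plain normal approximation (z = 1.96) rather than whatever reference distribution the survey-aware fit applies.

```python
def normal_ci(estimate, se, z=1.96):
    """Two-sided 95% interval under a normal approximation."""
    return (estimate - z * se, estimate + z * se)


lift = 2.0                    # hypothetical small lift, in percentage points
naive_se = 0.8                # hypothetical naive standard error
survey_se = naive_se * 2.1    # hypothetical inflation from the survey design

naive_ci = normal_ci(lift, naive_se)
survey_ci = normal_ci(lift, survey_se)

print(f"Naive 95% CI:        ({naive_ci[0]:.2f}, {naive_ci[1]:.2f})")
print(f"Survey-aware 95% CI: ({survey_ci[0]:.2f}, {survey_ci[1]:.2f})")
```

With the same point estimate, the naive interval excludes zero while the wider survey-aware interval does not, so the two analyses would lead to different conclusions.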

6. Multiple Brand Metrics#

Brand campaigns don’t just move awareness — they should also move consideration and purchase intent. Let’s measure the lift across the full brand funnel.

[ ]:

outcomes = ["awareness", "consideration", "purchase_intent"]
funnel_results = {}

for outcome in outcomes:
    did = DifferenceInDifferences()
    r = did.fit(
        data,
        outcome=outcome,
        treatment="campaign_respondent",
        time="post_campaign",
        survey_design=sd,
    )
    funnel_results[outcome] = r

# Results table
funnel_df = pd.DataFrame({
    "Metric": ["Awareness", "Consideration", "Purchase Intent"],
    "Lift (pp)": [funnel_results[o].att for o in outcomes],
    "SE": [funnel_results[o].se for o in outcomes],
    "95% CI Lower": [funnel_results[o].conf_int[0] for o in outcomes],
    "95% CI Upper": [funnel_results[o].conf_int[1] for o in outcomes],
    "p-value": [funnel_results[o].p_value for o in outcomes],
}).round(2)

print(funnel_df.to_string(index=False))

[ ]:

if HAS_MATPLOTLIB:
    metrics = ["Awareness", "Consideration", "Purchase\nIntent"]
    lifts = [funnel_results[o].att for o in outcomes]
    ci_low = [funnel_results[o].conf_int[0] for o in outcomes]
    ci_high = [funnel_results[o].conf_int[1] for o in outcomes]
    errors = [[l - lo for l, lo in zip(lifts, ci_low)],
              [hi - l for l, hi in zip(lifts, ci_high)]]

    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(metrics, lifts, color=["#2196F3", "#4CAF50", "#FF9800"],
                  yerr=errors, capsize=8, edgecolor="black", linewidth=0.5)
    ax.axhline(y=0, color="black", linewidth=0.5)
    ax.set_ylabel("Incremental Lift (percentage points)")
    ax.set_title("Campaign Impact Across the Brand Funnel")

    for bar, lift in zip(bars, lifts):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.3,
                f"+{lift:.1f}pp", ha="center", va="bottom", fontweight="bold")

    plt.tight_layout()
    plt.show()

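The campaign moved awareness the most, consideration less, and purchase intent the least. This is typical funnel attenuation: the message reached people, but only part of that gain carried through to consideration and purchase intent. In this synthetic data the pattern is built in, because consideration and purchase intent were constructed as 0.6 and 0.3 times the awareness deviation plus noise, so their lifts should be roughly 60% and 30% of the awareness lift. All three effects are statistically significant.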

7. Is This Result Trustworthy?#

Two diagnostic checks help validate the result.

Parallel Trends Check#

DiD assumes campaign and control groups would have continued trending the same way if the campaign hadn’t run. check_parallel_trends() is a quick informal check that compares pre-campaign slopes — it does not account for survey design, so treat it as a sanity check rather than a formal test. The formal robustness assessment comes from HonestDiD in Section 8.
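
To make "compares pre-campaign slopes" concrete, here is a minimal sketch of the idea using ordinary least squares on group means. It is an illustration under assumed numbers, not the implementation of check_parallel_trends(): the wave means are hypothetical, and the sketch produces no standard error or p-value.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


pre_waves = [1, 2, 3, 4]
# Hypothetical mean awareness per pre-campaign wave
campaign_means = [45.2, 45.6, 46.1, 46.4]
control_means = [44.8, 45.3, 45.6, 46.1]

campaign_slope = ols_slope(pre_waves, campaign_means)
control_slope = ols_slope(pre_waves, control_means)
trend_gap = campaign_slope - control_slope

print(f"Campaign slope: {campaign_slope:.2f} pp per wave")
print(f"Control slope:  {control_slope:.2f} pp per wave")
print(f"Difference:     {trend_gap:.2f} pp per wave")
```

A difference near zero is what similar pre-campaign trends look like; whether a given gap is small enough still depends on its sampling uncertainty.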

[ ]:

pt = check_parallel_trends(
    data,
    outcome="awareness",
    time="wave",
    treatment_group="campaign_respondent",
)

print(f"Pre-campaign trend difference: {pt['trend_difference']:.3f}")
print(f"p-value: {pt['p_value']:.3f}")
print(f"\nParallel trends {'consistent with the data' if pt['parallel_trends_plausible'] else 'NOT supported'}")
if pt["parallel_trends_plausible"]:
    print("Before the campaign, awareness was trending at a similar rate in both groups.")

Placebo Test#

Run the same DiD analysis on the pre-campaign period only, where no campaign effect should exist. If we find a “significant” effect here, something is wrong.

[ ]:

# Use waves 1-4 only; split at wave 3 as a "placebo" campaign launch
pre_data = data[data["wave"] <= 4].copy()
pre_data["placebo_post"] = (pre_data["wave"] >= 3).astype(int)

# Use survey_design here too — consistent with the main analysis
did_placebo = DifferenceInDifferences()
r_placebo = did_placebo.fit(
    pre_data,
    outcome="awareness",
    treatment="campaign_respondent",
    time="placebo_post",
    survey_design=sd,
)

print(f"Placebo lift: {r_placebo.att:.2f} pp (p = {r_placebo.p_value:.3f})")
if r_placebo.p_value > 0.05:
    print("No significant effect in the pre-campaign period — the method isn't picking up spurious patterns.")
else:
    print("WARNING: Significant placebo effect detected — investigate further.")

The informal trend check is consistent with parallel trends, and the survey-aware placebo test finds no effect where none should exist. Together these are supportive evidence, though neither formally proves the parallel trends assumption — it is always an untestable assumption about what would have happened.

For event study designs (Section 8 below), HonestDiD sensitivity analysis provides a formal assessment of how robust the result is to violations of this assumption.

Practitioner Guidance#

diff-diff includes an automated checklist based on the Baker et al. (2025) practitioner workflow. It suggests diagnostic steps based on your estimator and results.

Note: the code snippets in the output use placeholder column names — substitute your own.

[ ]:

practitioner_next_steps(results_survey, verbose=True)

8. Extension: Staggered Campaign Rollout#

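Many campaigns do not launch in all markets at once; they roll out in phases. Some markets go live in month 2, others in month 4. When this happens, basic DiD can give biased results, because markets that launched earlier end up acting as controls for markets that launch later while their own outcomes are still shifting as the campaign effect builds. The CallawaySantAnna estimator is designed for this setting: it estimates the effect separately for each launch cohort and then aggregates.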
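
The cohort-by-cohort idea can be sketched with plain arithmetic on group means. This is a simplified illustration, not the diff-diff implementation: the means are hypothetical, the helper `group_time_att` is an assumed name, each cohort is compared only with never-treated respondents, and the baseline is the last wave before that cohort's launch. Weighting, aggregation, and inference are all omitted.

```python
# Hypothetical mean awareness by wave for each launch cohort (0 = never treated)
cohort_means = {
    0: {1: 45.0, 2: 45.5, 3: 46.0, 4: 46.5, 5: 47.0, 6: 47.5},
    3: {1: 44.0, 2: 44.5, 3: 50.0, 4: 50.6, 5: 51.2, 6: 51.8},
    5: {1: 46.0, 2: 46.5, 3: 47.0, 4: 47.5, 5: 53.0, 6: 53.5},
}


def group_time_att(means, cohort, wave, control=0):
    """Change since the wave before launch, net of the never-treated change."""
    base = cohort - 1
    treated_change = means[cohort][wave] - means[cohort][base]
    control_change = means[control][wave] - means[control][base]
    return treated_change - control_change


att_3_3 = group_time_att(cohort_means, cohort=3, wave=3)  # wave-3 cohort at launch
att_3_5 = group_time_att(cohort_means, cohort=3, wave=5)  # same cohort two waves later
att_5_5 = group_time_att(cohort_means, cohort=5, wave=5)  # wave-5 cohort at launch

print(f"Cohort 3 at wave 3: {att_3_3:.1f} pp")
print(f"Cohort 3 at wave 5: {att_3_5:.1f} pp")
print(f"Cohort 5 at wave 5: {att_5_5:.1f} pp")
```

Because the wave-3 cohort is never used as a control for the wave-5 cohort, its growing effect cannot contaminate the later cohort's estimate.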

Let’s generate data where the campaign rolled out in two phases.

[ ]:

from diff_diff import CallawaySantAnna
from diff_diff.visualization import plot_event_study

# Campaign rolls out in two phases: some markets at wave 3, others at wave 5
stag_raw = generate_survey_did_data(
    n_units=200,
    n_periods=8,
    cohort_periods=[3, 5],
    never_treated_frac=0.4,
    treatment_effect=5.0,
    dynamic_effects=True,
    effect_growth=0.1,        # Effect builds 10% per wave (repeated exposure)
    n_strata=5,
    psu_per_stratum=4,
    weight_variation="high",
    informative_sampling=True,
    return_true_population_att=True,
    seed=42,
)

stag_data = stag_raw.rename(columns={
    "unit": "respondent_id", "period": "wave", "outcome": "awareness",
    "stratum": "region", "psu": "cluster", "weight": "survey_weight",
    "first_treat": "campaign_start_wave",
})
stag_data["awareness"] = stag_data["awareness"] + 45

print(f"Campaign cohorts: {sorted(stag_data['campaign_start_wave'].unique())}")
print(f"  Wave 3 launch: {(stag_data.groupby('respondent_id')['campaign_start_wave'].first() == 3).sum()} respondents")
print(f"  Wave 5 launch: {(stag_data.groupby('respondent_id')['campaign_start_wave'].first() == 5).sum()} respondents")
print(f"  Control:       {(stag_data.groupby('respondent_id')['campaign_start_wave'].first() == 0).sum()} respondents")

[ ]:

stag_sd = SurveyDesign(
    weights="survey_weight", strata="region", psu="cluster", fpc="fpc",
)

# base_period="universal" is required for valid HonestDiD sensitivity analysis —
# it uses a common reference period so pre-treatment coefficients are comparable.
cs = CallawaySantAnna(base_period="universal")
stag_results = cs.fit(
    stag_data,
    outcome="awareness",
    unit="respondent_id",
    time="wave",
    first_treat="campaign_start_wave",
    aggregate="event_study",
    survey_design=stag_sd,
)

print(stag_results)
print(f"\nOverall campaign lift: {stag_results.overall_att:.1f} pp")
print("\nEvent study effects (relative to campaign launch):")
print(stag_results.to_dataframe(level="event_study").round(2).to_string(index=False))

[ ]:

if HAS_MATPLOTLIB:
    fig, ax = plt.subplots(figsize=(10, 6))
    plot_event_study(
        stag_results,
        ax=ax,
        title="Campaign Effect Over Time (Staggered Rollout)",
        xlabel="Waves Relative to Campaign Launch",
        ylabel="Awareness Lift (pp)",
    )
    plt.tight_layout()
    plt.show()

The event study shows the campaign effect building over time — starting around 5pp at launch and growing to about 7pp with sustained exposure. Pre-campaign periods show no significant effects, consistent with the parallel trends assumption.

Sensitivity Analysis#

HonestDiD (Rambachan & Roth, 2023) tells us how much the parallel trends assumption would need to be violated for the result to disappear.
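
The relative-magnitude restriction can be illustrated with a deliberately simplified sketch. It assumes hypothetical event-study coefficients, looks only at the first post-launch period, and ignores sampling uncertainty, which the full method accounts for. It is not the diff-diff implementation, and the helper `relative_magnitude_bounds` is an assumed name. The idea: the post-launch departure from parallel trends is allowed to be at most M times the largest wave-to-wave departure seen before launch.

```python
def relative_magnitude_bounds(pre_coefs, first_post_effect, m_bar):
    """Bounds on the first post-period effect under a relative-magnitude limit.

    pre_coefs: pre-period event-study coefficients ordered by event time,
               ending with the reference period (normalized to 0.0).
    """
    # Largest change between consecutive pre-period coefficients
    max_pre_jump = max(abs(b - a) for a, b in zip(pre_coefs, pre_coefs[1:]))
    slack = m_bar * max_pre_jump
    return (first_post_effect - slack, first_post_effect + slack)


# Hypothetical coefficients for event times -3, -2, -1 (reference period)
pre_coefs = [0.3, -0.2, 0.0]
lower, upper = relative_magnitude_bounds(pre_coefs, first_post_effect=5.0, m_bar=1.0)
print(f"Effect at launch under M=1.0: between {lower:.1f} and {upper:.1f} pp")
```

Raising M widens the bounds; the value of M at which the lower bound reaches zero indicates how large a violation the result can tolerate.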

[ ]:

from diff_diff import compute_honest_did

honest = compute_honest_did(stag_results, method="relative_magnitude", M=1.0)

print(honest.summary())
print("\nHow to read this: M=1.0 allows post-campaign departures from parallel trends")
print("as large as the largest departure observed before launch. If the robust")
print("interval above excludes zero, the campaign effect remains positive in that case.")

9. Communicating Results to Leadership#

Here’s how to write up the finding for stakeholders:

[ ]:

r = results_survey  # The main 2x2 result
n_campaign = data.groupby("respondent_id")["campaign_respondent"].first().sum()
n_control = (~data.groupby("respondent_id")["campaign_respondent"].first().astype(bool)).sum()

print("=" * 70)
print("EXECUTIVE SUMMARY")
print("=" * 70)
print(f"""
The brand awareness campaign increased aided awareness by {r.att:.1f}
percentage points (95% CI: {r.conf_int[0]:.1f} to {r.conf_int[1]:.1f})
among {n_campaign} campaign-exposed respondents compared to {n_control}
control respondents.

This result accounts for the complex survey sampling design and is
supported by pre-campaign trend analysis and placebo testing.

Impact across the brand funnel:
  - Awareness:       +{funnel_results['awareness'].att:.1f} pp
  - Consideration:   +{funnel_results['consideration'].att:.1f} pp
  - Purchase Intent: +{funnel_results['purchase_intent'].att:.1f} pp

The effect attenuates down the funnel, suggesting the campaign
successfully raised awareness but further investment is needed to
convert awareness into purchase consideration.
""")

Key points for your write-up:

Report the survey-aware estimate, not the naive one, because its standard errors reflect the uncertainty implied by the sampling design
Include confidence intervals, not just point estimates — leadership should understand the range
Distinguish statistical significance (is the effect distinguishable from sampling noise?) from practical significance (is it big enough to matter?)
A 5pp lift in awareness from 46% to 51% may or may not justify the campaign spend — that’s a business judgment, not a statistical one

Summary#

What we covered:

Survey design matters: Ignoring the complex sampling structure left the naive standard errors at less than half their survey-aware size, creating false precision
DiD with survey data: SurveyDesign integrates directly with all diff-diff estimators — just pass survey_design=sd to .fit()
Brand funnel analysis: Measuring awareness, consideration, and purchase intent together reveals where the campaign effect attenuates
Diagnostics: Informal trend checks and survey-aware placebo tests provide supportive evidence; HonestDiD provides formal robustness assessment
Staggered rollouts: CallawaySantAnna handles campaigns that launch in phases, with event study plots showing how the effect builds over time
Sensitivity: HonestDiD quantifies how robust the result is to assumption violations

When to use this approach:

You have survey data collected before and after a campaign or intervention
The campaign ran in some markets/regions but not others
Randomized A/B testing wasn’t feasible
Your survey uses stratified sampling, clustering, or weighting

Related tutorials:

Tutorial 16: Survey DiD — deep dive into survey design theory, replicate weights, and design effect diagnostics
Tutorial 02: Staggered DiD — more on Callaway-Sant’Anna and staggered adoption designs
Tutorial 05: Honest DiD — full sensitivity analysis guide