Triple Difference (DDD)

Triple Difference estimator for designs where treatment requires two criteria.

This module implements the methodology from Ortiz-Villavicencio & Sant’Anna (2025), which correctly handles covariate adjustment in DDD designs. Unlike naive implementations that difference two DiDs, this approach provides valid estimates when identification requires conditioning on covariates.

When to use DDD instead of DiD:

DDD allows for violations of parallel trends that are:

  • Group-specific (e.g., economic shocks affecting treatment states)

  • Partition-specific (e.g., trends affecting women everywhere)

As long as these biases are additive, DDD differences them out. The key assumption is that, absent treatment, the differential trend between eligible and ineligible units would have been the same in the treated and control groups.
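To see why additive biases cancel, the sketch below builds the eight cell means of a hypothetical 2x2x2 design (group x partition x period) with a group-specific trend, a partition-specific trend, and a known effect, and then takes the difference of two DiDs. All constants and helper names (`cell_mean`, `did`, `GROUP_TREND`, and so on) are illustrative assumptions and not part of the library. This is the raw, covariate-free calculation only; the estimator documented here targets the case where covariates are needed.

```python
from itertools import product

# Illustrative constants (assumptions, not taken from the library)
BASELINE = 10.0
COMMON_TREND = 0.5      # post-period trend shared by every unit
GROUP_TREND = 2.0       # extra post-period trend in treated states (G=1)
PARTITION_TREND = 1.5   # extra post-period trend for eligible units (P=1)
TRUE_EFFECT = 3.0       # effect on G=1, P=1 units in the post period


def cell_mean(g: int, p: int, t: int) -> float:
    """Noise-free mean outcome for group g, partition p, period t."""
    level = BASELINE + 4.0 * g + 1.0 * p
    trend = t * (COMMON_TREND + GROUP_TREND * g + PARTITION_TREND * p)
    return level + trend + TRUE_EFFECT * g * p * t


means = {(g, p, t): cell_mean(g, p, t) for g, p, t in product((0, 1), repeat=3)}


def did(p: int) -> float:
    """DiD across groups (treated minus control) within partition p."""
    treated_change = means[1, p, 1] - means[1, p, 0]
    control_change = means[0, p, 1] - means[0, p, 0]
    return treated_change - control_change


did_eligible = did(1)     # true effect plus the group-specific trend
did_ineligible = did(0)   # the group-specific trend alone
ddd = did_eligible - did_ineligible

print(did_eligible, did_ineligible, ddd)
```

In this construction the DiD among eligible units alone is biased by the group-specific trend, while the partition-specific trend already drops out within each DiD. Subtracting the DiD among ineligible units removes the remaining group-specific term.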

Reference: Ortiz-Villavicencio, M., & Sant’Anna, P. H. C. (2025). Better Understanding Triple Differences Estimators. Working Paper. arXiv:2505.09942

TripleDifference

Main estimator class for Triple Difference designs.

class diff_diff.TripleDifference[source]

Bases: object

Triple Difference (DDD) estimator.

Estimates the Average Treatment effect on the Treated (ATT) when treatment requires satisfying two criteria: belonging to a treated group AND being in an eligible partition of the population. The DDD design was popularized by Gruber (1994) [2].

This implementation follows Ortiz-Villavicencio & Sant’Anna (2025) [1], which shows that naive DDD implementations (difference of two DiDs, three-way fixed effects) are invalid when covariates are needed for identification.

Parameters:
  • estimation_method (str, default="dr") –

    Estimation method to use:

    • “dr”: Doubly robust (recommended). Consistent if either the outcome model or propensity score model is correctly specified.

    • “reg”: Regression adjustment (outcome regression).

    • “ipw”: Inverse probability weighting.

  • robust (bool, default=True) – Whether to use heteroskedasticity-robust standard errors. Note: influence function-based SEs are inherently robust to heteroskedasticity, so this parameter has no effect. Retained for API compatibility.

  • cluster (str, optional) – Column name for cluster-robust standard errors. When provided, SEs are computed using the Liang-Zeger cluster-robust variance estimator on the influence function.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • pscore_trim (float, default=0.01) – Trimming threshold for propensity scores. Scores below this value or above (1 - pscore_trim) are clipped to avoid extreme weights.

  • rank_deficient_action (str, default="warn") –

    Action when design matrix is rank-deficient (linearly dependent columns):

    • “warn”: Issue warning and drop linearly dependent columns (default)

    • “error”: Raise ValueError

    • “silent”: Drop columns silently without warning

  • epv_threshold (float, default=10) – Events Per Variable threshold for propensity score logit. When the ratio of minority-class observations to predictor variables (excluding intercept) falls below this value, a warning is emitted (or ValueError raised if rank_deficient_action="error"). Based on Peduzzi et al. (1996). Only applies to IPW and DR estimation methods.

  • pscore_fallback (str, default="error") –

    Action when propensity score estimation fails:

    • “error”: Raise the exception (default)

    • “unconditional”: Fall back to unconditional propensity with a warning. For IPW, drops all covariates. For DR, the propensity model becomes unconditional but outcome regression still uses covariates.

    When rank_deficient_action="error", errors are always re-raised regardless of this setting.
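The two propensity-score safeguards described above (pscore_trim and the “unconditional” fallback) can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the library's internal code: `clip_pscores` and `unconditional_pscore` are hypothetical helper names, and the odds weight p / (1 - p) is used only as a common example of a weight that becomes extreme near the boundary.

```python
from typing import List, Sequence


def clip_pscores(pscores: Sequence[float], trim: float = 0.01) -> List[float]:
    """Clip each score into the interval [trim, 1 - trim]."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    lower, upper = trim, 1.0 - trim
    return [min(max(p, lower), upper) for p in pscores]


def unconditional_pscore(indicator: Sequence[int]) -> float:
    """Intercept-only propensity: the sample share of ones."""
    if len(indicator) == 0:
        raise ValueError("indicator must not be empty")
    return sum(indicator) / len(indicator)


raw = [0.001, 0.25, 0.995]
clipped = clip_pscores(raw)

# Odds weights grow without bound as the score approaches 1
raw_odds = [p / (1.0 - p) for p in raw]
clipped_odds = [p / (1.0 - p) for p in clipped]

# With no covariates, every unit receives the same score
fallback = unconditional_pscore([1, 0, 0, 1, 0])

print(clipped, fallback)
print(max(raw_odds), max(clipped_odds))
```

Clipping bounds the largest weight at the cost of some bias, and the unconditional fallback gives up covariate adjustment in the propensity model entirely.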

results_

Estimation results after calling fit().

Type:

TripleDifferenceResults

is_fitted_

Whether the model has been fitted.

Type:

bool

Examples

Basic usage with a DataFrame:

>>> import pandas as pd
>>> from diff_diff import TripleDifference
>>>
>>> # Data where treatment affects women (partition=1) in states
>>> # that enacted a policy (group=1)
>>> data = pd.DataFrame({
...     'outcome': [...],
...     'group': [1, 1, 0, 0, ...],      # 1=policy state, 0=control state
...     'partition': [1, 0, 1, 0, ...],  # 1=women, 0=men
...     'post': [0, 0, 1, 1, ...],       # 1=post-treatment period
... })
>>>
>>> # Fit using doubly robust estimation
>>> ddd = TripleDifference(estimation_method="dr")
>>> results = ddd.fit(
...     data,
...     outcome='outcome',
...     group='group',
...     partition='partition',
...     time='post'
... )
>>> print(results.att)  # ATT estimate

With covariates (handled properly, unlike in naive DDD):

>>> results = ddd.fit(
...     data,
...     outcome='outcome',
...     group='group',
...     partition='partition',
...     time='post',
...     covariates=['age', 'income']
... )

Notes

The DDD estimator is appropriate when:

  1. Treatment affects only units satisfying BOTH criteria:

     • Belonging to a treated group (G=1), e.g., states with a policy

     • Being in an eligible partition (P=1), e.g., women, low-income

  2. The DDD parallel trends assumption holds: the differential trend between eligible and ineligible partitions would have been the same across treated and control groups, absent treatment.

This is weaker than requiring separate parallel trends for two DiDs, as biases can cancel out in the differencing.

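References

[1] Ortiz-Villavicencio, M., & Sant’Anna, P. H. C. (2025). Better Understanding Triple Differences Estimators. Working Paper. arXiv:2505.09942

[2] Gruber, J. (1994). The Incidence of Mandated Maternity Benefits. American Economic Review, 84(3), 622-641.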

Methods

fit(data, outcome, group, partition, time[, ...])

Fit the Triple Difference model.

get_params()

Get estimator parameters (sklearn-compatible).

set_params(**params)

Set estimator parameters (sklearn-compatible).

__init__(estimation_method='dr', robust=True, cluster=None, alpha=0.05, pscore_trim=0.01, rank_deficient_action='warn', epv_threshold=10, pscore_fallback='error')[source]
Parameters:
  • estimation_method (str)

  • robust (bool)

  • cluster (str | None)

  • alpha (float)

  • pscore_trim (float)

  • rank_deficient_action (str)

  • epv_threshold (float)

  • pscore_fallback (str)

results_: TripleDifferenceResults | None
fit(data, outcome, group, partition, time, covariates=None, survey_design=None)[source]

Fit the Triple Difference model.

Parameters:
  • data (pd.DataFrame) – DataFrame containing all variables.

  • outcome (str) – Name of the outcome variable column.

  • group (str) – Name of the group indicator column (0/1). 1 = treated group (e.g., states that enacted policy). 0 = control group.

  • partition (str) – Name of the partition/eligibility indicator column (0/1). 1 = eligible partition (e.g., women, targeted demographic). 0 = ineligible partition.

  • time (str) – Name of the time period indicator column (0/1). 1 = post-treatment period. 0 = pre-treatment period.

  • covariates (list of str, optional) – List of covariate column names to adjust for. These are properly incorporated using the selected estimation method (unlike naive DDD implementations).

  • survey_design (SurveyDesign, optional) – Survey design specification for complex survey data. When provided, uses survey weights for estimation and Taylor Series Linearization (TSL) for variance estimation. Supported with all estimation methods (“reg”, “ipw”, “dr”).

Returns:

Object containing estimation results.

Return type:

TripleDifferenceResults

Raises:
  • ValueError – If required columns are missing or data validation fails.

  • NotImplementedError – If survey_design is used with wild_bootstrap inference.

get_params()[source]

Get estimator parameters (sklearn-compatible).

Returns:

Estimator parameters.

Return type:

Dict[str, Any]

set_params(**params)[source]

Set estimator parameters (sklearn-compatible).

Parameters:

**params – Estimator parameters.

Return type:

self
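The sklearn-style parameter convention is easiest to see on a toy class. The sketch below uses a hypothetical stand-in (`ParamsSketch`) with only two of the documented parameters; it is not the library's class, and raising ValueError on an unknown name is an assumption about reasonable behavior rather than documented behavior.

```python
from typing import Any, Dict


class ParamsSketch:
    """Hypothetical stand-in showing the get_params/set_params convention."""

    def __init__(self, estimation_method: str = "dr", alpha: float = 0.05):
        self.estimation_method = estimation_method
        self.alpha = alpha

    def get_params(self) -> Dict[str, Any]:
        """Return constructor arguments as a plain dictionary."""
        return {"estimation_method": self.estimation_method, "alpha": self.alpha}

    def set_params(self, **params: Any) -> "ParamsSketch":
        """Update parameters in place and return self for chaining."""
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self


est = ParamsSketch().set_params(alpha=0.10)

# Round trip: the parameter dict is enough to rebuild an equivalent estimator
clone = ParamsSketch(**est.get_params())
print(clone.get_params())
```

Because set_params returns the estimator itself, calls can be chained, and the dictionary from get_params is sufficient to construct an unfitted copy.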

summary()[source]

Get summary of estimation results.

Returns:

Formatted summary.

Return type:

str

print_summary()[source]

Print summary to stdout.

Return type:

None

TripleDifferenceResults

Results container for Triple Difference estimation.

class diff_diff.TripleDifferenceResults[source]

Bases: object

Results from Triple Difference (DDD) estimation.

Provides access to the estimated average treatment effect on the treated (ATT), standard errors, confidence intervals, and diagnostic information.

att

Average Treatment effect on the Treated (ATT). This is the effect on units in the treated group (G=1) and eligible partition (P=1) after treatment (T=1).

Type:

float

se

Standard error of the ATT estimate.

Type:

float
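When the cluster parameter is set, the standard error is described as a Liang-Zeger cluster-robust variance computed on the influence function. A minimal sketch of that idea follows, assuming a mean-zero influence function with one value per observation; `cluster_se` is a hypothetical helper, and no finite-sample correction is applied here, although the library may apply one.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def cluster_se(influence: Sequence[float], clusters: Sequence[Hashable]) -> float:
    """Standard error from influence values summed within clusters."""
    if len(influence) != len(clusters):
        raise ValueError("influence and clusters must have the same length")
    n = len(influence)
    sums: Dict[Hashable, float] = defaultdict(float)
    for psi, cluster_id in zip(influence, clusters):
        sums[cluster_id] += psi
    # Variance: sum of squared cluster totals divided by n squared
    return math.sqrt(sum(total * total for total in sums.values())) / n


psi = [0.8, 1.2, -0.5, -1.5, 0.4, -0.4]   # made-up influence values

# Treating every observation as its own cluster gives the unclustered SE
unclustered = cluster_se(psi, list(range(len(psi))))
clustered = cluster_se(psi, ["a", "a", "b", "b", "c", "c"])
print(unclustered, clustered)
```

In this made-up example the influence values within clusters "a" and "b" share a sign, so their totals are large and the clustered standard error exceeds the unclustered one.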

t_stat

T-statistic for the ATT estimate.

Type:

float

p_value

P-value for the null hypothesis that ATT = 0.

Type:

float

conf_int

Confidence interval for the ATT.

Type:

tuple[float, float]
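The test statistic, p-value, and confidence interval are all functions of att, se, and alpha. The sketch below shows one way they fit together, assuming a standard normal reference distribution; the library may instead use a t distribution, so treat `normal_inference` as a hypothetical illustration rather than the library's computation.

```python
from statistics import NormalDist
from typing import Tuple


def normal_inference(
    att: float, se: float, alpha: float = 0.05
) -> Tuple[float, float, Tuple[float, float]]:
    """Return (statistic, two-sided p-value, confidence interval)."""
    if se <= 0:
        raise ValueError("se must be positive")
    std_normal = NormalDist()
    stat = att / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(stat)))
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha=0.05
    return stat, p_value, (att - crit * se, att + crit * se)


# Made-up estimate and standard error
stat, p_value, (lower, upper) = normal_inference(att=2.0, se=0.8)
print(round(stat, 3), round(p_value, 4), round(lower, 3), round(upper, 3))
```

A smaller alpha widens the interval, which is why summary(alpha=...) can report a different interval from the same att and se.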

n_obs

Total number of observations used in estimation.

Type:

int

n_treated_eligible

Number of observations in treated group and eligible partition.

Type:

int

n_treated_ineligible

Number of observations in treated group and ineligible partition.

Type:

int

n_control_eligible

Number of observations in control group and eligible partition.

Type:

int

n_control_ineligible

Number of observations in control group and ineligible partition.

Type:

int
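The four cell counts partition the sample by group and partition. A small sketch of how such counts can be tallied is shown below; `cell_counts` is a hypothetical helper operating on plain dictionaries (the shape produced by `DataFrame.to_dict("records")`) and is not the library's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def cell_counts(
    rows: Iterable[Mapping[str, int]], group: str, partition: str
) -> Dict[str, int]:
    """Count observations in each group x partition cell."""
    counts = Counter((row[group], row[partition]) for row in rows)
    return {
        "n_treated_eligible": counts[1, 1],
        "n_treated_ineligible": counts[1, 0],
        "n_control_eligible": counts[0, 1],
        "n_control_ineligible": counts[0, 0],
    }


# Made-up observations; column names are illustrative
rows = [
    {"group": 1, "partition": 1},
    {"group": 1, "partition": 1},
    {"group": 1, "partition": 0},
    {"group": 0, "partition": 1},
    {"group": 0, "partition": 0},
    {"group": 0, "partition": 0},
]
counts = cell_counts(rows, group="group", partition="partition")
print(counts)
```

An empty cell would show up as a zero here, which is a useful check before fitting because all four comparisons are needed for the DDD contrast.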

estimation_method

Estimation method used: “dr” (doubly robust), “reg” (regression adjustment), or “ipw” (inverse probability weighting).

Type:

str

alpha

Significance level used for confidence intervals.

Type:

float

Methods

summary([alpha])

Generate a formatted summary of the estimation results.

print_summary([alpha])

Print the summary to stdout.

to_dict()

Convert results to a dictionary.

to_dataframe()

Convert results to a pandas DataFrame.

att: float
se: float
t_stat: float
p_value: float
conf_int: Tuple[float, float]
n_obs: int
n_treated_eligible: int
n_treated_ineligible: int
n_control_eligible: int
n_control_ineligible: int
estimation_method: str
alpha: float = 0.05
group_means: Dict[str, float] | None = None
pscore_stats: Dict[str, float] | None = None
r_squared: float | None = None
covariate_balance: DataFrame | None = None
inference_method: str = 'analytical'
n_bootstrap: int | None = None
n_clusters: int | None = None
survey_metadata: Any | None = None
epv_diagnostics: Dict[int, Dict[str, Any]] | None = None
epv_threshold: float = 10
pscore_fallback: str = 'error'
__repr__()[source]

Concise string representation.

Return type:

str

summary(alpha=None)[source]

Generate a formatted summary of the estimation results.

Parameters:

alpha (float, optional) – Significance level for confidence intervals. Defaults to the alpha used during estimation.

Returns:

Formatted summary table.

Return type:

str

print_summary(alpha=None)[source]

Print the summary to stdout.

Parameters:

alpha (float | None)

Return type:

None

to_dict()[source]

Convert results to a dictionary.

Returns:

Dictionary containing all estimation results.

Return type:

Dict[str, Any]

to_dataframe()[source]

Convert results to a pandas DataFrame.

Returns:

DataFrame with estimation results.

Return type:

pd.DataFrame

property is_significant: bool

Check if the ATT is statistically significant at the alpha level.

property significance_stars: str

Return significance stars based on p-value.
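The two properties above reduce to simple comparisons on the p-value. The sketch below is a guess at the logic, not the library's code: the star cutoffs (0.001, 0.01, 0.05) are a common convention and an assumption here, since the documentation does not state them.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """True when the p-value falls below the chosen significance level."""
    return p_value < alpha


def significance_stars(p_value: float) -> str:
    """Map a p-value to stars using conventional (assumed) cutoffs."""
    for cutoff, stars in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value < cutoff:
            return stars
    return ""


for p in (0.0005, 0.03, 0.2):
    print(p, is_significant(p), repr(significance_stars(p)))
```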

epv_summary(show_all=False)[source]

Return per-subgroup EPV diagnostics as a DataFrame.

Parameters:

show_all (bool, default False) – If False, only show subgroups with low EPV. If True, show all.

Returns:

Columns: subgroup, epv, n_events, n_params, is_low.

Return type:

pd.DataFrame
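The EPV diagnostic follows the definition given for epv_threshold: the number of minority-class observations divided by the number of predictors, excluding the intercept. A minimal sketch of that ratio is shown below; `events_per_variable` is a hypothetical helper, and how the library forms the subgroups it reports is not shown here.

```python
from typing import Sequence


def events_per_variable(indicator: Sequence[int], n_predictors: int) -> float:
    """Minority-class count divided by the number of non-intercept predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    n_ones = sum(1 for value in indicator if value == 1)
    n_events = min(n_ones, len(indicator) - n_ones)   # size of the smaller class
    return n_events / n_predictors


# Made-up sample: 24 ones among 200 observations, 4 covariates
indicator = [1] * 24 + [0] * 176
epv = events_per_variable(indicator, n_predictors=4)
is_low = epv < 10   # 10 is the documented default for epv_threshold
print(epv, is_low)
```

With few minority-class observations per covariate, the propensity score logit can be unstable, which is the situation the warning is meant to flag.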

__init__(att, se, t_stat, p_value, conf_int, n_obs, n_treated_eligible, n_treated_ineligible, n_control_eligible, n_control_ineligible, estimation_method, alpha=0.05, group_means=None, pscore_stats=None, r_squared=None, covariate_balance=None, inference_method='analytical', n_bootstrap=None, n_clusters=None, survey_metadata=None, epv_diagnostics=None, epv_threshold=10, pscore_fallback='error')
Return type:

None

Convenience Function

diff_diff.triple_difference(data, outcome, group, partition, time, covariates=None, estimation_method='dr', robust=True, cluster=None, alpha=0.05, rank_deficient_action='warn', epv_threshold=10, pscore_fallback='error', survey_design=None)[source]#

Estimate Triple Difference (DDD) treatment effect.

Convenience function that creates a TripleDifference estimator and fits it to the data in one step.

Parameters:
  • data (pd.DataFrame) – DataFrame containing all variables.

  • outcome (str) – Name of the outcome variable column.

  • group (str) – Name of the group indicator column (0/1). 1 = treated group (e.g., states that enacted policy).

  • partition (str) – Name of the partition/eligibility indicator column (0/1). 1 = eligible partition (e.g., women, targeted demographic).

  • time (str) – Name of the time period indicator column (0/1). 1 = post-treatment period.

  • covariates (list of str, optional) – List of covariate column names to adjust for.

  • estimation_method (str, default="dr") – Estimation method: “dr” (doubly robust), “reg” (regression), or “ipw” (inverse probability weighting).

  • robust (bool, default=True) – Whether to use heteroskedasticity-robust standard errors. Note: influence function-based SEs are inherently robust to heteroskedasticity, so this parameter has no effect. Retained for API compatibility.

  • cluster (str, optional) – Column name for cluster-robust standard errors.

  • alpha (float, default=0.05) – Significance level for confidence intervals.

  • rank_deficient_action (str, default="warn") –

    Action when design matrix is rank-deficient:

    • “warn”: Issue warning and drop linearly dependent columns (default)

    • “error”: Raise ValueError

    • “silent”: Drop columns silently without warning

  • epv_threshold (float, default=10) – Events Per Variable threshold for propensity score logit.

  • pscore_fallback (str, default="error") –

    Action when propensity score estimation fails:

    • “error”: Raise (default)

    • “unconditional”: Fall back to unconditional propensity

  • survey_design (SurveyDesign, optional) – Survey design specification for complex survey data.

Returns:

Object containing estimation results.

Return type:

TripleDifferenceResults

Examples

>>> from diff_diff import triple_difference
>>> results = triple_difference(
...     data,
...     outcome='earnings',
...     group='policy_state',
...     partition='female',
...     time='post_policy',
...     covariates=['age', 'education']
... )
>>> print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")

Estimation Methods

The estimator supports three estimation methods:

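======  =============================  ================================================================================
Method  Description                    When to use
======  =============================  ================================================================================
"dr"    Doubly robust                  Recommended. Consistent if either the outcome or the propensity model is correct
"reg"   Regression adjustment          Simple outcome regression with full interactions
"ipw"   Inverse probability weighting  When the propensity score model is well specified
======  =============================  ================================================================================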

Example Usage

Basic usage:

from diff_diff import TripleDifference

ddd = TripleDifference(estimation_method='dr')
results = ddd.fit(
    data,
    outcome='wages',
    group='policy_state',       # 1=state enacted policy, 0=control state
    partition='female',         # 1=women (affected by policy), 0=men
    time='post'                 # 1=post-policy, 0=pre-policy
)
results.print_summary()

With covariates:

results = ddd.fit(
    data,
    outcome='wages',
    group='policy_state',
    partition='female',
    time='post',
    covariates=['age', 'education', 'experience']
)

Using the convenience function:

from diff_diff import triple_difference

results = triple_difference(
    data,
    outcome='wages',
    group='policy_state',
    partition='female',
    time='post',
    estimation_method='dr'
)