diff_diff.aggregate_to_cohorts

Name: diff-diff
Author: diff-diff contributors

diff_diff.aggregate_to_cohorts(data, unit_column, time_column, treatment_column, outcome, covariates=None)

Aggregate unit-level data to treatment cohort means.

Useful for visualizing treated and control trends over time and for cohort-level analysis.

Parameters:

data (pd.DataFrame) – Unit-level panel data.
unit_column (str) – Name of unit identifier column.
time_column (str) – Name of time period column.
treatment_column (str) – Name of treatment indicator column.
outcome (str) – Name of outcome variable column.
covariates (list of str, optional) – Additional columns to aggregate; their means are computed alongside the outcome.

Returns:

Cohort-level data containing the mean outcome (and the mean of each covariate, if given) for every combination of treatment status and period.

Return type:

pd.DataFrame
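To make the aggregation concrete, the sketch below reproduces the idea with a plain pandas groupby over treatment status and period. It is a hypothetical re-implementation, not the library's code: the name `cohort_means_sketch` and the extra `n_units` column are illustrative assumptions, and the actual column names and ordering returned by `aggregate_to_cohorts` may differ.

```python
import pandas as pd


def cohort_means_sketch(data, unit_column, time_column, treatment_column,
                        outcome, covariates=None):
    """Hypothetical sketch: mean outcome (and covariates) per
    (treatment status, period) cell. Not the diff_diff implementation."""
    value_cols = [outcome] + list(covariates or [])
    grouped = data.groupby([treatment_column, time_column])
    means = grouped[value_cols].mean()
    # Illustrative extra column (assumption): distinct units in each cell.
    means["n_units"] = grouped[unit_column].nunique()
    return means.reset_index()


df = pd.DataFrame({
    'unit': [1, 1, 2, 2, 3, 3, 4, 4],
    'period': [0, 1, 0, 1, 0, 1, 0, 1],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'y': [10, 15, 12, 17, 8, 10, 9, 11],
})
result = cohort_means_sketch(df, 'unit', 'period', 'treated', 'y')
print(result)
```

With two treatment groups and two periods, the sketch yields four rows, one per cell, which matches the row count in the example below.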

Examples

>>> import pandas as pd
>>> from diff_diff import aggregate_to_cohorts
>>> df = pd.DataFrame({
...     'unit': [1, 1, 2, 2, 3, 3, 4, 4],
...     'period': [0, 1, 0, 1, 0, 1, 0, 1],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'y': [10, 15, 12, 17, 8, 10, 9, 11]
... })
>>> cohort_df = aggregate_to_cohorts(df, 'unit', 'period', 'treated', 'y')
>>> len(cohort_df)
4
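As one example of cohort-level analysis, the four cell means from a two-group, two-period panel are enough to compute the classic 2x2 difference-in-differences estimate. The sketch below uses plain pandas on the same example data rather than the frame returned by `aggregate_to_cohorts`, because that frame's column names are not documented here; treat it as an illustration of the arithmetic, not of the library's estimator.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': [1, 1, 2, 2, 3, 3, 4, 4],
    'period': [0, 1, 0, 1, 0, 1, 0, 1],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'y': [10, 15, 12, 17, 8, 10, 9, 11],
})

# Mean outcome per (treated, period) cell, reshaped to a 2x2 table:
# rows = treatment status, columns = period.
cell_means = df.groupby(['treated', 'period'])['y'].mean().unstack('period')

treated_change = cell_means.loc[1, 1] - cell_means.loc[1, 0]  # 16.0 - 11.0
control_change = cell_means.loc[0, 1] - cell_means.loc[0, 0]  # 10.5 - 8.5
did = treated_change - control_change
print(did)  # 3.0
```

The treated group's outcome rises by 5.0 while the control group's rises by 2.0, so the difference-in-differences is 3.0.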