diff_diff.aggregate_to_cohorts
- diff_diff.aggregate_to_cohorts(data, unit_column, time_column, treatment_column, outcome, covariates=None)
Aggregate unit-level data to treatment cohort means.
Useful for visualization and cohort-level analysis.
- Parameters:
data (pd.DataFrame) – Unit-level panel data.
unit_column (str) – Name of unit identifier column.
time_column (str) – Name of time period column.
treatment_column (str) – Name of treatment indicator column.
outcome (str) – Name of outcome variable column.
covariates (list of str, optional) – Additional columns to aggregate (means are computed).
- Returns:
Cohort-level data with mean outcomes by treatment status and period.
- Return type:
pd.DataFrame
Examples
>>> import pandas as pd
>>> from diff_diff import aggregate_to_cohorts
>>> df = pd.DataFrame({
...     'unit': [1, 1, 2, 2, 3, 3, 4, 4],
...     'period': [0, 1, 0, 1, 0, 1, 0, 1],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'y': [10, 15, 12, 17, 8, 10, 9, 11]
... })
>>> cohort_df = aggregate_to_cohorts(df, 'unit', 'period', 'treated', 'y')
>>> len(cohort_df)
4
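To make the description "mean outcomes by treatment status and period" concrete, the following is a minimal sketch in plain pandas. It is not the library's implementation: `cohort_means_sketch` is a hypothetical stand-in, and it assumes the aggregation amounts to grouping by the treatment and time columns and averaging. The real function's output columns may differ. The sketch uses the same toy panel as the example above, which is why it also yields four rows (two treatment groups times two periods).

```python
import pandas as pd

# Same toy panel as in the example above.
df = pd.DataFrame({
    "unit": [1, 1, 2, 2, 3, 3, 4, 4],
    "period": [0, 1, 0, 1, 0, 1, 0, 1],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "y": [10, 15, 12, 17, 8, 10, 9, 11],
})


def cohort_means_sketch(data, time_column, treatment_column, outcome, covariates=None):
    """Hypothetical stand-in (an assumption, not the diff_diff code):
    mean of the outcome and any covariates per treatment status and period."""
    value_columns = [outcome] + list(covariates or [])
    return (
        data.groupby([treatment_column, time_column], as_index=False)[value_columns]
        .mean()
    )


sketch_df = cohort_means_sketch(df, "period", "treated", "y")

# Pivot to a 2x2 table of means (rows: treatment status, columns: period).
means = sketch_df.pivot(index="treated", columns="period", values="y")

# Simple 2x2 difference-in-differences computed from the cohort means:
# (treated post - treated pre) - (control post - control pre).
did = (means.loc[1, 1] - means.loc[1, 0]) - (means.loc[0, 1] - means.loc[0, 0])
```

With this data the treated group's mean rises from 11.0 to 16.0 and the control group's from 8.5 to 10.5, so `did` evaluates to 3.0. A table like `means` is also a convenient input for plotting group trends over time.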