diff_diff.aggregate_to_cohorts

diff_diff.aggregate_to_cohorts(data, unit_column, time_column, treatment_column, outcome, covariates=None)

Aggregate unit-level data to treatment cohort means.

Useful for visualization (for example, plotting treated and control group means over time) and for cohort-level analysis.

Parameters:
  • data (pd.DataFrame) – Unit-level panel data.

  • unit_column (str) – Name of unit identifier column.

  • time_column (str) – Name of time period column.

  • treatment_column (str) – Name of treatment indicator column.

  • outcome (str) – Name of outcome variable column.

  • covariates (list of str, optional) – Additional columns to aggregate; the mean of each is computed alongside the outcome.

Returns:

Cohort-level data with mean outcomes by treatment status and period.

Return type:

pd.DataFrame

Examples

>>> import pandas as pd
>>> from diff_diff import aggregate_to_cohorts
>>> df = pd.DataFrame({
...     'unit': [1, 1, 2, 2, 3, 3, 4, 4],
...     'period': [0, 1, 0, 1, 0, 1, 0, 1],
...     'treated': [1, 1, 1, 1, 0, 0, 0, 0],
...     'y': [10, 15, 12, 17, 8, 10, 9, 11]
... })
>>> cohort_df = aggregate_to_cohorts(df, 'unit', 'period', 'treated', 'y')
>>> len(cohort_df)
4
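
To see why the example yields four rows, the aggregation can be approximated with plain pandas. This is a rough sketch, not the library's implementation: it assumes the function is equivalent to a mean over each (treatment status, period) cell. The names `cohort_means`, `cell`, and `did` are illustrative only, and the column names returned by `aggregate_to_cohorts` itself may differ.

```python
import pandas as pd

# Same toy panel as in the example above: 4 units observed in 2 periods.
df = pd.DataFrame({
    "unit": [1, 1, 2, 2, 3, 3, 4, 4],
    "period": [0, 1, 0, 1, 0, 1, 0, 1],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "y": [10, 15, 12, 17, 8, 10, 9, 11],
})

# Assumption: cohort aggregation amounts to averaging the outcome
# within each (treatment status, period) cell.
cohort_means = df.groupby(["treated", "period"], as_index=False)["y"].mean()

# Two treatment groups x two periods = four cohort-level rows.
print(len(cohort_means))  # 4

# With only four cells, the 2x2 difference-in-differences can be read
# directly off the cohort means.
cell = cohort_means.set_index(["treated", "period"])["y"]
did = float(
    (cell.loc[(1, 1)] - cell.loc[(1, 0)])
    - (cell.loc[(0, 1)] - cell.loc[(0, 0)])
)
print(did)  # 3.0
```

The treated group's mean rises from 11 to 16 and the control group's from 8.5 to 10.5, so the difference in changes is 5 - 2 = 3.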