diff_diff.wide_to_long

diff_diff.wide_to_long(data, value_columns, id_column, time_name='period', value_name='value', time_values=None)

Convert wide-format panel data to long format for DiD analysis.

Wide format has one row per unit, with a separate column for each time period. Long format has one row per unit-period combination.

Parameters:
  • data (pd.DataFrame) – Wide-format DataFrame with one row per unit.

  • value_columns (list of str) – Column names containing the outcome values for each period. These should be in chronological order.

  • id_column (str) – Column name for the unit identifier.

  • time_name (str, default="period") – Name for the new time period column.

  • value_name (str, default="value") – Name for the new value/outcome column.

  • time_values (list, optional) – Values to use for time periods. If None, uses 0, 1, 2, … Must have the same length as value_columns.

Returns:

Long-format DataFrame with one row per unit-period.

Return type:

pd.DataFrame

Examples

>>> import pandas as pd
>>> from diff_diff import wide_to_long
>>> wide_df = pd.DataFrame({
...     'firm_id': [1, 2, 3],
...     'sales_2019': [100, 150, 200],
...     'sales_2020': [110, 160, 210],
...     'sales_2021': [120, 170, 220]
... })
>>> long_df = wide_to_long(
...     wide_df,
...     value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
...     id_column='firm_id',
...     time_name='year',
...     value_name='sales',
...     time_values=[2019, 2020, 2021]
... )
>>> len(long_df)
9
>>> long_df.columns.tolist()
['firm_id', 'year', 'sales']
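
For intuition about the reshape itself, the sketch below reproduces the same wide-to-long conversion using only pandas. It is not the diff_diff implementation: the helper name melt_wide_panel is hypothetical, and the choices to keep only the id column and to sort rows by unit and period are assumptions made for illustration. It does follow the documented rules that time_values defaults to 0, 1, 2, … and must match value_columns in length.

```python
import pandas as pd


def melt_wide_panel(data, value_columns, id_column,
                    time_name="period", value_name="value", time_values=None):
    """Hypothetical helper: wide-to-long reshape built on DataFrame.melt."""
    # Default time labels are 0, 1, 2, ... in the order of value_columns.
    if time_values is None:
        time_values = list(range(len(value_columns)))
    if len(time_values) != len(value_columns):
        raise ValueError("time_values must have the same length as value_columns")

    # Map each wide column name to the time label it represents.
    column_to_time = dict(zip(value_columns, time_values))

    # Stack the value columns: one row per (unit, original column name).
    long_df = data.melt(
        id_vars=[id_column],
        value_vars=value_columns,
        var_name=time_name,
        value_name=value_name,
    )

    # Replace the original column names with their time labels.
    long_df[time_name] = long_df[time_name].map(column_to_time)

    # Assumption: order rows by unit, then period, for readability.
    return long_df.sort_values([id_column, time_name]).reset_index(drop=True)


wide_df = pd.DataFrame({
    "firm_id": [1, 2, 3],
    "sales_2019": [100, 150, 200],
    "sales_2020": [110, 160, 210],
    "sales_2021": [120, 170, 220],
})

long_df = melt_wide_panel(
    wide_df,
    value_columns=["sales_2019", "sales_2020", "sales_2021"],
    id_column="firm_id",
    time_name="year",
    value_name="sales",
    time_values=[2019, 2020, 2021],
)
```

Each of the three firms contributes one row per year, so the result has nine rows with the columns firm_id, year, and sales, matching the shape shown in the example above.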