diff_diff.balance_panel

diff_diff.balance_panel(data, unit_column, time_column, method='inner', fill_value=None)

Balance a panel dataset so that every unit is observed in every time period.

Parameters:
  • data (pd.DataFrame) – Unbalanced panel data.

  • unit_column (str) – Column name for unit identifier.

  • time_column (str) – Column name for time period.

  • method (str, default="inner") – Balancing method:

      - "inner": keep only units that appear in all periods (incomplete units are dropped).
      - "outer": include all unit-period combinations; combinations with no observation get NaN values.
      - "fill": include all unit-period combinations and fill the missing values (see fill_value).

  • fill_value (float, optional) – Value used to fill missing observations when method="fill". If None and method="fill", a column-specific forward fill is used instead.

Returns:

Balanced panel DataFrame.

Return type:

pd.DataFrame
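The description of method="fill" leaves some details open, so the following is a minimal pandas sketch under stated assumptions, not the diff_diff implementation. `fill_panel_sketch` is a hypothetical helper: it builds the full unit-period grid (as "outer" does), then either fills gaps with the constant `fill_value` or, when `fill_value` is None, forward-fills each non-key column within each unit. Treating "column-specific forward fill" as a within-unit forward fill is an assumption.

```python
import pandas as pd


def fill_panel_sketch(data, unit_column, time_column, fill_value=None):
    """Hypothetical sketch of method="fill"; not the diff_diff implementation."""
    # Build every unit-period combination, as method="outer" would.
    units = sorted(data[unit_column].unique())
    periods = sorted(data[time_column].unique())
    grid = pd.MultiIndex.from_product([units, periods], names=[unit_column, time_column])
    # Assumes each (unit, period) pair appears at most once in `data`.
    full = data.set_index([unit_column, time_column]).reindex(grid).reset_index()

    value_columns = [c for c in full.columns if c not in (unit_column, time_column)]
    if fill_value is not None:
        # Constant fill for every missing observation.
        full[value_columns] = full[value_columns].fillna(fill_value)
    else:
        # Assumption: forward fill each column separately, within each unit.
        full[value_columns] = full.groupby(unit_column)[value_columns].ffill()
    return full
```

With the `df` from the examples, `fill_panel_sketch(df, 'unit', 'period', fill_value=0)` sets y to 0 for unit 2 in period 3, while `fill_value=None` carries forward that unit's period-2 value of 21. Under this sketch, a gap in a unit's first period would stay NaN, because forward fill has no earlier value to carry.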

Examples

Keep only complete units:

>>> import pandas as pd
>>> from diff_diff import balance_panel
>>> df = pd.DataFrame({
...     'unit': [1, 1, 1, 2, 2, 3, 3, 3],
...     'period': [1, 2, 3, 1, 2, 1, 2, 3],
...     'y': [10, 11, 12, 20, 21, 30, 31, 32]
... })
>>> balanced = balance_panel(df, 'unit', 'period', method='inner')
>>> balanced['unit'].unique().tolist()
[1, 3]
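Unit 2 is dropped because it is never observed in period 3. The "inner" rule can be reproduced by hand as a cross-check: a unit is complete when its count of distinct periods equals the count of distinct periods in the whole panel. This is a plain pandas sketch that assumes "all periods" means every period seen anywhere in the data; it is not the library's own code.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': [1, 1, 1, 2, 2, 3, 3, 3],
    'period': [1, 2, 3, 1, 2, 1, 2, 3],
    'y': [10, 11, 12, 20, 21, 30, 31, 32],
})

# Number of distinct periods observed anywhere in the panel.
n_periods = df['period'].nunique()
# Distinct periods per unit; complete units match the panel-wide count.
periods_per_unit = df.groupby('unit')['period'].nunique()
complete_units = periods_per_unit[periods_per_unit == n_periods].index.tolist()
# Keep only the rows that belong to complete units.
inner = df[df['unit'].isin(complete_units)]
print(complete_units)  # [1, 3]
```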

Include all combinations:

>>> balanced = balance_panel(df, 'unit', 'period', method='outer')
>>> len(balanced)
9
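The "outer" result has 3 units × 3 periods = 9 rows, one more than the input: unit 2 in period 3, whose y is NaN. A plain pandas sketch (not the diff_diff implementation) builds the same grid with a cross merge, then left-merges the data onto it; `indicator=True` marks the added row as `left_only`. `how='cross'` requires pandas 1.2 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'unit': [1, 1, 1, 2, 2, 3, 3, 3],
    'period': [1, 2, 3, 1, 2, 1, 2, 3],
    'y': [10, 11, 12, 20, 21, 30, 31, 32],
})

# Cartesian product of distinct units and distinct periods.
grid = df[['unit']].drop_duplicates().merge(df[['period']].drop_duplicates(), how='cross')
# Left merge keeps every grid row; unmatched rows get NaN in 'y'.
outer = grid.merge(df, on=['unit', 'period'], how='left', indicator=True)
# Rows present only in the grid are the unit-period combinations the balancing added.
added = outer[outer['_merge'] == 'left_only']
print(len(outer))                                # 9
print(added[['unit', 'period']].values.tolist()) # [[2, 3]]
```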