diff_diff.balance_panel
- diff_diff.balance_panel(data, unit_column, time_column, method='inner', fill_value=None)
Balance a panel dataset to ensure all units have all time periods.
- Parameters:
data (pd.DataFrame) – Unbalanced panel data.
unit_column (str) – Column name for unit identifier.
time_column (str) – Column name for time period.
method (str, default="inner") – Balancing method:
  - "inner": Keep only units that appear in all periods (drops units)
  - "outer": Include all unit-period combinations (creates NaN)
  - "fill": Include all combinations and fill missing values
fill_value (float, optional) – Value used to fill missing observations when method="fill". If None with method="fill", uses column-specific forward fill.
- Returns:
Balanced panel DataFrame.
- Return type:
pd.DataFrame
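To make the "inner" and "outer" options concrete, here is a minimal sketch in plain pandas. It is not the library's implementation: the helper names balance_inner_sketch and balance_outer_sketch are hypothetical, and the sketch assumes each (unit, period) pair appears at most once in the data.

```python
import pandas as pd


def balance_inner_sketch(data, unit_column, time_column):
    """Hypothetical helper: keep only units observed in every period."""
    n_periods = data[time_column].nunique()
    periods_per_unit = data.groupby(unit_column)[time_column].nunique()
    complete_units = periods_per_unit[periods_per_unit == n_periods].index
    return data[data[unit_column].isin(complete_units)].reset_index(drop=True)


def balance_outer_sketch(data, unit_column, time_column):
    """Hypothetical helper: build the full unit x period grid.

    Unit-period pairs absent from the input get NaN in the other columns.
    Assumes (unit, period) pairs are unique in the input.
    """
    full_index = pd.MultiIndex.from_product(
        [data[unit_column].unique(), data[time_column].unique()],
        names=[unit_column, time_column],
    )
    return (
        data.set_index([unit_column, time_column])
        .reindex(full_index)
        .reset_index()
    )


df = pd.DataFrame({
    'unit': [1, 1, 1, 2, 2, 3, 3, 3],
    'period': [1, 2, 3, 1, 2, 1, 2, 3],
    'y': [10, 11, 12, 20, 21, 30, 31, 32],
})
inner = balance_inner_sketch(df, 'unit', 'period')
outer = balance_outer_sketch(df, 'unit', 'period')
```

In this data, unit 2 has no observation for period 3, so the inner sketch drops unit 2 entirely, while the outer sketch keeps it and adds one row whose y is NaN.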
Examples
Keep only complete units:
>>> import pandas as pd
>>> from diff_diff import balance_panel
>>> df = pd.DataFrame({
...     'unit': [1, 1, 1, 2, 2, 3, 3, 3],
...     'period': [1, 2, 3, 1, 2, 1, 2, 3],
...     'y': [10, 11, 12, 20, 21, 30, 31, 32]
... })
>>> balanced = balance_panel(df, 'unit', 'period', method='inner')
>>> balanced['unit'].unique().tolist()
[1, 3]
Include all combinations:
>>> balanced = balance_panel(df, 'unit', 'period', method='outer')
>>> len(balanced)
9
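Fill missing values:

The "fill" method has no example above, so the following is a rough sketch of the idea in plain pandas rather than a call into the library. The helper balance_fill_sketch is hypothetical, and it assumes that "column-specific forward fill" means carrying each unit's last observed value forward in time; the library's actual behavior may differ.

```python
import pandas as pd


def balance_fill_sketch(data, unit_column, time_column, fill_value=None):
    """Hypothetical helper: full unit x period grid with gaps filled.

    With fill_value set, every missing cell gets that constant.
    With fill_value=None, values are forward-filled within each unit
    (an assumption about what "forward fill" means here).
    """
    periods = sorted(data[time_column].unique())
    full_index = pd.MultiIndex.from_product(
        [data[unit_column].unique(), periods],
        names=[unit_column, time_column],
    )
    grid = data.set_index([unit_column, time_column]).reindex(full_index)
    if fill_value is not None:
        grid = grid.fillna(fill_value)
    else:
        # Group on the unit level so values never leak across units.
        grid = grid.groupby(level=0).ffill()
    return grid.reset_index()


df = pd.DataFrame({
    'unit': [1, 1, 1, 2, 2, 3, 3, 3],
    'period': [1, 2, 3, 1, 2, 1, 2, 3],
    'y': [10, 11, 12, 20, 21, 30, 31, 32],
})
forward_filled = balance_fill_sketch(df, 'unit', 'period')
zero_filled = balance_fill_sketch(df, 'unit', 'period', fill_value=0)
```

Under these assumptions, the missing row for unit 2 in period 3 takes y = 21 (the period 2 value) when forward-filled, and y = 0 when fill_value=0 is passed. Note that a forward fill cannot fill a gap in a unit's first period, since there is no earlier value to carry forward.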