diff_diff.validate_did_data

diff_diff.validate_did_data(data, outcome, treatment, time, unit=None, raise_on_error=True)

Validate that data is properly formatted for DiD analysis.

Checks for common data issues and provides informative error messages.

Parameters:
  • data (pd.DataFrame) – Data to validate.

  • outcome (str) – Name of outcome variable column.

  • treatment (str) – Name of treatment indicator column.

  • time (str) – Name of time/post indicator column.

  • unit (str, optional) – Name of unit identifier column (for panel data validation).

  • raise_on_error (bool, default=True) – If True, raises ValueError on validation failures. If False, returns validation results without raising.

Returns:

Validation results with keys:

  • valid – bool indicating whether the data passed all checks.

  • errors – list of error messages.

  • warnings – list of warning messages.

  • summary – dict with data summary statistics.

Return type:

dict

Examples

>>> import pandas as pd
>>> from diff_diff import validate_did_data
>>> df = pd.DataFrame({
...     'y': [1, 2, 3, 4],
...     'treated': [0, 0, 1, 1],
...     'post': [0, 1, 0, 1]
... })
>>> result = validate_did_data(df, 'y', 'treated', 'post', raise_on_error=False)
>>> result['valid']
True
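
When raise_on_error=False, the returned dict can be inspected instead of catching an exception. The sketch below shows one way to turn that dict into a short report. It relies only on the four documented keys; the helper name format_validation_report and the contents of example_result (the error message and the n_obs summary entry) are hypothetical stand-ins, not real library output.

```python
def format_validation_report(result):
    """Render a validation-result dict as a short plain-text report.

    Assumes only the documented keys: 'valid', 'errors', 'warnings', 'summary'.
    """
    status = "PASSED" if result["valid"] else "FAILED"
    lines = ["Validation " + status]
    lines.extend("  error: " + msg for msg in result["errors"])
    lines.extend("  warning: " + msg for msg in result["warnings"])
    # The layout of 'summary' is not specified in this reference,
    # so its entries are printed generically as key/value pairs.
    lines.extend("  {}: {}".format(key, value)
                 for key, value in result["summary"].items())
    return "\n".join(lines)


# Hand-written stand-in with the documented keys. The message and the
# summary entry are invented for illustration, not real library output.
example_result = {
    "valid": False,
    "errors": ["hypothetical error message"],
    "warnings": [],
    "summary": {"n_obs": 4},
}

report = format_validation_report(example_result)
print(report)
```

In practice the dict would come from validate_did_data(df, 'y', 'treated', 'post', raise_on_error=False). With the default raise_on_error=True, validation failures raise ValueError instead of being returned.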