diff_diff.validate_did_data
- diff_diff.validate_did_data(data, outcome, treatment, time, unit=None, raise_on_error=True)
Validate that data is properly formatted for difference-in-differences (DiD) analysis.
Checks for common data issues and provides informative error messages.
- Parameters:
data (pd.DataFrame) – Data to validate.
outcome (str) – Name of outcome variable column.
treatment (str) – Name of treatment indicator column.
time (str) – Name of time/post indicator column.
unit (str, optional) – Name of unit identifier column (for panel data validation).
raise_on_error (bool, default=True) – If True, raises ValueError on validation failures. If False, returns validation results without raising.
- Returns:
Validation results with keys:
  - valid: bool indicating whether the data passed all checks
  - errors: list of error messages
  - warnings: list of warning messages
  - summary: dict of data summary statistics
- Return type:
dict
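The checks themselves are not spelled out here. The following is a minimal, hypothetical sketch of the kind of validation that could produce a result dict with the documented keys (valid, errors, warnings, summary) and honor `raise_on_error`. The function name `sketch_validate_did_data`, the specific checks (columns present, numeric outcome, 0/1 indicators, missing values) and the summary fields are all assumptions for illustration, not the diff_diff implementation.

```python
import pandas as pd


def sketch_validate_did_data(data, outcome, treatment, time, raise_on_error=True):
    """Hypothetical sketch of DiD data validation; NOT the diff_diff code."""
    errors = []
    warnings = []

    # Every named column must exist before any other check can run.
    missing = [c for c in (outcome, treatment, time) if c not in data.columns]
    if missing:
        errors.append(f"Missing columns: {missing}")
    else:
        # Assumption: the outcome must be numeric to compute group/period means.
        if not pd.api.types.is_numeric_dtype(data[outcome]):
            errors.append(f"Outcome column '{outcome}' is not numeric")
        # Assumption: treatment and time indicators are coded 0/1.
        for col in (treatment, time):
            values = set(data[col].dropna().unique())
            if not values <= {0, 1}:
                errors.append(f"Column '{col}' must contain only 0 and 1")
        # Assumption: missing values are reported as a warning, not an error.
        n_missing = int(data[[outcome, treatment, time]].isna().sum().sum())
        if n_missing:
            warnings.append(f"{n_missing} missing values in required columns")

    # Hypothetical summary fields.
    summary = {"n_obs": len(data)}
    if not missing:
        summary["n_treated"] = int((data[treatment] == 1).sum())
        summary["n_control"] = int((data[treatment] == 0).sum())

    result = {
        "valid": not errors,
        "errors": errors,
        "warnings": warnings,
        "summary": summary,
    }
    # Mirror the documented raise_on_error behavior: raise ValueError on failure,
    # otherwise hand the collected results back to the caller.
    if errors and raise_on_error:
        raise ValueError("Data validation failed:\n" + "\n".join(errors))
    return result
```

Collecting every problem before deciding whether to raise lets a caller with `raise_on_error=False` see all failures at once instead of only the first one.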
Examples
>>> import pandas as pd
>>> from diff_diff import validate_did_data
>>> df = pd.DataFrame({
...     'y': [1, 2, 3, 4],
...     'treated': [0, 0, 1, 1],
...     'post': [0, 1, 0, 1]
... })
>>> result = validate_did_data(df, 'y', 'treated', 'post', raise_on_error=False)
>>> result['valid']
True
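The `unit` parameter enables panel data validation, but the exact panel checks are not listed. As a hedged illustration only, the hypothetical helper `sketch_panel_checks` below shows checks one might reasonably run on panel data: duplicate (unit, time) rows and a treatment indicator that varies within a unit are treated as errors, and an unbalanced panel as a warning. Which checks diff_diff actually performs, and how it classifies them, is an assumption here.

```python
import pandas as pd


def sketch_panel_checks(data, treatment, time, unit):
    """Hypothetical panel checks; not taken from diff_diff."""
    errors = []
    warnings = []

    # Each unit should appear at most once per time period.
    n_dupes = int(data.duplicated(subset=[unit, time]).sum())
    if n_dupes:
        errors.append(f"{n_dupes} duplicate (unit, time) rows")

    # In a classic 2x2 design the treatment indicator marks the group,
    # so it should not vary within a unit.
    per_unit_treat = data.groupby(unit)[treatment].nunique()
    varying = per_unit_treat[per_unit_treat > 1].index.tolist()
    if varying:
        errors.append(f"Treatment varies within units: {varying}")

    # A balanced panel observes every unit in every period.
    per_unit_periods = data.groupby(unit)[time].nunique()
    n_periods = data[time].nunique()
    unbalanced = per_unit_periods[per_unit_periods < n_periods].index.tolist()
    if unbalanced:
        warnings.append(f"Units missing some periods: {unbalanced}")

    return errors, warnings
```

For example, a two-unit panel where unit 1 switches its treatment value between periods and unit 2 is observed only once would yield one error and one warning.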