diff_diff.profile_panel

diff_diff.profile_panel(df, *, unit, time, treatment, outcome)

Describe the structure of a DiD panel.

Reports structural facts: balance, treatment-type classification, outcome characteristics, and factual alerts. The profile is descriptive, not prescriptive: it says what is, never what to do about it. Estimator selection is left to the caller.

Parameters:
  • df (pandas.DataFrame) – Long-format panel data containing the four named columns.

  • unit (str) – Column identifying the cross-sectional unit.

  • time (str) – Column identifying the time period.

  • treatment (str) – Column holding the treatment indicator or dose. See Notes for the classification rules.

  • outcome (str) – Column holding the outcome variable.

Returns:

Frozen dataclass. Call .to_dict() for a JSON-serializable view.

Return type:

PanelProfile

Raises:

ValueError – If any of the four column names is not present in df.

Examples

>>> import pandas as pd
>>> from diff_diff import profile_panel
>>> df = pd.DataFrame({
...     "u":  [1, 1, 2, 2],
...     "t":  [0, 1, 0, 1],
...     "tr": [0, 0, 1, 1],
...     "y":  [0.1, 0.2, 0.1, 0.9],
... })
>>> profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y")
>>> profile.is_balanced
True
>>> profile.treatment_type
'binary_absorbing'

Notes

Classification rules for treatment_type:

  • "binary_absorbing": numeric treatment whose observed non-NaN values are a subset of \(\{0, 1\}\) (one or two distinct values) AND each unit’s treatment sequence (ordered by time) is weakly monotone non-decreasing. All-zero and all-one panels are valid degenerate cases.

  • "binary_non_absorbing": numeric treatment whose observed non-NaN values are a subset of \(\{0, 1\}\), with both values observed, where at least one unit switches from 1 back to 0.

  • "continuous": numeric treatment with more than two distinct values, or a two-valued numeric treatment whose values are not both in \(\{0, 1\}\) (matches the ContinuousDiD convention).

  • "categorical": non-numeric dtype (object / category) or a column that is entirely NaN.
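
The four rules above can be sketched in plain Python. This is an illustrative reading of the rules, not the library's implementation: classify_treatment is a hypothetical helper, rows are (unit, time, treatment) tuples rather than a DataFrame, and a single constant value outside \(\{0, 1\}\) (a case the rules do not cover) is treated as "continuous" by assumption.

```python
import math
from collections import defaultdict


def _is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_treatment(rows):
    """Hypothetical helper: rows is an iterable of (unit, time, treatment)."""
    rows = list(rows)
    observed = [(u, t, tr) for u, t, tr in rows if not _is_missing(tr)]
    values = [tr for _, _, tr in observed]

    # Entirely missing, or any non-numeric value -> categorical.
    # bool is a subclass of int, so True / False count as numeric here.
    if not values or any(not isinstance(v, (int, float)) for v in values):
        return "categorical"

    # Numeric support outside {0, 1} -> continuous.
    # Assumption: a single constant value outside {0, 1} also lands here.
    if not set(values) <= {0, 1}:
        return "continuous"

    # Binary support: inspect each unit's path, ordered by time.
    paths = defaultdict(list)
    for unit, time, tr in observed:
        paths[unit].append((time, tr))
    for path in paths.values():
        seq = [tr for _, tr in sorted(path)]
        # Any drop from 1 back to 0 breaks weak monotonicity.
        if any(later < earlier for earlier, later in zip(seq, seq[1:])):
            return "binary_non_absorbing"
    return "binary_absorbing"
```

Because Python's True and False compare equal to 1 and 0, the same subset check covers bool-valued treatments without a separate branch.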

Bool-dtype columns (True / False) are classified the same way as numeric {0, 1}: the library’s binary estimators validate on value support via diff_diff.utils.validate_binary(), so True / False behave like 1 / 0 for absorbing / non-absorbing classification.

has_never_treated is computed for both binary and continuous numeric treatment types: it is True when some unit has treatment == 0 in every observed non-NaN row. For binary this flags the clean-control group; for continuous this flags zero-dose controls (required by ContinuousDiD). Always False for "categorical".

has_always_treated has binary-only semantics: it is True when some unit has treatment == 1 in every observed non-NaN row (no pre-treatment information in the DiD sense). For "continuous" and "categorical" treatment this field is always False regardless of dose positivity, because pre-treatment periods in continuous DiD are determined by the separate first_treat column passed to ContinuousDiD.fit, not by whether the dose is strictly positive.
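
The asymmetry between the two flags can be seen in a minimal sketch. treatment_flags is a hypothetical helper written for illustration, not part of diff_diff; it takes (unit, treatment) pairs plus an already-determined treatment_type label, and it assumes that a unit with no observed treatment values counts toward neither flag.

```python
import math
from collections import defaultdict


def treatment_flags(rows, treatment_type):
    """Hypothetical helper: rows is an iterable of (unit, treatment) pairs."""
    observed = defaultdict(list)
    for unit, tr in rows:
        # Skip missing treatment values; only observed rows are inspected.
        if tr is None or (isinstance(tr, float) and math.isnan(tr)):
            continue
        observed[unit].append(tr)

    is_numeric = treatment_type != "categorical"
    is_binary = treatment_type in ("binary_absorbing", "binary_non_absorbing")

    # Never-treated: defined for binary and continuous treatments alike.
    never = is_numeric and any(
        all(v == 0 for v in vals) for vals in observed.values()
    )
    # Always-treated: binary-only, so continuous doses never set it.
    always = is_binary and any(
        all(v == 1 for v in vals) for vals in observed.values()
    )
    return {"has_never_treated": never, "has_always_treated": always}
```

Under this reading, a continuous panel in which one unit happens to receive a dose of exactly 1.0 in every period still reports has_always_treated as False.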

Rows with NaN in unit or time are dropped up front and surfaced via the missing_id_rows_dropped alert; all subsequent structural facts are computed on the non-missing subset, so observation_coverage is always in [0, 1]. Duplicate (unit, time) rows are surfaced separately via the duplicate_unit_time_rows alert.
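
The order of operations described above (drop rows with missing identifiers first, then compute everything else) can be sketched as follows. summarize_ids is a hypothetical helper, and two definitions in it are assumptions rather than documented behavior: duplicates are counted as rows beyond the first for each (unit, time) pair, and coverage is taken to be distinct observed (unit, time) cells divided by units × periods. That coverage definition is one that stays within [0, 1]; the library's exact formula is not stated here.

```python
import math


def _is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_ids(rows):
    """Hypothetical helper: rows is a list of (unit, time) pairs."""
    rows = list(rows)

    # Step 1: drop rows whose unit or time is missing.
    kept = [(u, t) for u, t in rows if not (_is_missing(u) or _is_missing(t))]
    dropped = len(rows) - len(kept)

    # Step 2: duplicates are counted on the cleaned subset only.
    cells = set(kept)
    duplicates = len(kept) - len(cells)

    # Step 3 (assumed definition): share of the unit x period grid observed.
    units = {u for u, _ in cells}
    periods = {t for _, t in cells}
    grid = len(units) * len(periods)
    coverage = len(cells) / grid if grid else 0.0

    return {
        "missing_id_rows_dropped": dropped,
        "duplicate_unit_time_rows": duplicates,
        "observation_coverage": coverage,
    }
```

Counting distinct cells rather than raw rows is what keeps the ratio from exceeding 1 when duplicates are present.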

The profile does not recommend an estimator. Consult diff_diff.get_llm_guide("autonomous") for the estimator-support matrix and per-design-feature reasoning.