diff_diff.profile_panel
- diff_diff.profile_panel(df, *, unit, time, treatment, outcome)
Describe the structure of a DiD panel.
Reports structural facts: balance, treatment-type classification, outcome characteristics, and factual alerts. The profile is descriptive, not opinionated: it says what is, never what to do about it. Estimator selection is up to the caller.
- Parameters:
df (pandas.DataFrame) – Long-format panel data containing the four named columns.
unit (str) – Column identifying the cross-sectional unit.
time (str) – Column identifying the time period.
treatment (str) – Column holding the treatment indicator or dose. See Notes for the classification rules.
outcome (str) – Column holding the outcome variable.
- Returns:
Frozen dataclass. Call .to_dict() for a JSON-serializable view.
- Return type:
- Raises:
ValueError – If any of the four column names is not present in df.
Examples
>>> import pandas as pd
>>> from diff_diff import profile_panel
>>> df = pd.DataFrame({
...     "u": [1, 1, 2, 2],
...     "t": [0, 1, 0, 1],
...     "tr": [0, 0, 1, 1],
...     "y": [0.1, 0.2, 0.1, 0.9],
... })
>>> profile = profile_panel(df, unit="u", time="t", treatment="tr", outcome="y")
>>> profile.is_balanced
True
>>> profile.treatment_type
'binary_absorbing'
Notes
Classification rules for treatment_type:
- "binary_absorbing": numeric treatment whose observed non-NaN values are a subset of \(\{0, 1\}\) (one or two distinct values) and each unit’s treatment sequence (ordered by time) is weakly monotone non-decreasing. All-zero and all-one panels are valid degenerate cases.
- "binary_non_absorbing": values a subset of \(\{0, 1\}\) with at least two distinct values observed, where at least one unit switches from 1 back to 0.
- "continuous": numeric treatment with more than two distinct values, or a 2-valued numeric whose values are not in \(\{0, 1\}\) (matches the ContinuousDiD convention).
- "categorical": non-numeric dtype (object / category) or a column that is entirely NaN.
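The classification rules above can be read as a small decision procedure. The following is a minimal sketch of that procedure on plain `(unit, time, treatment)` tuples. It is not the library's implementation: `classify_treatment` and `_is_nan` are hypothetical names, and the handling of a single-valued column outside {0, 1} (which the rules do not cover) is an assumption.

```python
import math
from collections import defaultdict


def _is_nan(value):
    """Treat None and float NaN as missing (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_treatment(rows):
    """Classify treatment from an iterable of (unit, time, treatment) tuples.

    Hypothetical helper mirroring the documented rules, not library code.
    """
    observed = [(u, t, d) for u, t, d in rows if not _is_nan(d)]

    # Entirely-missing column, or any non-numeric value: categorical.
    if not observed or any(
        not isinstance(d, (bool, int, float)) for _, _, d in observed
    ):
        return "categorical"

    # Bools map to 0.0 / 1.0, so True/False behave like 1/0.
    values = {float(d) for _, _, d in observed}

    # More than two distinct values, or values outside {0, 1}: continuous.
    # Assumption: a single-valued column outside {0, 1} also lands here.
    if not values <= {0.0, 1.0}:
        return "continuous"

    # Collect each unit's treatment path, then order it by time.
    paths = defaultdict(list)
    for u, t, d in observed:
        paths[u].append((t, float(d)))

    for seq in paths.values():
        doses = [d for _, d in sorted(seq, key=lambda pair: pair[0])]
        # A drop from 1 back to 0 breaks weak monotonicity.
        if any(later < earlier for earlier, later in zip(doses, doses[1:])):
            return "binary_non_absorbing"

    # Includes the degenerate all-zero and all-one panels.
    return "binary_absorbing"


panel = [(1, 0, 0), (1, 1, 0), (2, 0, 1), (2, 1, 1)]
print(classify_treatment(panel))  # binary_absorbing
```

The sketch checks value support before per-unit paths, so reversals are only examined once the column is known to be binary.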
Bool-dtype columns (True/False) are classified the same way as numeric {0, 1}: the library’s binary estimators validate on value support via diff_diff.utils.validate_binary(), so True/False behave like 1/0 for absorbing / non-absorbing classification.

has_never_treated is computed across both binary and continuous numeric treatment types: some unit has treatment == 0 in every observed non-NaN row. For binary this flags the clean-control group; for continuous this flags zero-dose controls (required by ContinuousDiD). Always False for "categorical".

has_always_treated has binary-only semantics: some unit has treatment == 1 in every observed non-NaN row (no pre-treatment information in the DiD sense). For "continuous" and "categorical" treatment this field is always False regardless of dose positivity; pre-treatment periods on continuous DiD are determined by the separate first_treat column passed to ContinuousDiD.fit, not by whether the dose is strictly positive.

Rows with NaN in unit or time are dropped up front and surfaced via the missing_id_rows_dropped alert; all subsequent structural facts are computed on the non-missing subset, so observation_coverage is always in [0, 1]. Duplicate (unit, time) rows are surfaced separately via the duplicate_unit_time_rows alert.

The profile does not recommend an estimator. Consult diff_diff.get_llm_guide("autonomous") for the estimator-support matrix and per-design-feature reasoning.
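As an illustration of the flag semantics and the up-front row handling described in these notes, here is a minimal sketch on plain `(unit, time, treatment)` tuples. All function names are hypothetical and this is not the library's code. In particular, the duplicate-counting convention (each repeat of an already-seen pair counts once) is an assumption, since the notes say only that duplicates are surfaced.

```python
import math
from collections import defaultdict


def _is_nan(value):
    """Treat None and float NaN as missing (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_ids(rows):
    """Drop rows with a missing unit or time; return (kept_rows, n_dropped)."""
    kept = [r for r in rows if not (_is_nan(r[0]) or _is_nan(r[1]))]
    return kept, len(rows) - len(kept)


def count_duplicate_unit_time(rows):
    """Count rows repeating an already-seen (unit, time) pair (assumed convention)."""
    seen = set()
    duplicates = 0
    for u, t, _ in rows:
        if (u, t) in seen:
            duplicates += 1
        else:
            seen.add((u, t))
    return duplicates


def treated_flags(rows, treatment_type):
    """Return (has_never_treated, has_always_treated) per the documented semantics."""
    if treatment_type == "categorical":
        return False, False  # both flags are always False for categorical

    # Collect each unit's observed (non-NaN) treatment values.
    doses = defaultdict(list)
    for u, _, d in rows:
        if not _is_nan(d):
            doses[u].append(float(d))

    # Some unit has treatment == 0 in every observed row (binary or continuous).
    never = any(all(d == 0.0 for d in seq) for seq in doses.values())

    # Binary-only: some unit has treatment == 1 in every observed row.
    always = treatment_type.startswith("binary") and any(
        all(d == 1.0 for d in seq) for seq in doses.values()
    )
    return never, always


raw = [(1, 0, 0), (1, 1, 0), (2, 0, 1), (2, 1, 1), (None, 0, 1)]
clean, n_dropped = drop_missing_ids(raw)
print(n_dropped)                                 # 1
print(treated_flags(clean, "binary_absorbing"))  # (True, True)
```

Units with no observed treatment value never enter `doses`, so an empty sequence cannot satisfy either flag vacuously.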