Synthetic Control Method (SCM)
==============================

Classic synthetic control estimator for a single treated unit (Abadie, Diamond &
Hainmueller 2010; originating in Abadie & Gardeazabal 2003).

The treated unit's counterfactual is a convex combination of "donor" (never-treated)
units. Donor weights ``W*(V)`` solve a simplex-constrained, predictor-importance-weighted
least-squares fit to the treated unit's pre-period predictors. The diagonal
predictor-importance matrix ``V`` is either chosen from the data (by minimizing pre-period
outcome MSPE, ``v_method="nested"``; by out-of-sample cross-validation, ``v_method="cv"``;
or by closed-form inverse variance, ``v_method="inverse_variance"``) or supplied by the
user (``v_method="custom"``). The treatment-effect path is the gap
:math:`\hat{\alpha}_{1t} = Y_{1t} - \sum_j w_j Y_{jt}` over the post periods, where unit 1
is the treated unit and :math:`j` indexes the donors.
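
The weight fit and the gap path can be sketched in plain NumPy. This is a minimal
illustration under stated assumptions, not the library's solver: the projected-gradient
routine, the helper names ``project_to_simplex`` and ``solve_weights``, and the toy
matrices are all invented for the example::

    import numpy as np


    def project_to_simplex(u):
        """Euclidean projection of u onto {w : w >= 0, sum(w) = 1}."""
        s = np.sort(u)[::-1]                      # sorted descending
        css = np.cumsum(s) - 1.0
        k = np.arange(1, u.size + 1)
        rho = np.nonzero(s - css / k > 0)[0][-1]  # last index kept positive
        return np.maximum(u - css[rho] / (rho + 1), 0.0)


    def solve_weights(X1, X0, v, n_iter=5000):
        """Minimize (X1 - X0 w)' diag(v) (X1 - X0 w) over the simplex."""
        A = X0 * np.sqrt(v)[:, None]              # fold V into the design
        b = X1 * np.sqrt(v)
        step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant
        w = np.full(X0.shape[1], 1.0 / X0.shape[1])     # start at uniform weights
        for _ in range(n_iter):
            w = project_to_simplex(w - step * 2.0 * A.T @ (A @ w - b))
        return w


    # Toy data: 4 predictor rows, 3 donors; the treated unit is an exact mix.
    X0 = np.array([[2.0, 4.0, 6.0],
                   [1.0, 3.0, 2.0],
                   [5.0, 1.0, 3.0],
                   [0.5, 2.5, 4.0]])
    w_true = np.array([0.6, 0.4, 0.0])
    X1 = X0 @ w_true
    w = solve_weights(X1, X0, v=np.ones(4))

    # Post-period outcomes (rows = periods, columns = donors) and the gap path.
    Y0_post = np.array([[10.0, 12.0, 11.0],
                        [11.0, 13.0, 12.0]])
    Y1_post = Y0_post @ w_true + np.array([1.0, 2.0])   # true effects: 1 and 2
    gap = Y1_post - Y0_post @ w                          # alpha_hat_{1t}
    att = gap.mean()                                     # mean post-period gap

Because the toy treated unit lies inside the donors' convex hull, the fitted weights
recover ``w_true`` and the gap recovers the injected effects. With real data the fit is
only approximate, which is why a well-fit pre-period matters.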

**When to use SCM:**

- Exactly **one treated unit** with a long, well-fit pre-treatment period.
- A curated **donor pool** of comparable never-treated units.
- Aggregate / few-unit comparative case studies (states, regions, countries).

**Inference:** classic SCM has **no analytical standard error**. ``se``, ``t_stat``,
``p_value`` and ``conf_int`` are always NaN; ``att`` (the mean post-period gap) is the
reported estimate. Significance comes from **in-space placebo permutation inference** via
:meth:`~diff_diff.SyntheticControlResults.in_space_placebo` (post/pre RMSPE-ratio statistic,
``placebo_p_value = rank/(n_placebos+1)``). This permutation p-value is stored separately
from the (NaN) ``p_value``; ``is_significant`` remains tied to ``p_value`` and therefore
does not reflect the placebo result.
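
The rank-based p-value can be illustrated with a few lines of NumPy. This is a sketch
only: the function names are hypothetical, and counting ties against the treated unit is
an assumption about the ranking rule rather than something stated above::

    import numpy as np


    def rmspe(gap):
        """Root mean squared prediction error of a gap series."""
        return float(np.sqrt(np.mean(np.square(gap))))


    def rmspe_ratio(pre_gap, post_gap):
        """Post/pre RMSPE ratio: large when the post-period gap dwarfs the pre-fit error."""
        return rmspe(post_gap) / rmspe(pre_gap)


    def permutation_p_value(treated_ratio, placebo_ratios):
        """p = rank / (n_placebos + 1), rank 1 = most extreme ratio."""
        placebo_ratios = np.asarray(placebo_ratios, dtype=float)
        # Assumption: a placebo that ties the treated unit counts as at least as extreme.
        rank = 1 + int(np.sum(placebo_ratios >= treated_ratio))
        return rank / (placebo_ratios.size + 1)


    ratio = rmspe_ratio(pre_gap=[0.1, -0.1], post_gap=[1.0, 1.0])
    p = permutation_p_value(treated_ratio=8.0, placebo_ratios=[1.2, 0.9, 9.5, 2.0])

With four placebos the smallest attainable p-value is ``1/5``, so the size of the donor
pool bounds how much significance the permutation test can report.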

**Robustness diagnostics (ADH 2015 §4, opt-in):**

- :meth:`~diff_diff.SyntheticControlResults.leave_one_out` drops each donor with a
  reportable weight (weight > 1e-6), one at a time, and re-fits, producing a per-drop
  table of ATT and ``delta_att``. A large ``delta_att`` flags dependence on a single donor.
- :meth:`~diff_diff.SyntheticControlResults.in_time_placebo` reassigns the intervention to
  an earlier pre-treatment date and checks for a spurious gap before the true treatment
  date (the backdating placebo; ``placebo_att`` should be close to 0).

Both re-run the validated solver and leave the analytical inference fields NaN.
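
The leave-one-out loop can be sketched as follows. Everything here is an assumption made
for illustration: the helper names, the projected-gradient solver, the use of pre-period
outcomes as the only predictors with an identity ``V``, and the sign convention
``delta_att = att_without_donor - att_full``::

    import numpy as np


    def project_to_simplex(u):
        """Euclidean projection of u onto {w : w >= 0, sum(w) = 1}."""
        s = np.sort(u)[::-1]
        css = np.cumsum(s) - 1.0
        rho = np.nonzero(s - css / np.arange(1, u.size + 1) > 0)[0][-1]
        return np.maximum(u - css[rho] / (rho + 1), 0.0)


    def fit_att(Y1, Y0, n_pre, n_iter=2000):
        """Fit simplex weights on pre-period outcomes; return (weights, ATT)."""
        A, b = Y0[:n_pre], Y1[:n_pre]
        step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
        w = np.full(A.shape[1], 1.0 / A.shape[1])
        for _ in range(n_iter):
            w = project_to_simplex(w - step * 2.0 * A.T @ (A @ w - b))
        return w, float(np.mean(Y1[n_pre:] - Y0[n_pre:] @ w))


    def leave_one_out(Y1, Y0, n_pre, tol=1e-6):
        """Drop each donor whose weight exceeds tol, refit, and record the ATT shift."""
        w_full, att_full = fit_att(Y1, Y0, n_pre)
        rows = []
        for j in np.flatnonzero(w_full > tol):
            keep = np.arange(Y0.shape[1]) != j
            _, att_j = fit_att(Y1, Y0[:, keep], n_pre)
            rows.append({"dropped": int(j), "att": att_j,
                         "delta_att": att_j - att_full})
        return att_full, rows


    # Toy panel: rows = periods (4 pre, 2 post), columns = donors.
    Y0 = np.array([[4.0, 1.0, 2.0], [1.0, 4.0, 2.0], [2.0, 2.0, 5.0],
                   [3.0, 3.0, 1.0], [3.0, 2.0, 4.0], [2.0, 3.0, 3.0]])
    Y1 = 0.5 * Y0[:, 0] + 0.5 * Y0[:, 1]
    Y1[4:] += 2.0                               # injected post-period effect
    att_full, rows = leave_one_out(Y1, Y0, n_pre=4)

In this toy panel only the first two donors carry weight, so only they are dropped; the
third donor sits at zero weight and is skipped.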

**Distinct from** :class:`~diff_diff.SyntheticDiD` (Arkhangelsky et al. 2021), which adds
time weights and ridge regularization; classic SCM uses **donor weights only** plus the
outer ``V`` search.

**Reference:** Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic Control
Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco
Control Program. *Journal of the American Statistical Association*, 105(490), 493–505.
`doi:10.1198/jasa.2009.ap08746 <https://doi.org/10.1198/jasa.2009.ap08746>`_

SyntheticControl
----------------

Main estimator class for classic synthetic control estimation.

.. autoclass:: diff_diff.SyntheticControl
   :no-index:
   :members:
   :undoc-members:
   :show-inheritance:
   :inherited-members:

   .. rubric:: Methods

   .. autosummary::

      ~SyntheticControl.fit
      ~SyntheticControl.get_params
      ~SyntheticControl.set_params

SyntheticControlResults
-----------------------

Results container for synthetic control estimation.

.. autoclass:: diff_diff.SyntheticControlResults
   :no-index:
   :members:
   :undoc-members:
   :show-inheritance:

   .. rubric:: Methods

   .. autosummary::

      ~SyntheticControlResults.in_space_placebo
      ~SyntheticControlResults.get_placebo_df
      ~SyntheticControlResults.leave_one_out
      ~SyntheticControlResults.get_leave_one_out_df
      ~SyntheticControlResults.get_leave_one_out_gaps
      ~SyntheticControlResults.in_time_placebo
      ~SyntheticControlResults.get_in_time_placebo_df
      ~SyntheticControlResults.get_in_time_placebo_gaps
      ~SyntheticControlResults.summary
      ~SyntheticControlResults.print_summary
      ~SyntheticControlResults.to_dict
      ~SyntheticControlResults.to_dataframe
      ~SyntheticControlResults.get_gap_df
      ~SyntheticControlResults.get_weights_df

Convenience Function
--------------------

.. autofunction:: diff_diff.synthetic_control

Predictors and V selection
--------------------------

The predictor rows of ``X1`` (treated) and ``X0`` (donors) are built from the arguments
below, in this canonical row order (matching R ``Synth::dataprep``):

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Argument
     - Meaning
   * - ``predictors`` + ``predictor_window`` + ``predictors_op``
     - Columns aggregated with ``predictors_op`` over a pre-period window (default
       window: all pre periods).
   * - ``special_predictors``
     - ``(var, periods, op)`` triples, each aggregated over its own periods with its own
       operator.
   * - ``pre_period_outcomes``
     - Individual pre-period outcomes as predictor rows (``"all"`` or a list). When
       no predictor arguments are given, defaults to all pre-period outcomes.
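
The row order in the table can be illustrated with a small helper. This is a hypothetical
sketch, not the library's ``dataprep``: the function name, the nested-dict data layout,
and the explicit list for ``pre_period_outcomes`` (the ``"all"`` default is not
reproduced) are assumptions::

    import numpy as np

    OPS = {"mean": np.mean, "sum": np.sum}   # equal-weight operators only


    def predictor_rows(series, pre_periods, predictors=(), predictor_window=None,
                       predictors_op="mean", special_predictors=(),
                       pre_period_outcomes=(), outcome="gdpcap"):
        """Build one unit's predictor column.

        series: {variable: {period: value}} for a single unit (hypothetical layout).
        """
        window = pre_periods if predictor_window is None else predictor_window
        rows = []
        for var in predictors:                        # 1. windowed covariates
            rows.append(OPS[predictors_op]([series[var][t] for t in window]))
        for var, periods, op in special_predictors:   # 2. (var, periods, op) triples
            rows.append(OPS[op]([series[var][t] for t in periods]))
        for t in pre_period_outcomes:                 # 3. single-period outcome lags
            rows.append(series[outcome][t])
        return np.array(rows, dtype=float)


    series = {"gdpcap": {1960: 2.0, 1961: 4.0, 1962: 6.0},
              "invest": {1960: 10.0, 1961: 20.0, 1962: 30.0}}
    x = predictor_rows(series, pre_periods=[1960, 1961, 1962],
                       predictors=["invest"], predictor_window=[1961, 1962],
                       special_predictors=[("gdpcap", [1960, 1962], "mean")],
                       pre_period_outcomes=[1962])

Stacking one such column per donor gives ``X0``, and the treated unit's column is ``X1``.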

The diagonal predictor-importance matrix ``V`` is selected according to ``v_method``:

- ``v_method="nested"`` minimizes the pre-period **outcome** MSPE of ``W*(V)`` using a
  multistart Nelder-Mead search followed by a derivative-free Powell polish.
- ``v_method="cv"`` selects ``V`` by **out-of-sample cross-validation**
  (Abadie-Diamond-Hainmueller 2015; Abadie 2021). The pre-period is split at ``v_cv_t0``
  (default ``len(pre)//2``, i.e. ``t0 = T0/2``) into a training and a validation window;
  ``V`` is chosen to minimize the validation-window outcome MSPE of the training-fit
  weights, then the final weights are re-estimated on the validation-window predictors.
  Each predictor is **re-aggregated** over each window (a separate ``dataprep`` per
  window, as in ADH 2015's CV), so it must **span both windows**. The default per-period
  outcome lags (single-period) are therefore rejected; pass window-spanning covariate
  predictors or multi-period ``special_predictors`` instead
  (see ``docs/methodology/REGISTRY.md`` §SyntheticControl).
- ``v_method="inverse_variance"`` uses the closed-form ``v_h = 1/Var(X_h)`` (variance over
  donors+treated; no search), applied to the **raw** predictors. It intentionally bypasses
  ``standardize``, because inverse-variance weighting *is* the unit-variance rescaling.
- ``v_method="custom"`` takes a user-supplied ``custom_v`` (one entry per predictor row,
  trace-normalized) and skips the outer search.

``v_cv_t0`` must be ``None`` unless ``v_method="cv"``.
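
The claim that inverse-variance weighting *is* the unit-variance rescaling can be checked
numerically. The sketch below is illustrative: the helper name is hypothetical, and both
the ``ddof=1`` choice and the trace normalization of the inverse-variance ``V`` are
assumptions (the ``ddof`` choice cancels once ``V`` is normalized)::

    import numpy as np


    def inverse_variance_v(X):
        """X: raw predictors, shape (n_predictors, n_units), donors + treated."""
        v = 1.0 / np.var(X, axis=1, ddof=1)
        return v / v.sum()            # trace-normalize so the entries sum to 1


    # Two predictors on very different scales, four units (toy numbers).
    X = np.array([[1000.0, 1200.0, 900.0, 1100.0],
                  [0.10, 0.30, 0.20, 0.25]])
    v = inverse_variance_v(X)

    # V-weighted squared distance on raw rows ...
    diff = X[:, 0] - X[:, 1]
    raw_weighted = np.sum(v * diff ** 2)

    # ... versus the plain squared distance on unit-variance rows.
    Z = X / np.std(X, axis=1, ddof=1, keepdims=True)
    rescaled = np.sum((Z[:, 0] - Z[:, 1]) ** 2)

    # The two agree up to the constant introduced by trace normalization.
    scale = 1.0 / np.sum(1.0 / np.var(X, axis=1, ddof=1))

Here ``raw_weighted`` equals ``scale * rescaled``, so standardizing first and then
applying inverse-variance weights would rescale the predictors twice.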

.. note::

   The predictor standardization (per-row SD over donors+treated, ddof=1) and the
   optimizer are pinned from the R ``Synth`` source — they are not specified in
   Abadie-Diamond-Hainmueller (2010). The outer objective uses all pre periods rather than
   R's ``time.optimize.ssr`` window, so the nested ``V`` differs from R by an
   efficiency-only choice. Predictor/outcome aggregation also **fails closed** on any
   non-finite cell, whereas R ``dataprep`` uses ``na.rm=TRUE`` — restrict
   ``predictor_window`` / ``special_predictors`` periods to where a variable is observed.
   Predictor rows support only **equal-weight** linear combinations (``mean``, ``sum``,
   per-period lags); ADH (2010) §2.3's general weighted form ``Σ_s k_s Y_is`` with
   arbitrary ``k_s`` (and non-linear ops such as ``median``) is not accepted in this
   release. See ``docs/methodology/REGISTRY.md`` §SyntheticControl for all deviation labels.
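
The difference between fail-closed aggregation and R's ``na.rm=TRUE`` can be shown in a
few lines. This is a sketch of the behaviour described in the note: the function name and
the ``ValueError`` exception type are assumptions, and ``np.nanmean`` stands in for the R
behaviour::

    import numpy as np


    def aggregate_fail_closed(values, op="mean"):
        """Equal-weight aggregation that refuses any non-finite cell."""
        values = np.asarray(values, dtype=float)
        if not np.all(np.isfinite(values)):
            raise ValueError("non-finite cell in the selected periods")
        if op == "mean":
            return float(values.mean())
        if op == "sum":
            return float(values.sum())
        raise ValueError(f"unsupported op: {op!r}")   # e.g. 'median'


    window = [1.0, np.nan, 3.0]

    # R-style na.rm=TRUE silently averages the two observed cells.
    lenient = float(np.nanmean(window))

    # Fail-closed aggregation raises instead, so the gap must be handled explicitly.
    try:
        aggregate_fail_closed(window)
        failed = False
    except ValueError:
        failed = True

Silently dropping cells changes which periods a predictor summarizes from unit to unit;
raising forces the caller to restrict the window to periods where the variable is
observed.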

Example Usage
-------------

Basic usage with covariate and special predictors::

    from diff_diff import SyntheticControl

    scm = SyntheticControl(v_method="nested", seed=0)
    results = scm.fit(
        data,
        outcome="gdpcap",
        treatment="treated",   # absorbing 0/1 indicator
        unit="region",
        time="year",
        predictors=["invest", "school.high"],
        # Set predictor_window explicitly when a covariate is only observed on a
        # subset of the pre periods — the default averages over ALL pre periods and
        # fails closed if any selected cell is non-finite.
        predictor_window=[1964, 1965, 1966, 1967, 1968, 1969],
        special_predictors=[("gdpcap", [1960, 1965, 1969], "mean")],
    )
    results.print_summary()

    # Effect path and donor weights
    gap_df = results.get_gap_df()        # period, gap, phase
    weights_df = results.get_weights_df()  # unit, weight (descending)

Quick estimation with the convenience function::

    from diff_diff import synthetic_control

    results = synthetic_control(
        data, outcome="gdpcap", treatment="treated",
        unit="region", time="year",
    )
    print(f"ATT: {results.att:.3f}, pre-RMSPE: {results.pre_rmspe:.3f}")

In-space placebo permutation inference (opt-in; refits one synthetic control per donor)::

    placebo_df = results.in_space_placebo()       # reassigns treatment to each donor
    print(f"placebo p-value: {results.placebo_p_value:.3f} "
          f"(n_placebos={results.n_placebos})")    # p = rank/(n_placebos+1)
    print(placebo_df)   # per-unit RMSPE-ratio table used for the permutation rank

Supplying a fixed predictor-importance matrix (skips the outer V search)::

    import numpy as np

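    # custom_v needs one entry per predictor row. Here the only predictor row is
    # the averaged "invest" covariate, so a single entry is assumed.
    n_predictors = 1
    scm = SyntheticControl(v_method="custom", custom_v=np.ones(n_predictors))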
    results = scm.fit(data, outcome="gdpcap", treatment="treated",
                      unit="region", time="year", predictors=["invest"])

Comparison with Synthetic DiD
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Feature
     - SyntheticControl
     - SyntheticDiD
   * - Unit (donor) weights
     - Simplex, predictor-importance weighted
     - Simplex, ridge-regularized
   * - Time weights
     - None (level matching)
     - Simplex (double difference)
   * - Predictor-importance ``V``
     - Nested / cv / inverse-variance / custom diagonal ``V``
     - No analog
   * - Inference
     - Placebo permutation (no analytical SE)
     - Bootstrap / jackknife / placebo

Use **SCM** for a single treated unit with a long pre-period and a curated donor pool;
use **SDID** when you have several treated units and parallel trends is plausible.