API: Core

Collect a base model's out-of-fold predictions and pointwise errors.

Parameters:

Name	Type	Description	Default
`estimator`	`Any`	The base model `f` (any scikit-learn-style `fit`/`predict` object). It is cloned per fold; the passed instance is never fitted in place.	required
`cv`	`Any`	A splitter exposing `split(X, y, groups)` (e.g. `KFold`, `TimeSeriesSplit`, or `deup.splitters.PurgedWalkForward`). For time-ordered data use a non-shuffling splitter; the collector itself never shuffles.	required
`loss`	`str \| LossFn`	Error-target loss: a registry name (`"squared"`, `"absolute"`, `"logloss"`, `"brier"`, `"pinball"`, `"rank"`) or a callable `loss(y_true, y_pred, groups)`.	`'squared'`
`proba`	`bool`	If `True`, use `predict_proba` instead of `predict` -- required for classification log-loss / Brier targets. Binary probabilities are stored as the positive-class column; multiclass probabilities are stored as a 2-D array and passed through to the loss.	`False`
`refit_on_all`	`bool`	If `True` (default), also refit a clone of the base model on all data and expose it as `OOFResult.estimator`. See the module docstring for the "g trained on errors of a slightly smaller f" assumption this entails.	`True`

Notes

Rows never assigned to a test fold (e.g. the earliest rows under walk-forward) are excluded from the returned `deup.core.types.OOFResult`. If a row is assigned to more than one test fold (e.g. repeated CV), a warning is raised and the last fold's prediction is kept, since averaging would break the one-error-per-row contract that `g` is trained on.
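The behaviour described in these notes can be illustrated with a small, self-contained sketch. It is not the library's implementation: `MeanModel`, `walk_forward_splits`, and `collect_oof` are hypothetical names introduced here, `copy.deepcopy` stands in for per-fold cloning, and squared error is assumed as the loss.

```python
import copy
import warnings

import numpy as np


class MeanModel:
    """Hypothetical stand-in for a scikit-learn-style estimator."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def walk_forward_splits(n, n_folds):
    """Hypothetical expanding-window splitter: train on the past, test on the next block."""
    edges = np.linspace(0, n, n_folds + 2, dtype=int)
    for k in range(1, n_folds + 1):
        yield np.arange(0, edges[k]), np.arange(edges[k], edges[k + 1])


def collect_oof(estimator, splits, X, y):
    """Sketch of an out-of-fold loop with squared-error targets."""
    n = len(y)
    preds = np.full(n, np.nan)
    fold_ids = np.full(n, -1)
    for fold, (train, test) in enumerate(splits):
        model = copy.deepcopy(estimator)  # the passed instance is never fitted
        model.fit(X[train], y[train])
        if np.any(fold_ids[test] >= 0):
            warnings.warn("row held out more than once; keeping the last fold")
        preds[test] = model.predict(X[test])  # last fold wins on overlap
        fold_ids[test] = fold
    kept = np.flatnonzero(fold_ids >= 0)  # drop rows that were never held out
    errors = (y[kept] - preds[kept]) ** 2
    return preds[kept], errors, fold_ids[kept], kept


X = np.arange(12, dtype=float).reshape(-1, 1)
y = np.arange(12, dtype=float)
base = MeanModel()
preds, errors, fold_ids, kept = collect_oof(base, walk_forward_splits(12, 3), X, y)
print(kept)      # positions of the rows that received an out-of-fold prediction
print(fold_ids)  # fold in which each kept row was held out
```

With three walk-forward folds over twelve rows, the first three rows only ever appear in training sets, so they are absent from the result.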

`fit_collect(X, y, groups=None)`

Run the out-of-fold loop and return the collected errors.

Parameters:

Name	Type	Description	Default
`X`	`Any`	Training features.	required
`y`	`Any`	Training targets.	required
`groups`	`ArrayLike \| None`	Optional per-row group labels (e.g. dates). Passed to the splitter and to group-aware losses such as `"rank"`.	`None`

Resolve `loss` (a registry name or a callable) to a loss function.

For pinball, pass `q` (default `0.5`) or use the string `"pinball:0.9"`.
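As a rough illustration of the pinball convention, the sketch below parses a `"pinball:0.9"`-style string and evaluates the standard pinball (quantile) loss. `parse_pinball` and `pinball_loss` are hypothetical helpers written for this example; the library's actual resolver may validate or dispatch differently.

```python
import numpy as np


def parse_pinball(spec, default_q=0.5):
    """Hypothetical parser: 'pinball' -> default_q, 'pinball:0.9' -> 0.9."""
    name, _, q_text = spec.partition(":")
    if name != "pinball":
        raise ValueError(f"not a pinball spec: {spec!r}")
    q = float(q_text) if q_text else default_q
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return q


def pinball_loss(y_true, y_pred, q=0.5):
    """Pointwise pinball loss: q * diff when under-predicting, (1 - q) * |diff| otherwise."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.maximum(q * diff, (q - 1.0) * diff)


q = parse_pinball("pinball:0.9")
under = pinball_loss([1.0], [0.0], q=q)  # prediction too low, penalised by q
over = pinball_loss([0.0], [1.0], q=q)   # prediction too high, penalised by 1 - q
print(q, under, over)
```

At `q = 0.5` the loss reduces to half the absolute error, which is why that value is a natural default.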

Stabilize heavy-tailed error targets before training `g`.

`log`: `log(error + eps)` (default; used by `deup.estimators.DEUPRegressor`)
`asinh`: `asinh(error / eps)`, a robust alternative for very heavy tails
`none`: identity

Map `g`'s prediction back to the error scale.
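The forward formulas above determine their inverses algebraically: `exp(z) - eps` for `log` and `eps * sinh(z)` for `asinh`. The sketch below checks the round trip. The function names and the value of `eps` are assumptions made for this example, and whether the library additionally clips the inverse at zero is not stated here.

```python
import numpy as np


def transform_error(error, kind="log", eps=1e-6):
    """Forward transform applied to error targets (eps is an assumed value)."""
    error = np.asarray(error, dtype=float)
    if kind == "log":
        return np.log(error + eps)
    if kind == "asinh":
        return np.arcsinh(error / eps)
    if kind == "none":
        return error
    raise ValueError(f"unknown transform: {kind!r}")


def inverse_transform(z, kind="log", eps=1e-6):
    """Algebraic inverse of transform_error, back on the error scale."""
    z = np.asarray(z, dtype=float)
    if kind == "log":
        return np.exp(z) - eps
    if kind == "asinh":
        return eps * np.sinh(z)
    if kind == "none":
        return z
    raise ValueError(f"unknown transform: {kind!r}")


errors = np.array([0.0, 1e-3, 1.0, 1e4])
for kind in ("log", "asinh", "none"):
    round_trip = inverse_transform(transform_error(errors, kind), kind)
    print(kind, np.allclose(round_trip, errors))
```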

Maps each row to a group and supports within-group operations.

Attributes:

Name	Type	Description
`codes`	`NDArray[Any]`	Integer group code per row, in `[0, n_groups)`.
`labels`	`NDArray[Any]`	The unique group labels, indexed by code.

`n_groups` `property`

Number of distinct groups.

`is_trivial` `property`

True when there is a single group (the i.i.d. case).

`from_labels(group_labels, n)` `classmethod`

Build a grouping from per-row labels.

Parameters:

Name	Type	Description	Default
`group_labels`	`ArrayLike \| None`	Per-row group labels (e.g. dates). If `None`, all `n` rows form a single trivial group (the i.i.d. case).	required
`n`	`int`	Number of rows (used to size the trivial group and validate lengths).	required

`indices()`

Row indices for each group, ordered by group code.
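One plausible way to obtain integer codes, unique labels, and per-group row indices is `numpy.unique` with `return_inverse=True`. The sketch below assumes that approach; `group_codes` and `group_indices` are hypothetical names, and the placeholder label used for the trivial group is an arbitrary choice for this example.

```python
import numpy as np


def group_codes(group_labels, n):
    """Return (codes, labels); a single trivial group when labels are None."""
    if group_labels is None:
        return np.zeros(n, dtype=int), np.array([0])  # placeholder label
    raw = np.asarray(group_labels)
    if len(raw) != n:
        raise ValueError("group_labels must have one entry per row")
    labels, codes = np.unique(raw, return_inverse=True)
    return codes, labels  # labels[codes] reproduces the input


def group_indices(codes, n_groups):
    """Row indices for each group, ordered by group code."""
    order = np.argsort(codes, kind="stable")
    counts = np.bincount(codes, minlength=n_groups)
    return np.split(order, np.cumsum(counts)[:-1])


dates = ["d2", "d1", "d2", "d1", "d3"]
codes, labels = group_codes(dates, n=5)
rows_per_group = group_indices(codes, n_groups=len(labels))
print(codes)           # one integer code per row
print(labels)          # unique labels, indexed by code
print(rows_per_group)  # one index array per group
```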

`rank_within(values, pct=True)`

Rank values within each group.

Ties are averaged. With `pct=True` (default) ranks are divided by the group size, matching `pandas.Series.groupby(...).rank(pct=True)`, the convention used for cross-sectional rank features and rank losses.
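The ranking convention can be reproduced with plain NumPy, as in the sketch below. `average_ranks` and `rank_within_groups` are hypothetical helpers, not the library's code, and the sketch assumes no missing values.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks with ties replaced by the mean rank of the tied run."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    new_run = np.r_[True, sorted_vals[1:] != sorted_vals[:-1]]
    run_id = np.cumsum(new_run) - 1       # which tie run each sorted item is in
    counts = np.bincount(run_id)
    ends = np.cumsum(counts)              # rank of the last item in each run
    starts = ends - counts + 1            # rank of the first item in each run
    ranks = np.empty(len(values))
    ranks[order] = ((starts + ends) / 2.0)[run_id]
    return ranks


def rank_within_groups(values, codes, pct=True):
    """Rank values separately inside each group identified by integer codes."""
    values = np.asarray(values, dtype=float)
    codes = np.asarray(codes)
    out = np.empty(len(values))
    for code in np.unique(codes):
        rows = np.flatnonzero(codes == code)
        ranks = average_ranks(values[rows])
        out[rows] = ranks / len(rows) if pct else ranks
    return out


values = [3.0, 1.0, 2.0, 5.0, 5.0]
codes = [0, 0, 0, 1, 1]
pct_ranks = rank_within_groups(values, codes)
raw_ranks = rank_within_groups(values, codes, pct=False)
print(pct_ranks)
print(raw_ranks)
```

In the second group both values tie, so each receives the average rank 1.5, or 0.75 after dividing by the group size of 2.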

Out-of-fold artifacts produced when collecting a base model's errors.

Attributes:

Name	Type	Description
`predictions`	`NDArray[Any]`	Out-of-fold predictions of the base model `f`, one per row.
`errors`	`NDArray[Any]`	Per-row error targets that the secondary predictor `g` will learn from (e.g. squared residuals or per-group rank losses).
`fold_ids`	`NDArray[Any]`	The fold in which each row was held out. Useful for diagnostics and for walk-forward reporting.
`group_ids`	`NDArray[Any] \| None`	Optional per-row group label (e.g. a date for cross-sectional ranking). `None` for i.i.d. data.
`indices`	`NDArray[Any] \| None`	Optional positions of these rows in the original input `X` (the rows that received an out-of-fold prediction). `None` if not tracked.
`estimator`	`Any`	Optionally, the base model refit on all data for deployment. `None` if the caller chose not to refit.

`n` `property`

Number of rows.
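Because rows without an out-of-fold prediction are dropped, the `indices` attribute is what ties the per-row arrays back to the original input. The sketch below shows one way to scatter such an array back to full length; `scatter_to_full` is a hypothetical helper and the sample values are made up for illustration.

```python
import numpy as np


def scatter_to_full(values, indices, n_rows, fill=np.nan):
    """Place out-of-fold values at their original row positions."""
    full = np.full(n_rows, fill, dtype=float)
    full[np.asarray(indices)] = values
    return full


# Illustrative values: only rows 3, 4 and 5 of a 6-row input were held out.
oof_errors = np.array([0.4, 0.1, 0.9])
oof_indices = np.array([3, 4, 5])
full_errors = scatter_to_full(oof_errors, oof_indices, n_rows=6)
print(full_errors)  # NaN marks rows that never received a prediction
```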

A prediction together with its uncertainty decomposition.

Attributes:

Name	Type	Description
`prediction`	`NDArray[Any]`	Point prediction of the base model.
`epistemic`	`NDArray[Any]`	Estimated epistemic uncertainty `g(x)` (optionally net of aleatoric).
`aleatoric`	`NDArray[Any] \| None`	Optional estimated aleatoric (irreducible) uncertainty `a(x)`.
`lower, upper`		Optional calibrated prediction-interval bounds.

`n` `property`

Number of rows.
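The phrase "optionally net of aleatoric" suggests subtracting `a(x)` from the error estimate `g(x)`. A minimal sketch of that reading follows; the function name is hypothetical, and clipping the difference at zero is an assumption of this example rather than documented behaviour.

```python
import numpy as np


def epistemic_net_of_aleatoric(g_pred, aleatoric=None):
    """Return g(x) unchanged, or g(x) - a(x) floored at zero (assumed)."""
    g_pred = np.asarray(g_pred, dtype=float)
    if aleatoric is None:
        return g_pred
    a_pred = np.asarray(aleatoric, dtype=float)
    return np.clip(g_pred - a_pred, 0.0, None)


g_of_x = np.array([1.0, 0.2])
a_of_x = np.array([0.3, 0.5])
net = epistemic_net_of_aleatoric(g_of_x, a_of_x)
print(net)  # the second entry is floored at zero because a(x) exceeds g(x)
```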
