
API: Core

Collect a base model's out-of-fold predictions and pointwise errors.

Parameters:

Name Type Description Default
estimator Any

The base model f (any scikit-learn-style fit/predict object). It is cloned per fold; the passed instance is never fitted in place.

required
cv Any

A splitter exposing split(X, y, groups) (e.g. KFold, TimeSeriesSplit, or deup.splitters.PurgedWalkForward). For time-ordered data use a non-shuffling splitter; the collector itself never shuffles.

required
loss str | LossFn

Error-target loss: a registry name ("squared", "absolute", "logloss", "brier", "pinball", "rank") or a callable loss(y_true, y_pred, groups).

'squared'
proba bool

If True, use predict_proba instead of predict. This is required for classification log-loss / Brier targets. Binary probabilities are stored as the positive-class column; multiclass probabilities are stored as a 2-D array and passed through to the loss.

False
refit_on_all bool

If True (default), also refit a clone of the base model on all data and expose it as OOFResult.estimator. See the module docstring for the "g trained on errors of a slightly smaller f" assumption this entails.

True
Notes

Rows never assigned to a test fold (e.g. the earliest rows under walk-forward) are excluded from the returned OOFResult. If a row is assigned to more than one test fold (e.g. repeated CV), a warning is issued and the last fold's prediction is kept, since averaging would break the one-error-per-row contract that g is trained on.
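The behaviour described in these notes can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: MeanModel, walk_forward_splits and collect_oof are hypothetical names, a model factory stands in for per-fold cloning, and squared error stands in for the configurable loss.

```python
import warnings
from statistics import mean


class MeanModel:
    """Toy base model f (hypothetical): predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def walk_forward_splits(n, n_folds):
    """Yield (train, test) index lists; the first block is never a test fold."""
    block = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        stop = n if k == n_folds else (k + 1) * block
        yield list(range(0, k * block)), list(range(k * block, stop))


def collect_oof(make_model, splits, X, y):
    """Collect one out-of-fold prediction and squared error per tested row."""
    preds, fold_of = {}, {}
    for fold, (train, test) in enumerate(splits):
        # A fresh model per fold, so no instance is reused across folds.
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, model.predict([X[i] for i in test])):
            if i in preds:
                warnings.warn(f"row {i} is in more than one test fold; keeping the last")
            preds[i], fold_of[i] = p, fold  # last fold wins
    indices = sorted(preds)  # rows never tested are simply absent
    predictions = [preds[i] for i in indices]
    errors = [(y[i] - preds[i]) ** 2 for i in indices]
    fold_ids = [fold_of[i] for i in indices]
    return indices, predictions, errors, fold_ids


X = [[float(i)] for i in range(8)]
y = [float(i) for i in range(8)]
indices, predictions, errors, fold_ids = collect_oof(
    MeanModel, walk_forward_splits(8, 3), X, y
)
```

In this toy run rows 0 and 1 are only ever used for training, so they do not appear in indices, which mirrors the exclusion rule above.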

fit_collect(X, y, groups=None)

Run the out-of-fold loop and return the collected errors.

Parameters:

Name Type Description Default
X Any

Training features.

required
y Any

Training targets.

required
groups ArrayLike | None

Optional per-row group labels (e.g. dates). Passed to the splitter and to group-aware losses such as "rank".

None

Resolve loss (a registry name or a callable) to a loss function.

For pinball, pass q (default 0.5) or use the string "pinball:0.9".
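One way such a resolver could work is sketched below. The function name resolve_loss_sketch, the two-entry registry and the error messages are assumptions made for illustration; only the "pinball:0.9" string form, the q default of 0.5 and the loss(y_true, y_pred, groups) call shape come from the text above.

```python
def squared(y_true, y_pred, groups=None):
    """Pointwise squared error."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


def absolute(y_true, y_pred, groups=None):
    """Pointwise absolute error."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def pinball(q):
    """Build a pointwise pinball (quantile) loss for quantile level q."""

    def loss(y_true, y_pred, groups=None):
        out = []
        for t, p in zip(y_true, y_pred):
            diff = t - p
            # Under-prediction is weighted by q, over-prediction by (1 - q).
            out.append(q * diff if diff >= 0 else (q - 1.0) * diff)
        return out

    return loss


def resolve_loss_sketch(loss, q=0.5):
    """Hypothetical resolver: accept a callable, a name, or "pinball:<q>"."""
    if callable(loss):
        return loss
    name, _, arg = loss.partition(":")
    if name == "pinball":
        return pinball(float(arg) if arg else q)
    registry = {"squared": squared, "absolute": absolute}  # illustrative subset
    if arg or name not in registry:
        raise ValueError(f"unknown loss: {loss!r}")
    return registry[name]


p90 = resolve_loss_sketch("pinball:0.9")([1.0, 0.0], [0.0, 1.0])
p50 = resolve_loss_sketch("pinball")([1.0, 0.0], [0.0, 1.0])
sq = resolve_loss_sketch("squared")([3.0], [1.0])
```

With q = 0.9, an under-prediction of one unit costs nine times as much as an over-prediction of one unit.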

Stabilize heavy-tailed error targets before training g.

  • log: log(error + eps) (default; used by DEUPRegressor)
  • asinh: asinh(error / eps) — robust alternative for very heavy tails
  • none: identity

Map g's prediction back to the error scale.
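The three transforms and their inverses can be sketched as follows. The forward formulas are the ones listed above; the inverse formulas, the eps default and the function names are assumptions (the library may, for example, clip the result at zero).

```python
import math


def transform_error(error, method="log", eps=1e-6):
    """Forward transform applied to an error target before training g."""
    if method == "log":
        return math.log(error + eps)
    if method == "asinh":
        return math.asinh(error / eps)
    if method == "none":
        return error
    raise ValueError(f"unknown method: {method!r}")


def inverse_transform(z, method="log", eps=1e-6):
    """Assumed algebraic inverse: maps a prediction of g back to the error scale."""
    if method == "log":
        return math.exp(z) - eps
    if method == "asinh":
        return eps * math.sinh(z)
    if method == "none":
        return z
    raise ValueError(f"unknown method: {method!r}")


# Round trip for each method on a single error value.
round_trips = {
    m: inverse_transform(transform_error(2.5, m), m)
    for m in ("log", "asinh", "none")
}
```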

Maps each row to a group and supports within-group operations.

Attributes:

Name Type Description
codes NDArray[Any]

Integer group code per row, in [0, n_groups).

labels NDArray[Any]

The unique group labels, indexed by code.

n_groups property

Number of distinct groups.

is_trivial property

True when there is a single group (the i.i.d. case).

from_labels(group_labels, n) classmethod

Build a grouping from per-row labels.

Parameters:

Name Type Description Default
group_labels ArrayLike | None

Per-row group labels (e.g. dates). If None, all n rows form a single trivial group (the i.i.d. case).

required
n int

Number of rows (used to size the trivial group and validate lengths).

required

indices()

Row indices for each group, ordered by group code.
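A rough sketch of how codes, labels and per-group indices relate is given below. The function names are hypothetical, and two details are assumptions: labels are ordered by sorting, and the trivial group gets a placeholder label of 0.

```python
def codes_and_labels(group_labels, n):
    """Return (codes, labels): an integer code per row and the unique labels."""
    if group_labels is None:
        return [0] * n, [0]  # single trivial group; placeholder label is an assumption
    if len(group_labels) != n:
        raise ValueError("group_labels must have length n")
    labels = sorted(set(group_labels))  # sorted order is an assumption
    code_of = {label: code for code, label in enumerate(labels)}
    return [code_of[g] for g in group_labels], labels


def indices_by_group(codes, n_groups):
    """Row indices for each group, ordered by group code."""
    out = [[] for _ in range(n_groups)]
    for row, code in enumerate(codes):
        out[code].append(row)
    return out


dates = ["2024-01-02", "2024-01-01", "2024-01-02", "2024-01-01", "2024-01-03"]
codes, labels = codes_and_labels(dates, len(dates))
groups = indices_by_group(codes, len(labels))
```

Each code lies in [0, n_groups) and labels[code] recovers the original label for a row.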

rank_within(values, pct=True)

Rank values within each group.

Ties are averaged. With pct=True (default) ranks are divided by the group size, matching pandas.Series.groupby(...).rank(pct=True) — the convention used for cross-sectional rank features and rank losses.
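The ranking convention can be illustrated with a small pure-Python sketch. The standalone function rank_within_groups is hypothetical and takes the group codes explicitly, whereas the documented method lives on the grouping object.

```python
def rank_within_groups(values, codes, pct=True):
    """Average-tie ranks within each group; divided by group size if pct."""
    rows_by_group = {}
    for row, code in enumerate(codes):
        rows_by_group.setdefault(code, []).append(row)

    ranks = [0.0] * len(values)
    for rows in rows_by_group.values():
        ordered = sorted(rows, key=lambda r: values[r])
        i = 0
        while i < len(ordered):
            # Extend j over the run of values tied with position i.
            j = i
            while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
            for k in range(i, j + 1):
                ranks[ordered[k]] = avg / len(rows) if pct else avg
            i = j + 1
    return ranks


values = [10, 20, 20, 5, 7]
codes = [0, 0, 0, 1, 1]
raw = rank_within_groups(values, codes, pct=False)
scaled = rank_within_groups(values, codes, pct=True)
```

Here the two tied values in the first group share rank 2.5, and with pct=True the largest value in every group gets 1.0 only when it is not tied.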

Out-of-fold artifacts produced when collecting a base model's errors.

Attributes:

Name Type Description
predictions NDArray[Any]

Out-of-fold predictions of the base model f, one per row.

errors NDArray[Any]

Per-row error targets that the secondary predictor g will learn from (e.g. squared residuals or per-group rank losses).

fold_ids NDArray[Any]

The fold in which each row was held out. Useful for diagnostics and for walk-forward reporting.

group_ids NDArray[Any] | None

Optional per-row group label (e.g. a date for cross-sectional ranking). None for i.i.d. data.

indices NDArray[Any] | None

Optional positions of these rows in the original input X (the rows that received an out-of-fold prediction). None if not tracked.

estimator Any

Optionally, the base model refit on all data for deployment. None if the caller chose not to refit.

n property

Number of rows.

A prediction together with its uncertainty decomposition.

Attributes:

Name Type Description
prediction NDArray[Any]

Point prediction of the base model.

epistemic NDArray[Any]

Estimated epistemic uncertainty g(x) (optionally net of aleatoric).

aleatoric NDArray[Any] | None

Optional estimated aleatoric (irreducible) uncertainty a(x).

lower, upper

Optional calibrated prediction-interval bounds.

n property

Number of rows.
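As a hedged illustration of "net of aleatoric", one plausible reading is that the aleatoric estimate a(x) is subtracted from g(x) and the result is floored at zero. The helper below is hypothetical, and the clipping in particular is an assumption rather than documented behaviour.

```python
def net_epistemic(g_values, aleatoric=None):
    """Epistemic estimate per row: g(x), or max(g(x) - a(x), 0) if a(x) is given."""
    if aleatoric is None:
        return list(g_values)
    # Flooring at zero (an assumption) keeps the estimate non-negative.
    return [max(g - a, 0.0) for g, a in zip(g_values, aleatoric)]


plain = net_epistemic([0.5, 0.25])
netted = net_epistemic([0.5, 0.25], [0.25, 0.5])
```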