API: Core
Collect a base model's out-of-fold predictions and pointwise errors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `estimator` | `Any` | The base model | required |
| `cv` | `Any` | A splitter exposing | required |
| `loss` | `str \| LossFn` | Error-target loss: a registry name ( | `'squared'` |
| `proba` | `bool` | If | `False` |
| `refit_on_all` | `bool` | If | `True` |
Notes
Rows never assigned to a test fold (e.g. the earliest rows under walk-forward)
are excluded from the returned `deup.core.types.OOFResult`. If a row is
assigned to more than one test fold (e.g. repeated CV), a warning is raised and
the last fold's prediction is kept, since averaging would break the
one-error-per-row contract that `g` is trained on.
fit_collect(X, y, groups=None)
Run the out-of-fold loop and return the collected errors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `Any` | Training features and targets. | required |
| `y` | `Any` | Training features and targets. | required |
| `groups` | `ArrayLike \| None` | Optional per-row group labels (e.g. dates). Passed to the splitter and to group-aware losses such as | `None` |
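The out-of-fold loop and the two rules from the notes (uncovered rows are dropped, the last fold wins on overlap) can be illustrated with a minimal NumPy sketch. This is not the library's implementation: `collect_oof_sketch`, the least-squares stand-in for the base model, and the hand-written walk-forward folds are all assumptions made for illustration, and the loss is fixed to squared error.

```python
import warnings

import numpy as np


def collect_oof_sketch(X, y, folds):
    """Hypothetical helper: collect out-of-fold predictions and squared errors.

    `folds` is an iterable of (train_idx, test_idx) pairs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.full(n, np.nan)
    fold_ids = np.full(n, -1)
    for k, (train_idx, test_idx) in enumerate(folds):
        # Stand-in base model: ordinary least squares with an intercept.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        if np.any(fold_ids[test_idx] >= 0):
            warnings.warn("row held out in more than one fold; keeping the last prediction")
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        preds[test_idx] = A_test @ coef  # later folds overwrite earlier ones
        fold_ids[test_idx] = k
    # Rows never held out are excluded from the result.
    kept = np.flatnonzero(fold_ids >= 0)
    errors = (y[kept] - preds[kept]) ** 2  # one error target per kept row
    return kept, preds[kept], errors, fold_ids[kept]


# Walk-forward example: rows 0 and 1 are only ever used for training.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
folds = [(np.arange(0, 2), np.arange(2, 4)), (np.arange(0, 4), np.arange(4, 6))]
kept, preds, errors, fold_ids = collect_oof_sketch(X, y, folds)
```

In this example the returned arrays cover only rows 2 to 5, which mirrors the exclusion of the earliest rows under walk-forward splitting.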
Resolve `loss` (a registry name or a callable) to a loss function.
For pinball, pass `q` (default 0.5) or use the string `"pinball:0.9"`.
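As a rough illustration of how a name such as `"pinball:0.9"` could be resolved, the sketch below parses the optional quantile after the colon and passes callables through unchanged. The function names (`resolve_loss_sketch`, `make_pinball`, `squared`) and the two-entry registry are hypothetical; the real registry and its contents are not shown in this page.

```python
import numpy as np


def squared(y_true, y_pred):
    """Pointwise squared error."""
    return (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2


def make_pinball(q=0.5):
    """Return a pointwise pinball (quantile) loss for quantile level q."""
    def pinball(y_true, y_pred):
        diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
        return np.maximum(q * diff, (q - 1.0) * diff)
    return pinball


def resolve_loss_sketch(loss, q=0.5):
    """Hypothetical resolver: accept a callable or a 'name[:param]' string."""
    if callable(loss):
        return loss
    name, _, param = loss.partition(":")
    if name == "squared":
        return squared
    if name == "pinball":
        # A quantile embedded in the string takes precedence over `q`.
        return make_pinball(float(param) if param else q)
    raise ValueError(f"unknown loss: {loss!r}")


under = resolve_loss_sketch("pinball:0.9")(1.0, 0.0)  # under-prediction
over = resolve_loss_sketch("pinball:0.9")(0.0, 1.0)   # over-prediction
```

With `q = 0.9`, under-predicting by one unit costs nine times as much as over-predicting by one unit, which is the asymmetry the pinball loss is meant to encode.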
Stabilize heavy-tailed error targets before training `g`.
- `log`: `log(error + eps)` (default; used by `deup.estimators.DEUPRegressor`)
- `asinh`: `asinh(error / eps)`, a robust alternative for very heavy tails
- `none`: identity
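The three transforms are simple enough to write out directly. The sketch below assumes nonnegative error targets; the function name `stabilize_sketch` and the default `eps=1e-6` are assumptions for illustration, since the library's actual default for `eps` is not stated here.

```python
import numpy as np


def stabilize_sketch(errors, kind="log", eps=1e-6):
    """Hypothetical helper applying one of the three documented transforms."""
    errors = np.asarray(errors, dtype=float)
    if kind == "log":
        return np.log(errors + eps)      # eps keeps log finite at error == 0
    if kind == "asinh":
        return np.arcsinh(errors / eps)  # linear near 0, logarithmic in the tail
    if kind == "none":
        return errors
    raise ValueError(f"unknown transform: {kind!r}")


raw = np.array([0.0, 1.0, 1000.0])
log_targets = stabilize_sketch(raw, "log", eps=1.0)
asinh_targets = stabilize_sketch(raw, "asinh", eps=1.0)
```

Both transforms compress the tail, so a single very large error dominates the regression target for `g` far less than it would on the raw scale.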
Maps each row to a group and supports within-group operations.
Attributes:
| Name | Type | Description |
|---|---|---|
| `codes` | `NDArray[Any]` | Integer group code per row, in |
| `labels` | `NDArray[Any]` | The unique group labels, indexed by code. |
n_groups (property)
Number of distinct groups.
is_trivial (property)
True when there is a single group (the i.i.d. case).
from_labels(group_labels, n) (classmethod)
Build a grouping from per-row labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `group_labels` | `ArrayLike \| None` | Per-row group labels (e.g. dates). If | required |
| `n` | `int` | Number of rows (used to size the trivial group and validate lengths). | required |
indices()
Row indices for each group, ordered by group code.
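One plausible way to derive `codes`, `labels`, and the per-group row indices is `numpy.unique` with `return_inverse=True`. The sketch below is an assumption about the approach, not the library's code: the function names are hypothetical, and treating `group_labels=None` as a single group of `n` rows is inferred from the `is_trivial` description rather than confirmed.

```python
import numpy as np


def grouping_from_labels_sketch(group_labels, n):
    """Hypothetical helper returning (codes, labels) for per-row labels."""
    if group_labels is None:
        # Assumed behavior: one trivial group containing all n rows.
        return np.zeros(n, dtype=int), np.array([0])
    group_labels = np.asarray(group_labels)
    if len(group_labels) != n:
        raise ValueError("group_labels must have exactly n entries")
    # labels are the sorted unique values; codes index into labels per row.
    labels, codes = np.unique(group_labels, return_inverse=True)
    return codes, labels


def group_indices_sketch(codes, n_groups):
    """Row indices for each group, ordered by group code."""
    return [np.flatnonzero(codes == code) for code in range(n_groups)]


codes, labels = grouping_from_labels_sketch(["b", "a", "b"], n=3)
rows = group_indices_sketch(codes, len(labels))
trivial_codes, trivial_labels = grouping_from_labels_sketch(None, n=4)
```

Because `labels[codes]` reproduces the original per-row labels, the integer codes can be used for fast within-group operations without losing the mapping back to dates or other identifiers.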
rank_within(values, pct=True)
Rank `values` within each group.
Ties are averaged. With `pct=True` (default) ranks are divided by the
group size, matching `pandas.Series.groupby(...).rank(pct=True)`, the
convention used for cross-sectional rank features and rank losses.
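The ranking convention described above (average ranks for ties, divided by group size when `pct=True`) can be written out in plain NumPy. This is a sketch of the convention only; `rank_within_sketch` is a hypothetical standalone function that takes the group codes explicitly instead of reading them from a grouping object.

```python
import numpy as np


def rank_within_sketch(values, codes, pct=True):
    """Hypothetical helper: average-tie ranks of `values` within each group."""
    values = np.asarray(values, dtype=float)
    codes = np.asarray(codes)
    out = np.empty(len(values), dtype=float)
    for code in np.unique(codes):
        idx = np.flatnonzero(codes == code)
        v = values[idx]
        order = np.argsort(v, kind="stable")
        sorted_v = v[order]
        ranks = np.empty(len(v), dtype=float)
        start = 0
        while start < len(v):
            # Extend the run over all values tied with sorted_v[start].
            stop = start
            while stop + 1 < len(v) and sorted_v[stop + 1] == sorted_v[start]:
                stop += 1
            # Tied values share the mean of their 1-based positions.
            ranks[order[start:stop + 1]] = (start + stop) / 2.0 + 1.0
            start = stop + 1
        out[idx] = ranks / len(v) if pct else ranks
    return out


values = [10.0, 20.0, 20.0, 5.0, 7.0]
codes = [0, 0, 0, 1, 1]
pct_ranks = rank_within_sketch(values, codes)
raw_ranks = rank_within_sketch(values, codes, pct=False)
```

In the first group the two tied values occupy positions 2 and 3, so each receives rank 2.5, or 2.5 / 3 as a fraction of the group size.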
Out-of-fold artifacts produced when collecting a base model's errors.
Attributes:
| Name | Type | Description |
|---|---|---|
| `predictions` | `NDArray[Any]` | Out-of-fold predictions of the base model |
| `errors` | `NDArray[Any]` | Per-row error targets that the secondary predictor |
| `fold_ids` | `NDArray[Any]` | The fold in which each row was held out. Useful for diagnostics and for walk-forward reporting. |
| `group_ids` | `NDArray[Any] \| None` | Optional per-row group label (e.g. a date for cross-sectional ranking). |
| `indices` | `NDArray[Any] \| None` | Optional positions of these rows in the original input |
| `estimator` | `Any` | Optionally, the base model refit on all data for deployment. |
n (property)
Number of rows.
A prediction together with its uncertainty decomposition.
Attributes:
| Name | Type | Description |
|---|---|---|
| `prediction` | `NDArray[Any]` | Point prediction of the base model. |
| `epistemic` | `NDArray[Any]` | Estimated epistemic uncertainty |
| `aleatoric` | `NDArray[Any] \| None` | Optional estimated aleatoric (irreducible) uncertainty |
| `lower`, `upper` | | Optional calibrated prediction-interval bounds. |
n (property)
Number of rows.