Decomposition
Bases: BaseEstimator
Fit and predict the DEUP error model g.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Any | The secondary regressor. Defaults to :class: | None |
| features | Any | Optional feature builder (e.g. a :class: | None |
| target_transform | TargetTransform | Stabilization for the error target: | 'log' |
| error_eps | float | Stabilizer for | 1e-06 |
| clip_negative | bool | If | True |
Attributes:
| Name | Type | Description |
|---|---|---|
| model_ | | The fitted secondary regressor. |
| features_ | | The fitted feature builder (or |
fit(X, errors, y=None)
Fit g on (features(X), errors).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | Any | Inputs aligned with | required |
| errors | ArrayLike | Non-negative pointwise error targets from the OOF collector. | required |
| y | ArrayLike \| None | Optional original targets, forwarded to feature builders that need them (e.g. | None |
predict(X)
Predict the (non-negative) error estimate g(x).
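The fit/predict cycle above can be illustrated with a minimal sketch. It assumes that the 'log' stabilization means regressing log(error + error_eps) and inverting with exp at prediction time, and that clip_negative floors predictions at zero; both are assumptions, since the parameter descriptions do not spell them out. `LogErrorModel` and `fit_line` are hypothetical names, and a 1-D least-squares line stands in for the secondary regressor.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x on 1-D inputs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope

class LogErrorModel:
    """Hypothetical stand-in for the error model g (not the library class)."""

    def __init__(self, error_eps=1e-6, clip_negative=True):
        self.error_eps = error_eps
        self.clip_negative = clip_negative

    def fit(self, xs, errors):
        # Assumed 'log' stabilization: regress log(error + eps) on x.
        targets = [math.log(e + self.error_eps) for e in errors]
        self.intercept_, self.slope_ = fit_line(xs, targets)
        return self

    def predict(self, xs):
        # Invert the transform, then optionally floor the result at zero.
        preds = [math.exp(self.intercept_ + self.slope_ * x) - self.error_eps
                 for x in xs]
        if self.clip_negative:
            preds = [max(0.0, p) for p in preds]
        return preds

xs = [0.0, 1.0, 2.0, 3.0]
errors = [0.1, 0.2, 0.4, 0.8]  # toy data: the error doubles per unit of x
g = LogErrorModel().fit(xs, errors)
g_hat = g.predict([4.0])
```

Fitting in log space keeps a few very large errors from dominating the regression, and the inverse transform keeps predictions positive up to the small eps offset.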
Bases: BaseEstimator
Constant aleatoric variance a(x) = sigma^2 for all x.
The global noise level is estimated as the mean local label variance among
k nearest neighbors (a bias-corrected estimate of Var(Y | X) averaged over
the training inputs). Use when label noise is believed roughly constant across the
input space (the paper's scenario 3 with a non-zero floor).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int | Neighbors used to estimate local label variance. | 10 |
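As a rough illustration of the global noise estimate, the sketch below averages local k-NN label variances over 1-D training inputs. It assumes "bias-corrected" means the sample variance with an n - 1 denominator and that each point counts among its own neighbors; the function names are hypothetical and the library may differ on both points.

```python
from statistics import mean, variance

def local_label_variances(xs, ys, k):
    """Sample variance (n - 1 denominator) of y among the k nearest x, per point."""
    out = []
    for x in xs:
        # Indices of the k closest training inputs (assumption: self included).
        nearest = sorted(range(len(xs)), key=lambda j: abs(xs[j] - x))[:k]
        out.append(variance([ys[j] for j in nearest]))
    return out

def constant_aleatoric(xs, ys, k=10):
    """Global noise level sigma^2: the mean of the local k-NN label variances."""
    return mean(local_label_variances(xs, ys, k))

# Toy data: a noisy cluster near x = 0 and a noiseless one near x = 10.
xs = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
ys = [1.0, 2.0, 3.0, 10.0, 10.0, 10.0]
sigma2 = constant_aleatoric(xs, ys, k=3)
```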
Bases: BaseEstimator
Input-dependent aleatoric variance via local k-NN label variance.
For each x the estimate is the bias-corrected variance of training y among
its k nearest neighbors, which gives a model-free estimate of Var(Y | X = x).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| k | int | Number of neighbors for the local variance estimate. | 10 |
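A minimal sketch of the input-dependent variant, evaluated at arbitrary query points. It assumes 1-D inputs, absolute-difference distance, and the sample variance (n - 1 denominator) as the bias-corrected estimator; `knn_aleatoric` is a hypothetical name, not the library's API.

```python
from statistics import variance

def knn_aleatoric(train_x, train_y, query_x, k=10):
    """a(x): sample variance of training y among the k nearest training inputs."""
    estimates = []
    for q in query_x:
        nearest = sorted(range(len(train_x)),
                         key=lambda j: abs(train_x[j] - q))[:k]
        estimates.append(variance([train_y[j] for j in nearest]))
    return estimates

# Toy data: labels are constant near x = 0 and spread out near x = 5.
train_x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
train_y = [1.0, 1.0, 1.0, 0.0, 2.0, 4.0]
a_hat = knn_aleatoric(train_x, train_y, [0.1, 5.1], k=3)
```

Small k makes the estimate more local but noisier; large k smooths it toward the constant-variance case.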
Bases: BaseEstimator
Aleatoric variance from a quantile-regression spread.
Fits two quantile regressors at q_lo and q_hi and converts the predicted
interval width to a variance via the Gaussian relation
sigma = (q_hi - q_lo) / (z_hi - z_lo), then a(x) = sigma^2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| estimator | Any | A quantile regressor factory taking a | None |
| q_lo | float | Lower / upper quantiles (default 0.159 / 0.841 ~ +/-1 sigma). | 0.159 |
| q_hi | float | Lower / upper quantiles (default 0.159 / 0.841 ~ +/-1 sigma). | 0.841 |
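The Gaussian conversion from interval width to variance can be sketched with the standard library alone. Here `pred_lo` and `pred_hi` stand for the two regressors' predicted quantile values at a single x, and `variance_from_quantiles` is a hypothetical helper name.

```python
from statistics import NormalDist

def variance_from_quantiles(pred_lo, pred_hi, q_lo=0.159, q_hi=0.841):
    """Convert a predicted [q_lo, q_hi] interval into a Gaussian-equivalent variance."""
    std_normal = NormalDist()
    z_lo = std_normal.inv_cdf(q_lo)  # standard-normal score of the lower level
    z_hi = std_normal.inv_cdf(q_hi)  # standard-normal score of the upper level
    sigma = (pred_hi - pred_lo) / (z_hi - z_lo)
    return sigma ** 2

# Sanity check on an exact Gaussian with mean 5 and sigma 3:
# its true quantiles should map back to a variance of about 9.
truth = NormalDist(mu=5.0, sigma=3.0)
a_hat = variance_from_quantiles(truth.inv_cdf(0.159), truth.inv_cdf(0.841))
```

Note that q_lo and q_hi must differ, otherwise z_hi - z_lo is zero and the conversion is undefined.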
Return the epistemic estimate e_hat = max(0, g - a).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| error | ArrayLike | The error estimate | required |
| aleatoric | ArrayLike \| None | The aleatoric estimate | None |
| clip | bool | If | True |
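A minimal sketch of the subtraction e_hat = max(0, g - a) on plain lists. Treating a missing aleatoric estimate as zero is an assumption made for this example, and `epistemic` is a hypothetical name.

```python
def epistemic(error, aleatoric=None, clip=True):
    """e_hat = g - a, floored at zero when clip is True."""
    if aleatoric is None:
        # Assumption: no aleatoric estimate is treated as a = 0 everywhere.
        aleatoric = [0.0] * len(error)
    diff = [g - a for g, a in zip(error, aleatoric)]
    return [max(0.0, d) for d in diff] if clip else diff

g = [0.5, 0.2, 1.0]
a = [0.1, 0.3, 1.0]
e_hat = epistemic(g, a)                 # clipped at zero
e_raw = epistemic(g, a, clip=False)     # may contain negative entries
```

Negative raw differences occur when the aleatoric estimate exceeds the total error estimate; clipping keeps the result interpretable as a non-negative uncertainty.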
Residualize a signal on the within-group rank of a model score.
For cross-sectional rankers the raw epistemic signal can be partly mechanical:
the within-date rank percentile of |score| mechanically tracks the loss target
(Finding 3, per-date rho(e_hat, |score|) ~ 0.616). This transform fits an isotonic
map from the within-group rank to the signal and subtracts it, leaving the part of
the signal not explained by rank geometry.
Apply the same fitted residualizer to both g and the loss target to obtain a
decoupled signal whose association with realized loss can then be measured honestly.
The axis to rank on is supplied as the score argument of fit/transform
(pass |score| to decouple from rank-of-conviction).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| out_of_bounds | str | Passed to :class: | 'clip' |
fit(values, score, groups=None)
Fit the isotonic rank -> value map (pooled across groups).
transform(values, score, groups=None)
Return values minus the rank-explained component.
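The fit/transform pair can be sketched in pure Python. This is an illustration under assumptions, not the library's implementation: it uses a step-function isotonic fit (pool adjacent violators) rather than an interpolating one, clips out-of-range ranks to the nearest block, and the names `rank_pct`, `fit_rank_map`, and `remove_rank_component` are hypothetical.

```python
from bisect import bisect_left

def rank_pct(score, groups=None):
    """Within-group rank percentile in (0, 1]; ties are broken by input order."""
    groups = [0] * len(score) if groups is None else groups
    by_group = {}
    for i, grp in enumerate(groups):
        by_group.setdefault(grp, []).append(i)
    pct = [0.0] * len(score)
    for idx in by_group.values():
        for r, i in enumerate(sorted(idx, key=lambda j: score[j]), start=1):
            pct[i] = r / len(idx)
    return pct

def fit_rank_map(values, score, groups=None):
    """Pooled isotonic (non-decreasing) step fit of values on the rank percentile."""
    sums = {}
    for x, v in zip(rank_pct(score, groups), values):
        s, n = sums.get(x, (0.0, 0))
        sums[x] = (s + v, n + 1)  # average tied ranks before pooling
    blocks = []  # each block: [right edge, sum of values, count]
    for x in sorted(sums):
        s, n = sums[x]
        blocks.append([x, s, n])
        # Merge while the previous block mean exceeds the newest block mean.
        while len(blocks) > 1 and (blocks[-2][1] / blocks[-2][2]
                                   > blocks[-1][1] / blocks[-1][2]):
            edge, s2, n2 = blocks.pop()
            blocks[-1][0] = edge
            blocks[-1][1] += s2
            blocks[-1][2] += n2
    return [b[0] for b in blocks], [b[1] / b[2] for b in blocks]

def remove_rank_component(model, values, score, groups=None):
    """Return values minus the rank-explained component."""
    edges, means = model
    out = []
    for x, v in zip(rank_pct(score, groups), values):
        block = min(bisect_left(edges, x), len(means) - 1)  # clip out-of-range
        out.append(v - means[block])
    return out

groups = ["d1"] * 4 + ["d2"] * 4
score = [0.1, 0.4, 0.2, 0.3, 1.0, 3.0, 2.0, 4.0]   # e.g. |score| per date
signal = [0.6, 1.9, 1.0, 1.7, 0.4, 1.6, 1.0, 1.8]  # toy values
model = fit_rank_map(signal, score, groups)
resid = remove_rank_component(model, signal, score, groups)
```

A signal that is an exact monotone function of the within-group rank residualizes to zero, which is the mechanical part the transform is meant to remove.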
Quantify rank-geometry coupling reduction and loss-association retention.
Returns Spearman rho(g, |score|) before/after residualization (coupling) and
rho(signal, loss) before/after (loss association). retention is the ratio
of after/before loss association (Finding 3 reports R ~ 0.955).
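A sketch of how these diagnostics could be computed, using a pure-Python Spearman correlation (Pearson correlation of average ranks). The dictionary keys, the function names, and the choice to correlate the residualized signal with a residualized loss in the "after" case are assumptions made for illustration.

```python
def average_ranks(v):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rho: Pearson correlation computed on average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

def coupling_report(g, g_resid, abs_score, loss, loss_resid):
    """Coupling and loss association before/after residualization."""
    before = spearman(g, loss)
    after = spearman(g_resid, loss_resid)
    return {
        "coupling_before": spearman(g, abs_score),
        "coupling_after": spearman(g_resid, abs_score),
        "loss_assoc_before": before,
        "loss_assoc_after": after,
        "retention": after / before,  # ratio of after to before
    }

# Toy inputs, chosen only to exercise the code.
abs_score = [1.0, 2.0, 3.0, 4.0, 5.0]
g = [1.0, 2.0, 3.0, 4.0, 5.5]
g_resid = [0.1, -0.2, 0.3, -0.1, 0.2]
loss = [1.0, 3.0, 2.0, 5.0, 4.0]
loss_resid = [0.2, -0.1, 0.4, -0.3, 0.1]
report = coupling_report(g, g_resid, abs_score, loss, loss_resid)
```

A useful residualization drives the coupling toward zero while keeping retention close to one.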
Decide whether to keep density features in g.
Finding 3 corollary: in homogeneous tabular/finance universes density features can
add no signal beyond rank geometry. Drop density when BOTH its gain importance
is negligible AND adding it changes the loss partial-correlation by less than
corr_tol.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gain_importance | float | The density feature's relative gain importance in | required |
| delta_partial_corr | float | | required |
| importance_tol | float | Thresholds below which each signal is considered negligible. | 0.001 |
| corr_tol | float | Thresholds below which each signal is considered negligible. | 0.001 |
Returns:
| Type | Description |
|---|---|
| DensityKillDecision | |
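The two-condition rule can be written as a short predicate. This sketch returns a plain bool instead of a DensityKillDecision, whose fields are not documented here, and it compares the absolute change in partial correlation against corr_tol, which is an assumption. `should_drop_density` is a hypothetical name.

```python
def should_drop_density(gain_importance, delta_partial_corr,
                        importance_tol=0.001, corr_tol=0.001):
    """True only when BOTH signals fall below their thresholds."""
    negligible_gain = gain_importance < importance_tol
    # Assumption: the size of the change matters, not its sign.
    negligible_corr = abs(delta_partial_corr) < corr_tol
    return negligible_gain and negligible_corr

drop = should_drop_density(gain_importance=0.0004, delta_partial_corr=-0.0002)
keep = should_drop_density(gain_importance=0.0004, delta_partial_corr=0.02)
```

Requiring both conditions is the conservative choice: a feature with negligible gain importance is still kept if removing it would move the loss partial correlation.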