Conformal calibration

DEUP's predict_epistemic returns an uncalibrated score: higher means "less trustworthy", but the value is neither a probability nor an error bound. Split-conformal calibration turns it into prediction intervals with finite-sample, distribution-free marginal coverage \(P(y \in [\hat{y}^-, \hat{y}^+]) \ge 1 - \alpha\), using the DEUP signal to scale each interval's width. The guarantee assumes that calibration and test points are exchangeable (for example, i.i.d.).

How it works

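On a held-out calibration set of \(n_{\text{cal}}\) points, compute normalized residuals \(r_i = |y_i - f(x_i)| / g(x_i)\), where \(f\) is the base model and \(g\) is the DEUP error model (the predict_epistemic score). Take \(q\) to be the \(\lceil (n_{\text{cal}}+1)(1-\alpha) \rceil\)-th smallest \(r_i\), i.e. the \((1-\alpha)\) empirical quantile with the finite-sample correction the coverage guarantee relies on. The interval at a new point is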

\[ [\,f(x) - q\,g(x),\;\; f(x) + q\,g(x)\,]. \]

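Intervals are narrow where \(g\) is small (confident) and wide where \(g\) is large, so the width adapts locally, unlike a constant-width baseline. The coverage guarantee itself remains marginal (averaged over \(x\)), not conditional on each individual \(x\).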
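The recipe above fits in a few lines of NumPy. This is a minimal sketch, not deup's implementation: `conformal_quantile` and `normalized_interval` are hypothetical helper names, and `g` stands for whatever positive per-point scale you have (for example the output of predict_epistemic). The synthetic check uses the true noise scale as \(g\) and a perfect mean model as \(f\), purely for clarity.

```python
import math

import numpy as np


def conformal_quantile(y_cal, y_pred_cal, g_cal, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of |y - f(x)| / g(x).

    Hypothetical helper: y_pred_cal = f(x_i), g_cal = g(x_i) > 0 on held-out rows.
    """
    scores = np.abs(np.asarray(y_cal) - np.asarray(y_pred_cal)) / np.asarray(g_cal)
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points for this alpha: infinite interval
    return np.sort(scores)[k - 1]


def normalized_interval(y_pred, g, q):
    """Interval [f(x) - q g(x), f(x) + q g(x)] at new points."""
    y_pred, g = np.asarray(y_pred), np.asarray(g)
    return y_pred - q * g, y_pred + q * g


# Synthetic check: heteroscedastic noise whose true scale is used as g(x).
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 1, size=2 * n)
sigma = 0.1 + x                               # true noise scale, stands in for g
y = 2 * x + sigma * rng.standard_normal(2 * n)
f = 2 * x                                     # a perfect mean model, for clarity

q = conformal_quantile(y[:n], f[:n], sigma[:n], alpha=0.1)   # first half: calibration
lower, upper = normalized_interval(f[n:], sigma[n:], q)     # second half: test
coverage = np.mean((y[n:] >= lower) & (y[n:] <= upper))
width = upper - lower
```

Because the normalized residuals here are \(|N(0,1)|\) draws, \(q\) lands near 1.645, widths grow with \(x\), and empirical coverage stays close to 90%.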

Usage

from deup import DEUPRegressor

model = DEUPRegressor(base_model=my_model).fit(X_train, y_train)

# calibrate on a separate held-out split (NOT the training data)
model.calibrate(X_cal, y_cal, method="normalized", alpha=0.1)

interval = model.predict_interval(X_test)
interval.lower, interval.upper, interval.width   # per-point bounds and interval widths

Use held-out data

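Coverage guarantees require the calibration set to be unseen by both the base model \(f\) and the error model \(g\). Residuals on training rows are optimistically small, so a quantile computed from them is too small and the intervals under-cover. Don't calibrate on training rows.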
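An extreme overfitter makes the failure easy to see. The sketch below uses a hand-rolled 1-nearest-neighbour regressor as a deliberately overfit stand-in for \(f\), with a constant scale \(g(x) = 1\) (so this is plain split conformal). On its own training rows every residual is exactly zero, so the calibrated quantile collapses to 0 and the intervals have zero width. `one_nn_predict` and `conformal_q` are hypothetical names, not deup functions.

```python
import math

import numpy as np


def one_nn_predict(x_train, y_train, x_query):
    """1-nearest-neighbour regression on 1-D inputs (memorizes its training set)."""
    idx = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]


def conformal_q(abs_residuals, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest residual."""
    n = abs_residuals.size
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else np.sort(abs_residuals)[k - 1]


rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=3000)
y = np.sin(6 * x) + 0.3 * rng.standard_normal(3000)
train, cal, test = slice(0, 1000), slice(1000, 2000), slice(2000, 3000)


def f(xs):
    return one_nn_predict(x[train], y[train], xs)


# Wrong: residuals on training rows are all 0 for 1-NN, so q collapses to 0.
q_train = conformal_q(np.abs(y[train] - f(x[train])), alpha=0.1)
# Right: residuals on rows the model never saw.
q_cal = conformal_q(np.abs(y[cal] - f(x[cal])), alpha=0.1)

test_resid = np.abs(y[test] - f(x[test]))
coverage_train_cal = np.mean(test_resid <= q_train)    # near 0
coverage_heldout_cal = np.mean(test_resid <= q_cal)    # near 0.9
```

Real models overfit less dramatically, but the direction of the error is the same: training-row residuals understate test-time error.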

Methods

`method`	Score	Use when
`normalized` (default)	\(\lvert y-f(x)\rvert / g(x)\)	locally adaptive intervals
`mondrian`	per-group quantile	group/regime-conditional coverage
`cqr`	conformalized quantile regression	you already have quantile models
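For reference, the standard conformalized quantile regression score (Romano et al., 2019) measures how far \(y\) falls outside a fitted quantile band \([\hat{q}_{\text{lo}}(x), \hat{q}_{\text{hi}}(x)]\); the calibrated correction then widens that band (or narrows it, if the correction is negative). This NumPy sketch assumes, but does not confirm, that deup's `cqr` method follows the same recipe; `cqr_correction` and `cqr_interval` are hypothetical names.

```python
import math

import numpy as np


def cqr_correction(y_cal, lo_cal, hi_cal, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the CQR score."""
    # Positive when y lies outside [lo, hi], negative when it lies inside.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else np.sort(scores)[k - 1]


def cqr_interval(lo, hi, q):
    """Shift both quantile predictions outward by q (inward if q < 0)."""
    return lo - q, hi + q


# Toy check: y ~ N(0, 1) with a too-narrow band [-1, 1] (about 68% coverage).
rng = np.random.default_rng(3)
y_cal, y_test = rng.standard_normal(5000), rng.standard_normal(5000)
band_lo, band_hi = np.full(5000, -1.0), np.full(5000, 1.0)

q = cqr_correction(y_cal, band_lo, band_hi, alpha=0.1)   # roughly 1.645 - 1
lower, upper = cqr_interval(band_lo, band_hi, q)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

Because the score is signed, the same correction repairs bands that are too narrow (positive \(q\)) and bands that are too wide (negative \(q\)).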

# Mondrian: group-conditional coverage (e.g. per regime)
model.calibrate(X_cal, y_cal, method="mondrian", alpha=0.1, groups=regime_cal)
interval = model.predict_interval(X_test, groups=regime_test)
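Mondrian calibration amounts to running the same quantile computation separately inside each group, so every group gets its own \(q\) and its own coverage guarantee. A minimal NumPy sketch with hypothetical helper names, making no claim to match deup's internals; the toy data has two regimes with very different noise and a constant scale \(g(x) = 1\) that hides the difference.

```python
import math

import numpy as np


def conformal_q(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score; inf if the group is too small."""
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    return np.inf if k > n else np.sort(scores)[k - 1]


def mondrian_quantiles(scores, groups, alpha):
    """One calibrated quantile per group label."""
    return {g: conformal_q(scores[groups == g], alpha) for g in np.unique(groups)}


def mondrian_interval(y_pred, scale, groups, q_by_group):
    """Look up each point's group quantile (KeyError for unseen groups)."""
    q = np.array([q_by_group[g] for g in groups])
    return y_pred - q * scale, y_pred + q * scale


rng = np.random.default_rng(2)
n = 10_000
regime = rng.integers(0, 2, size=n)
y = np.where(regime == 0, 0.2, 2.0) * rng.standard_normal(n)
y_pred, scale = np.zeros(n), np.ones(n)
cal, test = slice(0, n // 2), slice(n // 2, n)

scores_cal = np.abs(y[cal] - y_pred[cal]) / scale[cal]
q_by_regime = mondrian_quantiles(scores_cal, regime[cal], alpha=0.1)
lower, upper = mondrian_interval(y_pred[test], scale[test], regime[test], q_by_regime)
covered = (y[test] >= lower) & (y[test] <= upper)
per_regime_coverage = {g: covered[regime[test] == g].mean() for g in (0, 1)}

# For contrast: one pooled quantile over both regimes.
q_pooled = conformal_q(scores_cal, alpha=0.1)
regime1_test = regime[test] == 1
pooled_regime1_coverage = np.mean(np.abs(y[test][regime1_test]) <= q_pooled)
```

With the pooled quantile the noisy regime under-covers (around 80% in this toy setup) while the quiet one over-covers; per-group quantiles bring each regime back to roughly 90%. The trade-off is that each group's quantile rests only on that group's calibration rows, so small groups get wide (possibly infinite) intervals.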

The standalone UncertaintyCalibrator works with raw arrays (any base model):

from deup.calibration import UncertaintyCalibrator

cal = UncertaintyCalibrator(method="normalized", alpha=0.1)
cal.fit(y_cal, y_pred_cal, uncertainty_cal)
interval = cal.predict_interval(y_pred_test, uncertainty_test)

MAPIE interop

deup is complementary to MAPIE: MAPIE supplies mature conformal machinery, DEUP supplies a high-quality per-point scale \(g(x)\). Expose the DEUP scale as a normalizer:

from deup.calibration import deup_normalizer

normalizer = deup_normalizer(model)   # .predict(X) == model.predict_epistemic(X)
scale = normalizer.predict(X_cal)     # feed into MAPIE as a residual scale

See examples/mapie_interop.py for a runnable script.

Coverage guarantee

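For exchangeable calibration and test points, split conformal gives the finite-sample bound (Lei et al., 2018; the upper bound additionally assumes the scores have no ties)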

\[ 1 - \alpha \;\le\; P(y \in \hat{C}(x)) \;\le\; 1 - \alpha + \tfrac{1}{n_{\text{cal}}+1}, \]

so intervals may slightly over-cover; this is correct, not a bug. deup's test suite checks empirical coverage within tolerance on i.i.d. and purged time-split fixtures.
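Both ends of the bound follow from the rank used for \(q\). With \(k = \lceil (n_{\text{cal}}+1)(1-\alpha) \rceil\) and tie-free exchangeable scores, the test score is equally likely to occupy any of the \(n_{\text{cal}}+1\) ranks, so coverage is exactly \(k/(n_{\text{cal}}+1)\). The sketch below checks this with exact fractions and a small Monte Carlo run on uniform scores; `exact_coverage` and `simulated_coverage` are hypothetical names.

```python
import math
from fractions import Fraction

import numpy as np


def exact_coverage(n_cal, alpha):
    """P(test score <= q) = k / (n_cal + 1) for tie-free exchangeable scores."""
    k = math.ceil((n_cal + 1) * (1 - alpha))  # exact when alpha is a Fraction
    return Fraction(k, n_cal + 1)


def simulated_coverage(n_cal, alpha, trials=20_000, seed=0):
    """Monte Carlo estimate with i.i.d. uniform scores (continuous, so no ties)."""
    rng = np.random.default_rng(seed)
    scores = rng.random((trials, n_cal + 1))  # last column plays the test point
    k = math.ceil((n_cal + 1) * (1 - alpha))
    if k > n_cal:
        return 1.0  # q is infinite, so every test point is covered
    q = np.sort(scores[:, :n_cal], axis=1)[:, k - 1]
    return float(np.mean(scores[:, n_cal] <= q))


alpha = Fraction(1, 10)
for n_cal in (19, 20, 99):
    exact = exact_coverage(n_cal, alpha)
    # Lower and upper ends of the finite-sample bound.
    assert 1 - alpha <= exact <= 1 - alpha + Fraction(1, n_cal + 1)
    print(n_cal, exact, float(exact), simulated_coverage(n_cal, alpha))
```

For example, with \(n_{\text{cal}} = 20\) and \(\alpha = 0.1\), \(k = 19\) and the exact coverage is \(19/21 \approx 0.905\), inside \([0.9,\; 0.9 + 1/21]\).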