Tutorial: Tabular regression
Goal: wrap an existing sklearn regressor with DEUP and rank test points by expected error.
Setup
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from deup import DEUPRegressor
X, y = fetch_california_housing(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
Fit DEUP
model = DEUPRegressor(
base_model=RandomForestRegressor(n_estimators=80, random_state=0),
cv=5,
random_state=0,
)
model.fit(X_tr, y_tr)
Under the hood: OOFErrorCollector gathers out-of-fold squared errors, then
ErrorEstimator trains g to predict them.
Predict with uncertainty
pred, unc = model.predict(X_te, return_uncertainty=True)
pred— base model point prediction (refit on all training data)unc— epistemic estimateg(x) ≥ 0(higher = less trustworthy)
Optional: tabular preset with density features
from deup.domains.tabular import TabularDEUP
# sklearn default (HistGradientBoosting)
preset = TabularDEUP(cv=5, random_state=0)
preset.fit(X_tr, y_tr)
unc = preset.predict_epistemic(X_te)
LightGBM / XGBoost / CatBoost
# pip install "deup[xgb]" # or deup[gbm], deup[catboost], deup[gbm-all]
preset = TabularDEUP(backend="xgb", cv=5, random_state=0)
preset.fit(X_tr, y_tr)
unc = preset.predict_epistemic(X_te)
Same OOF + density-feature pipeline; only the base and error models change.
Benchmark context
On California housing, DEUP beats ensemble disagreement and conformal-residual baselines for ranking realized squared error (Spearman ρ ≈ 0.51). See Benchmarks.