
API: Splitters

Expanding-window walk-forward splitter with an embargo (purge).

Time is measured in units. If groups is passed to split, each unique group value (e.g. a date) is one time unit, and the whole cross-section of a unit always stays together in the same fold, which is required for cross-sectional rank losses. If groups is None, each row is its own unit.
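The row-to-unit mapping described above can be sketched in plain Python. This is an illustration only, not the splitter's actual code: the helper name rows_to_units is hypothetical, and it assumes units are ordered by sorted group value, which the text does not state.

```python
def rows_to_units(n_rows, groups=None):
    """Return the time-unit index of every row.

    Hypothetical helper. Assumes units are numbered in sorted order of
    their group value; the real splitter may order them differently.
    """
    if groups is None:
        # No groups: every row is its own unit, in row order.
        return list(range(n_rows))
    if len(groups) != n_rows:
        raise ValueError("groups must have one entry per row")
    # One unit per distinct group value.
    unit_of = {g: i for i, g in enumerate(sorted(set(groups)))}
    return [unit_of[g] for g in groups]


dates = ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03", "2024-01-04"]
print(rows_to_units(5, dates))  # [0, 0, 1, 1, 2]
print(rows_to_units(3))         # [0, 1, 2]
```

Because folds are cut on unit indices rather than row indices, both rows of a date always land on the same side of a train/test boundary.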

For each of the n_splits folds, the test block is a contiguous range of the most recent units; the training set is all units strictly before it, minus the embargo units immediately preceding the test block (the purge). The purge keeps the base model from being trained on data adjacent to, and potentially leaking into, the evaluation block.
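As a rough sketch of the fold layout, the function below computes train and test ranges over unit indices. It is not the library's implementation: the name unit_folds is hypothetical, and the test block size of n_units // (n_splits + 1), borrowed from the convention in scikit-learn's TimeSeriesSplit, is an assumption, since the text does not say how test blocks are sized.

```python
def unit_folds(n_units, n_splits=5, embargo=0, min_train_size=1, max_train_size=None):
    """Return [(train_range, test_range), ...] over unit indices, oldest fold first.

    Hypothetical sketch; the test block size is an assumption.
    """
    test_size = n_units // (n_splits + 1)  # assumed, not stated in the docs
    if test_size < 1:
        raise ValueError("not enough units for the requested n_splits")
    folds = []
    for k in range(n_splits):
        # Test blocks tile the most recent n_splits * test_size units.
        test_start = n_units - (n_splits - k) * test_size
        test_stop = test_start + test_size
        # Purge: drop `embargo` units right before the test block.
        train_stop = max(test_start - embargo, 0)
        train_start = 0  # expanding window by default
        if max_train_size is not None:
            # Rolling window: keep only the most recent units.
            train_start = max(train_stop - max_train_size, 0)
        if train_stop - train_start < min_train_size:
            continue  # too little training data, skip this fold
        folds.append((range(train_start, train_stop), range(test_start, test_stop)))
    return folds


for train, test in unit_folds(12, n_splits=3, embargo=1):
    print(list(train), list(test))
# [0, 1] [3, 4, 5]
# [0, 1, 2, 3, 4] [6, 7, 8]
# [0, 1, 2, 3, 4, 5, 6, 7] [9, 10, 11]
```

In each fold the unit just before the test block (2, 5, and 8 here) appears in neither set. That gap is the embargo.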

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n_splits | int | Number of walk-forward test folds. | 5 |
| embargo | int | Number of time units to drop between the train set and each test block. | 0 |
| min_train_size | int | Minimum number of training units required to emit a fold; smaller folds are skipped. | 1 |
| max_train_size | int \| None | If set, training uses at most this many of the most recent units (rolling window). Otherwise the window expands from the start. | None |

split(X, y=None, groups=None)

Yield (train_idx, test_idx) row-index arrays for each fold.
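To illustrate what the yielded row indices look like when units span several rows, here is a small hand-built example. The helper rows_in_units and the fold boundaries are hypothetical and chosen for illustration; plain lists stand in for the index arrays.

```python
def rows_in_units(row_units, start, stop):
    """Row indices whose unit index lies in [start, stop). Hypothetical helper."""
    return [i for i, u in enumerate(row_units) if start <= u < stop]


# Six dates with two rows each; row_units[i] is the unit (date) index of row i.
row_units = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

# One fold: train on units 0-2, embargo unit 3, test on units 4-5.
train_idx = rows_in_units(row_units, 0, 3)
test_idx = rows_in_units(row_units, 4, 6)
print(train_idx)  # [0, 1, 2, 3, 4, 5]
print(test_idx)   # [8, 9, 10, 11]
```

Rows 6 and 7 belong to the embargoed unit and so appear in neither index list.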

Re-exported from scikit-learn: KFold, TimeSeriesSplit.