API: Splitters
Expanding-window walk-forward splitter with an embargo (purge).
Time is measured in units. If `groups` is passed to `split`, each
unique group value (e.g. a date) is one time unit, and the whole cross-section
of a unit always stays together in the same fold, which is required for
cross-sectional rank losses. If `groups` is `None`, each row is its own unit.
For each of the `n_splits` folds, the test block is a contiguous range of the most
recent units; the training set is all units strictly before it, minus the
`embargo` units immediately preceding the test block (the purge). The embargo
keeps the base model from being trained on data adjacent to, and potentially
leaking into, the evaluation block.
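The fold construction described above can be sketched at the level of time units. This is an illustration only, not the library's implementation: the function name `walk_forward_unit_bounds` is hypothetical, and the test-block size is an assumption (equal blocks of `n_units // (n_splits + 1)` units, the same sizing scikit-learn's `TimeSeriesSplit` uses by default), since the text does not say how test blocks are sized.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_unit_bounds(
    n_units: int,
    n_splits: int = 5,
    embargo: int = 0,
    min_train_size: int = 1,
    max_train_size: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_units, test_units) as ranges over time-unit positions.

    Hypothetical helper: a sketch of the documented behaviour, not the
    library's actual code.
    """
    # Assumption: equal-size test blocks, sized like TimeSeriesSplit's default.
    test_size = n_units // (n_splits + 1)
    if test_size < 1:
        raise ValueError("not enough time units for the requested n_splits")

    for k in range(n_splits):
        # Test blocks tile the most recent n_splits * test_size units.
        test_start = n_units - (n_splits - k) * test_size
        test_stop = test_start + test_size

        # Purge: drop `embargo` units immediately before the test block.
        train_stop = max(test_start - embargo, 0)

        # Expanding window by default; rolling window if max_train_size is set.
        train_start = 0
        if max_train_size is not None:
            train_start = max(train_stop - max_train_size, 0)

        # Folds with too little history are skipped rather than emitted.
        if train_stop - train_start < min_train_size:
            continue

        yield range(train_start, train_stop), range(test_start, test_stop)
```

Under these assumptions, 12 units with `n_splits=3` and `embargo=1` give training units 0-1, 0-4 and 0-7 against test units 3-5, 6-8 and 9-11; units 2, 5 and 8 are the purged ones.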
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_splits` | `int` | Number of walk-forward test folds. | `5` |
| `embargo` | `int` | Number of time units to drop between the train set and each test block. | `0` |
| `min_train_size` | `int` | Minimum number of training units required to emit a fold; smaller folds are skipped. | `1` |
| `max_train_size` | `int \| None` | If set, training uses at most this many of the most recent units (rolling window). Otherwise the window expands from the start. | `None` |
split(X, y=None, groups=None)
Yield (train_idx, test_idx) row-index arrays for each fold.
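To illustrate how unit-level folds can turn into row-index arrays when `groups` is given, here is a minimal sketch. The helper `rows_for_units` is hypothetical, and it assumes units are ordered by the sorted unique group values (which matches chronological order for ISO date strings); the library may order units differently.

```python
import numpy as np


def rows_for_units(groups, train_units, test_units):
    """Map unit positions (among sorted unique groups) to row-index arrays.

    Hypothetical helper for illustration; not part of the documented API.
    """
    groups = np.asarray(groups)
    # unit_of_row[i] is the position of row i's group among the sorted uniques.
    _, unit_of_row = np.unique(groups, return_inverse=True)
    train_idx = np.flatnonzero(np.isin(unit_of_row, list(train_units)))
    test_idx = np.flatnonzero(np.isin(unit_of_row, list(test_units)))
    return train_idx, test_idx


# Four dates with two rows (assets) each; date position 2 is the embargoed unit.
groups = ["2024-01-01", "2024-01-01",
          "2024-01-02", "2024-01-02",
          "2024-01-03", "2024-01-03",
          "2024-01-04", "2024-01-04"]
train_idx, test_idx = rows_for_units(groups, train_units=[0, 1], test_units=[3])
```

Every row of a date lands on the same side of the split, so the cross-section for each unit stays intact.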
Re-exported from scikit-learn: KFold, TimeSeriesSplit.
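For comparison, scikit-learn's `TimeSeriesSplit` offers a row-level analogue of the embargo through its `gap` parameter. The sketch below imports it directly from `sklearn.model_selection`, because the import path of the re-export is not stated here; note that `TimeSeriesSplit` does not use `groups` to keep a cross-section together.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve rows in time order; each row is its own time unit.
X = np.arange(12).reshape(-1, 1)

# gap=1 drops one row between the end of train and the start of each test block.
tscv = TimeSeriesSplit(n_splits=3, gap=1)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in folds:
    print("train:", train, "test:", test)
```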