Class KFold (2.15.0)

KFold(n_splits: int = 5, *, random_state: typing.Optional[int] = None)

K-Fold cross-validator.

Split data into train/test sets by dividing the dataset into k consecutive folds.

Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
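The fold rotation described above can be sketched in plain Python. This is only an illustration of the k-fold idea, not the bigframes implementation: `kfold_indices` is a hypothetical helper, it ignores any random row assignment, and giving the first `n_samples % k` folds one extra row is an assumed convention.

```python
def kfold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for k consecutive folds.

    Hypothetical helper for illustration; not part of bigframes.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if k > n_samples:
        raise ValueError("k cannot exceed n_samples")
    # Assumed convention: the first n_samples % k folds get one extra row.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        # The current fold is held out; every other row is training data.
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


folds = list(kfold_indices(n_samples=3, k=3))
for i, (train, test) in enumerate(folds):
    print(f"Fold {i}: train={train} test={test}")
```

Each row index appears in exactly one test fold and in the training set of the other k - 1 folds.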

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import KFold
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> kf = KFold(n_splits=3, random_state=42)
>>> for i, (X_train, X_test, y_train, y_test) in enumerate(kf.split(X, y)):
...     print(f"Fold {i}:")
...     print(f" X_train: {X_train}")
...     print(f" X_test: {X_test}")
...     print(f" y_train: {y_train}")
...     print(f" y_test: {y_test}")
...
Fold 0:
 X_train: feat0 feat1
1 3 4
2 5 6
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
0 1 2
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
1 2
2 3
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
0 1
<BLANKLINE>
[1 rows x 1 columns]
Fold 1:
 X_train: feat0 feat1
0 1 2
2 5 6
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
1 3 4
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
0 1
2 3
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
1 2
<BLANKLINE>
[1 rows x 1 columns]
Fold 2:
 X_train: feat0 feat1
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
2 5 6
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
0 1
1 2
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
2 3
<BLANKLINE>
[1 rows x 1 columns]

Parameters

Name Description
n_splits int

Number of folds. Must be at least 2. Defaults to 5.

random_state Optional[int]

A seed used to randomly choose the rows of each split. If not set, a different random split is generated each time. Defaults to None.
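The effect of a fixed seed can be illustrated with Python's standard `random` module. This is a sketch of seeded reproducibility in general, not of how bigframes selects rows internally; `shuffled_row_order` is a hypothetical helper.

```python
import random


def shuffled_row_order(n_rows: int, random_state=None):
    """Return row positions in shuffled order (hypothetical helper)."""
    # A fixed seed gives a repeatable order; None seeds from system entropy.
    rng = random.Random(random_state)
    order = list(range(n_rows))
    rng.shuffle(order)
    return order


first = shuffled_row_order(10, random_state=42)
second = shuffled_row_order(10, random_state=42)
print(first == second)  # same seed, same order
```

Passing the same `random_state` to two `KFold` instances is the analogous way to make splits repeatable across runs.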

Methods

get_n_splits

get_n_splits() -> int

Returns the number of splitting iterations in the cross-validator.

Returns
Type Description
int The number of splitting iterations in the cross-validator.

split

split(
 X: typing.Union[
 bigframes.dataframe.DataFrame,
 bigframes.series.Series,
 pandas.core.frame.DataFrame,
 pandas.core.series.Series,
 ],
 y: typing.Optional[
 typing.Union[
 bigframes.dataframe.DataFrame,
 bigframes.series.Series,
 pandas.core.frame.DataFrame,
 pandas.core.series.Series,
 ]
 ] = None,
) -> typing.Generator[
 tuple[
 typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, NoneType],
 ...,
 ],
 None,
 None,
]

Generate the training and test sets for each split of the data.

Parameters
Name Description
X bigframes.dataframe.DataFrame or bigframes.series.Series

BigFrames DataFrame or Series of shape (n_samples, n_features). Training data, where n_samples is the number of samples and n_features is the number of features.

y bigframes.dataframe.DataFrame, bigframes.series.Series or None

BigFrames DataFrame, Series of shape (n_samples,) or None. The target variable for supervised learning problems. Defaults to None.

Yields
Name Description
X_train bigframes.dataframe.DataFrame or bigframes.series.Series

The training data for that split.

X_test bigframes.dataframe.DataFrame or bigframes.series.Series

The testing data for that split.

y_train bigframes.dataframe.DataFrame, bigframes.series.Series or None

The training label for that split.

y_test bigframes.dataframe.DataFrame, bigframes.series.Series or None

The testing label for that split.
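The shape of what `split` yields can be sketched with plain lists. This is a stand-in for illustration only: `split_sketch` is hypothetical, the round-robin row assignment is an assumption, and returning `None` for both label parts when `y` is omitted is inferred from the "or None" types documented above rather than confirmed behavior.

```python
def split_sketch(X, y=None, n_splits=5):
    """Yield (X_train, X_test, y_train, y_test) tuples from plain lists.

    Hypothetical stand-in for illustration; not the bigframes implementation.
    """
    if y is not None and len(y) != len(X):
        raise ValueError("X and y must have the same number of rows")
    for fold in range(n_splits):
        # Assumed assignment: row i belongs to fold i % n_splits.
        in_test = [i % n_splits == fold for i in range(len(X))]
        X_train = [row for row, held_out in zip(X, in_test) if not held_out]
        X_test = [row for row, held_out in zip(X, in_test) if held_out]
        if y is None:
            # No labels supplied, so both label slots are None.
            yield X_train, X_test, None, None
        else:
            y_train = [v for v, held_out in zip(y, in_test) if not held_out]
            y_test = [v for v, held_out in zip(y, in_test) if held_out]
            yield X_train, X_test, y_train, y_test


X = [[1, 2], [3, 4], [5, 6]]
y = [1, 2, 3]
supervised = list(split_sketch(X, y, n_splits=3))
unsupervised = list(split_sketch(X, n_splits=3))
```

Callers that unpack four values per iteration work unchanged in both the supervised and the unlabeled case.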


Last updated 2025-10-27 UTC.