Class KFold (2.15.0)

KFold(n_splits: int = 5, *, random_state: typing.Optional[int] = None)

K-Fold cross-validator.

Splits a dataset into k consecutive folds, each of which defines one train/test split.

Each fold is used once as the validation set, while the remaining k - 1 folds form the training set.
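The fold arithmetic described above can be sketched in plain Python. This is only an illustration of partitioning row positions into k consecutive folds, not the BigFrames implementation. The helper name `consecutive_folds` is hypothetical, and giving the first `n_samples % n_splits` folds one extra row is an assumed convention.

```python
def consecutive_folds(n_samples: int, n_splits: int) -> list[tuple[list[int], list[int]]]:
    """Return (train_positions, test_positions) for each of n_splits folds."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_splits > n_samples:
        raise ValueError("n_splits cannot exceed the number of samples")
    # Assumption: the first n_samples % n_splits folds get one extra row.
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        stop = start + base + (1 if i < extra else 0)
        # Rows in [start, stop) are held out; every other row is used for training.
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


for i, (train, test) in enumerate(consecutive_folds(3, 3)):
    print(f"Fold {i}: train={train} test={test}")
```

With three rows and three folds, each row position is held out exactly once and the other two positions form the training set.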

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import KFold
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> kf = KFold(n_splits=3, random_state=42)
>>> for i, (X_train, X_test, y_train, y_test) in enumerate(kf.split(X, y)):
...     print(f"Fold {i}:")
...     print(f" X_train: {X_train}")
...     print(f" X_test: {X_test}")
...     print(f" y_train: {y_train}")
...     print(f" y_test: {y_test}")
...
Fold 0:
 X_train: feat0 feat1
1 3 4
2 5 6
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
0 1 2
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
1 2
2 3
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
0 1
<BLANKLINE>
[1 rows x 1 columns]
Fold 1:
 X_train: feat0 feat1
0 1 2
2 5 6
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
1 3 4
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
0 1
2 3
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
1 2
<BLANKLINE>
[1 rows x 1 columns]
Fold 2:
 X_train: feat0 feat1
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
2 5 6
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
0 1
1 2
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
2 3
<BLANKLINE>
[1 rows x 1 columns]

Parameters
Name	Description
`n_splits`	`int` Number of folds. Must be at least 2. Defaults to 5.
`random_state`	`Optional[int]` A seed used to randomly choose the rows of each split. If not set, a different random split is generated each time. Defaults to None.
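How a seed can make a random split repeatable is sketched below in plain Python. The helper `shuffled_positions` is hypothetical and relies on the standard library `random` module. It is not how BigFrames selects rows, only an illustration of the `random_state` contract described in the table.

```python
import random
from typing import Optional


def shuffled_positions(n_samples: int, random_state: Optional[int] = None) -> list[int]:
    """Return row positions 0..n_samples-1 in a shuffled order."""
    # A dedicated generator leaves the global random state untouched.
    # Passing None seeds the generator from system entropy or the clock.
    rng = random.Random(random_state)
    positions = list(range(n_samples))
    rng.shuffle(positions)
    return positions


# The same seed yields the same order on every call.
assert shuffled_positions(10, random_state=42) == shuffled_positions(10, random_state=42)
# Every row position still appears exactly once, seeded or not.
assert sorted(shuffled_positions(10)) == list(range(10))
```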

Methods

get_n_splits

get_n_splits() -> int

Returns the number of splitting iterations in the cross-validator.

Returns
Type	Description
`int`	The number of splitting iterations in the cross-validator.

split

split(
 X: typing.Union[
 bigframes.dataframe.DataFrame,
 bigframes.series.Series,
 pandas.core.frame.DataFrame,
 pandas.core.series.Series,
 ],
 y: typing.Optional[
 typing.Union[
 bigframes.dataframe.DataFrame,
 bigframes.series.Series,
 pandas.core.frame.DataFrame,
 pandas.core.series.Series,
 ]
 ] = None,
) -> typing.Generator[
 tuple[
 typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, NoneType],
 ...,
 ],
 None,
 None,
]

Generate the training and test subsets of the data for each split.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series` BigFrames DataFrame or Series of shape (n_samples, n_features). Training data, where `n_samples` is the number of samples and `n_features` is the number of features.
`y`	`bigframes.dataframe.DataFrame, bigframes.series.Series or None` BigFrames DataFrame, Series of shape (n_samples,) or None. The target variable for supervised learning problems. Defaults to None.

Yields
Name	Description
`X_train`	`bigframes.dataframe.DataFrame or bigframes.series.Series` The training data for that split.
`X_test`	`bigframes.dataframe.DataFrame or bigframes.series.Series` The testing data for that split.
`y_train`	`bigframes.dataframe.DataFrame, bigframes.series.Series or None` The training label for that split.
`y_test`	`bigframes.dataframe.DataFrame, bigframes.series.Series or None` The testing label for that split.
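
The shape of the yielded tuples, including `None` in the label slots when `y` is omitted, can be sketched with plain Python lists. The generator `split_lists` below is a hypothetical stand-in that assumes consecutive folds with evenly spaced boundaries. It does not call BigFrames and is not its implementation. The `None` behavior is inferred from the type annotations above.

```python
from typing import Iterator, Optional, Sequence


def split_lists(
    X: Sequence,
    y: Optional[Sequence] = None,
    n_splits: int = 5,
) -> Iterator[tuple]:
    """Yield (X_train, X_test, y_train, y_test) for each fold."""
    n = len(X)
    if y is not None and len(y) != n:
        raise ValueError("X and y must have the same number of rows")
    # Assumption: fold boundaries are evenly spaced row positions.
    bounds = [i * n // n_splits for i in range(n_splits + 1)]
    for start, stop in zip(bounds, bounds[1:]):
        X_train = list(X[:start]) + list(X[stop:])
        X_test = list(X[start:stop])
        if y is None:
            # Without labels, both label slots are None.
            yield X_train, X_test, None, None
        else:
            y_train = list(y[:start]) + list(y[stop:])
            y_test = list(y[start:stop])
            yield X_train, X_test, y_train, y_test


X = [[1, 2], [3, 4], [5, 6]]
y = [1, 2, 3]
for X_train, X_test, y_train, y_test in split_lists(X, y, n_splits=3):
    print(f"X_test={X_test} y_test={y_test}")
```

Unpacking four values per iteration works whether or not `y` is supplied, which keeps calling code the same for supervised and unsupervised use.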
