Class KFold (2.18.0)
 
 
 
 
 
 
KFold(n_splits: int = 5, *, random_state: typing.Optional[int] = None)
K-Fold cross-validator.
Splits data into train/test sets by dividing the dataset into k consecutive folds.
Each fold is used once as the validation set, while the remaining k - 1 folds form the training set.
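The fold logic described above can be sketched in plain Python. This is an illustration of the general k-fold idea only, not the bigframes implementation: the helper name `k_fold_indices` and the rule for spreading leftover rows over the first folds are assumptions made for this sketch.

```python
def k_fold_indices(n_samples: int, n_splits: int) -> list:
    """Return (train_indices, test_indices) pairs for consecutive folds."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    if n_splits > n_samples:
        raise ValueError("n_splits cannot exceed the number of samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Every row outside the current fold goes to the training set.
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


for fold, (train, test) in enumerate(k_fold_indices(n_samples=3, n_splits=3)):
    print(f"Fold {fold}: train={train} test={test}")
# Fold 0: train=[1, 2] test=[0]
# Fold 1: train=[0, 2] test=[1]
# Fold 2: train=[0, 1] test=[2]
```

Each row index appears in exactly one test fold and in the training set of every other fold, which is the property the description relies on.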
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.model_selection import KFold
>>> bpd.options.display.progress_bar = None
>>> X = bpd.DataFrame({"feat0": [1, 3, 5], "feat1": [2, 4, 6]})
>>> y = bpd.DataFrame({"label": [1, 2, 3]})
>>> kf = KFold(n_splits=3, random_state=42)
>>> for i, (X_train, X_test, y_train, y_test) in enumerate(kf.split(X, y)):
...     print(f"Fold {i}:")
...     print(f" X_train: {X_train}")
...     print(f" X_test: {X_test}")
...     print(f" y_train: {y_train}")
...     print(f" y_test: {y_test}")
...
Fold 0:
 X_train: feat0 feat1
1 3 4
2 5 6
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
0 1 2
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
1 2
2 3
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
0 1
<BLANKLINE>
[1 rows x 1 columns]
Fold 1:
 X_train: feat0 feat1
0 1 2
2 5 6
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
1 3 4
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
0 1
2 3
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
1 2
<BLANKLINE>
[1 rows x 1 columns]
Fold 2:
 X_train: feat0 feat1
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
 X_test: feat0 feat1
2 5 6
<BLANKLINE>
[1 rows x 2 columns]
 y_train: label
0 1
1 2
<BLANKLINE>
[2 rows x 1 columns]
 y_test: label
2 3
<BLANKLINE>
[1 rows x 1 columns]
| Parameters | |
|---|---|
| Name | Description | 
| n_splits | int: Number of folds. Must be at least 2. Defaults to 5. |
| random_state | Optional[int]: A seed used for randomly choosing the rows of each split. If not set, a different random split is generated each time. Defaults to None. |
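The effect of a fixed seed can be illustrated with the standard library. The sketch below is a stand-in, not the algorithm bigframes uses: `assign_folds` is a hypothetical helper that shuffles row positions with `random.Random(seed)` and deals them into folds, mirroring the `n_splits` and `random_state` rules from the table above.

```python
import random
from typing import List, Optional


def assign_folds(n_rows: int, n_splits: int, seed: Optional[int] = None) -> List[int]:
    """Return a fold number (0 .. n_splits - 1) for every row position."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    rng = random.Random(seed)  # seed=None gives a different shuffle on each run
    order = list(range(n_rows))
    rng.shuffle(order)
    fold_of_row = [0] * n_rows
    # Deal the shuffled rows into folds round-robin.
    for position, row in enumerate(order):
        fold_of_row[row] = position % n_splits
    return fold_of_row


first = assign_folds(10, n_splits=5, seed=42)
second = assign_folds(10, n_splits=5, seed=42)
print(first == second)  # True: the same seed reproduces the same assignment
```

Leaving the seed unset is fine for exploratory work, but a fixed seed makes fold-by-fold results comparable across runs.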
Methods
get_n_splits
get_n_splits() -> int
Returns the number of splitting iterations in the cross-validator.
| Returns | |
|---|---|
| Type | Description | 
| int | The number of splitting iterations in the cross-validator. |
split
split(
 X: typing.Union[
 bigframes.dataframe.DataFrame,
 bigframes.series.Series,
 pandas.core.frame.DataFrame,
 pandas.core.series.Series,
 ],
 y: typing.Optional[
 typing.Union[
 bigframes.dataframe.DataFrame,
 bigframes.series.Series,
 pandas.core.frame.DataFrame,
 pandas.core.series.Series,
 ]
 ] = None,
) -> typing.Generator[
 tuple[
 typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series, NoneType],
 ...,
 ],
 None,
 None,
]
Generate the training and test sets for each split.
| Parameters | |
|---|---|
| Name | Description | 
| X | bigframes.dataframe.DataFrame or bigframes.series.Series: BigFrames DataFrame or Series of shape (n_samples, n_features). Training data. |
| y | bigframes.dataframe.DataFrame, bigframes.series.Series or None: BigFrames DataFrame, Series of shape (n_samples,) or None. The target variable for supervised learning problems. Defaults to None. |

| Yields | |
|---|---|
| Name | Description |
| X_train | bigframes.dataframe.DataFrame or bigframes.series.Series: The training data for that split. |
| X_test | bigframes.dataframe.DataFrame or bigframes.series.Series: The testing data for that split. |
| y_train | bigframes.dataframe.DataFrame, bigframes.series.Series or None: The training label for that split. |
| y_test | bigframes.dataframe.DataFrame, bigframes.series.Series or None: The testing label for that split. |
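The shape of what `split` yields can be sketched with plain lists. The function below is a hypothetical stand-in, not the bigframes code: `split_lists` and its hold-out rule (row i is held out in fold i % n_splits) are assumptions. It also assumes that the label parts are None when `y` is omitted, which the return type suggests but the page does not state outright.

```python
def split_lists(X, y=None, n_splits=5):
    """Yield (X_train, X_test, y_train, y_test) once per fold."""
    if y is not None and len(y) != len(X):
        raise ValueError("X and y must have the same number of rows")
    for fold in range(n_splits):
        # Stand-in rule: row i is held out in fold i % n_splits.
        held_out = [i % n_splits == fold for i in range(len(X))]
        X_train = [row for row, out in zip(X, held_out) if not out]
        X_test = [row for row, out in zip(X, held_out) if out]
        if y is None:
            # No labels supplied, so the label slots stay empty.
            yield X_train, X_test, None, None
        else:
            y_train = [label for label, out in zip(y, held_out) if not out]
            y_test = [label for label, out in zip(y, held_out) if out]
            yield X_train, X_test, y_train, y_test


X = [[1, 2], [3, 4], [5, 6]]
y = [1, 2, 3]
for X_train, X_test, y_train, y_test in split_lists(X, y, n_splits=3):
    print(X_test, y_test)
```

Unpacking four values per iteration works the same way whether or not labels are passed, which keeps the calling loop identical for supervised and unsupervised use.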