Module model_selection (1.6.0)

Functions for test/train split and model tuning. This module is styled after scikit-learn's model_selection module: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection.

Functions

train_test_split

train_test_split(
 *arrays: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
 test_size: typing.Optional[float] = None,
 train_size: typing.Optional[float] = None,
 random_state: typing.Optional[int] = None
) -> typing.List[typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]

Splits dataframes or series into random train and test subsets.

Parameters
Name: Description

*arrays: bigframes.dataframe.DataFrame or bigframes.series.Series. A sequence of BigQuery DataFrames or Series that can be joined on their indexes.

test_size: float, default None. The proportion of the dataset to include in the test split. If None, it defaults to the complement of train_size (1 - train_size). If both test_size and train_size are None, test_size is set to 0.25.

train_size: float, default None. The proportion of the dataset to include in the train split. If None, it defaults to the complement of test_size (1 - test_size).

random_state: int, default None. A seed used to randomly choose the rows of each split. If not set, each call produces a different random split.
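The way test_size, train_size, and random_state interact can be sketched in plain Python. This is only an illustration of the defaults documented above, not the library's implementation: the helper names resolve_sizes and split_row_ids are hypothetical, the ValueError for fractions that sum to more than 1 is an assumption of this sketch, and rows are stood in for by integer positions.

```python
import random
from typing import List, Optional, Tuple


def resolve_sizes(
    test_size: Optional[float] = None,
    train_size: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (train_size, test_size) following the documented defaults."""
    # Assumption of this sketch: explicit fractions may not overlap.
    if (
        test_size is not None
        and train_size is not None
        and test_size + train_size > 1.0
    ):
        raise ValueError("train_size + test_size must not exceed 1.0")
    if test_size is None:
        # Complement of train_size, or 0.25 when neither is given.
        test_size = 0.25 if train_size is None else 1.0 - train_size
    if train_size is None:
        train_size = 1.0 - test_size
    return train_size, test_size


def split_row_ids(
    n_rows: int,
    test_size: Optional[float] = None,
    train_size: Optional[float] = None,
    random_state: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Shuffle row positions and cut them into (train_ids, test_ids)."""
    train_frac, test_frac = resolve_sizes(test_size, train_size)
    row_ids = list(range(n_rows))
    # A fixed seed reproduces the same shuffle; None seeds from system entropy.
    random.Random(random_state).shuffle(row_ids)
    # Simplification: truncate fractional row counts.
    n_train = int(n_rows * train_frac)
    n_test = int(n_rows * test_frac)
    return row_ids[:n_train], row_ids[n_train:n_train + n_test]
```

Calling split_row_ids(100, random_state=42) twice yields the same 75/25 partition both times, while omitting random_state lets each call shuffle differently.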

Returns
Type: Description

List[Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]: A list of BigQuery DataFrames or Series.
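A typical call might look like the sketch below. Several things here are assumptions rather than statements from this page: the import path bigframes.ml.model_selection, the public penguins table and its column names, and the ordering of the returned list (train then test for each input, as in scikit-learn). The helper name split_penguins is hypothetical, and running it requires a configured BigQuery project and credentials, so the imports are deferred into the function body.

```python
def split_penguins():
    """Hypothetical helper: split a public table into train and test sets."""
    # Deferred imports: defining this function needs no BigQuery session.
    import bigframes.pandas as bpd
    from bigframes.ml.model_selection import train_test_split

    # Assumed dataset: the public penguins table, with null rows dropped.
    df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins").dropna()

    # Features and label share the same index, so they can be joined on it.
    X = df[["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm"]]
    y = df[["body_mass_g"]]

    # Assumed ordering of the returned list, mirroring scikit-learn:
    # [X_train, X_test, y_train, y_test].
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    return X_train, X_test, y_train, y_test
```

Passing the same random_state on a later call should reproduce the same split, which helps when comparing models trained on identical subsets.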


Last updated 2025-10-14 UTC.