Class DataFrameGroupBy (1.7.0)

DataFrameGroupBy(
 block: bigframes.core.blocks.Block,
 by_col_ids: typing.Sequence[str],
 *,
 selected_cols: typing.Optional[typing.Sequence[str]] = None,
 dropna: bool = True,
 as_index: bool = True
)

Class for grouping and aggregating relational data.

Methods

agg

agg(func=None, **kwargs) -> bigframes.dataframe.DataFrame

Aggregate using one or more operations.

Parameter
Name	Description
`func`	`function, str, list, dict or None` Function to use for aggregating the data. Accepted combinations are: a string function name; a list of function names, e.g. `['sum', 'mean']`; a dict mapping axis labels to a function name or a list of such; or None, in which case `kwargs` are used with named aggregation. With named aggregation, the output has one column for each element in `kwargs`: the keyword is the column name, and its value determines the aggregation used to compute the values in that column.
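The accepted `func` forms can be illustrated with a small local sketch. It uses plain pandas as a stand-in, on the assumption that bigframes mirrors pandas' `groupby(...).agg` semantics for these cases; bigframes may not support every function name that pandas accepts.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "b"], "x": [1, 2, 3], "y": [10.0, 20.0, 30.0]}
)
grouped = df.groupby("key")

# A single function name, applied to every non-grouping column.
by_name = grouped.agg("sum")

# A list of function names: one output column per (column, function) pair.
by_list = grouped.agg(["sum", "mean"])

# A dict mapping column labels to a function name or a list of names.
by_dict = grouped.agg({"x": "max", "y": ["min", "max"]})

# func=None: named aggregation. Each keyword becomes an output column,
# and its (column, function) tuple chooses what to compute.
named = grouped.agg(x_total=("x", "sum"), y_mean=("y", "mean"))
print(named)
```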

aggregate

aggregate(func=None, **kwargs) -> bigframes.dataframe.DataFrame

Alias for `agg`; accepts the same arguments.

all

all() -> bigframes.dataframe.DataFrame

Return True if all values in the group are true, else False.

Returns
Type	Description
`Series or DataFrame`	DataFrame or Series of boolean values, where a value is True if all elements are True within its respective group; otherwise False.

any

any() -> bigframes.dataframe.DataFrame

Return True if any value in the group is true, else False.

Returns
Type	Description
`Series or DataFrame`	DataFrame or Series of boolean values, where a value is True if any element is True within its respective group; otherwise False.
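The difference between `all` and `any` shows up on a group that mixes True and False values. The sketch below uses plain pandas as a stand-in, assuming bigframes returns the same per-group booleans.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "b", "b", "c"], "flag": [True, True, True, False, False]}
)
grouped = df.groupby("key")

all_true = grouped.all()  # True only if every value in the group is True
any_true = grouped.any()  # True if at least one value in the group is True
print(all_true)
print(any_true)
```

Group `b` is where the two methods disagree: it contains both a True and a False value.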

count

count() -> bigframes.dataframe.DataFrame

Compute the count of each group, excluding missing values.

Returns
Type	Description
`Series or DataFrame`	Count of values within each group.
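A minimal local sketch of "excluding missing values", using plain pandas as a stand-in and assuming bigframes treats NULL/NaN the same way in `count`:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, nan, 3.0, 4.0]})

# The NaN in group "a" is not counted.
counts = df.groupby("key").count()
print(counts)
```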

cumcount

cumcount(ascending: bool = True)

Number each item in each group from 0 to the length of that group - 1.

Parameter
Name	Description
`ascending`	`bool, default True` If False, number in reverse, from length of group - 1 to 0.

Returns
Type	Description
`Series`	Sequence number of each element within each group.
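The numbering scheme, including `ascending=False`, can be sketched locally with plain pandas, assuming bigframes numbers rows in the same order:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "a", "b"]})
grouped = df.groupby("key")

forward = grouped.cumcount()                  # 0 .. len(group) - 1, in row order
backward = grouped.cumcount(ascending=False)  # len(group) - 1 .. 0
print(pd.DataFrame({"key": df["key"], "forward": forward, "backward": backward}))
```

Group `a` has three rows, so its rows are numbered 0, 1, 2 forward and 2, 1, 0 backward, even though they are not contiguous in the frame.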

cummax

cummax(
 *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame

Cumulative max for each group.

Returns
Type	Description
`Series or DataFrame`	Cumulative max for each group.

cummin

cummin(
 *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame

Cumulative min for each group.

Returns
Type	Description
`Series or DataFrame`	Cumulative min for each group.

cumprod

cumprod(*args, **kwargs) -> bigframes.dataframe.DataFrame

Cumulative product for each group.

Returns
Type	Description
`Series or DataFrame`	Cumulative product for each group.

cumsum

cumsum(
 *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame

Cumulative sum for each group.

Returns
Type	Description
`Series or DataFrame`	Cumulative sum for each group.
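The four cumulative methods above (`cummax`, `cummin`, `cumprod`, `cumsum`) all restart at the first row of each group. A local sketch with plain pandas as a stand-in, assuming bigframes produces the same running values:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"], "val": [3, 1, 4, 2, 5]})
grouped = df.groupby("key")

# Each running aggregate restarts at the first row of every group.
running = pd.DataFrame(
    {
        "cummax": grouped.cummax()["val"],
        "cummin": grouped.cummin()["val"],
        "cumsum": grouped.cumsum()["val"],
        "cumprod": grouped.cumprod()["val"],
    }
)
print(running)
```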

diff

diff(periods=1) -> bigframes.series.Series

First discrete difference of element. Calculates the difference of each element compared with another element in the group (default is element in previous row).

Returns
Type	Description
`Series or DataFrame`	First differences.
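A short sketch of how `periods` selects the comparison row, using plain pandas as a stand-in and assuming bigframes behaves the same. Rows without a predecessor inside their own group get NaN.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"], "val": [1, 4, 9, 10, 15]})
grouped = df.groupby("key")

d1 = grouped.diff()           # val[i] - val[i - 1] within each group
d2 = grouped.diff(periods=2)  # val[i] - val[i - 2] within each group
print(d1)
print(d2)
```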

expanding

expanding(min_periods: int = 1) -> bigframes.core.window.Window

Provides expanding-window calculations for each group.

Returns
Type	Description
`Series or DataFrame`	An expanding grouper, providing expanding functionality per group.
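An expanding window covers every row from the start of the group up to the current row. The sketch below uses plain pandas as a stand-in; it selects one column first, and the index layout of the result (pandas prefixes the group key) may differ in bigframes.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "a", "b", "b"], "val": [1.0, 2.0, 3.0, 10.0, 20.0]}
)

# Running mean of all rows seen so far, restarting for each group.
# min_periods=1 emits a value from the first row of each group onward.
exp_mean = df.groupby("key")["val"].expanding(min_periods=1).mean()
print(exp_mean)
```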

kurt

kurt(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Return unbiased kurtosis within groups.

Kurtosis is obtained using Fisher's definition (the kurtosis of a normal distribution is 0.0). Normalized by N-1.

Parameter
Name	Description
`numeric_only`	`bool, default False` Include only `float`, `int` or `boolean` data.
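To make "unbiased" concrete, the sketch below writes out the bias-corrected excess-kurtosis estimator that pandas' `Series.kurt` computes and checks it per group against pandas. The helper `unbiased_kurt` is hypothetical, and that bigframes uses exactly this estimator is an assumption based on its pandas compatibility.

```python
import pandas as pd


def unbiased_kurt(values):
    """Bias-corrected excess kurtosis (Fisher's definition: normal == 0.0).

    Hypothetical helper. Assumes the values are not all equal;
    returns NaN for fewer than 4 values.
    """
    n = len(values)
    if n < 4:
        return float("nan")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)  # sum of squared deviations
    m4 = sum((v - mean) ** 4 for v in values)  # sum of fourth-power deviations
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2**2) - correction


df = pd.DataFrame(
    {
        "key": ["a"] * 5 + ["b"] * 4,
        "val": [1.0, 2.0, 3.0, 4.0, 10.0, 2.0, 2.0, 3.0, 7.0],
    }
)
grouped = df.groupby("key")["val"]
manual = grouped.agg(lambda s: unbiased_kurt(s.tolist()))
reference = grouped.agg(lambda s: s.kurt())  # pandas' own estimator
print(pd.DataFrame({"manual": manual, "pandas": reference}))
```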

kurtosis

kurtosis(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Alias for `kurt`; accepts the same arguments.

max

max(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute max of group values.

Parameters
Name	Description
`numeric_only`	`bool, default False` Include only float, int, boolean columns.
`min_count`	`int, default 0` The required number of valid values to perform the operation. If fewer than `min_count` non-NA values are present, the result will be NA.

Returns
Type	Description
`Series or DataFrame`	Computed max of values within each group.

mean

mean(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute mean of groups, excluding missing values.

Parameter
Name	Description
`numeric_only`	`bool, default False` Include only float, int, boolean columns.

Returns
Type	Description
`Series or DataFrame`	Mean of groups.

median

median(
 numeric_only: bool = False, *, exact: bool = True
) -> bigframes.dataframe.DataFrame

Compute median of groups, excluding missing values.

Parameters
Name	Description
`numeric_only`	`bool, default False` Include only float, int, boolean columns.
`exact`	`bool, default True` Calculate the exact median instead of an approximation.

Returns
Type	Description
`Series or DataFrame`	Median of groups.
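A local sketch contrasting `median` and `mean` on a group with an outlier and a group with a missing value. It uses plain pandas as a stand-in, which interpolates between the two middle values of an even-sized group; bigframes' `exact=False` approximation has no pandas counterpart and is not shown.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "a", "b", "b", "b"],
        "val": [1.0, 2.0, 3.0, 100.0, 5.0, nan, 7.0],
    }
)
grouped = df.groupby("key")

medians = grouped.median()  # NaN in group "b" is skipped
means = grouped.mean()      # the mean of "a" is pulled up by the outlier 100.0
print(medians.join(means, lsuffix="_median", rsuffix="_mean"))
```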

min

min(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute min of group values.

Parameters
Name	Description
`numeric_only`	`bool, default False` Include only float, int, boolean columns.
`min_count`	`int, default 0` The required number of valid values to perform the operation. If fewer than `min_count` non-NA values are present, the result will be NA.

Returns
Type	Description
`Series or DataFrame`	Computed min of values within each group.

nunique

nunique() -> bigframes.dataframe.DataFrame

Return a DataFrame with the number of unique values in each column for each group.
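A minimal sketch of the per-group, per-column distinct counts, using plain pandas as a stand-in and assuming bigframes returns the same shape (one row per group, one column per non-grouping column):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b"],
        "x": [1, 1, 2, 3, 3],
        "y": ["p", "q", "r", "s", "s"],
    }
)
distinct = df.groupby("key").nunique()
print(distinct)
```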

prod

prod(numeric_only: bool = False, min_count: int = 0)

Compute the product of group values.

Parameters
Name	Description
`numeric_only`	`bool, default False` Include only float, int, boolean columns.
`min_count`	`int, default 0` The required number of valid values to perform the operation. If fewer than `min_count` non-NA values are present, the result will be NA.

Returns
Type	Description
`Series or DataFrame`	Computed prod of values within each group.
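The effect of `min_count` is easiest to see on a group with a missing value. The sketch uses plain pandas as a stand-in, assuming bigframes applies the same rule.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2.0, 3.0, 4.0, nan]})
grouped = df.groupby("key")

default = grouped.prod()           # missing values are skipped
strict = grouped.prod(min_count=2)  # "b" has only 1 non-NA value, so it becomes NA
print(default)
print(strict)
```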

quantile

quantile(
 q: typing.Union[float, typing.Sequence[float]] = 0.5, *, numeric_only: bool = False
) -> bigframes.dataframe.DataFrame

Return group values at the given quantile, similar to `numpy.percentile`.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([
... ['a', 1], ['a', 2], ['a', 3],
... ['b', 1], ['b', 3], ['b', 5]
... ], columns=['key', 'val'])
>>> df.groupby('key').quantile()
     val
key
a    2.0
b    3.0
<BLANKLINE>
[2 rows x 1 columns]

Parameters
Name	Description
`q`	`float or array-like, default 0.5 (50% quantile)` Value(s) between 0 and 1 providing the quantile(s) to compute.
`numeric_only`	`bool, default False` Include only `float`, `int` or `boolean` data.

Returns
Type	Description
`Series or DataFrame`	Return type determined by caller of GroupBy object.

rolling

rolling(window: int, min_periods=None) -> bigframes.core.window.Window

Returns a rolling grouper, providing rolling functionality per group.

Parameter
Name	Description
`min_periods`	`int, default None` Minimum number of observations in window required to have a value; otherwise, result is `np.nan`. For a window that is specified by an offset, `min_periods` will default to 1. For a window that is specified by an integer, `min_periods` will default to the size of the window.

Returns
Type	Description
`Series or DataFrame`	A new grouper with the rolling window applied per group.
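The default `min_periods` behavior for an integer window can be sketched locally with plain pandas as a stand-in. It selects one column first, and bigframes may lay out the result index differently.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "a", "b", "b"], "val": [1.0, 2.0, 3.0, 10.0, 20.0]}
)
grouped = df.groupby("key")["val"]

# window=2 with the default min_periods (the window size for an integer window):
# the first row of each group has only one observation, so it yields NaN.
strict = grouped.rolling(window=2).sum()
# min_periods=1 lets a partial window produce a value.
lenient = grouped.rolling(window=2, min_periods=1).sum()
print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```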

shift

shift(periods=1) -> bigframes.series.Series

Shift each group by `periods` observations.

Parameter
Name	Description
`periods`	`int, default 1` Number of periods to shift.

Returns
Type	Description
`Series or DataFrame`	Object shifted within each group.
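A short sketch of per-group shifting, using plain pandas as a stand-in and assuming bigframes matches it. Values never cross group boundaries; the vacated positions at the start of each group become NaN.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "a", "b", "b"], "val": [1, 2, 3, 10, 20]})
grouped = df.groupby("key")

lag1 = grouped.shift()           # previous row within the group
lag2 = grouped.shift(periods=2)  # two rows back within the group
print(lag1)
print(lag2)
```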

skew

skew(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Return unbiased skew within groups.

Normalized by N-1.

Parameter
Name	Description
`numeric_only`	`bool, default False` Include only `float`, `int` or `boolean` data.
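For reference, the bias-corrected sample skewness that pandas' `skew` computes can be written as follows, where $n$ is the number of non-missing values in the group and $\bar{x}$ is their mean. That bigframes uses exactly this estimator is an assumption based on its pandas compatibility.

```latex
G_1 = \frac{n\sqrt{n-1}}{n-2}\,
      \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}
           {\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}},
\qquad n \ge 3
```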

std

std(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Compute standard deviation of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Parameter
Name	Description
`numeric_only`	`bool, default False` Include only `float`, `int` or `boolean` data.

Returns
Type	Description
`Series or DataFrame`	Standard deviation of values within each group.

sum

sum(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute sum of group values.

Parameters
Name	Description
`numeric_only`	`bool, default False` Include only float, int, boolean columns.
`min_count`	`int, default 0` The required number of valid values to perform the operation. If fewer than `min_count` non-NA values are present, the result will be NA.

Returns
Type	Description
`Series or DataFrame`	Computed sum of values within each group.

var

var(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Compute variance of groups, excluding missing values.
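A local sketch of `var` and `std`, using plain pandas as a stand-in. pandas divides by N - 1 (the sample estimator) by default; that bigframes uses the same sample estimator is an assumption. A group with a single non-missing value therefore has no defined variance.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame(
    {"key": ["a", "a", "a", "b", "b"], "val": [2.0, 4.0, 6.0, 1.0, nan]}
)
grouped = df.groupby("key")

variance = grouped.var()  # sum of squared deviations / (N - 1), NaN skipped
std_dev = grouped.std()   # square root of the sample variance
print(variance.join(std_dev, lsuffix="_var", rsuffix="_std"))
```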