Approximate aggregate functions

GoogleSQL for BigQuery supports approximate aggregate functions. To learn about the syntax for aggregate function calls, see Aggregate function calls.

Approximate aggregate functions are scalable in terms of memory usage and time, but produce approximate results instead of exact results. These functions typically require less memory than exact aggregation functions like COUNT(DISTINCT ...), but also introduce statistical uncertainty. This makes approximate aggregation appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.

The approximate aggregate functions in this section work directly on the input data, rather than an intermediate estimation of the data. These functions don't allow users to specify the precision for the estimation with sketches. If you would like to specify precision with sketches, see:

Function list

Name Summary
APPROX_COUNT_DISTINCT Gets the approximate result for COUNT(DISTINCT expression).
APPROX_QUANTILES Gets the approximate quantile boundaries.
APPROX_TOP_COUNT Gets the approximate top elements and their approximate count.
APPROX_TOP_SUM Gets the approximate top elements and sum, based on the approximate sum of an assigned weight.

APPROX_COUNT_DISTINCT

APPROX_COUNT_DISTINCT(
expression
)

Description

Returns the approximate result for COUNT(DISTINCT expression). The value returned is a statistical estimate, not necessarily the actual value.

This function is less accurate than COUNT(DISTINCT expression), but performs better on huge input.

Supported Argument Types

Any data type except:

  • ARRAY
  • STRUCT
  • INTERVAL

Returned Data Types

INT64

Examples

SELECTAPPROX_COUNT_DISTINCT(x)asapprox_distinct
FROMUNNEST([0,1,1,2,3,5])asx;
/*-----------------+
 | approx_distinct |
 +-----------------+
 | 5 |
 +-----------------*/

APPROX_QUANTILES

APPROX_QUANTILES(
[DISTINCT]
expression,number
[{IGNORE|RESPECT}NULLS]
)

Description

Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, sorted in ascending order, where the first element is the approximate minimum and the last element is the approximate maximum.

Returns NULL if there are zero input rows or expression evaluates to NULL for all rows.

To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.

Supported Argument Types

  • expression: Any supported data type except:

    • ARRAY
    • STRUCT
    • INTERVAL
  • number: INT64 literal or query parameter.

Returned Data Types

ARRAY<T> where T is the type specified by expression.

Examples

SELECTAPPROX_QUANTILES(x,2)ASapprox_quantiles
FROMUNNEST([1,1,1,4,5,6,7,8,9,10])ASx;
/*------------------+
 | approx_quantiles |
 +------------------+
 | [1, 5, 10] |
 +------------------*/
SELECTAPPROX_QUANTILES(x,100)[OFFSET(90)]ASpercentile_90
FROMUNNEST([1,2,3,4,5,6,7,8,9,10])ASx;
/*---------------+
 | percentile_90 |
 +---------------+
 | 9 |
 +---------------*/
SELECTAPPROX_QUANTILES(DISTINCTx,2)ASapprox_quantiles
FROMUNNEST([1,1,1,4,5,6,7,8,9,10])ASx;
/*------------------+
 | approx_quantiles |
 +------------------+
 | [1, 6, 10] |
 +------------------*/
SELECTFORMAT("%T",APPROX_QUANTILES(x,2RESPECTNULLS))ASapprox_quantiles
FROMUNNEST([NULL,NULL,1,1,1,4,5,6,7,8,9,10])ASx;
/*------------------+
 | approx_quantiles |
 +------------------+
 | [NULL, 4, 10] |
 +------------------*/
SELECTFORMAT("%T",APPROX_QUANTILES(DISTINCTx,2RESPECTNULLS))ASapprox_quantiles
FROMUNNEST([NULL,NULL,1,1,1,4,5,6,7,8,9,10])ASx;
/*------------------+
 | approx_quantiles |
 +------------------+
 | [NULL, 6, 10] |
 +------------------*/

APPROX_TOP_COUNT

APPROX_TOP_COUNT(
expression,number
)

Description

Returns the approximate top elements of expression as an array of STRUCTs. The number parameter specifies the number of elements returned.

Each STRUCT contains two fields. The first field (named value) contains an input value. The second field (named count) contains an INT64 specifying the number of times the value was returned.

Returns NULL if there are zero input rows.

To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.

Supported Argument Types

  • expression: Any data type that the GROUP BY clause supports.
  • number: INT64 literal or query parameter.

Returned Data Types

ARRAY<STRUCT>

Examples

SELECTAPPROX_TOP_COUNT(x,2)asapprox_top_count
FROMUNNEST(["apple","apple","pear","pear","pear","banana"])asx;
/*-------------------------+
 | approx_top_count |
 +-------------------------+
 | [{pear, 3}, {apple, 2}] |
 +-------------------------*/

NULL handling

APPROX_TOP_COUNT doesn't ignore NULLs in the input. For example:

SELECTAPPROX_TOP_COUNT(x,2)asapprox_top_count
FROMUNNEST([NULL,"pear","pear","pear","apple",NULL])asx;
/*------------------------+
 | approx_top_count |
 +------------------------+
 | [{pear, 3}, {NULL, 2}] |
 +------------------------*/

APPROX_TOP_SUM

APPROX_TOP_SUM(
expression,weight,number
)

Description

Returns the approximate top elements of expression, ordered by the sum of the weight values provided for each unique value of expression. The number parameter specifies the number of elements returned.

If the weight input is negative or NaN, this function returns an error.

The elements are returned as an array of STRUCTs. Each STRUCT contains two fields: value and sum. The value field contains the value of the input expression. The sum field is the same type as weight, and is the approximate sum of the input weight associated with the value field.

Returns NULL if there are zero input rows.

To learn more about the optional aggregate clauses that you can pass into this function, see Aggregate function calls.

Supported Argument Types

  • expression: Any data type that the GROUP BY clause supports.
  • weight: One of the following:

    • INT64
    • NUMERIC
    • BIGNUMERIC
    • FLOAT64
  • number: INT64 literal or query parameter.

Returned Data Types

ARRAY<STRUCT>

Examples

SELECTAPPROX_TOP_SUM(x,weight,2)ASapprox_top_sumFROM
UNNEST([
STRUCT("apple"ASx,3ASweight),
("pear",2),
("apple",0),
("banana",5),
("pear",4)
]);
/*--------------------------+
 | approx_top_sum |
 +--------------------------+
 | [{pear, 6}, {banana, 5}] |
 +--------------------------*/

NULL handling

APPROX_TOP_SUM doesn't ignore NULL values for the expression and weight parameters.

SELECTAPPROX_TOP_SUM(x,weight,2)ASapprox_top_sumFROM
UNNEST([STRUCT("apple"ASx,NULLASweight),("pear",0),("pear",NULL)]);
/*----------------------------+
 | approx_top_sum |
 +----------------------------+
 | [{pear, 0}, {apple, NULL}] |
 +----------------------------*/
SELECTAPPROX_TOP_SUM(x,weight,2)ASapprox_top_sumFROM
UNNEST([STRUCT("apple"ASx,0ASweight),(NULL,2)]);
/*-------------------------+
 | approx_top_sum |
 +-------------------------+
 | [{NULL, 2}, {apple, 0}] |
 +-------------------------*/
SELECTAPPROX_TOP_SUM(x,weight,2)ASapprox_top_sumFROM
UNNEST([STRUCT("apple"ASx,0ASweight),(NULL,NULL)]);
/*----------------------------+
 | approx_top_sum |
 +----------------------------+
 | [{apple, 0}, {NULL, NULL}] |
 +----------------------------*/

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025年11月24日 UTC.