HyperLogLog++ functions

The HyperLogLog++ algorithm (HLL++) estimates cardinality from sketches.

HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions, like COUNT(DISTINCT), but also introduces statistical error. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.

A data sketch is a compact summary of a data aggregation. If you don't need materialized sketches, you can alternatively use an approximate aggregate function with system-defined precision, such as APPROX_COUNT_DISTINCT. However, APPROX_COUNT_DISTINCT doesn't allow partial aggregations, re-aggregations, and custom precision.

GoogleSQL for BigQuery supports the following HLL++ functions.

Function list

Name Summary
HLL_COUNT.EXTRACT Extracts a cardinality estimate of an HLL++ sketch.
HLL_COUNT.INIT Aggregates values of the same underlying type into a new HLL++ sketch.
HLL_COUNT.MERGE Merges HLL++ sketches of the same underlying type into a new sketch, and then gets the cardinality of the new sketch.
HLL_COUNT.MERGE_PARTIAL Merges HLL++ sketches of the same underlying type into a new sketch.

HLL_COUNT.EXTRACT

HLL_COUNT.EXTRACT(sketch)

Description

A scalar function that extracts a cardinality estimate of a single HLL++ sketch.

If sketch is NULL, this function returns a cardinality estimate of 0.

Supported input types

BYTES

Return type

INT64

Example

The following query returns the number of distinct users for each country who have at least one invoice.

SELECT
country,
HLL_COUNT.EXTRACT(HLL_sketch)ASdistinct_customers_with_open_invoice
FROM
(
SELECT
country,
HLL_COUNT.INIT(customer_id)AShll_sketch
FROM
UNNEST(
ARRAY<STRUCT<countrySTRING,customer_idSTRING,invoice_idSTRING>>[
('UA','customer_id_1','invoice_id_11'),
('BR','customer_id_3','invoice_id_31'),
('CZ','customer_id_2','invoice_id_22'),
('CZ','customer_id_2','invoice_id_23'),
('BR','customer_id_3','invoice_id_31'),
('UA','customer_id_2','invoice_id_24')])
GROUPBYcountry
);
/*---------+--------------------------------------*
 | country | distinct_customers_with_open_invoice |
 +---------+--------------------------------------+
 | UA | 2 |
 | BR | 1 |
 | CZ | 1 |
 *---------+--------------------------------------*/

HLL_COUNT.INIT

HLL_COUNT.INIT(input[,precision])

Description

An aggregate function that takes one or more input values and aggregates them into a HLL++ sketch. Each sketch is represented using the BYTES data type. You can then merge sketches using HLL_COUNT.MERGE or HLL_COUNT.MERGE_PARTIAL. If no merging is needed, you can extract the final count of distinct values from the sketch using HLL_COUNT.EXTRACT.

This function supports an optional parameter, precision. This parameter defines the accuracy of the estimate at the cost of additional memory required to process the sketches or store them on disk. The range for this value is 10 to 24. The default value is 15. For more information about precision, see Precision for sketches.

If the input is NULL, this function returns NULL.

For more information, see HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.

Supported input types

  • INT64
  • NUMERIC
  • BIGNUMERIC
  • STRING
  • BYTES

Return type

BYTES

Example

The following query creates HLL++ sketches that count the number of distinct users with at least one invoice per country.

SELECT
country,
HLL_COUNT.INIT(customer_id,10)
AShll_sketch
FROM
UNNEST(
ARRAY<STRUCT<countrySTRING,customer_idSTRING,invoice_idSTRING>>[
('UA','customer_id_1','invoice_id_11'),
('CZ','customer_id_2','invoice_id_22'),
('CZ','customer_id_2','invoice_id_23'),
('BR','customer_id_3','invoice_id_31'),
('UA','customer_id_2','invoice_id_24')])
GROUPBYcountry;
/*---------+------------------------------------------------------------------------------------*
 | country | hll_sketch |
 +---------+------------------------------------------------------------------------------------+
 | UA | "010円p020円002円030円002円 013円202円007円\r020円002円030円\n 0172円005円371円344円001円315円010円" |
 | CZ | "010円p020円002円030円002円 013円202円007円013円020円001円030円\n 0172円003円371円344円001円" |
 | BR | "010円p020円001円030円002円 013円202円007円013円020円001円030円\n 0172円003円202円341円001円" |
 *---------+------------------------------------------------------------------------------------*/

HLL_COUNT.MERGE

HLL_COUNT.MERGE(sketch)

Description

An aggregate function that returns the cardinality of several HLL++ sketches by computing their union.

Each sketch must be initialized on the same type. Attempts to merge sketches for different types results in an error. For example, you can't merge a sketch initialized from INT64 data with one initialized from STRING data.

If the merged sketches were initialized with different precisions, the precision will be downgraded to the lowest precision involved in the merge.

This function ignores NULL values when merging sketches. If the merge happens over zero rows or only over NULL values, the function returns 0.

Supported input types

BYTES

Return type

INT64

Example

The following query counts the number of distinct users across all countries who have at least one invoice.

SELECTHLL_COUNT.MERGE(hll_sketch)ASdistinct_customers_with_open_invoice
FROM
(
SELECT
country,
HLL_COUNT.INIT(customer_id)AShll_sketch
FROM
UNNEST(
ARRAY<STRUCT<countrySTRING,customer_idSTRING,invoice_idSTRING>>[
('UA','customer_id_1','invoice_id_11'),
('BR','customer_id_3','invoice_id_31'),
('CZ','customer_id_2','invoice_id_22'),
('CZ','customer_id_2','invoice_id_23'),
('BR','customer_id_3','invoice_id_31'),
('UA','customer_id_2','invoice_id_24')])
GROUPBYcountry
);
/*--------------------------------------*
 | distinct_customers_with_open_invoice |
 +--------------------------------------+
 | 3 |
 *--------------------------------------*/

HLL_COUNT.MERGE_PARTIAL

HLL_COUNT.MERGE_PARTIAL(sketch)

Description

An aggregate function that takes one or more HLL++ sketch inputs and merges them into a new sketch.

Each sketch must be initialized on the same type. Attempts to merge sketches for different types results in an error. For example, you can't merge a sketch initialized from INT64 data with one initialized from STRING data.

If the merged sketches were initialized with different precisions, the precision will be downgraded to the lowest precision involved in the merge. For example, if MERGE_PARTIAL encounters sketches of precision 14 and 15, the returned new sketch will have precision 14.

This function returns NULL if there is no input or all inputs are NULL.

Supported input types

BYTES

Return type

BYTES

Example

The following query returns an HLL++ sketch that counts the number of distinct users who have at least one invoice across all countries.

SELECTHLL_COUNT.MERGE_PARTIAL(HLL_sketch)ASdistinct_customers_with_open_invoice
FROM
(
SELECT
country,
HLL_COUNT.INIT(customer_id)AShll_sketch
FROM
UNNEST(
ARRAY<STRUCT<countrySTRING,customer_idSTRING,invoice_idSTRING>>[
('UA','customer_id_1','invoice_id_11'),
('BR','customer_id_3','invoice_id_31'),
('CZ','customer_id_2','invoice_id_22'),
('CZ','customer_id_2','invoice_id_23'),
('BR','customer_id_3','invoice_id_31'),
('UA','customer_id_2','invoice_id_24')])
GROUPBYcountry
);
/*----------------------------------------------------------------------------------------------*
 | distinct_customers_with_open_invoice |
 +----------------------------------------------------------------------------------------------+
 | "010円p020円006円030円002円 013円202円007円020円020円003円030円017円 0242円010円320円2408円352円}244円223円002円" |
 *----------------------------------------------------------------------------------------------*/

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025年10月15日 UTC.