Package bigquery (1.12.0)

API documentation for the bigquery package.

Packages Functions

array_agg

array_agg(
 obj: groupby.SeriesGroupBy | groupby.DataFrameGroupBy,
) -> series.Series | dataframe.DataFrame

Groups data and creates arrays from the selected columns, omitting NULLs because BigQuery does not allow NULL elements in arrays.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

For a SeriesGroupBy object:

>>> lst = ['a', 'a', 'b', 'b', 'a']
>>> s = bpd.Series([1, 2, 3, 4, np.nan], index=lst)
>>> bbq.array_agg(s.groupby(level=0))
a [1. 2.]
b [3. 4.]
dtype: list<item: double>[pyarrow]

For a DataFrameGroupBy object:

>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = bpd.DataFrame(l, columns=["a", "b", "c"])
>>> bbq.array_agg(df.groupby(by=["b"]))
 a c
b
1.0 [2] [3]
2.0 [1 1] [3 2]
<BLANKLINE>
[2 rows x 2 columns]
Parameter
Name Description
obj

A GroupBy object to which the function is applied.
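
To make the NULL-dropping behavior concrete, the following plain-Python sketch mimics the grouping semantics described above on local data. It does not call BigQuery or bigframes, and the helper name `local_array_agg` is hypothetical and exists only for illustration.

```python
import math
from collections import defaultdict


def local_array_agg(keys, values):
    """Group values by key and collect them into lists, skipping NULL-like values.

    Hypothetical local helper; it only imitates the documented semantics.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        # Treat None and NaN as NULL and leave them out of the array.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        groups[key].append(value)
    return dict(groups)


result = local_array_agg(["a", "a", "b", "b", "a"], [1.0, 2.0, 3.0, 4.0, float("nan")])
print(result)  # {'a': [1.0, 2.0], 'b': [3.0, 4.0]}
```

The input mirrors the SeriesGroupBy example: the NaN in group `a` is dropped, so each group keeps only its non-NULL values.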

array_length

array_length(series: series.Series) -> series.Series

Compute the length of each array element in the Series.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([[1, 2, 8, 3], [], [3, 4]])
>>> bbq.array_length(s)
0 4
1 0
2 2
dtype: Int64

You can also apply this function directly to a Series.

>>> s.apply(bbq.array_length, by_row=False)
0 4
1 0
2 2
dtype: Int64
Parameter
Name Description
series

A Series whose values are arrays.

array_to_string

array_to_string(series: series.Series, delimiter: str) -> series.Series

Converts array elements within a Series into delimited strings.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([["H", "i", "!"], ["Hello", "World"], np.nan, [], ["Hi"]])
>>> bbq.array_to_string(s, delimiter=", ")
 0 H, i, !
 1 Hello, World
 2
 3
 4 Hi
 dtype: string
Parameters
Name Description
series

A Series containing arrays.

delimiter

The string used to separate array elements.
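
As a rough local illustration of what the delimiter does, the sketch below joins each array with plain Python. The helper name `local_array_to_string` is hypothetical, the sketch does not call bigframes, and it deliberately leaves out missing (NULL) rows.

```python
def local_array_to_string(arrays, delimiter):
    """Join the elements of each array into one delimited string.

    Hypothetical local helper; an empty array yields an empty string.
    """
    return [delimiter.join(str(item) for item in array) for array in arrays]


joined = local_array_to_string([["H", "i", "!"], ["Hello", "World"], [], ["Hi"]], ", ")
print(joined)  # ['H, i, !', 'Hello, World', '', 'Hi']
```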

json_set

json_set(
 series: series.Series,
 json_path_value_pairs: typing.Sequence[typing.Tuple[str, typing.Any]],
) -> series.Series

Produces a new JSON value within a Series by inserting or replacing values at specified paths.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.read_gbq("SELECT JSON '{\"a\": 1}' AS data")["data"]
>>> bbq.json_set(s, json_path_value_pairs=[("$.a", 100), ("$.b", "hi")])
 0 {"a":100,"b":"hi"}
 Name: data, dtype: string
Parameters
Name Description
series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path_value_pairs

Pairs of JSON path and the new value to insert/replace.
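
The insert-or-replace behavior of the path/value pairs can be sketched locally with the standard `json` module. This is only an illustration under simplifying assumptions: the helper `local_json_set` is hypothetical, it handles only top-level paths of the form `$.key`, and it is not how the library evaluates JSON paths.

```python
import json


def local_json_set(document, json_path_value_pairs):
    """Insert or replace top-level keys addressed by paths of the form '$.key'.

    Hypothetical local helper; nested paths and array indexes are rejected.
    """
    data = json.loads(document)
    for path, value in json_path_value_pairs:
        if not path.startswith("$.") or "." in path[2:] or "[" in path:
            raise ValueError(f"unsupported path in this sketch: {path}")
        # An existing key is replaced; a missing key is inserted.
        data[path[2:]] = value
    # Compact separators match the formatting shown in the example output above.
    return json.dumps(data, separators=(",", ":"))


updated = local_json_set('{"a": 1}', [("$.a", 100), ("$.b", "hi")])
print(updated)  # {"a":100,"b":"hi"}
```

The pairs are applied in order, so `$.a` is replaced first and `$.b` is then inserted.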

vector_search

vector_search(
 base_table: str,
 column_to_search: str,
 query: Union[dataframe.DataFrame, series.Series],
 *,
 query_column_to_search: Optional[str] = None,
 top_k: Optional[int] = 10,
 distance_type: Literal["euclidean", "cosine"] = "euclidean",
 fraction_lists_to_search: Optional[float] = None,
 use_brute_force: bool = False
) -> dataframe.DataFrame

Performs a vector search over embeddings to find semantically similar entities.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

DataFrame embeddings for which to find nearest neighbors. The ARRAY<FLOAT64> column is used as the search query:

>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
... "embedding": [[1.0, 2.0], [3.0, 5.2]]})
>>> bbq.vector_search(
... base_table="bigframes-dev.bigframes_tests_sys.base_table",
... column_to_search="my_embedding",
... query=search_query,
... top_k=2)
 query_id embedding id my_embedding distance
1 cat [3. 5.2] 5 [5. 5.4] 2.009975
0 dog [1. 2.] 1 [1. 2.] 0.0
0 dog [1. 2.] 4 [1. 3.2] 1.2
1 cat [3. 5.2] 2 [2. 4.] 1.56205
<BLANKLINE>
[4 rows x 5 columns]

Series embeddings for which to find nearest neighbors:

>>> search_query = bpd.Series([[1.0, 2.0], [3.0, 5.2]],
... index=["dog", "cat"],
... name="embedding")
>>> bbq.vector_search(
... base_table="bigframes-dev.bigframes_tests_sys.base_table",
... column_to_search="my_embedding",
... query=search_query,
... top_k=2)
 embedding id my_embedding distance
dog [1. 2.] 1 [1. 2.] 0.0
cat [3. 5.2] 5 [5. 5.4] 2.009975
dog [1. 2.] 4 [1. 3.2] 1.2
cat [3. 5.2] 2 [2. 4.] 1.56205
<BLANKLINE>
[4 rows x 4 columns]

You can also specify the distance type and the name of the query DataFrame column that holds the embeddings. If you specify query_column_to_search, the search uses that column as the source of the embeddings for which to find nearest neighbors. Otherwise, it uses the column_to_search value.

>>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
... "embedding": [[1.0, 2.0], [3.0, 5.2]],
... "another_embedding": [[0.7, 2.2], [3.3, 5.2]]})
>>> bbq.vector_search(
... base_table="bigframes-dev.bigframes_tests_sys.base_table",
... column_to_search="my_embedding",
... query=search_query,
... distance_type="cosine",
... query_column_to_search="another_embedding",
... top_k=2)
 query_id embedding another_embedding id my_embedding distance
1 cat [3. 5.2] [3.3 5.2] 2 [2. 4.] 0.005181
0 dog [1. 2.] [0.7 2.2] 4 [1. 3.2] 0.000013
1 cat [3. 5.2] [3.3 5.2] 1 [1. 2.] 0.005181
0 dog [1. 2.] [0.7 2.2] 3 [1.5 7. ] 0.004697
<BLANKLINE>
[4 rows x 6 columns]
Parameters
Name Description
base_table

The table to search for nearest neighbor embeddings.

column_to_search

The name of the base table column to search for nearest neighbor embeddings. The column must have a type of ARRAY. All elements in the array must be non-NULL.

query

A Series or DataFrame that provides the embeddings for which to find nearest neighbors.

query_column_to_search

Specifies the name of the column in the query that contains the embeddings for which to find nearest neighbors. The column must have a type of ARRAY. All elements in the array must be non-NULL and all values in the column must have the same array dimensions as the values in the column_to_search column. Can only be set when query is a DataFrame.

top_k

Specifies the number of nearest neighbors to return. Defaults to 10.

distance_type

Specifies the type of metric to use to compute the distance between two vectors. Possible values are "euclidean" and "cosine". Defaults to "euclidean".

fraction_lists_to_search

Specifies the percentage of lists to search. Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage. It is only used when a vector index is also used. You can only specify fraction_lists_to_search when use_brute_force is set to False.

use_brute_force

Determines whether to use brute force search by skipping the vector index if one is available. Defaults to False.
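
To show how top_k and distance_type interact, here is a minimal brute-force nearest-neighbor sketch in plain Python. It is not the BigQuery implementation and uses no vector index. The helper `local_vector_search` is hypothetical, and the base rows are an assumption inferred from the example outputs above, since the contents of the real base table are not shown.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def local_vector_search(base_rows, queries, top_k=10, distance_type="euclidean"):
    """Return the top_k closest base rows for every query embedding."""
    metric = {"euclidean": euclidean, "cosine": cosine}[distance_type]
    results = {}
    for query_id, embedding in queries.items():
        # Brute force: score every base row, then keep the smallest distances.
        scored = [(row_id, metric(embedding, vec)) for row_id, vec in base_rows.items()]
        scored.sort(key=lambda pair: pair[1])
        results[query_id] = scored[:top_k]
    return results


# Assumed base rows, inferred from the example outputs; the real table is not shown.
base_rows = {1: [1.0, 2.0], 2: [2.0, 4.0], 3: [1.5, 7.0], 4: [1.0, 3.2], 5: [5.0, 5.4]}
queries = {"dog": [1.0, 2.0], "cat": [3.0, 5.2]}
neighbors = local_vector_search(base_rows, queries, top_k=2)
for query_id, matches in neighbors.items():
    print(query_id, [(row_id, round(dist, 6)) for row_id, dist in matches])
```

With these assumed rows, the sketch selects ids 1 and 4 for `dog` and ids 2 and 5 for `cat`, in line with the distances in the first example. Passing `distance_type="cosine"` switches the metric without changing the ranking logic.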


Last updated 2025-10-27 UTC.