The ML.NGRAMS function

This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.

You can use this function with models that support manual feature preprocessing.

Syntax

ML.NGRAMS(array_input, range [, separator])

Arguments

ML.NGRAMS takes the following arguments:

  • array_input: an ARRAY<STRING> value that represents the tokens to be merged.
  • range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the INT64 elements provide the range of n-gram sizes to return. Provide the numerical values in order, lower to higher. If you specify a single INT64 value of x, the range of n-gram sizes to return is [x, x].
  • separator: a STRING value that specifies the separator used to join adjacent tokens in the output. The default value is a single space (' ').
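
As a minimal sketch of the two shorthand forms described above, the following query passes a single INT64 value for range and omits separator. The alias bigrams is illustrative only, and the result noted in the comment is what the argument descriptions imply, not captured output.

```sql
-- A single INT64 range of 2 is treated as [2, 2], so only 2-grams are returned.
-- No separator is given, so adjacent tokens are joined with the default separator.
SELECT
  ML.NGRAMS(['a', 'b', 'c'], 2) AS bigrams;
-- Implied by the argument descriptions: ["a b", "b c"]
```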

Output

ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.

Example

The following example outputs every 2-token and 3-token n-gram (sequence of adjacent tokens) for an input array of three strings, joined with '#':

SELECT
  ML.NGRAMS(['a', 'b', 'c'], [2, 3], '#') AS output;

The output looks similar to the following:

+-----------------------+
| output                |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
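
In practice the input tokens usually come from text, not from a literal array. The following sketch assumes the text is tokenized by splitting on single spaces with the GoogleSQL SPLIT function. The sample sentence and the alias bigrams are made up for illustration.

```sql
-- SPLIT turns the sentence into an ARRAY<STRING> of words,
-- which ML.NGRAMS then merges into 2-grams using the default separator.
SELECT
  ML.NGRAMS(SPLIT('the quick brown fox', ' '), 2) AS bigrams;
```

If the function behaves as in the example above, the result should contain the adjacent word pairs "the quick", "quick brown", and "brown fox".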


Last updated 2025-11-24 UTC.