The ML.FEATURES_AT_TIME function

This document describes the ML.FEATURES_AT_TIME function, which lets you use a point-in-time cutoff for all entities when retrieving features, because features can have time dependencies if they include time-sensitive data. To avoid data leakage, use point-in-time features when training models and running inference.

Use this function to use the same point-in-time cutoff for all entities when retrieving features. Use the ML.ENTITY_FEATURES_AT_TIME function to retrieve features from multiple points in time for multiple entities.

Syntax

ML.FEATURES_AT_TIME(
 { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) }
 [, TIME => TIMESTAMP][, NUM_ROWS => INT64][, IGNORE_FEATURE_NULLS => BOOL])

Arguments

ML.FEATURES_AT_TIME takes the following arguments:

PROJECT_ID: the project that contains the table.
DATASET: the BigQuery dataset that contains the table.
TABLE_NAME: is the name of the BigQuery table that contains the feature data. The feature table must contain the following columns:
- entity_id: a STRING column that contains the ID of the entity related to the features.
- One or more feature columns.
- feature_timestamp: a TIMESTAMP column that identifies when the feature data was last updated.
The column names are case-insensitive. For example, you can use a column named Entity_ID instead of entity_id.

The feature table must be in wide format, with one column for each feature.
QUERY_STATEMENT: a STRING value that specifies a GoogleSQL query that returns the feature data. This query must return the same columns as the TABLE_NAME argument. See GoogleSQL query syntax for the supported SQL syntax of the QUERY_STATEMENT clause.
TIME: a TIMESTAMP value that specifies the point in time to use as a cutoff for feature data. Only rows where the value in the feature_timestamp column is equal to or earlier than the TIME value are returned. Defaults to the value of the CURRENT_TIMESTAMP function.
NUM_ROWS: an INT64 value that specifies the number of rows to return for each entity ID. Defaults to 1.

IGNORE_FEATURE_NULLS: a BOOL value that indicates whether to replace a NULL value in a feature column with the feature column value from the row for the same entity that immediately precedes it in time. For example, for the following feature table:

+-----------+------+------+--------------------------+
| entity_id | f1 | f2 | feature_timestamp |
+-----------+------+------+--------------------------+
| '2' | 5.0 | 8.0 | '2022-06-10 09:00:00+00' |
+-----------+------+------+--------------------------+
| '2' | 2.0 | 4.0 | '2022-06-10 12:00:00+00' |
+-----------+------+------+--------------------------+
| '2' | 7.0 | NULL | '2022-06-11 10:00:00+00' |
+-----------+------+------+--------------------------+

Running this query:

SELECT*
FROM
ML.FEATURES_AT_TIME(
TABLEmydataset.feature_table,
time=>'2022-06-11 10:00:00+00',
num_rows=>1,
ignore_feature_nulls=>TRUE);

Results in the following output, where the f2 value from the row for entity ID 2 that is timestamped '2022年06月10日 12:00:00+00' is substituted for the NULL value in the row timestamped '2022年06月11日 10:00:00+00':

+-----------+------+------+--------------------------+
| entity_id | f1 | f2 | feature_timestamp |
+-----------+------+------+--------------------------+
| '2' | 7.0 | 4.0 | '2022-06-11 10:00:00+00' |
+-----------+------+------+--------------------------+

If there is no available replacement value, for example, where there is no earlier row for that entity ID, a NULL value is returned.

Defaults to FALSE.

Output

The ML.FEATURES_AT_TIME function returns the input table rows that meet the point-in-time cutoff criteria, with the feature_timestamp column showing the timestamp that was input in the time argument.

Examples

Example 1

This example shows a how to retrain a model using only features that were created or updated before 2023年01月01日 12:00:00+00:

CREATEORREPLACE
`mydataset.mymodel`OPTIONS(WARM_START=TRUE)
AS
SELECT*EXCEPT(feature_timestamp,entity_id)
FROM
ML.FEATURES_AT_TIME(
TABLE`mydataset.feature_table`,
time=>'2023-01-01 12:00:00+00',
num_rows=>1,
ignore_feature_nulls=>TRUE);

Example 2

This example shows how to get predictions from a model based on features that were created or updated before 2023年01月01日 12:00:00+00:

SELECT
*
FROM
ML.PREDICT(
MODEL`mydataset.mymodel`,
(
SELECT*EXCEPT(feature_timestamp,entity_id)
FROM
ML.FEATURES_AT_TIME(
TABLE`mydataset.feature_table`,
time=>'2023-01-01 12:00:00+00',
num_rows=>1,
ignore_feature_nulls=>TRUE)
)
);

Example 3

This is a contrived example that you can use to see the output of the function:

WITH
feature_tableAS(
SELECT*FROMUNNEST(
ARRAY<STRUCT<entity_idSTRING,f_1FLOAT64,f_2FLOAT64,feature_timestampTIMESTAMP>>[
('id1',1.0,1.0,TIMESTAMP'2022-06-10 12:00:00+00'),
('id2',12.0,24.0,TIMESTAMP'2022-06-11 12:00:00+00'),
('id1',11.0,NULL,TIMESTAMP'2022-06-11 12:00:00+00'),
('id1',6.0,12.0,TIMESTAMP'2022-06-11 10:00:00+00'),
('id2',2.0,4.0,TIMESTAMP'2022-06-10 12:00:00+00'),
('id2',7.0,NULL,TIMESTAMP'2022-06-11 10:00:00+00')])
)
SELECT*
FROM
ML.FEATURES_AT_TIME(
TABLEfeature_table,
time=>'2022-06-12 10:00:00+00',
num_rows=>1,
ignore_feature_nulls=>TRUE);