Perform anomaly detection with a multivariate time-series forecasting model

This tutorial shows you how to do the following tasks:

  • Create an ARIMA_PLUS_XREG multivariate time-series forecasting model.
  • Perform anomaly detection on the historical data used to train the model.
  • Perform anomaly detection on new data.

This tutorial uses the following tables from the public epa_historical_air_quality dataset, which contains daily PM2.5, temperature, and wind speed information collected from multiple US cities:

  • epa_historical_air_quality.pm25_nonfrm_daily_summary
  • epa_historical_air_quality.wind_daily_summary
  • epa_historical_air_quality.temperature_daily_summary

Required permissions

  • To create the dataset, you need the bigquery.datasets.create IAM permission.

  • To create the model, you need the following permissions:

    • bigquery.jobs.create
    • bigquery.models.create
    • bigquery.models.getData
    • bigquery.models.updateData
  • To run inference, you need the following permissions:

    • bigquery.models.getData
    • bigquery.jobs.create

For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.

Costs

In this document, you use the following billable components of Google Cloud:

  • BigQuery: You incur costs for the data you process in BigQuery.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

For more information, see BigQuery pricing.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get 300ドル in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to the BigQuery page

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, do the following:

    • For Dataset ID, enter bqml_tutorial.

    • For Location type, select Multi-region, and then select US (multiple regions in United States).

    • Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
     --description "BigQuery ML tutorial dataset." \
     bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Prepare the training data

The PM2.5, temperature, and wind speed data are in separate tables. Create the bqml_tutorial.seattle_air_quality_daily table of training data by combining the data in these public tables. bqml_tutorial.seattle_air_quality_daily contains the following columns:

  • date: the date of the observation
  • pm25: the average PM2.5 value for each day
  • wind_speed: the average wind speed for each day
  • temperature: the highest temperature for each day

The new table has daily data from August 11, 2009 to January 31, 2022.

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the SQL editor pane, run the following SQL statement:

    CREATE TABLE `bqml_tutorial.seattle_air_quality_daily`
    AS
    WITH
      pm25_daily AS (
        SELECT
          avg(arithmetic_mean) AS pm25, date_local AS date
        FROM
          `bigquery-public-data.epa_historical_air_quality.pm25_nonfrm_daily_summary`
        WHERE
          city_name = 'Seattle'
          AND parameter_name = 'Acceptable PM2.5 AQI & Speciation Mass'
        GROUP BY date_local
      ),
      wind_speed_daily AS (
        SELECT
          avg(arithmetic_mean) AS wind_speed, date_local AS date
        FROM
          `bigquery-public-data.epa_historical_air_quality.wind_daily_summary`
        WHERE
          city_name = 'Seattle' AND parameter_name = 'Wind Speed - Resultant'
        GROUP BY date_local
      ),
      temperature_daily AS (
        SELECT
          avg(first_max_value) AS temperature, date_local AS date
        FROM
          `bigquery-public-data.epa_historical_air_quality.temperature_daily_summary`
        WHERE
          city_name = 'Seattle' AND parameter_name = 'Outdoor Temperature'
        GROUP BY date_local
      )
    SELECT
      pm25_daily.date AS date, pm25, wind_speed, temperature
    FROM pm25_daily
    JOIN wind_speed_daily USING (date)
    JOIN temperature_daily USING (date)

Create the model

Create a multivariate time series model, using the data from bqml_tutorial.seattle_air_quality_daily as training data.

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the SQL editor pane, run the following SQL statement:

    CREATE OR REPLACE MODEL `bqml_tutorial.arimax_model`
    OPTIONS (
      model_type = 'ARIMA_PLUS_XREG',
      auto_arima = TRUE,
      time_series_data_col = 'temperature',
      time_series_timestamp_col = 'date'
    )
    AS
    SELECT
      *
    FROM
      `bqml_tutorial.seattle_air_quality_daily`
    WHERE
      date < "2023-02-01";

    The query takes several seconds to complete, after which the model arimax_model appears in the bqml_tutorial dataset and can be accessed in the Explorer pane.

    Because the query uses a CREATE MODEL statement to create a model, there are no query results.

Perform anomaly detection on historical data

Run anomaly detection against the historical data that you used to train the model.

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the SQL editor pane, run the following SQL statement:

    SELECT
      *
    FROM
      ML.DETECT_ANOMALIES(
        MODEL `bqml_tutorial.arimax_model`,
        STRUCT(0.6 AS anomaly_prob_threshold)
      )
    ORDER BY
      date ASC;

    The results look similar to the following:

    +-------------------------+-------------+------------+--------------------+--------------------+---------------------+
    |          date           | temperature | is_anomaly |    lower_bound     |    upper_bound     | anomaly_probability |
    +-------------------------+-------------+------------+--------------------+--------------------+---------------------+
    | 2009年08月11日 00:00:00 UTC |        70.1 | false      | 67.647370742988727 | 72.552629257011262 |                   0 |
    | 2009年08月12日 00:00:00 UTC |        73.4 | false      |   71.7035428351283 | 76.608801349150838 | 0.20478819992561115 |
    | 2009年08月13日 00:00:00 UTC |        64.6 | true       | 67.740408724826068 |   72.6456672388486 |   0.945588334903206 |
    +-------------------------+-------------+------------+--------------------+--------------------+---------------------+

Perform anomaly detection on new data

Run anomaly detection on the new data that you generate.

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the SQL editor pane, run the following SQL statement:

    SELECT
      *
    FROM
      ML.DETECT_ANOMALIES(
        MODEL `bqml_tutorial.arimax_model`,
        STRUCT(0.6 AS anomaly_prob_threshold),
        (
          SELECT
            *
          FROM
            UNNEST(
              [
                STRUCT<date TIMESTAMP, pm25 FLOAT64, wind_speed FLOAT64, temperature FLOAT64>
                  ('2023-02-01 00:00:00 UTC', 8.8166665, 1.6525, 44.0),
                  ('2023-02-02 00:00:00 UTC', 11.8354165, 1.558333, 40.5),
                  ('2023-02-03 00:00:00 UTC', 10.1395835, 1.6895835, 46.5),
                  ('2023-02-04 00:00:00 UTC', 11.439583500000001, 2.0854165, 45.0),
                  ('2023-02-05 00:00:00 UTC', 9.7208335, 1.7083335, 46.0),
                  ('2023-02-06 00:00:00 UTC', 13.3020835, 2.23125, 43.5),
                  ('2023-02-07 00:00:00 UTC', 5.7229165, 2.377083, 47.5),
                  ('2023-02-08 00:00:00 UTC', 7.6291665, 2.24375, 44.5),
                  ('2023-02-09 00:00:00 UTC', 8.5208335, 2.2541665, 40.5),
                  ('2023-02-10 00:00:00 UTC', 9.9086955, 7.333335, 39.5)
              ]
            )
        )
      );

    The results look similar to the following:

    +-------------------------+-------------+------------+--------------------+--------------------+---------------------+------------+------------+
    |          date           | temperature | is_anomaly |    lower_bound     |    upper_bound     | anomaly_probability |    pm25    | wind_speed |
    +-------------------------+-------------+------------+--------------------+--------------------+---------------------+------------+------------+
    | 2023年02月01日 00:00:00 UTC |        44.0 | true       |  36.89918003713138 |   41.8044385511539 | 0.88975675709801583 |  8.8166665 |     1.6525 |
    | 2023年02月02日 00:00:00 UTC |        40.5 | false      | 34.439946284051572 | 40.672021330796483 | 0.57358239699845348 | 11.8354165 |   1.558333 |
    | 2023年02月03日 00:00:00 UTC |        46.5 | true       | 33.615139992931191 | 40.501364463964549 | 0.97902867696346974 | 10.1395835 |  1.6895835 |
    +-------------------------+-------------+------------+--------------------+--------------------+---------------------+------------+------------+

Clean up

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025年10月16日 UTC.