Updating partitioned table data using DML
This page provides an overview of data manipulation language (DML) support for partitioned tables.
For more information on DML, see:
Tables used in examples
The following JSON schema definitions represent the tables used in the examples on this page.
mytable
: an ingestion-time partitioned table
[
{"name": "field1", "type": "INTEGER"},
{"name": "field2", "type": "STRING"}
]
mytable2
: a standard (non-partitioned) table
[
{"name": "id", "type": "INTEGER"},
{"name": "ts", "type": "TIMESTAMP"}
]
mycolumntable
: a partitioned table
that is partitioned by using the ts
TIMESTAMP
column
[
{"name": "field1", "type": "INTEGER"},
{"name": "field2", "type": "STRING"}
{"name": "field3", "type": "BOOLEAN"}
{"name": "ts", "type": "TIMESTAMP"}
]
In examples where COLUMN_ID appears, replace it with the name of the column you wish to operate on.
Inserting data
You use a DML INSERT
statement
to add rows to a partitioned table.
Inserting data into ingestion-time partitioned tables
When you use a DML statement to add rows to an ingestion-time partitioned table,
you can specify the partition to which the rows should be added. You reference
the partition using the _PARTITIONTIME
pseudocolumn.
For example, the following INSERT
statement adds a row to the May 1, 2017
partition of mytable
— "2017年05月01日"
.
INSERTINTO project_id.dataset.mytable(_PARTITIONTIME, field1, field2) SELECT TIMESTAMP("2017-05-01"), 1, "one"
Only timestamps that correspond to exact date boundaries can be used. For example, the following DML statement returns an error:
INSERTINTO project_id.dataset.mytable(_PARTITIONTIME, field1, field2) SELECT TIMESTAMP("2017-05-01 21:30:00"), 1, "one"
Inserting data into partitioned tables
Inserting data into a partitioned table using DML is the same as inserting data into a non-partitioned table.
For example, the following INSERT
statement adds rows to partitioned table
mycolumntable
by selecting data from mytable2
(a non-partitioned table).
INSERTINTO project_id.dataset.mycolumntable(ts, field1) SELECT ts, id FROM project_id.dataset.mytable2
Deleting data
You use a DML DELETE
statement
to delete rows from a partitioned table.
Deleting data in ingestion-time partitioned tables
The following DELETE
statement deletes all rows from the June 1, 2017
partition ("2017年06月01日"
) of mytable
where field1
is equal to 21
. You
reference the partition using the _PARTITIONTIME
pseudocolumn.
DELETE project_id.dataset.mytable WHERE field1=21 AND_PARTITIONTIME="2017-06-01"
Deleting data in partitioned tables
Deleting data in a partitioned table using DML is the same as deleting data from a non-partitioned table.
For example, the following DELETE
statement deletes all rows from the
June 1, 2017 partition ("2017年06月01日"
) of mycolumntable
where field1
is
equal to 21
.
DELETE project_id.dataset.mycolumntable WHERE field1=21 ANDDATE(ts)="2017-06-01"
Using DML DELETE to delete partitions
If a qualifying DELETE
statement covers all rows in a partition,
BigQuery removes the entire partition. This removal is done
without scanning bytes or consuming slots. The following example of a DELETE
statement covers the entire partition of a filter on the _PARTITIONDATE
pseudocolumn:
DELETEmydataset.mytable WHERE_PARTITIONDATEIN('2076-10-07','2076-03-06');
Common disqualifications
Queries with the following characteristics may not benefit from the optimization:
- Partial partition coverage
- References to non-partitioning columns
- Recently ingested data through the BigQuery Storage Write API or the legacy streaming API
- Filters with subqueries or unsupported predicates
Eligibility for optimization can vary with the type of partitioning, the underlying storage metadata, and the filter predicates. As a best practice, perform a dry run to verify that the query results in 0 bytes processed.
Multi-statement transaction
This optimization works within a multi-statement transaction. The following query example replaces a partition with data from another table in a single transaction, without scanning the partition for
the DELETE
statement.
DECLAREREPLACE_DAYDATE; BEGINTRANSACTION; -- find the partition which we want to replace SETREPLACE_DAY=(SELECTMAX(d)FROMmydataset.mytable_staging); -- delete the entire partition from mytable DELETEFROMmydataset.mytableWHEREpart_col=REPLACE_DAY; -- insert the new data into the same partition in mytable INSERTINTOmydataset.mytable SELECT*FROMmydataset.mytable_stagingWHEREpart_col=REPLACE_DAY; COMMITTRANSACTION;
Updating data
You use an UPDATE
statement
to update rows in a partitioned table.
Updating data in ingestion-time partitioned tables
The following UPDATE
statement moves rows from one partition to another.
Rows in the May 1, 2017 partition ("2017年05月01日"
) of mytable
where field1
is equal to 21
are moved to the June 1, 2017 partition ("2017年06月01日"
).
UPDATE project_id.dataset.mytable SET _PARTITIONTIME="2017-06-01" WHERE _PARTITIONTIME="2017-05-01" ANDfield1=21
Updating data in partitioned tables
Updating data in a partitioned table using DML is the same as updating data
from a non-partitioned table. For example, the following UPDATE
statement moves rows from one partition to another. Rows in the May 1, 2017
partition ("2017年05月01日"
) of mytable
where field1
is equal to 21
are
moved to the June 1, 2017 partition ("2017年06月01日"
).
UPDATE project_id.dataset.mycolumntable SET ts="2017-06-01" WHERE DATE(ts)="2017-05-01" ANDfield1=21
DML in hourly, monthly, and yearly partitioned tables
You can use DML statements to modify an hourly, monthly, or yearly partitioned table. Provide the hour, month, or year range of the relevant dates/timestamps/datetimes, as in the following example for monthly partitioned tables:
bqquery--nouse_legacy_sql'DELETE FROM my_dataset.my_table WHERE TIMESTAMP_TRUNC(ts_column, MONTH) = "2020-01-01 00:00:00";'
Or another example for partitioned tables with DATETIME
column:
bqquery--nouse_legacy_sql'DELETE FROM my_dataset.my_table WHERE dt_column BETWEEN DATETIME("2020-01-01") AND DATETIME("2020-05-01");'
Using a MERGE
statement
You use a DML MERGE
statement
to combine INSERT
, UPDATE
, and DELETE
operations for a partitioned table
into one statement and perform them atomically.
Pruning partitions when using a MERGE
statement
When you run a MERGE
statement against a partitioned table, you can limit
which partitions are scanned by including the partitioning column in either
a subquery filter, a search_condition
filter, or a merge_condition
filter.
Pruning can occur when scanning the source table or the target table, or both.
Each of the examples below queries an ingestion-time partitioned table using
the _PARTITIONTIME
pseudocolumn as a filter.
Using a subquery to filter source data
In the following MERGE
statement, the subquery in the USING
clause filters
on the _PARTITIONTIME
pseudocolumn in the source table.
MERGEdataset.targetT USING(SELECT*FROMdataset.sourceWHERE_PARTITIONTIME='2018-01-01')S ONT.COLUMN_ID=S.COLUMN_ID WHENMATCHEDTHEN DELETE
Looking at the query execution plan, the subquery runs first. Only the rows in
the '2018年01月01日'
partition in the source table are scanned. Here is the
relevant stage in the query plan:
READ 10ドル:name, 11ドル:_PARTITIONTIME
FROM temp.source
WHERE equal(11,ドル 1514764800.000000000)
Using a filter in the search_condition
of a when_clause
If a search_condition
contains a filter, then the query optimizer attempts to
prune partitions. For example, in the following MERGE
statement, each WHEN
MATCHED
and WHEN NOT MATCHED
clause contains a filter on the _PARTITIONTIME
pseudocolumn.
MERGEdataset.targetT USINGdataset.sourceS ONT.COLUMN_ID=S.COLUMN_ID WHENMATCHEDANDT._PARTITIONTIME='2018-01-01'THEN UPDATESETCOLUMN_ID=S.COLUMN_ID WHENMATCHEDANDT._PARTITIONTIME='2018-01-02'THEN UPDATESETCOLUMN_ID=S.COLUMN_ID+10 WHENNOTMATCHEDBYSOURCEANDT._PARTITIONTIME='2018-01-03'THEN DELETE
During the join stage, only the following partitions are scanned in the target
table: '2018年01月01日'
, '2018年01月02日'
, and '2018年01月03日'
— that is, the
union of all the search_condition
filters.
From the query execution plan:
READ
1ドル:COLUMN_ID, 2ドル:_PARTITIONTIME, 3ドル:$file_temp_id, 4ドル:$row_temp_id
FROM temp.target
WHERE or(equal(2,ドル 1514764800.000000000), equal(2,ドル 1514851200.000000000), equal(2,ドル 1514937600.000000000))
However, in the following example, the WHEN NOT MATCHED BY SOURCE
clause does
not have a filter expression:
MERGEdataset.targetT USINGdataset.sourceS ONT.COLUMN_ID=S.COLUMN_ID WHENMATCHEDANDT._PARTITIONTIME='2018-01-01'THEN UPDATESETCOLUMN_ID=S.COLUMN_ID WHENNOTMATCHEDBYSOURCETHEN UPDATESETCOLUMN_ID=COLUMN_ID+1
This query must scan the entire target table to compute the WHEN NOT MATCHED BY
SOURCE
clause. As a result, no partitions are pruned.
Using a constant false predicate in a merge_condition
If you use the WHEN NOT MATCHED
and WHEN NOT MATCHED BY SOURCE
clauses
together, then BigQuery usually performs a full outer join, which
cannot be pruned. However, if the merge condition uses a constant false
predicate, then BigQuery can use the filter condition for
partition pruning. For more information about the use of constant false
predicates, see the description of the merge_condition
clause in the
MERGE
statement
documentation.
The following example scans only the '2018年01月01日'
partition in both the target
and source tables.
MERGEdataset.targetT USINGdataset.sourceS ONFALSE WHENNOTMATCHEDAND_PARTITIONTIME='2018-01-01'THEN INSERT(COLUMN_ID)VALUES(COLUMN_ID) WHENNOTMATCHEDBYSOURCEAND_PARTITIONTIME='2018-01-01'THEN DELETE
Using a filter in a merge_condition
The query optimizer attempts to use a filter in a merge_condition
to prune
partitions. The query optimizer might or might not be able to push the predicate
down to the table scanning stage, depending on the type of join.
In the following example, the merge_condition
is used as a predicate to join
the source and target tables. The query optimizer can push this predicate down
when it scans both tables. As a result, the query only scans the '2018年01月01日'
partition in both the target and source tables.
MERGEdataset.targetT USINGdataset.sourceS ONT.COLUMN_ID=S.COLUMN_IDAND T._PARTITIONTIME='2018-01-01'AND S._PARTITIONTIME='2018-01-01' WHENMATCHEDTHEN UPDATESETCOLUMN_ID=NEW_VALUE
In the next example, the merge_condition
does not contain a predicate for the
source table, so no partition pruning can be performed on the source table. The
statement does contain a predicate for the target table, but the statement uses
a WHEN NOT MATCHED BY SOURCE
clause, rather than a WHEN MATCHED
clause. That
means the query has to scan the entire target table for the rows that don't
match.
MERGEdataset.targetT USINGdataset.sourceS ONT.COLUMN_ID=S.COLUMN_IDANDT._PARTITIONTIME='2018-01-01' WHENNOTMATCHEDBYSOURCETHEN UPDATESETCOLUMN_ID=NEW_VALUE
Limitations
For information about DML limitations, see Limitations on the DML reference page.
Quotas
For information about DML quota information, see DML statements on the Quotas and limits page.
Pricing
For information about DML pricing, see how to compute the query size for DML statements run on Partitioned tables.
Table security
To control access to tables in BigQuery, see Control access to resources with IAM.
What's next
- Learn how to create partitioned tables
- Learn how to query partitioned tables
- Get an introduction to DML
- Learn how to compose DML statements using DML syntax