Multi-statement transactions
BigQuery supports multi-statement transactions inside a single query, or across multiple queries when using sessions. A multi-statement transaction lets you perform mutating operations, such as inserting or deleting rows on one or more tables, and either commit or roll back the changes atomically.
Uses for multi-statement transactions include:
- Performing DML mutations on multiple tables as a single transaction. The tables can span multiple datasets or projects.
- Performing mutations on a single table in several stages, based on intermediate computations.
Transactions guarantee ACID properties and support snapshot isolation. During a transaction, all reads return a consistent snapshot of the tables referenced in the transaction. If a statement in a transaction modifies a table, the changes are visible to subsequent statements within the same transaction.
Transaction scope
A transaction must be contained in a single SQL query, except when in session mode. A query can contain multiple transactions, but they cannot be nested. You can run multi-statement transactions over multiple queries in a session.
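As a sketch of the session case, the following two queries assume they run in the same, already-created session (sessions are created through the API or client tools, not with SQL). The 'dishwasher' row is a hypothetical value for the NewArrivals table used later on this page. The transaction that the first query opens stays open until the second query commits it.

```sql
-- Query 1 (run in an existing session): start a transaction and make a change.
BEGIN TRANSACTION;
INSERT INTO mydataset.NewArrivals VALUES ('dishwasher', 40, 'warehouse #2');

-- Query 2 (submitted later in the same session): commit the open transaction.
COMMIT TRANSACTION;
```

Outside a session, Query 1 would end with its transaction still open, so BigQuery would roll it back, and Query 2 would not commit anything.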
To start a transaction, use the BEGIN TRANSACTION statement. The transaction ends when any of the following occurs:
- The query executes a COMMIT TRANSACTION statement. This statement atomically commits all changes made inside the transaction.
- The query executes a ROLLBACK TRANSACTION statement. This statement abandons all changes made inside the transaction.
- The query ends before reaching either of these two statements. In that case, BigQuery automatically rolls back the transaction.
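The following sketch shows a single query that contains two transactions, one after the other rather than nested. It reuses the mydataset.NewArrivals table from the examples on this page; the 'warehouse #3' value is hypothetical. The second transaction reaches the end of the query without a COMMIT TRANSACTION or ROLLBACK TRANSACTION statement, so BigQuery rolls it back.

```sql
-- First transaction: committed explicitly.
BEGIN TRANSACTION;
INSERT INTO mydataset.NewArrivals VALUES ('oven', 5, 'warehouse #3');
COMMIT TRANSACTION;

-- Second transaction: starts only after the first one has ended.
BEGIN TRANSACTION;
DELETE mydataset.NewArrivals WHERE warehouse = 'warehouse #3';
-- The query ends here without COMMIT TRANSACTION or ROLLBACK TRANSACTION,
-- so BigQuery rolls back this DELETE; the row committed above remains.
```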
If an error occurs during a transaction and the query has an exception handler, then BigQuery transfers control to the exception handler. Inside the exception block, you can choose whether to commit or roll back the transaction.
If an error occurs during a transaction and there is no exception handler, then the query fails and BigQuery automatically rolls back the transaction.
The following example shows an exception handler that rolls back a transaction:
```sql
BEGIN

  BEGIN TRANSACTION;
  INSERT INTO mydataset.NewArrivals
    VALUES ('top load washer', 100, 'warehouse #1');
  -- Trigger an error.
  SELECT 1/0;
  COMMIT TRANSACTION;

EXCEPTION WHEN ERROR THEN
  -- Roll back the transaction inside the exception handler.
  SELECT @@error.message;
  ROLLBACK TRANSACTION;
END;
```
Statements supported in transactions
The following statement types are supported in transactions:
- Query statements: SELECT
- DML statements: INSERT, UPDATE, DELETE, MERGE, and TRUNCATE TABLE
- DDL statements on temporary entities:
  - CREATE TEMP TABLE
  - CREATE TEMP FUNCTION
  - DROP TABLE on a temporary table
  - DROP FUNCTION on a temporary function
DDL statements that create or drop permanent entities, such as datasets, tables, and functions, are not supported inside transactions.
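As an illustration of the supported temporary DDL, the following sketch creates and drops a temporary function inside a transaction and uses it in an UPDATE on the Inventory table from the example on this page. The function name AddUnits and the table name InventoryArchive are hypothetical. The commented-out CREATE TABLE statement is the kind of permanent DDL that is not allowed inside a transaction.

```sql
BEGIN TRANSACTION;

-- A temporary function is a supported DDL target inside a transaction.
CREATE TEMP FUNCTION AddUnits(q INT64, n INT64) AS (q + n);

UPDATE mydataset.Inventory
SET quantity = AddUnits(quantity, 5)
WHERE product = 'dryer';

-- Dropping the temporary function is also supported.
DROP FUNCTION AddUnits;

-- Not supported inside a transaction, because it creates a permanent table:
-- CREATE TABLE mydataset.InventoryArchive (product STRING, quantity INT64);

COMMIT TRANSACTION;
```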
Date/time functions in transactions
Within a transaction, the following date/time functions have special behaviors:
- The CURRENT_TIMESTAMP, CURRENT_DATE, and CURRENT_TIME functions return the timestamp when the transaction started.
- You cannot use the FOR SYSTEM_TIME AS OF clause to read a table beyond the timestamp when the transaction started. Doing so returns an error.
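To observe the first rule, the following sketch records CURRENT_TIMESTAMP() from two different statements of one transaction into a temporary table; ts_log is a hypothetical name. Given the behavior described above, the final SELECT is expected to report a single distinct timestamp, even if time passes between the two statements.

```sql
BEGIN TRANSACTION;

-- Record the timestamp from the first statement.
CREATE TEMP TABLE ts_log AS
SELECT 1 AS step, CURRENT_TIMESTAMP() AS ts;

-- Any other work in the transaction would go here.

-- Record the timestamp again from a later statement.
INSERT INTO ts_log (step, ts) VALUES (2, CURRENT_TIMESTAMP());

-- Both rows carry the transaction start time, so this is expected to return 1.
SELECT COUNT(DISTINCT ts) AS distinct_timestamps FROM ts_log;

DROP TABLE ts_log;
COMMIT TRANSACTION;
```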
Example of a transaction
This example assumes there are two tables named Inventory and NewArrivals, created as follows:
```sql
CREATE OR REPLACE TABLE mydataset.Inventory
(
  product STRING,
  quantity INT64,
  supply_constrained BOOL
);

CREATE OR REPLACE TABLE mydataset.NewArrivals
(
  product STRING,
  quantity INT64,
  warehouse STRING
);

INSERT mydataset.Inventory (product, quantity)
VALUES ('top load washer', 10),
       ('front load washer', 20),
       ('dryer', 30),
       ('refrigerator', 10),
       ('microwave', 20),
       ('dishwasher', 30);

INSERT mydataset.NewArrivals (product, quantity, warehouse)
VALUES ('top load washer', 100, 'warehouse #1'),
       ('dryer', 200, 'warehouse #2'),
       ('oven', 300, 'warehouse #1');
```
The Inventory table contains information about current inventory, and NewArrivals contains information about newly arrived items.

The following transaction updates Inventory with the new arrivals from warehouse #1 and deletes the corresponding records from NewArrivals. Assuming that all statements complete successfully, the changes in both tables are committed atomically as a single transaction.
```sql
BEGIN TRANSACTION;

-- Create a temporary table that holds new arrivals from 'warehouse #1'.
CREATE TEMP TABLE tmp
  AS SELECT * FROM mydataset.NewArrivals WHERE warehouse = 'warehouse #1';

-- Delete the matching records from the NewArrivals table.
DELETE mydataset.NewArrivals WHERE warehouse = 'warehouse #1';

-- Merge the records from the temporary table into the Inventory table.
MERGE mydataset.Inventory AS I
USING tmp AS T
ON I.product = T.product
WHEN NOT MATCHED THEN
  INSERT (product, quantity, supply_constrained)
  VALUES (product, quantity, false)
WHEN MATCHED THEN
  UPDATE SET quantity = I.quantity + T.quantity;

-- Drop the temporary table and commit the transaction.
DROP TABLE tmp;

COMMIT TRANSACTION;
```
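To check the outcome, you can query both tables after the transaction commits. The expected rows in the comments assume that the tables were freshly created by the setup statements above and that only this transaction has modified them since: the top load washer quantity grows from 10 to 110, the oven is inserted with supply_constrained set to false, and only the warehouse #2 row remains in NewArrivals.

```sql
-- Inventory after the commit, in alphabetical order by product.
SELECT product, quantity, supply_constrained
FROM mydataset.Inventory
ORDER BY product;
-- Expected rows:
--   dishwasher          30   NULL
--   dryer               30   NULL
--   front load washer   20   NULL
--   microwave           20   NULL
--   oven               300   false
--   refrigerator        10   NULL
--   top load washer    110   NULL

-- NewArrivals keeps only the row that did not come from 'warehouse #1'.
SELECT product, quantity, warehouse
FROM mydataset.NewArrivals;
-- Expected row:
--   dryer   200   warehouse #2
```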
Transaction concurrency
If a transaction mutates (updates or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are cancelled. Conflicting DML statements that run outside of a transaction are queued to run later, subject to queuing limits.
Operations that read or append new rows can run concurrently with the transaction. For example, any of the following operations can be performed concurrently on a table while a transaction mutates data in the same table:
- SELECT statements
- BigQuery Storage Read API read operations
- Queries from BigQuery BI Engine
- INSERT statements
- Load jobs that use WRITE_APPEND disposition to append rows
- Streaming writes
If a transaction only reads a table or appends new rows to it, any operation can be performed concurrently on that table.
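The following sketch shows three separate queries, each its own job, submitted at about the same time against the Inventory table from the example on this page; 'freezer' is a hypothetical product. The comments state how each query relates to the transaction in Query A under the rules above.

```sql
-- Query A: a transaction that updates existing rows in Inventory.
BEGIN TRANSACTION;
UPDATE mydataset.Inventory SET quantity = quantity - 1 WHERE product = 'dryer';
COMMIT TRANSACTION;

-- Query B (separate job, submitted while Query A is running):
-- it only appends rows, so it can run concurrently with Query A.
INSERT INTO mydataset.Inventory (product, quantity) VALUES ('freezer', 5);

-- Query C (separate job, submitted while Query A is running):
-- a DML statement outside a transaction that mutates existing rows,
-- so it is queued until Query A finishes (subject to queuing limits).
UPDATE mydataset.Inventory SET supply_constrained = TRUE WHERE quantity < 15;
```

If Query C were itself wrapped in BEGIN TRANSACTION and COMMIT TRANSACTION, it would be a conflicting transaction, which BigQuery cancels instead of queuing.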
Viewing transaction information
BigQuery assigns a transaction ID to each multi-statement transaction. The transaction ID is attached to each query that executes inside the transaction. To view the transaction IDs for your jobs, query the INFORMATION_SCHEMA.JOBS* views for the transaction_id column.

When a multi-statement transaction runs, BigQuery creates a child job for each statement in the transaction. For a given transaction, every child job that is associated with that transaction has the same transaction_id value.
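To see the child jobs that share a transaction ID, you can group the JOBS_BY_PROJECT rows by transaction_id. This sketch assumes your jobs run in the US multi-region (`region-us`) and limits the scan to the past day by filtering on the creation_time column; adjust both for your project.

```sql
SELECT
  transaction_id,
  COUNT(*) AS child_jobs,
  -- Statement types of the child jobs, in the order they started.
  ARRAY_AGG(statement_type IGNORE NULLS ORDER BY start_time) AS statements
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  transaction_id IS NOT NULL
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY transaction_id;
```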
The following examples show how to find information about your transactions.
Find all committed or rolled back transactions
The following query returns all transactions that were successfully committed.
```sql
SELECT
  transaction_id,
  parent_job_id,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE statement_type = "COMMIT_TRANSACTION" AND error_result IS NULL;
```
The following query returns all transactions that were successfully rolled back.
```sql
SELECT
  transaction_id,
  parent_job_id,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE statement_type = "ROLLBACK_TRANSACTION" AND error_result IS NULL;
```
Find the start and end time of a transaction
The following query returns the starting and ending times for a specified transaction ID.
```sql
SELECT
  transaction_id,
  start_time,
  end_time,
  statement_type
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
  transaction_id = "TRANSACTION_ID"
  AND statement_type IN ("BEGIN_TRANSACTION", "COMMIT_TRANSACTION", "ROLLBACK_TRANSACTION")
ORDER BY start_time;
```
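If you need the elapsed time as a single value, you can aggregate the same rows. This sketch measures from the start of the BEGIN TRANSACTION job to the end of the COMMIT TRANSACTION or ROLLBACK TRANSACTION job, with the same `region-us` and TRANSACTION_ID placeholders. If BigQuery rolled the transaction back automatically, there may be no closing statement job, in which case the result covers only the BEGIN TRANSACTION job.

```sql
SELECT
  transaction_id,
  MIN(start_time) AS transaction_start,
  MAX(end_time) AS transaction_end,
  -- Elapsed time from BEGIN TRANSACTION to COMMIT/ROLLBACK TRANSACTION.
  TIMESTAMP_DIFF(MAX(end_time), MIN(start_time), MILLISECOND) AS duration_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE
  transaction_id = "TRANSACTION_ID"
  AND statement_type IN ("BEGIN_TRANSACTION", "COMMIT_TRANSACTION", "ROLLBACK_TRANSACTION")
GROUP BY transaction_id;
```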
Find the transaction in which a job is running
The following query gets the transaction associated with a specified job ID. It returns NULL if the job is not running within a multi-statement transaction.
```sql
SELECT transaction_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_id = 'JOB_ID';
```
Find the current job running within a transaction
The following query returns information about the job that is currently running within a specified transaction, if any.
```sql
SELECT
  job_id,
  query,
  start_time,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE transaction_id = 'TRANSACTION_ID' AND state = 'RUNNING';
```
Find the active transactions that affect a table
The following query returns the active transactions that affect a specified table. For each active transaction, if the transaction is running as part of multi-statement queries such as within a stored procedure, then it also returns the parent job ID. If the transaction is running within a session, then it also returns the session info.
```sql
WITH running_transactions AS (
  SELECT DISTINCT transaction_id
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  EXCEPT DISTINCT
  SELECT transaction_id
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
  WHERE
    statement_type = 'COMMIT_TRANSACTION'
    OR statement_type = 'ROLLBACK_TRANSACTION'
)
SELECT
  jobs.transaction_id,
  parent_job_id,
  session_info,
  query
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS jobs,
  running_transactions
WHERE
  destination_table = ("PROJECT_NAME", "DATASET_NAME", "TABLE_NAME")
  AND jobs.transaction_id = running_transactions.transaction_id;
```
Find the active transactions running in a multi-statement transaction
The following query returns the active transactions for a particular job, specified by the ID of the job that is running the multi-statement transaction.
```sql
SELECT DISTINCT transaction_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE parent_job_id = "JOB_ID"
EXCEPT DISTINCT
SELECT transaction_id
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  parent_job_id = "JOB_ID"
  AND (statement_type = 'COMMIT_TRANSACTION' OR statement_type = 'ROLLBACK_TRANSACTION');
```
Limitations
- Transactions cannot use DDL statements that affect permanent entities.
- Within a transaction, materialized views are interpreted as logical views. You can still query a materialized view inside a transaction, but it doesn't result in any performance improvement or cost reduction compared with the equivalent logical view.
- A multi-statement transaction that fails triggers a rollback operation, undoing all pending changes and precluding retries.
- A transaction can mutate data in at most 100 tables and can perform at most 100,000 partition modifications.
- BI Engine does not accelerate queries inside a transaction.
- Metadata for external data sources can't be refreshed within a transaction using a system procedure.