I have a nightly process which issues ALTER DATABASE commands to change query store configuration:
ALTER DATABASE {{DB_NAME}} SET QUERY_STORE (MAX_STORAGE_SIZE_MB = X);
This is done as a workaround to prevent async Availability Group latency caused by the query store time-based cleanup policy running during primary business hours. This nightly process was working fine, but unfortunately, I sometimes see 1222 errors on unrelated code:
Lock request time out period exceeded.
For example, at 5/8/2025 11:21:49 PM, the error log reported the following:
Setting database option query_store max_storage_size_mb to 2000 for database {{DB_NAME}}.
At 2025-05-08 23:21:50.780, error 1222 was logged by an extended events session for a DML trigger running as part of a T-SQL agent job.
There are other examples of error 1222 occurring within a minute of the ALTER DATABASE commands. Sometimes the ALTER DATABASE command itself is killed by the error; other times it is an unrelated query. Sometimes the unrelated query is within a T-SQL agent job and sometimes it is not.
The problem only started occurring after I enabled the T-SQL agent job to change the query store settings. The error state for the 1222 errors is always 111.
The blocked process report was able to catch a few instances of the problem. The first example is when an ALTER DATABASE statement failed. It appears to have failed after 60 seconds based on what was logged to the blocked process report. Of note, the wait resource is "DATABASE: 79:0 [QDS]" and the transaction name is "CStmtAlterDB::ChangeStateOption".
<blocked-process>
<process id="process17806091468" taskpriority="-10" logused="0" waitresource="DATABASE: 79:0 [QDS]" waittime="59342" ownerId="179645932053" transactionname="CStmtAlterDB::ChangeStateOption" lasttranstarted="2025-05-06T20:51:03.437" XDES="0x1d3151e0040" lockMode="X" schedulerid="7" kpid="28692" status="suspended" spid="759" sbid="0" ecid="0" priority="10" trancount="2" lastbatchstarted="2025-05-06T19:30:01.157" lastbatchcompleted="2025-05-06T19:30:01.157" lastattention="1900-01-01T00:00:00.157" clientapp="SQLAgent - TSQL JobStep (Job 0x48396C5ECF45A5488816DFDAA39D12FD : Step 1)" hostname="{{REMOVED}}" hostpid="6236" loginname="{{REMOVED}}" isolationlevel="read committed (2)" xactid="179645932053" currentdb="33" currentdbname="{{REMOVED}}" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<inputbuf>exec Monitoring.ForceQueryStoreCleanup</inputbuf>
</process>
</blocked-process>
<blocking-process>
<process status="background" waittime="1" spid="69" sbid="0" ecid="0" priority="0" trancount="0">
<inputbuf></inputbuf>
</process>
</blocking-process>
The second example is when the ALTER DATABASE statement was delayed by 30 to 40 seconds but eventually succeeded:
<blocked-process-report>
<blocked-process>
<process id="process214f26348c8" taskpriority="-10" logused="0" waitresource="DATABASE: 38:0 [QDS]" waittime="29849" ownerId="184293576858" transactionname="CStmtAlterDB::ChangeStateOption" lasttranstarted="2025-05-08T20:33:43.107" XDES="0x1eaa77e8040" lockMode="X" schedulerid="4" kpid="12332" status="suspended" spid="892" sbid="0" ecid="0" priority="10" trancount="2" lastbatchstarted="2025-05-08T19:30:00.493" lastbatchcompleted="2025-05-08T19:30:00.490" lastattention="1900-01-01T00:00:00.490" clientapp="SQLAgent - TSQL JobStep (Job 0x48396C5ECF45A5488816DFDAA39D12FD : Step 1)" hostname="{{REMOVED}}" hostpid="6236" loginname="{{REMOVED}}" isolationlevel="read committed (2)" xactid="184293576858" currentdb="33" currentdbname="{{REMOVED}}" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<inputbuf>exec Monitoring.ForceQueryStoreCleanup</inputbuf>
</process>
</blocked-process>
<blocking-process>
<process status="background" waittime="2" spid="314" sbid="0" ecid="0" priority="0" trancount="0">
<inputbuf></inputbuf>
</process>
</blocking-process>
</blocked-process-report>
I would normally resolve this problem by changing the ALTER DATABASE command to run at low priority or by changing the LOCK_TIMEOUT setting to a low value and retrying the command if it fails. According to the documentation, ALTER DATABASE does not respect LOCK_TIMEOUT:
CREATE DATABASE, ALTER DATABASE, and DROP DATABASE statements do not honor the SET LOCK_TIMEOUT setting.
Further, I do not see any options to run ALTER DATABASE at low priority.
How can I prevent ALTER DATABASE commands from timing out unrelated queries?
- Do you have the blocked process report on? See edit on the answer for a possible workaround. – David Browne - Microsoft, May 10 at 17:11
- There are other workarounds for painful Query Store cleanup. – J. Mini, May 10 at 18:55
2 Answers
Error 1222 only happens when the session has explicitly configured [LOCK_TIMEOUT][1], which SSMS does for the object navigation pane, but not for queries in the query window. Applications normally do not set it.
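As a quick way to see whether a given session has a lock timeout configured, something like the following could be run in that session. This is only a minimal sketch; the 5000 ms value is an arbitrary illustration and not something taken from the question.

```sql
-- Returns -1 at the start of a connection, meaning lock requests wait indefinitely.
SELECT @@LOCK_TIMEOUT AS lock_timeout_ms;

-- Configure a 5-second lock timeout for this session only (illustrative value).
SET LOCK_TIMEOUT 5000;

-- Now returns 5000; lock waits longer than this raise error 1222 in this session.
SELECT @@LOCK_TIMEOUT AS lock_timeout_ms;
```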
Also, in testing I can run the following while concurrent queries are running on the database:
ALTER DATABASE current SET QUERY_STORE (MAX_STORAGE_SIZE_MB = 240);
So the errors may not even be related.
So I'm not sure there's a real problem here.
Perhaps the failures are related to Query Store operations that can't run when reconfiguring the Query Store.
To emulate a low priority wait you could run the batch from PowerShell with a SqlCommand with a short CommandTimeout in a loop, or Invoke-Sqlcmd with the -QueryTimeout option.

[1]: https://learn.microsoft.com/en-us/sql/t-sql/statements/set-lock-timeout-transact-sql?view=sql-server-ver16
As a workaround, I'm using SQL Server agent jobs to emulate running the ALTER DATABASE command with low priority. Basic summary:
- Create a temporary agent job using sp_add_job with @delete_level = 3
- Add the ALTER DATABASE command as a job step using sp_add_jobstep
- Start the job using sp_start_job
- Execute sp_stop_job if the job is still running after X seconds
- Check if the database was successfully altered. If not, then retry the process
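The steps above might look roughly like the following for a single attempt. This is a simplified sketch rather than the full code: the job name, [YourDatabase], the 2000 MB target, and the 10-second wait are all placeholder assumptions, and the sp_add_jobserver call is an addition not listed in the summary (a job needs a target server before it can be started). The retry loop is left out.

```sql
-- Hypothetical names and values throughout; adjust for your environment.
DECLARE @job_name sysname = N'TempAlterQueryStore_' + CONVERT(nvarchar(36), NEWID());
DECLARE @command nvarchar(max) =
    N'ALTER DATABASE [YourDatabase] SET QUERY_STORE (MAX_STORAGE_SIZE_MB = 2000);';

-- Temporary job; @delete_level = 3 asks Agent to delete it whenever it completes.
EXEC msdb.dbo.sp_add_job
    @job_name = @job_name,
    @delete_level = 3;

-- Single T-SQL step that runs the ALTER DATABASE command.
EXEC msdb.dbo.sp_add_jobstep
    @job_name = @job_name,
    @step_name = N'Alter Query Store settings',
    @subsystem = N'TSQL',
    @command = @command,
    @database_name = N'master';

-- Target the local server (assumed step, not part of the summary above).
EXEC msdb.dbo.sp_add_jobserver @job_name = @job_name;

EXEC msdb.dbo.sp_start_job @job_name = @job_name;

-- The "X seconds" from the summary; 10 seconds is an arbitrary choice.
WAITFOR DELAY '00:00:10';

-- Stop the job only if it still exists and has started but not finished.
IF EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobactivity AS ja
        ON ja.job_id = j.job_id
    WHERE j.name = @job_name
      AND ja.start_execution_date IS NOT NULL
      AND ja.stop_execution_date IS NULL
)
BEGIN
    EXEC msdb.dbo.sp_stop_job @job_name = @job_name;
END;

-- Check whether the setting took effect; if not, the caller would retry.
SELECT max_storage_size_mb
FROM [YourDatabase].sys.database_query_store_options;
```

Comparing max_storage_size_mb against the intended value is one way to decide whether another attempt is needed.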
I need to collect more data, but so far, this method has not resulted in any 1222 errors.
The full code is available under the MIT license on GitHub.