Hello SQL Server Experts,
I'm currently experimenting with SQL Server deadlocks and have intentionally created one to better understand locking behavior. After setting up a test table, I ran two separate sessions with the following indefinite while loops:
Test table:
DROP TABLE IF EXISTS dbo.table1;
CREATE TABLE dbo.table1 (
id INT IDENTITY,
c1 CHAR(4000) DEFAULT REPLICATE('a', 4000),
c2 NVARCHAR(MAX) DEFAULT REPLICATE('x', 9000),
c3 DATETIME DEFAULT GETDATE()
);
WITH a AS (SELECT 1 AS i UNION ALL SELECT 1),
b AS (SELECT 1 AS i FROM a x, a y),
c AS (SELECT 1 AS i FROM b x, b y),
d AS (SELECT 1 AS i FROM c x, c y),
e AS (SELECT 1 AS i FROM d x, d y),
f AS (SELECT 1 AS i FROM e x, e y),
cte AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) AS n FROM f)
INSERT INTO dbo.table1
SELECT TOP (500000)
REPLICATE('a', 4000),
REPLICATE('x', 9000),
GETDATE()
FROM cte;
Session 1:
WHILE 1 = 1
BEGIN
BEGIN TRAN;
DELETE FROM table1
WHERE id % 5000 = 1
AND id >= 240000;
INSERT INTO table1 (id, c1, c2, c3)
SELECT 240000, REPLICATE('a', 4000), REPLICATE('x', 9000), GETDATE();
WAITFOR DELAY '00:00:31';
ROLLBACK;
END;
Session 2:
WHILE 1 = 1
BEGIN
BEGIN TRAN;
UPDATE table1
SET c1 = 'updated'
WHERE id % 5000 = 1;
ROLLBACK;
END;
Occasionally, this setup leads to a deadlock, and the deadlock event XML suggests that the INSERT
statement is attempting to acquire an update (U
) lock on a page. My understanding is that an INSERT
should always request an exclusive (X
) lock. Why is it trying to obtain an update lock in this scenario?
Here is the deadlock event XML for reference:
<deadlock>
<victim-list>
<victimProcess id="process236f8773848" />
</victim-list>
<process-list>
<process id="process236f8773848" taskpriority="0" logused="255572" waitresource="PAGE: 34:1:2011908 " waittime="8975" ownerId="100233793" transactionname="user_transaction" lasttranstarted="2024年02月09日T12:23:01.017" XDES="0x2374648c470" lockMode="U" schedulerid="33" kpid="16868" status="suspended" spid="60" sbid="0" ecid="0" priority="0" trancount="3" lastbatchstarted="2024年02月09日T12:27:05.243" lastbatchcompleted="2024年02月09日T12:23:50.570" lastattention="2024年02月09日T12:23:50.570" clientapp="Microsoft SQL Server Management Studio - Query" hostname="myhost" hostpid="14044" loginname="dev\mydb" isolationlevel="read committed (2)" xactid="100233793" currentdb="34" currentdbname="mydb" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
<executionStack>
<frame procname="adhoc" line="5" stmtstart="52" stmtend="142" sqlhandle="0x02000000c996cc147055426e69966dd635d985db5bc3879a0000000000000000000000000000000000000000">
unknown </frame>
<frame procname="adhoc" line="5" stmtstart="86" stmtend="218" sqlhandle="0x0200000087908a07b9eddfc8d6177991560c8f41cbc5dfad0000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
WHILE 1 = 1
BEGIN
BEGIN TRAN;
DELETE table1
WHERE id % 5000 = 1
AND id >= 240000;
INSERT INTO dbo.table1
(
id,
c1,
c2,
c3
)
SELECT 240000,
REPLICATE('a', 4000),
REPLICATE('x', 9000),
GETDATE();
WAITFOR DELAY '00:00:31'
ROLLBACK;
END; </inputbuf>
</process>
<process id="process236f8705c28" taskpriority="0" logused="972624" waitresource="PAGE: 34:1:695567 " waittime="2630" ownerId="100234046" transactionname="user_transaction" lasttranstarted="2024年02月09日T12:23:05.680" XDES="0x237cbf8c470" lockMode="U" schedulerid="11" kpid="21068" status="suspended" spid="72" sbid="0" ecid="0" priority="0" trancount="3" lastbatchstarted="2024年02月09日T12:27:07.600" lastbatchcompleted="2024年02月09日T12:23:55.770" lastattention="2024年02月09日T12:23:55.770" clientapp="Microsoft SQL Server Management Studio - Query" hostname="myhost" hostpid="14044" loginname="dev\mydb" isolationlevel="read committed (2)" xactid="100234046" currentdb="34" currentdbname="mydb" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
<executionStack>
<frame procname="adhoc" line="4" stmtstart="72" stmtend="164" sqlhandle="0x02000000b2a857012fa9eca41c1d4ad00b393826d7588f120000000000000000000000000000000000000000">
unknown </frame>
<frame procname="adhoc" line="4" stmtstart="82" stmtend="204" sqlhandle="0x02000000b67c153ad5ca4c0ec4c49b7f6d4158e6b17ea53f0000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
WHILE 1 = 1
BEGIN
BEGIN TRAN;
UPDATE table1
SET c1 = 'updated'
WHERE id % 5000 = 1
ROLLBACK;
END; </inputbuf>
</process>
</process-list>
<resource-list>
<pagelock fileid="1" pageid="2011908" dbid="34" subresource="FULL" objectname="mydb.dbo.table1" id="lock23c698f3e00" mode="X" associatedObjectId="72057594046251008">
<owner-list>
<owner id="process236f8705c28" mode="X" />
</owner-list>
<waiter-list>
<waiter id="process236f8773848" mode="U" requestType="wait" />
</waiter-list>
</pagelock>
<pagelock fileid="1" pageid="695567" dbid="34" subresource="FULL" objectname="mydb.dbo.table1" id="lock23856f2c200" mode="X" associatedObjectId="72057594046251008">
<owner-list>
<owner id="process236f8773848" mode="X" />
</owner-list>
<waiter-list>
<waiter id="process236f8705c28" mode="U" requestType="wait" />
</waiter-list>
</pagelock>
</resource-list>
</deadlock>
Additionally, since the INSERT
statement adds a row that would match the DELETE
condition (note the id equals to 240000), I would assume that the DELETE
has already placed exclusive locks on the rows or pages, and thus, they should be held until the end of the transaction. Why then is the INSERT
operation trying to get an update lock for the row/page where the record 240000
will be located?
Moreover, I'm seeking advice on how to resolve and prevent such a deadlock. What strategies or best practices can I employ to avoid this situation?
I've searched for answers online without success. Could someone please help clarify these locking behaviors for me?
Thank you in advance for your insights!
-
As it is a heap there is no specific "row/page where the record 240000 will be located"Martin Smith– Martin Smith2024年02月10日 00:14:28 +00:00Commented Feb 10, 2024 at 0:14
-
@MartinSmith I know a heap is basically a bunch of records randomly placed in some pages. But even in this case, a row will still be place some where, right?Just a learner– Just a learner2024年02月14日 21:30:52 +00:00Commented Feb 14, 2024 at 21:30
1 Answer 1
It seems to be an extremely complicated setup for something that is only designed to test a principle.
If I understand the code correctly, you begin with a test table with 500,000 sequentially and uniquely numbered rows. Only the id
column really matters in analysing the logic - the rest of the columns are just padding to make each row consume more space (intentional or not?).
The id
column is not in fact an identity for each row, because there is no primary key (or other constraint) defined.
Hence, the INSERT
can proceed to add a row with id=240000 twice. This falsifies one of your claims:
Additionally, since the INSERT statement adds a row that would match the DELETE condition (note the id equals to 240000)
If there was a primary key, the insert would always fail, because the preceding DELETE
does not remove rows with id=240000 (it removes only those that satisfy the filter id % 5000 = 1
, which in plain English means it removes every 5000th row which begins with a 1; so 1, 5001, 10001, ... 240001, etc.)
My understanding is that an INSERT should always request an exclusive (X) lock. Why is it trying to obtain an update lock in this scenario?
What the deadlock XML actually describes is that each process holds an exclusive lock on a page for itself, whilst each process is also seeking an update lock on a page that is already locked by the other.
The main conflict in principle appears to be between the DELETE
and UPDATE
statements, since these are both seeking to operate upon the same data.
My guess is that the algorithm for the DELETE
blocks the algorithm for the UPDATE
after the UPDATE
has already taken some locks, then finally the INSERT
cannot proceed because of locks already taken by the UPDATE
, and there is occasionally a coincidence where the INSERT
happens to want to affect the same page as is already locked by the UPDATE
, even though at a row level there cannot be any conflict.
What strategies or best practices can I employ to avoid this situation?
Avoid nonsensical test scenarios, I guess. Unindexed heap tables are an extremely uncommon animal, and for there to be intensive concurrent activity on such a table that would almost certainly benefit from an index, is probably even less common again.
Without understanding the purpose for this configuration - I assume it actually had no purpose, except to cause a deadlock scenario - it's impossible to advise meaningfully on what alterations to design would avoid the situation whilst still fulfilling the underlying purpose.
Potential solutions for real-world scenarios would include indexing the data differently, operating on less data at once in a single transaction, taking larger-grained locks at the outset of the transaction (locking the entire table), avoiding a 31 second wait in the middle of a transaction, and automatically retrying in the event of a deadlock.
In this particular case, it's also possible that hinting for smaller-granularity (row-level instead of page level) locks would solve the deadlock, since that would avoid the case where otherwise separate data happens to fall on the same page, and two processes therefore end up competing for locks on the same page even though their activity concerns different rows on that page.
Explore related questions
See similar questions with these tags.