Tuesday, January 08, 2019

Print.. the disruptor of batch deletes in SQL



Someone had an issue where a batched delete script was not deleting anything. I looked over some code in our repository and noticed two patterns the way queries are written to handle batch deletes

One is a while loop that runs while @@rowcount is greater than 0

WHILE @@rowcount > 0
 BEGIN
 DELETE TOP (5000)
 FROM SomeTable
 END


The other way is to run a while loop which is always true and then check if @@rowcount is 0, if it is 0 then break out of the loop

 WHILE 1 = 1 
 BEGIN 
 DELETE TOP(5000) 
 FROM SomeTable
 IF @@ROWCOUNT = 0 
 BREAK 
 END


I have always used WHILE @@rowcount > 0 but you have to be careful because @@rowcount could be 0 when your while loop starts

Let's take a look at an example. This is a simplified example without a where clause..but let's say you have to delete several million rows from a table with many more millions of rows and the table is replicated... in that case you want to batch the deletes so that your log file doesn't fill up, replication has a chance to catch up and in general the deletes should run faster

SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
 getutcdate() AS SomeDate, newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2
SELECT COUNT(*) from SomeTable 
SELECT * FROM SomeTable WHERE 1= 0
WHILE @@rowcount > 0
 BEGIN
 DELETE TOP (5000)
 FROM SomeTable
 END
SELECT COUNT(*) from SomeTable 

DROP TABLE SomeTable -- Added here as cleanup in case people run the example

This is of course a silly example because why would you do a count like that against a different table before a delete

But what if you had this instead, you put a nice print statement there so that from the output you see when it started and you would also see the rowcounts?

SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
 getutcdate() AS SomeDate, newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2
SELECT COUNT(*) from SomeTable 
PRINT' Starting my update now....'
WHILE @@rowcount > 0
 BEGIN
 DELETE TOP (5000)
 FROM SomeTable
 END
SELECT COUNT(*) from SomeTable 

DROP TABLE SomeTable -- Added here as cleanup in case people run the example


The count is 20000 before and after the loop, nothing got delete, this is because a print statement will reset @@rowcount to 0.

Take a look by running this simple set of queries

SELECT 1 UNION ALL SELECT 2
SELECT @@rowcount as 'Rowcount'
PRINT '1'
SELECT @@rowcount as 'RowcountAfterPrint'

Here is what the output looks like


After the PRINT line @@rowcount is reset back to 0

So if you want to use a while loop while checking @@rowcount, do this instead by running the delete first once outside the loop



SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
 getutcdate() AS SomeDate, newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2
SELECT COUNT(*) from SomeTable 
PRINT' Starting my update now....'
DELETE TOP (5000)
FROM SomeTable
WHILE @@rowcount > 0
 BEGIN
 DELETE TOP (5000)
 FROM SomeTable
 END
SELECT COUNT(*) from SomeTable 
DROP TABLE SomeTable -- Added here as cleanup in case people run the example
If you run the delete this way if there was something to delete, the while loop would be entered, if the table was empty then there would no need to enter the while loop

Also keep in mind that it is not just PRINT that will reset @@rowcount back to 0.

From Books On Line:

Statements such as USE, SET <option>, DEALLOCATE CURSOR, CLOSE CURSOR, BEGIN TRANSACTION, or COMMIT TRANSACTION reset the ROWCOUNT value to 0.

That's all... hopefully this helps someone out in the future if they notice nothing gets deleted

3 comments:

Anonymous said...

I am struggling to come up with a reason to delete in this manner. The examples you've given all delete every single row from the table, or at least want to. Why not use an unqualified DELETE FROM or TRUNCATE? Hell, sometimes you even drop the table afterwards. To me, that screams temporary table.

12:04 PM
Denis said...

This is a simplified example without a where clause..but let's say you have to delete several million rows from a table with many more millions of rows and the table is replicated... in that case you want to batch the deletes

12:27 PM
Kristen said...

Personally I would never do

WHILE @@rowcount > 0
BEGIN
DELETE TOP (5000)
FROM SomeTable
END

because of the risk that some new code was introduced after the DELETE statement which then interfered with the value of @@ROWCOUNT. What I know of as "defensive programming"

e.g. a PRINT statement, or a total rows deleted count, a conditional abort after X-Minutes have been spent, a WAITFOR to allow other processes to run ... or even some sort of DEBUG statement. IME those sorts of things are routinely added "later on" for one reason or another ... and worth programming to avoid someone (me even!) making a change without realising the side-effects.

Thus my preference would be:

SELECT @MyRowCount = 1 -- Force first iteration
WHILE @MyRowCount > 0
BEGIN
DELETE TOP (5000)
FROM SomeTable
SELECT @MyRowCount = @@ROWCOUNT
END

2:06 AM

Post a Comment

Subscribe to: Post Comments (Atom)

AltStyle によって変換されたページ (->オリジナル) /