
I have a delete query that takes around 6 minutes to complete, and I am looking for ways to optimise it and shorten its duration. Below is its estimated execution plan, anonymized due to security constraints.

Execution plan:

[execution plan screenshot]

Query:

DELETE TOP(10000) Object1
 FROM Object1
 LEFT JOIN Object2
 ON 
 Object1.TID = Object2.TID
 WHERE 
 Object2.TID IS NULL

Record count details:

  1. sp_spaceused Object1 -- 124,164,707 rows
  2. sp_spaceused Object2 -- 27,799,877 rows

Index details:

1) Object1

  • Clustered index on TID DESC, TIndex DESC
  • No other indexes

2) Object2

  • Clustered index on TID DESC
  • Nonclustered index on [NodeID] ASC
  • Nonclustered index on [DateTime] ASC
  • Nonclustered index on [TimeStamp] ASC
  • Nonclustered index on [TType] ASC

SQL Server Environment details:

  • SQL Server 2008 R2 SP1
  • Edition: Enterprise
  • Version: 10.50.2500.0

DDL:

--Object1
CREATE TABLE [dbo].[Object1](
 [TID] [bigint] NOT NULL,
 [TIndex] [tinyint] NOT NULL,
 [OID] [varchar](1000) NOT NULL,
 [OIDName] [varchar](100) NOT NULL,
 [OIDValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_OIDValue_ANSI] DEFAULT (''),
 [RawValue_ANSI] [varchar](1000) NOT NULL CONSTRAINT [DF_Object1_RawValue_ANSI] DEFAULT (''),
 [OIDValue_Unicode] [nvarchar](1000) NULL,
 [RawValue_Unicode] [nvarchar](1000) NULL,
 [OIDValue] AS (isnull([OIDValue_Unicode],[OIDValue_ANSI])),
 [RawValue] AS (isnull([RawValue_Unicode],[RawValue_ANSI])),
 CONSTRAINT [PK_Object1] PRIMARY KEY CLUSTERED 
(
 [TID] DESC,
 [TIndex] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
--Object2
 CREATE TABLE [dbo].[Object2](
 [TID] [bigint] IDENTITY(97196339,1) NOT NULL,
 [EID] [int] NOT NULL,
 [DateTime] [datetime] NOT NULL CONSTRAINT [DF_Object2_RecivedDateTime] DEFAULT (getdate()),
 [IP] [varchar](50) NOT NULL CONSTRAINT [DF_Object2_IP] DEFAULT ('0.0.0.0'),
 [C_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_C_ANSI] DEFAULT (''),
 [C_Unicode] [nvarchar](255) NULL,
 [Community] AS (isnull([C_Unicode],[C_ANSI])),
 [Tag_ANSI] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_Tag_ANSI] DEFAULT (''),
 [Tag_Unicode] [nvarchar](100) NULL,
 [Tag] AS (isnull([Tag_Unicode],[Tag_ANSI])),
 [Acknowledged] [tinyint] NOT NULL CONSTRAINT [DF_Object2_Acknowledged] DEFAULT ((0)),
 [Hname_ANSI] [varchar](255) NOT NULL CONSTRAINT [DF_Object2_Hname_ANSI] DEFAULT (''),
 [Hname_Unicode] [nvarchar](255) NULL,
 [Hostname] AS (isnull([Hname_Unicode],[Hname_ANSI])),
 [NodeID] [bigint] NOT NULL CONSTRAINT [DF_Object2_NodeID] DEFAULT ((0)),
 [TType] [varchar](100) NOT NULL CONSTRAINT [DF_Object2_TType] DEFAULT (''),
 [ColorCode] [int] NULL,
 [TimeStamp] [timestamp] NOT NULL,
 CONSTRAINT [PK_Object2] PRIMARY KEY CLUSTERED 
 (
 [TID] DESC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 ) ON [PRIMARY]

I tried replacing the LEFT JOIN with NOT IN and NOT EXISTS, but it didn't help much. NOT IN actually performed worst, and NOT EXISTS was slightly better (5.8 minutes) but still not by much. For all of these tests I used the SELECT query below, since I can't run the actual delete:

SELECT Object1.TID
 FROM Object1
 LEFT JOIN Object2
 ON 
 Object1.TID = Object2.TID
 WHERE 
 Object2.TID IS NULL
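For reference, the NOT EXISTS form of the same anti semi join is sketched below (same logic, different syntax; the optimizer often produces an identical plan for both):

```sql
-- NOT EXISTS variant of the anti semi join (sketch; table and
-- column names follow the anonymized schema above).
SELECT o1.TID
FROM Object1 AS o1
WHERE NOT EXISTS (
    SELECT 1
    FROM Object2 AS o2
    WHERE o2.TID = o1.TID
);
```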

I would appreciate it if someone could share thoughts or suggestions. If you need more details, I'll provide as much as I can within the security limitations I have.

Additional info:

The delete runs daily as part of a SQL Agent job. I am doing naive batching as described in Take Care When Scripting Batches (by Michael J. Swart), deleting rows in batches until all of the qualifying rows are gone.

The number of rows to delete depends on the data at the time the job runs. Sorry, but I don't have the exact number, as it varies daily and I never recorded it.
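The naive batching loop I run is essentially the following (a sketch of the pattern from the linked article applied to this query):

```sql
-- Delete in batches of 10,000 until no qualifying rows remain.
-- Each iteration re-scans for rows in Object1 with no match in Object2.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) Object1
    FROM Object1
    LEFT JOIN Object2
        ON Object1.TID = Object2.TID
    WHERE Object2.TID IS NULL;

    IF @@ROWCOUNT = 0
        BREAK;
END;
```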

The following query takes 6 min 30s:

SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
 ON Object2.TID = Object1.TID
WHERE
 Object2.TID IS NULL;

The result is 0 rows (since we run the delete daily, there usually aren't many qualifying records).

The following query returns 123,529,024 records:

SELECT COUNT(*)
FROM Object1
LEFT JOIN Object2
 ON Object1.TID = Object2.TID
asked Mar 27, 2017 at 14:26

3 Answers


Using Michael J. Swart's Take Care When Scripting Batches code as a base, I'd like to offer another possible solution that uses a temp table to track the next set of rows to be deleted. The code keeps track of the max TID value from the previous delete and uses that information to avoid re-scanning the entire table over and over. I'd be curious to know whether it works for your situation.

--------------------------
--initial demo data set up
--------------------------
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#Object1') IS NOT NULL
 DROP TABLE #Object1
IF OBJECT_ID('tempdb..#Object2') IS NOT NULL
 DROP TABLE #Object2
CREATE TABLE #Object1 (TID BIGINT)
CREATE TABLE #Object2 (TID BIGINT)
DECLARE @UpperLimit INT;
SET @UpperLimit = 1000;
WITH n
AS (
 SELECT x = ROW_NUMBER() OVER (
 ORDER BY s1.[object_id]
 )
 FROM sys.all_objects AS s1
 CROSS JOIN sys.all_objects AS s2
 CROSS JOIN sys.all_objects AS s3
 )
--Populate Object1 with some test data
INSERT INTO #Object1 (tid)
SELECT x
FROM n
WHERE x BETWEEN 1
 AND @UpperLimit;
SELECT *
FROM #Object1
--Populate Object2 with some test data
insert into #Object2(TID) values(10),(20),(30)
------------------------------
--the real delete code
------------------------------
DECLARE @LargestKeyProcessed BIGINT = - 1
 ,@NextBatchMax INT
 ,@RC INT = 1;
IF OBJECT_ID('tempdb..#Object1RowsToDelete') IS NOT NULL
 DROP TABLE #Object1RowsToDelete
--Create a temp table to hold the TID values for
--Object1 rows where there does not exist a corresponding row
--on Object2 for TID
CREATE TABLE #Object1RowsToDelete (TID BIGINT NOT NULL)
--Careful batch loop until all intended rows are deleted
WHILE (@RC > 0)
BEGIN
 --Truncate the temp table
 TRUNCATE TABLE #Object1RowsToDelete
 --Populate the temp table with the next set of TID's 
 --that need to be deleted
 INSERT INTO #Object1RowsToDelete
 SELECT TOP (10) o1.TID --Alter TOP as needed
 FROM #Object1 o1
 LEFT JOIN #Object2 o2 ON O1.TID = O2.TID
 WHERE o2.TID IS NULL
 AND O1.TID > @LargestKeyProcessed
 ORDER BY O1.TID ASC;
 --Delete from Object1 by joining against the temp table
 DELETE O1
 FROM #Object1 o1
 JOIN #Object1RowsToDelete d ON d.TID = o1.TID
 SET @RC = @@ROWCOUNT;
 --The max TID in the temp table is where we need to start on 
 --the next interation of the while loop
 SET @LargestKeyProcessed = (
 SELECT max(TID)
 FROM #Object1RowsToDelete
 )
print @LargestKeyProcessed
END
--Select the remaining rows from Object1
--In this example, only 10, 20 and 30 remain because
--those TID's were on Object2
SELECT *
FROM #Object1
answered Mar 30, 2017 at 13:56
  • Thanks Scott and everyone for your assistance! I tested this approach as well, but unfortunately the duration of the query remained the same. However, I'd like to highlight that this new query converted the clustered index scans on both tables into clustered index seeks; apart from that I didn't see any other change. Looks like I'll have to live with this duration, since there is a large amount of data in both tables. Commented Mar 30, 2017 at 19:42
  • As I mentioned in an earlier comment (and I wasn't sure if you had already tried this): if the execution time of the delete is extremely important, you might benefit from creating nonclustered indexes on TID on both tables. The optimizer might decide to scan the skinnier indexes to find the rows to delete, saving execution time. You'd have to weigh any delete benefit against the overhead of maintaining the additional indexes. Commented Mar 30, 2017 at 20:07
  • @ScottHodgin That is a creative proposal and seems quite sound in theory. Just the old, tricky, balance between benefiting the performance of this particular task and all the other activities, of course. Commented Mar 31, 2017 at 3:39
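The index suggestion from the comments above would look something like the sketch below (the index names are illustrative, not from the original post):

```sql
-- Narrow single-column indexes so the anti join reads far fewer pages
-- than scanning the wide clustered indexes. Index names are hypothetical.
CREATE NONCLUSTERED INDEX IX_Object1_TID ON dbo.Object1 (TID);
CREATE NONCLUSTERED INDEX IX_Object2_TID ON dbo.Object2 (TID);
```

Any delete speed-up would have to be weighed against the extra write and storage overhead these indexes add to both tables.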

I feel that the fact you're searching for rows to delete using the first column of the clustered index is an important distinction from the scenario in this article, and it might merit a slightly different approach: http://michaeljswart.com/2014/09/take-care-when-scripting-batches/

I propose first discovering all the TID values that need to be deleted from Object1 and putting them in a temporary table. Then you never need to read Object2 again, and the delete from Object1 won't be held up by any other operation.

-- First, let's prepare for the delete by identifying which keys will be deleted.
CREATE TABLE #Object1_TID_To_Delete (TID bigint NOT NULL PRIMARY KEY CLUSTERED);
-- By storing the keys to delete in a temporary table, we're reading from Object2 only a single time.
-- This should help reduce locking.
INSERT INTO #Object1_TID_To_Delete
SELECT TID 
 FROM Object1 AS o1
 WHERE NOT EXISTS (SELECT 1 FROM Object2 AS o2 WHERE o1.TID = o2.TID)
 GROUP BY TID;
WHILE EXISTS (SELECT 1 FROM #Object1_TID_To_Delete)
BEGIN
 -- Limit the delete to the first 1000 rows so that we're not locking Object1 for too long.
 -- The ORDER BY is so we can be absolutely sure which 1000 rows were picked.
 -- Matching the clustered index of the temp table with Object1 means we should obtain 
 -- a merge join in our statement plan.
 WITH del AS (
 SELECT TOP 1000 TID FROM #Object1_TID_To_Delete ORDER BY TID)
 DELETE 
 FROM o1
 FROM Object1 AS o1
 JOIN del ON o1.TID = del.TID;
 WITH del AS (
 SELECT TOP 1000 TID FROM #Object1_TID_To_Delete ORDER BY TID)
 DELETE 
 FROM del;
END
DROP TABLE #Object1_TID_To_Delete;
Joe Obbish
answered Mar 28, 2017 at 1:35
  • Thank you @mendosi! I did try your approach, but unfortunately it didn't make much difference in duration, as most of the time is spent on the initial select that inserts data into the temp table. Commented Mar 28, 2017 at 12:56
  • @SQLPRODDBA Judging by the number of rows in your (estimated) plan and the number and size of the columns in each row, I'm not surprised. Those tables must be huge, so scanning them takes a long time. Commented Mar 28, 2017 at 13:12

With the approach below, the elapsed time should drop drastically. The inner subquery drives the whole operation: thanks to the clustered index on the primary key, it is very fast to find the first 10,000 candidate rows for deletion, and as the row count decreases after each delete, the inner subquery gets faster. You may have to experiment to find a happy medium for SET ROWCOUNT; make sure the same value is reflected in the SELECT TOP subquery.

SET ROWCOUNT 10000;
WHILE (1 = 1)
BEGIN
    DELETE x
    FROM Object1 AS x
    JOIN
    (
        -- TOP 10000 is perhaps redundant given the ROWCOUNT limit
        SELECT TOP 10000 Object1.TID
        FROM Object1
        LEFT JOIN Object2
            ON Object1.TID = Object2.TID
        WHERE Object2.TID IS NULL
    ) AS y
        ON x.TID = y.TID;

    IF @@ROWCOUNT = 0
        BREAK;
END;
SET ROWCOUNT 0; -- reset so later statements in the session are unaffected
answered Feb 17, 2018 at 0:10
1
  • It's good form to include some explanation in the answer: explain the parts of the given solution, why you chose to do it this way, and how it improves on the other options. Commented Feb 17, 2018 at 3:25
