I have the below stored procedure that is taking close to an hour to complete. The result set is about 200K machines.
All I am doing is extracting a set of ids from a linked server, deleting those ids on the local server, and then extracting all details for those ids from the remote server and inserting them. The INSERT takes 90%, the DELETE 9%, and the SELECT INTO 1%.
id is the Primary Key/Clustered Index.
DECLARE @ProcessDate DATETIME
SET @ProcessDate = CONVERT(VARCHAR(10), DATEADD(DAY, -3,
GETDATE()), 111)
SELECT DISTINCT id
INTO #temp_machines
FROM remote_server.db.dbo.table1
WHERE dt_modify >= @ProcessDate
DELETE FROM local_server.db.dbo.table2
WHERE id IN ( SELECT id FROM #temp_machines )
INSERT INTO local_server.db.dbo.table2
SELECT *
FROM remote_server.db.dbo.table1
WHERE id IN ( SELECT id FROM #temp_machines )
Any suggestions on improving performance?
3 Answers
First step: Make sure there's something to tune and that your query isn't just being blocked. You can do this with free tools like sp_WhoIsActive or sp_BlitzFirst (disclaimer, I'm a contributor to the First Responder Kit).
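Second step: Don't use local variables. The optimizer can't see their values when it compiles the plan, so it falls back on a fixed guess for the date filter instead of using the actual value.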
Third step: Maybe indexing the temp table will help.
Fourth step: Check your wait stats using sp_BlitzFirst (same disclaimer about me contributing to it, blah blah blah). It could be that the query as written is fine, but you're running into some other issue, like tempdb contention.
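A minimal sketch of the second and third steps, using the table and column names from the question. The index name cx_temp_machines is made up for illustration, and whether either change helps depends on the actual plan, so compare timings before and after.

```sql
DECLARE @ProcessDate DATETIME;
SET @ProcessDate = CONVERT(VARCHAR(10), DATEADD(DAY, -3, GETDATE()), 111);

SELECT DISTINCT id
INTO #temp_machines
FROM remote_server.db.dbo.table1
WHERE dt_modify >= @ProcessDate
OPTION (RECOMPILE);  -- compile with the variable's runtime value instead of a guess

-- DISTINCT above guarantees unique ids, so a unique clustered index is safe here.
-- cx_temp_machines is a hypothetical name.
CREATE UNIQUE CLUSTERED INDEX cx_temp_machines ON #temp_machines (id);
```

OPTION (RECOMPILE) keeps the local variable but lets the optimizer use its current value; the index gives the later DELETE and INSERT an ordered, unique set of ids to join against.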
Assuming that by "remote server" you mean a linked server (or something similar, at least), then one potential issue is that you're making two queries for what amounts to the same data. Also, by using IN (SELECT ...), you may be forcing the server to re-run that SELECT statement once per row.
I'd try something like the following:
DECLARE @ProcessDate DATETIME
SET @ProcessDate = CONVERT(VARCHAR(10), DATEADD(DAY, -3,
GETDATE()), 111)
SELECT *
INTO #temp_machines
FROM table1
WHERE dt_modify >= @ProcessDate
DELETE t2
FROM table2 t2
INNER JOIN (SELECT DISTINCT id FROM #temp_machines) tm ON (t2.id = tm.id)
INSERT INTO table2
SELECT *
FROM #temp_machines
After all, you're going to pull all the matching data from table1 into your local DB at some point; get it up front, and you can ignore the remote server for the rest of the query.
Even better, if table1.id is unique, or a primary key, your DELETE can be:
DELETE t2
FROM table2 t2
INNER JOIN #temp_machines tm ON (t2.id = tm.id)
It's possible that this would work even if there are multiple instances of certain id values in table1.
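If duplicate ids are a concern, one way to sidestep the question is to write the DELETE as a semi-join with EXISTS, which removes each matching row in table2 once no matter how many times its id appears in the temp table. A sketch using the same table names as the query above:

```sql
DELETE t2
FROM table2 AS t2
WHERE EXISTS (SELECT 1
              FROM #temp_machines AS tm
              WHERE tm.id = t2.id);  -- only tests for a match, never multiplies rows
```

This only addresses the DELETE. If table2.id is a primary key and the temp table does hold duplicate ids, the INSERT that follows would still fail with a key violation.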
While it doesn't show in your query syntax (anonymized, natch), your question implies that table1 is on a remote server. You're probably addressing it in real life with four-part syntax, right?
[RemoteServerAlias].[DatabaseName].[SchemaName].[TableName]
When you query that remote table, SQL Server has an interesting time trying to optimize the problem. Because it doesn't have all the information it needs, it sometimes makes very questionable decisions. The last query there (the INSERT) is almost certainly bringing the entire contents of the remote table over from that server before trying to filter it.
If your initial query is fast enough -- that is, if SQL Server is applying that date criterion on the remote side -- I'd suggest a workaround. Add that criterion to the end of the INSERT's WHERE clause:
INSERT INTO table2
SELECT *
FROM table1
WHERE id IN ( SELECT id FROM #temp_machines )
AND dt_modify >= @ProcessDate
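If the filter still isn't applied remotely, another option is to send the statement to the linked server as a pass-through query so the filtering can only happen there. The sketch below assumes the linked server is named remote_server as in the question, that it has the RPC Out option enabled, and that a staging temp table is acceptable (#staging is a made-up name). INSERT ... EXEC against a linked server can also be promoted to a distributed transaction, so treat this as something to test rather than a drop-in fix.

```sql
DECLARE @ProcessDate DATETIME;
SET @ProcessDate = CONVERT(VARCHAR(10), DATEADD(DAY, -3, GETDATE()), 111);

-- Create an empty local table with the remote table's column layout.
SELECT TOP (0) *
INTO #staging
FROM remote_server.db.dbo.table1;

-- The statement text runs on the remote server; the ? placeholder is bound to @ProcessDate.
INSERT INTO #staging
EXEC ('SELECT * FROM db.dbo.table1 WHERE dt_modify >= ?', @ProcessDate) AT remote_server;
```

From there the DELETE and INSERT against table2 can work entirely from #staging, without touching the remote server again.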
UPDATE t2
SET t2.A = t1.A, t2.B = t1.B, etc.
FROM local_server.db.dbo.table2 AS t2
INNER JOIN remote_server.db.dbo.table1 AS t1 ON t2.id = t1.id
WHERE dt_modify >= @ProcessDate