I have a really big query that needs to be run on multiple databases, and the results appended to a temp table and returned.
The basic syntax looks something like this:
INSERT INTO #tmpTable (Id, ...)
SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
INNER JOIN #tmpIds as T ON T1.Id = T.Id
INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id
The query runs quickly if run locally on the the individual servers, however it takes a long time to run if it's run from a linked server using the 4-part names like above.
The problem appears to be it's querying the linked server for the unfiltered result set first, then joining it to the #tmpIds
table on the local server afterwards, which makes the query take an very long time to run.
If I hardcode the Ids to filter the result set on the linked server, such as
SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
-- INNER JOIN #tmpIds as T ON T1.Id = T.Id
INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id
WHERE T1.Id IN (1, 2, 3)
it runs quickly in just a few seconds.
Is there a way to run this query so it filters the result set of the query from the linked server by the #tmpId
table first, before returning the result set to the local server?
Some things to note
The query is very large and complex, and Dynamic SQL is not a viable option for it due to the maintenance nightmare that causes.
I would be open to suggestions about how to use Dynamic SQL for something else though, such as running a stored procedure or UDF if there's a way to do that over a linked server (tried a few different methods such as
sp_executeSQL
,OPENROWSET
, andOPENQUERY
, but those all failed).- Because it uses the 4-part naming convention, I cannot use a UDF on the remote server
Distributed Transactions are disabled, so the following does not work
INSERT INTO #table EXEC Server.Database.dbo.StoredProcedure @ids
4 Answers 4
The performance problem actually had to do with the LEFT OUTER JOIN
tables. If I changed them to INNER JOIN
, or if I excluded their data from the SELECT
columns, the query ran fine.
What I ended up doing was creating a View
on the linked server containing all the data I wanted from it, then simply joining to it from the primary server with the #tmpIds
table.
I didn't think this would work since I thought joining everything and pulling it down to the second server before filtering was the same as what I was doing now, and would lead to the same performance problem, but surprisingly that doesn't appear to be the case.
CREATE VIEW MyView
AS
SELECT T1.Id, T2.ColA, ...
FROM Table1 as T1
INNER JOIN Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Table6 as T6 ON T5.Id = T6.Id
GO
and
INSERT INTO #tmpTable (Id, ...)
SELECT T1.Id, T1.ColA, ...
FROM Server.Database.dbo.MyView as T1
INNER JOIN #tmpIds as T ON T1.Id = T.Id
All the joined columns were correctly indexed, however according to this answer
Even though there may be indexing on tables on the remote server, SQL may not be able to take advantage of them while it can build a local query plan that does take advantages of indexing.
And this one
Let the linked server do as much as possible.
It is impossible for SQL Server to optimise a query on a linked server, even another SQL Server
so I am guessing that the query plan used for the query was not using the Indices defined, and SQL Server was generating a poor query plan for the LEFT OUTER JOIN
tables.
Have you tried the FORCE ORDER query hint? It forces the compiler to keep the order of the joins as listed in the query when optimizing it.
SELECT T1.Id, ...
FROM Server.Database.dbo.Table1 as T1
INNER JOIN #tmpIds as T ON T1.Id = T.Id
INNER JOIN Server.Database.dbo.Table2 as T2 ON T1.Id = T2.Id
INNER JOIN Server.Database.dbo.Table3 as T3 ON T2.Id = T3.Id
LEFT OUTER JOIN Server.Database.dbo.Table4 as T4 ON T3.Id = T4.Id
LEFT OUTER JOIN Server.Database.dbo.Table5 as T5 ON T4.Id = T5.Id
LEFT OUTER JOIN Server.Database.dbo.Table6 as T6 ON T5.Id = T6.Id
OPTION (FORCE ORDER)
EDIT: Given that FORCE ORDER didn't work, have you thought about doing something like this:
WHERE T1.Id IN (SELECT Id FROM #tmpIds)
2nd EDIT: One more try, this one is a bit complex though.
Can you do something like this:
On the remote server create a permanent "temporary" table
CREATE TABLE tmpTable1 (Id INT)
Then (still on the remote server) create a view
CREATE VIEW queryView AS
SELECT Table1.*
FROM Table1
JOIN tmpTable1
ON Table1.Id = tmpTable1.Id
Then in your process on your "home" instance
DELETE FROM Server.Database.dbo.tmpTable1
INSERT INTO Server.Database.dbo.tmpTable1 VALUES
SELECT * FROM #tmpIds
Then in your query join to Server.Database.dbo.queryView
-
I just tried now and it doesn't appear to be making a positive difference (it makes the query slower by about another 10 seconds)Rachel– Rachel2013年05月06日 19:10:56 +00:00Commented May 6, 2013 at 19:10
-
@kennethfisher I would be reluctant to suggest FORCE ORDER option without knowing the OP's environment and the data retrieved by the query. As OP mentioned that the query is complex. SQL Server Query Execution Engine is pretty smart to decide the best execution plan with least query cost for any query.Kin Shah– Kin Shah2013年05月06日 19:17:54 +00:00Commented May 6, 2013 at 19:17
-
@Kin the Optimizer is very smart. But it isn't perfect either and I've seen plenty of cases where hints will speed things up. On the other hand I was suggesting the OP give it a shot and see if it worked. It didn't so I certainly wouldn't use it :)Kenneth Fisher– Kenneth Fisher2013年05月06日 19:29:07 +00:00Commented May 6, 2013 at 19:29
-
In regards to your last edit suggesting using a
WHERE
instead of anINNER JOIN
on the#tmpId
table, I have tried that, and withEXISTS
, and it doesn't make a difference.Rachel– Rachel2013年05月06日 20:00:43 +00:00Commented May 6, 2013 at 20:00
I would recommend to write a user defined function on each linked server, which gets all the data necessary from them, and then query the function via openquery, like this:
INSERT INTO #tmpTable (Id, ...)
SELECT T1.Id, ...
FROM OPENQUERY([Server], 'SELECT * FROM Database.dbo.UdfGetData()')
INNER JOIN #tmpIds as T ...
In this way all the data you need to be imported is processed in the linked server, and you only get the results via opequery.
If you query each table like this:
... INNER JOIN Server.Database.dbo.Table2 ...
You get all the data from EACH table of the linked server to the local server and load it into memory. Then the joins are performed in the local server, probably (I'm guessing) without all the indexes. So you are importing more data than you need, and also the joins are more slow because the lack of indexes.
I had this issue some time ago, and using OPENQUERY I was able to reduce the execution time of my process from about two days (hehehe, in fact no one noticed it until the server become slower) to ten minutes.
The disadvantage of this method is that you have to concat the parameters in the query string of openquery. To become over this, I would suggest the next:
CREATE TABLE #TempT (
a INT NOT NULL,
b ...
);
DECLARE @query VARCHAR(MAX);
SET @query
= 'SELECT a, b, ... FROM OPENQUERY([Server], SELECT * FROM Database.dbo.UdfGetData(' + @p1 + ',' + @p2 ')';
-- @p1 and @p2 are the parameters, but you will need to format them according to the datatype: DATETIME, VARCHAR, etc
INSERT INTO #TempT EXEC (@query);
INSERT INTO #tmpTable (Id, ...)
SELECT T1.Id, ...
FROM #TempT
INNER JOIN #tmpIds as T ...
I know it's not pretty, and some people would say it is an 'unholy combination of bad ideas', but it does the work XD
You have experienced one of the main problems with linked servers.
ALL data is retrieved over the network and then filtered.
IF the #tmpFile is relatively small compared to the others you could consider a stored procedure that RECEIVES #tmpFile, then runs all the joins locally and returns a dataset to the caller.
-
That is what I don't know how to do. I cannot
INSERT INTO #tmpResults
the results of the stored procedure because distributed transactions is not available. Also keep in mind that I'm using SQL 2005 - I know some features that make life easier in later versions of SQL are not available to me, such as table-parameters for stored procedures.Rachel– Rachel2013年05月06日 20:17:29 +00:00Commented May 6, 2013 at 20:17 -
another solution might be powershell remoting. you send a sql query - via powershell - the query is run locally and the results are all returned and collated.Jimbo– Jimbo2013年05月07日 11:54:02 +00:00Commented May 7, 2013 at 11:54
Explore related questions
See similar questions with these tags.
INNER JOINS
reduces this to 20,000 records right now (this will go up however, as we only started added data to this table in January) so that is the max number of records that get returned. The#tmpIds
table further reduces this to about 1,000 records. The query is expected to run a few times a week as needed, and it will probably get run multiple times throughout the day on Mondays.SELECT
columns are actually playing a role in how the joins are getting evaluated, as if I justSELECT T1.*
, it returns quickly, as opposed toSELECT T4.ColumnA
. I'm currently trying to determine the problem tables, and am going to try to move them into a seprate query