I'm working on a query that selects counts from two servers and joins the results together. Here is what I have:
SELECT
      a.NAME
    , SUM(ISNULL(b.ttl, 0)) AS [Total]
FROM TableFunction(@user, DEFAULT) a
-- Selecting NAME from Table1 alone takes <1s
LEFT JOIN (
    -- Sub-Query 1 (<1s on its own, 662 rows)
    SELECT
          COUNT(*) AS ttl
        , NAME
    FROM Table2 c
    WHERE active = 1
    GROUP BY NAME

    UNION ALL

    -- Sub-Query 2 (7s on its own, 5576 rows)
    SELECT
          COUNT(*) AS ttl
        , jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME
    -- Full sub-query [1 UNION ALL 2]: 7s, 6238 rows
) b ON b.NAME = a.NAME
GROUP BY a.NAME
-- Full query: 3:44, 91 rows
The execution plan suggests the problem comes from the remote query (Sub-Query 2).
How can I improve the runtime of this? Why does a 7-second sub-query take over 3 minutes when joined to a basically instant query?
UPDATE:
After commenting things out more or less at random in hopes of finding something useful, the culprit appears to be specifically the WHERE ISNULL(d.DELETED,0)=0 line. Removing it brings the full query down to 4s. Why would that help at all?
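In case the function wrapper itself is what prevents the filter from being evaluated remotely, an equivalent ISNULL-free form of Sub-Query 2 would be (a guess on my part, not something I have profiled):

    -- Same logic as ISNULL(d.DELETED, 0) = 0: rows pass when DELETED
    -- is NULL or 0, but the column is no longer wrapped in a function.
    SELECT COUNT(*) AS ttl, jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE (d.DELETED = 0 OR d.DELETED IS NULL)
    GROUP BY jt.NAME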
UPDATE 2:
Playing around more, it seems the fact that the first table is a table-valued function may be important. Selecting its result into a temp table and joining on that instead brings the query down to 8s (see the sketch after the function code below). The table-valued function code is:
CREATE FUNCTION dbo.TableFunction (
      @user varchar(50)
    , @includeDeleted BIT = 0
)
RETURNS @tbl TABLE (
      ID int
    , NAME varchar(max)
    , DESCRIPTION varchar(max)
)
AS
BEGIN
    -- Remote rows belonging to this user (linked-server query)
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT jt.ID, jt.NAME, jt.DESCRIPTION
    FROM OtherServer.dbase.dbo.JoiningTable jt
    WHERE jt.[USER] = (SELECT TOP 1 [USER]
                       FROM OtherServer.dbase.dbo.USERS
                       WHERE user_name = @user)
      AND 1 = CASE WHEN @includeDeleted = 1 THEN 1
                   ELSE CASE WHEN ISNULL(jt.Deleted, 0) <> 1 THEN 1 ELSE 0 END
              END;

    -- Local rows not already inserted above
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT rt.ID, rt.name, ''
    FROM RandomTable rt
    INNER JOIN RandomJoinTable rjt ON rjt.ID = rt.ID AND rjt.[user] = @user
    LEFT JOIN @tbl t ON t.ID = rt.ID
    WHERE 1 = CASE WHEN @includeDeleted = 1 THEN 1
                   ELSE CASE WHEN ISNULL(rt.Deleted, 0) = 0 THEN 1 ELSE 0 END
              END
      AND t.NAME IS NULL;

    RETURN;
END
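For reference, the temp-table workaround from UPDATE 2 looks roughly like this (a sketch; #funcResult is just a name I made up, and @user is declared as before):

    -- Materialize the function's result once...
    SELECT ID, NAME, DESCRIPTION
    INTO #funcResult
    FROM dbo.TableFunction(@user, DEFAULT);

    -- ...then join on the temp table instead of the function
    SELECT
          a.NAME
        , SUM(ISNULL(b.ttl, 0)) AS [Total]
    FROM #funcResult a
    LEFT JOIN (
        SELECT COUNT(*) AS ttl, NAME
        FROM Table2
        WHERE active = 1
        GROUP BY NAME
        UNION ALL
        SELECT COUNT(*) AS ttl, jt.NAME
        FROM OtherServer.dbase.dbo.Table3 d
        INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
        WHERE ISNULL(d.DELETED, 0) = 0
        GROUP BY jt.NAME
    ) b ON b.NAME = a.NAME
    GROUP BY a.NAME;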
1 Answer
Querying across servers through a linked server introduces a lot of uncertainty for the query optimizer, because it knows very little about the remote end. For example, when you join a local table to a remote one, the optimizer has no idea how big the remote table is (10 rows? 1 million?), what indexes exist, or how the data is organized. Should it use an index to fetch 10 rows, or a full scan for 1 million?
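One way to take that guesswork out of the picture, assuming the linked server allows pass-through queries, is OPENQUERY: the remote server runs the whole sub-query itself and only the finished rows come back over the link. A sketch:

    -- OtherServer executes the aggregate; only the ~5,576 grouped rows
    -- cross the link, instead of raw rows being pulled over to filter locally.
    SELECT ttl, NAME
    FROM OPENQUERY(OtherServer, '
        SELECT COUNT(*) AS ttl, jt.NAME
        FROM dbase.dbo.Table3 d
        INNER JOIN dbase.dbo.JoiningTable jt ON d.ID = jt.ID
        WHERE ISNULL(d.DELETED, 0) = 0
        GROUP BY jt.NAME
    ');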
Generally, you want to minimize the amount of interaction or joining that has to happen between the two servers. Is there a filter you can put on JoiningTable to reduce the number of rows shipped across the link? Can you structure the query as two CTEs (or temp tables) that only pull their results together at the end?
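A rough sketch of that idea, materializing the remote aggregate into a local temp table first so that the final join never touches the linked server (#remoteCounts is an invented name):

    -- Run the remote part once; only its small result set stays local
    SELECT COUNT(*) AS ttl, jt.NAME
    INTO #remoteCounts
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME;

    -- The LEFT JOIN is now entirely local
    SELECT a.NAME, SUM(ISNULL(b.ttl, 0)) AS [Total]
    FROM TableFunction(@user, DEFAULT) a
    LEFT JOIN (
        SELECT COUNT(*) AS ttl, NAME FROM Table2 WHERE active = 1 GROUP BY NAME
        UNION ALL
        SELECT ttl, NAME FROM #remoteCounts
    ) b ON b.NAME = a.NAME
    GROUP BY a.NAME;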
I imagine I could select the subquery into a temp table and join on that. Just tried that, and the results returned in 6 seconds. I suppose my problem is a misunderstanding of how the query runs; I expected the subquery to be put into a temp table behind the scenes. – David Starkey, Mar 30, 2017 at 16:21
It does seem tied to the ISNULL, but running just that query on its own is fast enough for my purposes. Why should it behave differently when run as a sub-query?