I'm working on a query that selects counts from two servers and joins the results together. Here is what I have:
SELECT
      a.NAME
    , SUM(ISNULL(b.ttl, 0)) AS [Total]
FROM TableFunction(@user, DEFAULT) a
-- Selecting NAME from Table1 alone takes <1s
LEFT JOIN (
    -- Sub-Query 1 (<1s on its own, 662 rows)
    SELECT
          COUNT(*) AS ttl
        , NAME
    FROM Table2 c
    WHERE active = 1
    GROUP BY NAME

    UNION ALL

    -- Sub-Query 2 (7s on its own, 5576 rows)
    SELECT
          COUNT(*) AS ttl
        , jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME
    -- Full sub-query [1 UNION ALL 2]: 7s, 6238 rows
) b ON b.NAME = a.NAME
GROUP BY a.NAME
-- Full query: 3:44, 91 rows
The execution plan suggests the problem comes from the remote query (Sub-Query 2).
How can I improve the runtime of this? Why does a 7-second sub-query take over 3 minutes when joined to a basically instant query?
UPDATE:
After commenting things out more or less at random in hopes of finding something useful, the culprit appears to be specifically the WHERE ISNULL(d.DELETED,0)=0 line. Removing it brings the full query down to 4s. Why would that help at all?
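In case the function wrapper itself is what prevents the filter from being evaluated remotely, an equivalent ISNULL-free form of Sub-Query 2 would be (a guess on my part, not something I have profiled):

    -- Same logic as ISNULL(d.DELETED, 0) = 0: rows pass when DELETED
    -- is NULL or 0, but the column is no longer wrapped in a function.
    SELECT COUNT(*) AS ttl, jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE (d.DELETED = 0 OR d.DELETED IS NULL)
    GROUP BY jt.NAME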
UPDATE 2:
Playing around more, it seems the fact that the first table is a table-valued function may be important. Selecting its result into a temp table and joining on that instead brings the query down to 8s (see the sketch after the function code below). The table-valued function code is:
CREATE FUNCTION dbo.TableFunction (
      @user varchar(50)
    , @includeDeleted BIT = 0
)
RETURNS @tbl TABLE (
      ID int
    , NAME varchar(max)
    , DESCRIPTION varchar(max)
)
AS
BEGIN
    -- Remote rows belonging to this user (linked-server query)
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT jt.ID, jt.NAME, jt.DESCRIPTION
    FROM OtherServer.dbase.dbo.JoiningTable jt
    WHERE jt.[USER] = (SELECT TOP 1 [USER]
                       FROM OtherServer.dbase.dbo.USERS
                       WHERE user_name = @user)
      AND 1 = CASE WHEN @includeDeleted = 1 THEN 1
                   ELSE CASE WHEN ISNULL(jt.Deleted, 0) <> 1 THEN 1 ELSE 0 END
              END;

    -- Local rows not already inserted above
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT rt.ID, rt.name, ''
    FROM RandomTable rt
    INNER JOIN RandomJoinTable rjt ON rjt.ID = rt.ID AND rjt.[user] = @user
    LEFT JOIN @tbl t ON t.ID = rt.ID
    WHERE 1 = CASE WHEN @includeDeleted = 1 THEN 1
                   ELSE CASE WHEN ISNULL(rt.Deleted, 0) = 0 THEN 1 ELSE 0 END
              END
      AND t.NAME IS NULL;

    RETURN;
END
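For reference, the temp-table workaround from UPDATE 2 looks roughly like this (a sketch; #funcResult is just a name I made up, and @user is declared as before):

    -- Materialize the function's result once...
    SELECT ID, NAME, DESCRIPTION
    INTO #funcResult
    FROM dbo.TableFunction(@user, DEFAULT);

    -- ...then join on the temp table instead of the function
    SELECT
          a.NAME
        , SUM(ISNULL(b.ttl, 0)) AS [Total]
    FROM #funcResult a
    LEFT JOIN (
        SELECT COUNT(*) AS ttl, NAME
        FROM Table2
        WHERE active = 1
        GROUP BY NAME
        UNION ALL
        SELECT COUNT(*) AS ttl, jt.NAME
        FROM OtherServer.dbase.dbo.Table3 d
        INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
        WHERE ISNULL(d.DELETED, 0) = 0
        GROUP BY jt.NAME
    ) b ON b.NAME = a.NAME
    GROUP BY a.NAME;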
1 Answer
Querying across servers through a linked server introduces a lot of uncertainty for the query optimizer, because it knows very little about the remote end. For example, when you join a local table to a remote one, the optimizer has no idea how big the remote table is (10 rows? 1 million?), what indexes exist, or how the data is organized. Should it use an index to fetch 10 rows, or a full scan for 1 million?
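One way to take that guesswork out of the picture, assuming the linked server allows pass-through queries, is OPENQUERY: the remote server runs the whole sub-query itself and only the finished rows come back over the link. A sketch:

    -- OtherServer executes the aggregate; only the ~5,576 grouped rows
    -- cross the link, instead of raw rows being pulled over to filter locally.
    SELECT ttl, NAME
    FROM OPENQUERY(OtherServer, '
        SELECT COUNT(*) AS ttl, jt.NAME
        FROM dbase.dbo.Table3 d
        INNER JOIN dbase.dbo.JoiningTable jt ON d.ID = jt.ID
        WHERE ISNULL(d.DELETED, 0) = 0
        GROUP BY jt.NAME
    ');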
Generally, you want to minimize the amount of interaction or joining that has to happen between the two servers. Is there a filter you can put on JoiningTable to reduce the number of rows shipped across the link? Can you structure the query as two CTEs (or temp tables) that only pull their results together at the end?
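A rough sketch of that idea, materializing the remote aggregate into a local temp table first so that the final join never touches the linked server (#remoteCounts is an invented name):

    -- Run the remote part once; only its small result set stays local
    SELECT COUNT(*) AS ttl, jt.NAME
    INTO #remoteCounts
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME;

    -- The LEFT JOIN is now entirely local
    SELECT a.NAME, SUM(ISNULL(b.ttl, 0)) AS [Total]
    FROM TableFunction(@user, DEFAULT) a
    LEFT JOIN (
        SELECT COUNT(*) AS ttl, NAME FROM Table2 WHERE active = 1 GROUP BY NAME
        UNION ALL
        SELECT ttl, NAME FROM #remoteCounts
    ) b ON b.NAME = a.NAME
    GROUP BY a.NAME;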
I imagine I could select the subquery into a temp table and join on that. Just tried that, and the results returned in 6 seconds. I suppose my problem is a misunderstanding of how the query runs; I expected the subquery to be put into a temp table behind the scenes. – David Starkey, Mar 30, 2017 at 16:21
It does seem tied to the ISNULL, but running just that query on its own is fast enough for my purposes. Why should it behave differently when run as a sub-query?