I'm trying to join a few rows from a remote view to a local table. The view has about 300 million rows, so I want to use the REMOTE query hint so that the entire view doesn't have to be transferred to my computer.
SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
) R
ON L.ID = R.L_ID
This returns 100 rows, as I expected, and takes basically no time, as I expected.
However,
SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER REMOTE JOIN (
SELECT TOP 100 Something, L_ID FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
) R
ON L.ID = R.L_ID
starts to return thousands of rows. I quit it after a few seconds, but the count was already in the tens to hundreds of thousands.
How could a query hint change my result set?
2 Answers
TOP 100 with no ORDER BY means it is nondeterministic which 100 rows from the remote table end up participating in the join. This depends on the execution plan and can vary.
If it is a one-to-many relationship, it may be that one batch of 100 rows has more matches on the other side of the join than a different batch of 100 rows.
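To make the one-to-many point concrete, here is a minimal sketch using table variables in place of the real tables. The data is invented for illustration, and the WHERE clauses only simulate TOP picking one batch of remote rows or another.

```sql
-- Hypothetical stand-ins for the local and remote tables.
DECLARE @LocalTable  TABLE (ID int, Something varchar(20));
DECLARE @RemoteTable TABLE (L_ID int, Something varchar(20));

-- ID 1 has three local rows, ID 2 has only one.
INSERT INTO @LocalTable (ID, Something)
VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd');

INSERT INTO @RemoteTable (L_ID, Something)
VALUES (1, 'r1'), (1, 'r2'), (2, 'r3'), (2, 'r4');

-- Batch A: the two remote rows with L_ID = 1 each match three local rows,
-- so the join produces 2 * 3 = 6 rows.
SELECT COUNT(*) AS JoinedRows
FROM @LocalTable L
INNER JOIN (SELECT Something, L_ID FROM @RemoteTable WHERE L_ID = 1) R
    ON L.ID = R.L_ID;

-- Batch B: the two remote rows with L_ID = 2 each match one local row,
-- so the join produces 2 * 1 = 2 rows.
SELECT COUNT(*) AS JoinedRows
FROM @LocalTable L
INNER JOIN (SELECT Something, L_ID FROM @RemoteTable WHERE L_ID = 2) R
    ON L.ID = R.L_ID;
```

Both batches contain the same number of remote rows, yet the joined row counts differ, which is the effect described above.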
You should specify an ORDER BY (inside the derived table) on some unique column or combination of columns to ensure deterministic results.
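As a sketch of that suggestion applied to the query from the question: the column RemoteTableID below is hypothetical and stands in for whatever unique column (or combination of columns) the remote table actually has.

```sql
SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
    SELECT TOP 100 Something, L_ID
    FROM [RemoteServer].[RemoteDB].[dbo].[RemoteTable]
    ORDER BY RemoteTableID  -- hypothetical unique column; replace with a real one
) R
    ON L.ID = R.L_ID;
```

With a unique ordering, the same 100 remote rows should feed the join regardless of which plan the optimizer picks, so the result set should no longer depend on the join hint.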
TOP without an ORDER BY is meaningless, yes, but I'd consider TOP before the join condition (which relevant rows!?) to be pretty meaningless anyways. You'd need to be able to get the same search criteria inside, which probably amounts to normalization violation. – Clockwork-Muse, Jul 27, 2016 at 8:14
@Clockwork-Muse - Not at all. It would be entirely possible to want the most recent 100 rows from remote table (TOP 100 ... FROM RemoteTable ORDER BY DateInserted DESC) and their associated details from local table. e.g. there is nothing meaningless about requesting the latest 100 order headers and their line items. (Though this would be unlikely to be split over servers) – Martin Smith, Jul 27, 2016 at 8:22
You can try forcing the remote query to run remotely:
SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
SELECT TOP 100 Something, L_ID
FROM OPENQUERY([RemoteServer], 'SELECT Something, L_ID
FROM [RemoteDB].[dbo].[RemoteTable]'
)
) R
ON L.ID = R.L_ID
Or (if you want the 100 limiter to be in the remote query):
SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
SELECT Something, L_ID
FROM OPENQUERY([RemoteServer], 'SELECT TOP 100 Something, L_ID
FROM [RemoteDB].[dbo].[RemoteTable]'
)
) R
ON L.ID = R.L_ID
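If the ORDER BY advice from the other answer is combined with this approach, the ordering can go inside the pass-through query so that the remote server picks the same 100 rows each time. This is only a sketch: RemoteTableID is a hypothetical unique column, not something named in the question.

```sql
SELECT R.Something, L.ID, L.Something
FROM [dbo].[LocalTable] L
INNER JOIN (
    SELECT Something, L_ID
    -- The quoted text is sent to the linked server and executed there.
    FROM OPENQUERY([RemoteServer], 'SELECT TOP 100 Something, L_ID
                                    FROM [RemoteDB].[dbo].[RemoteTable]
                                    ORDER BY RemoteTableID')  -- hypothetical column
) R
    ON L.ID = R.L_ID;
```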
TOP without ORDER BY is meaningless, but it's not much better to grab the top n rows if they're not actually in the set you're interested in - what was the point of grabbing the top 100 rows here?