When I run this query on a SQL Server 2022 server with a linked server connection to PostgreSQL:
SELECT TOP 1 *
FROM PGSTACK.stackoverflow.[public].users
WHERE Id = 1;
SQL Server fetches the entire contents of the remote users table - all of the rows - over the network, and filters it all locally. The query takes >3 minutes to run.
The actual query plan doesn't show a local filter. It implies that SQL Server is only getting 1 row from the remote server.
- Question 1: Can that (getting the entire contents of the remote table) be avoided without rewriting the query to use OPENQUERY?
- Question 2: Even just trying to get the estimated execution plan actually fetches all of the rows in the remote table, and takes >3 minutes just to get the estimated plan. Can that be avoided?
Additional technical details:
- The remote server has a primary key on id, and the query runs in milliseconds on Postgres
- The remote Postgres server's pg_stat_activity shows that SQL Server is running this query:
select * from "stackoverflow"."public"."users"
note the lack of any filter on the table
- Postgres ODBC driver 16.00 (2023-09-16), the latest version
- SQL Server 2022 build 16.0.4095.4
- The network control panel even shows throughput going through the roof once the query starts, and again, I'm only pulling one row here.
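One way to check what SQL Server actually sends is to query pg_stat_activity on the Postgres side while the linked-server query is running. This is a minimal sketch, assuming the remote database is named stackoverflow (as the three-part name in the captured query suggests). The column list is just one reasonable choice.

```sql
-- Run on the Postgres server while the linked-server query is executing.
-- Assumption: the remote database is named "stackoverflow".
SELECT pid,
       application_name,
       state,
       query_start,
       query
FROM pg_stat_activity
WHERE datname = 'stackoverflow'
  AND pid <> pg_backend_pid();  -- exclude this monitoring session
```

If the query column shows no WHERE clause, the predicate was not pushed down to Postgres.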
4 Answers
Have you set "Use Declare/Fetch" to true in the ODBC driver options? From the documentation: "If true, the driver automatically uses declare cursor/fetch to handle SELECT statements and keeps 100 rows in a cache. This is mostly a great advantage, especially if you are only interested in reading and not updating. It results in the driver not sucking down lots of memory to buffer the entire result set. If set to false, cursors will not be used and the driver will retrieve the entire result set."
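The option can probably also be set in the connection string instead of the DSN dialog. This is a rough sketch, assuming a DSN-less linked server through the MSDASQL provider. The host name pghost, the port, and the driver name are placeholders, and UseDeclareFetch and Fetch are the psqlODBC connection-string keywords as I understand them, so check them against the driver documentation for your version.

```sql
-- Hypothetical DSN-less definition of the PGSTACK linked server.
-- Assumptions: pghost and port 5432 are placeholders, and the driver is
-- registered as "PostgreSQL Unicode(x64)". Login mapping is omitted.
EXEC master.dbo.sp_addlinkedserver
    @server     = N'PGSTACK',
    @srvproduct = N'PostgreSQL',
    @provider   = N'MSDASQL',
    @provstr    = N'Driver={PostgreSQL Unicode(x64)};Server=pghost;Port=5432;Database=stackoverflow;UseDeclareFetch=1;Fetch=100;';
```

With a DSN-based linked server, the equivalent is the "Use Declare/Fetch" checkbox in the driver's data source options.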
- WOOHOO, we have a winner! I did NOT have "Use Declare/Fetch" turned on, but by turning it on, the results return instantly. Nice work! – Brent Ozar, Feb 1, 2024 at 13:44
- It's just working around the inability of SQL Server to push down the query predicate to the remote server, isn't it? – mustaccio, Feb 1, 2024 at 21:16
I think the issue is the TOP keyword. SQL Server doesn't know the equivalent keyword in PostgreSQL (LIMIT), so it fetches the entire table and only then applies the filters and keywords on your side. Can you try rewriting the SELECT statement with a WITH clause to test?
Something like this:
WITH pgt AS (SELECT *
FROM PGSTACK.stackoverflow.[public].users
WHERE Id = 1)
SELECT TOP 1 *
FROM pgt
I don't have a VM with Postgres and SQL Server to test it, but it is the first thing that comes to mind.
- Oh, that's a great idea! Unfortunately no, same behavior. Tested it and it drags all the millions of rows across the network too. (And there's only one row with Id = 1.) – Brent Ozar, Feb 1, 2024 at 13:35
After having flashbacks of Brent's views on linked servers, and after asking whether the application could open a connection to Postgres directly, I would create a stored procedure: move the query into a stored procedure on the Postgres server and call that procedure.
- Thanks, but as I mentioned in the question, I'm specifically asking about not changing the query. (For example, even just getting a query plan is untenable for larger tables.) – Brent Ozar, Jan 30, 2024 at 21:45
select * from openquery(PGSTACK,'select * from users limit 1')
With OPENQUERY you can write much more complex SQL, including joins, and still benefit from the indexes on the remote database.
declare @sql varchar(max)
declare @ID int =1
set @sql = 'select * from openquery(PGSTACK,''select * from users where id ='+cast(@ID as varchar)+' '')'
exec(@sql)
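A possible alternative to concatenating the value into the string is the pass-through form EXEC ... AT, which accepts ? placeholders. This is a sketch only. It assumes the RPC Out option can be enabled on PGSTACK and that the ODBC driver accepts the bound parameter. I have not verified it against this setup.

```sql
-- Assumption: RPC Out must be enabled on the linked server for EXEC ... AT.
EXEC master.dbo.sp_serveroption
    @server = N'PGSTACK', @optname = N'rpc out', @optvalue = N'true';

DECLARE @ID int = 1;

-- The ? placeholder is bound to @ID; the statement text itself runs on Postgres.
EXEC ('select * from users where id = ?', @ID) AT PGSTACK;
```

Like OPENQUERY, this still means changing the query, so it does not answer Question 1 as asked.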