SQL Server linked server query to Postgres not filtering rows

Question 1

When I run this query on a SQL Server 2022 server with a linked server connection to PostgreSQL:

SELECT TOP 1 * 
FROM PGSTACK.stackoverflow.[public].users 
WHERE Id = 1;

SQL Server fetches the entire contents of the remote users table - all of the rows - over the network, and filters it all locally. The query takes >3 minutes to run.

The actual query plan doesn't show a local filter - it implies that SQL Server is only getting 1 row from the remote server:

Query plan

Question 1: Can that (getting the entire contents of the remote table) be avoided without rewriting the query to use OPENQUERY?
Question 2: Even just trying to get the estimated execution plan actually fetches all of the rows in the remote table, and takes >3 minutes just to get the estimated plan. Can that be avoided?

Additional technical details:

The remote server has a primary key on id, and the query runs in milliseconds on Postgres
The remote Postgres server's pg_stat_activity shows that SQL Server is running this query: select * from "stackoverflow"."public"."users" - note the lack of any filter on the table
Postgres ODBC driver 16.00 (2023-09-16), the latest version
SQL Server 2022 build 16.0.4095.4
Network control panel even shows throughput going through the roof once the query starts - and again, I'm only pulling one row here:

Network control panel
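The pg_stat_activity check described above can be reproduced with something like the following, run on the Postgres side while the linked-server query is in flight. This is only a sketch: the ILIKE filter on the table name is an assumption to narrow the output, and the exact text the driver sends may differ depending on its settings.

```sql
-- Run on the PostgreSQL server while the SQL Server query is executing.
-- Lists other sessions' statements that mention the users table.
SELECT pid, usename, application_name, state, query_start, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()      -- exclude this monitoring session
  AND query ILIKE '%users%'        -- assumption: filter by table name
ORDER BY query_start DESC;
```

If the query column shows no WHERE clause, the predicate was not pushed down to Postgres.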

Question 2

What indexes are in place on the users table in the remote server?

Question 3

@J.D. There are multiple indexes, but yes, there's one on id. As I mentioned in the post, the query takes milliseconds when connecting directly to Postgres.

Question 4

Heh, just making explicitly sure (though I'm sure you already knew to verify that). Best of luck!

Question 5

I had a similar issue when working with IBM's Db2 for Linux, UNIX and Windows. The pushdown logic is not what you would think it would be. I recall that I had to trick it by creating objects (in my case, UDFs) on the remote server to do some of the query processing so that not all rows were returned. Sadly, that's a query change and then some...

Question 6

I wonder whether the query would behave differently if you defined an external table for the remote source.

Question 7

Have you set "Use Declare/Fetch" to true in the ODBC driver options? From the documentation: "If true, the driver automatically uses declare cursor/fetch to handle SELECT statements and keeps 100 rows in a cache. This is mostly a great advantage, especially if you are only interested in reading and not updating. It results in the driver not sucking down lots of memory to buffer the entire result set. If set to false, cursors will not be used and the driver will retrieve the entire result set."

https://odbc.postgresql.org/docs/config.html
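For a linked server defined with a DSN-less provider string rather than a DSN, the same option can be set through the psqlODBC connection-string keywords UseDeclareFetch and Fetch (the cache size). A minimal sketch, assuming the MSDASQL provider and the 64-bit Unicode driver; the host name and port are hypothetical, and login mapping (sp_addlinkedsrvlogin) is omitted:

```sql
-- Sketch only: Server and Port are placeholders, not values from the question.
EXEC master.dbo.sp_addlinkedserver
    @server     = N'PGSTACK',
    @srvproduct = N'',
    @provider   = N'MSDASQL',
    @provstr    = N'Driver={PostgreSQL Unicode(x64)};Server=pg.example.com;Port=5432;Database=stackoverflow;UseDeclareFetch=1;Fetch=100;';
```

With a DSN-based linked server, the equivalent is the "Use Declare/Fetch" checkbox in the driver's data source options.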

Question 8

WOOHOO, we have a winner! I did NOT have "Use Declare/Fetch" turned on, but by turning it on, the results return instantly. Nice work!

Question 9

It's just working around the inability of SQL Server to push down the query predicate to the remote server, isn't it?

Question 10

I think the issue is the TOP keyword. SQL Server doesn't know the equivalent keyword in PostgreSQL (LIMIT), so it fetches the entire table and only then applies the filters and keywords on your side. Can you try rewriting the SELECT statement with a WITH clause to test?

Something like this:

WITH pgt AS (SELECT * 
 FROM PGSTACK.stackoverflow.[public].users 
 WHERE Id = 1)
SELECT TOP 1 * 
FROM pgt

I don't have a VM with Postgres and SQL Server to test it, but it is the first thing that comes to mind.

Question 11

Oh, that's a great idea! Unfortunately no, same behavior. I tested it and it drags all the millions of rows across the network too. (And there's only one row with Id = 1.)

Question 12

After having flashbacks of Brent's views on linked servers, and asking whether the application could open a connection to Postgres directly, I would create a stored procedure on the Postgres server, move the query into it, and call the procedure.

Question 13

Thanks, but as I mentioned in the question, I'm specifically asking about not changing the query. (For example, even just getting a query plan is untenable for larger tables.)

Question 14

select * from openquery(PGSTACK,'select * from users limit 1')

With OPENQUERY you can write much more complex SQL, including joins, and still benefit from the indexes on the remote database.

-- OPENQUERY does not accept variables, so build the whole statement as a string.
DECLARE @sql varchar(max);
DECLARE @ID int = 1;
SET @sql = 'SELECT * FROM OPENQUERY(PGSTACK, ''SELECT * FROM users WHERE id = ' + CAST(@ID AS varchar(11)) + ''')';
EXEC (@sql);

Spörri · 4,734 reputation · 15 silver badges · 28 bronze badges · Accepted Answer · 2024-02-01 08:30:54Z
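As a possible alternative to concatenating the value into the OPENQUERY string, T-SQL's EXECUTE ... AT can send a pass-through statement with a ? parameter marker. This is a sketch rather than something tested against this setup: it assumes the linked server's "rpc out" option can be enabled and that the ODBC driver accepts the parameter marker. Like OPENQUERY, it still changes the query.

```sql
-- One-time setup: EXECUTE ... AT requires RPC Out on the linked server.
EXEC master.dbo.sp_serveroption
    @server   = N'PGSTACK',
    @optname  = N'rpc out',
    @optvalue = N'true';

-- Pass-through query; the ? marker is bound to @ID rather than concatenated.
DECLARE @ID int = 1;
EXEC ('SELECT * FROM users WHERE id = ?', @ID) AT PGSTACK;
```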

