In 9.4b2, postgres_fdw doesn't know how to "push down" aggregate queries on remote tables, e.g.

> explain verbose select max(col1) from remote_tables.table1;
 QUERY PLAN 
---------------------------------------------------------------------------------------------
 Aggregate (cost=605587.30..605587.31 rows=1 width=4)
 Output: max(col1)
 -> Foreign Scan on remote_tables.table1 (cost=100.00..565653.20 rows=15973640 width=4)
 Output: col1, col2, col3
 Remote SQL: SELECT col1 FROM public.table1

It would obviously be much more efficient to send SELECT max(col1) FROM public.table1 to the remote server and just pull the one row back.

Is there a way to perform this optimization manually? I would be satisfied with something as low-level as (hypothetically speaking)

EXECUTE 'SELECT max(col1) FROM public.table1' ON remote RETURNING (col1 INTEGER);

although of course a higher-level construct would be preferred.

I'm aware that I could do something like this with dblink, but that would involve rewriting a large body of code that already uses foreign tables, so I'd prefer not to.

EDIT: Here's the query plan for Erwin Brandstetter's suggestion:

=> explain verbose select col1 from remote_tables.table1 
-> order by col1 desc nulls last limit 1;
 QUERY PLAN 
---------------------------------------------------------------------------------------------------
 Limit (cost=645521.40..645521.40 rows=1 width=4)
 Output: col1
 -> Sort (cost=645521.40..685455.50 rows=15973640 width=4)
 Output: col1
 Sort Key: table1.col1
 -> Foreign Scan on remote_tables.table1 (cost=100.00..565653.20 rows=15973640 width=4)
 Output: col1
 Remote SQL: SELECT col1 FROM public.table1

This is better, in that it fetches only col1, but it's still dragging 16 million rows over the network and now it's also sorting them. By way of comparison, the original query, applied on the remote server, doesn't even have to scan, because that column has an index. (The core query planner isn't clever enough to do that for the modified query applied on the remote server, but that's minor.)

asked Sep 22, 2014 at 18:00

2 Answers

For the time being, it seems that the best available option is to create a view on the remote server that encapsulates the query needing to be "pushed down". postgres_fdw is happy to define and use foreign tables backed by views on the remote, and regular old query optimization within the view does the Right Thing. For instance, given

CREATE VIEW id_ranges AS
SELECT 'url_strings'::text AS tbl,
 min(url_strings.id)::bigint AS lo,
 max(url_strings.id)::bigint AS hi
 FROM url_strings
UNION
 SELECT 'captured_pages'::text AS tbl,
 min(captured_pages.url)::bigint AS lo,
 max(captured_pages.url)::bigint AS hi
 FROM captured_pages
UNION
 -- ... several more like that ...

on the remote, and a FOREIGN TABLE of the same name on the local server,

SELECT lo, hi FROM id_ranges WHERE tbl = 'url_strings';

the existing pushdown optimization will send the WHERE constraint to the remote, and the remote will scan only one table (making use of indexes if possible) and send back a single-row result.
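For completeness, the local side of this setup might look like the sketch below. The server name `remote` and the `public` schema are assumptions (the question only hints at them), and the column list has to match the output columns of the view.

```sql
-- Run on the LOCAL server. Assumes a postgres_fdw foreign server named
-- "remote" (hypothetical name) with a suitable user mapping already exists.
CREATE FOREIGN TABLE id_ranges (
    tbl text,
    lo  bigint,
    hi  bigint
)
SERVER remote
OPTIONS (schema_name 'public', table_name 'id_ranges');

-- Check what is actually sent: the "Remote SQL" line of the plan should
-- contain the WHERE condition on tbl.
EXPLAIN VERBOSE
SELECT lo, hi FROM id_ranges WHERE tbl = 'url_strings';
```

The same pattern should work for a single aggregate: wrap the query in a one-row view on the remote and expose that view locally as a foreign table.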

answered Oct 4, 2014 at 21:50

Remote Query Optimization is rather basic:

postgres_fdw attempts to optimize remote queries to reduce the amount of data transferred from foreign servers. This is done by sending query WHERE clauses to the remote server for execution, and by not retrieving table columns that are not needed for the current query. [...]
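One way to see both optimizations at work is to inspect the `Remote SQL` line of a plan. The query below is only an illustration against the table from the question. It assumes `col1` is an integer column, and the literal `42` is an arbitrary value.

```sql
-- Only col1 is referenced, and the condition uses a built-in operator on
-- an (assumed) integer column, so both the column list and the WHERE
-- clause are candidates for being sent to the remote server.
EXPLAIN VERBOSE
SELECT col1
FROM   remote_tables.table1
WHERE  col1 > 42;
```

If the condition was shipped, the `Remote SQL` line carries a WHERE clause. If it was not, the plan shows a local `Filter` on the Foreign Scan node instead.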

My first idea to substitute with the following isn't much of an improvement either as you found out:

SELECT col1
FROM public.table1
ORDER BY col1 DESC NULLS LAST
LIMIT 1;

Currently (including pg 9.4), only WHERE conditions with all immutable functions are pushed down. I found this exhaustive thread discussing the Status of FDW pushdowns on pgsql-hackers.

Your best option seems to be dblink, as you already mentioned yourself.
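A minimal dblink sketch for the aggregate from the question, assuming the dblink extension is installed locally and that the existing foreign server is called `remote` (a hypothetical name borrowed from the question's pseudo-syntax). According to the dblink documentation, a foreign server name can be given in place of a connection string, so the connection details already defined for the foreign tables should be reusable:

```sql
-- Run on the LOCAL server. Requires: CREATE EXTENSION dblink;
-- "remote" is an assumed foreign server name; a plain libpq connection
-- string such as 'host=... dbname=...' can be used instead.
SELECT max_col1
FROM   dblink('remote',
              'SELECT max(col1) FROM public.table1')
       AS t(max_col1 integer);  -- column type assumed from the question
```

The aggregate runs entirely on the remote server, which can use its index on `col1`, and only a single row crosses the network.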

answered Sep 22, 2014 at 23:31
  • Alas, the sort happens locally. Commented Sep 22, 2014 at 23:45
  • @Zack: Too bad. Currently, really only WHERE conditions with all immutable functions are pushed down. Commented Sep 23, 2014 at 0:29
