The following query on a foreign takes about 5 seconds to execute on 3.2 million rows:
SELECT x."IncidentTypeCode", COUNT(x."IncidentTypeCode")
FROM "IntterraNearRealTimeUnitReflexes300sForeign" x
WHERE x."IncidentDateTime" >= '05/01/2016'
GROUP BY x."IncidentTypeCode"
ORDER BY 1;
When I execute the same query on normal table, it returns in .6 seconds. The execution plans are quite different:
Normal Table
Sort (cost=226861.20..226861.21 rows=4 width=4) (actual time=646.447..646.448 rows=7 loops=1)
Sort Key: "IncidentTypeCode"
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=226861.12..226861.16 rows=4 width=4) (actual time=646.433..646.434 rows=7 loops=1)
Group Key: "IncidentTypeCode"
-> Bitmap Heap Scan on "IntterraNearRealTimeUnitReflexes300s" x (cost=10597.63..223318.41 rows=708542 width=4) (actual time=74.593..342.110 rows=709376 loops=1)
Recheck Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone)
Rows Removed by Index Recheck: 12259
Heap Blocks: exact=27052 lossy=26888
-> Bitmap Index Scan on idx_incident_date_time_300 (cost=0.00..10420.49 rows=708542 width=0) (actual time=69.722..69.722 rows=709376 loops=1)
Index Cond: ("IncidentDateTime" >= '2016-05-01 00:00:00'::timestamp without time zone)
Planning time: 0.165 ms
Execution time: 646.512 ms
Foreign Table
Sort (cost=241132.04..241132.05 rows=4 width=4) (actual time=4782.110..4782.112 rows=7 loops=1)
Sort Key: "IncidentTypeCode"
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=241131.96..241132.00 rows=4 width=4) (actual time=4782.097..4782.100 rows=7 loops=1)
Group Key: "IncidentTypeCode"
-> Foreign Scan on "IntterraNearRealTimeUnitReflexes300sForeign" x (cost=10697.63..237589.25 rows=708542 width=4) (actual time=1.916..4476.946 rows=709376 loops=1)
Planning time: 1.413 ms
Execution time: 4782.660 ms
I think I'm paying a high price for the GROUP BY
clause, which isn't passed to the foreign server when I EXPLAIN VERBOSE
:
SELECT
"IncidentTypeCode"
FROM
PUBLIC ."IntterraNearRealTimeUnitReflexes300s"
WHERE
(
(
"IncidentDateTime" >= '2016-05-01 00:00:00' :: TIMESTAMP WITHOUT TIME ZONE
)
)
This returns 700k rows. Is there a way around this?
I spent a lot of time reading this documentation page yesterday, and thought I had found my answer with setting use_remote_estimate
to true, but it had no effect.
I do have access to the foreign server to create objects if necessary. The timestamp value in the WHERE
clause can be anything; it does not come from a list of predefined values.
1 Answer 1
If you use use_remote_estimate
be sure to run ANALYZE the foreign table (I see estimations pretty close with the returned, you'd probably did it). Also, the pushdown improvements are not available in <9.5 version. I also assume that you have the same table structure on the remote server either (including indexes). If a bitmap is needed due to the low cardinality, it won't use the index due to the limitations on the pushdown mechanism. You may want to reduce the amount of returned rows to force a BTREE index scan (timestamp ranges). Unfortunately, there is no clean way to avoid the SeqScan on the remote server if the filter returns +10% of the rows of the table (may vary this percentage if the planner considers that scanning the whole table is cheaper than seek reads). If you are using SSD, you will probably find useful to tweak random_page_cost
).
You can use CTE to isolate the GROUP BY behavior:
WITH atable AS (
SELECT "IncidentTypeCode"
FROM PUBLIC ."IntterraNearRealTimeUnitReflexes300s"
WHERE
("IncidentDateTime"
BETWEEN '2016-05-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE
AND '2016-05-02 00:00:00'::TIMESTAMP WITHOUT TIME ZONE)
)
SELECT atable."IncidentTypeCode", COUNT(atable.IncidentTypeCode)
FROM atable
GROUP BY atable."IncidentTypeCode"
ORDER BY atable."IncidentTypeCode";
-
1Performance was the same using the CTE. Will try the random_page_cost settings though. Thanks!J-DawG– J-DawG2016年09月28日 16:45:19 +00:00Commented Sep 28, 2016 at 16:45
-
Hi I'm facing performance problem with Foreign Data Wrapper too. I'm wondering what value should be set for
random_page_cost
?gogofan– gogofan2020年06月09日 12:16:27 +00:00Commented Jun 9, 2020 at 12:16 -
Depending on how frequent you issue the query, I would recommend to set a pool behind the origin to keep persistent connections to the target server. If it is a single execution, you should have not differences - at least not too many - compared with local. random_page_cost could impact on query performance only if you tweak it accordingly to the disk type on the target (for SSDs, you might have 1.1 or 1.3 for this configuration variable). See postgresqlco.nf/en/doc/param/random_page_cost for more information @gogofan .3manuek– 3manuek2020年06月15日 21:01:20 +00:00Commented Jun 15, 2020 at 21:01
Explore related questions
See similar questions with these tags.
IntterraNearRealTimeUnitReflexes300sForeign
vsIntterraNearRealTimeUnitReflexes300s
andidx_incident_date_time_300
I presume the 300s ones are the same, but it might be worth checking if theidx_incident_date_time_300
index exists on the foreign server