So I have this simple query, counting all points in my table that fall within a given polygon:
WITH my_polygon AS (
    SELECT ST_GeomFromEWKT('<EWKT polygon>') AS geom
)
SELECT count(*)
FROM points AS a
   , my_polygon AS p
WHERE ST_Within(a.geom, p.geom);
I have two points tables with identical structure, both consisting of the same kind of points, scattered globally. The only difference is their size:
- One is sparse: 500,000 rows
- One is dense: 340,000,000 rows
I have now executed the query on both tables, testing with and without a GiST index on the points.geom column. On the sparse table I get these execution times:
- No index: 483.544 ms
- With GiST index: 0.142 ms
Looks good, the index is working properly. Now for the dense table:
- No index: 195307.138 ms
- With GiST index: 234495.684 ms
This was not as expected. Isn't the GiST index supposed to speed up this query? The EXPLAIN ANALYZE output for all cases is provided in this GitHub Gist.
(I did perform VACUUM ANALYZE after creating the index. I also see the same behavior when GIN-indexing an array column: the query slows down on the big table when the index is used.)
-
There are a number of variables at play. If the points are spatially fragmented, the query on the large table could end up as a multi-pass full table scan. – Vince, May 15, 2018 at 22:57
2 Answers
The huge number of lossy blocks occurs because you don't have enough memory (work_mem) available for the bitmap, so Postgres first flags whole blocks that may contain matching records, and then re-scans those blocks to actually select the rows.
This page illustrates the issue and offers a way of computing the required work_mem.
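For example, a minimal sketch of raising work_mem for the current session and re-checking the plan; the 256MB value is purely illustrative, not a recommendation:

SET work_mem = '256MB';  -- session-level only; size it from the lossy-block math on that page

EXPLAIN (ANALYZE, BUFFERS)
WITH my_polygon AS (
    SELECT ST_GeomFromEWKT('<EWKT polygon>') AS geom
)
SELECT count(*)
FROM points AS a
   , my_polygon AS p
WHERE ST_Within(a.geom, p.geom);

If the bitmap heap scan in the output no longer reports lossy blocks, work_mem is large enough for this query.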
The comments in that post are also interesting, as other memory settings can have an effect.
You may also want to CLUSTER your table on the spatial index, which should reduce the number of pages containing matching records.
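As a sketch, assuming the GiST index is named points_geom_idx (adjust to your actual index name); note that CLUSTER rewrites the whole table, locks it while running, and needs roughly the table's size in free disk space:

CLUSTER points USING points_geom_idx;  -- physically reorder rows to follow the spatial index
ANALYZE points;                        -- refresh planner statistics after the rewrite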
-
Good answer. I often play around with SET enable_seqscan = false with EXPLAIN. I hadn't thought to do this with things like work_mem too; I just tend to hack postgresql.conf and then leave it. Very helpful. – John Powell, May 16, 2018 at 8:18
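For reference, a sketch of that kind of session-scoped experiment; SET LOCAL keeps the changes inside one transaction, so nothing leaks into postgresql.conf:

BEGIN;
SET LOCAL enable_seqscan = off;  -- discourage sequential scans for this test only
SET LOCAL work_mem = '256MB';    -- illustrative value
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM points AS a
WHERE ST_Within(a.geom, ST_GeomFromEWKT('<EWKT polygon>'));
ROLLBACK;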
-
Thank you for your answer. I upped work_mem to 3GB, and now there are no lossy blocks (see the updated Gist from the original question). The execution time is unfortunately unchanged. I see from the post you linked that one should try to minimise 'Rows Removed by Filter', which is 1224016 in my case. How could this be achieved? I wanted to cluster the table, but limitations in available disk space stopped the process. As I understand it, clustering has to be redone after new inserts, which would require more and more free disk space as the table size increases. – Adrian Tofting, May 16, 2018 at 13:01
-
@AdrianTofting Clustering can be done at any time, whenever it is "necessary" to regroup data according to the selected index. Since it is a table rewrite, you must have at least the same amount of free space as the current table size. The post also mentions increasing other memory settings (the same behavior as you report in your comment). – JGH, May 18, 2018 at 12:54
-
Which settings are you referring to? I have now tried to tweak all the settings suggested on this page, including work_mem, shared_buffers, effective_cache_size and wal_buffers. Actually, when I apply the suggested settings, it slows down even more. – Adrian Tofting, May 19, 2018 at 18:04
This is Jack from this post. I'm actually experiencing the exact same issue again. Last time I fixed it by adding more RAM. This time adding more RAM didn't fix anything. Adrian, did you discover anything while fixing your problem?
-
Please add your comment under the question comments and not as an answer. Thanks. – Shiko, Aug 29, 2018 at 0:22
-
Actually, I did not. I solved it by forcing Postgres to first filter on an index on a timestamp column. The execution time went down from 4 min to 400 ms. Would still love an explanation, though! – Adrian Tofting, Aug 30, 2018 at 5:39
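A sketch of that kind of workaround, assuming a hypothetical timestamp column ts and date range; in the Postgres versions current at the time (pre-12), a CTE acted as an optimization fence, so the timestamp filter runs before the spatial test:

WITH prefiltered AS (
    -- hypothetical column and range: narrow the row set via the timestamp index first
    SELECT geom
    FROM points
    WHERE ts >= '2018-01-01' AND ts < '2018-02-01'
)
SELECT count(*)
FROM prefiltered AS a
WHERE ST_Within(a.geom, ST_GeomFromEWKT('<EWKT polygon>'));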
-
@AdrianTofting My problem seemed to have fixed itself after a few restarts. Some things are hard to explain but at least it works! – Jack, Aug 31, 2018 at 4:11