So I have this simple query, counting all points in my table that fall within a given polygon:
WITH my_polygon AS (
    SELECT ST_GeomFromEWKT('<EWKT polygon>') AS geom
)
SELECT count(*)
FROM points AS a
   , my_polygon AS p
WHERE ST_Within(a.geom, p.geom);
I have two points tables with identical structure, both consisting of the same kind of points, scattered globally. The only difference is their size:
- One is sparse: 500,000 rows
- One is dense: 340,000,000 rows
I have now executed the query on both tables, testing with and without a GiST index on the points.geom column. On the sparse table I get these execution times:
- No index: 483.544 ms
- With GiST index: 0.142 ms
Looks good, the index is working properly. Now for the dense table:
- No index: 195307.138 ms
- With GiST index: 234495.684 ms
This was not as expected. Isn't the GiST index supposed to speed up this query? The EXPLAIN ANALYZE output for all cases is provided in this GitHub Gist.
(I did perform VACUUM ANALYZE after creating the index. I also see the same behavior when GIN-indexing an array column: the query slows down on the big table when the index is used.)
-
There are a number of variables at play. If the points are spatially fragmented, the query on the large table could end up as a multi-pass full table scan. – Vince, May 15, 2018 at 22:57
2 Answers
The huge number of lossy blocks occurs because you don't have enough memory (work_mem) available for the bitmap, so Postgres first flags whole blocks that may contain matching records, and then re-scans those blocks to actually select the rows.
This page illustrates the issue and offers a way of computing the required work_mem.
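For example, a minimal sketch of raising work_mem for the current session and re-checking the plan; the 256MB value is purely illustrative, not a recommendation:

SET work_mem = '256MB';  -- session-level only; size it from the lossy-block math on that page

EXPLAIN (ANALYZE, BUFFERS)
WITH my_polygon AS (
    SELECT ST_GeomFromEWKT('<EWKT polygon>') AS geom
)
SELECT count(*)
FROM points AS a
   , my_polygon AS p
WHERE ST_Within(a.geom, p.geom);

If the bitmap heap scan in the output no longer reports lossy blocks, work_mem is large enough for this query.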
The comments in that post are also interesting, as other memory settings can have an effect.
You may also want to CLUSTER your table on the spatial index, which should reduce the number of pages containing matching records.
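As a sketch, assuming the GiST index is named points_geom_idx (adjust to your actual index name); note that CLUSTER rewrites the whole table, locks it while running, and needs roughly the table's size in free disk space:

CLUSTER points USING points_geom_idx;  -- physically reorder rows to follow the spatial index
ANALYZE points;                        -- refresh planner statistics after the rewrite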
-
Good answer. I often play around with SET enable_seqscan = false with EXPLAIN. I hadn't thought to do this with things like work_mem too; I just tend to hack postgresql.conf and then leave it. Very helpful. – John Powell, May 16, 2018 at 8:18
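For reference, a sketch of that kind of session-scoped experiment; SET LOCAL keeps the changes inside one transaction, so nothing leaks into postgresql.conf:

BEGIN;
SET LOCAL enable_seqscan = off;  -- discourage sequential scans for this test only
SET LOCAL work_mem = '256MB';    -- illustrative value
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM points AS a
WHERE ST_Within(a.geom, ST_GeomFromEWKT('<EWKT polygon>'));
ROLLBACK;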
-
Thank you for your answer. I upped work_mem to 3GB, and now there are no lossy blocks (see the updated Gist from the original question). The execution time is unfortunately unchanged. I see from the post you linked that one should try to minimise 'Rows Removed by Filter', which is 1224016 in my case. How could this be achieved? I wanted to cluster the table, but limitations in available disk space stopped the process. As I understand it, clustering has to be redone after new inserts, which would require more and more free disk space as the table size increases. – Adrian Tofting, May 16, 2018 at 13:01
-
@AdrianTofting Clustering can be done at any time, whenever it is "necessary" to regroup data according to the selected index. Since it is a table rewrite, you must have at least the same amount of free space as the current table size. The post also mentions increasing other memory settings (the same behavior as you report in your comment). – JGH, May 18, 2018 at 12:54
-
Which settings are you referring to? I have now tried to tweak all the settings suggested on this page, including work_mem, shared_buffers, effective_cache_size and wal_buffers. Actually, when I apply the suggested settings, it slows down even more. – Adrian Tofting, May 19, 2018 at 18:04
This is Jack from this post. I'm actually experiencing the exact same issue again. Last time I fixed it by adding more RAM. This time adding more RAM didn't fix anything. Adrian, did you discover anything while fixing your problem?
-
Please add your comment under the question comments and not as an answer. Thanks. – Shiko, Aug 29, 2018 at 0:22
-
Actually, I did not. I solved it by forcing Postgres to first filter on an index on a timestamp column. The execution time went down from 4 min to 400 ms. Would still love an explanation, though! – Adrian Tofting, Aug 30, 2018 at 5:39
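A sketch of that kind of workaround, assuming a hypothetical timestamp column ts and date range; in the Postgres versions current at the time (pre-12), a CTE acted as an optimization fence, so the timestamp filter runs before the spatial test:

WITH prefiltered AS (
    -- hypothetical column and range: narrow the row set via the timestamp index first
    SELECT geom
    FROM points
    WHERE ts >= '2018-01-01' AND ts < '2018-02-01'
)
SELECT count(*)
FROM prefiltered AS a
WHERE ST_Within(a.geom, ST_GeomFromEWKT('<EWKT polygon>'));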
-
@AdrianTofting My problem seemed to have fixed itself after a few restarts. Some things are hard to explain but at least it works! – Jack, Aug 31, 2018 at 4:11