Postgres versions
- PostgreSQL 13.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.2.1_pre1) 10.2.1 20201203, 64-bit
- PostgreSQL 11.11 (Debian 11.11-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
Query
SELECT count(*) FROM "addresses" AS a1 WHERE ST_DWithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, a1.geom, '25000');
result is as expected result. But slow on PG11 (1.1s) and even slower on PG13 (2.9s)
Table definition
Table "public.addresses"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+--------------------------------+-----------+----------+---------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('addresses_id_seq'::regclass) | plain | |
description | text | | | | extended | |
country | character varying(255) | | | | extended | |
locality | character varying(255) | | | | extended | |
region | character varying(255) | | | | extended | |
postal_code | character varying(255) | | | | extended | |
street | text | | | | extended | |
geom | geometry | | | | main | |
inserted_at | timestamp(0) without time zone | | not null | | plain | |
updated_at | timestamp(0) without time zone | | not null | | plain | |
url | character varying(255) | | not null | | extended | |
origin_id | character varying(255) | | | | extended | |
type | character varying(255) | | | | extended | |
Indexes:
"addresses_pkey" PRIMARY KEY, btree (id)
"addresses_origin_id_index" UNIQUE, btree (origin_id)
"addresses_url_index" UNIQUE, btree (url)
"idx_addresses_geom" gist (geom)
Referenced by:
TABLE "actors" CONSTRAINT "actors_physical_address_id_fkey" FOREIGN KEY (physical_address_id) REFERENCES addresses(id)
TABLE "events" CONSTRAINT "events_physical_address_id_fkey" FOREIGN KEY (physical_address_id) REFERENCES addresses(id)
Access method: heap
Cardinalities
SELECT ST_SRID(geom) AS srid, count(*) from addresses group by 1;
srid | count
------+--------
| 1
4326 | 265011
Query plans
PG11
Finalize Aggregate (cost=52276.51..52276.52 rows=1 width=8) (actual time=1082.209..1095.487 rows=1 loops=1)
Buffers: shared hit=3444 read=4611
-> Gather (cost=52276.29..52276.50 rows=2 width=8) (actual time=1082.180..1095.468 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=3444 read=4611
-> Partial Aggregate (cost=51276.29..51276.30 rows=1 width=8) (actual time=897.135..897.138 rows=1 loops=3)
Buffers: shared hit=3444 read=4611
-> Parallel Seq Scan on addresses a1 (cost=0.00..51272.61 rows=1472 width=0) (actual time=6.796..894.827 rows=1960 loops=3)
Filter: (((geom)::geography && '0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography) AND ('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography && _st_expand((geom)::geography, '25000'::double precision)) AND _st_dwithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, (geom)::geography, '25000'::double precision, true))
Rows Removed by Filter: 86378
Buffers: shared hit=3444 read=4611
Planning Time: 1.167 ms
Execution Time: 1095.650 ms
PG13 (jit = off)
Finalize Aggregate (cost=2770301.18..2770301.19 rows=1 width=8) (actual time=2801.774..2830.421 rows=1 loops=1)
Buffers: shared hit=8227
-> Gather (cost=2770300.97..2770301.18 rows=2 width=8) (actual time=2800.661..2830.384 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=8227
-> Partial Aggregate (cost=2769300.97..2769300.98 rows=1 width=8) (actual time=2639.846..2639.849 rows=1 loops=3)
Buffers: shared hit=8227
-> Parallel Seq Scan on addresses a1 (cost=0.00..2769300.94 rows=11 width=0) (actual time=63.094..2637.528 rows=1960 loops=3)
Filter: st_dwithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, (geom)::geography, '25000'::double precision, true)
Rows Removed by Filter: 86378
Buffers: shared hit=8227
Planning Time: 0.424 ms
Execution Time: 2830.699 ms
There seems to be two issues : 1/ spacial index is not used 2/ PG13 is 3x slower one the same machine, same data.
1 Answer 1
The more pressing issue is the bad query. Shouldn't take > 1 sec in either instance. Your filter on a1."geom"::geography
. This cast disables any plain index on addresses.geom
- including the spatial index you have:
"idx_addresses_geom" gist (geom)
So we see ST_DWithin()
only as FILTER
while it should be the index condition to identify qualifying addresses quickly. You need an expression index instead. See:
Your query counts the number of events within 25 km of a given point.
The third parameter of ST_DWithin()
is the distance. Measured in meters for geography
. But for geometry
, the unit is defined by the reference system (SRID). The PostGis manual:
For
geometry
: The distance is specified in units defined by the spatial reference system of the geometries. For this function to make sense, the source geometries must be in the same coordinate system (have the same SRID).
Your example input is a geography
containing a point with SRID 4326:
SELECT ST_SRID ('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography); -- 4326
The unit of measurement for SRID 4326 is degrees. Either operate with geography
(more precise, more expensive) or with geometry
and a different SRID to keep using meters.
Solution
The best solution depends on more undisclosed information. My example solution operates on transformed geometries in index and query:
CREATE INDEX idx_addresses_geom_27571 ON pg_temp.addresses USING gist (ST_Transform(geom, 27571));
SELECT count(*) -- 1
FROM addresses a
JOIN events e ON e.physical_address_id = a.id
WHERE ST_DWithin(ST_Transform('0101000020E61000001EE1B4E0455F0340E92807B309664840664840'::geometry, 27571)
, ST_Transform(a.geom, 27571)
, '25000'); -- 2
1 A minor improvement. Assuming events.begins_on
is defined NOT NULL
, count(*)
is equivalent in this query, and a bit faster.
2 The main issue as described.
I picked SRID 27571 to accommodate measurement in meters geometries in northern France (as indicated by your example). Picking the most suitable reference system is up to you.
Obviously, you only need to transform input geometries that don't have the right SRID, yet.
Related:
Performance regression in Postgres 13 / PostGIS ???
For the fixed query, I'd expect Postgres 13 to be a bit faster than Postgres 11 on an equivalent and properly configured server.
But why is the original, bad query faster with Postgres 11?
Postgres 11:
Filter: (((geom)::geography && '0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography) AND ('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography && _st_expand((geom)::geography, '25000'::double precision)) AND _st_dwithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, (geom)::geography, '25000'::double precision, true))
Postgres 13:
Filter: st_dwithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, (geom)::geography, '25000'::double precision, true)
In Postgres, index support it bound to operators, not functions. Older version of PostGIS had to "hack" the system by using inline-able functions to bring in the operators required for index usage. So ST_DWithin(geog1, geog2, 25000)
is replaced with ((geog2 && geog1) AND (geog1 && _st_expand(geog2, 25000)) AND _st_dwithin(geog1, geog2, 25000, true))
to add bounding box operators. See:
There was a major change to this whole architecture in Postgres 12 and PostGIS 3.0. Quoting the release notes for PostGis 3.0:
4341, Using "support function" API in PgSQL 12+ to replace SQL inlining as the mechanism for providing index support under ST_Intersects, et al
Since then, PostGis functions supposed to use an index have an attached "support function" instead of being rewritten with function inlining. The Postgres manual:
For target functions that return
boolean
, it may be possible to convert a function call appearing inWHERE
into an indexable operator clause or clauses. The converted clauses might be exactly equivalent to the function's condition, or they could be somewhat weaker (that is, they might accept some values that the function condition does not). In the latter case the index condition is said to be lossy; it can still be used to scan an index, but the function call will have to be executed for each row returned by the index to see if it really passes theWHERE
condition or not. To create such conditions, the support function must implement theSupportRequestIndexCondition
request type.
The query plan for my adapted query + index (as detailed above) looks like this for me in Postgres 13 and PostGIS 3.1.3:
'Aggregate (cost=52.48..52.48 rows=1 width=8)'
' -> Index Scan using idx_addresses_geom_27571 on addresses a (cost=0.26..52.47 rows=1 width=0)'
' Index Cond: (st_transform(geom, 27571) && st_expand(''0101000020B36B00006E51041D4B8022411FF9F0303F1E3141''::geometry, ''25000''::double precision))'
' Filter: st_dwithin(''0101000020B36B00006E51041D4B8022411FF9F0303F1E3141''::geometry, st_transform(geom, 27571), ''25000''::double precision)'
Note the added index condition (st_transform(geom, 27571) && st_expand(''0101000020B36B00006E51041D4B8022411FF9F0303F1E3141''::geometry, ''25000''::double precision))'
But the Filter
line sticks to the original functional expression.
It would seem that, without index support, executing the rewritten functional expression with bounding box operators in Postgres 11 is substantially cheaper than the new implementation in Postgres 13..
Either you are not using the appropriate PostGIS version for Postgres 13 (???) or you found a regression, that might be worth looking into.
Asides
timestamp(0) without time zone
may not be the best choice. The expression rounds the timestamp. When rounding up, the timestamp may be in the future, technically. Executing something like inserted_at <= now()
in the same transaction may fail you, and you'll never find out why. Well, until just now. See:
Also timestamptz
is typically the better choice for international data. See:
varchar(255)
hardly ever makes sense. See:
- Do fixed-width rows improve PostgreSQL read performance?
- Any downsides of using data type "text" for storing strings?
You have a single row with geom IS NULL
. You might fix that and declare the column NOT NULL
.
-
Thanks for your detailed explanation. I tested every proposal : remove cast, add geo index, simplify query (I edit my question with it). Same result : same timing (PG11 : 1.2s, PG13 : 3s), same execution plan :(Setop– Setop2021年09月29日 08:13:23 +00:00Commented Sep 29, 2021 at 8:13
-
about degrees versus meters, I'm surprised because the result I get in the app seems correct : I get point of interest within 25km of the point I give. I'll double check.Setop– Setop2021年09月29日 08:17:39 +00:00Commented Sep 29, 2021 at 8:17
-
@The result you get ... for what query?Erwin Brandstetter– Erwin Brandstetter2021年09月29日 11:18:42 +00:00Commented Sep 29, 2021 at 11:18
-
the one you mention in your answer
SELECT count(*) -- 2 FROM addresses a JOIN events e ON e.physical_address_id = a.id WHERE ST_DWithin('<some_geometry>, a.geom, '25000');
Setop– Setop2021年09月29日 13:56:19 +00:00Commented Sep 29, 2021 at 13:56 -
1@Setop: Instead of this piecemeal strategy in comments, please provide all relevant data in the question. See instructions here: dba.stackexchange.com/tags/postgresql-performance/info Also disclose SRID of data in table columns and input data. The query should use the index (unless your search area covers major parts of all rows in the table).Erwin Brandstetter– Erwin Brandstetter2021年09月30日 13:18:07 +00:00Commented Sep 30, 2021 at 13:18
Explore related questions
See similar questions with these tags.
addresses_pkey
is the primary key index. Don't think it can be dropped.explain analyse SELECT * FROM "addresses" AS a1 WHERE ST_DWithin('0101000020E61000001EE1B4E0455F0340E92807B309664840'::geography, a1."geom"::geography, '25000');
null
cases properly, run the cardinality query again asSELECT ST_SRID(geom) AS srid, count(*) from addresses group by 1;