4

I've 2 sets of polygons in 2 tables and the sets overlap each other. For each polygon in set A, I get the ID of the polygon in set B that it overlaps the most.

This is the query:

SELECT DISTINCT ON (a.id)
 a.id as a_id,
 b.id as b_id,
 ST_Area(ST_Intersection(a.geom, b.geom)) as intersect_area
FROM a, b
ORDER BY a.id, ST_Area(ST_Intersection(a.geom, b.geom)) DESC

It works properly so, what's the problem? It's very slow (it takes several minutes). Table "a" has more than 10,000 rows and table "b" 700 rows. So, how can I improve the perfomance?

I've tried with spatial index but I haven't had better results.

geozelot
31.4k4 gold badges38 silver badges59 bronze badges
asked Apr 23, 2021 at 10:10
2
  • 1
    Your query result should have 10000*700 rows ... because your forget the where condition to take in account only the features of a and b that really intersect. Commented Apr 23, 2021 at 10:38
  • please always add the results of an EXPLAIN or EXPLAIN ANALYZE run in SQL query questions. Commented Apr 23, 2021 at 10:49

1 Answer 1

7

An implicit CROSS JOIN (a, b) creates an unconditional (no WHERE) cartesian product of both tables, that is, for every row in a, join every row of table b - massing up 10.000 * 700 rows in the result set! The DISTINCT ON filter adds overhead in parsing the complete result set for un-indexed duplicates, and the ORDER BY is equally not optimized.

It is good practice to express relationships with explicit JOIN statements, and use conditional expressions that can actually utilize (spatial) indexes.

Since a full traversal of a is the basic idea behind your task, to actually get a conditionally limited (ordered) result set per row you want to use the LATERAL JOIN mechanics:

SELECT a.id,
 i.id,
 i.intersect_area
FROM a
LEFT JOIN LATERAL (
 SELECT b.id,
 ST_Area(ST_Intersection(a.geom, b.geom)) AS intersect_area
 FROM b
 WHERE ST_Intersects(a.geom, b.geom)
 ORDER BY
 2 DESC
 LIMIT 1
) AS i ON TRUE
;

While traversing a, the LATERAL JOIN passes the current row of a into the sub-query, where ST_Intersects filters (index driven) for intersecting polygons in b, and returns only the one (LIMIT 1) with the largest intersection area (ORDER BY [ST_Area(ST_Intersection(a.geom, b.geom))]). Using a LEFT JOIN ensures that rows in a with no overlapping geometries in b will be kept in the result; if that is undesireable, use CROSS JOIN LATERAL (<sub_query>) AS i without ON TRUE.

Notes:

  • an index on b.geom is mandatory
  • the result of ST_Area will be in units of CRS - which, if applied to a geographic reference system with degree as unit, is useless - either project (ST_Transform) to a suitable projection, or use a cast to GEOGRAPHY
answered Apr 23, 2021 at 10:39
1
  • For other proposites for example performance can be equal cross to a left join en this case? Commented Mar 4, 2024 at 20:43

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.