2

I have two point tables in SRID 4326: Table 1. unjoined_pts & Table 2. midpts. I want to join the attributes from the unjoined_pts table to the closest midpts table based on a few conditions. The query has yet to complete after sometime and I am not sure if I have fully optimized my code or if something is missing from my SQL query. I have spatial indices, vacuumed, and performed an ST_DWithin to limit the search area to only those points that are a maximum of 848 m away. The unjoined_pts table has 9,627 points and the midpts table has 1.98 million points.

 CREATE INDEX unjoined_pts_geom_idx ON public.unjoined_pts USING GIST (geom);
 CREATE INDEX midpts_geom_idx ON public.midpts USING GIST (geom);
 
-- functional index of points
 CREATE INDEX ON unjoined_pts USING gist(geography(geom));
 CREATE INDEX ON midpts USING gist(geography(geom));
 VACUUM ANALYZE public.midpts;
 VACUUM ANALYZE public.unjoined_pts;
 
 EXPLAIN CREATE TABLE nearest AS
 WITH knn_midpoints AS (
 SELECT p.lic_li_no, p.orig_licli, p.descriptor, p.attribute1, p.attribute2, ...., 
 p.attributeN, closest_midpoint.geom
 FROM public.unjoined_pts p
 CROSS JOIN LATERAL ( 
 SELECT m.geom, m.lic_li_no, m.orig_licli, m.descriptor 
 FROM public.midpts m 
 WHERE ST_DWithin(p.geom::geography, m.geom::geography, 848)
 AND p.lic_li_no = m.lic_li_no -- varchar (length=30)
 OR p.lic_li_no = m.orig_licli -- varchar (length=30)
 AND p.descriptor != m.descriptor -- varchar (length=50)
 ORDER BY 
 p.geom::geography <-> m.geom::geography -- nearest midpoint for each unjoined point 
 LIMIT 1 -- limit to only a single midpoint location
 ) AS closest_midpoint
 )
 SELECT * FROM knn_midpoints;

EXPLAIN results from a subset of the unjoined_pts, nut I used the same number of midpts. I removed the ST_DWithin. enter image description here

asked Mar 13, 2023 at 23:03
4
  • 2
    Most crucially, the cast to GEOGRAPHY is not covered by the index - you'd need a functional index ... USING GIST ( (geom::GEOGRAPHY) );. ST_DWithin is just noise with a cost for the (K)NN search - better remove it. Commented Mar 14, 2023 at 8:00
  • After what @geozelot said, if you still have problems, can you add an EXPLAIN of your request? Or even better, an EXPLAIN ANALYZE using smaller tables that you create but sampling the big ones ? Commented Mar 14, 2023 at 8:31
  • @robinloche agreed, generally, but just a warning: sampling may lead to completely different execution plans! Commented Mar 14, 2023 at 9:01
  • @geozelot I added the functional indices, but still significant process times (not completing). Commented Mar 14, 2023 at 15:50

1 Answer 1

1

First of all, why do you need to cast geoms to geography? At a distance of 848 m, there will be absolutely minuscule differences between spherical and planar systems.

Then I will try to avoid cross-join. Simple join will do just fine I thing. Something like this should return points that satisfy spatial and other conditions. It should be pretty quick to iterate over 9k or so points. Then you can grab them and left join them to your original table with 2M points (based on id or spatially...)

select p.lic_li_no, p.orig_licli, p.descriptor, p.attribute1, p.attribute2,...., p.attributeN, closest_midpoint.geom
 from public.unjoined_pts p
 left join lateral ( 
 select m.geom, m.lic_li_no, m.orig_licli, m.descriptor
 from public.midpts m
 where ST_DWithin(p.geom, m.geom, 848)
 and (p.lic_li_no = m.lic_li_no or p.lic_li_no = m.orig_licli)
 and p.descriptor != m.descriptor
 order by p.geom <-> m.geom
 limit 1) closeset_midpoint on true

--- EDIT ---

These indexes should also help:

create index midpts_licli_ix on midpts (lic_li_no);
create index midpts_orig_licli_ix on midpts (orig_licli);
answered Mar 15, 2023 at 11:38
10
  • still doesn't return any results. I have been letting it run for more than 1/2hr. Commented Mar 15, 2023 at 17:18
  • @MJM OK, what happens if you factor out non spatial conditions? Commented Mar 16, 2023 at 6:44
  • @MJM At a closer look at where condition there should be parentheses to properly group logical predicates. Please try to put parentheses around p.lic_li_no = m.lic_li_no or p.lic_li_no = m.orig_licli. Commented Mar 16, 2023 at 7:16
  • Query returned successfully in 20 min 39 secs. 9627 entries. Commented Mar 16, 2023 at 20:51
  • @MJM it is pretty slow in my opinion. I think there is still room for optimization... But if it is god enough for you... Commented Mar 16, 2023 at 22:14

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.