I have two tables, poi and categories, with the schemas below.
POI table:
id | name | category | geog |
---|---|---|---|
1 | poi-1 | cat-1 | point() |
2 | poi-2 | cat-1 | point() |
3 | poi-3 | cat-2 | point() |
4 | poi-4 | cat-3 | point() |
.. | .. | .. | .. |
Number of records in the table: about 1.8M
Categories table:
id | category | cat_type |
---|---|---|
1 | cat-1 | group-1 |
2 | cat-2 | group-1 |
3 | cat-3 | group-2 |
4 | cat-4 | group-3 |
.. | ... | ... |
3000 | cat-3000 | group-78 |
Total number of categories: about 3000
Total number of category types: 80
What I am trying to achieve
I would like to find the nearest point of interest (by distance) in the poi table to a given lat/long, for each category type.
For example, for lat/long 53.960448, -1.092345, I would like to find the nearest geometry with categories (cat-1, cat-2, cat-3).
What I have done so far
SELECT up.id , up.name, up.category, up.geog <-> 'SRID=4326;MULTIPOINT ((-1.092345 53.960448))'::geography as distance
FROM poi up
WHERE up.category in (SELECT category FROM categories WHERE cat_type = 'group-1')
ORDER BY distance
LIMIT 1;
The query above gives me the nearest point to a lat/long for only one group of categories. To get the nearest point for all category types, I currently have to run this query 80 times (once per category group).
Is there a way to optimize this, or to achieve the required result in a better way?
Result I am expecting
What I expect is the nearest point of interest for each category type, with its distance.
poi_id | category | distance |
---|---|---|
1 | cat-1 | 215 |
2 | cat-2 | 582 |
3 | cat-3 | 217 |
4 | cat-4 | 852 |
.. | ... | ... |
Update 1
The solution provided by @dr_jts returns the required result. Below is my version of the query, which returns the result in about 14-16 seconds.
SELECT cat_type, id, latitude, longitude, dist
FROM (SELECT dce.cat_type, array_agg(dce.category) as cats
FROM categories dce group by cat_type ) AS grps
CROSS JOIN LATERAL
(SELECT d.id, d.latitude, d.longitude,
geog <-> 'SRID=4326;MULTIPOINT ((-1.100818 53.956503))'::geography AS dist
FROM poi d
WHERE d.category = ANY(grps.cats)
ORDER BY dist LIMIT 1) AS d;
Below is the EXPLAIN ANALYZE output for the query:
Nested Loop  (cost=12.75..46877.63 rows=71 width=68) (actual time=24.431..14579.211 rows=77 loops=1)
  ->  HashAggregate  (cost=12.34..13.22 rows=71 width=47) (actual time=1.138..1.713 rows=77 loops=1)
        Group Key: dce.cat_type
        Batches: 1  Memory Usage: 80kB
        ->  Seq Scan on categories dce  (cost=0.00..9.89 rows=489 width=31) (actual time=0.512..0.974 rows=516 loops=1)
  ->  Limit  (cost=0.41..660.03 rows=1 width=53) (actual time=189.314..189.315 rows=1 loops=77)
        ->  Index Scan using poi_geog_idx on poi d  (cost=0.41..5309278.81 rows=8049 width=53) (actual time=189.310..189.310 rows=1 loops=77)
              Order By: (geog <-> '0104000020E6100000010000000101000000EF91CD55F39CF1BF50C3B7B06EFA4A40'::geography)
              Filter: ((category)::text = ANY (((array_agg(dce.category)))::text[]))
              Rows Removed by Filter: 5682
Planning Time: 1.665 ms
Execution Time: 14580.360 ms
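A rough reading of the plan above, assuming the usual EXPLAIN ANALYZE convention that actual time and row counts for a node with loops > 1 are per-loop averages: the inner Limit / Index Scan runs once per category type, and multiplying it out accounts for nearly all of the reported execution time.

```latex
77 \times 189.31\,\text{ms} \approx 14\,577\,\text{ms} \approx 14.58\,\text{s}
\quad\text{vs.}\quad \text{Execution Time} = 14\,580.360\,\text{ms}
```

So the cost sits in the 77 nearest-neighbour scans on poi_geog_idx, each of which discards on average about 5,682 rows through the category Filter before reaching a matching POI; building the category groups (HashAggregate plus Seq Scan) takes under 2 ms.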
Answers
When doing "nearest" queries, it's most efficient to use the PostGIS nearest-neighbour (KNN) functionality, i.e. ORDER BY with the <-> distance operator backed by a spatial index.
To find the nearest neighbour in each group, a separate (internal) query is required. This can be expressed compactly in a single SQL statement by using JOIN LATERAL on the distinct group values:
WITH cat(category, grp) AS (VALUES
(1, 'group-1')
,(2, 'group-1')
,(3, 'group-2')
,(4, 'group-2')
,(5, 'group-2')
,(6, 'group-3')
,(7, 'group-3')
),
data(id, category, geom) AS (VALUES
(1, 1, 'POINT (0 0)'::geometry)
,(2, 2, 'POINT (1 1)'::geometry)
,(3, 3, 'POINT (0 0)'::geometry)
,(4, 4, 'POINT (1 1)'::geometry)
,(5, 5, 'POINT (2 2)'::geometry)
,(6, 6, 'POINT (0 0)'::geometry)
,(7, 7, 'POINT (1 1)'::geometry)
,(8, 7, 'POINT (2 2)'::geometry)
)
SELECT id, grp, dist, geom
FROM (SELECT DISTINCT grp FROM cat) AS grps
CROSS JOIN LATERAL
(SELECT d.id, d.category, d.geom,
geom <-> ST_Point( 0.1, 0.1 ) AS dist
FROM data d JOIN cat c ON d.category = c.category
WHERE c.grp = grps.grp
ORDER BY dist LIMIT 1) AS d;
You can also use LEFT JOIN to ensure all groups are included in the result even if there are no points in a group. – dr_jts, Sep 9, 2022 at 20:18
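A minimal sketch of that LEFT JOIN variant, written against the question's poi/categories layout but with small made-up VALUES rows (the two CTEs shadow the real tables; drop them to run against real data). A LEFT JOIN LATERAL needs an ON clause, so ON true is used. In this toy data group-3 has no POIs, so it comes back with NULL columns instead of disappearing from the result.

```sql
-- Toy data, made up for illustration; remove these two CTEs to query the real tables.
WITH categories(id, category, cat_type) AS (VALUES
    (1, 'cat-1', 'group-1'),
    (2, 'cat-2', 'group-1'),
    (3, 'cat-3', 'group-2'),
    (4, 'cat-4', 'group-3')            -- no POI uses cat-4
),
poi(id, name, category, geog) AS (VALUES
    (1, 'poi-1', 'cat-1', 'SRID=4326;POINT(-1.09 53.96)'::geography),
    (2, 'poi-2', 'cat-2', 'SRID=4326;POINT(-1.10 53.95)'::geography),
    (3, 'poi-3', 'cat-3', 'SRID=4326;POINT(-1.20 53.90)'::geography)
)
SELECT grps.cat_type, d.id AS poi_id, d.category, d.dist
FROM (SELECT cat_type, array_agg(category) AS cats
      FROM categories
      GROUP BY cat_type) AS grps
-- LEFT JOIN LATERAL keeps groups whose subquery returns no row; ON true is required syntax.
LEFT JOIN LATERAL
    (SELECT p.id, p.category,
            p.geog <-> 'SRID=4326;POINT(-1.092345 53.960448)'::geography AS dist
     FROM poi p
     WHERE p.category = ANY (grps.cats)
     ORDER BY dist
     LIMIT 1) AS d ON true
ORDER BY grps.cat_type;
```

With this data, group-1 should resolve to poi-1 (about 160 m away, closer than poi-2), group-2 to poi-3, and group-3 to a row of NULLs.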
The JOIN to the cat table can be avoided by using Postgres arrays:
SELECT id, grp, dist, geom
FROM (SELECT grp, array_agg(category) as cats FROM cat GROUP BY grp) AS grps
CROSS JOIN LATERAL
(SELECT d.id, d.category, d.geom,
geom <-> ST_Point( 0.1, 0.1 ) AS dist
FROM data d
WHERE d.category = ANY (grps.cats)
ORDER BY dist LIMIT 1) AS d;
– dr_jts, Sep 9, 2022 at 20:19
Hi there, this is surely a better and faster solution, but I am not sure why the query still takes about 14-16 seconds. Is there any way to optimize it further? – apaleja, Sep 10, 2022 at 13:21
Do you have appropriate indexes on the tables? – dr_jts, Sep 10, 2022 at 16:42
I have indexes on id and geometry for the data/point of interest table, and on id for the categories table. – apaleja, Sep 10, 2022 at 17:17
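One index change that may help, sketched under assumptions rather than tested on this data: the plan in the question shows poi_geog_idx ordering by distance and then filtering on category, so each group's scan keeps reading rows until a matching category appears. A multi-column GiST index on (category, geog), which needs the btree_gist extension for the text column, lets a per-category "category = ... ORDER BY geog <-> ... LIMIT 1" probe stop at the first row. As far as I know GiST does not accept "= ANY (array)" as an index condition in an ordered scan, so the sketch probes each category individually (about 3000 cheap probes) and keeps the closest row per cat_type with DISTINCT ON. The index name poi_category_geog_idx is hypothetical; check with EXPLAIN ANALYZE that the planner actually uses it.

```sql
-- Assumption: btree_gist can be installed; it provides GiST operator classes for text/varchar.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical index name: category equality plus KNN ordering on geog in one index.
CREATE INDEX IF NOT EXISTS poi_category_geog_idx
    ON poi USING gist (category, geog);

-- One nearest-neighbour probe per category, then keep the closest row per cat_type.
SELECT DISTINCT ON (c.cat_type)
       c.cat_type, n.id AS poi_id, n.category, n.dist
FROM categories c
CROSS JOIN LATERAL
    (SELECT p.id, p.category,
            p.geog <-> 'SRID=4326;POINT(-1.092345 53.960448)'::geography AS dist
     FROM poi p
     WHERE p.category = c.category
     ORDER BY dist
     LIMIT 1) AS n
ORDER BY c.cat_type, n.dist;
```

The trade-off is more index probes (one per category instead of one per group), each of which should be short because the category condition is applied inside the index rather than as a post-filter.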
It's better to normalize the data structure first. With the existing structure, try:
SELECT p.id, p.category, y.distance
FROM
( SELECT min(o.id) as id, c.cat_type, x.distance
FROM
( SELECT t.cat_type, min(i.geog <-> 'SRID=4326;MULTIPOINT ((-1.092345 53.960448))'::geography) as distance
FROM poi i
LEFT JOIN categories t on t.category = i.category
GROUP BY t.cat_type
) x
LEFT JOIN categories c on c.cat_type = x.cat_type
LEFT JOIN poi o on o.category = c.category and o.geog <-> 'SRID=4326;MULTIPOINT ((-1.092345 53.960448))'::geography = x.distance
GROUP BY c.cat_type, x.distance
) y
LEFT JOIN poi p on p.id = y.id
I didn't test it on real data, but this query is at least a first step toward solving your problem.
That's a great first step. I ran this query on my data and it takes about 25-27 seconds to return the result. I am open to normalizing the data structure; any suggestions? Any suggestions for optimizing this query further are also welcome. – apaleja, Sep 9, 2022 at 14:46
Replaced group with cat_type for better understanding.