14

I am trying to cluster points in PostGIS. I have the points and the corresponding Latitudes and Longitudes which I have converted into point geometries. I want to cluster points in such a way that all the points within a cluster are within 25 miles of each other.

My input looks like this:

enter image description here

My desired output is this:

enter image description here

Here the new column Cluster_id denotes the cluster to which the point is assigned to. Please tell me the right query to get this done. I am new to SQL so I am finding hard to write the right query.

So far, I have tried this:

SELECT ST_AsText(unnest(ST_ClusterWithin(the_geom, 1))) FROM all_locations
geozelot
31.4k4 gold badges38 silver badges59 bronze badges
asked Jan 24, 2020 at 5:34
4
  • 2
    Well, from the docs, the 2nd parameter is the distance, which is stated in the units of your spatial reference system. I am guessing, in your case, this is lat/long (4326), so you are going to need to convert that 2nd parameter into 25 miles. Ideally, use a projected coordinate system, as calculating distance in 4326 is a bit fiddly. You should update your question stating what is wrong with what you have so far tried. Commented Jan 24, 2020 at 6:24
  • This is best achieved using ST_ClusterDBSCAN; this is a window function and does not aggregate rows! Morning @JohnPowell, long time no chat in comments ;-) Commented Jan 24, 2020 at 8:00
  • 1
    Yes, but that creates clusters where points could be much more than 25 miles apart, as it is based on the distance between neighbouring points, not a global distance. A windowed version of ClusterWithin would be helpful. Commented Jan 24, 2020 at 8:54
  • Same same; ST_ClusterWithin will collect all geometries that are not separated more than the threshold from each other! ST_ClsuterDBSCAN will produce different results if minpoints > 1, but for minpoints = 1 it produces the same result. For global distance (where the cluster radius is no more than a threshold) one would need a kernel/moving window based approach. However, the same issues concerning CRS units apply! Commented Jan 24, 2020 at 9:16

1 Answer 1

13

I'd suggest to use the ST_ClusterDBSCAN Window function rather than the Aggregate function ST_ClusterWithin:

SELECT *,
 ST_ClusterDBSCAN(the_geom, eps := <distance>, minpoints := 1) OVER() AS clst_id
FROM all_locations
;

clst_id will hold INT values representing the cluster each rows geometry belongs to.


As stated in the comments, ST_ClusterWithin will aggregate geometries that are separated by no more than the distance to each other; using minpoints := 1 in ST_ClusterDBSCAN will force the same effect.

Compare

WITH
 pts AS (
 SELECT ST_MakePoint(n, 0) As geom
 FROM Generate_Series(0, 5) AS n
 )
SELECT dmp.clst_id
FROM (
 SELECT ST_ClusterWithin(geom, 1) AS cw
 FROM pts
) AS clst,
 UNNEST(clst.cw) WITH ORDINALITY AS dmp (clst, clst_id),
 LATERAL ST_Dump(ST_CollectionExtract(dmp.clst, 1)) AS extr
;
 clst_id | geom 
---------+------------
 1 | POINT(0 0)
 1 | POINT(1 0)
 1 | POINT(2 0)
 1 | POINT(3 0)
 1 | POINT(4 0)
 1 | POINT(5 0)
(6 rows)

to

WITH
 pts AS (
 SELECT ST_MakePoint(n, 0) As geom
 FROM Generate_Series(0, 5) AS n
 )
SELECT ST_ClusterDBSCAN(geom, 1, 1) OVER() AS clst_id,
 ST_AsText(geom) AS geom
FROM pts
;
 clst_id | geom 
--------+------------
 0 | POINT(0 0)
 0 | POINT(1 0)
 0 | POINT(2 0)
 0 | POINT(3 0)
 0 | POINT(4 0)
 0 | POINT(5 0)
(6 rows)

In both cases the geometries are stretched over a total distance of 5 degrees, but count as one and the same cluster (ST_ClusterDBSCAN starts counting at 0, whereas the ORDINALITY stars at 1) since they are within distance/eps of 1 degree to each other!

This behavior may change for minpoints > 1 (and on other data than the above), as there need to be at least minpoints core geometries within eps distance to get counted as cluster.

Needless to say, the latter approach is way less convoluted, and offers some nice functionality built into the windowing behavior (e.g. easy clustering over attributes etc.)


Note:

Both functions assume distance/eps in units of the underlying CRS; for a geographic reference system, this is degrees! Since there is no signature accepting GEOGRAPHY for neither of them, you will need to ST_Transform your data into a suitable projection to be able to work with metric/imperial units.

answered Jan 24, 2020 at 9:58
3
  • Good answer as always. I have started playing with cuSpatial and other RapidsAI algorithms, as I was starting to feel the pain with certain types of Postgis query (kNN, being a particular pain point). Obviously, not deserting Postgres/Postgis, but there are some algos for which GPU speedups are impossible to ignore. Commented Jan 24, 2020 at 10:23
  • Thank You for your detailed and prompt answer @ThingumaBob. I have implemented your solution but what I have noticed is that in DBSCAN, all the points within a cluster are reachable from another point that is 25 miles away from it. But, I wanted to build clusters in such a way that every point within a cluster is less than 25 miles away from every other points within that cluster. Commented Jan 28, 2020 at 9:54
  • 3
    @KarthikKatragadda indeed, they do in both functions...in fact, no function does what you ask for, and for a reason; consider the example above (line of points, separated by 1 deg), where would you want the clusters to start and end? Is P(0 0)+P(1 0) a cluster, and eventually P(5 0) left out? Even in this very simple case, clustering with a global cluster radius s ambiguous, in real world data sets this ends up being totally random, if doable at all (computationally). A moving window approach might at least get you consistent assignments, but there is no out-of-the-box solution in PostGIS Commented Jan 28, 2020 at 10:21

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.