I am trying to cluster points in PostGIS. I have the points and the corresponding Latitudes and Longitudes which I have converted into point geometries. I want to cluster points in such a way that all the points within a cluster are within 25 miles of each other.
My input looks like this:
My desired output is this:
Here the new column Cluster_id
denotes the cluster to which the point is assigned to. Please tell me the right query to get this done. I am new to SQL so I am finding hard to write the right query.
So far, I have tried this:
SELECT ST_AsText(unnest(ST_ClusterWithin(the_geom, 1))) FROM all_locations
1 Answer 1
I'd suggest to use the ST_ClusterDBSCAN
Window function rather than the Aggregate function ST_ClusterWithin
:
SELECT *,
ST_ClusterDBSCAN(the_geom, eps := <distance>, minpoints := 1) OVER() AS clst_id
FROM all_locations
;
clst_id
will hold INT
values representing the cluster each rows geometry belongs to.
As stated in the comments, ST_ClusterWithin
will aggregate geometries that are separated by no more than the distance
to each other; using minpoints := 1
in ST_ClusterDBSCAN
will force the same effect.
Compare
WITH
pts AS (
SELECT ST_MakePoint(n, 0) As geom
FROM Generate_Series(0, 5) AS n
)
SELECT dmp.clst_id
FROM (
SELECT ST_ClusterWithin(geom, 1) AS cw
FROM pts
) AS clst,
UNNEST(clst.cw) WITH ORDINALITY AS dmp (clst, clst_id),
LATERAL ST_Dump(ST_CollectionExtract(dmp.clst, 1)) AS extr
;
clst_id | geom
---------+------------
1 | POINT(0 0)
1 | POINT(1 0)
1 | POINT(2 0)
1 | POINT(3 0)
1 | POINT(4 0)
1 | POINT(5 0)
(6 rows)
to
WITH
pts AS (
SELECT ST_MakePoint(n, 0) As geom
FROM Generate_Series(0, 5) AS n
)
SELECT ST_ClusterDBSCAN(geom, 1, 1) OVER() AS clst_id,
ST_AsText(geom) AS geom
FROM pts
;
clst_id | geom
--------+------------
0 | POINT(0 0)
0 | POINT(1 0)
0 | POINT(2 0)
0 | POINT(3 0)
0 | POINT(4 0)
0 | POINT(5 0)
(6 rows)
In both cases the geometries are stretched over a total distance of 5 degrees, but count as one and the same cluster (ST_ClusterDBSCAN
starts counting at 0, whereas the ORDINALITY
stars at 1) since they are within distance/eps
of 1 degree to each other!
This behavior may change for minpoints > 1
(and on other data than the above), as there need to be at least minpoints
core geometries within eps
distance to get counted as cluster.
Needless to say, the latter approach is way less convoluted, and offers some nice functionality built into the windowing behavior (e.g. easy clustering over attributes etc.)
Note:
Both functions assume distance/eps
in units of the underlying CRS; for a geographic reference system, this is degrees! Since there is no signature accepting GEOGRAPHY
for neither of them, you will need to ST_Transform
your data into a suitable projection to be able to work with metric/imperial units.
-
Good answer as always. I have started playing with cuSpatial and other RapidsAI algorithms, as I was starting to feel the pain with certain types of Postgis query (kNN, being a particular pain point). Obviously, not deserting Postgres/Postgis, but there are some algos for which GPU speedups are impossible to ignore.John Powell– John Powell2020年01月24日 10:23:57 +00:00Commented Jan 24, 2020 at 10:23
-
Thank You for your detailed and prompt answer @ThingumaBob. I have implemented your solution but what I have noticed is that in DBSCAN, all the points within a cluster are reachable from another point that is 25 miles away from it. But, I wanted to build clusters in such a way that every point within a cluster is less than 25 miles away from every other points within that cluster.Karthik Katragadda– Karthik Katragadda2020年01月28日 09:54:13 +00:00Commented Jan 28, 2020 at 9:54
-
3@KarthikKatragadda indeed, they do in both functions...in fact, no function does what you ask for, and for a reason; consider the example above (line of points, separated by 1 deg), where would you want the clusters to start and end? Is P(0 0)+P(1 0) a cluster, and eventually P(5 0) left out? Even in this very simple case, clustering with a global cluster radius s ambiguous, in real world data sets this ends up being totally random, if doable at all (computationally). A moving window approach might at least get you consistent assignments, but there is no out-of-the-box solution in PostGISgeozelot– geozelot2020年01月28日 10:21:13 +00:00Commented Jan 28, 2020 at 10:21
Explore related questions
See similar questions with these tags.
ST_ClusterDBSCAN
; this is a window function and does not aggregate rows! Morning @JohnPowell, long time no chat in comments ;-)ST_ClusterWithin
will collect all geometries that are not separated more than the threshold from each other!ST_ClsuterDBSCAN
will produce different results ifminpoints > 1
, but forminpoints = 1
it produces the same result. For global distance (where the cluster radius is no more than a threshold) one would need a kernel/moving window based approach. However, the same issues concerning CRS units apply!