I have a dataset of roads that I am trying to clean up, and I am looking for suggestions on workflows that will reduce the manual find-and-remove process.
The dataset is the legacy result of merging a few datasets, some of which shared the same roads but with slight offsets. My guess is that coordinate system differences in the original sources led to linework that is very close but not identical; see the image below for an example.
(image: the same road with a slight offset)
Therefore I cannot use tools like selecting identical features or removing duplicates without somehow accounting for that offset. The attributes are also no help, as they do not contain identical road names or other fields I could use to identify duplicate records. Their shape_length values are usually within a couple of metres of each other, so again similar but not identical.
Any suggestions on how to first select the records in this dataset that are likely offset duplicates? I could then at least flag just those records to weed through, or ideally use another process in QGIS to remove one copy and leave the other.
I am not very Python savvy, so I am hoping for a process focused on geoprocessing tools, or something like exporting the table to Excel, identifying candidates for removal there, joining back to the spatial table, and using that to check candidates.
So far my approach is to add "StartN" and "StartE" fields to the data and calculate geometry on both as Long integers in UTM. The start points of the lines are usually within 1 m of each other, giving me identical coordinate values to identify in Excel and flag, then I rejoin the Excel table to the spatial table, filter for the flag, and assess the shorter list of roads that may be duplicates. It seems to be working OK so far, but any other suggestions on improving or automating it are welcome. – user25644, Mar 12, 2025
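A minimal SpatiaLite sketch of that start-point flagging step, runnable from the QGIS DB Manager so the Excel round trip can be skipped; the layer name roads, the id field and the 1 m rounding are assumptions, so adjust them to your data:

-- Pair up features whose line start points round to the same metre
-- (roads, id and the rounding step are placeholders, not the poster's exact setup)
select a.id as aid,
       b.id as bid
from roads a
join roads b
  on a.id < b.id
 and round(ST_X(ST_StartPoint(a.geometry))) = round(ST_X(ST_StartPoint(b.geometry)))
 and round(ST_Y(ST_StartPoint(a.geometry))) = round(ST_Y(ST_StartPoint(b.geometry)))

As FelixIP notes below, start points miss pairs digitized in opposite directions, and rounding can split two nearly identical coordinates across a metre boundary, so treat the output as candidates to review rather than confirmed duplicates.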
I'd do the same thing, but use midpoints and run Generate Near Table on the layer against itself. That accounts for lines digitized in different directions, and no Excel is needed. – FelixIP, Mar 12, 2025
If you wish to also ask about ArcGIS Pro then please do that in a separate question. – PolyGeo ♦, Mar 12, 2025
@FelixIP how does this work when the features are all in the same feature class? I get a table with a bunch of "0" values when comparing the layer against itself, since everything overlaps. – user25644, Mar 13, 2025
Have a good look at the Generate Near Table options. It is Esri I am talking about. – FelixIP, Mar 13, 2025
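Generate Near Table is an ArcGIS tool, but the same midpoint idea can be sketched in QGIS with a SpatiaLite query in the DB Manager; the layer name roads, the id field and the 5 m cut-off below are assumptions, not part of the commenters' workflow:

-- Distance between line midpoints for nearby feature pairs
select a.id as aid,
       b.id as bid,
       ST_Distance(ST_Line_Interpolate_Point(a.geometry, 0.5),
                   ST_Line_Interpolate_Point(b.geometry, 0.5)) as middist
from roads a
join roads b
  on a.id < b.id
 and ST_Distance(ST_Line_Interpolate_Point(a.geometry, 0.5),
                 ST_Line_Interpolate_Point(b.geometry, 0.5)) < 5
order by middist

Because a.id < b.id is used, each pair appears once and a feature is never compared with itself, which avoids the all-zero distances mentioned above.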
1 Answer
A bit complicated, but you can try comparing the geometries with the Hausdorff distance using SQL in the DB Manager. Similar geometries will get a small value.
Join the layer to itself on features that are within a certain distance of each other. You need a unique id field; mine is named id, and my layer is named Merged.
select
    row_number() over() as id,                                -- running id for the result rows
    a.id as aid,
    b.id as bid,
    HausdorffDistance(a.geometry, b.geometry) as hausdiff,    -- small value = near-duplicate geometry
    a.geometry
from "Merged" a
join "Merged" b
    on st_distance(a.geometry, b.geometry) < 20               -- only compare features within 20 m
   and a.id < b.id                                            -- skip self-pairs and mirrored pairs
order by a.id, b.id, hausdiff
When lines 1 and 2 are compared (green arrows) they get a value of 31; from the map I can see they are not duplicates, but 1 and 281 are, and their distance is just 2.8.
By checking a sample of lines I conclude that lines with a value below 10 are duplicates. I add a where clause to the query, where hausdiff < 10, and load the layer into the project.
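For reference, the filtered version of the query looks roughly like this; the Hausdorff expression is repeated in the where clause rather than reusing the hausdiff alias, which keeps it valid across SQL dialects, and the 10 m cut-off is the sample-checked value from above:

select
    row_number() over() as id,
    a.id as aid,
    b.id as bid,
    HausdorffDistance(a.geometry, b.geometry) as hausdiff,
    a.geometry
from "Merged" a
join "Merged" b
    on st_distance(a.geometry, b.geometry) < 20
   and a.id < b.id
where HausdorffDistance(a.geometry, b.geometry) < 10
order by a.id, b.id, hausdiff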
Then select by attributes on your original line layer the features that are in the loaded query layer:
array_contains(
    array := aggregate(layer := 'QueryLayer', aggregate := 'array_agg', expression := "aid"),
    value := "id"
)
Start editing and delete the selected features. Because only the aid (lower-id) member of each pair is selected, the other copy of each road is left in place.