I have a dataset of roads that I am trying to clean up, and I am looking for suggestions on workflows that will reduce the manual process of find-and-remove.

The dataset is the legacy result of merging several datasets, some of which contained the same roads with slight offsets. My guess is that coordinate system differences in the original sources led to linework that is very close but not identical; see the image below for an example.

the same road but with slight offset

Therefore I cannot use something like Select Identical or Remove Duplicates without some way of accounting for that offset. The attributes do not help either, as they do not contain identical road names or other values I could use to identify duplicate records. Their shape_length values are usually within a couple of meters of each other, so again similar but not identical.

Any suggestions on how to first select the records in this dataset that are likely offset duplicates, so that I can at least flag just those records to weed through, or ideally use another process in QGIS to remove one and leave the other?

I am not very Python savvy, so I am hoping to find a process focused on geoprocessing tools, or something like exporting the table to Excel, identifying candidates for removal there, then joining back to the spatial data and using that to check the candidates.

PolyGeo
asked Mar 12 at 17:40
  • So far my approach is to add "StartN" and "StartE" fields to the data and calculate geometry on both as Long in UTM. The start points of the duplicate lines are usually within 1 m of each other, which gives identical coordinate values that I can identify in Excel and mark with a "flag". I then rejoin the Excel table to the spatial table, filter on the flag, and assess the shorter list of roads that may be duplicates. It seems to be working OK so far, but any other suggestions on improving/automating are welcome. Commented Mar 12 at 17:56
  • I'd do the same thing, but use midpoints and generate a near table of the layer against itself. It accounts for lines digitized in different directions, and no Excel is needed. Commented Mar 12 at 19:28
  • If you wish to also ask about ArcGIS Pro then please do that in a separate question. Commented Mar 12 at 19:31
  • @FelixIP How does this work when the features are all in the same feature class? I get a table with a bunch of "0" values when comparing the features against themselves, as everything overlaps. Commented Mar 13 at 15:34
  • Have a good look at the Generate Near Table options. It is the Esri tool I am talking about. Commented Mar 13 at 18:08
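
The midpoint idea from the comments can be sketched in plain Python to show why it does not care about line direction. This is only an illustration: the road ids, coordinates, and the 5 m tolerance are made up, and the all-pairs loop is an assumption that suits small samples; on a real layer a near-table or nearest-neighbour tool would do the pairing.

```python
import math

def line_midpoint(coords):
    """Point halfway along a polyline given as a list of (x, y) vertices."""
    seg_lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    remaining = sum(seg_lengths) / 2.0
    for a, b, length in zip(coords, coords[1:], seg_lengths):
        if length > 0 and length >= remaining:
            t = remaining / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return coords[-1]  # degenerate line: fall back to the last vertex

def candidate_pairs(roads, tolerance):
    """Pairs of road ids whose midpoints lie within `tolerance` map units."""
    mids = {rid: line_midpoint(coords) for rid, coords in roads.items()}
    ids = sorted(mids)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if math.dist(mids[a], mids[b]) <= tolerance
    ]

# Hypothetical sample: road 281 is road 1 shifted by 2 m and digitized in reverse.
roads = {
    1: [(0.0, 0.0), (100.0, 0.0)],
    2: [(0.0, 0.0), (0.0, 80.0)],
    281: [(100.0, 2.0), (0.0, 2.0)],
}
pairs = candidate_pairs(roads, tolerance=5.0)
print(pairs)
```

Start points of roads 1 and 281 are 100 m apart because of the reversed direction, while their midpoints are only 2 m apart, so the pair is still picked up.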

1 Answer

This is a bit complicated, but you can compare the geometries by Hausdorff distance using SQL in the DB Manager. Similar geometries will get a small value.
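
To get a feel for what that number measures, here is a minimal pure-Python sketch of a vertex-based Hausdorff distance between two lines. It is an approximation written for illustration, not the code behind the SQL function, and the sample coordinates are invented.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def directed_distance(line_a, line_b):
    """Largest distance from any vertex of line_a to the nearest point on line_b."""
    return max(
        min(point_segment_distance(p, a, b) for a, b in zip(line_b, line_b[1:]))
        for p in line_a
    )

def hausdorff_distance(line_a, line_b):
    """Symmetric distance: the worse of the two directed distances."""
    return max(directed_distance(line_a, line_b), directed_distance(line_b, line_a))

# Hypothetical sample lines in a projected CRS (units are meters).
road = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
offset_copy = [(100.0, 2.0), (0.0, 2.0)]  # same road, 2 m off, digitized in reverse
side_street = [(0.0, 0.0), (0.0, 30.0)]   # touches the road but heads elsewhere

print(hausdorff_distance(road, offset_copy))
print(hausdorff_distance(road, side_street))
```

The offset copy scores low even though it has fewer vertices and runs the other way, while the side street scores high despite touching the road. That is why a plain distance test alone is not enough to separate duplicates from neighbours.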

Join the layer to itself, pairing features that are within a certain distance of each other.

You need a unique id field; mine is named id and my layer is named Merged.

select 
 row_number() over() as id, 
 a.id as aid, 
 b.id as bid, 
 HausdorffDistance(a.geometry, b.geometry) as hausdiff, 
 a.geometry
from "Merged" a
join
"Merged" as b
on st_distance(a.geometry, b.geometry)<20 and a.id<b.id
order by a.id, b.id, hausdiff

When lines 1 and 2 are compared (green arrows) they get a value of 31, and from the map I can see they are not duplicates. Lines 1 and 281 are duplicates, and their Hausdorff distance is just 2.8.

After checking a sample of lines, I conclude that pairs with a value below 10 are duplicates.

I add a where clause to the query (where hausdiff<10) and load the result into the project as a layer.

Then select by expression on your original line layer to pick the features whose id appears in the loaded query layer (named QueryLayer here):

array_contains(array:=
aggregate( layer:='QueryLayer', aggregate:='array_agg', expression:="aid"), value:="id")

Start editing and delete them.

answered Mar 12 at 18:34
