I have a large shapefile with lots of duplicate records. However, each duplicate record has an accuracy rating. I want to remove all duplicate records with a lower accuracy rating.
This would be simple but there can be duplicate records with the same accuracy rating and I want to leave them in. This might be easier to show than to explain:
Key Accuracy
ABC 100
ABC 100
ABC 50
XYZ 100
XYZ 75
XYZ 75
123 50
123 20
123 20
In the above table I would want to keep the top two ABC records as they both have the best accuracy, but remove the last one. I would want to keep the top XYZ record but remove the other two. And I would want to keep the top 123 record as, although it doesn't have the best possible accuracy, it has the highest accuracy for that batch of duplicate records.
How do I do this using ArcPy?
2 Answers
You can do this by looping through the shapefile first with a SearchCursor, building a dictionary with one entry per distinct "Key" value, where the entry's value is a list of the "Accuracy" values seen for that key. Then loop through again with an UpdateCursor, compare each row's accuracy with the maximum for its key from the dictionary, and delete the row if it is lower.
import arcpy

# Create dictionary
d = {}

# Loop through shapefile with Search Cursor
with arcpy.da.SearchCursor("myShapefile.shp", ["Key", "Accuracy"]) as sCur:
    for row in sCur:
        if row[0] in d:
            d[row[0]].append(row[1])
        else:
            d[row[0]] = [row[1]]

# Loop through shapefile with Update Cursor
with arcpy.da.UpdateCursor("myShapefile.shp", ["Key", "Accuracy"]) as uCur:
    for row in uCur:
        if row[1] < max(d[row[0]]):
            uCur.deleteRow()
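To check the logic without touching a shapefile, here is a minimal pure-Python sketch of the same two-pass idea, run against the sample table from the question. The function names (`best_accuracy_by_key`, `keep_most_accurate`) are made up for illustration, and it stores only the running maximum per key rather than the full list, which should give the same result.

```python
# Sample rows from the question: (Key, Accuracy)
rows = [
    ("ABC", 100), ("ABC", 100), ("ABC", 50),
    ("XYZ", 100), ("XYZ", 75), ("XYZ", 75),
    ("123", 50), ("123", 20), ("123", 20),
]


def best_accuracy_by_key(rows):
    """First pass: record the highest accuracy seen for each key."""
    best = {}
    for key, accuracy in rows:
        if key not in best or accuracy > best[key]:
            best[key] = accuracy
    return best


def keep_most_accurate(rows):
    """Second pass: keep rows whose accuracy equals the best for their key."""
    best = best_accuracy_by_key(rows)
    return [(key, accuracy) for key, accuracy in rows if accuracy == best[key]]


kept = keep_most_accurate(rows)
print(kept)
```

Both ABC rows rated 100 survive, along with the single best XYZ and 123 rows, which matches what the question asks for.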
Similar to what the other posters have said: use arcpy.da.SearchCursor with a where clause. You can set the where clause to something like "Accuracy > 80", or whatever threshold you want. Check out this resource: http://resources.arcgis.com/en/help/main/10.1/index.html#//002z0000001r000000
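A single fixed threshold will not pick out the best rating per key, so one possible way to use a where clause here is to build one clause per key from a dictionary of best accuracies. The sketch below is only an illustration under several assumptions: "Key" is a text field, "Accuracy" is numeric, field names are delimited with double quotes (as is usual for shapefiles), and `best_by_key` is a hypothetical dictionary mapping each key to its highest accuracy. The helper names are made up; `arcpy.da.UpdateCursor` with a `where_clause` argument is the only ArcPy call used, and it is imported lazily so the clause builder can be tried without ArcGIS installed.

```python
def lower_accuracy_clause(key, best_accuracy,
                          key_field="Key", accuracy_field="Accuracy"):
    """Build a where clause matching rows of `key` rated below `best_accuracy`."""
    # Escape single quotes in the key by doubling them, per SQL convention.
    escaped_key = str(key).replace("'", "''")
    return "\"{0}\" = '{1}' AND \"{2}\" < {3}".format(
        key_field, escaped_key, accuracy_field, best_accuracy)


def delete_lower_accuracy(shapefile, best_by_key):
    """Delete every row whose accuracy is below the best for its key.

    `best_by_key` is assumed to look like {"ABC": 100, "XYZ": 100, "123": 50}.
    """
    import arcpy  # imported here so the clause helper works without ArcGIS

    for key, best in best_by_key.items():
        clause = lower_accuracy_clause(key, best)
        with arcpy.da.UpdateCursor(shapefile, ["Key"],
                                   where_clause=clause) as cursor:
            for _ in cursor:
                cursor.deleteRow()


print(lower_accuracy_clause("ABC", 100))
```

This opens one cursor per key, so for a shapefile with many distinct keys the two-pass dictionary approach is likely to be faster.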