
I'm looking for duplicate records in dbf files based upon the attribute called 'ID'. I have various dbf files from 500,000 records to 1.5 million and I know there are a host of duplicates.

I would like to add a field 'Duplicate' that says Yes or No (1 or 0 is also fine) when the ID attribute is present elsewhere. The following Python script in the Field Calculator returns 1 for a duplicate entry and 0 for a unique entry:

uniqueList = []
def isDuplicate(inValue):
    if inValue in uniqueList:
        return 1
    else:
        uniqueList.append(inValue)
        return 0

isDuplicate(!FIELD_NAME!)

However, the 1st record of, for example, 5 duplicate IDs will also be returned as a 0 (the subsequent 4 are considered the duplicates). I would need all 5 to be marked as duplicate as the ID exists elsewhere.

The following code gives an incremental count of how many times each ID occurs, with 1 meaning the 1st occurrence and so forth:

UniqueDict = {}
def isDuplicateIndex(inValue):
    UniqueDict.setdefault(inValue, 0)
    UniqueDict[inValue] += 1
    return UniqueDict[inValue]

isDuplicateIndex( !YOUR_FIELD! )

I just want a 1 (or Yes) if the ID of that record exists elsewhere! (ArcGIS version 10.1)

I have seen other answers, such as "Python script for identifying duplicate records (follow up)", but it doesn't quite work.

asked Jan 14, 2014 at 12:42

3 Answers


An alternative solution is to use the existing Summary Statistics tool in ArcGIS, then join the resulting table back on your ID field. The duplicates will have a "COUNT" larger than 1, so the flag is then simple to calculate with the Field Calculator.
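
As a rough illustration of why this works, the count-then-join logic can be sketched in plain Python. This is not the Summary Statistics tool itself: `summarize` and `join_counts` are hypothetical helper names, and the sample IDs are made up.

```python
from collections import Counter

def summarize(ids):
    """Build a frequency table: one entry per distinct ID with its count."""
    return Counter(ids)

def join_counts(ids, freq):
    """Attach each record's ID count, like joining the summary table back."""
    return [(rec_id, freq[rec_id]) for rec_id in ids]

ids = ["A", "B", "A", "C", "A"]  # made-up sample IDs
freq = summarize(ids)
joined = join_counts(ids, freq)

# Every record whose ID count exceeds 1 is flagged, including the first one.
flags = [1 if count > 1 else 0 for _, count in joined]
print(flags)
```

Because the counts are computed before any record is flagged, the first occurrence of a repeated ID is marked just like the later ones.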

answered Jan 14, 2014 at 12:51
  • How does your method achieve assigning the first duplicate record found as '0'? Commented Jan 14, 2014 at 13:22
  • @radouxju Thanks for your answer, I can now see how many polygons are duplicates by simply selecting by attribute. Surprised this didn't occur to me when all the Python stuff did! Commented Jan 14, 2014 at 13:27
  • @artwork21 I didn't want the 1st duplicate to be a 0, I wanted anything that had a duplicate to be a 'YES', or now, as it is, a number greater than 1. Commented Jan 14, 2014 at 13:28
  • @Sam, what are you referring to with this statement: "however the 1st record of, for example, 5 duplicate IDs will also be returned as a 0;"? Commented Jan 14, 2014 at 13:36
  • @artwork21 Apologies, I think my original wording wasn't very clear, I will amend it. What I was trying to say was that if 5 records all had the same ID, that piece of Python code would identify the 1st instance as a unique ID and the subsequent 4 as the duplicates. I wanted all 5 to be marked as duplicates (i.e. that ID existed elsewhere). Commented Jan 14, 2014 at 13:39

The following script creates a new field holding the number of occurrences of each value in a specified field. For example, if "Paris" appears 6 times in that field, each row with "Paris" will get a 6.

import arcpy

arcpy.env.workspace = r"D:\test.gdb"
infeature = "sample_feature"
field_in = "sample_field"
field_out = "COUNT_" + field_in

# create the field for the count values
arcpy.AddField_management(infeature, field_out, "SHORT")

# build a list with all the values in the field, including duplicates
lista = []
cursor1 = arcpy.SearchCursor(infeature)
for row in cursor1:
    i = row.getValue(field_in)
    lista.append(i)
del cursor1, row

# update the count field with the number of occurrences of each
# field_in value in the previously created list
cursor2 = arcpy.UpdateCursor(infeature)
for row in cursor2:
    i = row.getValue(field_in)
    occ = lista.count(i)
    row.setValue(field_out, occ)
    cursor2.updateRow(row)
del cursor2, row
print("Done.")

It can be easily modified so that you can have "Yes" or 1 if count>1, but I guess that having the actual count number is more useful.
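
On tables with hundreds of thousands of rows, calling `lista.count()` for every row rescans the whole list each time. A possible variant is sketched below, assuming the `arcpy.da` cursors (available from ArcGIS 10.1) can be used and that a short-integer field, here hypothetically named `DUPLICATE`, already exists. It counts every value once with `collections.Counter` and writes 1 or 0. The function names and the field name are illustrative and not part of the script above.

```python
from collections import Counter

def duplicate_flags(values):
    """Return 1 for each value that occurs more than once, else 0."""
    counts = Counter(values)
    return [1 if counts[v] > 1 else 0 for v in values]

def flag_table(table, id_field, flag_field="DUPLICATE"):
    """Write the flags into flag_field (assumed to exist already)."""
    import arcpy  # imported here so the helper above works without ArcGIS

    # First pass: count how often each ID occurs.
    with arcpy.da.SearchCursor(table, [id_field]) as cursor:
        counts = Counter(row[0] for row in cursor)

    # Second pass: mark every record whose ID occurs more than once.
    with arcpy.da.UpdateCursor(table, [id_field, flag_field]) as cursor:
        for row in cursor:
            row[1] = 1 if counts[row[0]] > 1 else 0
            cursor.updateRow(row)

# Pure-Python check of the flagging logic on made-up values.
print(duplicate_flags([10, 20, 10, 30]))
```

A call such as `flag_table("sample_feature", "sample_field")` would then fill the flag field in two passes over the table.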

Later edit: alternatively, you could use this in the Field Calculator. Pre-Logic Script Code:

infeature = "sample_feature"  # change to the name of your feature
lista = []
field = "sample_field"  # change to your field with duplicates
cursor1 = arcpy.SearchCursor(infeature)
for row in cursor1:
    i = row.getValue(field)
    lista.append(i)
del cursor1, row

def duplicates(field_in):
    occ = lista.count(field_in)
    return occ

Expression (shown in the dialog as "duplicate field ="):

duplicates(!sample_field!)
answered Feb 5, 2017 at 10:17

Another alternative solution is to use the existing SQL functionality in ArcGIS to show the duplicate records. Note that this works in SDE and personal geodatabases, but not in file geodatabases or shapefiles.

Get Duplicate Records in Table (Select by Attribute)

[FIELD_NAME] In (SELECT [FIELD_NAME] FROM [TABLE_NAME] GROUP BY [FIELD_NAME] HAVING Count(*)>1 )

Example:

ID In (SELECT ID FROM GISDATA.MY_TABLE GROUP BY ID HAVING Count(*)>1 )
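
To see what the subquery selects without opening ArcGIS, the same pattern can be tried against an in-memory SQLite database. This is only a sketch of the SQL logic: the table `my_table`, its columns, and the sample rows are made up, and SQLite stands in for the geodatabase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (OBJECTID INTEGER PRIMARY KEY, ID TEXT)")
conn.executemany(
    "INSERT INTO my_table (ID) VALUES (?)",
    [("A",), ("B",), ("A",), ("C",), ("A",)],  # made-up sample IDs
)

# Same WHERE clause as above: keep rows whose ID occurs more than once.
query = (
    "SELECT OBJECTID, ID FROM my_table "
    "WHERE ID IN (SELECT ID FROM my_table GROUP BY ID HAVING COUNT(*) > 1) "
    "ORDER BY OBJECTID"
)
duplicates = conn.execute(query).fetchall()
print(duplicates)
conn.close()
```

All three rows with ID "A" are returned, including the first occurrence.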
answered Oct 8, 2015 at 11:44
  • Can you get this to work in a file geodatabase? The query works successfully in a personal geodatabase, but when I try running it in a file geodatabase it fails with the message "An invalid SQL statement was used." Edit: according to the documentation link, only limited subqueries are supported in file geodatabases. Commented Mar 30, 2016 at 2:47
  • The query is copied directly from your post and references the correct table and field names. The query is valid when I remove HAVING COUNT(*) > 1. I really don't see a way of getting it to work in file geodatabases. I know this tech article is somewhat dated, but it seems to be the source of your SQL statement and it indicates that it does not work in file geodatabases. I'm ready to upvote your answer if I can make it work in file gdbs, or clarification is added to indicate they are the exception. Commented Mar 30, 2016 at 19:58
  • @isburns I was mistaken, it works in an SDE environment and not in a file geodatabase. As a workaround, you can bring the table data into Excel, find the duplicates, and then join the list of dupes back in ArcGIS, which would then show only those records. Not ideal, but it works. Commented Mar 31, 2016 at 12:03
  • Thanks for the update. I upvoted your answer because it does work and is fairly simple and fast in the supported geodatabases. I know it's in the comments now, but you might want to also edit the post itself to indicate that it works in personal and SDE geodatabases but not file geodatabases or shapefiles. Commented Mar 31, 2016 at 19:30
