Finding duplicates in field depending on another field's values using PyQGIS

Question 1

I need to identify/select/print the duplicate values in a field depending on the values of another one.

Example:

Field1	Field2
100	1
100	2
200	1
200	2
200	3
200	2
300	1
300	2
300	3
300	4

Basically, each value in "Field1" must be paired with unique values in "Field2": having 100.1, 100.2, etc. is fine, but having two 200.2 is not. I'm using these PyQGIS lines to find duplicates within a single field:

import collections
layer = iface.activeLayer()
list_of_values = QgsVectorLayerUtils.getValues(layer, 'Field1')[0]
list_of_values.sort()
print([item for item, count in collections.Counter(list_of_values).items() if count > 1])

but I can't modify them in the proper way to achieve these results.
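
One way to adapt the snippet above is to count (Field1, Field2) pairs instead of single values. The following is a minimal sketch in plain Python, with the example table hard-coded so that it runs outside QGIS. The `rows` list and the `duplicated_pairs` helper are illustrative names, not part of any answer; in QGIS the pairs would presumably be built from the layer's feature attributes.

```python
from collections import Counter

# Example table from the question, hard-coded as (Field1, Field2) rows.
rows = [
    (100, 1), (100, 2),
    (200, 1), (200, 2), (200, 3), (200, 2),
    (300, 1), (300, 2), (300, 3), (300, 4),
]

def duplicated_pairs(pairs):
    """Return the (Field1, Field2) combinations that occur more than once."""
    counts = Counter(pairs)
    return [pair for pair, count in counts.items() if count > 1]

print(duplicated_pairs(rows))  # [(200, 2)]
```

Tuples are hashable, so `Counter` can use them as keys directly, which is what makes the two-field case no harder than the one-field case.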

Question 2

What exactly do you want your result to be? Do you want to know for which values duplicates exist? Do you want to know the feature IDs of all duplicates? Do you want the feature IDs of all but the first duplicate? Or something completely different?

Question 3

@bugmenot123, the main aim is to identify the features whose combination of Field1 and Field2 is duplicated. For every value in Field1 there can be n values in Field2; they are normally sequential, but the essential requirement is that each combination is unique. Being sequential is not mandatory (100.1, 100.2, 100.23, 100.24 is fine as well), whereas having unique pairs of values is (only one 100.7 is allowed).
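
The rule described here (combinations must be unique, gaps are allowed) can be expressed as a quick yes/no check. This is only a sketch in plain Python; `pairs_are_unique` is a hypothetical helper name and the sample values are taken from the comment above.

```python
def pairs_are_unique(pairs):
    """True when no (Field1, Field2) combination appears twice."""
    pairs = list(pairs)
    return len(set(pairs)) == len(pairs)

# Gaps in Field2 are fine as long as every combination is distinct.
non_sequential = [(100, 1), (100, 2), (100, 23), (100, 24)]
# A repeated combination violates the rule.
repeated = [(100, 7), (100, 8), (100, 7)]

print(pairs_are_unique(non_sequential))  # True
print(pairs_are_unique(repeated))        # False
```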

Question 4

I like collections.defaultdict(list):

from collections import defaultdict

layer = QgsProject.instance().mapLayersByName('ok_ak_riks')[0]
fieldnames = ['kom_kod', 'lan_kod']

# For each combination of the two fields, collect the IDs of all features that have it
d = defaultdict(list)
for f in layer.getFeatures():
    d[(str(f[fieldnames[0]]), str(f[fieldnames[1]]))].append(f.id())

# d[('0380', '03')]
# [288, 289]
# So two features, 288 and 289, have the value 0380 in the first field and 03 in the second

# Keep the IDs of every combination that is shared by more than one feature
to_select = []
for key, idlist in d.items():
    if len(idlist) > 1:
        to_select.extend(idlist)

# to_select
# [11, 12, 33, 34, 35, 44, 45, 49, 50, 54, 55, 62, 63, 64, 65, 80, 81, 82, 83, 93, 94, 95, 100, 101, 108, 109, 114, 115, 116, 117, 118, 119, 120, 122, 123, 130, 131, 159, 160, 163, 164, 200, 201, 203, 204, 208, 209, 214, 215, 224, 225, 230, 231, 253, 254, 279, 280, 288, 289, 290, 291, 292, 323, 324, 325, 326]
layer.select(to_select)
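
To see what the grouping produces on the table from the question, here is a plain-Python sketch of the same idea that runs without QGIS. The feature IDs 1 to 10 are made up for illustration, and the `all_but_first` variant is an addition that covers the case where the first feature of each group should be kept.

```python
from collections import defaultdict

# (feature id, Field1, Field2); the ids 1..10 are made up for illustration.
features = [
    (1, 100, 1), (2, 100, 2),
    (3, 200, 1), (4, 200, 2), (5, 200, 3), (6, 200, 2),
    (7, 300, 1), (8, 300, 2), (9, 300, 3), (10, 300, 4),
]

# Group feature ids by their (Field1, Field2) combination.
groups = defaultdict(list)
for fid, field1, field2 in features:
    groups[(field1, field2)].append(fid)

# Every feature that shares its combination with another feature.
all_duplicates = [fid for ids in groups.values() if len(ids) > 1 for fid in ids]
# The same, but leaving out the first feature of each group.
all_but_first = [fid for ids in groups.values() if len(ids) > 1 for fid in ids[1:]]

print(groups[(200, 2)])  # [4, 6]
print(all_duplicates)    # [4, 6]
print(all_but_first)     # [6]
```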


Question 5

Try this:

layer = iface.activeLayer()
field_a = 'Field1'  # name of the first field
field_b = 'Field2'  # name of the second field
# Pair every feature ID with its (field_a, field_b) combination
feat_list = [(feature.id(), (feature.attribute(field_a), feature.attribute(field_b)))
             for feature in layer.getFeatures()]
pairs = [pair for _, pair in feat_list]
selection = []
for fid, pair in feat_list:
    # A combination that occurs more than once marks the feature as a duplicate
    if pairs.count(pair) > 1:
        selection.append(fid)  # feature ID, not list position
layer.select(selection)
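
When selecting by ID, the numbers passed to `layer.select()` need to be feature IDs rather than list positions. Positions from `enumerate()` start at 0 and coincide with the IDs only if the IDs also start at 0 and have no gaps, which depends on the data source and should not be assumed. A small plain-Python sketch with made-up IDs starting at 1 shows the mismatch:

```python
# (feature id, (Field1, Field2)); ids start at 1 here, purely for illustration.
features = [(1, (200, 1)), (2, (200, 2)), (3, (200, 3)), (4, (200, 2))]
pairs = [pair for _, pair in features]

# Positions of the duplicated pairs, as enumerate() would report them.
by_position = [pos for pos, pair in enumerate(pairs) if pairs.count(pair) > 1]
# Actual ids of the same features.
by_id = [fid for fid, pair in features if pairs.count(pair) > 1]

print(by_position)  # [1, 3]
print(by_id)        # [2, 4]
```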

Question 6

It works, thanks. But I have to say that it's much slower than @BERA's solution, even with a relatively small test dataset.

Question 7

@HyPhens, you are right, it is very inefficient. I have updated my answer with a better-performing script.
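
The updated script is not reproduced in this thread. As a rough illustration of where the slowdown comes from, and not as a reconstruction of that script, the sketch below compares a `list.count()` loop, which rescans the whole list for every feature, with a `collections.Counter` that counts every combination once. The function names and the synthetic data are assumptions made for the example.

```python
from collections import Counter

def select_quadratic(features):
    """Count-based version: list.count() rescans the list for every feature."""
    pairs = [pair for _, pair in features]
    return [fid for fid, pair in features if pairs.count(pair) > 1]

def select_linear(features):
    """Counter-based version: one pass to count, one pass to filter."""
    counts = Counter(pair for _, pair in features)
    return [fid for fid, pair in features if counts[pair] > 1]

# Synthetic (feature id, (Field1, Field2)) data, made up for the comparison.
features = [(fid, (fid % 50, fid % 7)) for fid in range(1, 2001)]

# Both versions pick the same features; only the amount of work differs.
assert select_quadratic(features) == select_linear(features)
print(len(select_linear(features)))
```

With n features the first version performs on the order of n² comparisons, while the second does work proportional to n, which would be consistent with the difference reported in the comment above.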

Question 8

I've upvoted both @BERA's and Mayo's solutions as they work well for my needs for now.

Accepted answer (the collections.defaultdict(list) approach): Bera, 2022-06-18 16:27:13Z

