I need to identify/select/print the duplicate values in a field depending on the values of another one.
Example:
Field1 | Field2 |
---|---|
100 | 1 |
100 | 2 |
200 | 1 |
200 | 2 |
200 | 3 |
200 | 2 |
300 | 1 |
300 | 2 |
300 | 3 |
300 | 4 |
Basically, each value in "Field1" may be paired with a given value in "Field2" only once: 100.1, 100.2, etc. are fine, but having two 200.2 is not.
I'm using these PyQGIS lines to find duplicates within the same field:
import collections
layer = iface.activeLayer()
list_of_values = QgsVectorLayerUtils.getValues(layer, 'Field1')[0]
list_of_values.sort()
print([item for item, count in collections.Counter(list_of_values).items() if count > 1])
but I can't work out how to adapt them to check the combination of the two fields.
- What exactly do you want your result to be? Do you want to know for which values duplicates exist? Do you want to know the feature IDs of all duplicates? Do you want the feature IDs of all but the first duplicate? Or something completely different? – bugmenot123, Jun 18, 2022 at 16:21
- @bugmenot123, the main aim is to identify the features whose combination of Field1 and Field2 is a duplicate. For every value in Field1 you can have n values in Field2; they should be sequential and, in particular, their combination must be unique. Being sequential is not mandatory (100.1, 100.2, 100.23, 100.24 is fine as well), but having unique pairs of values is (only one 100.7 is allowed). – HyPhens, Jun 19, 2022 at 6:16
2 Answers
I like collections.defaultdict(list):
from collections import defaultdict
layer = QgsProject.instance().mapLayersByName('ok_ak_riks')[0]
fieldnames = ['kom_kod', 'lan_kod']
d = defaultdict(list)
for f in layer.getFeatures():
    # For each combination of field 1 and field 2, append all feature IDs to a list
    d[(str(f[fieldnames[0]]), str(f[fieldnames[1]]))].append(f.id())
# d[('0380', '03')]
# [288, 289]
# So we have two features, 288 and 289, with the value 0380 in field 1 and 03 in field 2
to_select = []
for key, idlist in d.items():
    # If more than one feature has the current field 1 and field 2 combination, add them to the list
    if len(idlist) > 1:
        to_select.extend(idlist)
# to_select
# [11, 12, 33, 34, 35, 44, 45, 49, 50, 54, 55, 62, 63, 64, 65, 80, 81, 82, 83, 93, 94, 95, 100, 101, 108, 109, 114, 115, 116, 117, 118, 119, 120, 122, 123, 130, 131, 159, 160, 163, 164, 200, 201, 203, 204, 208, 209, 214, 215, 224, 225, 230, 231, 253, 254, 279, 280, 288, 289, 290, 291, 292, 323, 324, 325, 326]
layer.select(to_select)
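To try the grouping idea outside QGIS, here is a minimal sketch on the sample table from the question. The `rows` list and the function name `duplicate_ids` are made up for illustration, and the feature IDs are simply invented row numbers; in QGIS the tuples would come from `layer.getFeatures()` instead.

```python
from collections import defaultdict

# Hypothetical stand-in for layer.getFeatures(): (feature_id, Field1, Field2).
# Values are copied from the question's sample table; the IDs are invented.
rows = [
    (0, 100, 1), (1, 100, 2),
    (2, 200, 1), (3, 200, 2), (4, 200, 3), (5, 200, 2),
    (6, 300, 1), (7, 300, 2), (8, 300, 3), (9, 300, 4),
]

def duplicate_ids(rows):
    """Return the IDs of all rows whose (Field1, Field2) pair occurs more than once."""
    groups = defaultdict(list)
    for fid, a, b in rows:
        groups[(a, b)].append(fid)  # group feature IDs by their value pair
    dup = []
    for ids in groups.values():
        if len(ids) > 1:  # the pair is shared by several rows
            dup.extend(ids)
    return dup

print(duplicate_ids(rows))  # [3, 5], the two rows holding the pair (200, 2)
```

Each row is visited once, so the cost should grow roughly linearly with the number of features.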
Try this:
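layer = iface.activeLayer()
field_a = 'Field1'  # name of the first field, in quotes
field_b = 'Field2'  # name of the second field, in quotes
# Keep each feature's ID next to its (Field1, Field2) pair: the enumerate() index
# is not guaranteed to match the feature ID (e.g. GeoPackage IDs start at 1)
feat_list = [(feature.id(), (feature.attribute(field_a), feature.attribute(field_b))) for feature in layer.getFeatures()]
pairs = [pair for _, pair in feat_list]
selection = []
for fid, pair in feat_list:
    if pairs.count(pair) > 1:  # this pair occurs on more than one feature
        selection.append(fid)
layer.select(selection)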
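Calling `count()` inside the loop rescans the whole list for every feature, so the work grows quadratically with the number of features. One possible way around that, sketched here with only the standard library's `collections.Counter`, is to count every pair once and then look each pair up. The helper name `ids_with_duplicate_pairs` and the sample data are hypothetical; in QGIS you would build the `(id, pair)` tuples from `layer.getFeatures()` using `feature.id()` and `feature.attribute(...)`.

```python
from collections import Counter

def ids_with_duplicate_pairs(features):
    """features: iterable of (feature_id, (field_a_value, field_b_value))."""
    features = list(features)
    # One pass to count how often each value pair occurs
    counts = Counter(pair for _, pair in features)
    # Second pass: keep the IDs whose pair occurs more than once
    return [fid for fid, pair in features if counts[pair] > 1]

# Hypothetical sample data mirroring part of the question's table; IDs are invented.
sample = [
    (1, (100, 1)), (2, (100, 2)),
    (3, (200, 1)), (4, (200, 2)), (5, (200, 3)), (6, (200, 2)),
]
print(ids_with_duplicate_pairs(sample))  # [4, 6]
```

The returned list could then be passed to `layer.select(...)` in the same way as `selection`.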
- It works, thanks. But I have to say that it's much slower than @BERA's solution, even with a relatively small test dataset. – HyPhens, Jun 19, 2022 at 17:29
- @HyPhens you are right, it is very inefficient. I updated my answer with a script with better performance. – Mayo, Jun 19, 2022 at 18:19
- I've upvoted both @BERA's and Mayo's solutions as they work well for my needs for now. – HyPhens, Jul 3, 2022 at 19:17