Identifying equivalent lists

Question 1

Title: Identifying equivalent lists

I am using lists as keys, and therefore want to identify equivalent ones.

For example,

[0, 2, 2, 1, 1, 2]
[4, 0, 0, 1, 1, 0]

are 'equivalent' in my book. I 'equivalify' the list by checking the list value with the highest count, and set that to 0, then the next highest count to 1, and so on and so forth. It's done with this code below, which does the job properly but is fairly slow - and I am wondering if there is a way of doing the same job yet I haven't seen it yet.

for key in preliminary_key_list:
 key = list(key) # convert from immutable tuple to mutable list
 i = 0
 translation = {}
 counts = [[j, key.count(j)] for j in set(key)]
 counts_right = [j[1] for j in counts]
 for count in set(key): 
 translation[counts[counts_right.index(max(counts_right))][0]] = i
 counts_right[counts_right.index(max(counts_right))] = 0
 i += 1
 for i in range(len(key)):
 key[i] = translation[key[i]]

Question 2

Could it be the case that there are multiple values with the same count? E.g. [0, 1, 1, 2, 2, 3, 3, 3]?

Question 3

@DSM Yes it happens. However, to optimise that would be a smaller optimisation than what I'm currently looking for, which is 'how to speed up all key translations'. I will come back to this one after I've optimised the main process

Question 4

for key in preliminary_key_list:
 key = list(key) # convert from immutable tuple to mutable list

You don't need to do this. Rather then creating a modifiable list, just use the tuple and construct a new tuple at the end

 i = 0
 translation = {}
 counts = [[j, key.count(j)] for j in set(key)]

This should be a list of tuples, not a list of lists.

 counts_right = [j[1] for j in counts]
 for count in set(key):

Use for i, count in enumerate(set(key)): to avoid having to manage i yourself.

You use set(key) twice, which will force python to do work twice. Also, there is no guarantee you'll get the same order both times.

 translation[counts[counts_right.index(max(counts_right))][0]] = i
 counts_right[counts_right.index(max(counts_right))] = 0

These lines are really expensive. max will scan the whole list checking for the highest value. .index will then rescan the list to find the index for the value. Then you do the whole thing over again on the next line!

You'd do far better as @GlenRogers suggests to sort by the counts.

 i += 1
 for i in range(len(key)):
 key[i] = translation[key[i]]

Here using a comprehension would be better (I think):

 key = tuple(translation[item] for item in key)

Here is how I'd write the code:

ret = []
for key in preliminary_key_list:
 counts = sorted(set(key), key = lambda k: -key.count(k) )
 translation = { val: idx for idx, val in enumerate(counts) }
 ret.append([translation[k] for k in key])
return ret

Question 5

It's certainly a lot faster and it's concise as well. Makes me wonder how you get to that level where you know how to do it. Your explanation was good too. Thanks Winston

Question 6

You can do things slightly differently - instead of manually sorting, you can use the standard library sort() function. Below I've modified things slightly for convenience, but using it, timeit originally gave 16.3 seconds for 10^6 iterations and now gives 10.8 seconds.

def equivalify( preliminary_key_list ):
 ret = []
 for key in preliminary_key_list:
 key = list(key) # convert from immutable tuple to mutable list
 translation = {}
 counts = [[key.count(j), j] for j in set(key)]
 counts.sort( reverse = True )
 for idx, ( cnt, val ) in enumerate( counts ):
 translation[ val ] = idx
 for i,k in enumerate( key ):
 key[i] = translation[ k ]
 ret += [key]
 return ret

Winston Ewert 30.7k4 gold badges52 silver badges79 bronze badges · Accepted Answer · 2012-10-24 12:24:54Z

for key in preliminary_key_list:
 key = list(key) # convert from immutable tuple to mutable list

You don't need to do this. Rather then creating a modifiable list, just use the tuple and construct a new tuple at the end

 i = 0
 translation = {}
 counts = [[j, key.count(j)] for j in set(key)]

This should be a list of tuples, not a list of lists.

 counts_right = [j[1] for j in counts]
 for count in set(key):

Use for i, count in enumerate(set(key)): to avoid having to manage i yourself.

You use set(key) twice, which will force python to do work twice. Also, there is no guarantee you'll get the same order both times.

 translation[counts[counts_right.index(max(counts_right))][0]] = i
 counts_right[counts_right.index(max(counts_right))] = 0

These lines are really expensive. max will scan the whole list checking for the highest value. .index will then rescan the list to find the index for the value. Then you do the whole thing over again on the next line!

You'd do far better as @GlenRogers suggests to sort by the counts.

 i += 1
 for i in range(len(key)):
 key[i] = translation[key[i]]

Here using a comprehension would be better (I think):

 key = tuple(translation[item] for item in key)

Here is how I'd write the code:

ret = []
for key in preliminary_key_list:
 counts = sorted(set(key), key = lambda k: -key.count(k) )
 translation = { val: idx for idx, val in enumerate(counts) }
 ret.append([translation[k] for k in key])
return ret

It's certainly a lot faster and it's concise as well. Makes me wonder how you get to that level where you know how to do it. Your explanation was good too. Thanks Winston

Stack Exchange Network

Identifying equivalent lists

2 Answers 2

You must log in to answer this question.

Hot Network Questions

Identifying equivalent lists

2 Answers 2

You must log in to answer this question.

Related

Hot Network Questions