More elegant way to count anagrams in Python?

Question 1

Question

I feel like my code could be more elegant/more pythonic/faster, but I can't think of too much more to improve. So, I've come to the internet to see what more can be done with standard python.

What my code does

My code counts anagrams for each word in a list of words. For example:

post, spot stop, tops

are four of the words in my list and since all are anagrams of each other, each word should map to the number 4 in a dictionary. Specifically I'm counting the size of the anagram group each word would fit in. The output for a few words would look something like the following then:

{'1-2-3': 1,
 '1980s': 1,
 '1990s': 1,
 ...
 ...
 'top': 1,
 'topic': 2,
 'topics': 1,
 'topped': 1,
 'tops': 4,
 'tory': 2,
 'total': 1,
 'totals': 1,
 'touch': 1,
 'tough': 2,
 ...
 ...
 'zone': 1,
 'zones': 1,
 'zurich': 1}

My code

from itertools import permutations
from collections import Counter
def get_variants(word):
 return map(lambda t: "".join(t), set(permutations(word)))
def count_anagrams(words):
 anagram_counts = {w: 1 for w in words}
 word_counters = list(map(Counter, words))
 for i, (word, counter) in enumerate(zip(words, word_counters)):
 for other_word, other_counter in zip(words[i+1:], word_counters[i+1:]):
 if counter == other_counter:
 anagram_counts[word] += 1
 anagram_counts[other_word] += 1
 return anagram_counts

Question 2

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers .

Question 3

Performance (Language independent)

The permutations scale factorially with the word length and your loop code squares quadratically with the number of words. Both are really bad scaling factors. The nice thing to realize, that all equivalent anagrams map to the same multiset. There are python libraries for multisets, but already with the built-in tools you can go a long way. Two strings are equivalent, under "Anagram-relation", if they are the same after sorting. We will treat the sorted string as representation of our Anagram equivalence class. Since strings are immutable you can even use these represantions directly as dictionary keys.

Your code becomes then quite short

from collections import defaultdict
def count_anagrams(words):
 result = defaultdict(list)
 for word in words:
 result[''.join(sorted(word))].append(word)
 return dict(result)

This scales linearly with word number and (n * log(n)) with word length. (Under the assumption that dictionary insertion is O(1) and sorted uses a reasonable sorting algorithm.).

The output of

count_anagrams(['stop', 'test', 'post'])

is then

{'opst': ['stop', 'post'], 'estt': ['test']}

You can change it to your needs by transforming it to your desired output with len etc.

If you want to convert it to the exact same form as you have it, one example function would be:

def convert_output(words_in_anagram_class):
 return {word: len(words)
 for words in words_in_anagram_class.values() 
 for word in words}

Then you can call convert_output(count_anagrams(words)). If you want to, you can combine these two functions into one. (Although this is IMHO a far less useful representation of your data.)

Small stuff (Python nitpicks)

map can and should be replaced with comprehensions. Especially if you cast the result into a list. [f(x) for x in iterable] is far nicer to read than list(map(f, iterable)). If you really want a generator, there are also generator expressions (f(x) for x in iterable).

Question 4

Why use a sorted string rather than a counter?

Question 5

Conceptional problem: I don't get exactly what you want. Could you expand? Technical problem: You cannot use a counter as dict key.

Question 6

So, my code has the original word, rather than the sorted version as the key and then the corresponding value would be the length (or the list of anagrams in your case). Using that key, you could either compare counters, or the sorted word. Why use one rather than the other? In general, what do I do if I want the actual word (unsorted) as the key?

Question 7

Nice solution. To get the format the OP wants, you just need 2 passes..

Question 8

@MaartenFabré Thank you very much. ;-)

Question 9

Additionally to mcocdawc's answer, since what I mean is too much for a comment

You need an intermediate stage. You used the list of Counters for this. But then to find anagrams in the list is expensive. A dict would be a better way, and a collections.Counter is especially made for this purpose. Now you just need to find an acceptable representation of your word to discern anagrams. mcocdawc suggested the sorted string, since 2 anagrams result in the same response if you sort the letters. An alternative is the frozenset of the items of a Counter. I suspect the sorted list will be faster, but you'll need to test that.

Based on mcocdawc, but without the intermediate lists:

def count_anagrams(words):
 counter = Counter()
 intermediate = {}
 for word in words:
 intermediate_key = ''.join(sorted(word))
# intermediate_key = tuple(sorted(word)) # alternatively
# intermediate_key = frozenset(Counter(word).items()) # alternatively
 counter[intermediate_key] += 1
 intermediate[word] = intermediate_key 
 return {
 word: counter[intermediate_key]
 for word, intermediate_key in intermediate.items()
 }

I'm not saying this is better/faster than mcocdawc's answer, but I think the intermediate structure is closer to

mcocdawc mcocdawc 9744 silver badges12 bronze badges · Accepted Answer · 2020-05-05 11:59:43Z

Performance (Language independent)

The permutations scale factorially with the word length and your loop code squares quadratically with the number of words. Both are really bad scaling factors. The nice thing to realize, that all equivalent anagrams map to the same multiset. There are python libraries for multisets, but already with the built-in tools you can go a long way. Two strings are equivalent, under "Anagram-relation", if they are the same after sorting. We will treat the sorted string as representation of our Anagram equivalence class. Since strings are immutable you can even use these represantions directly as dictionary keys.

Your code becomes then quite short

from collections import defaultdict
def count_anagrams(words):
 result = defaultdict(list)
 for word in words:
 result[''.join(sorted(word))].append(word)
 return dict(result)

This scales linearly with word number and (n * log(n)) with word length. (Under the assumption that dictionary insertion is O(1) and sorted uses a reasonable sorting algorithm.).

The output of

count_anagrams(['stop', 'test', 'post'])

is then

{'opst': ['stop', 'post'], 'estt': ['test']}

You can change it to your needs by transforming it to your desired output with len etc.

If you want to convert it to the exact same form as you have it, one example function would be:

def convert_output(words_in_anagram_class):
 return {word: len(words)
 for words in words_in_anagram_class.values() 
 for word in words}

Then you can call convert_output(count_anagrams(words)). If you want to, you can combine these two functions into one. (Although this is IMHO a far less useful representation of your data.)

Small stuff (Python nitpicks)

map can and should be replaced with comprehensions. Especially if you cast the result into a list. [f(x) for x in iterable] is far nicer to read than list(map(f, iterable)). If you really want a generator, there are also generator expressions (f(x) for x in iterable).

Conceptional problem: I don't get exactly what you want. Could you expand? Technical problem: You cannot use a counter as dict key.
So, my code has the original word, rather than the sorted version as the key and then the corresponding value would be the length (or the list of anagrams in your case). Using that key, you could either compare counters, or the sorted word. Why use one rather than the other? In general, what do I do if I want the actual word (unsorted) as the key?
Nice solution. To get the format the OP wants, you just need 2 passes..

Stack Exchange Network

More elegant way to count anagrams in Python?

Question

What my code does

My code

2 Answers 2

Performance (Language independent)

Small stuff (Python nitpicks)

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

More elegant way to count anagrams in Python?

Question

What my code does

My code

2 Answers 2

Performance (Language independent)

Small stuff (Python nitpicks)

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions