I am currently taking back Python after over 6 years I do not use it, and to do that I am solving some small challenges on Hacker Rank. I wrote a code that works, but which is not performant and I would like to have some ideas to optimize it.
Problem statement
Given a certain string, count how many anagrams are possible for that string.
The anagrams are those substrings for which the letters used are the same.
For example, if we take the string mama, we have 5 possible anagrams:
- The first
mwith the thirdm - The second
awith the fourtha - The first
mawith the secondma - The first
mawith the middleam - The middle
amwith the lastma
My function should count the number of anagrams for a given string.
Current (working) code
I've come up with the below code which works always as expected (note: I'm adding the __main__ as well so that you can copy-paste and run the code in any IDLE):
(disclaimer: if you want to understand the logic without reading the code, I've added a "logic" section just after the code snippet).
#!/bin/python3
import math
import os
import random
import re
import sys
# Complete the sherlockAndAnagrams function below.
def sherlockAndAnagrams(s):
nb = 0
stringSize = len(s)
for size in range(1, len(s)):
for i in range(0, len(s) - 1):
ss = s[i:i+size]
for j in range(i+1, len(s)):
if j+size <= stringSize:
ss2 = s[j:j+size]
if haveSameLetters(ss, ss2):
nb += 1
return nb
def haveSameLetters(s1,s2):
s1Map = getMapOfLetters(s1)
s2Map = getMapOfLetters(s2)
return areMapEquals(s1Map, s2Map)
def areMapEquals(m1, m2):
if len(m1) != len(m2):
return False
for k,v in m1.items():
if not m2.__contains__(k):
return False
else:
if m2.get(k) != v:
return False
return True
def getMapOfLetters(s):
sMap = {}
for letter in s:
if sMap.__contains__(letter):
sMap[letter] = sMap.get(letter)+1
else:
sMap[letter] = 1
return sMap
if __name__ == '__main__':
s1 = "kkkk"
print("Expect 10, received ", sherlockAndAnagrams(s1))
s2 = "ifailuhkqq"
print("Expect 3, received ", sherlockAndAnagrams(s2))
s3 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
print("Expect 166650, received ", sherlockAndAnagrams(s3))
Logic in words
With this code, I analyze blocks of substrings. First, I analyze each substring of size 1, then of size 2, then of size 3 etc. until n-1 (where n would be the size of the whole string in input).
Let's take the word mama again. What I do is:
The count starts at 0.
- I take the first substring of size
1, which ism, and I compare it with the seconda(False), the thirdm(True) and the fourtha(False). The count is now1. - Then I take the second substring of size
1which isa, and I compare it with the thirdm(False) and the fourtha(True). The count is now2. - Then I take the third substring of size
1, which ism, and I compare it with the fourtha(False). The count stays at2. - Then I move to the first substring of size
2, which isma, and I compare it with the next substringam(True) andma(True). The count is now4. - Then I move to the next substring which is
amand I compare it with the next substringma(True). The count is now5. - Then I move to the first substring of size
3, which ismam, and I compare it with the next substringama(False). The count stays at5.
At this point, the loop is over and I'm left with the count 5.
Optimization
The code works, but it is pretty consuming. For the third example in my code snippet, a string which contains 100 times the letter a, the result of the count of possible anagrams is 166,650 and the code needs to make 490,050 iterations before getting to that result.
In the case of that 100 a's string, it takes 1.324274 seconds to execute.
I am trying to optimize this code. Any thought or suggestion?
2 Answers 2
Unused Imports
You don't use any of the imports in your code. Remove them.
areMapEquals
Dictionarys can be compared using the =
def areMapEquals(m1, m2):
return m1 == m2
But at this point, this function is obsolete since you can now make this check in the hasSameLetters function. But eventually that function gets obsolete too, since you can do that calculation without a function too.
Naming
Functions and variables should be in snake_case, not camelCase.
Counters
Instead of creating a dict yourself, utilize the collections.Counter class, which does all of this for you. It also has it's own equal comparison, so you can reduce a lot of your code.
Looping
Your first loop is just for keeping track of the size. But if you look closely, size is always just one more than i, so just define size inside the second loop as such. This removes the need for the first loop, greatly increasing performance.
Reuse variables
You have stringSize (which should be length) which hold the size of the string, but you still have len(s) in your code. Just use that variable.
Efficient Algorithms
Here's something I wrote a while back that solve this very question:
def count_anagrams(string: str) -> int:
"""
This counts the total amount of anagram substrings, given a string.
@param str string - String to count anagrams.
@return int - Number of anagrams.
"""
n = len(string)
_map = dict()
for i in range(n):
substr = ''
for j in range(i, n):
substr = ''.join(sorted(substr + string[j]))
_map[substr] = _map.get(substr, 0)
_map[substr] += 1
return sum((v * (v - 1)) // 2 for v in _map.values())
While this doesn't utilize the collections.Counter class, it's still extremely efficient and gets the job done!
-
1\$\begingroup\$ Oh hi I see you also though about the Counter class, it is so useful :) \$\endgroup\$Caridorc– Caridorc2020年07月14日 18:47:41 +00:00Commented Jul 14, 2020 at 18:47
-
\$\begingroup\$ @Caridorc Yep! I saw OP counting in a dict and immediately thought of it :). \$\endgroup\$Ben A– Ben A2020年07月14日 18:48:17 +00:00Commented Jul 14, 2020 at 18:48
-
\$\begingroup\$ Thanks a lot, I work daily in Java and just used Python for few months a long while ago. With a single answer you optimized the code and gave me very good tips to refresh the language in my mind (naming, Counter, in...)... thanks! \$\endgroup\$Matteo NNZ– Matteo NNZ2020年07月14日 21:13:22 +00:00Commented Jul 14, 2020 at 21:13
Counter class from collections
You are just re-implementing the Counter class from collections so you can just use:
from collections import Counter
at the start, and then:
def areMapEquals(m1, m2):
return m1 == m2
getMapOfLetters = Counter
Of course you can just do this inline because these are just a few simple operations.
On my machine using the built-ins also nets an about 10% speed-up.
A minor point is that the syntactic sugar in makes the code more readable so:
m2.__contains__(k)
becomes
k in m2
Another usability point is that
>>> help(sherlockAndAnagrams)
Help on function sherlockAndAnagrams in module __main__:
sherlockAndAnagrams(s)
# Complete the sherlockAndAnagrams function below.
is not very very useful, with the library doctest you can put your examples in a string under the function and they will both be run and be provided when you ask help on the function.
As a last point I am not sure but I think that the library itertools has something that would very much help with the sherlockAndAnagrams function.
-
\$\begingroup\$
k in m2added now \$\endgroup\$Caridorc– Caridorc2020年07月14日 19:29:19 +00:00Commented Jul 14, 2020 at 19:29 -
1\$\begingroup\$ Thanks Caridorc, didn't think the "in" would work for key value pairs but just for lists, coming from Java I thought the
__contains__was the equivalent of.contains()forMapbut indeed theinkeyword makes the work and it's more beautiful to see :) \$\endgroup\$Matteo NNZ– Matteo NNZ2020年07月14日 21:15:39 +00:00Commented Jul 14, 2020 at 21:15