Optimizing string analysis using slicing

Question 1

I am currently taking back Python after over 6 years I do not use it, and to do that I am solving some small challenges on Hacker Rank. I wrote a code that works, but which is not performant and I would like to have some ideas to optimize it.

Problem statement

Given a certain string, count how many anagrams are possible for that string. The anagrams are those substrings for which the letters used are the same. For example, if we take the string mama, we have 5 possible anagrams:

The first m with the third m
The second a with the fourth a
The first ma with the second ma
The first ma with the middle am
The middle am with the last ma

My function should count the number of anagrams for a given string.

Current (working) code

I've come up with the below code which works always as expected (note: I'm adding the __main__ as well so that you can copy-paste and run the code in any IDLE): (disclaimer: if you want to understand the logic without reading the code, I've added a "logic" section just after the code snippet).

#!/bin/python3
import math
import os
import random
import re
import sys
# Complete the sherlockAndAnagrams function below.
def sherlockAndAnagrams(s):
 nb = 0
 stringSize = len(s)
 for size in range(1, len(s)):
 for i in range(0, len(s) - 1):
 ss = s[i:i+size]
 for j in range(i+1, len(s)):
 if j+size <= stringSize:
 ss2 = s[j:j+size]
 if haveSameLetters(ss, ss2):
 nb += 1
 return nb
 
def haveSameLetters(s1,s2):
 s1Map = getMapOfLetters(s1)
 s2Map = getMapOfLetters(s2)
 return areMapEquals(s1Map, s2Map)
def areMapEquals(m1, m2):
 if len(m1) != len(m2):
 return False
 for k,v in m1.items():
 if not m2.__contains__(k):
 return False
 else:
 if m2.get(k) != v:
 return False
 return True
def getMapOfLetters(s):
 sMap = {}
 for letter in s:
 if sMap.__contains__(letter):
 sMap[letter] = sMap.get(letter)+1
 else:
 sMap[letter] = 1
 return sMap
if __name__ == '__main__':
 s1 = "kkkk"
 print("Expect 10, received ", sherlockAndAnagrams(s1))
 s2 = "ifailuhkqq"
 print("Expect 3, received ", sherlockAndAnagrams(s2))
 s3 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 print("Expect 166650, received ", sherlockAndAnagrams(s3))

Logic in words

With this code, I analyze blocks of substrings. First, I analyze each substring of size 1, then of size 2, then of size 3 etc. until n-1 (where n would be the size of the whole string in input).

Let's take the word mama again. What I do is:

The count starts at 0.

I take the first substring of size 1, which is m, and I compare it with the second a (False), the third m (True) and the fourth a (False). The count is now 1.
Then I take the second substring of size 1 which is a, and I compare it with the third m (False) and the fourth a (True). The count is now 2.
Then I take the third substring of size 1, which is m, and I compare it with the fourth a (False). The count stays at 2.
Then I move to the first substring of size 2, which is ma, and I compare it with the next substring am (True) and ma (True). The count is now 4.
Then I move to the next substring which is am and I compare it with the next substring ma (True). The count is now 5.
Then I move to the first substring of size 3, which is mam, and I compare it with the next substring ama (False). The count stays at 5.

At this point, the loop is over and I'm left with the count 5.

Optimization

The code works, but it is pretty consuming. For the third example in my code snippet, a string which contains 100 times the letter a, the result of the count of possible anagrams is 166,650 and the code needs to make 490,050 iterations before getting to that result. In the case of that 100 a's string, it takes 1.324274 seconds to execute.

I am trying to optimize this code. Any thought or suggestion?

Question 2

Unused Imports

You don't use any of the imports in your code. Remove them.

`areMapEquals`

Dictionarys can be compared using the =

def areMapEquals(m1, m2):
 return m1 == m2

But at this point, this function is obsolete since you can now make this check in the hasSameLetters function. But eventually that function gets obsolete too, since you can do that calculation without a function too.

Naming

Functions and variables should be in snake_case, not camelCase.

Counters

Instead of creating a dict yourself, utilize the collections.Counter class, which does all of this for you. It also has it's own equal comparison, so you can reduce a lot of your code.

Looping

Your first loop is just for keeping track of the size. But if you look closely, size is always just one more than i, so just define size inside the second loop as such. This removes the need for the first loop, greatly increasing performance.

Reuse variables

You have stringSize (which should be length) which hold the size of the string, but you still have len(s) in your code. Just use that variable.

Efficient Algorithms

Here's something I wrote a while back that solve this very question:

def count_anagrams(string: str) -> int:
 """
 This counts the total amount of anagram substrings, given a string.
 @param str string - String to count anagrams.
 @return int - Number of anagrams.
 """
 n = len(string)
 _map = dict() 
 
 for i in range(n):
 substr = ''
 for j in range(i, n): 
 substr = ''.join(sorted(substr + string[j]))
 _map[substr] = _map.get(substr, 0)
 _map[substr] += 1
 return sum((v * (v - 1)) // 2 for v in _map.values())

While this doesn't utilize the collections.Counter class, it's still extremely efficient and gets the job done!

Question 3

Oh hi I see you also though about the Counter class, it is so useful :)

Question 4

@Caridorc Yep! I saw OP counting in a dict and immediately thought of it :).

Question 5

Thanks a lot, I work daily in Java and just used Python for few months a long while ago. With a single answer you optimized the code and gave me very good tips to refresh the language in my mind (naming, Counter, in...)... thanks!

Question 6

`Counter` class from collections

You are just re-implementing the Counter class from collections so you can just use:

from collections import Counter

at the start, and then:

def areMapEquals(m1, m2):
 return m1 == m2
getMapOfLetters = Counter

Of course you can just do this inline because these are just a few simple operations.

On my machine using the built-ins also nets an about 10% speed-up.

A minor point is that the syntactic sugar in makes the code more readable so:

m2.__contains__(k)

becomes

k in m2

Another usability point is that

>>> help(sherlockAndAnagrams)
Help on function sherlockAndAnagrams in module __main__:
sherlockAndAnagrams(s)
 # Complete the sherlockAndAnagrams function below.

is not very very useful, with the library doctest you can put your examples in a string under the function and they will both be run and be provided when you ask help on the function.

As a last point I am not sure but I think that the library itertools has something that would very much help with the sherlockAndAnagrams function.

Question 7

k in m2 added now

Question 8

Thanks Caridorc, didn't think the "in" would work for key value pairs but just for lists, coming from Java I thought the __contains__ was the equivalent of .contains() for Map but indeed the in keyword makes the work and it's more beautiful to see :)

Ben A 10.8k5 gold badges38 silver badges103 bronze badges · Accepted Answer · 2020-07-14 18:47:07Z

Unused Imports

You don't use any of the imports in your code. Remove them.

`areMapEquals`

Dictionarys can be compared using the =

def areMapEquals(m1, m2):
 return m1 == m2

But at this point, this function is obsolete since you can now make this check in the hasSameLetters function. But eventually that function gets obsolete too, since you can do that calculation without a function too.

Naming

Functions and variables should be in snake_case, not camelCase.

Counters

Instead of creating a dict yourself, utilize the collections.Counter class, which does all of this for you. It also has it's own equal comparison, so you can reduce a lot of your code.

Looping

Your first loop is just for keeping track of the size. But if you look closely, size is always just one more than i, so just define size inside the second loop as such. This removes the need for the first loop, greatly increasing performance.

Reuse variables

You have stringSize (which should be length) which hold the size of the string, but you still have len(s) in your code. Just use that variable.

Efficient Algorithms

Here's something I wrote a while back that solve this very question:

def count_anagrams(string: str) -> int:
 """
 This counts the total amount of anagram substrings, given a string.
 @param str string - String to count anagrams.
 @return int - Number of anagrams.
 """
 n = len(string)
 _map = dict() 
 
 for i in range(n):
 substr = ''
 for j in range(i, n): 
 substr = ''.join(sorted(substr + string[j]))
 _map[substr] = _map.get(substr, 0)
 _map[substr] += 1
 return sum((v * (v - 1)) // 2 for v in _map.values())

While this doesn't utilize the collections.Counter class, it's still extremely efficient and gets the job done!

Oh hi I see you also though about the Counter class, it is so useful :)
@Caridorc Yep! I saw OP counting in a dict and immediately thought of it :).
Thanks a lot, I work daily in Java and just used Python for few months a long while ago. With a single answer you optimized the code and gave me very good tips to refresh the language in my mind (naming, Counter, in...)... thanks!

Stack Exchange Network

Optimizing string analysis using slicing

Problem statement

Current (working) code

Logic in words

Optimization

2 Answers 2

Unused Imports

`areMapEquals`

Naming

Counters

Looping

Reuse variables

Efficient Algorithms

`Counter` class from collections

You must log in to answer this question.

Hot Network Questions

Optimizing string analysis using slicing

Problem statement

Current (working) code

Logic in words

Optimization

2 Answers 2

Unused Imports

areMapEquals

Naming

Counters

Looping

Reuse variables

Efficient Algorithms

Counter class from collections

You must log in to answer this question.

Related

Hot Network Questions

`areMapEquals`

`Counter` class from collections