5
\$\begingroup\$

I was trying to solve this questions:

Given an input string, find all combination of unique substrings, which are anagrams of each other and distinct from each other. The substrings must follow 2 <= length(substring) < len(original string).

Example 1:

  • String: "vadapav"
  • substrings: va, vad, vada, vadap, vadapa, ad, ada, adap, adapa, adapav, da, dap, dapa, dapav, ap, apa, apav, pa, pav, av
  • Output: (va, av), (ad, da), (ap, pa)

How can it be optimised in terms of algorithmic and code complexity?

# Code goes below
from collections import defaultdict
import itertools
def anagram_substring(str):
 substr_list = []
 ans = []
 is_present = defaultdict(list)
 for i in xrange(len(str)):
 for j in xrange(i+2, len(str)+1):
 substr_list.append(str[i:j])
 substr_list = list(set(substr_list)) 
 for substr in substr_list:
 if is_present[''.join(sorted(substr))]:
 for anagram_str in is_present[''.join(sorted(substr))]:
 ans.append([anagram_str,substr])
 is_present[''.join(sorted(substr))].append(substr) 
 return ans
str = raw_input().strip()
print anagram_substring(str)
asked Mar 31, 2017 at 6:00
\$\endgroup\$
6
  • \$\begingroup\$ The requirement 'The substrings must follow 2 <= length(substring) < len(original string)' is supefluous, because it follows from the preceding sentence: 1-char anagrams can't be distinct, hence len>1, and there's only one substring with length equal to the input string, so it couldn't differ from itself, either. \$\endgroup\$ Commented Mar 31, 2017 at 8:46
  • \$\begingroup\$ If you do not have all expected outputs, the code is broken, which is off-topic here, but my suggestion would be to at least remove duplicates with a set when building the list. \$\endgroup\$ Commented Mar 31, 2017 at 8:50
  • \$\begingroup\$ As far as I can understand your code, you test for two substrings for being reverse of one another (like "dome" & "emod"), not anagrams (like "dome" & "mode"). Reverse is a special case of anagram, but not every anagram is a reverse; not if string has more than 2 characters. For example, in "brabant" your code could find "ab" & "ba", but probably not "bra" & "rab". \$\endgroup\$ Commented Mar 31, 2017 at 8:56
  • \$\begingroup\$ IMHO from the problem description it follows that for "barbarian" the solution should be similar to ("bar", "arb", "bra"). Can your code produce that? \$\endgroup\$ Commented Mar 31, 2017 at 9:05
  • \$\begingroup\$ yes, I modified code, its working fine, can it be optimised more? \$\endgroup\$ Commented Mar 31, 2017 at 12:57

2 Answers 2

2
\$\begingroup\$

There's no need to use a list and convert it to a set, you can use a set from the start.

Also, you're checking for every entry if it's in the list as an anagram and adding it to the answer, which is inefficient. You can do this in a faster way by using a dictionary that has the sorted substr as key and pushing the values in there.

from collections import defaultdict
import itertools
def anagram_substring(str):
 substr_list = set([])
 anagrams = defaultdict(list)
 for i in xrange(len(str)):
 for j in xrange(i + 2, len(str) + 1):
 substr_list.add(str[i:j])
 for substr in substr_list:
 anagrams[''.join(sorted(substr))].append(substr)
 return [values for key, values in anagrams.iteritems() if len(values) > 1]
str = raw_input().strip()
print anagram_substring(str)
answered Apr 1, 2017 at 15:40
\$\endgroup\$
2
\$\begingroup\$

If you look at the current code complexity, it is O(n**2) in space complexity (since all substrings need to be saved).

Also, it is O(n**2)*O(k*log(k)) in time complexity, where O(n**2) is time to generate each substring for original string of length n, and O(k*log(k)) is time complexity to sort each substring of length k and save it as a key to be checked against later on.

In the worst case, k will approach n, so your time complexity would be O(log(n)*n**3) for the overall solution.

However, instead of sorting the keys, if you could just look up the keys for each substring, and compare against their values in other substrings, you would be good to go.

So consider solution below, which basically stores the count of each character within a substring in a Counter dictionary, and compares the Counters. Creation of counter as well as comparision of it is an O(k) operation. So that, the overall time complexity becomes O(n**3) in worst case.

Also note, your original output in the problem statement needs to be corrected: 'dapav', 'vadap' is a pair of anagrams that the output in the question is missing.

def find_anagram_substrings(s):
 length = len(s)
 substrings, substring_chars = [], {}
 results = set()
 for i in xrange(0, length):
 for j in xrange(i+2, length+1):
 substring = s[i:j]
 substrings.append(s[i:j])
 substring_chars[substring] = Counter(substring)
 for s1, s2 in product(substrings, substrings):
 if s1 != s2 and substring_chars[s1] == substring_chars[s2]:
 results.add((s1, s2) if s1 < s2 else (s2, s1))
 return results
print find_anagram_substrings("vadapav")
print find_anagram_substrings("gnomeslikelemons")

Coming to the code in the original post, I've added a few suggestions on the naming conventions that could help make it more readable:

from collections import defaultdict
import itertools
# itertools is not used, no need to import
def anagram_substring(str):
 substr_list = []
 ans = []
 # ans by itself does not clarify what it might contain.
 # result could be a better choice of variable name
 is_present = defaultdict(list)
 # is_present is again an ambiguous choice of variable name
 # it does not tell what values could it be storing
 for i in xrange(len(str)):
 # len(str) is used multiple times, can be a variable
 for j in xrange(i+2, len(str)+1):
 substr_list.append(str[i:j])
 substr_list = list(set(substr_list))
 # a set can be iterated normally, so why convert it back to list 
 for substr in substr_list:
 if is_present[''.join(sorted(substr))]:
 for anagram_str in is_present[''.join(sorted(substr))]:
 ans.append([anagram_str,substr])
 is_present[''.join(sorted(substr))].append(substr) 
 return ans
str = raw_input().strip()
print anagram_substring(str)
answered Apr 4, 2017 at 20:30
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.