I am trying to remove the characters from the list generated from the second string. At the end, if the array is empty, it is the permutation of two strings, else it is not.
Is this logic fine?
def is_permutation(string1, string2):
if len(string1) != len(string2):
return False
string2 = list(string2)
for char in string1:
if char in string2:
string2.remove(char)
if not string2:
return True
return False
if __name__ == '__main__':
print(is_permutation("abc", "cba"))
print(is_permutation("bc", "ab"))
print(is_permutation("dog", "god"))
-
3\$\begingroup\$ Logic is fine but not performances, try to use a Counter object. \$\endgroup\$N74– N742018年03月29日 06:36:50 +00:00Commented Mar 29, 2018 at 6:36
-
3\$\begingroup\$ Have a look at Determining if a word is an anagram of another for more efficient approaches. \$\endgroup\$Martin R– Martin R2018年03月29日 06:39:41 +00:00Commented Mar 29, 2018 at 6:39
-
\$\begingroup\$ Thanks, I have been told not to use any library. \$\endgroup\$Kishan Mehta– Kishan Mehta2018年03月29日 06:40:08 +00:00Commented Mar 29, 2018 at 6:40
-
\$\begingroup\$ A Counter is a special case of a dictionary where the key is the character and the value is the number of time it occurs. You could implement it without any libraries. \$\endgroup\$mochi– mochi2018年04月03日 00:03:50 +00:00Commented Apr 3, 2018 at 0:03
-
\$\begingroup\$ Ok, what would be time complexity for it. For my approach, it is O(n). \$\endgroup\$Kishan Mehta– Kishan Mehta2018年04月03日 05:07:11 +00:00Commented Apr 3, 2018 at 5:07
3 Answers 3
I don't want to post an answer but I have seen no one pointing out that your code objectively speaking is very inefficient.
Now, two strings are permutations of each other if and only if every character that occurs in one string N times occurs N times in the other string, vice versa.
Now you only need to go through the strings once. You can just count the occurrences of characters.
Like this:
occurrences = {}
for c in "occurrences":
occurrences[c] = occurrences.setdefault(c, 0) + 1
occurrences
If you run this code, you will see:
{'o': 1, 'c': 3, 'u': 1, 'r': 2, 'e': 2, 'n': 1, 's': 1}
Now two strings are permutations of each other if the resultant counters after going through this operation are equal, we check using ==
operator.
The code:
def count_chars(string):
counts = {}
for c in string:
counts[c] = counts.setdefault(c, 0) + 1
return counts
def is_permutation(str1, str2):
return count_chars(str1) == count_chars(str2)
Now if you use collections.Counter
this becomes even more trivial:
from collections import Counter
def is_permutation(str1, str2):
return Counter(str1) == Counter(str2)
appropriate datastructure
for char in string1:
if char in string2:
string2.remove(char)
That \$O(n)\$ linear .remove()
call is going to be very expensive.
Suppose we encounter an input like this:
from string import ascii_lowercase
string1 = "".join(reversed(ascii_lowercase)) # "zyx...cba"
Given the quadratic nested loops, we're looking at \$O(n^3)\$ cubic complexity. And with unicode inputs, \$n\$ can take on much more than \26ドル\$ distinct values.
A dict
would make much more sense here than a list
,
as it supports \$O(1)\$ constant time operations.
batteries included
Even better would be a
Counter,
which is a more convenient form of a dict
for the problem's purposes.
from collections import Counter
def is_permutation(a: str, b: str) -> bool:
return Counter(a) == Counter(b)
Time complexity here is \$O(n)\$ linear with the size of the longer string.
simplicity
Even if you were unaware of the Counter
class
and wanted to stick with str
, there's still
room for lots of improvement over the OP code.
def is_permutation(a: str, b: str) -> bool:
return sorted(a) == sorted(b)
Time complexity of \$O(n \log n)\$ is still pretty good.
Documentation
The PEP 8 style guide recommends a docstring for functions. For example:
def is_permutation(string1, string2):
""" Determine if 2 strings are permutations of each other """
Naming
is_permutation
is a good name for a function that returns a boolean.
Transforming the input string to an array is fine, but it is a little confusing to keep the same variable name:
string2 = list(string2)
Renaming it would be clearer:
string2_chars = list(string2)
Tests
The examples you show are straightforward. It would be good
to add a few twists. For example, here are 2 words that have
a repeated letter (o
):
print(is_permutation("odor", "door"))
It passes!
Non-alpha:
print(is_permutation("12345_@#$", "_2@13#54ドル"))
It is always good practice to add a sanity check, like the same word twice:
print(is_permutation("information", "information"))