The code below is an attempt at a solution to an exercise from the book "Cracking the Coding Interview."
I believe that the worst case time complexity of the code below is \$O(n)\,ドル where n is the length of each string (they should be the same length since I am checking if there lengths are equal) and the space complexity is \$O(n)\$.
Is this correct? In particular does checking the length of each string take \$O(1)\$ time?
def is_permutation(first_string, other_string):
if len(first_string) != len(other_string):
return False
count_first = {}
count_other = {}
for char in first_string:
if char in count_first.keys():
count_first[char] += 1
else:
count_first[char] = 1
for char in other_string:
if char in count_other.keys():
count_other[char] += 1
else:
count_other[char] = 1
for char in count_first.keys():
if char not in count_other.keys():
return False
elif count_first[char] != count_other[char]:
return False
return True
4 Answers 4
Yes, len(str)
should be O(1) in Python. (Good question!) Each of your for
loops is O(n), so your whole function is O(n).
Your counting loops could be written more compactly as
for char in first_string:
count_first[char] = 1 + count_first.get(char, 0)
The epilogue could be simplified to
return count_first == count_other
It pays to get familiar with the standard Python library, though. Your entire function could be more simply implemented as
from collections import Counter
def is_permutation(a, b):
return len(a) == len(b) and Counter(a) == Counter(b)
... where len(a) == len(b)
is an optional optimization. Writing less code simplifies maintenance and tends to create fewer opportunities for introducing bugs (as in Rev 2 of your question).
-
5\$\begingroup\$ "Writing less code simplifies maintenance and tends to create fewer opportunities for introducing bugs": Come over to code golf and you'll get some great tips on how to simplify maintenance. ;-P \$\endgroup\$Stewie Griffin– Stewie Griffin2016年09月09日 07:29:42 +00:00Commented Sep 9, 2016 at 7:29
-
\$\begingroup\$ codegolf.stackexchange.com :D \$\endgroup\$mbomb007– mbomb0072016年09月09日 14:46:46 +00:00Commented Sep 9, 2016 at 14:46
-
4\$\begingroup\$ Let's not get carried away here. There's a reason why I wrote "less code" instead of "shorter code", and included "tends to"! \$\endgroup\$200_success– 200_success2016年09月09日 14:49:25 +00:00Commented Sep 9, 2016 at 14:49
-
\$\begingroup\$ is there a difference (computationally) between writing key in dict vs key in dict.key() ? What value does the function keys() add. Is that something worth learning? \$\endgroup\$geekidharsh– geekidharsh2018年03月26日 05:52:26 +00:00Commented Mar 26, 2018 at 5:52
-
1\$\begingroup\$ @geekidharsh Not calling
dict.keys()
means that you save the work of constructing a view of the keys. \$\endgroup\$200_success– 200_success2018年03月26日 05:58:21 +00:00Commented Mar 26, 2018 at 5:58
What I understand is that you want to know if 2 strings contain the same characters in the same quantity and not necessarily arranged similarly.
For example "cat"
and "act"
should return true
.
You could check the length, then sort both strings and just compare them using ==
.
-
2\$\begingroup\$ ideone.com/7sIoF4 Is this kind of implementation what you were talking about? Not sure about the big O notation in this example. I believe sort is O(n log n) \$\endgroup\$N.J.Dawson– N.J.Dawson2016年09月08日 09:56:39 +00:00Commented Sep 8, 2016 at 9:56
-
3\$\begingroup\$ Yes, it's O(n log n), so less efficient even if more compact to write. \$\endgroup\$Quentin Pradet– Quentin Pradet2016年09月08日 10:07:45 +00:00Commented Sep 8, 2016 at 10:07
-
3\$\begingroup\$ In practice, lots of Python code tends to be slow (by a constant factor), and
dict
operations can also be slow (by a constant factor), so the O(n log n) solution with a small constant factor can be faster than the O(n) solution. To figure out which one is faster for you, benchmark it for your typical inputs. Without the benchmark I'd go with the short solutionsorted(a) == sorted(b)
, because it's easier to understand what it does and that it's correct. \$\endgroup\$pts– pts2016年09月08日 12:26:40 +00:00Commented Sep 8, 2016 at 12:26
I think you can avoid the need for count_other
and for the final loop.
Keep the first loop as it is, but in the second loop (i.e. for char in other_string
), remove each character from count_first
. If you can't remove it, you have a character that's not in first
so return false.
If you reach the end of the second loop, then you just need to check whether count_first
is empty (i.e. all values are zero).
def is_permutation(first_string, other_string):
if len(first_string) != len(other_string):
return False
count_first = {}
for char in first_string:
if char in count_first.keys():
count_first[char] += 1
else:
count_first[char] = 1
for char in other_string:
if char not in count_first.keys():
return False
count_first[char] -= 1
if count_first[char] < 0:
return False
for count in count_first.values():
if count > 0:
return False
return True
This improves the original in two ways: it reduces the storage requirements, and it provides an earlier return in some of the negative cases.
You can eliminate the if char in keys()
test in a number of ways:
count_first.setdefault(0);
- use a
collections.defaultdict
, or better,collections.Counter
instead of adict
If we use a Counter
, it's simple to compare the results:
from collections import Counter
def is_permutation(first_string, other_string):
return Counter(first_string) == Counter(other_string)
Yes, that's the whole function!
Note: Before checking these functions, you need to check if len(str(n)) == len(str(m))
def perm(n,m):
return sorted(str(n)) == sorted(str(m))
%timeit perm(783169,781396)
1.25 μs ± 36.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit perm(783169123123123123123123,781396123123123123123123)
3.44 μs ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit perm(783169123123123123123123,781396123123123123123123)
3.79 μs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The sorted()
function of string is the fastest if the string size is small.
def perm(n,m):
n = [i for i in str(n)]
for i in str(m):
try:
n.remove(i)
except:
return False
return True
%timeit perm(783169,781396)
1.57 μs ± 46.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit perm(783169123123123123123123,781396123123123123123123)
4.77 μs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Using a List
instead of a Dictionary
or Counter()
is faster, since you don't always have to check for all the items.
def perm(n,m):
n = collections.deque(str(n))
for i in str(m):
try:
n.remove(i)
except:
return False
return True
%timeit perm(783169,781396)
1.45 μs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit perm(783169123123123123123123,781396123123123123123123)
3.28 μs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The deque()
function from the collections
module is faster than List
when it comes to popping items.
Even though sorted()
is much faster than deque()
for a small string, deque()
is much faster for a longer string. These timings were calculated for the worst case scenario where the two strings are permutations of each other. For cases where this is not true, deque()
would become even faster while sorted()
would stay the same.
-
1\$\begingroup\$ Those test strings are very short, and don't give any indication of how the code scales as they grow longer - but that's exactly what we need to know if we want to discuss big-O performance. \$\endgroup\$Toby Speight– Toby Speight2019年03月12日 09:40:28 +00:00Commented Mar 12, 2019 at 9:40
-
\$\begingroup\$ Yes you are correct. I tested it for bigger strings and updated the post. As i guessed before, sorted is slower for longer strings \$\endgroup\$LastStep– LastStep2019年03月12日 10:46:44 +00:00Commented Mar 12, 2019 at 10:46
Explore related questions
See similar questions with these tags.
len()
might be more expensive than \$O(n)\,ドル and if it's \$O(n)\$ you don't need to worry about it. Your code must examine every character of both strings at least once, hence you no way can get better than \$O(n)\,ドル so the cost oflen()
doesn't matter in this context. \$\endgroup\$