Here is my code for checking to see if two strings are anagrams. This seems a little short. Am I missing anything? Both time and space complexity seem to be \$O(1)\$.
//Determine if a string is an anagram
//Time complexity: O(1)
//Space complexity: O
#include <iostream>
#include <string>
using namespace std;
bool anagram(string one, string two)
{
if (one.length() != two.length())
{
return false;
}
//sorting the strings
sort(one.begin(), one.end());
sort(two.begin(), two.end());
return (one == two);
}
int main() {
string one;
string two;
cout << "please enter in a string: ";
cin >> one;
cout << "please enter in another string: ";
cin >> two;
cout << anagram(one, two) <<endl;
return 0;
}
1 Answer 1
Dos and donts
Don't use using namespace std
. It's fine for very small programs, but std::
immediately tells someone that you're using a standard function and not some self-written sort
.
Use all #include
's necessary. std::sort
is defined in <algorithm>
. It seems like your C++ distribution includes <algorithm>
or stl_algo
in one of the other headers. That's not portable.
Use proper names. anagram
doesn't tell anything about the function. Does it create an anagram? Does it check whether something is an anagram of something else? are_anagrams
or something similar is less ambiguous.
The issue of \$\mathcal O(1) \$
You cannot get \$\mathcal O(1)\$ for this. You have to view each letter in each word at least once, therefore ending up with \$\mathcal O(n+m)\$ if one.size()
\$\mathcal O(n)\$ and two.size()
is \$\mathcal O(m)\$. However, since one.size() == two.size()
\$n = m\,ドル otherwise we can find an answer in \$\mathcal O(1) \$.
Regardless, you can solve this in \$\mathcal O(1)\$ additional memory, if you use std::array<int, 128>
or another fixed/variable size \$O(1)\$ indexable container:
typedef std::array<int, 256> character_count_type;
bool is_anagram(const std::string & one, const std::string & two)
{
if(one.size() != two.size())
{
return false;
}
character_count_type character_count_one;
character_count_type character_count_two;
for(std::size_t i = 0; i < one.size(); ++i){
assert(0 <= one[i] && one[i] < 256);
assert(0 <= two[i] && two[i] < 256);
character_count_one[one[i]]++;
character_count_two[two[i]]++;
}
return character_count_one == character_count_two;
}
This has \$\mathcal O(n)\$ worst time complexity, and due to the constant size of std::array<...>
\$\mathcal O(1)\$ additional space.
-
\$\begingroup\$ BTW, just for completeness and also to show how high level languages can easily hide O(..) parts: there's hidden O(n) + O(m) outside in the definition of string
one
andtwo
. So even when n != m, the whole program runs in O(n+m), although the method itself is O(1) reaping the fruits of "costly"string
initialization. (if C-likechar *
would be used instead, the same cost would move from initialization of strings tostrlen
call). \$\endgroup\$Ped7g– Ped7g2016年07月12日 09:24:01 +00:00Commented Jul 12, 2016 at 9:24 -
\$\begingroup\$ sorry just a little confused - so the overall worst case run time is O(n) not O(nlogn)? and best case would be O(1) (e.g. lengths of strings don't match so return false immediately)? \$\endgroup\$jellyxbean– jellyxbean2016年07月13日 01:09:40 +00:00Commented Jul 13, 2016 at 1:09
-
\$\begingroup\$ @jellyxbean: If you use an algorithm that's based on comparison, it will be O(n log n). If you use an algorithm that's based on counting with a fixed range it will take O(N+n) (counting sort). Since we can consider
N = 128
a constant, it's dropping off in the asymptotic analysis. \$\endgroup\$Zeta– Zeta2016年07月13日 06:52:44 +00:00Commented Jul 13, 2016 at 6:52 -
\$\begingroup\$ Usually when discussing algorithm on theoretical complexity basis, you take input data as granted and you focus only on algorithm itself. So initialization of
string
is then "for free" and when lengths are different, theif (n != m)
is O(1). If it would be C-likechar *
, it would beif (strlen(one) != strlen(two))
, which would be O(n+m)! But in real world performance profiling you will get same result as withstring
, because the cost ofstrlen
is hidden instring(const char *)
constructor. @jellyxbean \$\endgroup\$Ped7g– Ped7g2016年07月13日 12:54:45 +00:00Commented Jul 13, 2016 at 12:54 -
\$\begingroup\$
count[ one[i] ]
should probably becount[ static_cast<unsigned char>(one[i]) ]
. You need the string chars to be zero-extended for use as array indices, not sign-extended. \$\endgroup\$Peter Cordes– Peter Cordes2022年08月21日 13:55:50 +00:00Commented Aug 21, 2022 at 13:55
sort()
in there, then it's not going to be O(1). \$\endgroup\$sort()
. Thanks! \$\endgroup\$sort()
has an average time complexity of O(nlogn), then would this time complexity be O(n)? \$\endgroup\$n!=m
we have +1, ifn==m
there's +(2n.logn) sort, and string==
operator is +n again, so total in common case is O(2n+2n+2n.logn+n) = O(5n+2n.logn) = O(n.(5+2.logn)) ... now you "kill" constants, as in big O notation O(5) is same as O(1), so only O(n.logn) remains. You can also imagine it as the biggest complexity eclipses the lesser ones. n*logn is much bigger than n, so O(n+n.logn) is O(n.logn). Just imagine hugen
, makes sense? \$\endgroup\$