Checking if two strings are anagrams

Question 1

Here is my code for checking to see if two strings are anagrams. This seems a little short. Am I missing anything? Both time and space complexity seem to be \$O(1)\$.

//Determine if a string is an anagram
//Time complexity: O(1) 
//Space complexity: O
#include <iostream>
#include <string>
using namespace std;
bool anagram(string one, string two) 
{
 if (one.length() != two.length())
 { 
 return false;
 }
 //sorting the strings
 sort(one.begin(), one.end());
 sort(two.begin(), two.end());
 return (one == two);
}
int main() {
 string one;
 string two;
 cout << "please enter in a string: ";
 cin >> one;
 cout << "please enter in another string: ";
 cin >> two;
 cout << anagram(one, two) <<endl;
 return 0;
}

Question 2

Well, if you've got a call to sort() in there, then it's not going to be O(1).

Question 3

Like @Greg said, it's not O(1). You can implement the algorithm using character occurence counting but it should be at least O(N) with N = maxlength(one, two)

Question 4

Oh whoops, totally forgot about calling sort(). Thanks!

Question 5

if the C++ function sort() has an average time complexity of O(nlogn), then would this time complexity be O(n)?

Question 6

to calculate total complexity of application you just add all Os together, so in your case the input of strings is (2n+2m) (cin+string constructor), then if n!=m we have +1, if n==m there's +(2n.logn) sort, and string == operator is +n again, so total in common case is O(2n+2n+2n.logn+n) = O(5n+2n.logn) = O(n.(5+2.logn)) ... now you "kill" constants, as in big O notation O(5) is same as O(1), so only O(n.logn) remains. You can also imagine it as the biggest complexity eclipses the lesser ones. n*logn is much bigger than n, so O(n+n.logn) is O(n.logn). Just imagine huge n, makes sense?

Question 7

Dos and donts

Don't use using namespace std. It's fine for very small programs, but std:: immediately tells someone that you're using a standard function and not some self-written sort.

Use all #include's necessary. std::sort is defined in <algorithm>. It seems like your C++ distribution includes <algorithm> or stl_algo in one of the other headers. That's not portable.

Use proper names. anagram doesn't tell anything about the function. Does it create an anagram? Does it check whether something is an anagram of something else? are_anagrams or something similar is less ambiguous.

The issue of \$\mathcal O(1) \$

You cannot get \$\mathcal O(1)\$ for this. You have to view each letter in each word at least once, therefore ending up with \$\mathcal O(n+m)\$ if one.size() \$\mathcal O(n)\$ and two.size() is \$\mathcal O(m)\$. However, since one.size() == two.size() \$n = m\,ドル otherwise we can find an answer in \$\mathcal O(1) \$.

Regardless, you can solve this in \$\mathcal O(1)\$ additional memory, if you use std::array<int, 128> or another fixed/variable size \$O(1)\$ indexable container:

typedef std::array<int, 256> character_count_type;
bool is_anagram(const std::string & one, const std::string & two)
{
 if(one.size() != two.size())
 {
 return false;
 }
 character_count_type character_count_one;
 character_count_type character_count_two;
 for(std::size_t i = 0; i < one.size(); ++i){
 assert(0 <= one[i] && one[i] < 256);
 assert(0 <= two[i] && two[i] < 256);
 character_count_one[one[i]]++;
 character_count_two[two[i]]++;
 }
 return character_count_one == character_count_two;
}

This has \$\mathcal O(n)\$ worst time complexity, and due to the constant size of std::array<...> \$\mathcal O(1)\$ additional space.

Question 8

BTW, just for completeness and also to show how high level languages can easily hide O(..) parts: there's hidden O(n) + O(m) outside in the definition of string one and two. So even when n != m, the whole program runs in O(n+m), although the method itself is O(1) reaping the fruits of "costly" string initialization. (if C-like char * would be used instead, the same cost would move from initialization of strings to strlen call).

Question 9

sorry just a little confused - so the overall worst case run time is O(n) not O(nlogn)? and best case would be O(1) (e.g. lengths of strings don't match so return false immediately)?

Question 10

@jellyxbean: If you use an algorithm that's based on comparison, it will be O(n log n). If you use an algorithm that's based on counting with a fixed range it will take O(N+n) (counting sort). Since we can consider N = 128 a constant, it's dropping off in the asymptotic analysis.

Question 11

Usually when discussing algorithm on theoretical complexity basis, you take input data as granted and you focus only on algorithm itself. So initialization of string is then "for free" and when lengths are different, the if (n != m) is O(1). If it would be C-like char *, it would be if (strlen(one) != strlen(two)), which would be O(n+m)! But in real world performance profiling you will get same result as with string, because the cost of strlen is hidden in string(const char *) constructor. @jellyxbean

Question 12

count[ one[i] ] should probably be count[ static_cast<unsigned char>(one[i]) ]. You need the string chars to be zero-extended for use as array indices, not sign-extended.

Zeta Zeta 19.6k2 gold badges57 silver badges90 bronze badges · Accepted Answer · 2016-07-12 04:29:46Z

Dos and donts

Don't use using namespace std. It's fine for very small programs, but std:: immediately tells someone that you're using a standard function and not some self-written sort.

Use all #include's necessary. std::sort is defined in <algorithm>. It seems like your C++ distribution includes <algorithm> or stl_algo in one of the other headers. That's not portable.

Use proper names. anagram doesn't tell anything about the function. Does it create an anagram? Does it check whether something is an anagram of something else? are_anagrams or something similar is less ambiguous.

The issue of \$\mathcal O(1) \$

You cannot get \$\mathcal O(1)\$ for this. You have to view each letter in each word at least once, therefore ending up with \$\mathcal O(n+m)\$ if one.size() \$\mathcal O(n)\$ and two.size() is \$\mathcal O(m)\$. However, since one.size() == two.size() \$n = m\,ドル otherwise we can find an answer in \$\mathcal O(1) \$.

Regardless, you can solve this in \$\mathcal O(1)\$ additional memory, if you use std::array<int, 128> or another fixed/variable size \$O(1)\$ indexable container:

typedef std::array<int, 256> character_count_type;
bool is_anagram(const std::string & one, const std::string & two)
{
 if(one.size() != two.size())
 {
 return false;
 }
 character_count_type character_count_one;
 character_count_type character_count_two;
 for(std::size_t i = 0; i < one.size(); ++i){
 assert(0 <= one[i] && one[i] < 256);
 assert(0 <= two[i] && two[i] < 256);
 character_count_one[one[i]]++;
 character_count_two[two[i]]++;
 }
 return character_count_one == character_count_two;
}

This has \$\mathcal O(n)\$ worst time complexity, and due to the constant size of std::array<...> \$\mathcal O(1)\$ additional space.

BTW, just for completeness and also to show how high level languages can easily hide O(..) parts: there's hidden O(n) + O(m) outside in the definition of string one and two. So even when n != m, the whole program runs in O(n+m), although the method itself is O(1) reaping the fruits of "costly" string initialization. (if C-like char * would be used instead, the same cost would move from initialization of strings to strlen call).
sorry just a little confused - so the overall worst case run time is O(n) not O(nlogn)? and best case would be O(1) (e.g. lengths of strings don't match so return false immediately)?
@jellyxbean: If you use an algorithm that's based on comparison, it will be O(n log n). If you use an algorithm that's based on counting with a fixed range it will take O(N+n) (counting sort). Since we can consider N = 128 a constant, it's dropping off in the asymptotic analysis.
Usually when discussing algorithm on theoretical complexity basis, you take input data as granted and you focus only on algorithm itself. So initialization of string is then "for free" and when lengths are different, the if (n != m) is O(1). If it would be C-like char *, it would be if (strlen(one) != strlen(two)), which would be O(n+m)! But in real world performance profiling you will get same result as with string, because the cost of strlen is hidden in string(const char *) constructor. @jellyxbean
count[ one[i] ] should probably be count[ static_cast<unsigned char>(one[i]) ]. You need the string chars to be zero-extended for use as array indices, not sign-extended.

Stack Exchange Network

Checking if two strings are anagrams

1 Answer 1

Dos and donts

The issue of \$\mathcal O(1) \$

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Checking if two strings are anagrams

1 Answer 1

Dos and donts

The issue of \$\mathcal O(1) \$

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions