Check if two strings are anagrams

Question 1

I'm doing some practice questions from the book Cracking the Coding Interview and would like people to review my code for bugs and possible optimizations.

Question:

Write a method to decide if two strings are anagrams or not.

/*
Time complexity: O(n^2)
Space complexity: O(n)
*/
bool IsAnagram(std::string str1, std::string str2)
{
    if(str1.length() != str2.length())
        return false;
    for(int i = 0; i < str1.length();i++)
    {
        bool found = false;
        int j = 0;
        while(!found && j < str2.length())
        {
            if(str1[i] == str2[j])
            {
                found = true;
                str2[j] = NULL;
            }
            j++;
        }
        if(!found)
            return false;
    }
    return true;
}

Question 2

Don't know if you're interested (since it doesn't really review your code, but is an optimization), but one algorithm that comes to mind would be to sort both strings and then compare them. That is O(n log n), assuming an O(n log n) sort.

Question 3

Consider uppercasing the strings, sorting the characters in each string alphabetically, then doing an equality test between them.
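A minimal sketch of that suggestion, assuming the comparison should ignore case for single-byte characters only. The names `ToUpperInPlace` and `IsAnagramIgnoreCase` are made up for illustration and do not appear anywhere in the thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases every character in place.
// The lambda takes unsigned char because passing a negative plain char
// to std::toupper is undefined behaviour.
static void ToUpperInPlace(std::string &s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Parameters are taken by value so the callers' strings stay untouched.
bool IsAnagramIgnoreCase(std::string str1, std::string str2)
{
    if (str1.length() != str2.length())
        return false;
    ToUpperInPlace(str1);
    ToUpperInPlace(str2);
    std::sort(str1.begin(), str1.end());
    std::sort(str2.begin(), str2.end());
    return str1 == str2;
}
```

With this version, "Hello" and "oLLEh" would count as anagrams, which the case-sensitive versions in this thread reject.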

Question 4

Since you only modify a copy of the second string, the first one should be a const reference:

bool IsAnagram(const std::string &str1, std::string str2)

Also, i and j should be of type size_t to match what they're compared with.

However, I think I'd do it like this:

bool IsAnagram2(std::string str1, std::string str2)
{
    std::sort(str1.begin(), str1.end());
    std::sort(str2.begin(), str2.end());
    return str1==str2;
}

Test program:

#include <iostream>
#include <string>
#include <algorithm>
#define SHOW(x) std::cout << # x " = " << x << '\n'
int main()
{
    std::cout << std::boolalpha;
    SHOW(IsAnagram("\0\0\0\0\0", "\0lehl"));
    SHOW(IsAnagram("hello", ""));
    SHOW(IsAnagram("\0\0\0\0\0", "olehl"));
    SHOW(IsAnagram("hello", "ole"));
    SHOW(IsAnagram("hello", "plehl"));
    SHOW(IsAnagram("hello", "hello"));
    SHOW(IsAnagram("hello", "12345"));
    SHOW(IsAnagram("hello", "Hello"));
    SHOW(IsAnagram("hello", "oellh"));
    SHOW(IsAnagram("hello", "olelh"));
    SHOW(IsAnagram("hello", "elelh"));
}

Program output

IsAnagram("\0\0\0\0\0", "\0lehl") = true
IsAnagram("hello", "") = false
IsAnagram("\0\0\0\0\0", "olehl") = false
IsAnagram("hello", "ole") = false
IsAnagram("hello", "plehl") = false
IsAnagram("hello", "hello") = true
IsAnagram("hello", "12345") = false
IsAnagram("hello", "Hello") = false
IsAnagram("hello", "oellh") = true
IsAnagram("hello", "olelh") = true
IsAnagram("hello", "elelh") = false

Question 5

In order to be an anagram, all that is required is that the frequencies of characters in the strings be equal.

/*
 * Time : O(n)
 * Space: O(1)
 */
bool IsAnagram(const std::string &str1, const std::string &str2)
{
    int frequencies[256] {};
    for (int i = 0; i < str1.length(); i++)
    {
        int bucket = (unsigned char) str1[i];
        frequencies[bucket]++;
    }
    for (int i = 0; i < str2.length(); i++)
    {
        int bucket = (unsigned char) str2[i];
        frequencies[bucket]--;
    }
    for (int i = 0; i < 256; i++)
    {
        if (frequencies[i] != 0)
            return false;
    }
    return true;
}

Apologies for any C++ errors; I'm mainly a Java/standard C coder.

Question 6

Good answer! I'd point out, though, that std::string is an alias for std::basic_string<char> and that char can be either signed or unsigned, so your code might not work as you intend, but the basic idea is certainly sound.

Question 7

Nice catch! Edited the answer to reflect it.

Question 8

Why use memset() instead of directly initializing? You risk getting the size wrong, as happened here.

Question 9

also good catch! i didn't know about the various initialization syntaxes when i wrote this. edited to reflect.

Question 10

Probably better to use UCHAR_MAX+1 rather than 256 in this code, to be maximally portable. Or accept more string types, and deduce the histogram size from the string's character type.
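One way the first half of this suggestion might look, as a sketch rather than the answer author's code: the table is sized from `UCHAR_MAX` (from `<climits>`) instead of the literal 256. The function name `IsAnagramPortable` and the early length check are additions for illustration.

```cpp
#include <array>
#include <climits>
#include <string>

bool IsAnagramPortable(const std::string &str1, const std::string &str2)
{
    if (str1.length() != str2.length())
        return false;
    // One bucket for every value an unsigned char can take on this platform.
    std::array<int, UCHAR_MAX + 1> frequencies{};
    for (char c : str1)
        ++frequencies[static_cast<unsigned char>(c)];
    for (char c : str2)
        --frequencies[static_cast<unsigned char>(c)];
    // Any non-zero bucket means the two strings disagree on that character.
    for (int count : frequencies)
        if (count != 0)
            return false;
    return true;
}
```

Deducing the table size from an arbitrary character type, the second half of the suggestion, would need a template and is left out here.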

Question 11

Code

I would use two for loops instead of a for loop and a while loop. This way, you shorten your code, and it is more obvious what you are doing. Also, your variable j is initialized as part of the loop and has the scope of the inner loop only, not of the outer loop. You can also remove the found variable this way.

As janos suggested, you should remove the characters when found, not just turn them into NULL; this could significantly speed your search up.
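A possible reading of "remove the characters when found", sketched under the assumption that the order of the remaining characters does not matter. The name `IsAnagramErase` is hypothetical. Instead of erasing from the middle of the string, which shifts the tail, the match is overwritten with the last character and the string is shortened by one.

```cpp
#include <cstddef>
#include <string>

bool IsAnagramErase(const std::string &str1, std::string str2)
{
    if (str1.length() != str2.length())
        return false;
    for (char c : str1)
    {
        const std::size_t pos = str2.find(c);
        if (pos == std::string::npos)
            return false;
        // Overwrite the match with the last character, then drop the
        // last character: a constant-time removal.
        str2[pos] = str2.back();
        str2.pop_back();
    }
    return true;
}
```

The search space shrinks with every match, but each `find` is still linear, so the worst case stays quadratic.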

Anagram

You might want to convert all strings to lower case. This allows the input to mix uppercase and lowercase characters and still count as an anagram if the letters are the same. Also, spaces and punctuation are not always counted as individual anagram characters, so you may wish to remove them.

These are the adjusted loops:

for (int i = 0; i < str1.length(); i++)
{
    for (int j = 0; j < str2.length(); j++)
    {
        if (str1[i] == str2[j])
        {
            str2[j] = NULL;
            break;
        }
        if (j == str2.length() - 1)
            return false;
    }
}

Update

After seeing your question on SO, Cameron's comment gave me an idea. If you #include<algorithm>, you can just do this:

std::sort(str1.begin(), str1.end());
std::sort(str2.begin(), str2.end());
return str1 == str2;

Question 12

You could remove the found variable. As soon as the comparison str1[i] == str2[j] fails, you can return false. Return true at the end of the method.

You may want to check if str1 and str2 are initialized (unequal to null).

You could perform the check using a single loop. Compare the character in str1 at i with the character in str2 at length - i - 1.

(Another possible solution was to use the reverse iterator.)

Question 13

How about holding a (perhaps balanced) BST to lookup chars?

for each char in string1 do:
    insert char into BST and do nothing if it exists
for each char in string2 do:
    lookup char in BST:
        if char doesn't exist in BST:
            return false
return true

Question 14

Interesting thought, though you will need to make it smarter to handle duplicate characters, like: apostol and ola post... ;-)

Question 15

Thanks for using my name as an example :). Maybe hold the frequency of each char into the node itself? Then when you search, you decrease it, and if any node decreases to sub-zero frequency, you know you have to return false.
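The idea of keeping a frequency in each node could be sketched with `std::map`, an ordered container that standard libraries commonly implement as a balanced tree. The function name `IsAnagramTree` and the up-front length check are assumptions added for illustration.

```cpp
#include <map>
#include <string>

bool IsAnagramTree(const std::string &str1, const std::string &str2)
{
    // With equal lengths, never going below zero implies every count
    // ends at exactly zero.
    if (str1.length() != str2.length())
        return false;
    std::map<char, int> remaining;
    for (char c : str1)
        ++remaining[c]; // inserts the node with count 0 if it is missing
    for (char c : str2)
    {
        auto node = remaining.find(c);
        // A missing node or an exhausted count means str2 has a
        // character that str1 cannot supply.
        if (node == remaining.end() || node->second == 0)
            return false;
        --node->second;
    }
    return true;
}
```

Each lookup costs a logarithmic number of comparisons in the count of distinct characters, so this handles duplicates at the price of being slower than a flat array indexed by character value.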

Question 16

This review has come out from beyond the grave, but let's try to put my stone to the building.

Reviewing your actual code:

First, you make an early return if the two strings aren't the same size. That's good, probably the first big optimization to do, and yet none of the reviewers did it. Good point for the OP.

When a parameter doesn't have to change, make it const.
Don't assign NULL to a char of a string, instead use '\0' in your case.
Don't compare signed and unsigned types. Use std::size_t for indexes to match the return type of std::string::size() (size_t is in C, while std::size_t is in C++).
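Applied together, the three points above might give something like the following. This is a sketch, not a version posted in the thread; the name `IsAnagramFixed` is hypothetical, and the '\0' marker assumes neither input contains an embedded NUL character.

```cpp
#include <cstddef>
#include <string>

// str1 is only read, so it is a const reference; str2 is modified,
// so it is still taken by value.
bool IsAnagramFixed(const std::string &str1, std::string str2)
{
    if (str1.length() != str2.length())
        return false;
    for (std::size_t i = 0; i < str1.length(); ++i)
    {
        bool found = false;
        for (std::size_t j = 0; !found && j < str2.length(); ++j)
        {
            if (str1[i] == str2[j])
            {
                found = true;
                str2[j] = '\0'; // mark as used; assumes no embedded NUL in the inputs
            }
        }
        if (!found)
            return false;
    }
    return true;
}
```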

Reviewing your logic

Your logic is pretty good and very intuitive, congratulations.

Since it's an interview question, (and requirements can be style over speed), you can also consider alternative options:

Use the standard algorithm that exists exactly for this purpose, std::is_permutation (\$O(N^2)\$ in the worst case):

bool IsAnagramPermutation(const std::string& lhs, const std::string& rhs)
{
    // C++11-compatible alternative:
    //return lhs.length() == rhs.length() && std::is_permutation(lhs.cbegin(), lhs.cend(), rhs.cbegin());
    // The four-iterator overload below is available since C++14.
    return std::is_permutation(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend());
}

Sort both strings with the standard algorithm and check whether the results are equal (\$O(N\log N)\$ in the worst case):

bool IsAnagramSorting(std::string lhs, std::string rhs)
{
    if (lhs.length() != rhs.length()) return false;
    std::sort(lhs.begin(), lhs.end());
    std::sort(rhs.begin(), rhs.end());
    return lhs == rhs;
}

Build a frequency table, then return false if at least one character has a different count in the two strings. You can vary the container type; since the size is known at compile time, I'll choose std::array (\$O(N)\$ in the worst case?)

bool IsAnagram(const std::string& lhs, const std::string& rhs)
{
    const auto length = lhs.length();
    if (length != rhs.length()) return false;
    std::array<int, 256> frequencies{};
    for (std::size_t index = 0; index < length; ++index)
    {
        if (lhs[index] != rhs[index]) {
            ++frequencies[static_cast<unsigned char>(lhs[index])];
            --frequencies[static_cast<unsigned char>(rhs[index])];
        }
    }
    for (auto frequency : frequencies)
    {
        if (frequency) return false;
    }
    return true;
}

Reviewing ... my review !?

In fact, once all the little mistakes are fixed, the OP's method outperforms all the other methods (by far) on GCC. On Clang, std::is_permutation and std::sort are both faster, depending on the inputs.

If you want to test it, there's the benchmark. I commented out some of the benchmarks, because quick-bench can't deal with all cases (timeout).

Once again, before making assumptions, let's measure!

Question 17

Regarding optimization by size: What is an anagram? In particular, is Calak an anagram of Al Ack?

Question 18

I didn't understand. I'm missing some subtleties of the language. What's the joke about the anagram? And no, Calak is just an old, old, old nickname that I still use (I take things too literally, I guess).

Question 19

I'm sorry, I didn't intend it as a joke. The question is: does whitespace invalidate an anagram? If not, then the program needs to clear spaces before checking size. (This would apply to the sorted strings too, but none of those answers are recent.)
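If whitespace is not supposed to count, the stripping has to happen before the length comparison. A minimal sketch of that ordering, with hypothetical names `StripSpaces` and `IsAnagramIgnoringSpaces` and a sort-based comparison chosen only for brevity:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s without whitespace characters.
static std::string StripSpaces(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}

bool IsAnagramIgnoringSpaces(const std::string &lhs, const std::string &rhs)
{
    std::string a = StripSpaces(lhs);
    std::string b = StripSpaces(rhs);
    // Compare lengths only after the spaces are gone.
    if (a.length() != b.length())
        return false;
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

This sketch is still case-sensitive, so "calak" matches "al ack" but "Calak" does not match "Al Ack".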

Question 20

Oh, I see! Yeah, it would be a nice evolution, "stripping space or not". And, did you look at the benchmark? Did you get surprised?

Question 21

TBH I don't know enough to have had an expectation prior to the benchmark. Therefore I can't get surprised.

Question 22

Time complexity here is O(n^2), whereas if we sort both the keys first, the worst case would be O(n^2) and O(n^2), which is not really optimized, although it really is easy to write.

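bool isAnagram(std::string key1, std::string key2) {
    // Strings of different lengths can never be anagrams.
    if (key1.length() != key2.length()) { return false; }
    std::size_t len = 0; // number of characters of key1 matched so far
    for (std::size_t i = 0; i < key1.length(); i++) {
        for (std::size_t j = 0; j < key2.length(); j++) {
            if (key1[i] == key2[j]) {
                len++;
                key2[j] = ' '; // to deal with the duplicates (assumes the inputs contain no spaces)
                break;
            }
        }
    }
    // Anagram only if every character of key1 found a partner in key2.
    return len == key1.length();
}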

Question 23

Welcome to code review! In the future, try to explain what you think is wrong with the code rather than just giving an alternate solution.

Question 24

Thanks, all, for your kind suggestions. I'll improve with time.

Edward · Answer 1 · 2014-11-14 20:05:36Z

tophyr · Answer 2 · 2014-11-15 00:05:21Z

user34073 · Answer 3 · 2014-11-14 19:55:44Z

nrainer · Answer 4 · 2014-11-14 19:31:37Z

Claudiu Apostol · Answer 5 · 2015-03-27 11:22:25Z

Calak · Answer 6 · 2018-11-25 15:39:36Z

Shaheer Hassan · Answer 7 · 2018-11-23 17:45:53Z
