I'm doing some practice questions from the book Cracking the coding interview and wanted to get some people to review my code for bugs and optimizations.
Question:
Write a method to decide if two strings are anagrams or not.
/*
Time complexity: O(n^2)
Space complexity: O(n)
*/
bool IsAnagram(std::string str1, std::string str2)
{
if(str1.length() != str2.length())
return false;
for(int i = 0; i < str1.length();i++)
{
bool found = false;
int j = 0;
while(!found && j < str2.length())
{
if(str1[i] == str2[j])
{
found = true;
str2[j] = NULL;
}
j++;
}
if(!found)
return false;
}
return true;
}
-
\$\begingroup\$ Don't know if you're interested (since it doesn't really review your code, but is an optimization), but one algorithm that comes to mind would be to sort both the strings and then do a compare. n log n (assuming an n log n sort) \$\endgroup\$Beska– Beska2014年11月14日 21:53:01 +00:00Commented Nov 14, 2014 at 21:53
-
\$\begingroup\$ Consider uppercasing the strings, sorting the characters in the string alphabetically, then doing a equality test between them. \$\endgroup\$EvilTeach– EvilTeach2014年11月15日 02:03:47 +00:00Commented Nov 15, 2014 at 2:03
7 Answers 7
Since you only modify a copy of the second string, the first one should be a const
reference:
bool IsAnagram(const std::string &str1, std::string str2)
Also, i
and j
should be of type size_t
to match what they're compared with.
However, I think I'd do it like this:
bool IsAnagram2(std::string str1, std::string str2)
{
std::sort(str1.begin(), str1.end());
std::sort(str2.begin(), str2.end());
return str1==str2;
}
Test program:
#include <iostream>
#include <string>
#include <algorithm>
#define SHOW(x) std::cout << # x " = " << x << '\n'
int main()
{
std::cout << std::boolalpha;
SHOW(IsAnagram("0円0円0円0円0円", "0円lehl"));
SHOW(IsAnagram("hello", ""));
SHOW(IsAnagram("0円0円0円0円0円", "olehl"));
SHOW(IsAnagram("hello", "ole"));
SHOW(IsAnagram("hello", "plehl"));
SHOW(IsAnagram("hello", "hello"));
SHOW(IsAnagram("hello", "12345"));
SHOW(IsAnagram("hello", "Hello"));
SHOW(IsAnagram("hello", "oellh"));
SHOW(IsAnagram("hello", "olelh"));
SHOW(IsAnagram("hello", "elelh"));
}
Program output
IsAnagram("0円0円0円0円0円", "0円lehl") = true
IsAnagram("hello", "") = false
IsAnagram("0円0円0円0円0円", "olehl") = false
IsAnagram("hello", "ole") = false
IsAnagram("hello", "plehl") = false
IsAnagram("hello", "hello") = true
IsAnagram("hello", "12345") = false
IsAnagram("hello", "Hello") = false
IsAnagram("hello", "oellh") = true
IsAnagram("hello", "olelh") = true
IsAnagram("hello", "elelh") = false
In order to be an anagram, all that is required is that the frequencies of characters in the strings be equal.
/*
* Time : O(n)
* Space: O(1)
*/
bool IsAnagram(const std::string &str1, const std::string &str2)
{
int frequencies[256] {};
for (int i = 0; i < str1.length(); i++)
{
int bucket = (unsigned char) str1[i];
frequencies[bucket]++;
}
for (int i = 0; i < str2.length(); i++)
{
int bucket = (unsigned char) str2[i];
frequencies[bucket]--;
}
for (int i = 0; i < 256; i++)
{
if (frequencies[i] != 0)
return false;
}
return true;
}
Apologize for any C++ errors, I'm mainly a Java/standard C coder.
-
2\$\begingroup\$ Good answer! I'd point out, though, that
std::string
is an alias forstd::basic_string<char>
and thatchar
can be either signed or unsigned, so your code might not work as you intend, but the basic idea is certainly sound. \$\endgroup\$Edward– Edward2014年11月15日 00:46:50 +00:00Commented Nov 15, 2014 at 0:46 -
\$\begingroup\$ Nice catch! Edited the answer to reflect it. \$\endgroup\$tophyr– tophyr2014年11月15日 00:51:01 +00:00Commented Nov 15, 2014 at 0:51
-
1\$\begingroup\$ Why use
memset()
instead of direcly initializing? You risk getting the size wrong, as happened here. \$\endgroup\$Deduplicator– Deduplicator2018年11月25日 12:24:57 +00:00Commented Nov 25, 2018 at 12:24 -
\$\begingroup\$ also good catch! i didn't know about the various initialization syntaxes when i wrote this. edited to reflect. \$\endgroup\$tophyr– tophyr2018年11月26日 14:36:13 +00:00Commented Nov 26, 2018 at 14:36
-
\$\begingroup\$ Probably better to use
UCHAR_MAX+1
rather than 256 in this code, to be maximally portable. Or accept more string types, and deduce the histogram size from the string's character type. \$\endgroup\$Toby Speight– Toby Speight2022年08月21日 15:02:20 +00:00Commented Aug 21, 2022 at 15:02
Code
I would use two for
loops instead of a for
loop and a while
loop. This way, you shorten your code, and it is more obvious what you are doing. Also, your variable j
is initialized as part of the loop and has the scope of the inner loop only, not of the outer loop. You can also remove the found
variable this way.
As janos suggested, you should remove the characters when found, not just turn them into NULL; this could significantly speed your search up.
Anagram
You might want to do are convert all strings to lower case. This will allow the input to consist of uppercase and lowercase characters, and still be an anagram if the characters are the same. Also, spaces and punctuation are not always counted as individual anagram characters, so you may wish to remove them.
This is the adjusted loops:
for (int i = 0; i < str1.length(); i++)
{
for (int j = 0; j < str2.length(); j++)
{
if (str1[i] == str2[j])
{
str2[j] = NULL;
break;
}
if (j == str2.length() - 1)
return false;
}
}
Update
After seeing your question on SO, Cameron's comment gave me an idea. If you #include<algorithm>
, you can just do this:
std::sort(str1.begin(), str1.end());
std::sort(str2.begin(), str2.end());
return str1 == str2;
You could remove the found
variable.
As soon as str1[i] == str2[j]
is unequal, you can return false.
Return true at the end of the method.
You may want to check if str1
and str2
are initialized (unequal to null).
You could perform the check using a single loop. Compare the character in str1
at i
with the character in str2
at length - i - 1
.
(Another possible solution was to use the reverse iterator.)
How about holding a (perhaps balanced) BST to lookup char
s?
for each char in string1 do:
insert char into BST and do nothing if it exists
for each char in string2 do:
lookup char in BST:
if char doesn't exist in BST:
return false
return true
-
\$\begingroup\$ Interesting thought, though you will need to make it smarter to handle duplicate characters, like:
apostol
andola post
... ;-) \$\endgroup\$rolfl– rolfl2015年03月27日 11:58:16 +00:00Commented Mar 27, 2015 at 11:58 -
\$\begingroup\$ Thanks for using my name as an example :). Maybe hold the frequency of each char into the node itself? Then when you search, you decrease it, and if any node decreases to sub-zero frequency, you know you have to return false. \$\endgroup\$Claudiu Apostol– Claudiu Apostol2015年03月28日 15:12:22 +00:00Commented Mar 28, 2015 at 15:12
This review has come out from beyond the grave, but let's try to put my stone to the building.
Reviewing your actual code:
First, you make a early return if the two strings aren't the same size. That's good, probably the first big optimization to do, and yet none of the reviewers did it. Good point for the OP.
- When a parameter doesn't have to change, make it
const
. - Don't assign
NULL
to a char of a string, instead use'0円'
in your case. - Don't compare signed and unsigned types. Use
std::size_t
for indexes to match the return type ofstd::string::size()
(size_t
is in C, whilestd::size_t
is in C++).
Reviewing your logic
Your logic is pretty good, very intuitive, congratulation.
Since it's an interview question, (and requirements can be style over speed), you can also consider alternative options:
Using a "perfect match" standard algorithm, the STL having a function exactly for this purpose: std::is_permutation (\$O(N^2)\$ in the worst case)
bool IsAnagramPermutation(const std::string& lhs, const std::string& rhs) { //return lhs.length() == rhs.length() && std::is_permutation(lhs.cbegin(), lhs.cend(), rhs.cbegin()); // the following overload is valid since c++14. For c++11 compatible, use the one above. return std::is_permutation(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend()); }
Using standard algorithm for sorting and return if both are same. (\$O(N\log N)\$ in the worst case)
bool IsAnagramSorting(std::string lhs, std::string rhs) { if (lhs.length() != rhs.length()) return false; std::sort(lhs.begin(), lhs.end()); std::sort(rhs.begin(), rhs.end()); return lhs == rhs; }
Building a frequency table and then return false if at least one char has a different count in the two strings. You can vary container type; since the size is known at compile time, I'll choose
sts::array
(\$O(N)\$ in the worst case?)bool IsAnagram(const std::string& lhs, const std::string& rhs) { const auto length = lhs.length(); if (length != rhs.length()) return false; std::array<int, 256> frequencies{}; for (std::size_t index = 0; index < length; ++index) { if (lhs[index] != rhs[index]) { ++frequencies[static_cast<unsigned char>(lhs[index])]; --frequencies[static_cast<unsigned char>(rhs[index])]; } } for (auto frequency : frequencies) { if (frequency) return false; } return true; }
Reviewing ... my review !?
In fact, once all little mistakes fixed, the OP method outperforms all other methods (by far) on GCC. On Clang, std::is_permutation
and std::sort
are both faster, depending on the inputs.
If you want to test it, there's the benchmark. I commented out some bench, because quick-bench can't deal with all cases (timeout).
Once again, before making assumptions, let's measure!
-
\$\begingroup\$ Regarding optimization by size: What is an anagram? Particularly is
Calak
an anagram ofAl Ack
? \$\endgroup\$Summer– Summer2018年11月25日 18:31:50 +00:00Commented Nov 25, 2018 at 18:31 -
\$\begingroup\$ I didn't understand. I miss some subtleties language. What's the joke about the anagram? And no, Calak is just an old old old nickname, that I still use (I'm too much 1rst degree, I guess). \$\endgroup\$Calak– Calak2018年11月25日 19:18:18 +00:00Commented Nov 25, 2018 at 19:18
-
\$\begingroup\$ I'm sorry I didnt intend it as a joke. The question is: does whitespace invalidate an anagram? If not then the program needs to clear spaces before checking size. (This would apply to the sorted strings too but none of those answers are recent. ) \$\endgroup\$Summer– Summer2018年11月25日 20:16:19 +00:00Commented Nov 25, 2018 at 20:16
-
\$\begingroup\$ Oh, I see! Yeah, it would be a nice evolution, "stripping space or not". And, did you look at the benchmark? Did you get surprised? \$\endgroup\$Calak– Calak2018年11月25日 20:20:21 +00:00Commented Nov 25, 2018 at 20:20
-
\$\begingroup\$ TBH I don't know enough to have had an expectation prior to the benchmark. Therefore I can't get surprised. \$\endgroup\$Summer– Summer2018年11月25日 20:31:08 +00:00Commented Nov 25, 2018 at 20:31
Time complexity here is O(n2), whereas if we sort both the keys first, worst case would be O(n2) and O(n2) which is not really optimized, although it really is easy to write.
bool isAnagram(string key1,string key2){
int len=0;
if(key.length()!=key1.length()){return false;}
for (int i=0;i<key1.length();i++){
for (int j=0;j<key1.length();j++){
if(key1[i]==key2[j]){
len++;
key2[j] = ' ';//to deal with the duplicates
break;
}
}
}
if (len==key.length()){return true;}
else{return false;}
}
-
\$\begingroup\$ Welcome to code review! In the future, try to explain what you think is wrong with the code rather than just giving an alternate solution. \$\endgroup\$Peter– Peter2018年11月23日 18:27:13 +00:00Commented Nov 23, 2018 at 18:27
-
\$\begingroup\$ thanks all for your kind suggestions, i'll improve by time. \$\endgroup\$Shaheer Hassan– Shaheer Hassan2018年11月25日 08:25:52 +00:00Commented Nov 25, 2018 at 8:25