This is my first algorithm in C++ ever to check if two strings are permutations of each other. Please provide guidance on correctness of algorithm + ways to improve + thoughts on my use of the language itself.
The idea is two create two dictionaries which count the number of occurrences of each character using two for loops, and then we use a final for loop to check if the counts are equal.
void checkPermut(std::string &sentence1, std::string &sentence2)
{
std::map<char, int> dict1;
std::map<char, int> dict2;
for (char & c : sentence1)
{
if (dict1.find(c) == dict1.end())
{
dict1.insert(std::pair<char, int>(c, 1));
}
else
{
dict1[c] == dict1[c] + 1;
}
}
for (char & c : sentence2)
{
if (dict2.find(c) == dict2.end())
{
dict2.insert(std::pair<char, int>(c, 1));
}
else
{
dict2[c] == dict2[c] + 1;
}
}
bool arePermut = true;
for (char & c : sentence2)
{
if (dict1.find(c) == dict1.end())
{
arePermut = false;
std::cout << "Not permutations." << std::endl;
break;
}
else if (dict1[c] != dict2[c])
{
arePermut = false;
std::cout << "Not permutations." << std::endl;
break;
}
}
if (arePermut)
{
std::cout << "Are permutations" << std::endl;
}
}
int main()
{
std::string somestring1;
std::string somestring2;
std::cin >> somestring1 >> somestring2;
checkPermut(somestring1, somestring2);
return 0;
}
2 Answers 2
Basic Review
Look below for algorithm ideas:
You are not modifying the input strings. So they should be passed by const reference to prevent accidents:
void checkPermut(std::string const& sentence1, std::string const& sentence2)
^^^^^ ^^^^^
You don't need to check if a value exists in a map like this:
if (dict1.find(c) == dict1.end())
{
dict1.insert(std::pair<char, int>(c, 1));
}
else
{
dict1[c] == dict1[c] + 1;
}
Use operator[]
If it does not exist the item will be created (the value is "zero" initialized" and returned as a reference. You can replace the above code with the line:
++dict1[c];
Be careful when getting things by reference (accidental modification can happen). Unless you need to modify something, get by const reference.
for (char const& c : sentence2)
^^^^^
But in this case I would get by value:
for (char c : sentence2)
There is an easy way to make pairs that does not require all the type info:
std::pair<char, int>(c, 1);
Simpler to write:
std::make_pair(c, 1); // The function gets the type from
// the parameters and creates the
// correct type of underlying pair.
As a side note. technically you may not want to make the assumption that std::map
uses a pair internally. This is why std::map
has an internal type value_type
so you can use that and not make any assumptions about the type used internally.
using Map = std::map<char, int>;
using MapValue = typename Map::value_type;
Map dict1;
...
dict1.insert(MapValue(c, 1));
There is a subtle bug here:
for (char & c : sentence2)
{
if (dict1.find(c) == dict1.end())
{
arePermut = false;
std::cout << "Not permutations." << std::endl;
break;
}
else if (dict1[c] != dict2[c])
{
arePermut = false;
std::cout << "Not permutations." << std::endl;
break;
}
}
You check all the letters from sentence2 are exactly the same as in sentence1. But that may not be enough. You need to check the other way.
// Assume:
Sentence1 = "BigZ"
Sentence2 = "Big"; // Your algorithm will match.
Note most containers are comparable.
There is a subtle bug here:
if (dict1 != dict2)
{
std::cout << "Not permutations." << "\n";
}
else
{
std::cout << "Not permutations." << "\n";
Note: The operator>>
reads a single space separated word. Your code above suggests sentences. Is this what you want?
std::cin >> somestring1 >> somestring2;
Use std::getline(std::cin, somestring1)
to read a line.
This is redundant in main()
:
return 0;
Algorithm
We know you can simply compare the containers. But you can optimize this. By subtracting characters from sentence two from the container. If any become negative then you can exit early.
Also do you want to compare space characters? This makes it really hard to have anagram sentences if the word count has to be the same.
bool checkPermut(std::string const& sentence1, std::string const& sentence2)
{
using Map = std::map<char, int>;
Map count1;
Map zero;
for(auto c: sentence1) {
if (std::isspace(c) {
continue;
}
++count1[c];
zero[c] = 0;
}
for(auto c: sentence2) {
if (std::isspace(c) {
continue;
}
if (--count1[c] < 0) {
return false; // exit early if we go negative.
}
zero[c] = 0;
}
return count1 == zero; // Make sure all letters have reached zero.
}
-
\$\begingroup\$ Ditch the zero map. Just return
std::all_of(count1.begin(), count1.end(), [](std::pair<char, int> p) { return p.second == 0; });
\$\endgroup\$johnchen902– johnchen9022018年07月31日 02:04:20 +00:00Commented Jul 31, 2018 at 2:04 -
\$\begingroup\$ @johnchen902 Don't agree. The point of coding is to express intent as clearly as possible. What you posted is hard to understand therefore should be considered less usable. There are only 255 characters so comparing the maps is a trivial processes. \$\endgroup\$Loki Astari– Loki Astari2018年07月31日 03:52:16 +00:00Commented Jul 31, 2018 at 3:52
-
2\$\begingroup\$ Can I suggest adding basic checks before the algorithm, i.e. comparing the length of the two sentences? Then the second check for letters in B appearing in A is also unneeded. \$\endgroup\$Phil H– Phil H2018年07月31日 06:53:32 +00:00Commented Jul 31, 2018 at 6:53
-
\$\begingroup\$ @MartinYork Isn't this your intent?
// Make sure all letters have reached zero.
Isn't the following the most straightforward way to check it?for (auto [key, value] : count1) if (value) return false; return true;
Personally I'd preferstd::all_of
though. \$\endgroup\$johnchen902– johnchen9022018年07月31日 07:46:06 +00:00Commented Jul 31, 2018 at 7:46 -
\$\begingroup\$ @MartinYork On the other hand, I think checking against
zero
is more difficult to understand. Especially, why are the twozero[c] = 0
needed? (We have to ensure the keys present in both map are the same; that is not obvious IMO. BTW the latterzero[c] = 0
is not needed due to the early exit) \$\endgroup\$johnchen902– johnchen9022018年07月31日 08:00:17 +00:00Commented Jul 31, 2018 at 8:00
Algorithms or data structures
Let's start from an alternative way of checking if the two strings are anagrams: sort them both and compare them for equality:
#include <algorithm>
#include <string>
std::string a, b;
// ...
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
bool is_anagram = a == b;
What is the complexity of this method? sorting is n * log(n)
, comparing is n
, so we have n * log(n) + n * log(n) + n
(given that the length of both strings is the same).
You can notice that it isn't more costly than the method you chose: std::map
is implemented using red-black trees, with log(n)
insertion, so that building one map from a string with n character is n * log(n)
. Comparing between two maps is also n
.
So, should you choose a sorted std::string
or a std::map
, given that the solution in each case has the same algorithmic complexity? The string is generally superior, because its characters are stored in contiguous memory, meaning cache locality and a better allocation strategy than std::map
.
That is an important lesson in C++: algorithms applied to array-like structures are often more efficient than data structured "embedding" an algorithm.
Hashing
Can we now do better? There is a data structure that offers O(n)
insertion and retrieval and an array disposition: hash maps. In c++ they're present as std::unordered_map
. The idea behind the hash map is to associate a value with a key through a hashing function; keys are then positions in an array. So we have: key_n == hash_fn(value_n)
and array[key_n] == value_n
(actually there might be more than on value by key, but this is a good approximation).
In the case of a string with ordinary characters, you can choose the identity function as the hash function and use a simple array as the hash map:
// ...
#include <array>
std::string s;
// ...
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : s) ++hash_map[static_cast<unsigned char>(c)];
Checking if two strings are anagrams can now be done by incrementing values in the array for the first string, and decrementing them for the second string:
// ...
#include <array>
std::string a, b;
// ...
if (a.size() != b.size()) return false;
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
for (auto c : b) --hash_map[static_cast<unsigned char>(c)];
bool is_anagram = std::none_of(hash_map.begin(), hash_map.end(), std::identity()); // std::identity is c++20, a lambda returning its argument unchanged will do the trick though
We can still do a little better since decrementing a value at zero in the second pass is a sign that the two strings aren't anagrams:
bool are_anagrams(const std::string& a, const std::string& b) {
std::array<unsigned char, 256> hash_map{};
for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
return std::none_of(hash_map.begin(), hash_map.end(), std::identity());
}
Actually, checking if all values have been decremented to zero is useless if both strings are the same size, so we can simplify a bit further:
bool are_anagrams(const std::string& a, const std::string& b) {
if (a.size() != b.size()) return false;
std::array<unsigned char, 256> hash_map{};
for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
return true;
}
-
\$\begingroup\$ (1)
char
can be signed or unsigned. Indexing into an array with a negative value is undefined behavior. (2) You make an assumption that a char is 8 bits when it could be fewer or more bits (implementation defined asCHAR_BIT
). The only guarantee is thatsizeof(char)
is 1. (3) This problem deals with sentences, which can be made arbitrarily long by successive iterations. Ischar
a reasonably sized type to count occurrences? Maybe it is. (4)std::identity
doesn't exist and has no pending proposals. \$\endgroup\$Snowhawk– Snowhawk2018年07月31日 09:51:10 +00:00Commented Jul 31, 2018 at 9:51 -
\$\begingroup\$ (5) "FUNERAL" and "REAL FUN" are anagrams as anagrams consider the members of its set as alphanumerics. They are not permutations as members of its set are all representable characters. You should change any reference of anagrams to permutations. \$\endgroup\$Snowhawk– Snowhawk2018年07月31日 09:51:49 +00:00Commented Jul 31, 2018 at 9:51
-
\$\begingroup\$ Did you really mean to say that hash maps have O(n) insertion time, rather than O(1)? Or did you intend that to be the total for inserting n elements? Also, you've written
256
a few times where you clearly meantUCHAR_MAX
. \$\endgroup\$Toby Speight– Toby Speight2018年08月01日 09:28:35 +00:00Commented Aug 1, 2018 at 9:28
dict[c] == dict[c] + 1
. \$\endgroup\$dict
in this context. \$\endgroup\$