Attempt at checking whether two words are permutations

Question 1

This is my first algorithm in C++ ever to check if two strings are permutations of each other. Please provide guidance on correctness of algorithm + ways to improve + thoughts on my use of the language itself.

The idea is two create two dictionaries which count the number of occurrences of each character using two for loops, and then we use a final for loop to check if the counts are equal.

void checkPermut(std::string &sentence1, std::string &sentence2)
{
 std::map<char, int> dict1;
 std::map<char, int> dict2;
 for (char & c : sentence1) 
 {
 if (dict1.find(c) == dict1.end())
 {
 dict1.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict1[c] == dict1[c] + 1;
 }
 }
 for (char & c : sentence2)
 {
 if (dict2.find(c) == dict2.end())
 {
 dict2.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict2[c] == dict2[c] + 1;
 }
 }
 bool arePermut = true;
 for (char & c : sentence2)
 {
 if (dict1.find(c) == dict1.end())
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 else if (dict1[c] != dict2[c])
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 }
 if (arePermut)
 {
 std::cout << "Are permutations" << std::endl;
 }
}
int main()
{
 std::string somestring1;
 std::string somestring2;
 std::cin >> somestring1 >> somestring2;
 checkPermut(somestring1, somestring2);
 return 0;
}

Question 2

Do you mean that it checks whether two strings are anagrams?

Question 3

Unfortunately, there is a typo in dict[c] == dict[c] + 1.

Question 4

Much simpler approach: order both strings and then compare if equal. no need for loops to check equality. If using native strings, some cost in doing the ordering, but if using an ordered list this becomes trivial.

Question 5

@AJD this method does order both strings, using counting sort. It just doesn't reconstruct the ordered strings.

Question 6

@Snowhawk: Thanks - I missed the significance of dict in this context.

Question 7

Basic Review

Look below for algorithm ideas:

You are not modifying the input strings. So they should be passed by const reference to prevent accidents:

void checkPermut(std::string const& sentence1, std::string const& sentence2)
 ^^^^^ ^^^^^

You don't need to check if a value exists in a map like this:

 if (dict1.find(c) == dict1.end())
 {
 dict1.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict1[c] == dict1[c] + 1;
 }

Use operator[] If it does not exist the item will be created (the value is "zero" initialized" and returned as a reference. You can replace the above code with the line:

 ++dict1[c];

Be careful when getting things by reference (accidental modification can happen). Unless you need to modify something, get by const reference.

 for (char const& c : sentence2)
 ^^^^^

But in this case I would get by value:

 for (char c : sentence2)

There is an easy way to make pairs that does not require all the type info:

 std::pair<char, int>(c, 1);

Simpler to write:

 std::make_pair(c, 1); // The function gets the type from
 // the parameters and creates the
 // correct type of underlying pair.

As a side note. technically you may not want to make the assumption that std::map uses a pair internally. This is why std::map has an internal type value_type so you can use that and not make any assumptions about the type used internally.

 using Map = std::map<char, int>;
 using MapValue = typename Map::value_type;
 Map dict1;
 ...
 dict1.insert(MapValue(c, 1));

There is a subtle bug here:

 for (char & c : sentence2)
 {
 if (dict1.find(c) == dict1.end())
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 else if (dict1[c] != dict2[c])
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 }

You check all the letters from sentence2 are exactly the same as in sentence1. But that may not be enough. You need to check the other way.

 // Assume: 
 Sentence1 = "BigZ"
 Sentence2 = "Big"; // Your algorithm will match.

Note most containers are comparable.

There is a subtle bug here:

 if (dict1 != dict2)
 {
 std::cout << "Not permutations." << "\n";
 }
 else
 {
 std::cout << "Not permutations." << "\n";

Note: The operator>> reads a single space separated word. Your code above suggests sentences. Is this what you want?

 std::cin >> somestring1 >> somestring2;

Use std::getline(std::cin, somestring1) to read a line.

This is redundant in main():

 return 0;

Algorithm

We know you can simply compare the containers. But you can optimize this. By subtracting characters from sentence two from the container. If any become negative then you can exit early.

Also do you want to compare space characters? This makes it really hard to have anagram sentences if the word count has to be the same.

bool checkPermut(std::string const& sentence1, std::string const& sentence2)
{
 using Map = std::map<char, int>;
 Map count1;
 Map zero;
 for(auto c: sentence1) {
 if (std::isspace(c) {
 continue;
 }
 ++count1[c];
 zero[c] = 0;
 }
 for(auto c: sentence2) {
 if (std::isspace(c) {
 continue;
 }
 if (--count1[c] < 0) {
 return false; // exit early if we go negative.
 }
 zero[c] = 0;
 }
 return count1 == zero; // Make sure all letters have reached zero.
}

Question 8

Ditch the zero map. Just return std::all_of(count1.begin(), count1.end(), [](std::pair<char, int> p) { return p.second == 0; });

Question 9

@johnchen902 Don't agree. The point of coding is to express intent as clearly as possible. What you posted is hard to understand therefore should be considered less usable. There are only 255 characters so comparing the maps is a trivial processes.

Question 10

Can I suggest adding basic checks before the algorithm, i.e. comparing the length of the two sentences? Then the second check for letters in B appearing in A is also unneeded.

Question 11

@MartinYork Isn't this your intent? // Make sure all letters have reached zero. Isn't the following the most straightforward way to check it? for (auto [key, value] : count1) if (value) return false; return true; Personally I'd prefer std::all_of though.

Question 12

@MartinYork On the other hand, I think checking against zero is more difficult to understand. Especially, why are the two zero[c] = 0 needed? (We have to ensure the keys present in both map are the same; that is not obvious IMO. BTW the latter zero[c] = 0 is not needed due to the early exit)

Question 13

Algorithms or data structures

Let's start from an alternative way of checking if the two strings are anagrams: sort them both and compare them for equality:

#include <algorithm>
#include <string>
std::string a, b;
// ...
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
bool is_anagram = a == b;

What is the complexity of this method? sorting is n * log(n), comparing is n, so we have n * log(n) + n * log(n) + n (given that the length of both strings is the same).

You can notice that it isn't more costly than the method you chose: std::map is implemented using red-black trees, with log(n) insertion, so that building one map from a string with n character is n * log(n). Comparing between two maps is also n.

So, should you choose a sorted std::string or a std::map, given that the solution in each case has the same algorithmic complexity? The string is generally superior, because its characters are stored in contiguous memory, meaning cache locality and a better allocation strategy than std::map.

That is an important lesson in C++: algorithms applied to array-like structures are often more efficient than data structured "embedding" an algorithm.

Hashing

Can we now do better? There is a data structure that offers O(n) insertion and retrieval and an array disposition: hash maps. In c++ they're present as std::unordered_map. The idea behind the hash map is to associate a value with a key through a hashing function; keys are then positions in an array. So we have: key_n == hash_fn(value_n) and array[key_n] == value_n (actually there might be more than on value by key, but this is a good approximation).

In the case of a string with ordinary characters, you can choose the identity function as the hash function and use a simple array as the hash map:

// ...
#include <array>
std::string s;
// ...
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : s) ++hash_map[static_cast<unsigned char>(c)];

Checking if two strings are anagrams can now be done by incrementing values in the array for the first string, and decrementing them for the second string:

// ...
#include <array>
std::string a, b;
// ...
if (a.size() != b.size()) return false;
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
for (auto c : b) --hash_map[static_cast<unsigned char>(c)];
bool is_anagram = std::none_of(hash_map.begin(), hash_map.end(), std::identity()); // std::identity is c++20, a lambda returning its argument unchanged will do the trick though

We can still do a little better since decrementing a value at zero in the second pass is a sign that the two strings aren't anagrams:

bool are_anagrams(const std::string& a, const std::string& b) {
 std::array<unsigned char, 256> hash_map{};
 for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
 for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
 return std::none_of(hash_map.begin(), hash_map.end(), std::identity());
}

Actually, checking if all values have been decremented to zero is useless if both strings are the same size, so we can simplify a bit further:

bool are_anagrams(const std::string& a, const std::string& b) {
 if (a.size() != b.size()) return false;
 std::array<unsigned char, 256> hash_map{};
 for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
 for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
 return true;
}

Question 14

(1)char can be signed or unsigned. Indexing into an array with a negative value is undefined behavior. (2) You make an assumption that a char is 8 bits when it could be fewer or more bits (implementation defined as CHAR_BIT). The only guarantee is that sizeof(char) is 1. (3) This problem deals with sentences, which can be made arbitrarily long by successive iterations. Is char a reasonably sized type to count occurrences? Maybe it is. (4) std::identity doesn't exist and has no pending proposals.

Question 15

(5) "FUNERAL" and "REAL FUN" are anagrams as anagrams consider the members of its set as alphanumerics. They are not permutations as members of its set are all representable characters. You should change any reference of anagrams to permutations.

Question 16

Did you really mean to say that hash maps have O(n) insertion time, rather than O(1)? Or did you intend that to be the total for inserting n elements? Also, you've written 256 a few times where you clearly meant UCHAR_MAX.

Loki Astari Loki Astari 97.7k5 gold badges126 silver badges341 bronze badges · Answer 1 · 2018-07-31 00:25:55Z

Basic Review

Look below for algorithm ideas:

You are not modifying the input strings. So they should be passed by const reference to prevent accidents:

void checkPermut(std::string const& sentence1, std::string const& sentence2)
 ^^^^^ ^^^^^

You don't need to check if a value exists in a map like this:

 if (dict1.find(c) == dict1.end())
 {
 dict1.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict1[c] == dict1[c] + 1;
 }

Use operator[] If it does not exist the item will be created (the value is "zero" initialized" and returned as a reference. You can replace the above code with the line:

 ++dict1[c];

Be careful when getting things by reference (accidental modification can happen). Unless you need to modify something, get by const reference.

 for (char const& c : sentence2)
 ^^^^^

But in this case I would get by value:

 for (char c : sentence2)

There is an easy way to make pairs that does not require all the type info:

 std::pair<char, int>(c, 1);

Simpler to write:

 std::make_pair(c, 1); // The function gets the type from
 // the parameters and creates the
 // correct type of underlying pair.

As a side note. technically you may not want to make the assumption that std::map uses a pair internally. This is why std::map has an internal type value_type so you can use that and not make any assumptions about the type used internally.

 using Map = std::map<char, int>;
 using MapValue = typename Map::value_type;
 Map dict1;
 ...
 dict1.insert(MapValue(c, 1));

There is a subtle bug here:

 for (char & c : sentence2)
 {
 if (dict1.find(c) == dict1.end())
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 else if (dict1[c] != dict2[c])
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 }

You check all the letters from sentence2 are exactly the same as in sentence1. But that may not be enough. You need to check the other way.

 // Assume: 
 Sentence1 = "BigZ"
 Sentence2 = "Big"; // Your algorithm will match.

Note most containers are comparable.

There is a subtle bug here:

 if (dict1 != dict2)
 {
 std::cout << "Not permutations." << "\n";
 }
 else
 {
 std::cout << "Not permutations." << "\n";

Note: The operator>> reads a single space separated word. Your code above suggests sentences. Is this what you want?

 std::cin >> somestring1 >> somestring2;

Use std::getline(std::cin, somestring1) to read a line.

This is redundant in main():

 return 0;

Algorithm

We know you can simply compare the containers. But you can optimize this. By subtracting characters from sentence two from the container. If any become negative then you can exit early.

Also do you want to compare space characters? This makes it really hard to have anagram sentences if the word count has to be the same.

bool checkPermut(std::string const& sentence1, std::string const& sentence2)
{
 using Map = std::map<char, int>;
 Map count1;
 Map zero;
 for(auto c: sentence1) {
 if (std::isspace(c) {
 continue;
 }
 ++count1[c];
 zero[c] = 0;
 }
 for(auto c: sentence2) {
 if (std::isspace(c) {
 continue;
 }
 if (--count1[c] < 0) {
 return false; // exit early if we go negative.
 }
 zero[c] = 0;
 }
 return count1 == zero; // Make sure all letters have reached zero.
}

Ditch the zero map. Just return std::all_of(count1.begin(), count1.end(), [](std::pair<char, int> p) { return p.second == 0; });
@johnchen902 Don't agree. The point of coding is to express intent as clearly as possible. What you posted is hard to understand therefore should be considered less usable. There are only 255 characters so comparing the maps is a trivial processes.
Can I suggest adding basic checks before the algorithm, i.e. comparing the length of the two sentences? Then the second check for letters in B appearing in A is also unneeded.
@MartinYork Isn't this your intent? // Make sure all letters have reached zero. Isn't the following the most straightforward way to check it? for (auto [key, value] : count1) if (value) return false; return true; Personally I'd prefer std::all_of though.
@MartinYork On the other hand, I think checking against zero is more difficult to understand. Especially, why are the two zero[c] = 0 needed? (We have to ensure the keys present in both map are the same; that is not obvious IMO. BTW the latter zero[c] = 0 is not needed due to the early exit)

papagaga papagaga 5,80712 silver badges25 bronze badges · Answer 2 · 2018-07-31 08:34:09Z

Algorithms or data structures

Let's start from an alternative way of checking if the two strings are anagrams: sort them both and compare them for equality:

#include <algorithm>
#include <string>
std::string a, b;
// ...
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
bool is_anagram = a == b;

What is the complexity of this method? sorting is n * log(n), comparing is n, so we have n * log(n) + n * log(n) + n (given that the length of both strings is the same).

You can notice that it isn't more costly than the method you chose: std::map is implemented using red-black trees, with log(n) insertion, so that building one map from a string with n character is n * log(n). Comparing between two maps is also n.

So, should you choose a sorted std::string or a std::map, given that the solution in each case has the same algorithmic complexity? The string is generally superior, because its characters are stored in contiguous memory, meaning cache locality and a better allocation strategy than std::map.

That is an important lesson in C++: algorithms applied to array-like structures are often more efficient than data structured "embedding" an algorithm.

Hashing

Can we now do better? There is a data structure that offers O(n) insertion and retrieval and an array disposition: hash maps. In c++ they're present as std::unordered_map. The idea behind the hash map is to associate a value with a key through a hashing function; keys are then positions in an array. So we have: key_n == hash_fn(value_n) and array[key_n] == value_n (actually there might be more than on value by key, but this is a good approximation).

In the case of a string with ordinary characters, you can choose the identity function as the hash function and use a simple array as the hash map:

// ...
#include <array>
std::string s;
// ...
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : s) ++hash_map[static_cast<unsigned char>(c)];

Checking if two strings are anagrams can now be done by incrementing values in the array for the first string, and decrementing them for the second string:

// ...
#include <array>
std::string a, b;
// ...
if (a.size() != b.size()) return false;
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
for (auto c : b) --hash_map[static_cast<unsigned char>(c)];
bool is_anagram = std::none_of(hash_map.begin(), hash_map.end(), std::identity()); // std::identity is c++20, a lambda returning its argument unchanged will do the trick though

We can still do a little better since decrementing a value at zero in the second pass is a sign that the two strings aren't anagrams:

bool are_anagrams(const std::string& a, const std::string& b) {
 std::array<unsigned char, 256> hash_map{};
 for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
 for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
 return std::none_of(hash_map.begin(), hash_map.end(), std::identity());
}

Actually, checking if all values have been decremented to zero is useless if both strings are the same size, so we can simplify a bit further:

bool are_anagrams(const std::string& a, const std::string& b) {
 if (a.size() != b.size()) return false;
 std::array<unsigned char, 256> hash_map{};
 for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
 for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
 return true;
}

(1)char can be signed or unsigned. Indexing into an array with a negative value is undefined behavior. (2) You make an assumption that a char is 8 bits when it could be fewer or more bits (implementation defined as CHAR_BIT). The only guarantee is that sizeof(char) is 1. (3) This problem deals with sentences, which can be made arbitrarily long by successive iterations. Is char a reasonably sized type to count occurrences? Maybe it is. (4) std::identity doesn't exist and has no pending proposals.
(5) "FUNERAL" and "REAL FUN" are anagrams as anagrams consider the members of its set as alphanumerics. They are not permutations as members of its set are all representable characters. You should change any reference of anagrams to permutations.
Did you really mean to say that hash maps have O(n) insertion time, rather than O(1)? Or did you intend that to be the total for inserting n elements? Also, you've written 256 a few times where you clearly meant UCHAR_MAX.

Stack Exchange Network

Attempt at checking whether two words are permutations

2 Answers 2

Basic Review

Algorithm

Algorithms or data structures

Hashing

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Attempt at checking whether two words are permutations

2 Answers 2

Basic Review

Algorithm

Algorithms or data structures

Hashing

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions