3
\$\begingroup\$

This is my first algorithm in C++ ever to check if two strings are permutations of each other. Please provide guidance on correctness of algorithm + ways to improve + thoughts on my use of the language itself.

The idea is two create two dictionaries which count the number of occurrences of each character using two for loops, and then we use a final for loop to check if the counts are equal.

void checkPermut(std::string &sentence1, std::string &sentence2)
{
 std::map<char, int> dict1;
 std::map<char, int> dict2;
 for (char & c : sentence1) 
 {
 if (dict1.find(c) == dict1.end())
 {
 dict1.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict1[c] == dict1[c] + 1;
 }
 }
 for (char & c : sentence2)
 {
 if (dict2.find(c) == dict2.end())
 {
 dict2.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict2[c] == dict2[c] + 1;
 }
 }
 bool arePermut = true;
 for (char & c : sentence2)
 {
 if (dict1.find(c) == dict1.end())
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 else if (dict1[c] != dict2[c])
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 }
 if (arePermut)
 {
 std::cout << "Are permutations" << std::endl;
 }
}
int main()
{
 std::string somestring1;
 std::string somestring2;
 std::cin >> somestring1 >> somestring2;
 checkPermut(somestring1, somestring2);
 return 0;
}
Heslacher
50.9k5 gold badges83 silver badges177 bronze badges
asked Jul 30, 2018 at 17:21
\$\endgroup\$
5
  • 5
    \$\begingroup\$ Do you mean that it checks whether two strings are anagrams? \$\endgroup\$ Commented Jul 30, 2018 at 17:33
  • 2
    \$\begingroup\$ Unfortunately, there is a typo in dict[c] == dict[c] + 1. \$\endgroup\$ Commented Jul 30, 2018 at 17:36
  • \$\begingroup\$ Much simpler approach: order both strings and then compare if equal. no need for loops to check equality. If using native strings, some cost in doing the ordering, but if using an ordered list this becomes trivial. \$\endgroup\$ Commented Jul 31, 2018 at 6:48
  • \$\begingroup\$ @AJD this method does order both strings, using counting sort. It just doesn't reconstruct the ordered strings. \$\endgroup\$ Commented Jul 31, 2018 at 8:39
  • \$\begingroup\$ @Snowhawk: Thanks - I missed the significance of dict in this context. \$\endgroup\$ Commented Jul 31, 2018 at 19:24

2 Answers 2

5
\$\begingroup\$

Basic Review

Look below for algorithm ideas:

You are not modifying the input strings. So they should be passed by const reference to prevent accidents:

void checkPermut(std::string const& sentence1, std::string const& sentence2)
 ^^^^^ ^^^^^

You don't need to check if a value exists in a map like this:

 if (dict1.find(c) == dict1.end())
 {
 dict1.insert(std::pair<char, int>(c, 1));
 }
 else
 {
 dict1[c] == dict1[c] + 1;
 }

Use operator[] If it does not exist the item will be created (the value is "zero" initialized" and returned as a reference. You can replace the above code with the line:

 ++dict1[c];

Be careful when getting things by reference (accidental modification can happen). Unless you need to modify something, get by const reference.

 for (char const& c : sentence2)
 ^^^^^

But in this case I would get by value:

 for (char c : sentence2)

There is an easy way to make pairs that does not require all the type info:

 std::pair<char, int>(c, 1);

Simpler to write:

 std::make_pair(c, 1); // The function gets the type from
 // the parameters and creates the
 // correct type of underlying pair.

As a side note. technically you may not want to make the assumption that std::map uses a pair internally. This is why std::map has an internal type value_type so you can use that and not make any assumptions about the type used internally.

 using Map = std::map<char, int>;
 using MapValue = typename Map::value_type;
 Map dict1;
 ...
 dict1.insert(MapValue(c, 1));

There is a subtle bug here:

 for (char & c : sentence2)
 {
 if (dict1.find(c) == dict1.end())
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 else if (dict1[c] != dict2[c])
 {
 arePermut = false;
 std::cout << "Not permutations." << std::endl;
 break;
 }
 }

You check all the letters from sentence2 are exactly the same as in sentence1. But that may not be enough. You need to check the other way.

 // Assume: 
 Sentence1 = "BigZ"
 Sentence2 = "Big"; // Your algorithm will match.

Note most containers are comparable.

There is a subtle bug here:

 if (dict1 != dict2)
 {
 std::cout << "Not permutations." << "\n";
 }
 else
 {
 std::cout << "Not permutations." << "\n";

Note: The operator>> reads a single space separated word. Your code above suggests sentences. Is this what you want?

 std::cin >> somestring1 >> somestring2;

Use std::getline(std::cin, somestring1) to read a line.

This is redundant in main():

 return 0;

Algorithm

We know you can simply compare the containers. But you can optimize this. By subtracting characters from sentence two from the container. If any become negative then you can exit early.

Also do you want to compare space characters? This makes it really hard to have anagram sentences if the word count has to be the same.

bool checkPermut(std::string const& sentence1, std::string const& sentence2)
{
 using Map = std::map<char, int>;
 Map count1;
 Map zero;
 for(auto c: sentence1) {
 if (std::isspace(c) {
 continue;
 }
 ++count1[c];
 zero[c] = 0;
 }
 for(auto c: sentence2) {
 if (std::isspace(c) {
 continue;
 }
 if (--count1[c] < 0) {
 return false; // exit early if we go negative.
 }
 zero[c] = 0;
 }
 return count1 == zero; // Make sure all letters have reached zero.
} 
answered Jul 31, 2018 at 0:25
\$\endgroup\$
7
  • \$\begingroup\$ Ditch the zero map. Just return std::all_of(count1.begin(), count1.end(), [](std::pair<char, int> p) { return p.second == 0; }); \$\endgroup\$ Commented Jul 31, 2018 at 2:04
  • \$\begingroup\$ @johnchen902 Don't agree. The point of coding is to express intent as clearly as possible. What you posted is hard to understand therefore should be considered less usable. There are only 255 characters so comparing the maps is a trivial processes. \$\endgroup\$ Commented Jul 31, 2018 at 3:52
  • 2
    \$\begingroup\$ Can I suggest adding basic checks before the algorithm, i.e. comparing the length of the two sentences? Then the second check for letters in B appearing in A is also unneeded. \$\endgroup\$ Commented Jul 31, 2018 at 6:53
  • \$\begingroup\$ @MartinYork Isn't this your intent? // Make sure all letters have reached zero. Isn't the following the most straightforward way to check it? for (auto [key, value] : count1) if (value) return false; return true; Personally I'd prefer std::all_of though. \$\endgroup\$ Commented Jul 31, 2018 at 7:46
  • \$\begingroup\$ @MartinYork On the other hand, I think checking against zero is more difficult to understand. Especially, why are the two zero[c] = 0 needed? (We have to ensure the keys present in both map are the same; that is not obvious IMO. BTW the latter zero[c] = 0 is not needed due to the early exit) \$\endgroup\$ Commented Jul 31, 2018 at 8:00
2
\$\begingroup\$

Algorithms or data structures

Let's start from an alternative way of checking if the two strings are anagrams: sort them both and compare them for equality:

#include <algorithm>
#include <string>
std::string a, b;
// ...
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
bool is_anagram = a == b;

What is the complexity of this method? sorting is n * log(n), comparing is n, so we have n * log(n) + n * log(n) + n (given that the length of both strings is the same).

You can notice that it isn't more costly than the method you chose: std::map is implemented using red-black trees, with log(n) insertion, so that building one map from a string with n character is n * log(n). Comparing between two maps is also n.

So, should you choose a sorted std::string or a std::map, given that the solution in each case has the same algorithmic complexity? The string is generally superior, because its characters are stored in contiguous memory, meaning cache locality and a better allocation strategy than std::map.

That is an important lesson in C++: algorithms applied to array-like structures are often more efficient than data structured "embedding" an algorithm.

Hashing

Can we now do better? There is a data structure that offers O(n) insertion and retrieval and an array disposition: hash maps. In c++ they're present as std::unordered_map. The idea behind the hash map is to associate a value with a key through a hashing function; keys are then positions in an array. So we have: key_n == hash_fn(value_n) and array[key_n] == value_n (actually there might be more than on value by key, but this is a good approximation).

In the case of a string with ordinary characters, you can choose the identity function as the hash function and use a simple array as the hash map:

// ...
#include <array>
std::string s;
// ...
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : s) ++hash_map[static_cast<unsigned char>(c)];

Checking if two strings are anagrams can now be done by incrementing values in the array for the first string, and decrementing them for the second string:

// ...
#include <array>
std::string a, b;
// ...
if (a.size() != b.size()) return false;
std::array<unsigned char, 256> hash_map{}; // the braces are required to ensure that all array elements are initialized to 0
for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
for (auto c : b) --hash_map[static_cast<unsigned char>(c)];
bool is_anagram = std::none_of(hash_map.begin(), hash_map.end(), std::identity()); // std::identity is c++20, a lambda returning its argument unchanged will do the trick though

We can still do a little better since decrementing a value at zero in the second pass is a sign that the two strings aren't anagrams:

bool are_anagrams(const std::string& a, const std::string& b) {
 std::array<unsigned char, 256> hash_map{};
 for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
 for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
 return std::none_of(hash_map.begin(), hash_map.end(), std::identity());
}

Actually, checking if all values have been decremented to zero is useless if both strings are the same size, so we can simplify a bit further:

bool are_anagrams(const std::string& a, const std::string& b) {
 if (a.size() != b.size()) return false;
 std::array<unsigned char, 256> hash_map{};
 for (auto c : a) ++hash_map[static_cast<unsigned char>(c)];
 for (auto c : b) if (!hash_map[static_cast<unsigned char>(c)]--) return false;
 return true;
}
answered Jul 31, 2018 at 8:34
\$\endgroup\$
3
  • \$\begingroup\$ (1)char can be signed or unsigned. Indexing into an array with a negative value is undefined behavior. (2) You make an assumption that a char is 8 bits when it could be fewer or more bits (implementation defined as CHAR_BIT). The only guarantee is that sizeof(char) is 1. (3) This problem deals with sentences, which can be made arbitrarily long by successive iterations. Is char a reasonably sized type to count occurrences? Maybe it is. (4) std::identity doesn't exist and has no pending proposals. \$\endgroup\$ Commented Jul 31, 2018 at 9:51
  • \$\begingroup\$ (5) "FUNERAL" and "REAL FUN" are anagrams as anagrams consider the members of its set as alphanumerics. They are not permutations as members of its set are all representable characters. You should change any reference of anagrams to permutations. \$\endgroup\$ Commented Jul 31, 2018 at 9:51
  • \$\begingroup\$ Did you really mean to say that hash maps have O(n) insertion time, rather than O(1)? Or did you intend that to be the total for inserting n elements? Also, you've written 256 a few times where you clearly meant UCHAR_MAX. \$\endgroup\$ Commented Aug 1, 2018 at 9:28

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.