I'm writing a comparison of a simple word-counting program in various languages. The task is to count the frequencies of unique, space-separated words in the input, and output the results most frequent first. Case should be normalized to lowercase (ASCII is okay).
But it's been a long time since I've written C++ (before the C++11 days). Is this idiomatic modern C++? Any improvements I could make to make this more idiomatic or simpler (I'm not looking for efficiency improvements here). I want to stick to the standard library.
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;
int main() {
string word;
unordered_map<string, int> counts;
while (cin >> word) {
transform(word.begin(), word.end(), word.begin(),
[](unsigned char c){ return tolower(c); });
counts[word]++;
}
vector<pair<string, int>> ordered(counts.begin(), counts.end());
sort(ordered.begin(), ordered.end(), [](auto &a, auto &b) {
return a.second > b.second;
});
for (auto count : ordered) {
cout << count.first << " " << count.second << "\n";
}
return 0;
}
3 Answers 3
Mainly this looks fine.
using namespace std;
is a bad habit to get into due to potential name collisions. Instead prefer to typestd::
where necessary.counts[word]++;
. We don't need an un-incremented copy of the count, so we should use pre-increment here instead:++counts[word];
sort(ordered.begin(), ordered.end(), [](auto &a, auto &b)
we don't altera
orb
in the lambda, so these should beconst&
.for (auto count : ordered)
copies the pairs. Again, we should be usingauto const&
here.return 0;
we don't need this, as the compiler will add it for us automatically. We might also use a named constant instead of0
, specificallyEXIT_SUCCESS
from<cstdlib>
.
-
3\$\begingroup\$ You missed the failure to include
<cctype>
, needed forstd::tolower()
. \$\endgroup\$Toby Speight– Toby Speight2021年03月09日 09:35:20 +00:00Commented Mar 9, 2021 at 9:35 -
\$\begingroup\$ Thank you -- appreciate these tweaks! \$\endgroup\$Ben Hoyt– Ben Hoyt2021年03月09日 19:05:53 +00:00Commented Mar 9, 2021 at 19:05
user673679's answer deals with low-level review; I'll look at the algorithm.
First, top marks for avoiding the most common error with functions from <cctype>
- it's vitally important to pass the value as unsigned char
promoted to int, rather than plain char
.
Rather than writing my own loop for reading the words, I would consider transform()
from an input-stream iterator to a map inserter. That said, we'd need to construct a custom iterator for the inserter we want. It would be very straightforward if we were using a std::multiset
, but that would use more memory (as it stores all the added objects). We could create a "counter" class with appropriate push_back()
, if we were likely to use it again.
After the while
loop, we should test std::cin.bad()
(or set the stream to throw on error). At present, any stream error is ignored, and we proceed as if we'd read the entire input, giving misleading results.
Instead of populating a vector, and subsequently sorting it, we might prefer to insert directly into an ordered container (std::set
perhaps).
Something we can do in modern C++ is to write nested functions, by assigning a lambda expression to a variable. This may be clearer than writing the lambda inline. Or we can use the anonymous namespace for functions with static linkage.
Here's my modification of the code to have no explicit loops, or indeed any flow-control statements:
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
namespace {
auto downcase(std::string s) {
std::transform(s.begin(), s.end(), s.begin(),
[](unsigned char c){ return std::tolower(c); });
return s;
}
}
int main()
{
using counter = std::unordered_map<std::string, unsigned>;
using in_it = std::istream_iterator<std::string>;
using out_it = std::ostream_iterator<std::string>;
counter counts;
auto insert = [&](std::string s) { ++counts[downcase(std::move(s))]; };
// read words into counter
std::cin.exceptions(std::istream::badbit);
std::for_each(in_it{std::cin}, in_it{}, insert);
// sort by frequency, then alphabetical
auto by_freq_then_alpha = [](const auto &a, const auto &b) {
return std::pair{ b.second, a.first } < std::pair{ a.second, b.first};
};
const std::set ordered{counts.begin(), counts.end(), by_freq_then_alpha};
// write the output
auto format = [](const auto& count) {
return count.first + ' ' + std::to_string(count.second) + '\n';
};
std::transform(ordered.begin(), ordered.end(), out_it{std::cout}, format);
}
For a more modern take, C++20 includes the Ranges library, which lets us use views to transform collections:
int main()
{
using counter = std::unordered_map<std::string, unsigned>;
counter counts;
auto insert = [&counts](std::string s) { ++counts[downcase(std::move(s))]; };
// read words into counter
std::cin.exceptions(std::istream::badbit);
auto input_words = std::ranges::istream_view<std::string>(std::cin);
std::ranges::for_each(input_words, insert);
// sort by frequency, then alphabetical
auto by_freq_then_alpha = [](const auto &a, const auto &b) {
return std::pair{ b.second, a.first } < std::pair{ a.second, b.first};
};
const std::set ordered{counts.begin(), counts.end(), by_freq_then_alpha};
// write the output
auto write_out = [](const auto& count) {
std::cout << count.first << ' ' << count.second << '\n';
};
std::ranges::for_each(ordered, write_out);
}
I'm not saying that either of these is necessarily what you should write; at least for now, the range-based loops are more familiar to most C++ programmers, and so probably clearest. However, it showcases some of the options we have in modern C++.
-
4\$\begingroup\$ While correct and very idiomatic C++, I actually slightly prefer OPs code for readability, it's slightly shorter and easier to scan, then again I have 20 years of habit/experience reading for loops and they just read clearer to me than transform/for_each and company from <algorithm>. They do have their place tho. \$\endgroup\$Emily L.– Emily L.2021年03月09日 14:42:57 +00:00Commented Mar 9, 2021 at 14:42
-
\$\begingroup\$ Well, this has been an education, thank you! Definitely a very different, more functional approach. I do prefer the plainer, for-loop approach (perhaps because this reminds me of Scala, which I don't have warm fuzzy feelings for). And thanks for the tip about error handling. \$\endgroup\$Ben Hoyt– Ben Hoyt2021年03月09日 19:09:01 +00:00Commented Mar 9, 2021 at 19:09
Added: structured bindings
Changed: to using range based algorithms. (C++20) This removes the need for begin()/end() and you can use a projection in the sort call.
Removed: lambdas - not needed
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;
int main()
{
unordered_map<string, int> counts;
string word;
while (cin >> word) {
ranges::transform(word, word.begin(), ::tolower);
counts[word]++;
}
using word_count = pair<string, int>;
vector<word_count> ordered(counts.begin(), counts.end());
ranges::sort(ordered, greater{}, &word_count::second);
for (auto [word, count] : ordered) {
cout << word << " " << count << "\n";
}
}
-
2\$\begingroup\$ The lambda wrapping
std::tolower
is very definitely needed - using the<cctype>
functions by directly promoting their arguments from plainchar
toint
leads to UB on platforms wherechar
is signed. Always convert tounsigned char
before converting toint
. \$\endgroup\$Toby Speight– Toby Speight2021年03月10日 17:32:25 +00:00Commented Mar 10, 2021 at 17:32 -
\$\begingroup\$ The range stuff is great - thanks for helping me get more proficient with that part of the Library! \$\endgroup\$Toby Speight– Toby Speight2021年03月10日 17:33:32 +00:00Commented Mar 10, 2021 at 17:33
std::make_heap
since it brings the largest value to the top automatically. It did not go well... \$\endgroup\$