It's sometimes fun to implement the exercises from K&R in C++. It's certainly interesting to see what C++ makes easier and what's still difficult.
I've had a go at Exercise 1-14 in the second edition:
Exercise 1-14. Write a program to print a histogram of the frequencies of different characters in its input.
The extra challenge from 1-13 also applies here:
It is easy to draw the histogram with the bars horizontal; a vertical orientation is more challenging.
I also chose to make my histogram bars logarithmic.
My code looks fairly slick for gathering the data, but I wasn't very satisfied with the output half:
// A program to print a histogram of the frequencies of different
// characters in its input. K&R(2) exercise 1-14.
#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <iomanip>
#include <iterator>
#include <map>
#include <ranges>
template<typename T>
using counter = std::map<T, std::size_t>;
static bool isprint(char c)
{
return std::isprint(static_cast<unsigned char>(c));
}
int main()
{
auto hist = counter<char>{};
std::ranges::for_each(std::istreambuf_iterator<char>{std::cin},
std::istreambuf_iterator<char>{},
[&hist](char c){ ++hist[c]; });
auto const keys = hist | std::views::keys;
auto const values = hist | std::views::values;
auto const max_val = std::ranges::max(values);
// width of the scale column
unsigned count_width = static_cast<unsigned>(std::ceil(std::log10(max_val)));
// scale to fit a 24-line display (but just assume it's wide enough)
auto constexpr rows = 23u;
auto const scale = std::log(static_cast<double>(max_val - 1)) / rows;
std::cout << std::fixed << std::setprecision(0);
for (auto i = rows; i; --i) {
// scale bar
auto const threshold = static_cast<std::size_t>(std::exp(i * scale));
std::cout << std::setw(count_width);
if (i % 2 == rows % 2) {
std::cout << threshold;
} else {
std::cout << "";
}
std::cout << '|';
// character columns
for (auto v: values) {
std::cout << (v >= threshold ? '*' : ' ');
}
std::cout << '\n';
}
// histogram key
std::cout << std::setw(count_width + 1) << '|';
for (auto k: keys) {
std::cout << (isprint(k) ? k : ' ');
}
}
When fed its own source code, it produces this output:
406| *
| *
241| *
| *
143| *
| * *
84| * **
| * * * * * * **
50|** * * *** * * ***
|** * * * *** * ** ****
29|** * * * *** * * ** ****
|** ** *** * *** ** * ** ****
17|** ** *** * * *** ** **** ****
|** ** * ***** ** ******* ***** ****
10|** *** * ***** ** ******* ***** ****** * *
|** * *** * ***** ** ******* ***** ******* * *
6|** * *** * ***** ** ************* ******** * *
|** * *** * * ** ***** **************** ******** ***
3|** * *** *** * ** ***** **************** ******** ***
|****************** ******* ** ***************** ************
2|****************** ******* ** ***************** ************
|**************************************************************
1|**************************************************************
| "#%&'()*+,-./01234:;<=>?AKRT[\]_abcdefghiklmnopqrstuvwxyz{|}
N.B. The C++20 library support in GCC 11 is not sufficiently complete for this, so GCC 12 is required (or any other compiler with good C++20 support).
2 Answers 2
That's some surprisingly compact but still very readable code! Very good use of C++20's ranges.
Casting numbers
You are performing several explicit casts. While I don't think there is anything wrong with them, one is at least unnecessary: std::log()
and many other math functions have overloads that take integers, and will cast to double
automatically for you. So you don't have to do that yourself.
You use auto foo = static_cast<sometype>(somevalue)
in many places, which is good, but you didn't do this for count_width
, where you specified the type twice. This is a bit dangerous, as if the two types don't match you will might have an undesirable implicit cast happening.
Also, since you use count_width
as an argument for std::set()
, which takes an int
, it might be better have it be an int
itself:
const auto count_width = static_cast<int>(std::ceil(std::log10(max_val)));
Use a std::array
while counting
When counting how many times each character occurs, a std::map
is inefficient. Since you know there are only UCHAR_MAX
+ 1 possible characters, you can create a std::array
of that size. This avoids expensive lookups every time you want to update a counter.
After counting you can consider either converting the result into a std::map
, or just iterating over the std::array
and ignoring any element that has a count of zero. The latter would make your code a bit more complicated, although you could probably still do it with ranges. While a std::views::enumerate
would have been very helpful, you can still do things in C++20 like:
std::array<std::size_t, UCHAR_MAX + 1> hist{};
std::ranges::for_each(std::istreambuf_iterator<char>{std::cin},
std::istreambuf_iterator<char>{},
[&hist](char c){ ++hist[static_cast<unsigned char>(c)]; });
auto values = hist | std::views::filter([](auto count){ return count != 0; });
for (auto& value: values) {
auto key = static_cast<char>(&value - &hist[0]);
std::cout << key << ": " << value << "\n";
}
You could create some helper functions to make that look less hacky.
Split the code into multiple functions
You can split main()
into a function that calculates the histogram and one that prints it.
Missing newline
Your program doesn't print a newline after the histogram key.
Infinite loop when the input is empty
Your program goes into an infinite loop when the input is completely empty. This is caused by applying std::ranges::max()
to an empty view. I think the solution is just to check if hist.empty()
right after the for_each()
, and quit the program if that returns true
.
Incorrect output when every count is 1
When every character in the input occurs exactly once, then count_width
is zero, which causes every other line of the output to be shifted by one.
The simple solution is just to add a std::max(1u, ...)
to the calculation of count_width
.
y-axis count is off by one
The y-axis only shows the maximum count minus 1. This is because you literally subtract 1 from max_val
when calculating scale
, not because of floating point imprecision, although the latter is something you should worry about as well. Instead of subtracting 1, I would just add 0.5 to max_val
when calculating scale
.
Consider using std::format()
Since your code relies on C++20, you could consider using std::format()
(or {fmt} if your standard library doesn't support it yet); it makes aligning text much easier. For example:
for (auto i = rows; i; --i) {
auto const threshold = static_cast<std::size_t>(std::exp(i * scale));
if ((i - rows) % 2)
std::cout << fmt::format("{:{}}|", "", count_width);
else
std::cout << fmt::format("{:{}}|", threshold, count_width);
for (auto v: values) {
std::cout << (v >= threshold ? '*' : ' ');
std::cout << '\n';
}
With C++23, you can even format ranges. This would allow you to do something like:
for (auto i = rows; i; --i) {
auto const threshold = static_cast<std::size_t>(std::exp(i * scale));
std::print("{:{}}|{::s}\n",
threshold,
count_width,
values | std::views::transform([threshold](auto v){
return v >= threshold ? '*' : ' ';
}
);
}
-
1\$\begingroup\$ Tricky with the y-axis count - if I scale to get that right, I end up with no bar quite reaching that height. Presumably due to truncation of floating-point values. \$\endgroup\$Toby Speight– Toby Speight2022年12月21日 15:53:08 +00:00Commented Dec 21, 2022 at 15:53
-
\$\begingroup\$ I think I worked it out - use
std::exp(i * scale)
for the label, andstd::exp((i-.5) * scale)
for the threshold to draw a star. \$\endgroup\$Toby Speight– Toby Speight2022年12月22日 13:37:47 +00:00Commented Dec 22, 2022 at 13:37 -
\$\begingroup\$ That might also work :) \$\endgroup\$G. Sliepen– G. Sliepen2022年12月22日 16:20:50 +00:00Commented Dec 22, 2022 at 16:20
C++20 ranges and views are not available for all compilers yet. The default C++ compiler on Ubuntu 22.04 does not support them because it is gcc11 and not gcc12.
Since the problem is in the first chapter of K&R it is considered quite simple and can be safely put into one function, the main function, but personally I would divide it up into input and output functions. That might make the code a little more readable. It takes a little searching to find where the code is actually doing the input from stdin (std::cin)
.
This part of the output loop is complicated enough to be a function:
// scale bar
auto const threshold = static_cast<std::size_t>(std::exp(i * scale));
std::cout << std::setw(count_width);
if (i % 2 == rows % 2) {
std::cout << threshold;
} else {
std::cout << "";
}
std::cout << '|';
That could make the output look slicker.
The variable keys
is created long before it is actually necessary:
// histogram key
auto const keys = hist | std::views::keys;
std::cout << std::setw(count_width + 1) << '|';
for (auto k : keys) {
std::cout << (isprint(k) ? k : ' ');
}
-
\$\begingroup\$ I should have said that I used GCC 12 for this - sorry for omitting that! \$\endgroup\$Toby Speight– Toby Speight2022年12月21日 15:54:05 +00:00Commented Dec 21, 2022 at 15:54