Reading a list of names, counting them and ordering them

Question 1

I would like to know does this code follow common best practices in C++ using STL.

Goal of the code:

There are 10 players that were playing a game 1000 times. Each time someone has won they have put winners name into the binary file data.bin. Each name takes 20 bytes of data in the binary file and uses 0円 as a string terminator.
This program uses that file as an input and prints out ranked list of each player's scores.

Here is my code:

#include <iostream>
#include <fstream>
#include <map>
#include <algorithm>
#include <vector>
using std::map, std::string, std::ifstream, std::ios, std::string, std::cerr, std::vector, std::pair, std::sort, std::cout;
constexpr size_t NAME_SIZE=20;
const char* FILE_NAME = "data.bin";
constexpr int FILE_NOT_OPENED = 1;
struct zuint { uint16_t val = 0; };
int main() {
 char tmp[NAME_SIZE];
 map<string, zuint > points;
 ifstream file(FILE_NAME, ios::in | ios::binary);
 if(!file){
 cerr << "File not opened. 'data.bin' probably does not exist." << "\n";
 exit(FILE_NOT_OPENED);
 }
 while (!file.eof()) {
 file.read(tmp, NAME_SIZE);
 points[tmp].val++;
 }
 file.close();
 vector<pair<string, zuint>> pointscpy(points.begin(), points.end());
 points.clear();
 sort(
 pointscpy.begin(),
 pointscpy.end(), 
 [](pair<string, zuint> v1, pair<string, zuint> v2) {
 return v1.second.val > v2.second.val;
 }
 );
 for(int i=0;i<pointscpy.size(); i++){
 cout << i + 1 << ". " << pointscpy[i].first << ": " << pointscpy[i].second.val << "\n";
 }
 return EXIT_SUCCESS;
}

Question 2

That long line of using is problematic. First, it's far too long (it's not even sorted, which might explain why std::string appears twice). Secondly, it looks like you've been told to avoid using namespace std; but have heard only half the reason it's a bad idea. Those many usings tend to obscure which identifiers in your code are from the library and which are your own. I really recommend you drop that habit and start referring to things by their full names most of the time.

size_t and uint16_t are never defined. I'm guessing you meant std::size_t and std::uint16_t (the latter from <cstdint> header). It's not clear why you need an exactly-16-bit type for this simple counting - what's wrong with std::uint_fast16_t? Or even a plain unsigned int? Other identifiers not defined include exit (presumably std::exit from <cstdlib>) and EXIT_SUCCESS (also from <cstdlib>).

The zuint structure doesn't seem to provide any benefit over using the integer type directly.

The tmp array doesn't need to be scoped to the whole function - it can be temporary, as the name suggests, within the read loop.

No need to write two separate literal strings to std::cerr - combine them into a single string:

 std::cerr << "File not opened. 'data.bin' probably does not exist.\n";

Perhaps consider std::perror (from <cstdio>) as an alternative error reporting function which provides more information.

The loop is flawed (testing for eof before reading is a known anti-pattern). When istream::read() fails, we haven't populated tmp and shouldn't be adding an entry. Even when it fails, we're not defensive against malformed data (missing final null character). It's better to read into an oversize buffer and supply the null even if it's absent in the input.

Consider using std::unordered_map instead of std::map when we don't care about the order of its elements. That will likely be more performant when there are many different names in the file.

We can save writing long types in full, by taking advantage of auto.

The comparator we pass to std::sort can avoid copying strings, by accepting references to pairs.

We don't need to return success value at the end of main() - the main function is magic, and simply running off the end implies a return value of 0.

Modified code

#include <algorithm>
#include <cerrno>
#include <cstdlib>
#include <cstring> // for strerror()
#include <fstream>
#include <iostream>
#include <unordered_map>
#include <vector>
constexpr std::size_t name_size = 20;
const char* file_name = "data.bin";
int main()
{
 std::ifstream file(file_name, std::ios::in | std::ios::binary);
 if (!file) {
 std::cerr << file_name << ": " << std::strerror(errno) << '\n';
 return EXIT_FAILURE;
 }
 std::unordered_map<std::string, unsigned int> points;
 {
 char name[name_size + 1];
 name[name_size] = '0円';
 while (file.read(name, name_size)) {
 ++points[name];
 }
 }
 if (!file.eof()) {
 std::cerr << file_name << ": " << std::strerror(errno) << '\n';
 return EXIT_FAILURE;
 }
 std::vector<std::pair<std::string, unsigned int>>
 pointscpy(points.begin(), points.end());
 std::sort(pointscpy.begin(), pointscpy.end(),
 [](auto const& v1, auto const& v2) {
 return v1.second > v2.second;
 });
 int i = 0;
 for (auto const& p: pointscpy) {
 std::cout << ++i << ". " << p.first << ": " << p.second << '\n';
 }
}

Future directions

Instead of hard-coding the file name in the program, accept it as a run-time argument, and/or accept input on the standard input stream so that it can be used as a filter.

Question 3

I guess I was not clear about '0円' terminated name. It is stored in form of "<Name>0円" then data after '0円' is ignored ( it is like C-style string). When it comes to zuint I used it because of default value(0) but I guess that is the default. Main reason why I chose uint16_t is because I knew that 1000 is the biggest possible number so I chose the smallest possible container. But everything you said stands.

Question 4

I had some doubts about the null-terminated names (particularly whether the 20-char record size was the maximum name length with or without the null), but the code resolved the questions. My point is that when the input is as expected, all is well, but in the real world we need to be prepared for accidents or malicious actions: if I write a file that has 20 non-null characters in a row, I can force a buffer overrun. Never trust your inputs!

Question 5

And yes, a map of string to any numeric type will initialise the values to zero - you can depend on that.

Question 6

Also I forgot to ask. Is points.clear() fine to just clear the memory or there is some reason why you removed it?

Question 7

Yes, it's fine - but it doesn't achieve very much, because the peak usage is still going to be when the vector is created. I'm not sure that clear() will return any of the memory it was using anyway - I think that will remain as unused capacity, though that could depend on your implementation.

Toby Speight Toby Speight 87.2k14 gold badges104 silver badges322 bronze badges · Accepted Answer · 2022-05-06 20:57:51Z

That long line of using is problematic. First, it's far too long (it's not even sorted, which might explain why std::string appears twice). Secondly, it looks like you've been told to avoid using namespace std; but have heard only half the reason it's a bad idea. Those many usings tend to obscure which identifiers in your code are from the library and which are your own. I really recommend you drop that habit and start referring to things by their full names most of the time.

size_t and uint16_t are never defined. I'm guessing you meant std::size_t and std::uint16_t (the latter from <cstdint> header). It's not clear why you need an exactly-16-bit type for this simple counting - what's wrong with std::uint_fast16_t? Or even a plain unsigned int? Other identifiers not defined include exit (presumably std::exit from <cstdlib>) and EXIT_SUCCESS (also from <cstdlib>).

The zuint structure doesn't seem to provide any benefit over using the integer type directly.

The tmp array doesn't need to be scoped to the whole function - it can be temporary, as the name suggests, within the read loop.

No need to write two separate literal strings to std::cerr - combine them into a single string:

 std::cerr << "File not opened. 'data.bin' probably does not exist.\n";

Perhaps consider std::perror (from <cstdio>) as an alternative error reporting function which provides more information.

The loop is flawed (testing for eof before reading is a known anti-pattern). When istream::read() fails, we haven't populated tmp and shouldn't be adding an entry. Even when it fails, we're not defensive against malformed data (missing final null character). It's better to read into an oversize buffer and supply the null even if it's absent in the input.

Consider using std::unordered_map instead of std::map when we don't care about the order of its elements. That will likely be more performant when there are many different names in the file.

We can save writing long types in full, by taking advantage of auto.

The comparator we pass to std::sort can avoid copying strings, by accepting references to pairs.

We don't need to return success value at the end of main() - the main function is magic, and simply running off the end implies a return value of 0.

Modified code

#include <algorithm>
#include <cerrno>
#include <cstdlib>
#include <cstring> // for strerror()
#include <fstream>
#include <iostream>
#include <unordered_map>
#include <vector>
constexpr std::size_t name_size = 20;
const char* file_name = "data.bin";
int main()
{
 std::ifstream file(file_name, std::ios::in | std::ios::binary);
 if (!file) {
 std::cerr << file_name << ": " << std::strerror(errno) << '\n';
 return EXIT_FAILURE;
 }
 std::unordered_map<std::string, unsigned int> points;
 {
 char name[name_size + 1];
 name[name_size] = '0円';
 while (file.read(name, name_size)) {
 ++points[name];
 }
 }
 if (!file.eof()) {
 std::cerr << file_name << ": " << std::strerror(errno) << '\n';
 return EXIT_FAILURE;
 }
 std::vector<std::pair<std::string, unsigned int>>
 pointscpy(points.begin(), points.end());
 std::sort(pointscpy.begin(), pointscpy.end(),
 [](auto const& v1, auto const& v2) {
 return v1.second > v2.second;
 });
 int i = 0;
 for (auto const& p: pointscpy) {
 std::cout << ++i << ". " << p.first << ": " << p.second << '\n';
 }
}

Future directions

Instead of hard-coding the file name in the program, accept it as a run-time argument, and/or accept input on the standard input stream so that it can be used as a filter.

I guess I was not clear about '0円' terminated name. It is stored in form of "<Name>0円" then data after '0円' is ignored ( it is like C-style string). When it comes to zuint I used it because of default value(0) but I guess that is the default. Main reason why I chose uint16_t is because I knew that 1000 is the biggest possible number so I chose the smallest possible container. But everything you said stands.
I had some doubts about the null-terminated names (particularly whether the 20-char record size was the maximum name length with or without the null), but the code resolved the questions. My point is that when the input is as expected, all is well, but in the real world we need to be prepared for accidents or malicious actions: if I write a file that has 20 non-null characters in a row, I can force a buffer overrun. Never trust your inputs!
And yes, a map of string to any numeric type will initialise the values to zero - you can depend on that.
Also I forgot to ask. Is points.clear() fine to just clear the memory or there is some reason why you removed it?
Yes, it's fine - but it doesn't achieve very much, because the peak usage is still going to be when the vector is created. I'm not sure that clear() will return any of the memory it was using anyway - I think that will remain as unused capacity, though that could depend on your implementation.

Stack Exchange Network

Reading a list of names, counting them and ordering them

1 Answer 1

Modified code

Future directions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Reading a list of names, counting them and ordering them

1 Answer 1

Modified code

Future directions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions