4
\$\begingroup\$

Part-1 here

This is a small CSV parser for positive integers mostly intended for personal projects.
I (hopefully) improved the code from Part-1 by condensing some parts and improving others while also keeping performance the same.
My main worries are any obvious pitfalls I might have missed as well as concerns about the general approach. Of course anything else you notice is also welcome.
Please see part-1 for a list of requirements/restrictions I had when writing this.

For anyone who wants to test this against actual input I provide a small Perl script to generate CSV files.

Call it with: perl [scriptname] [how many lines you want] >outputfile

use strict;
use warnings "all";
die "usage: perl 0ドル <amount_of_lines>\n" unless (int(@ARGV) > 0);
for (1 .. $ARGV[0]) {
 my $vals_per_line = int(rand(10));
 my @values;
 for (0 .. $vals_per_line) {
 push(@values, int(rand(1000)));
 }
 print join(",", @values) . "\n";
}

Code:

#include <fstream>
#include <string>
#include <vector>
template<typename T>
void parse_lines(
 const std::string& filename,
 const uint_fast8_t& line_length_estimate,
 const uint_fast8_t& values_per_line_estimate,
 T callback) {
 std::ifstream infile{filename};
 if (!infile.good()) {
 return;
 }
 std::vector<uint_fast16_t> numbers;
 numbers.reserve(values_per_line_estimate);
 std::string buffer;
 buffer.reserve(line_length_estimate);
 while (infile.good() && std::getline(infile, buffer)) {
 if (buffer[buffer.size() - 1] == '\r') {
 buffer[buffer.size() - 1] = ',';
 }
 else {
 buffer.push_back(',');
 }
 uint_fast16_t parsed_number = 0;
 for (const auto& digit : buffer) {
 if (digit == ',') {
 numbers.emplace_back(parsed_number);
 parsed_number = 0;
 }
 else {
 parsed_number = (parsed_number * 10) + (digit - '0');
 }
 }
 callback(numbers);
 numbers.clear();
 }
}
int main(int argc, char** argv) {
 constexpr uint_fast8_t line_length_estimate = 50;
 constexpr uint_fast8_t values_per_line_estimate = 15;
 parse_lines(argv[1], line_length_estimate, values_per_line_estimate, [](auto& values) {
 // ...
 });
}
asked Mar 11, 2018 at 11:46
\$\endgroup\$
1
  • \$\begingroup\$ I tried to break the dependency between different states of the parsed number, e.g. doing parsed_number += (digit - '0') * powers[current_power], but it kept ending up slower. I guess improving performance further will require some more intricate techniques. \$\endgroup\$ Commented Mar 11, 2018 at 15:23

1 Answer 1

2
\$\begingroup\$

Great that you renamed you function according to what it does. I would suggest, that you add a helper function that reads a single line.

Creating small well encapsulated functions is key in writing maintainable code.

Regarding you code there are some small nits:

  1. The code pattern

    container[container.size() - 1] 
    

    can alway be replaced by

    container.back()
    
  2. Regarding the parsing you could use std::string::find_first_of and search for the next occurence of it.

    std::size_t first = 0;
    std::size_t second;
    while (first != buffer.size()) {
     second = buffer.find_first_of(",\r", first);
     // parse [first, second)
     first = second + 1; 
    }
    

    That way in the interval [first, second) there would always be the numbers and you would not have to replace the final character or other stuff

  3. Again encapsulation. There is no reason why the this block of code is not inside its own function:

    uint_fast16_t parsed_number = 0;
    for (const auto& digit : buffer) {
     if (digit == ',') {
     numbers.emplace_back(parsed_number);
     parsed_number = 0;
     }
     else {
     parsed_number = (parsed_number * 10) + (digit - '0');
     }
    }
    

    Even better if you encapsulate it you can easily adopt for iterators/positions

    uint_fast16_t parse_number(cont std::string& buffer, std::size_t first, std::size_t second) {
     uint_fast16_t parsed_number = 0;
     for (std::size_t i = first; i < second; ++i) {
     parsed_number = (parsed_number * 10) + (buffer[i] - '0');
     }
     return parsed_number;
    } 
    

    Now you can write the loop as

    while (infile.good() && std::getline(infile, buffer)) {
     std::size_t first = 0;
     std::size_t second;
     while (first != buffer.size()) {
     second = buffer.find_first_of(",\r", first);
     numbers.push_back(parse_number(buffer, first,second));
     first = second + 1; 
     }
     callback(numbers);
     numbers.clear();
    }
    

    I would even suggest to create another function for parsing the single line:

    void parse_single_line(std::string& buffer, std::vector<uint_fast16_t>& numbers) {
     std::size_t first = 0;
     std::size_t second;
     while (first != buffer.size()) {
     second = buffer.find_first_of(",\r", first);
     numbers.push_back(parse_number(buffer, first,second));
     first = second + 1; 
     }
    

    Together this leads to

    while (infile.good() && std::getline(infile, buffer)) {
     parse_single_line(buffer, numbers);
     callback(numbers);
     numbers.clear();
    }
    
answered Mar 11, 2018 at 13:10
\$\endgroup\$
3
  • \$\begingroup\$ This is an interesting approach. I think you forgot to pass buffer into parse_number? Also if I'm not mistaken this won't work with \n line endings because getline will remove the line ending. This can be fixed by checking second against numeric_limits but performance suffers quite a bit. \$\endgroup\$ Commented Mar 12, 2018 at 9:31
  • \$\begingroup\$ @yuri buffer is obviously missing, Sorry for that. Regarding line ending find will give you std::string::npos if nothing is found o that should also be fine, as we parese [first,second) \$\endgroup\$ Commented Mar 12, 2018 at 13:19
  • \$\begingroup\$ Adding if (second == std::string::npos) second = buffer.size() - 1; makes this work for both line ending types. However performance is now nearly twice as bad as before. Any suggestions on how to speed this up? \$\endgroup\$ Commented Mar 12, 2018 at 16:07

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.