In a question I answered I posted the following code:
#include <iostream>
#include <fstream>
#include <algorithm>
#include <sstream>
int main()
{
fstream iFile("names.txt", ios::in);
// This does not work so replacing with code that does
// iFile >> file;
// This is not the best way.
// But the closest to the intention of the original post.
// Get size of file.
ifile.seekg(0,std::ios::end);
std::streampos length = ifile.tellg();
ifile.seekg(0,std::ios::beg);
// Copy file into string.
std::string file(length, '0円');
ifile.read(&buffer[0],length);
// Original code continues.
std::istringstream ss(file);
std::string token;
std::vector<std::string> names;
while(std::getline(ss, token, ',')) {
names.push_back(token);
}
for (unsigned int i = 0; i < names.size(); i++) {
auto it = std::remove_if(names[i].begin(), names[i].end(), [&] (char c) { return c == '"'; });
names[i] = std::string(names[i].begin(), it);
}
for (unsigned int i = 0; i < names.size(); i++) {
std::cout << "names["<<i<<"]: " << names[i] << std::endl;
}
}
Would this be inefficient for larger files? Would it be better if I read the file into one string, did all the manipulation on that, and then put it into a vector?
2 Answers 2
This code looks a little wild. I'd tame it like this:
#include <algorithm> // for std::remove
#include <fstream> // for std::ifstream
#include <string> // for std::string and std::getline
#include <utility> // for std::move
#include <vector> // for std::vector
std::ifstream infile("thefile.txt"); // we don't really care where you got the name from
std::vector<std::string> names;
for (std::string line; std::getline(infile, line, ','); )
{
line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
names.push_back(std::move(line));
}
// done
A few points to note:
Keep it simple. Don't overthink.
Use ready-made algorithms, don't reinvent wheels (
std::remove
removes values).Include what you use (
std::string
,std::getline
).Use algorithms, don't hand-roll your own loops. (erase/remove)
Understand what algorithms can do for you (don't hand-write your own "erase"). Get familiar with the standard library.
Don't leak scopes. The
line
string is only needed within the loop, and no further.Efficiency tip: You can move from the
line
string that you no longer need and save yourself a copy. Also, we process everything in one step; no need to loop repeatedly.
Furthermore, I suggest factoring this logic into a file:
std::vector<std::string> tokenize_file(std::string const & filename, char delimiter)
{
std::vector<std::string> result;
// ...
return result;
}
Usage:
std::vector<std::string> names = tokenize_file("thefile.txt", ',');
Instead of constructing an fstream
, reading it all into a string
(which is misleadingly named file
, by the way), and creating an istringstream
from that, why not just create an ifstream
and call getline
on it directly?
You appear to be trying to parse a CSV file where the fields may be double quoted. If that is your intention, then simply deleting all "
characters is the wrong behaviour. See RFC 4180 for commonly accepted quoting rules for CSV documents. Consider using a CSV parsing library instead of rolling your own CSV parser.
Your code would be clearer if you unquoted each token
before adding it to names
.
-
1\$\begingroup\$ Totally agree on using CSV parsing library. CSV is one of those badly (not though through enough) defined formats that has a billion edge cases that you would never think about yourself. The simple case is trivial but doing it correctly for all cases is exceptionally hard. \$\endgroup\$Loki Astari– Loki Astari2013年12月23日 21:38:51 +00:00Commented Dec 23, 2013 at 21:38
main()
? You can't have free-floating code like that in C++. \$\endgroup\$iFile >> file;
that is not doing very much. It reads a single space separated word. So I am going to change that to read the whole file into a string. \$\endgroup\$