I have the following C++ code in VS2019 under Windows 10:
char const* const fileName = "random_StringArray_10000000";
FILE* infile;
long fileSize;
char* buffer;
size_t readBytes;
infile = fopen(fileName, "rb");
if (infile == NULL)
{
fputs("File error", stderr); exit(1);
}
fseek(infile, 0, SEEK_END);
fileSize = ftell(infile);
rewind(infile);
buffer = (char*)malloc(sizeof(char) * fileSize);
if (buffer == NULL)
{
fputs("Memory error", stderr); exit(2);
}
auto start = chrono::steady_clock::now();
readBytes = fread(buffer, 1, fileSize, infile);
auto end = chrono::steady_clock::now();
if (readBytes != fileSize)
{
fputs("Reading error", stderr); exit(3);
}
fclose(infile);
free(buffer);
auto elapsed_ms = chrono::duration_cast<chrono::milliseconds>(end - start);
cout << "Elapsed ms: " << elapsed_ms.count() << endl;
cout << "String count: " << stringCount << endl;
system("pause");
return 0;
I use this method because it is the fastest way to read a file from disk under VS2019. Now I need to convert the char array to a string array. random_StringArray_10000000 is a UTF-8 text file. The strings are 8 to 120 characters long. Hex view of this file: (image not included)
The strings are separated by 0x0D 0x0A (CR LF).
What is the fastest way to convert the char array (buffer) to a C++ string array?
3 Answers
There seems to be a regularity to your data: all strings are eight characters long and separated by the same two characters. With that in mind, the following seems fairly fast.
size_t arraySize = readBytes/10;
std::string* array = new std::string[arraySize];
for (size_t i = 0; i < arraySize; ++i)
array[i].assign(buffer + 10*i, 8);
Of course timing is necessary to be sure what is fastest.
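If the strings are not all the same length (the question mentions 8 to 120 characters), the fixed stride of 10 no longer applies and the buffer has to be scanned for the separator instead. Here is a minimal sketch of that idea, not a measured solution. The name split_lines is hypothetical, and it assumes lines end in "\r\n" (a bare '\n' is accepted too).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): copies each line found in the
// `size` bytes at `data` into its own std::string. Assumes "\r\n" separators;
// a bare '\n' also ends a line.
std::vector<std::string> split_lines(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::vector<std::string> lines;
    const char* p = data;
    const char* const end = data + size;
    while (p < end)
    {
        // Find the next '\n' in the remaining bytes.
        const char* nl = static_cast<const char*>(
            std::memchr(p, '\n', static_cast<std::size_t>(end - p)));
        const char* lineEnd = nl ? nl : end;
        std::size_t len = static_cast<std::size_t>(lineEnd - p);
        if (len > 0 && p[len - 1] == '\r')
            --len; // drop the '\r' of a "\r\n" pair
        lines.emplace_back(p, len);
        p = nl ? nl + 1 : end;
    }
    return lines;
}
```

With the question's variables it would be called as split_lines(buffer, readBytes), before free(buffer). Most of the cost is likely to be the per-string allocations, so it should be timed in a Release build like everything else here.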
Reading lines of text from a file is much simpler if you use the classes from the C++ standard library.
This should be all of the code you need:
#include <fstream>
#include <vector>
#include <string>
#include <iostream>
int main()
{
char const* const fileName = "random_StringArray_10000000";
std::ifstream in(fileName);
if (!in)
{
std::cout << "File error\n";
return 1;
}
std::vector<std::string> lines;
std::string line;
while (std::getline(in, line))
{
lines.push_back(std::move(line));
}
return 0;
}
-
getline gives me only 2-4 MB/s in debug mode and 10 MB/s in Release (-O2). fread gives 240 MB/s in debug mode and up to 355 MB/s in Release mode. – Deim, Feb 23, 2021 at 14:03
-
Benchmarking in debug is meaningless. How big is your file? Is it really that critical to read it faster? In a lot of cases it's better to write simple, correct code than to prematurely optimise into something complex with hidden bugs. – Alan Birtles, Feb 23, 2021 at 14:06
-
Up to 50 GB. I planned to read big files in parts (~100 MB each part). – Deim, Feb 23, 2021 at 14:14
char as[ ] = "a char array";
: is a char array
char const* const fileName = "random_StringArray_10000000";
: is a C string
This is also a C string:
char* cs = const_cast<char*>( fileName );
If you want a std::string, use:
std::string s(as);
I wasn't sure which string conversion you wanted so I just added what I could off the top of my head. But here's a compilable example too.
std::fstream and std::getline would do all this in a couple of lines of code rather than using C functions.
"it is fastest way to read file": Are you sure? Have you measured it?