I have a class that opens a text file and reads out its content. This class has a member function that will extract a substring between two given delimiters. My application needs to extract between 6000 and 7000 substrings of this text file content on start up. Even though speed is not the most important concern, the whole process should still not take more than 1s. My problem is, that the way I wrote the class, the process takes more than 10 seconds! What I am doing is, I am using the string.find() method to find the first delimiter, than I am looking for the second delimiter in the same way. In the last step, I just use the string.substr() method to cut out the piece I want.
There must be much quicker ways to do this right? I found this answer that uses a split approach. I do not think I can use this kind of approach, as my start-delimiter varies from entry to entry. My data looks like this (this might look like JSON, but it is not guaranteed that the data will be in a clean JSON format. All that I know is that I have the relevant substring between "xN" and ","):
"x1" : "SUBSTRING_1",
...
"x6500" : "SUBSTRING_6500",
This is the complete code of the class:
class fileAccess{
public:
fileAccess(const char *path);
fileAccess(const std::string path);
bool overwrite(const std::string content); //will delete old file
bool overwriteBinary(std::vector<uint16_t> &a); //will delete old file
std::string read();
std::vector<uint16_t> readBinary();
std::string getSubStringWithinFile(std::string fromHere, std::string toHere);
private:
std::string m_filepath;
};
////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////
//////// ------------------- IMPLEMENTATION -------------------- ////////
////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////
fileAccess::fileAccess(const char *path)
{
m_filepath = path;
}
fileAccess::fileAccess(const std::string path)
{
m_filepath = path;
}
bool fileAccess::overwrite(const std::string content)
{
std::ofstream file;
file.open(m_filepath, std::ios::out);
file << content;
file.close();
return true;
}
bool fileAccess::overwriteBinary(std::vector<uint16_t> &a)
{
std::ofstream f(m_filepath, std::ios::binary);
if (!f.is_open()) {
std::cout << "[CmpCameraSetup] File could not be opened!\n";
return false;
}
uint16_t *ptr = a.data();
char *b = (char*) ptr;
size_t size = a.size() * sizeof(uint16_t);
f.write(b, size);
f.close();
if (f.bad()){
std::cout << "[CmpCameraSetup] Error when writing!" << std::endl;
return false;
}
return true;
}
std::string fileAccess::read()
{
//see https://stackoverflow.com/questions/2602013/read-whole-ascii-file-into-c-stdstring
std::ifstream t(m_filepath);
std::string str;
t.seekg(0, std::ios::end);
str.reserve(t.tellg());
t.seekg(0, std::ios::beg);
str.assign((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
return str;
}
std::vector<uint16_t> fileAccess::readBinary()
{
std::ifstream file(m_filepath, std::ios::binary);
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(file), {});
uint16_t *ptr = (uint16_t*) buffer.data();
std::vector<uint16_t> ret;
for (unsigned long int i = 0; i < buffer.size()/2; i++){
ret.push_back(ptr[i]);
}
file.close();
return ret;
}
std::string fileAccess::getSubStringWithinFile(std::string fromHere, std::string toHere)
{
std::string err = "ERROR! STRING NOT FOUND!! CHECK FILE";
std::string content = this->read();
size_t foundStart = content.find(fromHere);
if(foundStart == std::string::npos)
return err;
size_t foundEnd = content.find(toHere, foundStart);
if(foundEnd == std::string::npos)
return err;
return content.substr(foundStart + fromHere.length(), foundEnd - (foundStart + fromHere.length()));
}
On startup I am calling the getSubStringWithinFile()
method from within a for loop that goes from 1 to 6500 and passes the correct starting delimiter "x1"
, "x2"
...
1 Answer 1
fileAccess::fileAccess(const char *path)
{
m_filepath = path;
}
fileAccess::fileAccess(const std::string path)
{
m_filepath = path;
}
I prefer an initializer list constructor for such cases. No need to write it in a separate location.
You may use std::string_view
instead of const std::string
to indicate that we're only reading it. Also, that avoids the need for const char *path
overloading since string_view
accepts that too in its constructor.
The performance benefit of std::string_view
is negligible in the path since that is used just once in the program. Use it later in the code in place of string allocations. As long as something owns the string(that includes compile time constants), you don't need to create a new std::string
or a const std::string &
.
//will delete old file
Make such comments near the implementation for easy discovery. IDEs can also pick up documentation formatted comments like:
/**
* Deletes the old file.
*/
str.assign((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
The answer that you've taken this code from got a downvote from me due to the sheer performance penalty it incurs. Please see the answer by Jerry Coffin
std::ifstream t("file.txt");
std::stringstream buffer;
buffer << t.rdbuf();
If the file is huge, this approach is costly in terms of memory usage as well as time taken in memory allocations.
Read the file line by line or use asynchronous threads for reading and processing.
I won't duplicate what I've already written here:
I suggest you to give those answers a read.
-
1\$\begingroup\$ Thank you very much for your effort!
std::string_view
was completely new to me, thank you for teaching me about it! \$\endgroup\$MYZ– MYZ2020年08月04日 06:39:05 +00:00Commented Aug 4, 2020 at 6:39
myClass
class to be able to help with performance issues. Seeing the entire class gives us the context of what the function is doing. Code review is different from stackoverflow, we don't want the briefest question. Please see our guidelines on how to ask a good question. \$\endgroup\$getSubStringWithinFile
is called. You could try reading it once and passing a reference to it instead. \$\endgroup\$read()
function since the reason thethis
pointer is necessary is because you are using the name of a standard C library function. \$\endgroup\$