5
\$\begingroup\$

I'd like to get some expert's view on this first piece of code especially in the area of

  • performance - (e.g. avoidable copies)
  • good practice (e.g. static members?)

The final goal is the provide some abstractions for text manipulation like cleansing, stop-word removal and tokenization. The code base should grow substantially in the future so a lot more string manipulation methods should be supported. I understand that this should be bundled in a library in the end, however, I'd like to get feedback on the general code style first.

#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
class TextManipulator {
 public:
 static std::string replace(std::string &string, const std::string pattern,
 const std::string toreplace) {
 return std::regex_replace(string, std::regex(pattern), toreplace);
 }
 static std::string toLowerCase(std::string &string) {
 std::locale loc;
 std::string res;
 for (auto elem : string) {
 res += std::tolower(elem, loc);
 }
 return res;
 }
 static std::vector<std::string> split(const std::string &text,
 char seperator) {
 std::vector<std::string> tokens;
 std::size_t start = 0, end = 0;
 while ((end = text.find(seperator, start)) != std::string::npos) {
 std::string extr = text.substr(start, end - start);
 if (!extr.empty()) {
 tokens.push_back(extr);
 }
 start = end + 1;
 }
 tokens.push_back(text.substr(start));
 return tokens;
 }
};
int main() {
 std::ifstream input(
 "/example_docs/Pride_and_Prejudice.txt");
 std::stringstream buffer;
 buffer << input.rdbuf();
 std::string text = buffer.str();
 text = TextManipulator::replace(text, "\r\n", " ");
 text = TextManipulator::replace(text, ",", "");
 text = TextManipulator::replace(text, ":", "");
 text = TextManipulator::replace(text, ".", "");
 text = TextManipulator::toLowerCase(text);
 std::vector<std::string> tokens = TextManipulator::split(text, ' ');
 return 0;
}
asked Nov 2, 2020 at 21:11
\$\endgroup\$

2 Answers 2

7
\$\begingroup\$

Don't use a class

There is no need to create a class just to hold a collection of static functions. Instead, just declare the functions in the global namespace. If you want to avoid polluting the global namespace, consider putting them in a namespace instead, like so:

namespace TextManipulation {
std::string replace(...) {
 ...
};
...
}

Don't write unnecessary trivial wrapper functions

There is no need to create TextManipulator::replace(), when all it does is call std::regex_replace with almost exactly the same arguments. Why not call std::regex_replace() directly instead?

The name replace() also hides the fact that pattern is a regular expression.

Consider renaming toLowerCase() to tolower()

It's a bit annoying to have different naming conventions in one program, and since you already have tolower() for single characters, it would be very convenient to reuse that for whole strings as well. So:

std::string tolower(const std::string &str) {
 ...
}

Note: you forget to add const to the function parameter here, as well as for the first parameter of replace(), but you did it correctly for the other string parameters.

answered Nov 2, 2020 at 21:28
\$\endgroup\$
-3
\$\begingroup\$

Whenever possible, use boost::string_ref in your code, this will reduce the number of copy operations with std::string. Check the places where you have copy operations and replace them with boost::string_ref operations if possible.

answered Nov 2, 2020 at 22:30
\$\endgroup\$
2
  • 1
    \$\begingroup\$ There's string_view in c++17 \$\endgroup\$ Commented Nov 3, 2020 at 9:10
  • \$\begingroup\$ Yes you are right if you are in standard 17 or higher \$\endgroup\$ Commented Nov 3, 2020 at 9:52

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.