8
\$\begingroup\$

I recently posted an answer to this SO question as I did not think any of the current answers were satisfactory (most don't even answer the question!). I was wondering if anyone could offer any improvements on my solution to this problem? I'm most interested in performance here. Also note this algorithm does not require any additional copies of the input string (if passed an r-value), ideally I'd like to keep this property.

#include <string>
#include <cstdint> // std::size_t
#include <cmath> // std::floor, std::log2, std::pow
#include <iterator> // std::cbegin, std::cend, std::next
std::string repeatstring(std::string str, std::size_t n)
{
 if (n == 0) {
 str.clear();
 str.shrink_to_fit();
 return str;
 }
 if (n == 1 || str.empty()) return str;
 const auto repeat_size = str.size();
 if (repeat_size == 1) {
 str.append(n - 1, str.front());
 return str;
 }
 str.reserve(repeat_size * n);
 const auto m = static_cast<decltype(n)>(std::floor(std::log2(n)));
 for (auto i = m; i > 0; --i) str += str;
 n -= static_cast<decltype(n)>(std::pow(2, m));
 str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));
 return str;
}

Here is an example:

#include <iostream>
#include <chrono>
int main()
{
 auto start = std::chrono::system_clock::now();
 auto str = repeatstring("helloworld", 190238011);
 auto end = std::chrono::system_clock::now();
 //std::cout << str << std::endl;
 std::cout << "string length: " << str.length() << std::endl;
 auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
 std::cout << "time: " << duration.count() << "ms" << std::endl;
 return 0;
}
asked Dec 17, 2015 at 15:36
\$\endgroup\$
1
  • \$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . \$\endgroup\$ Commented Dec 17, 2015 at 19:26

3 Answers 3

7
\$\begingroup\$

Input as reference-to-const

Change your signature to:

std::string repeat_string(std::string const& str, std::size_t N);

First, this avoids an unnecessary copy when users don't pass their strings in by rvalue (which seems more typical?). Secondly, this allows for copy elision on the returned object, as you can't have copy elision from a function parameter. So if I called repeat_string with an lvalue, I just saved two copies.

Also, note the name change. Either use camelCase or snake_case or function names, don't just jamlotsofwordstogether. It's much harder to read.

The new signature makes some of the other code easier.

if (n == 0) {
 return {};
}
...
if (str.size() == 1) {
 return std::string(n str[0]);
}

repeat_size

This is a weird variable at best. Once you make str a reference to const, you can just access str.size() which makes much more sense. repeat_size isn't really the size of the repeat - that's what n is...

Powers of 2

std::log2 and std::pow aren't cheap, and you need neither. For the former, you can just multiply by two until you're done:

std::string result = str;
result.reserve(str.size() * n);
std::size_t m = 2;
for (; m <= n; m *= 2)
{
 result += result;
}

Now m is the first power of 2 larger than n. So you can replace

n -= static_cast<decltype(n)>(std::pow(2, m));

with:

n -= m/2;

And:

str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));

is a really complicated way of writing what now becomes:

result.append(result.c_str(), n * str.size());
answered Dec 17, 2015 at 16:04
\$\endgroup\$
6
  • \$\begingroup\$ I'm not sure I agree with all these points. You only save 1 copy in the case that n == 0 when passing by const reference. Correct me if I'm wrong, but str is guaranteed to be moved out of the function (if RVO is not applied). I think the benefit of not forcing a copy when n != 0 is preferable. \$\endgroup\$ Commented Dec 17, 2015 at 16:16
  • \$\begingroup\$ @Daniel Did you time it? You're looping anyway. \$\endgroup\$ Commented Dec 17, 2015 at 16:20
  • \$\begingroup\$ no, you're probably right actually, but little point trying to measure the difference given the runtime of this function will be completely dominated by the copying. \$\endgroup\$ Commented Dec 17, 2015 at 16:22
  • \$\begingroup\$ Can I ask where you have seen that copy elision isn't allowed on function parameters? I was just reading this SO answer and the author mentions pass by value as a common form of copy elision. \$\endgroup\$ Commented Dec 17, 2015 at 22:33
  • \$\begingroup\$ @Daniel It's the copy into the function that's elided, not the copy out of. \$\endgroup\$ Commented Dec 17, 2015 at 22:52
5
\$\begingroup\$
n -= static_cast<decltype(n)>(std::pow(2, m));

can be replaced with

n -= 1 << m;

<< m essentially means multiply by 2 to the power of m

Similarly you can get m by finding the highest set bit. For which most compilers will provide an intrinsic which will do this much faster than it could provide the log2 of the number.

answered Dec 17, 2015 at 15:50
\$\endgroup\$
1
  • \$\begingroup\$ I think you actually want to shift a std::size_t value of 1 rather than a (signed) int there. \$\endgroup\$ Commented Mar 11, 2019 at 10:25
0
\$\begingroup\$

There's very little benefit in writing decltype(n) when we know that type is std::size_t - just write the known type. decltype has its place, but that's normally in templates or where the type is difficult or impossible to write (usually when deduced using auto).

Of course, that's irrelevant once you apply the suggested improvement that removes the need for <cmath> and the casts.

answered Mar 11, 2019 at 10:33
\$\endgroup\$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.