Repeating a std::string n times

Question 1

I recently posted an answer to this SO question as I did not think any of the current answers were satisfactory (most don't even answer the question!). I was wondering if anyone could offer any improvements on my solution to this problem? I'm most interested in performance here. Also note this algorithm does not require any additional copies of the input string (if passed an r-value), ideally I'd like to keep this property.

#include <string>
#include <cstdint> // std::size_t
#include <cmath> // std::floor, std::log2, std::pow
#include <iterator> // std::cbegin, std::cend, std::next
std::string repeatstring(std::string str, std::size_t n)
{
 if (n == 0) {
 str.clear();
 str.shrink_to_fit();
 return str;
 }
 if (n == 1 || str.empty()) return str;
 const auto repeat_size = str.size();
 if (repeat_size == 1) {
 str.append(n - 1, str.front());
 return str;
 }
 str.reserve(repeat_size * n);
 const auto m = static_cast<decltype(n)>(std::floor(std::log2(n)));
 for (auto i = m; i > 0; --i) str += str;
 n -= static_cast<decltype(n)>(std::pow(2, m));
 str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));
 return str;
}

Here is an example:

#include <iostream>
#include <chrono>
int main()
{
 auto start = std::chrono::system_clock::now();
 auto str = repeatstring("helloworld", 190238011);
 auto end = std::chrono::system_clock::now();
 //std::cout << str << std::endl;
 std::cout << "string length: " << str.length() << std::endl;
 auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
 std::cout << "time: " << duration.count() << "ms" << std::endl;
 return 0;
}

Question 2

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers .

Question 3

Input as reference-to-const

Change your signature to:

std::string repeat_string(std::string const& str, std::size_t N);

First, this avoids an unnecessary copy when users don't pass their strings in by rvalue (which seems more typical?). Secondly, this allows for copy elision on the returned object, as you can't have copy elision from a function parameter. So if I called repeat_string with an lvalue, I just saved two copies.

Also, note the name change. Either use camelCase or snake_case or function names, don't just jamlotsofwordstogether. It's much harder to read.

The new signature makes some of the other code easier.

if (n == 0) {
 return {};
}
...
if (str.size() == 1) {
 return std::string(n str[0]);
}

repeat_size

This is a weird variable at best. Once you make str a reference to const, you can just access str.size() which makes much more sense. repeat_size isn't really the size of the repeat - that's what n is...

Powers of 2

std::log2 and std::pow aren't cheap, and you need neither. For the former, you can just multiply by two until you're done:

std::string result = str;
result.reserve(str.size() * n);
std::size_t m = 2;
for (; m <= n; m *= 2)
{
 result += result;
}

Now m is the first power of 2 larger than n. So you can replace

n -= static_cast<decltype(n)>(std::pow(2, m));

with:

n -= m/2;

And:

str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));

is a really complicated way of writing what now becomes:

result.append(result.c_str(), n * str.size());

Question 4

I'm not sure I agree with all these points. You only save 1 copy in the case that n == 0 when passing by const reference. Correct me if I'm wrong, but str is guaranteed to be moved out of the function (if RVO is not applied). I think the benefit of not forcing a copy when n != 0 is preferable.

Question 5

@Daniel Did you time it? You're looping anyway.

Question 6

no, you're probably right actually, but little point trying to measure the difference given the runtime of this function will be completely dominated by the copying.

Question 7

Can I ask where you have seen that copy elision isn't allowed on function parameters? I was just reading this SO answer and the author mentions pass by value as a common form of copy elision.

Question 8

@Daniel It's the copy into the function that's elided, not the copy out of.

Question 9

n -= static_cast<decltype(n)>(std::pow(2, m));

can be replaced with

n -= 1 << m;

<< m essentially means multiply by 2 to the power of m

Similarly you can get m by finding the highest set bit. For which most compilers will provide an intrinsic which will do this much faster than it could provide the log2 of the number.

Question 10

I think you actually want to shift a std::size_t value of 1 rather than a (signed) int there.

Question 11

There's very little benefit in writing decltype(n) when we know that type is std::size_t - just write the known type. decltype has its place, but that's normally in templates or where the type is difficult or impossible to write (usually when deduced using auto).

Of course, that's irrelevant once you apply the suggested improvement that removes the need for <cmath> and the casts.

Barry Barry 18.5k1 gold badge40 silver badges92 bronze badges · Answer 1 · 2015-12-17 16:04:19Z

Input as reference-to-const

Change your signature to:

std::string repeat_string(std::string const& str, std::size_t N);

First, this avoids an unnecessary copy when users don't pass their strings in by rvalue (which seems more typical?). Secondly, this allows for copy elision on the returned object, as you can't have copy elision from a function parameter. So if I called repeat_string with an lvalue, I just saved two copies.

Also, note the name change. Either use camelCase or snake_case or function names, don't just jamlotsofwordstogether. It's much harder to read.

The new signature makes some of the other code easier.

if (n == 0) {
 return {};
}
...
if (str.size() == 1) {
 return std::string(n str[0]);
}

repeat_size

This is a weird variable at best. Once you make str a reference to const, you can just access str.size() which makes much more sense. repeat_size isn't really the size of the repeat - that's what n is...

Powers of 2

std::log2 and std::pow aren't cheap, and you need neither. For the former, you can just multiply by two until you're done:

std::string result = str;
result.reserve(str.size() * n);
std::size_t m = 2;
for (; m <= n; m *= 2)
{
 result += result;
}

Now m is the first power of 2 larger than n. So you can replace

n -= static_cast<decltype(n)>(std::pow(2, m));

with:

n -= m/2;

And:

str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));

is a really complicated way of writing what now becomes:

result.append(result.c_str(), n * str.size());

I'm not sure I agree with all these points. You only save 1 copy in the case that n == 0 when passing by const reference. Correct me if I'm wrong, but str is guaranteed to be moved out of the function (if RVO is not applied). I think the benefit of not forcing a copy when n != 0 is preferable.
no, you're probably right actually, but little point trying to measure the difference given the runtime of this function will be completely dominated by the copying.
Can I ask where you have seen that copy elision isn't allowed on function parameters? I was just reading this SO answer and the author mentions pass by value as a common form of copy elision.
@Daniel It's the copy into the function that's elided, not the copy out of.

ratchet freak ratchet freak 13k22 silver badges46 bronze badges · Answer 2 · 2015-12-17 15:50:51Z

n -= static_cast<decltype(n)>(std::pow(2, m));

can be replaced with

n -= 1 << m;

<< m essentially means multiply by 2 to the power of m

Similarly you can get m by finding the highest set bit. For which most compilers will provide an intrinsic which will do this much faster than it could provide the log2 of the number.

I think you actually want to shift a std::size_t value of 1 rather than a (signed) int there.

Toby Speight Toby Speight 87.2k14 gold badges104 silver badges322 bronze badges · Answer 3 · 2019-03-11 10:33:41Z

There's very little benefit in writing decltype(n) when we know that type is std::size_t - just write the known type. decltype has its place, but that's normally in templates or where the type is difficult or impossible to write (usually when deduced using auto).

Of course, that's irrelevant once you apply the suggested improvement that removes the need for <cmath> and the casts.

Stack Exchange Network

Repeating a std::string n times

3 Answers 3

Hot Network Questions

Repeating a std::string n times

3 Answers 3

Related

Hot Network Questions