I recently posted an answer to this SO question as I did not think any of the current answers were satisfactory (most don't even answer the question!). I was wondering if anyone could offer any improvements on my solution to this problem? I'm most interested in performance here. Also note this algorithm does not require any additional copies of the input string (if passed an r-value), ideally I'd like to keep this property.
#include <string>
#include <cstdint> // std::size_t
#include <cmath> // std::floor, std::log2, std::pow
#include <iterator> // std::cbegin, std::cend, std::next
std::string repeatstring(std::string str, std::size_t n)
{
if (n == 0) {
str.clear();
str.shrink_to_fit();
return str;
}
if (n == 1 || str.empty()) return str;
const auto repeat_size = str.size();
if (repeat_size == 1) {
str.append(n - 1, str.front());
return str;
}
str.reserve(repeat_size * n);
const auto m = static_cast<decltype(n)>(std::floor(std::log2(n)));
for (auto i = m; i > 0; --i) str += str;
n -= static_cast<decltype(n)>(std::pow(2, m));
str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));
return str;
}
Here is an example:
#include <iostream>
#include <chrono>
int main()
{
auto start = std::chrono::system_clock::now();
auto str = repeatstring("helloworld", 190238011);
auto end = std::chrono::system_clock::now();
//std::cout << str << std::endl;
std::cout << "string length: " << str.length() << std::endl;
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << "time: " << duration.count() << "ms" << std::endl;
return 0;
}
-
\$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . \$\endgroup\$TheCoffeeCup– TheCoffeeCup2015年12月17日 19:26:28 +00:00Commented Dec 17, 2015 at 19:26
3 Answers 3
Input as reference-to-const
Change your signature to:
std::string repeat_string(std::string const& str, std::size_t N);
First, this avoids an unnecessary copy when users don't pass their strings in by rvalue (which seems more typical?). Secondly, this allows for copy elision on the returned object, as you can't have copy elision from a function parameter. So if I called repeat_string
with an lvalue, I just saved two copies.
Also, note the name change. Either use camelCase
or snake_case
or function names, don't just jamlotsofwordstogether
. It's much harder to read.
The new signature makes some of the other code easier.
if (n == 0) {
return {};
}
...
if (str.size() == 1) {
return std::string(n str[0]);
}
repeat_size
This is a weird variable at best. Once you make str
a reference to const, you can just access str.size()
which makes much more sense. repeat_size
isn't really the size of the repeat - that's what n
is...
Powers of 2
std::log2
and std::pow
aren't cheap, and you need neither. For the former, you can just multiply by two until you're done:
std::string result = str;
result.reserve(str.size() * n);
std::size_t m = 2;
for (; m <= n; m *= 2)
{
result += result;
}
Now m
is the first power of 2 larger than n
. So you can replace
n -= static_cast<decltype(n)>(std::pow(2, m));
with:
n -= m/2;
And:
str.append(std::cbegin(str), std::next(std::cbegin(str), n * repeat_size));
is a really complicated way of writing what now becomes:
result.append(result.c_str(), n * str.size());
-
\$\begingroup\$ I'm not sure I agree with all these points. You only save 1 copy in the case that
n == 0
when passing by const reference. Correct me if I'm wrong, butstr
is guaranteed to be moved out of the function (if RVO is not applied). I think the benefit of not forcing a copy whenn != 0
is preferable. \$\endgroup\$Daniel– Daniel2015年12月17日 16:16:32 +00:00Commented Dec 17, 2015 at 16:16 -
\$\begingroup\$ @Daniel Did you time it? You're looping anyway. \$\endgroup\$Barry– Barry2015年12月17日 16:20:02 +00:00Commented Dec 17, 2015 at 16:20
-
\$\begingroup\$ no, you're probably right actually, but little point trying to measure the difference given the runtime of this function will be completely dominated by the copying. \$\endgroup\$Daniel– Daniel2015年12月17日 16:22:01 +00:00Commented Dec 17, 2015 at 16:22
-
\$\begingroup\$ Can I ask where you have seen that copy elision isn't allowed on function parameters? I was just reading this SO answer and the author mentions pass by value as a common form of copy elision. \$\endgroup\$Daniel– Daniel2015年12月17日 22:33:28 +00:00Commented Dec 17, 2015 at 22:33
-
\$\begingroup\$ @Daniel It's the copy into the function that's elided, not the copy out of. \$\endgroup\$Barry– Barry2015年12月17日 22:52:44 +00:00Commented Dec 17, 2015 at 22:52
n -= static_cast<decltype(n)>(std::pow(2, m));
can be replaced with
n -= 1 << m;
<< m
essentially means multiply by 2 to the power of m
Similarly you can get m
by finding the highest set bit. For which most compilers will provide an intrinsic which will do this much faster than it could provide the log2 of the number.
-
\$\begingroup\$ I think you actually want to shift a
std::size_t
value of1
rather than a (signed)int
there. \$\endgroup\$Toby Speight– Toby Speight2019年03月11日 10:25:43 +00:00Commented Mar 11, 2019 at 10:25
There's very little benefit in writing decltype(n)
when we know that type is std::size_t
- just write the known type. decltype
has its place, but that's normally in templates or where the type is difficult or impossible to write (usually when deduced using auto
).
Of course, that's irrelevant once you apply the suggested improvement that removes the need for <cmath>
and the casts.