I am currently learning C++ and have learnt about inline
functions and how they can give a performance benefit.
How could this be improved?
#include <iostream>
#include <chrono>
#include <ctime>
long fibonacci(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
inline long fib(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
void timedFibonacci(unsigned n) {
// Time vars
std::chrono::time_point<std::chrono::system_clock> start, end;
// Start time.
start = std::chrono::system_clock::now();
//std::cout << "Functional Fibonacci of " << n << " = " << fibonacci(n) << '\n';
fibonacci(n);
// End time.
end = std::chrono::system_clock::now();
// Time elapsed.
std::chrono::duration<double> functionalTimeElapsed = end - start;
// Start time.
start = std::chrono::system_clock::now();
//std::cout << "Inline Fibonacci of " << n << " = " << fib(n) << '\n';
fib(n);
// End time.
end = std::chrono::system_clock::now();
// Time elapsed.
std::chrono::duration<double> inlineTimeElapsed = end - start;
if (functionalTimeElapsed > inlineTimeElapsed) {
std::cout << "n = " << n << "\tFunctional method was faster by " << ((functionalTimeElapsed - inlineTimeElapsed) / inlineTimeElapsed) * 100 << "%\n";
} else if (functionalTimeElapsed < inlineTimeElapsed) {
std::cout << "n = " << n << "\tInline method was faster by " << ((inlineTimeElapsed - functionalTimeElapsed) / functionalTimeElapsed) * 100 << "%\n";
} else {
std::cout << "n = " << n << "\tFunctional method and inline method took the same time.\n";
}
}
int main() {
for (int i = 5; i <=40 ; i++) {
timedFibonacci(i);
}
cin.get();
}
2 Answers 2
Inlining
First of all, your inline
function calls the non-inline function, so if the compiler is really paying attention to the inline
specification, it's only affecting a single "layer" of invocation.
inline long fib(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
To stand a chance of getting some good out of the inline
specification, you probably want to have this call itself instead of the other function:
inline long fib(unsigned n) {
if (n < 2) return n;
return fib(n - 1) + fib(n - 2);
}
In general, recursive functions are a problem for inlining in any case, so many compilers completely disable inline code generation for recursive functions.
Add in the fact that most reasonably current compilers basically ignore the inline
specifier anyway, and you get a high likelihood that you won't see any difference between these functions at all.
Timing
I'd do the timing somewhat differently. The single responsibility principle applies here, just like everything else. That means the timer should deal only with timing. I usually use something on this general order:
template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
using namespace std::chrono;
auto start = high_resolution_clock::now();
auto holder = f(std::forward<Args>(args)...);
auto stop = high_resolution_clock::now();
std::cout << label << " time: " << duration_cast<microseconds>(stop - start).count() << "\n";
return holder;
}
Arguably this already does more than it should (printing out results in addition to the actual timing) but at least it's quite a bit closer to a single responsibility.
With that, we invoke each Fibonacci generator separately, something on this order:
auto foo = timer(fib, "inline", max);
The result will then look something like this:
inline time: 1062
Just a minor note: this is a somewhat more general timing function that most people usually need. In particular, it doesn't require the function being timed to take a specific number of arguments--the number and types of arguments you pass after the label
have to match those needed by the function you're timing, but as far as the timer itself cares, it's essentially wide open.
Efficiency
Ignoring, for the moment, the issue of inlining (which is probably pretty much a red herring anyway), I'd consider other ways of computing Fibonacci numbers. In the absence of memoization, a recursive function is horribly inefficient. An iterative function is many times faster. That can be written with a for
loop, something on this general order:
unsigned long long fib(unsigned limit) {
unsigned long long a = 1;
unsigned long long b = 1;
unsigned long long c;
if (limit < 2)
return 1;
for (auto i=1ULL; i<limit; i++) {
c = a + b;
a = b;
b = c;
}
return c;
}
This is somewhat longer, but the difference in speed is...fairly substantial. Computing fib(43) with each, I get timings like this:
out of line time: 2807407170
inline time: 2794785991
iterative time: 321
Sum (ignore): 1568397607
I had to switch to showing the time in nanoseconds to get a non-zero time for the iterative solution. Even so, for fib(43) and highest resolution-timing I can get, it still shows a time of 0
about one run out of every 3 or 4. The recursive versions are approximately 7 orders of magnitude slower (and although inline code generation did seem to help a tiny bit, it's still minuscule compared to the improvement from a better algorithm. Oh, and for what it's worth, it looks like the inline code generation really did make a difference. It is pretty small, but at least in my testing, the inline version does finish just a tiny bit faster every time.
Final Code
For what it's worth, here's the code I ran to get the timing comparison:
#include <iostream>
#include <string>
#include <chrono>
template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
using namespace std::chrono;
auto start = high_resolution_clock::now();
auto holder = f(std::forward<Args>(args)...);
auto stop = high_resolution_clock::now();
std::cout << label << " time: " << duration_cast<nanoseconds>(stop - start).count() << "\n";
return holder;
}
long fibonacci(unsigned n) {
if (n < 2) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
inline long fib(unsigned n) {
if (n < 2) return n;
return fib(n - 1) + fib(n - 2);
}
unsigned long long iter_fib(unsigned limit) {
unsigned long long a = 1;
unsigned long long b = 1;
unsigned long long c;
if (limit < 2)
return 1;
for (auto i = 1ULL; i < limit; i++) {
c = a + b;
a = b;
b = c;
}
return c;
}
int main() {
static const int max = 43;
auto a = timer(fibonacci, "out of line", max);
auto b = timer(fib, " inline", max);
auto c = timer(iter_fib, " iterative", max);
std::cout << "Sum (ignore): " << a + b + c;
}
-
\$\begingroup\$ In your final main, shouldn't
max
beconstexpr
instead ofstatic const
? \$\endgroup\$Yk Cheese– Yk Cheese2016年10月08日 08:58:17 +00:00Commented Oct 8, 2016 at 8:58 -
\$\begingroup\$ @YkCheese Why should it be? Its value is not needed in any constant expressions. If you're thinking about optimisation, then
constexpr
is no panacaea: a non-constexpr
variable can be inlined as a literal in expressions, and conversely, aconstexpr
variable will occupy storage and require loading if it's used via a pointer/reference. As a bonus, in the special case ofmain
, probably thestatic
is redundant since that function can't be called more than once. \$\endgroup\$underscore_d– underscore_d2016年10月08日 11:53:33 +00:00Commented Oct 8, 2016 at 11:53 -
\$\begingroup\$ Ok, I think ils time i get more documented on constexpr... \$\endgroup\$Yk Cheese– Yk Cheese2016年10月08日 12:27:06 +00:00Commented Oct 8, 2016 at 12:27
Read this question regarding the proper use of
inline
. If you get very similar results between the two Fibonacci functions, this may explain it.Instead of abbreviating one function to
fib()
, name itinline_fibonacci()
or something similar. It'll at least make the function calls less-confusing.timedFibonacci()
seems to be doing a little too much. Consider returning the final times and havingmain()
print them.If you won't need those commented-out lines, then remove them. Don't clutter the code.
Since the times are in
double
, it's not very likely that you'll get the exact same times.
Explore related questions
See similar questions with these tags.
inline
specifier is a hint; It does not force the function to be inlined. Compilers are likely to ignore the specifier, as they've become more adept at detecting which pieces of code to inline.inline
might as well be used for member definitions in headers. \$\endgroup\$inline
keyword. This is because humans are really really really bad at deciding what needs to be inlined. The compiler heuristics on determining if a function should be inlined are much better so this is what is used. I would highly discourage any programmer that thinks they can do better than the compiler (and uses the compiler specific commands to force inlining); because even if you are correct with a function in its current state any modification to the code may change that result. \$\endgroup\$inline
is NOT about asking for the function to be inlined, it's about specifying that the definition is inlined so that the toolchain allows the definition to be present in multiple translation units. \$\endgroup\$