Benchmarking inline vs normal functions in C++

Question 1

I am currently learning C++ and have learnt about inline functions and how they can give a performance benefit.

How could this be improved?

#include <iostream>
#include <chrono>
#include <ctime>
long fibonacci(unsigned n) {
 if (n < 2) return n;
 return fibonacci(n - 1) + fibonacci(n - 2);
}
inline long fib(unsigned n) {
 if (n < 2) return n;
 return fibonacci(n - 1) + fibonacci(n - 2);
}
void timedFibonacci(unsigned n) {
 // Time vars
 std::chrono::time_point<std::chrono::system_clock> start, end;
 // Start time.
 start = std::chrono::system_clock::now();
 //std::cout << "Functional Fibonacci of " << n << " = " << fibonacci(n) << '\n';
 fibonacci(n);
 // End time.
 end = std::chrono::system_clock::now();
 // Time elapsed.
 std::chrono::duration<double> functionalTimeElapsed = end - start;
 // Start time.
 start = std::chrono::system_clock::now();
 //std::cout << "Inline Fibonacci of " << n << " = " << fib(n) << '\n';
 fib(n);
 // End time.
 end = std::chrono::system_clock::now();
 // Time elapsed.
 std::chrono::duration<double> inlineTimeElapsed = end - start;
 if (functionalTimeElapsed > inlineTimeElapsed) {
 std::cout << "n = " << n << "\tFunctional method was faster by " << ((functionalTimeElapsed - inlineTimeElapsed) / inlineTimeElapsed) * 100 << "%\n";
 } else if (functionalTimeElapsed < inlineTimeElapsed) {
 std::cout << "n = " << n << "\tInline method was faster by " << ((inlineTimeElapsed - functionalTimeElapsed) / functionalTimeElapsed) * 100 << "%\n"; 
 } else {
 std::cout << "n = " << n << "\tFunctional method and inline method took the same time.\n";
 }
}
int main() {
 for (int i = 5; i <=40 ; i++) {
 timedFibonacci(i);
 }
 cin.get();
}

Question 2

Please note that the inline specifier is a hint; It does not force the function to be inlined. Compilers are likely to ignore the specifier, as they've become more adept at detecting which pieces of code to inline. inline might as well be used for member definitions in headers.

Question 3

All modern compilers ignore the programmers usage of the inline keyword. This is because humans are really really really bad at deciding what needs to be inlined. The compiler heuristics on determining if a function should be inlined are much better so this is what is used. I would highly discourage any programmer that thinks they can do better than the compiler (and uses the compiler specific commands to force inlining); because even if you are correct with a function in its current state any modification to the code may change that result.

Question 4

Actually, inline is NOT about asking for the function to be inlined, it's about specifying that the definition is inlined so that the toolchain allows the definition to be present in multiple translation units.

Question 5

Inlining

First of all, your inline function calls the non-inline function, so if the compiler is really paying attention to the inline specification, it's only affecting a single "layer" of invocation.

inline long fib(unsigned n) {
 if (n < 2) return n;
 return fibonacci(n - 1) + fibonacci(n - 2);
}

To stand a chance of getting some good out of the inline specification, you probably want to have this call itself instead of the other function:

inline long fib(unsigned n) {
 if (n < 2) return n;
 return fib(n - 1) + fib(n - 2);
}

In general, recursive functions are a problem for inlining in any case, so many compilers completely disable inline code generation for recursive functions.

Add in the fact that most reasonably current compilers basically ignore the inline specifier anyway, and you get a high likelihood that you won't see any difference between these functions at all.

Timing

I'd do the timing somewhat differently. The single responsibility principle applies here, just like everything else. That means the timer should deal only with timing. I usually use something on this general order:

template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
 using namespace std::chrono;
 auto start = high_resolution_clock::now();
 auto holder = f(std::forward<Args>(args)...);
 auto stop = high_resolution_clock::now();
 std::cout << label << " time: " << duration_cast<microseconds>(stop - start).count() << "\n";
 return holder;
}

Arguably this already does more than it should (printing out results in addition to the actual timing) but at least it's quite a bit closer to a single responsibility.

With that, we invoke each Fibonacci generator separately, something on this order:

auto foo = timer(fib, "inline", max);

The result will then look something like this:

inline time: 1062

Just a minor note: this is a somewhat more general timing function that most people usually need. In particular, it doesn't require the function being timed to take a specific number of arguments--the number and types of arguments you pass after the label have to match those needed by the function you're timing, but as far as the timer itself cares, it's essentially wide open.

Efficiency

Ignoring, for the moment, the issue of inlining (which is probably pretty much a red herring anyway), I'd consider other ways of computing Fibonacci numbers. In the absence of memoization, a recursive function is horribly inefficient. An iterative function is many times faster. That can be written with a for loop, something on this general order:

unsigned long long fib(unsigned limit) { 
 unsigned long long a = 1;
 unsigned long long b = 1;
 unsigned long long c;
 if (limit < 2)
 return 1;
 for (auto i=1ULL; i<limit; i++) {
 c = a + b;
 a = b;
 b = c;
 }
 return c; 
}

This is somewhat longer, but the difference in speed is...fairly substantial. Computing fib(43) with each, I get timings like this:

out of line time: 2807407170
 inline time: 2794785991
 iterative time: 321
Sum (ignore): 1568397607

I had to switch to showing the time in nanoseconds to get a non-zero time for the iterative solution. Even so, for fib(43) and highest resolution-timing I can get, it still shows a time of 0 about one run out of every 3 or 4. The recursive versions are approximately 7 orders of magnitude slower (and although inline code generation did seem to help a tiny bit, it's still minuscule compared to the improvement from a better algorithm. Oh, and for what it's worth, it looks like the inline code generation really did make a difference. It is pretty small, but at least in my testing, the inline version does finish just a tiny bit faster every time.

Final Code

For what it's worth, here's the code I ran to get the timing comparison:

#include <iostream>
#include <string>
#include <chrono>
template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
 using namespace std::chrono;
 auto start = high_resolution_clock::now();
 auto holder = f(std::forward<Args>(args)...);
 auto stop = high_resolution_clock::now();
 std::cout << label << " time: " << duration_cast<nanoseconds>(stop - start).count() << "\n";
 return holder;
}
long fibonacci(unsigned n) {
 if (n < 2) return n;
 return fibonacci(n - 1) + fibonacci(n - 2);
}
inline long fib(unsigned n) {
 if (n < 2) return n;
 return fib(n - 1) + fib(n - 2);
}
unsigned long long iter_fib(unsigned limit) {
 unsigned long long a = 1;
 unsigned long long b = 1;
 unsigned long long c;
 if (limit < 2)
 return 1;
 for (auto i = 1ULL; i < limit; i++) {
 c = a + b;
 a = b;
 b = c;
 }
 return c;
}
int main() {
 static const int max = 43;
 auto a = timer(fibonacci, "out of line", max);
 auto b = timer(fib, " inline", max);
 auto c = timer(iter_fib, " iterative", max);
 std::cout << "Sum (ignore): " << a + b + c;
}

Question 6

In your final main, shouldn't max be constexpr instead of static const ?

Question 7

@YkCheese Why should it be? Its value is not needed in any constant expressions. If you're thinking about optimisation, then constexpr is no panacaea: a non-constexpr variable can be inlined as a literal in expressions, and conversely, a constexpr variable will occupy storage and require loading if it's used via a pointer/reference. As a bonus, in the special case of main, probably the static is redundant since that function can't be called more than once.

Question 8

Ok, I think ils time i get more documented on constexpr...

Question 9

Read this question regarding the proper use of inline. If you get very similar results between the two Fibonacci functions, this may explain it.
Instead of abbreviating one function to fib(), name it inline_fibonacci() or something similar. It'll at least make the function calls less-confusing.
timedFibonacci() seems to be doing a little too much. Consider returning the final times and having main() print them.
If you won't need those commented-out lines, then remove them. Don't clutter the code.
Since the times are in double, it's not very likely that you'll get the exact same times.

Jerry Coffin Jerry Coffin 34.1k4 gold badges77 silver badges145 bronze badges · Accepted Answer · 2016-10-08 05:42:38Z

Inlining

First of all, your inline function calls the non-inline function, so if the compiler is really paying attention to the inline specification, it's only affecting a single "layer" of invocation.

inline long fib(unsigned n) {
 if (n < 2) return n;
 return fibonacci(n - 1) + fibonacci(n - 2);
}

To stand a chance of getting some good out of the inline specification, you probably want to have this call itself instead of the other function:

inline long fib(unsigned n) {
 if (n < 2) return n;
 return fib(n - 1) + fib(n - 2);
}

In general, recursive functions are a problem for inlining in any case, so many compilers completely disable inline code generation for recursive functions.

Add in the fact that most reasonably current compilers basically ignore the inline specifier anyway, and you get a high likelihood that you won't see any difference between these functions at all.

Timing

I'd do the timing somewhat differently. The single responsibility principle applies here, just like everything else. That means the timer should deal only with timing. I usually use something on this general order:

template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
 using namespace std::chrono;
 auto start = high_resolution_clock::now();
 auto holder = f(std::forward<Args>(args)...);
 auto stop = high_resolution_clock::now();
 std::cout << label << " time: " << duration_cast<microseconds>(stop - start).count() << "\n";
 return holder;
}

Arguably this already does more than it should (printing out results in addition to the actual timing) but at least it's quite a bit closer to a single responsibility.

With that, we invoke each Fibonacci generator separately, something on this order:

auto foo = timer(fib, "inline", max);

The result will then look something like this:

inline time: 1062

Just a minor note: this is a somewhat more general timing function that most people usually need. In particular, it doesn't require the function being timed to take a specific number of arguments--the number and types of arguments you pass after the label have to match those needed by the function you're timing, but as far as the timer itself cares, it's essentially wide open.

Efficiency

Ignoring, for the moment, the issue of inlining (which is probably pretty much a red herring anyway), I'd consider other ways of computing Fibonacci numbers. In the absence of memoization, a recursive function is horribly inefficient. An iterative function is many times faster. That can be written with a for loop, something on this general order:

unsigned long long fib(unsigned limit) { 
 unsigned long long a = 1;
 unsigned long long b = 1;
 unsigned long long c;
 if (limit < 2)
 return 1;
 for (auto i=1ULL; i<limit; i++) {
 c = a + b;
 a = b;
 b = c;
 }
 return c; 
}

This is somewhat longer, but the difference in speed is...fairly substantial. Computing fib(43) with each, I get timings like this:

out of line time: 2807407170
 inline time: 2794785991
 iterative time: 321
Sum (ignore): 1568397607

I had to switch to showing the time in nanoseconds to get a non-zero time for the iterative solution. Even so, for fib(43) and highest resolution-timing I can get, it still shows a time of 0 about one run out of every 3 or 4. The recursive versions are approximately 7 orders of magnitude slower (and although inline code generation did seem to help a tiny bit, it's still minuscule compared to the improvement from a better algorithm. Oh, and for what it's worth, it looks like the inline code generation really did make a difference. It is pretty small, but at least in my testing, the inline version does finish just a tiny bit faster every time.

Final Code

For what it's worth, here's the code I ran to get the timing comparison:

#include <iostream>
#include <string>
#include <chrono>
template <typename F, typename ...Args>
auto timer(F f, std::string const &label, Args && ...args) {
 using namespace std::chrono;
 auto start = high_resolution_clock::now();
 auto holder = f(std::forward<Args>(args)...);
 auto stop = high_resolution_clock::now();
 std::cout << label << " time: " << duration_cast<nanoseconds>(stop - start).count() << "\n";
 return holder;
}
long fibonacci(unsigned n) {
 if (n < 2) return n;
 return fibonacci(n - 1) + fibonacci(n - 2);
}
inline long fib(unsigned n) {
 if (n < 2) return n;
 return fib(n - 1) + fib(n - 2);
}
unsigned long long iter_fib(unsigned limit) {
 unsigned long long a = 1;
 unsigned long long b = 1;
 unsigned long long c;
 if (limit < 2)
 return 1;
 for (auto i = 1ULL; i < limit; i++) {
 c = a + b;
 a = b;
 b = c;
 }
 return c;
}
int main() {
 static const int max = 43;
 auto a = timer(fibonacci, "out of line", max);
 auto b = timer(fib, " inline", max);
 auto c = timer(iter_fib, " iterative", max);
 std::cout << "Sum (ignore): " << a + b + c;
}

In your final main, shouldn't max be constexpr instead of static const ?
@YkCheese Why should it be? Its value is not needed in any constant expressions. If you're thinking about optimisation, then constexpr is no panacaea: a non-constexpr variable can be inlined as a literal in expressions, and conversely, a constexpr variable will occupy storage and require loading if it's used via a pointer/reference. As a bonus, in the special case of main, probably the static is redundant since that function can't be called more than once.

Stack Exchange Network

Benchmarking inline vs normal functions in C++

2 Answers 2

Inlining

Timing

Efficiency

Final Code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Benchmarking inline vs normal functions in C++

2 Answers 2

Inlining

Timing

Efficiency

Final Code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions