Simple C++ event loop - static vs dynamic dispatch performance

Question 1

I have two extremely simple toy implementations of an event loop, and would like to understand the performance differences between them.

First impl - events with a virtual 'handle' method - dynamic dispatch:

#include <memory>
#include <vector>
class Event {
 public:
 virtual bool handle() = 0;
};
class Event1 : public Event {
 public:
 bool handle() override {
 return ++n < 1000000000;
 }
 private:
 int n = 0;
};
class Event2 : public Event {
 public:
 bool handle() override {
 --n;
 return true;
 }
 private:
 int n = 0;
};
class EventLoop {
 public:
 void addEvent(Event& event) {
 events.push_back(&event);
 }
 void run() {
 while (true) {
 for (auto event : events) {
 if (!event->handle()) {
 return;
 }
 }
 }
 }
 private:
 std::vector<Event*> events;
};
int main() {
 EventLoop eventLoop;
 Event1 event1;
 Event2 event2;
 eventLoop.addEvent(event1);
 eventLoop.addEvent(event2);
 eventLoop.run();
 return 0;
}

Second impl - wrap events in a lambda - static dispatch:

#include <vector>
#include <functional>
class Event1 {
 public:
 bool handle() {
 return ++n < 1000000000;
 }
 private:
 int n = 0;
};
class Event2 {
 public:
 bool handle() {
 --n;
 return true;
 }
 private:
 int n = 0;
};
class EventLoop {
 public:
 template<typename T>
 void addEvent(T& event) {
 events.push_back([&event]() { return event.handle(); });
 }
 void run() {
 while (true) {
 for (auto& event : events) {
 if (!event()) {
 return;
 }
 }
 }
 }
 private:
 std::vector<std::function<bool()>> events;
};
int main() {
 EventLoop eventLoop;
 Event1 event1;
 Event2 event2;
 eventLoop.addEvent(event1);
 eventLoop.addEvent(event2);
 eventLoop.run();
 return 0;
}

My naive expectation was that the version using lambdas would be faster (ie 'dynamic dispatch is slow'), but compiling with g++, it was twice as fast to run - 17s compared to 35s on my machine, according to time.

I'd like to understand whether this is likely because with my toy event methods, the compiler was able to optimise the former better (and so in a more realistic scenario with more complex event handlers this would indeed typically be slower), or if there's some overhead associated with the lambdas that I don't understand that will likely always make this the case.

Question 2

This submission is about performance, yet it does not include any per-line CPU profiling measurements. It would benefit from some godbolt.org links that highlight what's different in the generated dispatch instructions. // Each example has an Event2 "distractor" thread going on, fine, counting down to negative infinity. It's unclear how much work it does. In particular, if there is a "task scheduling fairness" difference between the examples, we might be measuring that lambdas always do half as many decrements, or that they randomly did half as many on a particular run.

Question 3

Did you benchmark with optimizations enabled? Compiling with -O3, I find the run time more or less the same for both (which is kinda expected).

Question 4

Enable compiler optimizations

The runtimes you posted suggest you compiled your code without optimizations enabled. That makes them very useless. Indeed as Rish commented, if you enable them, the runtimes will be approximately the same (and also much smaller).

Toy code won't give realistic results

The risk with such simple code though is that the compiler might be able to see that your code doesn't do anything at all, and could legally replace it with:

int main() {
 return 0;
}

So for benchmarking you really should ideally use something more realistic. And if in that realisitc case every event would take a long time to process, the performance of the event loop doesn't matter at all. In that case, you should consider optimizing for code simplicity instead of performance.

Lambdas will not make things faster in this case

You heard that dynamic dispatch is slow. The reason is that in the first case, events stores pointers to objects that have a virtual method table. When calling a virtual member function, it first has to load the pointer to the member function from the concrete class from that table. This double indirection is what is "slow", as it can prevent the CPU from doing prefetching and branch prediction. In contrast, std::function<> will hold a pointer directly to the actual function you want to call, so it doesn't suffer from this issue.

However, contemporary CPUs have gotten much better at this, and furthermore, since you only have two entries in events and you never change it while the event loop is running, everything will be in the L1 cache and the branch predictor can easily predict what is going to happen.

What to choose for event loops

Pointers to base classes sometimes are better than using std::function<>, however for the specific case of an event loop, you are usually only interested in calling one function. So using std::function<> is then probably a better choice, as it avoids the double indirection when calling the function, and it avoids needing objects to be derived from a particular base class, so it's a bit more flexible.

That said, if you want to be able to remove events from the event loop, this is going to be tricky if you are just storing std::function<>s, whereas with pointers to objects it is trivial to do.

G. Sliepen G. Sliepen 68.9k3 gold badges74 silver badges180 bronze badges · Accepted Answer · 2024-04-24 06:21:03Z

Enable compiler optimizations

The runtimes you posted suggest you compiled your code without optimizations enabled. That makes them very useless. Indeed as Rish commented, if you enable them, the runtimes will be approximately the same (and also much smaller).

Toy code won't give realistic results

The risk with such simple code though is that the compiler might be able to see that your code doesn't do anything at all, and could legally replace it with:

int main() {
 return 0;
}

So for benchmarking you really should ideally use something more realistic. And if in that realisitc case every event would take a long time to process, the performance of the event loop doesn't matter at all. In that case, you should consider optimizing for code simplicity instead of performance.

Lambdas will not make things faster in this case

You heard that dynamic dispatch is slow. The reason is that in the first case, events stores pointers to objects that have a virtual method table. When calling a virtual member function, it first has to load the pointer to the member function from the concrete class from that table. This double indirection is what is "slow", as it can prevent the CPU from doing prefetching and branch prediction. In contrast, std::function<> will hold a pointer directly to the actual function you want to call, so it doesn't suffer from this issue.

However, contemporary CPUs have gotten much better at this, and furthermore, since you only have two entries in events and you never change it while the event loop is running, everything will be in the L1 cache and the branch predictor can easily predict what is going to happen.

What to choose for event loops

Pointers to base classes sometimes are better than using std::function<>, however for the specific case of an event loop, you are usually only interested in calling one function. So using std::function<> is then probably a better choice, as it avoids the double indirection when calling the function, and it avoids needing objects to be derived from a particular base class, so it's a bit more flexible.

That said, if you want to be able to remove events from the event loop, this is going to be tricky if you are just storing std::function<>s, whereas with pointers to objects it is trivial to do.

Stack Exchange Network

Simple C++ event loop - static vs dynamic dispatch performance

1 Answer 1

Enable compiler optimizations

Toy code won't give realistic results

Lambdas will not make things faster in this case

What to choose for event loops

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple C++ event loop - static vs dynamic dispatch performance

1 Answer 1

Enable compiler optimizations

Toy code won't give realistic results

Lambdas will not make things faster in this case

What to choose for event loops

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions