Optimizing polymorphic objects when there's only one implementation

Question 1

Let's say I have an interface called ParentClass. ParentClass has two implementations, ParentClassA and ParentClassB. There is also the ChildClass interface, with a ChildClassA and a ChildClassB implementation. The ParentClass interface has a function called createChild, which returns a pointer to a ChildClass interface of the appropriate type (ParentClassA::createChild returns a ChildClassA while ParentClassB::createChild returns a ChildClassB).

The code tends to look like this:

ChildClass *child = parentClass->createChild();
child->destroy();

The inefficiency here is that the ParentClass interface will only ever return a ChildClass interface of the same implementation, so there's a ton of vtables being used but there's only one implementation that they lead to.
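To make the setup concrete, here is a minimal sketch of the two-interface design. The class names follow the description above; the name() member is made up purely so the example has something observable, and the real interfaces presumably look different.

```cpp
#include <cassert>
#include <string>

// Both interfaces from the description. name() is a hypothetical member
// added only so the example has something observable to return.
struct ChildClass {
    virtual std::string name() const = 0;
    virtual void destroy() = 0;
    virtual ~ChildClass() = default;
};

struct ParentClass {
    virtual ChildClass* createChild() = 0;
    virtual ~ParentClass() = default;
};

struct ChildClassA final : ChildClass {
    std::string name() const override { return "ChildClassA"; }
    void destroy() override { delete this; }
};

struct ChildClassB final : ChildClass {
    std::string name() const override { return "ChildClassB"; }
    void destroy() override { delete this; }
};

struct ParentClassA final : ParentClass {
    ChildClass* createChild() override { return new ChildClassA; }
};

struct ParentClassB final : ParentClass {
    ChildClass* createChild() override { return new ChildClassB; }
};

// Sees only the interfaces. Unless the optimizer can prove the dynamic types,
// createChild() goes through the ParentClass vtable, and name() and destroy()
// go through the ChildClass vtable.
std::string createAndDestroy(ParentClass& parentClass) {
    ChildClass* child = parentClass.createChild();
    assert(child != nullptr);
    std::string seen = child->name();
    child->destroy();
    return seen;
}
```

Calling createAndDestroy with a ParentClassA yields the ChildClassA implementation, and likewise for B; the caller never names a concrete child type.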

A theoretically more efficient solution would be for ParentClass::createChild to return an opaque pointer, with all member functions of ChildClass moved to ParentClass and taking that opaque pointer as their first argument. This results in only a single vtable, but the code ends up a bit uglier.

And that code tends to look like this:

ChildClass *child = parentClass->createChild();
parentClass->destroyChild(child);
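A rough sketch of that opaque-pointer variant, under some assumptions: ChildClass is only forward-declared, the nested Impl structs and the childValue member are hypothetical stand-ins for whatever the real child holds, and the reinterpret_cast round trip is the usual C-style handle trick rather than anything the design requires.

```cpp
#include <cassert>

struct ChildClass;   // opaque: declared but never defined, so callers cannot look inside

struct ParentClass {
    virtual ChildClass* createChild() = 0;
    virtual int childValue(ChildClass* child) = 0;     // hypothetical former ChildClass member
    virtual void destroyChild(ChildClass* child) = 0;
    virtual ~ParentClass() = default;
};

class ParentClassA final : public ParentClass {
    struct Impl { int value = 1; };                    // concrete child data, no vtable
    static Impl* unwrap(ChildClass* c) { return reinterpret_cast<Impl*>(c); }
public:
    ChildClass* createChild() override { return reinterpret_cast<ChildClass*>(new Impl); }
    int childValue(ChildClass* child) override { return unwrap(child)->value; }
    void destroyChild(ChildClass* child) override { delete unwrap(child); }
};

class ParentClassB final : public ParentClass {
    struct Impl { int value = 2; };
    static Impl* unwrap(ChildClass* c) { return reinterpret_cast<Impl*>(c); }
public:
    ChildClass* createChild() override { return reinterpret_cast<ChildClass*>(new Impl); }
    int childValue(ChildClass* child) override { return unwrap(child)->value; }
    void destroyChild(ChildClass* child) override { delete unwrap(child); }
};

// Every operation on the child is now a virtual call on the parent.
int createUseDestroy(ParentClass& parentClass) {
    ChildClass* child = parentClass.createChild();
    assert(child != nullptr);
    int value = parentClass.childValue(child);
    parentClass.destroyChild(child);
    return value;
}
```

The child objects no longer carry a vtable pointer, but each call on them is still dispatched through ParentClass's vtable, and a handle from one parent must never be passed to the other.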

This part of my application is used quite frequently, so performance is an important consideration. But readability and maintainability are as well. I'm not sure which approach I should be using.

Question 2

Is vtable dispatch really the limiting factor in your code? This can feasibly be the case due to cache thrashing, but most applications never get to a point where this becomes noticeable. E.g. I'd expect dynamic memory allocation to have a stronger effect on performance. And note that you don't have "a ton of vtables" but only 4, one for each class being instantiated. Possibly you can avoid both dynamic allocation and virtual methods by using templates?

Question 3

To expand on @amon's comment, is the resulting (optimised) binary actually using vtables? An allowable optimisation is to devirtualize function calls where the target is statically known.
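A small sketch of the situations where the target can be statically known, assuming a hypothetical id() member on ChildClass. Whether a given compiler really emits a direct call depends on the compiler and optimisation level, so the comments describe what is permitted, not what is guaranteed.

```cpp
#include <cassert>

struct ChildClass {
    virtual int id() const = 0;          // hypothetical member, for illustration only
    virtual ~ChildClass() = default;
};

struct ChildClassA final : ChildClass {
    int id() const override { return 1; }
};

// Dynamic type unknown at this point: in general this needs a vtable lookup,
// unless inlining or whole-program analysis reveals the concrete type.
int viaInterface(const ChildClass& child) { return child.id(); }

// ChildClassA is final, so no override of id() can exist below it and the
// compiler is allowed to call ChildClassA::id directly.
int viaFinalType(const ChildClassA& child) { return child.id(); }

// The object's dynamic type is known from its declaration.
int viaLocalObject() {
    ChildClassA child;
    return child.id();
}

// All three paths must observe the same behaviour, devirtualized or not.
int allPathsAgree() {
    ChildClassA child;
    assert(viaInterface(child) == viaFinalType(child));
    return viaLocalObject();
}
```

Inspecting the generated assembly of an optimised build is the reliable way to see which of these calls still go through the vtable.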

Question 4

I know I'm supposed to be ranting about premature optimization here, but I can't get past a purely semantic problem: why are these parent and child? The only relationship here seems to be a factory. You're not using some weird "gives-birth-to" relationship here, are you? If child "is-a" parent, then shouldn't the declaration be ParentClass *child = parentClass->createChild(); so that the child can be used just like the parent?

Question 5

What language? (Practically, the performance hit won't matter, but if you really insist on worrying about it, then it really depends on the language you are using.)

Question 6

Don't put the cart before the horse. Until you have proof that performance is impaired by some inefficiency, it doesn't matter where the inefficiencies are. Write the simplest and most maintainable code you can, and refactor if any issues arise.

Question 7

That's an interesting concern, but it should be a comment, not an answer. You don't know what process the OP used to conclude he needed such an optimization. Your answer doesn't add much value for someone who knows he needs such a solution, but not how to implement it.

Question 8

@VincentSavard The title asks about optimization, but in the body it looks like the OP wants to choose between performance on one side and readability and maintainability on the other. I answered the body because, TBH, I forgot about the title while I was reading the question. After the fact, I think that the body is a better description of the OP's actual concern.

Question 9

To expand on @amon's comment, you can use templates to do the polymorphism at compile-time, instead of virtual doing it at runtime.

If the implementations of ParentA and ParentB are identical apart from their uses of ChildA and ChildB, just write a template<typename Child> class Parent{ ... }; directly.
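A minimal sketch of that direct form, assuming the children need nothing beyond a hypothetical id() member. It returns std::unique_ptr instead of a raw pointer, which is my substitution for the sake of a leak-free example, and createAndQuery is an invented helper.

```cpp
#include <cassert>
#include <memory>

// Plain classes: no common base and no virtual functions are needed.
struct ChildA { int id() const { return 1; } };   // id() is hypothetical
struct ChildB { int id() const { return 2; } };

template <typename Child>
class Parent {
public:
    using child_type = Child;
    // The concrete child type is part of Parent's own type.
    std::unique_ptr<Child> createChild() const { return std::make_unique<Child>(); }
};

template <typename Child>
int createAndQuery() {
    Parent<Child> parent;
    std::unique_ptr<Child> child = parent.createChild();
    assert(child != nullptr);
    return child->id();   // resolved at compile time, no vtable involved
}
```

Each instantiation, Parent<ChildA> and Parent<ChildB>, is a separate type, so code that uses them has to be templated as well or pick one explicitly.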

Otherwise, you may still want to have an interface, but that too can be a template. You would have something like:

template<typename Child>
struct Parent {
 using child_type = Child;
 // virtual destructor so that deleting through a Parent<Child>* is safe
 virtual ~Parent() = default;
 virtual Child * createChild() = 0;
 // rest of interface, using Child type parameter directly
};
struct ParentA : public Parent<ChildA> {
 // use of final is a strong hint for the compiler to devirtualize
 ChildA * createChild() final;
 // more implementation for ChildA
};

If you do have concrete subclasses for parent, you may want to have a helper to get the parent for each child:

template<typename Child>
struct child_traits;
template<>
struct child_traits<ChildA> {
 using parent_type = ParentA;
};
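As a sketch of how such a traits helper might be used: the parent_for alias, the childIdViaTraits function, and the id member are all hypothetical, and the parents are reduced to trivial factories that return children by value.

```cpp
#include <cassert>
#include <type_traits>

struct ChildA { int id = 1; };   // id is a hypothetical member
struct ChildB { int id = 2; };

struct ParentA { ChildA createChild() const { return ChildA{}; } };
struct ParentB { ChildB createChild() const { return ChildB{}; } };

// Primary template is intentionally left undefined: asking for the parent of
// an unknown child type becomes a compile error.
template <typename Child> struct child_traits;
template <> struct child_traits<ChildA> { using parent_type = ParentA; };
template <> struct child_traits<ChildB> { using parent_type = ParentB; };

// Hypothetical convenience alias over the traits lookup.
template <typename Child>
using parent_for = typename child_traits<Child>::parent_type;

static_assert(std::is_same<parent_for<ChildA>, ParentA>::value,
              "ChildA maps to ParentA");

// Code templated on the child alone can still reach the matching parent.
template <typename Child>
int childIdViaTraits() {
    parent_for<Child> parent;
    Child child = parent.createChild();
    assert(child.id > 0);
    return child.id;
}
```

The mapping is resolved entirely at compile time, so it adds no runtime dispatch.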

Then, with either choice, the other parts of your program use Child (and possibly Parent) as template parameters:

template<typename Child>
void ParentCollaborator(ParentArgs args) {
 Parent<Child> parent(args);
 Child * child = parent.createChild();
 // template type deduction allows omitting <Child> here
 doParentChildStuff(parent, *child);
}

Question 10

Is there any way to switch between the two at runtime? The application picks one implementation at runtime and uses that until it shuts down.

Question 11

If you propagate the templating through all the uses of parent and child, main can just have an if (isA) runProgram<A>(); else runProgram<B>(); or similar.
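Roughly, that single runtime branch could look like the sketch below. The run function, the label() member, and the by-value createChild are assumptions made for the example; in a real program the body of runProgram would be the whole application.

```cpp
#include <cassert>
#include <string>

struct ChildA { std::string label() const { return "A"; } };   // label() is hypothetical
struct ChildB { std::string label() const { return "B"; } };

template <typename Child>
struct Parent {
    Child createChild() const { return Child{}; }
};

// Everything below the entry point is templated on the child type, so inside
// one instantiation no call needs runtime dispatch.
template <typename Child>
std::string runProgram() {
    Parent<Child> parent;
    Child child = parent.createChild();
    std::string label = child.label();
    assert(!label.empty());
    return label;
}

// The only runtime decision: made once, at the top.
std::string run(bool isA) {
    return isA ? runProgram<ChildA>() : runProgram<ChildB>();
}
```

The trade-off is that both instantiations are compiled into the binary, even though only one of them runs in any given process.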

Question 12

I don't see how the two options you propose are different from a dispatching efficiency perspective.

The first approach uses a dispatch for ParentClass, then ChildClass, while the latter does a dispatch for ParentClass, then ParentClass again.

Your second solution is not much different because ParentClass also has two implementations.

Now, if ParentClass somehow had a non-virtual implementation of destroyChild, that would be different, but there is nothing in your post suggesting that. In fact, the way I read it, you have two different implementations of the same interface, which says to me that destroyChild is virtual.

In a straightforward runtime, both would have roughly the same performance. The only difference is that the latter has a better chance of having warmed the cache by using the same vtable twice.

Still, in a looping test, I'm not sure you'd notice: after the first iteration the cache would be warm for all of them, and you'd have to exhaust the cache during the loop for that to become an issue.

In addition to the data cache for the vtable, the CPU is going to do branch prediction, and it will do it well in a looping test. So you'll have two caching mechanisms working on these dispatches. (Not to mention the code cache.)

I would go with whatever makes your code more maintainable.

score 2 · Answer 1 · 2017-06-02 10:23:52Z


answered Jun 2, 2017 at 10:23

Stop harming Monica (825 reputation; 1 gold, 8 silver, 12 bronze badges)

Comments on Answer 1: Vincent Savard (Jun 2, 2017 at 14:00); Stop harming Monica (Jun 2, 2017 at 15:37).

Caleth (12.3k reputation; 2 gold, 29 silver, 44 bronze badges) · Answer 2 · 2017-06-02 12:03:45Z


Erik Eidt (34.8k reputation; 6 gold, 61 silver, 95 bronze badges) · Answer 3 · 2017-06-02 14:34:25Z

