How can I associate algorithm-specific data to the objects it works with, efficiently yet cleanly?

Question 1

I'm writing in C++, but this problem applies to most non-high-level languages, and possibly some high-level ones as well.

I have a graph of heterogeneous nodes. The graph can be instantiated by different subsystems (e.g. it can be read from a file or generated by one out of many algorithms).

I have several algorithms that can work on the graph to compute something. Most algorithms need to store extra information for each node, and some are computationally intensive and I'm trying to optimize them.

Until now I've been using a hash table (a std::unordered_map<Node*, Data>) to store the extra information for each node, but I just realized that the hash table affects performance noticeably: the most intensive algorithm runs 2~5% faster if I keep the data in the node type. However, placing algorithm-specific data in the node type seems extremely dirty.

What are some good ways to store extra information for each node, in an efficient, yet clean way? Are there any smart ideas/patterns I should consider?

Question 2

I would consider a hash table the cleanest approach with good performance. As alternatives to moving the data into the node, have you considered: (1) using a 32-bit type as the key instead of a 64-bit type (a pointer), if you can create a unique 32-bit identifier for each node? (2) providing your own hash function for the key type? (3) preallocating the hash table? These three questions summarize my own experience in optimizing C++ unordered_map performance.
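To make the three suggestions concrete, here is a minimal sketch. It assumes each node can be given a unique 32-bit identifier; NodeId, Data, NodeIdHash, make_node_data and data_of are hypothetical names, and the multiply-and-fold hash is just one cheap choice, not a recommendation backed by measurements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-node data for one algorithm.
struct Data {
    double cost = 0.0;
    bool visited = false;
};

// Hypothetical 32-bit node identifier, assumed unique per node.
using NodeId = std::uint32_t;

// Custom hash: one multiply by an odd constant, then fold the high bits
// down. This replaces whatever default hash the standard library uses.
struct NodeIdHash {
    std::size_t operator()(NodeId id) const noexcept {
        std::uint32_t h = id * 2654435769u;
        return static_cast<std::size_t>(h ^ (h >> 16));
    }
};

using NodeData = std::unordered_map<NodeId, Data, NodeIdHash>;

// Creates the table and asks for enough buckets up front, so that filling
// it with expected_nodes entries should not trigger a rehash.
NodeData make_node_data(std::size_t expected_nodes) {
    NodeData table;
    table.reserve(expected_nodes);
    return table;
}

// Looks up data the algorithm has already created for `id`.
Data& data_of(NodeData& table, NodeId id) {
    auto it = table.find(id);
    assert(it != table.end() && "node has no data yet");
    return it->second;
}
```

Whether any of this closes the 2~5% gap can only be answered by measuring it on the real graph.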

Question 3

Possibly create algorithm-specific classes that store the data the algorithm requires and a pointer to the associated node. Going from the node back to the extra info can be done using an index, added to the Node, that identifies the extra data stored for it.
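A minimal sketch of this idea, assuming nodes can carry a dense index in the range 0..N-1. Node, AlgoInfo and AlgoScratch are hypothetical names, and the fields in AlgoInfo are placeholders for whatever a given algorithm needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type. The only addition is an algorithm-agnostic index.
struct Node {
    std::size_t index;  // position of this node in the graph's node list
};

// Hypothetical algorithm-specific record: the extra data plus a pointer
// back to the node it belongs to.
struct AlgoInfo {
    const Node* node = nullptr;
    double distance = 0.0;
    bool settled = false;
};

// Side table owned by the algorithm. Node -> info is a plain array lookup
// through Node::index; info -> node goes through the stored pointer.
class AlgoScratch {
public:
    explicit AlgoScratch(const std::vector<Node>& nodes) : info_(nodes.size()) {
        for (const Node& n : nodes) {
            assert(n.index < info_.size());
            info_[n.index].node = &n;
        }
    }

    AlgoInfo& operator[](const Node& n) {
        assert(n.index < info_.size());
        return info_[n.index];
    }

private:
    std::vector<AlgoInfo> info_;
};
```

The node type stays free of algorithm-specific fields, and each algorithm can define its own record type and side table without touching the others.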

Question 4

A low-level CPU execution profiler (one that has low overhead and observes the "currently executing instruction address" on a periodic timer interrupt) can also help identify where the performance difference happens.

Question 5

2~5%, that's all? I am under the impression you are barking up the wrong tree. Please have a look at "Is micro-optimisation important when coding?"

Question 6

The reason I focused on the key bit-width and the hash function is that I know certain compilers (e.g. MSVC) use a byte-oriented hash function, FNV-1a. When applied to a 64-bit key type, the function executes its inner loop 8 times. When the hash table is used in certain types of algorithms (without further customization or optimization), the computation of the hash function can show up prominently in CPU profiler results. Also, since a hash table must store the actual key in memory, a wider key type consumes more memory, which takes up space in the CPU cache.
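For illustration, this is what a byte-oriented FNV-1a looks like. It is a sketch written from the published 64-bit FNV-1a constants, not a copy of any standard library's code; fnv1a_bytes and fnv1a_of are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 64-bit FNV-1a over raw bytes: one XOR and one multiply per byte.
std::uint64_t fnv1a_bytes(const unsigned char* bytes, std::size_t count) {
    assert(bytes != nullptr || count == 0);
    std::uint64_t hash = 14695981039346656037ull;  // 64-bit FNV offset basis
    for (std::size_t i = 0; i < count; ++i) {
        hash ^= bytes[i];
        hash *= 1099511628211ull;                  // 64-bit FNV prime
    }
    return hash;
}

// Hashes the object representation of a key. The loop above runs
// sizeof(Key) times: 8 for a 64-bit pointer, 4 for a 32-bit id.
template <typename Key>
std::uint64_t fnv1a_of(const Key& key) {
    unsigned char raw[sizeof(Key)];
    std::memcpy(raw, &key, sizeof(Key));
    return fnv1a_bytes(raw, sizeof(Key));
}
```

With a pointer key the multiply chain is eight steps long and each step depends on the previous one, which is why halving the key width or swapping in a cheaper hash can be visible in a profile.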


B. Ithica · Answer 1 · 2021-01-06 15:21:29Z

How can I associate algorithm-specific data to the objects it works with, efficiently yet cleanly?

If the different node classes conform to the Liskov substitution principle then you ought to be able to use inheritance to achieve this.

For every node class Foo you want to store some ancillary information for, define a subclass of it called DecoratedFoo to hold (and possibly even compute) the ancillary information required by your algorithm.

When your program loads the graph from (whatever input source), whenever you see a node of class Foo, instantiate a node of class DecoratedFoo instead.

When your algorithm needs this extra information about a node, your algorithm can retrieve it via attributes or methods exposed by DecoratedFoo. This is efficient in the sense that the only overhead is object dispatch, which can be arranged to be resolved statically in many cases.

It is also clean in the sense that any library or other code you're re-using does not need to be changed, because DecoratedFoo is-a Foo, so well-behaved (that is, LSP-conforming) code will be oblivious to the decorations that your algorithm is using.
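A minimal sketch of the subclassing approach described above. Foo's interface, the fields of DecoratedFoo, and the load_graph, decorated and mark_all helpers are all hypothetical; the static_cast is only valid under the assumption that the loader really created a DecoratedFoo for every node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical existing node class, unchanged by the algorithm.
class Foo {
public:
    virtual ~Foo() = default;
    virtual int weight() const { return 1; }
};

// Subclass holding the ancillary data one algorithm needs.
// `final` gives the compiler a chance to resolve calls statically.
class DecoratedFoo final : public Foo {
public:
    double score = 0.0;
    bool visited = false;
};

// Loader: wherever a Foo would be created, create a DecoratedFoo instead.
std::vector<std::unique_ptr<Foo>> load_graph(std::size_t count) {
    std::vector<std::unique_ptr<Foo>> nodes;
    nodes.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        nodes.push_back(std::make_unique<DecoratedFoo>());
    }
    return nodes;
}

// Algorithm side: recover the decorated view of a node. The assert checks
// the loader invariant in debug builds; release builds pay no lookup cost.
DecoratedFoo& decorated(Foo& node) {
    assert(dynamic_cast<DecoratedFoo*>(&node) != nullptr);
    return static_cast<DecoratedFoo&>(node);
}

// Placeholder algorithm step: mark every node and count them.
std::size_t mark_all(std::vector<std::unique_ptr<Foo>>& nodes) {
    std::size_t marked = 0;
    for (auto& n : nodes) {
        decorated(*n).visited = true;
        ++marked;
    }
    return marked;
}
```

Code that only knows about Foo keeps working on these nodes. The cost is that the loader has to know which decoration to instantiate, and two algorithms with different per-node data cannot both decorate the same node this way.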

radarbob · Answer 2 · 2021-01-06 20:11:43Z

A computer science engineering or Object Oriented design question?

computationally intensive and I'm trying to optimize them. ... most intensive algorithm performs 2~5% better if I keep the data in the node type ...

I'm with Doc Brown on this.

However placing algorithm-specific data in the node type seems extremely dirty. ... What are some good ways to store extra information for each node

This blinkered "dirty" pronouncement is preventing you from realizing that the algorithm-specific data is an extension of a node. "In the node type" is exactly where it should be.

B. Ithica's specific answer, built on the general OO idea of extensibility, has merit. Decorator is one of the Structural design patterns.

Design Deeply

Thinking about your performance improvement observation: the absence of a coherent design structure often results in scattered, redundant, deeply nested glue code.

You should read broadly about design patterns because any number of them may be appropriate for various levels of code detail. And generally think about a class whenever two or more things must integrate, stay together, play together.

your algorithm can retrieve it via attributes or methods exposed by DecoratedFoo [B Ithica answer]

I bought a DecoratedFoo. Now What? You asked about:

some good ways to store extra information for each node

That detail and its integration with the algorithm are best encapsulated with something like this:

An IntegratedData class that merges node + specific data, exposing this composite through methods.
A UniqueAlgorithm class containing the algorithm and IntegratedData, exposing generalized methods for driving the algorithm.
In the DecoratedFoo class (see B. Ithica's answer), a reference to the above class composition. B. Ithica says: "your algorithm can retrieve it via attributes or methods exposed by DecoratedFoo"
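One possible reading of the first two points, as a rough sketch. The answer gives no code, so everything here is an assumption: the Node type, the method names, and the running-sum placeholder that stands in for the real algorithm. The third point (a reference held by DecoratedFoo) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node.
struct Node {
    int value;
};

// Merges a node with the data one algorithm keeps for it and exposes
// the composite through methods.
class IntegratedData {
public:
    explicit IntegratedData(const Node& node) : node_(&node) {}

    const Node& node() const { return *node_; }
    int accumulated() const { return accumulated_; }
    void set_accumulated(int value) { accumulated_ = value; }

private:
    const Node* node_;
    int accumulated_ = 0;
};

// Contains the algorithm and its IntegratedData, exposing general methods
// for driving it. The nodes must outlive this object.
class UniqueAlgorithm {
public:
    explicit UniqueAlgorithm(const std::vector<Node>& nodes) {
        items_.reserve(nodes.size());
        for (const Node& n : nodes) {
            items_.emplace_back(n);
        }
    }

    // Placeholder algorithm: running sum of node values, in order.
    void run() {
        int running = 0;
        for (IntegratedData& item : items_) {
            running += item.node().value;
            item.set_accumulated(running);
        }
    }

    const IntegratedData& result(std::size_t i) const {
        assert(i < items_.size());
        return items_[i];
    }

private:
    std::vector<IntegratedData> items_;
};
```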
