I'm writing in C++, but this problem applies to most non-high-level languages, and possibly some high-level ones as well.
I have a graph of heterogeneous nodes. The graph can be instantiated by different subsystems (e.g. it can be read from a file or generated by one out of many algorithms).
I have several algorithms that can work on the graph to compute something. Most algorithms need to store extra information for each node, and some are computationally intensive and I'm trying to optimize them.
Until now I've been using a hash table (a std::unordered_map<Node*, Data>) to store the extra information of a node, but I just realized that using a hashmap affects performance noticeably: the most intensive algorithm performs 2~5% better if I keep the data in the node type. However, placing algorithm-specific data in the node type seems extremely dirty.
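For reference, the side-table approach described here might look roughly like the sketch below. Only the std::unordered_map<Node*, Data> keyed by node pointer comes from the description; the Node layout, the visited flag, and count_reachable are placeholder names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical node type; the real graph has heterogeneous nodes.
struct Node {
    std::vector<Node*> neighbours;
};

// Hypothetical per-node data needed by one algorithm.
struct Data {
    bool visited = false;
};

// Side-table approach: the algorithm owns its data, keyed by node address.
// Every access to a node's data costs a hash lookup.
std::size_t count_reachable(Node& root) {
    std::unordered_map<Node*, Data> extra;
    std::vector<Node*> stack{&root};
    extra[&root].visited = true;
    std::size_t count = 0;
    while (!stack.empty()) {
        Node* n = stack.back();
        stack.pop_back();
        ++count;
        for (Node* m : n->neighbours) {
            assert(m != nullptr);   // edges are assumed to be non-null
            Data& d = extra[m];     // hash lookup; inserts a default Data if absent
            if (!d.visited) {
                d.visited = true;
                stack.push_back(m);
            }
        }
    }
    return count;
}
```

The Node type stays free of algorithm-specific members, which is the clean part; the lookup inside the inner loop is the cost being measured.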
What are some good ways to store extra information for each node, in an efficient, yet clean way? Are there any smart ideas/patterns I should consider?
How can I associate algorithm-specific data to the objects it works with, efficiently yet cleanly?
If the different node classes conform to the Liskov substitution principle then you ought to be able to use inheritance to achieve this.
For every node class Foo you want to store some ancillary information for, define a subclass of it called DecoratedFoo to hold (and possibly even compute) the ancillary information required by your algorithm.
When your program loads the graph from whatever input source, whenever you see a node of class Foo, instantiate a node of class DecoratedFoo instead.
When your algorithm needs this extra information about a node, your algorithm can retrieve it via attributes or methods exposed by DecoratedFoo. This is efficient in the sense that the only overhead is object dispatch, which can be arranged to be resolved statically in many cases.
It is also clean in the sense that any library or other code you're re-using does not need to be changed, because DecoratedFoo is-a Foo, so well-behaved (that is, LSP-conforming) code will be oblivious to the decorations that your algorithm is using.
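A minimal sketch of the subclassing idea, under some assumptions: Foo and DecoratedFoo are the names used in this answer, while the neighbours member, the visited flag, make_node, and count_reachable are hypothetical stand-ins. It also assumes every node in the graph was created through the same factory, so the downcast can be a static_cast rather than a lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical base node type from the existing graph code.
struct Foo {
    virtual ~Foo() = default;
    std::vector<Foo*> neighbours;
};

// Decorated subclass: is-a Foo, plus the data one algorithm needs.
// `visited` is an assumed example of algorithm-specific state.
struct DecoratedFoo final : Foo {
    bool visited = false;
};

// Hypothetical loader hook: wherever the loader would create a Foo,
// it creates a DecoratedFoo and hands it back as a Foo.
std::unique_ptr<Foo> make_node() {
    return std::make_unique<DecoratedFoo>();
}

// Assumes every node was created by make_node(), so the downcast is
// resolved statically; no hash lookup and no virtual call is involved.
std::size_t count_reachable(Foo& root) {
    auto decorated = [](Foo* f) -> DecoratedFoo& {
        assert(dynamic_cast<DecoratedFoo*>(f) != nullptr);  // debug-only check
        return static_cast<DecoratedFoo&>(*f);
    };
    std::vector<Foo*> stack{&root};
    decorated(&root).visited = true;
    std::size_t count = 0;
    while (!stack.empty()) {
        Foo* n = stack.back();
        stack.pop_back();
        ++count;
        for (Foo* m : n->neighbours) {
            DecoratedFoo& d = decorated(m);
            if (!d.visited) {
                d.visited = true;
                stack.push_back(m);
            }
        }
    }
    return count;
}
```

Code that only knows about Foo keeps working unchanged; only the loader and the algorithm are aware of DecoratedFoo.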
A computer science engineering or Object Oriented design question?
computationally intensive and I'm trying to optimize them. ... most intensive algorithm performs 2~5% better if I keep the data in the node type ...
I'm with Doc Brown on this.
However placing algorithm-specific data in the node type seems extremely dirty. ... What are some good ways to store extra information for each node
This blinkered "dirty" pronouncement is preventing you from realizing the specific data is an extension of a node. Yet "In the node type" is exactly where it should be.
B. Ithica's specific answer, an application of the general OO idea of extensibility, has merit. Decorator is one of the structural design patterns.
Design Deeply
Thinking on your performance improvement observation: absence of a coherent design structure often results in scattered, redundant, deeply nested glue coding.
You should read broadly about design patterns because any number of them may be appropriate for various levels of code detail. And generally think about a class whenever two or more things must integrate, stay together, play together.
your algorithm can retrieve it via attributes or methods exposed by DecoratedFoo [B Ithica answer]
I bought a DecoratedFoo. Now What? You asked about:
some good ways to store extra information for each node
That detail and its algorithm integration are best encapsulated with something like this:
- An IntegratedData class that merges node + specific data, exposing this composite with methods.
- A UniqueAlgorithm class containing the algorithm and IntegratedData, exposing generalized methods for driving the algorithm.
- In the DecoratedFoo class (see B. Ithica's answer), a reference to the above class composition. B. Ithica says: "your algorithm can retrieve it via attributes or methods exposed by DecoratedFoo"
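One possible reading of the IntegratedData / UniqueAlgorithm / DecoratedFoo structure from the second answer is sketched below. The class names come from that answer; everything else (the members, the visited flag, count_reachable, and the choice of std::deque for storage) is an assumption made for illustration, not something the answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical base node type.
struct Foo {
    virtual ~Foo() = default;
    std::vector<Foo*> neighbours;
};

// Merges a node with the data one algorithm keeps for it.
class IntegratedData {
public:
    explicit IntegratedData(Foo& node) : node_(&node) {}
    Foo& node() const { return *node_; }
    bool visited() const { return visited_; }
    void mark_visited() { visited_ = true; }
private:
    Foo* node_;
    bool visited_ = false;   // assumed example of algorithm-specific state
};

// The decorated node only carries a pointer to its composite.
struct DecoratedFoo : Foo {
    IntegratedData* data = nullptr;
};

// Owns the per-node composites and exposes methods that drive the algorithm.
class UniqueAlgorithm {
public:
    explicit UniqueAlgorithm(const std::vector<DecoratedFoo*>& nodes) {
        for (DecoratedFoo* n : nodes) {
            store_.emplace_back(*n);   // deque::emplace_back keeps element addresses stable
            n->data = &store_.back();
        }
    }
    // Non-copyable: the nodes point back into store_.
    UniqueAlgorithm(const UniqueAlgorithm&) = delete;
    UniqueAlgorithm& operator=(const UniqueAlgorithm&) = delete;

    // Assumes every reachable node is a DecoratedFoo registered above.
    std::size_t count_reachable(DecoratedFoo& root) {
        assert(root.data != nullptr);  // root must have been registered
        std::vector<IntegratedData*> stack{root.data};
        root.data->mark_visited();
        std::size_t count = 0;
        while (!stack.empty()) {
            IntegratedData* d = stack.back();
            stack.pop_back();
            ++count;
            for (Foo* f : d->node().neighbours) {
                IntegratedData* next = static_cast<DecoratedFoo*>(f)->data;
                if (!next->visited()) {
                    next->mark_visited();
                    stack.push_back(next);
                }
            }
        }
        return count;
    }
private:
    std::deque<IntegratedData> store_;
};
```

In this reading the node carries a single pointer regardless of which algorithm is running, and reaching a node's data costs one pointer dereference instead of a hash lookup.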