I have to represent a certain data structure, in which I have nodes that are related between them. Each node has its numeric id, and a list containing the next directly related nodes and another list containing the previous directly related.
So my code now would be:
class Node {
int id;
ArrayList<Node> previousNodes;
ArrayList<Node> nextNodes;
}
Now the question is: having into account that this data structure may be very big, does the use of lists inside each node affects significantly on performance?
It is supposed that each list contains the reference of each node. Would be a significant improvement if instead of lists I stored an array containing the ids of the related nodes?
So, in summary, the question is:
Is there any significant improvement in performance between my code and the following code for a big data structure?:
class Node {
int id;
int[] previousNodes;
int[] nextNodes;
}
Note: The data structure would be represented as an array of nodes, where each cell of the array contains a node whose id corresponds to the index of the cell, so getting a node by its id would be of O(1)
-
Depends on the code used for implementing the list. There are widespread implementations for either outcome.Kilian Foth– Kilian Foth2019年02月13日 12:01:14 +00:00Commented Feb 13, 2019 at 12:01
-
@KilianFoth The list would be a simple ArrayListjacosro– jacosro2019年02月13日 12:02:47 +00:00Commented Feb 13, 2019 at 12:02
-
Arrays would be more memory-efficient, but potentially more CPU-intensive if you find yourself having to recreate often to accomodate new values. If memory is not an issue, don't bother with arrays. Just stick with Lists.Neil– Neil2019年02月13日 12:56:23 +00:00Commented Feb 13, 2019 at 12:56
1 Answer 1
It's possible that, under certain circumstances, the int[]
s may perform noticeably better than the ArrayList<Node>
s.
By most measures, the two forms will be quite similar. The Node
s in the first form will be a couple of int
s bigger. If you know accurately how many predecessors and successors you have on Node
creation they'll have similar sized arrays. If you don't know, then you may need to be more conservative with the fixed arrays which may increase their size.
Probably the largest practical difference, if there is one, will be if you have to iterate the lists in a tight loop. If you do this frequently and you only need the id
from the Node
then the bare arrays may be quicker. The reason is that the int
s are likely to be more local in cache as they are physically in the arrays. The arrays backing the ArrayList
s will have to dereference the Node
pointer to get the id
which is likely to be less efficient. If, however, you need to get back to the Node
from the id
then any benefit will, almost definitely, be negated or worse.
At this point I'll caution with the usual caveat: profile. The actual performance will depend on a whole host of factors that may interact in surprising ways. If it were me, I would likely write the first form i.e. the more idiomatic one, and only change if there was a clear performance issue. And even then, I may well not do it this way depending on the specifics.