Efficiency in Java: Object reference vs id reference

Question 1

I have to represent a certain data structure, in which I have nodes that are related between them. Each node has its numeric id, and a list containing the next directly related nodes and another list containing the previous directly related.

So my code now would be:

class Node {
 int id;
 ArrayList<Node> previousNodes;
 ArrayList<Node> nextNodes;
}

Now the question is: having into account that this data structure may be very big, does the use of lists inside each node affects significantly on performance?

It is supposed that each list contains the reference of each node. Would be a significant improvement if instead of lists I stored an array containing the ids of the related nodes?

So, in summary, the question is:

Is there any significant improvement in performance between my code and the following code for a big data structure?:

class Node {
 int id;
 int[] previousNodes;
 int[] nextNodes;
}

Note: The data structure would be represented as an array of nodes, where each cell of the array contains a node whose id corresponds to the index of the cell, so getting a node by its id would be of O(1)

Question 2

Depends on the code used for implementing the list. There are widespread implementations for either outcome.

Question 3

@KilianFoth The list would be a simple ArrayList

Question 4

Arrays would be more memory-efficient, but potentially more CPU-intensive if you find yourself having to recreate often to accomodate new values. If memory is not an issue, don't bother with arrays. Just stick with Lists.

Question 5

It's possible that, under certain circumstances, the int[]s may perform noticeably better than the ArrayList<Node>s.

By most measures, the two forms will be quite similar. The Nodes in the first form will be a couple of ints bigger. If you know accurately how many predecessors and successors you have on Node creation they'll have similar sized arrays. If you don't know, then you may need to be more conservative with the fixed arrays which may increase their size.

Probably the largest practical difference, if there is one, will be if you have to iterate the lists in a tight loop. If you do this frequently and you only need the id from the Node then the bare arrays may be quicker. The reason is that the ints are likely to be more local in cache as they are physically in the arrays. The arrays backing the ArrayLists will have to dereference the Node pointer to get the id which is likely to be less efficient. If, however, you need to get back to the Node from the id then any benefit will, almost definitely, be negated or worse.

At this point I'll caution with the usual caveat: profile. The actual performance will depend on a whole host of factors that may interact in surprising ways. If it were me, I would likely write the first form i.e. the more idiomatic one, and only change if there was a clear performance issue. And even then, I may well not do it this way depending on the specifics.

Alex Alex 3,9521 gold badge17 silver badges17 bronze badges · Answer 1 · 2019-02-13 14:32:07Z

It's possible that, under certain circumstances, the int[]s may perform noticeably better than the ArrayList<Node>s.

By most measures, the two forms will be quite similar. The Nodes in the first form will be a couple of ints bigger. If you know accurately how many predecessors and successors you have on Node creation they'll have similar sized arrays. If you don't know, then you may need to be more conservative with the fixed arrays which may increase their size.

Probably the largest practical difference, if there is one, will be if you have to iterate the lists in a tight loop. If you do this frequently and you only need the id from the Node then the bare arrays may be quicker. The reason is that the ints are likely to be more local in cache as they are physically in the arrays. The arrays backing the ArrayLists will have to dereference the Node pointer to get the id which is likely to be less efficient. If, however, you need to get back to the Node from the id then any benefit will, almost definitely, be negated or worse.

At this point I'll caution with the usual caveat: profile. The actual performance will depend on a whole host of factors that may interact in surprising ways. If it were me, I would likely write the first form i.e. the more idiomatic one, and only change if there was a clear performance issue. And even then, I may well not do it this way depending on the specifics.

Stack Exchange Network

Efficiency in Java: Object reference vs id reference

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Efficiency in Java: Object reference vs id reference

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions