JVM optimisation of hashCode() on List

Question 1

Imagine a simple case:

class B{
 public final String text;
 public B(String text){
 this.text = text;
 }
}
class A {
 private List<B> bs = new ArrayList<B>;
 public B getB(String text){
 for(B b :bs){
 if(b.text.equals(text)){
 return b;
 }
 }
 return null;
 }
 [getter/setter]
}

Imagine that for each instance of A, the List<B> is large and we need to call getB(String) often. However assume that it is also possible for the list to change (add/remove element, or even being reassigned).

At this stage, the average complexity for getB(String) is O(n). In order to improved that I was wondering if we could use some clever caching.

Imagine we cache the List<B> in a Map<String, B> where the key is B.text. That would improve the performance but it won't work if the list is changed (new element or deleted element) or reassigned (A.bs points to a new reference).

To go around that I thought that, along with the Map<String, B>, we could store a hash of the list bs. When we call getB(String) method, we compute the hash of the list bs. If the hash hasn't changed, we fetch the result from the map, if it has we reload the map.

The problem is that computing the hash for a java.util.List goes through all the element of the list and computes their hash, which is at least O(n).

Question

What I'd like to know is whether the JVM will be faster at computing the hash for the List than going through my loop in the getB(String) method. May be that depends on the implementation of hash for B. If so what kind of things could work? In a nutshell, I'd like to know whether this is stupid or could bring some performance improvement.

Question 2

Just mutate the Map directly (instead of mutating the List and caching it in a Map)

Question 3

This is just a simple example to illustrate a problem. Imagine that you are working with an existing application that already expects a List. Changing the List to a Map is not an option because you don't want to change the rest of the application too much

Question 4

You can't theoretically recalculate the hash code after a random list change without going through all the elements again.

Question 5

@phoenix7360 The existing application is badly designed then. It should either expect an Iterable or a Collection so you can pass it the map's valueSet(). I don't think it's very reasonable to want to have read-write O(1) access using two sets of keys, not without a custom collection designed to support that.

Question 6

@MarkoTopolnik You could maybe kinda sorta do this if you didn't care about the order of the items in the hash, seeing as XOR is commutative. Whenever an item is inserted or deleted, set the hash to the previous value ^ hash of the item being changed. (This would increase the chance of collisions a fair bit, depending on the input data.) That said, I don't think the OP is really asking about using the List itself as a hash key, the question is confusingly phrased.

Question 7

Without actually explaining why, you seem for some reason to believe that it is essential to keep the list structure as well. The only reasonable reason for this is that you need the order of the collection to be kept consistent. If you switch to a "plain" map, the order of the values is no longer constant, e.g. kept in the order in which you add the items to the map.

If you need both to keep the order (list behaviour) and access individual items using a key, you can use a LinkedHashMap, which essentially joins the behaviour of a LinkedList and a HashMap. Even if LinkedHashMap.values() returns a collection and not a list, the list behaviour is guaranteed within the collection.

Another issue with your question is, that you cannot use the list's hash code to safely determine changes. If the hash code has changed, you are indeed sure that the list has changed as well. If two hash codes are identical, you can still not be sure that the lists are actually identical. E.g. if the hash code implementation is based on strings, the hash codes for "1a" and "2B" are identical.

Question 8

Ok this is quite interesting. The reason I have been insisting on a list is because I am working with a large application that has already been written this way. I was trying to see whether there was a light weight solution that wouldn't involved refactoring a lot of existing code. However it there is no such solution, yes switching for a better data structure seems appropriate. I quite like your LinkedHashMap. I probably should mention that A and B are JPA entities. Not sure how it impacts the data structure choice.

Question 9

From all these discussions I take that there is no easy way to improve the performance without redesigning the model. If I were to do that, I would persist the B in a Map and return Collection of B from A using values.

Question 10

If so what kind of things could work?

Simply put: don't let anything else mutate your list without you knowing about it. I suspect you currently have something like:

public List<String> getAllBs() {
 return bs;
}

... and a similar setter. If you stop doing that, and instead just have appropriate mutation methods, then you can make sure that your code is the only code to mutate the list... which means you can either remember that your map is "dirty" or just mutate the map at the same time that you mutate the list.

Question 11

Yes I guess that would work. Out of interest do you have any comment on the computation of the hash?

Question 12

@phoenix7360: No, I wouldn't expect the hash to be computed any faster.

Question 13

You could implement your own class IndexedBArrayList which extends ArrayList<B>.

Then you add this functionality to it:

A private HashMap<String, B> index
All mutator methods of ArrayList are overridden to keep this index hash map updated in addition to calling the corresponding super-method.
A new public B getByString(String) method which uses the hash map

Question 14

From your description it does not seem that you need a List<B>.
Replace the List with a HashMap. If you need to search for Bs the best data structure is the hashmap and not the list.

Question 15

This is simple example to illustrate my question. Assume that by business requirement this has to be a List

Question 16

@phonenix7360: Assume that by business requirement this has to be a List Business requirements do not specify data structures. Developers use the best data-structure available for the implementation of the business requirements

Question 17

OK, imagine that the application is already using a list and your business requirement is to improve this piece of code without changing the rest of the application using this class. Imagine that you are using a JPA entity and you want to persist a List rather than a map.

Question 18

@phoenix7360: Then why not modify the JPA mappings to persist a map instead of a list?

Question 19

There could be other reasons why one MUST use an array list in this case, for example because other components expect an ArrayList and it would be too expensive to change these (sure, reeks of code smell, but reality isn't perfect).

jarnbjo jarnbjo 34.4k7 gold badges75 silver badges95 bronze badges · Accepted Answer · 2013-04-24 12:34:02Z

Without actually explaining why, you seem for some reason to believe that it is essential to keep the list structure as well. The only reasonable reason for this is that you need the order of the collection to be kept consistent. If you switch to a "plain" map, the order of the values is no longer constant, e.g. kept in the order in which you add the items to the map.

If you need both to keep the order (list behaviour) and access individual items using a key, you can use a LinkedHashMap, which essentially joins the behaviour of a LinkedList and a HashMap. Even if LinkedHashMap.values() returns a collection and not a list, the list behaviour is guaranteed within the collection.

Another issue with your question is, that you cannot use the list's hash code to safely determine changes. If the hash code has changed, you are indeed sure that the list has changed as well. If two hash codes are identical, you can still not be sure that the lists are actually identical. E.g. if the hash code implementation is based on strings, the hash codes for "1a" and "2B" are identical.

Ok this is quite interesting. The reason I have been insisting on a list is because I am working with a large application that has already been written this way. I was trying to see whether there was a light weight solution that wouldn't involved refactoring a lot of existing code. However it there is no such solution, yes switching for a better data structure seems appropriate. I quite like your LinkedHashMap. I probably should mention that A and B are JPA entities. Not sure how it impacts the data structure choice.
From all these discussions I take that there is no easy way to improve the performance without redesigning the model. If I were to do that, I would persist the B in a Map and return Collection of B from A using values.

CollectivesTM on Stack Overflow

JVM optimisation of hashCode() on List

4 Answers 4

2 Comments

2 Comments

Comments

10 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

4 Answers 4

2 Comments

2 Comments

Comments

10 Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related