1

Imagine a simple case:

class B{
 public final String text;
 public B(String text){
 this.text = text;
 }
}
class A {
 private List<B> bs = new ArrayList<B>;
 public B getB(String text){
 for(B b :bs){
 if(b.text.equals(text)){
 return b;
 }
 }
 return null;
 }
 [getter/setter]
}

Imagine that for each instance of A, the List<B> is large and we need to call getB(String) often. However assume that it is also possible for the list to change (add/remove element, or even being reassigned).

At this stage, the average complexity for getB(String) is O(n). In order to improved that I was wondering if we could use some clever caching.

Imagine we cache the List<B> in a Map<String, B> where the key is B.text. That would improve the performance but it won't work if the list is changed (new element or deleted element) or reassigned (A.bs points to a new reference).

To go around that I thought that, along with the Map<String, B>, we could store a hash of the list bs. When we call getB(String) method, we compute the hash of the list bs. If the hash hasn't changed, we fetch the result from the map, if it has we reload the map.

The problem is that computing the hash for a java.util.List goes through all the element of the list and computes their hash, which is at least O(n).

Question

What I'd like to know is whether the JVM will be faster at computing the hash for the List than going through my loop in the getB(String) method. May be that depends on the implementation of hash for B. If so what kind of things could work? In a nutshell, I'd like to know whether this is stupid or could bring some performance improvement.

Marat Safin
1,92924 silver badges30 bronze badges
asked Apr 24, 2013 at 12:12
6
  • 2
    Just mutate the Map directly (instead of mutating the List and caching it in a Map) Commented Apr 24, 2013 at 12:19
  • This is just a simple example to illustrate a problem. Imagine that you are working with an existing application that already expects a List. Changing the List to a Map is not an option because you don't want to change the rest of the application too much Commented Apr 24, 2013 at 12:24
  • You can't theoretically recalculate the hash code after a random list change without going through all the elements again. Commented Apr 24, 2013 at 12:32
  • 1
    @phoenix7360 The existing application is badly designed then. It should either expect an Iterable or a Collection so you can pass it the map's valueSet(). I don't think it's very reasonable to want to have read-write O(1) access using two sets of keys, not without a custom collection designed to support that. Commented Apr 24, 2013 at 12:32
  • @MarkoTopolnik You could maybe kinda sorta do this if you didn't care about the order of the items in the hash, seeing as XOR is commutative. Whenever an item is inserted or deleted, set the hash to the previous value ^ hash of the item being changed. (This would increase the chance of collisions a fair bit, depending on the input data.) That said, I don't think the OP is really asking about using the List itself as a hash key, the question is confusingly phrased. Commented Apr 24, 2013 at 12:41

4 Answers 4

5

Without actually explaining why, you seem for some reason to believe that it is essential to keep the list structure as well. The only reasonable reason for this is that you need the order of the collection to be kept consistent. If you switch to a "plain" map, the order of the values is no longer constant, e.g. kept in the order in which you add the items to the map.

If you need both to keep the order (list behaviour) and access individual items using a key, you can use a LinkedHashMap, which essentially joins the behaviour of a LinkedList and a HashMap. Even if LinkedHashMap.values() returns a collection and not a list, the list behaviour is guaranteed within the collection.

Another issue with your question is, that you cannot use the list's hash code to safely determine changes. If the hash code has changed, you are indeed sure that the list has changed as well. If two hash codes are identical, you can still not be sure that the lists are actually identical. E.g. if the hash code implementation is based on strings, the hash codes for "1a" and "2B" are identical.

answered Apr 24, 2013 at 12:34
Sign up to request clarification or add additional context in comments.

2 Comments

Ok this is quite interesting. The reason I have been insisting on a list is because I am working with a large application that has already been written this way. I was trying to see whether there was a light weight solution that wouldn't involved refactoring a lot of existing code. However it there is no such solution, yes switching for a better data structure seems appropriate. I quite like your LinkedHashMap. I probably should mention that A and B are JPA entities. Not sure how it impacts the data structure choice.
From all these discussions I take that there is no easy way to improve the performance without redesigning the model. If I were to do that, I would persist the B in a Map and return Collection of B from A using values.
1

If so what kind of things could work?

Simply put: don't let anything else mutate your list without you knowing about it. I suspect you currently have something like:

public List<String> getAllBs() {
 return bs;
}

... and a similar setter. If you stop doing that, and instead just have appropriate mutation methods, then you can make sure that your code is the only code to mutate the list... which means you can either remember that your map is "dirty" or just mutate the map at the same time that you mutate the list.

answered Apr 24, 2013 at 12:18

2 Comments

Yes I guess that would work. Out of interest do you have any comment on the computation of the hash?
@phoenix7360: No, I wouldn't expect the hash to be computed any faster.
1

You could implement your own class IndexedBArrayList which extends ArrayList<B>.

Then you add this functionality to it:

  • A private HashMap<String, B> index
  • All mutator methods of ArrayList are overridden to keep this index hash map updated in addition to calling the corresponding super-method.
  • A new public B getByString(String) method which uses the hash map
answered Apr 24, 2013 at 12:26

Comments

0

From your description it does not seem that you need a List<B>.
Replace the List with a HashMap. If you need to search for Bs the best data structure is the hashmap and not the list.

answered Apr 24, 2013 at 12:20

10 Comments

This is simple example to illustrate my question. Assume that by business requirement this has to be a List
@phonenix7360: Assume that by business requirement this has to be a List Business requirements do not specify data structures. Developers use the best data-structure available for the implementation of the business requirements
OK, imagine that the application is already using a list and your business requirement is to improve this piece of code without changing the rest of the application using this class. Imagine that you are using a JPA entity and you want to persist a List rather than a map.
@phoenix7360: Then why not modify the JPA mappings to persist a map instead of a list?
There could be other reasons why one MUST use an array list in this case, for example because other components expect an ArrayList and it would be too expensive to change these (sure, reeks of code smell, but reality isn't perfect).
|

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.