194

We are used to saying that HashMap get/put operations are O(1). However, it depends on the hash implementation. The default object hash is actually the internal address in the JVM heap. Are we sure it is good enough to claim that get/put are O(1)?

Available memory is another issue. As I understand from the javadocs, the default HashMap load factor is 0.75. What if we do not have enough memory in the JVM and the load factor exceeds the limit?

So, it looks like O(1) is not guaranteed. Does it make sense or am I missing something?

asked Dec 29, 2010 at 11:22
3 Comments

You might want to look up the concept of amortized complexity. See for instance here: stackoverflow.com/questions/3949217/time-complexity-of-hash-table - worst-case complexity is not the most important measure for a hash table. Commented Dec 29, 2010 at 12:20
Correct -- it's amortized O(1) -- never forget that first part and you won't have these kinds of questions :) Commented Jan 3, 2014 at 10:52
Time complexity worst case is O(log N) since Java 1.8, if I'm not wrong. Commented May 22, 2019 at 16:15

8 Answers

319

It depends on many things. It's usually O(1), with a decent hash which itself is constant time... but you could have a hash which takes a long time to compute, and if there are multiple items in the hash map which return the same hash code, get will have to iterate over them calling equals on each of them to find a match.

In the worst case, a HashMap has an O(n) lookup due to walking through all entries in the same hash bucket (e.g. if they all have the same hash code). Fortunately, that worst case scenario doesn't come up very often in real life, in my experience. So no, O(1) certainly isn't guaranteed - but it's usually what you should assume when considering which algorithms and data structures to use.

In JDK 8, HashMap has been tweaked so that if keys can be compared for ordering, then any densely-populated bucket is implemented as a tree, so that even if there are lots of entries with the same hash code, the complexity is O(log n). That can cause issues if you have a key type where equality and ordering are different, of course.
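
As a minimal sketch of that worst case and of the Java 8 treeification (the key class and the sizes below are my own illustration): every key hashes to the same bucket, but because the keys are Comparable, a Java 8+ HashMap treeifies the bucket and lookups stay O(log n) rather than O(n).

import java.util.HashMap;
import java.util.Map;

// Illustrative key type: every instance collides on purpose.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;
    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }              // all keys land in one bucket
    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }
    @Override public int compareTo(CollidingKey other) {        // ordering consistent with equals
        return Integer.compare(id, other.id);
    }

    public static void main(String[] args) {
        Map<CollidingKey, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new CollidingKey(i), "value-" + i);
        }
        // All 10,000 entries share one bucket; on Java 8+ that bucket is a
        // red-black tree, so this get walks roughly log2(10,000) ≈ 14 nodes
        // instead of scanning a 10,000-element list.
        System.out.println(map.get(new CollidingKey(9_999)));
    }
}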

And yes, if you don't have enough memory for the hash map, you'll be in trouble... but that's going to be true whatever data structure you use.

answered Dec 29, 2010 at 11:25

11 Comments

@marcog: So what are you assuming to be O(n log n)? Insertion of n items?
+1 for a good answer. Would you please provide links like this wikipedia entry for hash table in your answer? That way, the more interested reader could get to the nitty gritty of understanding why you gave your answer.
Great! So many people think that hashmap complexity is at most O(1), which is wrong. Once I was refused a job because I said that in the worst case a map lookup is O(n)... It was an interview at the company of Stack Overflow's founder: Fog Creek (maker of Trello!)
@SleimanJneidi: It still is if the key doesn't implement Comparable<T> - but I'll update the answer when I have more time.
@ip696: Yes, put is "amortized O(1)" - usually O(1), occasionally O(n) - but rarely enough to balance out.
19

It has already been mentioned that hashmaps are O(n/m) on average, if n is the number of items and m is the size. It has also been mentioned that in principle the whole thing could collapse into a singly linked list with O(n) query time. (This all assumes that calculating the hash is constant time.)

However, what isn't often mentioned is that with probability at least 1 - 1/n (so for 1000 items that's a 99.9% chance) the largest bucket won't receive more than O(log n) elements! Hence it matches the average complexity of binary search trees. (And the constant is good; a tighter bound is (log n)*(m/n) + O(1).)

All that's required for this theoretical bound is that you use a reasonably good hash function (see Wikipedia: Universal Hashing; it can be as simple as a*x >> m), and of course that the person giving you the values to hash doesn't know how you have chosen your random constants.
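
A minimal sketch of such a multiply-shift hash, assuming 64-bit keys and a table of 2^bits buckets (the class, field and method names are illustrative):

import java.util.concurrent.ThreadLocalRandom;

final class MultiplyShiftHash {
    // Random odd multiplier, chosen once per table and kept secret from the
    // party supplying the keys - that secrecy is what the probabilistic bound relies on.
    private final long a = ThreadLocalRandom.current().nextLong() | 1L;
    private final int bits;   // log2 of the number of buckets, 1..63

    MultiplyShiftHash(int bits) { this.bits = bits; }

    int bucketOf(long x) {
        // Keep the top 'bits' bits of a*x; the family x -> (a*x) >>> (64 - bits)
        // is close to universal, so long chains are unlikely for any fixed input set.
        return (int) ((a * x) >>> (64 - bits));
    }
}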

TL;DR: With very high probability, the worst-case get/put complexity of a hashmap is O(log n).

answered May 30, 2014 at 12:36

4 Comments

(And notice that none of this assumes random data. The probability arises purely from the choice of hash function)
I also had the same question regarding the runtime complexity of a lookup in a hash map. It would seem it's O(n), as constant factors are supposed to be dropped. The 1/m is a constant factor and thus is dropped, leaving O(n).
People need to learn what Big Theta is and use that when they mean to say "average Big-O", as Big-O is worst-case scenario.
@nickdu The 1/m is certainly not a constant factor. You need to grow m (the number of buckets) as fast as n (the number of items). You won't get much benefit from a hashmap with a constant m.
10

I agree with:

  • the general amortized complexity of O(1)
  • a bad hashCode() implementation could result in multiple collisions, which means that in the worst case every object goes to the same bucket, thus O(N) if each bucket is backed by a List.
  • since Java 8, HashMap dynamically replaces the Nodes (linked list) used in each bucket with TreeNodes (a red-black tree, once a list grows beyond 8 elements), resulting in a worst-case performance of O(log N).

But this is not the whole truth if we want to be 100% precise. The implementation of hashCode() and the type of the key object (immutable with a cached hash, or a Collection) might also affect the real time complexity in strict terms.

Let's assume the following three cases:

  1. HashMap<Integer, V>
  2. HashMap<String, V>
  3. HashMap<List<E>, V>

Do they have the same complexity? Well, the amortised complexity of the 1st one is, as expected, O(1). But for the rest, we also need to compute the hashCode() of the lookup element, which means we might have to traverse arrays and lists in our algorithm.

Let's assume that the size of all of the above arrays/lists is k. Then, HashMap<String, V> and HashMap<List<E>, V> will have O(k) amortised complexity and, similarly, O(k + log N) worst case in Java 8.

Note that using a String key is a more complex case, because it is immutable and Java caches the result of hashCode() in a private variable hash, so it is only computed once.

/** Cache the hash code for the string */
private int hash; // Default to 0

But the above also has its own worst case, because Java's String.hashCode() implementation checks whether hash == 0 before computing the hash code. And there are non-empty Strings that hash to zero, such as "f5a5a608" (see here), in which case memoization might not be helpful.
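
A small sketch of the difference, assuming Java 11+ for String.repeat (variable names are illustrative): the String key pays its O(k) hash cost once and caches it, while the List key recomputes its O(k) hash on every get or put.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyHashCost {
    public static void main(String[] args) {
        Map<String, Integer> byString = new HashMap<>();
        String s = "a".repeat(1_000_000);        // k = 1,000,000 characters
        byString.put(s, 1);                      // computes and caches the hash once
        byString.get(s);                         // reuses the cached hash

        Map<List<Integer>, Integer> byList = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) key.add(i);
        byList.put(key, 1);                      // walks all 1,000,000 elements
        byList.get(key);                         // walks all 1,000,000 elements again
    }
}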

Coder
answered Oct 21, 2018 at 19:24

Comments

9

I'm not sure the default hashcode is the address - I read the OpenJDK source for hashcode generation a while ago, and I remember it being something a bit more complicated. Still not something that guarantees a good distribution, perhaps. However, that is to some extent moot, as few classes you'd use as keys in a hashmap use the default hashcode - they supply their own implementations, which ought to be good.

On top of that, what you may not know (again, this is based on reading the source - it's not guaranteed) is that HashMap stirs the hash before using it, to mix entropy from throughout the word into the bottom bits, which is where it's needed for all but the hugest hashmaps. That helps deal with hashes that specifically don't do that themselves, although I can't think of any common cases where you'd see that.

Finally, what happens when the table is overloaded is that it degenerates into a set of parallel linked lists - performance becomes O(n). Specifically, the number of links traversed will on average be half the load factor.
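
For reference, in OpenJDK 8 and later that stirring step is (to the best of my reading of the source) a single XOR that folds the high half of the hash into the low half, since the bucket index only uses the low bits:

// Roughly HashMap.hash(Object) in OpenJDK 8+; earlier versions used a longer
// sequence of shifts and XORs, but the intent - spreading entropy from the
// high bits into the bits actually used for indexing - is the same.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}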

answered Dec 29, 2010 at 11:43

1 Comment

Dammit. I choose to believe that if I hadn't had to type this on a flipping mobile phone touchscreen, I could have beaten Jon Skeet to the punch. There's a badge for that, right?
8

HashMap performance depends on the hashCode implementation. In the ideal scenario, say a good hash implementation that provides a unique hash code for every object (no hash collisions), the best, worst and average case would all be O(1). Now consider a scenario where a bad implementation of hashCode always returns 1, or some hash that always collides. In that case the time complexity would be O(n).

Now coming to the second part of the question about memory: yes, the memory constraint is taken care of by the JVM.
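
As a hedged illustration of the capacity/load-factor relationship behind the memory concern (the two-argument constructor is standard java.util.HashMap API; the sizes are my own example):

import java.util.HashMap;
import java.util.Map;

public class CapacityExample {
    public static void main(String[] args) {
        // With 16 buckets and load factor 0.75, the map resizes (doubles the
        // bucket array and rehashes everything) once it holds more than
        // 16 * 0.75 = 12 entries.
        Map<String, Integer> defaults = new HashMap<>(16, 0.75f);

        // Expecting roughly 1,000 entries? Pre-sizing avoids the intermediate
        // resizes, since 2048 * 0.75 = 1536 >= 1000.
        Map<String, Integer> preSized = new HashMap<>(2048, 0.75f);

        defaults.put("a", 1);
        preSized.put("b", 2);
    }
}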

answered Jul 13, 2015 at 8:07

Comments

4

In practice, it is O(1), but this actually is a terrible and mathematically nonsensical simplification. The O() notation says how the algorithm behaves when the size of the problem tends to infinity. HashMap get/put works like an O(1) algorithm for a limited size. The limit is fairly large from the point of view of computer memory and addressing, but far from infinity.

When one says that hashmap get/put is O(1), it should really say that the time needed for the get/put is more or less constant and does not depend on the number of elements in the hashmap, so far as the hashmap can be represented on an actual computing system. If the problem goes beyond that size and we need larger hashmaps then, after a while, certainly the number of bits describing one element will also increase as we run out of possible describable different elements. For example, if we used a hashmap to store 32-bit numbers and later we increase the problem size so that we have more than 2^32 elements in the hashmap, then the individual elements will have to be described with more than 32 bits.

The number of bits needed to describe the individual elements is log(N), where N is the maximum number of elements; therefore get and put are really O(log N).

If you compare it with a tree set, which is O(log n), then a hash set is O(log(max(n))), and we simply feel that this is O(1), because in a given implementation max(n) is fixed, does not change (the size of the objects we store, measured in bits), and the algorithm calculating the hash code is fast.

Finally, if finding an element in any data structure were O(1), we would create information out of thin air. Having a data structure of n elements, I can select one element in n different ways. With that, I can encode log(n) bits of information. If I can encode that in zero bits (that is what O(1) means) then I have created an infinitely compressing ZIP algorithm.

James Hiew
answered May 4, 2018 at 14:02

2 Comments

Shouldn't the complexity for the tree set be O(log(n) * log(max(n))), then? While the comparison at every node may be smarter, in the worst case it needs to inspect all O(log(max(n))) bits, right?
The complexity for the trees is O(log(max(n))), though your thinking is logical. Comparing two N-bit numbers needs the comparison of an average of only 2 bits. We have 2^N numbers and (2^N)^2 possible comparisons. Half of them will be decided by looking only at the first bit. Half of the rest need another (two) bits, and so on. The average number of bits is 2 - a fixed time, not log(N). At the same time, the calculation of the hash function needs to visit all the bits.
3
Java HashMap time complexity
----------------------------

get(key) / contains(key) / remove(key)              Best case   Worst case
HashMap before Java 8, using LinkedList buckets     O(1)        O(n)
HashMap after Java 8, using LinkedList buckets      O(1)        O(n)
HashMap after Java 8, using Binary Tree buckets     O(1)        O(log n)

put(key, value)                                     Best case   Worst case
HashMap before Java 8, using LinkedList buckets     O(1)        O(1)
HashMap after Java 8, using LinkedList buckets      O(1)        O(1)
HashMap after Java 8, using Binary Tree buckets     O(1)        O(log n)

Hints:

  • Before Java 8, HashMap uses LinkedList buckets.

  • After Java 8, HashMap uses either LinkedList buckets or Binary Tree buckets according to the bucket size.

    if (bucket size > TREEIFY_THRESHOLD [8]):

    treeifyBin: the bucket becomes a balanced binary red-black tree

    if (bucket size <= UNTREEIFY_THRESHOLD [6]):

    untreeify: the bucket reverts to a LinkedList (plain mode)

answered Jan 5, 2023 at 16:16

1 Comment

The worst case for put(key, value) is a map resize: when the current capacity threshold is reached, new memory is allocated to double the map's size and the existing elements are copied over, so the complexity of that put is O(n).
2

In simple words: if each bucket contains only a single node, the time complexity is O(1). If a bucket contains more than one node, the time complexity is O(linked list size), which is still more efficient than O(n) over the whole map.

Hence we can say that the average-case time complexity of the put(K, V) function is governed by the load factor:

nodes(n)/buckets(N) = λ (lambda)

Example : 16/16 = 1

Time complexity will be O(1)

answered Nov 20, 2022 at 7:19

1 Comment

First of all, at the first level a HashMap has buckets, meaning an array of linked lists (or red-black trees, if the number of elements in a bucket exceeds 8). So addressing the right bucket (the index is based on the hash) is O(1).
