Reason behind JVM's default Object.HashCode() implementation

Question 1

I am trying to understand why JVM's default implementation does not return same hashcode() value for all the objects...

I have written a program where i have overridden equals() but not hashCode(), and the consequences are scary.

HashSet is adding two objects even the equals are same.
TreeSet is throwing exception with Comparable implementation..

And many more..

Had the default Object'shashCode() implementation returns same int value, all these issues could have been avoided...

I understand their's alot written and discussed about hashcode() and equals() but i am not able to understand why things cant be handled at by default, this is error prone and consequences could be really bad and scary..

Here's my sample program..

import java.util.HashSet;
import java.util.Set;
public class HashcodeTest {
 public static void main(String...strings ) {
 Car car1 = new Car("honda", "red");
 Car car2 = new Car("honda", "red");
 Set<Car> set = new HashSet<Car>();
 set.add(car1);
 set.add(car2);
 System.out.println("size of Set : "+set.size());
 System.out.println("hashCode for car1 : "+car1.hashCode());
 System.out.println("hashCode for car2 : "+car2.hashCode());
 }
}
class Car{
 private String name;
 private String color;
 public Car(String name, String color) {
 super();
 this.name = name;
 this.color = color;
 }
 public String getName() {
 return name;
 }
 public void setName(String name) {
 this.name = name;
 }
 public String getColor() {
 return color;
 }
 public void setColor(String color) {
 this.color = color;
 }
 @Override
 public boolean equals(Object obj) {
 if (this == obj)
 return true;
 if (obj == null)
 return false;
 if (getClass() != obj.getClass())
 return false;
 Car other = (Car) obj;
 if (color == null) {
 if (other.color != null)
 return false;
 } else if (!color.equals(other.color))
 return false;
 if (name == null) {
 if (other.name != null)
 return false;
 } else if (!name.equals(other.name))
 return false;
 return true;
 }
}

Output:

size of Set : 2

hashCode for car1 : 330932989

hashCode for car2 : 8100393

Question 2

If the default implementation of hashCode always returned the same value by default, you would get terrible performance from hash-based data structures.

Question 3

Agree but it would be better than the wrong or incorrect functionality...

Question 4

Possible duplicate of Why do I need to override the equals and hashCode methods in Java?

Question 5

I believe you should be functionally correct before you look for the performance

Question 6

The default functionality is correct, though, and gives good performance in hash-based data structures. It is the fact that you have overridden equals without overriding hashCode consistently that makes it functionally incorrect.

Question 7

It seems that you want to propose to calculate hashCode by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:

Objects are connected into double-linked list (every object has previous and next fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.
Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?
Suppose you have two HashSet objects. First is created like:
```
HashSet<Integer> a = new HashSet<>();
a.add(1);
```
The second is created like this:
```
HashSet<Integer> b = new HashSet<>();
for(int i=1; i<1000; i++) b.add(i);
for(int i=2; i<1000; i++) b.remove(i);
```
From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of hashCode and equals.
Suppose you have different List implementations. For example, ArrayList and LinkedList. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they indeed equal and have the same hashCode. However their internal structure is completely different: ArrayList contains an array while LinkedList contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.
Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.

So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode and equals requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals, you just must redefine hashCode as well.

Question 8

you just nailed it, thanks for sharing your analysis.

Question 9

You broke the contract.

hashcode and equals should be written in such a way, that when equals return true these objects has same hashcode.

If you override equals then you must provide hashcode that works properly.

Default implementation can't handle it, because default implementation don't know which fields are important. And automatic implementation would not do it in efficient way, the hashcode function is to speed up operations like data lookup in data structures, if it is implemented improperly, then performance will suffer.

Question 10

Yes that i understand and agree but i broke it unknowingly... why the default implementation cannot or should not handle it.

Question 11

The contract that Object obeys by having equals() as a == b which guarantees that a.hashCode() == b.hashCode().

Question 12

From the Docs

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

Question 13

From documentation:

If two objects are equal according to the equals(Object) method, then calling the hashCode} method on each of the two objects must produce the same integer result.

then if you overrides how equals() behave, you must override hashCode() as well.

Also, from docs of equals() -

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

Question 14

From javadoc of Object class:

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

Thus if default implementation provides the same hash, it defeats the purpose.

And for a default implementation, it cannot assume all the classes are of value class, thus the last sentence from doc:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

Tagir Valeev Tagir Valeev 101k19 gold badges233 silver badges346 bronze badges · Accepted Answer · 2016-02-19 10:38:04Z

It seems that you want to propose to calculate hashCode by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:

Objects are connected into double-linked list (every object has previous and next fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.
Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?
Suppose you have two HashSet objects. First is created like:
```
HashSet<Integer> a = new HashSet<>();
a.add(1);
```
The second is created like this:
```
HashSet<Integer> b = new HashSet<>();
for(int i=1; i<1000; i++) b.add(i);
for(int i=2; i<1000; i++) b.remove(i);
```
From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of hashCode and equals.
Suppose you have different List implementations. For example, ArrayList and LinkedList. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they indeed equal and have the same hashCode. However their internal structure is completely different: ArrayList contains an array while LinkedList contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.
Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.

So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode and equals requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals, you just must redefine hashCode as well.

1

you just nailed it, thanks for sharing your analysis.

Rupesh
– Rupesh

2016年02月19日 10:46:38 +00:00
Commented Feb 19, 2016 at 10:46

CollectivesTM on Stack Overflow

Reason behind JVM's default Object.HashCode() implementation

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related