I am trying to understand why JVM's default implementation does not return same hashcode()
value for all the objects...
I have written a program where i have overridden equals()
but not hashCode()
, and the consequences are scary.
HashSet
is adding two objects even the equals are same.TreeSet
is throwing exception with Comparable implementation..
And many more..
Had the default Object'shashCode()
implementation returns same int value, all these issues could have been avoided...
I understand their's alot written and discussed about hashcode() and equals()
but i am not able to understand why things cant be handled at by default, this is error prone and consequences could be really bad and scary..
Here's my sample program..
import java.util.HashSet;
import java.util.Set;
public class HashcodeTest {
public static void main(String...strings ) {
Car car1 = new Car("honda", "red");
Car car2 = new Car("honda", "red");
Set<Car> set = new HashSet<Car>();
set.add(car1);
set.add(car2);
System.out.println("size of Set : "+set.size());
System.out.println("hashCode for car1 : "+car1.hashCode());
System.out.println("hashCode for car2 : "+car2.hashCode());
}
}
class Car{
private String name;
private String color;
public Car(String name, String color) {
super();
this.name = name;
this.color = color;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getColor() {
return color;
}
public void setColor(String color) {
this.color = color;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Car other = (Car) obj;
if (color == null) {
if (other.color != null)
return false;
} else if (!color.equals(other.color))
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}
Output:
size of Set : 2
hashCode for car1 : 330932989
hashCode for car2 : 8100393
5 Answers 5
It seems that you want to propose to calculate hashCode
by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:
Objects are connected into double-linked list (every object has
previous
andnext
fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?
Suppose you have two
HashSet
objects. First is created like:HashSet<Integer> a = new HashSet<>(); a.add(1);
The second is created like this:
HashSet<Integer> b = new HashSet<>(); for(int i=1; i<1000; i++) b.add(i); for(int i=2; i<1000; i++) b.remove(i);
From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of
hashCode
andequals
.Suppose you have different
List
implementations. For example,ArrayList
andLinkedList
. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they indeed equal and have the same hashCode. However their internal structure is completely different:ArrayList
contains an array whileLinkedList
contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.
So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError
or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode
and equals
requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals
, you just must redefine hashCode
as well.
-
1you just nailed it, thanks for sharing your analysis.Rupesh– Rupesh2016年02月19日 10:46:38 +00:00Commented Feb 19, 2016 at 10:46
You broke the contract.
hashcode and equals should be written in such a way, that when equals return true these objects has same hashcode.
If you override equals then you must provide hashcode that works properly.
Default implementation can't handle it, because default implementation don't know which fields are important. And automatic implementation would not do it in efficient way, the hashcode function is to speed up operations like data lookup in data structures, if it is implemented improperly, then performance will suffer.
-
Yes that i understand and agree but i broke it unknowingly... why the default implementation cannot or should not handle it.Rupesh– Rupesh2016年02月19日 10:00:55 +00:00Commented Feb 19, 2016 at 10:00
-
The contract that
Object
obeys by havingequals()
asa == b
which guarantees thata.hashCode() == b.hashCode()
.Kayaman– Kayaman2016年02月19日 10:01:11 +00:00Commented Feb 19, 2016 at 10:01
From the Docs
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
From documentation:
If two objects are equal according to the equals(Object) method, then calling the hashCode} method on each of the two objects must produce the same integer result.
then if you overrides how equals() behave, you must override hashCode() as well.
Also, from docs of equals() -
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
From javadoc of Object
class:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
Thus if default implementation provides the same hash, it defeats the purpose.
And for a default implementation, it cannot assume all the classes are of value class, thus the last sentence from doc:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
hashCode
always returned the same value by default, you would get terrible performance from hash-based data structures.equals
without overridinghashCode
consistently that makes it functionally incorrect.