2

I am trying to understand why JVM's default implementation does not return same hashcode() value for all the objects...

I have written a program where i have overridden equals() but not hashCode(), and the consequences are scary.

  1. HashSet is adding two objects even the equals are same.
  2. TreeSet is throwing exception with Comparable implementation..

And many more..

Had the default Object'shashCode() implementation returns same int value, all these issues could have been avoided...

I understand their's alot written and discussed about hashcode() and equals() but i am not able to understand why things cant be handled at by default, this is error prone and consequences could be really bad and scary..

Here's my sample program..

import java.util.HashSet;
import java.util.Set;
public class HashcodeTest {
 public static void main(String...strings ) {
 Car car1 = new Car("honda", "red");
 Car car2 = new Car("honda", "red");
 Set<Car> set = new HashSet<Car>();
 set.add(car1);
 set.add(car2);
 System.out.println("size of Set : "+set.size());
 System.out.println("hashCode for car1 : "+car1.hashCode());
 System.out.println("hashCode for car2 : "+car2.hashCode());
 }
}
class Car{
 private String name;
 private String color;
 public Car(String name, String color) {
 super();
 this.name = name;
 this.color = color;
 }
 public String getName() {
 return name;
 }
 public void setName(String name) {
 this.name = name;
 }
 public String getColor() {
 return color;
 }
 public void setColor(String color) {
 this.color = color;
 }
 @Override
 public boolean equals(Object obj) {
 if (this == obj)
 return true;
 if (obj == null)
 return false;
 if (getClass() != obj.getClass())
 return false;
 Car other = (Car) obj;
 if (color == null) {
 if (other.color != null)
 return false;
 } else if (!color.equals(other.color))
 return false;
 if (name == null) {
 if (other.name != null)
 return false;
 } else if (!name.equals(other.name))
 return false;
 return true;
 }
}

Output:

size of Set : 2

hashCode for car1 : 330932989

hashCode for car2 : 8100393

asked Feb 19, 2016 at 9:56
8
  • If the default implementation of hashCode always returned the same value by default, you would get terrible performance from hash-based data structures. Commented Feb 19, 2016 at 10:02
  • Agree but it would be better than the wrong or incorrect functionality... Commented Feb 19, 2016 at 10:04
  • 1
    Possible duplicate of Why do I need to override the equals and hashCode methods in Java? Commented Feb 19, 2016 at 10:04
  • I believe you should be functionally correct before you look for the performance Commented Feb 19, 2016 at 10:04
  • 1
    The default functionality is correct, though, and gives good performance in hash-based data structures. It is the fact that you have overridden equals without overriding hashCode consistently that makes it functionally incorrect. Commented Feb 19, 2016 at 10:06

5 Answers 5

3

It seems that you want to propose to calculate hashCode by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:

  • Objects are connected into double-linked list (every object has previous and next fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.

  • Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?

  • Suppose you have two HashSet objects. First is created like:

    HashSet<Integer> a = new HashSet<>();
    a.add(1);
    

    The second is created like this:

    HashSet<Integer> b = new HashSet<>();
    for(int i=1; i<1000; i++) b.add(i);
    for(int i=2; i<1000; i++) b.remove(i);
    

    From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of hashCode and equals.

  • Suppose you have different List implementations. For example, ArrayList and LinkedList. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they indeed equal and have the same hashCode. However their internal structure is completely different: ArrayList contains an array while LinkedList contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.

  • Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.

So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode and equals requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals, you just must redefine hashCode as well.

answered Feb 19, 2016 at 10:38
1
  • 1
    you just nailed it, thanks for sharing your analysis. Commented Feb 19, 2016 at 10:46
3

You broke the contract.

hashcode and equals should be written in such a way, that when equals return true these objects has same hashcode.

If you override equals then you must provide hashcode that works properly.

Default implementation can't handle it, because default implementation don't know which fields are important. And automatic implementation would not do it in efficient way, the hashcode function is to speed up operations like data lookup in data structures, if it is implemented improperly, then performance will suffer.

answered Feb 19, 2016 at 9:59
2
  • Yes that i understand and agree but i broke it unknowingly... why the default implementation cannot or should not handle it. Commented Feb 19, 2016 at 10:00
  • The contract that Object obeys by having equals() as a == b which guarantees that a.hashCode() == b.hashCode(). Commented Feb 19, 2016 at 10:01
0

From the Docs

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

answered Feb 19, 2016 at 10:02
0

From documentation:

If two objects are equal according to the equals(Object) method, then calling the hashCode} method on each of the two objects must produce the same integer result.

then if you overrides how equals() behave, you must override hashCode() as well.

Also, from docs of equals() -

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

answered Feb 19, 2016 at 10:02
0

From javadoc of Object class:

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

Thus if default implementation provides the same hash, it defeats the purpose.

And for a default implementation, it cannot assume all the classes are of value class, thus the last sentence from doc:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

answered Feb 19, 2016 at 10:11

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.