
I encountered a problem a few days ago. I now have tens of millions of words, of type String. I've decided to keep them in a database and use an index to keep them unique, and I don't want to compare the original words to enforce that uniqueness. I'd like to know whether the hashCode() method of a String is unique, and whether its value will change if I use another laptop, run it at a different time, or something like that.

asked Sep 9, 2014 at 3:59
  • No, it is deterministic. Commented Sep 9, 2014 at 4:00
  • How many distinct values can hashCode return? How many distinct strings are there? GO!!! FIT IN! Commented Sep 9, 2014 at 4:00
  • Anyway, tl;dr: this is not a suitable use of hashCode. A hash like SHA-x doesn't have these "issues" (or rather, we can pretend that collisions are too unlikely to care about), but if you're just hashing single words, the resulting hash (20 bytes for SHA-1) is often larger than the original input! No win! Commented Sep 9, 2014 at 4:02
  • Assuming an RDBMS, add an AUTO-INCREMENT field to your table and it will be populated with unique numbers during INSERT. Commented Sep 9, 2014 at 4:11
  • @user2864740 String's hashCode is defined by the String API specification, so it is required to be the same for equal strings across implementations. Commented Sep 9, 2014 at 4:12
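To put numbers on the SHA-1 comment, here is a minimal sketch using the JDK's java.security.MessageDigest. It hashes a short word ("hello" is just an arbitrary example) and prints both the input size and the digest size; a SHA-1 digest is always 20 bytes, whatever the input length.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Length {
    // Returns the SHA-1 digest of the word's UTF-8 bytes (always 20 bytes long).
    static byte[] sha1(String word) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(word.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Java platform implementations are required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String word = "hello"; // arbitrary short example word
        int inputBytes = word.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("input bytes: " + inputBytes);        // 5
        System.out.println("SHA-1 bytes: " + sha1(word).length); // 20
    }
}
```

For typical dictionary words, the fixed 20-byte digest is larger than the word itself, which is the commenter's point.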

3 Answers


Unique, no. By nature, hash values are not guaranteed to be unique.

Any system with an arbitrarily large number of possible inputs and a limited number of outputs will have collisions.

So you won't be able to use a unique database key to store them if it's based only on the hash code. You can, however, use a non-unique key on the hash code and then compare the actual strings among the rows that share it.
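With a non-unique key, rows that share a hash code still have to be told apart by comparing the words themselves. Below is a minimal in-memory sketch of that idea. WordStore is a hypothetical class standing in for a table with a non-unique index on a hash-code column; it is not a database API. The strings "FB" and "Ea" are used because they share the hash code 2236.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a table with a NON-unique index on the hash code.
public class WordStore {
    // hash code -> every distinct word stored under that hash code
    private final Map<Integer, List<String>> byHash = new HashMap<>();

    // Adds the word if it is not already stored; returns true if it was newly added.
    public boolean addIfAbsent(String word) {
        List<String> bucket = byHash.computeIfAbsent(word.hashCode(), h -> new ArrayList<>());
        // The hash code only narrows the search; comparing full strings decides uniqueness.
        if (bucket.contains(word)) {
            return false;
        }
        bucket.add(word);
        return true;
    }

    // Total number of distinct words stored.
    public int size() {
        int total = 0;
        for (List<String> bucket : byHash.values()) {
            total += bucket.size();
        }
        return total;
    }

    public static void main(String[] args) {
        WordStore store = new WordStore();
        System.out.println(store.addIfAbsent("FB")); // true
        System.out.println(store.addIfAbsent("Ea")); // true: same hash code, different word
        System.out.println(store.addIfAbsent("FB")); // false: already stored
        System.out.println(store.size());            // 2
    }
}
```

The design choice is the same one a database makes with a non-unique index: the hash lookup is cheap, and the string comparison only runs on the few rows in the matching bucket.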

In reply to your second question about whether different versions of Java will generate different hash codes for the same string, no.

Provided a Java implementation follows the Oracle documentation (otherwise it's not really a Java implementation), it will be consistent across all implementations. The Oracle docs for String.hashCode specify a fixed formula for calculating the hash:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

You may want to check that this is still the case if you're using wildly disparate versions of Java (such as 1.2 vs 8), but it's been like that for a long time, at least since 1.5.
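One way to do that check on whatever JVM you have is a literal transcription of the formula. The sketch below uses formulaHash, a hypothetical helper that evaluates each term with its own power of 31 in int arithmetic, and prints it next to String.hashCode() for a few arbitrary sample strings. If each printed pair agrees, that JVM follows the documented formula.

```java
public class HashFormula {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] term by term,
    // in int arithmetic (overflow wraps around), exactly as the formula is written.
    static int formulaHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // builds 31^(n-1-i), wrapping on overflow
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "hello", "FB", "Ea", "a much longer string that overflows int"};
        for (String s : samples) {
            // Prints the formula's value and the JDK's value side by side.
            System.out.println("\"" + s + "\": " + formulaHash(s) + " / " + s.hashCode());
        }
    }
}
```

This quadratic version is only for checking the definition; the JDK itself uses an equivalent linear loop.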

answered Sep 9, 2014 at 4:05
  • In fact, by the pigeonhole principle, since there are (much) more than 2^32 possible strings, hash codes are guaranteed not to be unique. Commented Sep 9, 2014 at 4:07

No.

A string in Java can have at most 2,147,483,647 (2^31 - 1) characters, and each character can take many different values, so the number of possible strings is enormous. An int, however, can only hold values from -2,147,483,648 to 2,147,483,647. So unique hash codes are impossible. The hash code of a string is computed using this formula:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].

Example :

If you create two strings "FB" and "Ea", their hash codes will be the same (2236).
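The collision follows directly from the formula: 'F'*31 + 'B' = 70*31 + 66 = 2236 and 'E'*31 + 'a' = 69*31 + 97 = 2236. Because the formula also gives hash(x + y) = hash(x) * 31^len(y) + hash(y), equal-length colliding blocks can be concatenated to produce more collisions. A small sketch showing both:

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // Two different two-character strings with the same hash code.
        System.out.println("FB".hashCode()); // 2236
        System.out.println("Ea".hashCode()); // 2236

        // Swapping one colliding block for another of the same length keeps the hash unchanged,
        // so all four concatenations below collide as well.
        String[] blocks = {"FB", "Ea"};
        for (String a : blocks) {
            for (String b : blocks) {
                String s = a + b;
                System.out.println(s + " -> " + s.hashCode()); // each prints 2151032
            }
        }
    }
}
```

Repeating this with k blocks yields 2^k distinct strings sharing a single hash code, so collisions are easy to produce on purpose, not just by chance.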

answered Oct 11, 2017 at 10:39

Below is the hashCode computation a JVM performs for a String. As stated, it is calculated purely from the individual characters and their positions in the String; nothing in it depends on the JVM, or on the machine running the JVM, that would alter the hash code.

This is also one of the reasons the String class is declared final: it cannot be subclassed (which also protects its immutability), so no one can alter its behaviour.

Below is the specification:

public int hashCode()

Returns a hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
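The phrase "using int arithmetic" matters: for all but short strings the sum overflows, and the result is the exact mathematical value truncated to its low 32 bits. The sketch below shows this with two hypothetical helpers: hornerHash, the usual loop form h = 31*h + c (an algebraic rearrangement of the formula), and exactHash, the same recurrence computed with java.math.BigInteger so nothing overflows. The long word is an arbitrary example.

```java
import java.math.BigInteger;

public class HornerHash {
    // Horner's rule: ((s[0]*31 + s[1])*31 + s[2])*31 + ... expands to the spec formula.
    // Java int multiplication and addition wrap on overflow.
    static int hornerHash(String s) {
        int h = 0; // the empty string hashes to zero
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // The same recurrence with arbitrary-precision integers, so the value never wraps.
    static BigInteger exactHash(String s) {
        BigInteger thirtyOne = BigInteger.valueOf(31);
        BigInteger h = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            h = h.multiply(thirtyOne).add(BigInteger.valueOf(s.charAt(i)));
        }
        return h;
    }

    public static void main(String[] args) {
        String word = "pneumonoultramicroscopicsilicovolcanoconiosis"; // arbitrary long example
        System.out.println(hornerHash(""));              // 0
        System.out.println(exactHash(word));             // the huge, un-wrapped value
        System.out.println(exactHash(word).intValue());  // its low 32 bits...
        System.out.println(hornerHash(word));            // ...equal the int-arithmetic result
        System.out.println(word.hashCode());             // ...and String.hashCode()
    }
}
```

BigInteger.intValue() keeps only the low-order 32 bits, which is exactly what wrapping int arithmetic produces, so the last three lines print the same number.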

answered Sep 9, 2014 at 4:15
