2

Could anybody help me to understand, how the GetHashCode() method works for string values?

From MSDN I found:

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

So, different strings can return the same hash code, hashcode is not unique for strings. Can it lead to bugs in the core of a program?

asked Sep 17, 2012 at 18:21
1

5 Answers 5

3

Well, this can lead to bugs, if your algorithms rely on each string to have a unique hash-value.

For example a hash map (Dictionary in .NET) might fail on collisions (i.e. two objects with the same hash that are not equal), or it doesn't if it handles collisions, that depends on the exact implementation. Fail in that case means: If you add a new object to the map and there is already an object in the map that has the same hash value as the new object, then the new object will override the old one instead of just being added. As far as I know the Dictionary class in .NET can handle collisions though.

If you need more concrete advice you'll need to ask a more concrete question: what are you trying to archive, how are you planning to do it etc.

As a side-note: hash values for strings are usually not unique as the size of a hash value is limited, whereas the length of a string is not. Think of it like this: Say the hash function is MD5 (it's not the default in .Net) and you have a string both consisting of hex-dec chars (0-9A-Z) and the string is 200 characters long: there are 200^16 possible values for the string, but only 32^16 possible values for its hash-value.

answered Sep 17, 2012 at 18:24
1
  • 1
    Thank you, Mene. The question is not concrete, because I have not really faced with this problem, so it's just a theoretical problem that was interesting for me. Usually nobody cares about this issue. Commented Sep 18, 2012 at 8:45
3

It can lead to bugs if you assume that a matching hash code means a matching string. Typically, you use the hash code to sort strings into buckets for quick searching and selection. If you detect that two strings have a matching hash code, you would then typically compare the strings themselves for equality.

If that doesn't answer your question, then I don't understand the question.

answered Sep 17, 2012 at 18:23
4
  • 1
    Thank you for your answer, David. As I understand, hashcode is required to identify, that objects are different, but since it is possible, that two objects have the same hashcode value, does it mean, that something wrong could be happened in the future? Commented Sep 17, 2012 at 18:31
  • 1
    It's not required. It's good practice to override the hash values if you override the equality function as you would usually expect equal objects to return the same hash value. Also some things rely on this, like hash maps (Dictionary in .NET) Commented Sep 17, 2012 at 18:34
  • 2
    @Warlock: I don't understand the question. Two strings having the same hash code even if they're not identical does not indicate something wrong. It's like two people having the same initials but different names. Commented Sep 17, 2012 at 18:42
  • Ok, David, you are right. I believe that now things are in their right places. Really hash code is used to speed up searching of objects. I missed this detail and it became as a misunderstanding. Commented Sep 17, 2012 at 19:03
2

So, different strings can return the same hash code, hashcode is not unique for strings. Can it lead to bugs in the core of a program?

It should not lead to bugs, provided the hash values are used as intended. Hashes returned by GetHashCode() are not intended to provide unique hashes - this would be impossible, as there are only about 4 billion possible hash codes (as that method returns an Int32), but an infinite number of possible strings.

Hashes are meant to provide few collections, not no collisions. As such, you should never assume that a hash is a unique representation based on value. The only guarantee you get is that two different hash codes for two different strings mean that the strings are not equal, as two equal values should always produce the same hash. Two hash codes that are equal, however, do not necessarily mean the two string values are equal.

answered Sep 17, 2012 at 18:28
2

Hashcode is used to speed up searching of objects in Hash collections. Internally they store objects in many buckets. Objects being held are divided into buckets based on their hashcode. So when you call for example

var value = Dictionary["someKey"]

dictionary instead of searching in all internal buckets goes directly to the bucket that should contain value under that key. And dictionary searches only in that bucket.

Maybe this isn't exactly how it's implemented but it should be it more or less. So in this case it doesn't matter that different keys in dictionary have same hashcode. That only means that values under that keys will end up in same bucket.

answered Sep 17, 2012 at 18:29
1
  • 1
    Thank you for your answer, Peri. It is very helpful for understanding internal processes in .Net. I would like to know it deeper and I will be very excited if you provide references on articles or books :) Commented Sep 17, 2012 at 18:47
2

The documentation is pretty accurate on the guarantees the method makes, actually. The hash code just follows the following two rules (a == b refers to a.Equals(b), #a refers to a.GetHashCode() for readability reasons):

  • If a == b Then #a == #b
  • If #a != #b Then a != b

Note that this is not an equivalence between Equals and a matching hash. If you rely on more than that, well, then your code has a bug, obviously. GetHashCode is intended for using objects as keys in dictionaries so that there is a quick mapping from objects to a number, but it doesn't need to be reversible. If you look at strings you can easily see that the number of possible strings quickly surpasses the number of possible hash codes, so you could have answered that question on your own. You're beyond 232 possible strings at a little more than two characters already.

answered Sep 17, 2012 at 18:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.