2
\$\begingroup\$

I've built a small library for comparing objects based on their members here (related SO question).

There's a need to compute hash function in order to conform to IEqualityComparer<T> interface. In my case this function has to be computed based on the hash functions of members. So the main problem is how to compose them.

I use the following approach currently:

public int GetHashCode(T obj)
{
 VerifyHaveMembersSetup();
 if (TypeTester<T>.IsNull(obj))
 {
 return 0;
 }
 if (getHashCodeBehavior == GetHashCodeBehavior.Cache && cachedHashCode.HasValue)
 {
 return cachedHashCode.Value;
 }
 int hashCode = 0;
 for (int i = 0; i < memberSetups.Count; ++i)
 {
 int leftShift = i % 32;
 int rightShift = 32 - leftShift;
 int memberHashCode = memberSetups[i].GetMemberHashCode(obj);
 hashCode = hashCode ^ ((memberHashCode << leftShift) | (memberHashCode >> rightShift));
 }
 if (getHashCodeBehavior == GetHashCodeBehavior.ImmutableCheck
 && cachedHashCode.HasValue && cachedHashCode.Value != hashCode)
 {
 throw new InvalidOperationException("Hash code value changed");
 }
 cachedHashCode = hashCode;
 return hashCode;
}
  1. Are there any problems with this code?

  2. I don't like the way many people implement composite functions. With multiplication and addition, significant bits are shifted away and are lost.

That's why I use bitshift operators << and >>. Still I'm not completely sure how good would be a hash function generated such way. Especially when it's computed over small collection of memberSetups of bools for example (since bool has only 2 hash values in .NET: 1 and 0).

asked Jul 11, 2013 at 15:15
\$\endgroup\$
2
  • \$\begingroup\$ "with multiplications and additions significant bits are shifted away and are lost." I'm not buying that argument. The very nature of hashing throws away information, but keeps it within a particular space (in our .NET world, it's a 32-bit signed integer). Read the pages linked to from that page (eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx, blogs.msdn.com/b/ericlippert/archive/2011/02/28/… and amazon.com/dp/0321356683/?tag=stackoverfl08-20). I think you'll find Skeet's implementation solid. \$\endgroup\$ Commented Jul 11, 2013 at 16:00
  • \$\begingroup\$ Okay I was wrong. 23 used in multiplication there has lowest bit of 1 (the number is odd) that's why it will not be shifting everything out completely. \$\endgroup\$ Commented Jul 11, 2013 at 18:16

2 Answers 2

2
\$\begingroup\$

Shifting the values before xor:ing them is somewhat better than the worst thinkable ways of combining the hash codes, but it's not very good. You will easily get combinations that cancel each other out, so you get hash codes with a distribution heavily skewed towards zero.

Multiplying values by a prime number on the other hand gives a pretty good distribution. Back in the day it was actually used to produce random numbers, partly because of how it spreads the values reasonably even over the number range.

A method that would give an even better distribution would be to use an advanced hashing algorithm like MD5 or CRC32, but the drawback would be that they are a lot slower. You would lose much of the advantage of using a hash code in the first place.

All in all, multiplying by a prime number gives a very good distribution in relation to the processing time needed. It's a compromise well suited for the GetHashCode method.

answered Jul 12, 2013 at 13:21
\$\endgroup\$
2
  • \$\begingroup\$ Why are prime numbers so important? Why is it better than just odd numbers? \$\endgroup\$ Commented Jul 12, 2013 at 22:53
  • \$\begingroup\$ @Mike: I don't know the mathematics behind that, but Derrick Henry Lehmer did. Here is a start for further reading: en.wikipedia.org/wiki/Lehmer_random_number_generator \$\endgroup\$ Commented Jul 13, 2013 at 12:54
0
\$\begingroup\$

If I'm not wrong, your hashcode starts using the most significant bit only when memberHashCode << leftShift is using it. This may very well never be the case, if memberSetups.Count is low, and the hashcode of the object you're using is not perfect. For example, ints or bool.

As so, in the worst cases, your hashes are only spread a bit better than your inputs over al the integers, which is not 100% satisfying. Maybe instead of ++iin the loop, you could go with i+=32/memberSetups.Countto fasten the process of using the most significant bits (even if that's still not perfect).

answered Jul 11, 2013 at 16:03
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.