I'm wondering what would be the best way to calculate the hashcode when the order of a sequence doesn't matter. Here's the custom IEqualityComparer<T>
i've implemented for an answer on Stackoverflow.
public class AccDocumentItemComparer : IEqualityComparer<AccDocumentItem>
{
public bool Equals(AccDocumentItem x, AccDocumentItem y)
{
if (x == null || y == null)
return false;
if (object.ReferenceEquals(x, y))
return true;
if (x.AccountId != y.AccountId)
return false;
return x.DocumentItemDetails.Select(d => d.DetailAccountId).OrderBy(i => i)
.SequenceEqual(y.DocumentItemDetails.Select(d => d.DetailAccountId).OrderBy(i => i));
}
public int GetHashCode(AccDocumentItem obj)
{
if (obj == null) return int.MinValue;
int hash = obj.AccountId.GetHashCode();
if (obj.DocumentItemDetails == null)
return hash;
int detailHash = 0;
unchecked
{
var orderedDetailIds = obj.DocumentItemDetails
.Select(d => d.DetailAccountId).OrderBy(i => i);
foreach (int detID in orderedDetailIds)
detailHash = 17 * detailHash + detID;
}
return hash + detailHash;
}
}
As you can see the foreach
in GetHashCode
needs to order the (nested) sequence before it starts calculating the hashcode. If the sequence is large this seems to be inefficient. Is there a better way to calculate the hashcode if the order of a sequence can be ignored?
Here are the simple classes involved:
public class AccDocumentItem
{
public string AccountId { get; set; }
public List<AccDocumentItemDetail> DocumentItemDetails { get; set; }
}
public class AccDocumentItemDetail
{
public int LevelId { get; set; }
public int DetailAccountId { get; set; }
}
1 Answer 1
If the order does not matter, use an kommutative operator to combine the hashCodes of the elements.
Possible candidates are:
^
binary xor
// THIS WOULD BE MY CHOICE
+
addition // problem may be the overflow
*
multiplication // problem may be the overflow
|
binary or
// not recommended, because after some of this operations it is likely that all bits are set, so the same hashCode would appear for quite different instances)
-
1\$\begingroup\$ Thanks. However, Mr Skeet himself advise againt using
xor
: stackoverflow.com/a/263416/284240 Thesum
approach is problematic since1 + 1
would be the same as2
. \$\endgroup\$Tim Schmelter– Tim Schmelter2013年09月30日 09:53:05 +00:00Commented Sep 30, 2013 at 9:53 -
\$\begingroup\$ @Tim Schmelter: You will never be able to avoid the possibility of collisions. And it is also not necessary to try to. Only try to avoid too many collisions because it would decrease performance of e.g. HashHmap. But feel free to multiply the hashCode of each element by the same constant before applying the kommutative operator It will change nothing
17*1 + 17*1 == 17*2
. \$\endgroup\$MrSmith42– MrSmith422013年09月30日 10:26:27 +00:00Commented Sep 30, 2013 at 10:26 -
1\$\begingroup\$ @Tim Schmelter: If you want the
order of a sequence doesn't matter
feature, I cannot imagen an implementation not using ankommutative operator
. \$\endgroup\$MrSmith42– MrSmith422013年09月30日 10:35:05 +00:00Commented Sep 30, 2013 at 10:35 -
3\$\begingroup\$ @TimSchmelter And half of the reason Skeet advises against XOR is because it's commutative. Normally, you don't want hash(x,y)==hash(y,x), but that's exactly what you're asking for. (The other half of his point still stands, though.) \$\endgroup\$svick– svick2013年09月30日 10:50:42 +00:00Commented Sep 30, 2013 at 10:50
-
2\$\begingroup\$ @MrSmith42
*
has a different (and possibly even worse) flaw: 0 * anything == 0. That's not hard to work around, but I think it's worth mentioning. \$\endgroup\$svick– svick2013年09月30日 11:07:55 +00:00Commented Sep 30, 2013 at 11:07
GetHashCode
is called is efficient and in case you have to compare two objects for equal content, you can use your redefined 'Equals' method. \$\endgroup\$ID
is efficient. But the whole point of creating a customIEqualityComparer<T>
is to use it for the linq extension methods likeGroupBy
. And the requirement was to differentiateParentClass
objets with theAccountId
and the nestedList<DetailClass>
and theirDetailAccountId
. So if theAccountId
is equal and all of theDetailAccountId
s(independent of the order), then both objects are equal. \$\endgroup\$