I have a scenario when I want to have a HashSet<ITrackableObjectData>
, where ITrackableObjectData
has string TypeName
and object Id
readonly immutable properties. Id
is a primitive value, ie. string
, int
, long
, Guid
etc.
I want to use these equality rules:
- If the
TypeName
property is different, should anyway be considered non-equal. - If the
Id
is zero, null etc, should use the base object hashcode (reference equal). - If
Id
is not zero, then it should compare theTypeName
andId
.
I created the following comparer to provided to the HashSet
's constructor, and I'd like to hear from you of any improvements/vandalism in my code. As you can see I'm using the GetHashCode
in the Equals
method, is this a bad-idea?
Note that these are my equality requirements in my scenario.
class ITrackableObjectDataComparer : EqualityComparer<ITrackableObjectData>
{
public override bool Equals(ITrackableObjectData x, ITrackableObjectData y) =>
GetHashCode(x) == GetHashCode(y);
public override int GetHashCode(ITrackableObjectData obj)
{
if (obj == null) return 0;
var typeHash = obj.TypeName == null ? 0 : obj.TypeName.GetHashCode();
var idHash = obj.Id == null ? 0 : obj.Id.GetHashCode();
idHash = idHash == 0 ? obj.GetHashCode() : idHash;
return typeHash ^ idHash;
}
}
Note that this EqualityComparer
is a private class and is only meant to serve the purpose of this specific HashSet
which is not exposed at all.
In fact I don't also really care about the Equals
, but rather about the GetHashCode
, so I know whether an object with my specs have already been added to a HashSet
.
Update:
This is basically the equality checker that fulfills my requirements (but again, I don't need an equality comparer, I need a hash equality-comparer)
public override bool Equals(ITrackableObjectData x, ITrackableObjectData y)
{
if (x == y) return true; //ref eq.
else if (x == null)
return y == null;
else if (y == null)
return false;
else if (x.TypeName != y.TypeName)
return false;
else
{
if (x.Id == null)
return y.Id == null;
else if (y.Id == null)
return false;
else
{
int xId = x.Id.GetHashCode(),
yId = y.Id.GetHashCode();
if (xId == 0 || yId == 0)
return x.Equals(y); //use default eq.
else
return xId == yId;
}
}
}
2 Answers 2
As Nikita mentioned, never use GetHashCode
for equality comparison!
This SO answer shows a more sophisticated GetHashCode
implementation. Translated to you case it could look like:
public override int GetHashCode(ITrackableObjectData obj)
{
if (obj == null)
return 0;
if (obj.TypeName == null && obj.Id == null)
return obj.GetHashCode();
unchecked // disable overflow, for the unlikely possibility that you
{ // are compiling with overflow-checking enabled
int hash = 27;
hash = (13 * hash) + obj.TypeName?.GetHashCode() ?? 0;
hash = (13 * hash) + obj.Id?.GetHashCode() ?? 0;
return hash;
}
}
-
\$\begingroup\$ In my scenario, since
TypeName
is never expected to be equal toId
, sum looks redundant to me. \$\endgroup\$Shimmy Weitzhandler– Shimmy Weitzhandler2017年06月05日 12:42:46 +00:00Commented Jun 5, 2017 at 12:42 -
2\$\begingroup\$ I think you misunderstand why we combine with something more sophisticated than
^
. It's not to avoid a hash of 0 (which is a perfectly acceptable hash value); it's to reduce the likelihood of collision (where two non-equal objects have the same hash value). Collisions aren't wrong, but if you have too many, they can affect performance of algorithms that depend on hash codes. For example, we'd like (0,1) to produce a different result to (1,0), which a simple^
or+
won't achieve. \$\endgroup\$Toby Speight– Toby Speight2017年06月05日 16:57:43 +00:00Commented Jun 5, 2017 at 16:57 -
2\$\begingroup\$ If a class can be expected to cache the result from
GetHashCode
in cases where it's expensive to compute, it can sometimes be advantageous to callGetHashCode
as the first step in an equality comparison. If, for example, one has two deeply-immutable trees which are identical except for the last branch, havingEquals
checkGetHashCode()
values before doing anything else may allow it to quickly return false without having to scan through all the matching branches first. Such an approach would be counterproductive ifGetHashCode
isn't cached, however. \$\endgroup\$supercat– supercat2017年06月05日 17:56:51 +00:00Commented Jun 5, 2017 at 17:56 -
1\$\begingroup\$ @Shimmy: Do you know for which Id values
Id.GetHashCode()
returns 0? \$\endgroup\$JanDotNet– JanDotNet2017年06月06日 06:27:49 +00:00Commented Jun 6, 2017 at 6:27 -
1\$\begingroup\$ Than try:
new Guid("d7a24d8c-75a2-4d64-9061-a21aba171be8").GetHashCode()
;) \$\endgroup\$JanDotNet– JanDotNet2017年06月06日 17:43:39 +00:00Commented Jun 6, 2017 at 17:43
I think you are missing the point. Different objects can produce same hashcode. This can happen:
obj.Equals(otherObj) // false
obj.GetHashCode() == otherObj.GetHashCode() //true
If for some reason you want your Equals
methods to return true
when hashcodes are equal, then by all means. Make sure to document this behavior though.
However if you want to compare references , then use ReferenceEquals
method instead. Hashcode comparison will not work for that scenario, because hashcode equality does not guarantee that object are (reference-)equal.
P.S. Also keep in mind, that hashcodes should not change, otherwise hash-tables might break. Make sure that Id
and TypeName
are read-only properties.
-
\$\begingroup\$ Thank you. See my updates. The cautions you mentioned are valid in my scenario. \$\endgroup\$Shimmy Weitzhandler– Shimmy Weitzhandler2017年06月05日 12:34:35 +00:00Commented Jun 5, 2017 at 12:34
-
1\$\begingroup\$ @Shimmy I'm not sure I follow your explanation. You may not care about
Equals
method, but hash-tables will use it internally to compare items, it is pivotal. If equality comparison does not work correctly, neither will yourHashSet
. \$\endgroup\$Nikita B– Nikita B2017年06月05日 12:45:47 +00:00Commented Jun 5, 2017 at 12:45 -
1\$\begingroup\$ @Shimmy,
GetHashCode
method is used to find correct bucked inside aHashSet
.Equals
is then used to find correct item inside that bucket. Those are two different methods that have different signature and serve different purpose. I don't know what you mean, when you say that they are "equal". Does yourEquals
returntrue
when hashcodes are equal? Yes. Does this behavior match your requirements? Most likely not, since instead of comparing IDs you compare IDs' hashcodes, like it's the same thing. It is not the same thing. \$\endgroup\$Nikita B– Nikita B2017年06月05日 13:36:44 +00:00Commented Jun 5, 2017 at 13:36 -
1\$\begingroup\$ One of the other cases to consider is that two sets of different values can return the same hash code. \$\endgroup\$krillgar– krillgar2017年06月05日 13:58:03 +00:00Commented Jun 5, 2017 at 13:58
-
1\$\begingroup\$ @Shimmy That’s fair enough, but then don’t implement
Equals
for your class: by doing so you’re lying to the user of your class (even if that’s just you — it’s simply bad documentation). Unfortunately C# doesn’t allow you to delete the method outright but you should then leave it with its default semantic: reference equality. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2017年06月05日 21:52:12 +00:00Commented Jun 5, 2017 at 21:52
ITrackableObjectData
and one of its implementations... Especially given the nullability of both values involved. \$\endgroup\$Equals
asif (x.GetHashcode() != y.GetHashcode()) return false; else ...
but there has to be something in thatelse
that deals with the situation where there are equal hash codes but unequal objects. \$\endgroup\$Equals
implementation does not follow the requirements you listed. It's just wrong and it won't work. It might work most of the time due to pure luck, but sooner or later you will lose objects due to hashcode collisions, which your implementation does not account for. You might not care about it, but users of your software most likely will when they lose their data. Whether or not this class isprivate
is completely irrelevant. Private class should work correctly too. \$\endgroup\$