Because I often have to deal with two Variant
s that may or may not be Null
, we need a null-safe equality test, so I came up with IsDistinct
which works but I have some issues with the code:
- When I tried to keep it terse, the readability was harmed.
- When I tried to expand the logic for readability, it still makes for some thinking.
- I looked for potentials to short-circuit or otherwise result the number of steps to arrive at a result. In this case, it all takes 2 evaluations, unless both sides are non-null, in which case we have 3 evaluations.1
Can we do better?
Public Function IsDistinct(LeftValue As Variant, RightValue As Variant) As Boolean
If IsNull(LeftValue) Then
If IsNull(RightValue) Then
IsDistinct = False
Else
IsDistinct = True
End If
Else
If IsNull(RightValue) Then
IsDistinct = True
Else
IsDistinct = Not (LeftValue = RightValue)
End If
End If
End Function
Inputs & Expected Outputs
LeftValue RightValue Result
1 1 False
1 0 True
Null 1 True
0 Null True
Null Null False
"" "" False
"" Null True
Note that it doesn't have to be just 0
and 1
; it could be text, dates, or numbers. It's more important that when either inputs are Null
, it should automatically be True
since Null
will always be "distinct" from any non-Null
values. However, when both inputs are Null
, then it's always False
because we are considering them "equal" in this situation.2
The special case of an empty string and a Null
is arguably problematic. I've swung both ways; sometimes I want empty string to be considered "equal" to a Null
, sometimes I don't. In the IsDistinct
as defined, they are not considered equal.3
1) In one of iterations, I considered starting with Result = (LeftValue = RightValue)
and doing additional evaluation if the Result
was Null
, signaling that either or both inputs were Null
. But IIRC, I found that it made for more steps since I had to evaluate each inputs with IsNull
to determine whether both were Null
and thus not distinct.
2) As a matter of fact, when I look at the first 3 lines:
If IsNull(LeftValue) Then
If IsNull(RightValue) Then
IsDistinct = False
My instinct is to go "wait, that's not right.", thinking about it and realize it IS correct. The fact that I stumble on that even more than once tells me that it's quite hard to read. Boo.
3) Thanks to @Ryan Wildry for pointing this blind spot out!
5 Answers 5
You have no idea how excited I am about this. Years after learning about the Xor
operator and I finally get to use it!!
VBA - Logical Operators: Xor Operator
Called Logical Exclusion. It is the combination of NOT and OR Operator. If one, and only one, of the expressions evaluates to be True, the result is True.
The Xor
seems like a great fit because both the parameters need to be tested for nulls and the values can not equal one another.
My Version
Public Function NotEquals(LeftValue As Variant, RightValue As Variant) As Boolean
If IsNull(LeftValue) Xor IsNull(RightValue) Then
NotEquals = True
ElseIf LeftValue <> RightValue Then
NotEquals = True
End If
End Function
Results
Crude Tests
Option Explicit
Public Const Repetitions = 10000000
Public Function IsDistinct(LeftValue As Variant, RightValue As Variant) As Boolean
If IsNull(LeftValue) Then
If IsNull(RightValue) Then
IsDistinct = False
Else
IsDistinct = True
End If
Else
If IsNull(RightValue) Then
IsDistinct = True
Else
IsDistinct = Not (LeftValue = RightValue)
End If
End If
End Function
Public Function NotEquals(LeftValue As Variant, RightValue As Variant) As Boolean
If IsNull(LeftValue) Xor IsNull(RightValue) Then
NotEquals = True
ElseIf LeftValue <> RightValue Then
NotEquals = True
End If
End Function
Sub CompareResults()
Dim SumOfTime As Double
Debug.Print "Number of Repetitions: "; FormatNumber(Repetitions, 0)
Debug.Print
Debug.Print "IsDistinct Test"
Debug.Print "Left"; Tab(8); "Right"; Tab(16); "Time"; Tab(24); "Result"
SumOfTime = TestIsDistinct(1, 1)
SumOfTime = SumOfTime + TestIsDistinct(1, 0)
SumOfTime = SumOfTime + TestIsDistinct(Null, 1)
SumOfTime = SumOfTime + TestIsDistinct(0, Null)
SumOfTime = SumOfTime + TestIsDistinct(Null, Null)
Debug.Print "Total Time for All IsDistinct Test: "; SumOfTime
Debug.Print
Debug.Print "NotEquals Test"
Debug.Print "Left"; Tab(8); "Right"; Tab(16); "Time"; Tab(24); "Result"
SumOfTime = TestNotEquals(1, 1)
SumOfTime = SumOfTime + TestNotEquals(1, 0)
SumOfTime = SumOfTime + TestNotEquals(Null, 1)
SumOfTime = SumOfTime + TestNotEquals(0, Null)
SumOfTime = SumOfTime + TestNotEquals(Null, Null)
Debug.Print "Total Time for All NotEquals Test: "; SumOfTime
End Sub
Function TestIsDistinct(LeftValue As Variant, RightValue As Variant) As Double
Dim t As Double
t = Timer
Dim Result As Boolean
Dim n As Long
For n = 1 To Repetitions
Result = IsDistinct(LeftValue, RightValue)
Next
Dim SumOfTime As Double
SumOfTime = Round(Timer - t, 4)
Debug.Print LeftValue; Tab(8); RightValue; Tab(16); SumOfTime; Tab(24); Result
TestIsDistinct = SumOfTime
End Function
Function TestNotEquals(LeftValue As Variant, RightValue As Variant) As Double
Dim t As Double
t = Timer
Dim Result As Boolean
Dim n As Long
For n = 1 To Repetitions
Result = NotEquals(LeftValue, RightValue)
Next
Dim SumOfTime As Double
SumOfTime = Round(Timer - t, 4)
Debug.Print LeftValue; Tab(8); RightValue; Tab(16); SumOfTime; Tab(24); Result
TestNotEquals = SumOfTime
End Function
-
\$\begingroup\$ Nice! One of my early iterations did indeed use
Xor
but I abandoned it for the sake of readability. I see that the key is how you set up theElseIf
condition, so even for bothNull
inputs, we still getFalse
. We still have 2 evaluations in the best case and 3 in the worse case. Well done! \$\endgroup\$this– this2019年12月05日 14:40:48 +00:00Commented Dec 5, 2019 at 14:40 -
\$\begingroup\$ I don't know much about VBA, but shouldn't you initialize
NotEquals
toFalse
at the start of the method? \$\endgroup\$Simon Forsberg– Simon Forsberg2019年12月23日 21:02:56 +00:00Commented Dec 23, 2019 at 21:02 -
\$\begingroup\$ @SimonForsberg Boolean variable are False by default. \$\endgroup\$TinMan– TinMan2019年12月23日 22:34:52 +00:00Commented Dec 23, 2019 at 22:34
Nested ifs are something I've come to knee-jerk refactor into similar below. Writing functions that state what they're doing reduces the burden to understand. When future-Iven comes along and evaluates the logic I won't be asking myself 'What's this doing?' as I'll already know.
Public Function IsDistinct(leftValue As Variant, rightValue As Variant) As Boolean
If AreBothValuesNull(leftValue, rightValue) Then
IsDistinct = False
Exit Function
End If
If IsOnlyOneValueNull(leftValue, rightValue) Then
IsDistinct = True
Exit Function
End If
IsDistinct = Not (leftValue = rightValue)
End Function
Private Function AreBothValuesNull(ByVal leftValue As Variant, ByVal rightValue As Variant) As Boolean
AreBothValuesNull = IsNull(leftValue) And IsNull(rightValue)
End Function
Private Function IsOnlyOneValueNull(ByVal leftValue As Variant, ByVal rightValue As Variant) As Boolean
IsOnlyOneValueNull = (IsNull(leftValue) And Not IsNull(rightValue) _
Or Not IsNull(leftValue) And IsNull(rightValue))
End Function
What about the same value but with different types? IsDistinct(1, "1")
comes to mind as a test case.
-
\$\begingroup\$ Regarding the question about different types -- I would consider that out of the function's scope. All the weird implicit conversion rules applies in this case and if the user doesn't like it, then user shouldn't be putting in garbage. Your code is definitely much more readable and I don't have to think as hard as with my original version. The only bad thing is that in the worst case, we have 7 evaluations, on the account that the
And
s andOr
s aren't lazy. \$\endgroup\$this– this2019年12月05日 14:44:31 +00:00Commented Dec 5, 2019 at 14:44 -
\$\begingroup\$ Is there a reason you've used
ByVal
in your helper functions and not in the main function signature? \$\endgroup\$Greedo– Greedo2019年12月05日 18:36:47 +00:00Commented Dec 5, 2019 at 18:36 -
\$\begingroup\$ Copy-pasta oversight. \$\endgroup\$IvenBach– IvenBach2019年12月05日 18:38:34 +00:00Commented Dec 5, 2019 at 18:38
Yet another approach, this time with string coercion. Knowing the type certainly helps simplify the tests, however it does come at a cost of speed. However thought I'd share for those who may be unfamiliar with the approach.
Option Explicit
Public Function IsDistinct(LeftValue As Variant, RightValue As Variant) As Boolean
IsDistinct = Not ((LeftValue & vbNullString) = (RightValue & vbNullString))
End Function
Public Sub Tests()
Debug.Print IsDistinct(1, 1)
Debug.Print IsDistinct(1, 0)
Debug.Print IsDistinct(Null, 1)
Debug.Print IsDistinct(0, Null)
Debug.Print IsDistinct(Null, Null)
Dim i As Long
Dim t As Double
Dim result As Boolean
t = Timer
'How long does it take?
For i = 1 To 100000
result = IsDistinct(1, 1)
result = IsDistinct(1, 0)
result = IsDistinct(Null, 1)
result = IsDistinct(0, Null)
result = IsDistinct(Null, Null)
Next
Debug.Print "This took " & Timer - t & " seconds"
End Sub
Here are the results of the tests:
False
True
True
True
False
This took 0.2421875 seconds
-
1\$\begingroup\$ String coercion goes wrong if you want to handle Null and zero-length strings separately, though \$\endgroup\$Erik A– Erik A2019年12月05日 15:20:08 +00:00Commented Dec 5, 2019 at 15:20
-
\$\begingroup\$ @ErikA Yes, in this scheme it's helpful to point out vbNullString and Null would return
False
. Depends if you care about type too. I think there is some practical applications where that doesn't matter very much (or you might want this behavior), but good to point out. +1 \$\endgroup\$Ryan Wildry– Ryan Wildry2019年12月05日 15:24:32 +00:00Commented Dec 5, 2019 at 15:24 -
\$\begingroup\$ That highlights a blind spot in my original tests. I would have considered an empty string "distinct" from a
Null
. But I agree there will be cases where for the purpose, they are "same". I will clarify the question accordingly. \$\endgroup\$this– this2019年12月05日 15:26:04 +00:00Commented Dec 5, 2019 at 15:26
Personally, when iterating possibilities, I always use SELECT CASE True
instead of nested If
and Else
statements to just have a list of possibilities for readability purposes.
Performance might not be 100% optimal, since this can lead to slightly more comparisons, but I will gladly take that to have more readable code.
Public Function IsDistinct(LeftValue As Variant, RightValue As Variant) As Boolean
Select Case True
Case IsNull(LeftValue) And IsNull(RightValue)
IsDistinct = False
Case IsNull(LeftValue) Or IsNull(RightValue)
IsDistinct = True
Case LeftValue = RightValue
IsDistinct = False
Case Else
IsDistinct = True
End Select
End Function
Note that because I'm iterating possibilities, I prefer not to do IsDistinct = Not (LeftValue = RightValue)
but only assign literals for readability (I'll just have Case Condition Then Literal
), but if you don't care, you can of course save two lines.
-
\$\begingroup\$ Definitely readable. Unfortunately, the worst case has 5 evaluations, though the best case is still only 2 evaluations (the first case with the
And
). \$\endgroup\$this– this2019年12月05日 14:50:40 +00:00Commented Dec 5, 2019 at 14:50 -
\$\begingroup\$ Yup, this could definitely be optimized to not call
IsNull
up to 4 times and require less assignments and comparisons (stripping away allIsDistinct = False
statements is also a trivial optimization, for example). However, the cost of this function is so little that I'd be surprised if you have a case where that isn't premature. \$\endgroup\$Erik A– Erik A2019年12月05日 15:01:57 +00:00Commented Dec 5, 2019 at 15:01 -
\$\begingroup\$ One case where it can matter is when the function is used as a part of the conditional formatting for an Access continuous form where it will be repeatedly evaluated as the user moves around in the form. That can matter. \$\endgroup\$this– this2019年12月05日 15:03:59 +00:00Commented Dec 5, 2019 at 15:03
Developing further based on TinMan's excellent answer and other posters' contributions, I came up with this:
Public Function IsDistinct(LeftValue As Variant, RightValue As Variant) As Boolean
If LeftValue <> RightValue Then
IsDistinct = True
ElseIf IsNull(LeftValue) Xor IsNull(RightValue) Then
IsDistinct = True
End If
End Function
Both this version and TinMan's version reduces the number of evaluations to only 1 in the best case and 2 for the worst case. However, this version exploits the fact that the If
conditions will treat Null
result as a falsy result.1
Therefore, only equal values and Null
result will proceed to the ElseIf
condition, which we use the Xor
to verify that they aren't both Null
. If that doesn't match, then we know that they are considered the same value.
This is more terse but I think the readability hurts a bit because of the LeftValue <> RightValue
potentially returning a Null
result which does not make for most intuitive thinking through the logical procession. But I would say that it's more common to compare one value against another value than it is to have Null
against any other value or having both Null
so that is also a good optimization, I think. Need to test whether this is in fact an improvement in speed, though.
1) Aside: I found out that Select Case True
does not like Null
results; we get an Invalid Use of Null
error with that structure. Thus, it must be a If/ElseIf
which tolerates Null
result. However, It's very important to remember that with a Null
result, it will never ever enter any If
or ElseIf
branch, and therefore always fall into the Else
or through.
If IsNull(RightValue) Then IsDistinct = False Else IsDistinct = True End If
can be simplified toIsDistinct = Not IsNull(RightValue)
\$\endgroup\$