It seems that SQL Server considers 0x and 0x00 equal values:
SELECT CASE WHEN 0x = 0x00 THEN 1 ELSE 0 END
This outputs 1.
How can I get true binary bit-for-bit comparison behavior? Also, what are the exact rules under which two (var)binary
values are considered equal?
Also note the following behavior:
--prints just one of the values
SELECT DISTINCT [Data]
FROM (VALUES (0x), (0x00), (0x0000)) x([Data])
--prints the obvious length values 0, 1 and 2
SELECT DATALENGTH([Data]) AS [DATALENGTH], LEN([Data]) AS [LEN]
FROM (VALUES (0x), (0x00), (0x0000)) x([Data])
The background of the question is that I'm trying to deduplicate binary data. I need to GROUP BY binary data, not just compare two values. I'm glad I even noticed this problem.
Note that HASHBYTES does not support LOBs. I'd also like to find a simpler solution.
- 0x0 is different from 0x (the latter is an empty blob). Regarding DATALENGTH: I'm not sure what the comparison rules are. Is this always enough to guarantee equality? I can't guess here, it must be correct. Also, it makes grouping on a blob impossible. – usr, Aug 24, 2013 at 11:55
2 Answers
I couldn't find this comparison behaviour specified anywhere in BOL.
But the Connect Item "Invalid equality comparison for varbinary data with right padded zeros" states that:
"Basically, the standard leaves it up to implementation to treat strings that differ only by [trailing] 00 as equal or less. We treat it as equal."
The Connect Item also states that the presence of trailing zeroes is the only case in which SQL Server differs from byte-by-byte comparison behavior.
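As a small illustration of that rule, the following sketch compares one pair of literals that differs only by a trailing zero byte and one pair that differs by a leading zero byte. The column aliases are made up for the example, and the expected results in the comments follow from the rule quoted above rather than from separate documentation.

```sql
-- Trailing 0x00 bytes are ignored by the comparison: expected to return 1.
SELECT CASE WHEN 0x01 = 0x0100 THEN 1 ELSE 0 END AS TrailingZero;

-- A leading 0x00 byte is significant: expected to return 0.
SELECT CASE WHEN 0x01 = 0x0001 THEN 1 ELSE 0 END AS LeadingZero;
```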
In order to distinguish between two binary values in SQL Server that differ only by trailing 0x00 bytes, you can also add DATALENGTH into the comparison, as indicated in your question.
The reason for generally preferring DATALENGTH over LEN here is that the latter performs an implicit cast to varchar, and then you get the problem with trailing spaces.
+-------------+--------------------+
| LEN(0x2020) | DATALENGTH(0x2020) |
+-------------+--------------------+
| 0 | 2 |
+-------------+--------------------+
Though either would work in your use case.
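A minimal sketch of how DATALENGTH could be folded into both the equality test and the grouping, assuming trailing zero bytes are the only case where the built-in comparison deviates (as the Connect Item says). The sample values and the aliases IsEqual, ByteLength and Occurrences are illustrative only.

```sql
-- Equal only if the padded comparison matches AND the byte lengths match.
-- 0x has length 0 and 0x00 has length 1, so this should return 0.
SELECT CASE WHEN 0x = 0x00 AND DATALENGTH(0x) = DATALENGTH(0x00)
            THEN 1 ELSE 0 END AS IsEqual;

-- Grouping on the value plus its byte length keeps 0x, 0x00 and 0x0000
-- in separate groups; the duplicated 0x00 should be counted twice.
SELECT [Data], DATALENGTH([Data]) AS [ByteLength], COUNT(*) AS [Occurrences]
FROM (VALUES (0x), (0x00), (0x0000), (0x00)) x([Data])
GROUP BY [Data], DATALENGTH([Data]);
```

The same pair of expressions could be used with SELECT DISTINCT when only deduplication is needed.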
Interestingly enough, the two values 0x0 and 0x00 are just different character representations for the same stored value. Try running the following snippet to prove this to yourself.
-- Store three literals in sql_variant variables so their metadata can be inspected
DECLARE @foo sql_variant
, @bar sql_variant
, @bat sql_variant
SET @foo = 0x0
SET @bar = 0x00
SET @bat = 0x00000000
-- Show the base type and size information recorded for each value
SELECT 'foo' AS 'Var', @foo AS 'Value'
, SQL_VARIANT_PROPERTY(@foo, 'BaseType') AS 'BaseType'
, SQL_VARIANT_PROPERTY(@foo, 'Precision') AS 'Precision'
, SQL_VARIANT_PROPERTY(@foo, 'Scale') AS 'Scale'
, SQL_VARIANT_PROPERTY(@foo, 'MaxLength') AS 'MaxLength'
UNION
SELECT 'bar' AS 'Var', @bar
, SQL_VARIANT_PROPERTY(@bar, 'BaseType')
, SQL_VARIANT_PROPERTY(@bar, 'Precision')
, SQL_VARIANT_PROPERTY(@bar, 'Scale')
, SQL_VARIANT_PROPERTY(@bar, 'MaxLength')
UNION
SELECT 'bat' AS 'Var', @bat
, SQL_VARIANT_PROPERTY(@bat, 'BaseType')
, SQL_VARIANT_PROPERTY(@bat, 'Precision')
, SQL_VARIANT_PROPERTY(@bat, 'Scale')
, SQL_VARIANT_PROPERTY(@bat, 'MaxLength')
-- Compare the first two values directly
SELECT
CASE
WHEN @foo = @bar
THEN 'equal'
ELSE 'NOT EQUAL'
END AS TestResults
I can understand why the zero padding would surprise people, but that has been the default behavior for a very long time so I guess that I've come to expect it.
-PatP
- Thanks, the 0x0 first appeared in the comments. The question is about 0x and 0x00 (and similar values). Binaries track values on a byte level, so there is no possibility to store half a byte. – usr, Aug 31, 2013 at 11:05