Recently I tried to search for a particular pattern by converting XML data to varchar(max) (I'm aware this isn't best practice) and found that it doesn't work as expected:

Setup

declare @container table(
 [Response] xml not null
);
declare @xml xml =
'<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://abc.com/xsd" xmlns:ns="http://abc.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <soapenv:Header>
 <ns:MessageHeader>
 <xsd:ID>ABC</xsd:ID>
 <xsd:Date>2018-12-31T23:59:59</xsd:Date>
 </ns:MessageHeader>
 </soapenv:Header>
 <soapenv:Body>
 <ns:MessageResponse>
 <ns:return>
 <xsd:ResponseList xsi:nil="true" />
 </ns:return>
 </ns:MessageResponse>
 </soapenv:Body>
 </soapenv:Envelope>';
insert into @container values (@xml);

This query works

select *
 from @container
 where cast(Response as varchar(max))
 like '%<xsd:ResponseList xsi:nil="true"%';

Notice that the pattern stops 3 characters (i.e. ' />') before the end of the XML node.

But this one does not:

select *
 from @container
 where cast(Response as varchar(max))
 like '%<xsd:ResponseList xsi:nil="true" %' -- with space
 or cast(Response as varchar(max))
 like '%<xsd:ResponseList xsi:nil="true" />%'; -- whole XML node

I suspected this was due to escape characters and tried a few alternatives, but to no avail. I'd appreciate it if someone could shed some light on this.

EDIT (ANSWERED)

The following query works, based on Mr. Brownstone's insight:

select *
 from @container
 where cast(Response as varchar(max))
 like '%<xsd:ResponseList xsi:nil="true"/>%';

Here's my follow-up question on Code Review, using an XQuery expression:

T-SQL Verify whether XML node from SOAP request contains any child nodes

asked Mar 17, 2019 at 9:22

1 Answer

This is by design.

When you store a document using the XML data type, it is compressed and organised into a structure that SQL Server can operate on efficiently. One of the steps in this process is generating the InfoSet. While doing so, SQL Server removes anything it determines to be unnecessary; in your example, that is whitespace:

The InfoSet content may not be an identical copy of the text XML, because the following information is not retained: insignificant white spaces, order of attributes, namespace prefixes, and XML declaration.

When you select the entire contents of the field (such as when you convert it to NVARCHAR(MAX)), it rebuilds the XML document before returning it. This document may not be an identical copy of the document that you inserted. For example, if you have used self-closing elements, SQL Server may return opening and closing elements instead.
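One way to see this for yourself is to cast a small document back to a string and look at what comes out. This is only a sketch: the variable `@doc` and the element names are made up for illustration, and the exact serialized text may vary.

```sql
-- Hypothetical sample document, written with a space before the self-closing "/>".
declare @doc xml = '<root><item flag="true" /></root>';

-- Inspect the text SQL Server rebuilds from its internal representation.
-- Compare it character by character with the literal above before
-- writing a LIKE pattern against it.
select cast(@doc as varchar(max)) as serialized;
```

In the question's case, the rebuilt text had no space before `/>`, which is why the pattern ending in `"true"/>` matched and the one ending in `"true" />` did not.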

The documentation also continues on to say:

Example: Retaining Exact Copies of XML Data

For illustration, assume that government regulations require you to retain exact textual copies of your XML documents. For example, these could include signed documents, legal documents, or stock transaction orders. You may want to store your documents in a [n]varchar(max) column.

So, if you want to store the exact copy of your document, then NVARCHAR(MAX) or VARCHAR(MAX) is the best option. You can then convert it to XML to query it later on (though this can be costly).
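A minimal sketch of that approach, assuming a hypothetical table variable `@raw` and made-up element names: the text is stored untouched, so a LIKE pattern can match it exactly, and it is converted to XML only when a structural query is needed.

```sql
-- Hypothetical storage: keep the document as text so it is preserved byte for byte.
declare @raw table ([Response] nvarchar(max) not null);
insert into @raw values (N'<root><item flag="true" /></root>');

-- The original spacing is retained, so the pattern with the space matches.
select *
  from @raw
 where [Response] like N'%<item flag="true" />%';

-- Convert on demand when the content has to be queried as XML.
-- This parses the document on every call, which is the cost mentioned above.
select cast([Response] as xml).value('(/root/item/@flag)[1]', 'varchar(10)') as flag
  from @raw;
```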

For more information, see the documentation on XML Data Type and Columns (SQL Server), and also Define the Serialization of XML Data, which outlines the rules that SQL Server applies when converting XML to a string type.
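If the goal is only to detect the element, rather than to match its exact text, a structural check avoids depending on how the XML is serialized. The following is a sketch, not the asker's final solution: it reuses the namespaces from the question, trims the SOAP envelope down to the body, and uses the `exist()` method of the xml type.

```sql
-- Trimmed copy of the question's document (header omitted for brevity).
declare @doc xml = N'<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://abc.com/xsd" xmlns:ns="http://abc.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <soapenv:Body>
  <ns:MessageResponse>
   <ns:return>
    <xsd:ResponseList xsi:nil="true" />
   </ns:return>
  </ns:MessageResponse>
 </soapenv:Body>
</soapenv:Envelope>';

-- Declare the prefixes used in the XQuery expression.
with xmlnamespaces (
    'http://abc.com/xsd' as xsd,
    'http://www.w3.org/2001/XMLSchema-instance' as xsi
)
-- exist() returns 1 when at least one node matches, regardless of whitespace
-- or whether the element was written as self-closing.
select @doc.exist('//xsd:ResponseList[@xsi:nil="true"]') as has_nil_response_list;
```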

answered Mar 17, 2019 at 10:03
  • Ah, good to know. I removed the space in the self-closing tag and everything works fine! Many thanks, Mr. Brownstone! Commented Mar 17, 2019 at 11:34
  • TIL! I didn't know SQL Server parsed the XML on write; I thought the xml datatype was just a glorified wrapper over nvarchar. But it is odd that SQL Server supports arbitrary XML data in the xml type and querying its inner contents with XPath (and now JSON too), which obviously offers a world of possibilities for semi-structured data storage, and has supported it for 16 years now, and yet SQL Server is so desperately lacking for data types elsewhere (e.g. unsigned ints, array types, complex types; UTF-8 wasn't added until last year, etc.). Weird. Commented Oct 30, 2021 at 21:11
