XQuery in SQL Server to convert XML column data to relational data table

Question 1

I previously asked a question about an error I was recieving. You don't really need it to understand this question but it's here for reference:

XML/SQL Server 2008 Error: XQuery...Cannot implicitly atomize or apply 'fn:data()' to complex content elements

The previous xml is a little complex and probably would benefit from a transformation so I applied an XSLT template to get the below structure and changed the tags a little so its more understandable. I've also restructured the table I'm importing to, for maintainability. I imported the transformed XML file to a SQL Server table, xTable, with column xData, Like so (only one row but I suppose you could import more than 1 and merge them all with David Browne's answer):

ID xData
1 <MyXMLFile><Sample><Location>....

The parent node of the xml, <Sample>, can be repeated up to 1 million times but for this illustration, I only have 2. There are 22 child nodes for each sample, one <SampleID> node and 21 <Location> nodes (I've only shown 2 nodes to keep things short). There are 3 child nodes for each node, one <LocationName> node and two <Foo> nodes, designated <Foo1> and <Foo2>.

<?xml version="1.0" encoding="UTF-16"?>
<MyXMLFile>
 <!--There CAN BE up to 1 million <Sample> nodes-->
 <Sample>
 <!--There ARE EXACTLY 22 child nodes for each <Sample> parent node, one <SampleID> and 21 <Location>-->
 <SampleID>0000001A</SampleID>
 <!--There ARE EXACTLY 3 child nodes for each <Location> parent node, on <LocationID> and two <Foo>-->
 <Location>
 <LocationName>Jeff</LocationName>
 <Foo1>10</Foo1>
 <Foo2>11</Foo2>
 </Location>
 <Location>
 <LocationName>Jenn</LocationName>
 <Foo1>11</Foo1>
 <Foo2>12</Foo2>
 </Location>
 </Sample>
 <Sample>
 <SampleID>0000002A</SampleID>
 <Location>
 <LocationName>Greg</LocationName>
 <Foo1>13</Foo1>
 <Foo2>14</Foo2>
 </Location>
 <Location>
 <LocationName>Anne</LocationName>
 <Foo1>14</Foo1>
 <Foo2>16</Foo2>
 </Location>
 </Sample>
</MyXMLFile>

I want to convert the xData column from xTable and put it into this table (ID column for illustration only):

ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11 
2 00000001 Jenn 11 12 
... 00000001 ... ... ... 
22 00000001 ... ... ... 
23 00000002 Greg 13 14 
24 00000002 Anne 17 18
... 00000002 ... ... ...
44 00000002 ... ... ...

At the moment, I'm just trying to SELECT the xData column from xTable and will edit the query later to insert the data. So my first query, just to show that <SampleID> does get selected:

Query 1

SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample') as a(b)

The output looks good:

ID SampleID
1 00000001
2 00000002

So, I added to the query:

Query2

SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID,
 a.b.query('LocationName').value('.', 'varchar(10)') AS LocationName,
 a.b.query('Foo1').value('.', 'varchar(6)') AS Foo1,
 a.b.query('Foo2').value('.', 'varchar(6)') AS Foo2
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID/../Location') as a(b)

For this output, no data gets selected for <SampleID>. This is not surprising to me as the xpath selection is only to the <Location> parent node and returns its children <LocationName>, <Foo1> and <Foo2> and not <SampleID>.

ID SampleID LocationName Foo1 Foo2
1 Jeff 10 11 
2 Jenn 11 12 
... ... ... ... 
22 ... ... ... 
23 Greg 13 14 
24 Anne 17 18 
... ... ... ...
44 ... ... ...

So then I tried this:

Query 3

SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID,
 c.d.query('LocationName').value('.', 'varchar(10)') AS LocationName,
 c.d.query('Foo1').value('.', 'varchar(6)') AS Foo1,
 c.d.query('Foo2').value('.', 'varchar(6)') AS Foo2
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID') as a(b)
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID/../Location') as c(d)

The output is a little better but the rows are duplicated in the table. There should only be 44, but there are 88:

ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11 
2 00000001 Jenn 11 12 
... 00000001 ... ... ... 
42 00000001 ... ... ... 
43 00000001 ... ... ...
44 00000001 ... ... ...
45 00000002 Greg 13 14 
46 00000002 Anne 17 18
... ... ... ... ...
88 00000002 ... ... ...

Then I thought I would try a different way.

Query 4

DECLARE @x xml;
SELECT @x = xData
FROM xTable
SELECT a.b.value('(SampleID/text())[1]', 'varchar(20)') AS SampleID,
 a.b.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
 a.b.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
 a.b.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM @x.nodes('MyXMLFile/Sample') AS xData(a)
CROSS APPLY @x.nodes('MyXMLFile/Sample/SampleID/../Location') AS a(b)

Now, instead of blank SampleID field or duplicated records, SampleID came back NULL and the data was duplicated:

ID SampleID LocationName Foo1 Foo2
1 NULL Jeff 10 11 
2 NULL Jenn 11 12 
... NULL ... ... ... 
42 NULL ... ... ... 
43 NULL ... ... ...
44 NULL ... ... ...
45 NULL Greg 13 14 
46 NULL Anne 17 18
... NULL ... ... ...
88 NULL ... ... ...

So in a final attempt to select the right data, I tried this query:

Query 5

DECLARE @x xml;
SELECT @x = xData
FROM xTable
SELECT a.b.value('(SampleID/text())[1]', 'varchar(20)') AS SampleID,
 c.d.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
 c.d.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
 c.d.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM @x.nodes('MyXMLFile/Sample') AS xData(a)
CROSS APPLY @x.nodes('MyXMLFile/Sample') AS a(b)
CROSS APPLY @x.nodes('MyXMLFile/Sample/SampleID/../Location') AS c(d)

The result here is even more surprising to me, not only did the query populate all the fields but it quadrupled the output:

ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11 
2 00000001 Jenn 11 12 
... 00000001 ... ... ... 
... 00000001 ... ... ... 
... 00000001 ... ... ... 
44 00000001 ... ... ...
45 00000002 Greg 13 14 
46 00000002 Anne 17 18
47 00000002 ... ... ...
48 00000002 ... ... ...
... ... ... ... ...
176 00000002 ... ... ...

I understand my problem to be incorporation of the two different xpaths into the query and my understanding and use of the derived tables in the query. Any help would be appreciated. How can I adjust these queries to get the table I need?

Thanks in advance.

EDIT: At the advice of the answer of David Browne this works for me:

Query 6

INSERT INTO MyTable (SampleID, LocationName, Foo1, Foo2)
SELECT Sample.n.value('(SampleID)[1]', 'varchar(20)') AS SampleName,
 Location.n.value('(LocationName/text())[1]', 'varchar(1)') AS LocationName,
 Location.n.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
 Location.n.value('(Foo2/text())[1]', 'varchar(6)') As Foo2
FROM xTable AS x
CROSS APPLY x.xData.nodes('/MYXMLFile/Sample') AS Sample(n)
CROSS APPLY Sample.n.nodes('Location') AS Location(n)

Question 2

The pattern is that each cross apply picks up the relative location of the parent. Try something like this:

declare @doc xml =N'<?xml version="1.0" encoding="UTF-16"?>
<MyXMLFile>
 <!--There CAN BE up to 1 million <Sample> nodes-->
 <Sample>
 <!--There ARE EXACTLY 22 child nodes for each <Sample> parent node, one <SampleID> and 21 <Location>-->
 <SampleID>0000001A</SampleID>
 <!--There ARE EXACTLY 3 child nodes for each <Location> parent node, on <LocationID> and two <Foo>-->
 <Location>
 <LocationName>Jeff</LocationName>
 <Foo1>10</Foo1>
 <Foo2>11</Foo2>
 </Location>
 <Location>
 <LocationName>Jenn</LocationName>
 <Foo1>11</Foo1>
 <Foo2>12</Foo2>
 </Location>
 </Sample>
 <Sample>
 <SampleID>0000002A</SampleID>
 <Location>
 <LocationName>Greg</LocationName>
 <Foo1>13</Foo1>
 <Foo2>14</Foo2>
 </Location>
 <Location>
 <LocationName>Anne</LocationName>
 <Foo1>14</Foo1>
 <Foo2>16</Foo2>
 </Location>
 </Sample>
</MyXMLFile>'
drop table if exists #xData;
with q as
(
 select 1 ID, @doc xData
 union all 
 select 1 ID, @doc xData
)
select *
into #xData
from q
SELECT Sample.n.value('(SampleID)[1]', 'varchar(20)') AS SampleID,
 Location.n.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
 Location.n.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
 Location.n.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM #xData x
cross apply x.xData.nodes('/MyXMLFile/Sample') AS Sample(n)
cross apply Sample.n.nodes('Location') as Location(n)

outputs

SampleID LocationName Foo1 Foo2
-------------------- ------------ ------ ------
0000001A Jeff 10 11
0000001A Jenn 11 12
0000002A Greg 13 14
0000002A Anne 14 16
0000001A Jeff 10 11
0000001A Jenn 11 12
0000002A Greg 13 14
0000002A Anne 14 16
(8 rows affected)

Question 3

It appears that I have no idea how to work with derived tables. Ha! I don’t have access to my data at the moment. As soon as I do, I will give this a try and mark your answer as correct.

Question 4

I apologize for my naivete, but any particular reason to do the union all and create two rows of the same xml in the temp table? I get the same result with only one.

Question 5

Because you said that the source is a table called xData has (ID,xData) rows.

Question 6

Dang. I confused myself. It only has 1. I’ve edited a little and got it going. I’ll post follow up with my query today. I already have a table so I don’t need a temp, although, I’ll probably go with a temp later.

score 3 · Accepted Answer · 2020-03-19 19:52:17Z

The pattern is that each cross apply picks up the relative location of the parent. Try something like this:

declare @doc xml =N'<?xml version="1.0" encoding="UTF-16"?>
<MyXMLFile>
 <!--There CAN BE up to 1 million <Sample> nodes-->
 <Sample>
 <!--There ARE EXACTLY 22 child nodes for each <Sample> parent node, one <SampleID> and 21 <Location>-->
 <SampleID>0000001A</SampleID>
 <!--There ARE EXACTLY 3 child nodes for each <Location> parent node, on <LocationID> and two <Foo>-->
 <Location>
 <LocationName>Jeff</LocationName>
 <Foo1>10</Foo1>
 <Foo2>11</Foo2>
 </Location>
 <Location>
 <LocationName>Jenn</LocationName>
 <Foo1>11</Foo1>
 <Foo2>12</Foo2>
 </Location>
 </Sample>
 <Sample>
 <SampleID>0000002A</SampleID>
 <Location>
 <LocationName>Greg</LocationName>
 <Foo1>13</Foo1>
 <Foo2>14</Foo2>
 </Location>
 <Location>
 <LocationName>Anne</LocationName>
 <Foo1>14</Foo1>
 <Foo2>16</Foo2>
 </Location>
 </Sample>
</MyXMLFile>'
drop table if exists #xData;
with q as
(
 select 1 ID, @doc xData
 union all 
 select 1 ID, @doc xData
)
select *
into #xData
from q
SELECT Sample.n.value('(SampleID)[1]', 'varchar(20)') AS SampleID,
 Location.n.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
 Location.n.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
 Location.n.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM #xData x
cross apply x.xData.nodes('/MyXMLFile/Sample') AS Sample(n)
cross apply Sample.n.nodes('Location') as Location(n)

outputs

SampleID LocationName Foo1 Foo2
-------------------- ------------ ------ ------
0000001A Jeff 10 11
0000001A Jenn 11 12
0000002A Greg 13 14
0000002A Anne 14 16
0000001A Jeff 10 11
0000001A Jenn 11 12
0000002A Greg 13 14
0000002A Anne 14 16
(8 rows affected)

It appears that I have no idea how to work with derived tables. Ha! I don’t have access to my data at the moment. As soon as I do, I will give this a try and mark your answer as correct.
I apologize for my naivete, but any particular reason to do the union all and create two rows of the same xml in the temp table? I get the same result with only one.
Because you said that the source is a table called xData has (ID,xData) rows.
Dang. I confused myself. It only has 1. I’ve edited a little and got it going. I’ll post follow up with my query today. I already have a table so I don’t need a temp, although, I’ll probably go with a temp later.

Stack Exchange Network

XQuery in SQL Server to convert XML column data to relational data table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

XQuery in SQL Server to convert XML column data to relational data table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions