I previously asked a question about an error I was recieving. You don't really need it to understand this question but it's here for reference:
The previous xml is a little complex and probably would benefit from a transformation so I applied an XSLT template to get the below structure and changed the tags a little so its more understandable. I've also restructured the table I'm importing to, for maintainability. I imported the transformed XML file to a SQL Server table, xTable
, with column xData
, Like so (only one row but I suppose you could import more than 1 and merge them all with David Browne's answer):
ID xData
1 <MyXMLFile><Sample><Location>....
The parent node of the xml, <Sample>
, can be repeated up to 1 million times but for this illustration, I only have 2. There are 22 child nodes for each sample, one <SampleID>
node and 21 <Location>
nodes (I've only shown 2 nodes to keep things short). There are 3 child nodes for each node, one <LocationName>
node and two <Foo>
nodes, designated <Foo1>
and <Foo2>
.
<?xml version="1.0" encoding="UTF-16"?>
<MyXMLFile>
<!--There CAN BE up to 1 million <Sample> nodes-->
<Sample>
<!--There ARE EXACTLY 22 child nodes for each <Sample> parent node, one <SampleID> and 21 <Location>-->
<SampleID>0000001A</SampleID>
<!--There ARE EXACTLY 3 child nodes for each <Location> parent node, on <LocationID> and two <Foo>-->
<Location>
<LocationName>Jeff</LocationName>
<Foo1>10</Foo1>
<Foo2>11</Foo2>
</Location>
<Location>
<LocationName>Jenn</LocationName>
<Foo1>11</Foo1>
<Foo2>12</Foo2>
</Location>
</Sample>
<Sample>
<SampleID>0000002A</SampleID>
<Location>
<LocationName>Greg</LocationName>
<Foo1>13</Foo1>
<Foo2>14</Foo2>
</Location>
<Location>
<LocationName>Anne</LocationName>
<Foo1>14</Foo1>
<Foo2>16</Foo2>
</Location>
</Sample>
</MyXMLFile>
I want to convert the xData
column from xTable
and put it into this table (ID column for illustration only):
ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11
2 00000001 Jenn 11 12
... 00000001 ... ... ...
22 00000001 ... ... ...
23 00000002 Greg 13 14
24 00000002 Anne 17 18
... 00000002 ... ... ...
44 00000002 ... ... ...
At the moment, I'm just trying to SELECT
the xData
column from xTable
and will edit the query later to insert the data. So my first query, just to show that <SampleID>
does get selected:
Query 1
SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample') as a(b)
The output looks good:
ID SampleID
1 00000001
2 00000002
So, I added to the query:
Query2
SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID,
a.b.query('LocationName').value('.', 'varchar(10)') AS LocationName,
a.b.query('Foo1').value('.', 'varchar(6)') AS Foo1,
a.b.query('Foo2').value('.', 'varchar(6)') AS Foo2
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID/../Location') as a(b)
For this output, no data gets selected for <SampleID>
. This is not surprising to me as the xpath selection is only to the <Location>
parent node and returns its children <LocationName>
, <Foo1>
and <Foo2>
and not <SampleID>
.
ID SampleID LocationName Foo1 Foo2
1 Jeff 10 11
2 Jenn 11 12
... ... ... ...
22 ... ... ...
23 Greg 13 14
24 Anne 17 18
... ... ... ...
44 ... ... ...
So then I tried this:
Query 3
SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID,
c.d.query('LocationName').value('.', 'varchar(10)') AS LocationName,
c.d.query('Foo1').value('.', 'varchar(6)') AS Foo1,
c.d.query('Foo2').value('.', 'varchar(6)') AS Foo2
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID') as a(b)
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID/../Location') as c(d)
The output is a little better but the rows are duplicated in the table. There should only be 44, but there are 88:
ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11
2 00000001 Jenn 11 12
... 00000001 ... ... ...
42 00000001 ... ... ...
43 00000001 ... ... ...
44 00000001 ... ... ...
45 00000002 Greg 13 14
46 00000002 Anne 17 18
... ... ... ... ...
88 00000002 ... ... ...
Then I thought I would try a different way.
Query 4
DECLARE @x xml;
SELECT @x = xData
FROM xTable
SELECT a.b.value('(SampleID/text())[1]', 'varchar(20)') AS SampleID,
a.b.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
a.b.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
a.b.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM @x.nodes('MyXMLFile/Sample') AS xData(a)
CROSS APPLY @x.nodes('MyXMLFile/Sample/SampleID/../Location') AS a(b)
Now, instead of blank SampleID
field or duplicated records, SampleID
came back NULL
and the data was duplicated:
ID SampleID LocationName Foo1 Foo2
1 NULL Jeff 10 11
2 NULL Jenn 11 12
... NULL ... ... ...
42 NULL ... ... ...
43 NULL ... ... ...
44 NULL ... ... ...
45 NULL Greg 13 14
46 NULL Anne 17 18
... NULL ... ... ...
88 NULL ... ... ...
So in a final attempt to select the right data, I tried this query:
Query 5
DECLARE @x xml;
SELECT @x = xData
FROM xTable
SELECT a.b.value('(SampleID/text())[1]', 'varchar(20)') AS SampleID,
c.d.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
c.d.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
c.d.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM @x.nodes('MyXMLFile/Sample') AS xData(a)
CROSS APPLY @x.nodes('MyXMLFile/Sample') AS a(b)
CROSS APPLY @x.nodes('MyXMLFile/Sample/SampleID/../Location') AS c(d)
The result here is even more surprising to me, not only did the query populate all the fields but it quadrupled the output:
ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11
2 00000001 Jenn 11 12
... 00000001 ... ... ...
... 00000001 ... ... ...
... 00000001 ... ... ...
44 00000001 ... ... ...
45 00000002 Greg 13 14
46 00000002 Anne 17 18
47 00000002 ... ... ...
48 00000002 ... ... ...
... ... ... ... ...
176 00000002 ... ... ...
I understand my problem to be incorporation of the two different xpaths into the query and my understanding and use of the derived tables in the query. Any help would be appreciated. How can I adjust these queries to get the table I need?
Thanks in advance.
EDIT: At the advice of the answer of David Browne this works for me:
Query 6
INSERT INTO MyTable (SampleID, LocationName, Foo1, Foo2)
SELECT Sample.n.value('(SampleID)[1]', 'varchar(20)') AS SampleName,
Location.n.value('(LocationName/text())[1]', 'varchar(1)') AS LocationName,
Location.n.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
Location.n.value('(Foo2/text())[1]', 'varchar(6)') As Foo2
FROM xTable AS x
CROSS APPLY x.xData.nodes('/MYXMLFile/Sample') AS Sample(n)
CROSS APPLY Sample.n.nodes('Location') AS Location(n)
1 Answer 1
The pattern is that each cross apply
picks up the relative location of the parent. Try something like this:
declare @doc xml =N'<?xml version="1.0" encoding="UTF-16"?>
<MyXMLFile>
<!--There CAN BE up to 1 million <Sample> nodes-->
<Sample>
<!--There ARE EXACTLY 22 child nodes for each <Sample> parent node, one <SampleID> and 21 <Location>-->
<SampleID>0000001A</SampleID>
<!--There ARE EXACTLY 3 child nodes for each <Location> parent node, on <LocationID> and two <Foo>-->
<Location>
<LocationName>Jeff</LocationName>
<Foo1>10</Foo1>
<Foo2>11</Foo2>
</Location>
<Location>
<LocationName>Jenn</LocationName>
<Foo1>11</Foo1>
<Foo2>12</Foo2>
</Location>
</Sample>
<Sample>
<SampleID>0000002A</SampleID>
<Location>
<LocationName>Greg</LocationName>
<Foo1>13</Foo1>
<Foo2>14</Foo2>
</Location>
<Location>
<LocationName>Anne</LocationName>
<Foo1>14</Foo1>
<Foo2>16</Foo2>
</Location>
</Sample>
</MyXMLFile>'
drop table if exists #xData;
with q as
(
select 1 ID, @doc xData
union all
select 1 ID, @doc xData
)
select *
into #xData
from q
SELECT Sample.n.value('(SampleID)[1]', 'varchar(20)') AS SampleID,
Location.n.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
Location.n.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
Location.n.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM #xData x
cross apply x.xData.nodes('/MyXMLFile/Sample') AS Sample(n)
cross apply Sample.n.nodes('Location') as Location(n)
outputs
SampleID LocationName Foo1 Foo2
-------------------- ------------ ------ ------
0000001A Jeff 10 11
0000001A Jenn 11 12
0000002A Greg 13 14
0000002A Anne 14 16
0000001A Jeff 10 11
0000001A Jenn 11 12
0000002A Greg 13 14
0000002A Anne 14 16
(8 rows affected)
-
It appears that I have no idea how to work with derived tables. Ha! I don’t have access to my data at the moment. As soon as I do, I will give this a try and mark your answer as correct.Dan– Dan2020年03月19日 23:10:45 +00:00Commented Mar 19, 2020 at 23:10
-
I apologize for my naivete, but any particular reason to do the union all and create two rows of the same xml in the temp table? I get the same result with only one.Dan– Dan2020年03月20日 12:26:37 +00:00Commented Mar 20, 2020 at 12:26
-
Because you said that the source is a table called xData has (ID,xData) rows.David Browne - Microsoft– David Browne - Microsoft2020年03月20日 13:16:09 +00:00Commented Mar 20, 2020 at 13:16
-
Dang. I confused myself. It only has 1. I’ve edited a little and got it going. I’ll post follow up with my query today. I already have a table so I don’t need a temp, although, I’ll probably go with a temp later.Dan– Dan2020年03月20日 13:59:13 +00:00Commented Mar 20, 2020 at 13:59