
Every Friday I have to import a number of XML files (sometimes more than 300) into 2 tables.

The structure of one of the tables, R000000, looks like this:

R00000010 | R00000020 | R00000030 | R00000040 | R00000050 | R00000060
----------|-----------|-----------|-----------|-----------|----------
R000000   | I         | 0002      | 1         | 2         | 0026
R000000   | I         | 0003      | 1         | 2         | 0025
R000000   | I         | 0004      | 1         | 2         | 0021
R000000   | I         | 0006      | 1         | 2         | 0023
R000000   | I         | 0001      | 1         | 2         | 0022

^ Each row corresponds to an XML file.

The structure doesn't change, only the data (the values above are random examples).

The XML files look like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ns0:P4131 xmlns:ns0="http://switching/xi">
<R000000>
 <R00000010>R000000</R00000010>
 <R00000020>I</R00000020>
 <R00000030>0002</R00000030>
 <R00000040>1</R00000040>
 <R00000050>0026</R00000050>
 <R00000060>2</R00000060>
</R000000>
</ns0:P4131>

What is the best way to do this? I'm currently doing this in Access.

Solomon Rutzky
asked Oct 5, 2015 at 9:55
  • You have a mismatch between the field definitions of R00000050 and R00000060 between the example table and example XML. Their values are swapped. Please update the examples so that they are consistent with each other :-). Commented Oct 7, 2015 at 14:20

2 Answers


Give something like the below a try...

You'll obviously need to plug in your variables for your environment, check the data types (may need to add logic to keep leading zeros?), change from the final temp tables to your regular table(s), etc.

This works fine for me for importing the XML files into temp tables. It doesn't delete the files afterwards, but adding logic to delete them from the UNC path shouldn't be too difficult with another xp_cmdshell command.

DECLARE @folder AS VARCHAR(1000) = '\\servername\sharename\folder\subfolder1\'
DECLARE @command VARCHAR(500) = 'DIR /B "' + @folder + '*.xml"'
DECLARE @file VARCHAR(100)
DECLARE @filesinafolder TABLE (filenameswithfolder VARCHAR(500))
DECLARE @sql NVARCHAR(4000)
-- create global temp table
IF OBJECT_ID('tempdb..##XMLImport') IS NOT NULL
 DROP TABLE ##XMLImport
CREATE TABLE ##XMLImport (
 R00000010 VARCHAR(7)
 ,R00000020 VARCHAR(1)
 ,R00000030 INT
 ,R00000040 INT
 ,R00000050 INT
 ,R00000060 INT
 )
INSERT INTO @filesinafolder
EXEC master..xp_cmdshell @command
-- create cursor
DECLARE filecurs CURSOR
FOR
SELECT REPLACE(filenameswithfolder, @folder, '') AS filenames
FROM @filesinafolder
WHERE filenameswithfolder IS NOT NULL
OPEN filecurs
FETCH NEXT
FROM filecurs
INTO @file
IF @file = 'FILE NOT FOUND'
 GOTO exitprocessing
WHILE @@FETCH_STATUS <> -1
BEGIN
 SET @sql = 'DECLARE @X XML
 SELECT @X = P
 FROM OPENROWSET(BULK ''' + @folder + @file + ''', SINGLE_BLOB) AS Products(P)
 DECLARE @iX INT
 EXEC sp_xml_preparedocument @iX OUTPUT
 ,@X
 SELECT *
 INTO #XMLResults
 FROM OPENXML(@iX, ''/*/*'', 2) WITH (
 R00000010 VARCHAR(7)
 ,R00000020 VARCHAR(1)
 ,R00000030 INT
 ,R00000040 INT
 ,R00000050 INT
 ,R00000060 INT
 )
 EXEC sp_xml_removedocument @iX
 INSERT INTO ##XMLImport
 SELECT R00000010
 ,R00000020
 ,R00000030
 ,R00000040
 ,R00000050
 ,R00000060
 FROM #XMLResults'
 --PRINT @sql
 EXEC sp_executesql @sql
 -- process next file
 FETCH NEXT
 FROM filecurs
 INTO @file
END
exitprocessing:
-- clean up
CLOSE filecurs
DEALLOCATE filecurs
SELECT *
FROM ##XMLImport
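
As mentioned above, deleting the processed files afterwards is just one more xp_cmdshell call. A minimal sketch, assuming @folder is still in scope from the script above and that you only want to do this after verifying the contents of ##XMLImport:

DECLARE @delcommand VARCHAR(500) = 'DEL /Q "' + @folder + '*.xml"'
-- Removes every .xml file in the UNC folder; run only once the import is confirmed.
EXEC master..xp_cmdshell @delcommand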
answered Oct 5, 2015 at 21:11

This is actually very simple to do via SQLCLR. A Stored Procedure can be set up to read any XML files in a particular directory (or just as easily check all sub-directories) and output a single result set with all of their contents. Doing this, you could populate your table with the following query:

INSERT INTO dbo.R000000 (R00000010, R00000020, R00000030, R00000040, R00000050, R00000060)
 EXEC dbo.GetXmlDataFromFiles N'C:\Path\To\XML\Files', 0;

And that is it.

The following code will read any .xml file within the directory specified by the @FilePath input parameter, optionally traverse sub-directories, and return a single result set of the contents of each of the files.

Please note:

  • The code assumes only one node per file since the Question states "Each row corresponds to an XML file." and the example data is consistent with that statement.
  • If any of the files might have multiple <R000000> nodes, then it is pretty easy to change this code to handle that.
  • The contents of each file are sent back as result rows as soon as they are read. This means that only a single file is in memory at a time, rather than reading them all and sending back the entire result set when done. Hence, this scales pretty well and it wouldn't matter if you were importing 3000 files.
using System;
using System.Data;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;
using Microsoft.SqlServer.Server;

public class ImportXmlFiles
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void ReadXmlFiles([SqlFacet(MaxSize = 500)] SqlString FilePath,
        SqlBoolean Recursive)
    {
        XmlDocument _FileContents = new XmlDocument();
        SqlDataRecord _ResultRow = new SqlDataRecord(new SqlMetaData[] {
            new SqlMetaData("R00000010", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000020", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000030", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000040", SqlDbType.Int),
            new SqlMetaData("R00000050", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000060", SqlDbType.Int)
        });

        SqlContext.Pipe.SendResultsStart(_ResultRow);

        foreach (string _FileName in Directory.GetFiles(FilePath.Value, "*.xml",
            (Recursive.IsTrue) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly))
        {
            _FileContents.Load(_FileName);

            XmlElement _Row = (XmlElement)_FileContents.SelectSingleNode("//R000000");

            _ResultRow.SetString(0, _Row.SelectSingleNode("./R00000010").InnerText);
            _ResultRow.SetString(1, _Row.SelectSingleNode("./R00000020").InnerText);
            _ResultRow.SetString(2, _Row.SelectSingleNode("./R00000030").InnerText);
            _ResultRow.SetInt32(3,
                Convert.ToInt32(_Row.SelectSingleNode("./R00000040").InnerText));
            _ResultRow.SetString(4, _Row.SelectSingleNode("./R00000050").InnerText);
            _ResultRow.SetInt32(5,
                Convert.ToInt32(_Row.SelectSingleNode("./R00000060").InnerText));

            SqlContext.Pipe.SendResultsRow(_ResultRow);
        }

        SqlContext.Pipe.SendResultsEnd();

        return;
    }
}
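
If any of your files could carry more than one <R000000> node, the single SelectSingleNode call in the loop above can be swapped for an inner loop over SelectNodes. A sketch of just that portion (the Set* calls are the same six shown above):

foreach (XmlElement _Row in _FileContents.SelectNodes("//R000000"))
{
    // Same six Set* calls as in the single-node version above,
    // but one SendResultsRow per <R000000> node instead of per file.
    _ResultRow.SetString(0, _Row.SelectSingleNode("./R00000010").InnerText);
    // ... SetString / SetInt32 for R00000020 through R00000060 ...
    SqlContext.Pipe.SendResultsRow(_ResultRow);
}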

An easy to install, working example of the SQLCLR Stored Procedure shown above is available on Pastebin at:

SQLCLR Stored Proc returns one result set of many XML files

!! Please note that while the Assembly is set to EXTERNAL_ACCESS, the database property of TRUSTWORTHY is not set to ON, as is done in most SQLCLR examples that you will find here on the interwebs. The Assembly was signed (given a strong name) when compiled, so the install script creates an Asymmetric Key in [master], a Login based on that Asymmetric Key, and then grants that Login the EXTERNAL ACCESS ASSEMBLY permission. That not only allows the Assembly to be set to EXTERNAL_ACCESS without needing TRUSTWORTHY ON, but it also does not allow the Assembly to be set to UNSAFE, which would be allowed if TRUSTWORTHY was set to ON !!


Another approach that would be more generic, and would allow for importing various XML structures, would be to use a Table-Valued Function instead of a Stored Procedure. It would be even simpler than the Stored Procedure shown here: just read the contents of each file and return one row per file, in a result set with a single field of the XML datatype. Then you could use the T-SQL .nodes() and .value() functions to parse out different structures as appropriate.
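
To illustrate, here is a sketch of how such a TVF might be consumed, where dbo.GetXmlFileContents and its XML column FileContents are hypothetical names (the function itself would be a SQLCLR TVF you'd still need to write):

-- Hedged sketch: dbo.GetXmlFileContents is assumed to return one row per file
-- with a single XML-typed column named FileContents.
INSERT INTO dbo.R000000 (R00000010, R00000020, R00000030, R00000040, R00000050, R00000060)
SELECT  n.r.value('(R00000010/text())[1]', 'VARCHAR(10)'),
        n.r.value('(R00000020/text())[1]', 'VARCHAR(10)'),
        n.r.value('(R00000030/text())[1]', 'VARCHAR(10)'),
        n.r.value('(R00000040/text())[1]', 'INT'),
        n.r.value('(R00000050/text())[1]', 'VARCHAR(10)'),
        n.r.value('(R00000060/text())[1]', 'INT')
FROM    dbo.GetXmlFileContents(N'C:\Path\To\XML\Files') f
CROSS APPLY f.FileContents.nodes('/*/R000000') n(r);

Note that the <R000000> elements are not in the ns0 namespace (the xmlns:ns0 declaration is prefixed, not a default namespace), so the wildcard /*/R000000 path reaches them without any namespace declaration.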

answered Oct 7, 2015 at 14:18
