Every Friday I have to import a batch of XML files (sometimes more than 300) into 2 tables. The structure of one of the tables, R000000, looks like this:
R00000010 | R00000020 | R00000030 | R00000040 | R00000050 | R00000060
----------|-----------|-----------|-----------|-----------|----------
R000000   | I         | 0002      | 1         | 0026      | 2
R000000   | I         | 0003      | 1         | 0025      | 2
R000000   | I         | 0004      | 1         | 0021      | 2
R000000   | I         | 0006      | 1         | 0023      | 2
R000000   | I         | 0001      | 1         | 0022      | 2
Each row corresponds to one XML file. The structure doesn't change, only the data (the values above are just random examples).
The XML files look like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<ns0:P4131 xmlns:ns0="http://switching/xi">
  <R000000>
    <R00000010>R000000</R00000010>
    <R00000020>I</R00000020>
    <R00000030>0002</R00000030>
    <R00000040>1</R00000040>
    <R00000050>0026</R00000050>
    <R00000060>2</R00000060>
  </R000000>
</ns0:P4131>
What is the best way to do this? I'm currently doing this in Access.
2 Answers
Give something like the below a try.
You'll obviously need to plug in the variables for your environment, check the data types (you may need to add logic to keep leading zeros), change from the final temp tables to your regular table(s), etc.
It works fine for me for importing XML files into temp tables. It doesn't delete the files afterwards, but adding logic to delete files from the UNC path shouldn't be too difficult with another xp_cmdshell command.
DECLARE @folder AS VARCHAR(1000) = '\\servername\sharename\folder\subfolder1\';
DECLARE @command VARCHAR(500) = 'DIR /B "' + @folder + '*.xml"';
DECLARE @file VARCHAR(100);
DECLARE @filesinafolder TABLE (filenameswithfolder VARCHAR(500));
DECLARE @sql NVARCHAR(4000);

-- Create the global temp table that collects the rows from all files.
IF OBJECT_ID('tempdb..##XMLImport') IS NOT NULL
    DROP TABLE ##XMLImport;

CREATE TABLE ##XMLImport (
     R00000010 VARCHAR(7)
    ,R00000020 VARCHAR(1)
    ,R00000030 INT
    ,R00000040 INT
    ,R00000050 INT
    ,R00000060 INT
    );

-- Capture the directory listing.
INSERT INTO @filesinafolder
EXEC master..xp_cmdshell @command;

-- Create a cursor over the file names.
DECLARE filecurs CURSOR FOR
    SELECT REPLACE(filenameswithfolder, @folder, '') AS filenames
    FROM @filesinafolder
    WHERE filenameswithfolder IS NOT NULL;

OPEN filecurs;

FETCH NEXT FROM filecurs INTO @file;

IF @file = 'FILE NOT FOUND'
    GOTO exitprocessing;

WHILE @@FETCH_STATUS <> -1
BEGIN
    SET @sql = 'DECLARE @X XML;
        SELECT @X = P
        FROM OPENROWSET(BULK ''' + @folder + @file + ''', SINGLE_BLOB) AS Products(P);

        DECLARE @iX INT;
        EXEC sp_xml_preparedocument @iX OUTPUT, @X;

        SELECT *
        INTO #XMLResults
        FROM OPENXML(@iX, ''/*/*'', 2) WITH (
             R00000010 VARCHAR(7)
            ,R00000020 VARCHAR(1)
            ,R00000030 INT
            ,R00000040 INT
            ,R00000050 INT
            ,R00000060 INT
            );

        EXEC sp_xml_removedocument @iX;

        INSERT INTO ##XMLImport
        SELECT R00000010, R00000020, R00000030, R00000040, R00000050, R00000060
        FROM #XMLResults;';

    --PRINT @sql
    EXEC sp_executesql @sql;

    -- Process the next file.
    FETCH NEXT FROM filecurs INTO @file;
END

exitprocessing:

-- Clean up.
CLOSE filecurs;
DEALLOCATE filecurs;

SELECT *
FROM ##XMLImport;
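One caveat: xp_cmdshell is disabled by default on SQL Server, so the DIR call may return nothing until it is enabled. A sketch of enabling it (requires sysadmin; your DBA may prefer a proxy account), plus what the file-deletion step mentioned above could look like — the path here is a placeholder:

```sql
-- Enable xp_cmdshell (it is off by default); requires sysadmin.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- Deleting the processed files afterwards could be another xp_cmdshell call
-- (hypothetical path; plug in your own UNC path).
DECLARE @del VARCHAR(600) = 'DEL "\\servername\sharename\folder\subfolder1\*.xml"';
EXEC master..xp_cmdshell @del;
```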
This is actually very simple to do via SQLCLR. A Stored Procedure can be set up to read any .xml files in a particular directory (or just as easily check all sub-directories) and output a single result set with all of their contents. Doing this, you could populate your table with the following query:
INSERT INTO dbo.R000000 (R00000010, R00000020, R00000030, R00000040, R00000050, R00000060)
EXEC dbo.GetXmlDataFromFiles(N'C:\Path\To\XML\Files');
And that is it.
The following code will read any .xml file within the directory specified by the @FilePath input parameter, optionally traverse sub-directories, and return a single result set of the contents of each of the files.
Please note:
- The code assumes only one <R000000> node per file, since the Question states "Each row corresponds to an XML file." and the example data is consistent with that statement.
- If any of the files might have multiple <R000000> nodes, then it is pretty easy to change this code to handle that.
- The contents of each file are sent back as result rows as soon as they are read. This means that only a single file is in memory at a time, rather than reading them all and sending back the entire result set when done. Hence, this scales pretty well and it wouldn't matter if you were importing 3000 files.
using System;
using System.Data;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;
using Microsoft.SqlServer.Server;

public class ImportXmlFiles
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void ReadXmlFiles([SqlFacet(MaxSize = 500)] SqlString FilePath,
        SqlBoolean Recursive)
    {
        XmlDocument _FileContents = new XmlDocument();

        // Define the shape of the result set that will be streamed back.
        SqlDataRecord _ResultRow = new SqlDataRecord(new SqlMetaData[]{
            new SqlMetaData("R00000010", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000020", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000030", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000040", SqlDbType.Int),
            new SqlMetaData("R00000050", SqlDbType.VarChar, 10),
            new SqlMetaData("R00000060", SqlDbType.Int)
        });

        SqlContext.Pipe.SendResultsStart(_ResultRow);

        foreach (string _FileName in Directory.GetFiles(FilePath.Value, "*.xml",
            (Recursive.IsTrue) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly))
        {
            _FileContents.Load(_FileName);

            XmlElement _Row = (XmlElement)_FileContents.SelectSingleNode("//R000000");

            _ResultRow.SetString(0, _Row.SelectSingleNode("./R00000010").InnerText);
            _ResultRow.SetString(1, _Row.SelectSingleNode("./R00000020").InnerText);
            _ResultRow.SetString(2, _Row.SelectSingleNode("./R00000030").InnerText);
            _ResultRow.SetInt32(3,
                Convert.ToInt32(_Row.SelectSingleNode("./R00000040").InnerText));
            _ResultRow.SetString(4, _Row.SelectSingleNode("./R00000050").InnerText);
            _ResultRow.SetInt32(5,
                Convert.ToInt32(_Row.SelectSingleNode("./R00000060").InnerText));

            // Stream the row back immediately; only one file is in memory at a time.
            SqlContext.Pipe.SendResultsRow(_ResultRow);
        }

        SqlContext.Pipe.SendResultsEnd();
        return;
    }
}
An easy-to-install, working example of the SQLCLR Stored Procedure shown above is available on Pastebin at:
SQLCLR Stored Proc returns one result set of many XML files
!! Please note that while the Assembly is set to EXTERNAL_ACCESS, the database property of TRUSTWORTHY is not set to ON, as is done in most SQLCLR examples that you will find here on the interwebs. The Assembly was signed (given a strong name) when compiled, so the install script creates an Asymmetric Key in [master], a Login based on that Asymmetric Key, and then grants that Login the EXTERNAL ACCESS ASSEMBLY permission. That not only allows the Assembly to be set to EXTERNAL_ACCESS without needing TRUSTWORTHY ON, but it also does not allow the Assembly to be set to UNSAFE, which would be allowed if TRUSTWORTHY was set to ON !!
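The key-based permission steps described above look roughly like this (a sketch only — the key, login, and file names here are hypothetical; the Pastebin install script is the authoritative version):

```sql
USE [master];

-- Create a key from the signed (strong-named) assembly DLL.
-- Hypothetical names and path.
CREATE ASYMMETRIC KEY [ImportXmlFilesKey]
    FROM EXECUTABLE FILE = 'C:\Path\To\ImportXmlFiles.dll';

-- Create a login from that key and grant it the permission
-- that allows the assembly to be EXTERNAL_ACCESS without TRUSTWORTHY ON.
CREATE LOGIN [ImportXmlFilesLogin]
    FROM ASYMMETRIC KEY [ImportXmlFilesKey];

GRANT EXTERNAL ACCESS ASSEMBLY TO [ImportXmlFilesLogin];
```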
Another approach that would be more generic, and allow for importing various XML structures, would be to use a Table-Valued Function instead of a Stored Procedure. It would be even easier than the Stored Procedure shown here to simply read the contents of each file and return 1 row per file, in a result set that has a single field of the XML datatype. Then you could use the T-SQL .nodes() and .value() functions to parse out different structures as appropriate.
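A sketch of that Table-Valued Function idea — dbo.GetXmlFileContents and its FileContents column are hypothetical here (you would write that SQLCLR TVF yourself); the .nodes() and .value() calls are the standard T-SQL xml-datatype methods:

```sql
-- Hypothetical SQLCLR TVF returning (FileName NVARCHAR(500), FileContents XML),
-- one row per file in the directory.
INSERT INTO dbo.R000000
    (R00000010, R00000020, R00000030, R00000040, R00000050, R00000060)
SELECT  r.node.value('(R00000010/text())[1]', 'VARCHAR(7)'),
        r.node.value('(R00000020/text())[1]', 'VARCHAR(1)'),
        r.node.value('(R00000030/text())[1]', 'VARCHAR(10)'),
        r.node.value('(R00000040/text())[1]', 'INT'),
        r.node.value('(R00000050/text())[1]', 'VARCHAR(10)'),
        r.node.value('(R00000060/text())[1]', 'INT')
FROM    dbo.GetXmlFileContents(N'C:\Path\To\XML\Files') f
-- '/*/R000000' skips the namespaced ns0:P4131 root and shreds each row node.
CROSS APPLY f.FileContents.nodes('/*/R000000') AS r(node);
```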