I have a flat-text data file containing records whose fields are separated by the non-printable character "File Separator" (0x1c). I am trying to use SQL Server's bcp utility to load this data into my database. Yet when I use the hex-encoded value of the File Separator as TERMINATOR, I get a syntax error.
I have tried using
- the hex encoded value: "0x1c"
- the XML encoded value as hex: "&#x1c;"
- the XML encoded value as decimal: "&#28;"
None of these work, yet when using the same encodings for a printable character like tab, they do work: both &#x09; and &#9; are accepted ("0x9" isn't. Unsurprisingly, since this is an XML file.)
The conclusion seems to be that non-printable characters are not supported. Is this the case? That would be ironic, since the non-printable separator characters were created exactly for this purpose...
Below you can find all code to reproduce this issue:
XML format file: test.xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="" MAX_LENGTH="10" COLLATION="Latin1_General_CS_AS_WS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="41"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="COL1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="COL2" xsi:type="SQLNUMERIC" PRECISION="4" SCALE="0"/>
</ROW>
</BCPFORMAT>
Data file: test.txt
This is just one row, as a test case. StackExchange doesn't show the separator in the row below, yet when you click "Edit" for this post, the separator is included, and you should be able to copy-paste this.
1111111112008
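Since the separator is invisible, here is a small PowerShell sketch to recreate the data file (an assumption based on the format file above: the 0x1C byte sits between the nine-digit first field and "2008", and the row ends with \r\n):
# Write test.txt with the invisible File Separator (0x1C) between the two fields.
Set-Content -NoNewline -Encoding Ascii test.txt ("111111111" + [char]0x1C + "2008" + "`r`n")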
Command line
bcp TEST_DB.dbo.UL_TEST in "test.txt" -T -f "test.xml"
2 Answers
It seems, from the documentation about BCP - Specify Field and Row Terminators (SQL Server), that non-printable characters are not supported:
Characters Supported As Terminators
The bcp command, BULK INSERT statement, and the OPENROWSET bulk rowset provider support a variety of characters as field or row terminators and always look for the first instance of each terminator. The following table lists the supported characters for terminators.
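The table itself is not reproduced here; the relevant entries are "any printable character (control characters are not printable, except null)" and strings of up to 10 printable characters, plus a few escape sequences such as \t and \n. Arbitrary control characters such as 0x1C are not among them.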
As an option, you could use PowerShell to read in the source file, search/replace the 0x1C characters with some other character like a tilde (~), and output to a different file. Then use that character as your terminator.
(Get-Content c:\test\test.txt) | ForEach-Object {$_ -replace [char]0x1C,'~'} | Set-Content c:\test\testout.txt
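The format file from the question would then be adjusted to use the replacement character as the terminator for the first field (a sketch, reusing the question's own FIELD definition):
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="~" MAX_LENGTH="10" COLLATION="Latin1_General_CS_AS_WS"/>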
Comments:
- How arbitrary... Printable characters only, for something that has nothing to do with printing? – TT. (Jun 7, 2018)
- @ScottHodgin: The documentation indeed implies that non-printable characters can't be used. It's weird, it sucks, and it should be fixed, but this is still a valid answer to the question :) Thanks – Wouter (Jun 8, 2018)
- Thanks for the PowerShell. In the meantime I had already worked out something similar, yet slightly more elaborate. Posted it as a separate answer. – Wouter (Jun 8, 2018)
- After some testing, I've figured out that only printable ASCII characters are supported, so up to hex 127 (0x7F). Any other character in the cp1252 set, like ,ドル is not supported, and neither is any other UTF-8 character. – Wouter (Jun 8, 2018)
- @Wouter - yeah, bummer! Glad you solved it another way :) – Scott Hodgin (Jun 8, 2018)
Since @ScottHodgin pointed out that non-printable characters are not supported, replacing them in the source file is the only alternative.
Below you can find an example PowerShell script that makes this replacement on a file encoded in UTF-8 (with or without a BOM header; it doesn't matter) and writes an output file encoded in UTF-8 without a BOM header.
Additionally, it uses AppendAllLines to turn this into a streaming operation. This way it also works for huge files, since they don't have to be loaded entirely into memory before doing the replacements. -ReadCount 1000 speeds up the process drastically.
# The output encoding: UTF-8 without a BOM header.
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
# Stream the input in batches of 1000 lines to keep memory usage low.
Get-Content -Encoding UTF8 -ReadCount 1000 test.txt |
ForEach-Object {
    # Replace every 0x1C in the batch (the emoji exercises the UTF-8
    # output path) and append it to a separate output file; appending to
    # the input file itself would corrupt it while it is still being read.
    # The name test_out.txt is just an example.
    [System.IO.File]::AppendAllLines(
        [string]'test_out.txt'
        , [string[]]($_ -replace '\x1C', '😎')
        , $Utf8NoBomEncoding
    )
}
Make sure you save the script above in a file that is encoded as UTF-8 with a BOM header; otherwise PowerShell doesn't process the characters as UTF-8.
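To verify the replacement before running bcp, here is a quick sanity check (a sketch; it assumes the test_out.txt output name used above):
# Count any remaining 0x1C characters in the output; this should print 0.
([regex]::Matches((Get-Content -Raw -Encoding UTF8 test_out.txt), '\x1C')).Count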
Ironically, the documentation for the TERMINATOR attribute in XML format files still claims: "This attribute specifies the terminator of a data field. The terminator can be any character."