问题
I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net
But data.csv is UTF8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:
bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database.windows.net
But then I get '0 rows copied'.
What am I doing wrong and how do I bulk insert Unicode characters using BCP?
回答1:
But data.csv is UTF8 encoded
The UTF-8 encoding is the primary issue. Using -w
won't help because in Microsoft-land, the term "Unicode" nearly always refers to UTF-16 Little Endian.
The solution will depend on which version of BCP you are using as an option was added in the newest version (13.0 / 2016):
If you are using BCP that came with SQL Server prior to SQL Server 2016 (version 13.0) then you need to convert the csv file to UTF-16 Little Endian (LE) as that is what Windows / SQL Server / .NET use for all strings. And use the
-w
switch.I got this to work encoding a file as "UCS-2 LE BOM" in Notepad++, whereas that same import file failed using the
-c
switch.If you are using BCP that came with SQL Server 2016 (version 13.0) or newer, then you can simply add
-c -C 65001
to the command line.-C
is for "code page", and 65001 is the code page for UTF-8.
The MSDN page for bcp Utility states (in the explanation of the -C
switch):
Versions prior to version 13 (SQL Server 2016) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.
UPDATE
Support for UTF-8 / code page 65001 was added to SQL Server 2014 via SP2, as noted in this Microsoft KB article:
UTF-8 encoding support for the BCP utility and BULK INSERT Transact-SQL command in SQL Server 2014 SP2
回答2:
The answer from Solomon helped me in my struggle with Unicode and SQL Server 2014. I would like to share my experience about Unicode here. I hope this helps the next person who suffers from Unicode problems with BCP.
I have had a hard time figuring out the UTF and Unicode of SQL Server 2014. I am using Powershell to upload using BCP to a SQL Server 2014 SP2 database. My files are in Dutch, UTF-8 without BOM. I used Powershell to convert the files into microsoft's Unicode:
Get-ChildItem "C:\Documents\ProjectA" -filter *.CSV |
ForEach-Object {
$path = $_.basename + '.unicode.CSV'
get-content $_ | Set-Content -Encoding Unicode -path $path
}
Then I used BCP without format file:
Get-ChildItem "C:\Documents\ProjectA" -filter *.unicode.CSV |
ForEach-Object {
try { $output = bcp ProjectA.dbo.auditlog in $_.FullName -w "-t," -T -F2
if ($LASTEXITCODE)
{ throw $output
}
catch
{ $Output >> C:\Documents\ProjectA\BCPCommandFailed$(get-date -f yyyy-MM-dd).log
}
}
The conversion into Unicode causes file sizes to double e.g. from 11,630KB into 23,259KB. Template file whether XML or non-XML did not work.
来源:https://stackoverflow.com/questions/41507292/insert-rows-with-unicode-characters-using-bcp