Accented characters not correctly imported with BULK INSERT

前端 未结 4 2014
遥遥无期
遥遥无期 2021-01-15 05:22

I am importing a source CSV file, I don\'t know the source encoding and I can only see either � (ANSI encoding) or (UTF8-without-BOM encoding)

相关标签:
4条回答
  • 2021-01-15 05:27

    It was answered in the comment. Did you try it?

    http://msdn.microsoft.com/en-us/library/ms189941.aspx

    Option DATAFILETYPE ='widenative'

    Based on comment from Esailiga did the text get truncated before or after the bulk import. I agree it sounds like the CSV file itself is single byte. Unicode requires option DATAFILETYPE ='widenative'. If the CSV file is single byte the is not magic translation back.

    What is too bad is é is extended ASCII and supported with SQL char so more evidence the problem is at the CSV.
    SELECT CAST('é' AS char(1))
    notice this works as extended ASCII (<255)

    Sounds like you need to go back to the source.

    The ? in SQL is unknown. Same as � in notepad.

    0 讨论(0)
  • 2021-01-15 05:27

    I've done it! After all these years and we were all looking in the wrong place. No work needed no rewriting scripts...

    The problem lies with SSMS... if you "New Query" by right-clicking on "Queries" you get to rename the file but not create it that is done for you...

    But... if you "Ctrl+N" you get a new query window to edit but no file is created... So you save it yourself and choose encoding on the save button... towards the bottom of the list you'll find UTF-8(without signature) codepage 65001

    And that is it...

    script after script open a new query window with "ctrl+N" copy and paste from an existing query and save as directed above. And as if by magic it works

    If like me you have tables in Excel... parse the table writing the output to the 1st column of a new workbook with 1 sheet in it and then saveas and choose utf-8 encoding

    Speed things up with a template file containing a comment "-- utf-8" something like that. save it as utf-8 and use a file listing of *.sql pasted into excel to concatenate a list of =concatenate("ren templatefile.txt ", char(34), a1, char(34)) in b1 and drop it down

    After all these years of manual solutions I am literally sweating with excitement at the discovery. Thank you for getting me so upset

    0 讨论(0)
  • 2021-01-15 05:36

    I still cannot believe that after all these years Microsoft has not fixed this obvious bug. There should be no problem with èéêë etc because they are all ascii(<255). This quest is posed over and over again on many sites and the question has yet to be answered

    My data is in a table in excel. having generated the insert into statements the table is parsed a 2nd time looking for asccii > 'z' and generating and update table set column statement to overwrite the imported data. Cumbersome but workable

    0 讨论(0)
  • 2021-01-15 05:42

    In my case I can fix the encoding problem with the CODEPAGE option:

    BULK
    INSERT #CSV
    FROM 'D:\XY\xy.csv'
    WITH
    (
       CODEPAGE = 'ACP',
       DATAFILETYPE ='char',
       FIELDTERMINATOR = ',',
       ROWTERMINATOR = '\n',
       FIRSTROW = 2
    )
    

    Possible values: CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]

    You can find more information about the option here: BULK INSERT

    0 讨论(0)
提交回复
热议问题