sep=";" statement breaks UTF-8 BOM in CSV file generated by XSL


I'm currently developing a CSV export with XSLT. In my case the CSV file will be opened in Excel 99% of the time, so I have to take Excel's behavior into account.

My first problem

3 answers
  • 2020-12-08 05:11

    I can't write comments yet, but I'd like to address @Pier-Luc Gendreau's solution. While it is possible to open the file in European Excel (which uses ; as the delimiter by default) and have full UTF-16LE support, it is apparently not possible to use this technique when you specify sep=,.

    The issue with this solution is that while Excel interprets sep=; properly, it displays sep= (yes, it swallows the ;) in the first column of the last row.

    For me it did not work when I specified a delimiter other than the default one (; in my case), so I assume Excel did not interpret the last line correctly and swallowed the last delimiter because this is the default behavior.

    Please correct me if I'm wrong.

  • 2020-12-08 05:15

    You are right, there is no way in Excel 2007 to get it to load both the encoding and the separator correctly across different locales when someone double-clicks a CSV file.

    It seems like when you specify sep= after the BOM it forgets the BOM has told it that it is UTF-8.

    You have to specify the BOM because in certain locales Excel does not detect the separator. For instance, in Danish the default separator is ;. If you output tab- or comma-separated text, Excel does not detect the separator, and in other locales the file does not load correctly if you separate with semicolons. You can test this by changing the locale format in the Windows settings; Excel then picks this up.

    From this question: Is it possible to force Excel recognize UTF-8 CSV files automatically?

    and its answers, it seems the only way is to use UTF-16LE encoding with a BOM.

    Note also that, as per http://wiki.scn.sap.com/wiki/display/ABAP/CSV+tests+of+encoding+and+column+separator?original_fqdn=wiki.sdn.sap.com, it seems to work if you use UTF-16LE with tab separators.

    I've wondered whether Excel reads sep=; and then re-calls the method that gets the CSV text and loses the BOM. I've tried giving it incorrect text, and I can't find any workaround that tells Excel to honor both the sep and the encoding.

  • 2020-12-08 05:31

    This is the result of my testing with Excel 2013.

    If you're stuck with UTF-8, there is a workaround which consists of BOM + data + sep=;

    Input (written with UTF8 encoding)

    \ufeffSome;Header;Columns
    Wîth;Fàncÿ;Stûff
    sep=;
    

    Output

    |Some|Header|Columns|
    |Wîth|Fàncÿ |Stûff  |
    |sep=|      |       |
    

    The issue with this solution is that while Excel interprets sep=; properly, it displays sep= (yes, it swallows the ;) in the first column of the last row.
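
A minimal Python sketch of the "BOM + data + sep=;" layout described above, for anyone generating the file outside XSLT. The function name `build_utf8_csv_with_trailing_sep` is hypothetical, and the CRLF line endings are an assumption (they are the `csv` module's default, not something the answer specifies).

```python
import codecs
import csv
import io


def build_utf8_csv_with_trailing_sep(rows, delimiter=";"):
    """Return bytes laid out as: UTF-8 BOM + delimited rows + trailing 'sep=' line.

    Hypothetical helper for illustration; line endings are CRLF (an assumption).
    """
    buf = io.StringIO(newline="")
    # csv.writer handles quoting if a field contains the delimiter.
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    # The sep= hint goes LAST, so the BOM stays at the very start of the file.
    buf.write("sep=" + delimiter + "\r\n")
    return codecs.BOM_UTF8 + buf.getvalue().encode("utf-8")


utf8_rows = [["Some", "Header", "Columns"], ["Wîth", "Fàncÿ", "Stûff"]]
utf8_data = build_utf8_csv_with_trailing_sep(utf8_rows)
```

Writing `utf8_data` to a file opened in binary mode (`"wb"`) keeps the bytes exactly as built. The stray `sep=` row reported above would still be visible in Excel.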

    However, if you can write the file as UTF-16LE, then there is an actual solution. Use the \t (tab) delimiter without specifying sep and Excel will play ball.

    Input (written with UTF-16LE encoding)

    \ufeffSome\tHeader\tColumns
    Wîth\tFàncÿ\tStûff
    

    Output

    |Some|Header|Columns|
    |Wîth|Fàncÿ |Stûff  |
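
The UTF-16LE variant can be sketched in Python the same way: a BOM, tab-delimited rows, and no sep= line. The function name `build_utf16le_tsv` is hypothetical, and the CRLF line endings are again an assumption taken from the `csv` module's default.

```python
import codecs
import csv
import io


def build_utf16le_tsv(rows):
    """Return bytes laid out as: UTF-16LE BOM + tab-delimited rows, no 'sep=' line.

    Hypothetical helper for illustration; line endings are CRLF (an assumption).
    """
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter="\t").writerows(rows)
    # The "utf-16-le" codec does not emit a BOM itself, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + buf.getvalue().encode("utf-16-le")


utf16_rows = [["Some", "Header", "Columns"], ["Wîth", "Fàncÿ", "Stûff"]]
utf16_data = build_utf16le_tsv(utf16_rows)
```

As with the UTF-8 sketch, the result should be written in binary mode so that no extra newline translation or second BOM is added.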
    