I'm currently developing a CSV export with XSLT. In my case the CSV file will be opened in Excel 99% of the time, so I have to take Excel's behavior into account.
My first problem
I can't write comments yet, but I'd like to address @Pier-Luc Gendreau's solution. While it is possible to open the file in European Excel (which by default uses ; as delimiter) and have full UTF-16LE support, it is apparently not possible to use this technique when you specify sep=, (a comma).
The issue with this solution is that while Excel interprets sep=; properly, it displays sep= (yes, it swallows the ;) in the first column of the last row.
For me it did not work when I specified a delimiter that wasn't the default one (; in my case), so I assume Excel did not interpret the last line correctly and swallowed the last delimiter only because that is its default behavior.
Please correct me if I'm wrong.
You are right: there is no way in Excel 2007 to get it to load both the encoding and the separator correctly across different locales when someone double-clicks a CSV file.
It seems that when you specify sep= after the BOM, Excel forgets that the BOM has told it the file is UTF-8.
You have to specify the BOM because in certain locales Excel does not detect the separator. For instance, in Danish the default separator is ;. If you output tab- or comma-separated text, Excel does not detect the separator, and in other locales a semicolon-separated file doesn't load. You can test this by changing the locale format in the Windows settings; Excel then picks up the change.
From the question "Is it possible to force Excel recognize UTF-8 CSV files automatically?" and its answers, it seems the only way is to use UTF-16LE encoding with a BOM.
Note also that, as per http://wiki.scn.sap.com/wiki/display/ABAP/CSV+tests+of+encoding+and+column+separator?original_fqdn=wiki.sdn.sap.com, it seems that UTF-16LE with tab separators works.
I've wondered whether Excel reads sep=; and then re-calls the method that fetches the CSV text, losing the BOM in the process. I've tried giving it incorrect text, and I can't find any workaround that tells Excel to honor both the sep line and the encoding.
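When testing combinations like these, it helps to confirm which BOM a generated file really starts with before blaming Excel. Below is a small Python sketch for that check. Python and the helper name detect_bom are my own choices for illustration and are not from this thread. It only inspects the first bytes of the file and says nothing about how Excel will treat them.

```python
import codecs

# BOMs relevant to this thread. UTF-32 is deliberately not handled here;
# its little-endian BOM begins with the same two bytes as UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_bom_in_file(path):
    """Read only the first few bytes of a file and report its BOM."""
    with open(path, "rb") as handle:
        return detect_bom(handle.read(4))
```

Calling detect_bom_in_file on the exported file tells you whether the export step dropped or changed the BOM, which is worth ruling out first.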
This is the result of my testing with Excel 2013.
If you're stuck with UTF-8, there is a workaround which consists of BOM + data + sep=;
Input (written with UTF8 encoding)
\ufeffSome;Header;Columns
Wîth;Fàncÿ;Stûff
sep=;
Output
|Some|Header|Columns|
|Wîth|Fàncÿ |Stûff |
|sep=| | |
The issue with this solution is that while Excel interprets sep=; properly, it displays sep= (yes, it swallows the ;) in the first column of the last row.
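For anyone who wants to reproduce the UTF-8 workaround, here is a minimal Python sketch of the BOM + data + sep=; layout. Python is just my choice for illustration, and the function name build_utf8_csv_with_trailing_sep is made up. The code only builds the bytes; whether Excel then behaves as described will still depend on the Excel version and locale.

```python
import codecs
import csv
import io


def build_utf8_csv_with_trailing_sep(rows, delimiter=";"):
    """Build bytes laid out as: UTF-8 BOM, delimited rows, trailing sep= line."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\r\n")
    writer.writerows(rows)
    # The sep= hint goes LAST, after the data, as in the workaround.
    buf.write("sep=" + delimiter)
    # Prepend the BOM explicitly so it is the very first thing in the file.
    return codecs.BOM_UTF8 + buf.getvalue().encode("utf-8")


payload = build_utf8_csv_with_trailing_sep(
    [["Some", "Header", "Columns"], ["Wîth", "Fàncÿ", "Stûff"]]
)
# To try it in Excel, write the bytes in binary mode, for example:
# with open("export.csv", "wb") as handle:
#     handle.write(payload)
```

Writing in binary mode keeps Python from touching the line endings or adding a second BOM.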
However, if you can write the file as UTF-16LE, then there is an actual solution. Use the \t delimiter without specifying sep and Excel will play ball.
Input (written with UTF16-LE encoding)
\ufeffSome\tHeader\tColumns
Wîth\tFàncÿ\tStûff
Output
|Some|Header|Columns|
|Wîth|Fàncÿ |Stûff |
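A rough Python sketch of the UTF-16LE variant, in case it helps someone generate a test file. Python and the function name build_utf16le_tsv are my own assumptions for illustration. Note that Python's "utf-16-le" codec does not write a BOM by itself, so the sketch prepends it by hand.

```python
import codecs
import csv
import io


def build_utf16le_tsv(rows):
    """Build bytes laid out as: UTF-16LE BOM, then tab-delimited rows, no sep= line."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # "utf-16-le" emits no BOM, so add the two BOM bytes (FF FE) explicitly.
    return codecs.BOM_UTF16_LE + buf.getvalue().encode("utf-16-le")


payload = build_utf16le_tsv(
    [["Some", "Header", "Columns"], ["Wîth", "Fàncÿ", "Stûff"]]
)
# To try it in Excel, write the bytes in binary mode, for example:
# with open("export.csv", "wb") as handle:
#     handle.write(payload)
```

The file can keep the .csv extension; the point of the technique is the encoding plus the tab delimiter, not the file name.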