Microsoft Excel mangles Diacritics in .csv files?

后端未结

关注

 22  1753

I am programmatically exporting data (using PHP 5.2) into a .csv test file.
Example data: Numéro 1 (note the accented e). The data is utf-8 (

相关标签:

22条回答

名媛妹妹

2020-11-22 05:20

As Fregal said \uFEFF is the way to go.

<%@LANGUAGE="JAVASCRIPT" CODEPAGE="65001"%>
<%
Response.Clear();
Response.ContentType = "text/csv";
Response.Charset = "utf-8";
Response.AddHeader("Content-Disposition", "attachment; filename=excelTest.csv");
Response.Write("\uFEFF");
// csv text here
%>

0 讨论(0)

感情败类

2020-11-22 05:20

Excel 2007 properly reads UTF-8 with BOM (EF BB BF) encoded csv.

Excel 2003 (and maybe earlier) reads UTF-16LE with BOM (FF FE), but with TABs instead of commas or semicolons.

0 讨论(0)
发布评论:

提交评论
- 加载中...
[愿得一人]

2020-11-22 05:20

Another solution I found was just to encode the result as Windows Code Page 1252 (Windows-1252 or CP1252). This would be done, for example by setting Content-Type appropriately to something like text/csv; charset=Windows-1252 and setting the character encoding of the response stream similarly.

0 讨论(0)
发布评论:

提交评论
- 加载中...
粉色の甜心

2020-11-22 05:20

Check the encoding in which you are generating the file, to make excel display the file correctly you must use the system default codepage.

Wich language are you using? if it's .Net you only need to use Encoding.Default while generating the file.

0 讨论(0)
发布评论:

提交评论
- 加载中...

北海茫月

2020-11-22 05:22

If you have legacy code in vb.net like I have, the following code worked for me:

    Response.Clear()
    Response.ClearHeaders()
    Response.ContentType = "text/csv"
    Response.Expires = 0
    Response.AddHeader("Content-Disposition", "attachment; filename=export.csv;")
    Using sw As StreamWriter = New StreamWriter(Context.Response.OutputStream, System.Text.Encoding.Unicode)
        sw.Write(csv)
        sw.Close()
    End Using
    Response.End()

0 讨论(0)

萌比男神i

2020-11-22 05:23

A correctly formatted UTF8 file can have a Byte Order Mark as its first three octets. These are the hex values 0xEF, 0xBB, 0xBF. These octets serve to mark the file as UTF8 (since they are not relevant as "byte order" information).1 If this BOM does not exist, the consumer/reader is left to infer the encoding type of the text. Readers that are not UTF8 capable will read the bytes as some other encoding such as Windows-1252 and display the characters ï»¿ at the start of the file.

There is a known bug where Excel, upon opening UTF8 CSV files via file association, assumes that they are in a single-byte encoding, disregarding the presence of the UTF8 BOM. This can not be fixed by any system default codepage or language setting. The BOM will not clue in Excel - it just won't work. (A minority report claims that the BOM sometimes triggers the "Import Text" wizard.) This bug appears to exist in Excel 2003 and earlier. Most reports (amidst the answers here) say that this is fixed in Excel 2007 and newer.

Note that you can always* correctly open UTF8 CSV files in Excel using the "Import Text" wizard, which allows you to specify the encoding of the file you're opening. Of course this is much less convenient.

Readers of this answer are most likely in a situation where they don't particularly support Excel < 2007, but are sending raw UTF8 text to Excel, which is misinterpreting it and sprinkling your text with Ã and other similar Windows-1252 characters. Adding the UTF8 BOM is probably your best and quickest fix.

If you are stuck with users on older Excels, and Excel is the only consumer of your CSVs, you can work around this by exporting UTF16 instead of UTF8. Excel 2000 and 2003 will double-click-open these correctly. (Some other text editors can have issues with UTF16, so you may have to weigh your options carefully.)

_{* Except when you can't, (at least) Excel 2011 for Mac's Import Wizard does not actually always work with all encodings, regardless of what you tell it. </anecdotal-evidence> :)}

0 讨论(0)
发布评论:

提交评论
- 加载中...