Is it possible to force Excel to recognize UTF-8 CSV files automatically?

醉梦人生 2020-11-21 22:27

I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at

27 answers
  • 2020-11-21 23:02

    Yes, this is possible. As several users have already noted, Excel seems to have trouble detecting the byte order mark (BOM) when the file is encoded in UTF-8. It does not seem to have this problem with UTF-16, so the issue appears to be specific to UTF-8. The solution I use is to add the BOM twice. To do that, I run the following sed command twice:

    sed -i '1s/^/\xef\xbb\xbf/' *.csv
    

    Here the wildcard can be replaced with any file name (the \x escapes require GNU sed). However, this garbles the sep= line at the beginning of the .csv file: the file then opens normally in Excel, but with an extra row containing "sep=" in the first cell. The "sep=" line can also be removed from the source .csv itself, but then the delimiter should be specified when opening the file with VBA:

    Workbooks.Open name, Format:=6, Delimiter:=";", Local:=True
    

    Format:=6 tells Excel to use the custom delimiter given in the Delimiter argument. Set Local to True if there are dates in the file. If Local is not set to True, the dates are interpreted in US format, which in some cases corrupts the data.
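
The sed command prepends the BOM unconditionally, so every run adds another copy. If you only ever want a single BOM, a check-first variant might look like the Ruby sketch below. `ensure_utf8_bom` is a made-up helper name, and this does not reproduce the "twice" trick described above; it only shows how to prepend the three bytes safely.

```ruby
# The three bytes of the UTF-8 BOM, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: prepend a UTF-8 BOM to the file at `path`
# unless the file already starts with one.
# Returns true if the file was changed, false otherwise.
def ensure_utf8_bom(path)
  data = File.binread(path)
  return false if data.start_with?(UTF8_BOM)

  File.binwrite(path, UTF8_BOM + data)
  true
end
```

Calling it for each entry of Dir.glob("*.csv") would mirror the wildcard in the sed command.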

  • 2020-11-21 23:03

    The UTF-8 Byte-order marker will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).

    In case anybody is having the same issues I was, .NET's UTF8 encoding class does not output a byte-order marker in a GetBytes() call. You need to use streams (or use a workaround) to get the BOM to output.
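
For readers not on .NET, the same idea (emit the BOM yourself rather than relying on the encoder) might look like this in Ruby. It is only a sketch: `csv_with_bom` is a made-up helper name, and whether a given Excel version honors the BOM still has to be tested.

```ruby
require "csv"

# U+FEFF; in UTF-8 it is encoded as the three bytes EF BB BF.
UTF8_BOM = "\uFEFF"

# Hypothetical helper: build CSV text from an array of rows
# and prefix it with the BOM.
def csv_with_bom(rows, col_sep: ",")
  csv_text = CSV.generate(col_sep: col_sep) do |csv|
    rows.each { |row| csv << row }
  end
  UTF8_BOM + csv_text
end

data = csv_with_bom([["Förnamn", "First name"], ["Авиабилет", "Ticket"]])
puts data.bytes.first(3).map { |b| format("%02X", b) }.join(" ")  # EF BB BF
```

The returned string can be written with File.write or sent as a download body; the BOM is then the first thing in the file.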

  • 2020-11-21 23:08

    Hi, I'm using Ruby on Rails for CSV generation. Our application is going multilingual (I18n), and we ran into an issue when viewing I18n content in the CSV file in Excel on Windows.

    It was fine on Linux (Ubuntu) and Mac.

    We found that in Excel on Windows the data has to be imported again to display correctly. During the import you get additional options, including the choice of character set.

    But we can't teach this to each and every user, so the solution we were looking for was one where the file opens correctly with just a double click.

    We then found a way to display the data correctly in Excel on Windows by setting the open mode and a BOM, with the help of aghuddleston's gist (added as a reference).

    Example I18n content

    In Mac and Linux

    Swedish : Förnamn English : First name

    In Windows

    Swedish : Förnamn English : First name

    require 'csv'

    def user_information_report(report_file_path, user_id)
      user = User.find(user_id)
      I18n.locale = user.current_lang
      # External encoding UTF-16LE, internal UTF-8: strings are transcoded on write.
      open_mode = "w+:UTF-16LE:UTF-8"
      # U+FEFF as UTF-8 bytes; written through the transcoding file handle.
      bom = "\xEF\xBB\xBF"
      body report_file_path, user, open_mode, bom
    end

    # Translated column headers for the current locale.
    def headers
      [
        "ID", "SDN ID",
        I18n.t('sys_first_name'), I18n.t('sys_last_name'), I18n.t('sys_dob'),
        I18n.t('sys_gender'), I18n.t('sys_email'), I18n.t('sys_address'),
        I18n.t('sys_city'), I18n.t('sys_state'), I18n.t('sys_zip'),
        I18n.t('sys_phone_number')
      ]
    end

    # The path is passed in explicitly so it is in scope here.
    def body(report_file_path, tenant, open_mode, bom)
      File.open(report_file_path, open_mode) do |f|
        # Tab-separated output, built in memory first.
        csv_file = CSV.generate(col_sep: "\t") do |csv|
          csv << headers
          tenant.patients.find_each(batch_size: 10) do |patient|
            csv << [
              patient.id, patient.patientid,
              patient.first_name, patient.last_name, "#{patient.dob}",
              "#{translate_gender(patient.gender)}", patient.email, "#{patient.address_1.to_s} #{patient.address_2.to_s}",
              "#{patient.city}", "#{patient.state}", "#{patient.zip}",
              "#{patient.phone_number}"
            ]
          end
        end
        f.write bom        # the BOM must come before any data
        f.write(csv_file)
      end
    end
    

    The important things to note here are the open mode and the BOM:

    open_mode = "w+:UTF-16LE:UTF-8"

    bom = "\xEF\xBB\xBF"

    Before writing the CSV, insert the BOM:

    f.write bom

    f.write(csv_file)

    Windows and Mac

    The file can be opened directly by double-clicking.

    Linux (ubuntu)

    When the file is opened, a dialog asks for the separator options; choose “TAB”.
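
To see what the open mode and the BOM string actually put on disk, here is a small standalone sketch without Rails. `write_utf16le` and the file name are made up for the demo. The open mode makes Ruby transcode each UTF-8 string to UTF-16LE on write, so the three UTF-8 BOM bytes end up as the two-byte UTF-16LE BOM.

```ruby
require "tmpdir"

# U+FEFF written as its UTF-8 bytes (Ruby source files are UTF-8 by default).
BOM = "\xEF\xBB\xBF"

# Hypothetical helper: write `text` through a file handle that transcodes
# UTF-8 strings to UTF-16LE on the way out.
def write_utf16le(path, text)
  File.open(path, "w+:UTF-16LE:UTF-8") do |f|
    f.write BOM
    f.write text
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "report.csv")  # hypothetical file name
  write_utf16le(path, "Förnamn\tFirst name\n")
  raw = File.binread(path)
  puts raw.bytes.first(2).map { |b| format("%02X", b) }.join(" ")  # FF FE
end
```

So despite the UTF-8 looking BOM in the source, the file that Excel opens is UTF-16LE with a UTF-16LE BOM.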

  • 2020-11-21 23:09

    It is incredible that there are so many answers but none answers the question:

    "When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,..."

    The answer marked as accepted, with 200+ up-votes, is useless for me because I don't want to give my users a manual on how to configure Excel. Apart from that, such a manual would apply to one Excel version only; other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.

    So the question is how to make Excel show UTF8 data with a simple double click?

    Well, at least in Excel 2007 this is not possible with CSV files, because the UTF-8 BOM is ignored and you will see only garbage. This is already part of Lyubomyr Shaydariv's question:

    "I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."

    I had the same experience: writing Russian or Greek data into a UTF-8 CSV file with a BOM results in garbage in Excel:

    Content of UTF8 CSV file:

    Colum1;Column2
    Val1;Val2
    Авиабилет;Tλληνικ
    

    [Screenshot: result in Excel 2007]

    One solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that whether comma or semicolon is used as the separator depends on the regional settings in Control Panel. So the same CSV file may open correctly on one computer but not on another. "CSV" means "Comma Separated Values", but on a German Windows, for example, the semicolon must be used as the separator by default, and the comma does not work. (There it should be named SSV = Semicolon Separated Values.) CSV files cannot be interchanged between different language versions of Windows. This problem comes on top of the UTF-8 problem.

    Excel has existed for decades. It is a shame that Microsoft has not been able to implement such a basic thing as CSV import in all these years.


    However, if you put the same values into an HTML file and save that file as UTF-8 with a BOM and the file extension XLS, you will get the correct result.

    Content of UTF8 XLS file:

    <table>
    <tr><td>Colum1</td><td>Column2</td></tr>
    <tr><td>Val1</td><td>Val2</td></tr>
    <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
    </table>
    

    [Screenshot: result in Excel 2007]

    You can even use colors in HTML which Excel will show correctly.

    <style>
    .Head { background-color:gray; color:white; }
    .Red  { color:red; }
    </style>
    <table border=1>
    <tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
    <tr><td>Val1</td><td>Val2</td></tr>
    <tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
    </table>
    

    [Screenshot: result in Excel 2007]

    In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:

    <html xmlns:x="urn:schemas-microsoft-com:office:excel">
        <head>
            <meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
            <xml>
                <x:ExcelWorkbook>
                    <x:ExcelWorksheets>
                        <x:ExcelWorksheet>
                            <x:Name>MySuperSheet</x:Name>
                            <x:WorksheetOptions>
                                <x:DisplayGridlines/>
                            </x:WorksheetOptions>
                        </x:ExcelWorksheet>
                    </x:ExcelWorksheets>
                </x:ExcelWorkbook>
            </xml>
        </head>
        <body>
            <table>
                <tr><td>Colum1</td><td>Column2</td></tr>
                <tr><td>Val1</td><td>Val2</td></tr>
                <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
            </table>
        </body>
    </html>
    

    This code even lets you specify the name of the worksheet (here "MySuperSheet").

    [Screenshot: result in Excel 2007]
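
If you produce the HTML-as-XLS file from code, the cell values need HTML escaping, otherwise a "<" or "&" in the data breaks the table. A rough Ruby sketch of the plain table variant follows. `escape_html` and `html_table_with_bom` are made-up helper names, and the output is only the simple table from this answer, not the full workbook markup.

```ruby
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Replace the characters that have a special meaning in HTML.
def escape_html(value)
  value.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Hypothetical helper: render rows as an HTML table, prefixed with U+FEFF
# so that the saved UTF-8 file starts with the BOM.
def html_table_with_bom(rows)
  body = rows.map do |row|
    cells = row.map { |cell| "<td>#{escape_html(cell)}</td>" }.join
    "<tr>#{cells}</tr>"
  end
  "\uFEFF<table>\n#{body.join("\n")}\n</table>\n"
end

puts html_table_with_bom([["Colum1", "Column2"], ["Авиабилет", "a < b"]])
```

Writing the returned string with File.write to a file with the .xls extension gives the kind of file described above.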

  • 2020-11-21 23:09

    I have had the same issue in the past (how to produce files that Excel can read, and other tools can also read). I was using TSV rather than CSV, but the same problem with encodings came up.

    I failed to find any way to get Excel to recognize UTF-8 automatically, and I was not willing/able to inflict complicated instructions on how to open the files on their consumers. So I encoded them as UTF-16LE (with a BOM) instead of UTF-8. Twice the size for mostly-ASCII data, but Excel can recognize the encoding. And the files compress well, so the size rarely (but sadly not never) matters.
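
A minimal Ruby sketch of that approach, which also shows where the size cost comes from. `utf16le_tsv` is a made-up helper name, and it assumes the fields contain no tabs or line breaks, so it does no quoting.

```ruby
# Hypothetical helper: tab-separated text as UTF-16LE bytes with a leading BOM.
# Assumes fields contain no tabs or line breaks, so no quoting is done.
def utf16le_tsv(rows)
  text = rows.map { |row| row.join("\t") }.join("\r\n") + "\r\n"
  ("\uFEFF" + text).encode("UTF-16LE")
end

# Compare the encoded size of an ASCII-only row and a Cyrillic row.
[["First name", "Last name"], ["Авиабилет", "Билет"]].each do |row|
  line  = row.join("\t")
  utf8  = line.bytesize
  utf16 = line.encode("UTF-16LE").bytesize
  puts "#{row.inspect}: UTF-8 #{utf8} bytes, UTF-16LE #{utf16} bytes"
end
```

ASCII-only text doubles in size, while Cyrillic text takes about the same space in both encodings. The bytes returned by the helper can be written as-is with File.binwrite.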

  • 2020-11-21 23:09

    Just to help users who, like me, reached this thread looking for a way to open such a file in Excel.

    I used the wizard below and it worked fine for me when importing a UTF-8 file. Not transparent, but useful if you already have the file.

    1. Open Microsoft Excel 2007.
    2. Click on the Data menu bar option.
    3. Click on the From Text icon.
    4. Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button. The Text Import Wizard - Step 1 of 3 window will now appear on the screen.
    5. Choose the file type that best describes your data - Delimited or Fixed Width.
    6. Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin.
    7. Click on the Next button to display the Text Import Wizard - Step 2 of 3 window.
    8. Place a checkmark next to the delimiter that was used in the file you wish to import into Microsoft Excel 2007. The Data preview window will show you how your data will appear based on the delimiter that you chose.
    9. Click on the Next button to display the Text Import Wizard - Step 3 of 3.
    10. Choose the appropriate data format for each column of data that you want to import. You also have the option to not import one or more columns of data if you want.
    11. Click on the Finish button to finish importing your data into Microsoft Excel 2007.

    Source: https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0
