I have an Excel document with a data table containing Chinese characters. I am trying to export this Excel spreadsheet to a CSV file for importing into a MySQL database.
For some people this solution may work: https://support.geekseller.com/knowledgebase/utf-8/
When saving the CSV, click Tools (lower right of the Save As dialog) > Web Options > Encoding > Unicode (UTF-8).
Or, as this SO answer suggests, just use Google Sheets to save the CSV as Unicode: Excel to CSV with UTF8 encoding
I have tried all of the above methods, but they do not quite work for my data (Simplified Chinese, over 700 MB). I have tried both Chinese and English Windows, and both English and Chinese Excel. Windows Excel does not seem to be able to save to UTF-8 even though it claims to: I chose the UTF-8 CSV option in Save As, but when I used 'open sheet' to detect the encoding, it was neither UTF-8 nor GB*. Here is my final solution:
(1) Download 'open sheet'.
(2) Open the file in it. You can scroll through the encodings until you see the Chinese characters displayed correctly in the preview window.
(3) Save it as UTF-8 (if you want UTF-8).
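If you would rather script steps (2) and (3), for example for a 700 MB file, the re-encoding itself is easy to do in Python. This is a minimal sketch that assumes you already know the source encoding; the default `gb18030` is an assumption (GB18030 is a superset of GBK and GB2312, so it also reads files in those encodings), and the function name `transcode_file` is hypothetical.

```python
import shutil


def transcode_file(src_path, dst_path, src_encoding="gb18030", dst_encoding="utf-8"):
    """Re-encode a text file chunk by chunk, so very large files never sit fully in memory.

    src_encoding defaults to gb18030 as an ASSUMPTION; pass whatever encoding
    actually displays your Chinese text correctly.
    """
    # newline="" keeps the original line endings untouched (matters for CSV quoting)
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # copyfileobj reads/writes 1 MiB of text at a time; the text layer's
        # incremental decoder handles multi-byte characters split across chunks
        shutil.copyfileobj(src, dst, 1024 * 1024)
```

Usage, with hypothetical file names: `transcode_file("data_gbk.csv", "data_utf8.csv")`. Pass `dst_encoding="utf-8-sig"` if Excel, rather than MySQL or R, will open the result.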
PS: You need to figure out the default encoding of your system. As far as I know, Ubuntu handles UTF-8 fine, but the default encoding on Simplified Chinese Windows is one of the GB** encodings (typically GBK, code page 936). Even if you encode the file as UTF-8, you still might not be able to open it correctly. In my case, R could not open my UTF-8 CSV, but could open the GB*-encoded one.
This method works well even if your file is very large. Another workaround is Google Sheets (but the file size may be limited). Notepad++ also works for smaller files.
In short, you can detect the encoding by opening your file and scrolling through the encodings until the Chinese text displays correctly.
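The same trial-and-error idea can be automated: try each candidate encoding on the first part of the file and report the first one that decodes without error. This is a sketch, not a real encoding detector; the candidate list and the name `guess_encoding` are assumptions. A clean decode only proves the bytes are valid in that encoding (a Big5 file, for instance, may also decode as GB18030 into the wrong characters), so you should still eyeball the returned preview, just as you would in the editor.

```python
import codecs

# Candidate encodings, tried in order (ASSUMED list; adjust for your data).
# "utf-8-sig" also accepts plain UTF-8 without a BOM. "utf-16" goes last
# because it accepts almost any even-length input.
CANDIDATES = ("utf-8-sig", "gb18030", "big5", "utf-16")


def guess_encoding(path, candidates=CANDIDATES, sample_size=1 << 20):
    """Return (encoding, preview) for the first candidate that decodes the first
    sample_size bytes of the file without error, or (None, None) if none does."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    for enc in candidates:
        decoder = codecs.getincrementaldecoder(enc)(errors="strict")
        try:
            # final=False tolerates a multi-byte character cut off at the end of the sample
            text = decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return enc, text[:200]
    return None, None
```

For example, `enc, preview = guess_encoding("data.csv")` followed by `print(enc, preview)` lets you check by eye that the Chinese text looks right before converting.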
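You should save the CSV file with:
df.to_csv(file_name, encoding='utf_8_sig')
instead of:
df.to_csv(file_name, encoding='utf-8')
The utf_8_sig codec writes a UTF-8 byte order mark (BOM) at the start of the file, which Excel uses to recognize the file as UTF-8.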
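The `utf_8_sig` codec name is a standard Python codec, not something specific to pandas, so the same fix works with the built-in `csv` module if you are not using pandas. A minimal sketch; the function name `write_csv_for_excel` is hypothetical.

```python
import csv


def write_csv_for_excel(path, rows):
    """Write rows to a CSV that Excel should open as UTF-8."""
    # "utf-8-sig" prepends the UTF-8 BOM (bytes EF BB BF) once, at the start of the file;
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
```

For example, `write_csv_for_excel("out.csv", [["名称", "数量"], ["苹果", 3]])` produces a file beginning with the three BOM bytes followed by ordinary UTF-8 text.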
The following method has been tested and used to import CSV files into MongoDB, so it should work:
In your Excel worksheet, go to File > Save As.
Name the file and choose Unicode Text (*.txt) from the drop-down list next to "Save as type", and then click Save.
Open the Unicode .txt file using your preferred text editor, for example Notepad.
Since the Unicode text file is tab-delimited and we want a CSV (comma-separated) file, we need to replace all tabs with commas.
Select a tab character, right-click it and choose Copy from the context menu, or simply press CTRL+C.
Press CTRL+H to open the Replace dialog and paste the copied tab (CTRL+V) in the Find what field. When you do this, the cursor will move rightwards indicating that the tab was pasted. Type a comma in the Replace with field and click Replace All.
Click File > Save As, enter a file name and change the encoding to UTF-8. Then click the Save button.
Change the .txt extension to .csv directly in Notepad's Save As dialog and choose All files (*.*) next to Save as type.
Open the CSV file from Excel by clicking File > Open > Text files (*.prn, *.txt, *.csv) and verify that the data is okay.
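The Notepad steps above can also be done in a few lines of Python, which avoids one pitfall of a plain tab-to-comma replace: a cell that itself contains a comma would be split into two columns. This sketch assumes the export is UTF-16 with a byte order mark (what Excel's "Unicode Text" normally produces); the function name `unicode_text_to_csv` is hypothetical.

```python
import csv


def unicode_text_to_csv(src_path, dst_path, dst_encoding="utf-8"):
    """Convert Excel's tab-delimited "Unicode Text" export to a comma-separated CSV."""
    # "utf-16" reads the byte order mark Excel writes and strips it from the text
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        # csv.writer quotes any field containing a comma, unlike a blind find-and-replace
        csv.writer(dst).writerows(reader)
```

For example, a cell `苹果,红` comes out as `"苹果,红"`, so it stays a single column when the CSV is imported.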
Source here
As far as I know Excel doesn't save CSV files in any Unicode encoding. I have had similar issues recently trying to export a file as CSV with the £ symbol. I had the benefit of being able to use another tool altogether.
My version of Excel (2010) can export in a Unicode format via File > Save As > Unicode Text (*.txt), but the output is a tab-delimited, UCS-2-encoded file. I don't know MySQL at all, but from a brief look at the documentation it appears to handle tab-delimited imports and UCS-2, so it may be worth trying this output.
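Before importing, you can check what encoding the Unicode Text export actually uses by looking at its byte order mark (BOM). A minimal sketch; the function name `sniff_bom` is hypothetical. It returns Python codec names that also strip the BOM when decoding, and `None` when there is no BOM (which does not rule out UTF-8 or GB* content).

```python
import codecs

# Longest BOMs first, so a UTF-32-LE BOM (FF FE 00 00) is not mistaken for UTF-16-LE (FF FE).
# The codec names are ones whose decoders also consume the BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(path):
    """Return a codec name based on the file's BOM, or None if it has no BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

For Excel's Unicode Text output this typically returns `"utf-16"`.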
Edit: Additionally, you could always open this Unicode output in Notepad++ and convert it to UTF-8 (Encoding > Convert to UTF-8 without BOM), and possibly replace all tab characters with commas too (use the Replace dialog in Extended search mode, with \t in the Find box and , in the Replace box).
You might want to try Notepad++; I doubt Notepad will support Unicode characters.
http://notepad-plus-plus.org/