Pandas - Writing an excel file containing unicode - IllegalCharacterError


I have the following code:

import pandas as pd

x = [u'string with some unicode: \x16']
df = pd.DataFrame(x)

If I try to write this datafram

7 answers

北海茫月

2021-02-07 17:46
Apply this to the DataFrame before saving. It escapes the offending characters in every string cell, after which writing to Excel should succeed.
```
df = df.applymap(lambda x: x.encode('unicode_escape').
                 decode('utf-8') if isinstance(x, str) else x)
```
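To see what this escape actually does to a single value, here is a small pure-Python sketch. The helper name `escape_text` is made up for illustration and is not part of pandas. Note that the control character is not removed but turned into the visible text `\x16`, and that non-ASCII letters are escaped as well, which may not be what you want for real Unicode text.

```python
def escape_text(value):
    """Escape a str with the 'unicode_escape' codec; return other values unchanged."""
    if isinstance(value, str):
        return value.encode('unicode_escape').decode('utf-8')
    return value


raw = 'string with some unicode: \x16'
escaped = escape_text(raw)

# The single control character becomes four printable characters: \ x 1 6
print(repr(escaped))

# Side effect: non-ASCII letters are escaped too
print(repr(escape_text('café')))

# Non-string cells pass through untouched
print(escape_text(42))
```

If you need to keep accented or other non-ASCII characters readable in the spreadsheet, stripping only the control characters is probably the safer route.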

独厮守ぢ

2021-02-07 17:49

I don't know this particular language, but in general Excel has trouble with UTF-8. If you open a file of UTF-8 characters in Excel programmatically, it corrupts them: it doesn't seem to handle all the bits in the character, and effectively truncates it to the first 2 and last 2 hex digits of the 8 present in extended characters.

A workaround for loading a UTF-8 file into Excel correctly is to have the program insert a macro into the Excel sheet after it has loaded, and let that macro import the data. I have some code to do this in C#, if that's any help?

Does your input contain any extended characters (i.e. àâäçæèëéêìïîñòöôœûüùÿÀÂÄÇÆÈËÉÊÌÏÎÑÒÖÔŒÛÜÙŸ)? If you take them out, does it work?


生来不讨喜

2021-02-07 17:53
When I encounter this error, I usually work around it by writing the file as '.csv' instead of '.xlsx'. So instead of
```
yourdataframe.to_excel('Your workbook name.xlsx')
```
I would do:
```
yourdataframe.to_csv('Your workbook name.csv')
```
According to the documentation, the way pandas encodes .csv files by default is:
```
encoding : string, optional
A string representing the encoding to use in the output file,
defaults to 'ascii' on Python 2 and 'utf-8' on Python 3.
```
On the other hand, the default encoding of .xlsx files is:
```
encoding: string, default None
encoding of the resulting excel file. Only necessary for xlwt,
other writers support unicode natively.
```
This difference is responsible for that error. You will also get the error when you write strings that start with - or + to an .xlsx file.

独厮守ぢ

2021-02-07 18:07

I've answered a similar question in this post: https://stackoverflow.com/a/63950544/1851492. The same content follows below.

If you don't want to install another Excel writer engine (e.g. xlsxwriter), you can remove the illegal characters yourself by finding the pattern that causes IllegalCharacterError to be raised.

Open cell.py under /path/to/your/python/site-packages/openpyxl/cell/ and look for the check_string function. It uses a compiled regular expression, ILLEGAL_CHARACTERS_RE, to find the illegal characters. Its definition is this line:

ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')

This line is all you need to remove those characters. Copy it into your program (together with import re) and run the code below before writing your dataframe to Excel:

dataframe = dataframe.applymap(lambda x: ILLEGAL_CHARACTERS_RE.sub(r'', x) if isinstance(x, str) else x)

The line above removes those characters from every string cell and leaves other values unchanged.
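As a self-contained sketch of the same idea, the snippet below applies the pattern to plain Python values so you can check its behaviour without pandas or openpyxl installed. The helper name `strip_illegal` and the sample rows are assumptions made for illustration; the pattern is the one quoted above.

```python
import re

# Same pattern as quoted above from openpyxl's cell module:
# control characters 0x00-0x08, 0x0B-0x0C and 0x0E-0x1F.
ILLEGAL_CHARACTERS_RE = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]')


def strip_illegal(value):
    """Remove the illegal control characters from a str; return other values unchanged."""
    if isinstance(value, str):
        return ILLEGAL_CHARACTERS_RE.sub('', value)
    return value


# Hypothetical sample data standing in for DataFrame cells
rows = [
    ['string with some unicode: \x16', 1],
    ['tab\tand newline\nare kept', None],
]
cleaned = [[strip_illegal(cell) for cell in row] for row in rows]
print(cleaned)
```

Tab, line feed and carriage return fall outside the pattern, so they survive the cleanup. With a DataFrame you would pass the same function to `applymap` as in the answer; from pandas 2.1 on, `applymap` is deprecated in favour of `DataFrame.map`.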


谎友^

2021-02-07 18:08
The same problem happened to me. I solved it as follows:

First, install python package xlsxwriter:
```
pip install xlsxwriter
```
Second, replace the default engine 'openpyxl' with 'xlsxwriter':
```
df.to_excel("test.xlsx", engine='xlsxwriter')
```


陌清茗

2021-02-07 18:09

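To write DataFrames containing Unicode characters to multiple sheets in a single Excel file, the code below can be helpful:

# Install the writer engine first: pip install xlsxwriter
import pandas as pd

# dict_df is assumed to be a dict mapping sheet names to DataFrames
# The engine is chosen on the ExcelWriter, not on each to_excel call
with pd.ExcelWriter('notes.xlsx', engine='xlsxwriter') as writer:
    for key in dict_df:
        dict_df[key].to_excel(writer, sheet_name=key, index=False)
# Leaving the with block saves and closes the file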
