问题
I am trying to write a dataframe to an Excel spreadsheet using ExcelWriter, but it keeps returning an error:
openpyxl.utils.exceptions.IllegalCharacterError
I'm guessing there's some character in the dataframe that ExcelWriter doesn't like. It seems odd, because the dataframe is formed from three Excel spreadsheets, so I can't see how there could be a character that Excel doesn't like!
Is there any way to iterate through a dataframe and replace characters that ExcelWriter doesn't like? I don't even mind if it simply deletes them.
What's the best way or removing or replacing illegal characters from a dataframe?
回答1:
Based on Haipeng Su's answer, I added a function that does this:
dataframe = dataframe.applymap(lambda x: x.encode('unicode_escape').
decode('utf-8') if isinstance(x, str) else x)
Basically, it escapes the unicode characters if they exist. It worked and I can now write to Excel spreadsheets again!
回答2:
try a different excel writer engine solved my problem.
writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')
回答3:
The same problem happened to me. I solved it as follows:
- install python package xlsxwriter:
pip install xlsxwriter
- replace the default engine 'openpyxl' with 'xlsxwriter':
dataframe.to_excel("file.xlsx", engine='xlsxwriter')
回答4:
I was also struggling with some weird characters in a data frame when writing the data frame to html or csv. For example, for characters with accent, I can't write to html file, so I need to convert the characters into characters without the accent.
My method may not be the best, but it helps me to convert unicode
string into ascii
compatible.
# install unidecode first
from unidecode import unidecode
def FormatString(s):
if isinstance(s, unicode):
try:
s.encode('ascii')
return s
except:
return unidecode(s)
else:
return s
df2 = df1.applymap(FormatString)
In your situation, if you just want to get rid of the illegal characters by changing return unidecode(s)
to return 'StringYouWantToReplace'
.
Hope this can give me some ideas to deal with your problems.
回答5:
If you're still struggling to clean up the characters, this worked well for me:
import xlwings as xw
import pandas as pd
df = pd.read_pickle('C:\\Users\\User1\\picked_DataFrame_notWriting.df')
topath = 'C:\\Users\\User1\\tryAgain.xlsx'
wb = xw.Book(topath)
ws = wb.sheets['Data']
ws.range('A1').options(index=False).value = df
wb.save()
wb.close()
回答6:
Just remove the illegal characters from your dataframe before exporting it into Excel.
import pandas as pd
import re
import openpyxl
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE
writer = pd.ExcelWriter(myexcelfilepath, engine='openpyxl')
# [optional] avoid pandas.DataFrame.to_excel overwritting your existing workbook
workbook = openpyxl.load_workbook(myexcelfilepath)
writer.book = workbook
# replace illegal characters in str or unicode value by ''
# using the regex ILLEGAL_CHARACTERS_RE string defined in openpyxl.cell.cell module
mydataframe = mydataframe.applymap(
lambda x: re.sub(ILLEGAL_CHARACTERS_RE, '', x)
if isinstance(x, str) or isinstance(x, unicode) else x)
# export your cleaned dataframe to excel
mydataframe.to_excel(writer, sheet_name='targetsheetname', index=False)
writer.close()
来源:https://stackoverflow.com/questions/42306755/how-to-remove-illegal-characters-so-a-dataframe-can-write-to-excel