How to remove illegal characters so a dataframe can write to Excel

Posted by 前提是你 on 2020-01-10 08:53:20

Question


I am trying to write a dataframe to an Excel spreadsheet using ExcelWriter, but it keeps returning an error:

openpyxl.utils.exceptions.IllegalCharacterError

I'm guessing there's some character in the dataframe that ExcelWriter doesn't like. It seems odd, because the dataframe is formed from three Excel spreadsheets, so I can't see how there could be a character that Excel doesn't like!

Is there any way to iterate through a dataframe and replace characters that ExcelWriter doesn't like? I don't even mind if it simply deletes them.

What's the best way of removing or replacing illegal characters in a dataframe?


Answer 1:


Based on Haipeng Su's answer, I added a function that does this:

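# escape every string cell; non-string cells (numbers, dates, NaN) pass through unchanged
dataframe = dataframe.applymap(
    lambda x: x.encode('unicode_escape').decode('utf-8') if isinstance(x, str) else x)

Basically, it escapes non-ASCII and control characters in every string cell, so a vertical tab, for example, becomes the literal text \x0b. Note that it also escapes legitimate characters such as accented letters and newlines. It worked and I can now write to Excel spreadsheets again!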
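
To see what this escaping does to a single value, here is a minimal sketch using only the standard library. The helper name escape_cell and the sample strings are illustrative assumptions, not part of the original answer.

```python
def escape_cell(value):
    """Escape non-ASCII and control characters in strings; pass other values through."""
    if isinstance(value, str):
        return value.encode('unicode_escape').decode('utf-8')
    return value

# A vertical tab (\x0b) is one of the characters openpyxl rejects.
print(escape_cell('bad\x0bcell'))  # the control character becomes the text \x0b
print(escape_cell('café'))         # the accented letter is escaped as well
print(escape_cell(42))             # non-strings are returned unchanged
```

The escaped text shows up literally in the spreadsheet, so this approach suits cases where keeping a visible trace of the odd characters is acceptable.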




Answer 2:


Trying a different Excel writer engine solved my problem.

writer = pd.ExcelWriter('file.xlsx', engine='xlsxwriter')



Answer 3:


The same problem happened to me. I solved it as follows:

  1. Install the Python package xlsxwriter:
pip install xlsxwriter
  2. Replace the default engine 'openpyxl' with 'xlsxwriter':
dataframe.to_excel("file.xlsx", engine='xlsxwriter')
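
If the script has to run on machines where xlsxwriter may not be installed, one option is to choose the engine at run time. The sketch below is only an illustration under that assumption: pick_excel_engine is a hypothetical helper, not a pandas function, and it merely checks whether the module can be found.

```python
import importlib.util

def pick_excel_engine(preferred='xlsxwriter', fallback='openpyxl'):
    """Return `preferred` if that module can be found, otherwise `fallback`."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback

engine = pick_excel_engine()
print(engine)
# Then, for example: dataframe.to_excel("file.xlsx", engine=engine)
```

Keep in mind that falling back to openpyxl brings the original error back unless the data has been cleaned first.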



Answer 4:


I was also struggling with some odd characters in a data frame when writing it to HTML or CSV. For example, I couldn't write characters with accents to an HTML file, so I needed to convert them into their unaccented equivalents.

My method may not be the best, but it helps me convert Unicode strings into ASCII-compatible ones.

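# install unidecode first: pip install unidecode
from unidecode import unidecode

def FormatString(s):
    # only strings need converting (on Python 2 this check was isinstance(s, unicode))
    if isinstance(s, str):
        try:
            # pure-ASCII strings are returned unchanged
            s.encode('ascii')
            return s
        except UnicodeEncodeError:
            # transliterate anything that is not ASCII
            return unidecode(s)
    else:
        return s

df2 = df1.applymap(FormatString)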

In your situation, if you just want to get rid of the illegal characters, change return unidecode(s) to return 'StringYouWantToReplace'.

Hope this gives you some ideas for dealing with your problem.
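
For readers who would rather not install an extra package, a rougher version of the same idea can be sketched with the standard library's unicodedata module. This is a swapped-in technique, not what unidecode does internally: it only strips combining accents and silently drops any character that has no ASCII decomposition. The function name strip_accents is an assumption made for this example.

```python
import unicodedata

def strip_accents(s):
    """Return s with accents removed; non-string values are returned unchanged."""
    if not isinstance(s, str):
        return s
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize('NFKD', s)
    # dropping everything non-ASCII discards the combining marks
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(strip_accents('Café Zoë'))  # Cafe Zoe
```

Like the unidecode version, this leaves ASCII control characters untouched, so on its own it may not be enough to avoid IllegalCharacterError.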




Answer 5:


If you're still struggling to clean up the characters, this worked well for me:

import xlwings as xw
import pandas as pd
df = pd.read_pickle('C:\\Users\\User1\\picked_DataFrame_notWriting.df')
topath = 'C:\\Users\\User1\\tryAgain.xlsx'
wb = xw.Book(topath)
ws = wb.sheets['Data']
ws.range('A1').options(index=False).value = df
wb.save()
wb.close()



Answer 6:


Just remove the illegal characters from your dataframe before exporting it into Excel.

import pandas as pd
import re
import openpyxl
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE


writer = pd.ExcelWriter(myexcelfilepath, engine='openpyxl')

# [optional] avoid pandas.DataFrame.to_excel overwriting your existing workbook
# (recent pandas releases may reject the assignment to writer.book; there, opening
# the writer with mode='a' is the way to append to an existing file)
workbook = openpyxl.load_workbook(myexcelfilepath)
writer.book = workbook

# replace illegal characters in str values with ''
# using the compiled regex ILLEGAL_CHARACTERS_RE defined in the openpyxl.cell.cell module
mydataframe = mydataframe.applymap(
    lambda x: re.sub(ILLEGAL_CHARACTERS_RE, '', x) if isinstance(x, str) else x)

# export your cleaned dataframe to excel
mydataframe.to_excel(writer, sheet_name='targetsheetname', index=False)
writer.close()
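
To find out which values are causing trouble before deleting anything, a small standard-library sketch along these lines may help. The pattern below is an assumption: it is meant to mirror openpyxl's ILLEGAL_CHARACTERS_RE (ASCII control characters other than tab, newline and carriage return), and importing the real constant from openpyxl remains the safer choice. The names CONTROL_CHARS_RE, find_illegal and remove_illegal are made up for this example.

```python
import re

# Assumed equivalent of openpyxl's ILLEGAL_CHARACTERS_RE: ASCII control characters
# except tab (\x09), newline (\x0a) and carriage return (\x0d).
CONTROL_CHARS_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')

def find_illegal(value):
    """Return the illegal characters found in a string (empty list for non-strings)."""
    if isinstance(value, str):
        return CONTROL_CHARS_RE.findall(value)
    return []

def remove_illegal(value):
    """Strip illegal characters from strings; return other values unchanged."""
    if isinstance(value, str):
        return CONTROL_CHARS_RE.sub('', value)
    return value

sample = ['ok', 'tab\tis fine', 'bad\x0bvalue', 7]
for cell in sample:
    print(repr(cell), find_illegal(cell), repr(remove_illegal(cell)))
```

Either function can be passed to mydataframe.applymap in the same way as the lambda above, for instance to list the offending cells first and clean them afterwards.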


Source: https://stackoverflow.com/questions/42306755/how-to-remove-illegal-characters-so-a-dataframe-can-write-to-excel
