I keep getting:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)
when I try:
The way it worked for me:
html = df.to_html()
with open("dataframe.html", "w", encoding="utf-8") as file:
    # Declare the charset first so browsers decode the page as UTF-8
    file.write('<meta charset="UTF-8">\n')
    file.write(html)
Your problem is in other code. Your sample code has a Unicode string that has been mis-decoded as latin1, Windows-1252, or similar, since it has UTF-8 sequences in it. Here I undo the bad decoding and redecode as UTF-8, but you'll want to find where the wrong decode is being performed:
>>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
>>> s.encode('latin1').decode('utf8')
u'Rue du Gu\xe9, 78120 Sonchamp'
>>> print(s.encode('latin1').decode('utf8'))
Rue du Gué, 78120 Sonchamp
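A minimal sketch of the same round trip wrapped in a function, written in Python 3 syntax. The name `fix_mojibake` and its `wrong_encoding` parameter are hypothetical, and the sketch assumes the text was mis-decoded as latin1. Strings that cannot make the round trip are returned unchanged:

```python
def fix_mojibake(text, wrong_encoding="latin1"):
    """Undo a bad decode: re-encode with the wrong codec, then decode as UTF-8.

    Hypothetical helper; assumes UTF-8 bytes were decoded as `wrong_encoding`.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of the assumed kind, so leave the string alone.
        return text


s = "Rue du Gu\xc3\xa9, 78120 Sonchamp"
fixed = fix_mojibake(s)
print(fixed)
```

This only repairs the symptom. The wrong decode upstream still needs to be found and fixed.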
The issue is actually in using df.to_html("mypage.html") to save the HTML to a file directly. If instead you write the file yourself, you can avoid this encoding bug with pandas.
html = df.to_html()
with open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
You may also need to specify the character set in the head of the HTML for it to show up properly on certain browsers (HTML5 has UTF-8 as default):
<meta charset="UTF-8">
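As a rough sketch, the two steps (writing the file yourself as UTF-8 and declaring the charset in the head) can be combined. The function name `write_html_utf8` is hypothetical, and the `html` string here is a stand-in for what `df.to_html()` would return, so the example runs without pandas:

```python
import os
import tempfile


def write_html_utf8(fragment, path):
    """Write an HTML fragment to `path` as UTF-8 with a charset declaration."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset=\"UTF-8\">\n</head>\n"
        "<body>\n" + fragment + "\n</body>\n</html>\n"
    )
    # Passing the encoding explicitly avoids falling back to the platform default.
    with open(path, "w", encoding="utf-8") as file:
        file.write(page)


# Stand-in for the string returned by df.to_html()
html = "<table><tr><td>Rue du Gué, 78120 Sonchamp</td></tr></table>"
out_path = os.path.join(tempfile.gettempdir(), "mypage.html")
write_html_utf8(html, out_path)
```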
This was the only method that worked for me out of the several I've seen.
If you really need to keep the output as HTML, you could try cleaning the strings in a NumPy array before calling to_html.
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
def clean_unicode(df):
    # Transform the DataFrame into a NumPy array
    # (to_numpy() replaces as_matrix(), which was removed in pandas 1.0)
    arr = df.to_numpy()
    # Re-encode every string as latin-1, then decode the bytes as UTF-8
    for x in np.nditer(arr, flags=['refs_ok'], op_flags=['copy', 'readonly']):
        arr[arr == x] = str(x).encode("latin-1", "replace").decode('utf8')
    # Transform the NumPy array back into a DataFrame (column names are not preserved)
    return pd.DataFrame(arr)
df = clean_unicode(df)
df.to_html("Results.html")  # Success!