How to export a DataFrame to HTML with UTF-8 encoding?

南笙 2021-01-05 05:39

I keep getting:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)

when I try:

4 Answers
  • 2021-01-05 06:01

    The way it worked for me:

    html = df.to_html()
    
    with open("dataframe.html", "w", encoding="utf-8") as file:
        file.write('<meta charset="UTF-8">\n')
        file.write(html)
    
  • 2021-01-05 06:20

    Your problem is in other code. Your sample code has a Unicode string that has been mis-decoded as latin1, Windows-1252, or similar, since it has UTF-8 sequences in it. Here I undo the bad decoding and redecode as UTF-8, but you'll want to find where the wrong decode is being performed:

    >>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
    >>> s.encode('latin1').decode('utf8')
    u'Rue du Gu\xe9, 78120 Sonchamp'
    >>> print(s.encode('latin1').decode('utf8'))
    Rue du Gué, 78120 Sonchamp
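
The session above is Python 2. A minimal Python 3 sketch of the same repair follows, assuming the text was UTF-8 that got decoded as Latin-1. The helper name `fix_mojibake` is hypothetical, not part of pandas or the standard library. It only covers the Latin-1 case; text mangled through Windows-1252 may need that codec instead.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they really were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The string does not look mis-decoded in this way, so leave it untouched.
        return text


# The mis-decoded sample from the answer above, and an already-correct string.
repaired = fix_mojibake("Rue du Gu\xc3\xa9, 78120 Sonchamp")
unchanged = fix_mojibake("Rue du Gu\xe9, 78120 Sonchamp")
```

Falling back to the input on failure keeps the helper safe to run over a column that mixes damaged and correct strings, although finding the place where the wrong decode happens is still the real fix.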
    
  • 2021-01-05 06:22

    The issue is actually in using df.to_html("mypage.html") to save the HTML to a file directly. If instead you write the file yourself, you can avoid this encoding bug with pandas.

    html = df.to_html()
    with open("mypage.html", "w", encoding="utf-8") as file:
        file.write(html)
    

    You may also need to declare the character set in the head of the HTML for it to display properly in some browsers (HTML5 expects UTF-8, but a browser may still guess a different encoding when none is declared):

    <meta charset="UTF-8">

    This was the only method that worked for me out of the several I've seen.
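
Putting the two pieces of this answer together, here is a rough sketch that wraps the table in a complete page with the charset declared in the head and writes it out as UTF-8. The function name `write_html_page` and its parameters are hypothetical; `table_html` is assumed to be the string returned by `df.to_html()`.

```python
from html import escape
from pathlib import Path


def write_html_page(table_html: str, path: str, title: str = "DataFrame") -> None:
    """Wrap an HTML fragment in a full document and save it as UTF-8.

    Hypothetical helper; table_html is assumed to come from df.to_html().
    """
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="UTF-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{table_html}\n"
        "</body>\n"
        "</html>\n"
    )
    # Pass the encoding explicitly so the result does not depend on the platform default.
    Path(path).write_text(page, encoding="utf-8")
```

Usage would look like `write_html_page(df.to_html(), "mypage.html")`. Declaring the charset in the file and encoding the bytes the same way keeps the two from disagreeing.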

  • 2021-01-05 06:23

    If you really need to keep the output as HTML, you could try cleaning the strings in a NumPy array before calling to_html.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
    
    def clean_unicode(df):
        # Transform the DataFrame to a NumPy array (as_matrix() was removed in pandas 1.0)
        arr = df.to_numpy()
        # Undo the bad Latin-1 decoding of every string and re-decode it as UTF-8
        for x in np.nditer(arr, flags=['refs_ok'], op_flags=['copy', 'readonly']):
            arr[arr == x] = str(x).encode("latin-1", "replace").decode("utf8")
        # Transform the NumPy array back to a DataFrame, keeping the original labels
        return pd.DataFrame(arr, index=df.index, columns=df.columns)
    
    df = clean_unicode(df)
    df.to_html("Results.html")  # Success!
    