Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

执笔经年 2021-02-20 14:45

I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

import sys
from cStringIO import StringIO
from xml.etree import ElementTree as E


        
5 Answers
  •  抹茶落季
    2021-02-20 15:13

    Your &nbsp; is being converted to '\xa0', the non-breaking space character U+00A0. That value is also its single-byte Latin-1 encoding, but it is not an ASCII character; the UTF-8 encoding is '\xc2\xa0'. The line

    '\xa0'.encode('utf-8')
    

    results in a UnicodeDecodeError in Python 2: calling .encode() on a byte string first decodes it with the default codec, ascii, which only covers values 0 to 127, and ord('\xa0') is 160. Setting the default encoding to something else, e.g.:

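    import sys
    reload(sys)  # Python 2 only: site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
    sys.setdefaultencoding('latin-1')  # implicit str -> unicode conversions now use latin-1 instead of ascii
    
    # '\xa0' is now decoded as latin-1 and then encoded to UTF-8 as '\xc2\xa0'
    # (xmlcharrefreplace never triggers here, since UTF-8 can encode every character)
    print '\xa0'.encode('utf-8', 'xmlcharrefreplace')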
    

    should solve your problem.
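The answer above is Python 2 code: the print statement, the builtin reload() and sys.setdefaultencoding() do not run as written under Python 3, where str is always Unicode and bytes are never decoded implicitly. As a hedged sketch, assuming Python 3 and the standard-library xml.etree.ElementTree, the snippet below performs the same conversions explicitly: the ASCII decode that fails, the Latin-1 decode that setdefaultencoding('latin-1') substitutes, and the UTF-8 and xmlcharrefreplace encodings. It then shows ElementTree writing U+00A0 either as raw UTF-8 bytes or as &#160;, depending on the output encoding. The sample input uses &#160; rather than &nbsp; because XML predefines only five named entities, so ElementTree's parser rejects &nbsp; as undefined.

```python
# Python 3 sketch: str is Unicode and bytes are raw bytes, so every
# conversion that Python 2 did implicitly is written out explicitly here.
import xml.etree.ElementTree as ET

raw = b"\xa0"  # the Latin-1 byte for a non-breaking space (Python 2's '\xa0')

# Python 2's '\xa0'.encode('utf-8') first decoded the byte string with the
# ASCII default codec; doing that step explicitly shows why it failed.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii decode failed:", exc.reason)

# Decoding as Latin-1 (the effect of setdefaultencoding('latin-1')) works
# and yields the single code point U+00A0.
nbsp = raw.decode("latin-1")
assert nbsp == "\u00a0" and ord(nbsp) == 160
print(nbsp.encode("utf-8"))                       # b'\xc2\xa0'
print(nbsp.encode("ascii", "xmlcharrefreplace"))  # b'&#160;'

# ElementTree: the numeric reference &#160; parses to U+00A0, and the
# output encoding decides how that character is written back out.
root = ET.fromstring("<p>a&#160;b</p>")
print(ET.tostring(root, encoding="utf-8"))     # b'<p>a\xc2\xa0b</p>'
print(ET.tostring(root, encoding="us-ascii"))  # b'<p>a&#160;b</p>'
```

For text rather than bytes, ET.tostring(root, encoding="unicode") returns a str that still contains the U+00A0 character, leaving the choice of byte encoding to whatever writes it out.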
