I'm trying to parse, manipulate, and output HTML using Python's ElementTree:
import sys
from cStringIO import StringIO
from xml.etree import ElementTree as E
Your &nbsp; is being converted to '\xa0', which is the Latin-1 (ISO-8859-1) byte for a non-breaking space; it is not an ASCII character (the UTF-8 encoding is '\xc2\xa0'). The line
'\xa0'.encode('utf-8')
results in a UnicodeDecodeError because, in Python 2, calling encode on a byte string first decodes it implicitly with the default codec, ascii, which only covers the 128 code points 0-127, and ord('\xa0') = 160. Setting the default encoding to something else, e.g.:
import sys
reload(sys) # must reload sys to use 'setdefaultencoding'
sys.setdefaultencoding('latin-1')
print '\xa0'.encode('utf-8', "xmlcharrefreplace")
should solve your problem.
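If changing the interpreter-wide default is not desirable, a possible alternative is to decode the byte explicitly before re-encoding it. The sketch below assumes the input really is Latin-1; the helper names latin1_to_utf8 and to_ascii_with_charrefs are hypothetical, and b'' literals are used so the snippet runs on Python 2.6+ as well as Python 3.

```python
def latin1_to_utf8(raw):
    """Decode Latin-1 bytes to text, then encode that text as UTF-8."""
    text = raw.decode('latin-1')  # assumption: the input bytes are Latin-1
    return text.encode('utf-8')


def to_ascii_with_charrefs(raw):
    """Decode Latin-1 bytes, then emit ASCII with XML character references."""
    text = raw.decode('latin-1')  # same Latin-1 assumption as above
    # 'xmlcharrefreplace' replaces characters ASCII cannot represent
    # with numeric references of the form &#NNN;
    return text.encode('ascii', 'xmlcharrefreplace')


nbsp_utf8 = latin1_to_utf8(b'\xa0')         # the two bytes 0xC2 0xA0
nbsp_ref = to_ascii_with_charrefs(b'\xa0')  # the ASCII bytes &#160;
```

Note that 'xmlcharrefreplace' only takes effect when the target codec cannot represent a character, so it changes nothing when encoding to UTF-8 but does when encoding to ASCII.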