I want to parse my XML document. I have stored it as shown below:
class XMLdocs(db.Expando):
    id = db.IntegerProperty()
    name=db.StringPro
This worked for me:
from django.utils.encoding import smart_str
content = smart_str(content)
It seems you are hitting a UTF-8 byte order mark (BOM). Try using this unicode string with the BOM stripped out:
import codecs
import StringIO
# Strip UTF-8 BOM bytes from both ends, then decode the rest to unicode
content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
parser.parse(StringIO.StringIO(content))
I used strip instead of lstrip because in your case you had multiple occurrences of the BOM, possibly due to concatenated file contents.
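One caveat: strip only removes BOM bytes from the two ends of the string, so a BOM left in the middle by concatenation would survive. As a minimal sketch (written to run on Python 3 as well, and assuming the stored content is a UTF-8 byte string), you could decode first and then drop every U+FEFF character. The helper name remove_all_boms is made up for illustration:

```python
import codecs

def remove_all_boms(raw_bytes):
    """Decode UTF-8 bytes and drop every BOM, wherever it appears.

    Hypothetical helper, not part of any library.
    """
    text = raw_bytes.decode('utf-8')       # each BOM decodes to U+FEFF
    return text.replace(u'\ufeff', u'')    # remove leading and embedded BOMs

# Simulate two BOM-prefixed chunks concatenated together.
raw = codecs.BOM_UTF8 + b'<a>' + codecs.BOM_UTF8 + b'</a>'
cleaned = remove_all_boms(raw)
```

Here cleaned no longer contains any BOM, including the one that sat between the two chunks.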
The problem, according to your traceback, is the print statement on line 136 of parseXML.py. Unfortunately you didn't see fit to post that part of your code, but I'm going to guess it is just there for debugging. If you change it to:
print repr(ch)
then you should at least see what you are trying to print.
The problem is that you're trying to print a unicode character to a possibly non-unicode terminal. You need to encode it with the 'replace' option before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').
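A rough sketch of that idea follows. Note that sys.stdout.encoding can be None (for example when output is piped on Python 2), so this version falls back to ASCII. The helper name encode_for_terminal and the ASCII fallback are my own assumptions, not something from your code:

```python
import sys

def encode_for_terminal(text, encoding=None):
    """Encode text for output, replacing unencodable characters with '?'.

    Hypothetical helper; falls back to ASCII when the terminal
    encoding is unknown.
    """
    target = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(target, 'replace')

# Force ASCII here to show what happens to characters it cannot represent.
safe = encode_for_terminal(u'caf\xe9 \u2603', 'ascii')
```

On Python 2 the result is a plain byte string that print will accept without raising UnicodeEncodeError; the two non-ASCII characters come out as question marks.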
The actual best answer for this problem depends on your environment, specifically what encoding your terminal expects.
The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:
print ch  # fails
print ch.encode('ascii', 'ignore')  # works, but silently drops non-ASCII characters
The better solution is to change your terminal's encoding to utf-8, and encode everything as utf-8 before printing. You should get in the habit of thinking about your unicode encoding EVERY time you print or read a string.
Just putting .encode('utf-8') at the end of the object will do the job in recent versions of Python.
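To make the trade-off between the ASCII and UTF-8 suggestions concrete, here is a small comparison (the sample string is made up). Encoding to ASCII with 'ignore' loses data, while UTF-8 keeps every character and can be decoded back:

```python
text = u'na\xefve \u2603'  # sample string with two non-ASCII characters

# ASCII with 'ignore' drops anything it cannot represent.
lossy = text.encode('ascii', 'ignore')

# UTF-8 can represent every character, so nothing is lost.
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```

Whether the UTF-8 bytes display correctly still depends on your terminal actually expecting UTF-8.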