UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

旧巷少年郎 2020-11-28 04:24

I want to parse my XML document, which I have stored as follows:

class XMLdocs(db.Expando):
    id = db.IntegerProperty()
    name=db.StringPro


        
7 answers
  • 2020-11-28 04:39

    This worked for me:

    from django.utils.encoding import smart_str
    # In Python 2-era Django, smart_str returns a UTF-8 encoded byte string
    content = smart_str(content)
    
  • 2020-11-28 04:46

    It looks like you are hitting a UTF-8 byte order mark (BOM). Try decoding the content to unicode with the BOM stripped out:

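    import codecs
    import StringIO
    
    # Remove the BOM bytes from the raw content, then decode it as UTF-8
    content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
    # q and parser are assumed to be defined in your own code
    parser.parse(StringIO.StringIO(content))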
    

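    I used strip instead of lstrip because in your case you had multiple occurrences of the BOM, possibly due to concatenated file contents. Note that strip only removes BOM bytes from the start and end of the string, not from the middle.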
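    For readers on Python 3, here is a minimal sketch of the same idea. It assumes the raw document is available as bytes and that every BOM, including any left in the middle by concatenated files, should be removed. It uses bytes.replace rather than strip, because strip treats codecs.BOM_UTF8 as a set of three single bytes and can also eat the last byte of a legitimate character such as '»' (encoded as b'\xc2\xbb'). The helper names and sample data are hypothetical.

```python
import codecs


def decode_without_boms(raw: bytes) -> str:
    """Remove every UTF-8 BOM from raw bytes, then decode them as UTF-8."""
    # replace() matches the full 3-byte sequence b'\xef\xbb\xbf' anywhere,
    # including BOMs left in the middle by concatenated files.
    return raw.replace(codecs.BOM_UTF8, b"").decode("utf-8")


def naive_strip(raw: bytes) -> bytes:
    # strip() treats its argument as a set of single bytes (0xEF, 0xBB, 0xBF)
    # and only looks at the two ends of the data.
    return raw.strip(codecs.BOM_UTF8)


# Made-up sample: two concatenated UTF-8 files, each starting with a BOM.
text = "\u00abhola\u00bb"  # '«hola»'; '»' encodes as b'\xc2\xbb'
raw = (codecs.BOM_UTF8 + text.encode("utf-8")) * 2

print(ascii(decode_without_boms(raw)))

try:
    naive_strip(raw).decode("utf-8")
except UnicodeDecodeError as exc:
    # The trailing 0xBB of '»' was stripped, leaving a dangling 0xC2 lead byte.
    print("strip() corrupted the data:", exc.reason)
```

    The replace-based version decodes cleanly, while the strip-based one fails to decode because it removed part of the final character and left the middle BOM in place.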

  • 2020-11-28 04:50

    According to your traceback, the problem is the print statement on line 136 of parseXML.py. Unfortunately you didn't post that part of your code, but I'm going to guess it is just there for debugging. If you change it to:

    print repr(ch)
    

    then you should at least see what you are trying to print.
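    In Python 3 the equivalent debugging trick is ascii() rather than repr(): Python 3's repr() leaves printable non-ASCII characters such as 'ï' unescaped, so printing it can still fail on an ASCII-only terminal, whereas ascii() escapes them. The value of ch below is hypothetical; it assumes the BOM bytes were mis-decoded as Latin-1, which would produce a leading u'\xef' like the one in the error message. The helper name debug_text is also made up.

```python
import codecs


def debug_text(ch: str) -> str:
    """Return an ASCII-only representation of ch that is safe to print anywhere."""
    # ascii() works like repr() but escapes every non-ASCII code point
    return ascii(ch)


# Hypothetical value: a UTF-8 BOM that was mis-decoded as Latin-1 ('ï»¿').
ch = codecs.BOM_UTF8.decode("latin-1") + "<doc>"

print(debug_text(ch))  # prints '\xef\xbb\xbf<doc>'
```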

  • 2020-11-28 04:50

    The problem is that you're trying to print a Unicode character to a terminal that may not support Unicode. Encode it with the 'replace' error handler before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').
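    A minimal Python 3 sketch of the same 'replace' approach; the helper name printable is hypothetical. It falls back to ASCII when the stream reports no encoding, since in Python 2 sys.stdout.encoding can be None when output is redirected to a file or pipe, which would break the one-liner above.

```python
import sys
from typing import Optional


def printable(text: str, encoding: Optional[str] = None) -> str:
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Use the explicit encoding if given, else the stream's, else plain ASCII.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # errors='replace' turns unencodable characters into '?'; decoding back
    # yields a str that print() can write to a stream using that encoding.
    return text.encode(enc, "replace").decode(enc)


print(printable("caf\u00e9", "ascii"))  # prints caf?
```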

  • 2020-11-28 04:55

    The best answer to this problem depends on your environment, specifically on which encoding your terminal expects.

    The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:

    print ch  # fails
    print ch.encode('ascii', 'ignore')  # silently drops non-ASCII characters
    

    The better solution is to change your terminal's encoding to UTF-8 and encode everything as UTF-8 before printing. You should get in the habit of thinking about encoding EVERY time you print or read a string.
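    A minimal Python 3 sketch of "encode everything as UTF-8 before printing": instead of encoding each string by hand, wrap a binary stream in io.TextIOWrapper with encoding='utf-8' so every write is encoded for you. The function name is hypothetical, and an in-memory io.BytesIO stands in for the terminal; with a real terminal you would pass sys.stdout.buffer, and the terminal itself still has to be set to UTF-8.

```python
import io
from typing import BinaryIO, Iterable


def write_utf8_lines(lines: Iterable[str], binary_stream: BinaryIO) -> None:
    """Print each str in lines to binary_stream, encoded as UTF-8."""
    # TextIOWrapper performs the str -> bytes encoding on every write, so no
    # individual print() call has to remember to call .encode('utf-8').
    out = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    for line in lines:
        print(line, file=out)
    out.flush()
    # detach() hands the stream back without closing it, so the caller can
    # keep using it after the wrapper is discarded.
    out.detach()


buf = io.BytesIO()  # stands in for sys.stdout.buffer
write_utf8_lines(["caf\u00e9", "\u00ef"], buf)
print(buf.getvalue())  # prints b'caf\xc3\xa9\n\xc3\xaf\n'
```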

  • 2020-11-28 04:55

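    Just appending .encode('utf-8') to the unicode object avoids the error, because you then print a UTF-8 byte string instead of asking Python to encode it as ASCII. Whether it displays correctly still depends on your terminal expecting UTF-8, and in Python 3 print() shows the result as a b'...' bytes literal.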
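    A short Python 3 illustration of what .encode('utf-8') returns. The result is a bytes object, so print() shows it as a b'...' literal rather than the text; writing it to sys.stdout.buffer emits the raw UTF-8 bytes instead. This assumes sys.stdout is an ordinary text stream that exposes .buffer, hence the hasattr guard.

```python
import sys

text = "caf\u00e9"
data = text.encode("utf-8")  # bytes, not str

# print() shows the bytes literal b'caf\xc3\xa9', not the word itself
print(data)

# To emit the raw UTF-8 bytes instead, write to the underlying binary buffer
# (assumes sys.stdout is a normal text stream that exposes .buffer).
if hasattr(sys.stdout, "buffer"):
    sys.stdout.flush()  # keep ordering with the print() above
    sys.stdout.buffer.write(data + b"\n")
    sys.stdout.buffer.flush()
```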
