I have done some research and seen solutions but none have worked for me.
Python - 'ascii' codec can't decode byte
This didn't work for me. And
encode = turn a unicode string into a bytestring
decode = turn a bytestring into unicode
Since you already have a bytestring, you need decode to make it a unicode instance (assuming that is actually what you are trying to do):
output_string = '\n'.join(output_lines)
print output_string.decode("latin1") #now this returns unicode
A simple example of the problem is:
>>> '\xe9'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
\xe9 isn't an ASCII character, which means that your string is already encoded. You need to decode it into Python's unicode and then encode it again in the serialization format you want.
Since I don't know where your string came from, I just peeked at the python codecs, picked something from Western Europe and gave it a go:
>>> '\xe9'.decode('cp1252')
u'\xe9'
>>> u'\xe9'.encode('utf-8')
'\xc3\xa9'
>>>
You'll have the best luck if you know exactly which encoding the file came from.
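The decode-then-encode round trip above can be wrapped in a small helper. This is only a sketch: the name `reencode` and the cp1252 default are assumptions for illustration, not something taken from the question. It uses explicit `b'...'` literals so it should behave the same on Python 2.7 and Python 3.

```python
def reencode(data, source_encoding='cp1252', target_encoding='utf-8'):
    """Decode bytes from source_encoding, then encode to target_encoding."""
    text = data.decode(source_encoding)   # bytes -> unicode text
    return text.encode(target_encoding)   # unicode text -> bytes


converted = reencode(b'\xe9')
print(repr(converted))
```

If the guess about the source encoding is wrong, the decode step either fails or silently produces the wrong characters, which is why knowing the real encoding matters.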
What you should do here depends on what you want to do with your lines. If you just want to print to the console, you don't need to encode anything yourself: consoles normally use UTF-8, and your string is already a bytestring rather than unicode:
>>> output_string = '\n'.join(output_lines)
>>> print output_string
<menu>
<day name="monday">
<meal name="BREAKFAST">
<counter name="Entreé">
<dish>
<name icon1="Vegan" icon2="Mindful Item">
Cream of Wheat (Farina)
</name>
</dish>
</counter >
</meal >
</day >
</menu >
But if you want to write to a file, you can use the codecs module:
import codecs
f = codecs.open('out_file', 'w', encoding='utf8')
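A sketch of the full write path might look like the following. It swaps in `io.open`, which accepts the same `encoding` argument and exists on both Python 2.7 and Python 3, in place of `codecs.open`. The sample lines and the temporary file location are assumptions made up for illustration, and the lines are assumed to be UTF-8 bytestrings.

```python
import io
import os
import tempfile

# Sample data (an assumption): UTF-8 encoded bytestrings.
output_lines = [b'<counter name="Entre\xc3\xa9">', b'</counter>']

# Decode to unicode first, then let the file object encode on write.
text = u'\n'.join(line.decode('utf8') for line in output_lines)

path = os.path.join(tempfile.mkdtemp(), 'out_file')
# newline='\n' turns off platform newline translation.
with io.open(path, 'w', encoding='utf8', newline='\n') as f:
    f.write(text)

# Read the raw bytes back to check what ended up on disk.
with io.open(path, 'rb') as f:
    written = f.read()
```

The file object only accepts unicode text, so the encoding happens in exactly one place.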
You are trying to encode bytestrings:
>>> '<counter name="Entreé">'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)
Python is trying to be helpful here: you can only encode a Unicode string to bytes, so to encode a bytestring Python 2 first implicitly decodes it using the default encoding (ASCII), and that implicit decode is what fails.
The solution is to not encode data that is already encoded or, if the data was encoded with a different codec than the one you need, to first decode it using a suitable codec before encoding again.
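To make the implicit step visible, the sketch below performs the ASCII decode explicitly, which is the step Python 2 runs behind the scenes before encoding a bytestring. The sample bytes assume the source was UTF-8 encoded; the code is written to run on Python 2.7 and Python 3.

```python
data = b'<counter name="Entre\xc3\xa9">'  # UTF-8 bytes; \xc3\xa9 is e-acute

try:
    # The decode that Python 2 attempts implicitly before encoding.
    data.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the codec the data was actually written in succeeds.
text = data.decode('utf8')
```

The error comes from the decode, not from the UTF-8 encode that was requested, which is why the traceback mentions the 'ascii' codec.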
If you have a mix of unicode and bytestring values, decode just the bytestrings or encode just the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode first:
def ensure_unicode(v):
    if isinstance(v, str):
        v = v.decode('utf8')
    return unicode(v)  # convert anything not a string to unicode too

output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
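For code that also has to run on Python 3, where `unicode` no longer exists, a variant along these lines may help. It is a sketch only: the names `to_text` and `text_type` and the sample list are hypothetical, and UTF-8 is assumed as the encoding of any bytestrings.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


def to_text(value, encoding='utf8'):
    """Return value as text, decoding bytestrings with the given encoding."""
    if isinstance(value, bytes):      # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return text_type(value)           # also converts non-string values


# A mix of bytestring, unicode and non-string values.
mixed_lines = [b'Entre\xc3\xa9', u'Entre\xe9', 42]
output_string = u'\n'.join(to_text(line) for line in mixed_lines)
```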