I am parsing an xls file using xlrd. Most things are working fine. I have a dictionary where keys are strings and values are lists of strings. All the keys and values
You can also try this to get the text.
foo.encode('ascii', 'ignore')
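Note that 'ignore' silently drops every character ASCII can't represent, so text is lost. To compare it with a couple of other standard error handlers, here is a minimal sketch; the function name and sample string are illustrative, not from the answer, and it runs under Python 2.7 and 3.

```python
def encode_with_handlers(text):
    """Encode text to ASCII with several of Python's built-in error handlers."""
    results = {}
    for handler in ('ignore', 'replace', 'xmlcharrefreplace'):
        # Each handler decides what to emit for characters ASCII can't represent.
        results[handler] = text.encode('ascii', handler)
    return results

sample = u'2010\u20132011'  # en dash between the two years
for handler, encoded in sorted(encode_with_handlers(sample).items()):
    print(handler, encoded)
```

Here 'ignore' yields b'20102011', 'replace' yields b'2010?2011', and 'xmlcharrefreplace' keeps the dash as the reference &#8211;.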
Here str(u'\u2013') is what causes the error, so use isinstance(foo, basestring) to check whether the value is already a unicode/str. If it is not a basestring, convert it to Unicode first and then call encode:
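if isinstance(foo, basestring):
    text = foo.encode('utf8')  # unicode (or ASCII-only str): encode directly
else:
    text = unicode(foo).encode('utf8')  # e.g. a number: make it Unicode first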
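Python 3 has no basestring or unicode, so the same check looks slightly different there. A minimal sketch, assuming Python 3 and assuming that any bytes passed in are already UTF-8; the helper name to_utf8 is hypothetical.

```python
def to_utf8(value):
    """Return value as UTF-8 encoded bytes (Python 3 sketch of the check above)."""
    if isinstance(value, bytes):
        # Assumption: bytes are already UTF-8, so pass them through untouched.
        return value
    if not isinstance(value, str):
        # Non-text values (e.g. xlrd's float cells) are turned into text first,
        # mirroring unicode(foo) in the Python 2 version.
        value = str(value)
    return value.encode('utf-8')
```

For example, to_utf8(42.0) gives b'42.0' and to_utf8(u'\u2013') gives b'\xe2\x80\x93'.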
You can print Unicode objects directly as well; you don't need to wrap them in str().
Assuming you really want a str:
When you do str(u'\u2013') you are trying to convert the Unicode string to an 8-bit string. To do this you need an encoding, a mapping from Unicode data to 8-bit data. str() uses the system default encoding, which under Python 2 is ASCII. ASCII contains only the first 128 code points of Unicode, that is \u0000 to \u007F. The result is the error above: the ASCII codec just doesn't know what \u2013 is (it's an en dash, btw).
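The same failure can be reproduced without str() by encoding explicitly to ASCII. A small sketch (the function name fits_in_ascii is hypothetical) that checks whether a string stays within ASCII's range and shows what the error reports otherwise; it runs under Python 2.7 and 3.

```python
def fits_in_ascii(text):
    """True if every character is within ASCII's range, U+0000..U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)

dash = u'\u2013'
print(fits_in_ascii(u'abc'), fits_in_ascii(dash))
try:
    dash.encode('ascii')  # the codec str() falls back to under Python 2
except UnicodeEncodeError as exc:
    # exc.start and exc.end mark the offending slice of the input
    print('cannot encode', repr(dash[exc.start:exc.end]), '->', exc.reason)
```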
You therefore need to specify which encoding to use. Common ones are ISO-8859-1, commonly known as Latin-1, which contains the first 256 code points; UTF-8, which can encode every code point by using a variable-length encoding; CP1252, which is common on Windows; and various Chinese and Japanese encodings.
You use them like this:
u'\u2013'.encode('utf8')
The result is a str containing the sequence of bytes that is the UTF-8 representation of the character in question:
'\xe2\x80\x93'
And you can print it:
>>> print '\xe2\x80\x93'
–
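To make the list of encodings concrete, here is a sketch that encodes the same en dash with several of them (Python 2.7/3; the helper name try_encodings is illustrative). Latin-1 fails because it covers only the first 256 code points, while CP1252 maps the en dash to a single byte.

```python
def try_encodings(text, encodings):
    """Map each encoding name to the encoded bytes, or None if it can't encode text."""
    results = {}
    for name in encodings:
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            # The character has no mapping in this encoding.
            results[name] = None
    return results

names = ['ascii', 'latin-1', 'cp1252', 'utf-8']
for name, encoded in sorted(try_encodings(u'\u2013', names).items()):
    print(name, repr(encoded))
```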
For me, this works:
unicode(data).encode('utf-8')
I had exactly this issue in a recent project, and it really is a pain in the rear. I finally found that it was because the Python we used in Docker had its encoding set to "ANSI_X3.4-1968" (which is just ASCII) instead of "UTF-8". So if you are using Docker and get this error, the following steps may solve your problem.
Create a file named default_locale in the same directory as your Dockerfile and put this line in it:
environment=LANG="es_ES.utf8", LC_ALL="es_ES.UTF-8", LC_LANG="es_ES.UTF-8"
Add these lines to your Dockerfile:
RUN apt-get clean && apt-get update && apt-get install -y locales
RUN locale-gen en_CA.UTF-8
COPY ./default_locale /etc/default/locale
RUN chmod 0755 /etc/default/locale
ENV LC_ALL=en_CA.UTF-8
ENV LANG=en_CA.UTF-8
ENV LANGUAGE=en_CA.UTF-8
This solved my issue once I rebuilt and ran my Docker image again; hopefully it solves yours too.
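To check whether a container is affected before and after the change, a small diagnostic like the following can be run inside it. It is only a sketch: the function names are hypothetical, and newer Python 3 releases handle the C locale differently, so what it reports can vary by Python version.

```python
import codecs
import locale
import sys

def report_encodings():
    """Collect the encodings Python inferred from the environment (e.g. LANG/LC_ALL)."""
    return {
        'preferred': locale.getpreferredencoding(False),
        'stdout': getattr(sys.stdout, 'encoding', None),
        'filesystem': sys.getfilesystemencoding(),
    }

def is_plain_ascii(encoding):
    """True if the name resolves to the ASCII codec, as 'ANSI_X3.4-1968' does."""
    return codecs.lookup(encoding).name == 'ascii'

if __name__ == '__main__':
    for name, enc in sorted(report_encodings().items()):
        flag = ' (plain ASCII!)' if enc and is_plain_ascii(enc) else ''
        print('%s: %s%s' % (name, enc, flag))
```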
I had the same problem. This works fine for me:
str(objdata).encode('utf-8')
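One plausible reason this one-liner works is that it runs under Python 3, where str() returns Unicode text rather than bytes. Under Python 2, str() on a non-ASCII unicode value raises the very error from the question. A minimal sketch, assuming Python 3; objdata here is just a sample value.

```python
import sys

# Assumption: Python 3, where str is Unicode text and .encode() yields bytes.
assert sys.version_info[0] >= 3

objdata = u'caf\u00e9 \u2013 menu'  # sample value containing non-ASCII text
encoded = str(objdata).encode('utf-8')
print(type(encoded), encoded)
```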