There are a few threads on Stack Overflow, but I couldn't find a valid solution to the problem as a whole.
I have collected huge sums of textual data from the urllib rea
Your data is unicode data. To write that to a file, use .encode():
text = text.encode('ascii', 'ignore')
but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
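To make the difference concrete, here is a minimal sketch comparing the two encodings and then writing the text to a file as UTF-8. The sample string and the file name are made up for illustration, and io.open is used only because it accepts an encoding argument on both Python 2 and Python 3; it is one reasonable way to do this, not the only one.

```python
import io
import os
import tempfile

# Sample text (hypothetical): a curly apostrophe and an accented letter.
text = u'It\u2019s a caf\u00e9'

# 'ignore' silently drops every character that ASCII cannot represent.
ascii_bytes = text.encode('ascii', 'ignore')

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = text.encode('utf-8')

# io.open encodes on write, so no manual .encode() call is needed here.
# The file name is an assumption chosen for this example.
path = os.path.join(tempfile.gettempdir(), 'unicode_example.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading back with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()
```

With the ASCII version the apostrophe and the é are gone for good, while the UTF-8 file round-trips to the original string.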
You may want to read up on Python and Unicode:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
You can do it with the smart_str function from Django's django.utils.encoding module. Just try this:
from django.utils.encoding import smart_str
text = u'\u2019'
# smart_str returns UTF-8 bytes on Python 2 and a str on Python 3.
print(smart_str(text))
You can install Django by starting a command shell with administrator privileges and running this command:
pip install Django