There are a few threads on Stack Overflow, but I couldn't find a valid solution to the problem as a whole.
I have collected huge sums of textual data from the urllib rea
Your data is Unicode data. To write that to a file, use .encode():
text = text.encode('ascii', 'ignore')
but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
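To make the difference concrete, here is a minimal sketch comparing the two encodings and writing the UTF-8 result to a file. The sample string and the output.txt path are made up for illustration; your real text would come from your download step. io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample standing in for the downloaded text.
text = u"caf\u00e9 \u2013 na\u00efve"

# The 'ignore' error handler silently drops every non-ASCII character.
ascii_bytes = text.encode('ascii', 'ignore')

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = text.encode('utf-8')

# Hypothetical output location (a temp dir keeps the sketch self-contained).
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# Option 1: encode yourself and write the bytes in binary mode.
with open(path, 'wb') as f:
    f.write(utf8_bytes)

# Option 2: let io.open encode for you while writing text.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading back with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()
```

Either option produces the same file here. Option 2 tends to be less error-prone, since the encoding is stated once where the file is opened.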
You may want to read up on Python and Unicode:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder