I get some data from a webpage and read it like this in Python:
import urllib2  # Python 2 only
original_doc = urllib2.urlopen(url).read()
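On Python 3, where urllib2 no longer exists, a rough equivalent of that read is sketched here. It assumes urllib.request, decodes the body using the charset the server declares, and falls back to UTF-8 when none is sent (that fallback is an assumption). fetch_text is a hypothetical helper name, not part of any library.

```python
import urllib.request


def fetch_text(url):
    """Fetch url and return its body as str (Python 3 sketch)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, like urllib2's .read() in Python 2
        # Prefer the declared charset; fall back to UTF-8 (assumption).
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going on bytes that do not match the charset.
    return raw.decode(charset, errors="replace")
```

A data: URL such as data:text/plain;charset=utf-8,caf%C3%A9 is handy for trying this offline, since urlopen handles that scheme without a network connection.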
Sometimes this url has characters su
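This should work: it removes every character that is not ASCII. Be aware that 'unicode_escape' also interprets backslash sequences such as \n in the page text and reads all other bytes as Latin-1, so if the page is UTF-8, decoding with 'utf-8' first is the safer choice.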
original_doc = original_doc.decode('unicode_escape').encode('ascii', 'ignore')
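A Python 3 sketch of the same idea, assuming the page bytes are UTF-8 unless another encoding is passed in: decode with the page's real encoding, then encode to ASCII with errors="ignore" so anything outside ASCII is dropped. ascii_only is a hypothetical helper name and the UTF-8 default is an assumption.

```python
def ascii_only(raw, encoding="utf-8"):
    """Decode raw page bytes, then drop every character outside ASCII."""
    # Decode with the page's encoding (UTF-8 by default, an assumption) rather
    # than 'unicode_escape', which would also rewrite literal backslash sequences.
    # Undecodable bytes become U+FFFD, which the next step removes anyway.
    text = raw.decode(encoding, errors="replace")
    # errors="ignore" silently drops anything ASCII cannot represent.
    return text.encode("ascii", errors="ignore").decode("ascii")
```

For example, ascii_only(b"caf\xc3\xa9") returns "caf", while a literal backslash followed by n in the page is kept as those two characters instead of being turned into a newline.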