I\'m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.
The problem is that
I had this issue trying to output Unicode characters to stdout
, but with sys.stdout.write
, rather than print (so that I could support output to a different file as well).
From BeautifulSoup's own documentation, I solved this with the codecs library:
import sys
import codecs
def main(fIn, fOut):
soup = BeautifulSoup(fIn)
# Do processing, with data including non-ASCII characters
fOut.write(unicode(soup))
if __name__ == '__main__':
with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
with codecs.getwriter('utf-8')(sys.stdout) as fOut:
main(fIn, fOut)