Use soup.get_text() with UTF-8
Question: I need to get all the text from a page using BeautifulSoup. The BeautifulSoup documentation says you can call soup.get_text() to do this. When I tried it on reddit.com, I got this error:

UnicodeEncodeError in soup.py:16 'cp932' codec can't encode character u'\xa0' in position 2262: illegal multibyte sequence

I get errors like that on most of the sites I checked. I got similar errors from soup.prettify() too, but I fixed those by changing the call to soup.prettify('UTF-8').
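The traceback points at an encoding step rather than at get_text() itself: the text contains u'\xa0' (a non-breaking space), which the cp932 codec cannot represent. The sketch below reproduces that in Python 3 using only the standard library. The literal string is an assumed stand-in for whatever soup.get_text() returns, and the variable names are illustrative.

```python
import io

# Assumed stand-in for the string returned by soup.get_text().
# "\xa0" is a non-breaking space, which HTML pages often contain (&nbsp;).
text = "reddit:\xa0the front page"

# Encoding to cp932 fails in the same way as the error in the question.
try:
    text.encode("cp932")
    cp932_ok = True
except UnicodeEncodeError:
    cp932_ok = False

# Encoding to UTF-8 explicitly works, because UTF-8 can represent U+00A0.
utf8_bytes = text.encode("utf-8")

# Writing through a stream with an explicit UTF-8 encoding avoids relying
# on the platform default (cp932 on a Japanese-locale Windows console).
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8")
writer.write(text)
writer.flush()
```

With a real soup object the same idea would apply to the result of soup.get_text(), for example by calling .encode("utf-8") on it before printing, or by writing it to a file opened with encoding="utf-8".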