Question
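from bs4 import BeautifulSoup
import urllib.request

link = 'https://mywebsite.org'
# Send a browser-like User-Agent so the site does not reject the request
req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()  # raw bytes of the page
soup = BeautifulSoup(html, "html.parser")
# find_all() returns a ResultSet (a list of Tag objects), not a string
body = soup.find_all('div', {"class": "wrapper"})
print(body)  # this is the line that fails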
Hi guys, I have a problem with this code. When I run it, I get this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2022' in position 138: character maps to <undefined>
I searched and found that I should add
.encode("utf-8")
but when I add it, I get this error instead:
AttributeError: 'ResultSet' object has no attribute 'encode'
How can I fix this?
Sorry for my English, I'm Italian :)
Answer 1:
You're on Windows and printing to the console, and it is the print() call that raises the exception.
The Windows console natively supports only 8-bit code pages, so any character outside your region's code page will fail to encode (and despite what people say, switching with chcp 65001 does not reliably fix it).
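A minimal sketch of why the console chokes, assuming the console uses code page 850 (a common OEM default on Western European Windows installs; the asker's actual code page is not stated). Single-byte codecs such as cp850 are "charmap" codecs, which is where the word 'charmap' in the error comes from, and U+2022 (the bullet) has no slot in that table, while UTF-8 can encode it. The helper names are hypothetical:

```python
def can_encode(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def failing_codec(text, codec):
    """Return the codec name reported by the UnicodeEncodeError, or None."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc.encoding
    return None


bullet = "\u2022"  # the character from the error message
# cp850 is an assumption standing in for the asker's console code page
print(can_encode(bullet, "cp850"))     # False
print(can_encode(bullet, "utf-8"))     # True
print(failing_codec(bullet, "cp850"))  # charmap
```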
You need to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, which gives it support for multi-byte characters.
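A minimal sketch of switching the module on at the top of a script, assuming it was installed with `pip install win-unicode-console` and is imported as `win_unicode_console`; the helper name `enable_unicode_console` is hypothetical. The guards keep the script working on other platforms or when the package is missing. Note that since Python 3.6 (PEP 528) the interpreter already writes to the Windows console through its Unicode API, so on a modern Python this step may be unnecessary.

```python
import sys


def enable_unicode_console():
    """Enable win_unicode_console if possible; return True when it was enabled."""
    if sys.platform != "win32":
        return False  # the package only targets the Windows console
    try:
        import win_unicode_console  # assumed installed via pip
    except ImportError:
        return False  # keep the default console streams
    win_unicode_console.enable()  # swaps in Unicode-aware console streams
    return True
```

Call `enable_unicode_console()` once, before the first `print(body)`.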
Alternatively, don't print to the console at all: write your output to a file opened with an explicit encoding. For example:
with open("myoutput.log", "w", encoding="utf-8") as my_log:
    # write() needs a str, so convert the ResultSet first
    my_log.write(str(body))
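The same idea also explains the AttributeError from the question: `find_all()` returns a `ResultSet`, which is a `list` subclass holding `Tag` objects rather than a string, so it has no `.encode()` method and cannot be passed to `write()` directly. A minimal sketch, with the hypothetical helper `write_results`, that converts each result with `str()` and writes it to a UTF-8 file; plain strings stand in for `Tag` objects so it runs without bs4:

```python
import os
import tempfile


def write_results(results, path):
    """Write each result (e.g. a bs4 Tag) on its own line of a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as out:
        for item in results:
            out.write(str(item))  # for a Tag, str() gives its HTML markup
            out.write("\n")


# Plain strings stand in for the Tag objects that find_all() would return
results = ['<div class="wrapper">\u2022 first</div>', '<div class="wrapper">second</div>']
path = os.path.join(tempfile.mkdtemp(), "myoutput.log")
write_results(results, path)

with open(path, "rb") as f:
    data = f.read()
print("\u2022".encode("utf-8") in data)  # True: the bullet is stored as UTF-8 bytes
```

With the real page you would call `write_results(body, "myoutput.log")`; use `item.get_text()` instead of `str(item)` if you want the text without the markup.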
Source: https://stackoverflow.com/questions/36086399/beautifulsoup-encodeutf-8