BeautifulSoup encode("utf-8")

Submitted by 穿精又带淫゛_ on 2020-01-02 10:04:12

Question


from bs4 import BeautifulSoup
import urllib.request

link = 'https://mywebsite.org'
req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0'})
url = urllib.request.urlopen(req).read()

soup = BeautifulSoup(url, "html.parser")
body = soup.find_all('div', {"class": "wrapper"})

print(body)

Hi guys, I have a problem with this code. When I run it, I get this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2022' in position 138: character maps to <undefined>

I searched and found that I should add

.encode("utf-8")

but if I add it, I get this error:

AttributeError: 'ResultSet' object has no attribute 'encode'

How can I resolve this?

Sorry for my English, I'm Italian :)


Answer 1:


You're on Windows and trying to print to the console. The print() call is what throws the exception, not BeautifulSoup.

The Windows console natively supports only 8-bit code pages, so any character outside your active code page will fail to encode (despite what people say about chcp 65001).
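To see the failure without involving the console at all, here is a minimal sketch that encodes the bullet character from the error message the way print() would for an 8-bit code page. The choice of cp437 is an assumption for illustration; your console may use a different code page, and try_encode is a hypothetical helper, not part of any library.

```python
# Reproduce the encoding failure directly, without printing to a console.
text = "item \u2022 one"  # contains the bullet character from the traceback


def try_encode(s, codec):
    """Return the encoded bytes, or None if the codec cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError:
        return None


# cp437 is an assumed example of an 8-bit console code page.
cp437_result = try_encode(text, "cp437")
utf8_result = try_encode(text, "utf-8")

print(cp437_result)  # None: the code page has no slot for U+2022
print(utf8_result)   # the same text encodes fine as UTF-8
```

The string itself is fine; only the target encoding decides whether the write succeeds.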

You need to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, which gives you support for characters outside the 8-bit code page.

Alternatively, skip the console and write your output to a file opened with an explicit encoding. For example:

with open("myoutput.log", "w", encoding="utf-8") as my_log:
    # body is a ResultSet (a list of tags), so convert it to text first
    my_log.write(str(body))
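As for the AttributeError in the question: find_all() returns a ResultSet, which is a list of tags, so there is no .encode() on it; any encoding has to be applied to the text of its elements. If you still want console output, one possible approach is to escape whatever the console encoding cannot represent instead of letting print() raise. The sketch below is only an illustration: console_safe is a hypothetical helper, plain strings stand in for the tags in body, and cp437 is again an assumed code page.

```python
import sys


def console_safe(items, encoding=None):
    """Join items as text, escaping characters the target encoding lacks."""
    # Fall back to ASCII if the stream does not report an encoding.
    encoding = encoding or sys.stdout.encoding or "ascii"
    text = "\n".join(str(item) for item in items)
    # backslashreplace turns an unencodable character into a literal
    # escape sequence (for example \u2022) instead of raising an error.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Plain strings stand in for the Tag objects returned by find_all().
sample = [
    '<div class="wrapper">\u2022 first</div>',
    '<div class="wrapper">second</div>',
]
print(console_safe(sample, encoding="cp437"))
```

With the real result you would call print(console_safe(body)). The output is lossy (the bullet shows up as an escape sequence), so writing to a UTF-8 file remains the better choice when you need the exact text.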


Source: https://stackoverflow.com/questions/36086399/beautifulsoup-encodeutf-8
