Question
I'm trying to get BeautifulSoup working with a URL, like the following:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html.encode("utf-8"), "html.parser")
print(soup.find_all('a'))
However, I am getting an error:
File "c:\Python3\ProxyList.py", line 3, in <module>
html = urlopen("http://proxies.org").encode("utf-8")
AttributeError: 'HTTPResponse' object has no attribute 'encode'
Any idea why? Could it have to do with the urlopen function? Why does it need the utf-8?
There clearly seem to be some differences between Python 3 and BeautifulSoup4 regarding the examples that are given (which seem to be out of date or wrong now)...
Answer 1:
It's not working because urlopen returns an HTTPResponse object, and you were treating it as if it were the HTML itself. You need to call the .read() method on the response to get the HTML:
response = urlopen("http://proxies.org")
html = response.read()
soup = BeautifulSoup(html.decode("utf-8"), "html.parser")
print(soup.find_all('a'))
You probably also want html.decode("utf-8") rather than html.encode("utf-8"), since .read() returns bytes and decoding is what turns bytes into a str.
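To see the difference without hitting the network, here is a minimal sketch that uses io.BytesIO as a stand-in for the object urlopen returns. The stand-in and the sample HTML are assumptions for illustration; a real HTTPResponse behaves the same way in the respects shown (it has .read() but no .encode()).

```python
import io

# Stand-in for the object urlopen() returns: file-like, read() yields bytes.
# The HTML content is made-up sample data.
response = io.BytesIO("<a href='/list'>café</a>".encode("utf-8"))

# The response object itself has no encode() method, hence the AttributeError.
has_encode = hasattr(response, "encode")

raw = response.read()       # bytes
text = raw.decode("utf-8")  # bytes -> str

print(has_encode)
print(type(raw).__name__, type(text).__name__)
```

Calling .encode() only makes sense on a str; the data coming off the response is already bytes.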
Answer 2:
Try this one. Note that the bytes from .read() have to be decoded, not encoded:
soup = BeautifulSoup(html.read().decode('utf-8'), "html.parser")
Answer 3:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://proxies.org")
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('a'))
urlopen returns a file-like object, and BeautifulSoup accepts a file-like object and decodes it automatically, so you do not need to worry about the encoding yourself.
From the documentation:
To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("index.html"))
soup = BeautifulSoup("<html>data</html>")
First, the document is converted to Unicode, and HTML entities are converted to Unicode characters.
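As a rough illustration of the file-like case, the sketch below passes an io.BytesIO object (an assumed stand-in for the urlopen response, with made-up sample HTML) straight to BeautifulSoup, without calling .read() or .decode() first.

```python
import io
from bs4 import BeautifulSoup

# Stand-in for the urlopen() response: a file-like object holding raw bytes.
# The markup is made-up sample data.
sample = io.BytesIO(
    b"<html><body><a href='/one'>One</a><a href='/two'>Two</a></body></html>"
)

# BeautifulSoup reads from the file-like object and handles decoding itself.
soup = BeautifulSoup(sample, "html.parser")

links = [a.get("href") for a in soup.find_all("a")]
print(links)
```

With a real URL, the same pattern is what Answer 3 shows: pass the urlopen result directly as the first argument.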
Source: https://stackoverflow.com/questions/41925548/beautifulsoup-httpresponse-has-no-attribute-encode