问题
I am extracting data for a research project and I have sucessfully used findAll('div', attrs={'class':'someClassName'})
in many websites but this particular website,
WebSite Link
doesn't return any values when I used attrs option. But when I don't use the attrs option I get entire html dom.
Here is the simple code that I started with to test it out:
soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
print div
回答1:
My code is working fine, with requests
import requests
from BeautifulSoup import BeautifulSoup as bs
#grab HTML
r = requests.get(r'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585')
html = r.text
#parse the HTML
soup = bs(html)
results= soup.findAll('div', attrs={'class': 'data'})
print results
回答2:
If you or anyone reading this question would like to know the reason that the code wasn't able to find the attrs value using the code you've given (copied below):
soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
print div
The issue is when you attempted to create a BeautifulSoup object soup = bs(urlopen(url))
as the value of urlopen(url)
is a response object and not the DOM.
I'm sure any issues you had encountered could have been more easily resolved by using bs(urlopen(url).read())
instead.
来源:https://stackoverflow.com/questions/11713977/beautifulsoup-unable-to-extract-data-using-attrs-class