I am using Beautiful Soup to parse an HTML page, but sometimes find_all returns fewer results than there are matching elements on the page. For example, the page http://www.totallyfreestuff.com/index.asp?m=0&sb=1&p=5 has 18 spans with the class headline, but the following code finds only two. Can anybody tell me why? Thank you in advance!
from bs4 import BeautifulSoup

# page holds the HTML source that was downloaded earlier
soup = BeautifulSoup(page, 'html.parser')
hrefDivList = soup.find_all("span", class_="headline")
# print(hrefDivList)
print(len(hrefDivList))
Try a different parser with Beautiful Soup. Parsers can build different trees from invalid HTML, so html.parser may miss elements that lxml finds:
import requests
from bs4 import BeautifulSoup
url = "<your url>"
r = requests.get(url)
# The lxml parser is a separate package (pip install lxml)
soup = BeautifulSoup(r.content, 'lxml')
hrefDivList = soup.find_all("span", attrs={"class": "headline"})
print(len(hrefDivList))
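To check whether the parser is really the cause, it can help to run the same lookup with every parser that is installed and compare the counts. The following is a minimal sketch, not part of the original answer: the helper name count_headlines and the sample markup are made up for illustration, and parsers that are not installed are simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_headlines(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of span.headline elements found}."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser library is not installed; skip it.
            continue
        counts[name] = len(soup.find_all("span", class_="headline"))
    return counts


# Hypothetical sample markup; replace it with r.content from the real page.
sample_html = """
<div>
  <span class="headline">First offer</span>
  <span class="headline">Second offer</span>
  <span class="other">Not a headline</span>
</div>
"""

print(count_headlines(sample_html))
```

On well-formed markup like this sample the counts should agree. If they differ on the real page, the HTML is probably malformed and the parser reporting the higher count is likely the better choice.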
You can also use CSS selectors, which are more concise:
hrefDivList = soup.select("span.headline")
# print(hrefDivList)
print(len(hrefDivList))
Or you can iterate directly over the text of each span:
for every_span in soup.select("span.headline"):
    print(every_span.text)
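If the goal is to collect the links as well as the text, the same loop can build a list of records. This is only a sketch: it assumes each headline span wraps an a tag, which the question does not confirm, and the sample markup is invented.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
sample_html = """
<span class="headline"><a href="/item/1"> Free sample </a></span>
<span class="headline"><a href="/item/2">Free sticker</a></span>
"""

soup = BeautifulSoup(sample_html, "html.parser")

headlines = []
for span in soup.select("span.headline"):
    link = span.find("a")  # None when the span contains no link
    headlines.append({
        # get_text(strip=True) removes surrounding whitespace
        "title": span.get_text(strip=True),
        "href": link.get("href") if link is not None else None,
    })

print(headlines)
```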
Source: https://stackoverflow.com/questions/28447522/beautifulsoup-find-all-bug