beautifulsoup find_all bug?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-04 05:03:28

Question


I am using Beautiful Soup to parse HTML pages, but sometimes find_all returns fewer results than the page actually contains. For example, this page http://www.totallyfreestuff.com/index.asp?m=0&sb=1&p=5 has 18 headline spans, but the following code finds only two. Can anybody tell me why? Thanks in advance!

from bs4 import BeautifulSoup

# page holds the HTML source of the page being scraped
soup = BeautifulSoup(page, 'html.parser')
hrefDivList = soup.find_all("span", class_="headline")
# print(hrefDivList)
print(len(hrefDivList))
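One way to check whether the page really contains 18 matching spans, rather than the parser losing some, is to count the raw start tags with Python's standard-library html.parser.HTMLParser. It reports tags as a flat stream of events and does not build a tree, so nesting mistakes in the markup do not change the count. This is a diagnostic sketch, not BeautifulSoup itself; the class HeadlineCounter, the function count_headline_spans and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class HeadlineCounter(HTMLParser):
    """Counts <span> start tags whose class attribute contains "headline".

    HTMLParser emits tags as a flat event stream and never builds a tree,
    so badly nested or unclosed markup does not affect this count.
    """

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag != "span":
            return
        for name, value in attrs:
            # value is None for a bare attribute; class may hold several names
            if name == "class" and value and "headline" in value.split():
                self.count += 1
                break


def count_headline_spans(html):
    counter = HeadlineCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical sample with unclosed tags, standing in for a messy real page.
sample = """
<table><tr><td><span class="headline">First</span>
<p><span class="headline big">Second
<td><span class=headline>Third</span>
<span class="other">Not counted</span>
"""
print(count_headline_spans(sample))
```

If this raw count on the downloaded page is higher than what find_all returns, the missing elements were most likely lost while the parser built its tree.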

Answer 1:


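You can try using a different parser with BeautifulSoup. Parsers differ in how they repair malformed HTML, so on a messy page html.parser and lxml can build different trees, which changes how many elements find_all sees.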

import requests
from bs4 import BeautifulSoup

url = "<your url>"
r = requests.get(url)

# 'lxml' requires the lxml package to be installed (pip install lxml)
soup = BeautifulSoup(r.content, 'lxml')
hrefDivList = soup.find_all("span", attrs={"class": "headline"})
print(len(hrefDivList))
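To see whether the parser is the culprit, a sketch like the following runs the same HTML through several parsers and compares the counts. It assumes beautifulsoup4 is installed; lxml and html5lib are optional, and any that are missing are skipped via bs4's FeatureNotFound exception. The function name count_by_parser and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <span class="headline"> found}.

    Parsers that are not installed are skipped instead of raising.
    """
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # lxml and html5lib are separate packages; skip whichever is missing
            continue
        counts[parser] = len(soup.find_all("span", class_="headline"))
    return counts


# Hypothetical well-formed sample: every installed parser should agree here.
html = '<div><span class="headline">A</span><span class="headline">B</span></div>'
print(count_by_parser(html))
```

On the real page, counts that differ between parsers suggest the markup is malformed and each parser is repairing it differently.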



Answer 2:


You can also use CSS selectors, which make the query shorter:

hrefDivList = soup.select("span.headline")
# print(hrefDivList)
print(len(hrefDivList))

Or you can iterate directly over the matched spans and print each one's text:

for every_span in soup.select("span.headline"):
    print(every_span.text)
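As a minimal sketch on hypothetical sample HTML (assuming beautifulsoup4 is installed), select("span.headline") and find_all("span", class_="headline") both match any span whose class list contains "headline", even when the span has other classes too. get_text(strip=True) trims the surrounding whitespace from each span's text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second span carries an extra class.
html = """
<span class="headline">Free T-shirt </span>
<span class="headline featured"> Free sample</span>
<span class="byline">Not a headline</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Both queries match every <span> whose class list includes "headline".
via_find_all = soup.find_all("span", class_="headline")
via_select = soup.select("span.headline")

# strip=True removes leading and trailing whitespace from the text
titles = [span.get_text(strip=True) for span in via_select]
print(len(via_find_all), len(via_select))
print(titles)
```

Because both calls search the same parse tree, switching to select() does not recover elements that the parser dropped; it only changes how the query is written.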


Source: https://stackoverflow.com/questions/28447522/beautifulsoup-find-all-bug
