BeautifulSoup and SoupStrainer for getting links don't work with hasattr(), which always returns True

随声附和 posted on 2019-12-04 19:24:58

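hasattr() is the wrong test; it checks whether the Python object has an a.href attribute, and BeautifulSoup dynamically turns attribute access into a search for a child tag of that name. That lookup returns None when no such child exists instead of raising AttributeError, so hasattr() is always true. HTML tag attributes are not translated into Python attributes.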
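
To make the mechanism concrete, here is a minimal sketch using a stand-in class. FakeTag is hypothetical and not part of BeautifulSoup; it only imitates, roughly, how a bs4 Tag answers unknown attribute names with a child search that yields None on a miss:

```python
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag; for illustration only."""

    def __init__(self, attrs=None, children=None):
        self.attrs = attrs or {}          # HTML attributes live in a dict
        self.children = children or {}    # child tags, keyed by tag name

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # Dunder names still raise, as Python expects.
        if name.startswith("__"):
            raise AttributeError(name)
        # Any other name becomes a child search; a miss gives None
        # instead of raising AttributeError.
        return self.children.get(name)


link = FakeTag(attrs={})                  # an <a> with no href at all
has_python_attr = hasattr(link, "href")   # True: the lookup returned None
has_html_attr = "href" in link.attrs      # False: the dict is the real test
```

Because hasattr() only reports whether the lookup raised AttributeError, it cannot distinguish "found" from "returned None" on an object like this.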

Use dictionary-style testing instead. Because you loop over all elements, which can include the Doctype instance, I use getattr() so the test doesn't break on objects that have no attrs dictionary:

if 'href' in getattr(link, 'attrs', {}):
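
As a small self-contained sketch of that test, assuming bs4 is installed and using made-up sample markup (the html string and the hrefs list are illustrative names, not from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one anchor with an href, one without.
html = (
    '<!DOCTYPE html><html><body>'
    '<a href="/x">x</a><a name="top">y</a>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for node in soup.descendants:
    # Doctype and text nodes have no attrs dict, so fall back to {}.
    if "href" in getattr(node, "attrs", {}):
        hrefs.append(node["href"])
```

Only the first anchor passes the test, so hrefs ends up holding just its href value.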

You can also instruct SoupStrainer to match only a tags that have an href attribute by passing href=True as a keyword argument filter (a filter that tests for "not None" amounts to True in any case):

for link in BeautifulSoup(test.text, parse_only=SoupStrainer('a', href=True)):

This still includes the doctype declaration, of course; to get only the links, search for just the a tags:

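from bs4 import BeautifulSoup, SoupStrainer

# test is the response object from the question; its .text holds the HTML
soup = BeautifulSoup(test.text, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all('a'):
    print(link)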
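
For a version that runs without a network request, here is a hedged sketch with inline sample markup. The html string, only_links, and the explicit "html.parser" argument are assumptions added for illustration; they are not part of the original question:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup: two anchors (one lacking href) and a paragraph.
html = (
    '<!DOCTYPE html><html><body>'
    '<a href="/x">x</a><a name="top">y</a><p>z</p>'
    '</body></html>'
)

# Parse only <a> tags that carry an href attribute.
only_links = SoupStrainer("a", href=True)
soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

# find_all('a') skips anything in the strained soup that is not an <a> tag.
hrefs = [a["href"] for a in soup.find_all("a")]
```

Naming the parser explicitly avoids the "no parser was explicitly specified" warning and keeps the result independent of which parsers happen to be installed.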