Searching on class tags with multiple spaces and wildcards with BeautifulSoup

Submitted by 烈酒焚心 on 2019-12-07 17:40:34

Question


I am trying to use BeautifulSoup to find all div containers whose class attribute begins with "foo bar". I had hoped the following would work:

from bs4 import BeautifulSoup
import re

soup.findAll('div', class_=re.compile('^foo bar'))

However, it seems that the class value is split into a list, like ['foo', 'bar'], so the regular expression cannot match across the space. Is there a way to do this? (I have reviewed a number of other posts but have not found a working solution.)
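
To make the behaviour described above concrete, here is a minimal sketch that parses a single div and inspects its class attribute. The sample markup is made up for illustration, and the standard-library 'html.parser' backend is used only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, for illustration only.
html = '<div class="foo bar bing"></div>'
soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div')
classes = div['class']       # multi-valued attribute: stored as a list
joined = ' '.join(classes)   # rebuild the space-separated string

print(classes)  # ['foo', 'bar', 'bing']
print(joined)   # foo bar bing
```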


Answer 1:


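You can pass class_ a function that returns True or False; a lambda works too. BeautifulSoup tests the function against each individual class and then against the full space-joined class string, which is why startswith can see "foo bar":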

from bs4 import BeautifulSoup

html = '''
<div class="foo bar bing"></div>
<div class="foo bang"></div>
<div class="foo bar1 bang"></div>
'''
soup = BeautifulSoup(html, 'lxml')

# With the trailing space, the second class must be exactly "bar".
res = soup.find_all('div', class_=lambda s: s.startswith('foo bar '))
print(res)
# [<div class="foo bar bing"></div>]

# Without the trailing space, any second class starting with "bar" matches.
res = soup.find_all('div', class_=lambda s: s.startswith('foo bar'))
print(res)
# [<div class="foo bar bing"></div>, <div class="foo bar1 bang"></div>]

The same filter can also be written as a named function:

def is_a_match(clas):
    # clas is None for tags that have no class attribute
    return clas is not None and clas.startswith('foo bar')

res = soup.find_all('div', class_=is_a_match)
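
The lambda filters shown earlier assume that every div has a class attribute. BeautifulSoup can call the filter with None for a tag that has no class, and a bare startswith call would then raise AttributeError. A minimal sketch of a reusable, None-safe filter follows; the helper name class_startswith and the sample HTML are hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def class_startswith(prefix):
    """Build a filter for find_all(class_=...). Hypothetical helper."""
    def matcher(value):
        # value is None for tags without a class attribute
        return value is not None and value.startswith(prefix)
    return matcher

# Made-up markup: includes a div with no class and one where
# "foo bar" appears but not at the start.
html = '''
<div class="foo bar bing"></div>
<div>no class here</div>
<div class="foo bar1 bang"></div>
<div class="baz foo bar"></div>
'''
soup = BeautifulSoup(html, 'html.parser')

matches = soup.find_all('div', class_=class_startswith('foo bar'))
found = [' '.join(tag['class']) for tag in matches]
print(found)  # ['foo bar bing', 'foo bar1 bang']
```

Passing 'foo bar ' (with a trailing space) instead would restrict the second class to exactly "bar", but it would also miss a div whose class is exactly "foo bar" with nothing after it.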

This answer may also help: https://stackoverflow.com/a/46719313/6655211
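
As a further option, not taken from the answer above, a CSS attribute selector passed to select() may express the same prefix match. This is only a sketch: it assumes the soupsieve package is available (recent BeautifulSoup releases install it as a dependency) and that the selector is matched against the space-joined class string. The sample HTML is the same made-up markup as before.

```python
from bs4 import BeautifulSoup

html = '''
<div class="foo bar bing"></div>
<div class="foo bang"></div>
<div class="foo bar1 bang"></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# [class^="foo bar"] selects divs whose class attribute text
# starts with "foo bar".
selected = soup.select('div[class^="foo bar"]')
selected_classes = [' '.join(tag['class']) for tag in selected]
print(selected_classes)
```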



Source: https://stackoverflow.com/questions/36996722/searching-on-class-tags-with-multiple-spaces-and-wildcards-with-beautifulsoup
