BeautifulSoup - How to find a specific class name alone

闹比i 2021-02-10 06:31

How can I find li tags whose class is exactly a given name, excluding tags that have that class alongside other classes? For example:

...
  • no wanted
  • not his
  • 3 answers
    • 傲寒 (OP)
      2021-02-10 07:04

      You can use a CSS attribute selector, li[class="z"], which matches only when the whole class attribute is exactly "z".

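      from bs4 import BeautifulSoup

      html = '''
      <li class="...">no wanted</li>
      <li class="...">not his one</li>
      <li class="...">neither this one</li>
      <li class="...">neither this one</li>
      <li class="...">neither this one</li>
      <li class="z">I WANT THIS ONLY ONE</li>
      '''
      soup = BeautifulSoup(html, 'lxml')
      tags = soup.select('li[class="z"]')  # the whole class attribute must equal "z"
      print(tags)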

      The same result can be achieved by passing a lambda to find_all:

      tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])
      

      Output:

      [<li class="z">I WANT THIS ONLY ONE</li>]
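A self-contained sketch comparing the approaches side by side, assuming bs4 is installed and using the built-in "html.parser" instead of lxml. The class names on the unwanted items ("z x", "y z") are hypothetical, chosen only for illustration. For contrast it also runs find_all with class_='z', which picks up every li whose class list contains z, while the attribute selector and the lambda only match the tag whose class is exactly z.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the last <li> has a class attribute of exactly "z".
html = """
<ul>
  <li class="z x">no wanted</li>
  <li class="y z">neither this one</li>
  <li class="z">I WANT THIS ONLY ONE</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches if any one of the tag's classes equals "z", so all three match.
by_class = [li.get_text() for li in soup.find_all("li", class_="z")]

# The attribute selector compares the whole class attribute string with "z".
by_selector = [li.get_text() for li in soup.select('li[class="z"]')]

# The lambda compares the parsed class list with the one-element list ["z"].
by_lambda = [
    li.get_text()
    for li in soup.find_all(lambda tag: tag.name == "li" and tag.get("class") == ["z"])
]

print(by_class)     # ['no wanted', 'neither this one', 'I WANT THIS ONLY ONE']
print(by_selector)  # ['I WANT THIS ONLY ONE']
print(by_lambda)    # ['I WANT THIS ONLY ONE']
```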

      Have a look at "Multi-valued attributes" in the Beautiful Soup documentation. It explains why class_='z' matches every tag that has z among its classes:

      HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:

      css_soup = BeautifulSoup('<p class="body"></p>')
      css_soup.p['class']
      # ["body"]

      css_soup = BeautifulSoup('<p class="body strikeout"></p>')
      css_soup.p['class']
      # ["body", "strikeout"]
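The same documentation section notes that this list conversion can be turned off by passing multi_valued_attributes=None to the BeautifulSoup constructor. A minimal sketch of that alternative, assuming a reasonably recent bs4 and again using hypothetical class names: the class attribute then stays one plain string, so class_='z' only matches a tag whose whole class attribute is "z".

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the last <li> has a class attribute of exactly "z".
html = """
<ul>
  <li class="z x">no wanted</li>
  <li class="y z">neither this one</li>
  <li class="z">I WANT THIS ONLY ONE</li>
</ul>
"""

# multi_valued_attributes=None keeps "class" as a single string
# instead of splitting it into a list of class names.
soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)

first_class = soup.find("li")["class"]
print(repr(first_class))  # 'z x'  (a str, not ['z', 'x'])

# class_="z" now has to equal the entire attribute string.
exact = [li.get_text() for li in soup.find_all("li", class_="z")]
print(exact)  # ['I WANT THIS ONLY ONE']
```

The trade-off is that the setting applies to every multi-valued attribute in the document, so any code expecting tag['class'] to be a list will behave differently; the selector or lambda approach keeps the default parsing.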
