BeautifulSoup - How to find a specific class name alone

闹比i 2021-02-10 06:31

How can I find the li tags that have one specific class name and no other classes? For example:

...
  • no wanted
  • not his
    3 answers
    • 2021-02-10 06:53

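      A filter function, as described in the docs, is one option. Note that a function passed as class_ is called once per class value, so a check like css_class == 'z' would still match class="a z". Filtering on the whole tag avoids that:

      def is_only_z(tag):
          # Match <li> tags whose class list is exactly ['z']
          return tag.name == 'li' and tag.get('class') == ['z']

      soup.find_all(is_only_z)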
      
    • 2021-02-10 07:04

      You can use CSS selectors to match the exact class name.

      from bs4 import BeautifulSoup

      html = '''<li> no wanted </li>
      <li class="a"> not his one </li>
      <li class="a z"> neither this one </li>
      <li class="b z"> neither this one </li>
      <li class="c z"> neither this one </li>
      <li class="z"> I WANT THIS ONLY ONE</li>'''
      
      soup = BeautifulSoup(html, 'lxml')
      
      tags = soup.select('li[class="z"]')
      print(tags)
      

      The same result can be achieved with a lambda that compares the whole class list:

      tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])
      

      Output:

      [<li class="z"> I WANT THIS ONLY ONE</li>]
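
If the exact-match check is needed for more than one class name, it could be wrapped in a small helper. The sketch below is only an illustration: `exact_class` is a hypothetical name, not part of Beautiful Soup, and `html.parser` is used just to avoid the lxml dependency. It compares class sets, so the order of the classes in the markup does not matter.

```python
from bs4 import BeautifulSoup


def exact_class(name, *classes):
    """Build a find_all filter (hypothetical helper): tags called `name`
    whose set of classes equals `classes`."""
    wanted = set(classes)

    def matches(tag):
        # tag.get("class") is a list for multi-valued attributes, or None if absent
        return tag.name == name and set(tag.get("class") or []) == wanted

    return matches


html = (
    '<li class="a z">both</li>'
    '<li class="z a">swapped</li>'
    '<li class="z">only z</li>'
)
soup = BeautifulSoup(html, "html.parser")

only_z = soup.find_all(exact_class("li", "z"))
a_and_z = soup.find_all(exact_class("li", "a", "z"))

print([t.get_text() for t in only_z])   # ['only z']
print([t.get_text() for t in a_and_z])  # ['both', 'swapped']
```
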
      

      See the "Multi-valued attributes" section of the Beautiful Soup documentation. It explains why class_='z' matches every tag that has z among its classes, not only the tags whose class is exactly z.

      HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:

      css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
      css_soup.p['class']
      # ['body']
      
      css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
      css_soup.p['class']
      # ['body', 'strikeout']
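
To make the consequence of the multi-valued behaviour concrete, here is a small check against the sample markup from this thread. It is a sketch, assuming `html.parser` instead of lxml so that no extra package is needed. Both a string and a function passed as `class_` are tested against each class value in turn, so they match every tag that carries z; only a comparison against the whole class list isolates the single-class tag.

```python
from bs4 import BeautifulSoup

html = """<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>"""

soup = BeautifulSoup(html, "html.parser")

# A string is compared with each class value, so "a z" matches as well
by_string = soup.find_all("li", class_="z")

# A function is also called once per class value, with the same effect
by_function = soup.find_all("li", class_=lambda c: c is not None and c == "z")

# Comparing the full class list keeps only the tag whose sole class is "z"
exact = soup.find_all(lambda tag: tag.name == "li" and tag.get("class") == ["z"])

print(len(by_string), len(by_function), len(exact))  # 4 4 1
```
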
      
    • 2021-02-10 07:06

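      Passing {'class': 'z'} to find_all on its own also matches class="a z", because class is compared value by value. Filter the result to keep only the exact matches:

      data = [li for li in soup.find_all('li', {'class': 'z'}) if li.get('class') == ['z']]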
      print(data)
      

      If you only want the text:

      for a in data:
          print(a.text)
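
As a possible variation on the text extraction above, `get_text(strip=True)` trims the surrounding whitespace that `.text` keeps. This is a minimal sketch that assumes the exact-match CSS selector from the earlier answer and `html.parser` as the parser.

```python
from bs4 import BeautifulSoup

html = (
    '<li class="a z"> neither this one </li>'
    '<li class="z"> I WANT THIS ONLY ONE</li>'
)
soup = BeautifulSoup(html, "html.parser")

# The attribute selector compares the whole class attribute, so "a z" is skipped
texts = [li.get_text(strip=True) for li in soup.select('li[class="z"]')]

print(texts)  # ['I WANT THIS ONLY ONE']
```
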
      