How to make Beautiful Soup (bs4) match just one, and only one, CSS class

说谎 2020-12-10 20:28

I am using the following code to match all div elements that have the CSS class "ad_item".

soup.find_all('div', class_="ad_item")

problem that I have i

7 Answers
  • 2020-12-10 20:34

    You can always write a Python function that matches the tag you want, and pass that function into find_all():

    def match(tag):
        # tag.get('class') returns None for tags without a class attribute
        classes = tag.get('class') or []
        return (
            tag.name == 'div'
            and 'ad_item' in classes
            and 'ad_ex_item' not in classes)
    
    soup.find_all(match)
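
As a rough illustration of the function-based approach above, here is a minimal sketch run against made-up markup. The HTML snippet and the variable names `html`, `matched` and `texts` are assumptions for demonstration only; they do not come from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
<div class="other">unrelated</div>
<div>no class</div>
"""
soup = BeautifulSoup(html, "html.parser")

def match(tag):
    # Fall back to an empty list for tags that have no class attribute.
    classes = tag.get("class") or []
    return (
        tag.name == "div"
        and "ad_item" in classes
        and "ad_ex_item" not in classes
    )

matched = soup.find_all(match)
texts = [tag.get_text() for tag in matched]
print(texts)  # ['plain']
```

Note that this variant only excludes `ad_ex_item`; a div carrying `ad_item` together with some other class would still be matched.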
    
  • 2020-12-10 20:34
    # fetch() is not part of the bs4 API; the bs4 equivalent is find_all()
    soup.find_all('div', {'class': 'ad_item'})
    
  • 2020-12-10 20:39

    You can use strict conditions like this:

    soup.select("div[class='ad_item']")
    

    This matches only div elements whose class attribute is exactly 'ad_item', with no other space-separated classes attached.
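
To make the difference between the strict attribute selector and the ordinary class selector concrete, here is a small sketch. The sample HTML and the names `exact` and `loose` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute selector: the whole class attribute must equal 'ad_item'.
exact = [d.get_text() for d in soup.select("div[class='ad_item']")]

# Class selector: 'ad_item' only has to be one of the classes.
loose = [d.get_text() for d in soup.select("div.ad_item")]

print(exact)  # ['plain']
print(loose)  # ['plain', 'extended']
```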

  • 2020-12-10 20:47

    I have found one solution, although it has nothing to do with BS4; it is pure Python code.

    for item in soup.find_all('div', class_="ad_item"):
        if len(item["class"]) != 1:
            continue  # skip items that carry more than one CSS class
    

    It basically skips an item if it has more than one CSS class.

  • 2020-12-10 20:50

    The top answer is correct, but if you want to keep the for loop clean, or you like one-line solutions, use the list comprehension below.

    data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1] 
    
  • 2020-12-10 20:58

    Did you try using select()? See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

    soup.select(".ad_item")
    

    Unfortunately, it seems that the CSS3 :not selector is not supported, at least in older bs4 releases. If you really need it, you may have to look at lxml, which seems to support it. See http://packages.python.org/cssselect/#supported-selectors
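
For readers on a newer installation: assuming Beautiful Soup 4.7.0 or later, where select() is backed by the Soup Sieve package, :not() should work directly. The sketch below relies on that assumption; the sample HTML and the name `texts` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Assumes bs4 >= 4.7.0 (Soup Sieve); older releases may reject :not().
texts = [d.get_text() for d in soup.select("div.ad_item:not(.ad_ex_item)")]
print(texts)  # ['plain']
```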
