I am using the following code to match all div elements that have the CSS class "ad_item".
soup.find_all('div', class_="ad_item")
problem that I have i
You can always write a Python function that matches the tag you want, and pass that function into find_all():
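def match(tag):
    # tag.get('class') is None when the attribute is missing, so fall back to an empty list
    classes = tag.get('class') or []
    return (
        tag.name == 'div'
        and 'ad_item' in classes
        and 'ad_ex_item' not in classes)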
soup.find_all(match)
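To see the function filter in action, here is a minimal self-contained sketch. The sample markup and the name SAMPLE_HTML are made up for illustration; only the class names "ad_item" and "ad_ex_item" come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example.
SAMPLE_HTML = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">excluded</div>
<div>no class</div>
<span class="ad_item">not a div</span>
"""

def match(tag):
    # A missing class attribute gives None, so fall back to an empty list.
    classes = tag.get("class") or []
    return (
        tag.name == "div"
        and "ad_item" in classes
        and "ad_ex_item" not in classes
    )

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
matched_texts = [tag.get_text() for tag in soup.find_all(match)]
print(matched_texts)  # ['plain']
```

Unlike a length check on the class list, this only rejects "ad_ex_item", so a div with "ad_item" plus some unrelated class would still match.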
soup.find_all('div', {'class': 'ad_item'})
You can use strict conditions like this:
soup.select("div[class='ad_item']")
That matches only div elements whose class attribute is exactly 'ad_item', with no other space-separated classes alongside it.
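A small sketch of the difference between the exact attribute selector and the usual class selector, assuming made-up sample markup (SAMPLE_HTML is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example.
SAMPLE_HTML = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">has an extra class</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute selector: the whole class attribute must equal "ad_item".
exact = soup.select("div[class='ad_item']")

# Class selector: "ad_item" only has to be one of the classes.
loose = soup.select("div.ad_item")

exact_texts = [tag.get_text() for tag in exact]
print(exact_texts)   # ['plain']
print(len(loose))    # 2
```

Note that the attribute comparison is against the literal attribute string, so it depends on how the page writes the class value.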
I have found one solution, although it has nothing to do with BS4; it is pure Python code.
for item in soup.find_all('div', class_="ad_item"):
    if len(item["class"]) != 1:
        continue
It basically skips the item if it has more than one CSS class.
The top answer is correct, but if you want to keep the for loop clean or prefer one-line solutions, use the list comprehension below.
data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1]
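For reference, a runnable sketch of that comprehension on made-up markup (SAMPLE_HTML and the "featured" class are hypothetical). It shows one thing worth keeping in mind: the length check drops every div with any extra class, not just "ad_ex_item".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example.
SAMPLE_HTML = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extra class</div>
<div class="ad_item featured">also skipped</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keep only divs whose class list contains exactly one entry.
data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1]

texts = [item.get_text() for item in data]
print(texts)  # ['plain']
```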
Did you try to use select? See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
soup.select(".ad_item")
Unfortunately, it seems that the CSS3 :not selector is not supported. If you really need this, you may have to look at lxml, which seems to support it. See http://packages.python.org/cssselect/#supported-selectors