I'm having trouble parsing HTML elements with the "class" attribute using BeautifulSoup. The code looks like this:
soup = BeautifulSoup(sdata)
mydivs = soup.fi
This worked for me:
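for div in mydivs:
    try:
        # bs4 returns "class" as a list of class names; BS3 returned a string
        clazz = div["class"]
    except KeyError:
        # This div has no class attribute at all
        clazz = []
    if "stylelistrow" in clazz:
        print(div)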
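In bs4 (unlike BS3), class is a multi-valued attribute, so div["class"] returns a list such as ['stylelistrow'] rather than a string. An equality test against a plain string therefore never matches, while a membership test does. A minimal sketch, assuming bs4 is installed, using the stdlib "html.parser" backend and a hypothetical HTML string in place of sdata:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the question's sdata
sdata = """
<div class="stylelistrow">first</div>
<div class="stylelistrow highlighted">second</div>
<div>no class</div>
"""

soup = BeautifulSoup(sdata, "html.parser")
mydivs = soup.find_all("div")

matches = []
for div in mydivs:
    # .get() returns None instead of raising KeyError when "class" is absent
    classes = div.get("class") or []
    # bs4 splits class="a b" into ["a", "b"], so test membership, not equality
    if "stylelistrow" in classes:
        matches.append(div)

for div in matches:
    print(div)
```

Using .get() here is just an alternative to the try/except; both handle divs that have no class attribute.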
The following should work
soup.find('span', attrs={'class':'totalcount'})
Replace 'totalcount' with your class name and 'span' with the tag you are looking for. If the element's class attribute contains several space-separated names, just pick one of them: BeautifulSoup matches a single class name against each of the element's classes.
P.S. This finds the first element matching the given criteria. To find all matching elements, replace 'find' with 'find_all'.
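A small sketch of the find versus find_all difference, assuming bs4 with the "html.parser" backend; the markup is hypothetical. It also shows that a single class name matches a tag whose class attribute lists several names:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second span carries two space-separated classes
html = """
<span class="totalcount">10</span>
<span class="totalcount big">20</span>
<span class="other">30</span>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches
first = soup.find("span", attrs={"class": "totalcount"})

# find_all() returns every match; "totalcount" also matches class="totalcount big"
all_matches = soup.find_all("span", attrs={"class": "totalcount"})

missing = soup.find("span", attrs={"class": "nosuchclass"})

print(first.get_text())                      # 10
print([s.get_text() for s in all_matches])   # ['10', '20']
print(missing)                               # None
```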
You can refine your search to only find those divs with a given class using BS3:
mydivs = soup.findAll("div", {"class": "stylelistrow"})
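For comparison, a sketch of the same search in bs4, assuming bs4 4.1.2 or later: the dictionary form still works, and the class_ keyword (with a trailing underscore, because class is a Python keyword) is a shorthand for it. The markup is hypothetical:

```python
from bs4 import BeautifulSoup

html = '<div class="stylelistrow">a</div><div class="other">b</div><div class="stylelistrow">c</div>'
soup = BeautifulSoup(html, "html.parser")

# Dictionary form, as in the BS3 answer; bs4 still accepts it
by_dict = soup.find_all("div", {"class": "stylelistrow"})

# bs4 shorthand using the class_ keyword argument
by_keyword = soup.find_all("div", class_="stylelistrow")

print([d.get_text() for d in by_keyword])  # ['a', 'c']
```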
CSS selectors
single class first match
soup.select_one('.stylelistrow')
list of matches
soup.select('.stylelistrow')
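A quick sketch of how the two calls differ in what they return, assuming bs4 4.7.1 or later (where select is backed by soupsieve) and hypothetical markup. Note that a bare .stylelistrow selector matches any tag with that class, not just divs:

```python
from bs4 import BeautifulSoup

html = '<p class="stylelistrow">one</p><div class="stylelistrow">two</div>'
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match in document order, or None
first = soup.select_one(".stylelistrow")
# select() always returns a list, possibly empty
every = soup.select(".stylelistrow")
none_found = soup.select_one(".nosuchclass")

print(first.name, len(every), none_found)  # p 2 None
```

To restrict the match to divs, prefix the tag name: div.stylelistrow.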
compound class (i.e. AND another class)
soup.select_one('.stylelistrow.otherclassname')
soup.select('.stylelistrow.otherclassname')
Spaces in compound class names, e.g. class="stylelistrow otherclassname", are replaced with ".". You can continue to add classes.
list of classes (OR: match whichever is present)
soup.select_one('.stylelistrow, .otherclassname')
soup.select('.stylelistrow, .otherclassname')
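A sketch contrasting the AND and OR forms on the same hypothetical markup, assuming bs4 4.7.1+ with the "html.parser" backend. Results come back in document order, and an element matching both classes appears only once in the OR result:

```python
from bs4 import BeautifulSoup

html = """
<div class="stylelistrow otherclassname">both</div>
<div class="stylelistrow">first only</div>
<div class="otherclassname">second only</div>
<div class="unrelated">neither</div>
"""
soup = BeautifulSoup(html, "html.parser")

# AND: no space between the classes, so the element must carry both
both = soup.select(".stylelistrow.otherclassname")

# OR: a comma-separated selector group matches either class
either = soup.select(".stylelistrow, .otherclassname")

print([d.get_text() for d in both])    # ['both']
print([d.get_text() for d in either])  # ['both', 'first only', 'second only']
```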
bs4 4.7.1 +
Specific class whose innerText contains a string (newer soupsieve releases spell this pseudo-class :-soup-contains and deprecate :contains)
soup.select_one('.stylelistrow:contains("some string")')
soup.select('.stylelistrow:contains("some string")')
Specific class which has a certain child element, e.g. an a tag
soup.select_one('.stylelistrow:has(a)')
soup.select('.stylelistrow:has(a)')
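A sketch of both pseudo-classes on hypothetical markup. It assumes soupsieve 2.1 or later is installed alongside bs4, which is where the :-soup-contains spelling was introduced; older setups would use :contains instead. Note that :has(a) matches an a anywhere inside the element, not only a direct child:

```python
from bs4 import BeautifulSoup

html = """
<div class="stylelistrow">some string here</div>
<div class="stylelistrow"><a href="/x">link</a></div>
<div class="stylelistrow">nothing relevant</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Text match; assumes soupsieve 2.1+, where :-soup-contains replaces
# the deprecated :contains spelling
with_text = soup.select('.stylelistrow:-soup-contains("some string")')

# Element match: divs that have an <a> descendant
with_link = soup.select(".stylelistrow:has(a)")

print([d.get_text() for d in with_text])  # ['some string here']
print([d.get_text() for d in with_link])  # ['link']
```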
A straightforward way would be:
soup = BeautifulSoup(sdata)
for each_div in soup.findAll('div', {'class': 'stylelist'}):
    print(each_div)
Make sure you get the casing of findAll right: it's not findall.
How to find elements by class
I'm having trouble parsing html elements with "class" attribute using Beautifulsoup.
You can easily find by one class, but if you want to find by the intersection of two classes, it's a little more difficult.
From the documentation (emphasis added):
If you want to search for tags that match two or more CSS classes, you should use a CSS selector:
css_soup.select("p.strikeout.body") # [<p class="body strikeout"></p>]
To be clear, this selects only the p tags that have both the strikeout and the body class.
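The reason a selector is recommended: when class_ is given a string containing a space, bs4 compares it against the attribute's exact string value, so the class order has to match the markup. A sketch using the documentation's sample tag plus one extra hypothetical p, assuming the "html.parser" backend:

```python
from bs4 import BeautifulSoup

css_soup = BeautifulSoup(
    '<p class="body strikeout"></p><p class="body"></p>', "html.parser"
)

# CSS selector: matches tags having both classes, in any order
both = css_soup.select("p.strikeout.body")

# A class_ string with a space is compared to the exact attribute value,
# so only the order used in the markup matches
exact = css_soup.find_all("p", class_="body strikeout")
wrong_order = css_soup.find_all("p", class_="strikeout body")

print(len(both), len(exact), len(wrong_order))  # 1 1 0
```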
To find elements that match any class in a set (not the intersection, but the union), you can pass a list to the class_ keyword argument (as of 4.1.2):
soup = BeautifulSoup(sdata)
class_list = ["stylelistrow"] # can add any other classes to this list.
# will find any divs with any names in class_list:
mydivs = soup.find_all('div', class_=class_list)
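A runnable sketch of the union behaviour, assuming bs4 with the "html.parser" backend; the markup and the second class name are hypothetical. A div carrying both classes is returned once, not twice:

```python
from bs4 import BeautifulSoup

html = """
<div class="stylelistrow">a</div>
<div class="otherclassname">b</div>
<div class="stylelistrow otherclassname">c</div>
<div class="unrelated">d</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical second class added to show the union behaviour
class_list = ["stylelistrow", "otherclassname"]
mydivs = soup.find_all("div", class_=class_list)

print([d.get_text() for d in mydivs])  # ['a', 'b', 'c']
```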
Also note that findAll has been renamed from camelCase to the more Pythonic find_all.