There are two values that I am looking to scrape from a website. They are present in the following tags:
<span class="sp starBig">4.1</span>
&
<span class="sp starGryB">2.9</span>
There is probably a better way, but it is eluding me at present. It can be done with CSS selectors like this:
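import bs4

html = '''<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>
<span class="sp starBig">22</span>'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = bs4.BeautifulSoup(html, 'html.parser')
selectors = ['span.sp.starBig', 'span.sp.starGryB']
result = []
for s in selectors:
    # select() returns a list of matching tags; collect them all
    result.extend(soup.select(s))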
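To get the values themselves rather than the tags, the text of each match can be read with get_text(). A minimal sketch, assuming the same sample markup as above and the built-in html.parser; the variable name values is only illustrative:

```python
import bs4

# Sample markup copied from the snippet above
html = '''<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>
<span class="sp starBig">22</span>'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# Collect the matching tags, one selector at a time
result = []
for s in ['span.sp.starBig', 'span.sp.starGryB']:
    result.extend(soup.select(s))

# Pull the text out of each tag (hypothetical variable name)
values = [tag.get_text(strip=True) for tag in result]
print(values)
```

Note that the results are grouped by selector, so both starBig spans come before the starGryB one.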
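According to the docs (assuming Beautiful Soup 4), matching multiple CSS classes with a string like 'sp starGryB'
is brittle and should be avoided: the string is compared against the exact value of the class attribute, so the same classes in a different order do not match: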
soup.find_all('span', {'class': 'sp starGryB'})
# [<span class="sp starGryB">2.9</span>]
soup.find_all('span', {'class': 'starGryB sp'})
# []
CSS selectors should be used instead, like so:
soup.select('span.sp.starGryB')
# [<span class="sp starGryB">2.9</span>]
soup.select('span.starGryB.sp')
# [<span class="sp starGryB">2.9</span>]
In your case:
items = soup.select('span.sp.starGryB') + soup.select('span.sp.starBig')
or something more sophisticated like:
items = [i for s in ['span.sp.starGryB', 'span.sp.starBig'] for i in soup.select(s)]
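Another option that may be worth trying is a single comma-separated selector group, which matches a tag if any of the listed selectors matches it. This is only a sketch, assuming the sample markup from earlier in this thread and the built-in html.parser. As far as I know, recent Beautiful Soup releases (which delegate select() to the soupsieve package) return these matches in document order rather than grouped per selector, so don't rely on the ordering across versions:

```python
import bs4

# Sample markup from earlier in this thread
html = '''<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>
<span class="sp starBig">22</span>'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# One selector group: a span matches if either selector applies to it
items = soup.select('span.sp.starGryB, span.sp.starBig')

# Text content of every matched span
texts = [i.get_text() for i in items]
print(texts)
```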