Using findAll in BS4 to create list

Question

I'll start by saying I'm sort of new with Python. I've been working on a Slack bot recently and here's where I'm at so far.

source = requests.get(url).content
soup = BeautifulSoup(source, 'html.parser')
price = soup.findAll("a", {"class":"pricing"})["quantity"]

Here is the HTML code I am trying to scrape.

<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>

When I only use soup.find(), I'm able to find the first quantity value but I need all of them within a list. I looked into using a different library like lxml instead of bs4 but didn't have any luck with that either. Any help is really appreciated as I've already spent a long time on this.

Answer 1:

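The findAll method (spelled find_all in bs4, with findAll kept as an alias) returns a list-like ResultSet of bs4 Tag elements rather than a single Tag, so you can't index it by attribute name the way you can with the result of find. However, you can read the attribute from each item in that iterable with a simple list comprehension.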

price = [a.get("quantity") for a in soup.findAll("a", {"class":"pricing"})]

Note that it's best to use get when accessing attributes, because it returns None (or a default value you supply as the second argument) if the key does not exist in the tag's attrs dictionary, whereas a["quantity"] raises a KeyError.
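
To make the difference concrete, here is a minimal sketch. The markup is a hypothetical variation on the question's HTML in which the second tag has no quantity attribute, and the default "0" is only an illustrative choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second tag deliberately lacks "quantity".
html = """
<a class="pricing" saleprice="240.00" quantity="1"> S </a>
<a class="pricing" saleprice="21.00"> M </a>
"""
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a", {"class": "pricing"})

# get() returns None when the attribute is missing...
quantities = [a.get("quantity") for a in tags]

# ...or the default passed as the second argument ("0" is an arbitrary choice).
with_default = [a.get("quantity", "0") for a in tags]

# Indexing the tag directly raises KeyError for the missing attribute.
try:
    tags[1]["quantity"]
    raised = False
except KeyError:
    raised = True
```

With this input, quantities holds "1" followed by None, and with_default holds "1" followed by "0".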

As pointed out by Jon Clements, you can filter on both 'class' and 'quantity' so that tags without a 'quantity' attribute are skipped and your list never contains None items.

price = [a["quantity"] for a in soup.find_all("a", {"class":"pricing", "quantity":True})]
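
A self-contained sketch of that filter is below. It reuses the question's HTML, except that the quantity attribute has been removed from the second tag (an assumption made here only to show the filter skipping it); the names matched and sizes are illustrative.

```python
from bs4 import BeautifulSoup

# The question's HTML, with "quantity" removed from the second tag
# (an assumption for illustration) so the filter has something to skip.
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
"""
soup = BeautifulSoup(html, "html.parser")

# "quantity": True matches only tags that have the attribute, whatever its value.
matched = soup.find_all("a", {"class": "pricing", "quantity": True})

# Indexing is safe here because every matched tag has "quantity".
price = [a["quantity"] for a in matched]

# The text of the same tags, to show which ones were kept.
sizes = [a.get_text(strip=True) for a in matched]
```

Here only the S and L tags are matched. Attribute values come back as strings, so wrap them in int() if you need numbers.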

Source: https://stackoverflow.com/questions/45410774/using-findall-in-bs4-to-create-list

Tags

python

beautifulsoup

bs4