Question
I'll start by saying I'm sort of new with Python. I've been working on a Slack bot recently and here's where I'm at so far.
source = requests.get(url).content
soup = BeautifulSoup(source, 'html.parser')
price = soup.findAll("a", {"class":"pricing"})["quantity"]
Here is the HTML code I am trying to scrape.
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
When I only use soup.find(), I'm able to find the first quantity value, but I need all of them in a list. I looked into using a different library like lxml instead of bs4 but didn't have any luck with that either. Any help is really appreciated as I've already spent a long time on this.
Answer 1:
The findAll method returns a list-like ResultSet of bs4 Tag elements, so you can't look up an attribute such as "quantity" on the result directly. However, you can read the attribute from each item in that iterable with a simple list comprehension.
price = [a.get("quantity") for a in soup.findAll("a", {"class":"pricing"})]
Note that it's best to use get when accessing attributes, because it returns None (or a default value you supply) if the key does not exist in the tag's attrs dictionary.
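As a minimal sketch of that difference, assuming a small hard-coded HTML snippet in place of the requests call (the second tag deliberately lacks a quantity attribute, and the "0" default is only an illustrative choice):

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second tag has no "quantity" attribute.
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" added="2017-03-14"> M </a>
"""

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a", {"class": "pricing"})

# get() returns None when the attribute is missing...
with_none = [a.get("quantity") for a in tags]

# ...or the default passed as the second argument.
with_default = [a.get("quantity", "0") for a in tags]

print(with_none)
print(with_default)
```

Indexing with a["quantity"] instead would raise a KeyError on the second tag, which is why get is the safer choice when the attribute may be absent.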
As pointed out by Jon Clements, you could filter by both 'class' and 'quantity' if you don't want your list to contain None items in case some tags have no 'quantity' attribute.
price = [a["quantity"] for a in soup.find_all("a", {"class":"pricing", "quantity":True})]
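For reference, here is a self-contained sketch of the filtered version run against the HTML from the question. The fourth tag without a quantity attribute and the conversion to int are additions for illustration, not part of the original post.

```python
from bs4 import BeautifulSoup

# HTML from the question, plus one hypothetical tag without "quantity".
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
<a class="pricing" saleprice="99.00" added="2017-07-01"> XL </a>
"""

soup = BeautifulSoup(html, "html.parser")

# "quantity": True matches only tags that have the attribute at all,
# so indexing with a["quantity"] is safe here.
quantities = [
    a["quantity"]
    for a in soup.find_all("a", {"class": "pricing", "quantity": True})
]

# Attribute values are strings; convert them if numbers are needed.
quantities_as_int = [int(q) for q in quantities]

print(quantities)
print(quantities_as_int)
```

find_all is the current name of the method in bs4; findAll is the older spelling of the same method and behaves the same way.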
Source: https://stackoverflow.com/questions/45410774/using-findall-in-bs4-to-create-list