Question
I am using Python 3.8.2 and BeautifulSoup (bs4). I am trying to find all instances of a tag and have each one listed in the result set, one per row. However, the result set that is returned contains more lines than the original scrape of the website. This is because the first row of the result set contains all instances of the tag. The following row contains all instances except the first, the third contains all instances except the first and the second, and so on through the remainder of the result set.
Here is the code:
from bs4 import BeautifulSoup
import requests
url = "https://www.sainsburys.co.uk/shop/gb/groceries/drinks/seeall"
html_content = requests.get(url, timeout=5)
soup = BeautifulSoup(html_content.text)
test_1 = soup.find('ul',{"class": "productLister gridView"})
test = test_1.find_all("li", attrs={"class": "gridItem"})
How do I get it so that each instance of <li class="gridItem"> is listed by itself, one per row?
Thanks
Answer 1:
The website relies on JavaScript, which renders its data dynamically once the page loads. The requests library cannot run JavaScript, so you could use selenium or requests_html instead; there are plenty of other modules that can do this too.
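To illustrate why the plain scrape comes back without the products, here is a minimal sketch using only the standard library. The HTML string and the loadProducts() call are made up for the example; they stand in for whatever requests.get(...).text would return from a page that fills its list with JavaScript.

```python
from html.parser import HTMLParser

# Hypothetical static HTML: the list is empty until a browser runs the script.
STATIC_HTML = """
<ul class="productLister gridView"></ul>
<script>
  // In a browser this call would add the list items.
  loadProducts();
</script>
"""


class TagCounter(HTMLParser):
    """Count start tags by name. Script bodies are read as text, never run."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


counter = TagCounter()
counter.feed(STATIC_HTML)

# The <ul> is present, but no <li> exists because nothing executed the script.
print(counter.counts.get("ul", 0), counter.counts.get("li", 0))
```

A parser only sees the markup the server sent, so any items added later by a script are simply not there.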
There is another option on the table: tracking where the data is rendered from. I was able to locate the XHR request that retrieves the data from the back-end API and renders it on the user's side.
You can find the XHR request by opening Developer Tools, going to the Network tab, and checking the XHR/JS requests made, depending on the type of call, such as fetch.
Note the following:
- The website holds 3068 items.
- I've increased the items per page to 120 using the parameter "pageSize": "120".
- So 3068 / 120 = let's say 26 (rounded up), which means 120 items per page for 26 pages.
- So you will need to loop over range(0, 3120, 120), which means 0 > 120 > 240 and so on, using the parameter "beginIndex": "0", which you increment in a for loop.
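The paging arithmetic in the notes above can be sketched like this. The helper names page_offsets and page_params are made up for illustration, and the item count is just the 3068 quoted above, so treat it as a rough sketch rather than a drop-in solution.

```python
import math

TOTAL_ITEMS = 3068  # item count quoted in the notes above
PAGE_SIZE = 120     # value sent as "pageSize"


def page_offsets(total_items, page_size):
    """Return the "beginIndex" value for every page (hypothetical helper)."""
    pages = math.ceil(total_items / page_size)
    return list(range(0, pages * page_size, page_size))


def page_params(base_params, total_items, page_size):
    """Yield one copy of base_params per page with the paging keys set."""
    for offset in page_offsets(total_items, page_size):
        page = dict(base_params)  # copy so the caller's dict is not mutated
        page["pageSize"] = str(page_size)
        page["beginIndex"] = str(offset)
        yield page


offsets = page_offsets(TOTAL_ITEMS, PAGE_SIZE)
print(len(offsets), offsets[:3], offsets[-1])  # 26 [0, 120, 240] 3000
```

Each dict produced by page_params could then be passed as params= in one request per page.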
The code below should get you there. You didn't tell us your end goal, but I believe your target is the name or price (or url, img), or whatever else; you will find it.
import requests
from bs4 import BeautifulSoup

params = {
    "langId": "44",
    "storeId": "10151",
    "catalogId": "10241",
    "categoryId": "12192",
    "parent_category_rn": "",
    "top_category": "12192",
    "pageSize": "120",
    "orderBy": "FAVOURITES_FIRST",
    "searchTerm": "",
    "catSeeAll": "true",
    "beginIndex": "0",
    "categoryFacetId1": "12192",
    "categoryFacetId2": "",
    "requesttype": "ajax"
}


def main(url):
    with requests.Session() as req:
        # The endpoint answers with JSON; each product holds an HTML fragment
        r = req.post(url, params=params).json()
        for item in r[5]['productLists']:
            for nest in item['products']:
                soup = BeautifulSoup(nest['result'], 'html.parser')
                target = soup.find("div", class_="productNameAndPromotions")
                name = target.h3.a.text.strip()
                # Named link so the url argument is not overwritten
                link = target.h3.a.get("href")
                # src is assumed to lack the scheme prefix
                img = "https" + target.h3.a.img.get("src")
                price = soup.find(
                    "p", class_="pricePerUnit").get_text(strip=True)
                print(name, price, img, link)


main("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/gb/groceries/drinks/AjaxApplyFilterSearchResultView")
Brief output for name and price:
Sainsbury's British Semi Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's British Whole Milk 2.27L (4 pint) £1.10/unit
Cravendale Purefilter Semi Skimmed Milk 2L £1.90/unit
Sainsbury's British Skimmed Milk 2.27L (4 pint) £1.10/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 2.27L (4 pint) £1.80/unit
Sainsbury's Sparkling Water, Basics 2L 25p/unit
Sainsbury's British Skimmed Milk 1.13L (2 pint) 80p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's Water, Basics 2L 25p/unit
Sainsbury's British Whole Milk 1.13L (2 pint) 80p/unit
Sainsbury's Smooth Pure Orange Juice 1L 95p/unit
Pepsi Max 2L £1.90/unit
Sainsbury's Caledonian Still Water 4x2L £1.50/unit
Highland Spring Still Water 12x500ml £3.00/unit
Sainsbury's 100% Pressed Apple Juice, Not From Concentrate 1L £1.30/unit
Sainsbury's British Whole Milk, SO Organic 2.27L (4 Pint) £1.80/unit
Lactofree Semi Skimmed Lactose Free Fresh Dairy Drink 1L £1.50/unit
Diet Coke 8x330ml £4.00/unit
Alpro Roasted Almond Unsweetened UHT Drink 1L £1.80/unit
Robinsons Orange Squash No Added Sugar 1L £1.65/unit
Sainsbury's Soda Water 1L 60p/unit
Sainsbury's Caledonian Sparkling Water 4x2L £1.60/unit
Tropicana Smooth Orange Juice 950ml £2.45/unit
Sainsbury's Diet Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Apple Juice 1L 95p/unit
Robinsons Apple & Blackcurrant Squash No Added Sugar 1L £1.65/unit
Sainsbury's Sparkling Flavoured Water, Lemon & Lime 1L 50p/unit
Sainsbury's Conegliano Prosecco, Taste the Difference 75cl £8.00/unit
Sainsbury's Unsweetened Soya Drink 1L 90p/unit
Sainsbury's British Semi Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Caledonian Sparkling Water 6x500ml £1.50/unit
Sainsbury's Apple & Blackcurrant Squash, No Added Sugar 1.5L £1.00/unit
Highland Spring Still Water 6x1.5L £3.00/unit
Alpro Roasted Almond Unsweetened Fresh Drink 1L £1.85/unit
Sainsbury's Semi Skimmed Long Life Milk 1L 90p/unit
Tropicana Smooth Orange Juice 1.6L £2.50/unit
Sainsbury's 100% Pure Squeezed Orange Juice with Bits, Not From Concentrate 1L £1.30/unit
Cravendale Purefilter Semi Skimmed Milk 1L £1.15/unit
Sainsbury's Caledonian Still Water Sports Cap 6x500ml £1.50/unit
Sainsbury's Double Strength Orange Squash, No Added Sugar 1.5L £1.00/unit
Diet Coke 18x330ml £7.00/unit
Sainsbury's Indian Tonic Water 1L 60p/unit
Sainsbury's Pure Orange Juice 1L 85p/unit
Sainsbury's Pure Apple Juice 6x200ml £1.50/unit
Buxton Still Natural Mineral Water 8x500ml £2.00/unit
Sainsbury's Whole Long Life Milk 1L £1.05/unit
Cravendale Purefilter Skimmed Milk 2L £1.90/unit
Sainsbury's Sparkling Flavoured Water, Blackcurrant & Cherry 1L 50p/unit
Innocent Smooth Orange Juice 1.35L £3.00/unit
Alpro Original Soya Fresh Drink 1L £1.55/unit
Sainsbury's Still Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's British Filtered Semi Skimmed Milk 2L £1.35/unit
Sainsbury's Sparkling Flavoured Water, Mango & Passionfruit 1L 50p/unit
Sainsbury's Caledonian Still Water 5L £1.10/unit
McGuigan Estate Merlot 75cl £5.10/unit
Schweppes Slimline Tonic Water 1L £1.50/unit
PG tips Pyramid Tea Bags x240 696g £4.50/unit
Sainsbury's Sparkling Flavoured Water, Strawberry & Kiwi 1L 50p/unit
Sainsbury's Caledonian Sparkling Water 2L 55p/unit
Sainsbury's Sweetened Soya Drink 1L 90p/unit
Sainsbury's 100% Pure Squeezed Smooth Orange Juice, Not From Concentrate 1.75L £2.10/unit
Sainsbury's Diet Lemonade 2L 60p/unit
Sainsbury's Apple & Mango Juice, Not From Concentrate 1L £1.30/unit
Robinsons Summer Fruits Squash No Added Sugar 1L £1.65/unit
Sainsbury's 100% Pure Squeezed Pineapple Juice, Not From Concentrate 1L £1.30/unit
Clearsprings Sauvignon Blanc 75cl £5.50/unit
Phantom River Sauvignon Blanc 75cl £5.00/unit
Nestle Pure Life Still Spring Water 12x500ml £2.50/unit
Buxton Sparkling Natural Mineral Water 8x500ml £2.10/unit
Brancott Estate Sauvignon Blanc 75cl £6.75/unit
Schweppes Slimline Lemonade 2L £1.30/unit
McGuigan Estate South Australian Shiraz 75cl £5.10/unit
Coca-Cola Zero Sugar 8x330ml £4.00/unit
Villa Maria Private Bin Sauvignon Blanc 75cl £9.25/unit
Diet Coke Caffeine Free 8x330ml £4.00/unit
Sainsbury's British Skimmed Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Kids Caledonian Still Water 6x300ml £1.10/unit
Canti Prosecco 75cl £7.50/unit
Oatly Enriched with Calcium Oat UHT Drink 1L £1.50/unit
Sainsbury's Pure Orange Juice 6x200ml £1.50/unit
Sainsbury's Still Flavoured Water, Lemon & Lime 1L 50p/unit
Valdo Prosecco Marca Oro 75cl £8.50/unit
Oyster Bay Sauvignon Blanc 75cl £8.00/unit
Ribena Blackcurrant Squash 850ml £2.30/unit
Volvic Mineral Water 6x1.5L £3.40/unit
Campo Viejo Rioja Tempranillo 75cl £6.75/unit
Nescafé Azera Americano Instant Coffee 100g £4.60/unit
Tropicana Orange Juice Original 950ml £2.45/unit
Sainsbury's Double Strength Orange & Mango Squash, No Added Sugar 1.5L £1.00/unit
Robinsons Lemon Squash No Added Sugar 1L £1.65/unit
Schweppes Lemonade 2L £1.30/unit
Robinsons Orange & Pineapple Squash No Added Sugar 1L £1.65/unit
Sainsbury's Diet Indian Tonic with Lime 1L 60p/unit
St Helen's Farm Semi Skimmed Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Orange, Lemon & Pineapple Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Double Strength Summerfruits Squash, No Added Sugar 1.5L £1.00/unit
Alpro Oat UHT Drink 1L £1.80/unit
Innocent Smooth Orange Juice 900ml £1.50/unit
Sainsbury's British Whole Milk, SO Organic 1.13L (2 pint) £1.15/unit
Sainsbury's Skimmed Long Life Milk 1L 80p/unit
Nescafé Gold Blend Instant Coffee 200g £7.00/unit
Highland Spring Still Water Sports Cap 12x330ml £3.00/unit
Sainsbury's Cava Brut 75cl £6.00/unit
Alpro Light Unsweetened Soya Fresh Drink 1L £1.55/unit
Sainsbury's Caledonian Still Water 2L 50p/unit
Koko Coconut UHT Drink 1L £1.50/unit
Sainsbury's House Pinot Grigio 75cl £4.50/unit
Sainsbury's Cola Zero 2L 45p/unit
St Helen's Farm Whole Goats Milk 1L £1.80/unit
Sainsbury's Double Strength Cherries & Berries Squash, No Added Sugar 1.5L £1.00/unit
Sainsbury's Lemonade 2L 60p/unit
Sainsbury's Pure Orange Juice With Bits 1L 85p/unit
Sainsbury's Pinot Grigio, Taste the Difference 75cl £6.00/unit
Schweppes Tonic Water 1L £1.50/unit
Sainsbury's Cranberry Juice Drink 1L 85p/unit
Nescafé Gold Blend Instant Coffee Refill 150g £3.50/unit
Sainsbury's Gold Roast Instant Coffee 200g £3.15/unit
Sainsbury's Pure Orange Juice with Juicy Bits 1L 95p/unit
Edizione 789 Di Mondelli Prosecco 75cl £6.25/unit
Source: https://stackoverflow.com/questions/61044616/using-find-all-function-returns-an-unexpected-result-set