How to extract img src from a web page via lxml in BeautifulSoup using Python?

I am new to Python and am working on a web scraping project on Amazon. How can I extract the product image src from a product page using BeautifulSoup with the lxml parser?

2 answers

無奈伤痛

2021-01-22 14:05

The image you want from that page is available in the value of the data-a-dynamic-image attribute. That value holds several URLs for the same image at different sizes. All you need to do now is add a conditional statement that isolates the URL containing 395.

import json
import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/crocs-Unisex-Classic-Black-Women/dp/B0014C0LSY/ref=sr_1_2?_encoding=UTF8&qid=1560091629&s=fashion-womens-intl-ship&sr=1-2&th=1&psc=1'

# Send a browser-like User-Agent with the request.
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
s = BeautifulSoup(r.text, "lxml")

# data-a-dynamic-image holds a JSON object whose keys are the image URLs.
img = s.find(id="landingImage")['data-a-dynamic-image']
img = json.loads(img)
for k, v in img.items():
    # Print only the URL that contains 395.
    if '395' in k:
        print(k)

Output:

https://images-na.ssl-images-amazon.com/images/I/71oNMAAC7sL._UX395_.jpg

If that size is not the one you need, print all of the URLs and pick the one that suits you:

for k,v in img.items():
    print(k)
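
As a rough alternative to matching on '395', the URL could be chosen by size instead. This is only a sketch: it assumes each value in the decoded data-a-dynamic-image object is a [width, height] pair, which the answer above does not confirm. largest_image is a hypothetical helper, and the sample JSON is made up for illustration.

```python
import json


def largest_image(dynamic_image_json):
    """Return the URL whose [width, height] value has the largest area.

    Assumes the attribute is a JSON object mapping URL -> [width, height].
    Returns None if the object is empty.
    """
    sizes = json.loads(dynamic_image_json)
    if not sizes:
        return None
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])


# Made-up sample standing in for s.find(id="landingImage")['data-a-dynamic-image'].
sample = json.dumps({
    "https://example.com/img._UX395_.jpg": [395, 395],
    "https://example.com/img._UX679_.jpg": [679, 679],
    "https://example.com/img._UX522_.jpg": [522, 522],
})
print(largest_image(sample))
```

With the made-up sample this prints the _UX679_ URL. On a real page the string passed in would be the attribute value read in the answer's code.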

走了就别回头了

2021-01-22 14:08

What you are seeing in src is the image itself, base64-encoded inline rather than given as a link. What you do with it depends on what you are doing with the image URLs.
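
If the goal is the image bytes rather than a URL, the base64 value can be decoded directly. A minimal sketch, assuming the src value is a data URI of the form data:<mime>;base64,<payload>. decode_data_uri is a hypothetical helper and sample_src is made up for illustration.

```python
import base64


def decode_data_uri(src):
    """Split a base64 data URI into (mime_type, raw_bytes).

    Returns None when src is not a base64 data URI (e.g. a normal URL).
    """
    if not src.startswith("data:"):
        return None
    header, sep, payload = src.partition(",")
    if not sep or not header.endswith(";base64"):
        return None
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Made-up stand-in for a src value; the payload is just the bytes b"GIF89a".
sample_src = "data:image/gif;base64," + base64.b64encode(b"GIF89a").decode("ascii")
print(decode_data_uri(sample_src))                   # ('image/gif', b'GIF89a')
print(decode_data_uri("https://example.com/a.jpg"))  # None
```

The returned bytes could then be written to a file opened in binary mode, while an ordinary URL (the None case) would still need to be downloaded.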
