I am new to Python and am working on a web scraping project for Amazon. I have a problem extracting the product image src from a product page using BeautifulSoup with the lxml parser.
The image you want to grab from that page is available in the value of the data-a-dynamic-image attribute. That value is a JSON object containing multiple images of different sizes. All you need to do now is add a conditional statement to isolate the image whose URL contains 395.
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.amazon.com/crocs-Unisex-Classic-Black-Women/dp/B0014C0LSY/ref=sr_1_2?_encoding=UTF8&qid=1560091629&s=fashion-womens-intl-ship&sr=1-2&th=1&psc=1'
r = requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
s = BeautifulSoup(r.text, "lxml")
img = s.find(id="landingImage")['data-a-dynamic-image']
img = json.loads(img)
for k, v in img.items():
    if '395' in k:
        print(k)
Output:
https://images-na.ssl-images-amazon.com/images/I/71oNMAAC7sL._UX395_.jpg
In that case, try it like this and pick the one that suits your needs:
for k, v in img.items():
    print(k)
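Rather than reading through the printed list by eye, the choice could also be made by size. The following is only a minimal sketch: it assumes each value in the decoded JSON is a [width, height] pair, which the code above does not confirm, and the helper name pick_largest and the example.com URLs are hypothetical placeholders, not real Amazon data.

```python
import json


def pick_largest(dynamic_image_json):
    """Return the URL whose [width, height] pair covers the largest area."""
    sizes = json.loads(dynamic_image_json)
    # Assumption: every value is a two-item [width, height] list.
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])


# Hypothetical attribute value, shaped like the data-a-dynamic-image string.
sample = json.dumps({
    "https://example.com/images/I/abc._UX395_.jpg": [395, 395],
    "https://example.com/images/I/abc._UX679_.jpg": [679, 679],
    "https://example.com/images/I/abc._UX342_.jpg": [342, 342],
})

largest_url = pick_largest(sample)
print(largest_url)
```

With the real page, the string passed in would be the raw attribute value, before json.loads is applied to it.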
What you are seeing there is the base64 encoding of the image. What you do with it depends on what you're doing with image URLs.
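If the src value really is a base64 data URI, one option is to decode it into raw bytes. This is a rough sketch under that assumption; decode_data_uri is a hypothetical helper, and the sample string is a made-up payload rather than an actual image.

```python
import base64


def decode_data_uri(src):
    """Split a base64 data URI into (mime_type, raw_bytes).

    Returns None when src is an ordinary URL or is not base64-encoded.
    """
    if not src.startswith("data:"):
        return None
    header, _, payload = src.partition(",")
    if not header.endswith(";base64"):
        return None
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Made-up payload for illustration; a real src would carry image bytes.
sample_src = "data:image/jpeg;base64,aGVsbG8="
result = decode_data_uri(sample_src)
print(result)
```

The returned bytes could then be written to a file opened in binary mode, while a None result signals that src is a normal URL to download instead.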