How to extract an img src from a web page via lxml in BeautifulSoup using Python?

梦如初夏 2021-01-22 13:20

I am new to Python and am working on a web scraping project against Amazon. How can I extract a product image's src from the product page using BeautifulSoup with the lxml parser?

2 Answers
  • 2021-01-22 14:05

    The image you want from that page is available in the data-a-dynamic-image attribute of the element with id landingImage. Its value is a JSON object whose keys are URLs of the same image at different sizes. All you need to do now is add a conditional statement that picks out the URL containing 395.

    import json
    import requests
    from bs4 import BeautifulSoup
    
    url = 'https://www.amazon.com/crocs-Unisex-Classic-Black-Women/dp/B0014C0LSY/ref=sr_1_2?_encoding=UTF8&qid=1560091629&s=fashion-womens-intl-ship&sr=1-2&th=1&psc=1'
    
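    # Send a browser-like User-Agent; the default one is often blocked
    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    s = BeautifulSoup(r.text, "lxml")
    # The attribute value is a JSON string whose keys are image URLs
    img = s.find(id="landingImage")['data-a-dynamic-image']
    img = json.loads(img)
    for k in img:
        if '395' in k:
            print(k)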
    

    Output:

    https://images-na.ssl-images-amazon.com/images/I/71oNMAAC7sL._UX395_.jpg
    

    If none of the URLs contains 395, print them all and pick the one that suits your needs:

    for k,v in img.items():
        print(k)
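
    Instead of matching on the substring 395, the size can also be chosen programmatically. The following is a minimal sketch, assuming the attribute value is a JSON object that maps each URL to a [width, height] pair; the helper name pick_largest and the example.com URLs are hypothetical and are used only for illustration.

```python
import json


def pick_largest(raw_attr):
    """Return the URL with the largest pixel area.

    raw_attr is the string value of data-a-dynamic-image.
    Assumed shape (not confirmed by the original answer):
    {"<url>": [width, height], ...}
    """
    sizes = json.loads(raw_attr)
    return max(sizes, key=lambda url: sizes[url][0] * sizes[url][1])


# Hypothetical attribute value, shaped like the one parsed above.
sample = json.dumps({
    "https://example.com/img._UX395_.jpg": [395, 282],
    "https://example.com/img._UX695_.jpg": [695, 496],
    "https://example.com/img._UX535_.jpg": [535, 382],
})

largest = pick_largest(sample)
print(largest)
```

    In the scraper above, the string to pass in would be the value read from s.find(id="landingImage")['data-a-dynamic-image'] before it is handed to json.loads.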
    
  • 2021-01-22 14:08

    What you are seeing there is the base64 encoding of the image. What you do with it depends on what you're doing with image URLs.
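
    If the src value really is a base64 data URI, the image bytes can be recovered with the standard library alone. This is a rough sketch, assuming the value has the form data:<mime type>;base64,<payload>; the helper name decode_data_uri and the sample value are hypothetical.

```python
import base64


def decode_data_uri(src):
    """Split a base64 data URI into (mime_type, raw_bytes).

    Returns None for ordinary URLs and for data URIs that are
    not base64-encoded, which this sketch does not handle.
    """
    if not src.startswith("data:"):
        return None
    header, _, payload = src.partition(",")
    if not header.endswith(";base64"):
        return None
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Hypothetical src value: the 8-byte PNG signature, base64-encoded.
png_signature = b"\x89PNG\r\n\x1a\n"
sample_src = "data:image/png;base64," + base64.b64encode(png_signature).decode("ascii")

result = decode_data_uri(sample_src)
print(result)
```

    The returned bytes can be written to a file opened in binary mode. A None result means the src is a regular URL that should be downloaded instead.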
