How to handle lazy-loaded images in Selenium?

灰色年华 2021-01-27 14:26

Before marking this as a duplicate, please note that I have already looked through many related Stack Overflow posts, as well as other websites and articles, and have not found a solution.

1 Answer
  • 2021-01-27 14:32

    Your images are lazy-loaded: they only load once they are scrolled into view. This is such a common requirement that the Selenium Python docs cover it in their FAQ. Adapting this answer, the script below scrolls down the page before scraping the images.

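        import time

        from selenium import webdriver
        from selenium.webdriver.common.by import By

        driver = webdriver.Chrome()  # assumes Chrome; any WebDriver works the same way
        driver.get("https://www.grailed.com/categories/footwear")

        SCROLL_PAUSE_TIME = 0.5  # seconds to wait after each scroll for new content
        i = 0  # number of scrolls that made the page grow
        last_height = driver.execute_script("return document.body.scrollHeight")
        while True:
            # Jump to the bottom so the next batch of lazy-loaded images is triggered
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(SCROLL_PAUSE_TIME)
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break  # the page stopped growing: nothing more to load
            last_height = new_height
            i += 1
            if i == 5:
                break  # cap the number of scrolls

        # find_elements polls for up to 10 s while no matching element exists yet
        driver.implicitly_wait(10)
        shoe_images = driver.find_elements(By.CSS_SELECTOR, 'div.listing-cover-photo img')

        print(len(shoe_images))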
    

    To avoid scrolling through shoes (seemingly) forever, I added a break after 5 iterations. You're free to remove the i variable, in which case the script keeps scrolling until the page height stops growing.
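
    If you want to switch between the capped and uncapped behaviour without editing the loop, the same logic can be wrapped in a function. This is a minimal sketch: scroll_to_bottom and its pause/max_scrolls parameters are hypothetical names, and passing max_scrolls=None is the equivalent of removing the i variable. It only calls driver.execute_script, so it works with any Selenium WebDriver.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_scrolls=5):
    """Scroll until the page height stops growing or max_scrolls is reached.

    max_scrolls=None removes the cap. Returns how many scrolls grew the page.
    """
    scrolls = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while max_scrolls is None or scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended to the page
        last_height = new_height
        scrolls += 1
    return scrolls
```

    For example, scroll_to_bottom(driver, max_scrolls=None) keeps going until the listing runs out, while the default stops after 5 scrolls just like the loop above.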

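    The implicit wait makes find_elements keep polling for up to 10 seconds while no matching element is present yet. It returns as soon as at least one match exists, so it does not wait for every image to finish loading.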
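
    If you need to know that the images have actually finished loading, one option is to poll a small piece of JavaScript that checks each img element's complete and naturalWidth properties. This is a sketch under assumptions: wait_for_images and ALL_LOADED_JS are hypothetical names, the default selector is the one from the answer above, and it only checks that whatever src is currently set has loaded. A lazy-loading placeholder image that has loaded would also count. Selenium's own WebDriverWait(driver, 10).until(...) does the same kind of polling and can be swapped in. The plain loop just keeps the sketch free of extra imports.

```python
import time

# Evaluated in the page: true once every matching <img> has finished loading
ALL_LOADED_JS = """
const imgs = document.querySelectorAll(arguments[0]);
return imgs.length > 0 &&
    Array.from(imgs).every(img => img.complete && img.naturalWidth > 0);
"""


def wait_for_images(driver, selector="div.listing-cover-photo img",
                    timeout=10.0, poll=0.5):
    """Poll until all matching images have loaded; True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(ALL_LOADED_JS, selector):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```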

    A test run yielded 82 images. I confirmed that it had scraped every image on the page by using Chrome's DevTools selector, which highlighted 82 matches. You'll see a different number depending on how many images you allow to load.
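
    To repeat that check from the script itself, the selector can be evaluated in the page with document.querySelectorAll. The snippet below is a minimal sketch. image_counts and COUNT_JS are hypothetical names, and the default selector assumes Grailed's markup still matches the one used above. It returns how many matching img elements exist and how many of them have finished loading, which shows how far the lazy loading got.

```python
# Evaluated in the page: [number of matching <img> tags, number fully loaded]
COUNT_JS = """
const imgs = document.querySelectorAll(arguments[0]);
let loaded = 0;
for (const img of imgs) {
    if (img.complete && img.naturalWidth > 0) loaded++;
}
return [imgs.length, loaded];
"""


def image_counts(driver, selector="div.listing-cover-photo img"):
    """Return (matched, loaded) image counts for the given CSS selector."""
    total, loaded = driver.execute_script(COUNT_JS, selector)
    return total, loaded
```

    If matched equals len(shoe_images) but loaded is lower, the elements were found but some images were still downloading when the script checked.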
