Scroll down to bottom of infinite page with PhantomJS in Python

南旧 2020-11-27 15:47

I have succeeded in getting Python with Selenium and PhantomJS to reload a dynamically loading infinite scrolling page, like in the example below. But how could this be modi

1 answer
  • 2020-11-27 16:04

    After each scroll, you can check whether the page height actually changed.

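    import time

    pause = 2  # seconds to wait after each scroll (assumed value; tune for your page)

    lastHeight = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Jump to the current bottom to trigger loading of more content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        newHeight = driver.execute_script("return document.body.scrollHeight")
        if newHeight == lastHeight:
            break  # height unchanged: nothing more was loaded
        lastHeight = newHeight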
    

    This uses a fixed wait, which is a poor fit: you wait longer than necessary when the content loads quickly, and the script exits prematurely when the dynamic load is slower than the pause.
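
    One way to soften the fixed-wait problem is to poll the page height with a deadline instead of sleeping once. This is only a sketch: the names `wait_for_growth` and `scroll_to_bottom`, and the `timeout` and `poll` defaults, are illustrative assumptions, and `driver` is anything with an `execute_script` method.

```python
import time

HEIGHT_JS = "return document.body.scrollHeight"
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"


def wait_for_growth(driver, last_height, timeout=20.0, poll=0.5):
    """Return the new height once it exceeds last_height, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        height = driver.execute_script(HEIGHT_JS)
        if height > last_height:
            return height  # content arrived: stop waiting immediately
        if time.monotonic() >= deadline:
            return None  # nothing new within the timeout
        time.sleep(poll)


def scroll_to_bottom(driver, timeout=20.0, poll=0.5):
    """Scroll until the page height stops growing; return the final height."""
    last_height = driver.execute_script(HEIGHT_JS)
    while True:
        driver.execute_script(SCROLL_JS)
        new_height = wait_for_growth(driver, last_height, timeout, poll)
        if new_height is None:
            return last_height
        last_height = new_height
```

    A fast load returns as soon as the height grows, while a slow load gets up to `timeout` seconds before the loop gives up.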

    Since such a page usually appends more elements to a list, you can record the length of the list before scrolling and wait until the next element appears.

    For Twitter, this could look like this:

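    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    while True:
        # Count the items currently in the list.
        elemsCount = browser.execute_script("return document.querySelectorAll('.stream-items > li.stream-item').length")

        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        try:
            # Wait up to 20 s for the item after the last known one to appear.
            WebDriverWait(browser, 20).until(
                lambda x: x.find_element_by_xpath(
                    "//*[contains(@class,'stream-items')]/li[contains(@class,'stream-item')]["+str(elemsCount+1)+"]"))
        except TimeoutException:
            break  # no new item within the timeout: assume the end was reached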
    

    I used an XPath expression because PhantomJS 1.x sometimes has a bug when using :nth-child() CSS selectors.
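
    To make the positional XPath easier to read, the string concatenation can be pulled into a small helper. This is a sketch only: `nth_item_xpath` is a hypothetical name, not part of Selenium. XPath positions are 1-based, so `elems_count + 1` addresses the first item that did not exist before the scroll. Note that `contains(@class, ...)` is a substring match, so it is looser than the CSS class selector used for counting.

```python
def nth_item_xpath(list_class, item_class, n):
    """Build an XPath for the n-th (1-based) matching <li> child of the list container."""
    return (
        "//*[contains(@class,'{0}')]/li[contains(@class,'{1}')][{2}]"
        .format(list_class, item_class, n)
    )


elems_count = 20  # example value; in practice this comes from the page
print(nth_item_xpath("stream-items", "stream-item", elems_count + 1))
# → //*[contains(@class,'stream-items')]/li[contains(@class,'stream-item')][21]
```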

    Full version for reference.
