I have succeeded in getting Python with Selenium and PhantomJS to reload a dynamically loading infinite scrolling page, like in the example below. But how could this be modi
You can check after every step whether the scroll actually did anything, i.e. whether the page height changed:
```python
import time

# pause: seconds to wait after each scroll for new content to load
lastHeight = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)
    newHeight = driver.execute_script("return document.body.scrollHeight")
    if newHeight == lastHeight:
        break  # the page did not grow, so nothing more was loaded
    lastHeight = newHeight
```
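The same height check, packaged as a reusable function, might look like the sketch below. It assumes only that `driver` has an `execute_script` method, as Selenium's WebDriver does. The function name, the `max_scrolls` cap (so the loop cannot run forever on a truly endless feed) and the injectable `sleep` parameter (so it can be exercised without a browser) are hypothetical additions, not part of the original answer.

```python
import time

def scroll_to_bottom(driver, pause=2.0, max_scrolls=50, sleep=time.sleep):
    """Scroll until document height stops growing or max_scrolls is reached.

    `driver` only needs an execute_script(js) method (e.g. a Selenium WebDriver).
    `max_scrolls` and `sleep` are hypothetical parameters added for this sketch.
    Returns how many scrolls actually increased the page height.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    grew = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # fixed wait for the next batch of content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # this scroll loaded nothing new
        last_height = new_height
        grew += 1
    return grew
```

Typical use would be `scroll_to_bottom(driver, pause=2)` after loading the page; the return value is only informational.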
This uses a fixed sleep, which is a poor fit: you wait unnecessarily when the content loads quickly, and the loop may exit prematurely when the dynamic load happens to be slower than the pause.
Since such a page usually appends more elements to a list, you can record the length of the list before scrolling and then wait until the next element has been loaded.
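A driver-agnostic sketch of this count-based approach: count how many elements match a CSS selector, scroll, then poll until the count grows, and stop once nothing new arrives within the timeout. `wait_until` is a hypothetical plain-Python stand-in for Selenium's `WebDriverWait(...).until(...)`; `scroll_until_no_new_items` and its default timings are likewise illustrative assumptions. With real Selenium you would normally use `WebDriverWait` itself, as in the Twitter example. Selenium's `execute_script(script, *args)` exposes the extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
import time

COUNT_JS = "return document.querySelectorAll(arguments[0]).length"
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"

def wait_until(predicate, timeout=20.0, poll=0.5):
    """Poll predicate() until it is truthy (return True) or timeout
    seconds have passed (return False)."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)

def scroll_until_no_new_items(driver, selector, timeout=20.0, poll=0.5):
    """Scroll repeatedly, each time waiting until more elements match
    `selector` than before; return the final count once a scroll
    produces nothing new within `timeout` seconds."""
    count = driver.execute_script(COUNT_JS, selector)
    while True:
        driver.execute_script(SCROLL_JS)
        grew = wait_until(
            lambda: driver.execute_script(COUNT_JS, selector) > count,
            timeout, poll)
        if not grew:
            return count
        count = driver.execute_script(COUNT_JS, selector)
```

Because it compares counts rather than looking up the element at a given position, this variant needs no positional selector at all. For the Twitter markup it would be called as `scroll_until_no_new_items(browser, '.stream-items > li.stream-item')`.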
For Twitter it could look like this:
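```python
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

while True:
    # number of tweets currently in the stream
    elemsCount = browser.execute_script("return document.querySelectorAll('.stream-items > li.stream-item').length")
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # wait up to 20 s until the (elemsCount+1)-th item exists
        WebDriverWait(browser, 20).until(
            lambda x: x.find_element(By.XPATH,
                "//*[contains(@class,'stream-items')]/li[contains(@class,'stream-item')]["+str(elemsCount+1)+"]"))
    except TimeoutException:
        break  # nothing new appeared in time: assume the end of the stream
```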
I used an XPath expression because PhantomJS 1.x sometimes has a bug with :nth-child() CSS selectors.
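Two details of that XPath are worth spelling out. Positions in XPath are 1-based, so `[elemsCount+1]` selects the first item that was not present before the scroll. Also, `contains(@class,'stream-item')` is a substring test, so it would also match a hypothetical class such as `stream-item-footer`; the usual whole-token idiom is `contains(concat(' ', normalize-space(@class), ' '), ' stream-item ')`. A small sketch that builds the expression that way follows; the helper names are hypothetical and the class names come from the Twitter example.

```python
def has_class(cls: str) -> str:
    """XPath predicate matching elements whose class attribute contains the
    whole token `cls`, not merely a substring as contains(@class, ...) does."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')"

def nth_stream_item_xpath(n: int) -> str:
    """XPath for the n-th (1-based) li.stream-item child of a .stream-items list."""
    if n < 1:
        raise ValueError("XPath positions start at 1")
    return f"//*[{has_class('stream-items')}]/li[{has_class('stream-item')}][{n}]"
```

Inside the loop you would then pass `nth_stream_item_xpath(elemsCount + 1)` to `find_element(By.XPATH, ...)` (or `find_element_by_xpath` on older Selenium versions).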
Full version for reference.