I want to scrape all the data of a page implemented by a infinite scroll. The following python code works.
for i in range(100):
driver.execute_script(\"w
On a side note, instead of scrolling down 100 times, you can check if there are no more modifications to the DOM (we are in the case of the bottom of the page being AJAX lazy-loaded)
def scrollDown(driver, value):
driver.execute_script("window.scrollBy(0,"+str(value)+")")
# Scroll down the page
def scrollDownAllTheWay(driver):
old_page = driver.page_source
while True:
logging.debug("Scrolling loop")
for i in range(2):
scrollDown(driver, 500)
time.sleep(2)
new_page = driver.page_source
if new_page != old_page:
old_page = new_page
else:
break
return True
Here I did it using a rather simple form:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("url")
searchTxt=''
while not searchTxt:
try:
searchTxt=browser.find_element_by_name('NAME OF ELEMENT')
searchTxt.send_keys("USERNAME")
except:continue
Find below 3 methods:
Checking page readyState (not reliable):
def page_has_loaded(self):
self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
page_state = self.driver.execute_script('return document.readyState;')
return page_state == 'complete'
The
wait_for
helper function is good, but unfortunatelyclick_through_to_new_page
is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, andpage_has_loaded
just returns true straight away.
id
Comparing new page ids with the old one:
def page_has_loaded_id(self):
self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
try:
new_page = browser.find_element_by_tag_name('html')
return new_page.id != old_page.id
except NoSuchElementException:
return False
It's possible that comparing ids is not as effective as waiting for stale reference exceptions.
staleness_of
Using staleness_of
method:
@contextlib.contextmanager
def wait_for_page_load(self, timeout=10):
self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
old_page = self.find_element_by_tag_name('html')
yield
WebDriverWait(self, timeout).until(staleness_of(old_page))
For more details, check Harry's blog.
You can do that very simple by this function:
def page_is_loading(driver):
while True:
x = driver.execute_script("return document.readyState")
if x == "complete":
return True
else:
yield False
and when you want do something after page loading complete,you can use:
Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
Driver.get("https://www.google.com/")
while not page_is_loading(Driver):
continue
Driver.execute_script("alert('page is loaded')")
Have you tried driver.implicitly_wait
. It is like a setting for the driver, so you only call it once in the session and it basically tells the driver to wait the given amount of time until each command can be executed.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
So if you set a wait time of 10 seconds it will execute the command as soon as possible, waiting 10 seconds before it gives up. I've used this in similar scroll-down scenarios so I don't see why it wouldn't work in your case. Hope this is helpful.
To be able to fix this answer, I have to add new text. Be sure to use a lower case 'w' in implicitly_wait
.
How about putting WebDriverWait in While loop and catching the exceptions.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
browser = webdriver.Firefox()
browser.get("url")
delay = 3 # seconds
while True:
try:
WebDriverWait(browser, delay).until(EC.presence_of_element_located(browser.find_element_by_id('IdOfMyElement')))
print "Page is ready!"
break # it will break from the loop once the specific element will be present.
except TimeoutException:
print "Loading took too much time!-Try again"