Wait until page is loaded with Selenium WebDriver for Python

借酒劲吻你 2020-11-22 00:26

I want to scrape all the data from a page that uses infinite scroll. The following Python code works.

for i in range(100):
    driver.execute_script("w

12 Answers
  • 2020-11-22 00:50

    On a side note, instead of scrolling down 100 times, you can check whether the DOM has stopped changing (this assumes the bottom of the page is lazy-loaded via AJAX):

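    import logging
    import time
    
    def scrollDown(driver, value):
        # Scroll the window down by `value` pixels
        driver.execute_script("window.scrollBy(0, arguments[0]);", value)
    
    # Scroll down until the page source stops changing
    def scrollDownAllTheWay(driver):
        old_page = driver.page_source
        while True:
            logging.debug("Scrolling loop")
            for _ in range(2):
                scrollDown(driver, 500)
                time.sleep(2)  # give lazy-loaded content time to arrive
            new_page = driver.page_source
            if new_page != old_page:
                old_page = new_page  # new content appeared, keep scrolling
            else:
                break  # nothing changed, assume we reached the bottom
        return True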
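A variant of the same idea compares document.body.scrollHeight between rounds instead of the full page_source, which avoids serialising the whole DOM each time, and caps the number of rounds so the loop cannot run forever. This is a minimal sketch: scroll_to_bottom, pause and max_rounds are hypothetical names, and it only assumes a driver object exposing Selenium's execute_script.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is anything with an execute_script(script, *args) method,
    e.g. a Selenium WebDriver. Returns the number of scroll rounds made.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom to trigger the next lazy-loaded batch
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the AJAX request time to append content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was appended: assume we hit the bottom
        last_height = new_height
    return max_rounds  # safety cap reached
```

Usage would be something like scroll_to_bottom(driver, pause=1.0) right after driver.get(...). Comparing heights is cheaper than comparing page_source, but it will miss updates that replace content without making the page taller.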
    
  • 2020-11-22 00:50

    Here is how I did it, using a rather simple loop:

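    import time
    
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    
    browser = webdriver.Firefox()
    browser.get("url")
    searchTxt = None
    while searchTxt is None:  # note: no timeout, loops until the element exists
        try:
            searchTxt = browser.find_element(By.NAME, 'NAME OF ELEMENT')
        except NoSuchElementException:
            time.sleep(0.5)  # not in the DOM yet; wait briefly before retrying
    searchTxt.send_keys("USERNAME")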
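The loop above has no upper bound, so it hangs if the element never appears. A bounded version polls with a short sleep and gives up after a deadline. This is a minimal sketch: find_with_timeout is a hypothetical helper, find is any zero-argument callable (for example lambda: browser.find_element(By.NAME, 'NAME OF ELEMENT')), and ignored stands in for Selenium's NoSuchElementException. Selenium's own WebDriverWait already polls with a timeout and ignores NoSuchElementException by default, so prefer it in real code.

```python
import time


def find_with_timeout(find, ignored, timeout=10.0, interval=0.5):
    """Call find() until it succeeds or `timeout` seconds have passed.

    Exceptions listed in `ignored` mean "not there yet" and trigger a retry;
    with Selenium you would pass ignored=NoSuchElementException.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except ignored:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last lookup error
            time.sleep(interval)  # back off instead of busy-waiting
```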
    
  • 2020-11-22 00:51

    Below are three methods:

    readyState

    Checking page readyState (not reliable):

    def page_has_loaded(self):
        self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
        page_state = self.driver.execute_script('return document.readyState;')
        return page_state == 'complete'
    

    The wait_for helper function is good, but unfortunately click_through_to_new_page is open to a race condition: the script can execute in the old page before the browser has started processing the click, in which case page_has_loaded returns True straight away.
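One way to close that race is to grab the old page's html element before clicking, wait until a different html element is returned, and only then check readyState. The sketch below assumes a Selenium-style driver (find_element and execute_script) and that WebElement.id differs between documents; click_and_wait and its parameters are hypothetical names, and transient stands in for lookup errors (such as NoSuchElementException) that may occur mid-navigation.

```python
import time


def click_and_wait(driver, element, timeout=10.0, interval=0.1, transient=()):
    """Click `element`, then wait for a new document that has finished loading.

    `transient` lists exceptions that may be raised while the old page is
    being torn down; they are treated as "not ready yet".
    """
    # "tag name" is the value of Selenium's By.TAG_NAME
    old_html = driver.find_element("tag name", "html")
    element.click()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            new_html = driver.find_element("tag name", "html")
            # Only trust readyState once the document has actually been replaced
            if new_html.id != old_html.id:
                state = driver.execute_script("return document.readyState")
                if state == "complete":
                    return True
        except transient:
            pass  # page is mid-navigation; try again
        time.sleep(interval)
    raise TimeoutError("new page did not finish loading in time")
```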

    id

    Comparing the id of the new page's html element with the old one:

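    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    
    def page_has_loaded_id(self, old_page):
        # old_page is the <html> element captured before the click
        self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
        try:
            new_page = self.driver.find_element(By.TAG_NAME, 'html')
            return new_page.id != old_page.id
        except NoSuchElementException:
            return False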
    

    It's possible that comparing ids is not as effective as waiting for stale reference exceptions.

    staleness_of

    Using staleness_of method:

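    import contextlib
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.expected_conditions import staleness_of
    from selenium.webdriver.support.ui import WebDriverWait
    
    @contextlib.contextmanager
    def wait_for_page_load(self, timeout=10):
        self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
        old_page = self.driver.find_element(By.TAG_NAME, 'html')
        yield
        # Block until the old <html> element is detached, i.e. a new page replaced it
        WebDriverWait(self.driver, timeout).until(staleness_of(old_page))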
    

    For more details, check Harry's blog.

  • 2020-11-22 00:53

    You can do this very simply with the following function:

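    def page_is_loading(driver):
        # Despite the name, returns True once the document has finished loading.
        # (A generator version with `yield` would always be truthy and never wait.)
        return driver.execute_script("return document.readyState") == "complete"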
    

    Then, when you want to do something after the page has finished loading, you can use:

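    import time
    
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service
    
    driver = webdriver.Firefox(options=Options(), service=Service('geckodriver.exe'))
    driver.get("https://www.google.com/")
    
    while not page_is_loading(driver):
        time.sleep(0.1)  # poll instead of spinning in a tight loop
    
    driver.execute_script("alert('page is loaded')")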
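Because the loop above has no upper bound, a page that never reaches "complete" hangs the script. A minimal bounded variant, assuming only a driver with Selenium's execute_script; wait_for_ready_state and its parameters are hypothetical names. Note that with the default ("normal") page-load strategy, driver.get() already waits for the document to finish loading.

```python
import time


def wait_for_ready_state(driver, timeout=10.0, interval=0.1, state="complete"):
    """Poll document.readyState until it equals `state` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        current = driver.execute_script("return document.readyState")
        if current == state:
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"readyState stuck at {current!r} after {timeout}s")
        time.sleep(interval)  # avoid hammering the browser with script calls
```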
    
  • 2020-11-22 00:57

    Have you tried driver.implicitly_wait? It is a driver-wide setting, so you only call it once per session; it tells the driver to keep retrying element lookups for up to the given amount of time before raising an error.

    from selenium import webdriver
    
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # seconds; applies to every find_element call
    

    So with a wait time of 10 seconds, each element lookup returns as soon as the element is found and only gives up after 10 seconds. I've used this in similar scroll-down scenarios, so I don't see why it wouldn't work in your case. Hope this helps.

    Be sure to use a lowercase 'w' in implicitly_wait.

  • 2020-11-22 00:57

    How about putting WebDriverWait in a while loop and catching the exceptions?

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    
    browser = webdriver.Firefox()
    browser.get("url")
    delay = 3  # seconds
    while True:
        try:
            # Pass a locator tuple so WebDriverWait does the polling itself
            WebDriverWait(browser, delay).until(
                EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
            print("Page is ready!")
            break  # leave the loop once the element is present
        except TimeoutException:
            print("Loading took too much time! Trying again")
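Since each WebDriverWait call already polls for up to delay seconds, the outer while True mainly adds unlimited retries. Capping them keeps a missing element from hanging the script forever. A minimal sketch; retry and attempts are hypothetical names, wait_once would wrap the WebDriverWait(...).until(...) call, and in Selenium code retry_on would be TimeoutException.

```python
def retry(wait_once, retry_on, attempts=3):
    """Call wait_once() up to `attempts` times, retrying when it raises retry_on.

    Returns the result of the first successful call; if every attempt fails,
    the last exception is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return wait_once()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
            print(f"Loading took too much time! Attempt {attempt} failed, trying again")
```

With Selenium this could be called as retry(lambda: WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement'))), TimeoutException).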
    