Wait until page is loaded with Selenium WebDriver for Python

借酒劲吻你 2020-11-22 00:26

I want to scrape all the data from a page that uses infinite scroll. The following Python code works:

for i in range(100):
    driver.execute_script("w


        
12 answers
  • 2020-11-22 00:33

    From the WebDriverWait docstring in selenium/webdriver/support/wait.py:

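    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    driver = webdriver.Firefox()
    # until() retries the lambda (ignoring NoSuchElementException) until it returns an element or 10 s pass
    element = WebDriverWait(driver, 10).until(
        lambda x: x.find_element(By.ID, "someId"))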
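    Because until() accepts any callable that takes the driver and returns a truthy value, the infinite-scroll case from the question can use a custom condition instead of a fixed sleep. This is a minimal sketch that assumes new content makes document.body.scrollHeight grow after each scroll; the class name scroll_height_increased is hypothetical and not part of Selenium.

```python
class scroll_height_increased:
    """Hypothetical custom wait condition for WebDriverWait.until().

    Assumes the page signals newly loaded content by increasing
    document.body.scrollHeight after a scroll.
    """

    def __init__(self, old_height):
        # Height measured before scrolling
        self.old_height = old_height

    def __call__(self, driver):
        # WebDriverWait calls this repeatedly until it returns something truthy
        new_height = driver.execute_script("return document.body.scrollHeight")
        # Return the new height (truthy) once the page has grown, else False to keep polling
        return new_height if new_height > self.old_height else False
```

    Usage, assuming driver is an open WebDriver: read old_height with driver.execute_script("return document.body.scrollHeight"), scroll with driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"), then call WebDriverWait(driver, 10).until(scroll_height_increased(old_height)). A TimeoutException from until() is a reasonable hint that no more content is coming.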
    
  • 2020-11-22 00:40

    Passing a find_element_by_id(...) call to presence_of_element_located (as shown in the accepted answer) raised NoSuchElementException, because that lookup runs immediately, before WebDriverWait gets a chance to retry it. I had to pass a (By, value) locator tuple instead, as in fragles' comment:

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    
    driver = webdriver.Firefox()
    driver.get('url')
    timeout = 5
    try:
        element_present = EC.presence_of_element_located((By.ID, 'element_id'))
        WebDriverWait(driver, timeout).until(element_present)
    except TimeoutException:
        print("Timed out waiting for page to load")
    

    This matches the example in the documentation. The documentation for By lists the available locator strategies.
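    The tuple works where a find_element_by_id(...) call does not because presence_of_element_located only stores the locator and returns a predicate; the actual lookup happens later, on each poll made by WebDriverWait. Below is a simplified, hypothetical re-implementation (my_presence_of_element_located is an illustration of the pattern, not Selenium's source) that shows the deferred lookup:

```python
def my_presence_of_element_located(locator):
    """Hypothetical simplified stand-in for EC.presence_of_element_located.

    locator is a (strategy, value) tuple such as (By.ID, 'element_id').
    Nothing is looked up here; the lookup is deferred to the returned predicate.
    """

    def _predicate(driver):
        # Called by WebDriverWait on every poll. If the element is missing,
        # find_element raises NoSuchElementException, which WebDriverWait
        # ignores by default before retrying after its poll interval.
        return driver.find_element(*locator)

    return _predicate
```

    Passing the result of a find_element call instead would perform the lookup once, right away, so a missing element fails immediately rather than being retried.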

  • 2020-11-22 00:43

    A solution for AJAX pages that continuously load data, where the previous methods do not work: grab the page DOM, hash it, and compare the old and new hash values over a time interval.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    def page_has_loaded(driver, sleep_time=2):
        '''
        Waits for the page to finish loading by comparing DOM hash values taken sleep_time seconds apart.
        '''
    
        def get_page_hash(driver):
            '''
            Returns a hash of the HTML DOM.
            '''
            # the element can be found either by the 'html' tag or by the HTML 'root' id
            dom = driver.find_element(By.TAG_NAME, 'html').get_attribute('innerHTML')
            # dom = driver.find_element(By.ID, 'root').get_attribute('innerHTML')
            dom_hash = hash(dom.encode('utf-8'))
            return dom_hash
    
        page_hash = 'empty'
        page_hash_new = ''
    
        # compare the old and new DOM hashes; equal hashes mean nothing changed during sleep_time
        while page_hash != page_hash_new:
            page_hash = get_page_hash(driver)
            time.sleep(sleep_time)
            page_hash_new = get_page_hash(driver)
            if page_hash != page_hash_new:
                print('<page_has_loaded> - page not loaded')
    
        print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
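    One caveat: the loop above never gives up, so on a page that keeps changing (a live clock, a ticker) it runs forever. A bounded variant is sketched below. It compares snapshots of driver.page_source (the page HTML as the browser currently reports it) instead of innerHTML, and the name wait_for_dom_stable and its interval and timeout parameters are assumptions for illustration.

```python
import hashlib
import time


def wait_for_dom_stable(driver, interval=2.0, timeout=30.0):
    """Hypothetical helper: wait until the DOM stops changing, with an upper bound.

    Returns True if two snapshots taken `interval` seconds apart were identical,
    or False if `timeout` seconds passed without the page settling.
    """

    def snapshot():
        # A SHA-256 digest keeps a short, stable fingerprint of the page HTML
        return hashlib.sha256(driver.page_source.encode("utf-8")).hexdigest()

    deadline = time.monotonic() + timeout
    previous = snapshot()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = snapshot()
        if current == previous:
            return True  # nothing changed during the last interval
        previous = current
    return False  # still changing when the deadline passed
```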
    
  • 2020-11-22 00:45

    WebDriver waits for a page to load by default when you call the .get() method.

    Since you may be looking for a specific element, as @user227215 said, you should use WebDriverWait to wait for an element on your page:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import TimeoutException
    
    browser = webdriver.Firefox()
    browser.get("url")
    delay = 3 # seconds
    try:
        myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
    except TimeoutException:
        print("Loading took too much time!")
    

    I have used it for checking alerts. You can use any of the other By strategies to build the locator.

    EDIT 1:

    I should mention that WebDriver waits for a page to load by default, but it does not wait for content loading inside frames or for AJAX requests. When you call .get('url'), the browser waits until the page has fully loaded before the next command in your code runs. When the page fires an AJAX request, however, WebDriver does not wait; it is your responsibility to wait an appropriate amount of time for the page, or part of it, to load. That is what the expected_conditions module is for.
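    When there is no single element to wait for, a common workaround is a custom condition that asks the browser itself. The sketch below shows two such conditions as plain functions that can be passed to WebDriverWait(driver, 10).until(...). The function names are hypothetical; document_ready mainly helps after a click that triggers navigation (since .get() already waits for the load), and jquery_ajax_idle assumes the page loads its data through jQuery, because it reads jQuery.active (the number of jQuery AJAX requests still in flight).

```python
def document_ready(driver):
    """Hypothetical condition: True once document.readyState is 'complete'."""
    return driver.execute_script("return document.readyState") == "complete"


def jquery_ajax_idle(driver):
    """Hypothetical condition: True when no jQuery AJAX requests are in flight.

    Assumes the page uses jQuery; on pages without it, this returns True
    immediately because window.jQuery is undefined.
    """
    return driver.execute_script(
        "return window.jQuery ? jQuery.active === 0 : true"
    )
```

    Requests made with fetch() or plain XMLHttpRequest are not counted by jQuery.active, so on such pages waiting for a specific element remains the more reliable choice.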

  • 2020-11-22 00:46

    As mentioned in David Cullen's answer, I've always seen recommendations to use lines like the following:

    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)
    

    It was difficult for me to find a list of all the locators that can be used with By, so I thought it would be useful to provide one here. According to Web Scraping with Python by Ryan Mitchell:

    ID

    Used in the example; finds elements by their HTML id attribute

    CLASS_NAME

    Used to find elements by their HTML class attribute. Why is this function CLASS_NAME not simply CLASS? Using the form object.CLASS would create problems for Selenium's Java library, where .class is a reserved method. In order to keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.

    CSS_SELECTOR

    Finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.

    LINK_TEXT

    Finds links (a elements) by their exact visible text. For example, a link that says "Next" can be selected using (By.LINK_TEXT, "Next").

    PARTIAL_LINK_TEXT

    Similar to LINK_TEXT, but matches on a partial string.

    NAME

    Finds HTML tags by their name attribute. This is handy for HTML forms.

    TAG_NAME

    Finds HTML tags by their tag name.

    XPATH

    Uses an XPath expression ... to select matching elements.

  • 2020-11-22 00:48

    Use this in your code:

    from selenium import webdriver
    
    driver = webdriver.Firefox() # or Chrome()
    driver.implicitly_wait(10) # seconds
    driver.get("http://www.......")
    

    Or, if you are looking for a specific element, you can use this code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Firefox() #or Chrome()
    driver.get("http://www.......")
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "tag_id"))
        )
    finally:
        driver.quit()
    