StaleElementException when iterating with Python

粉色の甜心 2020-11-22 05:00

I'm trying to create a basic web scraper for Amazon results. As I'm iterating through results, I sometimes get to page 5 (sometimes only page 2) of the results and then a

2 Answers
  • 2020-11-22 05:36

    This error message...

    StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
    

    ...implies that the reference you hold to the element is stale: the element it pointed to is no longer present in the page's DOM.

    The common reasons behind this issue are:

    • The element has changed position within the HTML.
    • The element is no longer attached to the DOM tree.
    • The page the element was part of has been refreshed.
    • The previous instance of the element has been replaced by JavaScript or an Ajax call.
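
Whatever the cause, the usual remedy is to look the element up again instead of reusing the old reference. Here is a minimal sketch of that idea using only the standard library. The names retry_on_stale, FakeStaleError and read_page_number are hypothetical, made up for illustration. FakeStaleError merely stands in for Selenium's StaleElementReferenceException so that the snippet runs without a browser.

```python
import time


def retry_on_stale(action, stale_exceptions, attempts=3, delay=0.5):
    """Call action() and retry it when a stale-reference error is raised.

    action must perform the element lookup itself, so that every
    attempt works with a freshly located element.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except stale_exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


class FakeStaleError(Exception):
    """Hypothetical stand-in for StaleElementReferenceException."""


calls = {"n": 0}


def read_page_number():
    # Simulates a lookup that hits a stale element on the first two tries.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeStaleError("element reference is stale")
    return "5"


print(retry_on_stale(read_page_number, FakeStaleError, attempts=5, delay=0))  # prints 5
```

With Selenium you would pass StaleElementReferenceException (from selenium.common.exceptions) as stale_exceptions, and make the action a function that calls driver.find_element(...) itself, so every attempt starts from a fresh lookup.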

    This use case

    Keeping your approach of scrolling with scrollIntoView() and printing a few helpful debug messages, I made some minor adjustments by introducing WebDriverWait. You can use the following solution:

    • Code Block:

      from selenium import webdriver
      from selenium.common.exceptions import WebDriverException
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.chrome.service import Service
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      
      options = Options()
      options.add_argument("start-maximized")
      options.add_argument('disable-infobars')
      options.add_argument("--disable-extensions")
      # Selenium 4 style; Selenium 3 took chrome_options= and executable_path= instead
      driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
      driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
      current_page_number = "?"  # defined up front so the except branch can always print it
      while True:
          try:
              # Locate the element again on every iteration so the reference is always fresh
              current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
              driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
              current_page_number = current_page_number_element.get_attribute("innerHTML")
              WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
              print("page # {} : going to next page".format(current_page_number))
          except WebDriverException:
              # A timeout (no clickable Next arrow) or any other Selenium error ends the loop
              print("page # {} : error, no more pages".format(current_page_number))
              break
      driver.quit()
      
    • Console Output:

      page # 1 : going to next page
      page # 2 : going to next page
      page # 3 : going to next page
      page # 4 : going to next page
      page # 5 : going to next page
      page # 6 : going to next page
      page # 7 : going to next page
      page # 8 : going to next page
      page # 9 : going to next page
      page # 10 : going to next page
      page # 11 : going to next page
      page # 12 : going to next page
      page # 13 : going to next page
      page # 14 : going to next page
      page # 15 : going to next page
      page # 16 : going to next page
      page # 17 : going to next page
      page # 18 : going to next page
      page # 19 : going to next page
      page # 20 : error, no more pages
      
  • 2020-11-22 06:00

    If you just want your script to iterate over all the result pages, you don't need any complicated logic - just keep clicking the Next button for as long as it is clickable:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait as wait
    from selenium.common.exceptions import TimeoutException
    
    driver = webdriver.Chrome()
    
    driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')
    
    while True:
        try:
            wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
        except TimeoutException:
            break
    

    P.S. Also note that implicitly_wait(10) does not always wait the full 10 seconds; it waits up to 10 seconds for the element to appear in the HTML DOM. If the element is found after 1 or 2 seconds, the wait ends there and you do not spend the remaining 8-9 seconds waiting.
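
To make the "up to 10 seconds" behaviour concrete, here is a small standard-library sketch of a polling wait. It is not Selenium's actual implementation. wait_until is a hypothetical helper, and the "element" is simulated by a timestamp that becomes ready after 0.3 seconds.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.1):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


start = time.monotonic()
ready_at = start + 0.3  # assumption: the simulated element appears after 0.3 s

# The lambda returns False until ready_at has passed, then the string "found".
value = wait_until(lambda: time.monotonic() >= ready_at and "found", timeout=10)
elapsed = time.monotonic() - start

print(value, "after", round(elapsed, 1), "s")
```

The call returns a fraction of a second after the condition becomes true rather than after the full timeout, which is the behaviour described above.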
