StaleElementException when iterating with Python

粉色の甜心 2020-11-22 05:00

I'm trying to create a basic web scraper for Amazon results. As I'm iterating through results, I sometimes get to page 5 (sometimes only page 2) of the results and then a

2 Answers
  •  抹茶落季
    2020-11-22 05:36

    This error message...

    StaleElementReferenceException: Message: The element reference of  is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
    

    ...implies that the reference you obtained earlier is now stale: the element it pointed to is no longer present in the DOM of the page.

    The common reasons behind this issue are:

    • The element has changed position within the HTML.
    • The element is no longer attached to the DOM tree.
    • The webpage that the element was part of has been refreshed.
    • The previous instance of the element has been refreshed by JavaScript or an Ajax call.
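
    In all of these cases the stored reference cannot be revived; the element has to be located again. Here is a minimal sketch of that idea, which is not part of the original answer. The helper name retry_on_stale and its parameters are hypothetical, and the exception type is passed in as an argument so the snippet runs without Selenium installed. With Selenium you would presumably pass StaleElementReferenceException from selenium.common.exceptions.

```python
import time


def retry_on_stale(action, stale_errors, attempts=3, delay=0.5):
    """Call action() until it succeeds or the attempts run out.

    action must locate its element again on every call, so that each
    retry works with a fresh reference instead of the stale one.
    stale_errors is an exception class (or tuple of classes) to retry on.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except stale_errors:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)  # give the page a moment to settle


# Hypothetical usage with Selenium (assumes an existing `driver`):
#
#   from selenium.common.exceptions import StaleElementReferenceException
#   from selenium.webdriver.common.by import By
#
#   page = retry_on_stale(
#       lambda: driver.find_element(By.CSS_SELECTOR, "span.pagnCur").text,
#       StaleElementReferenceException,
#   )
```

    The important detail is that the lookup happens inside the callable. Retrying a method call on an element object that was fetched earlier would only hit the same stale reference again.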

    This use case

    Keeping your approach of scrolling with scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments that introduce WebDriverWait. You can use the following solution:

    • Code Block:

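      from selenium import webdriver
      from selenium.common.exceptions import WebDriverException
      from selenium.webdriver.chrome.options import Options
      from selenium.webdriver.chrome.service import Service
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC

      options = Options()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      options.add_argument("--disable-extensions")
      # Selenium 4 style: the driver path goes through a Service object and the options through options=.
      # (Selenium 3 accepted executable_path= and chrome_options= directly.)
      driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
      driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
      current_page_number = "unknown"  # defined up front so the except branch can always print it
      while True:
          try:
              # Locate the page indicator again on every iteration instead of reusing an old reference.
              current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
              driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
              current_page_number = current_page_number_element.get_attribute("innerHTML")
              # Wait until the "next" arrow is clickable; on the last page this times out.
              WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
              print("page # {} : going to next page".format(current_page_number))
          except WebDriverException:
              # Timeouts, stale references and click failures all derive from WebDriverException.
              print("page # {} : error, no more pages".format(current_page_number))
              break
      driver.quit()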
      
    • Console Output:

      page # 1 : going to next page
      page # 2 : going to next page
      page # 3 : going to next page
      page # 4 : going to next page
      page # 5 : going to next page
      page # 6 : going to next page
      page # 7 : going to next page
      page # 8 : going to next page
      page # 9 : going to next page
      page # 10 : going to next page
      page # 11 : going to next page
      page # 12 : going to next page
      page # 13 : going to next page
      page # 14 : going to next page
      page # 15 : going to next page
      page # 16 : going to next page
      page # 17 : going to next page
      page # 18 : going to next page
      page # 19 : going to next page
      page # 20 : error, no more pages
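
    The explicit waits in the code block above work by polling: WebDriverWait calls the condition repeatedly (every 0.5 seconds by default) until it returns a truthy value or the timeout expires. The plain-Python sketch below illustrates only that polling idea. It is not Selenium's implementation, and wait_until and WaitTimeout are hypothetical names chosen for this example.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=()):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Exceptions listed in `ignored` are treated as "not ready yet",
    similar in spirit to how an explicit wait tolerates a missing element.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result  # hand back whatever the condition produced
        except ignored:
            pass  # not ready yet; try again after the poll interval
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within {} s".format(timeout))
        time.sleep(poll)
```

    Because the condition is evaluated again on every poll, a wait of this kind always works with the current state of the page rather than with a reference captured before the page changed.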
      
