StaleElementException when iterating with Python

与世无争的帅哥 提交于 2019-11-26 01:28:21

问题


I\'m trying to create a basic web scraper for Amazon results. As I\'m iterating through results, I sometimes get to page 5 (sometimes only page 2) of the results and then a StaleElementException is thrown. When I look at the browser after the exception is thrown, I can see that the driver/page did not scroll down to where the page numbers are (bottom bar).

My code:

driver.get(\'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush\')

for page in range(1,last_page_number +1):

    driver.implicitly_wait(10)

    bottom_bar = driver.find_element_by_class_name(\'pagnCur\')
    driver.execute_script(\"arguments[0].scrollIntoView(true);\", bottom_bar)

    current_page_number = int(driver.find_element_by_class_name(\'pagnCur\').text)

    if page == current_page_number:
        next_page = driver.find_element_by_xpath(\'//div[@id=\"pagn\"]/span[@class=\"pagnLink\"]/a[text()=\"{0}\"]\'.format(current_page_number+1))
        next_page.click()
        print(\'page #\',page,\': going to next page\')
    else:
        print(\'page #: \', page,\'error\')

I\'ve looked at this question, and I\'m guessing that a similar fix can be applied, but I\'m not sure how to find something on the page that disappears. Also, based on how quickly the print statements are occurring, I can see that the implicitly_wait(10) isn\'t actually waiting a full 10 seconds.

The exception is pointing to the line that starts with \"driver.execute_script\". This is the exception:

StaleElementReferenceException: Message: The element reference of <span class=\"pagnCur\"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Sometimes I\'ll get a ValueError:

ValueError: invalid literal for int() with base 10: \'\'

So these errors/exceptions lead me to believe that there is something going on with waiting for the page to refresh completely.


回答1:


If you just want your script to iterate over all the result pages, you don't need any complicated logic - just make a click on Next button while it's possible:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()

driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

while True:
    try:
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
    except TimeoutException:
        break

P.S. Also note that implicitly_wait(10) should not wait full 10 seconds, but wait up to 10 seconds for element to appear in HTML DOM. So if element is found within 1 or 2 seconds then wait is done and you will not wait rest 8-9 seconds...




回答2:


It seems you were almost there.

Preserving your concept of scrolling through scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments inducing WebDriverWait and you can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    options.add_argument("--disable-extensions")
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
    while True:
        try:
            current_page_number_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
            driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
            current_page_number = current_page_number_element.get_attribute("innerHTML")
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
            print("page # {} : going to next page".format(current_page_number))
        except:
            print("page # {} : error, no more pages".format(current_page_number))
            break
    driver.quit()
    
  • Console Output:

    page # 1 : going to next page
    page # 2 : going to next page
    page # 3 : going to next page
    page # 4 : going to next page
    page # 5 : going to next page
    page # 6 : going to next page
    page # 7 : going to next page
    page # 8 : going to next page
    page # 9 : going to next page
    page # 10 : going to next page
    page # 11 : going to next page
    page # 12 : going to next page
    page # 13 : going to next page
    page # 14 : going to next page
    page # 15 : going to next page
    page # 16 : going to next page
    page # 17 : going to next page
    page # 18 : going to next page
    page # 19 : going to next page
    page # 20 : error, no more pages
    


来源:https://stackoverflow.com/questions/53640973/staleelementexception-when-iterating-with-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!