I'm trying to create a basic web scraper for Amazon results. As I'm iterating through the results, I sometimes get to page 5 (sometimes only page 2) and then hit a StaleElementReferenceException.
This error message...
StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
...implies that the previous reference of the element is now stale and the element reference is no longer present on the DOM of the page.
The common reasons behind this issue are the ones listed in the error message itself: the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed.
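A generic way to handle a stale reference is to re-locate the element and retry the action. The sketch below shows that retry pattern in plain Python; the exception class is parameterised so the helper runs without a browser, and the commented usage (the `driver` object and selector are assumptions carried over from the question) shows how you would pass Selenium's StaleElementReferenceException:

```python
def retry_on_stale(action, retries=3, exc=Exception):
    """Run action(); if it raises the given exception class (e.g. Selenium's
    StaleElementReferenceException), retry up to `retries` times, re-running
    the whole action so any element lookup inside it happens fresh."""
    for attempt in range(retries):
        try:
            return action()
        except exc:
            # On the last attempt, let the exception propagate.
            if attempt == retries - 1:
                raise

# Hypothetical Selenium usage (assumes `driver` and the imports exist):
# retry_on_stale(
#     lambda: driver.find_element(By.CSS_SELECTOR, "span.pagnCur").text,
#     exc=StaleElementReferenceException,
# )
```

The key point is that the lambda re-finds the element on every attempt instead of reusing a stored reference.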
Preserving your concept of scrolling through scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments introducing WebDriverWait, and you can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
driver.get("https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush")
current_page_number = None
while True:
    try:
        # Re-locate the page-number element on every iteration so we never
        # reuse a reference that went stale after the previous navigation.
        current_page_number_element = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, "span.pagnCur")))
        driver.execute_script("arguments[0].scrollIntoView(true);", current_page_number_element)
        current_page_number = current_page_number_element.get_attribute("innerHTML")
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "span.pagnNextArrow"))).click()
        print("page # {} : going to next page".format(current_page_number))
    except (TimeoutException, StaleElementReferenceException):
        print("page # {} : error, no more pages".format(current_page_number))
        break
driver.quit()
Console Output:
page # 1 : going to next page
page # 2 : going to next page
page # 3 : going to next page
page # 4 : going to next page
page # 5 : going to next page
page # 6 : going to next page
page # 7 : going to next page
page # 8 : going to next page
page # 9 : going to next page
page # 10 : going to next page
page # 11 : going to next page
page # 12 : going to next page
page # 13 : going to next page
page # 14 : going to next page
page # 15 : going to next page
page # 16 : going to next page
page # 17 : going to next page
page # 18 : going to next page
page # 19 : going to next page
page # 20 : error, no more pages
If you just want your script to iterate over all the result pages, you don't need any complicated logic - just click the Next button while it's possible:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')

while True:
    try:
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'a > span#pagnNextString'))).click()
    except TimeoutException:
        break
P.S. Also note that implicitly_wait(10)
does not always wait the full 10 seconds; it waits up to 10 seconds for the element to appear in the HTML DOM. So if the element is found within 1 or 2 seconds, the wait is done and you will not spend the remaining 8-9 seconds...
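That up-to-N-seconds behaviour can be sketched as a simple polling loop. This is a plain-Python illustration of the idea, not Selenium's actual implementation:

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns a truthy value, or until `timeout`
    seconds have elapsed. Returns as soon as the condition is met, so a
    condition satisfied after 1-2 seconds never burns the full timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll)
```

If the condition is already true on the first check, the function returns immediately; the timeout is only an upper bound.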