Question
I'm trying to scrape all the links on an infinitely scrolling page, scrolling down and collecting the new links as they appear. However, a fixed time.sleep() does not pause the driver for a sensible amount of time before the next scroll.
Is there any way to adjust the code at the bottom so that it sleeps less during the first iterations (when the page still loads new content quickly) and waits as long as necessary during later iterations (when the page loads new content slowly)?
Using the simple

for i in range(1, 20):
    time.sleep(i)

would not save time during the first iterations and would not scale time.sleep() efficiently after many iterations.
Here is the code I am using, based on a suggestion from "How can I scroll a web page using selenium webdriver in python?":
import time
from selenium import webdriver

scroll_pause_time = 5
scraped_links = []

driver = webdriver.Chrome(executable_path=driver_path)
driver.get(url)

links = driver.find_elements_by_xpath(links_filepath)
for link in links:
    if link not in scraped_links:
        scraped_links.append(link)
        print(link)

last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_pause_time)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
    links = driver.find_elements_by_xpath(links_filepath)
    for link in links:
        if link not in scraped_links:
            scraped_links.append(link)
            print(link)
After 20-30 iterations the loop exits prematurely because time.sleep() is too short compared to the loading speed of the webpage, so new_height equals last_height before the new content has arrived.
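For what it's worth, the adaptive behaviour I'm after can be sketched independently of Selenium: shrink the pause while the page keeps growing within the current wait, and grow it once loading slows down. The function name and the bounds below are illustrative, not part of the code above:

```python
def next_pause(pause, page_grew, min_pause=1.0, max_pause=30.0):
    """Return the next pause length for the scroll loop: halve it while
    new content keeps arriving within the current pause, double it once
    the page stops growing in time (clamped to [min_pause, max_pause])."""
    if page_grew:
        return max(min_pause, pause / 2)
    return min(max_pause, pause * 2)
```

In the scroll loop, `page_grew` would be `new_height > last_height`, and `time.sleep(next_pause(...))` would replace the fixed `time.sleep(scroll_pause_time)`.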
Answer 1:
If you do not want to guess each time how long it takes to load the page and sleep for some arbitrary number of seconds, you can use Explicit Waits. Example:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
    # do what you want after the necessary elements are loaded
except TimeoutException:
    print('TimeoutException')
finally:
    driver.quit()
This solves the problem of time.sleep() being too short compared to the loading speed of the webpage: the wait returns as soon as the condition is met, and never sooner.
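The same idea also applies to the scrolling loop itself: instead of a fixed time.sleep(), poll until document.body.scrollHeight changes, which is essentially what WebDriverWait.until() does internally. A minimal stdlib-only sketch of that polling loop (the timeout and poll interval are illustrative):

```python
import time

def wait_until(predicate, timeout=30.0, poll=0.5):
    """Call `predicate` every `poll` seconds until it returns a truthy
    value or `timeout` seconds elapse; return the value, or None on
    timeout. This mirrors how WebDriverWait.until() polls a condition."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the scrolling loop this could be used as `wait_until(lambda: driver.execute_script("return document.body.scrollHeight") > last_height)`; a None return means the page stopped growing within the timeout, so the loop can break.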
Source: https://stackoverflow.com/questions/52466038/python-selenium-adjust-pause-time-to-scroll-down-in-infinite-page