Question
This follows on from an earlier question of mine.
I can now interact with the page: scroll down, close the popup that appears, and click the button at the bottom to expand the page.
The problem is that when I count the items, the code returns only 20 when it should return 40.
I have checked the code again and again. I'm missing something, but I don't know what.
See my code below:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import datetime

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
#options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=r"C:\\chromedriver.exe", options=options)

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'
driver.get(url)

iter=1
while True:
    scrollHeight = driver.execute_script("return document.documentElement.scrollHeight")
    Height=10*iter
    driver.execute_script("window.scrollTo(0, " + str(Height) + ");")
    if Height > scrollHeight:
        print('End of page')
        break
    iter+=1

time.sleep(3)
popup = driver.find_element_by_class_name('confirm').click()
time.sleep(3)

ver_mas = driver.find_elements_by_class_name('button-load-more')
for x in range(len(ver_mas)):
    if ver_mas[x].is_displayed():
        driver.execute_script("arguments[0].click();", ver_mas[x])
        time.sleep(10)

page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
# print(soup)
items = soup.find_all('div',class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))
What is wrong? I am a newbie in the scraping world.
Regards
Answer 1:
Your while and for statements don't work as intended.
- Using while True: is bad practice.
- You scroll all the way to the bottom, but the button-load-more button isn't displayed there, so Selenium will not report it as displayed.
- find_elements_by_class_name looks for multiple elements, but the page has only one element with that class.
- if ver_mas[x].is_displayed(): if you are lucky, this is executed only once, because the range is 1.
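The points above can be illustrated without a browser. The following is a minimal sketch, not Selenium code: FakePage and all of its methods are hypothetical names, and it assumes (as the answer states) that the button is not reported as displayed after scrolling to the bottom. The first function mimics the single pass from the question; the second keeps moving to the button and clicking until none is left.

```python
class FakePage:
    """Hypothetical stand-in for the product page; this is not Selenium."""

    def __init__(self, total_items=40, batch_size=20):
        self.total_items = total_items
        self.batch_size = batch_size
        self.loaded = batch_size      # the first batch is in the initial HTML
        self.button_in_view = False   # assumed: displayed only after moving to it

    def find_buttons(self):
        # Mirrors a find_elements_* call: a list, empty once everything is loaded.
        return ["load-more"] if self.loaded < self.total_items else []

    def scroll_to_bottom(self):
        # Assumption taken from the answer: at the very bottom of the page
        # the button is not reported as displayed.
        self.button_in_view = False

    def move_to_button(self):
        self.button_in_view = True

    def is_displayed(self):
        return self.button_in_view

    def click(self):
        self.loaded = min(self.loaded + self.batch_size, self.total_items)
        self.button_in_view = False   # new items push the button out of view


def single_pass(page):
    """Scroll to the bottom, then make one pass over the buttons found."""
    page.scroll_to_bottom()
    for _button in page.find_buttons():
        if page.is_displayed():
            page.click()
    return page.loaded


def click_until_gone(page):
    """Move to the button and click it until no button is left."""
    while page.find_buttons():
        page.move_to_button()
        page.click()
    return page.loaded


print(single_pass(FakePage()))       # 20
print(click_until_gone(FakePage()))  # 40
```

Under these assumptions the single pass never clicks, so only the first 20 items are counted.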
Below you can find the solution. Here the code looks for the button, moves to it instead of scrolling, and clicks it. If the code fails to find the button, meaning that all the items have been loaded, it breaks out of the while loop and moves on.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException

# driver, time and BeautifulSoup are set up as in the question
url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'
driver.get(url)
time.sleep(3)
# Close the popup
popup = driver.find_element_by_class_name('confirm').click()

iter = 1
while iter > 0:
    time.sleep(3)
    try:
        ver_mas = driver.find_element_by_class_name('button-load-more')
        # Move to the button instead of scrolling, then click it via JavaScript
        actions = ActionChains(driver)
        actions.move_to_element(ver_mas).perform()
        driver.execute_script("arguments[0].click();", ver_mas)
    except NoSuchElementException:
        # No button left, so all the items are loaded
        break
    iter += 1

page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
# print(soup)
items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))
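The loop above only stops once the button can no longer be found, so a page that never removes the button would keep it running. A minimal sketch of a bounded variant follows. It is an illustration rather than part of the original answer: click_until_missing, find_button, click_button, max_clicks and ButtonNotFound are hypothetical names, and plain callables stand in for the Selenium calls so the logic can be run without a browser.

```python
class ButtonNotFound(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def click_until_missing(find_button, click_button, max_clicks=50):
    """Click the button until find_button raises or max_clicks is reached.

    Returns the number of clicks performed.
    """
    clicks = 0
    for _ in range(max_clicks):
        try:
            button = find_button()
        except ButtonNotFound:
            break  # nothing left to load
        click_button(button)
        clicks += 1
    return clicks


# Usage with a fake page that offers the button twice.
remaining = [2]


def fake_find():
    if remaining[0] == 0:
        raise ButtonNotFound()
    return "button"


def fake_click(button):
    remaining[0] -= 1


print(click_until_missing(fake_find, fake_click))  # 2
```

With Selenium, find_button could wrap the find_element_by_class_name('button-load-more') call and click_button the move-and-click steps, with NoSuchElementException caught in place of ButtonNotFound.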
Source: https://stackoverflow.com/questions/65833515/scraping-with-selenium-and-beautifulsoup-doesn%c2%b4t-return-all-the-items-in-the-pag