Scraping with Selenium and BeautifulSoup doesn't return all the items on the page

Submitted by 六月ゝ 毕业季﹏ on 2021-02-11 12:29:41

Question


This is a follow-up to my earlier question here.

Now I am able to interact with the page: scroll down, close the popup that appears, and click the button at the bottom to expand the page.

The problem is that when I count the items, the code returns only 20 when it should return 40.

I have checked the code again and again; I am missing something, but I don't know what.

See my code below:

from selenium import webdriver 
from bs4 import BeautifulSoup
import pandas as pd
import time
import datetime

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
#options.add_argument('--headless')
driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe", options=options)

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'

driver.get(url)  

iter=1
while True:
        scrollHeight = driver.execute_script("return document.documentElement.scrollHeight")
        Height=10*iter
        driver.execute_script("window.scrollTo(0, " + str(Height) + ");")
        
        if Height > scrollHeight:
            print('End of page')
            break
        iter+=1

time.sleep(3)

popup = driver.find_element_by_class_name('confirm').click()

time.sleep(3)

ver_mas = driver.find_elements_by_class_name('button-load-more')

for x in range(len(ver_mas)):

  if ver_mas[x].is_displayed():
      driver.execute_script("arguments[0].click();", ver_mas[x])
      time.sleep(10)

page_source = driver.page_source

soup = BeautifulSoup(page_source, 'lxml')
# print(soup)

items = soup.find_all('div',class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))

What is wrong? I am a newbie in the scraping world.

Regards

Answer 1:


Your while and for statements don't work as intended.

  1. Using while True: is bad practice.
  2. You scroll all the way to the bottom, but the button-load-more button is not in view there, so Selenium does not report it as displayed.
  3. find_elements_by_class_name looks for multiple elements, yet the page has only one element with that class.
  4. if ver_mas[x].is_displayed(): therefore runs at most once, because the range is 1, and only if you are lucky (see the sketch after this list).

Below you can find the solution: the code looks for the button, moves to it instead of scrolling, and performs the click. If it fails to find the button, meaning all the items have been loaded, it breaks out of the while loop and moves on.

from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException

url = 'https://www.coolmod.com/componentes-pc-procesadores?f=375::No'

driver.get(url)
time.sleep(3)
popup = driver.find_element_by_class_name('confirm').click()

iter = 1
while iter > 0:
    time.sleep(3)
    try:
        # Locate the single load-more button, move to it and click it.
        ver_mas = driver.find_element_by_class_name('button-load-more')
        actions = ActionChains(driver)
        actions.move_to_element(ver_mas).perform()
        driver.execute_script("arguments[0].click();", ver_mas)

    except NoSuchElementException:
        # The button is gone, so every item has been loaded.
        break
    iter += 1

page_source = driver.page_source

soup = BeautifulSoup(page_source, 'lxml')
# print(soup)

items = soup.find_all('div', class_='col-xs-12 col-sm-6 col-sm-6 col-md-6 col-lg-3 col-product col-custom-width')
print(len(items))


Source: https://stackoverflow.com/questions/65833515/scraping-with-selenium-and-beautifulsoup-doesn%c2%b4t-return-all-the-items-in-the-pag
