Python click 'More' button is not working

喜夏-厌秋 submitted on 2020-01-25 08:05:26

Question


I tried to click the "More" button on each review so that I could expand the text reviews to their full content and then scrape them. Without clicking "More", what I end up retrieving is something like
"This room was nice and clean. The location...More".

I tried a few different approaches, such as Selenium's button click and ActionChains, but I guess I'm not using them properly. Could someone help me out with this issue?

Below is my current code. I didn't post the whole script, to avoid unnecessary output (I tried to keep it simple).

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

#url I want to visit.
lists=['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for k in lists:

    driver.get(k)
    html =driver.page_source
    soup=BeautifulSoup(html,"html.parser")
    time.sleep(3)
    listing=soup.find_all("div", class_="review-container")

    for i in range(len(listing)):

        try:
            #First, I tried this but didn't work.
            #link = driver.find_element_by_link_text('More')
            #driver.execute_script("arguments[0].click();", link)

            #Second, I tried ActionChains but didn't work.
            ActionChains(driver).move_to_element(i).click().perform()
        except:
            pass

        text_review=soup.find_all("div", class_="prw_rup prw_reviews_text_summary_hsx")
        text_review_inside=text_review[i].find("p", class_="partial_entry")
        review_text=text_review_inside.text

        print (review_text)

Answer 1:


The biggest mistake in all this code is except: pass. Without it you would have solved the problem long ago: the code raises an error message with all the information you need, but you can't see it. You could at least use

except Exception as ex:
    print(ex)
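A slightly more informative variant prints the full traceback instead of only the message. The helper below, click_or_report, is a hypothetical name (it is not part of Selenium). It also uses except Exception rather than a bare except:, which would additionally swallow KeyboardInterrupt and SystemExit.

```python
import traceback


def click_or_report(element, label="element"):
    """Click `element`; if that fails, print the full traceback and return False.

    Hypothetical helper, not part of Selenium. `element` is anything with a
    .click() method, normally a Selenium WebElement.
    """
    try:
        element.click()
        return True
    except Exception:
        # Unlike `except: pass`, this shows the exception type, its message
        # and the line that raised it (the traceback goes to stderr).
        print(f"could not click {label}:")
        traceback.print_exc()
        return False
```

Usage: if not click_or_report(link, "More"): ... then you know exactly why the click failed instead of silently getting the truncated text.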

The problem is that move_to_element() will not work with BeautifulSoup elements (and in your loop i is only an integer index from range()). It has to be a Selenium element, like

link = driver.find_element_by_link_text('More')

ActionChains(driver).move_to_element(link).perform()
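Another common way to bring an element into view before clicking is to scroll it with JavaScript through execute_script, which the question already uses for clicking. This is a swapped-in alternative to ActionChains, not what this answer does, and scroll_and_click is a hypothetical helper. Like move_to_element(), it only works with a Selenium WebElement, never with a BeautifulSoup tag.

```python
def scroll_and_click(driver, element):
    """Scroll `element` into the viewport with JavaScript, then click it.

    Hypothetical helper. `driver` must be a Selenium WebDriver and `element`
    a WebElement it returned (e.g. from driver.find_element(...));
    a BeautifulSoup tag or a loop index will not work here.
    """
    # execute_script hands `element` to the page's JavaScript as arguments[0]
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    element.click()
```

Usage: scroll_and_click(driver, driver.find_element("link text", "More")), where "link text" is the string value of Selenium's By.LINK_TEXT.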

After executing some functions, Selenium (and the browser) needs some time to finish them, so Python has to wait a while.
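Instead of a fixed time.sleep(), you can poll until the thing you need is actually there. Below is a minimal stdlib-only sketch of that idea; wait_until is a hypothetical helper, and Selenium ships its own version of the same idea as WebDriverWait.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25):
    """Call `condition()` until it returns a truthy value or `timeout` expires.

    Hypothetical helper illustrating the idea behind Selenium's WebDriverWait:
    poll repeatedly instead of sleeping for a fixed, guessed amount of time.
    Returns the first truthy result; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

Usage: more_links = wait_until(lambda: driver.find_elements("link text", "More")), since find_elements returns an empty (falsy) list until at least one match exists. With Selenium's own class the equivalent is WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "More"))), with WebDriverWait imported from selenium.webdriver.support.ui and expected_conditions imported as EC from selenium.webdriver.support.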

I don't use BeautifulSoup to get the data, but if you want to use it, get driver.page_source after clicking all the links. Otherwise you will have to get driver.page_source again after every click.
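A sketch of the "click first, then read page_source" order. To stay free of third-party packages it swaps BeautifulSoup for the standard library's html.parser; with BeautifulSoup the parsing step would be soup.find_all("p", class_="partial_entry"). PartialEntryParser, partial_entries and expanded_reviews are hypothetical names, and the class name partial_entry is taken from the question's code.

```python
import time
from html.parser import HTMLParser


class PartialEntryParser(HTMLParser):
    """Collect the text of every <p class="partial_entry"> element.

    Stdlib stand-in for
    BeautifulSoup(html, "html.parser").find_all("p", class_="partial_entry");
    assumes these <p> elements are not nested inside each other.
    """

    def __init__(self):
        super().__init__()
        self.entries = []
        self._inside = False  # True while inside a matching <p>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "partial_entry" in classes:
            self._inside = True
            self.entries.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # text of nested tags such as <b> is included, like BeautifulSoup's .text
        if self._inside:
            self.entries[-1] += data


def partial_entries(html):
    """Return the stripped text of each partial_entry paragraph in `html`."""
    parser = PartialEntryParser()
    parser.feed(html)
    parser.close()
    return [entry.strip() for entry in parser.entries]


def expanded_reviews(driver, more, delay=1.0):
    """Click "More" first, then snapshot driver.page_source and parse it.

    Hypothetical helper: `more` is the Selenium element to click.
    """
    more.click()
    time.sleep(delay)  # give the page time to replace the truncated text
    # page_source is a string copy of the DOM at the moment it is read,
    # so it has to be read after the click, not before
    return partial_entries(driver.page_source)
```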

Sometimes, after clicking, you may even have to find the Selenium elements again, so I first get the entry element to click More, and only afterwards get the partial_entry elements to read the reviews.

I found that clicking More in the first review shows the full text of all reviews, so there is no need to click every More.

Tested with Firefox 69, Linux Mint 19.2, Python 3.7.5, Selenium 3.141


#from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
#driver = webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

driver = webdriver.Firefox()

#url I want to visit.
lists = ['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for url in lists:

    driver.get(url)
    time.sleep(3)

    link = driver.find_element_by_link_text('More')

    try:
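        # without .perform() this only queues the move; link.click() scrolls the element into view on its own
        ActionChains(driver).move_to_element(link)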
        time.sleep(1) # time to move to link

        link.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    description = driver.find_element_by_class_name('vr-overview-Overview__propertyDescription--1lhgd')
    print('--- description ---')
    print(description.text)
    print('--- end ---')

    # clicking the first "More" shows the text of all reviews, so there is no need to search for the other "More" links
    first_entry = driver.find_element_by_class_name('entry')
    more = first_entry.find_element_by_tag_name('span')

    try:
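        # without .perform() this only queues the move; more.click() scrolls the element into view on its own
        ActionChains(driver).move_to_element(more)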
        time.sleep(1) # time to move to link

        more.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    all_reviews = driver.find_elements_by_class_name('partial_entry')
    print('all_reviews:', len(all_reviews))

    for i, review in enumerate(all_reviews, 1):
        print('--- review', i, '---')
        print(review.text)
        print('--- end ---')

EDIT:

To skip the responses, I search for all elements with class="wrap" and then, inside every wrap, for class="partial_entry". Every wrap contains at most one review and possibly one response, and the review always has index [0]. Some wraps don't contain a review, so they give an empty list, and I have to check for that before I can take element [0] from the list.

all_reviews = driver.find_elements_by_class_name('wrap')
#print('all_reviews:', len(all_reviews))

for review in all_reviews:
    all_entries = review.find_elements_by_class_name('partial_entry')
    if all_entries:
        print('--- review ---')
        print(all_entries[0].text)
        print('--- end ---')
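The same loop written as a function that returns the texts instead of printing them, which makes it easier to reuse or test. review_texts is a hypothetical name; "class name" is the string value of Selenium's By.CLASS_NAME, passed to find_elements(by, value), which exists both on the driver and on elements.

```python
def review_texts(driver):
    """Return the text of the first partial_entry inside each "wrap" element.

    Hypothetical helper with the same logic as the loop above:
    index [0] is the review, a possible [1] is the response, and wraps
    without any partial_entry are skipped.
    """
    texts = []
    for wrap in driver.find_elements("class name", "wrap"):
        entries = wrap.find_elements("class name", "partial_entry")
        if entries:  # wraps without a review give an empty list
            texts.append(entries[0].text)
    return texts
```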


Source: https://stackoverflow.com/questions/58550908/python-click-more-button-is-not-working
