Question
I tried to click "More" button for each review so that I can expand these text reviews to the full contents and then I try to scrape those text reviews. Without clicking "More" button, what I end up retrieving is something like
"This room was nice and clean. The location...More".
I tried a few different approaches, such as Selenium's button click and ActionChains, but I guess I'm not using them properly. Could someone help me out with this issue?
Below is my current code. I didn't include the whole script, to avoid some unnecessary output (I tried to keep it simple).
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
driver = webdriver.Chrome(executable_path="C:/Users/chromedriver.exe", chrome_options=option)

#url I want to visit.
lists = ['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for k in lists:
    driver.get(k)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    time.sleep(3)

    listing = soup.find_all("div", class_="review-container")
    for i in range(len(listing)):
        try:
            #First, I tried this but it didn't work.
            #link = driver.find_element_by_link_text('More')
            #driver.execute_script("arguments[0].click();", link)
            #Second, I tried ActionChains but it didn't work.
            ActionChains(driver).move_to_element(i).click().perform()
        except:
            pass
        text_review = soup.find_all("div", class_="prw_rup prw_reviews_text_summary_hsx")
        text_review_inside = text_review[i].find("p", class_="partial_entry")
        review_text = text_review_inside.text
        print(review_text)
Answer 1:
Your biggest mistake in all this code is except: pass.
Without it you would have resolved the problem long ago: the code raises an error with all the information, but you can't see it. You could at least use

except Exception as ex:
    print(ex)
The problem is that move_to_element() will not work with BeautifulSoup elements. It has to be a Selenium element, like

link = driver.find_element_by_link_text('More')
ActionChains(driver).move_to_element(link).perform()
But after executing some actions, Selenium needs time to finish them, so Python has to wait a while.
I don't use BeautifulSoup to get data, but if you want to use it, then get driver.page_source after clicking all the links; otherwise you will have to get driver.page_source again after every click.
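A minimal sketch of that re-parse step, using a static HTML string as a stand-in for a live driver.page_source (the snippet below is a made-up simplification of the real page markup):

```python
from bs4 import BeautifulSoup

# In the real script this string would be driver.page_source,
# captured AFTER all the "More" links have been clicked.
html = """
<div class="review-container">
  <p class="partial_entry">This room was nice and clean. The location was great too.</p>
</div>
"""

# Parse the snapshot once, then query it as many times as needed.
soup = BeautifulSoup(html, "html.parser")
for entry in soup.find_all("p", class_="partial_entry"):
    print(entry.text.strip())
```

The point is that a BeautifulSoup object is a frozen copy of the page: clicking in the browser afterwards does not update it, so the snapshot has to be taken after the last click.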
Sometimes after clicking you may even have to get the Selenium elements again, so I first get entry to click "More", and later I get partial_entry to read the reviews.
I found that clicking "More" in the first review shows the full text for all reviews, so there is no need to click every "More".
Tested with Firefox 69, Linux Mint 19.2, Python 3.7.5, Selenium 3.141
#from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
#driver = webdriver.Chrome(executable_path="C:/Users/chromedriver.exe", chrome_options=option)
driver = webdriver.Firefox()

#url I want to visit.
lists = ['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for url in lists:
    driver.get(url)
    time.sleep(3)

    link = driver.find_element_by_link_text('More')
    try:
        ActionChains(driver).move_to_element(link).perform()
        time.sleep(1)  # time to move to link
        link.click()
        time.sleep(1)  # time to update HTML
    except Exception as ex:
        print(ex)

    description = driver.find_element_by_class_name('vr-overview-Overview__propertyDescription--1lhgd')
    print('--- description ---')
    print(description.text)
    print('--- end ---')

    # first "More" shows text in all reviews - there is no need to search other "More"
    first_entry = driver.find_element_by_class_name('entry')
    more = first_entry.find_element_by_tag_name('span')

    try:
        ActionChains(driver).move_to_element(more).perform()
        time.sleep(1)  # time to move to link
        more.click()
        time.sleep(1)  # time to update HTML
    except Exception as ex:
        print(ex)

    all_reviews = driver.find_elements_by_class_name('partial_entry')
    print('all_reviews:', len(all_reviews))
    for i, review in enumerate(all_reviews, 1):
        print('--- review', i, '---')
        print(review.text)
        print('--- end ---')
EDIT:
To skip responses, I search for all class="wrap" elements, and inside every wrap I search for class="partial_entry". Every wrap can contain at most one review and possibly one response, and the review always has index [0]. Some wraps don't hold a review, so they give an empty list; I have to check for that before I can get element [0] from the list.
all_reviews = driver.find_elements_by_class_name('wrap')
#print('all_reviews:', len(all_reviews))

for review in all_reviews:
    all_entries = review.find_elements_by_class_name('partial_entry')
    if all_entries:
        print('--- review ---')
        print(all_entries[0].text)
        print('--- end ---')
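The same filtering logic can be sketched with BeautifulSoup against a static snapshot, which makes the empty-list check and the [0] index easy to see (the HTML below is a made-up simplification of the real page):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for driver.page_source: the second "wrap"
# holds a review plus an owner response; the third holds no review.
html = """
<div class="wrap"><p class="partial_entry">Great stay!</p></div>
<div class="wrap">
  <p class="partial_entry">Nice and clean.</p>
  <p class="partial_entry">Thanks for visiting! (owner response)</p>
</div>
<div class="wrap"><span>no review here</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
reviews = []
for wrap in soup.find_all("div", class_="wrap"):
    entries = wrap.find_all("p", class_="partial_entry")
    if entries:                          # skip wraps without a review
        reviews.append(entries[0].text)  # [0] is the review, not the response

print(reviews)  # ['Great stay!', 'Nice and clean.']
```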
Source: https://stackoverflow.com/questions/58550908/python-click-more-button-is-not-working