Get text inside a span class of a particular div

I am scraping the T-Mobile website for reviews on Samsung Galaxy S9. I am able to create a Beautiful Soup object for the HTML code, but I cannot fetch the text of reviews which

3 Answers

眼角桃花

2021-01-29 11:42

You are not getting the data because the reviews are loaded dynamically by a script, so they are not in the HTML that Beautiful Soup receives. You can try Selenium together with Scrapy.

import scrapy
from scrapy.http import HtmlResponse
from selenium import webdriver


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['t-mobile.com']
    start_urls = ['https://www.t-mobile.com/cell-phone/samsung-galaxy-s9']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # A real browser runs the page's JavaScript before we read the HTML.
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Reload the URL in the browser and wrap the rendered HTML in a Scrapy response.
        self.driver.get(response.url)
        body = self.driver.page_source.encode('utf-8')
        rendered = HtmlResponse(self.driver.current_url, body=body, encoding='utf-8')
        self.parse_response(rendered)

    def parse_response(self, response):
        tmo_ratings_s9 = []
        for review in response.css('#reviews div.BVRRContentReview'):
            # default='' avoids calling .strip() on None when no text node matches.
            text = review.css('.BVRRReviewText::text').get(default='').strip()
            tmo_ratings_s9.append(text)

        print(tmo_ratings_s9)

    def closed(self, reason):
        # Scrapy calls closed() automatically when the spider finishes.
        self.driver.quit()

小鲜肉

2021-01-29 11:44

First, if you are using Google Chrome or Mozilla Firefox, press Ctrl+U on the page to open the page source. Search for a few keywords from a review to check whether the review content is present anywhere in the source. If it is, write an XPath for that data. If it is not, check the Network tab of the developer tools for any JSON requests sent while the page loads. If there are none, you will have to use Selenium.
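The Ctrl+U check can also be done from a script. This is only a minimal sketch using the standard library; the helper names `fetch_source` and `keyword_in_source` are made up for illustration, and the User-Agent header is an assumption (some sites reject requests without one).

```python
from urllib.request import Request, urlopen


def fetch_source(url, timeout=10.0):
    """Download the raw HTML, as "view source" shows it (no JavaScript is run)."""
    # Hypothetical helper; the User-Agent value is an assumption, not a requirement.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def keyword_in_source(html, keyword):
    """Return True if the keyword occurs in the raw HTML, ignoring case."""
    return keyword.casefold() in html.casefold()
```

If `keyword_in_source(fetch_source(url), "some phrase from a review")` is False, the reviews are probably injected by a script, and you should look at the network requests next.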

In your case, send a request to this page: https://t-mobile.ugc.bazaarvoice.com/9060redes2-en_us/E4F08F7E-8C29-4420-BE87-9226A6C0509D/reviews.djs?format=embeddedhtml

This is the request the page sends to fetch the reviews while it loads.
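Once you have the review markup as plain HTML, pulling out the text by class name needs nothing beyond the standard library. The sketch below assumes the reviews sit in elements with the class `BVRRReviewText` (taken from the selector in the first answer); I have not verified the exact format of the response from that URL, so you may need to unwrap or unescape it first. `ClassTextExtractor` and `extract_texts` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # nesting depth inside a matching element, 0 = outside
        self.parts = []   # text fragments of the element being read
        self.texts = []   # finished, whitespace-normalized texts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.texts.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_texts(html, class_name):
    """Return the text of each element whose class list contains class_name."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Calling `extract_texts(html, "BVRRReviewText")` returns one string per matching element. If you already have Beautiful Soup installed, `soup.select(".BVRRReviewText")` does the same job with less code.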

灰色年华

2021-01-29 12:00

Use Selenium or webscraper.io:

https://www.webscraper.io/

https://www.seleniumhq.org/docs/01_introducing_selenium.jsp
