I am scraping the T-Mobile website for reviews on Samsung Galaxy S9. I am able to create a Beautiful Soup object for the HTML code, but I cannot fetch the text of reviews which
You are not getting the data because the reviews are loaded dynamically by a script after the initial HTML arrives. You can try Selenium together with Scrapy.
import scrapy
from scrapy.http import HtmlResponse
from selenium import webdriver


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['t-mobile.com']
    start_urls = ['https://www.t-mobile.com/cell-phone/samsung-galaxy-s9']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Requires Firefox and geckodriver to be installed.
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Load the page in a real browser so the review scripts run.
        self.driver.get(response.url)
        body = self.driver.page_source.encode('utf-8')
        self.parse_response(HtmlResponse(self.driver.current_url, body=body, encoding='utf-8'))

    def parse_response(self, response):
        tmo_ratings_s9 = []
        for review in response.css('#reviews div.BVRRContentReview'):
            text = review.css('.BVRRReviewText::text').get()
            if text:  # skip reviews that have no text node
                tmo_ratings_s9.append(text.strip())
        print(tmo_ratings_s9)

    def closed(self, reason):
        # Scrapy calls closed() automatically when the spider finishes.
        self.driver.quit()
First, if you are using Google Chrome or Mozilla Firefox, press Ctrl+U on the page to open the page source. Search for a few keywords from a review to check whether the review content is present anywhere in the source. If it is, write an XPath for that data. If it is not, check the Network tab of the developer tools for any JSON requests sent while the page loads. If there are none, you will have to use Selenium.
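The Ctrl+U check described above can also be scripted. This is only a minimal sketch using the standard library: `fetch_html` and `contains_keyword` are hypothetical helper names, and the browser-like User-Agent header is an assumption in case the site rejects the default one.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download the raw page source, roughly what Ctrl+U shows (no scripts are run)."""
    # Assumption: a browser-like User-Agent avoids being rejected by the server.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def contains_keyword(html, keyword):
    """Case-insensitive check for a phrase in the static source."""
    return keyword.lower() in html.lower()
```

Calling `contains_keyword(fetch_html(url), phrase)` with a phrase copied from a review you can see in the browser tells you whether plain requests are enough. If it returns False, the reviews are most likely injected by a script.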
In your case, send a request to this page: https://t-mobile.ugc.bazaarvoice.com/9060redes2-en_us/E4F08F7E-8C29-4420-BE87-9226A6C0509D/reviews.djs?format=embeddedhtml
This is a JSON request that is sent while the whole page loads.
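Once you have the body of that response, you still need to pull the review text out of it. The sketch below is one possible way, using only the standard library. It assumes the markup contains elements with the class `BVRRReviewText` (the class name used in the Scrapy selector earlier in this thread), and `ReviewTextParser` and `extract_reviews` are hypothetical names. If the response wraps the HTML in JavaScript with escaped quotes, you would have to unescape it before feeding it in.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text inside elements whose class list includes a target class."""

    # Tags without a closing tag; they must not affect the nesting depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_class="BVRRReviewText"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.current = []   # text pieces of the review being read
        self.reviews = []   # finished review texts

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace and store the finished review.
            text = " ".join("".join(self.current).split())
            if text:
                self.reviews.append(text)
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_reviews(html):
    """Return the list of review texts found in an HTML string."""
    parser = ReviewTextParser()
    parser.feed(html)
    parser.close()
    return parser.reviews
```

Tracking the nesting depth rather than a single flag keeps text from nested tags (for example a `<b>` inside the review) in the same review entry.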
Use Selenium or webscraper.io:
https://www.webscraper.io/
https://www.seleniumhq.org/docs/01_introducing_selenium.jsp