I am trying to scrape ratings from trustpilot.com.
Is it possible to extract a class name using scrapy? I am trying to scrape a rating which is made up of five indi
I had a similar question. Using Scrapy v1.5.1, I could extract attributes of elements by name. Here is an example used on Lowes; I did the same with the class attribute:
for product in response.css('ul.product-cards-grid li.product-wrapper'):
    # Read each data-* attribute by name with the ::attr() pseudo-element.
    prod_href = product.css('li::attr(data-producturl)').extract()  # list of all matches
    prod_name = product.css('li::attr(data-producttitle)').extract_first()
    prod_img = product.css('li::attr(data-productimg)').extract_first()
    prod_id = product.css('li::attr(data-productid)').extract_first()
You could use a combination of both somewhere in your code:
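import re

# Collect the full class attribute of every .star-rating element.
classes = response.css('.star-rating').xpath("@class").extract()
for cls in classes:
    # Pick out the "count-N" token, which carries the rating.
    match = re.search(r'\bcount-\d+\b', cls)
    if match:
        print("Class = {}".format(match.group(0)))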
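If you need the rating as a number rather than the class token itself, the regex step can be pulled into a small helper. This is only a sketch: `rating_from_class` is a hypothetical name, and it assumes the rating is always encoded as a `count-N` token in the class attribute, as in the snippet above.

```python
import re

# Assumption: the rating is encoded as a "count-N" class token.
COUNT_RE = re.compile(r'\bcount-(\d+)\b')

def rating_from_class(class_attr):
    """Return N from a 'count-N' token as an int, or None if there is none."""
    match = COUNT_RE.search(class_attr)
    return int(match.group(1)) if match else None
```

You could then call `rating_from_class(cls)` on each string returned by `.extract()`.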
You can extract the rating directly using re_first() and re():
for rating in response.xpath('//div[contains(@class, "star-rating")]/@class').re(r'count-(\d+)'):
    print(rating)