Extract class name in scrapy

粉色の甜心

I am trying to scrape ratings from trustpilot.com.

Is it possible to extract a class name using scrapy? I am trying to scrape a rating which is made up of five indi

3 Answers

隐瞒了意图╮

2021-01-19 12:00

I had a similar question. With Scrapy v1.5.1 I could extract element attributes by name using ::attr(). Here is an example I used on Lowes; the same approach worked for the class attribute, i.e. ::attr(class):

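    # Each matched li carries the product data in its data-* attributes.
    for product in response.css('ul.product-cards-grid li.product-wrapper'):
        # extract() returns a list of all matches; extract_first() returns one value or None.
        prod_href = product.css('li::attr(data-producturl)').extract()
        prod_name = product.css('li::attr(data-producttitle)').extract_first()
        prod_img  = product.css('li::attr(data-productimg)').extract_first()
        prod_id   = product.css('li::attr(data-productid)').extract_first()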

长发绾君心

2021-01-19 12:10

You can combine a CSS selector with an XPath attribute lookup, then pick the rating class out of each class string with a regular expression:

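    import re

    # Collect the full class attribute of every element with the star-rating class.
    classes = response.css('.star-rating').xpath("@class").extract()
    for cls in classes:
        # Look for a class of the form count-N inside the attribute value.
        match = re.search(r'\bcount-\d+\b', cls)
        if match:
            print("Class = {}".format(match.group(0)))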
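
The word boundaries in that pattern matter when another class merely contains the same text. Here is a minimal sketch using only the standard re module. The class strings are made-up examples, not values taken from trustpilot.com, and find_count_class is a hypothetical helper name:

```python
import re

# Same pattern as in the loop above.
COUNT_CLASS = re.compile(r'\bcount-\d+\b')

def find_count_class(class_attr):
    """Return the first count-N match in a class attribute string, or None."""
    match = COUNT_CLASS.search(class_attr)
    return match.group(0) if match else None

# Hypothetical class attribute values, for illustration only.
print(find_count_class("star-rating count-4 size-medium"))  # count-4
print(find_count_class("badge discount-5"))                 # None
```

Note that a hyphen is not a word character, so a class such as review-count-12 would still match; if that can occur on the page, splitting the attribute on whitespace and matching whole tokens is the safer choice.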

死守一世寂寞

2021-01-19 12:13

You can extract the rating directly with the selector's re() method, or with re_first() if you only want the first match:
```
for rating in response.xpath('//div[contains(@class, "star-rating")]/@class').re(r'count-(\d+)'):
    print(rating)
```
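
The selector regex methods return strings, so the captured digits usually need converting before any arithmetic. Below is a small sketch of that step on plain strings, using the standard re module in place of Scrapy's selector methods. The name rating_from_class and the sample values are hypothetical:

```python
import re

def rating_from_class(class_attr):
    """Return the N from a count-N class as an int, or None if absent."""
    match = re.search(r'count-(\d+)', class_attr)
    return int(match.group(1)) if match else None

# Made-up class attribute values standing in for scraped ones.
samples = ["star-rating count-5", "star-rating count-3", "star-rating"]
ratings = [r for r in map(rating_from_class, samples) if r is not None]
print(ratings)  # [5, 3]
```

In a spider, the input strings would come from the same @class XPath shown above, collected with extract() instead of re().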