scrapy: request url must be str or unicode, got Selector

Submitted by 老子叫甜甜 on 2020-07-19 08:54:16

Question


I am writing a spider with Scrapy to scrape user details from Pinterest. I am trying to get the details of a user and their followers (and so on, until the last node).

Below is the spider code:

from scrapy.spider import BaseSpider
import scrapy
from pinners.items import PinterestItem
from scrapy.http import FormRequest
from urlparse import urlparse


class Sample(BaseSpider):
    name = 'sample'
    allowed_domains = ['pinterest.com']
    start_urls = ['https://www.pinterest.com/banka/followers', ]

    def parse(self, response):
        for base_url in response.xpath('//div[@class="Module User gridItem"]/a/@href'):
            list_a = response.urljoin(base_url.extract())
            for new_urls in response.xpath('//div[@class="Module User gridItem"]/a/@href'):
                yield scrapy.Request(new_urls, callback=self.Next)
        yield scrapy.Request(list_a, callback=self.Next)

    def Next(self, response):
        href_base = response.xpath('//div[@class = "tabs"]/ul/li/a')
        href_board = href_base.xpath('//div[@class="BoardCount Module"]')
        href_pin = href_base.xpath('.//div[@class="Module PinCount"]')
        href_like = href_base.xpath('.//div[@class="LikeCount Module"]')
        href_followers = href_base.xpath('.//div[@class="FollowerCount Module"]')
        href_following = href_base.xpath('.//div[@class="FollowingCount Module"]')
        item = PinterestItem()
        item["Board_Count"] = href_board.xpath('.//span[@class="value"]/text()').extract()[0]
        item["Pin_Count"] = href_pin.xpath('.//span[@class="value"]/text()').extract()
        item["Like_Count"] = href_like.xpath('.//span[@class="value"]/text()').extract()
        item["Followers_Count"] = href_followers.xpath('.//span[@class="value"]/text()').extract()
        item["Following_Count"] = href_following.xpath('.//span[@class="value"]/text()').extract()
        item["User_ID"] = response.xpath('//link[@rel="canonical"]/@href').extract()[0]
        yield item

I get the following error:

raise TypeError('Request url must be str or unicode, got %s:' % type(url).__name__)
TypeError: Request url must be str or unicode, got Selector:

I checked the type of list_a (the extracted URLs), and it is unicode.


Answer 1:


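The error is raised by the inner for loop in the parse method:

for new_urls in response.xpath('//div[@class="Module User gridItem"]/a/@href'):
    yield scrapy.Request(new_urls, callback=self.Next)

The new_urls variable is a Selector, not a string: iterating over response.xpath(...) yields Selector objects, while scrapy.Request only accepts a URL string. Extract the value and make it absolute before building the request. The inner loop also repeats the same XPath as the outer loop, so it can be dropped. Try something like this: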

for base_url in response.xpath('//div[@class="Module User gridItem"]/a/@href'):
    list_a = response.urljoin(base_url.extract())        
    yield scrapy.Request(list_a, callback=self.Next)
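
To see the difference between the failing and the corrected loop without running a crawl, here is a small dependency-free sketch. It does not use Scrapy: `FakeSelector`, `make_request`, and `build_requests` are hypothetical stand-ins that only imitate the type check behind the error message, and the `/alice/` and `/bob/` paths are made-up hrefs. `urllib.parse.urljoin` stands in for `response.urljoin`, on the assumption that the page sets no `<base>` tag.

```python
from urllib.parse import urljoin


class FakeSelector:
    """Hypothetical stand-in for a Scrapy Selector wrapping one @href value."""

    def __init__(self, value):
        self._value = value

    def extract(self):
        # Like Selector.extract(), return the matched text as a plain string.
        return self._value


def make_request(url):
    """Hypothetical stand-in that imitates the URL type check from the traceback."""
    if not isinstance(url, str):
        raise TypeError(
            "Request url must be str or unicode, got %s:" % type(url).__name__
        )
    return url


def build_requests(page_url, selectors):
    """Mirror the corrected loop: extract each href, make it absolute, request it."""
    requests = []
    for sel in selectors:
        absolute = urljoin(page_url, sel.extract())
        requests.append(make_request(absolute))
    return requests


page_url = "https://www.pinterest.com/banka/followers"
selectors = [FakeSelector("/alice/"), FakeSelector("/bob/")]  # made-up hrefs

# Passing the selector object itself reproduces the kind of error in the question.
try:
    make_request(selectors[0])
except TypeError as exc:
    error_message = str(exc)

# Extracting and joining first gives plain absolute URL strings.
urls = build_requests(page_url, selectors)
```

In the real spider the same pattern is the single loop above: call extract() on each selector, pass the result through response.urljoin, and only then hand it to scrapy.Request.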


Source: https://stackoverflow.com/questions/37604916/scrapy-request-url-must-be-str-or-unicode-got-selector
