Scrapy multiple requests and fill single item


I need to make two requests to different URLs and put the resulting information into the same item. I have tried this method, but the result is written in different rows. The callbacks …

1 Answer

    Since Scrapy is asynchronous, you need to chain your requests manually. For transferring data between requests you can use the Request's meta attribute:

    from scrapy import Request

    def parse(self, response):
        item = dict()
        item['name'] = 'foobar'
        # pass the partially filled item to the next callback via meta
        yield Request('http://someurl.com', callback=self.parse2,
                      meta={'item': item})

    def parse2(self, response):
        # the item arrives attached to the response
        print(response.meta['item'])
        # {'name': 'foobar'}
    
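    Newer Scrapy versions (1.7 and later) also accept a cb_kwargs argument on Request, which hands the data to the callback as keyword arguments instead of going through meta. A minimal sketch of the same chain using it:

    from scrapy import Request

    def parse(self, response):
        item = {'name': 'foobar'}
        # entries in cb_kwargs become keyword arguments of the callback
        yield Request('http://someurl.com', callback=self.parse2,
                      cb_kwargs={'item': item})

    def parse2(self, response, item):
        print(item)
        # {'name': 'foobar'}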

    In your case you end up with a split chain: each company yields two independent requests, so their callbacks fill two separate items. You need one continuous chain that yields the item only at the end.
    Your code should look something like this:

    import json

    from scrapy import Request
    # ThalamusItem is the item class defined in your project's items module

    def parse_companies(self, response):
        data = json.loads(response.body)
        if not data:
            return
        for company in data:
            item = ThalamusItem()
            comp_id = company["id"]
            url = self.request_details_URL + str(comp_id) + ".json"
            url2 = self.request_contacts + str(comp_id)
            # start the chain: request the details page and carry the
            # item plus the follow-up contacts URL along in meta
            request = Request(url, callback=self.parse_details,
                              meta={'url2': url2, 'item': item})
            yield request

    def parse_details(self, response):
        item = response.meta['item']
        url2 = response.meta['url2']
        item['details'] = ''  # add details
        # continue the chain with the contacts request
        yield Request(url2, callback=self.parse_contacts, meta={'item': item})

    def parse_contacts(self, response):
        item = response.meta['item']
        item['contacts'] = ''  # add contacts
        # the item is complete only here, so this is where it gets yielded
        yield item
    
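    For the assignments above to work, ThalamusItem has to declare the details and contacts fields, since Scrapy items raise a KeyError on undeclared fields. A sketch of the declaration, assuming only the fields used here (your real item may have more):

    import scrapy

    class ThalamusItem(scrapy.Item):
        details = scrapy.Field()
        contacts = scrapy.Field()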