Scrapy: Follow link to get additional Item data?

无人共我 2020-11-29 00:32

I don't have a specific code issue; I'm just not sure how to approach the following problem logistically with the Scrapy framework:

The structure of the data I want

3 Answers
  • 2020-11-29 00:52

    Please first read the docs to understand what I'm saying.

    The answer:

    To scrape additional fields that live on other pages: in your parse method, extract the URL of the page with the additional info, create a Request object for that URL, pass the already-extracted data via its meta parameter, and return that Request from the parse method.

    How do I merge results from target page to current page in scrapy?
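
    Stripped of Scrapy specifics, the handoff described above is just attaching a dict of already-extracted data to the follow-up request and reading it back in the second callback. A plain-Python sketch of that flow (the field names and the `parse`/`parse_detail` split are illustrative, not Scrapy API):

    ```python
    # parse() extracts partial data plus a follow-up URL, attaches the data
    # to the request's meta dict; parse_detail() merges in the rest.

    def parse(listing_page):
        item = {"title": listing_page["title"]}          # data from page 1
        request = {"url": listing_page["detail_url"],    # follow-up request
                   "meta": {"item": item}}               # carry the item along
        return request

    def parse_detail(response, meta):
        item = meta["item"]                  # recover the partially-filled item
        item["price"] = response["price"]    # add the field from page 2
        return item

    req = parse({"title": "Widget", "detail_url": "/w/1"})
    item = parse_detail({"price": 9.99}, req["meta"])
    # item == {"title": "Widget", "price": 9.99}
    ```

    In real Scrapy code the dicts above become `scrapy.Request` and `Response` objects, but the merging logic is the same.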

  • 2020-11-29 00:59

    An example from the Scrapy documentation:

    import scrapy

    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        # Request the second page; stash the partially-filled item in meta.
        request = scrapy.Request("http://www.example.com/some_page.html",
                                 callback=self.parse_page2)
        request.meta['item'] = item
        yield request

    def parse_page2(self, response):
        # Recover the item carried over from parse_page1 and finish filling it.
        item = response.meta['item']
        item['other_url'] = response.url
        yield item
    
  • 2020-11-29 01:07

    You can also use Python's functools.partial to pass an item (or any other data) via additional arguments to the next Scrapy callback.

    Something like:

    import functools
    import scrapy

    # Inside your Spider class:

    def parse(self, response):
        # ...
        # Process the first response here, populate item and next_url.
        # ...
        # Bind item and someotherarg now; Scrapy supplies response last.
        callback = functools.partial(self.parse_next, item, someotherarg)
        return scrapy.Request(next_url, callback=callback)

    def parse_next(self, item, someotherarg, response):
        # ...
        # Process the second response here.
        # ...
        return item
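
    The detail that makes this work: functools.partial prepends the bound arguments, which is why response arrives last in parse_next. A stdlib-only check of that ordering (toy names, no Scrapy required):

    ```python
    import functools

    def parse_next(item, someotherarg, response):
        # Bound args come first; the caller's own arg (response) comes last.
        return (item, someotherarg, response)

    # Bind the first two arguments up front, as the spider's parse() would.
    cb = functools.partial(parse_next, {"id": 1}, "extra")

    result = cb("fake-response")
    # result == ({"id": 1}, "extra", "fake-response")
    ```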
    