How do I merge results from the target page into the current page in Scrapy?

I don't have a specific code issue; I'm just not sure how to approach the following problem logistically with the Scrapy framework: the data I want for each item is spread across more than one page. Please read the Scrapy docs first to understand what I'm saying.
The answer:

To scrape additional fields that live on other pages: in your parse method, extract the URL of the page with the additional info, create a Request object for that URL, attach the already-extracted data to it via its meta parameter, and return (or yield) that Request from the parse method.
An example from the Scrapy documentation:
def parse_page1(self, response):
    item = MyItem()
    item['main_url'] = response.url
    # Request the page that holds the remaining fields and carry the
    # partially filled item along in request.meta.
    request = scrapy.Request("http://www.example.com/some_page.html",
                             callback=self.parse_page2)
    request.meta['item'] = item
    yield request

def parse_page2(self, response):
    # Pull the item started in parse_page1 back out of meta and finish it.
    item = response.meta['item']
    item['other_url'] = response.url
    yield item
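To show how these two callbacks fit into a working spider, here is a minimal sketch. The spider name, start URL, CSS selectors, and field names below are hypothetical placeholders rather than anything from the Scrapy docs; only the meta-passing pattern itself is the point.

import scrapy

class ExampleSpider(scrapy.Spider):
    # Hypothetical spider: name, URL, and selectors are placeholders.
    name = "example"
    start_urls = ["http://www.example.com/list.html"]

    def parse(self, response):
        # First page: start one item per listing row.
        for row in response.css("div.listing"):
            item = {"title": row.css("a::text").get()}
            details_url = response.urljoin(row.css("a::attr(href)").get())
            request = scrapy.Request(details_url, callback=self.parse_details)
            # Hand the partially filled item to the next callback.
            request.meta["item"] = item
            yield request

    def parse_details(self, response):
        # Second page: pull the item back out of meta and finish it.
        item = response.meta["item"]
        item["description"] = response.css("p.description::text").get()
        yield item

Each listing row yields its own Request, so many items can be in flight at once; each partially filled item travels with its own request and nothing is shared between them.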
You can also use Python's functools.partial to pass an item, or any other serializable data, via additional arguments to the next Scrapy callback. Something like:
import functools
import scrapy

# Inside your Spider class:
def parse(self, response):
    # ...
    # Process the first response here, populate item and next_url.
    # ...
    # Bind the extra arguments now; Scrapy will supply `response` later.
    callback = functools.partial(self.parse_next, item, someotherarg)
    return scrapy.Request(next_url, callback=callback)

def parse_next(self, item, someotherarg, response):
    # ...
    # Process the second response here.
    # ...
    return item
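To see why response ends up as the last parameter of parse_next, here is a tiny standalone demonstration of how functools.partial binds arguments; the item dict and the fake response string are just placeholder values, with no Scrapy involved:

import functools

def parse_next(item, someotherarg, response):
    # Same parameter order as the spider method above (minus self).
    return item, someotherarg, response

# Bind the first two arguments now, exactly as the spider's parse() does.
callback = functools.partial(parse_next, {"main_url": "..."}, "extra")

# Scrapy would later call callback(response); the one positional argument
# fills the remaining slot, i.e. `response`.
print(callback("<fake response object>"))
# -> ({'main_url': '...'}, 'extra', '<fake response object>')

Because the partial already holds item and someotherarg, the single-argument call callback(response) fills the one remaining slot, which is why response must come last in parse_next's signature.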