How to collect data from multiple pages into a single data structure with Scrapy

一个人的身影 2021-02-06 04:28

I am trying to scrape data from a site. The data is structured as multiple objects, each with its own set of attributes: for example, people with names, ages, and occupations.

My prob

1 answer
  •  滥情空心
    2021-02-06 04:43

    Here is one way to handle this: yield (or return) the item only once, after it has all of its attributes. Until then, pass the partially filled item along with the next request through meta.

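    import scrapy
    from scrapy import Request


    class TestItem(scrapy.Item):
        # Assumed definition; declare whatever fields your item needs.
        name = scrapy.Field()
        age = scrapy.Field()
        occupation = scrapy.Field()


    class PeopleSpider(scrapy.Spider):
        name = 'people'  # hypothetical spider name

        def start_requests(self):
            page1 = 'url to the first page'  # placeholder
            yield Request(page1,
                          callback=self.page1_data)

        def page1_data(self, response):
            # Fill in the fields available on the first page.
            # Extract real values with response.xpath(...); the old
            # HtmlXPathSelector wrapper is no longer needed.
            i = TestItem()
            i['name'] = 'name'
            i['age'] = 'age'
            url_profile_page = 'url to the profile page'

            # Do not yield the item yet; attach it to the next request.
            yield Request(url_profile_page,
                          meta={'item': i},
                          callback=self.profile_page)

        def profile_page(self, response):
            # response.meta is a shortcut for response.request.meta
            old_item = response.meta['item']
            # parse other fields
            # assign them to old_item

            # The item is complete now, so yield it exactly once.
            yield old_item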
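
To see why the item comes out only once, it can help to trace the control flow without a network. The following is a minimal sketch that does not use Scrapy at all. `Request`, `Response`, `crawl`, and `FAKE_PAGES` are hypothetical stand-ins written for this illustration, and the sample person is made up. It only mimics how a partially filled item travels in `meta` from one callback to the next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Made-up pages standing in for downloaded content: URL -> data on that page.
FAKE_PAGES: Dict[str, dict] = {
    "list": {"name": "Alice", "age": 30, "profile_url": "profile/alice"},
    "profile/alice": {"occupation": "engineer"},
}


@dataclass
class Request:
    """Toy stand-in for a request: only url, callback and meta."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class Response:
    """Toy stand-in for a response: page data plus the request's meta."""
    data: dict
    meta: dict


def page1_data(response):
    # First page: fill in what is available, then hand the item onward.
    item = {"name": response.data["name"], "age": response.data["age"]}
    yield Request(response.data["profile_url"],
                  callback=profile_page,
                  meta={"item": item})


def profile_page(response):
    # Second page: pick the item up again, complete it, and emit it.
    item = response.meta["item"]
    item["occupation"] = response.data["occupation"]
    yield item


def crawl(start: Request) -> List[dict]:
    """Tiny scheduler: follow yielded requests, collect everything else."""
    items, queue = [], [start]
    while queue:
        request = queue.pop(0)
        response = Response(FAKE_PAGES[request.url], request.meta)
        for result in request.callback(response):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl(Request("list", callback=page1_data))
print(items)  # [{'name': 'Alice', 'age': 30, 'occupation': 'engineer'}]
```

Each person produces exactly one item, and that item is emitted only by the last callback in the chain. The same shape extends to three or more pages: every intermediate callback yields a request carrying the item, and only the final one yields the item itself.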
    
