I am trying to scrape data from a site. The data is structured as multiple objects, each with its own set of fields: for example, people with names, ages, and occupations.
My prob
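Here is one way to deal with this: yield (or return) the item only once, when it has all of its attributes. Until then, pass the partially filled item along to the next request through meta and finish it in that request's callback.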
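    # In the callback that discovers page1:
    yield Request(page1,
                  callback=self.page1_data)

    def page1_data(self, response):
        hxs = HtmlXPathSelector(response)
        i = TestItem()
        i['name'] = 'name'
        i['age'] = 'age'
        url_profile_page = 'url to the profile page'
        # Do not yield the item yet; attach it to the next request instead.
        yield Request(url_profile_page,
                      meta={'item': i},
                      callback=self.profile_page)

    def profile_page(self, response):
        hxs = HtmlXPathSelector(response)
        # Retrieve the partially filled item passed along via meta.
        old_item = response.request.meta['item']
        # parse other fields
        # assign them to old_item
        # The item is complete now, so yield it exactly once.
        yield old_item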
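
If it helps to see the flow outside of a real crawl, here is a rough stdlib-only toy model of the same callback chain. It is only a sketch: FakeRequest, FakeResponse, crawl, and the PAGES data are made up for illustration and are not Scrapy classes or real site data. The only point is that the first callback yields a request carrying the partial item, and only the last callback yields the item itself.

```python
# Toy model of chained callbacks. FakeRequest, FakeResponse and crawl are
# hypothetical stand-ins written for this example, NOT Scrapy classes.
from collections import deque


class FakeRequest:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


class FakeResponse:
    def __init__(self, request, data):
        self.request = request
        self.data = data


# Made-up page contents keyed by URL (illustrative data only).
PAGES = {
    "list": {"name": "Alice", "age": 30, "profile": "profile/alice"},
    "profile/alice": {"occupation": "engineer"},
}


def page1_data(response):
    # First page: fill in what is available, then pass the partial item on.
    item = {"name": response.data["name"], "age": response.data["age"]}
    yield FakeRequest(response.data["profile"], profile_page,
                      meta={"item": item})


def profile_page(response):
    # Second page: complete the item and yield it exactly once.
    old_item = response.request.meta["item"]
    old_item["occupation"] = response.data["occupation"]
    yield old_item


def crawl(start_url, callback):
    """Follow yielded requests; collect everything else as finished items."""
    queue = deque([FakeRequest(start_url, callback)])
    items = []
    while queue:
        request = queue.popleft()
        response = FakeResponse(request, PAGES[request.url])
        for result in request.callback(response):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl("list", page1_data)
print(items)
```

If page1_data also yielded its half-filled item, the output would contain two entries for the same person, which is the duplication this pattern avoids.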