How to control the order of yield in Scrapy

牧云@^-^@ 提交于 2020-01-01 12:01:37

问题


Help! Reading the following scrapy code and the result of crawler. I want to crawl some data from http://china.fathom.info/data/data.json, and only Scrapy is allowed. But I don't know how to control the order of yield. I look forward to process all parse_member request in the loop and then return the group_item, but seems yield item is always executed before yield request.

start_urls = [
    "http://china.fathom.info/data/data.json"
]

def parse(self, response):
    groups = json.loads(response.body)['group_members']
    for i in groups:
        group_item = GroupItem()
        group_item['name'] = groups[i]['name']
        group_item['chinese'] = groups[i]['chinese']
        group_item['members'] = []

        members = groups[i]['members']
        for member in members:
            yield Request(self.person_url % member['id'], meta={'group_item': group_item, 'member': member},
                          callback=self.parse_member, priority=100)
        yield group_item

def parse_member(self, response):
    group_item = response.meta['group_item']
    member = response.meta['member']
    person = json.loads(response.body)
    ego = person['ego']
    group_item['members'].append({
        'id': ego['id'],
        'name': ego['name'],
        'chinese': ego['chinese'],
        'role': member['role']
    })

Data on MongoDB


回答1:


you need to yield the item on the final callback, parse isn't stopping for parse_member to finish, so the group_item in parse isn't changing while parse_member is working.

Don't yield the group_item of parse, just the one on parse_member, as you already copied the previous item on meta and you already recovered it on parse_member with response.meta['group_item']



来源:https://stackoverflow.com/questions/33875339/how-to-control-the-order-of-yield-in-scrapy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!