Passing an argument to a callback function

Posted by 删除回忆录丶 on 2021-01-20 17:47:07

Question


def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        item['hclass'] = response.request.url.split("/")[8].split('-')[-1]
        item['server'] = response.request.url.split('/')[2].split('.')[0]
        item['hardcore'] = len(response.request.url.split("/")[8].split('-')) == 3
        item['seasonal'] = response.request.url.split("/")[6] == 'season'
        item['rank'] = sel.xpath('td[@class="cell-Rank"]/text()').extract()[0].strip()
        item['battle_tag'] = sel.xpath('td[@class="cell-BattleTag"]//a/text()').extract()[1].strip()
        item['grift'] = sel.xpath('td[@class="cell-RiftLevel"]/text()').extract()[0].strip()
        item['time'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        item['date'] = sel.xpath('td[@class="cell-RiftTime"]/text()').extract()[0].strip()
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()

        yield Request(url, callback=self.parse_profile)

def parse_profile(self, response):
    sel = Selector(response)
    item = HeroItem()
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item

I'm scraping a whole table in the main parse method and have taken several fields from it. One of these fields is a URL, and I want to follow it to get a whole new set of fields. How can I pass my already-created item object to the callback function so that the final item keeps all the fields?

As the code above shows, I can save either the fields from the URL (the code as it stands) or only the ones from the table (by simply writing yield item), but I can't yield a single object with all the fields together.

I have tried this, but it doesn't work:

yield Request(url, callback=self.parse_profile(item))

def parse_profile(self, response, item):
    sel = Selector(response)
    item['weapon'] = sel.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    return item

Answer 1:


This is what you'd use the meta keyword argument of Request for.

def parse(self, response):
    for sel in response.xpath('//tbody/tr'):
        item = HeroItem()
        # Item assignment here
        url = 'https://' + item['server'] + '.battle.net/' + sel.xpath('td[@class="cell-BattleTag"]//a/@href').extract()[0].strip()

        yield Request(url, callback=self.parse_profile, meta={'hero_item': item})

def parse_profile(self, response):
    item = response.meta.get('hero_item')
    item['weapon'] = response.xpath('//li[@class="slot-mainHand"]/a[@class="slot-link"]/@href').extract()[0].split('/')[4]
    yield item

Also note that doing sel = Selector(response) is a waste of resources and differs from what you did earlier, so I changed it. The selector is already available on the response as response.selector, and response.xpath is a convenience shortcut for it.
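As for why the attempt in the question fails: writing callback=self.parse_profile(item) calls the method on the spot instead of handing it over to be called later. A minimal plain-Python sketch of the difference follows; parse_profile here is a simplified stand-in that takes dicts, not the real spider method.

```python
def parse_profile(response, item):
    """Stand-in for the spider callback: needs both a response and an item."""
    item["weapon"] = response["weapon"]
    return item


item = {"rank": "1"}

# parse_profile(item) runs the function immediately. `item` lands in the
# `response` parameter and nothing is supplied for `item`, so Python raises.
try:
    callback = parse_profile(item)
except TypeError as exc:
    error = str(exc)

# The bare name passes the function object itself, to be called later
# by whoever holds the reference.
callback = parse_profile
result = callback({"weapon": "sword"}, item)
```

The callback argument has to be a callable, so any extra data must travel some other way, such as the meta dict shown above.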




Answer 2:


Here's a better way to pass arguments to a callback function, using cb_kwargs (available since Scrapy 1.7):

def parse(self, response):
    request = scrapy.Request('http://www.example.com/index.html',
                             callback=self.parse_page2,
                             cb_kwargs=dict(main_url=response.url))
    request.cb_kwargs['foo'] = 'bar'  # add more arguments for the callback
    yield request

def parse_page2(self, response, main_url, foo):
    yield dict(
        main_url=main_url,
        other_url=response.url,
        foo=foo,
    )

Source: https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments
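To make the mechanism concrete, here is a toy model of the calling convention, applied to the item from the question. ToyRequest and toy_download are hypothetical stand-ins written only for this illustration; they are not Scrapy classes, and the real engine works asynchronously. The sketch only assumes what the example above shows: the callback receives the response first, followed by each cb_kwargs entry as a keyword argument.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a request object (not scrapy.Request)."""
    url: str
    callback: Callable[..., Any]
    cb_kwargs: Dict[str, Any] = field(default_factory=dict)


def toy_download(request):
    """Pretend to fetch the page, then invoke the callback with the
    response first and every cb_kwargs entry as a keyword argument."""
    response = {"url": request.url, "weapon": "sword"}  # fake response data
    return request.callback(response, **request.cb_kwargs)


def parse_profile(response, item):
    # The partially filled item arrives as a normal named parameter.
    item["weapon"] = response["weapon"]
    return item


item = {"rank": "1", "server": "eu"}
request = ToyRequest("https://example.com/profile", parse_profile,
                     cb_kwargs={"item": item})
final_item = toy_download(request)
```

Because the keys of cb_kwargs become parameter names, the callback signature has to list them, as parse_page2 does with main_url and foo.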




Answer 3:


@peduDev

I tried your approach, but it failed with an unexpected keyword argument error.

scrapy_req = scrapy.Request(url=url,
                            callback=self.parseDetailPage,
                            cb_kwargs=dict(participant_id=nParticipantId))


def parseDetailPage(self, response, participant_id ):
    .. Some code here..
    yield MyParseResult (
        .. some code here ..
        participant_id = participant_id
    )

Error reported:
, cb_kwargs=dict(participant_id=nParticipantId)
TypeError: __init__() got an unexpected keyword argument 'cb_kwargs'

Any idea what caused the unexpected keyword argument, other than perhaps a Scrapy version that is too old?

Yep, that was it. I checked my own suspicion, and after an upgrade everything worked.

sudo pip install --upgrade scrapy




Answer 4:


I had a similar issue with passing extra arguments in Tkinter and found that this solution works (see http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/extra-args.html). Adapted to your problem:

def parse(self, response):
    item = HeroItem()
    [...]
    def handler(response, self=self, item=item):
        """Bind self and item as default argument values; the new response is passed in by the caller."""
        return self.parse_profile(response, item)
    yield Request(url, callback=handler)
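The reason for using default argument values rather than a plain closure is easiest to see in a loop, such as the row loop in the question. The plain-Python sketch below compares three ways of binding an item to a callback; parse_profile is a simplified stand-in, and functools.partial is offered as an alternative that this answer does not itself mention.

```python
from functools import partial


def parse_profile(response, item):
    """Stand-in callback: merge the response data into a copy of the item."""
    return {**item, "weapon": response}


rows = [{"rank": 1}, {"rank": 2}]

# A plain closure looks up `item` when it is called, so after the loop
# every handler sees the last row.
late = []
for item in rows:
    late.append(lambda response: parse_profile(response, item))

# A default argument value is evaluated when the function is defined,
# so each handler keeps its own row.
bound = []
for item in rows:
    bound.append(lambda response, item=item: parse_profile(response, item))

# functools.partial also binds the argument up front.
partials = [partial(parse_profile, item=item) for item in rows]

late_ranks = [cb("axe")["rank"] for cb in late]
bound_ranks = [cb("axe")["rank"] for cb in bound]
partial_ranks = [cb("axe")["rank"] for cb in partials]
```

Here late_ranks comes out as [2, 2], while bound_ranks and partial_ranks are both [1, 2].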


Source: https://stackoverflow.com/questions/32252201/passing-a-argument-to-a-callback-function
